
ETNA
Kent State University and
Johann Radon Institute (RICAM)

Electronic Transactions on Numerical Analysis.
Volume 51, pp. 469–494, 2019.
Copyright © 2019, Kent State University.
ISSN 1068–9613. DOI: 10.1553/etna_vol51s469

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND ITS APPLICATIONS TO SINGULAR VALUE DECOMPOSITION*

YUEHUA FENG†, JIANWEI XIAO‡, AND MING GU‡

Abstract. We present the Flip-Flop Spectrum-Revealing QR (Flip-Flop SRQR) factorization, a significantly faster and more reliable variant of the QLP factorization of Stewart, for low-rank matrix approximations. Flip-Flop SRQR uses SRQR factorization to initialize a partial column-pivoted QR factorization and then computes a partial LQ factorization. As observed by Stewart in his original QLP work, Flip-Flop SRQR tracks the exact singular values with "considerable fidelity". We develop singular value lower bounds and residual error upper bounds for the Flip-Flop SRQR factorization. In situations where the singular values of the input matrix decay relatively quickly, the low-rank approximation computed by Flip-Flop SRQR is guaranteed to be as accurate as the truncated SVD. We also perform a complexity analysis to show that Flip-Flop SRQR is faster than the randomized subspace iteration for approximating the SVD, the standard method used in the Matlab tensor toolbox. We additionally compare Flip-Flop SRQR with alternatives on two applications, a tensor approximation and a nuclear norm minimization, to demonstrate its efficiency and effectiveness.

Key words. QR factorization, randomized algorithm, low-rank approximation, approximate SVD, higher-order SVD, nuclear norm minimization

AMS subject classifications. 15A18, 15A23, 65F99

1. Introduction. The singular value decomposition (SVD) of a matrix A ∈ R^{m×n} is the factorization of A into the product of three matrices A = UΣV^T, where the matrices U = (u_1, ..., u_m) ∈ R^{m×m} and V = (v_1, ..., v_n) ∈ R^{n×n} are orthogonal singular vector matrices and Σ ∈ R^{m×n} is a rectangular diagonal matrix with non-increasing non-negative singular values σ_i (1 ≤ i ≤ min(m, n)) on the diagonal. The SVD has become a critical analytic tool in large data analysis and machine learning [1, 19, 53].

Let Diag(x) denote the diagonal matrix with the vector x ∈ R^n on its diagonal. For any 1 ≤ k ≤ min(m, n), the rank-k truncated SVD of A is defined by

    A_k := (u_1, ..., u_k) Diag(σ_1, ..., σ_k) (v_1, ..., v_k)^T.

The rank-k truncated SVD turns out to be the best rank-k approximation to A with respect to both the spectral norm and the Frobenius norm, as explained by the Eckart-Young-Mirsky Theorem [18, 26].
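As a point of reference, the following minimal NumPy sketch (our own illustration, not code from the paper) computes the rank-k truncated SVD via a full SVD and numerically checks the Eckart-Young-Mirsky property; the test matrix and rank are arbitrary.

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k truncated SVD A_k, computed here via a full SVD (illustrative only)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 100))
Uk, sk, Vtk = truncated_svd(A, k=10)
Ak = Uk @ np.diag(sk) @ Vtk
s = np.linalg.svd(A, compute_uv=False)
# Eckart-Young-Mirsky: the spectral-norm error of the best rank-k approximation is sigma_{k+1}.
print(np.isclose(np.linalg.norm(A - Ak, 2), s[10]))
```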

However, due to the prohibitive cost of computing the rank-k truncated SVD, in practical applications one typically computes a rank-k approximate SVD which satisfies some tolerance requirements [16, 27, 30, 42, 66]. The rank-k approximate SVD has been applied to many research areas including principal component analysis (PCA) [37, 58], web search models [38], information retrieval [4, 23], and face recognition [52, 22].

Among assorted SVD approximation algorithms, the pivoted QLP decomposition proposed by Stewart [66] is an effective and efficient one. The pivoted QLP decomposition is obtained by computing a QR factorization with column pivoting (QRCP) [6, 25] of A to get an upper triangular factor R and then computing an LQ factorization of R to get a lower triangular factor L.

*Received October 14, 2018. Accepted November 6, 2019. Published online on December 11, 2019. Recommended by Y. Saad. The research by M. Gu was supported in part by NSF award FRG-1760316.
†School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, China ([email protected]).
‡Department of Mathematics, University of California, Berkeley ([email protected], [email protected]).


Stewart's key numerical observation is that the diagonal elements of L track the singular values of A with "considerable fidelity", no matter what the matrix A is. The pivoted QLP decomposition is extensively analyzed in Huckaby and Chan [34, 35]. More recently, Duersch and Gu developed a much more efficient variant of the pivoted QLP decomposition, TUXV, and demonstrated its remarkable quality as a low-rank approximation empirically, without a rigorous theoretical justification of TUXV's success [17].

In this paper we present Flip-Flop SRQR, a slightly different variant of the TUXV algorithm of Duersch and Gu [17]. Like TUXV, Flip-Flop SRQR spends most of its effort on computing a partial QR factorization using a truncated randomized QRCP (TRQRCP) and a partial LQ factorization. Unlike TUXV, however, Flip-Flop SRQR also performs additional computations to ensure a spectrum-revealing QR factorization (SRQR) [75] before the partial LQ factorization.

We demonstrate the reliable theoretical quality of this variant as a low-rank approximation and its high competitiveness with state-of-the-art low-rank approximation methods in real-world applications, both in low-rank tensor compression [13, 40, 60, 71] and in nuclear norm minimization [7, 45, 49, 54, 68].

The rest of this paper is organized as follows. In Section 2 we introduce the TRQRCP algorithm, the SRQR factorization, low-rank tensor compression, and nuclear norm minimization. In Section 3 we introduce Flip-Flop SRQR and analyze its computational costs and low-rank approximation properties. In Section 4 we present numerical results comparing Flip-Flop SRQR with state-of-the-art low-rank approximation methods. In Section 5 we summarize our main results and draw some conclusions.

2. Preliminaries and background.

2.1. Partial QRCP. The QR factorization of a matrix A ∈ R^{m×n} is A = QR with an orthogonal matrix Q ∈ R^{m×m} and an upper trapezoidal matrix R ∈ R^{m×n}, which can be computed by the LAPACK [2] routine xGEQRF, where x stands for the matrix data type. The standard QR factorization is not suitable for some practical situations where either the matrix A is rank deficient or only representative columns of A are of interest. Usually the QRCP is adequate for the aforementioned situations, except for a few rare examples such as the Kahan matrix [26]. Given a matrix A ∈ R^{m×n}, the QRCP of the matrix A has the form

    AΠ = QR,

where Π ∈ R^{n×n} is a permutation matrix, Q ∈ R^{m×m} is an orthogonal matrix, and R ∈ R^{m×n} is an upper trapezoidal matrix. QRCP can be computed by the LAPACK [2] routines xGEQPF and xGEQP3, where xGEQP3 is a more efficient blocked implementation of xGEQPF. For a given target rank k (1 ≤ k ≤ min(m, n)), the partial QRCP factorization (Algorithm 4 in the appendix) has the 2 × 2 block form

(2.1)    AΠ = Q \begin{bmatrix} R_{11} & R_{12} \\ & R_{22} \end{bmatrix} = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_{11} & R_{12} \\ & R_{22} \end{bmatrix},

where R_{11} ∈ R^{k×k} is upper triangular. The partial QRCP computes an approximate column subspace of A spanned by the leading k columns in AΠ, up to the error term in R_{22}. Equivalently, (2.1) yields the low-rank approximation

(2.2)    A ≈ Q_1 \begin{bmatrix} R_{11} & R_{12} \end{bmatrix} Π^T,

with an approximation quality closely related to the error term in R_{22}.

The Randomized QRCP (RQRCP) algorithm [17, 75] is a more efficient variant of Algorithm 4. RQRCP generates a Gaussian random matrix Ω ∈ N(0,1)^{(b+p)×m} with b + p ≪ m, where the entries of Ω are independently sampled from a normal distribution, to compress A into B = ΩA with a much smaller row dimension. In practice, b is the block size and p is the oversampling size. RQRCP repeatedly runs partial QRCP on B to obtain b column pivots, applies them to the matrix A, and then computes QR without pivoting (QRNP) on A and updates the remaining columns of B. RQRCP exits this process when it reaches the target rank k. QRCP and RQRCP choose pivots on A and B, respectively. RQRCP is significantly faster than QRCP, as B has a much smaller row dimension than A. It is shown in [75] that RQRCP is as reliable as QRCP up to failure probabilities that decay exponentially with respect to the oversampling size p.

Since the trailing matrix of A is usually not required for low-rank matrix approximations (see (2.2)), the TRQRCP (truncated RQRCP) algorithm of [17] re-organizes the computations in RQRCP to directly compute the approximation (2.2) without explicitly computing the trailing matrix R_{22}. For more details, both RQRCP and TRQRCP are based on the WY representation of the Householder transformations [5, 55, 61]:

    Q = Q_1 Q_2 ··· Q_k = I − Y T Y^T,

where T ∈ R^{k×k} is an upper triangular matrix and Y ∈ R^{m×k} is a trapezoidal matrix consisting of k consecutive Householder vectors. Let W^T := T^T Y^T A; then the trailing matrix update formula becomes Q^T A = A − Y W^T. The main difference between RQRCP and TRQRCP is that RQRCP computes the whole trailing matrix update while TRQRCP only computes the part of the update that affects the approximation (2.2). More discussions of RQRCP and TRQRCP can be found in [17]. The main steps of TRQRCP are briefly described in Algorithm 1.

Algorithm 1: Truncated Randomized QRCP (TRQRCP)
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k. Block size b. Oversampling size p ≥ 0.
Outputs:
  Orthogonal matrix Q ∈ R^{m×m}.
  Upper trapezoidal matrix R ∈ R^{k×n}.
  Permutation matrix Π ∈ R^{n×n} such that AΠ ≈ Q(:, 1:k) R.
Algorithm:
  Generate an i.i.d. Gaussian random matrix Ω ∈ N(0,1)^{(b+p)×m}.
  Form the initial sample matrix B = ΩA and initialize Π = I_n.
  for j = 1 : b : k do
    b = min(k − j + 1, b).
    Do partial QRCP on B(:, j:n) to obtain b pivots.
    Exchange the corresponding columns in A, B, Π, and W^T.
    Do QR on A(j:m, j:j+b−1) using the WY formula without updating the trailing matrix.
    Update B(:, j+b:n) using the formula (2.4).
  end for
  Q = Q_1 Q_2 ··· Q_⌈k/b⌉; R = upper trapezoidal part of the submatrix A(1:k, 1:n).

The deduction of the updating formula for B is as follows: consider partial QR factorizations of AΠ and BΠ = ΩAΠ,

(2.3)    AΠ = Q \begin{bmatrix} R_{11} & R_{12} \\ & R_{22} \end{bmatrix},    BΠ = \bar{Q} \begin{bmatrix} \bar{R}_{11} & \bar{R}_{12} \\ & \bar{R}_{22} \end{bmatrix}.

Partition \bar{Ω} := \bar{Q}^T Ω Q =: \begin{bmatrix} \bar{Ω}_1 & \bar{Ω}_2 \end{bmatrix}; therefore (2.3) implies

    \bar{Ω} \begin{bmatrix} R_{11} & R_{12} \\ & R_{22} \end{bmatrix} = \begin{bmatrix} \bar{R}_{11} & \bar{R}_{12} \\ & \bar{R}_{22} \end{bmatrix},

which leads to the updating formula for B in Algorithm 1:

(2.4)    \begin{bmatrix} \bar{R}_{12} \\ \bar{R}_{22} \end{bmatrix} ← \bar{Ω}_2 R_{22} = \begin{bmatrix} \bar{R}_{12} − \bar{R}_{11} R_{11}^{−1} R_{12} \\ \bar{R}_{22} \end{bmatrix}.

With TRQRCP, the TUXV algorithm [17, Algorithm 7] (Algorithm 5 in the appendix) computes a low-rank approximation with the QLP factorization at a greatly accelerated speed, by computing a partial QR factorization with column pivoting followed by a partial LQ factorization.
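To make the sketch-and-pivot idea behind RQRCP/TRQRCP concrete, here is a simplified, unblocked NumPy/SciPy illustration (our own sketch, not the authors' implementation): the pivots are selected from the compressed matrix B = ΩA, and the approximation (2.2) is then assembled from the chosen columns of A.

```python
import numpy as np
from scipy.linalg import qr

def sketched_pivot_lowrank(A, k, p=5, seed=0):
    """Choose k column pivots from a (k+p) x n sketch B = Omega @ A (the RQRCP idea),
    then assemble the low-rank approximation (2.2) from those columns of A."""
    m, n = A.shape
    Omega = np.random.default_rng(seed).standard_normal((k + p, m))
    B = Omega @ A                                  # compressed matrix with few rows
    _, _, piv = qr(B, mode='economic', pivoting=True)
    Q1, _ = np.linalg.qr(A[:, piv[:k]])            # orthonormal basis of the selected columns
    return Q1 @ (Q1.T @ A)                         # equals Q1 [R11 R12] Pi^T in (2.2)
```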

2.2. Spectrum-revealing QR factorization. Although both RQRCP and TRQRCP are very effective practical tools for low-rank matrix approximations, they are not known to provide reliable low-rank matrix approximations due to the underlying greediness of their column norm-based pivoting strategy. To address this potential problem of column-pivoted QR factorizations, Gu and Eisenstat [28] proposed an efficient way to perform additional column interchanges to enhance the quality of the leading k columns in AΠ as a basis for the approximate column subspace. More recently, a more efficient and effective method, SRQR, was introduced and analyzed in [50, 75] to compute the low-rank approximation (2.2). The concept of spectrum-revealing, introduced in [50, 74], emphasizes the utilization of the partial QR factorization (2.2) as a low-rank matrix approximation, as opposed to the more traditional rank-revealing factorization, which emphasizes the utility of the partial QR factorization (2.1) as a tool for numerical rank determination. The SRQR algorithm is described in Algorithm 2. The SRQR algorithm will always run to completion with a high-quality low-rank matrix approximation (2.2). For real data matrices, which usually have a fast decaying singular-value spectrum, this approximation is often as good as the truncated SVD. The SRQR algorithm of [75] explicitly updates the partial QR factorization (2.1) while swapping columns, but the SRQR algorithm can actually avoid any explicit computations on the trailing matrix R_{22} by using TRQRCP instead of RQRCP to obtain exactly the same partial QR initialization. Below we outline the SRQR algorithm.

In (2.1), let

(2.5)    \tilde{R} := \begin{bmatrix} R_{11} & a \\ & α \end{bmatrix}

be the leading (l + 1) × (l + 1) (l ≥ k) submatrix of R. We define

(2.6)    g_1 := \frac{\|R_{22}\|_{1,2}}{|α|}    and    g_2 := |α| \, \| \tilde{R}^{−T} \|_{1,2},

where ‖X‖_{1,2} denotes the largest column 2-norm of X for any given X. In [75], the authors proved approximation quality bounds involving g_1 and g_2 for the low-rank approximation computed by RQRCP or TRQRCP: RQRCP or TRQRCP will provide a good low-rank matrix approximation if g_1 and g_2 are O(1). The authors also proved that

    g_1 ≤ \sqrt{\frac{1+ε}{1−ε}}    and    g_2 ≤ \sqrt{\frac{2(1+ε)}{1−ε}} \left( 1 + \sqrt{\frac{1+ε}{1−ε}} \right)^{l−1}

hold with high probability for both RQRCP and TRQRCP, where 0 < ε < 1 is a user-defined parameter which guides the choice of the oversampling size p. For reasonable choices of ε, like ε = 1/2, g_1 is a small constant, while g_2 can potentially be an extremely large number, which can lead to a poor low-rank approximation quality. To avoid a potentially large g_2, the SRQR algorithm (Algorithm 2) proposed in [75] uses a pair-wise swapping strategy to guarantee that g_2 stays below some user-defined tolerance g > 1, which is usually chosen to be a small number larger than one, like 2.0. In Algorithm 2, we use an approximate formula instead of the definition (2.6) directly to estimate g_2, i.e., g_2 ≈ \frac{|α|}{\sqrt{d}} \, \| Ω \tilde{R}^{−T} \|_{1,2}, based on the Johnson-Lindenstrauss Lemma [36].

Algorithm 2: Spectrum-revealing QR Factorization (SRQR)
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k. Block size b. Oversampling size p ≥ 0.
  Integer l ≥ k. Tolerance g > 1 for g_2.
Outputs:
  Orthogonal matrix Q ∈ R^{m×m} formed by the first k reflectors.
  Upper trapezoidal matrix R ∈ R^{k×n}.
  Permutation matrix Π ∈ R^{n×n} such that AΠ ≈ Q(:, 1:k) R.
Algorithm:
  Compute Q, R, Π with RQRCP or TRQRCP to l steps.
  Compute the squared 2-norms r_i (l+1 ≤ i ≤ n) of the columns of B(:, l+1:n), where B is the updated random projection of A computed by RQRCP or TRQRCP.
  Approximate the squared 2-norms of the columns of A(l+1:m, l+1:n): r_i = r_i / (b + p) (l+1 ≤ i ≤ n).
  ı = argmax_{l+1 ≤ i ≤ n} {r_i}. Swap the ı-th and (l+1)-st columns of A, Π, and r.
  One-step QR factorization of A(l+1:m, l+1:n); |α| = R_{l+1,l+1}.
  r_i = r_i − A(l+1, i)² (l+2 ≤ i ≤ n).
  Generate a random matrix Ω ∈ N(0,1)^{d×(l+1)} (d ≪ l).
  Compute g_2 = |α| ‖\tilde{R}^{−T}‖_{1,2} ≈ (|α|/√d) ‖Ω \tilde{R}^{−T}‖_{1,2}.
  while g_2 > g do
    ı = argmax_{1 ≤ i ≤ l+1} {i-th column norm of Ω \tilde{R}^{−T}}.
    Swap the ı-th and (l+1)-st columns of A and Π in a Round Robin rotation.
    Givens-rotate R back into upper-trapezoidal form.
    r_{l+1} = R_{l+1,l+1}²; r_i = r_i + A(l+1, i)² (l+2 ≤ i ≤ n).
    ı = argmax_{l+1 ≤ i ≤ n} {r_i}. Swap the ı-th and (l+1)-st columns of A, Π, and r.
    One-step QR factorization of A(l+1:m, l+1:n); |α| = R_{l+1,l+1}.
    r_i = r_i − A(l+1, i)² (l+2 ≤ i ≤ n).
    Generate a random matrix Ω ∈ N(0,1)^{d×(l+1)} (d ≪ l).
    Compute g_2 = |α| ‖\tilde{R}^{−T}‖_{1,2} ≈ (|α|/√d) ‖Ω \tilde{R}^{−T}‖_{1,2}.
  end while
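As an illustration of the quantities monitored by Algorithm 2, the following NumPy sketch (ours, with illustrative inputs) evaluates g_1 and g_2 from (2.6), together with the Johnson-Lindenstrauss-based estimate of g_2; a production code would of course reuse quantities already available from the factorization.

```python
import numpy as np

def column_12_norm(X):
    """||X||_{1,2}: the largest column 2-norm of X."""
    return np.max(np.linalg.norm(X, axis=0))

def srqr_quality(R_lead, R22, d=10, seed=0):
    """g1 and g2 from (2.6) for the leading (l+1) x (l+1) block R_lead (alpha is its last
    diagonal entry), plus the Johnson-Lindenstrauss estimate of g2 used in Algorithm 2."""
    rng = np.random.default_rng(seed)
    alpha = abs(R_lead[-1, -1])
    g1 = column_12_norm(R22) / alpha
    R_invT = np.linalg.inv(R_lead).T
    g2 = alpha * column_12_norm(R_invT)
    Omega = rng.standard_normal((d, R_lead.shape[0]))
    g2_est = alpha / np.sqrt(d) * column_12_norm(Omega @ R_invT)
    return g1, g2, g2_est
```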

2.3. Tensor approximation. In this section we review some basic notation and concepts involving tensors. A more detailed discussion of the properties and applications of tensors can be found in the review [40].


A tensor is a d-dimensional array of numbers, denoted by the script notation X ∈ R^{I_1×···×I_d}, with entries given by

    x_{j_1,...,j_d},    1 ≤ j_1 ≤ I_1, ..., 1 ≤ j_d ≤ I_d.

We use the matrix X_{(n)} ∈ R^{I_n × (Π_{j≠n} I_j)} to denote the mode-n unfolding of the tensor X. Since this tensor has d dimensions, there are altogether d possibilities for unfolding. The n-mode product of a tensor X ∈ R^{I_1×···×I_d} with a matrix U ∈ R^{k×I_n} results in a tensor Y ∈ R^{I_1×···×I_{n−1}×k×I_{n+1}×···×I_d} such that

    y_{j_1,...,j_{n−1},j,j_{n+1},...,j_d} = (X ×_n U)_{j_1,...,j_{n−1},j,j_{n+1},...,j_d} = \sum_{j_n=1}^{I_n} x_{j_1,...,j_d} \, u_{j,j_n}.

Alternatively, it can be expressed conveniently in terms of unfolded tensors:

    Y = X ×_n U  ⇔  Y_{(n)} = U X_{(n)}.
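A small NumPy sketch of the mode-n unfolding and the n-mode product may be helpful here (function names are ours; the exact column ordering of the unfolding is convention-dependent, but the identity above holds for any consistent unfold/fold pair):

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding X_(n): an I_n x (prod_{j != n} I_j) matrix."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def fold(M, n, shape):
    """Inverse of unfold, for the same column ordering convention."""
    rest = [s for j, s in enumerate(shape) if j != n]
    return np.moveaxis(M.reshape([shape[n]] + rest), 0, n)

def mode_n_product(X, U, n):
    """Y = X x_n U, computed through the matrix identity Y_(n) = U X_(n)."""
    shape = list(X.shape)
    shape[n] = U.shape[0]
    return fold(U @ unfold(X, n), n, shape)
```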

Decompositions of higher-order tensors have applications in signal processing [12, 15, 63], numerical linear algebra [13, 39, 76], computer vision [62, 72, 67], etc. Two particular tensor decompositions can be considered as higher-order extensions of the matrix SVD: CANDECOMP/PARAFAC (CP) [10, 31] decomposes a tensor as a sum of rank-one tensors, and the Tucker decomposition [70] is a higher-order form of principal component analysis. Knowing the definitions of mode products and unfoldings of tensors, we can introduce the sequentially truncated higher-order SVD (ST-HOSVD) algorithm for producing a rank-(k_1, ..., k_d) approximation to the tensor based on the Tucker decomposition format. The ST-HOSVD algorithm [3, 71] (Algorithm 6 in the appendix) returns a core tensor G ∈ R^{k_1×···×k_d} and a set of unitary matrices U_j ∈ R^{I_j×k_j} for j = 1, ..., d such that

    X ≈ G ×_1 U_1 ··· ×_d U_d,

where the right-hand side is called a Tucker decomposition. However, a straightforward generalization of the matrix Eckart-Young-Mirsky Theorem to higher-order (d ≥ 3) tensors is not possible [14].

In ST-HOSVD, one key step is to compute an exact or approximate rank-k_r SVD of the tensor unfolding. Well-known efficient ways to compute an exact low-rank SVD include Krylov subspace methods [44]. There are also efficient randomized algorithms to find an approximate low-rank SVD [30]. In the Matlab tensorlab toolbox [73], the most efficient method, MLSVD_RSI, is essentially ST-HOSVD with a randomized subspace iteration to find the approximate SVD of the tensor unfolding.

2.4. Nuclear norm minimization. The matrix rank minimization problem appears ubiquitously in many fields such as Euclidean embedding [20, 46], control [21, 51, 56], collaborative filtering [9, 57, 65], and system identification [47, 48]. The regularized nuclear norm minimization problem is the following:

    \min_{X ∈ R^{m×n}} f(X) + τ \|X\|_*,

where τ > 0 is a regularization parameter and \|X\|_* := \sum_{i=1}^{q} σ_i(X). The choice of the function f(·) is situational: f(X) = ‖M − X‖_1 in robust principal component analysis (robust PCA) [8], and f(X) = ‖π_Ω(M) − π_Ω(X)‖_F^2 in matrix completion [7].


Many researchers have devoted themselves to solving the above nuclear norm minimization problem, and plenty of algorithms have been proposed, including singular value thresholding (SVT) [7], fixed point continuation (FPC) [49], accelerated proximal gradient (APG) [68], and the augmented Lagrange multiplier method (ALM) [45]. The most expensive part of these algorithms lies in the computation of a truncated SVD. The inexact augmented Lagrange multiplier method (IALM) [45] has been proved to be one of the most accurate and efficient among them. We now describe IALM for the robust PCA and matrix completion problems.

The robust PCA problem can be formalized as the minimization of a sum of the nuclear norm and a scaled matrix l1-norm (sum of matrix entries in absolute value):

(2.7)    \min \|X\|_* + λ\|E\|_1    subject to    M = X + E,

where M is a measured matrix, X has low rank, E is an error matrix and sufficiently sparse, and λ is a positive weighting parameter. Algorithm 7 in the appendix describes the details of the IALM method to solve the robust PCA problem [45].

The matrix completion problem [9, 45] can be written in the form

(2.8)    \min_{X ∈ R^{m×n}} \|X\|_*    subject to    X + E = M,  π_Ω(E) = 0,

where π_Ω : R^{m×n} → R^{m×n} is an orthogonal projection that keeps the entries in Ω unchanged and sets those outside Ω to zero. In [45], the authors applied the IALM method to the matrix completion problem (Algorithm 8 in the appendix).
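The dominant step of the IALM-type solvers for both (2.7) and (2.8) is a singular value shrinkage of a matrix iterate (cf. the operator S_ω in Appendix D). The following minimal NumPy sketch (ours, using a full SVD) shows that step, which is precisely where a faster approximate SVD such as Flip-Flop SRQR is later substituted.

```python
import numpy as np

def soft_shrink(x, omega):
    """S_omega(x) = sign(x) * max(|x| - omega, 0), applied entrywise."""
    return np.sign(x) * np.maximum(np.abs(x) - omega, 0.0)

def svt(Y, tau):
    """Singular value thresholding: U S_tau(Sigma) V^T, the proximal operator of tau*||.||_*.
    Replacing the full SVD here by a faster approximate SVD is exactly where methods such as
    Flip-Flop SRQR enter the IALM-type solvers."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * soft_shrink(s, tau)) @ Vt
```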

3. Flip-Flop SRQR factorization.

3.1. Flip-Flop SRQR factorization. In this section we introduce our Flip-Flop SRQR factorization, a slightly different variant of TUXV (Algorithm 5 in the appendix), to compute an SVD approximation based on the QLP factorization. Given an integer l ≥ k, we run SRQR (the version without computing the trailing matrix) with l steps on A:

(3.1)    AΠ = QR = Q \begin{bmatrix} R_{11} & R_{12} \\ & R_{22} \end{bmatrix},

where R_{11} ∈ R^{l×l} is upper triangular, R_{12} ∈ R^{l×(n−l)}, and R_{22} ∈ R^{(m−l)×(n−l)}. Then we run partial QRNP with l steps on R^T:

(3.2)    R^T = \begin{bmatrix} R_{11}^T & \\ R_{12}^T & R_{22}^T \end{bmatrix} = \hat{Q} \begin{bmatrix} \hat{R}_{11} & \hat{R}_{12} \\ & \hat{R}_{22} \end{bmatrix} ≈ \hat{Q}_1 \begin{bmatrix} \hat{R}_{11} & \hat{R}_{12} \end{bmatrix},

where \hat{Q} = [\hat{Q}_1  \hat{Q}_2] with \hat{Q}_1 ∈ R^{n×l}. Therefore, combining with the fact that AΠ\hat{Q}_1 = Q [\hat{R}_{11}  \hat{R}_{12}]^T, we can approximate the matrix A by

(3.3)    A = QRΠ^T = Q (R^T)^T Π^T ≈ Q \begin{bmatrix} \hat{R}_{11}^T \\ \hat{R}_{12}^T \end{bmatrix} \hat{Q}_1^T Π^T = A (Π\hat{Q}_1)(Π\hat{Q}_1)^T.

We denote the rank-k truncated SVD of AΠ\hat{Q}_1 by \tilde{U}_k Σ_k \tilde{V}_k^T. Let U_k = \tilde{U}_k and V_k = Π\hat{Q}_1\tilde{V}_k; then, using (3.3), a rank-k approximate SVD of A is obtained:

(3.4)    A ≈ U_k Σ_k V_k^T,

where U_k ∈ R^{m×k} and V_k ∈ R^{n×k} are column orthonormal and Σ_k = Diag(\hat{σ}_1, ..., \hat{σ}_k), with the \hat{σ}_i's being the leading k singular values of AΠ\hat{Q}_1. The Flip-Flop SRQR factorization is outlined in Algorithm 3.


Algorithm 3: Flip-Flop Spectrum-Revealing QR Factorization
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k. Block size b. Oversampling size p ≥ 0.
  Integer l ≥ k. Tolerance g > 1 for g_2.
Outputs:
  U ∈ R^{m×k} contains the approximate top k left singular vectors of A.
  Σ ∈ R^{k×k} contains the approximate top k singular values of A.
  V ∈ R^{n×k} contains the approximate top k right singular vectors of A.
Algorithm:
  Run SRQR on A to l steps to obtain (R_{11}, R_{12}).
  Run QRNP on (R_{11}, R_{12})^T to obtain \hat{Q}_1, represented by a sequence of Householder vectors.
  \hat{A} = AΠ\hat{Q}_1.
  [\hat{U}, \hat{Σ}, \hat{V}] = svd(\hat{A}).
  U = \hat{U}(:, 1:k); Σ = \hat{Σ}(1:k, 1:k); V = Π\hat{Q}_1\hat{V}(:, 1:k).
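The following NumPy/SciPy sketch (ours) mirrors the structure of Algorithm 3 under a simplifying assumption: a full deterministic column-pivoted QR stands in for the SRQR/TRQRCP initialization, so it reproduces the flip-flop structure but not the speed, and it omits the extra SRQR swaps.

```python
import numpy as np
from scipy.linalg import qr

def flip_flop_qlp_svd(A, k, l=None):
    """Structure of Algorithm 3 with a plain column-pivoted QR standing in for SRQR."""
    l = k if l is None else l
    Q, R, piv = qr(A, mode='economic', pivoting=True)   # A[:, piv] = Q R
    R1 = R[:l, :]                                        # leading l rows [R11 R12]
    Q1hat, _ = np.linalg.qr(R1.T)                        # partial QRNP of R1^T (an LQ of R1)
    PiQ1 = np.empty_like(Q1hat)
    PiQ1[piv, :] = Q1hat                                 # Pi @ Q1hat, applying the permutation
    B = A @ PiQ1                                         # m x l matrix A Pi Q1hat, cf. (3.3)
    Ub, s, Vbt = np.linalg.svd(B, full_matrices=False)   # small SVD, cf. (3.4)
    return Ub[:, :k], s[:k], PiQ1 @ Vbt.T[:, :k]
```

A production implementation would truncate the pivoted QR at l steps and keep \hat{Q}_1 in Householder form, as the algorithm above describes.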

3.2. Complexity analysis. In this section we perform a complexity analysis of Flip-Flop SRQR. Since the approximate SVD only makes sense when the target rank k is small, we assume that k ≤ l ≪ min(m, n). The complexity analysis of Flip-Flop SRQR looks as follows:
  1. The main cost of doing SRQR with TRQRCP on A is 2mnl + 2(b + p)mn + (m + n)l².
  2. The main cost of the QR factorization for [R_{11}  R_{12}]^T and of forming \hat{Q}_1 is 2nl² − (2/3)l³.
  3. The main cost of computing \hat{A} = AΠ\hat{Q}_1 is 2mnl.
  4. The main cost of computing [\hat{U}, ∼, ∼] = svd(\hat{A}) is O(ml²).
  5. The main cost of forming V_k is 2nlk.
Since k ≤ l ≪ min(m, n), the complexity of Flip-Flop SRQR is 4mnl + 2(b + p)mn after omitting the lower-order terms.

On the other hand, the complexity of the approximate SVD with randomized subspace iteration (RSISVD) [27, 30] is (4 + 4q)mn(k + p), where p is the oversampling size and q is the number of subspace iterations (see the detailed analysis in the appendix). In practice, p is chosen to be a small integer, for instance 5, in RSISVD. In Flip-Flop SRQR, l is usually chosen to be a little bit larger than k, e.g., l = k + 5. We also found that l = k is sufficient in terms of approximation quality in our numerical experiments. Therefore, we observe that Flip-Flop SRQR is more efficient than RSISVD for any q > 0.

Since a practical dataset matrix can be too large to fit into main memory, the cost of transferring the matrix through the memory hierarchy typically dominates the arithmetic cost. Therefore, we also compare the number of data passes between RSISVD and Flip-Flop SRQR: RSISVD needs 2q + 2 passes, while Flip-Flop SRQR needs 4 passes (computing B = ΩA, swapping the columns of A, realizing a partial QR factorization on A, and computing AΠ\hat{Q}_1). Therefore, Flip-Flop SRQR requires less data transfer than RSISVD for any q > 1.
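As a rough numerical illustration of these two leading-order counts (the concrete sizes below are our own choice):

```python
# Leading-order flop counts for one illustrative configuration.
m = n = 10_000
k, l, b, p, q = 100, 105, 32, 5, 1
flops_ffsrqr = 4 * m * n * l + 2 * (b + p) * m * n    # = 4.94e10
flops_rsisvd = (4 + 4 * q) * m * n * (k + p)          # = 8.40e10
print(flops_ffsrqr, flops_rsisvd, flops_rsisvd / flops_ffsrqr)  # ratio ~ 1.7
```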

3.3. Quality analysis of Flip-Flop SRQR. This section is devoted to the quality analysis of Flip-Flop SRQR. We start with Lemma 3.1.

LEMMA 3.1. Given any matrix X = [X_1  X_2] with X_i ∈ R^{m×n_i}, i = 1, 2, and n_1 + n_2 = n,

    σ_j(X)^2 ≤ σ_j(X_1)^2 + \|X_2\|_2^2,    1 ≤ j ≤ min(m, n).

Proof. Since XX^T = X_1 X_1^T + X_2 X_2^T, we obtain the above result using [33, Theorem 3.3.16].
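A quick NumPy check of Lemma 3.1 on a random instance (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.standard_normal((50, 30))
X2 = rng.standard_normal((50, 20))
X = np.hstack([X1, X2])
sX = np.linalg.svd(X, compute_uv=False)
sX1 = np.linalg.svd(X1, compute_uv=False)
# sigma_j(X)^2 <= sigma_j(X1)^2 + ||X2||_2^2, checked for j up to the column count of X1
bound = np.sqrt(sX1**2 + np.linalg.norm(X2, 2)**2)
print(np.all(sX[:X1.shape[1]] <= bound + 1e-12))
```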

We are now ready to derive bounds for the singular values and the approximation error of Flip-Flop SRQR. We need to emphasize that even if the target rank is k, we run Flip-Flop SRQR with an actual target rank l, which is a little bit larger than k. The difference between k and l can create a gap between the singular values of A so that we can obtain a reliable low-rank approximation.

THEOREM 3.2. Given a matrix A ∈ R^{m×n}, a target rank k, an oversampling size p, and an actual target rank l ≥ k, the matrices U_k, Σ_k, V_k in (3.4) computed by Flip-Flop SRQR satisfy

(3.5)    σ_j(Σ_k) ≥ \frac{σ_j(A)}{\sqrt[4]{1 + \frac{2\|R_{22}\|_2^4}{σ_j^4(Σ_k)}}},    1 ≤ j ≤ k,

and

(3.6)    \| A − U_k Σ_k V_k^T \|_2 ≤ σ_{k+1}(A) \sqrt[4]{1 + 2 \left( \frac{\|R_{22}\|_2}{σ_{k+1}(A)} \right)^4},

where R_{22} ∈ R^{(m−l)×(n−l)} is the trailing matrix in (3.1). Using the properties of SRQR, we furthermore have

(3.7)    σ_j(Σ_k) ≥ \frac{σ_j(A)}{\sqrt[4]{1 + \min\left( 2\hat{τ}^4, \; τ^4 (2 + 4\hat{τ}^4) \left( \frac{σ_{l+1}(A)}{σ_j(A)} \right)^4 \right)}},    1 ≤ j ≤ k,

and

(3.8)    \| A − U_k Σ_k V_k^T \|_2 ≤ σ_{k+1}(A) \sqrt[4]{1 + 2τ^4 \left( \frac{σ_{l+1}(A)}{σ_{k+1}(A)} \right)^4},

where τ and \hat{τ}, defined in (3.12), have matrix dimension-dependent upper bounds

    τ ≤ g_1 g_2 \sqrt{(l+1)(n−l)}    and    \hat{τ} ≤ g_1 g_2 \sqrt{l(n−l)},

where g_1 ≤ \sqrt{\frac{1+ε}{1−ε}} and g_2 ≤ g hold with high probability. The parameters ε > 0 and g > 1 are user-defined.

Proof. For the singular value bounds, observe that

    \begin{bmatrix} \hat{R}_{11} & \hat{R}_{12} \\ & \hat{R}_{22} \end{bmatrix} \begin{bmatrix} \hat{R}_{11} & \hat{R}_{12} \\ & \hat{R}_{22} \end{bmatrix}^T = \begin{bmatrix} \hat{R}_{11}\hat{R}_{11}^T + \hat{R}_{12}\hat{R}_{12}^T & \hat{R}_{12}\hat{R}_{22}^T \\ \hat{R}_{22}\hat{R}_{12}^T & \hat{R}_{22}\hat{R}_{22}^T \end{bmatrix};

we apply Lemma 3.1 twice, for any 1 ≤ j ≤ k:

    σ_j^2\left( \begin{bmatrix} \hat{R}_{11} & \hat{R}_{12} \\ & \hat{R}_{22} \end{bmatrix} \begin{bmatrix} \hat{R}_{11} & \hat{R}_{12} \\ & \hat{R}_{22} \end{bmatrix}^T \right)
      ≤ σ_j^2\left( \begin{bmatrix} \hat{R}_{11}\hat{R}_{11}^T + \hat{R}_{12}\hat{R}_{12}^T & \hat{R}_{12}\hat{R}_{22}^T \end{bmatrix} \right) + \left\| \begin{bmatrix} \hat{R}_{22}\hat{R}_{12}^T & \hat{R}_{22}\hat{R}_{22}^T \end{bmatrix} \right\|_2^2
      ≤ σ_j^2\left( \hat{R}_{11}\hat{R}_{11}^T + \hat{R}_{12}\hat{R}_{12}^T \right) + \left\| \hat{R}_{12}\hat{R}_{22}^T \right\|_2^2 + \left\| \begin{bmatrix} \hat{R}_{22}\hat{R}_{12}^T & \hat{R}_{22}\hat{R}_{22}^T \end{bmatrix} \right\|_2^2
      ≤ σ_j^2\left( \hat{R}_{11}\hat{R}_{11}^T + \hat{R}_{12}\hat{R}_{12}^T \right) + 2 \left\| \begin{bmatrix} \hat{R}_{12} \\ \hat{R}_{22} \end{bmatrix} \right\|_2^4
      = σ_j^2\left( \hat{R}_{11}\hat{R}_{11}^T + \hat{R}_{12}\hat{R}_{12}^T \right) + 2 \| R_{22} \|_2^4,    (3.9)

where the last equality holds because [\hat{R}_{12}^T \ \hat{R}_{22}^T]^T consists of the last m − l columns of \hat{Q}^T R^T, whose 2-norm equals that of the last m − l columns of R^T, namely ‖R_{22}‖_2. The relation (3.9) can be further rewritten as

    σ_j^4(A) ≤ σ_j^4\left( \begin{bmatrix} \hat{R}_{11} & \hat{R}_{12} \end{bmatrix} \right) + 2 \| R_{22} \|_2^4 = σ_j^4(Σ_k) + 2 \| R_{22} \|_2^4,    1 ≤ j ≤ k,

which is equivalent to

    σ_j(Σ_k) ≥ \frac{σ_j(A)}{\sqrt[4]{1 + \frac{2\|R_{22}\|_2^4}{σ_j^4(Σ_k)}}}.

For the residual matrix bound, write

    \begin{bmatrix} \hat{R}_{11} & \hat{R}_{12} \end{bmatrix} := \begin{bmatrix} \tilde{R}_{11} & \tilde{R}_{12} \end{bmatrix} + \begin{bmatrix} δ\hat{R}_{11} & δ\hat{R}_{12} \end{bmatrix},

where [\tilde{R}_{11}  \tilde{R}_{12}] is the rank-k truncated SVD of [\hat{R}_{11}  \hat{R}_{12}]. Then

(3.10)    \| A − U_k Σ_k V_k^T \|_2 = \left\| AΠ − Q \begin{bmatrix} \tilde{R}_{11} & \tilde{R}_{12} \\ & 0 \end{bmatrix}^T \hat{Q}^T \right\|_2.

It follows from the orthogonality of singular vectors that

    \begin{bmatrix} \tilde{R}_{11} & \tilde{R}_{12} \end{bmatrix}^T \begin{bmatrix} δ\hat{R}_{11} & δ\hat{R}_{12} \end{bmatrix} = 0,

and therefore

    \begin{bmatrix} \hat{R}_{11} & \hat{R}_{12} \end{bmatrix}^T \begin{bmatrix} \hat{R}_{11} & \hat{R}_{12} \end{bmatrix} = \begin{bmatrix} \tilde{R}_{11} & \tilde{R}_{12} \end{bmatrix}^T \begin{bmatrix} \tilde{R}_{11} & \tilde{R}_{12} \end{bmatrix} + \begin{bmatrix} δ\hat{R}_{11} & δ\hat{R}_{12} \end{bmatrix}^T \begin{bmatrix} δ\hat{R}_{11} & δ\hat{R}_{12} \end{bmatrix},

which implies

(3.11)    \hat{R}_{12}^T \hat{R}_{12} = \tilde{R}_{12}^T \tilde{R}_{12} + \left( δ\hat{R}_{12} \right)^T \left( δ\hat{R}_{12} \right).

Similarly to the deduction of (3.9), we can derive from (3.11) that

    \left\| \begin{bmatrix} δ\hat{R}_{11} & δ\hat{R}_{12} \\ & \hat{R}_{22} \end{bmatrix} \right\|_2^4 ≤ \left\| \begin{bmatrix} δ\hat{R}_{11} & δ\hat{R}_{12} \end{bmatrix} \right\|_2^4 + 2 \left\| \begin{bmatrix} δ\hat{R}_{12} \\ \hat{R}_{22} \end{bmatrix} \right\|_2^4 ≤ σ_{k+1}^4(A) + 2 \left\| \begin{bmatrix} \hat{R}_{12} \\ \hat{R}_{22} \end{bmatrix} \right\|_2^4 = σ_{k+1}^4(A) + 2 \| R_{22} \|_2^4.

Combining with (3.10), it now follows that

    \| A − U_k Σ_k V_k^T \|_2 = \left\| AΠ − Q \begin{bmatrix} \tilde{R}_{11} & \tilde{R}_{12} \\ & 0 \end{bmatrix}^T \hat{Q}^T \right\|_2 = \left\| \begin{bmatrix} δ\hat{R}_{11} & δ\hat{R}_{12} \\ & \hat{R}_{22} \end{bmatrix} \right\|_2 ≤ σ_{k+1}(A) \sqrt[4]{1 + 2 \left( \frac{\|R_{22}\|_2}{σ_{k+1}(A)} \right)^4}.

To obtain an upper bound for ‖R_{22}‖_2 in (3.5) and (3.6), we follow the analysis of SRQR in [75]. From the analysis in [75, Section IV], Algorithm 2 ensures that g_1 ≤ \sqrt{\frac{1+ε}{1−ε}} and g_2 ≤ g with high probability (the actual probability-guarantee formula can be found in [75, Section IV]), where g_1 and g_2 are defined by (2.6). Here 0 < ε < 1 is a user-defined parameter to adjust the choice of the oversampling size p used in the TRQRCP initialization part of SRQR, and g > 1 is a user-defined parameter in the extra swapping part of SRQR. Let

(3.12)    τ := g_1 g_2 \frac{\|R_{22}\|_2}{\|R_{22}\|_{1,2}} \, \frac{\| \tilde{R}^{−T} \|_{1,2}^{−1}}{σ_{l+1}(A)}    and    \hat{τ} := g_1 g_2 \frac{\|R_{22}\|_2}{\|R_{22}\|_{1,2}} \, \frac{\| \hat{R}_{11}^{−T} \|_{1,2}^{−1}}{σ_k(Σ_k)},

where \tilde{R} is defined in equation (2.5). Since \frac{1}{\sqrt{n}}\|X\|_{1,2} ≤ \|X\|_2 ≤ \sqrt{n}\,\|X\|_{1,2} and σ_i(X_1) ≤ σ_i(X), 1 ≤ i ≤ min(s, t), for any matrix X ∈ R^{m×n} and any submatrix X_1 ∈ R^{s×t} of X, we obtain that

    τ = g_1 g_2 \frac{\|R_{22}\|_2}{\|R_{22}\|_{1,2}} \, \frac{\| \tilde{R}^{−T} \|_{1,2}^{−1}}{σ_{l+1}(\tilde{R})} \, \frac{σ_{l+1}(\tilde{R})}{σ_{l+1}(A)} ≤ g_1 g_2 \sqrt{(l+1)(n−l)}.

Using the facts that σ_l([R_{11}  R_{12}]) = σ_l(\hat{R}_{11}) and σ_k(Σ_k) = σ_k([\hat{R}_{11}  \hat{R}_{12}]), which follow from (3.2) and (3.4),

    \hat{τ} = g_1 g_2 \frac{\|R_{22}\|_2}{\|R_{22}\|_{1,2}} \, \frac{\| \hat{R}_{11}^{−T} \|_{1,2}^{−1}}{σ_l(\hat{R}_{11})} \, \frac{σ_l(\hat{R}_{11})}{σ_l([\hat{R}_{11}  \hat{R}_{12}])} \, \frac{σ_l([\hat{R}_{11}  \hat{R}_{12}])}{σ_k(Σ_k)} ≤ g_1 g_2 \sqrt{l(n−l)}.

By the definition of τ,

(3.13)    \|R_{22}\|_2 = τ σ_{l+1}(A).

Plugging this into (3.6) yields (3.8). By the definition of \hat{τ}, we observe that

(3.14)    \|R_{22}\|_2 ≤ \hat{τ} σ_k(Σ_k).

By (3.5) and (3.14),

(3.15)    σ_j(Σ_k) ≥ \frac{σ_j(A)}{\sqrt[4]{1 + 2 \left( \frac{\|R_{22}\|_2}{σ_j(Σ_k)} \right)^4}} ≥ \frac{σ_j(A)}{\sqrt[4]{1 + 2 \left( \frac{\|R_{22}\|_2}{σ_k(Σ_k)} \right)^4}} ≥ \frac{σ_j(A)}{\sqrt[4]{1 + 2\hat{τ}^4}},    1 ≤ j ≤ k.

On the other hand, using (3.5),

    σ_j^4(A) ≤ σ_j^4(Σ_k) \left( 1 + 2 \, \frac{σ_j^4(A)}{σ_j^4(Σ_k)} \, \frac{\|R_{22}\|_2^4}{σ_j^4(A)} \right)
             ≤ σ_j^4(Σ_k) \left( 1 + 2 \, \frac{σ_j^4(Σ_k) + 2\|R_{22}\|_2^4}{σ_j^4(Σ_k)} \, \frac{\|R_{22}\|_2^4}{σ_j^4(A)} \right)
             ≤ σ_j^4(Σ_k) \left( 1 + 2 \left( 1 + 2 \frac{\|R_{22}\|_2^4}{σ_k^4(Σ_k)} \right) \frac{\|R_{22}\|_2^4}{σ_j^4(A)} \right),

that is,

    σ_j(Σ_k) ≥ \frac{σ_j(A)}{\sqrt[4]{1 + \left( 2 + 4 \frac{\|R_{22}\|_2^4}{σ_k^4(Σ_k)} \right) \frac{\|R_{22}\|_2^4}{σ_j^4(A)}}}.

Plugging (3.13) and (3.14) into the above equation yields

(3.16)    σ_j(Σ_k) ≥ \frac{σ_j(A)}{\sqrt[4]{1 + τ^4 (2 + 4\hat{τ}^4) \left( \frac{σ_{l+1}(A)}{σ_j(A)} \right)^4}},    1 ≤ j ≤ k.

Combining (3.15) with (3.16), we arrive at (3.7).

We note that (3.5) and (3.6) still hold true if we replace k by l.

Inequality (3.7) shows that, with the definitions (3.12) of τ and \hat{τ}, Flip-Flop SRQR can reveal at least a dimension-dependent fraction of all the leading singular values of A, and indeed approximate them very accurately in case they decay relatively quickly. Moreover, (3.8) shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of \sqrt[4]{1 + 2τ^4 \left( \frac{σ_{l+1}(A)}{σ_{k+1}(A)} \right)^4} from optimal. In situations where the singular values of A decay relatively quickly, our rank-k approximation is about as accurate as the truncated SVD with a choice of l such that σ_{l+1}(A)/σ_{k+1}(A) = o(1).

REMARK 3.3. The "high probability" mentioned in Theorem 3.2 depends on the oversampling size p, but not on the block size b. We decided not to dive deeply into the probability explanation, since it would make the proof much longer and more tedious. Theoretically, the oversampling size p required for a given high probability would be rather large. For example, to achieve a probability of 0.95, we would need to choose p to be at least 500. In practice, however, we would never choose such a large p, since it would deteriorate the code efficiency; it turns out that a small p, like 5 ∼ 10, achieves good approximation results empirically. A detailed explanation of what high probability means here can be found in [75]. The choice of the block size b does not affect the analysis or the error bounds; it is purely a matter of code performance, as only a suitable b can take full advantage of the cache hierarchy and BLAS routines.

4. Numerical experiments. In this section we demonstrate the effectiveness and efficiency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments. Firstly, we compare FFSRQR with other approximate SVD algorithms for a matrix approximation. Secondly, we compare FFSRQR with other methods on tensor approximation problems using the tensorlab toolbox [73]. Thirdly, we compare FFSRQR with other methods for the robust PCA problem and the matrix completion problem. All experiments are implemented on a laptop with a 2.9 GHz Intel Core i5 processor and 8 GB of RAM. The codes based on LAPACK [2] and PROPACK [41] in Section 4.1 are in Fortran. The codes in Sections 4.2 and 4.3 are in Matlab, except that FFSRQR is implemented in Fortran and wrapped by mex.

TABLE 4.1. Methods for approximate SVD.

  Method   Description
  LANSVD   Approximate SVD using Lanczos bidiagonalization with partial reorthogonalization [42].
           Its Fortran and Matlab implementations are from PROPACK [41].
  FFSRQR   Flip-Flop Spectrum-revealing QR factorization.
           Its Matlab implementation uses mex-wrapped BLAS and LAPACK routines [2].
  RSISVD   Approximate SVD with randomized subspace iteration [30, 27].
           Its Matlab implementation is from the tensorlab toolbox [73].
  TUXV     Algorithm 5 in the appendix.
  LTSVD    Linear Time SVD [16].
           We use its Matlab implementation by Ma et al. [49].

4.1. Approximate truncated SVD. In this section we compare FFSRQR with other approximate SVD algorithms for low-rank matrix approximation. All tested methods are listed in Table 4.1. Since LTSVD has only a Matlab implementation, this method is only considered in Sections 4.2 and 4.3. The test matrices are:
Type 1: A ∈ R^{m×n} [66] is defined by A = U D V^T + 0.1 · d_{ss} · E, where U ∈ R^{m×s}, V ∈ R^{n×s} are column-orthonormal matrices and D ∈ R^{s×s} is a diagonal matrix with s geometrically decreasing diagonal entries from 1 to 10^{−3}; d_{ss} is the last diagonal entry of D; E ∈ R^{m×n} is a random matrix whose entries are independently sampled from a normal distribution N(0, 1). In our numerical experiments we test three different random matrices: the square matrix has a size of 10000 × 10000, the short-fat matrix has a size of 1000 × 10000, and the tall-skinny matrix has a size of 10000 × 1000.
Type 2: A ∈ R^{4929×4929} is a real data matrix from the University of Florida sparse matrix collection [11]. Its corresponding file name is HB/GEMAT11.

For a given matrix A ∈ R^{m×n}, the relative SVD approximation error is measured by ‖A − U_k Σ_k V_k^T‖_F / ‖A‖_F, where Σ_k contains the approximate top k singular values and U_k, V_k are the corresponding approximate top k singular vectors. The parameters used in FFSRQR, RSISVD, TUXV, and LTSVD are listed in Table 4.2. The additional time required to compute g_2 is negligible. In our experiments, g_2 always remains modest and never triggers subsequent SRQR column swaps. However, computing g_2 is still recommended, since it serves as an insurance policy against potential quality degradation by TRQRCP. The block size b is chosen to be 32, in accordance with the xGEQP3 LAPACK routine. When k is less than 32, we use an unblocked version of the partial QRCP.

TABLE 4.2. Parameters used in RSISVD, FFSRQR, TUXV, and LTSVD.

  Method   Parameters
  FFSRQR   oversampling size p = 5, block size b = min{32, k}, l = k, d = 10, g = 2.0
  RSISVD   oversampling size p = 5, subspace iteration q = 1
  TUXV     oversampling size p = 5, block size b = min{32, k}, l = k
  LTSVD    probabilities p_i = 1/n

Figures 4.1-4.3 display the run time, the relative approximation error, and the top 20 singular values comparison, respectively, for four different matrices. In terms of speed, FFSRQR is always faster than LANSVD and RSISVD; moreover, the efficiency difference is more obvious when k is relatively large. FFSRQR is only slightly slower than TUXV. In terms of the relative approximation error, FFSRQR is comparable to the other three methods. In terms of the top singular value approximation, Figure 4.3 displays the top 20 singular values when k = 500 for the four different matrices. In Figure 4.3, FFSRQR is superior to TUXV, as we take the absolute values of the main diagonal entries of the upper triangular matrix R as the approximate singular values for TUXV.

[Figure 4.1 (four panels, run time in seconds versus target rank k, comparing FFSRQR, RSISVD, LANSVD, and TUXV): (a) Type 1, random square matrix; (b) Type 1, random short-fat matrix; (c) Type 1, random tall-skinny matrix; (d) Type 2, GEMAT11.]

FIG. 4.1. Run time comparison for approximate SVD algorithms.

4.2. Tensor approximation. This section illustrates the effectiveness and efficiency of FFSRQR for computing approximate tensors.


[Figure 4.2 (four panels, relative approximation error versus target rank k, comparing FFSRQR, RSISVD, LANSVD, and TUXV): (a) Type 1, random square matrix; (b) Type 1, random short-fat matrix; (c) Type 1, random tall-skinny matrix; (d) Type 2, GEMAT11.]

FIG. 4.2. Relative approximation error comparison for approximate SVD algorithms.

ST-HOSVD [3, 71] is one of the most efficient algorithms to compute a Tucker decomposition of tensors, and the most costly part of this algorithm is to compute an SVD or approximate SVD of the tensor unfoldings. The truncated SVD and RSISVD are used in the routines MLSVD and MLSVD_RSI, respectively, in the Matlab tensorlab toolbox [73]. Based on this Matlab toolbox, we implement ST-HOSVD using FFSRQR or LTSVD to do the SVD approximation. We name these two new routines MLSVD_FFSRQR and MLSVD_LTSVD, respectively, and compare all four routines in the numerical experiments. We also have Python codes for this tensor approximation experiment; we do not list the results of the Python programs here, as they are similar to those for Matlab.

4.2.1. A sparse tensor example. We test a sparse tensor X ∈ R^{n×n×n} of the following format [64, 59]:

    X = \sum_{j=1}^{10} \frac{1000}{j} \, x_j ∘ y_j ∘ z_j + \sum_{j=11}^{n} \frac{1}{j} \, x_j ∘ y_j ∘ z_j,

where x_j, y_j, z_j ∈ R^n are sparse vectors with nonnegative entries. The symbol "∘" represents the vector outer product. We compute a rank-(k, k, k) Tucker decomposition [G; U_1, U_2, U_3] using MLSVD, MLSVD_FFSRQR, MLSVD_RSI, and MLSVD_LTSVD, respectively. The relative approximation error is measured by ‖X − X_k‖_F / ‖X‖_F, where X_k = G ×_1 U_1 ×_2 U_2 ×_3 U_3.


[Figure 4.3 (four panels, top 20 singular values, comparing FFSRQR, RSISVD, LANSVD, and TUXV): (a) Type 1, random square matrix; (b) Type 1, random short-fat matrix; (c) Type 1, random tall-skinny matrix; (d) Type 2, GEMAT11.]

FIG. 4.3. Top 20 singular values comparison for approximate SVD algorithms.

Figure 4.4 displays a comparison of the efficiency and accuracy of the different methods for a 400 × 400 × 400 sparse tensor approximation problem. MLSVD_LTSVD is the fastest but the least accurate one. The other three methods have similar accuracy, while MLSVD_FFSRQR is faster when the target rank k is larger.

4.2.2. Handwritten digits classification. MNIST is a handwritten digits image data set created by Yann LeCun [43]. Every digit is represented by a 28 × 28 pixel image. Handwritten digits classification has the objective of training a classification model to classify new unlabeled images. A HOSVD algorithm was proposed by Savas and Eldén [60] to classify handwritten digits. To reduce the training time, a more efficient ST-HOSVD algorithm was introduced in [71].

We do handwritten digits classification using MNIST, which consists of 60,000 training images and 10,000 test images. The number of training images in each class is restricted to 5421, so that the training set is equally distributed over all classes. The training set is represented by a tensor X of size 784 × 5421 × 10. The classification relies on Algorithm 2 in [60]. We use various algorithms to obtain an approximation X ≈ G ×_1 U_1 ×_2 U_2 ×_3 U_3, where the core tensor G has size 65 × 142 × 10.


[Figure 4.4 (two panels versus target rank k, comparing MLSVD, MLSVD_FFSRQR, MLSVD_RSI, and MLSVD_LTSVD: relative approximation error, and run time in seconds).]

FIG. 4.4. Run time and relative approximation error comparison on a sparse tensor.

TABLE 4.3. Comparison on handwritten digits classification.

                            MLSVD     MLSVD_FFSRQR   MLSVD_RSI   MLSVD_LTSVD
  Training Time [sec]       272.121   15.455         19.343      0.4266
  Relative Model Error      0.4099    0.4273         0.4247      0.5162
  Classification Accuracy   95.19%    94.98%         95.05%      92.59%

The results are summarized in Table 4.3. In terms of run time, we observe that our method MLSVD_FFSRQR is comparable to MLSVD_RSI, while MLSVD is the most expensive one and MLSVD_LTSVD is the fastest one. In terms of classification quality, MLSVD, MLSVD_FFSRQR, and MLSVD_RSI are comparable, while MLSVD_LTSVD is the least accurate one.

4.3. Solving the nuclear norm minimization problem. To show the effectiveness of the FFSRQR algorithm for nuclear norm minimization problems, we investigate two scenarios: the robust PCA problem (2.7) and the matrix completion problem (2.8). The test matrix used for the robust PCA is introduced in [45], and the test matrices used for the matrix completion are two real data sets. We use the IALM method [45] to solve both problems; the IALM code can be downloaded.¹

4.3.1. Robust PCA. To solve the robust PCA problem, we replace the approximate SVD part in the IALM method [45] by the various methods. We denote the actual solution to the robust PCA problem by a matrix pair (X*, E*) ∈ R^{m×n} × R^{m×n}. The matrix X* is X* = X_L X_R^T, with X_L ∈ R^{m×k} and X_R ∈ R^{n×k} being random matrices whose entries are independently sampled from a normal distribution. The sparse matrix E* is a random matrix whose non-zero entries are independently sampled from a uniform distribution over the interval [−500, 500]. The input to the IALM algorithm has the form M = X* + E*, and the output is denoted by (\hat{X}, \hat{E}). In this numerical experiment we use the same parameter settings as the IALM code for robust PCA: the rank k is 0.1m, and the number of non-zero entries in E is 0.05m². We choose the trade-off parameter λ = 1/\sqrt{max(m, n)}, as suggested by Candès et al. [8]. The solution quality is measured by the normalized root mean square error ‖\hat{X} − X*‖_F / ‖X*‖_F.

¹http://perception.csl.illinois.edu/matrix-rank/sample_code.html


Table 4.4 reports the relative error, the run time, the number of non-zero entries in \hat{E} (‖\hat{E}‖_0), the iteration count, and the number of non-zero singular values (sv) in \hat{X} for the IALM algorithm using the different approximate SVD methods. We observe that IALM_FFSRQR is faster than all the other three methods, while its error is comparable to IALM_LANSVD and IALM_RSISVD. IALM_LTSVD is relatively slow and not effective.

TABLE 4.4. Comparison on robust PCA.

  Size         Method          Error      Time (sec)   ‖Ê‖_0      Iter   sv
  1000 × 1000  IALM_LANSVD     3.33e-07   5.79e+00     50000      22     100
               IALM_FFSRQR     2.79e-07   1.02e+00     50000      25     100
               IALM_RSISVD     3.36e-07   1.09e+00     49999      22     100
               IALM_LTSVD      9.92e-02   3.11e+00     999715     100    100
  2000 × 2000  IALM_LANSVD     2.61e-07   5.91e+01     199999     22     200
               IALM_FFSRQR     1.82e-07   6.93e+00     199998     25     200
               IALM_RSISVD     2.63e-07   7.38e+00     199996     22     200
               IALM_LTSVD      8.42e-02   2.20e+01     3998937    100    200
  4000 × 4000  IALM_LANSVD     1.38e-07   4.65e+02     799991     23     400
               IALM_FFSRQR     1.39e-07   4.43e+01     800006     26     400
               IALM_RSISVD     1.51e-07   5.04e+01     799990     23     400
               IALM_LTSVD      8.94e-02   1.54e+02     15996623   100    400
  6000 × 6000  IALM_LANSVD     1.30e-07   1.66e+03     1799982    23     600
               IALM_FFSRQR     1.02e-07   1.42e+02     1799993    26     600
               IALM_RSISVD     1.44e-07   1.62e+02     1799985    23     600
               IALM_LTSVD      8.58e-02   5.55e+02     35992605   100    600

4.3.2. Matrix completion. We solve the matrix completion problem for two real data sets used in [68]: the Jester joke data set [24] and the MovieLens data set [32]. The Jester joke data set consists of 4.1 million ratings for 100 jokes from 73,421 users and can be downloaded from the website http://goldberg.berkeley.edu/jester-data. We test with the following data matrices:
  • jester-1: data from 24,983 users who have rated 36 or more jokes;
  • jester-2: data from 23,500 users who have rated 36 or more jokes;
  • jester-3: data from 24,938 users who have rated between 15 and 35 jokes;
  • jester-all: the combination of jester-1, jester-2, and jester-3.
The MovieLens data set can be downloaded from https://grouplens.org/datasets/movielens. We test with the following data matrices:
  • movie-100K: 100,000 ratings of 943 users for 1682 movies;
  • movie-1M: 1 million ratings of 6040 users for 3900 movies;
  • movie-latest-small: 100,000 ratings of 700 users for 9000 movies.

For each data set, we let M be the original data matrix, where M_{ij} stands for the rating of joke (movie) j by user i, and let Γ be the set of indices where M_{ij} is known. The quality of a matrix completion algorithm is measured by the Normalized Mean Absolute Error (NMAE), defined by

    NMAE := \frac{\frac{1}{|Γ|} \sum_{(i,j) ∈ Γ} |M_{ij} − X_{ij}|}{r_{max} − r_{min}},


where X_{ij} is the prediction of the rating of joke (movie) j given by user i, and r_{min}, r_{max} are the lower and upper bounds of the ratings, respectively. For the Jester joke data sets we set r_{min} = −10 and r_{max} = 10. For the MovieLens data sets we set r_{min} = 1 and r_{max} = 5.
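For completeness, a small NumPy sketch (ours) of the NMAE evaluation:

```python
import numpy as np

def nmae(M, X, known, r_min, r_max):
    """Normalized Mean Absolute Error over the known index set `known = (rows, cols)`."""
    rows, cols = known
    return np.mean(np.abs(M[rows, cols] - X[rows, cols])) / (r_max - r_min)
```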

Since |Γ| is large, we randomly select a subset Ω from Γ and then use Algorithm 8 to solve problem (2.8). We randomly select 10 ratings for each user in the Jester joke data sets, while we randomly choose about 50% of the ratings for each user in the MovieLens data sets. Table 4.5 lists the parameter settings used in the algorithms. The maximum iteration number in IALM is 100, and all other parameters are the same as those used in [45].

The numerical results are included in Table 4.6. We observe that IALM_FFSRQR achieves almost the same recoverability as the other methods (except IALM_LTSVD) and is slightly faster than IALM_RSISVD for these two data sets.

TABLE 4.5. Parameters used in the IALM method on matrix completion.

  Data set             m       n      |Γ|        |Ω|
  jester-1             24983   100    1.81e+06   249830
  jester-2             23500   100    1.71e+06   235000
  jester-3             24938   100    6.17e+05   249384
  jester-all           73421   100    4.14e+06   734210
  movie-100K           943     1682   1.00e+05   49918
  movie-1M             6040    3706   1.00e+06   498742
  movie-latest-small   671     9066   1.00e+05   52551

5. Conclusions. We have presented the Flip-Flop SRQR factorization, a variant of the QLP factorization, for computing low-rank matrix approximations. The Flip-Flop SRQR algorithm uses an SRQR factorization to initialize the truncated version of a column-pivoted QR factorization and then forms an LQ factorization. For the numerical results presented, the errors of the proposed algorithm are comparable to those obtained from the other state-of-the-art algorithms. This new algorithm is cheaper to compute and produces quality low-rank matrix approximations. Furthermore, we prove singular value lower bounds and residual error upper bounds for the Flip-Flop SRQR factorization. In situations where the singular values of the input matrix decay relatively quickly, the low-rank approximation computed by Flip-Flop SRQR is guaranteed to be as accurate as the truncated SVD. We also perform a complexity analysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomized subspace iteration. Future work includes reducing the overhead cost in Flip-Flop SRQR and implementing the Flip-Flop SRQR algorithm on distributed memory machines for popular applications such as distributed PCA.

Acknowledgments. The authors would like to thank Yousef Saad, the editor, and three anonymous referees, who kindly reviewed an earlier version of this manuscript and provided valuable suggestions and comments.

Appendix A. Partial QR factorization with column pivoting. The details of the partial QRCP are covered in Algorithm 4.

Appendix B. TUXV algorithm. The details of the TUXV algorithm are presented in Algorithm 5.

Appendix C. Computing the Tucker decomposition of higher-order tensors. Algorithm 6 computes the Tucker decomposition of higher-order tensors, i.e., the sequentially truncated higher-order SVD.


TABLE 4.6. Comparison on matrix completion.

  Data set             Method         Iter   Time       NMAE       sv    σ_max      σ_min
  jester-1             IALM-LANSVD    12     7.06e+00   1.84e-01   100   2.14e+03   1.00e+00
                       IALM-FFSRQR    12     3.44e+00   1.69e-01   100   2.28e+03   1.00e+00
                       IALM-RSISVD    12     3.75e+00   1.89e-01   100   2.12e+03   1.00e+00
                       IALM-LTSVD     100    2.11e+01   1.74e-01   62    3.00e+03   1.00e+00
  jester-2             IALM-LANSVD    12     6.80e+00   1.85e-01   100   2.13e+03   1.00e+00
                       IALM-FFSRQR    12     2.79e+00   1.70e-01   100   2.29e+03   1.00e+00
                       IALM-RSISVD    12     3.59e+00   1.91e-01   100   2.12e+03   1.00e+00
                       IALM-LTSVD     100    2.03e+01   1.75e-01   58    2.96e+03   1.00e+00
  jester-3             IALM-LANSVD    12     7.05e+00   1.26e-01   99    1.79e+03   1.00e+00
                       IALM-FFSRQR    12     3.03e+00   1.22e-01   100   1.71e+03   1.00e+00
                       IALM-RSISVD    12     3.85e+00   1.31e-01   100   1.78e+03   1.00e+00
                       IALM-LTSVD     100    2.12e+01   1.33e-01   55    2.50e+03   1.00e+00
  jester-all           IALM-LANSVD    12     2.39e+01   1.72e-01   100   3.56e+03   1.00e+00
                       IALM-FFSRQR    12     1.12e+01   1.62e-01   100   3.63e+03   1.00e+00
                       IALM-RSISVD    12     1.34e+01   1.82e-01   100   3.47e+03   1.00e+00
                       IALM-LTSVD     100    6.99e+01   1.68e-01   52    4.92e+03   1.00e+00
  movie-100K           IALM-LANSVD    29     2.86e+01   1.83e-01   285   1.21e+03   1.00e+00
                       IALM-FFSRQR    30     4.55e+00   1.67e-01   295   1.53e+03   1.00e+00
                       IALM-RSISVD    29     4.82e+00   1.82e-01   285   1.29e+03   1.00e+00
                       IALM-LTSVD     48     1.42e+01   1.47e-01   475   1.91e+03   1.00e+00
  movie-1M             IALM-LANSVD    50     7.40e+02   1.58e-01   495   4.99e+03   1.00e+00
                       IALM-FFSRQR    53     2.07e+02   1.37e-01   525   6.63e+03   1.00e+00
                       IALM-RSISVD    50     2.23e+02   1.57e-01   495   5.35e+03   1.00e+00
                       IALM-LTSVD     100    8.50e+02   1.17e-01   995   8.97e+03   1.00e+00
  movie-latest-small   IALM-LANSVD    31     1.66e+02   1.85e-01   305   1.13e+03   1.00e+00
                       IALM-FFSRQR    31     1.96e+01   2.00e-01   305   1.42e+03   1.00e+00
                       IALM-RSISVD    31     2.85e+01   1.91e-01   305   1.20e+03   1.00e+00
                       IALM-LTSVD     63     4.02e+01   2.08e-01   298   1.79e+03   1.00e+00

Appendix D. IALM for solving two special nuclear norm minimization problems. Algorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems, respectively. In Algorithm 7, S_ω(x) = sign(x) · max(|x| − ω, 0) is the soft shrinkage operator [29], with x ∈ R^n and ω > 0. In Algorithm 8, \bar{Ω} is the complement of Ω.

Appendix E. Approximate SVD with randomized subspace iteration. The randomized subspace iteration was proposed in [30, Algorithm 4.4] to compute an orthonormal matrix whose range approximates the range of A. An approximate SVD can be computed using the aforementioned orthonormal matrix [30, Algorithm 5.1]. A randomized subspace iteration is used in the routine MLSVD_RSI in the Matlab toolbox tensorlab [73], and MLSVD_RSI is by far the most efficient function that we could find in Matlab to compute ST-HOSVD. We summarize the approximate SVD with randomized subspace iteration pseudocode in Algorithm 9.
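A minimal NumPy sketch of this procedure (our own illustration following [30], not the tensorlab routine), with the oversampling size p and iteration count q as in the analysis below:

```python
import numpy as np

def rsi_svd(A, k, p=5, q=1, seed=0):
    """Approximate SVD via randomized subspace iteration (cf. [30]): sketch, q power
    iterations with re-orthonormalization, then an SVD of the small matrix B = Q^T A."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k + p)))
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :].T
```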

Now we perform a complexity analysis of the approximate SVD with randomized subspace iteration. We first note that:
  1. The cost of generating a random matrix is negligible.
  2. The cost of computing B = AΩ is 2mn(k + p).
  3. In each QR step [Q, ∼] = qr(B, 0), the cost of computing the QR factorization of B is 2m(k + p)² − (2/3)(k + p)³ (cf. [69]), and the cost of forming the first k + p columns of the full Q matrix is m(k + p)² + (1/3)(k + p)³.


Algorithm 4: Partial QRCP
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k.
Outputs:
  Orthogonal matrix Q ∈ R^{m×m}.
  Upper trapezoidal matrix R ∈ R^{m×n}.
  Permutation matrix Π ∈ R^{n×n} such that AΠ = QR.
Algorithm:
  Initialize Π = I_n. Compute column norms r_s = ‖A(1:m, s)‖_2 (1 ≤ s ≤ n).
  for j = 1 : k do
    Find i = argmax_{j ≤ s ≤ n} r_s. Exchange r_j and r_i and the corresponding columns in A and Π.
    Form the Householder reflection Q_j from A(j:m, j).
    Update the trailing matrix A(j:m, j:n) ← Q_j^T A(j:m, j:n).
    Update r_s = ‖A(j+1:m, s)‖_2 (j+1 ≤ s ≤ n).
  end for
  Q = Q_1 Q_2 ··· Q_k is the product of all reflections; R = upper trapezoidal part of A.

Algorithm 5: TUXV Algorithm
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k. Block size b. Oversampling size p ≥ 0.
Outputs:
  Column orthonormal matrices U ∈ R^{m×k}, V ∈ R^{n×k}, and an upper triangular matrix R ∈ R^{k×k} such that A ≈ URV^T.
Algorithm:
  Do TRQRCP on A to obtain Q ∈ R^{m×k}, R ∈ R^{k×n}, and Π ∈ R^{n×n}.
  R = RΠ^T and do an LQ factorization, i.e., [V, R] = qr(R^T, 0).
  Compute Z = AV and do a QR factorization, i.e., [U, R] = qr(Z, 0).

Now we count the flops for each i in the for loop:
  1. The cost of computing B = A^T Q is 2mn(k + p).
  2. The cost of computing [Q, ∼] = qr(B, 0) is 2n(k + p)² − (2/3)(k + p)³, and the cost of forming the first k + p columns of the full Q matrix is n(k + p)² + (1/3)(k + p)³.
  3. The cost of computing B = AQ is 2mn(k + p).
  4. The cost of computing [Q, ∼] = qr(B, 0) is 2m(k + p)² − (2/3)(k + p)³, and the cost of forming the first k + p columns of the full Q matrix is m(k + p)² + (1/3)(k + p)³.

Together, the cost of running the for loop q times is

    q \left( 4mn(k + p) + 3(m + n)(k + p)^2 − \frac{2}{3}(k + p)^3 \right).

Additionally, the cost of computing B = Q^T A is 2mn(k + p), the cost of the SVD of B is O(n(k + p)²), and the cost of computing U = Q \tilde{U} is 2m(k + p)².


Algorithm 6: ST-HOSVD
Inputs: Tensor X ∈ R^{I_1×···×I_d}; desired rank (k_1, ..., k_d); processing order p = (p_1, ···, p_d).
Outputs: Tucker decomposition [G; U_1, ···, U_d].
Algorithm:
  Define tensor G ← X.
  for j = 1 : d do
    r = p_j.
    Compute an exact or approximate rank-k_r SVD of the tensor unfolding, G_(r) ≈ Ũ_r Σ̃_r Ṽ_r^T.
    U_r ← Ũ_r.
    Update G_(r) ← Σ̃_r Ṽ_r^T, i.e., apply Ũ_r^T to G.
  end for
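The following is a compact NumPy sketch of the ST-HOSVD loop above for a dense tensor, using an exact truncated SVD of each unfolding; the helper names (unfold, fold, st_hosvd) are ours, and the production codes referenced in the experiments use tensorlab routines instead.

import numpy as np

def unfold(G, mode):
    # Mode-n unfolding: move axis `mode` to the front and flatten the rest.
    return np.moveaxis(G, mode, 0).reshape(G.shape[mode], -1)

def fold(M, mode, shape):
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full), 0, mode)

def st_hosvd(X, ranks, order=None):
    """Sequentially truncated HOSVD: returns the core G and factors U_1..U_d."""
    d = X.ndim
    order = range(d) if order is None else order
    G = X.copy()
    U = [None] * d
    for r in order:
        Gr = unfold(G, r)
        Ur, s, Vt = np.linalg.svd(Gr, full_matrices=False)
        U[r] = Ur[:, :ranks[r]]
        # Apply U_r^T to G, i.e., replace G_(r) by the truncated Sigma_r * V_r^T.
        new_shape = list(G.shape); new_shape[r] = ranks[r]
        G = fold(np.diag(s[:ranks[r]]) @ Vt[:ranks[r], :], r, new_shape)
    return G, U

X = np.random.rand(20, 30, 40)
G, U = st_hosvd(X, ranks=(5, 6, 7))
print(G.shape)   # (5, 6, 7)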

Algorithm 7: Robust PCA Using IALM
Inputs: Measured matrix M ∈ R^{m×n}; positive numbers λ, μ_0, μ̄; tolerance tol; ρ > 1.
Outputs: Matrix pair (X_k, E_k).
Algorithm:
  k = 0; J(M) = max(‖M‖_2, λ^{-1}‖M‖_F); Y_0 = M / J(M); E_0 = 0.
  while not converged do
    (U, Σ, V) = svd(M − E_k + μ_k^{-1} Y_k).
    X_{k+1} = U S_{μ_k^{-1}}(Σ) V^T.
    E_{k+1} = S_{λ μ_k^{-1}}(M − X_{k+1} + μ_k^{-1} Y_k).
    Y_{k+1} = Y_k + μ_k (M − X_{k+1} − E_{k+1}).
    Update μ_{k+1} = min(ρ μ_k, μ̄); k = k + 1.
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then
      Break
    end if
  end while

Now assume k + p ≪ min(m, n) and omit the lower-order terms; then we arrive at (4q + 4)mn(k + p) as the complexity of the approximate SVD with randomized subspace iteration. In practice, q is usually chosen to be an integer between 0 and 2.

REFERENCES

[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106

[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Users' Guide 3rd ed SIAM Philadelphia 1999

[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103

[4] M W BERRY S T DUMAIS AND G W O'BRIEN Using linear algebra for intelligent information retrieval SIAM Rev 37 (1995) pp 573–595

[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci


Algorithm 8: Matrix Completion Using IALM
Inputs: Sampled set Ω; sampled entries π_Ω(M); positive numbers λ, μ_0, μ̄; tolerance tol; ρ > 1.
Outputs: Matrix pair (X_k, E_k).
Algorithm:
  k = 0; Y_0 = 0; E_0 = 0.
  while not converged do
    (U, Σ, V) = svd(M − E_k + μ_k^{-1} Y_k).
    X_{k+1} = U S_{μ_k^{-1}}(Σ) V^T.
    E_{k+1} = π_Ω̄(M − X_{k+1} + μ_k^{-1} Y_k).
    Y_{k+1} = Y_k + μ_k (M − X_{k+1} − E_{k+1}).
    Update μ_{k+1} = min(ρ μ_k, μ̄); k = k + 1.
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then
      Break
    end if
  end while

Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-

holder transformations Numer Math 7 (1965) pp 269–276[7] J-F CAI E J CANDÈS AND Z SHEN A singular value thresholding algorithm for matrix completion

SIAM J Optim 20 (2010) pp 1956–1982[8] E J CANDÈS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)

Art 11 37 pages[9] E J CANDÈS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9

(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an

N-way generalization of "Eckart-Young" decomposition Psychometrika 35 (1970) pp 283–319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38

(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in

Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16

[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278

[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342

[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55

[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183

[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291

[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218

[19] L ELDÉN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to

Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162

[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977

[22] C FRANK AND E NÖTH Optimizing eigenfaces by face masks for facial expression recognition in Computer Analysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes in Comput Sci Springer Berlin 2003 pp 646–654


Algorithm 9: Approximate SVD with Randomized Subspace Iteration
Inputs: Matrix A ∈ R^{m×n}; target rank k; oversampling size p ≥ 0; number of iterations q ≥ 1.
Outputs: U ∈ R^{m×k} contains the approximate top k left singular vectors of A; Σ ∈ R^{k×k} contains the approximate top k singular values of A; V ∈ R^{n×k} contains the approximate top k right singular vectors of A.
Algorithm:
  Generate an i.i.d. Gaussian matrix Ω ∈ N(0, 1)^{n×(k+p)}.
  Compute B = AΩ and [Q, ∼] = qr(B, 0).
  for i = 1 : q do
    B = A^T * Q; [Q, ∼] = qr(B, 0).
    B = A * Q;  [Q, ∼] = qr(B, 0).
  end for
  B = Q^T * A; [U, Σ, V] = svd(B).
  U = Q * U; U = U(:, 1:k); Σ = Σ(1:k, 1:k); V = V(:, 1:k).

[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480

[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151

[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press

Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)

pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization

SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and

convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic

algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217–288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an "explanatory"

multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing

collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237

[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewart's QLP algorithm for approximating the

SVD Numer Algorithms 32 (2003) pp 287–316[35] Stewart's pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)

pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in

Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206


[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash

500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at

http://sun.stanford.edu/~rmunk/PROPACK/[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of

Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document

recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale

Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted

low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic

applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and

outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to

system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-

tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra

PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite

linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-

structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE

Trans Syst Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347–364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding

for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493

[55] G QUINTANA-ORTÍ X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with column pivoting SIAM J Sci Comput 19 (1998) pp 1486–1494

[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501

[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719

[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124

[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249

[60] B SAVAS AND L ELDÉN Handwritten digit classification using higher order singular value decomposition Pattern Recognition 40 (2007) pp 993–1003

[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57

[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799

[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388

[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482

[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336

[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348


[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603

[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640

[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash

311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-

order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in

European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460

[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guide available at http://www.tensorlab.net

[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298

[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242

[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550



singular values of A with "considerable fidelity" no matter what the matrix A is. The pivoted QLP decomposition is extensively analyzed by Huckaby and Chan [34, 35]. More recently, Duersch and Gu developed a much more efficient variant of the pivoted QLP decomposition, TUXV, and demonstrated its remarkable quality as a low-rank approximation empirically, without a rigorous theoretical justification of TUXV's success [17].

In this paper, we present Flip-Flop SRQR, a slightly different variant of the TUXV algorithm of Duersch and Gu [17]. Like TUXV, Flip-Flop SRQR spends most of its effort on computing a partial QR factorization using a truncated randomized QRCP (TRQRCP) and a partial LQ factorization. Unlike TUXV, however, Flip-Flop SRQR also performs additional computations to ensure a spectrum-revealing QR factorization (SRQR) [75] before the partial LQ factorization.

We demonstrate the reliable theoretical quality of this variant as a low-rank approximation and its high competitiveness with state-of-the-art low-rank approximation methods in real-world applications, in both low-rank tensor compression [13, 40, 60, 71] and nuclear norm minimization [7, 45, 49, 54, 68].

The rest of this paper is organized as follows. In Section 2, we introduce the TRQRCP algorithm, the SRQR factorization, low-rank tensor compression, and nuclear norm minimization. In Section 3, we introduce Flip-Flop SRQR and analyze its computational costs and low-rank approximation properties. In Section 4, we present numerical results comparing Flip-Flop SRQR with state-of-the-art low-rank approximation methods. In Section 5, we summarize our main results and draw some conclusions.

2. Preliminaries and background.

2.1. Partial QRCP. The QR factorization of a matrix A ∈ R^{m×n} is A = QR with an orthogonal matrix Q ∈ R^{m×m} and an upper trapezoidal matrix R ∈ R^{m×n}, which can be computed by the LAPACK [2] routine xGEQRF, where x stands for the matrix data type. The standard QR factorization is not suitable for some practical situations where either the matrix A is rank deficient or only representative columns of A are of interest. Usually the QRCP is adequate for the aforementioned situations, except for a few rare examples such as the Kahan matrix [26]. Given a matrix A ∈ R^{m×n}, the QRCP of the matrix A has the form

    AΠ = QR,

where Π ∈ R^{n×n} is a permutation matrix, Q ∈ R^{m×m} is an orthogonal matrix, and R ∈ R^{m×n} is an upper trapezoidal matrix. QRCP can be computed by the LAPACK [2] routines xGEQPF and xGEQP3, where xGEQP3 is a more efficient blocked implementation of xGEQPF. For a given target rank k (1 ≤ k ≤ min(m, n)), the partial QRCP factorization (Algorithm 4 in the appendix) has a 2 × 2 block form

(2.1)    AΠ = Q [ R11  R12 ;  0  R22 ] = [ Q1  Q2 ] [ R11  R12 ;  0  R22 ],

where R11 ∈ R^{k×k} is upper triangular. The partial QRCP computes an approximate column subspace of A spanned by the leading k columns in AΠ, up to the error term in R22. Equivalently, (2.1) yields a low-rank approximation

(2.2)    A ≈ Q1 [ R11  R12 ] Π^T,

with an approximation quality closely related to the error term in R22.

The Randomized QRCP (RQRCP) algorithm [17, 75] is a more efficient variant of Algorithm 4. RQRCP generates a Gaussian random matrix Ω ∈ N(0, 1)^{(b+p)×m} with b + p ≪ m, where the entries of Ω are independently sampled from a normal distribution,


Algorithm 1: Truncated Randomized QRCP (TRQRCP)
Inputs: Matrix A ∈ R^{m×n}; target rank k; block size b; oversampling size p ≥ 0.
Outputs: Orthogonal matrix Q ∈ R^{m×m}; upper trapezoidal matrix R ∈ R^{k×n}; permutation matrix Π ∈ R^{n×n} such that AΠ ≈ Q(:, 1:k) R.
Algorithm:
  Generate an i.i.d. Gaussian random matrix Ω ∈ N(0, 1)^{(b+p)×m}.
  Form the initial sample matrix B = ΩA and initialize Π = I_n.
  for j = 1 : b : k do
    b̂ = min(k − j + 1, b).
    Do partial QRCP on B(:, j:n) to obtain b̂ pivots.
    Exchange the corresponding columns in A, B, Π, and W^T.
    Do QR on A(j:m, j:j+b̂−1) using the WY formula without updating the trailing matrix.
    Update B(:, j+b̂:n) using the formula (2.4).
  end for
  Q = Q_1 Q_2 ··· Q_⌈k/b⌉; R = upper trapezoidal part of the submatrix A(1:k, 1:n).

to compress A into B = ΩA with a much smaller row dimension. In practice, b is the block size and p is the oversampling size. RQRCP repeatedly runs partial QRCP on B to obtain b column pivots, applies them to the matrix A, and then computes QR without pivoting (QRNP) on A and updates the remaining columns of B. RQRCP exits this process when it reaches the target rank k. QRCP and RQRCP choose pivots on A and B, respectively. RQRCP is significantly faster than QRCP, as B has a much smaller row dimension than A. It is shown in [75] that RQRCP is as reliable as QRCP up to failure probabilities that decay exponentially with respect to the oversampling size p.
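As an illustration of the factorizations (2.1)-(2.2) (not of the blocked RQRCP/TRQRCP codes themselves), one can form a rank-k approximation from SciPy's column-pivoted QR and verify that its error is exactly the trailing block norm; this sketch and its variable names are ours.

import numpy as np
from scipy.linalg import qr

def qrcp_lowrank(A, k):
    # Column-pivoted QR: A[:, piv] = Q @ R.
    Q, R, piv = qr(A, mode='economic', pivoting=True)
    # Rank-k approximation A ≈ Q1 [R11 R12] Pi^T, cf. (2.2).
    Ak = Q[:, :k] @ R[:k, :]
    Ak_unpermuted = np.empty_like(Ak)
    Ak_unpermuted[:, piv] = Ak          # undo the column permutation
    return Ak_unpermuted, R

A = np.random.randn(200, 120) @ np.random.randn(120, 150)   # numerically low-rank
Ak, R = qrcp_lowrank(A, k=60)
# The two printed values agree: ||A - A_k||_2 = ||R22||_2.
print(np.linalg.norm(A - Ak, 2), np.linalg.norm(R[60:, 60:], 2))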

Since the trailing matrix of A is usually not required for low-rank matrix approximations (see (2.2)), the TRQRCP (truncated RQRCP) algorithm of [17] re-organizes the computations in RQRCP to directly compute the approximation (2.2) without explicitly computing the trailing matrix R22. For more details, both RQRCP and TRQRCP are based on the WY representation of the Householder transformations [5, 55, 61]:

    Q = Q1 Q2 ··· Qk = I − Y T Y^T,

where T ∈ R^{k×k} is an upper triangular matrix and Y ∈ R^{m×k} is a trapezoidal matrix consisting of k consecutive Householder vectors. Let W^T := T^T Y^T A; then the trailing matrix update formula becomes Q^T A = A − Y W^T. The main difference between RQRCP and TRQRCP is that RQRCP computes the whole trailing matrix update, while TRQRCP only computes the part of the update that affects the approximation (2.2). More discussions about RQRCP and TRQRCP can be found in [17]. The main steps of TRQRCP are briefly described in Algorithm 1. The deduction of the updating formula for B is as follows: consider partial QRs on AΠ and BΠ = ΩAΠ,

(2.3)    AΠ = Q [ R11  R12 ;  0  R22 ],    BΠ = Q̄ [ R̄11  R̄12 ;  0  R̄22 ].


Partition Ω̄ := Q̄^T Ω Q := [ Ω̄1  Ω̄2 ]; therefore (2.3) implies

    Ω̄ [ R11  R12 ;  0  R22 ] = [ R̄11  R̄12 ;  0  R̄22 ],

which leads to the updating formula for B in Algorithm 1:

(2.4)    [ R̄12 ;  R̄22 ] ← Ω̄2 R22 = [ R̄12 − R̄11 R11^{-1} R12 ;  R̄22 ].

With TRQRCP, the TUXV algorithm [17, Algorithm 7] (Algorithm 5 in the appendix) computes a low-rank approximation with the QLP factorization at a greatly accelerated speed, by computing a partial QR factorization with column pivoting followed by a partial LQ factorization.

2.2. Spectrum-revealing QR factorization. Although both RQRCP and TRQRCP are very effective practical tools for low-rank matrix approximations, they are not known to provide reliable low-rank matrix approximations due to their underlying greediness in the column norm-based pivoting strategy. To solve this potential problem of column-based QR factorization, Gu and Eisenstat [28] proposed an efficient way to perform additional column interchanges to enhance the quality of the leading k columns in AΠ as a basis for the approximate column subspace. More recently, a more efficient and effective method, SRQR, was introduced and analyzed in [50, 75] to compute the low-rank approximation (2.2). The concept of spectrum-revealing, introduced in [50, 74], emphasizes the utilization of the partial QR factorization (2.2) as a low-rank matrix approximation, as opposed to the more traditional rank-revealing factorization, which emphasizes the utility of the partial QR factorization (2.1) as a tool for numerical rank determination. The SRQR algorithm is described in Algorithm 2. The SRQR algorithm will always run to completion with a high-quality low-rank matrix approximation (2.2). For real data matrices, which usually have a fast decaying singular-value spectrum, this approximation is often as good as the truncated SVD. The SRQR algorithm of [75] explicitly updates the partial QR factorization (2.1) while swapping columns, but the SRQR algorithm can actually avoid any explicit computations on the trailing matrix R22 by using TRQRCP instead of RQRCP to obtain exactly the same partial QR initialization. Below we outline the SRQR algorithm.

In (2.1), let

(2.5)    R̂ := [ R11  a ;  0  α ]

be the leading (l + 1) × (l + 1) (l ≥ k) submatrix of R. We define

(2.6)    g1 := ‖R22‖_{1,2} / |α|    and    g2 := |α| ‖R̂^{-T}‖_{1,2},

where ‖X‖_{1,2} is the largest column 2-norm of X for any given X. In [75], the authors proved approximation quality bounds involving g1, g2 for the low-rank approximation computed by RQRCP or TRQRCP: RQRCP or TRQRCP will provide a good low-rank matrix approximation if g1 and g2 are O(1). The authors also proved that

    g1 ≤ √((1+ε)/(1−ε))    and    g2 ≤ √(2(1+ε)/(1−ε)) (1 + √((1+ε)/(1−ε)))^{l−1}

hold with high probability for both RQRCP and TRQRCP, where 0 < ε < 1 is a user-defined parameter which guides the choice of the oversampling size


p. For reasonable choices of ε, like ε = 1/2, g1 is a small constant, while g2 can potentially be an extremely large number, which can lead to a poor low-rank approximation quality. To avoid a potentially large g2, the SRQR algorithm (Algorithm 2) proposed in [75] uses a pair-wise swapping strategy to guarantee that g2 is below some user-defined tolerance g > 1, which is usually chosen to be a small number larger than one, like 2.0. In Algorithm 2, we use an approximate formula instead of the definition (2.6) directly to estimate g2, i.e., g2 ≈ (|α|/√d) ‖Ω R̂^{-T}‖_{1,2}, based on the Johnson-Lindenstrauss Lemma [36].

Algorithm 2: Spectrum-Revealing QR Factorization (SRQR)
Inputs: Matrix A ∈ R^{m×n}; target rank k; block size b; oversampling size p ≥ 0; integer l ≥ k; tolerance g > 1 for g2.
Outputs: Orthogonal matrix Q ∈ R^{m×m} formed by the first k reflectors; upper trapezoidal matrix R ∈ R^{k×n}; permutation matrix Π ∈ R^{n×n} such that AΠ ≈ Q(:, 1:k) R.
Algorithm:
  Compute Q, R, Π with RQRCP or TRQRCP to l steps.
  Compute the squared 2-norms of the columns of B(:, l+1:n): r_i (l+1 ≤ i ≤ n), where B is the updated random projection of A computed by RQRCP or TRQRCP.
  Approximate the squared 2-norms of the columns of A(l+1:m, l+1:n): r_i = r_i / (b + p) (l+1 ≤ i ≤ n).
  ı = arg max_{l+1 ≤ i ≤ n} {r_i}. Swap the ı-th and (l+1)-st columns of A, Π, r.
  One-step QR factorization of A(l+1:m, l+1:n).
  |α| = R_{l+1,l+1}. r_i = r_i − A(l+1, i)^2 (l+2 ≤ i ≤ n).
  Generate a random matrix Ω ∈ N(0, 1)^{d×(l+1)} (d ≪ l). Compute g2 = |α| ‖R̂^{-T}‖_{1,2} ≈ (|α|/√d) ‖Ω R̂^{-T}‖_{1,2}.
  while g2 > g do
    ı = arg max_{1 ≤ i ≤ l+1} {i-th column norm of Ω R̂^{-T}}. Swap the ı-th and (l+1)-st columns of A and Π in a Round Robin rotation.
    Givens-rotate R back into upper-trapezoidal form.
    r_{l+1} = R_{l+1,l+1}^2; r_i = r_i + A(l+1, i)^2 (l+2 ≤ i ≤ n).
    ı = arg max_{l+1 ≤ i ≤ n} {r_i}. Swap the ı-th and (l+1)-st columns of A, Π, r.
    One-step QR factorization of A(l+1:m, l+1:n).
    |α| = R_{l+1,l+1}. r_i = r_i − A(l+1, i)^2 (l+2 ≤ i ≤ n).
    Generate a random matrix Ω ∈ N(0, 1)^{d×(l+1)} (d ≪ l). Compute g2 = |α| ‖R̂^{-T}‖_{1,2} ≈ (|α|/√d) ‖Ω R̂^{-T}‖_{1,2}.
  end while

2.3. Tensor approximation. In this section, we review some basic notation and concepts involving tensors. A more detailed discussion of the properties and applications of tensors can


be found in the review [40]. A tensor is a d-dimensional array of numbers, denoted by script notation X ∈ R^{I_1×···×I_d} with entries given by

    x_{j_1,...,j_d},    1 ≤ j_1 ≤ I_1, ..., 1 ≤ j_d ≤ I_d.

We use the matrix X_(n) ∈ R^{I_n×(Π_{j≠n} I_j)} to denote the n-th mode unfolding of the tensor X. Since this tensor has d dimensions, there are altogether d possibilities for unfolding. The n-mode product of a tensor X ∈ R^{I_1×···×I_d} with a matrix U ∈ R^{k×I_n} results in a tensor Y ∈ R^{I_1×···×I_{n−1}×k×I_{n+1}×···×I_d} such that

    y_{j_1,...,j_{n−1},j,j_{n+1},...,j_d} = (X ×_n U)_{j_1,...,j_{n−1},j,j_{n+1},...,j_d} = Σ_{j_n=1}^{I_n} x_{j_1,...,j_d} u_{j,j_n}.

Alternatively, it can be expressed conveniently in terms of unfolded tensors:

    Y = X ×_n U  ⟺  Y_(n) = U X_(n).
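A small NumPy sketch (ours) of the mode-n unfolding and the n-mode product, demonstrating the unfolded identity above:

import numpy as np

def unfold(X, n):
    # X_(n): mode-n unfolding of size I_n x (prod_{j != n} I_j).
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, U, n):
    # Y = X x_n U, computed via the unfolded identity Y_(n) = U * X_(n).
    rest = [s for i, s in enumerate(X.shape) if i != n]
    Yn = U @ unfold(X, n)
    return np.moveaxis(Yn.reshape([U.shape[0]] + rest), 0, n)

X = np.random.rand(4, 5, 6)
U = np.random.rand(3, 5)
Y = mode_n_product(X, U, 1)
print(Y.shape)                                       # (4, 3, 6)
print(np.allclose(unfold(Y, 1), U @ unfold(X, 1)))   # the unfolded identity holds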

Decompositions of higher-order tensors have applications in signal processing [12, 15, 63], numerical linear algebra [13, 39, 76], computer vision [62, 72, 67], etc. Two particular tensor decompositions can be considered as higher-order extensions of the matrix SVD: CANDECOMP/PARAFAC (CP) [10, 31] decomposes a tensor as a sum of rank-one tensors, and the Tucker decomposition [70] is a higher-order form of the principal component analysis. Knowing the definitions of mode products and unfoldings of tensors, we can introduce the sequentially truncated higher-order SVD (ST-HOSVD) algorithm for producing a rank-(k1, ..., kd) approximation to the tensor based on the Tucker decomposition format. The ST-HOSVD algorithm [3, 71] (Algorithm 6 in the appendix) returns a core tensor G ∈ R^{k1×···×kd} and a set of unitary matrices Uj ∈ R^{Ij×kj} for j = 1, ..., d such that

    X ≈ G ×_1 U1 ··· ×_d Ud,

where the right-hand side is called a Tucker decomposition. However, a straightforward generalization to higher-order (d ≥ 3) tensors of the matrix Eckart-Young-Mirsky Theorem is not possible [14].

In ST-HOSVD, one key step is to compute an exact or approximate rank-k_r SVD of the tensor unfolding. Well-known efficient ways to compute an exact low-rank SVD include Krylov subspace methods [44]. There are also efficient randomized algorithms to find an approximate low-rank SVD [30]. In the Matlab tensorlab toolbox [73], the most efficient method, MLSVD_RSI, is essentially ST-HOSVD with a randomized subspace iteration to find the approximate SVD of the tensor unfolding.

2.4. Nuclear norm minimization. The matrix rank minimization problem appears ubiquitously in many fields such as Euclidean embedding [20, 46], control [21, 51, 56], collaborative filtering [9, 57, 65], system identification [47, 48], etc. The regularized nuclear norm minimization problem is the following:

    min_{X ∈ R^{m×n}}  f(X) + τ‖X‖_*,

where τ > 0 is a regularization parameter and ‖X‖_* := Σ_{i=1}^{q} σ_i(X). The choice of the function f(·) is situational: f(X) = ‖M − X‖_1 in robust principal component analysis (robust PCA) [8], and f(X) = ‖π_Ω(M) − π_Ω(X)‖_F^2 in matrix completion [7].


Many researchers have devoted themselves to solving the above nuclear norm minimization problem, and plenty of algorithms have been proposed, including singular value thresholding (SVT) [7], fixed point continuation (FPC) [49], accelerated proximal gradient (APG) [68], and augmented Lagrange multiplier (ALM) [45] methods. The most expensive part of these algorithms lies in the computation of the truncated SVD. The inexact augmented Lagrange multiplier (IALM) method [45] has been proved to be one of the most accurate and efficient among them. We now describe IALM for the robust PCA and matrix completion problems.

The robust PCA problem can be formalized as a minimization problem of a sum of the nuclear norm and a scaled matrix l1-norm (sum of matrix entries in absolute value):

(2.7)    min ‖X‖_* + λ‖E‖_1    subject to    M = X + E,

where M is a measured matrix, X has low rank, E is an error matrix and sufficiently sparse, and λ is a positive weighting parameter. Algorithm 7 in the appendix describes the details of the IALM method to solve the robust PCA problem [45].

The matrix completion problem [9, 45] can be written in the form

(2.8)    min_{X ∈ R^{m×n}} ‖X‖_*    subject to    X + E = M,    π_Ω(E) = 0,

where π_Ω : R^{m×n} → R^{m×n} is an orthogonal projection that keeps the entries in Ω unchanged and sets those outside Ω to zero. In [45], the authors applied the IALM method to the matrix completion problem (Algorithm 8 in the appendix).
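The SVD-plus-soft-thresholding step that dominates the cost of these solvers (the X-update in Algorithms 7 and 8) can be written in a few lines of NumPy. The sketch below (ours) shows that single step only, not the full IALM iteration; mu_k is an illustrative value.

import numpy as np

def svt(M, tau):
    # Singular value thresholding: argmin_X 0.5*||X - M||_F^2 + tau*||X||_*.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

# One X-update of the form X_{k+1} = U S_{1/mu_k}(Sigma) V^T as in Algorithms 7-8.
M = np.random.randn(50, 40)
mu_k = 0.5
X_next = svt(M, tau=1.0 / mu_k)
print(np.linalg.matrix_rank(X_next) <= np.linalg.matrix_rank(M))   # thresholding never raises the rank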

3. Flip-Flop SRQR factorization.

3.1. Flip-Flop SRQR factorization. In this section, we introduce our Flip-Flop SRQR factorization, a slightly different variant of TUXV (Algorithm 5 in the appendix), to compute an SVD approximation based on the QLP factorization. Given an integer l ≥ k, we run SRQR (the version without computing the trailing matrix) with l steps on A:

(3.1)    AΠ = QR = Q [ R11  R12 ;  0  R22 ],

where R11 ∈ R^{l×l} is upper triangular, R12 ∈ R^{l×(n−l)}, and R22 ∈ R^{(m−l)×(n−l)}. Then we run partial QRNP with l steps on R^T:

(3.2)    R^T = [ R11^T  0 ;  R12^T  R22^T ] = Q̂ [ R̂11  R̂12 ;  0  R̂22 ] ≈ Q̂1 [ R̂11  R̂12 ],

where Q̂ = [ Q̂1  Q̂2 ] with Q̂1 ∈ R^{n×l}. Therefore, combining with the fact that AΠQ̂1 = Q [ R̂11  R̂12 ]^T, we can approximate the matrix A by

(3.3)    A = QRΠ^T = Q (R^T)^T Π^T ≈ Q [ R̂11^T ;  R̂12^T ] Q̂1^T Π^T = A (ΠQ̂1)(ΠQ̂1)^T.

We denote the rank-k truncated SVD of AΠQ̂1 by Ũk Σk Ṽk^T. Let Uk = Ũk and Vk = ΠQ̂1 Ṽk; then, using (3.3), a rank-k approximate SVD of A is obtained:

(3.4)    A ≈ Uk Σk Vk^T,

where Uk ∈ R^{m×k}, Vk ∈ R^{n×k} are column orthonormal and Σk = Diag(σ̃1, ···, σ̃k), with the σ̃i's being the leading k singular values of AΠQ̂1. The Flip-Flop SRQR factorization is outlined in Algorithm 3.


Algorithm 3: Flip-Flop Spectrum-Revealing QR Factorization
Inputs: Matrix A ∈ R^{m×n}; target rank k; block size b; oversampling size p ≥ 0; integer l ≥ k; tolerance g > 1 for g2.
Outputs: U ∈ R^{m×k} contains the approximate top k left singular vectors of A; Σ ∈ R^{k×k} contains the approximate top k singular values of A; V ∈ R^{n×k} contains the approximate top k right singular vectors of A.
Algorithm:
  Run SRQR on A to l steps to obtain (R11, R12).
  Run QRNP on (R11, R12)^T to obtain Q̂1, represented by a sequence of Householder vectors.
  Â = AΠQ̂1.
  [U, Σ, V] = svd(Â).
  U = U(:, 1:k); Σ = Σ(1:k, 1:k); V = ΠQ̂1 V(:, 1:k).
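The following is a simplified NumPy sketch (ours) of the flip-flop idea behind Algorithm 3. It substitutes an ordinary column-pivoted QR for the SRQR/TRQRCP initialization, so it omits the spectrum-revealing swaps and the blocked randomized initialization and carries none of their guarantees; it only illustrates the QR-then-LQ-then-small-SVD structure.

import numpy as np
from scipy.linalg import qr

def flip_flop_approx_svd(A, k, l=None):
    l = k if l is None else l
    # Step 1: l steps of column-pivoted QR (stand-in for SRQR), A*Pi = Q*R.
    Q, R, piv = qr(A, mode='economic', pivoting=True)
    R1 = R[:l, :]                                   # [R11 R12], l x n
    # Step 2: QR (no pivoting) of [R11 R12]^T gives Qhat1 with l columns.
    Qhat1, _ = np.linalg.qr(R1.T)                   # n x l, orthonormal columns
    # Step 3: SVD of the small matrix A*Pi*Qhat1, then map back.
    APi = A[:, piv]
    B = APi @ Qhat1                                 # m x l
    U, s, Wt = np.linalg.svd(B, full_matrices=False)
    V = np.empty((A.shape[1], l))
    V[piv, :] = Qhat1 @ Wt.T                        # V = Pi * Qhat1 * W
    return U[:, :k], s[:k], V[:, :k]

A = np.random.randn(500, 300)
U, s, V = flip_flop_approx_svd(A, k=20, l=25)
print(np.linalg.norm(A - U @ np.diag(s) @ V.T, 2), s[:3])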

3.2. Complexity analysis. In this section, we perform a complexity analysis of Flip-Flop SRQR. Since the approximate SVD only makes sense when the target rank k is small, we assume that k ≤ l ≪ min(m, n). The complexity analysis of Flip-Flop SRQR looks as follows:
1. The main cost of doing SRQR with TRQRCP on A is 2mnl + 2(b + p)mn + (m + n)l^2.
2. The main cost of the QR factorization for [ R11  R12 ]^T and forming Q̂1 is 2nl^2 − (2/3)l^3.
3. The main cost of computing Â = AΠQ̂1 is 2mnl.
4. The main cost of computing [U, ∼, ∼] = svd(Â) is O(ml^2).
5. The main cost of forming Vk is 2nlk.
Since k ≤ l ≪ min(m, n), the complexity of Flip-Flop SRQR is 4mnl + 2(b + p)mn by omitting the lower-order terms.

On the other hand, the complexity of the approximate SVD with randomized subspace iteration (RSISVD) [27, 30] is (4 + 4q)mn(k + p), where p is the oversampling size and q is the number of subspace iterations (see the detailed analysis in the appendix). In practice, p is chosen to be a small integer, for instance 5, in RSISVD. In Flip-Flop SRQR, l is usually chosen to be a little bit larger than k, e.g., l = k + 5. We also found that l = k is sufficient in terms of the approximation quality in our numerical experiments. Therefore, we observe that Flip-Flop SRQR is more efficient than RSISVD for any q > 0.

Since a practical data-set matrix can be too large to fit into the main memory, the cost of transferring the matrix through the memory hierarchy typically dominates the arithmetic cost. Therefore, we also compare the number of data passes between RSISVD and Flip-Flop SRQR. RSISVD needs 2q + 2 passes, while Flip-Flop SRQR needs 4 passes: computing B = ΩA, swapping the columns of A, realizing a partial QR factorization on A, and computing AΠQ̂1. Therefore, Flip-Flop SRQR requires less data transfer than RSISVD for any q > 1.
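To make the comparison concrete, here is a quick back-of-the-envelope evaluation of the two leading-order flop counts derived above; the matrix sizes and parameter values are illustrative only.

# Leading-order flop counts from the analysis above (lower-order terms omitted).
def flops_ffsrqr(m, n, l, b=32, p=5):
    return 4 * m * n * l + 2 * (b + p) * m * n

def flops_rsisvd(m, n, k, p=5, q=1):
    return (4 + 4 * q) * m * n * (k + p)

m, n, k = 10000, 10000, 100
print(flops_ffsrqr(m, n, l=k))      # Flip-Flop SRQR with l = k
print(flops_rsisvd(m, n, k, q=1))   # randomized subspace iteration with q = 1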

3.3. Quality analysis of Flip-Flop SRQR. This section is devoted to the quality analysis of Flip-Flop SRQR. We start with Lemma 3.1.

LEMMA 3.1. Given any matrix X = [X1, X2] with Xi ∈ R^{m×n_i}, i = 1, 2, and n1 + n2 = n,

    σ_j(X)^2 ≤ σ_j(X1)^2 + ‖X2‖_2^2,    1 ≤ j ≤ min(m, n).


Proof. Since XX^T = X1X1^T + X2X2^T, we obtain the above result using [33, Theorem 3.3.16].

We are now ready to derive bounds for the singular values and the approximation error of Flip-Flop SRQR. We need to emphasize that even if the target rank is k, we run Flip-Flop SRQR with an actual target rank l, which is a little bit larger than k. The difference between k and l can create a gap between the singular values of A so that we can obtain a reliable low-rank approximation.

THEOREM 3.2. Given a matrix A ∈ R^{m×n}, a target rank k, an oversampling size p, and an actual target rank l ≥ k, the matrices Uk, Σk, Vk in (3.4) computed by Flip-Flop SRQR satisfy

(3.5)    σ_j(Σk) ≥ σ_j(A) / (1 + 2‖R22‖_2^4 / σ_j^4(Σk))^{1/4},    1 ≤ j ≤ k,

and

(3.6)    ‖A − Uk Σk Vk^T‖_2 ≤ σ_{k+1}(A) (1 + 2 (‖R22‖_2 / σ_{k+1}(A))^4)^{1/4},

where R22 ∈ R^{(m−l)×(n−l)} is the trailing matrix in (3.1). Using the properties of SRQR, we furthermore have

(3.7)    σ_j(Σk) ≥ σ_j(A) / (1 + min(2τ̂^4, τ^4 (2 + 4τ̂^4) (σ_{l+1}(A)/σ_j(A))^4))^{1/4},    1 ≤ j ≤ k,

and

(3.8)    ‖A − Uk Σk Vk^T‖_2 ≤ σ_{k+1}(A) (1 + 2τ^4 (σ_{l+1}(A)/σ_{k+1}(A))^4)^{1/4},

where τ and τ̂, defined in (3.12), have matrix dimension-dependent upper bounds

    τ ≤ g1 g2 √((l + 1)(n − l))    and    τ̂ ≤ g1 g2 √(l(n − l)),

where g1 ≤ √((1+ε)/(1−ε)) and g2 ≤ g hold with high probability. The parameters ε > 0 and g > 1 are user-defined.

Proof. In terms of the singular value bounds, observing that

    [ R̂11  R̂12 ;  0  R̂22 ] [ R̂11  R̂12 ;  0  R̂22 ]^T = [ R̂11 R̂11^T + R̂12 R̂12^T,  R̂12 R̂22^T ;  R̂22 R̂12^T,  R̂22 R̂22^T ],


we apply Lemma 3.1 twice, for any 1 ≤ j ≤ k:

    σ_j^2( [ R̂11 R̂12 ; 0 R̂22 ] [ R̂11 R̂12 ; 0 R̂22 ]^T )
      ≤ σ_j^2( [ R̂11 R̂11^T + R̂12 R̂12^T,  R̂12 R̂22^T ] ) + ‖ [ R̂22 R̂12^T,  R̂22 R̂22^T ] ‖_2^2
      ≤ σ_j^2( R̂11 R̂11^T + R̂12 R̂12^T ) + ‖ R̂12 R̂22^T ‖_2^2 + ‖ [ R̂22 R̂12^T,  R̂22 R̂22^T ] ‖_2^2
      ≤ σ_j^2( R̂11 R̂11^T + R̂12 R̂12^T ) + 2 ‖ [ R̂12 ;  R̂22 ] ‖_2^4
      = σ_j^2( R̂11 R̂11^T + R̂12 R̂12^T ) + 2 ‖R22‖_2^4.                                  (3.9)

The relation (3.9) can be further rewritten as

    σ_j^4(A) ≤ σ_j^4( [ R̂11  R̂12 ] ) + 2 ‖R22‖_2^4 = σ_j^4(Σk) + 2 ‖R22‖_2^4,    1 ≤ j ≤ k,

which is equivalent to

    σ_j(Σk) ≥ σ_j(A) / (1 + 2‖R22‖_2^4 / σ_j^4(Σk))^{1/4}.

For the residual matrix bound, we let

    [ R̂11  R̂12 ] := [ R̃11  R̃12 ] + [ δR̃11  δR̃12 ],

where [ R̃11  R̃12 ] is the rank-k truncated SVD of [ R̂11  R̂12 ]. Since

(3.10)    ‖A − Uk Σk Vk^T‖_2 = ‖ AΠ − Q [ R̃11  R̃12 ;  0  0 ]^T Q̂^T ‖_2,

it follows from the orthogonality of singular vectors that

    [ R̃11  R̃12 ]^T [ δR̃11  δR̃12 ] = 0,

and therefore

    [ R̂11  R̂12 ]^T [ R̂11  R̂12 ] = [ R̃11  R̃12 ]^T [ R̃11  R̃12 ] + [ δR̃11  δR̃12 ]^T [ δR̃11  δR̃12 ],

which implies

(3.11)    R̂12^T R̂12 = R̃12^T R̃12 + (δR̃12)^T (δR̃12).

Similarly to the deduction of (3.9), from (3.11) we can derive

    ‖ [ δR̃11  δR̃12 ;  0  R̂22 ] ‖_2^4 ≤ ‖ [ δR̃11  δR̃12 ] ‖_2^4 + 2 ‖ [ δR̃12 ;  R̂22 ] ‖_2^4
                                       ≤ σ_{k+1}^4(A) + 2 ‖ [ R̂12 ;  R̂22 ] ‖_2^4
                                       = σ_{k+1}^4(A) + 2 ‖R22‖_2^4.


Combining with (3.10), it now follows that

    ‖A − Uk Σk Vk^T‖_2 = ‖ AΠ − Q [ R̃11  R̃12 ;  0  0 ]^T Q̂^T ‖_2 = ‖ [ δR̃11  δR̃12 ;  0  R̂22 ] ‖_2
                       ≤ σ_{k+1}(A) (1 + 2 (‖R22‖_2 / σ_{k+1}(A))^4)^{1/4}.

To obtain an upper bound for ‖R22‖_2 in (3.5) and (3.6), we follow the analysis of SRQR in [75].

From the analysis of [75, Section IV], Algorithm 2 ensures that g1 ≤ √((1+ε)/(1−ε)) and g2 ≤ g with high probability (the actual probability-guarantee formula can be found in [75, Section IV]), where g1 and g2 are defined by (2.6). Here 0 < ε < 1 is a user-defined parameter to adjust the choice of the oversampling size p used in the TRQRCP initialization part of SRQR, and g > 1 is a user-defined parameter in the extra swapping part of SRQR. Let

(3.12)    τ := g1 g2 (‖R22‖_2 ‖R̂^{-T}‖_{1,2}^{-1}) / (‖R22‖_{1,2} σ_{l+1}(A))    and    τ̂ := g1 g2 (‖R22‖_2 ‖R11^{-T}‖_{1,2}^{-1}) / (‖R22‖_{1,2} σ_k(Σk)),

where R̂ is defined in equation (2.5). Since (1/√n) ‖X‖_{1,2} ≤ ‖X‖_2 ≤ √n ‖X‖_{1,2} and σ_i(X1) ≤ σ_i(X), 1 ≤ i ≤ min(s, t), for any matrix X ∈ R^{m×n} and submatrix X1 ∈ R^{s×t} of X, we obtain that

    τ = g1 g2 (‖R22‖_2 / ‖R22‖_{1,2}) (‖R̂^{-T}‖_{1,2}^{-1} / σ_{l+1}(R̂)) (σ_{l+1}(R̂) / σ_{l+1}(A)) ≤ g1 g2 √((l + 1)(n − l)).

Using the fact that σ_l([ R11  R12 ]) = σ_l(R̂11) and σ_k(Σk) = σ_k([ R̂11  R̂12 ]), by (3.2) and (3.4),

    τ̂ = g1 g2 (‖R22‖_2 / ‖R22‖_{1,2}) (‖R11^{-T}‖_{1,2}^{-1} / σ_l(R11)) (σ_l(R11) / σ_l([ R11  R12 ])) (σ_l([ R11  R12 ]) / σ_k(Σk)) ≤ g1 g2 √(l(n − l)).

By the definition of τ,

(3.13)    ‖R22‖_2 = τ σ_{l+1}(A).

Plugging this into (3.6) yields (3.8). By the definition of τ̂,

(3.14)    ‖R22‖_2 ≤ τ̂ σ_k(Σk).

By (3.5) and (3.14),

(3.15)    σ_j(Σk) ≥ σ_j(A) / (1 + 2 (‖R22‖_2/σ_j(Σk))^4)^{1/4} ≥ σ_j(A) / (1 + 2 (‖R22‖_2/σ_k(Σk))^4)^{1/4} ≥ σ_j(A) / (1 + 2τ̂^4)^{1/4},    1 ≤ j ≤ k.


On the other hand, using (3.5),

    σ_j^4(A) ≤ σ_j^4(Σk) (1 + 2 (σ_j^4(A)/σ_j^4(Σk)) (‖R22‖_2^4/σ_j^4(A)))
             ≤ σ_j^4(Σk) (1 + 2 ((σ_j^4(Σk) + 2‖R22‖_2^4)/σ_j^4(Σk)) (‖R22‖_2^4/σ_j^4(A)))
             ≤ σ_j^4(Σk) (1 + 2 (1 + 2‖R22‖_2^4/σ_k^4(Σk)) (‖R22‖_2^4/σ_j^4(A))),

that is,

    σ_j(Σk) ≥ σ_j(A) / (1 + (2 + 4‖R22‖_2^4/σ_k^4(Σk)) ‖R22‖_2^4/σ_j^4(A))^{1/4}.

Plugging (3.13) and (3.14) into the above equation yields

(3.16)    σ_j(Σk) ≥ σ_j(A) / (1 + τ^4 (2 + 4τ̂^4) (σ_{l+1}(A)/σ_j(A))^4)^{1/4},    1 ≤ j ≤ k.

Combining (3.15) with (3.16), we arrive at (3.7). We note that (3.5) and (3.6) still hold true if we replace k by l.

Inequality (3.7) shows that, with the definitions (3.12) of τ and τ̂, Flip-Flop SRQR can reveal at least a dimension-dependent fraction of all the leading singular values of A, and indeed approximate them very accurately in case they decay relatively quickly. Moreover, (3.8) shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of (1 + 2τ^4 (σ_{l+1}(A)/σ_{k+1}(A))^4)^{1/4} from optimal. In situations where the singular values of A decay relatively quickly, our rank-k approximation is about as accurate as the truncated SVD with a choice of l such that

    σ_{l+1}(A) / σ_{k+1}(A) = o(1).

REMARK 3.3. The "high probability" mentioned in Theorem 3.2 depends on the oversampling size p but not on the block size b. We decided not to dive deep into the probability explanation since it would make the proof much longer and more tedious. Theoretically, the oversampling size p required for a given high probability can be rather large; for example, to achieve a probability of 0.95, we would need to choose p to be at least 500. However, in practice we would never choose such a large p, since it would deteriorate the code efficiency. It turns out that a small p, like 5 ∼ 10, achieves good approximation results empirically. A detailed explanation of what "high probability" means here can be found in [75]. In terms of the choice of the block size b, it does not affect the analysis or the error bounds; the choice of b is purely for the performance of the code, since only a suitable b can take full advantage of the cache hierarchy and BLAS routines.

4. Numerical experiments. In this section, we demonstrate the effectiveness and efficiency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments. Firstly,


TABLE 4.1: Methods for approximate SVD.

Method   Description
LANSVD   Approximate SVD using Lanczos bidiagonalization with partial reorthogonalization [42]. Its Fortran and Matlab implementations are from PROPACK [41].
FFSRQR   Flip-Flop Spectrum-Revealing QR factorization. Its Matlab implementation uses Mex-wrapped BLAS and LAPACK routines [2].
RSISVD   Approximate SVD with randomized subspace iteration [30, 27]. Its Matlab implementation is from the tensorlab toolbox [73].
TUXV     Algorithm 5 in the appendix.
LTSVD    Linear Time SVD [16]. We use its Matlab implementation by Ma et al. [49].

we compare FFSRQR with other approximate SVD algorithms for a matrix approximation. Secondly, we compare FFSRQR with other methods on tensor approximation problems using the tensorlab toolbox [73]. Thirdly, we compare FFSRQR with other methods for the robust PCA problem and the matrix completion problem. All experiments are run on a laptop with a 2.9 GHz Intel Core i5 processor and 8 GB of RAM. The codes based on LAPACK [2] and PROPACK [41] in Section 4.1 are in Fortran. The codes in Sections 4.2 and 4.3 are in Matlab, except that FFSRQR is implemented in Fortran and wrapped by mex.

4.1. Approximate truncated SVD. In this section, we compare FFSRQR with other approximate SVD algorithms for low-rank matrix approximation. All tested methods are listed in Table 4.1. Since LTSVD has only a Matlab implementation, this method is only considered in Sections 4.2 and 4.3. The test matrices are:
Type 1: A ∈ R^{m×n} [66] is defined by A = U D V^T + 0.1 · d_ss · E, where U ∈ R^{m×s}, V ∈ R^{n×s} are column-orthonormal matrices and D ∈ R^{s×s} is a diagonal matrix with s geometrically decreasing diagonal entries from 1 to 10^{-3}; d_ss is the last diagonal entry of D; and E ∈ R^{m×n} is a random matrix whose entries are independently sampled from a normal distribution N(0, 1). In our numerical experiment, we test three different random matrices: the square matrix has a size of 10000 × 10000, the short-fat matrix has a size of 1000 × 10000, and the tall-skinny matrix has a size of 10000 × 1000.
Type 2: A ∈ R^{4929×4929} is a real data matrix from the University of Florida sparse matrix collection [11]. Its corresponding file name is HB/GEMAT11.

For a given matrix A ∈ R^{m×n}, the relative SVD approximation error is measured by ‖A − Uk Σk Vk^T‖_F / ‖A‖_F, where Σk contains the approximate top k singular values and Uk, Vk are the corresponding approximate top k singular vectors. The parameters used in FFSRQR, RSISVD, TUXV, and LTSVD are listed in Table 4.2. The additional time required to compute g2 is negligible. In our experiments, g2 always remains modest and never triggers subsequent SRQR column swaps. However, computing g2 is still recommended, since g2 serves as an insurance policy against potential quality degradation by TRQRCP. The block size b is chosen to be 32, according to the xGEQP3 LAPACK routine. When k is less than 32, we implement an unblocked version of the partial QRCP.
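The error metric used in these comparisons is straightforward to reproduce; below is a small NumPy helper (ours), checked here against NumPy's exact truncated SVD.

import numpy as np

def relative_svd_error(A, U, s, V):
    # || A - U_k Sigma_k V_k^T ||_F / || A ||_F
    return np.linalg.norm(A - U @ np.diag(s) @ V.T, 'fro') / np.linalg.norm(A, 'fro')

A = np.random.randn(400, 300)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 50
print(relative_svd_error(A, U[:, :k], s[:k], Vt[:k, :].T))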

Figures 4.1-4.3 display the run time, the relative approximation error, and the top 20 singular values comparison, respectively, for four different matrices. In terms of speed, FFSRQR is always faster than LANSVD and RSISVD. Moreover, the efficiency difference is more


TABLE 4.2: Parameters used in RSISVD, FFSRQR, TUXV, and LTSVD.

Method   Parameters
FFSRQR   oversampling size p = 5, block size b = min{32, k}, l = k, d = 10, g = 2.0
RSISVD   oversampling size p = 5, subspace iteration q = 1
TUXV     oversampling size p = 5, block size b = min{32, k}, l = k
LTSVD    probabilities p_i = 1/n

obvious when k is relatively large. FFSRQR is only slightly slower than TUXV. In terms of the relative approximation error, FFSRQR is comparable to the other three methods. In terms of the top singular values approximation, Figure 4.3 displays the top 20 singular values when k = 500 for four different matrices. In Figure 4.3, FFSRQR is superior to TUXV, as we take the absolute values of the main diagonal entries of the upper triangular matrix R as the approximate singular values for TUXV.

[FIG. 4.1: Run time comparison for approximate SVD algorithms. Four panels plot the run time (sec) against the target rank k for FFSRQR, RSISVD, LANSVD, and TUXV: (a) Type 1, random square matrix; (b) Type 1, random short-fat matrix; (c) Type 1, random tall-skinny matrix; (d) Type 2, GEMAT11.]

4.2. Tensor approximation. This section illustrates the effectiveness and efficiency of FFSRQR for computing approximate tensors. ST-HOSVD [3, 71] is one of the most efficient


[FIG. 4.2: Relative approximation error comparison for approximate SVD algorithms. Four panels plot the relative approximation error against the target rank k for FFSRQR, RSISVD, LANSVD, and TUXV: (a) Type 1, random square matrix; (b) Type 1, random short-fat matrix; (c) Type 1, random tall-skinny matrix; (d) Type 2, GEMAT11.]

algorithms to compute a Tucker decomposition of tensors, and the most costly part of this algorithm is to compute an SVD or approximate SVD of the tensor unfoldings. The truncated SVD and RSISVD are used in the routines MLSVD and MLSVD_RSI, respectively, in the Matlab tensorlab toolbox [73]. Based on this Matlab toolbox, we implement ST-HOSVD using FFSRQR or LTSVD to do the SVD approximation. We name these two new routines MLSVD_FFSRQR and MLSVD_LTSVD, respectively, and compare these four routines in the numerical experiment. We also have Python codes for this tensor approximation experiment; we do not list the results of the Python programs here, as they are similar to those for Matlab.

4.2.1. A sparse tensor example. We test a sparse tensor X ∈ R^{n×n×n} of the following format [64, 59]:

    X = Σ_{j=1}^{10} (1000/j) x_j ◦ y_j ◦ z_j + Σ_{j=11}^{n} (1/j) x_j ◦ y_j ◦ z_j,

where x_j, y_j, z_j ∈ R^n are sparse vectors with nonnegative entries. The symbol "◦" represents the vector outer product. We compute a rank-(k, k, k) Tucker decomposition [G; U1, U2, U3] using MLSVD, MLSVD_FFSRQR, MLSVD_RSI, and MLSVD_LTSVD, respectively. The relative approximation error is measured by ‖X − X_k‖_F / ‖X‖_F, where X_k = G ×_1 U1 ×_2 U2 ×_3 U3.
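A dense NumPy sketch (ours) of this test tensor for a small n: the published experiment uses n = 400 and sparse storage, while the sparsity level, helper names, and random seed below are our own illustrative choices.

import numpy as np

def sparse_vec(n, nnz, rng):
    v = np.zeros(n)
    v[rng.choice(n, size=nnz, replace=False)] = rng.random(nnz)   # nonnegative entries
    return v

def build_test_tensor(n=100, nnz=5, seed=0):
    rng = np.random.default_rng(seed)
    X = np.zeros((n, n, n))
    for j in range(1, n + 1):
        w = 1000.0 / j if j <= 10 else 1.0 / j
        x, y, z = (sparse_vec(n, nnz, rng) for _ in range(3))
        X += w * np.einsum('i,j,k->ijk', x, y, z)                  # outer product x◦y◦z
    return X

X = build_test_tensor()
print(X.shape, np.count_nonzero(X) / X.size)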


[FIG. 4.3: Top 20 singular values comparison for approximate SVD algorithms. Four panels plot the top 20 singular values (by index) for FFSRQR, RSISVD, LANSVD, and TUXV: (a) Type 1, random square matrix; (b) Type 1, random short-fat matrix; (c) Type 1, random tall-skinny matrix; (d) Type 2, GEMAT11.]

Figure 4.4 displays a comparison of the efficiency and accuracy of the different methods for a 400 × 400 × 400 sparse tensor approximation problem. MLSVD_LTSVD is the fastest but least accurate one. The other three methods have similar accuracy, while MLSVD_FFSRQR is faster when the target rank k is larger.

4.2.2. Handwritten digits classification. MNIST is a handwritten digits image data set created by Yann LeCun [43]. Every digit is represented by a 28 × 28 pixel image. Handwritten digits classification has the objective of training a classification model to classify new unlabeled images. A HOSVD algorithm is proposed by Savas and Eldén [60] to classify handwritten digits. To reduce the training time, a more efficient ST-HOSVD algorithm is introduced in [71].

We do handwritten digits classification using MNIST, which consists of 60,000 training images and 10,000 test images. The number of training images in each class is restricted to 5421, so that the training set is equally distributed over all classes. The training set is represented by a tensor X of size 786 × 5421 × 10. The classification relies on Algorithm 2 in [60]. We use various algorithms to obtain an approximation X ≈ G ×_1 U1 ×_2 U2 ×_3 U3, where the core tensor G has size 65 × 142 × 10.

The results are summarized in Table 4.3. In terms of run time, we observe that our method

MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive


[FIG. 4.4: Run time and relative approximation error comparison on a sparse tensor. Two panels plot the relative approximation error and the run time (sec) against the target rank k for MLSVD, MLSVD_FFSRQR, MLSVD_RSI, and MLSVD_LTSVD.]

TABLE 4.3: Comparison on handwritten digits classification.

                          MLSVD     MLSVD_FFSRQR   MLSVD_RSI   MLSVD_LTSVD
Training Time [sec]       272.121   15.455         19.343      0.4266
Relative Model Error      0.4099    0.4273         0.4247      0.5162
Classification Accuracy   95.19%    94.98%         95.05%      92.59%

one, and MLSVD_LTSVD is the fastest one. In terms of classification quality, MLSVD, MLSVD_FFSRQR, and MLSVD_RSI are comparable, while MLSVD_LTSVD is the least accurate one.

4.3. Solving the nuclear norm minimization problem. To show the effectiveness of the FFSRQR algorithm for nuclear norm minimization problems, we investigate two scenarios: the robust PCA problem (2.7) and the matrix completion problem (2.8). The test matrix used for the robust PCA is introduced in [45], and the test matrices used in the matrix completion are two real data sets. We use the IALM method [45] to solve both problems; the IALM code can be downloaded.^1

4.3.1. Robust PCA. To solve the robust PCA problem, we replace the approximate SVD part in the IALM method [45] by various methods. We denote the actual solution to the robust PCA problem by a matrix pair (X*, E*) ∈ R^{m×n} × R^{m×n}. The matrix X* is X* = X_L X_R^T, with X_L ∈ R^{m×k}, X_R ∈ R^{n×k} being random matrices whose entries are independently sampled from a normal distribution. The sparse matrix E* is a random matrix whose non-zero entries are independently sampled from a uniform distribution over the interval [−500, 500]. The input to the IALM algorithm has the form M = X* + E*, and the output is denoted by (X̂, Ê). In this numerical experiment, we use the same parameter settings as the IALM code for robust PCA: the rank k is 0.1m, and the number of non-zero entries in E is 0.05m^2. We choose the trade-off parameter λ = 1/√max(m, n), as suggested by Candès et al. [8]. The solution quality is measured by the normalized root mean square error ‖X̂ − X*‖_F / ‖X*‖_F.
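The synthetic robust PCA input can be reproduced directly from this description; a NumPy sketch (ours) follows, where the square size m = n, the random generator, and the seed are our own illustrative choices.

import numpy as np

def make_rpca_problem(m, rank_frac=0.1, sparse_frac=0.05, seed=0):
    rng = np.random.default_rng(seed)
    k = int(rank_frac * m)
    XL, XR = rng.standard_normal((m, k)), rng.standard_normal((m, k))
    X_star = XL @ XR.T                                   # low-rank part
    E_star = np.zeros((m, m))
    nnz = int(sparse_frac * m * m)
    idx = rng.choice(m * m, size=nnz, replace=False)
    E_star.flat[idx] = rng.uniform(-500, 500, size=nnz)  # sparse corruption
    M = X_star + E_star
    lam = 1.0 / np.sqrt(m)                               # lambda = 1/sqrt(max(m, n))
    return M, X_star, E_star, lam

M, X_star, E_star, lam = make_rpca_problem(1000)
print(M.shape, np.linalg.matrix_rank(X_star), lam)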

Table 4.4 includes the relative error, the run time, the number of non-zero entries in Ê (‖Ê‖_0), the iteration count, and the number of non-zero singular values (sv) in X̂ of the IALM

^1 http://perception.csl.illinois.edu/matrix-rank/sample_code.html


algorithm using different approximate SVD methods. We observe that IALM_FFSRQR is faster than all the other three methods, while its error is comparable to that of IALM_LANSVD and IALM_RSISVD. IALM_LTSVD is relatively slow and not effective.

TABLE 4.4: Comparison on robust PCA.

Size         Method         Error      Time (sec)  ‖Ê‖_0      Iter  sv
1000 × 1000  IALM_LANSVD    3.33e-07   5.79e+00    50000       22   100
             IALM_FFSRQR    2.79e-07   1.02e+00    50000       25   100
             IALM_RSISVD    3.36e-07   1.09e+00    49999       22   100
             IALM_LTSVD     9.92e-02   3.11e+00    999715     100   100
2000 × 2000  IALM_LANSVD    2.61e-07   5.91e+01    199999      22   200
             IALM_FFSRQR    1.82e-07   6.93e+00    199998      25   200
             IALM_RSISVD    2.63e-07   7.38e+00    199996      22   200
             IALM_LTSVD     8.42e-02   2.20e+01    3998937    100   200
4000 × 4000  IALM_LANSVD    1.38e-07   4.65e+02    799991      23   400
             IALM_FFSRQR    1.39e-07   4.43e+01    800006      26   400
             IALM_RSISVD    1.51e-07   5.04e+01    799990      23   400
             IALM_LTSVD     8.94e-02   1.54e+02    15996623   100   400
6000 × 6000  IALM_LANSVD    1.30e-07   1.66e+03    1799982     23   600
             IALM_FFSRQR    1.02e-07   1.42e+02    1799993     26   600
             IALM_RSISVD    1.44e-07   1.62e+02    1799985     23   600
             IALM_LTSVD     8.58e-02   5.55e+02    35992605   100   600

4.3.2. Matrix completion. We solve the matrix completion problems for two real data sets used in [68]: the Jester joke data set [24] and the MovieLens data set [32]. The Jester joke data set consists of 4.1 million ratings for 100 jokes from 73,421 users and can be downloaded from the website http://goldberg.berkeley.edu/jester-data/. We test with the following data matrices:
  • jester-1: data from 24,983 users who have rated 36 or more jokes;
  • jester-2: data from 23,500 users who have rated 36 or more jokes;
  • jester-3: data from 24,938 users who have rated between 15 and 35 jokes;
  • jester-all: the combination of jester-1, jester-2, and jester-3.
The MovieLens data set can be downloaded from https://grouplens.org/datasets/movielens/. We test with the following data matrices:
  • movie-100K: 100,000 ratings of 943 users for 1682 movies;
  • movie-1M: 1 million ratings of 6040 users for 3900 movies;
  • movie-latest-small: 100,000 ratings of 700 users for 9000 movies.

For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by

$$\mathrm{NMAE} \;\stackrel{\mathrm{def}}{=}\; \frac{1}{|\Gamma|} \sum_{(i,j)\in\Gamma} \frac{|M_{ij} - X_{ij}|}{r_{\max} - r_{\min}},$$


where $X_{ij}$ is the prediction of the rating of joke (movie) $j$ given by user $i$, and $r_{\min}$, $r_{\max}$ are the lower and upper bounds of the ratings, respectively. For the Jester joke data sets we set $r_{\min} = -10$ and $r_{\max} = 10$; for the MovieLens data sets we set $r_{\min} = 1$ and $r_{\max} = 5$.
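As an illustration, the NMAE above reduces to a few lines of array code. The sketch below is ours and is not taken from the IALM code; it assumes the known ratings and the corresponding predictions are supplied as matching 1-D arrays.

import numpy as np

def nmae(M_vals, X_pred, r_min, r_max):
    """Normalized Mean Absolute Error over the known entries.

    M_vals : observed ratings M_ij for (i, j) in Gamma (1-D array)
    X_pred : predicted ratings X_ij at the same index set (1-D array)
    """
    return np.mean(np.abs(M_vals - X_pred)) / (r_max - r_min)

# Example with the Jester bounds r_min = -10, r_max = 10 (hypothetical index arrays):
# score = nmae(M[rows, cols], X[rows, cols], -10.0, 10.0)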

Since $|\Gamma|$ is large, we randomly select a subset $\Omega$ from $\Gamma$ and then use Algorithm 8 to solve problem (2.8). We randomly select 10 ratings for each user in the Jester joke data sets, while we randomly choose about 50% of the ratings for each user in the MovieLens data sets. Table 4.5 lists the parameter settings used in the algorithms. The maximum iteration number is 100 in IALM, and all other parameters are the same as those used in [45].

The numerical results are included in Table 4.6. We observe that IALM_FFSRQR achieves almost the same recoverability as the other methods (except IALM_LTSVD) and is slightly faster than IALM_RSISVD for these two data sets.

TABLE 4.5
Parameters used in the IALM method on matrix completion.

Data set             m      n     |Γ|        |Ω|
jester-1             24983  100   1.81e+06   249830
jester-2             23500  100   1.71e+06   235000
jester-3             24938  100   6.17e+05   249384
jester-all           73421  100   4.14e+06   734210
movie-100K           943    1682  1.00e+05   49918
movie-1M             6040   3706  1.00e+06   498742
movie-latest-small   671    9066  1.00e+05   52551

5. Conclusions. We have presented the Flip-Flop SRQR factorization, a variant of the QLP factorization, to compute low-rank matrix approximations. The Flip-Flop SRQR algorithm uses an SRQR factorization to initialize a truncated column-pivoted QR factorization and then forms an LQ factorization. For the numerical results presented, the errors of the proposed algorithm were comparable to those obtained with the other state-of-the-art algorithms, while the new algorithm is cheaper to compute and still produces quality low-rank matrix approximations. Furthermore, we prove singular value lower bounds and residual error upper bounds for the Flip-Flop SRQR factorization. In situations where the singular values of the input matrix decay relatively quickly, the low-rank approximation computed by Flip-Flop SRQR is guaranteed to be as accurate as the truncated SVD. We also perform a complexity analysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomized subspace iteration. Future work includes reducing the overhead cost in Flip-Flop SRQR and implementing the Flip-Flop SRQR algorithm on distributed memory machines for popular applications such as distributed PCA.

Acknowledgments. The authors would like to thank Yousef Saad, the editor, and three anonymous referees, who kindly reviewed an earlier version of this manuscript and provided valuable suggestions and comments.

Appendix A. Partial QR factorization with column pivoting. The details of partial QRCP are covered in Algorithm 4.

Appendix B. TUXV algorithm. The details of the TUXV algorithm are presented in Algorithm 5.

Appendix C. Computing the Tucker decomposition of higher-order tensors. Algorithm 6 computes the Tucker decomposition of higher-order tensors, i.e., the sequentially truncated higher-order SVD.


TABLE 4.6
Comparison on matrix completion.

Data set            Method        Iter  Time (sec)  NMAE      sv   σ_max     σ_min
jester-1            IALM-LANSVD    12   7.06e+00    1.84e-01  100  2.14e+03  1.00e+00
                    IALM-FFSRQR    12   3.44e+00    1.69e-01  100  2.28e+03  1.00e+00
                    IALM-RSISVD    12   3.75e+00    1.89e-01  100  2.12e+03  1.00e+00
                    IALM-LTSVD    100   2.11e+01    1.74e-01   62  3.00e+03  1.00e+00
jester-2            IALM-LANSVD    12   6.80e+00    1.85e-01  100  2.13e+03  1.00e+00
                    IALM-FFSRQR    12   2.79e+00    1.70e-01  100  2.29e+03  1.00e+00
                    IALM-RSISVD    12   3.59e+00    1.91e-01  100  2.12e+03  1.00e+00
                    IALM-LTSVD    100   2.03e+01    1.75e-01   58  2.96e+03  1.00e+00
jester-3            IALM-LANSVD    12   7.05e+00    1.26e-01   99  1.79e+03  1.00e+00
                    IALM-FFSRQR    12   3.03e+00    1.22e-01  100  1.71e+03  1.00e+00
                    IALM-RSISVD    12   3.85e+00    1.31e-01  100  1.78e+03  1.00e+00
                    IALM-LTSVD    100   2.12e+01    1.33e-01   55  2.50e+03  1.00e+00
jester-all          IALM-LANSVD    12   2.39e+01    1.72e-01  100  3.56e+03  1.00e+00
                    IALM-FFSRQR    12   1.12e+01    1.62e-01  100  3.63e+03  1.00e+00
                    IALM-RSISVD    12   1.34e+01    1.82e-01  100  3.47e+03  1.00e+00
                    IALM-LTSVD    100   6.99e+01    1.68e-01   52  4.92e+03  1.00e+00
movie-100K          IALM-LANSVD    29   2.86e+01    1.83e-01  285  1.21e+03  1.00e+00
                    IALM-FFSRQR    30   4.55e+00    1.67e-01  295  1.53e+03  1.00e+00
                    IALM-RSISVD    29   4.82e+00    1.82e-01  285  1.29e+03  1.00e+00
                    IALM-LTSVD     48   1.42e+01    1.47e-01  475  1.91e+03  1.00e+00
movie-1M            IALM-LANSVD    50   7.40e+02    1.58e-01  495  4.99e+03  1.00e+00
                    IALM-FFSRQR    53   2.07e+02    1.37e-01  525  6.63e+03  1.00e+00
                    IALM-RSISVD    50   2.23e+02    1.57e-01  495  5.35e+03  1.00e+00
                    IALM-LTSVD    100   8.50e+02    1.17e-01  995  8.97e+03  1.00e+00
movie-latest-small  IALM-LANSVD    31   1.66e+02    1.85e-01  305  1.13e+03  1.00e+00
                    IALM-FFSRQR    31   1.96e+01    2.00e-01  305  1.42e+03  1.00e+00
                    IALM-RSISVD    31   2.85e+01    1.91e-01  305  1.20e+03  1.00e+00
                    IALM-LTSVD     63   4.02e+01    2.08e-01  298  1.79e+03  1.00e+00

Appendix D. IALM for solving two special nuclear norm minimization problems. Algorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems, respectively. In Algorithm 7, $S_{\omega}(x) = \mathrm{sgn}(x) \cdot \max(|x| - \omega, 0)$ is the soft shrinkage operator [29], with $x \in \mathbb{R}^n$ and $\omega > 0$. In Algorithm 8, $\bar{\Omega}$ is the complement of $\Omega$.
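Since the soft shrinkage operator acts entrywise, it is essentially one line of array code. The sketch below is ours; applying it to the singular values, as Algorithms 7 and 8 do, gives the usual singular value thresholding step.

import numpy as np

def soft_shrink(x, omega):
    """Soft shrinkage S_omega(x) = sgn(x) * max(|x| - omega, 0), applied entrywise."""
    return np.sign(x) * np.maximum(np.abs(x) - omega, 0.0)

# Singular value thresholding of a matrix A with threshold omega:
# U, s, Vt = np.linalg.svd(A, full_matrices=False)
# A_thresholded = (U * soft_shrink(s, omega)) @ Vt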

Appendix E. Approximate SVD with randomized subspace iteration. The randomized subspace iteration was proposed in [30, Algorithm 4.4] to compute an orthonormal matrix whose range approximates the range of $A$. An approximate SVD can then be computed using this orthonormal matrix [30, Algorithm 5.1]. A randomized subspace iteration is used in the routine MLSVD_RSI of the Matlab toolbox tensorlab [73], and MLSVD_RSI is by far the most efficient function that we can find in Matlab to compute the ST-HOSVD. We summarize the approximate SVD with randomized subspace iteration in Algorithm 9.

Now we perform a complexity analysis for the approximate SVD with randomized subspace iteration. We first note that:
1. The cost of generating a random matrix is negligible.
2. The cost of computing $B = A\Omega$ is $2mn(k + p)$.


Algorithm 4 Partial QRCP
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k.
Outputs:
  Orthogonal matrix Q ∈ R^{m×m}; upper trapezoidal matrix R ∈ R^{m×n};
  permutation matrix Π ∈ R^{n×n} such that AΠ = QR.
Algorithm:
  Initialize Π = I_n. Compute column norms r_s = ||A(1:m, s)||_2 (1 ≤ s ≤ n).
  for j = 1 : k do
    Find i = arg max_{j ≤ s ≤ n} r_s. Exchange columns i and j of A and Π.
    Form the Householder reflection Q_j from A(j:m, j).
    Update the trailing matrix A(j:m, j:n) ← Q_j^T A(j:m, j:n).
    Update r_s = ||A(j+1:m, s)||_2 (j+1 ≤ s ≤ n).
  end for
  Q = Q_1 Q_2 ··· Q_k is the product of all reflections; R = upper trapezoidal part of A.
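A minimal, unblocked NumPy sketch of Algorithm 4 follows. It is ours, forms each reflector explicitly, and is meant only to clarify the pivoting and norm-downdating steps; it does not reflect the blocked BLAS-3 implementations referenced in the paper.

import numpy as np

def partial_qrcp(A, k):
    # k steps of QR with column pivoting: returns the upper-trapezoidal factor
    # R (first k rows of the reduced matrix) and the permutation index array piv.
    A = A.astype(float).copy()
    m, n = A.shape
    piv = np.arange(n)
    r = np.sum(A * A, axis=0)                  # squared column norms
    for j in range(k):
        i = j + int(np.argmax(r[j:]))          # pivot: largest remaining norm
        A[:, [j, i]] = A[:, [i, j]]
        piv[[j, i]] = piv[[i, j]]
        r[[j, i]] = r[[i, j]]
        x = A[j:, j]
        v = x.copy()
        v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)
        nv = np.linalg.norm(v)
        if nv > 0:
            v = v / nv
            A[j:, j:] -= 2.0 * np.outer(v, v @ A[j:, j:])   # apply reflector
        r[j + 1:] -= A[j, j + 1:] ** 2         # downdate trailing column norms
    return np.triu(A)[:k, :], piv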

Algorithm 5 TUXV Algorithm
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k. Block size b. Oversampling size p ≥ 0.
Outputs:
  Column orthonormal matrices U ∈ R^{m×k}, V ∈ R^{n×k}, and upper triangular matrix R ∈ R^{k×k} such that A ≈ U R V^T.
Algorithm:
  Do TRQRCP on A to obtain Q ∈ R^{m×k}, R ∈ R^{k×n}, and Π ∈ R^{n×n}.
  R = R Π^T, and do an LQ factorization, i.e., [V, R] = qr(R^T, 0).
  Compute Z = A V and do a QR factorization, i.e., [U, R] = qr(Z, 0).
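Read as dense linear algebra, Algorithm 5 is a handful of QR calls. The sketch below is ours; it substitutes SciPy's full QR with column pivoting for the TRQRCP step, so it reproduces the flip-flop structure but not the randomized speedup.

import numpy as np
from scipy.linalg import qr

def tuxv(A, k):
    # Sketch of Algorithm 5 with SciPy's QRCP standing in for TRQRCP.
    _, R, piv = qr(A, mode='economic', pivoting=True)
    R = R[:k, :]                             # truncate to rank k
    R = R[:, np.argsort(piv)]                # R <- R * Pi^T (undo the pivoting)
    V, _ = np.linalg.qr(R.T)                 # LQ factorization of R via QR of R^T
    Z = A @ V                                # Z = A V
    U, R_out = np.linalg.qr(Z)               # final QR: A ~ U R_out V^T
    return U, R_out, V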

3. In each QR step $[Q, \sim] = \mathrm{qr}(B, 0)$, the cost of computing the QR factorization of $B$ is $2m(k+p)^2 - \frac{2}{3}(k+p)^3$ (cf. [69]), and the cost of forming the first $(k+p)$ columns of the full $Q$ matrix is $m(k+p)^2 + \frac{1}{3}(k+p)^3$.

Now we count the flops for each $i$ in the for loop:
1. The cost of computing $B = A^T * Q$ is $2mn(k+p)$.
2. The cost of computing $[Q, \sim] = \mathrm{qr}(B, 0)$ is $2n(k+p)^2 - \frac{2}{3}(k+p)^3$, and the cost of forming the first $(k+p)$ columns of the full $Q$ matrix is $n(k+p)^2 + \frac{1}{3}(k+p)^3$.
3. The cost of computing $B = A * Q$ is $2mn(k+p)$.
4. The cost of computing $[Q, \sim] = \mathrm{qr}(B, 0)$ is $2m(k+p)^2 - \frac{2}{3}(k+p)^3$, and the cost of forming the first $(k+p)$ columns of the full $Q$ matrix is $m(k+p)^2 + \frac{1}{3}(k+p)^3$.

Together, the cost of running the for loop $q$ times is
$$q \left( 4mn(k+p) + 3(m+n)(k+p)^2 - \frac{2}{3}(k+p)^3 \right).$$
Additionally, the cost of computing $B = Q^T * A$ is $2mn(k+p)$, the cost of the SVD of $B$ is $O(n(k+p)^2)$, and the cost of computing $U = Q * U$ is $2m(k+p)^2$.


Algorithm 6 ST-HOSVD
Inputs:
  Tensor X ∈ R^{I_1×···×I_d}, desired rank (k_1, ..., k_d), and processing order p = (p_1, ..., p_d).
Outputs:
  Tucker decomposition [G; U_1, ..., U_d].
Algorithm:
  Define tensor G ← X.
  for j = 1 : d do
    r = p_j.
    Compute an exact or approximate rank-k_r SVD of the tensor unfolding: G_(r) ≈ Û_r Σ_r V_r^T.
    U_r ← Û_r.
    Update G_(r) ← Σ_r V_r^T, i.e., apply Û_r^T to G.
  end for
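For orientation, the sketch below is our NumPy rendering of Algorithm 6. It uses a dense truncated SVD per mode rather than the approximate SVDs discussed in the paper, and the unfold/fold helpers are our own naming.

import numpy as np

def unfold(T, mode):
    # Mode-`mode` unfolding: rows indexed by that mode.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    # Inverse of unfold for a tensor of the given full shape.
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full), 0, mode)

def st_hosvd(X, ranks, order=None):
    # Sequentially truncated HOSVD with a dense truncated SVD per mode.
    d = X.ndim
    order = range(d) if order is None else order
    G = X.copy()
    U = [None] * d
    for r in order:
        Gr = unfold(G, r)
        Ur, s, Vt = np.linalg.svd(Gr, full_matrices=False)
        Ur = Ur[:, :ranks[r]]                      # rank-k_r factor for mode r
        U[r] = Ur
        new_shape = list(G.shape)
        new_shape[r] = ranks[r]
        G = fold(Ur.T @ Gr, r, tuple(new_shape))   # G <- G x_r Ur^T
    return G, U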

Algorithm 7 Robust PCA Using IALM
Inputs:
  Measured matrix M ∈ R^{m×n}, positive numbers λ, µ_0, µ̄, tolerance tol, ρ > 1.
Outputs:
  Matrix pair (X_k, E_k).
Algorithm:
  k = 0; J(M) = max(‖M‖_2, λ^{-1}‖M‖_F); Y_0 = M / J(M); E_0 = 0.
  while not converged do
    (U, Σ, V) = svd(M − E_k + µ_k^{-1} Y_k)
    X_{k+1} = U S_{µ_k^{-1}}(Σ) V^T
    E_{k+1} = S_{λ µ_k^{-1}}(M − X_{k+1} + µ_k^{-1} Y_k)
    Y_{k+1} = Y_k + µ_k (M − X_{k+1} − E_{k+1})
    Update µ_{k+1} = min(ρ µ_k, µ̄)
    k = k + 1
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then
      Break
    end if
  end while
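A direct transcription of Algorithm 7 into dense NumPy is short. The sketch below is ours; it uses a full SVD at every iteration instead of the approximate SVDs compared in Section 4.3, and the default values chosen for µ, µ̄, ρ, and tol are illustrative assumptions rather than the settings of the IALM code.

import numpy as np

def soft(x, w):
    return np.sign(x) * np.maximum(np.abs(x) - w, 0.0)

def ialm_rpca(M, lam, mu=None, mu_bar=None, rho=1.5, tol=1e-7, max_iter=100):
    # Sketch of Algorithm 7 (IALM for robust PCA) with a dense SVD.
    norm_two = np.linalg.norm(M, 2)
    J = max(norm_two, np.linalg.norm(M) / lam)    # J(M) = max(||M||_2, ||M||_F / lambda)
    Y = M / J
    E = np.zeros_like(M)
    mu = 1.25 / norm_two if mu is None else mu    # assumed defaults
    mu_bar = mu * 1e7 if mu_bar is None else mu_bar
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(M - E + Y / mu, full_matrices=False)
        X = (U * soft(s, 1.0 / mu)) @ Vt          # singular value thresholding
        E = soft(M - X + Y / mu, lam / mu)        # shrink the residual
        Y = Y + mu * (M - X - E)
        mu = min(rho * mu, mu_bar)
        if np.linalg.norm(M - X - E) / np.linalg.norm(M) < tol:
            break
    return X, E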

Now assume that $k + p \ll \min(m, n)$ and omit the lower-order terms; then we arrive at $(4q + 4)mn(k + p)$ as the complexity of the approximate SVD with randomized subspace iteration. In practice, $q$ is usually chosen to be an integer between 0 and 2.
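To make the leading-order estimate concrete, the following trivial helper (ours) evaluates it; for an assumed 10000 × 10000 matrix with k = 100 and p = 5, it gives about 4.2e10, 8.4e10, and 1.26e11 flops for q = 0, 1, 2, respectively.

def rsi_svd_flops(m, n, k, p, q):
    # Leading-order flop estimate (4q + 4) * m * n * (k + p) derived above.
    return (4 * q + 4) * m * n * (k + p)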

REFERENCES

[1] O. ALTER, P. O. BROWN, AND D. BOTSTEIN, Singular value decomposition for genome-wide expression data processing and modeling, Proc. Natl. Acad. Sci. USA, 97 (2000), pp. 10101–10106.
[2] E. ANDERSON, Z. BAI, C. BISCHOF, L. S. BLACKFORD, J. DEMMEL, J. DONGARRA, J. DU CROZ, A. GREENBAUM, S. HAMMARLING, A. MCKENNEY, AND D. SORENSEN, LAPACK Users' Guide, 3rd ed., SIAM, Philadelphia, 1999.
[3] C. A. ANDERSSON AND R. BRO, Improving the speed of multi-way algorithms: Part I. Tucker3, Chemometrics Intell. Lab. Syst., 42 (1998), pp. 93–103.
[4] M. W. BERRY, S. T. DUMAIS, AND G. W. O'BRIEN, Using linear algebra for intelligent information retrieval, SIAM Rev., 37 (1995), pp. 573–595.
[5] C. BISCHOF AND C. VAN LOAN, The WY representation for products of Householder matrices, SIAM J. Sci. Statist. Comput., 8 (1987), pp. S2–S13.


Algorithm 8 Matrix Completion Using IALM
Inputs:
  Sampled set Ω, sampled entries π_Ω(M), positive numbers λ, µ_0, µ̄, tolerance tol, ρ > 1.
Outputs:
  Matrix pair (X_k, E_k).
Algorithm:
  k = 0; Y_0 = 0; E_0 = 0.
  while not converged do
    (U, Σ, V) = svd(M − E_k + µ_k^{-1} Y_k)
    X_{k+1} = U S_{µ_k^{-1}}(Σ) V^T
    E_{k+1} = π_Ω̄(M − X_{k+1} + µ_k^{-1} Y_k)
    Y_{k+1} = Y_k + µ_k (M − X_{k+1} − E_{k+1})
    Update µ_{k+1} = min(ρ µ_k, µ̄)
    k = k + 1
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then
      Break
    end if
  end while

[6] P. BUSINGER AND G. H. GOLUB, Handbook series linear algebra: Linear least squares solutions by Householder transformations, Numer. Math., 7 (1965), pp. 269–276.
[7] J.-F. CAI, E. J. CANDÈS, AND Z. SHEN, A singular value thresholding algorithm for matrix completion, SIAM J. Optim., 20 (2010), pp. 1956–1982.
[8] E. J. CANDÈS, X. LI, Y. MA, AND J. WRIGHT, Robust principal component analysis, J. ACM, 58 (2011), Art. 11, 37 pages.
[9] E. J. CANDÈS AND B. RECHT, Exact matrix completion via convex optimization, Found. Comput. Math., 9 (2009), pp. 717–772.
[10] J. D. CARROLL AND J.-J. CHANG, Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition, Psychometrika, 35 (1970), pp. 283–319.
[11] T. A. DAVIS AND Y. HU, The University of Florida sparse matrix collection, ACM Trans. Math. Software, 38 (2011), Art. 1, 25 pages.
[12] L. DE LATHAUWER AND B. DE MOOR, From matrix to tensor: Multilinear algebra and signal processing, in Mathematics in Signal Processing IV, Inst. Math. Appl. Conf. Ser., Oxford Univ. Press, Oxford, 1998, pp. 1–16.
[13] L. DE LATHAUWER, B. DE MOOR, AND J. VANDEWALLE, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253–1278.
[14] L. DE LATHAUWER, B. DE MOOR, AND J. VANDEWALLE, On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1324–1342.
[15] L. DE LATHAUWER AND J. VANDEWALLE, Dimensionality reduction in higher-order signal processing and rank-(R1, R2, ..., RN) reduction in multilinear algebra, Linear Algebra Appl., 391 (2004), pp. 31–55.
[16] P. DRINEAS, R. KANNAN, AND M. W. MAHONEY, Fast Monte Carlo algorithms for matrices II: Computing a low-rank approximation to a matrix, SIAM J. Comput., 36 (2006), pp. 158–183.
[17] J. A. DUERSCH AND M. GU, Randomized QR with column pivoting, SIAM J. Sci. Comput., 39 (2017), pp. C263–C291.
[18] C. ECKART AND G. YOUNG, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211–218.
[19] L. ELDÉN, Matrix Methods in Data Mining and Pattern Recognition, SIAM, Philadelphia, 2007.
[20] M. FAZEL, H. HINDI, AND S. P. BOYD, Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices, in Proceedings of the 2003 American Control Conference, IEEE, Piscataway, 2003, pp. 2156–2162.
[21] M. FAZEL, T. K. PONG, D. SUN, AND P. TSENG, Hankel matrix rank minimization with applications to system identification and realization, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 946–977.
[22] C. FRANK AND E. NÖTH, Optimizing eigenfaces by face masks for facial expression recognition, in Computer Analysis of Images and Patterns, N. Petkov and M. A. Westenberg, eds., vol. 2756 of Lecture Notes in Comput. Sci., Springer, Berlin, 2003, pp. 646–654.


Algorithm 9 Approximate SVD with Randomized Subspace Iteration
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k. Oversampling size p ≥ 0. Number of iterations q ≥ 1.
Outputs:
  U ∈ R^{m×k} contains the approximate top k left singular vectors of A.
  Σ ∈ R^{k×k} contains the approximate top k singular values of A.
  V ∈ R^{n×k} contains the approximate top k right singular vectors of A.
Algorithm:
  Generate an i.i.d. Gaussian matrix Ω ∈ N(0, 1)^{n×(k+p)}.
  Compute B = AΩ.
  [Q, ∼] = qr(B, 0)
  for i = 1 : q do
    B = A^T * Q;  [Q, ∼] = qr(B, 0)
    B = A * Q;    [Q, ∼] = qr(B, 0)
  end for
  B = Q^T * A
  [U, Σ, V] = svd(B)
  U = Q * U;  U = U(:, 1:k);  Σ = Σ(1:k, 1:k);  V = V(:, 1:k)
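For reference, Algorithm 9 translates directly into a few NumPy calls. The sketch below is ours; it uses reduced QR factorizations in place of the "first k+p columns of the full Q" formulation used in the flop count above, which does not change the computed subspace.

import numpy as np

def rsi_svd(A, k, p=5, q=1, seed=0):
    # Sketch of Algorithm 9: approximate SVD with randomized subspace iteration.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))    # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q)           # B = A^T Q, re-orthonormalize
        Q, _ = np.linalg.qr(A @ Q)             # B = A Q, re-orthonormalize
    B = Q.T @ A                                # (k+p) x n
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :].T        # top-k approximate SVD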

[23] G. W. FURNAS, S. DEERWESTER, S. T. DUMAIS, T. K. LANDAUER, R. A. HARSHMAN, L. A. STREETER, AND K. E. LOCHBAUM, Information retrieval using a singular value decomposition model of latent semantic structure, in Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Y. Chiaramella, ed., ACM, New York, 1988, pp. 465–480.
[24] K. GOLDBERG, T. ROEDER, D. GUPTA, AND C. PERKINS, Eigentaste: A constant time collaborative filtering algorithm, Inf. Retrieval, 4 (2001), pp. 133–151.
[25] G. GOLUB, Numerical methods for solving linear least squares problems, Numer. Math., 7 (1965), pp. 206–216.
[26] G. H. GOLUB AND C. F. VAN LOAN, Matrix Computations, 4th ed., Johns Hopkins University Press, Baltimore, 2013.
[27] M. GU, Subspace iteration randomization and singular value problems, SIAM J. Sci. Comput., 37 (2015), pp. A1139–A1173.
[28] M. GU AND S. C. EISENSTAT, Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869.
[29] E. T. HALE, W. YIN, AND Y. ZHANG, Fixed-point continuation for l1-minimization: methodology and convergence, SIAM J. Optim., 19 (2008), pp. 1107–1130.
[30] N. HALKO, P. G. MARTINSSON, AND J. A. TROPP, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217–288.
[31] R. A. HARSHMAN, Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multimodal factor analysis, UCLA Working Papers in Phonetics, 16 (1970), pp. 1–84.
[32] J. L. HERLOCKER, J. A. KONSTAN, A. BORCHERS, AND J. RIEDL, An algorithmic framework for performing collaborative filtering, in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, F. Gey, M. Hearst, and R. Tong, eds., ACM, New York, 1999, pp. 230–237.
[33] R. A. HORN AND C. R. JOHNSON, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1991.
[34] D. A. HUCKABY AND T. F. CHAN, On the convergence of Stewart's QLP algorithm for approximating the SVD, Numer. Algorithms, 32 (2003), pp. 287–316.
[35] D. A. HUCKABY AND T. F. CHAN, Stewart's pivoted QLP decomposition for low-rank matrices, Numer. Linear Algebra Appl., 12 (2005), pp. 153–159.
[36] W. B. JOHNSON AND J. LINDENSTRAUSS, Extensions of Lipschitz mappings into a Hilbert space, in Conference in Modern Analysis and Probability (New Haven, Conn., 1982), R. Beals, A. Beck, A. Bellow, and A. Hajian, eds., vol. 26 of Contemp. Math., Amer. Math. Soc., Providence, 1984, pp. 189–206.


[37] I. T. JOLLIFFE, Principal Component Analysis, Springer, New York, 1986.
[38] J. M. KLEINBERG, Authoritative sources in a hyperlinked environment, J. ACM, 46 (1999), pp. 604–632.
[39] T. G. KOLDA, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 243–255.
[40] T. G. KOLDA AND B. W. BADER, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455–500.
[41] R. M. LARSEN, PROPACK: Software for large and sparse SVD calculations, available at http://sun.stanford.edu/~rmunk/PROPACK/.
[42] R. M. LARSEN, Lanczos bidiagonalization with partial reorthogonalization, Tech. Report PB-537, Department of Computer Science, Aarhus University, Aarhus, Denmark, 1998.
[43] Y. LECUN, L. BOTTOU, Y. BENGIO, AND P. HAFFNER, Gradient-based learning applied to document recognition, Proc. IEEE, 86 (1998), pp. 2278–2324.
[44] R. B. LEHOUCQ, D. C. SORENSEN, AND C. YANG, ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods, SIAM, Philadelphia, 1998.
[45] Z. LIN, M. CHEN, AND Y. MA, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices, Preprint on arXiv, https://arxiv.org/abs/1009.5055, 2010.
[46] N. LINIAL, E. LONDON, AND Y. RABINOVICH, The geometry of graphs and some of its algorithmic applications, Combinatorica, 15 (1995), pp. 215–245.
[47] Z. LIU, A. HANSSON, AND L. VANDENBERGHE, Nuclear norm system identification with missing inputs and outputs, Systems Control Lett., 62 (2013), pp. 605–612.
[48] Z. LIU AND L. VANDENBERGHE, Interior-point method for nuclear norm approximation with application to system identification, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1235–1256.
[49] S. MA, D. GOLDFARB, AND L. CHEN, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program., 128 (2011), pp. 321–353.
[50] C. B. MELGAARD, Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra, PhD Thesis, University of California, Berkeley, 2015.
[51] M. MESBAHI AND G. P. PAPAVASSILOPOULOS, On the rank minimization problem over a positive semidefinite linear matrix inequality, IEEE Trans. Automat. Control, 42 (1997), pp. 239–243.
[52] N. MULLER, L. MAGAIA, AND B. M. HERBST, Singular value decomposition, eigenfaces, and 3D reconstructions, SIAM Rev., 46 (2004), pp. 518–545.
[53] M. NARWARIA AND W. LIN, SVD-based quality metric for image and video using machine learning, IEEE Trans. Syst., Man, and Cybernetics, Part B (Cybernetics), 42 (2012), pp. 347–364.
[54] T.-H. OH, Y. MATSUSHITA, Y.-W. TAI, AND I. SO KWEON, Fast randomized singular value thresholding for nuclear norm minimization, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015, IEEE Conference Proceedings, IEEE, Los Alamitos, 2015, pp. 4484–4493.
[55] G. QUINTANA-ORTÍ, X. SUN, AND C. H. BISCHOF, A BLAS-3 version of the QR factorization with column pivoting, SIAM J. Sci. Comput., 19 (1998), pp. 1486–1494.
[56] B. RECHT, M. FAZEL, AND P. A. PARRILO, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Rev., 52 (2010), pp. 471–501.
[57] J. D. RENNIE AND N. SREBRO, Fast maximum margin matrix factorization for collaborative prediction, in Proceedings of the 22nd International Conference on Machine Learning, L. De Raedt and S. Wrobel, eds., ACM Press, New York, 2005, pp. 713–719.
[58] V. ROKHLIN, A. SZLAM, AND M. TYGERT, A randomized algorithm for principal component analysis, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1100–1124.
[59] A. K. SAIBABA, HOID: higher order interpolatory decomposition for tensors based on Tucker representation, SIAM J. Matrix Anal. Appl., 37 (2016), pp. 1223–1249.
[60] B. SAVAS AND L. ELDÉN, Handwritten digit classification using higher order singular value decomposition, Pattern Recognition, 40 (2007), pp. 993–1003.
[61] R. SCHREIBER AND C. VAN LOAN, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 53–57.
[62] A. SHASHUA AND T. HAZAN, Non-negative tensor factorization with applications to statistics and computer vision, in Proceedings of the 22nd International Conference on Machine Learning, S. Dzeroski, L. De Raedt, and S. Wrobel, eds., ACM, New York, 2005, pp. 792–799.
[63] N. D. SIDIROPOULOS, R. BRO, AND G. B. GIANNAKIS, Parallel factor analysis in sensor array processing, IEEE Trans. Signal Process., 48 (2000), pp. 2377–2388.
[64] D. C. SORENSEN AND M. EMBREE, A DEIM induced CUR factorization, SIAM J. Sci. Comput., 38 (2016), pp. A1454–A1482.
[65] N. SREBRO, J. RENNIE, AND T. S. JAAKKOLA, Maximum-margin matrix factorization, in NIPS'04: Proceedings of the 17th International Conference on Neural Information Processing Systems, L. K. Saul, Y. Weiss, and L. Bottou, eds., MIT Press, Cambridge, 2004, pp. 1329–1336.
[66] G. W. STEWART, The QLP approximation to the singular value decomposition, SIAM J. Sci. Comput., 20 (1999), pp. 1336–1348.


[67] S. TAHERI, Q. QIU, AND R. CHELLAPPA, Structure-preserving sparse decomposition for facial expression analysis, IEEE Trans. Image Process., 23 (2014), pp. 3590–3603.
[68] K.-C. TOH AND S. YUN, An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems, Pac. J. Optim., 6 (2010), pp. 615–640.
[69] L. N. TREFETHEN AND D. BAU, III, Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[70] L. R. TUCKER, Some mathematical notes on three-mode factor analysis, Psychometrika, 31 (1966), pp. 279–311.
[71] N. VANNIEUWENHOVEN, R. VANDEBRIL, AND K. MEERBERGEN, A new truncation strategy for the higher-order singular value decomposition, SIAM J. Sci. Comput., 34 (2012), pp. A1027–A1052.
[72] M. A. O. VASILESCU AND D. TERZOPOULOS, Multilinear analysis of image ensembles: Tensorfaces, in European Conference on Computer Vision 2002, A. Heyden, G. Sparr, M. Nielsen, and P. Johansen, eds., vol. 2350 of Lecture Notes in Computer Science, Springer, Berlin, 2002, pp. 447–460.
[73] N. VERVLIET, O. DEBALS, L. SORBER, M. V. BAREL, AND L. DE LATHAUWER, Tensorlab user guide, available at http://www.tensorlab.net.
[74] J. XIAO AND M. GU, Spectrum-revealing Cholesky factorization for kernel methods, in Proceedings of the 16th IEEE International Conference on Data Mining (ICDM), IEEE Conference Proceedings, Los Alamitos, 2016, pp. 1293–1298.
[75] J. XIAO, M. GU, AND J. LANGOU, Fast parallel randomized QR with column pivoting algorithms for reliable low-rank matrix approximations, in Proceedings of the 24th IEEE International Conference on High Performance Computing (HiPC), IEEE Conference Proceedings, Los Alamitos, 2017, pp. 233–242.
[76] T. ZHANG AND G. H. GOLUB, Rank-one approximation to high order tensors, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 534–550.

Page 3: FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND …etna.mcs.kent.edu/vol.51.2019/pp469-494.dir/pp469-494.pdfminimization. In Section3, we introduce Flip-Flop SRQR and analyze its

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 471

Algorithm 1 Truncated Randomized QRCP (TRQRCP)InputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RktimesnPermutation matrix Π isin Rntimesn such that AΠ asymp Q ( 1 k)RAlgorithmGenerate iid Gaussian random matrix Ω isin N (0 1)

(b+p)timesmForm the initial sample matrix B = ΩA and initialize Π = Infor j = 1 b k dob = min (k minus j + 1 b)Do partial QRCP on B ( j n) to obtain b pivotsExchange corresponding columns in A B Π and WT Do QR on A (j m j j + bminus 1) using WY formula without updating the trailingmatrixUpdate B ( j + b n) using the formula (24)

end forQ = Q1Q2 middot middot middotQdkbe R = upper trapezoidal part of the submatrix A (1 k 1 n)

to compress A into B = ΩA with much smaller row dimension In practice b is the blocksize and p is the oversampling size RQRCP repeatedly runs partial QRCP on B to obtain bcolumn pivots applies them to the matrix A and then computes QR without pivoting (QRNP)on A and updates the remaining columns of B RQRCP exits this process when it reachesthe target rank k QRCP and RQRCP choose pivots on A and B respectively RQRCP issignificantly faster than QRCP as B has much smaller row dimension than A It is shownin [75] that RQRCP is as reliable as QRCP up to failure probabilities that decay exponentiallywith respect to the oversampling size p

Since the trailing matrix of A is usually not required for low-rank matrix approximations(see (22)) the TRQRCP (truncated RQRCP) algorithm of [17] re-organizes the computationsin RQRCP to directly compute the approximation (22) without explicitly computing thetrailing matrix R22 For more details both RQRCP and TRQRCP are based on the WYrepresentation of the Householder transformations [5 55 61]

Q = Q1Q2 middot middot middotQk = I minus Y TY T

where T isin Rktimesk is an upper triangular matrix and Y isin Rmtimesk is a trapezoidal matrix

consisting of k consecutive Householder vectors Let WT def= TTY TA then the trailing

matrix update formula becomes QTA = Aminus YWT The main difference between RQRCPand TRQRCP is that RQRCP computes the whole trailing matrix update while TRQRCP onlycomputes the part of the update that affects the approximation (22) More discussions aboutRQRCP and TRQRCP can be found in [17] The main steps of TRQRCP are briefly describedin Algorithm 1 The deduction of the updating formula for B is as follows consider partialQRs on AΠ and BΠ = ΩAΠ

(23) AΠ = Q

[R11 R12

R22

] BΠ = Q

[R11 R12

R22

]

ETNAKent State University and

Johann Radon Institute (RICAM)

472 Y FENG J XIAO AND M GU

Partition Ωdef= Q

TΩQ

def=[Ω1 Ω2

] and therefore (23) implies

Ω

[R11 R12

R22

]=

[R11 R12

R22

]

which leads to the updating formula for B in Algorithm 1

(24)[R12

R22

]larr Ω2R22 =

[R12 minusR11R

minus111 R12

R22

]

With TRQRCP the TUXV algorithm [17 Algorithm 7] (Algorithm 5 in the appendix)computes a low-rank approximation with the QLP factorization at a greatly accelerated speedby computing a partial QR factorization with column pivoting followed with a partial LQfactorization

22 Spectrum-revealing QR factorization Although both RQRCP and TRQRCP arevery effective practical tools for low-rank matrix approximations they are not known toprovide reliable low-rank matrix approximations due to their underlying greediness in thecolumn norm-based pivoting strategy To solve this potential problem of column-basedQR factorization Gu and Eisenstat [28] proposed an efficient way to perform additionalcolumn interchanges to enhance the quality of the leading k columns in AΠ as a basis for theapproximate column subspace More recently a more efficient and effective method SRQRwas introduced and analyzed in [50 75] to compute the low-rank approximation (22) Theconcept of spectrum-revealing introduced in [50 74] emphasizes the utilization of partialQR factorization (22) as a low-rank matrix approximation as opposed to the more traditionalrank-revealing factorization which emphasizes the utility of the partial QR factorization (21)as a tool for numerical rank determination The SRQR algorithm is described in Algorithm 2The SRQR algorithm will always run to completion with a high-quality low-rank matrixapproximation (22) For real data matrices that usually have fast decaying singular-valuespectrum this approximation is often as good as the truncated SVD The SRQR algorithmof [75] explicitly updates the partial QR factorization (21) while swapping columns but theSRQR algorithm can actually avoid any explicit computations on the trailing matrix R22 usingTRQRCP instead of RQRCP to obtain exactly the same partial QR initialization Below weoutline the SRQR algorithm

In (21) let

(25) Rdef=

[R11 a

α

]be the leading (l + 1)times (l + 1) (l ge k) submatrix of R We define

(26) g1def=R2212|α|

and g2def= |α|

∥∥∥RminusT∥∥∥12

where X12 is the largest column 2-norm of X for any given X In [75] the authorsproved approximation quality bounds involving g1 g2 for the low-rank approximation com-puted by RQRCP or TRQRCP RQRCP or TRQRCP will provide a good low-rank ma-

trix approximation if g1 and g2 are O(1) The authors also proved that g1 leradic

1+ε1minusε and

g2 leradic

2(1+ε)

1minusε

(1 +

radic1+ε1minusε

)lminus1

hold with high probability for both RQRCP and TRQRCPwhere 0 lt ε lt 1 is a user-defined parameter which guides the choice of the oversampling size

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 473

p For reasonably choices of ε like ε = 12 g1 is a small constant while g2 can potentially be an

extremely large number which can lead to a poor low-rank approximation quality To avoidthe potential large number of g2 the SRQR algorithm (Algorithm 2) proposed in [75] uses apair-wise swapping strategy to guarantee that g2 is below some user-defined tolerance g gt 1which is usually chosen to be a small number larger than one like 20 In Algorithm 2 weuse an approximate formula instead of using the definition (26) directly to estimate g2 ieg2 asymp |α|radicd

∥∥∥ΩRminusT∥∥∥

12 based on the Johnson-Lindenstrauss Lemma [36]

Algorithm 2 Spectrum-revealing QR Factorization (SRQR)InputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0Integer l ge k Tolerance g gt 1 for g2OutputsOrthogonal matrix Q isin Rmtimesm formed by the first k reflectorsUpper trapezoidal matrix R isin RktimesnPermutation matrix Π isin Rntimesn such that AΠ asymp Q ( 1 k)RAlgorithmCompute QRΠ with RQRCP or TRQRCP to l stepsCompute squared 2-norm of the columns of B( l + 1 n) ri (l + 1 le i le n) where Bis an updated random projection of A computed by RQRCP or TRQRCPApproximate squared 2-norm of the columns of A(l + 1 m l + 1 n) ri = ri(b +p) (l + 1 le i le n)ı = argmaxl+1leilenriSwap ı-th and (l + 1)-st columns of AΠ rOne-step QR factorization of A(l + 1 m l + 1 n)|α| = Rl+1l+1ri = ri minusA(l + 1 i)2 (l + 2 le i le n)Generate a random matrix Ω isin N (0 1)dtimes(l+1) (d l)Compute g2 = |α|

∥∥∥RminusT∥∥∥12asymp |α|radic

d

∥∥∥ΩRminusT∥∥∥

12

while g2 gt g doı = argmax1leilel+1ith column norm of ΩRminusT Swap ı-th and (l + 1)-st columns of A and Π in a Round Robin rotationGivens-rotate R back into upper-trapezoidal formrl+1 = R2

l+1l+1 ri = ri +A(l + 1 i)2 (l + 2 le i le n)ı = argmaxl+1leilenriSwap ı-th and (l + 1)-st columns of AΠ rOne-step QR factorization of A(l + 1 m l + 1 n)|α| = Rl+1l+1ri = ri minusA(l + 1 i)2 (l + 2 le i le n)Generate a random matrix Ω isin N (0 1)dtimes(l+1)(d l) Compute g2 = |α|

∥∥∥RminusT∥∥∥12asymp |α|radic

d

∥∥∥ΩRminusT∥∥∥

12

end while

23 Tensor approximation In this section we review some basic notations and conceptsinvolving tensors A more detailed discussion of the properties and applications of tensors can

ETNAKent State University and

Johann Radon Institute (RICAM)

474 Y FENG J XIAO AND M GU

be found in the review [40] A tensor is a d-dimensional array of numbers denoted by scriptnotation X isin RI1timesmiddotmiddotmiddottimesId with entries given by

xj1jd 1 le j1 le I1 1 le jd le Id

We use the matrix X(n) isin RIntimes(Πj 6=nIj) to denote the nth mode unfolding of the tensorX Since this tensor has d dimensions there are altogether d-possibilities for unfolding Then-mode product of a tensor X isin RI1timesmiddotmiddotmiddottimesId with a matrix U isin RktimesIn results in a tensorY isin RI1timesmiddotmiddotmiddottimesInminus1timesktimesIn+1timesmiddotmiddotmiddottimesId such that

yj1jnminus1jjn+1jd = (X timesn U)j1jnminus1jjn+1jd=

Insumjn=1

xj1jdujjn

Alternatively it can be expressed conveniently in terms of unfolded tensors

Y = X timesn U hArr Y(n) = UX(n)

Decompositions of higher-order tensors have applications in signal processing [12 15 63]numerical linear algebra [13 39 76] computer vision [62 72 67] etc Two particular tensordecompositions can be considered as higher-order extensions of the matrix SVD CANDE-COMPPARAFAC (CP) [10 31] decomposes a tensor as a sum of rank-one tensors andthe Tucker decomposition [70] is a higher-order form of the principal component analy-sis Knowing the definitions of mode products and unfolding of tensors we can introducethe sequentially truncated higher-order SVD (ST-HOSVD) algorithm for producing a rank(k1 kd) approximation to the tensor based on the Tucker decomposition format The ST-HOSVD algorithm [3 71] (Algorithm 6 in the appendix) returns a core tensor G isin Rk1timesmiddotmiddotmiddottimeskdand a set of unitary matrices Uj isin RIjtimeskj for j = 1 d such that

X asymp G times1 U1 middot middot middot timesd Ud

where the right-hand side is called a Tucker decomposition However a straightforwardgeneralization to higher-order (d ge 3) tensors of the matrix Eckart-Young-Mirsky Theorem isnot possible [14]

In ST-HOSVD one key step is to compute an exact or approximate rank-kr SVD ofthe tensor unfolding Well known efficient ways to compute an exact low-rank SVD includeKrylov subspace methods [44] There are also efficient randomized algorithms to find anapproximate low-rank SVD [30] In the Matlab tensorlab toolbox [73] the most efficientmethod MLSVD_RSI is essentially ST-HOSVD with an randomized subspace iteration tofind the approximate SVD of the tensor unfolding

24 Nuclear norm minimization Matrix rank minimization problem appears ubiqui-tously in many fields such as Euclidean embedding [20 46] control [21 51 56] collaborativefiltering [9 57 65] system identification [47 48] etc The regularized nuclear norm mini-mization problem is the following

minXisinRmtimesn

f (X) + τXlowast

where τ gt 0 is a regularization parameter and Xlowastdef=sumqi=1 σi (X) The choice of the

function f (middot) is situational f (X) = M minus X1 in robust principal component analysis(robust PCA) [8] and f (X) = πΩ (M)minus πΩ (X) ||2F in matrix completion [7]

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 475

Many researchers have devoted themselves to solve the above nuclear norm minimizationproblem and plenty of algorithms have been proposed including singular value thresholding(SVT) [7] fixed point continuous (FPC) [49] accelerated proximal gradient (APG) [68]augmented Lagrange multiplier (ALM) [45] The most expensive part of these algorithms liesin the computation of the truncated SVD Inexact augmented Lagrange multiplier (IALM) [45]has been proved to be one of the most accurate and efficient among them We now describeIALM for the robust PCA and matrix completion problems

The robust PCA problem can be formalized as a minimization problem of a sum of nuclearnorm and a scaled matrix l1-norm (sum of matrix entries in absolute value)

(27) min Xlowast + λE1 subject to M = X + E

where M is a measured matrix X has low-rank E is an error matrix and sufficiently sparseand λ is a positive weighting parameter Algorithm 7 in the appendix describes details of theIALM method to solve the robust PCA problem [45]

The matrix completion problem [9 45] can be written in the form

(28) minXisinRmtimesn

Xlowast subject to X + E = M πΩ (E) = 0

where πΩ Rmtimesn rarr Rmtimesn is an orthogonal projection that keeps the entries in Ω unchangedand sets those outside Ω zeros In [45] the authors applied the IALM method to the matrixcompletion problem (Algorithm 8 in the appendix)

3 Flip-Flop SRQR factorization

31 Flip-Flop SRQR factorization In this section we introduce our Flip-Flop SRQRfactorization a slightly different variant of TUXV (Algorithm 5 in the appendix) to computean SVD approximation based on the QLP factorization Given an integer l ge k we run SRQR(the version without computing the trailing matrix) with l steps on A

(31) AΠ = QR = Q

[R11 R12

R22

]

where R11 isin Rltimesl is upper triangular R12 isin Rltimes(nminusl) and R22 isin R(mminusl)times(nminusl) Then werun partial QRNP with l steps on RT

(32) RT =

[RT11

RT12 RT22

]= Q

[R11 R12

R22

]asymp Q1

[R11 R12

]

where Q =[Q1 Q2

]with Q1 isin Rntimesl Therefore combining with the fact that

AΠQ1 = Q[R11 R12

]T

we can approximate the matrix A by

(33) A = QRΠT = Q(RT)T

ΠT asymp Q

[RT11

RT12

]QT1 ΠT = A

(ΠQ1

)(ΠQ1

)T

We denote the rank-k truncated SVD ofAΠQ1 by UkΣkVTk LetUk = Uk Vk = ΠQ1Vk

then using (33) a rank-k approximate SVD of A is obtained

(34) A asymp UkΣkVTk

where Uk isin Rmtimesk Vk isin Rntimesk are column orthonormal and Σk = Diag (σ1 middot middot middot σk) withσirsquos being the leading k singular values of AΠQ1 The Flip-Flop SRQR factorization isoutlined in Algorithm 3

ETNAKent State University and

Johann Radon Institute (RICAM)

476 Y FENG J XIAO AND M GU

Algorithm 3 Flip-Flop Spectrum-Revealing QR FactorizationInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0Integer l ge k Tolerance g gt 1 for g2OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmRun SRQR on A to l steps to obtain (R11 R12)Run QRNP on (R11 R12)

T to obtain Q1 represented by a sequence of Householder vectors

A = AΠQ1[U Σ V ] = svd

(A)

U = U ( 1 k) Σ = Σ (1 k 1 k) V = ΠQ1V ( 1 k)

32 Complexity analysis In this section we perform a complexity analysis of Flip-Flop SRQR Since the approximate SVD only makes sense when the target rank k is smallwe assume that k le l min (mn) The complexity analysis of Flip-Flop SRQR looks asfollows

1 The main cost of doing SRQR with TRQRCP onA is 2mnl+2(b+p)mn+(m+n)l22 The main cost of the QR factorization for [R11 R12]

T and forming Q1 is 2nl2minus 23 l

33 The main cost of computing A = AΠQ1 is 2mnl4 The main cost of computing [Usimsim] = svd

(A)

is O(ml2)5 The main cost of forming Vk is 2nlk

Since k le l min (mn) the complexity of Flip-Flop SRQR is 4mnl + 2(b+ p)mnby omitting the lower-order terms

On the other hand the complexity of the approximate SVD with randomized subspaceiteration (RSISVD) [27 30] is (4 + 4q)mn (k + p) where p is the oversampling size and qis the number of subspace iterations (see the detailed analysis in the appendix) In practice pis chosen to be a small integer for instance 5 in RSISVD In Flip-Flop SRQR l is usuallychosen to be a little bit larger than k eg l = k + 5 We also found that l = k is sufficient interms of the approximation quality in our numerical experiments Therefore we observe thatFlip-Flop SRQR is more efficient than RSISVD for any q gt 0

Since a practical dataset matrix can be too large to fit into the main memory the cost oftransferring the matrix into a memory hierarchy typically predominates the arithmetic costTherefore we also compare the number of data passes between RSISVD and Flip-Flop SRQRRSISVD needs 2q + 2 passes while Flip-Flop SRQR needs 4 passes computing B = ΩAswapping the columns of A realizing a partial QR factorization on A and computing AΠQ1Therefore Flip-Flop SRQR requires less data transfer than RSISVD for any q gt 1

33 Quality analysis of Flip-Flop SRQR This section is devoted to the quality analysisof Flip-Flop SRQR We start with Lemma 31

LEMMA 31 Given any matrix X = [X1X2] with Xi isin Rmtimesni i = 1 2 and n1 +n2 = n

σj (X)2 le σj (X1)

2+ X222 1 le j le min(mn)

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 477

Proof Since XXT = X1XT1 + X2X

T2 we obtain the above result using [33 Theo-

rem 3316]

We are now ready to derive bounds for the singular values and the approximation errorof Flip-Flop SRQR We need to emphasize that even if the target rank is k we run Flip-FlopSRQR with an actual target rank l which is a little bit larger than k The difference betweenk and l can create a gap between the singular values of A so that we can obtain a reliablelow-rank approximation

THEOREM 32 Given a matrix A isin Rmtimesn a target rank k an oversampling size p andan actual target rank l ge k the matrices Uk Σk Vk in (34) computed by Flip-Flop SRQRsatisfy

(35) σj(Σk) ge σj(A)

4

radic1 +

2R2242σ4j (Σk)

1 le j le k

and

(36)∥∥Aminus UkΣkV

Tk

∥∥2le σk+1 (A)

4

radic1 + 2

(R222σk+1 (A)

)4

where R22 isin R(mminusl)times(nminusl) is the trailing matrix in (31) Using the properties of SRQR wefurthermore have

(37) σj (Σk) ge σj (A)

4

radic1 + min

(2τ4 τ4 (2 + 4τ4)

(σl+1(A)σj(A)

)4) 1 le j le k

and

(38)∥∥Aminus UkΣkV

Tk

∥∥2le σk+1 (A)

4

radic1 + 2τ4

(σl+1 (A)

σk+1 (A)

)4

where τ and τ defined in (312) have matrix dimension-dependent upper bounds

τ le g1g2

radic(l + 1) (nminus l) and τ le g1g2

radicl (nminus l)

where g1 leradic

1+ε1minusε and g2 le g with high probability The parameters ε gt 0 and g gt 1 are

user-defined

Proof In terms of the singular value bounds observing that

[R11 R12

R22

][R11 R12

R22

]T=

[R11R

T11 + R12R

T12 R12R

T22

R22RT12 R22R

T22

]

ETNAKent State University and

Johann Radon Institute (RICAM)

478 Y FENG J XIAO AND M GU

we apply Lemma 31 twice for any 1 le j le k

σ2j

[R11 R12

R22

][R11 R12

R22

]Tle σ2

j

([R11R

T11 + R12R

T12 R12R

T22

])+∥∥∥[R22R

T12 R22R

T22

]∥∥∥2

2

le σ2j

(R11R

T11 + R12R

T12

)+∥∥∥R12R

T22

∥∥∥2

2+∥∥∥[R22R

T12 R22R

T22

]∥∥∥2

2

le σ2j

(R11R

T11 + R12R

T12

)+ 2

∥∥∥∥∥[R12

R22

]∥∥∥∥∥4

2

= σ2j

(R11R

T11 + R12R

T12

)+ 2 R2242 (39)

The relation (39) can be further rewritten as

σ4j (A) le σ4

j

([R11 R12

])+ 2 R2242 = σ4

j (Σk) + 2 R2242 1 le j le k

which is equivalent to

σj(Σk) ge σj(A)

4

radic1 +

2R2242σ4j (Σk)

For the residual matrix bound we let[R11 R12

]def=[R11 R12

]+[δR11 δR12

]

where[R11 R12

]is the rank-k truncated SVD of

[R11 R12

] Since

(310)∥∥Aminus UkΣkV

Tk

∥∥2

=

∥∥∥∥∥AΠminusQ[R11 R12

0

]TQT

∥∥∥∥∥2

it follows from the orthogonality of singular vectors that[R11 R12

]T [δR11 δR12

]= 0

and therefore[R11 R12

]T [R11 R12

]=[R11 R12

]T [R11 R12

]+[δR11 δR12

]T [δR11 δR12

]

which implies

(311) RT12R12 = RT

12R12 +(δR12

)T (δR12

)

Similarly to the deduction of (39) from (311) we can derive∥∥∥∥[δR11 δR12

R22

]∥∥∥∥4

2

le∥∥[δR11 δR12

]∥∥4

2+ 2

∥∥∥∥[δR12

R22

]∥∥∥∥4

2

le σ4k+1 (A) + 2

∥∥∥∥∥[R12

R22

]∥∥∥∥∥4

2

= σ4k+1 (A) + 2 R2242

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 479

Combining with (310) it now follows that

∥∥Aminus UkΣkVTk

∥∥2

=

∥∥∥∥∥AΠminusQ[R11 R12

0

]TQT

∥∥∥∥∥2

=

∥∥∥∥[δR11 δR12

R22

]∥∥∥∥2

le σk+1 (A)4

radic1 + 2

(R222σk+1 (A)

)4

To obtain an upper bound of R222 in (35) and (36) we follow the analysis of SRQRin [75]

From the analysis of [75 Section IV] Algorithm 2 ensures that g1 leradic

1+ε1minusε and g2 le g

with high probability (the actual probability-guarantee formula can be found in [75 SectionIV]) where g1 and g2 are defined by (26) Here 0 lt ε lt 1 is a user defined parameter toadjust the choice of the oversampling size p used in the TRQRCP initialization part in SRQRg gt 1 is a user-defined parameter in the extra swapping part in SRQR Let

(312) τdef= g1g2

R222R2212

∥∥∥RminusT∥∥∥minus1

12

σl+1 (A)and τ

def= g1g2

R222R2212

∥∥RminusT11

∥∥minus1

12

σk (Σk)

where R is defined in equation (25) Since 1radicnX12 le X2 le

radicnX12 and

σi (X1) le σi (X) 1 le i le min (s t) for any matrix X isin Rmtimesn and submatrix X1 isin Rstimestof X we obtain that

τ = g1g2R222R2212

∥∥∥RminusT∥∥∥minus1

12

σl+1

(R) σl+1

(R)

σl+1 (A)le g1g2

radic(l + 1) (nminus l)

Using the fact that σl([R11 R12

])= σl

(R11

)and σk (Σk) = σk

([R11 R12

])by (32)

and (34)

τ = g1g2R222R2212

∥∥RminusT11

∥∥minus1

12

σl (R11)

σl (R11)

σl([R11 R12

]) σl ([R11 R12

])σk (Σk)

le g1g2

radicl (nminus l)

By the definition of τ

(313) R222 = τ σl+1 (A)

Plugging this into (36) yields (38)By the definition of τ we observe that

(314) R222 le τσk (Σk)

By (35) and (314)(315)

σj (Σk) ge σj (A)

4

radic1 + 2

(R222σj(Σk)

)4ge σj (A)

4

radic1 + 2

(R222σk(Σk)

)4ge σj (A)

4radic

1 + 2τ4 1 le j le k

ETNAKent State University and

Johann Radon Institute (RICAM)

480 Y FENG J XIAO AND M GU

On the other hand using (35)

σ4j (A) le σ4

j (Σk)

(1 + 2

σ4j (A)

σ4j (Σk)

R2242σ4j (A)

)

le σ4j (Σk)

1 + 2

(σ4j (Σk) + 2 R2242

)σ4j (Σk)

R2242σ4j (A)

le σ4

j (Σk)

(1 + 2

(1 + 2

R2242σ4k (Σk)

)R2242σ4j (A)

)

that is

σj (Σk) ge σj (A)

4

radic1 +

(2 + 4

R2242σ4k(Σk)

)R2242σ4j (A)

Plugging (313) and (314) into this above equation yields

(316) σj (Σk) ge σj (A)

4

radic1 + τ4 (2 + 4τ4)

(σl+1(A)σj(A)

)4 1 le j le k

Combining (315) with (316) we arrive at (37)We note that (35) and (36) still hold true if we replace k by lInequality (37) shows that with the definitions (312) of τ and τ Flip-Flop SRQR can

reveal at least a dimension-dependent fraction of all the leading singular values of A andindeed approximate them very accurately in case they decay relatively quickly Moreover (38)shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of4

radic1 + 2τ4

(σl+1(A)σk+1(A)

)4

from optimal In situations where the singular values of A decay

relatively quickly our rank-k approximation is about as accurate as the truncated SVD with achoice of l such that

σl+1 (A)

σk+1 (A)= o (1)

REMARK 33 The ldquohigh probabilityrdquo mentioned in Theorem 32 is dependent on theoversampling size p but not on the block size b We decided not to dive deep into the probabilityexplanation since it will make the proof much longer and more tedious Theoretically thecomputed oversampling size p for certain given high probability would be rather high Forexample to achieve a probability 095 we need to choose p to be at least 500 However inpractice we would never choose such a large p since it will deteriorate the code efficiencyIt turns out that a small p like 5 sim 10 would achieve good approximation results empiricallyThe detailed explanation about what does high probability mean here can be found in [75]In terms of the choice of the block size b it wonrsquot affect the analysis or the error boundsThe choice of b is purely for the performance of the codes Only a suitable b can take fulladvantage of the cache hierarchy and BLAS routines

4 Numerical experiments In this section we demonstrate the effectiveness and effi-ciency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments Firstly

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 481

TABLE 41Methods for approximate SVD

Method DescriptionLANSVD Approximate SVD using Lanczos bidiagonalization with partial

reorthogonalization [42]Its Fortran and Matlab implementations are from PROPACK [41]

FFSRQR Flip-Flop Spectrum-revealing QR factorizationIts Matlab implementation is using Mex-wrappedBLAS and LAPACK routines [2]

RSISVD Approximate SVD with randomized subspace iteration [30 27]Its Matlab implementation is from tensorlab toolbox [73]

TUXV Algorithm 5 in the appendixLTSVD Linear Time SVD [16]

We use its Matlab implementation by Ma et al [49]

we compare FFSRQR with other approximate SVD algorithms for a matrix approximationSecondly we compare FFSRQR with other methods on tensor approximation problems usingthe tensorlab toolbox [73] Thirdly we compare FFSRQR with other methods for the robustPCA problem and matrix completion problem All experiments are implemented on a laptopwith 29 GHz Intel Core i5 processor and 8 GB of RAM The codes based on LAPACK [2]and PROPACK [41] in Section 41 are in Fortran The codes in Sections 42 43 are in Matlabexcept that FFSRQR is implemented in Fortran and wrapped by mex

41 Approximate truncated SVD In this section we compare FFSRQR with otherapproximate SVD algorithms for low-rank matrix approximation All tested methods are listedin Table 41 Since LTSVD has only a Matlab implementation this method is only consideredin Sections 42 and 43 The test matrices areType 1 A isin Rmtimesn [66] is defined by A = U DV T + 01 middot dss middotE where U isin Rmtimess V isin

Rntimess are column-orthonormal matrices and D isin Rstimess is a diagonal matrix with sgeometrically decreasing diagonal entries from 1 to 10minus3 dss is the last diagonalentry of D E isin Rmtimesn is a random matrix where the entries are independentlysampled from a normal distribution N (0 1) In our numerical experiment we testthree different random matrices The square matrix has a size of 10000times 10000 theshort-fat matrix has a size of 1000times 10000 and the tall-skinny matrix has a size of10000times 1000

Type 2 A isin R4929times4929 is a real data matrix from the University of Florida sparse matrixcollection [11] Its corresponding file name is HBGEMAT11

For a given matrix A isin Rmtimesn the relative SVD approximation error is measured by∥∥Aminus UkΣkVTk

∥∥FAF where Σk contains the approximate top k singular values and

Uk Vk are the corresponding approximate top k singular vectors The parameters used inFFSRQR RSISVD TUXV and LTSVD are listed in Table 42 The additional time requiredto compute g2 is negligible In our experiments g2 always remains modest and never triggerssubsequent SRQR column swaps However computing g2 is still recommended since g2

serves as an insurance policy against potential quality degration by TRQRCP The block size bis chosen to be 32 according to the xGEQP3 LAPACK routine When k is less than 32 weimplement an unblock version of the partial QRCP

Figures 41ndash43 display the run time the relative approximation error and the top 20singular values comparison respectively for four different matrices In terms of speed FFSRQRis always faster than LANSVD and RSISVD Moreover the efficiency difference is more

ETNAKent State University and

Johann Radon Institute (RICAM)

482 Y FENG J XIAO AND M GU

TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD

Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n

obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV

100 200 300 400 500

target rank k

100

101

102

103

104

run tim

e (

sec)

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

100 200 300 400 500

target rank k

10-1

100

101

102

run tim

e (

sec)

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

100 200 300 400 500

target rank k

10-1

100

101

102

run tim

e (

sec)

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

100 200 300 400 500

target rank k

10-1

100

101

102

103

run tim

e (

sec)

FFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 41 Run time comparison for approximate SVD algorithms

42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483

100 200 300 400 500

target rank k

022

024

026

028

03

032

034

036re

lative

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

100 200 300 400 500

target rank k

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

100 200 300 400 500

target rank k

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

100 200 300 400 500

target rank k

024

026

028

03

032

034

036

rela

tive

ap

pro

xim

atio

n e

rro

rFFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 42 Relative approximation error comparison for approximate SVD algorithms

algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab

421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]

X =

10sumj=1

1000

jxj yj zj +

nsumj=11

1

jxj yj zj

where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2

U2 times3 U3

ETNAKent State University and

Johann Radon Institute (RICAM)

484 Y FENG J XIAO AND M GU

5 10 15 20

index

06

07

08

09

1to

p 2

0 s

ingula

r valu

es

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

5 10 15 20

index

06

07

08

09

1

top 2

0 s

ingula

r valu

es

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

5 10 15 20

index

06

065

07

075

08

085

09

095

top

20

sin

gu

lar

va

lue

s

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

5 10 15 20

index

101

102

103

top

20

sin

gu

lar

va

lue

sFFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 43 Top 20 singular values comparison for approximate SVD algorithms

Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger

422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]

We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3

where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method

MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485

50 100 150 200

target rank k

10-5

10-4

10-3

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

MLSVD

MLSVD_FFSRQR

MLSVD_RSI

MLSVD_LTSVD

50 100 150 200

target rank k

10-2

10-1

100

101

102

run tim

e (

sec)

MLSVD

MLSVD_FFSRQR

MLSVD_RSI

MLSVD_LTSVD

FIG 44 Run time and relative approximation error comparison on a sparse tensor

TABLE 43Comparison on handwritten digits classification

MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD

Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259

In terms of classification quality, MLSVD, MLSVD_FFSRQR, and MLSVD_RSI are comparable, while MLSVD_LTSVD is the least accurate one.

4.3. Solving the nuclear norm minimization problem. To show the effectiveness of the FFSRQR algorithm for nuclear norm minimization problems, we investigate two scenarios: the robust PCA problem (2.7) and the matrix completion problem (2.8). The test matrix used for robust PCA is introduced in [45], and the test matrices used in matrix completion are two real data sets. We use the IALM method [45] to solve both problems; the IALM code can be downloaded¹.

4.3.1. Robust PCA. To solve the robust PCA problem, we replace the approximate SVD part in the IALM method [45] by various methods. We denote the actual solution to the robust PCA problem by a matrix pair (X*, E*) ∈ R^{m×n} × R^{m×n}. The matrix X* is X* = X_L X_R^T, with X_L ∈ R^{m×k} and X_R ∈ R^{n×k} being random matrices whose entries are independently sampled from a normal distribution. The sparse matrix E* is a random matrix whose non-zero entries are independently sampled from a uniform distribution over the interval [−500, 500]. The input to the IALM algorithm has the form M = X* + E*, and the output is denoted by (X̂, Ê). In this numerical experiment, we use the same parameter settings as the IALM code for robust PCA: the rank k is 0.1m, and the number of non-zero entries in E is 0.05m². We choose the trade-off parameter λ = 1/√(max(m, n)), as suggested by Candès et al. [8]. The solution quality is measured by the normalized root mean square error ‖X̂ − X*‖_F / ‖X*‖_F.

Table 4.4 reports the relative error, the run time, the number of non-zero entries in Ê (‖Ê‖_0), the iteration count, and the number of non-zero singular values (#sv) in X̂ of the IALM algorithm using different approximate SVD methods.

¹ http://perception.csl.illinois.edu/matrix-rank/sample_code.html


We observe that IALM_FFSRQR is faster than the other three methods, while its error is comparable to that of IALM_LANSVD and IALM_RSISVD. IALM_LTSVD is relatively slow and not effective.
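The synthetic test problem described above can be generated along the following lines; this is only a sketch under the stated settings (rank 0.1m, 0.05m² non-zeros in E*, uniform values in [−500, 500], λ = 1/√max(m, n)), and the function name and random seed are illustrative.

    import numpy as np

    def rpca_test_problem(m, n, seed=0):
        rng = np.random.default_rng(seed)
        k = int(0.1 * m)
        Xstar = rng.standard_normal((m, k)) @ rng.standard_normal((n, k)).T
        Estar = np.zeros((m, n))
        nnz = int(0.05 * m * m)          # 0.05 m^2 non-zeros (the test matrices are square)
        idx = rng.choice(m * n, size=nnz, replace=False)
        Estar.flat[idx] = rng.uniform(-500.0, 500.0, size=nnz)
        lam = 1.0 / np.sqrt(max(m, n))   # trade-off parameter suggested in [8]
        return Xstar + Estar, Xstar, Estar, lam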

TABLE 4.4
Comparison on robust PCA

Size          Method         Error      Time (sec)   ‖Ê‖_0      Iter   #sv
1000 × 1000   IALM_LANSVD    3.33e-07   5.79e+00        50000    22    100
              IALM_FFSRQR    2.79e-07   1.02e+00        50000    25    100
              IALM_RSISVD    3.36e-07   1.09e+00        49999    22    100
              IALM_LTSVD     9.92e-02   3.11e+00       999715   100    100
2000 × 2000   IALM_LANSVD    2.61e-07   5.91e+01       199999    22    200
              IALM_FFSRQR    1.82e-07   6.93e+00       199998    25    200
              IALM_RSISVD    2.63e-07   7.38e+00       199996    22    200
              IALM_LTSVD     8.42e-02   2.20e+01      3998937   100    200
4000 × 4000   IALM_LANSVD    1.38e-07   4.65e+02       799991    23    400
              IALM_FFSRQR    1.39e-07   4.43e+01       800006    26    400
              IALM_RSISVD    1.51e-07   5.04e+01       799990    23    400
              IALM_LTSVD     8.94e-02   1.54e+02     15996623   100    400
6000 × 6000   IALM_LANSVD    1.30e-07   1.66e+03      1799982    23    600
              IALM_FFSRQR    1.02e-07   1.42e+02      1799993    26    600
              IALM_RSISVD    1.44e-07   1.62e+02      1799985    23    600
              IALM_LTSVD     8.58e-02   5.55e+02     35992605   100    600

4.3.2. Matrix completion. We solve the matrix completion problems for two real data sets used in [68]: the Jester joke data set [24] and the MovieLens data set [32]. The Jester joke data set consists of 4.1 million ratings for 100 jokes from 73,421 users and can be downloaded from the website http://goldberg.berkeley.edu/jester-data/. We test with the following data matrices:

• jester-1: data from 24,983 users who have rated 36 or more jokes;
• jester-2: data from 23,500 users who have rated 36 or more jokes;
• jester-3: data from 24,938 users who have rated between 15 and 35 jokes;
• jester-all: the combination of jester-1, jester-2, and jester-3.

The MovieLens data set can be downloaded from https://grouplens.org/datasets/movielens. We test with the following data matrices:

• movie-100K: 100,000 ratings of 943 users for 1682 movies;
• movie-1M: 1 million ratings of 6040 users for 3900 movies;
• movie-latest-small: 100,000 ratings of 700 users for 9000 movies.

For each data set, we let M be the original data matrix, where M_ij stands for the rating of joke (movie) j by user i, and let Γ be the set of indices where M_ij is known. The matrix completion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE), defined by

\[
  \mathrm{NMAE} \;\stackrel{\mathrm{def}}{=}\;
  \frac{\frac{1}{|\Gamma|}\sum_{(i,j)\in\Gamma}\,|M_{ij}-X_{ij}|}{r_{\max}-r_{\min}},
\]


where X_ij is the prediction of the rating of joke (movie) j given by user i, and r_min, r_max are the lower and upper bounds of the ratings, respectively. For the Jester joke data sets we set r_min = −10 and r_max = 10. For the MovieLens data sets we set r_min = 1 and r_max = 5.
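A direct translation of the NMAE definition, assuming M and X are dense arrays and Gamma is given as a list of known (i, j) index pairs:

    import numpy as np

    def nmae(M, X, Gamma, rmin, rmax):
        rows, cols = np.array(Gamma).T
        return np.abs(M[rows, cols] - X[rows, cols]).mean() / (rmax - rmin)

    # e.g., for the Jester data sets: nmae(M, X, Gamma, rmin=-10, rmax=10)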

Since |Γ| is large, we randomly select a subset Ω from Γ and then use Algorithm 8 to solve problem (2.8). We randomly select 10 ratings for each user in the Jester joke data sets, while we randomly choose about 50% of the ratings for each user in the MovieLens data sets. Table 4.5 lists the parameter settings used in the algorithms. The maximum iteration number is 100 in IALM, and all other parameters are the same as those used in [45].

The numerical results are included in Table 4.6. We observe that IALM_FFSRQR achieves almost the same recoverability as the other methods (except IALM_LTSVD) and is slightly faster than IALM_RSISVD for these two data sets.

TABLE 4.5
Parameters used in the IALM method on matrix completion

Data set             m       n      |Γ|        |Ω|
jester-1             24983   100    1.81e+06   249830
jester-2             23500   100    1.71e+06   235000
jester-3             24938   100    6.17e+05   249384
jester-all           73421   100    4.14e+06   734210
movie-100K           943     1682   1.00e+05   49918
movie-1M             6040    3706   1.00e+06   498742
movie-latest-small   671     9066   1.00e+05   52551

5. Conclusions. We have presented the Flip-Flop SRQR factorization, a variant of the QLP factorization, for computing low-rank matrix approximations. The Flip-Flop SRQR algorithm uses an SRQR factorization to initialize the truncated version of a column-pivoted QR factorization and then forms an LQ factorization. For the numerical results presented, the errors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms. This new algorithm is cheaper to compute and produces quality low-rank matrix approximations. Furthermore, we prove singular value lower bounds and residual error upper bounds for the Flip-Flop SRQR factorization. In situations where the singular values of the input matrix decay relatively quickly, the low-rank approximation computed by Flip-Flop SRQR is guaranteed to be as accurate as the truncated SVD. We also perform a complexity analysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomized subspace iteration. Future work includes reducing the overhead cost in Flip-Flop SRQR and implementing the Flip-Flop SRQR algorithm on distributed memory machines for popular applications such as distributed PCA.

Acknowledgments. The authors would like to thank Yousef Saad, the editor, and three anonymous referees, who kindly reviewed an earlier version of this manuscript and provided valuable suggestions and comments.

Appendix A. Partial QR factorization with column pivoting. The details of partial QRCP are covered in Algorithm 4.

Appendix B. TUXV algorithm. The details of the TUXV algorithm are presented in Algorithm 5.

Appendix C. Computing the Tucker decomposition of higher-order tensors. Algorithm 6 computes the Tucker decomposition of higher-order tensors, i.e., the sequentially truncated higher-order SVD.


TABLE 4.6
Comparison on matrix completion

Data set             Method         Iter   Time (sec)   NMAE       #sv   σ_max      σ_min
jester-1             IALM-LANSVD    12     7.06e+00     1.84e-01   100   2.14e+03   1.00e+00
                     IALM-FFSRQR    12     3.44e+00     1.69e-01   100   2.28e+03   1.00e+00
                     IALM-RSISVD    12     3.75e+00     1.89e-01   100   2.12e+03   1.00e+00
                     IALM-LTSVD     100    2.11e+01     1.74e-01    62   3.00e+03   1.00e+00
jester-2             IALM-LANSVD    12     6.80e+00     1.85e-01   100   2.13e+03   1.00e+00
                     IALM-FFSRQR    12     2.79e+00     1.70e-01   100   2.29e+03   1.00e+00
                     IALM-RSISVD    12     3.59e+00     1.91e-01   100   2.12e+03   1.00e+00
                     IALM-LTSVD     100    2.03e+01     1.75e-01    58   2.96e+03   1.00e+00
jester-3             IALM-LANSVD    12     7.05e+00     1.26e-01    99   1.79e+03   1.00e+00
                     IALM-FFSRQR    12     3.03e+00     1.22e-01   100   1.71e+03   1.00e+00
                     IALM-RSISVD    12     3.85e+00     1.31e-01   100   1.78e+03   1.00e+00
                     IALM-LTSVD     100    2.12e+01     1.33e-01    55   2.50e+03   1.00e+00
jester-all           IALM-LANSVD    12     2.39e+01     1.72e-01   100   3.56e+03   1.00e+00
                     IALM-FFSRQR    12     1.12e+01     1.62e-01   100   3.63e+03   1.00e+00
                     IALM-RSISVD    12     1.34e+01     1.82e-01   100   3.47e+03   1.00e+00
                     IALM-LTSVD     100    6.99e+01     1.68e-01    52   4.92e+03   1.00e+00
movie-100K           IALM-LANSVD    29     2.86e+01     1.83e-01   285   1.21e+03   1.00e+00
                     IALM-FFSRQR    30     4.55e+00     1.67e-01   295   1.53e+03   1.00e+00
                     IALM-RSISVD    29     4.82e+00     1.82e-01   285   1.29e+03   1.00e+00
                     IALM-LTSVD     48     1.42e+01     1.47e-01   475   1.91e+03   1.00e+00
movie-1M             IALM-LANSVD    50     7.40e+02     1.58e-01   495   4.99e+03   1.00e+00
                     IALM-FFSRQR    53     2.07e+02     1.37e-01   525   6.63e+03   1.00e+00
                     IALM-RSISVD    50     2.23e+02     1.57e-01   495   5.35e+03   1.00e+00
                     IALM-LTSVD     100    8.50e+02     1.17e-01   995   8.97e+03   1.00e+00
movie-latest-small   IALM-LANSVD    31     1.66e+02     1.85e-01   305   1.13e+03   1.00e+00
                     IALM-FFSRQR    31     1.96e+01     2.00e-01   305   1.42e+03   1.00e+00
                     IALM-RSISVD    31     2.85e+01     1.91e-01   305   1.20e+03   1.00e+00
                     IALM-LTSVD     63     4.02e+01     2.08e-01   298   1.79e+03   1.00e+00

Appendix D. IALM for solving two special nuclear norm minimization problems. Algorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems, respectively. In Algorithm 7, S_ω(x) = sgn(x) · max(|x| − ω, 0) is the soft shrinkage operator [29] with x ∈ R^n and ω > 0. In Algorithm 8, Ω̄ is the complement of Ω.
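The soft shrinkage operator is elementwise and can be written in one line; the sketch below also indicates (as a comment) how it acts on the vector of singular values in the X update of Algorithms 7 and 8.

    import numpy as np

    def soft_shrink(x, omega):
        # S_omega(x) = sgn(x) * max(|x| - omega, 0), applied entrywise
        return np.sign(x) * np.maximum(np.abs(x) - omega, 0.0)

    # X update of Algorithms 7/8, given SVD factors U, s, Vt and penalty mu:
    # X_next = (U * soft_shrink(s, 1.0 / mu)) @ Vt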

Appendix E. Approximate SVD with randomized subspace iteration. The randomized subspace iteration was proposed in [30, Algorithm 4.4] to compute an orthonormal matrix whose range approximates the range of A. An approximate SVD can then be computed using this orthonormal matrix [30, Algorithm 5.1]. A randomized subspace iteration is used in the routine MLSVD_RSI in the Matlab toolbox tensorlab [73], and MLSVD_RSI is by far the most efficient function that we can find in Matlab to compute the ST-HOSVD. We summarize the approximate SVD with randomized subspace iteration in Algorithm 9.

Now we perform a complexity analysis for the approximate SVD with randomized subspace iteration (Algorithm 9). We first note that:
1. The cost of generating a random matrix is negligible.
2. The cost of computing B = AΩ is 2mn(k + p).
3. In each QR step [Q, ∼] = qr(B, 0), the cost of computing the QR factorization of B is 2m(k + p)² − (2/3)(k + p)³ (cf. [69]), and the cost of forming the first (k + p) columns in the full Q matrix is m(k + p)² + (1/3)(k + p)³.


Algorithm 4 Partial QRCP
Inputs:
Matrix A ∈ R^{m×n}. Target rank k.
Outputs:
Orthogonal matrix Q ∈ R^{m×m}.
Upper trapezoidal matrix R ∈ R^{m×n}.
Permutation matrix Π ∈ R^{n×n} such that AΠ = QR.
Algorithm:
Initialize Π = I_n. Compute column norms r_s = ‖A(1 : m, s)‖_2 (1 ≤ s ≤ n).
for j = 1 : k do
    Find i = arg max_{j≤s≤n} r_s. Exchange r_j and r_i, and exchange columns j and i of A and Π.
    Form Householder reflection Q_j from A(j : m, j).
    Update trailing matrix A(j : m, j : n) ← Q_j^T A(j : m, j : n).
    Update r_s = ‖A(j + 1 : m, s)‖_2 (j + 1 ≤ s ≤ n).
end for
Q = Q_1 Q_2 ··· Q_k is the product of all reflections. R = upper trapezoidal part of A.
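A plain NumPy sketch of Algorithm 4 follows. It keeps an explicit m × m orthogonal factor for clarity, whereas a practical implementation would store the Householder vectors (and, as in TRQRCP, avoid updating the trailing matrix); the function name is ours.

    import numpy as np

    def partial_qrcp(A, k):
        A = np.array(A, dtype=float, copy=True)
        m, n = A.shape
        Q = np.eye(m)
        piv = np.arange(n)
        r = np.sum(A * A, axis=0)                  # squared column norms
        for j in range(k):
            i = j + int(np.argmax(r[j:]))          # pivot column
            A[:, [j, i]], piv[[j, i]], r[[j, i]] = A[:, [i, j]], piv[[i, j]], r[[i, j]]
            x = A[j:, j].copy()                    # Householder reflection from A(j:m, j)
            nrm = np.linalg.norm(x)
            if nrm == 0.0:
                continue
            v = x
            v[0] += np.copysign(nrm, x[0])
            v /= np.linalg.norm(v)
            A[j:, j:] -= 2.0 * np.outer(v, v @ A[j:, j:])   # update trailing matrix
            Q[:, j:] -= 2.0 * np.outer(Q[:, j:] @ v, v)     # accumulate Q1 Q2 ... Qk
            r[j + 1:] -= A[j, j + 1:] ** 2                  # downdate remaining norms
        R = np.triu(A)[:k, :]                      # leading k rows, upper trapezoidal
        # A[:, piv] ≈ Q[:, :k] @ R, with the residual confined to the trailing block
        return Q[:, :k], R, piv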

Algorithm 5 TUXV Algorithm
Inputs:
Matrix A ∈ R^{m×n}. Target rank k. Block size b. Oversampling size p ≥ 0.
Outputs:
Column orthonormal matrices U ∈ R^{m×k}, V ∈ R^{n×k}, and upper triangular matrix R ∈ R^{k×k} such that A ≈ URV^T.
Algorithm:
Do TRQRCP on A to obtain Q ∈ R^{m×k}, R ∈ R^{k×n}, and Π ∈ R^{n×n}.
R = RΠ^T, and do an LQ factorization, i.e., [V, R] = qr(R^T, 0).
Compute Z = AV and do a QR factorization, i.e., [U, R] = qr(Z, 0).
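A compact sketch of the TUXV idea is given below. For simplicity it uses SciPy's deterministic column-pivoted QR in place of the randomized TRQRCP step, so it reproduces the flip-flop structure (partial QRCP, then LQ, then a final QR) rather than the exact randomized algorithm.

    import numpy as np
    from scipy.linalg import qr

    def tuxv_sketch(A, k):
        m, n = A.shape
        _, R, piv = qr(A, mode='economic', pivoting=True)   # A[:, piv] = Q R
        Rk = R[:k, :]
        Rpi = np.zeros((k, n))
        Rpi[:, piv] = Rk                     # Rk Pi^T, undoing the column permutation
        V, _ = np.linalg.qr(Rpi.T)           # LQ of Rpi via QR of its transpose; V is n x k
        Z = A @ V                            # m x k
        U, Rout = np.linalg.qr(Z)
        return U, Rout, V                    # A ≈ U @ Rout @ V.T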

Now we count the flops for each i in the for loop of Algorithm 9:
1. The cost of computing B = A^T * Q is 2mn(k + p).
2. The cost of computing [Q, ∼] = qr(B, 0) is 2n(k + p)² − (2/3)(k + p)³, and the cost of forming the first (k + p) columns in the full Q matrix is n(k + p)² + (1/3)(k + p)³.
3. The cost of computing B = A * Q is 2mn(k + p).
4. The cost of computing [Q, ∼] = qr(B, 0) is 2m(k + p)² − (2/3)(k + p)³, and the cost of forming the first (k + p) columns in the full Q matrix is m(k + p)² + (1/3)(k + p)³.

Together, the cost of running the for loop q times is

\[
  q\left( 4mn(k+p) + 3(m+n)(k+p)^2 - \frac{2}{3}(k+p)^3 \right).
\]

Additionally, the cost of computing B = Q^T * A is 2mn(k + p), the cost of the SVD of B is O(n(k + p)²), and the cost of computing U = Q * U is 2m(k + p)².


Algorithm 6 ST-HOSVD
Inputs:
Tensor X ∈ R^{I_1×···×I_d}, desired rank (k_1, ..., k_d), and processing order p = (p_1, ..., p_d).
Outputs:
Tucker decomposition [G; U_1, ..., U_d].
Algorithm:
Define tensor G ← X.
for j = 1 : d do
    r = p_j.
    Compute the exact or approximate rank-k_r SVD of the tensor unfolding: G_(r) ≈ Û_r Σ̂_r V̂_r^T.
    U_r ← Û_r.
    Update G_(r) ← Σ̂_r V̂_r^T, i.e., apply Û_r^T to G.
end for
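A NumPy sketch of Algorithm 6 is given below, processing the modes in their natural order and using a dense SVD of each unfolding; this dense SVD is exactly the step that MLSVD_FFSRQR, MLSVD_RSI, and MLSVD_LTSVD replace with an approximate SVD. The column ordering of the unfolding differs from the convention in Section 2.3, which does not affect the computed factors.

    import numpy as np

    def st_hosvd(X, ranks):
        G = np.array(X, copy=True)
        factors = []
        for mode, k in enumerate(ranks):
            Gm = np.moveaxis(G, mode, 0).reshape(G.shape[mode], -1)   # mode unfolding
            U, s, Vt = np.linalg.svd(Gm, full_matrices=False)
            U = U[:, :k]
            factors.append(U)
            core = U.T @ Gm                                           # Sigma_k V_k^T
            shape = (k,) + tuple(np.delete(G.shape, mode))
            G = np.moveaxis(core.reshape(shape), 0, mode)             # shrink this mode to k
        return G, factors                                             # X ≈ G x_1 U1 ... x_d Ud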

Algorithm 7 Robust PCA Using IALM
Inputs:
Measured matrix M ∈ R^{m×n}, positive numbers λ, µ_0, µ̄, tolerance tol, ρ > 1.
Outputs:
Matrix pair (X_k, E_k).
Algorithm:
k = 0; J(M) = max(‖M‖_2, λ^{-1}‖M‖_F); Y_0 = M / J(M); E_0 = 0.
while not converged do
    (U, Σ, V) = svd(M − E_k + µ_k^{-1} Y_k).
    X_{k+1} = U S_{µ_k^{-1}}(Σ) V^T.
    E_{k+1} = S_{λµ_k^{-1}}(M − X_{k+1} + µ_k^{-1} Y_k).
    Y_{k+1} = Y_k + µ_k (M − X_{k+1} − E_{k+1}).
    Update µ_{k+1} = min(ρµ_k, µ̄).
    k = k + 1.
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then
        Break.
    end if
end while
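A direct NumPy translation of Algorithm 7 is sketched below. The full dense SVD in the loop is the step that the experiments of Section 4.3.1 replace with LANSVD, FFSRQR, RSISVD, or LTSVD; the default values of ρ and the iteration cap are illustrative assumptions.

    import numpy as np

    def ialm_rpca(M, lam, mu0, mu_bar, rho=1.5, tol=1e-7, max_iter=500):
        shrink = lambda Z, w: np.sign(Z) * np.maximum(np.abs(Z) - w, 0.0)
        J = max(np.linalg.norm(M, 2), np.linalg.norm(M) / lam)   # J(M) as in Algorithm 7
        Y = M / J
        E = np.zeros_like(M)
        mu, normM = mu0, np.linalg.norm(M)
        for _ in range(max_iter):
            U, s, Vt = np.linalg.svd(M - E + Y / mu, full_matrices=False)  # replaceable SVD
            X = (U * shrink(s, 1.0 / mu)) @ Vt
            E = shrink(M - X + Y / mu, lam / mu)
            Y = Y + mu * (M - X - E)
            mu = min(rho * mu, mu_bar)
            if np.linalg.norm(M - X - E) / normM < tol:
                break
        return X, E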

Now assume that k + p ≪ min(m, n) and omit the lower-order terms; then we arrive at (4q + 4)mn(k + p) as the complexity of the approximate SVD with randomized subspace iteration. In practice, q is usually chosen to be an integer between 0 and 2.
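The following sketch mirrors Algorithm 9 (its pseudocode is listed below, just before the references); the comments mark the matrix products that dominate the (4q + 4)mn(k + p) flop count derived above.

    import numpy as np

    def rsi_svd(A, k, p=5, q=1, seed=0):
        rng = np.random.default_rng(seed)
        m, n = A.shape
        Omega = rng.standard_normal((n, k + p))
        Q, _ = np.linalg.qr(A @ Omega)            # 2mn(k+p) flops for the product
        for _ in range(q):
            Q, _ = np.linalg.qr(A.T @ Q)          # 2mn(k+p)
            Q, _ = np.linalg.qr(A @ Q)            # 2mn(k+p)
        B = Q.T @ A                               # 2mn(k+p)
        Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
        return (Q @ Ub)[:, :k], s[:k], Vt[:k, :].T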

Algorithm 8 Matrix Completion Using IALM
Inputs:
Sampled set Ω, sampled entries π_Ω(M), positive numbers λ, µ_0, µ̄, tolerance tol, ρ > 1.
Outputs:
Matrix pair (X_k, E_k).
Algorithm:
k = 0; Y_0 = 0; E_0 = 0.
while not converged do
    (U, Σ, V) = svd(M − E_k + µ_k^{-1} Y_k).
    X_{k+1} = U S_{µ_k^{-1}}(Σ) V^T.
    E_{k+1} = π_Ω̄(M − X_{k+1} + µ_k^{-1} Y_k).
    Y_{k+1} = Y_k + µ_k (M − X_{k+1} − E_{k+1}).
    Update µ_{k+1} = min(ρµ_k, µ̄).
    k = k + 1.
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then
        Break.
    end if
end while

Algorithm 9 Approximate SVD with Randomized Subspace Iteration
Inputs:
Matrix A ∈ R^{m×n}. Target rank k. Oversampling size p ≥ 0. Number of iterations q ≥ 1.
Outputs:
U ∈ R^{m×k} contains the approximate top k left singular vectors of A.
Σ ∈ R^{k×k} contains the approximate top k singular values of A.
V ∈ R^{n×k} contains the approximate top k right singular vectors of A.
Algorithm:
Generate an i.i.d. Gaussian random matrix Ω ∈ N(0, 1)^{n×(k+p)}.
Compute B = AΩ; [Q, ∼] = qr(B, 0).
for i = 1 : q do
    B = A^T * Q; [Q, ∼] = qr(B, 0).
    B = A * Q; [Q, ∼] = qr(B, 0).
end for
B = Q^T * A.
[U, Σ, V] = svd(B).
U = Q * U; U = U(:, 1 : k); Σ = Σ(1 : k, 1 : k); V = V(:, 1 : k).

REFERENCES

[1] O. ALTER, P. O. BROWN, AND D. BOTSTEIN, Singular value decomposition for genome-wide expression data processing and modeling, Proc. Natl. Acad. Sci. USA, 97 (2000), pp. 10101–10106.
[2] E. ANDERSON, Z. BAI, C. BISCHOF, L. S. BLACKFORD, J. DEMMEL, J. DONGARRA, J. DU CROZ, A. GREENBAUM, S. HAMMARLING, A. MCKENNEY, AND D. SORENSEN, LAPACK Users' Guide, 3rd ed., SIAM, Philadelphia, 1999.
[3] C. A. ANDERSSON AND R. BRO, Improving the speed of multi-way algorithms: Part I. Tucker3, Chemometrics Intell. Lab. Syst., 42 (1998), pp. 93–103.
[4] M. W. BERRY, S. T. DUMAIS, AND G. W. O'BRIEN, Using linear algebra for intelligent information retrieval, SIAM Rev., 37 (1995), pp. 573–595.
[5] C. BISCHOF AND C. VAN LOAN, The WY representation for products of Householder matrices, SIAM J. Sci. Statist. Comput., 8 (1987), pp. S2–S13.
[6] P. BUSINGER AND G. H. GOLUB, Handbook series linear algebra. Linear least squares solutions by Householder transformations, Numer. Math., 7 (1965), pp. 269–276.
[7] J.-F. CAI, E. J. CANDÈS, AND Z. SHEN, A singular value thresholding algorithm for matrix completion, SIAM J. Optim., 20 (2010), pp. 1956–1982.
[8] E. J. CANDÈS, X. LI, Y. MA, AND J. WRIGHT, Robust principal component analysis, J. ACM, 58 (2011), Art. 11, 37 pages.
[9] E. J. CANDÈS AND B. RECHT, Exact matrix completion via convex optimization, Found. Comput. Math., 9 (2009), pp. 717–772.
[10] J. D. CARROLL AND J.-J. CHANG, Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition, Psychometrika, 35 (1970), pp. 283–319.
[11] T. A. DAVIS AND Y. HU, The University of Florida sparse matrix collection, ACM Trans. Math. Software, 38 (2011), Art. 1, 25 pages.
[12] L. DE LATHAUWER AND B. DE MOOR, From matrix to tensor: Multilinear algebra and signal processing, in Mathematics in Signal Processing IV, Inst. Math. Appl. Conf. Ser., Oxford Univ. Press, Oxford, 1998, pp. 1–16.
[13] L. DE LATHAUWER, B. DE MOOR, AND J. VANDEWALLE, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253–1278.
[14] L. DE LATHAUWER, B. DE MOOR, AND J. VANDEWALLE, On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1324–1342.
[15] L. DE LATHAUWER AND J. VANDEWALLE, Dimensionality reduction in higher-order signal processing and rank-(R1, R2, ..., RN) reduction in multilinear algebra, Linear Algebra Appl., 391 (2004), pp. 31–55.
[16] P. DRINEAS, R. KANNAN, AND M. W. MAHONEY, Fast Monte Carlo algorithms for matrices II: Computing a low-rank approximation to a matrix, SIAM J. Comput., 36 (2006), pp. 158–183.
[17] J. A. DUERSCH AND M. GU, Randomized QR with column pivoting, SIAM J. Sci. Comput., 39 (2017), pp. C263–C291.
[18] C. ECKART AND G. YOUNG, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211–218.
[19] L. ELDÉN, Matrix Methods in Data Mining and Pattern Recognition, SIAM, Philadelphia, 2007.
[20] M. FAZEL, H. HINDI, AND S. P. BOYD, Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices, in Proceedings of the 2003 American Control Conference 2003, IEEE Conference Proceedings, IEEE, Piscataway, pp. 2156–2162.
[21] M. FAZEL, T. K. PONG, D. SUN, AND P. TSENG, Hankel matrix rank minimization with applications to system identification and realization, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 946–977.
[22] C. FRANK AND E. NÖTH, Optimizing eigenfaces by face masks for facial expression recognition, in Computer Analysis of Images and Patterns, N. Petkov and M. A. Westenberg, eds., vol. 2756 of Lecture Notes in Comput. Sci., Springer, Berlin, 2003, pp. 646–654.

[23] G. W. FURNAS, S. DEERWESTER, S. T. DUMAIS, T. K. LANDAUER, R. A. HARSHMAN, L. A. STREETER, AND K. E. LOCHBAUM, Information retrieval using a singular value decomposition model of latent semantic structure, in Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Y. Chiaramella, ed., ACM, New York, 1988, pp. 465–480.
[24] K. GOLDBERG, T. ROEDER, D. GUPTA, AND C. PERKINS, Eigentaste: A constant time collaborative filtering algorithm, Inf. Retrieval, 4 (2001), pp. 133–151.
[25] G. GOLUB, Numerical methods for solving linear least squares problems, Numer. Math., 7 (1965), pp. 206–216.
[26] G. H. GOLUB AND C. F. VAN LOAN, Matrix Computations, 4th ed., Johns Hopkins University Press, Baltimore, 2013.
[27] M. GU, Subspace iteration randomization and singular value problems, SIAM J. Sci. Comput., 37 (2015), pp. A1139–A1173.
[28] M. GU AND S. C. EISENSTAT, Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869.
[29] E. T. HALE, W. YIN, AND Y. ZHANG, Fixed-point continuation for l1-minimization: methodology and convergence, SIAM J. Optim., 19 (2008), pp. 1107–1130.
[30] N. HALKO, P. G. MARTINSSON, AND J. A. TROPP, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217–288.
[31] R. A. HARSHMAN, Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multimodal factor analysis, UCLA Working Papers in Phonetics, 16 (1970), pp. 1–84.
[32] J. L. HERLOCKER, J. A. KONSTAN, A. BORCHERS, AND J. RIEDL, An algorithmic framework for performing collaborative filtering, in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, F. Gey, M. Hearst, and R. Tong, eds., ACM, New York, 1999, pp. 230–237.
[33] R. A. HORN AND C. R. JOHNSON, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1991.
[34] D. A. HUCKABY AND T. F. CHAN, On the convergence of Stewart's QLP algorithm for approximating the SVD, Numer. Algorithms, 32 (2003), pp. 287–316.
[35] D. A. HUCKABY AND T. F. CHAN, Stewart's pivoted QLP decomposition for low-rank matrices, Numer. Linear Algebra Appl., 12 (2005), pp. 153–159.
[36] W. B. JOHNSON AND J. LINDENSTRAUSS, Extensions of Lipschitz mappings into a Hilbert space, in Conference in Modern Analysis and Probability (New Haven, Conn., 1982), R. Beals, A. Beck, A. Bellow, and A. Hajian, eds., vol. 26 of Contemp. Math., Amer. Math. Soc., Providence, 1984, pp. 189–206.


[37] I. T. JOLLIFFE, Principal Component Analysis, Springer, New York, 1986.
[38] J. M. KLEINBERG, Authoritative sources in a hyperlinked environment, J. ACM, 46 (1999), pp. 604–632.
[39] T. G. KOLDA, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 243–255.
[40] T. G. KOLDA AND B. W. BADER, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455–500.
[41] R. M. LARSEN, PROPACK: software for large and sparse SVD calculations, available at http://sun.stanford.edu/~rmunk/PROPACK/.
[42] R. M. LARSEN, Lanczos bidiagonalization with partial reorthogonalization, Tech. Report PB-537, Department of Computer Science, Aarhus University, Aarhus, Denmark, 1998.
[43] Y. LECUN, L. BOTTOU, Y. BENGIO, AND P. HAFFNER, Gradient-based learning applied to document recognition, Proc. IEEE, 86 (1998), pp. 2278–2324.
[44] R. B. LEHOUCQ, D. C. SORENSEN, AND C. YANG, ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods, SIAM, Philadelphia, 1998.
[45] Z. LIN, M. CHEN, AND Y. MA, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices, Preprint on arXiv, https://arxiv.org/abs/1009.5055, 2010.
[46] N. LINIAL, E. LONDON, AND Y. RABINOVICH, The geometry of graphs and some of its algorithmic applications, Combinatorica, 15 (1995), pp. 215–245.
[47] Z. LIU, A. HANSSON, AND L. VANDENBERGHE, Nuclear norm system identification with missing inputs and outputs, Systems Control Lett., 62 (2013), pp. 605–612.
[48] Z. LIU AND L. VANDENBERGHE, Interior-point method for nuclear norm approximation with application to system identification, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1235–1256.
[49] S. MA, D. GOLDFARB, AND L. CHEN, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program., 128 (2011), pp. 321–353.
[50] C. B. MELGAARD, Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra, PhD Thesis, University of California, Berkeley, 2015.
[51] M. MESBAHI AND G. P. PAPAVASSILOPOULOS, On the rank minimization problem over a positive semidefinite linear matrix inequality, IEEE Trans. Automat. Control, 42 (1997), pp. 239–243.
[52] N. MULLER, L. MAGAIA, AND B. M. HERBST, Singular value decomposition, eigenfaces, and 3D reconstructions, SIAM Rev., 46 (2004), pp. 518–545.
[53] M. NARWARIA AND W. LIN, SVD-based quality metric for image and video using machine learning, IEEE Trans. Syst., Man, and Cybernetics, Part B (Cybernetics), 42 (2012), pp. 347–364.
[54] T.-H. OH, Y. MATSUSHITA, Y.-W. TAI, AND I. SO KWEON, Fast randomized singular value thresholding for nuclear norm minimization, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015, IEEE Conference Proceedings, IEEE, Los Alamitos, 2015, pp. 4484–4493.
[55] G. QUINTANA-ORTÍ, X. SUN, AND C. H. BISCHOF, A BLAS-3 version of the QR factorization with column pivoting, SIAM J. Sci. Comput., 19 (1998), pp. 1486–1494.
[56] B. RECHT, M. FAZEL, AND P. A. PARRILO, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Rev., 52 (2010), pp. 471–501.
[57] J. D. RENNIE AND N. SREBRO, Fast maximum margin matrix factorization for collaborative prediction, in Proceedings of the 22nd International Conference on Machine Learning, L. De Raedt and S. Wrobel, eds., ACM Press, New York, 2005, pp. 713–719.
[58] V. ROKHLIN, A. SZLAM, AND M. TYGERT, A randomized algorithm for principal component analysis, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1100–1124.
[59] A. K. SAIBABA, HOID: higher order interpolatory decomposition for tensors based on Tucker representation, SIAM J. Matrix Anal. Appl., 37 (2016), pp. 1223–1249.
[60] B. SAVAS AND L. ELDÉN, Handwritten digit classification using higher order singular value decomposition, Pattern Recognition, 40 (2007), pp. 993–1003.
[61] R. SCHREIBER AND C. VAN LOAN, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 53–57.
[62] A. SHASHUA AND T. HAZAN, Non-negative tensor factorization with applications to statistics and computer vision, in Proceedings of the 22nd International Conference on Machine Learning, S. Dzeroski, L. De Raedt, and S. Wrobel, eds., ACM, New York, 2005, pp. 792–799.
[63] N. D. SIDIROPOULOS, R. BRO, AND G. B. GIANNAKIS, Parallel factor analysis in sensor array processing, IEEE Trans. Signal Process., 48 (2000), pp. 2377–2388.
[64] D. C. SORENSEN AND M. EMBREE, A DEIM induced CUR factorization, SIAM J. Sci. Comput., 38 (2016), pp. A1454–A1482.
[65] N. SREBRO, J. RENNIE, AND T. S. JAAKKOLA, Maximum-margin matrix factorization, in NIPS'04: Proceedings of the 17th International Conference on Neural Information Processing Systems, L. K. Saul, Y. Weiss, and L. Bottou, eds., MIT Press, Cambridge, 2004, pp. 1329–1336.
[66] G. W. STEWART, The QLP approximation to the singular value decomposition, SIAM J. Sci. Comput., 20 (1999), pp. 1336–1348.


[67] S. TAHERI, Q. QIU, AND R. CHELLAPPA, Structure-preserving sparse decomposition for facial expression analysis, IEEE Trans. Image Process., 23 (2014), pp. 3590–3603.
[68] K.-C. TOH AND S. YUN, An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems, Pac. J. Optim., 6 (2010), pp. 615–640.
[69] L. N. TREFETHEN AND D. BAU III, Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[70] L. R. TUCKER, Some mathematical notes on three-mode factor analysis, Psychometrika, 31 (1966), pp. 279–311.
[71] N. VANNIEUWENHOVEN, R. VANDEBRIL, AND K. MEERBERGEN, A new truncation strategy for the higher-order singular value decomposition, SIAM J. Sci. Comput., 34 (2012), pp. A1027–A1052.
[72] M. A. O. VASILESCU AND D. TERZOPOULOS, Multilinear analysis of image ensembles: Tensorfaces, in European Conference on Computer Vision 2002, A. Heyden, G. Sparr, M. Nielsen, and P. Johansen, eds., vol. 2350 of Lecture Notes in Computer Science, Springer, Berlin, 2002, pp. 447–460.
[73] N. VERVLIET, O. DEBALS, L. SORBER, M. V. BAREL, AND L. DE LATHAUWER, Tensorlab user guide, available at http://www.tensorlab.net.
[74] J. XIAO AND M. GU, Spectrum-revealing Cholesky factorization for kernel methods, in Proceedings of the 16th IEEE International Conference on Data Mining (ICDM), IEEE Conference Proceedings, Los Alamitos, 2016, pp. 1293–1298.
[75] J. XIAO, M. GU, AND J. LANGOU, Fast parallel randomized QR with column pivoting algorithms for reliable low-rank matrix approximations, in Proceedings of the 24th IEEE International Conference on High Performance Computing (HiPC), IEEE Conference Proceedings, Los Alamitos, 2017, pp. 233–242.
[76] T. ZHANG AND G. H. GOLUB, Rank-one approximation to high order tensors, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 534–550.

Page 4: FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND …etna.mcs.kent.edu/vol.51.2019/pp469-494.dir/pp469-494.pdfminimization. In Section3, we introduce Flip-Flop SRQR and analyze its

ETNAKent State University and

Johann Radon Institute (RICAM)

472 Y FENG J XIAO AND M GU

Partition Ωdef= Q

TΩQ

def=[Ω1 Ω2

] and therefore (23) implies

Ω

[R11 R12

R22

]=

[R11 R12

R22

]

which leads to the updating formula for B in Algorithm 1

(24)[R12

R22

]larr Ω2R22 =

[R12 minusR11R

minus111 R12

R22

]

With TRQRCP the TUXV algorithm [17 Algorithm 7] (Algorithm 5 in the appendix)computes a low-rank approximation with the QLP factorization at a greatly accelerated speedby computing a partial QR factorization with column pivoting followed with a partial LQfactorization

22 Spectrum-revealing QR factorization Although both RQRCP and TRQRCP arevery effective practical tools for low-rank matrix approximations they are not known toprovide reliable low-rank matrix approximations due to their underlying greediness in thecolumn norm-based pivoting strategy To solve this potential problem of column-basedQR factorization Gu and Eisenstat [28] proposed an efficient way to perform additionalcolumn interchanges to enhance the quality of the leading k columns in AΠ as a basis for theapproximate column subspace More recently a more efficient and effective method SRQRwas introduced and analyzed in [50 75] to compute the low-rank approximation (22) Theconcept of spectrum-revealing introduced in [50 74] emphasizes the utilization of partialQR factorization (22) as a low-rank matrix approximation as opposed to the more traditionalrank-revealing factorization which emphasizes the utility of the partial QR factorization (21)as a tool for numerical rank determination The SRQR algorithm is described in Algorithm 2The SRQR algorithm will always run to completion with a high-quality low-rank matrixapproximation (22) For real data matrices that usually have fast decaying singular-valuespectrum this approximation is often as good as the truncated SVD The SRQR algorithmof [75] explicitly updates the partial QR factorization (21) while swapping columns but theSRQR algorithm can actually avoid any explicit computations on the trailing matrix R22 usingTRQRCP instead of RQRCP to obtain exactly the same partial QR initialization Below weoutline the SRQR algorithm

In (21) let

(25) Rdef=

[R11 a

α

]be the leading (l + 1)times (l + 1) (l ge k) submatrix of R We define

(26) g1def=R2212|α|

and g2def= |α|

∥∥∥RminusT∥∥∥12

where X12 is the largest column 2-norm of X for any given X In [75] the authorsproved approximation quality bounds involving g1 g2 for the low-rank approximation com-puted by RQRCP or TRQRCP RQRCP or TRQRCP will provide a good low-rank ma-

trix approximation if g1 and g2 are O(1) The authors also proved that g1 leradic

1+ε1minusε and

g2 leradic

2(1+ε)

1minusε

(1 +

radic1+ε1minusε

)lminus1

hold with high probability for both RQRCP and TRQRCPwhere 0 lt ε lt 1 is a user-defined parameter which guides the choice of the oversampling size

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 473

p For reasonably choices of ε like ε = 12 g1 is a small constant while g2 can potentially be an

extremely large number which can lead to a poor low-rank approximation quality To avoidthe potential large number of g2 the SRQR algorithm (Algorithm 2) proposed in [75] uses apair-wise swapping strategy to guarantee that g2 is below some user-defined tolerance g gt 1which is usually chosen to be a small number larger than one like 20 In Algorithm 2 weuse an approximate formula instead of using the definition (26) directly to estimate g2 ieg2 asymp |α|radicd

∥∥∥ΩRminusT∥∥∥

12 based on the Johnson-Lindenstrauss Lemma [36]

Algorithm 2 Spectrum-revealing QR Factorization (SRQR)InputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0Integer l ge k Tolerance g gt 1 for g2OutputsOrthogonal matrix Q isin Rmtimesm formed by the first k reflectorsUpper trapezoidal matrix R isin RktimesnPermutation matrix Π isin Rntimesn such that AΠ asymp Q ( 1 k)RAlgorithmCompute QRΠ with RQRCP or TRQRCP to l stepsCompute squared 2-norm of the columns of B( l + 1 n) ri (l + 1 le i le n) where Bis an updated random projection of A computed by RQRCP or TRQRCPApproximate squared 2-norm of the columns of A(l + 1 m l + 1 n) ri = ri(b +p) (l + 1 le i le n)ı = argmaxl+1leilenriSwap ı-th and (l + 1)-st columns of AΠ rOne-step QR factorization of A(l + 1 m l + 1 n)|α| = Rl+1l+1ri = ri minusA(l + 1 i)2 (l + 2 le i le n)Generate a random matrix Ω isin N (0 1)dtimes(l+1) (d l)Compute g2 = |α|

∥∥∥RminusT∥∥∥12asymp |α|radic

d

∥∥∥ΩRminusT∥∥∥

12

while g2 gt g doı = argmax1leilel+1ith column norm of ΩRminusT Swap ı-th and (l + 1)-st columns of A and Π in a Round Robin rotationGivens-rotate R back into upper-trapezoidal formrl+1 = R2

l+1l+1 ri = ri +A(l + 1 i)2 (l + 2 le i le n)ı = argmaxl+1leilenriSwap ı-th and (l + 1)-st columns of AΠ rOne-step QR factorization of A(l + 1 m l + 1 n)|α| = Rl+1l+1ri = ri minusA(l + 1 i)2 (l + 2 le i le n)Generate a random matrix Ω isin N (0 1)dtimes(l+1)(d l) Compute g2 = |α|

∥∥∥RminusT∥∥∥12asymp |α|radic

d

∥∥∥ΩRminusT∥∥∥

12

end while

23 Tensor approximation In this section we review some basic notations and conceptsinvolving tensors A more detailed discussion of the properties and applications of tensors can

ETNAKent State University and

Johann Radon Institute (RICAM)

474 Y FENG J XIAO AND M GU

be found in the review [40] A tensor is a d-dimensional array of numbers denoted by scriptnotation X isin RI1timesmiddotmiddotmiddottimesId with entries given by

xj1jd 1 le j1 le I1 1 le jd le Id

We use the matrix X(n) isin RIntimes(Πj 6=nIj) to denote the nth mode unfolding of the tensorX Since this tensor has d dimensions there are altogether d-possibilities for unfolding Then-mode product of a tensor X isin RI1timesmiddotmiddotmiddottimesId with a matrix U isin RktimesIn results in a tensorY isin RI1timesmiddotmiddotmiddottimesInminus1timesktimesIn+1timesmiddotmiddotmiddottimesId such that

yj1jnminus1jjn+1jd = (X timesn U)j1jnminus1jjn+1jd=

Insumjn=1

xj1jdujjn

Alternatively it can be expressed conveniently in terms of unfolded tensors

Y = X timesn U hArr Y(n) = UX(n)

Decompositions of higher-order tensors have applications in signal processing [12 15 63]numerical linear algebra [13 39 76] computer vision [62 72 67] etc Two particular tensordecompositions can be considered as higher-order extensions of the matrix SVD CANDE-COMPPARAFAC (CP) [10 31] decomposes a tensor as a sum of rank-one tensors andthe Tucker decomposition [70] is a higher-order form of the principal component analy-sis Knowing the definitions of mode products and unfolding of tensors we can introducethe sequentially truncated higher-order SVD (ST-HOSVD) algorithm for producing a rank(k1 kd) approximation to the tensor based on the Tucker decomposition format The ST-HOSVD algorithm [3 71] (Algorithm 6 in the appendix) returns a core tensor G isin Rk1timesmiddotmiddotmiddottimeskdand a set of unitary matrices Uj isin RIjtimeskj for j = 1 d such that

X asymp G times1 U1 middot middot middot timesd Ud

where the right-hand side is called a Tucker decomposition However a straightforwardgeneralization to higher-order (d ge 3) tensors of the matrix Eckart-Young-Mirsky Theorem isnot possible [14]

In ST-HOSVD one key step is to compute an exact or approximate rank-kr SVD ofthe tensor unfolding Well known efficient ways to compute an exact low-rank SVD includeKrylov subspace methods [44] There are also efficient randomized algorithms to find anapproximate low-rank SVD [30] In the Matlab tensorlab toolbox [73] the most efficientmethod MLSVD_RSI is essentially ST-HOSVD with an randomized subspace iteration tofind the approximate SVD of the tensor unfolding

24 Nuclear norm minimization Matrix rank minimization problem appears ubiqui-tously in many fields such as Euclidean embedding [20 46] control [21 51 56] collaborativefiltering [9 57 65] system identification [47 48] etc The regularized nuclear norm mini-mization problem is the following

minXisinRmtimesn

f (X) + τXlowast

where τ gt 0 is a regularization parameter and Xlowastdef=sumqi=1 σi (X) The choice of the

function f (middot) is situational f (X) = M minus X1 in robust principal component analysis(robust PCA) [8] and f (X) = πΩ (M)minus πΩ (X) ||2F in matrix completion [7]

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 475

Many researchers have devoted themselves to solve the above nuclear norm minimizationproblem and plenty of algorithms have been proposed including singular value thresholding(SVT) [7] fixed point continuous (FPC) [49] accelerated proximal gradient (APG) [68]augmented Lagrange multiplier (ALM) [45] The most expensive part of these algorithms liesin the computation of the truncated SVD Inexact augmented Lagrange multiplier (IALM) [45]has been proved to be one of the most accurate and efficient among them We now describeIALM for the robust PCA and matrix completion problems

The robust PCA problem can be formalized as a minimization problem of a sum of nuclearnorm and a scaled matrix l1-norm (sum of matrix entries in absolute value)

(27) min Xlowast + λE1 subject to M = X + E

where M is a measured matrix X has low-rank E is an error matrix and sufficiently sparseand λ is a positive weighting parameter Algorithm 7 in the appendix describes details of theIALM method to solve the robust PCA problem [45]

The matrix completion problem [9 45] can be written in the form

(28) minXisinRmtimesn

Xlowast subject to X + E = M πΩ (E) = 0

where πΩ Rmtimesn rarr Rmtimesn is an orthogonal projection that keeps the entries in Ω unchangedand sets those outside Ω zeros In [45] the authors applied the IALM method to the matrixcompletion problem (Algorithm 8 in the appendix)

3 Flip-Flop SRQR factorization

31 Flip-Flop SRQR factorization In this section we introduce our Flip-Flop SRQRfactorization a slightly different variant of TUXV (Algorithm 5 in the appendix) to computean SVD approximation based on the QLP factorization Given an integer l ge k we run SRQR(the version without computing the trailing matrix) with l steps on A

(31) AΠ = QR = Q

[R11 R12

R22

]

where R11 isin Rltimesl is upper triangular R12 isin Rltimes(nminusl) and R22 isin R(mminusl)times(nminusl) Then werun partial QRNP with l steps on RT

(32) RT =

[RT11

RT12 RT22

]= Q

[R11 R12

R22

]asymp Q1

[R11 R12

]

where Q =[Q1 Q2

]with Q1 isin Rntimesl Therefore combining with the fact that

AΠQ1 = Q[R11 R12

]T

we can approximate the matrix A by

(33) A = QRΠT = Q(RT)T

ΠT asymp Q

[RT11

RT12

]QT1 ΠT = A

(ΠQ1

)(ΠQ1

)T

We denote the rank-k truncated SVD ofAΠQ1 by UkΣkVTk LetUk = Uk Vk = ΠQ1Vk

then using (33) a rank-k approximate SVD of A is obtained

(34) A asymp UkΣkVTk

where Uk isin Rmtimesk Vk isin Rntimesk are column orthonormal and Σk = Diag (σ1 middot middot middot σk) withσirsquos being the leading k singular values of AΠQ1 The Flip-Flop SRQR factorization isoutlined in Algorithm 3

ETNAKent State University and

Johann Radon Institute (RICAM)

476 Y FENG J XIAO AND M GU

Algorithm 3 Flip-Flop Spectrum-Revealing QR FactorizationInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0Integer l ge k Tolerance g gt 1 for g2OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmRun SRQR on A to l steps to obtain (R11 R12)Run QRNP on (R11 R12)

T to obtain Q1 represented by a sequence of Householder vectors

A = AΠQ1[U Σ V ] = svd

(A)

U = U ( 1 k) Σ = Σ (1 k 1 k) V = ΠQ1V ( 1 k)

32 Complexity analysis In this section we perform a complexity analysis of Flip-Flop SRQR Since the approximate SVD only makes sense when the target rank k is smallwe assume that k le l min (mn) The complexity analysis of Flip-Flop SRQR looks asfollows

1 The main cost of doing SRQR with TRQRCP onA is 2mnl+2(b+p)mn+(m+n)l22 The main cost of the QR factorization for [R11 R12]

T and forming Q1 is 2nl2minus 23 l

33 The main cost of computing A = AΠQ1 is 2mnl4 The main cost of computing [Usimsim] = svd

(A)

is O(ml2)5 The main cost of forming Vk is 2nlk

Since k le l min (mn) the complexity of Flip-Flop SRQR is 4mnl + 2(b+ p)mnby omitting the lower-order terms

On the other hand the complexity of the approximate SVD with randomized subspaceiteration (RSISVD) [27 30] is (4 + 4q)mn (k + p) where p is the oversampling size and qis the number of subspace iterations (see the detailed analysis in the appendix) In practice pis chosen to be a small integer for instance 5 in RSISVD In Flip-Flop SRQR l is usuallychosen to be a little bit larger than k eg l = k + 5 We also found that l = k is sufficient interms of the approximation quality in our numerical experiments Therefore we observe thatFlip-Flop SRQR is more efficient than RSISVD for any q gt 0

Since a practical dataset matrix can be too large to fit into the main memory the cost oftransferring the matrix into a memory hierarchy typically predominates the arithmetic costTherefore we also compare the number of data passes between RSISVD and Flip-Flop SRQRRSISVD needs 2q + 2 passes while Flip-Flop SRQR needs 4 passes computing B = ΩAswapping the columns of A realizing a partial QR factorization on A and computing AΠQ1Therefore Flip-Flop SRQR requires less data transfer than RSISVD for any q gt 1

33 Quality analysis of Flip-Flop SRQR This section is devoted to the quality analysisof Flip-Flop SRQR We start with Lemma 31

LEMMA 31 Given any matrix X = [X1X2] with Xi isin Rmtimesni i = 1 2 and n1 +n2 = n

σj (X)2 le σj (X1)

2+ X222 1 le j le min(mn)

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 477

Proof Since XXT = X1XT1 + X2X

T2 we obtain the above result using [33 Theo-

rem 3316]

We are now ready to derive bounds for the singular values and the approximation errorof Flip-Flop SRQR We need to emphasize that even if the target rank is k we run Flip-FlopSRQR with an actual target rank l which is a little bit larger than k The difference betweenk and l can create a gap between the singular values of A so that we can obtain a reliablelow-rank approximation

THEOREM 32 Given a matrix A isin Rmtimesn a target rank k an oversampling size p andan actual target rank l ge k the matrices Uk Σk Vk in (34) computed by Flip-Flop SRQRsatisfy

(35) σj(Σk) ge σj(A)

4

radic1 +

2R2242σ4j (Σk)

1 le j le k

and

(36)∥∥Aminus UkΣkV

Tk

∥∥2le σk+1 (A)

4

radic1 + 2

(R222σk+1 (A)

)4

where R22 isin R(mminusl)times(nminusl) is the trailing matrix in (31) Using the properties of SRQR wefurthermore have

(37) σj (Σk) ge σj (A)

4

radic1 + min

(2τ4 τ4 (2 + 4τ4)

(σl+1(A)σj(A)

)4) 1 le j le k

and

(38)∥∥Aminus UkΣkV

Tk

∥∥2le σk+1 (A)

4

radic1 + 2τ4

(σl+1 (A)

σk+1 (A)

)4

where τ and τ defined in (312) have matrix dimension-dependent upper bounds

τ le g1g2

radic(l + 1) (nminus l) and τ le g1g2

radicl (nminus l)

where g1 leradic

1+ε1minusε and g2 le g with high probability The parameters ε gt 0 and g gt 1 are

user-defined

Proof In terms of the singular value bounds observing that

[R11 R12

R22

][R11 R12

R22

]T=

[R11R

T11 + R12R

T12 R12R

T22

R22RT12 R22R

T22

]

ETNAKent State University and

Johann Radon Institute (RICAM)

478 Y FENG J XIAO AND M GU

we apply Lemma 31 twice for any 1 le j le k

σ2j

[R11 R12

R22

][R11 R12

R22

]Tle σ2

j

([R11R

T11 + R12R

T12 R12R

T22

])+∥∥∥[R22R

T12 R22R

T22

]∥∥∥2

2

le σ2j

(R11R

T11 + R12R

T12

)+∥∥∥R12R

T22

∥∥∥2

2+∥∥∥[R22R

T12 R22R

T22

]∥∥∥2

2

le σ2j

(R11R

T11 + R12R

T12

)+ 2

∥∥∥∥∥[R12

R22

]∥∥∥∥∥4

2

= σ2j

(R11R

T11 + R12R

T12

)+ 2 R2242 (39)

The relation (39) can be further rewritten as

σ4j (A) le σ4

j

([R11 R12

])+ 2 R2242 = σ4

j (Σk) + 2 R2242 1 le j le k

which is equivalent to

σj(Σk) ge σj(A)

4

radic1 +

2R2242σ4j (Σk)

For the residual matrix bound we let[R11 R12

]def=[R11 R12

]+[δR11 δR12

]

where[R11 R12

]is the rank-k truncated SVD of

[R11 R12

] Since

(310)∥∥Aminus UkΣkV

Tk

∥∥2

=

∥∥∥∥∥AΠminusQ[R11 R12

0

]TQT

∥∥∥∥∥2

it follows from the orthogonality of singular vectors that[R11 R12

]T [δR11 δR12

]= 0

and therefore[R11 R12

]T [R11 R12

]=[R11 R12

]T [R11 R12

]+[δR11 δR12

]T [δR11 δR12

]

which implies

(311) RT12R12 = RT

12R12 +(δR12

)T (δR12

)

Similarly to the deduction of (39) from (311) we can derive∥∥∥∥[δR11 δR12

R22

]∥∥∥∥4

2

le∥∥[δR11 δR12

]∥∥4

2+ 2

∥∥∥∥[δR12

R22

]∥∥∥∥4

2

le σ4k+1 (A) + 2

∥∥∥∥∥[R12

R22

]∥∥∥∥∥4

2

= σ4k+1 (A) + 2 R2242

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 479

Combining with (310) it now follows that

∥∥Aminus UkΣkVTk

∥∥2

=

∥∥∥∥∥AΠminusQ[R11 R12

0

]TQT

∥∥∥∥∥2

=

∥∥∥∥[δR11 δR12

R22

]∥∥∥∥2

le σk+1 (A)4

radic1 + 2

(R222σk+1 (A)

)4

To obtain an upper bound of R222 in (35) and (36) we follow the analysis of SRQRin [75]

From the analysis of [75 Section IV] Algorithm 2 ensures that g1 leradic

1+ε1minusε and g2 le g

with high probability (the actual probability-guarantee formula can be found in [75 SectionIV]) where g1 and g2 are defined by (26) Here 0 lt ε lt 1 is a user defined parameter toadjust the choice of the oversampling size p used in the TRQRCP initialization part in SRQRg gt 1 is a user-defined parameter in the extra swapping part in SRQR Let

(312) τdef= g1g2

R222R2212

∥∥∥RminusT∥∥∥minus1

12

σl+1 (A)and τ

def= g1g2

R222R2212

∥∥RminusT11

∥∥minus1

12

σk (Σk)

where R is defined in equation (25) Since 1radicnX12 le X2 le

radicnX12 and

σi (X1) le σi (X) 1 le i le min (s t) for any matrix X isin Rmtimesn and submatrix X1 isin Rstimestof X we obtain that

τ = g1g2R222R2212

∥∥∥RminusT∥∥∥minus1

12

σl+1

(R) σl+1

(R)

σl+1 (A)le g1g2

radic(l + 1) (nminus l)

Using the fact that σl([R11 R12

])= σl

(R11

)and σk (Σk) = σk

([R11 R12

])by (32)

and (34)

τ = g1g2R222R2212

∥∥RminusT11

∥∥minus1

12

σl (R11)

σl (R11)

σl([R11 R12

]) σl ([R11 R12

])σk (Σk)

le g1g2

radicl (nminus l)

By the definition of τ

(313) R222 = τ σl+1 (A)

Plugging this into (36) yields (38)By the definition of τ we observe that

(314) R222 le τσk (Σk)

By (35) and (314)(315)

σj (Σk) ge σj (A)

4

radic1 + 2

(R222σj(Σk)

)4ge σj (A)

4

radic1 + 2

(R222σk(Σk)

)4ge σj (A)

4radic

1 + 2τ4 1 le j le k

ETNAKent State University and

Johann Radon Institute (RICAM)

480 Y FENG J XIAO AND M GU

On the other hand using (35)

σ4j (A) le σ4

j (Σk)

(1 + 2

σ4j (A)

σ4j (Σk)

R2242σ4j (A)

)

le σ4j (Σk)

1 + 2

(σ4j (Σk) + 2 R2242

)σ4j (Σk)

R2242σ4j (A)

le σ4

j (Σk)

(1 + 2

(1 + 2

R2242σ4k (Σk)

)R2242σ4j (A)

)

that is

σj (Σk) ge σj (A)

4

radic1 +

(2 + 4

R2242σ4k(Σk)

)R2242σ4j (A)

Plugging (313) and (314) into this above equation yields

(316) σj (Σk) ge σj (A)

4

radic1 + τ4 (2 + 4τ4)

(σl+1(A)σj(A)

)4 1 le j le k

Combining (315) with (316) we arrive at (37)We note that (35) and (36) still hold true if we replace k by lInequality (37) shows that with the definitions (312) of τ and τ Flip-Flop SRQR can

reveal at least a dimension-dependent fraction of all the leading singular values of A andindeed approximate them very accurately in case they decay relatively quickly Moreover (38)shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of4

radic1 + 2τ4

(σl+1(A)σk+1(A)

)4

from optimal In situations where the singular values of A decay

relatively quickly our rank-k approximation is about as accurate as the truncated SVD with achoice of l such that

σl+1 (A)

σk+1 (A)= o (1)

REMARK 33 The ldquohigh probabilityrdquo mentioned in Theorem 32 is dependent on theoversampling size p but not on the block size b We decided not to dive deep into the probabilityexplanation since it will make the proof much longer and more tedious Theoretically thecomputed oversampling size p for certain given high probability would be rather high Forexample to achieve a probability 095 we need to choose p to be at least 500 However inpractice we would never choose such a large p since it will deteriorate the code efficiencyIt turns out that a small p like 5 sim 10 would achieve good approximation results empiricallyThe detailed explanation about what does high probability mean here can be found in [75]In terms of the choice of the block size b it wonrsquot affect the analysis or the error boundsThe choice of b is purely for the performance of the codes Only a suitable b can take fulladvantage of the cache hierarchy and BLAS routines

4 Numerical experiments In this section we demonstrate the effectiveness and effi-ciency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments Firstly

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 481

TABLE 41Methods for approximate SVD

Method DescriptionLANSVD Approximate SVD using Lanczos bidiagonalization with partial

reorthogonalization [42]Its Fortran and Matlab implementations are from PROPACK [41]

FFSRQR Flip-Flop Spectrum-revealing QR factorizationIts Matlab implementation is using Mex-wrappedBLAS and LAPACK routines [2]

RSISVD Approximate SVD with randomized subspace iteration [30 27]Its Matlab implementation is from tensorlab toolbox [73]

TUXV Algorithm 5 in the appendixLTSVD Linear Time SVD [16]

We use its Matlab implementation by Ma et al [49]

we compare FFSRQR with other approximate SVD algorithms for a matrix approximationSecondly we compare FFSRQR with other methods on tensor approximation problems usingthe tensorlab toolbox [73] Thirdly we compare FFSRQR with other methods for the robustPCA problem and matrix completion problem All experiments are implemented on a laptopwith 29 GHz Intel Core i5 processor and 8 GB of RAM The codes based on LAPACK [2]and PROPACK [41] in Section 41 are in Fortran The codes in Sections 42 43 are in Matlabexcept that FFSRQR is implemented in Fortran and wrapped by mex

41 Approximate truncated SVD In this section we compare FFSRQR with otherapproximate SVD algorithms for low-rank matrix approximation All tested methods are listedin Table 41 Since LTSVD has only a Matlab implementation this method is only consideredin Sections 42 and 43 The test matrices areType 1 A isin Rmtimesn [66] is defined by A = U DV T + 01 middot dss middotE where U isin Rmtimess V isin

Rntimess are column-orthonormal matrices and D isin Rstimess is a diagonal matrix with sgeometrically decreasing diagonal entries from 1 to 10minus3 dss is the last diagonalentry of D E isin Rmtimesn is a random matrix where the entries are independentlysampled from a normal distribution N (0 1) In our numerical experiment we testthree different random matrices The square matrix has a size of 10000times 10000 theshort-fat matrix has a size of 1000times 10000 and the tall-skinny matrix has a size of10000times 1000

Type 2 A isin R4929times4929 is a real data matrix from the University of Florida sparse matrixcollection [11] Its corresponding file name is HBGEMAT11

For a given matrix A isin Rmtimesn the relative SVD approximation error is measured by∥∥Aminus UkΣkVTk

∥∥FAF where Σk contains the approximate top k singular values and

Uk Vk are the corresponding approximate top k singular vectors The parameters used inFFSRQR RSISVD TUXV and LTSVD are listed in Table 42 The additional time requiredto compute g2 is negligible In our experiments g2 always remains modest and never triggerssubsequent SRQR column swaps However computing g2 is still recommended since g2

serves as an insurance policy against potential quality degration by TRQRCP The block size bis chosen to be 32 according to the xGEQP3 LAPACK routine When k is less than 32 weimplement an unblock version of the partial QRCP

Figures 41ndash43 display the run time the relative approximation error and the top 20singular values comparison respectively for four different matrices In terms of speed FFSRQRis always faster than LANSVD and RSISVD Moreover the efficiency difference is more

ETNAKent State University and

Johann Radon Institute (RICAM)

482 Y FENG J XIAO AND M GU

TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD

Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n

obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV

100 200 300 400 500

target rank k

100

101

102

103

104

run tim

e (

sec)

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

100 200 300 400 500

target rank k

10-1

100

101

102

run tim

e (

sec)

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

100 200 300 400 500

target rank k

10-1

100

101

102

run tim

e (

sec)

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

100 200 300 400 500

target rank k

10-1

100

101

102

103

run tim

e (

sec)

FFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 41 Run time comparison for approximate SVD algorithms

42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483

100 200 300 400 500

target rank k

022

024

026

028

03

032

034

036re

lative

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

100 200 300 400 500

target rank k

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

100 200 300 400 500

target rank k

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

100 200 300 400 500

target rank k

024

026

028

03

032

034

036

rela

tive

ap

pro

xim

atio

n e

rro

rFFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 42 Relative approximation error comparison for approximate SVD algorithms

algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab

421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]

X =

10sumj=1

1000

jxj yj zj +

nsumj=11

1

jxj yj zj

where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2

U2 times3 U3

ETNAKent State University and

Johann Radon Institute (RICAM)

484 Y FENG J XIAO AND M GU

5 10 15 20

index

06

07

08

09

1to

p 2

0 s

ingula

r valu

es

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

5 10 15 20

index

06

07

08

09

1

top 2

0 s

ingula

r valu

es

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

5 10 15 20

index

06

065

07

075

08

085

09

095

top

20

sin

gu

lar

va

lue

s

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

5 10 15 20

index

101

102

103

top

20

sin

gu

lar

va

lue

sFFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 43 Top 20 singular values comparison for approximate SVD algorithms

Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger

422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]

We do handwritten digits classification using MNIST, which consists of 60,000 training images and 10,000 test images. The number of training images in each class is restricted to 5421 so that the training set is equally distributed over all classes. The training set is represented by a tensor X of size 786 × 5421 × 10. The classification relies on Algorithm 2 in [60]. We use various algorithms to obtain an approximation X ≈ G ×_1 U_1 ×_2 U_2 ×_3 U_3, where the core tensor G has size 65 × 142 × 10.

The results are summarized in Table 4.3. In terms of run time, we observe that our method

MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive


[Figure omitted in this extraction.] FIG. 4.4. Run time and relative approximation error comparison on a sparse tensor (MLSVD, MLSVD_FFSRQR, MLSVD_RSI, MLSVD_LTSVD); left panel: relative approximation error versus target rank k; right panel: run time (sec) versus target rank k.

TABLE 4.3
Comparison on handwritten digits classification

                          MLSVD     MLSVD_FFSRQR  MLSVD_RSI  MLSVD_LTSVD
Training Time [sec]       272.121   15.455        19.343     0.4266
Relative Model Error      0.4099    0.4273        0.4247     0.5162
Classification Accuracy   95.19%    94.98%        95.05%     92.59%

one, and MLSVD_LTSVD is the fastest one. In terms of classification quality, MLSVD, MLSVD_FFSRQR, and MLSVD_RSI are comparable, while MLSVD_LTSVD is the least accurate one.

4.3. Solving the nuclear norm minimization problem. To show the effectiveness of the FFSRQR algorithm for nuclear norm minimization problems, we investigate two scenarios: the robust PCA problem (2.7) and the matrix completion problem (2.8). The test matrix used for the robust PCA is introduced in [45], and the test matrices used in the matrix completion are two real data sets. We use the IALM method [45] to solve both problems; the IALM code can be downloaded.^1

4.3.1. Robust PCA. To solve the robust PCA problem, we replace the approximate SVD part in the IALM method [45] by various methods. We denote the actual solution to the robust PCA problem by a matrix pair (X∗, E∗) ∈ R^{m×n} × R^{m×n}. The matrix X∗ is X∗ = X_L X_R^T, with X_L ∈ R^{m×k}, X_R ∈ R^{n×k} being random matrices whose entries are independently sampled from a normal distribution. The sparse matrix E∗ is a random matrix where its non-zero entries are independently sampled from a uniform distribution over the interval [−500, 500]. The input to the IALM algorithm has the form M = X∗ + E∗, and the output is denoted by (X̂, Ê). In this numerical experiment, we use the same parameter settings as the IALM code for robust PCA: the rank k is 0.1m, and the number of non-zero entries in E is 0.05m². We choose the trade-off parameter λ = 1/√(max(m, n)), as suggested by Candès et al. [8]. The solution quality is measured by the normalized root mean square error ‖X̂ − X∗‖_F / ‖X∗‖_F.
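A hedged Python/NumPy sketch of this test-problem generation and error measure is given below; the function names and RNG choices are illustrative assumptions rather than the exact experimental code.

```python
import numpy as np

def make_rpca_test(m, n, seed=0):
    rng = np.random.default_rng(seed)
    k = int(0.1 * m)                                  # rank of the low-rank part
    X_star = rng.standard_normal((m, k)) @ rng.standard_normal((n, k)).T
    E_star = np.zeros((m, n))
    nnz = int(0.05 * m * m)                           # number of non-zero entries
    idx = rng.choice(m * n, size=nnz, replace=False)
    E_star.flat[idx] = rng.uniform(-500, 500, size=nnz)
    M = X_star + E_star                               # input to IALM
    lam = 1.0 / np.sqrt(max(m, n))                    # trade-off parameter lambda
    return M, X_star, E_star, lam

def nrmse(X_hat, X_star):
    # normalized root mean square error ||X_hat - X*||_F / ||X*||_F
    return np.linalg.norm(X_hat - X_star) / np.linalg.norm(X_star)
```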

Table 4.4 includes the relative error, the run time, the number of non-zero entries in Ê (‖Ê‖_0), the iteration count, and the number of non-zero singular values (sv) in X̂ of the IALM

^1 http://perception.csl.illinois.edu/matrix-rank/sample_code.html


algorithm using different approximate SVD methods. We observe that IALM_FFSRQR is faster than all the other three methods, while its error is comparable to that of IALM_LANSVD and IALM_RSISVD. IALM_LTSVD is relatively slow and not effective.

TABLE 4.4
Comparison on robust PCA

Size         Method        Error      Time (sec)  ‖Ê‖_0       Iter  sv
1000 × 1000  IALM_LANSVD   3.33e-07   5.79e+00    50,000       22   100
             IALM_FFSRQR   2.79e-07   1.02e+00    50,000       25   100
             IALM_RSISVD   3.36e-07   1.09e+00    49,999       22   100
             IALM_LTSVD    9.92e-02   3.11e+00    999,715     100   100
2000 × 2000  IALM_LANSVD   2.61e-07   5.91e+01    199,999      22   200
             IALM_FFSRQR   1.82e-07   6.93e+00    199,998      25   200
             IALM_RSISVD   2.63e-07   7.38e+00    199,996      22   200
             IALM_LTSVD    8.42e-02   2.20e+01    3,998,937   100   200
4000 × 4000  IALM_LANSVD   1.38e-07   4.65e+02    799,991      23   400
             IALM_FFSRQR   1.39e-07   4.43e+01    800,006      26   400
             IALM_RSISVD   1.51e-07   5.04e+01    799,990      23   400
             IALM_LTSVD    8.94e-02   1.54e+02    15,996,623  100   400
6000 × 6000  IALM_LANSVD   1.30e-07   1.66e+03    1,799,982    23   600
             IALM_FFSRQR   1.02e-07   1.42e+02    1,799,993    26   600
             IALM_RSISVD   1.44e-07   1.62e+02    1,799,985    23   600
             IALM_LTSVD    8.58e-02   5.55e+02    35,992,605  100   600

4.3.2. Matrix completion. We solve the matrix completion problems for two real data sets used in [68]: the Jester joke data set [24] and the MovieLens data set [32]. The Jester joke data set consists of 4.1 million ratings for 100 jokes from 73,421 users and can be downloaded from the website http://goldberg.berkeley.edu/jester-data. We test with the following data matrices:
• jester-1: Data from 24,983 users who have rated 36 or more jokes.
• jester-2: Data from 23,500 users who have rated 36 or more jokes.
• jester-3: Data from 24,938 users who have rated between 15 and 35 jokes.
• jester-all: The combination of jester-1, jester-2, and jester-3.
The MovieLens data set can be downloaded from https://grouplens.org/datasets/movielens. We test with the following data matrices:
• movie-100K: 100,000 ratings of 943 users for 1682 movies.
• movie-1M: 1 million ratings of 6040 users for 3900 movies.
• movie-latest-small: 100,000 ratings of 700 users for 9000 movies.

For each data set, we let M be the original data matrix, where M_ij stands for the rating of joke (movie) j by user i, and Γ be the set of indices where M_ij is known. The matrix completion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE) defined by
$$\mathrm{NMAE} \;\stackrel{\mathrm{def}}{=}\; \frac{1}{|\Gamma|} \sum_{(i,j)\in\Gamma} \frac{|M_{ij} - X_{ij}|}{r_{\max} - r_{\min}},$$
where X_ij is the prediction of the rating of joke (movie) j given by user i, and r_min, r_max are the lower and upper bounds of the ratings, respectively. For the Jester joke data sets, we set r_min = −10 and r_max = 10. For the MovieLens data sets, we set r_min = 1 and r_max = 5.
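For reference, the NMAE above can be computed as in the following short NumPy sketch; the array layout of the index set is an assumption for illustration.

```python
import numpy as np

def nmae(M, X, known_idx, r_min, r_max):
    # NMAE = (1/|Gamma|) * sum_{(i,j) in Gamma} |M_ij - X_ij| / (r_max - r_min),
    # where known_idx is a (|Gamma|, 2) array of (user, item) index pairs.
    rows, cols = known_idx[:, 0], known_idx[:, 1]
    return np.mean(np.abs(M[rows, cols] - X[rows, cols])) / (r_max - r_min)

# Jester ratings lie in [-10, 10]; MovieLens ratings lie in [1, 5]:
# err = nmae(M, X_hat, known_idx, -10, 10)
```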

Since |Γ| is large, we randomly select a subset Ω from Γ and then use Algorithm 8 to solve the problem (2.8). We randomly select 10 ratings for each user in the Jester joke data sets, while we randomly choose about 50% of the ratings for each user in the MovieLens data sets. Table 4.5 includes the parameter settings in the algorithms. The maximum iteration number is 100 in IALM, and all other parameters are the same as those used in [45].

The numerical results are included in Table 4.6. We observe that IALM_FFSRQR achieves almost the same recoverability as the other methods (except IALM_LTSVD) and is slightly faster than IALM_RSISVD for these two data sets.

TABLE 4.5
Parameters used in the IALM method on matrix completion

Data set            m      n     |Γ|        |Ω|
jester-1            24983  100   1.81e+06   249,830
jester-2            23500  100   1.71e+06   235,000
jester-3            24938  100   6.17e+05   249,384
jester-all          73421  100   4.14e+06   734,210
movie-100K          943    1682  1.00e+05   49,918
movie-1M            6040   3706  1.00e+06   498,742
movie-latest-small  671    9066  1.00e+05   52,551

5. Conclusions. We have presented the Flip-Flop SRQR factorization, a variant of the QLP factorization, to compute low-rank matrix approximations. The Flip-Flop SRQR algorithm uses an SRQR factorization to initialize the truncated version of a column-pivoted QR factorization and then forms an LQ factorization. For the numerical results presented, the errors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms. This new algorithm is cheaper to compute and produces quality low-rank matrix approximations. Furthermore, we prove singular value lower bounds and residual error upper bounds for the Flip-Flop SRQR factorization. In situations where the singular values of the input matrix decay relatively quickly, the low-rank approximation computed by Flip-Flop SRQR is guaranteed to be as accurate as the truncated SVD. We also perform a complexity analysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomized subspace iteration. Future work includes reducing the overhead cost in Flip-Flop SRQR and implementing the Flip-Flop SRQR algorithm on distributed memory machines for popular applications such as distributed PCA.

Acknowledgments. The authors would like to thank Yousef Saad, the editor, and three anonymous referees who kindly reviewed the earlier version of this manuscript and provided valuable suggestions and comments.

Appendix A. Partial QR factorization with column pivoting. The details of partial QRCP are covered in Algorithm 4.

Appendix B. TUXV algorithm. The details of the TUXV algorithm are presented in Algorithm 5.

Appendix C. Computing the Tucker decomposition of higher-order tensors. Algorithm 6 computes the Tucker decomposition of higher-order tensors, i.e., the sequentially truncated higher-order SVD.


TABLE 4.6
Comparison on matrix completion

Data set            Method        Iter  Time (sec)  NMAE      sv   σ_max     σ_min
jester-1            IALM-LANSVD   12    7.06e+00    1.84e-01  100  2.14e+03  1.00e+00
                    IALM-FFSRQR   12    3.44e+00    1.69e-01  100  2.28e+03  1.00e+00
                    IALM-RSISVD   12    3.75e+00    1.89e-01  100  2.12e+03  1.00e+00
                    IALM-LTSVD    100   2.11e+01    1.74e-01  62   3.00e+03  1.00e+00
jester-2            IALM-LANSVD   12    6.80e+00    1.85e-01  100  2.13e+03  1.00e+00
                    IALM-FFSRQR   12    2.79e+00    1.70e-01  100  2.29e+03  1.00e+00
                    IALM-RSISVD   12    3.59e+00    1.91e-01  100  2.12e+03  1.00e+00
                    IALM-LTSVD    100   2.03e+01    1.75e-01  58   2.96e+03  1.00e+00
jester-3            IALM-LANSVD   12    7.05e+00    1.26e-01  99   1.79e+03  1.00e+00
                    IALM-FFSRQR   12    3.03e+00    1.22e-01  100  1.71e+03  1.00e+00
                    IALM-RSISVD   12    3.85e+00    1.31e-01  100  1.78e+03  1.00e+00
                    IALM-LTSVD    100   2.12e+01    1.33e-01  55   2.50e+03  1.00e+00
jester-all          IALM-LANSVD   12    2.39e+01    1.72e-01  100  3.56e+03  1.00e+00
                    IALM-FFSRQR   12    1.12e+01    1.62e-01  100  3.63e+03  1.00e+00
                    IALM-RSISVD   12    1.34e+01    1.82e-01  100  3.47e+03  1.00e+00
                    IALM-LTSVD    100   6.99e+01    1.68e-01  52   4.92e+03  1.00e+00
movie-100K          IALM-LANSVD   29    2.86e+01    1.83e-01  285  1.21e+03  1.00e+00
                    IALM-FFSRQR   30    4.55e+00    1.67e-01  295  1.53e+03  1.00e+00
                    IALM-RSISVD   29    4.82e+00    1.82e-01  285  1.29e+03  1.00e+00
                    IALM-LTSVD    48    1.42e+01    1.47e-01  475  1.91e+03  1.00e+00
movie-1M            IALM-LANSVD   50    7.40e+02    1.58e-01  495  4.99e+03  1.00e+00
                    IALM-FFSRQR   53    2.07e+02    1.37e-01  525  6.63e+03  1.00e+00
                    IALM-RSISVD   50    2.23e+02    1.57e-01  495  5.35e+03  1.00e+00
                    IALM-LTSVD    100   8.50e+02    1.17e-01  995  8.97e+03  1.00e+00
movie-latest-small  IALM-LANSVD   31    1.66e+02    1.85e-01  305  1.13e+03  1.00e+00
                    IALM-FFSRQR   31    1.96e+01    2.00e-01  305  1.42e+03  1.00e+00
                    IALM-RSISVD   31    2.85e+01    1.91e-01  305  1.20e+03  1.00e+00
                    IALM-LTSVD    63    4.02e+01    2.08e-01  298  1.79e+03  1.00e+00

Appendix D. IALM for solving two special nuclear norm minimization problems. Algorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems, respectively. In Algorithm 7, S_ω(x) = sgn(x) · max(|x| − ω, 0) is the soft shrinkage operator [29] with x ∈ R^n and ω > 0. In Algorithm 8, Ω̄ is the complement of Ω.

Appendix E. Approximate SVD with randomized subspace iteration. The randomized subspace iteration was proposed in [30, Algorithm 4.4] to compute an orthonormal matrix whose range approximates the range of A. An approximate SVD can be computed using the aforementioned orthonormal matrix [30, Algorithm 5.1]. A randomized subspace iteration is used in the routine MLSVD_RSI in the Matlab toolbox tensorlab [73], and MLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD. We summarize the pseudocode of the approximate SVD with randomized subspace iteration in Algorithm 9.

Now we perform a complexity analysis for the approximate SVD with randomized subspace iteration. We first note that:
1. The cost of generating a random matrix is negligible.
2. The cost of computing B = AΩ is 2mn(k + p).


Algorithm 4 Partial QRCP
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k.
Outputs:
  Orthogonal matrix Q ∈ R^{m×m}.
  Upper trapezoidal matrix R ∈ R^{m×n}.
  Permutation matrix Π ∈ R^{n×n} such that AΠ = QR.
Algorithm:
  Initialize Π = I_n. Compute column norms r_s = ‖A(1:m, s)‖_2 (1 ≤ s ≤ n).
  for j = 1 : k do
    Find i = argmax_{j≤s≤n} r_s. Exchange r_j and r_i, and exchange columns j and i of A and Π.
    Form Householder reflection Q_j from A(j:m, j).
    Update trailing matrix A(j:m, j:n) ← Q_j^T A(j:m, j:n).
    Update r_s = ‖A(j+1:m, s)‖_2 (j+1 ≤ s ≤ n).
  end for
  Q = Q_1 Q_2 ··· Q_k is the product of all reflections. R = upper trapezoidal part of A.
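A minimal NumPy sketch of Algorithm 4 follows; it favors clarity over performance (no blocking, no BLAS-3 updates) and is an illustration rather than the LAPACK-based implementation used in the paper.

```python
import numpy as np

def partial_qrcp(A, k):
    # Partial QR with column pivoting: A[:, perm] ~ Q[:, :k] @ R
    # (up to the neglected trailing block A(k+1:m, k+1:n)).
    A = A.astype(float).copy()
    m, n = A.shape
    perm = np.arange(n)
    col_norms = np.sum(A * A, axis=0)            # squared column norms
    Q = np.eye(m)
    for j in range(k):
        i = j + int(np.argmax(col_norms[j:]))    # pivot column
        A[:, [j, i]] = A[:, [i, j]]
        perm[[j, i]] = perm[[i, j]]
        col_norms[[j, i]] = col_norms[[i, j]]
        # Householder reflection zeroing A[j+1:, j]
        x = A[j:, j]
        v = x.copy()
        v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)
        v /= np.linalg.norm(v)
        A[j:, j:] -= 2.0 * np.outer(v, v @ A[j:, j:])
        Q[:, j:] -= 2.0 * np.outer(Q[:, j:] @ v, v)
        # downdate squared norms of the trailing columns
        col_norms[j + 1:] -= A[j, j + 1:] ** 2
    R = np.triu(A[:k, :])                        # upper trapezoidal factor
    return Q, R, perm
```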

Algorithm 5 TUXV Algorithm
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k. Block size b. Oversampling size p ≥ 0.
Outputs:
  Column orthonormal matrices U ∈ R^{m×k}, V ∈ R^{n×k}, and upper triangular matrix R ∈ R^{k×k} such that A ≈ URV^T.
Algorithm:
  Do TRQRCP on A to obtain Q ∈ R^{m×k}, R ∈ R^{k×n}, and Π ∈ R^{n×n}.
  R = RΠ^T, and do an LQ factorization, i.e., [V, R] = qr(R^T, 0).
  Compute Z = AV and do a QR factorization, i.e., [U, R] = qr(Z, 0).
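The next sketch follows the same three steps with an ordinary partial column-pivoted QR (the `partial_qrcp` sketch above) standing in for TRQRCP; it illustrates the TUXV idea rather than the tuned implementation.

```python
import numpy as np

def tuxv_sketch(A, k):
    # Step 1: truncated pivoted QR, A[:, perm] ~ Q[:, :k] @ R
    Q, R, perm = partial_qrcp(A, k)
    R_unpermuted = np.zeros_like(R)
    R_unpermuted[:, perm] = R                    # R <- R * Pi^T
    # Step 2: LQ factorization of R via QR of R^T; V is n x k
    V, _ = np.linalg.qr(R_unpermuted.T)
    # Step 3: Z = A V and its QR factorization, so A ~ U @ Rk @ V.T
    U, Rk = np.linalg.qr(A @ V)
    return U, Rk, V
```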

3. In each QR step [Q, ∼] = qr(B, 0), the cost of computing the QR factorization of B is 2m(k + p)² − (2/3)(k + p)³ (cf. [69]), and the cost of forming the first (k + p) columns in the full Q matrix is m(k + p)² + (1/3)(k + p)³.

Now we count the flops for each i in the for loop:
1. The cost of computing B = A^T * Q is 2mn(k + p).
2. The cost of computing [Q, ∼] = qr(B, 0) is 2n(k + p)² − (2/3)(k + p)³, and the cost of forming the first (k + p) columns in the full Q matrix is n(k + p)² + (1/3)(k + p)³.
3. The cost of computing B = A * Q is 2mn(k + p).
4. The cost of computing [Q, ∼] = qr(B, 0) is 2m(k + p)² − (2/3)(k + p)³, and the cost of forming the first (k + p) columns in the full Q matrix is m(k + p)² + (1/3)(k + p)³.
Together, the cost of running the for loop q times is
$$q\left(4mn(k+p) + 3(m+n)(k+p)^2 - \frac{2}{3}(k+p)^3\right).$$

Additionally, the cost of computing B = Q^T * A is 2mn(k + p), the cost of doing an SVD of B is O(n(k + p)²), and the cost of computing U = Q * U is 2m(k + p)².


Algorithm 6 ST-HOSVD
Inputs:
  Tensor X ∈ R^{I_1×···×I_d}, desired rank (k_1, ..., k_d), and processing order p = (p_1, ..., p_d).
Outputs:
  Tucker decomposition [G; U_1, ..., U_d].
Algorithm:
  Define tensor G ← X.
  for j = 1 : d do
    r = p_j.
    Compute the exact or approximate rank-k_r SVD of the tensor unfolding: G_(r) ≈ Û_r Σ_r V̂_r^T.
    U_r ← Û_r.
    Update G_(r) ← Σ_r V̂_r^T, i.e., apply Û_r^T to G.
  end for
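A compact NumPy sketch of this loop is shown below. The `lowrank_svd` argument is the pluggable SVD step: swapping in FFSRQR, a randomized SVD, or LTSVD here is exactly the substitution compared in Section 4.2. The default shown (a truncated exact SVD) is an assumption for illustration.

```python
import numpy as np

def st_hosvd(X, ranks, order=None, lowrank_svd=None):
    # Sequentially truncated HOSVD: returns core G and factors U_1, ..., U_d
    # with X ~ G x_1 U_1 ... x_d U_d.
    if lowrank_svd is None:
        def lowrank_svd(M, k):
            U, s, Vt = np.linalg.svd(M, full_matrices=False)
            return U[:, :k], s[:k], Vt[:k]
    d = X.ndim
    order = list(range(d)) if order is None else list(order)
    G = X.copy()
    U_list = [None] * d
    for r in order:
        Gr = np.moveaxis(G, r, 0).reshape(G.shape[r], -1)    # mode-r unfolding
        U, s, Vt = lowrank_svd(Gr, ranks[r])
        U_list[r] = U
        core_unf = s[:, None] * Vt                           # apply U^T to mode r
        rest = [G.shape[i] for i in range(d) if i != r]
        G = np.moveaxis(core_unf.reshape([ranks[r]] + rest), 0, r)
    return G, U_list
```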

Algorithm 7 Robust PCA Using IALM
Inputs:
  Measured matrix M ∈ R^{m×n}, positive number λ, μ_0, μ̄, tolerance tol, ρ > 1.
Outputs:
  Matrix pair (X_k, E_k).
Algorithm:
  k = 0; J(M) = max(‖M‖_2, λ^{-1}‖M‖_F); Y_0 = M / J(M); E_0 = 0.
  while not converged do
    (U, Σ, V) = svd(M − E_k + μ_k^{-1} Y_k).
    X_{k+1} = U S_{μ_k^{-1}}(Σ) V^T.
    E_{k+1} = S_{λ μ_k^{-1}}(M − X_{k+1} + μ_k^{-1} Y_k).
    Y_{k+1} = Y_k + μ_k (M − X_{k+1} − E_{k+1}).
    Update μ_{k+1} = min(ρ μ_k, μ̄).
    k = k + 1.
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then
      Break
    end if
  end while
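A NumPy sketch of this iteration is given below. The `svd_fn` argument is where an approximate SVD (FFSRQR, RSISVD, LTSVD, ...) replaces the exact SVD in the experiments of Section 4.3; the default values of μ_0, μ̄, and ρ below are illustrative assumptions, not the exact settings of the IALM code.

```python
import numpy as np

def soft_threshold(X, w):
    # S_w(x) = sgn(x) * max(|x| - w, 0), applied entrywise
    return np.sign(X) * np.maximum(np.abs(X) - w, 0.0)

def ialm_rpca(M, lam, mu=None, mu_bar=None, rho=1.5, tol=1e-7, max_iter=100, svd_fn=None):
    if svd_fn is None:
        svd_fn = lambda A: np.linalg.svd(A, full_matrices=False)
    if mu is None:
        mu = 1.25 / np.linalg.norm(M, 2)      # assumed initial mu_0
    if mu_bar is None:
        mu_bar = 1e7 * mu                     # assumed cap mu_bar
    J = max(np.linalg.norm(M, 2), np.linalg.norm(M) / lam)
    Y = M / J                                 # Y_0 = M / J(M)
    E = np.zeros_like(M)
    X = np.zeros_like(M)
    for _ in range(max_iter):
        U, s, Vt = svd_fn(M - E + Y / mu)
        X = (U * soft_threshold(s, 1.0 / mu)) @ Vt        # X_{k+1}
        E = soft_threshold(M - X + Y / mu, lam / mu)       # E_{k+1}
        Y = Y + mu * (M - X - E)                           # dual update
        mu = min(rho * mu, mu_bar)
        if np.linalg.norm(M - X - E) / np.linalg.norm(M) < tol:
            break
    return X, E
```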

Now assume k + p ≪ min(m, n) and omit the lower-order terms; then we arrive at (4q + 4)mn(k + p) as the complexity of the approximate SVD with randomized subspace iteration. In practice, q is usually chosen to be an integer between 0 and 2.

REFERENCES

[1] O. ALTER, P. O. BROWN, AND D. BOTSTEIN, Singular value decomposition for genome-wide expression data processing and modeling, Proc. Natl. Acad. Sci. USA, 97 (2000), pp. 10101–10106.
[2] E. ANDERSON, Z. BAI, C. BISCHOF, L. S. BLACKFORD, J. DEMMEL, J. DONGARRA, J. DU CROZ, A. GREENBAUM, S. HAMMARLING, A. MCKENNEY, AND D. SORENSEN, LAPACK Users' Guide, 3rd ed., SIAM, Philadelphia, 1999.
[3] C. A. ANDERSSON AND R. BRO, Improving the speed of multi-way algorithms: Part I. Tucker3, Chemometrics Intell. Lab. Syst., 42 (1998), pp. 93–103.
[4] M. W. BERRY, S. T. DUMAIS, AND G. W. O'BRIEN, Using linear algebra for intelligent information retrieval, SIAM Rev., 37 (1995), pp. 573–595.
[5] C. BISCHOF AND C. VAN LOAN, The WY representation for products of Householder matrices, SIAM J. Sci. Statist. Comput., 8 (1987), pp. S2–S13.


Algorithm 8 Matrix Completion Using IALM
Inputs:
  Sampled set Ω, sampled entries π_Ω(M), positive number λ, μ_0, μ̄, tolerance tol, ρ > 1.
Outputs:
  Matrix pair (X_k, E_k).
Algorithm:
  k = 0; Y_0 = 0; E_0 = 0.
  while not converged do
    (U, Σ, V) = svd(M − E_k + μ_k^{-1} Y_k).
    X_{k+1} = U S_{μ_k^{-1}}(Σ) V^T.
    E_{k+1} = π_Ω̄(M − X_{k+1} + μ_k^{-1} Y_k).
    Y_{k+1} = Y_k + μ_k (M − X_{k+1} − E_{k+1}).
    Update μ_{k+1} = min(ρ μ_k, μ̄).
    k = k + 1.
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then
      Break
    end if
  end while

[6] P. BUSINGER AND G. H. GOLUB, Handbook series linear algebra. Linear least squares solutions by Householder transformations, Numer. Math., 7 (1965), pp. 269–276.
[7] J.-F. CAI, E. J. CANDÈS, AND Z. SHEN, A singular value thresholding algorithm for matrix completion, SIAM J. Optim., 20 (2010), pp. 1956–1982.
[8] E. J. CANDÈS, X. LI, Y. MA, AND J. WRIGHT, Robust principal component analysis, J. ACM, 58 (2011), Art. 11, 37 pages.
[9] E. J. CANDÈS AND B. RECHT, Exact matrix completion via convex optimization, Found. Comput. Math., 9 (2009), pp. 717–772.
[10] J. D. CARROLL AND J.-J. CHANG, Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition, Psychometrika, 35 (1970), pp. 283–319.
[11] T. A. DAVIS AND Y. HU, The University of Florida sparse matrix collection, ACM Trans. Math. Software, 38 (2011), Art. 1, 25 pages.
[12] L. DE LATHAUWER AND B. DE MOOR, From matrix to tensor: Multilinear algebra and signal processing, in Mathematics in Signal Processing IV, Inst. Math. Appl. Conf. Ser., Oxford Univ. Press, Oxford, 1998, pp. 1–16.
[13] L. DE LATHAUWER, B. DE MOOR, AND J. VANDEWALLE, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253–1278.
[14] ———, On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1324–1342.
[15] L. DE LATHAUWER AND J. VANDEWALLE, Dimensionality reduction in higher-order signal processing and rank-(R1, R2, ..., RN) reduction in multilinear algebra, Linear Algebra Appl., 391 (2004), pp. 31–55.
[16] P. DRINEAS, R. KANNAN, AND M. W. MAHONEY, Fast Monte Carlo algorithms for matrices II: Computing a low-rank approximation to a matrix, SIAM J. Comput., 36 (2006), pp. 158–183.
[17] J. A. DUERSCH AND M. GU, Randomized QR with column pivoting, SIAM J. Sci. Comput., 39 (2017), pp. C263–C291.
[18] C. ECKART AND G. YOUNG, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211–218.
[19] L. ELDÉN, Matrix Methods in Data Mining and Pattern Recognition, SIAM, Philadelphia, 2007.
[20] M. FAZEL, H. HINDI, AND S. P. BOYD, Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices, in Proceedings of the 2003 American Control Conference 2003, IEEE Conference Proceedings, IEEE, Piscataway, pp. 2156–2162.
[21] M. FAZEL, T. K. PONG, D. SUN, AND P. TSENG, Hankel matrix rank minimization with applications to system identification and realization, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 946–977.
[22] C. FRANK AND E. NÖTH, Optimizing eigenfaces by face masks for facial expression recognition, in Computer Analysis of Images and Patterns, N. Petkov and M. A. Westenberg, eds., vol. 2756 of Lecture Notes in Comput. Sci., Springer, Berlin, 2003, pp. 646–654.


Algorithm 9 Approximate SVD with Randomized Subspace Iteration
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k. Oversampling size p ≥ 0. Number of iterations q ≥ 1.
Outputs:
  U ∈ R^{m×k} contains the approximate top k left singular vectors of A.
  Σ ∈ R^{k×k} contains the approximate top k singular values of A.
  V ∈ R^{n×k} contains the approximate top k right singular vectors of A.
Algorithm:
  Generate an i.i.d. Gaussian matrix Ω ∈ N(0, 1)^{n×(k+p)}.
  Compute B = AΩ.
  [Q, ∼] = qr(B, 0).
  for i = 1 : q do
    B = A^T * Q; [Q, ∼] = qr(B, 0).
    B = A * Q; [Q, ∼] = qr(B, 0).
  end for
  B = Q^T * A.
  [U, Σ, V] = svd(B).
  U = Q * U; U = U(:, 1:k); Σ = Σ(1:k, 1:k); V = V(:, 1:k).
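For completeness, here is a direct NumPy transcription of Algorithm 9 (a sketch; it uses dense QR and SVD throughout).

```python
import numpy as np

def rsi_svd(A, k, p=5, q=1, seed=0):
    # Approximate rank-k SVD with randomized subspace iteration.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))
    Q, _ = np.linalg.qr(A @ Omega)             # B = A*Omega, [Q,~] = qr(B, 0)
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q)           # B = A^T * Q
        Q, _ = np.linalg.qr(A @ Q)             # B = A * Q
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k].T
```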

[23] G. W. FURNAS, S. DEERWESTER, S. T. DUMAIS, T. K. LANDAUER, R. A. HARSHMAN, L. A. STREETER, AND K. E. LOCHBAUM, Information retrieval using a singular value decomposition model of latent semantic structure, in Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Y. Chiaramella, ed., ACM, New York, 1988, pp. 465–480.
[24] K. GOLDBERG, T. ROEDER, D. GUPTA, AND C. PERKINS, Eigentaste: A constant time collaborative filtering algorithm, Inf. Retrieval, 4 (2001), pp. 133–151.
[25] G. GOLUB, Numerical methods for solving linear least squares problems, Numer. Math., 7 (1965), pp. 206–216.
[26] G. H. GOLUB AND C. F. VAN LOAN, Matrix Computations, 4th ed., Johns Hopkins University Press, Baltimore, 2013.
[27] M. GU, Subspace iteration randomization and singular value problems, SIAM J. Sci. Comput., 37 (2015), pp. A1139–A1173.
[28] M. GU AND S. C. EISENSTAT, Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869.
[29] E. T. HALE, W. YIN, AND Y. ZHANG, Fixed-point continuation for l1-minimization: methodology and convergence, SIAM J. Optim., 19 (2008), pp. 1107–1130.
[30] N. HALKO, P. G. MARTINSSON, AND J. A. TROPP, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217–288.
[31] R. A. HARSHMAN, Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multimodal factor analysis, UCLA Working Papers in Phonetics, 16 (1970), pp. 1–84.
[32] J. L. HERLOCKER, J. A. KONSTAN, A. BORCHERS, AND J. RIEDL, An algorithmic framework for performing collaborative filtering, in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, F. Gey, M. Hearst, and R. Tong, eds., ACM, New York, 1999, pp. 230–237.
[33] R. A. HORN AND C. R. JOHNSON, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1991.
[34] D. A. HUCKABY AND T. F. CHAN, On the convergence of Stewart's QLP algorithm for approximating the SVD, Numer. Algorithms, 32 (2003), pp. 287–316.
[35] ———, Stewart's pivoted QLP decomposition for low-rank matrices, Numer. Linear Algebra Appl., 12 (2005), pp. 153–159.
[36] W. B. JOHNSON AND J. LINDENSTRAUSS, Extensions of Lipschitz mappings into a Hilbert space, in Conference in Modern Analysis and Probability (New Haven, Conn., 1982), R. Beals, A. Beck, A. Bellow, and A. Hajian, eds., vol. 26 of Contemp. Math., Amer. Math. Soc., Providence, 1984, pp. 189–206.


[37] I. T. JOLLIFFE, Principal Component Analysis, Springer, New York, 1986.
[38] J. M. KLEINBERG, Authoritative sources in a hyperlinked environment, J. ACM, 46 (1999), pp. 604–632.
[39] T. G. KOLDA, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 243–255.
[40] T. G. KOLDA AND B. W. BADER, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455–500.
[41] R. M. LARSEN, PROPACK—software for large and sparse SVD calculations, available at http://sun.stanford.edu/~rmunk/PROPACK.
[42] ———, Lanczos bidiagonalization with partial reorthogonalization, Tech. Report PB-537, Department of Computer Science, Aarhus University, Aarhus, Denmark, 1998.
[43] Y. LECUN, L. BOTTOU, Y. BENGIO, AND P. HAFFNER, Gradient-based learning applied to document recognition, Proc. IEEE, 86 (1998), pp. 2278–2324.
[44] R. B. LEHOUCQ, D. C. SORENSEN, AND C. YANG, ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods, SIAM, Philadelphia, 1998.
[45] Z. LIN, M. CHEN, AND Y. MA, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices, Preprint on arXiv, https://arxiv.org/abs/1009.5055, 2010.
[46] N. LINIAL, E. LONDON, AND Y. RABINOVICH, The geometry of graphs and some of its algorithmic applications, Combinatorica, 15 (1995), pp. 215–245.
[47] Z. LIU, A. HANSSON, AND L. VANDENBERGHE, Nuclear norm system identification with missing inputs and outputs, Systems Control Lett., 62 (2013), pp. 605–612.
[48] Z. LIU AND L. VANDENBERGHE, Interior-point method for nuclear norm approximation with application to system identification, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1235–1256.
[49] S. MA, D. GOLDFARB, AND L. CHEN, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program., 128 (2011), pp. 321–353.
[50] C. B. MELGAARD, Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra, PhD Thesis, University of California, Berkeley, 2015.
[51] M. MESBAHI AND G. P. PAPAVASSILOPOULOS, On the rank minimization problem over a positive semidefinite linear matrix inequality, IEEE Trans. Automat. Control, 42 (1997), pp. 239–243.
[52] N. MULLER, L. MAGAIA, AND B. M. HERBST, Singular value decomposition, eigenfaces, and 3D reconstructions, SIAM Rev., 46 (2004), pp. 518–545.
[53] M. NARWARIA AND W. LIN, SVD-based quality metric for image and video using machine learning, IEEE Trans. Syst., Man, and Cybernetics, Part B (Cybernetics), 42 (2012), pp. 347–364.
[54] T.-H. OH, Y. MATSUSHITA, Y.-W. TAI, AND I. SO KWEON, Fast randomized singular value thresholding for nuclear norm minimization, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015, IEEE Conference Proceedings, IEEE, Los Alamitos, 2015, pp. 4484–4493.
[55] G. QUINTANA-ORTÍ, X. SUN, AND C. H. BISCHOF, A BLAS-3 version of the QR factorization with column pivoting, SIAM J. Sci. Comput., 19 (1998), pp. 1486–1494.
[56] B. RECHT, M. FAZEL, AND P. A. PARRILO, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Rev., 52 (2010), pp. 471–501.
[57] J. D. RENNIE AND N. SREBRO, Fast maximum margin matrix factorization for collaborative prediction, in Proceedings of the 22nd International Conference on Machine Learning, L. De Raedt and S. Wrobel, eds., ACM Press, New York, 2005, pp. 713–719.
[58] V. ROKHLIN, A. SZLAM, AND M. TYGERT, A randomized algorithm for principal component analysis, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1100–1124.
[59] A. K. SAIBABA, HOID: higher order interpolatory decomposition for tensors based on Tucker representation, SIAM J. Matrix Anal. Appl., 37 (2016), pp. 1223–1249.
[60] B. SAVAS AND L. ELDÉN, Handwritten digit classification using higher order singular value decomposition, Pattern Recognition, 40 (2007), pp. 993–1003.
[61] R. SCHREIBER AND C. VAN LOAN, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 53–57.
[62] A. SHASHUA AND T. HAZAN, Non-negative tensor factorization with applications to statistics and computer vision, in Proceedings of the 22nd International Conference on Machine Learning, S. Dzeroski, L. De Raedt, and S. Wrobel, eds., ACM, New York, 2005, pp. 792–799.
[63] N. D. SIDIROPOULOS, R. BRO, AND G. B. GIANNAKIS, Parallel factor analysis in sensor array processing, IEEE Trans. Signal Process., 48 (2000), pp. 2377–2388.
[64] D. C. SORENSEN AND M. EMBREE, A DEIM induced CUR factorization, SIAM J. Sci. Comput., 38 (2016), pp. A1454–A1482.
[65] N. SREBRO, J. RENNIE, AND T. S. JAAKKOLA, Maximum-margin matrix factorization, in NIPS'04: Proceedings of the 17th International Conference on Neural Information Processing Systems, L. K. Saul, Y. Weiss, and L. Bottou, eds., MIT Press, Cambridge, 2004, pp. 1329–1336.
[66] G. W. STEWART, The QLP approximation to the singular value decomposition, SIAM J. Sci. Comput., 20 (1999), pp. 1336–1348.


[67] S. TAHERI, Q. QIU, AND R. CHELLAPPA, Structure-preserving sparse decomposition for facial expression analysis, IEEE Trans. Image Process., 23 (2014), pp. 3590–3603.
[68] K.-C. TOH AND S. YUN, An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems, Pac. J. Optim., 6 (2010), pp. 615–640.
[69] L. N. TREFETHEN AND D. BAU, III, Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[70] L. R. TUCKER, Some mathematical notes on three-mode factor analysis, Psychometrika, 31 (1966), pp. 279–311.
[71] N. VANNIEUWENHOVEN, R. VANDEBRIL, AND K. MEERBERGEN, A new truncation strategy for the higher-order singular value decomposition, SIAM J. Sci. Comput., 34 (2012), pp. A1027–A1052.
[72] M. A. O. VASILESCU AND D. TERZOPOULOS, Multilinear analysis of image ensembles: Tensorfaces, in European Conference on Computer Vision 2002, A. Heyden, G. Sparr, M. Nielsen, and P. Johansen, eds., vol. 2350 of Lecture Notes in Computer Science, Springer, Berlin, 2002, pp. 447–460.
[73] N. VERVLIET, O. DEBALS, L. SORBER, M. VAN BAREL, AND L. DE LATHAUWER, Tensorlab user guide, available at http://www.tensorlab.net.
[74] J. XIAO AND M. GU, Spectrum-revealing Cholesky factorization for kernel methods, in Proceedings of the 16th IEEE International Conference on Data Mining (ICDM), IEEE Conference Proceedings, Los Alamitos, 2016, pp. 1293–1298.
[75] J. XIAO, M. GU, AND J. LANGOU, Fast parallel randomized QR with column pivoting algorithms for reliable low-rank matrix approximations, in Proceedings of the 24th IEEE International Conference on High Performance Computing (HiPC), IEEE Conference Proceedings, Los Alamitos, 2017, pp. 233–242.
[76] T. ZHANG AND G. H. GOLUB, Rank-one approximation to high order tensors, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 534–550.

42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483

100 200 300 400 500

target rank k

022

024

026

028

03

032

034

036re

lative

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

100 200 300 400 500

target rank k

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

100 200 300 400 500

target rank k

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

100 200 300 400 500

target rank k

024

026

028

03

032

034

036

rela

tive

ap

pro

xim

atio

n e

rro

rFFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 42 Relative approximation error comparison for approximate SVD algorithms

algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab

421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]

X =

10sumj=1

1000

jxj yj zj +

nsumj=11

1

jxj yj zj

where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2

U2 times3 U3

ETNAKent State University and

Johann Radon Institute (RICAM)

484 Y FENG J XIAO AND M GU

5 10 15 20

index

06

07

08

09

1to

p 2

0 s

ingula

r valu

es

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

5 10 15 20

index

06

07

08

09

1

top 2

0 s

ingula

r valu

es

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

5 10 15 20

index

06

065

07

075

08

085

09

095

top

20

sin

gu

lar

va

lue

s

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

5 10 15 20

index

101

102

103

top

20

sin

gu

lar

va

lue

sFFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 43 Top 20 singular values comparison for approximate SVD algorithms

Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger

422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]

We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3

where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method

MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485

50 100 150 200

target rank k

10-5

10-4

10-3

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

MLSVD

MLSVD_FFSRQR

MLSVD_RSI

MLSVD_LTSVD

50 100 150 200

target rank k

10-2

10-1

100

101

102

run tim

e (

sec)

MLSVD

MLSVD_FFSRQR

MLSVD_RSI

MLSVD_LTSVD

FIG 44 Run time and relative approximation error comparison on a sparse tensor

TABLE 43Comparison on handwritten digits classification

MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD

Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259

one, and MLSVD_LTSVD is the fastest one. In terms of classification quality, MLSVD, MLSVD_FFSRQR, and MLSVD_RSI are comparable, while MLSVD_LTSVD is the least accurate one.
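For readers who want a concrete picture of the classification step, the following is a minimal nearest-subspace classifier sketch in Python. It is a simplified stand-in for Algorithm 2 in [60], not the authors' implementation; the function and variable names (classify, class_bases) are ours, and the per-class orthonormal bases are assumed to be extracted beforehand, e.g., from the ST-HOSVD factors.

```python
import numpy as np

def classify(test_digits, class_bases):
    """Assign each test digit (a column of test_digits) to the class whose
    orthonormal basis Q explains it best, i.e., smallest residual ||d - Q Q^T d||_2."""
    labels = []
    for d in test_digits.T:
        residuals = [np.linalg.norm(d - Q @ (Q.T @ d)) for Q in class_bases]
        labels.append(int(np.argmin(residuals)))
    return np.array(labels)
```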

4.3. Solving nuclear norm minimization problem. To show the effectiveness of the FFSRQR algorithm for the nuclear norm minimization problems, we investigate two scenarios: the robust PCA problem (2.7) and the matrix completion problem (2.8). The test matrix used for the robust PCA is introduced in [45], and the test matrices used in the matrix completion are two real data sets. We use the IALM method [45] to solve both problems; the IALM code can be downloaded^1.

4.3.1. Robust PCA. To solve the robust PCA problem, we replace the approximate SVD part in the IALM method [45] by various methods. We denote the actual solution to the robust PCA problem by a matrix pair (X*, E*) ∈ R^{m×n} × R^{m×n}. The matrix X* is X* = X_L X_R^T, with X_L ∈ R^{m×k}, X_R ∈ R^{n×k} being random matrices whose entries are independently sampled from a normal distribution. The sparse matrix E* is a random matrix where its non-zero entries are independently sampled from a uniform distribution over the interval [−500, 500]. The input to the IALM algorithm has the form M = X* + E*, and the output is denoted by (X̂, Ê). In this numerical experiment, we use the same parameter settings as the IALM code for robust PCA: the rank k is 0.1m, and the number of non-zero entries in E is 0.05m². We choose the trade-off parameter λ = 1/√(max(m, n)) as suggested by Candès et al. [8]. The solution quality is measured by the normalized root mean square error ‖X̂ − X*‖_F / ‖X*‖_F.
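The test problem just described can be generated in a few lines; this is our own sketch (sizes and distributions follow the text, names are illustrative).

```python
import numpy as np

def rpca_test_problem(m, n, seed=0):
    """Synthetic robust-PCA instance: low-rank X* plus sparse E*, with rank 0.1*m
    and 0.05*m^2 non-zero entries in E*, as described above."""
    rng = np.random.default_rng(seed)
    k = int(0.1 * m)
    X_star = rng.standard_normal((m, k)) @ rng.standard_normal((n, k)).T
    E_star = np.zeros((m, n))
    idx = rng.choice(m * n, size=int(0.05 * m ** 2), replace=False)
    E_star.flat[idx] = rng.uniform(-500, 500, size=idx.size)
    lam = 1.0 / np.sqrt(max(m, n))          # trade-off parameter from the text
    return X_star + E_star, X_star, E_star, lam

# solution quality: np.linalg.norm(X_hat - X_star) / np.linalg.norm(X_star)
```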

Table 4.4 includes the relative error, the run time, the number of non-zero entries in Ê (‖Ê‖_0), the iteration count, and the number of non-zero singular values (sv) in X̂ of the IALM

1 http://perception.csl.illinois.edu/matrix-rank/sample_code.html


algorithm using different approximate SVD methods. We observe that IALM_FFSRQR is faster than all the other three methods, while its error is comparable to IALM_LANSVD and IALM_RSISVD. IALM_LTSVD is relatively slow and not effective.

TABLE 4.4
Comparison on robust PCA

Size         Method         Error      Time (sec)   ‖Ê‖_0       Iter   sv
1000 × 1000  IALM_LANSVD    3.33e-07   5.79e+00        50000     22    100
             IALM_FFSRQR    2.79e-07   1.02e+00        50000     25    100
             IALM_RSISVD    3.36e-07   1.09e+00        49999     22    100
             IALM_LTSVD     9.92e-02   3.11e+00       999715    100    100
2000 × 2000  IALM_LANSVD    2.61e-07   5.91e+01       199999     22    200
             IALM_FFSRQR    1.82e-07   6.93e+00       199998     25    200
             IALM_RSISVD    2.63e-07   7.38e+00       199996     22    200
             IALM_LTSVD     8.42e-02   2.20e+01      3998937    100    200
4000 × 4000  IALM_LANSVD    1.38e-07   4.65e+02       799991     23    400
             IALM_FFSRQR    1.39e-07   4.43e+01       800006     26    400
             IALM_RSISVD    1.51e-07   5.04e+01       799990     23    400
             IALM_LTSVD     8.94e-02   1.54e+02     15996623    100    400
6000 × 6000  IALM_LANSVD    1.30e-07   1.66e+03      1799982     23    600
             IALM_FFSRQR    1.02e-07   1.42e+02      1799993     26    600
             IALM_RSISVD    1.44e-07   1.62e+02      1799985     23    600
             IALM_LTSVD     8.58e-02   5.55e+02     35992605    100    600

4.3.2. Matrix completion. We solve the matrix completion problems for two real data sets used in [68]: the Jester joke data set [24] and the MovieLens data set [32]. The Jester joke data set consists of 4.1 million ratings for 100 jokes from 73,421 users and can be downloaded from the website http://goldberg.berkeley.edu/jester-data/. We test with the following data matrices:

• jester-1: Data from 24,983 users who have rated 36 or more jokes.
• jester-2: Data from 23,500 users who have rated 36 or more jokes.
• jester-3: Data from 24,938 users who have rated between 15 and 35 jokes.
• jester-all: The combination of jester-1, jester-2, and jester-3.

The MovieLens data set can be downloaded from https://grouplens.org/datasets/movielens/. We test with the following data matrices:
• movie-100K: 100,000 ratings of 943 users for 1682 movies.
• movie-1M: 1 million ratings of 6040 users for 3900 movies.
• movie-latest-small: 100,000 ratings of 700 users for 9000 movies.

For each data set, we let M be the original data matrix, where M_{ij} stands for the rating of joke (movie) j by user i, and Γ be the set of indices where M_{ij} is known. The matrix completion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE), defined by

\[
  \mathrm{NMAE} \;\overset{\mathrm{def}}{=}\; \frac{1}{|\Gamma|} \sum_{(i,j)\in\Gamma} \frac{|M_{ij} - X_{ij}|}{r_{\max} - r_{\min}},
\]


where X_{ij} is the prediction of the rating of joke (movie) j given by user i, and r_min, r_max are the lower and upper bounds of the ratings, respectively. For the Jester joke data sets we set r_min = −10 and r_max = 10. For the MovieLens data sets we set r_min = 1 and r_max = 5.
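In code, the NMAE above amounts to one masked mean. The following is a small sketch of ours; the boolean mask and function name are illustrative, not part of the experiment codes.

```python
import numpy as np

def nmae(M, X, known_mask, r_min, r_max):
    """Normalized Mean Absolute Error over the known ratings Gamma.
    known_mask: boolean array of the same shape as M, True on entries in Gamma."""
    return np.abs(M[known_mask] - X[known_mask]).mean() / (r_max - r_min)
```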

Since |Γ| is large, we randomly select a subset Ω from Γ and then use Algorithm 8 to solve the problem (2.8). We randomly select 10 ratings for each user in the Jester joke data sets, while we randomly choose about 50% of the ratings for each user in the MovieLens data sets. Table 4.5 includes the parameter settings in the algorithms. The maximum iteration number is 100 in IALM, and all other parameters are the same as those used in [45].

The numerical results are included in Table 4.6. We observe that IALM_FFSRQR achieves almost the same recoverability as the other methods (except IALM_LTSVD) and is slightly faster than IALM_RSISVD for these two data sets.

TABLE 4.5
Parameters used in the IALM method on matrix completion

Data set              m       n      |Γ|        |Ω|
jester-1              24983   100    1.81e+06   249830
jester-2              23500   100    1.71e+06   235000
jester-3              24938   100    6.17e+05   249384
jester-all            73421   100    4.14e+06   734210
movie-100K            943     1682   1.00e+05   49918
movie-1M              6040    3706   1.00e+06   498742
movie-latest-small    671     9066   1.00e+05   52551

5. Conclusions. We have presented the Flip-Flop SRQR factorization, a variant of the QLP factorization, to compute low-rank matrix approximations. The Flip-Flop SRQR algorithm uses an SRQR factorization to initialize the truncated version of a column-pivoted QR factorization and then forms an LQ factorization. For the numerical results presented, the errors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms. This new algorithm is cheaper to compute and produces quality low-rank matrix approximations. Furthermore, we prove singular value lower bounds and residual error upper bounds for the Flip-Flop SRQR factorization. In situations where the singular values of the input matrix decay relatively quickly, the low-rank approximation computed by Flip-Flop SRQR is guaranteed to be as accurate as the truncated SVD. We also perform a complexity analysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomized subspace iteration. Future work includes reducing the overhead cost in Flip-Flop SRQR and implementing the Flip-Flop SRQR algorithm on distributed memory machines for popular applications such as distributed PCA.

Acknowledgments. The authors would like to thank Yousef Saad, the editor, and three anonymous referees, who kindly reviewed the earlier version of this manuscript and provided valuable suggestions and comments.

Appendix A. Partial QR factorization with column pivoting. The details of partial QRCP are covered in Algorithm 4.

Appendix B. TUXV algorithm. The details of the TUXV algorithm are presented in Algorithm 5.

Appendix C. Computing Tucker decomposition of higher-order tensors. Algorithm 6 computes the Tucker decomposition of higher-order tensors, i.e., the sequentially truncated higher-order SVD.


TABLE 4.6
Comparison on matrix completion

Data set             Method         Iter   Time       NMAE       sv    σ_max      σ_min
jester-1             IALM-LANSVD    12     7.06e+00   1.84e-01   100   2.14e+03   1.00e+00
                     IALM-FFSRQR    12     3.44e+00   1.69e-01   100   2.28e+03   1.00e+00
                     IALM-RSISVD    12     3.75e+00   1.89e-01   100   2.12e+03   1.00e+00
                     IALM-LTSVD     100    2.11e+01   1.74e-01   62    3.00e+03   1.00e+00
jester-2             IALM-LANSVD    12     6.80e+00   1.85e-01   100   2.13e+03   1.00e+00
                     IALM-FFSRQR    12     2.79e+00   1.70e-01   100   2.29e+03   1.00e+00
                     IALM-RSISVD    12     3.59e+00   1.91e-01   100   2.12e+03   1.00e+00
                     IALM-LTSVD     100    2.03e+01   1.75e-01   58    2.96e+03   1.00e+00
jester-3             IALM-LANSVD    12     7.05e+00   1.26e-01   99    1.79e+03   1.00e+00
                     IALM-FFSRQR    12     3.03e+00   1.22e-01   100   1.71e+03   1.00e+00
                     IALM-RSISVD    12     3.85e+00   1.31e-01   100   1.78e+03   1.00e+00
                     IALM-LTSVD     100    2.12e+01   1.33e-01   55    2.50e+03   1.00e+00
jester-all           IALM-LANSVD    12     2.39e+01   1.72e-01   100   3.56e+03   1.00e+00
                     IALM-FFSRQR    12     1.12e+01   1.62e-01   100   3.63e+03   1.00e+00
                     IALM-RSISVD    12     1.34e+01   1.82e-01   100   3.47e+03   1.00e+00
                     IALM-LTSVD     100    6.99e+01   1.68e-01   52    4.92e+03   1.00e+00
movie-100K           IALM-LANSVD    29     2.86e+01   1.83e-01   285   1.21e+03   1.00e+00
                     IALM-FFSRQR    30     4.55e+00   1.67e-01   295   1.53e+03   1.00e+00
                     IALM-RSISVD    29     4.82e+00   1.82e-01   285   1.29e+03   1.00e+00
                     IALM-LTSVD     48     1.42e+01   1.47e-01   475   1.91e+03   1.00e+00
movie-1M             IALM-LANSVD    50     7.40e+02   1.58e-01   495   4.99e+03   1.00e+00
                     IALM-FFSRQR    53     2.07e+02   1.37e-01   525   6.63e+03   1.00e+00
                     IALM-RSISVD    50     2.23e+02   1.57e-01   495   5.35e+03   1.00e+00
                     IALM-LTSVD     100    8.50e+02   1.17e-01   995   8.97e+03   1.00e+00
movie-latest-small   IALM-LANSVD    31     1.66e+02   1.85e-01   305   1.13e+03   1.00e+00
                     IALM-FFSRQR    31     1.96e+01   2.00e-01   305   1.42e+03   1.00e+00
                     IALM-RSISVD    31     2.85e+01   1.91e-01   305   1.20e+03   1.00e+00
                     IALM-LTSVD     63     4.02e+01   2.08e-01   298   1.79e+03   1.00e+00

Appendix D. IALM for solving two special nuclear norm minimization problems. Algorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems, respectively. In Algorithm 7, S_ω(x) = sgn(x) · max(|x| − ω, 0) is the soft shrinkage operator [29] with x ∈ R^n and ω > 0. In Algorithm 8, Ω̄ is the complement of Ω.
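The soft shrinkage operator is one line in numpy; the sketch below (ours, not the IALM code) also indicates how Algorithms 7 and 8 apply it to the singular values.

```python
import numpy as np

def soft_shrink(x, omega):
    """S_omega(x) = sgn(x) * max(|x| - omega, 0), applied elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - omega, 0.0)

# In the X-update of Algorithms 7 and 8 this is applied to the singular values:
#   U, s, Vt = np.linalg.svd(M - E + Y / mu, full_matrices=False)
#   X = U @ np.diag(soft_shrink(s, 1.0 / mu)) @ Vt
```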

Appendix E. Approximate SVD with randomized subspace iteration. The randomized subspace iteration was proposed in [30, Algorithm 4.4] to compute an orthonormal matrix whose range approximates the range of A. An approximate SVD can be computed using the aforementioned orthonormal matrix [30, Algorithm 5.1]. A randomized subspace iteration is used in the routine MLSVD_RSI in the Matlab toolbox tensorlab [73], and MLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD. We summarize the approximate SVD with randomized subspace iteration pseudocode in Algorithm 9.

Now we perform a complexity analysis for the approximate SVD with randomized subspace iteration. We first note that:
1. The cost of generating a random matrix is negligible.
2. The cost of computing B = AΩ is 2mn(k + p).


Algorithm 4 Partial QRCP
Inputs: Matrix A ∈ R^{m×n}. Target rank k.
Outputs: Orthogonal matrix Q ∈ R^{m×m}. Upper trapezoidal matrix R ∈ R^{m×n}. Permutation matrix Π ∈ R^{n×n} such that AΠ = QR.
Algorithm:
Initialize Π = I_n. Compute column norms r_s = ‖A(1 : m, s)‖_2 (1 ≤ s ≤ n).
for j = 1 : k do
    Find i = arg max_{j ≤ s ≤ n} r_s. Exchange r_j and r_i and the corresponding columns in A and Π.
    Form the Householder reflection Q_j from A(j : m, j).
    Update the trailing matrix A(j : m, j : n) ← Q_j^T A(j : m, j : n).
    Update r_s = ‖A(j + 1 : m, s)‖_2 (j + 1 ≤ s ≤ n).
end for
Q = Q_1 Q_2 · · · Q_k is the product of all the reflections. R = upper trapezoidal part of A.
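A compact, unblocked Python sketch of Algorithm 4 follows. It is ours for illustration only (it recomputes column norms rather than downdating them, and it is not the LAPACK-style blocked implementation used in the experiments).

```python
import numpy as np

def partial_qrcp(A, k):
    """Rank-k partial QR with column pivoting. Returns Q (m x k), R (k x n) and the
    permutation piv so that A[:, piv] is approximated by Q @ R up to the unreduced
    trailing block."""
    A = A.astype(float).copy()
    m, n = A.shape
    piv = np.arange(n)
    Q = np.eye(m)
    for j in range(k):
        # pivot: bring the remaining column of largest norm to position j
        norms = np.linalg.norm(A[j:, j:], axis=0)
        i = j + int(np.argmax(norms))
        A[:, [j, i]] = A[:, [i, j]]
        piv[[j, i]] = piv[[i, j]]
        # Householder reflector zeroing A[j+1:, j]
        x = A[j:, j].copy()
        v = x.copy()
        v[0] += np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
        nv = np.linalg.norm(v)
        if nv > 0:
            v /= nv
            A[j:, j:] -= 2.0 * np.outer(v, v @ A[j:, j:])      # apply reflector to A
            Q[:, j:] -= 2.0 * np.outer(Q[:, j:] @ v, v)        # accumulate Q
    return Q[:, :k], np.triu(A)[:k, :], piv
```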

Algorithm 5 TUXV Algorithm
Inputs: Matrix A ∈ R^{m×n}. Target rank k. Block size b. Oversampling size p ≥ 0.
Outputs: Column orthonormal matrices U ∈ R^{m×k}, V ∈ R^{n×k}, and upper triangular matrix R ∈ R^{k×k} such that A ≈ URV^T.
Algorithm:
Do TRQRCP on A to obtain Q ∈ R^{m×k}, R ∈ R^{k×n}, and Π ∈ R^{n×n}.
R = RΠ^T and do an LQ factorization, i.e., [V, R] = qr(R^T, 0).
Compute Z = AV and do a QR factorization, i.e., [U, R] = qr(Z, 0).
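The flip-flop structure of Algorithm 5 can be sketched in a few lines of Python. In the sketch below, ordinary (truncated) QRCP from SciPy stands in for TRQRCP purely for illustration; it is not the algorithm's randomized initialization.

```python
import numpy as np
from scipy.linalg import qr

def tuxv_sketch(A, k):
    """Illustrative TUXV: partial QRCP (here plain QRCP truncated to k steps),
    then an LQ of R*Pi^T via QR of its transpose, then a final QR of A @ V."""
    Q, R, piv = qr(A, mode='economic', pivoting=True)   # A[:, piv] = Q R
    Q, R = Q[:, :k], R[:k, :]                            # keep the leading k steps
    R_unpiv = np.empty_like(R)
    R_unpiv[:, piv] = R                                  # R Pi^T: undo the permutation
    V, _ = np.linalg.qr(R_unpiv.T)                       # LQ of R Pi^T via QR of transpose
    Z = A @ V
    U, Rk = np.linalg.qr(Z)
    return U, Rk, V                                      # A is approximated by U Rk V^T
```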

3. In each QR step [Q, ∼] = qr(B, 0), the cost of computing the QR factorization of B is 2m(k + p)^2 − (2/3)(k + p)^3 (cf. [69]), and the cost of forming the first (k + p) columns in the full Q matrix is m(k + p)^2 + (1/3)(k + p)^3.

Now we count the flops for each i in the for loop:
1. The cost of computing B = A^T ∗ Q is 2mn(k + p).
2. The cost of computing [Q, ∼] = qr(B, 0) is 2n(k + p)^2 − (2/3)(k + p)^3, and the cost of forming the first (k + p) columns in the full Q matrix is n(k + p)^2 + (1/3)(k + p)^3.
3. The cost of computing B = A ∗ Q is 2mn(k + p).
4. The cost of computing [Q, ∼] = qr(B, 0) is 2m(k + p)^2 − (2/3)(k + p)^3, and the cost of forming the first (k + p) columns in the full Q matrix is m(k + p)^2 + (1/3)(k + p)^3.

Together, the cost of running the for loop q times is

q ( 4mn(k + p) + 3(m + n)(k + p)^2 − (2/3)(k + p)^3 ).

Additionally, the cost of computing B = Q^T ∗ A is 2mn(k + p), the cost of doing the SVD of B is O(n(k + p)^2), and the cost of computing U = Q ∗ U is 2m(k + p)^2.


Algorithm 6 ST-HOSVD
Inputs: Tensor X ∈ R^{I_1×···×I_d}, desired rank (k_1, . . . , k_d), and processing order p = (p_1, · · · , p_d).
Outputs: Tucker decomposition [G; U_1, · · · , U_d].
Algorithm:
Define tensor G ← X.
for j = 1 : d do
    r = p_j
    Compute the exact or approximate rank-k_r SVD of the tensor unfolding: G_(r) ≈ Ũ_r Σ_r V_r^T
    U_r ← Ũ_r
    Update G_(r) ← Σ_r V_r^T, i.e., apply Ũ_r^T to G
end for
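For concreteness, here is a small numpy sketch of Algorithm 6 (ours, using an exact truncated SVD per unfolding and the natural processing order 1, …, d; the experiments substitute FFSRQR, RSISVD, or LTSVD at the SVD step).

```python
import numpy as np

def st_hosvd(X, ranks):
    """Sequentially truncated HOSVD: returns the core G and factor matrices U."""
    G = X.copy()
    U = []
    for r in range(X.ndim):
        Gr = np.moveaxis(G, r, 0).reshape(G.shape[r], -1)   # mode-r unfolding
        Ur, s, Vt = np.linalg.svd(Gr, full_matrices=False)  # swap in an approximate SVD here
        Ur = Ur[:, :ranks[r]]
        core_unf = Ur.T @ Gr                                # Sigma_r V_r^T, truncated
        new_shape = (ranks[r],) + tuple(np.delete(G.shape, r))
        G = np.moveaxis(core_unf.reshape(new_shape), 0, r)  # fold the core back
        U.append(Ur)
    return G, U

# G, U = st_hosvd(np.random.rand(30, 30, 30), (5, 5, 5))
```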

Algorithm 7 Robust PCA Using IALM
Inputs: Measured matrix M ∈ R^{m×n}, positive number λ, µ_0, µ̄, tolerance tol, ρ > 1.
Outputs: Matrix pair (X_k, E_k).
Algorithm:
k = 0; J(M) = max(‖M‖_2, λ^{−1}‖M‖_F); Y_0 = M / J(M); E_0 = 0.
while not converged do
    (U, Σ, V) = svd(M − E_k + µ_k^{−1} Y_k)
    X_{k+1} = U S_{µ_k^{−1}}(Σ) V^T
    E_{k+1} = S_{λ µ_k^{−1}}(M − X_{k+1} + µ_k^{−1} Y_k)
    Y_{k+1} = Y_k + µ_k (M − X_{k+1} − E_{k+1})
    Update µ_{k+1} = min(ρ µ_k, µ̄)
    k = k + 1
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then
        Break
    end if
end while
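A direct Python transcription of the steps above may help make the iteration concrete. This is only a sketch: the exact numpy SVD stands in for the approximate SVD that the experiments swap in (FFSRQR, RSISVD, LANSVD, LTSVD), and the parameters µ_0, µ̄, ρ are left to the caller, as in the algorithm statement.

```python
import numpy as np

def soft(x, w):                      # elementwise soft shrinkage S_w
    return np.sign(x) * np.maximum(np.abs(x) - w, 0.0)

def ialm_rpca(M, lam, mu0, mu_bar, rho, tol=1e-7, max_iter=1000):
    """Sketch of Algorithm 7 (robust PCA via IALM)."""
    J = max(np.linalg.norm(M, 2), np.linalg.norm(M, 'fro') / lam)
    Y = M / J
    E = np.zeros_like(M)
    X = np.zeros_like(M)
    mu = mu0
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(M - E + Y / mu, full_matrices=False)
        X = U @ np.diag(soft(s, 1.0 / mu)) @ Vt        # singular value thresholding
        E = soft(M - X + Y / mu, lam / mu)             # sparse-error update
        Y = Y + mu * (M - X - E)                       # dual update
        mu = min(rho * mu, mu_bar)
        if np.linalg.norm(M - X - E, 'fro') / np.linalg.norm(M, 'fro') < tol:
            break
    return X, E
```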

Now assume k + p ≪ min(m, n) and omit the lower-order terms; then we arrive at (4q + 4)mn(k + p) as the complexity of the approximate SVD with randomized subspace iteration. In practice, q is usually chosen to be an integer between 0 and 2.
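For a quick sense of scale, this leading-order count can be evaluated directly; the trivial helper below is ours, not from the paper.

```python
def rsi_flops(m, n, k, p=5, q=1):
    """Leading-order flop count (4q + 4) * m * n * (k + p) derived above."""
    return (4 * q + 4) * m * n * (k + p)

print(rsi_flops(10000, 10000, 100))   # about 8.4e10 flops for q = 1, p = 5
```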

REFERENCES

[1] O. ALTER, P. O. BROWN, AND D. BOTSTEIN, Singular value decomposition for genome-wide expression data processing and modeling, Proc. Natl. Acad. Sci. USA, 97 (2000), pp. 10101–10106.
[2] E. ANDERSON, Z. BAI, C. BISCHOF, L. S. BLACKFORD, J. DEMMEL, J. DONGARRA, J. DU CROZ, A. GREENBAUM, S. HAMMARLING, A. MCKENNEY, AND D. SORENSEN, LAPACK Users' Guide, 3rd ed., SIAM, Philadelphia, 1999.
[3] C. A. ANDERSSON AND R. BRO, Improving the speed of multi-way algorithms: Part I. Tucker3, Chemometrics Intell. Lab. Syst., 42 (1998), pp. 93–103.
[4] M. W. BERRY, S. T. DUMAIS, AND G. W. O'BRIEN, Using linear algebra for intelligent information retrieval, SIAM Rev., 37 (1995), pp. 573–595.
[5] C. BISCHOF AND C. VAN LOAN, The WY representation for products of Householder matrices, SIAM J. Sci. Statist. Comput., 8 (1987), pp. S2–S13.


Algorithm 8 Matrix Completion Using IALM
Inputs: Sampled set Ω, sampled entries π_Ω(M), positive number λ, µ_0, µ̄, tolerance tol, ρ > 1.
Outputs: Matrix pair (X_k, E_k).
Algorithm:
k = 0; Y_0 = 0; E_0 = 0.
while not converged do
    (U, Σ, V) = svd(M − E_k + µ_k^{−1} Y_k)
    X_{k+1} = U S_{µ_k^{−1}}(Σ) V^T
    E_{k+1} = π_Ω̄(M − X_{k+1} + µ_k^{−1} Y_k)
    Y_{k+1} = Y_k + µ_k (M − X_{k+1} − E_{k+1})
    Update µ_{k+1} = min(ρ µ_k, µ̄)
    k = k + 1
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then
        Break
    end if
end while
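The only structural difference from Algorithm 7 is the E-update, which simply re-imposes the observed entries. A two-line sketch of the projection (the mask name omega_mask is ours) is:

```python
import numpy as np

def project(M, mask):
    """pi_mask: keep the entries where mask is True and zero out the rest."""
    return np.where(mask, M, 0.0)

# E-update of Algorithm 8, with omega_mask True on the observed entries:
#   E = project(M - X + Y / mu, ~omega_mask)
```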

[6] P. BUSINGER AND G. H. GOLUB, Handbook series linear algebra: Linear least squares solutions by Householder transformations, Numer. Math., 7 (1965), pp. 269–276.
[7] J.-F. CAI, E. J. CANDÈS, AND Z. SHEN, A singular value thresholding algorithm for matrix completion, SIAM J. Optim., 20 (2010), pp. 1956–1982.
[8] E. J. CANDÈS, X. LI, Y. MA, AND J. WRIGHT, Robust principal component analysis, J. ACM, 58 (2011), Art. 11, 37 pages.
[9] E. J. CANDÈS AND B. RECHT, Exact matrix completion via convex optimization, Found. Comput. Math., 9 (2009), pp. 717–772.
[10] J. D. CARROLL AND J.-J. CHANG, Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition, Psychometrika, 35 (1970), pp. 283–319.
[11] T. A. DAVIS AND Y. HU, The University of Florida sparse matrix collection, ACM Trans. Math. Software, 38 (2011), Art. 1, 25 pages.
[12] L. DE LATHAUWER AND B. DE MOOR, From matrix to tensor: Multilinear algebra and signal processing, in Mathematics in Signal Processing IV, Inst. Math. Appl. Conf. Ser., Oxford Univ. Press, Oxford, 1998, pp. 1–16.
[13] L. DE LATHAUWER, B. DE MOOR, AND J. VANDEWALLE, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253–1278.
[14] L. DE LATHAUWER, B. DE MOOR, AND J. VANDEWALLE, On the best rank-1 and rank-(R1, R2, · · · , RN) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1324–1342.
[15] L. DE LATHAUWER AND J. VANDEWALLE, Dimensionality reduction in higher-order signal processing and rank-(R1, R2, . . . , RN) reduction in multilinear algebra, Linear Algebra Appl., 391 (2004), pp. 31–55.
[16] P. DRINEAS, R. KANNAN, AND M. W. MAHONEY, Fast Monte Carlo algorithms for matrices II: Computing a low-rank approximation to a matrix, SIAM J. Comput., 36 (2006), pp. 158–183.
[17] J. A. DUERSCH AND M. GU, Randomized QR with column pivoting, SIAM J. Sci. Comput., 39 (2017), pp. C263–C291.
[18] C. ECKART AND G. YOUNG, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211–218.
[19] L. ELDÉN, Matrix Methods in Data Mining and Pattern Recognition, SIAM, Philadelphia, 2007.
[20] M. FAZEL, H. HINDI, AND S. P. BOYD, Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices, in Proceedings of the 2003 American Control Conference, 2003, IEEE Conference Proceedings, IEEE, Piscataway, pp. 2156–2162.
[21] M. FAZEL, T. K. PONG, D. SUN, AND P. TSENG, Hankel matrix rank minimization with applications to system identification and realization, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 946–977.
[22] C. FRANK AND E. NÖTH, Optimizing eigenfaces by face masks for facial expression recognition, in Computer Analysis of Images and Patterns, N. Petkov and M. A. Westenberg, eds., vol. 2756 of Lecture Notes in Comput. Sci., Springer, Berlin, 2003, pp. 646–654.


Algorithm 9 Approximate SVD with Randomized Subspace Iteration
Inputs: Matrix A ∈ R^{m×n}. Target rank k. Oversampling size p ≥ 0. Number of iterations q ≥ 1.
Outputs: U ∈ R^{m×k} contains the approximate top k left singular vectors of A. Σ ∈ R^{k×k} contains the approximate top k singular values of A. V ∈ R^{n×k} contains the approximate top k right singular vectors of A.
Algorithm:
Generate an i.i.d. Gaussian matrix Ω ∈ N(0, 1)^{n×(k+p)}.
Compute B = AΩ.
[Q, ∼] = qr(B, 0)
for i = 1 : q do
    B = A^T ∗ Q; [Q, ∼] = qr(B, 0)
    B = A ∗ Q; [Q, ∼] = qr(B, 0)
end for
B = Q^T ∗ A
[U, Σ, V] = svd(B)
U = Q ∗ U; U = U(:, 1 : k); Σ = Σ(1 : k, 1 : k); V = V(:, 1 : k)
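A numpy sketch of Algorithm 9 is given below for reference; it is our own illustration (the experiments use the tensorlab implementation MLSVD_RSI).

```python
import numpy as np

def rsi_svd(A, k, p=5, q=1, seed=0):
    """Approximate rank-k SVD by randomized subspace iteration."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k + p)))   # B = A * Omega, then QR
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    U, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U)[:, :k], s[:k], Vt[:k, :].T
```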

[23] G. W. FURNAS, S. DEERWESTER, S. T. DUMAIS, T. K. LANDAUER, R. A. HARSHMAN, L. A. STREETER, AND K. E. LOCHBAUM, Information retrieval using a singular value decomposition model of latent semantic structure, in Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Y. Chiaramella, ed., ACM, New York, 1988, pp. 465–480.
[24] K. GOLDBERG, T. ROEDER, D. GUPTA, AND C. PERKINS, Eigentaste: A constant time collaborative filtering algorithm, Inf. Retrieval, 4 (2001), pp. 133–151.
[25] G. GOLUB, Numerical methods for solving linear least squares problems, Numer. Math., 7 (1965), pp. 206–216.
[26] G. H. GOLUB AND C. F. VAN LOAN, Matrix Computations, 4th ed., Johns Hopkins University Press, Baltimore, 2013.
[27] M. GU, Subspace iteration randomization and singular value problems, SIAM J. Sci. Comput., 37 (2015), pp. A1139–A1173.
[28] M. GU AND S. C. EISENSTAT, Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869.
[29] E. T. HALE, W. YIN, AND Y. ZHANG, Fixed-point continuation for l1-minimization: methodology and convergence, SIAM J. Optim., 19 (2008), pp. 1107–1130.
[30] N. HALKO, P. G. MARTINSSON, AND J. A. TROPP, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217–288.
[31] R. A. HARSHMAN, Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multimodal factor analysis, UCLA Working Papers in Phonetics, 16 (1970), pp. 1–84.
[32] J. L. HERLOCKER, J. A. KONSTAN, A. BORCHERS, AND J. RIEDL, An algorithmic framework for performing collaborative filtering, in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, F. Gey, M. Heaast, and R. Tong, eds., ACM, New York, 1999, pp. 230–237.
[33] R. A. HORN AND C. R. JOHNSON, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1991.
[34] D. A. HUCKABY AND T. F. CHAN, On the convergence of Stewart's QLP algorithm for approximating the SVD, Numer. Algorithms, 32 (2003), pp. 287–316.
[35] D. A. HUCKABY AND T. F. CHAN, Stewart's pivoted QLP decomposition for low-rank matrices, Numer. Linear Algebra Appl., 12 (2005), pp. 153–159.
[36] W. B. JOHNSON AND J. LINDENSTRAUSS, Extensions of Lipschitz mappings into a Hilbert space, in Conference in Modern Analysis and Probability (New Haven, Conn., 1982), R. Beals, A. Beck, A. Bellow, and A. Hajian, eds., vol. 26 of Contemp. Math., Amer. Math. Soc., Providence, 1984, pp. 189–206.


[37] I. T. JOLLIFFE, Principal Component Analysis, Springer, New York, 1986.
[38] J. M. KLEINBERG, Authoritative sources in a hyperlinked environment, J. ACM, 46 (1999), pp. 604–632.
[39] T. G. KOLDA, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 243–255.
[40] T. G. KOLDA AND B. W. BADER, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455–500.
[41] R. M. LARSEN, PROPACK-software for large and sparse SVD calculations, available at http://sun.stanford.edu/~rmunk/PROPACK/.
[42] R. M. LARSEN, Lanczos bidiagonalization with partial reorthogonalization, Tech. Report PB-537, Department of Computer Science, Aarhus University, Aarhus, Denmark, 1998.
[43] Y. LECUN, L. BOTTOU, Y. BENGIO, AND P. HAFFNER, Gradient-based learning applied to document recognition, Proc. IEEE, 86 (1998), pp. 2278–2324.
[44] R. B. LEHOUCQ, D. C. SORENSEN, AND C. YANG, ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods, SIAM, Philadelphia, 1998.
[45] Z. LIN, M. CHEN, AND Y. MA, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices, Preprint on arXiv, https://arxiv.org/abs/1009.5055, 2010.
[46] N. LINIAL, E. LONDON, AND Y. RABINOVICH, The geometry of graphs and some of its algorithmic applications, Combinatorica, 15 (1995), pp. 215–245.
[47] Z. LIU, A. HANSSON, AND L. VANDENBERGHE, Nuclear norm system identification with missing inputs and outputs, Systems Control Lett., 62 (2013), pp. 605–612.
[48] Z. LIU AND L. VANDENBERGHE, Interior-point method for nuclear norm approximation with application to system identification, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1235–1256.
[49] S. MA, D. GOLDFARB, AND L. CHEN, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program., 128 (2011), pp. 321–353.
[50] C. B. MELGAARD, Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra, PhD Thesis, University of California, Berkeley, 2015.
[51] M. MESBAHI AND G. P. PAPAVASSILOPOULOS, On the rank minimization problem over a positive semidefinite linear matrix inequality, IEEE Trans. Automat. Control, 42 (1997), pp. 239–243.
[52] N. MULLER, L. MAGAIA, AND B. M. HERBST, Singular value decomposition, eigenfaces, and 3D reconstructions, SIAM Rev., 46 (2004), pp. 518–545.
[53] M. NARWARIA AND W. LIN, SVD-based quality metric for image and video using machine learning, IEEE Trans. Syst., Man, and Cybernetics, Part B (Cybernetics), 42 (2012), pp. 347–364.
[54] T.-H. OH, Y. MATSUSHITA, Y.-W. TAI, AND I. SO KWEON, Fast randomized singular value thresholding for nuclear norm minimization, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015, IEEE Conference Proceedings, IEEE, Los Alamitos, 2015, pp. 4484–4493.
[55] G. QUINTANA-ORTÍ, X. SUN, AND C. H. BISCHOF, A BLAS-3 version of the QR factorization with column pivoting, SIAM J. Sci. Comput., 19 (1998), pp. 1486–1494.
[56] B. RECHT, M. FAZEL, AND P. A. PARRILO, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Rev., 52 (2010), pp. 471–501.
[57] J. D. RENNIE AND N. SREBRO, Fast maximum margin matrix factorization for collaborative prediction, in Proceedings of the 22nd International Conference on Machine Learning, L. De Raedt and S. Wrobel, eds., ACM Press, New York, 2005, pp. 713–719.
[58] V. ROKHLIN, A. SZLAM, AND M. TYGERT, A randomized algorithm for principal component analysis, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1100–1124.
[59] A. K. SAIBABA, HOID: higher order interpolatory decomposition for tensors based on Tucker representation, SIAM J. Matrix Anal. Appl., 37 (2016), pp. 1223–1249.
[60] B. SAVAS AND L. ELDÉN, Handwritten digit classification using higher order singular value decomposition, Pattern Recognition, 40 (2007), pp. 993–1003.
[61] R. SCHREIBER AND C. VAN LOAN, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 53–57.
[62] A. SHASHUA AND T. HAZAN, Non-negative tensor factorization with applications to statistics and computer vision, in Proceedings of the 22nd International Conference on Machine Learning, S. Dzeroski, L. De Raedt, and S. Wrobel, eds., ACM, New York, 2005, pp. 792–799.
[63] N. D. SIDIROPOULOS, R. BRO, AND G. B. GIANNAKIS, Parallel factor analysis in sensor array processing, IEEE Trans. Signal Process., 48 (2000), pp. 2377–2388.
[64] D. C. SORENSEN AND M. EMBREE, A DEIM induced CUR factorization, SIAM J. Sci. Comput., 38 (2016), pp. A1454–A1482.
[65] N. SREBRO, J. RENNIE, AND T. S. JAAKKOLA, Maximum-margin matrix factorization, in NIPS'04: Proceedings of the 17th International Conference on Neural Information Processing Systems, L. K. Saul, Y. Weiss, and L. Bottou, eds., MIT Press, Cambridge, 2004, pp. 1329–1336.
[66] G. W. STEWART, The QLP approximation to the singular value decomposition, SIAM J. Sci. Comput., 20 (1999), pp. 1336–1348.


[67] S. TAHERI, Q. QIU, AND R. CHELLAPPA, Structure-preserving sparse decomposition for facial expression analysis, IEEE Trans. Image Process., 23 (2014), pp. 3590–3603.
[68] K.-C. TOH AND S. YUN, An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems, Pac. J. Optim., 6 (2010), pp. 615–640.
[69] L. N. TREFETHEN AND D. BAU III, Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[70] L. R. TUCKER, Some mathematical notes on three-mode factor analysis, Psychometrika, 31 (1966), pp. 279–311.
[71] N. VANNIEUWENHOVEN, R. VANDEBRIL, AND K. MEERBERGEN, A new truncation strategy for the higher-order singular value decomposition, SIAM J. Sci. Comput., 34 (2012), pp. A1027–A1052.
[72] M. A. O. VASILESCU AND D. TERZOPOULOS, Multilinear analysis of image ensembles: Tensorfaces, in European Conference on Computer Vision 2002, A. Heyden, G. Sparr, M. Nielsen, and P. Johansen, eds., vol. 2350 of Lecture Notes in Computer Science, Springer, Berlin, 2002, pp. 447–460.
[73] N. VERVLIET, O. DEBALS, L. SORBER, M. V. BAREL, AND L. DE LATHAUWER, Tensorlab user guide, available at http://www.tensorlab.net.
[74] J. XIAO AND M. GU, Spectrum-revealing Cholesky factorization for kernel methods, in Proceedings of the 16th IEEE International Conference on Data Mining (ICDM), IEEE Conference Proceedings, Los Alamitos, 2016, pp. 1293–1298.
[75] J. XIAO, M. GU, AND J. LANGOU, Fast parallel randomized QR with column pivoting algorithms for reliable low-rank matrix approximations, in Proceedings of the 24th IEEE International Conference on High Performance Computing (HiPC), IEEE Conference Proceedings, Los Alamitos, 2017, pp. 233–242.
[76] T. ZHANG AND G. H. GOLUB, Rank-one approximation to high order tensors, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 534–550.


algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab

421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]

X =

10sumj=1

1000

jxj yj zj +

nsumj=11

1

jxj yj zj

where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2

U2 times3 U3

ETNAKent State University and

Johann Radon Institute (RICAM)

484 Y FENG J XIAO AND M GU

5 10 15 20

index

06

07

08

09

1to

p 2

0 s

ingula

r valu

es

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

5 10 15 20

index

06

07

08

09

1

top 2

0 s

ingula

r valu

es

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

5 10 15 20

index

06

065

07

075

08

085

09

095

top

20

sin

gu

lar

va

lue

s

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

5 10 15 20

index

101

102

103

top

20

sin

gu

lar

va

lue

sFFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 43 Top 20 singular values comparison for approximate SVD algorithms

Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger

422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]

We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3

where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method

MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485

50 100 150 200

target rank k

10-5

10-4

10-3

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

MLSVD

MLSVD_FFSRQR

MLSVD_RSI

MLSVD_LTSVD

50 100 150 200

target rank k

10-2

10-1

100

101

102

run tim

e (

sec)

MLSVD

MLSVD_FFSRQR

MLSVD_RSI

MLSVD_LTSVD

FIG 44 Run time and relative approximation error comparison on a sparse tensor

TABLE 43Comparison on handwritten digits classification

MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD

Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259

one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one

43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1

431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX

TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are

independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by

(X E

) In this numerical experiment we use the same parameter

settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1

radicmax (mn) as suggested by

Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF

Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM

1httpperceptioncslillinoisedumatrix-ranksample_codehtml

ETNAKent State University and

Johann Radon Institute (RICAM)

486 Y FENG J XIAO AND M GU

algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective

TABLE 44Comparison on robust PCA

Size Method Error Time (sec) E0 Iter sv

1000times 1000

IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100

2000times 2000

IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200

4000times 4000

IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400

6000times 6000

IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600

432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices

bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3

The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies

For each data set, we let $M$ be the original data matrix, where $M_{ij}$ stands for the rating of joke (movie) $j$ by user $i$, and let $\Gamma$ be the set of indices where $M_{ij}$ is known. The quality of the matrix completion algorithm is measured by the Normalized Mean Absolute Error (NMAE), defined by

\[
\mathrm{NMAE} \;\stackrel{\mathrm{def}}{=}\; \frac{\frac{1}{|\Gamma|}\sum_{(i,j)\in\Gamma} |M_{ij} - X_{ij}|}{r_{\max} - r_{\min}},
\]

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487

where $X_{ij}$ is the prediction of the rating of joke (movie) $j$ given by user $i$, and $r_{\min}$, $r_{\max}$ are the lower and upper bounds of the ratings, respectively. For the Jester joke data sets we set $r_{\min} = -10$ and $r_{\max} = 10$; for the MovieLens data sets we set $r_{\min} = 1$ and $r_{\max} = 5$.
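To make the metric concrete, the following small Python sketch (our own illustrative code, not part of the paper's experiments; the function name and the toy data are made up) evaluates the NMAE of a prediction on the observed index set Γ:

```python
import numpy as np

def nmae(M, X, gamma_idx, r_min, r_max):
    """Normalized Mean Absolute Error over the known entries.

    M         : original ratings matrix (only entries in gamma_idx are used)
    X         : completed/predicted matrix
    gamma_idx : tuple (rows, cols) of indices where M is known
    r_min, r_max : lower and upper bounds of the rating scale
    """
    rows, cols = gamma_idx
    abs_err = np.abs(M[rows, cols] - X[rows, cols])
    return abs_err.mean() / (r_max - r_min)

# Toy usage: a 5x4 rating matrix with 10 known entries on a 1-5 scale.
rng = np.random.default_rng(0)
M = rng.integers(1, 6, size=(5, 4)).astype(float)
known = (rng.integers(0, 5, 10), rng.integers(0, 4, 10))
X = np.full_like(M, M[known].mean())          # trivial "prediction"
print(nmae(M, X, known, r_min=1, r_max=5))
```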

Since $|\Gamma|$ is large, we randomly select a subset $\Omega$ from $\Gamma$ and then use Algorithm 8 to solve problem (2.8). We randomly select 10 ratings for each user in the Jester joke data sets, while we randomly choose about 50% of the ratings for each user in the MovieLens data sets. Table 4.5 lists the parameter settings used in the algorithms. The maximum iteration number is 100 in IALM, and all other parameters are the same as those used in [45].

The numerical results are reported in Table 4.6. We observe that IALM_FFSRQR achieves almost the same recoverability as the other methods (except IALM_LTSVD) and is slightly faster than IALM_RSISVD for these two data sets.

TABLE 4.5: Parameters used in the IALM method on matrix completion.

Data set             m      n     |Gamma|    |Omega|
jester-1             24983  100   1.81e+06   249830
jester-2             23500  100   1.71e+06   235000
jester-3             24938  100   6.17e+05   249384
jester-all           73421  100   4.14e+06   734210
movie-100K           943    1682  1.00e+05   49918
movie-1M             6040   3706  1.00e+06   498742
movie-latest-small   671    9066  1.00e+05   52551

5. Conclusions. We have presented the Flip-Flop SRQR factorization, a variant of the QLP factorization, to compute low-rank matrix approximations. The Flip-Flop SRQR algorithm uses an SRQR factorization to initialize a truncated version of the column-pivoted QR factorization and then forms an LQ factorization. For the numerical results presented, the errors of the proposed algorithm were comparable to those obtained by the other state-of-the-art algorithms, while the new algorithm is cheaper to compute and still produces quality low-rank matrix approximations. Furthermore, we prove singular value lower bounds and residual error upper bounds for the Flip-Flop SRQR factorization; in situations where the singular values of the input matrix decay relatively quickly, the low-rank approximation computed by Flip-Flop SRQR is guaranteed to be as accurate as the truncated SVD. We also perform a complexity analysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomized subspace iteration. Future work includes reducing the overhead cost in Flip-Flop SRQR and implementing the Flip-Flop SRQR algorithm on distributed memory machines for popular applications such as distributed PCA.

Acknowledgments. The authors would like to thank Yousef Saad, the editor, and three anonymous referees, who kindly reviewed the earlier version of this manuscript and provided valuable suggestions and comments.

Appendix A. Partial QR factorization with column pivoting. The details of partial QRCP are covered in Algorithm 4.

Appendix B. TUXV algorithm. The details of the TUXV algorithm are presented in Algorithm 5.

Appendix C. Computing the Tucker decomposition of higher-order tensors. Algorithm 6 computes the Tucker decomposition of higher-order tensors, i.e., the sequentially truncated higher-order SVD.


TABLE 4.6: Comparison on matrix completion.

Data set            Method         Iter  Time      NMAE      sv   sigma_max  sigma_min
jester-1            IALM-LANSVD    12    7.06e+00  1.84e-01  100  2.14e+03   1.00e+00
                    IALM-FFSRQR    12    3.44e+00  1.69e-01  100  2.28e+03   1.00e+00
                    IALM-RSISVD    12    3.75e+00  1.89e-01  100  2.12e+03   1.00e+00
                    IALM-LTSVD     100   2.11e+01  1.74e-01  62   3.00e+03   1.00e+00
jester-2            IALM-LANSVD    12    6.80e+00  1.85e-01  100  2.13e+03   1.00e+00
                    IALM-FFSRQR    12    2.79e+00  1.70e-01  100  2.29e+03   1.00e+00
                    IALM-RSISVD    12    3.59e+00  1.91e-01  100  2.12e+03   1.00e+00
                    IALM-LTSVD     100   2.03e+01  1.75e-01  58   2.96e+03   1.00e+00
jester-3            IALM-LANSVD    12    7.05e+00  1.26e-01  99   1.79e+03   1.00e+00
                    IALM-FFSRQR    12    3.03e+00  1.22e-01  100  1.71e+03   1.00e+00
                    IALM-RSISVD    12    3.85e+00  1.31e-01  100  1.78e+03   1.00e+00
                    IALM-LTSVD     100   2.12e+01  1.33e-01  55   2.50e+03   1.00e+00
jester-all          IALM-LANSVD    12    2.39e+01  1.72e-01  100  3.56e+03   1.00e+00
                    IALM-FFSRQR    12    1.12e+01  1.62e-01  100  3.63e+03   1.00e+00
                    IALM-RSISVD    12    1.34e+01  1.82e-01  100  3.47e+03   1.00e+00
                    IALM-LTSVD     100   6.99e+01  1.68e-01  52   4.92e+03   1.00e+00
movie-100K          IALM-LANSVD    29    2.86e+01  1.83e-01  285  1.21e+03   1.00e+00
                    IALM-FFSRQR    30    4.55e+00  1.67e-01  295  1.53e+03   1.00e+00
                    IALM-RSISVD    29    4.82e+00  1.82e-01  285  1.29e+03   1.00e+00
                    IALM-LTSVD     48    1.42e+01  1.47e-01  475  1.91e+03   1.00e+00
movie-1M            IALM-LANSVD    50    7.40e+02  1.58e-01  495  4.99e+03   1.00e+00
                    IALM-FFSRQR    53    2.07e+02  1.37e-01  525  6.63e+03   1.00e+00
                    IALM-RSISVD    50    2.23e+02  1.57e-01  495  5.35e+03   1.00e+00
                    IALM-LTSVD     100   8.50e+02  1.17e-01  995  8.97e+03   1.00e+00
movie-latest-small  IALM-LANSVD    31    1.66e+02  1.85e-01  305  1.13e+03   1.00e+00
                    IALM-FFSRQR    31    1.96e+01  2.00e-01  305  1.42e+03   1.00e+00
                    IALM-RSISVD    31    2.85e+01  1.91e-01  305  1.20e+03   1.00e+00
                    IALM-LTSVD     63    4.02e+01  2.08e-01  298  1.79e+03   1.00e+00

Appendix D. IALM for solving two special nuclear norm minimization problems. Algorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems, respectively. In Algorithm 7, $S_\omega(x) = \mathrm{sgn}(x)\cdot\max(|x| - \omega, 0)$ is the soft shrinkage operator [29] with $x\in\mathbb{R}^n$ and $\omega > 0$. In Algorithm 8, $\bar{\Omega}$ is the complement of $\Omega$.
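For reference, the soft shrinkage operator has a one-line vectorized form; the snippet below is a minimal illustration of the definition above, not an excerpt from the IALM code:

```python
import numpy as np

def soft_shrink(x, omega):
    """Entrywise soft shrinkage: S_omega(x) = sgn(x) * max(|x| - omega, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - omega, 0.0)

x = np.array([-3.0, -0.5, 0.2, 2.5])
print(soft_shrink(x, 1.0))   # [-2. -0.  0.  1.5]
```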

Appendix E. Approximate SVD with randomized subspace iteration. The randomized subspace iteration was proposed in [30, Algorithm 4.4] to compute an orthonormal matrix whose range approximates the range of $A$. An approximate SVD can then be computed using this orthonormal matrix [30, Algorithm 5.1]. A randomized subspace iteration is used in the routine MLSVD_RSI of the Matlab toolbox tensorlab [73], and MLSVD_RSI is by far the most efficient function that we could find in Matlab to compute the ST-HOSVD. We summarize the approximate SVD with randomized subspace iteration in Algorithm 9.

Now we perform a complexity analysis for the approximate SVD with randomized subspace iteration. We first note that:
1. The cost of generating a random matrix is negligible.
2. The cost of computing $B = A\Omega$ is $2mn(k+p)$.


Algorithm 4: Partial QRCP
Inputs: Matrix $A\in\mathbb{R}^{m\times n}$. Target rank $k$.
Outputs: Orthogonal matrix $Q\in\mathbb{R}^{m\times m}$, upper trapezoidal matrix $R\in\mathbb{R}^{m\times n}$, and permutation matrix $\Pi\in\mathbb{R}^{n\times n}$ such that $A\Pi = QR$.
Algorithm:
Initialize $\Pi = I_n$. Compute column norms $r_s = \|A(1:m, s)\|_2$ $(1 \le s \le n)$.
for $j = 1 : k$ do
    Find $i = \arg\max_{j\le s\le n} r_s$. Exchange columns $j$ and $i$ of $A$ and $\Pi$ (and swap $r_j$ and $r_i$).
    Form the Householder reflection $Q_j$ from $A(j:m, j)$.
    Update the trailing matrix $A(j:m, j:n) \leftarrow Q_j^T A(j:m, j:n)$.
    Update $r_s = \|A(j+1:m, s)\|_2$ $(j+1 \le s \le n)$.
end for
$Q = Q_1 Q_2 \cdots Q_k$ is the product of all reflections; $R$ = upper trapezoidal part of $A$.
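As a concrete illustration of Algorithm 4, the sketch below performs k steps of column-pivoted QR with explicit Householder reflections in NumPy. It recomputes trailing column norms for clarity instead of downdating them, and it is unblocked, unlike the BLAS-3 implementations the paper builds on; all names are ours.

```python
import numpy as np

def partial_qrcp(A, k):
    """k steps of QR with column pivoting: A[:, perm] ~= Q @ R (first k columns exact)."""
    A = A.astype(float).copy()
    m, n = A.shape
    perm = np.arange(n)
    Q = np.eye(m)
    for j in range(k):
        # Pivot: bring the column with the largest trailing norm to position j.
        norms = np.linalg.norm(A[j:, j:], axis=0)
        i = j + int(np.argmax(norms))
        A[:, [j, i]] = A[:, [i, j]]
        perm[[j, i]] = perm[[i, j]]
        # Householder reflection that annihilates A[j+1:, j].
        x = A[j:, j].copy()
        v = x.copy()
        v[0] += np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
        if np.linalg.norm(v) > 0:
            v /= np.linalg.norm(v)
            A[j:, j:] -= 2.0 * np.outer(v, v @ A[j:, j:])
            Q[:, j:] -= 2.0 * np.outer(Q[:, j:] @ v, v)
    return Q, np.triu(A), perm

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))
Q, R, perm = partial_qrcp(A, k=4)
print(np.allclose((Q @ R)[:, :4], A[:, perm][:, :4]))   # True up to roundoff
```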

Algorithm 5: TUXV Algorithm
Inputs: Matrix $A\in\mathbb{R}^{m\times n}$. Target rank $k$. Block size $b$. Oversampling size $p \ge 0$.
Outputs: Column orthonormal matrices $U\in\mathbb{R}^{m\times k}$, $V\in\mathbb{R}^{n\times k}$ and upper triangular matrix $R\in\mathbb{R}^{k\times k}$ such that $A \approx URV^T$.
Algorithm:
Do TRQRCP on $A$ to obtain $Q\in\mathbb{R}^{m\times k}$, $R\in\mathbb{R}^{k\times n}$, and $\Pi\in\mathbb{R}^{n\times n}$.
$R = R\Pi^T$, and do an LQ factorization, i.e., $[V, R] = \mathrm{qr}(R^T, 0)$.
Compute $Z = AV$ and do a QR factorization, i.e., $[U, R] = \mathrm{qr}(Z, 0)$.

3. In each QR step $[Q, \sim] = \mathrm{qr}(B, 0)$, the cost of computing the QR factorization of $B$ is $2m(k+p)^2 - \frac{2}{3}(k+p)^3$ (cf. [69]), and the cost of forming the first $(k+p)$ columns of the full $Q$ matrix is $m(k+p)^2 + \frac{1}{3}(k+p)^3$.

Now we count the flops for each iteration $i$ of the for loop:
1. The cost of computing $B = A^T * Q$ is $2mn(k+p)$.
2. The cost of computing $[Q, \sim] = \mathrm{qr}(B, 0)$ is $2n(k+p)^2 - \frac{2}{3}(k+p)^3$, and the cost of forming the first $(k+p)$ columns of the full $Q$ matrix is $n(k+p)^2 + \frac{1}{3}(k+p)^3$.
3. The cost of computing $B = A * Q$ is $2mn(k+p)$.
4. The cost of computing $[Q, \sim] = \mathrm{qr}(B, 0)$ is $2m(k+p)^2 - \frac{2}{3}(k+p)^3$, and the cost of forming the first $(k+p)$ columns of the full $Q$ matrix is $m(k+p)^2 + \frac{1}{3}(k+p)^3$.

Together, the cost of running the for loop $q$ times is

\[
q\left(4mn(k+p) + 3(m+n)(k+p)^2 - \frac{2}{3}(k+p)^3\right).
\]

Additionally, the cost of computing $B = Q^T * A$ is $2mn(k+p)$, the cost of the SVD of $B$ is $O(n(k+p)^2)$, and the cost of computing $U = Q * U$ is $2m(k+p)^2$.


Algorithm 6: ST-HOSVD
Inputs: Tensor $\mathcal{X}\in\mathbb{R}^{I_1\times\cdots\times I_d}$, desired rank $(k_1,\dots,k_d)$, and processing order $p = (p_1,\dots,p_d)$.
Outputs: Tucker decomposition $[\mathcal{G}; U_1,\dots,U_d]$.
Algorithm:
Define tensor $\mathcal{G} \leftarrow \mathcal{X}$
for $j = 1 : d$ do
    $r = p_j$
    Compute an exact or approximate rank-$k_r$ SVD of the tensor unfolding: $\mathcal{G}_{(r)} \approx \widetilde{U}_r \Sigma_r V_r^T$
    $U_r \leftarrow \widetilde{U}_r$
    Update $\mathcal{G}_{(r)} \leftarrow \Sigma_r V_r^T$, i.e., apply $\widetilde{U}_r^T$ to $\mathcal{G}$
end for
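To make the structure of Algorithm 6 concrete, here is a minimal NumPy sketch of the sequentially truncated HOSVD for a third-order tensor. It uses a plain truncated SVD for each unfolding, where the paper would substitute FFSRQR or another approximate SVD; the helper names are ours.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of unfold for a tensor of the given full shape."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full), 0, mode)

def st_hosvd(X, ranks, order=None):
    """Sequentially truncated HOSVD: returns core G and factor matrices U[0..d-1]."""
    d = X.ndim
    order = range(d) if order is None else order
    G = X.copy()
    U = [None] * d
    for r in order:
        Gr = unfold(G, r)
        Ur, s, Vt = np.linalg.svd(Gr, full_matrices=False)
        U[r] = Ur[:, :ranks[r]]
        new_shape = list(G.shape)
        new_shape[r] = ranks[r]
        # Apply U[r]^T along mode r, shrinking that dimension to ranks[r].
        G = fold(U[r].T @ Gr, r, new_shape)
    return G, U

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 15, 10))
G, U = st_hosvd(X, ranks=(5, 5, 5))
# Reconstruct X_hat = G x1 U[0] x2 U[1] x3 U[2] and report the relative error.
Xh = G
for r in range(3):
    shape = list(Xh.shape)
    shape[r] = U[r].shape[0]
    Xh = fold(U[r] @ unfold(Xh, r), r, shape)
print(np.linalg.norm(X - Xh) / np.linalg.norm(X))
```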

Algorithm 7: Robust PCA Using IALM
Inputs: Measured matrix $M\in\mathbb{R}^{m\times n}$, positive number $\lambda$, $\mu_0$, $\bar{\mu}$, tolerance tol, $\rho > 1$.
Outputs: Matrix pair $(X_k, E_k)$.
Algorithm:
$k = 0$; $J(M) = \max\left(\|M\|_2, \lambda^{-1}\|M\|_F\right)$; $Y_0 = M / J(M)$; $E_0 = 0$
while not converged do
    $(U, \Sigma, V) = \mathrm{svd}\left(M - E_k + \mu_k^{-1} Y_k\right)$
    $X_{k+1} = U\, S_{\mu_k^{-1}}(\Sigma)\, V^T$
    $E_{k+1} = S_{\lambda\mu_k^{-1}}\left(M - X_{k+1} + \mu_k^{-1} Y_k\right)$
    $Y_{k+1} = Y_k + \mu_k\left(M - X_{k+1} - E_{k+1}\right)$
    Update $\mu_{k+1} = \min(\rho\mu_k, \bar{\mu})$
    $k = k + 1$
    if $\|M - X_k - E_k\|_F / \|M\|_F < \mathrm{tol}$ then Break end if
end while

Now assume that $k + p \ll \min(m,n)$ and omit the lower-order terms; we then arrive at $(4q + 4)mn(k+p)$ as the complexity of the approximate SVD with randomized subspace iteration. In practice, $q$ is usually chosen to be an integer between 0 and 2.

REFERENCES

[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106

[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999

[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103

[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595

[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci


Algorithm 8: Matrix Completion Using IALM
Inputs: Sampled set $\Omega$, sampled entries $\pi_\Omega(M)$, positive number $\lambda$, $\mu_0$, $\bar{\mu}$, tolerance tol, $\rho > 1$.
Outputs: Matrix pair $(X_k, E_k)$.
Algorithm:
$k = 0$; $Y_0 = 0$; $E_0 = 0$
while not converged do
    $(U, \Sigma, V) = \mathrm{svd}\left(M - E_k + \mu_k^{-1} Y_k\right)$
    $X_{k+1} = U\, S_{\mu_k^{-1}}(\Sigma)\, V^T$
    $E_{k+1} = \pi_{\bar{\Omega}}\left(M - X_{k+1} + \mu_k^{-1} Y_k\right)$
    $Y_{k+1} = Y_k + \mu_k\left(M - X_{k+1} - E_{k+1}\right)$
    Update $\mu_{k+1} = \min(\rho\mu_k, \bar{\mu})$
    $k = k + 1$
    if $\|M - X_k - E_k\|_F / \|M\|_F < \mathrm{tol}$ then Break end if
end while
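As an illustration of how Algorithm 8 operates, the following Python sketch runs an IALM-style matrix completion loop on a small synthetic problem. The parameter choices (initial mu, rho, mu_bar) are our own illustrative defaults rather than the settings of [45], and a dense SVD stands in for the approximate SVD:

```python
import numpy as np

def ialm_completion(M_obs, mask, mu=None, rho=1.5, mu_bar=1e7, tol=1e-7, max_iter=100):
    """IALM-style matrix completion; illustrative parameter choices.

    M_obs : matrix with observed entries filled in and zeros elsewhere
    mask  : boolean array, True on the observed set Omega
    """
    m, n = M_obs.shape
    norm_M = np.linalg.norm(M_obs, 'fro')
    mu = 1.0 / np.linalg.norm(M_obs, 2) if mu is None else mu
    Y = np.zeros((m, n))
    E = np.zeros((m, n))
    for _ in range(max_iter):
        # Singular value thresholding step (full SVD here; an approximate SVD could be swapped in).
        U, s, Vt = np.linalg.svd(M_obs - E + Y / mu, full_matrices=False)
        s = np.maximum(s - 1.0 / mu, 0.0)
        X = (U * s) @ Vt
        # E compensates the residual on the unobserved set only (projection onto Omega-bar).
        E = np.where(mask, 0.0, M_obs - X + Y / mu)
        Y = Y + mu * (M_obs - X - E)
        mu = min(rho * mu, mu_bar)
        if np.linalg.norm(M_obs - X - E, 'fro') / norm_M < tol:
            break
    return X

# Toy example: recover a rank-3 matrix from about 50% of its entries.
rng = np.random.default_rng(0)
L = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 50))
mask = rng.random(L.shape) < 0.5
X = ialm_completion(np.where(mask, L, 0.0), mask)
print(np.linalg.norm(X - L) / np.linalg.norm(L))
```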

Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-

holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion

SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)

Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9

(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an

N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38

(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in

Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16

[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278

[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342

[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55

[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183

[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291

[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218

[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to

Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162

[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977

[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654


Algorithm 9: Approximate SVD with Randomized Subspace Iteration
Inputs: Matrix $A\in\mathbb{R}^{m\times n}$. Target rank $k$. Oversampling size $p \ge 0$. Number of iterations $q \ge 1$.
Outputs: $U\in\mathbb{R}^{m\times k}$ contains the approximate top $k$ left singular vectors of $A$. $\Sigma\in\mathbb{R}^{k\times k}$ contains the approximate top $k$ singular values of $A$. $V\in\mathbb{R}^{n\times k}$ contains the approximate top $k$ right singular vectors of $A$.
Algorithm:
Generate an i.i.d. Gaussian matrix $\Omega\in\mathcal{N}(0,1)^{n\times(k+p)}$
Compute $B = A\Omega$
$[Q, \sim] = \mathrm{qr}(B, 0)$
for $i = 1 : q$ do
    $B = A^T * Q$; $[Q, \sim] = \mathrm{qr}(B, 0)$
    $B = A * Q$; $[Q, \sim] = \mathrm{qr}(B, 0)$
end for
$B = Q^T * A$
$[U, \Sigma, V] = \mathrm{svd}(B)$
$U = Q * U$; $U = U(:, 1:k)$; $\Sigma = \Sigma(1:k, 1:k)$; $V = V(:, 1:k)$
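A compact NumPy rendering of Algorithm 9 is given below. It is a sketch under our own naming, not the tensorlab MLSVD_RSI routine:

```python
import numpy as np

def rsi_svd(A, k, p=5, q=1, rng=None):
    """Approximate rank-k SVD via randomized subspace iteration (cf. Algorithm 9)."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))   # i.i.d. Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)            # B = A*Omega, thin QR
    for _ in range(q):                        # subspace (power) iterations
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A                               # (k+p) x n
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :].T

rng = np.random.default_rng(0)
A = rng.standard_normal((300, 80)) @ rng.standard_normal((80, 200))  # low-rank test matrix
U, s, V = rsi_svd(A, k=20, q=1, rng=rng)
print(np.linalg.norm(A - U @ np.diag(s) @ V.T) / np.linalg.norm(A))
```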

[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480

[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151

[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press

Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)

pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization

SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and

convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic

algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo

multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing

collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237

[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the

SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)

pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in

Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206


[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash

500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at

http://sun.stanford.edu/~rmunk/PROPACK/[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of

Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document

recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale

Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted

low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic

applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and

outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to

system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-

tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra

PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite

linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-

structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE

Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding

for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493

[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494

[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501

[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719

[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124

[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249

[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003

[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57

[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799

[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388

[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482

[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336

[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348


[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603

[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640

[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash

311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-

order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in

European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460

[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guide available at http://www.tensorlab.net

[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298

[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242

[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550



Many researchers have devoted themselves to solving the above nuclear norm minimization problem, and plenty of algorithms have been proposed, including singular value thresholding (SVT) [7], fixed point continuation (FPC) [49], accelerated proximal gradient (APG) [68], and the augmented Lagrange multiplier method (ALM) [45]. The most expensive part of these algorithms lies in the computation of the truncated SVD. The inexact augmented Lagrange multiplier method (IALM) [45] has been shown to be among the most accurate and efficient of them. We now describe IALM for the robust PCA and matrix completion problems.

The robust PCA problem can be formalized as the minimization of a sum of the nuclear norm and a scaled matrix $\ell_1$-norm (the sum of the absolute values of the matrix entries):

(2.7)\qquad $\min\ \|X\|_* + \lambda\|E\|_1 \quad \text{subject to} \quad M = X + E$,

where $M$ is a measured matrix, $X$ has low rank, $E$ is a sufficiently sparse error matrix, and $\lambda$ is a positive weighting parameter. Algorithm 7 in the appendix describes the details of the IALM method for solving the robust PCA problem [45].

The matrix completion problem [9, 45] can be written in the form

(2.8)\qquad $\min_{X\in\mathbb{R}^{m\times n}} \|X\|_* \quad\text{subject to}\quad X + E = M,\ \ \pi_\Omega(E) = 0$,

where $\pi_\Omega : \mathbb{R}^{m\times n}\rightarrow\mathbb{R}^{m\times n}$ is the orthogonal projection that keeps the entries in $\Omega$ unchanged and sets those outside $\Omega$ to zero. In [45], the authors applied the IALM method to the matrix completion problem (Algorithm 8 in the appendix).
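The computational core shared by Algorithms 7 and 8 is a singular value thresholding step: an SVD of the current iterate followed by soft shrinkage of its singular values. A minimal illustrative sketch of that step (ours, using a dense SVD where the paper substitutes an approximate SVD such as FFSRQR) is:

```python
import numpy as np

def svt(W, tau):
    """Singular value thresholding: argmin_X tau*||X||_* + 0.5*||X - W||_F^2."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    rank = int(np.count_nonzero(s_shrunk))
    return U[:, :rank] @ np.diag(s_shrunk[:rank]) @ Vt[:rank, :], rank

rng = np.random.default_rng(0)
W = rng.standard_normal((50, 40))
X, r = svt(W, tau=5.0)
print(r, np.linalg.norm(X, 'nuc'))   # rank kept and nuclear norm after shrinkage
```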

3. Flip-Flop SRQR factorization.

3.1. Flip-Flop SRQR factorization. In this section we introduce our Flip-Flop SRQR factorization, a slightly different variant of TUXV (Algorithm 5 in the appendix), to compute an SVD approximation based on the QLP factorization. Given an integer $l \ge k$, we run SRQR (the version without computing the trailing matrix) with $l$ steps on $A$:

(3.1)\qquad $A\Pi = QR = Q\begin{bmatrix} R_{11} & R_{12} \\ & R_{22}\end{bmatrix}$,

where $R_{11}\in\mathbb{R}^{l\times l}$ is upper triangular, $R_{12}\in\mathbb{R}^{l\times(n-l)}$, and $R_{22}\in\mathbb{R}^{(m-l)\times(n-l)}$. Then we run partial QRNP with $l$ steps on $R^T$:

(3.2)\qquad $R^T = \begin{bmatrix} R_{11}^T & \\ R_{12}^T & R_{22}^T\end{bmatrix} = \widehat{Q}\begin{bmatrix}\widehat{R}_{11} & \widehat{R}_{12} \\ & \widehat{R}_{22}\end{bmatrix} \approx \widehat{Q}_1\begin{bmatrix}\widehat{R}_{11} & \widehat{R}_{12}\end{bmatrix}$,

where $\widehat{Q} = \begin{bmatrix}\widehat{Q}_1 & \widehat{Q}_2\end{bmatrix}$ with $\widehat{Q}_1\in\mathbb{R}^{n\times l}$. Therefore, combining with the fact that $A\Pi\widehat{Q}_1 = Q\begin{bmatrix}\widehat{R}_{11} & \widehat{R}_{12}\end{bmatrix}^T$, we can approximate the matrix $A$ by

(3.3)\qquad $A = QR\Pi^T = Q\left(R^T\right)^T\Pi^T \approx Q\begin{bmatrix}\widehat{R}_{11}^T \\ \widehat{R}_{12}^T\end{bmatrix}\widehat{Q}_1^T\Pi^T = A\left(\Pi\widehat{Q}_1\right)\left(\Pi\widehat{Q}_1\right)^T$.

We denote the rank-$k$ truncated SVD of $A\Pi\widehat{Q}_1$ by $\widetilde{U}_k\Sigma_k\widetilde{V}_k^T$. Let $U_k = \widetilde{U}_k$ and $V_k = \Pi\widehat{Q}_1\widetilde{V}_k$; then, using (3.3), a rank-$k$ approximate SVD of $A$ is obtained:

(3.4)\qquad $A \approx U_k\Sigma_k V_k^T$,

where $U_k\in\mathbb{R}^{m\times k}$ and $V_k\in\mathbb{R}^{n\times k}$ are column orthonormal, and $\Sigma_k = \mathrm{Diag}(\sigma_1,\dots,\sigma_k)$ with the $\sigma_i$'s being the leading $k$ singular values of $A\Pi\widehat{Q}_1$. The Flip-Flop SRQR factorization is outlined in Algorithm 3.


Algorithm 3: Flip-Flop Spectrum-Revealing QR Factorization
Inputs: Matrix $A\in\mathbb{R}^{m\times n}$. Target rank $k$. Block size $b$. Oversampling size $p \ge 0$. Integer $l \ge k$. Tolerance $g > 1$ for $g_2$.
Outputs: $U\in\mathbb{R}^{m\times k}$ contains the approximate top $k$ left singular vectors of $A$. $\Sigma\in\mathbb{R}^{k\times k}$ contains the approximate top $k$ singular values of $A$. $V\in\mathbb{R}^{n\times k}$ contains the approximate top $k$ right singular vectors of $A$.
Algorithm:
Run SRQR on $A$ to $l$ steps to obtain $(R_{11}, R_{12})$
Run QRNP on $(R_{11}, R_{12})^T$ to obtain $\widehat{Q}_1$, represented by a sequence of Householder vectors
$\widehat{A} = A\Pi\widehat{Q}_1$
$[U, \Sigma, V] = \mathrm{svd}(\widehat{A})$
$U = U(:, 1:k)$; $\Sigma = \Sigma(1:k, 1:k)$; $V = \Pi\widehat{Q}_1 V(:, 1:k)$
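The sketch below mirrors the flip-flop structure of Algorithm 3 at a high level in NumPy: a truncated column-pivoted QR standing in for the SRQR step (which we do not reproduce), an LQ factorization of the retained rows, and a small SVD. It is an illustration under these simplifications, not the paper's Fortran implementation; all names are ours.

```python
import numpy as np
from scipy.linalg import qr

def flip_flop_approx_svd(A, k, l=None):
    """Rank-k approximate SVD via a QRCP-then-LQ (flip-flop) sweep."""
    m, n = A.shape
    l = k if l is None else l
    # Step 1: column-pivoted QR of A; keep the leading l rows of R = [R11 R12].
    Q, R, piv = qr(A, mode='economic', pivoting=True)
    R1 = R[:l, :]                      # l x n
    # Step 2: "flip": LQ factorization of R1, i.e., QR of R1^T.
    Q1, _ = np.linalg.qr(R1.T)         # n x l, column orthonormal
    # Step 3: project A and take a small SVD of A*Pi*Q1.
    B = A[:, piv] @ Q1                 # m x l
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    V = np.zeros((n, l))
    V[piv, :] = Q1 @ Vt.T              # undo the column permutation on the right factor
    return U[:, :k], s[:k], V[:, :k]

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 120)) @ rng.standard_normal((120, 150))
U, s, V = flip_flop_approx_svd(A, k=30, l=35)
print(np.linalg.norm(A - U @ np.diag(s) @ V.T) / np.linalg.norm(A))
```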

3.2. Complexity analysis. In this section we perform a complexity analysis of Flip-Flop SRQR. Since the approximate SVD only makes sense when the target rank $k$ is small, we assume that $k \le l \ll \min(m,n)$. The complexity analysis of Flip-Flop SRQR looks as follows:
1. The main cost of doing SRQR with TRQRCP on $A$ is $2mnl + 2(b+p)mn + (m+n)l^2$.
2. The main cost of the QR factorization of $[R_{11}\ R_{12}]^T$ and of forming $\widehat{Q}_1$ is $2nl^2 - \frac{2}{3}l^3$.
3. The main cost of computing $\widehat{A} = A\Pi\widehat{Q}_1$ is $2mnl$.
4. The main cost of computing $[U, \sim, \sim] = \mathrm{svd}(\widehat{A})$ is $O(ml^2)$.
5. The main cost of forming $V_k$ is $2nlk$.
Since $k \le l \ll \min(m,n)$, the complexity of Flip-Flop SRQR is $4mnl + 2(b+p)mn$ after omitting the lower-order terms.

On the other hand, the complexity of the approximate SVD with randomized subspace iteration (RSISVD) [27, 30] is $(4+4q)mn(k+p)$, where $p$ is the oversampling size and $q$ is the number of subspace iterations (see the detailed analysis in the appendix). In practice, $p$ is chosen to be a small integer, for instance 5 in RSISVD. In Flip-Flop SRQR, $l$ is usually chosen to be a little bit larger than $k$, e.g., $l = k + 5$; we also found that $l = k$ is sufficient in terms of the approximation quality in our numerical experiments. Therefore we observe that Flip-Flop SRQR is more efficient than RSISVD for any $q > 0$.

Since a practical dataset matrix can be too large to fit into the main memory, the cost of transferring the matrix through the memory hierarchy typically dominates the arithmetic cost. Therefore we also compare the number of data passes between RSISVD and Flip-Flop SRQR: RSISVD needs $2q + 2$ passes, while Flip-Flop SRQR needs 4 passes (computing $B = \Omega A$, swapping the columns of $A$, realizing a partial QR factorization on $A$, and computing $A\Pi\widehat{Q}_1$). Therefore Flip-Flop SRQR requires less data transfer than RSISVD for any $q > 1$.

3.3. Quality analysis of Flip-Flop SRQR. This section is devoted to the quality analysis of Flip-Flop SRQR. We start with Lemma 3.1.

LEMMA 3.1. Given any matrix $X = [X_1, X_2]$ with $X_i\in\mathbb{R}^{m\times n_i}$, $i = 1, 2$, and $n_1 + n_2 = n$,

\[
\sigma_j(X)^2 \le \sigma_j(X_1)^2 + \|X_2\|_2^2, \qquad 1 \le j \le \min(m, n).
\]


Proof. Since $XX^T = X_1X_1^T + X_2X_2^T$, we obtain the above result using [33, Theorem 3.3.16].
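The inequality of Lemma 3.1 is easy to sanity-check numerically; the snippet below (ours, purely illustrative) compares both sides on a random instance:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n1, n2 = 30, 12, 8
X1 = rng.standard_normal((m, n1))
X2 = rng.standard_normal((m, n2))
X = np.hstack([X1, X2])

sx  = np.linalg.svd(X,  compute_uv=False)   # singular values of [X1, X2]
sx1 = np.linalg.svd(X1, compute_uv=False)   # singular values of X1
norm_x2 = np.linalg.norm(X2, 2)

# Check sigma_j(X)^2 <= sigma_j(X1)^2 + ||X2||_2^2 for 1 <= j <= min(m, n1).
lhs = sx[:len(sx1)] ** 2
rhs = sx1 ** 2 + norm_x2 ** 2
print(np.all(lhs <= rhs + 1e-12))   # True
```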

We are now ready to derive bounds for the singular values and the approximation error of Flip-Flop SRQR. We need to emphasize that even if the target rank is $k$, we run Flip-Flop SRQR with an actual target rank $l$ which is a little bit larger than $k$. The difference between $k$ and $l$ can create a gap between the singular values of $A$ so that we can obtain a reliable low-rank approximation.

THEOREM 3.2. Given a matrix $A\in\mathbb{R}^{m\times n}$, a target rank $k$, an oversampling size $p$, and an actual target rank $l \ge k$, the matrices $U_k$, $\Sigma_k$, $V_k$ in (3.4) computed by Flip-Flop SRQR satisfy

(3.5)\qquad $\sigma_j(\Sigma_k) \ge \dfrac{\sigma_j(A)}{\sqrt[4]{1 + 2\|R_{22}\|_2^4 / \sigma_j^4(\Sigma_k)}}, \qquad 1 \le j \le k,$

and

(3.6)\qquad $\left\|A - U_k\Sigma_k V_k^T\right\|_2 \le \sigma_{k+1}(A)\sqrt[4]{1 + 2\left(\dfrac{\|R_{22}\|_2}{\sigma_{k+1}(A)}\right)^4},$

where $R_{22}\in\mathbb{R}^{(m-l)\times(n-l)}$ is the trailing matrix in (3.1). Using the properties of SRQR, we furthermore have

(3.7)\qquad $\sigma_j(\Sigma_k) \ge \dfrac{\sigma_j(A)}{\sqrt[4]{1 + \min\left(2\widehat{\tau}^4,\ \tau^4\left(2 + 4\widehat{\tau}^4\right)\left(\dfrac{\sigma_{l+1}(A)}{\sigma_j(A)}\right)^4\right)}}, \qquad 1 \le j \le k,$

and

(3.8)\qquad $\left\|A - U_k\Sigma_k V_k^T\right\|_2 \le \sigma_{k+1}(A)\sqrt[4]{1 + 2\tau^4\left(\dfrac{\sigma_{l+1}(A)}{\sigma_{k+1}(A)}\right)^4},$

where $\tau$ and $\widehat{\tau}$, defined in (3.12), have matrix dimension-dependent upper bounds

\[
\tau \le g_1 g_2\sqrt{(l+1)(n-l)} \qquad\text{and}\qquad \widehat{\tau} \le g_1 g_2\sqrt{l(n-l)},
\]

where $g_1 \le \sqrt{\frac{1+\varepsilon}{1-\varepsilon}}$ and $g_2 \le g$ hold with high probability. The parameters $\varepsilon > 0$ and $g > 1$ are user-defined.

Proof. In terms of the singular value bounds, observing that

\[
\begin{bmatrix}\widehat{R}_{11} & \widehat{R}_{12}\\ & \widehat{R}_{22}\end{bmatrix}
\begin{bmatrix}\widehat{R}_{11} & \widehat{R}_{12}\\ & \widehat{R}_{22}\end{bmatrix}^T
=
\begin{bmatrix}\widehat{R}_{11}\widehat{R}_{11}^T + \widehat{R}_{12}\widehat{R}_{12}^T & \widehat{R}_{12}\widehat{R}_{22}^T\\ \widehat{R}_{22}\widehat{R}_{12}^T & \widehat{R}_{22}\widehat{R}_{22}^T\end{bmatrix},
\]

we apply Lemma 3.1 twice for any $1 \le j \le k$:

(3.9)
\[
\begin{aligned}
\sigma_j^2\!\left(\begin{bmatrix}\widehat{R}_{11} & \widehat{R}_{12}\\ & \widehat{R}_{22}\end{bmatrix}\begin{bmatrix}\widehat{R}_{11} & \widehat{R}_{12}\\ & \widehat{R}_{22}\end{bmatrix}^T\right)
&\le \sigma_j^2\!\left(\begin{bmatrix}\widehat{R}_{11}\widehat{R}_{11}^T + \widehat{R}_{12}\widehat{R}_{12}^T & \widehat{R}_{12}\widehat{R}_{22}^T\end{bmatrix}\right)
 + \left\|\begin{bmatrix}\widehat{R}_{22}\widehat{R}_{12}^T & \widehat{R}_{22}\widehat{R}_{22}^T\end{bmatrix}\right\|_2^2\\
&\le \sigma_j^2\!\left(\widehat{R}_{11}\widehat{R}_{11}^T + \widehat{R}_{12}\widehat{R}_{12}^T\right)
 + \left\|\widehat{R}_{12}\widehat{R}_{22}^T\right\|_2^2
 + \left\|\begin{bmatrix}\widehat{R}_{22}\widehat{R}_{12}^T & \widehat{R}_{22}\widehat{R}_{22}^T\end{bmatrix}\right\|_2^2\\
&\le \sigma_j^2\!\left(\widehat{R}_{11}\widehat{R}_{11}^T + \widehat{R}_{12}\widehat{R}_{12}^T\right)
 + 2\left\|\begin{bmatrix}\widehat{R}_{12}\\ \widehat{R}_{22}\end{bmatrix}\right\|_2^4
 = \sigma_j^2\!\left(\widehat{R}_{11}\widehat{R}_{11}^T + \widehat{R}_{12}\widehat{R}_{12}^T\right) + 2\|R_{22}\|_2^4.
\end{aligned}
\]

The relation (3.9) can be further rewritten as

\[
\sigma_j^4(A) \le \sigma_j^4\!\left(\begin{bmatrix}\widehat{R}_{11} & \widehat{R}_{12}\end{bmatrix}\right) + 2\|R_{22}\|_2^4 = \sigma_j^4(\Sigma_k) + 2\|R_{22}\|_2^4, \qquad 1 \le j \le k,
\]

which is equivalent to

\[
\sigma_j(\Sigma_k) \ge \frac{\sigma_j(A)}{\sqrt[4]{1 + 2\|R_{22}\|_2^4 / \sigma_j^4(\Sigma_k)}}.
\]

For the residual matrix bound, we let

\[
\begin{bmatrix}\widehat{R}_{11} & \widehat{R}_{12}\end{bmatrix}
\stackrel{\mathrm{def}}{=}
\begin{bmatrix}\bar{R}_{11} & \bar{R}_{12}\end{bmatrix} + \begin{bmatrix}\delta R_{11} & \delta R_{12}\end{bmatrix},
\]

where $\begin{bmatrix}\bar{R}_{11} & \bar{R}_{12}\end{bmatrix}$ is the rank-$k$ truncated SVD of $\begin{bmatrix}\widehat{R}_{11} & \widehat{R}_{12}\end{bmatrix}$. Since

(3.10)\qquad $\left\|A - U_k\Sigma_k V_k^T\right\|_2 = \left\|A\Pi - Q\begin{bmatrix}\bar{R}_{11} & \bar{R}_{12}\\ 0 & 0\end{bmatrix}^T\widehat{Q}^T\right\|_2,$

it follows from the orthogonality of singular vectors that

\[
\begin{bmatrix}\bar{R}_{11} & \bar{R}_{12}\end{bmatrix}^T\begin{bmatrix}\delta R_{11} & \delta R_{12}\end{bmatrix} = 0,
\]

and therefore

\[
\begin{bmatrix}\widehat{R}_{11} & \widehat{R}_{12}\end{bmatrix}^T\begin{bmatrix}\widehat{R}_{11} & \widehat{R}_{12}\end{bmatrix}
= \begin{bmatrix}\bar{R}_{11} & \bar{R}_{12}\end{bmatrix}^T\begin{bmatrix}\bar{R}_{11} & \bar{R}_{12}\end{bmatrix}
+ \begin{bmatrix}\delta R_{11} & \delta R_{12}\end{bmatrix}^T\begin{bmatrix}\delta R_{11} & \delta R_{12}\end{bmatrix},
\]

which implies

(3.11)\qquad $\widehat{R}_{12}^T\widehat{R}_{12} = \bar{R}_{12}^T\bar{R}_{12} + \left(\delta R_{12}\right)^T\left(\delta R_{12}\right).$

Similarly to the deduction of (3.9), from (3.11) we can derive

\[
\left\|\begin{bmatrix}\delta R_{11} & \delta R_{12}\\ & \widehat{R}_{22}\end{bmatrix}\right\|_2^4
\le \left\|\begin{bmatrix}\delta R_{11} & \delta R_{12}\end{bmatrix}\right\|_2^4
+ 2\left\|\begin{bmatrix}\delta R_{12}\\ \widehat{R}_{22}\end{bmatrix}\right\|_2^4
\le \sigma_{k+1}^4(A) + 2\left\|\begin{bmatrix}\widehat{R}_{12}\\ \widehat{R}_{22}\end{bmatrix}\right\|_2^4
= \sigma_{k+1}^4(A) + 2\|R_{22}\|_2^4.
\]

Combining with (3.10), it now follows that

\[
\left\|A - U_k\Sigma_k V_k^T\right\|_2
= \left\|A\Pi - Q\begin{bmatrix}\bar{R}_{11} & \bar{R}_{12}\\ 0 & 0\end{bmatrix}^T\widehat{Q}^T\right\|_2
= \left\|\begin{bmatrix}\delta R_{11} & \delta R_{12}\\ & \widehat{R}_{22}\end{bmatrix}\right\|_2
\le \sigma_{k+1}(A)\sqrt[4]{1 + 2\left(\frac{\|R_{22}\|_2}{\sigma_{k+1}(A)}\right)^4}.
\]

To obtain an upper bound on $\|R_{22}\|_2$ in (3.5) and (3.6), we follow the analysis of SRQR in [75]. From the analysis of [75, Section IV], Algorithm 2 ensures that $g_1 \le \sqrt{\frac{1+\varepsilon}{1-\varepsilon}}$ and $g_2 \le g$ with high probability (the actual probability-guarantee formula can be found in [75, Section IV]), where $g_1$ and $g_2$ are defined by (2.6). Here $0 < \varepsilon < 1$ is a user-defined parameter to adjust the choice of the oversampling size $p$ used in the TRQRCP initialization part of SRQR, and $g > 1$ is a user-defined parameter in the extra swapping part of SRQR. Let

(3.12)\qquad $\tau \stackrel{\mathrm{def}}{=} g_1 g_2\,\frac{\|R_{22}\|_2}{\|R_{22}\|_{1,2}}\,\frac{\left\|\widetilde{R}^{-T}\right\|_{1,2}^{-1}}{\sigma_{l+1}(A)}
\qquad\text{and}\qquad
\widehat{\tau} \stackrel{\mathrm{def}}{=} g_1 g_2\,\frac{\|R_{22}\|_2}{\|R_{22}\|_{1,2}}\,\frac{\left\|\widehat{R}_{11}^{-T}\right\|_{1,2}^{-1}}{\sigma_k(\Sigma_k)},$

where $\widetilde{R}$ is defined in equation (2.5). Since $\frac{1}{\sqrt{n}}\|X\|_{1,2} \le \|X\|_2 \le \sqrt{n}\|X\|_{1,2}$ and $\sigma_i(X_1) \le \sigma_i(X)$, $1 \le i \le \min(s,t)$, for any matrix $X\in\mathbb{R}^{m\times n}$ and submatrix $X_1\in\mathbb{R}^{s\times t}$ of $X$, we obtain that

\[
\tau = g_1 g_2\,\frac{\|R_{22}\|_2}{\|R_{22}\|_{1,2}}\,\frac{\left\|\widetilde{R}^{-T}\right\|_{1,2}^{-1}}{\sigma_{l+1}\left(\widetilde{R}\right)}\,\frac{\sigma_{l+1}\left(\widetilde{R}\right)}{\sigma_{l+1}(A)}
\le g_1 g_2\sqrt{(l+1)(n-l)}.
\]

Using the fact that $\sigma_l\left(\begin{bmatrix}\widehat{R}_{11} & \widehat{R}_{12}\end{bmatrix}\right) = \sigma_l\left(\widehat{R}_{11}\right)$ and $\sigma_k(\Sigma_k) = \sigma_k\left(\begin{bmatrix}\widehat{R}_{11} & \widehat{R}_{12}\end{bmatrix}\right)$ by (3.2) and (3.4),

\[
\widehat{\tau} = g_1 g_2\,\frac{\|R_{22}\|_2}{\|R_{22}\|_{1,2}}\,\frac{\left\|\widehat{R}_{11}^{-T}\right\|_{1,2}^{-1}}{\sigma_l\left(\widehat{R}_{11}\right)}\,\frac{\sigma_l\left(\widehat{R}_{11}\right)}{\sigma_l\left(\begin{bmatrix}\widehat{R}_{11} & \widehat{R}_{12}\end{bmatrix}\right)}\,\frac{\sigma_l\left(\begin{bmatrix}\widehat{R}_{11} & \widehat{R}_{12}\end{bmatrix}\right)}{\sigma_k(\Sigma_k)}
\le g_1 g_2\sqrt{l(n-l)}.
\]

By the definition of $\tau$,

(3.13)\qquad $\|R_{22}\|_2 = \tau\,\sigma_{l+1}(A).$

Plugging this into (3.6) yields (3.8).

By the definition of $\widehat{\tau}$, we observe that

(3.14)\qquad $\|R_{22}\|_2 \le \widehat{\tau}\,\sigma_k(\Sigma_k).$

By (3.5) and (3.14),

(3.15)\qquad $\sigma_j(\Sigma_k) \ge \dfrac{\sigma_j(A)}{\sqrt[4]{1 + 2\left(\frac{\|R_{22}\|_2}{\sigma_j(\Sigma_k)}\right)^4}} \ge \dfrac{\sigma_j(A)}{\sqrt[4]{1 + 2\left(\frac{\|R_{22}\|_2}{\sigma_k(\Sigma_k)}\right)^4}} \ge \dfrac{\sigma_j(A)}{\sqrt[4]{1 + 2\widehat{\tau}^4}}, \qquad 1 \le j \le k.$

On the other hand, using (3.5),

\[
\begin{aligned}
\sigma_j^4(A) &\le \sigma_j^4(\Sigma_k)\left(1 + 2\,\frac{\sigma_j^4(A)}{\sigma_j^4(\Sigma_k)}\,\frac{\|R_{22}\|_2^4}{\sigma_j^4(A)}\right)\\
&\le \sigma_j^4(\Sigma_k)\left(1 + 2\,\frac{\sigma_j^4(\Sigma_k) + 2\|R_{22}\|_2^4}{\sigma_j^4(\Sigma_k)}\,\frac{\|R_{22}\|_2^4}{\sigma_j^4(A)}\right)
\le \sigma_j^4(\Sigma_k)\left(1 + 2\left(1 + 2\,\frac{\|R_{22}\|_2^4}{\sigma_k^4(\Sigma_k)}\right)\frac{\|R_{22}\|_2^4}{\sigma_j^4(A)}\right),
\end{aligned}
\]

that is,

\[
\sigma_j(\Sigma_k) \ge \frac{\sigma_j(A)}{\sqrt[4]{1 + \left(2 + 4\,\frac{\|R_{22}\|_2^4}{\sigma_k^4(\Sigma_k)}\right)\frac{\|R_{22}\|_2^4}{\sigma_j^4(A)}}}.
\]

Plugging (3.13) and (3.14) into the above inequality yields

(3.16)\qquad $\sigma_j(\Sigma_k) \ge \dfrac{\sigma_j(A)}{\sqrt[4]{1 + \tau^4\left(2 + 4\widehat{\tau}^4\right)\left(\frac{\sigma_{l+1}(A)}{\sigma_j(A)}\right)^4}}, \qquad 1 \le j \le k.$

Combining (3.15) with (3.16), we arrive at (3.7).

We note that (3.5) and (3.6) still hold true if we replace $k$ by $l$.

Inequality (3.7) shows that, with the definitions (3.12) of $\tau$ and $\widehat{\tau}$, Flip-Flop SRQR can reveal at least a dimension-dependent fraction of all the leading singular values of $A$, and indeed approximate them very accurately in case they decay relatively quickly. Moreover, (3.8) shows that Flip-Flop SRQR can compute a rank-$k$ approximation that is up to a factor of $\sqrt[4]{1 + 2\tau^4\left(\frac{\sigma_{l+1}(A)}{\sigma_{k+1}(A)}\right)^4}$ from optimal. In situations where the singular values of $A$ decay relatively quickly, our rank-$k$ approximation is about as accurate as the truncated SVD with a choice of $l$ such that

\[
\frac{\sigma_{l+1}(A)}{\sigma_{k+1}(A)} = o(1).
\]

REMARK 3.3. The "high probability" mentioned in Theorem 3.2 depends on the oversampling size $p$ but not on the block size $b$. We decided not to dive deeply into the probability explanation, since it would make the proof much longer and more tedious. Theoretically, the oversampling size $p$ required for a given high probability would be rather large; for example, to achieve a probability of 0.95, we would need to choose $p$ to be at least 500. In practice, however, we would never choose such a large $p$, since it would deteriorate the code efficiency; it turns out that a small $p$, say 5 to 10, achieves good approximation results empirically. A detailed explanation of what "high probability" means here can be found in [75]. In terms of the choice of the block size $b$, it does not affect the analysis or the error bounds; the choice of $b$ is purely for the performance of the code, since only a suitable $b$ can take full advantage of the cache hierarchy and BLAS routines.

4. Numerical experiments. In this section we demonstrate the effectiveness and efficiency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments. Firstly,


TABLE 4.1: Methods for approximate SVD.

Method   Description
LANSVD   Approximate SVD using Lanczos bidiagonalization with partial reorthogonalization [42].
         Its Fortran and Matlab implementations are from PROPACK [41].
FFSRQR   Flip-Flop Spectrum-Revealing QR factorization.
         Its Matlab implementation uses Mex-wrapped BLAS and LAPACK routines [2].
RSISVD   Approximate SVD with randomized subspace iteration [30, 27].
         Its Matlab implementation is from the tensorlab toolbox [73].
TUXV     Algorithm 5 in the appendix.
LTSVD    Linear Time SVD [16]. We use its Matlab implementation by Ma et al. [49].

we compare FFSRQR with other approximate SVD algorithms for matrix approximation. Secondly, we compare FFSRQR with other methods on tensor approximation problems using the tensorlab toolbox [73]. Thirdly, we compare FFSRQR with other methods for the robust PCA problem and the matrix completion problem. All experiments are run on a laptop with a 2.9 GHz Intel Core i5 processor and 8 GB of RAM. The codes based on LAPACK [2] and PROPACK [41] in Section 4.1 are in Fortran; the codes in Sections 4.2 and 4.3 are in Matlab, except that FFSRQR is implemented in Fortran and wrapped by mex.

4.1. Approximate truncated SVD. In this section we compare FFSRQR with other approximate SVD algorithms for low-rank matrix approximation. All tested methods are listed in Table 4.1. Since LTSVD has only a Matlab implementation, this method is only considered in Sections 4.2 and 4.3. The test matrices are:

Type 1: $A\in\mathbb{R}^{m\times n}$ [66] is defined by $A = UDV^T + 0.1\, d_{ss}\, E$, where $U\in\mathbb{R}^{m\times s}$ and $V\in\mathbb{R}^{n\times s}$ are column-orthonormal matrices and $D\in\mathbb{R}^{s\times s}$ is a diagonal matrix with $s$ geometrically decreasing diagonal entries from 1 to $10^{-3}$; $d_{ss}$ is the last diagonal entry of $D$, and $E\in\mathbb{R}^{m\times n}$ is a random matrix whose entries are independently sampled from a normal distribution $\mathcal{N}(0,1)$. In our numerical experiments we test three different random matrices: the square matrix has size $10000\times 10000$, the short-fat matrix has size $1000\times 10000$, and the tall-skinny matrix has size $10000\times 1000$.

Type 2: $A\in\mathbb{R}^{4929\times 4929}$ is a real data matrix from the University of Florida sparse matrix collection [11]; its corresponding file name is HB/GEMAT11.

For a given matrix $A\in\mathbb{R}^{m\times n}$, the relative SVD approximation error is measured by $\|A - U_k\Sigma_k V_k^T\|_F / \|A\|_F$, where $\Sigma_k$ contains the approximate top $k$ singular values and $U_k$, $V_k$ are the corresponding approximate top $k$ singular vectors. The parameters used in FFSRQR, RSISVD, TUXV, and LTSVD are listed in Table 4.2. The additional time required to compute $g_2$ is negligible; in our experiments $g_2$ always remains modest and never triggers subsequent SRQR column swaps. However, computing $g_2$ is still recommended, since it serves as an insurance policy against potential quality degradation by TRQRCP. The block size $b$ is chosen to be 32 according to the xGEQP3 LAPACK routine; when $k$ is less than 32, we use an unblocked version of the partial QRCP.

Figures 4.1-4.3 display the run time, the relative approximation error, and the top 20 singular values comparison, respectively, for four different matrices. In terms of speed, FFSRQR is always faster than LANSVD and RSISVD; moreover, the efficiency difference is more


TABLE 4.2: Parameters used in RSISVD, FFSRQR, TUXV, and LTSVD.

Method   Parameters
FFSRQR   oversampling size p = 5, block size b = min{32, k}, l = k, d = 10, g = 2.0
RSISVD   oversampling size p = 5, subspace iteration q = 1
TUXV     oversampling size p = 5, block size b = min{32, k}, l = k
LTSVD    probabilities p_i = 1/n

obvious when $k$ is relatively large. FFSRQR is only slightly slower than TUXV. In terms of the relative approximation error, FFSRQR is comparable to the other three methods. In terms of the top singular values approximation, Figure 4.3 displays the top 20 singular values when $k = 500$ for the four different matrices. In Figure 4.3, FFSRQR is superior to TUXV, as we take the absolute values of the main diagonal entries of the upper triangular matrix $R$ as the approximate singular values for TUXV.

[Figure 4.1 (plots omitted). FIG. 4.1. Run time comparison for approximate SVD algorithms. Four panels plot run time (sec) against target rank k for FFSRQR, RSISVD, LANSVD, and TUXV: (a) Type 1, random square matrix; (b) Type 1, random short-fat matrix; (c) Type 1, random tall-skinny matrix; (d) Type 2, GEMAT11.]

4.2. Tensor approximation. This section illustrates the effectiveness and efficiency of FFSRQR for computing approximate tensors. ST-HOSVD [3, 71] is one of the most efficient


[Figure 4.2 (plots omitted). FIG. 4.2. Relative approximation error comparison for approximate SVD algorithms. Four panels plot relative approximation error against target rank k for FFSRQR, RSISVD, LANSVD, and TUXV: (a) Type 1, random square matrix; (b) Type 1, random short-fat matrix; (c) Type 1, random tall-skinny matrix; (d) Type 2, GEMAT11.]

algorithms to compute a Tucker decomposition of tensors, and the most costly part of this algorithm is to compute an SVD or approximate SVD of the tensor unfoldings. The truncated SVD and RSISVD are used in the routines MLSVD and MLSVD_RSI, respectively, of the Matlab tensorlab toolbox [73]. Based on this Matlab toolbox, we implement ST-HOSVD using FFSRQR or LTSVD for the SVD approximation. We name these two new routines MLSVD_FFSRQR and MLSVD_LTSVD, respectively, and compare the four routines in the numerical experiments. We also have Python codes for this tensor approximation experiment; we do not list the results of the Python programs here, as they are similar to those for Matlab.

4.2.1. A sparse tensor example. We test a sparse tensor $\mathcal{X}\in\mathbb{R}^{n\times n\times n}$ of the following format [64, 59]:

\[
\mathcal{X} = \sum_{j=1}^{10} \frac{1000}{j}\, x_j \circ y_j \circ z_j \;+\; \sum_{j=11}^{n} \frac{1}{j}\, x_j \circ y_j \circ z_j,
\]

where $x_j, y_j, z_j \in\mathbb{R}^n$ are sparse vectors with nonnegative entries. The symbol "$\circ$" represents the vector outer product. We compute a rank-$(k,k,k)$ Tucker decomposition $[\mathcal{G}; U_1, U_2, U_3]$ using MLSVD, MLSVD_FFSRQR, MLSVD_RSI, and MLSVD_LTSVD, respectively. The relative approximation error is measured by $\|\mathcal{X} - \mathcal{X}_k\|_F / \|\mathcal{X}\|_F$, where $\mathcal{X}_k = \mathcal{G}\times_1 U_1 \times_2 U_2 \times_3 U_3$.
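To indicate what such a test tensor looks like in code, here is a small NumPy construction of the sum-of-outer-products format above. It is dense and uses an illustrative sparsity pattern of our choosing; the experiments in the paper use much larger n and genuinely sparse storage:

```python
import numpy as np

def sparse_vec(n, nnz, rng):
    """Nonnegative vector with nnz random nonzero entries."""
    v = np.zeros(n)
    idx = rng.choice(n, size=nnz, replace=False)
    v[idx] = rng.random(nnz)
    return v

def build_test_tensor(n=60, nnz=3, seed=0):
    rng = np.random.default_rng(seed)
    X = np.zeros((n, n, n))
    for j in range(1, n + 1):
        weight = 1000.0 / j if j <= 10 else 1.0 / j
        x, y, z = (sparse_vec(n, nnz, rng) for _ in range(3))
        X += weight * np.einsum('i,j,k->ijk', x, y, z)   # outer product x o y o z
    return X

X = build_test_tensor()
print(X.shape, np.count_nonzero(X) / X.size)   # shape and fill ratio
```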


[Figure 4.3 (plots omitted). FIG. 4.3. Top 20 singular values comparison for approximate SVD algorithms. Four panels plot the top 20 singular values against the index for FFSRQR, RSISVD, LANSVD, and TUXV: (a) Type 1, random square matrix; (b) Type 1, random short-fat matrix; (c) Type 1, random tall-skinny matrix; (d) Type 2, GEMAT11.]

Figure 4.4 displays a comparison of the efficiency and accuracy of the different methods for a 400 x 400 x 400 sparse tensor approximation problem. MLSVD_LTSVD is the fastest but the least accurate one. The other three methods have similar accuracy, while MLSVD_FFSRQR is faster when the target rank k is larger.

4.2.2. Handwritten digits classification. MNIST is a handwritten digits image data set created by Yann LeCun [43]. Every digit is represented by a 28 x 28 pixel image. Handwritten digits classification has the objective of training a classification model to classify new unlabeled images. A HOSVD algorithm is proposed by Savas and Eldén [60] to classify handwritten digits. To reduce the training time, a more efficient ST-HOSVD algorithm is introduced in [71].

We do handwritten digits classification using MNIST, which consists of 60,000 training images and 10,000 test images. The number of training images in each class is restricted to 5421 so that the training set is equally distributed over all classes. The training set is represented by a tensor $\mathcal{X}$ of size 786 x 5421 x 10. The classification relies on Algorithm 2 in [60]. We use various algorithms to obtain an approximation $\mathcal{X} \approx \mathcal{G}\times_1 U_1\times_2 U_2\times_3 U_3$, where the core tensor $\mathcal{G}$ has size 65 x 142 x 10.

The results are summarized in Table 4.3. In terms of run time, we observe that our method MLSVD_FFSRQR is comparable to MLSVD_RSI, while MLSVD is the most expensive


[Figure 4.4 (plots omitted). FIG. 4.4. Run time and relative approximation error comparison on a sparse tensor. Two panels plot relative approximation error and run time (sec) against target rank k for MLSVD, MLSVD_FFSRQR, MLSVD_RSI, and MLSVD_LTSVD.]

TABLE 4.3: Comparison on handwritten digits classification.

                          MLSVD     MLSVD_FFSRQR  MLSVD_RSI  MLSVD_LTSVD
Training time [sec]       272.121   15.455        19.343     0.4266
Relative model error      0.4099    0.4273        0.4247     0.5162
Classification accuracy   95.19%    94.98%        95.05%     92.59%

one, and MLSVD_LTSVD is the fastest one. In terms of classification quality, MLSVD, MLSVD_FFSRQR, and MLSVD_RSI are comparable, while MLSVD_LTSVD is the least accurate one.


ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491

Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do

(UΣV) = svd(MminusEk + microminus1k Yk

)

Xk+1 = USmicrominus1k

(Σ)V T

Ek+1 = πΩ

(M minusXk+1 + microminus1

k Yk)

Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then

Breakend if

end while

Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-

holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion

SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)

Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9

(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an

N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38

(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in

Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16

[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278

[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342

[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55

[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183

[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291

[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218

[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to

Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162

[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977

[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654

ETNAKent State University and

Johann Radon Institute (RICAM)

492 Y FENG J XIAO AND M GU

Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)

ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)

end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)

[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480

[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151

[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press

Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)

pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization

SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and

convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic

algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo

multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing

collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237

[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the

SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)

pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in

Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493

[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash

500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at

httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of

Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document

recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale

Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted

low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic

applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and

outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to

system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-

tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra

PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite

linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-

structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE

Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding

for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493

[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494

[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501

[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719

[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124

[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249

[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003

[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57

[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799

[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388

[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482

[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336

[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348

ETNAKent State University and

Johann Radon Institute (RICAM)

494 Y FENG J XIAO AND M GU

[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603

[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640

[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash

311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-

order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in

European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460

[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet

[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298

[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242

[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550

Page 8: FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND …etna.mcs.kent.edu/vol.51.2019/pp469-494.dir/pp469-494.pdfminimization. In Section3, we introduce Flip-Flop SRQR and analyze its


Algorithm 3 Flip-Flop Spectrum-Revealing QR Factorization
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k. Block size b. Oversampling size p ≥ 0.
  Integer l ≥ k. Tolerance g > 1 for g_2.
Outputs:
  U ∈ R^{m×k} contains the approximate top k left singular vectors of A.
  Σ ∈ R^{k×k} contains the approximate top k singular values of A.
  V ∈ R^{n×k} contains the approximate top k right singular vectors of A.
Algorithm:
  Run SRQR on A to l steps to obtain (R_11, R_12).
  Run QRNP on (R_11, R_12)^T to obtain Q_1, represented by a sequence of Householder vectors.
  Â = A Π Q_1.
  [U, Σ, V] = svd(Â).
  U = U(:, 1:k), Σ = Σ(1:k, 1:k), V = Π Q_1 V(:, 1:k).

3.2. Complexity analysis. In this section we perform a complexity analysis of Flip-Flop SRQR. Since the approximate SVD only makes sense when the target rank k is small, we assume that k ≤ l ≪ min(m, n). The complexity of Flip-Flop SRQR breaks down as follows:
  1. The main cost of doing SRQR with TRQRCP on A is 2mnl + 2(b+p)mn + (m+n)l^2.
  2. The main cost of the QR factorization for [R_11 R_12]^T and of forming Q_1 is 2nl^2 − (2/3)l^3.
  3. The main cost of computing Â = AΠQ_1 is 2mnl.
  4. The main cost of computing [U, ~, ~] = svd(Â) is O(ml^2).
  5. The main cost of forming V_k is 2nlk.
Since k ≤ l ≪ min(m, n), the complexity of Flip-Flop SRQR is 4mnl + 2(b+p)mn after omitting the lower-order terms.

On the other hand, the complexity of the approximate SVD with randomized subspace iteration (RSISVD) [27, 30] is (4+4q)mn(k+p), where p is the oversampling size and q is the number of subspace iterations (see the detailed analysis in the appendix). In practice, p is chosen to be a small integer, for instance 5 in RSISVD. In Flip-Flop SRQR, l is usually chosen to be a little larger than k, e.g., l = k+5. We also found that l = k is sufficient in terms of the approximation quality in our numerical experiments. Therefore Flip-Flop SRQR is more efficient than RSISVD for any q > 0.

Since a practical data-set matrix can be too large to fit into the main memory, the cost of transferring the matrix through the memory hierarchy typically dominates the arithmetic cost. Therefore we also compare the number of data passes between RSISVD and Flip-Flop SRQR: RSISVD needs 2q+2 passes, while Flip-Flop SRQR needs 4 passes (computing B = ΩA, swapping the columns of A, realizing a partial QR factorization on A, and computing AΠQ_1). Therefore Flip-Flop SRQR requires less data transfer than RSISVD for any q > 1.
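To make the comparison above concrete, the two leading-order flop counts are easy to evaluate directly. The following small sketch (an illustration only, not part of the original experiments; the problem size and parameter values are hypothetical) uses the formulas from this section and from Appendix E.

def flipflop_srqr_flops(m, n, l, b, p):
    # Flip-Flop SRQR leading-order cost: 4mnl + 2(b+p)mn
    return 4 * m * n * l + 2 * (b + p) * m * n

def rsisvd_flops(m, n, k, p, q):
    # randomized subspace iteration leading-order cost: (4q+4) mn (k+p)
    return (4 * q + 4) * m * n * (k + p)

m, n, k = 10000, 10000, 100        # hypothetical problem size
b, p, q, l = 32, 5, 1, k           # parameter choices in the spirit of Table 4.2
print(flipflop_srqr_flops(m, n, l, b, p))   # about 4.7e10 flops
print(rsisvd_flops(m, n, k, p, q))          # about 8.4e10 flops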

3.3. Quality analysis of Flip-Flop SRQR. This section is devoted to the quality analysis of Flip-Flop SRQR. We start with Lemma 3.1.

LEMMA 3.1. Given any matrix X = [X_1, X_2] with X_i ∈ R^{m×n_i}, i = 1, 2, and n_1 + n_2 = n,

    σ_j(X)^2 ≤ σ_j(X_1)^2 + ‖X_2‖_2^2,    1 ≤ j ≤ min(m, n).


Proof. Since XX^T = X_1X_1^T + X_2X_2^T, we obtain the above result using [33, Theorem 3.3.16].
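As an illustration only (not part of the original argument), Lemma 3.1 is easy to check numerically with NumPy on a random column partition X = [X_1, X_2]:

import numpy as np

rng = np.random.default_rng(0)
m, n1, n2 = 60, 40, 20
X1 = rng.standard_normal((m, n1))
X2 = rng.standard_normal((m, n2))
X = np.hstack([X1, X2])

s_X  = np.linalg.svd(X,  compute_uv=False)   # sigma_j(X)
s_X1 = np.linalg.svd(X1, compute_uv=False)   # sigma_j(X1)
x2_norm = np.linalg.norm(X2, 2)              # ||X2||_2

# sigma_j(X)^2 <= sigma_j(X1)^2 + ||X2||_2^2 for j = 1, ..., n1
assert np.all(s_X[:n1]**2 <= s_X1**2 + x2_norm**2 + 1e-10)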

We are now ready to derive bounds for the singular values and the approximation error of Flip-Flop SRQR. We need to emphasize that even if the target rank is k, we run Flip-Flop SRQR with an actual target rank l, which is a little larger than k. The difference between k and l can create a gap between the singular values of A so that we can obtain a reliable low-rank approximation.

THEOREM 3.2. Given a matrix A ∈ R^{m×n}, a target rank k, an oversampling size p, and an actual target rank l ≥ k, the matrices U_k, Σ_k, V_k in (3.4) computed by Flip-Flop SRQR satisfy

(3.5)    σ_j(Σ_k) ≥ σ_j(A) / (1 + 2‖R_22‖_2^4 / σ_j^4(Σ_k))^{1/4},    1 ≤ j ≤ k,

and

(3.6)    ‖A − U_kΣ_kV_k^T‖_2 ≤ σ_{k+1}(A) (1 + 2 (‖R_22‖_2 / σ_{k+1}(A))^4)^{1/4},

where R_22 ∈ R^{(m−l)×(n−l)} is the trailing matrix in (3.1). Using the properties of SRQR, we furthermore have

(3.7)    σ_j(Σ_k) ≥ σ_j(A) / (1 + min(2τ̃^4, τ^4(2 + 4τ̃^4)(σ_{l+1}(A)/σ_j(A))^4))^{1/4},    1 ≤ j ≤ k,

and

(3.8)    ‖A − U_kΣ_kV_k^T‖_2 ≤ σ_{k+1}(A) (1 + 2τ^4 (σ_{l+1}(A)/σ_{k+1}(A))^4)^{1/4},

where τ and τ̃, defined in (3.12), have matrix dimension-dependent upper bounds

    τ ≤ g_1 g_2 √((l+1)(n−l))    and    τ̃ ≤ g_1 g_2 √(l(n−l)),

where g_1 ≤ √((1+ε)/(1−ε)) and g_2 ≤ g hold with high probability. The parameters ε > 0 and g > 1 are user-defined.

Proof. In terms of the singular value bounds, observing that

    [R_11 R_12; 0 R_22] [R_11 R_12; 0 R_22]^T = [R_11R_11^T + R_12R_12^T, R_12R_22^T; R_22R_12^T, R_22R_22^T],

we apply Lemma 3.1 twice: for any 1 ≤ j ≤ k,

    σ_j^2([R_11 R_12; 0 R_22][R_11 R_12; 0 R_22]^T)
      ≤ σ_j^2([R_11R_11^T + R_12R_12^T, R_12R_22^T]) + ‖[R_22R_12^T, R_22R_22^T]‖_2^2
      ≤ σ_j^2(R_11R_11^T + R_12R_12^T) + ‖R_12R_22^T‖_2^2 + ‖[R_22R_12^T, R_22R_22^T]‖_2^2
      ≤ σ_j^2(R_11R_11^T + R_12R_12^T) + 2‖[R_12; R_22]‖_2^4
      = σ_j^2(R_11R_11^T + R_12R_12^T) + 2‖R_22‖_2^4.                                   (3.9)

The relation (3.9) can be further rewritten as

    σ_j^4(A) ≤ σ_j^4([R_11 R_12]) + 2‖R_22‖_2^4 = σ_j^4(Σ_k) + 2‖R_22‖_2^4,    1 ≤ j ≤ k,

which is equivalent to

    σ_j(Σ_k) ≥ σ_j(A) / (1 + 2‖R_22‖_2^4 / σ_j^4(Σ_k))^{1/4}.

For the residual matrix bound, we let

    [R̄_11 R̄_12] := [R_11 R_12] + [δR_11 δR_12],

where [R̄_11 R̄_12] is the rank-k truncated SVD of [R_11 R_12]. Since

(3.10)    ‖A − U_kΣ_kV_k^T‖_2 = ‖AΠ − Q [R̄_11 R̄_12; 0 0]‖_2,

it follows from the orthogonality of singular vectors that

    [R̄_11 R̄_12]^T [δR_11 δR_12] = 0,

and therefore

    [R_11 R_12]^T [R_11 R_12] = [R̄_11 R̄_12]^T [R̄_11 R̄_12] + [δR_11 δR_12]^T [δR_11 δR_12],

which implies

(3.11)    R_12^T R_12 = R̄_12^T R̄_12 + (δR_12)^T (δR_12).

Similarly to the deduction of (3.9), from (3.11) we can derive

    ‖[δR_11 δR_12; 0 R_22]‖_2^4 ≤ ‖[δR_11 δR_12]‖_2^4 + 2‖[δR_12; R_22]‖_2^4
                                 ≤ σ_{k+1}^4(A) + 2‖[R_12; R_22]‖_2^4 = σ_{k+1}^4(A) + 2‖R_22‖_2^4.


Combining with (3.10), it now follows that

    ‖A − U_kΣ_kV_k^T‖_2 = ‖AΠ − Q [R̄_11 R̄_12; 0 0]‖_2 = ‖[δR_11 δR_12; 0 R_22]‖_2
                         ≤ σ_{k+1}(A) (1 + 2 (‖R_22‖_2 / σ_{k+1}(A))^4)^{1/4}.

To obtain an upper bound for ‖R_22‖_2 in (3.5) and (3.6), we follow the analysis of SRQR in [75]. From the analysis of [75, Section IV], Algorithm 2 ensures that g_1 ≤ √((1+ε)/(1−ε)) and g_2 ≤ g with high probability (the actual probability-guarantee formula can be found in [75, Section IV]), where g_1 and g_2 are defined by (2.6). Here 0 < ε < 1 is a user-defined parameter to adjust the choice of the oversampling size p used in the TRQRCP initialization part of SRQR, and g > 1 is a user-defined parameter in the extra swapping part of SRQR. Let

(3.12)    τ := g_1 g_2 (‖R_22‖_2 / ‖R_22‖_{1,2}) (‖R̃^{−T}‖_{1,2}^{−1} / σ_{l+1}(A))    and    τ̃ := g_1 g_2 (‖R_22‖_2 / ‖R_22‖_{1,2}) (‖R_11^{−T}‖_{1,2}^{−1} / σ_k(Σ_k)),

where R̃ is defined in equation (2.5). Since (1/√n)‖X‖_{1,2} ≤ ‖X‖_2 ≤ √n ‖X‖_{1,2} and σ_i(X_1) ≤ σ_i(X), 1 ≤ i ≤ min(s, t), for any matrix X ∈ R^{m×n} and submatrix X_1 ∈ R^{s×t} of X, we obtain that

    τ = g_1 g_2 (‖R_22‖_2 / ‖R_22‖_{1,2}) (‖R̃^{−T}‖_{1,2}^{−1} / σ_{l+1}(R̃)) (σ_{l+1}(R̃) / σ_{l+1}(A)) ≤ g_1 g_2 √((l+1)(n−l)).

Using the fact that σ_l([R_11 R_12]) = σ_l(R_11) and σ_k(Σ_k) = σ_k([R_11 R_12]) by (3.2) and (3.4),

    τ̃ = g_1 g_2 (‖R_22‖_2 / ‖R_22‖_{1,2}) (‖R_11^{−T}‖_{1,2}^{−1} / σ_l(R_11)) (σ_l(R_11) / σ_l([R_11 R_12])) (σ_l([R_11 R_12]) / σ_k(Σ_k)) ≤ g_1 g_2 √(l(n−l)).

By the definition of τ,

(3.13)    ‖R_22‖_2 = τ σ_{l+1}(A).

Plugging this into (3.6) yields (3.8). By the definition of τ̃, we observe that

(3.14)    ‖R_22‖_2 ≤ τ̃ σ_k(Σ_k).

By (3.5) and (3.14),

(3.15)    σ_j(Σ_k) ≥ σ_j(A) / (1 + 2(‖R_22‖_2/σ_j(Σ_k))^4)^{1/4} ≥ σ_j(A) / (1 + 2(‖R_22‖_2/σ_k(Σ_k))^4)^{1/4} ≥ σ_j(A) / (1 + 2τ̃^4)^{1/4},    1 ≤ j ≤ k.


On the other hand, using (3.5),

    σ_j^4(A) ≤ σ_j^4(Σ_k) (1 + 2 (σ_j^4(A)/σ_j^4(Σ_k)) (‖R_22‖_2^4/σ_j^4(A)))
             ≤ σ_j^4(Σ_k) (1 + 2 ((σ_j^4(Σ_k) + 2‖R_22‖_2^4)/σ_j^4(Σ_k)) (‖R_22‖_2^4/σ_j^4(A)))
             ≤ σ_j^4(Σ_k) (1 + 2 (1 + 2‖R_22‖_2^4/σ_k^4(Σ_k)) (‖R_22‖_2^4/σ_j^4(A))),

that is,

    σ_j(Σ_k) ≥ σ_j(A) / (1 + (2 + 4‖R_22‖_2^4/σ_k^4(Σ_k)) ‖R_22‖_2^4/σ_j^4(A))^{1/4}.

Plugging (3.13) and (3.14) into the above inequality yields

(3.16)    σ_j(Σ_k) ≥ σ_j(A) / (1 + τ^4 (2 + 4τ̃^4) (σ_{l+1}(A)/σ_j(A))^4)^{1/4},    1 ≤ j ≤ k.

Combining (3.15) with (3.16), we arrive at (3.7).
We note that (3.5) and (3.6) still hold true if we replace k by l.
Inequality (3.7) shows that, with the definitions (3.12) of τ and τ̃, Flip-Flop SRQR can reveal at least a dimension-dependent fraction of all the leading singular values of A, and indeed approximates them very accurately in case they decay relatively quickly. Moreover, (3.8) shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of (1 + 2τ^4 (σ_{l+1}(A)/σ_{k+1}(A))^4)^{1/4} from optimal. In situations where the singular values of A decay relatively quickly, our rank-k approximation is about as accurate as the truncated SVD with a choice of l such that σ_{l+1}(A)/σ_{k+1}(A) = o(1).

REMARK 3.3. The “high probability” mentioned in Theorem 3.2 depends on the oversampling size p but not on the block size b. We decided not to dive deep into the probability explanation, since it would make the proof much longer and more tedious. Theoretically, the oversampling size p required for a given high probability would be rather large; for example, to achieve a probability of 0.95, we would need to choose p to be at least 500. However, in practice we would never choose such a large p, since it would deteriorate the code efficiency. It turns out that a small p, say 5 to 10, achieves good approximation results empirically. A detailed explanation of what high probability means here can be found in [75]. In terms of the choice of the block size b, it does not affect the analysis or the error bounds; the choice of b is purely for the performance of the code, as only a suitable b can take full advantage of the cache hierarchy and the BLAS routines.

4. Numerical experiments. In this section we demonstrate the effectiveness and efficiency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments. Firstly,


TABLE 4.1. Methods for approximate SVD.

Method   Description
LANSVD   Approximate SVD using Lanczos bidiagonalization with partial reorthogonalization [42];
         its Fortran and Matlab implementations are from PROPACK [41].
FFSRQR   Flip-Flop Spectrum-Revealing QR factorization;
         its Matlab implementation uses Mex-wrapped BLAS and LAPACK routines [2].
RSISVD   Approximate SVD with randomized subspace iteration [30, 27];
         its Matlab implementation is from the tensorlab toolbox [73].
TUXV     Algorithm 5 in the appendix.
LTSVD    Linear Time SVD [16]; we use its Matlab implementation by Ma et al. [49].

we compare FFSRQR with other approximate SVD algorithms for a matrix approximation. Secondly, we compare FFSRQR with other methods on tensor approximation problems using the tensorlab toolbox [73]. Thirdly, we compare FFSRQR with other methods for the robust PCA problem and the matrix completion problem. All experiments are implemented on a laptop with a 2.9 GHz Intel Core i5 processor and 8 GB of RAM. The codes based on LAPACK [2] and PROPACK [41] in Section 4.1 are in Fortran. The codes in Sections 4.2 and 4.3 are in Matlab, except that FFSRQR is implemented in Fortran and wrapped by mex.

4.1. Approximate truncated SVD. In this section we compare FFSRQR with other approximate SVD algorithms for low-rank matrix approximation. All tested methods are listed in Table 4.1. Since LTSVD has only a Matlab implementation, this method is only considered in Sections 4.2 and 4.3. The test matrices are:

Type 1: A ∈ R^{m×n} [66] is defined by A = U D V^T + 0.1 · d_ss · E, where U ∈ R^{m×s}, V ∈ R^{n×s} are column-orthonormal matrices and D ∈ R^{s×s} is a diagonal matrix with s geometrically decreasing diagonal entries from 1 to 10^{−3}; d_ss is the last diagonal entry of D, and E ∈ R^{m×n} is a random matrix whose entries are independently sampled from a normal distribution N(0, 1) (a small generation sketch is given after this list). In our numerical experiments we test three different random matrices: the square matrix has a size of 10000 × 10000, the short-fat matrix has a size of 1000 × 10000, and the tall-skinny matrix has a size of 10000 × 1000.

Type 2: A ∈ R^{4929×4929} is a real data matrix from the University of Florida sparse matrix collection [11]. Its corresponding file name is HB/GEMAT11.
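For illustration, a Type 1 test matrix could be generated along the following lines (a sketch under the construction just described; the sizes used here are smaller than the 10000 × 10000 case in the experiments, and s = 100 is an arbitrary choice):

import numpy as np

def type1_matrix(m, n, s, rng):
    # Column-orthonormal U and V from thin QR factorizations of Gaussian matrices.
    U, _ = np.linalg.qr(rng.standard_normal((m, s)))
    V, _ = np.linalg.qr(rng.standard_normal((n, s)))
    # s geometrically decreasing singular values from 1 to 1e-3.
    d = np.geomspace(1.0, 1e-3, s)
    E = rng.standard_normal((m, n))
    return U @ np.diag(d) @ V.T + 0.1 * d[-1] * E

rng = np.random.default_rng(0)
A = type1_matrix(1000, 1000, 100, rng)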

For a given matrix A ∈ R^{m×n}, the relative SVD approximation error is measured by ‖A − U_kΣ_kV_k^T‖_F / ‖A‖_F, where Σ_k contains the approximate top k singular values and U_k, V_k are the corresponding approximate top k singular vectors. The parameters used in FFSRQR, RSISVD, TUXV, and LTSVD are listed in Table 4.2. The additional time required to compute g_2 is negligible. In our experiments, g_2 always remains modest and never triggers subsequent SRQR column swaps. However, computing g_2 is still recommended, since g_2 serves as an insurance policy against potential quality degradation by TRQRCP. The block size b is chosen to be 32, matching the xGEQP3 LAPACK routine. When k is less than 32, we use an unblocked version of the partial QRCP.

Figures 4.1–4.3 display the run time, the relative approximation error, and the top 20 singular values, respectively, for four different matrices. In terms of speed, FFSRQR is always faster than LANSVD and RSISVD. Moreover, the efficiency difference is more


TABLE 4.2. Parameters used in RSISVD, FFSRQR, TUXV, and LTSVD.

Method   Parameters
FFSRQR   oversampling size p = 5, block size b = min{32, k}, l = k, d = 10, g = 2.0
RSISVD   oversampling size p = 5, subspace iteration q = 1
TUXV     oversampling size p = 5, block size b = min{32, k}, l = k
LTSVD    probabilities p_i = 1/n

obvious when k is relatively large. FFSRQR is only slightly slower than TUXV. In terms of the relative approximation error, FFSRQR is comparable to the other three methods. In terms of the top singular value approximation, Figure 4.3 displays the top 20 singular values when k = 500 for the four different matrices. In Figure 4.3, FFSRQR is superior to TUXV, as we take the absolute values of the main diagonal entries of the upper triangular matrix R as the approximate singular values for TUXV.

[Figure 4.1, four panels: (a) Type 1 random square matrix; (b) Type 1 random short-fat matrix; (c) Type 1 random tall-skinny matrix; (d) Type 2 GEMAT11. Axes: target rank k versus run time (sec). Legend: FFSRQR, RSISVD, LANSVD, TUXV.]
FIG. 4.1. Run time comparison for approximate SVD algorithms.

4.2. Tensor approximation. This section illustrates the effectiveness and efficiency of FFSRQR for computing approximate tensors. ST-HOSVD [3, 71] is one of the most efficient


[Figure 4.2, four panels: (a) Type 1 random square matrix; (b) Type 1 random short-fat matrix; (c) Type 1 random tall-skinny matrix; (d) Type 2 GEMAT11. Axes: target rank k versus relative approximation error. Legend: FFSRQR, RSISVD, LANSVD, TUXV.]
FIG. 4.2. Relative approximation error comparison for approximate SVD algorithms.

algorithms to compute a Tucker decomposition of tensors, and the most costly part of this algorithm is to compute an SVD or approximate SVD of the tensor unfoldings. The truncated SVD and RSISVD are used in the routines MLSVD and MLSVD_RSI, respectively, in the Matlab tensorlab toolbox [73]. Based on this Matlab toolbox, we implement ST-HOSVD using FFSRQR or LTSVD to do the SVD approximation. We name these two new routines MLSVD_FFSRQR and MLSVD_LTSVD, respectively, and compare these four routines in the numerical experiments. We also have Python codes for this tensor approximation experiment. We don't list the results of the Python programs here, as they are similar to those for Matlab.

4.2.1. A sparse tensor example. We test a sparse tensor X ∈ R^{n×n×n} of the following format [64, 59]:

    X = Σ_{j=1}^{10} (1000/j) x_j ∘ y_j ∘ z_j + Σ_{j=11}^{n} (1/j) x_j ∘ y_j ∘ z_j,

where x_j, y_j, z_j ∈ R^n are sparse vectors with nonnegative entries. The symbol “∘” represents the vector outer product. We compute a rank-(k, k, k) Tucker decomposition [G; U_1, U_2, U_3] using MLSVD, MLSVD_FFSRQR, MLSVD_RSI, and MLSVD_LTSVD, respectively. The relative approximation error is measured by ‖X − X_k‖_F / ‖X‖_F, where X_k = G ×_1 U_1 ×_2 U_2 ×_3 U_3.
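As an illustration (not the authors' code), the test tensor and the error measure can be assembled as follows; the sparsity pattern of x_j, y_j, z_j is chosen arbitrarily here, and the Tucker factors themselves would come from one of the MLSVD routines above or from an ST-HOSVD sketch such as the one given with Algorithm 6 in Appendix C.

import numpy as np

def sparse_nonneg(n, nnz, rng):
    v = np.zeros(n)
    v[rng.choice(n, size=nnz, replace=False)] = rng.random(nnz)
    return v

rng = np.random.default_rng(0)
n = 100                                  # small n for illustration (400 in the experiment)
X = np.zeros((n, n, n))
for j in range(1, n + 1):
    w = 1000.0 / j if j <= 10 else 1.0 / j
    x, y, z = (sparse_nonneg(n, 5, rng) for _ in range(3))
    X += w * np.einsum('i,j,k->ijk', x, y, z)    # outer product x o y o z

def relative_error(X, G, U1, U2, U3):
    # || X - G x1 U1 x2 U2 x3 U3 ||_F / ||X||_F
    Xk = np.einsum('abc,ia,jb,kc->ijk', G, U1, U2, U3)
    return np.linalg.norm(X - Xk) / np.linalg.norm(X)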


[Figure 4.3, four panels: (a) Type 1 random square matrix; (b) Type 1 random short-fat matrix; (c) Type 1 random tall-skinny matrix; (d) Type 2 GEMAT11. Axes: index versus top 20 singular values. Legend: FFSRQR, RSISVD, LANSVD, TUXV.]
FIG. 4.3. Top 20 singular values comparison for approximate SVD algorithms.

Figure 4.4 displays a comparison of the efficiency and accuracy of the different methods for a 400 × 400 × 400 sparse tensor approximation problem. MLSVD_LTSVD is the fastest but the least accurate one. The other three methods have similar accuracy, while MLSVD_FFSRQR is faster when the target rank k is larger.

4.2.2. Handwritten digits classification. MNIST is a handwritten digits image data set created by Yann LeCun [43]. Every digit is represented by a 28 × 28 pixel image. Handwritten digits classification has the objective of training a classification model to classify new unlabeled images. A HOSVD algorithm is proposed by Savas and Eldén [60] to classify handwritten digits. To reduce the training time, a more efficient ST-HOSVD algorithm is introduced in [71].

We do handwritten digits classification using MNIST, which consists of 60,000 training images and 10,000 test images. The number of training images in each class is restricted to 5421 so that the training set is equally distributed over all classes. The training set is represented by a tensor X of size 786 × 5421 × 10. The classification relies on Algorithm 2 in [60]. We use various algorithms to obtain an approximation X ≈ G ×_1 U_1 ×_2 U_2 ×_3 U_3, where the core tensor G has size 65 × 142 × 10.

The results are summarized in Table 4.3. In terms of run time, we observe that our method MLSVD_FFSRQR is comparable to MLSVD_RSI, while MLSVD is the most expensive


[Figure 4.4, two panels: relative approximation error versus target rank k, and run time (sec) versus target rank k. Legend: MLSVD, MLSVD_FFSRQR, MLSVD_RSI, MLSVD_LTSVD.]
FIG. 4.4. Run time and relative approximation error comparison on a sparse tensor.

TABLE 4.3. Comparison on handwritten digits classification.

                          MLSVD     MLSVD_FFSRQR   MLSVD_RSI   MLSVD_LTSVD
Training time [sec]       272.121   15.455         19.343      0.4266
Relative model error      0.4099    0.4273         0.4247      0.5162
Classification accuracy   95.19%    94.98%         95.05%      92.59%

one and MLSVD_LTSVD is the fastest one. In terms of classification quality, MLSVD, MLSVD_FFSRQR, and MLSVD_RSI are comparable, while MLSVD_LTSVD is the least accurate one.

4.3. Solving the nuclear norm minimization problem. To show the effectiveness of the FFSRQR algorithm for nuclear norm minimization problems, we investigate two scenarios: the robust PCA problem (2.7) and the matrix completion problem (2.8). The test matrix used for the robust PCA is introduced in [45], and the test matrices used in the matrix completion are two real data sets. We use the IALM method [45] to solve both problems; the IALM code can be downloaded.¹

4.3.1. Robust PCA. To solve the robust PCA problem, we replace the approximate SVD part in the IALM method [45] by various methods. We denote the actual solution to the robust PCA problem by a matrix pair (X*, E*) ∈ R^{m×n} × R^{m×n}. The matrix X* is X* = X_L X_R^T, with X_L ∈ R^{m×k} and X_R ∈ R^{n×k} being random matrices whose entries are independently sampled from a normal distribution. The sparse matrix E* is a random matrix whose non-zero entries are independently sampled from a uniform distribution over the interval [−500, 500]. The input to the IALM algorithm has the form M = X* + E*, and the output is denoted by (X̂, Ê). In this numerical experiment we use the same parameter settings as the IALM code for robust PCA: the rank k is 0.1m, and the number of non-zero entries in E is 0.05m². We choose the trade-off parameter λ = 1/√max(m, n), as suggested by Candès et al. [8]. The solution quality is measured by the normalized root mean square error ‖X̂ − X*‖_F / ‖X*‖_F.
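For concreteness, the synthetic robust PCA test problem described above could be generated as in the following sketch (illustrative only; the IALM solver itself, Algorithm 7, is not reproduced here):

import numpy as np

rng = np.random.default_rng(0)
m = n = 1000
k = int(0.1 * m)                          # rank of the low-rank component
X_star = rng.standard_normal((m, k)) @ rng.standard_normal((n, k)).T

E_star = np.zeros((m, n))                 # sparse component with 0.05*m^2 non-zeros
nnz = int(0.05 * m * m)
idx = rng.choice(m * n, size=nnz, replace=False)
E_star.flat[idx] = rng.uniform(-500, 500, size=nnz)

M = X_star + E_star                       # input to the IALM algorithm
lam = 1.0 / np.sqrt(max(m, n))            # trade-off parameter lambda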

Table 4.4 includes the relative error, run time, the number of non-zero entries in Ê (‖Ê‖_0), the iteration count, and the number of non-zero singular values (sv) in X̂ of the IALM

¹http://perception.csl.illinois.edu/matrix-rank/sample_code.html


algorithm using different approximate SVD methods. We observe that IALM_FFSRQR is faster than all the other three methods, while its error is comparable to IALM_LANSVD and IALM_RSISVD. IALM_LTSVD is relatively slow and not effective.

TABLE 4.4. Comparison on robust PCA.

Size         Method         Error      Time (sec)  ‖Ê‖_0      Iter  sv
1000×1000    IALM-LANSVD    3.33e-07   5.79e+00    50000      22    100
             IALM-FFSRQR    2.79e-07   1.02e+00    50000      25    100
             IALM-RSISVD    3.36e-07   1.09e+00    49999      22    100
             IALM-LTSVD     9.92e-02   3.11e+00    999715     100   100
2000×2000    IALM-LANSVD    2.61e-07   5.91e+01    199999     22    200
             IALM-FFSRQR    1.82e-07   6.93e+00    199998     25    200
             IALM-RSISVD    2.63e-07   7.38e+00    199996     22    200
             IALM-LTSVD     8.42e-02   2.20e+01    3998937    100   200
4000×4000    IALM-LANSVD    1.38e-07   4.65e+02    799991     23    400
             IALM-FFSRQR    1.39e-07   4.43e+01    800006     26    400
             IALM-RSISVD    1.51e-07   5.04e+01    799990     23    400
             IALM-LTSVD     8.94e-02   1.54e+02    15996623   100   400
6000×6000    IALM-LANSVD    1.30e-07   1.66e+03    1799982    23    600
             IALM-FFSRQR    1.02e-07   1.42e+02    1799993    26    600
             IALM-RSISVD    1.44e-07   1.62e+02    1799985    23    600
             IALM-LTSVD     8.58e-02   5.55e+02    35992605   100   600

4.3.2. Matrix completion. We solve the matrix completion problems for two real data sets used in [68]: the Jester joke data set [24] and the MovieLens data set [32]. The Jester joke data set consists of 4.1 million ratings for 100 jokes from 73,421 users and can be downloaded from the website http://goldberg.berkeley.edu/jester-data. We test with the following data matrices:
• jester-1: data from 24,983 users who have rated 36 or more jokes;
• jester-2: data from 23,500 users who have rated 36 or more jokes;
• jester-3: data from 24,938 users who have rated between 15 and 35 jokes;
• jester-all: the combination of jester-1, jester-2, and jester-3.
The MovieLens data set can be downloaded from https://grouplens.org/datasets/movielens. We test with the following data matrices:
• movie-100K: 100,000 ratings of 943 users for 1682 movies;
• movie-1M: 1 million ratings of 6040 users for 3900 movies;
• movie-latest-small: 100,000 ratings of 700 users for 9000 movies.

For each data set, we let M be the original data matrix, where M_ij stands for the rating of joke (movie) j by user i, and let Γ be the set of indices where M_ij is known. The matrix completion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE), defined by

    NMAE := ( (1/|Γ|) Σ_{(i,j)∈Γ} |M_ij − X_ij| ) / (r_max − r_min),

where X_ij is the prediction of the rating of joke (movie) j given by user i, and r_min, r_max are the lower and upper bounds of the ratings, respectively. For the Jester joke data sets we set r_min = −10 and r_max = 10. For the MovieLens data sets we set r_min = 1 and r_max = 5.
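The NMAE translates directly into code. The sketch below (illustrative only) assumes the observed positions in Γ are given as two index arrays:

import numpy as np

def nmae(M, X, rows, cols, r_min, r_max):
    # (1/|Gamma|) * sum over (i,j) in Gamma of |M_ij - X_ij|, divided by (r_max - r_min)
    return np.abs(M[rows, cols] - X[rows, cols]).mean() / (r_max - r_min)

# Example with the Jester rating bounds: err = nmae(M, X_hat, rows, cols, -10.0, 10.0)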

Since |Γ| is large, we randomly select a subset Ω from Γ and then use Algorithm 8 to solve problem (2.8). We randomly select 10 ratings for each user in the Jester joke data sets, while we randomly choose about 50% of the ratings for each user in the MovieLens data sets. Table 4.5 lists the parameter settings used in the algorithms. The maximum iteration number is 100 in IALM, and all other parameters are the same as those used in [45].

The numerical results are included in Table 4.6. We observe that IALM_FFSRQR achieves almost the same recoverability as the other methods (except IALM_LTSVD) and is slightly faster than IALM_RSISVD for these two data sets.

TABLE 4.5. Parameters used in the IALM method on matrix completion.

Data set             m      n     |Γ|        |Ω|
jester-1             24983  100   1.81e+06   249830
jester-2             23500  100   1.71e+06   235000
jester-3             24938  100   6.17e+05   249384
jester-all           73421  100   4.14e+06   734210
movie-100K           943    1682  1.00e+05   49918
movie-1M             6040   3706  1.00e+06   498742
movie-latest-small   671    9066  1.00e+05   52551

5. Conclusions. We have presented the Flip-Flop SRQR factorization, a variant of the QLP factorization, to compute low-rank matrix approximations. The Flip-Flop SRQR algorithm uses an SRQR factorization to initialize the truncated version of a column-pivoted QR factorization and then forms an LQ factorization. For the numerical results presented, the errors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms. This new algorithm is cheaper to compute and produces quality low-rank matrix approximations. Furthermore, we prove singular value lower bounds and residual error upper bounds for the Flip-Flop SRQR factorization. In situations where the singular values of the input matrix decay relatively quickly, the low-rank approximation computed by Flip-Flop SRQR is guaranteed to be as accurate as the truncated SVD. We also perform a complexity analysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomized subspace iteration. Future work includes reducing the overhead cost in Flip-Flop SRQR and implementing the Flip-Flop SRQR algorithm on distributed memory machines for popular applications such as distributed PCA.

Acknowledgments. The authors would like to thank Yousef Saad, the editor, and three anonymous referees, who kindly reviewed the earlier version of this manuscript and provided valuable suggestions and comments.

Appendix A. Partial QR factorization with column pivoting. The details of partial QRCP are covered in Algorithm 4.

Appendix B. TUXV algorithm. The details of the TUXV algorithm are presented in Algorithm 5.

Appendix C. Computing Tucker decompositions of higher-order tensors. Algorithm 6 computes the Tucker decomposition of a higher-order tensor, i.e., the sequentially truncated higher-order SVD.


TABLE 4.6. Comparison on matrix completion.

Data set             Method         Iter  Time      NMAE      sv   σ_max      σ_min
jester-1             IALM-LANSVD    12    7.06e+00  1.84e-01  100  2.14e+03   1.00e+00
                     IALM-FFSRQR    12    3.44e+00  1.69e-01  100  2.28e+03   1.00e+00
                     IALM-RSISVD    12    3.75e+00  1.89e-01  100  2.12e+03   1.00e+00
                     IALM-LTSVD     100   2.11e+01  1.74e-01  62   3.00e+03   1.00e+00
jester-2             IALM-LANSVD    12    6.80e+00  1.85e-01  100  2.13e+03   1.00e+00
                     IALM-FFSRQR    12    2.79e+00  1.70e-01  100  2.29e+03   1.00e+00
                     IALM-RSISVD    12    3.59e+00  1.91e-01  100  2.12e+03   1.00e+00
                     IALM-LTSVD     100   2.03e+01  1.75e-01  58   2.96e+03   1.00e+00
jester-3             IALM-LANSVD    12    7.05e+00  1.26e-01  99   1.79e+03   1.00e+00
                     IALM-FFSRQR    12    3.03e+00  1.22e-01  100  1.71e+03   1.00e+00
                     IALM-RSISVD    12    3.85e+00  1.31e-01  100  1.78e+03   1.00e+00
                     IALM-LTSVD     100   2.12e+01  1.33e-01  55   2.50e+03   1.00e+00
jester-all           IALM-LANSVD    12    2.39e+01  1.72e-01  100  3.56e+03   1.00e+00
                     IALM-FFSRQR    12    1.12e+01  1.62e-01  100  3.63e+03   1.00e+00
                     IALM-RSISVD    12    1.34e+01  1.82e-01  100  3.47e+03   1.00e+00
                     IALM-LTSVD     100   6.99e+01  1.68e-01  52   4.92e+03   1.00e+00
movie-100K           IALM-LANSVD    29    2.86e+01  1.83e-01  285  1.21e+03   1.00e+00
                     IALM-FFSRQR    30    4.55e+00  1.67e-01  295  1.53e+03   1.00e+00
                     IALM-RSISVD    29    4.82e+00  1.82e-01  285  1.29e+03   1.00e+00
                     IALM-LTSVD     48    1.42e+01  1.47e-01  475  1.91e+03   1.00e+00
movie-1M             IALM-LANSVD    50    7.40e+02  1.58e-01  495  4.99e+03   1.00e+00
                     IALM-FFSRQR    53    2.07e+02  1.37e-01  525  6.63e+03   1.00e+00
                     IALM-RSISVD    50    2.23e+02  1.57e-01  495  5.35e+03   1.00e+00
                     IALM-LTSVD     100   8.50e+02  1.17e-01  995  8.97e+03   1.00e+00
movie-latest-small   IALM-LANSVD    31    1.66e+02  1.85e-01  305  1.13e+03   1.00e+00
                     IALM-FFSRQR    31    1.96e+01  2.00e-01  305  1.42e+03   1.00e+00
                     IALM-RSISVD    31    2.85e+01  1.91e-01  305  1.20e+03   1.00e+00
                     IALM-LTSVD     63    4.02e+01  2.08e-01  298  1.79e+03   1.00e+00

Appendix D. IALM for solving two special nuclear norm minimization problems. Algorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems, respectively. In Algorithm 7, S_ω(x) = sgn(x) · max(|x| − ω, 0) is the soft shrinkage operator [29] with x ∈ R^n and ω > 0. In Algorithm 8, Ω̄ is the complement of Ω.
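The soft shrinkage operator has a one-line vectorized form (a sketch, applied elementwise):

import numpy as np

def soft_shrink(x, omega):
    # S_omega(x) = sgn(x) * max(|x| - omega, 0)
    return np.sign(x) * np.maximum(np.abs(x) - omega, 0.0)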

Appendix E. Approximate SVD with randomized subspace iteration. The randomized subspace iteration was proposed in [30, Algorithm 4.4] to compute an orthonormal matrix whose range approximates the range of A. An approximate SVD can be computed using this orthonormal matrix [30, Algorithm 5.1]. A randomized subspace iteration is used in the routine MLSVD_RSI in the Matlab toolbox tensorlab [73], and MLSVD_RSI is by far the most efficient function that we can find in Matlab to compute the ST-HOSVD. We summarize the approximate SVD with randomized subspace iteration in the pseudocode of Algorithm 9.

Now we perform a complexity analysis for the approximate SVD with randomized subspace iteration. We first note that:
  1. The cost of generating a random matrix is negligible.
  2. The cost of computing B = AΩ is 2mn(k+p).


Algorithm 4 Partial QRCP
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k.
Outputs:
  Orthogonal matrix Q ∈ R^{m×m}.
  Upper trapezoidal matrix R ∈ R^{m×n}.
  Permutation matrix Π ∈ R^{n×n} such that AΠ = QR.
Algorithm:
  Initialize Π = I_n. Compute column norms r_s = ‖A(1:m, s)‖_2 (1 ≤ s ≤ n).
  for j = 1 : k do
    Find i = argmax_{j≤s≤n} r_s. Exchange r_j and r_i; exchange columns j and i in A and Π.
    Form the Householder reflection Q_j from A(j:m, j).
    Update the trailing matrix: A(j:m, j:n) ← Q_j^T A(j:m, j:n).
    Update r_s = ‖A(j+1:m, s)‖_2 (j+1 ≤ s ≤ n).
  end for
  Q = Q_1 Q_2 ··· Q_k is the product of all reflections. R = upper trapezoidal part of A.
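A direct, unblocked translation of Algorithm 4 might look as follows (an illustrative sketch; in practice one would call LAPACK's xGEQP3, and the column norms are recomputed here rather than downdated, for simplicity):

import numpy as np

def partial_qrcp(A, k):
    # Returns Q (m x m), the transformed matrix R = Q^T A Pi, and the permutation perm,
    # with the first k columns of R upper triangular and A[:, perm] = Q @ R.
    A = np.array(A, dtype=float, copy=True)
    m, n = A.shape
    Q = np.eye(m)
    perm = np.arange(n)
    norms = np.linalg.norm(A, axis=0) ** 2
    for j in range(k):
        i = j + np.argmax(norms[j:])                    # pivot column
        A[:, [j, i]] = A[:, [i, j]]
        perm[[j, i]] = perm[[i, j]]
        norms[[j, i]] = norms[[i, j]]
        x = A[j:, j]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])    # Householder vector
        nv = np.linalg.norm(v)
        if nv > 0:
            v /= nv
            A[j:, j:] -= 2.0 * np.outer(v, v @ A[j:, j:])
            Q[:, j:] -= 2.0 * np.outer(Q[:, j:] @ v, v)
        norms[j + 1:] = np.linalg.norm(A[j + 1:, j + 1:], axis=0) ** 2
    return Q, A, perm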

Algorithm 5 TUXV Algorithm
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k. Block size b. Oversampling size p ≥ 0.
Outputs:
  Column-orthonormal matrices U ∈ R^{m×k}, V ∈ R^{n×k} and an upper triangular matrix R ∈ R^{k×k} such that A ≈ U R V^T.
Algorithm:
  Do TRQRCP on A to obtain Q ∈ R^{m×k}, R ∈ R^{k×n}, and Π ∈ R^{n×n}.
  R = R Π^T, and do an LQ factorization, i.e., [V, R] = qr(R^T, 0).
  Compute Z = A V and do a QR factorization, i.e., [U, R] = qr(Z, 0).

  3. In each QR step [Q, ~] = qr(B, 0), the cost of computing the QR factorization of B is 2m(k+p)^2 − (2/3)(k+p)^3 (cf. [69]), and the cost of forming the first k+p columns of the full Q matrix is m(k+p)^2 + (1/3)(k+p)^3.

Now we count the flops for each i in the for loop:
  1. The cost of computing B = A^T * Q is 2mn(k+p).
  2. The cost of computing [Q, ~] = qr(B, 0) is 2n(k+p)^2 − (2/3)(k+p)^3, and the cost of forming the first k+p columns of the full Q matrix is n(k+p)^2 + (1/3)(k+p)^3.
  3. The cost of computing B = A * Q is 2mn(k+p).
  4. The cost of computing [Q, ~] = qr(B, 0) is 2m(k+p)^2 − (2/3)(k+p)^3, and the cost of forming the first k+p columns of the full Q matrix is m(k+p)^2 + (1/3)(k+p)^3.

Together, the cost of running the for loop q times is

    q (4mn(k+p) + 3(m+n)(k+p)^2 − (2/3)(k+p)^3).

Additionally, the cost of computing B = Q^T * A is 2mn(k+p), the cost of the SVD of B is O(n(k+p)^2), and the cost of computing U = Q * Ũ is 2m(k+p)^2.


Algorithm 6 ST-HOSVD
Inputs:
  Tensor X ∈ R^{I_1×···×I_d}, desired rank (k_1, ..., k_d), and processing order p = (p_1, ···, p_d).
Outputs:
  Tucker decomposition [G; U_1, ···, U_d].
Algorithm:
  Define tensor G ← X.
  for j = 1 : d do
    r = p_j.
    Compute the exact or approximate rank-k_r SVD of the tensor unfolding: G_(r) ≈ Ũ_r Σ̃_r Ṽ_r^T.
    U_r ← Ũ_r.
    Update G_(r) ← Σ̃_r Ṽ_r^T, i.e., apply Ũ_r^T to G.
  end for
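A compact NumPy version of Algorithm 6, using exact truncated SVDs of the unfoldings, might look like the following sketch (illustrative only; MLSVD_FFSRQR or MLSVD_RSI would replace the dense SVD call with an approximate SVD):

import numpy as np

def st_hosvd(X, ranks, order=None):
    # Sequentially truncated HOSVD: returns the core G and the factor list U.
    d = X.ndim
    order = list(range(d)) if order is None else list(order)
    G = X.copy()
    U = [None] * d
    for r in order:
        Gr = np.moveaxis(G, r, 0).reshape(G.shape[r], -1)    # mode-r unfolding
        Ur, s, Vt = np.linalg.svd(Gr, full_matrices=False)   # exact (or approximate) SVD
        kr = ranks[r]
        U[r] = Ur[:, :kr]
        core_unf = np.diag(s[:kr]) @ Vt[:kr, :]              # equals U_r^T applied to G_(r)
        shape = (kr,) + np.moveaxis(G, r, 0).shape[1:]
        G = np.moveaxis(core_unf.reshape(shape), 0, r)
    return G, U

# Example: rank-(20, 20, 20) Tucker decomposition of a random 50 x 60 x 70 tensor.
G, U = st_hosvd(np.random.default_rng(0).standard_normal((50, 60, 70)), ranks=(20, 20, 20))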

Algorithm 7 Robust PCA Using IALM
Inputs:
  Measured matrix M ∈ R^{m×n}, positive number λ, μ_0, μ̄, tolerance tol, ρ > 1.
Outputs:
  Matrix pair (X_k, E_k).
Algorithm:
  k = 0; J(M) = max(‖M‖_2, λ^{−1}‖M‖_F); Y_0 = M / J(M); E_0 = 0.
  while not converged do
    (U, Σ, V) = svd(M − E_k + μ_k^{−1} Y_k).
    X_{k+1} = U S_{μ_k^{−1}}(Σ) V^T.
    E_{k+1} = S_{λ μ_k^{−1}}(M − X_{k+1} + μ_k^{−1} Y_k).
    Y_{k+1} = Y_k + μ_k (M − X_{k+1} − E_{k+1}).
    Update μ_{k+1} = min(ρ μ_k, μ̄).
    k = k + 1.
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then
      Break
    end if
  end while
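A direct NumPy transcription of Algorithm 7 is sketched below (illustrative only; the parameter values are left to the caller, and in practice a truncated or approximate SVD such as FFSRQR would replace the dense np.linalg.svd call):

import numpy as np

def soft(x, w):
    return np.sign(x) * np.maximum(np.abs(x) - w, 0.0)

def ialm_rpca(M, lam, mu, mu_bar, rho=1.5, tol=1e-7, max_iter=100):
    J = max(np.linalg.norm(M, 2), np.linalg.norm(M, 'fro') / lam)
    Y = M / J
    X = np.zeros_like(M)
    E = np.zeros_like(M)
    for _ in range(max_iter):
        U, sig, Vt = np.linalg.svd(M - E + Y / mu, full_matrices=False)
        X = (U * soft(sig, 1.0 / mu)) @ Vt            # X = U S_{1/mu}(Sigma) V^T
        E = soft(M - X + Y / mu, lam / mu)            # sparse update via soft shrinkage
        Y = Y + mu * (M - X - E)
        mu = min(rho * mu, mu_bar)
        if np.linalg.norm(M - X - E, 'fro') / np.linalg.norm(M, 'fro') < tol:
            break
    return X, E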

Now assume that k + p ≪ min(m, n) and omit the lower-order terms; we then arrive at (4q+4)mn(k+p) as the complexity of the approximate SVD with randomized subspace iteration. In practice, q is usually chosen to be an integer between 0 and 2.

REFERENCES

[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106

[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZ A GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Users' Guide 3rd ed SIAM Philadelphia 1999

[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103

[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595

[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci


Algorithm 8 Matrix Completion Using IALM
Inputs:
  Sampled set Ω, sampled entries π_Ω(M), positive number λ, μ_0, μ̄, tolerance tol, ρ > 1.
Outputs:
  Matrix pair (X_k, E_k).
Algorithm:
  k = 0; Y_0 = 0; E_0 = 0.
  while not converged do
    (U, Σ, V) = svd(M − E_k + μ_k^{−1} Y_k).
    X_{k+1} = U S_{μ_k^{−1}}(Σ) V^T.
    E_{k+1} = π_{Ω̄}(M − X_{k+1} + μ_k^{−1} Y_k).
    Y_{k+1} = Y_k + μ_k (M − X_{k+1} − E_{k+1}).
    Update μ_{k+1} = min(ρ μ_k, μ̄).
    k = k + 1.
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then
      Break
    end if
  end while

Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-

holder transformations Numer Math 7 (1965) pp 269–276
[7] J-F CAI E J CANDÈS AND Z SHEN A singular value thresholding algorithm for matrix completion SIAM J Optim 20 (2010) pp 1956–1982
[8] E J CANDÈS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011) Art 11 37 pages
[9] E J CANDÈS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9 (2009) pp 717–772
[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an N-way generalization of “Eckart-Young” decomposition Psychometrika 35 (1970) pp 283–319
[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38

(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in

Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16

[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278

[14] On the best rank-1 and rank-(R1, R2, · · · , RN) approximation of higher-order tensors SIAM J Matrix Anal Appl 21 (2000) pp 1324–1342

[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55

[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183

[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291

[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218

[19] L ELDÉN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007
[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to

Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162

[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977

[22] C FRANK AND E NÖTH Optimizing eigenfaces by face masks for facial expression recognition in Computer Analysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes in Comput Sci Springer Berlin 2003 pp 646–654


Algorithm 9 Approximate SVD with Randomized Subspace Iteration
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k. Oversampling size p ≥ 0. Number of iterations q ≥ 1.
Outputs:
  U ∈ R^{m×k} contains the approximate top k left singular vectors of A.
  Σ ∈ R^{k×k} contains the approximate top k singular values of A.
  V ∈ R^{n×k} contains the approximate top k right singular vectors of A.
Algorithm:
  Generate an i.i.d. Gaussian matrix Ω ∈ N(0,1)^{n×(k+p)}.
  Compute B = AΩ.
  [Q, ~] = qr(B, 0).
  for i = 1 : q do
    B = A^T * Q; [Q, ~] = qr(B, 0).
    B = A * Q; [Q, ~] = qr(B, 0).
  end for
  B = Q^T * A.
  [Ū, Σ, V] = svd(B).
  U = Q * Ū; U = U(:, 1:k); Σ = Σ(1:k, 1:k); V = V(:, 1:k).
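For reference, Algorithm 9 translates almost line for line into NumPy (an illustrative sketch):

import numpy as np

def rsi_svd(A, k, p=5, q=1, rng=None):
    # Approximate rank-k SVD of A via randomized subspace iteration.
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))      # i.i.d. Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)               # B = A*Omega, economy-size QR
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A
    Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], S[:k], Vt[:k, :].T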

[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480

[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151

[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press

Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)

pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization

SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and

convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic

algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217–288
[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an “explanatory”

multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing

collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237

[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the

SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)

pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in

Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493

[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash

500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at

httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of

Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document

recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale

Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted

low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic

applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and

outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to

system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-

tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra

PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite

linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-

structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE

Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding

for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493

[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494

[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501

[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719

[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124

[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249

[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003

[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57

[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799

[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388

[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482

[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336

[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348

ETNAKent State University and

Johann Radon Institute (RICAM)

494 Y FENG J XIAO AND M GU

[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603

[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640

[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash

311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-

order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in

European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460

[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet

[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298

[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242

[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550



Proof. Since $XX^T = X_1X_1^T + X_2X_2^T$, we obtain the above result using [33, Theorem 3.3.16].

We are now ready to derive bounds for the singular values and the approximation error of Flip-Flop SRQR. We need to emphasize that even if the target rank is $k$, we run Flip-Flop SRQR with an actual target rank $l$ which is a little bit larger than $k$. The difference between $k$ and $l$ can create a gap between the singular values of $A$ so that we can obtain a reliable low-rank approximation.

THEOREM 3.2. Given a matrix $A \in \mathbb{R}^{m\times n}$, a target rank $k$, an oversampling size $p$, and an actual target rank $l \ge k$, the matrices $U_k$, $\Sigma_k$, $V_k$ in (3.4) computed by Flip-Flop SRQR satisfy
$$\sigma_j(\Sigma_k) \;\ge\; \frac{\sigma_j(A)}{\sqrt[4]{1 + 2\|R_{22}\|_2^4/\sigma_j^4(\Sigma_k)}},\qquad 1\le j\le k,\tag{3.5}$$
and
$$\left\|A - U_k\Sigma_kV_k^T\right\|_2 \;\le\; \sigma_{k+1}(A)\,\sqrt[4]{1 + 2\left(\frac{\|R_{22}\|_2}{\sigma_{k+1}(A)}\right)^{4}},\tag{3.6}$$
where $R_{22}\in\mathbb{R}^{(m-l)\times(n-l)}$ is the trailing matrix in (3.1). Using the properties of SRQR, we furthermore have
$$\sigma_j(\Sigma_k) \;\ge\; \frac{\sigma_j(A)}{\sqrt[4]{1 + \min\left(2\widehat{\tau}^4,\;\tau^4\left(2+4\widehat{\tau}^4\right)\left(\frac{\sigma_{l+1}(A)}{\sigma_j(A)}\right)^{4}\right)}},\qquad 1\le j\le k,\tag{3.7}$$
and
$$\left\|A - U_k\Sigma_kV_k^T\right\|_2 \;\le\; \sigma_{k+1}(A)\,\sqrt[4]{1 + 2\tau^4\left(\frac{\sigma_{l+1}(A)}{\sigma_{k+1}(A)}\right)^{4}},\tag{3.8}$$
where $\tau$ and $\widehat{\tau}$, defined in (3.12), have matrix dimension-dependent upper bounds
$$\tau \le g_1g_2\sqrt{(l+1)(n-l)} \qquad\text{and}\qquad \widehat{\tau} \le g_1g_2\sqrt{l(n-l)},$$
where $g_1 \le \sqrt{\frac{1+\varepsilon}{1-\varepsilon}}$ and $g_2 \le g$ with high probability. The parameters $\varepsilon > 0$ and $g > 1$ are user-defined.

Proof. In terms of the singular value bounds, observing that
$$\begin{bmatrix} R_{11} & R_{12}\\ & R_{22}\end{bmatrix}\begin{bmatrix} R_{11} & R_{12}\\ & R_{22}\end{bmatrix}^T
= \begin{bmatrix} R_{11}R_{11}^T + R_{12}R_{12}^T & R_{12}R_{22}^T\\ R_{22}R_{12}^T & R_{22}R_{22}^T\end{bmatrix},$$


we apply Lemma 3.1 twice: for any $1\le j\le k$,
$$\begin{aligned}
\sigma_j^2\left(\begin{bmatrix} R_{11} & R_{12}\\ & R_{22}\end{bmatrix}\begin{bmatrix} R_{11} & R_{12}\\ & R_{22}\end{bmatrix}^T\right)
&\le \sigma_j^2\left(\begin{bmatrix} R_{11}R_{11}^T + R_{12}R_{12}^T & R_{12}R_{22}^T\end{bmatrix}\right) + \left\|\begin{bmatrix} R_{22}R_{12}^T & R_{22}R_{22}^T\end{bmatrix}\right\|_2^2\\
&\le \sigma_j^2\left(R_{11}R_{11}^T + R_{12}R_{12}^T\right) + \left\|R_{12}R_{22}^T\right\|_2^2 + \left\|\begin{bmatrix} R_{22}R_{12}^T & R_{22}R_{22}^T\end{bmatrix}\right\|_2^2\\
&\le \sigma_j^2\left(R_{11}R_{11}^T + R_{12}R_{12}^T\right) + 2\left\|\begin{bmatrix} R_{12}\\ R_{22}\end{bmatrix}\right\|_2^4\\
&= \sigma_j^2\left(R_{11}R_{11}^T + R_{12}R_{12}^T\right) + 2\left\|R_{22}\right\|_2^4.
\end{aligned}\tag{3.9}$$

The relation (3.9) can be further rewritten as
$$\sigma_j^4(A) \le \sigma_j^4\left(\begin{bmatrix} R_{11} & R_{12}\end{bmatrix}\right) + 2\|R_{22}\|_2^4 = \sigma_j^4(\Sigma_k) + 2\|R_{22}\|_2^4,\qquad 1\le j\le k,$$
which is equivalent to
$$\sigma_j(\Sigma_k) \ge \frac{\sigma_j(A)}{\sqrt[4]{1 + 2\|R_{22}\|_2^4/\sigma_j^4(\Sigma_k)}}.$$

For the residual matrix bound, we let
$$\begin{bmatrix} R_{11} & R_{12}\end{bmatrix} \;\stackrel{\mathrm{def}}{=}\; \begin{bmatrix} \overline{R}_{11} & \overline{R}_{12}\end{bmatrix} + \begin{bmatrix} \delta R_{11} & \delta R_{12}\end{bmatrix},$$
where $\begin{bmatrix} \overline{R}_{11} & \overline{R}_{12}\end{bmatrix}$ is the rank-$k$ truncated SVD of $\begin{bmatrix} R_{11} & R_{12}\end{bmatrix}$. Since
$$\left\|A - U_k\Sigma_kV_k^T\right\|_2 = \left\|A\Pi - Q\begin{bmatrix} \overline{R}_{11} & \overline{R}_{12}\\ 0 & 0\end{bmatrix}\right\|_2,\tag{3.10}$$
it follows from the orthogonality of singular vectors that
$$\begin{bmatrix} \overline{R}_{11} & \overline{R}_{12}\end{bmatrix}^T\begin{bmatrix} \delta R_{11} & \delta R_{12}\end{bmatrix} = 0,$$
and therefore
$$\begin{bmatrix} R_{11} & R_{12}\end{bmatrix}^T\begin{bmatrix} R_{11} & R_{12}\end{bmatrix} = \begin{bmatrix} \overline{R}_{11} & \overline{R}_{12}\end{bmatrix}^T\begin{bmatrix} \overline{R}_{11} & \overline{R}_{12}\end{bmatrix} + \begin{bmatrix} \delta R_{11} & \delta R_{12}\end{bmatrix}^T\begin{bmatrix} \delta R_{11} & \delta R_{12}\end{bmatrix},$$
which implies
$$R_{12}^TR_{12} = \overline{R}_{12}^T\overline{R}_{12} + \left(\delta R_{12}\right)^T\left(\delta R_{12}\right).\tag{3.11}$$

Similarly to the deduction of (3.9) from (3.11), we can derive
$$\left\|\begin{bmatrix} \delta R_{11} & \delta R_{12}\\ & R_{22}\end{bmatrix}\right\|_2^4
\le \left\|\begin{bmatrix} \delta R_{11} & \delta R_{12}\end{bmatrix}\right\|_2^4 + 2\left\|\begin{bmatrix} \delta R_{12}\\ R_{22}\end{bmatrix}\right\|_2^4
\le \sigma_{k+1}^4(A) + 2\left\|\begin{bmatrix} R_{12}\\ R_{22}\end{bmatrix}\right\|_2^4
= \sigma_{k+1}^4(A) + 2\|R_{22}\|_2^4.$$


Combining with (3.10), it now follows that
$$\left\|A - U_k\Sigma_kV_k^T\right\|_2
= \left\|A\Pi - Q\begin{bmatrix} \overline{R}_{11} & \overline{R}_{12}\\ 0 & 0\end{bmatrix}\right\|_2
= \left\|\begin{bmatrix} \delta R_{11} & \delta R_{12}\\ & R_{22}\end{bmatrix}\right\|_2
\le \sigma_{k+1}(A)\,\sqrt[4]{1 + 2\left(\frac{\|R_{22}\|_2}{\sigma_{k+1}(A)}\right)^{4}}.$$

To obtain an upper bound on $\|R_{22}\|_2$ in (3.5) and (3.6), we follow the analysis of SRQR in [75]. From the analysis of [75, Section IV], Algorithm 2 ensures that $g_1 \le \sqrt{\frac{1+\varepsilon}{1-\varepsilon}}$ and $g_2 \le g$ with high probability (the actual probability-guarantee formula can be found in [75, Section IV]), where $g_1$ and $g_2$ are defined by (2.6). Here $0 < \varepsilon < 1$ is a user-defined parameter to adjust the choice of the oversampling size $p$ used in the TRQRCP initialization part of SRQR, and $g > 1$ is a user-defined parameter in the extra swapping part of SRQR. Let

$$\tau \;\stackrel{\mathrm{def}}{=}\; g_1g_2\,\frac{\|R_{22}\|_2}{\|R_{22}\|_{1,2}}\,\frac{\left\|\widetilde{R}^{-T}\right\|_{1,2}^{-1}}{\sigma_{l+1}(A)}
\qquad\text{and}\qquad
\widehat{\tau} \;\stackrel{\mathrm{def}}{=}\; g_1g_2\,\frac{\|R_{22}\|_2}{\|R_{22}\|_{1,2}}\,\frac{\left\|\widehat{R}_{11}^{-T}\right\|_{1,2}^{-1}}{\sigma_k(\Sigma_k)},\tag{3.12}$$
where $\widetilde{R}$ is defined in equation (2.5). Since $\frac{1}{\sqrt{n}}\|X\|_{1,2} \le \|X\|_2 \le \sqrt{n}\,\|X\|_{1,2}$ and $\sigma_i(X_1) \le \sigma_i(X)$, $1\le i\le \min(s,t)$, for any matrix $X\in\mathbb{R}^{m\times n}$ and submatrix $X_1\in\mathbb{R}^{s\times t}$ of $X$, we obtain that

$$\tau = g_1g_2\,\frac{\|R_{22}\|_2}{\|R_{22}\|_{1,2}}\,\frac{\left\|\widetilde{R}^{-T}\right\|_{1,2}^{-1}}{\sigma_{l+1}\left(\widetilde{R}\right)}\,\frac{\sigma_{l+1}\left(\widetilde{R}\right)}{\sigma_{l+1}(A)} \;\le\; g_1g_2\sqrt{(l+1)(n-l)}.$$

Using the fact that $\sigma_l\left(\begin{bmatrix} R_{11} & R_{12}\end{bmatrix}\right) = \sigma_l\left(\widehat{R}_{11}\right)$ and $\sigma_k(\Sigma_k) = \sigma_k\left(\begin{bmatrix} R_{11} & R_{12}\end{bmatrix}\right)$ by (3.2) and (3.4),
$$\widehat{\tau} = g_1g_2\,\frac{\|R_{22}\|_2}{\|R_{22}\|_{1,2}}\,\frac{\left\|\widehat{R}_{11}^{-T}\right\|_{1,2}^{-1}}{\sigma_l\left(\widehat{R}_{11}\right)}\,\frac{\sigma_l\left(\widehat{R}_{11}\right)}{\sigma_l\left(\begin{bmatrix} R_{11} & R_{12}\end{bmatrix}\right)}\,\frac{\sigma_l\left(\begin{bmatrix} R_{11} & R_{12}\end{bmatrix}\right)}{\sigma_k(\Sigma_k)} \;\le\; g_1g_2\sqrt{l(n-l)}.$$

By the definition of $\tau$,
$$\|R_{22}\|_2 = \tau\,\sigma_{l+1}(A).\tag{3.13}$$
Plugging this into (3.6) yields (3.8). By the definition of $\widehat{\tau}$, we observe that
$$\|R_{22}\|_2 \le \widehat{\tau}\,\sigma_k(\Sigma_k).\tag{3.14}$$
By (3.5) and (3.14),
$$\sigma_j(\Sigma_k) \ge \frac{\sigma_j(A)}{\sqrt[4]{1 + 2\left(\frac{\|R_{22}\|_2}{\sigma_j(\Sigma_k)}\right)^4}}
\ge \frac{\sigma_j(A)}{\sqrt[4]{1 + 2\left(\frac{\|R_{22}\|_2}{\sigma_k(\Sigma_k)}\right)^4}}
\ge \frac{\sigma_j(A)}{\sqrt[4]{1 + 2\widehat{\tau}^4}},\qquad 1\le j\le k.\tag{3.15}$$


On the other hand, using (3.5),
$$\begin{aligned}
\sigma_j^4(A) &\le \sigma_j^4(\Sigma_k)\left(1 + 2\,\frac{\sigma_j^4(A)}{\sigma_j^4(\Sigma_k)}\,\frac{\|R_{22}\|_2^4}{\sigma_j^4(A)}\right)\\
&\le \sigma_j^4(\Sigma_k)\left(1 + 2\,\frac{\sigma_j^4(\Sigma_k) + 2\|R_{22}\|_2^4}{\sigma_j^4(\Sigma_k)}\,\frac{\|R_{22}\|_2^4}{\sigma_j^4(A)}\right)\\
&\le \sigma_j^4(\Sigma_k)\left(1 + 2\left(1 + 2\,\frac{\|R_{22}\|_2^4}{\sigma_k^4(\Sigma_k)}\right)\frac{\|R_{22}\|_2^4}{\sigma_j^4(A)}\right),
\end{aligned}$$
that is,
$$\sigma_j(\Sigma_k) \ge \frac{\sigma_j(A)}{\sqrt[4]{1 + \left(2 + 4\,\frac{\|R_{22}\|_2^4}{\sigma_k^4(\Sigma_k)}\right)\frac{\|R_{22}\|_2^4}{\sigma_j^4(A)}}}.$$
Plugging (3.13) and (3.14) into the above equation yields
$$\sigma_j(\Sigma_k) \ge \frac{\sigma_j(A)}{\sqrt[4]{1 + \tau^4\left(2 + 4\widehat{\tau}^4\right)\left(\frac{\sigma_{l+1}(A)}{\sigma_j(A)}\right)^{4}}},\qquad 1\le j\le k.\tag{3.16}$$

Combining (3.15) with (3.16), we arrive at (3.7). We note that (3.5) and (3.6) still hold true if we replace $k$ by $l$.

Inequality (3.7) shows that, with the definitions (3.12) of $\tau$ and $\widehat{\tau}$, Flip-Flop SRQR can reveal at least a dimension-dependent fraction of all the leading singular values of $A$, and indeed approximate them very accurately in case they decay relatively quickly. Moreover, (3.8) shows that Flip-Flop SRQR can compute a rank-$k$ approximation that is up to a factor of $\sqrt[4]{1 + 2\tau^4\left(\frac{\sigma_{l+1}(A)}{\sigma_{k+1}(A)}\right)^{4}}$ from optimal. In situations where the singular values of $A$ decay relatively quickly, our rank-$k$ approximation is about as accurate as the truncated SVD with a choice of $l$ such that
$$\frac{\sigma_{l+1}(A)}{\sigma_{k+1}(A)} = o(1).$$

REMARK 3.3. The "high probability" mentioned in Theorem 3.2 depends on the oversampling size $p$ but not on the block size $b$. We decided not to dive deeply into the probability analysis, since it would make the proof much longer and more tedious. Theoretically, the oversampling size $p$ required for a given high probability can be rather large; for example, to achieve a probability of 0.95, we would need to choose $p$ to be at least 500. In practice, however, we would never choose such a large $p$, since it would deteriorate the efficiency of the code. It turns out that a small $p$, say $5 \sim 10$, achieves good approximation results empirically. A detailed explanation of what high probability means here can be found in [75]. The choice of the block size $b$ does not affect the analysis or the error bounds; it is purely a matter of code performance, since only a suitable $b$ can take full advantage of the cache hierarchy and BLAS routines.

4. Numerical experiments. In this section, we demonstrate the effectiveness and efficiency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments. Firstly,


TABLE 4.1: Methods for approximate SVD.

  Method   Description
  LANSVD   Approximate SVD using Lanczos bidiagonalization with partial reorthogonalization [42].
           Its Fortran and Matlab implementations are from PROPACK [41].
  FFSRQR   Flip-Flop Spectrum-Revealing QR factorization.
           Its Matlab implementation uses Mex-wrapped BLAS and LAPACK routines [2].
  RSISVD   Approximate SVD with randomized subspace iteration [30, 27].
           Its Matlab implementation is from the tensorlab toolbox [73].
  TUXV     Algorithm 5 in the appendix.
  LTSVD    Linear Time SVD [16]. We use its Matlab implementation by Ma et al. [49].

we compare FFSRQR with other approximate SVD algorithms for a matrix approximation. Secondly, we compare FFSRQR with other methods on tensor approximation problems using the tensorlab toolbox [73]. Thirdly, we compare FFSRQR with other methods for the robust PCA problem and the matrix completion problem. All experiments are implemented on a laptop with a 2.9 GHz Intel Core i5 processor and 8 GB of RAM. The codes based on LAPACK [2] and PROPACK [41] in Section 4.1 are in Fortran. The codes in Sections 4.2 and 4.3 are in Matlab, except that FFSRQR is implemented in Fortran and wrapped by mex.

4.1. Approximate truncated SVD. In this section, we compare FFSRQR with other approximate SVD algorithms for low-rank matrix approximation. All tested methods are listed in Table 4.1. Since LTSVD has only a Matlab implementation, this method is only considered in Sections 4.2 and 4.3. The test matrices are:

Type 1: $A \in \mathbb{R}^{m\times n}$ [66] is defined by $A = UDV^T + 0.1\cdot d_{ss}\cdot E$, where $U\in\mathbb{R}^{m\times s}$ and $V\in\mathbb{R}^{n\times s}$ are column-orthonormal matrices and $D\in\mathbb{R}^{s\times s}$ is a diagonal matrix with $s$ geometrically decreasing diagonal entries from 1 to $10^{-3}$; $d_{ss}$ is the last diagonal entry of $D$, and $E\in\mathbb{R}^{m\times n}$ is a random matrix whose entries are independently sampled from a normal distribution $\mathcal{N}(0,1)$. In our numerical experiments, we test three different random matrices: the square matrix has size $10000\times 10000$, the short-fat matrix has size $1000\times 10000$, and the tall-skinny matrix has size $10000\times 1000$. (A construction sketch follows the list below.)

Type 2: $A\in\mathbb{R}^{4929\times 4929}$ is a real data matrix from the University of Florida sparse matrix collection [11]. Its corresponding file name is HB/GEMAT11.
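For reference, a minimal NumPy sketch of the Type 1 construction above might look as follows; the inner dimension `s` is not specified in the text and is treated here as a free parameter, so this is an illustrative assumption rather than the authors' exact test generator.

```python
import numpy as np

def type1_matrix(m, n, s=100, rng=None):
    """Sketch of a Type 1 test matrix: A = U D V^T + 0.1 * d_ss * E,
    with s geometrically decreasing diagonal entries from 1 to 1e-3."""
    rng = np.random.default_rng() if rng is None else rng
    U, _ = np.linalg.qr(rng.standard_normal((m, s)))   # column-orthonormal U
    V, _ = np.linalg.qr(rng.standard_normal((n, s)))   # column-orthonormal V
    d = np.geomspace(1.0, 1e-3, s)                     # geometric decay of D
    E = rng.standard_normal((m, n))
    return (U * d) @ V.T + 0.1 * d[-1] * E
```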

For a given matrix $A\in\mathbb{R}^{m\times n}$, the relative SVD approximation error is measured by $\|A - U_k\Sigma_kV_k^T\|_F/\|A\|_F$, where $\Sigma_k$ contains the approximate top $k$ singular values and $U_k$, $V_k$ are the corresponding approximate top $k$ singular vectors. The parameters used in FFSRQR, RSISVD, TUXV, and LTSVD are listed in Table 4.2. The additional time required to compute $g_2$ is negligible. In our experiments, $g_2$ always remains modest and never triggers subsequent SRQR column swaps. However, computing $g_2$ is still recommended, since $g_2$ serves as an insurance policy against potential quality degradation by TRQRCP. The block size $b$ is chosen to be 32 according to the xGEQP3 LAPACK routine. When $k$ is less than 32, we implement an unblocked version of the partial QRCP.
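A small helper for the error measure just described (not part of the compared codes, only a convenience for reproducing the metric):

```python
import numpy as np

def relative_svd_error(A, Uk, sk, Vk):
    """Relative approximation error ||A - Uk diag(sk) Vk^T||_F / ||A||_F."""
    return np.linalg.norm(A - (Uk * sk) @ Vk.T, 'fro') / np.linalg.norm(A, 'fro')
```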

Figures 4.1-4.3 display the run time, the relative approximation error, and the top 20 singular values comparison, respectively, for four different matrices. In terms of speed, FFSRQR is always faster than LANSVD and RSISVD. Moreover, the efficiency difference is more


TABLE 4.2: Parameters used in RSISVD, FFSRQR, TUXV, and LTSVD.

  Method   Parameters
  FFSRQR   oversampling size p = 5, block size b = min{32, k}, l = k, d = 10, g = 2.0
  RSISVD   oversampling size p = 5, subspace iteration q = 1
  TUXV     oversampling size p = 5, block size b = min{32, k}, l = k
  LTSVD    probabilities p_i = 1/n

obvious when $k$ is relatively large. FFSRQR is only slightly slower than TUXV. In terms of the relative approximation error, FFSRQR is comparable to the other three methods. In terms of the top singular values approximation, Figure 4.3 displays the top 20 singular values when $k = 500$ for four different matrices. In Figure 4.3, FFSRQR is superior to TUXV, as we take the absolute values of the main diagonal entries of the upper triangular matrix $R$ as the approximate singular values for TUXV.

[Figure 4.1 (plots not reproduced): Run time (sec) versus target rank k for FFSRQR, RSISVD, LANSVD, and TUXV. Panels: (a) Type 1, random square matrix; (b) Type 1, random short-fat matrix; (c) Type 1, random tall-skinny matrix; (d) Type 2, GEMAT11.]

4.2. Tensor approximation. This section illustrates the effectiveness and efficiency of FFSRQR for computing approximate tensors. ST-HOSVD [3, 71] is one of the most efficient


[Figure 4.2 (plots not reproduced): Relative approximation error versus target rank k for FFSRQR, RSISVD, LANSVD, and TUXV. Panels: (a) Type 1, random square matrix; (b) Type 1, random short-fat matrix; (c) Type 1, random tall-skinny matrix; (d) Type 2, GEMAT11.]

algorithms to compute a Tucker decomposition of tensors, and the most costly part of this algorithm is to compute an SVD or approximate SVD of the tensor unfoldings. The truncated SVD and RSISVD are used in the routines MLSVD and MLSVD_RSI, respectively, in the Matlab tensorlab toolbox [73]. Based on this Matlab toolbox, we implement ST-HOSVD using FFSRQR or LTSVD to do the SVD approximation. We name these two new routines MLSVD_FFSRQR and MLSVD_LTSVD, respectively, and compare these four routines in the numerical experiment. We also have Python codes for this tensor approximation experiment; we do not list the results of the Python programs here, as they are similar to those for Matlab.

4.2.1. A sparse tensor example. We test a sparse tensor $\mathcal{X}\in\mathbb{R}^{n\times n\times n}$ of the following format [64, 59]:
$$\mathcal{X} = \sum_{j=1}^{10}\frac{1000}{j}\,x_j\circ y_j\circ z_j \;+\; \sum_{j=11}^{n}\frac{1}{j}\,x_j\circ y_j\circ z_j,$$
where $x_j, y_j, z_j \in \mathbb{R}^n$ are sparse vectors with nonnegative entries. The symbol "$\circ$" represents the vector outer product. We compute a rank-$(k,k,k)$ Tucker decomposition $[\mathcal{G}; U_1, U_2, U_3]$ using MLSVD, MLSVD_FFSRQR, MLSVD_RSI, and MLSVD_LTSVD, respectively. The relative approximation error is measured by $\|\mathcal{X} - \mathcal{X}_k\|_F/\|\mathcal{X}\|_F$, where $\mathcal{X}_k = \mathcal{G}\times_1 U_1\times_2 U_2\times_3 U_3$. A construction sketch of this test tensor is given below.
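The following NumPy sketch builds a tensor of the form above. The sparsity pattern and density of the vectors $x_j, y_j, z_j$ are not specified in the text, so the `density` parameter is an assumption; dense storage is used here only for clarity, so a modest `n` should be used with this sketch.

```python
import numpy as np

def sparse_test_tensor(n, density=0.05, rng=None):
    """Sketch of X = sum_{j<=10} (1000/j) x_j o y_j o z_j + sum_{j>10} (1/j) x_j o y_j o z_j,
    with sparse nonnegative x_j, y_j, z_j (density is an assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.zeros((n, n, n))
    for j in range(1, n + 1):
        w = 1000.0 / j if j <= 10 else 1.0 / j
        x, y, z = (rng.random(n) * (rng.random(n) < density) for _ in range(3))
        X += w * np.einsum('i,j,k->ijk', x, y, z)     # weighted outer product
    return X
```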


[Figure 4.3 (plots not reproduced): Top 20 singular values comparison for FFSRQR, RSISVD, LANSVD, and TUXV at k = 500. Panels: (a) Type 1, random square matrix; (b) Type 1, random short-fat matrix; (c) Type 1, random tall-skinny matrix; (d) Type 2, GEMAT11.]

Figure 4.4 displays a comparison of the efficiency and accuracy of the different methods for a $400\times 400\times 400$ sparse tensor approximation problem. MLSVD_LTSVD is the fastest but the least accurate one. The other three methods have similar accuracy, while MLSVD_FFSRQR is faster when the target rank $k$ is larger.

4.2.2. Handwritten digits classification. MNIST is a handwritten digits image data set created by Yann LeCun [43]. Every digit is represented by a $28\times 28$ pixel image. Handwritten digits classification has the objective of training a classification model to classify new unlabeled images. A HOSVD algorithm is proposed by Savas and Eldén [60] to classify handwritten digits. To reduce the training time, a more efficient ST-HOSVD algorithm is introduced in [71].

We do handwritten digits classification using MNIST, which consists of 60,000 training images and 10,000 test images. The number of training images in each class is restricted to 5421 so that the training set is equally distributed over all classes. The training set is represented by a tensor $\mathcal{X}$ of size $786\times 5421\times 10$. The classification relies on Algorithm 2 in [60]. We use various algorithms to obtain an approximation $\mathcal{X}\approx\mathcal{G}\times_1 U_1\times_2 U_2\times_3 U_3$, where the core tensor $\mathcal{G}$ has size $65\times 142\times 10$.

where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method

MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive


[Figure 4.4 (plots not reproduced): Run time (sec) and relative approximation error versus target rank k on a sparse tensor, for MLSVD, MLSVD_FFSRQR, MLSVD_RSI, and MLSVD_LTSVD.]

TABLE 4.3: Comparison on handwritten digits classification.

                            MLSVD     MLSVD_FFSRQR   MLSVD_RSI   MLSVD_LTSVD
  Training time [sec]       27.2121   1.5455         1.9343      0.4266
  Relative model error      0.4099    0.4273         0.4247      0.5162
  Classification accuracy   95.19%    94.98%         95.05%      92.59%

one, and MLSVD_LTSVD is the fastest one. In terms of classification quality, MLSVD, MLSVD_FFSRQR, and MLSVD_RSI are comparable, while MLSVD_LTSVD is the least accurate one.

4.3. Solving nuclear norm minimization problems. To show the effectiveness of the FFSRQR algorithm for nuclear norm minimization problems, we investigate two scenarios: the robust PCA problem (2.7) and the matrix completion problem (2.8). The test matrix used for the robust PCA is introduced in [45], and the test matrices used in the matrix completion are two real data sets. We use the IALM method [45] to solve both problems; the IALM code can be downloaded¹.

4.3.1. Robust PCA. To solve the robust PCA problem, we replace the approximate SVD part in the IALM method [45] by various methods. We denote the actual solution to the robust PCA problem by a matrix pair $(X^*, E^*)\in\mathbb{R}^{m\times n}\times\mathbb{R}^{m\times n}$. The matrix $X^*$ is $X^* = X_LX_R^T$, with $X_L\in\mathbb{R}^{m\times k}$ and $X_R\in\mathbb{R}^{n\times k}$ being random matrices whose entries are independently sampled from a normal distribution. The sparse matrix $E^*$ is a random matrix where its non-zero entries are independently sampled from a uniform distribution over the interval $[-500, 500]$. The input to the IALM algorithm has the form $M = X^* + E^*$, and the output is denoted by $(\widehat{X}, \widehat{E})$. In this numerical experiment, we use the same parameter settings as the IALM code for robust PCA: the rank $k$ is $0.1m$, and the number of non-zero entries in $E$ is $0.05m^2$. We choose the trade-off parameter $\lambda = 1/\sqrt{\max(m,n)}$, as suggested by Candès et al. [8]. The solution quality is measured by the normalized root mean square error $\|\widehat{X} - X^*\|_F/\|X^*\|_F$. A sketch of this test-problem construction is given below.
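A minimal NumPy sketch of the synthetic instance described above (the particular random seeds and any further details of the IALM test script are not specified here, so this is an illustrative construction only):

```python
import numpy as np

def rpca_test_problem(m, n, rng=None):
    """Sketch of the robust PCA test instance: X* = X_L X_R^T with rank k = 0.1*m,
    plus a sparse E* with 0.05*m^2 nonzeros drawn uniformly from [-500, 500]."""
    rng = np.random.default_rng() if rng is None else rng
    k = int(0.1 * m)
    X_star = rng.standard_normal((m, k)) @ rng.standard_normal((n, k)).T
    E_flat = np.zeros(m * n)
    nnz = int(0.05 * m * m)
    idx = rng.choice(m * n, size=nnz, replace=False)   # positions of the sparse outliers
    E_flat[idx] = rng.uniform(-500.0, 500.0, size=nnz)
    E_star = E_flat.reshape(m, n)
    return X_star + E_star, X_star, E_star              # M = X* + E*
```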

Table 4.4 includes the relative error, the run time, the number of non-zero entries in $\widehat{E}$ ($\|\widehat{E}\|_0$), the iteration count, and the number of non-zero singular values (#sv) in $\widehat{X}$ of the IALM

¹ http://perception.csl.illinois.edu/matrix-rank/sample_code.html


algorithm using different approximate SVD methods. We observe that IALM_FFSRQR is faster than all the other three methods, while its error is comparable to IALM_LANSVD and IALM_RSISVD. IALM_LTSVD is relatively slow and not effective.

TABLE 4.4: Comparison on robust PCA.

  Size         Method        Error     Time (sec)  ||E||_0    Iter  #sv
  1000 x 1000  IALM_LANSVD   3.33e-07  5.79e+00    50000      22    100
               IALM_FFSRQR   2.79e-07  1.02e+00    50000      25    100
               IALM_RSISVD   3.36e-07  1.09e+00    49999      22    100
               IALM_LTSVD    9.92e-02  3.11e+00    999715     100   100
  2000 x 2000  IALM_LANSVD   2.61e-07  5.91e+01    199999     22    200
               IALM_FFSRQR   1.82e-07  6.93e+00    199998     25    200
               IALM_RSISVD   2.63e-07  7.38e+00    199996     22    200
               IALM_LTSVD    8.42e-02  2.20e+01    3998937    100   200
  4000 x 4000  IALM_LANSVD   1.38e-07  4.65e+02    799991     23    400
               IALM_FFSRQR   1.39e-07  4.43e+01    800006     26    400
               IALM_RSISVD   1.51e-07  5.04e+01    799990     23    400
               IALM_LTSVD    8.94e-02  1.54e+02    15996623   100   400
  6000 x 6000  IALM_LANSVD   1.30e-07  1.66e+03    1799982    23    600
               IALM_FFSRQR   1.02e-07  1.42e+02    1799993    26    600
               IALM_RSISVD   1.44e-07  1.62e+02    1799985    23    600
               IALM_LTSVD    8.58e-02  5.55e+02    35992605   100   600

4.3.2. Matrix completion. We solve the matrix completion problems for two real data sets used in [68]: the Jester joke data set [24] and the MovieLens data set [32]. The Jester joke data set consists of 4.1 million ratings for 100 jokes from 73,421 users and can be downloaded from the website http://goldberg.berkeley.edu/jester-data. We test with the following data matrices:
• jester-1: data from 24,983 users who have rated 36 or more jokes;
• jester-2: data from 23,500 users who have rated 36 or more jokes;
• jester-3: data from 24,938 users who have rated between 15 and 35 jokes;
• jester-all: the combination of jester-1, jester-2, and jester-3.
The MovieLens data set can be downloaded from https://grouplens.org/datasets/movielens. We test with the following data matrices:
• movie-100K: 100,000 ratings of 943 users for 1682 movies;
• movie-1M: 1 million ratings of 6040 users for 3900 movies;
• movie-latest-small: 100,000 ratings of 700 users for 9000 movies.

For each data set, we let $M$ be the original data matrix, where $M_{ij}$ stands for the rating of joke (movie) $j$ by user $i$, and let $\Gamma$ be the set of indices where $M_{ij}$ is known. The matrix completion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE), defined by
$$\mathrm{NMAE} \;\stackrel{\mathrm{def}}{=}\; \frac{\frac{1}{|\Gamma|}\sum_{(i,j)\in\Gamma}|M_{ij} - X_{ij}|}{r_{\max} - r_{\min}},$$


where $X_{ij}$ is the prediction of the rating of joke (movie) $j$ given by user $i$, and $r_{\min}$, $r_{\max}$ are the lower and upper bounds of the ratings, respectively. For the Jester joke data sets, we set $r_{\min} = -10$ and $r_{\max} = 10$. For the MovieLens data sets, we set $r_{\min} = 1$ and $r_{\max} = 5$. A small helper for this metric is sketched below.
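The NMAE metric just defined is straightforward to compute once the known index set is available; a minimal NumPy helper (the index representation as two parallel arrays is an implementation choice, not prescribed by the text):

```python
import numpy as np

def nmae(M, X, rows, cols, r_min, r_max):
    """Normalized Mean Absolute Error over the known entries Gamma = {(rows[t], cols[t])}."""
    return np.mean(np.abs(M[rows, cols] - X[rows, cols])) / (r_max - r_min)
```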

Since $|\Gamma|$ is large, we randomly select a subset $\Omega$ from $\Gamma$ and then use Algorithm 8 to solve the problem (2.8). We randomly select 10 ratings for each user in the Jester joke data sets, while we randomly choose about 50% of the ratings for each user in the MovieLens data sets. Table 4.5 includes the parameter settings in the algorithms. The maximum iteration number is 100 in IALM, and all other parameters are the same as those used in [45].

The numerical results are included in Table 4.6. We observe that IALM_FFSRQR achieves almost the same recoverability as the other methods (except IALM_LTSVD) and is slightly faster than IALM_RSISVD for these two data sets.

TABLE 4.5: Parameters used in the IALM method on matrix completion.

  Data set            m      n     |Γ|       |Ω|
  jester-1            24983  100   1.81e+06  249830
  jester-2            23500  100   1.71e+06  235000
  jester-3            24938  100   6.17e+05  249384
  jester-all          73421  100   4.14e+06  734210
  movie-100K          943    1682  1.00e+05  49918
  movie-1M            6040   3706  1.00e+06  498742
  movie-latest-small  671    9066  1.00e+05  52551

5. Conclusions. We have presented the Flip-Flop SRQR factorization, a variant of the QLP factorization, to compute low-rank matrix approximations. The Flip-Flop SRQR algorithm uses an SRQR factorization to initialize the truncated version of a column-pivoted QR factorization and then forms an LQ factorization. For the numerical results presented, the errors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms. This new algorithm is cheaper to compute and produces quality low-rank matrix approximations. Furthermore, we prove singular value lower bounds and residual error upper bounds for the Flip-Flop SRQR factorization. In situations where the singular values of the input matrix decay relatively quickly, the low-rank approximation computed by Flip-Flop SRQR is guaranteed to be as accurate as the truncated SVD. We also perform a complexity analysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomized subspace iteration. Future work includes reducing the overhead cost in Flip-Flop SRQR and implementing the Flip-Flop SRQR algorithm on distributed memory machines for popular applications such as distributed PCA.

Acknowledgments. The authors would like to thank Yousef Saad, the editor, and three anonymous referees, who kindly reviewed the earlier version of this manuscript and provided valuable suggestions and comments.

Appendix A. Partial QR factorization with column pivoting. The details of the partial QRCP are covered in Algorithm 4.

Appendix B. TUXV algorithm. The details of the TUXV algorithm are presented in Algorithm 5.

Appendix C. Computing the Tucker decomposition of higher-order tensors. Algorithm 6 computes the Tucker decomposition of higher-order tensors, i.e., the sequentially truncated higher-order SVD.


TABLE 4.6: Comparison on matrix completion.

  Data set            Method        Iter  Time      NMAE      #sv  σ_max     σ_min
  jester-1            IALM-LANSVD   12    7.06e+00  1.84e-01  100  2.14e+03  1.00e+00
                      IALM-FFSRQR   12    3.44e+00  1.69e-01  100  2.28e+03  1.00e+00
                      IALM-RSISVD   12    3.75e+00  1.89e-01  100  2.12e+03  1.00e+00
                      IALM-LTSVD    100   2.11e+01  1.74e-01  62   3.00e+03  1.00e+00
  jester-2            IALM-LANSVD   12    6.80e+00  1.85e-01  100  2.13e+03  1.00e+00
                      IALM-FFSRQR   12    2.79e+00  1.70e-01  100  2.29e+03  1.00e+00
                      IALM-RSISVD   12    3.59e+00  1.91e-01  100  2.12e+03  1.00e+00
                      IALM-LTSVD    100   2.03e+01  1.75e-01  58   2.96e+03  1.00e+00
  jester-3            IALM-LANSVD   12    7.05e+00  1.26e-01  99   1.79e+03  1.00e+00
                      IALM-FFSRQR   12    3.03e+00  1.22e-01  100  1.71e+03  1.00e+00
                      IALM-RSISVD   12    3.85e+00  1.31e-01  100  1.78e+03  1.00e+00
                      IALM-LTSVD    100   2.12e+01  1.33e-01  55   2.50e+03  1.00e+00
  jester-all          IALM-LANSVD   12    2.39e+01  1.72e-01  100  3.56e+03  1.00e+00
                      IALM-FFSRQR   12    1.12e+01  1.62e-01  100  3.63e+03  1.00e+00
                      IALM-RSISVD   12    1.34e+01  1.82e-01  100  3.47e+03  1.00e+00
                      IALM-LTSVD    100   6.99e+01  1.68e-01  52   4.92e+03  1.00e+00
  movie-100K          IALM-LANSVD   29    2.86e+01  1.83e-01  285  1.21e+03  1.00e+00
                      IALM-FFSRQR   30    4.55e+00  1.67e-01  295  1.53e+03  1.00e+00
                      IALM-RSISVD   29    4.82e+00  1.82e-01  285  1.29e+03  1.00e+00
                      IALM-LTSVD    48    1.42e+01  1.47e-01  475  1.91e+03  1.00e+00
  movie-1M            IALM-LANSVD   50    7.40e+02  1.58e-01  495  4.99e+03  1.00e+00
                      IALM-FFSRQR   53    2.07e+02  1.37e-01  525  6.63e+03  1.00e+00
                      IALM-RSISVD   50    2.23e+02  1.57e-01  495  5.35e+03  1.00e+00
                      IALM-LTSVD    100   8.50e+02  1.17e-01  995  8.97e+03  1.00e+00
  movie-latest-small  IALM-LANSVD   31    1.66e+02  1.85e-01  305  1.13e+03  1.00e+00
                      IALM-FFSRQR   31    1.96e+01  2.00e-01  305  1.42e+03  1.00e+00
                      IALM-RSISVD   31    2.85e+01  1.91e-01  305  1.20e+03  1.00e+00
                      IALM-LTSVD    63    4.02e+01  2.08e-01  298  1.79e+03  1.00e+00

Appendix D. IALM for solving two special nuclear norm minimization problems. Algorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems, respectively. In Algorithm 7, $\mathcal{S}_{\omega}(x) = \mathrm{sgn}(x)\cdot\max(|x| - \omega, 0)$ is the soft shrinkage operator [29] with $x\in\mathbb{R}^n$ and $\omega > 0$. In Algorithm 8, $\overline{\Omega}$ is the complement of $\Omega$. A one-line sketch of the soft shrinkage operator is given below.

Appendix E. Approximate SVD with randomized subspace iteration. The randomized subspace iteration was proposed in [30, Algorithm 4.4] to compute an orthonormal matrix whose range approximates the range of $A$. An approximate SVD can be computed using the aforementioned orthonormal matrix [30, Algorithm 5.1]. A randomized subspace iteration is used in the routine MLSVD_RSI in the Matlab toolbox tensorlab [73], and MLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD. We summarize the approximate SVD with randomized subspace iteration pseudocode in Algorithm 9.

Now we perform a complexity analysis for the approximate SVD with randomized subspace iteration. We first note that:
1. The cost of generating a random matrix is negligible.
2. The cost of computing $B = A\Omega$ is $2mn(k+p)$.


Algorithm 4: Partial QRCP.
Inputs: Matrix $A\in\mathbb{R}^{m\times n}$; target rank $k$.
Outputs: Orthogonal matrix $Q\in\mathbb{R}^{m\times m}$; upper trapezoidal matrix $R\in\mathbb{R}^{m\times n}$; permutation matrix $\Pi\in\mathbb{R}^{n\times n}$ such that $A\Pi = QR$.
Algorithm:
  Initialize $\Pi = I_n$. Compute column norms $r_s = \|A(1:m, s)\|_2$ $(1\le s\le n)$.
  for $j = 1:k$ do
    Find $i = \arg\max_{j\le s\le n} r_s$. Exchange $r_j$ and $r_i$, and columns $j$ and $i$ of $A$ and $\Pi$.
    Form the Householder reflection $Q_j$ from $A(j:m, j)$.
    Update the trailing matrix: $A(j:m, j:n) \leftarrow Q_j^T A(j:m, j:n)$.
    Update $r_s = \|A(j+1:m, s)\|_2$ $(j+1\le s\le n)$.
  end for
  $Q = Q_1Q_2\cdots Q_k$ is the product of all reflections; $R$ = upper trapezoidal part of $A$.
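For illustration, a dense NumPy sketch of the k pivoted Householder steps in Algorithm 4 follows; it recomputes the trailing column norms at every step (a simplification of the update line above) and returns only the leading k rows of the reduced matrix rather than the full Q factor.

```python
import numpy as np

def partial_qrcp(A, k):
    """Sketch of Algorithm 4: k Householder steps with column pivoting.
    Returns R[:k, :] of the reduced matrix and the pivot order piv,
    so that A[:, piv] is approximated by the product of the k reflections times R."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    piv = np.arange(n)
    for j in range(k):
        norms = np.einsum('ij,ij->j', A[j:, :], A[j:, :])   # squared trailing column norms
        i = j + int(np.argmax(norms[j:]))
        A[:, [j, i]] = A[:, [i, j]]                          # swap columns j and i
        piv[[j, i]] = piv[[i, j]]
        x = A[j:, j].copy()
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0] if x[0] != 0 else 1.0)
        nv = np.linalg.norm(v)
        if nv > 0:
            v /= nv
            A[j:, j:] -= 2.0 * np.outer(v, v @ A[j:, j:])    # apply the reflection
    return np.triu(A)[:k, :], piv
```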

Algorithm 5: TUXV algorithm.
Inputs: Matrix $A\in\mathbb{R}^{m\times n}$; target rank $k$; block size $b$; oversampling size $p\ge 0$.
Outputs: Column-orthonormal matrices $U\in\mathbb{R}^{m\times k}$, $V\in\mathbb{R}^{n\times k}$ and upper triangular matrix $R\in\mathbb{R}^{k\times k}$ such that $A\approx URV^T$.
Algorithm:
  Run TRQRCP on $A$ to obtain $Q\in\mathbb{R}^{m\times k}$, $R\in\mathbb{R}^{k\times n}$, and $\Pi\in\mathbb{R}^{n\times n}$.
  $R = R\,\Pi^T$, and compute the LQ factorization, i.e., $[V, R] = \mathrm{qr}(R^T, 0)$.
  Compute $Z = AV$ and the QR factorization, i.e., $[U, R] = \mathrm{qr}(Z, 0)$.
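A short Python sketch of the TUXV steps is shown below. As an assumption, a full QRCP (scipy.linalg.qr with pivoting) stands in for the TRQRCP stage, which is not reproduced here; the flip (LQ of the leading rows) and the final orthogonalization follow the listing above.

```python
import numpy as np
from scipy.linalg import qr

def tuxv(A, k):
    """Sketch of Algorithm 5 (TUXV): returns U, R, V with A ≈ U @ R @ V.T.
    QRCP is used in place of TRQRCP for simplicity."""
    Q, R, piv = qr(A, mode='economic', pivoting=True)    # A[:, piv] = Q R
    R = R[:k, np.argsort(piv)]                           # first k rows of R Π^T
    V, _ = np.linalg.qr(R.T)                             # LQ of R via QR of R^T; V is n-by-k
    Z = A @ V
    U, R = np.linalg.qr(Z)                               # U is m-by-k, R is k-by-k upper triangular
    return U, R, V
```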

3. In each QR step $[Q,\sim] = \mathrm{qr}(B, 0)$, the cost of computing the QR factorization of $B$ is $2m(k+p)^2 - \frac{2}{3}(k+p)^3$ (cf. [69]), and the cost of forming the first $(k+p)$ columns of the full $Q$ matrix is $m(k+p)^2 + \frac{1}{3}(k+p)^3$.

Now we count the flops for each $i$ in the for loop:
1. The cost of computing $B = A^T * Q$ is $2mn(k+p)$.
2. The cost of computing $[Q,\sim] = \mathrm{qr}(B, 0)$ is $2n(k+p)^2 - \frac{2}{3}(k+p)^3$, and the cost of forming the first $(k+p)$ columns of the full $Q$ matrix is $n(k+p)^2 + \frac{1}{3}(k+p)^3$.
3. The cost of computing $B = A * Q$ is $2mn(k+p)$.
4. The cost of computing $[Q,\sim] = \mathrm{qr}(B, 0)$ is $2m(k+p)^2 - \frac{2}{3}(k+p)^3$, and the cost of forming the first $(k+p)$ columns of the full $Q$ matrix is $m(k+p)^2 + \frac{1}{3}(k+p)^3$.

Together, the cost of running the for loop $q$ times is
$$q\left(4mn(k+p) + 3(m+n)(k+p)^2 - \frac{2}{3}(k+p)^3\right).$$
Additionally, the cost of computing $B = Q^T * A$ is $2mn(k+p)$, the cost of the SVD of $B$ is $O\!\left(n(k+p)^2\right)$, and the cost of computing $U = Q * U$ is $2m(k+p)^2$.


Algorithm 6: ST-HOSVD.
Inputs: Tensor $\mathcal{X}\in\mathbb{R}^{I_1\times\cdots\times I_d}$; desired rank $(k_1,\ldots,k_d)$; processing order $p = (p_1,\ldots,p_d)$.
Outputs: Tucker decomposition $[\mathcal{G}; U_1,\ldots,U_d]$.
Algorithm:
  Define tensor $\mathcal{G}\leftarrow\mathcal{X}$.
  for $j = 1:d$ do
    $r = p_j$.
    Compute the exact or approximate rank-$k_r$ SVD of the tensor unfolding: $\mathcal{G}_{(r)}\approx U_r\Sigma_rV_r^T$.
    Store the factor matrix $U_r$.
    Update $\mathcal{G}_{(r)}\leftarrow\Sigma_rV_r^T$, i.e., apply $U_r^T$ to $\mathcal{G}$.
  end for

Algorithm 7: Robust PCA using IALM.
Inputs: Measured matrix $M\in\mathbb{R}^{m\times n}$; positive numbers $\lambda$, $\mu_0$, $\bar{\mu}$; tolerance $tol$; $\rho > 1$.
Outputs: Matrix pair $(X_k, E_k)$.
Algorithm:
  $k = 0$; $J(M) = \max\left(\|M\|_2,\ \lambda^{-1}\|M\|_F\right)$; $Y_0 = M/J(M)$; $E_0 = 0$.
  while not converged do
    $(U, \Sigma, V) = \mathrm{svd}\left(M - E_k + \mu_k^{-1}Y_k\right)$
    $X_{k+1} = U\,\mathcal{S}_{\mu_k^{-1}}(\Sigma)\,V^T$
    $E_{k+1} = \mathcal{S}_{\lambda\mu_k^{-1}}\left(M - X_{k+1} + \mu_k^{-1}Y_k\right)$
    $Y_{k+1} = Y_k + \mu_k\left(M - X_{k+1} - E_{k+1}\right)$
    Update $\mu_{k+1} = \min\left(\rho\mu_k,\ \bar{\mu}\right)$
    $k = k + 1$
    if $\|M - X_k - E_k\|_F/\|M\|_F < tol$ then break end if
  end while
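A compact Python sketch of the IALM loop in Algorithm 7 follows. A dense SVD stands in for the approximate SVD routines compared in Section 4.3, and the defaults for rho, tol, and the iteration cap are illustrative assumptions; mu0 and mu_bar are passed in, matching the algorithm's inputs.

```python
import numpy as np

def ialm_rpca(M, lam, mu0, mu_bar, rho=1.5, tol=1e-7, max_iter=100):
    """Sketch of Algorithm 7 (IALM for robust PCA) with a dense SVD."""
    soft = lambda X, w: np.sign(X) * np.maximum(np.abs(X) - w, 0.0)
    J = max(np.linalg.norm(M, 2), np.linalg.norm(M, 'fro') / lam)
    Y = M / J
    X = np.zeros_like(M)
    E = np.zeros_like(M)
    mu = mu0
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(M - E + Y / mu, full_matrices=False)
        X = (U * soft(s, 1.0 / mu)) @ Vt              # singular value thresholding
        E = soft(M - X + Y / mu, lam / mu)            # entrywise shrinkage
        Y = Y + mu * (M - X - E)                      # dual update
        mu = min(rho * mu, mu_bar)
        if np.linalg.norm(M - X - E, 'fro') < tol * np.linalg.norm(M, 'fro'):
            break
    return X, E
```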

Now assume $k + p \ll \min(m,n)$ and omit the lower-order terms; then we arrive at $(4q+4)\,mn\,(k+p)$ as the complexity of the approximate SVD with randomized subspace iteration. In practice, $q$ is usually chosen to be an integer between 0 and 2. A sketch of this procedure (Algorithm 9, listed later in this appendix) is given next.
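The following NumPy sketch mirrors Algorithm 9 step by step; the flop counts in the comments correspond to the terms in the complexity analysis above.

```python
import numpy as np

def rsi_svd(A, k, p=5, q=1, rng=None):
    """Sketch of Algorithm 9: approximate rank-k SVD via randomized subspace iteration."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))     # i.i.d. Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)              # B = A*Omega: 2mn(k+p) flops, then QR
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q)            # 2mn(k+p) flops, then QR
        Q, _ = np.linalg.qr(A @ Q)              # 2mn(k+p) flops, then QR
    B = Q.T @ A                                 # 2mn(k+p) flops
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U                                   # 2m(k+p)^2 flops
    return U[:, :k], s[:k], Vt[:k, :].T
```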

REFERENCES

[1] O. Alter, P. O. Brown, and D. Botstein, Singular value decomposition for genome-wide expression data processing and modeling, Proc. Natl. Acad. Sci. USA, 97 (2000), pp. 10101-10106.
[2] E. Anderson, Z. Bai, C. Bischof, L. S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users' Guide, 3rd ed., SIAM, Philadelphia, 1999.
[3] C. A. Andersson and R. Bro, Improving the speed of multi-way algorithms: Part I. Tucker3, Chemometrics Intell. Lab. Syst., 42 (1998), pp. 93-103.
[4] M. W. Berry, S. T. Dumais, and G. W. O'Brien, Using linear algebra for intelligent information retrieval, SIAM Rev., 37 (1995), pp. 573-595.
[5] C. Bischof and C. Van Loan, The WY representation for products of Householder matrices, SIAM J. Sci. Statist. Comput., 8 (1987), pp. S2-S13.


Algorithm 8: Matrix completion using IALM.
Inputs: Sampled set $\Omega$; sampled entries $\pi_\Omega(M)$; positive numbers $\lambda$, $\mu_0$, $\bar{\mu}$; tolerance $tol$; $\rho > 1$.
Outputs: Matrix pair $(X_k, E_k)$.
Algorithm:
  $k = 0$; $Y_0 = 0$; $E_0 = 0$.
  while not converged do
    $(U, \Sigma, V) = \mathrm{svd}\left(M - E_k + \mu_k^{-1}Y_k\right)$
    $X_{k+1} = U\,\mathcal{S}_{\mu_k^{-1}}(\Sigma)\,V^T$
    $E_{k+1} = \pi_{\overline{\Omega}}\left(M - X_{k+1} + \mu_k^{-1}Y_k\right)$
    $Y_{k+1} = Y_k + \mu_k\left(M - X_{k+1} - E_{k+1}\right)$
    Update $\mu_{k+1} = \min\left(\rho\mu_k,\ \bar{\mu}\right)$
    $k = k + 1$
    if $\|M - X_k - E_k\|_F/\|M\|_F < tol$ then break end if
  end while

[6] P. Businger and G. H. Golub, Handbook series linear algebra: Linear least squares solutions by Householder transformations, Numer. Math., 7 (1965), pp. 269-276.
[7] J.-F. Cai, E. J. Candès, and Z. Shen, A singular value thresholding algorithm for matrix completion, SIAM J. Optim., 20 (2010), pp. 1956-1982.
[8] E. J. Candès, X. Li, Y. Ma, and J. Wright, Robust principal component analysis, J. ACM, 58 (2011), Art. 11, 37 pages.
[9] E. J. Candès and B. Recht, Exact matrix completion via convex optimization, Found. Comput. Math., 9 (2009), pp. 717-772.
[10] J. D. Carroll and J.-J. Chang, Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition, Psychometrika, 35 (1970), pp. 283-319.
[11] T. A. Davis and Y. Hu, The University of Florida sparse matrix collection, ACM Trans. Math. Software, 38 (2011), Art. 1, 25 pages.
[12] L. De Lathauwer and B. De Moor, From matrix to tensor: multilinear algebra and signal processing, in Mathematics in Signal Processing IV, Inst. Math. Appl. Conf. Ser., Oxford Univ. Press, Oxford, 1998, pp. 1-16.
[13] L. De Lathauwer, B. De Moor, and J. Vandewalle, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253-1278.
[14] L. De Lathauwer, B. De Moor, and J. Vandewalle, On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1324-1342.
[15] L. De Lathauwer and J. Vandewalle, Dimensionality reduction in higher-order signal processing and rank-(R1, R2, ..., RN) reduction in multilinear algebra, Linear Algebra Appl., 391 (2004), pp. 31-55.
[16] P. Drineas, R. Kannan, and M. W. Mahoney, Fast Monte Carlo algorithms for matrices II: Computing a low-rank approximation to a matrix, SIAM J. Comput., 36 (2006), pp. 158-183.
[17] J. A. Duersch and M. Gu, Randomized QR with column pivoting, SIAM J. Sci. Comput., 39 (2017), pp. C263-C291.
[18] C. Eckart and G. Young, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211-218.
[19] L. Eldén, Matrix Methods in Data Mining and Pattern Recognition, SIAM, Philadelphia, 2007.
[20] M. Fazel, H. Hindi, and S. P. Boyd, Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices, in Proceedings of the 2003 American Control Conference, IEEE Conference Proceedings, IEEE, Piscataway, 2003, pp. 2156-2162.
[21] M. Fazel, T. K. Pong, D. Sun, and P. Tseng, Hankel matrix rank minimization with applications to system identification and realization, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 946-977.
[22] C. Frank and E. Nöth, Optimizing eigenfaces by face masks for facial expression recognition, in Computer Analysis of Images and Patterns, N. Petkov and M. A. Westenberg, eds., vol. 2756 of Lecture Notes in Comput. Sci., Springer, Berlin, 2003, pp. 646-654.


Algorithm 9: Approximate SVD with randomized subspace iteration.
Inputs: Matrix $A\in\mathbb{R}^{m\times n}$; target rank $k$; oversampling size $p\ge 0$; number of iterations $q\ge 1$.
Outputs: $U\in\mathbb{R}^{m\times k}$ contains the approximate top $k$ left singular vectors of $A$; $\Sigma\in\mathbb{R}^{k\times k}$ contains the approximate top $k$ singular values of $A$; $V\in\mathbb{R}^{n\times k}$ contains the approximate top $k$ right singular vectors of $A$.
Algorithm:
  Generate an i.i.d. Gaussian matrix $\Omega\in\mathcal{N}(0,1)^{n\times(k+p)}$.
  Compute $B = A\Omega$ and $[Q,\sim] = \mathrm{qr}(B, 0)$.
  for $i = 1:q$ do
    $B = A^T * Q$; $[Q,\sim] = \mathrm{qr}(B, 0)$
    $B = A * Q$; $[Q,\sim] = \mathrm{qr}(B, 0)$
  end for
  $B = Q^T * A$; $[U, \Sigma, V] = \mathrm{svd}(B)$
  $U = Q * U$; $U = U(:, 1:k)$; $\Sigma = \Sigma(1:k, 1:k)$; $V = V(:, 1:k)$.

[23] G. W. Furnas, S. Deerwester, S. T. Dumais, T. K. Landauer, R. A. Harshman, L. A. Streeter, and K. E. Lochbaum, Information retrieval using a singular value decomposition model of latent semantic structure, in Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Y. Chiaramella, ed., ACM, New York, 1988, pp. 465-480.
[24] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins, Eigentaste: a constant time collaborative filtering algorithm, Inf. Retrieval, 4 (2001), pp. 133-151.
[25] G. Golub, Numerical methods for solving linear least squares problems, Numer. Math., 7 (1965), pp. 206-216.
[26] G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed., Johns Hopkins University Press, Baltimore, 2013.
[27] M. Gu, Subspace iteration randomization and singular value problems, SIAM J. Sci. Comput., 37 (2015), pp. A1139-A1173.
[28] M. Gu and S. C. Eisenstat, Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848-869.
[29] E. T. Hale, W. Yin, and Y. Zhang, Fixed-point continuation for l1-minimization: methodology and convergence, SIAM J. Optim., 19 (2008), pp. 1107-1130.
[30] N. Halko, P. G. Martinsson, and J. A. Tropp, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217-288.
[31] R. A. Harshman, Foundations of the PARAFAC procedure: models and conditions for an "explanatory" multimodal factor analysis, UCLA Working Papers in Phonetics, 16 (1970), pp. 1-84.
[32] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl, An algorithmic framework for performing collaborative filtering, in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, F. Gey, M. Hearst, and R. Tong, eds., ACM, New York, 1999, pp. 230-237.
[33] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1991.
[34] D. A. Huckaby and T. F. Chan, On the convergence of Stewart's QLP algorithm for approximating the SVD, Numer. Algorithms, 32 (2003), pp. 287-316.
[35] D. A. Huckaby and T. F. Chan, Stewart's pivoted QLP decomposition for low-rank matrices, Numer. Linear Algebra Appl., 12 (2005), pp. 153-159.
[36] W. B. Johnson and J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, in Conference in Modern Analysis and Probability (New Haven, Conn., 1982), R. Beals, A. Beck, A. Bellow, and A. Hajian, eds., vol. 26 of Contemp. Math., Amer. Math. Soc., Providence, 1984, pp. 189-206.


[37] I. T. Jolliffe, Principal Component Analysis, Springer, New York, 1986.
[38] J. M. Kleinberg, Authoritative sources in a hyperlinked environment, J. ACM, 46 (1999), pp. 604-632.
[39] T. G. Kolda, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 243-255.
[40] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455-500.
[41] R. M. Larsen, PROPACK: software for large and sparse SVD calculations, available at http://sun.stanford.edu/~rmunk/PROPACK/.
[42] R. M. Larsen, Lanczos bidiagonalization with partial reorthogonalization, Tech. Report PB-537, Department of Computer Science, Aarhus University, Aarhus, Denmark, 1998.
[43] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE, 86 (1998), pp. 2278-2324.
[44] R. B. Lehoucq, D. C. Sorensen, and C. Yang, ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods, SIAM, Philadelphia, 1998.
[45] Z. Lin, M. Chen, and Y. Ma, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices, Preprint on arXiv, https://arxiv.org/abs/1009.5055, 2010.
[46] N. Linial, E. London, and Y. Rabinovich, The geometry of graphs and some of its algorithmic applications, Combinatorica, 15 (1995), pp. 215-245.
[47] Z. Liu, A. Hansson, and L. Vandenberghe, Nuclear norm system identification with missing inputs and outputs, Systems Control Lett., 62 (2013), pp. 605-612.
[48] Z. Liu and L. Vandenberghe, Interior-point method for nuclear norm approximation with application to system identification, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1235-1256.
[49] S. Ma, D. Goldfarb, and L. Chen, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program., 128 (2011), pp. 321-353.
[50] C. B. Melgaard, Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra, PhD Thesis, University of California, Berkeley, 2015.
[51] M. Mesbahi and G. P. Papavassilopoulos, On the rank minimization problem over a positive semidefinite linear matrix inequality, IEEE Trans. Automat. Control, 42 (1997), pp. 239-243.
[52] N. Muller, L. Magaia, and B. M. Herbst, Singular value decomposition, eigenfaces, and 3D reconstructions, SIAM Rev., 46 (2004), pp. 518-545.
[53] M. Narwaria and W. Lin, SVD-based quality metric for image and video using machine learning, IEEE Trans. Syst., Man, and Cybernetics, Part B (Cybernetics), 42 (2012), pp. 347-364.
[54] T.-H. Oh, Y. Matsushita, Y.-W. Tai, and I. So Kweon, Fast randomized singular value thresholding for nuclear norm minimization, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015, IEEE Conference Proceedings, IEEE, Los Alamitos, 2015, pp. 4484-4493.
[55] G. Quintana-Ortí, X. Sun, and C. H. Bischof, A BLAS-3 version of the QR factorization with column pivoting, SIAM J. Sci. Comput., 19 (1998), pp. 1486-1494.
[56] B. Recht, M. Fazel, and P. A. Parrilo, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Rev., 52 (2010), pp. 471-501.
[57] J. D. Rennie and N. Srebro, Fast maximum margin matrix factorization for collaborative prediction, in Proceedings of the 22nd International Conference on Machine Learning, L. De Raedt and S. Wrobel, eds., ACM Press, New York, 2005, pp. 713-719.
[58] V. Rokhlin, A. Szlam, and M. Tygert, A randomized algorithm for principal component analysis, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1100-1124.
[59] A. K. Saibaba, HOID: higher order interpolatory decomposition for tensors based on Tucker representation, SIAM J. Matrix Anal. Appl., 37 (2016), pp. 1223-1249.
[60] B. Savas and L. Eldén, Handwritten digit classification using higher order singular value decomposition, Pattern Recognition, 40 (2007), pp. 993-1003.
[61] R. Schreiber and C. Van Loan, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 53-57.
[62] A. Shashua and T. Hazan, Non-negative tensor factorization with applications to statistics and computer vision, in Proceedings of the 22nd International Conference on Machine Learning, S. Dzeroski, L. De Raedt, and S. Wrobel, eds., ACM, New York, 2005, pp. 792-799.
[63] N. D. Sidiropoulos, R. Bro, and G. B. Giannakis, Parallel factor analysis in sensor array processing, IEEE Trans. Signal Process., 48 (2000), pp. 2377-2388.
[64] D. C. Sorensen and M. Embree, A DEIM induced CUR factorization, SIAM J. Sci. Comput., 38 (2016), pp. A1454-A1482.
[65] N. Srebro, J. Rennie, and T. S. Jaakkola, Maximum-margin matrix factorization, in NIPS'04: Proceedings of the 17th International Conference on Neural Information Processing Systems, L. K. Saul, Y. Weiss, and L. Bottou, eds., MIT Press, Cambridge, 2004, pp. 1329-1336.
[66] G. W. Stewart, The QLP approximation to the singular value decomposition, SIAM J. Sci. Comput., 20 (1999), pp. 1336-1348.


[67] S. Taheri, Q. Qiu, and R. Chellappa, Structure-preserving sparse decomposition for facial expression analysis, IEEE Trans. Image Process., 23 (2014), pp. 3590-3603.
[68] K.-C. Toh and S. Yun, An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems, Pac. J. Optim., 6 (2010), pp. 615-640.
[69] L. N. Trefethen and D. Bau, III, Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[70] L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika, 31 (1966), pp. 279-311.
[71] N. Vannieuwenhoven, R. Vandebril, and K. Meerbergen, A new truncation strategy for the higher-order singular value decomposition, SIAM J. Sci. Comput., 34 (2012), pp. A1027-A1052.
[72] M. A. O. Vasilescu and D. Terzopoulos, Multilinear analysis of image ensembles: TensorFaces, in European Conference on Computer Vision 2002, A. Heyden, G. Sparr, M. Nielsen, and P. Johansen, eds., vol. 2350 of Lecture Notes in Computer Science, Springer, Berlin, 2002, pp. 447-460.
[73] N. Vervliet, O. Debals, L. Sorber, M. Van Barel, and L. De Lathauwer, Tensorlab user guide, available at http://www.tensorlab.net.
[74] J. Xiao and M. Gu, Spectrum-revealing Cholesky factorization for kernel methods, in Proceedings of the 16th IEEE International Conference on Data Mining (ICDM), IEEE Conference Proceedings, Los Alamitos, 2016, pp. 1293-1298.
[75] J. Xiao, M. Gu, and J. Langou, Fast parallel randomized QR with column pivoting algorithms for reliable low-rank matrix approximations, in Proceedings of the 24th IEEE International Conference on High Performance Computing (HiPC), IEEE Conference Proceedings, Los Alamitos, 2017, pp. 233-242.
[76] T. Zhang and G. H. Golub, Rank-one approximation to high order tensors, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 534-550.

Page 10: FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND …etna.mcs.kent.edu/vol.51.2019/pp469-494.dir/pp469-494.pdfminimization. In Section3, we introduce Flip-Flop SRQR and analyze its

ETNAKent State University and

Johann Radon Institute (RICAM)

478 Y FENG J XIAO AND M GU

we apply Lemma 31 twice for any 1 le j le k

σ2j

[R11 R12

R22

][R11 R12

R22

]Tle σ2

j

([R11R

T11 + R12R

T12 R12R

T22

])+∥∥∥[R22R

T12 R22R

T22

]∥∥∥2

2

le σ2j

(R11R

T11 + R12R

T12

)+∥∥∥R12R

T22

∥∥∥2

2+∥∥∥[R22R

T12 R22R

T22

]∥∥∥2

2

le σ2j

(R11R

T11 + R12R

T12

)+ 2

∥∥∥∥∥[R12

R22

]∥∥∥∥∥4

2

= σ2j

(R11R

T11 + R12R

T12

)+ 2 R2242 (39)

The relation (39) can be further rewritten as

σ4j (A) le σ4

j

([R11 R12

])+ 2 R2242 = σ4

j (Σk) + 2 R2242 1 le j le k

which is equivalent to

σj(Σk) ge σj(A)

4

radic1 +

2R2242σ4j (Σk)

For the residual matrix bound we let[R11 R12

]def=[R11 R12

]+[δR11 δR12

]

where[R11 R12

]is the rank-k truncated SVD of

[R11 R12

] Since

(310)∥∥Aminus UkΣkV

Tk

∥∥2

=

∥∥∥∥∥AΠminusQ[R11 R12

0

]TQT

∥∥∥∥∥2

it follows from the orthogonality of singular vectors that[R11 R12

]T [δR11 δR12

]= 0

and therefore[R11 R12

]T [R11 R12

]=[R11 R12

]T [R11 R12

]+[δR11 δR12

]T [δR11 δR12

]

which implies

(311) RT12R12 = RT

12R12 +(δR12

)T (δR12

)

Similarly to the deduction of (39) from (311) we can derive∥∥∥∥[δR11 δR12

R22

]∥∥∥∥4

2

le∥∥[δR11 δR12

]∥∥4

2+ 2

∥∥∥∥[δR12

R22

]∥∥∥∥4

2

le σ4k+1 (A) + 2

∥∥∥∥∥[R12

R22

]∥∥∥∥∥4

2

= σ4k+1 (A) + 2 R2242

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 479

Combining with (310) it now follows that

∥∥Aminus UkΣkVTk

∥∥2

=

∥∥∥∥∥AΠminusQ[R11 R12

0

]TQT

∥∥∥∥∥2

=

∥∥∥∥[δR11 δR12

R22

]∥∥∥∥2

le σk+1 (A)4

radic1 + 2

(R222σk+1 (A)

)4

To obtain an upper bound of R222 in (35) and (36) we follow the analysis of SRQRin [75]

From the analysis of [75 Section IV] Algorithm 2 ensures that g1 leradic

1+ε1minusε and g2 le g

with high probability (the actual probability-guarantee formula can be found in [75 SectionIV]) where g1 and g2 are defined by (26) Here 0 lt ε lt 1 is a user defined parameter toadjust the choice of the oversampling size p used in the TRQRCP initialization part in SRQRg gt 1 is a user-defined parameter in the extra swapping part in SRQR Let

(312) τdef= g1g2

R222R2212

∥∥∥RminusT∥∥∥minus1

12

σl+1 (A)and τ

def= g1g2

R222R2212

∥∥RminusT11

∥∥minus1

12

σk (Σk)

where R is defined in equation (25) Since 1radicnX12 le X2 le

radicnX12 and

σi (X1) le σi (X) 1 le i le min (s t) for any matrix X isin Rmtimesn and submatrix X1 isin Rstimestof X we obtain that

τ = g1g2R222R2212

∥∥∥RminusT∥∥∥minus1

12

σl+1

(R) σl+1

(R)

σl+1 (A)le g1g2

radic(l + 1) (nminus l)

Using the fact that σl([R11 R12

])= σl

(R11

)and σk (Σk) = σk

([R11 R12

])by (32)

and (34)

τ = g1g2R222R2212

∥∥RminusT11

∥∥minus1

12

σl (R11)

σl (R11)

σl([R11 R12

]) σl ([R11 R12

])σk (Σk)

le g1g2

radicl (nminus l)

By the definition of τ

(313) R222 = τ σl+1 (A)

Plugging this into (36) yields (38)By the definition of τ we observe that

(314) R222 le τσk (Σk)

By (35) and (314)(315)

σj (Σk) ge σj (A)

4

radic1 + 2

(R222σj(Σk)

)4ge σj (A)

4

radic1 + 2

(R222σk(Σk)

)4ge σj (A)

4radic

1 + 2τ4 1 le j le k

ETNAKent State University and

Johann Radon Institute (RICAM)

480 Y FENG J XIAO AND M GU

On the other hand using (35)

σ4j (A) le σ4

j (Σk)

(1 + 2

σ4j (A)

σ4j (Σk)

R2242σ4j (A)

)

le σ4j (Σk)

1 + 2

(σ4j (Σk) + 2 R2242

)σ4j (Σk)

R2242σ4j (A)

le σ4

j (Σk)

(1 + 2

(1 + 2

R2242σ4k (Σk)

)R2242σ4j (A)

)

that is

σj (Σk) ge σj (A)

4

radic1 +

(2 + 4

R2242σ4k(Σk)

)R2242σ4j (A)

Plugging (313) and (314) into this above equation yields

(316) σj (Σk) ge σj (A)

4

radic1 + τ4 (2 + 4τ4)

(σl+1(A)σj(A)

)4 1 le j le k

Combining (315) with (316) we arrive at (37)We note that (35) and (36) still hold true if we replace k by lInequality (37) shows that with the definitions (312) of τ and τ Flip-Flop SRQR can

reveal at least a dimension-dependent fraction of all the leading singular values of A andindeed approximate them very accurately in case they decay relatively quickly Moreover (38)shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of4

radic1 + 2τ4

(σl+1(A)σk+1(A)

)4

from optimal In situations where the singular values of A decay

relatively quickly our rank-k approximation is about as accurate as the truncated SVD with achoice of l such that

σl+1 (A)

σk+1 (A)= o (1)

REMARK 33 The ldquohigh probabilityrdquo mentioned in Theorem 32 is dependent on theoversampling size p but not on the block size b We decided not to dive deep into the probabilityexplanation since it will make the proof much longer and more tedious Theoretically thecomputed oversampling size p for certain given high probability would be rather high Forexample to achieve a probability 095 we need to choose p to be at least 500 However inpractice we would never choose such a large p since it will deteriorate the code efficiencyIt turns out that a small p like 5 sim 10 would achieve good approximation results empiricallyThe detailed explanation about what does high probability mean here can be found in [75]In terms of the choice of the block size b it wonrsquot affect the analysis or the error boundsThe choice of b is purely for the performance of the codes Only a suitable b can take fulladvantage of the cache hierarchy and BLAS routines

4. Numerical experiments. In this section we demonstrate the effectiveness and efficiency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments.


TABLE 4.1
Methods for approximate SVD.

Method   Description
LANSVD   Approximate SVD using Lanczos bidiagonalization with partial reorthogonalization [42].
         Its Fortran and Matlab implementations are from PROPACK [41].
FFSRQR   Flip-Flop Spectrum-Revealing QR factorization.
         Its Matlab implementation uses Mex-wrapped BLAS and LAPACK routines [2].
RSISVD   Approximate SVD with randomized subspace iteration [30, 27].
         Its Matlab implementation is from the tensorlab toolbox [73].
TUXV     Algorithm 5 in the appendix.
LTSVD    Linear Time SVD [16]. We use its Matlab implementation by Ma et al. [49].

First, we compare FFSRQR with other approximate SVD algorithms for matrix approximation. Second, we compare FFSRQR with other methods on tensor approximation problems using the tensorlab toolbox [73]. Third, we compare FFSRQR with other methods for the robust PCA problem and the matrix completion problem. All experiments are run on a laptop with a 2.9 GHz Intel Core i5 processor and 8 GB of RAM. The codes based on LAPACK [2] and PROPACK [41] in Section 4.1 are in Fortran; the codes in Sections 4.2 and 4.3 are in Matlab, except that FFSRQR is implemented in Fortran and wrapped by mex.

4.1. Approximate truncated SVD. In this section we compare FFSRQR with other approximate SVD algorithms for low-rank matrix approximation. All tested methods are listed in Table 4.1. Since LTSVD has only a Matlab implementation, this method is only considered in Sections 4.2 and 4.3. The test matrices are:

Type 1: $A \in \mathbb{R}^{m\times n}$ [66] is defined by $A = U D V^{T} + 0.1\, d_{ss}\, E$, where $U \in \mathbb{R}^{m\times s}$ and $V \in \mathbb{R}^{n\times s}$ are column-orthonormal matrices, and $D \in \mathbb{R}^{s\times s}$ is a diagonal matrix with $s$ geometrically decreasing diagonal entries from 1 to $10^{-3}$; $d_{ss}$ is the last diagonal entry of $D$, and $E \in \mathbb{R}^{m\times n}$ is a random matrix whose entries are independently sampled from a normal distribution $\mathcal{N}(0,1)$. In our numerical experiments we test three different random matrices: the square matrix has size $10000 \times 10000$, the short-fat matrix has size $1000 \times 10000$, and the tall-skinny matrix has size $10000 \times 1000$. (A short Matlab sketch of this construction follows the list.)

Type 2: $A \in \mathbb{R}^{4929\times 4929}$ is a real data matrix from the University of Florida sparse matrix collection [11]. Its corresponding file name is HB/GEMAT11.
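A minimal Matlab sketch of the Type 1 construction reads as follows; the inner dimension s is not specified above, so the value used here is purely illustrative.

  % Type 1 test matrix (sketch); the value of s is an assumption, not taken from the text.
  m = 1000; n = 10000; s = 100;
  [U,~] = qr(randn(m,s), 0);          % column-orthonormal U
  [V,~] = qr(randn(n,s), 0);          % column-orthonormal V
  d = logspace(0, -3, s);             % geometrically decreasing from 1 to 1e-3
  E = randn(m,n);                     % i.i.d. N(0,1) entries
  A = U*diag(d)*V' + 0.1*d(s)*E;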

For a given matrix $A \in \mathbb{R}^{m\times n}$, the relative SVD approximation error is measured by $\|A - U_k \Sigma_k V_k^T\|_F / \|A\|_F$, where $\Sigma_k$ contains the approximate top $k$ singular values and $U_k$, $V_k$ are the corresponding approximate top $k$ singular vectors. The parameters used in FFSRQR, RSISVD, TUXV, and LTSVD are listed in Table 4.2. The additional time required to compute $g_2$ is negligible. In our experiments, $g_2$ always remains modest and never triggers subsequent SRQR column swaps. However, computing $g_2$ is still recommended, since it serves as an insurance policy against potential quality degradation by TRQRCP. The block size $b$ is chosen to be 32, in accordance with the xGEQP3 LAPACK routine. When $k$ is less than 32, we use an unblocked version of the partial QRCP.
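The two quality measures can be computed directly from the returned factors. A short Matlab sketch, with our own variable names Uk, Sk, Vk for the rank-k factors produced by any of the tested methods, is:

  relerr = norm(A - Uk*Sk*Vk', 'fro') / norm(A, 'fro');   % relative error (Figure 4.2)
  sv_ref = svds(A, 20);                                    % reference top 20 singular values
  sv_app = diag(Sk); sv_app = sv_app(1:20);                % approximate values (Figure 4.3)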

Figures 4.1-4.3 display the run time, the relative approximation error, and the top 20 singular values, respectively, for the four different matrices. In terms of speed, FFSRQR is always faster than LANSVD and RSISVD, and the efficiency difference becomes more


TABLE 4.2
Parameters used in RSISVD, FFSRQR, TUXV, and LTSVD.

Method   Parameters
FFSRQR   oversampling size p = 5, block size b = min{32, k}, l = k, d = 10, g = 2.0
RSISVD   oversampling size p = 5, subspace iteration q = 1
TUXV     oversampling size p = 5, block size b = min{32, k}, l = k
LTSVD    probabilities p_i = 1/n

pronounced when k is relatively large. FFSRQR is only slightly slower than TUXV. In terms of the relative approximation error, FFSRQR is comparable to the other three methods. In terms of the top singular values approximation, Figure 4.3 displays the top 20 singular values when k = 500 for the four different matrices. In Figure 4.3, FFSRQR is superior to TUXV, as we take the absolute values of the main diagonal entries of the upper triangular matrix R as the approximate singular values for TUXV.

FIG. 4.1. Run time comparison for approximate SVD algorithms. Panels: (a) Type 1 random square matrix, (b) Type 1 random short-fat matrix, (c) Type 1 random tall-skinny matrix, (d) Type 2 GEMAT11. Axes: run time (sec) versus target rank k; curves: FFSRQR, RSISVD, LANSVD, TUXV.

4.2. Tensor approximation. This section illustrates the effectiveness and efficiency of FFSRQR for computing approximate tensors. ST-HOSVD [3, 71] is one of the most efficient


FIG. 4.2. Relative approximation error comparison for approximate SVD algorithms. Panels: (a) Type 1 random square matrix, (b) Type 1 random short-fat matrix, (c) Type 1 random tall-skinny matrix, (d) Type 2 GEMAT11. Axes: relative approximation error versus target rank k; curves: FFSRQR, RSISVD, LANSVD, TUXV.

algorithms for computing a Tucker decomposition of a tensor, and the most costly part of this algorithm is to compute an SVD or approximate SVD of the tensor unfoldings. The truncated SVD and RSISVD are used in the routines MLSVD and MLSVD_RSI, respectively, in the Matlab tensorlab toolbox [73]. Based on this Matlab toolbox, we implement ST-HOSVD using FFSRQR or LTSVD for the SVD approximation. We name these two new routines MLSVD_FFSRQR and MLSVD_LTSVD, respectively, and compare the four routines in the numerical experiments. We also have Python codes for this tensor approximation experiment; we do not list the results of the Python programs here, as they are similar to those for Matlab.

4.2.1. A sparse tensor example. We test a sparse tensor $\mathcal{X} \in \mathbb{R}^{n\times n\times n}$ of the following format [64, 59]:

\[
\mathcal{X} = \sum_{j=1}^{10} \frac{1000}{j}\; x_j \circ y_j \circ z_j
\;+\; \sum_{j=11}^{n} \frac{1}{j}\; x_j \circ y_j \circ z_j,
\]

where $x_j, y_j, z_j \in \mathbb{R}^{n}$ are sparse vectors with nonnegative entries, and the symbol "$\circ$" represents the vector outer product. We compute a rank-$(k,k,k)$ Tucker decomposition $[\mathcal{G}; U_1, U_2, U_3]$ using MLSVD, MLSVD_FFSRQR, MLSVD_RSI, and MLSVD_LTSVD, respectively. The relative approximation error is measured by $\|\mathcal{X} - \mathcal{X}_k\|_F / \|\mathcal{X}\|_F$, where $\mathcal{X}_k = \mathcal{G} \times_1 U_1 \times_2 U_2 \times_3 U_3$.
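The Matlab lines below sketch one way to form such a tensor. The dimension n, the sparsity level of x_j, y_j, z_j, and the dense storage are illustrative assumptions only; the experiment itself uses n = 400 and keeps the tensor sparse.

  n = 50; density = 0.05;                        % assumed values for illustration
  X = zeros(n, n, n);
  for j = 1:n
      w = (j <= 10)*(1000/j) + (j > 10)*(1/j);   % weights 1000/j and 1/j
      x = full(sprand(n,1,density));             % sparse nonnegative vectors
      y = full(sprand(n,1,density));
      z = full(sprand(n,1,density));
      X = X + w * (x*y.') .* reshape(z, [1 1 n]);  % outer product x o y o z
  end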


FIG. 4.3. Top 20 singular values comparison for approximate SVD algorithms. Panels: (a) Type 1 random square matrix, (b) Type 1 random short-fat matrix, (c) Type 1 random tall-skinny matrix, (d) Type 2 GEMAT11. Axes: top 20 singular values versus index; curves: FFSRQR, RSISVD, LANSVD, TUXV.

Figure 4.4 displays a comparison of the efficiency and accuracy of the different methods on a $400 \times 400 \times 400$ sparse tensor approximation problem. MLSVD_LTSVD is the fastest but the least accurate one. The other three methods have similar accuracy, while MLSVD_FFSRQR is faster when the target rank $k$ is larger.

4.2.2. Handwritten digits classification. MNIST is a handwritten digits image data set created by Yann LeCun [43]. Every digit is represented by a $28 \times 28$ pixel image. Handwritten digits classification has the objective of training a classification model to classify new unlabeled images. A HOSVD algorithm is proposed by Savas and Eldén [60] to classify handwritten digits. To reduce the training time, a more efficient ST-HOSVD algorithm is introduced in [71].

We perform handwritten digits classification using MNIST, which consists of 60,000 training images and 10,000 test images. The number of training images in each class is restricted to 5421, so that the training set is equally distributed over all classes. The training set is represented by a tensor $\mathcal{X}$ of size $786 \times 5421 \times 10$. The classification relies on Algorithm 2 in [60]. We use the various algorithms to obtain an approximation $\mathcal{X} \approx \mathcal{G} \times_1 U_1 \times_2 U_2 \times_3 U_3$, where the core tensor $\mathcal{G}$ has size $65 \times 142 \times 10$.

The results are summarized in Table 4.3. In terms of run time, we observe that our method MLSVD_FFSRQR is comparable to MLSVD_RSI, while MLSVD is the most expensive


FIG. 4.4. Run time and relative approximation error comparison on a sparse tensor. Two panels: relative approximation error versus target rank k, and run time (sec) versus target rank k; curves: MLSVD, MLSVD_FFSRQR, MLSVD_RSI, MLSVD_LTSVD.

TABLE 4.3
Comparison on handwritten digits classification.

                          MLSVD     MLSVD_FFSRQR   MLSVD_RSI   MLSVD_LTSVD
Training time [sec]       272.121   15.455         19.343      0.4266
Relative model error      0.4099    0.4273         0.4247      0.5162
Classification accuracy   95.19%    94.98%         95.05%      92.59%

one, and MLSVD_LTSVD is the fastest one. In terms of classification quality, MLSVD, MLSVD_FFSRQR, and MLSVD_RSI are comparable, while MLSVD_LTSVD is the least accurate one.

4.3. Solving nuclear norm minimization problems. To show the effectiveness of the FFSRQR algorithm for nuclear norm minimization problems, we investigate two scenarios: the robust PCA problem (2.7) and the matrix completion problem (2.8). The test matrix used for robust PCA is introduced in [45], and the test matrices used in matrix completion are two real data sets. We use the IALM method [45] to solve both problems; the IALM code can be downloaded.¹

4.3.1. Robust PCA. To solve the robust PCA problem, we replace the approximate SVD part in the IALM method [45] by the various methods. We denote the actual solution to the robust PCA problem by a matrix pair $(X^{*}, E^{*}) \in \mathbb{R}^{m\times n} \times \mathbb{R}^{m\times n}$. The matrix $X^{*} = X_L X_R^{T}$, with $X_L \in \mathbb{R}^{m\times k}$ and $X_R \in \mathbb{R}^{n\times k}$ being random matrices whose entries are independently sampled from a normal distribution. The sparse matrix $E^{*}$ is a random matrix whose non-zero entries are independently sampled from a uniform distribution over the interval $[-500, 500]$. The input to the IALM algorithm has the form $M = X^{*} + E^{*}$, and the output is denoted by $(\widehat{X}, \widehat{E})$. In this numerical experiment, we use the same parameter settings as the IALM code for robust PCA: the rank $k$ is $0.1m$, and the number of non-zero entries in $E$ is $0.05m^2$. We choose the trade-off parameter $\lambda = 1/\sqrt{\max(m,n)}$, as suggested by Candès et al. [8]. The solution quality is measured by the normalized root mean square error $\|\widehat{X} - X^{*}\|_F / \|X^{*}\|_F$.
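A Matlab sketch of this test-problem construction, for the smallest problem size, reads as follows (the variable names are ours):

  m = 1000; n = 1000;
  k = round(0.1*m);                         % rank of the low-rank part
  XL = randn(m, k); XR = randn(n, k);
  Xstar = XL*XR.';
  nnzE = round(0.05*m^2);                   % number of non-zero entries of E*
  Estar = zeros(m, n);
  idx = randperm(m*n, nnzE);
  Estar(idx) = -500 + 1000*rand(1, nnzE);   % uniform on [-500, 500]
  M = Xstar + Estar;
  lambda = 1/sqrt(max(m, n));               % trade-off parameter
  rel_err = @(X) norm(X - Xstar, 'fro')/norm(Xstar, 'fro');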

Table 4.4 reports the relative error, the run time, the number of non-zero entries in $\widehat{E}$ ($\|\widehat{E}\|_0$), the iteration count, and the number of non-zero singular values (#sv) in $\widehat{X}$ of the IALM

¹ http://perception.csl.illinois.edu/matrix-rank/sample_code.html


algorithm using the different approximate SVD methods. We observe that IALM_FFSRQR is faster than all the other three methods, while its error is comparable to IALM_LANSVD and IALM_RSISVD. IALM_LTSVD is relatively slow and not effective.

TABLE 4.4
Comparison on robust PCA.

Size         Method        Error     Time (sec)  ‖E‖_0     Iter  #sv
1000 x 1000  IALM_LANSVD   3.33e-07  5.79e+00    50000     22    100
             IALM_FFSRQR   2.79e-07  1.02e+00    50000     25    100
             IALM_RSISVD   3.36e-07  1.09e+00    49999     22    100
             IALM_LTSVD    9.92e-02  3.11e+00    999715    100   100
2000 x 2000  IALM_LANSVD   2.61e-07  5.91e+01    199999    22    200
             IALM_FFSRQR   1.82e-07  6.93e+00    199998    25    200
             IALM_RSISVD   2.63e-07  7.38e+00    199996    22    200
             IALM_LTSVD    8.42e-02  2.20e+01    3998937   100   200
4000 x 4000  IALM_LANSVD   1.38e-07  4.65e+02    799991    23    400
             IALM_FFSRQR   1.39e-07  4.43e+01    800006    26    400
             IALM_RSISVD   1.51e-07  5.04e+01    799990    23    400
             IALM_LTSVD    8.94e-02  1.54e+02    15996623  100   400
6000 x 6000  IALM_LANSVD   1.30e-07  1.66e+03    1799982   23    600
             IALM_FFSRQR   1.02e-07  1.42e+02    1799993   26    600
             IALM_RSISVD   1.44e-07  1.62e+02    1799985   23    600
             IALM_LTSVD    8.58e-02  5.55e+02    35992605  100   600

4.3.2. Matrix completion. We solve the matrix completion problems for two real data sets used in [68]: the Jester joke data set [24] and the MovieLens data set [32]. The Jester joke data set consists of 4.1 million ratings for 100 jokes from 73,421 users and can be downloaded from the website http://goldberg.berkeley.edu/jester-data. We test with the following data matrices:
• jester-1: data from 24,983 users who have rated 36 or more jokes;
• jester-2: data from 23,500 users who have rated 36 or more jokes;
• jester-3: data from 24,938 users who have rated between 15 and 35 jokes;
• jester-all: the combination of jester-1, jester-2, and jester-3.
The MovieLens data set can be downloaded from https://grouplens.org/datasets/movielens. We test with the following data matrices:
• movie-100K: 100,000 ratings of 943 users for 1682 movies;
• movie-1M: 1 million ratings of 6040 users for 3900 movies;
• movie-latest-small: 100,000 ratings of 700 users for 9000 movies.

For each data set, we let $M$ be the original data matrix, where $M_{ij}$ stands for the rating of joke (movie) $j$ by user $i$, and $\Gamma$ be the set of indices where $M_{ij}$ is known. The matrix completion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE), defined by

\[
\mathrm{NMAE} \overset{\mathrm{def}}{=}
\frac{\frac{1}{|\Gamma|}\sum_{(i,j)\in\Gamma} |M_{ij} - X_{ij}|}{r_{\max} - r_{\min}},
\]


where $X_{ij}$ is the prediction of the rating of joke (movie) $j$ given by user $i$, and $r_{\min}$, $r_{\max}$ are the lower and upper bounds of the ratings, respectively. For the Jester joke data sets we set $r_{\min} = -10$ and $r_{\max} = 10$. For the MovieLens data sets we set $r_{\min} = 1$ and $r_{\max} = 5$.
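Assuming the known entries of Γ are stored as linear indices idx (our notation) into M and the prediction X, the NMAE can be computed in one line of Matlab:

  nmae = sum(abs(M(idx) - X(idx))) / (numel(idx) * (rmax - rmin));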

Since $|\Gamma|$ is large, we randomly select a subset $\Omega$ from $\Gamma$ and then use Algorithm 8 to solve problem (2.8). We randomly select 10 ratings for each user in the Jester joke data sets, while we randomly choose about 50% of the ratings for each user in the MovieLens data sets. Table 4.5 lists the parameter settings used in the algorithms. The maximum iteration number in IALM is 100, and all other parameters are the same as those used in [45].
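A sketch of the per-user subsampling for the Jester data, with Γ stored as a logical mask KNOWN of the same size as M (our notation), is:

  Omega = false(size(M));
  for i = 1:size(M, 1)
      ji = find(KNOWN(i, :));                               % known ratings of user i
      pick = ji(randperm(numel(ji), min(10, numel(ji))));   % keep 10 of them
      Omega(i, pick) = true;
  end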

The numerical results are reported in Table 4.6. We observe that IALM_FFSRQR achieves almost the same recoverability as the other methods (except IALM_LTSVD) and is slightly faster than IALM_RSISVD for these two data sets.

TABLE 4.5
Parameters used in the IALM method on matrix completion.

Data set             m      n     |Γ|        |Ω|
jester-1             24983  100   1.81e+06   249830
jester-2             23500  100   1.71e+06   235000
jester-3             24938  100   6.17e+05   249384
jester-all           73421  100   4.14e+06   734210
movie-100K           943    1682  1.00e+05   49918
movie-1M             6040   3706  1.00e+06   498742
movie-latest-small   671    9066  1.00e+05   52551

5. Conclusions. We have presented the Flip-Flop SRQR factorization, a variant of the QLP factorization, for computing low-rank matrix approximations. The Flip-Flop SRQR algorithm uses an SRQR factorization to initialize the truncated version of a column-pivoted QR factorization and then forms an LQ factorization. For the numerical results presented, the errors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms. This new algorithm is cheaper to compute and produces quality low-rank matrix approximations. Furthermore, we prove singular value lower bounds and residual error upper bounds for the Flip-Flop SRQR factorization. In situations where the singular values of the input matrix decay relatively quickly, the low-rank approximation computed by Flip-Flop SRQR is guaranteed to be as accurate as the truncated SVD. We also perform a complexity analysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomized subspace iteration. Future work includes reducing the overhead cost in Flip-Flop SRQR and implementing the Flip-Flop SRQR algorithm on distributed memory machines for popular applications such as distributed PCA.

Acknowledgments. The authors would like to thank Yousef Saad, the editor, and the three anonymous referees, who kindly reviewed an earlier version of this manuscript and provided valuable suggestions and comments.

Appendix A. Partial QR factorization with column pivoting.
The details of partial QRCP are covered in Algorithm 4.

Appendix B. TUXV algorithm.
The details of the TUXV algorithm are presented in Algorithm 5.

Appendix C. Computing the Tucker decomposition of higher-order tensors.
Algorithm 6 computes the Tucker decomposition of higher-order tensors, i.e., the sequentially truncated higher-order SVD.


TABLE 4.6
Comparison on matrix completion.

Data set             Method        Iter  Time      NMAE      #sv  σ_max     σ_min
jester-1             IALM-LANSVD   12    7.06e+00  1.84e-01  100  2.14e+03  1.00e+00
                     IALM-FFSRQR   12    3.44e+00  1.69e-01  100  2.28e+03  1.00e+00
                     IALM-RSISVD   12    3.75e+00  1.89e-01  100  2.12e+03  1.00e+00
                     IALM-LTSVD    100   2.11e+01  1.74e-01  62   3.00e+03  1.00e+00
jester-2             IALM-LANSVD   12    6.80e+00  1.85e-01  100  2.13e+03  1.00e+00
                     IALM-FFSRQR   12    2.79e+00  1.70e-01  100  2.29e+03  1.00e+00
                     IALM-RSISVD   12    3.59e+00  1.91e-01  100  2.12e+03  1.00e+00
                     IALM-LTSVD    100   2.03e+01  1.75e-01  58   2.96e+03  1.00e+00
jester-3             IALM-LANSVD   12    7.05e+00  1.26e-01  99   1.79e+03  1.00e+00
                     IALM-FFSRQR   12    3.03e+00  1.22e-01  100  1.71e+03  1.00e+00
                     IALM-RSISVD   12    3.85e+00  1.31e-01  100  1.78e+03  1.00e+00
                     IALM-LTSVD    100   2.12e+01  1.33e-01  55   2.50e+03  1.00e+00
jester-all           IALM-LANSVD   12    2.39e+01  1.72e-01  100  3.56e+03  1.00e+00
                     IALM-FFSRQR   12    1.12e+01  1.62e-01  100  3.63e+03  1.00e+00
                     IALM-RSISVD   12    1.34e+01  1.82e-01  100  3.47e+03  1.00e+00
                     IALM-LTSVD    100   6.99e+01  1.68e-01  52   4.92e+03  1.00e+00
movie-100K           IALM-LANSVD   29    2.86e+01  1.83e-01  285  1.21e+03  1.00e+00
                     IALM-FFSRQR   30    4.55e+00  1.67e-01  295  1.53e+03  1.00e+00
                     IALM-RSISVD   29    4.82e+00  1.82e-01  285  1.29e+03  1.00e+00
                     IALM-LTSVD    48    1.42e+01  1.47e-01  475  1.91e+03  1.00e+00
movie-1M             IALM-LANSVD   50    7.40e+02  1.58e-01  495  4.99e+03  1.00e+00
                     IALM-FFSRQR   53    2.07e+02  1.37e-01  525  6.63e+03  1.00e+00
                     IALM-RSISVD   50    2.23e+02  1.57e-01  495  5.35e+03  1.00e+00
                     IALM-LTSVD    100   8.50e+02  1.17e-01  995  8.97e+03  1.00e+00
movie-latest-small   IALM-LANSVD   31    1.66e+02  1.85e-01  305  1.13e+03  1.00e+00
                     IALM-FFSRQR   31    1.96e+01  2.00e-01  305  1.42e+03  1.00e+00
                     IALM-RSISVD   31    2.85e+01  1.91e-01  305  1.20e+03  1.00e+00
                     IALM-LTSVD    63    4.02e+01  2.08e-01  298  1.79e+03  1.00e+00

Appendix D. IALM for solving two special nuclear norm minimization problems.
Algorithms 7 and 8 solve the robust PCA and matrix completion problems, respectively. In Algorithm 7, $S_{\omega}(x) = \mathrm{sgn}(x)\cdot\max(|x| - \omega, 0)$ is the soft shrinkage operator [29], with $x \in \mathbb{R}^n$ and $\omega > 0$. In Algorithm 8, $\bar{\Omega}$ is the complement of $\Omega$.
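In Matlab, this operator can be written as a one-line anonymous function applied elementwise:

  soft = @(x, w) sign(x) .* max(abs(x) - w, 0);   % S_w(x) = sgn(x).*max(|x|-w, 0)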

Appendix E. Approximate SVD with randomized subspace iteration.
The randomized subspace iteration was proposed in [30, Algorithm 4.4] to compute an orthonormal matrix whose range approximates the range of A. An approximate SVD can be computed using the aforementioned orthonormal matrix [30, Algorithm 5.1]. A randomized subspace iteration is used in the routine MLSVD_RSI in the Matlab toolbox tensorlab [73], and MLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD. We summarize the approximate SVD with randomized subspace iteration in Algorithm 9.

Now we perform a complexity analysis of the approximate SVD with randomized subspace iteration. We first note that:
1. The cost of generating a random matrix is negligible.
2. The cost of computing $B = A\Omega$ is $2mn(k+p)$.


Algorithm 4 Partial QRCP
Inputs: Matrix $A \in \mathbb{R}^{m\times n}$. Target rank $k$.
Outputs: Orthogonal matrix $Q \in \mathbb{R}^{m\times m}$; upper trapezoidal matrix $R \in \mathbb{R}^{m\times n}$; permutation matrix $\Pi \in \mathbb{R}^{n\times n}$ such that $A\Pi = QR$.
Algorithm:
  Initialize $\Pi = I_n$. Compute column norms $r_s = \|A(1{:}m, s)\|_2$ $(1 \le s \le n)$.
  for $j = 1 : k$ do
    Find $i = \arg\max_{j \le s \le n} r_s$. Exchange $r_j$ and $r_i$, and exchange columns $j$ and $i$ of $A$ and $\Pi$.
    Form Householder reflection $Q_j$ from $A(j{:}m, j)$.
    Update trailing matrix $A(j{:}m, j{:}n) \leftarrow Q_j^T A(j{:}m, j{:}n)$.
    Update $r_s = \|A(j{+}1{:}m, s)\|_2$ $(j+1 \le s \le n)$.
  end for
  $Q = Q_1 Q_2 \cdots Q_k$ is the product of all reflections; $R$ = upper trapezoidal part of $A$.

Algorithm 5 TUXV Algorithm
Inputs: Matrix $A \in \mathbb{R}^{m\times n}$. Target rank $k$. Block size $b$. Oversampling size $p \ge 0$.
Outputs: Column-orthonormal matrices $U \in \mathbb{R}^{m\times k}$, $V \in \mathbb{R}^{n\times k}$ and upper triangular matrix $R \in \mathbb{R}^{k\times k}$ such that $A \approx U R V^T$.
Algorithm:
  Do TRQRCP on $A$ to obtain $Q \in \mathbb{R}^{m\times k}$, $R \in \mathbb{R}^{k\times n}$, and $\Pi \in \mathbb{R}^{n\times n}$.
  $R = R\,\Pi^T$, and do an LQ factorization, i.e., $[V, R] = \mathrm{qr}(R^T, 0)$.
  Compute $Z = AV$ and do a QR factorization, i.e., $[U, R] = \mathrm{qr}(Z, 0)$.

3. In each QR step $[Q, \sim] = \mathrm{qr}(B, 0)$, the cost of computing the QR factorization of $B$ is $2m(k+p)^2 - \frac{2}{3}(k+p)^3$ (cf. [69]), and the cost of forming the first $(k+p)$ columns of the full $Q$ matrix is $m(k+p)^2 + \frac{1}{3}(k+p)^3$.

Now we count the flops for each $i$ in the for loop:
1. The cost of computing $B = A^T * Q$ is $2mn(k+p)$.
2. The cost of computing $[Q, \sim] = \mathrm{qr}(B, 0)$ is $2n(k+p)^2 - \frac{2}{3}(k+p)^3$, and the cost of forming the first $(k+p)$ columns of the full $Q$ matrix is $n(k+p)^2 + \frac{1}{3}(k+p)^3$.
3. The cost of computing $B = A * Q$ is $2mn(k+p)$.
4. The cost of computing $[Q, \sim] = \mathrm{qr}(B, 0)$ is $2m(k+p)^2 - \frac{2}{3}(k+p)^3$, and the cost of forming the first $(k+p)$ columns of the full $Q$ matrix is $m(k+p)^2 + \frac{1}{3}(k+p)^3$.

Together, the cost of running the for loop $q$ times is

\[
q\left(4mn(k+p) + 3(m+n)(k+p)^2 - \frac{2}{3}(k+p)^3\right).
\]

Additionally, the cost of computing $B = Q^T * A$ is $2mn(k+p)$, the cost of computing the SVD of $B$ is $O(n(k+p)^2)$, and the cost of computing $U = Q * U$ is $2m(k+p)^2$.


Algorithm 6 ST-HOSVD
Inputs: Tensor $\mathcal{X} \in \mathbb{R}^{I_1\times\cdots\times I_d}$, desired rank $(k_1, \ldots, k_d)$, and processing order $p = (p_1, \ldots, p_d)$.
Outputs: Tucker decomposition $[\mathcal{G}; U_1, \ldots, U_d]$.
Algorithm:
  Define tensor $\mathcal{G} \leftarrow \mathcal{X}$.
  for $j = 1 : d$ do
    $r = p_j$.
    Compute the exact or approximate rank-$k_r$ SVD of the tensor unfolding: $\mathcal{G}_{(r)} \approx \widehat{U}_r \Sigma_r \widehat{V}_r^T$.
    $U_r \leftarrow \widehat{U}_r$.
    Update $\mathcal{G}_{(r)} \leftarrow \Sigma_r \widehat{V}_r^T$, i.e., apply $\widehat{U}_r^T$ to $\mathcal{G}$.
  end for
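For concreteness, a minimal Matlab sketch of Algorithm 6 for a third-order tensor, with processing order (1, 2, 3) and svds as the rank-k_r SVD subroutine, could look as follows (a simplified illustration, not the tensorlab implementation used in the experiments):

  function [G, U] = st_hosvd3(X, ranks)
  % Sequentially truncated HOSVD of a third-order tensor (sketch of Algorithm 6).
  G = X; U = cell(1, 3);
  for r = 1:3
      sz = size(G); sz(end+1:3) = 1;
      rest = setdiff(1:3, r);
      Gr = reshape(permute(G, [r rest]), sz(r), []);         % mode-r unfolding
      [Ur, Sr, Vr] = svds(Gr, ranks(r));                     % rank-k_r truncated SVD
      U{r} = Ur;
      sz(r) = ranks(r);
      G = ipermute(reshape(Sr*Vr', sz([r rest])), [r rest]); % apply Ur' to G
  end
  end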

Algorithm 7 Robust PCA Using IALM
Inputs: Measured matrix $M \in \mathbb{R}^{m\times n}$, positive number $\lambda$, $\mu_0$, $\bar{\mu}$, tolerance tol, $\rho > 1$.
Outputs: Matrix pair $(X_k, E_k)$.
Algorithm:
  $k = 0$; $J(M) = \max\left(\|M\|_2, \lambda^{-1}\|M\|_F\right)$; $Y_0 = M / J(M)$; $E_0 = 0$.
  while not converged do
    $(U, \Sigma, V) = \mathrm{svd}\left(M - E_k + \mu_k^{-1} Y_k\right)$
    $X_{k+1} = U\, S_{\mu_k^{-1}}(\Sigma)\, V^T$
    $E_{k+1} = S_{\lambda\mu_k^{-1}}\left(M - X_{k+1} + \mu_k^{-1} Y_k\right)$
    $Y_{k+1} = Y_k + \mu_k\left(M - X_{k+1} - E_{k+1}\right)$
    Update $\mu_{k+1} = \min(\rho\,\mu_k, \bar{\mu})$
    $k = k + 1$
    if $\|M - X_k - E_k\|_F / \|M\|_F < \mathrm{tol}$ then break
  end while

Now assume $k + p \ll \min(m, n)$ and omit the lower-order terms; then we arrive at $(4q + 4)\,mn(k+p)$ as the complexity of the approximate SVD with randomized subspace iteration. In practice, $q$ is usually chosen to be an integer between 0 and 2.
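As a quick numerical illustration, the leading-order flop count above can be evaluated as follows (a helper of our own, not part of any of the cited codes):

  rsi_flops = @(m, n, k, p, q) (4*q + 4) * m * n * (k + p);
  rsi_flops(10000, 10000, 500, 5, 1)   % about 4.04e11 flops for the Type 1 square matrix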

REFERENCES

[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106

[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZ A GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Users' Guide 3rd ed SIAM Philadelphia 1999

[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103

[4] M W BERRY S T DUMAIS AND G W O'BRIEN Using linear algebra for intelligent information retrieval SIAM Rev 37 (1995) pp 573-595

[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci


Algorithm 8 Matrix Completion Using IALM
Inputs: Sampled set $\Omega$, sampled entries $\pi_{\Omega}(M)$, positive number $\lambda$, $\mu_0$, $\bar{\mu}$, tolerance tol, $\rho > 1$.
Outputs: Matrix pair $(X_k, E_k)$.
Algorithm:
  $k = 0$; $Y_0 = 0$; $E_0 = 0$.
  while not converged do
    $(U, \Sigma, V) = \mathrm{svd}\left(M - E_k + \mu_k^{-1} Y_k\right)$
    $X_{k+1} = U\, S_{\mu_k^{-1}}(\Sigma)\, V^T$
    $E_{k+1} = \pi_{\bar{\Omega}}\left(M - X_{k+1} + \mu_k^{-1} Y_k\right)$
    $Y_{k+1} = Y_k + \mu_k\left(M - X_{k+1} - E_{k+1}\right)$
    Update $\mu_{k+1} = \min(\rho\,\mu_k, \bar{\mu})$
    $k = k + 1$
    if $\|M - X_k - E_k\|_F / \|M\|_F < \mathrm{tol}$ then break
  end while

Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-

holder transformations Numer Math 7 (1965) pp 269-276
[7] J-F CAI E J CANDÈS AND Z SHEN A singular value thresholding algorithm for matrix completion SIAM J Optim 20 (2010) pp 1956-1982
[8] E J CANDÈS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011) Art 11 37 pages
[9] E J CANDÈS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9 (2009) pp 717-772
[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition Psychometrika 35 (1970) pp 283-319
[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38

(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in

Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16

[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278

[14] On the best rank-1 and rank-(R1, R2, . . . , RN) approximation of higher-order tensors SIAM J Matrix Anal Appl 21 (2000) pp 1324-1342

[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55

[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183

[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291

[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218

[19] L ELDÉN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007
[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to

Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162

[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977

[22] C FRANK AND E NÖTH Optimizing eigenfaces by face masks for facial expression recognition in Computer Analysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes in Comput Sci Springer Berlin 2003 pp 646-654


Algorithm 9 Approximate SVD with Randomized Subspace Iteration
Inputs: Matrix $A \in \mathbb{R}^{m\times n}$. Target rank $k$. Oversampling size $p \ge 0$. Number of iterations $q \ge 1$.
Outputs: $U \in \mathbb{R}^{m\times k}$ contains the approximate top $k$ left singular vectors of $A$; $\Sigma \in \mathbb{R}^{k\times k}$ contains the approximate top $k$ singular values of $A$; $V \in \mathbb{R}^{n\times k}$ contains the approximate top $k$ right singular vectors of $A$.
Algorithm:
  Generate an i.i.d. Gaussian matrix $\Omega \in \mathcal{N}(0,1)^{n\times(k+p)}$.
  Compute $B = A\Omega$; $[Q, \sim] = \mathrm{qr}(B, 0)$.
  for $i = 1 : q$ do
    $B = A^T * Q$; $[Q, \sim] = \mathrm{qr}(B, 0)$
    $B = A * Q$; $[Q, \sim] = \mathrm{qr}(B, 0)$
  end for
  $B = Q^T * A$; $[U, \Sigma, V] = \mathrm{svd}(B)$
  $U = Q * U$; $U = U(:, 1{:}k)$; $\Sigma = \Sigma(1{:}k, 1{:}k)$; $V = V(:, 1{:}k)$
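A minimal Matlab transliteration of Algorithm 9 (a sketch; not the mex-wrapped implementation used in the experiments) is:

  function [U, S, V] = rsi_svd(A, k, p, q)
  % Approximate SVD with randomized subspace iteration (Algorithm 9).
  n = size(A, 2);
  Omega = randn(n, k + p);              % i.i.d. Gaussian test matrix
  [Q, ~] = qr(A*Omega, 0);
  for i = 1:q
      [Q, ~] = qr(A'*Q, 0);
      [Q, ~] = qr(A*Q, 0);
  end
  [U, S, V] = svd(Q'*A, 'econ');
  U = Q*U;
  U = U(:, 1:k); S = S(1:k, 1:k); V = V(:, 1:k);
  end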

[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480

[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151

[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press

Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)

pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization

SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and

convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic

algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217-288
[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an "explanatory"

multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing

collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237

[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991
[34] D A HUCKABY AND T F CHAN On the convergence of Stewart's QLP algorithm for approximating the SVD Numer Algorithms 32 (2003) pp 287-316
[35] Stewart's pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)

pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in

Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206


[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash

500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at

http://sun.stanford.edu/~rmunk/PROPACK/
[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of

Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document

recognition Proc IEEE 86 (1998) pp 2278-2324
[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Users' Guide Solution of Large-Scale

Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted

low-rank matrices Preprint on arXiv https://arxiv.org/abs/1009.5055 2010
[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic

applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and

outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to

system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-

tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra

PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite

linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-

structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE

Trans Syst Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347-364
[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding

for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493

[55] G QUINTANA-ORTÍ X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with column pivoting SIAM J Sci Comput 19 (1998) pp 1486-1494

[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501

[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction in Proceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel eds ACM Press New York 2005 pp 713-719

[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124

[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249

[60] B SAVAS AND L ELDÉN Handwritten digit classification using higher order singular value decomposition Pattern Recognition 40 (2007) pp 993-1003

[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57

[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799

[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388

[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482

[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336

[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348


[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603

[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640

[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash

311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-

order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in

European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460

[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guide available at http://www.tensorlab.net

[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298

[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242

[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550

Page 11: FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND …etna.mcs.kent.edu/vol.51.2019/pp469-494.dir/pp469-494.pdfminimization. In Section3, we introduce Flip-Flop SRQR and analyze its

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 479

Combining with (310) it now follows that

∥∥Aminus UkΣkVTk

∥∥2

=

∥∥∥∥∥AΠminusQ[R11 R12

0

]TQT

∥∥∥∥∥2

=

∥∥∥∥[δR11 δR12

R22

]∥∥∥∥2

le σk+1 (A)4

radic1 + 2

(R222σk+1 (A)

)4

To obtain an upper bound of R222 in (35) and (36) we follow the analysis of SRQRin [75]

From the analysis of [75 Section IV] Algorithm 2 ensures that g1 leradic

1+ε1minusε and g2 le g

with high probability (the actual probability-guarantee formula can be found in [75 SectionIV]) where g1 and g2 are defined by (26) Here 0 lt ε lt 1 is a user defined parameter toadjust the choice of the oversampling size p used in the TRQRCP initialization part in SRQRg gt 1 is a user-defined parameter in the extra swapping part in SRQR Let

(312) τdef= g1g2

R222R2212

∥∥∥RminusT∥∥∥minus1

12

σl+1 (A)and τ

def= g1g2

R222R2212

∥∥RminusT11

∥∥minus1

12

σk (Σk)

where R is defined in equation (25) Since 1radicnX12 le X2 le

radicnX12 and

σi (X1) le σi (X) 1 le i le min (s t) for any matrix X isin Rmtimesn and submatrix X1 isin Rstimestof X we obtain that

τ = g1g2R222R2212

∥∥∥RminusT∥∥∥minus1

12

σl+1

(R) σl+1

(R)

σl+1 (A)le g1g2

radic(l + 1) (nminus l)

Using the fact that σl([R11 R12

])= σl

(R11

)and σk (Σk) = σk

([R11 R12

])by (32)

and (34)

τ = g1g2R222R2212

∥∥RminusT11

∥∥minus1

12

σl (R11)

σl (R11)

σl([R11 R12

]) σl ([R11 R12

])σk (Σk)

le g1g2

radicl (nminus l)

By the definition of τ

(313) R222 = τ σl+1 (A)

Plugging this into (36) yields (38)By the definition of τ we observe that

(314) R222 le τσk (Σk)

By (35) and (314)(315)

σj (Σk) ge σj (A)

4

radic1 + 2

(R222σj(Σk)

)4ge σj (A)

4

radic1 + 2

(R222σk(Σk)

)4ge σj (A)

4radic

1 + 2τ4 1 le j le k

ETNAKent State University and

Johann Radon Institute (RICAM)

480 Y FENG J XIAO AND M GU

On the other hand using (35)

σ4j (A) le σ4

j (Σk)

(1 + 2

σ4j (A)

σ4j (Σk)

R2242σ4j (A)

)

le σ4j (Σk)

1 + 2

(σ4j (Σk) + 2 R2242

)σ4j (Σk)

R2242σ4j (A)

le σ4

j (Σk)

(1 + 2

(1 + 2

R2242σ4k (Σk)

)R2242σ4j (A)

)

that is

σj (Σk) ge σj (A)

4

radic1 +

(2 + 4

R2242σ4k(Σk)

)R2242σ4j (A)

Plugging (313) and (314) into this above equation yields

(316) σj (Σk) ge σj (A)

4

radic1 + τ4 (2 + 4τ4)

(σl+1(A)σj(A)

)4 1 le j le k

Combining (315) with (316) we arrive at (37)We note that (35) and (36) still hold true if we replace k by lInequality (37) shows that with the definitions (312) of τ and τ Flip-Flop SRQR can

reveal at least a dimension-dependent fraction of all the leading singular values of A andindeed approximate them very accurately in case they decay relatively quickly Moreover (38)shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of4

radic1 + 2τ4

(σl+1(A)σk+1(A)

)4

from optimal In situations where the singular values of A decay

relatively quickly our rank-k approximation is about as accurate as the truncated SVD with achoice of l such that

σl+1 (A)

σk+1 (A)= o (1)

REMARK 33 The ldquohigh probabilityrdquo mentioned in Theorem 32 is dependent on theoversampling size p but not on the block size b We decided not to dive deep into the probabilityexplanation since it will make the proof much longer and more tedious Theoretically thecomputed oversampling size p for certain given high probability would be rather high Forexample to achieve a probability 095 we need to choose p to be at least 500 However inpractice we would never choose such a large p since it will deteriorate the code efficiencyIt turns out that a small p like 5 sim 10 would achieve good approximation results empiricallyThe detailed explanation about what does high probability mean here can be found in [75]In terms of the choice of the block size b it wonrsquot affect the analysis or the error boundsThe choice of b is purely for the performance of the codes Only a suitable b can take fulladvantage of the cache hierarchy and BLAS routines

4 Numerical experiments In this section we demonstrate the effectiveness and effi-ciency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments Firstly

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 481

TABLE 41Methods for approximate SVD

Method DescriptionLANSVD Approximate SVD using Lanczos bidiagonalization with partial

reorthogonalization [42]Its Fortran and Matlab implementations are from PROPACK [41]

FFSRQR Flip-Flop Spectrum-revealing QR factorizationIts Matlab implementation is using Mex-wrappedBLAS and LAPACK routines [2]

RSISVD Approximate SVD with randomized subspace iteration [30 27]Its Matlab implementation is from tensorlab toolbox [73]

TUXV Algorithm 5 in the appendixLTSVD Linear Time SVD [16]

We use its Matlab implementation by Ma et al [49]

we compare FFSRQR with other approximate SVD algorithms for a matrix approximationSecondly we compare FFSRQR with other methods on tensor approximation problems usingthe tensorlab toolbox [73] Thirdly we compare FFSRQR with other methods for the robustPCA problem and matrix completion problem All experiments are implemented on a laptopwith 29 GHz Intel Core i5 processor and 8 GB of RAM The codes based on LAPACK [2]and PROPACK [41] in Section 41 are in Fortran The codes in Sections 42 43 are in Matlabexcept that FFSRQR is implemented in Fortran and wrapped by mex

41 Approximate truncated SVD In this section we compare FFSRQR with otherapproximate SVD algorithms for low-rank matrix approximation All tested methods are listedin Table 41 Since LTSVD has only a Matlab implementation this method is only consideredin Sections 42 and 43 The test matrices areType 1 A isin Rmtimesn [66] is defined by A = U DV T + 01 middot dss middotE where U isin Rmtimess V isin

Rntimess are column-orthonormal matrices and D isin Rstimess is a diagonal matrix with sgeometrically decreasing diagonal entries from 1 to 10minus3 dss is the last diagonalentry of D E isin Rmtimesn is a random matrix where the entries are independentlysampled from a normal distribution N (0 1) In our numerical experiment we testthree different random matrices The square matrix has a size of 10000times 10000 theshort-fat matrix has a size of 1000times 10000 and the tall-skinny matrix has a size of10000times 1000

Type 2 A isin R4929times4929 is a real data matrix from the University of Florida sparse matrixcollection [11] Its corresponding file name is HBGEMAT11

For a given matrix A isin Rmtimesn the relative SVD approximation error is measured by∥∥Aminus UkΣkVTk

∥∥FAF where Σk contains the approximate top k singular values and

Uk Vk are the corresponding approximate top k singular vectors The parameters used inFFSRQR RSISVD TUXV and LTSVD are listed in Table 42 The additional time requiredto compute g2 is negligible In our experiments g2 always remains modest and never triggerssubsequent SRQR column swaps However computing g2 is still recommended since g2

serves as an insurance policy against potential quality degration by TRQRCP The block size bis chosen to be 32 according to the xGEQP3 LAPACK routine When k is less than 32 weimplement an unblock version of the partial QRCP

Figures 41ndash43 display the run time the relative approximation error and the top 20singular values comparison respectively for four different matrices In terms of speed FFSRQRis always faster than LANSVD and RSISVD Moreover the efficiency difference is more

ETNAKent State University and

Johann Radon Institute (RICAM)

482 Y FENG J XIAO AND M GU

TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD

Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n

obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV

100 200 300 400 500

target rank k

100

101

102

103

104

run tim

e (

sec)

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

100 200 300 400 500

target rank k

10-1

100

101

102

run tim

e (

sec)

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

100 200 300 400 500

target rank k

10-1

100

101

102

run tim

e (

sec)

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

100 200 300 400 500

target rank k

10-1

100

101

102

103

run tim

e (

sec)

FFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 41 Run time comparison for approximate SVD algorithms

42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483

100 200 300 400 500

target rank k

022

024

026

028

03

032

034

036re

lative

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

100 200 300 400 500

target rank k

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

100 200 300 400 500

target rank k

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

100 200 300 400 500

target rank k

024

026

028

03

032

034

036

rela

tive

ap

pro

xim

atio

n e

rro

rFFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 42 Relative approximation error comparison for approximate SVD algorithms

algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab

421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]

X =

10sumj=1

1000

jxj yj zj +

nsumj=11

1

jxj yj zj

where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2

U2 times3 U3

ETNAKent State University and

Johann Radon Institute (RICAM)

484 Y FENG J XIAO AND M GU

5 10 15 20

index

06

07

08

09

1to

p 2

0 s

ingula

r valu

es

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

5 10 15 20

index

06

07

08

09

1

top 2

0 s

ingula

r valu

es

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

5 10 15 20

index

06

065

07

075

08

085

09

095

top

20

sin

gu

lar

va

lue

s

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

5 10 15 20

index

101

102

103

top

20

sin

gu

lar

va

lue

sFFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 43 Top 20 singular values comparison for approximate SVD algorithms

Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger

422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]

We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3

where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method

MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485

50 100 150 200

target rank k

10-5

10-4

10-3

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

MLSVD

MLSVD_FFSRQR

MLSVD_RSI

MLSVD_LTSVD

50 100 150 200

target rank k

10-2

10-1

100

101

102

run tim

e (

sec)

MLSVD

MLSVD_FFSRQR

MLSVD_RSI

MLSVD_LTSVD

FIG 44 Run time and relative approximation error comparison on a sparse tensor

TABLE 43Comparison on handwritten digits classification

MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD

Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259

one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one

43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1

431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX

TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are

independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by

(X E

) In this numerical experiment we use the same parameter

settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1

radicmax (mn) as suggested by

Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF

Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM

1httpperceptioncslillinoisedumatrix-ranksample_codehtml

ETNAKent State University and

Johann Radon Institute (RICAM)

486 Y FENG J XIAO AND M GU

algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective

TABLE 44Comparison on robust PCA

Size Method Error Time (sec) E0 Iter sv

1000times 1000

IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100

2000times 2000

IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200

4000times 4000

IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400

6000times 6000

IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600

432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices

bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3

The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies

For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by

NMAEdef=

1|Γ|sum

(ij)isinΓ |Mij minusXij |rmax minus rmin

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487

where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5

Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]

The numerical results are reported in Table 4.6. We observe that IALM_FFSRQR achieves almost the same recoverability as the other methods (except IALM_LTSVD) and is slightly faster than IALM_RSISVD for these two data sets.

TABLE 4.5: Parameters used in the IALM method on matrix completion.

Data set            m      n     |Γ|       |Ω|
jester-1            24983  100   1.81e+06  249830
jester-2            23500  100   1.71e+06  235000
jester-3            24938  100   6.17e+05  249384
jester-all          73421  100   4.14e+06  734210
movie-100K            943  1682  1.00e+05   49918
movie-1M             6040  3706  1.00e+06  498742
movie-latest-small    671  9066  1.00e+05   52551

5. Conclusions. We have presented the Flip-Flop SRQR factorization, a variant of the QLP factorization, for computing low-rank matrix approximations. The Flip-Flop SRQR algorithm uses an SRQR factorization to initialize a truncated version of the column-pivoted QR factorization and then forms an LQ factorization. For the numerical results presented, the errors of the proposed algorithm are comparable to those obtained by the other state-of-the-art algorithms, while the new algorithm is cheaper to compute and produces quality low-rank matrix approximations. Furthermore, we prove singular value lower bounds and residual error upper bounds for the Flip-Flop SRQR factorization. In situations where the singular values of the input matrix decay relatively quickly, the low-rank approximation computed by Flip-Flop SRQR is guaranteed to be as accurate as the truncated SVD. We also perform a complexity analysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomized subspace iteration. Future work includes reducing the overhead cost of Flip-Flop SRQR and implementing the Flip-Flop SRQR algorithm on distributed-memory machines for popular applications such as distributed PCA.

Acknowledgments. The authors would like to thank Yousef Saad, the editor, and the three anonymous referees, who kindly reviewed an earlier version of this manuscript and provided valuable suggestions and comments.

Appendix A. Partial QR factorization with column pivoting. The details of partial QRCP are given in Algorithm 4.

Appendix B. TUXV algorithm. The details of the TUXV algorithm are presented in Algorithm 5.

Appendix C. Computing the Tucker decomposition of higher-order tensors. Algorithm 6 computes the Tucker decomposition of higher-order tensors, i.e., the sequentially truncated higher-order SVD (ST-HOSVD).


TABLE 4.6: Comparison on matrix completion.

Data set            Method        Iter  Time (sec)  NMAE      sv   σ_max     σ_min
jester-1            IALM-LANSVD     12  7.06e+00    1.84e-01  100  2.14e+03  1.00e+00
                    IALM-FFSRQR     12  3.44e+00    1.69e-01  100  2.28e+03  1.00e+00
                    IALM-RSISVD     12  3.75e+00    1.89e-01  100  2.12e+03  1.00e+00
                    IALM-LTSVD     100  2.11e+01    1.74e-01   62  3.00e+03  1.00e+00
jester-2            IALM-LANSVD     12  6.80e+00    1.85e-01  100  2.13e+03  1.00e+00
                    IALM-FFSRQR     12  2.79e+00    1.70e-01  100  2.29e+03  1.00e+00
                    IALM-RSISVD     12  3.59e+00    1.91e-01  100  2.12e+03  1.00e+00
                    IALM-LTSVD     100  2.03e+01    1.75e-01   58  2.96e+03  1.00e+00
jester-3            IALM-LANSVD     12  7.05e+00    1.26e-01   99  1.79e+03  1.00e+00
                    IALM-FFSRQR     12  3.03e+00    1.22e-01  100  1.71e+03  1.00e+00
                    IALM-RSISVD     12  3.85e+00    1.31e-01  100  1.78e+03  1.00e+00
                    IALM-LTSVD     100  2.12e+01    1.33e-01   55  2.50e+03  1.00e+00
jester-all          IALM-LANSVD     12  2.39e+01    1.72e-01  100  3.56e+03  1.00e+00
                    IALM-FFSRQR     12  1.12e+01    1.62e-01  100  3.63e+03  1.00e+00
                    IALM-RSISVD     12  1.34e+01    1.82e-01  100  3.47e+03  1.00e+00
                    IALM-LTSVD     100  6.99e+01    1.68e-01   52  4.92e+03  1.00e+00
movie-100K          IALM-LANSVD     29  2.86e+01    1.83e-01  285  1.21e+03  1.00e+00
                    IALM-FFSRQR     30  4.55e+00    1.67e-01  295  1.53e+03  1.00e+00
                    IALM-RSISVD     29  4.82e+00    1.82e-01  285  1.29e+03  1.00e+00
                    IALM-LTSVD      48  1.42e+01    1.47e-01  475  1.91e+03  1.00e+00
movie-1M            IALM-LANSVD     50  7.40e+02    1.58e-01  495  4.99e+03  1.00e+00
                    IALM-FFSRQR     53  2.07e+02    1.37e-01  525  6.63e+03  1.00e+00
                    IALM-RSISVD     50  2.23e+02    1.57e-01  495  5.35e+03  1.00e+00
                    IALM-LTSVD     100  8.50e+02    1.17e-01  995  8.97e+03  1.00e+00
movie-latest-small  IALM-LANSVD     31  1.66e+02    1.85e-01  305  1.13e+03  1.00e+00
                    IALM-FFSRQR     31  1.96e+01    2.00e-01  305  1.42e+03  1.00e+00
                    IALM-RSISVD     31  2.85e+01    1.91e-01  305  1.20e+03  1.00e+00
                    IALM-LTSVD      63  4.02e+01    2.08e-01  298  1.79e+03  1.00e+00

Appendix D. IALM for solving two special nuclear norm minimization problems. Algorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems, respectively. In Algorithm 7, $S_{\omega}(x) = \mathrm{sgn}(x)\cdot\max(|x|-\omega,\,0)$ is the soft shrinkage operator [29] with $x \in \mathbb{R}^n$ and $\omega > 0$. In Algorithm 8, $\bar\Omega$ is the complement of Ω.
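In Matlab, the soft shrinkage operator can be written as a one-line anonymous function that acts entrywise, so the same handle can be applied both to the diagonal factor Σ and to a full matrix; this is a sketch rather than the implementation used in the experiments.

% Entrywise soft shrinkage operator S_omega(x) = sgn(x) .* max(|x| - omega, 0).
soft_shrink = @(x, omega) sign(x) .* max(abs(x) - omega, 0);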

Appendix E. Approximate SVD with randomized subspace iteration. The randomized subspace iteration was proposed in [30, Algorithm 4.4] to compute an orthonormal matrix whose range approximates the range of A. An approximate SVD can be computed using this orthonormal matrix [30, Algorithm 5.1]. A randomized subspace iteration is used in the routine MLSVD_RSI of the Matlab toolbox tensorlab [73], and MLSVD_RSI is by far the most efficient function that we could find in Matlab to compute the ST-HOSVD. We summarize the pseudocode of the approximate SVD with randomized subspace iteration in Algorithm 9.

Now we perform a complexity analysis of the approximate SVD with randomized subspace iteration. We first note that:
1. The cost of generating a random matrix is negligible.
2. The cost of computing B = AΩ is 2mn(k + p).


Algorithm 4: Partial QRCP.
Inputs: matrix A ∈ R^{m×n}; target rank k.
Outputs: orthogonal matrix Q ∈ R^{m×m}; upper trapezoidal matrix R ∈ R^{m×n}; permutation matrix Π ∈ R^{n×n} such that AΠ = QR.
Algorithm:
  Initialize Π = I_n. Compute column norms r_s = ‖A(1:m, s)‖_2 (1 ≤ s ≤ n).
  for j = 1 : k do
    Find i = arg max_{j ≤ s ≤ n} r_s. Exchange r_j and r_i, and exchange columns j and i of A and Π.
    Form the Householder reflection Q_j from A(j:m, j).
    Update the trailing matrix A(j:m, j:n) ← Q_j^T A(j:m, j:n).
    Update r_s = ‖A(j+1:m, s)‖_2 (j + 1 ≤ s ≤ n).
  end for
  Q = Q_1 Q_2 ··· Q_k is the product of all reflections; R = upper trapezoidal part of A.
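For illustration, a minimal unblocked Matlab translation of Algorithm 4 is given below; it is a sketch for small problems, not the blocked BLAS-3 variant [55] or the TRQRCP routine used elsewhere in the paper, and the function name is our own.

function [Q, R, Pi] = partial_qrcp(A, k)
% Unblocked sketch of Algorithm 4 (partial QR with column pivoting).
% Returns Q (m-by-m), R (m-by-n, first k columns upper triangular), and a
% permutation matrix Pi such that A*Pi = Q*R up to roundoff.
[m, n] = size(A);
Q  = eye(m);
Pi = eye(n);
r  = sum(A.^2, 1);                          % squared column norms
for j = 1:k
    [~, i] = max(r(j:n));  i = i + j - 1;   % pivot column
    A(:, [j i])  = A(:, [i j]);             % exchange columns of A, Pi, and norms
    Pi(:, [j i]) = Pi(:, [i j]);
    r([j i])     = r([i j]);
    x  = A(j:m, j);                         % Householder reflection from A(j:m, j)
    nx = norm(x);
    if nx > 0
        v = x;  v(1) = v(1) + sign(x(1) + (x(1) == 0)) * nx;
        v = v / norm(v);
        A(j:m, j:n) = A(j:m, j:n) - 2 * v * (v' * A(j:m, j:n));
        Q(:, j:m)   = Q(:, j:m)   - 2 * (Q(:, j:m) * v) * v';
    end
    A(j+1:m, j) = 0;                        % enforce exact zeros below the diagonal
    r(j+1:n) = sum(A(j+1:m, j+1:n).^2, 1);  % update trailing column norms
end
R = A;
end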

Algorithm 5: TUXV Algorithm.
Inputs: matrix A ∈ R^{m×n}; target rank k; block size b; oversampling size p ≥ 0.
Outputs: column orthonormal matrices U ∈ R^{m×k}, V ∈ R^{n×k}, and upper triangular matrix R ∈ R^{k×k} such that A ≈ URV^T.
Algorithm:
  Do TRQRCP on A to obtain Q ∈ R^{m×k}, R ∈ R^{k×n}, and Π ∈ R^{n×n}.
  R = RΠ^T, and do an LQ factorization, i.e., [V, R] = qr(R^T, 0).
  Compute Z = AV and do a QR factorization, i.e., [U, R] = qr(Z, 0).
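A compact Matlab sketch of Algorithm 5 follows. Since TRQRCP itself is not reproduced in this appendix, the sketch substitutes Matlab's built-in QR with column pivoting for the TRQRCP step; this changes the pivot selection (and the cost) but keeps the overall URV^T structure of the output. The function name is our own.

function [U, R, V] = tuxv_sketch(A, k)
% Sketch of Algorithm 5 (TUXV) with ordinary QRCP standing in for TRQRCP;
% returns U (m-by-k), V (n-by-k), R (k-by-k) with A approximately U*R*V'.
[~, R1, P1] = qr(A);       % QR with column pivoting: A*P1 = Q1*R1
R = R1(1:k, :) * P1';      % truncated k-by-n R factor with the pivoting undone
[V, ~] = qr(R', 0);        % LQ factorization of R via a QR factorization of R'
Z = A * V;                 % project A onto the leading right subspace
[U, R] = qr(Z, 0);         % final QR factorization; R is k-by-k upper triangular
end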

3. In each QR step [Q, ∼] = qr(B, 0), the cost of computing the QR factorization of B is 2m(k + p)^2 − (2/3)(k + p)^3 (cf. [69]), and the cost of forming the first (k + p) columns of the full Q matrix is m(k + p)^2 + (1/3)(k + p)^3.

Now we count the flops for each iteration i of the for loop:
1. The cost of computing B = A^T * Q is 2mn(k + p).
2. The cost of computing [Q, ∼] = qr(B, 0) is 2n(k + p)^2 − (2/3)(k + p)^3, and the cost of forming the first (k + p) columns of the full Q matrix is n(k + p)^2 + (1/3)(k + p)^3.
3. The cost of computing B = A * Q is 2mn(k + p).
4. The cost of computing [Q, ∼] = qr(B, 0) is 2m(k + p)^2 − (2/3)(k + p)^3, and the cost of forming the first (k + p) columns of the full Q matrix is m(k + p)^2 + (1/3)(k + p)^3.

Together, the cost of running the for loop q times is
\[
q\left(4mn(k+p) + 3(m+n)(k+p)^2 - \frac{2}{3}(k+p)^3\right).
\]
Additionally, the cost of computing B = Q^T * A is 2mn(k + p), the cost of computing the SVD of B is O(n(k + p)^2), and the cost of computing U = Q * U is 2m(k + p)^2.


Algorithm 6: ST-HOSVD.
Inputs: tensor X ∈ R^{I_1×···×I_d}; desired rank (k_1, ..., k_d); processing order p = (p_1, ..., p_d).
Outputs: Tucker decomposition [G; U_1, ..., U_d].
Algorithm:
  Define tensor G ← X.
  for j = 1 : d do
    r = p_j.
    Compute the exact or an approximate rank-k_r SVD of the tensor unfolding, G_(r) ≈ Û_r Σ̂_r V̂_r^T.
    U_r ← Û_r.
    Update G_(r) ← Σ̂_r V̂_r^T, i.e., apply Û_r^T to G.
  end for
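A compact Matlab sketch of Algorithm 6 for dense tensors is given below; it uses a truncated dense SVD for every unfolding, whereas in the experiments this step is replaced by the approximate SVD under comparison (FFSRQR, RSISVD, or LTSVD). The function name and the use of a cell array for the factor matrices are our own conventions.

function [G, U] = st_hosvd_sketch(X, ranks, order)
% Sketch of Algorithm 6 (ST-HOSVD) for a dense d-way array X.
% ranks(r) is the target rank k_r for mode r; order is the processing order p.
d = ndims(X);
if nargin < 3, order = 1:d; end
G = X;
U = cell(1, d);
for j = 1:d
    r  = order(j);
    sz = size(G);  sz(end+1:d) = 1;              % pad in case of trailing singleton modes
    perm = [r, setdiff(1:d, r)];
    Gr = reshape(permute(G, perm), sz(r), []);   % mode-r unfolding G_(r)
    [Uf, Sf, Vf] = svd(Gr, 'econ');              % exact SVD; an approximate SVD may be substituted
    kr = ranks(r);
    U{r} = Uf(:, 1:kr);                          % leading k_r left singular vectors
    core = Sf(1:kr, 1:kr) * Vf(:, 1:kr)';        % equals U{r}' * Gr
    sz(r) = kr;
    G = ipermute(reshape(core, sz(perm)), perm); % fold the compressed unfolding back
end
end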

Algorithm 7: Robust PCA Using IALM.
Inputs: measured matrix M ∈ R^{m×n}; positive numbers λ, μ_0, μ̄; tolerance tol; ρ > 1.
Outputs: matrix pair (X_k, E_k).
Algorithm:
  k = 0; J(M) = max(‖M‖_2, λ^{-1} ‖M‖_F); Y_0 = M / J(M); E_0 = 0.
  while not converged do
    (U, Σ, V) = svd(M − E_k + μ_k^{-1} Y_k)
    X_{k+1} = U S_{μ_k^{-1}}(Σ) V^T
    E_{k+1} = S_{λ μ_k^{-1}}(M − X_{k+1} + μ_k^{-1} Y_k)
    Y_{k+1} = Y_k + μ_k (M − X_{k+1} − E_{k+1})
    Update μ_{k+1} = min(ρ μ_k, μ̄); k = k + 1
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then break end if
  end while
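The following Matlab sketch mirrors Algorithm 7 with a full (economy-size) SVD; in the experiments of Section 4.3.1 this SVD is replaced by the approximate SVD under comparison. The function name and argument order are our own conventions.

function [X, E] = ialm_rpca_sketch(M, lambda, mu, mubar, rho, tol)
% Sketch of Algorithm 7: robust PCA via the inexact augmented Lagrange
% multiplier method.  mu is the initial penalty mu_0, mubar its upper bound.
soft = @(x, w) sign(x) .* max(abs(x) - w, 0);   % soft shrinkage S_w
J = max(norm(M, 2), norm(M, 'fro') / lambda);   % scaling for the initial multiplier
Y = M / J;                                      % Y_0
X = zeros(size(M));
E = zeros(size(M));
while norm(M - X - E, 'fro') / norm(M, 'fro') >= tol
    [U, S, V] = svd(M - E + Y / mu, 'econ');
    X = U * soft(S, 1 / mu) * V';               % singular value thresholding
    E = soft(M - X + Y / mu, lambda / mu);      % sparse component update
    Y = Y + mu * (M - X - E);                   % multiplier update
    mu = min(rho * mu, mubar);
end
end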

Now assume k + p ≪ min(m, n) and omit the lower-order terms; then we arrive at (4q + 4)mn(k + p) as the complexity of the approximate SVD with randomized subspace iteration. In practice, q is usually chosen to be an integer between 0 and 2.
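As a rough, purely illustrative instance of this estimate, take, for example, m = n = 10^4, k + p = 105, and q = 1; the leading term then amounts to
\[
(4q+4)\,mn\,(k+p) = 8 \cdot 10^{4} \cdot 10^{4} \cdot 105 \approx 8.4 \times 10^{10}
\]
floating point operations.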

REFERENCES

[1] O. ALTER, P. O. BROWN, AND D. BOTSTEIN, Singular value decomposition for genome-wide expression data processing and modeling, Proc. Natl. Acad. Sci. USA, 97 (2000), pp. 10101–10106.
[2] E. ANDERSON, Z. BAI, C. BISCHOF, L. S. BLACKFORD, J. DEMMEL, J. DONGARRA, J. DU CROZ, A. GREENBAUM, S. HAMMARLING, A. MCKENNEY, AND D. SORENSEN, LAPACK Users' Guide, 3rd ed., SIAM, Philadelphia, 1999.
[3] C. A. ANDERSSON AND R. BRO, Improving the speed of multi-way algorithms: Part I. Tucker3, Chemometrics Intell. Lab. Syst., 42 (1998), pp. 93–103.
[4] M. W. BERRY, S. T. DUMAIS, AND G. W. O'BRIEN, Using linear algebra for intelligent information retrieval, SIAM Rev., 37 (1995), pp. 573–595.
[5] C. BISCHOF AND C. VAN LOAN, The WY representation for products of Householder matrices, SIAM J. Sci. Statist. Comput., 8 (1987), pp. S2–S13.


Algorithm 8: Matrix Completion Using IALM.
Inputs: sampled set Ω; sampled entries π_Ω(M); positive numbers λ, μ_0, μ̄; tolerance tol; ρ > 1.
Outputs: matrix pair (X_k, E_k).
Algorithm:
  k = 0; Y_0 = 0; E_0 = 0.
  while not converged do
    (U, Σ, V) = svd(M − E_k + μ_k^{-1} Y_k)
    X_{k+1} = U S_{μ_k^{-1}}(Σ) V^T
    E_{k+1} = π_{Ω̄}(M − X_{k+1} + μ_k^{-1} Y_k)
    Y_{k+1} = Y_k + μ_k (M − X_{k+1} − E_{k+1})
    Update μ_{k+1} = min(ρ μ_k, μ̄); k = k + 1
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then break end if
  end while
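A matching Matlab sketch of Algorithm 8 is shown below; Omega is represented as a logical mask of observed entries, which is our own convention, and the full SVD again stands in for the approximate SVD used in the experiments.

function X = ialm_mc_sketch(M, Omega, mu, mubar, rho, tol)
% Sketch of Algorithm 8: matrix completion via IALM.  Only the entries of
% M with Omega == true are used; E carries the unobserved residual.
soft = @(x, w) sign(x) .* max(abs(x) - w, 0);
M(~Omega) = 0;                       % work with pi_Omega(M)
Y = zeros(size(M));
X = zeros(size(M));
E = zeros(size(M));
while norm(M - X - E, 'fro') / norm(M, 'fro') >= tol
    [U, S, V] = svd(M - E + Y / mu, 'econ');
    X = U * soft(S, 1 / mu) * V';    % singular value thresholding
    E = M - X + Y / mu;
    E(Omega) = 0;                    % E_{k+1} = pi over the complement of Omega
    Y = Y + mu * (M - X - E);
    mu = min(rho * mu, mubar);
end
end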

[6] P. BUSINGER AND G. H. GOLUB, Handbook series linear algebra. Linear least squares solutions by Householder transformations, Numer. Math., 7 (1965), pp. 269–276.
[7] J.-F. CAI, E. J. CANDÈS, AND Z. SHEN, A singular value thresholding algorithm for matrix completion, SIAM J. Optim., 20 (2010), pp. 1956–1982.
[8] E. J. CANDÈS, X. LI, Y. MA, AND J. WRIGHT, Robust principal component analysis, J. ACM, 58 (2011), Art. 11, 37 pages.
[9] E. J. CANDÈS AND B. RECHT, Exact matrix completion via convex optimization, Found. Comput. Math., 9 (2009), pp. 717–772.
[10] J. D. CARROLL AND J.-J. CHANG, Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition, Psychometrika, 35 (1970), pp. 283–319.
[11] T. A. DAVIS AND Y. HU, The University of Florida sparse matrix collection, ACM Trans. Math. Software, 38 (2011), Art. 1, 25 pages.
[12] L. DE LATHAUWER AND B. DE MOOR, From matrix to tensor: multilinear algebra and signal processing, in Mathematics in Signal Processing IV, Inst. Math. Appl. Conf. Ser., Oxford Univ. Press, Oxford, 1998, pp. 1–16.
[13] L. DE LATHAUWER, B. DE MOOR, AND J. VANDEWALLE, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253–1278.
[14] L. DE LATHAUWER, B. DE MOOR, AND J. VANDEWALLE, On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1324–1342.
[15] L. DE LATHAUWER AND J. VANDEWALLE, Dimensionality reduction in higher-order signal processing and rank-(R1, R2, ..., RN) reduction in multilinear algebra, Linear Algebra Appl., 391 (2004), pp. 31–55.
[16] P. DRINEAS, R. KANNAN, AND M. W. MAHONEY, Fast Monte Carlo algorithms for matrices II: computing a low-rank approximation to a matrix, SIAM J. Comput., 36 (2006), pp. 158–183.
[17] J. A. DUERSCH AND M. GU, Randomized QR with column pivoting, SIAM J. Sci. Comput., 39 (2017), pp. C263–C291.
[18] C. ECKART AND G. YOUNG, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211–218.
[19] L. ELDÉN, Matrix Methods in Data Mining and Pattern Recognition, SIAM, Philadelphia, 2007.
[20] M. FAZEL, H. HINDI, AND S. P. BOYD, Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices, in Proceedings of the 2003 American Control Conference, IEEE Conference Proceedings, IEEE, Piscataway, pp. 2156–2162.
[21] M. FAZEL, T. K. PONG, D. SUN, AND P. TSENG, Hankel matrix rank minimization with applications to system identification and realization, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 946–977.
[22] C. FRANK AND E. NÖTH, Optimizing eigenfaces by face masks for facial expression recognition, in Computer Analysis of Images and Patterns, N. Petkov and M. A. Westenberg, eds., vol. 2756 of Lecture Notes in Comput. Sci., Springer, Berlin, 2003, pp. 646–654.


Algorithm 9: Approximate SVD with Randomized Subspace Iteration.
Inputs: matrix A ∈ R^{m×n}; target rank k; oversampling size p ≥ 0; number of iterations q ≥ 1.
Outputs: U ∈ R^{m×k} contains the approximate top k left singular vectors of A; Σ ∈ R^{k×k} contains the approximate top k singular values of A; V ∈ R^{n×k} contains the approximate top k right singular vectors of A.
Algorithm:
  Generate an i.i.d. Gaussian random matrix Ω ∈ N(0, 1)^{n×(k+p)}.
  Compute B = AΩ.
  [Q, ∼] = qr(B, 0)
  for i = 1 : q do
    B = A^T * Q;  [Q, ∼] = qr(B, 0)
    B = A * Q;    [Q, ∼] = qr(B, 0)
  end for
  B = Q^T * A
  [U, Σ, V] = svd(B)
  U = Q * U;  U = U(:, 1:k);  Σ = Σ(1:k, 1:k);  V = V(:, 1:k)
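A direct Matlab transcription of Algorithm 9 reads as follows; only the function name is our own.

function [U, S, V] = rsi_svd_sketch(A, k, p, q)
% Sketch of Algorithm 9: approximate SVD with randomized subspace iteration.
n = size(A, 2);
Omega = randn(n, k + p);            % i.i.d. Gaussian test matrix
[Q, ~] = qr(A * Omega, 0);
for i = 1:q
    [Q, ~] = qr(A' * Q, 0);
    [Q, ~] = qr(A * Q, 0);
end
B = Q' * A;                         % (k+p)-by-n
[Ub, S, V] = svd(B, 'econ');
U = Q * Ub;
U = U(:, 1:k);
S = S(1:k, 1:k);
V = V(:, 1:k);
end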

[23] G. W. FURNAS, S. DEERWESTER, S. T. DUMAIS, T. K. LANDAUER, R. A. HARSHMAN, L. A. STREETER, AND K. E. LOCHBAUM, Information retrieval using a singular value decomposition model of latent semantic structure, in Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Y. Chiaramella, ed., ACM, New York, 1988, pp. 465–480.
[24] K. GOLDBERG, T. ROEDER, D. GUPTA, AND C. PERKINS, Eigentaste: a constant time collaborative filtering algorithm, Inf. Retrieval, 4 (2001), pp. 133–151.
[25] G. GOLUB, Numerical methods for solving linear least squares problems, Numer. Math., 7 (1965), pp. 206–216.
[26] G. H. GOLUB AND C. F. VAN LOAN, Matrix Computations, 4th ed., Johns Hopkins University Press, Baltimore, 2013.
[27] M. GU, Subspace iteration randomization and singular value problems, SIAM J. Sci. Comput., 37 (2015), pp. A1139–A1173.
[28] M. GU AND S. C. EISENSTAT, Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869.
[29] E. T. HALE, W. YIN, AND Y. ZHANG, Fixed-point continuation for l1-minimization: methodology and convergence, SIAM J. Optim., 19 (2008), pp. 1107–1130.
[30] N. HALKO, P. G. MARTINSSON, AND J. A. TROPP, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217–288.
[31] R. A. HARSHMAN, Foundations of the PARAFAC procedure: models and conditions for an "explanatory" multimodal factor analysis, UCLA Working Papers in Phonetics, 16 (1970), pp. 1–84.
[32] J. L. HERLOCKER, J. A. KONSTAN, A. BORCHERS, AND J. RIEDL, An algorithmic framework for performing collaborative filtering, in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, F. Gey, M. Hearst, and R. Tong, eds., ACM, New York, 1999, pp. 230–237.
[33] R. A. HORN AND C. R. JOHNSON, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1991.
[34] D. A. HUCKABY AND T. F. CHAN, On the convergence of Stewart's QLP algorithm for approximating the SVD, Numer. Algorithms, 32 (2003), pp. 287–316.
[35] D. A. HUCKABY AND T. F. CHAN, Stewart's pivoted QLP decomposition for low-rank matrices, Numer. Linear Algebra Appl., 12 (2005), pp. 153–159.
[36] W. B. JOHNSON AND J. LINDENSTRAUSS, Extensions of Lipschitz mappings into a Hilbert space, in Conference in Modern Analysis and Probability (New Haven, Conn., 1982), R. Beals, A. Beck, A. Bellow, and A. Hajian, eds., vol. 26 of Contemp. Math., Amer. Math. Soc., Providence, 1984, pp. 189–206.


[37] I. T. JOLLIFFE, Principal Component Analysis, Springer, New York, 1986.
[38] J. M. KLEINBERG, Authoritative sources in a hyperlinked environment, J. ACM, 46 (1999), pp. 604–632.
[39] T. G. KOLDA, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 243–255.
[40] T. G. KOLDA AND B. W. BADER, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455–500.
[41] R. M. LARSEN, PROPACK: software for large and sparse SVD calculations, available at http://sun.stanford.edu/~rmunk/PROPACK/.
[42] R. M. LARSEN, Lanczos bidiagonalization with partial reorthogonalization, Tech. Report PB-537, Department of Computer Science, Aarhus University, Aarhus, Denmark, 1998.
[43] Y. LECUN, L. BOTTOU, Y. BENGIO, AND P. HAFFNER, Gradient-based learning applied to document recognition, Proc. IEEE, 86 (1998), pp. 2278–2324.
[44] R. B. LEHOUCQ, D. C. SORENSEN, AND C. YANG, ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods, SIAM, Philadelphia, 1998.
[45] Z. LIN, M. CHEN, AND Y. MA, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices, Preprint on arXiv, https://arxiv.org/abs/1009.5055, 2010.
[46] N. LINIAL, E. LONDON, AND Y. RABINOVICH, The geometry of graphs and some of its algorithmic applications, Combinatorica, 15 (1995), pp. 215–245.
[47] Z. LIU, A. HANSSON, AND L. VANDENBERGHE, Nuclear norm system identification with missing inputs and outputs, Systems Control Lett., 62 (2013), pp. 605–612.
[48] Z. LIU AND L. VANDENBERGHE, Interior-point method for nuclear norm approximation with application to system identification, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1235–1256.
[49] S. MA, D. GOLDFARB, AND L. CHEN, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program., 128 (2011), pp. 321–353.
[50] C. B. MELGAARD, Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra, PhD Thesis, University of California, Berkeley, 2015.
[51] M. MESBAHI AND G. P. PAPAVASSILOPOULOS, On the rank minimization problem over a positive semidefinite linear matrix inequality, IEEE Trans. Automat. Control, 42 (1997), pp. 239–243.
[52] N. MULLER, L. MAGAIA, AND B. M. HERBST, Singular value decomposition, eigenfaces, and 3D reconstructions, SIAM Rev., 46 (2004), pp. 518–545.
[53] M. NARWARIA AND W. LIN, SVD-based quality metric for image and video using machine learning, IEEE Trans. Syst., Man, and Cybernetics, Part B (Cybernetics), 42 (2012), pp. 347–364.
[54] T.-H. OH, Y. MATSUSHITA, Y.-W. TAI, AND I. SO KWEON, Fast randomized singular value thresholding for nuclear norm minimization, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015, IEEE Conference Proceedings, IEEE, Los Alamitos, 2015, pp. 4484–4493.
[55] G. QUINTANA-ORTÍ, X. SUN, AND C. H. BISCHOF, A BLAS-3 version of the QR factorization with column pivoting, SIAM J. Sci. Comput., 19 (1998), pp. 1486–1494.
[56] B. RECHT, M. FAZEL, AND P. A. PARRILO, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Rev., 52 (2010), pp. 471–501.
[57] J. D. RENNIE AND N. SREBRO, Fast maximum margin matrix factorization for collaborative prediction, in Proceedings of the 22nd International Conference on Machine Learning, L. De Raedt and S. Wrobel, eds., ACM Press, New York, 2005, pp. 713–719.
[58] V. ROKHLIN, A. SZLAM, AND M. TYGERT, A randomized algorithm for principal component analysis, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1100–1124.
[59] A. K. SAIBABA, HOID: higher order interpolatory decomposition for tensors based on Tucker representation, SIAM J. Matrix Anal. Appl., 37 (2016), pp. 1223–1249.
[60] B. SAVAS AND L. ELDÉN, Handwritten digit classification using higher order singular value decomposition, Pattern Recognition, 40 (2007), pp. 993–1003.
[61] R. SCHREIBER AND C. VAN LOAN, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 53–57.
[62] A. SHASHUA AND T. HAZAN, Non-negative tensor factorization with applications to statistics and computer vision, in Proceedings of the 22nd International Conference on Machine Learning, S. Dzeroski, L. De Raedt, and S. Wrobel, eds., ACM, New York, 2005, pp. 792–799.
[63] N. D. SIDIROPOULOS, R. BRO, AND G. B. GIANNAKIS, Parallel factor analysis in sensor array processing, IEEE Trans. Signal Process., 48 (2000), pp. 2377–2388.
[64] D. C. SORENSEN AND M. EMBREE, A DEIM induced CUR factorization, SIAM J. Sci. Comput., 38 (2016), pp. A1454–A1482.
[65] N. SREBRO, J. RENNIE, AND T. S. JAAKKOLA, Maximum-margin matrix factorization, in NIPS'04: Proceedings of the 17th International Conference on Neural Information Processing Systems, L. K. Saul, Y. Weiss, and L. Bottou, eds., MIT Press, Cambridge, 2004, pp. 1329–1336.
[66] G. W. STEWART, The QLP approximation to the singular value decomposition, SIAM J. Sci. Comput., 20 (1999), pp. 1336–1348.


[67] S. TAHERI, Q. QIU, AND R. CHELLAPPA, Structure-preserving sparse decomposition for facial expression analysis, IEEE Trans. Image Process., 23 (2014), pp. 3590–3603.
[68] K.-C. TOH AND S. YUN, An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems, Pac. J. Optim., 6 (2010), pp. 615–640.
[69] L. N. TREFETHEN AND D. BAU, III, Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[70] L. R. TUCKER, Some mathematical notes on three-mode factor analysis, Psychometrika, 31 (1966), pp. 279–311.
[71] N. VANNIEUWENHOVEN, R. VANDEBRIL, AND K. MEERBERGEN, A new truncation strategy for the higher-order singular value decomposition, SIAM J. Sci. Comput., 34 (2012), pp. A1027–A1052.
[72] M. A. O. VASILESCU AND D. TERZOPOULOS, Multilinear analysis of image ensembles: TensorFaces, in European Conference on Computer Vision 2002, A. Heyden, G. Sparr, M. Nielsen, and P. Johansen, eds., vol. 2350 of Lecture Notes in Computer Science, Springer, Berlin, 2002, pp. 447–460.
[73] N. VERVLIET, O. DEBALS, L. SORBER, M. V. BAREL, AND L. DE LATHAUWER, Tensorlab user guide, available at http://www.tensorlab.net.
[74] J. XIAO AND M. GU, Spectrum-revealing Cholesky factorization for kernel methods, in Proceedings of the 16th IEEE International Conference on Data Mining (ICDM), IEEE Conference Proceedings, Los Alamitos, 2016, pp. 1293–1298.
[75] J. XIAO, M. GU, AND J. LANGOU, Fast parallel randomized QR with column pivoting algorithms for reliable low-rank matrix approximations, in Proceedings of the 24th IEEE International Conference on High Performance Computing (HiPC), IEEE Conference Proceedings, Los Alamitos, 2017, pp. 233–242.
[76] T. ZHANG AND G. H. GOLUB, Rank-one approximation to high order tensors, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 534–550.

[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336

[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348

ETNAKent State University and

Johann Radon Institute (RICAM)

494 Y FENG J XIAO AND M GU

[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603

[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640

[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash

311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-

order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in

European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460

[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet

[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298

[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242

[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550

Page 13: FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND …etna.mcs.kent.edu/vol.51.2019/pp469-494.dir/pp469-494.pdfminimization. In Section3, we introduce Flip-Flop SRQR and analyze its

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 481

TABLE 4.1
Methods for approximate SVD.

Method   Description
LANSVD   Approximate SVD using Lanczos bidiagonalization with partial reorthogonalization [42].
         Its Fortran and Matlab implementations are from PROPACK [41].
FFSRQR   Flip-Flop Spectrum-Revealing QR factorization.
         Its Matlab implementation uses Mex-wrapped BLAS and LAPACK routines [2].
RSISVD   Approximate SVD with randomized subspace iteration [30, 27].
         Its Matlab implementation is from the tensorlab toolbox [73].
TUXV     Algorithm 5 in the appendix.
LTSVD    Linear Time SVD [16]. We use its Matlab implementation by Ma et al. [49].

we compare FFSRQR with other approximate SVD algorithms for matrix approximation. Secondly, we compare FFSRQR with other methods on tensor approximation problems using the tensorlab toolbox [73]. Thirdly, we compare FFSRQR with other methods for the robust PCA problem and the matrix completion problem. All experiments are implemented on a laptop with a 2.9 GHz Intel Core i5 processor and 8 GB of RAM. The codes based on LAPACK [2] and PROPACK [41] in Section 4.1 are in Fortran. The codes in Sections 4.2 and 4.3 are in Matlab, except that FFSRQR is implemented in Fortran and wrapped by Mex.

4.1. Approximate truncated SVD. In this section we compare FFSRQR with other approximate SVD algorithms for low-rank matrix approximation. All tested methods are listed in Table 4.1. Since LTSVD has only a Matlab implementation, this method is only considered in Sections 4.2 and 4.3. The test matrices are:

Type 1: $A \in \mathbb{R}^{m \times n}$ [66] is defined by $A = U D V^T + 0.1 \cdot d_{ss} \cdot E$, where $U \in \mathbb{R}^{m \times s}$ and $V \in \mathbb{R}^{n \times s}$ are column-orthonormal matrices and $D \in \mathbb{R}^{s \times s}$ is a diagonal matrix with $s$ geometrically decreasing diagonal entries from $1$ to $10^{-3}$; $d_{ss}$ is the last diagonal entry of $D$, and $E \in \mathbb{R}^{m \times n}$ is a random matrix whose entries are independently sampled from a normal distribution $\mathcal{N}(0,1)$ (a construction sketch follows this list). In our numerical experiments we test three different random matrices: the square matrix has a size of $10000 \times 10000$, the short-fat matrix has a size of $1000 \times 10000$, and the tall-skinny matrix has a size of $10000 \times 1000$.

Type 2: $A \in \mathbb{R}^{4929 \times 4929}$ is a real data matrix from the University of Florida sparse matrix collection [11]; its corresponding file name is HB/GEMAT11.
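The Type 1 construction is easy to reproduce. The following NumPy sketch builds such a matrix for small, illustrative dimensions; the function name and sizes are ours, not taken from the paper's test harness.

import numpy as np

def type1_matrix(m, n, s, seed=0):
    """Type 1 test matrix: A = U D V^T + 0.1 * d_ss * E with geometrically decaying D."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, s)))   # column-orthonormal U (m x s)
    V, _ = np.linalg.qr(rng.standard_normal((n, s)))   # column-orthonormal V (n x s)
    d = np.geomspace(1.0, 1e-3, s)                     # s entries decaying from 1 to 1e-3
    E = rng.standard_normal((m, n))                    # i.i.d. N(0, 1) noise
    return U @ np.diag(d) @ V.T + 0.1 * d[-1] * E

A = type1_matrix(500, 400, 100)                        # illustrative sizes only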

For a given matrix $A \in \mathbb{R}^{m \times n}$, the relative SVD approximation error is measured by $\|A - U_k \Sigma_k V_k^T\|_F / \|A\|_F$, where $\Sigma_k$ contains the approximate top $k$ singular values and $U_k$, $V_k$ are the corresponding approximate top $k$ singular vectors. The parameters used in FFSRQR, RSISVD, TUXV, and LTSVD are listed in Table 4.2. The additional time required to compute $g_2$ is negligible. In our experiments, $g_2$ always remains modest and never triggers subsequent SRQR column swaps. However, computing $g_2$ is still recommended, since $g_2$ serves as an insurance policy against potential quality degradation by TRQRCP. The block size $b$ is chosen to be 32 according to the xGEQP3 LAPACK routine. When $k$ is less than 32, we use an unblocked version of the partial QRCP.
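The error metric itself is a one-liner. The sketch below evaluates it for rank-k factors produced by any of the tested methods, with NumPy's exact truncated SVD standing in for those methods.

import numpy as np

def rel_svd_error(A, Uk, sk, Vk):
    """Relative SVD approximation error ||A - Uk diag(sk) Vk^T||_F / ||A||_F."""
    return np.linalg.norm(A - (Uk * sk) @ Vk.T) / np.linalg.norm(A)

rng = np.random.default_rng(1)
A = rng.standard_normal((300, 200))
k = 20
U, s, Vt = np.linalg.svd(A, full_matrices=False)       # exact SVD as a stand-in
print(rel_svd_error(A, U[:, :k], s[:k], Vt[:k].T))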

TABLE 4.2
Parameters used in RSISVD, FFSRQR, TUXV, and LTSVD.

Method   Parameter
FFSRQR   oversampling size p = 5, block size b = min{32, k}, l = k, d = 10, g = 2.0
RSISVD   oversampling size p = 5, subspace iteration q = 1
TUXV     oversampling size p = 5, block size b = min{32, k}, l = k
LTSVD    probabilities p_i = 1/n

Figures 4.1–4.3 display the run time, the relative approximation error, and the top 20 singular values comparison, respectively, for four different matrices. In terms of speed, FFSRQR is always faster than LANSVD and RSISVD; moreover, the efficiency difference is more obvious when k is relatively large. FFSRQR is only slightly slower than TUXV. In terms of the relative approximation error, FFSRQR is comparable to the other three methods. In terms of the top singular values approximation, Figure 4.3 displays the top 20 singular values when k = 500 for the four different matrices. In Figure 4.3, FFSRQR is superior to TUXV, as we take the absolute values of the main diagonal entries of the upper triangular matrix R as the approximate singular values for TUXV.

FIG. 4.1. Run time comparison for approximate SVD algorithms (FFSRQR, RSISVD, LANSVD, TUXV): run time (sec) versus target rank k for (a) a Type 1 random square matrix, (b) a Type 1 random short-fat matrix, (c) a Type 1 random tall-skinny matrix, and (d) the Type 2 matrix GEMAT11.

FIG. 4.2. Relative approximation error comparison for approximate SVD algorithms (FFSRQR, RSISVD, LANSVD, TUXV): relative approximation error versus target rank k for (a) a Type 1 random square matrix, (b) a Type 1 random short-fat matrix, (c) a Type 1 random tall-skinny matrix, and (d) the Type 2 matrix GEMAT11.

4.2. Tensor approximation. This section illustrates the effectiveness and efficiency of FFSRQR for computing approximate tensors. ST-HOSVD [3, 71] is one of the most efficient algorithms for computing a Tucker decomposition of tensors, and the most costly part of this algorithm is computing an SVD or approximate SVD of the tensor unfoldings. The truncated SVD and RSISVD are used in the routines MLSVD and MLSVD_RSI, respectively, in the Matlab tensorlab toolbox [73]. Based on this Matlab toolbox, we implement ST-HOSVD using FFSRQR or LTSVD to perform the SVD approximation. We name these two new routines MLSVD_FFSRQR and MLSVD_LTSVD, respectively, and compare these four routines in the numerical experiments. We also have Python codes for this tensor approximation experiment; we do not list the results of the Python programs here as they are similar to those for Matlab.

4.2.1. A sparse tensor example. We test a sparse tensor $\mathcal{X} \in \mathbb{R}^{n \times n \times n}$ of the following format [64, 59]:
\[
\mathcal{X} = \sum_{j=1}^{10} \frac{1000}{j}\, x_j \circ y_j \circ z_j \;+\; \sum_{j=11}^{n} \frac{1}{j}\, x_j \circ y_j \circ z_j,
\]
where $x_j, y_j, z_j \in \mathbb{R}^n$ are sparse vectors with nonnegative entries. The symbol "$\circ$" represents the vector outer product. We compute a rank-$(k, k, k)$ Tucker decomposition $[\mathcal{G}; U_1, U_2, U_3]$ using MLSVD, MLSVD_FFSRQR, MLSVD_RSI, and MLSVD_LTSVD, respectively. The relative approximation error is measured by $\|\mathcal{X} - \mathcal{X}_k\|_F / \|\mathcal{X}\|_F$, where $\mathcal{X}_k = \mathcal{G} \times_1 U_1 \times_2 U_2 \times_3 U_3$.
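To make the experiment concrete, the following NumPy sketch assembles a small tensor of this form (dense storage, illustrative size and sparsity) and evaluates the relative error of a rank-(k, k, k) Tucker approximation obtained with a plain HOSVD; the toolbox routines MLSVD, MLSVD_RSI, MLSVD_FFSRQR, and MLSVD_LTSVD are not reproduced here.

import numpy as np

def sparse_nonneg_vector(n, nnz, rng):
    """A length-n vector with nnz nonnegative entries at random positions."""
    v = np.zeros(n)
    v[rng.choice(n, size=nnz, replace=False)] = rng.random(nnz)
    return v

def build_tensor(n, nnz=5, seed=0):
    """X = sum_{j<=10} (1000/j) x_j o y_j o z_j + sum_{j=11}^{n} (1/j) x_j o y_j o z_j."""
    rng = np.random.default_rng(seed)
    X = np.zeros((n, n, n))
    for j in range(1, n + 1):
        w = 1000.0 / j if j <= 10 else 1.0 / j
        x = sparse_nonneg_vector(n, nnz, rng)
        y = sparse_nonneg_vector(n, nnz, rng)
        z = sparse_nonneg_vector(n, nnz, rng)
        X += w * np.einsum('i,j,k->ijk', x, y, z)        # vector outer product
    return X

def hosvd_error(X, k):
    """Relative error of a rank-(k, k, k) Tucker approximation via plain HOSVD."""
    U = []
    for mode in range(3):
        Xm = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)   # mode-m unfolding
        Um, _, _ = np.linalg.svd(Xm, full_matrices=False)
        U.append(Um[:, :k])
    G = np.einsum('abc,ai,bj,ck->ijk', X, U[0], U[1], U[2])       # core tensor
    Xk = np.einsum('ijk,ai,bj,ck->abc', G, U[0], U[1], U[2])      # G x1 U1 x2 U2 x3 U3
    return np.linalg.norm(X - Xk) / np.linalg.norm(X)

X = build_tensor(60)
print(hosvd_error(X, 10))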


FIG. 4.3. Top 20 singular values comparison for approximate SVD algorithms (FFSRQR, RSISVD, LANSVD, TUXV): top 20 approximate singular values versus index, with k = 500, for (a) a Type 1 random square matrix, (b) a Type 1 random short-fat matrix, (c) a Type 1 random tall-skinny matrix, and (d) the Type 2 matrix GEMAT11.

Figure 4.4 displays a comparison of the efficiency and accuracy of the different methods for a $400 \times 400 \times 400$ sparse tensor approximation problem. MLSVD_LTSVD is the fastest but the least accurate one. The other three methods have similar accuracy, while MLSVD_FFSRQR is faster when the target rank $k$ is larger.

FIG. 4.4. Run time and relative approximation error comparison on a sparse tensor (MLSVD, MLSVD_FFSRQR, MLSVD_RSI, MLSVD_LTSVD): relative approximation error and run time (sec) versus target rank k.

4.2.2. Handwritten digits classification. MNIST is a handwritten digits image data set created by Yann LeCun [43]. Every digit is represented by a $28 \times 28$ pixel image. Handwritten digits classification has the objective of training a classification model to classify new unlabeled images. A HOSVD algorithm is proposed by Savas and Eldén [60] to classify handwritten digits. To reduce the training time, a more efficient ST-HOSVD algorithm is introduced in [71].

We do handwritten digits classification using MNIST, which consists of 60,000 training images and 10,000 test images. The number of training images in each class is restricted to 5,421 so that the training set is equally distributed over all classes. The training set is represented by a tensor $\mathcal{X}$ of size $786 \times 5421 \times 10$. The classification relies on Algorithm 2 in [60]. We use various algorithms to obtain an approximation $\mathcal{X} \approx \mathcal{G} \times_1 U_1 \times_2 U_2 \times_3 U_3$, where the core tensor $\mathcal{G}$ has size $65 \times 142 \times 10$.

TABLE 4.3
Comparison on handwritten digits classification.

                              MLSVD     MLSVD_FFSRQR   MLSVD_RSI   MLSVD_LTSVD
Training Time [sec]           272.121   15.455         19.343      0.4266
Relative Model Error          0.4099    0.4273         0.4247      0.5162
Classification Accuracy [%]   95.19     94.98          95.05       92.59

The results are summarized in Table 4.3. In terms of run time, we observe that our method MLSVD_FFSRQR is comparable to MLSVD_RSI, while MLSVD is the most expensive one and MLSVD_LTSVD is the fastest one. In terms of classification quality, MLSVD, MLSVD_FFSRQR, and MLSVD_RSI are comparable, while MLSVD_LTSVD is the least accurate one.

4.3. Solving the nuclear norm minimization problem. To show the effectiveness of the FFSRQR algorithm for nuclear norm minimization problems, we investigate two scenarios: the robust PCA problem (2.7) and the matrix completion problem (2.8). The test matrix used for the robust PCA is introduced in [45], and the test matrices used in the matrix completion are two real data sets. We use the IALM method [45] to solve both problems; the IALM code can be downloaded from http://perception.csl.illinois.edu/matrix-rank/sample_code.html.

4.3.1. Robust PCA. To solve the robust PCA problem, we replace the approximate SVD part in the IALM method [45] by various methods. We denote the actual solution to the robust PCA problem by a matrix pair $(X^*, E^*) \in \mathbb{R}^{m \times n} \times \mathbb{R}^{m \times n}$. The matrix $X^*$ is $X^* = X_L X_R^T$, with $X_L \in \mathbb{R}^{m \times k}$ and $X_R \in \mathbb{R}^{n \times k}$ being random matrices whose entries are independently sampled from a normal distribution. The sparse matrix $E^*$ is a random matrix whose non-zero entries are independently sampled from a uniform distribution over the interval $[-500, 500]$. The input to the IALM algorithm has the form $M = X^* + E^*$, and the output is denoted by $(\hat{X}, \hat{E})$. In this numerical experiment, we use the same parameter settings as the IALM code for robust PCA: the rank $k$ is $0.1m$, and the number of non-zero entries in $E$ is $0.05m^2$. We choose the trade-off parameter $\lambda = 1/\sqrt{\max(m, n)}$ as suggested by Candès et al. [8]. The solution quality is measured by the normalized root mean square error $\|\hat{X} - X^*\|_F / \|X^*\|_F$.
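The synthetic instance can be generated along the following lines (a NumPy sketch with small illustrative dimensions; the function names are ours, and the IALM solvers themselves are not reproduced here).

import numpy as np

def rpca_test_problem(m, n, seed=0):
    """Synthetic robust PCA instance: M = X* + E* with rank-k X* and sparse E*."""
    rng = np.random.default_rng(seed)
    k = int(0.1 * m)                              # rank of the low-rank part
    Xstar = rng.standard_normal((m, k)) @ rng.standard_normal((n, k)).T
    Estar = np.zeros((m, n))
    nnz = int(0.05 * m * m)                       # number of corrupted entries
    idx = rng.choice(m * n, size=nnz, replace=False)
    Estar.flat[idx] = rng.uniform(-500, 500, size=nnz)
    return Xstar, Estar, Xstar + Estar

def rel_recovery_error(Xhat, Xstar):
    """Normalized root mean square error ||Xhat - X*||_F / ||X*||_F."""
    return np.linalg.norm(Xhat - Xstar) / np.linalg.norm(Xstar)

Xstar, Estar, M = rpca_test_problem(200, 200)     # illustrative sizes only
lam = 1.0 / np.sqrt(max(M.shape))                 # trade-off parameter of Candes et al. [8]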

Table 4.4 includes the relative error, the run time, the number of non-zero entries in $\hat{E}$ ($\|\hat{E}\|_0$), the iteration count, and the number of non-zero singular values (#sv) in $\hat{X}$ of the IALM algorithm using different approximate SVD methods. We observe that IALM_FFSRQR is faster than all the other three methods, while its error is comparable to IALM_LANSVD and IALM_RSISVD. IALM_LTSVD is relatively slow and not effective.

TABLE 4.4
Comparison on robust PCA.

Size          Method         Error      Time (sec)   ||E||_0     Iter   #sv
1000 × 1000   IALM_LANSVD    3.33e-07   5.79e+00     50000       22     100
              IALM_FFSRQR    2.79e-07   1.02e+00     50000       25     100
              IALM_RSISVD    3.36e-07   1.09e+00     49999       22     100
              IALM_LTSVD     9.92e-02   3.11e+00     999715      100    100
2000 × 2000   IALM_LANSVD    2.61e-07   5.91e+01     199999      22     200
              IALM_FFSRQR    1.82e-07   6.93e+00     199998      25     200
              IALM_RSISVD    2.63e-07   7.38e+00     199996      22     200
              IALM_LTSVD     8.42e-02   2.20e+01     3998937     100    200
4000 × 4000   IALM_LANSVD    1.38e-07   4.65e+02     799991      23     400
              IALM_FFSRQR    1.39e-07   4.43e+01     800006      26     400
              IALM_RSISVD    1.51e-07   5.04e+01     799990      23     400
              IALM_LTSVD     8.94e-02   1.54e+02     15996623    100    400
6000 × 6000   IALM_LANSVD    1.30e-07   1.66e+03     1799982     23     600
              IALM_FFSRQR    1.02e-07   1.42e+02     1799993     26     600
              IALM_RSISVD    1.44e-07   1.62e+02     1799985     23     600
              IALM_LTSVD     8.58e-02   5.55e+02     35992605    100    600

4.3.2. Matrix completion. We solve the matrix completion problems for two real data sets used in [68]: the Jester joke data set [24] and the MovieLens data set [32]. The Jester joke data set consists of 4.1 million ratings for 100 jokes from 73,421 users and can be downloaded from the website http://goldberg.berkeley.edu/jester-data/. We test with the following data matrices:
• jester-1: data from 24,983 users who have rated 36 or more jokes;
• jester-2: data from 23,500 users who have rated 36 or more jokes;
• jester-3: data from 24,938 users who have rated between 15 and 35 jokes;
• jester-all: the combination of jester-1, jester-2, and jester-3.
The MovieLens data set can be downloaded from https://grouplens.org/datasets/movielens/. We test with the following data matrices:
• movie-100K: 100,000 ratings of 943 users for 1682 movies;
• movie-1M: 1 million ratings of 6040 users for 3900 movies;
• movie-latest-small: 100,000 ratings of 700 users for 9000 movies.

For each data set, we let $M$ be the original data matrix, where $M_{ij}$ stands for the rating of joke (movie) $j$ by user $i$, and let $\Gamma$ be the set of indices where $M_{ij}$ is known. The matrix completion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE) defined by
\[
\mathrm{NMAE} \;\overset{\mathrm{def}}{=}\; \frac{\frac{1}{|\Gamma|}\sum_{(i,j)\in\Gamma} |M_{ij} - X_{ij}|}{r_{\max} - r_{\min}},
\]
where $X_{ij}$ is the prediction of the rating of joke (movie) $j$ given by user $i$, and $r_{\min}$, $r_{\max}$ are the lower and upper bounds of the ratings, respectively. For the Jester joke data sets we set $r_{\min} = -10$ and $r_{\max} = 10$. For the MovieLens data sets we set $r_{\min} = 1$ and $r_{\max} = 5$.
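The NMAE formula transcribes directly into NumPy, assuming the known ratings are given as index arrays; the variable names below are illustrative.

import numpy as np

def nmae(M, X, rows, cols, r_min, r_max):
    """Normalized Mean Absolute Error over the known entries (rows[t], cols[t])."""
    abs_err = np.abs(M[rows, cols] - X[rows, cols])
    return abs_err.mean() / (r_max - r_min)

# Toy example: a 3 x 4 rating matrix with five known entries, Jester-style bounds.
M = np.array([[ 5., -2.,  0.,  0.],
              [ 0.,  7.,  0., -9.],
              [ 1.,  0.,  0.,  0.]])
rows = np.array([0, 0, 1, 1, 2])
cols = np.array([0, 1, 1, 3, 0])
X = np.zeros_like(M)                      # a (bad) all-zero prediction
print(nmae(M, X, rows, cols, r_min=-10, r_max=10))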

Since $|\Gamma|$ is large, we randomly select a subset $\Omega$ from $\Gamma$ and then use Algorithm 8 to solve problem (2.8). We randomly select 10 ratings for each user in the Jester joke data sets, while we randomly choose about 50% of the ratings for each user in the MovieLens data sets. Table 4.5 includes the parameter settings in the algorithms. The maximum iteration number is 100 in IALM, and all other parameters are the same as those used in [45].

The numerical results are included in Table 4.6. We observe that IALM_FFSRQR achieves almost the same recoverability as the other methods (except IALM_LTSVD) and is slightly faster than IALM_RSISVD for these two data sets.

TABLE 4.5
Parameters used in the IALM method on matrix completion.

Data set             m       n      |Γ|        |Ω|
jester-1             24983   100    1.81e+06   249830
jester-2             23500   100    1.71e+06   235000
jester-3             24938   100    6.17e+05   249384
jester-all           73421   100    4.14e+06   734210
movie-100K           943     1682   1.00e+05   49918
movie-1M             6040    3706   1.00e+06   498742
movie-latest-small   671     9066   1.00e+05   52551

TABLE 4.6
Comparison on matrix completion.

Data set             Method         Iter   Time       NMAE       #sv   σ_max      σ_min
jester-1             IALM-LANSVD    12     7.06e+00   1.84e-01   100   2.14e+03   1.00e+00
                     IALM-FFSRQR    12     3.44e+00   1.69e-01   100   2.28e+03   1.00e+00
                     IALM-RSISVD    12     3.75e+00   1.89e-01   100   2.12e+03   1.00e+00
                     IALM-LTSVD     100    2.11e+01   1.74e-01   62    3.00e+03   1.00e+00
jester-2             IALM-LANSVD    12     6.80e+00   1.85e-01   100   2.13e+03   1.00e+00
                     IALM-FFSRQR    12     2.79e+00   1.70e-01   100   2.29e+03   1.00e+00
                     IALM-RSISVD    12     3.59e+00   1.91e-01   100   2.12e+03   1.00e+00
                     IALM-LTSVD     100    2.03e+01   1.75e-01   58    2.96e+03   1.00e+00
jester-3             IALM-LANSVD    12     7.05e+00   1.26e-01   99    1.79e+03   1.00e+00
                     IALM-FFSRQR    12     3.03e+00   1.22e-01   100   1.71e+03   1.00e+00
                     IALM-RSISVD    12     3.85e+00   1.31e-01   100   1.78e+03   1.00e+00
                     IALM-LTSVD     100    2.12e+01   1.33e-01   55    2.50e+03   1.00e+00
jester-all           IALM-LANSVD    12     2.39e+01   1.72e-01   100   3.56e+03   1.00e+00
                     IALM-FFSRQR    12     1.12e+01   1.62e-01   100   3.63e+03   1.00e+00
                     IALM-RSISVD    12     1.34e+01   1.82e-01   100   3.47e+03   1.00e+00
                     IALM-LTSVD     100    6.99e+01   1.68e-01   52    4.92e+03   1.00e+00
movie-100K           IALM-LANSVD    29     2.86e+01   1.83e-01   285   1.21e+03   1.00e+00
                     IALM-FFSRQR    30     4.55e+00   1.67e-01   295   1.53e+03   1.00e+00
                     IALM-RSISVD    29     4.82e+00   1.82e-01   285   1.29e+03   1.00e+00
                     IALM-LTSVD     48     1.42e+01   1.47e-01   475   1.91e+03   1.00e+00
movie-1M             IALM-LANSVD    50     7.40e+02   1.58e-01   495   4.99e+03   1.00e+00
                     IALM-FFSRQR    53     2.07e+02   1.37e-01   525   6.63e+03   1.00e+00
                     IALM-RSISVD    50     2.23e+02   1.57e-01   495   5.35e+03   1.00e+00
                     IALM-LTSVD     100    8.50e+02   1.17e-01   995   8.97e+03   1.00e+00
movie-latest-small   IALM-LANSVD    31     1.66e+02   1.85e-01   305   1.13e+03   1.00e+00
                     IALM-FFSRQR    31     1.96e+01   2.00e-01   305   1.42e+03   1.00e+00
                     IALM-RSISVD    31     2.85e+01   1.91e-01   305   1.20e+03   1.00e+00
                     IALM-LTSVD     63     4.02e+01   2.08e-01   298   1.79e+03   1.00e+00

5. Conclusions. We have presented the Flip-Flop SRQR factorization, a variant of the QLP factorization, to compute low-rank matrix approximations. The Flip-Flop SRQR algorithm uses an SRQR factorization to initialize the truncated version of a column-pivoted QR factorization and then forms an LQ factorization. For the numerical results presented, the errors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms. This new algorithm is cheaper to compute and produces quality low-rank matrix approximations. Furthermore, we prove singular value lower bounds and residual error upper bounds for the Flip-Flop SRQR factorization. In situations where the singular values of the input matrix decay relatively quickly, the low-rank approximation computed by Flip-Flop SRQR is guaranteed to be as accurate as the truncated SVD. We also perform a complexity analysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomized subspace iteration. Future work includes reducing the overhead cost in Flip-Flop SRQR and implementing the Flip-Flop SRQR algorithm on distributed memory machines for popular applications such as distributed PCA.

Acknowledgments. The authors would like to thank Yousef Saad, the editor, and three anonymous referees, who kindly reviewed the earlier version of this manuscript and provided valuable suggestions and comments.

Appendix A. Partial QR factorization with column pivoting. The details of the partial QRCP are covered in Algorithm 4.

Appendix B. TUXV algorithm. The details of the TUXV algorithm are presented in Algorithm 5.

Appendix C. Computing the Tucker decomposition of higher-order tensors. Algorithm 6 computes the Tucker decomposition of higher-order tensors, i.e., the sequentially truncated higher-order SVD.



Appendix D. IALM for solving two special nuclear norm minimization problems. Algorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems, respectively. In Algorithm 7, $S_{\omega}(x) = \mathrm{sgn}(x) \cdot \max(|x| - \omega, 0)$ is the soft shrinkage operator [29] with $x \in \mathbb{R}^n$ and $\omega > 0$. In Algorithm 8, $\bar{\Omega}$ is the complement of $\Omega$.
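In NumPy, the soft shrinkage operator and the resulting singular value thresholding step (the X-update shared by Algorithms 7 and 8) can be sketched as follows; this is an illustration, not the paper's Fortran/Matlab implementation.

import numpy as np

def soft_shrink(x, omega):
    """S_omega(x) = sgn(x) * max(|x| - omega, 0), applied elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - omega, 0.0)

def svd_threshold(M, omega):
    """X = U S_omega(Sigma) V^T, the singular value thresholding used inside IALM."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * soft_shrink(s, omega)) @ Vt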

Appendix E. Approximate SVD with randomized subspace iteration. The randomized subspace iteration was proposed in [30, Algorithm 4.4] to compute an orthonormal matrix whose range approximates the range of $A$. An approximate SVD can be computed using the aforementioned orthonormal matrix [30, Algorithm 5.1]. A randomized subspace iteration is used in the routine MLSVD_RSI in the Matlab toolbox tensorlab [73], and MLSVD_RSI is by far the most efficient function that we can find in Matlab to compute the ST-HOSVD. We summarize the approximate SVD with randomized subspace iteration pseudocode in Algorithm 9.
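A compact NumPy sketch of Algorithm 9 is given below; the economy QR factorizations play the role of qr(·, 0), and this follows the pseudocode directly rather than the tensorlab implementation.

import numpy as np

def rsi_svd(A, k, p=5, q=1, seed=0):
    """Approximate rank-k SVD of A by randomized subspace iteration (cf. [30])."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))      # i.i.d. Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)               # B = A*Omega, [Q,~] = qr(B, 0)
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q)             # B = A^T * Q
        Q, _ = np.linalg.qr(A @ Q)               # B = A * Q
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k]               # top k left/right singular vectors and values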

Now we perform a complexity analysis for the approximate SVD with randomized subspace iteration. We first note that:
1. The cost of generating a random matrix is negligible.
2. The cost of computing $B = A\Omega$ is $2mn(k+p)$.
3. In each QR step $[Q, \sim] = \mathrm{qr}(B, 0)$, the cost of computing the QR factorization of $B$ is $2m(k+p)^2 - \frac{2}{3}(k+p)^3$ (cf. [69]), and the cost of forming the first $(k+p)$ columns in the full $Q$ matrix is $m(k+p)^2 + \frac{1}{3}(k+p)^3$.

Now we count the flops for each $i$ in the for loop:
1. The cost of computing $B = A^T * Q$ is $2mn(k+p)$.
2. The cost of computing $[Q, \sim] = \mathrm{qr}(B, 0)$ is $2n(k+p)^2 - \frac{2}{3}(k+p)^3$, and the cost of forming the first $(k+p)$ columns in the full $Q$ matrix is $n(k+p)^2 + \frac{1}{3}(k+p)^3$.
3. The cost of computing $B = A * Q$ is $2mn(k+p)$.
4. The cost of computing $[Q, \sim] = \mathrm{qr}(B, 0)$ is $2m(k+p)^2 - \frac{2}{3}(k+p)^3$, and the cost of forming the first $(k+p)$ columns in the full $Q$ matrix is $m(k+p)^2 + \frac{1}{3}(k+p)^3$.

Together, the cost of running the for loop $q$ times is
\[
q \left( 4mn(k+p) + 3(m+n)(k+p)^2 - \frac{2}{3}(k+p)^3 \right).
\]
Additionally, the cost of computing $B = Q^T * A$ is $2mn(k+p)$, the cost of the SVD of $B$ is $O(n(k+p)^2)$, and the cost of computing $U = Q * U$ is $2m(k+p)^2$.

Now assume $k + p \ll \min(m, n)$ and omit the lower-order terms; then we arrive at $(4q+4)\,mn(k+p)$ as the complexity of the approximate SVD with randomized subspace iteration. In practice, $q$ is usually chosen to be an integer between 0 and 2.

Algorithm 4 Partial QRCP
Inputs:
Matrix $A \in \mathbb{R}^{m \times n}$. Target rank $k$.
Outputs:
Orthogonal matrix $Q \in \mathbb{R}^{m \times m}$. Upper trapezoidal matrix $R \in \mathbb{R}^{m \times n}$. Permutation matrix $\Pi \in \mathbb{R}^{n \times n}$ such that $A\Pi = QR$.
Algorithm:
Initialize $\Pi = I_n$. Compute column norms $r_s = \|A(1:m, s)\|_2$ $(1 \le s \le n)$.
for $j = 1 : k$ do
    Find $i = \arg\max_{j \le s \le n} r_s$. Exchange $r_j$ and $r_i$, and exchange columns $j$ and $i$ in $A$ and $\Pi$.
    Form the Householder reflection $Q_j$ from $A(j:m, j)$.
    Update the trailing matrix $A(j:m, j:n) \leftarrow Q_j^T A(j:m, j:n)$.
    Update $r_s = \|A(j+1:m, s)\|_2$ $(j+1 \le s \le n)$.
end for
$Q = Q_1 Q_2 \cdots Q_k$ is the product of all reflections; $R$ is the upper trapezoidal part of $A$.

Algorithm 5 TUXV Algorithm
Inputs:
Matrix $A \in \mathbb{R}^{m \times n}$. Target rank $k$. Block size $b$. Oversampling size $p \ge 0$.
Outputs:
Column-orthonormal matrices $U \in \mathbb{R}^{m \times k}$, $V \in \mathbb{R}^{n \times k}$ and upper triangular matrix $R \in \mathbb{R}^{k \times k}$ such that $A \approx U R V^T$.
Algorithm:
Do TRQRCP on $A$ to obtain $Q \in \mathbb{R}^{m \times k}$, $R \in \mathbb{R}^{k \times n}$, and $\Pi \in \mathbb{R}^{n \times n}$.
$R = R\Pi^T$, and do an LQ factorization, i.e., $[V, R] = \mathrm{qr}(R^T, 0)$.
Compute $Z = AV$ and do a QR factorization, i.e., $[U, R] = \mathrm{qr}(Z, 0)$.


Algorithm 6 ST-HOSVD
Inputs:
Tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_d}$, desired rank $(k_1, \ldots, k_d)$, and processing order $p = (p_1, \ldots, p_d)$.
Outputs:
Tucker decomposition $[\mathcal{G}; U_1, \ldots, U_d]$.
Algorithm:
Define tensor $\mathcal{G} \leftarrow \mathcal{X}$.
for $j = 1 : d$ do
    $r = p_j$.
    Compute the exact or approximate rank-$k_r$ SVD of the tensor unfolding $\mathcal{G}_{(r)} \approx U_r \Sigma_r V_r^T$.
    Store the factor matrix $U_r$.
    Update $\mathcal{G}_{(r)} \leftarrow \Sigma_r V_r^T$, i.e., apply $U_r^T$ to $\mathcal{G}$.
end for

Algorithm 7 Robust PCA Using IALM
Inputs:
Measured matrix $M \in \mathbb{R}^{m \times n}$, positive numbers $\lambda$, $\mu_0$, $\bar{\mu}$, tolerance tol, $\rho > 1$.
Outputs:
Matrix pair $(X_k, E_k)$.
Algorithm:
$k = 0$; $J(M) = \max\left(\|M\|_2, \lambda^{-1}\|M\|_F\right)$; $Y_0 = M / J(M)$; $E_0 = 0$.
while not converged do
    $(U, \Sigma, V) = \mathrm{svd}\left(M - E_k + \mu_k^{-1} Y_k\right)$.
    $X_{k+1} = U\, S_{\mu_k^{-1}}(\Sigma)\, V^T$.
    $E_{k+1} = S_{\lambda \mu_k^{-1}}\left(M - X_{k+1} + \mu_k^{-1} Y_k\right)$.
    $Y_{k+1} = Y_k + \mu_k \left(M - X_{k+1} - E_{k+1}\right)$.
    Update $\mu_{k+1} = \min(\rho\, \mu_k, \bar{\mu})$.
    $k = k + 1$.
    if $\|M - X_k - E_k\|_F / \|M\|_F < \mathrm{tol}$ then
        Break.
    end if
end while
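As an illustration of how Algorithm 7 fits together, here is a hedged NumPy sketch of the IALM loop for robust PCA; the parameter defaults (mu0, rho, mu_bar, tolerance) are illustrative assumptions, and an exact SVD stands in for the approximate SVDs compared in Section 4.3.

import numpy as np

def soft_shrink(x, omega):
    return np.sign(x) * np.maximum(np.abs(x) - omega, 0.0)

def ialm_rpca(M, lam, mu0=1e-3, rho=1.5, mu_bar=1e7, tol=1e-7, max_iter=200):
    """Robust PCA by inexact ALM (structure of Algorithm 7), with an exact SVD in the X-update."""
    m, n = M.shape
    J = max(np.linalg.norm(M, 2), np.linalg.norm(M) / lam)   # scaling for Y0
    Y = M / J
    E = np.zeros((m, n))
    mu = mu0
    for _ in range(max_iter):
        # X-update: singular value thresholding of M - E + Y/mu.
        U, s, Vt = np.linalg.svd(M - E + Y / mu, full_matrices=False)
        X = (U * soft_shrink(s, 1.0 / mu)) @ Vt
        # E-update: entrywise soft shrinkage with threshold lambda/mu.
        E = soft_shrink(M - X + Y / mu, lam / mu)
        # Dual update and penalty increase.
        Y = Y + mu * (M - X - E)
        mu = min(rho * mu, mu_bar)
        if np.linalg.norm(M - X - E) / np.linalg.norm(M) < tol:
            break
    return X, E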

Algorithm 8 Matrix Completion Using IALM
Inputs:
Sampled set $\Omega$, sampled entries $\pi_{\Omega}(M)$, positive numbers $\lambda$, $\mu_0$, $\bar{\mu}$, tolerance tol, $\rho > 1$.
Outputs:
Matrix pair $(X_k, E_k)$.
Algorithm:
$k = 0$; $Y_0 = 0$; $E_0 = 0$.
while not converged do
    $(U, \Sigma, V) = \mathrm{svd}\left(M - E_k + \mu_k^{-1} Y_k\right)$.
    $X_{k+1} = U\, S_{\mu_k^{-1}}(\Sigma)\, V^T$.
    $E_{k+1} = \pi_{\bar{\Omega}}\left(M - X_{k+1} + \mu_k^{-1} Y_k\right)$.
    $Y_{k+1} = Y_k + \mu_k \left(M - X_{k+1} - E_{k+1}\right)$.
    Update $\mu_{k+1} = \min(\rho\, \mu_k, \bar{\mu})$.
    $k = k + 1$.
    if $\|M - X_k - E_k\|_F / \|M\|_F < \mathrm{tol}$ then
        Break.
    end if
end while

Algorithm 9 Approximate SVD with Randomized Subspace Iteration
Inputs:
Matrix $A \in \mathbb{R}^{m \times n}$. Target rank $k$. Oversampling size $p \ge 0$. Number of iterations $q \ge 1$.
Outputs:
$U \in \mathbb{R}^{m \times k}$ contains the approximate top $k$ left singular vectors of $A$.
$\Sigma \in \mathbb{R}^{k \times k}$ contains the approximate top $k$ singular values of $A$.
$V \in \mathbb{R}^{n \times k}$ contains the approximate top $k$ right singular vectors of $A$.
Algorithm:
Generate an i.i.d. Gaussian matrix $\Omega \in \mathcal{N}(0,1)^{n \times (k+p)}$.
Compute $B = A\Omega$.
$[Q, \sim] = \mathrm{qr}(B, 0)$.
for $i = 1 : q$ do
    $B = A^T * Q$; $[Q, \sim] = \mathrm{qr}(B, 0)$.
    $B = A * Q$; $[Q, \sim] = \mathrm{qr}(B, 0)$.
end for
$B = Q^T * A$.
$[U, \Sigma, V] = \mathrm{svd}(B)$.
$U = Q * U$; $U = U(:, 1:k)$; $\Sigma = \Sigma(1:k, 1:k)$; $V = V(:, 1:k)$.

REFERENCES

[1] O. ALTER, P. O. BROWN, AND D. BOTSTEIN, Singular value decomposition for genome-wide expression data processing and modeling, Proc. Natl. Acad. Sci. USA, 97 (2000), pp. 10101–10106.
[2] E. ANDERSON, Z. BAI, C. BISCHOF, L. S. BLACKFORD, J. DEMMEL, J. DONGARRA, J. DU CROZ, A. GREENBAUM, S. HAMMARLING, A. MCKENNEY, AND D. SORENSEN, LAPACK Users' Guide, 3rd ed., SIAM, Philadelphia, 1999.
[3] C. A. ANDERSSON AND R. BRO, Improving the speed of multi-way algorithms: Part I. Tucker3, Chemometrics Intell. Lab. Syst., 42 (1998), pp. 93–103.
[4] M. W. BERRY, S. T. DUMAIS, AND G. W. O'BRIEN, Using linear algebra for intelligent information retrieval, SIAM Rev., 37 (1995), pp. 573–595.
[5] C. BISCHOF AND C. VAN LOAN, The WY representation for products of Householder matrices, SIAM J. Sci. Statist. Comput., 8 (1987), pp. S2–S13.
[6] P. BUSINGER AND G. H. GOLUB, Handbook series linear algebra. Linear least squares solutions by Householder transformations, Numer. Math., 7 (1965), pp. 269–276.
[7] J.-F. CAI, E. J. CANDÈS, AND Z. SHEN, A singular value thresholding algorithm for matrix completion, SIAM J. Optim., 20 (2010), pp. 1956–1982.
[8] E. J. CANDÈS, X. LI, Y. MA, AND J. WRIGHT, Robust principal component analysis, J. ACM, 58 (2011), Art. 11, 37 pages.
[9] E. J. CANDÈS AND B. RECHT, Exact matrix completion via convex optimization, Found. Comput. Math., 9 (2009), pp. 717–772.
[10] J. D. CARROLL AND J.-J. CHANG, Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition, Psychometrika, 35 (1970), pp. 283–319.
[11] T. A. DAVIS AND Y. HU, The University of Florida sparse matrix collection, ACM Trans. Math. Software, 38 (2011), Art. 1, 25 pages.
[12] L. DE LATHAUWER AND B. DE MOOR, From matrix to tensor: multilinear algebra and signal processing, in Mathematics in Signal Processing IV, Inst. Math. Appl. Conf. Ser., Oxford Univ. Press, Oxford, 1998, pp. 1–16.
[13] L. DE LATHAUWER, B. DE MOOR, AND J. VANDEWALLE, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253–1278.
[14] ———, On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1324–1342.
[15] L. DE LATHAUWER AND J. VANDEWALLE, Dimensionality reduction in higher-order signal processing and rank-(R1, R2, ..., RN) reduction in multilinear algebra, Linear Algebra Appl., 391 (2004), pp. 31–55.
[16] P. DRINEAS, R. KANNAN, AND M. W. MAHONEY, Fast Monte Carlo algorithms for matrices II: computing a low-rank approximation to a matrix, SIAM J. Comput., 36 (2006), pp. 158–183.
[17] J. A. DUERSCH AND M. GU, Randomized QR with column pivoting, SIAM J. Sci. Comput., 39 (2017), pp. C263–C291.
[18] C. ECKART AND G. YOUNG, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211–218.
[19] L. ELDÉN, Matrix Methods in Data Mining and Pattern Recognition, SIAM, Philadelphia, 2007.
[20] M. FAZEL, H. HINDI, AND S. P. BOYD, Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices, in Proceedings of the 2003 American Control Conference, IEEE Conference Proceedings, IEEE, Piscataway, 2003, pp. 2156–2162.
[21] M. FAZEL, T. K. PONG, D. SUN, AND P. TSENG, Hankel matrix rank minimization with applications to system identification and realization, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 946–977.
[22] C. FRANK AND E. NÖTH, Optimizing eigenfaces by face masks for facial expression recognition, in Computer Analysis of Images and Patterns, N. Petkov and M. A. Westenberg, eds., vol. 2756 of Lecture Notes in Comput. Sci., Springer, Berlin, 2003, pp. 646–654.
[23] G. W. FURNAS, S. DEERWESTER, S. T. DUMAIS, T. K. LANDAUER, R. A. HARSHMAN, L. A. STREETER, AND K. E. LOCHBAUM, Information retrieval using a singular value decomposition model of latent semantic structure, in Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Y. Chiaramella, ed., ACM, New York, 1988, pp. 465–480.
[24] K. GOLDBERG, T. ROEDER, D. GUPTA, AND C. PERKINS, Eigentaste: a constant time collaborative filtering algorithm, Inf. Retrieval, 4 (2001), pp. 133–151.
[25] G. GOLUB, Numerical methods for solving linear least squares problems, Numer. Math., 7 (1965), pp. 206–216.
[26] G. H. GOLUB AND C. F. VAN LOAN, Matrix Computations, 4th ed., Johns Hopkins University Press, Baltimore, 2013.
[27] M. GU, Subspace iteration randomization and singular value problems, SIAM J. Sci. Comput., 37 (2015), pp. A1139–A1173.
[28] M. GU AND S. C. EISENSTAT, Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869.
[29] E. T. HALE, W. YIN, AND Y. ZHANG, Fixed-point continuation for l1-minimization: methodology and convergence, SIAM J. Optim., 19 (2008), pp. 1107–1130.
[30] N. HALKO, P. G. MARTINSSON, AND J. A. TROPP, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217–288.
[31] R. A. HARSHMAN, Foundations of the PARAFAC procedure: models and conditions for an "explanatory" multimodal factor analysis, UCLA Working Papers in Phonetics, 16 (1970), pp. 1–84.
[32] J. L. HERLOCKER, J. A. KONSTAN, A. BORCHERS, AND J. RIEDL, An algorithmic framework for performing collaborative filtering, in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, F. Gey, M. Heaast, and R. Tong, eds., ACM, New York, 1999, pp. 230–237.
[33] R. A. HORN AND C. R. JOHNSON, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1991.
[34] D. A. HUCKABY AND T. F. CHAN, On the convergence of Stewart's QLP algorithm for approximating the SVD, Numer. Algorithms, 32 (2003), pp. 287–316.
[35] ———, Stewart's pivoted QLP decomposition for low-rank matrices, Numer. Linear Algebra Appl., 12 (2005), pp. 153–159.
[36] W. B. JOHNSON AND J. LINDENSTRAUSS, Extensions of Lipschitz mappings into a Hilbert space, in Conference in Modern Analysis and Probability (New Haven, Conn., 1982), R. Beals, A. Beck, A. Bellow, and A. Hajian, eds., vol. 26 of Contemp. Math., Amer. Math. Soc., Providence, 1984, pp. 189–206.


[37] I. T. JOLLIFFE, Principal Component Analysis, Springer, New York, 1986.
[38] J. M. KLEINBERG, Authoritative sources in a hyperlinked environment, J. ACM, 46 (1999), pp. 604–632.
[39] T. G. KOLDA, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 243–255.
[40] T. G. KOLDA AND B. W. BADER, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455–500.
[41] R. M. LARSEN, PROPACK—software for large and sparse SVD calculations, available at http://sun.stanford.edu/~rmunk/PROPACK/.
[42] ———, Lanczos bidiagonalization with partial reorthogonalization, Tech. Report PB-537, Department of Computer Science, Aarhus University, Aarhus, Denmark, 1998.
[43] Y. LECUN, L. BOTTOU, Y. BENGIO, AND P. HAFFNER, Gradient-based learning applied to document recognition, Proc. IEEE, 86 (1998), pp. 2278–2324.
[44] R. B. LEHOUCQ, D. C. SORENSEN, AND C. YANG, ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods, SIAM, Philadelphia, 1998.
[45] Z. LIN, M. CHEN, AND Y. MA, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices, Preprint on arXiv, https://arxiv.org/abs/1009.5055, 2010.
[46] N. LINIAL, E. LONDON, AND Y. RABINOVICH, The geometry of graphs and some of its algorithmic applications, Combinatorica, 15 (1995), pp. 215–245.
[47] Z. LIU, A. HANSSON, AND L. VANDENBERGHE, Nuclear norm system identification with missing inputs and outputs, Systems Control Lett., 62 (2013), pp. 605–612.
[48] Z. LIU AND L. VANDENBERGHE, Interior-point method for nuclear norm approximation with application to system identification, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1235–1256.
[49] S. MA, D. GOLDFARB, AND L. CHEN, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program., 128 (2011), pp. 321–353.
[50] C. B. MELGAARD, Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra, PhD Thesis, University of California, Berkeley, 2015.
[51] M. MESBAHI AND G. P. PAPAVASSILOPOULOS, On the rank minimization problem over a positive semidefinite linear matrix inequality, IEEE Trans. Automat. Control, 42 (1997), pp. 239–243.
[52] N. MULLER, L. MAGAIA, AND B. M. HERBST, Singular value decomposition, eigenfaces, and 3D reconstructions, SIAM Rev., 46 (2004), pp. 518–545.
[53] M. NARWARIA AND W. LIN, SVD-based quality metric for image and video using machine learning, IEEE Trans. Syst., Man, and Cybernetics, Part B (Cybernetics), 42 (2012), pp. 347–364.
[54] T.-H. OH, Y. MATSUSHITA, Y.-W. TAI, AND I. SO KWEON, Fast randomized singular value thresholding for nuclear norm minimization, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015, IEEE Conference Proceedings, IEEE, Los Alamitos, 2015, pp. 4484–4493.
[55] G. QUINTANA-ORTÍ, X. SUN, AND C. H. BISCHOF, A BLAS-3 version of the QR factorization with column pivoting, SIAM J. Sci. Comput., 19 (1998), pp. 1486–1494.
[56] B. RECHT, M. FAZEL, AND P. A. PARRILO, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Rev., 52 (2010), pp. 471–501.
[57] J. D. RENNIE AND N. SREBRO, Fast maximum margin matrix factorization for collaborative prediction, in Proceedings of the 22nd International Conference on Machine Learning, L. De Raedt and S. Wrobel, eds., ACM Press, New York, 2005, pp. 713–719.
[58] V. ROKHLIN, A. SZLAM, AND M. TYGERT, A randomized algorithm for principal component analysis, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1100–1124.
[59] A. K. SAIBABA, HOID: higher order interpolatory decomposition for tensors based on Tucker representation, SIAM J. Matrix Anal. Appl., 37 (2016), pp. 1223–1249.
[60] B. SAVAS AND L. ELDÉN, Handwritten digit classification using higher order singular value decomposition, Pattern Recognition, 40 (2007), pp. 993–1003.
[61] R. SCHREIBER AND C. VAN LOAN, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 53–57.
[62] A. SHASHUA AND T. HAZAN, Non-negative tensor factorization with applications to statistics and computer vision, in Proceedings of the 22nd International Conference on Machine Learning, S. Dzeroski, L. De Raedt, and S. Wrobel, eds., ACM, New York, 2005, pp. 792–799.
[63] N. D. SIDIROPOULOS, R. BRO, AND G. B. GIANNAKIS, Parallel factor analysis in sensor array processing, IEEE Trans. Signal Process., 48 (2000), pp. 2377–2388.
[64] D. C. SORENSEN AND M. EMBREE, A DEIM induced CUR factorization, SIAM J. Sci. Comput., 38 (2016), pp. A1454–A1482.
[65] N. SREBRO, J. RENNIE, AND T. S. JAAKKOLA, Maximum-margin matrix factorization, in NIPS'04: Proceedings of the 17th International Conference on Neural Information Processing Systems, L. K. Saul, Y. Weiss, and L. Bottou, eds., MIT Press, Cambridge, 2004, pp. 1329–1336.
[66] G. W. STEWART, The QLP approximation to the singular value decomposition, SIAM J. Sci. Comput., 20 (1999), pp. 1336–1348.


[67] S. TAHERI, Q. QIU, AND R. CHELLAPPA, Structure-preserving sparse decomposition for facial expression analysis, IEEE Trans. Image Process., 23 (2014), pp. 3590–3603.
[68] K.-C. TOH AND S. YUN, An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems, Pac. J. Optim., 6 (2010), pp. 615–640.
[69] L. N. TREFETHEN AND D. BAU, III, Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[70] L. R. TUCKER, Some mathematical notes on three-mode factor analysis, Psychometrika, 31 (1966), pp. 279–311.
[71] N. VANNIEUWENHOVEN, R. VANDEBRIL, AND K. MEERBERGEN, A new truncation strategy for the higher-order singular value decomposition, SIAM J. Sci. Comput., 34 (2012), pp. A1027–A1052.
[72] M. A. O. VASILESCU AND D. TERZOPOULOS, Multilinear analysis of image ensembles: TensorFaces, in European Conference on Computer Vision 2002, A. Heyden, G. Sparr, M. Nielsen, and P. Johansen, eds., vol. 2350 of Lecture Notes in Computer Science, Springer, Berlin, 2002, pp. 447–460.
[73] N. VERVLIET, O. DEBALS, L. SORBER, M. V. BAREL, AND L. DE LATHAUWER, Tensorlab user guide, available at http://www.tensorlab.net.
[74] J. XIAO AND M. GU, Spectrum-revealing Cholesky factorization for kernel methods, in Proceedings of the 16th IEEE International Conference on Data Mining (ICDM), IEEE Conference Proceedings, Los Alamitos, 2016, pp. 1293–1298.
[75] J. XIAO, M. GU, AND J. LANGOU, Fast parallel randomized QR with column pivoting algorithms for reliable low-rank matrix approximations, in Proceedings of the 24th IEEE International Conference on High Performance Computing (HiPC), IEEE Conference Proceedings, Los Alamitos, 2017, pp. 233–242.
[76] T. ZHANG AND G. H. GOLUB, Rank-one approximation to high order tensors, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 534–550.

Page 14: FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND …etna.mcs.kent.edu/vol.51.2019/pp469-494.dir/pp469-494.pdfminimization. In Section3, we introduce Flip-Flop SRQR and analyze its

ETNAKent State University and

Johann Radon Institute (RICAM)

482 Y FENG J XIAO AND M GU

TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD

Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n

obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV

100 200 300 400 500

target rank k

100

101

102

103

104

run tim

e (

sec)

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

100 200 300 400 500

target rank k

10-1

100

101

102

run tim

e (

sec)

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

100 200 300 400 500

target rank k

10-1

100

101

102

run tim

e (

sec)

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

100 200 300 400 500

target rank k

10-1

100

101

102

103

run tim

e (

sec)

FFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 41 Run time comparison for approximate SVD algorithms

42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483

100 200 300 400 500

target rank k

022

024

026

028

03

032

034

036re

lative

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

100 200 300 400 500

target rank k

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

100 200 300 400 500

target rank k

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

100 200 300 400 500

target rank k

024

026

028

03

032

034

036

rela

tive

ap

pro

xim

atio

n e

rro

rFFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 42 Relative approximation error comparison for approximate SVD algorithms

algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab

421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]

X =

10sumj=1

1000

jxj yj zj +

nsumj=11

1

jxj yj zj

where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2

U2 times3 U3

ETNAKent State University and

Johann Radon Institute (RICAM)

484 Y FENG J XIAO AND M GU

5 10 15 20

index

06

07

08

09

1to

p 2

0 s

ingula

r valu

es

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

5 10 15 20

index

06

07

08

09

1

top 2

0 s

ingula

r valu

es

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

5 10 15 20

index

06

065

07

075

08

085

09

095

top

20

sin

gu

lar

va

lue

s

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

5 10 15 20

index

101

102

103

top

20

sin

gu

lar

va

lue

sFFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 43 Top 20 singular values comparison for approximate SVD algorithms

Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger

422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]

We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3

where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method

MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485

50 100 150 200

target rank k

10-5

10-4

10-3

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

MLSVD

MLSVD_FFSRQR

MLSVD_RSI

MLSVD_LTSVD

50 100 150 200

target rank k

10-2

10-1

100

101

102

run tim

e (

sec)

MLSVD

MLSVD_FFSRQR

MLSVD_RSI

MLSVD_LTSVD

FIG 44 Run time and relative approximation error comparison on a sparse tensor

TABLE 43Comparison on handwritten digits classification

MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD

Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259

one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one

43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1

431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX

TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are

independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by

(X E

) In this numerical experiment we use the same parameter

settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1

radicmax (mn) as suggested by

Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF

Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM

1httpperceptioncslillinoisedumatrix-ranksample_codehtml

ETNAKent State University and

Johann Radon Institute (RICAM)

486 Y FENG J XIAO AND M GU

algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective

TABLE 44Comparison on robust PCA

Size Method Error Time (sec) E0 Iter sv

1000times 1000

IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100

2000times 2000

IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200

4000times 4000

IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400

6000times 6000

IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600

432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices

bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3

The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies

For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by

NMAEdef=

1|Γ|sum

(ij)isinΓ |Mij minusXij |rmax minus rmin

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487

where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5

Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]

The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets

TABLE 45Parameters used in the IALM method on matrix completion

Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384

jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742

movie-latest-small 671 9066 100e+05 52551

5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA

Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments

Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4

Appendix B. TUXV algorithm. The details of the TUXV algorithm are presented in Algorithm 5.

Appendix C. Computing the Tucker decomposition of higher-order tensors. Algorithm 6 computes the Tucker decomposition of higher-order tensors, i.e., the sequentially truncated higher-order SVD.


TABLE 4.6
Comparison on matrix completion.

Data set             Method        Iter  Time (sec)  NMAE      sv   σmax      σmin
jester-1             IALM-LANSVD   12    7.06e+00    1.84e-01  100  2.14e+03  1.00e+00
                     IALM-FFSRQR   12    3.44e+00    1.69e-01  100  2.28e+03  1.00e+00
                     IALM-RSISVD   12    3.75e+00    1.89e-01  100  2.12e+03  1.00e+00
                     IALM-LTSVD    100   2.11e+01    1.74e-01  62   3.00e+03  1.00e+00
jester-2             IALM-LANSVD   12    6.80e+00    1.85e-01  100  2.13e+03  1.00e+00
                     IALM-FFSRQR   12    2.79e+00    1.70e-01  100  2.29e+03  1.00e+00
                     IALM-RSISVD   12    3.59e+00    1.91e-01  100  2.12e+03  1.00e+00
                     IALM-LTSVD    100   2.03e+01    1.75e-01  58   2.96e+03  1.00e+00
jester-3             IALM-LANSVD   12    7.05e+00    1.26e-01  99   1.79e+03  1.00e+00
                     IALM-FFSRQR   12    3.03e+00    1.22e-01  100  1.71e+03  1.00e+00
                     IALM-RSISVD   12    3.85e+00    1.31e-01  100  1.78e+03  1.00e+00
                     IALM-LTSVD    100   2.12e+01    1.33e-01  55   2.50e+03  1.00e+00
jester-all           IALM-LANSVD   12    2.39e+01    1.72e-01  100  3.56e+03  1.00e+00
                     IALM-FFSRQR   12    1.12e+01    1.62e-01  100  3.63e+03  1.00e+00
                     IALM-RSISVD   12    1.34e+01    1.82e-01  100  3.47e+03  1.00e+00
                     IALM-LTSVD    100   6.99e+01    1.68e-01  52   4.92e+03  1.00e+00
movie-100K           IALM-LANSVD   29    2.86e+01    1.83e-01  285  1.21e+03  1.00e+00
                     IALM-FFSRQR   30    4.55e+00    1.67e-01  295  1.53e+03  1.00e+00
                     IALM-RSISVD   29    4.82e+00    1.82e-01  285  1.29e+03  1.00e+00
                     IALM-LTSVD    48    1.42e+01    1.47e-01  475  1.91e+03  1.00e+00
movie-1M             IALM-LANSVD   50    7.40e+02    1.58e-01  495  4.99e+03  1.00e+00
                     IALM-FFSRQR   53    2.07e+02    1.37e-01  525  6.63e+03  1.00e+00
                     IALM-RSISVD   50    2.23e+02    1.57e-01  495  5.35e+03  1.00e+00
                     IALM-LTSVD    100   8.50e+02    1.17e-01  995  8.97e+03  1.00e+00
movie-latest-small   IALM-LANSVD   31    1.66e+02    1.85e-01  305  1.13e+03  1.00e+00
                     IALM-FFSRQR   31    1.96e+01    2.00e-01  305  1.42e+03  1.00e+00
                     IALM-RSISVD   31    2.85e+01    1.91e-01  305  1.20e+03  1.00e+00
                     IALM-LTSVD    63    4.02e+01    2.08e-01  298  1.79e+03  1.00e+00

Appendix D. IALM for solving two special nuclear norm minimization problems. Algorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems, respectively. In Algorithm 7, S_ω(x) = sgn(x) · max(|x| − ω, 0) is the soft shrinkage operator [29] with x ∈ R^n and ω > 0. In Algorithm 8, Ω̄ is the complement of Ω.
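As a small illustration, the two building blocks used in Algorithms 7 and 8, the soft shrinkage operator S_ω and the projection π_Ω̄ onto the unsampled entries, can be written in a few lines of NumPy; the function names and the boolean-mask encoding of Ω are our own assumptions made for this sketch.

    import numpy as np

    def soft_shrink(x, omega):
        # S_omega(x) = sgn(x) * max(|x| - omega, 0), applied elementwise
        return np.sign(x) * np.maximum(np.abs(x) - omega, 0.0)

    def project_complement(Y, omega_mask):
        # pi over the complement of Omega: zero the sampled entries, keep the rest
        return np.where(omega_mask, 0.0, Y)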

Appendix E. Approximate SVD with randomized subspace iteration. The randomized subspace iteration was proposed in [30, Algorithm 4.4] to compute an orthonormal matrix whose range approximates the range of A. An approximate SVD can be computed using the aforementioned orthonormal matrix [30, Algorithm 5.1]. A randomized subspace iteration is used in the routine MLSVD_RSI in the Matlab toolbox tensorlab [73], and MLSVD_RSI is by far the most efficient function that we can find in Matlab to compute the ST-HOSVD. We summarize the approximate SVD with randomized subspace iteration pseudocode in Algorithm 9.
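For orientation, a minimal NumPy sketch of this randomized subspace iteration, mirroring the structure of Algorithm 9 but with our own function name and none of the blocking or tuning of MLSVD_RSI, is given below.

    import numpy as np

    def rsi_svd(A, k, p=5, q=1):
        m, n = A.shape
        Omega = np.random.standard_normal((n, k + p))    # Gaussian test matrix
        Q, _ = np.linalg.qr(A @ Omega)                   # economy QR of B = A*Omega
        for _ in range(q):
            Q, _ = np.linalg.qr(A.T @ Q)                 # B = A^T * Q
            Q, _ = np.linalg.qr(A @ Q)                   # B = A * Q
        U, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
        U = Q @ U
        # keep the leading k approximate singular triplets
        return U[:, :k], s[:k], Vt[:k, :].T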

Now we perform a complexity analysis for the approximate SVD with randomized subspace iteration. We first note that:

1. The cost of generating a random matrix is negligible.
2. The cost of computing B = AΩ is 2mn(k + p).


Algorithm 4: Partial QRCP
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k.
Outputs:
  Orthogonal matrix Q ∈ R^{m×m}.
  Upper trapezoidal matrix R ∈ R^{m×n}.
  Permutation matrix Π ∈ R^{n×n} such that AΠ = QR.
Algorithm:
  Initialize Π = I_n. Compute column norms r_s = ‖A(1 : m, s)‖_2 (1 ≤ s ≤ n).
  for j = 1 : k do
    Find i = arg max_{j ≤ s ≤ n} r_s. Exchange r_j and r_i, and exchange columns j and i in A and Π.
    Form the Householder reflection Q_j from A(j : m, j).
    Update the trailing matrix A(j : m, j : n) ← Q_j^T A(j : m, j : n).
    Update r_s = ‖A(j + 1 : m, s)‖_2 (j + 1 ≤ s ≤ n).
  end for
  Q = Q_1 Q_2 ··· Q_k is the product of all reflections. R = upper trapezoidal part of A.
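As a point of reference, SciPy's pivoted QR can be used to see what the leading k steps of QRCP produce; note that scipy.linalg.qr runs the pivoted factorization to completion rather than stopping after k steps, so this is only an illustrative stand-in for the partial algorithm above.

    import numpy as np
    from scipy.linalg import qr

    def qrcp_rank_k(A, k):
        # Full QR with column pivoting: A[:, piv] = Q @ R
        Q, R, piv = qr(A, mode='economic', pivoting=True)
        # The leading k columns of Q and rows of R give a rank-k approximation
        # of the column-permuted matrix: A[:, piv] ≈ Q[:, :k] @ R[:k, :]
        return Q[:, :k], R[:k, :], piv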

Algorithm 5: TUXV Algorithm
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k. Block size b. Oversampling size p ≥ 0.
Outputs:
  Column orthonormal matrices U ∈ R^{m×k}, V ∈ R^{n×k}, and upper triangular matrix R ∈ R^{k×k} such that A ≈ URV^T.
Algorithm:
  Do TRQRCP on A to obtain Q ∈ R^{m×k}, R ∈ R^{k×n}, and Π ∈ R^{n×n}.
  R = RΠ^T, and do an LQ factorization, i.e., [V, R] = qr(R^T, 0).
  Compute Z = AV and do a QR factorization, i.e., [U, R] = qr(Z, 0).
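A rough NumPy/SciPy sketch of this flow is shown below; it substitutes SciPy's pivoted QR for the TRQRCP step purely to keep the example self-contained, so it reproduces the structure of Algorithm 5 rather than its blocked, randomized implementation.

    import numpy as np
    from scipy.linalg import qr

    def tuxv(A, k):
        # Truncated column-pivoted QR in place of TRQRCP: A[:, piv] = Q @ R
        Q, R, piv = qr(A, mode='economic', pivoting=True)
        Qk, Rk = Q[:, :k], R[:k, :]
        # Undo the column permutation: Rp = Rk * Pi^T, so that A ≈ Qk @ Rp
        Rp = np.empty_like(Rk)
        Rp[:, piv] = Rk
        # LQ factorization of Rp via an economy QR of its transpose
        V, L = qr(Rp.T, mode='economic')        # V: n-by-k, L: k-by-k
        Z = A @ V                                # project A onto range(V)
        U, Rout = qr(Z, mode='economic')         # A ≈ U @ Rout @ V.T
        return U, Rout, V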

3. In each QR step [Q, ∼] = qr(B, 0), the cost of computing the QR factorization of B is 2m(k + p)^2 − (2/3)(k + p)^3 (cf. [69]), and the cost of forming the first (k + p) columns in the full Q matrix is m(k + p)^2 + (1/3)(k + p)^3.

Now we count the flops for each i in the for loop:
1. The cost of computing B = A^T * Q is 2mn(k + p).
2. The cost of computing [Q, ∼] = qr(B, 0) is 2n(k + p)^2 − (2/3)(k + p)^3, and the cost of forming the first (k + p) columns in the full Q matrix is n(k + p)^2 + (1/3)(k + p)^3.
3. The cost of computing B = A * Q is 2mn(k + p).
4. The cost of computing [Q, ∼] = qr(B, 0) is 2m(k + p)^2 − (2/3)(k + p)^3, and the cost of forming the first (k + p) columns in the full Q matrix is m(k + p)^2 + (1/3)(k + p)^3.

Together, the cost of running the for loop q times is

\[
q \left( 4mn(k + p) + 3(m + n)(k + p)^2 - \frac{2}{3}(k + p)^3 \right).
\]

Additionally, the cost of computing B = Q^T * A is 2mn(k + p), the cost of the SVD of B is O(n(k + p)^2), and the cost of computing U = Q * U is 2m(k + p)^2.


Algorithm 6: ST-HOSVD
Inputs:
  Tensor X ∈ R^{I_1×···×I_d}, desired rank (k_1, ..., k_d), and processing order p = (p_1, ..., p_d).
Outputs:
  Tucker decomposition [G; U_1, ..., U_d].
Algorithm:
  Define tensor G ← X.
  for j = 1 : d do
    r = p_j.
    Compute the exact or approximate rank-k_r SVD of the tensor unfolding: G_(r) ≈ Û_r Σ_r V_r^T.
    U_r ← Û_r.
    Update G_(r) ← Σ_r V_r^T, i.e., apply Û_r^T to G.
  end for
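A compact NumPy sketch of this loop, using exact truncated SVDs of the unfoldings (the helper names unfold, fold, and st_hosvd are ours), could look like the following.

    import numpy as np

    def unfold(T, mode):
        # Mode-r unfolding: move axis `mode` to the front and flatten the rest
        return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

    def fold(M, mode, shape):
        # Inverse of unfold for a tensor whose full shape is `shape`
        moved = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
        return np.moveaxis(M.reshape(moved), 0, mode)

    def st_hosvd(X, ranks, order=None):
        G = X.copy()
        order = range(X.ndim) if order is None else order
        U = [None] * X.ndim
        for r in order:
            Ur, s, Vt = np.linalg.svd(unfold(G, r), full_matrices=False)
            U[r] = Ur[:, :ranks[r]]
            core = s[:ranks[r], None] * Vt[:ranks[r], :]   # Sigma_r @ V_r^T
            shape = list(G.shape)
            shape[r] = ranks[r]
            G = fold(core, r, tuple(shape))
        return G, U   # X ≈ G x_1 U[0] x_2 ... x_d U[d-1]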

Algorithm 7: Robust PCA Using IALM
Inputs:
  Measured matrix M ∈ R^{m×n}, positive number λ, μ_0, μ̄, tolerance tol, ρ > 1.
Outputs:
  Matrix pair (X_k, E_k).
Algorithm:
  k = 0; J(M) = max(‖M‖_2, λ^{-1}‖M‖_F); Y_0 = M / J(M); E_0 = 0.
  while not converged do
    (U, Σ, V) = svd(M − E_k + μ_k^{-1} Y_k)
    X_{k+1} = U S_{μ_k^{-1}}(Σ) V^T
    E_{k+1} = S_{λ μ_k^{-1}}(M − X_{k+1} + μ_k^{-1} Y_k)
    Y_{k+1} = Y_k + μ_k (M − X_{k+1} − E_{k+1})
    Update μ_{k+1} = min(ρ μ_k, μ̄)
    k = k + 1
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then
      Break
    end if
  end while
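For concreteness, a direct NumPy transcription of this loop is sketched below; it uses a dense SVD where the experiments substitute FFSRQR or another approximate SVD, and the default tolerance and iteration cap are illustrative rather than the tuned values from [45].

    import numpy as np

    def ialm_rpca(M, lam, mu, mu_bar, rho, tol=1e-7, max_iter=1000):
        shrink = lambda A, tau: np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)
        J = max(np.linalg.norm(M, 2), np.linalg.norm(M, 'fro') / lam)
        Y = M / J                        # dual variable initialization
        E = np.zeros_like(M)
        normM = np.linalg.norm(M, 'fro')
        for _ in range(max_iter):
            U, s, Vt = np.linalg.svd(M - E + Y / mu, full_matrices=False)
            X = (U * shrink(s, 1.0 / mu)) @ Vt       # X = U S_{1/mu}(Sigma) V^T
            E = shrink(M - X + Y / mu, lam / mu)     # sparse component update
            Y = Y + mu * (M - X - E)                 # multiplier update
            mu = min(rho * mu, mu_bar)
            if np.linalg.norm(M - X - E, 'fro') / normM < tol:
                break
        return X, E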

Now assume k + p ≪ min(m, n) and omit the lower-order terms; then we arrive at (4q + 4)mn(k + p) as the complexity of the approximate SVD with randomized subspace iteration. In practice, q is usually chosen to be an integer between 0 and 2.
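As a rough sanity check on this estimate (the oversampling size is folded into k + p here, since its exact value is not restated in this appendix), consider a 6000 × 6000 robust PCA matrix with k + p ≈ 600 and q = 1:

\[
(4q + 4)\,mn\,(k + p) \approx 8 \cdot 6000 \cdot 6000 \cdot 600 \approx 1.7 \times 10^{11} \text{ flops},
\]

whereas a full dense SVD of the same matrix costs a constant multiple of mn · min(m, n) ≈ 2.2 × 10^{11} flops, which is consistent with the run-time gaps observed in the experiments.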

REFERENCES

[1] O. ALTER, P. O. BROWN, AND D. BOTSTEIN, Singular value decomposition for genome-wide expression data processing and modeling, Proc. Natl. Acad. Sci. USA, 97 (2000), pp. 10101–10106.
[2] E. ANDERSON, Z. BAI, C. BISCHOF, L. S. BLACKFORD, J. DEMMEL, J. DONGARRA, J. DU CROZ, A. GREENBAUM, S. HAMMARLING, A. MCKENNEY, AND D. SORENSEN, LAPACK Users' Guide, 3rd ed., SIAM, Philadelphia, 1999.
[3] C. A. ANDERSSON AND R. BRO, Improving the speed of multi-way algorithms: Part I. Tucker3, Chemometrics Intell. Lab. Syst., 42 (1998), pp. 93–103.
[4] M. W. BERRY, S. T. DUMAIS, AND G. W. O'BRIEN, Using linear algebra for intelligent information retrieval, SIAM Rev., 37 (1995), pp. 573–595.
[5] C. BISCHOF AND C. VAN LOAN, The WY representation for products of Householder matrices, SIAM J. Sci. Statist. Comput., 8 (1987), pp. S2–S13.


Algorithm 8: Matrix Completion Using IALM
Inputs:
  Sampled set Ω, sampled entries π_Ω(M), positive number λ, μ_0, μ̄, tolerance tol, ρ > 1.
Outputs:
  Matrix pair (X_k, E_k).
Algorithm:
  k = 0; Y_0 = 0; E_0 = 0.
  while not converged do
    (U, Σ, V) = svd(M − E_k + μ_k^{-1} Y_k)
    X_{k+1} = U S_{μ_k^{-1}}(Σ) V^T
    E_{k+1} = π_Ω̄(M − X_{k+1} + μ_k^{-1} Y_k)
    Y_{k+1} = Y_k + μ_k (M − X_{k+1} − E_{k+1})
    Update μ_{k+1} = min(ρ μ_k, μ̄)
    k = k + 1
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then
      Break
    end if
  end while

[6] P. BUSINGER AND G. H. GOLUB, Handbook series linear algebra: Linear least squares solutions by Householder transformations, Numer. Math., 7 (1965), pp. 269–276.
[7] J.-F. CAI, E. J. CANDÈS, AND Z. SHEN, A singular value thresholding algorithm for matrix completion, SIAM J. Optim., 20 (2010), pp. 1956–1982.
[8] E. J. CANDÈS, X. LI, Y. MA, AND J. WRIGHT, Robust principal component analysis, J. ACM, 58 (2011), Art. 11, 37 pages.
[9] E. J. CANDÈS AND B. RECHT, Exact matrix completion via convex optimization, Found. Comput. Math., 9 (2009), pp. 717–772.
[10] J. D. CARROLL AND J.-J. CHANG, Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition, Psychometrika, 35 (1970), pp. 283–319.
[11] T. A. DAVIS AND Y. HU, The University of Florida sparse matrix collection, ACM Trans. Math. Software, 38 (2011), Art. 1, 25 pages.
[12] L. DE LATHAUWER AND B. DE MOOR, From matrix to tensor: Multilinear algebra and signal processing, in Mathematics in Signal Processing IV, Inst. Math. Appl. Conf. Ser., Oxford Univ. Press, Oxford, 1998, pp. 1–16.
[13] L. DE LATHAUWER, B. DE MOOR, AND J. VANDEWALLE, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253–1278.
[14] L. DE LATHAUWER, B. DE MOOR, AND J. VANDEWALLE, On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1324–1342.
[15] L. DE LATHAUWER AND J. VANDEWALLE, Dimensionality reduction in higher-order signal processing and rank-(R1, R2, ..., RN) reduction in multilinear algebra, Linear Algebra Appl., 391 (2004), pp. 31–55.
[16] P. DRINEAS, R. KANNAN, AND M. W. MAHONEY, Fast Monte Carlo algorithms for matrices II: Computing a low-rank approximation to a matrix, SIAM J. Comput., 36 (2006), pp. 158–183.
[17] J. A. DUERSCH AND M. GU, Randomized QR with column pivoting, SIAM J. Sci. Comput., 39 (2017), pp. C263–C291.
[18] C. ECKART AND G. YOUNG, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211–218.
[19] L. ELDÉN, Matrix Methods in Data Mining and Pattern Recognition, SIAM, Philadelphia, 2007.
[20] M. FAZEL, H. HINDI, AND S. P. BOYD, Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices, in Proceedings of the 2003 American Control Conference 2003, IEEE Conference Proceedings, IEEE, Piscataway, pp. 2156–2162.
[21] M. FAZEL, T. K. PONG, D. SUN, AND P. TSENG, Hankel matrix rank minimization with applications to system identification and realization, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 946–977.
[22] C. FRANK AND E. NÖTH, Optimizing eigenfaces by face masks for facial expression recognition, in Computer Analysis of Images and Patterns, N. Petkov and M. A. Westenberg, eds., vol. 2756 of Lecture Notes in Comput. Sci., Springer, Berlin, 2003, pp. 646–654.


Algorithm 9: Approximate SVD with Randomized Subspace Iteration
Inputs:
  Matrix A ∈ R^{m×n}. Target rank k. Oversampling size p ≥ 0. Number of iterations q ≥ 1.
Outputs:
  U ∈ R^{m×k} contains the approximate top k left singular vectors of A.
  Σ ∈ R^{k×k} contains the approximate top k singular values of A.
  V ∈ R^{n×k} contains the approximate top k right singular vectors of A.
Algorithm:
  Generate an i.i.d. Gaussian matrix Ω ∈ N(0, 1)^{n×(k+p)}.
  Compute B = AΩ.
  [Q, ∼] = qr(B, 0)
  for i = 1 : q do
    B = A^T * Q
    [Q, ∼] = qr(B, 0)
    B = A * Q
    [Q, ∼] = qr(B, 0)
  end for
  B = Q^T * A
  [U, Σ, V] = svd(B)
  U = Q * U
  U = U(:, 1 : k); Σ = Σ(1 : k, 1 : k); V = V(:, 1 : k)

[23] G. W. FURNAS, S. DEERWESTER, S. T. DUMAIS, T. K. LANDAUER, R. A. HARSHMAN, L. A. STREETER, AND K. E. LOCHBAUM, Information retrieval using a singular value decomposition model of latent semantic structure, in Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Y. Chiaramella, ed., ACM, New York, 1988, pp. 465–480.
[24] K. GOLDBERG, T. ROEDER, D. GUPTA, AND C. PERKINS, Eigentaste: A constant time collaborative filtering algorithm, Inf. Retrieval, 4 (2001), pp. 133–151.
[25] G. GOLUB, Numerical methods for solving linear least squares problems, Numer. Math., 7 (1965), pp. 206–216.
[26] G. H. GOLUB AND C. F. VAN LOAN, Matrix Computations, 4th ed., Johns Hopkins University Press, Baltimore, 2013.
[27] M. GU, Subspace iteration randomization and singular value problems, SIAM J. Sci. Comput., 37 (2015), pp. A1139–A1173.
[28] M. GU AND S. C. EISENSTAT, Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869.
[29] E. T. HALE, W. YIN, AND Y. ZHANG, Fixed-point continuation for l1-minimization: methodology and convergence, SIAM J. Optim., 19 (2008), pp. 1107–1130.
[30] N. HALKO, P. G. MARTINSSON, AND J. A. TROPP, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217–288.
[31] R. A. HARSHMAN, Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multimodal factor analysis, UCLA Working Papers in Phonetics, 16 (1970), pp. 1–84.
[32] J. L. HERLOCKER, J. A. KONSTAN, A. BORCHERS, AND J. RIEDL, An algorithmic framework for performing collaborative filtering, in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, F. Gey, M. Hearst, and R. Tong, eds., ACM, New York, 1999, pp. 230–237.
[33] R. A. HORN AND C. R. JOHNSON, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1991.
[34] D. A. HUCKABY AND T. F. CHAN, On the convergence of Stewart's QLP algorithm for approximating the SVD, Numer. Algorithms, 32 (2003), pp. 287–316.
[35] D. A. HUCKABY AND T. F. CHAN, Stewart's pivoted QLP decomposition for low-rank matrices, Numer. Linear Algebra Appl., 12 (2005), pp. 153–159.
[36] W. B. JOHNSON AND J. LINDENSTRAUSS, Extensions of Lipschitz mappings into a Hilbert space, in Conference in Modern Analysis and Probability (New Haven, Conn., 1982), R. Beals, A. Beck, A. Bellow, and A. Hajian, eds., vol. 26 of Contemp. Math., Amer. Math. Soc., Providence, 1984, pp. 189–206.


[37] I. T. JOLLIFFE, Principal Component Analysis, Springer, New York, 1986.
[38] J. M. KLEINBERG, Authoritative sources in a hyperlinked environment, J. ACM, 46 (1999), pp. 604–632.
[39] T. G. KOLDA, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 243–255.
[40] T. G. KOLDA AND B. W. BADER, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455–500.
[41] R. M. LARSEN, PROPACK-software for large and sparse SVD calculations, available at http://sun.stanford.edu/~rmunk/PROPACK.
[42] R. M. LARSEN, Lanczos bidiagonalization with partial reorthogonalization, Tech. Report PB-537, Department of Computer Science, Aarhus University, Aarhus, Denmark, 1998.
[43] Y. LECUN, L. BOTTOU, Y. BENGIO, AND P. HAFFNER, Gradient-based learning applied to document recognition, Proc. IEEE, 86 (1998), pp. 2278–2324.
[44] R. B. LEHOUCQ, D. C. SORENSEN, AND C. YANG, ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods, SIAM, Philadelphia, 1998.
[45] Z. LIN, M. CHEN, AND Y. MA, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices, Preprint on arXiv, https://arxiv.org/abs/1009.5055, 2010.
[46] N. LINIAL, E. LONDON, AND Y. RABINOVICH, The geometry of graphs and some of its algorithmic applications, Combinatorica, 15 (1995), pp. 215–245.
[47] Z. LIU, A. HANSSON, AND L. VANDENBERGHE, Nuclear norm system identification with missing inputs and outputs, Systems Control Lett., 62 (2013), pp. 605–612.
[48] Z. LIU AND L. VANDENBERGHE, Interior-point method for nuclear norm approximation with application to system identification, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1235–1256.
[49] S. MA, D. GOLDFARB, AND L. CHEN, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program., 128 (2011), pp. 321–353.
[50] C. B. MELGAARD, Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra, PhD Thesis, University of California, Berkeley, 2015.
[51] M. MESBAHI AND G. P. PAPAVASSILOPOULOS, On the rank minimization problem over a positive semidefinite linear matrix inequality, IEEE Trans. Automat. Control, 42 (1997), pp. 239–243.
[52] N. MULLER, L. MAGAIA, AND B. M. HERBST, Singular value decomposition, eigenfaces, and 3D reconstructions, SIAM Rev., 46 (2004), pp. 518–545.
[53] M. NARWARIA AND W. LIN, SVD-based quality metric for image and video using machine learning, IEEE Trans. Syst., Man, and Cybernetics, Part B (Cybernetics), 42 (2012), pp. 347–364.
[54] T.-H. OH, Y. MATSUSHITA, Y.-W. TAI, AND I. SO KWEON, Fast randomized singular value thresholding for nuclear norm minimization, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015, IEEE Conference Proceedings, IEEE, Los Alamitos, 2015, pp. 4484–4493.
[55] G. QUINTANA-ORTÍ, X. SUN, AND C. H. BISCHOF, A BLAS-3 version of the QR factorization with column pivoting, SIAM J. Sci. Comput., 19 (1998), pp. 1486–1494.
[56] B. RECHT, M. FAZEL, AND P. A. PARRILO, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Rev., 52 (2010), pp. 471–501.
[57] J. D. RENNIE AND N. SREBRO, Fast maximum margin matrix factorization for collaborative prediction, in Proceedings of the 22nd International Conference on Machine Learning, L. De Raedt and S. Wrobel, eds., ACM Press, New York, 2005, pp. 713–719.
[58] V. ROKHLIN, A. SZLAM, AND M. TYGERT, A randomized algorithm for principal component analysis, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1100–1124.
[59] A. K. SAIBABA, HOID: higher order interpolatory decomposition for tensors based on Tucker representation, SIAM J. Matrix Anal. Appl., 37 (2016), pp. 1223–1249.
[60] B. SAVAS AND L. ELDÉN, Handwritten digit classification using higher order singular value decomposition, Pattern Recognition, 40 (2007), pp. 993–1003.
[61] R. SCHREIBER AND C. VAN LOAN, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 53–57.
[62] A. SHASHUA AND T. HAZAN, Non-negative tensor factorization with applications to statistics and computer vision, in Proceedings of the 22nd International Conference on Machine Learning, S. Dzeroski, L. De Raedt, and S. Wrobel, eds., ACM, New York, 2005, pp. 792–799.
[63] N. D. SIDIROPOULOS, R. BRO, AND G. B. GIANNAKIS, Parallel factor analysis in sensor array processing, IEEE Trans. Signal Process., 48 (2000), pp. 2377–2388.
[64] D. C. SORENSEN AND M. EMBREE, A DEIM induced CUR factorization, SIAM J. Sci. Comput., 38 (2016), pp. A1454–A1482.
[65] N. SREBRO, J. RENNIE, AND T. S. JAAKKOLA, Maximum-margin matrix factorization, in NIPS'04: Proceedings of the 17th International Conference on Neural Information Processing Systems, L. K. Saul, Y. Weiss, and L. Bottou, eds., MIT Press, Cambridge, 2004, pp. 1329–1336.
[66] G. W. STEWART, The QLP approximation to the singular value decomposition, SIAM J. Sci. Comput., 20 (1999), pp. 1336–1348.


[67] S. TAHERI, Q. QIU, AND R. CHELLAPPA, Structure-preserving sparse decomposition for facial expression analysis, IEEE Trans. Image Process., 23 (2014), pp. 3590–3603.
[68] K.-C. TOH AND S. YUN, An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems, Pac. J. Optim., 6 (2010), pp. 615–640.
[69] L. N. TREFETHEN AND D. BAU, III, Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[70] L. R. TUCKER, Some mathematical notes on three-mode factor analysis, Psychometrika, 31 (1966), pp. 279–311.
[71] N. VANNIEUWENHOVEN, R. VANDEBRIL, AND K. MEERBERGEN, A new truncation strategy for the higher-order singular value decomposition, SIAM J. Sci. Comput., 34 (2012), pp. A1027–A1052.
[72] M. A. O. VASILESCU AND D. TERZOPOULOS, Multilinear analysis of image ensembles: Tensorfaces, in European Conference on Computer Vision 2002, A. Heyden, G. Sparr, M. Nielsen, and P. Johansen, eds., vol. 2350 of Lecture Notes in Computer Science, Springer, Berlin, 2002, pp. 447–460.
[73] N. VERVLIET, O. DEBALS, L. SORBER, M. V. BAREL, AND L. DE LATHAUWER, Tensorlab user guide, available at http://www.tensorlab.net.
[74] J. XIAO AND M. GU, Spectrum-revealing Cholesky factorization for kernel methods, in Proceedings of the 16th IEEE International Conference on Data Mining (ICDM), IEEE Conference Proceedings, Los Alamitos, 2016, pp. 1293–1298.
[75] J. XIAO, M. GU, AND J. LANGOU, Fast parallel randomized QR with column pivoting algorithms for reliable low-rank matrix approximations, in Proceedings of the 24th IEEE International Conference on High Performance Computing (HiPC), IEEE Conference Proceedings, Los Alamitos, 2017, pp. 233–242.
[76] T. ZHANG AND G. H. GOLUB, Rank-one approximation to high order tensors, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 534–550.

Page 15: FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND …etna.mcs.kent.edu/vol.51.2019/pp469-494.dir/pp469-494.pdfminimization. In Section3, we introduce Flip-Flop SRQR and analyze its

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483

100 200 300 400 500

target rank k

022

024

026

028

03

032

034

036re

lative

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

100 200 300 400 500

target rank k

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

100 200 300 400 500

target rank k

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

100 200 300 400 500

target rank k

024

026

028

03

032

034

036

rela

tive

ap

pro

xim

atio

n e

rro

rFFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 42 Relative approximation error comparison for approximate SVD algorithms

algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab

421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]

X =

10sumj=1

1000

jxj yj zj +

nsumj=11

1

jxj yj zj

where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2

U2 times3 U3

ETNAKent State University and

Johann Radon Institute (RICAM)

484 Y FENG J XIAO AND M GU

5 10 15 20

index

06

07

08

09

1to

p 2

0 s

ingula

r valu

es

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

5 10 15 20

index

06

07

08

09

1

top 2

0 s

ingula

r valu

es

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

5 10 15 20

index

06

065

07

075

08

085

09

095

top

20

sin

gu

lar

va

lue

s

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

5 10 15 20

index

101

102

103

top

20

sin

gu

lar

va

lue

sFFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 43 Top 20 singular values comparison for approximate SVD algorithms

Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger

422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]

We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3

where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method

MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485

50 100 150 200

target rank k

10-5

10-4

10-3

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

MLSVD

MLSVD_FFSRQR

MLSVD_RSI

MLSVD_LTSVD

50 100 150 200

target rank k

10-2

10-1

100

101

102

run tim

e (

sec)

MLSVD

MLSVD_FFSRQR

MLSVD_RSI

MLSVD_LTSVD

FIG 44 Run time and relative approximation error comparison on a sparse tensor

TABLE 43Comparison on handwritten digits classification

MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD

Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259

one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one

43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1

431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX

TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are

independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by

(X E

) In this numerical experiment we use the same parameter

settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1

radicmax (mn) as suggested by

Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF

Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM

1httpperceptioncslillinoisedumatrix-ranksample_codehtml

ETNAKent State University and

Johann Radon Institute (RICAM)

486 Y FENG J XIAO AND M GU

algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective

TABLE 44Comparison on robust PCA

Size Method Error Time (sec) E0 Iter sv

1000times 1000

IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100

2000times 2000

IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200

4000times 4000

IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400

6000times 6000

IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600

432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices

bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3

The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies

For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by

NMAEdef=

1|Γ|sum

(ij)isinΓ |Mij minusXij |rmax minus rmin

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487

where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5

Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]

The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets

TABLE 45Parameters used in the IALM method on matrix completion

Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384

jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742

movie-latest-small 671 9066 100e+05 52551

5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA

Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments

Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4

Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5

Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially

truncated higher-order SVD

ETNAKent State University and

Johann Radon Institute (RICAM)

488 Y FENG J XIAO AND M GU

TABLE 46Comparison on matrix completion

Data set Method Iter Time NMAE sv σmax σmin

jester-1

IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00

jester-2

IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00

jester-3

IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00

jester-all

IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00

movie-100K

IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00

movie-1M

IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00

movie-latest-small

IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00

Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems

respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω

Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an

orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9

Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that

1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489

Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do

Find i = arg maxjleslen

rs Exchange rj and ri columns in A and Π

Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)

end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A

Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)

3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)

2 minus 23 (k + p)

3 (cf [69]) and the cost of forming the first (k + p)

columns in the full Q matrix is m (k + p)2

+ 13 (k + p)

3

Now we count the flops for each i in the for loop

1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)

2 minus 23 (k + p)

3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)

2+ 1

3 (k + p)3

3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)

2minus 23 (k + p)

3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)

2+ 1

3 (k + p)3

Together the cost of running the for loop q times is

q

(4mn (k + p) + 3 (m+ n) (k + p)

2 minus 2

3(k + p)

3

)

Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)

2) and the cost of computing U = Q lowast U is 2m (k + p)

2

ETNAKent State University and

Johann Radon Institute (RICAM)

490 Y FENG J XIAO AND M GU

Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV

Tr ie applying UTr to G

end for

Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max

(M2 λminus1MF

) Y0 = MJ (M) E0 = 0

while not converged do(UΣV) = svd

(MminusEk + microminus1k Yk

)

Xk+1 = USmicrominus1k

(Σ)V T

Ek+1 = Sλmicrominus1k

(M minusXk+1 + microminus1

k Yk)

Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then

Breakend if

end while

Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2

REFERENCES

[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106

[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999

[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103

[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595

[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491

Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do

(UΣV) = svd(MminusEk + microminus1k Yk

)

Xk+1 = USmicrominus1k

(Σ)V T

Ek+1 = πΩ

(M minusXk+1 + microminus1

k Yk)

Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then

Breakend if

end while

Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-

holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion

SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)

Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9

(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an

N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38

(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in

Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16

[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278

[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342

[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55

[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183

[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291

[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218

[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to

Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162

[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977

[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654

ETNAKent State University and

Johann Radon Institute (RICAM)

492 Y FENG J XIAO AND M GU

Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)

ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)

end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)

[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480

[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151

[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press

Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)

pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization

SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and

convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic

algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo

multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing

collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237

[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the

SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)

pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in

Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493

[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash

500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at

httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of

Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document

recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale

Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted

low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic

applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and

outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to

system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-

tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra

PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite

linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-

structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE

Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding

for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493

[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494

[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501

[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719

[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124

[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249

[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003

[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57

[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799

[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388

[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482

[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336

[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348

ETNAKent State University and

Johann Radon Institute (RICAM)

494 Y FENG J XIAO AND M GU

[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603

[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640

[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash

311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-

order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in

European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460

[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet

[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298

[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242

[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550

Page 16: FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND …etna.mcs.kent.edu/vol.51.2019/pp469-494.dir/pp469-494.pdfminimization. In Section3, we introduce Flip-Flop SRQR and analyze its

ETNAKent State University and

Johann Radon Institute (RICAM)

484 Y FENG J XIAO AND M GU

5 10 15 20

index

06

07

08

09

1to

p 2

0 s

ingula

r valu

es

FFSRQR

RSISVD

LANSVD

TUXV

(a) Type 1 Random square matrix

5 10 15 20

index

06

07

08

09

1

top 2

0 s

ingula

r valu

es

FFSRQR

RSISVD

LANSVD

TUXV

(b) Type 1 Random short-fat matrix

5 10 15 20

index

06

065

07

075

08

085

09

095

top

20

sin

gu

lar

va

lue

s

FFSRQR

RSISVD

LANSVD

TUXV

(c) Type 1 Random tall-skinny matrix

5 10 15 20

index

101

102

103

top

20

sin

gu

lar

va

lue

sFFSRQR

RSISVD

LANSVD

TUXV

(d) Type 2 GEMAT11

FIG 43 Top 20 singular values comparison for approximate SVD algorithms

Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger

422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]

We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3

where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method

MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485

50 100 150 200

target rank k

10-5

10-4

10-3

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

MLSVD

MLSVD_FFSRQR

MLSVD_RSI

MLSVD_LTSVD

50 100 150 200

target rank k

10-2

10-1

100

101

102

run tim

e (

sec)

MLSVD

MLSVD_FFSRQR

MLSVD_RSI

MLSVD_LTSVD

FIG 44 Run time and relative approximation error comparison on a sparse tensor

TABLE 43Comparison on handwritten digits classification

MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD

Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259

one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one

43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1

4.3.1. Robust PCA. To solve the robust PCA problem, we replace the approximate SVD part in the IALM method [45] by various methods. We denote the actual solution to the robust PCA problem by a matrix pair (X*, E*) ∈ R^{m×n} × R^{m×n}. The matrix X* is X* = X_L X_R^T, with X_L ∈ R^{m×k} and X_R ∈ R^{n×k} being random matrices whose entries are independently sampled from a normal distribution. The sparse matrix E* is a random matrix whose non-zero entries are independently sampled from a uniform distribution over the interval [-500, 500]. The input to the IALM algorithm has the form M = X* + E*, and the output is denoted by (X̂, Ê). In this numerical experiment, we use the same parameter settings as the IALM code for robust PCA: the rank k is 0.1m, and the number of non-zero entries in E* is 0.05m^2. We choose the trade-off parameter λ = 1/√max(m, n), as suggested by Candès et al. [8]. The solution quality is measured by the normalized root mean square error ‖X̂ - X*‖_F / ‖X*‖_F.
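For concreteness, the synthetic robust PCA instance described above could be generated roughly as follows; the function name, the random number generator, and the uniform sampling of the support of E* are illustrative assumptions rather than the exact settings of the IALM demo code.

import numpy as np

def generate_rpca_instance(m, n, seed=0):
    # Synthetic robust PCA test problem in the spirit of the setup above:
    # a rank-0.1m low-rank part plus a sparse corruption with 0.05*m^2 non-zeros.
    rng = np.random.default_rng(seed)
    k = int(round(0.1 * m))
    XL = rng.standard_normal((m, k))          # entries i.i.d. normal
    XR = rng.standard_normal((n, k))
    X_star = XL @ XR.T                        # X* = X_L X_R^T
    nnz = int(round(0.05 * m * m))            # number of corrupted entries (square case)
    idx = rng.choice(m * n, size=nnz, replace=False)   # support of E* (assumed uniform)
    rows, cols = np.unravel_index(idx, (m, n))
    E_star = np.zeros((m, n))
    E_star[rows, cols] = rng.uniform(-500.0, 500.0, size=nnz)
    M = X_star + E_star                       # input to the IALM algorithm
    lam = 1.0 / np.sqrt(max(m, n))            # trade-off parameter suggested in [8]
    return M, X_star, E_star, lam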

Table 4.4 includes the relative error, run time, number of non-zero entries in Ê (‖Ê‖_0), iteration count, and number of non-zero singular values (sv) in X̂ of the IALM algorithm using different approximate SVD methods.

¹ http://perception.csl.illinois.edu/matrix-rank/sample_code.html


We observe that IALM_FFSRQR is faster than all the other three methods, while its error is comparable to IALM_LANSVD and IALM_RSISVD. IALM_LTSVD is relatively slow and not effective.

TABLE 4.4
Comparison on robust PCA.

Size         Method         Error     Time (sec)  ‖Ê‖_0      Iter  sv
1000 × 1000  IALM_LANSVD    3.33e-07  5.79e+00       50000    22   100
             IALM_FFSRQR    2.79e-07  1.02e+00       50000    25   100
             IALM_RSISVD    3.36e-07  1.09e+00       49999    22   100
             IALM_LTSVD     9.92e-02  3.11e+00      999715   100   100
2000 × 2000  IALM_LANSVD    2.61e-07  5.91e+01      199999    22   200
             IALM_FFSRQR    1.82e-07  6.93e+00      199998    25   200
             IALM_RSISVD    2.63e-07  7.38e+00      199996    22   200
             IALM_LTSVD     8.42e-02  2.20e+01     3998937   100   200
4000 × 4000  IALM_LANSVD    1.38e-07  4.65e+02      799991    23   400
             IALM_FFSRQR    1.39e-07  4.43e+01      800006    26   400
             IALM_RSISVD    1.51e-07  5.04e+01      799990    23   400
             IALM_LTSVD     8.94e-02  1.54e+02    15996623   100   400
6000 × 6000  IALM_LANSVD    1.30e-07  1.66e+03     1799982    23   600
             IALM_FFSRQR    1.02e-07  1.42e+02     1799993    26   600
             IALM_RSISVD    1.44e-07  1.62e+02     1799985    23   600
             IALM_LTSVD     8.58e-02  5.55e+02    35992605   100   600

4.3.2. Matrix completion. We solve the matrix completion problems for two real data sets used in [68]: the Jester joke data set [24] and the MovieLens data set [32]. The Jester joke data set consists of 4.1 million ratings for 100 jokes from 73,421 users and can be downloaded from the website http://goldberg.berkeley.edu/jester-data/. We test with the following data matrices:

• jester-1: Data from 24,983 users who have rated 36 or more jokes;
• jester-2: Data from 23,500 users who have rated 36 or more jokes;
• jester-3: Data from 24,938 users who have rated between 15 and 35 jokes;
• jester-all: The combination of jester-1, jester-2, and jester-3.

The MovieLens data set can be downloaded from https://grouplens.org/datasets/movielens/. We test with the following data matrices:
• movie-100K: 100,000 ratings of 943 users for 1682 movies;
• movie-1M: 1 million ratings of 6040 users for 3900 movies;
• movie-latest-small: 100,000 ratings of 700 users for 9000 movies.

For each data set, we let M be the original data matrix, where M_ij stands for the rating of joke (movie) j by user i, and let Γ be the set of indices where M_ij is known. The matrix completion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE), defined by

    NMAE := (1 / |Γ|) Σ_{(i,j)∈Γ} |M_ij - X_ij| / (r_max - r_min),


where X_ij is the prediction of the rating of joke (movie) j given by user i, and r_min, r_max are the lower and upper bounds of the ratings, respectively. For the Jester joke data sets we set r_min = -10 and r_max = 10. For the MovieLens data sets we set r_min = 1 and r_max = 5.
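As a sanity check of this definition, the NMAE can be evaluated in a few lines of NumPy; the boolean mask representing Γ and the array names are our own conventions.

import numpy as np

def nmae(M, X, known_mask, r_min, r_max):
    # NMAE = (1/|Gamma|) * sum_{(i,j) in Gamma} |M_ij - X_ij| / (r_max - r_min),
    # where known_mask is True exactly on the index set Gamma of observed ratings.
    abs_err = np.abs(M[known_mask] - X[known_mask])
    return abs_err.mean() / (r_max - r_min)

# For the Jester data sets one would call, e.g., nmae(M, X_hat, mask, -10, 10);
# for the MovieLens data sets, nmae(M, X_hat, mask, 1, 5).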

Since |Γ| is large, we randomly select a subset Ω from Γ and then use Algorithm 8 to solve problem (2.8). We randomly select 10 ratings for each user in the Jester joke data sets, while we randomly choose about 50% of the ratings for each user in the MovieLens data sets. Table 4.5 includes the parameter settings in the algorithms. The maximum iteration number is 100 in IALM, and all other parameters are the same as those used in [45].

The numerical results are included in Table 4.6. We observe that IALM_FFSRQR achieves almost the same recoverability as the other methods (except IALM_LTSVD) and is slightly faster than IALM_RSISVD for these two data sets.

TABLE 4.5
Parameters used in the IALM method on matrix completion.

Data set            m      n     |Γ|        |Ω|
jester-1            24983  100   1.81e+06   249830
jester-2            23500  100   1.71e+06   235000
jester-3            24938  100   6.17e+05   249384
jester-all          73421  100   4.14e+06   734210
movie-100K          943    1682  1.00e+05   49918
movie-1M            6040   3706  1.00e+06   498742
movie-latest-small  671    9066  1.00e+05   52551

5. Conclusions. We have presented the Flip-Flop SRQR factorization, a variant of the QLP factorization, to compute low-rank matrix approximations. The Flip-Flop SRQR algorithm uses an SRQR factorization to initialize the truncated version of a column-pivoted QR factorization and then forms an LQ factorization. For the numerical results presented, the errors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms. This new algorithm is cheaper to compute and produces quality low-rank matrix approximations. Furthermore, we prove singular value lower bounds and residual error upper bounds for the Flip-Flop SRQR factorization. In situations where the singular values of the input matrix decay relatively quickly, the low-rank approximation computed by Flip-Flop SRQR is guaranteed to be as accurate as the truncated SVD. We also perform a complexity analysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomized subspace iteration. Future work includes reducing the overhead cost in Flip-Flop SRQR and implementing the Flip-Flop SRQR algorithm on distributed memory machines for popular applications such as distributed PCA.

Acknowledgments. The authors would like to thank Yousef Saad, the editor, and three anonymous referees, who kindly reviewed the earlier version of this manuscript and provided valuable suggestions and comments.

Appendix A. Partial QR factorization with column pivoting. The details of partial QRCP are covered in Algorithm 4.

Appendix B. TUXV algorithm. The details of the TUXV algorithm are presented in Algorithm 5.

Appendix C. Computing the Tucker decomposition of higher-order tensors. Algorithm 6 computes the Tucker decomposition of higher-order tensors, i.e., the sequentially truncated higher-order SVD.


TABLE 4.6
Comparison on matrix completion.

Data set            Method        Iter  Time      NMAE      sv   σ_max     σ_min
jester-1            IALM-LANSVD   12    7.06e+00  1.84e-01  100  2.14e+03  1.00e+00
                    IALM-FFSRQR   12    3.44e+00  1.69e-01  100  2.28e+03  1.00e+00
                    IALM-RSISVD   12    3.75e+00  1.89e-01  100  2.12e+03  1.00e+00
                    IALM-LTSVD    100   2.11e+01  1.74e-01  62   3.00e+03  1.00e+00
jester-2            IALM-LANSVD   12    6.80e+00  1.85e-01  100  2.13e+03  1.00e+00
                    IALM-FFSRQR   12    2.79e+00  1.70e-01  100  2.29e+03  1.00e+00
                    IALM-RSISVD   12    3.59e+00  1.91e-01  100  2.12e+03  1.00e+00
                    IALM-LTSVD    100   2.03e+01  1.75e-01  58   2.96e+03  1.00e+00
jester-3            IALM-LANSVD   12    7.05e+00  1.26e-01  99   1.79e+03  1.00e+00
                    IALM-FFSRQR   12    3.03e+00  1.22e-01  100  1.71e+03  1.00e+00
                    IALM-RSISVD   12    3.85e+00  1.31e-01  100  1.78e+03  1.00e+00
                    IALM-LTSVD    100   2.12e+01  1.33e-01  55   2.50e+03  1.00e+00
jester-all          IALM-LANSVD   12    2.39e+01  1.72e-01  100  3.56e+03  1.00e+00
                    IALM-FFSRQR   12    1.12e+01  1.62e-01  100  3.63e+03  1.00e+00
                    IALM-RSISVD   12    1.34e+01  1.82e-01  100  3.47e+03  1.00e+00
                    IALM-LTSVD    100   6.99e+01  1.68e-01  52   4.92e+03  1.00e+00
movie-100K          IALM-LANSVD   29    2.86e+01  1.83e-01  285  1.21e+03  1.00e+00
                    IALM-FFSRQR   30    4.55e+00  1.67e-01  295  1.53e+03  1.00e+00
                    IALM-RSISVD   29    4.82e+00  1.82e-01  285  1.29e+03  1.00e+00
                    IALM-LTSVD    48    1.42e+01  1.47e-01  475  1.91e+03  1.00e+00
movie-1M            IALM-LANSVD   50    7.40e+02  1.58e-01  495  4.99e+03  1.00e+00
                    IALM-FFSRQR   53    2.07e+02  1.37e-01  525  6.63e+03  1.00e+00
                    IALM-RSISVD   50    2.23e+02  1.57e-01  495  5.35e+03  1.00e+00
                    IALM-LTSVD    100   8.50e+02  1.17e-01  995  8.97e+03  1.00e+00
movie-latest-small  IALM-LANSVD   31    1.66e+02  1.85e-01  305  1.13e+03  1.00e+00
                    IALM-FFSRQR   31    1.96e+01  2.00e-01  305  1.42e+03  1.00e+00
                    IALM-RSISVD   31    2.85e+01  1.91e-01  305  1.20e+03  1.00e+00
                    IALM-LTSVD    63    4.02e+01  2.08e-01  298  1.79e+03  1.00e+00

Appendix D. IALM for solving two special nuclear norm minimization problems. Algorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems, respectively. In Algorithm 7, S_ω(x) = sgn(x) · max(|x| - ω, 0) is the soft shrinkage operator [29] with x ∈ R^n and ω > 0. In Algorithm 8, Ω̄ is the complement of Ω.
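For concreteness, a minimal NumPy sketch of the soft shrinkage operator and of the resulting singular value thresholding step used in the X-update of Algorithms 7 and 8 is given below (the function names are ours).

import numpy as np

def soft_shrink(x, omega):
    # S_omega(x) = sgn(x) * max(|x| - omega, 0), applied entrywise.
    return np.sign(x) * np.maximum(np.abs(x) - omega, 0.0)

def svt(A, omega):
    # Singular value thresholding U S_omega(Sigma) V^T, i.e. the X_{k+1} update.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * soft_shrink(s, omega)) @ Vt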

Appendix E. Approximate SVD with randomized subspace iteration. The randomized subspace iteration was proposed in [30, Algorithm 4.4] to compute an orthonormal matrix whose range approximates the range of A. An approximate SVD can be computed using the aforementioned orthonormal matrix [30, Algorithm 5.1]. A randomized subspace iteration is used in the routine MLSVD_RSI in the Matlab toolbox tensorlab [73], and MLSVD_RSI is by far the most efficient function that we can find in Matlab to compute the ST-HOSVD. We summarize the approximate SVD with randomized subspace iteration pseudocode in Algorithm 9.

Now we perform a complexity analysis for the approximate SVD with randomized subspace iteration. We first note that:
1. The cost of generating a random matrix is negligible.
2. The cost of computing B = AΩ is 2mn(k + p).


Algorithm 4 Partial QRCP
Inputs:
Matrix A ∈ R^{m×n}. Target rank k.
Outputs:
Orthogonal matrix Q ∈ R^{m×m}.
Upper trapezoidal matrix R ∈ R^{m×n}.
Permutation matrix Π ∈ R^{n×n} such that AΠ = QR.
Algorithm:
Initialize Π = I_n. Compute column norms r_s = ‖A(1 : m, s)‖_2 (1 ≤ s ≤ n).
for j = 1 : k do
    Find i = arg max_{j≤s≤n} r_s. Exchange r_j and r_i, and exchange columns j and i of A and Π.
    Form Householder reflection Q_j from A(j : m, j).
    Update trailing matrix A(j : m, j : n) ← Q_j^T A(j : m, j : n).
    Update r_s = ‖A(j + 1 : m, s)‖_2 (j + 1 ≤ s ≤ n).
end for
Q = Q_1 Q_2 ··· Q_k is the product of all reflections. R = upper trapezoidal part of A.
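A minimal NumPy sketch in the spirit of Algorithm 4 is shown below; for simplicity it recomputes the trailing column norms at every step instead of downdating them, so it illustrates the pivoting strategy rather than an efficient implementation, and the function name is ours.

import numpy as np

def partial_qrcp(A, k):
    # k Householder steps of QR with greedy column pivoting (cf. Algorithm 4).
    A = np.array(A, dtype=float, copy=True)
    m, n = A.shape
    perm = np.arange(n)
    Q = np.eye(m)
    for j in range(k):
        norms = np.linalg.norm(A[j:, j:], axis=0)     # trailing column norms
        i = j + int(np.argmax(norms))                 # pivot column
        A[:, [j, i]] = A[:, [i, j]]
        perm[[j, i]] = perm[[i, j]]
        x = A[j:, j]
        v = x.copy()
        v[0] += (np.sign(x[0]) if x[0] != 0 else 1.0) * np.linalg.norm(x)
        nv = np.linalg.norm(v)
        if nv > 0:
            v /= nv
            A[j:, j:] -= 2.0 * np.outer(v, v @ A[j:, j:])   # apply Q_j^T to the trailing block
            Q[:, j:] -= 2.0 * np.outer(Q[:, j:] @ v, v)     # accumulate Q = Q_1 Q_2 ... Q_k
    R = np.triu(A)    # upper trapezoidal part; A_original @ Pi and Q @ R agree in the first k columns
    Pi = np.eye(n)[:, perm]
    return Q, R, Pi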

Algorithm 5 TUXV Algorithm
Inputs:
Matrix A ∈ R^{m×n}. Target rank k. Block size b. Oversampling size p ≥ 0.
Outputs:
Column orthonormal matrices U ∈ R^{m×k}, V ∈ R^{n×k}, and upper triangular matrix R ∈ R^{k×k} such that A ≈ U R V^T.
Algorithm:
Do TRQRCP on A to obtain Q ∈ R^{m×k}, R ∈ R^{k×n}, and Π ∈ R^{n×n}.
R = R Π^T, and do an LQ factorization, i.e., [V, R] = qr(R^T, 0).
Compute Z = A V and do a QR factorization, i.e., [U, R] = qr(Z, 0).
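The sketch below mirrors the flip-flop structure of Algorithm 5, with SciPy's column-pivoted QR standing in for TRQRCP (which is not restated in this appendix); the block size b and the oversampling size p therefore play no role here, so this is only an illustrative approximation of TUXV, not the algorithm itself.

import numpy as np
from scipy.linalg import qr

def tuxv_sketch(A, k):
    # Column-pivoted QR as a stand-in for TRQRCP, then the LQ / QR "flip-flop".
    Q, Rp, perm = qr(A, mode='economic', pivoting=True)   # A[:, perm] = Q @ Rp
    R = Rp[:k, np.argsort(perm)]                           # first k rows of R * Pi^T  (k x n)
    V, _ = np.linalg.qr(R.T)                               # LQ of R via a thin QR of R^T; V is n x k
    Z = A @ V                                              # Z = A V
    U, Rk = np.linalg.qr(Z)                                # thin QR, so A is approximately U @ Rk @ V.T
    return U, Rk, V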

3. In each QR step [Q, ~] = qr(B, 0), the cost of computing the QR factorization of B is 2m(k + p)^2 - (2/3)(k + p)^3 (cf. [69]), and the cost of forming the first (k + p) columns in the full Q matrix is m(k + p)^2 + (1/3)(k + p)^3.

Now we count the flops for each i in the for loop:
1. The cost of computing B = A^T * Q is 2mn(k + p).
2. The cost of computing [Q, ~] = qr(B, 0) is 2n(k + p)^2 - (2/3)(k + p)^3, and the cost of forming the first (k + p) columns in the full Q matrix is n(k + p)^2 + (1/3)(k + p)^3.
3. The cost of computing B = A * Q is 2mn(k + p).
4. The cost of computing [Q, ~] = qr(B, 0) is 2m(k + p)^2 - (2/3)(k + p)^3, and the cost of forming the first (k + p) columns in the full Q matrix is m(k + p)^2 + (1/3)(k + p)^3.

Together, the cost of running the for loop q times is

    q ( 4mn(k + p) + 3(m + n)(k + p)^2 - (2/3)(k + p)^3 ).

Additionally, the cost of computing B = Q^T * A is 2mn(k + p), the cost of the SVD of B is O(n(k + p)^2), and the cost of computing U = Q * U is 2m(k + p)^2.


Algorithm 6 ST-HOSVD
Inputs:
Tensor X ∈ R^{I_1×···×I_d}, desired rank (k_1, ..., k_d), and processing order p = (p_1, ···, p_d).
Outputs:
Tucker decomposition [G; U_1, ···, U_d].
Algorithm:
Define tensor G ← X.
for j = 1 : d do
    r = p_j
    Compute the exact or approximate rank-k_r SVD of the tensor unfolding: G_(r) ≈ Û_r Σ_r V̂_r^T
    U_r ← Û_r
    Update G_(r) ← Σ_r V̂_r^T, i.e., apply Û_r^T to G
end for
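As an illustration of Algorithm 6, the following NumPy sketch carries out the sequential truncation using exact truncated SVDs of the mode unfoldings; in the experiments, this SVD step is where the various approximate SVD routines are substituted. The unfolding convention, the 0-based processing order, and the function name are our own choices.

import numpy as np

def st_hosvd(X, ranks, order=None):
    # Sequentially truncated HOSVD: G starts as X and is compressed one mode at a time.
    d = X.ndim
    order = list(range(d)) if order is None else list(order)
    G = X.copy()
    U = [None] * d
    for r in order:
        Gr = np.moveaxis(G, r, 0).reshape(G.shape[r], -1)   # mode-r unfolding of the current core
        Ur, s, Vt = np.linalg.svd(Gr, full_matrices=False)  # exact SVD (replace by an approximate SVD)
        Ur = Ur[:, :ranks[r]]
        U[r] = Ur
        core = Ur.T @ Gr                                    # apply U_r^T, i.e. keep Sigma_r V_r^T
        shape = (ranks[r],) + tuple(np.delete(G.shape, r))
        G = np.moveaxis(core.reshape(shape), 0, r)          # fold back into tensor form
    return G, U   # X is approximated by G x_1 U[0] x_2 ... x_d U[d-1]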

Algorithm 7 Robust PCA Using IALM
Inputs:
Measured matrix M ∈ R^{m×n}, positive number λ, μ_0, μ̄, tolerance tol, ρ > 1.
Outputs:
Matrix pair (X_k, E_k).
Algorithm:
k = 0; J(M) = max(‖M‖_2, λ^{-1}‖M‖_F); Y_0 = M / J(M); E_0 = 0.
while not converged do
    (U, Σ, V) = svd(M - E_k + μ_k^{-1} Y_k)
    X_{k+1} = U S_{μ_k^{-1}}(Σ) V^T
    E_{k+1} = S_{λ μ_k^{-1}}(M - X_{k+1} + μ_k^{-1} Y_k)
    Y_{k+1} = Y_k + μ_k (M - X_{k+1} - E_{k+1})
    Update μ_{k+1} = min(ρ μ_k, μ̄)
    k = k + 1
    if ‖M - X_k - E_k‖_F / ‖M‖_F < tol then
        Break
    end if
end while
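A compact NumPy sketch of the iteration in Algorithm 7 follows; the hyperparameters are passed in rather than fixed, since the experiments reuse the settings of the IALM code [45], and the exact SVD call is the step that the paper replaces by approximate SVDs.

import numpy as np

def ialm_rpca(M, lam, mu0, mu_bar, rho, tol=1e-7, max_iter=500):
    # Inexact augmented Lagrange multiplier method for robust PCA (cf. Algorithm 7).
    def soft(x, omega):
        return np.sign(x) * np.maximum(np.abs(x) - omega, 0.0)

    Y = M / max(np.linalg.norm(M, 2), np.linalg.norm(M, 'fro') / lam)   # Y_0 = M / J(M)
    E = np.zeros_like(M)
    mu = mu0
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(M - E + Y / mu, full_matrices=False)
        X = (U * soft(s, 1.0 / mu)) @ Vt             # X_{k+1} = U S_{1/mu}(Sigma) V^T
        E = soft(M - X + Y / mu, lam / mu)           # E_{k+1} = S_{lam/mu}(M - X_{k+1} + Y_k/mu_k)
        Y = Y + mu * (M - X - E)
        mu = min(rho * mu, mu_bar)
        if np.linalg.norm(M - X - E, 'fro') <= tol * np.linalg.norm(M, 'fro'):
            break
    return X, E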

Now assume k + p ≪ min(m, n) and omit the lower-order terms; then we arrive at (4q + 4)mn(k + p) as the complexity of the approximate SVD with randomized subspace iteration. In practice, q is usually chosen to be an integer between 0 and 2.
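As a quick illustration of this estimate, with sizes chosen purely for the sake of arithmetic:

# Leading-order flop count (4q + 4) m n (k + p) for illustrative sizes.
m, n, k, p, q = 4000, 4000, 100, 5, 1
flops = (4 * q + 4) * m * n * (k + p)
print(flops)   # 13440000000, i.e. roughly 1.3e10 flops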

REFERENCES

[1] O. ALTER, P. O. BROWN, AND D. BOTSTEIN, Singular value decomposition for genome-wide expression data processing and modeling, Proc. Natl. Acad. Sci. USA, 97 (2000), pp. 10101–10106.
[2] E. ANDERSON, Z. BAI, C. BISCHOF, L. S. BLACKFORD, J. DEMMEL, J. DONGARRA, J. DU CROZ, A. GREENBAUM, S. HAMMARLING, A. MCKENNEY, AND D. SORENSEN, LAPACK Users' Guide, 3rd ed., SIAM, Philadelphia, 1999.
[3] C. A. ANDERSSON AND R. BRO, Improving the speed of multi-way algorithms: Part I. Tucker3, Chemometrics Intell. Lab. Syst., 42 (1998), pp. 93–103.
[4] M. W. BERRY, S. T. DUMAIS, AND G. W. O'BRIEN, Using linear algebra for intelligent information retrieval, SIAM Rev., 37 (1995), pp. 573–595.
[5] C. BISCHOF AND C. VAN LOAN, The WY representation for products of Householder matrices, SIAM J. Sci. Statist. Comput., 8 (1987), pp. S2–S13.


Algorithm 8 Matrix Completion Using IALM
Inputs:
Sampled set Ω, sampled entries π_Ω(M), positive number λ, μ_0, μ̄, tolerance tol, ρ > 1.
Outputs:
Matrix pair (X_k, E_k).
Algorithm:
k = 0; Y_0 = 0; E_0 = 0.
while not converged do
    (U, Σ, V) = svd(M - E_k + μ_k^{-1} Y_k)
    X_{k+1} = U S_{μ_k^{-1}}(Σ) V^T
    E_{k+1} = π_Ω̄(M - X_{k+1} + μ_k^{-1} Y_k)
    Y_{k+1} = Y_k + μ_k (M - X_{k+1} - E_{k+1})
    Update μ_{k+1} = min(ρ μ_k, μ̄)
    k = k + 1
    if ‖M - X_k - E_k‖_F / ‖M‖_F < tol then
        Break
    end if
end while
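A minimal NumPy sketch of Algorithm 8 is given below; the boolean array mask marks the sampled set Ω, the projection π_Ω̄ is realized by zeroing entries inside Ω, and λ (listed among the inputs but unused in the updates) is omitted. The dense representation of M and the default parameters are simplifying assumptions.

import numpy as np

def ialm_mc(M, mask, mu0, mu_bar, rho, tol=1e-4, max_iter=100):
    # Inexact augmented Lagrange multiplier method for matrix completion (cf. Algorithm 8).
    def soft(x, omega):
        return np.sign(x) * np.maximum(np.abs(x) - omega, 0.0)

    M = np.where(mask, M, 0.0)                   # pi_Omega(M): only sampled entries are used
    Y = np.zeros_like(M)
    E = np.zeros_like(M)
    mu = mu0
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(M - E + Y / mu, full_matrices=False)
        X = (U * soft(s, 1.0 / mu)) @ Vt          # singular value thresholding step
        E = np.where(mask, 0.0, M - X + Y / mu)   # E_{k+1} = pi_{Omega-bar}(M - X_{k+1} + Y_k/mu_k)
        Y = Y + mu * (M - X - E)
        mu = min(rho * mu, mu_bar)
        if np.linalg.norm(M - X - E, 'fro') <= tol * np.linalg.norm(M, 'fro'):
            break
    return X, E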

[6] P. BUSINGER AND G. H. GOLUB, Handbook series linear algebra: Linear least squares solutions by Householder transformations, Numer. Math., 7 (1965), pp. 269–276.
[7] J.-F. CAI, E. J. CANDÈS, AND Z. SHEN, A singular value thresholding algorithm for matrix completion, SIAM J. Optim., 20 (2010), pp. 1956–1982.
[8] E. J. CANDÈS, X. LI, Y. MA, AND J. WRIGHT, Robust principal component analysis, J. ACM, 58 (2011), Art. 11, 37 pages.
[9] E. J. CANDÈS AND B. RECHT, Exact matrix completion via convex optimization, Found. Comput. Math., 9 (2009), pp. 717–772.
[10] J. D. CARROLL AND J.-J. CHANG, Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition, Psychometrika, 35 (1970), pp. 283–319.
[11] T. A. DAVIS AND Y. HU, The University of Florida sparse matrix collection, ACM Trans. Math. Software, 38 (2011), Art. 1, 25 pages.
[12] L. DE LATHAUWER AND B. DE MOOR, From matrix to tensor: Multilinear algebra and signal processing, in Mathematics in Signal Processing IV, Inst. Math. Appl. Conf. Ser., Oxford Univ. Press, Oxford, 1998, pp. 1–16.
[13] L. DE LATHAUWER, B. DE MOOR, AND J. VANDEWALLE, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253–1278.
[14] L. DE LATHAUWER, B. DE MOOR, AND J. VANDEWALLE, On the best rank-1 and rank-(R1, R2, ···, RN) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1324–1342.
[15] L. DE LATHAUWER AND J. VANDEWALLE, Dimensionality reduction in higher-order signal processing and rank-(R1, R2, ..., RN) reduction in multilinear algebra, Linear Algebra Appl., 391 (2004), pp. 31–55.
[16] P. DRINEAS, R. KANNAN, AND M. W. MAHONEY, Fast Monte Carlo algorithms for matrices II: Computing a low-rank approximation to a matrix, SIAM J. Comput., 36 (2006), pp. 158–183.
[17] J. A. DUERSCH AND M. GU, Randomized QR with column pivoting, SIAM J. Sci. Comput., 39 (2017), pp. C263–C291.
[18] C. ECKART AND G. YOUNG, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211–218.
[19] L. ELDÉN, Matrix Methods in Data Mining and Pattern Recognition, SIAM, Philadelphia, 2007.
[20] M. FAZEL, H. HINDI, AND S. P. BOYD, Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices, in Proceedings of the 2003 American Control Conference, IEEE Conference Proceedings, IEEE, Piscataway, pp. 2156–2162.
[21] M. FAZEL, T. K. PONG, D. SUN, AND P. TSENG, Hankel matrix rank minimization with applications to system identification and realization, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 946–977.
[22] C. FRANK AND E. NÖTH, Optimizing eigenfaces by face masks for facial expression recognition, in Computer Analysis of Images and Patterns, N. Petkov and M. A. Westenberg, eds., vol. 2756 of Lecture Notes in Comput. Sci., Springer, Berlin, 2003, pp. 646–654.


Algorithm 9 Approximate SVD with Randomized Subspace Iteration
Inputs:
Matrix A ∈ R^{m×n}. Target rank k. Oversampling size p ≥ 0. Number of iterations q ≥ 1.
Outputs:
U ∈ R^{m×k} contains the approximate top k left singular vectors of A.
Σ ∈ R^{k×k} contains the approximate top k singular values of A.
V ∈ R^{n×k} contains the approximate top k right singular vectors of A.
Algorithm:
Generate i.i.d. Gaussian matrix Ω ∈ N(0, 1)^{n×(k+p)}.
Compute B = AΩ.
[Q, ~] = qr(B, 0)
for i = 1 : q do
    B = A^T * Q; [Q, ~] = qr(B, 0)
    B = A * Q; [Q, ~] = qr(B, 0)
end for
B = Q^T * A
[U, Σ, V] = svd(B)
U = Q * U; U = U(:, 1 : k)
Σ = Σ(1 : k, 1 : k)
V = V(:, 1 : k)
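For reference, Algorithm 9 can be written in a few lines of NumPy; the sketch below follows the pseudocode directly (Gaussian test matrix, q rounds of orthonormalized power iteration, small SVD of Q^T A) and is not the tensorlab implementation.

import numpy as np

def rsvd_subspace_iteration(A, k, p=5, q=1, seed=0):
    # Approximate rank-k SVD via randomized subspace iteration (cf. Algorithm 9 and [30]).
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))       # i.i.d. Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)                # B = A Omega, [Q, ~] = qr(B, 0)
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q)              # B = A^T Q
        Q, _ = np.linalg.qr(A @ Q)                # B = A Q
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = (Q @ Ub)[:, :k]
    return U, s[:k], Vt[:k, :].T                  # approximate top-k SVD factors of A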

[23] G. W. FURNAS, S. DEERWESTER, S. T. DUMAIS, T. K. LANDAUER, R. A. HARSHMAN, L. A. STREETER, AND K. E. LOCHBAUM, Information retrieval using a singular value decomposition model of latent semantic structure, in Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Y. Chiaramella, ed., ACM, New York, 1988, pp. 465–480.
[24] K. GOLDBERG, T. ROEDER, D. GUPTA, AND C. PERKINS, Eigentaste: A constant time collaborative filtering algorithm, Inf. Retrieval, 4 (2001), pp. 133–151.
[25] G. GOLUB, Numerical methods for solving linear least squares problems, Numer. Math., 7 (1965), pp. 206–216.
[26] G. H. GOLUB AND C. F. VAN LOAN, Matrix Computations, 4th ed., Johns Hopkins University Press, Baltimore, 2013.
[27] M. GU, Subspace iteration randomization and singular value problems, SIAM J. Sci. Comput., 37 (2015), pp. A1139–A1173.
[28] M. GU AND S. C. EISENSTAT, Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869.
[29] E. T. HALE, W. YIN, AND Y. ZHANG, Fixed-point continuation for l1-minimization: methodology and convergence, SIAM J. Optim., 19 (2008), pp. 1107–1130.
[30] N. HALKO, P. G. MARTINSSON, AND J. A. TROPP, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217–288.
[31] R. A. HARSHMAN, Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multimodal factor analysis, UCLA Working Papers in Phonetics, 16 (1970), pp. 1–84.
[32] J. L. HERLOCKER, J. A. KONSTAN, A. BORCHERS, AND J. RIEDL, An algorithmic framework for performing collaborative filtering, in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, F. Gey, M. Hearst, and R. Tong, eds., ACM, New York, 1999, pp. 230–237.
[33] R. A. HORN AND C. R. JOHNSON, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1991.
[34] D. A. HUCKABY AND T. F. CHAN, On the convergence of Stewart's QLP algorithm for approximating the SVD, Numer. Algorithms, 32 (2003), pp. 287–316.
[35] D. A. HUCKABY AND T. F. CHAN, Stewart's pivoted QLP decomposition for low-rank matrices, Numer. Linear Algebra Appl., 12 (2005), pp. 153–159.
[36] W. B. JOHNSON AND J. LINDENSTRAUSS, Extensions of Lipschitz mappings into a Hilbert space, in Conference in Modern Analysis and Probability (New Haven, Conn., 1982), R. Beals, A. Beck, A. Bellow, and A. Hajian, eds., vol. 26 of Contemp. Math., Amer. Math. Soc., Providence, 1984, pp. 189–206.


[37] I. T. JOLLIFFE, Principal Component Analysis, Springer, New York, 1986.
[38] J. M. KLEINBERG, Authoritative sources in a hyperlinked environment, J. ACM, 46 (1999), pp. 604–632.
[39] T. G. KOLDA, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 243–255.
[40] T. G. KOLDA AND B. W. BADER, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455–500.
[41] R. M. LARSEN, PROPACK-software for large and sparse SVD calculations, available at http://sun.stanford.edu/~rmunk/PROPACK/.
[42] R. M. LARSEN, Lanczos bidiagonalization with partial reorthogonalization, Tech. Report PB-537, Department of Computer Science, Aarhus University, Aarhus, Denmark, 1998.
[43] Y. LECUN, L. BOTTOU, Y. BENGIO, AND P. HAFFNER, Gradient-based learning applied to document recognition, Proc. IEEE, 86 (1998), pp. 2278–2324.
[44] R. B. LEHOUCQ, D. C. SORENSEN, AND C. YANG, ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods, SIAM, Philadelphia, 1998.
[45] Z. LIN, M. CHEN, AND Y. MA, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices, Preprint on arXiv, https://arxiv.org/abs/1009.5055, 2010.
[46] N. LINIAL, E. LONDON, AND Y. RABINOVICH, The geometry of graphs and some of its algorithmic applications, Combinatorica, 15 (1995), pp. 215–245.
[47] Z. LIU, A. HANSSON, AND L. VANDENBERGHE, Nuclear norm system identification with missing inputs and outputs, Systems Control Lett., 62 (2013), pp. 605–612.
[48] Z. LIU AND L. VANDENBERGHE, Interior-point method for nuclear norm approximation with application to system identification, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1235–1256.
[49] S. MA, D. GOLDFARB, AND L. CHEN, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program., 128 (2011), pp. 321–353.
[50] C. B. MELGAARD, Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra, PhD Thesis, University of California, Berkeley, 2015.
[51] M. MESBAHI AND G. P. PAPAVASSILOPOULOS, On the rank minimization problem over a positive semidefinite linear matrix inequality, IEEE Trans. Automat. Control, 42 (1997), pp. 239–243.
[52] N. MULLER, L. MAGAIA, AND B. M. HERBST, Singular value decomposition, eigenfaces, and 3D reconstructions, SIAM Rev., 46 (2004), pp. 518–545.
[53] M. NARWARIA AND W. LIN, SVD-based quality metric for image and video using machine learning, IEEE Trans. Syst., Man, and Cybernetics, Part B (Cybernetics), 42 (2012), pp. 347–364.
[54] T.-H. OH, Y. MATSUSHITA, Y.-W. TAI, AND I. SO KWEON, Fast randomized singular value thresholding for nuclear norm minimization, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015, IEEE Conference Proceedings, IEEE, Los Alamitos, 2015, pp. 4484–4493.
[55] G. QUINTANA-ORTÍ, X. SUN, AND C. H. BISCHOF, A BLAS-3 version of the QR factorization with column pivoting, SIAM J. Sci. Comput., 19 (1998), pp. 1486–1494.
[56] B. RECHT, M. FAZEL, AND P. A. PARRILO, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Rev., 52 (2010), pp. 471–501.
[57] J. D. RENNIE AND N. SREBRO, Fast maximum margin matrix factorization for collaborative prediction, in Proceedings of the 22nd International Conference on Machine Learning, L. De Raedt and S. Wrobel, eds., ACM Press, New York, 2005, pp. 713–719.
[58] V. ROKHLIN, A. SZLAM, AND M. TYGERT, A randomized algorithm for principal component analysis, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1100–1124.
[59] A. K. SAIBABA, HOID: higher order interpolatory decomposition for tensors based on Tucker representation, SIAM J. Matrix Anal. Appl., 37 (2016), pp. 1223–1249.
[60] B. SAVAS AND L. ELDÉN, Handwritten digit classification using higher order singular value decomposition, Pattern Recognition, 40 (2007), pp. 993–1003.
[61] R. SCHREIBER AND C. VAN LOAN, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 53–57.
[62] A. SHASHUA AND T. HAZAN, Non-negative tensor factorization with applications to statistics and computer vision, in Proceedings of the 22nd International Conference on Machine Learning, S. Dzeroski, L. De Raedt, and S. Wrobel, eds., ACM, New York, 2005, pp. 792–799.
[63] N. D. SIDIROPOULOS, R. BRO, AND G. B. GIANNAKIS, Parallel factor analysis in sensor array processing, IEEE Trans. Signal Process., 48 (2000), pp. 2377–2388.
[64] D. C. SORENSEN AND M. EMBREE, A DEIM induced CUR factorization, SIAM J. Sci. Comput., 38 (2016), pp. A1454–A1482.
[65] N. SREBRO, J. RENNIE, AND T. S. JAAKKOLA, Maximum-margin matrix factorization, in NIPS'04: Proceedings of the 17th International Conference on Neural Information Processing Systems, L. K. Saul, Y. Weiss, and L. Bottou, eds., MIT Press, Cambridge, 2004, pp. 1329–1336.
[66] G. W. STEWART, The QLP approximation to the singular value decomposition, SIAM J. Sci. Comput., 20 (1999), pp. 1336–1348.


[67] S. TAHERI, Q. QIU, AND R. CHELLAPPA, Structure-preserving sparse decomposition for facial expression analysis, IEEE Trans. Image Process., 23 (2014), pp. 3590–3603.
[68] K.-C. TOH AND S. YUN, An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems, Pac. J. Optim., 6 (2010), pp. 615–640.
[69] L. N. TREFETHEN AND D. BAU, III, Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[70] L. R. TUCKER, Some mathematical notes on three-mode factor analysis, Psychometrika, 31 (1966), pp. 279–311.
[71] N. VANNIEUWENHOVEN, R. VANDEBRIL, AND K. MEERBERGEN, A new truncation strategy for the higher-order singular value decomposition, SIAM J. Sci. Comput., 34 (2012), pp. A1027–A1052.
[72] M. A. O. VASILESCU AND D. TERZOPOULOS, Multilinear analysis of image ensembles: Tensorfaces, in European Conference on Computer Vision 2002, A. Heyden, G. Sparr, M. Nielsen, and P. Johansen, eds., Lecture Notes in Computer Science, vol. 2350, Springer, Berlin, 2002, pp. 447–460.
[73] N. VERVLIET, O. DEBALS, L. SORBER, M. V. BAREL, AND L. DE LATHAUWER, Tensorlab user guide, available at http://www.tensorlab.net.
[74] J. XIAO AND M. GU, Spectrum-revealing Cholesky factorization for kernel methods, in Proceedings of the 16th IEEE International Conference on Data Mining (ICDM), IEEE Conference Proceedings, Los Alamitos, 2016, pp. 1293–1298.
[75] J. XIAO, M. GU, AND J. LANGOU, Fast parallel randomized QR with column pivoting algorithms for reliable low-rank matrix approximations, in Proceedings of the 24th IEEE International Conference on High Performance Computing (HiPC), IEEE Conference Proceedings, Los Alamitos, 2017, pp. 233–242.
[76] T. ZHANG AND G. H. GOLUB, Rank-one approximation to high order tensors, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 534–550.

Page 17: FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND …etna.mcs.kent.edu/vol.51.2019/pp469-494.dir/pp469-494.pdfminimization. In Section3, we introduce Flip-Flop SRQR and analyze its

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485

50 100 150 200

target rank k

10-5

10-4

10-3

10-2

10-1

100

rela

tive

ap

pro

xim

atio

n e

rro

r

MLSVD

MLSVD_FFSRQR

MLSVD_RSI

MLSVD_LTSVD

50 100 150 200

target rank k

10-2

10-1

100

101

102

run tim

e (

sec)

MLSVD

MLSVD_FFSRQR

MLSVD_RSI

MLSVD_LTSVD

FIG 44 Run time and relative approximation error comparison on a sparse tensor

TABLE 43Comparison on handwritten digits classification

MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD

Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259

one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one

43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1

431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX

TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are

independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by

(X E

) In this numerical experiment we use the same parameter

settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1

radicmax (mn) as suggested by

Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF

Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM

1httpperceptioncslillinoisedumatrix-ranksample_codehtml

ETNAKent State University and

Johann Radon Institute (RICAM)

486 Y FENG J XIAO AND M GU

algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective

TABLE 44Comparison on robust PCA

Size Method Error Time (sec) E0 Iter sv

1000times 1000

IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100

2000times 2000

IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200

4000times 4000

IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400

6000times 6000

IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600

432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices

bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3

The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies

For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by

NMAEdef=

1|Γ|sum

(ij)isinΓ |Mij minusXij |rmax minus rmin

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487

where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5

Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]

The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets

TABLE 45Parameters used in the IALM method on matrix completion

Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384

jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742

movie-latest-small 671 9066 100e+05 52551

5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA

Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments

Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4

Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5

Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially

truncated higher-order SVD

ETNAKent State University and

Johann Radon Institute (RICAM)

488 Y FENG J XIAO AND M GU

TABLE 46Comparison on matrix completion

Data set Method Iter Time NMAE sv σmax σmin

jester-1

IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00

jester-2

IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00

jester-3

IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00

jester-all

IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00

movie-100K

IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00

movie-1M

IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00

movie-latest-small

IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00

Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems

respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω

Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an

orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9

Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that

1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489

Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do

Find i = arg maxjleslen

rs Exchange rj and ri columns in A and Π

Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)

end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A

Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)

3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)

2 minus 23 (k + p)

3 (cf [69]) and the cost of forming the first (k + p)

columns in the full Q matrix is m (k + p)2

+ 13 (k + p)

3

Now we count the flops for each i in the for loop

1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)

2 minus 23 (k + p)

3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)

2+ 1

3 (k + p)3

3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)

2minus 23 (k + p)

3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)

2+ 1

3 (k + p)3

Together the cost of running the for loop q times is

q

(4mn (k + p) + 3 (m+ n) (k + p)

2 minus 2

3(k + p)

3

)

Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)

2) and the cost of computing U = Q lowast U is 2m (k + p)

2

ETNAKent State University and

Johann Radon Institute (RICAM)

490 Y FENG J XIAO AND M GU

Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV

Tr ie applying UTr to G

end for

Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max

(M2 λminus1MF

) Y0 = MJ (M) E0 = 0

while not converged do(UΣV) = svd

(MminusEk + microminus1k Yk

)

Xk+1 = USmicrominus1k

(Σ)V T

Ek+1 = Sλmicrominus1k

(M minusXk+1 + microminus1

k Yk)

Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then

Breakend if

end while

Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2

REFERENCES

[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106

[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999

[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103

[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595

[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491

Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do

(UΣV) = svd(MminusEk + microminus1k Yk

)

Xk+1 = USmicrominus1k

(Σ)V T

Ek+1 = πΩ

(M minusXk+1 + microminus1

k Yk)

Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then

Breakend if

end while

Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-

holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion

SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)

Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9

(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an

N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38

(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in

Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16

[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278

[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342

[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55

[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183

[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291

[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218

[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to

Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162

[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977

[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654

ETNAKent State University and

Johann Radon Institute (RICAM)

492 Y FENG J XIAO AND M GU

Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)

ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)

end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)

[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480

[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151

[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press

Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)

pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization

SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and

convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic

algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo

multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing

collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237

[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the

SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)

pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in

Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493

[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash

500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at

httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of

Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document

recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale

Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted

low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic

applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and

outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to

system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-

tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra

PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite

linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-

structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE

Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding

for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493

[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494

[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501

[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719

[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124

[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249

[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003

[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57

[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799

[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388

[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482

[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336

[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348

ETNAKent State University and

Johann Radon Institute (RICAM)

494 Y FENG J XIAO AND M GU

[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603

[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640

[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash

311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-

order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in

European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460

[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet

[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298

[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242

[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550

Page 18: FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND …etna.mcs.kent.edu/vol.51.2019/pp469-494.dir/pp469-494.pdfminimization. In Section3, we introduce Flip-Flop SRQR and analyze its

ETNAKent State University and

Johann Radon Institute (RICAM)

486 Y FENG J XIAO AND M GU

algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective

TABLE 44Comparison on robust PCA

Size Method Error Time (sec) E0 Iter sv

1000times 1000

IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100

2000times 2000

IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200

4000times 4000

IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400

6000times 6000

IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600

4.3.2. Matrix completion. We solve the matrix completion problems for two real data sets used in [68]: the Jester joke data set [24] and the MovieLens data set [32]. The Jester joke data set consists of 4.1 million ratings for 100 jokes from 73,421 users and can be downloaded from the website http://goldberg.berkeley.edu/jester-data/. We test with the following data matrices:

• jester-1: Data from 24,983 users who have rated 36 or more jokes.
• jester-2: Data from 23,500 users who have rated 36 or more jokes.
• jester-3: Data from 24,938 users who have rated between 15 and 35 jokes.
• jester-all: The combination of jester-1, jester-2, and jester-3.

The MovieLens data set can be downloaded from https://grouplens.org/datasets/movielens/. We test with the following data matrices:

• movie-100K: 100,000 ratings of 943 users for 1682 movies.
• movie-1M: 1 million ratings of 6040 users for 3900 movies.
• movie-latest-small: 100,000 ratings of 700 users for 9000 movies.

For each data set, we let M be the original data matrix, where M_ij stands for the rating of joke (movie) j by user i, and let Γ be the set of indices where M_ij is known. The quality of a matrix completion algorithm is measured by the Normalized Mean Absolute Error (NMAE), defined by

    NMAE  :=  (1 / |Γ|) Σ_{(i,j)∈Γ} |M_ij − X_ij| / (r_max − r_min),


where X_ij is the predicted rating of joke (movie) j by user i, and r_min, r_max are the lower and upper bounds of the ratings, respectively. For the Jester joke data sets we set r_min = −10 and r_max = 10. For the MovieLens data sets we set r_min = 1 and r_max = 5.
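For concreteness, the NMAE can be evaluated with a few lines of NumPy. This is an illustrative sketch only: the array names and the dense-matrix interface are our own choices, and in the experiments the ratings are stored sparsely so that only the entries indexed by Γ are touched.

    import numpy as np

    def nmae(M, X, known_idx, r_min, r_max):
        """Normalized Mean Absolute Error over the known entries in Γ.

        M, X      : original and predicted rating matrices (same shape)
        known_idx : pair of index arrays (rows, cols) listing the entries in Γ
        """
        rows, cols = known_idx
        abs_err = np.abs(M[rows, cols] - X[rows, cols])
        return abs_err.mean() / (r_max - r_min)

    # e.g., for the Jester data sets: nmae(M, X, np.nonzero(mask), r_min=-10, r_max=10)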

Since |Γ| is large, we randomly select a subset Ω from Γ and then use Algorithm 8 to solve problem (2.8). We randomly select 10 ratings for each user in the Jester joke data sets, while we randomly choose about 50% of the ratings for each user in the MovieLens data sets. Table 4.5 lists the parameter settings of the algorithms. The maximum iteration number in IALM is 100, and all other parameters are the same as those used in [45].

The numerical results are included in Table 4.6. We observe that IALM_FFSRQR achieves almost the same recoverability as the other methods (except IALM_LTSVD) and is slightly faster than IALM_RSISVD on these two data sets.

TABLE 4.5
Parameters used in the IALM method on matrix completion.

Data set             m       n      |Γ|        |Ω|
jester-1             24983   100    1.81e+06   249830
jester-2             23500   100    1.71e+06   235000
jester-3             24938   100    6.17e+05   249384
jester-all           73421   100    4.14e+06   734210
movie-100K           943     1682   1.00e+05   49918
movie-1M             6040    3706   1.00e+06   498742
movie-latest-small   671     9066   1.00e+05   52551

5. Conclusions. We have presented the Flip-Flop SRQR factorization, a variant of the QLP factorization, to compute low-rank matrix approximations. The Flip-Flop SRQR algorithm uses an SRQR factorization to initialize a truncated column-pivoted QR factorization and then forms an LQ factorization. For the numerical results presented, the errors of the proposed algorithm were comparable to those obtained from other state-of-the-art algorithms, while the new algorithm is cheaper to compute and produces quality low-rank matrix approximations. Furthermore, we prove singular value lower bounds and residual error upper bounds for the Flip-Flop SRQR factorization. In situations where the singular values of the input matrix decay relatively quickly, the low-rank approximation computed by Flip-Flop SRQR is guaranteed to be as accurate as the truncated SVD. We also perform a complexity analysis showing that Flip-Flop SRQR is faster than the approximate SVD with randomized subspace iteration. Future work includes reducing the overhead cost of Flip-Flop SRQR and implementing the algorithm on distributed memory machines for popular applications such as distributed PCA.

Acknowledgments. The authors would like to thank Yousef Saad, the editor, and the three anonymous referees who kindly reviewed the earlier version of this manuscript and provided valuable suggestions and comments.

Appendix A. Partial QR factorization with column pivoting. The details of partial QRCP are covered in Algorithm 4.

Appendix B. TUXV algorithm. The details of the TUXV algorithm are presented in Algorithm 5.

Appendix C. Computing the Tucker decomposition of higher-order tensors. Algorithm 6 computes the Tucker decomposition of higher-order tensors, i.e., the sequentially truncated higher-order SVD.


TABLE 4.6
Comparison on matrix completion.

Data set             Method         Iter.   Time       NMAE       sv    σ_max      σ_min
jester-1             IALM-LANSVD    12      7.06e+00   1.84e-01   100   2.14e+03   1.00e+00
                     IALM-FFSRQR    12      3.44e+00   1.69e-01   100   2.28e+03   1.00e+00
                     IALM-RSISVD    12      3.75e+00   1.89e-01   100   2.12e+03   1.00e+00
                     IALM-LTSVD     100     2.11e+01   1.74e-01    62   3.00e+03   1.00e+00
jester-2             IALM-LANSVD    12      6.80e+00   1.85e-01   100   2.13e+03   1.00e+00
                     IALM-FFSRQR    12      2.79e+00   1.70e-01   100   2.29e+03   1.00e+00
                     IALM-RSISVD    12      3.59e+00   1.91e-01   100   2.12e+03   1.00e+00
                     IALM-LTSVD     100     2.03e+01   1.75e-01    58   2.96e+03   1.00e+00
jester-3             IALM-LANSVD    12      7.05e+00   1.26e-01    99   1.79e+03   1.00e+00
                     IALM-FFSRQR    12      3.03e+00   1.22e-01   100   1.71e+03   1.00e+00
                     IALM-RSISVD    12      3.85e+00   1.31e-01   100   1.78e+03   1.00e+00
                     IALM-LTSVD     100     2.12e+01   1.33e-01    55   2.50e+03   1.00e+00
jester-all           IALM-LANSVD    12      2.39e+01   1.72e-01   100   3.56e+03   1.00e+00
                     IALM-FFSRQR    12      1.12e+01   1.62e-01   100   3.63e+03   1.00e+00
                     IALM-RSISVD    12      1.34e+01   1.82e-01   100   3.47e+03   1.00e+00
                     IALM-LTSVD     100     6.99e+01   1.68e-01    52   4.92e+03   1.00e+00
movie-100K           IALM-LANSVD    29      2.86e+01   1.83e-01   285   1.21e+03   1.00e+00
                     IALM-FFSRQR    30      4.55e+00   1.67e-01   295   1.53e+03   1.00e+00
                     IALM-RSISVD    29      4.82e+00   1.82e-01   285   1.29e+03   1.00e+00
                     IALM-LTSVD     48      1.42e+01   1.47e-01   475   1.91e+03   1.00e+00
movie-1M             IALM-LANSVD    50      7.40e+02   1.58e-01   495   4.99e+03   1.00e+00
                     IALM-FFSRQR    53      2.07e+02   1.37e-01   525   6.63e+03   1.00e+00
                     IALM-RSISVD    50      2.23e+02   1.57e-01   495   5.35e+03   1.00e+00
                     IALM-LTSVD     100     8.50e+02   1.17e-01   995   8.97e+03   1.00e+00
movie-latest-small   IALM-LANSVD    31      1.66e+02   1.85e-01   305   1.13e+03   1.00e+00
                     IALM-FFSRQR    31      1.96e+01   2.00e-01   305   1.42e+03   1.00e+00
                     IALM-RSISVD    31      2.85e+01   1.91e-01   305   1.20e+03   1.00e+00
                     IALM-LTSVD     63      4.02e+01   2.08e-01   298   1.79e+03   1.00e+00

Appendix D. IALM for solving two special nuclear norm minimization problems. Algorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems, respectively. In Algorithm 7, S_ω(x) = sgn(x) · max(|x| − ω, 0) is the soft shrinkage operator [29] with x ∈ R^n and ω > 0. In Algorithm 8, Ω̄ is the complement of Ω.

Appendix E. Approximate SVD with randomized subspace iteration. The randomized subspace iteration was proposed in [30, Algorithm 4.4] to compute an orthonormal matrix whose range approximates the range of A. An approximate SVD can be computed using this orthonormal matrix [30, Algorithm 5.1]. A randomized subspace iteration is used in the routine MLSVD_RSI of the Matlab toolbox Tensorlab [73], and MLSVD_RSI is by far the most efficient function that we can find in Matlab to compute the ST-HOSVD. We summarize the approximate SVD with randomized subspace iteration in Algorithm 9.

Now we perform a complexity analysis for the approximate SVD with randomized subspace iteration. We first note that:
1. The cost of generating the random matrix is negligible.
2. The cost of computing B = AΩ is 2mn(k + p).


Algorithm 4 Partial QRCP
Inputs: Matrix A ∈ R^{m×n}. Target rank k.
Outputs: Orthogonal matrix Q ∈ R^{m×m}; upper trapezoidal matrix R ∈ R^{m×n}; permutation matrix Π ∈ R^{n×n} such that AΠ = QR.
Algorithm:
Initialize Π = I_n. Compute column norms r_s = ‖A(1:m, s)‖_2 (1 ≤ s ≤ n).
for j = 1 : k do
    Find i = arg max_{j ≤ s ≤ n} r_s. Exchange r_j and r_i, and exchange columns j and i of A and Π.
    Form Householder reflection Q_j from A(j:m, j).
    Update trailing matrix A(j:m, j:n) ← Q_j^T A(j:m, j:n).
    Update r_s = ‖A(j+1:m, s)‖_2 (j+1 ≤ s ≤ n).
end for
Q = Q_1 Q_2 ··· Q_k is the product of all reflections; R = upper trapezoidal part of A.
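As a concrete reference point, the following NumPy sketch mirrors Algorithm 4. It accumulates Q explicitly and downdates squared column norms after each reflection; a production code (e.g., the blocked LAPACK routine xGEQP3) would instead keep the reflections in factored form. The function name partial_qrcp and the returned truncated factors are our own illustrative choices.

    import numpy as np

    def partial_qrcp(A, k):
        """Partial QR with column pivoting (sketch of Algorithm 4).

        Returns Q (m x m), R (k x n, upper trapezoidal), and the pivot order piv,
        so that A[:, piv] is approximated by Q[:, :k] @ R up to the truncation residual.
        """
        A = np.array(A, dtype=float)
        m, n = A.shape
        piv = np.arange(n)
        norms2 = np.sum(A * A, axis=0)              # squared column norms r_s^2
        Q = np.eye(m)
        for j in range(k):
            # Pivot: move the column with the largest remaining norm to position j.
            i = j + int(np.argmax(norms2[j:]))
            A[:, [j, i]] = A[:, [i, j]]
            piv[[j, i]] = piv[[i, j]]
            norms2[[j, i]] = norms2[[i, j]]
            # Householder reflection that annihilates A[j+1:, j].
            x = A[j:, j].copy()
            v = x.copy()
            v[0] += np.copysign(np.linalg.norm(x), x[0])
            nv = np.linalg.norm(v)
            if nv > 0.0:
                v /= nv
                A[j:, j:] -= 2.0 * np.outer(v, v @ A[j:, j:])   # update trailing matrix
                Q[:, j:] -= 2.0 * np.outer(Q[:, j:] @ v, v)     # accumulate Q = Q_1 Q_2 ... Q_k
            # Downdate the running column norms of the trailing columns.
            norms2[j + 1:] = np.maximum(norms2[j + 1:] - A[j, j + 1:] ** 2, 0.0)
        R = np.triu(A[:k, :])
        return Q, R, piv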

Algorithm 5 TUXV Algorithm
Inputs: Matrix A ∈ R^{m×n}. Target rank k. Block size b. Oversampling size p ≥ 0.
Outputs: Column orthonormal matrices U ∈ R^{m×k}, V ∈ R^{n×k}, and upper triangular matrix R ∈ R^{k×k} such that A ≈ U R V^T.
Algorithm:
Do TRQRCP on A to obtain Q ∈ R^{m×k}, R ∈ R^{k×n}, and Π ∈ R^{n×n}.
R = R Π^T, and do an LQ factorization, i.e., [V, R] = qr(R^T, 0).
Compute Z = AV and do a QR factorization, i.e., [U, R] = qr(Z, 0).
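To make the flip-flop structure of Algorithm 5 concrete, here is a small NumPy/SciPy sketch. It is not the algorithm as stated: the TRQRCP step is replaced by an ordinary QR with column pivoting truncated to rank k, so the block size b and oversampling size p of Algorithm 5 play no role, and the function name tuxv_sketch is ours.

    import numpy as np
    from scipy.linalg import qr

    def tuxv_sketch(A, k):
        """Flip-flop sketch: truncated QRCP, then an LQ step, then one final QR."""
        # Stand-in for TRQRCP: full QR with column pivoting, truncated to k rows of R.
        Q, R, piv = qr(A, mode='economic', pivoting=True)     # A[:, piv] = Q @ R
        Rk = R[:k, :]
        # Fold the permutation back: Rk_unperm = Rk @ Pi^T, so A ≈ Q[:, :k] @ Rk_unperm.
        Rk_unperm = np.empty_like(Rk)
        Rk_unperm[:, piv] = Rk
        # "Flip": LQ factorization of Rk_unperm, done as a QR factorization of its transpose.
        V, _ = np.linalg.qr(Rk_unperm.T)                      # V: n x k, column orthonormal
        # "Flop": one multiplication with A and a final QR sharpen the left factor.
        U, R_final = np.linalg.qr(A @ V)                      # U: m x k, R_final: k x k
        return U, R_final, V                                  # A ≈ U @ R_final @ V.T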

3. In each QR step [Q, ∼] = qr(B, 0), the cost of computing the QR factorization of B is 2m(k + p)^2 − (2/3)(k + p)^3 (cf. [69]), and the cost of forming the first (k + p) columns of the full Q matrix is m(k + p)^2 + (1/3)(k + p)^3.

Now we count the flops for each iteration i of the for loop:
1. The cost of computing B = A^T * Q is 2mn(k + p).
2. The cost of computing [Q, ∼] = qr(B, 0) is 2n(k + p)^2 − (2/3)(k + p)^3, and the cost of forming the first (k + p) columns of the full Q matrix is n(k + p)^2 + (1/3)(k + p)^3.
3. The cost of computing B = A * Q is 2mn(k + p).
4. The cost of computing [Q, ∼] = qr(B, 0) is 2m(k + p)^2 − (2/3)(k + p)^3, and the cost of forming the first (k + p) columns of the full Q matrix is m(k + p)^2 + (1/3)(k + p)^3.

Together, the cost of running the for loop q times is

    q ( 4mn(k + p) + 3(m + n)(k + p)^2 − (2/3)(k + p)^3 ).

Additionally, the cost of computing B = Q^T * A is 2mn(k + p), the cost of the SVD of B is O(n(k + p)^2), and the cost of computing U = Q * U is 2m(k + p)^2.


Algorithm 6 ST-HOSVD
Inputs: Tensor X ∈ R^{I_1×···×I_d}, desired rank (k_1, ..., k_d), and processing order p = (p_1, ..., p_d).
Outputs: Tucker decomposition [G; U_1, ..., U_d].
Algorithm:
Define tensor G ← X.
for j = 1 : d do
    r = p_j.
    Compute an exact or approximate rank-k_r SVD of the tensor unfolding: G_(r) ≈ Û_r Σ_r V̂_r^T.
    U_r ← Û_r.
    Update G_(r) ← Σ_r V̂_r^T, i.e., apply Û_r^T to G.
end for
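The following NumPy sketch of Algorithm 6 uses exact truncated SVDs of the unfoldings; in the paper's setting the SVD call would be replaced by Flip-Flop SRQR or by Algorithm 9. The helper names unfold and fold are our own.

    import numpy as np

    def unfold(G, mode):
        """Mode-r unfolding: move axis `mode` to the front and flatten the rest."""
        return np.moveaxis(G, mode, 0).reshape(G.shape[mode], -1)

    def fold(M, mode, shape):
        """Inverse of unfold for a tensor with the given target shape."""
        rest = [s for i, s in enumerate(shape) if i != mode]
        return np.moveaxis(M.reshape([shape[mode]] + rest), 0, mode)

    def st_hosvd(X, ranks, order=None):
        """Sequentially truncated HOSVD (sketch of Algorithm 6)."""
        G = np.asarray(X, dtype=float).copy()
        d = G.ndim
        order = list(range(d)) if order is None else list(order)
        U = [None] * d
        for r in order:
            Gr = unfold(G, r)
            Ur, s, Vt = np.linalg.svd(Gr, full_matrices=False)  # exact or approximate SVD
            Ur = Ur[:, :ranks[r]]
            U[r] = Ur
            new_shape = list(G.shape)
            new_shape[r] = ranks[r]
            G = fold(Ur.T @ Gr, r, new_shape)                   # G_(r) <- Sigma_r V_r^T (truncated)
        return G, U            # X ≈ G ×_1 U[0] ×_2 U[1] ... ×_d U[d-1]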

Algorithm 7 Robust PCA Using IALM
Inputs: Measured matrix M ∈ R^{m×n}; positive numbers λ, μ_0, μ̄; tolerance tol; ρ > 1.
Outputs: Matrix pair (X_k, E_k).
Algorithm:
k = 0; J(M) = max(‖M‖_2, λ^{−1}‖M‖_F); Y_0 = M / J(M); E_0 = 0.
while not converged do
    (U, Σ, V) = svd(M − E_k + μ_k^{−1} Y_k)
    X_{k+1} = U S_{μ_k^{−1}}(Σ) V^T
    E_{k+1} = S_{λ μ_k^{−1}}(M − X_{k+1} + μ_k^{−1} Y_k)
    Y_{k+1} = Y_k + μ_k (M − X_{k+1} − E_{k+1})
    Update μ_{k+1} = min(ρ μ_k, μ̄)
    k = k + 1
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then
        Break
    end if
end while
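A minimal NumPy sketch of the IALM loop of Algorithm 7 is given below. The default values of λ, μ_0, μ̄, and ρ are common choices for IALM and are assumptions on our part, not necessarily the values used in the experiments, and a full SVD stands in for the approximate SVD that the paper plugs in. Algorithm 8 is the same loop with the E-update replaced by the projection π_Ω̄ onto the unobserved entries.

    import numpy as np

    def soft_shrink(X, w):
        """Soft shrinkage operator S_w(x) = sgn(x) * max(|x| - w, 0), applied entrywise."""
        return np.sign(X) * np.maximum(np.abs(X) - w, 0.0)

    def ialm_rpca(M, lam=None, rho=1.5, tol=1e-7, max_iter=100):
        """IALM for robust PCA (sketch of Algorithm 7)."""
        M = np.asarray(M, dtype=float)
        m, n = M.shape
        lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam     # assumed default
        norm2 = np.linalg.norm(M, 2)
        J = max(norm2, np.linalg.norm(M, 'fro') / lam)
        Y = M / J                                                  # Y_0 = M / J(M)
        E = np.zeros_like(M)
        X = np.zeros_like(M)
        mu, mu_bar = 1.25 / norm2, 1e7 * 1.25 / norm2              # assumed defaults
        for _ in range(max_iter):
            U, s, Vt = np.linalg.svd(M - E + Y / mu, full_matrices=False)
            X = (U * soft_shrink(s, 1.0 / mu)) @ Vt                # singular value thresholding
            E = soft_shrink(M - X + Y / mu, lam / mu)
            Y = Y + mu * (M - X - E)
            mu = min(rho * mu, mu_bar)
            if np.linalg.norm(M - X - E, 'fro') / np.linalg.norm(M, 'fro') < tol:
                break
        return X, E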

Now assume k + p ≪ min(m, n) and omit the lower-order terms; then we arrive at (4q + 4)mn(k + p) as the complexity of the approximate SVD with randomized subspace iteration. In practice, q is usually chosen to be an integer between 0 and 2.
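As a quick numerical illustration of this estimate (the sizes below are arbitrary and not taken from the experiments), the leading-order formula tracks the detailed count closely:

    # Flop estimate for Algorithm 9 with m = 10000, n = 8000, k = 100, p = 10, q = 1.
    m, n, k, p, q = 10_000, 8_000, 100, 10, 1
    kp = k + p
    loop  = q * (4 * m * n * kp + 3 * (m + n) * kp**2 - 2 * kp**3 / 3)
    other = 2 * m * n * kp + 2 * m * n * kp + 2 * m * kp**2   # B = A*Omega, B = Q^T*A, U = Q*U
    print(loop + other)               # about 7.1e10 flops from the detailed count
    print((4 * q + 4) * m * n * kp)   # about 7.0e10 flops from (4q + 4) m n (k + p)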

REFERENCES

[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106

[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999

[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103

[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595

[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci


Algorithm 8 Matrix Completion Using IALM
Inputs: Sampled set Ω; sampled entries π_Ω(M); positive numbers λ, μ_0, μ̄; tolerance tol; ρ > 1.
Outputs: Matrix pair (X_k, E_k).
Algorithm:
k = 0; Y_0 = 0; E_0 = 0.
while not converged do
    (U, Σ, V) = svd(M − E_k + μ_k^{−1} Y_k)
    X_{k+1} = U S_{μ_k^{−1}}(Σ) V^T
    E_{k+1} = π_Ω̄(M − X_{k+1} + μ_k^{−1} Y_k)
    Y_{k+1} = Y_k + μ_k (M − X_{k+1} − E_{k+1})
    Update μ_{k+1} = min(ρ μ_k, μ̄)
    k = k + 1
    if ‖M − X_k − E_k‖_F / ‖M‖_F < tol then
        Break
    end if
end while

Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-

holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion

SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)

Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9

(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an

N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38

(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in

Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16

[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278

[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342

[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55

[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183

[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291

[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218

[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to

Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162

[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977

[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654


Algorithm 9 Approximate SVD with Randomized Subspace Iteration
Inputs: Matrix A ∈ R^{m×n}. Target rank k. Oversampling size p ≥ 0. Number of iterations q ≥ 1.
Outputs: U ∈ R^{m×k} contains the approximate top k left singular vectors of A; Σ ∈ R^{k×k} contains the approximate top k singular values of A; V ∈ R^{n×k} contains the approximate top k right singular vectors of A.
Algorithm:
Generate an i.i.d. Gaussian matrix Ω ∈ N(0, 1)^{n×(k+p)}.
Compute B = AΩ.
[Q, ∼] = qr(B, 0)
for i = 1 : q do
    B = A^T * Q;  [Q, ∼] = qr(B, 0)
    B = A * Q;    [Q, ∼] = qr(B, 0)
end for
B = Q^T * A
[U, Σ, V] = svd(B)
U = Q * U;  U = U(:, 1:k);  Σ = Σ(1:k, 1:k);  V = V(:, 1:k)
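For reference, Algorithm 9 translates almost line by line into NumPy; the sketch below (with an assumed default p = 10) is illustrative and is not the Tensorlab implementation.

    import numpy as np

    def rsi_svd(A, k, p=10, q=1, rng=None):
        """Approximate rank-k SVD by randomized subspace iteration (sketch of Algorithm 9)."""
        rng = np.random.default_rng() if rng is None else rng
        n = A.shape[1]
        Omega = rng.standard_normal((n, k + p))       # i.i.d. Gaussian test matrix
        Q, _ = np.linalg.qr(A @ Omega)                # economy-size QR, as in qr(B, 0)
        for _ in range(q):
            Q, _ = np.linalg.qr(A.T @ Q)              # re-orthonormalize after each product
            Q, _ = np.linalg.qr(A @ Q)
        B = Q.T @ A                                   # (k + p) x n
        Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
        U = Q @ Ub
        return U[:, :k], s[:k], Vt[:k, :].T           # A ≈ U @ np.diag(s) @ V.T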

[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480

[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151

[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press

Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)

pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization

SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and

convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic

algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo

multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing

collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237

[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the

SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)

pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in

Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206


[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash

500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at

httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of

Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document

recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale

Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted

low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic

applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and

outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to

system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-

tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra

PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite

linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-

structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE

Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding

for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493

[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494

[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501

[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719

[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124

[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249

[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003

[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57

[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799

[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388

[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482

[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336

[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348


[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603

[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640

[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash

311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-

order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in

European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460

[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet

[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298

[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242

[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550

Page 19: FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND …etna.mcs.kent.edu/vol.51.2019/pp469-494.dir/pp469-494.pdfminimization. In Section3, we introduce Flip-Flop SRQR and analyze its

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487

where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5

Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]

The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets

TABLE 45Parameters used in the IALM method on matrix completion

Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384

jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742

movie-latest-small 671 9066 100e+05 52551

5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA

Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments

Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4

Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5

Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially

truncated higher-order SVD

ETNAKent State University and

Johann Radon Institute (RICAM)

488 Y FENG J XIAO AND M GU

TABLE 46Comparison on matrix completion

Data set Method Iter Time NMAE sv σmax σmin

jester-1

IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00

jester-2

IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00

jester-3

IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00

jester-all

IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00

movie-100K

IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00

movie-1M

IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00

movie-latest-small

IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00

Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems

respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω

Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an

orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9

Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that

1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489

Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do

Find i = arg maxjleslen

rs Exchange rj and ri columns in A and Π

Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)

end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A

Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)

3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)

2 minus 23 (k + p)

3 (cf [69]) and the cost of forming the first (k + p)

columns in the full Q matrix is m (k + p)2

+ 13 (k + p)

3

Now we count the flops for each i in the for loop

1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)

2 minus 23 (k + p)

3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)

2+ 1

3 (k + p)3

3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)

2minus 23 (k + p)

3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)

2+ 1

3 (k + p)3

Together the cost of running the for loop q times is

q

(4mn (k + p) + 3 (m+ n) (k + p)

2 minus 2

3(k + p)

3

)

Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)

2) and the cost of computing U = Q lowast U is 2m (k + p)

2

ETNAKent State University and

Johann Radon Institute (RICAM)

490 Y FENG J XIAO AND M GU

Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV

Tr ie applying UTr to G

end for

Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max

(M2 λminus1MF

) Y0 = MJ (M) E0 = 0

while not converged do(UΣV) = svd

(MminusEk + microminus1k Yk

)

Xk+1 = USmicrominus1k

(Σ)V T

Ek+1 = Sλmicrominus1k

(M minusXk+1 + microminus1

k Yk)

Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then

Breakend if

end while

Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2

REFERENCES

[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106

[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999

[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103

[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595

[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491

Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do

(UΣV) = svd(MminusEk + microminus1k Yk

)

Xk+1 = USmicrominus1k

(Σ)V T

Ek+1 = πΩ

(M minusXk+1 + microminus1

k Yk)

Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then

Breakend if

end while

Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-

holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion

SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)

Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9

(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an

N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38

(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in

Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16

[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278

[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342

[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55

[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183

[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291

[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218

[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to

Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162

[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977

[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654

ETNAKent State University and

Johann Radon Institute (RICAM)

492 Y FENG J XIAO AND M GU

Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)

ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)

end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)

[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480

[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151

[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press

Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)

pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization

SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and

convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic

algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo

multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing

collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237

[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the

SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)

pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in

Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493

[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash

500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at

httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of

Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document

recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale

Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted

low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic

applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and

outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to

system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-

tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra

PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite

linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-

structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE

Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding

for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493

[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494

[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501

[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719

[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124

[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249

[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003

[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57

[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799

[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388

[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482

[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336

[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348

ETNAKent State University and

Johann Radon Institute (RICAM)

494 Y FENG J XIAO AND M GU

[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603

[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640

[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash

311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-

order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in

European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460

[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet

[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298

[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242

[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550

Page 20: FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND …etna.mcs.kent.edu/vol.51.2019/pp469-494.dir/pp469-494.pdfminimization. In Section3, we introduce Flip-Flop SRQR and analyze its

ETNAKent State University and

Johann Radon Institute (RICAM)

488 Y FENG J XIAO AND M GU

TABLE 46Comparison on matrix completion

Data set Method Iter Time NMAE sv σmax σmin

jester-1

IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00

jester-2

IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00

jester-3

IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00

jester-all

IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00

movie-100K

IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00

movie-1M

IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00

movie-latest-small

IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00

Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems

respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω

Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an

orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9

Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that

1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489

Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do

Find i = arg maxjleslen

rs Exchange rj and ri columns in A and Π

Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)

end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A

Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)

3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)

2 minus 23 (k + p)

3 (cf [69]) and the cost of forming the first (k + p)

columns in the full Q matrix is m (k + p)2

+ 13 (k + p)

3

Now we count the flops for each i in the for loop

1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)

2 minus 23 (k + p)

3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)

2+ 1

3 (k + p)3

3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)

2minus 23 (k + p)

3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)

2+ 1

3 (k + p)3

Together the cost of running the for loop q times is

q

(4mn (k + p) + 3 (m+ n) (k + p)

2 minus 2

3(k + p)

3

)

Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)

2) and the cost of computing U = Q lowast U is 2m (k + p)

2

ETNAKent State University and

Johann Radon Institute (RICAM)

490 Y FENG J XIAO AND M GU

Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV

Tr ie applying UTr to G

end for

Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max

(M2 λminus1MF

) Y0 = MJ (M) E0 = 0

while not converged do(UΣV) = svd

(MminusEk + microminus1k Yk

)

Xk+1 = USmicrominus1k

(Σ)V T

Ek+1 = Sλmicrominus1k

(M minusXk+1 + microminus1

k Yk)

Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then

Breakend if

end while

Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2

REFERENCES

[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106

[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999

[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103

[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595

[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491

Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do

(UΣV) = svd(MminusEk + microminus1k Yk

)

Xk+1 = USmicrominus1k

(Σ)V T

Ek+1 = πΩ

(M minusXk+1 + microminus1

k Yk)

Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then

Breakend if

end while

[6] P. BUSINGER AND G. H. GOLUB, Handbook series linear algebra. Linear least squares solutions by Householder transformations, Numer. Math., 7 (1965), pp. 269–276.
[7] J.-F. CAI, E. J. CANDÈS, AND Z. SHEN, A singular value thresholding algorithm for matrix completion, SIAM J. Optim., 20 (2010), pp. 1956–1982.
[8] E. J. CANDÈS, X. LI, Y. MA, AND J. WRIGHT, Robust principal component analysis, J. ACM, 58 (2011), Art. 11, 37 pages.
[9] E. J. CANDÈS AND B. RECHT, Exact matrix completion via convex optimization, Found. Comput. Math., 9 (2009), pp. 717–772.
[10] J. D. CARROLL AND J.-J. CHANG, Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition, Psychometrika, 35 (1970), pp. 283–319.
[11] T. A. DAVIS AND Y. HU, The University of Florida sparse matrix collection, ACM Trans. Math. Software, 38 (2011), Art. 1, 25 pages.
[12] L. DE LATHAUWER AND B. DE MOOR, From matrix to tensor: multilinear algebra and signal processing, in Mathematics in Signal Processing IV, Inst. Math. Appl. Conf. Ser., Oxford Univ. Press, Oxford, 1998, pp. 1–16.
[13] L. DE LATHAUWER, B. DE MOOR, AND J. VANDEWALLE, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253–1278.
[14] L. DE LATHAUWER, B. DE MOOR, AND J. VANDEWALLE, On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1324–1342.
[15] L. DE LATHAUWER AND J. VANDEWALLE, Dimensionality reduction in higher-order signal processing and rank-(R1, R2, ..., RN) reduction in multilinear algebra, Linear Algebra Appl., 391 (2004), pp. 31–55.
[16] P. DRINEAS, R. KANNAN, AND M. W. MAHONEY, Fast Monte Carlo algorithms for matrices II: computing a low-rank approximation to a matrix, SIAM J. Comput., 36 (2006), pp. 158–183.
[17] J. A. DUERSCH AND M. GU, Randomized QR with column pivoting, SIAM J. Sci. Comput., 39 (2017), pp. C263–C291.
[18] C. ECKART AND G. YOUNG, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211–218.
[19] L. ELDÉN, Matrix Methods in Data Mining and Pattern Recognition, SIAM, Philadelphia, 2007.
[20] M. FAZEL, H. HINDI, AND S. P. BOYD, Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices, in Proceedings of the 2003 American Control Conference, IEEE Conference Proceedings, IEEE, Piscataway, 2003, pp. 2156–2162.
[21] M. FAZEL, T. K. PONG, D. SUN, AND P. TSENG, Hankel matrix rank minimization with applications to system identification and realization, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 946–977.
[22] C. FRANK AND E. NÖTH, Optimizing eigenfaces by face masks for facial expression recognition, in Computer Analysis of Images and Patterns, N. Petkov and M. A. Westenberg, eds., vol. 2756 of Lecture Notes in Comput. Sci., Springer, Berlin, 2003, pp. 646–654.


[23] G. W. FURNAS, S. DEERWESTER, S. T. DUMAIS, T. K. LANDAUER, R. A. HARSHMAN, L. A. STREETER, AND K. E. LOCHBAUM, Information retrieval using a singular value decomposition model of latent semantic structure, in Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Y. Chiaramella, ed., ACM, New York, 1988, pp. 465–480.
[24] K. GOLDBERG, T. ROEDER, D. GUPTA, AND C. PERKINS, Eigentaste: a constant time collaborative filtering algorithm, Inf. Retrieval, 4 (2001), pp. 133–151.
[25] G. GOLUB, Numerical methods for solving linear least squares problems, Numer. Math., 7 (1965), pp. 206–216.
[26] G. H. GOLUB AND C. F. VAN LOAN, Matrix Computations, 4th ed., Johns Hopkins University Press, Baltimore, 2013.
[27] M. GU, Subspace iteration randomization and singular value problems, SIAM J. Sci. Comput., 37 (2015), pp. A1139–A1173.
[28] M. GU AND S. C. EISENSTAT, Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869.
[29] E. T. HALE, W. YIN, AND Y. ZHANG, Fixed-point continuation for l1-minimization: methodology and convergence, SIAM J. Optim., 19 (2008), pp. 1107–1130.
[30] N. HALKO, P. G. MARTINSSON, AND J. A. TROPP, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217–288.
[31] R. A. HARSHMAN, Foundations of the PARAFAC procedure: models and conditions for an "explanatory" multimodal factor analysis, UCLA Working Papers in Phonetics, 16 (1970), pp. 1–84.
[32] J. L. HERLOCKER, J. A. KONSTAN, A. BORCHERS, AND J. RIEDL, An algorithmic framework for performing collaborative filtering, in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, F. Gey, M. Hearst, and R. Tong, eds., ACM, New York, 1999, pp. 230–237.
[33] R. A. HORN AND C. R. JOHNSON, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1991.
[34] D. A. HUCKABY AND T. F. CHAN, On the convergence of Stewart's QLP algorithm for approximating the SVD, Numer. Algorithms, 32 (2003), pp. 287–316.
[35] D. A. HUCKABY AND T. F. CHAN, Stewart's pivoted QLP decomposition for low-rank matrices, Numer. Linear Algebra Appl., 12 (2005), pp. 153–159.
[36] W. B. JOHNSON AND J. LINDENSTRAUSS, Extensions of Lipschitz mappings into a Hilbert space, in Conference in Modern Analysis and Probability (New Haven, Conn., 1982), R. Beals, A. Beck, A. Bellow, and A. Hajian, eds., vol. 26 of Contemp. Math., Amer. Math. Soc., Providence, 1984, pp. 189–206.


[37] I. T. JOLLIFFE, Principal Component Analysis, Springer, New York, 1986.
[38] J. M. KLEINBERG, Authoritative sources in a hyperlinked environment, J. ACM, 46 (1999), pp. 604–632.
[39] T. G. KOLDA, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 243–255.
[40] T. G. KOLDA AND B. W. BADER, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455–500.
[41] R. M. LARSEN, PROPACK: software for large and sparse SVD calculations, available at http://sun.stanford.edu/~rmunk/PROPACK/.
[42] R. M. LARSEN, Lanczos bidiagonalization with partial reorthogonalization, Tech. Report PB-537, Department of Computer Science, Aarhus University, Aarhus, Denmark, 1998.
[43] Y. LECUN, L. BOTTOU, Y. BENGIO, AND P. HAFFNER, Gradient-based learning applied to document recognition, Proc. IEEE, 86 (1998), pp. 2278–2324.
[44] R. B. LEHOUCQ, D. C. SORENSEN, AND C. YANG, ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods, SIAM, Philadelphia, 1998.
[45] Z. LIN, M. CHEN, AND Y. MA, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices, Preprint on arXiv, https://arxiv.org/abs/1009.5055, 2010.
[46] N. LINIAL, E. LONDON, AND Y. RABINOVICH, The geometry of graphs and some of its algorithmic applications, Combinatorica, 15 (1995), pp. 215–245.
[47] Z. LIU, A. HANSSON, AND L. VANDENBERGHE, Nuclear norm system identification with missing inputs and outputs, Systems Control Lett., 62 (2013), pp. 605–612.
[48] Z. LIU AND L. VANDENBERGHE, Interior-point method for nuclear norm approximation with application to system identification, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1235–1256.
[49] S. MA, D. GOLDFARB, AND L. CHEN, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program., 128 (2011), pp. 321–353.
[50] C. B. MELGAARD, Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra, PhD Thesis, University of California, Berkeley, 2015.
[51] M. MESBAHI AND G. P. PAPAVASSILOPOULOS, On the rank minimization problem over a positive semidefinite linear matrix inequality, IEEE Trans. Automat. Control, 42 (1997), pp. 239–243.
[52] N. MULLER, L. MAGAIA, AND B. M. HERBST, Singular value decomposition, eigenfaces, and 3D reconstructions, SIAM Rev., 46 (2004), pp. 518–545.
[53] M. NARWARIA AND W. LIN, SVD-based quality metric for image and video using machine learning, IEEE Trans. Syst., Man, and Cybernetics, Part B (Cybernetics), 42 (2012), pp. 347–364.
[54] T.-H. OH, Y. MATSUSHITA, Y.-W. TAI, AND I. SO KWEON, Fast randomized singular value thresholding for nuclear norm minimization, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015, IEEE Conference Proceedings, IEEE, Los Alamitos, 2015, pp. 4484–4493.
[55] G. QUINTANA-ORTÍ, X. SUN, AND C. H. BISCHOF, A BLAS-3 version of the QR factorization with column pivoting, SIAM J. Sci. Comput., 19 (1998), pp. 1486–1494.
[56] B. RECHT, M. FAZEL, AND P. A. PARRILO, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Rev., 52 (2010), pp. 471–501.
[57] J. D. RENNIE AND N. SREBRO, Fast maximum margin matrix factorization for collaborative prediction, in Proceedings of the 22nd International Conference on Machine Learning, L. De Raedt and S. Wrobel, eds., ACM Press, New York, 2005, pp. 713–719.
[58] V. ROKHLIN, A. SZLAM, AND M. TYGERT, A randomized algorithm for principal component analysis, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1100–1124.
[59] A. K. SAIBABA, HOID: higher order interpolatory decomposition for tensors based on Tucker representation, SIAM J. Matrix Anal. Appl., 37 (2016), pp. 1223–1249.
[60] B. SAVAS AND L. ELDÉN, Handwritten digit classification using higher order singular value decomposition, Pattern Recognition, 40 (2007), pp. 993–1003.
[61] R. SCHREIBER AND C. VAN LOAN, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 53–57.
[62] A. SHASHUA AND T. HAZAN, Non-negative tensor factorization with applications to statistics and computer vision, in Proceedings of the 22nd International Conference on Machine Learning, S. Dzeroski, L. De Raedt, and S. Wrobel, eds., ACM, New York, 2005, pp. 792–799.
[63] N. D. SIDIROPOULOS, R. BRO, AND G. B. GIANNAKIS, Parallel factor analysis in sensor array processing, IEEE Trans. Signal Process., 48 (2000), pp. 2377–2388.
[64] D. C. SORENSEN AND M. EMBREE, A DEIM induced CUR factorization, SIAM J. Sci. Comput., 38 (2016), pp. A1454–A1482.
[65] N. SREBRO, J. RENNIE, AND T. S. JAAKKOLA, Maximum-margin matrix factorization, in NIPS'04: Proceedings of the 17th International Conference on Neural Information Processing Systems, L. K. Saul, Y. Weiss, and L. Bottou, eds., MIT Press, Cambridge, 2004, pp. 1329–1336.
[66] G. W. STEWART, The QLP approximation to the singular value decomposition, SIAM J. Sci. Comput., 20 (1999), pp. 1336–1348.


[67] S. TAHERI, Q. QIU, AND R. CHELLAPPA, Structure-preserving sparse decomposition for facial expression analysis, IEEE Trans. Image Process., 23 (2014), pp. 3590–3603.
[68] K.-C. TOH AND S. YUN, An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems, Pac. J. Optim., 6 (2010), pp. 615–640.
[69] L. N. TREFETHEN AND D. BAU III, Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[70] L. R. TUCKER, Some mathematical notes on three-mode factor analysis, Psychometrika, 31 (1966), pp. 279–311.
[71] N. VANNIEUWENHOVEN, R. VANDEBRIL, AND K. MEERBERGEN, A new truncation strategy for the higher-order singular value decomposition, SIAM J. Sci. Comput., 34 (2012), pp. A1027–A1052.
[72] M. A. O. VASILESCU AND D. TERZOPOULOS, Multilinear analysis of image ensembles: TensorFaces, in European Conference on Computer Vision 2002, A. Heyden, G. Sparr, M. Nielsen, and P. Johansen, eds., vol. 2350 of Lecture Notes in Computer Science, Springer, Berlin, 2002, pp. 447–460.
[73] N. VERVLIET, O. DEBALS, L. SORBER, M. V. BAREL, AND L. DE LATHAUWER, Tensorlab user guide, available at http://www.tensorlab.net.
[74] J. XIAO AND M. GU, Spectrum-revealing Cholesky factorization for kernel methods, in Proceedings of the 16th IEEE International Conference on Data Mining (ICDM), IEEE Conference Proceedings, Los Alamitos, 2016, pp. 1293–1298.
[75] J. XIAO, M. GU, AND J. LANGOU, Fast parallel randomized QR with column pivoting algorithms for reliable low-rank matrix approximations, in Proceedings of the 24th IEEE International Conference on High Performance Computing (HiPC), IEEE Conference Proceedings, Los Alamitos, 2017, pp. 233–242.
[76] T. ZHANG AND G. H. GOLUB, Rank-one approximation to high order tensors, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 534–550.

Page 21: FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND …etna.mcs.kent.edu/vol.51.2019/pp469-494.dir/pp469-494.pdfminimization. In Section3, we introduce Flip-Flop SRQR and analyze its

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489

Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do

Find i = arg maxjleslen

rs Exchange rj and ri columns in A and Π

Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)

end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A

Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)

3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)

2 minus 23 (k + p)

3 (cf [69]) and the cost of forming the first (k + p)

columns in the full Q matrix is m (k + p)2

+ 13 (k + p)

3

Now we count the flops for each i in the for loop

1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)

2 minus 23 (k + p)

3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)

2+ 1

3 (k + p)3

3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)

2minus 23 (k + p)

3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)

2+ 1

3 (k + p)3

Together the cost of running the for loop q times is

q

(4mn (k + p) + 3 (m+ n) (k + p)

2 minus 2

3(k + p)

3

)

Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)

2) and the cost of computing U = Q lowast U is 2m (k + p)

2

ETNAKent State University and

Johann Radon Institute (RICAM)

490 Y FENG J XIAO AND M GU

Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV

Tr ie applying UTr to G

end for

Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max

(M2 λminus1MF

) Y0 = MJ (M) E0 = 0

while not converged do(UΣV) = svd

(MminusEk + microminus1k Yk

)

Xk+1 = USmicrominus1k

(Σ)V T

Ek+1 = Sλmicrominus1k

(M minusXk+1 + microminus1

k Yk)

Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then

Breakend if

end while

Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2

REFERENCES

[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106

[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999

[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103

[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595

[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491

Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do

(UΣV) = svd(MminusEk + microminus1k Yk

)

Xk+1 = USmicrominus1k

(Σ)V T

Ek+1 = πΩ

(M minusXk+1 + microminus1

k Yk)

Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then

Breakend if

end while

Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-

holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion

SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)

Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9

(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an

N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38

(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in

Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16

[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278

[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342

[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55

[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183

[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291

[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218

[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to

Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162

[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977

[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654

ETNAKent State University and

Johann Radon Institute (RICAM)

492 Y FENG J XIAO AND M GU

Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)

ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)

end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)

[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480

[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151

[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press

Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)

pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization

SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and

convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic

algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo

multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing

collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237

[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the

SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)

pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in

Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493

[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash

500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at

httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of

Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document

recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale

Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted

low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic

applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and

outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to

system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-

tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra

PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite

linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-

structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE

Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding

for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493

[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494

[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501

[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719

[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124

[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249

[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003

[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57

[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799

[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388

[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482

[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336

[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348

ETNAKent State University and

Johann Radon Institute (RICAM)

494 Y FENG J XIAO AND M GU

[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603

[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640

[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash

311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-

order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in

European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460

[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet

[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298

[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242

[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550

Page 22: FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND …etna.mcs.kent.edu/vol.51.2019/pp469-494.dir/pp469-494.pdfminimization. In Section3, we introduce Flip-Flop SRQR and analyze its

ETNAKent State University and

Johann Radon Institute (RICAM)

490 Y FENG J XIAO AND M GU

Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV

Tr ie applying UTr to G

end for

Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max

(M2 λminus1MF

) Y0 = MJ (M) E0 = 0

while not converged do(UΣV) = svd

(MminusEk + microminus1k Yk

)

Xk+1 = USmicrominus1k

(Σ)V T

Ek+1 = Sλmicrominus1k

(M minusXk+1 + microminus1

k Yk)

Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then

Breakend if

end while

Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2

REFERENCES

[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106

[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999

[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103

[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595

[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491

Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do

(UΣV) = svd(MminusEk + microminus1k Yk

)

Xk+1 = USmicrominus1k

(Σ)V T

Ek+1 = πΩ

(M minusXk+1 + microminus1

k Yk)

Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then

Breakend if

end while

Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-

holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion

SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)

Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9

(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an

N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38

(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in

Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16

[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278

[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342

[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55

[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183

[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291

[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218

[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to

Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162

[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977

[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654

ETNAKent State University and

Johann Radon Institute (RICAM)

492 Y FENG J XIAO AND M GU

Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)

ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)

end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)

[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480

[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151

[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press

Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)

pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization

SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and

convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic

algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo

multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing

collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237

[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the

SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)

pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in

Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493

[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash

500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at

httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of

Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document

recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale

Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted

low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic

applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and

outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to

system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-

tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra

PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite

linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-

structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE

Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding

for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493

[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494

[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501

[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719

[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124

[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249

[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003

[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57

[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799

[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388

[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482

[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336

[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348

ETNAKent State University and

Johann Radon Institute (RICAM)

494 Y FENG J XIAO AND M GU

[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603

[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640

[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash

311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-

order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in

European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460

[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet

[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298

[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242

[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550

Page 23: FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND …etna.mcs.kent.edu/vol.51.2019/pp469-494.dir/pp469-494.pdfminimization. In Section3, we introduce Flip-Flop SRQR and analyze its

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491

Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do

(UΣV) = svd(MminusEk + microminus1k Yk

)

Xk+1 = USmicrominus1k

(Σ)V T

Ek+1 = πΩ

(M minusXk+1 + microminus1

k Yk)

Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then

Breakend if

end while

Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-

holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion

SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)

Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9

(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an

N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38

(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in

Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16

[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278

[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342

[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55

[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183

[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291

[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218

[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to

Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162

[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977

[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654

ETNAKent State University and

Johann Radon Institute (RICAM)

492 Y FENG J XIAO AND M GU

Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)

ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)

end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)

[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480

[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151

[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press

Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)

pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization

SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and

convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic

algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo

multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing

collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237

[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the

SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)

pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in

Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206

ETNAKent State University and

Johann Radon Institute (RICAM)

FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493

[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash

500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at

httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of

Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document

recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale

Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted

low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic

applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and

outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to

system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-

tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra

PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite

linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-

structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE

Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding

for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493

[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494

[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501

[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719

[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124

[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249

[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003

[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57

[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799

[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388

[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482

[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336

[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348

ETNAKent State University and

Johann Radon Institute (RICAM)

494 Y FENG J XIAO AND M GU



Algorithm 9: Approximate SVD with Randomized Subspace Iteration.
Inputs: matrix A ∈ R^{m×n}; target rank k; oversampling size p ≥ 0; number of iterations q ≥ 1.
Outputs: U ∈ R^{m×k}, the approximate top k left singular vectors of A; Σ ∈ R^{k×k}, the approximate top k singular values of A; V ∈ R^{n×k}, the approximate top k right singular vectors of A.
Algorithm:
  Generate an i.i.d. Gaussian matrix Ω ∈ R^{n×(k+p)} with entries drawn from N(0, 1).
  Compute B = AΩ and [Q, ∼] = qr(B, 0).
  for i = 1 : q do
    B = A^T Q;  [Q, ∼] = qr(B, 0)
    B = A Q;    [Q, ∼] = qr(B, 0)
  end for
  B = Q^T A;  [U, Σ, V] = svd(B)
  U = Q U;  U = U(:, 1 : k);  Σ = Σ(1 : k, 1 : k);  V = V(:, 1 : k)
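For readers who wish to experiment with this scheme outside the MATLAB-style notation above, the following is a minimal NumPy sketch of the same randomized subspace iteration; the function name randomized_svd_subspace and the default values p = 5 and q = 1 are our own illustrative choices rather than part of the paper.

import numpy as np

def randomized_svd_subspace(A, k, p=5, q=1):
    # Stage 1: sample the range of A with an n x (k+p) Gaussian test matrix Omega.
    n = A.shape[1]
    Omega = np.random.standard_normal((n, k + p))
    Q, _ = np.linalg.qr(A @ Omega, mode='reduced')
    # Stage 2: subspace (power) iteration, re-orthonormalizing after each product.
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q, mode='reduced')
        Q, _ = np.linalg.qr(A @ Q, mode='reduced')
    # Stage 3: SVD of the small projected matrix B = Q^T A, then lift back and truncate.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :].T

Re-orthonormalizing after every multiplication by A and A^T, as in Algorithm 9, keeps the computed subspace numerically well conditioned even when the singular values of A decay slowly.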



