418
UNCORRECTED PROOF Lecture Notes in Mathematics 2173 1 Editors-in-Chief: J.-M. Morel, Cachan 2 3 B. Teissier, Paris 4 Advisory Board: Camillo De Lellis, Zurich 5 6 Mario di Bernardo, Bristol 7 Michel Brion, Grenoble 8 Alessio Figalli, Zurich 9 Davar Khoshnevisan, Salt Lake City 10 Ioannis Kontoyiannis, Athens 11 Gabor Lugosi, Barcelona 12 Mark Podolskij, Aarhus 13 Sylvia Serfaty, New York 14 Anna Wienhard, Heidelberg 15 More information about this series at http://www.springer.com/series/304

Lecture Notes in Mathematics 2173 - Emory Universitybenzi/Web_papers/nsmail.pdf · PROOF Lecture Notes in Mathematics 2173 1 Editors-in-Chief: J.-M. Morel, Cachan 2 3 B. Teissier,

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

  • UNCO

    RREC

    TED

    PROO

    F

    Lecture Notes in Mathematics 2173 1

    Editors-in-Chief:J.-M. Morel, Cachan

    2

    3

    B. Teissier, Paris 4

    Advisory Board:Camillo De Lellis, Zurich

    5

    6

    Mario di Bernardo, Bristol 7Michel Brion, Grenoble 8Alessio Figalli, Zurich 9Davar Khoshnevisan, Salt Lake City 10Ioannis Kontoyiannis, Athens 11Gabor Lugosi, Barcelona 12Mark Podolskij, Aarhus 13Sylvia Serfaty, New York 14Anna Wienhard, Heidelberg 15

    More information about this series at http://www.springer.com/series/304

    http://www.springer.com/series/304

  • UNCO

    RREC

    TED

    PROO

    F

  • UNCO

    RREC

    TED

    PROO

    F

    Michele Benzi • Dario Bini • Daniel Kressner •Hans Munthe-Kaas • Charles Van Loan

    16

    17

    Editors 18

    Exploiting Hidden Structurein Matrix Computations:Algorithms and Applications

    19

    20

    21

    Cetraro, Italy 2015 22

    123

  • UNCO

    RREC

    TED

    PROO

    F

    EditorsMichele BenziMathematics and Science CenterEmory UniversityAtlanta, USA

    Dario BiniUniversita’ di PisaPisa, Italy

    Daniel KressnerÉcole Polytechnique Fédérale de LausanneLausanne, Switzerland

    Hans Munthe-KaasDepartment of MathematicsUniversity of BergenBergen, Norway

    Charles Van LoanDepartment of Computer ScienceCornell UniversityIthacaNew York, USA

    23

    ISSN 0075-8434 ISSN 1617-9692 (electronic)Lecture Notes in MathematicsISBN 978-3-319-49886-7 ISBN 978-3-319-49887-4 (eBook)DOI 10.1007/978-3-319-49887-4

    24

    Library of Congress Control Number: xxxxxxxxxx 25

    Mathematics Subject Classification: 65Fxx, 65Nxx 26

    © Springer International Publishing AG 2016 27This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part ofthe material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,broadcasting, reproduction on microfilms or in any other physical way, and transmission or informationstorage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodologynow known or hereafter developed.

    28

    29

    30

    31

    32

    The use of general descriptive names, registered names, trademarks, service marks, etc. in this publicationdoes not imply, even in the absence of a specific statement, that such names are exempt from the relevantprotective laws and regulations and therefore free for general use.

    33

    34

    35

    The publisher, the authors and the editors are safe to assume that the advice and information in this bookare believed to be true and accurate at the date of publication. Neither the publisher nor the authors orthe editors give a warranty, express or implied, with respect to the material contained herein or for anyerrors or omissions that may have been made.

    36

    37

    38

    39

    Printed on acid-free paper 40

    This Springer imprint is published by Springer Nature 41The registered company is Springer International Publishing AG 42The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

  • UNCO

    RREC

    TED

    PROO

    F

    Preface 1

    This volume collects the notes of the lectures delivered at the CIME Summer 2Course “Exploiting Hidden Structure in Matrix Computations. Algorithms and 3Applications,” held in Cetraro, Italy, from June 22 to June 26, 2015. 4

    The course focused on various types of hidden and approximate structure in 5matrices and on their key role in various emerging applications. Matrices with 6special structure arise frequently in scientific and engineering problems and have 7long been the object of study in numerical linear algebra, as well as in matrix and 8operator theory. For instance, banded matrices or Toeplitz matrices fall under this 9category. More recently, however, researchers have begun to investigate classes of 10matrices for which structural properties, while present, may not be immediately 11obvious. Important examples include matrices with low-rank off-diagonal block 12structure, such as hierarchical matrices, quasi-separable matrices, and so forth. 13Another example is provided by matrices in which the entries follow some type 14of decay pattern away from the main diagonal. The structural analysis of complex 15networks leads to matrices that appear at first sight to be completely unstructured; 16a closer look, however, may reveal a great deal of latent structure, for example, in 17the distribution of the nonzero entries and of the eigenvalues. Knowledge of these 18properties can be of great importance in the design of efficient numerical methods 19for such problems. 20

    In many cases, the matrix is only “approximately” structured; for instance, a 21matrix could be close in some norm to a matrix that has a desirable structure, such 22as a banded matrix or a semiseparable matrix. In such cases, it may be possible to 23develop solution algorithms that exploit the “near-structure” present in the matrix; 24for example, efficient preconditioners could be developed for the nearby structured 25problem and applied to the original problem. In other cases, the solution of the 26nearby problem, for which efficient algorithms exist, may be a sufficiently good 27approximation to the solution of the original problem, and the main difficulty could 28be detecting the “nearest” structured problem. 29

    Another very useful kind of structure is represented by various types of sym- 30metries. Symmetries in matrices and tensors are often linked to invariance under 31some group of transformations. Again, the structure, or near-structure, may not be 32

    v

  • UNCO

    RREC

    TED

    PROO

    F

    vi Preface

    immediately obvious in a given problem: uncovering the hidden symmetries (and 33underlying transformation group) in a problem can potentially lead to very efficient 34algorithms. 35

    The aim of this course was to present this increasingly important point of view 36to young researchers by exploiting the expertise of leading figures in this area, 37with different theoretical and application perspectives. The course was attended 38by 31 PhD students and young researchers, roughly half of them from Italy and 39the remaining ones from the USA, Great Britain, Germany, the Netherlands, Spain, 40Croatia, Czech Republic, and Finland. Many participants (about 50%) were at least 41partially supported by funding from CIME and the European Mathematical Society; 42other financial support was provided by the Università di Bologna. 43

    Two evenings (for a total of four hours) were devoted to short 15-min pre- 44sentations by 16 of the participants on their current research activities. These 45contributions were mostly of very high quality and very enjoyable, contributing to 46the success of the meeting. 47

    The notes collected in this volume are a faithful record of the material covered 48in the 28 h of lectures comprising the course. Charles Van Loan (Cornell University, 49USA) discussed structured matrix computations originating from tensor analysis, 50in particular tensor decompositions, with low-rank and Kronecker structure playing 51the main role. Dario Bini (Università di Pisa, Italy) covered matrices with Toeplitz 52and Toeplitz-like structure, as well as rank-structured matrices, with applications 53to Markov modeling and queueing theory. Daniel Kressner (EPFL, Switzerland) 54lectured on techniques for matrices with hierarchical low-rank structures, including 55applications to the numerical solution of elliptic PDEs; Jonas Ballani (EPFL) 56also contributed to the drafting of the corresponding lecture notes contained in 57this volume. Michele Benzi (Emory University) discussed matrices with decay, 58with particular emphasis on localization in matrix functions, with applications to 59quantum physics and network science. The lectures of Hans Munthe-Kaas (Univer- 60sity of Bergen, Norway) concerned the use of group-theoretic methods (including 61representation theory) in numerical linear algebra, with applications to the fast 62Fourier transform (“harmonic analysis on finite groups”) and to computational 63aspects of Lie groups and Lie algebras. 64

    We hope that these lecture notes will be of use to young researchers in numerical 65linear algebra and scientific computing and that they will stimulate further research 66in this area. 67

    We would like to express our sincere thanks to Elvira Mascolo (CIME Director) 68and especially to Paolo Salani (CIME Secretary), whose unfailing support proved 69crucial for the success of the course. We are also grateful to the European 70Mathematical Society for their endorsement and financial support, and to the 71Dipartimento di Matematica of Università di Bologna for additional support for 72the lecturers.AQ1 73

    Atlanta, GA, USA Michele Benzi 74Bologna, Italy Valeria Simoncini 75

  • UNCO

    RREC

    TED

    PROO

    F

    Acknowledgments 1

    CIME activity is carried out with the collaboration and financial support of INdAM 2(Istituto Nazionale di Alta Matematica) and Mathematics Department, University of 3Bologna. 4

    vii

  • UNCO

    RREC

    TED

    PROO

    F

    Contents 1

    Structured Matrix Problems from Tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2Charles F. Van Loan 3

    Matrix Structures in Queuing Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4Dario A. Bini 5

    Matrices with Hierarchical Low-Rank Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 6Jonas Ballani and Daniel Kressner 7

    Localization in Matrix Computations: Theory and Applications . . . . . . . . . . 211 8Michele Benzi 9

    Groups and Symmetries in Numerical Linear Algebra . . . . . . . . . . . . . . . . . . . . . . 319 10Hans Z. Munthe-Kaas 11

    ix

  • UNCO

    RREC

    TED

    PROO

    F

    AUTHOR QUERY

    AQ1. A slight mismatch has been noticed between source file (i.e., TeX) and PDFfile provided for preface and we have followed the source file (i.e., TeX).Please confirm if this is fine.

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 1

    Charles F. Van Loan 2

    Abstract This chapter looks at the structured matrix computations that arise in 3the context of various “svd-like” tensor decompositions. Kronecker products and 4low-rank manipulations are central to the theme. Algorithmic details include the 5exploitation of partial symmetries, componentwise optimization, and how we might 6beat the “curse of dimensionality.” Order-4 tensors figure heavily in the discussion.AQ2 7

    1 Introduction 8

    A tensor is a multi-dimensional array. Instead of just A.i; j/ as for matrices we have 9A.i; j; k; `; : : :/. High-dimensional modeling, cheap storage, and sensor technology 10combine to explain why tensor computations are surging in importance. Here is an 11annotated timeline that helps to put things in perspective: 12

    Scalar-Level Thinking

    19600s +

    Matrix-Level Thinking

    19800s +

    Block Matrix-Level Thinking

    20000s +

    Tensor-Level Thinking

    13

    C.F. Van Loan (�)Department of Computer Science, Cornell University, Ithaca, NY, USAAQ1e-mail: [email protected]

    © Springer International Publishing AG 2016M. Benzi, V. Simoncini (eds.), Exploiting Hidden Structure in Matrix Computations:Algorithms and Applications, Lecture Notes in Mathematics 2173,DOI 10.1007/978-3-319-49887-4_1

    1

    mailto:[email protected]

  • UNCO

    RREC

    TED

    PROO

    F

    2 C.F. Van Loan

    The factorization paradigm: LU, LDLT , QR, U˙VT ,etc.

    Memory traffic awareness:, cache, parallel computing,LAPACK, etc.

    Matrix-tensor connections: unfoldings, Kroneckerproduct, multilinear optimization, etc.

    14

    An important subtext is the changing definition of what we mean by a “big problem.” 15In matrix computations, to say that A 2 Rn1�n2is “big” is to say that both n1 and n2 16are big. In tensor computations, to say that A 2 Rn1�����nd is “big” is to say that 17n1n2 � � � nd is big and this does NOT necessarily require big nk. For example, no 18computer in the world (nowadays at least!) can store a tensor with modal dimensions 19n1 D n2 D � � � D n1000 D 2: What this means is that a significant part of the tensor 20research community is preoccupied with the development of algorithms that scale 21with d. Algorithmic innovations must deal with the “curse of dimensionality.” How 22the transition from matrix-based scientific computation to tensor-based scientific 23computation plays out is all about the fate of the “the curse.” 24

    This chapter is designed to give readers who are somewhat familiar with 25matrix computations an idea about the underlying challenges associated with tensor 26computations. These include mechanisms by which tensor computations are turned 27into matrix computations and how various matrix algorithms and decompositions 28(especially the SVD) turn up all along the way. An important theme throughout is 29the exploitation of Kronecker product structure. 30

    To set the tone we use Sect. 2 to present an overview of some remarkable “hidden 31structures” that show up in matrix computations. Each of the chosen examples 32has a review component and a message about tensor-based matrix computations. 33The connection between block matrices and order-4 tensors is used in Sect. 3 to 34introduce the idea of a tensor unfolding and to connect Kronecker products and 35tensor products. A simple nearest rank-1 tensor problem is used in Sect. 4 to 36showcase the idea of componentwise optimization, a strategy that is widely used 37in tensor computations. In Sect. 5 we show how Rayleigh quotients can be used to 38extend the notion of singular values and vectors to tensors. Transposition and tensor 39symmetry are discussed in Sect. 6. Extending the singular value decomposition to 40tensors can be done in a number of ways. We present the Tucker decomposition in 41Sect. 7, the CP decomposition in Sect. 8, the Kronecker product SVD in Sect. 9, and 42the tensor train SVD in Sect. 10. Cholesky with column pivoting also has a role to 43play in tensor computations as we show in Sect. 11. 44

    We want to stress that this chapter is a high-level, informal look at the kind of 45matrix problems that arise out of tensor computations. Implementation details and 46rigorous analysis are left to the references. To get started with the literature and for 47general background we recommend [7, 11, 16, 17, 19, 29]. 48

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 3

    2 The Exploitation of Structure in Matrix Computations 49

    We survey five interesting matrix examples that showcase the idea of hidden 50structure. By “‘hidden” we mean “not obvious”. In each case the exploitation of 51the hidden structure has important ramifications from the computational point of 52view. 53

    2.1 Exploiting Data Sparsity 54

    The n-by-n discrete Fourier transform matrix Fn is defined by 55

    ŒFn�kq D !kqn !n D cos�2�

    n

    �� i sin

    �2�

    n

    �56

    where we are subscripting from zero. Thus, 57

    F4 D

    2666664

    1 1 1 1

    1 !4 !24 !

    34

    1 !24 !44 !

    64

    1 !34 !64 !

    94

    3777775D

    2666664

    1 1 1 1

    1 �i �1 i1 �1 1 �11 i �1 �i

    3777775: 58

    If we carefully reorder the columns of F2m, then copies of Fm magically appear, e.g., 59

    F4

    2666664

    1 0 0 0

    0 0 1 0

    0 1 0 0

    0 0 0 1

    3777775D

    2666664

    1 1 1 1

    1 �1 �i i1 1 �1 �11 �1 i �i

    3777775D"

    F2 ˝2F2

    F2 �˝2F2

    #60

    where 61

    ˝2 D�1 0

    0 �i�: 62

    In general we have 63

    F2m˘2;m D"

    Fm ˝mFm

    Fm �˝mFm

    #(1)

  • UNCO

    RREC

    TED

    PROO

    F

    4 C.F. Van Loan

    where ˘2;m is a perfect shuffle permutation (to be described in Sect. 3.7) and ˝m is 64the diagonal matrix 65

    ˝m D diag.1; !n; : : : ; !m�1n /: 66

    The DFT matrix is dense, but by exploiting the recursion (1) it can be factored into 67a product of sparse matrices, e.g., 68

    F1024 D A10 � � �A2A1PT : 69

    Here, P is the bit reversal permutation and each Ak has just two nonzeros per row. 70It is this structured factorization that makes it possible to have a fast, O.n log n/ 71Fourier transform, e.g., 72

    73y D PTx74for k D 1 W 1075y D Aky76end

    See [31] for a detailed “matrix factorization” treatment of the FFT. 77The DFT matrix Fn is data sparse meaning that it can be represented with many 78

    fewer than n2 numbers. Other examples of data sparsity include matrices that have 79low-rank and matrices that are Kronecker products. Many tensor problems lead to 80matrix problems that are data sparse. 81

    2.2 Exploiting Structured Eigensystems 82

    Suppose A;F;G 2 Rn�nand that both F and G are symmetric. The matrix M defined 83by 84

    M D"

    A F

    G �AT#

    F D FT ; G D GT 85

    is said to be a Hamiltonian matrix. The eigenvalues of a Hamiltonian matrix come 86in plus-minus pairs and the eigenvectors associated with such a pair are related: 87

    M

    �yz

    �D �

    �yz

    �) MT

    �z�y�D ��

    �z�y�: 88

    Hamiltonian structure can also be defined through a permutation similarity. If 89

    J2n D"0 In

    �In 0

    #90

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 5

    then M 2 R2n�2nis Hamiltonian if JT2nMJ2n D �MT . Under mild assumptions we 91can compute a structured Schur decomposition for a Hamiltonian matrix M: 92

    QTMQ D"

    Q1 Q2

    �Q2 Q1

    #TM

    "Q1 Q2

    �Q2 Q1

    #D"

    T11 T12

    0 �TT11

    #: 93

    Here, Q is orthogonal and symplectic (JT2nQJ2n D Q�T ) and T11 is upper quasi- 94triangular. Various Ricatti equation problems can be solved by exploiting this 95structured decomposition. See [23]. 96

    Tensors with multiple symmetries can be reshaped into matrices with multiple 97symmetries and these matrices have structured block factorizations. 98

    2.3 Exploiting the Right Representation 99

    Here is an example of a Cauchy-like matrix: 100

    A D

    26666666664

    r1s1!1 � �1

    r1s2!1 � �2

    r1s3!1 � �3

    r1s4!1 � �4

    r2s1!2 � �1

    r2s2!2 � �2

    r2s3!2 � �3

    r2s4!2 � �4

    r3s1!3 � �1

    r3s2!3 � �2

    r3s3!3 � �3

    r3s4!3 � �4

    r4s1!4 � �1

    r4s2!4 � �2

    r4s3!4 � �3

    r4s4!4 � �4

    37777777775: 101

    For this to be defined we must have f�1; : : : ; �ng [ f�1; : : : ; �ng D ;. Cauchy-like 102matrices are data sparse and a particularly clever characterization of this fact is to 103note that if ˝ D diag.!i/ and� D diag.�i/, then 104

    ˝A� A� D rsT (2)

    where r; s 2 Rn. If˝A�A� has rank r, then we say that A has displacement rank r 105(with respect to ˝ and �.) Thus, a Cauchy-like matrix has unit displacement rank. 106

    Now let us consider the first step of Gaussian elimination. Ignoring pivoting this 107involves computing a row of U, a column of L, and a rank-1 update: 108

    A D

    266664

    1 0 0 0

    `21 1 0 0

    `31 0 1 0

    `41 0 0 1

    377775

    266664

    1 0 0 0

    0 b22 b23 b24

    0 b32 b33 b34

    0 b42 b43 b44

    377775

    266664

    u11 u12 u13 u14

    0 1 0 0

    0 0 1 0

    0 0 0 1

    377775 : 109

  • UNCO

    RREC

    TED

    PROO

    F

    6 C.F. Van Loan

    It is easy to compute the required entries of L and U from the displacement rank 110representation (2). What about B? If we represent B conventionally as an array 111then that would involve O.n2/ flops. Instead we exploit the fact that B also has 112unit displacement rank: 113

    Q̋ B � B Q� D QrQsT : 114

    It turns out that we can transition from A’s representation f˝;�; r; sg to B’s 115representation f Q̋ ; Q�; Qr; Qsg with O.n/ work and this enables us to compute the LU 116factorization of a Cauchy-like matrix with just O.n2/ work. See [11, p. 682]. 117

    Being able to work with clever representations is often the key to having a 118successful solution framework for a tensor problem. 119

    2.4 Exploiting Orthogonality Structures 120

    Assume that the columns of the 2-by-1 block matrix 121

    Q D"

    Q1

    Q2

    #122

    are orthonormal, i.e., QT1Q1 C QT2Q2 D I. The CS decomposition says that Q1 and 123Q2 have related SVDs: 124

    "U1 0

    0 U2

    #T "Q1

    Q2

    #V D

    "diag.ci/

    diag.si/

    #c2i C s2i D 1 (3)

    where U1, U2, and V are orthogonal. This truly remarkable hidden structure can be 125used to compute stably the generalized singular value decomposition (GSVD) of a 126A1 2 Rm1�nand A2 2 Rm2�n. In particular, suppose we compute the QR factorization 127

    "A1

    A2

    #D"

    Q1

    Q2

    #R; 128

    and then the CS decomposition (3). By setting X D RTV , we obtain the GSVD 129

    A1 D U1 � diag.ci/ � XT A2 D U2 � diag.si/ � XT : 130

    For a more detailed discussion about the GSVD and the CS decomposition, see [11, 131p. 309]. 132

    A tensor decomposition can often be regarded as simultaneous decomposition of 133its (many) matrix “layers”. 134

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 7

    2.5 Exploiting a Structured Data layout 135

    Suppose we have a block matrix 136

    A D

    266664

    A11 A12 � � � A1NA21 A22 � � � A2N:::

    :::: : :

    :::

    AM1 AM2 � � � AMN

    377775 137

    that is stored in such a way that the data in each Aij is contiguous in memory. 138Note that there is nothing structured about “A-the-matrix”. However, “A-the-stored- 139array” does have an exploitable structure that we now illustrate by considering the 140computation of C D AT . We start by transposing each of A’s blocks: 141

    266664

    B11 B12 � � � B1NB21 B22 � � � B2N:::

    :::: : :

    :::

    BM1 BM2 � � � BMN

    377775

    266664

    AT11 AT12 � � � AT1N

    AT21 AT22 � � � AT2N

    ::::::: : :

    :::

    ATM1 ATM2 � � � ATMN

    377775 : 142

    These block transpositions involve “local data” and this is important because 143moving data around in a large scale matrix computation is typically the dominant 144cost. Next, we transpose B as a block matrix: 145

    266664

    C11 C12 � � � C1MC21 C22 � � � C2M:::

    :::: : :

    :::

    CN1 C2N � � � CNM

    377775

    266664

    B11 B21 � � � BM1B12 B22 � � � BM2:::

    :::: : :

    :::

    B1N B2N � � � BMN

    377775 146

    Again, this is a “memory traffic friendly” maneuver because blocks of contiguous 147data are being moved. It is easy to verify that Cij D ATji . 148

    What we have sketched is a “2-pass” transposition procedure. By blocking in 149a way that resonates with cache/local memory size and by breaking the overall 150transposition process down into a sequence of carefully designed passes, one can 151effectively manage the underlying dataflow. See [11, p. 711]. 152

    Transposition and looping are much more complicated with tensors because 153there are typically an exponential number of possible data structures and an ex- 154ponential number of possible loop nestings. Software tools that facilitate reasoning 155in this space are essential. See [2, 25, 30]. 156

  • UNCO

    RREC

    TED

    PROO

    F

    8 C.F. Van Loan

    3 Matrix-Tensor Connections 157

    Tensor computations typically get “reshaped” into matrix computations. To operate 158in this venue we need terminology and mechanisms for matricizing the tensor data. 159In this section we introduce the notion of a tensor unfolding and we get comfortable 160with Kronecker products and their properties. For more details see [11, 17, 19, 26, 16129]. 162

    3.1 Talking About Tensors 163

    An order-d tensor A 2 Rn1�����ndis a real d-dimensional array 164

    A.1 W n1; : : : ; 1 W nd/ 165

    where the index range in the k-th mode is from 1 to nk. Note that 166

    a

    8<:

    scalarvectormatrix

    9=; is an

    8<:

    order-0order-1order-2

    9=; tensor: 167

    We use calligraphic font to designate tensors e.g., A, B, C, etc. Sometimes we will 168write A for matrix A if it makes things clear. 169

    One way that tensors arise is through discretization. A.i; j; k; `/ might house the 170value of f .w; x; y; z/ at .w; x; y; z/ D .wi; xj; yk; z`/. In multiway analysis the value 171of A.i; j; k; `/ could measure the interaction between four variables/factors. See [3] 172and [29]. 173

    3.2 Tensor Parts: Fibers and Slices 174

    A fiber of a tensor A is a column vector obtained by fixing all but one A’s indices. 175For example, if A D A.1 W 3; 1 W 5; 1 W 4; 1 W 7/ 2 R3�5�4�7, then 176

    A.2; W; 4; 6/ D A.2; 1 W 5; 4; 6/ D

    266666664

    A.2; 1; 4; 6/A.2; 2; 4; 6/A.2; 3; 4; 6/A.2; 4; 4; 6/A.2; 5; 4; 6/

    377777775

    177

    is a mode-2 fiber. 178

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 9

    A slice of a tensor A is a matrix obtained by fixing all but two of A’s indices. For 179example, if A D A.1 W 3; 1 W 5; 1 W 4; 1 W 7/, then 180

    A.W; 3; W; 6/ D

    264A.1; 3; 1; 6/ A.1; 3; 2; 6/ A.1; 3; 3; 6/ A.1; 3; 4; 6/A.2; 3; 1; 6/ A.2; 3; 2; 6/ A.2; 3; 3; 6/ A.2; 3; 4; 6/A.3; 3; 1; 6/ A.3; 3; 2; 6/ A.3; 3; 3; 6/ A.3; 3; 4; 6/

    375 181

    is a slice. 182

    3.3 Order-4 Tensors and Block Matrices 183

    Block matrices with uniformly-sized blocks are reshaped order-4 tensors. For 184example, if 185

    C D

    266666666664

    c11 c12 c13 c14 c15 c16

    c21 c22 c23 c24 c25 c26

    c31 c32 c33 c34 c35 c36

    c41 c42 c43 c44 c45 c46

    c51 c52 c53 c54 c55 c56

    c61 c62 c63 c64 c65 c66

    377777777775

    186

    then matrix entry c45 is entry (2,1) of block (2,3). Thus, we can think of ŒCij�k` as 187the .i; j; k; `/ entry of a tensor C, e.g., 188

    c45 D ŒC23�21 D C.2; 3; 2; 1/: 189Working in the other direction we can unfold an order-4 tensor into a block 190

    matrix. Suppose A 2 Rn�n�n�n. Here is its “Œ1; 2� � Œ3; 4�” unfolding: 191

    AŒ1;2��Œ3;4� D

    266666666666666666664

    a1111 a1112 a1113 a1121 a1122 a1123 a1131 a1132 a1133

    a1211 a1212 a1213 a1221 a1222 a1223 a1231 a1232 a1233

    a1311 a1312 a1313 a1321 a1322 a1323 a1331 a1332 a1333

    a2111 a2112 a2113 a2121 a2122 a2123 a2131 a2132 a2133

    a2211 a2212 a2213 a2221 a2222 a2223 a2231 a2232 a2233

    a2311 a2312 a2313 a2321 a2322 a2323 a2331 a2332 a2333

    a3111 a3112 a3113 a3121 a3122 a3123 a3131 a3132 a3133

    a3211 a3212 a3213 a3221 a3222 a3223 a3231 a3232 a3233

    a3311 a3312 a3313 a3321 a3322 a3323 a3331 a3332 a3333

    377777777777777777775

    : 192

  • UNCO

    RREC

    TED

    PROO

    F

    10 C.F. Van Loan

    If A D AŒ1;2��Œ3;4� then the tensor-to-matrix mapping is given by 193

    A.i1; i2; i3; i4/ ! A.i1 C .i2 � 1/n; i3 C .i4 � 1/n/: 194

    An alternate unfolding results if modes 1 and 3 are associated with rows and modes 1952 and 4 are associated with columns: 196

    AŒ1;3��Œ2;4� D

    266666666666666666664

    a1111 a1112 a1113 a1211 a1212 a1213 a1311 a1312 a1313

    a1121 a1122 a1123 a1221 a1222 a1223 a1321 a1322 a1323

    a1131 a1132 a1133 a1231 a1232 a1233 a1331 a1332 a1333

    a2111 a2112 a2113 a2211 a2212 a2213 a2311 a2312 a2313

    a2121 a2122 a2123 a2221 a2222 a2223 a2321 a2322 a2323

    a2131 a2132 a2133 a2231 a2232 a2233 a2331 a2332 a2333

    a3111 a3112 a3113 a3211 a3212 a3213 a3311 a3312 a3313

    a3121 a3122 a3123 a3221 a3222 a3223 a3321 a3322 a3323

    a3131 a3132 a3133 a3231 a3232 a3233 a3331 a3332 a3333

    377777777777777777775

    : 197

    If A D AŒ1;3��Œ2;4� , then the tensor-to-matrix mapping is given by 198

    A.i1; i2; i3; i4/ ! A.i1 C .i3 � 1/n; i2 C .i4 � 1/n/: 199

    If a tensor A is structured, then different unfoldings reveal that structure in different 200ways [33]. The idea of block unfoldings is discussed in [26]. 201

    3.4 Modal Unfoldings 202

    A particularly important class of tensor unfoldings are the modal unfoldings. An 203order-d tensor has d modal unfoldings which we designate by A.1/; : : : ;A.d/. Let’s 204look at the tensor-to-matrix mappings for the case A 2 Rn1�n2�n3�n4 : 205

    A.i1; i2; i3; i4/! A.1/.i1; i2 C .i3 � 1/n2 C .i4 � 1/n2n3/

    A.i1; i2; i3; i4/! A.2/.i2; i1 C .i3 � 1/n1 C .i4 � 1/n1n3/

    A.i1; i2; i3; i4/! A.3/.i3; i1 C .i2 � 1/n1 C .i4 � 1/n1n2/

    A.i1; i2; i3; i4/! A.4/.i4; i1 C .i2 � 1/n1 C .i3 � 1/n1n2/:

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 11

    Note that if N D n1 � � � nd, then A.k/ is nk-by-N=nk and its columns are mode-k 206fibers. Thus, if n2 D 4 and n1 D n3 D n4 D 2, then 207

    A.2/ D

    266664

    a1111 a1121 a1112 a1122 a2111 a2121 a2112 a2122

    a1211 a1221 a1212 a1222 a2211 a2221 a2212 a2222

    a1311 a1321 a1312 a1322 a2311 a2321 a2312 a2322

    a1411 a1421 a1412 a1422 a2411 a2421 a2412 a2422

    377775 : 208

    Modal unfoldings arise naturally in many multilinear optimization settings. 209

    3.5 The vec Operation 210

    The vec operator turns tensors into column vectors. The vec of a matrix is obtained 211by stacking its columns: 212

    A 2 R3�2 ) vec.A/ D

    266666664

    a11a21a31a12a22a32

    377777775: 213

    The vec of an order-3 tensor A 2 Rn1�n2�n3 stacks the vecs of the slices 214A.W; W; 1/; : : : ;A.W; W; n3/, e.g., 215

    A 2 R2�2�2 ) vec.A/ D"

    vec.A.W; W; 1//vec.A.W; W; 2//

    #D

    2666666666664

    a111a211a121a221a112a212a122a222

    3777777777775

    : 216

    In general, if A 2 Rn1�����nd and the order d � 1 tensor Ak is defined by Ak D A.W 217; : : : ; W; k/, then we have the following recursive definition: 218

    vec.A/ D

    264

    vec.A1/:::

    vec.And /

    375 : 219

    Thus, vec unfolds a tensor into a column vector. 220

  • UNCO

    RREC

    TED

    PROO

    F

    12 C.F. Van Loan

    3.6 The Kronecker Product 221

    Unfoldings enable us to reshape tensor computations as matrix computations and 222Kronecker products are very often part of the scene. The Kronecker product A D 223B ˝ C of two matrices B and C is a block matrix .Aij/ where Aij D bijC, e.g., 224

    A D

    264

    b11 b12 b13

    b21 b22 b23

    b31 b32 b33

    375 ˝

    "c11 c12

    c21 c22

    #D

    266664

    b11C b12C b13C

    b21C b22C b23C

    b31C b32C b33C

    377775 : 225

    Of course, B ˝ C is also a matrix of scalars: 226

    A D

    2666666666666664

    b11c11 b11c12 b12c11 b12c12 b13c11 b13c12

    b11c21 b11c22 b12c21 b12c22 b13c21 b13c22

    b21c11 b21c12 b22c11 b22c12 b23c11 b23c12

    b21c21 b21c22 b22c21 b22c22 b23c21 b23c22

    b31c11 b31c12 b32c11 b32c12 b33c11 b33c12

    b31c21 b31c22 b32c21 b32c22 b33c21 b33c22

    3777777777777775

    : 227

    Note that every possible product bijck` “shows up” in B ˝ C. 228In general, if A1 2 Rm1�n1and A2 2 Rm2�n2, then A D A1˝A2 is an m1m2-by-n1n2 229

    matrix of scalars. It is also an m1-by-n1 block matrix with blocks that are m2-by-n2. 230Kronecker products can be applied in succession. Thus, if 231

    A D

    2664

    b11 b12

    b21 b22

    b31 b32

    3775 ˝

    2666664

    c11 c12 c13 c14

    c21 c22 c23 c24

    c31 c32 c33 c34

    c41 c42 c43 c44

    3777775˝

    2664

    d11 d12 d13 d14

    d21 d22 d23 d24

    d31 d32 d33 d34

    3775 ; 232

    then A is a 3-by-2 block matrix whose entries are 4-by-4 block matrices whose 233entries are 3-by-4 matrices. 234

    It is important to have a facility with the Kronecker product operation because 235they figure heavily in tensor computations. Here are three critical properties: 236

    .A ˝ B/.C ˝ D/ D AC ˝ BD.A ˝ B/�1 D A�1 ˝ B�1

    .A ˝ B/T D AT ˝ BT :

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 13

    Of course, in these expressions the various matrix products and inversions have to 237be defined. 238

    If the matrices A and B have structure, then their Kronecker product is typically 239structured in the same way. For example, if Q1 2 Rm1�n1and Q2 2 Rm2�n2 have 240orthonormal columns, then Q1 ˝ Q2 has orthonormal columns: 241

    .Q1 ˝ Q2/T.Q1 ˝ Q2/ D .QT1 ˝ QT2 /.Q1 ˝ Q2/D .QT1Q1/ ˝ .QT2Q2/ D In1 ˝ In2 D In1n2 :

    Kronecker products often arise through the “vectorization” of a matrix equation, 242e.g., 243

    C D BXAT , vec.C/ D .A ˝ B/ vec.X/: 244

    3.7 Perfect Shuffles, Kronecker Products, and Transposition 245

    In general, A1 ˝ A2 ¤ A2 ˝ A1. However, very structured permutation matrices 246P1 and P2 exist so that P1.A1 ˝ A2/P2 D A2 ˝ A1. Define the .p; q/-perfect shuffle 247matrix˘p;q 2 Rpq�pq by 248

    ˘p;q D�

    Ipq.W; 1 W q W pq/ j Ipq.W; 2 W q W pq/ j � � � j Ipq.W; q W q W pq/�: 249

    Here is an example: 250

    ˘3;2 D

    266666664

    1 0 0 0 0 0

    0 0 0 1 0 0

    0 1 0 0 0 0

    0 0 0 0 1 0

    0 0 1 0 0 0

    0 0 0 0 0 1

    377777775: 251

    In general A1 ˝ A2 ¤ A2 ˝ A1. However, if A1 2 Rm1�n1 and A2 2 Rm2�n2 then 252

    ˘m1;m2 .A1 ˝ A2/˘Tn1;n2 D A2 ˝ A1: 253

  • UNCO

    RREC

    TED

    PROO

    F

    14 C.F. Van Loan

    The perfect shuffle is also “behind the scenes” when the transpose of a matrix is 254taken, e.g., 255

    ˘3;2 vec.A/ D

    266666664

    1 0 0 0 0 0

    0 0 0 1 0 0

    0 1 0 0 0 0

    0 0 0 0 1 0

    0 0 1 0 0 0

    0 0 0 0 0 1

    377777775

    266666664

    a11a21a31a12a22a32

    377777775D

    266666664

    a11a12a21a22a31a32

    377777775D vec.AT/: 256

    In general, if A 2 Rm�n then ˘m;nvec.A/ D vec.AT/. See [13]. We return to these 257interconnections when we discuss tensor transposition in Sect. 6. 258

    3.8 Tensor Notation 259

    It is often perfectly adequate to illustrate a tensor computation idea using order-3 260examples. For example, suppose A 2 Rn1�n2�n3 , X1 2 Rm1�n1, X2 2 Rm2�n2, and 261X3 2 Rm3�n3 are given and that we wish to compute 262

    B.i1; i2; i3/ Dn1X

    j1D1

    n2Xj2D1

    n3Xj3D1

    A. j1; j2; j3/X1.i1; j1/X2.i2; j2/X1.i3; j3/ 263

    where 1 � i1 � m1, 1 � i2 � m2, and 1 � i3 � m3. Here we are using matrix-like 264subscript notation to spell out the definition of B. We could probably use the same 265notation to describe the order-4 version of this computation. However, for higher- 266order cases we have to resort to the dot-dot-dot notation and it gets pretty unwieldy: 267

    B.i1; : : : ; id/ Dn1X

    j1D1

    n2Xj2D1� � �

    ndXjdD1

    A. j1; : : : ; jd/X1.i1; j1/ � � �Xd.id; jd/: 268

    2691 � i1 � m1; 1 � i2 � m2; : : : ; 1 � id � md 270

    One way to streamline the presentation of such a calculation is to “vectorize” the 271notation using bold font to indicate vectors of subscripts. Multiple summations can 272also be combined through vectorization. Thus, if 273

    i D Œi1; : : : ; id�; j D Œ j1; : : : ; jd�; m D Œm1; : : : ;md�; n D Œn1; : : : ; nd�; 274

    then the B tensor given above can be expressed as follows: 275

    B.i/ DnX

    jD1A.j/X1.i1; j1/ � � �Xd.id; jd/; 1 � i � m: 276

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 15

    Here, 1 D Œ1; 1; : : : ; 1�. As another example of this notation, if n D Œn1; : : : ; nd � 277and A 2 Rn, then 278

    kA kF Dvuut nX

    iD1A.i/2 279

    is its Frobenius norm. We shall make use of this vectorized notation whenever it is 280necessary to hide detail and/or when we are working with tensors of arbitrary order. 281

    Finally, it is handy to have a MATLAB “reshape” notation. Suppose n D 282Œ n1; : : : ; nd � and m D Œ m1; : : : ; me �. If A 2 Rn and n1 � � � nd D m1 � � �me, 283then 284

    B D reshape.A;m/ 285

    is the m1 � � � � � me tensor defined by vec.A/ D vec.B/: 286

    3.9 The Tensor Product 287

    On occasion it is handy to talk about operations between tensors without recasting 288the discussion in the language of matrices. Suppose n D Œn1; : : : ; nd� and that B; C 2 289Rn. We can multiply a tensor by a scalar, 290

    A D ˛B , A.i/ D ˛B.i/; 1 � i � n 291

    and we can add one tensor to another, 292

    A D BC C , A.i/ D B.i/C C.i/; 1 � i � n: 293

    Slightly more complicated is the tensor product which is a way of multiplying 294two tensors together to obtain a new, higher order tensor. For example if B 2 295Rm1�m2�m3 D Rm and C 2 Rn1�n2 D Rn, then the tensor product A D B ı C is 296defined by 297

    A.i1; i2; i3; j1; j2/ D B.i1; i2; i3/C. j1; j2/ 298

    i.e., A.i; j/ D B.i/ C.j/ for all 1 � i � m and 1 � j � n. 299If B 2 Rm and C 2 Rn, then there is a connection between the tensor product 300

    A D B ı C and its m-by-n unfolding: 301

    Am�n D vec.B/vec.C/T : 302

  • UNCO

    RREC

    TED

    PROO

    F

    16 C.F. Van Loan

    There is also a connection between the tensor product of two vectors and their 303Kronecker product: 304

    vec.x ı y/ D vec.xyT/ D y ˝ x: 305

    Likewise, if B and C are order-2 tensors and A D B ı C, then 306

    AŒ1;3��Œ2;4� D B ˝ C: 307

    The point of all this notation-heavy discussion is to stress the importance of 308flexibility and point of view. Whether we write A.i/ or A.i1; i2; i3/ or ai1i2i3 depends 309on the context, what we are trying to communicate, and what typesets nicely!. 310Sometimes it will be handy to regard A as a vector such as vec.A/ and sometimes 311as a matrix such as A.3/. Algorithmic insights in tensor computations frequently 312require an ability to “reshape” how the problem at hand is viewed. 313

    4 A Rank-1 Tensor Problem 314

    Rank-1 matrices have a prominent role to play in matrix computations. For example, 315one step of Gaussian elimination involves a rank-1 update of a submatrix. The 316SVD decomposes a matrix into a sum of very special rank-1 matrices. Quasi- 317Newton methods for nonlinear systems involve rank-1 modifications of the current 318approximate Jacobian matrix. 319

    In this section we introduce the concept of a rank-1 tensor and consider how we 320might approximate a given tensor with such an entity. This leads to a discussion 321(through an example) of multilinear optimization. 322

    4.1 Rank-1 Matrices 323

    If u and v are vectors, then A D uvT is a rank-1 matrix, e.g., 324

    A D24 u1u2

    u3

    35�v1

    v2

    �TD24 u1v1 u1v2u2v1 u2v2

    u3v1 u3v2

    35 : 325

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 17

    Note that if A D uvT , then vec.A/ D v ˝ u and so we have 326266666664

    a11a21a31a12a22a32

    377777775D

    266666664

    u1v1u2v1u3v1u1v2u2v2u3v2

    377777775D�v1

    v2

    �˝24 u1u2

    u3

    35 : 327

    4.2 Rank-1 Tensors 328

    How can we extend the rank-1 idea from matrices to tensors? In matrix computa- 329tions we think of rank-1 matrices as outer products, i.e., A D uvT where u and v are 330vectors. Thinking of matrix A as tensor A, we see that it is just the tensor product of 331u and v: A.i1; i2/ D u.i1/v.i2/. Thus, we have 332

    A D u ı v D24u1u2

    u3

    35 ı

    �v1v2

    �, vec.A/ D

    266666664

    u1v1u2v1u3v1u1v2u2v2u3v2

    377777775D v ˝ u: 333

    Here is an order-3 example of the same idea: 334

    A D uıvıw D24 u1u2

    u3

    35ı�v1v2

    �ı�

    w1w2

    �, vec.A/ D

    26666666666666666664

    u1v1w1u2v1w1u3v1w1u1v2w1u2v2w1u3v2w1u1v1w2u2v1w2u1v2w2u2v2w2u3v2w2

    37777777777777777775

    D w˝v˝u: 335

    Each entry in A is a product of entries from u, v, and w: A.p; q; r/ D upvqwr. 336In general, a rank-1 tensor is a tensor product of vectors. To be specific, if x.i/ 2 337

    Rni for i D 1; : : : ; d, then 338

    A D x.1/ ı � � � ı x.d/ 339

  • UNCO

    RREC

    TED

    PROO

    F

    18 C.F. Van Loan

    is a rank-1 tensor whose entries are defined by A.i/ D x.1/i1 � � � x.d/id . In terms of the 340Kronecker product we have vec.x.1/ ı � � � ı x.d// D x.d/ ˝ � � � ˝ x.1/. 341

    4.3 The Nearest Rank-1 Problem for Matrices 342

    Given a matrix A 2 Rm�n, consider the minimization of 343

    �.�; u; v/ D k A � �uvT kF 344

    where u 2 Rm and v 2 Rn have unit 2-norm and � is a nonnegative scalar. This is an 345SVD problem for if UTAV D ˙ D diag.�i/ is the SVD of A and �1 � � � � � �n � 0, 346then �.�; u; v/ is minimized by setting � D �1, u D U.W; 1/, and v D V.W; 1/. 347

    Instead of explicitly computing the entire SVD, we can compute �1 and its 348singular vectors using an alternating least squares approach. The starting point is to 349realize that 350

    k A � �uvT k2F D tr.ATA/� 2�uTAv C �2: 351

    where tr.M/ indicates the trace of a matrix M, i.e., the sum of its diagonal entries. 352Note that 353

    �u.y/ D tr.ATA/ � 2yTAv C k y k22 354

    is minimized by setting y D Av and that 355

    �v.x/ D tr.ATA/� 2uTAxC k x k22 356

    is minimized by setting x D ATu. This suggests the following iterative framework 357for minimizing k A � �uvT kF: 358

    Nearest Rank-1 Matrix

    359Given: A 2 Rm�n, v 2 Rn, k v k2 D 1360Repeat:

    361Fix v and choose � and u to minimize k A � �uvT kF W362y D AvI � D k y kI u D y=�363Fix u and choose � and v to minimize k A � �uvT kF W364x D ATuI � D k x kI v D x=�365�opt D � I uopt D uI vopt D v

    366

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 19

    This is basically just the power method applied to the matrix 367

    sym.A/ D"0 A

    AT 0

    #: 368

    The reason for bringing up this alternating least squares framework is that it readily 369extends to tensors. 370

    4.4 A Nearest Rank-1 Tensor Problem 371

    Given A 2 Rm�n�p, we wish to determine unit vectors u 2 Rm, v 2 Rn, and w 2 Rp 372and a scalar � so that the following is minimized: 373

    kA � � � u ı v ı w kF Dvuut mX

    iD1

    nXjD1

    pXkD1.aijk � uivjwk/2: 374

    Noting that 375

    kA � � � u ı v ı w kF D k vec.A/ � � � w ˝ v ˝ u k2 376

    we obtain the following alternating least squares framework: 377

    Nearest Rank-1 Tensor

    378Given: A 2 Rm�n�p and unit vectors v 2 Rn and w 2 Rp.379Repeat:

    380Determine x 2 Rm that minimizes k vec.A/ � w ˝ v ˝ x k2381and set � D k x k and u D x=�:382Determine y 2 Rn that minimizes k vec.A/ � w ˝ y ˝ u k2383and set � D k y k and v D y=�:384Determine z 2 Rp that minimizes k vec.A/ � z ˝ v ˝ u k2385and set � D k z k and w D z=�:386�opt D �; uopt D u; vopt D v; wopt D w

    387

  • UNCO

    RREC

    TED

    PROO

    F

    20 C.F. Van Loan

    It is instructive to examine some of the details associated with this iteration for the 388case m D n D p D 2. The objective function has the form 389

    �.�; 1; 2; 3/ D

    �������������������

    266666666666664

    a111a211a121a221a112a212a122a222

    377777777777775

    � � � w ˝ v ˝ u

    �������������������

    D

    �������������������

    266666666666664

    a111a211a121a221a112a212a122a222

    377777777777775

    � � �

    266666666666664

    c3c2c1c3c2s1c3s2c1c3s2s1s3c2c1s3c2s1s3s2c1s3s2s1

    377777777777775

    �������������������

    390

    where 391

    u D�

    cos.1/sin.1/

    �D�

    c1s1

    �; v D

    �cos.2/sin.2/

    �D�

    c2s2

    �; w D

    �cos.3/sin.3/

    �D�

    c3s3

    �: 392

    Let us look at the three structured linear least squares problems that arise during 393each iteration. 394

    (1) To improve 1 and � , we fix 2 and 3 and minimize 395�����������������

    2666666666664

    a111a211a121a221a112a212a122a222

    3777777777775

    � � �

    2666666666664

    c3c2c1c3c2s1c3s2c1c3s2s1s3c2c1s3c2s1s3s2c1s3s2s1

    3777777777775

    �����������������

    D

    �����������������

    2666666666664

    a111a211a121a221a112a212a122a222

    3777777777775

    2666666666664

    c3c2 00 c3c2

    c3s2 00 c3s2

    s3c2 00 s3c2

    s3s2 00 s3s2

    3777777777775

    �x1y1

    ������������������

    396

    with respect to x1 and y1. We then set � Dq

    x21 C y21 and u D Œx1 y1�T=�: 397(2) To improve 2 and � , we fix 1 and 3 and minimize 398

    �����������������

    2666666666664

    a111a211a121a221a112a212a122a222

    3777777777775

    � � �

    2666666666664

    c3c2c1c3c2s1c3s2c1c3s2s1s3c2c1s3c2s1s3s2c1s3s2s1

    3777777777775

    �����������������

    D

    �����������������

    2666666666664

    a111a211a121a221a112a212a122a222

    3777777777775

    2666666666664

    c3c1 0c3s1 00 c3c10 c3s1

    s3c1 0s3s1 00 s3c10 s3s1

    3777777777775

    �x2y2

    ������������������

    399

    with respect to x2 and y2. We then set � Dq

    x22 C y22 and v D Œx2 y2�T=�: 400

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 21

    (3) To improve 3 and � , we fix 1 and 2 and minimize 401

    �����������������

    2666666666664

    a111a211a121a221a112a212a122a222

    3777777777775

    � � �

    2666666666664

    c3c2c1c3c2s1c3s2c1c3s2s1s3c2c1s3c2s1s3s2c1s3s2s1

    3777777777775

    �����������������

    D

    �����������������

    2666666666664

    a111a211a121a221a112a212a122a222

    3777777777775

    2666666666664

    c2c1 0c2s1 0s2c1 0s2s1 00 c2s10 c2s10 s2c10 s2s1

    3777777777775

    �x3y3

    ������������������

    402

    with respect to x3 and y3. We then set � Dq

    x23 C y23 and w D Œx3 y3�T=�: 403Componentwise optimization is a common framework for many tensor-related 404

    computations. The basic idea is to choose a subset of the unknowns and (temporar- 405ily) freeze their value. This leads to a simplified optimization problem involving the 406other unknowns. The process is repeated using different subsets of unknowns each 407iteration until convergence. The framework is frequently successful, but there is a 408tendency for the iterates to get trapped near an uninteresting local minima. 409

    5 The Variational Approach to Tensor Singular Values 410

    If �2 is a zero of the characteristic polynomial p.�/ D det.ATA � �I/, then � is a 411singular value of A and the associated left and right singular vectors are eigenvectors 412for AAT and ATA respectively. How can we extend these notions to tensors? Is there 413a version of the characteristic polynomial that makes sense for tensors? What would 414be the analog of the matrices AAT and ATA? These are tough questions. Fortunately, 415there is a constructive way to avoid these difficulties and that is to take a variational 416approach. Singular values and vectors are solutions to a very tractable optimization 417problem. 418

    5.1 Rayleigh Quotient/Power Method Ideas: The Matrix Case 419

    The singular values and singular vectors of a general matrix A 2 Rm�n are the 420stationary values and vectors of the Rayleigh quotient 421

    yTAx

    k x k2k y k2: 422

  • UNCO

    RREC

    TED

    PROO

    F

    22 C.F. Van Loan

    It is a slightly more convenient to pose this as a constrained optimization problem: 423singular values and singular vectors of a general matrix A 2 Rm�n are the stationary 424values and vectors of 425

    A.x; y/ D xTAy DmX

    iD1

    nXjD1

    aijxiyj 426

    subject to the constraints k x k2 D k y k2 D 1. To connect this “definition” to the 427SVD we use the method of Lagrange multipliers and that means looking at the 428gradient of 429

    Q A.x; y/ D .x; y/ � �2.xTx � 1/� �

    2.yTy � 1/: 430

    Using the rearrangements 431

    A.x; y/ DmX

    iD1xi

    0@ nX

    jD1aijyj

    1A D

    nXjD1

    yj

    mX

    iD1aijxi

    !; 432

    it follows that 433

    r Q A.x; y/ D"

    Ay � �xATx � �y

    #: (4)

    From the vector equation r Q A.x; y/ D 0 we conclude that � D � D xTAy D 434 A.x; y/ and that x and y satisfy Ay D .xTAy/x and ATx D .xTAy/y. That is to 435say, x is an eigenvector of ATA and y is an eigenvector of AAT and the associated 436eigenvalue in each case is .yTAx/2. These are exactly the conclusions that can be 437reached by equating columns in the SVD equation AV D U˙ . Indeed, from 438

    AŒ v1 j � � � j vn � D Œ u1 j � � � j un � diag.�1; : : : ; �n/ 439

    we see that Avi D �iui, ATui D �ivi, and �i D uTi Avi, for i D 1 W n. 440

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 23

    The power method for matrices can be designed to go after the largest singular 441value and associated singular vectors: 442

    Power Method for Matrix SingularValues and Vectors

    Given A 2 Rm�n and unit vector y 2 Rn.Repeat:

    Qx D Ay; x D Qx=k Qx kQy D ATx; y D Qy=k Qy k� D A.x; y/ D yTAx

    �opt D � , uopt D y, vopt D x

    449

    This can be viewed as an alternating procedure for finding a zero for the gradient (4). 450Under mild assumptions, f�opt; uopt; voptgwill approximate the largest singular value 451of A and the corresponding left and right singular vectors. In principle, deflation can 452be used to find other singular value triplets. Thus, by applying the power method to 453A � �1u1vT1 we could obtain an estimate of f�2; u2; v2g. 454

    5.2 Rayleigh Quotient/Power Method Ideas: The Tensor Case 455

    Let us extend the Rayleigh quotient characterization for matrix singular values and 456vectors to tensors. We work out the order-3 situation for simplicity. 457

    If A 2 Rn1�n2�n3 , x 2 Rn1 , y 2 Rn2 , and z 2 Rn3 , then the singular values and 458vectors of A are the stationary values and vectors of 459

    A.x; y; z/ Dn1X

    iD1

    n2XjD1

    n3XkD1

    aijk xiyjzk (5)

    subject to the constraints k x k2 D k y k2 D k z k2 D 1. Before we start 460taking gradients we present three alternative formulations of this summation. Each 461highlights a different unfolding of the tensor A. 462

    The Mode-1 Formulation 463

    A.x; y; z/ Dn1X

    iD1xi

    0@ n2X

    jD1

    n3XkD1

    aijk yjzk

    1A D xTA.1/z ˝ y (6)

  • UNCO

    RREC

    TED

    PROO

    F

    24 C.F. Van Loan

    where A.1/.i; jC .k � 1/n2/ D aijk. For the case n D Œ4; 3; 2� we have 464

    A.1/ D

    2664

    a111 a121 a131 a112 a122 a132a211 a221 a231 a212 a222 a232a311 a321 a331 a312 a322 a332a411 a421 a431 a412 a422 a432

    3775 :

    .1;1/ .2;1/ .3;1/ .1;2/ .2;2/ .3;2/

    465

    The Mode-2 Formulation 466

    A.x; y; z/ Dn2X

    jD1yj

    n1X

    iD1

    n3XkD1

    aijk xizk

    !D yTA.2/z ˝ x (7)

    where A.2/. j; iC .k � 1/n1/ D aijk. For the case n D Œ4; 3; 2� we have 467

    A.2/ D24 a111 a211 a311 a411 a112 a212 a312 a412a121 a221 a321 a421 a122 a222 a322 a422

    a131 a231 a331 a431 a132 a232 a332 a432

    35 :

    .1;1/ .2;1/ .3;1/ .4;1/ .1;2/ .2;2/ .3;2/ .4;2/

    468

    The Mode-3 Formulation 469

    A.x; y; z/ DpX

    kD1zk

    0@ mX

    iD1

    nXjD1

    aijk xiyj

    1A D zTA.3/y ˝ x (8)

    where A.3/.k; iC . j� 1/n1/ D aijk. For the case n D Œ4; 3; 2� we have 470

    A.3/ D�

    a111 a211 a311 a411 a121 a221 a321 a421 a131 a231 a331 a431a112 a212 a312 a412 a122 a222 a322 a422 a132 a232 a332 a432

    �:

    .1;1/ .2;1/ .3;1/ .4;1/ .1;2/ .2;2/ .3;2/ .4;2/ .1;3/ .2;3/ .3;3/ .4;3/

    471

    The matrices A.1/, A.2/, and A.3/ are the mode-1, mode-2, and mode-3 unfoldings of 472A that we introduced in Sect. 3.4. It is handy to identify columns with multi-indices 473as we have shown. 474

    We return to the constrained minimization of the objective function A.x; y; z/) 475that is defined in (5). Using the method of Lagrange multipliers we set the gradient 476of 477

    Q A.x; y; z/ D A.x; y; z/ � �2.xTx � 1/� �

    2.yTy � 1/�

    2.zTz � 1/ 478

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 25

    to zero. Using (6)–(8) we get 479

    r Q A D

    2664A.1/.z ˝ y/ � � xA.2/.z ˝ x/ � � yA.3/.y ˝ x/ � z

    3775 D

    26640

    0

    0

    3775 : (9)

    Since x, y, and z are unit vectors it follows that � D � D D .x; y; z/. In this case 480we say that � D .x; y; z/ is a singular value of A and x, y, and z are the associated 481singular vectors. How might we solve this (highly structured) system of nonlinear 482equations? The triplet of matrix-vector products: 483

    A.1/ � .z ˝ y/ D � � x A.2/ � .z ˝ x/ D � � y A.3/ � .y ˝ x/ D � � z 484

    suggests a componentwise solution strategy: 485

    Power Method for Tensor SingularValues and Vectors

    4Given: A 2 Rn1�n2�n3 and unit vectors y 2 Rn2 and z 2 Rn3 .4Repeat:

    4Qx D A.1/.z ˝ y/; � D k Qx k; x D Qx=�4Qy D A.2/.z ˝ x/; � D k Qy k; y D Qy=�4Qz D A.3/.y ˝ x/; � D k Qz k; z D Qz=�4�opt D � , xopt D x, yopt D y, zopt D z

    492

    See [8, 18] for details. For matrices, the SVD expansion 493

    A D U˙VT Drank.A/X

    iD1�iuiv

    Ti 494

    has an important optimality property. In particular, the Eckhart-Young theorem tells 495us that 496

    Ar DrX

    iD1�iuiv

    Ti r � rank.A/ 497

    is the closest rank-r matrix to A in either the 2-norm or Frobenius norm. Moreover, 498the closest rank-1 matrix to Ar�1 is �rurvTr . Thus, it would be possible (in principle) 499to compute the full SVD by solving a sequence of closest rank-1 matrix problems. 500

  • UNCO

    RREC

    TED

    PROO

    F

    26 C.F. Van Loan

    This idea does not work for tensors. In other words, if �1u1 ı v1 ı w1 is the closest 501rank-1 tensor to A and �2u1 ı v2 ı w2 is the closest rank-1 tensor to 502

    QA D A � �1u1 ı v1 ı w1; 503

    then 504

    A2 D �1u1 ı v1 ı w1 C �2u2 ı v2 ı w2 505

    is not necessarily the closest rank-2 tensor to A. We need an alternative approach to 506formulating a “tensor SVD”. 507

    5.3 A First Look at Tensor Rank 508

    Since matrix rank ideas do not readily extend to the tensor setting, we should look 509more carefully at tensor rank to appreciate the “degree of difficulty” associated with 510the formulation of illuminating low-rank tensor expansions. 511

    We start with a definition. Suppose the tensor A can be written as the sum of r 512rank-1 tensors and that r is minimal in this regard. In this case we say that rank.A/ D 513r. Let us explore this concept in the simplest possible setting: A 2 R2�2�2. For this 514problem the goal is to find three thin-as-possible matrices X;Y;Z 2 R2�r so that 515

    A DrX

    kD1X.W; k/ ı Y.W; k/ ı Z.W; k/; (10)

    i.e., 516

    vec.A/ D

    2666666666664

    a111a211a121a221a112a212a122a222

    3777777777775

    DrX

    kD1Z.W; k/ ˝ Y.W; k/ ˝ X.W; k/: 517

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 27

    This vector equation can also be written as a pair of matrix equations: 518

    A.W; 1/ D"

    a111 a121

    a211 a221

    #D

    rXkD1

    Z.1; k/X.W; k/Y.W; k/T

    A.W; 2/ D"

    a112 a122

    a212 a222

    #D

    rXkD1

    Z.2; k/X.W; k/Y.W; k/T :

    Readers familiar with the generalized eigenvalue problem should see a connection 519to our 2-by-2-by-2 rank.A/ problem. Indeed, it can be shown that 520

    det

    "a111 a121

    a211 a221

    #� �

    "a112 a122

    a212 a222

    #!D 0 521

    has real distinct roots with probability 0.79 and complex conjugate roots with 522probability 0.21 when the matrix entries are randomly selected using the MATLAB 523randn function. If this 2-by-2 generalized eigenvalue problem has real distinct 524eigenvalues, then it is possible to find nonsingular matrices S and T so that 525

    "a111 a121

    a211 a221

    #D S

    "˛1 0

    0 ˛2

    #TT

    "a112 a122

    a212 a222

    #D S

    "ˇ1 0

    0 ˇ2

    #TT :

    This shows that the rank-1 expansion (10) for A holds with r D 2, X D S, Y D T 526and 527

    Z D"˛1 ˛2

    ˇ1 ˇ2

    #: 528

    Thus, for 2-by-2-by-2 tensors, rank equals two with probability about 0.79. A 529similar generalized eigenvalue analysis shows that the rank is three with probability 5300.21. This is a very different situation than with matrices where an n-by-n matrix has 531rank n with probability 1. The subtleties associated with tensor rank are discussed 532further in Sect. 8.3. 533


6 Tensor Symmetry

Symmetry for a matrix is defined through transposition: $A = A^T$. This is a nice shorthand way of saying that $A(i,j) = A(j,i)$ for all possible $i$ and $j$ that satisfy $1 \le i \le n$ and $1 \le j \le n$.

How do we extend this notion to tensors? Transposition moves indices around, so we need a way of talking about what happens when we (say) interchange $A(i,j,k)$ with $A(j,i,k)$ or $A(k,j,i)$ or $A(i,k,j)$ or $A(j,k,i)$ or $A(k,i,j)$. It looks like we will have to contend with an exponential number of transpositions and an exponential number of partial symmetries.

6.1 Tensor Transposition

If $C \in \mathbb{R}^{n_1\times n_2\times n_3}$, then there are $3! = 6$ possible transpositions, identified by the notation $C^{\langle[i\,j\,k]\rangle}$ where $[i\,j\,k]$ is a permutation of $[1\,2\,3]$:

$$B = \left\{\begin{array}{l} C^{\langle[1\,3\,2]\rangle}\\ C^{\langle[2\,1\,3]\rangle}\\ C^{\langle[2\,3\,1]\rangle}\\ C^{\langle[3\,1\,2]\rangle}\\ C^{\langle[3\,2\,1]\rangle} \end{array}\right\} \;\Longrightarrow\; \left\{\begin{array}{l} b_{ikj}\\ b_{jik}\\ b_{jki}\\ b_{kij}\\ b_{kji} \end{array}\right\} = c_{ijk}$$

for $i = 1{:}n_1$, $j = 1{:}n_2$, $k = 1{:}n_3$. For order-$d$ tensors there are $d!$ possibilities. Suppose $v = [v_1, v_2, \ldots, v_d]$ is a permutation of the integer vector $1{:}d = [1, 2, \ldots, d]$. If $C \in \mathbb{R}^{n_1\times\cdots\times n_d}$, then

$$B = C^{\langle v\rangle} \;\Longrightarrow\; B(i(v)) = C(i), \qquad 1 \le i \le n.$$
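In NumPy, the transposition $B = C^{\langle v\rangle}$ is an axis permutation. The sketch below (ours; `transpose_tensor` is a name we introduce) applies $v = [2\,3\,1]$ and spot-checks the entry identity $b_{jki} = c_{ijk}$, assuming the convention $B(i(v)) = C(i)$ stated above.

```python
# Tensor transposition B = C^<v> as an axis permutation (illustrative sketch).
import numpy as np

def transpose_tensor(C, v):
    """Return B with B(i(v)) = C(i), where v is a 1-based permutation of 1:d."""
    return np.transpose(C, axes=[p - 1 for p in v])

rng = np.random.default_rng(2)
C = rng.standard_normal((4, 3, 2))         # n1 = 4, n2 = 3, n3 = 2
B = transpose_tensor(C, [2, 3, 1])         # B is n2-by-n3-by-n1

i, j, k = 1, 2, 0
print(np.isclose(B[j, k, i], C[i, j, k]))  # b_{jki} = c_{ijk}  ->  True
```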

6.2 Symmetric Tensors

An order-$d$ tensor $C \in \mathbb{R}^{n\times\cdots\times n}$ is symmetric if $C = C^{\langle v\rangle}$ for all permutations $v$ of $1{:}d$. If $d = 3$ this means that

$$c_{ijk} = c_{ikj} = c_{jik} = c_{jki} = c_{kij} = c_{kji}.$$


To get a feel for the level of redundancy in a symmetric tensor, we see that a symmetric $C \in \mathbb{R}^{3\times 3\times 3}$ has at most ten distinct values:

$$\begin{array}{l}
c_{111}\\
c_{112} = c_{121} = c_{211}\\
c_{113} = c_{131} = c_{311}\\
c_{222}\\
c_{221} = c_{212} = c_{122}\\
c_{223} = c_{232} = c_{322}\\
c_{333}\\
c_{331} = c_{313} = c_{133}\\
c_{332} = c_{323} = c_{233}\\
c_{123} = c_{132} = c_{213} = c_{231} = c_{312} = c_{321}.
\end{array}$$

The modal unfoldings of a symmetric tensor are all the same. For example, the (2,3) entries in each of the matrices

$$C_{(1)} = \begin{bmatrix}
c_{111} & c_{121} & c_{131} & c_{112} & c_{122} & c_{132} & c_{113} & c_{123} & c_{133}\\
c_{211} & c_{221} & c_{231} & c_{212} & c_{222} & c_{232} & c_{213} & c_{223} & c_{233}\\
c_{311} & c_{321} & c_{331} & c_{312} & c_{322} & c_{332} & c_{313} & c_{323} & c_{333}
\end{bmatrix}$$

$$C_{(2)} = \begin{bmatrix}
c_{111} & c_{211} & c_{311} & c_{112} & c_{212} & c_{312} & c_{113} & c_{213} & c_{313}\\
c_{121} & c_{221} & c_{321} & c_{122} & c_{222} & c_{322} & c_{123} & c_{223} & c_{323}\\
c_{131} & c_{231} & c_{331} & c_{132} & c_{232} & c_{332} & c_{133} & c_{233} & c_{333}
\end{bmatrix}$$

$$C_{(3)} = \begin{bmatrix}
c_{111} & c_{211} & c_{311} & c_{121} & c_{221} & c_{321} & c_{131} & c_{231} & c_{331}\\
c_{112} & c_{212} & c_{312} & c_{122} & c_{222} & c_{322} & c_{132} & c_{232} & c_{332}\\
c_{113} & c_{213} & c_{313} & c_{123} & c_{223} & c_{323} & c_{133} & c_{233} & c_{333}
\end{bmatrix}$$

are equal: $c_{231} = c_{321} = c_{312}$.
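As a numerical sanity check, the sketch below (ours; `symmetrize` and `unfold` are hypothetical helper names) builds a random symmetric 3-by-3-by-3 tensor by averaging over all index permutations and confirms that its three modal unfoldings coincide.

```python
# Check that the modal unfoldings of a symmetric tensor coincide (sketch).
import itertools
import numpy as np

def symmetrize(C):
    """Average an order-3 tensor over all 3! index permutations."""
    return sum(np.transpose(C, p) for p in itertools.permutations(range(3))) / 6.0

def unfold(C, k):
    """Mode-k unfolding: mode-k fibers become columns (column-major ordering)."""
    return np.reshape(np.moveaxis(C, k, 0), (C.shape[k], -1), order='F')

rng = np.random.default_rng(3)
C = symmetrize(rng.standard_normal((3, 3, 3)))

C1, C2, C3 = unfold(C, 0), unfold(C, 1), unfold(C, 2)
print(np.allclose(C1, C2) and np.allclose(C2, C3))   # True
print(np.isclose(C1[1, 2], C[1, 2, 0]))              # the (2,3) entry is c_231 -> True
```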

6.3 Symmetric Rank

An order-$d$ symmetric rank-1 tensor $C \in \mathbb{R}^{n\times\cdots\times n}$ has the form

$$C = \underbrace{x \circ \cdots \circ x}_{d \text{ times}}$$


where $x \in \mathbb{R}^n$. In this case we clearly have

$$C(i_1,\ldots,i_d) = x_{i_1} x_{i_2} \cdots x_{i_d}$$

and

$$\operatorname{vec}(C) = \underbrace{x \otimes \cdots \otimes x}_{d \text{ times}}.$$

An order-3 symmetric tensor $C$ has symmetric rank $r$ if there exist $x_1,\ldots,x_r \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}^r$ such that

$$C = \sum_{k=1}^{r} \lambda_k \cdot x_k \circ x_k \circ x_k$$

and no shorter sum of symmetric rank-1 tensors exists. Symmetric rank is denoted by $\operatorname{rank}_S(C)$. Note, in contrast to what we would expect for matrices, there may be a shorter sum of general rank-1 tensors that add up to $C$:

$$C = \sum_{k=1}^{\tilde{r}} \tilde{\lambda}_k \cdot \tilde{x}_k \circ \tilde{y}_k \circ \tilde{z}_k.$$

The symmetric rank of a symmetric tensor is more tractable than the (general) rank of a general tensor. For example, if $C \in \mathbb{C}^{n\times\cdots\times n}$ is an order-$d$ symmetric tensor, then with probability one we have

$$\operatorname{rank}_S(C) = \begin{cases} f(d,n) + 1 & \text{if } (d,n) = (3,5),\,(4,3),\,(4,4),\text{ or }(4,5)\\ f(d,n) & \text{otherwise} \end{cases}$$

where

$$f(d,n) = \left\lceil \frac{1}{n}\binom{n+d-1}{d} \right\rceil.$$

See [5, 6] for deeper discussions of symmetric rank.
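The quoted generic-rank formula is easy to tabulate. The short sketch below (ours; `f` and `generic_symmetric_rank` are names we introduce) evaluates $f(d,n)$ with an integer ceiling and applies the exceptional-case correction.

```python
# Tabulate the generic symmetric rank formula quoted above (illustrative sketch).
from math import comb

def f(d, n):
    return -(-comb(n + d - 1, d) // n)   # integer ceiling of binom(n+d-1, d)/n

def generic_symmetric_rank(d, n):
    exceptional = {(3, 5), (4, 3), (4, 4), (4, 5)}
    return f(d, n) + 1 if (d, n) in exceptional else f(d, n)

for d in (3, 4):
    print(d, [generic_symmetric_rank(d, n) for n in range(2, 7)])
```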


6.4 The Eigenvalues of a Symmetric Tensor

For a symmetric matrix $C$ the stationary values of $\phi_C(x) = x^T C x$ subject to the constraint that $\|x\|_2 = 1$ are the eigenvalues of $C$. The associated stationary vectors are eigenvectors. We extend this idea to symmetric tensors; the order-3 case is good enough to illustrate the main ideas.

If $C \in \mathbb{R}^{n\times n\times n}$ is a symmetric tensor, then we define the stationary values of

$$\phi_C(x) = \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} c_{ijk}\, x_i x_j x_k = x^T C_{(1)}(x \otimes x)$$

subject to the constraint that $\|x\|_2 = 1$ to be the eigenvalues of $C$. The associated stationary vectors are eigenvectors. Using the method of Lagrange multipliers it can be shown that if $x$ is a stationary vector for $\phi_C$, then

$$\phi_C(x)\, x = C_{(1)}(x \otimes x).$$

This leads to an iteration of the following form:

Power Method for Tensor Eigenvalues and Eigenvectors

    Given: symmetric $C \in \mathbb{R}^{n\times n\times n}$ and unit vector $x \in \mathbb{R}^n$.
    Repeat:
        $\tilde{x} = C_{(1)}(x \otimes x)$
        $\lambda = \|\tilde{x}\|_2$
        $x = \tilde{x}/\lambda$
    $\lambda_{\mathrm{opt}} = \lambda$,  $x_{\mathrm{opt}} = x$.

There are some convergence results for this iteration. For example, it can be shown that if the order of $C$ is even and $M$ is a square unfolding, then the iteration converges if $M$ is positive definite [15].
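A direct NumPy transcription of the boxed iteration is sketched below (ours; helper names are ours). It uses the mode-1 unfolding so that $C_{(1)}(x \otimes x)$ is an ordinary matrix-vector product. For a generic order-3 tensor the plain iteration is not guaranteed to settle, so the final stationarity residual is only reported, not guaranteed to be tiny.

```python
# Power method for a symmetric order-3 tensor, mirroring the boxed iteration (sketch).
import itertools
import numpy as np

def symmetrize(C):
    """Average an order-3 tensor over all 3! index permutations."""
    return sum(np.transpose(C, p) for p in itertools.permutations(range(3))) / 6.0

def unfold(C, k):
    """Mode-k unfolding with mode-k fibers as columns."""
    return np.reshape(np.moveaxis(C, k, 0), (C.shape[k], -1), order='F')

def tensor_power_method(C, x0, iters=200):
    x = x0 / np.linalg.norm(x0)
    C1 = unfold(C, 0)
    lam = 0.0
    for _ in range(iters):
        xt = C1 @ np.kron(x, x)          # C_(1)(x kron x)
        lam = np.linalg.norm(xt)
        x = xt / lam
    return lam, x

rng = np.random.default_rng(4)
n = 5
C = symmetrize(rng.standard_normal((n, n, n)))
lam, x = tensor_power_method(C, rng.standard_normal(n))
# Residual of the stationarity condition; small if the iteration has settled.
print(np.linalg.norm(unfold(C, 0) @ np.kron(x, x) - lam * x))
```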

6.5 Symmetric Embeddings

In the matrix case there are connections between the singular values and vectors of $A \in \mathbb{R}^{n_1\times n_2}$ and the eigenvalues and vectors of

$$\operatorname{sym}(A) = \begin{bmatrix} 0 & A\\ A^T & 0 \end{bmatrix} \in \mathbb{R}^{(n_1+n_2)\times(n_1+n_2)}.$$


If $A = U\cdot\operatorname{diag}(\sigma_i)\cdot V^T$ is the SVD of $A \in \mathbb{R}^{n_1\times n_2}$, then for $k = 1{:}\operatorname{rank}(A)$

$$\begin{bmatrix} 0 & A\\ A^T & 0 \end{bmatrix} \begin{bmatrix} u_k\\ \pm v_k \end{bmatrix} = \pm\sigma_k \begin{bmatrix} u_k\\ \pm v_k \end{bmatrix}$$

where $u_k = U(:,k)$ and $v_k = V(:,k)$.

It turns out that a symmetric tensor $\operatorname{sym}(A)$ can be "built" from a general tensor $A$ by judiciously positioning $A$ and all of its transposes. Here is a depiction of the order-3 case:

[Figure: the order-3 symmetric embedding, with $A$ and its transposes placed in the off-diagonal blocks of a 3-by-3-by-3 block tensor.]

We can think of $\operatorname{sym}(A)$ as a 3-by-3-by-3 block tensor. As a tensor of scalars, it is $N$-by-$N$-by-$N$ where $N = n_1 + n_2 + n_3$. If

$$\left\{ \lambda,\; \begin{bmatrix} u\\ v\\ z \end{bmatrix} \right\}$$

is a stationary pair for $\operatorname{sym}(A)$, then so are

$$\left\{ \lambda,\; \begin{bmatrix} u\\ -v\\ -z \end{bmatrix} \right\}, \qquad \left\{ -\lambda,\; \begin{bmatrix} u\\ -v\\ z \end{bmatrix} \right\}, \qquad \left\{ -\lambda,\; \begin{bmatrix} u\\ v\\ -z \end{bmatrix} \right\}.$$

This is a nice generalization of the result for matrices. There are interesting connections between power methods with $A$ and power methods with $\operatorname{sym}(A)$. There are also connections between the rank of $A$ and the symmetric rank of

    This is a nice generalization of the result for matrices. There are interesting 625connections between power methods with A and power methods with sym.A/. 626There are also connections between the rank of A and the symmetric rank of

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 33

$\operatorname{sym}(A)$:

$$d!\,\operatorname{rank}(A) \;\ge\; \operatorname{rank}_S(\operatorname{sym}(A)).$$

It is not clear if the inequality can be replaced by equality. See [27].

7 The Tucker Decomposition

We have seen that it is possible to extend the definition of singular values and vectors to tensors by using variational principles. However, these insights did not culminate in the production of a tensor SVD, and that is disappointing when we consider the power of the matrix SVD:

1. It can turn a given problem into an equivalent easy-to-solve problem. For example, the $\min\|Ax - b\|_2$ problem can be converted into an equivalent diagonal least squares problem using the SVD.
2. It can uncover hidden relationships that exist in matrix-encoded data. For example, the SVD can show that a data matrix has a low-rank structure.

In the next several sections we produce various SVD-like tensor decompositions. We start with the Tucker decomposition because it involves orthogonality and has a strong resemblance to the matrix SVD. We use order-3 tensors to illustrate the main ideas.

7.1 Tucker Representations: The Matrix Case

Given a matrix $A \in \mathbb{R}^{n_1\times n_2}$, the Tucker representation problem involves finding a core matrix $S \in \mathbb{R}^{r_1\times r_2}$ and matrices $U_1 \in \mathbb{R}^{n_1\times r_1}$ and $U_2 \in \mathbb{R}^{n_2\times r_2}$ such that

$$A(i_1,i_2) = \sum_{j_1=1}^{r_1}\sum_{j_2=1}^{r_2} S(j_1,j_2)\cdot U_1(i_1,j_1)\cdot U_2(i_2,j_2).$$

Part of the "deal" involves choosing the integers $r_1$ and $r_2$ and perhaps replacing the "=" with "$\approx$". Regardless, it is possible to reformulate the right-hand side as a sum of rank-1 matrices,

$$A = \sum_{j_1=1}^{r_1}\sum_{j_2=1}^{r_2} S(j_1,j_2)\cdot U_1(:,j_1)\, U_2(:,j_2)^T, \qquad (11)$$


or as a sum of vector Kronecker products,

$$\operatorname{vec}(A) = \sum_{j_1=1}^{r_1}\sum_{j_2=1}^{r_2} S(j_1,j_2)\, U_2(:,j_2) \otimes U_1(:,j_1),$$

or as a single matrix-vector product,

$$\operatorname{vec}(A) = (U_2 \otimes U_1)\cdot\operatorname{vec}(S).$$

The tensor versions of these reformulations get us to think the right way about how we might generalize the matrix SVD.

7.2 Tucker Representations: The Tensor Case

Given a tensor $A \in \mathbb{R}^{n_1\times n_2\times n_3}$, the Tucker representation problem involves finding a core tensor $S \in \mathbb{R}^{r_1\times r_2\times r_3}$ and matrices $U_1 \in \mathbb{R}^{n_1\times r_1}$, $U_2 \in \mathbb{R}^{n_2\times r_2}$, and $U_3 \in \mathbb{R}^{n_3\times r_3}$ such that

$$A(i_1,i_2,i_3) = \sum_{j_1=1}^{r_1}\sum_{j_2=1}^{r_2}\sum_{j_3=1}^{r_3} S(j_1,j_2,j_3)\cdot U_1(i_1,j_1)\cdot U_2(i_2,j_2)\cdot U_3(i_3,j_3).$$

As in the matrix case above, we can rewrite this as a sum of rank-1 tensors,

$$A = \sum_{j_1=1}^{r_1}\sum_{j_2=1}^{r_2}\sum_{j_3=1}^{r_3} S(j_1,j_2,j_3)\cdot U_1(:,j_1) \circ U_2(:,j_2) \circ U_3(:,j_3),$$

or as a sum of vector Kronecker products,

$$\operatorname{vec}(A) = \sum_{j_1=1}^{r_1}\sum_{j_2=1}^{r_2}\sum_{j_3=1}^{r_3} S(j_1,j_2,j_3)\cdot U_3(:,j_3) \otimes U_2(:,j_2) \otimes U_1(:,j_1),$$

or as a single matrix-vector product,

$$\operatorname{vec}(A) = (U_3 \otimes U_2 \otimes U_1)\cdot\operatorname{vec}(S).$$

The challenge is to design the representation so that it is illuminating and computable.

Before we proceed it is instructive to revisit the matrix case. If we set $r_1 = n_1$ and $r_2 = n_2$ in (11) and assume the $U$ matrices are orthogonal, then the Tucker


representation has the form

$$A = U_1 (U_1^T A U_2) U_2^T = \sum_{j_1=1}^{n_1}\sum_{j_2=1}^{n_2} S(j_1,j_2)\, U_1(:,j_1)\, U_2(:,j_2)^T$$

where $S = U_1^T A U_2$. To make this an "illuminating" representation of $A$ we strive for a diagonal core matrix $S$ and that, of course, leads to the SVD:

$$A = \sum_{k=1}^{\operatorname{rank}(A)} S(k,k)\, U_1(:,k)\, U_2(:,k)^T.$$

From there we prove various optimality theorems and conclude that

$$A_r = \sum_{k=1}^{r} S(k,k)\, U_1(:,k)\, U_2(:,k)^T, \qquad r \le \operatorname{rank}(A),$$

is the closest rank-$r$ matrix to $A$ in (say) the Frobenius norm.

On the algorithmic front, methods typically determine $U_1$ and $U_2$ through a sequence of updates. Thus, if $A = U_1 S U_2^T$ is the "current" representation of $A$, we proceed to compute orthogonal $\Theta_1$ and $\Theta_2$ so that $\tilde{S} = \Theta_1^T S\,\Theta_2$ is "more diagonal" than $S$. We then update the representation:

$$S \leftarrow \Theta_1^T S\,\Theta_2, \qquad U_1 \leftarrow U_1\Theta_1, \qquad U_2 \leftarrow U_2\Theta_2.$$

Our plan is to mimic this sequence of events for tensors, focusing first on the connection between the core tensor $S$ and the $U$ matrices.

7.3 The Mode-k Product

Updating a Tucker representation involves updating the current core tensor $S$ and the associated $U$-matrices. Regarding the former, we anticipate the need to design a relevant tensor-matrix product. The mode-$k$ product turns out to be that operation and we motivate the main idea with an example. Suppose $S \in \mathbb{R}^{4\times 3\times 2}$ and consider its mode-2 unfolding:

$$S_{(2)} = \begin{bmatrix}
s_{111} & s_{211} & s_{311} & s_{411} & s_{112} & s_{212} & s_{312} & s_{412}\\
s_{121} & s_{221} & s_{321} & s_{421} & s_{122} & s_{222} & s_{322} & s_{422}\\
s_{131} & s_{231} & s_{331} & s_{431} & s_{132} & s_{232} & s_{332} & s_{432}
\end{bmatrix}.$$


Its columns are the mode-2 fibers of $S$. Suppose we apply a 5-by-3 matrix $M$ to each of those fibers:

$$\begin{bmatrix}
t_{111} & t_{211} & t_{311} & t_{411} & t_{112} & t_{212} & t_{312} & t_{412}\\
t_{121} & t_{221} & t_{321} & t_{421} & t_{122} & t_{222} & t_{322} & t_{422}\\
t_{131} & t_{231} & t_{331} & t_{431} & t_{132} & t_{232} & t_{332} & t_{432}\\
t_{141} & t_{241} & t_{341} & t_{441} & t_{142} & t_{242} & t_{342} & t_{442}\\
t_{151} & t_{251} & t_{351} & t_{451} & t_{152} & t_{252} & t_{352} & t_{452}
\end{bmatrix}
=
\begin{bmatrix}
m_{11} & m_{12} & m_{13}\\
m_{21} & m_{22} & m_{23}\\
m_{31} & m_{32} & m_{33}\\
m_{41} & m_{42} & m_{43}\\
m_{51} & m_{52} & m_{53}
\end{bmatrix}
\begin{bmatrix}
s_{111} & s_{211} & s_{311} & s_{411} & s_{112} & s_{212} & s_{312} & s_{412}\\
s_{121} & s_{221} & s_{321} & s_{421} & s_{122} & s_{222} & s_{322} & s_{422}\\
s_{131} & s_{231} & s_{331} & s_{431} & s_{132} & s_{232} & s_{332} & s_{432}
\end{bmatrix}.$$

This defines a new tensor $T \in \mathbb{R}^{4\times 5\times 2}$ that is totally specified by the equation

$$T_{(2)} = M\cdot S_{(2)}$$

and referred to as the mode-2 product of the tensor $S$ with the matrix $M$. In general, if $S$ is an $n_1\times\cdots\times n_d$ tensor and $M \in \mathbb{R}^{m_k\times n_k}$ for some $k$ that satisfies $1 \le k \le d$, then the mode-$k$ product of $S$ with $M$ is a new tensor $T$ defined by

$$T_{(k)} = M\cdot S_{(k)}.$$

To indicate this operation we use the notation

$$T = S \times_k M.$$

Note that $T$ is an $n_1\times\cdots\times n_{k-1}\times m_k\times n_{k+1}\times\cdots\times n_d$ tensor. To illustrate more characterizations of the mode-$k$ product we drop down to the order-3 case.

– Mode-1 Product. If $S \in \mathbb{R}^{n_1\times n_2\times n_3}$ and $M_1 \in \mathbb{R}^{m_1\times n_1}$, then $T = S \times_1 M_1$ is an $m_1\times n_2\times n_3$ tensor that is equivalently defined by

$$T(i_1,i_2,i_3) = \sum_{k=1}^{n_1} M_1(i_1,k)\, S(k,i_2,i_3)$$
$$\operatorname{vec}(T) = (I_{n_3} \otimes I_{n_2} \otimes M_1)\operatorname{vec}(S). \qquad (12)$$


– Mode-2 Product. If $S \in \mathbb{R}^{n_1\times n_2\times n_3}$ and $M_2 \in \mathbb{R}^{m_2\times n_2}$, then $T = S \times_2 M_2$ is an $n_1\times m_2\times n_3$ tensor that is equivalently defined by

$$T(i_1,i_2,i_3) = \sum_{k=1}^{n_2} M_2(i_2,k)\, S(i_1,k,i_3)$$
$$\operatorname{vec}(T) = (I_{n_3} \otimes M_2 \otimes I_{n_1})\operatorname{vec}(S). \qquad (13)$$

– Mode-3 Product. If $S \in \mathbb{R}^{n_1\times n_2\times n_3}$ and $M_3 \in \mathbb{R}^{m_3\times n_3}$, then $T = S \times_3 M_3$ is an $n_1\times n_2\times m_3$ tensor that is equivalently defined by

$$T(i_1,i_2,i_3) = \sum_{k=1}^{n_3} M_3(i_3,k)\, S(i_1,i_2,k)$$
$$\operatorname{vec}(T) = (M_3 \otimes I_{n_2} \otimes I_{n_1})\operatorname{vec}(S). \qquad (14)$$

The modal products have two important properties that we will be using later. The first concerns successive products in the same mode. If $S \in \mathbb{R}^{n_1\times\cdots\times n_d}$ and $M_1, M_2 \in \mathbb{R}^{n_k\times n_k}$, then

$$(S \times_k M_1) \times_k M_2 = S \times_k (M_2 M_1).$$

The second property concerns successive products in different modes. If $S \in \mathbb{R}^{n_1\times\cdots\times n_d}$, $M_k \in \mathbb{R}^{n_k\times n_k}$, $M_j \in \mathbb{R}^{n_j\times n_j}$, and $k \ne j$, then

$$(S \times_k M_k) \times_j M_j = (S \times_j M_j) \times_k M_k.$$

The order is not important so we just write $S \times_j M_j \times_k M_k$ or $S \times_k M_k \times_j M_j$.
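A mode-$k$ product is one unfolding, one matrix multiplication, and one folding. The sketch below (ours; `unfold`, `fold`, and `mode_k_product` are names we introduce) implements it with NumPy and spot-checks the two properties just stated.

```python
# Mode-k product via unfold / multiply / fold, plus a check of the two
# properties of modal products (illustrative sketch).
import numpy as np

def unfold(T, k):
    return np.reshape(np.moveaxis(T, k, 0), (T.shape[k], -1), order='F')

def fold(M, k, shape):
    full = (shape[k],) + tuple(s for i, s in enumerate(shape) if i != k)
    return np.moveaxis(np.reshape(M, full, order='F'), 0, k)

def mode_k_product(S, M, k):
    shape = list(S.shape)
    shape[k] = M.shape[0]
    return fold(M @ unfold(S, k), k, tuple(shape))

rng = np.random.default_rng(5)
S = rng.standard_normal((4, 3, 2))
M1, M2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
N3 = rng.standard_normal((2, 2))

lhs = mode_k_product(mode_k_product(S, M1, 1), M2, 1)
rhs = mode_k_product(S, M2 @ M1, 1)                    # same-mode products collapse
print(np.allclose(lhs, rhs))                           # True

lhs = mode_k_product(mode_k_product(S, M1, 1), N3, 2)
rhs = mode_k_product(mode_k_product(S, N3, 2), M1, 1)  # different modes commute
print(np.allclose(lhs, rhs))                           # True
```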

7.4 The Core Tensor

Suppose we have a Tucker representation

$$A(i) = \sum_{j=1}^{n} S(j)\cdot U_1(i_1,j_1)\cdot U_2(i_2,j_2)\cdot U_3(i_3,j_3),$$

where $A \in \mathbb{R}^{n}$, $n = [n_1, n_2, n_3]$, and $U_1 \in \mathbb{R}^{n_1\times n_1}$, $U_2 \in \mathbb{R}^{n_2\times n_2}$, and $U_3 \in \mathbb{R}^{n_3\times n_3}$. It follows that

$$\operatorname{vec}(A) = (U_3 \otimes U_2 \otimes U_1)\operatorname{vec}(S) = (U_3 \otimes I_{n_2} \otimes I_{n_1})(I_{n_3} \otimes U_2 \otimes I_{n_1})(I_{n_3} \otimes I_{n_2} \otimes U_1)\operatorname{vec}(S).$$


This factored product enables us to relate the core tensor $S$ to $A$ via a triplet of modal products. Indeed, if

$$\operatorname{vec}(S^{(1)}) = (I_{n_3} \otimes I_{n_2} \otimes U_1)\operatorname{vec}(S)$$
$$\operatorname{vec}(S^{(2)}) = (I_{n_3} \otimes U_2 \otimes I_{n_1})\operatorname{vec}(S^{(1)})$$
$$\operatorname{vec}(S^{(3)}) = (U_3 \otimes I_{n_2} \otimes I_{n_1})\operatorname{vec}(S^{(2)})$$

then $A = S^{(3)}$. But from (12), (13), and (14) this means that

$$S^{(1)} = S \times_1 U_1, \qquad S^{(2)} = S^{(1)} \times_2 U_2, \qquad S^{(3)} = S^{(2)} \times_3 U_3,$$

and so

$$A = S \times_1 U_1 \times_2 U_2 \times_3 U_3.$$

If the $U$'s are nonsingular then

$$A = A \times_1 (U_1^{-1}U_1) \times_2 (U_2^{-1}U_2) \times_3 (U_3^{-1}U_3) = A \times_1 U_1^{-1} \times_2 U_2^{-1} \times_3 U_3^{-1} \times_1 U_1 \times_2 U_2 \times_3 U_3$$

and so $S = A \times_1 U_1^{-1} \times_2 U_2^{-1} \times_3 U_3^{-1}$.

If the $U$'s are orthogonal, i.e., if $A \in \mathbb{R}^{n_1\times n_2\times n_3}$ and $U_1 \in \mathbb{R}^{n_1\times n_1}$, $U_2 \in \mathbb{R}^{n_2\times n_2}$, and $U_3 \in \mathbb{R}^{n_3\times n_3}$ are orthogonal, then

$$A = S \times_1 U_1 \times_2 U_2 \times_3 U_3$$

where

$$S = A \times_1 U_1^T \times_2 U_2^T \times_3 U_3^T, \qquad (15)$$

or equivalently

$$A_{(1)} = U_1 S_{(1)} (U_3 \otimes U_2)^T \qquad (16)$$
$$A_{(2)} = U_2 S_{(2)} (U_3 \otimes U_1)^T \qquad (17)$$
$$A_{(3)} = U_3 S_{(3)} (U_2 \otimes U_1)^T \qquad (18)$$

With this choice we are representing $A$ as a Tucker product of a core tensor $S$ and three orthogonal matrices. Things are beginning to look "SVD-like".


7.5 The Higher-Order SVD

How do we choose the $U$ matrices in (15) so that the core tensor $S$ reveals things about the structure of $A$? It is instructive to look at the matrix case where we know the answer to this question. Suppose we have the SVDs

$$U_1^T A_{(1)} V_1 = \Sigma_1, \qquad U_2^T A_{(2)} V_2 = \Sigma_2.$$

Since $A_{(1)} = A$ and $A_{(2)} = A^T$, it follows that we may set $V_1 = U_2$. In other words, we can (in principle) compute the SVD of $A$ by computing the SVDs of the modal unfoldings $A_{(1)}$ and $A_{(2)}$. The $U$ matrices from the modal SVDs are the "right" choice.

This suggests a strategy for picking good $U$ matrices for the tensor case $A \in \mathbb{R}^{n_1\times n_2\times n_3}$. Namely, compute the SVDs of the modal unfoldings

$$A_{(1)} = U_1\Sigma_1 V_1^T, \qquad A_{(2)} = U_2\Sigma_2 V_2^T, \qquad A_{(3)} = U_3\Sigma_3 V_3^T \qquad (19)$$

and set

$$S = A \times_1 U_1^T \times_2 U_2^T \times_3 U_3^T.$$

The resulting decomposition

$$A = S \times_1 U_1 \times_2 U_2 \times_3 U_3$$

is the higher-order SVD (HOSVD) of $A$. If $A = S \times_1 U_1 \times_2 U_2 \times_3 U_3$ is the HOSVD of $A \in \mathbb{R}^{n_1\times n_2\times n_3}$, then

$$A = \sum_{j=1}^{r} S(j)\cdot U_1(:,j_1) \circ U_2(:,j_2) \circ U_3(:,j_3) \qquad (20)$$

where $r_1 = \operatorname{rank}(A_{(1)})$, $r_2 = \operatorname{rank}(A_{(2)})$, and $r_3 = \operatorname{rank}(A_{(3)})$. The triplet of modal ranks $[r_1, r_2, r_3]$ is called the multilinear rank of $A$.

The core tensor in the HOSVD has important properties. By combining (16)-(19) we have

$$S_{(1)} = \Sigma_1 V_1^T (U_3 \otimes U_2)$$
$$S_{(2)} = \Sigma_2 V_2^T (U_3 \otimes U_1)$$
$$S_{(3)} = \Sigma_3 V_3^T (U_2 \otimes U_1)$$


from which we conclude that

$$\| S(j,:,:) \|_F = \sigma_j(A_{(1)}), \qquad j = 1{:}n_1$$
$$\| S(:,j,:) \|_F = \sigma_j(A_{(2)}), \qquad j = 1{:}n_2$$
$$\| S(:,:,j) \|_F = \sigma_j(A_{(3)}), \qquad j = 1{:}n_3.$$

Here, $\sigma_j(C)$ denotes the $j$th largest singular value of the matrix $C$. Notice that the norms of the tensor's slices are getting smaller as we "move away" from $S(1,1,1)$.

This suggests that we can use the grading in $S$ to truncate the HOSVD:

$$A \approx A_{\tilde{r}} = \sum_{j=1}^{\tilde{r}} S(j)\cdot U_1(:,j_1) \circ U_2(:,j_2) \circ U_3(:,j_3)$$

where $\tilde{r} \le r$, i.e., $\tilde{r}_1 \le r_1$, $\tilde{r}_2 \le r_2$, and $\tilde{r}_3 \le r_3$. As with SVD-based low-rank approximation in the matrix case, we simply need a tolerance to determine how to abbreviate the summation in (20). For a deeper discussion of the HOSVD, see [9].
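The HOSVD amounts to three matrix SVDs followed by three modal products. The sketch below (ours) computes it for a random tensor, checks the exact reconstruction $A = S \times_1 U_1 \times_2 U_2 \times_3 U_3$, and verifies that the slice norms of the core reproduce the singular values of $A_{(1)}$.

```python
# HOSVD of an order-3 tensor via SVDs of the modal unfoldings (sketch).
import numpy as np

def unfold(T, k):
    return np.reshape(np.moveaxis(T, k, 0), (T.shape[k], -1), order='F')

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3, 2))

# U_k = left singular vectors of the mode-k unfolding, as in (19).
U = [np.linalg.svd(unfold(A, k), full_matrices=True)[0] for k in range(3)]

# Core tensor S = A x_1 U1^T x_2 U2^T x_3 U3^T, written as one contraction.
S = np.einsum('ijk,ia,jb,kc->abc', A, U[0], U[1], U[2])

# Exact reconstruction A = S x_1 U1 x_2 U2 x_3 U3.
A_rec = np.einsum('abc,ia,jb,kc->ijk', S, U[0], U[1], U[2])
print(np.allclose(A, A_rec))                          # True

# Slice norms of S reproduce the singular values of A_(1).
slice_norms = [np.linalg.norm(S[j, :, :]) for j in range(4)]
print(np.allclose(slice_norms,
                  np.linalg.svd(unfold(A, 0), compute_uv=False)))   # True
```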

7.6 The Tucker Nearness Problem

Suppose we are given $A \in \mathbb{R}^{n_1\times n_2\times n_3}$ and $r = [r_1, r_2, r_3] \le [n_1, n_2, n_3] = n$. In the Tucker nearness problem we determine $S \in \mathbb{R}^{r_1\times r_2\times r_3}$ and matrices $U_1 \in \mathbb{R}^{n_1\times r_1}$, $U_2 \in \mathbb{R}^{n_2\times r_2}$, and $U_3 \in \mathbb{R}^{n_3\times r_3}$ with orthonormal columns such that

$$\Phi(U_1,U_2,U_3) = \left\| A - \sum_{j=1}^{r} S(j)\cdot U_1(:,j_1) \circ U_2(:,j_2) \circ U_3(:,j_3) \right\|_F$$

is minimized. It is easy to show that

$$\Phi(U_1,U_2,U_3) = \| \operatorname{vec}(A) - (U_3 \otimes U_2 \otimes U_1)\operatorname{vec}(S) \|_2,$$

and so, using the normal equations, we see that

$$\operatorname{vec}(S) = (U_3 \otimes U_2 \otimes U_1)^T \operatorname{vec}(A).$$


This is why the objective function $\Phi$ does not have $S$ as an argument: the "best $S$" is determined by $U_1$, $U_2$, and $U_3$. The goal is to minimize

$$\Phi(U_1,U_2,U_3) = \left\| \bigl[ I - (U_3 \otimes U_2 \otimes U_1)(U_3 \otimes U_2 \otimes U_1)^T \bigr]\operatorname{vec}(A) \right\|_2.$$

Since $U_3 \otimes U_2 \otimes U_1$ has orthonormal columns, it follows that minimizing this norm is the same as maximizing

$$\Psi(U_1,U_2,U_3) = \| (U_3 \otimes U_2 \otimes U_1)^T \operatorname{vec}(A) \|_2.$$

The reformulations

$$\Psi(U_1,U_2,U_3) = \begin{cases} \| U_1^T\, A_{(1)}\, (U_3 \otimes U_2) \|_F\\ \| U_2^T\, A_{(2)}\, (U_3 \otimes U_1) \|_F\\ \| U_3^T\, A_{(3)}\, (U_2 \otimes U_1) \|_F \end{cases}$$

set the stage for a componentwise optimization approach:

    Fix $U_2$ and $U_3$ and choose $U_1$ to maximize $\| U_1^T\, A_{(1)}\, (U_3 \otimes U_2) \|_F$.
    Fix $U_1$ and $U_3$ and choose $U_2$ to maximize $\| U_2^T\, A_{(2)}\, (U_3 \otimes U_1) \|_F$.
    Fix $U_1$ and $U_2$ and choose $U_3$ to maximize $\| U_3^T\, A_{(3)}\, (U_2 \otimes U_1) \|_F$.

These optimization problems can be solved using the SVD. Consider the problem of maximizing $\| Q^T M \|_F$ where $Q \in \mathbb{R}^{m\times r}$ has orthonormal columns and $M \in \mathbb{R}^{m\times n}$ is given. If

$$M = U\Sigma V^T$$

is the SVD of $M$, then

$$\| Q^T M \|_F^2 = \| Q^T U\Sigma V^T \|_F^2 = \| Q^T U\Sigma \|_F^2 = \sum_{k} \sigma_k^2\, \| Q^T U(:,k) \|_2^2.$$

It is clear that we can maximize the summation by setting $Q = U(:,1{:}r)$: each weight $\| Q^T U(:,k) \|_2^2$ lies in $[0,1]$ and the weights sum to $r$, so the best we can do is place unit weight on the $r$ largest values $\sigma_k^2$. Putting it all together we obtain the following framework.


The Tucker Nearness Problem

    Given $A \in \mathbb{R}^{n_1\times n_2\times n_3}$.
    Compute the HOSVD of $A$ and determine $r = [r_1, r_2, r_3] \le n$.
    Set $U_1 = U_1(:,1{:}r_1)$, $U_2 = U_2(:,1{:}r_2)$, and $U_3 = U_3(:,1{:}r_3)$.
    Repeat:
        Compute the SVD $A_{(1)}(U_3 \otimes U_2) = \tilde{U}_1\Sigma_1 V_1^T$ and set $U_1 = \tilde{U}_1(:,1{:}r_1)$.
        Compute the SVD $A_{(2)}(U_3 \otimes U_1) = \tilde{U}_2\Sigma_2 V_2^T$ and set $U_2 = \tilde{U}_2(:,1{:}r_2)$.
        Compute the SVD $A_{(3)}(U_2 \otimes U_1) = \tilde{U}_3\Sigma_3 V_3^T$ and set $U_3 = \tilde{U}_3(:,1{:}r_3)$.
    $U_1^{(\mathrm{opt})} = U_1$,  $U_2^{(\mathrm{opt})} = U_2$,  $U_3^{(\mathrm{opt})} = U_3$.

Using the HOSVD to generate an initial guess makes sense given the discussion in Sect. 7.5. The matrix-matrix products, e.g., $A_{(1)}(U_3 \otimes U_2)$, are rich in exploitable Kronecker structure. See [29] for further details.
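A compact NumPy version of this alternating framework (often called higher-order orthogonal iteration in the literature) is sketched below; the helper names, the fixed number of sweeps, and the test dimensions are our choices, and the truncated HOSVD factors supply the initial guess as suggested above.

```python
# Alternating SVD framework for the Tucker nearness problem (sketch).
import numpy as np

def unfold(T, k):
    return np.reshape(np.moveaxis(T, k, 0), (T.shape[k], -1), order='F')

def tucker_nearness(A, r, iters=10):
    # Initial guess: leading left singular vectors of each unfolding (truncated HOSVD).
    U = [np.linalg.svd(unfold(A, k), full_matrices=False)[0][:, :r[k]] for k in range(3)]
    for _ in range(iters):
        for k in range(3):
            # Multiply A_(k) by the Kronecker product of the other two factors...
            others = [U[j] for j in range(3) if j != k]      # factors for the other modes
            M = unfold(A, k) @ np.kron(others[1], others[0])
            # ...and keep the r_k leading left singular vectors.
            U[k] = np.linalg.svd(M, full_matrices=False)[0][:, :r[k]]
    # Core tensor S = A x_1 U1^T x_2 U2^T x_3 U3^T.
    S = np.einsum('ijk,ia,jb,kc->abc', A, U[0], U[1], U[2])
    return S, U

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 5, 4))
S, U = tucker_nearness(A, (3, 3, 2))
A_hat = np.einsum('abc,ia,jb,kc->ijk', S, U[0], U[1], U[2])
print(np.linalg.norm(A - A_hat))   # Frobenius error of the rank-(3,3,2) Tucker fit
```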

7.7 A Jacobi Approach

Solving the Tucker nearness problem is not equivalent to maximizing the "diagonal mass" of the core tensor $S$. We briefly describe a Jacobi-like procedure that does. To motivate the main idea, consider the problem of maximizing $\operatorname{tr}(U_1^T A U_2)$ where $A \in \mathbb{R}^{n_1\times n_2}$ is given, $U_1 \in \mathbb{R}^{n_1\times n_1}$ is orthogonal, and $U_2 \in \mathbb{R}^{n_2\times n_2}$ is orthogonal. It is easy to show that the optimum $U_1$ and $U_2$ have the property that $U_1^T A U_2$ is diagonal with nonnegative diagonal entries. Thus, the SVD solves this particular "max trace" problem.

Now suppose $C$ is $n$-by-$n$-by-$n$ and define

$$\operatorname{tr}(C) = \sum_{i=1}^{n} c_{iii}.$$


Given $A \in \mathbb{R}^{n_1\times n_2\times n_3}$, our goal is to compute orthogonal $U_1 \in \mathbb{R}^{n_1\times n_1}$, $U_2 \in \mathbb{R}^{n_2\times n_2}$, and $U_3 \in \mathbb{R}^{n_3\times n_3}$ so that if the tensor $S$ is defined by

$$\operatorname{vec}(S) = (U_3 \otimes U_2 \otimes U_1)^T \operatorname{vec}(A),$$

then $\operatorname{tr}(S)$ is maximized. Here is a Jacobi-like strategy for updating the "current" orthogonal triplet $\{U_1, U_2, U_3\}$ so that the new core tensor has a larger trace.

A Jacobi Framework for Computing a Compressed Tucker Representation

    Given: $A \in \mathbb{R}^{n_1\times n_2\times n_3}$.
    Set $U_1 = I_{n_1}$, $U_2 = I_{n_2}$, $U_3 = I_{n_3}$, and $S = A$.
    Repeat:
        Find "simple" orthogonal $\tilde{U}_1$, $\tilde{U}_2$, and $\tilde{U}_3$ so that
            $\operatorname{tr}(S \times_1 \tilde{U}_1 \times_2 \tilde{U}_2 \times_3 \tilde{U}_3) > \operatorname{tr}(S)$
        Update:
            $S = S \times_1 \tilde{U}_1 \times_2 \tilde{U}_2 \times_3 \tilde{U}_3$
            $U_1 = U_1\tilde{U}_1$,  $U_2 = U_2\tilde{U}_2$,  $U_3 = U_3\tilde{U}_3$
    $U_1^{(\mathrm{opt})} = U_1$,  $U_2^{(\mathrm{opt})} = U_2$,  $U_3^{(\mathrm{opt})} = U_3$.

We say that this iteration strives for a "compressed" Tucker representation because there is an explicit attempt to compress the information in $A$ into a relatively small number of near-the-diagonal entries in $S$. One idea for $\tilde{U}_3 \otimes \tilde{U}_2 \otimes \tilde{U}_1$ is to use carefully designed Jacobi rotations, e.g.,

$$\tilde{U}_3 \otimes \tilde{U}_2 \otimes \tilde{U}_1 = \begin{cases} I_{n_3} \otimes J_{pq}(\beta) \otimes J_{pq}(\alpha)\\ J_{pq}(\beta) \otimes I_{n_2} \otimes J_{pq}(\alpha)\\ J_{pq}(\beta) \otimes J_{pq}(\alpha) \otimes I_{n_1} \end{cases}$$

Here, $J_{pq}(\theta)$ is a Jacobi rotation in planes $p$ and $q$. These updates modify only two diagonal entries: $s_{ppp}$ and $s_{qqq}$. Sines and cosines can be chosen to increase the resulting trace and their determination leads to a 2-by-2-by-2 Jacobi rotation subproblem.

For example, determine $c_\alpha = \cos(\alpha)$, $s_\alpha = \sin(\alpha)$, $c_\beta = \cos(\beta)$, and $s_\beta = \sin(\beta)$, so that if

$$\begin{bmatrix} \sigma_{ppp} & \sigma_{pqp}\\ \sigma_{qpp} & \sigma_{qqp} \end{bmatrix} = \begin{bmatrix} c_\alpha & s_\alpha\\ -s_\alpha & c_\alpha \end{bmatrix}^T \begin{bmatrix} s_{ppp} & s_{pqp}\\ s_{qpp} & s_{qqp} \end{bmatrix} \begin{bmatrix} c_\beta & s_\beta\\ -s_\beta & c_\beta \end{bmatrix}$$


and

$$\begin{bmatrix} \sigma_{ppq} & \sigma_{pqq}\\ \sigma_{qpq} & \sigma_{qqq} \end{bmatrix} = \begin{bmatrix} c_\alpha & s_\alpha\\ -s_\alpha & c_\alpha \end{bmatrix}^T \begin{bmatrix} s_{ppq} & s_{pqq}\\ s_{qpq} & s_{qqq} \end{bmatrix} \begin{bmatrix} c_\beta & s_\beta\\ -s_\beta & c_\beta \end{bmatrix}$$

then $\sigma_{ppp} + \sigma_{qqq}$ is maximized. See [20].

8 The CP Decomposition

As with the Tucker representation, the CP representation of a tensor expresses the tensor as a sum of rank-1 tensors. However, it does not involve orthogonality and the core tensor is truly diagonal, i.e., $s_{ijk} = 0$ unless $i = j = k$.

A note about terminology before we begin. The ideas behind the CP decomposition are very similar to the ideas behind CANDECOMP (Canonical Decomposition) and PARAFAC (Parallel Factors Decomposition). Thus, "CP" is an effective way to acknowledge the connections.

8.1 CP Representations: The Matrix Case

For matrices, the SVD

$$A = U_1\Sigma U_2^T = \sum_{i} \sigma_i\, U_1(:,i)\, U_2(:,i)^T$$

is an example of a CP decomposition. But an eigenvalue decomposition also qualifies. If $A$ is diagonalizable, then we have

$$A = U_1\operatorname{diag}(\lambda_i)\, U_2^T = \sum_{i} \lambda_i\, U_1(:,i)\, U_2(:,i)^T$$

where $U_2^T = U_1^{-1}$. Of course, orthogonality is part of the SVD and biorthogonality ($U_2^T U_1 = I$) figures in eigenvalue diagonalization. This kind of structure falls by the wayside when we graduate to tensors.


8.2 CP Representation: The Tensor Case

We use the order-3 situation to expose the key ideas. The CP representation for an $n_1\times n_2\times n_3$ tensor $A$ has the form

$$A = \sum_{k=1}^{r} \lambda_k\, U_1(:,k) \circ U_2(:,k) \circ U_3(:,k)$$

where the $\lambda$'s are real scalars and $U_1 \in \mathbb{R}^{n_1\times r}$, $U_2 \in \mathbb{R}^{n_2\times r}$, and $U_3 \in \mathbb{R}^{n_3\times r}$ have unit 2-norm columns. Alternatively, we have

$$A(i_1,i_2,i_3) = \sum_{j=1}^{r} \lambda_j\cdot U_1(i_1,j)\cdot U_2(i_2,j)\cdot U_3(i_3,j) \qquad (21)$$

$$\operatorname{vec}(A) = \sum_{j=1}^{r} \lambda_j\cdot U_3(:,j) \otimes U_2(:,j) \otimes U_1(:,j). \qquad (22)$$

In contrast to the Tucker representation,

$$A = \sum_{j_1=1}^{r_1}\sum_{j_2=1}^{r_2}\sum_{j_3=1}^{r_3} S(j_1,j_2,j_3)\cdot U_1(:,j_1) \circ U_2(:,j_2) \circ U_3(:,j_3),$$

we see that the CP representation involves a diagonal core tensor. The Tucker representation gives that up in exchange for orthonormal $U$ matrices.
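For concreteness, the sketch below (ours; `cp_to_tensor` is a name we introduce) materializes a tensor from a given CP representation in the two equivalent ways (21) and (22) and confirms that they agree.

```python
# Build a tensor from a CP representation, via (21) and via (22) (sketch).
import numpy as np

def cp_to_tensor(lam, U1, U2, U3):
    """Sum of lam[j] * U1(:,j) o U2(:,j) o U3(:,j), cf. (21)."""
    return np.einsum('j,ij,kj,lj->ikl', lam, U1, U2, U3)

rng = np.random.default_rng(8)
n1, n2, n3, r = 4, 3, 2, 3
U1, U2, U3 = (rng.standard_normal((n, r)) for n in (n1, n2, n3))
U1, U2, U3 = (U / np.linalg.norm(U, axis=0) for U in (U1, U2, U3))  # unit 2-norm columns
lam = rng.standard_normal(r)

A = cp_to_tensor(lam, U1, U2, U3)

# The same tensor via vec(A) = sum_j lam_j * U3(:,j) kron U2(:,j) kron U1(:,j), cf. (22).
vecA = sum(lam[j] * np.kron(U3[:, j], np.kron(U2[:, j], U1[:, j])) for j in range(r))
print(np.allclose(vecA, A.reshape(-1, order='F')))   # True
```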

8.3 More About Tensor Rank

As we mentioned in Sect. 5.3, the rank of a tensor $A$ is the minimum number of rank-1 tensors that sum to $A$. Thus, the length of the shortest possible CP representation of a tensor is its rank. Our analysis of the 2-by-2-by-2 situation indicates that there are several fundamental differences between tensor rank and matrix rank. Here are some more anomalies:

Anomaly 1. The largest rank attainable for an $n_1$-by-$\cdots$-by-$n_d$ tensor is called the maximum rank. There is no simple formula for it in terms of the dimensions $n_1,\ldots,n_d$; indeed, its precise value is known only for small examples. Maximum rank does not equal $\min\{n_1,\ldots,n_d\}$ unless $d \le 2$.

Anomaly 2. If the set of rank-$k$ tensors in $\mathbb{R}^{n_1\times\cdots\times n_d}$ has positive Lebesgue measure, then $k$ is a typical rank. Here are some examples where this quantity is known:


    Size         Typical ranks
    2 × 2 × 2    2, 3
    3 × 3 × 3    4
    3 × 3 × 4    4, 5
    3 × 3 × 5    5, 6

For $n_1$-by-$n_2$ matrices, typical rank and maximal rank are both equal to the smaller of $n_1$ and $n_2$.

Anomaly 3. The rank of a particular tensor over the real field may be different from its rank over the complex field.

Anomaly 4. It is possible for a tensor of a given rank to be approximated arbitrarily well by tensors of lesser rank. Such a tensor is said to be degenerate.

For more on the issue of tensor rank, see [10] and [14].

8.4 The Nearest CP Problem

Suppose $A \in \mathbb{R}^{n_1\times n_2\times n_3}$ and $r$ are given. The nearest CP approximation problem involves finding a vector $\lambda \in \mathbb{R}^r$ and matrices $U_1 \in \mathbb{R}^{n_1\times r}$, $U_2 \in \mathbb{R}^{n_2\times r}$, and $U_3 \in \mathbb{R}^{n_3\times r}$ (with unit 2-norm columns) so that

$$\Phi(U_1,U_2,U_3,\lambda) = \left\| A - \sum_{j=1}^{r} \lambda_j\cdot U_1(:,j) \circ U_2(:,j) \circ U_3(:,j) \right\|_F$$

is minimized. The objective function for this multilinear optimization problem has three different formulations:

$$\Phi(U_1,U_2,U_3,\lambda) = \begin{cases}
\left\| A_{(1)} - \sum_{j=1}^{r} \lambda_j\, U_1(:,j)\,\bigl(U_3(:,j) \otimes U_2(:,j)\bigr)^T \right\|_F\\[4pt]
\left\| A_{(2)} - \sum_{j=1}^{r} \lambda_j\, U_2(:,j)\,\bigl(U_3(:,j) \otimes U_1(:,j)\bigr)^T \right\|_F\\[4pt]
\left\| A_{(3)} - \sum_{j=1}^{r} \lambda_j\, U_3(:,j)\,\bigl(U_2(:,j) \otimes U_1(:,j)\bigr)^T \right\|_F
\end{cases}$$