418
UNCORRECTED PROOF Lecture Notes in Mathematics 2173 1 Editors-in-Chief: J.-M. Morel, Cachan 2 3 B. Teissier, Paris 4 Advisory Board: Camillo De Lellis, Zurich 5 6 Mario di Bernardo, Bristol 7 Michel Brion, Grenoble 8 Alessio Figalli, Zurich 9 Davar Khoshnevisan, Salt Lake City 10 Ioannis Kontoyiannis, Athens 11 Gabor Lugosi, Barcelona 12 Mark Podolskij, Aarhus 13 Sylvia Serfaty, New York 14 Anna Wienhard, Heidelberg 15 More information about this series at http://www.springer.com/series/304

Lecture Notes in Mathematics 2173 - Emory Universitybenzi/Web_papers/nsmail.pdf · PROOF Lecture Notes in Mathematics 2173 1 Editors-in-Chief: J.-M. Morel, Cachan 2 3 B. Teissier,

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

  • UNCO

    RREC

    TED

    PROO

    F

    Lecture Notes in Mathematics 2173 1

    Editors-in-Chief:J.-M. Morel, Cachan

    2

    3

    B. Teissier, Paris 4

    Advisory Board:Camillo De Lellis, Zurich

    5

    6

    Mario di Bernardo, Bristol 7Michel Brion, Grenoble 8Alessio Figalli, Zurich 9Davar Khoshnevisan, Salt Lake City 10Ioannis Kontoyiannis, Athens 11Gabor Lugosi, Barcelona 12Mark Podolskij, Aarhus 13Sylvia Serfaty, New York 14Anna Wienhard, Heidelberg 15

    More information about this series at http://www.springer.com/series/304

    http://www.springer.com/series/304

  • UNCO

    RREC

    TED

    PROO

    F

  • UNCO

    RREC

    TED

    PROO

    F

    Michele Benzi • Dario Bini • Daniel Kressner •Hans Munthe-Kaas • Charles Van Loan

    16

    17

    Editors 18

    Exploiting Hidden Structurein Matrix Computations:Algorithms and Applications

    19

    20

    21

    Cetraro, Italy 2015 22

    123

  • UNCO

    RREC

    TED

    PROO

    F

    EditorsMichele BenziMathematics and Science CenterEmory UniversityAtlanta, USA

    Dario BiniUniversita’ di PisaPisa, Italy

    Daniel KressnerÉcole Polytechnique Fédérale de LausanneLausanne, Switzerland

    Hans Munthe-KaasDepartment of MathematicsUniversity of BergenBergen, Norway

    Charles Van LoanDepartment of Computer ScienceCornell UniversityIthacaNew York, USA

    23

    ISSN 0075-8434 ISSN 1617-9692 (electronic)Lecture Notes in MathematicsISBN 978-3-319-49886-7 ISBN 978-3-319-49887-4 (eBook)DOI 10.1007/978-3-319-49887-4

    24

    Library of Congress Control Number: xxxxxxxxxx 25

    Mathematics Subject Classification: 65Fxx, 65Nxx 26

    © Springer International Publishing AG 2016 27This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part ofthe material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,broadcasting, reproduction on microfilms or in any other physical way, and transmission or informationstorage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodologynow known or hereafter developed.

    28

    29

    30

    31

    32

    The use of general descriptive names, registered names, trademarks, service marks, etc. in this publicationdoes not imply, even in the absence of a specific statement, that such names are exempt from the relevantprotective laws and regulations and therefore free for general use.

    33

    34

    35

    The publisher, the authors and the editors are safe to assume that the advice and information in this bookare believed to be true and accurate at the date of publication. Neither the publisher nor the authors orthe editors give a warranty, express or implied, with respect to the material contained herein or for anyerrors or omissions that may have been made.

    36

    37

    38

    39

    Printed on acid-free paper 40

    This Springer imprint is published by Springer Nature 41The registered company is Springer International Publishing AG 42The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

  • UNCO

    RREC

    TED

    PROO

    F

    Preface 1

    This volume collects the notes of the lectures delivered at the CIME Summer 2Course “Exploiting Hidden Structure in Matrix Computations. Algorithms and 3Applications,” held in Cetraro, Italy, from June 22 to June 26, 2015. 4

    The course focused on various types of hidden and approximate structure in 5matrices and on their key role in various emerging applications. Matrices with 6special structure arise frequently in scientific and engineering problems and have 7long been the object of study in numerical linear algebra, as well as in matrix and 8operator theory. For instance, banded matrices or Toeplitz matrices fall under this 9category. More recently, however, researchers have begun to investigate classes of 10matrices for which structural properties, while present, may not be immediately 11obvious. Important examples include matrices with low-rank off-diagonal block 12structure, such as hierarchical matrices, quasi-separable matrices, and so forth. 13Another example is provided by matrices in which the entries follow some type 14of decay pattern away from the main diagonal. The structural analysis of complex 15networks leads to matrices that appear at first sight to be completely unstructured; 16a closer look, however, may reveal a great deal of latent structure, for example, in 17the distribution of the nonzero entries and of the eigenvalues. Knowledge of these 18properties can be of great importance in the design of efficient numerical methods 19for such problems. 20

    In many cases, the matrix is only “approximately” structured; for instance, a 21matrix could be close in some norm to a matrix that has a desirable structure, such 22as a banded matrix or a semiseparable matrix. In such cases, it may be possible to 23develop solution algorithms that exploit the “near-structure” present in the matrix; 24for example, efficient preconditioners could be developed for the nearby structured 25problem and applied to the original problem. In other cases, the solution of the 26nearby problem, for which efficient algorithms exist, may be a sufficiently good 27approximation to the solution of the original problem, and the main difficulty could 28be detecting the “nearest” structured problem. 29

    Another very useful kind of structure is represented by various types of sym- 30metries. Symmetries in matrices and tensors are often linked to invariance under 31some group of transformations. Again, the structure, or near-structure, may not be 32

    v

  • UNCO

    RREC

    TED

    PROO

    F

    vi Preface

    immediately obvious in a given problem: uncovering the hidden symmetries (and 33underlying transformation group) in a problem can potentially lead to very efficient 34algorithms. 35

    The aim of this course was to present this increasingly important point of view 36to young researchers by exploiting the expertise of leading figures in this area, 37with different theoretical and application perspectives. The course was attended 38by 31 PhD students and young researchers, roughly half of them from Italy and 39the remaining ones from the USA, Great Britain, Germany, the Netherlands, Spain, 40Croatia, Czech Republic, and Finland. Many participants (about 50%) were at least 41partially supported by funding from CIME and the European Mathematical Society; 42other financial support was provided by the Università di Bologna. 43

    Two evenings (for a total of four hours) were devoted to short 15-min pre- 44sentations by 16 of the participants on their current research activities. These 45contributions were mostly of very high quality and very enjoyable, contributing to 46the success of the meeting. 47

    The notes collected in this volume are a faithful record of the material covered 48in the 28 h of lectures comprising the course. Charles Van Loan (Cornell University, 49USA) discussed structured matrix computations originating from tensor analysis, 50in particular tensor decompositions, with low-rank and Kronecker structure playing 51the main role. Dario Bini (Università di Pisa, Italy) covered matrices with Toeplitz 52and Toeplitz-like structure, as well as rank-structured matrices, with applications 53to Markov modeling and queueing theory. Daniel Kressner (EPFL, Switzerland) 54lectured on techniques for matrices with hierarchical low-rank structures, including 55applications to the numerical solution of elliptic PDEs; Jonas Ballani (EPFL) 56also contributed to the drafting of the corresponding lecture notes contained in 57this volume. Michele Benzi (Emory University) discussed matrices with decay, 58with particular emphasis on localization in matrix functions, with applications to 59quantum physics and network science. The lectures of Hans Munthe-Kaas (Univer- 60sity of Bergen, Norway) concerned the use of group-theoretic methods (including 61representation theory) in numerical linear algebra, with applications to the fast 62Fourier transform (“harmonic analysis on finite groups”) and to computational 63aspects of Lie groups and Lie algebras. 64

    We hope that these lecture notes will be of use to young researchers in numerical 65linear algebra and scientific computing and that they will stimulate further research 66in this area. 67

    We would like to express our sincere thanks to Elvira Mascolo (CIME Director) 68and especially to Paolo Salani (CIME Secretary), whose unfailing support proved 69crucial for the success of the course. We are also grateful to the European 70Mathematical Society for their endorsement and financial support, and to the 71Dipartimento di Matematica of Università di Bologna for additional support for 72the lecturers.AQ1 73

    Atlanta, GA, USA Michele Benzi 74Bologna, Italy Valeria Simoncini 75

  • UNCO

    RREC

    TED

    PROO

    F

    Acknowledgments 1

    CIME activity is carried out with the collaboration and financial support of INdAM 2(Istituto Nazionale di Alta Matematica) and Mathematics Department, University of 3Bologna. 4

    vii

  • UNCO

    RREC

    TED

    PROO

    F

    Contents 1

    Structured Matrix Problems from Tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2Charles F. Van Loan 3

    Matrix Structures in Queuing Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4Dario A. Bini 5

    Matrices with Hierarchical Low-Rank Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 6Jonas Ballani and Daniel Kressner 7

    Localization in Matrix Computations: Theory and Applications . . . . . . . . . . 211 8Michele Benzi 9

    Groups and Symmetries in Numerical Linear Algebra . . . . . . . . . . . . . . . . . . . . . . 319 10Hans Z. Munthe-Kaas 11

    ix

  • UNCO

    RREC

    TED

    PROO

    F

    AUTHOR QUERY

    AQ1. A slight mismatch has been noticed between source file (i.e., TeX) and PDFfile provided for preface and we have followed the source file (i.e., TeX).Please confirm if this is fine.

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 1

    Charles F. Van Loan 2

    Abstract This chapter looks at the structured matrix computations that arise in 3the context of various “svd-like” tensor decompositions. Kronecker products and 4low-rank manipulations are central to the theme. Algorithmic details include the 5exploitation of partial symmetries, componentwise optimization, and how we might 6beat the “curse of dimensionality.” Order-4 tensors figure heavily in the discussion.AQ2 7

    1 Introduction 8

    A tensor is a multi-dimensional array. Instead of just A.i; j/ as for matrices we have 9A.i; j; k; `; : : :/. High-dimensional modeling, cheap storage, and sensor technology 10combine to explain why tensor computations are surging in importance. Here is an 11annotated timeline that helps to put things in perspective: 12

    Scalar-Level Thinking

    19600s +

    Matrix-Level Thinking

    19800s +

    Block Matrix-Level Thinking

    20000s +

    Tensor-Level Thinking

    13

    C.F. Van Loan (�)Department of Computer Science, Cornell University, Ithaca, NY, USAAQ1e-mail: [email protected]

    © Springer International Publishing AG 2016M. Benzi, V. Simoncini (eds.), Exploiting Hidden Structure in Matrix Computations:Algorithms and Applications, Lecture Notes in Mathematics 2173,DOI 10.1007/978-3-319-49887-4_1

    1

    mailto:[email protected]

  • UNCO

    RREC

    TED

    PROO

    F

    2 C.F. Van Loan

    The factorization paradigm: LU, LDLT , QR, U˙VT ,etc.

    Memory traffic awareness:, cache, parallel computing,LAPACK, etc.

    Matrix-tensor connections: unfoldings, Kroneckerproduct, multilinear optimization, etc.

    14

    An important subtext is the changing definition of what we mean by a “big problem.” 15In matrix computations, to say that A 2 Rn1�n2is “big” is to say that both n1 and n2 16are big. In tensor computations, to say that A 2 Rn1�����nd is “big” is to say that 17n1n2 � � � nd is big and this does NOT necessarily require big nk. For example, no 18computer in the world (nowadays at least!) can store a tensor with modal dimensions 19n1 D n2 D � � � D n1000 D 2: What this means is that a significant part of the tensor 20research community is preoccupied with the development of algorithms that scale 21with d. Algorithmic innovations must deal with the “curse of dimensionality.” How 22the transition from matrix-based scientific computation to tensor-based scientific 23computation plays out is all about the fate of the “the curse.” 24

    This chapter is designed to give readers who are somewhat familiar with 25matrix computations an idea about the underlying challenges associated with tensor 26computations. These include mechanisms by which tensor computations are turned 27into matrix computations and how various matrix algorithms and decompositions 28(especially the SVD) turn up all along the way. An important theme throughout is 29the exploitation of Kronecker product structure. 30

    To set the tone we use Sect. 2 to present an overview of some remarkable “hidden 31structures” that show up in matrix computations. Each of the chosen examples 32has a review component and a message about tensor-based matrix computations. 33The connection between block matrices and order-4 tensors is used in Sect. 3 to 34introduce the idea of a tensor unfolding and to connect Kronecker products and 35tensor products. A simple nearest rank-1 tensor problem is used in Sect. 4 to 36showcase the idea of componentwise optimization, a strategy that is widely used 37in tensor computations. In Sect. 5 we show how Rayleigh quotients can be used to 38extend the notion of singular values and vectors to tensors. Transposition and tensor 39symmetry are discussed in Sect. 6. Extending the singular value decomposition to 40tensors can be done in a number of ways. We present the Tucker decomposition in 41Sect. 7, the CP decomposition in Sect. 8, the Kronecker product SVD in Sect. 9, and 42the tensor train SVD in Sect. 10. Cholesky with column pivoting also has a role to 43play in tensor computations as we show in Sect. 11. 44

    We want to stress that this chapter is a high-level, informal look at the kind of 45matrix problems that arise out of tensor computations. Implementation details and 46rigorous analysis are left to the references. To get started with the literature and for 47general background we recommend [7, 11, 16, 17, 19, 29]. 48

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 3

    2 The Exploitation of Structure in Matrix Computations 49

    We survey five interesting matrix examples that showcase the idea of hidden 50structure. By “‘hidden” we mean “not obvious”. In each case the exploitation of 51the hidden structure has important ramifications from the computational point of 52view. 53

    2.1 Exploiting Data Sparsity 54

    The n-by-n discrete Fourier transform matrix Fn is defined by 55

    ŒFn�kq D !kqn !n D cos�2�

    n

    �� i sin

    �2�

    n

    �56

    where we are subscripting from zero. Thus, 57

    F4 D

    2666664

    1 1 1 1

    1 !4 !24 !

    34

    1 !24 !44 !

    64

    1 !34 !64 !

    94

    3777775D

    2666664

    1 1 1 1

    1 �i �1 i1 �1 1 �11 i �1 �i

    3777775: 58

    If we carefully reorder the columns of F2m, then copies of Fm magically appear, e.g., 59

    F4

    2666664

    1 0 0 0

    0 0 1 0

    0 1 0 0

    0 0 0 1

    3777775D

    2666664

    1 1 1 1

    1 �1 �i i1 1 �1 �11 �1 i �i

    3777775D"

    F2 ˝2F2

    F2 �˝2F2

    #60

    where 61

    ˝2 D�1 0

    0 �i�: 62

    In general we have 63

    F2m˘2;m D"

    Fm ˝mFm

    Fm �˝mFm

    #(1)

  • UNCO

    RREC

    TED

    PROO

    F

    4 C.F. Van Loan

    where ˘2;m is a perfect shuffle permutation (to be described in Sect. 3.7) and ˝m is 64the diagonal matrix 65

    ˝m D diag.1; !n; : : : ; !m�1n /: 66

    The DFT matrix is dense, but by exploiting the recursion (1) it can be factored into 67a product of sparse matrices, e.g., 68

    F1024 D A10 � � �A2A1PT : 69

    Here, P is the bit reversal permutation and each Ak has just two nonzeros per row. 70It is this structured factorization that makes it possible to have a fast, O.n log n/ 71Fourier transform, e.g., 72

    73y D PTx74for k D 1 W 1075y D Aky76end

    See [31] for a detailed “matrix factorization” treatment of the FFT. 77The DFT matrix Fn is data sparse meaning that it can be represented with many 78

    fewer than n2 numbers. Other examples of data sparsity include matrices that have 79low-rank and matrices that are Kronecker products. Many tensor problems lead to 80matrix problems that are data sparse. 81

    2.2 Exploiting Structured Eigensystems 82

    Suppose A;F;G 2 Rn�nand that both F and G are symmetric. The matrix M defined 83by 84

    M D"

    A F

    G �AT#

    F D FT ; G D GT 85

    is said to be a Hamiltonian matrix. The eigenvalues of a Hamiltonian matrix come 86in plus-minus pairs and the eigenvectors associated with such a pair are related: 87

    M

    �yz

    �D �

    �yz

    �) MT

    �z�y�D ��

    �z�y�: 88

    Hamiltonian structure can also be defined through a permutation similarity. If 89

    J2n D"0 In

    �In 0

    #90

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 5

    then M 2 R2n�2nis Hamiltonian if JT2nMJ2n D �MT . Under mild assumptions we 91can compute a structured Schur decomposition for a Hamiltonian matrix M: 92

    QTMQ D"

    Q1 Q2

    �Q2 Q1

    #TM

    "Q1 Q2

    �Q2 Q1

    #D"

    T11 T12

    0 �TT11

    #: 93

    Here, Q is orthogonal and symplectic (JT2nQJ2n D Q�T ) and T11 is upper quasi- 94triangular. Various Ricatti equation problems can be solved by exploiting this 95structured decomposition. See [23]. 96

    Tensors with multiple symmetries can be reshaped into matrices with multiple 97symmetries and these matrices have structured block factorizations. 98

    2.3 Exploiting the Right Representation 99

    Here is an example of a Cauchy-like matrix: 100

    A D

    26666666664

    r1s1!1 � �1

    r1s2!1 � �2

    r1s3!1 � �3

    r1s4!1 � �4

    r2s1!2 � �1

    r2s2!2 � �2

    r2s3!2 � �3

    r2s4!2 � �4

    r3s1!3 � �1

    r3s2!3 � �2

    r3s3!3 � �3

    r3s4!3 � �4

    r4s1!4 � �1

    r4s2!4 � �2

    r4s3!4 � �3

    r4s4!4 � �4

    37777777775: 101

    For this to be defined we must have f�1; : : : ; �ng [ f�1; : : : ; �ng D ;. Cauchy-like 102matrices are data sparse and a particularly clever characterization of this fact is to 103note that if ˝ D diag.!i/ and� D diag.�i/, then 104

    ˝A� A� D rsT (2)

    where r; s 2 Rn. If˝A�A� has rank r, then we say that A has displacement rank r 105(with respect to ˝ and �.) Thus, a Cauchy-like matrix has unit displacement rank. 106

    Now let us consider the first step of Gaussian elimination. Ignoring pivoting this 107involves computing a row of U, a column of L, and a rank-1 update: 108

    A D

    266664

    1 0 0 0

    `21 1 0 0

    `31 0 1 0

    `41 0 0 1

    377775

    266664

    1 0 0 0

    0 b22 b23 b24

    0 b32 b33 b34

    0 b42 b43 b44

    377775

    266664

    u11 u12 u13 u14

    0 1 0 0

    0 0 1 0

    0 0 0 1

    377775 : 109

  • UNCO

    RREC

    TED

    PROO

    F

    6 C.F. Van Loan

    It is easy to compute the required entries of L and U from the displacement rank 110representation (2). What about B? If we represent B conventionally as an array 111then that would involve O.n2/ flops. Instead we exploit the fact that B also has 112unit displacement rank: 113

    Q̋ B � B Q� D QrQsT : 114

    It turns out that we can transition from A’s representation f˝;�; r; sg to B’s 115representation f Q̋ ; Q�; Qr; Qsg with O.n/ work and this enables us to compute the LU 116factorization of a Cauchy-like matrix with just O.n2/ work. See [11, p. 682]. 117

    Being able to work with clever representations is often the key to having a 118successful solution framework for a tensor problem. 119

    2.4 Exploiting Orthogonality Structures 120

    Assume that the columns of the 2-by-1 block matrix 121

    Q D"

    Q1

    Q2

    #122

    are orthonormal, i.e., QT1Q1 C QT2Q2 D I. The CS decomposition says that Q1 and 123Q2 have related SVDs: 124

    "U1 0

    0 U2

    #T "Q1

    Q2

    #V D

    "diag.ci/

    diag.si/

    #c2i C s2i D 1 (3)

    where U1, U2, and V are orthogonal. This truly remarkable hidden structure can be 125used to compute stably the generalized singular value decomposition (GSVD) of a 126A1 2 Rm1�nand A2 2 Rm2�n. In particular, suppose we compute the QR factorization 127

    "A1

    A2

    #D"

    Q1

    Q2

    #R; 128

    and then the CS decomposition (3). By setting X D RTV , we obtain the GSVD 129

    A1 D U1 � diag.ci/ � XT A2 D U2 � diag.si/ � XT : 130

    For a more detailed discussion about the GSVD and the CS decomposition, see [11, 131p. 309]. 132

    A tensor decomposition can often be regarded as simultaneous decomposition of 133its (many) matrix “layers”. 134

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 7

    2.5 Exploiting a Structured Data layout 135

    Suppose we have a block matrix 136

    A D

    266664

    A11 A12 � � � A1NA21 A22 � � � A2N:::

    :::: : :

    :::

    AM1 AM2 � � � AMN

    377775 137

    that is stored in such a way that the data in each Aij is contiguous in memory. 138Note that there is nothing structured about “A-the-matrix”. However, “A-the-stored- 139array” does have an exploitable structure that we now illustrate by considering the 140computation of C D AT . We start by transposing each of A’s blocks: 141

    266664

    B11 B12 � � � B1NB21 B22 � � � B2N:::

    :::: : :

    :::

    BM1 BM2 � � � BMN

    377775

    266664

    AT11 AT12 � � � AT1N

    AT21 AT22 � � � AT2N

    ::::::: : :

    :::

    ATM1 ATM2 � � � ATMN

    377775 : 142

    These block transpositions involve “local data” and this is important because 143moving data around in a large scale matrix computation is typically the dominant 144cost. Next, we transpose B as a block matrix: 145

    266664

    C11 C12 � � � C1MC21 C22 � � � C2M:::

    :::: : :

    :::

    CN1 C2N � � � CNM

    377775

    266664

    B11 B21 � � � BM1B12 B22 � � � BM2:::

    :::: : :

    :::

    B1N B2N � � � BMN

    377775 146

    Again, this is a “memory traffic friendly” maneuver because blocks of contiguous 147data are being moved. It is easy to verify that Cij D ATji . 148

    What we have sketched is a “2-pass” transposition procedure. By blocking in 149a way that resonates with cache/local memory size and by breaking the overall 150transposition process down into a sequence of carefully designed passes, one can 151effectively manage the underlying dataflow. See [11, p. 711]. 152

    Transposition and looping are much more complicated with tensors because 153there are typically an exponential number of possible data structures and an ex- 154ponential number of possible loop nestings. Software tools that facilitate reasoning 155in this space are essential. See [2, 25, 30]. 156

  • UNCO

    RREC

    TED

    PROO

    F

    8 C.F. Van Loan

    3 Matrix-Tensor Connections 157

    Tensor computations typically get “reshaped” into matrix computations. To operate 158in this venue we need terminology and mechanisms for matricizing the tensor data. 159In this section we introduce the notion of a tensor unfolding and we get comfortable 160with Kronecker products and their properties. For more details see [11, 17, 19, 26, 16129]. 162

    3.1 Talking About Tensors 163

    An order-d tensor A 2 Rn1�����ndis a real d-dimensional array 164

    A.1 W n1; : : : ; 1 W nd/ 165

    where the index range in the k-th mode is from 1 to nk. Note that 166

    a

    8<:

    scalarvectormatrix

    9=; is an

    8<:

    order-0order-1order-2

    9=; tensor: 167

    We use calligraphic font to designate tensors e.g., A, B, C, etc. Sometimes we will 168write A for matrix A if it makes things clear. 169

    One way that tensors arise is through discretization. A.i; j; k; `/ might house the 170value of f .w; x; y; z/ at .w; x; y; z/ D .wi; xj; yk; z`/. In multiway analysis the value 171of A.i; j; k; `/ could measure the interaction between four variables/factors. See [3] 172and [29]. 173

    3.2 Tensor Parts: Fibers and Slices 174

    A fiber of a tensor A is a column vector obtained by fixing all but one A’s indices. 175For example, if A D A.1 W 3; 1 W 5; 1 W 4; 1 W 7/ 2 R3�5�4�7, then 176

    A.2; W; 4; 6/ D A.2; 1 W 5; 4; 6/ D

    266666664

    A.2; 1; 4; 6/A.2; 2; 4; 6/A.2; 3; 4; 6/A.2; 4; 4; 6/A.2; 5; 4; 6/

    377777775

    177

    is a mode-2 fiber. 178

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 9

    A slice of a tensor A is a matrix obtained by fixing all but two of A’s indices. For 179example, if A D A.1 W 3; 1 W 5; 1 W 4; 1 W 7/, then 180

    A.W; 3; W; 6/ D

    264A.1; 3; 1; 6/ A.1; 3; 2; 6/ A.1; 3; 3; 6/ A.1; 3; 4; 6/A.2; 3; 1; 6/ A.2; 3; 2; 6/ A.2; 3; 3; 6/ A.2; 3; 4; 6/A.3; 3; 1; 6/ A.3; 3; 2; 6/ A.3; 3; 3; 6/ A.3; 3; 4; 6/

    375 181

    is a slice. 182

    3.3 Order-4 Tensors and Block Matrices 183

    Block matrices with uniformly-sized blocks are reshaped order-4 tensors. For 184example, if 185

    C D

    266666666664

    c11 c12 c13 c14 c15 c16

    c21 c22 c23 c24 c25 c26

    c31 c32 c33 c34 c35 c36

    c41 c42 c43 c44 c45 c46

    c51 c52 c53 c54 c55 c56

    c61 c62 c63 c64 c65 c66

    377777777775

    186

    then matrix entry c45 is entry (2,1) of block (2,3). Thus, we can think of ŒCij�k` as 187the .i; j; k; `/ entry of a tensor C, e.g., 188

    c45 D ŒC23�21 D C.2; 3; 2; 1/: 189Working in the other direction we can unfold an order-4 tensor into a block 190

    matrix. Suppose A 2 Rn�n�n�n. Here is its “Œ1; 2� � Œ3; 4�” unfolding: 191

    AŒ1;2��Œ3;4� D

    266666666666666666664

    a1111 a1112 a1113 a1121 a1122 a1123 a1131 a1132 a1133

    a1211 a1212 a1213 a1221 a1222 a1223 a1231 a1232 a1233

    a1311 a1312 a1313 a1321 a1322 a1323 a1331 a1332 a1333

    a2111 a2112 a2113 a2121 a2122 a2123 a2131 a2132 a2133

    a2211 a2212 a2213 a2221 a2222 a2223 a2231 a2232 a2233

    a2311 a2312 a2313 a2321 a2322 a2323 a2331 a2332 a2333

    a3111 a3112 a3113 a3121 a3122 a3123 a3131 a3132 a3133

    a3211 a3212 a3213 a3221 a3222 a3223 a3231 a3232 a3233

    a3311 a3312 a3313 a3321 a3322 a3323 a3331 a3332 a3333

    377777777777777777775

    : 192

  • UNCO

    RREC

    TED

    PROO

    F

    10 C.F. Van Loan

    If A D AŒ1;2��Œ3;4� then the tensor-to-matrix mapping is given by 193

    A.i1; i2; i3; i4/ ! A.i1 C .i2 � 1/n; i3 C .i4 � 1/n/: 194

    An alternate unfolding results if modes 1 and 3 are associated with rows and modes 1952 and 4 are associated with columns: 196

    AŒ1;3��Œ2;4� D

    266666666666666666664

    a1111 a1112 a1113 a1211 a1212 a1213 a1311 a1312 a1313

    a1121 a1122 a1123 a1221 a1222 a1223 a1321 a1322 a1323

    a1131 a1132 a1133 a1231 a1232 a1233 a1331 a1332 a1333

    a2111 a2112 a2113 a2211 a2212 a2213 a2311 a2312 a2313

    a2121 a2122 a2123 a2221 a2222 a2223 a2321 a2322 a2323

    a2131 a2132 a2133 a2231 a2232 a2233 a2331 a2332 a2333

    a3111 a3112 a3113 a3211 a3212 a3213 a3311 a3312 a3313

    a3121 a3122 a3123 a3221 a3222 a3223 a3321 a3322 a3323

    a3131 a3132 a3133 a3231 a3232 a3233 a3331 a3332 a3333

    377777777777777777775

    : 197

    If A D AŒ1;3��Œ2;4� , then the tensor-to-matrix mapping is given by 198

    A.i1; i2; i3; i4/ ! A.i1 C .i3 � 1/n; i2 C .i4 � 1/n/: 199

    If a tensor A is structured, then different unfoldings reveal that structure in different 200ways [33]. The idea of block unfoldings is discussed in [26]. 201

    3.4 Modal Unfoldings 202

    A particularly important class of tensor unfoldings are the modal unfoldings. An 203order-d tensor has d modal unfoldings which we designate by A.1/; : : : ;A.d/. Let’s 204look at the tensor-to-matrix mappings for the case A 2 Rn1�n2�n3�n4 : 205

    A.i1; i2; i3; i4/! A.1/.i1; i2 C .i3 � 1/n2 C .i4 � 1/n2n3/

    A.i1; i2; i3; i4/! A.2/.i2; i1 C .i3 � 1/n1 C .i4 � 1/n1n3/

    A.i1; i2; i3; i4/! A.3/.i3; i1 C .i2 � 1/n1 C .i4 � 1/n1n2/

    A.i1; i2; i3; i4/! A.4/.i4; i1 C .i2 � 1/n1 C .i3 � 1/n1n2/:

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 11

    Note that if N D n1 � � � nd, then A.k/ is nk-by-N=nk and its columns are mode-k 206fibers. Thus, if n2 D 4 and n1 D n3 D n4 D 2, then 207

    A.2/ D

    266664

    a1111 a1121 a1112 a1122 a2111 a2121 a2112 a2122

    a1211 a1221 a1212 a1222 a2211 a2221 a2212 a2222

    a1311 a1321 a1312 a1322 a2311 a2321 a2312 a2322

    a1411 a1421 a1412 a1422 a2411 a2421 a2412 a2422

    377775 : 208

    Modal unfoldings arise naturally in many multilinear optimization settings. 209

    3.5 The vec Operation 210

    The vec operator turns tensors into column vectors. The vec of a matrix is obtained 211by stacking its columns: 212

    A 2 R3�2 ) vec.A/ D

    266666664

    a11a21a31a12a22a32

    377777775: 213

    The vec of an order-3 tensor A 2 Rn1�n2�n3 stacks the vecs of the slices 214A.W; W; 1/; : : : ;A.W; W; n3/, e.g., 215

    A 2 R2�2�2 ) vec.A/ D"

    vec.A.W; W; 1//vec.A.W; W; 2//

    #D

    2666666666664

    a111a211a121a221a112a212a122a222

    3777777777775

    : 216

    In general, if A 2 Rn1�����nd and the order d � 1 tensor Ak is defined by Ak D A.W 217; : : : ; W; k/, then we have the following recursive definition: 218

    vec.A/ D

    264

    vec.A1/:::

    vec.And /

    375 : 219

    Thus, vec unfolds a tensor into a column vector. 220

  • UNCO

    RREC

    TED

    PROO

    F

    12 C.F. Van Loan

    3.6 The Kronecker Product 221

    Unfoldings enable us to reshape tensor computations as matrix computations and 222Kronecker products are very often part of the scene. The Kronecker product A D 223B ˝ C of two matrices B and C is a block matrix .Aij/ where Aij D bijC, e.g., 224

    A D

    264

    b11 b12 b13

    b21 b22 b23

    b31 b32 b33

    375 ˝

    "c11 c12

    c21 c22

    #D

    266664

    b11C b12C b13C

    b21C b22C b23C

    b31C b32C b33C

    377775 : 225

    Of course, B ˝ C is also a matrix of scalars: 226

    A D

    2666666666666664

    b11c11 b11c12 b12c11 b12c12 b13c11 b13c12

    b11c21 b11c22 b12c21 b12c22 b13c21 b13c22

    b21c11 b21c12 b22c11 b22c12 b23c11 b23c12

    b21c21 b21c22 b22c21 b22c22 b23c21 b23c22

    b31c11 b31c12 b32c11 b32c12 b33c11 b33c12

    b31c21 b31c22 b32c21 b32c22 b33c21 b33c22

    3777777777777775

    : 227

    Note that every possible product bijck` “shows up” in B ˝ C. 228In general, if A1 2 Rm1�n1and A2 2 Rm2�n2, then A D A1˝A2 is an m1m2-by-n1n2 229

    matrix of scalars. It is also an m1-by-n1 block matrix with blocks that are m2-by-n2. 230Kronecker products can be applied in succession. Thus, if 231

    A D

    2664

    b11 b12

    b21 b22

    b31 b32

    3775 ˝

    2666664

    c11 c12 c13 c14

    c21 c22 c23 c24

    c31 c32 c33 c34

    c41 c42 c43 c44

    3777775˝

    2664

    d11 d12 d13 d14

    d21 d22 d23 d24

    d31 d32 d33 d34

    3775 ; 232

    then A is a 3-by-2 block matrix whose entries are 4-by-4 block matrices whose 233entries are 3-by-4 matrices. 234

    It is important to have a facility with the Kronecker product operation because 235they figure heavily in tensor computations. Here are three critical properties: 236

    .A ˝ B/.C ˝ D/ D AC ˝ BD.A ˝ B/�1 D A�1 ˝ B�1

    .A ˝ B/T D AT ˝ BT :

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 13

    Of course, in these expressions the various matrix products and inversions have to 237be defined. 238

    If the matrices A and B have structure, then their Kronecker product is typically 239structured in the same way. For example, if Q1 2 Rm1�n1and Q2 2 Rm2�n2 have 240orthonormal columns, then Q1 ˝ Q2 has orthonormal columns: 241

    .Q1 ˝ Q2/T.Q1 ˝ Q2/ D .QT1 ˝ QT2 /.Q1 ˝ Q2/D .QT1Q1/ ˝ .QT2Q2/ D In1 ˝ In2 D In1n2 :

    Kronecker products often arise through the “vectorization” of a matrix equation, 242e.g., 243

    C D BXAT , vec.C/ D .A ˝ B/ vec.X/: 244

    3.7 Perfect Shuffles, Kronecker Products, and Transposition 245

    In general, A1 ˝ A2 ¤ A2 ˝ A1. However, very structured permutation matrices 246P1 and P2 exist so that P1.A1 ˝ A2/P2 D A2 ˝ A1. Define the .p; q/-perfect shuffle 247matrix˘p;q 2 Rpq�pq by 248

    ˘p;q D�

    Ipq.W; 1 W q W pq/ j Ipq.W; 2 W q W pq/ j � � � j Ipq.W; q W q W pq/�: 249

    Here is an example: 250

    ˘3;2 D

    266666664

    1 0 0 0 0 0

    0 0 0 1 0 0

    0 1 0 0 0 0

    0 0 0 0 1 0

    0 0 1 0 0 0

    0 0 0 0 0 1

    377777775: 251

    In general A1 ˝ A2 ¤ A2 ˝ A1. However, if A1 2 Rm1�n1 and A2 2 Rm2�n2 then 252

    ˘m1;m2 .A1 ˝ A2/˘Tn1;n2 D A2 ˝ A1: 253

  • UNCO

    RREC

    TED

    PROO

    F

    14 C.F. Van Loan

    The perfect shuffle is also “behind the scenes” when the transpose of a matrix is 254taken, e.g., 255

    ˘3;2 vec.A/ D

    266666664

    1 0 0 0 0 0

    0 0 0 1 0 0

    0 1 0 0 0 0

    0 0 0 0 1 0

    0 0 1 0 0 0

    0 0 0 0 0 1

    377777775

    266666664

    a11a21a31a12a22a32

    377777775D

    266666664

    a11a12a21a22a31a32

    377777775D vec.AT/: 256

    In general, if A 2 Rm�n then ˘m;nvec.A/ D vec.AT/. See [13]. We return to these 257interconnections when we discuss tensor transposition in Sect. 6. 258

    3.8 Tensor Notation 259

    It is often perfectly adequate to illustrate a tensor computation idea using order-3 260examples. For example, suppose A 2 Rn1�n2�n3 , X1 2 Rm1�n1, X2 2 Rm2�n2, and 261X3 2 Rm3�n3 are given and that we wish to compute 262

    B.i1; i2; i3/ Dn1X

    j1D1

    n2Xj2D1

    n3Xj3D1

    A. j1; j2; j3/X1.i1; j1/X2.i2; j2/X1.i3; j3/ 263

    where 1 � i1 � m1, 1 � i2 � m2, and 1 � i3 � m3. Here we are using matrix-like 264subscript notation to spell out the definition of B. We could probably use the same 265notation to describe the order-4 version of this computation. However, for higher- 266order cases we have to resort to the dot-dot-dot notation and it gets pretty unwieldy: 267

    B.i1; : : : ; id/ Dn1X

    j1D1

    n2Xj2D1� � �

    ndXjdD1

    A. j1; : : : ; jd/X1.i1; j1/ � � �Xd.id; jd/: 268

    2691 � i1 � m1; 1 � i2 � m2; : : : ; 1 � id � md 270

    One way to streamline the presentation of such a calculation is to “vectorize” the 271notation using bold font to indicate vectors of subscripts. Multiple summations can 272also be combined through vectorization. Thus, if 273

    i D Œi1; : : : ; id�; j D Œ j1; : : : ; jd�; m D Œm1; : : : ;md�; n D Œn1; : : : ; nd�; 274

    then the B tensor given above can be expressed as follows: 275

    B.i/ DnX

    jD1A.j/X1.i1; j1/ � � �Xd.id; jd/; 1 � i � m: 276

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 15

    Here, 1 D Œ1; 1; : : : ; 1�. As another example of this notation, if n D Œn1; : : : ; nd � 277and A 2 Rn, then 278

    kA kF Dvuut nX

    iD1A.i/2 279

    is its Frobenius norm. We shall make use of this vectorized notation whenever it is 280necessary to hide detail and/or when we are working with tensors of arbitrary order. 281

    Finally, it is handy to have a MATLAB “reshape” notation. Suppose n D 282Œ n1; : : : ; nd � and m D Œ m1; : : : ; me �. If A 2 Rn and n1 � � � nd D m1 � � �me, 283then 284

    B D reshape.A;m/ 285

    is the m1 � � � � � me tensor defined by vec.A/ D vec.B/: 286

    3.9 The Tensor Product 287

    On occasion it is handy to talk about operations between tensors without recasting 288the discussion in the language of matrices. Suppose n D Œn1; : : : ; nd� and that B; C 2 289Rn. We can multiply a tensor by a scalar, 290

    A D ˛B , A.i/ D ˛B.i/; 1 � i � n 291

    and we can add one tensor to another, 292

    A D BC C , A.i/ D B.i/C C.i/; 1 � i � n: 293

    Slightly more complicated is the tensor product which is a way of multiplying 294two tensors together to obtain a new, higher order tensor. For example if B 2 295Rm1�m2�m3 D Rm and C 2 Rn1�n2 D Rn, then the tensor product A D B ı C is 296defined by 297

    A.i1; i2; i3; j1; j2/ D B.i1; i2; i3/C. j1; j2/ 298

    i.e., A.i; j/ D B.i/ C.j/ for all 1 � i � m and 1 � j � n. 299If B 2 Rm and C 2 Rn, then there is a connection between the tensor product 300

    A D B ı C and its m-by-n unfolding: 301

    Am�n D vec.B/vec.C/T : 302

  • UNCO

    RREC

    TED

    PROO

    F

    16 C.F. Van Loan

    There is also a connection between the tensor product of two vectors and their 303Kronecker product: 304

    vec.x ı y/ D vec.xyT/ D y ˝ x: 305

    Likewise, if B and C are order-2 tensors and A D B ı C, then 306

    AŒ1;3��Œ2;4� D B ˝ C: 307

    The point of all this notation-heavy discussion is to stress the importance of 308flexibility and point of view. Whether we write A.i/ or A.i1; i2; i3/ or ai1i2i3 depends 309on the context, what we are trying to communicate, and what typesets nicely!. 310Sometimes it will be handy to regard A as a vector such as vec.A/ and sometimes 311as a matrix such as A.3/. Algorithmic insights in tensor computations frequently 312require an ability to “reshape” how the problem at hand is viewed. 313

    4 A Rank-1 Tensor Problem 314

    Rank-1 matrices have a prominent role to play in matrix computations. For example, 315one step of Gaussian elimination involves a rank-1 update of a submatrix. The 316SVD decomposes a matrix into a sum of very special rank-1 matrices. Quasi- 317Newton methods for nonlinear systems involve rank-1 modifications of the current 318approximate Jacobian matrix. 319

    In this section we introduce the concept of a rank-1 tensor and consider how we 320might approximate a given tensor with such an entity. This leads to a discussion 321(through an example) of multilinear optimization. 322

    4.1 Rank-1 Matrices 323

    If u and v are vectors, then A D uvT is a rank-1 matrix, e.g., 324

    A D24 u1u2

    u3

    35�v1

    v2

    �TD24 u1v1 u1v2u2v1 u2v2

    u3v1 u3v2

    35 : 325

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 17

    Note that if A D uvT , then vec.A/ D v ˝ u and so we have 326266666664

    a11a21a31a12a22a32

    377777775D

    266666664

    u1v1u2v1u3v1u1v2u2v2u3v2

    377777775D�v1

    v2

    �˝24 u1u2

    u3

    35 : 327

    4.2 Rank-1 Tensors 328

    How can we extend the rank-1 idea from matrices to tensors? In matrix computa- 329tions we think of rank-1 matrices as outer products, i.e., A D uvT where u and v are 330vectors. Thinking of matrix A as tensor A, we see that it is just the tensor product of 331u and v: A.i1; i2/ D u.i1/v.i2/. Thus, we have 332

    A D u ı v D24u1u2

    u3

    35 ı

    �v1v2

    �, vec.A/ D

    266666664

    u1v1u2v1u3v1u1v2u2v2u3v2

    377777775D v ˝ u: 333

    Here is an order-3 example of the same idea: 334

    A D uıvıw D24 u1u2

    u3

    35ı�v1v2

    �ı�

    w1w2

    �, vec.A/ D

    26666666666666666664

    u1v1w1u2v1w1u3v1w1u1v2w1u2v2w1u3v2w1u1v1w2u2v1w2u1v2w2u2v2w2u3v2w2

    37777777777777777775

    D w˝v˝u: 335

    Each entry in A is a product of entries from u, v, and w: A.p; q; r/ D upvqwr. 336In general, a rank-1 tensor is a tensor product of vectors. To be specific, if x.i/ 2 337

    Rni for i D 1; : : : ; d, then 338

    A D x.1/ ı � � � ı x.d/ 339

  • UNCO

    RREC

    TED

    PROO

    F

    18 C.F. Van Loan

    is a rank-1 tensor whose entries are defined by A.i/ D x.1/i1 � � � x.d/id . In terms of the 340Kronecker product we have vec.x.1/ ı � � � ı x.d// D x.d/ ˝ � � � ˝ x.1/. 341

    4.3 The Nearest Rank-1 Problem for Matrices 342

    Given a matrix A 2 Rm�n, consider the minimization of 343

    �.�; u; v/ D k A � �uvT kF 344

    where u 2 Rm and v 2 Rn have unit 2-norm and � is a nonnegative scalar. This is an 345SVD problem for if UTAV D ˙ D diag.�i/ is the SVD of A and �1 � � � � � �n � 0, 346then �.�; u; v/ is minimized by setting � D �1, u D U.W; 1/, and v D V.W; 1/. 347

    Instead of explicitly computing the entire SVD, we can compute �1 and its 348singular vectors using an alternating least squares approach. The starting point is to 349realize that 350

    k A � �uvT k2F D tr.ATA/� 2�uTAv C �2: 351

    where tr.M/ indicates the trace of a matrix M, i.e., the sum of its diagonal entries. 352Note that 353

    �u.y/ D tr.ATA/ � 2yTAv C k y k22 354

    is minimized by setting y D Av and that 355

    �v.x/ D tr.ATA/� 2uTAxC k x k22 356

    is minimized by setting x D ATu. This suggests the following iterative framework 357for minimizing k A � �uvT kF: 358

    Nearest Rank-1 Matrix

    359Given: A 2 Rm�n, v 2 Rn, k v k2 D 1360Repeat:

    361Fix v and choose � and u to minimize k A � �uvT kF W362y D AvI � D k y kI u D y=�363Fix u and choose � and v to minimize k A � �uvT kF W364x D ATuI � D k x kI v D x=�365�opt D � I uopt D uI vopt D v

    366

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 19

    This is basically just the power method applied to the matrix 367

    sym.A/ D"0 A

    AT 0

    #: 368

    The reason for bringing up this alternating least squares framework is that it readily 369extends to tensors. 370

    4.4 A Nearest Rank-1 Tensor Problem 371

    Given A 2 Rm�n�p, we wish to determine unit vectors u 2 Rm, v 2 Rn, and w 2 Rp 372and a scalar � so that the following is minimized: 373

    kA � � � u ı v ı w kF Dvuut mX

    iD1

    nXjD1

    pXkD1.aijk � uivjwk/2: 374

    Noting that 375

    kA � � � u ı v ı w kF D k vec.A/ � � � w ˝ v ˝ u k2 376

    we obtain the following alternating least squares framework: 377

    Nearest Rank-1 Tensor

    378Given: A 2 Rm�n�p and unit vectors v 2 Rn and w 2 Rp.379Repeat:

    380Determine x 2 Rm that minimizes k vec.A/ � w ˝ v ˝ x k2381and set � D k x k and u D x=�:382Determine y 2 Rn that minimizes k vec.A/ � w ˝ y ˝ u k2383and set � D k y k and v D y=�:384Determine z 2 Rp that minimizes k vec.A/ � z ˝ v ˝ u k2385and set � D k z k and w D z=�:386�opt D �; uopt D u; vopt D v; wopt D w

    387

  • UNCO

    RREC

    TED

    PROO

    F

    20 C.F. Van Loan

    It is instructive to examine some of the details associated with this iteration for the 388case m D n D p D 2. The objective function has the form 389

    �.�; 1; 2; 3/ D

    �������������������

    266666666666664

    a111a211a121a221a112a212a122a222

    377777777777775

    � � � w ˝ v ˝ u

    �������������������

    D

    �������������������

    266666666666664

    a111a211a121a221a112a212a122a222

    377777777777775

    � � �

    266666666666664

    c3c2c1c3c2s1c3s2c1c3s2s1s3c2c1s3c2s1s3s2c1s3s2s1

    377777777777775

    �������������������

    390

    where 391

    u D�

    cos.1/sin.1/

    �D�

    c1s1

    �; v D

    �cos.2/sin.2/

    �D�

    c2s2

    �; w D

    �cos.3/sin.3/

    �D�

    c3s3

    �: 392

    Let us look at the three structured linear least squares problems that arise during 393each iteration. 394

    (1) To improve 1 and � , we fix 2 and 3 and minimize 395�����������������

    2666666666664

    a111a211a121a221a112a212a122a222

    3777777777775

    � � �

    2666666666664

    c3c2c1c3c2s1c3s2c1c3s2s1s3c2c1s3c2s1s3s2c1s3s2s1

    3777777777775

    �����������������

    D

    �����������������

    2666666666664

    a111a211a121a221a112a212a122a222

    3777777777775

    2666666666664

    c3c2 00 c3c2

    c3s2 00 c3s2

    s3c2 00 s3c2

    s3s2 00 s3s2

    3777777777775

    �x1y1

    ������������������

    396

    with respect to x1 and y1. We then set � Dq

    x21 C y21 and u D Œx1 y1�T=�: 397(2) To improve 2 and � , we fix 1 and 3 and minimize 398

    �����������������

    2666666666664

    a111a211a121a221a112a212a122a222

    3777777777775

    � � �

    2666666666664

    c3c2c1c3c2s1c3s2c1c3s2s1s3c2c1s3c2s1s3s2c1s3s2s1

    3777777777775

    �����������������

    D

    �����������������

    2666666666664

    a111a211a121a221a112a212a122a222

    3777777777775

    2666666666664

    c3c1 0c3s1 00 c3c10 c3s1

    s3c1 0s3s1 00 s3c10 s3s1

    3777777777775

    �x2y2

    ������������������

    399

    with respect to x2 and y2. We then set � Dq

    x22 C y22 and v D Œx2 y2�T=�: 400

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 21

    (3) To improve 3 and � , we fix 1 and 2 and minimize 401

    �����������������

    2666666666664

    a111a211a121a221a112a212a122a222

    3777777777775

    � � �

    2666666666664

    c3c2c1c3c2s1c3s2c1c3s2s1s3c2c1s3c2s1s3s2c1s3s2s1

    3777777777775

    �����������������

    D

    �����������������

    2666666666664

    a111a211a121a221a112a212a122a222

    3777777777775

    2666666666664

    c2c1 0c2s1 0s2c1 0s2s1 00 c2s10 c2s10 s2c10 s2s1

    3777777777775

    �x3y3

    ������������������

    402

    with respect to x3 and y3. We then set � Dq

    x23 C y23 and w D Œx3 y3�T=�: 403Componentwise optimization is a common framework for many tensor-related 404

    computations. The basic idea is to choose a subset of the unknowns and (temporar- 405ily) freeze their value. This leads to a simplified optimization problem involving the 406other unknowns. The process is repeated using different subsets of unknowns each 407iteration until convergence. The framework is frequently successful, but there is a 408tendency for the iterates to get trapped near an uninteresting local minima. 409

    5 The Variational Approach to Tensor Singular Values 410

    If �2 is a zero of the characteristic polynomial p.�/ D det.ATA � �I/, then � is a 411singular value of A and the associated left and right singular vectors are eigenvectors 412for AAT and ATA respectively. How can we extend these notions to tensors? Is there 413a version of the characteristic polynomial that makes sense for tensors? What would 414be the analog of the matrices AAT and ATA? These are tough questions. Fortunately, 415there is a constructive way to avoid these difficulties and that is to take a variational 416approach. Singular values and vectors are solutions to a very tractable optimization 417problem. 418

    5.1 Rayleigh Quotient/Power Method Ideas: The Matrix Case 419

    The singular values and singular vectors of a general matrix A 2 Rm�n are the 420stationary values and vectors of the Rayleigh quotient 421

    yTAx

    k x k2k y k2: 422

  • UNCO

    RREC

    TED

    PROO

    F

    22 C.F. Van Loan

    It is a slightly more convenient to pose this as a constrained optimization problem: 423singular values and singular vectors of a general matrix A 2 Rm�n are the stationary 424values and vectors of 425

    A.x; y/ D xTAy DmX

    iD1

    nXjD1

    aijxiyj 426

    subject to the constraints k x k2 D k y k2 D 1. To connect this “definition” to the 427SVD we use the method of Lagrange multipliers and that means looking at the 428gradient of 429

    Q A.x; y/ D .x; y/ � �2.xTx � 1/� �

    2.yTy � 1/: 430

    Using the rearrangements 431

    A.x; y/ DmX

    iD1xi

    0@ nX

    jD1aijyj

    1A D

    nXjD1

    yj

    mX

    iD1aijxi

    !; 432

    it follows that 433

    r Q A.x; y/ D"

    Ay � �xATx � �y

    #: (4)

    From the vector equation r Q A.x; y/ D 0 we conclude that � D � D xTAy D 434 A.x; y/ and that x and y satisfy Ay D .xTAy/x and ATx D .xTAy/y. That is to 435say, x is an eigenvector of ATA and y is an eigenvector of AAT and the associated 436eigenvalue in each case is .yTAx/2. These are exactly the conclusions that can be 437reached by equating columns in the SVD equation AV D U˙ . Indeed, from 438

    AŒ v1 j � � � j vn � D Œ u1 j � � � j un � diag.�1; : : : ; �n/ 439

    we see that Avi D �iui, ATui D �ivi, and �i D uTi Avi, for i D 1 W n. 440

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 23

    The power method for matrices can be designed to go after the largest singular 441value and associated singular vectors: 442

    Power Method for Matrix SingularValues and Vectors

    Given A 2 Rm�n and unit vector y 2 Rn.Repeat:

    Qx D Ay; x D Qx=k Qx kQy D ATx; y D Qy=k Qy k� D A.x; y/ D yTAx

    �opt D � , uopt D y, vopt D x

    449

    This can be viewed as an alternating procedure for finding a zero for the gradient (4). 450Under mild assumptions, f�opt; uopt; voptgwill approximate the largest singular value 451of A and the corresponding left and right singular vectors. In principle, deflation can 452be used to find other singular value triplets. Thus, by applying the power method to 453A � �1u1vT1 we could obtain an estimate of f�2; u2; v2g. 454

    5.2 Rayleigh Quotient/Power Method Ideas: The Tensor Case 455

    Let us extend the Rayleigh quotient characterization for matrix singular values and 456vectors to tensors. We work out the order-3 situation for simplicity. 457

    If A 2 Rn1�n2�n3 , x 2 Rn1 , y 2 Rn2 , and z 2 Rn3 , then the singular values and 458vectors of A are the stationary values and vectors of 459

    A.x; y; z/ Dn1X

    iD1

    n2XjD1

    n3XkD1

    aijk xiyjzk (5)

    subject to the constraints k x k2 D k y k2 D k z k2 D 1. Before we start 460taking gradients we present three alternative formulations of this summation. Each 461highlights a different unfolding of the tensor A. 462

    The Mode-1 Formulation 463

    A.x; y; z/ Dn1X

    iD1xi

    0@ n2X

    jD1

    n3XkD1

    aijk yjzk

    1A D xTA.1/z ˝ y (6)

  • UNCO

    RREC

    TED

    PROO

    F

    24 C.F. Van Loan

    where A.1/.i; jC .k � 1/n2/ D aijk. For the case n D Œ4; 3; 2� we have 464

    A.1/ D

    2664

    a111 a121 a131 a112 a122 a132a211 a221 a231 a212 a222 a232a311 a321 a331 a312 a322 a332a411 a421 a431 a412 a422 a432

    3775 :

    .1;1/ .2;1/ .3;1/ .1;2/ .2;2/ .3;2/

    465

    The Mode-2 Formulation 466

    A.x; y; z/ Dn2X

    jD1yj

    n1X

    iD1

    n3XkD1

    aijk xizk

    !D yTA.2/z ˝ x (7)

    where A.2/. j; iC .k � 1/n1/ D aijk. For the case n D Œ4; 3; 2� we have 467

    A.2/ D24 a111 a211 a311 a411 a112 a212 a312 a412a121 a221 a321 a421 a122 a222 a322 a422

    a131 a231 a331 a431 a132 a232 a332 a432

    35 :

    .1;1/ .2;1/ .3;1/ .4;1/ .1;2/ .2;2/ .3;2/ .4;2/

    468

    The Mode-3 Formulation 469

    A.x; y; z/ DpX

    kD1zk

    0@ mX

    iD1

    nXjD1

    aijk xiyj

    1A D zTA.3/y ˝ x (8)

    where A.3/.k; iC . j� 1/n1/ D aijk. For the case n D Œ4; 3; 2� we have 470

    A.3/ D�

    a111 a211 a311 a411 a121 a221 a321 a421 a131 a231 a331 a431a112 a212 a312 a412 a122 a222 a322 a422 a132 a232 a332 a432

    �:

    .1;1/ .2;1/ .3;1/ .4;1/ .1;2/ .2;2/ .3;2/ .4;2/ .1;3/ .2;3/ .3;3/ .4;3/

    471

    The matrices A.1/, A.2/, and A.3/ are the mode-1, mode-2, and mode-3 unfoldings of 472A that we introduced in Sect. 3.4. It is handy to identify columns with multi-indices 473as we have shown. 474

    We return to the constrained minimization of the objective function A.x; y; z/) 475that is defined in (5). Using the method of Lagrange multipliers we set the gradient 476of 477

    Q A.x; y; z/ D A.x; y; z/ � �2.xTx � 1/� �

    2.yTy � 1/�

    2.zTz � 1/ 478

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 25

    to zero. Using (6)–(8) we get 479

    r Q A D

    2664A.1/.z ˝ y/ � � xA.2/.z ˝ x/ � � yA.3/.y ˝ x/ � z

    3775 D

    26640

    0

    0

    3775 : (9)

    Since x, y, and z are unit vectors it follows that � D � D D .x; y; z/. In this case 480we say that � D .x; y; z/ is a singular value of A and x, y, and z are the associated 481singular vectors. How might we solve this (highly structured) system of nonlinear 482equations? The triplet of matrix-vector products: 483

    A.1/ � .z ˝ y/ D � � x A.2/ � .z ˝ x/ D � � y A.3/ � .y ˝ x/ D � � z 484

    suggests a componentwise solution strategy: 485

    Power Method for Tensor SingularValues and Vectors

    4Given: A 2 Rn1�n2�n3 and unit vectors y 2 Rn2 and z 2 Rn3 .4Repeat:

    4Qx D A.1/.z ˝ y/; � D k Qx k; x D Qx=�4Qy D A.2/.z ˝ x/; � D k Qy k; y D Qy=�4Qz D A.3/.y ˝ x/; � D k Qz k; z D Qz=�4�opt D � , xopt D x, yopt D y, zopt D z

    492

    See [8, 18] for details. For matrices, the SVD expansion 493

    A D U˙VT Drank.A/X

    iD1�iuiv

    Ti 494

    has an important optimality property. In particular, the Eckhart-Young theorem tells 495us that 496

    Ar DrX

    iD1�iuiv

    Ti r � rank.A/ 497

    is the closest rank-r matrix to A in either the 2-norm or Frobenius norm. Moreover, 498the closest rank-1 matrix to Ar�1 is �rurvTr . Thus, it would be possible (in principle) 499to compute the full SVD by solving a sequence of closest rank-1 matrix problems. 500

  • UNCO

    RREC

    TED

    PROO

    F

    26 C.F. Van Loan

    This idea does not work for tensors. In other words, if �1u1 ı v1 ı w1 is the closest 501rank-1 tensor to A and �2u1 ı v2 ı w2 is the closest rank-1 tensor to 502

    QA D A � �1u1 ı v1 ı w1; 503

    then 504

    A2 D �1u1 ı v1 ı w1 C �2u2 ı v2 ı w2 505

    is not necessarily the closest rank-2 tensor to A. We need an alternative approach to 506formulating a “tensor SVD”. 507

    5.3 A First Look at Tensor Rank 508

    Since matrix rank ideas do not readily extend to the tensor setting, we should look 509more carefully at tensor rank to appreciate the “degree of difficulty” associated with 510the formulation of illuminating low-rank tensor expansions. 511

    We start with a definition. Suppose the tensor A can be written as the sum of r 512rank-1 tensors and that r is minimal in this regard. In this case we say that rank.A/ D 513r. Let us explore this concept in the simplest possible setting: A 2 R2�2�2. For this 514problem the goal is to find three thin-as-possible matrices X;Y;Z 2 R2�r so that 515

    A DrX

    kD1X.W; k/ ı Y.W; k/ ı Z.W; k/; (10)

    i.e., 516

    vec.A/ D

    2666666666664

    a111a211a121a221a112a212a122a222

    3777777777775

    DrX

    kD1Z.W; k/ ˝ Y.W; k/ ˝ X.W; k/: 517

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 27

    This vector equation can also be written as a pair of matrix equations: 518

    A.W; 1/ D"

    a111 a121

    a211 a221

    #D

    rXkD1

    Z.1; k/X.W; k/Y.W; k/T

    A.W; 2/ D"

    a112 a122

    a212 a222

    #D

    rXkD1

    Z.2; k/X.W; k/Y.W; k/T :

    Readers familiar with the generalized eigenvalue problem should see a connection 519to our 2-by-2-by-2 rank.A/ problem. Indeed, it can be shown that 520

    det

    "a111 a121

    a211 a221

    #� �

    "a112 a122

    a212 a222

    #!D 0 521

    has real distinct roots with probability 0.79 and complex conjugate roots with 522probability 0.21 when the matrix entries are randomly selected using the MATLAB 523randn function. If this 2-by-2 generalized eigenvalue problem has real distinct 524eigenvalues, then it is possible to find nonsingular matrices S and T so that 525

    "a111 a121

    a211 a221

    #D S

    "˛1 0

    0 ˛2

    #TT

    "a112 a122

    a212 a222

    #D S

    "ˇ1 0

    0 ˇ2

    #TT :

    This shows that the rank-1 expansion (10) for A holds with r D 2, X D S, Y D T 526and 527

    Z D"˛1 ˛2

    ˇ1 ˇ2

    #: 528

    Thus, for 2-by-2-by-2 tensors, rank equals two with probability about 0.79. A 529similar generalized eigenvalue analysis shows that the rank is three with probability 5300.21. This is a very different situation than with matrices where an n-by-n matrix has 531rank n with probability 1. The subtleties associated with tensor rank are discussed 532further in Sect. 8.3. 533


6 Tensor Symmetry

Symmetry for a matrix is defined through transposition: $A = A^T$. This is a nice shorthand way of saying that $A(i,j) = A(j,i)$ for all possible $i$ and $j$ that satisfy $1 \le i \le n$ and $1 \le j \le n$.

How do we extend this notion to tensors? Transposition moves indices around, so we need a way of talking about what happens when we (say) interchange $A(i,j,k)$ with $A(j,i,k)$ or $A(k,j,i)$ or $A(i,k,j)$ or $A(j,k,i)$ or $A(k,i,j)$. It looks like we will have to contend with an exponential number of transpositions and an exponential number of partial symmetries.

6.1 Tensor Transposition

If $C \in \mathbb{R}^{n_1\times n_2\times n_3}$, then there are $3! = 6$ possible transpositions, identified by the notation $C^{\langle[i\,j\,k]\rangle}$ where $[i\,j\,k]$ is a permutation of $[1\,2\,3]$:

$$B = \left\{\begin{array}{l} C^{\langle[1\,3\,2]\rangle}\\ C^{\langle[2\,1\,3]\rangle}\\ C^{\langle[2\,3\,1]\rangle}\\ C^{\langle[3\,1\,2]\rangle}\\ C^{\langle[3\,2\,1]\rangle} \end{array}\right\} \;\Longrightarrow\; \left\{\begin{array}{l} b_{ikj}\\ b_{jik}\\ b_{jki}\\ b_{kij}\\ b_{kji} \end{array}\right\} = c_{ijk}$$

for $i = 1{:}n_1$, $j = 1{:}n_2$, $k = 1{:}n_3$. For order-$d$ tensors there are $d!$ possibilities. Suppose $v = [v_1, v_2, \ldots, v_d]$ is a permutation of the integer vector $1{:}d = [1, 2, \ldots, d]$. If $C \in \mathbb{R}^{n_1\times\cdots\times n_d}$, then

$$B = C^{\langle v\rangle} \;\Longrightarrow\; B(i(v)) = C(i), \qquad 1 \le i \le n.$$
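In NumPy, the transposition $B = C^{\langle v\rangle}$ is an axis permutation. The sketch below (ours; `transpose_tensor` is a name we introduce) applies $v = [2\,3\,1]$ and spot-checks the entry identity $b_{jki} = c_{ijk}$, assuming the convention $B(i(v)) = C(i)$ stated above.

```python
# Tensor transposition B = C^<v> as an axis permutation (illustrative sketch).
import numpy as np

def transpose_tensor(C, v):
    """Return B with B(i(v)) = C(i), where v is a 1-based permutation of 1:d."""
    return np.transpose(C, axes=[p - 1 for p in v])

rng = np.random.default_rng(2)
C = rng.standard_normal((4, 3, 2))         # n1 = 4, n2 = 3, n3 = 2
B = transpose_tensor(C, [2, 3, 1])         # B is n2-by-n3-by-n1

i, j, k = 1, 2, 0
print(np.isclose(B[j, k, i], C[i, j, k]))  # b_{jki} = c_{ijk}  ->  True
```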

6.2 Symmetric Tensors

An order-$d$ tensor $C \in \mathbb{R}^{n\times\cdots\times n}$ is symmetric if $C = C^{\langle v\rangle}$ for all permutations $v$ of $1{:}d$. If $d = 3$ this means that

$$c_{ijk} = c_{ikj} = c_{jik} = c_{jki} = c_{kij} = c_{kji}.$$


To get a feel for the level of redundancy in a symmetric tensor, we see that a symmetric $C \in \mathbb{R}^{3\times 3\times 3}$ has at most ten distinct values:

$$\begin{array}{l}
c_{111}\\
c_{112} = c_{121} = c_{211}\\
c_{113} = c_{131} = c_{311}\\
c_{222}\\
c_{221} = c_{212} = c_{122}\\
c_{223} = c_{232} = c_{322}\\
c_{333}\\
c_{331} = c_{313} = c_{133}\\
c_{332} = c_{323} = c_{233}\\
c_{123} = c_{132} = c_{213} = c_{231} = c_{312} = c_{321}.
\end{array}$$

The modal unfoldings of a symmetric tensor are all the same. For example, the (2,3) entries in each of the matrices

$$C_{(1)} = \begin{bmatrix}
c_{111} & c_{121} & c_{131} & c_{112} & c_{122} & c_{132} & c_{113} & c_{123} & c_{133}\\
c_{211} & c_{221} & c_{231} & c_{212} & c_{222} & c_{232} & c_{213} & c_{223} & c_{233}\\
c_{311} & c_{321} & c_{331} & c_{312} & c_{322} & c_{332} & c_{313} & c_{323} & c_{333}
\end{bmatrix}$$

$$C_{(2)} = \begin{bmatrix}
c_{111} & c_{211} & c_{311} & c_{112} & c_{212} & c_{312} & c_{113} & c_{213} & c_{313}\\
c_{121} & c_{221} & c_{321} & c_{122} & c_{222} & c_{322} & c_{123} & c_{223} & c_{323}\\
c_{131} & c_{231} & c_{331} & c_{132} & c_{232} & c_{332} & c_{133} & c_{233} & c_{333}
\end{bmatrix}$$

$$C_{(3)} = \begin{bmatrix}
c_{111} & c_{211} & c_{311} & c_{121} & c_{221} & c_{321} & c_{131} & c_{231} & c_{331}\\
c_{112} & c_{212} & c_{312} & c_{122} & c_{222} & c_{322} & c_{132} & c_{232} & c_{332}\\
c_{113} & c_{213} & c_{313} & c_{123} & c_{223} & c_{323} & c_{133} & c_{233} & c_{333}
\end{bmatrix}$$

are equal: $c_{231} = c_{321} = c_{312}$.
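As a numerical sanity check, the sketch below (ours; `symmetrize` and `unfold` are hypothetical helper names) builds a random symmetric 3-by-3-by-3 tensor by averaging over all index permutations and confirms that its three modal unfoldings coincide.

```python
# Check that the modal unfoldings of a symmetric tensor coincide (sketch).
import itertools
import numpy as np

def symmetrize(C):
    """Average an order-3 tensor over all 3! index permutations."""
    return sum(np.transpose(C, p) for p in itertools.permutations(range(3))) / 6.0

def unfold(C, k):
    """Mode-k unfolding: mode-k fibers become columns (column-major ordering)."""
    return np.reshape(np.moveaxis(C, k, 0), (C.shape[k], -1), order='F')

rng = np.random.default_rng(3)
C = symmetrize(rng.standard_normal((3, 3, 3)))

C1, C2, C3 = unfold(C, 0), unfold(C, 1), unfold(C, 2)
print(np.allclose(C1, C2) and np.allclose(C2, C3))   # True
print(np.isclose(C1[1, 2], C[1, 2, 0]))              # the (2,3) entry is c_231 -> True
```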

6.3 Symmetric Rank

An order-$d$ symmetric rank-1 tensor $C \in \mathbb{R}^{n\times\cdots\times n}$ has the form

$$C = \underbrace{x \circ \cdots \circ x}_{d \text{ times}}$$


where $x \in \mathbb{R}^n$. In this case we clearly have

$$C(i_1,\ldots,i_d) = x_{i_1} x_{i_2} \cdots x_{i_d}$$

and

$$\operatorname{vec}(C) = \underbrace{x \otimes \cdots \otimes x}_{d \text{ times}}.$$

An order-3 symmetric tensor $C$ has symmetric rank $r$ if there exist $x_1,\ldots,x_r \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}^r$ such that

$$C = \sum_{k=1}^{r} \lambda_k \cdot x_k \circ x_k \circ x_k$$

and no shorter sum of symmetric rank-1 tensors exists. Symmetric rank is denoted by $\operatorname{rank}_S(C)$. Note, in contrast to what we would expect for matrices, there may be a shorter sum of general rank-1 tensors that add up to $C$:

$$C = \sum_{k=1}^{\tilde{r}} \tilde{\lambda}_k \cdot \tilde{x}_k \circ \tilde{y}_k \circ \tilde{z}_k.$$

The symmetric rank of a symmetric tensor is more tractable than the (general) rank of a general tensor. For example, if $C \in \mathbb{C}^{n\times\cdots\times n}$ is an order-$d$ symmetric tensor, then with probability one we have

$$\operatorname{rank}_S(C) = \begin{cases} f(d,n) + 1 & \text{if } (d,n) = (3,5),\,(4,3),\,(4,4),\text{ or }(4,5)\\ f(d,n) & \text{otherwise} \end{cases}$$

where

$$f(d,n) = \left\lceil \frac{1}{n}\binom{n+d-1}{d} \right\rceil.$$

See [5, 6] for deeper discussions of symmetric rank.
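The quoted generic-rank formula is easy to tabulate. The short sketch below (ours; `f` and `generic_symmetric_rank` are names we introduce) evaluates $f(d,n)$ with an integer ceiling and applies the exceptional-case correction.

```python
# Tabulate the generic symmetric rank formula quoted above (illustrative sketch).
from math import comb

def f(d, n):
    return -(-comb(n + d - 1, d) // n)   # integer ceiling of binom(n+d-1, d)/n

def generic_symmetric_rank(d, n):
    exceptional = {(3, 5), (4, 3), (4, 4), (4, 5)}
    return f(d, n) + 1 if (d, n) in exceptional else f(d, n)

for d in (3, 4):
    print(d, [generic_symmetric_rank(d, n) for n in range(2, 7)])
```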


6.4 The Eigenvalues of a Symmetric Tensor

For a symmetric matrix $C$ the stationary values of $\phi_C(x) = x^T C x$ subject to the constraint that $\|x\|_2 = 1$ are the eigenvalues of $C$. The associated stationary vectors are eigenvectors. We extend this idea to symmetric tensors; the order-3 case is good enough to illustrate the main ideas.

If $C \in \mathbb{R}^{n\times n\times n}$ is a symmetric tensor, then we define the stationary values of

$$\phi_C(x) = \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} c_{ijk}\, x_i x_j x_k = x^T C_{(1)}(x \otimes x)$$

subject to the constraint that $\|x\|_2 = 1$ to be the eigenvalues of $C$. The associated stationary vectors are eigenvectors. Using the method of Lagrange multipliers it can be shown that if $x$ is a stationary vector for $\phi_C$, then

$$\phi_C(x)\, x = C_{(1)}(x \otimes x).$$

This leads to an iteration of the following form:

Power Method for Tensor Eigenvalues and Eigenvectors

    Given: symmetric $C \in \mathbb{R}^{n\times n\times n}$ and unit vector $x \in \mathbb{R}^n$.
    Repeat:
        $\tilde{x} = C_{(1)}(x \otimes x)$
        $\lambda = \|\tilde{x}\|_2$
        $x = \tilde{x}/\lambda$
    $\lambda_{\mathrm{opt}} = \lambda$,  $x_{\mathrm{opt}} = x$.

There are some convergence results for this iteration. For example, it can be shown that if the order of $C$ is even and $M$ is a square unfolding, then the iteration converges if $M$ is positive definite [15].
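A direct NumPy transcription of the boxed iteration is sketched below (ours; helper names are ours). It uses the mode-1 unfolding so that $C_{(1)}(x \otimes x)$ is an ordinary matrix-vector product. For a generic order-3 tensor the plain iteration is not guaranteed to settle, so the final stationarity residual is only reported, not guaranteed to be tiny.

```python
# Power method for a symmetric order-3 tensor, mirroring the boxed iteration (sketch).
import itertools
import numpy as np

def symmetrize(C):
    """Average an order-3 tensor over all 3! index permutations."""
    return sum(np.transpose(C, p) for p in itertools.permutations(range(3))) / 6.0

def unfold(C, k):
    """Mode-k unfolding with mode-k fibers as columns."""
    return np.reshape(np.moveaxis(C, k, 0), (C.shape[k], -1), order='F')

def tensor_power_method(C, x0, iters=200):
    x = x0 / np.linalg.norm(x0)
    C1 = unfold(C, 0)
    lam = 0.0
    for _ in range(iters):
        xt = C1 @ np.kron(x, x)          # C_(1)(x kron x)
        lam = np.linalg.norm(xt)
        x = xt / lam
    return lam, x

rng = np.random.default_rng(4)
n = 5
C = symmetrize(rng.standard_normal((n, n, n)))
lam, x = tensor_power_method(C, rng.standard_normal(n))
# Residual of the stationarity condition; small if the iteration has settled.
print(np.linalg.norm(unfold(C, 0) @ np.kron(x, x) - lam * x))
```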

6.5 Symmetric Embeddings

In the matrix case there are connections between the singular values and vectors of $A \in \mathbb{R}^{n_1\times n_2}$ and the eigenvalues and vectors of

$$\operatorname{sym}(A) = \begin{bmatrix} 0 & A\\ A^T & 0 \end{bmatrix} \in \mathbb{R}^{(n_1+n_2)\times(n_1+n_2)}.$$


If $A = U\cdot\operatorname{diag}(\sigma_i)\cdot V^T$ is the SVD of $A \in \mathbb{R}^{n_1\times n_2}$, then for $k = 1{:}\operatorname{rank}(A)$

$$\begin{bmatrix} 0 & A\\ A^T & 0 \end{bmatrix} \begin{bmatrix} u_k\\ \pm v_k \end{bmatrix} = \pm\sigma_k \begin{bmatrix} u_k\\ \pm v_k \end{bmatrix}$$

where $u_k = U(:,k)$ and $v_k = V(:,k)$.

It turns out that a symmetric tensor $\operatorname{sym}(A)$ can be "built" from a general tensor $A$ by judiciously positioning $A$ and all of its transposes. Here is a depiction of the order-3 case:

[Figure: the order-3 symmetric embedding, with $A$ and its transposes placed in the off-diagonal blocks of a 3-by-3-by-3 block tensor.]

We can think of $\operatorname{sym}(A)$ as a 3-by-3-by-3 block tensor. As a tensor of scalars, it is $N$-by-$N$-by-$N$ where $N = n_1 + n_2 + n_3$. If

$$\left\{ \lambda,\; \begin{bmatrix} u\\ v\\ z \end{bmatrix} \right\}$$

is a stationary pair for $\operatorname{sym}(A)$, then so are

$$\left\{ \lambda,\; \begin{bmatrix} u\\ -v\\ -z \end{bmatrix} \right\}, \qquad \left\{ -\lambda,\; \begin{bmatrix} u\\ -v\\ z \end{bmatrix} \right\}, \qquad \left\{ -\lambda,\; \begin{bmatrix} u\\ v\\ -z \end{bmatrix} \right\}.$$

This is a nice generalization of the result for matrices. There are interesting connections between power methods with $A$ and power methods with $\operatorname{sym}(A)$. There are also connections between the rank of $A$ and the symmetric rank of

    This is a nice generalization of the result for matrices. There are interesting 625connections between power methods with A and power methods with sym.A/. 626There are also connections between the rank of A and the symmetric rank of

  • UNCO

    RREC

    TED

    PROO

    F

    Structured Matrix Problems from Tensors 33

$\operatorname{sym}(A)$:

$$d!\,\operatorname{rank}(A) \;\ge\; \operatorname{rank}_S(\operatorname{sym}(A)).$$

It is not clear if the inequality can be replaced by equality. See [27].

7 The Tucker Decomposition

We have seen that it is possible to extend the definition of singular values and vectors to tensors by using variational principles. However, these insights did not culminate in the production of a tensor SVD, and that is disappointing when we consider the power of the matrix SVD:

1. It can turn a given problem into an equivalent easy-to-solve problem. For example, the $\min\|Ax - b\|_2$ problem can be converted into an equivalent diagonal least squares problem using the SVD.
2. It can uncover hidden relationships that exist in matrix-encoded data. For example, the SVD can show that a data matrix has a low-rank structure.

In the next several sections we produce various SVD-like tensor decompositions. We start with the Tucker decomposition because it involves orthogonality and has a strong resemblance to the matrix SVD. We use order-3 tensors to illustrate the main ideas.

7.1 Tucker Representations: The Matrix Case

Given a matrix $A \in \mathbb{R}^{n_1\times n_2}$, the Tucker representation problem involves finding a core matrix $S \in \mathbb{R}^{r_1\times r_2}$ and matrices $U_1 \in \mathbb{R}^{n_1\times r_1}$ and $U_2 \in \mathbb{R}^{n_2\times r_2}$ such that

$$A(i_1,i_2) = \sum_{j_1=1}^{r_1}\sum_{j_2=1}^{r_2} S(j_1,j_2)\cdot U_1(i_1,j_1)\cdot U_2(i_2,j_2).$$

Part of the "deal" involves choosing the integers $r_1$ and $r_2$ and perhaps replacing the "=" with "$\approx$". Regardless, it is possible to reformulate the right-hand side as a sum of rank-1 matrices,

$$A = \sum_{j_1=1}^{r_1}\sum_{j_2=1}^{r_2} S(j_1,j_2)\cdot U_1(:,j_1)\, U_2(:,j_2)^T, \qquad (11)$$


or as a sum of vector Kronecker products,

$$\operatorname{vec}(A) = \sum_{j_1=1}^{r_1}\sum_{j_2=1}^{r_2} S(j_1,j_2)\, U_2(:,j_2) \otimes U_1(:,j_1),$$

or as a single matrix-vector product,

$$\operatorname{vec}(A) = (U_2 \otimes U_1)\cdot\operatorname{vec}(S).$$

The tensor versions of these reformulations get us to think the right way about how we might generalize the matrix SVD.

7.2 Tucker Representations: The Tensor Case

Given a tensor $A \in \mathbb{R}^{n_1\times n_2\times n_3}$, the Tucker representation problem involves finding a core tensor $S \in \mathbb{R}^{r_1\times r_2\times r_3}$ and matrices $U_1 \in \mathbb{R}^{n_1\times r_1}$, $U_2 \in \mathbb{R}^{n_2\times r_2}$, and $U_3 \in \mathbb{R}^{n_3\times r_3}$ such that

$$A(i_1,i_2,i_3) = \sum_{j_1=1}^{r_1}\sum_{j_2=1}^{r_2}\sum_{j_3=1}^{r_3} S(j_1,j_2,j_3)\cdot U_1(i_1,j_1)\cdot U_2(i_2,j_2)\cdot U_3(i_3,j_3).$$

As in the matrix case above, we can rewrite this as a sum of rank-1 tensors,

$$A = \sum_{j_1=1}^{r_1}\sum_{j_2=1}^{r_2}\sum_{j_3=1}^{r_3} S(j_1,j_2,j_3)\cdot U_1(:,j_1) \circ U_2(:,j_2) \circ U_3(:,j_3),$$

or as a sum of vector Kronecker products,

$$\operatorname{vec}(A) = \sum_{j_1=1}^{r_1}\sum_{j_2=1}^{r_2}\sum_{j_3=1}^{r_3} S(j_1,j_2,j_3)\cdot U_3(:,j_3) \otimes U_2(:,j_2) \otimes U_1(:,j_1),$$

or as a single matrix-vector product,

$$\operatorname{vec}(A) = (U_3 \otimes U_2 \otimes U_1)\cdot\operatorname{vec}(S).$$

The challenge is to design the representation so that it is illuminating and computable.

Before we proceed it is instructive to revisit the matrix case. If we set $r_1 = n_1$ and $r_2 = n_2$ in (11) and assume the $U$ matrices are orthogonal, then the Tucker


representation has the form

$$A = U_1 (U_1^T A U_2) U_2^T = \sum_{j_1=1}^{n_1}\sum_{j_2=1}^{n_2} S(j_1,j_2)\, U_1(:,j_1)\, U_2(:,j_2)^T$$

where $S = U_1^T A U_2$. To make this an "illuminating" representation of $A$ we strive for a diagonal core matrix $S$ and that, of course, leads to the SVD:

$$A = \sum_{k=1}^{\operatorname{rank}(A)} S(k,k)\, U_1(:,k)\, U_2(:,k)^T.$$

From there we prove various optimality theorems and conclude that

$$A_r = \sum_{k=1}^{r} S(k,k)\, U_1(:,k)\, U_2(:,k)^T, \qquad r \le \operatorname{rank}(A),$$

is the closest rank-$r$ matrix to $A$ in (say) the Frobenius norm.

On the algorithmic front, methods typically determine $U_1$ and $U_2$ through a sequence of updates. Thus, if $A = U_1 S U_2^T$ is the "current" representation of $A$, we proceed to compute orthogonal $\Theta_1$ and $\Theta_2$ so that $\tilde{S} = \Theta_1^T S\,\Theta_2$ is "more diagonal" than $S$. We then update the representation:

$$S \leftarrow \Theta_1^T S\,\Theta_2, \qquad U_1 \leftarrow U_1\Theta_1, \qquad U_2 \leftarrow U_2\Theta_2.$$

Our plan is to mimic this sequence of events for tensors, focusing first on the connection between the core tensor $S$ and the $U$ matrices.

7.3 The Mode-k Product

Updating a Tucker representation involves updating the current core tensor $S$ and the associated $U$-matrices. Regarding the former, we anticipate the need to design a relevant tensor-matrix product. The mode-$k$ product turns out to be that operation and we motivate the main idea with an example. Suppose $S \in \mathbb{R}^{4\times 3\times 2}$ and consider its mode-2 unfolding:

$$S_{(2)} = \begin{bmatrix}
s_{111} & s_{211} & s_{311} & s_{411} & s_{112} & s_{212} & s_{312} & s_{412}\\
s_{121} & s_{221} & s_{321} & s_{421} & s_{122} & s_{222} & s_{322} & s_{422}\\
s_{131} & s_{231} & s_{331} & s_{431} & s_{132} & s_{232} & s_{332} & s_{432}
\end{bmatrix}.$$


Its columns are the mode-2 fibers of $S$. Suppose we apply a 5-by-3 matrix $M$ to each of those fibers:

$$\begin{bmatrix}
t_{111} & t_{211} & t_{311} & t_{411} & t_{112} & t_{212} & t_{312} & t_{412}\\
t_{121} & t_{221} & t_{321} & t_{421} & t_{122} & t_{222} & t_{322} & t_{422}\\
t_{131} & t_{231} & t_{331} & t_{431} & t_{132} & t_{232} & t_{332} & t_{432}\\
t_{141} & t_{241} & t_{341} & t_{441} & t_{142} & t_{242} & t_{342} & t_{442}\\
t_{151} & t_{251} & t_{351} & t_{451} & t_{152} & t_{252} & t_{352} & t_{452}
\end{bmatrix}
=
\begin{bmatrix}
m_{11} & m_{12} & m_{13}\\
m_{21} & m_{22} & m_{23}\\
m_{31} & m_{32} & m_{33}\\
m_{41} & m_{42} & m_{43}\\
m_{51} & m_{52} & m_{53}
\end{bmatrix}
\begin{bmatrix}
s_{111} & s_{211} & s_{311} & s_{411} & s_{112} & s_{212} & s_{312} & s_{412}\\
s_{121} & s_{221} & s_{321} & s_{421} & s_{122} & s_{222} & s_{322} & s_{422}\\
s_{131} & s_{231} & s_{331} & s_{431} & s_{132} & s_{232} & s_{332} & s_{432}
\end{bmatrix}.$$

This defines a new tensor $T \in \mathbb{R}^{4\times 5\times 2}$ that is totally specified by the equation

$$T_{(2)} = M\cdot S_{(2)}$$

and referred to as the mode-2 product of the tensor $S$ with the matrix $M$. In general, if $S$ is an $n_1\times\cdots\times n_d$ tensor and $M \in \mathbb{R}^{m_k\times n_k}$ for some $k$ that satisfies $1 \le k \le d$, then the mode-$k$ product of $S$ with $M$ is a new tensor $T$ defined by

$$T_{(k)} = M\cdot S_{(k)}.$$

To indicate this operation we use the notation

$$T = S \times_k M.$$

Note that $T$ is an $n_1\times\cdots\times n_{k-1}\times m_k\times n_{k+1}\times\cdots\times n_d$ tensor. To illustrate more characterizations of the mode-$k$ product we drop down to the order-3 case.

– Mode-1 Product. If $S \in \mathbb{R}^{n_1\times n_2\times n_3}$ and $M_1 \in \mathbb{R}^{m_1\times n_1}$, then $T = S \times_1 M_1$ is an $m_1\times n_2\times n_3$ tensor that is equivalently defined by

$$T(i_1,i_2,i_3) = \sum_{k=1}^{n_1} M_1(i_1,k)\, S(k,i_2,i_3)$$
$$\operatorname{vec}(T) = (I_{n_3} \otimes I_{n_2} \otimes M_1)\operatorname{vec}(S). \qquad (12)$$


– Mode-2 Product. If $S \in \mathbb{R}^{n_1\times n_2\times n_3}$ and $M_2 \in \mathbb{R}^{m_2\times n_2}$, then $T = S \times_2 M_2$ is an $n_1\times m_2\times n_3$ tensor that is equivalently defined by

$$T(i_1,i_2,i_3) = \sum_{k=1}^{n_2} M_2(i_2,k)\, S(i_1,k,i_3)$$
$$\operatorname{vec}(T) = (I_{n_3} \otimes M_2 \otimes I_{n_1})\operatorname{vec}(S). \qquad (13)$$

– Mode-3 Product. If $S \in \mathbb{R}^{n_1\times n_2\times n_3}$ and $M_3 \in \mathbb{R}^{m_3\times n_3}$, then $T = S \times_3 M_3$ is an $n_1\times n_2\times m_3$ tensor that is equivalently defined by

$$T(i_1,i_2,i_3) = \sum_{k=1}^{n_3} M_3(i_3,k)\, S(i_1,i_2,k)$$
$$\operatorname{vec}(T) = (M_3 \otimes I_{n_2} \otimes I_{n_1})\operatorname{vec}(S). \qquad (14)$$

The modal products have two important properties that we will be using later. The first concerns successive products in the same mode. If $S \in \mathbb{R}^{n_1\times\cdots\times n_d}$ and $M_1, M_2 \in \mathbb{R}^{n_k\times n_k}$, then

$$(S \times_k M_1) \times_k M_2 = S \times_k (M_2 M_1).$$

The second property concerns successive products in different modes. If $S \in \mathbb{R}^{n_1\times\cdots\times n_d}$, $M_k \in \mathbb{R}^{n_k\times n_k}$, $M_j \in \mathbb{R}^{n_j\times n_j}$, and $k \ne j$, then

$$(S \times_k M_k) \times_j M_j = (S \times_j M_j) \times_k M_k.$$

The order is not important so we just write $S \times_j M_j \times_k M_k$ or $S \times_k M_k \times_j M_j$.
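A mode-$k$ product is one unfolding, one matrix multiplication, and one folding. The sketch below (ours; `unfold`, `fold`, and `mode_k_product` are names we introduce) implements it with NumPy and spot-checks the two properties just stated.

```python
# Mode-k product via unfold / multiply / fold, plus a check of the two
# properties of modal products (illustrative sketch).
import numpy as np

def unfold(T, k):
    return np.reshape(np.moveaxis(T, k, 0), (T.shape[k], -1), order='F')

def fold(M, k, shape):
    full = (shape[k],) + tuple(s for i, s in enumerate(shape) if i != k)
    return np.moveaxis(np.reshape(M, full, order='F'), 0, k)

def mode_k_product(S, M, k):
    shape = list(S.shape)
    shape[k] = M.shape[0]
    return fold(M @ unfold(S, k), k, tuple(shape))

rng = np.random.default_rng(5)
S = rng.standard_normal((4, 3, 2))
M1, M2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
N3 = rng.standard_normal((2, 2))

lhs = mode_k_product(mode_k_product(S, M1, 1), M2, 1)
rhs = mode_k_product(S, M2 @ M1, 1)                    # same-mode products collapse
print(np.allclose(lhs, rhs))                           # True

lhs = mode_k_product(mode_k_product(S, M1, 1), N3, 2)
rhs = mode_k_product(mode_k_product(S, N3, 2), M1, 1)  # different modes commute
print(np.allclose(lhs, rhs))                           # True
```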

7.4 The Core Tensor

Suppose we have a Tucker representation

$$A(i) = \sum_{j=1}^{n} S(j)\cdot U_1(i_1,j_1)\cdot U_2(i_2,j_2)\cdot U_3(i_3,j_3),$$

where $A \in \mathbb{R}^{n}$, $n = [n_1, n_2, n_3]$, and $U_1 \in \mathbb{R}^{n_1\times n_1}$, $U_2 \in \mathbb{R}^{n_2\times n_2}$, and $U_3 \in \mathbb{R}^{n_3\times n_3}$. It follows that

$$\operatorname{vec}(A) = (U_3 \otimes U_2 \otimes U_1)\operatorname{vec}(S) = (U_3 \otimes I_{n_2} \otimes I_{n_1})(I_{n_3} \otimes U_2 \otimes I_{n_1})(I_{n_3} \otimes I_{n_2} \otimes U_1)\operatorname{vec}(S).$$


This factored product enables us to relate the core tensor $S$ to $A$ via a triplet of modal products. Indeed, if

$$\operatorname{vec}(S^{(1)}) = (I_{n_3} \otimes I_{n_2} \otimes U_1)\operatorname{vec}(S)$$
$$\operatorname{vec}(S^{(2)}) = (I_{n_3} \otimes U_2 \otimes I_{n_1})\operatorname{vec}(S^{(1)})$$
$$\operatorname{vec}(S^{(3)}) = (U_3 \otimes I_{n_2} \otimes I_{n_1})\operatorname{vec}(S^{(2)})$$

then $A = S^{(3)}$. But from (12), (13), and (14) this means that

$$S^{(1)} = S \times_1 U_1, \qquad S^{(2)} = S^{(1)} \times_2 U_2, \qquad S^{(3)} = S^{(2)} \times_3 U_3,$$

and so

$$A = S \times_1 U_1 \times_2 U_2 \times_3 U_3.$$

If the $U$'s are nonsingular then

$$A = A \times_1 (U_1^{-1}U_1) \times_2 (U_2^{-1}U_2) \times_3 (U_3^{-1}U_3) = A \times_1 U_1^{-1} \times_2 U_2^{-1} \times_3 U_3^{-1} \times_1 U_1 \times_2 U_2 \times_3 U_3$$

and so $S = A \times_1 U_1^{-1} \times_2 U_2^{-1} \times_3 U_3^{-1}$.

If the $U$'s are orthogonal, i.e., if $A \in \mathbb{R}^{n_1\times n_2\times n_3}$ and $U_1 \in \mathbb{R}^{n_1\times n_1}$, $U_2 \in \mathbb{R}^{n_2\times n_2}$, and $U_3 \in \mathbb{R}^{n_3\times n_3}$ are orthogonal, then

$$A = S \times_1 U_1 \times_2 U_2 \times_3 U_3$$

where

$$S = A \times_1 U_1^T \times_2 U_2^T \times_3 U_3^T, \qquad (15)$$

or equivalently

$$A_{(1)} = U_1 S_{(1)} (U_3 \otimes U_2)^T \qquad (16)$$
$$A_{(2)} = U_2 S_{(2)} (U_3 \otimes U_1)^T \qquad (17)$$
$$A_{(3)} = U_3 S_{(3)} (U_2 \otimes U_1)^T \qquad (18)$$

With this choice we are representing $A$ as a Tucker product of a core tensor $S$ and three orthogonal matrices. Things are beginning to look "SVD-like".


7.5 The Higher-Order SVD

How do we choose the $U$ matrices in (15) so that the core tensor $S$ reveals things about the structure of $A$? It is instructive to look at the matrix case where we know the answer to this question. Suppose we have the SVDs

$$U_1^T A_{(1)} V_1 = \Sigma_1, \qquad U_2^T A_{(2)} V_2 = \Sigma_2.$$

Since $A_{(1)} = A$ and $A_{(2)} = A^T$, it follows that we may set $V_1 = U_2$. In other words, we can (in principle) compute the SVD of $A$ by computing the SVDs of the modal unfoldings $A_{(1)}$ and $A_{(2)}$. The $U$ matrices from the modal SVDs are the "right" choice.

This suggests a strategy for picking good $U$ matrices for the tensor case $A \in \mathbb{R}^{n_1\times n_2\times n_3}$. Namely, compute the SVDs of the modal unfoldings

$$A_{(1)} = U_1\Sigma_1 V_1^T, \qquad A_{(2)} = U_2\Sigma_2 V_2^T, \qquad A_{(3)} = U_3\Sigma_3 V_3^T \qquad (19)$$

and set

$$S = A \times_1 U_1^T \times_2 U_2^T \times_3 U_3^T.$$

The resulting decomposition

$$A = S \times_1 U_1 \times_2 U_2 \times_3 U_3$$

is the higher-order SVD (HOSVD) of $A$. If $A = S \times_1 U_1 \times_2 U_2 \times_3 U_3$ is the HOSVD of $A \in \mathbb{R}^{n_1\times n_2\times n_3}$, then

$$A = \sum_{j=1}^{r} S(j)\cdot U_1(:,j_1) \circ U_2(:,j_2) \circ U_3(:,j_3) \qquad (20)$$

where $r_1 = \operatorname{rank}(A_{(1)})$, $r_2 = \operatorname{rank}(A_{(2)})$, and $r_3 = \operatorname{rank}(A_{(3)})$. The triplet of modal ranks $[r_1, r_2, r_3]$ is called the multilinear rank of $A$.

The core tensor in the HOSVD has important properties. By combining (16)-(19) we have

$$S_{(1)} = \Sigma_1 V_1^T (U_3 \otimes U_2)$$
$$S_{(2)} = \Sigma_2 V_2^T (U_3 \otimes U_1)$$
$$S_{(3)} = \Sigma_3 V_3^T (U_2 \otimes U_1)$$


from which we conclude that

$$\| S(j,:,:) \|_F = \sigma_j(A_{(1)}), \qquad j = 1{:}n_1$$
$$\| S(:,j,:) \|_F = \sigma_j(A_{(2)}), \qquad j = 1{:}n_2$$
$$\| S(:,:,j) \|_F = \sigma_j(A_{(3)}), \qquad j = 1{:}n_3.$$

Here, $\sigma_j(C)$ denotes the $j$th largest singular value of the matrix $C$. Notice that the norms of the tensor's slices are getting smaller as we "move away" from $S(1,1,1)$.

This suggests that we can use the grading in $S$ to truncate the HOSVD:

$$A \approx A_{\tilde{r}} = \sum_{j=1}^{\tilde{r}} S(j)\cdot U_1(:,j_1) \circ U_2(:,j_2) \circ U_3(:,j_3)$$

where $\tilde{r} \le r$, i.e., $\tilde{r}_1 \le r_1$, $\tilde{r}_2 \le r_2$, and $\tilde{r}_3 \le r_3$. As with SVD-based low-rank approximation in the matrix case, we simply need a tolerance to determine how to abbreviate the summation in (20). For a deeper discussion of the HOSVD, see [9].
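The HOSVD amounts to three matrix SVDs followed by three modal products. The sketch below (ours) computes it for a random tensor, checks the exact reconstruction $A = S \times_1 U_1 \times_2 U_2 \times_3 U_3$, and verifies that the slice norms of the core reproduce the singular values of $A_{(1)}$.

```python
# HOSVD of an order-3 tensor via SVDs of the modal unfoldings (sketch).
import numpy as np

def unfold(T, k):
    return np.reshape(np.moveaxis(T, k, 0), (T.shape[k], -1), order='F')

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3, 2))

# U_k = left singular vectors of the mode-k unfolding, as in (19).
U = [np.linalg.svd(unfold(A, k), full_matrices=True)[0] for k in range(3)]

# Core tensor S = A x_1 U1^T x_2 U2^T x_3 U3^T, written as one contraction.
S = np.einsum('ijk,ia,jb,kc->abc', A, U[0], U[1], U[2])

# Exact reconstruction A = S x_1 U1 x_2 U2 x_3 U3.
A_rec = np.einsum('abc,ia,jb,kc->ijk', S, U[0], U[1], U[2])
print(np.allclose(A, A_rec))                          # True

# Slice norms of S reproduce the singular values of A_(1).
slice_norms = [np.linalg.norm(S[j, :, :]) for j in range(4)]
print(np.allclose(slice_norms,
                  np.linalg.svd(unfold(A, 0), compute_uv=False)))   # True
```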

7.6 The Tucker Nearness Problem

Suppose we are given $A \in \mathbb{R}^{n_1\times n_2\times n_3}$ and $r = [r_1, r_2, r_3] \le [n_1, n_2, n_3] = n$. In the Tucker nearness problem we determine $S \in \mathbb{R}^{r_1\times r_2\times r_3}$ and matrices $U_1 \in \mathbb{R}^{n_1\times r_1}$, $U_2 \in \mathbb{R}^{n_2\times r_2}$, and $U_3 \in \mathbb{R}^{n_3\times r_3}$ with orthonormal columns such that

$$\Phi(U_1,U_2,U_3) = \left\| A - \sum_{j=1}^{r} S(j)\cdot U_1(:,j_1) \circ U_2(:,j_2) \circ U_3(:,j_3) \right\|_F$$

is minimized. It is easy to show that

$$\Phi(U_1,U_2,U_3) = \| \operatorname{vec}(A) - (U_3 \otimes U_2 \otimes U_1)\operatorname{vec}(S) \|_2,$$

and so, using the normal equations, we see that

$$\operatorname{vec}(S) = (U_3 \otimes U_2 \otimes U_1)^T \operatorname{vec}(A).$$


This is why the objective function $\Phi$ does not have $S$ as an argument: the "best $S$" is determined by $U_1$, $U_2$, and $U_3$. The goal is to minimize

$$\Phi(U_1,U_2,U_3) = \left\| \bigl[ I - (U_3 \otimes U_2 \otimes U_1)(U_3 \otimes U_2 \otimes U_1)^T \bigr]\operatorname{vec}(A) \right\|_2.$$

Since $U_3 \otimes U_2 \otimes U_1$ has orthonormal columns, it follows that minimizing this norm is the same as maximizing

$$\Psi(U_1,U_2,U_3) = \| (U_3 \otimes U_2 \otimes U_1)^T \operatorname{vec}(A) \|_2.$$

The reformulations

$$\Psi(U_1,U_2,U_3) = \begin{cases} \| U_1^T\, A_{(1)}\, (U_3 \otimes U_2) \|_F\\ \| U_2^T\, A_{(2)}\, (U_3 \otimes U_1) \|_F\\ \| U_3^T\, A_{(3)}\, (U_2 \otimes U_1) \|_F \end{cases}$$

set the stage for a componentwise optimization approach:

    Fix $U_2$ and $U_3$ and choose $U_1$ to maximize $\| U_1^T\, A_{(1)}\, (U_3 \otimes U_2) \|_F$.
    Fix $U_1$ and $U_3$ and choose $U_2$ to maximize $\| U_2^T\, A_{(2)}\, (U_3 \otimes U_1) \|_F$.
    Fix $U_1$ and $U_2$ and choose $U_3$ to maximize $\| U_3^T\, A_{(3)}\, (U_2 \otimes U_1) \|_F$.

These optimization problems can be solved using the SVD. Consider the problem of maximizing $\| Q^T M \|_F$ where $Q \in \mathbb{R}^{m\times r}$ has orthonormal columns and $M \in \mathbb{R}^{m\times n}$ is given. If

$$M = U\Sigma V^T$$

is the SVD of $M$, then

$$\| Q^T M \|_F^2 = \| Q^T U\Sigma V^T \|_F^2 = \| Q^T U\Sigma \|_F^2 = \sum_{k} \sigma_k^2\, \| Q^T U(:,k) \|_2^2.$$

It is clear that we can maximize the summation by setting $Q = U(:,1{:}r)$: each weight $\| Q^T U(:,k) \|_2^2$ lies in $[0,1]$ and the weights sum to $r$, so the best we can do is place unit weight on the $r$ largest values $\sigma_k^2$. Putting it all together we obtain the following framework.


The Tucker Nearness Problem

    Given $A \in \mathbb{R}^{n_1\times n_2\times n_3}$.
    Compute the HOSVD of $A$ and determine $r = [r_1, r_2, r_3] \le n$.
    Set $U_1 = U_1(:,1{:}r_1)$, $U_2 = U_2(:,1{:}r_2)$, and $U_3 = U_3(:,1{:}r_3)$.
    Repeat:
        Compute the SVD $A_{(1)}(U_3 \otimes U_2) = \tilde{U}_1\Sigma_1 V_1^T$ and set $U_1 = \tilde{U}_1(:,1{:}r_1)$.
        Compute the SVD $A_{(2)}(U_3 \otimes U_1) = \tilde{U}_2\Sigma_2 V_2^T$ and set $U_2 = \tilde{U}_2(:,1{:}r_2)$.
        Compute the SVD $A_{(3)}(U_2 \otimes U_1) = \tilde{U}_3\Sigma_3 V_3^T$ and set $U_3 = \tilde{U}_3(:,1{:}r_3)$.
    $U_1^{(\mathrm{opt})} = U_1$,  $U_2^{(\mathrm{opt})} = U_2$,  $U_3^{(\mathrm{opt})} = U_3$.

Using the HOSVD to generate an initial guess makes sense given the discussion in Sect. 7.5. The matrix-matrix products, e.g., $A_{(1)}(U_3 \otimes U_2)$, are rich in exploitable Kronecker structure. See [29] for further details.
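A compact NumPy version of this alternating framework (often called higher-order orthogonal iteration in the literature) is sketched below; the helper names, the fixed number of sweeps, and the test dimensions are our choices, and the truncated HOSVD factors supply the initial guess as suggested above.

```python
# Alternating SVD framework for the Tucker nearness problem (sketch).
import numpy as np

def unfold(T, k):
    return np.reshape(np.moveaxis(T, k, 0), (T.shape[k], -1), order='F')

def tucker_nearness(A, r, iters=10):
    # Initial guess: leading left singular vectors of each unfolding (truncated HOSVD).
    U = [np.linalg.svd(unfold(A, k), full_matrices=False)[0][:, :r[k]] for k in range(3)]
    for _ in range(iters):
        for k in range(3):
            # Multiply A_(k) by the Kronecker product of the other two factors...
            others = [U[j] for j in range(3) if j != k]      # factors for the other modes
            M = unfold(A, k) @ np.kron(others[1], others[0])
            # ...and keep the r_k leading left singular vectors.
            U[k] = np.linalg.svd(M, full_matrices=False)[0][:, :r[k]]
    # Core tensor S = A x_1 U1^T x_2 U2^T x_3 U3^T.
    S = np.einsum('ijk,ia,jb,kc->abc', A, U[0], U[1], U[2])
    return S, U

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 5, 4))
S, U = tucker_nearness(A, (3, 3, 2))
A_hat = np.einsum('abc,ia,jb,kc->ijk', S, U[0], U[1], U[2])
print(np.linalg.norm(A - A_hat))   # Frobenius error of the rank-(3,3,2) Tucker fit
```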

7.7 A Jacobi Approach

Solving the Tucker nearness problem is not equivalent to maximizing the "diagonal mass" of the core tensor $S$. We briefly describe a Jacobi-like procedure that does. To motivate the main idea, consider the problem of maximizing $\operatorname{tr}(U_1^T A U_2)$ where $A \in \mathbb{R}^{n_1\times n_2}$ is given, $U_1 \in \mathbb{R}^{n_1\times n_1}$ is orthogonal, and $U_2 \in \mathbb{R}^{n_2\times n_2}$ is orthogonal. It is easy to show that the optimum $U_1$ and $U_2$ have the property that $U_1^T A U_2$ is diagonal with nonnegative diagonal entries. Thus, the SVD solves this particular "max trace" problem.

Now suppose $C$ is $n$-by-$n$-by-$n$ and define

$$\operatorname{tr}(C) = \sum_{i=1}^{n} c_{iii}.$$


Given $A \in \mathbb{R}^{n_1\times n_2\times n_3}$, our goal is to compute orthogonal $U_1 \in \mathbb{R}^{n_1\times n_1}$, $U_2 \in \mathbb{R}^{n_2\times n_2}$, and $U_3 \in \mathbb{R}^{n_3\times n_3}$ so that if the tensor $S$ is defined by

$$\operatorname{vec}(S) = (U_3 \otimes U_2 \otimes U_1)^T \operatorname{vec}(A),$$

then $\operatorname{tr}(S)$ is maximized. Here is a Jacobi-like strategy for updating the "current" orthogonal triplet $\{U_1, U_2, U_3\}$ so that the new core tensor has a larger trace.

A Jacobi Framework for Computing a Compressed Tucker Representation

    Given: $A \in \mathbb{R}^{n_1\times n_2\times n_3}$.
    Set $U_1 = I_{n_1}$, $U_2 = I_{n_2}$, $U_3 = I_{n_3}$, and $S = A$.
    Repeat:
        Find "simple" orthogonal $\tilde{U}_1$, $\tilde{U}_2$, and $\tilde{U}_3$ so that
            $\operatorname{tr}(S \times_1 \tilde{U}_1 \times_2 \tilde{U}_2 \times_3 \tilde{U}_3) > \operatorname{tr}(S)$
        Update:
            $S = S \times_1 \tilde{U}_1 \times_2 \tilde{U}_2 \times_3 \tilde{U}_3$
            $U_1 = U_1\tilde{U}_1$,  $U_2 = U_2\tilde{U}_2$,  $U_3 = U_3\tilde{U}_3$
    $U_1^{(\mathrm{opt})} = U_1$,  $U_2^{(\mathrm{opt})} = U_2$,  $U_3^{(\mathrm{opt})} = U_3$.

We say that this iteration strives for a "compressed" Tucker representation because there is an explicit attempt to compress the information in $A$ into a relatively small number of near-the-diagonal entries in $S$. One idea for $\tilde{U}_3 \otimes \tilde{U}_2 \otimes \tilde{U}_1$ is to use carefully designed Jacobi rotations, e.g.,

$$\tilde{U}_3 \otimes \tilde{U}_2 \otimes \tilde{U}_1 = \begin{cases} I_{n_3} \otimes J_{pq}(\beta) \otimes J_{pq}(\alpha)\\ J_{pq}(\beta) \otimes I_{n_2} \otimes J_{pq}(\alpha)\\ J_{pq}(\beta) \otimes J_{pq}(\alpha) \otimes I_{n_1} \end{cases}$$

Here, $J_{pq}(\theta)$ is a Jacobi rotation in planes $p$ and $q$. These updates modify only two diagonal entries: $s_{ppp}$ and $s_{qqq}$. Sines and cosines can be chosen to increase the resulting trace and their determination leads to a 2-by-2-by-2 Jacobi rotation subproblem.

For example, determine $c_\alpha = \cos(\alpha)$, $s_\alpha = \sin(\alpha)$, $c_\beta = \cos(\beta)$, and $s_\beta = \sin(\beta)$, so that if

$$\begin{bmatrix} \sigma_{ppp} & \sigma_{pqp}\\ \sigma_{qpp} & \sigma_{qqp} \end{bmatrix} = \begin{bmatrix} c_\alpha & s_\alpha\\ -s_\alpha & c_\alpha \end{bmatrix}^T \begin{bmatrix} s_{ppp} & s_{pqp}\\ s_{qpp} & s_{qqp} \end{bmatrix} \begin{bmatrix} c_\beta & s_\beta\\ -s_\beta & c_\beta \end{bmatrix}$$


and

$$\begin{bmatrix} \sigma_{ppq} & \sigma_{pqq}\\ \sigma_{qpq} & \sigma_{qqq} \end{bmatrix} = \begin{bmatrix} c_\alpha & s_\alpha\\ -s_\alpha & c_\alpha \end{bmatrix}^T \begin{bmatrix} s_{ppq} & s_{pqq}\\ s_{qpq} & s_{qqq} \end{bmatrix} \begin{bmatrix} c_\beta & s_\beta\\ -s_\beta & c_\beta \end{bmatrix}$$

then $\sigma_{ppp} + \sigma_{qqq}$ is maximized. See [20].

8 The CP Decomposition

As with the Tucker representation, the CP representation of a tensor expresses the tensor as a sum of rank-1 tensors. However, it does not involve orthogonality and the core tensor is truly diagonal, i.e., $s_{ijk} = 0$ unless $i = j = k$.

A note about terminology before we begin. The ideas behind the CP decomposition are very similar to the ideas behind CANDECOMP (Canonical Decomposition) and PARAFAC (Parallel Factors Decomposition). Thus, "CP" is an effective way to acknowledge the connections.

8.1 CP Representations: The Matrix Case

For matrices, the SVD

$$A = U_1\Sigma U_2^T = \sum_{i} \sigma_i\, U_1(:,i)\, U_2(:,i)^T$$

is an example of a CP decomposition. But an eigenvalue decomposition also qualifies. If $A$ is diagonalizable, then we have

$$A = U_1\operatorname{diag}(\lambda_i)\, U_2^T = \sum_{i} \lambda_i\, U_1(:,i)\, U_2(:,i)^T$$

where $U_2^T = U_1^{-1}$. Of course, orthogonality is part of the SVD and biorthogonality ($U_2^T U_1 = I$) figures in eigenvalue diagonalization. This kind of structure falls by the wayside when we graduate to tensors.


8.2 CP Representation: The Tensor Case

We use the order-3 situation to expose the key ideas. The CP representation for an $n_1\times n_2\times n_3$ tensor $A$ has the form

$$A = \sum_{k=1}^{r} \lambda_k\, U_1(:,k) \circ U_2(:,k) \circ U_3(:,k)$$

where the $\lambda$'s are real scalars and $U_1 \in \mathbb{R}^{n_1\times r}$, $U_2 \in \mathbb{R}^{n_2\times r}$, and $U_3 \in \mathbb{R}^{n_3\times r}$ have unit 2-norm columns. Alternatively, we have

$$A(i_1,i_2,i_3) = \sum_{j=1}^{r} \lambda_j\cdot U_1(i_1,j)\cdot U_2(i_2,j)\cdot U_3(i_3,j) \qquad (21)$$

$$\operatorname{vec}(A) = \sum_{j=1}^{r} \lambda_j\cdot U_3(:,j) \otimes U_2(:,j) \otimes U_1(:,j). \qquad (22)$$

In contrast to the Tucker representation,

$$A = \sum_{j_1=1}^{r_1}\sum_{j_2=1}^{r_2}\sum_{j_3=1}^{r_3} S(j_1,j_2,j_3)\cdot U_1(:,j_1) \circ U_2(:,j_2) \circ U_3(:,j_3),$$

we see that the CP representation involves a diagonal core tensor. The Tucker representation gives that up in exchange for orthonormal $U$ matrices.
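For concreteness, the sketch below (ours; `cp_to_tensor` is a name we introduce) materializes a tensor from a given CP representation in the two equivalent ways (21) and (22) and confirms that they agree.

```python
# Build a tensor from a CP representation, via (21) and via (22) (sketch).
import numpy as np

def cp_to_tensor(lam, U1, U2, U3):
    """Sum of lam[j] * U1(:,j) o U2(:,j) o U3(:,j), cf. (21)."""
    return np.einsum('j,ij,kj,lj->ikl', lam, U1, U2, U3)

rng = np.random.default_rng(8)
n1, n2, n3, r = 4, 3, 2, 3
U1, U2, U3 = (rng.standard_normal((n, r)) for n in (n1, n2, n3))
U1, U2, U3 = (U / np.linalg.norm(U, axis=0) for U in (U1, U2, U3))  # unit 2-norm columns
lam = rng.standard_normal(r)

A = cp_to_tensor(lam, U1, U2, U3)

# The same tensor via vec(A) = sum_j lam_j * U3(:,j) kron U2(:,j) kron U1(:,j), cf. (22).
vecA = sum(lam[j] * np.kron(U3[:, j], np.kron(U2[:, j], U1[:, j])) for j in range(r))
print(np.allclose(vecA, A.reshape(-1, order='F')))   # True
```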

8.3 More About Tensor Rank

As we mentioned in Sect. 5.3, the rank of a tensor $A$ is the minimum number of rank-1 tensors that sum to $A$. Thus, the length of the shortest possible CP representation of a tensor is its rank. Our analysis of the 2-by-2-by-2 situation indicates that there are several fundamental differences between tensor rank and matrix rank. Here are some more anomalies:

Anomaly 1. The largest rank attainable for an $n_1$-by-$\cdots$-by-$n_d$ tensor is called the maximum rank. There is no simple formula for it in terms of the dimensions $n_1,\ldots,n_d$; indeed, its precise value is known only for small examples. Maximum rank does not equal $\min\{n_1,\ldots,n_d\}$ unless $d \le 2$.

Anomaly 2. If the set of rank-$k$ tensors in $\mathbb{R}^{n_1\times\cdots\times n_d}$ has positive Lebesgue measure, then $k$ is a typical rank. Here are some examples where this quantity is known:


    Size         Typical ranks
    2 × 2 × 2    2, 3
    3 × 3 × 3    4
    3 × 3 × 4    4, 5
    3 × 3 × 5    5, 6

For $n_1$-by-$n_2$ matrices, typical rank and maximal rank are both equal to the smaller of $n_1$ and $n_2$.

Anomaly 3. The rank of a particular tensor over the real field may be different from its rank over the complex field.

Anomaly 4. It is possible for a tensor of a given rank to be approximated arbitrarily well by tensors of lesser rank. Such a tensor is said to be degenerate.

For more on the issue of tensor rank, see [10] and [14].

8.4 The Nearest CP Problem

Suppose $A \in \mathbb{R}^{n_1\times n_2\times n_3}$ and $r$ are given. The nearest CP approximation problem involves finding a vector $\lambda \in \mathbb{R}^r$ and matrices $U_1 \in \mathbb{R}^{n_1\times r}$, $U_2 \in \mathbb{R}^{n_2\times r}$, and $U_3 \in \mathbb{R}^{n_3\times r}$ (with unit 2-norm columns) so that

$$\Phi(U_1,U_2,U_3,\lambda) = \left\| A - \sum_{j=1}^{r} \lambda_j\cdot U_1(:,j) \circ U_2(:,j) \circ U_3(:,j) \right\|_F$$

is minimized. The objective function for this multilinear optimization problem has three different formulations:

$$\Phi(U_1,U_2,U_3,\lambda) = \begin{cases}
\left\| A_{(1)} - \sum_{j=1}^{r} \lambda_j\, U_1(:,j)\,\bigl(U_3(:,j) \otimes U_2(:,j)\bigr)^T \right\|_F\\[4pt]
\left\| A_{(2)} - \sum_{j=1}^{r} \lambda_j\, U_2(:,j)\,\bigl(U_3(:,j) \otimes U_1(:,j)\bigr)^T \right\|_F\\[4pt]
\left\| A_{(3)} - \sum_{j=1}^{r} \lambda_j\, U_3(:,j)\,\bigl(U_2(:,j) \otimes U_1(:,j)\bigr)^T \right\|_F
\end{cases}$$