
Bridging the Gap from Matrix to Tensor

Block Tensor Computations

Charles Van Loan

Cornell University, Department of Computer Science

Householder XVIII, June 12–17, 2011

Tahoe City, California


A 5-by-5-by-5 Block Tensor...

Assuming that each of the 125 blocks is an order-3 tensor and that they all “fit together.”


More Generally, What is a Block Tensor?

Informal Definition

A tensor whose entries are other tensors of the same order.

Example

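If $\mathcal{A} \in \mathbb{R}^{9\times 5\times 8\times 7}$ and the index ranges are partitioned as

$$
\begin{aligned}
1{:}9 &= [\; 1{:}2 \;\; 3{:}5 \;\; 6{:}9 \;] \\
1{:}5 &= [\; 1{:}3 \;\; 4{:}5 \;] \\
1{:}8 &= [\; 1{:}2 \;\; 3{:}4 \;\; 5{:}6 \;\; 7{:}8 \;] \\
1{:}7 &= [\; 1{:}4 \;\; 5{:}7 \;]
\end{aligned}
$$

then $\mathcal{A}$ can be regarded as a 3-by-2-by-4-by-2 block tensor.

The (2,1,3,2) block: $\mathcal{A}_{2132} = \mathcal{A}(3{:}5,\ 1{:}3,\ 5{:}6,\ 5{:}7)$.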
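
The block lookup in this example can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the helper name `block` and the `partitions` list are hypothetical, and the tensor is filled with consecutive integers just so that blocks can be compared.

```python
import numpy as np

# Partition of each mode as 1-based inclusive (start, stop) pairs,
# copied from the example: 1:9 = [1:2 3:5 6:9], and so on.
partitions = [
    [(1, 2), (3, 5), (6, 9)],
    [(1, 3), (4, 5)],
    [(1, 2), (3, 4), (5, 6), (7, 8)],
    [(1, 4), (5, 7)],
]

def block(A, block_index):
    """Return the block of A with the given 1-based block index (hypothetical helper)."""
    slices = tuple(
        slice(parts[b - 1][0] - 1, parts[b - 1][1])  # convert to 0-based half-open
        for parts, b in zip(partitions, block_index)
    )
    return A[slices]

A = np.arange(9 * 5 * 8 * 7).reshape(9, 5, 8, 7)
A2132 = block(A, (2, 1, 3, 2))          # A(3:5, 1:3, 5:6, 5:7) in 1-based notation
block_grid = tuple(len(parts) for parts in partitions)
```

Here `block_grid` is the 3-by-2-by-4-by-2 block shape and `A2132` is a 3-by-3-by-2-by-3 tensor.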


The Context

The Next BIG Thing?

Scalar-Level Thinking

⇓ 1960’s

Matrix-Level Thinking ⇐ The factorization paradigm: $LU$, $LDL^T$, $QR$, $U\Sigma V^T$, etc.

⇓ 1980’s

Block Matrix-Level Thinking ⇐ Cache utilization, parallel computing, LAPACK, etc.

⇓ 2000’s

Tensor-Level Thinking ⇐ New applications, factorizations, data structures, non-linear analysis, optimization strategies, etc.


The Context (Continued)

The Changing Definition of “Big”

In Matrix Computations, to say that $A \in \mathbb{R}^{n_1\times n_2}$ is “big” is to say that both $n_1$ and $n_2$ are big.

In Tensor Computations, to say that $\mathcal{A} \in \mathbb{R}^{n_1\times\cdots\times n_d}$ is “big” is to say that $n_1 n_2 \cdots n_d$ is big, and this need not require big $n_k$. E.g., $n_1 = n_2 = \cdots = n_{1000} = 2$.

Algorithms that beat the “curse of dimensionality” will induce a transition...

Matrix-Based Scientific Computation

⇓

Tensor-Based Scientific Computation


The Context (Continued)

Tensor-Related Presentations at Householder XVIII

Karen Braman Tamara Kolda

Lars Elden Lek-Heng Lim

Shmuel Friedland Ivan Oseledets

Thomas Huckle Stefan Ragnarsson

Misha Kilmer Berkant Savas

Sabine Van Huffel


The Fringe Benefits of Blocking In Matrix Computations

Insight

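FFT:

$$
F_{2m}\,x =
\begin{bmatrix} I_m & \mathrm{diag}(\omega_n^k) \\ I_m & -\mathrm{diag}(\omega_n^k) \end{bmatrix}
\begin{bmatrix} F_m & 0 \\ 0 & F_m \end{bmatrix}
\begin{bmatrix} x(1{:}2{:}2m) \\ x(2{:}2{:}2m) \end{bmatrix},
\qquad n = 2m,\quad k = 0{:}m-1
$$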

Data Re-Use

Level-3 BLAS =⇒ Block LU, QR, etc

Generalization

Lanczos =⇒ Block Lanczos

Givens Rotations =⇒ CS Decomposition

Versions of these stories are beginning to play out in tensor computations, and this talk is about that.


Acknowledgments

Lieven De Lathauwer

PhD Thesis (1997)

Recent SIMAX papers that connect the Parafac and Tucker representations through a block-structured core tensor. (2010)

What follows is based upon ongoing research and two papers...

Block Tensors and Symmetric Embeddings (with Stefan Ragnarsson)

Block Tensor Unfoldings (with Stefan Ragnarsson)

NSF DMS-1016284


Tensor Unfolding

A Common Framework for Tensor Computations...

1. Reshape tensor $\mathcal{A}$ into a matrix $A$.

2. Through matrix computations, discover things about matrix $A$.

3. Draw conclusions about tensor $\mathcal{A}$ based on what is learned about matrix $A$.

“Reshape” ≡ “Unfold” ≡ “Matricize” ≡ “Flatten”


Modal Unfoldings

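A Mode-1 Unfolding of $\mathcal{A} \in \mathbb{R}^{4\times 3\times 2}$

$$
A_{(1)} =
\begin{bmatrix}
a_{111} & a_{121} & a_{131} & a_{112} & a_{122} & a_{132} \\
a_{211} & a_{221} & a_{231} & a_{212} & a_{222} & a_{232} \\
a_{311} & a_{321} & a_{331} & a_{312} & a_{322} & a_{332} \\
a_{411} & a_{421} & a_{431} & a_{412} & a_{422} & a_{432}
\end{bmatrix}
$$

Column labels $(j,k)$: (1,1) (2,1) (3,1) (1,2) (2,2) (3,2)

A Mode-2 Unfolding of $\mathcal{A} \in \mathbb{R}^{4\times 3\times 2}$

$$
A_{(2)} =
\begin{bmatrix}
a_{111} & a_{211} & a_{311} & a_{411} & a_{112} & a_{212} & a_{312} & a_{412} \\
a_{121} & a_{221} & a_{321} & a_{421} & a_{122} & a_{222} & a_{322} & a_{422} \\
a_{131} & a_{231} & a_{331} & a_{431} & a_{132} & a_{232} & a_{332} & a_{432}
\end{bmatrix}
$$

Column labels $(i,k)$: (1,1) (2,1) (3,1) (4,1) (1,2) (2,2) (3,2) (4,2)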
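
The two modal unfoldings above can be reproduced with a short NumPy sketch. The function name `unfold` is an assumption made for illustration, and each entry $a_{ijk}$ is encoded as the integer 100i + 10j + k so that the output can be read against the slide.

```python
import numpy as np

def unfold(A, mode):
    """Mode-k unfolding (0-based mode); remaining indices vary first-fastest."""
    return np.moveaxis(A, mode, 0).reshape((A.shape[mode], -1), order="F")

# Entry a_ijk is stored as the integer 100*i + 10*j + k (1-based digits).
i, j, k = np.indices((4, 3, 2)) + 1
A = 100 * i + 10 * j + k

A1 = unfold(A, 0)   # 4-by-6, columns ordered (1,1), (2,1), (3,1), (1,2), ...
A2 = unfold(A, 1)   # 3-by-8, columns ordered (1,1), (2,1), (3,1), (4,1), ...
```

The `order="F"` argument is what makes the first remaining index vary fastest across the columns, matching the column labels shown.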


More General Unfoldings

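If $\mathcal{A} \in \mathbb{R}^{2\times 3\times 2\times 2\times 3}$, r = [1, 2, 4], and c = [3, 5], then the rows of $A_{r\times c}$ are indexed by $(i_1, i_2, i_4)$ and the columns by $(i_3, i_5)$:

$$
A_{r\times c} =
\begin{array}{c|cccccc}
 & (1,1) & (2,1) & (1,2) & (2,2) & (1,3) & (2,3) \\ \hline
(1,1,1) & a_{11111} & a_{11211} & a_{11112} & a_{11212} & a_{11113} & a_{11213} \\
(2,1,1) & a_{21111} & a_{21211} & a_{21112} & a_{21212} & a_{21113} & a_{21213} \\
(1,2,1) & a_{12111} & a_{12211} & a_{12112} & a_{12212} & a_{12113} & a_{12213} \\
(2,2,1) & a_{22111} & a_{22211} & a_{22112} & a_{22212} & a_{22113} & a_{22213} \\
(1,3,1) & a_{13111} & a_{13211} & a_{13112} & a_{13212} & a_{13113} & a_{13213} \\
(2,3,1) & a_{23111} & a_{23211} & a_{23112} & a_{23212} & a_{23113} & a_{23213} \\
(1,1,2) & a_{11121} & a_{11221} & a_{11122} & a_{11222} & a_{11123} & a_{11223} \\
(2,1,2) & a_{21121} & a_{21221} & a_{21122} & a_{21222} & a_{21123} & a_{21223} \\
(1,2,2) & a_{12121} & a_{12221} & a_{12122} & a_{12222} & a_{12123} & a_{12223} \\
(2,2,2) & a_{22121} & a_{22221} & a_{22122} & a_{22222} & a_{22123} & a_{22223} \\
(1,3,2) & a_{13121} & a_{13221} & a_{13122} & a_{13222} & a_{13123} & a_{13223} \\
(2,3,2) & a_{23121} & a_{23221} & a_{23122} & a_{23222} & a_{23123} & a_{23223}
\end{array}
$$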


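With the Tensor Toolbox...

A = tenrand([2 3 2 2 3]);   % random 2-by-3-by-2-by-2-by-3 tensor

r = [1 2 4];                % modes that index the rows

c = [3 5];                  % modes that index the columns

Amat = tenmat(A,r,c);       % the 12-by-6 unfolding

Kolda and Bader (2006)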
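
For readers without MATLAB, here is a NumPy sketch that is meant to reproduce the 12-by-6 unfolding displayed on the previous slide. It is not the Tensor Toolbox implementation: `unfold_rc` is a hypothetical helper, and the entry $a_{i_1 i_2 i_3 i_4 i_5}$ is encoded as a five-digit integer so the layout can be checked by eye.

```python
import numpy as np

def unfold_rc(A, r, c):
    """Unfold A so that modes r (1-based) index rows and modes c index columns.

    Within each group the first listed mode varies fastest.
    """
    r0 = [m - 1 for m in r]
    c0 = [m - 1 for m in c]
    n_rows = int(np.prod([A.shape[m] for m in r0]))
    return np.transpose(A, r0 + c0).reshape((n_rows, -1), order="F")

# a_{i1 i2 i3 i4 i5} is stored as the integer with digits i1 i2 i3 i4 i5.
idx = np.indices((2, 3, 2, 2, 3)) + 1
A = sum(idx[m] * 10 ** (4 - m) for m in range(5))

Amat = unfold_rc(A, [1, 2, 4], [3, 5])
```

The first row of `Amat` reads 11111, 11211, 11112, 11212, 11113, 11213, which agrees with the first row of the displayed unfolding.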


Block Unfoldings

Tensor Blocks Are Not Contiguous in a Vec-Based Unfolding

[Figure: the entries of three sample blocks, A311 = A(6:9, 1:3, 1:2), A213 = A(3:5, 1:3, 5:6), and A124 = A(1:2, 4:5, 7:8), appear scattered across the vec-based unfolding.]

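Tensor Blocks Are Contiguous in a Block Unfolding

$$
\begin{bmatrix}
(\mathcal{A}_{111})_{(1)} & (\mathcal{A}_{121})_{(1)} & (\mathcal{A}_{112})_{(1)} & (\mathcal{A}_{122})_{(1)} & (\mathcal{A}_{113})_{(1)} & (\mathcal{A}_{123})_{(1)} & (\mathcal{A}_{114})_{(1)} & (\mathcal{A}_{124})_{(1)} \\
(\mathcal{A}_{211})_{(1)} & (\mathcal{A}_{221})_{(1)} & (\mathcal{A}_{212})_{(1)} & (\mathcal{A}_{222})_{(1)} & (\mathcal{A}_{213})_{(1)} & (\mathcal{A}_{223})_{(1)} & (\mathcal{A}_{214})_{(1)} & (\mathcal{A}_{224})_{(1)} \\
(\mathcal{A}_{311})_{(1)} & (\mathcal{A}_{321})_{(1)} & (\mathcal{A}_{312})_{(1)} & (\mathcal{A}_{322})_{(1)} & (\mathcal{A}_{313})_{(1)} & (\mathcal{A}_{323})_{(1)} & (\mathcal{A}_{314})_{(1)} & (\mathcal{A}_{324})_{(1)}
\end{bmatrix}
$$

The block rows have heights 2, 3, 4 and the block columns have widths 6, 4, 6, 4, 6, 4, 6, 4.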


Vec vs BlockVec

The Vec Ordering...

A =

1 12 23 34 45 56 67 78 89

2 13 24 35 46 57 68 79 90

3 14 25 36 47 58 69 80 91

4 15 26 37 48 59 70 81 92

5 16 27 38 49 60 71 82 93

6 17 28 39 50 61 72 83 94

7 18 29 40 51 62 73 84 95

8 19 30 41 52 63 74 85 96

9 20 31 42 53 64 75 86 97

10 21 32 43 54 65 76 87 98

11 22 33 44 55 66 77 88 99

If v = Vec(A) then v(58) = A(3,6).


Vec vs BlockVec

The BlockVec Ordering...

A =

1 3 23 25 27 29 67 69 71

2 4 24 26 28 30 68 70 72

5 10 31 36 41 46 73 78 83

6 11 32 37 42 47 74 79 84

7 12 33 38 43 48 75 80 85

8 13 34 39 44 49 76 81 86

9 14 35 40 45 50 77 82 87

15 19 51 55 59 63 88 92 96

16 20 52 56 60 64 89 93 97

17 21 53 57 61 65 90 94 98

18 22 54 58 62 66 91 95 99

If v = BlockVec(A) then v(46) = A(3,6).
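
The two index claims, v(58) = A(3,6) for Vec and v(46) = A(3,6) for BlockVec, can be checked with a small pure-Python sketch. The blocking used here (row blocks of heights 2, 5, 4 and column blocks of widths 2, 4, 3) is not stated on the slide; it is read off from the numbering in the BlockVec display and should be treated as an assumption. The function names are hypothetical.

```python
from itertools import accumulate

def vec_index(i, j, n_rows):
    """1-based position of A(i, j) in Vec(A) (column-major order)."""
    return (j - 1) * n_rows + i

def blockvec_index(i, j, row_sizes, col_sizes):
    """1-based position of A(i, j) in BlockVec(A).

    Blocks are visited in column-major block order and each block is
    stored column by column.
    """
    row_starts = [0] + list(accumulate(row_sizes))   # rows above each block row
    col_starts = [0] + list(accumulate(col_sizes))   # columns left of each block column
    bi = max(b for b in range(len(row_sizes)) if row_starts[b] < i)
    bj = max(b for b in range(len(col_sizes)) if col_starts[b] < j)
    # Entries stored before block (bi, bj): all blocks in earlier block
    # columns, then the blocks above it in the same block column.
    before = row_starts[-1] * col_starts[bj] + row_starts[bi] * col_sizes[bj]
    li, lj = i - row_starts[bi], j - col_starts[bj]  # position inside the block
    return before + (lj - 1) * row_sizes[bi] + li

row_sizes, col_sizes = [2, 5, 4], [2, 4, 3]          # inferred from the display
p_vec = vec_index(3, 6, 11)
p_blockvec = blockvec_index(3, 6, row_sizes, col_sizes)
```

With this blocking, `p_vec` is 58 and `p_blockvec` is 46, matching the two slides.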


Vec vs BlockVec

Vec for Tensors...

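If $\mathcal{A} \in \mathbb{R}^{n_1\times n_2\times n_3}$, then

$$
\mathrm{vec}(\mathcal{A}) =
\begin{bmatrix}
\mathrm{vec}(\mathcal{A}(:,:,1)) \\ \vdots \\ \mathrm{vec}(\mathcal{A}(:,:,n_3))
\end{bmatrix}
$$

BlockVec for Tensors

If $\mathcal{A}$ is a 2-by-2-by-2 block tensor, then

$$
\mathrm{BlockVec}(\mathcal{A}) =
\begin{bmatrix}
\mathrm{vec}(\mathcal{A}_{111}) \\ \mathrm{vec}(\mathcal{A}_{211}) \\ \mathrm{vec}(\mathcal{A}_{121}) \\ \mathrm{vec}(\mathcal{A}_{221}) \\
\mathrm{vec}(\mathcal{A}_{112}) \\ \mathrm{vec}(\mathcal{A}_{212}) \\ \mathrm{vec}(\mathcal{A}_{122}) \\ \mathrm{vec}(\mathcal{A}_{222})
\end{bmatrix}
$$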


Connections

Basic Theorem

We have a complete specification of the permutation that maps Vec(A) to BlockVec(A), where A is a tensor with blocking M:

$$
\mathrm{BlockVec}(\mathcal{A}) = P_M\, \mathrm{Vec}(\mathcal{A})
$$

It is an intricate combination of perfect shuffles.

Block Unfolding Theorem

Relates the r × c vec-based unfolding to the corresponding block unfolding:

$$
A_{R\times C} = P_R\, A_{r\times c}\, P_C^T
$$

Blocks in the tensor become contiguous blocks in the unfolding.

Some ramifications...


Symmetric Embedding and Tensor Rank


Is There a Tensor Analog of This?

The “Sym” of a Matrix

$$
\mathrm{sym}(A) = \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix} \in \mathbb{R}^{(n_1+n_2)\times(n_1+n_2)}
$$

The SVD of A Relates to the EVD of sym(A)

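If $A = U \cdot \mathrm{diag}(\sigma_i) \cdot V^T$ is the SVD of $A \in \mathbb{R}^{n_1\times n_2}$, then for k = 1:rank(A)

$$
\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}
\begin{bmatrix} u_k \\ \pm v_k \end{bmatrix}
= \pm\sigma_k \begin{bmatrix} u_k \\ \pm v_k \end{bmatrix}
$$

where $u_k = U(:,k)$ and $v_k = V(:,k)$.

Try to shed light on the tensor rank problem and connect some well-known power iterations.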
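
The matrix-level relation between the SVD of A and the eigenvalue decomposition of sym(A) is easy to confirm numerically. The sketch below uses a random 4-by-3 matrix (an arbitrary choice) and standard NumPy routines.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
n1, n2 = A.shape

# sym(A) = [0 A; A^T 0]
sym_A = np.block([[np.zeros((n1, n1)), A],
                  [A.T, np.zeros((n2, n2))]])

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
u, v = U[:, 0], Vt[0, :]                 # leading singular vectors

x_plus = np.concatenate([u, v])          # eigenvector for +sigma_1
x_minus = np.concatenate([u, -v])        # eigenvector for -sigma_1

eigenvalues = np.linalg.eigvalsh(sym_A)  # ascending order
```

The eigenvalues of `sym_A` are the values ±σ_k together with n1 − n2 zeros, here one zero.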


Tensor Transposition: The Order-3 Case

Six possibilities...

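If $\mathcal{A} \in \mathbb{R}^{n_1\times n_2\times n_3}$, then there are 6 = 3! possible transpositions, identified by the notation $\mathcal{A}^{<[i\ j\ k]>}$ where [i j k] is a permutation of [1 2 3]:

$$
\mathcal{B} =
\left\{\begin{array}{l}
\mathcal{A}^{<[1\ 2\ 3]>} \\ \mathcal{A}^{<[1\ 3\ 2]>} \\ \mathcal{A}^{<[2\ 1\ 3]>} \\ \mathcal{A}^{<[2\ 3\ 1]>} \\ \mathcal{A}^{<[3\ 1\ 2]>} \\ \mathcal{A}^{<[3\ 2\ 1]>}
\end{array}\right\}
\;\Longrightarrow\;
\left\{\begin{array}{l}
b_{ijk} \\ b_{ikj} \\ b_{jik} \\ b_{jki} \\ b_{kij} \\ b_{kji}
\end{array}\right\}
= a_{ijk}
$$

for $i = 1{:}n_1$, $j = 1{:}n_2$, $k = 1{:}n_3$.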
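
In NumPy the six transpositions map onto `np.transpose` with the permutation shifted to 0-based indexing. The wrapper name `tensor_transpose` is hypothetical; the sketch checks the defining relation for every permutation on a small tensor.

```python
import numpy as np
from itertools import permutations

def tensor_transpose(A, perm):
    """Return B = A^{<perm>} for a 1-based permutation perm of the modes."""
    return np.transpose(A, [p - 1 for p in perm])

A = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Example: perm = [2 3 1] gives b_{jki} = a_{ijk}.
B = tensor_transpose(A, [2, 3, 1])

def defining_relation_holds(A, perm):
    """Check that B(index permuted by perm) equals A(index) for all entries."""
    Bp = tensor_transpose(A, perm)
    return all(
        Bp[tuple(idx[p - 1] for p in perm)] == A[idx]
        for idx in np.ndindex(A.shape)
    )

all_ok = all(defining_relation_holds(A, perm) for perm in permutations([1, 2, 3]))
```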


Supersymmetry

Order-3 Definition

$$
\mathcal{A}(i,j,k) = \mathcal{A}(i,k,j) = \mathcal{A}(j,i,k) = \mathcal{A}(j,k,i) = \mathcal{A}(k,i,j) = \mathcal{A}(k,j,i)
$$


Symmetric Embedding of a Tensor

An Order-3 Example...

Note the careful placement of A’s six transposes

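[Figure: the three block slices C(:, :, 1), C(:, :, 2), and C(:, :, 3) of the embedding, each drawn as a block matrix. C(:, :, 1) holds A<[321]> and A<[231]>, C(:, :, 2) holds A<[312]> and A<[132]>, and C(:, :, 3) holds A<[123]> and A<[213]>, so that block (i, j, k) of C holds A<[i j k]>.]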


Rank-1 Tensors

An Example

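$$
\mathcal{A} =
\begin{bmatrix} 10 \\ 20 \\ 30 \end{bmatrix} \circ
\begin{bmatrix} 40 \\ 50 \end{bmatrix} \circ
\begin{bmatrix} 60 \\ 70 \\ 80 \\ 90 \end{bmatrix} \circ
\begin{bmatrix} 100 \\ 110 \end{bmatrix}
$$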

A(3, 1, 4, 2) = 30 · 40 · 90 · 110
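
A quick NumPy sketch of this rank-1 example follows. It builds the outer product with `np.einsum`, reads off the entry A(3,1,4,2), and also computes the ranks of the four modal unfoldings, which should all equal 1 for a rank-1 tensor. The helper `unfold` is an illustrative assumption.

```python
import numpy as np

u = np.array([10, 20, 30])
v = np.array([40, 50])
w = np.array([60, 70, 80, 90])
z = np.array([100, 110])

# Outer product: A(i, j, k, l) = u(i) * v(j) * w(k) * z(l)
A = np.einsum("i,j,k,l->ijkl", u, v, w, z)

entry = A[2, 0, 3, 1]                      # A(3, 1, 4, 2) in 1-based notation

def unfold(T, mode):
    """Mode-k unfolding (0-based mode)."""
    return np.moveaxis(T, mode, 0).reshape((T.shape[mode], -1), order="F")

modal_ranks = [int(np.linalg.matrix_rank(unfold(A, k))) for k in range(4)]
```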


Some Tensor Rank Definitions (Order-4 Examples)

Outer Product Rank

Shortest sum of the form $\mathcal{A} = \sum_{k=1}^{r} u_k \circ v_k \circ w_k \circ z_k$

Multilinear Rank $[r_1, r_2, r_3, r_4]$

$r_1$ = rank of the Mode-1 unfolding

$r_2$ = rank of the Mode-2 unfolding

$r_3$ = rank of the Mode-3 unfolding

$r_4$ = rank of the Mode-4 unfolding

Symmetric Rank

Shortest sum of the form $\mathcal{A} = \sum_{k=1}^{r} \alpha_k \cdot u_k \circ u_k \circ u_k \circ u_k$


Some Results for Order-d Tensors

Outer Product Rank

d · rank(A) ≤ rank(sym(A)) ≤ d! · rank(A)

Multilinear Rank

If $\mathcal{A}$ is an order-d tensor with multilinear rank $[r_1, \ldots, r_d]$, then the multilinear rank of sym($\mathcal{A}$) is $[r, r, \ldots, r]$ where $r = r_1 + \cdots + r_d$.


Contractions and Unfoldings


Vec-Based Contractions and Unfoldings

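A Canonical Example...

$$
\mathcal{C}(i_1, i_2, j_1, j_2, j_3) = \sum_{k_1=1}^{p_1} \sum_{k_2=1}^{p_2} \mathcal{A}(i_1, i_2, k_1, k_2)\, \mathcal{B}(k_1, k_2, j_1, j_2, j_3)
$$

With Multi-index Notation...

$$
\mathcal{C}(\mathbf{i}, \mathbf{j}) = \sum_{\mathbf{k}=\mathbf{1}}^{\mathbf{p}} \mathcal{A}(\mathbf{i}, \mathbf{k})\, \mathcal{B}(\mathbf{k}, \mathbf{j})
$$

As a Product of Unfoldings...

$$
C_{[1\ 2]\times[3\ 4\ 5]} = A_{[1\ 2]\times[3\ 4]} \cdot B_{[1\ 2]\times[3\ 4\ 5]}
$$

A tensor contraction doesn’t just look like a matrix multiplication: it IS a matrix multiplication!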
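
The claim can be tested directly. The sketch below evaluates the canonical contraction once with `np.einsum` and once as a product of unfoldings; the dimensions are arbitrary small values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m1, m2, p1, p2 = 2, 3, 4, 2
n1, n2, n3 = 3, 2, 2
A = rng.standard_normal((m1, m2, p1, p2))
B = rng.standard_normal((p1, p2, n1, n2, n3))

# Direct evaluation of the double sum over k1 and k2.
C_direct = np.einsum("abkl,klcde->abcde", A, B)

# Unfold, multiply, fold back. order="F" makes the first index of each
# group vary fastest, so the pair (k1, k2) is ordered identically in the
# columns of A_mat and the rows of B_mat.
A_mat = A.reshape((m1 * m2, p1 * p2), order="F")        # [1 2] x [3 4]
B_mat = B.reshape((p1 * p2, n1 * n2 * n3), order="F")   # [1 2] x [3 4 5]
C_mat = A_mat @ B_mat                                   # [1 2] x [3 4 5]
C_folded = C_mat.reshape((m1, m2, n1, n2, n3), order="F")
```

Any consistent ordering of the multi-indices would work; column-major ordering is used here only to match the unfolding convention of the earlier slides.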


Blocked Matrix Multiplication

Visualization

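$$
\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \\ C_{31} & C_{32} \end{bmatrix}
=
\begin{bmatrix} A_{11} & A_{12} & A_{13} & A_{14} \\ A_{21} & A_{22} & A_{23} & A_{24} \\ A_{31} & A_{32} & A_{33} & A_{34} \end{bmatrix}
\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \\ B_{31} & B_{32} \\ B_{41} & B_{42} \end{bmatrix}
$$

$$
C_{31} =
\begin{bmatrix} A_{31} & A_{32} & A_{33} & A_{34} \end{bmatrix}
\begin{bmatrix} B_{11} \\ B_{21} \\ B_{31} \\ B_{41} \end{bmatrix}
= A_{31} B_{11} + A_{32} B_{21} + A_{33} B_{31} + A_{34} B_{41}
$$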


Blocked Contractions

Visualization of A ⋆ B where “⋆” is Some Contraction

[Figure: each block of A ⋆ B is drawn as a sum of contractions between blocks of A and blocks of B, in analogy with the blocked matrix product.]


Block Unfoldings and Block Contractions

The Setting...

Compute $\mathcal{C} = \mathcal{A} \star \mathcal{B}$

where ⋆ is some contraction and $\mathcal{A}$ and $\mathcal{B}$ are blocked conformably.

Results

We have shown how to frame this as a block matrix product...

$$
C_{R\times C} = A_{R\times\Lambda} \cdot B_{\Lambda\times C}
$$

Ongoing...

High-Performance Implementations, Data Structures, Strassen Ideas, Communication Lower Bounds


Block Representation and Approximation


Useful Representations

The Higher Order SVD

Compute the SVD of each modal unfolding and “glue together” the results to characterize/approximate the original tensor.

The Higher-Order Kronecker Product SVD

Compute the KSVD of an arbitrary block unfolding and use the results to characterize/approximate the original tensor.


The Higher Order Singular Value Decomposition (HOSVD)

Basic Idea (Order-3 Case)

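The HOSVD of an $n_1$-by-$n_2$-by-$n_3$ tensor $\mathcal{A}$ involves computing the matrix SVDs of its modal unfoldings $A_{(1)}$, $A_{(2)}$, and $A_{(3)}$:

$$
U_1^T A_{(1)} V_1 = \Sigma_1, \qquad
U_2^T A_{(2)} V_2 = \Sigma_2, \qquad
U_3^T A_{(3)} V_3 = \Sigma_3
$$

permitting us to write

$$
\mathcal{A} = \sum_{j_1=1}^{n_1} \sum_{j_2=1}^{n_2} \sum_{j_3=1}^{n_3} \mathcal{S}(j_1, j_2, j_3) \cdot \big( U_1(:, j_1) \circ U_2(:, j_2) \circ U_3(:, j_3) \big)
$$

De Lathauwer, De Moor, and Vandewalle (2000)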
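
A minimal NumPy sketch of the order-3 HOSVD is given below. It takes the core tensor to be $\mathcal{S} = \mathcal{A} \times_1 U_1^T \times_2 U_2^T \times_3 U_3^T$, which is the standard definition but is not spelled out on the slide. The tensor size, the truncation ranks, and the helper `unfold` are arbitrary illustrative choices.

```python
import numpy as np

def unfold(A, mode):
    """Mode-k unfolding (0-based mode)."""
    return np.moveaxis(A, mode, 0).reshape((A.shape[mode], -1), order="F")

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 2))

# Left singular vectors of the three modal unfoldings.
U1, U2, U3 = (np.linalg.svd(unfold(A, k), full_matrices=False)[0] for k in range(3))

# Core tensor S = A x1 U1^T x2 U2^T x3 U3^T.
S = np.einsum("ijk,ia,jb,kc->abc", A, U1, U2, U3)

# Full expansion: sum over all (j1, j2, j3) of S(j1,j2,j3) * outer product.
A_full = np.einsum("abc,ia,jb,kc->ijk", S, U1, U2, U3)

# Truncated expansion with ranks (r1, r2, r3).
r1, r2, r3 = 3, 2, 2
A_trunc = np.einsum("abc,ia,jb,kc->ijk",
                    S[:r1, :r2, :r3], U1[:, :r1], U2[:, :r2], U3[:, :r3])
rel_error = np.linalg.norm(A - A_trunc) / np.linalg.norm(A)
```

The full expansion reproduces A to rounding error, while the truncated sum is an approximation whose quality depends on how quickly the modal singular values decay.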


The Higher Order Singular Value Decomposition (HOSVD)

Basic Idea (Order-3 Case)

Truncating the three sums of the HOSVD expansion at $r_1 \le n_1$, $r_2 \le n_2$, and $r_3 \le n_3$ permits us to approximate

$$
\mathcal{A} \approx \sum_{j_1=1}^{r_1} \sum_{j_2=1}^{r_2} \sum_{j_3=1}^{r_3} \mathcal{S}(j_1, j_2, j_3) \cdot \big( U_1(:, j_1) \circ U_2(:, j_2) \circ U_3(:, j_3) \big)
$$

De Lathauwer, De Moor, and Vandewalle (2000)


The Kronecker Product SVD (KPSVD)

For Uniformly Blocked Matrices...

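$$
A =
\begin{bmatrix}
A_{11} & \cdots & A_{1q} \\
\vdots & \ddots & \vdots \\
A_{p1} & \cdots & A_{pq}
\end{bmatrix}
= \sum_{m=1}^{r} \sigma_m\, U_m \otimes V_m.
$$

Nearest Sum of s Kronecker Products in the Frobenius Norm...

$$
A_s = \sum_{m=1}^{s} \sigma_m\, U_m \otimes V_m.
$$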

Pitsianis and VL (1992)
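
One way to compute such an expansion is to rearrange A so that each block becomes a row and then take an ordinary SVD of the rearranged matrix. The sketch below follows that idea under stated assumptions: the function `kpsvd` is hypothetical, and blocks are flattened row by row (NumPy's default) rather than with a column-major vec, which changes the bookkeeping but not the result.

```python
import numpy as np

def kpsvd(A, p, q):
    """Kronecker product SVD of a matrix made of p-by-q equally sized blocks.

    Returns sigma, U (shape r x p x q) and V (shape r x br x bc) such that
    A = sum_m sigma[m] * kron(U[m], V[m]).
    """
    br, bc = A.shape[0] // p, A.shape[1] // q
    # Row (I, J) of R holds block A_IJ flattened row by row.
    R = A.reshape(p, br, q, bc).transpose(0, 2, 1, 3).reshape(p * q, br * bc)
    W, sigma, Zt = np.linalg.svd(R, full_matrices=False)
    U = W.T.reshape(-1, p, q)      # m-th left singular vector as a p-by-q matrix
    V = Zt.reshape(-1, br, bc)     # m-th right singular vector as a block-sized matrix
    return sigma, U, V

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 8))    # 3-by-2 block matrix with 2-by-4 blocks
sigma, U, V = kpsvd(A, 3, 2)

def partial_sum(s):
    """Sum of the first s Kronecker product terms."""
    return sum(sigma[m] * np.kron(U[m], V[m]) for m in range(s))

A_full = partial_sum(len(sigma))
A_2 = partial_sum(2)
err_2 = np.linalg.norm(A - A_2)
```

Because the rearrangement preserves the Frobenius norm, the error of the s-term sum equals the norm of the discarded singular values.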


Approximating a Special Order-4 Tensor

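A Structured Summation that is $O(N^4)$

Compute

$$
\mu = \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} \sum_{\ell=1}^{N} \mathcal{A}(i, j, k, \ell)\, v_i v_j v_k v_\ell
$$

where $v \in \mathbb{R}^N$ and $\mathcal{A}$ has the following symmetries:

$$
\mathcal{A}(i, j, k, \ell) = \mathcal{A}(j, i, k, \ell) = \mathcal{A}(i, j, \ell, k) = \mathcal{A}(k, \ell, i, j)
$$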


Three Symmetries

The [1 2]× [3 4] Unfolding Inherits these Symmetries...

280 206 100 206 182 187 100 187 296

206 328 188 182 138 148 187 244 143

100 188 176 187 148 122 296 143 326

206 182 187 328 138 244 188 148 143

182 138 148 138 312 192 148 192 212

187 148 122 244 192 272 143 212 200

100 187 296 188 148 143 176 122 326

187 244 143 148 192 212 122 272 200

296 143 326 143 212 200 326 200 280


The Kronecker Product SVD (KPSVD)

The KSVD is Structured...

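$$
A_{[1\ 2]\times[3\ 4]} = \sum_{m=1}^{r} \sigma_m\, U_m \otimes U_m, \qquad U_m^T = U_m
$$

i.e.,

$$
\mathcal{A}(i, j, k, \ell) = \sum_{m=1}^{r} \sigma_m\, U_m(i, j)\, U_m(k, \ell)
$$

The summation µ becomes an $O(sN^2)$ summation:

$$
\mu = \sum_{m=1}^{r} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} \sum_{\ell=1}^{N} \sigma_m\, U_m(i, j)\, U_m(k, \ell)\, v_i v_j v_k v_\ell
= \sum_{m=1}^{r} \sigma_m (v^T U_m v)^2 \approx \sum_{m=1}^{s} \sigma_m (v^T U_m v)^2
$$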
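
The reduction from a quadruple sum to a sum of squared quadratic forms can be checked on synthetic data. In the sketch below the tensor is built directly from randomly generated symmetric matrices $U_m$ and weights $\sigma_m$; these are illustrative stand-ins and are not the output of an actual KSVD computation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, r = 5, 3
sigma = rng.random(r)
U = rng.standard_normal((r, N, N))
U = U + U.transpose(0, 2, 1)              # make each U_m symmetric
v = rng.standard_normal(N)

# A(i,j,k,l) = sum_m sigma_m * U_m(i,j) * U_m(k,l)
A = np.einsum("m,mij,mkl->ijkl", sigma, U, U)

# O(N^4) evaluation of the quadruple sum.
mu_direct = np.einsum("ijkl,i,j,k,l->", A, v, v, v, v)

# O(r N^2) evaluation: one quadratic form v^T U_m v per term.
quad_forms = np.einsum("i,mij,j->m", v, U, v)
mu_fast = np.sum(sigma * quad_forms ** 2)
```

Keeping only the s largest terms of the second formula gives the approximation on the slide.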


The Higher Order KSVD

The Framework

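$\mathcal{A}$ is a block tensor $(\mathcal{A}_{\mathbf{i}})$.

Let $A$ be a block unfolding.

Compute its KSVD: $A = \sum_m \sigma_m\, U_m \otimes V_m$

Equivalent to writing the block tensor $\mathcal{A}$ as a sum of “tensor Kronecker Products”:

$$
\mathcal{A} = \sum_m \sigma_m\, \mathcal{U}_m \,\text{“}\otimes\text{”}\, \mathcal{V}_m
$$

If $\mathcal{U}$ and $\mathcal{V}$ are tensors, then $\mathcal{U}$ “⊗” $\mathcal{V}$ is a block tensor whose $\mathbf{i}$-th block is the tensor $\mathcal{U}(\mathbf{i}) \cdot \mathcal{V}$, i.e., $\mathcal{U}(i_1, i_2, i_3) \cdot \mathcal{V}$.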


Summary


Block unfoldings preserve structure and locality of data.

The higher-order Kronecker Product SVD offers a block-level approach to low-rank tensor approximation.

The symmetric embedding shows how new algorithms and analyses are prompted by thinking at the “block” level.

In my opinion, blocking will eventually have the same impact in tensor computations as it does in matrix computations.

