
Introduction to Tensor Numerical Methods in Scientific Computing



Preface B. Khoromskij, Zurich 2010 1

These notes are based on my lectures given to the students of Pro∗Doc

Program at the University/ETH Zurich, in the winter semester of 2010.

This course, consisting of 18 lectures and MATLAB exercises, presents an

introduction to the modern tensor-structured numerical methods in

scientific computing. In recent years these methods have proven to

provide a powerful tool for efficient computations in higher dimensions,

overcoming the so-called “curse of dimensionality”.

In these lectures I try to present the three arguably most important

ingredients of the tensor approach:

⋄ Analytical methods of separable approximation of multivariate functions

and operators in R^d, d ≥ 3.

⋄ Algebraic low-rank approximation of function-related multi-dimensional

vectors/matrices in basic tensor formats, and the respective multilinear

algebra in R^{n×n×...×n}.

⋄ Tensor truncated iterative methods in the Tucker, tensor train (TT)

and quantics-TT formats with applications to the solution of multi-

dimensional equations in electronic structure calculations, quantum

molecular dynamics and stochastic PDEs.

BNK Zurich, October – December 2010.

Introduction to Tensor Numerical Methods I B. Khoromskij, Zurich 2010 2

Everything should be made as simple

as possible, but not simpler.

A. Einstein (1879-1955)

Introduction to Tensor Numerical Methods in

Scientific Computing

(Part I. Analytic Methods of Separable Approximation)

Boris N. Khoromskij

http://personal-homepages.mis.mpg.de/bokh

University/ETH Zurich, Pro∗Doc Program, WS 2010


Outline of the Lecture Course B. Khoromskij, Zurich 2010 3

Part I. Analytic Methods of Separable Approximation in R^d.

1. Separable approximation of multivariate functions in R^d. Basic rank-

structured tensor-product formats. Curse of dimension and

Kolmogorow’s paradigm. Schmidt expansion. Greedy Algorithms for

d ≥ 3.

2. Classical Polynomial Approximation. Tensor-product polynomial and

trigonometric interpolation. Application to the Helmholtz kernel.

Functions of the form f(x1 + ...+ xd).

3. Separation by integration. Fitting by exponential sums. Celebrated

sampling theorem. Sinc-interpolation and quadratures for analytic

functions. Error estimate for truncated sums.

4. Separable representation of analytic, shift-invariant functions.

Kronecker-product representation of multi-dimensional integral

operators Au = ∫_{R^d} g(‖ · − y‖) u(y) dy. Tensor-product convolution.

Part II. Algebraic Methods of Tensor Approximation. Multilinear

Algebra. (see page xx)

Part III. Solving Equations by TT/QTT methods (BVPs, EVPs,

transient problems.) (see page xx)

Lect. 1. On separable approximation in higher dimensions B. Khoromskij, Zurich 2010(L1) 4

Outlook of Lecture 1.

• Motivations: Modern applications in higher dimensions.

• From low to higher dimensions: what can be adopted from

traditional numerics.

• Rank structured separable representations of multi-variate

functions in Rd. Basic dimension splitting formats.

• Indispensable rank structured matrix/tensor multilinear

algebra (MLA).

• “Curse of dimensionality” and Kolmogorow’s paradigm.

• d = 2: Celebrated Schmidt’s decomposition (SD).

• Greedy Algorithms: simple but slow convergence.


Separability concept in multi-dimensional modeling B. Khoromskij, Zurich 2010(L1) 5

1929, Dirac:

The fundamental laws necessary for the mathematical treatment of a large

part of physics and the whole of chemistry are thus completely known,

and the difficulty lies only in the fact that application of these laws leads

to equations that are too complex to be solved.

1998, W. Kohn, J. Pople:

Nobel Prize in Chemistry for development of DFT, based on

the use of problem-adapted (separable) GTO basis sets.

Nowadays: Spreading of tensor methods in multi-dimensional

numerical modeling:

Effective nonlinear approximation of operators/functions in R^d,

MLA with linear complexity scaling in dimension d,

Initial applications in comput. chemistry, sPDEs, quantum computing.

Multi-dimensional equations in a wide range of applications B. Khoromskij, Zurich 2010(L1) 6

Basic physical models include (nonlocal) multivariate transforms.

Examples of high dimensional problems.

1. Multi-dimensional integral operators in R^d (convolution and Green's

functions, Fourier, Laplace transforms).

2. Elliptic/parabolic/hyperbolic solution operators, preconditioning.

3. Schrödinger eq. for many-particle systems. Density matrix

calculation in R³ × R³ (DFT, Hartree-Fock/Kohn-Sham eqs.),

quantum molecular dynamics, DMRG and quantum computing.

4. Stochastic/parametric PDEs, Kolmogorow forward/Fokker-Planck

eqs.

5. Financial math. (Kolmogorow backward, Black-Scholes eqs).

6. Collision integrals in the deterministic Boltzmann eq. in R³

(dilute gas).

7. Multi-dimensional data in chemometrics, psychometrics, higher-order

statistics, data mining, ...


Examples of operator calculus B. Khoromskij, Zurich 2010(L1) 7

Tensor-structured vectors and matrices in R^{n^d}:

x ∈ R^{n^d} ≅ R^n ⊗ ... ⊗ R^n,  A ∈ R^{m^d × n^d} ≅ R^{m×n} ⊗ ... ⊗ R^{m×n}.

• Linear elliptic systems and spectral problems:

Au = f, Au = λu ⇒ B ≈ A^{−1}.

• Volume/interface preconditioning ⇒ Δ^{−α}, α = 1, ±1/2.

• Parabolic equations:

∂u/∂t + Au = f ⇒ exp(−tA), (A + (1/τ)I)^{−1}.

• Control theory: matrix Lyapunov equation on R^{n×n},

AX + XB = G ⇒ X = ∫_0^∞ e^{−tA} G e^{−tB} dt, sign(A).

Challenge of Higher Dimensions B. Khoromskij, Zurich 2010(L1) 8

1. Motivating applications:

Molecular systems: quantum molecular dynamics, DMRG in quant. chem.

FEM/BEM in Rd: stochastic PDEs, atmospheric model., financial math.

Data mining: quantum computing, machine learning, image processing.

2. “Curse of dimensionality” (R. Bellman, Princeton UP, NJ, 1961):

O(N^d)-methods using N × N × ... × N (d factors) grids (linear in the volume size).

3. O(dN)-Methods via separation of variables:

Tensor-formatted methods to represent d-variate functions, operators, and

for solving equations on rank-structured tensor manifolds in R^d, d ≥ 3.

4. log-volume super-compressed representation:

Quantics-TT approximation of d-th order N × ... × N tensors, N^d → O(d log N).


Large problems in low dimensions B. Khoromskij, Zurich 2010(L1) 9

In low dimensions (d = 1, 2, 3) the goal is O(N)-methods.

Main principles: making use of hierarchical structures,

low-rank pattern, recursive algorithms and parallelization.

Based on recursions via hierarchical structures:

Classical Fourier (1768-1830) methods, FFT in O(N logN) op.

FFT-based circulant convolution, Toeplitz, Hankel matrices.

Multiresolution representation via wavelets, O(N)-FWT.

Multigrid methods: O(N) - elliptic problem solvers.

Fast multipole, panel clustering, H-matrix in O(c^d N log N) op.

Well suited for integral (nonlocal) operators in FEM/BEM.

Parallelization:

Domain decomposition: O(N/p) - parallel algorithms.

Traditional numerical tools of reduced complexity B. Khoromskij, Zurich 2010(L1) 10

• High order methods: hp-FEM/BEM, spectral methods,

bcFEM, Richardson extrapolation.

• Adaptive mesh refinement: a priori/a posteriori strategies.

• Dimension reduction: boundary/interface equations,

Schur complement/domain decomposition methods.

• Combination of tensor-product basis with anisotropic

adaptivity: hyperbolic cross approximation by

FEM/wavelet (sparse grids).

• Model reduction: multi-scale, homogenization, neural

networks.

• Monte-Carlo method (e.g., random walk dynamics).


Separable representation of functions in TPHS B. Khoromskij, Zurich 2010(L1) 11

Let Hℓ (ℓ = 1, ..., d) be a real, separable Hilbert space of

functions. M. Reed, B. Simon, Functional analysis, AP, 1972.

Def. 1.1 A tensor product of Hilbert spaces H_ℓ (TPHS),

H = H_1 ⊗ ... ⊗ H_d, is defined as the closure of the set of finite

sums, Σ_k ⊗_{ℓ=1}^d w_k^{(ℓ)}, of dual multilinear forms (linear

functionals) on H_1 × ... × H_d. A single form is defined by

(⊗_{ℓ=1}^d w^{(ℓ)})(v^{(1)}, ..., v^{(d)}) := ∏_{ℓ=1}^d ⟨w^{(ℓ)}, v^{(ℓ)}⟩_{H_ℓ}.

The scalar product of rank-1 (separable) elements (tensors) in H is defined by

⟨w^{(1)} ⊗ ... ⊗ w^{(d)}, v^{(1)} ⊗ ... ⊗ v^{(d)}⟩ = ∏_{ℓ=1}^d ⟨w^{(ℓ)}, v^{(ℓ)}⟩,

and it is extended by linearity.

⟨·, ·⟩ is called the induced scalar product.

Basic properties of TPHS. First examples. B. Khoromskij, Zurich 2010(L1) 12

Lem. 1.1 〈·, ·〉 is well defined and it is positive definite.

Lem. 1.2 If {φ_{k_ℓ}^{(ℓ)}} is an orthonormal basis in H_ℓ, then Φ_k = ⊗_{ℓ=1}^d φ_{k_ℓ}^{(ℓ)}, k = (k_1, ..., k_d) ∈ N^d, is an orthonormal basis in H.

Exercise 1.1. Prove Lem. 1.1 - 1.2.

The tensor product of univariate functions f^{(ℓ)}(x_ℓ), x_ℓ ∈ I_ℓ = [a_ℓ, b_ℓ], is a d-variate function (called separable, or rank-1), defined as

f := ⊗_{ℓ=1}^d f^{(ℓ)}, where f(x_1, ..., x_d) = ∏_{ℓ=1}^d f^{(ℓ)}(x_ℓ).

Exer. 1.2 Prove that L²(I_1 × ... × I_d) = ⊗_{ℓ=1}^d L²(I_ℓ).

Example 1.2 Denote by H^{⊗n} the n-fold tensor product of the space H. If H = L²(R), then an element ψ ∈ F(H) := ⊕_{n=0}^∞ H^{⊗n} of the so-called Fock space over H, F(H), is a sequence of functions

ψ = (ψ_0, ψ_1(x_1), ψ_2(x_1, x_2), ψ_3(x_1, x_2, x_3), ...),


Basic properties of TPHS. First examples. B. Khoromskij, Zurich 2010(L1) 13

such that

|ψ_0|² + Σ_{n=1}^∞ ∫_{R^n} |ψ_n(x_1, ..., x_n)|² dx_1 ... dx_n < ∞.

A finite expansion in F(H) as above is also known as the ANOVA representation.

In the physical literature, the subspaces of F(H) consisting of

symmetric/antisymmetric functions w.r.t. permutation of two arguments

are called the boson and fermion Fock spaces, respectively.

Def. 1.2 A d-th order tensor is a function of d discrete arguments, f : I_1 × ... × I_d → R (a multi-dimensional array over I_1 × ... × I_d). The respective TPHS H is equipped with the Euclidean scalar product and the Frobenius norm (more details in Lect. 6).

Example 1.3 H = R^{I_1×...×I_d} = ⊗_{ℓ=1}^d R^{I_ℓ}, with I_ℓ = {1, ..., n_ℓ}.

Tensor formats: Canonical representation in TPHS B. Khoromskij, Zurich 2010(L1) 14

Def. 1.3 (Canonical format). Denote by C_R the subset of elements in H requiring at most R terms (rank-R functions),

C_R = { w ∈ H : w = Σ_{k=1}^R w_k^{(1)} ⊗ w_k^{(2)} ⊗ ... ⊗ w_k^{(d)},  w_k^{(ℓ)} ∈ H_ℓ }.

w ∈ C_R can be represented by the description of Rd elements w_k^{(ℓ)} ∈ H_ℓ. Storage on an n^d-grid: dRn (linear in d).

Advantage: tremendous reduction of the representation cost, removing d from the exponent: n^d → dRn.

Limitations: applies to a special class of functions given analytically; the algebraic decomposition is nonrobust.

Probl. 1. Best rank-R approximation of a multi-variate

function f = f(x_1, ..., x_d) ∈ H in the set C_R.
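To make the storage count concrete, here is a minimal MATLAB sketch (not from the notes; all names are illustrative): it assembles a rank-R canonical tensor from its factor matrices U{1}, ..., U{d}, whose dRn entries replace the n^d entries of the full array.

% Minimal sketch (illustrative names): assemble a rank-R canonical tensor
% from factor matrices U{1},...,U{d}, each of size n-by-R (dRn numbers).
d = 3; n = 20; R = 4;
U = cell(1, d);
for l = 1:d, U{l} = rand(n, R); end
A = zeros(n * ones(1, d));         % the full n^d array, for comparison only
for k = 1:R                        % sum of R separable (rank-1) terms
    T = U{1}(:, k);
    for l = 2:d
        T = T(:) * U{l}(:, k).';   % outer product, one mode at a time
    end
    A = A + reshape(T, size(A));
end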


Orthogonal separable representation B. Khoromskij, Zurich 2010(L1) 15

Given a tuple of dimensions, r = (r_1, ..., r_d) ∈ N^d, choose V_ℓ = span{φ_k^{(ℓ)}}_{k=1}^{r_ℓ} ⊂ H_ℓ, r_ℓ := dim V_ℓ < ∞ (1 ≤ ℓ ≤ d), with orthogonal bases, and build the tensor subspace V = V_1 ⊗ V_2 ⊗ ... ⊗ V_d ⊂ H. Each v ∈ V can be represented by

v = Σ_{k=1}^{r} b_k φ_{k_1}^{(1)} ⊗ φ_{k_2}^{(2)} ⊗ ... ⊗ φ_{k_d}^{(d)}.   (1)

Def. 1.4 (Tucker format) Given r, define

T_r := { v ∈ V ⊂ H : for some V_ℓ with dim V_ℓ = r_ℓ, with b_k ∈ R }.

Representing w ∈ T_r: ∏_{ℓ=1}^d r_ℓ reals and the sampling of Σ_{ℓ=1}^d r_ℓ functions φ_k^{(ℓ)}.

Robust, but storage on an n^d-grid: r^d + drn ≪ n^d, r = max r_ℓ.
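As an illustration (a sketch with assumed names, not the notes' own code), the following MATLAB fragment assembles a d = 3 Tucker representation b ×_1 V^{(1)} ×_2 V^{(2)} ×_3 V^{(3)} by three mode products; the stored data are the r_1 r_2 r_3 core entries plus the three n × r_ℓ factors.

% Minimal sketch: assemble a d = 3 Tucker tensor from core b and factors.
n = 30; r = [4 5 6];
b  = randn(r);                         % Tucker core, r1 x r2 x r3
V1 = orth(randn(n, r(1)));             % orthogonal factor matrices
V2 = orth(randn(n, r(2)));
V3 = orth(randn(n, r(3)));
A = V1 * reshape(b, r(1), []);                    % mode-1 product
A = reshape(A, [n r(2) r(3)]);
A = V2 * reshape(permute(A, [2 1 3]), r(2), []);  % mode-2 product
A = permute(reshape(A, [n n r(3)]), [2 1 3]);
A = reshape(A, [], r(3)) * V3.';                  % mode-3 product
A = reshape(A, [n n n]);   % storage: r1*r2*r3 + n*(r1+r2+r3), vs. n^3 full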

Orthogonal separable representation B. Khoromskij, Zurich 2010(L1) 16

Visualization of the canonical and Tucker models for d = 3:

[Figure: a third-order tensor A represented as a sum b_1 V_1^{(1)} ∘ V_1^{(2)} ∘ V_1^{(3)} + ... + b_r V_r^{(1)} ∘ V_r^{(2)} ∘ V_r^{(3)} (canonical model), and as a core tensor B ∈ R^{r_1×r_2×r_3} contracted with factor matrices V^{(1)}, V^{(2)}, V^{(3)} over the index sets I_1, I_2, I_3 (Tucker model).]

Probl. 2. Best rank-r orthogonal approximation of f ∈ H in T_r.


Examples on rank-R and Tucker formats B. Khoromskij, Zurich 2010(L1) 17

Ex. 1.4 H = L²(I^d). Rank-1 elements: f = f_1(x_1)···f_d(x_d), e.g.

f = exp(f_1(x_1) + ... + f_d(x_d)) = ∏_{ℓ=1}^d exp(f_ℓ(x_ℓ)).

For the function f = sin(Σ_{j=1}^d x_j), rank(f) = 2 holds over the field C:

2i sin(Σ_{j=1}^d x_j) = e^{i Σ_{j=1}^d x_j} − e^{−i Σ_{j=1}^d x_j}.

The rank-d function f(x) = x_1 + x_2 + ... + x_d can be approximated by a rank-2 expansion with any prescribed accuracy:

f = (∏_{ℓ=1}^d (1 + εx_ℓ) − 1)/ε + O(ε), as ε → 0.
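A quick MATLAB check of the rank-2 approximation above (illustrative; the two separable terms are ∏_ℓ(1 + εx_ℓ)/ε and the constant −1/ε):

% Minimal sketch: check (prod_l (1 + ep*x_l) - 1)/ep against x1 + ... + xd.
d = 5; x = rand(1, d);
f = sum(x);
for ep = [1e-1 1e-3 1e-5]
    f2 = (prod(1 + ep * x) - 1) / ep;   % error is O(ep)
    fprintf('ep = %.0e   error = %.2e\n', ep, abs(f - f2));
end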

Ex. 1.5 The Tucker approximation in H = L²(I^d) can be made by tensor-product polynomial interpolation of order r,

f(x_1, ..., x_d) ≈ Σ_{j=1}^{r} f(ν_{j_1}, ..., ν_{j_d}) ∏_{ℓ=1}^d L_{j_ℓ}(x_ℓ).

{L_{j_ℓ}} is the set of Lagrange polynomials on [−1, 1] at, say, the Chebyshev-Gauss-Lobatto grid {ν_{j_ℓ}}, j_ℓ = 1, ..., r_ℓ.

Function product decomposition (Tensor chain/train) B. Khoromskij, Zurich 2010(L1) 18

Given J := ×_{ℓ=1}^d J_ℓ, J_ℓ = {1, ..., r_ℓ}, and J_0 = J_d.

Def. 1.5 A rank-r functional tensor chain/train (FTC/FTT) format: a contracted product of functional tri-tensors over J,

f(x_1, ..., x_d) = Σ_{j∈J} g_1(j_d, x_1, j_1) g_2(j_1, x_2, j_2) ··· g_d(j_{d−1}, x_d, j_d),

or, in compact form,

FTC[r] := { f ∈ H : f = ×_{J_ℓ} {G^{(ℓ)}(x_ℓ)}_{ℓ=1}^d with G^{(ℓ)} ∈ R^{J_{ℓ−1}} × H_ℓ × R^{J_ℓ} }.

If #J_0 = 1, we have the FTT decomposition. Here G^{(1)}(x_1) is a row 1 × r_1 vector function depending on x_1, G^{(ℓ)}(x_ℓ) is a matrix of size r_{ℓ−1} × r_ℓ with functional entries depending on x_ℓ, and G^{(d)}(x_d) is a column vector of size r_{d−1} × 1 depending on x_d.

Sampling on an n^d-grid: O(dr²n) storage.

A function f ∈ H is approximated by a product of matrices

(matrix product states), each depending on a single variable.


Examples on FTT decomposition B. Khoromskij, Zurich 2010(L1) 19

Ex. 1.6 A d-fold contracted product of tri-tensors over J_1, ..., J_d (d = 6):

[Figure: a chain of six third-order tensors with mode sizes r_{ℓ−1} × N × r_ℓ, contracted over the rank indices r_1, ..., r_6.]

Special case r_6 = 1: FTT[r] = FTC[r].

Exer. 1.3 In some cases the function product decomposition can be constructed explicitly. The FTT rank of f(x) = x_1 + x_2 + ... + x_d is 2 [Oseledets '10]:

f(x) = (x_1  1) (1  0; x_2  1) ··· (1  0; x_{d−1}  1) (1; x_d),

where (a b; c d) denotes a 2 × 2 matrix written row-wise and the last factor is a column vector.
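The factorization is easy to check numerically; a minimal MATLAB sketch (illustrative names):

% Verify the rank-2 FTT factorization of f(x) = x1 + ... + xd
% by multiplying the cores at a random point.
d = 6; x = rand(1, d);
v = [x(1) 1];                      % row core G1(x1)
for l = 2:d-1
    v = v * [1 0; x(l) 1];         % middle cores Gl(xl); v = [x1+...+xl, 1]
end
f_tt = v * [1; x(d)];              % column core Gd(xd)
assert(abs(f_tt - sum(x)) < 1e-12)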

Nonlinear approximation in tensor format B. Khoromskij, Zurich 2010(L1) 20

Since T_r, C_R and FTC[r] are not linear spaces, we obtain a nontrivial nonlinear approximation problem: for f ∈ H, estimate

σ(f, S) := inf_{s∈S} ‖f − s‖,   (2)

where S = T_r, C_R, FTC[r]. Why might problem (2) be difficult for d ≥ 3?

Prop. 1.1 [Beylkin, Mohlenkamp] The trigonometric identity (d ≥ 2)

f(x) := sin(Σ_{j=1}^d x_j) = Σ_{j=1}^d sin(x_j) ∏_{k∈{1,...,d}\{j}} sin(x_k + α_k − α_j)/sin(α_k − α_j)   (3)

holds for any α_k ∈ R s.t. sin(α_k − α_j) ≠ 0 for all j ≠ k.

For d ≥ 3 it can be proven by induction (a nontrivial exercise!).

Exer. 1.4 Prove that the FTT-rank of f in (3) is 2 (Lect. 2).
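A direct numerical check of identity (3) in MATLAB (an illustrative sketch, not part of the original notes):

% Verify the Beylkin-Mohlenkamp identity (3) at a random point.
d = 4; x = rand(1, d); alpha = 0.7 * (1:d);   % sin(alpha_k - alpha_j) ~= 0
lhs = sin(sum(x));
rhs = 0;
for j = 1:d
    term = sin(x(j));
    for k = [1:j-1, j+1:d]
        term = term * sin(x(k) + alpha(k) - alpha(j)) / sin(alpha(k) - alpha(j));
    end
    rhs = rhs + term;
end
assert(abs(lhs - rhs) < 1e-10)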


Nonlinear approximation in tensor format B. Khoromskij, Zurich 2010(L1) 21

Expansion (3) shows the lack of uniqueness (ambiguity) of the best rank-d tensor representation. The minimisation process might be non-robust (multiple local minima).

Principal questions (no ultimate answers):

Is the “curse of dimensionality” relevant?

How to solve (2) efficiently? (Extend the truncated SVD?)

Can one expect fast (exponential) convergence in the rank parameters R, r = max r_ℓ?

Can one solve the physical equations on a nonlinear tensor manifold S, getting rid of the “curse of dimension”?

Our approach: Construct tensor-structured numerical

methods based on efficient multilinear algebra (MLA).

Kolmogorow’s paradigm B. Khoromskij, Zurich 2010(L1) 22

Hilbert's 13th problem: a solution of the general algebraic equation of degree 7 cannot be written as a superposition of continuous bivariate functions. It was resolved by the celebrated theorem of Kolmogorow on the superposition of univariate functions.

Thm. 1. (A. Kolmogorow 1957) Let I = [0, 1]. For d ≥ 2, any function f ∈ C(I^d) can be represented in the form

f(x_1, ..., x_d) = Σ_{i=1}^{2d+1} g_i( Σ_{ℓ=1}^d φ_{iℓ}(x_ℓ) ),

where the functions φ_{iℓ} : I → R do not depend on f and belong to the class Lip_1, while the g_i : R → R are continuous functions.

Thm. 1 is not constructive, but in our context it says that, in the discrete setting, any function f can be represented by O(2dN + (2d + 1)dN) reals, where N corresponds to the size of the interpolation table for the g_i [Griebel].


d = 2: Schmidt expansion and SVD B. Khoromskij, Zurich 2010(L1) 23

The approximation of functions f(x, y) by bilinear forms,

f ≈ Σ_{k=1}^R u_k(x) v_k(y) in L²([0, 1]²),

is due to E. Schmidt, 1907 (a celebrated theorem). The result is the continuous analogue of the SVD of matrices.

Let σ_k(J_f), σ_1 ≥ σ_2 ≥ ... ≥ 0, be the nonincreasing sequence of singular values of the integral operator

J_f(g) := ∫_0^1 f(x, y) g(y) dy,

σ_k(J_f) := λ_k[A^{1/2}],  A = J_f* J_f,  J_f* adjoint to J_f,

with orthonormal sequences φ_k(x), ψ_k(y),

A ψ_k(y) = λ_k ψ_k(y);  A* φ_k(x) = λ_k φ_k(x),  k = 1, 2, ...

d = 2: Schmidt expansion and SVD B. Khoromskij, Zurich 2010(L1) 24

The kernel function of A is given by

f_A(x, y) := ∫_0^1 f(x, z) f(z, y) dz.

The Schmidt decomposition (SD) is given by

f(x, y) = Σ_{k=1}^∞ σ_k(J_f) φ_k(x) ψ_k(y).

The best bilinear approximation property reads as

‖ f(x, y) − Σ_{k=1}^R σ_k φ_k(x) ψ_k(y) ‖_{L²} = inf_{u_k, v_k ∈ L², k=1,...,R} ‖ f(x, y) − Σ_{k=1}^R u_k(x) v_k(y) ‖_{L²}.

SD ensures that for d = 2 the best bilinear approximation can be realised by the so-called Pure Greedy Algorithm (PGA). For the Nyström approximation the problem reduces to the SVD.
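On a grid, the Schmidt/SVD construction is immediate; the following MATLAB sketch (with an illustrative kernel) computes the best rank-R bilinear approximation of a sampled kernel by the truncated SVD:

% Nystroem-type discrete analogue: best rank-R bilinear approximation
% of a sampled kernel f(x,y) via the truncated SVD.
n = 200; R = 5;
x = linspace(0, 1, n); [X, Y] = meshgrid(x, x);
F = 1 ./ (1 + X + Y);                       % sampled kernel f(x,y) = 1/(1+x+y)
[U, S, V] = svd(F);
FR = U(:, 1:R) * S(1:R, 1:R) * V(:, 1:R)';  % best rank-R approximation
fprintf('rank-%d relative error = %.2e\n', R, norm(F - FR) / norm(F));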


Computing canonical decomposition B. Khoromskij, Zurich 2010(L1) 25

For S = C_R, the canonical decomposition can be considered in

the framework of the best R-term approximation with regard

to a redundant dictionary of rank-1 functions.

Def. 1.6 A system D of functions from H is called a

dictionary if each g ∈ D has norm one and its linear span is

dense in H.

Denote by Σ_R(D) the collection of all s ∈ H which can be written in the form

s = Σ_{g∈Λ} c_g g,  Λ ⊂ D,  #Λ ≤ R ∈ N,  c_g ∈ R.

For f ∈ H, the best R-term approximation error is defined by

σ_R(f, D) := inf_{s∈Σ_R(D)} ‖f − s‖.

Pure Greedy Algorithm B. Khoromskij, Zurich 2010(L1) 26

The Pure Greedy Algorithm (PGA) inductively computes an estimate of the best R-term approximation.

Let g = g(f) ∈ D be an element maximising |⟨f, g⟩| (a best rank-1 approximation, obtained by nonlinear maximisation!). Define

G(f) := ⟨f, g⟩ g,  R(f) := f − G(f).

The PGA reads: given f ∈ H, introduce

R_0(f) := f and G_0(f) := 0.

Then, for all 1 ≤ m ≤ R, we inductively define

G_m(f) := G_{m−1}(f) + G(R_{m−1}(f)),

R_m(f) := f − G_m(f) = R(R_{m−1}(f)).
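For d = 2 the greedy rank-1 step can be computed exactly from the dominant singular pair, so the PGA can be sketched in a few lines of MATLAB (illustrative; for matrices the PGA reproduces the truncated SVD):

% Minimal PGA sketch for a sampled bivariate function (d = 2).
n = 100; Rmax = 10;
x = linspace(0, 1, n)';
F = exp(-x * x');                  % samples of f(x,y) = exp(-xy)
Res = F; G = zeros(n);
for m = 1:Rmax
    [u, s, v] = svds(Res, 1);      % greedy step: best rank-1 of the residual
    G = G + u * s * v';
    Res = F - G;
    fprintf('m = %2d   residual norm = %.2e\n', m, norm(Res, 'fro'));
end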


Pure Greedy Algorithm B. Khoromskij, Zurich 2010(L1) 27

Applying the PGA to functions characterised by the approximation property (low-order approximation)

σ_R(f, D) ≤ R^{−q},  R = 1, 2, ...,

with some q ∈ (0, 1/2], leads to the error bound (Temlyakov)

‖f − G_R(f, D)‖ ≤ C(q, D) R^{−q},  R = 1, 2, ...,

which is “too pessimistic” in our applications (a Monte-Carlo-type rate).

Our goal: a constructive R-term approximation on a class of analytic functions (possibly with point singularities), providing exponential convergence in R = 1, 2, ...:

σ_R(f, D) ≤ C exp(−R^q),  q = 1 or q = 1/2.

Methods of choice: quadrature- and interpolation-based sinc-approximation, and direct fitting by exponential sums.

Greedy completely orthogonal decomposition B. Khoromskij, Zurich 2010(L1) 28

The decomposition in C_R,

f = Σ_{k=1}^R a_k v_k,  v_k = φ_k^{(1)}(x_1) ⊗ ... ⊗ φ_k^{(d)}(x_d) ∈ C_1,

is called completely orthogonal if

⟨φ_k^{(ℓ)}, φ_m^{(ℓ)}⟩ = δ_{k,m} ∀ ℓ = 1, ..., d  ⇔  Φ^{(ℓ)} = [φ_1^{(ℓ)}, ..., φ_R^{(ℓ)}] is orthogonal.

The greedy completely orthogonal decomposition (GCOD) is defined as the greedy orthogonal decomposition with the orthogonality constraint on the Φ^{(ℓ)}.

Lem. 1.3 (Tucker format with diagonal core.) Let f ∈ H allow a rank-R completely orthogonal decomposition. Then the GCOD algorithm correctly computes it. If a_1 > a_2 > ... > a_R > 0, then the decomposition is unique.

Exer. 1.5 Prove Lem. 1.3. [Golub, Zhang 2001].

Limitations: poor approximation properties of the COD.


Literature to Lecture 1 B. Khoromskij, Zurich 2010(L1) 29

1. G. Beylkin, M. Mohlenkamp: Numerical operator calculus in higher dimensions,

Proc. Natl. Acad. Sci. USA, 99 (2002), 10246–10251.

2. M. Reed and B. Simon: Methods of Modern Mathematical Physics, I: Functional Analysis. AP, NY, 1972.

3. B.N. Khoromskij: An introduction to Structured Tensor-product representation of Discrete

Nonlocal Operators. Lecture notes 27, MPI MIS, Leipzig 2005.

4. I.V. Oseledets: Constructive representation of functions in tensor formats. Preprint INM Moscow, 2010.

5. V.N. Temlyakov: Greedy Algorithms and M-Term Approximation with Regard

to Redundant Dictionaries. J. Approx. Theory 98 (1999), 117-145.

6. T. Zhang and G.H. Golub: Rank-one approximation to high order tensors.

SIAM J. Matrix Anal. Appl. 23 (2001), 534-550.

http://personal-homepages.mis.mpg.de/bokh

Lect. 2. Separation by tensor-product polynomial interpolation B. Khoromskij, Zuerich 2010(L2)

Outlook of Lecture 2

1. Best polynomial approximation. Error bound for analytic

functions.

3. Separable approximation by tensor-product interpolants.

- Polynomial interpolation.

- Trigonometric interpolation.

- Sinc interpolation (Lect. 3, 4).

4. Application to the Helmholtz kernel.

5. Approximation of functions in the form f(x1 + ...+ xd).

FTT decomposition of the Helmholtz kernel.

6. MATLAB Tensor Toolbox:

– http://csmr.ca.sandia.gov/~tgkolda/Tensor Toolbox/

– TT/QTT http://spring.inm.ras.ru/osel (I. Oseledets).


Chebyshev polynomials. Polynomial approximation B. Khoromskij, Zuerich 2010(L2) 31

The Chebyshev polynomials T_n(w), w ∈ C (the complex plane), are defined recursively:

T_0(w) = 1,  T_1(w) = w,

T_{n+1}(w) = 2w T_n(w) − T_{n−1}(w),  n = 1, 2, ....

The representation T_n(x) = cos(n arccos x), x ∈ [−1, 1], implies T_n(1) = 1, T_n(−1) = (−1)^n. There holds

T_n(w) = (z^n + z^{−n})/2  with  w = (z + 1/z)/2.   (4)

Let B := [−1, 1] be the reference interval, and denote by E_ρ = E_ρ(B) Bernstein's regularity ellipse (with foci at w = ±1 and the sum of the semi-axes equal to ρ > 1),

E_ρ := { w ∈ C : |w − 1| + |w + 1| ≤ ρ + ρ^{−1} }.

Best polynomial approximation by Chebyshev series B. Khoromskij, Zuerich 2010(L2) 32

Thm. 2.1. (Chebyshev series). Let F be analytic and bounded by M in E_ρ (with ρ > 1). Then the expansion

F(w) = C_0 + 2 Σ_{n=1}^∞ C_n T_n(w)   (5)

holds for all w ∈ E_ρ, with

C_n = (1/π) ∫_{−1}^{1} F(w) T_n(w)/√(1 − w²) dw.

Moreover, |C_n| ≤ M/ρ^n, and for w ∈ B and m = 1, 2, 3, ...,

|F(w) − C_0 − 2 Σ_{n=1}^m C_n T_n(w)| ≤ (2M/(ρ − 1)) ρ^{−m},  w ∈ B.   (6)

Rem. Thm. 2.1 provides the same approximation error bound as the best polynomial approximation (S. N. Bernstein, 1880-1968).


Laurent’s Theorem B. Khoromskij, Zuerich 2010(L2) 33

In the complex plane C, we introduce the circular ring (annulus)

R_ρ := { z ∈ C : 1/ρ < |z| < ρ }  with ρ > 1.

Thm. 2.2. (Laurent's Theorem). Let f : C → C be analytic and bounded by M > 0 in R_ρ with ρ > 1 (in the following we say f ∈ A_ρ), and set

C_n := (1/(2π)) ∫_0^{2π} f(e^{iθ}) e^{−inθ} dθ,  n = 0, ±1, ±2, ....

Then for all z ∈ R_ρ, f(z) = Σ_{n=−∞}^{∞} C_n z^n, where the series converges to f(z) for all z ∈ R_ρ. Moreover, |C_n| ≤ M/ρ^{|n|}, and for all θ ∈ [0, 2π] and arbitrary integer m,

| f(e^{iθ}) − Σ_{n=−m}^{m} C_n e^{inθ} | ≤ (2M/(ρ − 1)) ρ^{−m}.

Proof of the approximation Theorem 2.1 B. Khoromskij, Zuerich 2010(L2) 34

Proof. Each f ∈ A_{ρ,s} := { f ∈ A_ρ : C_{−n} = C_n } has a representation (cf. Thm. 2.2)

f(z) = C_0 + Σ_{n=1}^∞ C_n (z^n + z^{−n}),  z ∈ R_ρ.   (7)

(7) implies that f(1/z) = f(z), z ∈ R_ρ.

Let us apply the mapping w = (z + 1/z)/2, which satisfies w(1/z) = w(z). It is a conformal transform of {ξ ∈ R_ρ : |ξ| > 1} onto E_ρ, as well as of {ξ ∈ R_ρ : |ξ| < 1} onto E_ρ (but not of R_ρ onto E_ρ!). It provides a one-to-one correspondence between functions F that are analytic and bounded by M in E_ρ and functions f in A_{ρ,s}.

Since under this mapping we have (4), it follows that if f defined by (7) is in A_{ρ,s}, then the corresponding transformed function F(w) = f(z(w)), which is analytic and bounded by M in E_ρ, is given by (5).

Now the result follows directly from Thm. 2.2.


Lagrangian polynomial interpolation B. Khoromskij, Zuerich 2010(L2) 35

Let P_N(B) be the set of polynomials of degree ≤ N on B. Denote by [I_N F](x) ∈ P_N(B) the interpolation polynomial of F w.r.t. the Chebyshev-Gauss-Lobatto (CGL) nodes

ξ_j = cos(πj/N) ∈ B,  j = 0, 1, ..., N,  with ξ_0 = 1, ξ_N = −1,

where the ξ_j are the zeros of the polynomial (1 − x²) T_N′(x), x ∈ B.

The Lagrangian interpolant I_N of F has the form

I_N F := Σ_{j=0}^N F(ξ_j) l_j(x) ∈ P_N(B)   (8)

with l_j(x) the Lagrange basis polynomials

l_j := ∏_{k=0, k≠j}^N (x − ξ_k)/(ξ_j − ξ_k) ∈ P_N(B),  j = 0, ..., N.

Clearly, I_N F(ξ_j) = F(ξ_j), since l_j(ξ_j) = 1 and l_j(ξ_k) = 0 for all k ≠ j.
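For completeness, a MATLAB sketch of CGL interpolation; it evaluates (8) in the barycentric form, which is the numerically stable way to apply the Lagrange interpolant (the test function is an illustrative assumption):

% CGL interpolation via the barycentric formula.
N  = 20;
xi = cos(pi * (0:N) / N);                        % CGL nodes
wb = [0.5, ones(1, N-1), 0.5] .* (-1).^(0:N);    % barycentric weights for CGL
F  = @(t) 1 ./ (1 + 16 * t.^2);                  % analytic test function
x  = linspace(-1, 1, 1001);
num = zeros(size(x)); den = zeros(size(x));
for j = 1:N+1
    c = wb(j) ./ (x - xi(j));
    num = num + c * F(xi(j));
    den = den + c;
end
p = num ./ den;
hit = ismember(x, xi); p(hit) = F(x(hit));       % exact values at the nodes
fprintf('max interpolation error = %.2e\n', max(abs(p - F(x))));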

Stability: Lebesgue constant for Chebyshev interpolation B. Khoromskij, Zuerich 2010(L2) 36

Given the set {ξ_j}_{j=0}^N of interpolation points on [−1, 1] and the associated Lagrangian interpolation operator I_N, the approximation theory for polynomial interpolation involves the so-called Lebesgue constant Λ_N ∈ R_{>1},

‖I_N u‖_{∞,B} ≤ Λ_N ‖u‖_{∞,B}  ∀ u ∈ C(B).   (9)

In the case of Chebyshev interpolation it can be shown that Λ_N grows at most logarithmically in N,

Λ_N ≤ (2/π) log N + 1.

The interpolation points which produce the smallest value Λ_N* of all Λ_N are not known, but Bernstein '54 proved that

Λ_N* = (2/π) log N + O(1).


Error bound for polynomial interpolation B. Khoromskij, Zuerich 2010(L2) 37

Thm. 2.3. Let u ∈ C^∞[−1, 1] have an analytic extension to E_ρ, bounded by M > 0 in E_ρ (with ρ > 1). Then we have

‖u − I_N u‖_{∞,B} ≤ (1 + Λ_N)(2M/(ρ − 1)) ρ^{−N},  N ∈ N_0.   (10)

Proof. Due to (6) one obtains for the best polynomial approximation to u on [−1, 1],

min_{v∈P_N} ‖u − v‖_{∞,B} ≤ (2M/(ρ − 1)) ρ^{−N}.

The interpolation operator I_N is a projection, that is, I_N v = v for all v ∈ P_N. Now apply the triangle inequality,

‖u − I_N u‖_{∞,B} = ‖u − v − I_N(u − v)‖_{∞,B} ≤ (1 + Λ_N) ‖u − v‖_{∞,B}.

Tensor-product polynomial interpolation B. Khoromskij, Zuerich 2010(L2) 38

Given N ∈ N, a set of interpolating functions φ_j(x), x ∈ B, and sampling points ξ_j ∈ B (j = 0, 1, ..., N) s.t. φ_j(ξ_i) = δ_{ij}, the Lagrangian interpolant I_N of F : B → R has the form

I_N F := Σ_{j=0}^N F(ξ_j) φ_j(x),  F ∈ C(B),   (11)

with the interpolation property I_N F(ξ_j) = F(ξ_j) (j = 0, 1, ..., N).

Consider a multivariate function

f = f(x_1, ..., x_d),  f : B^d → R,  d ≥ 2,

defined on a box B^d = B_1 × B_2 × ... × B_d with B_k = B.

Define the N-th order tensor-product interpolation operator

I_N : C(B^d) → P_N[B^d],  I_N f = I_N^1 × I_N^2 × ... × I_N^d f.


Tensor-product polynomial interpolation B. Khoromskij, Zuerich 2010(L2) 39

Here I_N^k f is the interpolation polynomial w.r.t. x_k at the nodes ξ_{j_k} ∈ B_k, k = 1, ..., d,

I_N^k f(x_1, ..., x_k, ..., x_d) = Σ_{j_k=0}^N f(x_1, ..., ξ_{j_k}, ..., x_d) φ_{j_k}^{(k)}(x_k).

The tensor-product interpolant I_N in d variables reads

I_N f := Σ_{j=0}^N f(ξ_{j_1}, ..., ξ_{j_d}) φ_{j_1}^{(1)}(x_1) ... φ_{j_d}^{(d)}(x_d).

Our choice: the polynomial or sinc interpolants.

In the case of CGL nodes, the interpolation points ξ_α ∈ B^d, α = (j_1, ..., j_d) ∈ N_0^d, are obtained as the Cartesian product of the 1D nodes,

ξ_α := (cos(πj_1/N), ..., cos(πj_d/N)).
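In computations, the tensor-product structure means the d-dimensional interpolant is applied as d univariate interpolations; a d = 2 MATLAB sketch with CGL nodes (illustrative function and names; uses implicit expansion, R2016b or later):

% Tensor-product CGL interpolation for d = 2: two 1D interpolations.
N  = 24;
xi = cos(pi * (0:N) / N)';                 % CGL nodes (column)
f  = @(x, y) exp(-(x.^2 + y.^2));          % sample bivariate function
F  = f(xi, xi');                           % (N+1)^2 samples f(xi_i, xi_j)
xe = linspace(-1, 1, 101)';                % evaluation grid
L  = zeros(numel(xe), N+1);                % 1D Lagrange basis, L(i,j) = l_j(xe_i)
for j = 0:N
    l = ones(size(xe));
    for k = [0:j-1, j+1:N]
        l = l .* (xe - xi(k+1)) / (xi(j+1) - xi(k+1));
    end
    L(:, j+1) = l;
end
Fe = L * F * L';                           % I_N f on the 2D evaluation grid
fprintf('max error = %.2e\n', max(max(abs(Fe - f(xe, xe')))));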

Error bound for tensor-product polynomial interp. B. Khoromskij, Zuerich 2010(L2) 40

Again, I_N is a projection map,

I_N : C(B^d) → P_N := { p_1 × ... × p_d : p_i ∈ P_N, i = 1, ..., d },

implying stability of I_N in the multidimensional case, cf. (9),

‖I_N f‖_{∞,B^d} ≤ Λ_N^d ‖f‖_{∞,B^d}  ∀ f ∈ C(B^d).

To derive an analogue of Thm. 2.3, introduce the product domain

E_ρ^{(j)} := B_1 × ... × B_{j−1} × E_ρ(B_j) × B_{j+1} × ... × B_d,

and denote by X_{−j} the (d − 1)-dimensional (single-hole) set of variables

{x_1, ..., x_{j−1}, x_{j+1}, ..., x_d},  with x_j ∈ B_j,  j = 1, ..., d.


Error bound for tensor-product polynomial interp. B. Khoromskij, Zuerich 2010(L2) 41

Assump. 2.1. Given f ∈ C^∞(B^d), assume there is ρ > 1 s.t. for every j = 1, ..., d and each fixed ξ ∈ X_{−j}, there exists an analytic extension f_j(x_j, ξ) of f w.r.t. x_j to E_ρ(B_j) ⊂ C, bounded in E_ρ(B_j) by some M_j > 0 independent of ξ.

Thm. 2.4. For f ∈ C^∞(B^d), let Assump. 2.1 be satisfied. Then the interpolation error can be estimated by

‖f − I_N f‖_{∞,B^d} ≤ Λ_N^d (2M_ρ(f)/(ρ − 1)) ρ^{−N},   (12)

where Λ_N is the maximal Lebesgue constant for the 1D interpolants I_N^k, and

M_ρ(f) := max_{1≤j≤d} max_{x∈E_ρ^{(j)}} |f_j(x, ξ)|.

Error bound for tensor-product polynomial interp. B. Khoromskij, Zuerich 2010(L2) 42

Proof. Repeated use of (9), (10) and the triangle inequality leads to

|f − I_N f| ≤ |f − I_N^1 f| + |I_N^1 (f − I_N^2 × ... × I_N^d f)|

≤ |f − I_N^1 f| + |I_N^1 (f − I_N^2 f)| + |I_N^1 I_N^2 (f − I_N^3 f)| + ... + |I_N^1 × ... × I_N^{d−1} (f − I_N^d f)|

≤ [ (1 + Λ_N) max_{x∈E_ρ^{(1)}} |f_1(x, ξ)| + Λ_N(1 + Λ_N) max_{x∈E_ρ^{(2)}} |f_2(x, ξ)| + ... + Λ_N^{d−1}(1 + Λ_N) max_{x∈E_ρ^{(d)}} |f_d(x, ξ)| ] · (2/(ρ − 1)) ρ^{−N}

≤ ((1 + Λ_N)(Λ_N^d − 1)/(Λ_N − 1)) · (2M_ρ/(ρ − 1)) ρ^{−N}.

Hence (12) follows, since for x > 1 we have (1 + x)(x^n − 1)/(x − 1) ≤ x^n.


Application to the Helmholtz kernel B. Khoromskij, Zuerich 2010(L2) 43

Are the Tucker/canonical/FTT models robust in κ?

Construct exponentially convergent tensor decompositions of the classical Helmholtz kernel e^{iκ‖x−y‖}/‖x−y‖, κ ∈ R, such that its real and imaginary parts,

cos(κ‖x−y‖)/‖x−y‖ and sin(κ‖x−y‖)/‖x−y‖,  x, y ∈ R^d,

are treated separately.

Goal: separable approximation of the oscillatory potentials

f_{1,κ}(‖x‖) := sin(κ‖x‖)/‖x‖;  f_{2,κ}(‖x‖) := 1/‖x‖ − cos(κ‖x‖)/‖x‖ = 2 sin²((κ/2)‖x‖)/‖x‖,

and of the related kernel functions

f_{1,κ}(‖x−y‖), f_{2,κ}(‖x−y‖), 1/‖x−y‖,  x, y ∈ R^d.

f_{1,κ}(‖x‖), f_{2,κ}(‖x‖): slice for d = 3, ‖x‖ ≤ π, κ = 1, 15 B. Khoromskij, Zuerich 2010(L2) 44

[Figure: surface plots of f_{1,κ}(‖x‖) and f_{2,κ}(‖x‖) on a 2D slice for d = 3, ‖x‖ ≤ π, with κ = 1 and κ = 15.]


Estimates on the Tucker/canonical rank of f1,κ B. Khoromskij, Zuerich 2010(L2) 45

Main result: the Tucker and canonical approximations to f_{1,κ}, f_{2,κ} allow the rank bound (see [2])

r_T ≤ R ≤ C d (|log ε| + κ).

Numerics (d = 3, moderate κ ≤ 15) agree with the theory.

Rem. 2.1. 1/‖x‖ is proven to have a low-rank separable approximation with r_T ≤ R = O(log² ε), or R = O(|log ε| log n) in the discrete case (Lect. 3, 4); see [1].

Thm. 2.5. For given ε > 0, the function f_{1,κ} : [0, 2π√d]^d → R allows Tucker/canonical approximations s.t.

σ(f_{1,κ}, S) ≤ Cε with S = T_r, C_R,

with the rank estimates (r = (r, ..., r))

r ≤ R ≤ C d (|log ε| + κ).

Estimates on the Tucker/canonical rank of f1,κ B. Khoromskij, Zuerich 2010(L2) 46

Proof. Set t = ‖x‖² and approximate the entire function g(t) = sin(κ√t)/√t, t ∈ [0, 2π], by trigonometric polynomials in t up to accuracy ε in the max-norm. Making use of the change of variables z = cos(t), z ∈ [−1, 1], consider the entire function f(z) = g(arccos(z)), which has maximum value O(e^κ) on the respective Bernstein regularity ellipse of size O(1).

Applying the Chebyshev series to f(z) leads to an approximation by trigonometric polynomials with O(|log ε| + κ) terms, where each trigonometric term has the form cos(mt) = cos(m‖x‖²).

The multivariate function h(x) := cos(x_1² + ... + x_d²) has separation rank R ≤ d, i.e. h ∈ C_d, in view of the rank-d representation (Lect. 1, Prop. 1.1)

cos(Σ_{j=1}^d x_j) = Σ_{j=1}^d sin(x_j + π/(2d)) ∏_{k∈{1,...,d}\{j}} sin(x_k + π/(2d) + α_k − α_j)/sin(α_k − α_j),

for any α_k ∈ R s.t. sin(α_k − α_j) ≠ 0 for all j ≠ k.

Now the result follows by applying the Chebyshev trigonometric approximation, taking into account that h ∈ C_d.


Estimates on the Tucker/canonical rank of f2,κ B. Khoromskij, Zuerich 2010(L2) 47

The next statement applies to the d-th order tensor representing the kernel f_{2,κ} projected onto piecewise constant basis functions.

Introduce the function

f_0(t) := sin²((κ/2)√t)/t ≡ (f_{1,κ/2}(t))²,  so that f_{2,κ} = 2‖x‖ f_0(t), t = ‖x‖².

Thm. 2.6. Suppose Rank_C(‖x‖) = O(|log ε| log n). Then, for any d ≥ 3 and given ε > 0, the function f_{2,κ} : [0, 2π√d]^d → R allows Tucker/canonical approximations s.t.

σ(f_{2,κ}, S) ≤ Cε with S = T_r, C_R,

with the rank estimates (r = (r, ..., r))

r ≤ R ≤ C d |log ε| log n (|log ε| + κ).

Proof. Factorise the function

g(t) = sin²((κ/2)√t)/√t,  t = ‖x‖² ∈ [0, 2π],

to obtain g(t) = √t f_0(t) with f_0(t) = (f_{1,κ/2}(t))².

Estimates on the Tucker/canonical rank of f2,κ B. Khoromskij, Zuerich 2010(L2) 48

Applying to the function f_0 : [0, 2π] → R the same argument as in Theorem 2.5, we obtain its separable approximation in the classes T_r and C_R (on the continuous level) with the κ-dependent rank estimate

r ≤ R ≤ C d (|log ε| + κ).

We are left with the tensor approximation of ‖x‖ = ‖x‖²/‖x‖. By assumption, using the rank-(|log ε| log n) approximation of ‖x‖, we arrive at the desired (canonical) rank estimate for the decomposition, obtained as the Hadamard product of two canonical tensors of ranks d(|log ε| + κ) and |log ε| log n, respectively.


Complexity estimates B. Khoromskij, Zuerich 2010(L2) 49

Theorems 2.5 and 2.6 indicate linear scaling of the tensor rank in the frequency parameter κ, leading to a remarkable reduction of the numerical cost. The approximation condition for high frequency is κ ≤ Cn.

Complexity issues:

Storage needs: dRn ≤ r^d + drn ⇒ d(R/r − 1)n ≤ r^{d−1}.

κ ≤ Cn ⇒ the Tucker model scales linearly in N_vol = n³.

If κ ≤ Cn^{2/3} ⇒ the Tucker model scales linearly in N_BEM = n².

The canonical decomposition scales at most as O(n²) for any d.

Conclusion: tensor decomposition outperforms by an order of magnitude the “best” wavelet O(n³ log n + κ³ log κ) method [Beylkin et al. '08].

FTT decomposition of functions f1,κ and f2,κ B. Khoromskij, Zuerich 2010(L2) 50

Lem. 2.1 A rank-2 FTT decomposition of f(x) := sin(Σ_{j=1}^d x_j), x ∈ R^d, reads (see [3])

f(x) = (sin x_1  cos x_1) (cos x_2  −sin x_2; sin x_2  cos x_2) ··· (cos x_{d−1}  −sin x_{d−1}; sin x_{d−1}  cos x_{d−1}) (cos x_d; sin x_d).

Proof. By induction (cf. Lect. 1, Exer. 1.3):

f(x) = sin x_1 cos(x_2 + ... + x_d) + cos x_1 sin(x_2 + ... + x_d)

= (sin x_1  cos x_1) (cos(x_2 + ... + x_d); sin(x_2 + ... + x_d))

= (sin x_1  cos x_1) (cos x_2  −sin x_2; sin x_2  cos x_2) (cos(x_3 + ... + x_d); sin(x_3 + ... + x_d)).

Thm. 2.7 For any d ≥ 3, we have Rank_FTT(f_{1,κ}(‖x‖)) ≤ C(|log ε| + κ). Suppose that Rank_C(‖x‖) = O(|log ε| log n); then for any d ≥ 3,

Rank_FTT(f_{2,κ}(‖x‖)) ≤ C |log ε| log n (|log ε| + κ).


Applicability to 3D scattering problems B. Khoromskij, Zuerich 2010(L2) 51

Figure 1: Examples of step-type geometries, which are well suited for

tensor-product representation in 3D FEM/BEM.

Numerics B. Khoromskij, Zuerich 2010(L2) 52

Example 2.1. Figure 2 shows the convergence history for the best orthogonal Tucker vs. canonical approximations of the Newton/Yukawa potentials on an n × n × n grid for n = 2048.

Figure 2: The Tucker vs. canonical approximations of the Newton/Yukawa potentials. [Panels: relative error vs. tensor rank for the Newton potential and the Yukawa potential (κ = 1).]


Numerics B. Khoromskij, Zuerich 2010(L2) 53

Example 2.2. Figure 3 shows the convergence history for the Tucker model applied to f_{1,κ}, f_{2,κ} in dependence on κ ∈ [1, 15]. It clearly indicates the relation r ~ C + κ for different (fixed) values ε_1 = 10^{−3} and ε_2 = 10^{−4}.

Figure 3: Convergence history for the Tucker model applied to f_{1,κ}, f_{2,κ}, κ ∈ [1, 15]. [Panels: Tucker rank vs. κ for f_1(|x|) and f_2(|x|) on [0, π]³, for ε = 10^{−3} and ε = 10^{−4}.]

Literature to Lecture 2 B. Khoromskij, Zuerich 2010(L2) 54

1. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class

of Nonlocal Operators in High Dimensions. Part I. Computing 76 (2006) 177-202.

2. B.N. Khoromskij: Tensor-structured Preconditioners and Approximate Inverse of Elliptic Operators in Rd.

J. Constructive Approx. 30:599-620 (2009).

3. I.V. Oseledets: Constructive representation of functions in tensor formats. Preprint INM Moscow, 2010.

URL: http://personal-homepages.mis.mpg.de/bokh


Lect. 3. Analytic Methods of Separable Approximation in R^d B. Khoromskij, Zuerich 2010(L3) 55

Outline of Lecture 3

1. Separable approximation by exponential sums.

2. Sinc approximation on (−∞,∞).

- Sampling theorem.

- Sinc quadratures and interpolation.

- Exponential convergence rate for functions in H¹(D_δ).

- Improved quadratures.

3. Polynomial and exponential decay on [0, ∞).

4. Sinc methods on an arc (a, b).

5. Numerical illustrations.

Setting the approximation problem B. Khoromskij, Zuerich 2010(L3) 56

Analytic methods for the Tucker/canonical tensor-product decomposition of nonlocal operators and for the separable approximation of multivariate functions can be based on sinc interpolation or sinc quadratures.

Approximation problem: given a multivariate function F : Ω^d → R (d ≥ 2), approximate it by a separable expansion

F_r(ζ_1, ..., ζ_d) := Σ_{k=1}^r c_k Φ_k^{(1)}(ζ_1) ··· Φ_k^{(d)}(ζ_d) ≈ F,  Ω ∈ {R, R_+, (a, b)},

where the set of univariate functions Φ_k^{(ℓ)} : Ω → R, 1 ≤ ℓ ≤ d, 1 ≤ k ≤ r, may be fixed or chosen adaptively, and c_k ∈ R.

For numerical efficiency the separation rank r ∈ N should be reasonably small.


Separable appr. by interpolation and quadratures B. Khoromskij, Zuerich 2010(L3) 57

I. Separation by tensor-prod. interpolation (Tucker model)

• Polynomial interpolation

• Sinc interpolation

The Tucker model applies to a class of analytic functions.

II. Approximating by exponential sums (canonical model)

• Sinc quadratures (simple direct method)

• Fitting by exponential sums Σ_k a_k e^{−b_k x}

(best r-term nonlinear approximation via a nontrivial iteration)

• Approximation by trigonometric sums Σ_k [a_k sin(b_k x) + a_k′ cos(b_k′ x)].

The canonical model applies well to functions depending on

the sum of single variables (say, f(x) = f(‖x‖), x ∈ R^d).

Canonical approximation via separation by integration B. Khoromskij, Zuerich 2010(L3) 58

Assume that a function of ρ = Σ_{i=1}^d x_i is given by the integral

f(ρ) = ∫_Ω G(t) e^{ρF(t)} dt,  Ω ∈ {R, R_+, (a, b)}.

If a quadrature can be applied, one obtains the separable approximation (with weights c_ν = ω_ν G(t_ν))

f(x_1 + ... + x_d) ≈ Σ_{ν=1}^r ω_ν G(t_ν) e^{ρF(t_ν)} = Σ_{ν=1}^r c_ν ∏_{i=1}^d e^{x_i F(t_ν)}.

We apply the sinc-quadratures to the Laplace transform.

Examples of f(ρ): Green's kernels and classical potentials,

f(x) = 1/(x_1 + ... + x_d), x_i ≥ 0:  1/ρ = ∫_0^∞ e^{−ρt} dt, ρ > 0;

f(x) = 1/‖x‖, x ∈ R^d:  1/ρ = (2/√π) ∫_0^∞ e^{−ρ²t²} dt.


Separation by exponential fitting B. Khoromskij, Zuerich 2010(L3) 59

Rem. 3.1 A quadrature approximation provides a quasi-optimal r-term approximation, which can then be optimised further by algebraic methods.

The best r-term approximation of f(ρ) by exponential sums,

f(ρ) ≈ Σ_{ν=1}^r ω_ν e^{−t_ν ρ},  t_ν ∈ C,   (13)

(e.g., w.r.t. the L∞- or L²-norm) leads to an approximation whose separation rank is close to optimal.

Rem. 3.2. The approximation by exponential/trigonometric sums also applies to matrix-valued functions f(A), with A = Σ_{i=1}^d A_i and pairwise commuting matrices A_i.

Big Bernstein Theorem B. Khoromskij, Zuerich 2010(L3) 60

For n ≥ 1, consider the set E_n^0 of exponential sums on R_+:

E_n^0 := { u = Σ_{ν=1}^n ω_ν e^{−t_ν x} : ω_ν, t_ν ∈ R }.

One can now address the problem of finding the best approximation to f over the set E_n^0, characterised by the best approximation error

d(f, E_n^0) := inf_{v∈E_n^0} ‖f − v‖_∞.

The existence of approximations by exponentials is due to the

Big Bernstein Theorem: If f is completely monotone for x ≥ 0, i.e.,

(−1)^n f^{(n)}(x) ≥ 0 for all n ≥ 0, x ≥ 0,

then it is the restriction to R_+ of the Laplace transform of a measure:

f(z) = ∫_{R_+} e^{−tz} dμ(t).


Exponential decay of the error on [a, b] B. Khoromskij, Zuerich 2010(L3) 61

The complete elliptic integral of the first kind with modulus κ is

K(κ) = ∫_0^1 dt/√((1 − t²)(1 − κ²t²))  (0 < κ < 1),

and K′(κ) := K(κ′) is defined by κ² + (κ′)² = 1.

Prop. 3.1. [Braess] Assume that f is completely monotone and analytic for Re z > 0, and let 0 < a < b. Then for the uniform approximation on the interval [a, b],

lim_{n→∞} d(f, E_n^0)^{1/n} ≤ 1/ω² < 1,  ω = exp(πK(κ)/K′(κ)),  with κ = a/b.

In the cases of f(ρ) considered below we may assume ρ ∈ [1, R], i.e., κ = 1/R with 1 ≪ R.

Exponential decay of the error on [a, b] B. Khoromskij, Zuerich 2010(L3) 62

Now, applying the asymptotics of the complete elliptic integrals,

K(κ′) = ln(4/κ) + C_1 κ + ...  for κ′ → 1,

K(κ) = (π/2)(1 + (1/4)κ² + C_1 κ⁴ + ...)  for κ → 0,

we obtain

1/ω² = exp(−2πK(κ)/K(κ′)) ≈ exp(−π²/ln(4R)) ≈ 1 − π²/ln(4R).

The latter expression indicates that the number n of terms needed to achieve a tolerance ε > 0 is estimated by

n ≈ |log ε|/|log ω^{−2}| ≈ |log ε| ln(4R)/π².

This shows the same asymptotic convergence in n as for the sinc approximation (see below and Lect. 4).


Exponential approximations in L2-norm B. Khoromskij, Zuerich 2010(L3) 63

The best approximation of f(ρ), ρ ∈ [1, R], w.r.t. a weighted L²-norm reduces to the minimisation of an explicitly given differentiable functional.

Given R > 1, n ≥ 1, find the 2n parameters α_1, ω_1, ..., α_n, ω_n ∈ R such that

F_W(R; α_1, ω_1, ..., α_n, ω_n) := ∫_1^R W(x) ( f(x) − Σ_{i=1}^n ω_i e^{−α_i x} )² dx = min.

In the important particular case f(x) = 1/x and W(x) = 1, the integral can be calculated in closed form:

F_1(R; α_1, ω_1, ..., α_n, ω_n) = 1 − 1/R − 2 Σ_{i=1}^n ω_i [Ei(−α_i) − Ei(−α_i R)]

+ (1/2) Σ_{i=1}^n (ω_i²/α_i) [e^{−2α_i} − e^{−2α_i R}] + 2 Σ_{1≤i<j≤n} (ω_i ω_j/(α_i + α_j)) [e^{−(α_i+α_j)} − e^{−(α_i+α_j)R}],

with the exponential integral function Ei(x) = −∫_{−∞}^x (e^t/t) dt.

Exponential approximations in L2-norm B. Khoromskij, Zuerich 2010(L3) 64

In the case R = ∞, the expression for F_1(∞; ...) simplifies. Gradient or Newton-type methods with a proper choice of the initial guess can be used to obtain the minimiser of F_1. However, the convergence of these nonlinear iterations might be very slow. The integral F_W may also be approximated by a quadrature.

Optimisation with respect to the maximum norm leads to the nonlinear minimisation problem

inf_{v∈E_n^0} ‖f − v‖_{L∞[1,R]}

involving the 2n parameters {ω_ν, t_ν}_{ν=1}^n. The numerical scheme can be based on the Remez algorithm of rational approximation.


Exponential approximations in L2-norm B. Khoromskij, Zuerich 2010(L3) 65

Calculations using the weighted L²([1, R])-norm have been performed with the MATLAB subroutine FMINS, based on direct-search minimisation.
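A present-day MATLAB sketch of such a computation (FMINS is the predecessor of today's fminsearch; the parametrisation α_i = e^{p_i} and the initial guess are illustrative assumptions, and direct search may stall in a local minimum):

% Fit f(x) = 1/x on [1,R] by n exponentials, minimising the L2 error.
R = 10; n = 4;
x = linspace(1, R, 2000);
f = 1 ./ x;
err = @(p) trapz(x, (f - p(n+1:2*n).' * exp(-exp(p(1:n)) * x)).^2);
p0 = [log(logspace(-1, 0.5, n)).'; ones(n, 1)];      % crude initial guess
p  = fminsearch(err, p0, optimset('MaxFunEvals', 5e4, 'MaxIter', 5e4));
fprintf('L2([1,R]) error = %.2e\n', sqrt(err(p)));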

Best approximation to 1/√ρ in the weighted L²([1, R])-norm:

         R = 10    R = 50    R = 100   R = 200   ‖·‖_L∞    W(ρ) = 1/√ρ
r = 4    3.7e-4    9.6e-4    1.5e-3    2.2e-3    1.9e-3    4.8e-3
r = 5    2.8e-4    2.8e-4    3.7e-4    5.8e-4    4.2e-4    1.2e-3
r = 6    8.0e-5    9.8e-5    1.1e-4    1.6e-4    9.5e-5    3.3e-4
r = 7    3.5e-5    3.8e-5    3.9e-5    4.7e-5    2.2e-5    8.1e-5

Calculations for the nearly best approximation to 1/√ρ in the L∞-norm are presented by W. Hackbusch at www.mis.mpg.de/scicomp/EXP_SUM/1_x/tabelle.

Sampling Theorem, Sinc Approximation B. Khoromskij, Zuerich 2010(L3) 66

How to discretise analog signals?

A class of functions f(t), t ∈ R, can be discretised by recording their sample values {f(nh)}_{n∈Z} at intervals h > 0.

The sinc function (also called the cardinal function) is given by

sinc(x) := sin(πx)/(πx), with the convention sinc(0) = 1.

V.A. Kotelnikov (1933) and J. Whittaker (1935) proved a celebrated theorem: band-limited signals can be exactly reconstructed from their sampling values.

Thm. 3.1. (Kotelnikov, Shannon, Whittaker) If the support of the Fourier transform f̂ is included in [−π/h, π/h], then for t ∈ R,

f(t) = Σ_{n=−∞}^∞ f(nh) S_{n,h}(t), with S_{n,h}(t) = sinc(t/h − n).


Sampling Theorem B. Khoromskij, Zuerich 2010(L3) 67

Proof. Exer. 3.1. Use the properties of the Fourier transform (FT), cf. Khoromskij [4]:

f̂(ω) := ∫_R f(t) e^{−iωt} dt  (continuous Fourier transform).

Exer. 3.2. Let χ_{[−T,T]}(t) = 1 if t ∈ [−T, T] and 0 otherwise (the characteristic/indicator function). Prove that (1/(2T)) χ̂ = sin(Tω)/(Tω).

[Figure 4: the Haar scaling function (cf. f̂ of f = sinc) and the sinc scaling function.]

The sampling theorem plays an important role in telecommunications, signal processing, stochastic modelling, etc.

Sampling Thm. as a decomposition in orthogonal basis B. Khoromskij, Zuerich 2010(L3) 68

Define the space U_h as the set of functions whose FTs have support included in [−π/h, π/h].

Lem. 3.2. [Stenger] The set of functions {S_{n,h}(t)}_{n∈Z} is an orthogonal basis of the space U_h. If f ∈ U_h then

f(nh) = (1/h) ⟨f(t), S_{n,h}(t)⟩.

Cor. 3.3. The sinc-interpolation formula of Thm. 3.1 can be interpreted as the decomposition of f ∈ U_h in an orthogonal basis of U_h:

f(t) = (1/h) Σ_{n=−∞}^∞ ⟨f(·), S_{n,h}(·)⟩ S_{n,h}(t).

If f ∉ U_h, one obtains the orthogonal projection of f onto U_h.


Exact Sinc-interpolation of entire functions B. Khoromskij, Zuerich 2010(L3) 69

When does the sinc-interpolant represent a function exactly?

C(f, h)(x) = Σ_{k=−∞}^∞ f(kh) S_{k,h}(x).

Def. 3.1. Let h > 0, and let W(π/h) denote the family of entire functions f such that ∫_R |f(t)|² dt < ∞ and, for all z ∈ C,

|f(z)| ≤ C e^{π|z|/h} with a constant C > 0.

Thm. 3.4. (Stenger) {h^{−1/2} S_{k,h}(x)}_{k∈Z} is a complete L²(R)-orthonormal sequence in W(π/h).

Every f ∈ W(π/h) has the cardinal series representation

f(x) = C(f, h)(x),  x ∈ R.

Sinc-approximation of analytic functions B. Khoromskij, Zuerich 2010(L3) 70

The interpolant C(f, h) provides a remarkably accurate approximation on R for functions which are analytic and uniformly bounded on the strip

D_δ := { z ∈ C : |Im z| ≤ δ },  0 < δ < π/2,

such that

N(f, D_δ) := ∫_R (|f(x + iδ)| + |f(x − iδ)|) dx < ∞.

This defines the Hardy space H¹(D_δ).

For f ∈ H¹(D_δ) we have exponential convergence in 1/h (Stenger):

sup_{x∈R} |f(x) − C(f, h)(x)| = O(e^{−πδ/h}),  h → 0.   (14)


Sinc-quadratures for analytic integrand B. Khoromskij, Zuerich 2010(L3) 71

Likewise, if f ∈ H¹(D_δ), the integral

I(f) = ∫_Ω f(x) dx  (Ω = R or Ω = R_+)

can be approximated with exponential convergence by the sinc-quadrature (trapezoidal rule)

T(f, h) := h Σ_{k=−∞}^∞ f(kh)  ( = ∫_R C(f, h)(x) dx ≈ I(f) ),

|I(f) − T(f, h)| = O(e^{−πδ/h}),  h → 0.   (15)

Analogous estimates hold for the (computable) truncated sums

C_M(f, h) := Σ_{k=−M}^M f(kh) S_{k,h}(x),  T_M(f, h) := h Σ_{k=−M}^M f(kh).

Standard error estimates on R B. Khoromskij, Zuerich 2010(L3) 72

Thm. 3.5. [Stenger] If f ∈ H¹(D_δ) and |f(x)| ≤ C exp(−b|x|) for all x ∈ R, with b, C > 0, then

‖f − C_M(f, h)‖_∞ ≤ C [ (e^{−πδ/h}/(2πδ)) N(f, D_δ) + (1/(bh)) e^{−bhM} ],   (16)

|I(f) − T_M(f, h)| ≤ C [ (e^{−2πδ/h}/(1 − e^{−2πδ/h})) N(f, D_δ) + (1/b) e^{−bhM} ].   (17)

Sketch of proof: the first term in the rhs of (16) represents the approximation error (14),

‖f(x) − C(f, h)(x)‖_∞ ≤ N(f, D_δ)/(2πδ sinh(πδ/h)),

while the second gives the truncation error

‖C(f, h)(x) − C_M(f, h)(x)‖_∞ ≤ Σ_{|k|≥M+1} |f(kh)| ≤ 2C Σ_{k=M+1}^∞ e^{−bkh} ≤ (2C/(bh)) e^{−bhM}.


Exponential convergence rate in M B. Khoromskij, Zuerich 2010(L3) 73

Similar arguments apply to (17).

For the interpolation error (16), the choice h = √(πδ/(bM)) implies the exponential convergence rate

‖f − C_M(f, h)‖_∞ ≤ C M^{1/2} e^{−√(πδbM)}.   (18)

In fact, for the chosen h the first term in the rhs of (16) dominates, hence (18) follows. Usually we set δ = π/2.

For the quadrature error (17), the “optimal” choice h = √(2πδ/(bM)) yields

|I(f) − T_M(f, h)| ≤ C e^{−√(2πδbM)}.   (19)

Error bound in the case of double-exponential decay B. Khoromskij, Zuerich 2010(L3) 74

If f has double-exponential decay as |x| → ∞, i.e.,

|f(x)| ≤ C exp(−b e^{a|x|}) for all x ∈ R, with a, b, C > 0,   (20)

the convergence rate of sinc interpolation and quadrature can be improved up to O(e^{−cM/log M}) (cf. Thm. 3.5).

Thm. 3.6. (Gavrilyuk, Hackbusch, Khoromskij) Let f ∈ H¹(D_δ) with some δ < π/2, and let (20) hold. Then the choice h = log(2πaM/b)/(aM) leads to the quadrature error

|I − T_M(f, h)| ≤ C N(f, D_δ) e^{−2πδaM/log(2πaM/b)}.   (21)

The choice h = log(πaM/b)/(aM) ensures the interpolation error

‖f − C_M(f, h)‖_∞ ≤ C (N(f, D_δ)/(2πδ)) e^{−πδaM/log(πaM/b)}.   (22)


Error bound in the case of double-exponential decay B. Khoromskij, Zuerich 2010(L3) 75

Proof. The bound for |I − T(f, h)| is the same as in Thm. 3.5. For the remainder sum we use the simple estimate

Σ_{|k|>M} exp(−b e^{a|kh|}) = 2 Σ_{k=M+1}^∞ exp(−b e^{akh}) ≤ 2 ∫_M^∞ exp(−b e^{axh}) dx ≤ (2 e^{−ahM}/(abh)) exp(−b e^{ahM}).

Hence, the quadrature error has the bound

|I − T_M(f, h)| ≤ C [ (e^{−2πδ/h}/(1 − e^{−2πδ/h})) N(f, D_δ) + (e^{−ahM}/(ab)) exp(−b e^{ahM}) ].

Now (21) follows by substitution of h.

Error bound in the case of double-exponential decay B. Khoromskij, Zuerich 2010(L3) 76

The interpolation error of C_M(f, h) satisfies

‖f − C_M(f, h)‖_∞ ≤ C [ (e^{−πδ/h}/(2πδ)) N(f, D_δ) + (e^{−ahM}/(abh)) exp(−b e^{ahM}) ].

The approximation error allows the same estimate as in the standard case. To prove (22), we note that the truncation error bound is determined by the decay rate of f as |x| → ∞,

‖C(f, h)(x) − C_M(f, h)(x)‖_∞ ≤ Σ_{|k|≥M+1} |f(kh)| ≤ 2C Σ_{k=M+1}^∞ e^{−b e^{akh}} ≤ (2C e^{−ahM}/(abh)) e^{−b e^{ahM}}.

Exer. 3.3. For the numerical approximation of the integral ∫_{−∞}^∞ exp(−x²) dx = √π, show that, with the choice h = (π/M)^{1/2},

| ∫_{−∞}^∞ exp(−x²) dx − h Σ_{k=−M}^M e^{−k²h²} | ≤ C exp(−πM).

Calculate the approximation to √π for M = 4, 8, 12; the latter will be accurate to 15 digits. Calculate the sinc interpolant of exp(−λx²), λ > 0.
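In MATLAB, the computation requested in Exer. 3.3 is a few lines (a sketch):

% Trapezoidal sinc-quadrature for int exp(-x^2) dx = sqrt(pi), h = sqrt(pi/M).
for M = [4 8 12]
    h = sqrt(pi / M);
    k = -M:M;
    T = h * sum(exp(-(k * h).^2));
    fprintf('M = %2d   T = %.16f   error = %.2e\n', M, T, abs(T - sqrt(pi)));
end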


Sinc-interpolation on (a, b) via Thm. 3.5 B. Khoromskij, Zuerich 2010(L3) 77

To apply Thm. 3.5 in the case Ω = (a, b) (say, Ω = R_+), one has to substitute the variable x ∈ Ω by x = φ(ζ), where φ : R → (a, b) is a bijection. This changes f : (a, b) → R into

f_1 := φ′ · (f ∘ φ) : R → R (quadrature case),

f_1 := f ∘ φ (interpolation case).

Assuming f_1 ∈ H¹(D_δ), one can apply (18)-(19) to the transformed function f_1.

Ex. 3.1. In the case of an interval (a, b): φ^{−1}(z) = log[(z − a)/(b − z)], Re z = x.

Ex. 3.2. In the case of the semi-axis R_+ := (0, ∞): φ^{−1}(z) = log[sinh(z)], or φ^{−1}(z) = log(z) (φ(ζ) = e^ζ).

Sinc quadratures on R+ (polynomial/exponential decay) B. Khoromskij, Zuerich 2010(L3) 78

Polynomial decay. Let Ω = R_+ and assume:

(i) f can be analytically extended from R_+ into the sector

D_δ^{(1)} = { z ∈ C : |arg(z)| < δ } for some 0 < δ < π/2

(in fact, φ^{−1} : D_δ^{(1)} → D_δ is the conformal map, φ(ζ) = e^ζ);

(ii) f satisfies the inequality

|f(z)| ≤ c |z|^{α−1} (1 + |z|)^{−α−β} for some 0 < α, β ≤ 1 and all z ∈ D_δ^{(1)}.

Let α = 1. Choosing any M ∈ N and taking

h^{(1)} = √(2πδ/(βM)),

we define the corresponding quadrature rule

T_M^{(1)} = h^{(1)} Σ_{k=−βM}^{M} c_k f(z_k),  z_k = e^{kh^{(1)}},  c_k = e^{kh^{(1)}},


Sinc quadratures on R+ (polynomial/exponential decay) B. Khoromskij, Zuerich 2010(L3) 79

possessing exponential convergence (with C > 0 independent of M):

|I − T_M^{(1)}| ≤ C e^{−√(2πδβM)}.

Figure 5: The analyticity sector D_δ^{(1)} (left) and the “bullet-shaped” domain D_δ^{(3)} (right).

Rem. 3.3 The results for polynomial decay are in sharp contrast to the error of polynomial-based approximation for functions with algebraic singularities. For example, for the function f(x) = x^α(1 − x)^α, α = 1/2, the best approximation by polynomials of degree n converges only at the rate Cn^{−α}.

Sinc quadratures on R+ (polynomial/exponential decay) B. Khoromskij, Zuerich 2010(L3) 80

Exponential decay. Assume that the integrand f can be analytically extended into the “bullet-shaped” domain

D_δ^{(3)} = { z ∈ C : |arg(sinh z)| < δ },  0 < δ < π/2,

and that f satisfies

|f(z)| ≤ C (|z|/(1 + |z|))^{α−1} e^{−β Re z} in D_δ^{(3)},  α, β ∈ (0, 1].   (23)

Setting α = 1 and choosing h^{(2)} = h^{(1)}, c_k^{(2)} = (1 + e^{−2kh^{(2)}})^{−1/2} and M ∈ N, we obtain the quadrature

T_M^{(2)} = h^{(2)} Σ_{k=−βM}^{M} c_k^{(2)} f(z_k^{(2)}),  z_k^{(2)} = log[e^{kh^{(2)}} + √(1 + e^{2kh^{(2)}})],

possessing the same exponential convergence rate as above.


Numerics for the Sinc interpolation on (a, b) B. Khoromskij, Zuerich 2010(L3) 81

Ex. 3.3. Separable approximation to the function

g(x, y) = ‖x‖^λ sinc(‖x‖ ‖y‖),  λ ∈ (−3, 1],

arising in the Boltzmann equation, x, y ∈ R³.


Figure 6: L∞-error of the sinc-interpolation of |x|^λ sinc(|x|y), x ∈ [−1, 1], y = 16, 25, 36, λ = 1.

Rem. 3.4. The sinc-interpolant provides an exponentially convergent separable approximation of g(x, y) for (x, y) ∈ [a, b]³ × [c, d]³.

Numerics for the Sinc interpolation on R+ B. Khoromskij, Zuerich 2010(L3) 82

Ex. 3.4. Sinc-interpolation for g(x, y) = exp(−xy), x, y ≥ 0.

Consider the auxiliary function f(x, y) = (x/(1 + x)) exp(−xy), x ∈ R_+, y ∈ [1, R], which satisfies all the conditions above with α = β = 1 (exponential decay). With the choice of interpolation points x_k := log[e^{kh} + √(1 + e^{2kh})] ∈ R_+, it can be approximated with exponential convergence.

Figure 7: L∞-error of the Sinc-interpolation of exp(−|x|y), x ∈ [−1, 1], for y = 1, 10, 100 (error vs. the number of quadrature points M).


Numerics for the Sinc interpolation on R B. Khoromskij, Zuerich 2010(L3) 83

Ex. 3.5. Mexican hat scaling function

Figure 8: Mexican hat f(x) = (1 − x²) exp(−αx²), α > 0.

Sinc interpolation to the Mexican hat, r = M + 1.

α\M    4       9       16      25        36        49        64       81        100
1      0.05    6·10^-4 7·10^-7 1·10^-10  2·10^-15  1·10^-15  -        -         -
10     0.17    0.13    0.12    0.04      0.01      0.004     0.0009   1.7·10^-4 2.6·10^-5
0.1    3.8     2.6     0.6     0.08      0.006     1.6·10^-5 2·10^-7  2.5·10^-9 2·10^-11

Literature to Lect. 3 B. Khoromskij, Zuerich 2010(L3) 84

1. D. Braess: Nonlinear approximation theory. Springer-Verlag, Berlin, 1986.

2. I.P. Gavrilyuk, W. Hackbusch, and B.N. Khoromskij: Data-sparse approximation to a class of

operator-valued functions. Math. Comp. 74 (2005), 681-708.

3. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class

of Nonlocal Operators in High Dimensions. Part I. Computing 76 (2006), 177-202.

4. B.N. Khoromskij: An Introduction to Structured Tensor-product Representation of Discrete

Nonlocal Operators. Lecture notes 27, MPI MIS, Leipzig 2005.

5. J. Lund and K.L. Bowers: Sinc Methods for Quadrature and Differential Equations. SIAM, Philadelphia 1992.

6. F. Stenger: Numerical methods based on Sinc and analytic functions. Springer-Verlag, 1993.

http://personal-homepages.mis.mpg.de/bokh


Lect. 4. Low Rank Sinc Approximation of Green Kernels B. Khoromskij, Zuerich 2010(L4) 85

Outlook of Lecture 4

1. Tensor-product Sinc interpolation ⇒ Tucker approxim.

- Lebesgue constant.

- Exponential converg. of tensor-product Sinc interpolation.

2. Separable approx. of integral operators and related kernels.

- General discussion.

- The case of shift-invariant kernels. Green kernels.

3. Error analysis for basic examples: 1/(x_1 + ... + x_d), 1/‖x‖, e^{−‖x‖}/‖x‖, x ∈ R^d.

4. Numerical illustrations.

5. Tensor product convolution in Rd. Quadrature-based

canonical decomp. of the projected Newton/Yukawa kernels.

Tensor-product interpolation revisited B. Khoromskij, Zuerich 2010(L4) 86

Given N ∈ N, the set of interpolating functions ϕ_j(x), x ∈ B := [−a, a], and sampling points ξ_j ∈ B, s.t. ϕ_j(ξ_i) = δ_{ij} (i, j = 1, ..., N).

The Lagrangian interpolant I_N of f : B → R has the form

I_N f := Σ_{j=1}^{N} f(ξ_j)ϕ_j(x), f ∈ C[B], (24)

with (I_N f)(ξ_j) = f(ξ_j) (j = 1, . . . , N).

Recall the tensor-product interpolant I_N in d spatial variables,

I_N f := I^1_N × · · · × I^d_N f = Σ_j f(ξ_{j_1}, ..., ξ_{j_d}) ϕ^{(1)}_{j_1}(x_1) · · · ϕ^{(d)}_{j_d}(x_d),

where f : B^d → R, and I^ℓ_N f is the univariate interpolation in x_ℓ ∈ B_ℓ (1 ≤ ℓ ≤ d), B^d = B_1 × · · · × B_d.


Sinc-interpolation of multi-variate functions B. Khoromskij, Zuerich 2010(L4) 87

Consider the separable approximation on B^d = R^d, B = R. The extension to the case B = R_+ or B = (a, b) is straightforward.

The tensor-product Sinc interpolant C_M in d variables,

C_M f := C^1_M × ... × C^d_M f, f : R^d → R,

where C^ℓ_M f = C^ℓ_M(f, h), 1 ≤ ℓ ≤ d, is the univariate Sinc interpolation,

C^ℓ_M(f, h) = Σ_{k=−M}^{M} f(x_1, ..., kh, ..., x_d) S_{k,h}(x_ℓ).

Ex. 4.1. C_M f converges exponentially fast in M for

f(x) = ‖x‖^α, f(x) = e^{−κ‖x‖^γ}, f(x) = erf(‖x‖)/‖x‖, f(x, y) = ‖x‖^γ sinc(‖x‖ · ‖y‖), x, y ∈ R^d.

Stability: Lebesgue constant of the Sinc-interpolant B. Khoromskij, Zuerich 2010(L4) 88

Error bound for the tensor-product Sinc interpolant.

The estimation of the error f − C_M f requires the Lebesgue constant Λ_M ≥ 1 of the univariate interpolant defined by

||C_M(f, h)||_∞ ≤ Λ_M ||f||_∞ for all f ∈ C(R). (25)

Stenger '93 proves the inequality

Λ_M := max_{x∈R} Σ_{k=−M}^{M} |S_{k,h}(x)| ≤ (2/π)(3 + log(M)). (26)
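As a sanity check, the Lebesgue sum in (26) can be evaluated numerically. A minimal MATLAB sketch (the step h = log(M)/M and the evaluation grid are illustrative choices):

  % Numerical check of the Lebesgue constant bound (26) for the Sinc basis
  % S_{k,h}(x) = sin(pi*(x-kh)/h) / (pi*(x-kh)/h).
  for M = [8 16 32 64 128]
    h = log(M)/M;                     % step (any h > 0 would do here)
    x = linspace(-M*h, M*h, 10000);   % fine evaluation grid (row)
    k = (-M:M)';                      % Sinc indices (column)
    T = (x - k*h)/h;                  % (2M+1) x 10000, implicit expansion
    S = sin(pi*T)./(pi*T);  S(T==0) = 1;
    LambdaM = max(sum(abs(S),1));     % max_x sum_k |S_{k,h}(x)|
    fprintf('M=%4d: Lambda_M=%6.3f, bound=%6.3f\n', M, LambdaM, (2/pi)*(3+log(M)));
  end

The printed values grow only logarithmically in M, in agreement with (26).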

For each fixed ℓ ∈ {1, . . . , d}, choose ζ_ℓ ∈ B_ℓ and define the “single hole” parameter set by

Y_ℓ := B_1 × ... × B_{ℓ−1} × B_{ℓ+1} × ... × B_d ⊂ R^{d−1}.


Sinc-interpolation error B. Khoromskij, Zuerich 2010(L4) 89

Introduce the univariate (parameter dependent) function

Fℓ(·, y) : Bℓ → R, y ∈ Yℓ, ℓ = 1, ..., d,

that is the restriction of f onto Bℓ. Recall the Hardy space

H1(Dδ).

Thm. 4.1. For each ℓ = 1, ..., d, and for any fixed y ∈ Y_ℓ, we assume that F_ℓ(·, y) satisfies

(a) F_ℓ(·, y) ∈ H^1(D_δ) with N(F_ℓ, D_δ) ≤ N_0 < ∞ uniformly in y;

(b) F_ℓ(·, y) has hyper-exponential decay with a = 1, C, b > 0.

Then the “optimal” choice h := log M / M yields

‖f − C_M(f, h)‖_∞ ≤ (C/(2πδ)) Λ^d_M N_0 e^{−πδM/log M} (27)

with Λ_M defined by (26).

Proof of the Sinc-interpolation error B. Khoromskij, Zuerich 2010(L4) 90

Similar to the case of polynomial interpolation, the multiple use of (25) and the triangle inequality lead to

|f − C_M f| ≤ |f − C^1_M f| + |C^1_M(f − C^2_M · · · C^d_M f)|

≤ |f − C^1_M f| + |C^1_M(f − C^2_M f)| + |C^1_M C^2_M(f − C^3_M f)| + . . . + |C^1_M · · · C^{d−1}_M(f − C^d_M f)|

≤ (1 + Λ_M)[N_1 + Λ_M N_2 + . . . + Λ^{d−1}_M N_d] (1/(2πδ)) e^{−πδM/log M}

≤ (1 + Λ_M)(1 + Λ_M + ... + Λ^{d−1}_M)/(2πδ) · max_{ℓ=1,...,d} N(F_ℓ, D_δ) · e^{−πδM/log M}.

Hence, for Λ_M → ∞, (27) follows.

Notice that usually δ can be chosen close to π/2. The choice of δ affects only the error estimate, while C_M f does not depend on δ.


Tucker approx. to integral operators (IOs) (analytic meth.) B. Khoromskij, Zuerich 2010(L4) 91

C_M f applies to the Nyström method:

(Gu)(x) := ∫_Ω g(x, y)u(y)dy ≈ Σ_k g(x_m, y_k)u(y_k), x_m, y_k ∈ Ω_h ⊂ R^d.

A separable approx. of the singular kernel works only for coupled variables (Φ^{(ℓ)}_k(·, ·) is a set of bivariate functions),

g_r := Σ_{k=1}^{r} b_k Φ^{(1)}_{k_1}(x_1, y_1) · · · Φ^{(d)}_{k_d}(x_d, y_d) ≈ g.

For the shift-invariant singular kernel g(x, y) = g(‖x − y‖),

g(x, y) ⇒ G(ζ_1, ..., ζ_d) ≡ G(√(ζ_1² + ... + ζ_d²)),

where ζ_ℓ = |x_ℓ − y_ℓ| ∈ [0, 1], ℓ = 1, ..., d.

Now the Sinc interpolant applies w.r.t. the d coupled variables ζ_1, ..., ζ_d (only one point singularity!).

Canonical approximation to IOs (analytic meth.) B. Khoromskij, Zuerich 2010(L4) 92

Separation by integration applies to collocation and Galerkin methods. The r-term Sinc-quadratures for the Laplace integral representation of G(ρ), ρ = ‖x − y‖², ρ ∈ [a, b]:

G(ρ) = ∫_R f(t)e^{−tρ}dt ≈ Σ_{k=1}^{r} c_k f(t_k) Π_{ℓ=1}^{d} e^{−t_k|x_ℓ−y_ℓ|²}, ρ ∈ [a, b].

For the collocation-projection, compute a rank-r tensor at x = 0,

g_i = 〈G(ρ), φ_i〉 ≈ Σ_{k=1}^{r} c_k f(t_k) Π_{ℓ=1}^{d} 〈e^{−t_k|y_ℓ|²}, φ_{i_ℓ}(y_ℓ)〉, i ∈ I.

Ex. 4.2. For the classical Green kernels, x, y ∈ R^d,

log ‖x − y‖, 1/‖x − y‖, e^{−µ‖x−y‖}/‖x − y‖ (µ ∈ R_+), e^{iκ‖x−y‖}/‖x − y‖ (κ ∈ R),

the Sinc method provides an asymptotically optimal bound on both the canonical and Tucker ranks (see also Lect. 2),

r = O(log n |log ε|), r = (r, ..., r).


Initial applications: Problem classes B. Khoromskij, Zuerich 2010(L4) 93

• Tri-linear approx. to 3-rd/6-th order tensors generated by

the classical Green kernels (examples below).

• “Multi-centred” potential (prototype of large molecules), Σ_k c_k e^{−α_k‖x−x_k‖}, x ∈ R³ (examples below).

• Electron density, Hartree and exchange potentials in R3.

Solving the Hartree-Fock eq. in tensor format (Part III).

• Traditional FEM/BEM (Part III):

– elliptic inverse in R^d via 1/‖x‖², particular solutions (convolution with Green's func.), BEM on special surfaces.

– solving the elliptic boundary value, spectral and

transient problems in tensor format.

• Solving stochastic PDEs in tensor format (Part III).

Basic examples B. Khoromskij, Zuerich 2010(L4) 94

Low rank separable approx. of the multi-variate functions

(a) 1/(x_1² + ... + x_d²), (b) 1/√(x_1² + ... + x_d²), (c) e^{−λ‖x‖}/‖x‖.

Ex. 4.3. In case (a), the Sinc method applies to the Laplace integral transform

1/ρ = ∫_{R_+} e^{−ρt}dt (ρ = x_1² + ... + x_d² ∈ [1, R], R > 1). (28)

The improved quadrature applies by using the substitutions t = log(1 + e^u) and u = sinh(w),

1/ρ = ∫_R f_2(w)dw, with f_2(w) = cosh(w)/(1 + e^{−sinh(w)}) · e^{−ρ log(1+e^{sinh(w)})}.
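The resulting quadrature is a one-line computation. A minimal MATLAB sketch (the step h = log(4πM)/M anticipates Lem. 4.1 below; M and the test values of ρ are arbitrary choices):

  % Sinc quadrature T_M(f2,h) for 1/rho via (28), t = log(1+e^u), u = sinh(w).
  M = 32;
  h = log(4*pi*M)/M;                  % step from Lem. 4.1
  w = (-M:M)*h;                       % quadrature nodes
  for rho = [1 10 100 1000]
    f2 = cosh(w)./(1+exp(-sinh(w))) .* exp(-rho*log(1+exp(sinh(w))));
    TM = h*sum(f2);                   % quadrature value approximating 1/rho
    fprintf('rho=%6g: error = %.3e\n', rho, abs(TM - 1/rho));
  end

The error stays uniformly small over the whole range of ρ, illustrating the robustness discussed next.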


Canonical rank estimate for 1/ρ B. Khoromskij, Zuerich 2010(L4) 95

Special case of Thm. 3.6 (Lect. 3).

Lem. 4.1. (Hackbusch, Khoromskij [1]) Let ρ ∈ [1, R]; then the choice δ = δ(R) = O(1/log(R)), a = 1, b = 1/2, in Thm. 3.6 implies the uniform quadrature error bound by setting h = log(4πM)/M,

|1/ρ − T_M(f_2, h)| ≲ C e^{−π²M/((C + log(R)) log(π²M))}. (29)

Sketch of Proof. The function f_2(w) belongs to H^1(D_δ), with δ = O(1/log(R)), and N(f_2, D_δ) < ∞ independent of ρ.

The double-exponential decay of f_2(w) on w ∈ (−∞,∞) is due to

f_2(w) ≈ (1/2) e^{w − (ρ/2)e^{w}} as w → ∞; f_2(w) ≈ (1/2) e^{|w| − (1/2)e^{|w|}} as w → −∞,

corresponding to C = 1/2, b = min{1, ρ}/2, a = 1, in Thm. 3.6.

Numerics for 1/ρ B. Khoromskij, Zuerich 2010(L4) 96

In the case 1/ρ = 1/(x_1² + ... + x_d²), estimate (29) implies that an approximation of accuracy ε > 0 is obtainable with

M ≤ O(log(1/ε) · log R), (30)

provided that 1 ≤ ρ ≤ R (which can be achieved by a proper scaling). The numerical results support even the better bound

M ≤ O(log(1/ε) + log R)

(see Fig. 9, 10).

Figure 9: The quadrature error related to (29) with 1 ≤ ρ ≤ 10³, and M = 16 (left), M = 32 (middle), M = 64 (right).


Numerics for 1/ρ B. Khoromskij, Zuerich 2010(L4) 97

Figure 10: The quadrature error related to (29) with 1 ≤ ρ ≤ 18000, and M = 16 (left), M = 32 (middle), M = 64 (right).

Lem. 4.1 indicates that the separation rank r = 2M + 1 depends only logarithmically on both the tolerance ε > 0 and the upper bound R of ρ.

Important: Rank r does not depend on the dimension d.

Rem. 4.2. The choice of δ only affects the error bound but not the quadrature itself.

Canonical rank estimate for 1/√(x_1² + ... + x_d²) B. Khoromskij, Zuerich 2010(L4) 98

The function 1/√(x_1² + ... + x_d²) for d = 3 defines the Newton kernel.

Ex. 4.4. In the case 1/ρ = 1/√(x_1² + ... + x_d²), apply the Gauss integral

1/ρ = (2/√π) ∫_{R_+} e^{−ρ²t²}dt (ρ ∈ [1, R]). (31)

To maintain robustness in ρ, let us rewrite the Gauss integral (31) using the substitutions t = log(1 + e^u) and u = sinh(w),

1/ρ = ∫_R f(w)dw with f(w) := cosh(w) F(sinh(w)) (32)

with

F(u) := (2/√π) e^{−ρ² log²(1+e^u)}/(1 + e^{−u}), u ∈ (−∞,∞).


Canonical rank estimate for 1/√(x_1² + ... + x_d²) B. Khoromskij, Zuerich 2010(L4) 99

Lem. 4.2. Let δ < π/2, ρ ≥ 1. Then for the function f in (32) we have f ∈ H^1(D_δ). Moreover, Thm. 3.6 applies with a = 1. The improved (2M + 1)-point quadrature with the choice δ(ρ) = π/(C + log(ρ)) allows the error bound

|1/ρ − T_M(f, h)| ≤ C_1 exp(−π²M/((C + log(ρ)) log M)). (33)

Sketch of Proof. It is easy to check that f is holomorphic in D_δ and N(f, D_δ) < ∞ uniformly in ρ (with the choice δ = δ(ρ)). Now we check the double-exponential decay of the integrand as |w| → ∞ and then apply Thm. 3.6, with

δ = δ(ρ) = π/(C + log(ρ)).

Numerics for 1/√(x_1² + ... + x_d²) B. Khoromskij, Zuerich 2010(L4) 100

We apply (33) and obtain the bound (no dependence on d),

M ≤ O(log(1/ε) · log R). (34)

Fig. 11 presents numerical illustrations for sinc quadrature with values

ρ ∈ [1, R], R ≤ 5000. We observe very weak error increase in ρ. Similar

results were obtained in the case R > 5000 manifesting a rather stable

behaviour of the quadrature error w.r.t. R.

Figure 11: The quadrature error for M = 64 with R = 200 (left), R = 1000 (middle), R = 5000 (right).


Helmholtz kernel revisited B. Khoromskij, Zuerich 2010(L4) 101

Ex. 4.5. The Helmholtz kernel in Rd can be approximated by

sinc-interpolation in the Tucker format.

Given κ ∈ R, consider the Helmholtz kernel function

g(x, y) := cos(κ‖x − y‖)/‖x − y‖ = ℜe (e^{iκ‖x−y‖}/‖x − y‖) for (x, y) ∈ [1, R]^d × [1, R]^d.

The Sinc interpolation applies to the modified kernels. For this example we have N_0(F, D_δ) = O(e^κ); hence the separation rank for the Tucker approximation is r = 2M + 1, with the maximal canonical rank r^{d−1}, where M ∼ κ + |log ε| log R. This provides unsatisfactory complexity, e.g., O(κ^{d−1}n). No good quadrature approximation is known.

Note: The result in Lect. 2 leads to the following bound on the canonical rank, r = O(d(κ + |log ε| log n)).

SVD recompression B. Khoromskij, Zuerich 2010(L4) 102

Figure 12: Rank-(r_1, ..., r_d) approx. to exp(−‖x − y‖), d = 2 (left); SVD optimization (right). The panels show the error for exp[(x² + y²)^{1/2}] − exp[y], y = 0.1, with x ∈ [0, 1] and x ∈ [0, 5], vs. the number of quadrature points M.


Rank-r Tucker approx. to 1/‖x‖, d = 3, ‖x‖ ≤ 10. B. Khoromskij, Zuerich 2010(L4) 103

Figure 13: Convergence history and canonical vectors for the Newton potential on the n × n × n grid (AR = 10, n = 64).

Rank-r Tucker approx. to exp(−‖x‖γ), d = 3, ‖x‖ ≤ 10. B. Khoromskij, Zuerich 2010(L4) 104

Figure 14: Convergence history and canonical vectors for the Slater-type potential exp(−|x|^γ), γ = 0.5, 1, 1.5 (n = 64).


Rank-r Tucker approx. to Σ_{k=1}^{64} c_k exp(−‖x − x_k‖) B. Khoromskij, Zuerich 2010(L4) 105

Figure 15: Multi-centred randomly perturbed Slater potential: relative energy-norm error vs. Tucker rank for random perturbations of 1%, 0.1% and 0.01% (AR = 10, n = 64).

Multidimensional Convolution via Tensorization B. Khoromskij, Zuerich 2010(L4) 106

Goal: Fast and accurate computation of the convolution transform in R^d,

w(x) := (f ∗ g)(x) := ∫_{R^d} f(y)g(x − y)dy, f, g ∈ L¹(R^d).

Application: Solving the Poisson equation in R^d, in particular, the Hartree potential in quantum chemistry,

V_H(x) = ∫_{R³} ρ(y, y)/‖x − y‖ dy, x ∈ R³.

Method: The low tensor rank multi-linear collocation.

Physical prerequisites:

(a) Compute f ∗ g in some fixed box Ω = [−A, A]^d.

(b) Suppose that f has support in Ω.

(c) f has an R-term separable representation with moderate R.


Multidimensional Convolution via Tensorization B. Khoromskij, Zuerich 2010(L4) 107

Tensor grid: Let ω^d := ω_1 × ... × ω_d be the equi-distant tensor grid of collocation points x_m in Ω, m ∈ M := {1, ..., n + 1}^d,

ω_ℓ := {−A + (m − 1)h : m = 1, ..., n + 1} (ℓ = 1, ..., d), h = 2A/n.

Product basis: For given piecewise constant basis functions φ_i, φ_i(x) = Π_{ℓ=1}^{d} φ_{i_ℓ}(x_ℓ), φ_{i_ℓ}(·) = φ(· + (i_ℓ − 1)h), i ∈ I := {1, ..., n}^d, related to ω^d, let

f(y) ≈ Σ_{i∈I} f_i φ_i(y), f_i = f(P_i).

The discrete collocation scheme (cost O(n^{2d})):

f ∗ g ≈ {w_m}_{m∈M}, w_m := Σ_{i∈I} f_i ∫_{R^d} φ_i(y)g(x_m − y)dy, x_m ∈ ω^d.

The collocation coefficient tensor G = {g_i} ∈ R^I (L²-projection),

g_i = ∫_{R^d} φ_i(y)g(−y)dy, i ∈ I.

Sinc quadrature for Yukawa projection-collocation B. Khoromskij, Zuerich 2010(L4) 108

Consider a class of spherically symmetric convolving kernels g : R^d → R,

g(y) = G(ρ(y)) ≡ G(ρ) with ρ ≡ ρ(y) = y_1² + ... + y_d²,

where G : R_+ → R is represented via the generalised Laplace transform

G(ρ) = ∫_{R_+} Ĝ(τ²) e^{−ρτ²} dτ. (35)

Sinc-quadrature methods apply to the collocation coefficients tensor G = [g_i]_{i∈I} via the rank-(2M + 1) canonical decomposition

g_i ≈ Σ_{k=−M}^{M} w_k Ĝ(τ_k²) Π_{ℓ=1}^{d} ∫_R e^{−y_ℓ²τ_k²} φ_{i_ℓ}(y_ℓ)dy_ℓ, i ∈ I,

with suitably chosen coefficients w_k ∈ R and quadrature points τ_k ∈ R_+.

In the particular case of the Yukawa potential for κ ∈ [0,∞), we apply the Gauss transform (cf. (35))

G(ρ) = e^{−κ√ρ}/√ρ = (2/√π) ∫_{R_+} exp(−ρτ² − κ²/τ²)dτ, (36)

corresponding to the choice Ĝ(τ²) = (2/√π) e^{−κ²/τ²}.


Error bound for Sinc quadrature approximation B. Khoromskij, Zuerich 2010(L4) 109

Thm. 4.2. [3] For given G(ρ) in (36) with fixed κ > 0, we set

w_k = h_M τ_k and τ_k = e^{t_k}, t_k = k h_M,

with h_M = C_0 log(M)/M for some C_0 > 0. Then for the rank-(2M + 1) collocation coefficients tensor G = [g_i]_{i∈I}, we have

‖g_i − Σ_{k=−M}^{M} w_k Ĝ(τ_k²) Π_{ℓ=1}^{d} ∫_R e^{−y_ℓ²τ_k²} φ_{i_ℓ}(y_ℓ)dy_ℓ‖ ≤ C e^{−π²M/(C + log(M))}. (37)

Sketch of proof. Choose the analyticity domain for the integrand in (36) as a sector S_δ := {w ∈ C : |arg(w)| < δ} with apex angle 0 < 2δ < π/2, and then use the conformal map

ϕ^{−1} : S_δ → D_δ with w = ϕ(z) = e^z, ϕ^{−1}(w) = log(w).

Applying the change of variables τ = e^t leads to

G(ρ) = ∫_R f(t; ρ)dt with f(t; ρ) = Q(t)e^{−ρe^{2t}}, Q(t) = (2/√π) e^{t − κ²e^{−2t}}.

Error bound for Sinc quadrature approximation B. Khoromskij, Zuerich 2010(L4) 110

f can be analytically extended into the strip D_δ. By definition,

g_i = 〈G(ρ), φ_i〉 = ∫_R 〈f(t; ρ), φ_i〉dt ≡ ∫_R p_i(t)dt, p_i(t) = Q(t) Π_{ℓ=1}^{d} ∫_R e^{−y_ℓ²e^{2t}} φ_{i_ℓ} dy_ℓ.

The rank-1 (separable in i) function p_i : R → R can be analytically extended into the strip D_δ with 0 < δ < π/4, and this extension belongs to the Hardy space H^1(D_δ). In fact, using the error function erf : R → R,

erf(t) := (2/√π) ∫_0^t e^{−τ²} dτ,

we calculate the explicit representation

∫_R e^{−y²e^{2t}} φ_i(y)dy = (√π/(2e^t)) {erf(e^t ih) − erf(e^t(i − 1)h)}, (38)

with h = 2A/n (uniform grid spacing) for i = 1, ..., n. Since erf(z)/z is an entire function, one proves the required analyticity of p_i.

To estimate N(p_i, D_δ), we let H_i = h(i_1 − 1, ..., i_d − 1)^T ∈ R^d to obtain

∫_{R^d} e^{−w²|y|²} φ(y + H_i)dy = ∫_{R^d} e^{−w²|v−H_i|²} φ(v)dv,

taking into account that φ has compact support in [−h, h]^d.


Error bound for Sinc quadrature approximation B. Khoromskij, Zuerich 2010(L4) 111

Notice that |Q(ζ exp(iδ))| ≤ C_0 < ∞ for ζ ∈ [0,∞), leading to the bound

N(p_i, D_δ) = ∫_{∂S_δ} |p_i(w)| |dw|

= ∫_{∂S_δ} |Q(w)| |∫_{R^d} e^{−w²|y|²} φ(y + H_i)dy| |dw|

≤ 2 ∫_{R_+} |Q(ζe^{iδ})| |∫_{R^d} e^{−ζ² exp(2iδ)|u−H_i|²} φ(u)du| dζ

≤ 2C_0 ∫_{R^d} ∫_{R_+} |e^{−ζ² exp(2iδ)|u−H_i|²}| dζ |φ(u)|du

= 2C_0 ∫_{R^d} ∫_{R_+} e^{−ζ² cos(2δ)|u−H_i|²} dζ |φ(u)|du

≤ (2C_0/√cos(2δ)) ∫_{R^d} |φ(u)|/|u − H_i| du.

The latter term is uniformly bounded in H_i.

Error bound for Sinc quadrature approximation B. Khoromskij, Zuerich 2010(L4) 112

The asymptotics of the integrand p_i(t) := p_i(t; κ) on the real axis,

p_i(t; κ) ≈ e^{t − κ²e^{2t}} as t → ∞, p_i(t; κ) ≈ e^{t − κ²e^{2|t|}} as t → −∞,

correspond to a = 2, b = κ², and C = 1 for the double-exponential decay. Finally, we apply Thm. 3.6, providing the exponential convergence in M of the rank-r Sinc-quadrature approximation as in (37), with r = 2M + 1,

g_i = ∫_R p_i(t; κ)dt ≈ h_M Σ_{k=−M}^{M} p_i(t_k; κ).
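To make the construction concrete, the following MATLAB sketch assembles the rank-(2M + 1) canonical factors of the Yukawa collocation tensor via (36)-(38). The parameters A, n, κ, M and the choice C_0 = 1 in h_M are illustrative assumptions, and by symmetry one factor matrix U serves all d modes:

  % Rank-(2M+1) canonical factors for the Yukawa collocation tensor, cf. (36)-(38).
  A = 5; n = 64; h = 2*A/n; kappa = 1; M = 30;
  hM  = log(M)/M;  tk = (-M:M)*hM;  tau = exp(tk);   % tau_k = e^{t_k}
  w   = hM*tau;                                      % weights w_k = h_M*tau_k
  Ghat= (2/sqrt(pi))*exp(-kappa^2./tau.^2);          % \hat G(tau_k^2), Yukawa case
  x   = -A + (0:n)'*h;                               % cell boundaries of the grid
  % (38): U(i,k) = int_{x_i}^{x_{i+1}} exp(-y^2*tau_k^2) dy, evaluated via erf
  U   = (sqrt(pi)/2)*(erf(x(2:end)*tau) - erf(x(1:end-1)*tau))./tau;
  % The d-dimensional coefficient tensor is then the canonical sum
  %   G(i1,...,id) = sum_k w(k)*Ghat(k)*U(i1,k)*...*U(id,k),
  % stored by the single n x (2M+1) factor U and the scalars w.*Ghat.
  % Sanity check of (38) for one (i,k) against brute-force quadrature:
  i = 40; k = M+5;
  q = integral(@(y) exp(-y.^2*tau(k)^2), x(i), x(i+1));
  fprintf('entry check: %.3e\n', abs(q - U(i,k)));

Storage of the full tensor is thus reduced from n^d to O(dn M) numbers.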

Rem. Sinc quadrature approximation of the Galerkin tensor for Newton

kernel is analysed in [2].

Ex. 4.6. Laplace transform for the Slater function:

G(ρ) = e^{−2√(αρ)} = (√α/√π) ∫_{R_+} τ^{−3/2} exp(−α/τ − ρτ)dτ.


Numerics for tensor-product convolution B. Khoromskij, Zuerich 2010(L4) 113

Ex. 4.7. n = 200: Naive collocation - n^6 ∼ 2 · 10^12; FFT3 - n^3 log n ∼ 10^7; canonical-canonical tensor format - 3nR_1R_2.

The next table shows the advantage of the fast tensor-product convolution method compared with those based on FFT3.

CPU time (sec.) for high accuracy computation of the Hartree potential for the H2O molecule, calculated in MATLAB on a Sun Fire X4600 computer with a 2.6 GHz processor, [4]. The CPU time for the FFT3 scheme with n ≥ 1024 is obtained by extrapolation.

n^3      128^3  256^3  512^3   1024^3   2048^3  4096^3  8192^3  16384^3
FFT3     4.3    55.4   582.8   ~6000    -       -       -       ~2 years
C ∗ C    1.0    3.1    5.3     21.9     43.7    127.1   368.6   700.2

Literature to Lect. 4 B. Khoromskij, Zuerich 2010(L4) 114

1. W. Hackbusch and B.N. Khoromskij: Low-rank Kronecker product approximation to multi-dimensional nonlocal operators. Parts I/II. Computing 76 (2006), 177-202/203-225.

2. B.N. Khoromskij: Structured Rank-(r1, ..., rd) Decomposition of Function-related Tensors in Rd. Comp. Meth. in Appl. Math., V. 6 (2006), 194-220.

3. B.N. Khoromskij: On Tensor Approximation of Green Iterations for Kohn-Sham Equations. Computing and Visualization in Sci., 11 (2008), 259-271.

4. B.N. Khoromskij and V. Khoromskaia: Low-rank Tucker Tensor Approximation to Classical Potentials. Central European J. of Math., 5(3) 2007, 1-28.

http://personal-homepages.mis.mpg.de/bokh


Introduction to Tensor Numerical Methods II B. Khoromskij, Zuerich 2010(L5) 115

Everything is more simple than one thinks

but at the same time more complex than one can understand.

J.W. von Goethe (1749-1832)

Introduction to Tensor Numerical Methods in Scientific Computing

(Part II. Tensor Approximation and Multilinear Algebra)

Boris N. Khoromskij

http://personal-homepages.mis.mpg.de/bokh

University/ETH Zuerich, Pro∗Doc Program, WS 2010

Part II: Outlook B. Khoromskij, Zuerich 2010(L5) 116

Part II (Lect. 5-11). Basic Tensor formats. Algebraic Methods of

Tensor Approximation and Multilinear Algebra (MLA).

Motivating examples of modern applications. Low rank and H-matrices.

Truncated SVD, ACA. FFT and circulant convolution.

Canonical, Tucker and mixed tensor formats. On Strassen algorithm.

Unfolding of a tensor and contracted product. Basic MLA operations

on rank structured tensors. High order SVD (HOSVD), quasioptimality.

Multilinear Kronecker product of matrices, basic properties.

Approximation by rank-1 tensors.

Tucker/canonical approximation by ALS iteration. Multigrid accelerated

tensor approximation. Reduced HOSVD and fast canonical-to-Tucker

transform.

Tensor representation of matrix-valued functions.

Truncated iteration.

Tensor convolution revisited and other bilinear operations.


Lect. 5. From low to higher dimensions B. Khoromskij, Zuerich 2010(L5) 117

Outline of Lecture 5.

1. Wide range applications in Rd.

2. d = 2: Main properties of rank-R matrices. Approximation

by low rank matrices.

3. Truncated SVD, reduced SVD, and adaptive cross

approximation (ACA).

4. H-matrices in dimension ≤ 3: advantages and limitations.

5. FFT, FFTd, and circulant convolution.

6. A paradigm of super-computing (does not relax the curse

of dimension).

Problem classes in Rd B. Khoromskij, Zuerich 2010(L5) 118

Elliptic (parameter-dependent) eq.: Find u ∈ H^1_0(Ω), s.t.

Hu := −div(A grad u) + V u = F in Ω ⊂ R^d.

EVP: Find a pair (λ, u) ∈ R × H^1_0(Ω), s.t. 〈u, u〉 = 1, and

Hu = λu in Ω ⊂ R^d, u = 0 on ∂Ω.

Parabolic equations: Find u : R^d × (0,∞) → R, s.t.

u(x, 0) ∈ H²(R^d): σ ∂u/∂t + Hu = 0, H = ∆_d + V(x_1, ..., x_d).

Specific features:

⊲ High spatial dimension: Ω = (−b, b)^d ⊂ R^d (d = 2, 3, ..., 100, ...).

⊲ Multiparametric eq.: A(y, x), u(y, x), y ∈ R^M (M = 1, 2, ..., 100, ..., ∞).

⊲ Nonlinear, nonlocal (integral) operator V = V(x, u), singular potentials.


General examples B. Khoromskij, Zuerich 2010(L5) 119

• Fast Poisson solver, preconditioning ⇒ (−∆ + I)−1.

• Convolution transform in Rd with Green’s function for

d-Laplacian (d ≥ 3),

f(x) = ∫_{R^d} ρ(y)/‖x − y‖^{d−2} dy, x ∈ R^d.

O(dn logn)-algorithms and numerics in electronic

structure calculations.

• Parabolic equations (heat transfer, molecular dynamics)

∂u/∂t + Au = f ⇒ exp(−tA), Cayley transform (I + A)/(I − A).

• Linear algebra, complexity theory (Strassen’s algorithm

by tensor decomposition).

• Matrix product states (MPS) and DMRG-type methods

for slightly entangled systems (electronic structure,

molecular dynamics).

Many-particle models B. Khoromskij, Zuerich 2010(L5) 120

Objectives in Many-particle models.

• The electronic Schrödinger eq. for a many-particle system in R^d,

HΨ = ΛΨ

with the Hamiltonian H = H[r_1, ..., r_{Ne}],

H := −(1/2) Σ_{i=1}^{Ne} ∆_i − Σ_{a=1}^{K} Σ_{i=1}^{Ne} Z_a/|r_i − R_a| + Σ_{i<j≤Ne} 1/|r_i − r_j| + Σ_{a<b≤K} Z_a Z_b/|R_a − R_b|,

where Z_a, R_a are the charges and positions of the nuclei, r_i ∈ R³.

Hence the problem is posed in Rd with high dimension d = 3Ne,

where Ne is the (large) number of electrons.

Desired size of the system is Ne = O(10^q), q = 1, 2, 3, 4, ...?

Proteins: q = 3, 4.

Molecular dynamics, electronic structure calculation for small

molecules: q = 1, 2.


Many-particle models B. Khoromskij, Zuerich 2010(L5) 121

• Hartree-Fock equation

[−(1/2)∆ − V_c(x) + ∫_{R³} ρ(y, y)/‖x − y‖ dy] φ(x) − (1/2) ∫_{R³} ρ(x, y)/‖x − y‖ φ(y)dy = λφ(x),

ρ(x, y) = Σ_{i=1}^{Ne/2} φ_i(x)φ_i(y) - the electron density matrix,

e^{−µ‖x‖} - density function for the hydrogen atom, 1/‖x‖ - Newton potential,

V_c - external potential with singularities at the centers of atoms.

Tensor approximation scheme and numerics in Lect. 10-11.

• Kohn-Sham equation (simplified Hartree-Fock eq.)

[−(1/2)∆ − V_c(x) + ∫_{R³} ρ(y)/‖x − y‖ dy − αV_ρ(x)] ψ = λψ, V_ρ(x) = {(3/π)ρ(x)}^{1/3}.

• Poisson-Boltzmann eq. (the electrostatic potential of proteins)

∇ · [ε(x)∇φ(x)] − ε(x)h(x)² sinh[φ(x)] + 4πρ(x)/kT = 0, x ∈ R³.

If ε(x) = ε_0, h(x) = h, ρ(x) = δ(x), then φ(x) = e^{−h‖x‖}/‖x‖.

Parametric Elliptic Problems: Stochastic PDEs B. Khoromskij, Zuerich 2010(L5) 122

Find u_M ∈ L²(Γ) × H^1_0(D) s.t.

A u_M(y, x) = f(x) in D, ∀y ∈ Γ,

u_M(y, x) = 0 on ∂D, ∀y ∈ Γ,

A := −div(a_M(y, x) grad), f ∈ L²(D), D ⊂ R^d, d = 1, 2, 3,

where a_M(y, x) is smooth in x ∈ D, y = (y_1, ..., y_M) ∈ Γ := [−1, 1]^M, M ≤ ∞.

Additive case (via the truncated Karhunen-Loève expansion):

a_M(y, x) := a_0(x) + Σ_{m=1}^{M} a_m(x)y_m, a_m ∈ L^∞(D), M → ∞.

Log-additive case:

a_M(y, x) := exp(a_0(x) + Σ_{m=1}^{M} a_m(x)y_m) > 0.

Computing the truncated Karhunen-Loève expansion.

Analysis of best N-term approximations.

Tensor representation of stochastic-Galerkin and collocation matrices.

Tensor truncated preconditioned iteration.


“Low dimensional” methods as building blocks B. Khoromskij, Zuerich 2010(L5) 123

In low dimensions (d = 1, 2, 3) the goal is O(N)-methods.

Main principles: making use of hierarchical structures,

low-rank pattern and recursive algorithms.

Basic numerical methods for d = 1, 2, 3:

Finite element/Finite difference methods (FE/FD).

Classical Fourier (1768-1830) methods: FFT in O(N logN) op.,

FFT-based circulant convolution, Toeplitz, Hankel matrices.

Multigrid principle: O(N) elliptic problem solvers.

Spectrally equivalent preconditioners.

Numerical linear algebra.

Low rank matrix approximation. SVD-based algorithms.

Matrix SVD B. Khoromskij, Zuerich 2010(L5) 124

Lem. 5.1. (matrix SVD). Every real (complex) τ × σ matrix M can be represented as the product

M = U^{(1)} · S · U^{(2)T} = S ×_1 U^{(1)} ×_2 U^{(2)},

in which

1. U^{(1)} = [U^{(1)}_1 U^{(1)}_2 ... U^{(1)}_{I_1}] is a unitary τ × τ matrix,

2. U^{(2)} = [U^{(2)}_1 U^{(2)}_2 ... U^{(2)}_{I_2}] is a unitary σ × σ matrix,

3. S is a τ × σ matrix (core tensor) with the properties of

(i) pseudodiagonality: S = diag{σ_1, σ_2, ..., σ_{min(τ,σ)}},

(ii) ordering: σ_1 ≥ σ_2 ≥ ... ≥ σ_{min(τ,σ)} ≥ 0.

The σ_i are the singular values of M, and the vectors U^{(1)}_i and U^{(2)}_i are, resp., the ith left and ith right singular vectors.


Low rank matrices B. Khoromskij, Zuerich 2010(L5) 125

The class of rank ≤ k matrices in R^{τ×σ} will be called R_k-matrices, i.e. rank(M) ≤ k for M ∈ R_k.

Each M ∈ R_k can be represented in the form

M = A · B^T, A ∈ R^{τ×k}, B ∈ R^{σ×k}. (39)

Lem. 5.2. Attractive features of Rk-matrices:

1. The set Rk is closed (nontrivial result in linear algebra).

2. Only k(τ + σ) numbers are required to store an Rk-matrix.

3. The matrix-vector multiplication x ↦ y := Mx, x ∈ R^σ, can be done in two steps:

y′ := B^T x ∈ R^k, and y := Ay′ ∈ R^τ.

The corresponding cost is 2k(σ + τ).

Low rank matrices B. Khoromskij, Zuerich 2010(L5) 126

4. The sum of two R_k-matrices R_1 = A_1B_1^T, R_2 = A_2B_2^T is an R_{2k}-matrix,

R_1 + R_2 = [A_1|A_2][B_1|B_2]^T, [A_1|A_2] ∈ R^{τ×2k}, [B_1|B_2] ∈ R^{σ×2k}.

5. The multiplication of R ∈ R_k by an arbitrary matrix M of the proper size gives again an R_k-matrix:

RM = A(M^T B)^T, MR = (MA)B^T.

6. The best approximation of an arbitrary matrix M ∈ R^{τ×σ} by an R_k-matrix M_k, say in the Frobenius norm, that is

‖A‖²_F := Σ_{(i,j)∈τ×σ} a²_{ij},

can be calculated by the truncated SVD (a discrete version of the Schmidt decomposition).


Truncated SVD B. Khoromskij, Zuerich 2010(L5) 127

Alg. 5.1. (Truncated SVD). For given k ∈ N, let M = UΣV^T be the SVD of M, i.e., Σ = diag{σ_1, . . ., σ_k, ..., σ_n} with σ_1 ≥ σ_2 ≥ . . . ≥ σ_n ≥ 0, and U = [U_1, ..., U_k, U_{k+1}, ..., U_n], V = [V_1, ..., V_k, V_{k+1}, ..., V_n] being unitary.

Set Σ_k := diag{σ_1, . . . , σ_k, 0, . . . , 0}; then

M_k := UΣ_kV^T ≈ M,

and

‖M_k − M‖_F ≤ √(Σ_{j=k+1}^{n} σ_j²).

The complexity of the truncated SVD: O(τσ2) with τ ≥ σ.

Too expensive for large τ and σ.

Is it possible to compute an almost best rank-k matrix approximation getting rid of the full matrix SVD? – Yes.

Reduced truncated SVD B. Khoromskij, Zuerich 2010(L5) 128

If M ∈ R_m, then its best approximation M_k ∈ R_k, k < m, can be computed by the following QR-SVD scheme.

Alg. 5.2. (Reduced truncated SVD). Given M = AB^T ∈ R_m,

(i) Calculate the QR-decompositions A = Q_A R_A and B = Q_B R_B, with the unitary matrices Q_A ∈ R^{τ×m} and Q_B ∈ R^{σ×m}, and upper triangular matrices R_A, R_B ∈ R^{m×m}.

(ii) Calculate the SVD R_A R_B^T = UΣV^T (with the cost O(m³)).

(iii) Define M_k = A_k B_k^T with A_k := Q_A U_k Σ_k ∈ R^{τ×k} and B_k := Q_B V_k ∈ R^{σ×k}, where U_k := [U_1, . . . , U_k], V_k := [V_1, . . . , V_k] (in both cases, the first k columns) and the truncated matrix Σ_k of Σ are defined by the truncated SVD of R_A R_B^T = UΣV^T.

Alg. 5.2 can be implemented in O(m²(τ + σ) + m³) operations.

Exer. 5.1. Compute the rank-r, r = 2M + 1, Sinc quadrature approximation of the Hilbert matrix A = {a_{ij}}, a_{ij} = 1/(i + j) (i, j = 1, ..., n) for n = 10³, 10⁴, and M = 64. Apply to the result the best low rank approximation via the reduced truncated SVD of Alg. 5.2 (cf. Exercise 2.2); see also the sketch below.
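A minimal MATLAB sketch of Alg. 5.2 (the function name rtsvd and the economy-QR route are our illustrative choices):

  % Reduced truncated SVD (Alg. 5.2) for M = A*B' given in low-rank form.
  function [Ak, Bk] = rtsvd(A, B, k)
    % A: tau x m, B: sigma x m, target rank k < m
    [QA, RA] = qr(A, 0);              % thin QR, cost O(tau*m^2)
    [QB, RB] = qr(B, 0);
    [U, S, V] = svd(RA*RB');          % small m x m SVD, cost O(m^3)
    Ak = QA*U(:,1:k)*S(1:k,1:k);      % tau x k
    Bk = QB*V(:,1:k);                 % sigma x k
  end

Then M_k = Ak*Bk', and norm(A*B' - Ak*Bk', 'fro') reproduces the truncated-SVD error without ever forming the SVD of the full τ × σ matrix.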


Adaptive cross approximation (ACA) B. Khoromskij, Zuerich 2010(L5) 129

In FEM/BEM applications, nearly best (suboptimal) rank-k approximation

over partial data can be computed by the heuristic method called adaptive

cross approximation (ACA), cf. [3], [6].

Many matrix decomposition algorithms can be represented as a sequence

of rank-one Wedderburn updates.

J. H. M. Wedderburn, Lectures on matrices, colloquium publications, vol. XVII, AMS, NY, 1934.

For a given m × n matrix A and vectors x, y of appropriate sizes, s.t. x^T A y ≠ 0, the matrix

B = A − (A y x^T A)/(x^T A y)

has rank(B) = rank(A) − 1. For the rank-r matrix A_0 = A, after r updates of the form

A_k = A_{k−1} − (A_{k−1} y_k x_k^T A_{k−1})/(x_k^T A_{k−1} y_k), with x_k^T A_{k−1} y_k ≠ 0,

the matrix A_r becomes zero, leading to a rank-r decomposition of A.

Adaptive cross approximation (ACA) B. Khoromskij, Zuerich 2010(L5) 130

The ACA algorithm is a special case of Wedderburn updates based on

“max-element” pivoting strategy cf. [3], [6].

The idea of the ACA algorithm is as follows.

Starting from R_0 = A ∈ R^{m×n}, find a nonzero pivot in R_k, say (i_k, j_k), and subtract a scaled outer product of the i_kth row and the j_kth column:

R_{k+1} := R_k − (1/(R_k)_{i_k j_k}) u_k v_k^T, with u_k = (R_k)_{1:m, j_k}, v_k = (R_k)_{i_k, 1:n},

where we use the notation (R_k)_{i_k,1:n} and (R_k)_{1:m,j_k} for the i_kth row and the j_kth column of R_k, respectively.

Here j_k is chosen as the maximum element in modulus of the i_kth row, i.e.,

|(R_{k−1})_{i_k j_k}| = max_{j=1,...,n} |(R_{k−1})_{i_k j}|.

The choice of i_k is similar.

The matrix S_r := Σ_{k=1}^{r} u_k v_k^T will be used as the rank-r approximation of A = S_r + R_r, since rank(S_r) ≤ r.

Apply the reduced truncated SVD to Sr for rank optimization.
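A minimal MATLAB sketch of this cross-approximation loop (for transparency it searches the full residual for the pivot, whereas practical ACA works on partial data; the relative stopping tolerance is an illustrative choice):

  % Cross approximation with full pivoting: A ~ U*V' after r rank-one updates.
  function [U, V] = aca(A, tol)
    R = A;  U = [];  V = [];
    while norm(R, 'fro') > tol*norm(A, 'fro')
      [~, idx] = max(abs(R(:)));              % pivot (i,j): max element in modulus
      [i, j]   = ind2sub(size(R), idx);
      u = R(:, j);  v = R(i, :)'/R(i, j);     % scaled cross of column j and row i
      U = [U u];  V = [V v];                  % S_r = U*V' after r steps
      R = R - u*v';                           % Wedderburn-type rank-one update
    end
  end

For example, [U, V] = aca(hilb(300), 1e-8) terminates with a small rank size(U, 2), which can then be compressed further by the reduced truncated SVD above.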


H-matrix format: brief survey B. Khoromskij, Zuerich 2010(L5) 131

H- and H2-matrix technique is a direct descendant of panel

clustering, fast multipole and mosaic-skeleton approximations.

In addition, it allows data-sparse matrix-matrix operations.

M_{H,k}(T_{I×I}, P), the class of data-sparse hierarchical H-matrices - Hackbusch, Khoromskij, Bebendorf, Börm, Grasedyck, Sauter ('99 - '05).

The construction of H-matrices defined on the product index

set I × I, is based on the following ingredients:

• An H-tree T (I) of the index set I (hierarchical cluster

tree).

• The admissible partitioning P of I × I based on a block

cluster tree T (I × I).

• Low rank approximation of all large enough blocks in P.

H-matrix format: brief survey B. Khoromskij, Zuerich 2010(L5) 132

Def. 5.1. For an admissible partitioning P and k ∈ N, define the set M_{H,k}(I × I, P) ⊂ R^{I×I} of (real) H-matrices by

M_{H,k}(I×I, P) := {M ∈ R^{I×I} : rank(M|_b) ≤ k for all b ∈ P}. (40)

M|_b = (m_{ij})_{(i,j)∈b} denotes the matrix block of M = (m_{ij})_{i,j∈I} corresponding to b ∈ P. The matrices from M_{H,k}(I × I, P) are implemented by means of the list {M|_b : b ∈ P} of matrix blocks, where each M|_b (b = τ × σ with τ, σ ∈ T(I)) is represented by the rank-k matrix

Σ_{ν=1}^{k} a_ν b_ν^⊤ with vectors a_ν ∈ R^τ, b_ν ∈ R^σ.

The number k is called the local rank.


Examples of hierarchical partitioning B. Khoromskij, Zuerich 2010(L5) 133

Hierarchical partitionings P_{1/2}(I × I) and P_W(I × I).

Figure 16: Standard (left) and weak-admissible (right) H-partitionings for d = 1.

Main properties of the H-matrix format B. Khoromskij, Zuerich 2010(L5) 134

Thm. 5.1. (complexity of the H-matrix arithmetic)

For k ∈ N and an H-tree T_{I×I} of depth L > 1, the arithmetic of N × N matrices in M_{H,k}(T_{I×I}, P) has the complexity

N_{H,store} ≤ 2C_sp k L N, N_{H·v} ≤ 4C_sp k L N,

N_{H⊕H} ≤ C_sp k² N(C_1 L + C_2 k),

N_{H⊙H} ≤ C_0 C²_sp k² L N max{k, L}, N_{Inv(H)} ≤ C N_{H⊙H},

where C_sp = C_sp(d) is the sparsity constant. Typically: C_sp(1) ≈ 3, C_sp(2) ≈ 25, C_sp(3) ≈ 150.

The H-matrix format is well suited for the representation of integral (nonlocal) operators in BEM applications (d = 2, 3). The hierarchical LU decomposition applies in FEM.

Limitations: Not applicable in high dimensions, d > 3.


Fast Fourier Transform B. Khoromskij, Zuerich 2010(L5) 135

Let S_N be the space of sequences {f[n]}_{0≤n<N} of period N. S_N is a Euclidean space with the scalar product

〈f, g〉 = Σ_{n=0}^{N−1} f[n]g*[n].

Thm. 5.2. The family {e_k[n] = exp(2iπkn/N)}_{0≤k<N} is an orthogonal basis of S_N with ||e_k||² = N. Any f ∈ S_N can be represented by

f = Σ_{k=0}^{N−1} (〈f, e_k〉/||e_k||²) e_k. (41)

Def. 5.2. The discrete Fourier transform (DFT) of f is

f̂[k] := 〈f, e_k〉 = Σ_{n=0}^{N−1} f[n] exp(−2iπkn/N) (N² complex multiplications).

Due to (41), an inverse DFT is given by

f[n] := (1/N) Σ_{k=0}^{N−1} f̂[k] exp(2iπkn/N).

FFT: Matrix representation B. Khoromskij, Zuerich(L5) 136

The FT matrix F_N = {f_{k,n}}_{k,n=1}^{N} is given by

f_{k,n} := exp(−2iπkn/N) = W^{−nk}, W = e^{2iπ/N}.

The DFT(N) can be calculated by the Fast Fourier Transform (FFT) in N_{FFT}(N) = C_F N log_2 N operations, C_F ≈ 4.

The FFT traces back (1805) to Gauss (1777-1855); the first computer program is due to Cooley/Tukey (1965).

The inverse FFT of f can be derived from the forward FFT of its complex conjugate f* due to

f*[n] := (1/N) Σ_{k=0}^{N−1} f̂*[k] exp(−2iπkn/N).


Discrete convolution B. Khoromskij, Zuerich(L5) 137

Let g be the discrete convolution of two signals f, h supported only on the indices 0 ≤ n ≤ M − 1,

g[n] = (f ∗ h)[n] = Σ_{k=−∞}^{∞} f[k]h[n − k].

The naive implementation requires M(M + 1) operations. It can be represented as a matrix-by-vector product (MVP) with the Toeplitz matrix

T = {h[n − k]}_{0≤n,k<M} ∈ R^{M×M}, g = Tf.

Extending f and h over 2M samples by

h[M] = 0, h[2M − i] = h[i], i = 1, ..., M − 1,

f[n] = 0, n = M, ..., 2M − 1,

we reduce the problem to the MVP with a circulant matrix C ∈ R^{2M×2M} specified by its first row h ∈ R^{2M}.

Circulant convolution by FFT B. Khoromskij, Zuerich(L5) 138

An n × n Toeplitz matrix C is called circulant if it has the form

C = circ{c_1, . . . , c_n} :=
⎡ c_1   c_2   . . .  c_n     ⎤
⎢ c_n   c_1   . . .  c_{n−1} ⎥
⎢  ⋮            ⋱     ⋮      ⎥
⎣ c_2   . . .  c_n   c_1     ⎦,   c_i ∈ C.

The set of all n× n circulant matrices is closed with respect

to addition and multiplication by a constant.

Any circulant matrix C is associated with the polynomial

pc(z) := c1 + c2z + . . .+ cnzn−1, z ∈ C.


Circulant convolution by FFT B. Khoromskij, Zuerich(L5) 139

The matrix C has a diagonal representation in the Fourier basis,

C = F_n^T Λ_c F_n

with

Λ_c = diag{p_c(1), . . . , p_c(ω^{n−1})}, ω = e^{2iπ/n}.

The eigenvector corresponding to the eigenvalue p_c(ω^{j−1}) is given by the jth column of F_n, i.e.,

ω⃗_j = (1/√n) {ω^{(k−1)(j−1)}}_{k=1}^{n}.

The matrix-vector product with C costs 2CFn log2 n+O(n) op.

Multi-dimensional FFT can be performed by tensorization

process with the linear-logarithmic cost O(N log2N), N = nd.
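A sketch of the FFT-based Toeplitz-times-vector product via the circulant embedding above (signal length and data are illustrative):

  % Discrete convolution of f and h (each of length M) via circulant FFT.
  M = 1000;
  f = randn(M,1);  h = randn(M,1);
  c = [h; 0; zeros(M-1,1)];          % first column of the 2M x 2M circulant
  fp = [f; zeros(M,1)];              % zero-padded input
  g  = ifft(fft(c).*fft(fp));        % C*fp in O(M log M) operations
  g  = real(g(1:M));                 % the Toeplitz part T*f
  % Check against direct convolution:
  gd = conv(f, h);  gd = gd(1:M);
  fprintf('max deviation: %.2e\n', max(abs(g - gd)));

The deviation is at the level of rounding errors, while the cost drops from O(M²) to O(M log M).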

Huge problems: tensor methods beat super-computers B. Khoromskij, Zuerich 2010(L5) 140

⊲ The algebraic operations on high-dimensional data require heavy computing.

⊲ Linear cost O(N), N = n^d, is satisfactory only for small d.

⊲ Traditional “asymptotically optimal” methods suffer from the “curse of dimensionality”.

⊲ Complexity of matrix operations in full arithmetic: O(N^3). It is large already for d = 3, i.e., N = n^3 ⇒ N^3 = n^9.

⊲ A paradigm of up-to-date numerical simulations: higher computer capacities do not relax the curse of dimensionality.

⊲ Remedy: The identification and efficient use of low rank tensor structured representations with linear scaling in d.


Literature to Lecture 5 B. Khoromskij, Zuerich 2010(L5) 141

1. G.H. Golub and C.F. Van Loan: Matrix computations. 3rd ed., The Johns Hopkins University Press,

Baltimore, 1996.

2. W. Hackbusch: Hierarchische Matrizen - Algorithmen und Analysis. Springer, 2009.

3. M. Bebendorf: Hierarchical Matrices. Springer, 2008.

4. W. Hackbusch and B.N. Khoromskij: A Sparse H-matrix Arithmetic. Part II: Application to

Multi-Dimensional Problems. Computing 64 (2000), 21-47.

5. W. Hackbusch, B.N. Khoromskij and S. Sauter: On H2-Matrices. In: Lectures on Appl. Math.

(H.-J. Bungartz et al. eds.) Springer, Berlin, 2000, 9-30.

6. B.N. Khoromskij: Data-Sparse Approximation of Integral Operators. Lecture notes 17, MPI MIS,

Leipzig 2003, 1-61.

7. E. Tyrtyshnikov: Incomplete cross approximation in the mosaic-skeleton method.

Computing 64 (2000), 367-380.

http://personal-homepages.mis.mpg.de/bokh

Lect. 6. Basic rank structured tensor formats B. Khoromskij, Zurich 2010(L6) 142

Outline of Lecture 6.

0. FFT and circulant convolution.

1. Tensor product of finite dimensional Hilbert spaces

(multidimensional vectors).

2. Matrix unfolding and contracted product of tensors.

3. Tensor rank and canonical representation.

4. Rank decomposition can be useful in linear algebra:

O(n^{log2 7})-complexity Strassen algorithm of matrix multiplication.

5. Orthogonal Tucker and mixed Tucker-canonical models.

6. Linear and multilinear operations on “formatted tensors”.

7. Toward best (nonlinear) approximation in basic tensor

formats.


Tensor product of finite dimensional Hilbert spaces B. Khoromskij, Zurich 2010(L6) 143

Let H = H1 ⊗ ...⊗Hd be a tensor prod. Hilbert space (TPHS).

Hℓ is a real Euclidean space of vectors,

H_ℓ = R^{n_ℓ}, n_ℓ ∈ N, n_ℓ := dim H_ℓ, ℓ = 1, ..., d.

The scalar product of rank-1 elements W, V ∈ H is given by

〈W, V〉 = 〈w^{(1)} ⊗ . . . ⊗ w^{(d)}, v^{(1)} ⊗ . . . ⊗ v^{(d)}〉 = Π_{ℓ=1}^{d} 〈w^{(ℓ)}, v^{(ℓ)}〉_{H_ℓ}, (42)

W(i_1, ..., i_d) = Π_{ℓ=1}^{d} w^{(ℓ)}(i_ℓ), Stor(W) = n_1 + ... + n_d ≪ Π_{ℓ=1}^{d} n_ℓ.

Choose a basis {φ^{(ℓ)}_k : 1 ≤ k ≤ n_ℓ} of H_ℓ; then the set {φ^{(1)}_{k_1} ⊗ φ^{(2)}_{k_2} ⊗ . . . ⊗ φ^{(d)}_{k_d}} (1 ≤ k_ℓ ≤ n_ℓ, 1 ≤ ℓ ≤ d) is a basis of H.

Denote the d-fold tensor product H = H ⊗ ... ⊗ H by H^{⊗d} (= R^{I_d}).

Tensor product of finite dimensional Hilbert spaces B. Khoromskij, Zurich 2010(L6) 144

Rem. 6.1. A dth order tensor A ∈ H of size n = (n_1, ..., n_d) is a function of d discrete arguments (a multi-dimensional array/vector over I := I_1 × ... × I_d, I_ℓ = {1, ..., n_ℓ}), i.e.,

A : I_1 × ... × I_d → R, with dim(H) = |n| = n_1 · · · n_d.

Notation for the coordinate representation of A:

A := [a_{i_1...i_d}] = [A(i_1, ..., i_d)] ∈ R^I.

The Euclidean scalar product of tensors A, B ∈ H becomes

〈A, B〉 := Σ_{(i_1,...,i_d)∈I} a_{i_1...i_d} b_{i_1...i_d},

inducing the Euclidean (Frobenius) norm ‖A‖_F := √〈A, A〉. The dimension directions ℓ = 1, ..., d are called the modes. A tensor is a union of ℓ-mode fibers, A(i_1, ..., i_{ℓ−1}, : , i_{ℓ+1}, ..., i_d).


Vectorization of a tensor B. Khoromskij Zurich 2010(L6) 145

For a matrix A ∈ R^{m×n} we use the vector representation (vectorization or concatenation) A → vec(A) ∈ R^{mn}, where vec(A) is an mn × 1 vector obtained by “stacking” A's columns (the FORTRAN-style ordering),

vec(A) := [a_{11}, ..., a_{m1}, a_{12}, ..., a_{mn}]^T.

In this way, vec(A) is a rearranged version of A.

Def. 6.1. In general, if A ∈ R^{I_1×...×I_d} is a tensor, then the vectorization of A is recursively defined by

vec(A) = [vec([A(i_1, ..., i_{d−1}, 1)]); vec([A(i_1, ..., i_{d−1}, 2)]); . . . ; vec([A(i_1, ..., i_{d−1}, n_d)])] ∈ R^{|n|×1}.

The tensor element A(i_1, ..., i_d) maps to the vector entry (j, 1), where

j = 1 + Σ_{k=1}^{d} (i_k − 1) Π_{ℓ=1}^{k−1} n_ℓ.
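This linear index coincides with MATLAB's column-major ordering, which the following sketch verifies on a random tensor (mode sizes chosen arbitrarily):

  % Check of Def. 6.1: vec(A) follows MATLAB's column-major (FORTRAN) ordering.
  n = [3 4 2 5];                          % mode sizes n_1..n_4 (arbitrary)
  A = randn(n);
  v = A(:);                               % vec(A)
  i = [2 3 1 4];                          % a sample multi-index (i_1,...,i_d)
  j = 1 + sum((i - 1).*cumprod([1 n(1:end-1)]));  % j = 1 + sum (i_k-1) prod n_l
  fprintf('A(i) = %.4f,  vec(A)(j) = %.4f\n', A(2,3,1,4), v(j));

Both printed numbers agree, so reshaping costs nothing but reindexing.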

Matrix unfolding of a tensor B. Khoromskij Zurich 2010(L6) 146

Unfolding of a tensor into a matrix (matricization) is a way to map a high order tensor into two-fold arrays by rearranging (reshaping) it for some ℓ ∈ {1, ..., d}, R^I → R^{I_ℓ×I_{(−ℓ)}}, and then vectorizing the tensors in R^{I_{(−ℓ)}} for each i_ℓ ∈ I_ℓ. The single-hole index set is defined by I_{(−ℓ)} := I_1 × ... × I_{ℓ−1} × I_{ℓ+1} × ... × I_d.

Def. 6.2. The unfolding mat(A) of a tensor A ∈ R^{I_1×...×I_d} w.r.t. the index ℓ (along mode ℓ) is defined by the matrix mat(A) := A_{(ℓ)} of dimension n_ℓ × n̄_ℓ, so that the tensor element A(i_1, ..., i_d) maps to the matrix element v(i_ℓ, j), i_ℓ ∈ I_ℓ, where

A_{(ℓ)} = [v_{i_ℓ j}], with j ∈ {1, . . . , n̄_ℓ}, n̄_ℓ = n_1 · · · n_{ℓ−1} n_{ℓ+1} · · · n_d,

j = 1 + Σ_{k=1, k≠ℓ}^{d} (i_k − 1) J_k, J_k = Π_{m=1, m≠ℓ}^{k−1} n_m.

Exer. 6.1. (mat(A) by recursion over vec(A)). Derive the representation

mat(A) = [vec([A(i_1, ..., i_{ℓ−1}, 1, i_{ℓ+1}, ..., i_d)]), ..., vec([A(i_1, ..., i_{ℓ−1}, n_ℓ, i_{ℓ+1}, ..., i_d)])]^T.


Example of matrix unfolding of a tensor B. Khoromskij Zurich 2010(L6) 147

Rem. 6.2. Kolmogorov's decomposition is a particular way of unfolding a multivariate function into a “one-dimensional” representation (a univariate function).

Ex. 6.1. Define a tensor A ∈ R3×2×3 by

a111 = a112 = a211 = −a212 = 1,

a213 = a311 = a313 = a121 = a122 = a221 = −a222 = 2,

a223 = a321 = a323 = 4, a113 = a312 = a123 = a322 = 0.

The matrix unfolding A_{(1)} is given by

A_{(1)} =
⎡ 1   1   0   2   2   0 ⎤
⎢ 1  −1   2   2  −2   4 ⎥
⎣ 2   0   2   4   0   4 ⎦.

(The columns are ordered with i_3 running fastest within each i_2-block, i.e. j = (i_2 − 1)n_3 + i_3.)
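In MATLAB this unfolding is a single permute/reshape. A sketch for Ex. 6.1 (note that the slide orders columns with i_3 running fastest, as in De Lathauwer's convention, hence the permutation to (1, 3, 2); the j-formula of Def. 6.2 gives a column permutation of the same matrix):

  % Mode-1 unfolding of the 3 x 2 x 3 tensor of Ex. 6.1.
  A = zeros(3,2,3);
  A(1,1,1)=1; A(1,1,2)=1;  A(2,1,1)=1;  A(2,1,2)=-1;
  A(2,1,3)=2; A(3,1,1)=2;  A(3,1,3)=2;  A(1,2,1)=2;
  A(1,2,2)=2; A(2,2,1)=2;  A(2,2,2)=-2; A(2,2,3)=4;
  A(3,2,1)=4; A(3,2,3)=4;                  % all other entries are zero
  A1 = reshape(permute(A, [1 3 2]), 3, []) % 3 x 6 unfolding A_(1)
  rank(A1)                                 % the 1-mode rank, cf. Prop. 6.1 below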

Visualization of matrix unfolding of a tensor B. Khoromskij Zurich 2010(L6) 148

Figure 17: Visualization of the matrix unfolding for d = 3: the SVDs of the unfoldings A_{(1)} (n_1 × n_3 n_2), A_{(2)} (n_2 × n_1 n_3) and A_{(3)} (n_3 × n_2 n_1) yield the mode ranks r_1, r_2, r_3.


ℓ-rank of a tensor. Contracted product of tensors B. Khoromskij, Zurich 2010(L6) 149

Def. 6.3. The ℓ-rank of A (ℓ = 1, ..., d), denoted by

Rℓ = rankℓ(A), is the dimension of the vector space spanned

by the ℓ-mode vectors (fibers).

The ℓ-mode fibers of A are the column vectors of the matrix

unfolding A(ℓ) (by definition).

Prop. 6.1. We have

rankℓ(A) = rank(A(ℓ)).

The major difference with the matrix case, however, is the

fact that the different ℓ-ranks of a higher-order tensor are not

necessarily the same.

An important tensor-tensor operation is the contracted

product of two tensors. In the following we use a

tensor-matrix contracted product along mode ℓ.

Contracted product of tensors B. Khoromskij Zurich 2010(L6) 150

Def. 6.4. Given V ∈ R^{I_1×...×I_d} and a matrix M ∈ R^{J_ℓ×I_ℓ}, define the mode-ℓ tensor-matrix contracted product by

U = V ×_ℓ M ∈ R^{I_1×...×I_{ℓ−1}×J_ℓ×I_{ℓ+1}×...×I_d},

where

u_{i_1,...,i_{ℓ−1},j_ℓ,i_{ℓ+1},...,i_d} = Σ_{i_ℓ=1}^{n_ℓ} v_{i_1,...,i_{ℓ−1},i_ℓ,i_{ℓ+1},...,i_d} m_{j_ℓ,i_ℓ}, j_ℓ ∈ J_ℓ.

This is a generalization of the matrix-matrix multiplication:

M_{(n,m)} ×_2 M_{(p,m)} = M_{(n,m)} M_{(p,m)}^T → M_{(n,p)}.

Figure 18: Contracted product of a third-order tensor with a matrix.


Rank-1 tensors and canonical format B. Khoromskij, Zurich 2010(L6) 151

Rem. 6.3. A dth-order tensor A has rank 1, rank(A) = 1, if it is the outer product of d vectors t^{(1)}, ..., t^{(d)}, t^{(ℓ)} ∈ R^{I_ℓ},

A = t^{(1)} ⊗ t^{(2)} ⊗ ... ⊗ t^{(d)}, a_{i_1...i_d} = t^{(1)}_{i_1} · · · t^{(d)}_{i_d}, for i_ℓ ∈ I_ℓ (ℓ = 1, ..., d).

Ex. 6.2. Let A = a_1 ⊗ a_2, B = b_1 ⊗ b_2, a_i, b_i ∈ R^n (d = 2). Then

〈A, B〉 = 〈a_1, b_1〉〈a_2, b_2〉, ||A||_F = √(〈a_1, a_1〉〈a_2, a_2〉).

Def. 6.5. (Canonical format). Choose the subset of those elements which require only R terms. They form the set

C_R = {w ∈ H : w = Σ_{k=1}^{R} w^{(1)}_k ⊗ w^{(2)}_k ⊗ . . . ⊗ w^{(d)}_k, w^{(ℓ)}_k ∈ H_ℓ}.

Elements w ∈ C_R with w ∉ C_{R−1} are said to have tensor rank R.

Pro and contra for canonical format B. Khoromskij, Zurich 2010(L6) 152

Tensors w ∈ C_R can be represented by the description of Rd elements w^{(ℓ)}_k ∈ H_ℓ, i.e., with cost dRn, linear in d.

Advantages: Tremendous reduction of storage cost, removing d from the exponent, n^d → dRn; analytic methods of low-rank approximation for Green kernels.

Limitations: CR is a nonclosed set. Approximation process in

CR is not robust.

Visualization of the canonical model for d = 3: A = b_1 V^{(1)}_1 ⊗ V^{(2)}_1 ⊗ V^{(3)}_1 + . . . + b_r V^{(1)}_r ⊗ V^{(2)}_r ⊗ V^{(3)}_r.


Strassen algorithm via rank decomposition B. Khoromskij, Zurich 2010(L6) 153

Finding the tensor rank can be a useful concept even in classical linear algebra.

Historical remarks on the Strassen algorithm of fast matrix-matrix multiplication of complexity O(n^{log2 7}):

An O(n^{2+ε}) algorithm to multiply two n × n matrices gives an O(n^{2+ε}) method for solving a system of n linear eqs. [Strassen 1969].

Best known result: O(n^{2.376}) [Coppersmith-Winograd 1987].

Lloyd N. Trefethen bets Peter Alfeld (25 June 1985) that a method will have been found to solve Ax = b in O(n^{2+ε}) operations for any ε > 0 (numerical stability is not an issue).

Details at the personal homepage of Prof. L.N. Trefethen (Univ. Oxford).

Strassen algorithm via rank decomposition B. Khoromskij, Zurich 2010(L6) 154

In the block form

[C_1 C_2; C_3 C_4] = [A_1 A_2; A_3 A_4] · [B_1 B_2; B_3 B_4]

with

C_k = Σ_{i=1}^{4} Σ_{j=1}^{4} γ_{ijk} A_i B_j, k = 1, ..., 4,

where the 3rd-order coefficient tensor γ of size 4 × 4 × 4 has the slices (k = 1, ..., 4)

γ_{ij1} = [1 0 0 0; 0 0 1 0; 0 0 0 0; 0 0 0 0], γ_{ij2} = [0 1 0 0; 0 0 0 1; 0 0 0 0; 0 0 0 0],

γ_{ij3} = [0 0 0 0; 0 0 0 0; 1 0 0 0; 0 0 1 0], γ_{ij4} = [0 0 0 0; 0 0 0 0; 0 1 0 0; 0 0 0 1].


Strassen algorithm via rank decomposition B. Khoromskij, Zurich 2010(L6) 155

Suppose that we have the rank-R expansion

γ_{ijk} = Σ_{t=1}^{R} u_{it} v_{jt} w_{kt}.

Then

C_k = Σ_{t=1}^{R} w_{kt} Σ_{i=1}^{4} Σ_{j=1}^{4} u_{it} A_i v_{jt} B_j = Σ_{t=1}^{R} w_{kt} (Σ_{i=1}^{4} u_{it} A_i)(Σ_{j=1}^{4} v_{jt} B_j).

Precompute Σ_t = Σ_{i=1}^{4} u_{it} A_i, ∆_t = Σ_{j=1}^{4} v_{jt} B_j and reduce the initial task to R matrix-matrix products of size n/2 × n/2.

We have R ≤ 8, but there are representations (infinitely many) of rank 7

(Strassen’s result).

Open problem: Is it possible to construct rank decompositions with

R < 7? If yes, then the Strassen result can be improved.

Exer. 6.2. Try to compute the canonical rank-7 decomposition of γ by

the Tensor Toolbox.
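For reference, one level of the resulting recursion with Strassen's classical rank-7 factors can be sketched in MATLAB as follows (for even n; the seven products P_1, ..., P_7 play the role of the precomputed Σ_t, ∆_t above):

  % One level of Strassen multiplication C = A*B (n even), 7 half-size products.
  function C = strassen1(A, B)
    n = size(A,1); m = n/2;
    i1 = 1:m; i2 = m+1:n;
    A1 = A(i1,i1); A2 = A(i1,i2); A3 = A(i2,i1); A4 = A(i2,i2);
    B1 = B(i1,i1); B2 = B(i1,i2); B3 = B(i2,i1); B4 = B(i2,i2);
    P1 = (A1 + A4)*(B1 + B4);   P2 = (A3 + A4)*B1;
    P3 = A1*(B2 - B4);          P4 = A4*(B3 - B1);
    P5 = (A1 + A2)*B4;          P6 = (A3 - A1)*(B1 + B2);
    P7 = (A2 - A4)*(B3 + B4);
    C  = [P1 + P4 - P5 + P7,  P3 + P5;
          P2 + P4,            P1 - P2 + P3 + P6];
  end

Calling strassen1 recursively on the seven half-size products yields the O(n^{log2 7}) complexity.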

Orthogonal separable representation B. Khoromskij, Zurich 2010(L6) 156

As in the Galerkin method, the replacement of H_ℓ by subspaces V_ℓ ⊂ H_ℓ (1 ≤ ℓ ≤ d) leads to the tensor subspace

V = V_1 ⊗ V_2 ⊗ . . . ⊗ V_d ⊂ H.

Setting r_ℓ := dim V_ℓ and choosing an orthonormal basis {φ^{(ℓ)}_k : 1 ≤ k ≤ r_ℓ} of V_ℓ, we can represent each v ∈ V by

v = Σ_k b_k φ^{(1)}_{k_1} ⊗ φ^{(2)}_{k_2} ⊗ . . . ⊗ φ^{(d)}_{k_d}, with [b_k] ∈ R^{J_1×...×J_d},

and with the multi-index k = (k_1, . . . , k_d), 1 ≤ k_ℓ ≤ r_ℓ, where J_ℓ := {1, ..., r_ℓ} (1 ≤ ℓ ≤ d).

Let r = (r_1, . . . , r_d) ∈ N^d be a d-tuple of dimensions.

Exer. 6.3. Show that the maximal canonical rank in V is R = (Π_{ℓ=1}^{d} r_ℓ)/max_ℓ r_ℓ.


Orthogonal rank-r representation (Tucker format) B. Khoromskij, Zurich 2010(L6) 157

Def. 6.6. (Tucker format) Given r, define

T_r := {v ∈ V ⊂ H for some V_ℓ with dim V_ℓ = r_ℓ, ℓ = 1, ..., d}.

A representation of w ∈ T_r is called a Tucker format of rank r (cf. [1], [3], [4]).

Denote by U^{(ℓ)} = [φ^{(ℓ)}_1, ..., φ^{(ℓ)}_{r_ℓ}] ∈ R^{n_ℓ×r_ℓ} the ℓ-mode side matrix.

Def. 6.7. We say that U^{(ℓ)} ∈ S_{r_ℓ}, where S_{r_ℓ} is the Stiefel manifold of the orthogonal n_ℓ × r_ℓ matrices.

The Tucker representation is not unique (rotation of U^{(ℓ)}).

For ease of presentation, let us set n = n_ℓ (ℓ = 1, ..., d).

Storage of w ∈ T_r: Π_{ℓ=1}^{d} r_ℓ reals and the sampling of Σ_{ℓ=1}^{d} r_ℓ vectors φ^{(ℓ)}_k ∈ R^n, i.e., O(r^d + drn), r = max r_ℓ (curse of dimension).

Orthogonal rank-r representation (Tucker format) B. Khoromskij, Zurich 2010(L6) 158

Remark to Def. 6.6. Using the (orthogonal) side-matrices

U^{(ℓ)} = [φ^{(ℓ)}_1 ... φ^{(ℓ)}_{r_ℓ}] ∈ R^{n×r_ℓ},

we represent the Tucker decomposition of V ∈ T_r via tensor-by-matrix contracted products,

V = β ×_1 U^{(1)} ×_2 U^{(2)} ... ×_d U^{(d)},

where β ∈ R^{J_1×...×J_d} is the core tensor of “small” size r_1 × ... × r_d.

Rem. 6.4. In the case d = 2, the above representation is a multilinear equivalent of a matrix factorisation, i.e., we have

A = β ×_1 U^{(1)} ×_2 U^{(2)} = U^{(1)} · β · U^{(2)T}, β ∈ R^{r_1×r_2}.


Tucker orthogonality meets the canonical sparsity B. Khoromskij, Zurich 2010(L6) 159

Visualization of the Tucker model for d = 3: A = β ×_1 V^{(1)} ×_2 V^{(2)} ×_3 V^{(3)} with the core β of size r_1 × r_2 × r_3.

How to relax the drawbacks of both T_{r,n} and C_R?

Main idea: A two-level tensor format that inherits the Tucker orthogonality in the primal space (robust decomposition) and the C_R structure in the dual (coefficients) space (linear scaling in d, n, R, r).

Two-level Tucker-canonical model B. Khoromskij, Zurich 2010(L6) 160

Def. 6.8. Mixed Tucker-canonical model (T C_{R,r}), ([2]).

Given the rank parameters r, R, define the subclass T C_{R,r} ⊂ T_{r,n} of tensors with β ∈ C_{R,r} ⊂ R^{J_1×...×J_d},

V = (Σ_{ν=1}^{R} β_ν u^{(1)}_ν ⊗ . . . ⊗ u^{(d)}_ν) ×_1 V^{(1)} ×_2 V^{(2)} ... ×_d V^{(d)}.

Storage: S(V) = dRr + R + drn (linear scaling in d, n, R, r).

Level I: Tucker decomposition (left). Level II: canonical decomposition of the core β (right).

Exer. 6.4. Compute the mixed decomposition of the functional tensor for f_{1,κ}; is it much faster than CP? (cf. Lect. 2).


Nonlinear approximation in tensor format B. Khoromskij, Zurich 2010(L6) 161

Exer. 6.5. Compute the canonical, Tucker and ℓ-mode ε-ranks of the Hilbert tensor A = {a_{ijk}}, a_{ijk} = 1/(i + j + k) (i, j, k = 1, ..., n) with n = 10², corresponding to the approximation errors ε = 10^{-3}, 10^{-4}, 10^{-5}. Do you observe exponential convergence in r_ε?

Probl. 1. Efficient and accurate MLA in fixed tensor classes S, getting rid of the curse of dimensionality.

Probl. 2. Best rank-structured approximation of a high-order tensor f ∈ V_n in a fixed set S ∈ {T_r, C_R, T C_{R,r}}.

Probl. 3. For fixed accuracy ε > 0, efficient approximation of a high-order tensor f ∈ V_n in S with adaptive rank parameter.

Since both T_r and C_R are not linear spaces, we arrive at a nontrivial nonlinear approximation problem:

Given X ∈ V_n (more generally, X ∈ S_0 ⊂ V_n), find

T_r(X) := argmin_{A∈S} ‖X − A‖, where S ∈ {T_r, C_R, T C_{R,r}}. (43)

Nonlinear approximation in tensor format B. Khoromskij, Zurich 2010(L6) 162

Recall that the decomposition

f(x) := sin(Σ_{j=1}^{d} x_j) = Σ_{j=1}^{d} sin(x_j) Π_{k∈{1,...,d}\{j}} sin(x_k + α_k − α_j)/sin(α_k − α_j) (44)

holds for any α_k ∈ R s.t. sin(α_k − α_j) ≠ 0 for all j ≠ k.

(44) shows the lack of uniqueness (ambiguity) of the best rank-d tensor representation. The convergence of minimisation schemes in C_R might be non-robust (multiple local minima).

Exer. 6.6. Prove that the tensor related to f(x) has the maximal Tucker rank 2. Check it with the Tensor Toolbox.

Next discussion: How to solve (43) efficiently?

Main steps: MLA on tensors + high-order extension(s) of the truncated SVD + nonlinear iteration + multigrid.


Literature to Lecture 6 B. Khoromskij, Zurich 2010(L6) 163

1. L. De Lathauwer, B. De Moor, J. Vandewalle: On the best rank-1 and rank-(R1, ..., RN )

approximation of higher-order tensors. SIAM J. Matrix Anal. Appl., 21 (2000) 1324-1342.

2. B.N. Khoromskij: Structured Rank-(r1, ..., rd) Decomposition of Function-related Tensors in Rd.

Comp. Meth. in Appl. Math., V. 6 (2006), 194-220.

3. T.G. Kolda, and B.W. Bader: Tensor decompositions and applications.

SIAM Review, 51/3 (2009), 455-500.

4. L.R. Tucker: Some mathematical notes on three-mode factor analysis. Psychometrika 31 (1966) 279-311.

http://personal-homepages.mis.mpg.de/bokh

Lect. 7. Toward multilinear algebra of tensors B. Khoromskij, Zurich 2010(L7) 164

Outlook of Lecture 7.

1. From d = 2 to higher dimensions: main distinctions.

2. Extending SVD to High Order SVD (HOSVD).

3. Truncated HOSVD: Quasi-optimal error estimate.

4. Reduced HOSVD (RHOSVD). The error estimate.

5. Summary and observations on basic tensor representations:

(a) Full → Tucker.

(b) Canonical (CP) → Tucker.

(c) Tucker → Tucker.

(d) Tucker → CP.


The multi-factor analysis is nonlinear B. Khoromskij, Zurich 2010(L7) 165

Def. 7.1. (Tensor rank, cf. Def. 6.5). The minimal number R in the representation

R^I ∋ A = ∑_{k=1}^R v_k^{(1)} ⊗ ··· ⊗ v_k^{(d)}, v_k^{(ℓ)} ∈ R^n, (45)

is called the tensor rank, R = rank(A), of the tensor A.

[Hitchcock 1927; Kruskal]

Finding the tensor rank R and the corresponding decomposition(s) in high dimensions (d ≥ 3) is the main issue of multi-factor analysis. Computing the rank of a high-order tensor is NP-hard [Hastad 1990].

Rem. 7.1. For d = 2, Def. 7.1 coincides with the standard definition of the matrix rank(A), which can be calculated (together with the rank decomposition) by the finite SVD algorithm in O(n^3) op.

The orthogonality requirement in SVD ensures the

uniqueness of rank decomposition.

Little analogy between the cases d ≥ 3 and d = 2 B. Khoromskij, Zurich 2010(L7) 166

If d > 2, the situation changes dramatically.

I. rank(A) depends on the number field (say, R or C).

II. The set of tensors of rank not larger than r,

C_r(d) := {T ∈ H_1 ⊗ ... ⊗ H_d : rank(T) ≤ r},

is closed when d = 2 (matrices), or if r = 1 (rank-1 tensors), [Golub, Zhang ’01].

III. For d ≥ 3, r ≠ 1, the set C_r(d) is nonclosed.

Ex. 7.1. Let x, y be two linearly independent vectors in H (say, dim(H) = 2). Consider the tensor T ∈ H ⊗ H ⊗ H = H^{⊗3},

T := x ⊗ x ⊗ x + x ⊗ y ⊗ y + y ⊗ x ⊗ y.

It can be proven that

(a) rank(T ) = 3 (Exer. 7.1. Prove (a), [De Silva, Lim ’06]);

(b) T has no best rank-2 approximation.


Little analogy between the cases d ≥ 3 and d = 2 B. Khoromskij, Zurich 2010(L7) 167

Proof of (b): Consider the sequence {S_k}_{k=1}^∞ in H^{⊗3},

S_k := x ⊗ x ⊗ (x − k y) + (x + (1/k) y) ⊗ (x + (1/k) y) ⊗ k y.

Clearly, rank(S_k) ≤ 2 for all k. By the multilinearity of ⊗,

S_k = T + (1/k) y ⊗ y ⊗ y.

Hence, for any choice of norm on H ⊗ H ⊗ H,

‖S_k − T‖ = (1/k) ‖y ⊗ y ⊗ y‖ → 0 as k → ∞.

IV. For d ≥ 3 we do not know any finite algorithm to compute r = rank(A), except simple bounds:

0 ≤ rank(A) ≤ n^{d−1}.

Compare with the case d = 2.

Little analogy between the cases d ≥ 3 and d = 2 B. Khoromskij, Zurich 2010(L7) 168

V. For fixed d ≥ 3 and n we do not know the exact value of max rank(A). J. Kruskal ’77 [3] proved that:

– for any 2 × 2 × 2 tensor we have max rank(A) = 3 < 4;

– for 3 × 3 × 3 tensors there holds max rank(A) = 5 < 9.

VI. “Probabilistic” properties of rank(A): in the set of 2 × 2 × 2 tensors there are about 79% of rank-2 tensors and 21% of rank-3 tensors, while rank-1 tensors appear with probability 0 (J. Kruskal).

Clearly, for n × n matrices we have (why?)

P{rank(A) = n} = 1.


Little analogy between the cases d ≥ 3 and d = 2 B. Khoromskij, Zurich 2010(L7) 169

VII. However, it is possible to prove the important uniqueness property within equivalence classes.

Two representations like (45) are considered as equivalent (essential equivalence) if either

(a) they differ in the order of terms, or

(b) for some set of parameters a_k^{(ℓ)} ∈ R such that ∏_{ℓ=1}^d a_k^{(ℓ)} = 1 (k = 1, ..., R), there is a transform v_k^{(ℓ)} → a_k^{(ℓ)} v_k^{(ℓ)}.

A simplified version of the general uniqueness result is the following (all factors have the same full rank R).

Prop. 7.2. [J. Kruskal, ’77]. Let, for each ℓ = 1, ..., d, the vectors v_k^{(ℓ)} (k = 1, ..., R) with R = rank(A) be linearly independent. If

(d − 2)R ≥ d − 1,

then the decomposition (45) is uniquely determined up to the equivalence (a)–(b) above.

High order SVD (HOSVD) B. Khoromskij, Zurich 2010(L7) 170

The analogy of the SVD model for dth-order tensors can be formulated as the so-called high order SVD.

Thm. 7.1. (dth-order SVD, [De Lathauwer, De Moor, Vandewalle 2000]). Every complex n_1 × n_2 × ... × n_d-tensor A can be written as the contracted product

A = S ×_1 U^{(1)} ×_2 U^{(2)} ... ×_d U^{(d)}, where (46)

1. U^{(ℓ)} = [U_1^{(ℓ)} U_2^{(ℓ)} ... U_{n_ℓ}^{(ℓ)}] is a unitary n_ℓ × n_ℓ-matrix,

2. S is a complex n_1 × n_2 × ... × n_d-tensor of which the subtensors S_{i_ℓ=α}, obtained by fixing the ℓth index to α, have the properties of

(i) all-orthogonality: two subtensors S_{i_ℓ=α} and S_{i_ℓ=β} are orthogonal for all possible values of ℓ, α, and β s.t. α ≠ β:

⟨S_{i_ℓ=α}, S_{i_ℓ=β}⟩ = 0 when α ≠ β.


High order SVD (HOSVD) B. Khoromskij, Zurich 2010(L7) 171

(ii) Ordering:

‖S_{i_ℓ=1}‖ ≥ ‖S_{i_ℓ=2}‖ ≥ ... ≥ ‖S_{i_ℓ=n_ℓ}‖ ≥ 0, ∀ ℓ = 1, ..., d.

Hint: consider the matrix representations

A_{(ℓ)} = U^{(ℓ)} S_{(ℓ)} [U^{(1)} ⊗ ... ⊗ U^{(ℓ−1)} ⊗ U^{(ℓ+1)} ⊗ ... ⊗ U^{(d)}]^T,

A_{(ℓ)} = U^{(ℓ)} Σ^{(ℓ)} V^{(ℓ)T}, Σ^{(ℓ)} = diag{σ_1^{(ℓ)}, ..., σ_{n_ℓ}^{(ℓ)}},

S_{(ℓ)} = Σ^{(ℓ)} V^{(ℓ)T} [U^{(1)} ⊗ ... ⊗ U^{(ℓ−1)} ⊗ U^{(ℓ+1)} ⊗ ... ⊗ U^{(d)}],

where ⊗ denotes the Kronecker product of matrices (details in Lect. 8). Now the latter implies (i) and (ii) due to the orthogonality of the matrices U^{(ℓ)}.

(iii) The Frobenius norms ‖S_{i_ℓ=i}‖, symbolized by σ_i^{(ℓ)}, are the ℓ-mode singular values of A_{(ℓ)}, and the vector U_i^{(ℓ)} is the ith ℓ-mode left singular vector of A_{(ℓ)}.

Truncated HOSVD: Full → Tucker B. Khoromskij, Zurich 2010(L7) 172

Computation: U^{(ℓ)} (ℓ = 1, ..., d) is the left singular matrix of A_{(ℓ)}. The core S can be computed by bringing the matrices of singular vectors to the left side of (46):

S = A ×_1 U^{(1)T} ×_2 U^{(2)T} ... ×_d U^{(d)T}.

Thm. 7.2. (Approximation by HOSVD, [1]). Let the HOSVD of A be given as in Thm. 7.1 and let the ℓ-mode rank of A be equal to R_ℓ (ℓ = 1, ..., d). For a given rank parameter r = (r_1, ..., r_d), define a tensor A_r by discarding the smallest ℓ-mode singular values σ_{r_ℓ+1}^{(ℓ)}, σ_{r_ℓ+2}^{(ℓ)}, ..., σ_{R_ℓ}^{(ℓ)} (ℓ = 1, ..., d), i.e., set the corresponding parts of S equal to zero. Then we have

‖A − A_r‖² ≤ ∑_{ℓ=1}^d ∑_{i_ℓ=r_ℓ+1}^{R_ℓ} σ_{i_ℓ}^{(ℓ)2}.

Notice that the truncated HOSVD loses only the factor √d compared with the “best approximation”.


Visualizing the truncated HOSVD B. Khoromskij, Zurich 2010(L7) 173

[Figure 19: Approximation by HOSVD via truncated SVD of the matrix unfoldings for d = 3, r_ℓ < n_ℓ.]

Rem. 7.2. Truncated HOSVD practically applies to small d

and to moderate n.

Proof of the T-HOSVD error bound B. Khoromskij, Zurich 2010(L7) 174

Proof. Due to the orthogonality of U^{(ℓ)}, ℓ = 1, ..., d, and Thm. 7.1 we have

‖A − A_r‖² = ∑_{i_1=1}^{R_1} ∑_{i_2=1}^{R_2} ··· ∑_{i_d=1}^{R_d} s²_{i_1 i_2 ... i_d} − ∑_{i_1=1}^{r_1} ∑_{i_2=1}^{r_2} ··· ∑_{i_d=1}^{r_d} s²_{i_1 i_2 ... i_d}

≤ ∑_{i_1=r_1+1}^{R_1} ∑_{i_2=1}^{R_2} ··· ∑_{i_d=1}^{R_d} s²_{i_1 i_2 ... i_d} + ∑_{i_1=1}^{R_1} ∑_{i_2=r_2+1}^{R_2} ··· ∑_{i_d=1}^{R_d} s²_{i_1 i_2 ... i_d} + ··· + ∑_{i_1=1}^{R_1} ∑_{i_2=1}^{R_2} ··· ∑_{i_d=r_d+1}^{R_d} s²_{i_1 i_2 ... i_d}

= ∑_{i_1=r_1+1}^{R_1} σ_{i_1}^{(1)2} + ∑_{i_2=r_2+1}^{R_2} σ_{i_2}^{(2)2} + ··· + ∑_{i_d=r_d+1}^{R_d} σ_{i_d}^{(d)2}.

Though the tensor A_r is not the best approx. of A under the given ℓ-mode rank constraints, it normally provides a good Tucker approx. of A.

Exer. 7.2. Compare the rank-r RHOSVD of the n × n × n Hilbert tensor for d = 3, n = 100; d = 4, n = 50, with the “best” rank-r Tucker approx.


Reduced HOSVD (RHOSVD): CP → Tucker B. Khoromskij, Zurich 2010(L7) 175

For given A ∈ C_{R,n}, in the rank-R canonical format,

A = ∑_{ν=1}^R ξ_ν u_1^ν ⊗ ... ⊗ u_d^ν, ξ_ν ∈ R,

use its contracted product representation (Tucker with r = (R, ..., R)),

A = ξ ×_1 U^{(1)} ×_2 U^{(2)} ··· ×_d U^{(d)}, ξ = diag{ξ_1, ..., ξ_R},

via the ℓ-mode side matrices U^{(ℓ)} = [u_ℓ^1 ... u_ℓ^R] ∈ R^{n×R} (ℓ = 1, ..., d).

How to simplify HOSVD? To fix the idea, suppose that n ≤ R.

Def. 7.2. (RHOSVD, [2]). For given A ∈ C_{R,n} and r, r_ℓ ≤ R, let U^{(ℓ)} ≈ W^{(ℓ)} := Z_0^{(ℓ)} D_{ℓ,0} V_0^{(ℓ)T} be the truncated SVD of the side matrix U^{(ℓ)} (ℓ = 1, ..., d), where D_{ℓ,0} = diag{σ_{ℓ,1}, σ_{ℓ,2}, ..., σ_{ℓ,r_ℓ}} and Z_0^{(ℓ)} = [z_ℓ^1, ..., z_ℓ^{r_ℓ}] ∈ R^{n×r_ℓ}, V_0^{(ℓ)} ∈ R^{R×r_ℓ} represent the respective orthogonal factors. Then the RHOSVD approximation of A is defined by

A_0^{(r)} = ξ ×_1 [Z_0^{(1)} D_{1,0} V_0^{(1)T}] ×_2 ··· ×_d [Z_0^{(d)} D_{d,0} V_0^{(d)T}]. (47)

Note that A_0^{(r)} ∈ T_r in (47) is obtained by the projection of the side matrices U^{(ℓ)} onto the left singular matrices Z_0^{(ℓ)}.

Canonical-to-Tucker approx. by RHOSVD B. Khoromskij, Zurich 2010(L7) 176

Thm. 7.3. (Error of RHOSVD, [2]).

(a) For A = ∑_{ν=1}^R ξ_ν u_1^ν ⊗ ... ⊗ u_d^ν ∈ C_{R,n}, the minimisation problem

A ∈ C_{R,n} ⊂ V_n : A_{(r)} = argmin_{T∈T_{r,n}} ‖A − T‖_{V_n},

is equivalent to the dual maximisation problem

[Z^{(1)}, ..., Z^{(d)}] = argmax_{W^{(ℓ)}∈G_ℓ[S_{r_ℓ}]} ‖∑_{ν=1}^R ξ_ν (W^{(1)T} u_1^ν) ⊗ ... ⊗ (W^{(d)T} u_d^ν)‖²_{H_r}.

(b) (Error of RHOSVD). Let σ_{ℓ,1} ≥ σ_{ℓ,2} ≥ ... ≥ σ_{ℓ,min(n,R)} be the singular values of U^{(ℓ)} ∈ R^{n×R} (ℓ = 1, ..., d). Then the RHOSVD approximation A_0^{(r)} of A exhibits the error bound (extra factor ‖ξ‖ compared to HOSVD),

‖A − A_0^{(r)}‖ ≤ ‖ξ‖ ∑_{ℓ=1}^d (∑_{k=r_ℓ+1}^{min(n,R)} σ²_{ℓ,k})^{1/2}, ‖ξ‖ = (∑_{ν=1}^R ξ_ν²)^{1/2}. (48)

Item (a): to be considered in Lect. 9.


Sketching the proof of RHOSVD error bound B. Khoromskij, Zurich 2010(L7) 177

Proof. Using the contracted product representations of A ∈ C_{R,n} and A_0^{(r)} leads to the following expansion for the respective error,

A − A_0^{(r)} = ξ ×_1 U^{(1)} ×_2 U^{(2)} ··· ×_d U^{(d)} − ξ ×_1 [Z_0^{(1)} D_{1,0} V_0^{(1)T}] ×_2 [Z_0^{(2)} D_{2,0} V_0^{(2)T}] ··· ×_d [Z_0^{(d)} D_{d,0} V_0^{(d)T}]

= ξ ×_1 [U^{(1)} − Z_0^{(1)} D_{1,0} V_0^{(1)T}] ×_2 [Z_0^{(2)} D_{2,0} V_0^{(2)T}] ··· ×_d [Z_0^{(d)} D_{d,0} V_0^{(d)T}]
+ ξ ×_1 U^{(1)} ×_2 [U^{(2)} − Z_0^{(2)} D_{2,0} V_0^{(2)T}] ··· ×_d [Z_0^{(d)} D_{d,0} V_0^{(d)T}]
+ ...
+ ξ ×_1 U^{(1)} ×_2 U^{(2)} ··· ×_d [U^{(d)} − Z_0^{(d)} D_{d,0} V_0^{(d)T}].

Introduce the ℓ-mode residual

Δ^{(ℓ)} = U^{(ℓ)} − Z_0^{(ℓ)} D_{ℓ,0} V_0^{(ℓ)T}, Δ_ν^{(ℓ)} = ∑_{k=r_ℓ+1}^n σ_{ℓ,k} z_ℓ^k v_{ℓ,ν}^k, ν = 1, ..., R,

with the notations

V_0^{(ℓ)} = [v_ℓ^1, ..., v_ℓ^{r_ℓ}]^T, v_ℓ^k = {v_{ℓ,ν}^k}_{ν=1}^R.

Sketching the proof of RHOSVD error bound B. Khoromskij, Zurich 2010(L7) 178

The ℓth summand in the right-hand side of ‖A − A_0^{(r)}‖ takes the form

B_ℓ = ξ ×_1 U^{(1)} ··· ×_{ℓ−1} U^{(ℓ−1)} ×_ℓ Δ^{(ℓ)} ×_{ℓ+1} W^{(ℓ+1)} ··· ×_d W^{(d)}.

This leads to the error bound (by the triangle inequality)

‖A − A_0^{(r)}‖ ≤ ∑_{ℓ=1}^d ‖B_ℓ‖ = ‖ξ ×_1 Δ^{(1)} ×_2 W^{(2)} ··· ×_d W^{(d)}‖ + ‖ξ ×_1 U^{(1)} ×_2 Δ^{(2)} ··· ×_d W^{(d)}‖ + ... + ‖ξ ×_1 U^{(1)} ×_2 U^{(2)} ··· ×_d Δ^{(d)}‖,

where the ℓth term B_ℓ is represented by

∑_{ν=1}^R ξ_ν [u_1^ν ··· ×_{ℓ−1} u_{ℓ−1}^ν ×_ℓ Δ_ν^{(ℓ)} ×_{ℓ+1} ∑_{k=1}^{r_{ℓ+1}} σ_{ℓ+1,k} z_{ℓ+1}^k v_{ℓ+1,ν}^k ··· ×_d ∑_{k=1}^{r_d} σ_{d,k} z_d^k v_{d,ν}^k],

providing the estimate (in view of ‖u_ℓ^ν‖ = 1, ℓ = 1, ..., d, ν = 1, ..., R)

‖B_ℓ‖ ≤ ∑_{ν=1}^R |ξ_ν| (∑_{k=r_ℓ+1}^n σ²_{ℓ,k} (v_{ℓ,ν}^k)²)^{1/2} · (∑_{k=1}^{r_{ℓ+1}} σ²_{ℓ+1,k} (v_{ℓ+1,ν}^k)²)^{1/2} ··· (∑_{k=1}^{r_d} σ²_{d,k} (v_{d,ν}^k)²)^{1/2}.


Sketching the proof of RHOSVD error bound B. Khoromskij, Zurich 2010(L7) 179

Notice that U^{(ℓ)} (ℓ = 1, ..., d) has normalised columns, i.e.,

1 = ‖u_ℓ^ν‖ = ‖∑_{k=1}^n σ_{ℓ,k} z_ℓ^k v_{ℓ,ν}^k‖,

implying ∑_{k=1}^n σ²_{ℓ,k} (v_{ℓ,ν}^k)² = 1 for ℓ = 1, ..., d, ν = 1, ..., R.

We finalise the error bound as follows,

‖A − A_0^{(r)}‖ ≤ ∑_{ℓ=1}^d ∑_{ν=1}^R |ξ_ν| (∑_{k=r_ℓ+1}^n σ²_{ℓ,k} (v_{ℓ,ν}^k)²)^{1/2}

≤ ∑_{ℓ=1}^d (∑_{ν=1}^R ξ_ν²)^{1/2} (∑_{ν=1}^R ∑_{k=r_ℓ+1}^n σ²_{ℓ,k} (v_{ℓ,ν}^k)²)^{1/2}

= ∑_{ℓ=1}^d ‖ξ‖ (∑_{k=r_ℓ+1}^n σ²_{ℓ,k} ∑_{ν=1}^R (v_{ℓ,ν}^k)²)^{1/2}

= ‖ξ‖ ∑_{ℓ=1}^d (∑_{k=r_ℓ+1}^n σ²_{ℓ,k})^{1/2}.

The case R < n can be analysed along the same lines.

Canonical rank estimate B. Khoromskij, Zurich 2010(L7) 180

Recall that n̄_ℓ denotes the ℓ-mode single-hole product of dimensions,

n̄_ℓ = n_1 ··· n_{ℓ−1} n_{ℓ+1} ··· n_d.

Rem. 7.3. The canonical rank of a tensor A ∈ V_n has the upper bound

R ≤ min_{1≤ℓ≤d} n̄_ℓ = ∏_{ℓ=1}^d n_ℓ / max_ℓ n_ℓ. (49)

Proof. Consider the case d = 3. Let n_1 = max_{1≤ℓ≤d} n_ℓ for definiteness. We can represent a tensor A as

A = ∑_{k=1}^{n_3} B_k ⊗ Z_k, B_k ∈ R^{n_1×n_2}, Z_k ∈ R^{n_3},

where B_k = A(:, :, k) (k = 1, ..., n_3) is the n_1 × n_2 matrix slice of A, and Z_k(i) = 0 for i ≠ k, Z_k(k) = 1. Let rank(B_k) = r_k ≤ n_2, k = 1, ..., n_3; then rank(B_k ⊗ Z_k) = rank(B_k) ≤ n_2, implying

rank(A) ≤ ∑_{k=1}^{n_3} rank(B_k) ≤ n_2 n_3 = min_{1≤ℓ≤3} n̄_ℓ.


Tucker → Tucker, Tucker → CP B. Khoromskij, Zurich 2010(L7) 181

The general case of d > 3 can be proven by induction.

Lem. 7.4. (Mixed Tucker-to-canonical approximation).

(A) Let the target tensor A have the form

A = β ×_1 V^{(1)} ×_2 ... ×_d V^{(d)} ∈ T_{r,n},

with the orthogonal side matrices V^{(ℓ)} = [v_1^{(ℓ)} ... v_{r_ℓ}^{(ℓ)}] ∈ R^{n×r_ℓ} and β ∈ R^{r_1×...×r_d}. Then, for a given R ≤ min_{1≤ℓ≤d} r̄_ℓ (see (49)),

min_{Z∈C_{R,n}} ‖A − Z‖ = min_{µ∈C_{R,r}} ‖β − µ‖. (50)

(B) Assume that there exists the best rank-R approximation A_{(R)} ∈ C_{R,n} of A; then there is the best rank-R approximation β_{(R)} ∈ C_{R,r} of β such that

A_{(R)} = β_{(R)} ×_1 V^{(1)} ×_2 ... ×_d V^{(d)}. (51)

Tucker → Tucker, Tucker → CP B. Khoromskij, Zurich 2010(L7) 182

Proof. (A) Notice that the canonical vectors y_k^{(ℓ)} of any test element in the left-hand side of (50),

Z = ∑_{k=1}^R λ_k y_k^{(1)} ⊗ ... ⊗ y_k^{(d)} ∈ C_{R,n}, (52)

can be chosen in span{v_1^{(ℓ)}, ..., v_{r_ℓ}^{(ℓ)}}, i.e.,

y_k^{(ℓ)} = ∑_{m=1}^{r_ℓ} µ_{k,m}^{(ℓ)} v_m^{(ℓ)}, k = 1, ..., R, ℓ = 1, ..., d. (53)

Indeed, assuming

y_k^{(ℓ)} = ∑_{m=1}^{r_ℓ} µ_{k,m}^{(ℓ)} v_m^{(ℓ)} + E_k^{(ℓ)} with E_k^{(ℓ)} ⊥ span{v_1^{(ℓ)}, ..., v_{r_ℓ}^{(ℓ)}},

we conclude that E_k^{(ℓ)} does not affect the cost function in (50) because of the orthogonality of V^{(ℓ)}. Hence, setting E_k^{(ℓ)} = 0 and substituting (53) into (52), we arrive at the desired Tucker decomposition of Z,

Z = β_z ×_1 V^{(1)} ×_2 ... ×_d V^{(d)}, β_z ∈ C_{R,r}.


Tucker → Tucker, Tucker → CP B. Khoromskij, Zurich 2010(L7) 183

This implies

‖A − Z‖² = ‖(β_z − β) ×_1 V^{(1)} ×_2 ... ×_d V^{(d)}‖² = ‖β − β_z‖² ≥ min_{µ∈C_{R,r}} ‖β − µ‖².

On the other hand, we have

min_{Z∈C_{R,n}} ‖A − Z‖² ≤ min_{β_z∈C_{R,r}} ‖(β − β_z) ×_1 V^{(1)} ×_2 ... ×_d V^{(d)}‖² = min_{µ∈C_{R,r}} ‖β − µ‖².

Hence, we prove (50).

(B) Likewise, for any minimizer A_{(R)} ∈ C_{R,n} in the l.h.s. of (50), one obtains

A_{(R)} = β_{(R)} ×_1 V^{(1)} ×_2 V^{(2)} ... ×_d V^{(d)}

with the respective rank-R core tensor

β_{(R)} = ∑_{k=1}^R λ_k u_k^{(1)} ⊗ ... ⊗ u_k^{(d)} ∈ C_{R,r}.

Here u_k^{(ℓ)} = {µ_{k,m_ℓ}^{(ℓ)}}_{m_ℓ=1}^{r_ℓ} ∈ R^{r_ℓ} are calculated by using representation

Tucker → Tucker, Tucker → CP B. Khoromskij, Zurich 2010(L7) 184

(53), and then changing the order of summation,

A_{(R)} = ∑_{k=1}^R λ_k y_k^{(1)} ⊗ ... ⊗ y_k^{(d)}

= ∑_{k=1}^R λ_k (∑_{m_1=1}^{r_1} µ_{k,m_1}^{(1)} v_{m_1}^{(1)}) ⊗ ... ⊗ (∑_{m_d=1}^{r_d} µ_{k,m_d}^{(d)} v_{m_d}^{(d)})

= ∑_{m_1=1}^{r_1} ... ∑_{m_d=1}^{r_d} (∑_{k=1}^R λ_k ∏_{ℓ=1}^d µ_{k,m_ℓ}^{(ℓ)}) v_{m_1}^{(1)} ⊗ ... ⊗ v_{m_d}^{(d)}.

Now (51) implies that

‖A − A_{(R)}‖ = ‖β − β_{(R)}‖,

since the ℓ-mode multiplication with the orthogonal side matrices V^{(ℓ)} does not change the cost function. Taking into account the l.h.s. of (50), the latter indicates that β_{(R)} is the minimizer in the r.h.s. of (50).


Overlook on direct methods of tensor approx. B. Khoromskij, Zurich 2010(L7) 185

1. ACA + SVD for two-fold decompositions (d = 2).

2. Analytic approximation to some function-generated dth

order tensors (d ≥ 2), (Lect. 3, 4).

Def. 7.3. Given a multivariate function

g : Ω → R, Ω := [−L, L]^d,

and a set of collocation points ζ_i = (ζ_{i_1}^1, ..., ζ_{i_d}^d), i ∈ I^d, specified by a tensor grid in Ω, the function-generated dth-order tensor is defined by

A ≡ A(g) := [a_{i_1...i_d}] ∈ R^{I^d} with a_{i_1...i_d} := g(ζ_{i_1}^1, ..., ζ_{i_d}^d)

(a MATLAB sketch follows after this list).

3. T-HOSVD, RHOSVD for quasi-optimal Tucker approx.

4. Next step: algebraic recompression methods by iterated rank-r Tucker or/and canonical approximation of high-order tensors (convergence theory is an open question).

Preliminary summary on Tucker/canonical formats B. Khoromskij, Zurich 2010(L7) 186

Direct analytic approximation: Analytic approximation

methods are of principal importance.

Basic examples: Tensor representation of Green kernels.

Direct SVD-based approx.: Applies to low dimensions.

Reduction to univariate operations: Basic multi-linear

algebra can be performed using one-dimensional operations,

thus avoiding the exponential scaling in d (Lect. 8).

Bottleneck: lack of robust and efficient algebraic methods for the canonical/Tucker tensor decomposition of high-order tensors (d ≥ 3).

Algebraic methods are indispensable. Basic concepts: nonlinear iteration, multigrid, new tensor formats.

Next we consider the heuristic ALS-type Tucker/canonical

approximation applicable to moderate dimensions.


Literature to Lect. 7 B. Khoromskij, Zurich 2010(L7) 187

1. L. De Lathauwer, B. De Moor, J. Vandewalle, A multilinear singular value decomposition. SIAM J.

Matrix Anal. Appl., 21 (2000) 1253-1278.

2. B.N. Khoromskij and V. Khoromskaia, Multigrid Tensor Approximation of Function Related Arrays.

SIAM J. on Sci. Comp., 31(4), 3002-3026 (2009).

3. J.B. Kruskal: Three-way arrays: Rank uniqueness of trilinear decompositions. Lin. Alg. Appl. 18 (1977), 95-138.

4. T. Zhang, and G.H. Golub, Rank-one approximation to high order tensors.

SIAM J. Matrix Anal. Appl. 23 (2001), 534-550.

5. V. De Silva, and L.-H. Lim, Tensor rank and the ill-posedness of the best low-rank approximation problem.

SIAM J. Matrix Anal. Appl., 30 (2008) 1084-1127.

http://personal-homepages.mis.mpg.de/bokh

Lect. 8. Matrices in tensor format B. Khoromskij, Zurich 2010(L8) 188

Contents of Lecture 8.

1. Rank-R and Tucker-type matrices.

2. Kronecker, Hadamard and Khatri-Rao product of matrices.

3. Kronecker sum of matrices.

4. Properties of the Kronecker product and sum.

5. Matrix exponential.

6. Eigenvalue problem for Kronecker sum.

7. Matrix Lyapunov/Silvester equations.

8. Kronecker Hadamard scalar product.

9. Kronecker matrix rank if d = 2.


Rank-R matrices (operators) B. Khoromskij, Zurich 2010(L8) 189

A tensor is a vector in the TPHS V = V_1 ⊗ ... ⊗ V_d = R^J, J = J_1 × ... × J_d; hence a rank-R matrix A ∈ M_{R,I×J} is understood as a linear operator which maps

R^{I×J} ∋ A : V → W,

where W = W_1 ⊗ ... ⊗ W_d = R^I, I = I_1 × ... × I_d.

Def. 8.1. We call M_{R,I×J} the class of rank-R linear tensor-tensor operators (matrices) A ∈ R^{I×J}, A : R^J → R^I,

A = ∑_{ν=1}^R α_ν A_1^ν ⊗ ... ⊗ A_d^ν, α_ν ∈ R, A_ℓ^ν ∈ R^{I_ℓ×J_ℓ},

for which the matrix-vector multiplication with a rank-1 tensor V ∈ C_{1,J} is defined by the rank-R canonical sum

AV := ∑_{ν=1}^R α_ν A_1^ν v^{(1)} ⊗ ... ⊗ A_d^ν v^{(d)} ∈ C_{R,I}.

R is called the Kronecker/tensor rank. The matrices A_ℓ^ν : R^{J_ℓ} → R^{I_ℓ} may have fully populated or data-sparse structure.

Tucker matrix format B. Khoromskij, Zurich 2010(L8) 190

Rem. 8.1. Rank-R matrices in M_{R,I×J} can be recognized as a “matricization” of all canonical vectors in the CP tensor.

Def. 8.2. The definition of rank-R matrices in M_{R,I×J} can be extended to the Tucker format, which can be recognized as a “matricization” of all orthogonal vectors in the Tucker tensor,

A = β ×_1 U^{(1)} ×_2 U^{(2)} ... ×_d U^{(d)} ∈ M_{r,I×J},

where β ∈ R^{r_1×...×r_d} is the core r_1 × ... × r_d-tensor, and

U^{(ℓ)} = [Φ_1^{(ℓ)}, ..., Φ_{r_ℓ}^{(ℓ)}] ∈ R^{I_ℓ×J_ℓ×r_ℓ}, Φ_k^{(ℓ)} ∈ R^{I_ℓ×J_ℓ}.

The operator A maps A : C_{1,J} → T_{r,I}.

If tensors are vectorised (unfolding to a vector, A → vec(A)), then the respective matrix in M_{R,I×J} can be represented by the Kronecker product of matrices of size I_ℓ × J_ℓ (ℓ = 1, ..., d). This construction is supported by MATLAB.


The Kronecker product of matrices B. Khoromskij, Zurich 2010(L8) 191

Def. 8.3. The Kronecker product (KP) operation A ⊗ B of two matrices A = [a_{ij}] ∈ R^{m×n}, B ∈ R^{h×g} is the mh × ng matrix that has the block representation [a_{ij}B], i = 1, ..., m; j = 1, ..., n. It extends recursively to the d-fold KP (see Property 1 below),

A ⊗ B ⊗ C = (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C).

We have the equivalent representation

A ⊗ B = [a_1 ⊗ b_1 a_1 ⊗ b_2 ... a_n ⊗ b_{g−1} a_n ⊗ b_g].

Def. 8.4. The Kronecker sum of A ∈ R^{m×m} and B ∈ R^{n×n} is defined by

A ⊕ B = I_m ⊗ B + A ⊗ I_n.

Ex. 8.1. Define the discrete FD Laplacian on H_0^1([0, 1]^d),

Δ^{(d)} := Δ ⊗ I ⊗ ... ⊗ I + I ⊗ Δ ⊗ I ⊗ ... ⊗ I + ... + I ⊗ ... ⊗ I ⊗ Δ ∈ R^{n^d×n^d},

where I = I_n is the n × n identity and Δ = h^{−2} tridiag{−1, 2, −1} ∈ R^{n×n}, h = 1/(n + 1). Δ^{(d)} has Kronecker/tensor rank R = d.

Exer. 8.1. Prove that Δ^{(d)} ∈ M_{r,I×I} with r = (2, 2, ..., 2).

Khatri-Rao and Hadamard product of matrices B. Khoromskij, Zurich 2010(L8) 192

Def. 8.5. The Khatri-Rao product is the “matching columnwise” Kronecker product. Given matrices A = [a_{ij}] ∈ R^{m×n}, B ∈ R^{h×n}, their Khatri-Rao product is denoted by A ⊡ B. It has the size mh × n and is defined by

A ⊡ B = [a_1 ⊗ b_1 a_2 ⊗ b_2 ... a_n ⊗ b_n].

If A and B are vectors, then the Khatri-Rao and Kronecker products are identical, i.e., A ⊗ B = A ⊡ B.

Def. 8.6. Define the Hadamard product

A ⊙ B = C := [c_{i_1...i_d}]_{(i_1...i_d)∈I}, I = I_1 × ... × I_d,

of two tensors/matrices of the same size I, A, B ∈ R^I, by the entrywise multiplication

c_{i_1...i_d} = a_{i_1...i_d} · b_{i_1...i_d}.


Properties of the Kronecker product B. Khoromskij, Zurich 2010(L8) 193

KP inherits many properties from matrices A and B (cf.

[1]):

(1) Let C ∈ Rs×t, then the KP satisfies the associative law,

(A⊗B) ⊗ C = A⊗ (B ⊗ C) = A⊗B ⊗ C,

and therefore we do not use brackets above. The matrix

A⊗B ⊗ C := (A⊗B) ⊗ C has (mhs) rows and (ngt) columns.

(2) Let C ∈ Rn×r and D ∈ Rg×s, then the standard

matrix-matrix product in the Kronecker format takes the form

(A⊗B)(C ⊗D) = (AC) ⊗ (BD).

The corresponding extension to d-th order tensors is

(A1 ⊗ ...⊗Ad)(B1 ⊗ ...⊗Bd) = (A1B1) ⊗ ...⊗ (AdBd).

Properties of the Kronecker product B. Khoromskij, Zurich 2010(L8) 194

(3) The distributive law

(A+B) ⊗ (C +D) = A⊗ C + A⊗D +B ⊗ C +B ⊗D.

(4) Rank relation: rank(A⊗B) = rank(A)rank(B).

Exer. 8.1. In general A ⊗ B ≠ B ⊗ A. Give a condition on A and B that provides A ⊗ B = B ⊗ A.

Invariance of some matrix properties:

(5) If A and B are diagonal then A ⊗ B is also diagonal, and conversely (if A ⊗ B ≠ 0).

(6) The upper/lower triangular matrices are preserved.

(7) Let A, B be Hermitian/normal/orthogonal matrices (A* = A, A*A = AA*, A^{−1} = A^T, respectively). Then A ⊗ B is of the corresponding type.

(8) Let A ∈ Rn×n and B ∈ Rm×m. Then

det(A⊗B) = (detA)m(detB)n.


Matrix operations with Kronecker product and sum B. Khoromskij, Zurich 2010(L8) 195

Thm. 8.1. Let A ∈ R^{n×n} and B ∈ R^{m×m} be invertible matrices. Then

(A ⊗ B)^{−1} = A^{−1} ⊗ B^{−1}.

Proof. Since det(A) ≠ 0, det(B) ≠ 0, property (8) above gives det(A ⊗ B) ≠ 0. Thus (A ⊗ B)^{−1} exists and

(A^{−1} ⊗ B^{−1})(A ⊗ B) = (A^{−1}A) ⊗ (B^{−1}B) = I_{nm}.

Lem. 8.2. Let A ∈ R^{n×n} and B ∈ R^{m×m} be unitary matrices. Then A ⊗ B is a unitary matrix.

Proof. Since A* = A^{−1}, B* = B^{−1}, we have

(A ⊗ B)* = A* ⊗ B* = A^{−1} ⊗ B^{−1} = (A ⊗ B)^{−1}.

Matrix operations with Kronecker products and sums B. Khoromskij, Zurich 2010(L8) 196

Define the commutator [A, B] := AB − BA.

Lem. 8.3. Let A ∈ R^{n×n} and B ∈ R^{m×m}. Then

[A ⊗ I_m, I_n ⊗ B] = 0 ∈ R^{nm×nm}.

Proof.

[A ⊗ I_m, I_n ⊗ B] = (A ⊗ I_m)(I_n ⊗ B) − (I_n ⊗ B)(A ⊗ I_m) = A ⊗ B − A ⊗ B = 0.

Rem. 8.2. Let A, B ∈ R^{n×n}, C, D ∈ R^{m×m} and [A, B] = 0, [C, D] = 0. Then

[A ⊗ C, B ⊗ D] = 0.

Proof. Apply the identity (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).


Matrix operations with Kronecker products and sums B. Khoromskij, Zurich 2010(L8) 197

Lem. 8.4. Let A ∈ R^{n×n} and B ∈ R^{m×m}. Then

tr(A ⊗ B) = tr(A) tr(B).

Proof. Since diag(a_{ii}B) = a_{ii} diag(B), we have

tr(A ⊗ B) = ∑_{i=1}^n ∑_{j=1}^m a_{ii} b_{jj} = ∑_{i=1}^n a_{ii} ∑_{j=1}^m b_{jj}.

Thm. 8.5. Let A, B, I ∈ R^{n×n}. Then

exp(A ⊗ I + I ⊗ B) = (exp A) ⊗ (exp B).

Proof. Since [A ⊗ I, I ⊗ B] = 0, we have

exp(A ⊗ I + I ⊗ B) = exp(A ⊗ I) exp(I ⊗ B).

Matrix operations with Kronecker products and sums B. Khoromskij, Zurich 2010(L8) 198

Furthermore, since

exp(A ⊗ I) = ∑_{k=0}^∞ (A ⊗ I)^k / k!, exp(I ⊗ B) = ∑_{m=0}^∞ (I ⊗ B)^m / m!,

a generic term in exp(A ⊗ I) exp(I ⊗ B) is given by

(1/k!)(1/m!) (A ⊗ I)^k (I ⊗ B)^m.

Using

(A ⊗ I)^k (I ⊗ B)^m = (A^k ⊗ I^k)(I^m ⊗ B^m) = (A^k ⊗ I)(I ⊗ B^m) ≡ A^k ⊗ B^m,

we finally arrive at

(1/k!)(1/m!) (A ⊗ I)^k (I ⊗ B)^m = ((1/k!) A^k) ⊗ ((1/m!) B^m).


Matrix operations with Kronecker products and sums B. Khoromskij, Zurich 2010(L8) 199

Thm. 8.5 can be extended to the case of a many-term sum,

exp(A_1 ⊗ I ⊗ ... ⊗ I + I ⊗ A_2 ⊗ ... ⊗ I + ... + I ⊗ ... ⊗ I ⊗ A_d) = (e^{A_1}) ⊗ ... ⊗ (e^{A_d}).

Rem. 8.3. Similar properties can be shown for other analytic functions, e.g.,

sin(I_n ⊗ A) = I_n ⊗ sin(A),

sin(A ⊗ I_m + I_n ⊗ B) = sin(A) ⊗ cos(B) + cos(A) ⊗ sin(B),

sin(A ⊗ I_m + I_n ⊗ B) = sin(A) ⊗ sin(B + (b − a)I)/sin(b − a) + sin(A + (a − b)I) ⊗ sin(B)/sin(a − b)

for all values a, b such that sin(a − b) ≠ 0. The latter can be extended to sin(A_1 ⊗ I ⊗ ... ⊗ I + ... + I ⊗ ... ⊗ I ⊗ A_d) (compare with sin(x_1 + ... + x_d)).

Other simple properties:

(A ⊗ B)^T = A^T ⊗ B^T, (A ⊗ B)* = A* ⊗ B*.

Eigenvalue problem for Kronecker sums B. Khoromskij, Zurich 2010(L8) 200

Lem. 8.6. Let A ∈ R^{m×m} and B ∈ R^{n×n} have the eigen-data {λ_1, ..., λ_m; u_1, ..., u_m} and {µ_1, ..., µ_n; v_1, ..., v_n}, respectively. Then A ⊗ B has the eigenvalues λ_j µ_k with the corresponding eigenvectors u_j ⊗ v_k, 1 ≤ j ≤ m, 1 ≤ k ≤ n.

Thm. 8.7. Under the conditions of Lem. 8.6, the eigenvalues/eigenvectors of A ⊗ I_n + I_m ⊗ B are given by λ_j + µ_k and u_j ⊗ v_k, respectively.

Proof. Due to Lem. 8.6 we have

(A ⊗ I_n + I_m ⊗ B)(u_j ⊗ v_k) = (A ⊗ I_n)(u_j ⊗ v_k) + (I_m ⊗ B)(u_j ⊗ v_k)
= (Au_j) ⊗ (I_n v_k) + (I_m u_j) ⊗ (Bv_k)
= (λ_j u_j) ⊗ v_k + u_j ⊗ (µ_k v_k)
= (λ_j + µ_k)(u_j ⊗ v_k).


Application to matrix Lyapunov/Sylvester equations B. Khoromskij, Zurich 2010(L8) 201

Recall that for a matrix A ∈ R^{m×n} we use the vector representation A → vec(A) ∈ R^{mn}, where vec(A) is the mn × 1 vector obtained by “stacking” A’s columns,

vec(A) := [a_{11}, ..., a_{m1}, a_{12}, ..., a_{mn}]^T.

In this way, vec(A) is a rearranged version of A.

Lem. 8.8. Let A ∈ R^{m×m}, Y ∈ R^{m×n}, B ∈ R^{n×n}. Then

vec(AY B) = (B^T ⊗ A) vec(Y).

The matrix Sylvester equation for X ∈ R^{m×n},

AX + XB^T = G ∈ R^{m×n}, (54)

with A ∈ R^{m×m}, B ∈ R^{n×n}, can be written in vector form

(I_n ⊗ A + B ⊗ I_m) vec(X) = vec(G).

In the case A = B we arrive at the Lyapunov equation.

Application to matrix Lyapunov/Sylvester equations B. Khoromskij, Zurich 2010(L8) 202

Now the solvability conditions and certain solution methods can be derived (cf. the results for eigenvalue problems). The matrix Sylvester equation (54) is uniquely solvable if

λ_j(A) + µ_k(B) ≠ 0.

Rem. 8.4. Since I_n ⊗ A and B ⊗ I_m commute, we can apply methods based on sinc quadratures to represent the inverse,

(I_n ⊗ A + B ⊗ I_m)^{−1} = ∫_{R_+} e^{−t(I_n⊗A + B⊗I_m)} dt.

If A and B represent discrete elliptic operators in R^d with separable coefficients, we obtain a low-rank tensor-product approx. to the Sylvester solution operator.

Exer. 8.2. Approximate formally by the sinc quadrature

(Δ^{(2)})^{−1} = (I_n ⊗ Δ + Δ ⊗ I_n)^{−1} ≈ ∑_{k=−M}^M c_k e^{−t_kΔ} ⊗ e^{−t_kΔ}.


Kronecker Hadamard scalar product B. Khoromskij, Zurich 2010(L8) 203

Given tensors Y ⊗ U ∈ R^{J×I} with U ∈ R^I, Y ∈ R^J, and B ∈ R^{L×I}, let T : R^L → R^J be the linear operator (tensor) that maps tensors defined on the index set L into those defined on J. In particular, T · B ∈ R^{J×I}.

Def. 8.7. ([3]) The Hadamard “scalar” product [D, C]_I ∈ R^K of two tensors D := [D_{i,k}] ∈ R^{I×K} and C := [C_{i,k}] ∈ R^{I×K} is defined by

[D, C]_I := ∑_{i∈I} [D_{i,K}] ⊙ [C_{i,K}],

where ⊙ denotes the Hadamard product over the index set K and [D_{i,K}] := [D_{i,k}]_{k∈K}.

Lem. 8.9. Let U, Y, B and T be given as above. Then, with K = J, the following identity is valid:

[U ⊗ Y, T · B]_I = Y ⊙ (T · [U, B]_I) ∈ R^J. (55)

Kronecker Hadamard scalar product B. Khoromskij, Zurich 2010(L8) 204

Proof. By the definition of the Hadamard scalar product we have

[U ⊗ Y, T · B]_I = ∑_{i∈I} [U ⊗ Y]_{i,J} ⊙ [T · B]_{i,J}
= ∑_{i∈I} [[U]_i Y]_J ⊙ [T · B]_{i,J}
= Y ⊙ (∑_{i∈I} [U]_i [T · B]_{i,J})
= Y ⊙ (T · ∑_{i∈I} [U]_i [B]_{i,L}),

then the assertion follows.

Identity (55) is particularly important in application to the Boltzmann eq. [3], since in the right-hand side the operator T is removed from the scalar product in I and, hence, it applies only once.


Remarks on rank structured operators (matrices) B. Khoromskij, Zurich 2010(L8) 205

Rem. 8.5. By Def. 8.1 the matrix-vector multiplication with a rank-1 tensor V ∈ C_{1,J} is defined by the rank-R canonical sum

AV = ∑_{ν=1}^R α_ν A_1^ν v^{(1)} ⊗ ... ⊗ A_d^ν v^{(d)} ∈ C_{R,I}.

Since ⊗ is also traditionally used for the Kronecker product of matrices, an equivalent and more consistent notation could be based on the contracted product,

A = ∑_{ν=1}^R α_ν A_1^ν ×_2 ... ×_d A_d^ν, α_ν ∈ R, A_ℓ^ν ∈ R^{I_ℓ×J_ℓ}.

However, if there is no confusion, we continue using ⊗ in the matrix tensor product as in Def. 8.1.

In particular, for the “single-index” representation of vectors, the discrete Laplacian in R^d takes the form as in Ex. 8.1. In the tensor representation of vectors, we can write

Δ^{(d)} := Δ ×_2 I ×_3 ... ×_d I + I ×_2 Δ ×_3 I ×_4 ... ×_d I + ... + I ×_2 ... I ×_d Δ,

with Δ^{(d)} ∈ R^{(n×n)×...×(n×n)}.

If there is no confusion, we continue using ⊗ in the notation for the discrete operators (stiffness matrices) in R^d.

Kronecker matrix rank if d = 2 B. Khoromskij, Zurich 2010(L8) 206

If d = 2, the estimation of the Kronecker matrix rank can be reduced to the computation of the standard matrix rank (cf. [2, 4]).

Let A = [a(i, j)]_{1≤i,j≤N}, N = n_1 n_2. Use the bijection

i ↔ (i_1, i_2), j ↔ (j_1, j_2), 1 ≤ i_1, j_1 ≤ n_1, 1 ≤ i_2, j_2 ≤ n_2,

defined by FORTRAN-style ordering,

i = i_1 + (i_2 − 1)n_1, j = j_1 + (j_2 − 1)n_1, 1 ≤ i_1, j_1 ≤ n_1, 1 ≤ i_2, j_2 ≤ n_2.

A can be indexed by a(i, j) = a(i_1, i_2, j_1, j_2). Introduce a new matrix Ã of size n_1² × n_2², indexed by the respective pairs (i_1, j_1), (i_2, j_2) (long indices),

ã(i_1, j_1; i_2, j_2) = a(i_1, i_2, j_1, j_2).


Kronecker matrix rank if d = 2 B. Khoromskij, Zurich 2010(L8) 207

The new indexing also defines a bijective mapping P : A → Ã (a rearranged version of A), which preserves the Frobenius norm.

Note. There is no permutation s.t. Ã = PAP^T.

Applied to the rank-R Kronecker product sum,

A → A_R = ∑_{k=1}^R U_k ⊗ V_k ⟺ P(A_R) := Ã_R = ∑_{k=1}^R v_k · u_k^T,

with U_k = [u_k(i_2, j_2)]_{1≤i_2,j_2≤n_2}, V_k = [v_k(i_1, j_1)]_{1≤i_1,j_1≤n_1}, v_k ∈ R^{n_1²}, u_k ∈ R^{n_2²}.

Rem. 8.6. The problem of finding a Kronecker tensor-rank approximation A_R of A is identical to the problem of finding a low-rank approximation Ã_R of Ã.

Complexity of the Kronecker-matrix arithmetics B. Khoromskij, Zurich 2010(L8) 208

We say that a matrix A ∈ M_{R,I×J} has the S-inherited Kronecker tensor product structure (S-structure) if A_ℓ^k ∈ S, where S is a class of data-sparse matrices of complexity O(n log n) (storage, MVM, MMM).

Complexity issues: let N = n^d.

• Data compression. The storage for A is O(dRn) = O(dRN^{1/d}). Hence, we enjoy sub-linear complexity.

• Matrix-by-vector (MVM) complexity of Ax, x ∈ C^N. For general x one has the linear cost O(dRN log n). If x = x^{(1)} ⊗ ... ⊗ x^{(d)}, x^{(i)} ∈ C^n, we again arrive at the sub-linear complexity O(dRn log n) = O(RN^{1/d} log N^{1/d}).

• Matrix-by-matrix (MMM) complexity of AB, A ⊙ B and A ⊗ B. The S-structure of the Kronecker factors leads to O(dR²n log n) = O(R²N^{1/d} log N^{1/d}) op. instead of O(N³).


Literature to Lecture 8 B. Khoromskij, Zurich 2010(L8) 209

1. P.J. Davis: Circulant matrices. John Wiley & Sons, Inc., NY, 1979.

2. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Hierarchical Kronecker tensor-product approximation.

J. Numer. Math. v. 13, n. 2 (2005), 119-156.

3. B.N. Khoromskij: Structured data-sparse approximation to high order tensors arising from the deterministic

Boltzmann equation. Math. Comp. 76 (2007), 1292-1315.

4. C. Van Loan: The ubiquitous Kronecker product. J. of Comp. and Applied Math. 123 (2000) 85-100.

URL: http://personal-homepages.mis.mpg.de/bokh

Lect. 9. Tensor approximation by nonlinear iteration B. Khoromskij, Zurich 2010(L9) 210

Outlook of Lecture 9.

1. Nonlinear approximation and tensor truncation.

2. Dual maximization problem for the rank-r Tucker

approximation.

3. ALS iteration for best rank-1 approximation.

4. Canonical rank-R approximation by ALS.

5. ALS iteration for the orthogonal Tucker approximation.

6. Two-level orthogonal Tucker approx. for canonical input.

7. Multigrid accelerated Tucker approx.: initial guess, most

important fibers, fast convergence.

8. Numerical illustrations in electronic structure calculations.


Nonlinear approximation in tensor format B. Khoromskij, Zurich 2010(L9) 211

Probl. 1. Best rank-R approximation of a high-order tensor A ∈ V_n in the set C_R.

Probl. 2. Best rank-r orthogonal approx. of A ∈ V_n in the Tucker format T_r.

Probl. 3. Best rank-(R, r) two-level orthogonal approx. of A ∈ V_n in T C_{R,r}.

Nonlinear approximation problem on estimation: given A ∈ S_0 ⊂ V_n, find the best approximation in S to A,

T_r(A) := argmin_{T∈S} ‖A − T‖, where S ∈ {T_r, C_R, T C_{R,r}}. (56)

The solution of this problem defines the tensor truncation operator

T_r : S_0 ⊂ V_n → S.

Usually it is calculated only up to a certain accuracy ε > 0.

Dual maximisation problem B. Khoromskij, Zurich 2010(L9) 212

Rem. 9.1. Recall the Stiefel manifold of orthogonal n × r matrices,

S_{n,r} := {Y ∈ R^{n×r} : Y^T Y = I_{r×r}}.

Consider the minimization problem

A ∈ S_0 ⊂ V_n : A_r = argmin_{T∈T_r} ‖A − T‖_{V_n}. (57)

Lem. 9.1. (Quadratic convergence of the norm). Let A_{(r)} = β ×_1 V^{(1)} ×_2 V^{(2)} ... ×_d V^{(d)} ∈ R^{I_1×...×I_d} solve the minimisation problem (57); then ‖β‖ = ‖A_{(r)}‖ ≤ ‖A‖. Moreover, we have the “quadratic” error bound for the norm,

(‖A‖ − ‖A_{(r)}‖)/‖A‖ ≤ ‖A_{(r)} − A‖²/‖A‖². (58)

Sketch of proof. The Tucker orthogonality implies ‖β‖ = ‖A_{(r)}‖. Relation (57) is merely a linear least-squares problem w.r.t. β ∈ R^{r_1×...×r_d},

g(β) := ⟨A, A⟩ − 2⟨A, β ×_1 V^{(1)} ×_2 ... ×_d V^{(d)}⟩ + ⟨β, β⟩ → min. (59)


Dual maximisation problem B. Khoromskij, Zurich 2010(L9) 213

The corresponding minimisation condition,

g(β + δβ) − g(β) ≥ 0 ∀ δβ ∈ R^{r_1×...×r_d},

leads to the following equations for the minimiser,

−⟨A, δβ ×_1 V^{(1)} ×_2 ... ×_d V^{(d)}⟩ + ⟨β, δβ⟩ = 0 ∀ δβ ∈ R^{r_1×...×r_d},

⟨−A ×_1 V^{(1)T} ×_2 ... ×_d V^{(d)T} + β, δβ⟩ = 0 ∀ δβ ∈ R^{r_1×...×r_d},

β − A ×_1 V^{(1)T} ×_2 ... ×_d V^{(d)T} = 0. (60)

Next we derive

‖A_{(r)} − A‖² = ‖A_{(r)}‖² − 2⟨β ×_1 V^{(1)} ×_2 ... ×_d V^{(d)}, A⟩ + ‖A‖²
= ‖A_{(r)}‖² + ‖A‖² − 2⟨β, A ×_1 V^{(1)T} ×_2 ... ×_d V^{(d)T}⟩
= ‖A‖² − ‖β‖²,

hence it follows that ‖A‖² − ‖A_{(r)}‖² = ‖A_{(r)} − A‖², and

(‖A‖ − ‖A_{(r)}‖)/‖A‖ = ‖A_{(r)} − A‖²/((‖A_{(r)}‖ + ‖A‖)‖A‖) ≤ ‖A_{(r)} − A‖²/‖A‖².

Best orthogonal rank-(r1, ..., rd) Tucker approximation B. Khoromskij, Zurich 2010(L9) 214

Thm. 9.2. The minimisation problem (57) is equivalent to the dual maximisation problem

[U^{(1)}, ..., U^{(d)}] = argmax ‖A ×_1 V^{(1)T} ×_2 ... ×_d V^{(d)T}‖²_F (61)

over the product of (compact) Stiefel manifolds,

V^{(ℓ)} = [v_ℓ^1 ... v_ℓ^{r_ℓ}] ∈ S_{n_ℓ,r_ℓ}, ℓ = 1, ..., d.

For given maximizing matrices U^{(m)} (m = 1, ..., d), the tensor β minimising (57) is represented by

β = A ×_1 U^{(1)T} ×_2 ... ×_d U^{(d)T} ∈ R^{r_1×...×r_d}. (62)

Under the compatibility condition

r_m ≤ r̄_m := r_1 ... r_{m−1} r_{m+1} ... r_d, m = 1, ..., d, (63)

there is at least one solution of (61).


Best orthogonal rank-(r1, ..., rd) Tucker approximation B. Khoromskij, Zurich 2010(L9) 215

Proof. The substitution of β from (60) into (59) leads to the equivalent minimizing equation

⟨A, A⟩ − ⟨A ×_1 V^{(1)T} ×_2 ... ×_d V^{(d)T}, A ×_1 V^{(1)T} ×_2 ... ×_d V^{(d)T}⟩ → min,

which proves (61), while (60) yields (62).

For the size consistency of the arising tensors, we require the compatibility conditions (63). Then the dual maximisation problem (61), posed on a compact manifold, can be proven to have at least one global maximum.

The rotational non-uniqueness of the maximizer in (61) can be avoided if one solves this maximisation problem in a product of the so-called Grassmann manifolds G_ℓ, ℓ = 1, ..., d. The latter is the factor space of S_{n_ℓ,r_ℓ} w.r.t. the rotational transforms.

Best rank-1 approximation B. Khoromskij, Zurich 2010(L9) 216

Now we look in more detail at the simplest special case of the Tucker/canonical model, that is, the best rank-1 approximation. It is an important ingredient in typical multi-linear algebra algorithms.

To derive the corresponding Lagrange equations, we notice that, due to the normalisation, ‖A_{(1)}‖² = β² for A_{(1)} = β V^{(1)} ⊗ ... ⊗ V^{(d)}, so the dual problem of maximising the generalised Rayleigh quotient over the unit-norm vectors (which eliminates the scalar β) reads as

|A ×_1 V^{(1)T} ×_2 ... ×_d V^{(d)T}|² − ∑_{ℓ=1}^d λ^{(ℓ)} (‖V^{(ℓ)}‖² − 1) → max. (64)

For any solution of this problem, the corresponding scalar β can be chosen as β = A ×_1 V^{(1)T} ×_2 ... ×_d V^{(d)T}.

Differentiating (64) w.r.t. V^{(m)} (1 ≤ m ≤ d) leads to the equations

β A ×_1 V^{(1)T} ... ×_{m−1} V^{(m−1)T} ×_{m+1} V^{(m+1)T} ... ×_d V^{(d)T} = λ^{(m)} V^{(m)},

which imply λ^{(m)} = β².


Best rank-1 approximation B. Khoromskij, Zurich 2010(L9) 217

Finally, the Lagrange equations read as

A ×_1 V^{(1)T} ... ×_{m−1} V^{(m−1)T} ×_{m+1} V^{(m+1)T} ... ×_d V^{(d)T} = β V^{(m)},

A ×_1 V^{(1)T} ×_2 ... ×_d V^{(d)T} = β,

‖V^{(m)}‖ = 1 (1 ≤ m ≤ d).

The above system of Lagrange equations can be solved by an alternating least squares (ALS) algorithm.

ALS algorithm for rank-1 approximation (see Alg. BTA below). At each iterative step, an approximant to the scalar β and the estimate of the vector V^{(m)} (m = 1, ..., d) are optimised, while the remaining vector components with ℓ ≠ m are kept constant.

The ALS method for the best rank-1 approximation is proven to have a locally linear convergence rate ([Zhang, Golub 2001]). Newton-type methods provide locally quadratic convergence.

The rank-R canonical approximation by ALS iteration B. Khoromskij, Zurich 2010(L9) 218

Let V^{(m)} ∈ R^{n×R} (m = 1, ..., d) and β = {β_k}_{k=1}^R be the side (factor) matrices with normalised column vectors, and the respective weights, of the rank-R approximant T ∈ C_R of A.

ALS algorithm for rank-R approximation. [Carroll, Chang ’70]

Given initial matrices V^{(m)} (m = 1, 2, ..., d), the ALS scheme fixes V^{(m)} (m = 2, ..., d) to solve (56) for V^{(ℓ)}, ℓ = 1, then fixes V^{(m)} (m = 1, 3, ..., d) to solve (56) for V^{(ℓ)}, ℓ = 2, and so on for ℓ = 1, 2, ..., d, and continues to repeat such a global iteration until some convergence or stopping criterion is satisfied.

Having fixed all but one side matrix, the numerical task reduces to a linear least-squares problem.

– The ALS algorithm is simple to implement,

– but convergence may be very slow.


The rank-R canonical approximation by ALS iteration B. Khoromskij, Zurich 2010(L9) 219

For example, for d = 3, we can write Iter. 1 for ℓ = 1,

T_{(1)} = V (V^{(3)} ⊡ V^{(2)})^T, where V = V^{(1)} · diag{β}.

The minimization problem takes the form (Frobenius norm)

min_V ‖A_{(1)} − V (V^{(3)} ⊡ V^{(2)})^T‖²,

and the optimal solution (minimizer) is then given by

V = A_{(1)} [(V^{(3)} ⊡ V^{(2)})^T]^†,

where B^† denotes the Moore-Penrose pseudo-inverse of B. The update for β is computed by normalising the columns of V. Notice that the pseudo-inverse effectively required for B = (V^{(3)} ⊡ V^{(2)})^T is only of size R × R.

The generalization to arbitrary d ≥ 3 is similar, [Kolda, Bader ’09].

Best rank-r Tucker approximation (BTA) B. Khoromskij, Zurich 2010(L9) 220

The ALS iteration to compute the BTA is also known as higher-order orthogonal iteration (HOOI) [De Lathauwer, De Moor, Vandewalle ’00].

Algorithm BTA (V_n → T_{r,n}). Given the input tensor A ∈ V_n:

1. Compute an initial guess V_0^{(ℓ)} (ℓ = 1, ..., d) for the ℓ-mode side matrices by the “truncated” SVD applied to the n × n^{d−1} unfolding matrix A_{(ℓ)}, i.e., HOSVD (cost O(n^{d+1})).

2. For each q = 1, ..., d, and with fixed side matrices V^{(ℓ)} ∈ R^{n×r_ℓ}, ℓ ≠ q, the ALS iteration optimises the q-mode matrix V^{(q)} via computing the dominating r_q-dimensional subspace (truncated SVD) of the unfolding matrix B_{(q)} ∈ R^{n×r̄_q}, r̄_q = r_1 ... r_{q−1} r_{q+1} ... r_d = O(r^{d−1}), corresponding to the q-mode contracted product

B = A ×_1 V^{(1)T} ×_2 ... ×_{q−1} V^{(q−1)T} ×_{q+1} V^{(q+1)T} ... ×_d V^{(d)T}.

Each iteration has the cost O(d r^{d−1} n max{r^{d−1}, n}).

3. Check the stopping criteria.

4. Compute the core β as the representation coefficients of the orthoprojection of A onto ⊗_{ℓ=1}^d span{v_ℓ^ν}_{ν=1}^{r_ℓ} (cost O(r^d n)),

β = A ×_1 V^{(1)T} ×_2 ... ×_d V^{(d)T} ∈ R^{r_1×...×r_d}.


Two-level BTA for canonical input B. Khoromskij, Zurich 2010(L9) 221

Conclusion: Algorithm BTA (V_n → T_{r,n}) applies to moderate d and moderate n because of the exponential scaling in d.

Consider approximation in the two-level Tucker-canonical format. In the iterative solution of multi-dimensional PDEs, the typical situation may arise when the target tensor is already presented in the rank-R canonical format, A ∈ C_{R,n}, but with large R and large n. In this case, a two-level approximation scheme can be applied,

C_{R,n} → T C_{R,r} → T C_{R′,r}. (65)

On Level I, the best orthogonal Tucker approx. is applied to the C_{R,n} input, so that the resulting Tucker core is represented in the C_{R,r} format. On Level II, the “small-size” Tucker core in C_{R,r} is approximated by an element in C_{R′,r} with R′ ≪ R.

The next statement gives the result on the solvability and structure of the Level-I scheme in (65), and provides the key to the construction of its efficient numerical implementation.

Two-level BTA for canonical input B. Khoromskij, Zurich 2010(L9) 222

Thm. 9.3. (Canonical-to-Tucker approximation, [4]).

(a) For A = ∑_{ν=1}^R ξ_ν u_1^ν ⊗ ... ⊗ u_d^ν ∈ C_{R,n}, the minimisation problem

A ∈ C_{R,n} ⊂ V_n : A_{(r)} = argmin_{T∈T_{r,n}} ‖A − T‖_{V_n}, (66)

is equivalent to the dual maximisation problem

[V^{(1)}, ..., V^{(d)}] = argmax_{W^{(ℓ)}∈G_ℓ[S_{r_ℓ}]} ‖∑_{ν=1}^R ξ_ν (W^{(1)T} u_1^ν) ⊗ ... ⊗ (W^{(d)T} u_d^ν)‖²_{H_r}.

(b) The compatibility condition is simplified to

r_ℓ ≤ rank(U^{(ℓ)}) with U^{(ℓ)} = [u_ℓ^1 ... u_ℓ^R] ∈ R^{n×R},

ensuring the solvability of the dual maximisation problem in (a). The maximizer is given by the orthogonal matrices V^{(ℓ)} = [v_1^{(ℓ)} ... v_{r_ℓ}^{(ℓ)}] ∈ R^{n×r_ℓ}, computed as in Alg. BTA, with the modification HOSVD → RHOSVD.

(c) The minimiser in (66) is then calculated by the orthoprojection,

A_{(r)} = ∑_{k=1}^r µ_k v_{k_1}^{(1)} ⊗ ... ⊗ v_{k_d}^{(d)}, µ = ∑_{ν=1}^R ξ_ν (V^{(1)T} u_1^ν) ⊗ ··· ⊗ (V^{(d)T} u_d^ν) ∈ C_{R,r},

where the sum runs over the multi-index k = (k_1, ..., k_d).


Two-level BTA for canonical input B. Khoromskij, Zurich 2010(L9) 223

Sketch of proof. (a) The generic dual maximization problem (61) with A ∈ C_{R,n} takes the above form due to the relation

⟨v_{k_1}^{(1)} ⊗ ... ⊗ v_{k_d}^{(d)}, A⟩ = ∑_{ν=1}^R ξ_ν ⟨v_{k_1}^{(1)}, u_1^ν⟩ ... ⟨v_{k_d}^{(d)}, u_d^ν⟩.

(b) The compatibility condition ensures the size consistency of all matrix unfoldings.

(c) The formula for A_{(r)} is a special case of the general orthoprojection representation for the Tucker core.

Notice that the approximation error of the minimizer A_{(r)} is given by Thm. 7.3 (Lect. 7); see also Rem. 9.2 below.

Algorithm C BTA, complexity bound B. Khoromskij, Zurich 2010(L9) 224

Algorithm C BTA (C_{R,n} → T C_{R,r}). Given A ∈ C_{R,n}, the iteration parameter k_max, and the rank parameter r:

1. For ℓ = 1, ..., d, compute the truncated SVD of U^{(ℓ)} to obtain the orthogonal matrices Z_0^{(ℓ)} ∈ R^{n×r_ℓ}, representing the rank-r_ℓ RHOSVD approximation of the ℓ-mode dominating subspaces (cost O(dRn min{R, n})).

2. Given the initial guess Z_0^{(ℓ)} (ℓ = 1, ..., d) for the ℓ-mode orthogonal matrices, perform k_max ALS iterations as at Step 2 in the general Alg. BTA to obtain the maximizer V^{(ℓ)} ∈ R^{n_ℓ×r_ℓ}, ℓ = 1, ..., d (cost O(d r^{d−1} n min{r^{d−1}, n}) per iteration).

3. Calculate the projections of U^{(ℓ)} onto the basis of the computed orthogonal vectors of V^{(ℓ)} as the matrix product V^{(ℓ)T} U^{(ℓ)} (ℓ = 1, ..., d), at the cost O(drRn).


Algorithm C BTA, complexity bound B. Khoromskij, Zurich 2010(L9) 225

4. Using the columns in V^{(ℓ)T} U^{(ℓ)} (ℓ = 1, ..., d), calculate the rank-R core tensor µ ∈ C_{R,r} as in Thm. 9.3, in O(drRn) operations and with O(drR) storage.

Rem. 9.2. Algorithm C BTA (C_{R,n} → T C_{R,r}) exhibits polynomial cost in R, r, n,

O(dRn min{n, R} + d r^{d−1} n min{r^{d−1}, n}),

with exponential scaling in d. In the absence of Step 2 (i.e., when RHOSVD already provides a satisfactory approximation), we have for any d ≥ 2 a finite SVD-based scheme with the error bound as in Thm. 7.3, see [4]. If the R-term canonical representation is only weakly redundant, then ‖ξ‖ will be of the same order as ‖A‖, and the relative error of RHOSVD becomes as good as that of HOSVD.

Old and new Tucker-type approximation methods B. Khoromskij, Zurich (L9) 226

1. Best orthogonal Tucker approx. for general V_n-input.

2. ALS-based rank reduction methods C_{R,n} → C_{r,n}, r < R.

3. Two-level version of BTA.

→ Polynomial cost O(d(R + rq)n min{n, R}) for C_{R,n}-input.

4. Multigrid rank-r Tucker approximation (MG-BTA):

→ Robustness: fast convergence of the ALS iteration, ensured by a good initial guess on all representation grids.

→ Linear scaling for C_{R,n}-input, O(dRrn).

→ Complexity reduction n^{d+1} → n^d for full format input.

→ Efficient rank-structured tensor realization of the convolution operator in R^d on large spatial grids (e.g., n^{⊗3}, n ≤ 3.2 · 10^4), to be discussed in the following lectures.


Improved RHOSVD for function related tensors B. Khoromskij, Zurich (L9) 227

Multigrid accelerated method, [BNK, V. Khoromskaia ’08]: [4].

Main idea:

– Solve the sequence of approximation problems for A_n = A_{n_m} with n = n_m := n_0 2^m, m = 0, 1, ..., M; the (R)HOSVD analysis of A_{n_0} is performed only on the coarse grid.

– Use the coarse-to-fine approximation of the dominating subspaces.

– Find the “most important fibers” (MIFs) of the ℓ-mode unfolding matrices on the coarse level(s), via the maximal energy principle.

– Run the ALS iteration on the reduced data set by choosing only a small set of the most representative MIFs of the ℓ-mode unfolding matrices.

Multilevel iteration with learning the data structure B. Khoromskij, Zurich 2010(L9) 228

Solving problems on a sequence of grids, n = n_0, ..., n_M:

1. The equidistant tensor grid ω_{d,n} := ω_1 × ω_2 × ··· × ω_d, where

ω_ℓ := {−A + (k − 1)h : k = 1, ..., n + 1} (ℓ = 1, ..., d)

with mesh size h = 2A/n, n = n_0 2^m, m = 0, 1, ..., M. A set of collocation points

x_k ∈ Ω ⊂ R^d, k ∈ I := {1, ..., n}^d,

located at the midpoints of the grid cells numbered by k ∈ I.

2. Given a continuous multivariate function f : Ω → R, the target tensor A_n = [a_{n,k}] ∈ R^I is defined as the trace of f on the set {x_k},

a_{n,k} = f(x_k), k ∈ I.

3. An “accurate” tensor-product prolongation operator I_{m−1→m} from the coarse to the fine grid (interpolation by piecewise linear/cubic splines).

4. Transfer the initial guess and the positions of the most important fibers (MIFs) from coarse to fine grids.


Algorithm MG C BTA B. Khoromskij, Zurich 2010(L9) 229

Algorithm MG C BTA (C_{R,n_M} → T C_{R,r}). Multigrid accelerated canonical-to-Tucker approximation. Set r = (r, ..., r) for ease of presentation.

1. Given A_m ∈ C_{R,n_m}, corresponding to a sequence of grid parameters n_m := n_0 2^m, m = 0, 1, ..., M. Fix a structural constant p = O(1) (i.e., pr ≪ r^{d−1}), the iteration parameter k_max, and the Tucker rank r.

2. For m = 0, solve C BTA (C_{R,n_0} → T C_{R,r}) and compute the index set J_{q,p}(n_0) ⊂ J_{r̄_q} via identification of the MIFs in the matrix unfolding B_{(q)}, q = 1, ..., d, using the maximum energy principle applied to the q-mode unfolding of the Tucker core, β_{(q)} = U^{(q)T} B_{(q)} ∈ R^{r_q×r̄_q}.

[Figure 20: d = 3: finding MIFs in the coarse level core β_{(q)}, q = 1, for the rank-R initial data on the coarse grid n_0 = (n_1, n_2, n_3). For explanatory reasons, B_{(q)} is presented in tensor form.]

Algorithm MG C BTA B. Khoromskij, Zurich 2010(L9) 230

3. For m = 1, ..., M, perform the cascadic MGA Tucker approximation by the restricted ALS iteration:

3a) Compute the initial orthogonal side matrices on level m by interpolation from level m − 1 (say, by using cubic splines),

V^{(q)} = V_m^{(q)} = I_{m−1→m}(V_{m−1}^{(q)}), q = 1, ..., d.

3b) For each q = 1, ..., d, fix V^{(ℓ)} (ℓ = 1, ..., d, ℓ ≠ q) and perform:

→ Compute the matrix products V^{(ℓ)T} U^{(ℓ)}, ℓ = 1, ..., d, ℓ ≠ q, and build the “restricted” q-mode matrix unfolding B_{(q,p)},

B_{(q,p)} = B_{(q)}|_{J_{q,p}(n_0)} ∈ R^{n_m×pr},

by calculating pr columns of the complete unfolding matrix B_{(q)} ∈ R^{n_m×r̄_q}.

→ Update the orthogonal matrix V^{(q)} = V_m^{(q)} ∈ R^{n_m×r} via computing the r-dimensional dominating subspace of the “restricted” matrix unfolding B_{(q,p)} (truncated SVD of an n_m × pr matrix).

4. If m = M, compute the rank-R core tensor β ∈ C_{R,r}, as in Step 3 of the basic algorithm C BTA (C_{R,n} → T C_{R,r}).


Algorithm MG C BTA B. Khoromskij, Zurich 2010(L9) 231

[Figure 21: MIFs: selected projections of the fibers in the coarse level cores for computing U^{(1)} (left), U^{(2)} (middle) and U^{(3)} (right). The example corresponds to the multigrid rank compression in the computation of the Hartree potential for the HO2 molecule, r = 14, p = 4.]

Thm. 9.4. Algorithm MG C BTA (C_{R,n_M} → T C_{R,r}) amounts to

O(dRrn_M + dp²r²n_M)

operations per ALS loop, plus the extra cost of the coarse mesh solver C BTA (C_{R,n_0} → T C_{R,r}). It requires O(drn_M + drR) storage to represent the result.

Algorithm MG C BTA B. Khoromskij, Zurich 2010(L9) 232

[Figure 22: Flow chart of Alg. MG C BTA for the rank-R target.]


Complexity of Algorithm MG C BTA B. Khoromskij, Zurich 2010(L9) 233

Proof. Step 3a) requires O(drn_m) operations and memory. Notice that for large m we have pr ≤ n_m; hence the complexity of the second part of Step 3b) is dominated by O(dRrn_m + p²r²n_m) per iteration loop, and the same holds for the first part of Step 3b).

The rank-R representation of β ∈ C_{R,r} requires O(drRn_m) operations and O(drR) storage. Summing up over the levels m = 0, ..., M proves the result.

Thm. 9.4 shows that Algorithm MG C BTA realizes a fast rank reduction method that scales linearly in d, n_M, R and r on the refined levels. Moreover, the complexity and error of the MGA Tucker approximation can be controlled by the adaptive choice of the governing parameters r, p, n_0 and k_max.

Numerics to Algorithm MG C BTA B. Khoromskij, Zurich 2010(L9) 234

[Figure 23: Linear scaling in R and in n (left). Plot of the SVD for the mode-1 matrix unfolding B_{(1,p)}, p = 4 (right).]

Exer. 9.1. Find the maximal size of 3-tensors which are tractable for Tucker approximation by the Tensor Toolbox. What is the extrapolated CPU time for n = 2048, scaled by O(n^4)?


Numerics to Algorithm MG C BTA B. Khoromskij, Zurich 2010(L9) 235

[Figure 24: Approximation error of the multigrid Tucker approximation vs. rank for the 3D Slater function e^{−‖x‖} (left), and for the third root of the electron density ρ^{1/3} of CH4 (right).]

Rem. 9.3. The multigrid BTA method also applies to full format tensors, reducing the cost from O(n^{d+1}) to O(n^d). For d = 3, an alternative approach to T-HOSVD can be based on the adaptive cross approximation [5].

Literature to Lecture 9 B. Khoromskij, Zurich 2010(L9) 236

1. J.D. Carroll, and J. Chang: Analysis of individual differences in multidimensional scaling. Psychometrika 35 (1970), 283-319.

2. L. De Lathauwer, B. De Moor, J. Vandewalle, On the best rank-1 and rank-(R1, ..., RN )

approximation of higher-order tensors. SIAM J. Matrix Anal. Appl., 21 (2000) 1324-1342.

3. T.G. Kolda, and B.W. Bader: Tensor decompositions and applications.

SIAM Review, 51/3 (2009), 455-500.

4. B.N. Khoromskij and V. Khoromskaia: Multigrid accelerated tensor approximation of function related multi-dimensional arrays. SIAM J. Sci. Comp. 31(4), 2009, 3002-3026.

5. I. Oseledets, D. Savostianov, and E. Tyrtyshnikov: Tucker dimensionality reduction

of three-dimensional arrays in linear time. SIMAX, 30(3), 939-956 (2008).

6. T. Zhang and G.H. Golub: Rank-one approximation to high order tensors. SIAM J. Matrix Anal. Appl.

23 (2001), 534-550.

URL: http://personal-homepages.mis.mpg.de/bokh


Lect. 10. Tensor Representation of Matrix-Valued Functions B. Khoromskij, Zurich 2010(L10) 237

Outlook of Lecture 10.

1. Examples of Matrix-Valued Functions (MVF), see [3] for

more detail.

2. Standard integral or iterative representations of MVFs.

3. MVFs as solution operators (SOs) of PDEs and matrix

equ.

4. Quadrature representation of the elliptic resolvent.

5. Quadrature error controls the error in operator norm.

6. Matrix exponential.

7. Newton iteration to compute A^{−1}.

8. Newton-Schultz iteration for sign(A) and √A.

Examples of matrix-valued functions B. Khoromskij, Zurich 2010(L10) 238

MVFs of the elliptic operator L (resp. matrix A) arise as the solution operators of elliptic, parabolic, hyperbolic equations, etc. Tensor-product representations exist for several classes of MVFs:

F(L) := L^{−α}, α > 0, elliptic inverse, preconditioning;

F(L) := e^{−tL}, parabolic solution operator;

F(L) := cos(t√L) L^{−k}, k ∈ N, regularized hyperbolic SO;

F(L) := ∫_0^∞ e^{−tL*} G e^{−tL} dt, SO of the matrix Sylvester eq.;

F(L) := sign(L), control theory, DFT.

Both the discrete elliptic inverse A^{−1} and the matrix exponential e^{−tA} play a key role in numerical PDEs. Usually MVFs are nonlocal; hence tensor-product formats are useful for their efficient representation even if d = 3.


Constructive representation of MVFs B. Khoromskij, Zurich 2010(L10) 239

There are different methods to represent MVFs (L = A):

• The case of diagonalisable matrices, i.e., A = T^{−1}DT with D = diag{d_1, ..., d_n} diagonal; one defines

F(A) = T^{−1}F(D)T, F(D) = diag{F(d_1), ..., F(d_n)}.

• The Dunford-Cauchy integral for analytic functions,

F(A) = (1/2πi) ∮_Γ F(z)(zI − A)^{−1} dz, Γ ⊂ C, σ(A) ⊂ int(Γ).

• A Laplace-type transform,

F(A) = ∫_{R_+} f(t) e^{−tA} dt.

• A trigonometric integral representation,

F(A) = ∫_R [a(t) cos(tA) + b(t) sin(tA)] dt.

• Polynomial expansions or/and nonlinear iteration.

MVFs as solution operators of PDEs and matrix eq. B. Khoromskij, Zurich 2010(L10) 240

Ex. 10.1. The solution operator to the initial value parabolic problem

∂u/∂t + Lu(t) = 0, u(0) = u_0 ∈ X, (67)

is given by

T(t; L) = e^{−tL} = (1/2πi) ∮_Γ e^{−zt}(zI − L)^{−1} dz,

where L is an elliptic (say, sectorial) operator (e.g., L = −Δ) in a Hilbert space X, u(t) is a vector-valued function u : R_+ → X, and Γ envelops σ(L). Given the initial vector u_0, then u(t) = T(t; L)u_0.

A simple example of a parabolic PDE is the heat equation,

u(0) = u_0 : ∂u/∂t − Δu = 0, u : R_+ × [0, 1]^d → R,

with the corresponding boundary conditions.


MVFs as solution operators of PDEs and matrix eq. B. Khoromskij, Zurich 2010(L10) 241

Ex. 10.2. The initial-value problem for the second order differential equation with an operator coefficient,

u″(t) + Lu(t) = 0, u(0) = u_0, u′(0) = 0,

has the solution operator

C(t; L) := cos(t√L) = (1/2πi) ∮_Γ cos(t√z)(zI − L)^{−1} dz

(the hyperbolic operator cosine family), u(t) = C(t; L)u_0. It represents the function-to-operator map cos(t√·) → C(t; L).

An example of a hyperbolic PDE is the classical wave eq.,

∂²u/∂t² − Δ_x u = 0, x ∈ R^d,

subject to the corresponding boundary and initial conditions.

MVFs as solution operators of PDEs and matrix eq. B. Khoromskij, Zurich 2010(L10) 242

Ex. 10.3. For the boundary value problem
$$\frac{d^2u}{dx^2} - Lu = 0, \qquad u(0) = 0, \quad u(1) = u_1, \qquad (68)$$
in a Hilbert space $X$, the solution operator is the normalised hyperbolic operator sine family
$$E(x;L) := \big(\sinh(\sqrt{L})\big)^{-1}\sinh(x\sqrt{L}) = \frac{1}{2\pi i}\int_\Gamma \frac{\sinh(x\sqrt{z})}{\sinh(\sqrt{z})}\,(zI - L)^{-1}\,dz,$$
so that $u(x) = E(x;L)u_1$.
A simple PDE of type (68) is the Laplace equation in a cylinder:
$$\frac{d^2u}{dx^2} + \frac{d^2u}{dy^2} = 0, \qquad x \in [0,1], \ y \in [c,d], \qquad u(0,y) = 0, \quad u(1,y) = u_1(y).$$
Rem. 10.1. Representations 10.1-10.3 allow one to avoid time stepping (parallel-in-time computations).


MVFs as solution operators of PDEs and matrix eq. B. Khoromskij, Zurich 2010(L10) 243

Ex. 10.4. In the case of the Sylvester matrix equation
$$AX + XB = G \qquad (A, B, G \in \mathbb{R}^{n\times n} \text{ given}),$$
the solution $X \in \mathbb{R}^{n\times n}$ is given by
$$X = F(A,B)G := \int_0^\infty e^{-tA}\,G\,e^{-tB}\,dt,$$
provided that $A, B$ ensure the existence of the integral (cf. Lect. 8).
The (nonlinear) Riccati matrix equation
$$AX + XA^\top + XFX = G,$$
where $A, F, G \in \mathbb{R}^{n\times n}$ are given and $X \in \mathbb{R}^{n\times n}$ is the unknown matrix, can be solved by Newton's iteration. At each iteration step a Lyapunov equation has to be solved ($X_k \to X$):
$$(A - FX_k)X_{k+1} + X_{k+1}(A - FX_k)^\top = -X_kFX_k + G.$$
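The following is a minimal MATLAB sketch of this Newton iteration (assuming the function lyap from the Control System Toolbox is available; the zero initial guess, tolerance and iteration cap are illustrative choices, not prescribed by the lecture).

function X = riccati_newton(A, F, G)
% Newton iteration for A*X + X*A' + X*F*X = G; each step solves the
% Lyapunov equation (A - F*Xk)*Xn + Xn*(A - F*Xk)' = -Xk*F*Xk + G.
X = zeros(size(A));                        % initial guess (assumption)
for k = 1:50
    Ak = A - F*X;                          % current linearization
    Xn = lyap(Ak, X*F*X - G);              % lyap(Ak,Q) solves Ak*Xn + Xn*Ak' + Q = 0
    if norm(Xn - X, 'fro') <= 1e-12*norm(Xn, 'fro'), X = Xn; return; end
    X = Xn;
end
X = Xn;
end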

MVFs as solution operators of PDEs and matrix eq. B. Khoromskij, Zurich 2010(L10) 244

Ex. 10.5. Let $A \in \mathbb{R}^{n\times n}$ be a matrix whose spectrum $\sigma(A)$ does not intersect the imaginary axis. The matrix function $F(A) = \mathrm{sign}(A)$ is defined by
$$\mathrm{sign}(A) := \frac{1}{\pi i}\int_{\Gamma_+} (zI - A)^{-1}\,dz - I \qquad (69)$$
with $\Gamma_+$ being any simply connected closed curve in $\mathbb{C}$ whose interior contains all eigenvalues of $A$ with positive real part.
The tensor representation of the MVF $\mathrm{sign}(A)$ (the rank depends on the grid size) can be based on an efficient quadrature for the integral
$$\mathrm{sign}(A) = \frac{1}{c_f}\int_{\mathbb{R}_+} \frac{f(tA)}{t}\,dt.$$
The MVF $\mathrm{sign}(A)$ arises in optimization theory, in DFT in computational quantum chemistry, and in linear algebra.


MVFs as solution operators of PDEs and matrix eq. B. Khoromskij, Zurich 2010(L10) 245

Ex. 10.6. A negative fractional power of $A$ is represented by
$$A^{-\sigma} = \frac{1}{\Gamma(\sigma)}\int_0^\infty t^{\sigma-1}e^{-tA}\,dt, \qquad \sigma > 0, \qquad (70)$$
provided that the integral exists.
If $A = -\Delta$, (70) is of particular interest in the cases:
(a) $\sigma = 1$: Laplacian inverse,
(b) $\sigma = 1/2$: preconditioning for the Laplace-Beltrami operator $(-\Delta)^{1/2}$ and for the hypersingular integral operator in BEM,
(c) $\sigma = 2$: inverse biharmonic operator.
A positive fractional power of $A$, say $A^\alpha$, $0 < \alpha < 1$, can be represented by the simple factorisation $A^\alpha = A\,A^{\alpha-1}$, where $A^{\alpha-1}$ is a negative fractional power covered by (70).

Classes of elliptic operators B. Khoromskij, Zurich 2010(L10) 246

The elliptic operator $A : V \to V'$ with $V = H_0^1(\Omega)$, $V' = H^{-1}(\Omega)$,
$$A = \sum_{j=1}^d \Big(-\frac{\partial}{\partial x_j}a_j(x_j)\frac{\partial}{\partial x_j} + b_j(x_j)\frac{\partial}{\partial x_j} + c_j(x_j)\Big),$$
is supposed to have "separable" coefficients. The associated bilinear form (with $c(x) = \sum c_j(x_j)$),
$$a(u,v) = \int_\Omega \Big(\sum_{j=1}^d a_j(x)\frac{\partial u}{\partial x_j}\frac{\partial v}{\partial x_j} + \sum_{j=1}^d b_j(x)\frac{\partial u}{\partial x_j}v + c(x)uv\Big)\,dx,$$
with $a : V \times V \to \mathbb{R}$, is assumed to be continuous and $V$-elliptic:
$$|a(u,v)| \le C\|u\|_V\|v\|_V, \qquad \Re e\, a(v,v) \ge \delta_0\|v\|_V^2, \quad \delta_0 > 0.$$
In the tensor-product setting we have $(x_1, \dots, x_d) \in \Omega := (0,1)^d \subset \mathbb{R}^d$.


Classes of elliptic operators B. Khoromskij, Zurich 2010(L10) 247

Let $X = L^2(\Omega)$; then the corresponding elliptic operator $L$ and its discrete counterpart $A$ (say, $A$ is the FEM/FD stiffness matrix corresponding to $L$) satisfy
$$\|(zI - A)^{-1}\|_{X\leftarrow X} \le \frac{1}{|z|\sin(\theta_1 - \theta)} \qquad \forall\, z \in \mathbb{C} : \ \theta_1 \le |\arg z| \le \pi, \qquad (71)$$
for any $\theta_1 \in (\theta, \pi)$, where $\cos\theta = \delta_0/C$.
In the case of discrete elliptic operators $A$, the bound (71) on the matrix resolvent is valid uniformly in the mesh size $h$.
The variation in the r.h.s. of (71) is proportional to $\mathrm{cond}(A)$.
We consider tensor approximations to
$$A^{-1}, \quad \exp(-tA), \quad \sqrt{A}, \quad \mathrm{sign}(A) \ \text{(not analytic)}.$$

Tensor approximation of MVF quadratures for Laplace transform B. Khoromskij, Zurich 2010(L10) 248

Assume that for given $f(\rho)$, $\rho \in [1, R]$, and $\varepsilon_R > 0$, there is an accurate $r$-term approximation by an exponential sum,
$$f_r(\rho) := \sum_{k=1}^r a_k e^{-b_k\rho} \quad \text{s.t.} \quad |f(\rho) - f_r(\rho)| \le \varepsilon_R, \quad \rho \in [1, R]. \qquad (72)$$
The question is how accurately the ansatz $f_r(A)$ represents the matrix-valued function $f(A)$.
We consider two cases:
(A) A real-diagonalisable matrix $A$, i.e., $A = T^{-1}DT$ with a diagonal $D = \mathrm{diag}\{d_1, \dots, d_n\}$, where $d_i \in [1, R]$.
(B) The analytic function $f$ has the Dunford-Cauchy integral representation ($\Gamma$ "envelopes" $\sigma(A)$):
$$f(A) = \frac{1}{2\pi i}\int_\Gamma f(z)(zI - A)^{-1}\,dz.$$


Tensor approximation of MVF quadratures for Laplace transform B. Khoromskij, Zurich 2010(L10) 249

Lem. 10.1. In Case (A) we have
$$\|f(A) - f_r(A)\| \le \|T\|\,\|T^{-1}\|\,\varepsilon_R.$$
In Case (B), let (72) hold with $\varepsilon_R = g(\rho)\varepsilon_\Gamma$, at least for $\rho = z \in \Gamma$. Then we have
$$\|f(A) - f_r(A)\| \le \frac{\varepsilon_\Gamma}{2\pi}\,\max_{z\in\Gamma}|g(z)|\int_\Gamma \big\|(zI - A)^{-1}\big\|\,d|z|.$$
In the case of a discrete elliptic operator $A$, we have
$$\int_\Gamma \big\|(zI - A)^{-1}\big\|\,d|z| \le C\log\frac{|\lambda_{\max}|}{|\lambda_{\min}|}, \qquad \lambda_{\max}, \lambda_{\min} \in \sigma(A),$$
where $C$ depends on the ellipticity and continuity constants of the related operator $A$.

Tensor approximation of MVF quadratures for Laplace transform B. Khoromskij, Zurich 2010(L10) 250

Proof: In Case (A), we readily obtain
$$\|f(A) - f_r(A)\| = \|T^{-1}\,\mathrm{diag}\{f_1, \dots, f_n\}\,T\|$$
with $f_i = f(d_i) - f_r(d_i)$, which proves the statement. If $T$ is a unitary transform, then $\|T\| = \|T^{-1}\| = 1$.
In Case (B), there holds
$$\|f(A) - f_r(A)\| = \frac{1}{2\pi}\left\|\int_\Gamma \Big[f(z) - \sum_{k=1}^r a_ke^{-b_kz}\Big](zI - A)^{-1}\,dz\right\| \le \frac{\varepsilon_\Gamma}{2\pi}\int_\Gamma |g(z)|\,\big\|(zI - A)^{-1}\big\|\,d|z|,$$
which proves the general assertion. Finally, in the case of discrete elliptic operators we choose $\Gamma$ in such a way that $\|(zI - A)^{-1}\| \le \frac{C}{|z|}$ (see (71)), to obtain
$$\int_\Gamma \big\|(zI - A)^{-1}\big\|\,d|z| \le C\int_\Gamma \frac{d|z|}{|z|}.$$


Low tensor rank Laplacian inverse B. Khoromskij, Zurich 2010(L10) 251

Ex. 10.7. In the case of the FD negative Laplacian $\Delta_{(d)}$ on $H_0^1([0,1]^d)$, for $d = 2, 3, \dots$, the matrix $\Delta_{(d)}^{-1}$ is approximated in the rank-$R$ Kronecker format,
$$L_M := \sum_{k=-M}^M c_k\bigotimes_{\ell=1}^d \exp(-t_k\Delta^{(\ell)}) \approx (\Delta_{(d)})^{-1}, \qquad \Delta^{(\ell)} = \Delta \in \mathbb{R}^{n\times n}$$
($t_k, c_k \in \mathbb{R}$), providing exponential convergence in $R = 2M + 1$.
In particular, taking
$$t_k = e^{kh}, \qquad c_k = ht_k, \qquad h = \pi/\sqrt{M},$$
leads to the convergence rate
$$\big\|(\Delta_{(d)})^{-1} - L_M\big\| \le Ce^{-\pi\sqrt{M}}.$$
Hence the $\varepsilon$-rank of $(\Delta_{(d)})^{-1}$ is of order $O(|\log\varepsilon|^2)$, uniformly in $d$ (compare with Ex. 8.1, Lect. 8).

Low tensor rank Laplacian inverse B. Khoromskij, Zurich 2010(L10) 252

The matrix-vector multiplication of $L_M$ with a rank-1 vector in $\mathbb{R}^{n^{\otimes d}}$ takes $O(dRn\log n)$ operations via the diagonalization
$$\exp(-t_k\Delta^{(\ell)}) = F_\ell^T\cdot D_\ell\cdot F_\ell, \qquad D_\ell = \mathrm{diag}\{e^{-t_k\lambda_1^{(\ell)}}, \dots, e^{-t_k\lambda_n^{(\ell)}}\},$$
where $F_\ell$ is the $\ell$-mode sine-transform matrix of size $n$, and $\lambda_i^{(\ell)}$ $(i = 1, \dots, n)$ are the eigenvalues of the 1D Laplacian $\Delta^{(\ell)}$.
This also allows a rank reduction scheme by using the tensor-product FFT,
$$A^{-1} \approx \Big(\bigotimes_{\ell=1}^d F_\ell^T\Big)\Big(\sum_{k=-M}^M c_k\bigotimes_{\ell=1}^d \mathrm{diag}\{e^{-t_k\lambda_1^{(\ell)}}, \dots, e^{-t_k\lambda_n^{(\ell)}}\}\Big)\Big(\bigotimes_{\ell=1}^d F_\ell\Big).$$
Rem. 10.2. The above decomposition is similar to the sinc approximation of the Hilbert tensor, applied to the $n^{\otimes d}$ tensor $\Lambda = [1/(\lambda_{i_1}^{(1)} + \dots + \lambda_{i_d}^{(d)})]$.
Exer. 10.1. Construct a fast Poisson solver on the uniform grid over $[0,1]^d$ by the rank decomposition of the Hilbert tensor $\Lambda$. Test the case $f = 1$, $d = 3$, for different grid sizes $n$ and truncation ranks; a sketch of the main ingredient is given below.
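A minimal MATLAB check of the sinc quadrature behind Ex. 10.7 and Rem. 10.2: the reciprocal of the 3D Kronecker-sum eigenvalue tensor is approximated by 2M + 1 separable terms (the grid size n and quadrature parameter M are illustrative choices).

n = 32; M = 60;
h1  = 1/(n+1);
lam = (4/h1^2) * sin((1:n)'*pi*h1/2).^2;   % eigenvalues of the 1D FD Laplacian
hq  = pi/sqrt(M);                          % quadrature step h = pi/sqrt(M)
[L1, L2, L3] = ndgrid(lam, lam, lam);
Lam = L1 + L2 + L3;                        % the Hilbert-type tensor is 1./Lam
Approx = zeros(size(Lam));
for k = -M:M
    tk = exp(k*hq); ck = hq*tk;            % t_k = e^{kh}, c_k = h*t_k
    Approx = Approx + ck*exp(-tk*Lam);     % one rank-1 (separable) term
end
max(abs(Approx(:) - 1./Lam(:)) .* Lam(:))  % relative error, about exp(-pi*sqrt(M))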


Matrix exponentials B. Khoromskij, Zurich 2010(L10) 253

Ex. 10.8. The matrix exponential can be defined and then calculated by
$$\exp(A) := \sum_{k=0}^\infty \frac{1}{k!}A^k \approx E_N := \sum_{k=0}^{N-1}\frac{1}{k!}A^k. \qquad (73)$$
This approximation converges exponentially (if $N$ is large enough, say, $N \ge e\|A\|$):
$$\|E_N - \exp(A)\| \le \sum_{k=N}^\infty \frac{1}{k!}\|A\|^k \le \frac{C(\|A\|)}{N!} \approx \Big(\frac{e\|A\|}{N}\Big)^N.$$
The Horner scheme to calculate (73) requires only $N - 1$ matrix multiplications:
$$A_N := I; \quad \text{for } k = N-1 \text{ downto } 1 \text{ do } A_k := \frac{1}{k}A_{k+1}A + I,$$
such that $E_N := A_1$.
If $\|A\| > 1$, the algorithm (73) may produce very large terms for intermediate values of $k$!

Matrix exponentials B. Khoromskij, Zurich 2010(L10) 254

Recall that for commuting matrices $A, B$ we have $\exp(A + B) = \exp(A)\exp(B)$; in particular, $\exp(A) = [\exp(A/2)]^2$.
Now the algorithm (73) can be modified as follows:
(a) Choose $n$ such that $\frac{1}{2^n}\|A\| \le 1$.
(b) Compute $B = \exp(A/2^n)$ by algorithm (73).
(c) Compute $\exp(A) = B^{2^n}$ in $n \approx \log_2(\|A\|)$ matrix squarings.
If $B = \exp(A/2^n)$ can be represented in a certain data-sparse format (e.g., Kronecker product form), then truncating all the intermediate products $B^{2^m}$, $m = 1, \dots, n$, into the fixed format $S$ leads to the desired tensor representation of $\exp(A)$.
In this case, the truncation error analysis is an open question.
Modification to the case of many exponentials: see the lectures below.
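A minimal MATLAB sketch combining the Horner scheme for (73) with the scaling-and-squaring steps (a)-(c); the choice of N and the use of the 1-norm are illustrative assumptions.

function E = expm_taylor_ss(A, N)
% Scaling: choose s with ||A/2^s|| <= 1 (step (a)).
s = max(0, ceil(log2(norm(A, 1))));
B = A / 2^s;
% Horner scheme for E_N(B) (step (b)): A_N = I, A_k = (1/k)*A_{k+1}*B + I.
E = eye(size(A));
for k = N-1:-1:1
    E = (1/k) * E * B + eye(size(A));
end
% Repeated squaring (step (c)); in a tensor-formatted version each
% product would be truncated back into the format S.
for m = 1:s
    E = E * E;
end
end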


Tensor truncated nonlinear iteration B. Khoromskij, Zurich 2010(L10) 255

Ex. 10.9. In some cases iterative schemes (with possible recompression at each iteration) can be applied.
(A) An approximation to $A^{-1}$: given $X_0 \in \mathbb{R}^{n\times n}$, the Newton-Schulz iteration
$$X_{k+1} = X_k(2I - AX_k), \qquad k = 1, 2, \dots \qquad (74)$$
converges to $A^{-1}$ locally quadratically (cf. the analysis below).
Iteration (74) is nothing but the Newton method
$$\Psi'(X_k)(X_{k+1} - X_k) = -\Psi(X_k)$$
for solving the nonlinear matrix equation
$$\Psi(X) := A - X^{-1} = 0.$$
In fact, $\Psi(X + \delta) - \Psi(X) = X^{-1}\delta(X + \delta)^{-1}$, providing $\Psi'(X_k)(\delta) = X_k^{-1}\delta X_k^{-1}$. Now (74) follows from
$$X_{k+1} - X_k = -X_k(A - X_k^{-1})X_k.$$
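A minimal MATLAB sketch of iteration (74); the starting guess X0 = A'/(||A||_1 ||A||_inf), which guarantees rho(I - A*X0) < 1 for nonsingular A, is a standard choice and an assumption added here.

function X = newton_schulz_inv(A, tol)
X = A' / (norm(A,1) * norm(A,inf));        % starting guess (assumption)
I = eye(size(A));
while norm(I - A*X, 'fro') > tol
    X = X * (2*I - A*X);                   % iteration (74)
end
end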

Analysis of iterative schemes B. Khoromskij, Zurich 2010(L10) 256

Analysis of the Newton-Schulz iteration (74) to compute $A^{-1}$.
Denote the residual error by $E_k = I - AX_k$, $k = 0, 1, 2, \dots$. It is easy to see that
$$X_{k+1} = X_k(I + E_k), \qquad k = 0, 1, 2, \dots,$$
which implies (for $k = 1, 2, \dots$)
$$E_k = I - AX_{k-1}(I + E_{k-1}) = I - (I - E_{k-1})(I + E_{k-1}) = E_{k-1}^2. \qquad (75)$$
Applying (75) recursively, we find that
$$E_k = E_0^{2^k}, \qquad k = 1, 2, \dots. \qquad (76)$$
It is also clear that
$$A^{-1} - X_k = A^{-1}E_k = A^{-1}E_0^{2^k} = X_0(I - E_0)^{-1}E_0^{2^k}.$$


Analysis of iterative schemes B. Khoromskij, Zurich 2010(L10) 257

Under the assumption on the spectral radius of $E_0$,
$$\rho \equiv \rho[E_0] = \max_j|\lambda_j| < 1,$$
where $\lambda_j = \lambda_j(E_0)$ are the eigenvalues of $E_0$, we obtain that the error $E_k$ in (76) vanishes like $\rho^{2^k}$.
Rem. 10.3. The iteration (74) can be applied to any preconditioned matrix $B = R_0A$, where $R_0$ is a spectrally equivalent preconditioner to $A$, so that $\sigma(B)$ is uniformly bounded in $n$. Assuming that both $R_0$ and $R_0A$ already have an $S$-structured representation, we then obtain the approximate inverse of interest from
$$A^{-1} = (R_0A)^{-1}R_0.$$
In some cases this approach provides a constructive proof of the existence of the $S$-matrix inverse.

Analysis of iterative schemes B. Khoromskij, Zurich 2010(L10) 258

Let $E_0 = I - BX_0$. The requirement $\rho[E_0] < 1$ can be achieved under the following conditions.
Lem. 10.2. Let $B$ have real eigenvalues in the interval $0 < m \le \lambda_j \le M$, $j = 1, 2, \dots, n$, and let $X_0(w) = wI$. Then $\rho[E_0] < 1$ for all $w \in (0, 2/M)$. Moreover, if $\rho(w) = \rho[E_0(w)]$, then there holds
$$\rho(w^*) = \min_{w\in(0,\,2/M)}\rho(w) = \frac{M-m}{M+m} < 1, \qquad w^* = \frac{2}{M+m}. \qquad (77)$$
Proof. This lemma is a reformulation of a standard convergence result for the Richardson iteration.
Implementing (74) efficiently in the formatted $S$-arithmetic, one can compute an $S$-approximation $X_k$ to $A^{-1}$ within $O(\log\log\varepsilon^{-1})$ iterations, where $\|I - AX_k\| \le \varepsilon$.


Iterative schemes to compute sign(A) and sqrt(A) B. Khoromskij, Zurich 2010(L10) 259

(B1) Newton-Schulz iteration scheme to approximate $\mathrm{sign}(A)$:
$$X_{k+1} = X_k + \frac{1}{2}\big[I - (X_k)^2\big]X_k, \qquad X_0 = A/\|A\|_2. \qquad (78)$$
For diagonalisable matrices we have locally quadratic convergence $X_k \to \mathrm{sign}(A)$ (see the analysis below). Many methods are presented in [3].
This scheme has already been successfully applied in many-particle calculations.
The above-mentioned schemes (a) and (b) are especially efficient in the case $q = 2$, since the optimal SVD or ACA recompression in the (H)KT format can be applied.
(B2) Newton's method to calculate $\mathrm{sign}(A)$. The iteration
$$X_0 = A, \qquad X_{k+1} = \frac{1}{2}\big(X_k + X_k^{-1}\big) \qquad (79)$$
converges (locally quadratically) to $\mathrm{sign}(A)$; a sketch is given below.
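A minimal MATLAB sketch of Newton's iteration (79); the tolerance and the iteration cap are illustrative choices, and A is assumed to have no eigenvalues on the imaginary axis.

X = A;
for k = 1:50
    Xn = (X + inv(X)) / 2;                 % iteration (79)
    if norm(Xn - X, 'fro') <= 1e-12*norm(Xn, 'fro'), break; end
    X = Xn;
end
S = Xn;                                    % S approximates sign(A)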

Iterative schemes to compute sign(A) and sqrt(A) B. Khoromskij, Zurich 2010(L10) 260

The iterative calculation may not be very simple!
(C) Newton iteration to compute the square root $A^{1/2}$ of a symmetric positive definite matrix $A$: given $X_0$, the iteration
$$X_k\Delta_k + \Delta_kX_k = A - X_k^2, \qquad (80)$$
where $\Delta_k = X_{k+1} - X_k$, converges to $A^{1/2}$ quadratically (locally). It requires solving a matrix Lyapunov equation at each step.
This scheme can be considered as the Newton iteration to solve the nonlinear matrix equation
$$\Psi(X) := A - X^2 = 0.$$
Clearly,
$$\Psi(X + \delta) - \Psi(X) = -X\delta - \delta X - \delta^2,$$
so our iteration can be interpreted as the Newton method for solving $\Psi(X) = 0$ (see Lect. 11 for the analysis of truncated iterations).


Literature to Lecture 10 B. Khoromskij, Zurich 2010(L10) 261

1. I. Gavrilyuk, W. Hackbusch and B.N. Khoromskij: Tensor-Product Approximation to Elliptic and Parabolic Solution Operators in Higher Dimensions. Computing 74 (2005), 131-157.
2. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Approximate Iteration for Structured Matrices. J. Numer. Math., Vol. 13, No. 2 (2005), 119-156.
3. N.J. Higham: Functions of Matrices: Theory and Computation. SIAM, 2008.
URL: http://personal-homepages.mis.mpg.de/bokh

Lect. 11. Truncated iteration. Tensor convolution and other bilinear operations. B. Khoromskij, Zurich 2010(L11) 262

Outlook of Lecture 11.

1. Tensor truncated nonlinear iteration: an attempt at a convergence theory.
2. Example of SVD-based truncation methods for $A^{-1}$.
3. Newton-type iterations for $\mathrm{sign}(A)$ and $\sqrt{A}$.
4. Multivariate convolution: error bound for the projection collocation scheme. Richardson extrapolation.
5. Discrete $d$-dimensional convolution in various tensor formats.
6. Tensorization of Hadamard, scalar & contracted products.
7. Beyond additive dimension splitting: what is the next generation of tensor methods?


On convergence rate for fixed-point iteration B. Khoromskij, Zurich 2010(L11) 263

Let $V$ be a normed space (e.g., of $n\times n$ matrices) and consider a function $f : V \to V$. Assume that $A \in V$ and $B := f(A)$ can be obtained by the locally convergent fixed-point iteration
$$\text{Given } X_0 \in V, \qquad X_k = \Phi(X_{k-1}), \quad k = 1, 2, \dots, \qquad (81)$$
where $\Phi : V \to V$ is a one-step operator,
$$\lim_{k\to\infty} X_k = B = \Phi(B). \qquad (82)$$
Lem. 11.1. ([1]) Assume there are constants $c_\Phi, \varepsilon_\Phi > 0$ s.t.
$$\|\Phi(X) - B\| \le c_\Phi\|X - B\|^\alpha \quad \forall X : \|X - B\| \le \varepsilon_\Phi, \qquad (83)$$
and set $\alpha = 2$, $\varepsilon := \min(\varepsilon_\Phi, 1/c_\Phi)$. Then (82) holds for any $X_0$ satisfying $\|X_0 - B\| < \varepsilon$, and, moreover,
$$\|X_k - B\| \le c_\Phi^{-1}\big(c_\Phi\|X_0 - B\|\big)^{2^k} \qquad (k = 0, 1, 2, \dots). \qquad (84)$$

On convergence rate for fixed-point iteration B. Khoromskij, Zurich 2010(L11) 264

Proof: Let $e_k := \|X_k - B\|$. Then, due to (83),
$$e_k \le c_\Phi e_{k-1}^2, \quad \text{provided that } e_{k-1} \le \varepsilon_\Phi. \qquad (85)$$
By (85), $e_{k-1} \le \varepsilon \le \varepsilon_\Phi$ implies $e_k \le c_\Phi\varepsilon^2 = \varepsilon(c_\Phi\varepsilon) \le \varepsilon$. Hence all iterates stay in the $\varepsilon$-neighbourhood of $B$.
(84) is proved by induction:
$$e_k \overset{(85)}{\le} c_\Phi e_{k-1}^2 \overset{\text{induction hypothesis}}{\le} c_\Phi\Big(c_\Phi^{-1}(c_\Phi e_0)^{2^{k-1}}\Big)^2 = c_\Phi^{-1}(c_\Phi e_0)^{2^k}.$$
Whenever $e_0 < \varepsilon$, (84) shows $e_k \to 0$.
Rem. 11.1. (84) together with $e_0 \le \varepsilon$ implies monotonicity:
$$\|X_k - B\| \le \|X_{k-1} - B\|. \qquad (86)$$
Rem. 11.2. Condition (83) with $\alpha = 2$ is valid for the Newton iteration.


Quadratic convergence of truncated iteration B. Khoromskij, Zurich 2010(L11) 265

Let $S \subset V$ be a subset (not necessarily a subspace) considered as a class of certain structured elements (e.g., tensor-structured matrices), and suppose that $R : V \to S$ is an operator mapping elements from $V$ onto suitable structured approximants in $S$. We call $R$ a truncation operator.
Define a truncated iterative process as follows:
$$Y_0 := R(X_0), \qquad Y_k := R(\Phi(Y_{k-1})), \quad k = 1, 2, \dots \qquad (87)$$
Thm. 11.2. ([1]) Under the premises of Lem. 11.1, assume that
$$\|X - R(X)\| \le c_R\|X - B\| \quad \forall X : \|X - B\| \le \varepsilon_\Phi. \qquad (88)$$
Then there exists $\delta > 0$ such that the truncated iteration (87) converges to $B$ so that for $k = 1, 2, \dots$
$$\|Y_k - B\| \le c_{R\Phi}\|Y_{k-1} - B\|^2 \quad \text{with } c_{R\Phi} := (c_R + 1)c_\Phi \qquad (89)$$

Quadratic convergence of truncated iteration B. Khoromskij, Zurich 2010(L11) 266

for any starting value $Y_0 = R(Y_0)$ satisfying $\|Y_0 - B\| < \delta$.
Proof: Let $\varepsilon := \min(\varepsilon_\Phi, 1/c_\Phi)$ and define $Z_k = \Phi(Y_{k-1})$. By (86) we have
$$\|Z_k - B\| \le \|Y_{k-1} - B\|,$$
provided that $\|Y_{k-1} - B\| \le \varepsilon$. Then
$$\|Y_k - B\| = \|R(Z_k) - Z_k + Z_k - B\| \le (c_R + 1)\|Z_k - B\|. \qquad (90a)$$
Assuming $\|Y_{k-1} - B\| \le \varepsilon$, the bounds $\varepsilon \le \varepsilon_\Phi$ and (83) ensure
$$\|Z_k - B\| = \|\Phi(Y_{k-1}) - B\| \le c_\Phi\|Y_{k-1} - B\|^2. \qquad (90b)$$
Combining (90a) and (90b), we obtain (89) for any $k$, provided that $\|Y_{k-1} - B\| \le \varepsilon$.
As in the proof of Lem. 11.1, the choice $\delta := \min(\varepsilon, 1/c_{R\Phi})$ guarantees that $\|Y_0 - B\| \le \delta$ implies $\|Y_k - B\| \le \delta \le \varepsilon$, $k \in \mathbb{N}$.


Some remarks on truncated FP iteration B. Khoromskij, Zurich 2010(L11) 267

Cor. 11.3. Under the assumptions of Thm. 11.2, any starting value $Y_0$ with $\|Y_0 - B\| \le \delta$ leads to
$$\|Y_k - B\| \le c_{R\Phi}^{-1}\big(c_{R\Phi}\|Y_0 - B\|\big)^{2^k} \qquad (k = 1, 2, \dots), \qquad (91)$$
where $c_{R\Phi}$ and $\delta$ are defined as above.
The condition (88) has a clear geometrical meaning. If
$$R(X) := \operatorname{argmin}\{\|X - Y\| : Y \in S\}$$
is the best approximation to $X$ in the given norm, inequality (88) holds with $c_R = 1$, since $B \in S$. Therefore, (88) with $c_R \ge 1$ can be viewed as a quasi-optimality condition.
Rem. 11.3. If the norm is defined by a scalar product, $S$ is a subspace, and $R(X)$ is the orthogonal projection onto $S$, then (88) is obviously fulfilled with $c_R = 1$. If $\alpha = 1$ in (83), i.e., linear convergence, the truncated process retains the linear convergence rate, provided that $(c_R + 1)c_\Phi < 1$.

Truncated Newton iteration to compute A−1 B. Khoromskij, Zurich 2010(L11) 268

We analyse the case of second-order tensors ($d = 2$):
$$A \approx A_r = \sum_{k=1}^r U_k \otimes V_k, \qquad U_k \in \mathbb{R}^{m\times m}, \ V_k \in \mathbb{R}^{n\times n}.$$
Recall (see Def. 6.1, Lect. 6) that for a matrix $A \in \mathbb{R}^{m\times n}$ we use the vector representation $A \to \mathrm{vec}(A) \in \mathbb{R}^{mn}$, where $\mathrm{vec}(A)$ is the $mn \times 1$ vector obtained by "stacking" the columns of $A$:
$$\mathrm{vec}(A) := [a_{11}, \dots, a_{m1}, a_{12}, \dots, a_{mn}]^T;$$
$\mathrm{vec}(A)$ is a rearranged version of $A$. Introduce the linear invertible operator $\mathcal{L} : \mathbb{R}^{mn\times mn} \to \mathbb{R}^{m^2\times n^2}$ ($\mathcal{L} = P$, cf. Lect. 8) by
$$\mathcal{L}(A_r) \equiv \widetilde{A}_r := \sum_{k=1}^r \mathrm{vec}(V_k)\cdot\mathrm{vec}(U_k)^T.$$
$\mathcal{L}$ is unitary with respect to the spectral or Frobenius norm.


Truncated Newton iteration to compute A−1 B. Khoromskij, Zurich 2010(L11) 269

Making use of the transform $\mathcal{L}$ allows one to reduce the low Kronecker-rank approximation of $A$ to the low-rank approximation of $\widetilde{A}$. For fixed $r$ one may apply a truncation operator $R$ of the form
$$R(A) := \mathcal{L}^{-1}\big(\Pi_r(\mathcal{L}(A))\big), \qquad I - R = \mathcal{L}^{-1}(I - \Pi_r)\mathcal{L},$$
where $\Pi_r(B)$ is the best rank-$r$ approximation to $B \in \mathbb{R}^{m^2\times n^2}$ in the given norm (say, the spectral or Frobenius norm).
We formulate the general statement.
Let $B = F(A)$ be defined by the given matrix-valued function $F$, and let $R$ be the truncation operator that satisfies (88) for all $X$ in a "small" neighbourhood $S(B)$ of $B$.
In particular, we consider $F(A) = A^{-1}$.

Truncated Newton iteration to compute A−1 B. Khoromskij, Zurich 2010(L11) 270

Introduce the modified (truncated) Newton-Schulz iteration
$$Z_{k+1} = X_k(2I - AX_k), \qquad X_{k+1} = R(Z_{k+1}), \quad k = 1, 2, \dots \qquad (92)$$
Thm. 11.4. Let (88) be satisfied. Then for any initial guess $X_0 = R(X_0) \in S(B)$, the truncated Newton-Schulz iteration (92) converges to $A^{-1}$ quadratically:
$$\|A^{-1} - X_{k+1}\| \le (1 + C_R)\,\|A\|\,\|A^{-1} - X_k\|^2, \qquad k = 1, 2, \dots$$
Proof. Note that (88) leads to
$$B \equiv A^{-1} = R(A^{-1}).$$
Now eq. (92) implies $A^{-1} - Z_{k+1} = (A^{-1} - X_k)A(A^{-1} - X_k)$, which yields
$$\|A^{-1} - Z_{k+1}\| \le \|A\|\,\|A^{-1} - X_k\|^2. \qquad (93)$$


Truncated Newton iteration to compute A−1 B. Khoromskij, Zurich 2010(L11) 271

On the other hand, (88) implies
$$\|X_k - Z_k\| = \|R(Z_k) - Z_k\| \le C_R\|A^{-1} - Z_k\|,$$
hence the triangle inequality leads to
$$\|A^{-1} - X_k\| \le \|A^{-1} - Z_k\| + \|Z_k - X_k\| \le (1 + C_R)\|A^{-1} - Z_k\|.$$
Combining this bound with (93) completes the proof.
Let us check (88) for the choice $R(A) = \mathcal{L}^{-1}(\Pi_r(\mathcal{L}(A)))$. We denote $Y = \mathcal{L}(X)$ and $Y_B = \mathcal{L}(B)$, and note that $B = R(B)$ yields $\Pi_rY_B = Y_B$.
In the following proof we make use of the standard stability estimates for the singular values of a perturbed matrix [Wielandt, Hoffman '55].
Now we estimate in the Frobenius norm:

Truncated Newton iteration to compute A−1 B. Khoromskij, Zurich 2010(L11) 272

$$\|\mathcal{L}^{-1}\|^{-1}\,\|X - RX\| \le \|(I - \Pi_r)Y\| = \Big(\sum_{k=r+1}^n \sigma_k(Y)^2\Big)^{1/2} = \Big(\sum_{k=r+1}^n \big(\sigma_k(Y) - \sigma_k(Y_B)\big)^2\Big)^{1/2}$$
$$\le \sum_{k=r+1}^n |\sigma_k(Y) - \sigma_k(Y_B)| \le \sum_{k=1}^{n-r+1} \sigma_k(Y - Y_B) \le \sqrt{n-r}\,\|\mathcal{L}(X - B)\|.$$
Estimate (88) now follows with $C_R = \sqrt{n-r}\,\|\mathcal{L}^{-1}\|\,\|\mathcal{L}\|$.
A few remarks.
1. The factor $\sqrt{n-r}$ can be omitted.
2. The error estimate above allows a straightforward local analysis for algorithm (94) with the truncation operator $R$.
3. In the case of three (or more) factors ($d \ge 3$), one can analyse the sub-optimal truncation operator $R$ via a Tucker-type decomposition.


Iterative schemes to compute sqrt(A) B. Khoromskij, Zurich 2010(L11) 273

The iterative calculation may not be very simple!
Newton iteration to compute the square root $A^{1/2}$ of an s.p.d. matrix $A$: given $X_0$, the iteration
$$X_k\Delta_k + \Delta_kX_k = A - X_k^2, \quad \text{with } \Delta_k = X_{k+1} - X_k, \qquad (94)$$
converges to $A^{1/2}$ quadratically (locally). It requires solving a matrix Lyapunov equation at each step.
This scheme can be considered as the Newton iteration to solve the nonlinear matrix equation
$$\Psi(X) := A - X^2 = 0.$$
Clearly,
$$\Psi(X + \delta) - \Psi(X) = -X\delta - \delta X - \delta^2,$$
so our iteration can be interpreted as the Newton method for solving $\Psi(X) = 0$ (Thm. 11.2: analysis of truncated iterations).

Iterative schemes to compute sqrt(A) B. Khoromskij, Zurich 2010(L11) 274

Iteration (94) can be written as $X_k = \Phi(X_{k-1})$, corresponding to the choice $\Phi_k(X) := \Phi(X)$, where $\Phi(X)$ solves the matrix equation
$$X(\Phi(X) - X) + (\Phi(X) - X)X = A - X^2.$$
A simple calculation shows that the latter equation implies (with the substitution $A = B^2$)
$$X(\Phi(X) - B) + XB - X^2 + (\Phi(X) - B)X + BX - X^2 = B^2 - X^2,$$
which leads to the matrix Lyapunov equation w.r.t. $Y = \Phi(X) - B$:
$$XY + YX = (B - X)^2.$$


Iterative schemes to compute sqrt(A) B. Khoromskij, Zurich 2010(L11) 275

Making use of the solution operator for the Lyapunov equation (assuming $X = X^\top > 0$), we arrive at the norm estimate
$$\|\Phi(X) - B\| \le \left\|\int_0^\infty e^{-tX}(B - X)^2e^{-tX}\,dt\right\| \le C\|B - X\|^2.$$
This proves relation (83) in Lem. 11.1 with $\alpha = 2$. Hence Thm. 11.2 ensures the convergence of the truncated version of the nonlinear iteration (94).
Note that the simpler scheme
$$X_0 = a_0A, \qquad X_k := X_{k-1} - \frac{1}{2}\big(X_{k-1} - X_{k-1}^{-1}A\big) \quad (k = 1, 2, \dots),$$
where $a_0 > 0$ is a given constant, does not guarantee, in general, the convergence of truncated iterations.
Exer. 11.1. Compute $\mathrm{sign}(A)$, $\mathrm{sqrt}(A)$ and $A^{-1}$ in the case of the 2D Laplacian, $\Delta_m \otimes I_n + I_m \otimes \Delta_n$, as MATLAB functions. What is the Kronecker $\varepsilon$-rank in all cases (use the SVD of the rearranged matrix $A$)? Try to apply the Newton scheme for $A^{-1}$ on a moderate $m \times n$ grid. A sketch of the setup is given below.
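A minimal MATLAB sketch for the setup of Exer. 11.1: the 2D FD Laplacian as a Kronecker sum, and the singular values of its rearrangement (the rearrangement follows the vec-based transform L introduced above; the grid sizes are illustrative).

m = 32; n = 32;
D1 = @(p) (p+1)^2 * (2*eye(p) - diag(ones(p-1,1),1) - diag(ones(p-1,1),-1));
A  = kron(D1(m), eye(n)) + kron(eye(m), D1(n));   % Delta_m (x) I_n + I_m (x) Delta_n
Atil = zeros(m^2, n^2);            % rearranged matrix: row (i,j) = vec of block (i,j)
for i = 1:m
    for j = 1:m
        blk = A((i-1)*n+1:i*n, (j-1)*n+1:j*n);
        Atil((j-1)*m+i, :) = blk(:).';
    end
end
s = svd(Atil);                     % Kronecker eps-rank = #(s > eps*s(1)); here exactly 2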

Iterative schemes to compute sign(A) B. Khoromskij, Zurich 2010(L11) 276

Newton-Schulz iteration to compute $\mathrm{sign}(A)$:
$$X_{k+1} = X_k + \frac{1}{2}\big[I - (X_k)^2\big]X_k, \qquad X_0 = A/\|A\|_2. \qquad (95)$$
Diagonalisable case. Let $T$ be the unitary transform that diagonalises $A$, i.e., $A = T^\top DT$ with $d_i \in [-1, 1]$; then it also diagonalises all $X_k$, $k = 1, 2, \dots$. Hence we have to show that the scalar iteration
$$x_{k+1} = f(x_k), \qquad x_0 \in [-1, 0) \cup (0, 1],$$
with $f(x) := x + \frac{1}{2}x(1 - x^2) \equiv x\,g(x)$, converges to $\mathrm{sign}(x_0)$ quadratically.
$f(x)$, $x \in [-1, 1]$, is increasing and has the fixed points $x = -1, 0, 1$. Since $g(x) > 1$ for $x \in (-1, 1)$, it implies $0 < x_k < x_{k+1} \le 1$ if $x_0 \in (0, 1]$, and $-1 \le x_{k+1} < x_k < 0$ if $x_0 \in [-1, 0)$.
Hence both $x = -1$ and $x = 1$ are stable fixed points.


Iterative schemes to compute sign(A) B. Khoromskij, Zurich 2010(L11) 277

For example, consider the case of a small initial guess $x_0 > 0$.
For $x \in [-1/2, 1/2]$ we have $g(x) \ge q > 1$ with $q = 1 + 3/8$; thus the number of iterations $x_{k+1} = x_kg(x_k)$ needed to reach the value, say, $x_k = 0.5$ starting from $x_0 > 0$ is about $O(\log_q x_0^{-1})$.
For $x_k \ge 1/2$ we enter the regime of quadratic convergence. In fact, we have
$$1 - x_{k+1} = \frac{1}{2}(1 - x_k)^2(x_k + 2),$$
which implies $|1 - x_{k+1}| \le \frac{3}{2}(1 - x_k)^2$. In this stage, to achieve a precision $\varepsilon > 0$ one requires $O(\log_2\log_2\varepsilon^{-1})$ iterations.
For the initial guess we actually have $x_0 = \mathrm{cond}(A)^{-1}$, which implies that the total number of iterations is bounded by
$$O(\log_2\log_2\varepsilon^{-1}) + O(\log_q\mathrm{cond}(A)).$$

Iterative schemes to compute sign(A) B. Khoromskij, Zurich 2010(L11) 278

Note that iteration (95) can be written as $X_k = \Phi(X_{k-1})$ with $\Phi(X) := X + \frac{1}{2}(I - X^2)X$ (see Lem. 11.1). Since all $X_k$ $(k = 1, 2, \dots)$ are simultaneously diagonalised by the same matrix $T$, we have (with $B = \mathrm{sign}(A)$, using $B^2 = I$ and the commutativity of $B$ and $X$ in the common eigenbasis):
$$\Phi(X) - B = X - B + \frac{1}{2}(B^2 - X^2)X = -(X - B)^2\Big(B + \frac{1}{2}X\Big).$$
The analysis of the algorithm
$$X_0 = A, \qquad X_{k+1} = \frac{1}{2}\big(X_k + X_k^{-1}\big)$$
in the diagonalisable case reduces to that of Newton's method applied to
$$\Psi(x) := x^2 - 1 = 0, \quad \text{that is,} \quad x_{k+1} = \frac{1}{2}\Big(x_k + \frac{1}{x_k}\Big).$$


Multidimensional convolution in tensor format B. Khoromskij, Zurich 2010(L11) 279

Fast and accurate computation of the convolution transform in $\mathbb{R}^d$ (recall from Lect. 4):
$$w(x) := (f * g)(x) := \int_{\mathbb{R}^d} f(y)g(x - y)\,dy, \qquad f, g \in L^1(\mathbb{R}^d).$$
Application: solving elliptic equations with constant coefficients in $\mathbb{R}^d$; in particular, the Hartree potential in quantum chemistry,
$$V_H(x) = \int_{\mathbb{R}^3} \frac{\rho(y, y)}{\|x - y\|}\,dy, \qquad x \in \mathbb{R}^3.$$
Method: low-rank multilinear projection-collocation.
Physical prerequisites:
(a) Compute $f * g$ in some fixed box $\Omega = [-A, A]^d$.
(b) Suppose that $f$ has its support in $\Omega$.
(c) $f$ has an $R$-term separable representation with moderate $R$.

Multidimensional convolution in tensor format B. Khoromskij, Zurich 2010(L11) 280

Tensor grid: let $\omega_d := \omega_1\times\dots\times\omega_d$ be the equidistant tensor grid of collocation points $x_m$ in $\Omega$, $m \in \mathcal{M} := \{1, \dots, n+1\}^d$,
$$\omega_\ell := \{-A + (m-1)h : m = 1, \dots, n+1\} \quad (\ell = 1, \dots, d), \qquad h = 2A/n.$$
Product basis: for given piecewise constant basis functions $\varphi_i$,
$$\varphi_i(x) = \prod_{\ell=1}^d \varphi_{i_\ell}(x_\ell), \qquad \varphi_{i_\ell}(\cdot) = \varphi(\cdot + (i_\ell - 1)h), \qquad i \in \mathcal{I} := \{1, \dots, n\}^d,$$
related to $\omega_d$, let
$$f(y) \approx \sum_{i\in\mathcal{I}} f_i\varphi_i(y), \qquad f_i = f(P_i).$$
The discrete collocation scheme (cost $O(n^{2d})$):
$$f * g \approx \{w_m\}_{m\in\mathcal{M}}, \qquad w_m := \sum_{i\in\mathcal{I}} f_i\int_{\mathbb{R}^d}\varphi_i(y)g(x_m - y)\,dy, \qquad x_m \in \omega_d.$$
Compute the collocation coefficient tensor ($L^2$-projection)
$$G = \{g_i\} \in \mathbb{R}^{\mathcal{I}} : \qquad g_i = \int_{\mathbb{R}^d}\varphi_i(y)g(-y)\,dy, \qquad i \in \mathcal{I}.$$


d-dimensional discrete convolution B. Khoromskij, Zurich 2010(L11) 281

Define the $d$-th order tensor $F = \{f_i\} \in \mathbb{R}^{\mathcal{I}}$.
Compute the discrete convolution in $\mathbb{R}^d$ (cost $O(n^d\log^q n)$ via FFT):
$$F * G := \{z_j\}, \qquad z_j := \sum_i f_ig_{j-i+1}, \qquad j \in \mathcal{J} := \{1, \dots, 2n-1\}^d,$$
where the sum is over all $i \in \mathcal{I}$ which lead to legal subscripts for $f_i$ and $g_{j-i+1}$.
Important step: $\{w_m\}$, $m \in \mathcal{M}$, is obtained by copying the corresponding part of $\{z_j\}$ (centred at $j = n$),
$$w_m = z_j|_{j = n/2 + m}, \qquad m \in \mathcal{M}.$$
Specifically, for $j_\ell = 1, \dots, 2n-1$,
$$i_\ell \in [\max(1, j_\ell + 1 - n) : \min(j_\ell, n)].$$
For example, in the 1D case,
$$z(1) = f(1)\,g(1), \qquad z(2) = f(1)\,g(2) + f(2)\,g(1), \quad \dots$$

d-dimensional discrete convolution B. Khoromskij, Zurich 2010(L11) 282

$$z(n) = f(1)\,g(n) + f(2)\,g(n-1) + \dots + f(n)\,g(1), \qquad \dots, \qquad z(2n-1) = f(n)\,g(n).$$
Ex. 11.1. The 1D case, $n = 4$, $m = 1, \dots, 5$:
z(1) = f(1)g(1), z(2) = f(1)g(2) + f(2)g(1),
z(3) = f(1)g(3) + f(2)g(2) + f(3)g(1),
z(4) = f(1)g(4) + f(2)g(3) + f(3)g(2) + f(4)g(1),
z(5) = f(2)g(4) + f(3)g(3) + f(4)g(2),
z(6) = f(3)g(4) + f(4)g(3), z(7) = f(4)g(4).
But the collocation scheme leads to (note: g(1) = g(4), g(2) = g(3))
w(1) = f(1)g(2) + f(2)g(1),
w(2) = f(1)g(3) + f(2)g(2) + f(3)g(1),
w(3) = f(1)g(4) + f(2)g(3) + f(3)g(2) + f(4)g(1),
w(4) = f(2)g(4) + f(3)g(3) + f(4)g(2), w(5) = f(3)g(4) + f(4)g(3).
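A minimal MATLAB illustration of the scheme above (n = 4 as in Ex. 11.1); conv computes exactly the sum z(j) = sum_i f(i)g(j-i+1).

n = 4;
f = rand(n,1); g = rand(n,1);
z = conv(f, g);               % full discrete convolution, length 2n-1
w = z(n/2 + (1:n+1));         % centred extraction w_m = z_{n/2+m}, m = 1,...,n+1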


Error analysis for projection collocation convolution B. Khoromskij, Zurich 2010(L11) 283

Lem. 11.5. Let $f \in C^2(\Omega)$ and $g \in L^1(\Omega)$. Furthermore, assume that there exist $\mu \ge 1$ and $\beta > 0$ such that
$$|\mathcal{F}(g)| \le C/|\kappa|^\mu \ \text{ as } |\kappa| \to \infty, \ \kappa \in \mathbb{R}^d, \qquad |\nabla_yg(x - y)| \le C/|x - y|^\beta \ \text{ for } x, y \in \Omega, \ x \ne y.$$
Then there is a constant $C > 0$ independent of $h$ such that
$$|w(x_m) - w_m| \le Ch^2, \qquad m \in \mathcal{M}.$$
Exer. 11.2. See BNK [2] for the technical proof of Lem. 11.5.
Ex. 11.2. The fundamental solution of the Laplace operator in $\mathbb{R}^d$ is given by
$$G(x) = c(d)/|x|^{d-2}, \qquad \text{with } \mathcal{F}(G) = C/|\kappa|^2.$$
Lem. 11.5 now applies with $\beta = d - 1$, $\mu = 2$.

Richardson extrapolation B. Khoromskij, Zurich 2010(L11) 284

Lem. 11.6. (Richardson extrapolation, BNK [2])
Under the assumptions of Lem. 11.5, let $f \in C^3(\Omega)$. Then there exists a function $c_1 \in C(\Omega)$, independent of $h$, such that for $m \in \mathcal{M}$ we have
$$w(x_m) = w_m + c_1(x_m)h^2 + \eta_{m,h} \quad \text{with } |\eta_{m,h}| \le Ch^3.$$
Higher-order approximation without extra cost! Setting
$$\widehat{w}_m = \big(4w_m^{h/2} - w_m^h\big)/3, \qquad m \in \mathcal{M}_h,$$
we obtain
$$|w(x_m) - \widehat{w}_m| \le Ch^3, \qquad m \in \mathcal{M}_h.$$
Richardson extrapolation applies directly to functionals of $w(x_m)$.
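A minimal numerical check of the extrapolation rule above: for a model quantity w(h) = w + c1 h^2 + O(h^3) (an illustrative stand-in for the collocation values), the combination (4 w(h/2) - w(h))/3 removes the h^2 term.

w_exact = pi;
wmodel  = @(h) w_exact + 0.7*h^2 + 0.3*h^3;              % model error expansion
h = 0.1;
err_h   = abs(wmodel(h) - w_exact);                      % O(h^2)
err_ext = abs((4*wmodel(h/2) - wmodel(h))/3 - w_exact);  % O(h^3)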


Numerics for tensor-product convolution B. Khoromskij, Zurich 2010(L11) 285

Letting $G \in \mathcal{C}_R$, $F \in \mathcal{T}_r$, we tensorize the discrete convolution:
$$F * G = \sum_{k=1,m=1}^{R,r} b_kc_{m_1\dots m_d}\big(U_k^{(1)} * V_{m_1}^{(1)}\big)\times\dots\times\big(U_k^{(d)} * V_{m_d}^{(d)}\big).$$
A 1D convolution on an equidistant grid, $U_k^{(\ell)} * V_{m_\ell}^{(\ell)} \in \mathbb{R}^{2n-1}$, can be computed by FFT in $O(n\log n)$ operations.
Setting $a = U_k^{(\ell)}$, $b = V_{m_\ell}^{(\ell)} \in \mathbb{R}^n$, we have
$$(a * b)_j = \sum_{m=1}^n a_mb_{j-m+1}, \qquad j = 1, \dots, 2n-1.$$
This leads to complexity linear in $n$:
$$N_{C*T} = O(drRn\log n + Rr^d) \ll n^d\log n.$$
Ex. 11.3. The Hartree potential for the H2O molecule [3] (computation times):

n^3    | 128^3 | 256^3 | 512^3 | 1024^3 | 2048^3 | 4096^3 | 8192^3 | 16384^3
FFT3   | 4.3   | 55.4  | 582.8 | ~6000  | --     | --     | --     | ~2 years
C * C  | 1.0   | 3.1   | 5.3   | 21.9   | 43.7   | 127.1  | 368.6  | 700.2
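A minimal MATLAB sketch of the 1D building block above: zero-padding to length 2n - 1 turns the FFT (circulant) convolution into the full linear one.

n = 1024;
a = randn(n,1); b = randn(n,1);
Lp = 2*n - 1;
c = real(ifft(fft(a, Lp) .* fft(b, Lp)));   % c(j) = sum_m a(m) b(j-m+1)
% norm(c - conv(a,b)) is at rounding level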

Linear and bilinear operations on Tucker tensors B. Khoromskij, Zurich 2010(L11) 286

Def. 11.1. For given tensors $A, B \in \mathbb{R}^{\mathcal{I}}$, the Hadamard product $A \odot B \in \mathbb{R}^{\mathcal{I}}$ of two tensors of the same size $\mathcal{I}$ is defined by the componentwise product,
$$(A \odot B)_i = a_i\cdot b_i, \qquad i \in \mathcal{I}.$$
For $A_1, A_2 \in \mathcal{T}_r$, one tensorizes the Hadamard product in $O(drn + r^{2d})$ operations:
$$A_1 \odot A_2 := \sum_{k_1,m_1=1}^r\cdots\sum_{k_d,m_d=1}^r \beta_{k_1\dots k_d}\,\zeta_{m_1\dots m_d}\,\big(u_{k_1}^{(1)}\odot v_{m_1}^{(1)}\big)\otimes\cdots\otimes\big(u_{k_d}^{(d)}\odot v_{m_d}^{(d)}\big). \qquad (96)$$
Applying the definition to rank-1 tensors ($\beta = \zeta = 1$), one obtains
$$(A_1 \odot A_2)_i = \big(u_{i_1}^{(1)}v_{i_1}^{(1)}\big)\cdots\big(u_{i_d}^{(d)}v_{i_d}^{(d)}\big) = \big(u^{(1)}\odot v^{(1)}\big)\otimes\cdots\otimes\big(u^{(d)}\odot v^{(d)}\big). \qquad (97)$$
Def. 11.2. For given tensors $F = [f_i] \in \mathbb{R}^{\mathcal{I}}$, $G = [g_i] \in \mathbb{R}^{\mathcal{I}}$, we define their discrete convolution product by
$$F * G := \Big[\sum_{i\in\mathcal{I}} f_ig_{j-i+1}\Big]_{j\in\mathcal{J}}, \qquad \mathcal{J} := \{1, \dots, 2n-1\}^d,$$
where $j - i + 1 \in \mathcal{I}$ ($G$ can be extended by zeros to indices beyond $\mathcal{I}$).


Linear and bilinear operations on Tucker tensors B. Khoromskij, Zurich 2010(L11) 287

For given $A_1, A_2 \in \mathcal{T}_r$, the tensorized convolution product reads
$$A_1 * A_2 := \sum_{k=1}^r\sum_{m=1}^r \beta_{k_1\dots k_d}\,\zeta_{m_1\dots m_d}\,\big(u_{k_1}^{(1)} * v_{m_1}^{(1)}\big)\otimes\dots\otimes\big(u_{k_d}^{(d)} * v_{m_d}^{(d)}\big). \qquad (98)$$
This relation follows from the analysis in the case of rank-1 convolving tensors $F, G \in \mathcal{C}_1$, similar to the case of the Hadamard product of tensors:
$$(F * G)_j := \sum_{i\in\mathcal{I}} f_{i_1}^{(1)}\cdots f_{i_d}^{(d)}\,g_{j_1-i_1+1}^{(1)}\cdots g_{j_d-i_d+1}^{(d)} = \sum_{i_1=1}^{n_1} f_{i_1}^{(1)}g_{j_1-i_1+1}^{(1)}\cdots\sum_{i_d=1}^{n_d} f_{i_d}^{(d)}g_{j_d-i_d+1}^{(d)} = \prod_{\ell=1}^d\big(f^{(\ell)} * g^{(\ell)}\big)_{j_\ell}. \qquad (99)$$
Assuming that the "one-dimensional" convolutions of $n$-vectors,
$$u_{k_\ell}^{(\ell)} * v_{m_\ell}^{(\ell)} \in \mathbb{R}^{2n-1},$$
can be computed in $O(n\log n)$ operations (circulant convolution by FFT), we arrive at the overall complexity estimate
$$N_{\cdot*\cdot} = O(dr^2n\log n + r^{2d}) \ll O(n^d).$$

Bilinear tensor operations in canonical format B. Khoromskij, Zurich 2010(L11) 288

Consider tensors $A_1, A_2$ represented in the rank-$R$ canonical format,
$$A_1 = \sum_{k=1}^{R_1} c_k\,u_k^{(1)}\otimes\dots\otimes u_k^{(d)}, \qquad A_2 = \sum_{m=1}^{R_2} b_m\,v_m^{(1)}\otimes\dots\otimes v_m^{(d)}, \qquad (100)$$
with normalized vectors $u_k^{(\ell)}, v_m^{(\ell)} \in \mathbb{R}^{n_\ell}$. For simplicity of discussion, we assume $n_\ell = n$, $\ell = 1, \dots, d$.
1. The sum of two canonical tensors given by (100) can be written as
$$A_1 + A_2 = \sum_{k=1}^{R_1} c_k\,u_k^{(1)}\otimes\dots\otimes u_k^{(d)} + \sum_{m=1}^{R_2} b_m\,v_m^{(1)}\otimes\dots\otimes v_m^{(d)}, \qquad (101)$$
resulting in a canonical tensor with rank at most $R_S = R_1 + R_2$. This operation has no cost, since it is simply a concatenation of the two tensors.
2. For given canonical tensors $A_1, A_2$, the scalar product is computed by
$$\langle A_1, A_2\rangle := \sum_{k=1}^{R_1}\sum_{m=1}^{R_2} c_kb_m\prod_{\ell=1}^d\big\langle u_k^{(\ell)}, v_m^{(\ell)}\big\rangle. \qquad (102)$$
The calculation of (102) includes $R_1R_2$ scalar products of vectors in $\mathbb{R}^n$,


Bilinear tensor operations in canonical format B. Khoromskij, Zurich 2010(L11) 289

leading to the overall complexity $N_{\langle\cdot,\cdot\rangle} = O(dnR_1R_2)$.

3. For $A_1, A_2$ given by (100), we tensorize the Hadamard product by (97):
$$A_1 \odot A_2 := \sum_{k=1}^{R_1}\sum_{m=1}^{R_2} c_kb_m\big(u_k^{(1)}\odot v_m^{(1)}\big)\otimes\dots\otimes\big(u_k^{(d)}\odot v_m^{(d)}\big). \qquad (103)$$
This leads to the complexity $O(dnR_1R_2)$.
4. The convolution product of two tensors in the canonical format (100) is given by (see (99))
$$A_1 * A_2 := \sum_{k=1}^{R_1}\sum_{m=1}^{R_2} c_kb_m\big(u_k^{(1)} * v_m^{(1)}\big)\otimes\dots\otimes\big(u_k^{(d)} * v_m^{(d)}\big), \qquad (104)$$
leading to the asymptotic complexity $O(dR_1R_2n\log n)$.
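A minimal MATLAB sketch of the scalar product (102); the factors are stored as cell arrays U{l}, V{l} of size n x R (an illustrative storage convention).

function s = cp_inner(c, U, b, V)
M = ones(numel(c), numel(b));        % M(k,m) accumulates prod_l <u_k^(l), v_m^(l)>
for l = 1:numel(U)
    M = M .* (U{l}' * V{l});         % all l-mode scalar products at once
end
s = c(:)' * M * b(:);                % weight by c_k b_m and sum
end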

Bilinear tensor operations in canonical format B. Khoromskij, Zurich 2010(L11) 290

The convolution transform in the two-level format $\mathcal{T}_{\mathcal{C}_{R,r}}$ (see Def. 6.8): the result $W = F \star G$ is represented in the two-level Tucker format $\mathcal{T}_{\mathcal{C}_{R_c},r_c}$ with moderate $R_c$ and $r_c$.
Alg. 11.1. (Tensor convolution of type $\mathcal{T}_{\mathcal{C}_{R_1},r} \star \mathcal{C}_R \to \mathcal{T}_{\mathcal{C}_{R_c},r_c}$)
1. Given $F \in \mathcal{T}_{\mathcal{C}_{R_1},r}$ with the core $\beta = \sum_{\nu=1}^{R_1}\beta_\nu z_\nu^{(1)}\otimes\dots\otimes z_\nu^{(d)}$, $G \in \mathcal{C}_{n,R}$, and the rank parameters $r_c, R_c \in \mathbb{N}$ (suppose that $R_c \ll RR_1$).
2. For $\ell = 1, \dots, d$, $k_\ell = 1, \dots, r$, compute the 1D convolutions $g_\ell^{k,m} = u_{k_\ell}^{(\ell)} \star v_m^{(\ell)}$ $(m = 1, \dots, R)$ of size $2n - 1$, restrict the results onto the index set $I_\ell$, and form the $n \times rR$ matrix unfolding $A_\ell$ (cost $O(drRn\log n)$).
3. For $\ell = 1, \dots, d$, compute the $\ell$-mode $r_c$-dimensional dominating subspace of $A_\ell$, specified by an orthogonal matrix $W^{(\ell)} = [w_\ell^1, \dots, w_\ell^{r_c}]$, at the expense $O(dnr^2R^2)$.
4. Project the $\ell$-mode matrices $G_\ell^m = [g_\ell^{1,m}, \dots, g_\ell^{r,m}]$ onto the new orthogonal basis, $G_\ell^m \approx W^{(\ell)}\cdot M_\ell^m$, with a coefficient matrix $M_\ell^m \in \mathbb{R}^{r_c\times r}$ (cost $O(drr_cn)$).
5. Calculate the core tensor of size $\mathbf{r}_c = (r_c, \dots, r_c)$ in the


Bilinear tensor operations in canonical format B. Khoromskij, Zurich 2010(L11) 291

product-canonical format,
$$\beta_c = \sum_{m=1}^R\gamma_m\Big(\sum_{\nu=1}^{R_1}\beta_\nu\bigotimes_{\ell=1}^dM_\ell^mz_\nu^{(\ell)}\Big) \in \mathcal{C}_{n,RR_1},$$
at the cost $O(d^2R_1Rr^2r_c)$. Recompress the core $\beta_c$ to the rank-$R_c$ tensor $\widetilde{\beta}_c$ and constitute the result in the form
$$W = \widetilde{\beta}_c\times_1W^{(1)}\times_2\dots\times_dW^{(d)} \in \mathcal{T}_{\mathcal{C}_{R_c},r_c}.$$
We have proven that Alg. 11.1 scales linearly in $n$ and quadratically in $d$,
$$N_{TC\star C\to TC} = O(drRn\log n + dr^2R^2n + drr_cn + d^2r^2RR_1r_c),$$
up to the low cost of the rank-$RR_1$-to-$R_c$ truncation at Step 5.
The Tucker-CP type convolution scales as
$$N_{T\star C\to T} = O(drRn\log n + dr^2R^2n + rRr_c^d).$$
Rem. 11.4. Additive dimension splitting based on the canonical and Tucker formats can be efficient in moderate dimensions. Tensor numerical methods in higher dimensions are based on multiplicative dimension splitting.

Literature to Lect. 11 B. Khoromskij, Zurich 2010(L11) 292

1. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Approximate Iteration for Structured Matrices. J. Numer. Math., Vol. 13, No. 2 (2005), 119-156.
2. B.N. Khoromskij: Fast and Accurate Tensor Approximation of a Multivariate Convolution with Linear Scaling in Dimension. J. of Comp. Appl. Math., 234 (2010), 3122-3139.
3. B.N. Khoromskij and V. Khoromskaia: Multigrid Tensor Approximation of Function Related Arrays. SIAM J. on Sci. Comp., 31(4), 3002-3026 (2009).
http://personal-homepages.mis.mpg.de/bokh


Introduction to Tensor Numerical Methods III B. Khoromskij, Zurich 2010(L12) 293

If you can't explain it simply, you don't understand it well enough.
Albert Einstein (1879-1955)
Introduction to Tensor Numerical Methods in Scientific Computing
(Part III. Multiplicative dimension splitting by TT/QTT formats. TT/QTT numerical methods in modern applications.)

Boris N. Khoromskij

http://personal-homepages.mis.mpg.de/bokh

University/ETH Zurich, Pro∗Doc Program, WS 2010

Part III: Outlook B. Khoromskij, Zurich 2010(L12) 294

Part III (Lect. 12-18). Multiplicative dimension splitting by TT/QTT formats. TT/QTT numerical methods in modern applications.
Tensor train/chain multiplicative formats. Main properties, SVD-based TT-rank truncation.
Quantics-TT representation. TT and QTT rank estimates on classes of "multivariate" vectors.
Approximation in the QTT format. Explicit QTT representation of vectors and matrices.
MLA in (Q)TT tensor formats. Relation to MPS and DMRG.
Rank-structured solution operators and preconditioners in $\mathbb{R}^d$.
Toward solving boundary value and spectral problems in tensor format.
Tensor numerical methods for the Hartree-Fock eq. in electronic structure calculations.
Application to SPDEs.
DMRG in molecular dynamics and other directions.


Lect. 12. From additive to multiplicative dimension splitting B. Khoromskij, Zurich 2010(L12) 295

Outline of Lecture 12.

1. Higher dimensions might (and should) be tractable.
2. Dimension splitting via tensor train/chain factorization; matrix product states (MPS).
3. Quasioptimal SVD-based approximation.
4. Tensor truncation in the TT format.
5. Historical overview and related approaches.
6. On approximability in the TT format: (a) analytic methods; (b) canonical → TT; (c) SVD-based recompression.
7. TT/QTT MATLAB Tensor Toolbox: http://spring.inm.ras.ru/osel (Dr. I. Oseledets, INM RAS, Moscow).

Higher dimensions might be tractable B. Khoromskij, Zurich 2010(L12) 296

1. Large-scale modern applications: DMRG for FCI electronic structure calculations, molecular dynamics, quantum computing. Stochastic PDEs. Tensor networks in quantum chemistry.
2. Main computational issues: solving basic PDEs on $N\times N\times\dots\times N$ ($d$-fold) grids, getting rid of the "curse of dimension".
3. From additive to multiplicative dimension splitting: fast and robust MPS numerical methods for the representation of $d$-variate functions and operators, and for solving physical equations in $\mathbb{R}^d$, with linear $O(dN)$ scaling in $d$.
4. Super-compression by high-dimensional Q-folding: log-volume quantics-TT approximation of N-d tensors, $O(d\log N)$.


Dimension splitting (DS) via tensor train/chain factorization B. Khoromskij, Zurich 2010(L12) 297

Def. 12.1. (Tensor train/chain format, TT/TC[$\mathbf{r}$, n, d].)
Given the index set $\mathcal{J} := \times_{\ell=1}^dJ_\ell$, $J_\ell = \{1, \dots, r_\ell\}$, with $J_0 = J_d$. The rank-$\mathbf{r}$ tensor chain (TC) format,
$$TC[\mathbf{r}, n, d] \equiv TC[\mathbf{r}] \subset \mathbb{V}_n, \qquad n = (N, \dots, N) \ (d\text{-fold}),$$
contains all $V \in \mathbb{V}_n$ that can be represented as a chain of contracted products of 3-tensors over the (auxiliary) set $\mathcal{J}$,
$$V = \times_{\ell=1}^dG^{(\ell)} \quad \text{with 3-tensors } G^{(\ell)} \in \mathbb{R}^{J_{\ell-1}\times I_\ell\times J_\ell}.$$
In coordinate representation,
$$V(i_1, \dots, i_d) = \sum_{\alpha\in\mathcal{J}}G_1(\alpha_d, i_1, \alpha_1)\,G_2(\alpha_1, i_2, \alpha_2)\cdots G_d(\alpha_{d-1}, i_d, \alpha_d).$$
If $J_0 = J_d = \{1\}$ (disconnected chain), the TC format coincides with the tensor train (TT) model.
TT – Oseledets, Tyrtyshnikov, [3], [4]; TC – Khoromskij, [1].

Specific features of TC/TT factorization B. Khoromskij, Zurich 2010(L12) 298

Here $G_1(i_1)$ is a $1\times r_1$ row vector depending on $i_1$, $G_\ell(i_\ell)$ is a matrix of size $r_{\ell-1}\times r_\ell$ with elements depending on $i_\ell$, and $G_d(i_d)$ is a column vector of size $r_{d-1}\times 1$ depending on $i_d$.
A tensor $V \in TC[\mathbf{r}]$ is represented (approximated) by a product of matrices (matrix product states), each depending on a single "physical" index. It is similar to Tucker, but with localised connectivity constraints.
$d = 2$: TT is a skeleton factorization of a rank-$r$ matrix.
The TT/TC models have many beneficial features:
– storage linear in $d$,
– existence of a quasioptimal SVD-based rank approximation (an analogue of T-HOSVD),
– an SVD-based rank truncation procedure with linear scaling in $d$ (an analogue of T-RHOSVD),
– efficient formatted bilinear tensor operations.
A sketch of the basic entry evaluation by matrix products is given below.
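A minimal MATLAB sketch of the matrix-product evaluation of a single entry V(i_1, ..., i_d) from the TT cores; the cores are stored as a cell array G{l} of size r_{l-1} x N x r_l (an illustrative storage convention, with r_0 = r_d = 1).

function v = tt_entry(G, idx)
v = 1;                                       % 1 x r_0 row vector
for l = 1:numel(G)
    Gl = reshape(G{l}(:, idx(l), :), size(G{l},1), size(G{l},3));
    v  = v * Gl;                             % multiply by the matrix G_l(i_l)
end                                          % the final product is 1 x 1
end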


Visualising TC/TT models B. Khoromskij, Zurich 2010(L12) 299

The Tensor-Chain format for d = 6.

An $N\times N\times\dots\times N$ ($d$-fold) tensor in $TC[\mathbf{r}]$ is the $d$-fold contracted product of tri-tensors.
[Figure: ring diagram of the tensor chain for $d = 6$, with mode sizes $N$ and connecting ranks $r_1, \dots, r_6$.]

Special case r6 = 1: TT[r] = TC[r] (disconnected chain).

Embedding: TT[r] ⊂ TC[r].

Benefits of TC/TT models B. Khoromskij, Zurich 2010(L12) 300

Thm. 12.1. (Storage, rank bound, concatenation, quasioptimality.)
(A) Storage: $\sum_{\ell=1}^d r_{\ell-1}r_\ell N \le dr^2N$ with $r = \max_\ell r_\ell$.
(B) Rank bound: $r_\ell \le \mathrm{rank}_\ell(V) \le \mathrm{rank}(V)$.
(C) Canonical embeddings: $TT[\mathbf{r}] \subset TC[\mathbf{r}]$; $\mathcal{C}_{R,n} \subset TT[\mathbf{r}, n, d]$ with $\mathbf{r} = (R, \dots, R)$.
(D) Concatenation to higher dimensions: $V[d_1]\otimes V[d_2] \to D = d_1 + d_2$.
(E) The quasioptimal $TT[\mathbf{r}]$-approximation of $V \in \mathbb{V}_n$ satisfies
$$\min_{T\in TT[\mathbf{r}]}\|V - T\|_F \le \Big(\sum_{\ell=1}^d\varepsilon_\ell^2\Big)^{1/2}, \qquad \varepsilon_\ell = \min_{\mathrm{rank}\,B\le r_\ell}\|V_{[\ell]} - B\|_F,$$
and it can be computed by QR/SVD; $V_{[\ell]}$ denotes the $\ell$-mode TT unfolding matrix.
TC approximation requires an ALS-type iteration.


Historical remarks related to quasioptimality (E) B. Khoromskij, Zurich 2010(L12) 301

Historical remarks toward the proof of (E) (see Lem. 12.2 below).
Rem. 12.1. (Quasioptimality via SVD-based approximation.)
Full-to-Tucker HOSVD – [De Lathauwer et al. 2000].
Idea of hierarchical DS – [BNK '06], see [6].
Canonical-to-Tucker RHOSVD – [BNK, Khoromskaia '08].
Full-to-TT – [Oseledets, Tyrtyshnikov '09].
Hierarchical Tucker format – [Hackbusch, Kuhn; Grasedyck '09].
Quantics-TT – [BNK '09], [Oseledets '09], [BNK, Oseledets '09-'10] (approximation, theoretical bounds on $r_\ell$, and numerics).
Manifolds of TT tensors – [R. Schneider, Holtz, Rohwedder '10].
Rem. 12.2. $\varepsilon_\ell$ can be estimated via the truncated SVD or ACA of the $\ell$-mode TT-unfolding matrix $V_{[\ell]}$ of $V$ $(\ell = 1, \dots, d)$.

Toward the proof of Thm. 12.1 (E) B. Khoromskij, Zurich 2010(L12) 302

Lem. 12.2. (see [5]) For any tensor $\mathbf{A} = [A(i_1, \dots, i_d)]$ there exists a TT approximation $\mathbf{T} = [T(i_1, \dots, i_d)] \in TT[\mathbf{r}]$ with ranks $r_k$, s.t.
$$\|\mathbf{A} - \mathbf{T}\|_F^2 \le \sum_{k=1}^{d-1}\varepsilon_k^2, \qquad (105)$$
where $\varepsilon_k$ is the Frobenius distance from $A_{[k]}$ to its best rank-$r_k$ approximation:
$$\varepsilon_k = \min_{\mathrm{rank}\,B\le r_k}\|A_{[k]} - B\|_F.$$
Proof. First consider the case $d = 2$. Then the TT decomposition reads
$$T(i_1, i_2) = \sum_{\alpha_1=1}^{r_1}G_1(i_1, \alpha_1)G_2(\alpha_1, i_2)$$
and coincides with the skeleton decomposition of the matrix $T$. Choose $T$ using the rank-$r_1$ truncated SVD of $A$, which guarantees that the norm $\|A - T\|_F = \varepsilon_1$ is the minimal possible.
Then proceed by induction. Consider the SVD of the first unfolding matrix,
$$A_{[1]} \equiv A^{(1)} = [A(i_1; i_2\dots i_d)] = U\Sigma V, \qquad \Sigma = \mathrm{diag}\{\sigma_1, \sigma_2, \dots\}. \qquad (106)$$


Toward the proof of Thm. 12.1 (E) B. Khoromskij, Zurich 2010(L12) 303

As an approximation to $A^{(1)}$, consider
$$B_1 = U_1\Lambda V_1, \qquad \Lambda = \mathrm{diag}(\sigma_1, \dots, \sigma_{r_1}), \qquad (107)$$
where $U_1$ and $V_1$ contain the first $r_1$ columns of $U$ and rows of $V$, respectively. Then $B_1$ is the best rank-$r_1$ approximation to $A_{[1]}$, i.e.,
$$A_{[1]} = B_1 + E_1, \qquad \mathrm{rank}\,B_1 \le r_1, \qquad \|E_1\|_F = \varepsilon_1.$$
Obviously, $B_1$ can be considered as a tensor $\mathbf{B} = [B(i_1, \dots, i_d)]$, and the approximation problem reduces to the one for $\mathbf{B}$.
Observe that if we take an arbitrary tensor $\mathbf{T} = [T(i_1, \dots, i_d)]$ with the first unfolding matrix $T_{[1]} = [T(i_1; i_2, \dots, i_d)]$ of the form
$$T_{[1]} = U_1W \qquad (108)$$
with $U_1$ from (107) and an arbitrary matrix $W$ with $r_1$ rows and as many columns as in $T_{[1]}$, then $E_1^*T_{[1]} = 0$, and this implies that
$$\|(\mathbf{A} - \mathbf{B}) + (\mathbf{B} - \mathbf{T})\|_F^2 = \|\mathbf{A} - \mathbf{B}\|_F^2 + \|\mathbf{B} - \mathbf{T}\|_F^2. \qquad (109)$$
However, the tensor $\mathbf{B}$ is still of dimensionality $d$. To reduce

Toward the proof of Thm. 12.1 (E) B. Khoromskij, Zurich 2010(L12) 304

dimensionality, rewrite the matrix equality (107) in the elementwise form
$$B(i_1; i_2, \dots, i_d) = \sum_{\alpha_1=1}^{r_1}U_1(i_1; \alpha_1)\,\widehat{A}(\alpha_1; i_2, \dots, i_d), \qquad \text{where } \widehat{A} = \Lambda V_1.$$
Then concatenate the indices $\alpha_1$ and $i_2$ into one long index and consider $\widehat{A}$ as a tensor $\widehat{A} = [\widehat{A}(\alpha_1i_2, i_3, \dots, i_d)]$ of dimensionality $d - 1$.
By induction, $\widehat{A}$ admits a TT approximation $\widehat{T} = [\widehat{T}(\alpha_1i_2, i_3, \dots, i_d)]$ of the form
$$\widehat{T}(\alpha_1i_2, i_3, \dots, i_d) = \sum_{\alpha_2,\dots,\alpha_{d-1}}G_2(\alpha_1i_2, \alpha_2)G_3(\alpha_2, i_3, \alpha_3)\cdots G_d(\alpha_{d-1}, i_d),$$
such that
$$\|\widehat{A} - \widehat{T}\|_F^2 \le \sum_{k=2}^{d-1}\widehat{\varepsilon}_k^2, \quad \text{with } \widehat{\varepsilon}_k = \min_{\mathrm{rank}\,C\le r_k}\|\widehat{A}_k - C\|_F, \quad \widehat{A}_k = [\widehat{A}(\alpha_1i_2, \dots, i_k; i_{k+1}, \dots, i_d)].$$
Now let us set $G_1(i_1, \alpha_1) = U_1(i_1, \alpha_1)$, separate the indices $\alpha_1, i_2$ from the long


Toward the proof of Thm. 12.1 (E) B. Khoromskij, Zurich 2010(L12) 305

index $\alpha_1i_2$, and define $\mathbf{T}$ by the following tensor train:
$$T(i_1, \dots, i_d) = \sum_{\alpha_1,\dots,\alpha_{d-1}}G_1(i_1, \alpha_1)G_2(\alpha_1, i_2, \alpha_2)\cdots G_d(\alpha_{d-1}, i_d).$$
It remains to estimate $\|\mathbf{A} - \mathbf{T}\|_F$. First, from (106) and (107) it follows that
$$\widehat{A} = \Lambda V_1 = U_1^*A_{[1]},$$
and consequently (overlined quantities denote complex conjugates)
$$\widehat{A}(\alpha_1i_2, i_3, \dots, i_d) = \sum_{i_1}\overline{U_1(i_1, \alpha_1)}\,A(i_1, i_2, \dots, i_d).$$
Let $A_{[k]} = B_k + E_k$ with $\mathrm{rank}\,B_k \le r_k$ and $\|E_k\|_F = \varepsilon_k$. We can consider $B_k$ and $E_k$ as tensors $B_k(i_1, \dots, i_d)$ and $E_k(i_1, \dots, i_d)$. Since $B_k$ admits a skeleton decomposition with $r_k$ terms, we obtain
$$A(i_1, \dots, i_d) = \sum_{\gamma=1}^{r_k}P(i_1, \dots, i_k; \gamma)Q(\gamma; i_{k+1}, \dots, i_d) + E_k(i_1, \dots, i_d).$$

Toward the proof of Thm. 12.1 (E) B. Khoromskij, Zurich 2010(L12) 306

Hence $\widehat{A}(\alpha_1i_2, i_3, \dots, i_d) = H_k(\alpha_1, i_2, i_3, \dots, i_d) + R_k(\alpha_1, i_2, i_3, \dots, i_d)$ with
$$H_k(\alpha_1, i_2, \dots, i_d) = \sum_{i_1}\overline{U_1(i_1, \alpha_1)}\sum_{\gamma=1}^{r_k}P(i_1, \dots, i_k; \gamma)Q(\gamma; i_{k+1}, \dots, i_d),$$
$$R_k(\alpha_1, i_2, \dots, i_d) = \sum_{i_1}\overline{U_1(i_1, \alpha_1)}\,E_k(i_1, \dots, i_d).$$
Let us introduce a tensor $L$ as follows:
$$L(\alpha_1, i_2, \dots, i_k, \gamma) = \sum_{i_1}\overline{U_1(i_1, \alpha_1)}\,P(i_1, \dots, i_k; \gamma).$$
Then we can consider $H_k$ as a matrix with elements given by the skeleton decomposition
$$H_k(\alpha_1, i_2, \dots, i_k; i_{k+1}, \dots, i_d) = \sum_\gamma L(\alpha_1, i_2, \dots, i_k; \gamma)\,Q(\gamma; i_{k+1}, \dots, i_d),$$
which makes it evident that the rank of $H_k$ does not exceed $r_k$. Likewise, we can consider $R_k$ as a matrix with elements given by
$$R_k(\alpha_1; i_2, i_3, \dots, i_d) = \sum_{i_1}\overline{U_1(i_1; \alpha_1)}\,E_k(i_1; i_2, \dots, i_d).$$
We know that $U_1$ has orthonormal columns, which means that the matrix


Toward the proof of Thm. 12.1 (E) B. Khoromskij, Zurich 2010(L12) 307

$E_k$ is premultiplied by a matrix with orthonormal rows. Since this cannot increase its Frobenius norm, we conclude that
$$\widehat{\varepsilon}_k \le \|R_k\|_F \le \|E_k\|_F = \varepsilon_k, \qquad 2 \le k \le d-1.$$
Hence, for the error tensor $\widehat{E}$ with the elements
$$\widehat{E}(\alpha_1i_2, i_3, \dots, i_d) = \widehat{A}(\alpha_1i_2, i_3, \dots, i_d) - \widehat{T}(\alpha_1i_2, i_3, \dots, i_d),$$
we obtain
$$\|\widehat{E}\|_F^2 \le \sum_{k=2}^{d-1}\varepsilon_k^2.$$
Further, the error tensor $E = \mathbf{B} - \mathbf{T}$ can be viewed as a matrix of the form
$$E(i_1; i_2, \dots, i_d) = \sum_{\alpha_1=1}^{r_1}U_1(i_1; \alpha_1)\,\widehat{E}(\alpha_1; i_2, \dots, i_d),$$
which shows that the matrix $\widehat{E}$ is premultiplied by a matrix with orthonormal columns, so we have $\|E\|_F^2 \le \|\widehat{E}\|_F^2 \le \sum_{k=2}^{d-1}\varepsilon_k^2$.
Finally, we observe that the first unfolding matrix $T_{[1]}$ of $\mathbf{T}$ is exactly of the form (108). Thus, (109) is valid, completing the proof.

Further properties B. Khoromskij, Zurich 2010(L12) 308

Cor. 12.1. Given a tensor $\mathbf{A}$, denote $\varepsilon = \inf_{\mathbf{B}\in TT[\mathbf{r}]}\|\mathbf{A} - \mathbf{B}\|_F$. Then the optimal $\mathbf{B}$ exists (the infimum is in fact a minimum), and the TT approximation $\mathbf{T}$ constructed in the proof of Lem. 12.2 is quasi-optimal in the sense that
$$\|\mathbf{A} - \mathbf{T}\|_F \le \sqrt{d-1}\,\varepsilon.$$
Proof. By the definition of the infimum, there exists a sequence of tensor trains $\mathbf{B}^{(s)}$ $(s = 1, 2, \dots)$ with the property $\lim_{s\to\infty}\|\mathbf{A} - \mathbf{B}^{(s)}\|_F = \varepsilon$.
We cannot say that all elements of the corresponding tensor carriages are uniformly bounded. Nevertheless, all elements of the tensors $\mathbf{B}^{(s)}$ are uniformly bounded, and hence some subsequence $\mathbf{B}^{(s_t)}$ converges elementwise to some tensor $\mathbf{B}^{(\min)}$. The same holds true for the corresponding unfolding matrices, $B_{[k]}^{(s_t)} \to B_{[k]}^{(\min)}$, $1 \le k \le d$.
It is well known that a sequence of matrices with a common rank bound cannot converge to a matrix of larger rank. Thus, $\mathrm{rank}\,B_{[k]}^{(s_t)} \le r_k$ implies $\mathrm{rank}\,B_{[k]}^{(\min)} \le r_k$, and $\|\mathbf{A} - \mathbf{B}^{(\min)}\|_F = \varepsilon$, so $\mathbf{B}^{(\min)}$ is the minimizer.


Further properties B. Khoromskij, Zurich 2010(L12) 309

It is now sufficient to note that $\varepsilon_k \le \varepsilon$. The reason is that $\varepsilon$ is the approximation accuracy for every unfolding matrix $A_{[k]}$ delivered by a specially structured skeleton (dyadic) decomposition with $r_k$ terms, while $\varepsilon_k$ stands for the best approximation accuracy without any restrictions on the vectors of the skeleton decomposition. Hence $\varepsilon_k \le \varepsilon$. The quasi-optimality bound then follows directly from (105).
Cor. 12.2. If a tensor $\mathbf{A}$ admits an $R$-term canonical approximation with accuracy $\varepsilon > 0$, then there exists a TT approximation with $r_k \le R$ and accuracy $\sqrt{d-1}\,\varepsilon$.
Rem. 12.3. Similar to the case of the product Stiefel manifold of Tucker tensors, the set of TT tensors with fixed rank parameters, $TT[\mathbf{r}]$, can be proven to be a nonlinear manifold in the TPHS $\mathbb{V}_n$ (see [7] for a detailed discussion).

Full-to-TT approximation scheme B. Khoromskij, Zurich 2010(L12) 310

Rem. 12.4. The proof of Lem. 12.2 also gives a constructive method for computing a TT approximation.
Alg. 12.1. Full-to-TT compression algorithm ([5]).
Input: a tensor $\mathbf{A}$ of size $n_1\times n_2\times\dots\times n_d$ and an accuracy bound $\varepsilon > 0$.
Output: tensor carriages $G_k$, $k = 1, \dots, d$, defining a TT approximation to $\mathbf{A}$ with the relative error bound $\varepsilon$.
1: Compute $nrm := \|\mathbf{A}\|_F$.
2: Sizes of the first unfolding matrix: $N_l = n_1$, $N_r = \prod_{k=2}^dn_k$.
3: Temporary tensor: $\mathbf{B} = \mathbf{A}$.
4: First unfolding: $M := \mathrm{reshape}(\mathbf{B}, [N_l, N_r])$.
5: Compute the truncated SVD of $M \approx U\Lambda V$, so that the approximate rank $r$ ensures
$$\sum_{k=r+1}^{\min(N_l,N_r)}\sigma_k^2 \le \frac{(\varepsilon\cdot nrm)^2}{d-1}.$$
6: Set $G_1 = U$, $M := \Lambda V^T$, $r_1 = r$.
7: Process the other modes:


Full-to-TT approximation scheme B. Khoromskij, Zurich 2010(L12) 311

8: for $k = 2$ to $d - 1$ do
9: Redefine the sizes: $N_l := n_k$, $N_r := N_r/n_k$.
10: Construct the next unfolding: $M := \mathrm{reshape}(M, [rN_l, N_r])$.
11: Compute the truncated SVD of $M \approx U\Lambda V$, so that the approximate rank $r$ ensures
$$\sum_{k=r+1}^{\min(N_l,N_r)}\sigma_k^2 \le \frac{(\varepsilon\cdot nrm)^2}{d-1}.$$
12: Reshape the matrix $U$ into a tensor: $G_k := \mathrm{reshape}(U, [r_{k-1}, n_k, r_k])$.
13: Recompute $M := \Lambda V$.
14: end for
15: Set $G_d = M$.
Rem. 12.5. Alg. 12.1 scales as $O(n^{d+1})$.
Exer. 12.1. Examine Alg. 12.1 and Alg. 12.2 and compare with the MATLAB codes; a compact sketch of Alg. 12.1 is given below.
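A minimal MATLAB sketch of Alg. 12.1 (an illustrative realisation, using svd as the truncated-SVD step).

function G = full_to_tt(A, tol)
n = size(A); d = numel(n);
nrm = norm(A(:));                            % step 1
delta2 = (tol*nrm)^2 / (d-1);                % per-mode truncation budget
M = A(:); r0 = 1; G = cell(1, d);
for k = 1:d-1
    M = reshape(M, r0*n(k), []);             % steps 4/10: current unfolding
    [U, S, V] = svd(M, 'econ');              % steps 5/11
    s2 = cumsum(diag(S).^2, 'reverse');
    r = find(s2 > delta2, 1, 'last');        % smallest rank with tail <= delta2
    if isempty(r), r = 1; end
    G{k} = reshape(U(:,1:r), r0, n(k), r);   % steps 6/12
    M = S(1:r,1:r) * V(:,1:r)';              % steps 6/13
    r0 = r;
end
G{d} = M;                                    % step 15
end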

Rank reduction in TT format B. Khoromskij, Zurich 2010(L12) 312

One of the most important procedures in structured tensor computation is the recompression of formatted tensors.
Given a tensor $\mathbf{A} \in TT[\mathbf{r}]$ with non-optimal ranks $r_k$, we want to approximate it by another TT tensor $\mathbf{B}$ with the smallest possible ranks $\widehat{r}_k \le r_k$, while maintaining the desired relative accuracy $\varepsilon$:
$$\|\mathbf{A} - \mathbf{B}\|_F \le \varepsilon\|\mathbf{B}\|_F.$$
Such a "projection" defines the $\varepsilon$-truncation operator, $\mathbf{B} = T_\varepsilon(\mathbf{A})$.
The construction of such an operator in the canonical format is a notoriously difficult task, with no best solution known. The use of the Tucker format is limited by the curse of dimension.


Rank reduction in TT format B. Khoromskij, Zurich 2010(L12) 313

For the TT format this can be implemented using standard SVD and QR decompositions (see I. Oseledets, [3]), as in Alg. 12.2 below (with notation from [2]).
A MATLAB code for this algorithm is part of the TT-Toolbox (I. Oseledets).
By $\mathrm{SVD}_\delta$ in Alg. 12.2 we denote the SVD with singular values set to zero if smaller than $\delta$, and by $\mathrm{QR}_{rows}$ we denote the QR decomposition of a matrix in which the Q-factor has orthonormal rows.
$\mathrm{SVD}_\delta(A)$ returns the three matrices $U$, $\Lambda$, $V$ of the decomposition $A \approx U\Lambda V^\top$ (as the MATLAB svd function), and $\mathrm{QR}_{rows}$ returns two: the Q-factor and the R-factor.
Alg. 12.2 is an extension of the reduced truncated matrix SVD, and of the RHOSVD for canonical tensors.

Algorithm of TT rank recompression B. Khoromskij, Zurich 2010(L12) 314

Alg. 12.2. $TT_\varepsilon$ recompression ([3]).
Input: a $d$-dimensional tensor $\mathbf{A}$ in the TT format, required accuracy $\varepsilon > 0$.
Output: $\mathbf{B}$ in the TT format with the smallest compression ranks $\widehat{r}_k$, s.t. $\|\mathbf{A} - \mathbf{B}\|_F \le \varepsilon\|\mathbf{A}\|_F$, i.e., $\mathbf{B} = T_\varepsilon(\mathbf{A})$.
1: Let $G_k$, $k = 1, \dots, d$, be the cores of $\mathbf{A}$.
2: Initialization: compute the truncation parameter $\delta = \frac{\varepsilon}{\sqrt{d-1}}\|\mathbf{A}\|_F$.
3: Right-to-left orthogonalization:
4: for $k = d$ to $2$ step $-1$ do
5: $[G_k(\beta_{k-1}; i_k\beta_k),\, R(\alpha_{k-1}, \beta_{k-1})] := \mathrm{QR}_{rows}(G_k(\alpha_{k-1}; i_k\beta_k))$.
6: $G_{k-1} := G_{k-1}\times_3R$.
7: end for
8: Compression of the orthogonalized representation:
9: for $k = 1$ to $d - 1$ do (compute the $\delta$-truncated SVD):
10: $[G_k(\beta_{k-1}i_k; \gamma_k),\, \Lambda,\, V(\beta_k, \gamma_k)] := \mathrm{SVD}_\delta[G_k(\beta_{k-1}i_k; \beta_k)]$.
11: $G_{k+1} := G_{k+1}\times_1(V\Lambda)^\top$.
12: end for
13: Return $G_k$, $k = 1, \dots, d$, as the cores of $\mathbf{B}$.


Algorithm of TT rank recompression B. Khoromskij, Zurich 2010(L12) 315

Rem. 12.6. The complexity of Alg. 12.2 is $O(dnr^3)$.
Rem. 12.7. All basic multilinear algebra (MLA) operations can be implemented in the TT format: addition, multiplication by a number, scalar product and norm calculation, matrix-by-vector product, tensor-by-vector contracted product, etc. Combined with the well-posed recompression procedure providing quasi-optimal approximation, this gives an efficient tool for solving large-scale high-dimensional problems.
Exer. 12.2. Apply the rank-2 TT representation of the tensor related to
$$f(x) = \sin(x_1 + \dots + x_d) = \frac{e^{ix} - e^{-ix}}{2i} = \mathrm{Im}(e^{ix}), \qquad x = x_1 + \dots + x_d,$$
to approximate the exact value of the multivariate integral for $d = 5, 10, 20$:
$$I(d) = \mathrm{Im}\int_{[0,1]^d}e^{i(x_1+\dots+x_d)}\,dx = \mathrm{Im}\left[\Big(\frac{e^i - 1}{i}\Big)^d\right].$$
Hint: use Lem. 2.1, Lect. 2; apply a simple quadrature rule on an $n\times\dots\times n$ grid, and the TT scalar product.

Basic tensor formats Tucker, MPS: HT, TT, QTT B. Khoromskij, Zurich 2010(L12) 316

Tensor methods in electronic structure calculations (Hartree-Fock eq.): Tucker, mixed Tucker/canonical – [BNK, Khoromskaia, Flad '08-'10]
Canonical format applied to SPDEs – [BNK, Ch. Schwab '10]
DMRG via matrix product states (full configuration interaction quantum chemistry) – [White '92] ... [Schneider '10]
Tucker + hierarchical dimension splitting in molecular dynamics, MCTDH – [Meyer et al. '00-'09, Wang, Thoss '03, ...]
Slightly entangled quantum computation, quantum Ising model – [Vidal '03]
TT (tensor train) + TT-Toolbox (MATLAB) – [Oseledets, Tyrtyshnikov '09]
HDS – O(dr log d n)-representation: [BNK '06]; TC (periodic TT) – [BNK '09]
HT (hierarchical Tucker) – [Hackbusch, Kuhn, Grasedyck '09]
Quantics-TT – [BNK, Oseledets '09]: SPDEs, DMRG in quantum molecular dynamics.
Toward tensor networks: TT/QTT ≈ MPS ⊂ TC/QTC ⊂ TN.


On nonlinear approximation in TT-type tensor formats B. Khoromskij, Zurich 2010(L12) 317

Important operation: "projection" onto the nonlinear manifold $S$ of rank-structured tensors,
$$S \subset S_0 \subset \mathbb{V}_n.$$
Def. 12.3. (Tensor truncation $T_S : S_0 \subset \mathbb{V}_n \to S$.)
Let $S \in \{\mathcal{T}_r, \mathcal{C}_R, \mathcal{T}_{\mathcal{C}_{R,r}}, TT, QTT\}$.
Given $A \in S_0 \subset \mathbb{V}_n$: find $T_S(A) := \operatorname{argmin}_{T\in S}\|T - A\|$.
⇒ This nonlinear approximation problem of computing $T_S$ for $S \in \{TT, TC, QTT\}$ admits an SVD-based implementation providing quasioptimal error and linear scaling in $d$.
The operator $T_S$ is an extension of the truncated SVD for matrices to higher dimensions $d > 2$, getting rid of the "curse of dimensionality".

Approximation tools: combine analytic & algebraic methods B. Khoromskij, Zurich 2010(L12) 318

Approximation in the TT format can be based on:
(a) analytic methods;
(b) canonical → TT recompression (any canonical decomposition is a good starting point for further algebraic TT-rank approximation);
(c) SVD-based recompression of TT tensors;
(d) combinations of (a)-(c).
Exer. 12.3. In some cases a TT representation can be derived from an explicit function-TT (FTT) decomposition [Oseledets '10]. The rank-2 TT representation of the Kronecker sum tensor (cf. Def. 8.4, Lect. 8),
$$A := \sum_{\ell=1}^dX_\ell, \qquad X_\ell \in \mathbb{R}^{n_\ell},$$
is obtained as the $n_1\times\dots\times n_d$ grid representation of the rank-2 FTT decomposition of $f(x) = x_1 + x_2 + \dots + x_d$ (cf. Exer. 3.1, Lect. 1):
$$f(x) = \begin{pmatrix}x_1 & 1\end{pmatrix}\begin{pmatrix}1 & 0\\ x_2 & 1\end{pmatrix}\cdots\begin{pmatrix}1 & 0\\ x_{d-1} & 1\end{pmatrix}\begin{pmatrix}1\\ x_d\end{pmatrix}.$$
Check by the TT-Toolbox, and derive the same with the generalization $x_\ell \to f_\ell(x_\ell)$. (Recall that $\mathrm{rank}(A) = d$.)
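A minimal MATLAB check of the rank-2 FTT factorization above (d = 5 and random x are illustrative).

d = 5; x = rand(d,1);
v = [x(1), 1];                    % first (row) carriage
for l = 2:d-1
    v = v * [1, 0; x(l), 1];      % middle carriages
end
f = v * [1; x(d)];                % last (column) carriage
abs(f - sum(x))                   % zero up to rounding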


Tensor numerical methods: main ingredients B. Khoromskij, Zurich 2010(L12) 319

1. Discretization in the tensor-product Hilbert space of N-d tensors, $\mathbb{V}_n = \mathbb{R}^{I_1\times\dots\times I_d}$, $\#I_\ell = N$.
2. MLA in low separation-rank tensor formats $S \subset \mathbb{V}_n$:
$$S = \mathcal{C}_R, \ \mathcal{T}_r, \ \mathcal{T}_{\mathcal{C}_{R,r}}, \ TT[\mathbf{r}], \ QTT[\mathbf{r}], \ QTT_{loc}.$$
Key point: efficient tensor truncation (projection),
$$T_S : S_0 \to S \subset S_0 \subset \mathbb{V}_n,$$
based on SVD + (R)HOSVD + ALS + ... + multigrid.
3. Multilevel tensor-truncated preconditioned iteration.
4. Minimisation on the tensor manifold: preconditioning + DMRG + QTT.
5. Explicit TT representation of functions and operators.
6. Quasi-direct tensor solvers: $A^{-1}$, $\exp(tA)$, 1D sPDEs.

Literature to Lecture 12 B. Khoromskij, Zurich 2010(L12) 320

1. B.N. Khoromskij, O(d log N)-Quantics Approximation of N-d Tensors in High-Dimensional Numerical Modeling. Preprint 55/2009, MPI MiS, Leipzig 2009. Constr. Approx., 2011, to appear.

2. B.N. Khoromskij and I. Oseledets, Quantics-TT approximation of elliptic solution operators in higher dimensions. Preprint 79/2009, MPI MiS, Leipzig. Rus. J. Numer. Anal. Math. Mod., 2011, to appear.

3. I.V. Oseledets, Compact matrix form of the d-dimensional tensor decomposition. Preprint 09-01, INM RAS, Moscow 2009.

4. I.V. Oseledets and E.E. Tyrtyshnikov, Breaking the Curse of Dimensionality, or How to Use SVD in Many Dimensions. SIAM J. Sci. Comp., 31 (2009), 3744-3759.

5. I.V. Oseledets and E.E. Tyrtyshnikov, TT-cross approximation for multidimensional arrays. Lin. Alg. Appl., 432 (2010), 70-88.

6. B.N. Khoromskij, Structured Rank-(r_1, ..., r_d) Decomposition of Function-related Tensors in ℝ^d. Comp. Meth. in Applied Math., 6 (2006), 2, 194-220.

7. S. Holtz, T. Rohwedder, and R. Schneider, On manifolds of tensors of fixed TT-rank. Tech. Rep. 61, TU Berlin, 2010.


Lect. 13. Quantics-TT model: TT tour of highest dimensions B. Khoromskij, Zurich 2010(L13)

Outline of Lecture 13.

1. Why folding of a vector to high dimensional tensor?

2. Quantics folding. Quantics + TT = QTT.

3. Combined multiplicative tensor formats.

4. QTT representation of exponential/trigonometric vectors.

5. QTT of polynomials, sampled over uniform/graded grids.

6. Analytic methods of approximation.

7. On explicit (closed form) TT/QTT representation.

8. Numerics:

(a) QTT vector/tensor compression.

(b) QTT matrix compression.

9. Toward TT/QTT numerical methods.

Main properties of TT/TC models revisited B. Khoromskij, Zurich 2010(L13) 322

Thm. 12.1. (Lect. 12). (Storage, rank, concatenation, quasioptimality).

(A) Storage: ∑_{ℓ=1}^d r_{ℓ−1} r_ℓ N ≤ d r² N with r = max_ℓ r_ℓ.

(B) Rank bound: r_ℓ ≤ rank_ℓ(V).

(C) Canonical embeddings: TT[r] ⊂ TC[r]; C_{R,n} ⊂ TT[r, n, d] with r = (R, ..., R).

(D) Concatenation to higher dimensions: V[d_1] ⊗ V[d_2] → D = d_1 + d_2.

(E) The quasioptimal TT[r]-approximation of V ∈ V_n,

min_{T ∈ TT[r]} ‖V − T‖_F ≤ ( ∑_{ℓ=1}^d ε_ℓ² )^{1/2},   ε_ℓ = min_{rank(B) ≤ r_ℓ} ‖V_{[ℓ]} − B‖_F,

can be computed by QR/SVD; TC requires ALS iteration.


Quantics model: TT tour of higher dimensions B. Khoromskij, Zurich 2010(L13) 323

Lem. 13.1. [BNK '09-'10] (Quantics approximation - background).
For a given N = 2^L, with L ∈ ℕ, and c, z ∈ ℂ, the exponential N-vector

X := {c z^{n−1}}_{n=1}^N ∈ ℂ^N,

can be reshaped by the dyadic folding to the rank-1, 2 × 2 × ... × 2 (L-fold) tensor,

F_{2,L} : X ↦ A = c ⊗_{p=1}^L [1; z^{2^{p−1}}],   A : {1, 2}^{⊗L} → ℂ   (a 2-L tensor).

The trigonometric N-vector X := {sin(h(n−1))}_{n=1}^N ∈ ℂ^N can be reshaped to a 2-L tensor of TT-rank 2. Hint: sin z = (e^{iz} − e^{−iz})/(2i).

Explicit representation of the sin-vector, with x_p = h 2^{p−1} i_p, i_p ∈ {0, 1}, n − 1 = ∑_{p=1}^L i_p 2^{p−1}:

F_{2,L} : X ↦ [sin x_1  cos x_1] ⊗_{p=2}^{L−1} [cos x_p  −sin x_p; sin x_p  cos x_p] ⊗ [cos x_L; sin x_L]   (indices in {0, 1}^{⊗L}).

Benefit: reduces the number of representation parameters, N → 2 log_2 N.
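A short MATLAB illustration of the lemma (z, h, L are arbitrary picks): the dyadic reshape of an exponential vector has all unfolding ranks 1, that of a sin-vector all unfolding ranks 2.

  % Quantics (dyadic) folding: unfolding ranks of exp- and sin-vectors
  L = 10; N = 2^L; z = 0.97; h = 0.01;
  xe = z.^(0:N-1).';                    % exponential N-vector
  xs = sin(h*(0:N-1)).';                % trigonometric N-vector
  re = zeros(1, L-1); rs = re;
  for p = 1:L-1
      re(p) = rank(reshape(xe, 2^p, []));
      rs(p) = rank(reshape(xs, 2^p, []));
  end
  disp(re)                              % all ones
  disp(rs)                              % all twos
  % rank-1 structure explicitly: X = kron(f_L,...,f_1), f_p = [1; z^(2^(p-1))]
  v = 1;
  for p = 1:L, v = kron([1; z^(2^(p-1))], v); end
  fprintf('reconstruction error: %.2e\n', norm(v - xe))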

Quantics folding to higher dimension B. Khoromskij, Zurich 2010(L13) 324

Quantics folding: vector/tensor tensorization to the highest possible dimension.

Def. 13.1. The q-adic folding of degree 2 ≤ p ≤ L,

F_{q,d,p} : V_{n,d} → V_{m,dp},   m = (m_1, ..., m_d),  m_ℓ = (m_{ℓ,1}, ..., m_{ℓ,p}),

m_{ℓ,1} = q^{L−p+1}, m_{ℓ,ν} = q for ν = 2, ..., p (ℓ = 1, ..., d), reshapes the n-d tensors in V_{n,d} to elements of the quantics space V_{m,dp}:

(A) d = 1: a vector X_{(N,1)} = [X(i)]_{i∈I} ∈ V_{N,1} is reshaped to V_{q^{L−p+1},p},

F_{q,1,p} : X_{(N,1)} → Y_{(m,p)} = [Y(j)] := [X(i)],   j = {j_1, ..., j_p},

j_1 ∈ {1, ..., q^{L−p+1}}, and j_ν ∈ {1, ..., q} for ν = 2, ..., p.
For fixed i, j_ν = j_ν(i) is defined by j_ν = 1 + C_{L−p−1+ν} (ν = 1, ..., p), where the C_{L−p−1+ν} are found from the partial radix-q representation of i − 1,

i − 1 = C_{L−p} + C_{L−p+1} q^{L−p+1} + ··· + C_{L−1} q^{L−1}.

(B) For d > 1 a tensor A_{(n,d)} = [A(i_1, ..., i_d)], i_ℓ ∈ I_ℓ, ℓ = 1, ..., d, is reshaped to

F_{q,d,p} : A_{(n,d)} → B_{(m,dp)} = [B(j_1, ..., j_d)] := [A(i_1, ..., i_d)],   j_ℓ = {j_{ℓ,1}, ..., j_{ℓ,p}},


Quantics folding to higher dimension B. Khoromskij, Zurich 2010(L13) 325

with j_{ℓ,1} ∈ {1, ..., q^{L−p+1}} and j_{ℓ,ν} ∈ {1, ..., q} for ν = 2, ..., p and all ℓ = 1, ..., d. Now the univariate ℓ-mode index i_ℓ is reshaped into j_ℓ as in the case d = 1.

(C) In the case p = 1, we define F_{q,d,1} as the identity mapping.

Rem. 13.1. For the sake of higher compressibility, the maximal degree folding F_{q,d,L} should be applied, corresponding to p = L. In this case the index j_ν − 1 (ν = 1, ..., L) is the q-adic representation of i_ℓ − 1 for i_ℓ ∈ I_ℓ in the radix-q system, s.t. j_ν ∈ {1, ..., q}. If q = 2, use the binary coding of i − 1,

i − 1 = ∑_{ν=1}^L (j_ν − 1) 2^{ν−1}.

Rem. 13.2. One-step folding of an N²-vector to an N × N matrix is a well-known procedure to compress data in signal processing.

The unfolding transform, e.g., tensor-to-matrix (matricization) or tensor-to-vector (vectorization), may be viewed as the reverse of the folding transform,

F_{q,d,p}^{−1} : V_{m,dp} → V_{n,d}.

Quantics folding to higher dimension B. Khoromskij, Zurich 2010(L13) 326

The folding transform F_{q,d,p} exhibits many useful properties:

(F1) F_{q,d,p} is a linear isometry between V_{N,d} and V_{q^{L−p+1},dp} that has the inverse transform (unfolding)

F_{q,d,p}^{−1} : V_{q^{L−p+1},dp} → V_{N,d}.

(F2) The q-folding of a rank-1 tensor w = x_1 ⊗ ... ⊗ x_d ∈ V_{N,d} is given by the outer product of the componentwise reshaping transforms of the canonical vectors,

F_{q,d,p} w = F_{q,1,p} x_1 ⊗ ... ⊗ F_{q,1,p} x_d.

(F3) Let d = 1; then for any p = 2, ..., L and X = [X(i)] ∈ ℂ^N we have a bound on the TT rank of the tensor F_{q,1,L} X,

r_{p−1} ≤ rank(X_p),

where X_p is the reshaping of X to an N/q^{p−1} × q^{p−1} matrix.


Why the quantics model does the job B. Khoromskij, Zurich 2010(L13) 327

Lem. 13.2. For given N = q^L, with q = 2, 3, ... and L ∈ ℕ_+, and for given c_k, z_k ∈ ℂ (k = 1, ..., R), we have:

(A) An exponential-sum N-vector,

X := {x_n := ∑_{k=1}^R c_k z_k^{n−1}}_{n=1}^N,

can be reshaped by the q-folding F_{q,1,L} to the rank-R, q-L tensor in V_{q,L},

F_{q,1,L} : X → A_{(q,L)} = ∑_{k=1}^R c_k ⊗_{p=1}^L [1  z_k^{q^{p−1}}  ...  z_k^{(q−1)q^{p−1}}]^T ∈ C_{R,q}[TT[1]].

(B) A sum of trigonometric N-vectors, X := {x_n := ∑_{k=1}^R c_k sin(α_k(n−1))}_{n=1}^N, can be reshaped to the rank-2R, q-L tensor A_{(q,L)}, whose TT-ranks do not exceed 2R,

F_{q,1,L} : X → A_{(q,L)} = ∑_{k=1}^R A_k ∈ V_{q,L},   with A_k ∈ TT[2, L].

In both cases, the number of representation parameters is reduced from (N+1)R to (qL+1)R and 4qLR, respectively.

Why the quantics model does the job B. Khoromskij, Zurich 2010(L13) 328

Proof. (A) R = 1, induction: L = 2, i.e., N = q², F_{q,1,2} : X_{(q²,1)} → A_{(q,2)},

A_{(q,2)} := [1  z^q  ···  z^{(q−1)q}; z  ···  z^{(q−1)q+1}; ... ; z^{q−1}  z^{2q−1}  ···  z^{q²−1}] = [1; z; ...; z^{q−1}] [1  z^q  ...  z^{(q−1)q}].

Induction step: L to L + 1, i.e., N = q · q^L. The subvectors x_1, ..., x_q ∈ ℝ^{q^L} of X_{(N,1)}, with x_k(i) := X[i + (k−1)q^L, 1] (k = 1, ..., q, i = 1, ..., q^L), represent the result of a one-level folding (p = 2) by the rank-1, N/q × q matrix via rescaling of the first column x_1,

F_{q,1,2} : X_{(N,1)} → A_{(N/q,2)} := c[x_1 x_2 ... x_q] = c x_1 ⊗ y,   y := [1  z^{q^L}  ...  z^{(q−1)q^L}]^T.

By induction, substitute each x_k, k = 1, ..., q, of size N/q = q^L, by a rank-1 tensor,

A_{(q,L+1)} = c [⊗_{p=1}^L [1  z^{q^{p−1}}  ...  z^{(q−1)q^{p−1}}]^T] ⊗ [1  z^{q^L}  ...  z^{(q−1)q^L}]^T.


Why the quantics model does the job B. Khoromskij, Zurich 2010(L13) 329

(B) Again, we begin with the case R = 1. Using the trigonometric identity

sin z = (e^{iz} − e^{−iz})/(2i),

and applying item (A) with R = 1, we arrive at the required claim on the tensor rank of A_{(q,L)} over the field ℂ.

Now, the rank of each ℓ-mode TT-unfolding matrix of the q-L tensor does not exceed 2, since the matrix rank does not change if we extend the field ℝ to ℂ. Since the TT-ranks do not exceed the ranks of the respective directional unfolding matrices, the maximal TT-rank of the q-L tensor A_{(q,L)} is bounded by 2R.

In the case of arbitrary rank parameter R > 1, the result is obtained by summation of the rank-1 (resp. rank-2) terms.

The complexity bounds then follow from Thm. 12.1, (A).

Examples on quantics-TT model B. Khoromskij, Zurich 2010(L13) 330

Ex. 13.1. For d = 1 and p = 2, 3, F_{q,1,p} folds an N-vector to an N/q × q matrix or to an N/q² × q × q 3-tensor, respectively.

Ex. 13.2. Quantics folding of the exponential N-vector: N = 8, L = 3, X = [1 z z² z³ z⁴ z⁵ z⁶ z⁷]^T ∈ ℂ^8, F_{2,3}(X) ∈ ℂ^{2×2×2},

F_{2,1,3} : X ↦ A = [1; z] ⊗ [1; z²] ⊗ [1; z⁴] ∈ QTT[1, 3] ⊂ ℂ^{2×2×2}.

[Figure: a d = 6 tensor train with ranks r_1, ..., r_6 over mode size N, versus the quantics folding F of an N-vector, N = 2³, into a tensor of dimension d = log_2 N = 3 with mode size 2.]

Def. 13.2. Quantics + TT = QTT model (or QTC).


Combined multiplicative tensor formats B. Khoromskij, Zurich 2010(L13) 331

Introduce the Tucker-TC format, T_r[TC[r_1]], containing all Tucker tensors in T_{r,n} with the Tucker core in the rank-r_1 TC format. Now the storage complexity of the representation scales linearly in r, O(drN + d r_1² r), while the representation basis is given explicitly by the "optimal" set of orthogonal Tucker vectors. Notice: TC_{R,r} ⊂ T_r[TT[r_1]] with r_1 = (R, ..., R).

Rem. 13.3. The hierarchical Tucker format introduced in [3] is closely related to the TT model: actually, it is in the spirit of the T_r[TT[r_1]] format.

The so-called canonical-TC format, C_{R,n}[TC[r, L]], is specified as the set of N-d tensors in C_{R,n} with N = q^L, where each canonical N-vector in the rank-1 terms is represented by a q-L tensor in the TC[r, L] format with L = log_q N. The particular representation looks like

V = ∑_{k=1}^R c_k T_k^{(1)} ⊗ T_k^{(2)} ⊗ ... ⊗ T_k^{(d)} ∈ C_{R,n}[TC[r, L]],   (110)

where, for k = 1, ..., R, ν = 1, ..., d,

T_k^{(ν)} := ×_{ℓ=1}^L G_{k,ν}^{(ℓ)} ∈ TC[r, L]   with small-size G_{k,ν}^{(ℓ)} ∈ ℝ^{r_{ℓ−1} × q × r_ℓ}.

The storage complexity scales logarithmically in N, O(R r² d log N); hence it has advantages for large tensor size N.

Toward computations in quantum tensor networks B. Khoromskij, Zurich 2010(L13) 332

[Figure: tensor network diagrams — an open tensor train (MPS) with mode indices i_1, ..., i_d and bond indices α_1, ..., α_{d−1}, and its periodic counterpart (tensor chain) with the closing bond index α_d.]


Quantics approxim. on classes of functional vectors B. Khoromskij, Zurich 2010(L13) 333

The exponential-trigonometric product vector allows a quantics representation of complexity 4 log_q N.

Lem. 13.3. For given N = q^L, with q = 2, 3, ..., L ∈ ℕ_+, and c, z ∈ ℂ, α ∈ ℝ, the exponential-trigonometric vector X := {x_n := c z^{n−1} sin(α(n−1))}_{n=1}^N can be reshaped to the q-L tensor, F_{q,1,L} : X → A_{(q,L)} ∈ TT[2], whose TT-ranks do not exceed 2.

Proof. The properties of the folding transform F_{q,1,L} imply that the q-L tensor A_{(q,L)} is obtained as the Hadamard product of the rank-1 quantics representation of a single exponential and the rank-2 quantics representation of the trigonometric vector (cf. Lem. 13.2). Now, the statement follows from the fact that the Hadamard product with a rank-1 tensor does not enlarge the TT-rank of the second factor, which is exactly 2. Hence, the TT-rank of the resultant q-L tensor A_{(q,L)} does not exceed 2.

Rem. 13.4. The minimization of the parametric function f_x(q) := q log_q x for large values of x ∈ ℝ_+ leads to the optimal quantics base q* ∈ [2, 3]. This means that for large vector size N, the choice q = 2 or 3 leads to the best compression rate.
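A two-line check of the base dependence (the range of q is our arbitrary pick): the storage of an N-vector in quantics format scales like q log_q N = (q/ln q) ln N.

  q = 2:8;
  disp([q; q./log(q)])   % q/ln q is minimal at q = e, so q = 3 and q = 2 are best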

Quantics approxim. on classes of functional vectors B. Khoromskij, Zurich 2010(L13) 334

Rem. 13.5. Property (F3) of the quantics folding ensures that the QTT rank of a vector obtained by equidistant sampling of a polynomial of degree m does not exceed m + 1. In fact, the column space of the reshaped TT-unfolding matrix is spanned by at most m + 1 polynomial vectors generated by 1, x, ..., x^m, respectively. The explicit (closed form) rank-(m+1) QTT representation is discussed in Lect. 14.

The quantics format applies to piecewise polynomial wavelet basis functions. For example, the QTT rank of the Haar wavelet does not exceed 2, implying that the asymptotic QTT compression properties are at least as good as for the Haar wavelets.

Conj. 13.1. Based on extensive numerical tests, we further assume that Gaussian- and sinc-vectors obtained via uniform sampling allow the q-folding quantics approximation, whose TT-rank remains bounded by a small constant (say, 4 or 5) uniformly in the vector size N (see Table 1).
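A quick MATLAB experiment behind the conjecture (the tolerance and α are ad hoc): the ε-ranks of the unfoldings of a uniformly sampled Gaussian stay small as N grows.

  % eps-ranks of the quantics unfoldings of a sampled Gaussian
  alpha = 30; tol = 1e-5;
  for L = [10 12 14]
      N = 2^L; t = linspace(0, 1, N).';
      v = exp(-alpha*t.^2);                  % uniform sampling
      r = zeros(1, L-1);
      for p = 1:L-1
          s = svd(reshape(v, 2^p, []));
          r(p) = sum(s > tol*s(1));
      end
      fprintf('N = 2^%d: max eps-rank %d\n', L, max(r));
  end
  % quadratic grading t_n = sqrt(h(n-1)) makes the sampled Gaussian geometric,
  % hence exactly rank-1 (cf. Rem. 13.6 below)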


Quantics approxim. on classes of functional vectors B. Khoromskij, Zurich 2010(L13) 335

Using equidistant sampling points is not mandatory.
The next statements give a uniform rank bound for polynomial vectors sampled at the N + 1 Chebyshev Gauss-Lobatto nodes,

x_j = cos(πj/N) ∈ [−1, 1],   j = 0, ..., N,

and for Gaussian-type functions with quadratic mesh grading.

Lem. 13.4. (A) For any n = 0, 1, ..., the Chebyshev polynomial T_n(x) = cos(n arccos x), |x| ≤ 1, sampled over N + 1 = 2^L Chebyshev nodes x_j ∈ [−1, 1], can be represented in the quantics space of 2-log N tensors with both C-rank and QTT-rank ≤ 2, uniformly in N.

(B) The Chebyshev polynomial T_n(x), sampled as a vector X at the Chebyshev nodes, θ_j = arccos x_j, has the explicit rank-2 QTT representation (with y_p = h 2^{p−1} i_p − 1, i_p ∈ {0, 1}, h = 2/N),

X ↦ [cos y_1  −sin y_1] ⊗_{p=2}^{L−1} [cos y_p  −sin y_p; sin y_p  cos y_p] ⊗ [cos y_L; sin y_L].

(C) Any polynomial of degree m sampled over N + 1 = 2^L Chebyshev nodes in [−1, 1] has a quantics-TT separation rank bounded by 2m + 1.

Quantics approxim. on classes of functional vectors B. Khoromskij, Zurich 2010(L13) 336

Proof. (A) First we note that the Chebyshev polynomial T_n(x) = cos(n arccos x), sampled at the Chebyshev nodes, coincides with the cos-trigonometric vector sampled over uniformly graded points in the variable θ_j = arccos x_j, j = 0, ..., N. Then the result follows by Lem. 13.1.

(B) Based on the explicit FTT representation of the cos-vector (Lect. 14).

(C) Any polynomial of degree m can be represented in the orthogonal basis of Chebyshev polynomials by at most m + 1 terms with T_0 = 1. Hence (C) follows from item (A).

Lem. 13.4 can be applied to the case of polynomial interpolation over Chebyshev nodes, which are usually preferable compared with equispaced nodes. In fact, this prevents the well-known instability appearing in interpolation processes based on equidistant grids.

Rem. 13.6. The TT-rank of the q-folded discrete Gaussian-type function sampled over the uniform grid, {e^{−α(n−1)²}}_{n=1}^N, appears to be greater than 2, but numerical tests show that it remains almost uniformly bounded in the vector size N (see Table 1 below). Lemma 13.2 implies the rank-1 quantics representation in the case of quadratic mesh grading toward the origin, i.e., by sampling the Gaussian e^{−αt²} over t_n = √(h(n−1)), n = 1, ..., N, h > 0.


Analytic quantics approximation B. Khoromskij, Zurich 2010(L13) 337

Consider the error bound for the semianalytic quantics approximation of function related (FR) tensors.

Lem. 13.5. For a given continuous function f : [a, b] → ℝ and ε > 0, suppose there is an approximation s.t.

max_{x∈[a,b]} |f(x) − ∑_{k=1}^M c_k e^{−t_k x}| ≤ ε.   (111)

Then for any N = q^L, with q = 2, 3, ... and L ∈ ℕ_+, we have:

(A) The FR N-d tensor F = [F_i], defined by

F_i = f(h(i_1 + i_2 + ... + i_d)),   i ∈ I^{⊗d},  h > 0,  with a ≤ dh ≤ b/N,

with the generating multivariate function f(x_1 + ... + x_d), can be represented by a rank-M, q-dL tensor, up to the tolerance ε in the max-norm.

Analytic quantics approximation B. Khoromskij, Zurich 2010(L13) 338

(B) Let a ≤ dh ≤ b/N; then the FR N-d tensor G = [G_i],

G_i = f(x²_{1,i_1} + ... + x²_{d,i_d}),   x_{ℓ,i_ℓ} = √(h i_ℓ),  i ∈ I^{⊗d},

discretizing the multivariate function g = f(x_1² + ... + x_d²) on the polynomially graded grid {x_{ℓ,i_ℓ}}, embedded into the region a ≤ ∑_{ℓ=1}^d x_ℓ² ≤ b, can be approximated by a rank-M, q-dL tensor with the tolerance ε in the max-norm.

In both cases, the representation complexity is O(dqM log_q N).

Proof. We notice that the previous results can be applied to R-term sums of exponential/trigonometric vectors in d dimensions, i.e., to the respective N-d tensors,

A_{(n,d)} := {x_n := ∑_{k=1}^R c_k ∏_{ℓ=1}^d z_{k,ℓ}^{n_ℓ−1}}_{n∈I^{⊗d}},   I = {1, ..., N}.   (112)

A_{(n,d)} can be reshaped to the Q-format C_{R,q}[TT[1, dL]] of complexity dqR log_q N. Now items (A), (B) directly follow from (112).


Analytic quantics approximation B. Khoromskij, Zurich 2010(L13) 339

Lem. 13.5 allows us to derive accurate O(d log N) approximations to a wide class of function related tensors in high dimension. For a class of analytic functions the basic approximability assumption (111) can be verified with

ε = O(e^{−αM/log M}),  α > 0,

by applying the sinc-approximation.

Exer. 13.1. Check the QTT rank of a monomial, a general polynomial and a Chebyshev polynomial, all over uniform and Chebyshev grids in [−1, 1].

Exer. 13.2. Test the QTT rank of the sin-Helmholtz kernel (scaling in κ?).

Exer. 13.3. Find the QTT rank of the step-function and the Haar wavelet.

In all cases look at the average rank,

r := √( (1/(d−1)) ∑_{k=1}^{d−1} r_k r_{k+1} ).

Super-compression in high dimension? B. Khoromskij, Zurich 2010(L13) 340

Exer. 13.4. Linear-log-log scaling via quantics in the auxiliary dimension: the d-th order Hilbert N-d tensor A of size N^{⊗d},

A(i_1, ..., i_d) = 1/(i_1 + i_2 + ... + i_d) ≈ ∑_{k=−M}^M ⊗_{ℓ=1}^d c_k e^{−t_k i_ℓ},

i_1, ..., i_d = 1, ..., N = 2^L, can be approximated by a rank-|log ε| tensor of order D = d log N and of size 2^{⊗D}, requiring only Q = d |log ε| log N ≪ N^d reals.

Using our canonical decomposition, compute its QTT approximation applying C-to-QTT.

Numerical gain:
Matrix case: d = 2, N = 2^{20} ⇒ Q = 40 |log ε| ≪ 2^{40}.
High dimension: d = 2^{10}, N = 2^{20} ⇒ Q = 20 · 2^{10} |log ε| ≪ 2^{2·10^4}.
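A sketch of the exponential-sum step in MATLAB. This uses one standard sinc-quadrature for 1/ρ = ∫_0^∞ e^{−ρt} dt after the substitution t = log(1 + e^τ); the scheme and the value of M are our illustrative choices, not prescribed by the exercise.

  % Separable approximation 1/rho ≈ sum_k c_k exp(-t_k*rho), rho = i1+...+id
  M = 40; h = pi/sqrt(M); tau = (-M:M)*h;     % assumed sinc-quadrature rule
  t = log(1 + exp(tau));                      % nodes t_k
  c = h*exp(tau)./(1 + exp(tau));             % weights c_k
  rho = 3;                                    % e.g. a Hilbert entry, i1+i2+i3 = 3
  err = abs(1/rho - sum(c.*exp(-rho*t)));
  fprintf('rho = %d: error %.2e with %d terms\n', rho, err, 2*M+1)
  % each term exp(-t_k(i1+...+id)) factorizes over the i_l, and each factor
  % exp(-t_k*i_l) is a rank-1 quantics vector: hence O(d|log eps| log N) reals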


Numerics on quantics model B. Khoromskij, Zurich 2010(L13) 341

Tables 1 and 2 represent the average QTT-ranks in the approximation of function related vectors/matrices up to the tolerance ε = 10^{−5}. Table 2 includes the example of a matrix exponential (cf. Conj. 13.1). One can observe that the rank parameters are small and depend very mildly on the grid size.

N \ r  | e^{−αx²}, α = 0.1÷10² | sin(αx)/x, α = 1÷10² | 1/x | e^{−x} | x, x^{10}, x^{1/10}
2^{10} | 3.2/2.8/2.8/2.2       | 4.0/4.7/5.5          | 4   | 3.5    | 1.9/2.7/3.9
2^{12} | 3.1/2.9/2.9/2.6       | 3.8/4.8/5.6          | 4.2 | 3.8    | 1.9/2.6/3.9
2^{14} | 2.9/2.8/2.8/2.8       | 3.6/4.7/5.5          | 4.2 | 3.8    | 1.9/2.5/3.9
2^{16} | 2.8/2.7/2.8/2.8       | 3.6/4.5/5.4          | 4.2 | 5.3    | 1.9/2.4/3.9

Table 1: QTT2-ranks of large functional N-vectors, N = 2^p.

N \ r  | e^{−α∆_1}, α = 0.1, 1, 10, 10² | ∆_1^{−1} | diag(1/x²) | diag(e^{−x²})
2^9    | 6.2/6.8/9.7/11.2               | 6.2      | 5.1        | 4.0
2^{10} | 6.3/6.8/9.5/10.8               | 6.3      | 5.3        | 4.0
2^{11} | 6.4/6.8/9.0/10.4               | 6.2      | 5.5        | 4.1

Table 2: QTT2-matrix-ranks of N × N matrices, N = 2^p.

Numerics on quantics model B. Khoromskij, Zurich 2010(L13) 342

N \ r  | 1/(x_1+x_2) | e^{−‖x‖} | e^{−‖x‖²} | diag(e^{−x²}) | ∆_2^{−1}1, ε = 10^{−6}, 10^{−7}, 10^{−8}
2^9    | 5.0         | 9.4      | 7.8       | 3.8           | 3.6/3.6/3.6
2^{10} | 5.1         | 9.4      | 7.7       | 3.9           | 3.6/3.6/3.6
2^{11} | 5.2         | 9.3      | 7.5       | 3.9           | 3.7/3.7/3.7

Table 3: QTT2-ranks of functional N × N matrices, N = 2^p.

N           | 128  | 256  | 512  | 1024
1/‖x‖       | 13.8 | 16.0 | 17.5 | 18.0
ρ(x)        | 32.0 | 40.0 | 45.8 | 48.6
Hartree (S) | 13.7 | 14.2 | 14.2 | 13.9
Hartree (F) | 32.1 | 34.9 | 20.2 | 28.2

Table 4: QTT2-ranks of the projected 1/‖x‖, x ∈ ℝ³, the Hartree potential V_H, and the electron density ρ of CH₄ on N × N × N grids in 3D; S = 2D slice, F = full tensor.


TT/QTT based tensor numerical methods: main ingredients B. Khoromskij, Zurich 2010(L13) 343

1. Discretization in a tensor-product Hilbert space of N-d tensors, V_n = ℝ^{I_1×···×I_d}, #I_ℓ = N = 2^L.

2. MLA in low rank TT/QTT tensor formats S ⊂ V_n:
S = T_r[TT[r_1]], TT[r], QTT[r], QTT_loc.

Key point: efficient tensor truncation (projection),
T_S : S_0 → S ⊂ S_0 ⊂ V_n,
based on SVD + (R)HOSVD + ALS + ... + multigrid.

3. Explicit TT/QTT-representation of functions and operators.

4. Multilevel tensor-truncated preconditioned iteration.

5. Quasi-direct tensor solvers: A^{−1}, exp(tA), 1D sPDEs.

6. Minimisation on the tensor manifold: precond. + DMRG + QTT.

Literature to Lecture 13 B. Khoromskij, Zurich 2010(L13) 344

1. I.V. Oseledets, Approximation of 2^d × 2^d matrices using tensor decomposition. SIAM J. Matrix Anal. Appl., 2010.

2. B.N. Khoromskij, O(d log N)-Quantics Approximation of N-d Tensors in High-Dimensional Numerical Modeling. Preprint 55/2009, MPI MiS, Leipzig 2009. Constr. Approx., 2011, to appear.

3. W. Hackbusch and S. Kuhn, A new scheme for the tensor representation. Preprint 2/2009, MPI MiS, Leipzig 2009, submitted.

4. B.N. Khoromskij and I. Oseledets, Quantics-TT approximation of elliptic solution operators in higher dimensions. Preprint 79/2009, MPI MiS, Leipzig. Rus. J. Numer. Anal. Math. Mod., 2011, to appear.

5. I.V. Oseledets, Constructive representation of functions in tensor formats. Preprint, INM RAS, Moscow, 2010.

6. I.V. Oseledets and E.E. Tyrtyshnikov, Breaking the Curse of Dimensionality, or How to Use SVD in Many Dimensions. SIAM J. Sci. Comp., 31 (2009), 3744-3759.

7. B.N. Khoromskij, Tensor-structured Numerical Methods in Scientific Computing: Survey on Recent Advances. Preprint 21/2010, MPI MiS, Leipzig 2010, submitted.


Lect. 14. Explicit TT/QTT representation of multivariate vectors B. Khoromskij, Zurich 2010(L14)

Outline of Lecture 14.

1. Operator/matrix TT (OTT/MTT) formats.

2. Explicit (closed form) FTT representation of multivariate

functions in the form f(x) = f1(x1) + f2(x2) + . . .+ fd(xd).

3. Generic trigonometric functions of x1 + ...+ xd.

4. Rank-r separable functions.

5. Multivariate polynomials P (x1 + ...+ xd).

6. Explicit QTT repr. for classes of function related tensors.

7. QTT rank of multivariate polynomials.

8. Remarkable special cases: harmonic oscillator, multivariate

polynomial potentials.

Operator TT (OTT) decomposition B. Khoromskij, Zurich 2010(L14) 346

Recall FTT: given J := ×_{ℓ=1}^d J_ℓ, J_ℓ = {1, ..., r_ℓ}, and J_0 = J_d.

Def. 1.5 (Lect. 1). The rank-r functional tensor chain/train (FTC/FTT) format contains products of functional tri-tensors over J (entrywise representation),

f(x_1, ..., x_d) = ∑_{j∈J} g_1(j_d, x_1, j_1) g_2(j_1, x_2, j_2) ··· g_d(j_{d−1}, x_d, j_d),

or, in tensor/functional form via contracted products,

FTC[r] := {f ∈ H : f = ×_{J_ℓ} {G^{(ℓ)}(x_ℓ)}_{ℓ=1}^d  with G^{(ℓ)} ∈ ℝ^{J_{ℓ−1} × H_ℓ × J_ℓ}}.

A function f(x_1, ..., x_d) ∈ H, x ∈ [0, 1]^d, is represented (approximately) by a product of matrices (matrix product states), each depending on a single variable x_ℓ.

Rem. 14.1. Efficient tensor numerical methods for multidimensional

equations will require not only low rank FTT decomposition but also the

respective multiplicative representation of operators.


Operator TT (OTT) decomposition B. Khoromskij, Zurich 2010(L14) 347

FTT decomposition induces the important concept of multiplicative formats for operators acting between two TPHSs, A : X → Y, each of dimension d (details in Lect. 15).

Ex. 14.1. X = Y = L²[0, 1]^d. X = H_0^1([0, 1]^d), Y = H^{−1}([0, 1]^d). X = Y = ℝ^{n^{⊗d}}.

Def. 14.1. (OTT/OTC decomposition). Introduce the rank-r operator TC (OTC) decomposition symbolised by a set of factorised operators A,

A = ∑_{j∈J} G^{(1)}(j_d, j_1) G^{(2)}(j_1, j_2) ··· G^{(d)}(j_{d−1}, j_d),

with G^{(ℓ)} = [G^{(ℓ)}(j_{ℓ−1}, j_ℓ)] being an operator-valued r_{ℓ−1} × r_ℓ matrix, where G^{(ℓ)}(j_{ℓ−1}, j_ℓ) : X_ℓ → Y_ℓ (ℓ = 1, ..., d), s.t. the action Af on a rank-1 function f ∈ X is defined as the rank-r TT/TC element in Y,

(Af)(y_1, ..., y_d) := ∑_{j∈J} g_1(j_d, y_1, j_1) g_2(j_1, y_2, j_2) ··· g_d(j_{d−1}, y_d, j_d),

with g_ℓ(j_{ℓ−1}, y_ℓ, j_ℓ) = (G^{(ℓ)}(j_{ℓ−1}, j_ℓ) f_ℓ)(y_ℓ).

A sum of univariate functions B. Khoromskij, Zurich 2010(L14) 348

Thm. 14.1. ([5]) The function

f(x) = f_1(x_1) + f_2(x_2) + ... + f_d(x_d)

allows the rank-2 FTT decomposition of the form

f(x) = (f_1(x_1)  1) [1  0; f_2(x_2)  1] ··· [1  0; f_{d−1}(x_{d−1})  1] [1; f_d(x_d)].

Proof. By induction, using the identity, for any a and b,

[1  0; a  1] [1  0; b  1] = [1  0; a+b  1].
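A direct MATLAB check of the factorization at a random point (the univariate functions are arbitrary picks):

  % Evaluate the rank-2 FTT factors of f = f1(x1) + ... + fd(xd)
  fs = {@sin, @cos, @exp, @(t) t.^2, @tanh};
  d = numel(fs); x = rand(1, d);
  v = [fs{1}(x(1)), 1];                   % g1(x1)
  for l = 2:d-1
      v = v*[1, 0; fs{l}(x(l)), 1];       % middle 2x2 cores
  end
  v = v*[1; fs{d}(x(d))];                 % gd(xd)
  direct = sum(cellfun(@(f, t) f(t), fs, num2cell(x)));
  fprintf('FTT product %.12f, direct sum %.12f\n', v, direct)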

Rem. 14.2. The rank-2 TT representation of the Kronecker sum tensor (Def. 8.4),

A := ∑_{ℓ=1}^d X_ℓ,   X_ℓ ∈ ℝ^{n_ℓ},   rank(A) = d,

where X_ℓ is the grid discretization of f_ℓ(x_ℓ), is obtained as the n_1 × ··· × n_d grid representation of the rank-2 FTT decomposition of f(x) = f_1(x_1) + f_2(x_2) + ... + f_d(x_d), as in Thm. 14.1.


Trigonometric functions f(x) = T (x1 + ...+ xd) B. Khoromskij, Zurich 2010(L14) 349

Lem. 14.2. The rank-2 FTT decompositions of f(x) := sin(∑_{j=1}^d x_j) and g(x) := cos(∑_{j=1}^d x_j), x ∈ ℝ^d, have the form (same for tan(∑_{j=1}^d x_j), cot(∑_{j=1}^d x_j), etc.),

f(x) = (sin x_1  cos x_1) [cos x_2  −sin x_2; sin x_2  cos x_2] ··· [cos x_{d−1}  −sin x_{d−1}; sin x_{d−1}  cos x_{d−1}] [cos x_d; sin x_d],

g(x) = (cos x_1  −sin x_1) ∏_{p=2}^{d−1} [cos x_p  −sin x_p; sin x_p  cos x_p] [cos x_d; sin x_d],   respectively.

Proof. Prove the case g(x) by induction, similarly to Ex. 3.1, Lect. 1,

g(x) = cos x_1 cos(x_2 + ... + x_d) − sin x_1 sin(x_2 + ... + x_d)
     = (cos x_1  −sin x_1) [cos(x_2 + ... + x_d); sin(x_2 + ... + x_d)]
     = (cos x_1  −sin x_1) [cos x_2  −sin x_2; sin x_2  cos x_2] [cos(x_3 + ... + x_d); sin(x_3 + ... + x_d)].

H(x+ y) is rank-r separable B. Khoromskij, Zurich 2010(L14) 350

Thm. 14.2. ([5]) Let f be a function that depends on a sum of arguments,

f(x_1, ..., x_d) = H(x_1 + x_2 + ... + x_d),

where the function H(x + y) has separation rank r,

H(x + y) = ∑_{α=1}^r u_α(x) v_α(y),

and the functions u_α(x) and v_α(y) form two linearly independent sets. Then:

1. All FTT-ranks are bounded by r.

2. If, additionally, points x̂_i, i = 1, ..., r, and ŷ_j, j = 1, ..., r, are known s.t. the matrix with elements H(x̂_i + ŷ_j) is nonsingular, then the FTT decomposition takes the form

f = g_1(x_1) G(x_2) · ... · G(x_{d−1}) g_d(x_d),   G(x_ℓ) ∈ ℝ^{r×r},

where

g_1(x_1) = (ψ_1(x_1)  ψ_2(x_1)  ...  ψ_r(x_1)),   G(x)_{ij} = ψ_j(x + ŷ_i),


H(x+ y) is rank-r separable B. Khoromskij, Zurich 2010(L14) 351

with

ψ_i(z) = ∑_{j=1}^r M_{ij} H(x̂_j + z),   g_d(x_d) = [H(ŷ_1 + x_d); H(ŷ_2 + x_d); ...; H(ŷ_r + x_d)],

and

[M_{ij}] = [H(x̂_i + ŷ_j)]^{−1}.

Proof. Due to the separability assumption, the functional skeleton decomposition ensures that there exist points x̂_i, i = 1, ..., r, and ŷ_j, j = 1, ..., r, such that

H(x + y) = ∑_{i,j=1}^r H(x + ŷ_i) M_{ij} H(x̂_j + y),   (113)

where [M_{ij}] = [H(x̂_i + ŷ_j)]^{−1}. The requirement for (113) to be true is that the nodes x̂_i and ŷ_j are chosen

H(x+ y) is rank-r separable B. Khoromskij, Zurich 2010(L14) 352

so that the matrix [H(x̂_i + ŷ_j)] is nonsingular. Let us rewrite the function H(x + y) in the form

H(x + y) = ∑_{i=1}^r H(x + ŷ_i) ψ_i(y) = ∑_{i=1}^r ψ_i(x) H(ŷ_i + y),   (114)

where ψ_i is defined as above (the second equality uses the symmetry of H(x + y) in x and y).

Now, let us proceed to the construction of the FTT decomposition of the function f. From (113) and (114) it follows that

f = (ψ_1(x_1)  ψ_2(x_1)  ...  ψ_r(x_1)) [H(ŷ_1 + (x_2 + ... + x_d)); H(ŷ_2 + (x_2 + ... + x_d)); ...; H(ŷ_r + (x_2 + ... + x_d))].

In each element of the second vector in the r.h.s., x_2 can be separated:

H(ŷ_k + (x_2 + ... + x_d)) = ∑_{i=1}^r ψ_i(ŷ_k + x_2) H(ŷ_i + (x_3 + ... + x_d)),


H(x+ y) is rank-r separable B. Khoromskij, Zurich 2010(L14) 353

therefore

0BBBBBB@

H(by1 + (x2 + . . .+ xd))

H(by2 + (x2 + . . .+ xd))

.

.

.

H(byr + (x2 + . . .+ xd))

1CCCCCCA

= G2(x2)

0BBBBBB@

H(by1 + (x3 + . . .+ xd))

H(by2 + (x3 + . . .+ xd))

.

.

.

H(byr + (x3 + . . .+ xd))

1CCCCCCA,

where G2 is r × r matrix with elements

G2(x2)ij = ψi(x2 + byj).

This completes the proof.

Cor. 14.1. For a function

f(x1, . . . , xd) = P (x1 + x2 + . . .+ xd),

where function P (z) is degree-p polynomial, P (z) =pP

k=0akz

k, FTT

decomposition has form

f = g1(x1)G(x2) · . . . ·G(xd−1)gd(xd),

H(x+ y) is rank-r separable B. Khoromskij, Zurich 2010(L14) 354

where G(x) is a matrix function of size (p+1) × (p+1) with entries

G(x)_{ij} = C_i^{i−j} x^{i−j} for i ≥ j, and G(x)_{ij} = 0 otherwise,

for i, j = 0, ..., p, while C_k^s = k!/(s!(k−s)!) is the binomial coefficient. Moreover,

g_1(x_1) = (φ_0(x_1)  φ_1(x_1)  ···  φ_{p−1}(x_1)  φ_p(x_1)),   φ_s(x) = ∑_{k=s}^p a_k C_k^s x^{k−s},   s = 0, ..., p,

and

g_d(x_d) = (1  x_d  x_d²  ···  x_d^p)^T.

Proof. Follows by Thm. 14.2 due to the rank-(p+1) separable representation of the polynomial P(x + y),

∑_{k=0}^p a_k (x + y)^k = ∑_{k=0}^p a_k ∑_{s=0}^k C_k^s y^s x^{k−s} = ∑_{s=0}^p y^s ∑_{k=s}^p a_k C_k^s x^{k−s} = ∑_{s=0}^p y^s φ_s(x),

with

φ_s(x) = ∑_{k=s}^p a_k C_k^s x^{k−s}.
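A numerical check of Cor. 14.1 in MATLAB (random coefficients; nchoosek supplies the binomials C_k^s):

  % Verify f = g1(x1) G(x2)...G(x_{d-1}) gd(xd) for P(z) = sum_k a_k z^k
  p = 3; a = randn(1, p+1);               % a(k+1) = a_k
  d = 5; x = rand(1, d);
  g1 = zeros(1, p+1);                     % g1(s+1) = phi_s(x1)
  for s = 0:p
      for k = s:p
          g1(s+1) = g1(s+1) + a(k+1)*nchoosek(k, s)*x(1)^(k-s);
      end
  end
  v = g1;
  for l = 2:d-1                           % cores G(x)_{ij} = C_i^{i-j} x^{i-j}
      G = zeros(p+1);
      for i = 0:p
          for j = 0:i
              G(i+1, j+1) = nchoosek(i, i-j)*x(l)^(i-j);
          end
      end
      v = v*G;
  end
  f_ftt = v*(x(d).^(0:p).');              % gd = (1, xd, ..., xd^p)^T
  fprintf('FTT %.12f vs P(sum x) %.12f\n', f_ftt, polyval(fliplr(a), sum(x)))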


Explicit QTT decomp. of rank-r separable funct. B. Khoromskij, Zurich 2010(L14) 355

Thm. 14.3. ([5]) Let f(x) be a continuous function defined on an interval [a, b] such that f(x + y) has separation rank r. Consider the uniform grid on [a, b],

x_i = a + (i−1)h,   h = (b−a)/(n−1),   n = 2^d,   i = 1, ..., n,

the function generated vector v = [v(i)], v(i) = f(x_i), and the 2 × 2 × ··· × 2 tensor V of dimension d, which is a reshape of v:

V(i_1, i_2, ..., i_d) = v(i),

where i_k ∈ {0, 1} (k = 1, ..., d) are the binary digits of the integer i − 1. Then:

1. All QTT ranks are bounded by r.

2. If, additionally, x̂_i, i = 1, ..., r, and ŷ_j, j = 1, ..., r, are known such that the matrix with elements f(x̂_i + ŷ_j) is nonsingular, then the TT decomposition of V (the QTT decomposition of v) has the form

V(i_1, i_2, ..., i_d) = g_1(x_1) G(x_2) · ... · G(x_{d−1}) g_d(x_d),

Explicit QTT decomp. of rank-r separable funct. B. Khoromskij, Zurich 2010(L14) 356

where

g_1(x_1) = (ψ_1(x_1)  ψ_2(x_1)  ...  ψ_r(x_1)),   G(x)_{ij} = ψ_j(x + ŷ_i),

g_d(x_d) = [f(ŷ_1 + x_d); f(ŷ_2 + x_d); ...; f(ŷ_r + x_d)],

ψ_i(z) = ∑_{j=1}^r M_{ij} f(x̂_j + z),   i = 1, ..., r,

x_k = a/d + 2^{k−1} i_k h,   k = 1, ..., d,

and [M_{ij}] = [f(x̂_i + ŷ_j)]^{−1}.

Proof. To prove the theorem, it is sufficient to note that

V(i_1, i_2, ..., i_d) = v(i) = f(a + (i−1)h)   with i − 1 = i_1 + 2i_2 + 4i_3 + ... + 2^{d−1} i_d,

so that f(x_i) = f(x_1 + x_2 + ... + x_d) with x_k = a/d + 2^{k−1} i_k h, and to apply Thm. 14.2.


Explicit QTT decomposition of polynomial vectors B. Khoromskij, Zurich 2010(L14) 357

Cor. 14.2. ([5]) Let

M(x) =

pX

k=0

akxk

be a polynomial of degree p on an interval [a, b]. Consider uniform grid on

this interval,

xi = a+ (i− 1)h, h =b− a

n− 1, n = 2d, i = 1, ..., n,

and vector v = [v(i)],

v(i) = M(xi), i = 1, ..., n,

and a 2 × 2 × . . .× 2 tensor V of dimension d, which is a reshape of v:

V (i1, i2, . . . , id) = v(i),

where ik ∈ 0, 1 (k = 1, ..., d) are binary digits of integer i. Then,

QTT-decomposition of v, that is TT-decomposition of V , has form

V (i1, i2, . . . , id) = g1(i1)G2(i2) . . . Gd−1(id−1)gd(id),

Explicit QTT decomposition of polynomial vectors B. Khoromskij, Zurich 2010(L14) 358

where

g_1(i_1) = (φ_0(a/d + i_1 h)  φ_1(a/d + i_1 h)  ...  φ_{p−1}(a/d + i_1 h)  φ_p(a/d + i_1 h)),

φ_s(x) = ∑_{k=s}^p a_k C_k^s x^{k−s},   s = 0, ..., p,

G_k(i_k) = G(a/d + 2^{k−1} i_k h),

G(x)_{ij} = C_i^{i−j} x^{i−j} for i ≥ j, and 0 for i < j,   i, j = 0, ..., p,

g_d(i_d) = g(a/d + 2^{d−1} i_d h),   with g(x) = (1, x, x², ···, x^p)^T.

Proof. To prove the statement, it is sufficient to note that

V(i_1, i_2, ..., i_d) = v(i) = M(x_1 + x_2 + ... + x_d)   with i − 1 = i_1 + 2i_2 + 4i_3 + ... + 2^{d−1} i_d,

where x_k = a/d + 2^{k−1} i_k h, and to apply Thm. 14.3.
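The rank bound is easy to observe numerically (a plain-MATLAB substitute for the explicit cores; the degree and grid are ad hoc):

  % QTT unfolding ranks of a degree-p polynomial sampled on 2^d points
  p = 4; d = 12; n = 2^d;
  v = polyval(randn(1, p+1), linspace(-1, 1, n).');
  r = zeros(1, d-1);
  for k = 1:d-1
      r(k) = rank(reshape(v, 2^k, []));
  end
  disp(r)   % bounded by p+1 = 5, as predicted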


FQTT decomp. of rational polynomials and trigonometric funct. B. Khoromskij, Zurich 2010(L14)

Cor. 14.3. For a rational function f(x) = p(x)/q(x), where p and q are polynomials defined on an interval [a, b] (|f(x)| < ∞, x ∈ [a, b]), and a uniform grid with n = 2^d grid points, the QTT-ranks behave logarithmically in the accuracy ε > 0 of the QTT-approximation and in the number of grid points n:

r_k = O(log^α(1/ε) log^β n),   α, β ≥ 0.

Proof. Due to Thm. 14.3, the QTT rank estimates are reduced to the estimation of the ε-separation rank of the function

g(x, y) = f(x + y).   (115)

For a rational function such estimates can be obtained via constructive separable approximation schemes (for example, sinc-quadratures).

Cor. 14.4. The trigonometric functions f(x) = sin(x), f(x) = cos(x), f(x) = tan(x), f(x) = cot(x), etc., sampled as an N-vector over an equispaced grid with N = 2^d, admit a closed form QTT decomposition of rank 2.

Proof. Since these functions have separation rank 2, the result follows by Thm. 14.3.

TT ranks of multivariate polynomials B. Khoromskij, Zurich 2010(L14) 360

The question of low-rank approximation of matrices representing multidimensional potentials V(q_1, ..., q_f) can be reduced to low-rank approximation of this function on a tensor grid. In particular, this problem arises in the approximation of the so-called potential energy surface (PES) in molecular dynamics ([6]).

If the variables in V(q_1, ..., q_f) are separated,

V(q_1, ..., q_f) ≈ ∑_{k=1}^r ∏_{i=1}^f v_i(q_i, k),

then the canonical rank of the tensor V does not exceed r, and moreover the TT-ranks of V do not exceed r. However, they can be much smaller. For the important case of polynomial potentials, one can obtain the following estimate on the TT-ranks of the corresponding tensors.

Thm. 14.4. ([6]) For a general homogeneous polynomial potential of the form

V(q_1, ..., q_f) = ∑_{i_1,...,i_s=1}^f a(i_1, ..., i_s) ∏_{k=1}^s q_{i_k},

rank_TT(V) = C_0 f^{[s/2]} + o(f^{[s/2]}).


TT ranks of multivariate polynomials B. Khoromskij, Zurich 2010(L14) 361

Proof. To illustrate the idea, let us first consider a quadratic potential,

V = ∑_{i,j=1}^f a_{ij} q_i q_j,

and estimate its TT-ranks. To prove the statement, we will separate the q_k one-by-one. This is exactly how the numerical algorithm for computing the TT-decomposition works. Suppose we already have a decomposition of the form

V = G_1(q_1) ... G_k(q_k) W(q_{k+1}, ..., q_f),

which is a "partial" variant of the FTT-decomposition, and we want to obtain the next core. W(q_{k+1}, ..., q_f) is actually a parameter-dependent vector of length r_k:

W(q_{k+1}, ..., q_f) = [W_1(q_{k+1}, ..., q_f); ...; W_{r_k}(q_{k+1}, ..., q_f)].

In each element, q_{k+1} can be separated from the other variables, but we require that the same basis functions R_α(q_{k+2}, ..., q_f), α = 1, ..., r_{k+1}, are

TT ranks of multivariate polynomials B. Khoromskij, Zurich 2010(L14) 362

used, i.e.,

W_s(...) = ∑_{α=1}^{r_{k+1}} h_{αs}(q_{k+1}) R_α(q_{k+2}, ..., q_f),   s = 1, ..., r_k.

This can always be done: for each component W_s one can separate q_{k+1} from the other variables with p_s terms, and get p_s basis functions, so that no more than ∑_{s=1}^{r_k} p_s basis functions are required. However, there are cases when fewer basis functions are needed, and we can estimate their number for a polynomial PES.

At the first step, q_1 is separated from the other variables. V is quadratic in q_1:

V = a_{11} q_1² + q_1 ∑_{j=2}^f a_{1j} q_j + ∑_{i,j=2}^f a_{ij} q_i q_j,

hence

V = (a_{11} q_1²  q_1  1) [1; l_1; s_1],

where l_1 is linear in q_2, ..., q_f and s_1 is quadratic in q_2, ..., q_f, so r_1 ≤ 3.

Now, at the second step, separation of q_2 is required. To estimate r_2, one


TT ranks of multivariate polynomials B. Khoromskij, Zurich 2010(L14) 363

has to bound the number of functions depending on q_3, ..., q_f required to represent each element of the vector

(1, l_1, s_1)^T

as a linear combination of such functions with coefficients depending only on q_2. In order to do that, decompose l_1 as

l_1(q_2, ..., q_f) = a_{12} q_2 + l_2(q_3, ..., q_f),

where l_2 is linear in q_3, ..., q_f, and

s_1(q_2, ..., q_f) = a_{22} q_2² + q_2 l_3(q_3, ..., q_f) + s_2(q_3, ..., q_f),

where l_3 is linear and s_2 is quadratic in q_3, ..., q_f. Therefore, the following basis functions arise: 1, l_2, l_3, s_2, i.e., there is one constant, two linear functions in q_3, ..., q_f, and one quadratic in q_3, ..., q_f. It is easy to see what happens next: the quadratic function gives one more linear function to the basis, thus after k steps we will have one constant, k linear functions and 1 quadratic function in q_{k+1}, ..., q_f, and the rank bound is 2 + k.

TT ranks of multivariate polynomials B. Khoromskij, Zurich 2010(L14) 364

However, the dimension of the space of linear functions of q_{k+1}, ..., q_f is bounded by f − k; therefore the rank increases only until k ≤ f − k, i.e., k ≤ [f/2], and then starts to decrease. Thus the maximal rank is about [f/2] + 1.

The idea extends naturally to higher polynomial orders. For degree three, at the first step we will have one constant function, one linear function, one quadratic and one cubic function in the remaining variables. The cubic function produces one quadratic and one linear function (and one cubic remains), and each quadratic function produces one additional linear function. At the k-th step there will be k quadratic functions, and the number of linear functions grows as O(k²), but the dimension of the linear space decreases as f − k; thus while O(k²) ≤ f − k the rank bound is O(k²) + k + 2 and the rank increases with k, whereas for O(k²) > f − k the rank bound is simply (f − k) + k + 2 = f + 2. This is depicted in Table 5. The rank bound can be obtained from Table 5 by taking the minimum of the second and the third column.

The analysis for the general case s ≥ 2 is presented in [6].
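The rank profile from the proof can be seen directly in MATLAB (f and n kept small so the full tensor fits in memory):

  % TT unfolding ranks of a generic quadratic potential V = q A q'
  f = 6; n = 4; A = randn(f); A = A + A.';
  g = linspace(-1, 1, n); sz = n*ones(1, f);
  V = zeros(sz); idx = cell(1, f);
  for lin = 1:n^f
      [idx{:}] = ind2sub(sz, lin);
      q = g([idx{:}]);
      V(lin) = q*A*q.';
  end
  for k = 1:f-1
      fprintf('cut %d: rank %d\n', k, rank(reshape(V, n^k, [])));
  end
  % the ranks grow roughly like k + 2 up to k ~ f/2 and then decrease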


QTT ranks of multivariate polynomials B. Khoromskij, Zurich 2010(L14) 365

Pol. order | Number at k-th step | Dimension of the space
0          | 1                   | 1
1          | k²                  | f − k
2          | k                   | O((f − k)²)
3          | 1                   | O((f − k)³)

Table 5: Different polynomials appearing during the TT-SVD process and the dimensions of the corresponding spans for order-3 polynomials.

Let us estimate the QTT ranks of discretised polynomial potentials sampled over a uniform grid.

Thm. 14.5. ([6]) For a general homogeneous polynomial potential of the form

V(q_1, ..., q_f) = ∑_{i_1,...,i_s=1}^f a(i_1, ..., i_s) ∏_{k=1}^s q_{i_k},

sampled over a uniform grid, we have

rank_QTT(V) = C_0 f^{[s/2]} + o(f^{[s/2]}).

QTT ranks of multivariate polynomials B. Khoromskij, Zurich 2010(L14) 366

Proof. For continuous potentials it is sufficient to notice that the TT-decomposition (without QTT) obtained in the proof of Thm. 14.4 gives rise to the FTT decomposition (it follows from the proof directly)

V(q_1, ..., q_f) = v_1(q_1) V_2(q_2) ··· V_{f−1}(q_{f−1}) v_f(q_f),

where the V_p, p = 2, ..., f−1, are r_{p−1} × r_p matrices whose elements are polynomials in q_p of degree at most s, and the r_p are the TT-ranks of V. After discretization in the variable q_p with 2^k points we obtain a discrete representation of V_p as an r_{p−1} × 2 × 2 × ... × 2 × r_p array, and analogously to the proof in the scalar case it can be shown that for the matrix polynomial case the QTT-ranks are bounded by (r_{p−1} + r_p)(s + 1).

In numerics, it is observed that the constants hidden in the O(·) estimates of Thm. 14.5 are not large. The following hypothesis summarizes the results of our numerical experiments.

Hypoth. 14.1. ([6]) Under the premises of Thm. 14.5, the following rank estimates hold:

1. For a general quadratic potential, V(q_1, ..., q_f) = ∑_{i,j}^f a_{ij} q_i q_j,

rank_QTT(V) ≤ f + 1.


QTT ranks of multivariate polynomials B. Khoromskij, Zurich 2010(L14) 367

2. For a general cubic potential, V(q_1, ..., q_f) = ∑_{i,j,k}^f a_{ijk} q_i q_j q_k,

rank_QTT(V) ≤ f + 1.

3. For a general quartic potential, V(q_1, ..., q_f) = ∑_{i,j,k,l}^f a_{ijkl} q_i q_j q_k q_l,

rank_QTT(V) ≤ f(f + 1).

However, these are upper asymptotic estimates for general coefficients of the polynomials. For particular potentials the ranks can be much smaller.

Lem. 14.3. ([6]) For the harmonic potential,

V(q_1, ..., q_f) = ∑_{k=1}^f w_k q_k²,

the QTT-ranks are bounded by 6, and for the Henon-Heiles potential (used as a benchmark in molecular dynamics computations) of the form

V(q_1, ..., q_f) = (1/2) ∑_{k=1}^f q_k² + λ ∑_{k=1}^{f−1} (q_k² q_{k+1} − (1/3) q_k³),   (116)

the QTT-ranks are bounded by 8 (in numerics we observe 7).

QTT ranks of special multivariate polynomials B. Khoromskij, Zurich 2010(L14) 368

Proof. Since the maximal QTT-rank of the discretized monomial q² is 3, the result for the harmonic potential follows from Thm. 14.1.

To get the decomposition for the Henon-Heiles potential, first separate q_1:

V = (−(λ/3) q_1³ + (1/2) q_1²   λ q_1²   1) [1; q_2; V_2(q_2, ..., q_f)],

where V_2(q_2, ..., q_f) is the Henon-Heiles potential of q_2, ..., q_f. Separation of q_2 gives

V = (−(λ/3) q_1³ + (1/2) q_1²   λ q_1²   1) [1  0  0; q_2  0  0; (1/2) q_2² − (λ/3) q_2³  λ q_2²  1] [1; q_3; V_3(q_3, ..., q_f)],

which justifies the general structure of the FTT cores at the subsequent steps: they are the 3 × 3 matrices

G_k(q_k) = [1  0  0; q_k  0  0; (1/2) q_k² − (λ/3) q_k³  λ q_k²  1],

thus the TT-ranks are equal to 3.
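A numerical check of the rank-3 claim (grid size and λ arbitrary):

  % TT unfolding ranks of the Henon-Heiles potential (116)
  f = 5; n = 6; lam = 0.2; g = linspace(-1, 1, n);
  sz = n*ones(1, f); V = zeros(sz); idx = cell(1, f);
  for lin = 1:n^f
      [idx{:}] = ind2sub(sz, lin);
      q = g([idx{:}]);
      V(lin) = 0.5*sum(q.^2) + lam*sum(q(1:f-1).^2.*q(2:f) - q(1:f-1).^3/3);
  end
  for k = 1:f-1
      fprintf('cut %d: rank %d\n', k, rank(reshape(V, n^k, [])));
  end
  % every cut gives rank 3, matching the 3x3 cores G_k above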


QTT ranks of special multivariate polynomials B. Khoromskij, Zurich 2010(L14) 369

To obtain the QTT-decomposition on [a, b], one should consider the binary representation of q_k, q_k = a + h ∑_{s=1}^{d} i_s 2^{s−1}, where h is the step size and the i_s take values 0 and 1 (for simplicity the index k is omitted; of course a, h, i_s depend on k). Introducing the new variables

x_s = a/d + h i_s 2^{s−1},

we obtain q_k = x_1 + ... + x_d. The estimation of the QTT-ranks is now reduced to the separation of indices in the block parameter-dependent matrix

G(q) = [1  0  0; q  0  0; (1/2) q² − (λ/3) q³  λ q²  1].

This matrix can be split into parts. Its element in position (3,1) is a degree-3 polynomial in q, thus rank_QTT((1/2) q² − (λ/3) q³) ≤ 4, while the remaining non-constant entries contribute a rank bounded by 4, therefore the overall rank estimate is 8.

Rem. 14.3. The Tucker ranks in all these cases are bounded by f, and lead to O(f^f) scaling in general, while the QTT-format gives polynomial storage and polynomial complexity in f, even for the most general coefficients.

Literature to Lecture 14 B. Khoromskij, Zurich 2010(L14) 370

1. I.V. Oseledets, Approximation of 2^d × 2^d matrices using tensor decomposition. SIAM J. Matrix Anal. Appl., 2010.

2. B.N. Khoromskij, O(d log N)-Quantics Approximation of N-d Tensors in High-Dimensional Numerical Modeling. Preprint 55/2009, MPI MiS, Leipzig 2009, submitted.

3. B.N. Khoromskij and I. Oseledets, Quantics-TT approximation of elliptic solution operators in higher dimensions. Preprint 79/2009, MPI MiS, Leipzig 2009, submitted.

4. B.N. Khoromskij, Tensor-structured Numerical Methods in Scientific Computing: Survey on Recent Advances. Preprint 21/2010, MPI MiS, Leipzig 2010, submitted.

5. I.V. Oseledets, Constructive representation of functions in tensor formats. Preprint, INM RAS, Moscow, 2010.

6. B.N. Khoromskij and I.V. Oseledets, DMRG + QTT approach to high-dimensional quantum molecular dynamics. Preprint 68/2010, MPI MiS, Leipzig 2010, submitted.

7. L. Grasedyck, Polynomial approximation in hierarchical Tucker format by vector-tensorization. Preprint 78, Aachen University, Aachen, 2010.

http://personal-homepages.mis.mpg.de/bokh


Lect. 15. Explicit QTT representation of multivariate matrices B. Khoromskij, Zurich 2010(L15)

Outline of Lecture 15.

1. Operator/matrix TT (OTT/MTT) formats.

2. Vector and operator OTT/OQTT ranks.

3. Laplace-type operators and special notations.

4. Shift and gradient matrices.

5. 1D Laplacian.

6. D-dimensional Laplacian.

7. Inverse Laplace operator in 1D.

8. Estimates on vector and operator QTT ranks for a class

of discrete elliptic operators.

9. Numerics.

Operator TT (OTT) decomposition B. Khoromskij, Zurich 2010(L15) 372

FTT decomposition induces the important concept of multiplicative formats for operators acting between two TPHSs, A : X → Y, each of dimension d.

Ex. 14.1. X = Y = L²[0, 1]^d. X = H_0^1([0, 1]^d), Y = H^{−1}([0, 1]^d).

Def. 14.1. (OTT/OTC decomposition). Introduce the rank-r operator TC (OTC) decomposition symbolised by a set of factorised operators A,

A = ∑_{j∈J} G^{(1)}(j_d, j_1) G^{(2)}(j_1, j_2) ··· G^{(d)}(j_{d−1}, j_d),

with G^{(ℓ)} = [G^{(ℓ)}(j_{ℓ−1}, j_ℓ)] being an operator-valued r_{ℓ−1} × r_ℓ matrix, where G^{(ℓ)}(j_{ℓ−1}, j_ℓ) : X_ℓ → Y_ℓ (ℓ = 1, ..., d), s.t. the action Af on a rank-1 function f ∈ X is defined as the rank-r TT/TC element in Y,

(Af)(y_1, ..., y_d) := ∑_{j∈J} g_1(j_d, y_1, j_1) g_2(j_1, y_2, j_2) ··· g_d(j_{d−1}, y_d, j_d),

with g_ℓ(j_{ℓ−1}, y_ℓ, j_ℓ) = (G^{(ℓ)}(j_{ℓ−1}, j_ℓ) f_ℓ)(y_ℓ).


Vector TT/QTT ranks B. Khoromskij, Zurich 2010(L15) 373

Vector TT ranks. The TT representation may be applied to an operator A over a vector space, which can be recognized as a "matricization" of the TT-cores of a "vectorized" matrix, e.g.,

A(i_1, j_1, ..., i_D, j_D) = ∑_{α_1=1}^{r_1} ··· ∑_{α_{D−1}=1}^{r_{D−1}} U_1(i_1, j_1, α_1) U_2(α_1, i_2, j_2, α_2) · ... · U_{D−1}(α_{D−2}, i_{D−1}, j_{D−1}, α_{D−1}) U_D(α_{D−1}, i_D, j_D).   (117)

Let us now return to equation (117) in view of the basic results obtained on the minimal possible k-th rank of an exact rather than approximate TT decomposition of a tensor A (the k-th TT rank of A):

Def. 15.1. Given a multi-way n_1 × ... × n_D vector

A ∈ ℝ^{n_1} × ... × ℝ^{n_D},

its k-th TT rank is the rank of its unfolding A^{(k)} with the elements

A^{(k)}(i_1 ... i_k ; i_{k+1} ... i_D) = A(i_1 ... i_D),   1 ≤ k ≤ D − 1.

Once we apply this to a multi-way matrix rather than a vector, the TT decomposition of which is given by (117), we arrive at the same concept of matrix TT rank. This implies application of TT to a "vectorization" of the matrix: the matrix is considered merely as a vector in (117), and

Vector TT/QTT ranks B. Khoromskij, Zurich 2010(L15) 374

neither its ability to map vectors to vectors nor the related properties are taken into consideration. To emphasize this, we refer to these ranks as vector TT ranks (the presentation is based on V. Kazeev, BNK [5]).

Def. 15.2. Given a multi-way m_1 × n_1 × ... × m_D × n_D matrix

A : ℝ^{n_1} × ... × ℝ^{n_D} ↦ ℝ^{m_1} × ... × ℝ^{m_D},

its k-th vector TT rank is the rank of its unfolding A^{(k)} (1 ≤ k ≤ D − 1) with the elements

A^{(k)}(i_1 j_1 ... i_k j_k ; i_{k+1} j_{k+1} ... i_D j_D) = A(i_1 j_1 ... i_D j_D).

In particular, this means that the minimal vector ranks of a TT decomposition of a certain matrix are somewhat independent from one another, depending on the matrix in the aggregate. So we may consider a minimal rank decomposition, for which no one of the D − 1 ranks can be reduced without introducing an error in (117), even if we allow the others to grow. This makes it reasonable to compare ranks elementwise.

Def. 15.3. Let us say that a multi-way matrix (vector) is of ranks not greater than r_1, ..., r_{D−1} if and only if for every k, 1 ≤ k ≤ D − 1, its k-th vector TT rank is not greater than r_k.


Operator TT/QTT ranks B. Khoromskij, Zurich 2010(L15) 375

Operator TT ranks. The vector TT rank of a matrix is of great importance in view of the storage costs and the complexity of such basic operations as the dot product, multi-dimensional contraction, matrix-by-vector multiplication, rank reduction and orthogonalization of a tensor train. Their complexity upper bound is linear with respect to the vector TT rank upper bound raised to the power 2 or 3.

Even if we manage to perform a matrix-by-vector multiplication, this may not be enough for the solution of the problem involved. For example, when developing iterative solvers we are likely to be concerned with the vector TT ranks of a matrix-by-vector product.

Formally, the ranks of TT decompositions are multiplied when two matrices or a matrix and a vector are multiplied. Often this obvious estimate of the ranks of the product leads to unaffordable complexity estimates but, fortunately, it is not sharp, so that low-rank approximation is possible.

A reasonable a priori estimate of the ranks would allow one to rely upon such an approximation procedure, whose complexity is cubic with respect to the ranks. Below we introduce the concept of operator TT rank.

Operator TT/QTT ranks B. Khoromskij, Zurich 2010(L15) 376

Def. 15.4. Given a multi-way matrix A : ℝ^{n_1} × ... × ℝ^{n_D} ↦ ℝ^{m_1} × ... × ℝ^{m_D}, for any vector X ∈ ℝ^{n_1} × ... × ℝ^{n_D} let us denote the vector TT ranks of the matrix-by-vector product AX by r_1, ..., r_{D−1}. Then let us refer to

max { r_k : k = 1, ..., D−1,  X of vector TT rank 1, ..., 1 }

as the operator TT rank of A.

The following proposition gives an obvious inequality between the two ranks introduced in Def. 15.4 and Def. 15.2.

Prop. 15.1. The operator TT rank does not exceed the maximum component of the vector TT rank.

This estimate is essentially not sharp. For example, consider two vectors X, Y ∈ ℝ^{n_1} × ... × ℝ^{n_D} such that X is of vector TT rank 1, ..., 1. Then for any vector Z ∈ ℝ^{n_1} × ... × ℝ^{n_D} of vector TT rank 1, ..., 1 the tensor (XY′)Z = ⟨Y, Z⟩X is of vector TT rank 1, ..., 1, while (Y X′)Z = ⟨X, Z⟩Y is of the same vector TT rank as Y. Consequently, the operator TT rank of XY′ is equal to 1, while that of Y X′ is as high as the maximum rank of the TT cores of Y, which can be random and have a very bad QTT structure, resulting in a high vector TT rank of XY′.


Explicit QTT for Laplace-related operators B. Khoromskij, Zurich 2010(L15) 377

Class of operators. We focus on the QTT structure of the finite difference discretization ∆^{(d_1...d_D)} of the Laplace operator, considered over a D-dimensional cube on tensor uniform grids, and, in the one-dimensional case, of its inverse as well.

The grids in question are tensor products of D one-dimensional uniform grids, the k-th of them comprising 2^{d_k} points.

By the discrete Laplace operator we mean the matrix

∆^{(d_1...d_D)} = a_1 ∆_1^{(d_1)} ⊗ I_{2^{d_2}} ⊗ ... ⊗ I_{2^{d_D}} + ... + I_{2^{d_1}} ⊗ ... ⊗ I_{2^{d_{D−1}}} ⊗ a_D ∆_D^{(d_D)},

with D terms in the sum, I_m being the m × m identity matrix. The weights a_k take into consideration both the difference in grid steps and anisotropy. For the sake of brevity these weights are set to 1 below unless otherwise stated.

Each ∆_k^{(d_k)} may be any of the following 2^{d_k} × 2^{d_k} matrices, depending on the boundary conditions imposed:

Explicit QTT for Laplace-related operators B. Khoromskij, Zurich 2010(L15) 378

∆_DD^{(d_k)} = tridiag(−1, 2, −1),   ∆_NN^{(d_k)} = tridiag(−1, 2, −1) with the (1,1) and (2^{d_k}, 2^{d_k}) entries equal to 1,   (118)

are the ones for Dirichlet and Neumann boundary conditions, respectively;

∆_DN^{(d_k)} = tridiag(−1, 2, −1) with the last diagonal entry equal to 1,   ∆_ND^{(d_k)} = tridiag(−1, 2, −1) with the first diagonal entry equal to 1,   (119)

are the ones for different boundary conditions in the two boundary points; and

∆_P^{(d_k)} = tridiag(−1, 2, −1) with additional corner entries −1 in positions (1, 2^{d_k}) and (2^{d_k}, 1)   (120)

is the one for periodic boundary conditions.


Notations B. Khoromskij, Zurich 2010(L15) 379

Notation.

I = [1 0; 0 1],  J = [0 1; 0 0],  I_1 = [1 0; 0 0],  I_2 = [0 0; 0 1],  P = [0 1; 1 0],
E = [1 1; 1 1],  F = [1 −1; −1 1],  K = [−1 0; 0 1],  L = [0 −1; 1 0].   (121)

To deal with 3- and 4-dimensional TT cores efficiently, we use matrix notation for them and their convolutions. For instance, if the n × m matrices A_{αβ}, α = 1, ..., r_1, β = 1, ..., r_2, are the TT blocks of a TT core U of mode sizes n and m, left rank r_1 and right rank r_2, so that U(α, i, j, β) = (A_{αβ})_{ij} for all values of the indices, then we write it just as a matrix

U = [A_{11} ··· A_{1r_2}; ... ; A_{r_1 1} ··· A_{r_1 r_2}],

a core matrix (in square brackets). As long as we aim to present the TT structure in terms of a narrow set of TT blocks, we need to focus on the rank structure of the cores, and that is why such a notation is convenient in handling the cores of a TT decomposition.

Notations B. Khoromskij, Zurich 2010(L15) 380

For every two TT cores U and V of proper sizes we define the inner core product U ⋊⋉ V as the regular product of the two core matrices, their elements (TT blocks) being multiplied by means of the tensor product, e.g.,

U ⋊⋉ V = [A_{11}  A_{12}; A_{21}  A_{22}] ⋊⋉ [B_{11}  B_{12}; B_{21}  B_{22}]
       = [A_{11}⊗B_{11} + A_{12}⊗B_{21}   A_{11}⊗B_{12} + A_{12}⊗B_{22};
          A_{21}⊗B_{11} + A_{22}⊗B_{21}   A_{21}⊗B_{12} + A_{22}⊗B_{22}],

and the outer core product U • V as the tensor product of the two core matrices, their elements (TT blocks) being multiplied by means of the regular matrix product, e.g.,

U • V = [A_{11}  A_{12}; A_{21}  A_{22}] • [B_{11}  B_{12}; B_{21}  B_{22}]
      = [A_{11}B_{11}  A_{11}B_{12}  A_{12}B_{11}  A_{12}B_{12};
         A_{11}B_{21}  A_{11}B_{22}  A_{12}B_{21}  A_{12}B_{22};
         A_{21}B_{11}  A_{21}B_{12}  A_{22}B_{11}  A_{22}B_{12};
         A_{21}B_{21}  A_{21}B_{22}  A_{22}B_{21}  A_{22}B_{22}].

In order to avoid confusion we use square brackets for TT cores, which are to be multiplied by means of the inner or outer core product, and round brackets for regular matrices, which are to be multiplied as usual.
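Both core products are a few lines of MATLAB if a core matrix is stored as a cell array of its TT blocks; a sketch of the inner core product ⋊⋉ (the block values are taken from the gradient example on the next slides):

  % Inner core product: cell-matrix product with kron in place of scalar *
  I = eye(2); J = [0 1; 0 0];
  U = {I, J};                      % the 1x2 core matrix [I  J]
  V = {I - J; -J'};                % the 2x1 core matrix [I-J; -J']
  C = cell(size(U, 1), size(V, 2));
  for i = 1:size(U, 1)
      for j = 1:size(V, 2)
          C{i, j} = 0;
          for k = 1:size(U, 2)
              C{i, j} = C{i, j} + kron(U{i, k}, V{k, j});
          end
      end
  end
  disp(C{1, 1})   % = I ⊗ (I-J) - J ⊗ J': the 4x4 gradient matrix G^(2)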


Representation of Laplace-like operator B. Khoromskij, Zurich 2010(L15) 381

Ex. 15.1. Both core products introduced above arise naturally from the TT format. For instance, (117) can be recast as

A = U_1 ⋊⋉ U_2 ⋊⋉ ... ⋊⋉ U_{D−1} ⋊⋉ U_D,

and a matrix product of A and B = V_1 ⋊⋉ V_2 ⋊⋉ ... ⋊⋉ V_{D−1} ⋊⋉ V_D can then be put down as

AB = (U_1 • V_1) ⋊⋉ (U_2 • V_2) ⋊⋉ ... ⋊⋉ (U_{D−1} • V_{D−1}) ⋊⋉ (U_D • V_D),

where B may be either a matrix or a vector.

As usual, by A^{⊗k}, k natural, we mean the k-th tensor power of A. For example, I^{⊗3} = I ⊗ I ⊗ I, and the same for the core product operations "⋊⋉" and "•".

TT structure of "D-dimensional" Laplace-like operators.

Below we will also need a Laplace-like operator L^{(D)}, D ≥ 2, with a

Representation of Laplace-like operator B. Khoromskij, Zurich 2010(L15) 382

slightly more general structure:

L^{(D)} = M_1 ⊗ R_2 ⊗ R_3 ⊗ ... ⊗ R_{D−1} ⊗ R_D
        + L_1 ⊗ M_2 ⊗ R_3 ⊗ ... ⊗ R_{D−1} ⊗ R_D + ...
        + L_1 ⊗ L_2 ⊗ ... ⊗ L_{D−2} ⊗ M_{D−1} ⊗ R_D
        + L_1 ⊗ L_2 ⊗ ... ⊗ L_{D−2} ⊗ L_{D−1} ⊗ M_D,   (122)

the matrices L_k, M_k and R_k being of size m_k × n_k, 1 ≤ k ≤ D.

Lem. 15.1. For any D ≥ 2 the Laplace-like operator L^{(D)} allows the following rank-(2, ..., 2) TT representation in terms of the blocks L_k, M_k and R_k:

L^{(D)} = [L_1  M_1] ⋊⋉ [L_2  M_2; 0  R_2] ⋊⋉ ... ⋊⋉ [L_{D−1}  M_{D−1}; 0  R_{D−1}] ⋊⋉ [M_D; R_D].

Rem. 15.1. Once QTT decompositions of each of the "one-dimensional" operators L_k, M_k and R_k, 1 ≤ k ≤ D, are known, they can easily be merged into a QTT decomposition of the "D-dimensional" operator L^{(D)} according to Lem. 15.1.


“One-dimensional” shift and gradient matrices B. Khoromskij, Zurich 2010(L15) 383

"One-dimensional" shift and gradient matrices.
Let us introduce the QTT structure of two recognizable "one-dimensional" operators of size 2^d: the shift matrix S^{(d)}, with ones on the first superdiagonal and zeros elsewhere, and the gradient matrix G^{(d)} = I − S^{(d)}, with ones on the diagonal and minus ones on the first superdiagonal.

A simple recursive block structure of G^{(k)},

G^{(k)} = [G^{(k−1)}  −J′^{⊗(k−1)}; 0  G^{(k−1)}] = I ⊗ G^{(k−1)} − J ⊗ J′^{⊗(k−1)},

“One-dimensional” shift and gradient matrices B. Khoromskij, Zurich 2010(L15) 384

in our core product notation leads straightforwardly to

G^{(d)} = [I  J] ⋊⋉ [G^{(d−1)}; −J′^{⊗(d−1)}]
        = [I  J] ⋊⋉ [I  J; 0  J′] ⋊⋉ [G^{(d−2)}; −J′^{⊗(d−2)}]
        = ... = [I  J] ⋊⋉ [I  J; 0  J′]^{⋊⋉(d−2)} ⋊⋉ [G^{(1)}; −J′]
        = [I  J] ⋊⋉ [I  J; 0  J′]^{⋊⋉(d−2)} ⋊⋉ [I − J; −J′].

The decomposition of the shift matrix is obtained by the same token:

S^{(d)} = [I  J] ⋊⋉ [I  J; 0  J′]^{⋊⋉(d−2)} ⋊⋉ [S^{(1)}; J′] = [I  J] ⋊⋉ [I  J; 0  J′]^{⋊⋉(d−2)} ⋊⋉ [J; J′].

One-dimensional Laplacian. Consider the "one-dimensional" Laplace operator ∆_DD^{(d)}. Like the gradient matrix dealt with above, it has a low-rank QTT structure, described in the next lemma.


“One-dimensional” shift and gradient matrices B. Khoromskij, Zurich 2010(L15) 385

Lem. 15.2. For any d ≥ 2 it holds that

$$\Delta^{(d)}_{DD} = \begin{bmatrix} I & J' & J \end{bmatrix} \Join \begin{bmatrix} I & J' & J \\ & J & \\ & & J' \end{bmatrix}^{\Join(d-2)} \Join \begin{bmatrix} 2I - J - J' \\ -J \\ -J' \end{bmatrix}.$$

Proof. Similarly to the gradient matrix, ∆(d)_DD exhibits a recursive block structure,

$$\Delta^{(k)}_{DD} = \begin{bmatrix} \Delta^{(k-1)}_{DD} & -J'^{\,\otimes(k-1)} \\ -J^{\otimes(k-1)} & \Delta^{(k-1)}_{DD} \end{bmatrix} = I \otimes \Delta^{(k-1)}_{DD} - J' \otimes J^{\otimes(k-1)} - J \otimes J'^{\,\otimes(k-1)},$$

which yields its low-rank QTT representation:

$$\Delta^{(d)}_{DD} = \begin{bmatrix} I & J' & J \end{bmatrix} \Join \begin{bmatrix} \Delta^{(d-1)}_{DD} \\ -J^{\otimes(d-1)} \\ -J'^{\,\otimes(d-1)} \end{bmatrix} = \begin{bmatrix} I & J' & J \end{bmatrix} \Join \begin{bmatrix} I & J' & J \\ & J & \\ & & J' \end{bmatrix} \Join \begin{bmatrix} \Delta^{(d-2)}_{DD} \\ -J^{\otimes(d-2)} \\ -J'^{\,\otimes(d-2)} \end{bmatrix} = \ldots = \begin{bmatrix} I & J' & J \end{bmatrix} \Join \begin{bmatrix} I & J' & J \\ & J & \\ & & J' \end{bmatrix}^{\Join(d-2)} \Join \begin{bmatrix} 2I - J - J' \\ -J \\ -J' \end{bmatrix}.$$
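The representation can be verified numerically; a minimal sketch for d = 3, again reusing the core_inner helper from above (here Id = eye(2), J = [0 1; 0 0]):

Id = eye(2); J = [0 1; 0 0]; Z = zeros(2);
head = {Id, J', J};
mid  = {Id, J', J; Z, J, Z; Z, Z, J'};
tail = {2*Id - J - J'; -J; -J'};
C  = core_inner(core_inner(head, mid), tail);
D8 = full(spdiags(ones(8,1)*[-1 2 -1], -1:1, 8, 8));  % Delta_DD, n = 2^3
fprintf('Lem. 15.2 check, d = 3: error %.2e\n', norm(C{1,1} - D8));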

D-dimensional Laplacian B. Khoromskij, Zurich 2010(L15) 386

D dimensions. When Mk = ak ∆(dk)_k and Lk = Rk = I^{⊗dk} for every k = 1, . . . , D, the operator L(D) of (122) is a Laplace operator ∆(d1...dD) (377), and hence the following corollary to Lem. 15.2 holds.

Lem. 15.3. For any D ≥ 2 the “D-dimensional” Laplace operator defined by (377) has the following QTT structure in terms of the “one-dimensional” Laplace operators ak ∆(dk)_k, 1 ≤ k ≤ D:

$$\Delta^{(d_1 \ldots d_D)} = \begin{bmatrix} I^{\otimes d_1} & a_1\Delta^{(d_1)}_1 \end{bmatrix} \Join \begin{bmatrix} I^{\otimes d_2} & a_2\Delta^{(d_2)}_2 \\ & I^{\otimes d_2} \end{bmatrix} \Join \cdots \Join \begin{bmatrix} I^{\otimes d_{D-1}} & a_{D-1}\Delta^{(d_{D-1})}_{D-1} \\ & I^{\otimes d_{D-1}} \end{bmatrix} \Join \begin{bmatrix} a_D\Delta^{(d_D)}_D \\ I^{\otimes d_D} \end{bmatrix}.$$

Next we match this with the results of Lem. 15.2 and Lem. 15.3 according to Rem. 15.1. As soon as we derive low-rank QTT representations of the supercores involved, we obtain at once a QTT representation of the D-dimensional Laplace operator comprising these supercores.


D-dimensional Laplacian B. Khoromskij, Zurich 2010(L15) 387

In the case of Dirichlet boundary conditions we put the QTT cores into the supercores involved and proceed as before: we reduce the ranks where possible by eliminating dependent QTT blocks, which can be conceived as sweeping column (with regard to the left core) or row (with regard to the right core) transformation matrices through the “tensor train”, just as in the proof of Lem. 15.2.

Lem. 15.4. For any d ≥ 3 the following QTT representations hold (with d = dk in the core powers):

$$\begin{bmatrix} I^{\otimes d_k} & a_k\Delta^{(d_k)}_{DD} \\ & I^{\otimes d_k} \end{bmatrix} = \begin{bmatrix} I & J' & J & \\ & & & I \end{bmatrix} \Join \begin{bmatrix} I & J' & J & \\ & J & & \\ & & J' & \\ & & & I \end{bmatrix}^{\Join(d-2)} \Join \begin{bmatrix} I & a_k(2I-J-J') \\ & -a_kJ \\ & -a_kJ' \\ & I \end{bmatrix},$$

$$\begin{bmatrix} I^{\otimes d_k} & a_k\Delta^{(d_k)}_{DD} \end{bmatrix} = \begin{bmatrix} I & J' & J \end{bmatrix} \Join \begin{bmatrix} I & J' & J \\ & J & \\ & & J' \end{bmatrix}^{\Join(d-2)} \Join \begin{bmatrix} I & a_k(2I-J-J') \\ & -a_kJ \\ & -a_kJ' \end{bmatrix},$$

$$\begin{bmatrix} a_k\Delta^{(d_k)}_{DD} \\ I^{\otimes d_k} \end{bmatrix} = \begin{bmatrix} I & J' & J & \\ & & & I \end{bmatrix} \Join \begin{bmatrix} I & J' & J & \\ & J & & \\ & & J' & \\ & & & I \end{bmatrix}^{\Join(d-3)} \Join \begin{bmatrix} a_kI & a_kJ' & a_kJ \\ & a_kJ & \\ & & a_kJ' \\ \tfrac12 I & -\tfrac12 I & -\tfrac12 I \end{bmatrix} \Join \begin{bmatrix} 2I-J-J' \\ -J \\ -J' \end{bmatrix}.$$

D-dimensional Laplacian B. Khoromskij, Zurich 2010(L15) 388

Proof. For the middle supercore,

$$\begin{bmatrix} I^{\otimes d_k} & a_k\Delta^{(d_k)}_{DD} \\ & I^{\otimes d_k} \end{bmatrix} = \begin{bmatrix} I & I & J' & J & \\ & & & & I \end{bmatrix} \Join \begin{bmatrix} I & & & & \\ & I & J' & J & \\ & & J & & \\ & & & J' & \\ & & & & I \end{bmatrix}^{\Join(d-2)} \Join \begin{bmatrix} I & \\ & a_k(2I-J-J') \\ & -a_kJ \\ & -a_kJ' \\ & I \end{bmatrix} = \begin{bmatrix} I & J' & J & \\ & & & I \end{bmatrix} \Join \begin{bmatrix} I & J' & J & \\ & J & & \\ & & J' & \\ & & & I \end{bmatrix}^{\Join(d-2)} \Join \begin{bmatrix} I & a_k(2I-J-J') \\ & -a_kJ \\ & -a_kJ' \\ & I \end{bmatrix}.$$

The terminal supercores are subcores of that, which allows one to reduce their ranks similarly to how it was done in the proof of Lem. 15.2.


Laplacian inverse B. Khoromskij, Zurich 2010(L15) 389

1D Laplace operator inverse.

Next we derive low-rank QTT decompositions of the inverse of the discretized Laplace operator, with Dirichlet–Neumann or Dirichlet–Dirichlet boundary conditions imposed. We proceed from the explicit representations of (∆(d)_DD)^{−1} and (∆(d)_DN)^{−1}. The next lemma follows by a direct check.

Lem. 15.5. Let ∆_DD, ∆_DN be n × n matrices. Then

$$\left(\Delta_{DD}^{-1}\right)_{ij} = \frac{1}{n+1}\begin{cases} i\,(n+1-j), & 1 \le i \le j \le n,\\ (n+1-i)\,j, & 1 \le j < i \le n, \end{cases} \qquad \left(\Delta_{DN}^{-1}\right)_{ij} = \frac{1}{n+1}\begin{cases} i\,(n+1), & 1 \le i \le j \le n,\\ (n+1)\,j, & 1 \le j < i \le n, \end{cases}$$

i.e. (∆_DN^{−1})_{ij} = min(i, j).
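Both formulas are easy to check numerically; a minimal plain-MATLAB sketch (here the DN matrix is taken as the Dirichlet Laplacian with the last diagonal entry replaced by 1):

n = 16;
DD = full(spdiags(ones(n,1)*[-1 2 -1], -1:1, n, n));
DN = DD; DN(n,n) = 1;
[i, j] = ndgrid(1:n, 1:n);
InvDD = min(i,j).*(n+1-max(i,j))/(n+1);    % Lem. 15.5, first formula
InvDN = min(i,j);                          % Lem. 15.5, second formula
fprintf('DD: %.2e, DN: %.2e\n', norm(inv(DD)-InvDD), norm(inv(DN)-InvDN));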

Lem. 15.6. For any d ≥ 2 it holds that

$$\left(\Delta^{(d)}_{DN}\right)^{-1} = \begin{bmatrix} I & I_2 & J & J' \end{bmatrix} \Join \begin{bmatrix} I & I_2 & J & J' \\ & 2E & & \\ & I_2 + J' & E & \\ & I_2 + J & & E \end{bmatrix}^{\Join(d-2)} \Join \begin{bmatrix} E + I_2 \\ 2E \\ E + I_2 + J' \\ E + I_2 + J \end{bmatrix},$$

where E denotes the 2 × 2 all-ones matrix and I2 = diag(0, 1) (cf. the K-recursion in the proof below).

Laplacian inverse B. Khoromskij, Zurich 2010(L15) 390

Proof. According to Lem. 15.5, the inverse of the matrix ∆(d)_DN has the following form:

$$\left(\Delta^{(d)}_{DN}\right)^{-1} = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 2 & \cdots & 2 \\ 1 & 2 & 3 & \cdots & 3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 2 & 3 & \cdots & 2^d \end{bmatrix},$$

and hence, introducing the matrices

$$K^{(k)} = \begin{bmatrix} 1 & 2 & 3 & \cdots & 2^k \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 2 & 3 & \cdots & 2^k \end{bmatrix}, \qquad 1 \le k \le d-1,$$

for which it holds that

$$K^{(k)} = \begin{bmatrix} K^{(k-1)} & 2^{k-1}E^{\otimes(k-1)} + K^{(k-1)} \\ K^{(k-1)} & 2^{k-1}E^{\otimes(k-1)} + K^{(k-1)} \end{bmatrix} = \begin{bmatrix} I_2 + J & E \end{bmatrix} \Join \begin{bmatrix} 2^{k-1}E^{\otimes(k-1)} \\ K^{(k-1)} \end{bmatrix},$$


Laplacian inverse B. Khoromskij, Zurich 2010(L15) 391

we draw up the following:

$$\left(\Delta^{(d)}_{DN}\right)^{-1} = \begin{bmatrix} I & I_2 & J & J' \end{bmatrix} \Join \begin{bmatrix} \left(\Delta^{(d-1)}_{DN}\right)^{-1} \\ 2^{d-1}E^{\otimes(d-1)} \\ K^{(d-1)\prime} \\ K^{(d-1)} \end{bmatrix} = \begin{bmatrix} I & I_2 & J & J' \end{bmatrix} \Join \begin{bmatrix} I & I_2 & J & J' \\ & 2E & & \\ & I_2 + J' & E & \\ & I_2 + J & & E \end{bmatrix} \Join \begin{bmatrix} \left(\Delta^{(d-2)}_{DN}\right)^{-1} \\ 2^{d-2}E^{\otimes(d-2)} \\ K^{(d-2)\prime} \\ K^{(d-2)} \end{bmatrix} = \ldots = \begin{bmatrix} I & I_2 & J & J' \end{bmatrix} \Join \begin{bmatrix} I & I_2 & J & J' \\ & 2E & & \\ & I_2 + J' & E & \\ & I_2 + J & & E \end{bmatrix}^{\Join(d-2)} \Join \begin{bmatrix} E + I_2 \\ 2E \\ E + I_2 + J' \\ E + I_2 + J \end{bmatrix}.$$
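A numerical check of Lem. 15.6 for d = 3, reusing the core_inner helper from above (with E the 2 × 2 all-ones matrix and I2 = diag(0, 1), as noted in the lemma):

Id = eye(2); J = [0 1; 0 0]; E = ones(2); I2 = diag([0 1]); Z = zeros(2);
head = {Id, I2, J, J'};
mid  = {Id, I2, J, J'; Z, 2*E, Z, Z; Z, I2+J', E, Z; Z, I2+J, Z, E};
tail = {E + I2; 2*E; E + I2 + J'; E + I2 + J};
C = core_inner(core_inner(head, mid), tail);
[i, j] = ndgrid(1:8, 1:8);
fprintf('Lem. 15.6 check, d = 3: error %.2e\n', norm(C{1,1} - min(i,j)));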

Laplacian inverse B. Khoromskij, Zurich 2010(L15) 392

Lem. 15.7. Let d ≥ 2 and

$$\lambda_k = -\frac{2^{k-1}+1}{2}, \quad \mu_k = \frac{(2^{k-1}+1)^2}{2^k}, \quad \xi_k = \frac{2^{k-1}+1}{2^k+1}, \quad \eta_k = \frac{2^{k-2}}{2^k+1}, \quad C^{(k)} = \begin{bmatrix} \lambda_k & \mu_k \\ \mu_k & \lambda_k \end{bmatrix}$$

for 1 ≤ k ≤ d. Then (∆(d)_DD)^{−1} has a rank-5 . . . 5 QTT representation

$$\left(\Delta^{(d)}_{DD}\right)^{-1} = W_d \Join W_{d-1} \Join \cdots \Join W_2 \Join W_1,$$

consisting of the TT cores

$$W_d = \begin{bmatrix} I & \tfrac14 C^{(d)} & C^{(d)} & -\lambda_d K & -\mu_d L \end{bmatrix}, \qquad W_1 = \begin{bmatrix} \tfrac13(I+E) \\ -E \\ -\tfrac{1}{36}F \\ -\tfrac13 K \\ \tfrac13 L \end{bmatrix},$$

$$W_k = \begin{bmatrix} I & \tfrac14 C^{(k)} & C^{(k)} & -\lambda_k K & -\mu_k L \\ & E & & & \\ & \eta_k^2 F & \xi_k^2 E & \xi_k\eta_k K & \xi_k\eta_k L \\ & \eta_k^2 K & & \xi_k\eta_k E & \\ & -\eta_k^2 L & & & \xi_k\eta_k E \end{bmatrix}, \qquad 2 \le k \le d-1.$$

Sketch of proof. Use the Sherman–Morrison–Woodbury formula and the recursive representations for k = 2, ..., d (see V. Kazeev, BNK [5]).


Collect operator QTT ranks B. Khoromskij, Zurich 2010(L15) 393

Thm. 15.1. The following upper bounds for the vector QTT ranks of the corresponding matrices hold:

∆(d)_DD : 3 . . . 3   (Lem. 15.2)
∆(d)_DN, ∆(d)_ND : 4 . . . 4
∆(d)_NN : 4, 5 . . . 5, 4
∆(d)_P : 2, 3 . . . 3   ([5])
(∆(d)_DD)^{−1} : 4, 5 . . . 5, 4   (Lem. 15.7)
(∆(d)_DN)^{−1}, (∆(d)_ND)^{−1} : 4 . . . 4   (Lem. 15.6)
∆(d1...dD)_DD : 3 . . . 3, 2, 4 . . . 4, 2 . . . . . . 2, 4 . . . 4, 2, 4 . . . 4, 3   (Lem. 15.3, 15.4)
∆(d1...dD)_DN, ∆(d1...dD)_ND : 4 . . . 4, 2, 5 . . . 5, 2 . . . . . . 2, 5 . . . 5, 2, 5 . . . 5, 4
∆(d1...dD)_NN : 4, 5 . . . 5, 2, 5, 6 . . . 6, 5, 2 . . . . . . 2, 5, 6 . . . 6, 5, 2, 5, 6 . . . 6, 4
∆(d1...dD)_P : 2, 3 . . . 3, 2, 3, 4 . . . 4, 2 . . . . . . 2, 3, 4 . . . 4, 2, 3, 4 . . . 4, 3   ([5])

Proof. Follows from the lemmas presenting explicit QTT representations of the mentioned ranks.

Toward numerical issues B. Khoromskij, Zurich 2010(L15) 394

Rem. 15.2. Numerical experiments carried out with the TT-Toolbox confirm that all the upper bounds for the vector TT/QTT ranks given in Thm. 15.1 are sharp, and that the corresponding explicit representations are of minimal rank.

Exer. 15.1. (∆(d)_DD)^{−1} has a rank-5 . . . 5 explicit QTT representation (Lem. 15.7). Confirm it by the numerical QTT decomposition. Compare the CPU time for the explicit and the algebraic decomposition for large n = 2^d.
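A minimal plain-MATLAB sketch of the numerical part of this exercise (no TT-Toolbox; the QTT ranks are obtained as ranks of the successive unfoldings of the quantics-folded matrix):

d = 8; n = 2^d; tol = 1e-10;
Delta = full(spdiags(ones(n,1)*[-1 2 -1], -1:1, n, n));   % Delta_DD
% Quantics folding of the 2^d x 2^d matrix: one (row bit, column bit) pair
% per QTT mode, i.e. a 4 x 4 x ... x 4 tensor.
T = reshape(Delta, 2*ones(1, 2*d));
T = permute(T, reshape([1:d; d+1:2*d], 1, []));   % interleave row/column bits
T = reshape(T, 4*ones(1, d));
r = zeros(1, d-1);
for k = 1:d-1
    r(k) = rank(reshape(T, 4^k, []), tol);        % k-th QTT (unfolding) rank
end
disp(r)   % expected 3 3 ... 3 for Delta_DD (Lem. 15.2); repeat with inv(Delta)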

Exer. 15.2. Recall that the eigenvectors u of ∆(dk)_DD have an explicit rank-2 FQTT decomposition (Lect. 14). Check it by “analytic” calculation of the matrix-vector product of the respective OQTT and FQTT decompositions.

Exer. 15.3. Compare the vector and operator ranks of (∆(d)_DD)^{−1} by QTT calculations.


Numerics on quantics model B. Khoromskij, Zurich 2010(L15) 395

Tables 6 and 7 present the average QTT ranks in the approximation of function-related matrices up to tolerance ε = 10^{−5}. Table 6 includes the example of the matrix exponential (cf. Conj. 13.1). One can observe that the rank parameters are small and depend very mildly on the grid size.

N \ r   e^{−α∆1}, α = 0.1/1/10/10^2   ∆1^{−1}   diag(1/x^2)   diag(e^{−x^2})
2^9     6.2/6.8/9.7/11.2              6.2       5.1           4.0
2^10    6.3/6.8/9.5/10.8              6.3       5.3           4.0
2^11    6.4/6.8/9.0/10.4              6.2       5.5           4.1

Table 6: QTT2-matrix-ranks of N × N matrices for N = 2^p.

N \ r   1/(x1 + x2)   e^{−‖x‖}   e^{−‖x‖^2}   diag(e^{−x^2})   ∆2^{−1}1, ε = 10^{−6}/10^{−7}/10^{−8}
2^9     5.0           9.4        7.8          3.8              3.6/3.6/3.6
2^10    5.1           9.4        7.7          3.9              3.6/3.6/3.6
2^11    5.2           9.3        7.5          3.9              3.7/3.7/3.7

Table 7: QTT2-ranks of functional N × N matrices, N = 2^p.

Literature to Lecture 15 B. Khoromskij, Zurich 2010(L15) 396

1. I.V. Oseledets, Approximation of 2^d × 2^d matrices using tensor decomposition. SIAM J. Matrix Anal. Appl., 2010.
2. B.N. Khoromskij, O(d log N)-Quantics Approximation of N-d Tensors in High-Dimensional Numerical Modeling. Preprint 55/2009, MPI MiS, Leipzig 2009 (submitted).
3. I.V. Oseledets, Constructive representation of functions in tensor formats. Preprint INM Moscow, 2010.
4. B.N. Khoromskij and I.V. Oseledets, DMRG + QTT approach to high-dimensional quantum molecular dynamics. Preprint MPI MiS 68/2010, Leipzig 2010 (submitted).
5. V. Kazeev and B.N. Khoromskij, On explicit QTT representation of Laplace-like operators and their inverse. Preprint MPI MiS 75/2010, Leipzig 2010 (submitted).

http://personal-homepages.mis.mpg.de/bokh


Lect. 16. Preconditioned tensor truncated iterative solvers B. Khoromskij, Zurich 2010(L16) 397

Outline of Lecture 16.

1. Problem classes.

2. BVPs: TT/QTT truncated preconditioned iteration.

3. Laplacian-based preconditioners for slowly and strongly varying coefficients.

4. Recursive quadratures for matrix exponential family.

5. EVPs: TT/QTT truncated preconditioned iteration.

6. Green function iteration.

7. Numerical experience.

8. Parabolic problems.

9. Regularised QTT matrix exponential.

10. Numerics on QTT representation of multivariate

potentials.

Problem classes in Rd revisited B. Khoromskij, Zurich 2010(L16) 398

Elliptic (parameter-dependent) eq.: Find u ∈ H^1_0(Ω), s.t.

Hu := −div (a grad u) + V u = F in Ω ⊂ R^d.

EVP: Find a pair (λ, u) ∈ R × H^1_0(Ω), s.t. ⟨u, u⟩ = 1,

Hu = λu in Ω ⊂ R^d,   u = 0 on ∂Ω.

Parabolic equations: Find u : R^d × (0, ∞) → R, s.t.

u(x, 0) ∈ H^2(R^d):   σ ∂u/∂t + Hu = 0,   H = ∆_d + V(x_1, ..., x_d).

Specific features:

⊲ High spatial dimension: Ω = (−b, b)^d ⊂ R^d (d = 2, 3, ..., 100, ...).
⊲ Multiparametric eq.: a(y, x), u(y, x), y ∈ R^M (M = 1, 2, ..., 100, ..., ∞).
⊲ Nonlinear, nonlocal (integral) operator V = V(x, u), singular potentials.


Tensor-truncated preconditioned iteration: BVP B. Khoromskij, Zurich 2010(L16) 399

Parametric elliptic BVP on a nonlinear manifold S:

A(y)U(y) = F,

U_{m+1} = U_m − B^{−1}(AU_m − F),   U_{m+1} := T_S(U_{m+1}) ∈ S.

Assumptions:

U, F allow a low S-rank tensor approximation,
A and B^{−1} are of low matrix S-rank,
A and B are spectrally equivalent (close).

Good candidates for B^{−1}:

(A) Slowly variable a(x): shifted anisotropic d-Laplacian inverse (∆(d) + a_0 I)^{−1},

∆(d) = a_1∆_1 ⊗ I_N ⊗ ... ⊗ I_N + ... + I_N ⊗ I_N ⊗ ... ⊗ a_d∆_d ∈ R^{N^{⊗d} × N^{⊗d}}.

(B) Highly variable coefficient a(y, x): the “reciprocal” preconditioner

(∇^T a ∇)^{−1} ≈ P := ∆^{−1} (∇^T (1/a) ∇) ∆^{−1}.   (123)

Dolgov, BNK, Oseledets, Tyrtyshnikov [1]

QTT representation of operators (matrices) B. Khoromskij, Zurich 2010(L16) 400

Rank bounds for Laplacian-related matrices

Lem. 16.1. The following TT/QTT rank estimates hold:

rank_C(∆_d) = d,   rank_TT(∆_d) = 2.
rank_QTT(∆_1) = 3,   rank_QTT(∆_d) = 4, d ≥ 2,   Kazeev, BNK [9].
rank_QTT(∆_1^{−1}) ≤ 5, [9].   rank_QTT(∇^T a ∇) ≤ 7 rank_QTT(a), [1].
ε-rank: rank_QTT(exp(−α∆_1)) ≤ C |log ε| |log α| (numerical proof).
ε-rank: rank_TT(∆_d^{−1}) ≤ rank_C(∆_d^{−1}) ≤ C |log ε| log N.
ε-rank: rank_QTT(∆_d^{−1}) ≤ C |log ε|^2 log N.

∆^{−1}(∇^T (1/a) ∇) ∆^{−1}(∇^T a ∇) = I + R, where rank(R) = 1 for d = 1, and rank(R) = const for d ≥ 2 in the case of a pwc coefficient with one interface. Numerically, R demonstrates surprisingly good clustering properties, [1].


Low tensor rank Laplacian inverse B. Khoromskij, Zurich 2010(L16) 401

Ex. 16.1. ∆(d): FD negative Laplacian on H^1_0([0, 1]^d), d = 2, 3, .... Sinc-type rank-R approximation of ∆(d)^{−1}, R = 2M + 1:

B_M := Σ_{k=−M}^{M} c_k ⊗_{ℓ=1}^{d} exp(−t_k ∆(ℓ)) ≈ (∆(d))^{−1},   ∆(ℓ) = ∆ ∈ R^{n×n}.

Exponential convergence in R:

‖(∆(d))^{−1} − B_M‖ ≤ C_0 e^{−π√M},  for t_k = e^{kh}, c_k = h t_k, h = π/√M.   (124)

The ε-rank of (∆(d))^{−1} is O(|log ε|^2), uniformly in d.

The matrix-vector multiplication of B_M with a rank-1 vector in R^{n^{⊗d}} takes O(dRn log n) operations by the diagonalization

exp(−t_k ∆(ℓ)) = F_ℓ' · D_ℓ · F_ℓ,   D_ℓ = diag(e^{−t_k λ^{(ℓ)}_1}, ..., e^{−t_k λ^{(ℓ)}_n}),

where F_ℓ is the ℓ-mode sin-transform n × n matrix, and λ^{(ℓ)}_i (i = 1, ..., n) are the eigenvalues of the 1D Laplacian ∆(ℓ).
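A minimal plain-MATLAB sketch of B_M applied to a rank-1 right-hand side (d = 3 and the full-grid residual check are only for illustration; the (n+1)^2 grid scaling of ∆ on [0, 1] is an assumption of the sketch):

d = 3; n = 32; M = 25;
Delta = (n+1)^2 * spdiags(ones(n,1)*[-1 2 -1], -1:1, n, n);  % 1D FD Laplacian
h  = pi/sqrt(M); k = -M:M;
tk = exp(k*h); ck = h*tk;                 % sinc-quadrature nodes and weights (124)
f1 = ones(n,1);                           % rank-1 RHS, F = 1
U  = zeros(n, n, n);
for q = 1:numel(k)
    v = expm(-tk(q)*full(Delta)) * f1;    % the same 1D exponential in every mode
    U = U + ck(q) * reshape(kron(v, kron(v, v)), n, n, n);
end
% Residual check against the assembled 3D operator (feasible only for small n):
A3 = kron(kron(speye(n), speye(n)), Delta) + kron(kron(speye(n), Delta), speye(n)) ...
   + kron(kron(Delta, speye(n)), speye(n));
fvec = kron(f1, kron(f1, f1));
fprintf('rel. residual: %.2e\n', norm(A3*U(:) - fvec)/norm(fvec));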

Low tensor rank Laplacian inverse B. Khoromskij, Zurich 2010(L16) 402

An equivalent representation using the tensor product FFT:

(∆(d))^{−1} ≈ (⊗_{ℓ=1}^{d} F_ℓ^T) (Σ_{k=−M}^{M} c_k ⊗_{ℓ=1}^{d} diag(e^{−t_k λ^{(ℓ)}_1}, ..., e^{−t_k λ^{(ℓ)}_n})) (⊗_{ℓ=1}^{d} F_ℓ) =: L_M.

Exer. 16.1. Construct the QTT Poisson/Yukawa solver on a uniform grid over [0, 1]^d (large d, moderate n) by precomputing the QTT decompositions of the matrix exponential family

exp(−t_k ∆(ℓ)) ∈ R^{n×n}, k = −M, ..., M.

Try it with F = 1, large d and moderate grid size n ≤ 128.

Exer. 16.2. Construct the TT Poisson/Yukawa solver on a uniform grid over [0, 1]^d by L_M + FFT, using the TT-rank recompression of the Hilbert tensor Λ = [1/(λ^{(1)}_{i_1} + ... + λ^{(d)}_{i_d})]. Try the case F = 1, d = 3, for large grid size n (proceed from Exer. 10.1).


Analysis of precond: Stability of spectral equivalence B. Khoromskij, Zurich 2010(L16) 403

Introduce the class of rank-R Kronecker product preconditioners defined by the approximate inverse B_R as above. Here the coefficients a_ℓ can be chosen from the optimisation of the condition number in (with L_0 = ∆(d))

C_1 ⟨AU, U⟩ ≤ ⟨L_0 U, U⟩ ≤ C_2 ⟨AU, U⟩ ∀U ∈ V_n,   (125)

so that the matrix L_0 is supposed to be spectrally close/equivalent to the initial stiffness matrix A.

The following lemma proves the spectral equivalence estimates.

Lem. 16.2. [5] (Spectral equivalence). Suppose that the constants C_0, C_1, C_2 are determined by (124) and (125), respectively. Choose M such that the inequality C_0 ‖L_0‖ ‖B_R^{−1}‖ e^{−π√M} < q(M) C_1 holds with q < 1. Then

C̃_1 ⟨AU, U⟩ ≤ ⟨B_R^{−1} U, U⟩ ≤ C̃_2 ⟨AU, U⟩ ∀U ∈ V_n,   (126)

with some spectral equivalence constants C̃_1, C̃_2 > 0 that allow the following bound on the condition number:

C̃_2/C̃_1 ≤ (1/(1 − q(M))) (C_2/C_1) + q(M)/(1 − q(M)).

Analysis of precond: Stability of spectral equivalence B. Khoromskij, Zurich 2010(L16) 404

Proof. We have

‖L_0 − B_R^{−1}‖ = ‖−L_0 (L_0^{−1} − B_R) B_R^{−1}‖ ≤ ‖L_0‖ ‖L_0^{−1} − B_R‖ ‖B_R^{−1}‖.

Using (124), the constants C̃_1, C̃_2 > 0 can be estimated by

C_1 − C_0 ‖L_0‖ ‖B_R^{−1}‖ e^{−π√M} ≤ C̃_1,   C̃_2 ≤ C_2 + C_0 ‖L_0‖ ‖B_R^{−1}‖ e^{−π√M},

with C_0 > 0 defined in (124). Combining the above inequality with the error estimate (124) and with (125) leads to the desired bound.

Rem. 16.1. [5] Lem. 16.2 indicates that the rank-R preconditioner B_R^{−1} has linear (or quadratic) scaling in the univariate problem size n, providing at the same time a condition number of order C_2/C_1 as in (125), as soon as the estimate

C_0 ‖L_0‖ ‖B_R^{−1}‖ e^{−π√M} < C_1

holds. The latter is valid for R = O(|log(q(M)/C_1)|^2) = O((log n)^2). Notice that the modified sinc-quadrature leads to the improved convergence rate in (124), C_0 e^{−αM/log(M)} with α = log(cond(L_0)), again providing the rank estimate R = O(log(|log(q(M))|/C_1)) = O((log n)^2).


Numerics on tensor-structured Laplacian inverse B. Khoromskij, Zurich 2010(L16) 405

BNK, Oseledets, [6].

N      Precomp   Time for sol   Residue    Relative L2 error
2^8    6.14      2.98           6.6e-06    7.0e-06
2^9    8.37      3.52           8.7e-06    7.0e-06
2^10   10.81     4.02           9.4e-06    7.0e-06

Table 8: Solving the 100D Poisson equation in C-QTT.

Numerical complexity in the QTT format: W = O(d |log ε|^2 log N).

We used the mixed Can-QTT (or simply CQTT) format: approximate each individual canonical factor in the QTT format, but do not assemble the full QTT matrix. It is easy to design algorithms for matrix and vector operations in such a format, using algorithms from the TT-Toolbox.

The results for d = 100, given in Table 8, confirm the expected complexity. Now, as a precomputation, we have the time for the evaluation of the matrix exponentials. As a price, the solution time is slightly higher, but it is still linear in d and logarithmic in the grid size n.

Numerics on tensor-structured Laplacian inverse B. Khoromskij, Zurich 2010(L16) 406

n      Step 1   Step 2   Time for sol   Residue      Relative L2 error
2^7    7.34     1.66     0.11           7.4e-03      6.3e-06
2^8    10.01    2.67     0.19           2.1e-04      8.3e-06
2^9    12.43    7.68     0.36           2.0e-04      1.0e-05
2^10   18.71    27.89    0.49           1.7e-03      1.8e-05

Table 9: Numerics for the 3D Poisson equation, using the 2^k-quadrature.

n      Step 1   Step 2   Time for sol   Residue      Relative L2 error
2^7    4.68     8.56     0.58           1.18e-02     7.35e-05
2^8    6.94     12.98    0.78           4.42e-01     7.81e-04
2^9    9.99     24.12    1.17           5.97e+00     1.75e-02
2^10   13.32    45.37    1.48           1.0774e+00   6.23e-01

Table 10: Numerics for the 10D Poisson equation, using the 2^k-quadrature.


The recursive quadrature for matrix exponential family B. Khoromskij, Zurich 2010(L16) 407

We discuss a simple recursion that connects the previously computed exponentials exp(−t_p ∆_1), p < k, with the new one for the index k. Denote these matrices by Φ_k,

Φ_k = exp(−t_k ∆_1).

The simplest possible recursion is

Φ_k = Φ_{k−1}^2, corresponding to t_k = 2 t_{k−1}.

This is possible by choosing M such that e^h = 2, or equivalently, h = log 2 = π/√M, therefore

M = (π/log 2)^2 ≈ 20.54.

Since M should be an integer, we select M = 21 or M = 20 and slightly modify h (h = log 2) to make the recursion exact. This yields a new quadrature formula with

t_k = 2^k, c_k = 2^k log 2, k = −M, . . . , M.   (127)

This quadrature will be called the 2^k-quadrature.
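A minimal plain-MATLAB sketch of the 2^k-quadrature recursion, including the mixed scheme described on the next slide (the threshold k0 is an illustrative choice):

n = 64; Delta = (n+1)^2 * full(spdiags(ones(n,1)*[-1 2 -1], -1:1, n, n));
M = 21; k0 = -M + 5;                  % below k0: direct expm (illustrative)
Phi = cell(1, 2*M + 1);               % Phi{k+M+1} = exp(-2^k * Delta)
for k = -M:M
    m = k + M + 1;
    if k <= k0
        Phi{m} = expm(-2^k * Delta);  % scaling-and-squaring inside expm
    else
        Phi{m} = Phi{m-1}^2;          % Phi_k = Phi_{k-1}^2, since t_k = 2 t_{k-1}
    end
end
ck = 2.^(-M:M) * log(2);              % the quadrature weights of (127)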

The recursive quadrature for matrix exponential family B. Khoromskij, Zurich 2010(L16) 408

The accuracy of the quadrature formula (127) depends on the interval where it is considered (i.e. on the spectrum of ∆_1), but it always gives an excellent approximate inverse to serve as a preconditioner (with relative accuracy not worse than 10^{−3}). The special structure of the quadrature nodes allows all the exponentials to be computed fast, in 2M + 1 multiplications (in exact arithmetic).

However, in the approximate case, the error may accumulate during the squaring (since for k = −M the exponentials are close to the identity matrix), and a mixed scheme is preferable: up to some k_0 the exponentials are computed by the scaling-and-squaring method, and after that they are just the squares of the previously computed exponentials.

A similar approach can be adopted to obtain a more accurate quadrature. For example, another possible recurrence relation is

Φ_k = Φ_{k−1} Φ_{k−2}, or t_k = t_{k−1} + t_{k−2}.

Denoting a = e^h, this implies that a satisfies the quadratic equation

a^2 − a − 1 = 0, hence a is the golden ratio: a = ρ = (1 + √5)/2 ≈ 1.6180.


The recursive quadrature for matrix exponential family B. Khoromskij, Zurich 2010(L16) 409

The corresponding M is larger:

M = (π/log ρ)^2 ≈ 42.62,

so one can choose M = 42 or M = 43. The respective quadrature nodes and weights are (the ρ^k-quadrature)

t_k = ρ^k, c_k = ρ^k log ρ.

This quadrature formula is around 1000 times more accurate than the 2^k-quadrature. The number of quadrature points can be slightly decreased, since the norm of several first and last summands is negligible.

There are other possible recursions:

Φ_k = Φ_{k−2}^2, i.e. t_k = 2 t_{k−2}, and Φ_k = Φ_{k−2} Φ_{k−3}, i.e. t_k = t_{k−2} + t_{k−3},

and so on, which lead to increased M and increased accuracy, yielding a whole family of “cheap” quadrature rules.

Tensor-truncated preconditioned iteration: EVP B. Khoromskij, Zurich 2010(L16) 410

Elliptic spectral problem (EVP) in S:

AU = λU;   ⟨U, U⟩ = 1.

Preconditioned iteration (or inverse iteration, A → A^{−1}):

U_{m+1} = U_m − B^{−1}(AU_m − λ_m U_m),
U_{m+1} := T_S(U_{m+1}) ∈ S,
U_{m+1} := U_{m+1}/‖U_{m+1}‖_S,   λ_{m+1} = ⟨AU_{m+1}, U_{m+1}⟩_S.

Direct minimization of the Rayleigh quotient (energy functional):

⟨AU, U⟩ → min,   ⟨U, U⟩ = 1;   DMRG + QTT/MPS.

Green function iteration by direct inversion of the Yukawa-like operator.


Tensor-truncated preconditioned iteration: EVP B. Khoromskij, Zurich 2010(L16) 411

Ex. 15.2. We present the results on the iterative calculation of the minimal eigenvalue of the FD d-Laplacian by the inverse power method with rank truncation [2]. We discretise the problem on (0, π)^d using n^d grid points and apply the sinc-quadrature with M = 49 to obtain the rank-(2M + 1) approximation of the Laplacian inverse for d = 3, 10, 50.

Table 11 presents the CPU time (sec.) per iteration, the relative error in the eigenvalue and the relative H1-error in the eigenfunction for n = 2^9. In all cases, the number of power iterations does not exceed 6.

d     Time/it   δλ           δu
3     0.9       3.1 · 10−6   4.5 · 10−4
10    2.9       3.1 · 10−6   3.8 · 10−4
50    14.7      3.1 · 10−6   3.1 · 10−4

Table 11: Minimal eigenvalue of the d-dimensional Laplacian (d = 3, 10, 50).

Table 11 clearly indicates the linear scaling in d of our tensor solver. Detailed numerical illustrations of tensor-structured eigenvalue solvers can be found in Hackbusch, BNK, Sauter, Tyrtyshnikov [2].

Many-particle models B. Khoromskij, Zurich 2010(L16) 412

Objectives in many-particle models.

The electronic Schrodinger eq. for a many-particle system in R^d,

HΨ = ΛΨ,

with the Hamiltonian H = H[r_1, ..., r_{Ne}],

$$H := -\frac12\sum_{i=1}^{N_e}\Delta_i - \sum_{a=1}^{K}\sum_{i=1}^{N_e}\frac{Z_a}{|r_i - R_a|} + \sum_{i<j\le N_e}\frac{1}{|r_i - r_j|} + \sum_{a<b\le K}\frac{Z_a Z_b}{|R_a - R_b|},$$

where Z_a, R_a are the charges and positions of the nuclei, r_i ∈ R^3.

Hence the problem is posed in R^d with high dimension d = 3Ne, where Ne is the (large) number of electrons.

Desired size of the system: Ne = O(10^q), q = 1, 2, 3, 4, ...?
Proteins: q = 3, 4. Molecular dynamics, electronic structure calculation for small molecules: q = 1, 2.

The Hartree-Fock equation in R^3 (Lect. 17).


Many-particle models B. Khoromskij, Zurich 2010(L16) 413

Lippmann-Schwinger integral formulation of the electronic Schrodinger eq. in R^{3N}, N the (large) number of electrons:

(−∆ − V)ψ = λψ ⇒ ψ = (−∆ − λ)^{−1} V ψ.

New tensor method: truncated Green function iteration for

Hu := (−∆ + V)u = λu.

Rewrite the EVP as an integral equation (fixed point eq.),

u = −(−∆ + λ)^{−1} V u := G(λ)u,

and solve it by the power iteration (p.w.c. elements are admissible!),

u_{m+1} = G(λ_m)u_m/‖G(λ_m)u_m‖,   λ_{m+1} = (Hu_{m+1}, u_{m+1}).

Truncated Green function iteration B. Khoromskij, Zurich 2010(L16) 414

The truncated Green function iteration for solving the spectral problem (see BNK [4])

(∆_d + V)U = EU,   ‖U‖ = 1,   U ∈ R^I,   (128)

takes the form

U_{m+1} = (∆_d − E_m I)^{−1} V U_m,   U_{m+1} := T_S(U_{m+1}),   U_{m+1} := U_{m+1}/‖U_{m+1}‖,

and E_{m+1} is recomputed at each step as a Rayleigh quotient:

E_{m+1} = ⟨LU_{m+1}, U_{m+1}⟩.

The particular numerical illustrations are presented for the Schrodinger eq. describing the hydrogen atom.


Numerics on tensor-structured EVP solvers B. Khoromskij, Zurich 2010(L16) 415

The Schrodinger equation for the hydrogen atom:

(−(1/2)∆ − 1/‖x‖) u = λu,   x ∈ Ω := R^3.

The eigenpair with minimal eigenvalue is

u_1(x) = e^{−‖x‖},   λ_1 = −0.5,

and e^{−‖x‖} can be proven to have low canonical rank, r = O(|log ε|).

The Green function iteration + QTT elliptic inverse. N × N × N grid, Time = O(log N). BNK, Oseledets, [6]

N      Time for 1 iter.   Iter.   Eigenvalue error
2^7    8.5                8       6.1e-03
2^8    13                 8       1.5e-03
2^9    18                 8       4.0e-04
2^10   25                 8       1.0e-04

Time dependent problems B. Khoromskij, Zurich 2010(L16) 416

Parabolic BVP in S ⊂ V_n:

∂U/∂t − iAU = 0,   U_0 = T_S(U(0)).

The regularised solution operator by the QTT matrix exponential, [BNK ’10]:

U(t) = e^{iAt}U_0 ≈ T_S(e^{iAt}B) T_S(B^{−1}U_0),   t ≥ 0.

Spectral decomposition: AU_n = λ_n U_n,

U(t) ≈ Σ_{n=1}^{N} e^{iλ_n t} ⟨U_0, U_n⟩ U_n.

Recursive time-space separation by the Cayley transform [Gavriluyk, BNK ’10], in progress:

U(t) = Σ_{p=0}^{∞} L_p^{(0)}(t) X_p, where the L_p^{(0)} are the Laguerre polynomials,
X_0 = −(iA − I)^{−1}U_0,   X_{p+1} = iA(iA − I)^{−1}X_p,   p = 0, 1, ...

Implicit integrators by time stepping.


Parabolic Problems in Molecular Dynamics B. Khoromskij, Zurich 2010(L16) 417

Heat-like problems in R^d:

∂u/∂t − Hu = 0,   H = ∆ + V.

The Schrodinger equation for the nuclei,

iħ ∂u/∂t − Hu = 0,   H = T + V,

with kinetic energy

T = −Σ_{ℓ=1}^{d} (ħ^2/(2M_ℓ)) ∆_{x_ℓ},   D(T) = H^2(R^{3d}) ⊂ D(V),   x_ℓ ∈ R^3,

and a potential V(x_1, ..., x_d) ≈ E(x_1, ..., x_d), the PES.

The PES E(x_1, ..., x_d) is computed by repeatedly solving the HF equation!

Multi-Configuration Time-Dependent Hartree (MCTDH) method [Meyer et al. 2000], [Lubich, Koch ’08-’09].

QTT tensor approximation of the PES. Spectral solver: DMRG in the QTT format with rank compression, BNK, Oseledets [8].

Numerics to MD: QTT matrix exponential B. Khoromskij, Zurich 2010(L16) 418

Direct solver using the Quantics-TT matrix exponential.

The solution operator e^{−iHt} cannot be approximated by a QTT matrix exponential with a uniform bound on rank_QTT. Introduce the regularized solution operator and the iterated wavepacket,

Σ_p = H^{−p} e^{−iHt},   U_p = H^p U(x, 0),   p = 1, 2, 3,

⇒ stable QTT approximation of Σ_p,
⇒ low-rank QTT approximation of U_p.

Assumption: U(x, 0) contains only low frequencies of H.

QTT approximation for fixed t:

U(x, t) = e^{−iHt} U(x, 0) ≈ (T_S Σ_p) T_S U_p,   t ≥ 0.


Numerics to MD: QTT matrix exponential B. Khoromskij, Zurich 2010(L16) 419

Numerical example: the quantum harmonic oscillator corresponding to

V(x) = (1/2)‖x‖^2,

propagating the wave packet (an exact eigenfunction) by

U(x, t) = U_n(x) e^{−i(n+1/2)t},

where U_n(x) is an eigenfunction of the quantum harmonic oscillator (a Gaussian multiple of the Hermite polynomials).

Set n = 0, t = 1.0, ε = 10^{−6}, p = 2, d = 1. The average rank_QTT of both the initial wave packet and the time-evolved solutions is small (a few).

N      rank(H^{−p}cos(Ht))   rank(H^{−p}cos(Ht)U_p)   rank(U_p)
2^8    33.8                  3.7                      4.7
2^9    33.2                  3.7                      4.7
2^10   32.5                  3.6                      4.9

Numerical cost: O(r^3 log N), N the spatial grid size. [BNK ’10], Preprint 21/2010, MPI MiS, Leipzig.

Numerics to QTT representation of PES B. Khoromskij, Zurich 2010(L16) 420

For particular potentials the ranks can be small or even uniformly bounded in the physical dimension f (cf. Lect. 14).

For the harmonic potential, the QTT-ranks are bounded by 6:

V(q_1, . . . , q_f) = Σ_{k=1}^{f} w_k q_k^2,   rank_QTT(V) ≤ 6.

For the Henon-Heiles potential, the QTT-ranks are bounded by 7:

V(q_1, . . . , q_f) = (1/2) Σ_{k=1}^{f} q_k^2 + λ Σ_{k=1}^{f−1} (q_k^2 q_{k+1} − (1/3) q_k^3),   rank_QTT(V) ≤ 7.

Notice: the Tucker ranks in all these cases are equal to f ⇒ O(f^f) scaling. The QTT-format gives polynomial in f storage and complexity O(f r^3 log N).

Can-to-QTT algorithm: each rank-1 term is compressed to QTT, and then added with compression at each step: complexity O(Rdfr^3), N = 2^d. This is a preprocessing step to be performed only once for each particular model and potential.


Numerics to QTT representation of PES B. Khoromskij, Zurich 2010(L16) 421

Compression of Henon-Heiles potentials for different dimensions.

Figure 25: Timings: Can-to-QTT approximation of the Henon-Heiles potential, N = 1024, f = 4, . . . , 256.

The maximal rank for V(Q) is 5, and for the highest dimension considered, f = 256, the memory to store V(Q) in the QTT format is 62.5 KB. The dependence on the one-dimensional grid size N = 2^d is logarithmic, O(log N).

Discussion on numerical issues B. Khoromskij, Zurich 2010(L16) 422

Rem. 16.1. Since the QTT-rank of ∆(f) is bounded by 4, the numerical complexity of matrix-vector multiplication and matrix storage is only limited by the efficiency of the QTT representation of the multi-dimensional PES. The latter is known as a severe bottleneck of modern numerical methods in molecular dynamics.

Rem. 16.2. Further numerical examples related to the DMRG solution of spectral problems, including high-dimensional PES, will be presented in Lect. 18.

Exer. 16.3. Compute the QTT decomposition of the first few eigenfunctions of the quantum harmonic oscillator, U_n(x), x ∈ R^f (Gaussian multiples of the Hermite polynomials for n = 0, 1, 2, ...).

Exer. 16.4. Look at the QTT ranks of the time-evolved solutions,

U(x, t) = U_n(x) e^{−i(n+1/2)t}, x ∈ R^f,

(stable in time). Consider only the real part.


Literature to Lecture 16 B. Khoromskij, Zurich 2010(L16) 423

1. S.V. Dolgov, B.N. Khoromskij, I. Oseledets, and E.E. Tyrtyshnikov, Tensor Structured Iterative Solutions of Elliptic Problems with Jumping Coefficients. Preprint 55/2010, MPI MiS, Leipzig 2010 (submitted).
2. W. Hackbusch, B.N. Khoromskij, S. Sauter, and E. Tyrtyshnikov, Use of Tensor Formats in Elliptic Eigenvalue Problems. Preprint 78/2008, MPI MiS, Leipzig 2008 (submitted).
3. B.N. Khoromskij, O(d log N)-Quantics Approximation of N-d Tensors in High-Dimensional Numerical Modeling. Preprint 55/2009, MPI MiS, Leipzig 2009 (J. Constr. Appr., accepted).
4. B.N. Khoromskij, On tensor approximation of Green iterations for Kohn-Sham equations. Comp. and Visualization in Sci., 11 (2008) 259-271.
5. B.N. Khoromskij, Tensor-Structured Preconditioners and Approximate Inverse of Elliptic Operators in R^d. J. Constructive Approx. 30:599-620 (2009).
6. B.N. Khoromskij and I. Oseledets, Quantics-TT approximation of elliptic solution operators in higher dimensions. Preprint MPI MiS 79/2009, Leipzig 2009 (J. Numer. Math. 2010, accepted).
7. B.N. Khoromskij and Ch. Schwab, Tensor-Structured Galerkin Approximation of Parametric and Stochastic Elliptic PDEs. Preprint MPI MiS 9/2010, Leipzig 2010 (SISC 2010, accepted).

Literature to Lecture 16 B. Khoromskij, Zurich 2010(L16) 424

8. B.N. Khoromskij and I.V. Oseledets, DMRG + QTT approach to high-dimensional quantum molecular dynamics. Preprint MPI MiS 68/2010, Leipzig 2010 (submitted).
9. V. Kazeev and B.N. Khoromskij, On explicit QTT representation of Laplace-like operators and their inverse. Preprint MPI MiS 75/2010, Leipzig 2010.


Lect. 17. Preconditioned tensor truncated iterative solvers B. Khoromskij, Zurich 2010(L17) 425

Outline of Lecture 17.

1. Challenging computational features of the Hartree-Fock equation in electronic structure calculations.
2. Fast tensor convolution with the Newton potential.
3. Tensor representation of the Hartree and exchange potentials.
4. Fast nonlinear iteration via multilevel DIIS.
5. SPDEs with additive and log-additive multiparametric coefficients.
6. Rank bound for solutions of 1D SPDEs.
7. Rank estimate for the FD multiparametric matrices.
8. Computation in the canonical format.
9. Numerics for the additive and log-additive cases.

Ab initio and DFT models B. Khoromskij, Zurich 2010(L17) 426

• Minimization of the Hartree-Fock energy functional

$$I_{HF} = \inf\Big\{ \sum_{i=1}^{N_e} \frac12 \int_{\mathbb{R}^3} |\nabla\varphi_i|^2 + \int_{\mathbb{R}^3} \rho V_c + \frac12 \int\!\!\int_{\mathbb{R}^3} \frac{\rho(x)\rho(y) - |\tau(x,y)|^2}{\|x-y\|}\, dx\, dy \Big\} \qquad (129)$$

over φ_i ∈ H^1(R^3), ∫_{R^3} φ_i φ_j = δ_{ij}, 1 ≤ i, j ≤ Ne, where

τ(x, y) = Σ_{i=1}^{Ne} φ_i(x) φ_i(y) is the electron density matrix,
ρ(x) = τ(x, x) is the electron density,
1/‖x‖ is the Newton potential,
V_c is the external potential with singularities at the centers of the atoms.

• The Hartree-Fock equation, the Euler-Lagrange equation of (129):

$$\Big[-\frac12\Delta - V_c(x) + \int_{\mathbb{R}^3} \frac{\rho(y)}{\|x-y\|}\,dy\Big]\varphi_i(x) - \frac12\int_{\mathbb{R}^3} \frac{\tau(x,y)}{\|x-y\|}\,\varphi_i(y)\,dy = \lambda_i\varphi_i(x).$$

• The Hartree-Fock-Slater/Kohn-Sham equation (DFT):

$$\Big[-\frac12\Delta - V_c(x) + \int_{\mathbb{R}^3} \frac{\rho(y)}{\|x-y\|}\,dy - \alpha V_\rho(x)\Big]\psi = \lambda\psi, \qquad V_\rho(x) = \Big\{\frac{3}{\pi}\rho(x)\Big\}^{1/3}.$$


Electronic structure calculations B. Khoromskij, Zurich 2010(L17) 427

The Hartree-Fock equation:

$$\Big[-\frac12\Delta - V_c(x) + \int_{\mathbb{R}^3} \frac{\rho(y)}{\|x-y\|}\,dy\Big]\varphi_i(x) - \frac12\int_{\mathbb{R}^3} \frac{\tau(x,y)}{\|x-y\|}\,\varphi_i(y)\,dy = \lambda_i\varphi_i(x).$$

Challenging features:

⊲ Nonlinear eigenvalue problem for i = 1, ..., Ne.
⊲ Nonlocal (integral) convolution-type operators (Ne convolutions with the Newton potential).
⊲ Tensor decomposition on large spatial grids in 3D.
⊲ High accuracy.

Tensorized L2-projection of 1/‖x‖ onto p.w.c. basis functions over an n × n × n grid in [0, 1]^3 ⊂ R^3. The resultant low-rank canonical tensor is an important ingredient in the fast Hartree-Fock solver, including multiple convolutions with the Coulomb (Newton) potential in the range n ≈ 10^4.

Ex. I: Canonical approx. to projected 1/‖x‖ in R3. B. Khoromskij, Zurich 2010(L17) 428

Figure 26: Canonical approximation of 1/‖x‖ via sinc quadratures (solid lines) and algebraically recompressed approximations (marked solid lines); approximation error vs. rank for n = 64, 128, 256. [Bertoglio, BNK ’08]

Fast tensor convolution in R^3 vs. FFT, [BNK ’08]. Matlab time/sec, linear scaling in R_ρ and n; R_ρ = 861, r = 15.

n^3     128^3   256^3   512^3   1024^3   2048^3   4096^3   8192^3   16384^3
FFT3    4.3     55.4    582.8   ~6000    –        –        –        ~2 years
C ∗ C   1.0     3.1     5.3     21.9     43.7     127.1    368.6    700.2


Tensor representation of HF equation B. Khoromskij, Zurich 2010(L17) 429

For a given finite basis set {g_µ}_{1≤µ≤Nb}, g_µ ∈ H^1(R^3), the molecular orbitals ψ_i are represented (approximately) as

ψ_i = Σ_{µ=1}^{Nb} C_{µi} g_µ,   i = 1, ..., N.   (130)

To derive the equation for the unknown coefficient matrix C = {C_{µi}} ∈ R^{Nb×N}, we first introduce the mass (overlap) matrix S = {S_{µν}}_{1≤µ,ν≤Nb}, given by

S_{µν} = ∫_{R^3} g_µ g_ν dx,

the stiffness matrix H = {h_{µν}} of the core Hamiltonian H = −(1/2)∆ + V_c,

h_{µν} = (1/2) ∫_{R^3} ∇g_µ · ∇g_ν dx + ∫_{R^3} V_c(x) g_µ g_ν dx,   1 ≤ µ, ν ≤ Nb,

and the symmetric density matrix

D = 2CC^∗ ∈ R^{Nb×Nb}.   (131)

The nonlinear terms representing the Galerkin approximation of the Hartree and exchange operators are usually constructed by using the

Tensor representation of HF equation B. Khoromskij, Zurich 2010(L17) 430

so-called two-electron integrals, defined as

b_{µν,κλ} = ∫_{R^3} ∫_{R^3} (g_µ(x) g_ν(x) g_κ(y) g_λ(y))/‖x − y‖ dx dy,   1 ≤ µ, ν, κ, λ ≤ Nb.

Introducing the Nb × Nb matrices J(D) and K(D), with D defined by (131),

J(D)_{µν} = Σ_{κ,λ=1}^{Nb} b_{µν,κλ} D_{κλ},   K(D)_{µν} = −(1/2) Σ_{κ,λ=1}^{Nb} b_{µλ,νκ} D_{κλ},

and then the complete Fock matrix F,

F(D) = H + G(D),   G(D) = J(D) + K(D),   (132)

one obtains the respective Galerkin system of nonlinear equations for the coefficient matrix C ∈ R^{Nb×N}:

F(D)C = SCΛ,   Λ = diag(λ_1, ..., λ_N),   (133)
C^∗SC = I_N,

where the second equation represents the orthogonality constraints ∫_{R^3} ψ_i ψ_j = δ_{ij}, with I_N the N × N identity matrix.


Tensor representation of HF equation B. Khoromskij, Zurich 2010(L17) 431

The Galerkin representation of the Hartree operator (the Coulomb matrix) in the tensor method is based on the agglomerated integrals

J(D)_{µν} = ∫_{R^3} g_µ(x) V_H(x) g_ν(x) dx,   1 ≤ µ, ν ≤ Nb,   (134)

computed via a single convolution transform in R^3 for the Hartree potential,

V_H = ρ ∗ (1/‖ · ‖),

where the electron density is given by

ρ(y) = 2 Σ_{a=1}^{N} ( Σ_{κ,λ=1}^{Nb} C_{κa} C_{λa} g_κ(y) g_λ(y) ).   (135)

We represent the matrix entries of K(D) by the following three loops (Khoromskaia [2]): for a = 1, ..., N, compute the convolution integrals

W_{aν}(x) = ∫_{R^3} (g_ν(y) Σ_{κ=1}^{Nb} C_{κa} g_κ(y)) / ‖x − y‖ dy,   ν = 1, ..., Nb,   (136)

Tensor representation of HF equation B. Khoromskij, Zurich 2010(L17) 432

and then the scalar products

K_{µν,a} = ∫_{R^3} [ Σ_{κ=1}^{Nb} C_{κa} g_κ(x) ] g_µ(x) W_{aν}(x) dx,   µ, ν = 1, ..., Nb.   (137)

Finally, the entries of the exchange matrix are given by sums over all orbitals,

K(C)_{µν} = Σ_{a=1}^{N} K_{µν,a},   µ, ν = 1, ..., Nb.   (138)

The advantage of the above representations is the minimization of the number of convolution products that have to be computed by numerical quadratures. What is even more important, we gain the possibility of an efficient low-rank separable approximation of the discretised density ρ(x), as well as of the auxiliary potentials W_{aν}(x) at step (136).


Example II: Hartree potential on large grids B. Khoromskij, Zurich 2010(L17) 433

V_H(x) := ∫_{R^3} ρ(y)/|x − y| dy = (ρ ∗ 1/‖ · ‖)(x).

Represent the orbitals in an “approximating basis” {g_k}, e.g. the GTO basis,

ρ(x) = Σ_{i=1}^{N/2} (φ_i)^2,   φ_i = Σ_{k=1}^{R_0} c_{i,k} g_k(x),   R_0 ≈ 100,

g_k = (x − A_k)^{β_k} e^{−λ_k(x−A_k)^2},   x ∈ R^3.

O(n log n) computation of V_H and its Galerkin matrix in the tensor format, on large n × n × n grids, with error O(h^3), h = 1/n. Use the Canonical-to-Tucker-to-canonical transform on a sequence of grids to reduce the initial rank, R_ρ ≈ R_0^2/2. V. Khoromskaia, BNK [1]. Compared with the MOLPRO analytic program.

Example II: Hartree potential on large grids B. Khoromskij, Zurich 2010(L17) 434

a) Absolute approximation error of the tensor-product computation of the Hartree potential V_H of the H2O molecule (visualised on Ω = [−6, 6] × 0 × 0), for n = 4096, 8192;
b) CPU times (C-2-T and 3D convolution) corresponding to the n × n × n grid, up to n = 16000.


Ex. II: Error in the Coulomb matrix Jkm B. Khoromskij, Zurich 2010(L17) 435

The Coulomb (Galerkin) matrix is computed by tensor inner products in {g_k}:

J_{km} := ∫_{R^3} g_k(x) V_H(x) g_m(x) dx,   k, m = 1, . . . , R_0,   x ∈ R^3.

a) Electron density of H2O on Ω = [−4, 4] × [−4, 4] × 0; c) absolute approximation error for the Coulomb matrix J_{km} (≈ 10^{−6}).

Ex. II: The Exchange Galerkin Matrix Kex = Kkm B. Khoromskij, Zurich 2010(L17) 436

Khoromskaia [2]. Linear scaling in n, cubic in R_0:

K_{k,m} := −(1/2) ∫_{R^3} ∫_{R^3} g_k(x) (τ(x, y)/|x − y|) g_m(y) dx dy,   k, m = 1, . . . , R_0.

Absolute L∞-error in the matrix elements of K_ex for the density of CH4 and the pseudodensity of CH3OH. Univariate grid sizes n = 1024, 4096. Approximation error O(h^3), h = 1/n.


Ex. III: The tensor-truncated iter. for H-F eq. B. Khoromskij, Zurich 2010(L17) 437

g_µ ∈ H^1(R^3):   ψ_i = Σ_{µ=1}^{Nb} C_{µi} g_µ,   i = 1, ..., N.

For C = {C_{µi}} ∈ R^{Nb×N} and F(C) = H + J(C) − K(C), with the Galerkin matrices

I → S,   H = −(1/2)∆ + V_c → H,   V_H → J(C),   K → K(C),

solve

F(C)C = SCΛ,   C^∗SC = I_N.

Multilevel “fixed-point” tensor-truncated iteration: initial guess C_0 for J = K = 0,

F̃_k C_{k+1} = SC_{k+1}Λ_{k+1},   Λ_{k+1} = diag(λ_1^{k+1}, ..., λ_N^{k+1}),
C_{k+1}^∗ S C_{k+1} = I_N,

where F̃_k, k = 0, 1, ..., is specified by extrapolation over the solutions C_k, C_{k−1}, ... Khoromskaia, BNK, Flad [3].
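For the linear algebra skeleton of this iteration, a minimal plain-MATLAB sketch (S, H and buildG are illustrative stand-ins; in the actual solver J(C) and K(C) come from the tensor convolutions of (134)-(138)):

Nb = 10; N = 2;
S = eye(Nb);
H = diag(1:Nb) + 0.01*ones(Nb);            % illustrative core Hamiltonian
buildG = @(D) 0.05*(D + D')/2;             % hypothetical stand-in for J(D)+K(D)
C = zeros(Nb, N);                          % initial guess, i.e. J = K = 0
for it = 1:50
    D = 2*(C*C');                          % density matrix (131)
    [V, L] = eig(H + buildG(D), S, 'chol');% F(C) C = S C Lambda
    [~, idx] = sort(diag(L));
    Cnew = V(:, idx(1:N));                 % N lowest (occupied) orbitals
    Cnew = Cnew / sqrtm(Cnew'*S*Cnew);     % enforce C'*S*C = I_N
    if norm(Cnew*Cnew' - C*C', 'fro') < 1e-12, C = Cnew; break; end
    C = Cnew;
end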

Ex. III: The tensor-truncated iter. for H-F eq. B. Khoromskij, Zurich 2010(L17) 438

⊲ Multigrid convergence in the eigenvalues (left): absolute error |λ − λ_{n,it}| over the iterations, pseudopotential CH4, n = 64, 128, 256, 512, 1024.
⊲ Convergence in effective iterations scaled to the finest grid (right): CH4, pseudo, n = 512.


Ex. III: Optimal scaling of tensor-truncated iteration B. Khoromskij, Zurich 2010(L17) 439

Linear scaling of the CPU time (per SCF iteration) in the univariate grid size n: minutes per iteration vs. grid sizes up to n = 1024.

Exer. 17.1. Compute the QTT representation of the Slater function e^{−‖x‖}, x ∈ [0, 10]^3, on an n × n × n (n = 2^d) grid by using the sinc-quadrature canonical decomposition (see Ex. 4.6; use a quadrature similar to the case of the Yukawa kernel (4.13)).

Parametric Elliptic Problems: Stochastic PDEs B. Khoromskij, Zurich 2010(L17) 440

Find u_M ∈ L^2(Γ) × H^1_0(D), s.t.

A u_M(y, x) = f(x) in D, ∀y ∈ Γ,
u_M(y, x) = 0 on ∂D, ∀y ∈ Γ,

A := −div (a_M(y, x) grad),   f ∈ L^2(D),   D ⊂ R^d, d = 1, 2, 3,

where a_M(y, x) is smooth in x ∈ D, y = (y_1, ..., y_M) ∈ Γ := [−1, 1]^M, M ≤ ∞.

Additive case (via the truncated Karhunen-Loeve expansion):

a_M(y, x) := a_0(x) + Σ_{m=1}^{M} a_m(x) y_m,   a_m ∈ L^∞(D),   M → ∞.

Log-additive case:

a_M(y, x) := exp(a_0(x) + Σ_{m=1}^{M} a_m(x) y_m) > 0.

Sparse stochastic Galerkin/collocation: [Babuska, Nobile, Tempone ’06-’10; Schwab et al. ’07-’10]
Stochastic Galerkin, canonical format, additive case: BNK, Ch. Schwab, [5]
HT, additive case: Kressner, Tobler, [8]
QTT, both additive and log-additive cases: BNK, Oseledets, [6]


Stochastic collocation (additive case) B. Khoromskij, Zurich 2010(L17) 441

A parametric linear system, N the grid size in x (FEM, FD in x):

A(y)u(y) = f,   f ∈ R^N,   u(y) ∈ R^N,   y ∈ Γ,   (139)

A(y) = A_0 + Σ_{m=1}^{M} A_m y_m,   A_m ∈ R^{N×N}, a parameter-dependent matrix.

Collocation on the 1D grid {y_m^{(k)}} =: Γ_n ⊂ [−1, 1], k = 1, . . . , n, n the grid size in y, leads to the assembled large linear system

Au = f,   u, f ∈ R^{Nn^M},   A ∈ R^{Nn^M × Nn^M},

A = A_0 × I × . . . × I + A_1 × D_1 × I × . . . × I + . . . + A_M × I × . . . × D_M,

where D_m, m = 1, . . . , M, is the n × n diagonal matrix with the positions of the collocation points y_m^{(k)} ∈ Γ_n on the diagonal: rank_C(A) ≤ M, and

f = f × e × . . . × e,   e = (1, ..., 1)^T ∈ R^n.
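A minimal plain-MATLAB sketch of this assembly for small M, n, N (A0, the Am and the collocation grid are illustrative; only the Kronecker structure matters):

N = 32; n = 8; M = 3;
A0 = (N+1)^2 * spdiags(ones(N,1)*[-1 2 -1], -1:1, N, N);
Am = {0.25*A0, 0.10*A0, 0.05*A0};          % illustrative parametric terms A_m
y  = linspace(-1, 1, n);                   % collocation points Gamma_n
D  = spdiags(y(:), 0, n, n);               % D_m: collocation points on the diagonal
A  = kron(speye(n^M), A0);                 % A0 x I x ... x I
for m = 1:M
    A = A + kron(speye(n^(M-m)), kron(D, kron(speye(n^(m-1)), Am{m})));
end
% With column-major vectorization u(x, y_1, ..., y_M), the x-factor A_m is the
% innermost (rightmost) Kronecker factor in MATLAB's kron.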

Def. 17.1. (BNK [7]) Weakly (locally) coupled canonical N-(d+1) tensors, V ∈ R^{J×I} (with a similar definition for functional decompositions):

V ∈ C_loc[R]:   V(j, I) = Σ_{k=1}^{R} ⊗_{ℓ=1}^{d} V_k^{(ℓ)}(j),   V_k^{(ℓ)}(j) ∈ R^{I_ℓ}, j ∈ J.

Rank bound for the solution, d = 1 B. Khoromskij, Zurich 2010(L17) 442

Define v = −∆_x^{−1} f,  σ_m = ‖a_m‖ / Σ_{m=1}^{M} ‖a_m‖ > 0,  b_m(y_m, x) = σ_m a_0(x) + a_m(x) y_m.

Prop. 17.1. (BNK [7]) Let d = 1, assume ∇_x u_M(y, x) ∈ C(D) for all y ∈ Γ, ∇_x v(x) ∈ C(D), and that there exists a_min > 0 s.t.

(A) a_min ≤ a_0(x) < ∞,
(B) |Σ_{m=1}^{M} a_m(x) y_m| ≤ γ a_min with γ < 1, for |y_m| < 1 (m = 1, ..., M).

Then for the ε-rank:

rank(∇_x u_M) ≤ C |log ε| (additive);   rank_{C_loc}(∇_x u_M) = 1 (log-additive).

Proof. We have ∇_x u_M(y, x) = (1/a_M(y, x)) (C_0 + ∇_x v(x)). Then, in the additive case, there exist c_k, t_k ∈ R_{>0} s.t.

‖ ∇_x u_M(y, x) − Σ_{k=−K}^{K} c_k Π_{m=1}^{M} e^{−t_k b_m(y_m, x)} (C_0 + ∇_x v(x)) ‖_{L∞} ≤ C e^{−βK/log K},

where β > 0 and C do not depend on M and K. Log-additive case: rank(a_M(y, x)) = 1.

Rem. 17.1. The discrete analogue of Prop. 17.1 is based on Lem. 16.1.


Matrix QTT-ranks in additive case B. Khoromskij, Zurich 2010(L17) 443

BNK, Oseledets [6]

Consider a 2D-dimensional (x ∈ R^2) SPDE in stratified media (i.e., with a coefficient depending on the 1D variable x̄ = x_1) in the two cases:

1. Polynomial decay: a_m(x) = 0.5/(m+1)^2 · sin(mx), x ∈ [−π, π], m = 1, . . . , M.
2. Exponential decay: a_m(x) = e^{−0.7m} sin(mx), x ∈ [−π, π], m = 1, . . . , M.

The parametric space is discretized on a uniform mesh in [−1, 1] with 2^p points in each direction. For the experiments, p = 8 is taken. Ranks are presented for different truncation parameters. Table 12 presents results for the log-additive case and polynomial decay of the coefficients, and Table 13 for exponential decay. The dependence on M is linear for polynomial decay, and seems to be much milder in the case of exponential decay, which is rather natural.

We use two different TT rank estimates for tensors: one characterising the overall storage needs and complexity, r_TT, and another one serving for the QTT-rank distribution, r_QTT:

r_TT(u) = sqrt( Σ n_i r_i r_{i+1} / Σ n_i ),   r_QTT(u) = sqrt( (1/M) Σ r_i r_{i+1} ).

Matrix QTT-ranks in additive case B. Khoromskij, Zurich 2010(L17) 444

M     QTT-rank(10^{−7})   QTT-rank(10^{−3})
5     27                  10
10    44                  17
20    78                  27
40    117                 49

Table 12: The matrix QTT-rank vs. M, log-additive case, polynomial decay, N = 128, p = 8.

M     QTT-rank(10^{−7})   QTT-rank(10^{−3})
5     33                  11
10    43                  21
20    51                  23
40    50                  25

Table 13: The matrix QTT-rank vs. M, log-additive case, exponential decay, N = 128, p = 8.


Matrix QTT-ranks in additive case B. Khoromskij, Zurich 2010(L17) 445

Table 14 describes the dependence on the accuracy for fixed M. This confirms that the ranks are logarithmic in the accuracy ε.

ε        QTT-rank(ε)
10^{−3}  25
10^{−4}  31
10^{−5}  38
10^{−6}  44
10^{−7}  50

Table 14: The matrix QTT-rank vs. accuracy, log-additive case, exponential decay, N = 128, M = 40, p = 8.

Tables 12-14 confirm numerically that the matrices in the log-additive case have low maximal QTT-ranks, and that this representation can be used in the solution process.

Stochastic collocation (log-additive case) B. Khoromskij, Zurich 2010(L17) 446

In the log-additive case the dependence on y is no longer affine.

Let d = 1. Applying collocation to (139) gives n^M linear systems (p.w.l. FEM):

A(j_1, . . . , j_M) u(j_1, . . . , j_M) = f, 1 ≤ j_m ≤ n ⇒ Au = f,

A(i, j, y) = ∫_D b(x, y) (∂φ_i/∂x)(∂φ_j/∂x) dx,   y ∈ Γ_n, D = [0, 1],

A(i, i, y) = (1/4)(b(x_{i−1}, y) + 2b(x_i, y) + b(x_{i+1}, y)),
A(i, i−1, y) = (1/2)(b(x_{i−1}, y) + b(x_i, y)),   A(i−1, i, y) = A(i, i−1, y),

for i = 1, ..., N, with

b(x, y) = e^{a(x,y)} = e^{a_0(x)} Π_{m=1}^{M} e^{a_m(x) y_m},   y ∈ Γ_n.

There are still good low-rank approximations of the form

A ≈ Σ_{k=1}^{R} ⊗_{m=0}^{M} A_{mk},   A_{mk} ∈ R^{(M+1)×n}.


Stochastic collocation (log-additive case) B. Khoromskij, Zurich 2010(L17) 447

Lem. 17.1. For the 1D SPDE by p.w.l. FEM, in the log-additive case:

rank_C(A(i, j, y)) ≤ 3 (i, j ≤ N, y ∈ Γ_n) ⇒ A ∈ C_loc[3] ⊂ QTT_loc[3],
rank_QTT(A(i, j, y)) ≤ 3, i, j ≤ N, y ∈ Γ_n ⇒ A ∈ QTT_loc[3],
rank_C(A) ≤ 7N.

Proof.

A(y) = D(y) + Z(y) + Z^⊤(y),   y ∈ Γ_n,

where D(y) is the diagonal of A and Z is the first subdiagonal. D(y) is represented as

D(y) = Σ_{i=1}^{N} A(i, i, y) e_i e_i^⊤ = (1/4)(C_1(y) + 2C_2(y) + C_3(y)),

where C_2(y) takes the form

C_2(y) = Σ_{i=1}^{N} e_i e_i^⊤ e^{a_0(x_i)} Π_{m=1}^{M} e^{a_m(x_i) y_m}.   (140)

C_2(y), y ∈ Γ_n, is an Nn^M × Nn^M diagonal matrix, and each summand in (140) has tensor rank 1. For the QTT format in the variable y_m, the TT-ranks equal 1, since it is an exponential function (Lect. 13).

QTT-truncated preconditioned iteration B. Khoromskij, Zurich 2010(L17) 448

BNK, Ch. Schwab [5]:

u^{(k+1)} := u^{(k)} − ωB_k^{−1}(Au^{(k)} − f),   u^{(k+1)} = T_ε(u^{(k+1)}) → u,

where T_ε is the rank truncation operator in the given format S, preserving accuracy ε.

In the additive case, a good choice of a (rank-1) preconditioner is

B_0^{−1} = A_0^{−1} × I × . . . × I.

In the log-additive case, an adaptive preconditioner at iteration step k:

B_k^{−1} = A(y_k^∗)^{−1} × I × . . . × I,   y_k^∗ = argmin_QTT(‖f − Au^{(k)}‖).

Note: B_0 corresponds to y^∗ = 0. Proven spectral equivalence, B_0 ∼ A, in both cases.


Numerics to sPDEs: additive case/canonical B. Khoromskij, Zurich 2010(L17) 449

BNK, Ch. Schwab [5]

S-truncated preconditioned iteration in the (d+M)-dimensional parametric space. Canonical format, M ≤ 100.

Solving the sPDE on an N^{⊗(M+d)} grid, d = 1, M = 20 (S = C_R, B^{−1} := A(0)^{−1}). Variable coefficients with exponential decay (N = 63, R ≤ 5):

a_m(x) = 0.5 e^{−m} sin(mx),   m = 1, 2, ...., M,   x ∈ (0, π).

[Figure: approximation error (2-norm) vs. rank (left) and residual vs. truncated iteration (right); Dim = 20, α = 1, rank = 5, grid = 63.]

Numerics to sPDEs: additive case/canonical B. Khoromskij, Zurich 2010(L17) 450

Zero-order SPDE: smooth and random coefficients in y.

a(y, x) = a(y) := 1 + Σ_{m=1}^{M} a_m y_m with γ = ‖a‖_{ℓ1} := Σ_{m=1}^{M} |a_m| < 1,   (141)

for the truncated sequence of (spatially homogeneous) coefficients a_m = (1 + m)^{−α} (m = 1, ..., M) with algebraic decay rates α = 2, 3, 5. This is equivalent to the so-called zero-order sPDE in the form

a(y)u(y) = f.   (142)

Highly oscillating random coefficient:

a(y) = 1 + Σ_{m=1}^{M} a_m y_m H(y_m − c_m(y_m)),

with the pwc function c_m(y_m) given by a random n-vector on [−1, 1], and H : R → {−1, 1}, H(x) = −1 for x < 0, H(x) = 1 for x ≥ 0.


Numerics to sPDEs: additive case/canonical B. Khoromskij, Zurich 2010(L17) 451

Figure 27: Approximation error vs. rank R (left) and the five canonical vectors in the variable y_1 (right) for the solution of (142), M = 20.

The r-convergence is exponential, the same as in the smooth case. The canonical vectors are highly oscillating.

Numerics to sPDEs: QTT/log-additive B. Khoromskij, Zurich 2010(L17) 452

Figure 28: Convergence in the stratified 2D example with two different truncation parameters, 1-point preconditioner. Left: residual over the iterations; right: ranks over the iterations. BNK, Oseledets, [6]


Conclusions on tensor methods for SPDEs B. Khoromskij, Zurich 2010(L17) 453

Exer. 17.2. Compute the QTT rank of the system matrix A for the 1D SPDE (N = 2^d, n = 2^p), exponential decay, in the additive case (can-to-QTT compression).

Conclusions: C/QTT + preconditioned tensor-truncated iteration gives a unified approach to challenging problems of numerical SPDEs.

– Separation rank estimates and analytic approximation.
– Rank-structured tensor representation of the system matrix.
– C/QTT preconditioners.
– Fast and stable rank optimization algorithms via QTT-MLA.
– Possible tensorization in the physical variable.
– Successful computations in the HT format, Kressner, Tobler [8].

8. D. Kressner and Ch. Tobler, Krylov subspace methods for linear systems with tensor product structure. SIMAX, 31(4): 1688-1714, 2010.

Literature to Lecture 17 B. Khoromskij, Zurich 2010(L17) 454

1. B.N. Khoromskij and V. Khoromskaia, Multigrid Tensor Approximation of Function Related Arrays. SIAM J. on Sci. Comp., 31(4), 3002-3026 (2009).
2. V. Khoromskaia, Computation of the Hartree-Fock Exchange in the Tensor-structured Format. Computational Methods in Applied Mathematics, Vol. 10 (2010), No. 2, 204-218.
3. B.N. Khoromskij, V. Khoromskaia, and H.-J. Flad, Numerical Solution of the Hartree-Fock Equation in Multilevel Tensor-structured Format. Preprint 44/2009, MPI MiS, Leipzig 2009 (SISC, accepted).
4. B.N. Khoromskij, Fast and Accurate Tensor Approximation of a Multivariate Convolution with Linear Scaling in Dimension. J. of Comp. Appl. Math., 234 (2010) 3122-3139.
5. B.N. Khoromskij and Ch. Schwab, Tensor-Structured Galerkin Approximation of Parametric and Stochastic Elliptic PDEs. Preprint MPI MiS 9/2010, Leipzig 2010 (SISC 2010, accepted).
6. B.N. Khoromskij and I. Oseledets, Quantics-TT collocation approximation of parameter-dependent and stochastic elliptic PDEs. Preprint MPI MiS 37/2010, Leipzig 2010 (CMAM 2010, accepted).
7. B.N. Khoromskij, Tensors-structured Numerical Methods in Scientific Computing: Survey on Recent Advances. Preprint 21/2010, MPI MiS, Leipzig 2010 (submitted).


Lect. 18. DMRG + QTT approach to high-dimensional QMD B. Khoromskij, Zurich 2010(L18)

Outline of Lecture 18.

1. Time evolution by spectral decomposition in quantum molecular dynamics (QMD).
2. Problem setting.
3. Representation of the Hamiltonian matrix.
4. Numerical QTT representation.
5. QTT representation of the multidimensional potential energy surface (PES).
6. Sketch of the density matrix renormalization group (DMRG) iteration.
7. Local EVP.
8. Numerics in the case of the Henon-Heiles PES in the wide range of dimensions f ≤ 256. Linear scaling in f.
9. General conclusions on tensor numerical methods.

Tensor methods for time dependent problems B. Khoromskij, Zurich 2010(L18) 456

Parabolic BVP in S ⊂ V_n:

∂U/∂t − iAU = 0,   U_0 = T_S(U(0)),   A symmetric.

The regularised solution operator by the QTT matrix exponential, [BNK ’10]:

U(t) = e^{iAt}U_0 ≈ T_S(e^{iAt}B) T_S(B^{−1}U_0),   t ≥ 0,   B^{−1} ≈ A.

Spectral decomposition (common in QMD): AU_n = λ_n U_n, n = 1, 2, ...,

U(t) ≈ Σ_{n=1}^{N} e^{iλ_n t} ⟨U_0, U_n⟩ U_n.

Time-space separation by the Cayley transform, [Gavriluyk, BNK ’10] (in progress). Implicit integrators by time stepping (Yukawa solvers).

The above methods apply to both heat-like and QMD problems. Present lecture: methods based on spectral decomposition via DMRG/QTT algorithms, with examples in QMD [BNK, Oseledets ’10].


Spectral problems in QMD B. Khoromskij, Zurich 2010(L18) 457

The basic problem in quantum molecular dynamics is the time-dependent molecular Schrodinger equation,

i ∂ψ/∂t = Hψ = (−(1/2)∆ + V)ψ,   ψ(x, 0) = ψ_0(x),   x ∈ R^f,   (143)

where V : R^f → R is a (known) approximation to the potential energy surface (PES) and ψ_0(x) is an initial wavepacket (Meyer [3,4], Lubich [2]).

Eq. (143) is solved in time, and a high-quality spectrum of H can be recovered from it, describing the vibrational motion of molecules.

The computation of the ground state (or several lower states) of H can be considered as an ingredient in quantum molecular dynamics simulations, providing an alternative to the direct time evolution of the system by discrete time integrators.

Finding the ground state means solving the eigenvalue problem

Hψ = (−(1/2)∆ + V)ψ = Eψ,   ψ = ψ(q_1, . . . , q_f),   (144)

which has to be solved in the QTT tensor format.

Problem setting B. Khoromskij, Zurich 2010(L18) 458

H is the molecular Schroedinger operator

H = −(1/2)∆ + V,

where ∆ is the f-dimensional Laplace operator, and V = V(q_1, . . . , q_f) is called the PES. Here q_1, . . . , q_f are the degrees of freedom (for example, coordinates of the atoms in a molecule).

The values of V should be obtained from the solution of the electronic Schroedinger eq. using any reliable method for electronic structure calculations (say, the Hartree-Fock eq.).

The case of a polynomial potential is important. For example, a second-order polynomial in normal coordinates becomes (the harmonic oscillator)

V = Σ_{k=1}^{f} w_k^2 q_k^2,

where the w_k are vibrational frequencies, and f is the number of degrees of freedom. The solution decays exponentially as |q| → ∞.


Problem setting B. Khoromskij, Zurich 2010(L18) 459

The computational domain can be chosen to be an f-dimensional cube, and Dirichlet boundary conditions can be imposed. In this cube a uniform tensor grid with n points in each direction is introduced; the unknowns are the values of the function ψ on this grid, so there are n^f unknown values. Multiplication by V reduces to multiplication by a diagonal matrix.

For the Laplace operator, a finite-difference approximation is used, yielding a matrix ∆(f), which can be written as

∆(f) = ∆_1 ⊗ I ⊗ . . . ⊗ I + . . . + I ⊗ . . . ⊗ ∆_f,   (145)

where ∆_i, i = 1, . . . , f, are discretizations of the one-dimensional Laplace operator with Dirichlet boundary conditions, and ⊗ is the tensor (Kronecker) product of matrices. In the simplest uniform grid case we have

∆_i = γ_i tridiag[−1, 2, −1].

Representation of the matrix B. Khoromskij, Zurich 2010(L18) 460

Theoretical rank bounds for the polynomial PES revisited.

Recall the results of Lect. 14-16, Thm. 14.4, Lem. 14.3. The Laplace operator allows a TT-decomposition with ranks equal to 2:

rank_TT(∆_f) ≤ 2.

Suppose that the uniform grid is chosen and the ∆_i in (145) are ∆_i = γ_i tridiag[−1, 2, −1]. For this case the QTT-ranks are bounded by 4:

rank_QTT(∆_i) ≤ 4.

The potential V is discretized by collocation at the grid points, leading to a diagonal matrix. The low-rank approximation of this matrix reduces to the low-rank approximation of the function V(q_1, . . . , q_f) on a tensor grid. If the variables in V(q_1, . . . , q_f) are separated,

V(q_1, . . . , q_f) ≈ Σ_{k=1}^{r} Π_{i=1}^{f} v_i(q_i, k),

then the canonical rank of the respective tensor V does not exceed r, hence the TT-ranks of V do not exceed r. But they can be much smaller.


Representation of the matrix B. Khoromskij, Zurich 2010(L18) 461

For particular potentials ranks can be uniformly bounded in dimens. f .

Lem. 18.1. For a general homogeneous polynomial potential of degree s,

V(q_1, . . . , q_f) = Σ_{i_1,...,i_s=1}^f a(i_1, . . . , i_s) Π_{k=1}^s q_{i_k},  rank_TT(V) = C_0 f^{[s/2]} + o(f^{[s/2]}).

For the harmonic potential the QTT-ranks are bounded by 6:

V(q_1, . . . , q_f) = Σ_{k=1}^f w_k q_k^2,  rank_QTT(V) ≤ 6.

For the Henon-Heiles potential the QTT-ranks are bounded by 7:

V(q_1, . . . , q_f) = (1/2) Σ_{k=1}^f q_k^2 + λ Σ_{k=1}^{f−1} ( q_k^2 q_{k+1} − (1/3) q_{k+1}^3 ),  rank_QTT(V) ≤ 7.
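A hedged numerical check, assuming TT-Toolbox 2.x conventions (a tt_tensor constructor with an accuracy argument and the overloaded rank): for a small f the potential can be evaluated on the full grid, reshaped to quantics form, and its ranks inspected. The interval [−6, 6] and λ = 0.111803 are illustrative assumptions, not values fixed in the lecture.

  f = 3; d = 7; n = 2^d; q = linspace(-6, 6, n);    % small f: full tensor still fits
  [Q1, Q2, Q3] = ndgrid(q, q, q);
  lambda = 0.111803;                                % illustrative coupling constant
  V = 0.5*(Q1.^2 + Q2.^2 + Q3.^2) ...
    + lambda*(Q1.^2.*Q2 - Q2.^3/3 + Q2.^2.*Q3 - Q3.^3/3);
  Vqtt = tt_tensor(reshape(V, 2*ones(1, f*d)), 1e-6);  % quantics reshape + TT-SVD
  disp(max(rank(Vqtt)))                             % expected to stay <= 7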

Notice: the Tucker ranks in all these cases are equal to f ⇒ O(f^f) scaling.

The QTT-format gives storage and complexity polynomial in f: O(f r^3 log N).

Canonical-to-QTT preprocessing: each rank-1 term is compressed to QTT, followed by addition with compression; complexity O(R d f r^3), N = 2^d. This preprocessing step is performed only once for any particular potential.

Numerical QTT representation B. Khoromskij, Zurich 2010(L18) 462

In the case when V is given analytically, it can often be represented as a separable expansion of the form

V(q_1, . . . , q_f) ≈ Σ_{α=1}^R h_1(q_1, α) · · · h_f(q_f, α),  (146)

but with a large number of terms R. This is true for the polynomial PES.

The conversion of V defined by (146) into the QTT-format is performed in the following steps. Each summand is converted into the QTT-format by converting the one-dimensional functions h_k(q_k, α), k = 1, . . . , f, using either the full_to_tt subroutine or, for functions like polynomials or sine/cosine, known analytical QTT-representations. Then their Kronecker product is formed with the mkron function of the TT-Toolbox, which for given tensors A_1, . . . , A_f in the TT-format computes their Kronecker product (a df-dimensional tensor)

V_α = A_1 × A_2 × · · · × A_f.

The Kronecker product reduces to the concatenation of cores (no arithmetic operations); a sketch of one term follows.
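A hedged sketch of one rank-1 term, assuming TT-Toolbox 2.x naming (tt_tensor, mkron); the factors q and q^2 below are illustrative stand-ins for the h_k(·, α):

  d = 7; n = 2^d; q = linspace(-6, 6, n).';            % grid is an assumption
  h1 = tt_tensor(reshape(q.^2, 2*ones(1, d)), 1e-12);  % QTT of q^2 (rank <= 3)
  h2 = tt_tensor(reshape(q,    2*ones(1, d)), 1e-12);  % QTT of q   (rank <= 2)
  Valpha = mkron(h1, h2);       % 2d-dimensional QTT tensor; cores are concatenated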

Numerical QTT representation B. Khoromskij, Zurich 2010(L18) 463

The required tensor V is now given as a sum of R tensors in the QTT-format:

V = Σ_{α=1}^R V_α.

Performing the additions in the QTT-format directly yields a tensor with ranks bounded by

R · max_α rank_QTT(V_α).

Since R can be very large, the V_α are added one by one, approximately, by the following scheme:

V^(1) := V_1,  V^(k+1) = V^(k) + V_{k+1},  V^(k+1) ← T_ε(V^(k+1)),

where T_ε is the rounding operator with relative accuracy ε in the QTT format.

If we assume that the ranks of the intermediate tensors V^(k) are of order r, the complexity of the algorithm becomes O(R d f r^3). Without intermediate compression the cost would be cubic in R. A minimal sketch of this loop follows.
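A minimal sketch of the add-and-compress loop, again under TT-Toolbox 2.x assumptions (overloaded + and round(·, ε) as the rounding operator T_ε); Vterms is a hypothetical cell array holding the QTT summands V_α:

  eps_rnd = 1e-6;
  V = Vterms{1};                          % Vterms{alpha} stores V_alpha in QTT
  for k = 2:numel(Vterms)
      V = round(V + Vterms{k}, eps_rnd);  % truncate after every addition
  end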

Minimization on QTT-manifold for high-dim. EVPs B. Khoromskij, Zurich 2010(L18) 464

Nonlinear optimization problem. The actual dimension equals df; below we rename df → d.

The Hamiltonian matrix H (for the QTT-format n_k = 2),

H(i_1, . . . , i_d, j_1, . . . , j_d) = H_1(i_1, j_1) · · · H_d(i_d, j_d),  (147)

where H_k(i_k, j_k) = H_k(α_{k−1}, i_k, j_k, α_k) are the cores of the QTT-representation of H, and 1 ≤ α_k ≤ R_k ≤ R, k = 1, . . . , d, with 1 ≤ i_k, j_k ≤ n_k.

Find an eigenvector ψ that solves

Hψ = Eψ,  ψ ∈ S := QTT[r],  (148)

ψ = reshape(ψ, [n_1, . . . , n_d]),  ψ(i_1, . . . , i_d) ≈ G_1(i_1) · · · G_d(i_d).

Equations for the parameters defining the representation.

For the EVP (148) the standard way is to minimize the Rayleigh quotient:

ψ = argmin (Hψ, ψ) subject to (ψ, ψ) = 1, rank_QTT(ψ) ≤ r.

The rank parameter r controls the storage requirement and has to be chosen as a compromise between accuracy and complexity.

Sketch of DMRG iteration B. Khoromskij, Zurich 2010(L18) 465

If all cores except G_k and G_{k+1} are fixed, we are left with a “small” optimization problem in G_k and G_{k+1} (still nonlinear).

Linearization: introduce a new superblock,

W(i_k, i_{k+1}) = G_k(i_k) G_{k+1}(i_{k+1}),  i.e., an r_{k−1} × n_k × n_{k+1} × r_{k+1} tensor.

This is equivalent to agglomerating the modes k and k + 1 in the TT-representation, yielding an n_1 × · · · × n_{k−1} × (n_k n_{k+1}) × n_{k+2} × · · · × n_d tensor whose merged core is optimized (a sketch of the merge follows).
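A minimal sketch in base MATLAB of forming the superblock from two neighbouring cores stored as 3-way arrays (Gk of size r0 × nk × r1 and Gk1 of size r1 × nk1 × r2; the variable names are illustrative):

  [r0, nk, r1] = size(Gk); nk1 = size(Gk1, 2); r2 = size(Gk1, 3);
  % contract over the shared rank index: W(a,i,j,b) = sum_s Gk(a,i,s)*Gk1(s,j,b)
  W = reshape(reshape(Gk, r0*nk, r1) * reshape(Gk1, r1, nk1*r2), [r0, nk, nk1, r2]);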

Imposing the norm constraint ||ψ|| = 1: the quadratic problem in W reduces to an EVP for W under the constraint ||ψ|| = 1.

Before the minimization we ensure that in the TT-representation of ψ (464) the fixed cores G_s(i_s), s = 1, . . . , k − 1, are left-orthogonal, i.e.,

Σ_{i_s} G_s(i_s)^T G_s(i_s) = I_{r_s},

and the cores G_s, s = k + 2, . . . , d, are right-orthogonal, i.e.,

Σ_{i_s} G_s(i_s) G_s(i_s)^T = I_{r_{s−1}}.

Then ||ψ|| equals the norm of the cores that we optimize; in matrix form,

Σ_{i_k, i_{k+1}} ||W(i_k, i_{k+1})||_F^2 = ||ψ||^2 = 1.

(An orthogonalization sketch follows.)
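A minimal sketch in base MATLAB of one left-orthogonalization step: a QR decomposition of the unfolded core, with the triangular factor pushed into the next core (Gs and Gnext are illustrative names for two consecutive cores):

  [r0, n1, r1] = size(Gs);
  [Q, R] = qr(reshape(Gs, r0*n1, r1), 0);     % economy-size QR of the unfolding
  rnew = size(Q, 2);
  [~, n2, r2] = size(Gnext);
  Gnext = reshape(R * reshape(Gnext, r1, n2*r2), [rnew, n2, r2]);  % absorb R
  Gs = reshape(Q, [r0, n1, rnew]);            % now sum_i Gs(i)'*Gs(i) = I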

Decimation step and DMRG sweep B. Khoromskij, Zurich 2010(L18) 466

Decimation step: after W is obtained from the “local” eigenvalue problem, it is approximated with a prescribed accuracy ε to recover G_k and G_{k+1},

W(i_k, i_{k+1}) ≈ G_k(i_k) G_{k+1}(i_{k+1}).  (149)

Approximation (149) requires one SVD_ε: eq. (149) in index form reads

W(α_{k−1}, i_k, i_{k+1}, α_{k+1}) ≈ Σ_{α_k=1}^{r_k} G_k(α_{k−1}, i_k, α_k) G_{k+1}(α_k, i_{k+1}, α_{k+1}),

and it can be done by reshaping W into an r_{k−1}n_k × n_{k+1}r_{k+1} matrix and computing its SVD_ε. The rank r_k is determined adaptively from the accuracy parameter ε (a big advantage over the standard ALS).
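A minimal sketch in base MATLAB of this decimation step; the simple relative threshold on the singular values is an assumption (an accumulated Frobenius-tail criterion is equally common):

  [r0, nk, nk1, r2] = size(W); eps_svd = 1e-6;
  [U, S, Vm] = svd(reshape(W, r0*nk, nk1*r2), 'econ');
  s = diag(S);
  rk = max(1, nnz(s > eps_svd*norm(s)));            % adaptive rank choice
  Gk  = reshape(U(:, 1:rk), [r0, nk, rk]);          % left-orthogonal factor
  Gk1 = reshape(S(1:rk, 1:rk)*Vm(:, 1:rk)', [rk, nk1, r2]);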

DMRG sweep: consists of the following steps:

(1) The cores G_3, . . . , G_d are made right-orthogonal.

(2) Optimization for G_1 and G_2, then for G_2 and G_3, and so on, keeping the cores G_1, . . . , G_{k−1} left-orthogonal so that W has unit Frobenius norm.

(3) After the last core is reached, the sweep is repeated from right to left, and the process continues until convergence (stabilization of the Rayleigh quotient or of ψ). A schematic sketch of one half-sweep is given below.
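A schematic half-sweep with hypothetical helpers (merge_cores as sketched above, local_evp for the small eigenvalue problem, split_svd for the SVD_ε decimation); this is an outline of the control flow, not TT-Toolbox code:

  for k = 1:d-1
      W = merge_cores(G{k}, G{k+1});        % superblock over modes k, k+1
      W = local_evp(H, G, W, k);            % minimize the local Rayleigh quotient
      [G{k}, G{k+1}] = split_svd(W, eps);   % SVD_eps decimation, adapts rank r_k
  end                                       % then sweep back from right to left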

Local EVP B. Khoromskij, Zurich 2010(L18) 467

Representation of the Rayleigh quotient: ψ is represented as

ψ(i_1, . . . , i_d) = G_1(i_1) · · · G_{k−1}(i_{k−1}) W(i_k, i_{k+1}) G_{k+2}(i_{k+2}) · · · G_d(i_d).

The numerator (Hψ, ψ) involves the matrix-by-vector product y = Hψ, which is in the TT-format with cores

Y_k(i_k) = Σ_{j_k} H_k(i_k, j_k) ⊗ G_k(j_k).

Representation of (Hψ, ψ):

(Hψ, ψ) = Γ_1 Γ_2 · · · Γ_{k−1} · Γ · Γ_{k+2} · · · Γ_d,

Γ_s = Σ_{i_s, j_s} H_s(i_s, j_s) ⊗ G_s(i_s) ⊗ G_s(j_s),  s = 1, . . . , d,  s ≠ k, k + 1,

Γ = Σ_{i_k, i_{k+1}, j_k, j_{k+1}} H_k(i_k, j_k) H_{k+1}(i_{k+1}, j_{k+1}) ⊗ W(i_k, i_{k+1}) ⊗ W(j_k, j_{k+1}).

Γ_s is a matrix of size R_s r_s^2 × R_{s+1} r_{s+1}^2; Γ is of size R_{k−1} r_{k−1}^2 × R_{k+1} r_{k+1}^2 (corresponding to the merged core). A sketch of assembling Γ_s is given below.
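A minimal sketch in base MATLAB of assembling one interface matrix Γ_s, with the matrix core Hs stored as an R_s × n × n × R_{s+1} array and Gs as an r_s × n × r_{s+1} array (real-valued case assumed):

  [Rs, n, ~, Rs1] = size(Hs); [rs, ~, rs1] = size(Gs);
  Gamma = zeros(Rs*rs^2, Rs1*rs1^2);
  for i = 1:n
      for j = 1:n
          Gamma = Gamma + kron(reshape(Hs(:,i,j,:), [Rs, Rs1]), ...
                  kron(reshape(Gs(:,i,:), [rs, rs1]), reshape(Gs(:,j,:), [rs, rs1])));
      end
  end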

Local EVP B. Khoromskij, Zurich 2010(L18) 468

The products

p_k = Γ_1 · · · Γ_{k−1},  resp.  q_k = Γ_{k+2} · · · Γ_d,

are a row vector of length R_{k−1} r_{k−1}^2, resp. a column vector of length R_{k+1} r_{k+1}^2.

With p_k and q_k precomputed, the Rayleigh quotient numerator becomes

(Hψ, ψ) = p_k ( Σ_{i_k, i_{k+1}, j_k, j_{k+1}} H_k(i_k, j_k) H_{k+1}(i_{k+1}, j_{k+1}) ⊗ W(i_k, i_{k+1}) ⊗ W(j_k, j_{k+1}) ) q_k.

Assuming r_s ∼ r and R_s ∼ R, the total cost is O(n^2 R r^3) with n = 2.

Local eigenvalue problem for W: “local” notation. The DMRG optimization step is equivalent to an ALS step applied to the tensor with modes k, k + 1 merged ⇒ i = (i_k, i_{k+1}) is one long index, as is j = (j_k, j_{k+1}).

The function to be minimized is

f(W) = p ( Σ_{i,j} M(i, j) ⊗ W(i) ⊗ W(j) ) q,

where M(i, j), i, j = 1, . . . , m, are R_1 × R_2 matrices and W(i), i = 1, . . . , m, are r_1 × r_2 matrices.

Local EVP B. Khoromskij, Zurich 2010(L18) 469

Here p is a row vector of length R_1 r_1^2 and q is a column vector of length R_2 r_2^2.

The local optimization problem:

p ( Σ_{i,j} M(i, j) ⊗ W(i) ⊗ W(j) ) q → min,  Σ_i ||W(i)||_F^2 = 1.  (150)

The quadratic optimization problem (150) in W is reduced to the EVP

M̂ w = E w,  ||w|| = 1,

where w is W reshaped into a “long vector” (MATLAB: w = W(:)).

Complexity: W contains 4r^2 elements (n = 2), i.e., the matrix M̂ is of size 4r^2 × 4r^2. In full matrix format, ranks r > 50 already imply 4r^2 > 10^4, and in practice one can encounter ranks of order several hundred.

Remedy: storing M̂ implicitly requires only 16R^2 memory cells for the matrices M(i, j), and the matrix-by-vector product M̂w costs O(R r^3 + R^2 r^2).

Iterative computation of the lowest eigenvalue in the local EVP is then a viable option (a hedged sketch follows).
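A hedged sketch in base MATLAB using the legacy eigs syntax: the lowest eigenpair of M̂ from its matrix-by-vector product alone; applyM is a hypothetical routine implementing the structured product M̂w at the cost stated above, and r is the current rank:

  m = 4*r^2;                                 % local problem size for n = 2
  opts.issym = true;                         % M-hat is symmetric
  [w, E] = eigs(@applyM, m, 1, 'sa', opts);  % smallest algebraic eigenvalue
  W = reshape(w, [r, 2, 2, r]);              % back to the superblock shape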

Hamiltonian with Henon-Heiles pot. (dim. f ≤ 256) B. Khoromskij, Zurich 2010(L18) 470

The matrix H was approximated in the QTT-format using the add-and-compress algorithm. The grid size was set to n = 128 = 2^7, and the number of dimensions f varies from 4 to 256.

Figure 29 shows the time needed to approximate the matrix; Figure 30 shows the dependence of the maximal and effective ranks on the number of DOFs.

The QTT-ranks of the solution are shown in Figure 31. The minima of this oscillating picture correspond to the TT-ranks, i.e., to the absence of QTT-tensorization; for the new (virtual) dimensions the ranks are larger, but of the same order (while the local matrix size is reduced dramatically).

Résumé on the QTT-DMRG iteration:

Vector, matrix and MLA complexities scale linearly in the dimension.

Global convergence of the DMRG sweeps is robust.

The local preconditioned EVP solver is fast.

Further prospects: toward tensor networks, FCI electronic structure calculations, and general applications.

Hamiltonian with Henon-Heiles pot. (dim. f ≤ 256) B. Khoromskij, Zurich 2010(L18) 471

Exer. 18.1. Calculate the local DMRG-QTT matrix M̂ at the exact solution (the rank-2 QTT sine tensor) of the problem −∆ψ = λψ, where ∆ = ∆^(d)_DD is the 1D Dirichlet-Dirichlet Laplacian on a uniform grid of size n = 2^d. Use the explicit rank-3 QTT representation of ∆^(d)_DD (Lect. 16) and the rank-2 representation of ψ (Lect. 15).

Figure 29: Solution and approximation timings for the Henon-Heiles potential with d = 7, varying f, ε = 10^{-6}.

Hamiltonian with Henon-Heiles pot. (dim. f ≤ 256) B. Khoromskij, Zurich 2010(L18) 472

Figure 30: Maximal and effective rank behaviour, d = 7, varying f, ε = 10^{-6}.

Hamiltonian with Henon-Heiles pot. (dim. f ≤ 256) B. Khoromskij, Zurich 2010(L18) 473

Figure 31: QTT-ranks of the solution with f = 32, d = 7 for the different modes.

Status of tensor numerical methods in modern applications B. Khoromskij, Zurich 2010(L18) 474

Elliptic (parameter-dependent) eq.: find u ∈ H^1_0(Ω) s.t.

Hu := −div(a grad u) + V u = F in Ω ⊂ R^d.

EVP: find a pair (λ, u) ∈ R × H^1_0(Ω) s.t. ⟨u, u⟩ = 1,

Hu = λu in Ω ⊂ R^d,  u = 0 on ∂Ω.

Parabolic (hyperbolic) eq.: find u : R^d × (0, ∞) → R, u(·, 0) ∈ H^2(R^d), s.t.

σ ∂u/∂t + Hu = 0,  H = ∆_d + V(x_1, . . . , x_d).

Tensor methods are gainfully adapted to the main challenges:

High spatial dimension: Ω = (−b, b)^d ⊂ R^d (d = 2, 3, . . . , 100, . . .).

Multiparametric equations: a(y, x), u(y, x), y ∈ R^M (M = 1, 2, . . . , 100, . . . , ∞).

Nonlinear, nonlocal (integral) operators V = V(x, u), singular potentials.

Literature to Lecture 18 B. Khoromskij, Zurich 2010(L18) 475

1. B.N. Khoromskij and I.V. Oseledets. DMRG + QTT approach to high-dimensional quantum molecular dynamics. Preprint MPI MiS 68/2010, Leipzig, 2010, submitted.

2. I.V. Oseledets. Compact matrix form of the d-dimensional tensor decomposition. Preprint 09-01, INM RAS, Moscow, 2009.

3. Ch. Lubich. From quantum to classical molecular dynamics: reduced models and numerical analysis. Zurich Lectures in Advanced Mathematics, EMS, 2008.

4. M.H. Beck, A. Jäckle, G.A. Worth, and H.-D. Meyer. The multiconfiguration time-dependent Hartree (MCTDH) method: a highly efficient algorithm for propagating wavepackets. Phys. Rep. 324 (2000), 1-105.

5. H.-D. Meyer, F. Gatti, and G.A. Worth. Multidimensional Quantum Dynamics: MCTDH Theory and Applications. Wiley-VCH, Weinheim, 2009.

6. G. Vidal. Efficient classical simulation of slightly entangled quantum computations. Phys. Rev. Lett. 91 (2003).

7. S.R. White. Density matrix algorithms for quantum renormalization groups. Phys. Rev. B 48 (1993), 10345-10356.

Acknowledgments and best wishes for 2011! B. Khoromskij, Zurich 2010(L18) 476

Acknowledgments.

I gratefully thank Christoph Schwab and Stefan Sauter for their kind invitation to present these lectures, for our collaboration, and for creating a friendly and encouraging atmosphere during my stay in Zurich.

I am very much indebted to my colleagues and coauthors Wolfgang Hackbusch, Eugene Tyrtyshnikov, Venera Khoromskaia, Ivan Oseledets, Ivan Gavrilyuk, Heinz-Jürgen Flad, and to the MFTI students Vladimir Kazeev and Sergey Dolgov. Our effective collaboration in recent years has led to a rigorous understanding of tensor numerical methods. Many of our joint papers provided the basis for this lecture course.

Special thanks go to Christine Tobler (ETH Zurich) for essential and qualified assistance with the MATLAB exercises for the course.

I am thankful to my students for their interest and patience.

Merry Christmas and a Happy New Year!!! :-) :-) :-)

http://personal-homepages.mis.mpg.de/bokh