Preface B. Khoromskij, Zurich 2010 1
These notes are based on my lectures given to the students of the Pro∗Doc
Program at the University/ETH Zurich in the winter semester of 2010.
This course, consisting of 18 lectures and MATLAB exercises, presents an
introduction to modern tensor-structured numerical methods in
scientific computing. In recent years these methods have proved to
be a powerful tool for efficient computations in higher dimensions,
overcoming the so-called “curse of dimensionality”.
In these lectures I try to present the three arguably most important
ingredients of the tensor approach:
⋄ Analytical methods of separable approximation of multivariate functions
and operators in Rd, d ≥ 3.
⋄ Algebraic low-rank approximation of function-related multi-dimensional
vectors/matrices in basic tensor formats, and the respective multilinear
algebra in R^{n×n×...×n}.
⋄ Tensor truncated iterative methods in the Tucker, tensor train (TT)
and quantics-TT formats with applications to the solution of multi-
dimensional equations in electronic structure calculations, quantum
molecular dynamics and stochastic PDEs.
BNK Zurich, October – December 2010.
Introduction to Tensor Numerical Methods I B. Khoromskij, Zurich 2010 2
Everything should be made as simple
as possible, but not simpler.
A. Einstein (1879-1955)
Introduction to Tensor Numerical Methods in
Scientific Computing
(Part I. Analytic Methods of Separable Approximation)
Boris N. Khoromskij
http://personal-homepages.mis.mpg.de/bokh
University/ETH Zurich, Pro∗Doc Program, WS 2010
Outline of the Lecture Course B. Khoromskij, Zurich 2010 3
Part I. Analytic Methods of Separable Approximation in Rd.
1. Separable approximation of multivariate functions in Rd. Basic rank
structured tensor-product formats. Curse of dimension and
Kolmogorow’s paradigm. Schmidt expansion. Greedy Algorithms for
d ≥ 3.
2. Classical Polynomial Approximation. Tensor-product polynomial and
trigonometric interpolation. Application to the Helmholtz kernel.
Functions of the form f(x1 + ...+ xd).
3. Separation by integration. Fitting by exponential sums. Celebrated
sampling theorem. sinc- interpolation and quadratures for analytic
functions. Error estimate for truncated sums.
4. Separable representation of analytic, shift-invariant functions.
Kronecker-product representation of multi-dimensional integral
operators Au = ∫_{R^d} g(‖ · −y‖) u(y) dy. Tensor product convolution.
Part II. Algebraic Methods of Tensor Approximation. Multilinear
Algebra. (see page xx)
Part III. Solving Equations by TT/QTT methods (BVPs, EVPs,
transient problems.) (see page xx)
Lect. 1. On separable approxim. in higher dimensions B. Khoromskij, Zurich 2010(L1) 4
Outlook of Lecture 1.
• Motivations: Modern applications in higher dimensions.
• From low to higher dimensions: what can be adopted from
traditional numerics.
• Rank structured separable representations of multi-variate
functions in Rd. Basic dimension splitting formats.
• Indispensable rank structured matrix/tensor multilinear
algebra (MLA).
• “Curse of dimensionality” and Kolmogorow’s paradigm.
• d = 2: Celebrated Schmidt’s decomposition (SD).
• Greedy Algorithms: simple but slow convergence.
Separability concept in multi-dimensional modeling B. Khoromskij, Zurich 2010(L1) 5
1929, Dirac:
The fundamental laws necessary for the mathematical treatment of large
part of physics and the whole of chemistry are thus completely known,
and the difficulty lies only in the fact that application of these laws leads
to equations that are too complex to be solved.
1998, W. Kohn, A. Pople:
Nobel Prize in Chemistry for development of DFT, based on
use of problem adapted (separable) GTO basis sets.
Nowadays: Spreading of tensor methods in multi-dimensional
numerical modeling:
Effective nonlinear approximation of operators/functions in Rd,
MLA with linear complexity scaling in dimension d,
Initial applications in comput. chemistry, sPDEs, quantum computing.
Multi-dimensional equations in wide range applications B. Khoromskij, Zurich 2010(L1) 6
Basic physical models include (nonlocal) multivariate transforms.
Examples of high dimensional problems.
1. Multi-dimensional integral operators in Rd (convolution and Green’s
functions, Fourier, Laplace transforms).
2. Elliptic/parabolic/hyperbolic solution operators, preconditioning.
3. Schrodinger eq. for many-particle systems. Density matrix
calculation in R3 × R3 (DFT, Hartree-Fock/Kohn-Sham eqs.),
quantum molecular dynamics, DMRG and quantum computing.
4. Stochastic/parametric PDEs, Kolmogorow forward/Fokker-Planck eqs.
5. Financial math. (Kolmogorow backward, Black-Scholes eqs).
6. Collision integrals in the deterministic Boltzmann eq. in R3
(dilute gas).
7. Multi-dimensional data in chemometrics, psychometrics, higher-order
statistics, data mining, ...
Examples of operator calculus B. Khoromskij, Zurich 2010(L1) 7
Tensor structured vectors and matrices of size n^d:

x ∈ R^{n^d} ≅ R^n ⊗ ... ⊗ R^n,  A ∈ R^{m^d×n^d} ≅ R^{m×n} ⊗ ... ⊗ R^{m×n}.
• Linear elliptic systems and spectral problems
Au = f, Au = λu ⇒ B ≈ A−1.
• Volume/interface preconditioning ⇒ ∆−α, α = 1,±1/2.
• Parabolic equations
∂u/∂t + Au = f ⇒ exp(−tA), (A + (1/τ)I)^{−1}.
• Control theory: Matrix Lyapunov equation on R^{n×n},

AX + XB = G ⇒ X = ∫_0^∞ e^{−tA} G e^{−tB} dt, sign(A).
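The integral formula for the Lyapunov solution can be checked numerically. A minimal numpy sketch, assuming symmetric positive definite A and B so that the integral converges and can be evaluated in closed form in the eigenbases (the helper name and test sizes are ours):

```python
import numpy as np

def lyapunov_integral(A, B, G):
    """Solve AX + XB = G for symmetric positive definite A, B.

    With A = U diag(lam) U^T and B = V diag(mu) V^T, the integral
    X = int_0^inf exp(-tA) G exp(-tB) dt reduces entrywise to
    (U^T G V)_ij / (lam_i + mu_j) in the eigenbases.
    """
    lam, U = np.linalg.eigh(A)
    mu, V = np.linalg.eigh(B)
    Gt = U.T @ G @ V                        # rotate G into the eigenbases
    Xt = Gt / (lam[:, None] + mu[None, :])  # integrate exp(-(lam_i+mu_j)t) analytically
    return U @ Xt @ V.T

rng = np.random.default_rng(0)
n = 6
M1, M2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
A = M1 @ M1.T + n * np.eye(n)   # SPD
B = M2 @ M2.T + n * np.eye(n)   # SPD
G = rng.standard_normal((n, n))
X = lyapunov_integral(A, B, G)
residual = np.linalg.norm(A @ X + X @ B - G)
```

The residual of the Lyapunov equation is at the level of rounding errors.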
Challenge of Higher Dimensions B. Khoromskij, Zurich 2010(L1) 8
1. Motivating applications:
Molecular systems: quantum molecular dynamics, DMRG in quant. chem.
FEM/BEM in Rd: stochastic PDEs, atmospheric model., financial math.
Data mining: quantum computing, machine learning, image processing.
2. ”Curse of dimensionality”: (R. Bellman, Princeton UP, NJ, 1961).
O(N^d)-methods using N × N × ... × N (d factors) grids (linear in volume size).
3. O(dN)-Methods via separation of variables:
Tensor-formatted methods to represent d-variate functions, operators, and
for solving equations on rank-structured tensor manifolds in Rd, d ≥ 3.
4. log-volume super-compressed representation:
Quantics-TT approximation of N-d tensors, N^d → O(d log N).
Large problems in low dimensions B. Khoromskij, Zurich 2010(L1) 9
In low dimensions (d = 1, 2, 3) the goal is O(N)-methods.
Main principles: making use of hierarchical structures,
low-rank pattern, recursive algorithms and parallelization.
Based on recursions via hierarchical structures:
Classical Fourier (1768-1830) methods, FFT in O(N logN) op.
FFT-based circulant convolution, Toeplitz, Hankel matrices.
Multiresolution representation via wavelets, O(N)-FWT.
Multigrid methods: O(N) - elliptic problem solvers.
Fast multipole, panel clustering, H-matrix in O(cdN logN) op.
Well suited for integral (nonlocal) operators in FEM/BEM.
Parallelization:
Domain decomposition: O(N/p) - parallel algorithms.
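As an illustration of the FFT-based circulant convolution listed above, a minimal numpy sketch (the vector c and size N are arbitrary choices; the direct O(N²) product is built only to verify the O(N log N) result):

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix C = circ(c) by x in O(N log N).

    A circulant matrix is diagonalized by the DFT, so C x equals the
    inverse FFT of fft(c) * fft(x) (periodic convolution of c and x).
    """
    return np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real

# direct O(N^2) check against an explicitly built circulant matrix
N = 8
rng = np.random.default_rng(1)
c = rng.standard_normal(N)
x = rng.standard_normal(N)
C = np.array([[c[(i - j) % N] for j in range(N)] for i in range(N)])
fast, slow = circulant_matvec(c, x), C @ x
```

The same diagonalization trick underlies the fast treatment of Toeplitz and Hankel matrices after embedding into a circulant.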
Traditional numerical tools of reduced complexity B. Khoromskij, Zurich 2010(L1) 10
• High order methods: hp-FEM/BEM, spectral methods,
bcFEM, Richardson extrapolation.
• Adaptive mesh refinement: a priori/a posteriori strateg.
• Dimension reduction: boundary/interface equations,
Schur complement/domain decomposition methods.
• Combination of tensor-product basis with anisotropic
adaptivity: hyperbolic cross approximation by
FEM/wavelet (sparse grids).
• Model reduction: multi-scale, homogenization, neural
networks.
• Monte-Carlo method (e.g., random walk dynamics).
Separable representation of functions in TPHS B. Khoromskij, Zurich 2010(L1) 11
Let Hℓ (ℓ = 1, ..., d) be a real, separable Hilbert space of
functions. M. Reed, B. Simon, Functional analysis, AP, 1972.
Def. 1.1 A tensor product of Hilbert spaces Hℓ (TPHS),
H = H1 ⊗ ... ⊗ Hd, is defined as the closure of the set of finite
sums, ∑_k ⊗_{ℓ=1}^d w_k^{(ℓ)}, of dual multilinear forms (linear
functionals) on H1 × ... × Hd. A single form is defined by

(⊗_{ℓ=1}^d w^{(ℓ)})(v^{(1)}, ..., v^{(d)}) := ∏_{ℓ=1}^d ⟨w^{(ℓ)}, v^{(ℓ)}⟩_{Hℓ}.

The scalar product of rank-1 (separable) elements (tensors)
in H is defined by

⟨w^{(1)} ⊗ . . . ⊗ w^{(d)}, v^{(1)} ⊗ . . . ⊗ v^{(d)}⟩ = ∏_{ℓ=1}^d ⟨w^{(ℓ)}, v^{(ℓ)}⟩,

and it is extended by linearity.
⟨·, ·⟩ is called the induced scalar product.
Basic properties of TPHS. First examples. B. Khoromskij, Zurich 2010(L1) 12
Lem. 1.1 〈·, ·〉 is well defined and it is positive definite.
Lem. 1.2 If {φ_{kℓ}^{(ℓ)}} is an orthonormal basis in Hℓ, then
Φ_k = ⊗_{ℓ=1}^d φ_{kℓ}^{(ℓ)}, k = (k1, ..., kd) ∈ N^d, is an orthonormal basis in H.
Exercise 1.1. Prove Lem. 1.1 - 1.2.
The tensor product of univariate functions f^{(ℓ)}(xℓ), xℓ ∈ Iℓ = [aℓ, bℓ], is a
d-variate function (called separable, or rank-1) defined as follows:

f := ⊗_{ℓ=1}^d f^{(ℓ)}, where f(x1, ..., xd) = ∏_{ℓ=1}^d f^{(ℓ)}(xℓ).

Exer. 1.2 Prove L2(I1 × ... × Id) = ⊗_{ℓ=1}^d L2(Iℓ).
Example 1.2 Denote by H^{⊗n} the n-fold tensor product of the space H. If
H = L2(R), then an element ψ ∈ F(H) := ⊕_{n=0}^∞ H^{⊗n} of the so-called Fock
space over H, F(H), is a sequence of functions
ψ = (ψ0, ψ1(x1), ψ2(x1, x2), ψ3(x1, x2, x3), . . .),
such that

|ψ0|² + ∑_{n=1}^∞ ∫_{R^n} |ψn(x1, . . . , xn)|² dx1 . . . dxn < ∞.
The finite expansion in F(H) as above is also known as ANOVA repr.
In the physical literature, the subspaces of F(H) consisting of
symmetric/antisymmetric functions w.r.t. permutation of two arguments
are called the boson and fermion Fock spaces, respectively.
Def. 1.2 A d-th order tensor is a function of d discrete
arguments, f : I1 × ... × Id → R (a multi-dimensional array over
I1 × ... × Id). The respective TPHS H is equipped with the
Euclidean scalar product and Frobenius norm (more details in Lect. 6).
Example 1.3 H = R^{I1×...×Id} = ⊗_{ℓ=1}^d R^{Iℓ}, with Iℓ = {1, ..., nℓ}.
Tensor formats: Canonical representation in TPHS B. Khoromskij, Zurich 2010(L1) 14
Def. 1.3 (Canonical format). Call CR the subset of
elements in H requiring at most R terms (rank-R functions),

C_R = { w ∈ H : w = ∑_{k=1}^R w_k^{(1)} ⊗ w_k^{(2)} ⊗ . . . ⊗ w_k^{(d)}, w_k^{(ℓ)} ∈ Hℓ }.

w ∈ CR can be represented by the description of Rd elements
w_k^{(ℓ)} ∈ Hℓ. Storage on an n^d-grid: dRn (linear in d).
Advantage: Tremendous reduction of the representation cost,
removing d from the exponent, n^d → dRn.
Limitations: Applies to a special class of functions given
analytically; the algebraic decomposition is nonrobust.
Probl. 1. Best rank-R approximation of a multi-variate
function f = f(x1, ..., xd) ∈ H in the set CR.
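In the discrete case, an element of CR is handled entirely through its factor matrices. A minimal numpy sketch of the canonical format (sizes n = 4, R = 2 and the helper name are our illustrative choices):

```python
import numpy as np

def canonical_full(factors):
    """Assemble the full tensor from a rank-R canonical representation.

    factors = [W1, ..., Wd], each Wl of shape (n_l, R); the tensor is
    sum_k w_k^(1) o ... o w_k^(d), outer products of the k-th columns.
    """
    R = factors[0].shape[1]
    full = 0.0
    for k in range(R):
        term = factors[0][:, k]
        for W in factors[1:]:
            term = np.multiply.outer(term, W[:, k])
        full = full + term
    return full

# rank-2 canonical tensor in d = 3, n = 4: storage d*R*n = 24 vs n^d = 64
rng = np.random.default_rng(2)
Ws = [rng.standard_normal((4, 2)) for _ in range(3)]
T = canonical_full(Ws)
# entrywise definition: T[i,j,k] = sum_r W1[i,r] W2[j,r] W3[k,r]
entry = sum(Ws[0][1, r] * Ws[1][2, r] * Ws[2][3, r] for r in range(2))
```

Forming the full tensor is done here only for checking; the point of the format is to avoid it.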
Orthogonal separable representation B. Khoromskij, Zurich 2010(L1) 15
Given a tuple of dimensions, r = (r1, . . . , rd) ∈ N^d, choose
Vℓ = span{φ_k^{(ℓ)}}_{k=1}^{rℓ} ⊂ Hℓ, rℓ := dim Vℓ < ∞ (1 ≤ ℓ ≤ d), with
orthogonal basis, and build the tensor subspace
V = V1 ⊗ V2 ⊗ . . . ⊗ Vd ⊂ H. Each v ∈ V can be represented by

v = ∑_{k=1}^{r} b_k φ_{k1}^{(1)} ⊗ φ_{k2}^{(2)} ⊗ . . . ⊗ φ_{kd}^{(d)}. (1)

Def 1.4 (Tucker format) Given r, define

T_r := { v ∈ V ⊂ H : ∀ Vℓ s.t. dim Vℓ = rℓ, with b_k ∈ R }.

Representing w ∈ T_r: ∏_{ℓ=1}^d rℓ reals and the sampling of ∑_{ℓ=1}^d rℓ
functions φ_k^{(ℓ)}.
Robust, but storage on an n^d-grid: r^d + drn ≪ n^d, r = max rℓ.
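In the discrete case a rank-r Tucker representation can be computed by the standard truncated higher-order SVD (an algebraic tool treated later in the course; the sketch below is a common textbook construction, not the slides' own algorithm):

```python
import numpy as np

def hosvd(T, ranks):
    """Rank-(r1,...,rd) Tucker approximation via the truncated HOSVD:
    V^(l) = leading r_l left singular vectors of the mode-l unfolding,
    core = T contracted with all factor matrices."""
    Vs = []
    for l in range(T.ndim):
        unfold = np.moveaxis(T, l, 0).reshape(T.shape[l], -1)
        U = np.linalg.svd(unfold, full_matrices=False)[0]
        Vs.append(U[:, :ranks[l]])
    core = T
    for V in Vs:
        core = np.tensordot(core, V, axes=(0, 0))  # contract leading axis; axes cycle
    return core, Vs

def tucker_full(core, Vs):
    """Contract the core with the factor matrices to recover the full tensor."""
    full = core
    for V in Vs:
        full = np.tensordot(full, V, axes=(0, 1))
    return full

# a tensor of exact multilinear rank (2,2,2) in R^{5x5x5} is reproduced exactly
rng = np.random.default_rng(4)
core0 = rng.standard_normal((2, 2, 2))
Vs0 = [np.linalg.qr(rng.standard_normal((5, 2)))[0] for _ in range(3)]
T = tucker_full(core0, Vs0)
core, Vs = hosvd(T, (2, 2, 2))
err = np.linalg.norm(tucker_full(core, Vs) - T)
```

The storage count r^d + drn of the slide is exactly what (core, Vs) occupies.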
Visualization of the canonical and Tucker models for d = 3:
[Figure: the canonical model expands a tensor A as a sum of R weighted rank-1 terms b_k V_k^{(1)} ⊗ V_k^{(2)} ⊗ V_k^{(3)}; the Tucker model contracts an r1 × r2 × r3 core tensor B with factor matrices V^{(1)} ∈ R^{I1×r1}, V^{(2)} ∈ R^{I2×r2}, V^{(3)} ∈ R^{I3×r3}.]
Probl. 2. Best rank-r orthogonal approx. of f ∈ H in Tr.
Examples on rank-R and Tucker formats B. Khoromskij, Zurich 2010(L1) 17
Ex. 1.4 H = L2(I^d). Rank-1 elements, f = f1(x1)···fd(xd), e.g.
f = exp(f1(x1) + ... + fd(xd)) = ∏_{ℓ=1}^d exp(fℓ(xℓ)). For the function
f = sin(∑_{j=1}^d xj), rank(f) = 2 holds over the field C:

2i sin(∑_{j=1}^d xj) = e^{i ∑_{j=1}^d xj} − e^{−i ∑_{j=1}^d xj}.

The rank-d function f(x) = x1 + x2 + . . . + xd can be approximated
by a rank-2 expansion with any prescribed accuracy,

f = (∏_{ℓ=1}^d (1 + εxℓ) − 1)/ε + O(ε), as ε → 0.

Ex. 1.5 The Tucker approximation in H = L2(I^d) can be made
by tensor-product polynomial interpolation of order r,

f(x1, ..., xd) ≈ ∑_{j=1}^r f(ν_{j1}, ..., ν_{jd}) ∏_{ℓ=1}^d L_{jℓ}(xℓ).

{L_{jℓ}} is the set of Lagrange polynomials on [−1, 1] at, say, the
Chebyshev-Gauss-Lobatto grid {ν_{jℓ}}, jℓ = 1, ..., rℓ.
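The rank-2 ε-expansion of x1 + ... + xd is easy to check numerically; a minimal sketch (the values of d and ε are arbitrary choices):

```python
import numpy as np

# f(x) = x1+...+xd vs. its rank-2 expansion (prod_l (1 + eps*x_l) - 1)/eps:
# expanding the product shows the error is eps * sum_{l<m} x_l x_m + O(eps^2).
rng = np.random.default_rng(3)
d = 5
x = rng.uniform(-1, 1, d)
exact = x.sum()
errors = []
for eps in [1e-1, 1e-3, 1e-5]:
    approx = (np.prod(1.0 + eps * x) - 1.0) / eps
    errors.append(abs(approx - exact))
```

The observed error decays linearly in ε, as the expansion predicts.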
Function product decomposition (Tensor chain/train) B. Khoromskij, Zurich 2010(L1) 18
Given J := ×_{ℓ=1}^d Jℓ, Jℓ = {1, ..., rℓ}, and J0 = Jd.
Def. 1.5 A rank-r functional tensor chain/train (FTC/FTT)
format: contracted product of functional tri-tensors over J,

f(x1, ..., xd) = ∑_{j∈J} g1(jd, x1, j1) g2(j1, x2, j2) · · · gd(j_{d−1}, xd, jd),

or, in compact form,

FTC[r] := { f ∈ H : f = ×_{ℓ=1}^d G^{(ℓ)}(xℓ) with G^{(ℓ)} ∈ R^{Jℓ−1} × Hℓ × R^{Jℓ} }.

If J0 = {1}, we have the FTT decomposition. Here G^{(1)}(x1) is a row
1 × r1 vector function depending on x1, G^{(ℓ)}(xℓ) is a matrix of
size rℓ−1 × rℓ with functional elements depending on xℓ, and
G^{(d)}(xd) is a column vector of size rd−1 × 1 depending on xd.
Sampling on an n^d-grid: O(dr²n) storage.
A function f ∈ H is approximated by a product of matrices
(matrix product states), each depending on a single variable.
Examples on FTT decomposition B. Khoromskij, Zurich 2010(L1) 19
Ex. 1.6 d-fold contracted product of tri-tensors over J1, ..., Jd (d = 6).
[Figure: tensor-chain network of six third-order tensors with mode size N, linked in a ring by the rank indices r1, ..., r6.]
Special case r6 = 1: FTT[r] = FTC[r].
Exer. 1.3 In some cases the function product decomp. can be
constructed explicitly. FTT rank of f(x) = x1 + x2 + . . .+ xd is 2,
[Oseledets ’10].
f(x) = [x1 1] [1 0; x2 1] · · · [1 0; x_{d−1} 1] [1; xd],

where [a b; c d] denotes the 2 × 2 matrix with rows (a, b) and (c, d), and [1; xd] a column vector.
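The matrix factorisation above can be multiplied out numerically; a minimal sketch (the helper name is ours):

```python
import numpy as np

def tt_sum_eval(x):
    """Evaluate f(x) = x1 + ... + xd (d >= 2) through its explicit rank-2
    FTT factors: row [x1, 1], middle matrices [[1, 0], [x_l, 1]],
    and final column [1, x_d]^T."""
    v = np.array([x[0], 1.0])                      # G^(1)(x1)
    for xl in x[1:-1]:
        v = v @ np.array([[1.0, 0.0], [xl, 1.0]])  # G^(l)(xl)
    return float(v @ np.array([1.0, x[-1]]))       # G^(d)(xd)
```

The running row vector [x1 + ... + x_l, 1] carries the partial sum through the chain, which is exactly why TT-rank 2 suffices.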
Nonlinear approximation in tensor format B. Khoromskij, Zurich 2010(L1) 20
Since Tr, CR and FTC[r] are not linear spaces, we obtain a
nontrivial nonlinear approximation problem on estimation
f ∈ H : σ(f, S) := inf_{s∈S} ‖f − s‖, (2)

where S = T_r, C_R, FTC[r]. Why might problem (2) be difficult for d ≥ 3?
Prop. 1.1 [Beylkin, Mohlenkamp] The trigonometric identity (d ≥ 2)

f(x) := sin(∑_{j=1}^d xj) = ∑_{j=1}^d sin(xj) ∏_{k∈{1,...,d}\{j}} sin(xk + αk − αj)/sin(αk − αj) (3)

holds for any αk ∈ R s.t. sin(αk − αj) ≠ 0 for all j ≠ k.
For d ≥ 3 it can be proven by induction (nontrivial exercise!).
Exer. 1.4 Prove that the FTT-rank of f in (3) is 2 (Lect. 2).
Expansion (3) shows the lack of uniqueness (ambiguity) of
the best rank-d tensor representation. The minimisation
process might be nonrobust (multiple local minima).
Principal questions (no ultimate answers):
Is the “curse of dimensionality” relevant?
How can (2) be solved efficiently? (Extend the truncated SVD.)
Can one expect fast (exponential) convergence in
the rank parameters R, r = max rℓ?
Can one solve the physical equations on nonlinear
tensor manifold S getting rid of “curse of dimension”?
Our approach: Construct tensor-structured numerical
methods based on efficient multilinear algebra (MLA).
Kolmogorow’s paradigm B. Khoromskij, Zurich 2010(L1) 22
Hilbert's 13th problem: A solution of the general algebraic equation of degree 7
cannot be written as a superposition of continuous bivariate functions.
Solved by celebrated theorem by Kolmogorow on the
superposition of univariate functions.
Thm. 1. (A. Kolmogorow 1957) Let I = [0, 1]. For d ≥ 2, any
function f ∈ C[I^d] can be represented in the form

f(x1, ..., xd) = ∑_{i=1}^{2d+1} g_i ( ∑_{ℓ=1}^d φ_{iℓ}(xℓ) ),

where the functions φ_{iℓ} : I → R do not depend on f and belong to
the class Lip1, while the g_i : R → R are continuous functions.
Thm. 1 is not constructive, but in our context it says that in the discrete
setting any function f can be represented by O((2d + 1)N + (2d + 1)dN) reals, where N
corresponds to the size of the interpolation table for the gi [Griebel].
d = 2: Schmidt expansion and SVD B. Khoromskij, Zurich 2010(L1) 23
The approximation of functions f(x, y) by bilinear forms,

f ≈ ∑_{k=1}^R uk(x) vk(y) in L2([0, 1]²),

is due to E. Schmidt, 1907 (a celebrated theorem). The result
is a continuous analogue of the SVD of matrices.
Let σk(Jf), σ1 ≥ σ2 ≥ ... ≥ 0, be the nonincreasing sequence of
singular values of the integral operator

Jf(g) := ∫_0^1 f(x, y) g(y) dy,

σk(Jf) := λk(A)^{1/2}, A = J*_f J_f, J*_f adjoint to J_f,

with orthonormal sequences ϕk(x), ψk(y),

A ψk(y) = λk ψk(y); A* ϕk(x) = λk ϕk(x), k = 1, 2, ...
The kernel function of A is given by

fA(x, y) := ∫_0^1 f(x, z) f(z, y) dz.

The Schmidt decomposition (SD) is given by

f(x, y) = ∑_{k=1}^∞ σk(Jf) ϕk(x) ψk(y).
The best bilinear approximation property reads as

‖f(x, y) − ∑_{k=1}^R σk ϕk(x) ψk(y)‖_{L2} = inf_{uk,vk∈L2, k=1,...,R} ‖f(x, y) − ∑_{k=1}^R uk(x) vk(y)‖_{L2}.
SD ensures that for d = 2 the best bilinear approximation can
be realised by the so-called Pure Greedy Algorithm (PGA).
For the Nyström approximation the problem is reduced to the SVD.
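In the discrete setting the Schmidt decomposition is just the SVD of the sampled kernel; a minimal numpy sketch (the Gaussian kernel, grid and rank are our illustrative choices):

```python
import numpy as np

# Sample a smooth kernel f(x, y) on a grid; the truncated SVD of the
# sample matrix is the best rank-R bilinear approximation in the
# Frobenius (discrete L2) norm, a discrete Schmidt decomposition.
n, R = 100, 10
x = np.linspace(0.0, 1.0, n)
F = np.exp(-(x[:, None] - x[None, :]) ** 2)   # f(x,y) = exp(-(x-y)^2)
U, s, Vt = np.linalg.svd(F)
F_R = (U[:, :R] * s[:R]) @ Vt[:R]
rel_err = np.linalg.norm(F - F_R) / np.linalg.norm(F)
```

For an analytic kernel like this one, the singular values decay exponentially, so a small rank already gives high accuracy.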
Computing canonical decomposition B. Khoromskij, Zurich 2010(L1) 25
For S = CR, the canonical decomposition can be considered in
the framework of the best R-term approximation with regard
to a redundant dictionary of rank-1 functions.
Def. 1.6 A system D of functions from H is called a
dictionary if each g ∈ D has norm one and its linear span is
dense in H.
Denote by ΣR(D) the collection of s ∈ H which can be written
in the form

s = ∑_{g∈Λ} cg g, Λ ⊂ D, #Λ ≤ R ∈ N, with cg ∈ R.

For f ∈ H, the best R-term approximation error is defined by

σR(f, D) := inf_{s∈ΣR(D)} ‖f − s‖.
Pure Greedy Algorithm B. Khoromskij, Zurich 2010(L1) 26
The Pure Greedy Algorithm (PGA) inductively computes an
estimate to the best R-term approximation.
Let g = g(f) ∈ D be an element maximising |〈f, g〉| (best rank-1
approximation by nonlinear maximisation!). Define
G(f) := 〈f, g〉g, R(f) := f −G(f).
The PGA reads as: Given f ∈ H, introduce
R0(f) := f and G0(f) := 0.
Then, for all 1 ≤ m ≤ R, we inductively define
Gm(f) := Gm−1(f) +G(Rm−1(f)),
Rm(f) := f −Gm(f) = R(Rm−1(f)).
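For d = 2 each greedy step is an exact best rank-1 approximation (the leading singular triple), so in the matrix case the PGA reproduces the truncated SVD. A small numpy sketch of the iteration above (function name and sizes are ours):

```python
import numpy as np

def pga_matrix(F, R):
    """Pure Greedy Algorithm for a matrix: R steps, each subtracting the
    best rank-1 approximation (leading singular triple) of the residual."""
    residual = F.copy()
    approx = np.zeros_like(F)
    for _ in range(R):
        U, s, Vt = np.linalg.svd(residual, full_matrices=False)
        step = s[0] * np.outer(U[:, 0], Vt[0])  # G(R_{m-1}(f))
        approx += step                          # G_m(f)
        residual -= step                        # R_m(f)
    return approx

rng = np.random.default_rng(5)
F = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(F, full_matrices=False)
best3 = (U[:, :3] * s[:3]) @ Vt[:3]             # truncated SVD, rank 3
gap = np.linalg.norm(pga_matrix(F, 3) - best3)
```

For d ≥ 3 the greedy rank-1 subproblem is itself a nontrivial nonlinear optimisation, which is where the slow convergence discussed next comes from.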
Applying the PGA to functions characterised via the
approximation property (low-order approximation)

σR(f, D) ≤ R^{−q}, R = 1, 2, ...,

with some q ∈ (0, 1/2], leads to the error bound (Temlyakov)

‖f − GR(f, D)‖ ≤ C(q, D) R^{−q}, R = 1, 2, ...,

which is “too pessimistic” in our applications (Monte-Carlo).
Our goal: A constructive R-term approximation on a class
of analytic functions (possibly with point singularities),
providing exponential convergence in R = 1, 2, ...,

σR(f, D) ≤ C exp(−R^q), q = 1 or q = 1/2.
Methods of choice: Quadrature- and interpolation-based
sinc-approximation, the direct fitting by exponential sums.
Greedy completely orthogonal decomposition B. Khoromskij, Zurich 2010(L1) 28
The decomposition in CR,

f = ∑_{k=1}^R ak vk, vk = φ_k^{(1)}(x1) ⊗ ... ⊗ φ_k^{(d)}(xd) ∈ C1,

is called completely orthogonal if

⟨φ_k^{(ℓ)}, φ_m^{(ℓ)}⟩ = δ_{k,m} ∀ℓ = 1, ..., d ⇔ Φ^{(ℓ)} = [φ_1^{(ℓ)}, ..., φ_R^{(ℓ)}] orthogonal.

The greedy completely orthogonal decomposition (GCOD) is
defined as the greedy orthogonal decomposition (GOD) with the
orthogonality constraint on Φ^{(ℓ)}.
Lem. 1.3 (Tucker format with the diagonal core.) Let f ∈ H allow a rank-R
completely orthogonal decomposition. Then the GCOD
algorithm correctly computes it. If a1 > a2 > ... > aR > 0, then
the decomp. is unique.
Exer. 1.5 Prove Lem. 1.3. [Golub, Zhang 2001].
Limitations. Poor approximation properties of COD.
Literature to Lecture 1 B. Khoromskij, Zurich 2010(L1) 29
1. G. Beylkin, M. Mohlenkamp: Numerical operator calculus in higher dimensions,
Proc. Natl. Acad. Sci. USA, 99 (2002), 10246–10251.
2. M. Reed, and B. Simon: Methods of Modern Mathematical Physics, I. Functional Analysis, AP, NY, 1972.
3. B.N. Khoromskij: An introduction to Structured Tensor-product representation of Discrete
Nonlocal Operators. Lecture notes 27, MPI MIS, Leipzig 2005.
4. I.V. Oseledets: Constructive representation of functions in tensor formats. Preprint INM Moscow, 2010.
5. V.N. Temlyakov: Greedy Algorithms and M-Term Approximation with Regard
to Redundant Dictionaries. J. of Approx. Theory 98 (1999), 117-145.
6. T. Zhang, and G.H. Golub: Rank-one approximation to high order tensors.
SIAM J. Matrix Anal. Appl. 23 (2001), 534-550.
http://personal-homepages.mis.mpg.de/bokh
Lect. 2. Separation by tensor-product polynomial interpolation B. Khoromskij, Zuerich 2010(L2)
Outlook of Lecture 2
1. Best polynomial approximation. Error bound for analytic
functions.
2. Separable approximation by tensor-product interpolants.
- Polynomial interpolation.
- Trigonometric interpolation.
- Sinc interpolation (Lect. 3, 4).
3. Application to the Helmholtz kernel.
4. Approximation of functions of the form f(x1 + ... + xd).
FTT decomposition of the Helmholtz kernel.
5. MATLAB Tensor Toolbox:
– http://csmr.ca.sandia.gov/∼tgkolda/Tensor Toolbox/
– TT/QTT http://spring.inm.ras.ru/osel (I. Oseledets).
Chebyshev polynomials. Polynomial approximation B. Khoromskij, Zuerich 2010(L2) 31
The Chebyshev polynomials Tn(w), w ∈ C (the complex plane),
are defined recursively by

T0(w) = 1, T1(w) = w,
Tn+1(w) = 2w Tn(w) − Tn−1(w), n = 1, 2, . . . .

The representation Tn(x) = cos(n arccos x), x ∈ [−1, 1], implies
Tn(1) = 1, Tn(−1) = (−1)^n. There holds

Tn(w) = (1/2)(z^n + z^{−n}) with w = (1/2)(z + 1/z). (4)
Let B := [−1, 1] be the reference interval, and denote by Eρ = Eρ(B)
Bernstein’s regularity ellipse (with foci at w = ±1 and the
sum of semi-axes equal to ρ > 1),

Eρ := {w ∈ C : |w − 1| + |w + 1| ≤ ρ + ρ^{−1}}.
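Identity (4) is easy to sanity-check numerically against the three-term recursion (the value of z is an arbitrary nonzero choice):

```python
import numpy as np

# T0 = 1, T1 = w, T_{n+1} = 2 w T_n - T_{n-1}; with w = (z + 1/z)/2
# this must reproduce T_n(w) = (z^n + z^{-n})/2 for every n.
z = 1.7 * np.exp(0.3j)
w = 0.5 * (z + 1.0 / z)
t_prev, t = 1.0 + 0.0j, w
ok = True
for n in range(1, 12):
    ok = ok and bool(np.isclose(t, 0.5 * (z ** n + z ** (-n))))
    t_prev, t = t, 2.0 * w * t - t_prev
```

The same substitution w = (z + 1/z)/2 is what links the Laurent and Chebyshev expansions in the proof of Thm. 2.1 below.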
Best polynomial approximation by Chebyshev series B. Khoromskij, Zuerich 2010(L2) 32
Thm. 2.1. (Chebyshev series). Let F be analytic and
bounded by M in Eρ (with ρ > 1). Then the expansion

F(w) = C0 + 2 ∑_{n=1}^∞ Cn Tn(w), (5)

holds for all w ∈ Eρ, with

Cn = (1/π) ∫_{−1}^1 F(w) Tn(w)/√(1 − w²) dw.

Moreover, |Cn| ≤ M/ρ^n, and for w ∈ B and m = 1, 2, 3, . . . ,

|F(w) − C0 − 2 ∑_{n=1}^m Cn Tn(w)| ≤ (2M/(ρ − 1)) ρ^{−m}, w ∈ B. (6)
Rem. Thm. 2.1 provides the same approximation error as the best
polynomial approximation (S. N. Bernstein, 1880-1968).
Laurent’s Theorem B. Khoromskij, Zuerich 2010(L2) 33
In the complex plane C, we introduce the circular ring

Rρ := {z ∈ C : 1/ρ < |z| < ρ} with ρ > 1.

Thm. 2.2. (Laurent’s Theorem). Let f : C → C be analytic
and bounded by M > 0 in Rρ with ρ > 1 (in the following we
say f ∈ Aρ), and set

Cn := (1/2π) ∫_0^{2π} f(e^{iθ}) e^{−inθ} dθ, n = 0, ±1, ±2, . . . .

Then for all z ∈ Rρ, f(z) = ∑_{n=−∞}^∞ Cn z^n, where the series
converges to f(z) for all z ∈ Rρ. Moreover, |Cn| ≤ M/ρ^{|n|}, and
for all θ ∈ [0, 2π] and arbitrary integer m,

|f(e^{iθ}) − ∑_{n=−m}^m Cn e^{inθ}| ≤ (2M/(ρ − 1)) ρ^{−m}.
Proof of the approximation Theorem 2.1 B. Khoromskij, Zuerich 2010(L2) 34
Proof. Each f ∈ Aρ,s := {f ∈ Aρ : C−n = Cn} has a representation (cf.
Thm. 2.2)

f(z) = C0 + ∑_{n=1}^∞ Cn (z^n + z^{−n}), z ∈ Rρ. (7)

(7) implies that f(1/z) = f(z), z ∈ Rρ.
Let us apply the mapping w = (1/2)(z + 1/z), which satisfies w(1/z) = w(z). It is
a conformal transform of {ξ ∈ Rρ : |ξ| > 1} onto Eρ, as well as of
{ξ ∈ Rρ : |ξ| < 1} onto Eρ (but not of Rρ onto Eρ!). It provides a one-to-one
correspondence between functions F that are analytic and bounded by M in Eρ
and functions f in Aρ,s.
Since under this mapping we have (4), it follows that if f defined by (7) is
in Aρ,s, then the corresponding transformed function F(w) = f(z(w)), which
is analytic and bounded by M in Eρ, is given by (5).
Now the result follows directly due to Thm. 2.2.
Lagrangian polynomial interpolation B. Khoromskij, Zuerich 2010(L2) 35
Let PN(B) be the set of polynomials of degree ≤ N on B.
Denote by [IN F](x) ∈ PN(B) the interpolation polynomial of F
w.r.t. the Chebyshev-Gauss-Lobatto (CGL) nodes

ξj = cos(πj/N) ∈ B, j = 0, 1, . . . , N, with ξ0 = 1, ξN = −1,

where the ξj are the zeroes of the polynomial (1 − x²) T′N(x), x ∈ B.
The Lagrangian interpolant IN of F has the form

IN F := ∑_{j=0}^N F(ξj) lj(x) ∈ PN(B) (8)

with lj(x) the set of interpolation polynomials

lj := ∏_{k=0, k≠j}^N (x − ξk)/(ξj − ξk) ∈ PN(B), j = 0, . . . , N.
Clearly, IN (ξj) = F (ξj), since lj(ξj) = 1 and lj(ξk) = 0 ∀k 6= j.
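A direct implementation of (8) at the CGL nodes, with a convergence check on an analytic function (the test function 1/(1 + t²) is our choice; for large N one would use the numerically safer barycentric form instead):

```python
import numpy as np

def cgl_interp(f, N, x):
    """Lagrangian interpolant (8) at the CGL nodes xi_j = cos(pi j / N),
    evaluated at the points x via the product formula for l_j."""
    xi = np.cos(np.pi * np.arange(N + 1) / N)
    vals = f(xi)
    out = np.zeros_like(x, dtype=float)
    for j in range(N + 1):
        lj = np.ones_like(x, dtype=float)
        for k in range(N + 1):
            if k != j:
                lj *= (x - xi[k]) / (xi[j] - xi[k])
        out += vals[j] * lj
    return out

f = lambda t: 1.0 / (1.0 + t ** 2)   # analytic in a Bernstein ellipse (poles at +-i)
x = np.linspace(-1.0, 1.0, 201)
err = lambda N: np.max(np.abs(f(x) - cgl_interp(f, N, x)))
err8, err24 = err(8), err(24)        # error drops roughly like rho^{-N}
```

The observed errors decay geometrically with N, consistent with the bound (10) below.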
Stability: Lebesgue constant for Chebyshev interpolation B. Khoromskij, Zuerich 2010(L2) 36
Given the set {ξj}_{j=0}^N of interpolation points on [−1, 1] and the
associated Lagrangian interpolation operator IN.
The approximation theory for polynomial interpolation
involves the so-called Lebesgue constant ΛN ∈ R>1,

‖IN u‖∞,B ≤ ΛN ‖u‖∞,B ∀u ∈ C(B). (9)

In the case of Chebyshev interpolation it can be shown that
ΛN grows at most logarithmically in N,

ΛN ≤ (2/π) log N + 1.

The interpolation points which produce the smallest value Λ*N
of all ΛN are not known, but Bernstein ’54 proved that

Λ*N = (2/π) log N + O(1).
Error bound for polynomial interpolation B. Khoromskij, Zuerich 2010(L2) 37
Thm. 2.3. Let u ∈ C∞[−1, 1] have an analytic extension to Eρ,
bounded by M > 0 in Eρ (with ρ > 1). Then we have

‖u − IN u‖∞,B ≤ (1 + ΛN) (2M/(ρ − 1)) ρ^{−N}, N ∈ N0. (10)

Proof. Due to (6) one obtains, for the best polynomial
approximation to u on [−1, 1],

min_{v∈PN} ‖u − v‖∞,B ≤ (2M/(ρ − 1)) ρ^{−N}.

The interpolation operator IN is a projection, that is, for all
v ∈ PN we have IN v = v.
Now apply the triangle inequality,

‖u − IN u‖∞,B = ‖u − v − IN(u − v)‖∞,B ≤ (1 + ΛN) ‖u − v‖∞,B.
Tensor-product polynomial interpolation B. Khoromskij, Zuerich 2010(L2) 38
Given N ∈ N, a set of interpolating functions ϕj(x), x ∈ B,
and sampling points ξj ∈ B (j = 0, 1, ..., N), s.t. ϕj(ξi) = δij,
the Lagrangian interpolant IN of F : B → R has the form

IN F := ∑_{j=0}^N F(ξj) ϕj(x), F ∈ C(B), (11)

with the interpolation property IN F(ξj) = F(ξj) (j = 0, 1, . . . , N).
Consider a multivariate function

f = f(x1, . . . , xd), f : B^d → R, d ≥ 2,

defined on a box B^d = B1 × B2 × . . . × Bd with Bk = B.
Define the N-th order tensor-product interpolation operator

IN : C(B^d) → PN[B^d], IN f = I_N^1 × I_N^2 × . . . × I_N^d f.
Here I_N^k f is the interpolation polynomial w.r.t. xk, at the nodes
ξjk ∈ Bk, k = 1, . . . , d,

I_N^k f(x1, ..., xk, ..., xd) = ∑_{jk=0}^N f(x1, ..., ξjk, ..., xd) ϕ_{jk}^{(k)}(xk).

The tensor-product interpolant IN in d variables reads

IN f := ∑_{j=0}^N f(ξj1, ..., ξjd) ϕ_{j1}^{(1)}(x1) ··· ϕ_{jd}^{(d)}(xd).

Our choice: the polynomial or sinc interpolants.
In the case of CGL nodes, the interpolation points ξα ∈ B^d,
α = (j1, . . . , jd) ∈ N0^d, are obtained as the Cartesian product of
1D nodes,

ξα := (cos(πj1/N), . . . , cos(πjd/N)).
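In matrix form, the d = 2 tensor-product interpolant is a two-sided application of the 1D interpolation matrix; a numpy sketch (the test function exp(x1 x2) and the sizes are illustrative):

```python
import numpy as np

def lagrange_matrix(xi, x):
    """L[i, j] = l_j(x_i): 1D Lagrange basis at nodes xi, evaluated at x."""
    L = np.ones((len(x), len(xi)))
    for j in range(len(xi)):
        for k in range(len(xi)):
            if k != j:
                L[:, j] *= (x - xi[k]) / (xi[j] - xi[k])
    return L

# I_N f = sum_{j1,j2} f(xi_j1, xi_j2) l_j1(x1) l_j2(x2) = L F L^T
N = 16
xi = np.cos(np.pi * np.arange(N + 1) / N)   # CGL nodes
f = lambda x1, x2: np.exp(x1 * x2)
F = f(xi[:, None], xi[None, :])             # nodal values of f
x = np.linspace(-1.0, 1.0, 41)
L = lagrange_matrix(xi, x)
interp_err = np.max(np.abs(L @ F @ L.T - f(x[:, None], x[None, :])))
```

The separated structure L F Lᵀ is precisely why the interpolant provides a rank-structured (Tucker-type) approximation of f.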
Error bound for tensor-product polynomial interp. B. Khoromskij, Zuerich 2010(L2) 40
Again, IN is a projection map,

IN : C(B^d) → PN := {p1 × . . . × pd : pi ∈ PN, i = 1, . . . , d},

implying stability of IN in the multidimensional case, cf. (9),

‖IN f‖∞,B^d ≤ ΛN^d ‖f‖∞,B^d ∀ f ∈ C(B^d).

To derive an analogue of Thm. 2.3, introduce the product
domain

E_ρ^{(j)} := B1 × . . . × Bj−1 × Eρ(Bj) × Bj+1 × . . . × Bd,

and denote by X−j the (d − 1)-dimensional (single-hole) set
of variables

{x1, . . . , xj−1, xj+1, . . . , xd} with xj ∈ Bj, j = 1, ..., d.
Assump. 2.1. Given f ∈ C∞(B^d), assume there is ρ > 1 s.t.
for all j = 1, . . . , d, and each fixed ξ ∈ X−j, there exists an
analytic extension fj(xj, ξ) of f(xj, ξ) to Eρ(Bj) ⊂ C w.r.t. xj,
bounded in Eρ(Bj) by a certain Mj > 0 independent of ξ.

Thm. 2.4. For f ∈ C∞(B^d), let Assump. 2.1 be satisfied.
Then the interpolation error can be estimated by

‖f − IN f‖∞,B^d ≤ ΛN^d (2Mρ(f)/(ρ − 1)) ρ^{−N}, (12)

where ΛN is the maximal Lebesgue constant of the 1D
interpolants I_N^k, and

Mρ(f) := max_{1≤j≤d} max_{x∈E_ρ^{(j)}} |fj(x, ξ)|.
Proof. Multiple use of (9), (10) and the triangle inequality
leads to

|f − IN f| ≤ |f − I_N^1 f| + |I_N^1 (f − I_N^2 × . . . × I_N^d f)|
 ≤ |f − I_N^1 f| + |I_N^1 (f − I_N^2 f)| + |I_N^1 I_N^2 (f − I_N^3 f)| + . . . + |I_N^1 × . . . × I_N^{d−1} (f − I_N^d f)|
 ≤ [(1 + ΛN) max_{x∈E_ρ^{(1)}} |f1(x, ξ)| + ΛN (1 + ΛN) max_{x∈E_ρ^{(2)}} |f2(x, ξ)| + . . . + ΛN^{d−1} (1 + ΛN) max_{x∈E_ρ^{(d)}} |fd(x, ξ)|] (2/(ρ − 1)) ρ^{−N}
 ≤ ((1 + ΛN)(ΛN^d − 1)/(ΛN − 1)) (2Mρ/(ρ − 1)) ρ^{−N}.

Hence (12) follows since for x > 1 we have (1 + x)(x^n − 1)/(x − 1) ≤ x^n.
Application to the Helmholtz kernel B. Khoromskij, Zuerich 2010(L2) 43
Are the Tucker/canonical/FTT models robust in κ ?
Construct exponentially convergent tensor decompositions of
the classical Helmholtz kernel e^{iκ‖x−y‖}/‖x − y‖, κ ∈ R, such that its real
and imaginary parts,

cos(κ‖x − y‖)/‖x − y‖ and sin(κ‖x − y‖)/‖x − y‖, x, y ∈ R^d,

are treated separately.
Goal: Separable approximation of the oscillatory potentials

f1,κ(‖x‖) := sin(κ‖x‖)/‖x‖; f2,κ(‖x‖) := 1/‖x‖ − cos(κ‖x‖)/‖x‖ = 2 sin²((κ/2)‖x‖)/‖x‖,

and the related kernel functions

f1,κ(‖x − y‖), f2,κ(‖x − y‖), 1/‖x − y‖, x, y ∈ R^d.
f1,κ(‖x‖), f2,κ(‖x‖), slice for d = 3, ‖x‖ ≤ π, κ = 1, 15 B. Khoromskij, Zuerich 2010(L2) 44
[Figure: surface plots of f1,κ(‖x‖) and f2,κ(‖x‖) on a 2D slice, for κ = 1 and κ = 15.]
Estimates on the Tucker/canonical rank of f1,κ B. Khoromskij, Zuerich 2010(L2) 45
Main result: The Tucker and canonical approximations to
f1,κ, f2,κ, allow the rank bound (see [2])
rT ≤ R ≤ Cd(| log ε| + κ).
Numerics (d = 3, moderate κ ≤ 15) agrees with the theory.
Rem. 2.1. 1/‖x‖ is proven to have a low-rank separable
approximation with rT ≤ R = O(|log ε|²), or R = O(|log ε| log n) in
the discrete case (Lect. 3, 4), (see [1]).
Thm. 2.5. For given ε > 0, the function f1,κ : [0, 2π√d]^d → R
allows a Tucker/canonical approximation, s.t.

σ(f1,κ, S) ≤ Cε with S = T_r, C_R,

with the rank estimates (r = (r, ..., r))

r ≤ R ≤ Cd(|log ε| + κ).
Proof. Set t = ‖x‖² and approximate the entire function g(t) = sin(κ√t)/√t,
t ∈ [0, 2π], by trigonometric polynomials in t up to the accuracy ε in the
max-norm. Making use of the change of variables z = cos(t), z ∈ [−1, 1],
consider the entire function f(z) = g(arccos(z)), which has maximum
value O(e^κ) on the respective Bernstein regularity ellipse of size O(1).
Applying the Chebyshev series to f(z) leads to an approximation by trig.
polynomials with O(|log ε| + κ) terms, where each trigonometric term has
the form cos(mt) = cos(m‖x‖²).
The multivariate function h(x) := cos(x1² + ... + xd²) has separation rank
R ≤ d, i.e. h ∈ Cd, in view of the rank-d representation (Lect. 1, Prop. 1.1)

cos(∑_{j=1}^d xj) = ∑_{j=1}^d sin(xj + π/(2d)) ∏_{k∈{1,...,d}\{j}} sin(xk + π/(2d) + αk − αj)/sin(αk − αj),

for any αk ∈ R, s.t. sin(αk − αj) ≠ 0 for all j ≠ k.
Now the result follows by applying the Chebyshev trigonometric
approximation taking into account that h ∈ Cd.
Estimates on the Tucker/canonical rank of f2,κ B. Khoromskij, Zuerich 2010(L2) 47
The next statement applies to the d-th order tensor representing the
kernel f2,κ projected onto piecewise constant basis functions.
Introduce the function

f0(t) := sin²((κ/2)√t)/t ≡ (f1,κ/2(√t))², f2,κ(‖x‖) = 2‖x‖ f0(t).

Thm. 2.6. Suppose RankC(‖x‖) = O(|log ε| log n); then for any d ≥ 3 and
given ε > 0, the function f2,κ : [0, 2π√d]^d → R allows
Tucker/canonical approximations, s.t.

σ(f2,κ, S) ≤ Cε with S = T_r, C_R,

with the rank estimates (r = (r, ..., r))

r ≤ R ≤ Cd |log ε| log n (|log ε| + κ).

Proof. Factorise the function

g(t) = sin²((κ/2)√t)/√t, t = ‖x‖² ∈ [0, 2π],

to obtain

g(t) = √t f0(t) with f0(t) = (f1,κ/2(√t))².
Applying to the function f0 : [0, 2π] → R the same argument
as in Theorem 2.5, we obtain its separable approximation in
the classes T_r and C_R (on the continuous level) that allows the
κ-dependent rank estimate

r ≤ R ≤ Cd(|log ε| + κ).

It remains to treat the tensor approximation of ‖x‖ = ‖x‖²/‖x‖. By
assumption, using the rank-|log ε| log n approximation of ‖x‖,
we arrive at the desired (canonical) rank estimate for the
decomposition, obtained as the Hadamard product of two
canonical tensors of ranks Cd(|log ε| + κ) and |log ε| log n,
respectively.
Complexity estimates B. Khoromskij, Zuerich 2010(L2) 49
Theorems 2.5 and 2.6 indicate linear scaling of the tensor
rank in the frequency parameter κ, leading to a remarkable
reduction of the numerical cost.
The approximation applies up to the high frequencies κ ≤ Cn.
Complexity issues:
Storage needs: dRn ≤ r^d + drn ⇔ d(R/r − 1)n ≤ r^{d−1}.
κ ≤ Cn ⇒ the Tucker model scales linearly in Nvol = n³.
If κ ≤ Cn^{2/3} ⇒ the Tucker model scales linearly in NBEM = n².
The canonical decomposition scales at most as O(n²) for any d.
Conclusion: Tensor decomposition outperforms by orders of
magnitude the “best” wavelet O(n³ log n + κ³ log κ) method
[Beylkin, et al. ’08].
FTT decomposition of functions f1,κ and f2,κ B. Khoromskij, Zuerich 2010(L2) 50
Lem. 2.1. The rank-2 FTT decomposition of f(x) := sin(Σ_{j=1}^d x_j), x ∈ R^d,
reads (see [3])

f(x) = [sin x_1, cos x_1] · [cos x_2 −sin x_2; sin x_2 cos x_2] ···
       [cos x_{d−1} −sin x_{d−1}; sin x_{d−1} cos x_{d−1}] · [cos x_d; sin x_d].

Proof. By induction, similar to Lec. 1, Exer. 3.1:

f(x) = sin x_1 cos(x_2 + ... + x_d) + cos x_1 sin(x_2 + ... + x_d)
     = [sin x_1, cos x_1] · [cos(x_2 + ... + x_d); sin(x_2 + ... + x_d)]
     = [sin x_1, cos x_1] · [cos x_2 −sin x_2; sin x_2 cos x_2] · [cos(x_3 + ... + x_d); sin(x_3 + ... + x_d)].
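Lem. 2.1 is easy to sanity-check numerically. A minimal NumPy sketch (not part of the original notes; the factorisation itself is the one stated in the lemma):

```python
import numpy as np

def sin_sum_tt(x):
    """Evaluate sin(x_1 + ... + x_d) via the rank-2 (F)TT factorisation
    of Lem. 2.1: a row vector, d-2 rotation matrices, a column vector."""
    v = np.array([np.sin(x[0]), np.cos(x[0])])           # first TT core (row)
    for xj in x[1:-1]:                                    # middle cores: 2x2 rotations
        G = np.array([[np.cos(xj), -np.sin(xj)],
                      [np.sin(xj),  np.cos(xj)]])
        v = v @ G
    return v @ np.array([np.cos(x[-1]), np.sin(x[-1])])  # last core (column)

rng = np.random.default_rng(0)
x = rng.uniform(-np.pi, np.pi, size=6)                    # d = 6
assert abs(sin_sum_tt(x) - np.sin(x.sum())) < 1e-12
```

The middle 2x2 cores are rotation matrices, so each partial product stays [sin(x_1+...+x_j), cos(x_1+...+x_j)], which is exactly the induction step of the proof.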
Thm. 2.7. For any d ≥ 3, we have Rank_FTT(f_{1,κ}(‖x‖)) ≤ C(|log ε| + κ).
Suppose that Rank_C(‖x‖) = O(|log ε| log n); then for any d ≥ 3,
Rank_FTT(f_{2,κ}(‖x‖)) ≤ C |log ε| log n (|log ε| + κ).
Applicability to 3D scattering problems B. Khoromskij, Zuerich 2010(L2) 51
Figure 1: Examples of step-type geometries, which are well suited for
tensor-product representation in 3D FEM/BEM.
Numerics B. Khoromskij, Zuerich 2010(L2) 52
Example 2.1. Figure 2 shows the convergence history for the best
orthogonal Tucker vs. canonical approximations of the Newton/Yukawa
potentials on the n × n × n grid for n = 2048.
[Two plots: relative error vs. tensor rank; left: Newton; right: Yukawa, κ = 1 (Sinc-NFFD vs. Tucker approx.).]

Figure 2: The Tucker vs. canonical approximations of the Newton/Yukawa potentials.
Numerics B. Khoromskij, Zuerich 2010(L2) 53
Example 2.2. Figure 3 shows the convergence history for the Tucker
model applied to f_{1,κ}, f_{2,κ} depending on κ ∈ [1, 15]. It clearly
indicates the relation r ∼ C + κ for different (fixed) values
ε_1 = 10⁻³ and ε_2 = 10⁻⁴.
[Two plots: Tucker rank vs. κ for f_1(|x|) and f_2(|x|) on [0, π]³, for ε = 10⁻³ and ε = 10⁻⁴.]

Figure 3: Convergence history for the Tucker model applied to f_{1,κ}, f_{2,κ},
κ ∈ [1, 15].
Literature to Lecture 2 B. Khoromskij, Zuerich 2010(L2) 54
1. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class
of Nonlocal Operators in High Dimensions. Part I. Computing 76 (2006) 177-202.
2. B.N. Khoromskij: Tensor-structured Preconditioners and Approximate Inverse of Elliptic Operators in Rd.
J. Constructive Approx. 30:599-620 (2009).
3. I.V. Oseledets: Constructive representation of functions in tensor formats. Preprint INM Moscow, 2010.
URL: http://personal-homepages.mis.mpg.de/bokh
Lect. 3. Analytic Methods of Separable Approximation in RdB. Khoromskij, Zuerich 2010(L3) 55
Outline of Lecture 3
1. Separable approximation by exponential sums.
2. Sinc approximation on (−∞,∞).
- Sampling theorem.
- Sinc quadratures and interpolation.
- Exponential convergence rate for functions in H1(Dδ).
- Improved quadratures.
3. Polynomial and exponential decay on [0,∞).
4. Sinc methods on an arc (a, b).
5. Numerical illustrations.
Setting the approximation problem B. Khoromskij, Zuerich 2010(L3) 56
Analytic methods of the Tucker/canonical tensor-product
decomposition to non-local operators and separable
approximation to multi-variate functions can be based on sinc
interpolation or quadrature.
Approximation problem: Given a multivariate function
F : Ω^d → R (d ≥ 2), approximate it by a separable expansion

F_r(ζ_1, ..., ζ_d) := Σ_{k=1}^r c_k Φ_k^{(1)}(ζ_1) ··· Φ_k^{(d)}(ζ_d) ≈ F,  Ω ∈ {R, R_+, (a, b)},

where the set of univariate functions Φ_k^{(ℓ)} : Ω → R, 1 ≤ ℓ ≤ d,
1 ≤ k ≤ r, may be fixed or chosen adaptively, and c_k ∈ R.
For numerical efficiency the separation rank r ∈ N should be
reasonably small.
Separable appr. by interpolation and quadratures B. Khoromskij, Zuerich 2010(L3) 57
I. Separation by tensor-prod. interpolation (Tucker model)
• Polynomial interpolation
• Sinc interpolation
The Tucker model applies to a class of analytic functions.
II. Approximation by exponential sums (canonical model)
• Sinc quadratures (simple direct method)
• Fitting by exponential sums Σ a_k e^{−b_k x}
(best r-term nonlinear approximation via a nontrivial iteration)
• Approximation by trigonometric sums Σ [a_k sin(b_k x) + a′_k cos(b′_k x)].
The canonical model applies well to functions depending on
a sum of single variables (say, f(x) = f(‖x‖), x ∈ R^d).
Canonical approximation via separation by integration B. Khoromskij, Zuerich 2010(L3) 58
Assume that a function of ρ = Σ_{i=1}^d x_i is given by the integral

f(ρ) = ∫_Ω G(t) e^{ρF(t)} dt,  Ω ∈ {R, R_+, (a, b)}.

If a quadrature can be applied, one obtains the separable
approximation (with weights c_ν = ω_ν G(t_ν))

f(x_1 + ... + x_d) ≈ Σ_{ν=1}^r ω_ν G(t_ν) e^{ρF(t_ν)} = Σ_{ν=1}^r c_ν Π_{i=1}^d e^{x_i F(t_ν)}.

We apply the Sinc quadratures to the Laplace transform.

Examples of f(ρ): Green's kernels and classical potentials,

f(x) = 1/(x_1 + ... + x_d), x_i ≥ 0:  1/ρ = ∫_0^∞ e^{−ρt} dt, ρ > 0;

f(x) = 1/‖x‖, x ∈ R^d:  1/ρ = (2/√π) ∫_0^∞ e^{−ρ²t²} dt.
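The separation mechanism can be sketched in a few lines of Python. The Laplace integral 1/ρ = ∫_0^∞ e^{−ρt} dt with the substitution t = e^τ and a plain trapezoidal (sinc) rule yields a sum of rank-1 terms; the step size h and truncation M below are ad hoc illustration choices, not the tuned quadratures of Lect. 3/4:

```python
import numpy as np

# Sinc (trapezoidal) quadrature for 1/rho = int_R e^{tau} exp(-rho e^{tau}) dtau,
# obtained from the Laplace integral by the substitution t = e^{tau}.
M, h = 40, 0.5                      # ad hoc choices for illustration
k = np.arange(-M, M + 1)
t = np.exp(k * h)                   # quadrature nodes t_nu
c = h * t                           # weights c_nu = h * e^{kh}

# Separable approximation of f(x) = 1/(x1+x2+x3): a sum of 2M+1 rank-1 terms.
def inv_sum(x1, x2, x3):
    return sum(cv * np.exp(-tv * x1) * np.exp(-tv * x2) * np.exp(-tv * x3)
               for cv, tv in zip(c, t))

xs = np.linspace(0.5, 3.0, 7)
X1, X2, X3 = np.meshgrid(xs, xs, xs, indexing="ij")
err = np.abs(inv_sum(X1, X2, X3) - 1.0 / (X1 + X2 + X3))
assert err.max() < 1e-6             # exponential accuracy with 81 separable terms
```

Each summand factorises over the coordinates, which is precisely the canonical (rank-(2M+1)) format of the slide.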
Separation by exponential fitting B. Khoromskij, Zuerich 2010(L3) 59
Rem. 3.1. Quadrature approximation provides a quasi-optimal
r-term approximation that can then be optimised by algebraic
methods.

The best r-term approximation of f(ρ) by exponential sums,

f(ρ) ≈ Σ_{ν=1}^r ω_ν e^{−t_ν ρ}, t_ν ∈ C, (13)

(e.g., w.r.t. the L∞- or L2-norm) leads to an approximation
whose separation rank is close to optimal.

Rem. 3.2. The approximation by exponential/trigonometric
sums also applies to the matrix-valued function f(A), with
A = Σ_{i=1}^d A_i and pairwise commuting matrices A_i.
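Rem. 3.2 can be illustrated with NumPy for the Kronecker sum A = A_1 ⊗ I + I ⊗ A_2, whose terms commute: inv(A) ≈ Σ_ν ω_ν exp(−t_ν A_1) ⊗ exp(−t_ν A_2). The quadrature nodes t_ν = e^{kh} with ad hoc M, h are an assumption for illustration:

```python
import numpy as np

def expm_sym(A, s):
    """exp(s*A) for a symmetric matrix A via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(s * w)) @ V.T

n = 8
A1 = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian stencil
A2 = A1.copy()
A = np.kron(A1, np.eye(n)) + np.kron(np.eye(n), A2)      # Kronecker sum, spd

# Exponential-sum approximation of inv(A) from 1/rho = int e^{-rho t} dt,
# applied spectrally: inv(A) ~ sum_k c_k exp(-t_k A1) (x) exp(-t_k A2).
M, h = 40, 0.5
approx = np.zeros_like(A)
for k in range(-M, M + 1):
    t = np.exp(k * h)
    approx += h * t * np.kron(expm_sym(A1, -t), expm_sym(A2, -t))

assert np.abs(approx - np.linalg.inv(A)).max() < 1e-6
```

The point is that a rank-(2M+1) sum of Kronecker products replaces the dense inverse, which is the matrix analogue of the separable expansions above.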
Big Bernstein Theorem B. Khoromskij, Zuerich 2010(L3) 60
For n ≥ 1, consider the set E⁰_n of exponential sums on R_+ = [0, ∞):

E⁰_n := { u = Σ_{ν=1}^n ω_ν e^{−t_ν x} : ω_ν, t_ν ∈ R }.

Now one can address the problem of finding the best approximation to f
over the set E⁰_n, characterised by the best approximation error

d(f, E⁰_n) := inf_{v ∈ E⁰_n} ‖f − v‖_∞.

The existence of an approximation by exponentials is due to the

Big Bernstein Theorem: If f is completely monotone for x ≥ 0, i.e.,

(−1)^n f^{(n)}(x) ≥ 0 for all n ≥ 0, x ≥ 0,

then it is the restriction of the Laplace transform of a measure to R_+:

f(z) = ∫_{R_+} e^{−tz} dμ(t).
Exponential decay of the error on [a, b] B. Khoromskij, Zuerich 2010(L3) 61
The complete elliptic integral of the first kind with modulus κ is

K(κ) = ∫_0^1 dt / √((1 − t²)(1 − κ²t²))  (0 < κ < 1),

and define K′(κ) := K(κ′) with κ² + (κ′)² = 1.

Prop. 3.1. [Braess] Assume that f is completely monotone and
analytic for ℜe z > 0, and let 0 < a < b. Then for the uniform
approximation on the interval [a, b],

lim_{n→∞} d(f, E⁰_n)^{1/n} ≤ 1/ω² < 1, with ω = exp(πK(κ)/K′(κ)), κ = a/b.

In the cases f(ρ) considered below we may assume ρ ∈ [1, R], i.e.,
κ = 1/R for 1 ≪ R.
Exponential decay of the error on [a, b] B. Khoromskij, Zuerich 2010(L3) 62
Now applying the asymptotics of the complete elliptic integrals,

K(κ′) = ln(4/κ) + C_1 κ + ...  for κ′ → 1,
K(κ) = (π/2)(1 + κ²/4 + C_1 κ⁴ + ...)  for κ → 0,

we obtain

1/ω² = exp(−2πK(κ)/K(κ′)) ≈ exp(−π²/ln(4R)) ≈ 1 − π²/ln(4R).

The latter expression indicates that the number n of terms needed
to achieve a tolerance ε > 0 is estimated by

n ≈ |log ε| / |log ω⁻²| ≈ |log ε| ln(4R) / π².

This result shows the same asymptotic convergence in n as
that for the Sinc approximation (see below and Lect. 4).
Exponential approximations in L2-norm B. Khoromskij, Zuerich 2010(L3) 63
The best approximation to f(ρ), ρ ∈ [1, R], w.r.t. a weighted
L2-norm reduces to the minimisation of an explicitly given
differentiable functional.

Given R > 1 and n ≥ 1, find the 2n parameters α_1, ω_1, ..., α_n, ω_n ∈ R
such that

F_W(R; α_1, ω_1, ..., α_n, ω_n) := ∫_1^R W(x) ( f(x) − Σ_{i=1}^n ω_i e^{−α_i x} )² dx = min.

In the important particular case f(x) = 1/x and W(x) = 1, the integral
can be calculated in closed form:

F_1(R; α_1, ω_1, ..., α_n, ω_n) = 1 − 1/R − 2 Σ_{i=1}^n ω_i [Ei(−α_i) − Ei(−α_i R)]
  + (1/2) Σ_{i=1}^n (ω_i²/α_i) [e^{−2α_i} − e^{−2α_i R}]
  + 2 Σ_{1≤i<j≤n} (ω_i ω_j/(α_i + α_j)) [e^{−(α_i+α_j)} − e^{−(α_i+α_j)R}],

with the exponential integral function Ei(x) = −∫_{−∞}^x (e^t/t) dt.
Exponential approximations in L2-norm B. Khoromskij, Zuerich 2010(L3) 64
In the case R = ∞, the expression for F_1(∞; ...) simplifies.

Gradient or Newton-type methods with a proper choice of the
initial guess can be used to obtain the minimiser of F_1.
However, the convergence of such nonlinear iterations might be
very slow.

The integral F_W may also be approximated by a suitable quadrature.

Optimisation with respect to the maximum norm leads to the
nonlinear minimisation problem

inf_{v ∈ E⁰_n} ‖f − v‖_{L∞[1,R]}

involving the 2n parameters {ω_ν, t_ν}_{ν=1}^n. The numerical scheme
can be based on the Remez algorithm for rational approximation.
Exponential approximations in L2-norm B. Khoromskij, Zuerich 2010(L3) 65
Calculations using the weighted L2([1, R])-norm have been
performed by the MATLAB subroutine FMINS based on
global minimisation by direct search.

Best approximation to 1/√ρ in the weighted L2([1, R])-norm:

        R = 10    R = 50    R = 100   R = 200   ‖·‖_L∞    W(ρ) = 1/√ρ
r = 4   3.7·10⁻⁴  9.6·10⁻⁴  1.5·10⁻³  2.2·10⁻³  1.9·10⁻³  4.8·10⁻³
r = 5   2.8·10⁻⁴  2.8·10⁻⁴  3.7·10⁻⁴  5.8·10⁻⁴  4.2·10⁻⁴  1.2·10⁻³
r = 6   8.0·10⁻⁵  9.8·10⁻⁵  1.1·10⁻⁴  1.6·10⁻⁴  9.5·10⁻⁵  3.3·10⁻⁴
r = 7   3.5·10⁻⁵  3.8·10⁻⁵  3.9·10⁻⁵  4.7·10⁻⁵  2.2·10⁻⁵  8.1·10⁻⁵

Calculations for the nearly best approximation to 1/√ρ in the
L∞-norm are presented by W. Hackbusch,
www.mis.mpg.de/scicomp/EXP_SUM/1_x/tabelle.
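A much weaker but instructive variant of this fit can be written in a few lines of Python: fix the exponents α_i on a log grid (an ad hoc assumption) and solve the then *linear* least-squares problem for the weights ω_i only, instead of the full 2n-parameter minimisation used in the lectures:

```python
import numpy as np

# Weights-only exponential-sum fit of f(x) = 1/x on [1, R]: the exponents
# alpha_i are fixed on a log grid, so only a linear least-squares problem
# for the weights omega_i remains (a sketch, not the full optimisation).
R, n = 10.0, 6
x = np.linspace(1.0, R, 400)
alpha = np.logspace(-1.5, 0.8, n)            # ad hoc exponent grid
A = np.exp(-np.outer(x, alpha))              # design matrix e^{-alpha_i x}
omega, *_ = np.linalg.lstsq(A, 1.0 / x, rcond=None)

err = np.abs(A @ omega - 1.0 / x).max()
assert err < 1e-2                            # weights-only fit is already decent
```

Optimising the exponents as well (as FMINS does) gains several more digits, at the cost of a genuinely nonlinear and often slowly converging iteration.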
Sampling Theorem, Sinc Approximation B. Khoromskij, Zuerich 2010(L3) 66
How does one discretise analog signals?

The class of functions f(t), t ∈ R, can be discretised by
recording the sample values {f(nh)}_{n∈Z} at intervals h > 0.

The sinc function (also called the cardinal function) is given by

sinc(x) := sin(πx)/(πx), with the convention sinc(0) = 1.

V.A. Kotelnikov (1933) and J. Whittaker (1935) proved a celebrated
theorem: band-limited signals can be exactly reconstructed
from their sampling values.

Thm. 3.1. (Kotelnikov, Shannon, Whittaker) If the support of f̂ is
included in [−π/h, π/h], then for t ∈ R

f(t) = Σ_{n=−∞}^∞ f(nh) S_{n,h}(t), with S_{n,h}(t) = sinc(t/h − n).
Sampling Theorem B. Khoromskij, Zuerich 2010(L3) 67
Proof. Exer. 3.1. Use properties of the Fourier transform (FT), Khoromskij [4]:

f̂(ω) := ∫_R f(t) e^{−iωt} dt (continuous Fourier transform).

Exer. 3.2. Let χ_{[−T,T]}(t) = 1 if t ∈ [−T, T] and 0 otherwise (the
characteristic/indicator/step function). Prove that (1/(2T)) χ̂ = sin(Tω)/(Tω).

[Plots: the Haar scaling function and the Sinc function.]

Figure 4: Haar (cf. f̂ for f = sinc) and Sinc scaling functions.

The sampling theorem plays an important role in tele/radio communications,
signal processing, stochastic modelling, etc.
Sampling Thm. as a decomposition in orthogonal basis B. Khoromskij, Zuerich 2010(L3) 68
Define the space U_h as the set of functions whose FTs have
support included in [−π/h, π/h].

Lem. 3.2. [Stenger] The set of functions {S_{n,h}(t)}_{n∈Z} is an
orthogonal basis of the space U_h. If f ∈ U_h then

f(nh) = (1/h) ⟨f(t), S_{n,h}(t)⟩.

Cor. 3.3. The sinc-interpolation formula of Thm. 3.1 can be
interpreted as a decomposition of f ∈ U_h in an orthogonal
basis of U_h:

f(t) = (1/h) Σ_{n=−∞}^∞ ⟨f(·), S_{n,h}(·)⟩ S_{n,h}(t).

If f ∉ U_h, one obtains the orthogonal projection of f onto U_h.
Exact Sinc-interpolation of entire functions B. Khoromskij, Zuerich 2010(L3) 69
When does the Sinc interpolant represent a function exactly?

C(f, h)(x) = Σ_{k=−∞}^∞ f(kh) S_{k,h}(x).

Def. 3.1. Let h > 0, and let W(π/h) denote the family of
entire functions s.t. ∫_R |f(t)|² dt < ∞ and, for all z ∈ C,

|f(z)| ≤ C e^{π|z|/h} with a constant C > 0.

Thm. 3.4. (Stenger) {h^{−1/2} S_{k,h}(x)}_{k∈Z} is a complete
L2(R)-orthonormal sequence in W(π/h).

Every f ∈ W(π/h) has the cardinal series representation

f(x) = C(f, h)(x), x ∈ R.
Sinc-approximation of analytic functions B. Khoromskij, Zuerich 2010(L3) 70
The interpolant C(f, h) provides a remarkably accurate approximation
on R for functions which are analytic and uniformly bounded
on the strip

D_δ := {z ∈ C : |ℑm z| ≤ δ}, 0 < δ < π/2,

such that

N(f, D_δ) := ∫_R (|f(x + iδ)| + |f(x − iδ)|) dx < ∞.

This defines the Hardy space H1(D_δ).

For f ∈ H1(D_δ) we have exponential convergence in 1/h (Stenger):

sup_{x∈R} |f(x) − C(f, h)(x)| = O(e^{−πδ/h}), h → 0. (14)
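A small Python illustration of this rate, using the truncated interpolant C_M introduced below for f(x) = 1/cosh(x), which lies in H1(D_δ) for δ < π/2 and decays like e^{−|x|} (so b = 1); the parameters M and δ are illustrative choices:

```python
import numpy as np

# Truncated sinc interpolant C_M(f,h)(x) = sum_{|k|<=M} f(kh) sinc(x/h - k)
# for f(x) = 1/cosh(x); step size h = sqrt(pi*delta/(b*M)) as in (18).
f = lambda x: 1.0 / np.cosh(x)
M, delta, b = 50, 1.4, 1.0
h = np.sqrt(np.pi * delta / (b * M))
k = np.arange(-M, M + 1)

x = np.linspace(-5.0, 5.0, 401)
CM = f(k * h) @ np.sinc(x[None, :] / h - k[:, None])   # np.sinc = sin(pi t)/(pi t)
assert np.abs(CM - f(x)).max() < 1e-4                  # ~ sqrt(M) e^{-sqrt(pi delta b M)}
```

Note that NumPy's `np.sinc` already uses the normalised convention sin(πx)/(πx) from the slide above.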
Sinc-quadratures for analytic integrand B. Khoromskij, Zuerich 2010(L3) 71
Likewise, if f ∈ H1(D_δ), the integral

I(f) = ∫_Ω f(x) dx (Ω = R or Ω = R_+)

can be approximated with exponential convergence by the
Sinc quadrature (trapezoidal rule)

T(f, h) := h Σ_{k=−∞}^∞ f(kh)  ( = ∫_R C(f, h)(x) dx ≈ I(f) ),

|I(f) − T(f, h)| = O(e^{−πδ/h}), h → 0. (15)

Analogous estimates hold for the (computable) truncated sums

C_M(f, h) := Σ_{k=−M}^M f(kh) S_{k,h}(x), T_M(f, h) := h Σ_{k=−M}^M f(kh).
Standard error estimates on R B. Khoromskij, Zuerich 2010(L3) 72
Thm. 3.5. [Stenger] If f ∈ H1(D_δ) and |f(x)| ≤ C exp(−b|x|) for
all x ∈ R, with b, C > 0, then

‖f − C_M(f, h)‖_∞ ≤ C [ (e^{−πδ/h}/(2πδ)) N(f, D_δ) + (1/(bh)) e^{−bhM} ], (16)

|I(f) − T_M(f, h)| ≤ C [ (e^{−2πδ/h}/(1 − e^{−2πδ/h})) N(f, D_δ) + (1/b) e^{−bhM} ]. (17)

Sketch of proof: The first term in the rhs of (16) represents the
approximation error (14),

‖f(x) − C(f, h)(x)‖_∞ ≤ N(f, D_δ) / (2πδ sinh(πδ/h)),

while the second one gives the truncation error

‖C(f, h)(x) − C_M(f, h)(x)‖_∞ ≤ Σ_{|k|≥M+1} |f(kh)| ≤ 2C Σ_{k=M+1}^∞ e^{−bkh} ≤ (2C/(bh)) e^{−bhM}.
Exponential convergence rate in M B. Khoromskij, Zuerich 2010(L3) 73
Similar arguments apply to (17).

For the interpolation error (16), the choice

h = √(πδ/(bM))

implies the exponential convergence rate

‖f − C_M(f, h)‖_∞ ≤ C M^{1/2} e^{−√(πδbM)}. (18)

In fact, for the chosen h the first term in the rhs of (16)
dominates, hence (18) follows. Usually we set δ = π/2.

For the quadrature error (17), the "optimal" choice

h = √(2πδ/(bM))

yields

|I(f) − T_M(f, h)| ≤ C e^{−√(2πδbM)}. (19)
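The quadrature rate (19) is just as easy to observe numerically; a sketch for ∫_R sech(x) dx = π (the values δ = 1.4 and M = 60 are illustrative choices):

```python
import numpy as np

# Sinc quadrature T_M(f,h) = h * sum_{|k|<=M} f(kh) for f(x) = 1/cosh(x),
# with the "optimal" step h = sqrt(2*pi*delta/(b*M)) from (19), where b = 1
# is the decay rate and delta = 1.4 < pi/2 the analyticity strip width.
# The exact integral is int_R sech(x) dx = pi.
M, delta, b = 60, 1.4, 1.0
h = np.sqrt(2 * np.pi * delta / (b * M))
k = np.arange(-M, M + 1)
TM = h * np.sum(1.0 / np.cosh(k * h))
assert abs(TM - np.pi) < 1e-8      # predicted rate ~ e^{-sqrt(2 pi delta b M)}
```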
Error bound in the case of double-exponential decay B. Khoromskij, Zuerich 2010(L3) 74
If f has double-exponential decay as |x| → ∞, i.e.,

|f(x)| ≤ C exp(−b e^{a|x|}) for all x ∈ R, with a, b, C > 0, (20)

the convergence rate of the Sinc interpolation and quadrature
can be improved up to O(e^{−cM/log M}) (cf. Thm. 3.5).

Thm. 3.6. (Gavrilyuk, Hackbusch, Khoromskij) Let f ∈ H1(D_δ) with
some δ < π/2, and let (20) hold. Then the choice
h = log(2πaM/b)/(aM) leads to the quadrature error

|I − T_M(f, h)| ≤ C N(f, D_δ) e^{−2πδaM/log(2πaM/b)}. (21)

The choice h = log(πaM/b)/(aM) ensures the interpolation error

‖f − C_M(f, h)‖_∞ ≤ C (N(f, D_δ)/(2πδ)) e^{−πδaM/log(πaM/b)}. (22)
Error bound in the case of double-exponential decay B. Khoromskij, Zuerich 2010(L3) 75
Proof. The bound for |I − T(f, h)| is the same as in Thm. 3.5.
For the remainder sum we use the simple estimate

Σ_{|k|>M} exp(−b e^{a|kh|}) = 2 Σ_{k=M+1}^∞ exp(−b e^{akh})
  ≤ 2 ∫_M^∞ exp(−b e^{axh}) dx ≤ (2 e^{−ahM}/(abh)) exp(−b e^{ahM}).

Hence, the quadrature error has the bound

|I − T_M(f, h)| ≤ C [ (e^{−2πδ/h}/(1 − e^{−2πδ/h})) N(f, D_δ) + (e^{−ahM}/(ab)) exp(−b e^{ahM}) ].

Now (21) follows by substitution of h.
Error bound in the case of double-exponential decay B. Khoromskij, Zuerich 2010(L3) 76
The interpolation error of C_M(f, h) satisfies

‖f − C_M(f, h)‖_∞ ≤ C [ (e^{−πδ/h}/(2πδ)) N(f, D_δ) + (e^{−ahM}/(abh)) exp(−b e^{ahM}) ].

The approximation error allows the same estimate as in the standard case.
To prove (22), we note that the truncation error bound is determined by
the decay rate of f as |x| → ∞:

‖C(f, h)(x) − C_M(f, h)(x)‖_∞ ≤ Σ_{|k|≥M+1} |f(kh)|
  ≤ 2C Σ_{k=M+1}^∞ e^{−b e^{akh}} ≤ (2C/(abh)) e^{−ahM} e^{−b e^{ahM}}.

Exer. 3.3. For the numerical approximation of the integral
∫_{−∞}^∞ exp(−x²) dx = √π, show that with the choice h = (π/M)^{1/2},

| ∫_{−∞}^∞ exp(−x²) dx − h Σ_{k=−M}^M e^{−k²h²} | ≤ C exp(−πM).

Calculate the approximation to √π for M = 4, 8, 12; the latter is
accurate to 15 digits. Calculate the sinc interpolant to exp(−λx²), λ > 0.
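The first part of Exer. 3.3 can be checked directly (Python; the assertion below reflects the claimed 15-digit accuracy at M = 12):

```python
import numpy as np

# Exer. 3.3: trapezoidal/sinc quadrature for int_R exp(-x^2) dx = sqrt(pi)
# with step h = sqrt(pi/M); the error decays like exp(-pi*M).
for M in (4, 8, 12):
    h = np.sqrt(np.pi / M)
    k = np.arange(-M, M + 1)
    TM = h * np.sum(np.exp(-(k * h) ** 2))
    print(M, abs(TM - np.sqrt(np.pi)))

h = np.sqrt(np.pi / 12)
k = np.arange(-12, 13)
assert abs(h * np.sum(np.exp(-(k * h) ** 2)) - np.sqrt(np.pi)) < 1e-13
```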
Sinc-interpolation on (a, b) via Thm. 3.5 B. Khoromskij, Zuerich 2010(L3) 77
To apply Thm. 3.5 in the case Ω = (a, b) (say, Ω = R_+), one
substitutes the variable x ∈ Ω by x = ϕ(ζ), where
ϕ : R → (a, b) is a bijection. This changes f : (a, b) → R into

f_1 := ϕ′ · (f ∘ ϕ) : R → R (quadrature case),
f_1 := f ∘ ϕ (interpolation case).

Assuming f_1 ∈ H1(D_δ), one can apply (18)-(19) to the
transformed function f_1.

Ex. 3.1. In the case of an interval (a, b):

ϕ⁻¹(z) = log[(z − a)/(b − z)], ℜe z = x.

Ex. 3.2. In the case of the semi-axis R_+ := (0, ∞):

ϕ⁻¹(z) = log[sinh(z)] or ϕ⁻¹(z) = log(z) (ϕ(ζ) = e^ζ).
Sinc quadratures on R+ (polynomial/exponential decay) B. Khoromskij, Zuerich 2010(L3) 78
Polynomial decay. Let us set Ω = R_+ and assume:

(i) f can be analytically extended from R_+ into the sector

D_δ^{(1)} = {z ∈ C : |arg(z)| < δ} for some 0 < δ < π/2

(in fact, ϕ⁻¹ : D_δ^{(1)} → D_δ is a conformal map, ϕ(ζ) = e^ζ);

(ii) f satisfies the inequality

|f(z)| ≤ c|z|^{α−1}(1 + |z|)^{−α−β} for some 0 < α, β ≤ 1 and all z ∈ D_δ^{(1)}.

Let α = 1. Choosing any M ∈ N and taking

h^{(1)} = √(2πδ/(βM)),

we define the corresponding quadrature rule

T_M^{(1)} = h^{(1)} Σ_{k=−βM}^M c_k f(z_k), z_k = e^{kh^{(1)}}, c_k = e^{kh^{(1)}},
Sinc quadratures on R+ (polynomial/exponential decay) B. Khoromskij, Zuerich 2010(L3) 79
possessing exponential convergence (C > 0 independent of M):

|I − T_M^{(1)}| ≤ C e^{−√(2πδβM)}.

[Sketches of the domains D_δ^{(1)} and D_δ^{(3)}.]

Figure 5: The analyticity sector D_δ^{(1)} (left) and the "bullet-shaped" domain D_δ^{(3)}.

Rem. 3.3. The results for polynomial decay are in sharp contrast to the
error of polynomial-based approximation of functions with algebraic
singularities. For example, for the function f(x) = x^α(1 − x)^α, α = 1/2,
the best approximation by polynomials of degree n converges with the rate Cn^{−α}.
Sinc quadratures on R+ (polynomial/exponential decay) B. Khoromskij, Zuerich 2010(L3) 80
Exponential decay. Assume that the integrand f can be
analytically extended into the "bullet-shaped" domain

D_δ^{(3)} = {z ∈ C : |arg(sinh z)| < δ}, 0 < δ < π/2,

and that f satisfies

|f(z)| ≤ C (|z|/(1 + |z|))^{α−1} e^{−βℜe z} in D_δ^{(3)}, α, β ∈ (0, 1]. (23)

Setting α = 1 and choosing h^{(2)} = h^{(1)}, c_k^{(2)} = 1 + e^{−2kh^{(2)}}
and M ∈ N, we obtain the quadrature

T_M^{(2)} = h^{(2)} Σ_{k=−βM}^M c_k^{(2)} f(z_k^{(2)}), z_k^{(2)} = log[e^{kh^{(2)}} + √(1 + e^{2kh^{(2)}})],

possessing the exponential convergence rate as above.
Numerics for the Sinc interpolation on (a, b) B. Khoromskij, Zuerich 2010(L3) 81
Ex. 3.3. Separable approximation to the function

g(x, y) = ‖x‖^λ sinc(‖x‖ ‖y‖), λ ∈ (−3, 1],

arising in the Boltzmann equation, x, y ∈ R³.

[Plots: L∞-error vs. the number M of quadrature points for |x|^s sinc(y|x|), x ∈ [−1, 1], s = 1, y = 16, 25, 36.]

Figure 6: L∞-error of the sinc-interpolation to |x|^λ sinc(|x|y), x ∈ [−1, 1], y = 16, 25, 36, λ = 1.

Rem. 3.4. The Sinc interpolant provides an exponentially convergent separable
approximation of g(x, y) for (x, y) ∈ [a, b]³ × [c, d]³.
Numerics for the Sinc interpolation on R+ B. Khoromskij, Zuerich 2010(L3) 82
Ex. 3.4. Sinc interpolation for g(x, y) = exp(−xy), x, y ≥ 0.

Consider the auxiliary function f(x, y) = (x/(1 + x)) exp(−xy), x ∈ R_+,
y ∈ [1, R], which satisfies all the conditions above with
α = β = 1 (exponential decay). With the choice of
interpolation points x_k := log[e^{kh} + √(1 + e^{2kh})] ∈ R_+, it can be
approximated with exponential convergence.

[Plots: L∞-error vs. the number M of quadrature points for |x|^s exp(−y|x|), x ∈ [−1, 1], s = 1, y = 1, 10, 100.]

Figure 7: L∞-error of the sinc-interpolation of exp(−|x|y), x ∈ [−1, 1], y = 1, 10, 100.
Numerics for the Sinc interpolation on R B. Khoromskij, Zuerich 2010(L3) 83
Ex. 3.5. Mexican hat scaling function.

[Plot of the Mexican hat.]

Figure 8: Mexican hat f(x) = (1 − x²) exp(−αx²), α > 0.

Sinc interpolation to the Mexican hat, r = M + 1:

α \ M   4     9       16      25       36       49       64       81       100
1       0.05  6·10⁻⁴  7·10⁻⁷  1·10⁻¹⁰  2·10⁻¹⁵  1·10⁻¹⁵  -        -        -
10      0.17  0.13    0.12    0.04     0.01     0.004    0.0009   1.7·10⁻⁴ 2.6·10⁻⁵
0.1     3.8   2.6     0.6     0.08     0.006    1.6·10⁻⁵ 2·10⁻⁷   2.5·10⁻⁹ 2·10⁻¹¹
Literature to Lect. 3 B. Khoromskij, Zuerich 2010(L3) 84
1. D. Braess: Nonlinear approximation theory. Springer-Verlag, Berlin, 1986.
2. I.P. Gavrilyuk, W. Hackbusch, and B.N. Khoromskij: Data-sparse approximation to a class of
operator-valued functions. Math. Comp. 74 (2005), 681-708.
3. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class
of Nonlocal Operators in High Dimensions. Part I. Computing 76 (2006), 177-202.
4. B.N. Khoromskij: An Introduction to Structured Tensor-product Representation of Discrete
Nonlocal Operators. Lecture notes 27, MPI MIS, Leipzig 2005.
5. J. Lund and K.L. Bowers: Sinc Methods for Quadrature and Differential Equations. SIAM, Philadelphia 1992.
6. F. Stenger: Numerical methods based on Sinc and analytic functions. Springer-Verlag, 1993.
http://personal-homepages.mis.mpg.de/bokh
Lect. 4. Low Rank Sinc Approximation of Green Kernels B. Khoromskij, Zuerich 2010(L4) 85
Outlook of Lecture 4
1. Tensor-product Sinc interpolation ⇒ Tucker approxim.
- Lebesque constant.
- Exponential converg. of tensor-product Sinc interpolation.
2. Separable approx. of integral operators and related kernels.
- General discussion.
- The case of chift invariant kernels. Green kernels.
3. Error analysis for basic examples, 1x1+...+xd
, 1‖x‖ ,
e−‖x‖
‖x‖ ,
x ∈ Rd.
4. Numerical illustrations.
5. Tensor product convolution in Rd. Quadrature-based
canonical decomp. of the projected Newton/Yukawa kernels.
Tensor-product interpolation revisited B. Khoromskij, Zuerich 2010(L4) 86
Given N ∈ N, a set of interpolating functions ϕ_j(x), x ∈ B := [−a, a],
and sampling points ξ_j ∈ B, s.t. ϕ_j(ξ_i) = δ_{ij} (i, j = 1, ..., N).

The Lagrangian interpolant I_N of f : B → R has the form

I_N f := Σ_{j=1}^N f(ξ_j) ϕ_j(x), f ∈ C[B], (24)

with (I_N f)(ξ_j) = f(ξ_j) (j = 1, ..., N).

Recall the tensor-product interpolant I_N in d spatial variables,

I_N f := I_N^1 × ··· × I_N^d f = Σ_{j_1,...,j_d=1}^N f(ξ_{j_1}, ..., ξ_{j_d}) ϕ_{j_1}^{(1)}(x_1) ··· ϕ_{j_d}^{(d)}(x_d),

where f : B^d → R, and I_N^ℓ f is the univariate interpolation in
x_ℓ ∈ B_ℓ (1 ≤ ℓ ≤ d), B^d = B_1 × ··· × B_d.
Sinc-interpolation of multi-variate functions B. Khoromskij, Zuerich 2010(L4) 87
Consider the separable approximation on B^d = R^d, B = R.
The extension to the case B = R_+ or B = (a, b) is straightforward.

The tensor-product Sinc interpolant C_M in d variables is

C_M f := C_M^1 × ... × C_M^d f, f : R^d → R,

where C_M^ℓ f = C_M^ℓ(f, h), 1 ≤ ℓ ≤ d, is the univariate Sinc interpolant

C_M^ℓ(f, h) = Σ_{k=−M}^M f(x_1, ..., kh, ..., x_d) S_{k,h}(x_ℓ).

Ex. 4.1. C_M f converges exponentially fast in M for

f(x) = ‖x‖^α, f(x) = e^{−κ‖x‖^γ}, f(x) = erf(‖x‖)/‖x‖, f(x, y) = ‖x‖^γ sinc(‖x‖ · ‖y‖),

x, y ∈ R^d.
Stability: Lebesgue constant of the Sinc-interpolant B. Khoromskij, Zuerich 2010(L4) 88
Error bound for the tensor-product Sinc interpolant.

The estimation of the error f − C_M f requires the Lebesgue
constant Λ_M ≥ 1 of the univariate interpolant, defined by

‖C_M(f, h)‖_∞ ≤ Λ_M ‖f‖_∞ for all f ∈ C(R). (25)

Stenger '93 proves the inequality

Λ_M := max_{x∈R} Σ_{k=−M}^M |S_{k,h}(x)| ≤ (2/π)(3 + log M). (26)

For each fixed ℓ ∈ {1, . . . , d}, choose ζ_ℓ ∈ B_ℓ and define the
"single hole" parameter set

Y_ℓ := B_1 × ... × B_{ℓ−1} × B_{ℓ+1} × ... × B_d ∈ R^{d−1}.
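Stenger's bound (26) is simple to probe numerically; since S_{k,h}(x) = sinc(x/h − k) depends only on x/h, one may take h = 1 (a Python sketch with an illustrative grid resolution):

```python
import numpy as np

# Numerical check of (26): the Lebesgue function of the truncated sinc
# interpolant, Lambda_M = max_x sum_k |S_{k,h}(x)|, depends only on x/h,
# so we may set h = 1 and sample x on a fine grid.
def lebesgue_const(M, pts_per_cell=50):
    k = np.arange(-M, M + 1)
    x = np.linspace(-M - 1, M + 1, 2 * (M + 1) * pts_per_cell + 1)
    lam = np.abs(np.sinc(x[:, None] - k[None, :])).sum(axis=1)
    return lam.max()

for M in (10, 50, 100):
    bound = (2 / np.pi) * (3 + np.log(M))
    assert 1.0 <= lebesgue_const(M) <= bound
```

The computed values grow only logarithmically in M, which is what makes the Λ_M^d factor in the d-dimensional error bound below harmless.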
Sinc-interpolation error B. Khoromskij, Zuerich 2010(L4) 89
Introduce the univariate (parameter-dependent) function

F_ℓ(·, y) : B_ℓ → R, y ∈ Y_ℓ, ℓ = 1, ..., d,

that is the restriction of f onto B_ℓ. Recall the Hardy space H1(D_δ).

Thm. 4.1. For each ℓ = 1, ..., d and any fixed y ∈ Y_ℓ, we
assume that F_ℓ(·, y) satisfies

(a) F_ℓ(·, y) ∈ H1(D_δ) with N(F_ℓ, D_δ) ≤ N_0 < ∞ uniformly in y;
(b) F_ℓ(·, y) has hyper-exponential decay with a = 1, C, b > 0.

Then the "optimal" choice h := (log M)/M yields

‖f − C_M(f, h)‖_∞ ≤ (C/(2πδ)) Λ_M^d N_0 e^{−πδM/log M} (27)

with Λ_M defined by (26).
Proof of the Sinc-interpolation error B. Khoromskij, Zuerich 2010(L4) 90
As in the case of polynomial interpolation, the multiple
use of (25) and the triangle inequality leads to

|f − C_M f| ≤ |f − C_M^1 f| + |C_M^1 (f − C_M^2 ··· C_M^d f)|
 ≤ |f − C_M^1 f| + |C_M^1 (f − C_M^2 f)| + |C_M^1 C_M^2 (f − C_M^3 f)| + ... + |C_M^1 ··· C_M^{d−1}(f − C_M^d f)|
 ≤ (1 + Λ_M)[N_1 + Λ_M N_2 + ... + Λ_M^{d−1} N_d] (1/(2πδ)) e^{−πδM/log M}
 ≤ (1 + Λ_M)(1 + Λ_M + ... + Λ_M^{d−1}) (1/(2πδ)) max_{ℓ=1,...,d} N(F_ℓ, D_δ) e^{−πδM/log M}.

Hence, bounding the prefactor by Λ_M^d (up to a constant), (27) follows.

Notice that usually δ can be chosen close to π/2. The choice
of δ affects only the error estimate, while C_M f does not
depend on δ.
Tucker approx. to integral operators (IOs) (analytic meth.) B. Khoromskij, Zuerich 2010(L4) 91
C_M f applies to the Nyström method:

(Gu)(x) := ∫_Ω g(x, y) u(y) dy ≈ Σ_k g(x_m, y_k) u(y_k), x_m, y_k ∈ Ω_h ⊂ R^d.

A separable approximation of the singular kernel works only for
coupled variables (the Φ_k^{(ℓ)}(·, ·) are bivariate functions),

g_r := Σ_{k=1}^r b_k Φ_{k_1}^{(1)}(x_1, y_1) ··· Φ_{k_d}^{(d)}(x_d, y_d) ≈ g.

For a shift-invariant singular kernel g(x, y) = g(‖x − y‖),

g(x, y) ⇒ G(ζ_1, ..., ζ_d) ≡ G(√(ζ_1² + ... + ζ_d²)),

where ζ_ℓ = |x_ℓ − y_ℓ| ∈ [0, 1], ℓ = 1, ..., d.

Now the sinc interpolation applies w.r.t. the d coupled variables
ζ_1, ..., ζ_d (only one point singularity!).
Canonical approximation to IOs (analytic meth.) B. Khoromskij, Zuerich 2010(L4) 92
Separation by integration applies to collocation and Galerkin
methods. The r-term Sinc quadrature for the Laplace
integral representation of G(ρ), ρ = ‖x − y‖², ρ ∈ [a, b]:

G(ρ) = ∫_R f(t) e^{−tρ} dt ≈ Σ_{k=1}^r c_k f(t_k) Π_{ℓ=1}^d e^{−t_k|x_ℓ−y_ℓ|²}, ρ ∈ [a, b].

For the collocation-projection, compute a rank-r tensor at x = 0:

g_i = ⟨G(ρ), φ_i⟩ ≈ Σ_{k=1}^r c_k f(t_k) Π_{ℓ=1}^d ⟨e^{−t_k|y_ℓ|²}, φ_{i_ℓ}(y_ℓ)⟩, i ∈ I.

Ex. 4.2. For the classical Green kernels, x, y ∈ R^d,

log ‖x − y‖, 1/‖x − y‖, e^{−μ‖x−y‖}/‖x − y‖ (μ ∈ R_+), e^{−iκ_2‖x−y‖}/‖x − y‖ (κ_2 ∈ R),

the Sinc method provides asymptotically optimal bounds on both
the canonical and Tucker ranks (see also Lect. 2),

r = O(log n |log ε|), r = (r, ..., r).
Initial applications: Problem classes B. Khoromskij, Zuerich 2010(L4) 93
• Tri-linear approximation to 3rd/6th-order tensors generated by
the classical Green kernels (examples below).
• "Multi-centred" potential (prototype of large molecules),
Σ_k c_k e^{−α_k‖x−x_k‖}, x ∈ R³ (examples below).
• Electron density, Hartree and exchange potentials in R³.
Solving the Hartree-Fock eq. in tensor format (Part III).
• Traditional FEM/BEM (Part III):
– elliptic inverse in R^d via 1/‖x‖², particular solutions
(convolution with Green's function), BEM on special surfaces.
– solving elliptic boundary value, spectral and
transient problems in tensor format.
• Solving stochastic PDEs in tensor format (Part III).
Basic examples B. Khoromskij, Zuerich 2010(L4) 94
Low-rank separable approximation of the multivariate functions

(a) 1/(x_1² + ... + x_d²), (b) 1/√(x_1² + ... + x_d²), (c) e^{−λ‖x‖}/‖x‖.

Ex. 4.3. In case (a), the Sinc method applies to the Laplace
integral transform

1/ρ = ∫_{R_+} e^{−ρt} dt  (ρ = x_1² + ... + x_d² ∈ [1, R], R > 1). (28)

The improved quadrature applies by using the substitutions
t = log(1 + e^u) and u = sinh(w),

1/ρ = ∫_R f_2(w) dw, with f_2(w) = (cosh(w)/(1 + e^{−sinh(w)})) e^{−ρ log(1+e^{sinh(w)})}.
Canonical rank estimate for 1/ρ B. Khoromskij, Zuerich 2010(L4) 95
A special case of Thm. 3.6 (Lect. 3).

Lem. 4.1. (Hackbusch, Khoromskij [1]) Let ρ ∈ [1, R]. Then the choice
δ = δ(R) = O(1/log(R)), a = 1, b = 1/2 in Thm. 3.6 implies the
uniform quadrature error bound, by setting h = log(4πM)/M,

|1/ρ − T_M(f_2, h)| ≲ C e^{−π²M/((C + log(R)) log(π²M))}. (29)

Sketch of proof. The function f_2(w) belongs to H1(D_δ), with
δ = O(1/log(R)), and N(f_2, D_δ) < ∞ independently of ρ.
The double-exponential decay of f_2(w) on w ∈ (−∞, ∞) is due to

f_2(w) ≈ (1/2) e^{w − (ρ/2)e^w} as w → ∞; f_2(w) ≈ (1/2) e^{|w| − (1/2)e^{|w|}} as w → −∞,

corresponding to C = 1/2, b = min{1, ρ}/2, a = 1, in Thm. 3.6.
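Lem. 4.1 can be checked directly in Python. The only care needed is overflow-safe evaluation of log(1 + e^u) and 1/(1 + e^{−u}) for |u| large, here via `np.logaddexp`:

```python
import numpy as np

# Quadrature T_M(f2,h) of Lem. 4.1 for 1/rho, rho in [1,R], h = log(4*pi*M)/M.
# f2(w) = cosh(w) * sigma(u) * exp(-rho*log(1+e^u)), u = sinh(w), with
# sigma(u) = 1/(1+e^{-u}); both factors evaluated through logaddexp.
def inv_rho_quad(rho, M=64):
    h = np.log(4 * np.pi * M) / M
    w = np.arange(-M, M + 1) * h
    u = np.sinh(w)
    log_t = np.logaddexp(0.0, u)                 # log(1 + e^u)
    log_sigma = -np.logaddexp(0.0, -u)           # log of 1/(1+e^{-u})
    return h * np.sum(np.cosh(w) * np.exp(log_sigma - rho * log_t))

for rho in (1.0, 10.0, 1000.0):
    assert abs(inv_rho_quad(rho) - 1.0 / rho) < 1e-8
```

With M = 64 the observed errors stay far below the asserted tolerance over the whole range 1 ≤ ρ ≤ 10³, in line with Figure 9 below.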
Numerics for 1/ρ B. Khoromskij, Zuerich 2010(L4) 96
In the case 1/ρ = 1/(x_1² + ... + x_d²), estimate (29) implies that an
approximation of accuracy ε > 0 is obtained with

M ≤ O(log(1/ε) · log R), (30)

provided that 1 ≤ ρ ≤ R (this can be achieved by a proper scaling).

The numerical results support the even better bound

M ≤ O(log(1/ε) + log R)

(see Figs. 9, 10).
[Plots of the quadrature error over 1 ≤ ρ ≤ 10³ for M = 16, 32, 64.]

Figure 9: The quadrature error related to (29) with 1 ≤ ρ ≤ 10³, and
M = 16 (left), M = 32 (middle), M = 64 (right).
Numerics for 1/ρ B. Khoromskij, Zuerich 2010(L4) 97
[Plots of the quadrature error over 1 ≤ ρ ≤ 18000 for M = 16, 32, 64.]

Figure 10: The quadrature error related to (29) with 1 ≤ ρ ≤ 18000, and
M = 16 (left), M = 32 (middle), M = 64 (right).

Lem. 4.1 indicates that the separation rank r = 2M + 1 depends only
logarithmically on both the tolerance ε > 0 and the upper bound R of ρ.

Important: The rank r does not depend on the dimension d.

Rem. 4.2. The choice of δ only affects the error bound, but not the
quadrature itself.
Canonical rank estimate for 1/√(x_1² + ... + x_d²) B. Khoromskij, Zuerich 2010(L4) 98

The function 1/√(x_1² + ... + x_d²) for d = 3 defines the Newton kernel.

Ex. 4.4. In the case 1/ρ = 1/√(x_1² + ... + x_d²), apply the Gauss integral

1/ρ = (2/√π) ∫_{R_+} e^{−ρ²t²} dt (ρ ∈ [1, R]). (31)

To maintain robustness in ρ, let us rewrite the Gauss integral
(31) using the substitutions t = log(1 + e^u) and u = sinh(w):

1/ρ = ∫_R f(w) dw with f(w) := cosh(w) F(sinh(w)), (32)

where

F(u) := (2/√π) e^{−ρ² log²(1+e^u)} / (1 + e^{−u}), u ∈ (−∞, ∞).
Canonical rank estimate for 1/√(x_1² + ... + x_d²) B. Khoromskij, Zuerich 2010(L4) 99

Lem. 4.2. Let δ < π/2, ρ ≥ 1. Then for the function f in (32)
we have f ∈ H1(D_δ). Moreover, Thm. 3.6 applies with a = 1.
The improved (2M + 1)-point quadrature with the choice
δ(ρ) = π/(C + log(ρ)) allows the error bound

|1/ρ − T_M(f, h)| ≤ C_1 exp(−π²M/((C + log(ρ)) log M)). (33)

Sketch of proof. It is easy to check that f is holomorphic in
D_δ and N(f, D_δ) < ∞ uniformly in ρ (with the choice δ = δ(ρ)).
Now we check the double-exponential decay of the integrand
as |w| → ∞ and then apply Thm. 3.6, with

δ = δ(ρ) = π/(C + log(ρ)).
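The quadrature of Lem. 4.2 directly yields a canonical tensor for the Newton kernel, since e^{−ρ²t²} = Π_ℓ e^{−t²x_ℓ²}. A Python sketch (the step h = log(4πM)/M is borrowed from Lem. 4.1 as an assumption, not the tuned choice here):

```python
import numpy as np

# Rank-(2M+1) canonical approximation of 1/||x|| on an n x n x n grid,
# built from the sinc quadrature for the Gauss integral (31):
# 1/rho ~ sum_k c_k prod_l exp(-t_k^2 x_l^2).
M = 50
h = np.log(4 * np.pi * M) / M
w = np.arange(-M, M + 1) * h
u = np.sinh(w)
t = np.logaddexp(0.0, u)                          # t_k = log(1 + e^{u_k})
c = h * np.cosh(w) * (2 / np.sqrt(np.pi)) * np.exp(-np.logaddexp(0.0, -u))

x = np.linspace(1.0, 4.0, 20)                     # 1D grid; rho stays in [sqrt(3), 4*sqrt(3)]
U = np.exp(-np.outer(t**2, x**2))                 # canonical factors, shape (2M+1, n)
T = np.einsum('k,ki,kj,kl->ijl', c, U, U, U)      # sum_k c_k u_k (x) u_k (x) u_k

X, Y, Z = np.meshgrid(x, x, x, indexing='ij')
exact = 1.0 / np.sqrt(X**2 + Y**2 + Z**2)
assert np.abs(T - exact).max() / exact.max() < 1e-6
```

Note the storage: 3 factor matrices of size (2M+1) × n replace the full n³ array, which is the point of the canonical format.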
Numerics for 1/√(x_1² + ... + x_d²) B. Khoromskij, Zuerich 2010(L4) 100

We apply (33) and obtain the bound (no dependence on d)

M ≤ O(log(1/ε) · log R). (34)

Fig. 11 presents numerical illustrations for the sinc quadrature with values
ρ ∈ [1, R], R ≤ 5000. We observe a very weak error increase in ρ. Similar
results were obtained in the case R > 5000, manifesting rather stable
behaviour of the quadrature error w.r.t. R.
[Plots of the quadrature error for M = 64 over ρ ∈ [1, R].]

Figure 11: The quadrature error for M = 64 with R = 200 (left), R = 1000
(middle), R = 5000 (right).
Helmholtz kernel revisited B. Khoromskij, Zuerich 2010(L4) 101
Ex. 4.5. The Helmholtz kernel in R^d can be approximated by
sinc interpolation in the Tucker format.

Given κ ∈ R, consider the Helmholtz kernel function

g(x, y) := cos(κ‖x − y‖)/‖x − y‖ = ℜe (e^{iκ‖x−y‖}/‖x − y‖) for (x, y) ∈ [1, R]^d × [1, R]^d.

The Sinc interpolation applies to the modified kernels.

For this example we have N_0(F, D_δ) = O(e^κ); hence the
separation rank for the Tucker approximation is r = 2M + 1, with
the maximal canonical rank r^{d−1}, where M ∼ κ + |log ε| log R.
This yields unsatisfactory complexity, e.g., O(κ^{d−1}n).
We do not know a good quadrature approximation.

Note: The result in Lect. 2 leads to the following bound on
the canonical rank: r = O(d(κ + |log ε| log n)).
SVD recompression B. Khoromskij, Zuerich 2010(L4) 102
[Plots: error vs. the number M of quadrature points for exp[(x²+y²)^{1/2}] − exp[y], x ∈ [0, 1] and x ∈ [0, 5], y = 0.1, with and without SVD optimisation.]

Figure 12: Rank-(r_1, ..., r_d) approximation to exp(−‖x − y‖), d = 2 (left); SVD optimisation (right).
Rank-r Tucker approx. to 1/‖x‖, d = 3, ‖x‖ ≤ 10. B. Khoromskij, Zuerich 2010(L4) 103
[Plots: errors E_FN, E_FE, E_C vs. Tucker rank, and canonical components (L = 3, r = 6) for the Newton potential, AR = 10, n = 64.]

Figure 13: Convergence history and canonical vectors for the Newton
potential on the n × n × n grid.
Rank-r Tucker approx. to exp(−‖x‖γ), d = 3, ‖x‖ ≤ 10. B. Khoromskij, Zuerich 2010(L4) 104
[Plots: Tucker-rank convergence of E_FN, E_FE, E_C for exp(−|x|^γ), γ = 0.5, 1, 1.5 (n = 64), and the corresponding canonical components for L = 3, r = 6.]
Figure 14: Canonical vectors for the Slater-type potential.
Rank-r Tucker approx. to Σ_{k=1}^{64} c_k exp(−‖x − x_k‖) B. Khoromskij, Zuerich 2010(L4) 105
[Plots: surface of the multi-centred Slater potential and Tucker-rank convergence of the relative energy-norm and relative energy errors for random perturbations of 1%, 0.1%, 0.01% (AR = 10, n = 64).]
Figure 15: Multi-centred randomly perturbed Slater potential.
Multidimensional Convolution via Tensorization B. Khoromskij, Zuerich 2010(L4) 106
Goal: Fast and accurate computation of convolution
transform in Rd,
w(x) := (f ∗ g)(x) := ∫_{R^d} f(y) g(x − y) dy, f, g ∈ L¹(R^d).
Application: Solving the Poisson equation in Rd, in
particular, the Hartree potential in quantum chemistry,
V_H(x) = ∫_{R³} ρ(y, y)/‖x − y‖ dy, x ∈ R³.
Method: The low tensor rank multi-linear collocation.
Physical prerequisites:
(a) Compute f ∗ g in some fixed box Ω = [−A, A]^d.
(b) Suppose that f is supported in Ω.
(c) f has an R-term separable representation with moderate R.
Multidimensional Convolution via Tensorization B. Khoromskij, Zuerich 2010(L4) 107
Tensor grid: Let ω_d := ω_1 × ... × ω_d be the equidistant tensor
grid of collocation points x_m in Ω, m ∈ M := {1, ..., n+1}^d,
ω_ℓ := {−A + (m−1)h : m = 1, ..., n+1} (ℓ = 1, ..., d), h = 2A/n.
Product basis: for given piecewise constant basis functions
φ_i(x) = ∏_{ℓ=1}^{d} φ_{i_ℓ}(x_ℓ), φ_{i_ℓ}(·) = φ(· + (i_ℓ − 1)h), i ∈ I := {1, ..., n}^d,
related to ω_d, let
f(y) ≈ Σ_{i∈I} f_i φ_i(y), f_i = f(P_i).
The discrete collocation scheme (cost O(n^{2d})):
f ∗ g ≈ {w_m}_{m∈M}, w_m := Σ_{i∈I} f_i ∫_{R^d} φ_i(y) g(x_m − y) dy, x_m ∈ ω_d.
The collocation coefficients tensor G = [g_i] ∈ R^I (L²-projection):
g_i = ∫_{R^d} φ_i(y) g(−y) dy, i ∈ I.
Sinc quadrature for Yukawa projection-collocation B. Khoromskij, Zuerich 2010(L4) 108
Consider a class of spherically symmetric convolving kernels g : R^d → R,
g(y) = G(ρ(y)) ≡ G(ρ) with ρ ≡ ρ(y) = y_1² + ... + y_d²,
where G : R_+ → R is represented via the generalised Laplace transform
G(ρ) = ∫_{R_+} Ĝ(τ²) e^{−ρτ²} dτ. (35)
Sinc-quadrature methods apply to the collocation coefficients tensor
G = [g_i]_{i∈I} via the rank-(2M + 1) canonical decomposition
g_i ≈ Σ_{k=−M}^{M} w_k Ĝ(τ_k²) ∏_{ℓ=1}^{d} ∫_R e^{−y_ℓ² τ_k²} φ_{i_ℓ}(y_ℓ) dy_ℓ, i ∈ I,
with suitably chosen coefficients w_k ∈ R and quadrature points τ_k ∈ R_+.
In the particular case of the Yukawa potential with κ ∈ [0, ∞), we apply the
Gauss transform (cf. (35))
G(ρ) = e^{−κ√ρ}/√ρ = (2/√π) ∫_{R_+} exp(−ρτ² − κ²/τ²) dτ, (36)
corresponding to the choice Ĝ(τ²) = (2/√π) e^{−κ²/τ²}.
Error bound for Sinc quadrature approximation B. Khoromskij, Zuerich 2010(L4) 109
Thm. 4.2. [3] For given G(ρ) in (36) with fixed κ > 0, set
w_k = h_M Ĝ(τ_k²) and τ_k = e^{t_k}, t_k = k h_M,
with h_M = C_0 log(M)/M for some C_0 > 0. Then for the rank-(2M + 1)
collocation coefficients tensor G = [g_i]_{i∈I} we have
‖ g_i − Σ_{k=−M}^{M} w_k Ĝ(τ_k²) ∏_{ℓ=1}^{d} ∫_R e^{−y_ℓ² τ_k²} φ_{i_ℓ}(y_ℓ) dy_ℓ ‖ ≤ C e^{−π²M/(C+log(M))}. (37)
Sketch of proof. Choose the analyticity domain for the integrand in (36)
as a sector S_δ := {w ∈ C : |arg(w)| < δ} with apex angle 0 < 2δ < π/2, and
then use the conformal map
ϕ^{−1} : S_δ → D_δ with w = ϕ(z) = e^z, ϕ^{−1}(w) = log(w).
Applying the change of variables τ = e^t leads to
G(ρ) = ∫_R f(t; ρ) dt with f(t; ρ) = Q(t) e^{−ρe^{2t}}, Q(t) = (2/√π) e^{t − κ²e^{−2t}}.
Error bound for Sinc quadrature approximation B. Khoromskij, Zuerich 2010(L4) 110
f can be analytically extended into the strip D_δ. By definition,
g_i = 〈G(ρ), φ_i〉 = ∫_R 〈f(t; ρ), φ_i〉 dt ≡ ∫_R p_i(t) dt, p_i(t) = Q(t) ∏_{ℓ=1}^{d} ∫_R e^{−y_ℓ² e^{2t}} φ_{i_ℓ}(y_ℓ) dy_ℓ.
The rank-1 (separable in i) function p_i : R → R can be analytically
extended into the strip D_δ with 0 < δ < π/4, and this extension belongs to
the Hardy space H¹(D_δ). In fact, using the error function erf : R → R,
erf(t) := (2/√π) ∫_0^t e^{−τ²} dτ,
we calculate the explicit representation
∫_R e^{−y² e^{2t}} φ_i(y) dy = (√π/(2e^t)) { erf(e^t ih) − erf(e^t (i−1)h) }, (38)
with h = 2A/n (uniform grid spacing) for i = 1, ..., n. Since erf(z)/z is an
entire function, one proves the required analyticity of p_i.
To estimate N(p_i, D_δ), we let H_i = h(i_1 − 1, ..., i_d − 1)^T ∈ R^d to obtain
∫_{R^d} e^{−w²|y|²} φ(y + H_i) dy = ∫_{R^d} e^{−w²|v − H_i|²} φ(v) dv,
taking into account that φ has compact support [−h, h]^d.
Error bound for Sinc quadrature approximation B. Khoromskij, Zuerich 2010(L4) 111
Notice that |Q(ζ e^{iδ})| ≤ C_0 < ∞ for ζ ∈ [0, ∞), leading to the bound
N(p_i, D_δ) = ∫_{∂S_δ} |p_i(w)| |dw|
= ∫_{∂S_δ} ∫_{R^d} |Q(w)| |e^{−w²|y|²} φ(y + H_i)| dy |dw|
≤ 2 ∫_{R_+} ∫_{R^d} |Q(ζe^{iδ})| |e^{−ζ² exp(2iδ)|u − H_i|²} φ(u)| du dζ
≤ 2C_0 ∫_{R^d} ∫_{R_+} |e^{−ζ² exp(2iδ)|u − H_i|²}| dζ |φ(u)| du
= 2C_0 ∫_{R^d} ∫_{R_+} e^{−ζ² cos(2δ)|u − H_i|²} dζ |φ(u)| du
= (C_0√π/√cos(2δ)) ∫_{R^d} (|φ(u)|/|u − H_i|) du.
The latter term is uniformly bounded in H_i.
Error bound for Sinc quadrature approximation B. Khoromskij, Zuerich 2010(L4) 112
The asymptotics of the integrand p_i(t) := p_i(t; κ) on the real axis,
p_i(t; κ) ≈ e^{t − κ²e^{2t}} as t → +∞,
p_i(t; κ) ≈ e^{t − κ²e^{2|t|}} as t → −∞,
correspond to a = 2, b = κ², and C = 1 for the double-exponential decay.
Finally, we apply Thm. 3.6, providing the exponential convergence in M
of the rank-r sinc-quadrature approximation as in (37), with r = 2M + 1,
g_i = ∫_R p_i(t; κ) dt ≈ h_M Σ_{k=−M}^{M} p_i(t_k; κ).
Rem. Sinc quadrature approximation of the Galerkin tensor for Newton
kernel is analysed in [2].
Ex. 4.6. Laplace transform for the Slater function:
G(ρ) = e^{−2√(αρ)} = (√α/√π) ∫_{R_+} τ^{−3/2} exp(−α/τ − ρτ) dτ.
Numerics for tensor-product convolution B. Khoromskij, Zuerich 2010(L4) 113
Ex. 4.7. n = 200: naive collocation − n⁶ ∼ 2·10¹²;
FFT3 − n³ log n ∼ 10⁷; canonical-canonical tensor format − 3nR₁R₂.
The next table shows the advantage of the fast tensor-product convolution
method compared with the one based on FFT3.
CPU time (sec.) for high-accuracy computation of the Hartree potential
for the H2O molecule, calculated in MATLAB on a Sun Fire X4600 computer
with a 2.6 GHz processor, [4].
CPU times for the FFT3 scheme with n ≥ 1024 are obtained by extrapolation.

n³      128³   256³   512³    1024³    2048³   4096³   8192³   16384³
FFT3    4.3    55.4   582.8   ∼ 6000   –       –       –       ∼ 2 years
C ∗ C   1.0    3.1    5.3     21.9     43.7    127.1   368.6   700.2
Literature to Lect. 4 B. Khoromskij, Zuerich 2010(L4) 114
1. W. Hackbusch and B.N. Khoromskij: Low-rank Kronecker product approximation to multi-dimensional
nonlocal operators. Parts I/II. Computing 76 (2006) 177-202/203-225.
2. B.N. Khoromskij: Structured Rank-(r1, ..., rd) Decomposition of Function-related Tensors in Rd.
Comp. Meth. in Appl. Math., V. 6 (2006), 194-220.
3. B.N. Khoromskij: On Tensor Approximation of Green Iterations for Kohn-Sham Equations. Computing
and Visualization in Sci., 11 (2008), 259-271.
4. B.N. Khoromskij and V. Khoromskaia: Low-rank Tucker Tensor Approximation to Classical Potentials.
Central European J. of Math., 5(3) 2007, 1-28.
http://personal-homepages.mis.mpg.de/bokh
Introduction to Tensor Numerical Methods II B. Khoromskij, Zuerich 2010(L5) 115
Everything is more simple than one thinks
but at the same time more complex than one can understand.
J.W. von Goethe (1749-1832)
Introduction to Tensor Numerical Methods in
Scientific Computing (Part II. Tensor Approximation and Multilinear Algebra)
Boris N. Khoromskij
http://personal-homepages.mis.mpg.de/bokh
University/ETH Zuerich, Pro∗Doc Program, WS 2010
Part II: Outlook B. Khoromskij, Zuerich 2010(L5) 116
Part II (Lect. 5-11). Basic Tensor formats. Algebraic Methods of
Tensor Approximation and Multilinear Algebra (MLA).
Motivating examples of modern applications. Low rank and H-matrices.
Truncated SVD, ACA. FFT and circulant convolution.
Canonical, Tucker and mixed tensor formats. On Strassen algorithm.
Unfolding of a tensor and contracted product. Basic MLA operations
on rank structured tensors. High order SVD (HOSVD), quasioptimality.
Multilinear Kronecker product of matrices, basic properties.
Approximation by rank-1 tensors.
Tucker/canonical approximation by ALS iteration. Multigrid accelerated
tensor approximation. Reduced HOSVD and fast canonical-to-Tucker
transform.
Tensor representation of matrix-valued functions.
Truncated iteration.
Tensor convolution revisited and other bilinear operations.
Lect. 5. From low to higher dimensions B. Khoromskij, Zuerich 2010(L5) 117
Outline of Lecture 5.
1. Wide range applications in Rd.
2. d = 2: Main properties of rank-R matrices. Approximation
by low rank matrices.
3. Truncated SVD, reduced SVD, and adaptive cross
approximation (ACA).
4. H-matrices in dimension ≤ 3: advantages and limitations.
5. FFT, FFTd, and circulant convolution.
6. A paradigm of super-computing (does not relax the curse
of dimension).
Problem classes in RdB. Khoromskij, Zuerich 2010(L5) 118
Elliptic (parameter-dependent) eq.: find u ∈ H¹₀(Ω), s.t.
Hu := −div (A grad u) + V u = F in Ω ⊂ R^d.
EVP: find a pair (λ, u) ∈ R × H¹₀(Ω), s.t. 〈u, u〉 = 1, and
Hu = λu in Ω ⊂ R^d,
u = 0 on ∂Ω.
Parabolic equations: find u : R^d × (0, ∞) → R, s.t.
u(x, 0) ∈ H²(R^d) : σ ∂u/∂t + Hu = 0, H = ∆_d + V(x_1, ..., x_d).
Specific features:
⊲ High spatial dimension: Ω = (−b, b)^d ⊂ R^d (d = 2, 3, ..., 100, ...).
⊲ Multiparametric eq.: A(y, x), u(y, x), y ∈ R^M (M = 1, 2, ..., 100, ..., ∞).
⊲ Nonlinear, nonlocal (integral) operator V = V(x, u), singular potentials.
General examples B. Khoromskij, Zuerich 2010(L5) 119
• Fast Poisson solver, preconditioning ⇒ (−∆ + I)−1.
• Convolution transform in Rd with Green’s function for
d-Laplacian (d ≥ 3),
f(x) = ∫_{R^d} ρ(y)/‖x − y‖^{d−2} dy, x ∈ R^d.
O(dn logn)-algorithms and numerics in electronic
structure calculations.
• Parabolic equations (heat transfer, molecular dynamics):
∂u/∂t + Au = f ⇒ exp(−tA), Cayley transform (I + A)/(I − A).
• Linear algebra, complexity theory (Strassen’s algorithm
by tensor decomposition).
• Matrix product states (MPS) and DMRG-type methods
for slightly entangled systems (electronic structure,
molecular dynamics).
Many-particle models B. Khoromskij, Zuerich 2010(L5) 120
Objectives in Many-particle models.
• The electronic Schrödinger eq. for a many-particle system in R^d,
HΨ = ΛΨ
with the Hamiltonian H = H[r_1, ..., r_{Ne}],
H := −(1/2) Σ_{i=1}^{Ne} ∆_i − Σ_{a=1}^{K} Σ_{i=1}^{Ne} Z_a/|r_i − R_a| + Σ_{i<j≤Ne} 1/|r_i − r_j| + Σ_{a<b≤K} Z_a Z_b/|R_a − R_b|,
where Z_a, R_a are the charges and positions of the nuclei, r_i ∈ R³.
Hence the problem is posed in R^d with high dimension d = 3Ne,
where Ne is the (large) number of electrons.
Desired size of the system is Ne = O(10q), q = 1, 2, 3, 4, ...?
Proteins: q = 3, 4.
Molecular dynamics, electronic structure calculation for small
molecules: q = 1, 2.
Many-particle models B. Khoromskij, Zuerich 2010(L5) 121
• Hartree-Fock equation
[ −(1/2)∆ − V_c(x) + ∫_{R³} ρ(y, y)/‖x − y‖ dy ] φ(x) − (1/2) ∫_{R³} ρ(x, y)/‖x − y‖ φ(y) dy = λφ(x),
where
ρ(x, y) = Σ_{i=1}^{Ne/2} φ_i(x) φ_i(y) is the electron density matrix,
e^{−μ‖x‖} is the density function of the hydrogen atom, 1/‖x‖ the Newton potential,
V_c the external potential with singularities at the centres of atoms.
Tensor approximation scheme and numerics: Lect. 10 - 11.
• Kohn-Sham equation (simplified Hartree-Fock eq.)
[ −(1/2)∆ − V_c(x) + ∫_{R³} ρ(y)/‖x − y‖ dy − αV_ρ(x) ] ψ = λψ, V_ρ(x) = { (3/π) ρ(x) }^{1/3}.
• Poisson-Boltzmann eq. (the electrostatic potential of proteins)
∇ · [ε(x)∇φ(x)] − ε(x)h(x)² sinh[φ(x)] + 4πρ(x)/kT = 0, x ∈ R³.
If ε(x) = ε_0, h(x) = h, ρ(x) = δ(x), then φ(x) = e^{−h‖x‖}/‖x‖.
Parametric Elliptic Problems: Stochastic PDEs B. Khoromskij, Zuerich 2010(L5) 122
Find u_M ∈ L²(Γ) × H¹₀(D), s.t.
A u_M(y, x) = f(x) in D, ∀y ∈ Γ,
u_M(y, x) = 0 on ∂D, ∀y ∈ Γ,
A := −div (a_M(y, x) grad), f ∈ L²(D), D ⊂ R^d, d = 1, 2, 3,
a_M(y, x) smooth in x ∈ D, y = (y_1, ..., y_M) ∈ Γ := [−1, 1]^M, M ≤ ∞.
Additive case (via the truncated Karhunen-Loève expansion):
a_M(y, x) := a_0(x) + Σ_{m=1}^{M} a_m(x) y_m, a_m ∈ L^∞(D), M → ∞.
Log-additive case:
a_M(y, x) := exp(a_0(x) + Σ_{m=1}^{M} a_m(x) y_m) > 0.
Computing the truncated Karhunen-Loève expansion.
Analysis of best N-term approximations.
Tensor representation of stochastic-Galerkin and collocation matrices.
Tensor truncated preconditioned iteration.
“Low dimensional” methods as building blocks B. Khoromskij, Zuerich 2010(L5) 123
In low dimensions (d = 1, 2, 3) the goal is O(N)-methods.
Main principles: making use of hierarchical structures,
low-rank pattern and recursive algorithms.
Basic numerical methods for d = 1, 2, 3:
Finite element/Finite difference methods (FE/FD).
Classical Fourier (1768-1830) methods: FFT in O(N logN) op.,
FFT-based circulant convolution, Toeplitz, Hankel matrices.
Multigrid principle: O(N) - elliptic problem solvers.
Spectrally equivalent preconditioners.
Numerical linear algebra.
Low rank matrix approximation. SVD-based algorithms.
Matrix SVD B. Khoromskij, Zuerich 2010(L5) 124
Lem. 5.1. (matrix SVD). Every real (complex) τ × σ-matrix
M can be represented as the product
M = U^{(1)} · S · U^{(2)T} = S ×_1 U^{(1)} ×_2 U^{(2)},
in which
1. U^{(1)} = [U^{(1)}_1 U^{(1)}_2 ... U^{(1)}_{I_1}] is a unitary τ × τ-matrix,
2. U^{(2)} = [U^{(2)}_1 U^{(2)}_2 ... U^{(2)}_{I_2}] is a unitary σ × σ-matrix,
3. S is a τ × σ-matrix (core tensor) with the properties of
(i) pseudodiagonality: S = diag{σ_1, σ_2, ..., σ_{min(τ,σ)}},
(ii) ordering: σ_1 ≥ σ_2 ≥ ... ≥ σ_{min(τ,σ)} ≥ 0.
The σ_i are the singular values of M, and the vectors U^{(1)}_i and U^{(2)}_i
are, resp., the i-th left and i-th right singular vectors.
Low rank matrices B. Khoromskij, Zuerich 2010(L5) 125
The class of matrices in R^{τ×σ} of rank ≤ k is called the class of
R_k-matrices, i.e. rank(M) ≤ k for M ∈ R_k.
Each M ∈ R_k can be represented in the form
M = A · B^T, A ∈ R^{τ×k}, B ∈ R^{σ×k}. (39)
Lem. 5.2. Attractive features of R_k-matrices:
1. The set R_k is closed (a nontrivial result in linear algebra).
2. Only k(τ + σ) numbers are required to store an R_k-matrix.
3. The matrix-vector multiplication x ↦ y := Mx, x ∈ R^σ,
can be done in two steps:
y′ := B^T x ∈ R^k, and y := Ay′ ∈ R^τ.
The corresponding cost is 2k(σ + τ).
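The two-step multiplication of item 3 can be sketched as follows (a Python stand-in for the MATLAB exercises; sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
tau, sigma, k = 300, 200, 5
A = rng.standard_normal((tau, k))   # M = A B^T is an R_k-matrix
B = rng.standard_normal((sigma, k))
x = rng.standard_normal(sigma)

y_prime = B.T @ x                   # step 1: cost ~ k*sigma
y = A @ y_prime                     # step 2: cost ~ k*tau, total ~ 2k(sigma+tau)

y_full = (A @ B.T) @ x              # naive: forms the tau x sigma matrix first
assert np.allclose(y, y_full)
```

The factored product never forms the τ × σ matrix, which is the whole point of the R_k format.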
Low rank matrices B. Khoromskij, Zuerich 2010(L5) 126
4. The sum of two R_k-matrices R_1 = A_1 B_1^T, R_2 = A_2 B_2^T is an
R_{2k}-matrix,
R_1 + R_2 = [A_1|A_2][B_1|B_2]^T, [A_1|A_2] ∈ R^{τ×2k}, [B_1|B_2] ∈ R^{σ×2k}.
5. The multiplication of R ∈ R_k by an arbitrary matrix M of
the proper size gives again an R_k-matrix:
RM = A(M^T B)^T, MR = (MA)B^T.
6. The best approximation of an arbitrary matrix M ∈ R^{τ×σ}
by an R_k-matrix M_k, say in the Frobenius norm
‖A‖²_F := Σ_{(i,j)∈τ×σ} a²_{ij},
can be calculated by the truncated SVD (a discrete version of
the Schmidt decomposition).
Truncated SVD B. Khoromskij, Zuerich 2010(L5) 127
Alg. 5.1. (Truncated SVD). For given k ∈ N, let M = UΣV^T
be the SVD of M, i.e., Σ = diag{σ_1, ..., σ_k, ..., σ_n} with
σ_1 ≥ σ_2 ≥ ... ≥ σ_n ≥ 0, and U = [U_1, ..., U_k, U_{k+1}, ..., U_n],
V = [V_1, ..., V_k, V_{k+1}, ..., V_n] being unitary.
Set Σ_k := diag{σ_1, ..., σ_k, 0, ..., 0}; then
M_k := UΣ_k V^T ≈ M,
and
‖M_k − M‖_F ≤ √( Σ_{j=k+1}^{n} σ_j² ).
The complexity of the truncated SVD is O(τσ²) with τ ≥ σ,
too expensive for large τ and σ.
Is it possible to compute an almost best rank-k matrix
approximation while getting rid of the full matrix SVD? – Yes.
Reduced truncated SVD B. Khoromskij, Zuerich 2010(L5) 128
If M ∈ R_m, then its best approximation M_k ∈ R_k, k < m, can be computed
by the following QR-SVD scheme.
Alg. 5.2. (Reduced truncated SVD). Given M = AB^T ∈ R_m:
(i) Calculate the QR decompositions A = Q_A R_A and B = Q_B R_B, with
unitary matrices Q_A ∈ R^{τ×m}, Q_B ∈ R^{σ×m}, and upper triangular
matrices R_A, R_B ∈ R^{m×m}.
(ii) Calculate an SVD, R_A R_B^T = UΣV^T (with cost O(m³)).
(iii) Define M_k = A_k B_k^T with A_k := Q_A U_k Σ_k ∈ R^{τ×k} and
B_k := Q_B V_k ∈ R^{σ×k}, where U_k := [U_1, ..., U_k], V_k := [V_1, ..., V_k] (in both
cases, the first k columns) and the truncated matrix Σ_k of Σ are defined by
the truncated SVD of R_A R_B^T = UΣV^T.
Alg. 5.2 can be implemented in O(m²(τ + σ) + m³) operations.
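The three steps of Alg. 5.2 translate directly into code (Python sketch; the function name is illustrative). Note that the full τ × σ matrix M is formed below only to check the result:

```python
import numpy as np

def reduced_truncated_svd(A, B, k):
    """Best rank-k approximation M_k = A_k B_k^T of M = A B^T (Alg. 5.2),
    without forming M: QR of the factors, then a small m x m SVD."""
    QA, RA = np.linalg.qr(A)             # (i) A = QA RA, QA: tau x m
    QB, RB = np.linalg.qr(B)             #     B = QB RB, QB: sigma x m
    U, s, Vt = np.linalg.svd(RA @ RB.T)  # (ii) m x m SVD, cost O(m^3)
    Ak = QA @ U[:, :k] * s[:k]           # (iii) A_k = QA U_k Sigma_k
    Bk = QB @ Vt[:k, :].T                #       B_k = QB V_k
    return Ak, Bk

rng = np.random.default_rng(2)
tau, sigma, m, k = 500, 400, 8, 3
A = rng.standard_normal((tau, m))
B = rng.standard_normal((sigma, m))
Ak, Bk = reduced_truncated_svd(A, B, k)

M = A @ B.T                              # check against the full truncated SVD
s_full = np.linalg.svd(M, compute_uv=False)
best = np.sqrt(np.sum(s_full[k:] ** 2))
err = np.linalg.norm(M - Ak @ Bk.T, 'fro')
```

The factored scheme reproduces the error of the full truncated SVD while the expensive SVD is only m × m.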
Exer. 5.1. Compute the rank-r, r = 2M + 1, sinc-quadrature
approximation of the Hilbert matrix A = [a_{ij}], a_{ij} = 1/(i + j) (i, j = 1, ..., n),
for n = 10³, 10⁴, and M = 64. Apply to the result the best low-rank
approximation via the reduced truncated SVD of Alg. 5.2 (cf. Exercise 2.2).
Adaptive cross approximation (ACA) B. Khoromskij, Zuerich 2010(L5) 129
In FEM/BEM applications, nearly best (suboptimal) rank-k approximation
over partial data can be computed by the heuristic method called adaptive
cross approximation (ACA), cf. [3], [6].
Many matrix decomposition algorithms can be represented as a sequence
of rank-one Wedderburn updates.
J. H. M. Wedderburn: Lectures on Matrices. Colloquium Publications, vol. XVII, AMS, NY, 1934.
For a given m × n matrix A and vectors x, y of appropriate sizes, s.t.
x^T A y ≠ 0, the matrix
B = A − (A y x^T A)/(x^T A y)
has rank(B) = rank(A) − 1. For a rank-r matrix A_0 = A, after r updates of the form
A_k = A_{k−1} − (A_{k−1} y_k x_k^T A_{k−1})/(x_k^T A_{k−1} y_k), with x_k^T A_{k−1} y_k ≠ 0,
the matrix A_r becomes zero, leading to a rank-r decomposition of A.
Adaptive cross approximation (ACA) B. Khoromskij, Zuerich 2010(L5) 130
The ACA algorithm is a special case of Wedderburn updates based on
a “max-element” pivoting strategy, cf. [3], [6].
The idea of the ACA algorithm is as follows.
Starting from R_0 = A ∈ R^{m×n}, find a nonzero pivot in R_k, say (i_k, j_k),
and subtract a scaled outer product of the i_k-th row and the j_k-th column:
R_{k+1} := R_k − (1/(R_k)_{i_k j_k}) u_k v_k^T, with u_k = (R_k)_{1:m, j_k}, v_k = (R_k)_{i_k, 1:n},
where we use the notation (R_k)_{i_k, 1:n} and (R_k)_{1:m, j_k} for the i_k-th row and
the j_k-th column of R_k, respectively.
Here j_k is chosen as the maximum element in modulus of the i_k-th row,
i.e.,
|(R_{k−1})_{i_k j_k}| = max_{j=1,...,n} |(R_{k−1})_{i_k j}|.
The choice of i_k is similar.
The matrix S_r := Σ_{k=1}^{r} u_k v_k^T is used as the rank-r approximation
of A = S_r + R_r, since rank(S_r) ≤ r.
Apply the reduced truncated SVD to S_r for rank optimization.
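A minimal Python sketch of the cross approximation follows. For transparency it uses global (full) pivoting over the entire residual, so it forms A explicitly; the practical ACA of [3], [6] works on partial data with row/column pivoting only:

```python
import numpy as np

def aca(A, r):
    """Rank-r cross approximation S_r = sum_k u_k v_k^T of A by
    rank-one Wedderburn updates with a max-element pivot
    (an illustrative full-pivoting sketch, not the partial-data ACA)."""
    R = A.copy()
    U, V = [], []
    for _ in range(r):
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)  # pivot (i_k, j_k)
        if R[i, j] == 0:
            break
        u = R[:, j] / R[i, j]      # scaled j_k-th column
        v = R[i, :].copy()         # i_k-th row
        R = R - np.outer(u, v)     # Wedderburn rank-one update
        U.append(u)
        V.append(v)
    return np.array(U).T, np.array(V).T

# the Hilbert-type kernel 1/(i+j) is numerically low-rank
n = 100
idx = np.arange(1, n + 1)
A = 1.0 / (idx[:, None] + idx[None, :])
U, V = aca(A, 12)
rel_err = np.linalg.norm(A - U @ V.T) / np.linalg.norm(A)
```

Twelve cross steps already reproduce the 100 × 100 Hilbert-type matrix to high relative accuracy, reflecting the exponential decay of its singular values.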
H-matrix format: brief survey B. Khoromskij, Zuerich 2010(L5) 131
H- and H2-matrix technique is a direct descendant of panel
clustering, fast multipole and mosaic-skeleton approximations.
In addition, it allows data-sparse matrix-matrix operations.
MH,k(TI×I ,P), the class of data-sparse hierarchical
H-matrices - Hackbusch, Khoromskij, Bebendorf, Borm, Grasedyck, Sauter (’99 - ’05).
The construction of H-matrices defined on the product index
set I × I, is based on the following ingredients:
• An H-tree T (I) of the index set I (hierarchical cluster
tree).
• The admissible partitioning P of I × I based on a block
cluster tree T (I × I).
• Low rank approximation of all large enough blocks in P.
H-matrix format: brief survey B. Khoromskij, Zuerich 2010(L5) 132
Def. 5.1. For an admissible partitioning P and k ∈ N, define
the set M_{H,k}(I × I, P) ⊂ R^{I×I} of (real) H-matrices by
M_{H,k}(I × I, P) := {M ∈ R^{I×I} : rank(M|_b) ≤ k for all b ∈ P}. (40)
M|_b = (m_{ij})_{(i,j)∈b} denotes the matrix block of M = (m_{ij})_{i,j∈I}
corresponding to b ∈ P.
The matrices from M_{H,k}(I × I, P) are implemented by means
of the list {M|_b : b ∈ P} of matrix blocks, where each M|_b
(b = τ × σ with τ, σ ∈ T(I)) is represented by the rank-k matrix
Σ_{ν=1}^{k} a_ν b_ν^T with vectors a_ν ∈ R^τ, b_ν ∈ R^σ.
The number k is called the local rank.
Examples of hierarchical partitioning B. Khoromskij, Zuerich 2010(L5) 133
Hierarchical Partitionings P1/2(I × I) and PW (I × I)
Figure 16: Standard- (left) and Weak-admissible H-partitionings for d =
1.
Main properties of the H-matrix format B. Khoromskij, Zuerich 2010(L5) 134
Thm. 5.1. (complexity of the H-matrix arithmetic)
For k ∈ N and an H-tree T_{I×I} of depth L > 1, the arithmetic of
N × N-matrices in M_{H,k}(T_{I×I}, P) has the complexity
N_{H,store} ≤ 2C_sp kLN, N_{H·v} ≤ 4C_sp kLN,
N_{H⊕H} ≤ C_sp k²N(C_1 L + C_2 k),
N_{H⊙H} ≤ C_0 C²_sp k²LN max{k, L}, N_{Inv(H)} ≤ C N_{H⊙H},
where C_sp = C_sp(d) is the sparsity constant.
Typically: C_sp(1) ≈ 3, C_sp(2) ≈ 25, C_sp(3) ≈ 150.
The H-matrix format is well suited for representation of
integral (nonlocal) operators in BEM applications (d = 2, 3).
The hierarchical LU decomposition applies in FEM.
Limitations: Not applicable in high dimensions, d > 3.
Fast Fourier Transform B. Khoromskij, Zuerich 2010(L5) 135
Let S_N be the space of sequences {f[n]}_{0≤n<N} of period N.
S_N is a Euclidean space with the scalar product
〈f, g〉 = Σ_{n=0}^{N−1} f[n] g*[n].
Thm. 5.2. The family {e_k[n] = exp(2iπkn/N)}_{0≤k<N} is an
orthogonal basis of S_N with ‖e_k‖² = N. Any f ∈ S_N can be
represented by
f = Σ_{k=0}^{N−1} (〈f, e_k〉/‖e_k‖²) e_k. (41)
Def. 5.2. The discrete Fourier transform (DFT) of f is
f̂[k] := 〈f, e_k〉 = Σ_{n=0}^{N−1} f[n] exp(−2iπkn/N) (N² complex multiplications).
Due to (41), the inverse DFT is given by
f[n] := (1/N) Σ_{k=0}^{N−1} f̂[k] exp(2iπkn/N).
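Def. 5.2 and the inversion formula can be checked directly against a library FFT (Python sketch; `numpy.fft.fft` uses exactly the sign convention of Def. 5.2):

```python
import numpy as np

N = 16
n = np.arange(N)
f = np.exp(1j * 0.3 * n) + n / N              # an arbitrary signal of period N

# DFT by the definition: f_hat[k] = <f, e_k> = sum_n f[n] exp(-2i pi k n / N)
E = np.exp(-2j * np.pi * np.outer(n, n) / N)  # the N x N Fourier matrix
f_hat = E @ f                                 # N^2 complex multiplications

# inverse DFT from (41): f[n] = (1/N) sum_k f_hat[k] exp(+2i pi k n / N)
f_rec = E.conj() @ f_hat / N
```

Both `f_hat == np.fft.fft(f)` and `f_rec == f` hold up to rounding.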
FFT: Matrix representation B. Khoromskij, Zuerich(L5) 136
The Fourier-transform matrix F_N = {f_{k,n}}_{k,n=1}^{N} is given by
f_{k,n} := exp(−2iπkn/N) = W^{−nk}, W = e^{2iπ/N}.
The DFT(N) can be calculated by the fast Fourier transform
(FFT) in N_{FFT}(N) = C_F N log₂ N operations, C_F ≈ 4.
The FFT traces back (1805) to Gauss (1777 - 1855).
The first computer implementation is due to Cooley/Tukey (1965).
The inverse FFT of f can be derived from the forward FFT
of its complex conjugate f* due to
f*[n] := (1/N) Σ_{k=0}^{N−1} f̂*[k] exp(−2iπkn/N).
Discrete convolution B. Khoromskij, Zuerich(L5) 137
Let g be the discrete convolution of two signals f, h supported
only on the indices 0 ≤ n ≤ M − 1,
g[n] = (f ∗ h)[n] = Σ_{k=−∞}^{∞} f[k] h[n − k].
The naive implementation requires M(M + 1) operations.
It can be represented as a matrix-vector product (MVP)
with the Toeplitz matrix
T = {h[n − k]}_{0≤n,k<M} ∈ R^{M×M}, g = Tf.
Extending f and h by M samples via
h[M] = 0, h[2M − i] = h[i], i = 1, ..., M − 1,
f[n] = 0, n = M, ..., 2M − 1,
we reduce the problem to an MVP with a circulant matrix
C ∈ R^{2M×2M} specified by its first row h ∈ R^{2M}.
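The zero-padding trick above is exactly what FFT-based convolution does in practice: after padding to length 2M, the circular convolution (diagonalized by the FFT) coincides with the linear one. A Python sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
M = 50
f = rng.standard_normal(M)
h = rng.standard_normal(M)

# naive O(M^2) convolution: g[n] = sum_k f[k] h[n-k], n = 0..M-1
g_direct = np.convolve(f, h)[:M]

# pad both signals to length 2M and use the FFT: O(M log M)
F = np.fft.fft(f, 2 * M)              # fft(x, n) zero-pads x to length n
H = np.fft.fft(h, 2 * M)
g_fft = np.fft.ifft(F * H).real[:M]   # first M entries of the circular result
```

Both routes give the same g, but the FFT route costs O(M log M) instead of O(M²).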
Circulant convolution by FFT B. Khoromskij, Zuerich(L5) 138
An n × n Toeplitz matrix C is called circulant if it has the form

C = circ{c_1, ..., c_n} :=
[ c_1  c_2  ...  c_n
  c_n  c_1  ...  c_{n−1}
  ...        ...
  c_2  ...  c_n  c_1 ],  c_i ∈ C.

The set of all n × n circulant matrices is closed with respect
to addition and multiplication by a constant.
Any circulant matrix C is associated with the polynomial
p_c(z) := c_1 + c_2 z + ... + c_n z^{n−1}, z ∈ C.
Circulant convolution by FFT B. Khoromskij, Zuerich(L5) 139
The matrix C has a diagonal representation in the Fourier basis,
C = F_n^T Λ_c F_n
with
Λ_c = diag{p_c(1), ..., p_c(ω^{n−1})}, ω = e^{2iπ/n}.
The eigenvector corresponding to the eigenvalue p_c(ω^{j−1}) is
given by the j-th column of F_n, i.e.,
ω⃗_j = (1/√n) {ω^{(k−1)(j−1)}}_{k=1}^{n}.
The matrix-vector product with C costs 2C_F n log₂ n + O(n) op.
The multi-dimensional FFT can be performed by a tensorization
process with linear-logarithmic cost O(N log₂ N), N = n^d.
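The eigenpair statement is easy to verify numerically (Python sketch; the matrix is built row by row from its first row c):

```python
import numpy as np

n = 6
rng = np.random.default_rng(4)
c = rng.standard_normal(n)                       # first row (c_1, ..., c_n)
C = np.array([[c[(j - i) % n] for j in range(n)] for i in range(n)])

omega = np.exp(2j * np.pi / n)                   # primitive n-th root of unity
for j in range(n):
    lam = np.sum(c * omega ** (np.arange(n) * j))    # p_c(omega^j)
    v = omega ** (np.arange(n) * j) / np.sqrt(n)     # (j+1)-th Fourier column
    assert np.allclose(C @ v, lam * v)               # eigenpair check
```

Each Fourier column is an exact eigenvector of C with eigenvalue p_c(ω^j), which is what makes the FFT-based MVP work.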
Huge problems: tensor methods beat super-computers B. Khoromskij, Zuerich 2010(L5) 140
⊲ The algebraic operations on high-dimensional data require
heavy computing.
⊲ Linear cost O(N), N = nd is satisfactory only for small d.
⊲ Traditional ”asymptotically optimal” methods suffer from
the “curse of dimensionality”
⊲ Complexity of matrix operations in full arithmetics: O(N3).
It is large already for d = 3, i.e., N = n3 ⇒ N3 = n9.
⊲ A paradigm of up-to-date numerical simulations:
The higher computer capacities do not relax the curse of
dimensionality.
⊲ Remedy: The identification and efficient using of low rank
tensor structured representations with linear scaling in d.
Literature to Lecture 5 B. Khoromskij, Zuerich 2010(L5) 141
1. G.H. Golub and C.F. Van Loan: Matrix computations. 3rd ed., The Johns Hopkins University Press,
Baltimore, 1996.
2. W. Hackbusch: Hierarchische Matrizen - Algorithmen und Analysis. Springer, 2009.
3. M. Bebendorf: Hierarchical Matrices. Springer, 2008.
4. W. Hackbusch and B.N. Khoromskij: A Sparse H-matrix Arithmetic. Part II: Application to
Multi-Dimensional Problems. Computing 64 (2000), 21-47.
5. W. Hackbusch, B.N. Khoromskij and S. Sauter: On H2-Matrices. In: Lectures on Appl. Math.
(H.-J. Bungartz et al. eds.) Springer, Berlin, 2000, 9-30.
6. B.N. Khoromskij: Data-Sparse Approximation of Integral Operators. Lecture notes 17, MPI MIS,
Leipzig 2003, 1-61.
7. E. Tyrtyshnikov: Incomplete cross approximation in the mosaic-skeleton method.
Computing 64 (2000), 367-380.
http://personal-homepages.mis.mpg.de/bokh
Lect. 6. Basic rank structured tensor formats B. Khoromskij, Zurich 2010(L6) 142
Outline of Lecture 6.
0. FFT and circulant convolution.
1. Tensor product of finite dimensional Hilbert spaces
(multidimensional vectors).
2. Matrix unfolding and contracted product of tensors.
3. Tensor rank and canonical representation.
4. Rank decomposition can be useful in linear algebra:
O(nlog2 7)- Strassen algorithm of matrix multiplication.
5. Orthogonal Tucker and mixed Tucker-canonical models.
6. Linear and multilinear operations on “formatted tensors”.
7. Toward best (nonlinear) approximation in basic tensor
formats.
Tensor product of finite dimensional Hilbert spaces B. Khoromskij, Zurich 2010(L6) 143
Let H = H_1 ⊗ ... ⊗ H_d be a tensor product Hilbert space (TPHS),
where H_ℓ is a real Euclidean space of vectors,
H_ℓ = R^{n_ℓ}, n_ℓ ∈ N, n_ℓ := dim H_ℓ, ℓ = 1, ..., d.
The scalar product of rank-1 elements W, V ∈ H is given by
〈W, V〉 = 〈w^{(1)} ⊗ ... ⊗ w^{(d)}, v^{(1)} ⊗ ... ⊗ v^{(d)}〉 = ∏_{ℓ=1}^{d} 〈w^{(ℓ)}, v^{(ℓ)}〉_{H_ℓ}, (42)
W(i_1, ..., i_d) = ∏_{ℓ=1}^{d} w^{(ℓ)}(i_ℓ), Stor(W) = n_1 + ... + n_d ≪ ∏_{ℓ=1}^{d} n_ℓ.
Choose a basis {φ^{(ℓ)}_k : 1 ≤ k ≤ n_ℓ} of H_ℓ; then the set
{φ^{(1)}_{k_1} ⊗ φ^{(2)}_{k_2} ⊗ ... ⊗ φ^{(d)}_{k_d}} (1 ≤ k_ℓ ≤ n_ℓ, 1 ≤ ℓ ≤ d) is a basis of H.
Denote the d-fold tensor product H = H ⊗ ... ⊗ H by H^{⊗d} (= R^{I^d}).
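Identity (42) reduces one huge scalar product to d small ones, and is easy to check numerically (Python sketch for d = 3):

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 3, 4
w = [rng.standard_normal(n) for _ in range(d)]
v = [rng.standard_normal(n) for _ in range(d)]

# full rank-1 tensors W(i1,i2,i3) = w1(i1) w2(i2) w3(i3), same for V
W = np.einsum('i,j,k->ijk', *w)
V = np.einsum('i,j,k->ijk', *v)

lhs = np.sum(W * V)                                  # <W, V> in R^{n^d}
rhs = np.prod([wl @ vl for wl, vl in zip(w, v)])     # product of 1D scalar products
```

`lhs` and `rhs` agree, with the right-hand side costing only O(dn) instead of O(n^d).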
Tensor product of finite dimensional Hilbert spaces B. Khoromskij, Zurich 2010(L6) 144
Rem. 6.1. A d-th order tensor A ∈ H of size n = (n_1, ..., n_d) is a
function of d discrete arguments (a multi-dimensional
array/vector over I := I_1 × ... × I_d, I_ℓ = {1, ..., n_ℓ}), i.e.,
A : I_1 × ... × I_d → R, with dim(H) = |n| = n_1 ··· n_d.
Notation for the coordinate representation of A:
A := [a_{i_1...i_d}] = [A(i_1, ..., i_d)] ∈ R^I.
The Euclidean scalar product of tensors A, B ∈ H becomes
〈A, B〉 := Σ_{(i_1,...,i_d)∈I} a_{i_1...i_d} b_{i_1...i_d},
inducing the Euclidean (Frobenius) norm ‖A‖_F := √〈A, A〉.
The dimension directions ℓ = 1, ..., d are called the modes.
A tensor is a union of its ℓ-mode fibers A(i_1, ..., i_{ℓ−1}, : , i_{ℓ+1}, ..., i_d).
Vectorization of a tensor B. Khoromskij Zurich 2010(L6) 145
For a matrix A ∈ R^{m×n} we use the vector representation
(vectorization or concatenation) A → vec(A) ∈ R^{mn}, where
vec(A) is an mn × 1 vector obtained by “stacking” A's columns
(the FORTRAN-style ordering)
vec(A) := [a_{11}, ..., a_{m1}, a_{12}, ..., a_{mn}]^T.
In this way, vec(A) is a rearranged version of A.
Def. 6.1. In general, if A ∈ R^{I_1×...×I_d} is a tensor, then the
vectorization of A is recursively defined by
vec(A) = [ vec([A(i_1, ..., i_{d−1}, 1)]) ; vec([A(i_1, ..., i_{d−1}, 2)]) ; ... ; vec([A(i_1, ..., i_{d−1}, n_d)]) ] ∈ R^{|n|×1}.
The tensor element A(i_1, ..., i_d) maps to the vector entry (j, 1),
where j = 1 + Σ_{k=1}^{d} (i_k − 1) ∏_{ℓ=1}^{k−1} n_ℓ.
Matrix unfolding of a tensor B. Khoromskij Zurich 2010(L6) 146
Unfolding of a tensor into a matrix (matricization) is a way to
map a high-order tensor into a two-fold array by rearranging
(reshaping) it for some ℓ ∈ {1, ..., d}, R^I ↦ R^{I_ℓ × I_{(−ℓ)}}, and then
vectorizing the tensors in R^{I_{(−ℓ)}} for each i_ℓ ∈ I_ℓ. The single-hole
index set is defined by I_{(−ℓ)} := I_1 × ... × I_{ℓ−1} × I_{ℓ+1} × ... × I_d.
Def. 6.2. The unfolding mat(A) of a tensor A ∈ R^{I_1×...×I_d}
w.r.t. the index ℓ (along mode ℓ) is defined by a matrix
mat(A) := A_{(ℓ)} of dimension n_ℓ × n̄_ℓ, so that the tensor element
A(i_1, ..., i_d) maps to the matrix element v(i_ℓ, j), i_ℓ ∈ I_ℓ, where
A_{(ℓ)} = [v_{i_ℓ j}], with j ∈ {1, ..., n̄_ℓ}, n̄_ℓ = n_1 ··· n_{ℓ−1} n_{ℓ+1} ··· n_d,
j = 1 + Σ_{k=1, k≠ℓ}^{d} (i_k − 1) J_k, J_k = ∏_{m=1, m≠ℓ}^{k−1} n_m.
Exer. 6.1. (mat(A) by recursion over vec(A)). Derive the representation
mat(A) = [vec([A(i_1, ..., i_{ℓ−1}, 1, i_{ℓ+1}, ..., i_d)]), ..., vec([A(i_1, ..., i_{ℓ−1}, n_ℓ, i_{ℓ+1}, ..., i_d)])]^T.
Example of matrix unfolding of a tensor B. Khoromskij Zurich 2010(L6) 147
Rem. 6.2. Kolmogorow's decomposition is a particular way
of unfolding a multivariate function into a
“one-dimensional” representation (univariate functions).
Ex. 6.1. Define a tensor A ∈ R^{3×2×3} by
a_{111} = a_{112} = a_{211} = −a_{212} = 1,
a_{213} = a_{311} = a_{313} = a_{121} = a_{122} = a_{221} = −a_{222} = 2,
a_{223} = a_{321} = a_{323} = 4, a_{113} = a_{312} = a_{123} = a_{322} = 0.
The matrix unfolding A_{(1)} ∈ R^{3×6} is given by

A_{(1)} =
[ 1   1   0   2   2   0
  1  −1   2   2  −2   4
  2   0   2   4   0   4 ].
Visualization of matrix unfolding of a tensor B. Khoromskij Zurich 2010(L6) 148
Figure 17: Visualization of the matrix unfolding for d = 3.
ℓ-rank of a tensor. Contracted product of tensors B. Khoromskij, Zurich 2010(L6) 149
Def. 6.3. The ℓ-rank of A (ℓ = 1, ..., d), denoted by
Rℓ = rankℓ(A), is the dimension of the vector space spanned
by the ℓ-mode vectors (fibers).
The ℓ-mode fibers of A are the column vectors of the matrix
unfolding A(ℓ) (by definition).
Prop. 6.1. We have
rankℓ(A) = rank(A(ℓ)).
The major difference with the matrix case, however, is the
fact that the different ℓ-ranks of a higher-order tensor are not
necessarily the same.
An important tensor-tensor operation is the contracted
product of two tensors. In the following we use a
tensor-matrix contracted product along mode ℓ.
Contracted product of tensors B. Khoromskij Zurich 2010(L6) 150
Def. 6.4. Given V ∈ R^{I_1×...×I_d} and a matrix M ∈ R^{J_ℓ×I_ℓ},
define the mode-ℓ tensor-matrix contracted product by
U = V ×_ℓ M ∈ R^{I_1×...×I_{ℓ−1}×J_ℓ×I_{ℓ+1}×...×I_d},
where
u_{i_1,...,i_{ℓ−1},j_ℓ,i_{ℓ+1},...,i_d} = Σ_{i_ℓ=1}^{n_ℓ} v_{i_1,...,i_{ℓ−1},i_ℓ,i_{ℓ+1},...,i_d} m_{j_ℓ,i_ℓ}, j_ℓ ∈ J_ℓ.
This is a generalization of matrix-matrix multiplication:
M_{(n,m)} ×_2 M_{(p,m)} = M_{(n,m)} M_{(p,m)}^T → M_{(n,p)}.
Figure 18: Contracted product of a third-order tensor with a matrix.
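The mode-ℓ product of Def. 6.4 maps onto `tensordot` in a Python sketch (followed by an axis permutation to put the new index back into slot ℓ):

```python
import numpy as np

rng = np.random.default_rng(5)
V = rng.standard_normal((4, 5, 6))
M = rng.standard_normal((3, 5))          # J_2 x I_2: contracts mode 2

# U = V x_2 M : sum over i_2 of V[i1, i2, i3] * M[j2, i2]
U = np.tensordot(V, M, axes=([1], [1]))  # result axes: (i1, i3, j2)
U = np.moveaxis(U, -1, 1)                # move j2 back into slot 2

# element-wise check against the definition
u_check = sum(V[1, i2, 4] * M[2, i2] for i2 in range(5))
```

The result has shape (4, 3, 6), and `U[1, 2, 4]` equals the explicit sum `u_check`.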
Rank-1 tensors and canonical format B. Khoromskij, Zurich 2010(L6) 151
Rem. 6.3. A d-th order tensor A has rank 1, rank(A) = 1, if it
is the outer product of d vectors t^{(1)}, ..., t^{(d)}, t^{(ℓ)} ∈ R^{I_ℓ}:
A = t^{(1)} ⊗ t^{(2)} ⊗ ... ⊗ t^{(d)}, a_{i_1...i_d} = t^{(1)}_{i_1} ··· t^{(d)}_{i_d},
for i_ℓ ∈ I_ℓ (ℓ = 1, ..., d).
Ex. 6.2. Let A = a_1 ⊗ a_2, B = b_1 ⊗ b_2, a_i, b_i ∈ R^n (d = 2). Then
〈A, B〉 = 〈a_1, b_1〉〈a_2, b_2〉, ‖A‖_F = √(〈a_1, a_1〉〈a_2, a_2〉).
Def. 6.5. (Canonical format). Choose the subset of those
elements which require only R terms. They form the set
C_R = { w ∈ H : w = Σ_{k=1}^{R} w^{(1)}_k ⊗ w^{(2)}_k ⊗ ... ⊗ w^{(d)}_k, w^{(ℓ)}_k ∈ H_ℓ }.
Elements w ∈ C_R with w ∉ C_{R−1} are said to have tensor rank R.
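A canonical tensor is assembled from its factor vectors in one `einsum` call; the storage count dRn vs. n^d is the whole appeal of the format (Python sketch, d = 3):

```python
import numpy as np

rng = np.random.default_rng(7)
d, n, R = 3, 50, 4
# the vectors w_k^{(l)} stored as columns of d factor matrices
factors = [rng.standard_normal((n, R)) for _ in range(d)]

# w = sum_{k=1}^{R} w_k^{(1)} (x) w_k^{(2)} (x) w_k^{(3)}
W = np.einsum('ik,jk,lk->ijl', *factors)

storage_canonical = d * R * n    # 600 numbers
storage_full = n ** d            # 125000 numbers
```

Here 600 numbers encode a 125000-entry tensor; for the same n the full storage grows as n^d while the canonical storage stays linear in d.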
Pro and contra for canonical format B. Khoromskij, Zurich 2010(L6) 152
Tensors w ∈ C_R can be represented by the description of Rd
elements w^{(ℓ)}_k ∈ H_ℓ, i.e. with cost linear in d: dRn.
Advantages: tremendous reduction of the storage cost,
removing d from the exponent, n^d → dRn;
analytic methods of low-rank approximation for Green
kernels.
Limitations: C_R is a nonclosed set; the approximation process in
C_R is not robust.
Visualization of the canonical model for d = 3.
Strassen algorithm via rank decomposition B. Khoromskij, Zurich 2010(L6) 153
Finding the tensor rank can be a useful concept even in
classical linear algebra.
Historical remarks on the Strassen algorithm for fast
matrix-matrix multiplication of complexity O(n^{log₂ 7}):
An O(n^{2+ε}) algorithm to multiply two n × n matrices gives an O(n^{2+ε})
method for solving a system of n linear eqs. [Strassen 1969].
Best known result: O(n^{2.376}) [Coppersmith-Winograd 1987].
Lloyd N. Trefethen bets Peter Alfeld (25 June 1985) that a
method will have been found to solve Ax = b in O(n^{2+ε})
operations for any ε > 0 (numerical stability is not an issue).
Details at the personal homepage of Prof. L.N. Trefethen (Univ. of
Oxford).
Strassen algorithm via rank decomposition B. Khoromskij, Zurich 2010(L6) 154
In the block form

[ C₁ C₂ ; C₃ C₄ ] = [ A₁ A₂ ; A₃ A₄ ] · [ B₁ B₂ ; B₃ B₄ ]

with

C_k = ∑_{i=1}^{4} ∑_{j=1}^{4} γ_{ijk} A_i B_j,  k = 1, ..., 4,

where for the 3rd-order coefficient tensor γ of size 4 × 4 × 4 we have (slicewise)

γ_{ij1} = [1 0 0 0; 0 0 1 0; 0 0 0 0; 0 0 0 0],
γ_{ij2} = [0 1 0 0; 0 0 0 1; 0 0 0 0; 0 0 0 0],
γ_{ij3} = [0 0 0 0; 0 0 0 0; 1 0 0 0; 0 0 1 0],
γ_{ij4} = [0 0 0 0; 0 0 0 0; 0 1 0 0; 0 0 0 1].

Here the kth matrix collects the coefficients γ_{ijk} of slice number k ≤ 4 (rows indexed by i, columns by j).
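As a sanity check (a NumPy sketch, not part of the original notes), contracting the slices above with the blocks A_i, B_j reproduces the ordinary block matrix product:

```python
import numpy as np

# gamma[i, j, k] = 1 iff C_k contains the term A_i B_j (slices from the text).
gamma = np.zeros((4, 4, 4))
for (i, j, k) in [(0,0,0),(1,2,0),(0,1,1),(1,3,1),(2,0,2),(3,2,2),(2,1,3),(3,3,3)]:
    gamma[i, j, k] = 1.0

rng = np.random.default_rng(2)
m = 3                                # block size n/2
Ab = rng.standard_normal((4, m, m))  # blocks A_1..A_4, row-major: [A1 A2; A3 A4]
Bb = rng.standard_normal((4, m, m))

# C_k = sum_{i,j} gamma_ijk A_i B_j
Cb = np.einsum('ijk,iab,jbc->kac', gamma, Ab, Bb)

# Compare with the full 2m x 2m product.
A = np.block([[Ab[0], Ab[1]], [Ab[2], Ab[3]]])
B = np.block([[Bb[0], Bb[1]], [Bb[2], Bb[3]]])
C = np.block([[Cb[0], Cb[1]], [Cb[2], Cb[3]]])
assert np.allclose(C, A @ B)
```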
Strassen algorithm via rank decomposition B. Khoromskij, Zurich 2010(L6) 155
Suppose that we have the rank-R expansion

γ_{ijk} = ∑_{t=1}^{R} u_{it} v_{jt} w_{kt}.

Then

C_k = ∑_{t=1}^{R} w_{kt} ∑_{i=1}^{4} ∑_{j=1}^{4} u_{it} A_i v_{jt} B_j = ∑_{t=1}^{R} w_{kt} ( ∑_{i=1}^{4} u_{it} A_i ) ( ∑_{j=1}^{4} v_{jt} B_j ).

Precompute Σ_t = ∑_{i=1}^{4} u_{it} A_i and ∆_t = ∑_{j=1}^{4} v_{jt} B_j, and reduce the initial task to R
matrix-matrix products of size n/2 × n/2.

We have R ≤ 8, but there are representations (infinitely many) of rank 7
(Strassen's result).
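One such rank-7 representation can be written out explicitly. The following NumPy sketch performs one recursion level with 7 block products, using the standard Strassen combination (the combination formulas are the classical ones, not taken from the notes):

```python
import numpy as np

def strassen_step(A, B):
    """One level of Strassen: 7 block multiplications instead of 8."""
    m = A.shape[0] // 2
    A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
    M1 = (A11 + A22) @ (B11 + B22)
    M2 = (A21 + A22) @ B11
    M3 = A11 @ (B12 - B22)
    M4 = A22 @ (B21 - B11)
    M5 = (A11 + A12) @ B22
    M6 = (A21 - A11) @ (B11 + B12)
    M7 = (A12 - A22) @ (B21 + B22)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4,           M1 - M2 + M3 + M6]])

rng = np.random.default_rng(3)
A, B = rng.standard_normal((2, 8, 8))
assert np.allclose(strassen_step(A, B), A @ B)
```

Applying the step recursively to the 7 subproblems gives the O(n^{log₂ 7}) complexity.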
Open problem: Is it possible to construct rank decompositions with
R < 7? If yes, then the Strassen result can be improved.
Exer. 6.2. Try to compute the canonical rank-7 decomposition of γ by
the Tensor Toolbox.
Orthogonal separable representation B. Khoromskij, Zurich 2010(L6) 156
As in the Galerkin method, the replacement of Hℓ by
subspaces Vℓ ⊂ Hℓ (1 ≤ ℓ ≤ d) leads to the tensor subspace
V = V1 ⊗ V2 ⊗ . . .⊗ Vd ⊂ H.
Setting rℓ := dim Vℓ and choosing an orthonormal basis
{φ_k^{(ℓ)} : 1 ≤ k ≤ rℓ} of Vℓ, we can represent each v ∈ V by

v = ∑_k b_k φ_{k₁}^{(1)} ⊗ φ_{k₂}^{(2)} ⊗ ... ⊗ φ_{k_d}^{(d)},  with b = [b_k] ∈ R^{J₁×...×J_d},

and with the multi-index k = (k₁, ..., k_d), 1 ≤ kℓ ≤ rℓ, where
Jℓ := {1, ..., rℓ} (1 ≤ ℓ ≤ d).
Let r = (r1, . . . , rd) ∈ Nd be a d-tuple of dimensions.
Exer. 6.3. Show that the maximal canonical rank in V is R = (∏_{ℓ=1}^{d} rℓ)/max_ℓ rℓ.
Orthogonal rank-r representation (Tucker format) B. Khoromskij, Zurich 2010(L6) 157
Def 6.6. (Tucker format) Given r, define

T_r := { v ∈ V ⊂ H for some subspaces Vℓ with dim Vℓ = rℓ, ℓ = 1, ..., d }.

A representation of w ∈ T_r is called a Tucker format of rank r
(cf. [1], [3], [4]).

Denote by U^{(ℓ)} = [φ_1^{(ℓ)}, ..., φ_{rℓ}^{(ℓ)}] ∈ R^{nℓ×rℓ} the ℓ-mode side matrix.

Def. 6.7. We say that U^{(ℓ)} ∈ S_{rℓ}, where S_{rℓ} is the Stiefel
manifold of orthogonal nℓ × rℓ matrices.
The Tucker representation is not unique (rotation of U (ℓ)).
For ease of presentation, set n = nℓ (ℓ = 1, ..., d).

Storage of w ∈ T_r: ∏_{ℓ=1}^{d} rℓ reals plus the ∑_{ℓ=1}^{d} rℓ basis
vectors φ_k^{(ℓ)} ∈ R^n, i.e., O(r^d + drn), r = max rℓ (curse of dimension).
Orthogonal rank-r representation (Tucker format) B. Khoromskij, Zurich 2010(L6) 158
Remark to Def. 6.6. Using the (orthogonal) side matrices

U^{(ℓ)} = [φ_1^{(ℓ)} ... φ_{rℓ}^{(ℓ)}] ∈ R^{n×rℓ},

we represent the Tucker decomposition of V ∈ T_r as a
tensor-by-matrix contracted product,

V = β ×₁ U^{(1)} ×₂ U^{(2)} ... ×_d U^{(d)},

where β ∈ R^{J₁×...×J_d} is the core tensor of “small” size
r₁ × ... × r_d.

Rem. 6.4. In the case d = 2, the above representation is a
multilinear equivalent of a matrix factorisation, i.e., we have

A = β ×₁ U^{(1)} ×₂ U^{(2)} = U^{(1)} · β · U^{(2)T},  β ∈ R^{r₁×r₂}.
Tucker orthogonality meets the canonical sparsity B. Khoromskij, Zurich 2010(L6) 159
Visualization of the Tucker model for d = 3: the full tensor A of size I₁ × I₂ × I₃ equals the core B of size r₁ × r₂ × r₃ contracted with the side matrices V^{(1)}, V^{(2)}, V^{(3)} (figure omitted).
How to relax the drawbacks of both T_r and C_R?

Main idea: a two-level tensor format that inherits the Tucker
orthogonality in the primal space (robust decomposition) and the C_R
structure in the dual (coefficients) space (linear scaling in d, n, R, r).
Two-level Tucker-canonical model B. Khoromskij, Zurich 2010(L6) 160
Def. 6.8. Mixed Tucker-canonical model (T C_{R,r}), ([2]).
Given the rank parameters r, R, define a subclass T C_{R,r} ⊂ T_{r,n}
of tensors with core β ∈ C_{R,r} ⊂ R^{J₁×...×J_d},

V = ( ∑_{ν=1}^{R} β_ν u_ν^{(1)} ⊗ ... ⊗ u_ν^{(d)} ) ×₁ V^{(1)} ×₂ V^{(2)} ... ×_d V^{(d)}.

Storage: S(V) = dRr + R + drn (linear scaling in d, n, R, r).
Level I: Tucker decomposition (left). Level II: canonical decomposition of β (right). (Figure omitted: the Tucker core B of size r₁ × r₂ × r₃ with side matrices V^{(1)}, V^{(2)}, V^{(3)}, and its canonical expansion β = b₁ U₁^{(1)} ⊗ U₁^{(2)} ⊗ U₁^{(3)} + ... + b_R U_R^{(1)} ⊗ U_R^{(2)} ⊗ U_R^{(3)}.)
Exer. 6.4. Compute the mixed decomposition of the functional
tensor for f_{1,κ}; is it much faster than CP? (cf. Lect. 2).
Nonlinear approximation in tensor format B. Khoromskij, Zurich 2010(L6) 161
Exer. 6.5. Compute the canonical, Tucker and ℓ-mode ε-ranks of the
Hilbert tensor A = [a_{ijk}], a_{ijk} = 1/(i + j + k) (i, j, k = 1, ..., n) with n = 10²,
corresponding to the approximation error ε = 10⁻³, 10⁻⁴, 10⁻⁵. Do you observe
exponential convergence in r_ε?
Probl. 1. Efficient and accurate MLA in fixed tensor classes
S, getting rid of the curse of dimensionality.

Probl. 2. Best rank-structured approximation of a high-order
tensor f ∈ V_n in a fixed set S ∈ {T_r, C_R, T C_{R,r}}.

Probl. 3. For fixed accuracy ε > 0, efficient approximation of
a high-order tensor f ∈ V_n in S with adaptive rank parameter.

Since both T_r and C_R are not linear spaces, we arrive at a
nontrivial nonlinear approximation problem: given X ∈ V_n (more generally, X ∈ S₀ ⊂ V_n), find

T_r(X) := argmin_{A∈S} ‖X − A‖,  where S ∈ {T_r, C_R, T C_{R,r}}.   (43)
Nonlinear approximation in tensor format B. Khoromskij, Zurich 2010(L6) 162
Recall that the decomposition

f(x) := sin( ∑_{j=1}^{d} x_j ) = ∑_{j=1}^{d} sin(x_j) ∏_{k∈{1,...,d}∖{j}} sin(x_k + α_k − α_j)/sin(α_k − α_j)   (44)

holds for any α_k ∈ R such that sin(α_k − α_j) ≠ 0 for all j ≠ k.

(44) shows the lack of uniqueness (ambiguity) of the best
rank-d tensor representation. The convergence of
minimisation schemes in C_R might be non-robust (multiple
local minima).
Exer. 6.6. Prove that the tensor related to f(x) has the
maximal Tucker rank 2. Check it by Tensor Toolbox.
Next discussion: How to solve (43) efficiently?
Main steps: MLA on tensors + high-order extension(s) of
truncated SVD + nonlinear iteration + multigrid.
Literature to Lecture 6 B. Khoromskij, Zurich 2010(L6) 163
1. L. De Lathauwer, B. De Moor, J. Vandewalle: On the best rank-1 and rank-(R1, ..., RN )
approximation of higher-order tensors. SIAM J. Matrix Anal. Appl., 21 (2000) 1324-1342.
2. B.N. Khoromskij: Structured Rank-(r1, ..., rd) Decomposition of Function-related Tensors in Rd.
Comp. Meth. in Appl. Math., V. 6 (2006), 194-220.
3. T.G. Kolda, and B.W. Bader: Tensor decompositions and applications.
SIAM Review, 51/3 (2009), 455-500.
4. L.R. Tucker: Some mathematical notes on three-mode factor analysis. Psychometrika 31 (1966) 279-311.
http://personal-homepages.mis.mpg.de/bokh
Lect. 7. Toward multilinear algebra of tensors B. Khoromskij, Zurich 2010(L7) 164
Outlook of Lecture 7.
1. From d = 2 to higher dimensions: main distinctions.
2. Extending SVD to High Order SVD (HOSVD).
3. Truncated HOSVD: Quasi-optimal error estimate.
4. Reduced HOSVD (RHOSVD). The error estimate.
5. Summary and observations on basic tensor representations:
(a) Full → Tucker.
(b) Canonical (CP) → Tucker.
(c) Tucker → Tucker.
(d) Tucker → CP.
The multi-factor analysis is nonlinear B. Khoromskij, Zurich 2010(L7) 165
Def. 7.1. (Tensor rank, cf. Def. 6.5). The minimal number
R in the representation

R^I ∋ A = ∑_{k=1}^{R} v_k^{(1)} ⊗ ⋯ ⊗ v_k^{(d)},  v_k^{(ℓ)} ∈ R^n,   (45)

is called the tensor rank, R = rank(A), of the tensor A.
[Hitchcock 1927; Kruskal]
Finding the tensor rank R and the corresponding
decomposition(s) in high dimensions (d ≥ 3) is the main issue
of multi-factor analysis. Computing the rank of a high
order tensor is NP-hard [Hastad 1990].
Rem. 7.1. For d = 2, Def. 7.1 coincides with the standard
definition of the matrix rank(A), which can be calculated (together
with the rank decomposition) by a finite SVD algorithm in O(n³) operations.
The orthogonality requirement in the SVD ensures the
uniqueness of the rank decomposition.
Little analogy between the cases d ≥ 3 and d = 2 B. Khoromskij, Zurich 2010(L7) 166
If d > 2, the situation changes dramatically.
I. rank(A) depends on the number field (say, R or C).

II. The set of tensors of rank not larger than r,

C_r(d) := { T ∈ H₁ ⊗ ... ⊗ H_d : rank(T) ≤ r },

is closed when d = 2 (matrices), or if r = 1 (rank-1 tensors),
[Golub, Zhang ’01].

III. For d ≥ 3, r ≠ 1, the set C_r(d) is not closed.
Ex. 7.1. Let x, y be two linearly independent vectors in H
(say, dim(H) = 2). Consider the tensor T ∈ H ⊗H ⊗H = H⊗3,
T := x⊗ x⊗ x+ x⊗ y ⊗ y + y ⊗ x⊗ y.
It can be proven that
(a) rank(T ) = 3 (Exer. 7.1. Prove (a), [De Silva, Lim ’06]);
(b) T has no best rank-2 approximation.
Little analogy between the cases d ≥ 3 and d = 2 B. Khoromskij, Zurich 2010(L7) 167
Proof of (b): Consider the sequence {S_k}_{k=1}^∞ in H^{⊗3},

S_k := x ⊗ x ⊗ (x − k y) + (x + (1/k) y) ⊗ (x + (1/k) y) ⊗ k y.

Clearly, rank(S_k) ≤ 2 for all k. By multilinearity of ⊗,

S_k = T + (1/k) y ⊗ y ⊗ y.

Hence, for any choice of norm on H ⊗ H ⊗ H,

‖S_k − T‖ = (1/k) ‖y ⊗ y ⊗ y‖ → 0 as k → ∞.
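The degeneracy is easy to observe numerically; a NumPy sketch (illustrative, with dim H = 2):

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
outer3 = lambda a, b, c: np.einsum('i,j,k->ijk', a, b, c)

# T has rank 3, yet is a limit of the rank-2 tensors S_k.
T = outer3(x, x, x) + outer3(x, y, y) + outer3(y, x, y)

errors = []
for k in [1, 10, 100, 1000]:
    Sk = outer3(x, x, x - k * y) + outer3(x + y / k, x + y / k, k * y)
    errors.append(np.linalg.norm(Sk - T))

# ||S_k - T|| = (1/k)||y (x) y (x) y||: the error decays like 1/k,
# while the factor entries of S_k blow up like k.
assert np.allclose(errors, [1, 0.1, 0.01, 0.001])
```

The growing factors are exactly the numerical symptom of the nonclosedness of C₂(3): iterative rank-2 fits to T produce diverging factor norms.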
IV. For d ≥ 3 we do not know any finite algorithm to compute
r = rank(A), except for the simple bounds

0 ≤ rank(A) ≤ n^{d−1}.
Compare with the case d = 2.
Little analogy between the cases d ≥ 3 and d = 2 B. Khoromskij, Zurich 2010(L7) 168
V. For fixed d ≥ 3 and n we do not know the exact value of
max rank(A). J. Kruskal ’77 [3] proved that:

– for any 2 × 2 × 2 tensor we have max rank(A) = 3 < 4;

– for 3 × 3 × 3 tensors there holds max rank(A) = 5 < 9.

VI. “Probabilistic” properties of rank(A): in the set of 2 × 2 × 2
tensors there are about 79% of rank-2 tensors and 21% of rank-3
tensors, while rank-1 tensors appear with probability 0 (J. Kruskal).

Clearly, for n × n matrices we have (why?)

P{rank(A) = n} = 1.
Little analogy between the cases d ≥ 3 and d = 2 B. Khoromskij, Zurich 2010(L7) 169
VII. However, it is possible to prove the important uniqueness
property within the equivalence classes.
Two representations like (45) are considered as equivalent (essential
equivalence) if either

(a) they differ in the order of terms, or

(b) for some set of parameters a_k^{(ℓ)} ∈ R such that ∏_{ℓ=1}^{d} a_k^{(ℓ)} = 1 (k = 1, ..., R),
there is a transform v_k^{(ℓ)} → a_k^{(ℓ)} v_k^{(ℓ)}.

A simplified version of the general uniqueness result is the
following (all factors have the same full rank R).

Prop. 7.2. [J. Kruskal, ’77]. Let for each ℓ = 1, ..., d the vectors
v_k^{(ℓ)} (k = 1, ..., R), with R = rank(A), be linearly independent. If

(d − 2)R ≥ d − 1,

then the decomposition (45) is uniquely determined up to the
equivalence (a)-(b) above.
High order SVD (HOSVD) B. Khoromskij, Zurich 2010(L7) 170
The analogue of the SVD for dth-order tensors can be
formulated as the so-called high order SVD.

Thm. 7.1. (dth-order SVD, [De Lathauwer, De Moor, Vandewalle 2000]).
Every complex n₁ × n₂ × ... × n_d-tensor A can be written as the
contracted product

A = S ×₁ U^{(1)} ×₂ U^{(2)} ... ×_d U^{(d)},   (46)

where

1. U^{(ℓ)} = [U_1^{(ℓ)} U_2^{(ℓ)} ... U_{nℓ}^{(ℓ)}] is a unitary nℓ × nℓ-matrix,

2. S is a complex n₁ × n₂ × ... × n_d-tensor whose
subtensors S_{iℓ=α}, obtained by fixing the ℓth index to α, have
the properties of

(i) all-orthogonality: two subtensors S_{iℓ=α} and S_{iℓ=β} are
orthogonal for all possible values of ℓ, α, and β with α ≠ β:

⟨S_{iℓ=α}, S_{iℓ=β}⟩ = 0 when α ≠ β.
High order SVD (HOSVD) B. Khoromskij, Zurich 2010(L7) 171
(ii) Ordering:

‖S_{iℓ=1}‖ ≥ ‖S_{iℓ=2}‖ ≥ ... ≥ ‖S_{iℓ=nℓ}‖ ≥ 0,  ∀ ℓ = 1, ..., d.

Hint: consider the matrix representations

A_{(ℓ)} = U^{(ℓ)} S_{(ℓ)} [U^{(1)} ⊗ ... ⊗ U^{(ℓ−1)} ⊗ U^{(ℓ+1)} ⊗ ... ⊗ U^{(d)}]^T,
A_{(ℓ)} = U^{(ℓ)} Σ^{(ℓ)} V^{(ℓ)T},  Σ^{(ℓ)} = diag{σ_1^{(ℓ)}, ..., σ_{nℓ}^{(ℓ)}},
S_{(ℓ)} = Σ^{(ℓ)} V^{(ℓ)T} [U^{(1)} ⊗ ... ⊗ U^{(ℓ−1)} ⊗ U^{(ℓ+1)} ⊗ ... ⊗ U^{(d)}],

where ⊗ denotes the Kronecker product of matrices (details in Lect. 8).
Now the latter implies (i) and (ii) due to the orthogonality of the matrices U^{(ℓ)}.

(iii) The Frobenius norms ‖S_{iℓ=i}‖, symbolized by σ_i^{(ℓ)}, are the
ℓ-mode singular values of A_{(ℓ)}, and the vector U_i^{(ℓ)} is the ith
ℓ-mode left singular vector of A_{(ℓ)}.
Truncated HOSVD: Full → Tucker B. Khoromskij, Zurich 2010(L7) 172
Computation: U^{(ℓ)} (ℓ = 1, ..., d) is the left singular matrix of A_{(ℓ)}.
The core S can be computed by bringing the matrices of singular
vectors to the left side of (46):

S = A ×₁ U^{(1)T} ×₂ U^{(2)T} ... ×_d U^{(d)T}.

Thm. 7.2. (Approximation by HOSVD, [1]). Let the HOSVD
of A be given as in Thm. 7.1 and let the ℓ-mode rank of A
be equal to Rℓ (ℓ = 1, ..., d). For given rank parameter
r = (r₁, ..., r_d), define a tensor A_r by discarding the smallest
ℓ-mode singular values σ_{rℓ+1}^{(ℓ)}, σ_{rℓ+2}^{(ℓ)}, ..., σ_{Rℓ}^{(ℓ)} (ℓ = 1, ..., d), i.e., set
the corresponding parts of S equal to zero. Then we have

‖A − A_r‖² ≤ ∑_{ℓ=1}^{d} ∑_{iℓ=rℓ+1}^{Rℓ} (σ_{iℓ}^{(ℓ)})².

Notice that the truncated HOSVD loses only a factor √d compared with
the best approximation.
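The HOSVD computation described above fits in a few lines of NumPy (a sketch for d = 3; the course exercises use the Tensor Toolbox). The test tensor is the Hilbert tensor from Exer. 6.5, whose mode singular values decay rapidly:

```python
import numpy as np

def mode_mult(T, M, mode):
    T = np.moveaxis(T, mode, 0)
    return np.moveaxis(np.tensordot(M, T, axes=([1], [0])), 0, mode)

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def truncated_hosvd(A, ranks):
    """Tucker approx. A_r from truncated left singular vectors of each unfolding."""
    U = [np.linalg.svd(unfold(A, l), full_matrices=False)[0][:, :r]
         for l, r in enumerate(ranks)]
    S = A
    for l in range(A.ndim):                 # core S = A x_l U^(l)T
        S = mode_mult(S, U[l].T, l)
    Ar = S
    for l in range(A.ndim):                 # back-projection to full size
        Ar = mode_mult(Ar, U[l], l)
    return Ar, U

n = 20
i = np.arange(1, n + 1)
A = 1.0 / (i[:, None, None] + i[None, :, None] + i[None, None, :])  # Hilbert tensor
Ar, _ = truncated_hosvd(A, (5, 5, 5))
err = np.linalg.norm(Ar - A) / np.linalg.norm(A)
assert err < 1e-2   # rank (5,5,5) already gives a small relative error
```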
Visualizing the truncated HOSVD B. Khoromskij, Zurich 2010(L7) 173
Figure 19: Approximation by HOSVD via truncated SVD of the matrix
unfoldings for d = 3, rℓ < nℓ. (Figure omitted: each nℓ × (n̄ℓ) unfolding is truncated by SVD to rank rℓ.)
Rem. 7.2. In practice, the truncated HOSVD applies to small d
and to moderate n.
Proof of the T-HOSVD error bound B. Khoromskij Zurich 2010(L7) 174
Proof. Due to the orthogonality of U^{(ℓ)}, ℓ = 1, ..., d, and Thm. 7.1 we have

‖A − A_r‖² = ∑_{i₁=1}^{R₁} ∑_{i₂=1}^{R₂} ⋯ ∑_{i_d=1}^{R_d} s²_{i₁i₂...i_d} − ∑_{i₁=1}^{r₁} ∑_{i₂=1}^{r₂} ⋯ ∑_{i_d=1}^{r_d} s²_{i₁i₂...i_d}

≤ ∑_{i₁=r₁+1}^{R₁} ∑_{i₂=1}^{R₂} ⋯ ∑_{i_d=1}^{R_d} s²_{i₁i₂...i_d} + ∑_{i₁=1}^{R₁} ∑_{i₂=r₂+1}^{R₂} ⋯ ∑_{i_d=1}^{R_d} s²_{i₁i₂...i_d} + ⋯ + ∑_{i₁=1}^{R₁} ∑_{i₂=1}^{R₂} ⋯ ∑_{i_d=r_d+1}^{R_d} s²_{i₁i₂...i_d}

= ∑_{i₁=r₁+1}^{R₁} (σ_{i₁}^{(1)})² + ∑_{i₂=r₂+1}^{R₂} (σ_{i₂}^{(2)})² + ⋯ + ∑_{i_d=r_d+1}^{R_d} (σ_{i_d}^{(d)})².

Though the tensor A_r is not the best approximation of A under the given ℓ-mode
rank constraints, it normally provides a good Tucker approximation of A.

Exer. 7.2. Compare the rank-r RHOSVD of the n × n × n Hilbert tensor for
d = 3, n = 100; d = 4, n = 50, with the “best” rank-r Tucker approximation.
Reduced HOSVD (RHOSVD): CP → Tucker B. Khoromskij, Zurich 2010(L7) 175
For given A ∈ C_{R,n} in the rank-R canonical format,

A = ∑_{ν=1}^{R} ξ_ν u_1^ν ⊗ ... ⊗ u_d^ν,  ξ_ν ∈ R,

use its contracted product representation (Tucker with r = (R, ..., R)),

A = ξ ×₁ U^{(1)} ×₂ U^{(2)} ⋯ ×_d U^{(d)},  ξ = diag{ξ₁, ..., ξ_R},

via the ℓ-mode side matrices U^{(ℓ)} = [u_ℓ^1 ... u_ℓ^R] ∈ R^{n×R} (ℓ = 1, ..., d).

How to simplify the HOSVD? To fix the idea, suppose that n ≤ R.

Def. 7.2. (RHOSVD, [2]). For given A ∈ C_{R,n} and r, rℓ ≤ R, let
U^{(ℓ)} ≈ W^{(ℓ)} := Z₀^{(ℓ)} D_{ℓ,0} V₀^{(ℓ)T} be the truncated SVD of the side matrix
U^{(ℓ)} (ℓ = 1, ..., d), where D_{ℓ,0} = diag{σ_{ℓ,1}, σ_{ℓ,2}, ..., σ_{ℓ,rℓ}} and
Z₀^{(ℓ)} = [z_ℓ^1, ..., z_ℓ^{rℓ}] ∈ R^{n×rℓ}, V₀^{(ℓ)} ∈ R^{R×rℓ} represent the respective
orthogonal factors. Then the RHOSVD approximation of A is defined by

A⁰_{(r)} = ξ ×₁ [Z₀^{(1)} D_{1,0} V₀^{(1)T}] ×₂ ⋯ ×_d [Z₀^{(d)} D_{d,0} V₀^{(d)T}].   (47)

Note that A⁰_{(r)} ∈ T_r in (47) is obtained by the projection of the side matrices
U^{(ℓ)} onto the left singular matrices Z₀^{(ℓ)}.
Canonical-to-Tucker approx. by RHOSVD B. Khoromskij, Zurich 2010(L7) 176
Thm. 7.3. (Error of RHOSVD, [2])

(a) For A = ∑_{ν=1}^{R} ξ_ν u_1^ν ⊗ ... ⊗ u_d^ν ∈ C_{R,n}, the minimisation problem

A ∈ C_{R,n} ⊂ V_n :  A_{(r)} = argmin_{T∈T_{r,n}} ‖A − T‖_{V_n},

is equivalent to the dual maximisation problem

[Z^{(1)}, ..., Z^{(d)}] = argmax_{W^{(ℓ)}∈Gℓ[S_{rℓ}]} ‖ ∑_{ν=1}^{R} ξ_ν (W^{(1)T} u_1^ν) ⊗ ... ⊗ (W^{(d)T} u_d^ν) ‖²_{H_r}.

(b) (Error of RHOSVD). Let σ_{ℓ,1} ≥ σ_{ℓ,2} ≥ ... ≥ σ_{ℓ,min(n,R)} be the singular
values of U^{(ℓ)} ∈ R^{n×R} (ℓ = 1, ..., d). Then the RHOSVD approximation A⁰_{(r)} of A
exhibits the error bound (extra factor ‖ξ‖ compared with HOSVD),

‖A − A⁰_{(r)}‖ ≤ ‖ξ‖ ∑_{ℓ=1}^{d} ( ∑_{k=rℓ+1}^{min(n,R)} σ²_{ℓ,k} )^{1/2},  ‖ξ‖ = ( ∑_{ν=1}^{R} ξ_ν² )^{1/2}.   (48)
Item (a): to be considered in Lect. 9.
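Bound (48) can be checked numerically; a NumPy sketch with random canonical data, normalized factor columns and d = 3 (illustrative sizes, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, R, r = 3, 30, 12, 4
xi = rng.standard_normal(R)
# Side matrices U^(l) with unit-norm columns u_l^nu.
U = [rng.standard_normal((n, R)) for _ in range(d)]
U = [M / np.linalg.norm(M, axis=0) for M in U]

A = np.einsum('v,iv,jv,kv->ijk', xi, *U)

# RHOSVD: replace each side matrix by its rank-r truncated SVD,
# and accumulate the right-hand side of (48).
W, bound = [], 0.0
for M in U:
    Z, s, Vt = np.linalg.svd(M, full_matrices=False)
    W.append(Z[:, :r] @ np.diag(s[:r]) @ Vt[:r])
    bound += np.sqrt(np.sum(s[r:] ** 2))
bound *= np.linalg.norm(xi)

A0 = np.einsum('v,iv,jv,kv->ijk', xi, *W)
err = np.linalg.norm(A0 - A)
assert err <= bound + 1e-10
```

Only SVDs of the d side matrices of size n × R are needed, never an SVD of an unfolding of the full tensor.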
Sketching the proof of RHOSVD error bound B. Khoromskij, Zurich 2010(L7) 177
Proof. Using the contracted product representations of A ∈ C_{R,n} and A⁰_{(r)}
leads to the following expansion of the respective error,

A − A⁰_{(r)} = ξ ×₁ U^{(1)} ×₂ U^{(2)} ⋯ ×_d U^{(d)} − ξ ×₁ [Z₀^{(1)} D_{1,0} V₀^{(1)T}] ×₂ [Z₀^{(2)} D_{2,0} V₀^{(2)T}] ⋯ ×_d [Z₀^{(d)} D_{d,0} V₀^{(d)T}]

= ξ ×₁ [U^{(1)} − Z₀^{(1)} D_{1,0} V₀^{(1)T}] ×₂ [Z₀^{(2)} D_{2,0} V₀^{(2)T}] ⋯ ×_d [Z₀^{(d)} D_{d,0} V₀^{(d)T}]
+ ξ ×₁ U^{(1)} ×₂ [U^{(2)} − Z₀^{(2)} D_{2,0} V₀^{(2)T}] ⋯ ×_d [Z₀^{(d)} D_{d,0} V₀^{(d)T}]
+ ...
+ ξ ×₁ U^{(1)} ×₂ U^{(2)} ⋯ ×_d [U^{(d)} − Z₀^{(d)} D_{d,0} V₀^{(d)T}].

Introduce the ℓ-mode residual

∆^{(ℓ)} = U^{(ℓ)} − Z₀^{(ℓ)} D_{ℓ,0} V₀^{(ℓ)T},  ∆_ν^{(ℓ)} = ∑_{k=rℓ+1}^{n} σ_{ℓ,k} z_ℓ^k v_{ℓ,ν}^k,  ν = 1, ..., R,

with the notations

V₀^{(ℓ)} = [v_ℓ^1, ..., v_ℓ^{rℓ}]^T,  v_ℓ^k = {v_{ℓ,ν}^k}_{ν=1}^{R}.
Sketching the proof of RHOSVD error bound B. Khoromskij, Zurich 2010(L7) 178
The ℓth summand on the right-hand side of ‖A − A⁰_{(r)}‖ takes the form

B_ℓ = ξ ×₁ U^{(1)} ⋯ ×_{ℓ−1} U^{(ℓ−1)} ×_ℓ ∆^{(ℓ)} ×_{ℓ+1} W^{(ℓ+1)} ⋯ ×_d W^{(d)}.

This leads to the error bound (by the triangle inequality)

‖A − A⁰_{(r)}‖ ≤ ∑_{ℓ=1}^{d} ‖B_ℓ‖ = ‖ξ ×₁ ∆^{(1)} ×₂ W^{(2)} ⋯ ×_d W^{(d)}‖ + ‖ξ ×₁ U^{(1)} ×₂ ∆^{(2)} ⋯ ×_d W^{(d)}‖ + ... + ‖ξ ×₁ U^{(1)} ×₂ U^{(2)} ⋯ ×_d ∆^{(d)}‖,

where the ℓth term B_ℓ is represented by

∑_{ν=1}^{R} ξ_ν [ u_1^ν ⋯ ×_{ℓ−1} u_{ℓ−1}^ν ×_ℓ ∆_ν^{(ℓ)} ×_{ℓ+1} ∑_{k=1}^{r_{ℓ+1}} σ_{ℓ+1,k} z_{ℓ+1}^k v_{ℓ+1,ν}^k ⋯ ×_d ∑_{k=1}^{r_d} σ_{d,k} z_d^k v_{d,ν}^k ],

providing the estimate (in view of ‖u_ℓ^ν‖ = 1, ℓ = 1, ..., d, ν = 1, ..., R)

‖B_ℓ‖ ≤ ∑_{ν=1}^{R} |ξ_ν| ( ∑_{k=rℓ+1}^{n} σ²_{ℓ,k} (v_{ℓ,ν}^k)² )^{1/2} · ( ∑_{k=1}^{r_{ℓ+1}} σ²_{ℓ+1,k} (v_{ℓ+1,ν}^k)² )^{1/2} ⋯ ( ∑_{k=1}^{r_d} σ²_{d,k} (v_{d,ν}^k)² )^{1/2}.
Sketching the proof of RHOSVD error bound B. Khoromskij, Zurich 2010(L7) 179
Notice that U^{(ℓ)} (ℓ = 1, ..., d) has normalised columns, i.e.,

1 = ‖u_ℓ^ν‖ = ‖ ∑_{k=1}^{n} σ_{ℓ,k} z_ℓ^k v_{ℓ,ν}^k ‖,

implying ∑_{k=1}^{n} σ²_{ℓ,k} (v_{ℓ,ν}^k)² = 1 for ℓ = 1, ..., d, ν = 1, ..., R.

We finalise the error bound as follows,

‖A − A⁰_{(r)}‖ ≤ ∑_{ℓ=1}^{d} ∑_{ν=1}^{R} |ξ_ν| ( ∑_{k=rℓ+1}^{n} σ²_{ℓ,k} (v_{ℓ,ν}^k)² )^{1/2}

≤ ∑_{ℓ=1}^{d} ( ∑_{ν=1}^{R} ξ_ν² )^{1/2} ( ∑_{ν=1}^{R} ∑_{k=rℓ+1}^{n} σ²_{ℓ,k} (v_{ℓ,ν}^k)² )^{1/2}

= ∑_{ℓ=1}^{d} ‖ξ‖ ( ∑_{k=rℓ+1}^{n} σ²_{ℓ,k} ∑_{ν=1}^{R} (v_{ℓ,ν}^k)² )^{1/2}

= ‖ξ‖ ∑_{ℓ=1}^{d} ( ∑_{k=rℓ+1}^{n} σ²_{ℓ,k} )^{1/2}.

The case R < n can be analysed along the same lines.
Canonical rank estimate B. Khoromskij, Zurich 2010(L7) 180
Recall that n̄ℓ denotes the ℓ-mode single-hole product of dimensions,

n̄ℓ = n₁ ⋯ n_{ℓ−1} n_{ℓ+1} ⋯ n_d.

Rem. 7.3. The canonical rank of a tensor A ∈ V_n has the upper bound

R ≤ min_{1≤ℓ≤d} n̄ℓ = ∏_{ℓ=1}^{d} nℓ / max_ℓ nℓ.   (49)

Proof. Consider the case d = 3. Let n₁ = max_{1≤ℓ≤d} nℓ for definiteness. We
can represent a tensor A as

A = ∑_{k=1}^{n₃} B_k ⊗ Z_k,  B_k ∈ R^{n₁×n₂},  Z_k ∈ R^{n₃},

where B_k = A(:, :, k) (k = 1, ..., n₃) is the n₁ × n₂ matrix slice of A and
Z_k(i) = 0 for i ≠ k, Z_k(k) = 1. Let rank(B_k) = r_k ≤ n₂, k = 1, ..., n₃; then
rank(B_k ⊗ Z_k) = rank(B_k) ≤ n₂, implying

rank(A) ≤ ∑_{k=1}^{n₃} rank(B_k) ≤ n₂ n₃ = min_{1≤ℓ≤3} n̄ℓ.
Tucker → Tucker, Tucker → CP B. Khoromskij, Zurich 2010(L7) 181
The general case d > 3 can be proven by induction.

Lem. 7.4. (Mixed Tucker-to-canonical approximation).

(A) Let the target tensor A have the form

A = β ×₁ V^{(1)} ×₂ ... ×_d V^{(d)} ∈ T_{r,n},

with the orthogonal side matrices V^{(ℓ)} = [v_1^{(ℓ)} ... v_{rℓ}^{(ℓ)}] ∈ R^{n×rℓ} and
β ∈ R^{r₁×...×r_d}. Then, for a given R ≤ min_{1≤ℓ≤d} r̄ℓ (see (49)),

min_{Z∈C_{R,n}} ‖A − Z‖ = min_{µ∈C_{R,r}} ‖β − µ‖.   (50)

(B) Assume that there exists the best rank-R approximation A_{(R)} ∈ C_{R,n}
of A; then there is the best rank-R approximation β_{(R)} ∈ C_{R,r} of β such
that

A_{(R)} = β_{(R)} ×₁ V^{(1)} ×₂ ... ×_d V^{(d)}.   (51)
Tucker → Tucker, Tucker → CP B. Khoromskij, Zurich 2010(L7) 182
Proof. (A) Notice that the canonical vectors y_k^{(ℓ)} of any test element in
the left-hand side of (50),

Z = ∑_{k=1}^{R} λ_k y_k^{(1)} ⊗ ... ⊗ y_k^{(d)} ∈ C_{R,n},   (52)

can be chosen in span{v_1^{(ℓ)}, ..., v_{rℓ}^{(ℓ)}}, i.e.,

y_k^{(ℓ)} = ∑_{m=1}^{rℓ} µ_{k,m}^{(ℓ)} v_m^{(ℓ)},  k = 1, ..., R, ℓ = 1, ..., d.   (53)

Indeed, assuming

y_k^{(ℓ)} = ∑_{m=1}^{rℓ} µ_{k,m}^{(ℓ)} v_m^{(ℓ)} + E_k^{(ℓ)}  with  E_k^{(ℓ)} ⊥ span{v_1^{(ℓ)}, ..., v_{rℓ}^{(ℓ)}},

we conclude that E_k^{(ℓ)} does not affect the cost function in (50) because of
the orthogonality of V^{(ℓ)}. Hence, setting E_k^{(ℓ)} = 0 and substituting (53)
into (52), we arrive at the desired Tucker decomposition of Z,

Z = β_z ×₁ V^{(1)} ×₂ ... ×_d V^{(d)},  β_z ∈ C_{R,r}.
Tucker → Tucker, Tucker → CP B. Khoromskij, Zurich 2010(L7) 183
This implies

‖A − Z‖² = ‖(β_z − β) ×₁ V^{(1)} ×₂ ... ×_d V^{(d)}‖² = ‖β − β_z‖² ≥ min_{µ∈C_{R,r}} ‖β − µ‖².

On the other hand, we have

min_{Z∈C_{R,n}} ‖A − Z‖² ≤ min_{β_z∈C_{R,r}} ‖(β − β_z) ×₁ V^{(1)} ×₂ ... ×_d V^{(d)}‖² = min_{µ∈C_{R,r}} ‖β − µ‖².

Hence, we prove (50).

(B) Likewise, for any minimizer A_{(R)} ∈ C_{R,n} in the r.h.s. of (50), one
obtains

A_{(R)} = β_{(R)} ×₁ V^{(1)} ×₂ V^{(2)} ... ×_d V^{(d)}

with the respective rank-R core tensor

β_{(R)} = ∑_{k=1}^{R} λ_k u_k^{(1)} ⊗ ... ⊗ u_k^{(d)} ∈ C_{R,r}.

Here u_k^{(ℓ)} = {µ_{k,mℓ}^{(ℓ)}}_{mℓ=1}^{rℓ} ∈ R^{rℓ} are calculated by using representation
Tucker → Tucker, Tucker → CP B. Khoromskij, Zurich 2010(L7) 184
(53), and then changing the order of summation,

A_{(R)} = ∑_{k=1}^{R} λ_k y_k^{(1)} ⊗ ... ⊗ y_k^{(d)}

= ∑_{k=1}^{R} λ_k ( ∑_{m₁=1}^{r₁} µ_{k,m₁}^{(1)} v_{m₁}^{(1)} ) ⊗ ... ⊗ ( ∑_{m_d=1}^{r_d} µ_{k,m_d}^{(d)} v_{m_d}^{(d)} )

= ∑_{m₁=1}^{r₁} ... ∑_{m_d=1}^{r_d} ( ∑_{k=1}^{R} λ_k ∏_{ℓ=1}^{d} µ_{k,mℓ}^{(ℓ)} ) v_{m₁}^{(1)} ⊗ ... ⊗ v_{m_d}^{(d)}.

Now (51) implies that

‖A − A_{(R)}‖ = ‖β − β_{(R)}‖,

since the ℓ-mode multiplication with the orthogonal side matrices V^{(ℓ)} does
not change the cost function. Taking into account the l.h.s. of (50), the
latter indicates that β_{(R)} is the minimizer in the r.h.s. of (50).
Overlook on direct methods of tensor approx. B. Khoromskij, Zurich 2010(L7) 185
1. ACA + SVD for two-fold decompositions (d = 2).

2. Analytic approximation to some function-generated dth
order tensors (d ≥ 2), (Lect. 3, 4).

Def. 7.3. Given a multivariate function

g : Ω → R,  Ω := [−L, L]^d ⊂ R^d,

and a set of collocation points ζ_i = (ζ^1_{i₁}, ..., ζ^d_{i_d}), i ∈ I^d,
specified by a tensor grid in Ω, the function-generated dth order tensor
is defined by

A ≡ A(g) := [a_{i₁...i_d}] ∈ R^{I^d}  with  a_{i₁...i_d} := g(ζ^1_{i₁}, ..., ζ^d_{i_d}).

3. T-HOSVD, RHOSVD for quasi-optimal Tucker approximation.

4. Next step: algebraic recompression methods by iterated
rank-r Tucker or/and canonical approximation of high order
tensors (convergence theory is an open question).
Preliminary summary on Tucker/canonical formats B. Khoromskij, Zurich 2010(L7) 186
Direct analytic approximation: Analytic approximation
methods are of principal importance.
Basic examples: Tensor representation of Green kernels.
Direct SVD-based approx.: Applies to low dimensions.
Reduction to univariate operations: Basic multi-linear
algebra can be performed using one-dimensional operations,
thus avoiding the exponential scaling in d (Lect. 8).
Bottleneck: lack of robust and efficient algebraic methods
for the canonical/Tucker tensor decomposition of high
order tensors (d ≥ 3).

Algebraic methods are indispensable. Basic concepts:
nonlinear iteration, multigrid, new tensor formats.
Next we consider the heuristic ALS-type Tucker/canonical
approximation applicable to moderate dimensions.
Literature to Lect. 7 B. Khoromskij, Zurich 2010(L7) 187
1. L. De Lathauwer, B. De Moor, J. Vandewalle, A multilinear singular value decomposition. SIAM J.
Matrix Anal. Appl., 21 (2000) 1253-1278.
2. B.N. Khoromskij and V. Khoromskaia, Multigrid Tensor Approximation of Function Related Arrays.
SIAM J. on Sci. Comp., 31(4), 3002-3026 (2009).
3. J.B. Kruskal: Three-way arrays: Rank and uniqueness of trilinear decompositions. Lin. Alg. Appl. 18 (1977), 95-138.
4. T. Zhang, and G.H. Golub, Rank-one approximation to high order tensors.
SIAM J. Matrix Anal. Appl. 23 (2001), 534-550.
5. V. De Silva, and L.-H. Lim, Tensor rank and the ill-posedness of the best low-rank approximation problem.
SIAM J. Matrix Anal. Appl., 30 (2008) 1084-1127.
http://personal-homepages.mis.mpg.de/bokh
Lect. 8. Matrices in tensor format B. Khoromskij, Zurich 2010(L8) 188
Contents of Lecture 8.
1. Rank-R and Tucker-type matrices.
2. Kronecker, Hadamard and Khatri-Rao product of matrices.
3. Kronecker sum of matrices.
4. Properties of the Kronecker product and sum.
5. Matrix exponential.
6. Eigenvalue problem for Kronecker sum.
7. Matrix Lyapunov/Silvester equations.
8. Kronecker Hadamard scalar product.
9. Kronecker matrix rank if d = 2.
Rank-R matrices (operators) B. Khoromskij, Zurich 2010(L8) 189
A tensor is a vector in the TPHS V = V₁ ⊗ ... ⊗ V_d = R^J,
J = J₁ × ... × J_d; hence, a rank-R matrix A ∈ M_{R,I×J} is
supposed to be the linear operator which maps

R^{I×J} ∋ A : V → W,

where W = W₁ ⊗ ... ⊗ W_d = R^I, I = I₁ × ... × I_d.

Def. 8.1. We call M_{R,I×J} the class of rank-R linear
tensor-tensor operators (matrices) A ∈ R^{I×J}, A : R^J → R^I,

A = ∑_{ν=1}^{R} α_ν A_1^ν ⊗ ... ⊗ A_d^ν,  α_ν ∈ R,  A_ℓ^ν ∈ R^{Iℓ×Jℓ},

for which the matrix-vector multiplication with a rank-1 tensor
V ∈ C_{1,J} is defined by the rank-R canonical sum

A V := ∑_{ν=1}^{R} α_ν A_1^ν v^{(1)} ⊗ ... ⊗ A_d^ν v^{(d)} ∈ C_{R,I}.

R is called the Kronecker/tensor rank. The matrices A_ℓ^ν : R^{Jℓ} → R^{Iℓ}
may be fully populated or have data-sparse structure.
Tucker matrix format B. Khoromskij, Zurich 2010(L8) 190
Rem. 8.1. Rank-R matrices in M_{R,I×J} can be recognized as a
“matricization” of all canonical vectors in the CP tensor.

Def. 8.2. The definition of rank-R matrices in M_{R,I×J} can
be extended to the Tucker format, which can be recognized
as a “matricization” of all orthogonal vectors in the Tucker
tensor,

A = β ×₁ U^{(1)} ×₂ U^{(2)} ... ×_d U^{(d)} ∈ M_{r,I×J},

where β ∈ R^{J₁×...×J_d} is the core r₁ × ... × r_d-tensor, and
U^{(ℓ)} = [Φ_1^{(ℓ)}, ..., Φ_{rℓ}^{(ℓ)}] ∈ R^{Iℓ×Jℓ×rℓ}, Φ_k^{(ℓ)} ∈ R^{Iℓ×Jℓ}.

The operator A maps A : C_{1,J} → T_{r,I}.

If tensors are vectorised (unfolded to a vector, A → vec(A)),
then the respective matrix in M_{R,I×J} can be represented by
the Kronecker product of matrices of size Iℓ × Jℓ (ℓ = 1, ..., d).
This construction is supported by MATLAB.
The Kronecker product of matrices B. Khoromskij, Zurich 2010(L8) 191
Def. 8.3. The Kronecker product (KP) operation A ⊗ B of two matrices A = [a_{ij}] ∈ R^{m×n}, B ∈ R^{h×g} is the mh × ng matrix that has the block representation [a_{ij}B], i = 1, ..., m; j = 1, ..., n. Recursive extension to the d-fold KP (see Property (1) below):

A ⊗ B ⊗ C = (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C).
We have the equivalent representation
A⊗B = [a1 ⊗ b1 a1 ⊗ b2 ... an ⊗ bg−1 an ⊗ bg].
Def. 8.4. The Kronecker sum of A ∈ R^{m×m} and B ∈ R^{n×n} is defined by

A ⊕ B = I_m ⊗ B + A ⊗ I_n.

Ex. 8.1. Define the discrete FD Laplacian on H¹₀([0, 1]^d),

∆^{(d)} := ∆ ⊗ I ⊗ ... ⊗ I + I ⊗ ∆ ⊗ I ⊗ ... ⊗ I + ... + I ⊗ ... ⊗ I ⊗ ∆ ∈ R^{n^d×n^d},

where I = I_n is the n × n identity and ∆ = (1/(n+1)²) tridiag{−1, 2, −1} ∈ R^{n×n}.
∆^{(d)} has Kronecker/tensor rank R = d.

Exer. 8.1. Prove that ∆^{(d)} ∈ M_{r,I×I} with r = (2, 2, ..., 2).
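A NumPy sketch of Ex. 8.1 for d = 2 (assumed environment), checking the rank-2 Kronecker structure against the eigenvalues of the assembled matrix:

```python
import numpy as np

n = 8
# 1D FD Laplacian Delta = 1/(n+1)^2 * tridiag{-1, 2, -1} as in the text.
Delta = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / (n + 1) ** 2
I = np.eye(n)

# d = 2: Kronecker sum, i.e., tensor rank R = 2.
Delta2 = np.kron(Delta, I) + np.kron(I, Delta)

# Eigenvalues of a Kronecker sum are all pairwise sums lambda_j + lambda_k.
lam = np.linalg.eigvalsh(Delta)
expected = np.sort((lam[:, None] + lam[None, :]).ravel())
assert np.allclose(np.sort(np.linalg.eigvalsh(Delta2)), expected)
```

The eigenvalue identity used here is proved later in this lecture (Thm. 8.7).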
Khatri-Rao and Hadamard product of matrices B. Khoromskij, Zurich 2010(L8) 192
Def. 8.5. The Khatri-Rao product is the “matching
columnwise” Kronecker product. Given matrices
A = [a_{ij}] ∈ R^{m×n}, B ∈ R^{h×n}, their Khatri-Rao product,
denoted A ⊡ B, has size mh × n and is defined by

A ⊡ B = [a₁ ⊗ b₁  a₂ ⊗ b₂  ...  a_n ⊗ b_n].

If A and B are vectors, then the Khatri-Rao and Kronecker
products are identical, i.e., A ⊗ B = A ⊡ B.
Def. 8.6. Define the Hadamard product
A⊙B = C := [ci1...id ](i1...id)∈I , I = I1 × ...× Id,
of two tensors/matrices of the same size I, A,B ∈ RI, by the
entrywise multiplication
ci1...id = ai1...id · bi1...id .
Properties of the Kronecker product B. Khoromskij, Zurich 2010(L8) 193
KP inherits many properties from matrices A and B (cf.
[1]):
(1) Let C ∈ Rs×t, then the KP satisfies the associative law,
(A⊗B) ⊗ C = A⊗ (B ⊗ C) = A⊗B ⊗ C,
and therefore we do not use brackets above. The matrix
A⊗B ⊗ C := (A⊗B) ⊗ C has (mhs) rows and (ngt) columns.
(2) Let C ∈ Rn×r and D ∈ Rg×s, then the standard
matrix-matrix product in the Kronecker format takes the form
(A⊗B)(C ⊗D) = (AC) ⊗ (BD).
The corresponding extension to d-th order tensors is
(A1 ⊗ ...⊗Ad)(B1 ⊗ ...⊗Bd) = (A1B1) ⊗ ...⊗ (AdBd).
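Property (2) is easy to confirm numerically (a NumPy sketch with arbitrary compatible sizes):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3)); C = rng.standard_normal((3, 5))
B = rng.standard_normal((2, 6)); D = rng.standard_normal((6, 3))

# (A (x) B)(C (x) D) = (AC) (x) (BD): the Kronecker factors multiply modewise.
lhs = np.kron(A, B) @ np.kron(C, D)
rhs = np.kron(A @ C, B @ D)
assert np.allclose(lhs, rhs)
```

This is the identity that lets all matrix-vector operations with rank-structured operators be reduced to small one-dimensional factors.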
Properties of the Kronecker product B. Khoromskij, Zurich 2010(L8) 194
(3) The distributive law
(A+B) ⊗ (C +D) = A⊗ C + A⊗D +B ⊗ C +B ⊗D.
(4) Rank relation: rank(A⊗B) = rank(A)rank(B).
Exer. 8.1. In general A ⊗ B ≠ B ⊗ A. Give a condition on A
and B that provides A ⊗ B = B ⊗ A.
Invariance of some matrix properties:
(5) If A and B are diagonal then A ⊗ B is also diagonal, and
conversely (if A ⊗ B ≠ 0).

(6) Upper/lower triangular structure is preserved.

(7) Let A, B be Hermitian/normal/orthogonal matrices (A* = A,
A*A = AA*, A⁻¹ = A^T, respectively). Then A ⊗ B is of the corresponding type.

(8) Let A ∈ R^{n×n} and B ∈ R^{m×m}. Then

det(A ⊗ B) = (det A)^m (det B)^n.
Matrix operations with Kronecker product and sum B. Khoromskij, Zurich 2010(L8) 195
Thm. 8.1. Let A ∈ R^{n×n} and B ∈ R^{m×m} be invertible
matrices. Then

(A ⊗ B)⁻¹ = A⁻¹ ⊗ B⁻¹.

Proof. Since det(A) ≠ 0, det(B) ≠ 0 and by property (8)
we have det(A ⊗ B) ≠ 0. Thus (A ⊗ B)⁻¹ exists and

(A⁻¹ ⊗ B⁻¹)(A ⊗ B) = (A⁻¹A) ⊗ (B⁻¹B) = I_{nm}.

Lem. 8.2. Let A ∈ R^{n×n} and B ∈ R^{m×m} be unitary matrices.
Then A ⊗ B is a unitary matrix.

Proof. Since A* = A⁻¹, B* = B⁻¹, we have

(A ⊗ B)* = A* ⊗ B* = A⁻¹ ⊗ B⁻¹ = (A ⊗ B)⁻¹.
Matrix operations with Kronecker products and sums B. Khoromskij, Zurich 2010(L8) 196
Define the commutator [A, B] := AB − BA.

Lem. 8.3. Let A ∈ R^{n×n} and B ∈ R^{m×m}. Then

[A ⊗ I_m, I_n ⊗ B] = 0 ∈ R^{nm×nm}.

Proof.

[A ⊗ I_m, I_n ⊗ B] = (A ⊗ I_m)(I_n ⊗ B) − (I_n ⊗ B)(A ⊗ I_m) = A ⊗ B − A ⊗ B = 0.

Rem. 8.2. Let A, B ∈ R^{n×n}, C, D ∈ R^{m×m} and [A, B] = 0,
[C, D] = 0. Then

[A ⊗ C, B ⊗ D] = 0.

Proof. Apply the identity (A ⊗ C)(B ⊗ D) = (AB) ⊗ (CD).
Matrix operations with Kronecker products and sums B. Khoromskij, Zurich 2010(L8) 197
Lem. 8.4. Let A ∈ R^{n×n} and B ∈ R^{m×m}. Then

tr(A ⊗ B) = tr(A) tr(B).

Proof. Since diag(a_{ii}B) = a_{ii} diag(B), we have

tr(A ⊗ B) = ∑_{i=1}^{n} ∑_{j=1}^{m} a_{ii} b_{jj} = ( ∑_{i=1}^{n} a_{ii} )( ∑_{j=1}^{m} b_{jj} ).

Thm. 8.5. Let A, B, I ∈ R^{n×n}. Then

exp(A ⊗ I + I ⊗ B) = (exp A) ⊗ (exp B).

Proof. Since [A ⊗ I, I ⊗ B] = 0, we have

exp(A ⊗ I + I ⊗ B) = exp(A ⊗ I) exp(I ⊗ B).
Matrix operations with Kronecker products and sums B. Khoromskij, Zurich 2010(L8) 198
Furthermore, since

exp(A ⊗ I) = ∑_{k=0}^{∞} (A ⊗ I)^k / k!,  exp(I ⊗ B) = ∑_{m=0}^{∞} (I ⊗ B)^m / m!,

an arbitrary term in exp(A ⊗ I) exp(I ⊗ B) is given by

(1/k!)(1/m!) (A ⊗ I)^k (I ⊗ B)^m.

Using

(A ⊗ I)^k (I ⊗ B)^m = (A^k ⊗ I^k)(I^m ⊗ B^m) = (A^k ⊗ I)(I ⊗ B^m) = A^k ⊗ B^m,

we finally arrive at

(1/k!)(1/m!) (A ⊗ I)^k (I ⊗ B)^m = ((1/k!) A^k) ⊗ ((1/m!) B^m).
Matrix operations with Kronecker products and sums B. Khoromskij, Zurich 2010(L8) 199
Thm. 8.5 can be extended to the many-term sum

exp(A₁ ⊗ I ⊗ ... ⊗ I + I ⊗ A₂ ⊗ ... ⊗ I + ... + I ⊗ ... ⊗ I ⊗ A_d) = (e^{A₁}) ⊗ ... ⊗ (e^{A_d}).

Rem. 8.3. Similar properties can be shown for other analytic
functions, e.g.,

sin(I_n ⊗ A) = I_n ⊗ sin(A),

sin(A ⊗ I_m + I_n ⊗ B) = sin(A) ⊗ cos(B) + cos(A) ⊗ sin(B),

sin(A ⊗ I_m + I_n ⊗ B) = sin(A) ⊗ sin(B + (b − a)I)/sin(b − a) + sin(A + (a − b)I) ⊗ sin(B)/sin(a − b)

for all values a, b such that sin(a − b) ≠ 0. The latter can be extended to
sin(A₁ ⊗ I ⊗ ... ⊗ I + ... + I ⊗ ... ⊗ I ⊗ A_d) (compare with sin(x₁ + ... + x_d)).

Other simple properties:

(A ⊗ B)^T = A^T ⊗ B^T,  (A ⊗ B)* = A* ⊗ B*.
Eigenvalue problem for Kronecker sums B. Khoromskij, Zurich 2010(L8) 200
Lem. 8.6. Let A ∈ R^{m×m} and B ∈ R^{n×n} have the eigen-data
λ₁, ..., λ_m, u₁, ..., u_m, and µ₁, ..., µ_n, v₁, ..., v_n, respectively. Then
A ⊗ B has the eigenvalues λ_j µ_k with the corresponding
eigenvectors u_j ⊗ v_k, 1 ≤ j ≤ m, 1 ≤ k ≤ n.

Thm. 8.7. Under the conditions of Lem. 8.6, the
eigenvalues/eigenvectors of A ⊗ I_n + I_m ⊗ B are given by
λ_j + µ_k and u_j ⊗ v_k, respectively.

Proof. Due to Lem. 8.6 we have

(A ⊗ I_n + I_m ⊗ B)(u_j ⊗ v_k) = (A ⊗ I_n)(u_j ⊗ v_k) + (I_m ⊗ B)(u_j ⊗ v_k)
= (A u_j) ⊗ (I_n v_k) + (I_m u_j) ⊗ (B v_k)
= (λ_j u_j) ⊗ v_k + u_j ⊗ (µ_k v_k)
= (λ_j + µ_k)(u_j ⊗ v_k).
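Lem. 8.6 can be verified directly (a NumPy sketch using symmetric matrices so that the eigen-data are real):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 4)); A = A + A.T
B = rng.standard_normal((3, 3)); B = B + B.T

lamA = np.linalg.eigvalsh(A)
lamB = np.linalg.eigvalsh(B)

# Eigenvalues of A (x) B are all products lambda_j * mu_k.
prod = np.sort((lamA[:, None] * lamB[None, :]).ravel())
assert np.allclose(np.sort(np.linalg.eigvalsh(np.kron(A, B))), prod)
```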
Application to matrix Lyapunov/Sylvester equations B. Khoromskij, Zurich 2010(L8) 201
Recall that for a matrix A ∈ R^{m×n}, we use the vector representation
A → vec(A) ∈ R^{mn}, where vec(A) is the mn × 1 vector obtained by
“stacking” A's columns,

vec(A) := [a₁₁, ..., a_{m1}, a₁₂, ..., a_{mn}]^T.

In this way, vec(A) is a rearranged version of A.

Lem. 8.8. Let A ∈ R^{m×m}, Y ∈ R^{m×n}, B ∈ R^{n×n}. Then

vec(A Y B) = (B^T ⊗ A) vec(Y).

The matrix Sylvester equation for X ∈ R^{m×n},

A X + X B^T = G ∈ R^{m×n},   (54)

with A ∈ R^{m×m}, B ∈ R^{n×n}, can be written in vector form

(I_n ⊗ A + B ⊗ I_m) vec(X) = vec(G).

In the case A = B we arrive at the Lyapunov equation.
Application to matrix Lyapunov/Sylvester equations B. Khoromskij, Zurich 2010(L8) 202
Now the solvability conditions and certain solution methods
can be derived (cf. the results for eigenvalue problems).
The matrix Sylvester equation (54) is uniquely solvable if
λ_j(A) + µ_k(B) ≠ 0.
Rem. 8.4. Since I_n ⊗ A and B ⊗ I_m commute, we can apply
methods based on sinc quadratures to represent the inverse,

(I_n ⊗ A + B ⊗ I_m)⁻¹ = ∫_{R₊} e^{−t(I_n⊗A+B⊗I_m)} dt.

If A and B represent discrete elliptic operators in R^d with
separable coefficients, we obtain a low-rank tensor-product
approximation to the Sylvester solution operator.

Exer. 8.2. Approximate formally by the sinc quadrature

(∆^{(2)})⁻¹ = (I_n ⊗ ∆ + ∆ ⊗ I_n)⁻¹ ≈ ∑_{k=−M}^{M} c_k e^{−t_k ∆} ⊗ e^{−t_k ∆}.
Kronecker Hadamard scalar product B. Khoromskij, Zurich 2010(L8) 203
Given a tensor Y ⊗ U ∈ R^{J×I} with U ∈ R^I, Y ∈ R^J, and
B ∈ R^{L×I}. Let T : R^L → R^J be the linear operator (tensor)
that maps tensors defined on the index set L into those
defined on J. In particular, T · B ∈ R^{J×I}.

Def. 8.7. ([3]) The Hadamard “scalar” product [D, C]_I ∈ R^K
of two tensors D := [D_{i,k}] ∈ R^{I×K} and C := [C_{i,k}] ∈ R^{I×K} is
defined by

[D, C]_I := ∑_{i∈I} [D_{i,K}] ⊙ [C_{i,K}],

where ⊙ denotes the Hadamard product over the index set K
and [D_{i,K}] := [D_{i,k}]_{k∈K}.

Lem. 8.9. Let U, Y, B and T be given as above. Then, with
K = J, the following identity is valid:

[U ⊗ Y, T · B]_I = Y ⊙ (T · [U, B]_I) ∈ R^J.   (55)
Kronecker Hadamard scalar product B. Khoromskij, Zurich 2010(L8) 204
Proof. By definition of the Hadamard scalar product we have

[U ⊗ Y, T·B]_I = Σ_{i∈I} [U ⊗ Y]_{i,J} ⊙ [T·B]_{i,J}
= Σ_{i∈I} [U_i Y]_J ⊙ [T·B]_{i,J}
= Y ⊙ ( Σ_{i∈I} U_i [T·B]_{i,J} )
= Y ⊙ ( T · Σ_{i∈I} U_i [B]_{i,L} ),

and the assertion follows.
Identity (55) is particularly important in application to the
Boltzmann eq. [3], since in the right-hand side the operator T
is removed from the scalar product in I and, hence, it applies
only once.
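Identity (55) can be checked directly for small index sets. The sketch below (Python/NumPy; the dimensions are arbitrary illustrative choices) realizes T as a matrix acting columnwise, so the left-hand side applies T once per index i ∈ I, while the right-hand side applies T only once.

```python
import numpy as np

rng = np.random.default_rng(1)
nI, nJ, nL = 5, 4, 6                     # sizes of the index sets I, J, L
U = rng.standard_normal(nI)              # U in R^I
Y = rng.standard_normal(nJ)              # Y in R^J
B = rng.standard_normal((nL, nI))        # B in R^{L x I}
T = rng.standard_normal((nJ, nL))        # T : R^L -> R^J

TB = T @ B                               # T.B in R^{J x I}: T applied to each column
# [U (x) Y, T.B]_I = sum_i U_i * (Y Hadamard (T.B)[:, i])  -- T used |I| times
lhs = sum(U[i] * Y * TB[:, i] for i in range(nI))
# Y Hadamard (T.[U, B]_I)                                  -- T used once
rhs = Y * (T @ (B @ U))
```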
Remarks on rank structured operators (matrices) B. Khoromskij, Zurich 2010(L8) 205
Rem. 8.5. By Def. 8.1, the matrix-vector multiplication of A with a rank-1
tensor V ∈ C_{1,J} is given by the rank-R canonical sum

AV = Σ_{ν=1}^{R} α_ν A_1^ν v_1 ⊗ ... ⊗ A_d^ν v_d ∈ C_{R,I}.
Since ⊗ is also traditionally used for the Kronecker product of matrices,
an equivalent and more consistent notation could be based on the
contracted product,

A = Σ_{ν=1}^{R} α_ν A_1^ν ×_2 ... ×_d A_d^ν, α_ν ∈ R, A_ℓ^ν ∈ R^{I_ℓ×J_ℓ}.
However, if there is no confusion, we continue using ⊗ in the matrix
tensor product as in Def. 8.1.
In particular, for the “single-index” representation of vectors, the discrete
Laplacian in Rd, takes the form as in Ex. 8.1. In the tensor representation
of vectors, we can write
∆^(d) := ∆ ×_2 I ×_3 ... ×_d I + I ×_2 ∆ ×_3 I ×_4 ... ×_d I + ... + I ×_2 ... ×_{d-1} I ×_d ∆,

with ∆^(d) ∈ R^{(n×n)×...×(n×n)}.
If there is no confusion, we continue using ⊗ in the notation for the
discrete operators (stiffness matrices) in Rd.
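In the single-index (Kronecker) representation, the structure of ∆^(d) and the eigenvalue relation from the beginning of this lecture (eigenvalues of a Kronecker sum are sums λ_j + µ_k of the factor eigenvalues) can be verified for small n. A sketch for d = 3 (Python/NumPy; the unscaled FD Laplacian and the size n are illustrative):

```python
import numpy as np

n = 6
D1 = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D FD Laplacian (Dirichlet)
I = np.eye(n)

# Delta^(3) as a Kronecker sum in the single-index representation
D3 = (np.kron(np.kron(D1, I), I)
      + np.kron(np.kron(I, D1), I)
      + np.kron(np.kron(I, I), D1))

lam1 = np.linalg.eigvalsh(D1)
lam3 = np.sort(np.linalg.eigvalsh(D3))
# spectrum of the Kronecker sum = all triple sums of 1D eigenvalues
expected = np.sort((lam1[:, None, None] + lam1[None, :, None]
                    + lam1[None, None, :]).ravel())
```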
Kronecker matrix rank if d = 2 B. Khoromskij, Zurich 2010(L8) 206
If d = 2, estimation of the Kronecker matrix rank can be
reduced to computation of the standard matrix rank (cf. [2, 4]).

Let A = [a(i,j)]_{1≤i,j≤N}, N = n_1 n_2. Use the bijection

i ↔ (i_1, i_2), j ↔ (j_1, j_2), 1 ≤ i_1, j_1 ≤ n_1, 1 ≤ i_2, j_2 ≤ n_2,

defined by FORTRAN-style ordering,

i = i_1 + (i_2 − 1)n_1, j = j_1 + (j_2 − 1)n_1, 1 ≤ i_1, j_1 ≤ n_1, 1 ≤ i_2, j_2 ≤ n_2.

Then A can be indexed by a(i,j) = a(i_1, i_2, j_1, j_2). Introduce a new
matrix Ã of size n_1² × n_2², indexed by the respective pairs
(i_1, j_1), (i_2, j_2) (long indices),

ã(i_1, j_1; i_2, j_2) = a(i_1, i_2, j_1, j_2).
Kronecker matrix rank if d = 2 B. Khoromskij, Zurich 2010(L8) 207
The new indexing also defines a bijective mapping P : A → Ã
(a rearranged version of A), which preserves the Frobenius norm.

Note. In general there is no permutation matrix P such that Ã = PAP^T.

Applied to a rank-R Kronecker product sum, the rearrangement gives

A → A_R = Σ_{k=1}^{R} U_k ⊗ V_k ⟺ P(A_R) =: Ã_R = Σ_{k=1}^{R} v_k u_k^T,

with U_k = [u_k(i_2, j_2)]_{1≤i_2,j_2≤n_2}, V_k = [v_k(i_1, j_1)]_{1≤i_1,j_1≤n_1},
v_k ∈ R^{n_1²}, u_k ∈ R^{n_2²}.
Rem. 8.6. The problem of finding a Kronecker tensor-rank
approximation A_R of A is identical to the problem of finding a
low-rank approximation Ã_R of Ã.
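A sketch of the rearrangement map P (Python/NumPy; sizes are illustrative): with the FORTRAN-style ordering above, a single Kronecker product U ⊗ V is mapped to the rank-1 matrix vec(V) vec(U)^T.

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2 = 3, 4

def rearrange(A, n1, n2):
    # a(i, j) -> a~(i1, j1; i2, j2) with i = i1 + (i2 - 1) n1 (i1 fastest)
    A4 = A.reshape(n1, n2, n1, n2, order="F")          # axes (i1, i2, j1, j2)
    return A4.transpose(0, 2, 1, 3).reshape(n1 * n1, n2 * n2, order="F")

U = rng.standard_normal((n2, n2))      # factor acting on the slow index i2
V = rng.standard_normal((n1, n1))      # factor acting on the fast index i1
At = rearrange(np.kron(U, V), n1, n2)

vecF = lambda M: M.reshape(-1, order="F")
rank1 = np.outer(vecF(V), vecF(U))     # P(U kron V) = vec(V) vec(U)^T
```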
Complexity of the Kronecker-matrix arithmetics B. Khoromskij, Zurich 2010(L8) 208
We say that a matrix A ∈ M_{R,I×J} has the S-inherited
Kronecker tensor-product structure (S-structure) if A_ℓ^k ∈ S,
where S is a class of data-sparse matrices of complexity
O(n log n) (storage, MVM, MMM).

Complexity issues (let N = n^d):

• Data compression.
The storage for A is O(dRn) = O(dRN^{1/d}); hence we enjoy sub-linear complexity.

• Matrix-by-vector (MVM) complexity of Ax, x ∈ C^N.
For general x one has the linear cost O(dRN log n).
If x = x_1 ⊗ ... ⊗ x_d, x_i ∈ C^n, we again arrive at the sub-linear complexity
O(dRn log n) = O(RN^{1/d} log N^{1/d}).

• Matrix-by-matrix (MMM) complexity of AB, A ⊙ B and A ⊗ B.
The S-structure of the Kronecker factors leads to
O(dR²n log n) = O(R²N^{1/d} log N^{1/d}) operations instead of O(N³).
Literature to Lecture 8 B. Khoromskij, Zurich 2010(L8) 209
1. P.J. Davis: Circulant matrices. John Wiley & Sons, Inc., NY, 1979.
2. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Hierarchical Kronecker tensor-product approximation.
J. Numer. Math. v. 13, n. 2 (2005), 119-156.
3. B.N. Khoromskij: Structured data-sparse approximation to high order tensors arising from the deterministic
Boltzmann equation. Math. Comp. 76 (2007), 1292-1315.
4. C. Van Loan: The ubiquitous Kronecker product. J. of Comp. and Applied Math. 123 (2000) 85-100.
Lect. 9. Tensor approximation by nonlinear iteration B. Khoromskij, Zurich 2010(L9) 210
Outlook of Lecture 9.
1. Nonlinear approximation and tensor truncation.
2. Dual maximization problem for the rank-r Tucker
approximation.
3. ALS iteration for best rank-1 approximation.
4. Canonical rank-R approximation by ALS.
5. ALS iteration for the orthogonal Tucker approximation.
6. Two-level orthogonal Tucker approx. for canonical input.
7. Multigrid accelerated Tucker approx.: initial guess, most
important fibers, fast convergence.
8. Numerical illustrations in electronic structure calculations.
Nonlinear approximation in tensor format B. Khoromskij, Zurich 2010(L9) 211
Probl. 1. Best rank-R approximation of a high-order tensor
A ∈ V_n in the set C_R.

Probl. 2. Best rank-r orthogonal approximation of A ∈ V_n in the
Tucker format T_r.

Probl. 3. Best rank-(R, r) two-level orthogonal approximation of
A ∈ V_n in T_{C_R,r}.
The nonlinear approximation problem reads: given
A ∈ S_0 ⊂ V_n, find a best approximation T ∈ S to A,

T_r(A) := argmin_{T∈S} ‖A − T‖, where S ∈ {T_r, C_R, T_{C_R,r}}. (56)

The solution of this problem defines the tensor truncation
operator

T_r : S_0 ⊂ V_n → S.

Usually it is calculated only up to a certain accuracy ε > 0.
Dual maximisation problem B. Khoromskij, Zurich 2010(L9) 212
Rem. 9.1. Recall the Stiefel manifold of orthogonal n × r matrices,

S_{n,r} := {Y ∈ R^{n×r} : Y^T Y = I_{r×r}}.

Consider the minimization problem: for A ∈ S_0 ⊂ V_n,

A_(r) = argmin_{T∈T_r} ‖A − T‖_{V_n}. (57)

Lem. 9.1. (quadratic convergence of the norm). Let
A_(r) = β ×_1 V^(1) ×_2 V^(2) ... ×_d V^(d) ∈ R^{I_1×...×I_d} solve the
minimisation problem (57). Then ‖β‖ = ‖A_(r)‖ ≤ ‖A‖.
Moreover, we have the "quadratic" error bound for the norm,

(‖A‖ − ‖A_(r)‖)/‖A‖ ≤ ‖A_(r) − A‖² / ‖A‖². (58)

Sketch of proof. The Tucker orthogonality implies ‖β‖ = ‖A_(r)‖. Relation
(57) is merely a linear least-squares problem w.r.t. β ∈ R^{r_1×...×r_d},

g(β) := ⟨A, A⟩ − 2⟨A, β ×_1 V^(1) ×_2 ... ×_d V^(d)⟩ + ⟨β, β⟩ → min. (59)
Dual maximisation problem B. Khoromskij, Zurich 2010(L9) 213
The corresponding minimisation condition

g(β + δβ) − g(β) ≥ 0 ∀ δβ ∈ R^{r_1×...×r_d}

leads to the following equations for the minimiser,

−⟨A, δβ ×_1 V^(1) ×_2 ... ×_d V^(d)⟩ + ⟨β, δβ⟩ = 0 ∀ δβ ∈ R^{r_1×...×r_d},

⟨−A ×_1 V^(1)T ×_2 ... ×_d V^(d)T + β, δβ⟩ = 0 ∀ δβ ∈ R^{r_1×...×r_d},

β − A ×_1 V^(1)T ×_2 ... ×_d V^(d)T = 0. (60)

Next we derive

‖A_(r) − A‖² = ‖A_(r)‖² − 2⟨β ×_1 V^(1) ×_2 ... ×_d V^(d), A⟩ + ‖A‖²
= ‖A_(r)‖² + ‖A‖² − 2⟨β, A ×_1 V^(1)T ×_2 ... ×_d V^(d)T⟩
= ‖A‖² − ‖β‖²,

hence it follows that ‖A‖² − ‖A_(r)‖² = ‖A_(r) − A‖², and

(‖A‖ − ‖A_(r)‖)/‖A‖ = ‖A_(r) − A‖² / ((‖A_(r)‖ + ‖A‖)‖A‖) ≤ ‖A_(r) − A‖² / ‖A‖².
Best orthogonal rank-(r1, ..., rd) Tucker approximation B. Khoromskij, Zurich 2010(L9) 214
Thm. 9.2. The minimisation problem (57) is equivalent to
the dual maximisation problem

[U^(1), ..., U^(d)] = argmax ‖A ×_1 V^(1)T ×_2 ... ×_d V^(d)T‖²_F (61)

over the product of (compact) Stiefel manifolds,

V^(ℓ) = [v_ℓ^1 ... v_ℓ^{r_ℓ}] ∈ S_{n_ℓ,r_ℓ}, ℓ = 1, ..., d.

For given maximizing matrices U^(m) (m = 1, ..., d), the tensor
β minimising (57) is represented by

β = A ×_1 U^(1)T ×_2 ... ×_d U^(d)T ∈ R^{r_1×...×r_d}. (62)

Under the compatibility condition

r_m ≤ r̄_m := r_1 ... r_{m−1} r_{m+1} ... r_d, m = 1, ..., d, (63)

there is at least one solution of (61).
Best orthogonal rank-(r1, ..., rd) Tucker approximation B. Khoromskij, Zurich 2010(L9) 215
Proof. The substitution of β from (60) into (59) leads to the
equivalent minimizing equation

⟨A, A⟩ − ⟨A ×_1 V^(1)T ×_2 ... ×_d V^(d)T, A ×_1 V^(1)T ×_2 ... ×_d V^(d)T⟩ → min,

which proves (61), while (60) yields (62).
For the size consistency of arising tensors, we require the
compatibility conditions (63). Then the dual maximisation
problem (61) posed on the compact manifold can be proven
to have at least one global maximum.
The rotational non-uniqueness of the maximizer in (61) can
be avoided if one solves this maximisation problem in a
product of the so-called Grassmann manifolds Gℓ, ℓ = 1, ..., d.
The latter is the factor space of Snℓ,rℓ w.r.t. the rotational
transforms.
Best rank-1 approximation B. Khoromskij, Zurich 2010(L9) 216
Now we look in more detail at the simplest special case of the
Tucker/canonical model, the best rank-1 approximation. It is an
important ingredient in typical multilinear algebra algorithms.

To derive the corresponding Lagrange equations, we notice that, due to
the normalisation, ‖A_(1)‖² = β² with A_(1) = β V^(1) ⊗ ... ⊗ V^(d), so the dual
problem of maximising the generalised Rayleigh quotient over the
unit-norm vectors (which eliminates the scalar β) reads as

|A ×_1 V^(1)T ×_2 ... ×_d V^(d)T|² − Σ_{ℓ=1}^{d} λ^(ℓ) (‖V^(ℓ)‖² − 1) → max. (64)

For any solution of this problem, the corresponding scalar β can be
chosen as β = A ×_1 V^(1)T ×_2 ... ×_d V^(d)T.

Differentiating (64) w.r.t. V^(m) (1 ≤ m ≤ d) leads to the equations

β A ×_1 V^(1)T ... ×_{m−1} V^(m−1)T ×_{m+1} V^(m+1)T ... ×_d V^(d)T = λ^(m) V^(m),

which imply λ^(m) = β².
Best rank-1 approximation B. Khoromskij, Zurich 2010(L9) 217
Finally, the Lagrange equations read as

A ×_1 V^(1)T ... ×_{m−1} V^(m−1)T ×_{m+1} V^(m+1)T ... ×_d V^(d)T = β V^(m),

A ×_1 V^(1)T ×_2 ... ×_d V^(d)T = β,

‖V^(m)‖ = 1 (1 ≤ m ≤ d).
The above system of Lagrange equations can be solved by an alternating
least squares (ALS) algorithm.
ALS algorithm for rank-1 approximation (see Alg. BTA below).
At each iterative step an approximant to the scalar β and the estimate of
vectors V (m) (m = 1, ..., d) are optimised, while the rest vector-components
with ℓ 6= m are kept constant.
The ALS method for the best rank-1 approximation is proven to have a
locally linear convergence rate ([Zhang, Golub 2001]). Newton-type
methods provide locally quadratic convergence.
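A minimal ALS sketch for the best rank-1 approximation of a 3-tensor (Python/NumPy; random data and sizes are illustrative). Each step contracts A with all but one vector and normalises, which is exactly the first Lagrange equation above; the final check uses the exact identity ‖A − A_(1)‖² = ‖A‖² − β², valid whenever β is the projection coefficient onto the unit-norm rank-1 direction.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10
A = rng.standard_normal((n, n, n))

def contract_except(A, v, m):
    # A x_l v^(l)T over all modes l != m, leaving a vector in mode m
    modes = 'ijk'
    sub = 'ijk,' + ','.join(modes[l] for l in range(3) if l != m) + '->' + modes[m]
    return np.einsum(sub, A, *[v[l] for l in range(3) if l != m])

v = [np.ones(n) / np.sqrt(n) for _ in range(3)]
for sweep in range(30):                       # ALS sweeps
    for m in range(3):
        w = contract_except(A, v, m)
        v[m] = w / np.linalg.norm(w)          # optimise mode m, others fixed

beta = np.einsum('ijk,i,j,k->', A, *v)        # projection coefficient
A1 = beta * np.einsum('i,j,k->ijk', *v)       # rank-1 approximant
err2 = np.linalg.norm(A - A1) ** 2
```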
The rank-R canonical approximation by ALS iteration B. Khoromskij, Zurich 2010(L9) 218
Let V^(m) ∈ R^{n×R} (m = 1, ..., d) and β = {β_k}_{k=1}^{R} be the side
(factor) matrices with normalised column vectors, and the
respective weights, of the rank-R approximant T ∈ C_R of A.
ALS algorithm for rank-R approximation. [Carroll, Chang ’70]
Given initial matrices V (m) (m = 1, 2, ..., d), the ALS scheme
fixes V (m) (m = 2, ..., d) to solve (317) for V (ℓ), ℓ = 1, then
fixes V (m) (m = 1, 3, ..., d) to solve (317) for V (ℓ), ℓ = 2, and so
on for ℓ = 1, 2, ..., d, and continues to repeat such a global
iteration until some convergence or stopping criterion is
satisfied.

Having fixed all but one side matrix, the numerical task
reduces to a linear least-squares problem.
– ALS algorithm is simple to implement,
– but convergence may be very slow.
The rank-R canonical approximation by ALS iteration B. Khoromskij, Zurich 2010(L9) 219
For example, for d = 3 we can write Iter. 1 for ℓ = 1 as

T_(1) = V (V^(3) ⊡ V^(2))^T, where V = V^(1) · diag{β}.

The minimization problem takes the form (in the Frobenius norm)

min_V ‖A_(1) − V (V^(3) ⊡ V^(2))^T‖²,

and the optimal solution (minimizer) is then given by

V = A_(1) [(V^(3) ⊡ V^(2))^T]^†,

where B^† denotes the Moore-Penrose pseudo-inverse of B.
The update for β is computed by normalising the columns of V.

Notice that computing the pseudo-inverse of B = (V^(3) ⊡ V^(2))^T reduces
to the pseudo-inverse of a Gram matrix of size only R × R, since
B^† = (V^(3) ⊡ V^(2)) [(V^(3)T V^(3)) ⊙ (V^(2)T V^(2))]^†.
Generalization to an arbitrary d ≥ 3 is similar, [Kolda, Bader ’09].
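One such ALS step for d = 3 can be sketched as follows (Python/NumPy; ⊡ implemented as a column-wise Kronecker product; sizes are illustrative). We build an exactly rank-R tensor, so the least-squares update with the other two factors fixed must recover V^(1); the second formula is the Gram-matrix form involving only an R × R pseudo-inverse.

```python
import numpy as np

rng = np.random.default_rng(4)
n, R = 6, 3
V1, V2, V3 = (rng.standard_normal((n, R)) for _ in range(3))
A = np.einsum('ir,jr,kr->ijk', V1, V2, V3)        # exactly rank-R tensor

# mode-1 unfolding A_(1): columns indexed by (j, k), with j fastest
A1 = np.transpose(A, (0, 2, 1)).reshape(n, n * n)

# Khatri-Rao product V3 (kr) V2: column r = kron(V3[:, r], V2[:, r])
KR = np.einsum('kr,jr->kjr', V3, V2).reshape(n * n, R)

# least-squares update V = A_(1) [(V3 kr V2)^T]^+
V_direct = A1 @ np.linalg.pinv(KR.T)
# equivalent Gram form: only an R x R pseudo-inverse is needed
G = (V3.T @ V3) * (V2.T @ V2)                     # Hadamard product of Gram matrices
V_gram = A1 @ KR @ np.linalg.pinv(G)
```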
Best rank-rTucker approximation (BTA) B. Khoromskij, Zurich 2010(L9) 220
The ALS iteration to compute BTA is also known as higher-order
orthogonal iteration (HOOI) [De Lathauwer, De Moor, J. Vandewalle ’00].
Algorithm BTA (V_n → T_{r,n}). Given the input tensor A ∈ V_n.

1. Compute an initial guess V_0^(ℓ) (ℓ = 1, ..., d) for the ℓ-mode
side-matrices by the "truncated" SVD applied to the n × n^{d−1} unfolding
matrices A_(ℓ), i.e. HOSVD (cost O(n^{d+1})).

2. For each q = 1, ..., d, with fixed side-matrices V^(ℓ) ∈ R^{n×r_ℓ}, ℓ ≠ q,
the ALS iteration optimises the q-mode matrix V^(q) by computing
the dominating r_q-dimensional subspace (truncated SVD) of the
unfolding matrix B_(q) ∈ R^{n×r̄_q}, r̄_q = r_1 ... r_{q−1} r_{q+1} ... r_d = O(r^{d−1}),
corresponding to the q-mode contracted product

B = A ×_1 V^(1)T ×_2 ... ×_{q−1} V^(q−1)T ×_{q+1} V^(q+1)T ... ×_d V^(d)T.

Each iteration has the cost O(d r^{d−1} n max{r^{d−1}, n}).

3. Check the stopping criteria.

4. Compute the core β as the representation coefficients of the
orthoprojection of A onto ⊗_{ℓ=1}^{d} span{v_ℓ^ν}_{ν=1}^{r_ℓ} (cost O(r^d n)),

β = A ×_1 V^(1)T ×_2 ... ×_d V^(d)T ∈ R^{r_1×...×r_d}.
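The HOOI/ALS loop of Algorithm BTA can be sketched for d = 3 as follows (Python/NumPy; a near-low-rank random tensor and all sizes are illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 12, 3
# random tensor that is close to multilinear rank (3,3,3)
A = np.einsum('ir,jr,kr->ijk', *[rng.standard_normal((n, r)) for _ in range(3)])
A += 1e-6 * rng.standard_normal((n, n, n))

def unfold(T, m):
    return np.moveaxis(T, m, 0).reshape(T.shape[m], -1)

def mode_mult(T, M, l):
    # T x_l M: contract mode l of T with the columns of M
    return np.moveaxis(np.tensordot(M, T, axes=(1, l)), 0, l)

# Step 1: HOSVD initial guess
V = [np.linalg.svd(unfold(A, m), full_matrices=False)[0][:, :r] for m in range(3)]
# Step 2: ALS (HOOI) sweeps
for sweep in range(5):
    for q in range(3):
        B = A
        for l in range(3):
            if l != q:
                B = mode_mult(B, V[l].T, l)
        V[q] = np.linalg.svd(unfold(B, q), full_matrices=False)[0][:, :r]
# Step 4: core and reconstruction
core = A
for l in range(3):
    core = mode_mult(core, V[l].T, l)
Ar = core
for l in range(3):
    Ar = mode_mult(Ar, V[l], l)
rel = np.linalg.norm(Ar - A) / np.linalg.norm(A)
```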
Two-level BTA for canonical input B. Khoromskij, Zurich 2010(L9) 221
Conclusion: Algorithm BTA (V_n → T_{r,n}) applies for moderate d and
moderate n because of the exponential scaling in d.

Consider approximation in the two-level Tucker-canonical format.

In the iterative solution of multi-dimensional PDEs, the typical situation
arises where the target tensor is already presented in the rank-R
canonical format, A ∈ C_{R,n}, but with large R and large n. In this case a
two-level approximation scheme can be applied,

C_{R,n} → T_{C_R,r} → T_{C_R′,r}. (65)

On Level I, the best orthogonal Tucker approximation is applied to the
C_{R,n} input, so that the resultant Tucker core is represented in the
C_{R,r} format. On Level II, the "small-size" Tucker core in C_{R,r} is
approximated by an element of C_{R′,r} with R′ ≪ R.

The next statement gives the result on the solvability and structure of
the Level-I scheme in (65), and provides the key to an efficient
numerical implementation.
Two-level BTA for canonical input B. Khoromskij, Zurich 2010(L9) 222
Thm. 9.3. (Canonical-to-Tucker approximation, [4]).

(a) For A = Σ_{ν=1}^{R} ξ_ν u_1^ν ⊗ ... ⊗ u_d^ν ∈ C_{R,n}, the minimisation problem

A ∈ C_{R,n} ⊂ V_n : A_(r) = argmin_{T∈T_{r,n}} ‖A − T‖_{V_n}, (66)

is equivalent to the dual maximisation problem

[V^(1), ..., V^(d)] = argmax_{W^(ℓ)∈S_{n,r_ℓ}} ‖ Σ_{ν=1}^{R} ξ_ν (W^(1)T u_1^ν) ⊗ ... ⊗ (W^(d)T u_d^ν) ‖².

(b) The compatibility condition simplifies to

r_ℓ ≤ rank(U^(ℓ)) with U^(ℓ) = [u_1^(ℓ) ... u_R^(ℓ)] ∈ R^{n×R},

ensuring the solvability of the dual problem above.
The maximizer is given by orthogonal matrices V^(ℓ) = [v_1^(ℓ) ... v_{r_ℓ}^(ℓ)] ∈ R^{n×r_ℓ},
computed as in Alg. BTA, with the modification HOSVD → RHOSVD.

(c) The minimiser in (66) is then calculated by the orthoprojection

A_(r) = Σ_{k=1}^{r} µ_k v_{k_1}^(1) ⊗ ... ⊗ v_{k_d}^(d),
µ = Σ_{ν=1}^{R} ξ_ν (V^(1)T u_ν^(1)) ⊗ ... ⊗ (V^(d)T u_ν^(d)) ∈ C_{R,r}.
Two-level BTA for canonical input B. Khoromskij, Zurich 2010(L9) 223
Sketch of Proof. (a) The generic dual maximization problem
(61) with A ∈ C_{R,n} takes the above form due to the relation

⟨v_1^{k_1} ⊗ ... ⊗ v_d^{k_d}, A⟩ = Σ_{ν=1}^{R} ξ_ν ⟨v_1^{k_1}, u_1^ν⟩ ... ⟨v_d^{k_d}, u_d^ν⟩.

(b) The compatibility condition ensures the size-consistency
of all matrix unfoldings.

(c) The formula for A_(r) is a special case of the general
orthoprojection representation of the Tucker core.

Notice that the approximation error of the minimizer A_(r) is
given by Thm. 7.3 (Lect. 7); see also Rem. 9.2 below.
Algorithm C BTA, complexity bound B. Khoromskij, Zurich 2010(L9) 224
Algorithm C_BTA (C_{R,n} → T_{C_R,r}). Given A ∈ C_{R,n}, an iteration
parameter k_max, and the rank parameter r.

1. For ℓ = 1, ..., d, compute the truncated SVD of U^(ℓ) to
obtain orthogonal matrices Z_0^(ℓ) ∈ R^{n×r_ℓ}, representing the
rank-r_ℓ RHOSVD approximation of the ℓ-mode dominating
subspaces (cost O(dRn min{R, n})).

2. With the initial guess Z_0^(ℓ) (ℓ = 1, ..., d) for the ℓ-mode orthogonal
matrices, perform k_max ALS iterations as at Step 2 of the
general Alg. BTA to obtain the maximizer V^(ℓ) ∈ R^{n_ℓ×r_ℓ},
ℓ = 1, ..., d (cost O(d r^{d−1} n min{r^{d−1}, n}) per iteration).

3. Calculate the projections of U^(ℓ) onto the basis of computed
orthogonal vectors of V^(ℓ) as the matrix products V^(ℓ)T U^(ℓ)
(ℓ = 1, ..., d), at the cost O(drRn).
Algorithm C BTA, complexity bound B. Khoromskij, Zurich 2010(L9) 225
4. Using the columns of V^(ℓ)T U^(ℓ) (ℓ = 1, ..., d), calculate the
rank-R core tensor µ ∈ C_{R,r} as in Thm. 9.3, in O(drRn)
operations and with O(drR) storage.
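A sketch of the canonical-to-Tucker projection using the RHOSVD initial guess alone (Steps 1, 3, 4 without the ALS loop; Python/NumPy, illustrative sizes). With r_ℓ = rank(U^(ℓ)) the orthoprojection is exact, which also illustrates the compatibility condition of Thm. 9.3.

```python
import numpy as np

rng = np.random.default_rng(6)
n, R = 20, 6
U = [rng.standard_normal((n, R)) for _ in range(3)]   # canonical factors
xi = rng.standard_normal(R)                           # canonical weights

def c_to_tucker(U, xi, r):
    # RHOSVD: dominating subspaces from the SVDs of the n x R factor matrices
    V = [np.linalg.svd(Ul, full_matrices=False)[0][:, :r] for Ul in U]
    P = [Vl.T @ Ul for Vl, Ul in zip(V, U)]           # projected factors, r x R
    mu = np.einsum('r,ir,jr,kr->ijk', xi, *P)         # rank-R Tucker core in C_{R,r}
    return np.einsum('abc,ia,jb,kc->ijk', mu, *V)     # assembled Tucker approximant

A = np.einsum('r,ir,jr,kr->ijk', xi, *U)
A_exact = c_to_tucker(U, xi, R)    # r = rank(U^(l)) = R: projection is exact
A_trunc = c_to_tucker(U, xi, 4)    # r < R: a genuine Tucker truncation
```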
Rem. 9.2. Algorithm C_BTA (C_{R,n} → T_{C_R,r}) exhibits
polynomial cost in R, r, n,

O(dRn min{n, R} + d r^{d−1} n min{r^{d−1}, n}),

with exponential scaling in d. In the absence of Step 2 (i.e., when
RHOSVD already provides a satisfactory approximation), we have for
any d ≥ 2 a finite SVD-based scheme with the error bound as in
Thm. 7.3, see (4). If the R-term canonical representation is
only weakly redundant, then ‖ξ‖ is of the same order as
‖A‖, and the relative error of RHOSVD becomes as good as
that of HOSVD.
Old and new Tucker-type approximation methods B. Khoromskij, Zurich (L9) 226
1. Best orthogonal Tucker approximation for general V_n input.

2. ALS-based rank reduction methods C_{R,n} → C_{r,n}, r < R.

3. Two-level version of BTA.
→ Polynomial cost O(d(R + rq)n min{n, R}) for C_{R,n} input.

4. Multigrid rank-r Tucker approximation (MG-BTA):
→ Robustness: fast convergence of the ALS iteration ensured by a
good initial guess on all representation grids.
→ Linear scaling for C_{R,n} input, O(dRrn).
→ Complexity reduction n^{d+1} → n^d for full-format input.
→ Efficient rank-structured tensor realization of the
convolution operator in R^d on large spatial grids (e.g.,
n^{⊗3}, n ≤ 3.2 · 10^4), to be discussed in the following lectures.
Improved RHOSVD for function related tensors B. Khoromskij, Zurich (L9) 227
Multigrid accelerated method, [BNK, V. Khoromskaia '08]: [4].

Main idea:
– Solve the sequence of approximation problems for A_n = A_{n_m}
with n = n_m := n_0 2^m, m = 0, 1, ..., M; perform the (R)HOSVD analysis
of A_{n_0} only on the coarse grid.
– Use the coarse-to-fine approximation of dominating subspaces.
– Find the "most important fibers" (MIFs) of the ℓ-mode unfolding
matrices on the coarse level(s), via the maximal energy principle.
– Run the ALS iteration on the reduced data set, choosing only a
small set of the most representative MIFs of the ℓ-mode unfolding
matrices.
Multilevel iteration with learning the data structure B. Khoromskij, Zurich 2010(L9) 228
Solving problems on a sequence of grids, n = n_0, ..., n_M:

1. The equidistant tensor grid ω_{d,n} := ω_1 × ω_2 × ... × ω_d, where

ω_ℓ := {−A + (k − 1)h : k = 1, ..., n + 1} (ℓ = 1, ..., d),

with mesh-size h = 2A/n, n = n_0 2^m, m = 0, 1, ..., M. A set of collocation
points

x_k ∈ Ω ⊂ R^d, k ∈ I := {1, ..., n}^d,

is located at the midpoints of the grid cells numbered by k ∈ I.

2. Given a continuous multivariate function f : Ω → R, the target tensor
A_n = [a_{n,k}] ∈ R^I is defined as the trace of f on the set {x_k},

a_{n,k} = f(x_k), k ∈ I.

3. An "accurate" tensor-product prolongation operator I_{m−1→m} from the
coarse to the fine grid (interpolation by piecewise linear/cubic splines).

4. Transfer of the initial guess and of the positions of the most
important fibers (MIFs) from coarse to fine grids.
Algorithm MG C BTA B. Khoromskij, Zurich 2010(L9) 229
Algorithm MG_C_BTA (C_{R,n_M} → T_{C_R,r}). Multigrid accelerated
canonical-to-Tucker approximation. Set r = (r, ..., r) for ease of presentation.

1. Given A_m ∈ C_{R,n_m}, corresponding to a sequence of grid parameters
n_m := n_0 2^m, m = 0, 1, ..., M. Fix a structural constant p = O(1) (i.e.
pr ≪ r^{d−1}), an iteration parameter k_max, and the Tucker rank r.

2. For m = 0, solve C_BTA (C_{R,n_0} → T_{C_R,r}) and compute the index set
J_{q,p}(n_0) ⊂ J_{r̄_q} via identification of the MIFs in the matrix
unfolding B_(q), q = 1, ..., d, using the maximum energy principle applied
to the q-mode unfolding of the Tucker core, β_(q) = U^(q)T B_(q) ∈ R^{r_q×r̄_q}.
Figure 20: d = 3: Finding MIFs in the coarse-level core β_(q), q = 1, for the rank-R initial
data on the coarse grid n_0 = (n_1, n_2, n_3). For explanatory reasons, B_(q) is presented in
tensor form.
Algorithm MG C BTA B. Khoromskij, Zurich 2010(L9) 230
3. For m = 1, ..., M, perform the cascadic MGA Tucker approximation by
the restricted ALS iteration:

3a) Compute the initial orthogonal side matrices on level m by
interpolation from level m − 1 (say, using cubic splines),

V^(q) = V_m^(q) = I_{m−1→m}(V_{m−1}^(q)), q = 1, ..., d.

3b) For each q = 1, ..., d, fix V^(ℓ) (ℓ = 1, ..., d, ℓ ≠ q) and perform:

→ Compute the matrix products V^(ℓ)T U^(ℓ), ℓ = 1, ..., d, ℓ ≠ q, and build
the "restricted" q-mode matrix unfolding B_(q,p),

B_(q,p) = B_(q)|_{J_{q,p}(n_0)} ∈ R^{n_m×pr},

by calculating pr columns of the complete unfolding matrix B_(q) ∈ R^{n_m×r̄_q}.

→ Update the orthogonal matrix V^(q) = V_m^(q) ∈ R^{n_m×r} by computing
the r-dimensional dominating subspace of the "restricted" matrix
unfolding B_(q,p) (truncated SVD of an n_m × pr matrix).

4. If m = M, compute the rank-R core tensor β ∈ C_{R,r} as in Step 3 of
the basic algorithm C_BTA (C_{R,n} → T_{C_R,r}).
Algorithm MG C BTA B. Khoromskij, Zurich 2010(L9) 231
Figure 21: MIFs: selected projections of the fibers in the coarse-level cores for
computing U^(1) (left), U^(2) (middle) and U^(3) (right). The example corresponds to the
multigrid rank compression in the computation of the Hartree potential for the HO2
molecule; r = 14, p = 4.
Thm. 9.4. Algorithm MG_C_BTA (C_{R,n_M} → T_{C_R,r}) amounts to

O(dRrn_M + d p² r² n_M)

operations per ALS loop, plus the extra cost of the coarse-mesh
solver BTA (C_{R,n_0} → T_{C_R,r}). It requires O(drn_M + drR) storage
to represent the result.
Algorithm MG C BTA B. Khoromskij, Zurich 2010(L9) 232
Figure 22: Flow chart of Alg. MG C BTA for the rank-R target.
Complexity of Algorithm MG C BTA B. Khoromskij, Zurich 2010(L9) 233
Proof. Step 3a) requires O(drn_m) operations and memory.
Notice that for large m we have pr ≤ n_m; hence the complexity of
the second part of Step 3b) is dominated by O(dRrn_m + p²r²n_m) per
iteration loop, and the same holds for the first part of Step 3b).

The rank-R representation of β ∈ C_{R,r} requires O(drRn_m)
operations and O(drR) storage. Summing up over the levels
m = 0, ..., M proves the result.

Thm. 9.4 shows that Algorithm MG_C_BTA realizes a fast
rank-reduction method that scales linearly in d, n_M, R and r
on the refined levels.

Moreover, the complexity and error of the MGA Tucker
approximation can be controlled by the adaptive choice of the
governing parameters r, p, n_0 and k_max.
Numerics to Algorithm MG C BTA B. Khoromskij, Zurich 2010(L9) 234
Figure 23: Linear scaling in R and in n (left). Plot of SVD for the mode-1
matrix unfolding B(1,p), p = 4 (right).
Exer. 9.1. Find the maximal size of 3-tensors that are
tractable for Tucker approximation by the Tensor Toolbox. What
is the extrapolated CPU time for n = 2048, scaled by O(n⁴)?
Numerics to Algorithm MG C BTA B. Khoromskij, Zurich 2010(L9) 235
Figure 24: Approximation error of the multigrid Tucker approximation
vs. rank for the 3D Slater function e−‖x‖ (left), and for the third root of
electron density ρ1/3 of CH4 (right).
Rem. 9.3. The multigrid BTA method also applies to full-format tensors,
reducing the cost from n^{d+1} to n^d. For d = 3, an alternative approach to
T-HOSVD can be based on adaptive cross approximation [5].
Literature to Lecture 9 B. Khoromskij, Zurich 2010(L9) 236
1. J.D. Carroll and J. Chang: Analysis of individual differences in multidimensional scaling.
Psychometrika 35 (1970), 283-319.
2. L. De Lathauwer, B. De Moor, J. Vandewalle, On the best rank-1 and rank-(R1, ..., RN )
approximation of higher-order tensors. SIAM J. Matrix Anal. Appl., 21 (2000) 1324-1342.
3. T.G. Kolda, and B.W. Bader: Tensor decompositions and applications.
SIAM Review, 51/3 (2009), 455-500.
4. B.N. Khoromskij and V. Khoromskaia: Multigrid accelerated tensor approximation of function related
multi-dimensional arrays. SIAM J. Sci. Comp. 31(4), 2009, 3002-3026.
5. I. Oseledets, D. Savostianov, and E. Tyrtyshnikov: Tucker dimensionality reduction
of three-dimensional arrays in linear time. SIMAX, 30(3), 939-956 (2008).
6. T. Zhang and G.H. Golub: Rank-one approximation to high order tensors. SIAM J. Matrix Anal. Appl.
23 (2001), 534-550.
Lect. 10. Tensor Representation of Matrix-Valued Functions B. Khoromskij, Zurich 2010(L10) 237
Outlook of Lecture 10.
1. Examples of Matrix-Valued Functions (MVF), see [3] for
more detail.
2. Standard integral or iterative representations of MVFs.
3. MVFs as solution operators (SOs) of PDEs and matrix equations.

4. Quadrature representation of the elliptic resolvent.

5. Quadrature error controls the error in operator norm.

6. Matrix exponential.

7. Newton iteration to compute A^{−1}.

8. Newton-Schultz iteration for sign(A) and √A.
Examples of matrix-valued functions B. Khoromskij, Zurich 2010(L10) 238
MVFs of an elliptic operator L (resp. matrix A) arise as the
solution operators of elliptic, parabolic, hyperbolic equations, etc.
Tensor-product representations apply to several classes of MVFs:

F(L) := L^{−α}, α > 0 — elliptic inverse, preconditioning;
F(L) := e^{−tL} — parabolic solution operator;
F(L) := cos(t√L) L^{−k}, k ∈ N — regularized hyperbolic SO;
F(L) := ∫_0^∞ e^{−tL*} G e^{−tL} dt — SO of the matrix Sylvester equation;
F(L) := sign(L) — control theory, DFT.

Both the discrete elliptic inverse A^{−1} and the matrix
exponential e^{−tA} play a key role in numerical PDEs.

Usually MVFs are nonlocal; hence tensor-product formats are
useful for their efficient representation even for d = 3.
Constructive representation of MVFs B. Khoromskij, Zurich 2010(L10) 239
There are different methods to represent MVFs (L = A):

• The case of diagonalisable matrices: for A = T^{−1}DT with
D = diag{d_1, ..., d_n} diagonal, one defines

F(A) = T^{−1} F(D) T, F(D) = diag{F(d_1), ..., F(d_n)}.

• The Dunford-Cauchy integral for analytic functions,

F(A) = (1/2πi) ∫_Γ F(z)(zI − A)^{−1} dz, Γ ⊂ C, σ(A) ⊂ int(Γ).

• Laplace-type transforms,

F(A) = ∫_R f(t) e^{−tA} dt.

• Trigonometric integral representations,

F(A) = ∫_R [a(t) cos(tA) + b(t) sin(tA)] dt.

• Polynomial expansions and/or nonlinear iterations.
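For diagonalisable A the first recipe is immediate. A sketch (Python/NumPy) computing F(A) = e^{−A} for a symmetric positive definite A via F(A) = T F(D) T^{−1}, checked against a truncated power series; the matrix and the series length are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
S = rng.standard_normal((n, n))
A = S @ S.T / n + np.eye(n)        # symmetric positive definite, modest norm

# F(A) = T F(D) T^{-1}; for symmetric A, T is orthogonal (eigh)
lam, T = np.linalg.eigh(A)         # A = T diag(lam) T^T
F_diag = T @ np.diag(np.exp(-lam)) @ T.T

# reference: truncated series exp(-A) = sum_k (-A)^k / k!
F_series, term = np.eye(n), np.eye(n)
for k in range(1, 60):
    term = term @ (-A) / k
    F_series = F_series + term
```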
MVFs as solution operators of PDEs and matrix eq. B. Khoromskij, Zurich 2010(L10) 240
Ex. 10.1. The solution operator of the initial value parabolic
problem

∂u/∂t + Lu(t) = 0, u(0) = u_0 ∈ X, (67)

is given by

T(t; L) = e^{−tL} = ∫_Γ e^{−zt}(zI − L)^{−1} dz,

where L is an elliptic (say, sectorial) operator (e.g., L = −∆)
in a Hilbert space X, u(t) is a vector-valued function
u : R_+ → X, and Γ envelops σ(L).

Given the initial vector u_0, we have u(t) = T(t; L)u_0.

A simple example of a parabolic PDE is the heat equation

∂u/∂t − ∆u = 0, u(0) = u_0, u : R_+ × [0, 1]^d → R,

with the corresponding boundary conditions.
MVFs as solution operators of PDEs and matrix eq. B. Khoromskij, Zurich 2010(L10) 241
Ex. 10.2. The initial-value problem for the second-order
differential equation with an operator coefficient,

u″(t) + Lu(t) = 0, u(0) = u_0, u′(0) = 0,

has the solution operator

C(t; L) := cos(t√L) = ∫_Γ cos(t√z)(zI − L)^{−1} dz

(the hyperbolic operator cosine family), with u(t) = C(t; L)u_0.
It represents the function-to-operator map cos(t√·) → C(t; L).

An example of a hyperbolic PDE is the classical wave equation

∂²u/∂t² − ∆_x u = 0, x ∈ R^d,

subject to the corresponding boundary and initial conditions.
MVFs as solution operators of PDEs and matrix eq. B. Khoromskij, Zurich 2010(L10) 242
Ex. 10.3. For the boundary value problem

d²u/dx² − Lu = 0, u(0) = 0, u(1) = u_1, (68)

in a Hilbert space X, the solution operator is the normalised
hyperbolic operator sine family

E(x; L) := (sinh(√L))^{−1} sinh(x√L) = ∫_Γ (sinh(x√z)/sinh(√z)) (zI − L)^{−1} dz,

so that u(x) = E(x; L)u_1.

A simple PDE of type (68) is the Laplace equation in a cylinder,

d²u/dx² + d²u/dy² = 0, x ∈ [0, 1], y ∈ [c, d],
u(0, y) = 0, u(1, y) = u_1(y).

Rem. 10.1. The representations in Ex. 10.1-10.3 allow one to avoid time
stepping (parallel-in-time computations).
MVFs as solution operators of PDEs and matrix eq. B. Khoromskij, Zurich 2010(L10) 243
Ex. 10.4. In the case of the Sylvester matrix equation

AX + XB = G (A, B, G ∈ R^{n×n} given),

the solution X ∈ R^{n×n} is given by

X = F(A, B)G := ∫_0^∞ e^{−tA} G e^{−tB} dt,

provided that A, B ensure the existence of the integral (cf. Lect. 8).

The (nonlinear) Riccati matrix equation

AX + XA^⊤ + XFX = G,

where A, F, G ∈ R^{n×n} are given and X ∈ R^{n×n} is the unknown
matrix, can be solved by Newton's iteration. At each iteration
step a Lyapunov equation has to be solved (X_k → X):

(A − FX_k)X_{k+1} + X_{k+1}(A − FX_k)^⊤ = −X_k F X_k + G.
MVFs as solution operators of PDEs and matrix eq. B. Khoromskij, Zurich 2010(L10) 244
Ex. 10.5. Let A ∈ R^{n×n} be a matrix whose spectrum σ(A)
does not intersect the imaginary axis. The matrix function
F(A) = sign(A) is defined by

sign(A) := (1/πi) ∫_{Γ+} (zI − A)^{−1} dz − I, (69)

with Γ+ being any simply connected closed curve in C whose
interior contains all eigenvalues of A with positive real part.

A tensor representation of the MVF sign(A) (with rank depending
on the grid size) can be based on an efficient quadrature for
the integral

sign(A) = (1/c_f) ∫_{R_+} (f(tA)/t) dt.

The MVF sign(A) arises in optimization theory, in DFT in
computational quantum chemistry, and in linear algebra.
MVFs as solution operators of PDEs and matrix eq. B. Khoromskij, Zurich 2010(L10) 245
Ex. 10.6. A negative fractional power of A is represented by

A^{−σ} = (1/Γ(σ)) ∫_0^∞ t^{σ−1} e^{−tA} dt, σ > 0, (70)

provided that the integral exists.

If A = −∆, then (70) is of particular interest in the cases:

(a) σ = 1: the Laplacian inverse,
(b) σ = 1/2: preconditioning for the Laplace-Beltrami operator
(−∆)^{1/2}, and for the hypersingular integral operator in BEM,
(c) σ = 2: the inverse biharmonic operator.

A positive fractional power of A, say A^α with 0 < α < 1, can be
represented by the simple factorisation

A^α = A · A^{−(1−α)}.
Classes of elliptic operators B. Khoromskij, Zurich 2010(L10) 246
The elliptic operator A : V → V′ with V = H_0^1(Ω), V′ = H^{−1}(Ω),

A = Σ_{j=1}^{d} [ −(∂/∂x_j) a_j(x_j)(∂/∂x_j) + b_j(x_j)(∂/∂x_j) + c_j(x_j) ],

is supposed to have "separable" coefficients. The associated
bilinear form (with c(x) = Σ_j c_j(x_j)),

a(u, v) = ∫_Ω [ Σ_{j=1}^{d} a_j(x)(∂u/∂x_j)(∂v/∂x_j) + Σ_{j=1}^{d} b_j(x)(∂u/∂x_j)v + c(x)uv ] dx,

with a : V × V → R, is assumed to be continuous and V-elliptic:

|a(u, v)| ≤ C‖u‖_V ‖v‖_V, ℜe a(v, v) ≥ δ_0 ‖v‖²_V, δ_0 > 0.

In the tensor-product setting we have (x_1, ..., x_d) ∈ Ω := (0, 1)^d ⊂ R^d.
Classes of elliptic operators B. Khoromskij, Zurich 2010(L10) 247
Let X = L²(Ω). Then the corresponding elliptic operator L and
its discrete counterpart A (say, the FEM/FD stiffness matrix
corresponding to L) satisfy

‖(zI − A)^{−1}‖_{X←X} ≤ 1/(|z| sin(θ_1 − θ)) ∀ z ∈ C : θ_1 ≤ |arg z| ≤ π, (71)

for any θ_1 ∈ (θ, π), where cos θ = δ_0/C.

In the case of discrete elliptic operators A, the bound (71) on
the matrix resolvent is valid uniformly in the mesh-size h.
The variation in the r.h.s. of (71) is proportional to cond(A).

We consider tensor approximations to

A^{−1}, exp(−tA), √A, and sign(A) (the latter is not analytic).
Tensor approximation of MVF quadratures for Laplace transform B. Khoromskij, Zurich 2010(L10) 248
Assume that for a given f(ρ), ρ ∈ [1, R], and ε_R > 0, there is an
accurate r-term approximation by an exponential sum,

f_r(ρ) := Σ_{k=1}^{r} a_k e^{−b_k ρ} s.t. |f(ρ) − f_r(ρ)| ≤ ε_R, ρ ∈ [1, R]. (72)

The question is: how accurately does the ansatz f_r(A) represent
the matrix-valued function f(A)?

We consider two cases:

(A) A real-diagonalisable matrix A, i.e., A = T^{−1}DT with a
diagonal D = diag{d_1, ..., d_n}, where d_i ∈ [1, R].

(B) The analytic function f has the Dunford-Cauchy integral
representation (Γ "envelopes" σ(A)):

f(A) = (1/2πi) ∫_Γ f(z)(zI − A)^{−1} dz.
Tensor approximation of MVF quadratures for Laplace transform B. Khoromskij, Zurich 2010(L10) 249
Lem. 10.1. In Case (A) we have

‖f(A) − f_r(A)‖ ≤ ‖T‖ ‖T^{−1}‖ ε_R.

In Case (B), let (72) hold with ε_R = g(ρ)ε_Γ, at least for
ρ = z ∈ Γ. Then we have

‖f(A) − f_r(A)‖ ≤ (ε_Γ/2π) max_{z∈Γ} |g(z)| ∫_Γ ‖(zI − A)^{−1}‖ d|z|.

In the case of a discrete elliptic operator A, we have

∫_Γ ‖(zI − A)^{−1}‖ d|z| ≤ C log(|λ_max|/|λ_min|), λ_max, λ_min ∈ σ(A),

where C depends on the ellipticity and continuity constants of
the related operator A.
Tensor approximation of MVF quadratures for Laplace transform B. Khoromskij, Zurich 2010(L10) 250
Proof: In Case (A), we readily obtain
‖f(A) − fr(A)‖ = ‖T−1 diagf1, ..., fnT‖with fi = f(di) − fr(di), which proves the statement. If T is
the unitary transform then ‖T‖ = ‖T−1‖ = 1.
In Case (B), there holds
‖f(A) − f_r(A)‖ = (1/2π) ‖∫_Γ [f(z) − Σ_{k=1}^{r} a_k e^{−b_k z}] (zI − A)^{−1} dz‖
≤ (ε_Γ/2π) ∫_Γ |g(z)| ‖(zI − A)^{−1}‖ d|z|,
which proves the general assertion. Finally, in the case of
discrete elliptic operators we choose Γ in such a way that
‖(zI − A)^{−1}‖ ≤ C/|z| (see (71)), to obtain
∫_Γ ‖(zI − A)^{−1}‖ d|z| ≤ C ∫_Γ d|z|/|z|.
Low tensor rank Laplacian inverse B. Khoromskij, Zurich 2010(L10) 251
Ex. 10.7. In the case of the FD negative Laplacian ∆_{(d)} on
H¹₀([0, 1]^d), for d = 2, 3, ..., the matrix ∆_{(d)}^{−1} is approximated in
the rank-R Kronecker format,
L_M := Σ_{k=−M}^{M} c_k ⊗_{ℓ=1}^{d} exp(−t_k ∆_{(ℓ)}) ≈ (∆_{(d)})^{−1}, ∆_{(ℓ)} = ∆ ∈ R^{n×n}
(t_k, c_k ∈ R), providing exponential convergence in R = 2M + 1.
In particular, taking
t_k = e^{kh}, c_k = h t_k, h = π/√M,
leads to the convergence rate
‖(∆_{(d)})^{−1} − L_M‖ ≤ C e^{−π√M}.
Hence the ε-rank of (∆(d))−1 is of order O(| log ε|2), uniformly
in d (compare with Ex. 8.1, Lect. 8).
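The quadrature above is easy to check numerically. The following sketch (in Python/NumPy rather than the course's MATLAB; the parameter M = 30 and the interval [1, 100] are arbitrary choices) compares the sum Σ c_k e^{−t_k ρ} with t_k = e^{kh}, c_k = h t_k against 1/ρ:

```python
import numpy as np

M = 30
h = np.pi / np.sqrt(M)
k = np.arange(-M, M + 1)
t = np.exp(k * h)          # quadrature nodes t_k = e^{kh}
c = h * t                  # weights c_k = h * t_k

# sinc-type quadrature for 1/rho = int_0^inf e^{-rho*t} dt
rho = np.linspace(1.0, 100.0, 500)
approx = (c[:, None] * np.exp(-t[:, None] * rho[None, :])).sum(axis=0)
err = np.max(np.abs(approx - 1.0 / rho))
```

The observed uniform error follows the e^{−π√M} rate with R = 2M + 1 = 61 terms.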
Low tensor rank Laplacian inverse B. Khoromskij, Zurich 2010(L10) 252
The matrix-vector multiplication of L_M with a rank-1 vector in
⊗_{ℓ=1}^{d} R^n takes O(dRn log n) op. by the diagonalization
exp(−t_k ∆_{(ℓ)}) = F_ℓ′ · D_ℓ · F_ℓ, D_ℓ = diag{e^{−t_k λ_1^{(ℓ)}}, ..., e^{−t_k λ_n^{(ℓ)}}},
where F_ℓ is the ℓ-mode sine-transform matrix of size n, and
λ_i^{(ℓ)} (i = 1, ..., n) are the eigenvalues of the 1D Laplacian ∆_{(ℓ)}.
This also allows a rank reduction scheme using the tensor-product FFT,
A^{−1} ≈ (⊗_{ℓ=1}^{d} F_ℓ^T) [Σ_{k=−M}^{M} c_k ⊗_{ℓ=1}^{d} diag{e^{−t_k λ_1^{(ℓ)}}, ..., e^{−t_k λ_n^{(ℓ)}}}] (⊗_{ℓ=1}^{d} F_ℓ).
Rem. 10.2. The above decomposition is similar to the sinc approximation of
the Hilbert tensor, applied to the n^{⊗d} tensor Λ = [1/(λ_{i_1}^{(1)} + ... + λ_{i_d}^{(d)})].
Exer. 10.1. Construct the fast Poisson solver on uniform grid over [0, 1]d,
by the rank decomposition of the Hilbert tensor Λ. Test the case f = 1,
d = 3, for different grid-size n and truncation ranks.
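A minimal sketch of such a solver (Python/NumPy instead of MATLAB; the grid size n = 8 and quadrature parameter M = 50 are arbitrary choices): the tensor Λ is inverted elementwise by the exponential sum, and the sine transform diagonalises the 1D FD Laplacian.

```python
import numpy as np

n, M = 8, 50
h = 1.0 / (n + 1)
i = np.arange(1, n + 1)
lam = (4.0 / h**2) * np.sin(np.pi * h * i / 2.0) ** 2         # 1D FD Laplacian eigenvalues
S = np.sqrt(2.0 * h) * np.sin(np.pi * h * np.outer(i, i))     # orthogonal sine transform, S = S^T

# exponential-sum approximation of 1/rho on [rho_min, inf), rho rescaled by rho_min
rho_min = 3.0 * lam.min()
hq = np.pi / np.sqrt(M)
k = np.arange(-M, M + 1)
t = np.exp(k * hq) / rho_min
c = hq * np.exp(k * hq) / rho_min

Lam = lam[:, None, None] + lam[None, :, None] + lam[None, None, :]
Lam_inv_sep = sum(ck * np.exp(-tk * Lam) for ck, tk in zip(c, t))  # separable 1/Lam

f = np.ones((n, n, n))                                         # right-hand side f = 1
fhat = np.einsum('ai,bj,ck,ijk->abc', S, S, S, f)
u_sep = np.einsum('ia,jb,kc,abc->ijk', S, S, S, Lam_inv_sep * fhat)
u_ref = np.einsum('ia,jb,kc,abc->ijk', S, S, S, fhat / Lam)    # exact spectral solve
rel = np.linalg.norm(u_sep - u_ref) / np.linalg.norm(u_ref)
```

The separable solve agrees with the exact spectral Poisson solve up to the quadrature accuracy e^{−π√M}.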
Matrix exponentials B. Khoromskij, Zurich 2010(L10) 253
Ex. 10.8. The matrix exponential can be defined and then
calculated by
exp(A) := Σ_{k=0}^{∞} (1/k!) A^k ≈ E_N := Σ_{k=0}^{N−1} (1/k!) A^k. (73)
This approximation converges exponentially (if N is large
enough, say, N ≥ e‖A‖),
‖E_N − exp(A)‖ ≤ Σ_{k=N}^{∞} (1/k!) ‖A‖^k ≤ C(‖A‖)/N! ≈ (e‖A‖/N)^N.
The Horner scheme to calculate (73) requires only N − 1
matrix multiplications:
A_N := I; for k = N − 1 downto 1 do A_k := (1/k) A_{k+1} A + I,
such that E_N = A_1.
If ‖A‖ > 1, the algorithm (73) may produce very large
intermediate terms!
Matrix exponentials B. Khoromskij, Zurich 2010(L10) 254
Recall that for commuting matrices A, B we have
exp(A + B) = exp(A) exp(B), in particular exp(A) = [exp(A/2)]².
Now the algorithm (73) can be modified as follows:
(a) Choose n such that 2^{−n} ‖A‖ ≤ 1.
(b) Compute B = exp(A/2^n) by algorithm (73).
(c) Compute exp(A) = B^{2^n} in n ≈ log₂(‖A‖) matrix squarings.
If B = exp(A/2n) can be represented in certain data-sparse
format (e.g., Kronecker product form) then truncating all the
intermediate products B2m, m = 1, ..., n, into the fixed format
S leads to the desired tensor representation of exp(A).
In this case, the truncation error analysis is an open question.
Modification to the case of many exponentials: lectures
below.
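The scaling-and-squaring modification (a)-(c) can be sketched as follows (Python/NumPy; the symmetric test matrix and N = 20 are arbitrary choices, and the reference value uses the spectral formula exp(A) = Q exp(D) Q^T for symmetric A):

```python
import numpy as np

def expm_taylor(A, N=20):
    # Horner evaluation of the truncated series E_N, cf. (73)
    E = np.eye(A.shape[0])
    for k in range(N - 1, 0, -1):
        E = E @ A / k + np.eye(A.shape[0])
    return E

def expm_scaled(A, N=20):
    # scaling and squaring: exp(A) = (exp(A/2^n))^{2^n} with 2^{-n}||A|| <= 1
    n = max(0, int(np.ceil(np.log2(max(np.linalg.norm(A, 2), 1.0)))))
    B = expm_taylor(A / 2**n, N)
    for _ in range(n):
        B = B @ B
    return B

rng = np.random.default_rng(0)
C = rng.standard_normal((6, 6))
A = (C + C.T) / 2
w, Q = np.linalg.eigh(A)
ref = Q @ np.diag(np.exp(w)) @ Q.T
err = np.linalg.norm(expm_scaled(A) - ref) / np.linalg.norm(ref)
```

Thanks to the scaling step, the Taylor part is only ever evaluated for ‖A/2^n‖ ≤ 1, where the series converges rapidly.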
Tensor truncated nonlinear iteration B. Khoromskij, Zurich 2010(L10) 255
Ex. 10.9. In some cases iterative schemes (with possible
recompression at each iteration) can be applied.
(A) An approximation to A−1: given X0 ∈ Rn×n, the
Newton-Schulz iteration
Xk+1 = Xk(2I −AXk), k = 1, 2, ... (74)
converges to A^{−1} locally quadratically (cf. analysis below).
Iteration (74) is nothing but the Newton method
Ψ′(Xk)(Xk+1 −Xk) = −Ψ(Xk)
for solving the nonlinear matrix equation
Ψ(X) := A−X−1 = 0.
In fact, Ψ(X + δ) − Ψ(X) = X^{−1}δ(X + δ)^{−1}, providing
Ψ′(X_k)(δ) = X_k^{−1} δ X_k^{−1}. Now (74) follows from
X_{k+1} − X_k = −X_k (A − X_k^{−1}) X_k.
Analysis of iterative schemes B. Khoromskij, Zurich 2010(L10) 256
Analysis of Newton-Schulz iteration (74) to compute A−1.
Denote the residual error by Ek = I −AXk, k = 0, 1, 2, . . .. It is
easy to see that
Xk+1 = Xk(I +Ek), k = 0, 1, 2, . . . ,
which implies (for k = 1, 2, . . .)
E_k = I − AX_{k−1}(I + E_{k−1}) = I − (I − E_{k−1})(I + E_{k−1}) = E_{k−1}². (75)
Applying (75) recursively, we find that
E_k = E_0^{2^k}, k = 1, 2, . . . . (76)
It is also clear that
A^{−1} − X_k = A^{−1}E_k = A^{−1}E_0^{2^k} = X_0(I − E_0)^{−1}E_0^{2^k}.
Analysis of iterative schemes B. Khoromskij, Zurich 2010(L10) 257
Under the assumption on the spectral radius of E_0,
ρ ≡ ρ[E_0] = max_j |λ_j| < 1,
where λ_j = λ_j(E_0) are the eigenvalues of E_0, we obtain that
the error E_k in (76) vanishes like ρ^{2^k}.
Rem. 10.3. The iteration (74) can be applied to any
preconditioned matrix B = R0A, where R0 is a spectrally
equivalent preconditioner to A so that σ(B) is uniformly
bounded in n. Assuming that both R0 and R0A already have
the S-structured representation, we then obtain the
approximate inverse of interest from
A−1 = (R0A)−1R0.
In some cases this approach provides the constructive proof
on the existence of the S-matrix inverse.
Analysis of iterative schemes B. Khoromskij, Zurich 2010(L10) 258
Let E0 = I −BX0. The requirement ρ[E0] < 1 can be achieved
under the following conditions.
Lem. 10.2. Let B have real eigenvalues in the interval
0 < m ≤ λ_j ≤ M, j = 1, 2, . . . , n. Let X_0(w) = wI; then ρ[E_0] < 1
for all w ∈ (0, 2/M). Moreover, if ρ(w) = ρ[E_0(w)], then there holds
ρ(w*) = min_{w∈(0, 2/M)} ρ(w) = (M − m)/(M + m) < 1, w* = 2/(M + m). (77)
Proof. This lemma is a reformulation of a standard
convergence result for the Richardson iteration.
Implementing (74) in the formatted S-arithmetics efficiently,
one can compute an S-approximation X_k to A^{−1} within
O(log log ε^{−1}) iterations, where ‖I − AX_k‖ ≤ ε.
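A small numerical illustration of Lem. 10.2 (Python/NumPy; the spectrum [m, M] = [1, 10] is an arbitrary choice): starting from X_0 = w* I, the residual ‖I − AX_k‖ decays like ρ^{2^k} with ρ = (M − m)/(M + m).

```python
import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
lam = np.linspace(1.0, 10.0, 8)          # spectrum in [m, M] = [1, 10]
A = Q @ np.diag(lam) @ Q.T

m, Mx = lam.min(), lam.max()
X = (2.0 / (Mx + m)) * np.eye(8)         # optimal scaled start X_0 = w* I, cf. (77)
res = []
for _ in range(8):
    X = X @ (2.0 * np.eye(8) - A @ X)    # Newton-Schulz step (74)
    res.append(np.linalg.norm(np.eye(8) - A @ X, 2))
```

After k steps the residual equals ρ^{2^k} (up to rounding), so eight iterations already reach the numerical floor.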
Iterative schemes to compute sign(A) and sqrt(A) B. Khoromskij, Zurich 2010(L10) 259
(B1) Newton-Schulz iteration scheme to approximate sign(A):
X_{k+1} = X_k + (1/2)[I − X_k²] X_k, X_0 = A/‖A‖₂. (78)
For diagonalisable matrices we have locally quadratic
convergence Xk → sign(A) (see the analysis below). Many
methods are presented in [3].
This scheme was already successfully applied in many-particle
calculations.
The above mentioned schemes (a) and (b) are especially
efficient in the case q = 2, since the optimal SVD or ACA
recompression in the (H)KT-format can be applied.
(B2) Newton’s method to calculate sign(A). The iteration
X_0 = A, X_{k+1} = (1/2)(X_k + X_k^{−1}) (79)
converges (locally quadratically) to sign(A).
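A quick check of (79) (Python/NumPy; the symmetric test spectrum is an arbitrary choice with eigenvalues well separated from zero):

```python
import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
lam = np.array([-3.0, -1.5, -0.5, 0.7, 2.0, 4.0])   # mixed-sign spectrum
A = Q @ np.diag(lam) @ Q.T

X = A.copy()
for _ in range(30):
    X = 0.5 * (X + np.linalg.inv(X))    # Newton iteration (79)

sign_A = Q @ np.diag(np.sign(lam)) @ Q.T
err = np.linalg.norm(X - sign_A, 2)
```

Since A is symmetric, sign(A) = Q sign(D) Q^T serves as the exact reference.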
Iterative schemes to compute sign(A) and sqrt(A) B. Khoromskij, Zurich 2010(L10) 260
The iterative calculation may not be very simple!
(C) Newton iteration to compute the square root A1/2 of the
symmetric positive definite matrix A: Given X0, the iteration
X_k ∆_k + ∆_k X_k = A − X_k², (80)
where ∆_k = X_{k+1} − X_k, converges to A^{1/2} quadratically
(locally). It requires solving a matrix Lyapunov equation.
This scheme can be considered as the Newton iteration to solve
the nonlinear matrix equation
Ψ(X) := A − X² = 0.
Clearly,
Ψ(X + δ) − Ψ(X) = −X∆ − ∆X,
so our iteration can be interpreted as the Newton method for
solving Ψ(X) = 0 (see Lect. 11 for the analysis of truncated
iterations).
Literature to Lecture 10 B. Khoromskij, Zurich 2010(L10) 261
1. I. Gavrilyuk, W. Hackbusch and B.N. Khoromskij: Tensor-Product Approximation to Elliptic and Parabolic
Solution Operators in Higher Dimensions. Computing 74 (2005), 131-157.
2. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Approximate Iteration for Structured Matrices.
J. Numer. Math. Vol. 13, No. 2 (2005), 119-156.
3. N.J. Higham: Functions of Matrices: Theory and Computation. SIAM, 2008.
URL: http://personal-homepages.mis.mpg.de/bokh
Lect. 11. Truncated iteration. Tensor convolution and other bilinear operations. B. Khoromskij, Zurich 2010(L11) 262
Outlook of Lecture 11.
1. Tensor truncated nonlinear iteration: attempt to the
convergence theory.
2. Example on SVD based truncation methods for A−1.
3. Newton type iterations for sign(A) and√A.
4. Multivariate convolution: error bound for projection
collocation scheme. Richardson extrapolation.
5. Discrete d-dimensional convolution in various tensor
formats.
6. Tensorization of Hadamard, scalar & contracted products.
7. Beyond the additive dimension splitting: what is the next
generation of tensor methods?
On convergence rate for fixed-point iteration B. Khoromskij, Zurich 2010(L11) 263
Let V be a normed space (e.g., n× n matrices) and consider
a function f : V → V . Assume that A ∈ V and B := f(A) can
be obtained by the locally convergent fixed-point iterations
Given X0 ∈ V, Xk = Φ(Xk−1), k = 1, 2, ... , (81)
where Φ : V → V is a one-step operator,
lim_{k→∞} X_k = B = Φ(B). (82)
Lem. 11.1. ([1]) Assume there are constants c_Φ, ε_Φ > 0 s.t.
‖Φ(X) − B‖ ≤ c_Φ ‖X − B‖^α ∀ X: ‖X − B‖ ≤ ε_Φ, (83)
with α = 2, and set ε := min(ε_Φ, 1/c_Φ). Then (82) holds for any X_0
satisfying ‖X_0 − B‖ < ε, and, moreover,
‖X_k − B‖ ≤ c_Φ^{−1} (c_Φ ‖X_0 − B‖)^{2^k} (k = 0, 1, 2, . . .). (84)
On convergence rate for fixed-point iteration B. Khoromskij, Zurich 2010(L11) 264
Proof: Let ek := ‖Xk −B‖. Then, due to (83),
ek ≤ cΦe2k−1, provided that ek−1 ≤ εΦ. (85)
Since (85), ek−1 ≤ ε ≤ εΦ imply ek ≤ cΦε2 = ε (cΦε) ≤ ε. Hence,
all iterates stay in the ε-neighbourhood of B.
(84) is proved by induction:
e_k ≤ c_Φ e_{k−1}²  (by (85))  = c_Φ · (c_Φ^{−1}(c_Φ e_0)^{2^{k−1}})²  (induction hypothesis)  = c_Φ^{−1}(c_Φ e_0)^{2^k}.
Whenever e0 < ε, (84) shows ek → 0.
Rem. 11.1. (84) together with e0 ≤ ε implies monotonicity:
‖Xk −B‖ ≤ ‖Xk−1 −B‖ . (86)
Rem. 11.2. Cond. (83), α = 2, is valid for the Newton iter.
Quadratic convergence of truncated iteration B. Khoromskij, Zurich 2010(L11) 265
Let S ⊂ V be a subset (not necessarily a subspace) considered
as a class of certain structured elements (e.g.
tensor-structured matrices) and suppose that R : V → S is an
operator mapping elements from V onto suitable structured
approximants in S. We call R a truncation operator .
Define a truncated iterative process as follows:
Y0 := R(X0), Yk := R(Φ(Yk−1)), k = 1, 2 . . . (87)
Thm. 11.2. ([1]) Under the premises of Lem. 11.1, assume
that
‖X −R(X)‖ ≤ cR ‖X −B‖ ∀ X: ‖X −B‖ ≤ εΦ. (88)
Then there exists δ > 0 such that the truncated iteration
(87) converges to B so that for k = 1, 2, . . .
‖Y_k − B‖ ≤ c_{RΦ} ‖Y_{k−1} − B‖² with c_{RΦ} := (c_R + 1)c_Φ (89)
Quadratic convergence of truncated iteration B. Khoromskij, Zurich 2010(L11) 266
for any starting value Y0 = R(Y0) satisfying ‖Y0 −B‖ < δ.
Proof: Let ε := min (εΦ, 1/cΦ) and define Zk = Φ(Yk−1). By
(86) we have
‖Zk −B‖ ≤ ‖Yk−1 −B‖ ,
provided that ‖Yk−1 −B‖ ≤ ε. Then
‖Yk −B‖ = ‖R(Zk) − Zk + Zk −B‖ ≤ (cR + 1) ‖Zk −B‖ . (90a)
Assuming ‖Y_{k−1} − B‖ ≤ ε, the bounds ε ≤ ε_Φ and (83) ensure
‖Z_k − B‖ = ‖Φ(Y_{k−1}) − B‖ ≤ c_Φ ‖Y_{k−1} − B‖². (90b)
Combining (90a) and (90b), we obtain (89) for any k,
provided that ‖Yk−1 −B‖ ≤ ε.
As in the proof of Lem. 11.1, the choice δ := min (ε, 1/cRΦ)
guarantees that ‖Y0 −B‖ ≤ δ implies ‖Yk −B‖ ≤ δ ≤ ε, k ∈ N.
Some remarks on truncated FP iteration B. Khoromskij, Zurich 2010(L11) 267
Cor. 11.3. Under the assumptions of Thm. 11.2, any
starting value Y_0 with ‖Y_0 − B‖ ≤ δ leads to
‖Y_k − B‖ ≤ c_{RΦ}^{−1} (c_{RΦ} ‖Y_0 − B‖)^{2^k} (k = 1, 2, . . .), (91)
where cRΦ and δ are defined as above.
The condition (88) has a clear geometrical meaning. If
R(X) := argmin{‖X − Y‖ : Y ∈ S}
is the best approximation to X in the given norm, inequality
(88) holds with c_R = 1, since B ∈ S. Therefore, (88) with
c_R ≥ 1 can be viewed as a quasi-optimality condition.
Rem. 11.3. If the norm is defined by a scalar product, S is a subspace,
and R(X) is the orthogonal projection onto S, then (88) is obviously
fulfilled with c_R = 1. If α = 1 in (83), i.e. linear convergence, the truncated
process retains the linear convergence rate, provided that (c_R + 1)c_Φ < 1.
Truncated Newton iteration to compute A−1B. Khoromskij, Zurich 2010(L11) 268
We analyse the case of second order tensors (d = 2),
A ≈ A_r = Σ_{k=1}^{r} U_k ⊗ V_k, U_k ∈ R^{m×m}, V_k ∈ R^{n×n}.
Recall (see Def. 6.1, Lect. 6) that for a matrix A ∈ R^{m×n} we
use the vector representation A → vec(A) ∈ R^{mn}, where vec(A)
is the mn × 1 vector obtained by “stacking” A’s columns,
vec(A) := [a_{11}, ..., a_{m1}, a_{12}, ..., a_{mn}]^T;
vec(A) is a rearranged version of A. Introduce the linear
invertible operator L : R^{mn×mn} → R^{m²×n²} by (L = P, cf. Lect. 8)
L(A_r) ≡ Ã_r := Σ_{k=1}^{r} vec(U_k) · vec(V_k)^T.
L is an entrywise permutation, hence unitary with respect to the Frobenius norm.
Truncated Newton iteration to compute A−1B. Khoromskij, Zurich 2010(L11) 269
Making use of the transform L allows one to reduce the low
Kronecker-rank approximation of A to the low-rank
approximation of L(A). For fixed r one may apply the truncation
operator R of the form
R(A) := L^{−1}(Π_r(L(A))), I − R = L^{−1}(I − Π_r)L,
where Π_r(B) is the best rank-r approximation to B ∈ R^{m²×n²}
in the given norm (say, the spectral or Frobenius norm).
We formulate the general statement.
Let B = F(A) be defined by the given matrix-valued function
F and let R be the truncation operator that satisfies (88) for
all X in the “small” neighbourhood S(B) of B.
In particular, we consider F(A) = A−1.
Truncated Newton iteration to compute A−1B. Khoromskij, Zurich 2010(L11) 270
Introduce the modified (truncated) Newton-Schulz iteration
Z_{k+1} = X_k(2I − AX_k), X_{k+1} = R(Z_{k+1}), k = 1, 2, ... (92)
Thm. 11.4. Let (88) be satisfied. Then for any initial guess
X_0 = R(X_0) ∈ S(B), the truncated Newton-Schulz iteration
(92) converges to A^{−1} quadratically:
‖A^{−1} − X_{k+1}‖ ≤ (1 + C_R)‖A‖ ‖A^{−1} − X_k‖², k = 1, 2, ...
Proof. Note that (88) leads to
B ≡ A−1 = R(A−1).
Now eq. (92) implies A−1 − Zk+1 = (A−1 −Xk)A(A−1 −Xk),
which yields
||A−1 − Zk+1|| ≤ ||A|| ||A−1 −Xk||2. (93)
Truncated Newton iteration to compute A−1B. Khoromskij, Zurich 2010(L11) 271
On the other hand, (88) implies
||Xk − Zk|| = ||R(Zk) − Zk|| ≤ CR||A−1 − Zk||,
hence, the triangle inequality leads to
||A−1 −Xk|| ≤ ||A−1 − Zk|| + ||Zk −Xk|| ≤ (1 + CR)||A−1 − Zk||.
Combining this bound with (93) completes the proof.
Let us check (88) for the choice R(A) = L−1(Πr(L(A))). We
denote Y = L(X) and YB = L(B) and note that B = R(B)
yields ΠrYB = YB.
In the following proof we make use of the standard stability
estimates for the singular values of the perturbed matrix
[Wielandt, Hoffman ’55].
Now we estimate in the Frobenius norm:
Truncated Newton iteration to compute A^{−1} B. Khoromskij, Zurich 2010(L11) 272
‖L^{−1}‖^{−1} ‖X − R(X)‖ ≤ ‖(I − Π_r)Y‖ = (Σ_{k=r+1}^{n} σ_k(Y)²)^{1/2}
= (Σ_{k=r+1}^{n} (σ_k(Y) − σ_k(Y_B))²)^{1/2}
≤ Σ_{k=r+1}^{n} |σ_k(Y) − σ_k(Y_B)|
≤ Σ_{k=1}^{n−r} σ_k(Y − Y_B) ≤ √(n − r) ‖L(X − B)‖.
Estimate (88) now follows with C_R = √(n − r) ‖L^{−1}‖ ‖L‖.
A few remarks.
1. The factor√n− r can be omitted.
2. The error estimate above allows the straightforward local analysis for
algorithm (94) with the truncation operator R.
3. In the case of three (or more) factors (d ≥ 3), we can analyse the
sub-optimal truncation operator R via Tucker-type decomposition.
Iterative schemes to compute sqrt(A) B. Khoromskij, Zurich 2010(L11) 273
The iterative calculation may not be very simple!
Newton iteration to compute the square root A1/2 of the
s.p.d. matrix A: Given X0, the iteration
X_k ∆_k + ∆_k X_k = A − X_k², with ∆_k = X_{k+1} − X_k, (94)
converges to A1/2 quadratically (locally). It requires solving
matrix Lyapunov equation.
This scheme can be considered as the Newton iteration to
solve the nonlinear matrix equation
Ψ(X) := A − X² = 0.
Clearly,
Ψ(X + δ) − Ψ(X) = −X∆ − ∆X,
so our iteration can be interpreted as the Newton method for
solving Ψ(X) = 0 (Thm. 11.2: analyses of truncated iter.).
Iterative schemes to compute sqrt(A) B. Khoromskij, Zurich 2010(L11) 274
Iteration (94) can be written as Xk = Φk(Xk−1) corresponding
to the choice
Φk(X) := Φ(X),
where Φ(X) solves the matrix equation
X(Φ(X) −X) + (Φ(X) −X)X = A−X2.
Simple calculation shows that the latter equation implies
(with the substitution A = B2)
X(Φ(X) −B) +XB −X2 + (Φ(X) −B)X +BX −X2 = B2 −X2,
which leads to the matrix Lyapunov eq. w.r.t. Y = Φ(X) −B,
XY + Y X = (B −X)2.
Iterative schemes to compute sqrt(A) B. Khoromskij, Zurich 2010(L11) 275
Making use of the solution operator for the Lyapunov eq.
(assume that X = X⊤ > 0), we arrive at the norm estimate
‖Φ(X) − B‖ ≤ ‖∫_0^∞ e^{−tX} (B − X)² e^{−tX} dt‖ ≤ C ‖B − X‖².
This proves condition (83) in Lem. 11.1 with α = 2. Hence,
Thm. 11.2 ensures the convergence of the truncated version
of the nonlinear iteration (94).
Note that the simpler scheme
X_0 = a_0 A, X_k := X_{k−1} − (1/2)(X_{k−1} − X_{k−1}^{−1} A) (k = 1, 2, . . .),
where a0 > 0 is the given constant, does not guarantee, in
general, the convergence of truncated iterations.
Exer. 11.1. Compute sign(A), sqrt(A) and A^{−1} in the case of the 2D
Laplacian, ∆_m ⊗ I_n + I_m ⊗ ∆_n, as MATLAB functions. What is the
Kronecker ε-rank in all cases (use the SVD of the rearranged matrix A)?
Try to apply the Newton scheme to A^{−1} on a moderate m × n grid.
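A NumPy sketch in the spirit of Exer. 11.1 (Python instead of MATLAB; the grid size m = n = 8, the Kronecker truncation rank 30 and the iteration count are arbitrary choices): the truncation operator R is realised by a truncated SVD of the rearranged matrix, and the truncated Newton-Schulz iteration (92) is run for the 2D Laplacian.

```python
import numpy as np

def laplace1d(n):
    h = 1.0 / (n + 1)
    return (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
            - np.diag(np.ones(n - 1), -1)) / h**2

def rearrange(X, m, n):
    # rearrangement L: (mn x mn) -> (m^2 x n^2), Kronecker terms become rank-1
    return X.reshape(m, n, m, n).transpose(0, 2, 1, 3).reshape(m * m, n * n)

def truncate(X, m, n, r):
    # R(X): best Kronecker rank-r approximation via SVD of L(X)
    U, s, Vt = np.linalg.svd(rearrange(X, m, n), full_matrices=False)
    Y = (U[:, :r] * s[:r]) @ Vt[:r]
    return Y.reshape(m, m, n, n).transpose(0, 2, 1, 3).reshape(m * n, m * n)

m = n = 8
A = np.kron(laplace1d(m), np.eye(n)) + np.kron(np.eye(m), laplace1d(n))
lam = np.linalg.eigvalsh(A)
X = (2.0 / (lam[0] + lam[-1])) * np.eye(m * n)          # scaled start, cf. (77)
for _ in range(14):
    X = truncate(X @ (2.0 * np.eye(m * n) - A @ X), m, n, 30)  # truncated step (92)

res = np.linalg.norm(np.eye(m * n) - A @ X, 2)
```

The residual settles at a floor set by the truncation rank, illustrating the quasi-optimality discussion around (88).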
Iterative schemes to compute sign(A) B. Khoromskij, Zurich 2010(L11) 276
Newton-Schulz iteration to compute sign(A),
X_{k+1} = X_k + (1/2)[I − X_k²] X_k, X_0 = A/‖A‖₂. (95)
Diagonalisable case. Let T be the unitary transform that
diagonalises A, i.e., A = T⊤DT with di ∈ [−1, 1], then it also
diagonalises all Xk, k = 1, 2, .... Hence we have to show that
the scalar iteration
x_{k+1} = f(x_k), with x_0 ∈ [−1, 0) ∪ (0, 1],
where f(x) := x + (1/2)x(1 − x²) ≡ x g(x), converges to sign(x_0)
quadratically.
f(x), x ∈ [−1, 1], is increasing and has the fixed points
x = −1, 0, 1. Since g(x) > 1 for x ∈ (−1, 1), it implies
0 < x_k < x_{k+1} ≤ 1 if x_0 ∈ (0, 1] and −1 ≤ x_{k+1} < x_k < 0 if x_0 ∈ [−1, 0).
Hence, both x = −1 and x = 1 are stable fixed points.
Iterative schemes to compute sign(A) B. Khoromskij, Zurich 2010(L11) 277
For example, consider the case with small initial guess x0 > 0.
For x ∈ [−1/2, 1/2], we have g(x) ≥ q > 1 with q = 1 + 3/8; thus
the number of iterations x_{k+1} = x_k g(x_k) needed to reach the value,
say, x_k = 0.5 starting from x_0 > 0 is about O(log_q x_0^{−1}).
For x_k ≥ 1/2, we enter the regime of quadratic
convergence. In fact, we have
1 − x_{k+1} = (1/2)(1 − x_k)²(x_k + 2),
which implies |1 − x_{k+1}| ≤ (3/2)(1 − x_k)². In this stage, to achieve
precision ε > 0 one requires O(log₂ log₂ ε^{−1}) iterations.
For the initial guess we actually have x_0 = cond(A)^{−1}, which
implies that the total number of iterations is bounded by
O(log₂ log₂ ε^{−1}) + O(log_q cond(A)).
Iterative schemes to compute sign(A) B. Khoromskij, Zurich 2010(L11) 278
Note that iteration (95) can be written as X_k = Φ(X_{k−1}) with
Φ(X) := X + (1/2)(I − X²)X (see Lem. 11.1). Since all X_k
(k = 1, 2, ...) are simultaneously diagonalised by the same
matrix T, we have (with B = sign(A)):
Φ(X) − B = X − B + (1/2)(B² − X²)X
= (1/2)(X − B)(B(B − X) + (B − X)(B + X))
= −(X − B)²(B + (1/2)X).
The analysis for the algorithm
X_0 = A, X_{k+1} = (1/2)(X_k + X_k^{−1})
in the diagonalisable case is reduced to that for the Newton
method applied to
Ψ(x) := x² − 1 = 0, that is, x_{k+1} = (1/2)(x_k + 1/x_k).
Multidimensional convolution in tensor format B. Khoromskij, Zurich 2010(L11) 279
Fast and accurate computation of the convolution transform in
R^d (recall from Lect. 4),
w(x) := (f ∗ g)(x) := ∫_{R^d} f(y) g(x − y) dy, f, g ∈ L¹(R^d).
Application: Solving the elliptic eq. with constant coef. in
Rd, in particular, the Hartree potential in quantum chemistry,
V_H(x) = ∫_{R³} ρ(y, y)/‖x − y‖ dy, x ∈ R³.
Method: low-rank multilinear projection-collocation.
Physical prerequisites:
(a) Compute f ∗ g in some fixed box Ω = [−A, A]^d.
(b) Suppose that f has support in Ω.
(c) f has an R-term separable representation with moderate R.
Multidimensional convolution in tensor format B. Khoromskij, Zurich 2010(L11) 280
Tensor grid: Let ω^d := ω_1 × ... × ω_d be the equidistant tensor
grid of collocation points x_m in Ω, m ∈ M := {1, ..., n + 1}^d,
ω_ℓ := {−A + (m − 1)h : m = 1, ..., n + 1} (ℓ = 1, ..., d), h = 2A/n.
Product basis: For given piecewise constant basis functions φ_i,
φ_i(x) = ∏_{ℓ=1}^{d} φ_{i_ℓ}(x_ℓ), φ_{i_ℓ}(·) = φ(· + (i_ℓ − 1)h), i ∈ I := {1, ..., n}^d,
related to ω^d, let
f(y) ≈ Σ_{i∈I} f_i φ_i(y), f_i = f(P_i).
The discrete collocation scheme (cost O(n^{2d})):
f ∗ g ≈ {w_m}_{m∈M}, w_m := Σ_{i∈I} f_i ∫_{R^d} φ_i(y) g(x_m − y) dy, x_m ∈ ω^d.
Compute the collocation coefficient tensor (L²-projection)
G = {g_i} ∈ R^I : g_i = ∫_{R^d} φ_i(y) g(−y) dy, i ∈ I.
d-dimensional discrete convolution B. Khoromskij, Zurich 2010(L11) 281
Define the d-th order tensor F = {f_i} ∈ R^I.
Compute the discrete convolution in R^d (cost O(n^d log^q n) via FFT):
F ∗ G := {z_j}, z_j := Σ_i f_i g_{j−i+1}, j ∈ J := {1, ..., 2n − 1}^d,
where the sum is over all i ∈ I which lead to legal subscripts
for f_i and g_{j−i+1}.
Important step: w_m, m ∈ M, is obtained by copying the
corresponding part of {z_j} (centred at j = n),
w_m = z_j|_{j=n/2+m}, m ∈ M.
Specifically, for jℓ = 1, ..., 2n− 1,
iℓ ∈ [max(1, jℓ + 1 − n) : min(jℓ, n)].
For example, in 1D case,
z(1) = f(1) · g(1), z(2) = f(1) · g(2) + f(2) · g(1),
d-dimensional discrete convolution B. Khoromskij, Zurich 2010(L11) 282
z(n) = f(1) · g(n) + f(2) · g(n− 1) + ...+ f(n) · g(1), z(2n− 1) = f(n) · g(n).
Ex. 11.1. 1D case, n = 4, m = 1, ..., 5.
z(1) = f(1) · g(1), z(2) = f(1) · g(2) + f(2) · g(1),
z(3) = f(1) · g(3) + f(2) · g(2) + f(3) · g(1),
z(4) = f(1) · g(4) + f(2) · g(3) + f(3) · g(2) + f(4) · g(1).
z(5) = f(2) · g(4) + f(3) · g(3) + f(4) · g(2),
z(6) = f(3) · g(4) + f(4) · g(3), z(7) = f(4) · g(4).
But the collocation scheme leads to (note: g1 = g4, g2 = g3)
w(1) = f(1) · g(2) + f(2) · g(1),
w(2) = f(1) · g(3) + f(2) · g(2) + f(3) · g(1),
w(3) = f(1) · g(4) + f(2) · g(3) + f(3) · g(2) + f(4) · g(1).
w(4) = f(2) · g(4) + f(3) · g(3) + f(4) · g(2), w(5) = f(3) · g(4) + f(4) · g(3).
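The index bookkeeping above is exactly what a standard full discrete convolution produces; a quick check of Ex. 11.1 (Python/NumPy; the numerical values of f and the symmetric g are arbitrary):

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([10.0, 20.0, 20.0, 10.0])   # symmetric kernel: g1 = g4, g2 = g3
z = np.convolve(f, g)                    # full convolution z(1..7), length 2n - 1 = 7
w = z[1:6]                               # collocation output w(1..5), shifted by one as in the example
```

Here w picks out the central part of z, matching the listed relations w(1) = z(2), ..., w(5) = z(6).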
Error analysis for projection collocation convolution B. Khoromskij, Zurich 2010(L11) 283
Lem. 11.5. Let f ∈ C²(Ω) and let g ∈ L¹(Ω). Furthermore, we
assume that there exist µ ≥ 1 and β > 0 such that
|F(g)| ≤ C/|κ|^µ as |κ| → ∞, κ ∈ R^d,
|∇_y g(x − y)| ≤ C/|x − y|^β for x, y ∈ Ω, x ≠ y.
Then there is a constant C > 0 independent of h such that
|w(x_m) − w_m| ≤ Ch², m ∈ M.
Exer. 11.2. See BNK [1] for the technical proof of Lem. 11.5.
Ex. 11.2. The fundamental solution of the Laplace operator
in Rd is given by
G(x) = c(d)/|x|d−2, with F(G) = C/|κ|2.
Lem. 11.5 now applies with β = d− 1, µ = 2.
Richardson extrapolation B. Khoromskij, Zurich 2010(L11) 284
Lem. 11.6. (Richardson extrapolation, BNK [1])
Under the assumptions of Lem. 11.5, let f ∈ C3(Ω). Then
there exists a function c1 ∈ C(Ω) which is independent of h,
such that for m ∈ M we have
w(x_m) = w_m + c_1(x_m)h² + η_{m,h} with |η_{m,h}| ≤ Ch³.
Higher order approximation without extra cost!
w̃_m = (4 w_m^{h/2} − w_m^h)/3, m ∈ M_h,
then
|w(x_m) − w̃_m| ≤ Ch³, m ∈ M_h.
Richardson extrapolation applies directly to functionals of
w(xm).
Numerics for tensor-product convolution B. Khoromskij, Zurich 2010(L11) 285
Letting G ∈ C_R, F ∈ T_r, we tensorize the discrete convolution
F ∗ G = Σ_{k=1}^{R} Σ_{m=1}^{r} b_k c_{m_1...m_d} (U_k^{(1)} ∗ V_{m_1}^{(1)}) × ... × (U_k^{(d)} ∗ V_{m_d}^{(d)}).
The 1D convolution on an equidistant grid, U_k^{(ℓ)} ∗ V_{m_ℓ}^{(ℓ)} ∈ R^{2n−1}, can be computed
by FFT in O(n log n) operations.
Setting a = U_k^{(ℓ)}, b = V_{m_ℓ}^{(ℓ)} ∈ R^n, we have
(a ∗ b)_j = Σ_{m=1}^{n} a_m b_{j−m+1}, j = 1, ..., 2n − 1.
This leads to the complexity linear in n,
N_{C∗T} = O(drRn log n + Rr^d) ≪ n^d log n.
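The O(n log n) realisation of the 1D convolution is a zero-padded FFT; a minimal sketch (Python/NumPy; the vector length 64 is an arbitrary choice):

```python
import numpy as np

def conv_fft(a, b):
    # full linear convolution of two vectors via zero-padded FFT
    L = len(a) + len(b) - 1
    return np.fft.irfft(np.fft.rfft(a, L) * np.fft.rfft(b, L), L)

rng = np.random.default_rng(3)
a, b = rng.standard_normal(64), rng.standard_normal(64)
err = np.max(np.abs(conv_fft(a, b) - np.convolve(a, b)))
```

Padding to length 2n − 1 makes the circular (FFT) convolution coincide with the linear one.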
Ex. 11.3. The Hartree potential for the H2O molecule [3]:
n³      128³   256³   512³    1024³   2048³  4096³  8192³   16384³
FFT3    4.3    55.4   582.8   ∼6000   –      –      –       ∼2 years
C ∗ C   1.0    3.1    5.3     21.9    43.7   127.1  368.6   700.2
Linear and bilinear operations on Tucker tensors B. Khoromskij, Zurich 2010(L11) 286
Def. 11.1. For given tensors A, B ∈ R^I, the Hadamard product
A ⊙ B ∈ R^I of two tensors of the same size I is defined by the componentwise
product,
(A ⊙ B)_i = a_i · b_i, i ∈ I.
For A_1, A_2 ∈ T_r, one tensorizes the Hadamard product in O(drn + r^{2d}) op.:
A_1 ⊙ A_2 = Σ_{k_1,m_1=1}^{r} ··· Σ_{k_d,m_d=1}^{r} β_{k_1...k_d} ζ_{m_1...m_d} (u_{k_1}^{(1)} ⊙ v_{m_1}^{(1)}) ⊗ ... ⊗ (u_{k_d}^{(d)} ⊙ v_{m_d}^{(d)}). (96)
Applying the definition to rank-1 tensors (β = ζ = 1), we obtain
(A_1 ⊙ A_2)_i = (u_{i_1}^{(1)} v_{i_1}^{(1)}) ··· (u_{i_d}^{(d)} v_{i_d}^{(d)}) = ((u^{(1)} ⊙ v^{(1)}) ⊗ ··· ⊗ (u^{(d)} ⊙ v^{(d)}))_i. (97)
Def. 11.2. For given tensors F = [f_i] ∈ R^I, G = [g_i] ∈ R^I, we define their
discrete convolution product by
F ∗ G := [Σ_{i∈I} f_i g_{j−i+1}]_{j∈J}, J := {1, ..., 2n − 1}^d,
where j − i + 1 ∈ I (G can be extended to index sets beyond I by zeros).
Linear and bilinear operations on Tucker tensors B. Khoromskij, Zurich 2010(L11) 287
For given A_1, A_2 ∈ T_r, the tensorized convolution product reads
A_1 ∗ A_2 = Σ_{k=1}^{r} Σ_{m=1}^{r} β_{k_1...k_d} ζ_{m_1...m_d} (u_{k_1}^{(1)} ∗ v_{m_1}^{(1)}) ⊗ ... ⊗ (u_{k_d}^{(d)} ∗ v_{m_d}^{(d)}). (98)
This relation follows from the analysis in the case of rank-1 convolving
tensors F, G ∈ C_1, similar to the case of the Hadamard product of tensors,
(F ∗ G)_j = Σ_{i∈I} f_{i_1}^{(1)} ··· f_{i_d}^{(d)} g_{j_1−i_1+1}^{(1)} ··· g_{j_d−i_d+1}^{(d)}
= (Σ_{i_1=1}^{n_1} f_{i_1}^{(1)} g_{j_1−i_1+1}^{(1)}) ··· (Σ_{i_d=1}^{n_d} f_{i_d}^{(d)} g_{j_d−i_d+1}^{(d)}) = ∏_{ℓ=1}^{d} (f^{(ℓ)} ∗ g^{(ℓ)})_{j_ℓ}. (99)
Assuming that the “one-dimensional” convolutions of n-vectors,
u_{k_ℓ}^{(ℓ)} ∗ v_{m_ℓ}^{(ℓ)} ∈ R^{2n−1},
can be computed in O(n log n) operations (circulant convolution by FFT),
we arrive at the overall complexity estimate
N_{·∗·} = O(dr²n log n + r^{2d}) ≪ O(n^d).
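Both factorizations, (97) for the Hadamard product and (99) for the convolution, are easy to verify on rank-1 tensors (Python/NumPy; the sizes n = 5, d = 3 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 5, 3
fs = [rng.standard_normal(n) for _ in range(d)]
gs = [rng.standard_normal(n) for _ in range(d)]

def outer(vs):
    # rank-1 tensor u^(1) x ... x u^(d)
    T = vs[0]
    for v in vs[1:]:
        T = np.multiply.outer(T, v)
    return T

F, G = outer(fs), outer(gs)

# (97): the Hadamard product of rank-1 tensors factorizes mode by mode
had_ok = np.allclose(F * G, outer([f * g for f, g in zip(fs, gs)]))

# (99): the d-dim convolution of rank-1 tensors is the rank-1 tensor of 1D convolutions
s = d * [2 * n - 1]
FG = np.fft.irfftn(np.fft.rfftn(F, s) * np.fft.rfftn(G, s), s)
conv_ok = np.allclose(FG, outer([np.convolve(f, g) for f, g in zip(fs, gs)]))
```

The full d-dimensional FFT convolution is used here only as a reference; the point of (99) is that d one-dimensional convolutions suffice.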
Bilinear tensors operations in canonical format B. Khoromskij, Zurich 2010(L11) 288
Consider tensors A_1, A_2 represented in the rank-R canonical format,
A_1 = Σ_{k=1}^{R_1} c_k u_k^{(1)} ⊗ ... ⊗ u_k^{(d)}, A_2 = Σ_{m=1}^{R_2} b_m v_m^{(1)} ⊗ ... ⊗ v_m^{(d)}, (100)
with normalized vectors u_k^{(ℓ)}, v_m^{(ℓ)} ∈ R^{n_ℓ}. For simplicity of discussion, we
assume n_ℓ = n, ℓ = 1, ..., d.
1. The sum of two canonical tensors given by (100) can be written as
A_1 + A_2 = Σ_{k=1}^{R_1} c_k u_k^{(1)} ⊗ ... ⊗ u_k^{(d)} + Σ_{m=1}^{R_2} b_m v_m^{(1)} ⊗ ... ⊗ v_m^{(d)}, (101)
resulting in a canonical tensor of rank at most R_S = R_1 + R_2. This
operation has no cost since it is simply a concatenation of the two tensors.
2. For given canonical tensors A_1, A_2, the scalar product is computed by
⟨A_1, A_2⟩ := Σ_{k=1}^{R_1} Σ_{m=1}^{R_2} c_k b_m ∏_{ℓ=1}^{d} ⟨u_k^{(ℓ)}, v_m^{(ℓ)}⟩. (102)
Calculation of (102) includes R_1R_2 scalar products of vectors in R^n,
Bilinear tensors operations in canonical format B. Khoromskij, Zurich 2010(L11) 289
leading to the overall complexity
N_{⟨·,·⟩} = O(dnR_1R_2).
3. For A_1, A_2 given by (100), we tensorize the Hadamard product by (97),
A_1 ⊙ A_2 = Σ_{k=1}^{R_1} Σ_{m=1}^{R_2} c_k b_m (u_k^{(1)} ⊙ v_m^{(1)}) ⊗ ... ⊗ (u_k^{(d)} ⊙ v_m^{(d)}). (103)
This leads to the complexity O(dnR_1R_2).
4. The convolution product of two tensors in the canonical format (100)
is given by (see (99))
A_1 ∗ A_2 = Σ_{k=1}^{R_1} Σ_{m=1}^{R_2} c_k b_m (u_k^{(1)} ∗ v_m^{(1)}) ⊗ ... ⊗ (u_k^{(d)} ∗ v_m^{(d)}), (104)
leading to the asymptotic complexity O(dR_1R_2 n log n).
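The canonical-format formulas (102) and (103) can be verified against full tensors (Python/NumPy; the sizes n = 6, d = 3, R_1 = 3, R_2 = 4 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, R1, R2 = 6, 3, 3, 4
c, U = rng.standard_normal(R1), rng.standard_normal((R1, d, n))
b, V = rng.standard_normal(R2), rng.standard_normal((R2, d, n))

def full(coefs, facs):
    # assemble the full tensor of a canonical decomposition
    T = 0.0
    for k in range(len(coefs)):
        P = facs[k, 0]
        for l in range(1, d):
            P = np.multiply.outer(P, facs[k, l])
        T = T + coefs[k] * P
    return T

A1, A2 = full(c, U), full(b, V)

# (102): scalar product from one-dimensional scalar products only
dots = np.einsum('kli,mli->kml', U, V).prod(axis=2)   # product over the modes l
sp = c @ dots @ b

# (103): Hadamard product with R1*R2 rank-1 terms built from u_k (.) v_m
H = np.zeros((n,) * d)
for k in range(R1):
    for m in range(R2):
        H += c[k] * b[m] * full(np.ones(1), (U[k] * V[m])[None])
```

Both quantities match the direct computations on the assembled full tensors.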
Bilinear tensors operations in canonical format B. Khoromskij, Zurich 2010(L11) 290
The convolution transform uses the two-level format TC_{R,r} (see Def. 6.8).
The result W = F ⋆ G is represented in the two-level Tucker format
TC_{R_c,r_c} with moderate R_c and r_c.
Alg. 11.1. (Tensor convolution of type TC_{R_1,r} ⋆ C_R → TC_{R_c,r_c})
1. Given F ∈ TC_{R_1,r} with the core β = Σ_{ν=1}^{R_1} β_ν z_ν^{(1)} ⊗ ... ⊗ z_ν^{(d)}, G ∈ C_{n,R},
and the rank parameters r_c, R_c ∈ N (suppose that R_c ≪ RR_1).
2. For ℓ = 1, ..., d, k_ℓ = 1, ..., r, compute the 1D convolutions g_ℓ^{k,m} = u_{k_ℓ}^{(ℓ)} ⋆ v_m^{(ℓ)}
(m = 1, ..., R) of size 2n − 1, restrict the results onto the index set I_ℓ,
and form the n × rR matrix unfolding A_ℓ (cost O(drRn log n)).
3. For ℓ = 1, ..., d, compute the ℓ-mode r_c-dimensional dominating
subspace for A_ℓ, specified by an orthogonal matrix W^{(ℓ)} = [w_ℓ^1, ..., w_ℓ^{r_c}],
at the expense O(dnr²R²).
4. Project the ℓ-mode matrices G_ℓ^m = [g_ℓ^{1,m}, ..., g_ℓ^{r,m}] onto the new
orthogonal basis, G_ℓ^m ≈ W^{(ℓ)} · M_ℓ^m with a coefficient matrix
M_ℓ^m ∈ R^{r_c×r} (cost O(drr_c n)).
5. Calculate the core tensor of size r_c = (r_c, ..., r_c) in the
Bilinear tensors operations in canonical format B. Khoromskij, Zurich 2010(L11) 291
product-canonical format
β_c = Σ_{m=1}^{R} γ_m (Σ_{ν=1}^{R_1} β_ν ⊗_{ℓ=1}^{d} M_ℓ^m z_ν^{(ℓ)}) ∈ C_{n,RR_1}
at the cost O(d²R_1Rr²r_c). Recompress the core β_c to the rank-R_c
tensor β̃_c and constitute the result in the form
W = β̃_c ×_1 W^{(1)} ×_2 ... ×_d W^{(d)} ∈ TC_{R_c,r_c}.
We have proven that Alg. 11.1 scales linearly in n and quadratically in d,
N_{TC⋆C→TC} = O(drRn log n + dr²R²n + drr_c n + d²r²RR_1 r_c),
up to the low cost of the rank-RR_1-to-R_c truncation at Step 5.
The Tucker-CP type convolution scales as
N_{T⋆C→T} = O(drRn log n + dr²R²n + rRr_c^d).
Rem. 11.4. Additive dimension splitting based on canonical and Tucker
formats can be efficient in moderate dimensions. Tensor numerical
methods in higher dim. are based on multiplicative dimension splitting.
Literature to Lect. 11 B. Khoromskij, Zurich 2010(L11) 292
1. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Approximate Iteration for Structured Matrices.
J. Numer. Math. Vol. 13, No. 2 (2005), 119-156.
2. B.N. Khoromskij, Fast and Accurate Tensor Approximation of a Multivariate Convolution with Linear
Scaling in Dimension. J. of Comp. Appl. Math., 234 (2010) 3122-3139.
3. B.N. Khoromskij and V. Khoromskaia, Multigrid Tensor Approximation of Function Related Arrays.
SIAM J. on Sci. Comp., 31(4), 3002-3026 (2009).
http://personal-homepages.mis.mpg.de/bokh
Introduction to Tensor Numerical Methods III B. Khoromskij, Zurich 2010(L12) 293
If you can't explain it simply,
you don't understand it well enough.
Albert Einstein (1879-1955)
Introduction to Tensor Numerical Methods in
Scientific Computing
(Part III. Multiplicative dimension splitting by TT/QTT formats.
TT/QTT numerical methods in modern applications.)
Boris N. Khoromskij
http://personal-homepages.mis.mpg.de/bokh
University/ETH Zurich, Pro∗Doc Program, WS 2010
Part III: Outlook B. Khoromskij, Zurich 2010(L12) 294
Part III (Lect. 12-18). Multiplicative dimension splitting by TT/QTT
formats. TT/QTT numerical methods in modern applications.
Tensor train/chain multiplicative formats. Main properties, SVD based
TT-rank truncation.
Quantics-TT representation. TT and QTT rank estimates on classes
of “multivariate“ vectors.
Approximation in QTT format. Explicit QTT representation of vectors
and matrices.
MLA in (Q)TT tensor formats. Relation to MPS and DMRG.
Rank structured solution operators and preconditioners in Rd.
Toward solving boundary value and spectral problems in tensor format.
Tensor numerical methods for the Hartree-Fock eq. in electronic
structure calculations.
Application to SPDEs.
DMRG in molecular dynamics and other directions.
Lect. 12. From additive to multiplicative dimension splitting B. Khoromskij, Zurich 2010(L12) 295
Outline of Lecture 12.
1. Higher dimensions might (and should) be tractable.
2. Dimension splitting via tensor train/chain factorization,
matrix product states (MPS).
3. Quasioptimal SVD based approximation.
4. Tensor truncation in TT format.
5. Historical overview, and related approaches.
6. On approximability in TT format.
(a) Analytic methods.
(b) Canonical → TT.
(c) SVD-based recompression.
7. TT/QTT MATLAB Tensor Toolbox:
http://spring.inm.ras.ru/osel (Dr. I. Oseledets, INM RUS, Moscow).
Higher dimensions might be tractable B. Khoromskij, Zurich 2010(L12) 296
1. Large scale modern applications:
DMRG for FCI electronic structure calculations, molecular dynamics,
quantum computing. Stochastic PDEs. Tensor networks in quantum
chemistry.
2. Main computational issues:
Solving basic PDEs on d-fold N × N × ... × N grids, getting rid
of the “curse of dimension”.
3. From additive to multiplicative dimension splitting:
Fast and robust MPS numerical methods for representation
of d-variate functions, operators and for solving physical
equations in Rd, with linear O(dN)-scaling in d.
4. Super-compression by high-dimensional Q-folding:
log-volume quantics-TT approxim. of N-d tensors, O(d logN).
Dimension splitting (DS) via tensor train/chain factorization B. Khoromskij, Zurich 2010(L12) 297
Def. 12.1. (Tensor Train/Chain format), TT/TC[r, n, d].
Given the index set J := ×_{ℓ=1}^{d} J_ℓ, J_ℓ = {1, ..., r_ℓ}, and J_0 = J_d.
The rank-r tensor chain (TC) format
TC[r, n, d] ≡ TC[r] ⊂ V_n, n = (N, ..., N) (d-fold),
contains all V ∈ V_n that can be presented as the chain of
contracted products of 3-tensors over the (auxiliary) index set J,
V = ×_{ℓ=1}^{d} G^{(ℓ)} with 3-tensors G^{(ℓ)} ∈ R^{J_{ℓ−1}×I_ℓ×J_ℓ}.
In coordinate representation,
V(i_1, ..., i_d) = Σ_{α∈J} G_1(α_d, i_1, α_1) G_2(α_1, i_2, α_2) ··· G_d(α_{d−1}, i_d, α_d).
TT – Oseledets, Tyrtyshnikov, [3], [4], TC – Khoromskij, [1].
Specific features of TC/TT factorization B. Khoromskij, Zurich 2010(L12) 298
Here G1(i1) is a row 1 × r1-vector depending on i1, Gℓ(iℓ) is a
matrix of size rℓ−1 × rℓ with elements depending on iℓ, Gd(id)
is a column vector of size rd−1 × 1, depending on id.
A tensor V ∈ TC[r] is represented (approximated) by a
product of matrices (matrix product states), each depending
on a single “physical” index. It is similar to the Tucker format,
but with localised connectivity constraints.
d = 2: TT is a skeleton factorization of a rank-r matrix.
The TT/TC models have many beneficial features:
– linear in d storage,
– existence of a quasioptimal SVD-based rank approximation
(analogy of T-HOSVD),
– an SVD-based rank truncation procedure with linear scaling in
d (analogy of T-RHOSVD),
– efficient formatted bilinear tensor operations.
Visualising TC/TT models B. Khoromskij, Zurich 2010(L12) 299
The Tensor-Chain format for d = 6.
An N × N × . . . × N (d-fold) tensor in TC[r] is the d-fold contracted product
of tri-tensors.
[Figure: ring of d = 6 contracted 3-tensors, each carrying one mode of size N, connected by the bond ranks r_1, . . . , r_6.]
Special case r6 = 1: TT[r] = TC[r] (disconnected chain).
Embedding: TT[r] ⊂ TC[r].
Benefits of TC/TT models B. Khoromskij, Zurich 2010(L12) 300
Thm. 12.1. (Storage, rank bound, concatenation, quasioptimality).
(A) Storage: ∑_{ℓ=1}^d r_{ℓ−1} r_ℓ N ≤ d r² N with r = max_ℓ r_ℓ.
(B) Rank bound: r_ℓ ≤ rank_ℓ(V) ≤ rank(V).
(C) Canonical embeddings:
TT[r] ⊂ TC[r]; C_{R,n} ⊂ TT[r, n, d] with r = (R, ..., R).
(D) Concatenation to higher dimensions: V[d_1] ⊗ V[d_2] → D = d_1 + d_2.
(E) The quasioptimal TT[r]-approximation of V ∈ V_n satisfies
min_{T ∈ TT[r]} ‖V − T‖_F ≤ (∑_{ℓ=1}^d ε_ℓ²)^{1/2}, ε_ℓ = min_{rank B ≤ r_ℓ} ‖V_[ℓ] − B‖_F,
and it can be computed by QR/SVD. Here V_[ℓ] is the ℓ-mode TT
unfolding matrix.
The TC approximation requires an ALS-type iteration.
Historical remarks related to quasioptimality (E) B. Khoromskij, Zurich 2010(L12) 301
Historical remarks toward the proof of (E),
(see Lem. 12.2. below).
Rem. 12.1. (Quasioptimality via SVD-based approximation).
Full-to-Tucker-HOSVD – [De Lathauwer et al. 2000].
Idea of Hierarchical DS – [BNK ’06], see [6].
Canonical-to-Tucker-RHOSVD – [BNK, Khoromskaia ’08].
Full-to-TT – [Oseledets, Tyrtyshnikov ’09].
Hierarchical Tucker format – [Hackbusch, Kuhn; Grasedyck ’09].
Quantics-TT – [BNK ’09], [Oseledets ’09], [BNK, Oseledets ’09-’10]
(approximation, theoretical bounds on rℓ and numerics).
Manifolds of TT tensors – [R. Schneider, Holtz, Rohwedder ’10]
Rem. 12.2. ε_ℓ can be estimated via the truncated SVD or
ACA of the ℓ-mode TT-unfolding matrices V_[ℓ] of V (ℓ = 1, ..., d).
Toward the proof of Thm. 12.1 (E) B. Khoromskij, Zurich 2010(L12) 302
Lem. 12.2. (see [5]) For any tensor A = [A(i_1, . . . , i_d)] there exists a TT
approximation T = [T(i_1, . . . , i_d)] ∈ TT[r] with ranks r_k, s.t.
‖A − T‖²_F ≤ ∑_{k=1}^{d−1} ε_k², (105)
where ε_k is the Frobenius distance from A_[k] to its best rank-r_k approximation:
ε_k = min_{rank B ≤ r_k} ‖A_[k] − B‖_F.
Proof. First, consider the case d = 2. Then the TT decomposition reads
T(i_1, i_2) = ∑_{α_1=1}^{r_1} G_1(i_1, α_1) G_2(α_1, i_2),
and coincides with the skeleton decomposition of the matrix T. Choose T
via the rank-r_1 truncated SVD of A, which guarantees that the norm
‖A − T‖_F = ε_1 is the minimal possible.
Then, proceed by induction. Consider the SVD of the first unfolding matrix,
A_[1] ≡ A^(1) = [A(i_1; i_2 ... i_d)] = U Σ V, Σ = diag{σ_1, σ_2, · · ·}. (106)
Toward the proof of Thm. 12.1 (E) B. Khoromskij, Zurich 2010(L12) 303
As an approximation to A^(1), consider
B_1 = U_1 Λ V_1, Λ = diag(σ_1, . . . , σ_{r_1}), (107)
where U_1 and V_1 contain the first r_1 columns of U and rows of V,
respectively. Then B_1 is the best rank-r_1 approximation to A_[1], i.e.,
A_[1] = B_1 + E_1, rank B_1 ≤ r_1, ‖E_1‖_F = ε_1.
Obviously, B_1 can be considered as a tensor B = [B(i_1, . . . , i_d)], and the
approximation problem reduces to the one for B.
Observe that if we take an arbitrary tensor T = [T(i_1, . . . , i_d)] with the first
unfolding matrix T_[1] = [T(i_1; i_2, . . . , i_d)] of the form
T_[1] = U_1 W (108)
with U_1 from (107) and an arbitrary matrix W with r_1 rows and as many
columns as in T_[1], then E_1^∗ T_[1] = 0, and this implies that
‖(A − B) + (B − T)‖²_F = ‖A − B‖²_F + ‖B − T‖²_F. (109)
However, the tensor B is still of dimensionality d. To reduce
Toward the proof of Thm. 12.1 (E) B. Khoromskij, Zurich 2010(L12) 304
dimensionality, rewrite the matrix equality (107) in the elementwise form
B(i_1; i_2, . . . , i_d) = ∑_{α_1=1}^{r_1} U_1(i_1; α_1) Â(α_1; i_2, . . . , i_d), where Â = Λ V_1.
Then, concatenate the indices α_1 and i_2 into one long index and consider Â
as a tensor Â = [Â(α_1 i_2, i_3, . . . , i_d)] of dimensionality d − 1.
By induction, Â admits a TT approximation T̂ = [T̂(α_1 i_2, i_3, . . . , i_d)] of the form
T̂(α_1 i_2, i_3, . . . , i_d) = ∑_{α_2,...,α_{d−1}} G_2(α_1 i_2, α_2) G_3(α_2, i_3, α_3) · · · G_d(α_{d−1}, i_d),
such that
‖Â − T̂‖²_F ≤ ∑_{k=2}^{d−1} ε̂_k², with
ε̂_k = min_{rank C ≤ r_k} ‖Â_k − C‖_F, Â_k = [Â(α_1 i_2, . . . , i_k; i_{k+1}, . . . , i_d)].
Now let us set G_1(i_1, α_1) = U_1(i_1, α_1), separate the indices α_1, i_2 from the long
Toward the proof of Thm. 12.1 (E) B. Khoromskij, Zurich 2010(L12) 305
index α_1 i_2 and define T by the following tensor train:
T(i_1, . . . , i_d) = ∑_{α_1,...,α_{d−1}} G_1(i_1, α_1) G_2(α_1, i_2, α_2) · · · G_d(α_{d−1}, i_d).
It remains to estimate ‖A − T‖_F. First, from (106) and (107) it follows that
Â = Λ V_1 = U_1^∗ A_[1],
and consequently (the bar denotes complex conjugation)
Â(α_1 i_2, i_3, . . . , i_d) = ∑_{i_1} U̅_1(i_1, α_1) A(i_1, i_2, . . . , i_d).
Let A_[k] = B_k + E_k with rank B_k ≤ r_k and ‖E_k‖_F = ε_k. We can consider B_k
and E_k as tensors B_k(i_1, . . . , i_d) and E_k(i_1, . . . , i_d). Since B_k admits a
skeleton decomposition with r_k terms, we obtain
A(i_1, . . . , i_d) = ∑_{γ=1}^{r_k} P(i_1, . . . , i_k; γ) Q(γ; i_{k+1}, . . . , i_d) + E_k(i_1, . . . , i_d).
Toward the proof of Thm. 12.1 (E) B. Khoromskij, Zurich 2010(L12) 306
Hence, Â(α_1 i_2, i_3, . . . , i_d) = H_k(α_1, i_2, i_3, . . . , i_d) + R_k(α_1, i_2, i_3, . . . , i_d) with
H_k(α_1, i_2, i_3, . . . , i_d) = ∑_{i_1} U̅_1(i_1, α_1) ∑_{γ=1}^{r_k} P(i_1, . . . , i_k; γ) Q(γ; i_{k+1}, . . . , i_d),
R_k(α_1, i_2, i_3, . . . , i_d) = ∑_{i_1} U̅_1(i_1, α_1) E_k(i_1, . . . , i_d).
Let us introduce a tensor L as follows:
L(α_1, i_2, . . . , i_k, γ) = ∑_{i_1} U̅_1(i_1, α_1) P(i_1, . . . , i_k; γ).
Then we can consider H_k as a matrix with the elements defined by the
skeleton decomposition
H_k(α_1, i_2, . . . , i_k; i_{k+1}, . . . , i_d) = ∑_γ L(α_1, i_2, . . . , i_k; γ) Q(γ; i_{k+1}, . . . , i_d),
which makes it evident that the rank of H_k does not exceed r_k. As well,
we can consider R_k as a matrix with the elements defined by
R_k(α_1; i_2, i_3, . . . , i_d) = ∑_{i_1} U̅_1(i_1; α_1) E_k(i_1; i_2, . . . , i_d).
We know that U_1 has orthonormal columns, which means that the matrix
Toward the proof of Thm. 12.1 (E) B. Khoromskij, Zurich 2010(L12) 307
E_k is premultiplied by a matrix with orthonormal rows. Since this cannot
increase the Frobenius norm, for the best rank-r_k approximation error ε̂_k of Â_k we conclude that
ε̂_k ≤ ‖R_k‖_F ≤ ‖E_k‖_F = ε_k, 2 ≤ k ≤ d − 1.
Hence, for the error tensor Ê with the elements
Ê(α_1 i_2, i_3, . . . , i_d) = Â(α_1 i_2, i_3, . . . , i_d) − T̂(α_1 i_2, i_3, . . . , i_d),
we obtain
‖Ê‖²_F ≤ ∑_{k=2}^{d−1} ε_k².
Further, the error tensor E = B − T can be viewed as a matrix of the form
E(i_1; i_2, . . . , i_d) = ∑_{α_1=1}^{r_1} U_1(i_1; α_1) Ê(α_1; i_2, . . . , i_d),
which shows that the matrix Ê is premultiplied by the matrix U_1 with
orthonormal columns, so we have ‖E‖²_F ≤ ‖Ê‖²_F ≤ ∑_{k=2}^{d−1} ε_k².
Finally, we observe that the first unfolding matrix T_[1] of T is exactly of
the form (108). Thus, (109) is valid, completing the proof.
Further properties B. Khoromskij, Zurich 2010(L12) 308
Cor. 12.1. Given a tensor A, denote ε = inf_{B ∈ TT[r]} ‖A − B‖_F.
Then the optimal B exists (the infimum is in fact a minimum),
and the TT approximation T constructed in the proof of Lem. 12.2
is quasi-optimal in the sense that
‖A − T‖_F ≤ √(d − 1) ε.
Proof. By the definition of the infimum, there exists a sequence of tensor
trains B^(s) (s = 1, 2, . . .) with the property lim_{s→∞} ‖A − B^(s)‖_F = ε.
We cannot say that all elements of the corresponding tensor carriages are
uniformly bounded. Nevertheless, all elements of the tensors B^(s) are
uniformly bounded, and hence some subsequence B^(s_t) converges
element-wise to some tensor B^(min). The same holds true for the
corresponding unfolding matrices, B^(s_t)_[k] → B^(min)_[k], 1 ≤ k ≤ d.
It is well known that a sequence of matrices with a common rank bound
cannot converge to a matrix of larger rank. Thus, rank B^(s_t)_[k] ≤ r_k
implies that rank B^(min)_[k] ≤ r_k, and ‖A − B^(min)‖_F = ε, so B^(min) is the
minimizer.
Further properties B. Khoromskij, Zurich 2010(L12) 309
It is now sufficient to note that ε_k ≤ ε. The reason is that ε is the
approximation accuracy for every unfolding matrix A_[k] delivered by a
specially structured skeleton (dyadic) decomposition with r_k terms, while ε_k
stands for the best approximation accuracy without any restrictions on
the vectors of the skeleton decomposition. Hence, ε_k ≤ ε. The
quasi-optimality bound then follows directly from (105).
Cor. 12.2. If a tensor A admits an R-term canonical
approximation of accuracy ε > 0, then there exists a TT
approximation with r_k ≤ R and accuracy √(d − 1) ε.
Rem. 12.3. Similar to the case of the product Stiefel manifold
of Tucker tensors, the set of TT tensors with fixed rank
parameters, TT[r], can be proven to be a nonlinear
manifold in the TPHS V_n (see [7] for a detailed discussion).
Full-to-TT approximation scheme B. Khoromskij, Zurich 2010(L12) 310
Rem. 12.4. The proof of Lem. 12.2 also gives a
constructive method for computing a TT-approximation.
Alg. 12.1. Full-to-TT compression algorithm ([5]).
Input: a tensor A of size n_1 × n_2 × · · · × n_d and accuracy bound ε > 0.
Output: tensor carriages G_k, k = 1, . . . , d, defining a TT approximation to
A with the relative error bound ε.
1: Compute nrm := ‖A‖_F.
2: Sizes of the first unfolding matrix: N_l = n_1, N_r = ∏_{k=2}^d n_k.
3: Temporary tensor: B = A.
4: First unfolding: M := reshape(B, [N_l, N_r]).
5: Compute the truncated SVD of M ≈ UΛV, so that the approximate
rank r ensures ∑_{k=r+1}^{min(N_l, N_r)} σ_k² ≤ (ε · nrm)² / (d − 1).
6: Set G_1 = U, M := ΛV, r_1 = r.
7: Process the other modes:
Full-to-TT approximation scheme B. Khoromskij, Zurich 2010(L12) 311
8: for k = 2 to d − 1 do
9: Redefine the sizes: N_l := n_k, N_r := N_r / n_k.
10: Construct the next unfolding: M := reshape(M, [r N_l, N_r]).
11: Compute the truncated SVD of M ≈ UΛV, so that the approximate
rank r ensures ∑_{k=r+1}^{min(r N_l, N_r)} σ_k² ≤ (ε · nrm)² / (d − 1).
12: Reshape the matrix U into a tensor:
G_k := reshape(U, [r_{k−1}, n_k, r_k]).
13: Recompute M := ΛV.
14: end for
15: Set G_d = M.
Rem. 12.5. Alg. 12.1 scales as O(n^{d+1}).
Exer. 12.1. Examine Alg. 12.1 and Alg. 12.2 and compare with the
MATLAB codes.
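For readers without the MATLAB codes at hand, Alg. 12.1 can be sketched in a few lines of NumPy (illustrative only; the function name full_to_tt and the test tensor are ad-hoc). Each step reshapes the current matrix, takes a truncated SVD with the per-mode budget (ε · nrm)²/(d − 1), and stores the left factor as the next carriage:

```python
import numpy as np

def full_to_tt(A, eps):
    """TT-SVD compression of a full tensor A (sketch of Alg. 12.1).

    Returns carriages G[k] of shape (r_{k-1}, n_k, r_k) with relative
    Frobenius error bounded by eps."""
    d, n = A.ndim, A.shape
    budget = (eps * np.linalg.norm(A)) ** 2 / (d - 1)  # per-mode budget
    cores, r_prev = [], 1
    M = A.reshape(n[0], -1)
    for k in range(d - 1):
        M = M.reshape(r_prev * n[k], -1)
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        tail = np.cumsum(s[::-1] ** 2)[::-1]   # tail[r] = sum_{j>=r} s_j^2
        r = max(int(np.searchsorted(-tail, -budget)), 1)
        cores.append(U[:, :r].reshape(r_prev, n[k], r))
        M = s[:r, None] * Vt[:r]               # Lambda * V
        r_prev = r
    cores.append(M.reshape(r_prev, n[-1], 1))
    return cores

# Example: the grid tensor of f = x1+x2+x3+x4 has exact TT ranks 2.
g = np.arange(6, dtype=float)
A = (g[:, None, None, None] + g[None, :, None, None]
     + g[None, None, :, None] + g[None, None, None, :])
cores = full_to_tt(A, 1e-10)
B = np.einsum('aib,bjc,ckd,dle->ijkl', *cores)
print([G.shape for G in cores])                   # ranks 2 throughout
print(np.linalg.norm(A - B) / np.linalg.norm(A))  # near machine zero
```

The O(n^{d+1}) cost of Rem. 12.5 is visible here: the first SVD already acts on an n × n^{d−1} matrix.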
Rank reduction in TT format B. Khoromskij, Zurich 2010(L12) 312
One of the most important procedures in structured tensor
computation is the recompression of formatted tensors.
Given a tensor A ∈ TT[r] with non-optimal ranks r_k, we want
to approximate it by another TT tensor B with the smallest
possible ranks r̂_k ≤ r_k, while maintaining the desired relative
accuracy ε:
‖A − B‖_F ≤ ε ‖A‖_F.
Such a “projection” defines the ε-truncation operator,
B = T_ε(A).
Construction of such an operator in the canonical form is a
notoriously difficult task, with no best solution known. Use of
the Tucker format is limited by the curse of dimension.
Rank reduction in TT format B. Khoromskij, Zurich 2010(L12) 313
For the TT-format it can be implemented by using standard
SVD and QR decompositions, see I. Oseledets, [3], as in Alg. 12.2
below, (with notation from [2]).
A MATLAB code for this algorithm is a part of TT-Toolbox
(I. Oseledets).
By SVD_δ in Alg. 12.2 we denote the SVD with all singular values
smaller than δ set to zero, and by QR_rows we denote the
QR decomposition of a matrix in which the Q-factor has
orthonormal rows.
SVD_δ(A) returns the three matrices U, Λ, V of the
decomposition A ≈ UΛV^⊤ (as the MATLAB svd function), and
QR_rows returns two: the Q-factor and the R-factor.
Alg. 12.2 is an extension of the reduced truncated matrix SVD
and of the RHOSVD for canonical tensors.
Algorithm of TT rank recompression B. Khoromskij, Zurich 2010(L12) 314
Alg. 12.2. TT_ε recompression ([3]).
Input: d-dimensional tensor A in the TT format, required accuracy ε > 0.
Output: B in the TT format with the smallest compression ranks r̂_k, s.t.
‖A − B‖_F ≤ ε ‖A‖_F, i.e. B = T_ε(A).
1: Let G_k, k = 1, . . . , d, be the cores of A.
2: Initialization: compute the truncation parameter δ = (ε / √(d − 1)) ‖A‖_F.
3: Right-to-left orthogonalization:
4: for k = d to 2 step −1 do
5: [G_k(β_{k−1}; i_k β_k), R(α_{k−1}, β_{k−1})] := QR_rows(G_k(α_{k−1}; i_k β_k)).
6: G_{k−1} := G_{k−1} ×_3 R.
7: end for
8: Compression of the orthogonalized representation:
9: for k = 1 to d − 1 do (compute the δ-truncated SVD):
10: [G_k(β_{k−1} i_k; γ_k), Λ, V(β_k, γ_k)] := SVD_δ[G_k(β_{k−1} i_k; β_k)].
11: G_{k+1} := G_{k+1} ×_1 (VΛ)^⊤.
12: end for
13: Return G_k, k = 1, . . . , d, as the cores of B.
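A compact NumPy sketch of this recompression (illustrative, not the TT-Toolbox implementation; the helper names tt_round and tt_add are invented for this example): a right-to-left QR sweep makes all cores but the first row-orthonormal, after which ‖A‖_F = ‖G_1‖_F, and a left-to-right sweep of δ-truncated SVDs compresses the ranks.

```python
import numpy as np

def tt_round(cores, eps):
    """Sketch of Alg. 12.2: right-to-left QR orthogonalization, then a
    left-to-right sweep of delta-truncated SVDs, delta = eps*||A||/sqrt(d-1)."""
    d = len(cores)
    cores = [G.copy() for G in cores]
    for k in range(d - 1, 0, -1):              # right-to-left QR (rows)
        r0, n, r1 = cores[k].shape
        Q, R = np.linalg.qr(cores[k].reshape(r0, n * r1).T)  # G_k^T = Q R
        cores[k] = Q.T.reshape(-1, n, r1)
        cores[k - 1] = np.einsum('anb,cb->anc', cores[k - 1], R)
    delta = eps * np.linalg.norm(cores[0]) / np.sqrt(d - 1)  # ||A||=||G_1||
    for k in range(d - 1):                     # left-to-right truncation
        r0, n, r1 = cores[k].shape
        U, s, Vt = np.linalg.svd(cores[k].reshape(r0 * n, r1),
                                 full_matrices=False)
        tail = np.concatenate([np.cumsum(s[::-1] ** 2)[::-1], [0.0]])
        r = max(int(np.searchsorted(-tail, -delta ** 2)), 1)
        cores[k] = U[:, :r].reshape(r0, n, r)
        cores[k + 1] = np.einsum('gc,cnb->gnb', s[:r, None] * Vt[:r],
                                 cores[k + 1])
    return cores

def tt_add(a, b):
    """TT sum via block-diagonal cores (the ranks add up)."""
    out = []
    for k, (G, H) in enumerate(zip(a, b)):
        if k == 0:
            out.append(np.concatenate([G, H], axis=2))
        elif k == len(a) - 1:
            out.append(np.concatenate([G, H], axis=0))
        else:
            C = np.zeros((G.shape[0] + H.shape[0], G.shape[1],
                          G.shape[2] + H.shape[2]))
            C[:G.shape[0], :, :G.shape[2]] = G
            C[G.shape[0]:, :, G.shape[2]:] = H
            out.append(C)
    return out

rng = np.random.default_rng(1)
base = ([rng.standard_normal((1, 5, 2))]
        + [rng.standard_normal((2, 5, 2)) for _ in range(2)]
        + [rng.standard_normal((2, 5, 1))])
fat = tt_add(base, base)                  # represents 2*A with ranks 4
rounded = tt_round(fat, 1e-12)
A = 2 * np.einsum('aib,bjc,ckd,dle->ijkl', *base)
B = np.einsum('aib,bjc,ckd,dle->ijkl', *rounded)
print([G.shape[0] for G in rounded[1:]])  # [2, 2, 2]: exact ranks recovered
print(np.linalg.norm(A - B) / np.linalg.norm(A))
```

Adding a TT tensor to itself doubles every rank; the rounding sweep recovers the exact ranks, illustrating the O(dnr³) cost structure noted in Rem. 12.6.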
Algorithm of TT rank recompression B. Khoromskij, Zurich 2010(L12) 315
Rem. 12.6. The complexity of Alg. 12.2 is O(dnr³).
Rem. 12.7. All basic multilinear algebra (MLA) can be
implemented in the TT format: addition, multiplication by a
number, scalar product and norm calculation,
matrix-by-vector product, tensor-by-vector contracted
product, etc. Combined with the well-posed recompression
procedure providing quasi-optimal approximation, this gives an
efficient tool for solving large-scale high-dimensional problems.
Exer. 12.2. Apply the rank-2 TT representation of the tensor related to
f(x) = sin(x_1 + ... + x_d) = (e^{ix} − e^{−ix}) / (2i) = Im(e^{ix}), x = x_1 + ... + x_d,
to approximate the exact value of the multivariate integral for d = 5, 10, 20,
I(d) = Im ∫_{[0,1]^d} e^{i(x_1+...+x_d)} dx = Im[((e^i − 1)/i)^d].
Hint: Use Lem. 2.1, Lect. 2; apply a simple quadrature rule on an n × ... × n
grid, and the TT scalar product.
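Since e^{i(x_1+...+x_d)} is a rank-1 (separable) tensor, the d-dimensional quadrature sum in this exercise collapses to the d-th power of a one-dimensional sum. The following NumPy sketch (the helper names I_exact and I_sep are ad-hoc) compares this with the closed-form value:

```python
import numpy as np

def I_exact(d):
    # I(d) = Im[((e^i - 1)/i)^d]
    return (((np.exp(1j) - 1) / 1j) ** d).imag

def I_sep(d, n=200):
    # Midpoint rule; sin(x1+...+xd) = Im prod_l exp(i x_l) is the
    # imaginary part of a rank-1 tensor, so the n^d-term quadrature
    # sum collapses to the d-th power of a single 1-D sum.
    x = (np.arange(n) + 0.5) / n
    return ((np.exp(1j * x).sum() / n) ** d).imag

for d in (5, 10, 20):
    print(d, I_exact(d), I_sep(d))
```

In the exercise proper the same numbers are obtained from the TT scalar product of the rank-2 sin tensor with the rank-1 tensor of quadrature weights.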
Basic tensor formats Tucker, MPS: HT, TT, QTT B. Khoromskij, Zurich 2010(L12) 316
Tensor methods in electronic structure calcul. (Hartree-Fock eq.):
Tucker, mixed Tucker/canonical - [BNK, Khoromskaia, Flad ’08-’10]
Canonical applies to SPDEs - [BNK, Ch. Schwab ’10]
DMRG via Matrix Product States (full configuration interaction
quantum chemistry) - [White ’92] ... [Schneider ’10]
Tucker + Hierarchical Dim. Splitting in molec. dynam., MCTDH -
[Meyer et al ’00-’09, Wang, Thoss ’03, ...]
Slightly entangled quantum computation, quantum Ising model [Vidal ’03]
TT (tensor train) + TT-toolbox (MATLAB) - [Oseledets, Tyrtyshnikov ’09]
HDS - O(drlog dn)-representation: [BNK ’06]; TC (periodic TT) - [BNK ’09]
HT - (hierarchical Tucker) [Hackbusch, Kuhn, Grasedyck ’09]
Quantics-TT - [BNK, Oseledets ’09] – SPDEs, DMRG in quant. mol. dyn.
Toward Tensor Networks:
TT/QTT ≈ MPS ⊂ TC/QTC ⊂ TN.
On nonlinear approximation in TT-type tensor formats B. Khoromskij, Zurich 2010(L12) 317
Important operation: “projection” onto the nonlinear
manifold S of rank-structured tensors,
S ⊂ S_0 ⊂ V_n.
Def. 12.3. (Tensor truncation T_S : S_0 ⊂ V_n → S).
Let S ∈ {T_r, C_R, TC_{R,r}, TT, QTT}.
Given A ∈ S_0 ⊂ V_n: find T_S(A) := argmin_{T ∈ S} ‖T − A‖.
⇒ The nonlinear approximation problem of computing T_S for
S ∈ {TT, TC, QTT} admits an SVD-based implementation
providing quasioptimal error and linear scaling in d.
The operator T_S extends the truncated SVD for
matrices to higher dimensions d > 2, getting rid of the “curse
of dimensionality”.
Approximation tools: combine analytic & algebraic methods B. Khoromskij, Zurich 2010(L12) 318
Approximation in TT format can be based on:
(a) Analytic methods.
(b) Canonical → TT recompression (Any canonical decomposition is a
good starting point for further algebraic TT-rank approximation).
(c) SVD-based recompression of TT-tensors.
(d) Combination of (a) - (c).
Exer. 12.3. In some cases a TT representation can be derived from the
explicit function-TT (FTT) decomposition [Oseledets ’10]. The rank-2 TT
representation of the Kronecker sum tensor (cf. Def. 8.4, Lect. 8)
A := ∑_{ℓ=1}^d 1 ⊗ · · · ⊗ X_ℓ ⊗ · · · ⊗ 1, X_ℓ ∈ R^{n_ℓ},
is obtained as the n_1 × · · · × n_d grid representation of the rank-2 FTT
decomposition of f(x) = x_1 + x_2 + . . . + x_d, cf. Exer. 3.1, Lect. 1,
f(x) = [x_1  1] [1  0; x_2  1] · · · [1  0; x_{d−1}  1] [1; x_d].
Check by the TT Toolbox. Derive the same with the generalization
x_ℓ → f_ℓ(x_ℓ). (Recall that rank(A) = d.)
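The matrix product in this FTT decomposition can be verified directly; a minimal NumPy check (the helper name ftt_sum is ad-hoc):

```python
import numpy as np

def ftt_sum(xs):
    """Evaluate x1+...+xd through the rank-2 matrix product of Exer. 12.3."""
    v = np.array([[xs[0], 1.0]])                     # row (x1  1)
    for x in xs[1:-1]:
        v = v @ np.array([[1.0, 0.0], [x, 1.0]])     # running sum in slot 1
    return (v @ np.array([[1.0], [xs[-1]]])).item()  # close with (1; xd)

xs = [0.3, -1.2, 2.5, 4.0]
print(ftt_sum(xs), sum(xs))                          # both 5.6 (up to roundoff)
```

Multiplying the 2 × 2 factors shows that the first entry accumulates the partial sum x_1 + ... + x_k while the second stays 1, which is why the TT ranks stay equal to 2 even though the canonical rank of A is d.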
Tensor numerical methods: main ingredients B. Khoromskij, Zurich 2010(L12) 319
1. Discretization in the tensor-product Hilbert space of N-d
tensors, V_n = R^{I_1 × · · · × I_d}, #I_ℓ = N.
2. MLA in low separation-rank tensor formats S ⊂ V_n:
S = C_R, T_r, TC_{R,r}, TT[r], QTT[r], QTT_loc.
Key point: Efficient tensor truncation (projection),
TS : S0 → S ⊂ S0 ⊂ Vn,
based on SVD + (R)HOSVD + ALS + ... + multigrid.
3. Multilevel tensor-truncated preconditioned iteration.
4. Minimisation on tensor manifold: precond. + DMRG + QTT.
5. Explicit TT-representation of functions and operators.
6. Quasi-direct tensor solvers: A−1, exp(tA), 1D sPDEs.
Literature to Lecture 12 B. Khoromskij, Zurich 2010(L12) 320
1. B.N. Khoromskij, O(d logN)-Quantics Approximation of N-d Tensors in High-Dimensional Numerical
Modeling. Preprint 55/2009 MPI MiS, Leipzig 2009. J. Constr. Approx, 2011, to appear.
2. B.N. Khoromskij, and I. Oseledets. Quantics-TT approximation of elliptic solution operators in higher
dimensions. Preprint MPI MiS 79/2009, Leipzig. Rus. J. Numer. Anal. Math. Mod., 2011, to appear.
3. I.V. Oseledets, Compact matrix form of the d-dimensional tensor decomposition. Preprint 09-01, INM
RAS, Moscow 2009.
4. I.V. Oseledets, and E.E. Tyrtyshnikov, Breaking the Curse of Dimensionality, or How to Use SVD in
Many Dimensions. SIAM J. Sci. Comp., 31 (2009), 3744-3759.
5. I.V. Oseledets, and E.E. Tyrtyshnikov, TT-cross approximation for multidimensional arrays. Lin. Alg.
and its Applications. 432 (2010), 70-88.
6. B.N. Khoromskij, Structured Rank-(r1, ..., rd) Decomposition of Function-related Tensors in Rd.
Comp. Meth. in Applied Math., 6 (2006), 2, 194-220.
7. S. Holtz, T. Rohwedder, and R. Schneider, On manifolds of tensors of fixed TT-rank. Tech. Rep. 61,
TU Berlin, 2010.
Lect. 13. Quantics-TT model: TT tour of highest dimensions B. Khoromskij, Zurich 2010(L13)
Outline of Lecture 13.
1. Why folding of a vector to high dimensional tensor?
2. Quantics folding. Quantics + TT = QTT.
3. Combined multiplicative tensor formats.
4. QTT representation of exponential/trigonometric vectors.
5. QTT of polynomials, sampled over uniform/graded grids.
6. Analytic methods of approximation.
7. On explicit (closed form) TT/QTT representation.
8. Numerics:
(a) QTT vector/tensor compression.
(b) QTT matrix compression.
9. Toward TT/QTT numerical methods.
Main properties of TT/TC models revisited B. Khoromskij, Zurich 2010(L13) 322
Thm. 12.1. (Lect. 12). (Storage, rank, concatenation, quasioptimality).
(A) Storage: ∑_{ℓ=1}^d r_{ℓ−1} r_ℓ N ≤ d r² N with r = max_ℓ r_ℓ.
(B) Rank bound: r_ℓ ≤ rank_ℓ(V).
(C) Canonical embeddings:
TT[r] ⊂ TC[r]; C_{R,n} ⊂ TT[r, n, d] with r = (R, ..., R).
(D) Concatenation to higher dimensions: V[d_1] ⊗ V[d_2] → D = d_1 + d_2.
(E) The quasioptimal TT[r]-approximation of V ∈ V_n,
min_{T ∈ TT[r]} ‖V − T‖_F ≤ (∑_{ℓ=1}^d ε_ℓ²)^{1/2}, ε_ℓ = min_{rank B ≤ r_ℓ} ‖V_[ℓ] − B‖_F,
can be computed by QR/SVD; the TC approximation requires an ALS iteration.
Quantics model: TT tour of higher dimensions B. Khoromskij, Zurich 2010(L13) 323
Lem. 13.1. [BNK ’09-’10] (Quantics approximation – background).
For a given N = 2^L, with L ∈ N, and c, z ∈ C, the exponential N-vector
X := {c z^{n−1}}_{n=1}^N ∈ C^N
can be reshaped by the dyadic folding to the rank-1, 2 × 2 × . . . × 2 (L-fold) tensor,
F_{2,L} : X ↦ A = c ⊗_{p=1}^L [1; z^{2^{p−1}}], A : {1, 2}^{⊗L} → C (a 2-L tensor).
The trigonometric N-vector X := {sin(h(n − 1))}_{n=1}^N ∈ C^N can be
reshaped to a 2-L tensor of TT-rank 2. Hint: sin z = (e^{iz} − e^{−iz}) / (2i).
Explicit representation of the sin-vector, with x_p = h 2^{p−1} i_p, i_p ∈ {0, 1}, n − 1 = ∑_{p=1}^L i_p 2^{p−1}:
F_{2,L} : X ↦ [sin x_1  cos x_1] ⊗_{p=2}^{L−1} [cos x_p  −sin x_p; sin x_p  cos x_p] ⊗ [cos x_L; sin x_L].
Benefit: the number of representation parameters is reduced from N to 2 log_2 N.
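The rank-1 folding of the exponential vector can be verified in NumPy (an illustrative sketch; note that NumPy's C-ordered reshape makes the first tensor axis the most significant bit, so the factors enter the outer product in reverse order):

```python
import functools
import numpy as np

L, c, z = 6, 2.0, 0.9 * np.exp(0.3j)
N = 2 ** L
X = c * z ** np.arange(N)                 # X(n) = c z^(n-1), n = 1, ..., N

# With n-1 = sum_p i_p 2^(p-1), i_p in {0,1}, z^(n-1) separates into the
# rank-1 product of the L factors [1, z^(2^(p-1))].
factors = [np.array([1.0, z ** 2 ** (p - 1)]) for p in range(1, L + 1)]

# C-ordered reshape: first axis = most significant bit, hence reversed().
A = c * functools.reduce(np.multiply.outer, list(reversed(factors)))
print(np.allclose(X.reshape([2] * L), A))  # True: rank-1 2x2x...x2 folding
```

The 2^L entries of X are thus encoded by 2L complex parameters, the "N → 2 log_2 N" compression stated above.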
Quantics folding to higher dimension B. Khoromskij, Zurich 2010(L13) 324
Quantics folding: vector/tensor tensorization to the highest possible dimension.
Def. 13.1. The q-adic folding of degree 2 ≤ p ≤ L,
F_{q,d,p} : V_{n,d} → V_{m,dp}, m = (m_1, ..., m_d), m_ℓ = (m_{ℓ,1}, ..., m_{ℓ,p}),
m_{ℓ,1} = q^{L−p+1}, m_{ℓ,ν} = q for ν = 2, ..., p (ℓ = 1, ..., d), reshapes the n-d
tensors in V_{n,d} to the elements of the quantics space V_{m,dp}:
(A) d = 1: a vector X_{(N,1)} = [X(i)]_{i∈I} ∈ V_{N,1} is reshaped to V_{q^{L−p+1},p},
F_{q,1,p} : X_{(N,1)} → Y_{(m,p)} = [Y(j)] := [X(i)], j = (j_1, ..., j_p),
j_1 ∈ {1, ..., q^{L−p+1}}, and j_ν ∈ {1, ..., q} for ν = 2, ..., p.
For fixed i, j_ν = j_ν(i) is defined by j_ν = 1 + C_{L−p−1+ν} (ν = 1, ..., p), where
the C_{L−p−1+ν} are found from the partial radix-q representation of i − 1,
i − 1 = C_{L−p} + C_{L−p+1} q^{L−p+1} + · · · + C_{L−1} q^{L−1}.
(B) For d > 1, a tensor A_{(n,d)} = [A(i_1, ..., i_d)], i_ℓ ∈ I_ℓ, ℓ = 1, ..., d, is
reshaped to
F_{q,d,p} : A_{(n,d)} → B_{(m,dp)} = [B(j_1, ..., j_d)] := [A(i_1, ..., i_d)], j_ℓ = (j_{ℓ,1}, ..., j_{ℓ,p}),
Quantics folding to higher dimension B. Khoromskij, Zurich 2010(L13) 325
with jℓ,1 ∈ 1, ..., qL−p+1, and jℓ,ν ∈ 1, ..., q, for ν = 2, ..., p, and for all
ℓ = 1, ..., d. Now the univariate ℓ-mode index iℓ is reshaped into jℓ as in the
case d = 1.
(C) In the case p = 1, we define Fq,d,1 as the identity mapping.
Rem. 13.1. For the sake of higher compressibility, the maximal degree
folding, Fq,d,L, should be applied, corresponding to p = L. In this case the
index jν − 1 (ν = 1, ..., L) is the q-adic representation of iℓ − 1 for iℓ ∈ Iℓ, in
radix-q system, s.t. jν ∈ 1, ..., q. If q = 2, use the binary coding of i− 1,
i− 1 =LX
ν=1
(jν − 1)2ν−1.
Rem. 13.2. One step folding of a N2-vector to N ×N matrix is the well
known procedure to compress data in signal processing.
The unfolding transform, e.g., tensor-to-matrix (matricization) or
tensor-to-vector (vectorization), may be viewed as the reverse to the
folding transform,
F−1q,d,p : Vm,dp → Vn,d.
Quantics folding to higher dimension B. Khoromskij, Zurich 2010(L13) 326
The folding transform F_{q,d,p} exhibits many useful properties:
(F1) F_{q,d,p} is a linear isometry between V_{N,d} and V_{q^{L−p+1},dp} with
the inverse transform (unfolding)
F_{q,d,p}^{−1} : V_{q^{L−p+1},dp} → V_{N,d}.
(F2) The q-folding of a rank-1 tensor w = x_1 ⊗ . . . ⊗ x_d ∈ V_{N,d} is given by
the outer product of the componentwise reshaping transforms of the
canonical vectors,
F_{q,d,p} w = F_{q,1,p} x_1 ⊗ . . . ⊗ F_{q,1,p} x_d.
(F3) Let d = 1; then for any p = 2, . . . , L and X = [X(i)] ∈ C^N we have the
bound on the TT ranks of the tensor F_{q,1,L} X,
r_{p−1} ≤ rank(X_p),
where X_p is the reshaping of X to an N/q^{p−1} × q^{p−1} matrix.
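Property (F3) gives a practical way to estimate QTT ranks without building any TT decomposition: compute the numerical ranks of the dyadic unfoldings X_p. A NumPy sketch (the helper name qtt_unfolding_ranks and the tolerance are ad-hoc choices):

```python
import numpy as np

def qtt_unfolding_ranks(X, q=2, tol=1e-10):
    """Numerical ranks of the unfoldings X_p (size N/q^(p-1) x q^(p-1));
    by property (F3) these bound the QTT ranks of the folded vector."""
    N = X.size
    L = int(round(np.log(N) / np.log(q)))
    ranks = []
    for p in range(2, L + 1):
        M = X.reshape(N // q ** (p - 1), q ** (p - 1))
        s = np.linalg.svd(M, compute_uv=False)
        ranks.append(int(np.sum(s > tol * s[0])))
    return ranks

N = 2 ** 10
t = np.arange(N, dtype=float)
print(qtt_unfolding_ranks(np.sin(0.01 * t)))   # all 2, as in Lem. 13.1
print(qtt_unfolding_ranks((t / N) ** 3))       # bounded by m + 1 = 4
```

The sin-vector shows rank 2 in every unfolding, and the cubic polynomial stays within the m + 1 bound discussed in Rem. 13.5 below.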
Why quantics model does a job B. Khoromskij, Zurich 2010(L13) 327
Lem. 13.2. For given N = q^L, with q = 2, 3, . . . and L ∈ N_+, and for given
c_k, z_k ∈ C (k = 1, . . . , R), we have:
(A) An exponential-sum N-vector,
X := {x_n := ∑_{k=1}^R c_k z_k^{n−1}}_{n=1}^N,
can be reshaped by the q-folding F_{q,1,L} to the rank-R q-L tensor in V_{q,L},
F_{q,1,L} : X → A_{(q,L)} = ∑_{k=1}^R c_k ⊗_{p=1}^L [1  z_k^{q^{p−1}}  . . .  z_k^{(q−1)q^{p−1}}]^T ∈ C_{R,q}[TT[1]].
(B) A sum of trigonometric N-vectors, X := {x_n := ∑_{k=1}^R c_k sin(α_k(n−1))}_{n=1}^N, can
be reshaped to the rank-2R q-L tensor A_{(q,L)}, whose TT-ranks do not
exceed 2R,
F_{q,1,L} : X → A_{(q,L)} = ∑_{k=1}^R A_k ∈ V_{q,L}, with A_k ∈ TT[2, L].
In both cases, the number of representation parameters is reduced from
(N + 1)R to (qL + 1)R and 4qLR, respectively.
Why quantics model does a job B. Khoromskij, Zurich 2010(L13) 328
Proof. (A) R = 1, by induction: L = 2, i.e., N = q², F_{q,1,2} : X_{(q²,1)} → A_{(q,2)},
A_{(q,2)} :=
[ 1        z^q       · · ·  z^{(q−1)q}
  z        z^{q+1}   · · ·  z^{(q−1)q+1}
  ...
  z^{q−1}  z^{2q−1}  · · ·  z^{q²−1} ]
= [1; z; ...; z^{q−1}] [1  z^q  . . .  z^{(q−1)q}].
Induction step: L to L + 1, i.e., N = q · q^L. The subvectors x_1, ..., x_q ∈ R^{q^L} of
X_{(N,1)}, with x_k(i) := X[i + (k − 1)q^L, 1] (k = 1, ..., q, i = 1, ..., q^L), represent
the result of a one-level folding (p = 2) by the rank-1 N/q × q matrix, via
rescaling of the first column x_1,
F_{q,1,2} : X_{(N,1)} → A_{(N/q,2)} := c[x_1 x_2 ... x_q] = c x_1 ⊗ y,
y := [1  z^{q^L}  . . .  z^{(q−1)q^L}]^T.
By induction, substitute each x_k, k = 1, ..., q, of size N/q = q^L by the rank-1 tensor, to obtain
A_{(q,L+1)} = c [⊗_{p=1}^L [1  z^{q^{p−1}}  . . .  z^{(q−1)q^{p−1}}]^T] ⊗ [1  z^{q^L}  . . .  z^{(q−1)q^L}]^T.
Why quantics model does a job B. Khoromskij, Zurich 2010(L13) 329
(B) Again, we begin with the case R = 1. Using the trigonometric identity
sin z = (e^{iz} − e^{−iz}) / (2i),
and applying item (A) with R = 1, we arrive at the required claim on the
tensor rank of A_{(q,L)} over the field C.
Now, the rank of each ℓ-mode TT-unfolding matrix of the q-L tensor does
not exceed 2, since the matrix rank does not change if we extend the field
R to C. Since the TT-ranks do not exceed the ranks of the respective
directional unfolding matrices, the maximal TT-rank of the q-L tensor
A_{(q,L)} is bounded by 2R.
In the case of an arbitrary rank parameter R > 1, the result is obtained by
summation of the rank-1 (resp. rank-2) terms.
The complexity bounds then follow from Thm. 12.1, (A).
Examples on quantics-TT model B. Khoromskij, Zurich 2010(L13) 330
Ex. 13.1. For d = 1 and p = 2, 3, F_{q,1,p} folds an N-vector to an
N/q × q matrix or to an N/q² × q × q 3-tensor, respectively.
Ex. 13.2. Quantics folding of the exponential N-vector:
N = 8, L = 3, X = [1 z z^2 z^3 z^4 z^5 z^6 z^7]^T ∈ C^8, F_{2,3}(X) ∈ C^{2×2×2},
F_{2,1,3} : X ↦ A = [1; z] ⊗ [1; z^2] ⊗ [1; z^4] ∈ QTT[1, 3] ⊂ C^{2×2×2}.
[Figure: the d = 6 tensor-chain diagram (mode sizes N, bond ranks r_1, . . . , r_6), and the folding F of an N = 2^3 mode to a chain of d = log_2 N = 3 quantics cores.]
Def. 13.2. Quantics + TT = QTT model (or QTC).
Combined multiplicative tensor formats B. Khoromskij, Zurich 2010(L13) 331
Introduce the Tucker-TC format, T_r[TC[r_1]], containing all Tucker
tensors in T_{r,n} with the Tucker core in the rank-r_1 TC format. Now the
storage complexity of the representation scales linearly in r, O(drN + d r_1² r),
while the representation basis is given explicitly by the “optimal” set of
orthogonal Tucker vectors. Notice: TC_{R,r} ⊂ T_r[TT[r_1]] with r_1 = (R, ..., R).
Rem. 13.3. The hierarchical Tucker format introduced in [3] is closely
related to the TT model: actually, it is in the spirit of the T_r[TT[r_1]] format.
The so-called canonical-TC format, C_{R,n}[TC[r, L]], is specified as the set
of N-d tensors in C_{R,n} with N = q^L, where each canonical N-vector in the
rank-1 terms is represented by a q-L tensor in the TC[r, L] format with
L = log_q N. The particular representation looks like
V = ∑_{k=1}^R c_k T_k^{(1)} ⊗ T_k^{(2)} ⊗ . . . ⊗ T_k^{(d)} ∈ C_{R,n}[TC[r, L]], (110)
where, for k = 1, ..., R, ν = 1, ..., d,
T_k^{(ν)} := ⨉_{ℓ=1}^L G_{k,ν}^{(ℓ)} ∈ TC[r, L] with small-size G_{k,ν}^{(ℓ)} ∈ R^{r_{ℓ−1} × q × r_ℓ}.
The storage complexity scales logarithmically in N, O(R r² d log N); hence,
it has advantages for large tensor size N.
Toward computations in quantum tensor networks B. Khoromskij, Zurich 2010(L13) 332
[Figure: tensor-network diagrams — an open MPS/TT chain and a closed tensor-chain ring, with physical indices i_1, . . . , i_d and auxiliary (bond) indices α_1, . . . , α_d.]
Quantics approxim. on classes of functional vectors B. Khoromskij, Zurich 2010(L13) 333
The exponential-trigonometric product vector allows a 4 log_q N
complexity quantics representation.
Lem. 13.3. For given N = q^L, with q = 2, 3, . . ., L ∈ N_+, and c, z ∈ C, α ∈ R,
the exponential-trigonometric vector X := {x_n := c z^{n−1} sin(α(n−1))}_{n=1}^N
can be reshaped to the q-L tensor F_{q,1,L} : X → A_{(q,L)} ∈ TT[2], whose
TT-ranks do not exceed 2.
Proof. The properties of the folding transform F_{q,1,L} imply that the q-L
tensor A_{(q,L)} is obtained as the Hadamard product of the rank-1 quantics
representation of a single exponential and the rank-2 quantics representation of the
trigonometric vector (cf. Lem. 13.2). Now, the statement follows from
the fact that the Hadamard product with a rank-1 tensor does not enlarge
the TT-rank of the second factor, which is exactly 2. Hence, the TT-rank
of the resultant q-L tensor A_{(q,L)} does not exceed 2.
Rem. 13.4. The minimization of the parametric function f_x(q) := q log_q x
for large values of x ∈ R_+ leads to the optimal quantics base q* ∈ [2, 3].
This means that for large vector size N, the choice q = 2, 3 leads to the
best compression rate.
Quantics approxim. on classes of functional vectors B. Khoromskij, Zurich 2010(L13) 334
Rem. 13.5. Property (F3) of the quantics folding ensures that the QTT
rank of a vector obtained by equidistant sampling of a polynomial of
degree m does not exceed m + 1. In fact, the column space of each
reshaped TT-unfolding matrix is spanned by at most m + 1 polynomial
vectors generated by 1, x, . . . , x^m, respectively. The explicit (closed form)
rank-(m + 1) QTT representation is discussed in Lect. 14.
The quantics format applies to piecewise polynomial wavelet basis
functions. For example, the QTT rank of the Haar wavelet does not exceed
2, implying that the asymptotic QTT compression properties are at least
as good as for the Haar wavelets.
Conj. 13.1. Based on extensive numerical tests, we further assume
that the Gaussian- and sinc-vectors obtained via uniform sampling
allow a q-folding quantics approximation whose TT-rank remains
bounded by a small constant (say, 4 or 5) uniformly in the vector size N (see
Table 1).
Quantics approxim. on classes of functional vectors B. Khoromskij, Zurich 2010(L13) 335
Using equidistant sampling points is not mandatory.
The next statements give a uniform rank bound for polynomial vectors
sampled at the N + 1 Chebyshev-Gauss-Lobatto nodes,
x_j = cos(πj/N) ∈ [−1, 1], j = 0, . . . , N,
and for a Gaussian-type function with quadratic mesh grading.
Lem. 13.4. (A) For any n = 0, 1, . . ., the Chebyshev polynomial
T_n(x) = cos(n arccos x), |x| ≤ 1, sampled over N + 1 = 2^L Chebyshev nodes
x_j ∈ [−1, 1], can be represented in the quantics space of 2-log N tensors
with both C-rank and QTT-rank ≤ 2, uniformly in N.
(B) The Chebyshev polynomial T_n(x), sampled as a vector X at the
Chebyshev nodes θ_j = arccos x_j, has the explicit rank-2 QTT
representation (with y_p = h 2^{p−1} i_p, i_p ∈ {0, 1}, h = 2/N),
X ↦ [cos y_1  −sin y_1] ⊗_{p=2}^{L−1} [cos y_p  −sin y_p; sin y_p  cos y_p] ⊗ [cos y_L; sin y_L].
(C) Any polynomial of degree m sampled over N + 1 = 2^L Chebyshev
nodes in [−1, 1] has a quantics-TT separation rank bounded by 2m + 1.
Quantics approxim. on classes of functional vectors B. Khoromskij, Zurich 2010(L13) 336
Proof. (A) First we note that the Chebyshev polynomial
T_n(x) = cos(n arccos x), sampled at the Chebyshev nodes, coincides with the
cos-trigonometric vector sampled over uniformly graded points in the variable
θ_j = arccos x_j, j = 0, . . . , N. Then the result follows by Lem. 13.1.
(B) Based on the explicit FTT representation of the cos-vector (Lect. 14).
(C) Any polynomial of degree m can be represented in the orthogonal
basis of Chebyshev polynomials by at most m + 1 terms with T_0 = 1.
Hence (C) follows from item (A).
Lem. 13.4 can be applied to the case of polynomial interpolation over
Chebyshev nodes, which is usually preferable to interpolation over
equispaced nodes. In fact, this prevents the well-known instability
appearing in interpolation processes based on equidistant grids.
Rem. 13.6. The TT-rank of the q-folded discrete Gaussian-type vector
sampled over the uniform grid, {e^{−α(n−1)²}}_{n=1}^N, appears to be greater
than 2, but numerical tests show that it remains almost uniformly
bounded in the vector size N (see Table 1 below). Lem. 13.2 implies
a rank-1 quantics representation in the case of quadratic mesh grading
toward the origin, i.e., by sampling the Gaussian e^{−αt²} over
t_n = √(h(n − 1)), n = 1, . . . , N, h > 0.
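The effect of the quadratic mesh grading is easy to observe numerically: on the graded grid the sampled Gaussian becomes the geometric vector e^{−αh(n−1)}, hence rank 1, while on the uniform grid the ranks stay small but larger than 1. An illustrative NumPy check (helper name max_unfolding_rank and parameters are ad-hoc):

```python
import numpy as np

L, alpha, h = 10, 0.5, 1e-3
N = 2 ** L
n = np.arange(1, N + 1, dtype=float)

def max_unfolding_rank(X, tol=1e-10):
    """Largest numerical rank among the dyadic unfoldings of X."""
    ranks = []
    for p in range(2, L + 1):
        s = np.linalg.svd(X.reshape(N // 2 ** (p - 1), 2 ** (p - 1)),
                          compute_uv=False)
        ranks.append(int(np.sum(s > tol * s[0])))
    return max(ranks)

uniform = np.exp(-alpha * (h * (n - 1)) ** 2)  # t_n = h(n-1), uniform grid
graded = np.exp(-alpha * h * (n - 1))          # e^{-alpha t_n^2}, t_n = sqrt(h(n-1))
print(max_unfolding_rank(uniform))             # small, but larger than 1
print(max_unfolding_rank(graded))              # 1: a geometric vector
```

The unfolding-rank bound (F3) makes the contrast visible without computing any QTT decomposition.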
Analytic quantics approximation B. Khoromskij, Zurich 2010(L13) 337
Consider the error bound for the semianalytic quantics
approximation of function-related (FR) tensors.
Lem. 13.5. For a given continuous function f : [a, b] → R and
ε > 0, suppose there is an approximation s.t.
max_{x∈[a,b]} |f(x) − ∑_{k=1}^M c_k e^{−t_k x}| ≤ ε. (111)
Then for any N = q^L, with q = 2, 3, . . . and L ∈ N_+, we have:
(A) The FR N-d tensor F = [F_i], defined by
F_i = f(h(i_1 + i_2 + . . . + i_d)), i ∈ I^{⊗d}, h > 0, with a ≤ dh ≤ b/N,
with the generating multivariate function f(x_1 + . . . + x_d), can
be represented by a rank-M q-dL tensor, up to the
tolerance ε in the max-norm.
Analytic quantics approximation B. Khoromskij, Zurich 2010(L13) 338
(B) Let a ≤ dh ≤ b/N; then the FR N-d tensor G = [G_i],
G_i = f(x²_{1,i_1} + . . . + x²_{d,i_d}), x_{ℓ,i_ℓ} = √(h i_ℓ), i ∈ I^{⊗d},
discretising the multivariate function g = f(x_1² + . . . + x_d²) on the
polynomially graded grid {x_{ℓ,i_ℓ}}, embedded into the region
a ≤ ∑_{ℓ=1}^d x_ℓ² ≤ b, can be approximated by a rank-M q-dL
tensor with the tolerance ε in the max-norm.
In both cases, the representation complexity is O(dqM log_q N).
Proof. We notice that the previous results can be applied to R-term sums
of exponential/trigonometric vectors in d dimensions, i.e., to the
respective N-d tensors,
A_{(n,d)} := {x_n := ∑_{k=1}^R c_k ∏_{ℓ=1}^d z_{k,ℓ}^{n_ℓ−1}}_{n∈I^{⊗d}}, I = {1, ..., N}; (112)
A_{(n,d)} can be reshaped to the Q-format C_{R,q}[TT[1, dL]] of complexity
dqR log_q N. Now items (A), (B) directly follow from (112).
Analytic quantics approximation B. Khoromskij, Zurich 2010(L13) 339
Lem. 13.5 allows us to derive accurate O(d log N)
approximations to a wide class of function related tensors
in high dimension. For a class of analytic functions the basic
approximability assumption (111) can be verified with

ε = O(e^{−αM/log M}), α > 0,

by applying the sinc-approximation.
Exer. 13.1. Check the QTT rank of a monomial, a general polynomial and a
Chebyshev polynomial, all over uniform and Chebyshev grids in [−1, 1].

Exer. 13.2. Test the QTT rank of the sin-Helmholtz kernel (scaling in κ?).

Exer. 13.3. Find the QTT rank of the step-function and the Haar wavelet.

In all cases look at the average rank,

r := sqrt( (1/(d−1)) Σ_{k=1}^{d−1} r_k r_{k+1} ).
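A direct way to attack these exercises is a sequential-SVD (TT-SVD) rank count on the quantics reshape. The course exercises use MATLAB; the following is an equivalent NumPy sketch with names of our choosing, and the boundary ranks in the average-rank formula are taken as 1 (our reading of the formula).

```python
import numpy as np

def qtt_ranks(v, eps=1e-9):
    """TT ranks of the 2 x ... x 2 quantics reshape of v via sequential SVD."""
    d = int(np.log2(v.size))
    ranks, C = [], v.reshape(2, -1)
    for _ in range(d - 1):
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        rk = max(1, int(np.sum(s > eps * s[0])))   # relative truncation
        ranks.append(rk)
        C = (s[:rk, None] * Vt[:rk]).reshape(rk * 2, -1)
    return ranks

r_sin = qtt_ranks(np.sin(0.003 * np.arange(2**10)))   # exact QTT ranks 2
r_exp = qtt_ranks(np.exp(-0.01 * np.arange(2**10)))   # exact QTT ranks 1

# average rank as in the exercise, with the trailing boundary rank set to 1
rs = r_sin + [1]
r_av = (sum(a * b for a, b in zip(rs, rs[1:])) / len(r_sin)) ** 0.5
print(r_sin, r_exp, r_av)
```

Running the same routine on the step-function or Haar wavelet of Exer. 13.3 only requires changing the sampled vector.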
Super-compression in high dimension? B. Khoromskij, Zurich 2010(L13) 340
Exer. 13.4. Linear-log-log scaling via quantics in an auxiliary
dimension: the dth order Hilbert N-d tensor A of dimension N^{⊗d},

A(i_1, ..., i_d) = 1/(i_1 + i_2 + ... + i_d) ≈ Σ_{k=−M}^{M} c_k ⊗_{ℓ=1}^{d} e^{−t_k i_ℓ},

i_1, ..., i_d = 1, ..., N = 2^L, can be approximated by a rank-|log ε|
tensor of order D = d log N and of size 2^{⊗D}, requiring only
Q = d |log ε| log N ≪ N^d reals.

Using our canonical decomposition, compute its QTT
approximation applying C-to-QTT.

Numerical gain:

Matrix case: d = 2, N = 2^{20} ⇒ Q = 40 |log ε| ≪ 2^{40}.

High dimension: d = 2^{10}, N = 2^{20} ⇒ Q = 20 · 2^{10} |log ε| ≪ 2^{2·10^4}.
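The high-dimensional gain can be made concrete with a two-line computation; here we assume ε = 10^{−5}, so |log ε| is taken as 5 decimal digits (an assumption consistent with the tolerance used in the tables below).

```python
import math

# Storage count Q = d * |log eps| * log2(N) versus the full size N**d.
d, L, log_eps = 2**10, 20, 5            # N = 2**20, eps ~ 1e-5 assumed
Q = d * log_eps * L                     # reals in the quantics format
full_digits = d * L * math.log10(2)     # log10 of N**d entries
print(Q, round(full_digits))            # ~1e5 reals vs a 6000+-digit count
```

So roughly 10^5 stored reals replace a tensor whose entry count has more than six thousand decimal digits.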
Numerics on quantics model B. Khoromskij, Zurich 2010(L13) 341
Tables 1 and 2 represent the average QTT-ranks in the approximation of function related vectors/matrices up to
the tolerance ε = 10^{−5}. Table 2 includes the example of a matrix exponential (cf. Conj. 13.1). One can observe
that the rank parameters are small, and depend very mildly on the grid size.

N \ r  | e^{−αx²}, α = 0.1÷10² | sin(αx)/x, α = 1÷10² | 1/x | e^{−x} | x^x, x^{10}, x^{1/10}
2^{10} | 3.2/2.8/2.8/2.2       | 4.0/4.7/5.5          | 4   | 3.5    | 1.9/2.7/3.9
2^{12} | 3.1/2.9/2.9/2.6       | 3.8/4.8/5.6          | 4.2 | 3.8    | 1.9/2.6/3.9
2^{14} | 2.9/2.8/2.8/2.8       | 3.6/4.7/5.5          | 4.2 | 3.8    | 1.9/2.5/3.9
2^{16} | 2.8/2.7/2.8/2.8       | 3.6/4.5/5.4          | 4.2 | 5.3    | 1.9/2.4/3.9

Table 1: QTT2-ranks of large functional N-vectors, N = 2^p.

N \ r  | e^{−α∆₁}, α = 0.1, 1, 10, 10² | ∆₁^{−1} | diag(1/x²) | diag(e^{−x²})
2^9    | 6.2/6.8/9.7/11.2              | 6.2     | 5.1        | 4.0
2^{10} | 6.3/6.8/9.5/10.8              | 6.3     | 5.3        | 4.0
2^{11} | 6.4/6.8/9.0/10.4              | 6.2     | 5.5        | 4.1

Table 2: QTT2-matrix-ranks of N × N matrices for N = 2^p.
Numerics on quantics model B. Khoromskij, Zurich 2010(L13) 342
N \ r  | 1/(x1 + x2) | e^{−‖x‖} | e^{−‖x‖²} | diag(e^{−x²}) | ∆₂^{−1}1, ε = 10^{−6}, 10^{−7}, 10^{−8}
2^9    | 5.0         | 9.4      | 7.8       | 3.8           | 3.6/3.6/3.6
2^{10} | 5.1         | 9.4      | 7.7       | 3.9           | 3.6/3.6/3.6
2^{11} | 5.2         | 9.3      | 7.5       | 3.9           | 3.7/3.7/3.7

Table 3: QTT2-ranks of functional N × N matrices, N = 2^p.
N           | 128  | 256  | 512  | 1024
1/‖x‖       | 13.8 | 16.0 | 17.5 | 18.0
ρ(x)        | 32.0 | 40.0 | 45.8 | 48.6
Hartree (S) | 13.7 | 14.2 | 14.2 | 13.9
Hartree (F) | 32.1 | 34.9 | 20.2 | 28.2

Table 4: QTT2-ranks of the projected 1/‖x‖, x ∈ R³, the Hartree potential
V_H, and the electron density ρ of CH4 on N × N × N grids in 3D; S = 2D slice,
F = full tensor.
TT/QTT based tensor numerical methods: main ingredients B. Khoromskij, Zurich 2010(L13) 343
1. Discretization in a tensor-product Hilbert space of N-d
tensors, V_n = R^{I_1×···×I_d}, #I_ℓ = N = 2^L.

2. MLA in low-rank TT/QTT tensor formats S ⊂ V_n:
S = T_r[TT[r_1]], TT[r], QTT[r], QTT_loc.

Key point: efficient tensor truncation (projection),
T_S : S_0 → S ⊂ S_0 ⊂ V_n,
based on SVD + (R)HOSVD + ALS + ... + multigrid.

3. Explicit TT/QTT-representation of functions and operators.

4. Multilevel tensor-truncated preconditioned iteration.

5. Quasi-direct tensor solvers: A^{−1}, exp(tA), 1D sPDEs.

6. Minimisation on a tensor manifold: precond. + DMRG + QTT.
Literature to Lecture 13 B. Khoromskij, Zurich 2010(L13) 344
1. I.V. Oseledets, Approximation of 2d × 2d matrices using tensor decomposition. SIAM J. Matrix Anal.
Appl., 2010.
2. B.N. Khoromskij, O(d logN)-Quantics Approximation of N-d Tensors in High-Dimensional Numerical
Modeling. Preprint 55/2009 MPI MIS, Leipzig 2009. J. Constr. Approx., 2011, to appear.
3. W. Hackbusch, and S. Kuhn, A new scheme for the tensor representation. Preprint 2/2009, MPI MIS,
Leipzig 2009, submitted.
4. B.N. Khoromskij, and I. Oseledets. Quantics-TT approximation of elliptic solution operators in higher
dimensions. Preprint MPI MiS 79/2009, Leipzig. Rus. J. Numer. Anal. Math. Mod., 2011, to appear.
5. I.V. Oseledets. Constructive representation of functions in tensor formats. Preprint INM Moscow, 2010.
6. I.V. Oseledets, and E.E. Tyrtyshnikov. Breaking the Curse of Dimensionality, or How to Use SVD in
Many Dimensions. SIAM J. Sci. Comp., 31 (2009), 3744-3759.
7. B.N. Khoromskij. Tensor-structured Numerical Methods in Scientific Computing: Survey on Recent
Advances. Preprint 21/2010, MPI MiS Leipzig 2010 (submitted).
Lect. 14. Explicit TT/QTT representation of multivariate vectors B. Khoromskij, Zurich 2010(L14)
Outline of Lecture 14.
1. Operator/matrix TT (OTT/MTT) formats.
2. Explicit (closed form) FTT representation of multivariate
functions in the form f(x) = f1(x1) + f2(x2) + . . .+ fd(xd).
3. Generic trigonometric functions of x1 + ...+ xd.
4. Rank-r separable functions.
5. Multivariate polynomials P (x1 + ...+ xd).
6. Explicit QTT repr. for classes of function related tensors.
7. QTT rank of multivariate polynomials.
8. Remarkable special cases: harmonic oscillator, multivariate
polynomial potentials.
Operator TT (OTT) decomposition B. Khoromskij, Zurich 2010(L14) 346
Recall FTT: Given J := ×_{ℓ=1}^{d} J_ℓ, J_ℓ = {1, ..., r_ℓ}, and J_0 = J_d.

Def. 1.5 (Lect. 1) The rank-r functional tensor chain/train
(FTC/FTT) format contains products of functional
tri-tensors over J (entrywise representation),

f(x_1, ..., x_d) = Σ_{j∈J} g_1(j_d, x_1, j_1) g_2(j_1, x_2, j_2) ··· g_d(j_{d−1}, x_d, j_d),

or, in a tensor/functional form via contracted products over the J_ℓ,

FTC[r] := {f ∈ H : f = ×_{ℓ=1}^{d} G^{(ℓ)}(x_ℓ)} with G^{(ℓ)} ∈ R^{J_{ℓ−1}} × H_ℓ × R^{J_ℓ}.

A function f(x_1, ..., x_d) ∈ H, x ∈ [0, 1]^d, is represented
(approximately) by a product of matrices (matrix product
states), each depending on a single variable x_ℓ.
Rem. 14.1. Efficient tensor numerical methods for multidimensional
equations will require not only low rank FTT decomposition but also the
respective multiplicative representation of operators.
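The entrywise FTT representation above is evaluated at a point by a plain chain of matrix products. The course exercises use MATLAB; below is an equivalent, illustrative NumPy sketch with names of our choosing (open train, boundary ranks 1).

```python
import numpy as np

def ftt_eval(cores, x):
    """Evaluate an FTT/MPS f(x1,...,xd): cores[l](t) returns the
    r_{l-1} x r_l core matrix at t; boundary ranks equal 1."""
    out = np.eye(1)
    for G, xl in zip(cores, x):
        out = out @ G(xl)          # contract over the rank index
    return out.item()

# rank-1 toy train: f(x) = exp(x1) * exp(x2) * exp(x3) = exp(x1+x2+x3)
cores = [lambda t: np.array([[np.exp(t)]])] * 3
print(ftt_eval(cores, [0.1, 0.2, 0.3]))
```

The rank-2 trains of Thm. 14.1 and Lem. 14.2 below plug directly into this evaluator.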
Operator TT (OTT) decomposition B. Khoromskij, Zurich 2010(L14) 347
FTT decomposition induces the important concept of
multiplicative formats for operators acting between two
TPHSs, A : X → Y, each of dimension d (details in Lect. 15).

Ex. 14.1. X = Y = L²[0, 1]^d; X = H¹₀([0, 1]^d), Y = H^{−1}([0, 1]^d);
X = Y = R^{n⊗d}.
Def. 14.1. (OTT/OTC decomp.) Introduce the rank-r operator TC
(OTC) decomposition symbolised by a set of factorised operators A,

A = Σ_{j∈J} G^{(1)}(j_d, j_1) G^{(2)}(j_1, j_2) ··· G^{(d)}(j_{d−1}, j_d),

with G^{(ℓ)} = [G^{(ℓ)}(j_ℓ, j_{ℓ+1})] being an operator valued r_ℓ × r_{ℓ+1} matrix,
where G^{(ℓ)}(j_ℓ, j_{ℓ+1}) : X_ℓ → Y_ℓ (ℓ = 1, ..., d), s.t. the action Af on a rank-1
function f ∈ X is defined as the rank-r TT/TC element in Y,

(Af)(y_1, ..., y_d) := Σ_{j∈J} g_1(j_d, y_1, j_1) g_2(j_1, y_2, j_2) ··· g_d(j_{d−1}, y_d, j_d),

with

g_ℓ(j_{ℓ−1}, y_ℓ, j_ℓ) = (G^{(ℓ)}(j_{ℓ−1}, j_ℓ) f_ℓ)(y_ℓ).
A sum of univariate functions B. Khoromskij, Zurich 2010(L14) 348
Thm. 14.1 ([5]) The function

f(x) = f_1(x_1) + f_2(x_2) + ... + f_d(x_d)

allows a rank-2 FTT decomposition of the form

f(x) = [f_1(x_1)  1] [1 0; f_2(x_2) 1] ··· [1 0; f_{d−1}(x_{d−1}) 1] [1; f_d(x_d)].

Proof. By induction, using the identity, valid for any a and b,

[1 0; a 1] [1 0; b 1] = [1 0; a+b 1].
Rem. 14.2 The rank-2 TT representation of the Kronecker sum tensor (Def. 8.4)

A := Σ_{ℓ=1}^{d} 1 ⊗ ··· ⊗ X_ℓ ⊗ ··· ⊗ 1, X_ℓ ∈ R^{n_ℓ}, rank(A) = d,

where X_ℓ is the grid discretization of f_ℓ(x_ℓ) and 1 denotes the all-ones
vector, is obtained as the n_1 × ··· × n_d grid representation of the rank-2
FTT decomposition of f(x) = f_1(x_1) + f_2(x_2) + ... + f_d(x_d), as in Thm. 14.1.
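The rank-2 factorization of Thm. 14.1 can be checked directly by multiplying out the cores at a sample point; an illustrative NumPy sketch for d = 4 (the choice of univariate factors is ours):

```python
import numpy as np

# Thm. 14.1 factorization: f1(x1)+...+fd(xd) as a rank-2 matrix product.
fs = [np.sin, np.cos, np.exp, np.sqrt]      # sample univariate factors
x = [0.3, 1.1, 0.2, 2.0]

row = np.array([[fs[0](x[0]), 1.0]])        # first core: (f1(x1)  1)
for f, t in zip(fs[1:-1], x[1:-1]):         # middle cores: [1 0; f 1]
    row = row @ np.array([[1.0, 0.0], [f(t), 1.0]])
val = (row @ np.array([[1.0], [fs[-1](x[-1])]])).item()   # last: (1; fd)

print(val, sum(f(t) for f, t in zip(fs, x)))   # the two values agree
```

The running product accumulates the partial sum in its first component, exactly as the identity in the proof shows.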
Trigonometric functions f(x) = T (x1 + ...+ xd) B. Khoromskij, Zurich 2010(L14) 349
Lem. 14.2. The rank-2 FTT decompositions of f(x) := sin(Σ_{j=1}^{d} x_j) and
g(x) := cos(Σ_{j=1}^{d} x_j), x ∈ R^d, have the form (similarly for
tan(Σ_{j=1}^{d} x_j), cot(Σ_{j=1}^{d} x_j), etc.),

f(x) = [sin x_1  cos x_1] Π_{p=2}^{d−1} [cos x_p  −sin x_p; sin x_p  cos x_p] [cos x_d; sin x_d],

g(x) = [cos x_1  −sin x_1] Π_{p=2}^{d−1} [cos x_p  −sin x_p; sin x_p  cos x_p] [cos x_d; sin x_d],

respectively.
Proof. Prove the case g(x) by induction, similarly to Ex. 3.1, Lect. 1:

g(x) = cos x_1 cos(x_2 + ... + x_d) − sin x_1 sin(x_2 + ... + x_d)
     = [cos x_1  −sin x_1] [cos(x_2 + ... + x_d); sin(x_2 + ... + x_d)]
     = [cos x_1  −sin x_1] [cos x_2  −sin x_2; sin x_2  cos x_2] [cos(x_3 + ... + x_d); sin(x_3 + ... + x_d)].
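The rotation-matrix train of Lem. 14.2 is short enough to verify in a few lines; an illustrative NumPy sketch (function name is ours):

```python
import numpy as np

# Lem. 14.2: rank-2 FTT of sin(x1+...+xd) via 2x2 rotation cores.
def sin_ftt(x):
    row = np.array([np.sin(x[0]), np.cos(x[0])])
    for t in x[1:-1]:
        row = row @ np.array([[np.cos(t), -np.sin(t)],
                              [np.sin(t),  np.cos(t)]])
    return row @ np.array([np.cos(x[-1]), np.sin(x[-1])])

x = [0.3, 0.7, 1.2, 0.1]
print(sin_ftt(x), np.sin(sum(x)))   # the two values agree
```

After k cores the running row equals (sin(x_1+...+x_k), cos(x_1+...+x_k)), which is the induction step of the proof.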
H(x+ y) is rank-r separable B. Khoromskij, Zurich 2010(L14) 350
Thm. 14.2. ([5]) Let f be a function that depends on a sum of arguments,

f(x_1, ..., x_d) = H(x_1 + x_2 + ... + x_d),

where the function H(x + y) has separation rank r,

H(x + y) = Σ_{α=1}^{r} u_α(x) v_α(y),

and the functions u_α(x) and v_α(y) form two linearly independent sets. Then

1. All FTT-ranks are bounded by r.

2. If, additionally, x̂_i, i = 1, ..., r, and ŷ_j, j = 1, ..., r, are known s.t. the matrix
with elements H(x̂_i + ŷ_j) is nonsingular, then the FTT decomposition takes the form

f = g_1(x_1) G(x_2) · ... · G(x_{d−1}) g_d(x_d), G(x_ℓ) ∈ R^{r×r},

where

g_1(x_1) = (ψ_1(x_1), ψ_2(x_1), ..., ψ_r(x_1)),

G(x)_{ij} = ψ_i(x + ŷ_j),
H(x+ y) is rank-r separable B. Khoromskij, Zurich 2010(L14) 351
with

ψ_i(z) = Σ_{j=1}^{r} M_{ji} H(x̂_j + z),

g_d(x_d) = (H(ŷ_1 + x_d), H(ŷ_2 + x_d), ..., H(ŷ_r + x_d))^T,

and

[M_{ij}] = [H(x̂_i + ŷ_j)]^{−1}.
Proof. By the separability assumption, the functional skeleton decomposition
ensures that there exist points x̂_i, i = 1, ..., r, and ŷ_j, j = 1, ..., r, such
that

H(x + y) = Σ_{i,j=1}^{r} H(x + ŷ_j) M_{ij} H(x̂_i + y),   (113)

where

[M_{ij}] = [H(x̂_i + ŷ_j)]^{−1}.
The requirement for (113) to be true is that the nodes x̂_i and ŷ_j are chosen
H(x+ y) is rank-r separable B. Khoromskij, Zurich 2010(L14) 352
so that the matrix [H(x̂_i + ŷ_j)] is nonsingular. Let us rewrite the function H(x + y)
in the form

H(x + y) = Σ_{i=1}^{r} H(x + ŷ_i) ψ_i(y),   (114)

where ψ_i(y) is defined as above.
Now, let us proceed with the construction of the FTT decomposition for a function f.
From (113) and (114) it follows that

f = (ψ_1(x_1), ψ_2(x_1), ..., ψ_r(x_1)) (H(ŷ_1 + (x_2 + ... + x_d)), H(ŷ_2 + (x_2 + ... + x_d)), ..., H(ŷ_r + (x_2 + ... + x_d)))^T.
For each element of the second vector in the r.h.s., x_2 can be separated:

H(ŷ_k + (x_2 + ... + x_d)) = Σ_{i=1}^{r} ψ_i(ŷ_k + x_2) H(ŷ_i + (x_3 + ... + x_d)),
H(x+ y) is rank-r separable B. Khoromskij, Zurich 2010(L14) 353
therefore

(H(ŷ_1 + (x_2 + ... + x_d)), ..., H(ŷ_r + (x_2 + ... + x_d)))^T
   = G_2(x_2) (H(ŷ_1 + (x_3 + ... + x_d)), ..., H(ŷ_r + (x_3 + ... + x_d)))^T,

where G_2 is the r × r matrix with elements

G_2(x_2)_{ij} = ψ_i(x_2 + ŷ_j).

This completes the proof.
Cor. 14.1. For a function

f(x_1, ..., x_d) = P(x_1 + x_2 + ... + x_d),

where P(z) is a degree-p polynomial, P(z) = Σ_{k=0}^{p} a_k z^k, the FTT
decomposition has the form

f = g_1(x_1) G(x_2) · ... · G(x_{d−1}) g_d(x_d),
H(x+ y) is rank-r separable B. Khoromskij, Zurich 2010(L14) 354
where G(x) is a matrix function of size (p+1) × (p+1) with entries

G(x)_{ij} = C_i^{i−j} x^{i−j} for i ≥ j, and G(x)_{ij} = 0 otherwise,

for i, j = 0, ..., p, while C_k^s = k!/(s!(k−s)!) is the binomial coefficient. Moreover,

g_1(x_1) = (φ_0(x_1), φ_1(x_1), ..., φ_{p−1}(x_1), φ_p(x_1)),

φ_s(x) = Σ_{k=s}^{p} a_k C_k^s x^{k−s}, s = 0, ..., p,

and

g_d(x_d) = (1, x_d, x_d^2, ..., x_d^p)^T.
Proof. Follows from Thm. 14.2 due to the rank-(p+1) separable
representation of the polynomial P(x + y),

Σ_{k=0}^{p} a_k (x + y)^k = Σ_{k=0}^{p} a_k Σ_{s=0}^{k} C_k^s y^s x^{k−s}
  = Σ_{s=0}^{p} y^s Σ_{k=s}^{p} a_k C_k^s x^{k−s} = Σ_{s=0}^{p} y^s φ_s(x),

with

φ_s(x) = Σ_{k=s}^{p} a_k C_k^s x^{k−s}.
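The binomial transfer matrices of Cor. 14.1 can be assembled and checked numerically; an illustrative NumPy sketch (the sample polynomial and helper names are ours):

```python
import numpy as np
from math import comb

# Cor. 14.1: P(x1+...+xd) via (p+1) x (p+1) binomial transfer cores.
a = [2.0, -1.0, 0.5, 3.0]                 # P(z) = 2 - z + 0.5 z^2 + 3 z^3
p = len(a) - 1

def phi(s, x):                            # phi_s(x) = sum_{k>=s} a_k C(k,s) x^(k-s)
    return sum(a[k] * comb(k, s) * x**(k - s) for k in range(s, p + 1))

def G(x):                                 # G[i,j] = C(i, i-j) x^(i-j) for i >= j
    return np.array([[comb(i, i - j) * x**(i - j) if i >= j else 0.0
                      for j in range(p + 1)] for i in range(p + 1)])

x = [0.4, -0.2, 1.3]                      # d = 3 sample point
row = np.array([phi(s, x[0]) for s in range(p + 1)])
for t in x[1:-1]:
    row = row @ G(t)                      # row becomes (phi_s(x1+...+xk))_s
val = row @ np.array([x[-1]**s for s in range(p + 1)])

z = sum(x)
print(val, sum(a[k] * z**k for k in range(p + 1)))   # both equal P(z)
```

The intermediate row after each core equals (φ_s at the running partial sum), which is precisely the separable representation used in the proof.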
Explicit QTT decomp. of rank-r separable funct. B. Khoromskij, Zurich 2010(L14) 355
Thm. 14.3. ([5]) Let f(x) be a continuous function defined on an interval
[a, b], and let f(x + y) have separation rank r. Consider the uniform grid on [a, b],

x_i = a + (i−1)h, h = (b−a)/(n−1), n = 2^d, i = 1, ..., n,

a function generated vector v = [v(i)],

v(i) = f(x_i),

and the 2 × 2 × ··· × 2 tensor V of dimension d, which is a reshape of v:

V(i_1, i_2, ..., i_d) = v(i),

where i_k ∈ {0, 1} (k = 1, ..., d) are the binary digits of the integer i. Then

1. All QTT ranks are bounded by r.

2. If, additionally, x̂_i, i = 1, ..., r, and ŷ_j, j = 1, ..., r, are known such that
the matrix with elements f(x̂_i + ŷ_j) is nonsingular, then the TT decomposition of
V (and the QTT of v) has the form

V(i_1, i_2, ..., i_d) = g_1(x_1) G(x_2) · ... · G(x_{d−1}) g_d(x_d),
Explicit QTT decomp. of rank-r separable funct. B. Khoromskij, Zurich 2010(L14) 356
where

g_1(x_1) = (ψ_1(x_1), ψ_2(x_1), ..., ψ_r(x_1)),

G(x)_{ij} = ψ_i(x + ŷ_j),

g_d(x_d) = (f(ŷ_1 + x_d), f(ŷ_2 + x_d), ..., f(ŷ_r + x_d))^T,

ψ_i(z) = Σ_{j=1}^{r} M_{ji} f(x̂_j + z), i = 1, ..., r,

x_k = a/d + 2^{k−1} i_k h, k = 1, ..., d,

and [M_{ij}] = [f(x̂_i + ŷ_j)]^{−1}.

Proof. To prove the theorem it is sufficient to note that

V(i_1, i_2, ..., i_d) = v(i) = v(i_1 + 2i_2 + 4i_3 + ... + 2^{d−1} i_d) = f(x_1 + x_2 + ... + x_d),

where x_k = a/d + 2^{k−1} i_k h, and to apply Thm. 14.2.
Explicit QTT decomposition of polynomial vectors B. Khoromskij, Zurich 2010(L14) 357
Cor. 14.2. ([5]) Let

M(x) = Σ_{k=0}^{p} a_k x^k

be a polynomial of degree p on an interval [a, b]. Consider the uniform grid on
this interval,

x_i = a + (i−1)h, h = (b−a)/(n−1), n = 2^d, i = 1, ..., n,

the vector v = [v(i)],

v(i) = M(x_i), i = 1, ..., n,

and the 2 × 2 × ... × 2 tensor V of dimension d, which is a reshape of v:

V(i_1, i_2, ..., i_d) = v(i),

where i_k ∈ {0, 1} (k = 1, ..., d) are the binary digits of the integer i. Then the
QTT-decomposition of v, that is, the TT-decomposition of V, has the form

V(i_1, i_2, ..., i_d) = g_1(i_1) G_2(i_2) ... G_{d−1}(i_{d−1}) g_d(i_d),
Explicit QTT decomposition of polynomial vectors B. Khoromskij, Zurich 2010(L14) 358
where

g_1(i_1) = (φ_0(a/d + i_1 h), φ_1(a/d + i_1 h), ..., φ_{p−1}(a/d + i_1 h), φ_p(a/d + i_1 h)),

φ_s(x) = Σ_{k=s}^{p} a_k C_k^s x^{k−s}, s = 0, ..., p,

G_k(i_k) = G(a/d + 2^{k−1} i_k h),

G(x)_{ij} = C_i^{i−j} x^{i−j} for i ≥ j, and 0 for i < j, i, j = 0, ..., p,

g_d(i_d) = g(a/d + 2^{d−1} i_d h),

with

g(x) = (1, x, x^2, ..., x^p)^T.

Proof. To prove the statement it is sufficient to note that

V(i_1, i_2, ..., i_d) = v(i) = v(i_1 + 2i_2 + 4i_3 + ... + 2^{d−1} i_d) = M(x_1 + x_2 + ... + x_d),

where x_k = 2^{k−1} i_k h + a/d, and to apply Thm. 14.3.
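The rank bound of Cor. 14.2 (all QTT ranks bounded by p + 1) is easy to confirm by computing the unfolding ranks of a sampled polynomial; an illustrative NumPy check (the polynomial is ours):

```python
import numpy as np

# Cor. 14.2: a degree-p polynomial on 2^d points has QTT ranks <= p + 1.
d, p = 10, 3
x = np.linspace(-1.0, 1.0, 2**d)
v = 1.0 + 2.0 * x - x**2 + 0.5 * x**3       # sample degree-3 polynomial
V = v.reshape([2] * d)
ranks = [np.linalg.matrix_rank(V.reshape(2**k, -1), tol=1e-8)
         for k in range(1, d)]
print(ranks)                                 # each rank <= p + 1 = 4
```

The first rank is trivially capped at 2 (a 2-row unfolding), and the interior ranks stay at most p + 1 = 4, independently of the grid size.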
FQTT decomp. of rational polynomials and trigonometric funct. B. Khoromskij, Zurich 2010(L14)
Cor. 14.3. For a rational function f(x) = p(x)/q(x), where p and q are
polynomials defined on an interval [a, b] (|f(x)| < ∞, x ∈ [a, b]), and a uniform
grid with n = 2^d grid points, the QTT-ranks behave logarithmically in the
accuracy ε > 0 of the QTT-approximation and in the number of grid points n:

r_k = O(log^α(1/ε) log^β n), α, β ≥ 0.

Proof. Due to Thm. 14.3, the QTT rank estimates are reduced to the
estimation of the ε-separation rank of the function

g(x, y) = f(x + y).   (115)

For a rational function such estimates can be obtained via constructive
separable approximation schemes (for example, sinc-quadratures).
Cor. 14.4. Trigonometric functions f(x) = sin(x), f(x) = cos(x),
f(x) = tan(x), f(x) = cot(x), etc., sampled as an N-vector over an equispaced
grid with N = 2^d, admit a closed form QTT decomposition of rank 2.

Proof. Since these functions have separation rank 2, the result follows from
Thm. 14.3.
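Cor. 14.4 is directly observable in the unfolding ranks of a sampled sine vector; an illustrative NumPy check (grid parameters are ours):

```python
import numpy as np

# Cor. 14.4: sin sampled on a uniform dyadic grid has all QTT ranks 2.
d = 12
t = np.linspace(0.0, 4.0, 2**d)
V = np.sin(t).reshape([2] * d)
ranks = [np.linalg.matrix_rank(V.reshape(2**k, -1), tol=1e-10)
         for k in range(1, d)]
print(ranks)                     # [2, 2, ..., 2]
```

Every unfolding row is a shifted sine, and sin(x + c) lives in the two-dimensional span of sin x and cos x, which is exactly the separation-rank-2 structure the proof invokes.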
TT ranks of multivariate polynomials B. Khoromskij, Zurich 2010(L14) 360
The question of low-rank approximation of matrices representing
multidimensional potentials V(q_1, ..., q_f) can be reduced to the low-rank
approximation of this function on a tensor grid. In particular, this problem
arises in the approximation of the so-called potential energy surface (PES) in
molecular dynamics ([6]).

If the variables in V(q_1, ..., q_f) are separated,

V(q_1, ..., q_f) ≈ Σ_{k=1}^{r} Π_{i=1}^{f} v_i(q_i, k),

then the canonical rank of the tensor V does not exceed r, and moreover the
TT-ranks of V do not exceed r. However, they can be much smaller. For the
important case of polynomial potentials, one can obtain the following
estimate on the TT-ranks of the corresponding tensors.
Thm. 14.4. ([6]) For a general homogeneous polynomial potential of the
form

V(q_1, ..., q_f) = Σ_{i_1,...,i_s=1}^{f} a(i_1, ..., i_s) Π_{k=1}^{s} q_{i_k},

rank_TT(V) = C_0 f^{[s/2]} + o(f^{[s/2]}).
TT ranks of multivariate polynomials B. Khoromskij, Zurich 2010(L14) 361
Proof. To illustrate the idea, let us first consider a quadratic potential,

V = Σ_{i,j=1}^{f} a_{ij} q_i q_j,

and estimate its TT-ranks. To prove the statement, we will separate the q_k
one by one. This is exactly how the numerical algorithm for computing the
TT-decomposition works. Suppose we already have a decomposition of the
form

V = G_1(q_1) ... G_k(q_k) W(q_{k+1}, ..., q_f),

which is a "partial" variant of the FTT-decomposition, and we want to
obtain the next core. W(q_{k+1}, ..., q_f) is actually a parameter-dependent
vector of length r_k:

W(q_{k+1}, ..., q_f) = (W_1(q_{k+1}, ..., q_f), ..., W_{r_k}(q_{k+1}, ..., q_f))^T.
In each element, q_{k+1} can be separated from the other variables, but we
require that the same basis functions R_α(q_{k+2}, ..., q_f), α = 1, ..., r_{k+1}, are
TT ranks of multivariate polynomials B. Khoromskij, Zurich 2010(L14) 362
used, i.e.,

W_s(...) = Σ_{α=1}^{r_{k+1}} h_{αs}(q_{k+1}) R_α(q_{k+2}, ..., q_f), s = 1, ..., r_k.

This can always be done: for each component W_s one can separate q_{k+1}
from the other variables with p_s terms, and get p_s basis functions, so no
more than Σ_{s=1}^{r_k} p_s basis functions are required. However, there are cases
when fewer basis functions are needed, and we can estimate their number
for the polynomial PES.
At the first step, q_1 is separated from the other variables. V is quadratic in q_1:

V = a_{11} q_1^2 + q_1 Σ_{j=2}^{f} a_{1j} q_j + Σ_{i,j=2}^{f} a_{ij} q_i q_j,

hence

V = (a_{11} q_1^2, q_1, 1) (1, l_1, s_1)^T,

where l_1 is linear in q_2, ..., q_f and s_1 is quadratic in q_2, ..., q_f, so r_1 ≤ 3.
Now, at the second step, separation of q_2 is required. To estimate r_2, one
TT ranks of multivariate polynomials B. Khoromskij, Zurich 2010(L14) 363
has to bound the number of functions depending on q_3, ..., q_f required to
represent each element of the vector

(1, l_1, s_1)^T

as a linear combination of such functions with coefficients depending only
on q_2. In order to do that, decompose l_1 as

l_1(q_2, ..., q_f) = a_{12} q_2 + l_2(q_3, ..., q_f),

where l_2 is linear in q_3, ..., q_f, and

s_1(q_2, ..., q_f) = a_{22} q_2^2 + q_2 l_3(q_3, ..., q_f) + s_2(q_3, ..., q_f),

where l_3 is linear in q_3, ..., q_f and s_2 is quadratic in q_3, ..., q_f. Therefore,
the following basis functions arise: 1, l_2, l_3, s_2, i.e., there is one constant,
two linear functions in q_3, ..., q_f, and one quadratic in q_3, ..., q_f. It is easy
to see what happens next: the quadratic function gives one more linear
function to the basis; thus after k steps we will have one constant, k
linear functions and one quadratic function in q_{k+1}, ..., q_f, and the rank
bound is 2 + k.
TT ranks of multivariate polynomials B. Khoromskij, Zurich 2010(L14) 364
However, the dimension of the space of linear functions of q_{k+1}, ..., q_f is
bounded by (f − k); therefore the rank increases only while k ≤ f − k, i.e.,
k ≤ [f/2], and then starts to decrease. Thus, the maximal rank is [f/2] + 1.

The idea extends naturally to higher polynomial orders. For degree
three, at the first step we will have one constant function, one linear
function, one quadratic and one cubic function in the remaining variables.
The cubic function produces one quadratic and one linear function (and one
cubic remains), and each quadratic function produces one additional linear
function. At the k-th step there will be k quadratic functions, and the
number of linear functions grows as O(k^2), but the dimension of the linear
space decreases as (f − k); thus while O(k^2) ≤ f − k the rank bound is
O(k^2) + k + 2 and the rank increases with k, whereas afterwards the rank
bound is simply (f − k) + k + 2 = f + 2. This is depicted in Table 5.
The rank bound can be obtained from Table 5 by taking the minimum of
the second and the third columns.

The analysis for the general case s ≥ 2 is presented in [6].
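The "constant + linears + one quadratic" counting for the quadratic case can be observed on a small full tensor; an illustrative NumPy check (grid, size and the form of the assertion are ours):

```python
import numpy as np

# TT ranks of a generic quadratic potential V = sum a_ij q_i q_j on a grid:
# the k-th unfolding rank is at most min(k, f-k) + 2.
rng = np.random.default_rng(0)
f, n = 5, 4
a = rng.standard_normal((f, f))
q = np.linspace(-1.0, 1.0, n)
grids = np.meshgrid(*([q] * f), indexing="ij")
V = sum(a[i, j] * grids[i] * grids[j] for i in range(f) for j in range(f))
ranks = [np.linalg.matrix_rank(V.reshape(n**k, -1), tol=1e-10)
         for k in range(1, f)]
print(ranks)                   # bounded by min(k, f-k) + 2
```

The bound min(k, f−k) + 2 combines the rank bound 2 + k from the separation argument with the (f − k)-dimensional cap on the space of linear functions.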
QTT ranks of multivariate polynomials B. Khoromskij, Zurich 2010(L14) 365
Pol. order | Number at k-th step | Dimension of the space
0          | 1                   | 1
1          | k^2                 | f − k
2          | k                   | O((f − k)^2)
3          | 1                   | O((f − k)^3)

Table 5: Different polynomials appearing during the TT-SVD process and the
dimensions of the corresponding spans for order-3 polynomials.
Let us estimate the QTT ranks of discretised polynomial potentials
sampled over a uniform grid.

Thm. 14.5. ([6]) For a general homogeneous polynomial potential of the
form

V(q_1, ..., q_f) = Σ_{i_1,...,i_s=1}^{f} a(i_1, ..., i_s) Π_{k=1}^{s} q_{i_k},

sampled over a uniform grid, we have

rank_QTT(V) = C_0 f^{[s/2]} + o(f^{[s/2]}).
QTT ranks of multivariate polynomials B. Khoromskij, Zurich 2010(L14) 366
Proof. For continuous potentials it is sufficient to notice that the
TT-decomposition (without QTT) obtained in the proof of Thm. 14.4
gives rise to the FTT decomposition (this follows from the proof directly)

V(q_1, ..., q_f) = v_1(q_1) V_2(q_2) ··· V_{f−1}(q_{f−1}) v_f(q_f),

where V_p, p = 2, ..., f − 1, are r_{p−1} × r_p matrices whose elements are
polynomials in q_p of degree at most s, and r_p are the TT-ranks of V.
After discretization in the variable q_p with 2^k points we obtain a discrete
representation of V_p as an r_{p−1} × 2 × 2 × ... × 2 × r_p array, and analogously to
the proof in the scalar case it can be shown that for the matrix
polynomial case the QTT-ranks are bounded by (r_{p−1} + r_p)(s + 1).

In numerical experiments it is observed that the constants hidden in O(·) in the estimates of
Thm. 14.5 are not large. The following hypothesis summarizes the results of
our numerical experiments.

Hypoth. 14.1. ([6]) Under the premises of Thm. 14.5, the following rank
estimates hold:

1. For a general quadratic potential, V(q_1, ..., q_f) = Σ_{i,j=1}^{f} a_{ij} q_i q_j,

rank_QTT(V) ≤ f + 1.
QTT ranks of multivariate polynomials B. Khoromskij, Zurich 2010(L14) 367
2. For a general cubic potential, V(q_1, ..., q_f) = Σ_{i,j,k=1}^{f} a_{ijk} q_i q_j q_k,

rank_QTT(V) ≤ f + 1.

3. For a general quartic potential, V(q_1, ..., q_f) = Σ_{i,j,k,l=1}^{f} a_{ijkl} q_i q_j q_k q_l,

rank_QTT(V) ≤ f(f + 1).

However, these are upper asymptotic estimates for general coefficients
of the polynomials. For particular potentials the ranks can be much smaller.

Lem. 14.3. ([6]) For the harmonic potential,

V(q_1, ..., q_f) = Σ_{k=1}^{f} w_k q_k^2,

the QTT-ranks are bounded by 6, and for the Henon-Heiles potential (which was
used as a benchmark in molecular dynamics computations) of the form

V(q_1, ..., q_f) = (1/2) Σ_{k=1}^{f} q_k^2 + λ Σ_{k=1}^{f−1} (q_k^2 q_{k+1} − (1/3) q_k^3),   (116)

the QTT-ranks are bounded by 8 (in numerics we observe 7).
QTT ranks of special multivariate polynomials B. Khoromskij, Zurich 2010(L14) 368
Proof. Since the maximal QTT-rank of the discretized monomial q^2 is 3, the
result for the harmonic potential follows from Thm. 14.1.

To get the decomposition for the Henon-Heiles potential, first separate q_1:

V = (−(λ/3) q_1^3 + (1/2) q_1^2, λq_1, 1) (1, q_2, V_2(q_2, ..., q_f))^T,

where V_2(q_2, ..., q_f) is the Henon-Heiles potential of q_2, ..., q_f. Separation
of q_2 gives

V = (−(λ/3) q_1^3 + (1/2) q_1^2, λq_1, 1) [1 0 0; q_2 0 0; (1/2) q_2^2 − (λ/3) q_2^3  q_2  1] (1, q_3, V_3(q_3, ..., q_f))^T,

which justifies the general structure of the FTT cores at the subsequent steps:
they are 3 × 3 matrices,

G_k(q_k) = [1 0 0; q_k 0 0; (1/2) q_k^2 − (λ/3) q_k^3  q_k  1],

thus the TT-ranks are equal to 3.
QTT ranks of special multivariate polynomials B. Khoromskij, Zurich 2010(L14) 369
To obtain the QTT-decomposition on [a, b], one should consider the binary
representation of q_k, q_k = a + h Σ_{s=1}^{d} i_s 2^{s−1}, where h is the step size and the i_s
take values 0 and 1 (for simplicity the index k is omitted; of course a, h, i_s
depend on k). Introducing the new variables

x_s = a/d + h i_s 2^{s−1},

we obtain q_k = x_1 + ... + x_d. The estimation of the QTT-ranks is now reduced to
the separation of indices in the block parameter-dependent matrix

G(q) = [1 0 0; q 0 0; (1/2) q^2 − (λ/3) q^3  q  1].

This matrix can be split into three parts. Its element in position (3,1) is a
degree-3 polynomial in q, thus rank_QTT((1/2) q^2 − (λ/3) q^3) ≤ 4, and for the linear
part the ranks are bounded by 2 + 2 = 4; therefore the overall rank estimate is 8.

Rem. 14.2. The Tucker ranks in all these cases are bounded by f, and lead
to O(f^f) scaling in general, while the QTT-format gives polynomial storage
and polynomial complexity in f, even for the most general coefficients.
Literature to Lecture 14 B. Khoromskij, Zurich 2010(L14) 370
1. I.V. Oseledets, Approximation of 2d × 2d matrices using tensor decomposition. SIAM J. Matrix Anal.
Appl., 2010.
2. B.N. Khoromskij, O(d logN)-Quantics Approximation of N-d Tensors in High-Dimensional Numerical
Modeling. Preprint 55/2009 MPI MIS, Leipzig 2009 (submitted).
3. B.N. Khoromskij, and I. Oseledets. Quantics-TT approximation of elliptic solution operators in higher
dimensions. Preprint MPI MiS 79/2009, Leipzig 2009, submitted.
4. B.N. Khoromskij. Tensor-structured Numerical Methods in Scientific Computing: Survey on Recent
Advances. Preprint 21/2010, MPI MiS Leipzig 2010 (submitted).
5. I.V. Oseledets. Constructive representation of functions in tensor formats. Preprint INM Moscow, 2010.
6. B.N. Khoromskij, and I.V. Oseledets. DMRG + QTT approach to high-dimensional quantum molecular
dynamics. Preprint MPI MiS 68/2010, Leipzig 2010, submitted.
7. L. Grasedyck. Polynomial approximation in hierarchical Tucker format by vector-tensorization. Preprint
78, Aachen University, Aachen, 2010.
Lect. 15. Explicit QTT representation of multivariate matrices B. Khoromskij, Zurich 2010(L15)
Outline of Lecture 15.
1. Operator/matrix TT (OTT/MTT) formats.
2. Vector and operator OTT/OQTT ranks.
3. Laplace-type operators and special notations.
4. Shift and gradient matrices.
5. 1D Laplacian.
6. D-dimensional Laplacian.
7. Inverse Laplace operator in 1D.
8. Estimates on vector and operator QTT ranks for a class
of discrete elliptic operators.
9. Numerics.
Operator TT (OTT) decomposition B. Khoromskij, Zurich 2010(L15) 372
FTT decomposition induces the important concept of
multiplicative formats for the operators acting between two
TPHSs, A : X → Y, each of dimension d.
Ex. 14.1. X = Y = L2[0, 1]d. X = H10 ([0, 1]d), Y = H−1([0, 1]d).
Def. 14.1. (OTT/OTC decomp.) Introduce the rank-r operator TC
(OTC) decomposition symbolised by a set of factorised operators A,

A = Σ_{j∈J} G^{(1)}(j_d, j_1) G^{(2)}(j_1, j_2) ··· G^{(d)}(j_{d−1}, j_d),

with G^{(ℓ)} = [G^{(ℓ)}(j_ℓ, j_{ℓ+1})] being an operator valued r_ℓ × r_{ℓ+1} matrix,
where G^{(ℓ)}(j_ℓ, j_{ℓ+1}) : X_ℓ → Y_ℓ (ℓ = 1, ..., d), s.t. the action Af on a rank-1
function f ∈ X is defined as the rank-r TT/TC element in Y,

(Af)(y_1, ..., y_d) := Σ_{j∈J} g_1(j_d, y_1, j_1) g_2(j_1, y_2, j_2) ··· g_d(j_{d−1}, y_d, j_d),

with

g_ℓ(j_{ℓ−1}, y_ℓ, j_ℓ) = (G^{(ℓ)}(j_{ℓ−1}, j_ℓ) f_ℓ)(y_ℓ).
Vector TT/QTT ranks B. Khoromskij, Zurich 2010(L15) 373
Vector TT ranks. The TT representation may be applied to an operator A
over a vector space, which can be recognized as a "matricization" of the
TT-cores of a "vectorized" matrix, e.g.

A(i_1, j_1, ..., i_D, j_D) = Σ_{α_1=1}^{r_1} ... Σ_{α_{D−1}=1}^{r_{D−1}} U_1(i_1, j_1, α_1) U_2(α_1, i_2, j_2, α_2) · ... ·
   U_{D−1}(α_{D−2}, i_{D−1}, j_{D−1}, α_{D−1}) U_D(α_{D−1}, i_D, j_D).   (117)

Let us now return to equation (117) in view of the basic results
obtained on the minimal possible k-th rank of an exact rather than
approximate TT decomposition of a tensor A (the k-th TT rank of A):

Def. 15.1. Given a multi-way n_1 × ... × n_D vector

A ∈ R^{n_1} × ... × R^{n_D},

its k-th TT rank is the rank of its unfolding A^{(k)} with the elements

A^{(k)}(i_1 ... i_k ; i_{k+1} ... i_D) = A(i_1 ... i_D), 1 ≤ k ≤ D − 1.
Once we apply this to a multi-way matrix rather than a vector, the TT
decomposition of which is given by (117), we arrive at the same concept
of matrix TT rank. This amounts to applying TT to a "vectorization" of
the matrix: the matrix is considered merely as a vector in (117), and
Vector TT/QTT ranks B. Khoromskij, Zurich 2010(L15) 374
neither its ability to map vectors to vectors nor the related properties are
taken into consideration. To emphasize this, we refer to these ranks as
vector TT ranks (the presentation is based on V. Kazeev, BNK [5]).
Def. 15.2. Given a multi-way m_1 × n_1 × ... × m_D × n_D matrix

A : R^{n_1} × ... × R^{n_D} → R^{m_1} × ... × R^{m_D},

its k-th vector TT rank is the rank of its unfolding A^{(k)}
(1 ≤ k ≤ D − 1) with the elements

A^{(k)}(i_1 j_1 ... i_k j_k ; i_{k+1} j_{k+1} ... i_D j_D) = A(i_1 j_1 ... i_D j_D).

In particular this means that the minimal vector ranks of the TT
decomposition of a certain matrix are somewhat independent of one
another, depending on the matrix in the aggregate. So we may consider a
minimal rank decomposition, for which none of the D − 1 ranks
can be reduced without introducing an error in (117), even if we allow the
others to grow. This makes it reasonable to compare ranks elementwise.

Def. 15.3. Let us say that a multi-way matrix (vector) is of ranks not
greater than r_1, ..., r_{D−1} if and only if for every k, 1 ≤ k ≤ D − 1, its k-th
vector TT rank is not greater than r_k.
Operator TT/QTT ranks B. Khoromskij, Zurich 2010(L15) 375
Operator TT ranks. The vector TT rank of a matrix is of great
importance in view of the storage costs and the complexity of such basic
operations as dot product, multi-dimensional contraction,
matrix-by-vector multiplication, rank reduction and orthogonalization of a
tensor train. Their complexity upper bound is linear with respect to the vector
TT rank upper bound raised to the power 2 or 3.

Even if we manage to perform a matrix-by-vector multiplication, this
may not be enough for the solution of the problem involved. For example,
when developing iterative solvers we are likely to be concerned with the vector TT
ranks of a matrix-by-vector product.

Formally, the ranks of TT decompositions are multiplied when two matrices
or a matrix and a vector are multiplied. Often this obvious estimate of the
ranks of the product leads to unaffordable complexity estimates but,
fortunately, it is not sharp, so that low-rank approximation is possible.

A reasonable a priori estimate of the ranks would allow one to rely upon
such an approximation procedure, whose complexity is cubic with respect
to the ranks. Below we introduce the concept of operator TT rank.
Operator TT/QTT ranks B. Khoromskij, Zurich 2010(L15) 376
Def. 15.4. Given a multi-way matrix A : R^{n_1} × ... × R^{n_D} → R^{m_1} × ... × R^{m_D},
for any vector X ∈ R^{n_1} × ... × R^{n_D} denote the vector TT ranks
of the matrix-by-vector product AX by r_1, ..., r_{D−1}. Then let us refer to

max_{k = 1,...,D−1; X of vector TT rank (1,...,1)} r_k

as the operator TT rank of A.

The following proposition gives an obvious inequality between the two ranks
introduced in Def. 15.2 and Def. 15.4.

Prop. 15.1. The operator TT rank does not exceed the maximum
component of the vector TT rank.

This estimate is essentially not sharp. For example, consider two
vectors X, Y ∈ R^{n_1} × ... × R^{n_D} such that X is of vector TT rank 1, ..., 1.
Then for any vector Z ∈ R^{n_1} × ... × R^{n_D} of vector TT rank 1, ..., 1 the
tensor (XY′)Z = ⟨Y, Z⟩X is of vector TT rank 1, ..., 1, while
(YX′)Z = ⟨X, Z⟩Y is of the same vector TT rank as Y. Consequently, the
operator TT rank of XY′ is equal to 1, while that of YX′ is as high as
the maximum rank of the TT cores of Y, which can be random and have a
very bad QTT structure resulting in a high vector TT rank of XY′.
Explicit QTT for Laplace-related operators B. Khoromskij, Zurich 2010(L15) 377
Class of operators. We focus on the QTT structure of the finite difference
discretization ∆^{(d_1...d_D)} of the Laplace operator, considered over a
D-dimensional cube on tensor uniform grids, and, in the one-dimensional case,
of its inverse as well.

The grids in question are tensor products of D one-dimensional uniform
grids, the k-th of them comprising 2^{d_k} points.

By the discrete Laplace operator we mean a matrix

∆^{(d_1...d_D)} = a_1 ∆_1^{(d_1)} ⊗ I_{2^{d_2}} ⊗ ... ⊗ I_{2^{d_D}} + ... + I_{2^{d_1}} ⊗ ... ⊗ I_{2^{d_{D−1}}} ⊗ a_D ∆_D^{(d_D)},

a sum of D terms, I_m being the m × m identity matrix. The
weights a_k take into account both the difference in grid steps
and anisotropy. For brevity these weights are set to 1 below
unless otherwise stated.

Each ∆_k^{(d_k)} may be any of the following 2^{d_k} × 2^{d_k} matrices, depending
on the boundary conditions imposed:
Explicit QTT for Laplace-related operators B. Khoromskij, Zurich 2010(L15) 378
∆_DD^{(d_k)} = tridiag(−1, 2, −1),
∆_NN^{(d_k)} = tridiag(−1, 2, −1) with the (1,1) and (2^{d_k}, 2^{d_k}) diagonal entries replaced by 1,   (118)

are the ones for Dirichlet and Neumann boundary conditions respectively,

∆_DN^{(d_k)} = tridiag(−1, 2, −1) with the last diagonal entry replaced by 1,
∆_ND^{(d_k)} = tridiag(−1, 2, −1) with the first diagonal entry replaced by 1,   (119)

are the ones for mixed boundary conditions in the two boundary points, and

∆_P^{(d_k)} = tridiag(−1, 2, −1) with additional entries −1 in the corner positions (1, 2^{d_k}) and (2^{d_k}, 1),   (120)

is the one for periodic boundary conditions.
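All five variants differ from the Dirichlet matrix only at the four corner entries, so they are one helper away from each other; an illustrative NumPy sketch (helper names are ours), including the D = 2 Kronecker-sum assembly:

```python
import numpy as np

# The boundary-condition variants (118)-(120) and the Kronecker-sum Laplacian.
def lap1d(n, bc="DD"):
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    if bc == "P":                      # periodic: extra -1 in the corners
        A[0, -1] -= 1.0
        A[-1, 0] -= 1.0
        return A
    if bc[0] == "N":                   # Neumann at the left endpoint
        A[0, 0] = 1.0
    if bc[1] == "N":                   # Neumann at the right endpoint
        A[-1, -1] = 1.0
    return A

def lap2d(n):                          # Delta = A (x) I + I (x) A
    A, I = lap1d(n), np.eye(n)
    return np.kron(A, I) + np.kron(I, A)

print(lap1d(4, "NN").sum(axis=1))      # Neumann rows sum to zero
```

The zero row sums of the Neumann and periodic variants reflect that constants lie in their null spaces, which is the usual sanity check on such discretizations.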
Notations B. Khoromskij, Zurich 2010(L15) 379
Notation.
I = [1 0; 0 1], J = [0 1; 0 0], I_1 = [1 0; 0 0], I_2 = [0 0; 0 1], P = [0 1; 1 0],

E = [1 1; 1 1], F = [1 −1; −1 1], K = [−1 0; 0 1], L = [0 −1; 1 0].   (121)
To deal with 3- and 4-dimensional TT cores efficiently, we use matrix
notation for them and their convolutions. For instance, if the n×m-matrices
A_{αβ}, α = 1,…,r_1, β = 1,…,r_2, are the TT blocks of a TT core U of mode
sizes n and m, left rank r_1 and right rank r_2, so that U(α,i,j,β) = (A_{αβ})_{ij}
for all values of the indices, then we write it simply as the matrix
\[
U = \begin{bmatrix} A_{11} & \cdots & A_{1r_2} \\ \vdots & \ddots & \vdots \\ A_{r_1 1} & \cdots & A_{r_1 r_2} \end{bmatrix},
\]
a core matrix (in square brackets). Since we aim to present the TT
structure in terms of a narrow set of TT blocks, we need to focus on the
rank structure of the cores, and such a notation is convenient in handling
the cores of a TT decomposition.
For any two TT cores U and V of matching sizes we define their inner core
product U ⋊⋉ V as the ordinary product of the two core matrices, their
elements (TT blocks) being multiplied by means of the tensor product, e.g.
\[
U \Join V = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \Join \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
= \begin{bmatrix} A_{11}\otimes B_{11} + A_{12}\otimes B_{21} & A_{11}\otimes B_{12} + A_{12}\otimes B_{22} \\ A_{21}\otimes B_{11} + A_{22}\otimes B_{21} & A_{21}\otimes B_{12} + A_{22}\otimes B_{22} \end{bmatrix},
\]
and their outer core product U • V as the tensor product of the two core
matrices, their elements (TT blocks) being multiplied by the ordinary
matrix product, e.g.
\[
U \bullet V = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \bullet \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
= \begin{bmatrix}
A_{11}B_{11} & A_{11}B_{12} & A_{12}B_{11} & A_{12}B_{12} \\
A_{11}B_{21} & A_{11}B_{22} & A_{12}B_{21} & A_{12}B_{22} \\
A_{21}B_{11} & A_{21}B_{12} & A_{22}B_{11} & A_{22}B_{12} \\
A_{21}B_{21} & A_{21}B_{22} & A_{22}B_{21} & A_{22}B_{22}
\end{bmatrix}.
\]
To avoid confusion we use square brackets for TT cores, which are to be
multiplied by means of the inner or outer core product, and round brackets
for regular matrices, which are multiplied as usual.
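Both core products are straightforward to implement for dense blocks. In the sketch below (our own illustration with our own storage convention, not code from the lecture), a core with left rank r1, right rank r2 and n×m blocks is stored as an array of shape (r1, r2, n, m).

```python
import numpy as np

def inner_core_product(A, B):
    """U join V: ordinary product of the two core matrices,
    blocks multiplied by the Kronecker (tensor) product."""
    r1, s, n, m = A.shape
    s2, r2, p, q = B.shape
    assert s == s2
    C = np.zeros((r1, r2, n * p, m * q))
    for a in range(r1):
        for c in range(r2):
            for b in range(s):
                C[a, c] += np.kron(A[a, b], B[b, c])
    return C

def outer_core_product(A, B):
    """U bullet V: Kronecker product of the two core matrices,
    blocks multiplied as ordinary matrices."""
    r1, s, n, m = A.shape
    t, r2, m2, q = B.shape
    assert m == m2
    C = np.zeros((r1 * t, s * r2, n, q))
    for a in range(r1):
        for b in range(s):
            for c in range(t):
                for e in range(r2):
                    C[a * t + c, b * r2 + e] = A[a, b] @ B[c, e]
    return C
```

With rank-1 cores U = [X] and V = [Y] the inner product reduces to kron(X, Y) and the outer product to the matrix product XY, which is what the checks below verify.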
Representation of Laplace-like operator B. Khoromskij, Zurich 2010(L15) 381
Ex. 15.1. Both core products introduced above arise naturally from the
TT format. For instance, (117) can be recast as
A = U_1 ⋊⋉ U_2 ⋊⋉ … ⋊⋉ U_{D−1} ⋊⋉ U_D,
and a matrix product of A and B = V_1 ⋊⋉ V_2 ⋊⋉ … ⋊⋉ V_{D−1} ⋊⋉ V_D can then
be written as
AB = (U_1 • V_1) ⋊⋉ (U_2 • V_2) ⋊⋉ … ⋊⋉ (U_{D−1} • V_{D−1}) ⋊⋉ (U_D • V_D),
where B may be either a matrix or a vector.

As usual, by A^{⊗k}, k natural, we mean the k-th tensor power of A; for
example, I^{⊗3} = I ⊗ I ⊗ I. The same notation is used for the core
product operations “⋊⋉” and “•”.
TT structure of “D-dimensional” Laplace-like operators.

Below we will also need a Laplace-like operator L^{(D)}, D ≥ 2, with a
slightly more general structure:
\[
\begin{aligned}
L^{(D)} = {}& M_1\otimes R_2\otimes R_3\otimes\cdots\otimes R_{D-1}\otimes R_D \\
{}+{}& L_1\otimes M_2\otimes R_3\otimes\cdots\otimes R_{D-1}\otimes R_D + \ldots \\
{}+{}& L_1\otimes L_2\otimes\cdots\otimes L_{D-2}\otimes M_{D-1}\otimes R_D \\
{}+{}& L_1\otimes L_2\otimes\cdots\otimes L_{D-2}\otimes L_{D-1}\otimes M_D,
\end{aligned} \tag{122}
\]
the matrices L_k, M_k and R_k being of size m_k × n_k, 1 ≤ k ≤ D.

Lem. 15.1. For any D ≥ 2 the Laplace-like operator L^{(D)} admits the
following rank-(2,…,2) TT representation in terms of the blocks L_k, M_k
and R_k:
\[
L^{(D)} = \begin{bmatrix} L_1 & M_1 \end{bmatrix} \Join
\begin{bmatrix} L_2 & M_2 \\ & R_2 \end{bmatrix} \Join \cdots \Join
\begin{bmatrix} L_{D-1} & M_{D-1} \\ & R_{D-1} \end{bmatrix} \Join
\begin{bmatrix} M_D \\ R_D \end{bmatrix}.
\]
Rem. 15.1. Once QTT decompositions of each of the “one-dimensional”
operators L_k, M_k and R_k, 1 ≤ k ≤ D, are known, they can easily be
merged into a QTT decomposition of the “D-dimensional” operator L^{(D)}
according to Lem. 15.1.
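Lem. 15.1 is easy to verify numerically for small D. The sketch below (our own check, re-using the inner core product defined earlier) contracts the rank-2 train for D = 3 with random 2×2 blocks and compares it with the sum of Kronecker products in (122).

```python
import numpy as np

def inner_core_product(A, B):
    r1, s, n, m = A.shape
    _, r2, p, q = B.shape
    C = np.zeros((r1, r2, n * p, m * q))
    for a in range(r1):
        for c in range(r2):
            for b in range(s):
                C[a, c] += np.kron(A[a, b], B[b, c])
    return C

rng = np.random.default_rng(1)
L1, L2, M1, M2, M3, R2, R3 = (rng.standard_normal((2, 2)) for _ in range(7))
Z = np.zeros((2, 2))

# Cores of Lem. 15.1 for D = 3: [L1 M1] join [[L2, M2], [0, R2]] join [[M3], [R3]]
U1 = np.array([[L1, M1]])            # shape (1, 2, 2, 2)
U2 = np.array([[L2, M2], [Z, R2]])   # shape (2, 2, 2, 2)
U3 = np.array([[M3], [R3]])          # shape (2, 1, 2, 2)
train = inner_core_product(inner_core_product(U1, U2), U3)[0, 0]

ref = (np.kron(np.kron(M1, R2), R3)
       + np.kron(np.kron(L1, M2), R3)
       + np.kron(np.kron(L1, L2), M3))
```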
“One-dimensional” shift and gradient matrices B. Khoromskij, Zurich 2010(L15) 383
“One-dimensional” shift and gradient matrices.

Let us introduce the QTT structure of two recognizable “one-dimensional”
operators, the shift and gradient matrices:
\[
S^{(d)} = \begin{pmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{pmatrix},\qquad
G^{(d)} = \begin{pmatrix} 1 & -1 & & \\ & 1 & \ddots & \\ & & \ddots & -1 \\ & & & 1 \end{pmatrix},
\]
both of size 2^d. A simple recursive block structure of G^{(k)},
\[
G^{(k)} = \begin{pmatrix} G^{(k-1)} & -J'^{\,\otimes(k-1)} \\ & G^{(k-1)} \end{pmatrix}
 = I \otimes G^{(k-1)} - J \otimes J'^{\,\otimes(k-1)},
\]
in our core product notation leads straightforwardly (after eliminating the
repeated block row at each step) to
\[
\begin{aligned}
G^{(d)} &= \begin{bmatrix} I & J \end{bmatrix} \Join \begin{bmatrix} G^{(d-1)} \\ -J'^{\,\otimes(d-1)} \end{bmatrix}
 = \begin{bmatrix} I & J \end{bmatrix} \Join \begin{bmatrix} I & J \\ & J' \end{bmatrix} \Join \begin{bmatrix} G^{(d-2)} \\ -J'^{\,\otimes(d-2)} \end{bmatrix} = \ldots \\
 &= \begin{bmatrix} I & J \end{bmatrix} \Join \begin{bmatrix} I & J \\ & J' \end{bmatrix}^{\Join(d-2)} \Join \begin{bmatrix} G^{(1)} \\ -J' \end{bmatrix}
 = \begin{bmatrix} I & J \end{bmatrix} \Join \begin{bmatrix} I & J \\ & J' \end{bmatrix}^{\Join(d-2)} \Join \begin{bmatrix} I - J \\ -J' \end{bmatrix}.
\end{aligned}
\]
The decomposition of the shift matrix is obtained by the same token:
\[
S^{(d)} = \begin{bmatrix} I & J \end{bmatrix} \Join \begin{bmatrix} I & J \\ & J' \end{bmatrix}^{\Join(d-2)} \Join \begin{bmatrix} S^{(1)} \\ J' \end{bmatrix}
 = \begin{bmatrix} I & J \end{bmatrix} \Join \begin{bmatrix} I & J \\ & J' \end{bmatrix}^{\Join(d-2)} \Join \begin{bmatrix} J \\ J' \end{bmatrix}.
\]
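Both decompositions can be checked by contracting the cores. A minimal sketch (our own verification code, not from the lecture):

```python
import numpy as np

def inner_core_product(A, B):
    r1, s, n, m = A.shape
    _, r2, p, q = B.shape
    C = np.zeros((r1, r2, n * p, m * q))
    for a in range(r1):
        for c in range(r2):
            for b in range(s):
                C[a, c] += np.kron(A[a, b], B[b, c])
    return C

def contract(cores):
    C = cores[0]
    for U in cores[1:]:
        C = inner_core_product(C, U)
    return C[0, 0]

I = np.eye(2)
J = np.array([[0., 1.], [0., 0.]])
Jt = J.T                                  # J'
Z = np.zeros((2, 2))

d = 5
head = np.array([[I, J]])
mid = np.array([[I, J], [Z, Jt]])
G = contract([head] + [mid] * (d - 2) + [np.array([[I - J], [-Jt]])])
S = contract([head] + [mid] * (d - 2) + [np.array([[J], [Jt]])])
N = 2 ** d
```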
One-dimensional Laplacian. Consider the “one-dimensional” Laplace
operator ∆^{(d)}_{DD}. Like the gradient matrix treated above, it has a
low-rank QTT structure, described in the next lemma.
Lem. 15.2. For any d ≥ 2 it holds that
\[
\Delta^{(d)}_{DD} = \begin{bmatrix} I & J' & J \end{bmatrix} \Join
\begin{bmatrix} I & J' & J \\ & J & \\ & & J' \end{bmatrix}^{\Join(d-2)} \Join
\begin{bmatrix} 2I - J - J' \\ -J \\ -J' \end{bmatrix}.
\]
Proof. Similarly to the gradient matrix, ∆^{(d)}_{DD} exhibits a recursive
block structure:
\[
\Delta^{(k)}_{DD} = \begin{pmatrix} \Delta^{(k-1)}_{DD} & -J'^{\,\otimes(k-1)} \\ -J^{\otimes(k-1)} & \Delta^{(k-1)}_{DD} \end{pmatrix}
 = I\otimes\Delta^{(k-1)}_{DD} - J'\otimes J^{\otimes(k-1)} - J\otimes J'^{\,\otimes(k-1)},
\]
which yields its low-rank QTT representation:
\[
\begin{aligned}
\Delta^{(d)}_{DD} &= \begin{bmatrix} I & J' & J \end{bmatrix} \Join \begin{bmatrix} \Delta^{(d-1)}_{DD} \\ -J^{\otimes(d-1)} \\ -J'^{\,\otimes(d-1)} \end{bmatrix}
 = \begin{bmatrix} I & J' & J \end{bmatrix} \Join \begin{bmatrix} I & J' & J \\ & J & \\ & & J' \end{bmatrix} \Join \begin{bmatrix} \Delta^{(d-2)}_{DD} \\ -J^{\otimes(d-2)} \\ -J'^{\,\otimes(d-2)} \end{bmatrix} = \ldots \\
 &= \begin{bmatrix} I & J' & J \end{bmatrix} \Join \begin{bmatrix} I & J' & J \\ & J & \\ & & J' \end{bmatrix}^{\Join(d-2)} \Join \begin{bmatrix} 2I - J - J' \\ -J \\ -J' \end{bmatrix}.
\end{aligned}
\]
At each step the naive merge produces five column components, two of which
duplicate −J^{⊗} and −J'^{⊗} and are eliminated, which keeps the ranks
equal to 3. □
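Lem. 15.2 can also be confirmed by direct contraction of the three cores. A short sketch (our own check):

```python
import numpy as np

def inner_core_product(A, B):
    r1, s, n, m = A.shape
    _, r2, p, q = B.shape
    C = np.zeros((r1, r2, n * p, m * q))
    for a in range(r1):
        for c in range(r2):
            for b in range(s):
                C[a, c] += np.kron(A[a, b], B[b, c])
    return C

def contract(cores):
    C = cores[0]
    for U in cores[1:]:
        C = inner_core_product(C, U)
    return C[0, 0]

I = np.eye(2)
J = np.array([[0., 1.], [0., 0.]])
Jt = J.T
Z = np.zeros((2, 2))

d = 6
head = np.array([[I, Jt, J]])
mid = np.array([[I, Jt, J], [Z, J, Z], [Z, Z, Jt]])
tail = np.array([[2 * I - J - Jt], [-J], [-Jt]])
Delta = contract([head] + [mid] * (d - 2) + [tail])
N = 2 ** d
ref = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
```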
D-dimensional Laplacian B. Khoromskij, Zurich 2010(L15) 386
D dimensions. With M_k = a_k∆_k^{(d_k)} and L_k = R_k = I^{⊗d_k} for
k = 1,…,D, the operator L^{(D)} of (122) is the Laplace operator
∆^{(d_1…d_D)} defined above, and hence the following corollary of Lem. 15.1
holds.

Lem. 15.3. For any D ≥ 2 the “D-dimensional” Laplace operator
∆^{(d_1…d_D)} has the following TT structure in terms of the
“one-dimensional” Laplace operators a_k∆_k^{(d_k)}, 1 ≤ k ≤ D:
\[
\Delta^{(d_1\ldots d_D)} = \begin{bmatrix} I^{\otimes d_1} & a_1\Delta_1^{(d_1)} \end{bmatrix} \Join
\begin{bmatrix} I^{\otimes d_2} & a_2\Delta_2^{(d_2)} \\ & I^{\otimes d_2} \end{bmatrix} \Join \cdots \Join
\begin{bmatrix} I^{\otimes d_{D-1}} & a_{D-1}\Delta_{D-1}^{(d_{D-1})} \\ & I^{\otimes d_{D-1}} \end{bmatrix} \Join
\begin{bmatrix} a_D\Delta_D^{(d_D)} \\ I^{\otimes d_D} \end{bmatrix}.
\]
Next we combine this with the results of Lem. 15.2 according to Rem. 15.1:
as soon as we derive low-rank QTT representations of the supercores
involved, we immediately obtain one of the D-dimensional Laplace operator
comprising these supercores.
In the case of Dirichlet boundary conditions we insert the QTT cores into
the supercores involved and proceed as before: we reduce the ranks as far
as possible by eliminating dependent QTT blocks, which can be viewed as
sweeping column (for the left core) or row (for the right core)
transformation matrices through the tensor train, just as in the proof of
Lem. 15.2.
Lem. 15.4. For any d_k ≥ 3 the following QTT representations hold:
\[
\begin{bmatrix} I^{\otimes d_k} & a_k\Delta^{(d_k)}_{DD} \\ & I^{\otimes d_k} \end{bmatrix} =
\begin{bmatrix} I & J' & J & \\ & & & I \end{bmatrix} \Join
\begin{bmatrix} I & J' & J & \\ & J & & \\ & & J' & \\ & & & I \end{bmatrix}^{\Join(d_k-2)} \Join
\begin{bmatrix} I & a_k\,(2I-J-J') \\ & -a_kJ \\ & -a_kJ' \\ & I \end{bmatrix},
\]
\[
\begin{bmatrix} I^{\otimes d_k} & a_k\Delta^{(d_k)}_{DD} \end{bmatrix} =
\begin{bmatrix} I & J' & J \end{bmatrix} \Join
\begin{bmatrix} I & J' & J \\ & J & \\ & & J' \end{bmatrix}^{\Join(d_k-2)} \Join
\begin{bmatrix} I & a_k\,(2I-J-J') \\ & -a_kJ \\ & & \hspace{-3em}-a_kJ' \end{bmatrix},
\]
\[
\begin{bmatrix} a_k\Delta^{(d_k)}_{DD} \\ I^{\otimes d_k} \end{bmatrix} =
\begin{bmatrix} I & J' & J & \\ & & & I \end{bmatrix} \Join
\begin{bmatrix} I & J' & J & \\ & J & & \\ & & J' & \\ & & & I \end{bmatrix}^{\Join(d_k-3)} \Join
\begin{bmatrix} a_kI & a_kJ' & a_kJ \\ & a_kJ & \\ & & a_kJ' \\ \tfrac12 I & -\tfrac12 I & -\tfrac12 I \end{bmatrix} \Join
\begin{bmatrix} 2I-J-J' \\ -J \\ -J' \end{bmatrix}.
\]
Proof. For a middle supercore, substituting the QTT cores of Lem. 15.2
into its blocks gives
\[
\begin{bmatrix} I^{\otimes d_k} & a_k\Delta^{(d_k)}_{DD} \\ & I^{\otimes d_k} \end{bmatrix} =
\begin{bmatrix} I & I & J' & J & \\ & & & & I \end{bmatrix} \Join
\begin{bmatrix} I & & & & \\ & I & J' & J & \\ & & J & & \\ & & & J' & \\ & & & & I \end{bmatrix}^{\Join(d_k-2)} \Join
\begin{bmatrix} I & \\ & a_k(2I-J-J') \\ & -a_kJ \\ & -a_kJ' \\ & I \end{bmatrix}
=
\begin{bmatrix} I & J' & J & \\ & & & I \end{bmatrix} \Join
\begin{bmatrix} I & J' & J & \\ & J & & \\ & & J' & \\ & & & I \end{bmatrix}^{\Join(d_k-2)} \Join
\begin{bmatrix} I & a_k(2I-J-J') \\ & -a_kJ \\ & -a_kJ' \\ & I \end{bmatrix},
\]
where the identity chain coming from the diagonal block I^{⊗d_k} has been
merged with the first component of the representation of a_k∆^{(d_k)}_{DD},
reducing the ranks from 5 to 4. The terminal supercores are subcores of
the middle one, which allows reducing their ranks similarly to the proof
of Lem. 15.2. □
Laplacian inverse B. Khoromskij, Zurich 2010(L15) 389
1D Laplace operator inverse.

Next we derive low-rank QTT decompositions of the inverse of the
discretized Laplace operator with Dirichlet-Neumann or Dirichlet-Dirichlet
boundary conditions. We proceed from explicit representations of
(∆^{(d)}_{DD})^{−1} and (∆^{(d)}_{DN})^{−1}. The next lemma follows by a
direct check.
Lem. 15.5. Let ∆_{DD}, ∆_{DN} be the n×n matrices defined above. Then
\[
\left(\Delta_{DD}^{-1}\right)_{ij} = \frac{1}{n+1}
\begin{cases} i\,(n+1-j), & 1 \le i \le j \le n, \\ (n+1-i)\,j, & 1 \le j < i \le n, \end{cases}
\qquad
\left(\Delta_{DN}^{-1}\right)_{ij} = \min(i,j).
\]
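Both closed-form inverses can be checked against a direct numerical inverse. A minimal sketch (our own check):

```python
import numpy as np

n = 16
DD = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
DN = DD.copy()
DN[-1, -1] = 1.0

i = np.arange(1, n + 1)
# Explicit inverses of Lem. 15.5
inv_DD = np.where(i[:, None] <= i[None, :],
                  i[:, None] * (n + 1 - i[None, :]),
                  (n + 1 - i[:, None]) * i[None, :]) / (n + 1)
inv_DN = np.minimum(i[:, None], i[None, :]).astype(float)
```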
Lem. 15.6. For any d ≥ 2 it holds that
\[
\left(\Delta^{(d)}_{DN}\right)^{-1} =
\begin{bmatrix} I & I_2 & J & J' \end{bmatrix} \Join
\begin{bmatrix} I & I_2 & J & J' \\ & 2E & & \\ & I_2+J' & E & \\ & I_2+J & & E \end{bmatrix}^{\Join(d-2)} \Join
\begin{bmatrix} E+I_2 \\ 2E \\ E+I_2+J' \\ E+I_2+J \end{bmatrix}.
\]
Proof. According to Lem. 15.5, the inverse of ∆^{(d)}_{DN} is the
min(i,j)-matrix
\[
\left(\Delta^{(d)}_{DN}\right)^{-1} =
\begin{pmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & 2 & 2 & \cdots & 2 \\
1 & 2 & 3 & \cdots & 3 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 2 & 3 & \cdots & 2^d
\end{pmatrix}.
\]
Hence, introducing the matrices
\[
K^{(k)} =
\begin{pmatrix}
1 & 2 & 3 & \cdots & 2^k \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 2 & 3 & \cdots & 2^k
\end{pmatrix}
\]
(all rows equal), 1 ≤ k ≤ d − 1, for which it holds that
\[
K^{(k)} = \begin{pmatrix} K^{(k-1)} & 2^{k-1}E^{\otimes(k-1)} + K^{(k-1)} \\ K^{(k-1)} & 2^{k-1}E^{\otimes(k-1)} + K^{(k-1)} \end{pmatrix}
 = \begin{bmatrix} I_2+J & E \end{bmatrix} \Join \begin{bmatrix} 2^{k-1}E^{\otimes(k-1)} \\ K^{(k-1)} \end{bmatrix},
\]
we draw up the following:
\[
\begin{aligned}
\left(\Delta^{(d)}_{DN}\right)^{-1}
&= \begin{bmatrix} I & I_2 & J & J' \end{bmatrix} \Join
\begin{bmatrix} \left(\Delta^{(d-1)}_{DN}\right)^{-1} \\ 2^{d-1}E^{\otimes(d-1)} \\ K^{(d-1)\prime} \\ K^{(d-1)} \end{bmatrix}
= \begin{bmatrix} I & I_2 & J & J' \end{bmatrix} \Join
\begin{bmatrix} I & I_2 & J & J' \\ & 2E & & \\ & I_2+J' & E & \\ & I_2+J & & E \end{bmatrix} \Join
\begin{bmatrix} \left(\Delta^{(d-2)}_{DN}\right)^{-1} \\ 2^{d-2}E^{\otimes(d-2)} \\ K^{(d-2)\prime} \\ K^{(d-2)} \end{bmatrix}
= \ldots \\
&= \begin{bmatrix} I & I_2 & J & J' \end{bmatrix} \Join
\begin{bmatrix} I & I_2 & J & J' \\ & 2E & & \\ & I_2+J' & E & \\ & I_2+J & & E \end{bmatrix}^{\Join(d-2)} \Join
\begin{bmatrix} E+I_2 \\ 2E \\ E+I_2+J' \\ E+I_2+J \end{bmatrix}. \quad\square
\end{aligned}
\]
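The rank-4 representation of Lem. 15.6 can be confirmed by contracting the cores and comparing with the min(i,j)-matrix. A short sketch (our own check):

```python
import numpy as np

def inner_core_product(A, B):
    r1, s, n, m = A.shape
    _, r2, p, q = B.shape
    C = np.zeros((r1, r2, n * p, m * q))
    for a in range(r1):
        for c in range(r2):
            for b in range(s):
                C[a, c] += np.kron(A[a, b], B[b, c])
    return C

def contract(cores):
    C = cores[0]
    for U in cores[1:]:
        C = inner_core_product(C, U)
    return C[0, 0]

I = np.eye(2)
I2 = np.diag([0., 1.])
J = np.array([[0., 1.], [0., 0.]])
Jt = J.T
E = np.ones((2, 2))
Z = np.zeros((2, 2))

head = np.array([[I, I2, J, Jt]])
mid = np.array([[I, I2, J, Jt],
                [Z, 2 * E, Z, Z],
                [Z, I2 + Jt, E, Z],
                [Z, I2 + J, Z, E]])
tail = np.array([[E + I2], [2 * E], [E + I2 + Jt], [E + I2 + J]])

d = 4
invDN = contract([head] + [mid] * (d - 2) + [tail])
N = 2 ** d
i = np.arange(1, N + 1)
ref = np.minimum(i[:, None], i[None, :])   # (Delta_DN)^{-1} by Lem. 15.5
```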
Lem. 15.7. Let d ≥ 2 and
\[
\lambda_k = -\frac{2^{k-1}+1}{2},\quad
\mu_k = \frac{(2^{k-1}+1)^2}{2^k},\quad
\xi_k = \frac{2^{k-1}+1}{2^k+1},\quad
\eta_k = \frac{2^{k-2}}{2^k+1},\quad
C^{(k)} = \begin{pmatrix} \lambda_k & \mu_k \\ \mu_k & \lambda_k \end{pmatrix}
\]
for 1 ≤ k ≤ d. Then (∆^{(d)}_{DD})^{−1} has a rank-(5,…,5) QTT
representation
\[
\left(\Delta^{(d)}_{DD}\right)^{-1} = W_d \Join W_{d-1} \Join \cdots \Join W_2 \Join W_1
\]
consisting of the TT cores
\[
W_d = \begin{bmatrix} I & \tfrac14 C^{(d)} & C^{(d)} & -\lambda_d K & -\mu_d L \end{bmatrix},\qquad
W_1 = \begin{bmatrix} \tfrac13 (I+E) \\ -E \\ -\tfrac{1}{36} F \\ -\tfrac13 K \\ \tfrac13 L \end{bmatrix},
\]
\[
W_k = \begin{bmatrix}
I & \tfrac14 C^{(k)} & C^{(k)} & -\lambda_k K & -\mu_k L \\
 & & E & & \\
 & \eta_k^2 F & \xi_k^2 E & \xi_k\eta_k K & \xi_k\eta_k L \\
 & \eta_k^2 K & & \xi_k\eta_k E & \\
 & -\eta_k^2 L & & & \xi_k\eta_k E
\end{bmatrix},\qquad 2 \le k \le d-1.
\]
Sketch of proof. Use the Sherman-Morrison-Woodbury formula and recursive
representations for k = 2,…,d (see V. Kazeev, BNK [5]).
Collect operator QTT ranks B. Khoromskij, Zurich 2010(L15) 393
Thm. 15.1. The following upper bounds on the vector QTT ranks of the
corresponding matrices hold:

∆^{(d)}_{DD}: 3,…,3 (Lem. 15.2)
∆^{(d)}_{DN}, ∆^{(d)}_{ND}: 4,…,4
∆^{(d)}_{NN}: 4, 5,…,5, 4
∆^{(d)}_{P}: 2, 3,…,3 ([5])
(∆^{(d)}_{DD})^{−1}: 4, 5,…,5, 4 (Lem. 15.7)
(∆^{(d)}_{DN})^{−1}, (∆^{(d)}_{ND})^{−1}: 4,…,4 (Lem. 15.6)
∆^{(d_1…d_D)}_{DD}: 3,…,3, 2, 4,…,4, 2, …, 2, 4,…,4, 2, 4,…,4, 3 (Lem. 15.3, 15.4)
∆^{(d_1…d_D)}_{DN}, ∆^{(d_1…d_D)}_{ND}: 4,…,4, 2, 5,…,5, 2, …, 2, 5,…,5, 2, 5,…,5, 4
∆^{(d_1…d_D)}_{NN}: 4, 5,…,5, 2, 5, 6,…,6, 5, 2, …, 2, 5, 6,…,6, 5, 2, 5, 6,…,6, 4
∆^{(d_1…d_D)}_{P}: 2, 3,…,3, 2, 3, 4,…,4, 2, …, 2, 3, 4,…,4, 2, 3, 4,…,4, 3 ([5])

Proof. Follows from the lemmas presenting explicit QTT representations of
the stated ranks.
Toward numerical issues B. Khoromskij, Zurich 2010(L15) 394
Rem. 15.2. Numerical experiments carried out with the TT-Toolbox show all
the upper bounds for vector TT/QTT ranks given in Thm. 15.1 to be sharp,
and the corresponding explicit representations to be of minimal rank.

Exer. 15.1. (∆^{(d)}_{DD})^{−1} has a rank-(5,…,5) explicit QTT
representation (Lem. 15.7). Confirm it with the numerical QTT
decomposition. Compare the CPU time for the explicit and algebraic
decompositions for large n = 2^d.

Exer. 15.2. Recall that the eigenvectors u of ∆^{(d_k)}_{DD} have an
explicit rank-2 FQTT decomposition (Lect. 14). Check it by an “analytic”
calculation of the matrix-vector product of the respective OQTT and FQTT
decompositions.

Exer. 15.3. Compare the vector and operator ranks of (∆^{(d)}_{DD})^{−1}
by QTT calculations.
Numerics on quantics model B. Khoromskij, Zurich 2010(L15) 395
Tables 6 and 7 present the average QTT ranks in the approximation of function-related matrices up to
tolerance ε = 10^{−5}. Table 6 includes the example of the matrix exponential (cf. Conj. 13.1). One can
observe that the rank parameters are small and depend only mildly on the grid size.

N \ r | e^{−α∆_1}, α = 0.1, 1, 10, 10² | ∆_1^{−1} | diag(1/x²) | diag(e^{−x²})
2^9  | 6.2/6.8/9.7/11.2 | 6.2 | 5.1 | 4.0
2^10 | 6.3/6.8/9.5/10.8 | 6.3 | 5.3 | 4.0
2^11 | 6.4/6.8/9.0/10.4 | 6.2 | 5.5 | 4.1

Table 6: QTT2-matrix-ranks of N × N matrices for N = 2^p.

N \ r | 1/(x_1+x_2) | e^{−‖x‖} | e^{−‖x‖²} | diag(e^{−x²}) | ∆_2^{−1}1, ε = 10^{−6}, 10^{−7}, 10^{−8}
2^9  | 5.0 | 9.4 | 7.8 | 3.8 | 3.6/3.6/3.6
2^10 | 5.1 | 9.4 | 7.7 | 3.9 | 3.6/3.6/3.6
2^11 | 5.2 | 9.3 | 7.5 | 3.9 | 3.7/3.7/3.7

Table 7: QTT2-ranks of functional N × N matrices, N = 2^p.
Literature to Lecture 15 B. Khoromskij, Zurich 2010(L15) 396
1. I.V. Oseledets, Approximation of 2d × 2d matrices using tensor decomposition. SIAM J. Matrix Anal.
Appl., 2010.
2. B.N. Khoromskij, O(d logN)-Quantics Approximation of N-d Tensors in High-Dimensional Numerical
Modeling. Preprint 55/2009 MPI MIS, Leipzig 2009 (submitted).
3. I.V. Oseledets. Constructive representation of functions in tensor formats. Preprint INM Moscow, 2010.
4. B.N. Khoromskij, and I.V. Oseledets. DMRG + QTT approach to high-dimensional quantum molecular
dynamics. Preprint MPI MiS 68/2010, Leipzig 2010, submitted.
5. V. Kazeev, and B.N. Khoromskij. On explicit QTT representation of Laplace-like operators and their
inverse. Preprint MPI MiS 75/2010, Leipzig 2010, submitted.
http://personal-homepages.mis.mpg.de/bokh
Lect. 16. Preconditioned tensor truncated iterative solvers B. Khoromskij, Zurich 2010(L16) 397
Outline of Lecture 16.
1. Problem classes.
2. BVPs: TT/QTT truncated preconditioned iteration.
3. Laplacian-based preconditioners for slow and strongly
varying coefficients.
4. Recursive quadratures for matrix exponential family.
5. EVPs: TT/QTT truncated preconditioned iteration.
6. Green function iteration.
7. Numerical experience.
8. Parabolic problems.
9. Regularised QTT matrix exponential.
10. Numerics on QTT representation of multivariate
potentials.
Problem classes in Rd revisited B. Khoromskij, Zurich 2010(L16) 398
Elliptic (parameter-dependent) equation: find u ∈ H¹₀(Ω) such that
Hu := −div(a grad u) + V u = F in Ω ⊂ R^d.

EVP: find a pair (λ, u) ∈ R × H¹₀(Ω) such that 〈u, u〉 = 1 and
Hu = λu in Ω ⊂ R^d,  u = 0 on ∂Ω.

Parabolic equations: find u : R^d × (0,∞) → R with u(·, 0) ∈ H²(R^d) such
that
σ ∂u/∂t + Hu = 0,  H = ∆_d + V(x_1, …, x_d).

Specific features:
⊲ High spatial dimension: Ω = (−b, b)^d ⊂ R^d (d = 2, 3, …, 100, …).
⊲ Multiparametric equations: a(y, x), u(y, x), y ∈ R^M (M = 1, 2, …, 100, …, ∞).
⊲ Nonlinear, nonlocal (integral) operators V = V(x, u), singular potentials.
Tensor-truncated preconditioned iteration: BVP B. Khoromskij, Zurich 2010(L16) 399
Parametric elliptic BVP on a nonlinear manifold S:
A(y)U(y) = F,
U_{m+1} = U_m − B^{−1}(AU_m − F),  U_{m+1} := T_S(U_{m+1}) ∈ S.

Assumptions:
U, F allow low S-rank tensor approximation,
A and B^{−1} are of low matrix S-rank,
A and B are spectrally equivalent (close).

Good candidates for B^{−1}:
(A) Slowly varying a(x): the shifted anisotropic d-Laplacian inverse
(∆^{(d)} + a_0 I)^{−1},
∆^{(d)} = a_1∆_1 ⊗ I_N ⊗ … ⊗ I_N + … + I_N ⊗ I_N ⊗ … ⊗ a_d∆_d ∈ R^{N^d × N^d}.

(B) Highly varying coefficient a(y, x): the “reciprocal” preconditioner
\[
\left(\nabla^T a \nabla\right)^{-1} \approx P := \Delta^{-1}\left(\nabla^T \tfrac{1}{a}\, \nabla\right)\Delta^{-1}. \tag{123}
\]
Dolgov, BNK, Oseledets, Tyrtyshnikov [1]
QTT representation of operators (matrices) B. Khoromskij, Zurich 2010(L16) 400
Rank bounds for Laplacian-related matrices
Lem. 16.1. The following TT/QTT rank estimates hold:
rank_C(∆_d) = d, rank_TT(∆_d) = 2.
rank_QTT(∆_1) = 3, rank_QTT(∆_d) = 4 for d ≥ 2, Kazeev, BNK [9].
rank_QTT(∆_1^{−1}) ≤ 5, [9]; rank_QTT(∇^T a∇) ≤ 7 rank_QTT(a), [1].
ε-rank: rank_QTT(exp(−α∆_1)) ≤ C|log ε||log α| (numerical evidence).
ε-rank: rank_TT(∆_d^{−1}) ≤ rank_C(∆_d^{−1}) ≤ C|log ε| log N.
ε-rank: rank_QTT(∆_d^{−1}) ≤ C|log ε|² log N.
∆^{−1}(∇^T (1/a) ∇)∆^{−1}(∇^T a∇) = I + R, where rank(R) = 1 for d = 1,
and for d ≥ 2 we have rank(R) = const in the case of a piecewise-constant
coefficient with one interface. Numerically, R demonstrates surprisingly
good clustering properties, [1].
Low tensor rank Laplacian inverse B. Khoromskij, Zurich 2010(L16) 401
Ex. 16.1. Let ∆_{(d)} be the FD negative Laplacian on H¹₀([0,1]^d),
d = 2, 3, …. A sinc-type rank-R approximation of ∆_{(d)}^{−1}, R = 2M + 1,
is
\[
B_M := \sum_{k=-M}^{M} c_k \bigotimes_{\ell=1}^{d} \exp(-t_k\Delta^{(\ell)}) \approx \left(\Delta_{(d)}\right)^{-1},\qquad \Delta^{(\ell)} = \Delta \in \mathbb{R}^{n\times n},
\]
with exponential convergence in R:
\[
\left\|\left(\Delta_{(d)}\right)^{-1} - B_M\right\| \le C_0\, e^{-\pi\sqrt{M}},\qquad
t_k = e^{kh},\; c_k = h\,t_k,\; h = \frac{\pi}{\sqrt{M}}. \tag{124}
\]
The ε-rank of (∆_{(d)})^{−1} is O(|log ε|²), uniformly in d.

The matrix-vector multiplication of B_M with a rank-1 vector in R^{n^d}
takes O(dRn log n) operations via the diagonalization
\[
\exp(-t_k\Delta^{(\ell)}) = F_\ell'\, D_\ell\, F_\ell,\qquad
D_\ell = \mathrm{diag}\left(e^{-t_k\lambda_1^{(\ell)}}, \ldots, e^{-t_k\lambda_n^{(\ell)}}\right),
\]
where F_ℓ is the ℓ-mode sine-transform n×n matrix and λ_i^{(ℓ)}
(i = 1,…,n) are the eigenvalues of the 1D Laplacian ∆^{(ℓ)}.
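The quadrature (124) can be tested directly in the eigenbasis of the Laplacian, where B_M acts on the eigenvalue sums λ_{i_1} + … + λ_{i_d}. The sketch below is our own check; as an assumption, the spectrum is rescaled so that the quadrature is applied on [1, ∞), where the error bound holds.

```python
import numpy as np

n, M = 15, 49
lam1 = 2.0 - 2.0 * np.cos(np.arange(1, n + 1) * np.pi / (n + 1))  # 1D DD eigenvalues
lam = (lam1[:, None, None] + lam1[None, :, None] + lam1[None, None, :]).ravel()  # d = 3

h = np.pi / np.sqrt(M)
k = np.arange(-M, M + 1)
t = np.exp(k * h)            # t_k = e^{kh}
c = h * t                    # c_k = h t_k

lmin = lam.min()
mu = lam / lmin              # rescaled spectrum, mu >= 1
# B_M acts diagonally in the eigenbasis: 1/lam ~ sum_k c_k exp(-t_k lam)
quad = (c[:, None] * np.exp(-t[:, None] * mu[None, :])).sum(axis=0) / lmin
rel_err = np.max(np.abs(quad * lam - 1.0))
```

For M = 49 the maximal relative error over the whole spectrum is far below the 10^{−5} range, consistent with the e^{−π√M} bound.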
An equivalent representation using the tensor-product FFT:
\[
\left(\Delta_{(d)}\right)^{-1} \approx
\left(\bigotimes_{\ell=1}^{d} F_\ell^{T}\right)
\left(\sum_{k=-M}^{M} c_k \bigotimes_{\ell=1}^{d} \mathrm{diag}\left(e^{-t_k\lambda_1^{(\ell)}}, \ldots, e^{-t_k\lambda_n^{(\ell)}}\right)\right)
\left(\bigotimes_{\ell=1}^{d} F_\ell\right) =: L_M.
\]
Exer. 16.1. Construct the QTT Poisson/Yukawa solver on uniform grid
over [0, 1]d (large d, moderate n), by precomputing the QTT
decompositions of matrix exponential family,
exp(−tk∆(ℓ)) ∈ Rn×n, k = −M, ...,M.
Try with F = 1, large d and moderate grid-size n ≤ 128.
Exer. 16.2. Construct the TT Poisson/Yukawa solver on a uniform grid over
[0,1]^d by L_M + FFT, using the TT-rank recompression of the Hilbert
tensor Λ = [1/(λ^{(1)}_{i_1} + … + λ^{(d)}_{i_d})]. Try the case F = 1,
d = 3, for large grid size n (proceed from Exer. 10.1).
Analysis of precond: Stability of spectral equivalence B. Khoromskij, Zurich 2010(L16) 403
Introduce the class of rank-R Kronecker product preconditioners defined by
the approximate inverse B_R as above. Here the coefficients a_ℓ can be
chosen by optimizing the condition number in (with L_0 = ∆^{(d)})
\[
C_1\,\langle AU, U\rangle \le \langle L_0 U, U\rangle \le C_2\,\langle AU, U\rangle \qquad \forall U \in V_n, \tag{125}
\]
so that the matrix L_0 is spectrally close/equivalent to the initial
stiffness matrix A.

The following lemma proves the spectral equivalence estimates.

Lem. 16.2. [5] (Spectral equivalence). Suppose the constants C_0, C_1, C_2
are determined by (124) and (125), respectively. Choose M such that the
inequality C_0‖L_0‖‖B_R^{−1}‖e^{−π√M} < q(M)C_1 holds with q < 1. Then
\[
\tilde C_1\,\langle AU, U\rangle \le \left\langle B_R^{-1}U, U\right\rangle \le \tilde C_2\,\langle AU, U\rangle \qquad \forall U \in V_n, \tag{126}
\]
with spectral equivalence constants \tilde C_1, \tilde C_2 > 0 that allow
the following bound on the condition number:
\[
\frac{\tilde C_2}{\tilde C_1} \le \frac{1}{1-q(M)}\,\frac{C_2}{C_1} + \frac{q(M)}{1-q(M)}.
\]
Proof. We have
\[
\left\|L_0 - B_R^{-1}\right\| = \left\|-L_0\left(L_0^{-1} - B_R\right)B_R^{-1}\right\|
\le \|L_0\|\left\|L_0^{-1} - B_R\right\|\left\|B_R^{-1}\right\|.
\]
Using (124), the constants \tilde C_1, \tilde C_2 > 0 can be estimated by
\[
C_1 - C_0\|L_0\|\|B_R^{-1}\|e^{-\pi\sqrt{M}} \le \tilde C_1,\qquad
\tilde C_2 \le C_2 + C_0\|L_0\|\|B_R^{-1}\|e^{-\pi\sqrt{M}},
\]
with C_0 > 0 defined in (124). Combining the above inequality with the
error estimate (124) and with (125) leads to the desired bound. □
Rem. 16.1. [5] Lem. 16.2 indicates that the rank-R preconditioner B_R^{−1}
has linear (or quadratic) scaling in the univariate problem size n, while
providing a condition number of order C_2/C_1 as in (125), as soon as the
estimate
C_0‖L_0‖‖B_R^{−1}‖ e^{−π√M} < C_1
holds. The latter is valid for R = O(|log(q(M)/C_1)|²) = O((log n)²).
Notice that the modified sinc quadrature leads to the improved convergence
rate C_0 e^{−αM/log M} in (124) with α = log(cond(L_0)), again providing
the rank estimate R = O(log(|log(q(M))|/C_1)) = O((log n)²).
Numerics on tensor-structured Laplacian inverse B. Khoromskij, Zurich 2010(L16) 405
BNK, Oseledets, [6].
N Precomp Time for sol Residue Relative L2 error
28 6.14 2.98 6.6e-06 7.0e-06
29 8.37 3.52 8.7e-06 7.0e-06
210 10.81 4.02 9.4e-06 7.0e-06
Table 8: Solving 100D Poisson equation in C-QTT.
Numerical complexity in the QTT format: W = O(d |log ε|² log N).

We used the mixed Can-QTT (or simply CQTT) format: each individual factor
is approximated in the QTT format, but the full QTT matrix is not
assembled. It is easy to design algorithms for matrix and vector
operations in such a format using routines from the TT-Toolbox.

The results for d = 100 given in Table 8 confirm the expected complexity.
The precomputation step is now the evaluation of the matrix exponentials.
As a trade-off, the solution time is slightly higher, but it remains
linear in d and logarithmic in the grid size n.
n Step 1 Step 2 Time for sol Residue Relative L2 error
27 7.34 1.66 0.11 7.4e-03 6.3e-06
28 10.01 2.67 0.19 2.1e-04 8.3e-06
29 12.43 7.68 0.36 2.0e-04 1.0e-05
210 18.71 27.89 0.49 1.7e-03 1.8e-05
Table 9: Numerics for 3D Poisson equation, using 2k-quadrature
n Step 1 Step 2 Time for sol Residue Relative L2 error
27 4.68 8.56 0.58 1.18e-02 7.35e-05
28 6.94 12.98 0.78 4.42e-01 7.81e-04
29 9.99 24.12 1.17 5.97e+00 1.75e-02
210 13.32 45.37 1.48 1.0774e+00 6.23e-01
Table 10: Numerics for 10D Poisson equation, using 2k-quadrature
The recursive quadrature for matrix exponential family B. Khoromskij, Zurich 2010(L16) 407
We discuss a simple recursion that connects previously computed
exponentials exp(−t_p∆_1), p < k, with the new one for index k. Denote
these matrices by
Φ_k = exp(−t_k∆_1).
The simplest possible recursion is
Φ_k = Φ_{k−1}², corresponding to t_k = 2t_{k−1}.
This is achieved by choosing M such that e^h = 2, or equivalently
h = log 2 = π/√M, so that
M = (π / log 2)² ≈ 20.54.
Since M should be an integer, we select M = 21 or M = 20 and slightly
modify h (h = log 2) to make the recursion exact. This yields a new
quadrature formula with
t_k = 2^k,  c_k = 2^k log 2,  k = −M, …, M. (127)
This quadrature will be called the 2^k-quadrature.
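The recursion is trivial to emulate in scalar arithmetic, which also exposes the error accumulation discussed below when the squaring starts from Φ_{−M} ≈ 1. A sketch (our own illustration; x stands for a sample eigenvalue of the scaled operator and is an arbitrary choice):

```python
import numpy as np

M, x = 21, 1e-6
phi = np.exp(-2.0 ** (-M) * x)   # Phi_{-M}, computed directly
recursive = {-M: phi}
for k in range(-M + 1, M + 1):
    phi = phi * phi              # Phi_k = Phi_{k-1}^2, i.e. t_k = 2 t_{k-1}
    recursive[k] = phi

direct = {k: np.exp(-2.0 ** k * x) for k in range(-M, M + 1)}
max_rel = max(abs(recursive[k] - direct[k]) / direct[k] for k in recursive)
# max_rel stays small but grows roughly like 2^{2M} * machine-eps over the
# 2M squarings, which is why a mixed scaling-and-squaring scheme is
# preferable in practice.
```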
The accuracy of the quadrature formula (127) depends on the interval where
it is considered (i.e. on the spectrum of ∆_1), but it always gives an
excellent approximate inverse to serve as a preconditioner (with relative
accuracy no worse than 10^{−3}). The special structure of the quadrature
nodes allows fast computation of all exponentials, in 2M + 1
multiplications (in exact arithmetic).

In finite precision, however, the error may accumulate during the squaring
(since for k = −M the exponentials are close to the identity matrix), and
a mixed scheme is preferable: up to some k_0 the exponentials are computed
by the scaling-and-squaring method, and after that they are just the
squares of previously computed exponentials.

A similar approach can be adopted to obtain a more accurate quadrature.
For example, another possible recurrence relation is
Φ_k = Φ_{k−1}Φ_{k−2}, i.e. t_k = t_{k−1} + t_{k−2}.
Denoting a = e^h, we find that a satisfies the quadratic equation
a² − a − 1 = 0, hence a is the golden ratio: a = ρ = (1 + √5)/2 ≈ 1.6180.
The corresponding M is larger:
M = (π / log ρ)² ≈ 42.62,
so one can choose M = 42 or M = 43.

The respective quadrature nodes and weights are (the ρ^k-quadrature)
t_k = ρ^k,  c_k = ρ^k log ρ.
This quadrature formula is around 1000 times more accurate than
2k-quadrature. The number of quadrature points can be slightly
decreased, since the norm of several first and last summands is negligible.
There are other possible recursions:
Φk = Φ2k−2, i.e. tk = 2tk−2, and Φk = Φk−2Φk−3, i.e. tk = tk−2 + tk−3,
and so on, which lead to increased M and increased accuracy, yielding a
whole family of “cheap” quadrature rules.
Tensor-truncated preconditioned iteration: EVP B. Khoromskij, Zurich 2010(L16) 410
Elliptic spectral problem (EVP) in S:
AU = λU, 〈U, U〉 = 1.

Preconditioned iteration (or inverse iteration, A → A^{−1}):
U_{m+1} = U_m − B^{−1}(AU_m − λ_m U_m),
U_{m+1} := T_S(U_{m+1}) ∈ S,
U_{m+1} := U_{m+1}/‖U_{m+1}‖_S,  λ_{m+1} = 〈AU_{m+1}, U_{m+1}〉_S.

Direct minimization of the Rayleigh quotient (energy functional):
〈AU, U〉 → min, 〈U, U〉 = 1; DMRG + QTT/MPS.

Green function iteration by direct inversion of a Yukawa-like operator.
Ex. 16.2. We present results on the iterative calculation of the minimal
eigenvalue of the FD d-Laplacian by the inverse power method with rank
truncation [2]. We discretize the problem on (0, π)^d using n^d grid
points and apply the sinc quadrature with M = 49 to obtain the
rank-(2M + 1) approximation of the Laplacian inverse for d = 3, 10, 50.

Table 11 presents the CPU time (sec.) per iteration, the relative error in
the eigenvalue and the relative H¹-error in the eigenfunction for n = 2^9.
In all cases the number of power iterations does not exceed 6.

d | Time/it | δλ | δu
3 | 0.9 | 3.1·10^{−6} | 4.5·10^{−4}
10 | 2.9 | 3.1·10^{−6} | 3.8·10^{−4}
50 | 14.7 | 3.1·10^{−6} | 3.1·10^{−4}

Table 11: Minimal eigenvalue of the d-dimensional Laplacian (d = 3, 10, 50).

Table 11 clearly indicates the linear scaling in d of our tensor solver.
Detailed numerical illustrations of tensor-structured eigenvalue solvers
can be found in Hackbusch, BNK, Sauter, Tyrtyshnikov [2].
Many-particle models B. Khoromskij, Zurich 2010(L16) 412
Objectives in many-particle models.

The electronic Schrödinger equation for a many-particle system in R^d,
HΨ = ΛΨ,
with the Hamiltonian H = H[r_1, …, r_{N_e}],
\[
H := -\frac{1}{2}\sum_{i=1}^{N_e}\Delta_i
 - \sum_{a=1}^{K}\sum_{i=1}^{N_e}\frac{Z_a}{|r_i - R_a|}
 + \sum_{i<j\le N_e}\frac{1}{|r_i - r_j|}
 + \sum_{a<b\le K}\frac{Z_a Z_b}{|R_a - R_b|},
\]
where Z_a, R_a are the charges and positions of the nuclei, r_i ∈ R³.
Hence the problem is posed in R^d with high dimension d = 3N_e, where N_e
is the (large) number of electrons.

Desired size of the system: N_e = O(10^q), q = 1, 2, 3, 4, …?
Proteins: q = 3, 4.
Molecular dynamics, electronic structure calculations for small molecules:
q = 1, 2.

The Hartree-Fock equation in R³ (Lect. 17).
Lippmann-Schwinger integral formulation of the electronic Schrödinger
equation in R^{3N}, N being the (large) number of electrons:
(−∆ − V)ψ = λψ  ⇒  ψ = (−∆ − λ)^{−1}Vψ.

New tensor method: truncated Green function iteration for
Hu := (−∆ + V)u = λu.
Rewrite the EVP as an integral equation (fixed point equation)
u = −(−∆ + λ)^{−1}Vu =: G(λ)u,
and solve it by the power iteration (piecewise constant elements are
admissible!),
u_{m+1} = G(λ_m)u_m/‖G(λ_m)u_m‖,  λ_{m+1} = (Hu_{m+1}, u_{m+1}).
Truncated Green function iteration B. Khoromskij, Zurich 2010(L16) 414
The truncated Green function iteration for solving the spectral problem
(see BNK [4])
\[
(\Delta_d + V)U = EU,\qquad \|U\| = 1,\; U \in \mathbb{R}^{\mathcal I}, \tag{128}
\]
takes the form
\[
U_{m+1} = -(\Delta_d - E_m I)^{-1} V U_m,\qquad
U_{m+1} := T_S(U_{m+1}),\qquad
U_{m+1} := \frac{U_{m+1}}{\|U_{m+1}\|},
\]
and E_{m+1} is recomputed at each step as a Rayleigh quotient:
E_{m+1} = 〈(∆_d + V)U_{m+1}, U_{m+1}〉.
The particular numerical illustrations are presented for the Schrödinger
equation describing the hydrogen atom.
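The iteration can be tried on a small model problem without any rank truncation. The sketch below is our own 1D illustration with a soft-core Coulomb potential V(x) = −1/√(x²+1); the grid parameters and initial guess are arbitrary assumptions. It runs the Green function iteration with Rayleigh-quotient updates and compares the limit with a direct eigensolver.

```python
import numpy as np

# H = A0 + diag(V): kinetic part A0 = -(1/2) d^2/dx^2 by finite differences
n, Lbox = 400, 20.0
x = np.linspace(-Lbox, Lbox, n)
hx = x[1] - x[0]
A0 = (0.5 / hx ** 2) * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
V = -1.0 / np.sqrt(x ** 2 + 1.0)
H = A0 + np.diag(V)

u = np.exp(-x ** 2)
u /= np.linalg.norm(u)
E = u @ H @ u
for _ in range(300):
    u = -np.linalg.solve(A0 - E * np.eye(n), V * u)  # u <- -(A0 - E I)^{-1} V u
    u /= np.linalg.norm(u)
    E_new = u @ H @ u                                # Rayleigh quotient update
    if abs(E_new - E) < 1e-13:
        E = E_new
        break
    E = E_new

E_ref = np.linalg.eigvalsh(H)[0]                     # reference ground-state energy
```

Since E stays negative, A0 − E·I is positive definite at every step, and the iteration converges to the discrete ground state.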
Numerics on tensor-structured EVP solvers B. Khoromskij, Zurich 2010(L16) 415
The Schrödinger equation for the hydrogen atom:
\[
\left(-\frac{1}{2}\Delta - \frac{1}{\|x\|}\right)u = \lambda u,\qquad x \in \Omega := \mathbb{R}^3.
\]
The eigenpair with minimal eigenvalue is
u_1(x) = e^{−‖x‖}, λ_1 = −0.5,
and e^{−‖x‖} can be proven to have low canonical rank, r = O(|log ε|).

The Green function iteration + QTT elliptic inverse,
N × N × N grid, Time = O(log N). BNK, Oseledets, [6].

N | Time for 1 iter. | Iter. | Eigenvalue error
2^7 | 8.5 | 8 | 6.1e-03
2^8 | 13 | 8 | 1.5e-03
2^9 | 18 | 8 | 4.0e-04
2^10 | 25 | 8 | 1.0e-04
Time dependent problems B. Khoromskij, Zurich 2010(L16) 416
Parabolic BVP in S ⊂ V_n:
∂U/∂t − iAU = 0,  U_0 = T_S(U(0)).

The regularized solution operator by the QTT matrix exponential, [BNK ’10]:
U(t) = e^{iAt}U_0 ≈ T_S(e^{iAt}B) T_S(B^{−1}U_0), t ≥ 0.

Spectral decomposition: AU_n = λ_nU_n,
\[
U(t) \approx \sum_{n=1}^{N} e^{i\lambda_n t}\,\langle U_0, U_n\rangle\, U_n.
\]
Recursive time-space separation by the Cayley transform
[Gavrilyuk, BNK ’10], in progress:
\[
U(t) = \sum_{p=0}^{\infty} L_p^{(0)}(t)\, X_p,\qquad L_p^{(0)} \text{ the Laguerre polynomials},
\]
X_0 = −(iA − I)^{−1}U_0,  X_{p+1} = iA(iA − I)^{−1}X_p, p = 0, 1, ….

Implicit integrators by time stepping.
Parabolic Problems in Molecular Dynamics B. Khoromskij, Zurich 2010(L16) 417
Heat-like problems in R^d:
∂u/∂t − Hu = 0,  H = ∆ + V.

The Schrödinger equation for the nuclei:
iħ ∂u/∂t − Hu = 0,  H = T + V,
with kinetic energy
\[
T = -\sum_{\ell=1}^{d} \frac{\hbar^2}{2M_\ell}\,\Delta_{x_\ell},\qquad
D(T) = H^2(\mathbb{R}^{3d}) \subset D(V),\quad x_\ell \in \mathbb{R}^3,
\]
and a potential V(x_1, …, x_d) ≈ E(x_1, …, x_d), the potential energy
surface (PES).

The PES E(x_1, …, x_d) is computed by multiple solutions of the HF
equation!

Multi-Configuration Time-Dependent Hartree (MCTDH) method
[Meyer et al. 2000], [Lubich, Koch ’08-’09].

QTT tensor approximation of the PES. Spectral solver: DMRG in the QTT
format with rank compression, BNK, Oseledets [8].
Numerics to MD: QTT matrix exponential B. Khoromskij, Zurich 2010(L16) 418
Direct solver using the Quantics-TT matrix exponential.

The solution operator e^{−iHt} cannot be approximated by a QTT matrix
exponential with a uniform bound on rank_QTT. Introduce the regularized
solution operator and the iterated wavepacket
Σ_p = H^{−p}e^{−iHt},  U_p = H^pU(x, 0),  p = 1, 2, 3,
⇒ stable QTT approximation of Σ_p,
⇒ low-rank QTT approximation of U_p.

Assumption: U(x, 0) contains only low frequencies of H.

QTT approximation for fixed t:
U(x, t) = e^{−iHt}U(x, 0) ≈ (T_SΣ_p) T_SU_p, t ≥ 0.
Numerical example: the quantum harmonic oscillator with
V(x) = ½‖x‖²,
propagating the wave packet (an exact eigenfunction) by
U(x, t) = U_n(x)e^{−i(n+1/2)t},
where U_n(x) is an eigenfunction of the quantum harmonic oscillator
(a Gaussian multiplied by a Hermite polynomial).

Set n = 0, t = 1.0, ε = 10^{−6}, p = 2, d = 1. The average rank_QTT of
both the initial wave packet and the time-evolved solution is only a few
units.

N | rank(H^{−p}cos(Ht)) | rank(H^{−p}cos(Ht)U_p) | rank(U_p)
2^8 | 33.8 | 3.7 | 4.7
2^9 | 33.2 | 3.7 | 4.7
2^10 | 32.5 | 3.6 | 4.9

Numerical cost is O(r³ log N), N being the spatial grid size.
[BNK ’10], Preprint 21/2010, MPI MiS, Leipzig.
Numerics to QTT representation of PES B. Khoromskij, Zurich 2010(L16) 420
For particular potentials the ranks can be small or even uniformly bounded
in the physical dimension f (cf. Lect. 14).

For the harmonic potential the QTT ranks are bounded by 6:
\[
V(q_1, \ldots, q_f) = \sum_{k=1}^{f} w_k q_k^2,\qquad \mathrm{rank}_{QTT}(V) \le 6.
\]
For the Henon-Heiles potential the QTT ranks are bounded by 7:
\[
V(q_1, \ldots, q_f) = \frac{1}{2}\sum_{k=1}^{f} q_k^2
 + \lambda \sum_{k=1}^{f-1}\left(q_k^2\, q_{k+1} - \frac{1}{3}\, q_{k+1}^3\right),\qquad
\mathrm{rank}_{QTT}(V) \le 7.
\]
Notice: the Tucker ranks in all these cases are equal to f, implying
O(f^f) scaling, whereas the QTT format gives storage and complexity
polynomial in f: O(f r³ log N).

Can-to-QTT algorithm: each rank-1 term is compressed to QTT, followed by
addition with compression at each step; complexity O(Rd f r³), N = 2^d.
This is a preprocessing step to be performed only once for each particular
model and potential.
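The stated bounds are easy to probe numerically for small f by a full TT-SVD of the potential sampled on a tensor grid reshaped into binary modes. The sketch below is our own check; f = 2, the grid [−5, 5] and the coupling λ = 0.111803 (a conventional Henon-Heiles value) are assumptions made for illustration.

```python
import numpy as np

def qtt_ranks(tensor, d, tol=1e-10):
    """TT-SVD ranks of a vector of length 2**d viewed as d binary modes."""
    c = np.asarray(tensor, dtype=float).reshape(-1)
    nrm = np.linalg.norm(c)
    ranks, r = [], 1
    for _ in range(d - 1):
        m = c.reshape(r * 2, -1)
        U, s, Vt = np.linalg.svd(m, full_matrices=False)
        r = max(1, int(np.sum(s > tol * nrm)))
        ranks.append(r)
        c = s[:r, None] * Vt[:r]          # carry the remainder of the train
    return ranks

dbits = 7                                  # 2**7 grid points per degree of freedom
q = np.linspace(-5.0, 5.0, 2 ** dbits)
Q1, Q2 = np.meshgrid(q, q, indexing="ij")
lam = 0.111803
V = 0.5 * (Q1 ** 2 + Q2 ** 2) + lam * (Q1 ** 2 * Q2 - Q2 ** 3 / 3.0)
ranks = qtt_ranks(V, 2 * dbits)
```

Since V is a low-degree polynomial in (q1, q2), all unfolding ranks stay within the stated bound of 7.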
Compression of Henon-Heiles potentials for different dimensions.

Figure 25: Timings of the Can-to-QTT approximation of the Henon-Heiles
potential, N = 1024, f = 4, …, 256.

The maximal rank for V(Q) is 5, and for the highest dimension considered,
f = 256, the memory to store V(Q) in the QTT format is 62.5 KB. The
dependence on the one-dimensional grid size N = 2^d is logarithmic,
O(log N).
Discussion on numerical issues B. Khoromskij, Zurich 2010(L16) 422
Rem. 16.1. Since the QTT rank of ∆^{(f)} is bounded by 4, the numerical
complexity of matrix-vector multiplication and matrix storage is limited
only by the efficiency of the QTT representation of the multi-dimensional
PES, which is known as a severe bottleneck of modern numerical methods in
molecular dynamics.
Rem. 16.2. Further numerical examples will be presented in Lect. 18
related to DMRG solution of spectral problem including high dimensional
PES.
Exer. 16.3. Compute the QTT decomposition of the first few eigenfunctions
of the quantum harmonic oscillator, U_n(x), x ∈ R^f (Gaussians multiplied
by Hermite polynomials, n = 0, 1, 2, …).

Exer. 16.4. Examine the QTT rank of the time-evolved solutions
U(x, t) = U_n(x)e^{−i(n+1/2)t}, x ∈ R^f
(stable in time). Consider only the real part.
Literature to Lecture 16 B. Khoromskij, Zurich 2010(L16) 423
1. S.V. Dolgov, B.N. Khoromskij, I. Oseledets, and E.E. Tyrtyshnikov: Tensor Structured Iterative
Solutions of Elliptic Problems with Jumping Coeff. Preprint 55/2010 MPI MIS, Leipzig 2010, subm..
2. W. Hackbusch, B.N. Khoromskij, S. Sauter, and E. Tyrtyshnikov, Use of Tensor Formats in Elliptic
Eigenvalue Problems. Preprint 78/2008, MPI MiS Leipzig 2008 (submitted).
3. B.N. Khoromskij, O(d logN)-Quantics Approximation of N-d Tensors in High-Dimensional Numerical
Modeling. Preprint 55/2009 MPI MIS, Leipzig 2009 (J. Constr. Appr., accepted).
4. B.N. Khoromskij, On tensor approximation of Green iterations for Kohn-Sham equations. Comp. and
Visualization in Sci., 11 (2008) 259-271.
5. B.N. Khoromskij, Tensor-Structured Preconditioners and Approximate Inverse of Elliptic Operators in
Rd. J. Constructive Approx. 30:599-620 (2009).
6. B.N. Khoromskij, and I. Oseledets. Quantics-TT approximation of elliptic solution operators in higher
dimensions. Preprint MPI MiS 79/2009, Leipzig 2009, J. Numer. Math. 2010, accepted.
7. B.N. Khoromskij, and Ch. Schwab. Tensor-Structured Galerkin Approximation of Parametric and
Stochastic Elliptic PDEs. Preprint MPI MiS 9/2010, Leipzig 2010, SISC 2010, accepted.
Literature to Lecture 16 B. Khoromskij, Zurich 2010(L16) 424
8. B.N. Khoromskij, and I.V. Oseledets. DMRG + QTT approach to high-dimensional quantum molecular
dynamics. Preprint MPI MiS 68/2010, Leipzig 2010, submitted.
9. V. Kazeev, and B.N. Khoromskij. On explicit QTT representation of Laplace-like operators and their
inverse. Preprint MPI MiS 75/2010, Leipzig 2010.
Lect. 17. Preconditioned tensor truncated iterative solvers B. Khoromskij, Zurich 2010(L17) 425
Outline of Lecture 17.
1. Challenging computational features of the Hartree-Fock
equation in electronic structure calculations.
2. Fast tensor convolution with the Newton potential.
3. Tensor representation of the Hartree and exchange potentials.
4. Fast nonlinear iteration via multilevel DIIS.
5. SPDEs with additive and log-additive multiparametric
coefficients.
6. Rank bound for solutions of 1D SPDEs.
7. Rank estimate for the FD multiparametric matrices.
8. Computation in canonical format.
9. Numerics for additive and log-additive cases.
Ab initio and DFT models B. Khoromskij, Zurich 2010(L17) 426
• Minimization of the Hartree-Fock energy functional

I_HF = inf { Σ_{i=1}^{Ne} (1/2) ∫_{R³} |∇φi|² + ∫_{R³} ρ Vc + (1/2) ∫∫_{R³×R³} [ρ(x)ρ(y) − |τ(x,y)|²] / ‖x−y‖ dx dy },   (129)

over φi ∈ H¹(R³), ∫_{R³} φi φj = δij, 1 ≤ i, j ≤ Ne, where

τ(x,y) = Σ_{i=1}^{Ne} φi(x)φi(y) - electron density matrix,
ρ(x) = τ(x,x) - electron density,
1/‖x‖ - Newton potential,
Vc - external potential with singularities at the centers of atoms.
• Hartree-Fock equation - the Euler-Lagrange equation of (129):

[ −(1/2)∆ − Vc(x) + ∫_{R³} ρ(y)/‖x−y‖ dy ] φi(x) − (1/2) ∫_{R³} τ(x,y)/‖x−y‖ φi(y) dy = λi φi(x),
• Hartree-Fock-Slater/Kohn-Sham equation - DFT:

[ −(1/2)∆ − Vc(x) + ∫_{R³} ρ(y)/‖x−y‖ dy − α Vρ(x) ] ψ = λψ,   Vρ(x) = ( (3/π) ρ(x) )^{1/3}.
Electronic structure calculations B. Khoromskij, Zurich 2010(L17) 427
The Hartree-Fock equation:

[ −(1/2)∆ − Vc(x) + ∫_{R³} ρ(y)/‖x−y‖ dy ] φi(x) − (1/2) ∫_{R³} τ(x,y)/‖x−y‖ φi(y) dy = λi φi(x),
Challenging features:
⊲ Nonlinear eigenvalue problem for i = 1, ...,Ne.
⊲ Nonlocal (integral) convolution-type operators
(Ne convolutions with the Newton potential).
⊲ Tensor decomposition on large spatial grids in 3D.
⊲ High accuracy.
Tensorized L²-projection of 1/‖x‖ onto p.w.c. basis functions over an
n × n × n grid in [0, 1]³ ⊂ R³. The resulting low-rank canonical tensor is an
important ingredient of the fast Hartree-Fock solver, including multiple
convolutions with the Coulomb (Newton) potential in the range n ≈ 10⁴.
Ex. I: Canonical approx. to projected 1/‖x‖ in R3. B. Khoromskij, Zurich 2010(L17) 428
Figure 26: Canonical approximation of 1/‖x‖ via sinc quadratures (solid lines)
and algebraically recompressed approximations (marked solid lines);
approximation error (10⁻⁶ to 10⁻¹) vs. rank for n = 64, 128, 256. [Bertoglio, BNK '08]
Fast tensor convolution in R3 vs. FFT, [BNK ’08]
Matlab time/sec, linear scaling in Rρ and n; Rρ = 861, r = 15.

n³      128³   256³   512³    1024³   2048³   4096³   8192³   16384³
FFT3    4.3    55.4   582.8   ~6000   –       –       –       ~2 years
C ∗ C   1.0    3.1    5.3     21.9    43.7    127.1   368.6   700.2
Tensor representation of HF equation B. Khoromskij, Zurich 2010(L17) 429
For a given finite basis set {gµ}_{1≤µ≤Nb}, gµ ∈ H¹(R³), the molecular
orbitals ψi are represented (approximately) as

ψi = Σ_{µ=1}^{Nb} Cµi gµ, i = 1, ..., N.   (130)

To derive the equation for the unknown coefficient matrix
C = [Cµi] ∈ R^{Nb×N}, we first introduce the mass (overlap) matrix
S = [Sµν]_{1≤µ,ν≤Nb}, given by

Sµν = ∫_{R³} gµ gν dx,

the stiffness matrix H = [hµν] of the core Hamiltonian H = −(1/2)∆ + Vc,

hµν = (1/2) ∫_{R³} ∇gµ · ∇gν dx + ∫_{R³} Vc(x) gµ gν dx, 1 ≤ µ, ν ≤ Nb,

and the symmetric density matrix

D = 2CC* ∈ R^{Nb×Nb}.   (131)
The nonlinear terms representing the Galerkin approximation of the
Hartree and exchange operators are usually constructed by using the
Tensor representation of HF equation B. Khoromskij, Zurich 2010(L17) 430
so-called two-electron integrals, defined as

bµν,κλ = ∫_{R³} ∫_{R³} gµ(x) gν(x) gκ(y) gλ(y) / ‖x−y‖ dx dy, 1 ≤ µ, ν, κ, λ ≤ Nb.

Introducing the Nb × Nb matrices J(D) and K(D), with D defined by (131),

J(D)µν = Σ_{κ,λ=1}^{Nb} bµν,κλ Dκλ,   K(D)µν = −(1/2) Σ_{κ,λ=1}^{Nb} bµλ,νκ Dκλ,

and then the complete Fock matrix F,

F(D) = H + G(D), G(D) = J(D) + K(D),   (132)

one obtains the respective Galerkin system of nonlinear equations for the
coefficient matrix C ∈ R^{Nb×N}:

F(D)C = SCΛ, Λ = diag(λ1, ..., λN),   (133)
C*SC = I_N,

where the second equation represents the orthogonality constraints
∫_{R³} ψi ψj = δij, with I_N the N × N identity matrix.
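As an illustration, once the two-electron integrals bµν,κλ are available, the Fock matrix (132) can be assembled directly. The following NumPy sketch works with dense arrays and random test data (the function name and the data are our own illustrative choices; the tensor-structured solver of the lecture never forms b explicitly, but produces J and K by grid-based convolutions):

```python
import numpy as np

def fock_matrix(H, b, C):
    """Assemble F(D) = H + J(D) + K(D) from the core Hamiltonian H, the
    two-electron integrals b[mu, nu, kappa, lam] and the coefficients C."""
    D = 2.0 * C @ C.T                          # density matrix (131)
    J = np.einsum('mnkl,kl->mn', b, D)         # Coulomb part J(D)
    K = -0.5 * np.einsum('mlnk,kl->mn', b, D)  # exchange part K(D)
    return H + J + K

# Tiny example with Nb = 3 basis functions and N = 1 occupied orbital.
Nb, N = 3, 1
rng = np.random.default_rng(0)
H = rng.standard_normal((Nb, Nb)); H = 0.5 * (H + H.T)
b = rng.standard_normal((Nb,) * 4)
C = rng.standard_normal((Nb, N))
F = fock_matrix(H, b, C)
```

The einsum index strings encode exactly the index patterns of J(D)µν and K(D)µν above; for realistic Nb this O(Nb⁴) object is the bottleneck the tensor approach avoids.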
Tensor representation of HF equation B. Khoromskij, Zurich 2010(L17) 431
The Galerkin representation of the Hartree operator (the Coulomb
matrix) in the tensor method is based on the agglomerated integrals,

J(D)µν = ∫_{R³} gµ(x) VH(x) gν(x) dx, 1 ≤ µ, ν ≤ Nb,   (134)

computed via a single convolution transform in R³ for the Hartree potential,

VH = ρ ∗ (1/‖·‖),

where the electron density is given by

ρ(y) = 2 Σ_{a=1}^{N} ( Σ_{κ,λ=1}^{Nb} Cκa Cλa gκ(y) gλ(y) ).   (135)
We represent the matrix entries of K(D) by the following three loops
(Khoromskaia [2]). For a = 1, ..., N, compute the convolution integrals

Waν(x) = ∫_{R³} gν(y) ( Σ_{κ=1}^{Nb} Cκa gκ(y) ) / ‖x−y‖ dy, ν = 1, ..., Nb,   (136)
Tensor representation of HF equation B. Khoromskij, Zurich 2010(L17) 432
and then the scalar products

Kµν,a = ∫_{R³} [ Σ_{κ=1}^{Nb} Cκa gκ(x) ] gµ(x) Waν(x) dx, µ, ν = 1, ..., Nb.   (137)

Finally, the entries of the exchange matrix are given by sums over all
orbitals,

K(C)µν = Σ_{a=1}^{N} Kµν,a, µ, ν = 1, ..., Nb.   (138)
The advantage of the above representations is that they minimize the
number of convolution products that have to be computed by numerical
quadratures. Even more importantly, they open the possibility of an
efficient low-rank separable approximation of the discretised density ρ(x),
as well as of the auxiliary potentials Waν(x) at step (136).
Example II: Hartree potential on large grids B. Khoromskij, Zurich 2010(L17) 433
VH(x) := ∫_{R³} ρ(y)/|x−y| dy = (ρ ∗ 1/‖·‖)(x).

Represent the orbitals in an "approximating basis" {gk}, e.g., the
GTO basis,

ρ(x) = Σ_{i=1}^{N/2} (φi)², φi = Σ_{k=1}^{R0} c_{i,k} gk(x), R0 ≈ 100,

gk(x) = (x − Ak)^{βk} e^{−λk(x−Ak)²}, x ∈ R³.
O(n log n) computation of VH and its Galerkin matrix in tensor
format on large n × n × n grids; error O(h³), h = 1/n.
Use the canonical-to-Tucker-to-canonical transform on a
sequence of grids to reduce the initial rank, Rρ ≈ R0²/2.
V. Khoromskaia, BNK [1].
Compared with the MOLPRO analytic program.
Example II: Hartree potential on large grids B. Khoromskij, Zurich 2010(L17) 434
a) Absolute approximation error (10⁻⁴–10⁻³ hartree) of the tensor-product
computation of the Hartree potential VH of the H2O molecule, visualised in
Ω = [−6, 6] × {0} × {0} (atomic units), for n = 4096, 8192;
b) CPU times (minutes) of the Canonical-to-Tucker and 3D convolution steps
on n × n × n grids, up to n = 16000 (rT = 20).
Ex. II: Error in the Coulomb matrix Jkm B. Khoromskij, Zurich 2010(L17) 435
Coulomb (Galerkin) matrix is computed by tensor inner products in gk,
Jkm := ∫_{R³} gk(x) VH(x) gm(x) dx, k, m = 1, ..., R0, x ∈ R³.

a) Electron density of H2O in Ω = [−4, 4] × [−4, 4] × {0};
b) absolute approximation error for the Coulomb matrix Jkm (≈ 10⁻⁶).
Ex. II: The Exchange Galerkin Matrix Kex = Kkm B. Khoromskij, Zurich 2010(L17) 436
Khoromskaia [2]. Linear scaling in n, cubic in R0.
Kk,m := −(1/2) ∫_{R³} ∫_{R³} gk(x) τ(x,y)/|x−y| gm(y) dx dy, k, m = 1, ..., R0.

Absolute L∞-error in the matrix elements of Kex for the density of CH4
and the pseudodensity of CH3OH.
Univariate grid sizes n = 1024, 4096. Approximation error O(h³), h = 1/n.
Ex. III: The tensor-truncated iter. for H-F eq. B. Khoromskij, Zurich 2010(L17) 437
gµ ∈ H¹(R³): ψi = Σ_{µ=1}^{Nb} Cµi gµ, i = 1, ..., N.

For C = [Cµi] ∈ R^{Nb×N} and F(C) = H + J(C) − K(C), with the Galerkin
matrices

I → S, H = −(1/2)∆ + Vc → H, VH → J(C), K → K(C),

solve

F(C)C = SCΛ,
C*SC = I_N.

Multilevel "fixed-point" tensor-truncated iteration: initial guess C0 for
J = K = 0,

F̃k C_{k+1} = S C_{k+1} Λ_{k+1}, Λ_{k+1} = diag(λ1^{k+1}, ..., λN^{k+1}),
C*_{k+1} S C_{k+1} = I_N,

where F̃k, k = 0, 1, ..., is specified by extrapolation over the solutions
Ck, C_{k−1}, ... Khoromskaia, BNK, Flad [3].
Ex. III: The tensor-truncated iter. for H-F eq. B. Khoromskij, Zurich 2010(L17) 438
⊲ Multigrid convergence in eigenvalues (left).
⊲ Convergence in effective iterations scaled to finest grid (right).
(Left: absolute eigenvalue error |λ − λ_{n,it}| vs. iterations for
pseudopotential CH4, n = 64, 128, 256, 512, 1024; right: convergence in
effective iterations, CH4, pseudo, n = 512; errors decay from 10⁰ to 10⁻⁴.)
Ex. III: Optimal scaling of tensor-truncated iteration B. Khoromskij, Zurich 2010(L17) 439
Linear scaling of the CPU time (per iteration) in the univariate grid size n.
(Plot: time per SCF iteration, in minutes, vs. univariate grid size n ≤ 1024.)
Exer. 17.1. Compute the QTT representation of the Slater function e^{−‖x‖},
x ∈ [0, 10]³, on an n × n × n (n = 2^d) grid, using the sinc-quadrature
canonical decomposition (see Ex. 4.6; use a quadrature similar to the case
of the Yukawa kernel (4.13)).
Parametric Elliptic Problems: Stochastic PDEs B. Khoromskij, Zurich 2010(L17) 440
Find uM ∈ L²(Γ) × H¹₀(D), s.t.

A uM(y, x) = f(x) in D, ∀y ∈ Γ,
uM(y, x) = 0 on ∂D, ∀y ∈ Γ,

A := −div(aM(y, x) grad), f ∈ L²(D), D ⊂ R^d, d = 1, 2, 3,

aM(y, x) is smooth in x ∈ D, y = (y1, ..., yM) ∈ Γ := [−1, 1]^M, M ≤ ∞.

Additive case (via the truncated Karhunen-Loève expansion):

aM(y, x) := a0(x) + Σ_{m=1}^{M} am(x) ym, am ∈ L∞(D), M → ∞.

Log-additive case:

aM(y, x) := exp( a0(x) + Σ_{m=1}^{M} am(x) ym ) > 0.
Sparse stochastic Galerkin/collocation: [Babuska, Nobile, Tempone '06-'10; Schwab et al. '07-'10]
Stochastic Galerkin, canonical form., additive case: BNK, Ch. Schwab, [5]
HT, additive case: Kressner, Tobler, [8]
QTT, both additive and log-additive cases: BNK, Oseledets, [6]
Stochastic collocation (additive case) B. Khoromskij, Zurich 2010(L17) 441
A parametric linear system (N - grid size in x; FEM/FD in x):

A(y) u(y) = f, f ∈ R^N, u(y) ∈ R^N, y ∈ Γ,   (139)

A(y) = A0 + Σ_{m=1}^{M} Am ym, Am ∈ R^{N×N}, a parameter-dependent matrix.

Collocation on the 1D grid y_m^{(k)} ∈ Γn ⊂ [−1, 1], k = 1, ..., n (n - grid size in y)
⇒ assembled large linear system

A u = f, u, f ∈ R^{Nn^M}, A ∈ R^{Nn^M × Nn^M},

A = A0 ⊗ I ⊗ ... ⊗ I + A1 ⊗ D1 ⊗ I ⊗ ... ⊗ I + ... + AM ⊗ I ⊗ ... ⊗ DM,

where Dm, m = 1, ..., M, is the n × n diagonal matrix with the collocation
points y_m^{(k)} ∈ Γn on the diagonal: rank_C(A) ≤ M.

f = f ⊗ e ⊗ ... ⊗ e, e = (1, ..., 1)^T ∈ R^n.
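For small M, n, N the assembled matrix A can be formed densely to inspect its Kronecker structure. A minimal NumPy sketch (the function name is ours; real computations of course keep A in rank-structured form and never build this object):

```python
import numpy as np

def assemble_collocation_matrix(A_list, pts):
    """Assemble A = A0 (x) I (x) ... + sum_m Am (x) I ... Dm ... I densely.
    A_list = [A0, A1, ..., AM]; pts = the 1D collocation grid Gamma_n,
    assumed identical for every parametric direction."""
    M = len(A_list) - 1
    n = len(pts)
    I, D = np.eye(n), np.diag(pts)
    total = None
    for m, Am in enumerate(A_list):
        # direction m carries D, all other parametric directions carry I
        factors = [Am] + [D if (m >= 1 and j == m) else I
                          for j in range(1, M + 1)]
        term = factors[0]
        for f_ in factors[1:]:
            term = np.kron(term, f_)
        total = term if total is None else total + term
    return total
```

A useful check of the structure: applying A to a vector of the form x ⊗ e_k must reproduce (A0 + y^(k) A1) x in the x-block, which the test below exploits for M = 1.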
Def. 17.1. (BNK [7]) Weakly (locally) coupled canonical N-(d+1) tensors
V ∈ R^{J×I} (with a similar definition for functional decompositions):

V ∈ C_loc[R]:   V(j, I) = Σ_{k=1}^{R} ⊗_{ℓ=1}^{d} V_k^{(ℓ)}(j), V_k^{(ℓ)}(j) ∈ R^{Iℓ}, j ∈ J.
Rank bound for the solution, d = 1 B. Khoromskij, Zurich 2010(L17) 442
Define v = −∆x^{−1} f, σm = ‖am‖ / Σ_{m=1}^{M} ‖am‖ > 0, bm(ym, x) = σm a0(x) + am(x) ym.

Prop. 17.1. (BNK [7]) Let d = 1, assume ∇x uM(y, x) ∈ C(D) for all y ∈ Γ,
∇x v(x) ∈ C(D), and that there exists a_min > 0, s.t.

(A) a_min ≤ a0(x) < ∞,
(B) | Σ_{m=1}^{M} am(x) ym | ≤ γ a_min with γ < 1, for |ym| < 1 (m = 1, ..., M).

Then for the ε-rank:

rank(∇x uM) ≤ C|log ε| (additive);   rank_{C_loc}(∇x uM) = 1 (log-additive).

Proof. We have ∇x uM(y, x) = (1/aM(y, x)) (C0 + ∇x v(x)). Then, in the additive
case, there exist ck, tk ∈ R_{>0}, s.t.

‖ ∇x uM(y, x) − Σ_{k=−K}^{K} ck Π_{m=1}^{M} e^{−tk bm(ym, x)} (C0 + ∇x v(x)) ‖_{L∞} ≤ C e^{−βK/log K},

where β > 0 and C do not depend on M and K. In the log-additive case: rank(aM(y, x)) = 1.
Rem. 17.1. A discrete analogue of Prop. 17.1 is based on Lem. 16.1.
Matrix QTT-ranks in additive case B. Khoromskij, Zurich 2010(L17) 443
BNK, Oseledets [6]
Consider the two-dimensional (x ∈ R²) SPDE in stratified media (i.e., with a
coefficient depending on the 1D variable x1) in two cases:

1. Polynomial decay: am(x) = 0.5 (m+1)^{−2} sin(mx), x ∈ [−π, π], m = 1, ..., M.
2. Exponential decay: am(x) = e^{−0.7m} sin(mx), x ∈ [−π, π], m = 1, ..., M.

The parametric space is discretized on a uniform mesh in [−1, 1] with 2^p
points in each parametric direction. For the experiments, p = 8 is taken.
Ranks are presented for different truncation parameters. Table 12
presents results for the log-additive case with polynomial decay of the
coefficients, and Table 13 for exponential decay. The dependence on M
is linear for polynomial decay, and seems much milder in the case of
exponential decay, which is rather natural.
We use two different TT rank estimates for tensors: one characterising
the overall storage needs and complexity, rTT, and another serving for
the QTT-rank distribution, rQTT:
rTT(u) = sqrt( Σ_i ni ri r_{i+1} / Σ_i ni ),   rQTT(u) = sqrt( (1/M) Σ_i ri r_{i+1} ).
Matrix QTT-ranks in additive case B. Khoromskij, Zurich 2010(L17) 444
M     QTT-rank(10⁻⁷)   QTT-rank(10⁻³)
5     27               10
10    44               17
20    78               27
40    117              49

Table 12: The matrix QTT-rank vs. M; log-additive case,
polynomial decay, N = 128, p = 8.

M     QTT-rank(10⁻⁷)   QTT-rank(10⁻³)
5     33               11
10    43               21
20    51               23
40    50               25

Table 13: The matrix QTT-rank vs. M; log-additive case,
exponential decay, N = 128, p = 8.
Matrix QTT-ranks in additive case B. Khoromskij, Zurich 2010(L17) 445
Table 14 shows the dependence on the accuracy for fixed M. This
confirms that the ranks are logarithmic in the accuracy ε.

ε       QTT-rank(ε)
10⁻³    25
10⁻⁴    31
10⁻⁵    38
10⁻⁶    44
10⁻⁷    50

Table 14: The matrix QTT-rank vs. accuracy; log-additive
case, exponential decay, N = 128, M = 40, p = 8.
Tables 12-14 confirm numerically that the matrices in the log-additive case
have low maximal QTT-ranks, so this representation can be used in the
solution process.
Stochastic collocation (log-additive case) B. Khoromskij, Zurich 2010(L17) 446
In the log-additive case the dependence on y is no longer affine.

Let d = 1. Applying collocation to (139) yields n^M linear systems (p.w. linear FEM),

A(j1, ..., jM) u(j1, ..., jM) = f, 1 ≤ jm ≤ n ⇒ A u = f.

A(i, j, y) = ∫_D b(x, y) (∂φi/∂x)(∂φj/∂x) dx, y ∈ Γn, D = [0, 1].

A(i, i, y) = (1/4)(b(x_{i−1}, y) + 2b(xi, y) + b(x_{i+1}, y)),
A(i, i−1, y) = (1/2)(b(x_{i−1}, y) + b(xi, y)), A(i−1, i, y) = A(i, i−1, y),

for i = 1, ..., N, with

b(x, y) = e^{a(x,y)} = e^{a0(x)} Π_{m=1}^{M} e^{am(x) ym}, y ∈ Γn.

There are still good low-rank approximations of the form

A ≈ Σ_{k=1}^{R} ⊗_{m=0}^{M} Amk, Amk ∈ R^{(M+1)×n}.
Stochastic collocation (log-additive case) B. Khoromskij, Zurich 2010(L17) 447
Lem. 17.1. For the 1D SPDE with p.w. linear FEM, in the log-additive case:

rank_C(A(i, j, y)) ≤ 3 (i, j ≤ N, y ∈ Γn) ⇒ A ∈ C_loc[3] ⊂ QTT_loc[3],
rank_QTT(A(i, j, y)) ≤ 3, i, j ≤ N, y ∈ Γn ⇒ A ∈ QTT_loc[3],
rank_C(A) ≤ 7N.

Proof.

A(y) = D(y) + Z(y) + Z^T(y), y ∈ Γn,

where D(y) is the diagonal part of A and Z(y) is the first subdiagonal.
D(y) is represented as

D(y) = Σ_{i=1}^{N} A(i, i, y) ei ei^T = (1/4)(C1(y) + 2C2(y) + C3(y)),

where C2(y) takes the form

C2(y) = Σ_{i=1}^{N} ei ei^T e^{a0(xi)} Π_{m=1}^{M} e^{am(xi) ym}.   (140)

C2(y), y ∈ Γn, is an Nn^M × Nn^M diagonal matrix, and each summand in (140)
has tensor rank 1. For the QTT format in the variable ym, the TT-ranks equal
1, since the factor is an exponential function (Lect. 13).
QTT-truncated preconditioned iteration B. Khoromskij, Zurich 2010(L17) 448
BNK, Ch. Schwab [5]
u^{(k+1)} := u^{(k)} − ω Bk^{−1} (A u^{(k)} − f),   u^{(k+1)} := Tε(u^{(k+1)}) → u,

where Tε is the rank truncation operator in the given format S,
preserving accuracy ε.

In the additive case, a good choice of (rank-1) preconditioner is

B0^{−1} = A0^{−1} ⊗ I ⊗ ... ⊗ I.

In the log-additive case, an adaptive preconditioner at iteration step k:

Bk^{−1} = A(y*_k)^{−1} ⊗ I ⊗ ... ⊗ I, y*_k = argmin_QTT ‖f − A u^{(k)}‖.

Note: B0 corresponds to y* = 0.

Proven spectral equivalence, B0 ∼ A, in both cases.
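A minimal dense sketch of this scheme, with a truncated SVD in 2D standing in for the QTT truncation operator Tε; the operator, preconditioner and right-hand side below are our own illustrative choices, not the SPDE system of the lecture:

```python
import numpy as np

def truncate_svd(U, eps):
    # rank truncation T_eps: drop singular values below eps * s_max
    u, s, vt = np.linalg.svd(U, full_matrices=False)
    r = max(1, int(np.sum(s > eps * s[0]))) if s.size and s[0] > 0 else 1
    return (u[:, :r] * s[:r]) @ vt[:r]

def truncated_richardson(A, B_inv, F, omega=1.0, eps=1e-10, maxit=200):
    # u^{k+1} := T_eps( u^k - omega * B^{-1} (A u^k - f) )
    U = np.zeros_like(F)
    for _ in range(maxit):
        R = A(U) - F
        if np.linalg.norm(R) <= 1e-10 * np.linalg.norm(F):
            break
        U = truncate_svd(U - omega * B_inv(R), eps)
    return U

# illustrative well-conditioned operator A(U) = 4U + L U + U L with the 1D
# Dirichlet Laplacian stencil L, and a diagonal (Jacobi-type) preconditioner
n = 32
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A_op = lambda U: 4 * U + L @ U + U @ L
B_inv = lambda R: R / 8.0
F = np.outer(np.sin(np.linspace(0.0, 1.0, n)), np.ones(n))  # rank-1 rhs
U = truncated_richardson(A_op, B_inv, F)
```

The preconditioner shifts the spectrum of the iteration operator into (0.5, 1.5), so the truncated Richardson step contracts; in the QTT setting the same pattern is used with Bk^{−1} applied in tensor form.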
Numerics to sPDEs: additive case/canonical B. Khoromskij, Zurich 2010(L17) 449
BNK, Ch. Schwab [5]
S-truncated preconditioned iteration in the (d+M)-dimensional parametric
space. Canonical format, M ≤ 100.

Solving the sPDE on an N^{⊗(M+d)} grid, d = 1, M = 20 (S = C_R, B^{−1} := A(0)^{−1}).

Variable coefficients with exponential decay (N = 63, R ≤ 5):

am(x) = 0.5 e^{−m} sin(mx), m = 1, 2, ..., M, x ∈ (0, π).
(Plots: left - 2-norm error vs. rank; right - residual vs. truncated
iteration; Dim = 20, alpha = 1, rank = 5, grid = 63.)
Numerics to sPDEs: additive case/canonical B. Khoromskij, Zurich 2010(L17) 450
Zero-order SPDE: smooth and random coefficients in y,

a(y, x) = a(y) := 1 + Σ_{m=1}^{M} am ym, with γ = ‖a‖_{ℓ1} := Σ_{m=1}^{M} |am| < 1,   (141)

for the truncated sequence of (spatially homogeneous) coefficients
am = (1+m)^{−α} (m = 1, ..., M) with algebraic decay rates α = 2, 3, 5.
Equivalent to the so-called zero-order sPDE of the form

a(y) u(y) = f.   (142)

Highly oscillating random coefficient:

a(y) = 1 + Σ_{m=1}^{M} am ym H(ym − cm(ym)),

with the p.w.c. function cm(ym) given by a random n-vector on [−1, 1], and
H : R → {−1, 1}, H(x) = −1 for x < 0, H(x) = 1 for x ≥ 0.
Numerics to sPDEs: additive case/canonical B. Khoromskij, Zurich 2010(L17) 451
(Left panel: 2-norm error vs. rank, Dim = 20, alpha = 3, rank-max = 5, grid = 63.)
Figure 27: Approximation error vs. rank R (left) and the five canonical
vectors in variable y1 (right) for the solution of (142), M = 20.
The r-convergence is exponential, the same as in the smooth case.
The canonical vectors are highly oscillating.
Numerics to sPDEs: QTT/log-additive B. Khoromskij, Zurich 2010(L17) 452
Figure 28: Convergence in the stratified 2D example with two different truncation parameters, one-point
preconditioner. Left: residual vs. iteration; right: ranks vs. iteration. BNK, Oseledets [6]
Conclusions on tensor methods for SPDEs B. Khoromskij, Zurich 2010(L17) 453
Exer. 17.2. Compute the QTT rank of the system matrix A for the 1D SPDE
(N = 2^d, n = 2^p), exponential decay, in the additive case
(canonical-to-QTT compression).
Conclusions: C/QTT + Preconditioned tensor-truncated iteration:
unified approach to challenging problems of numerical SPDEs.
– Separation rank estimates and analytic approximation
– Rank-structured tensor representation of the system matrix
– C/QTT preconditioners
– Fast and stable rank optimization algorithms via QTT-MLA
– Possible tensorization in the physical variable
– Successful computations in HT format Kressner, Tobler [8]
8. D. Kressner, Ch. Tobler. Krylov subspace methods for linear systems with tensor product structure.
SIMAX, 31(4): 1688-1714, 2010.
Literature to Lecture 17 B. Khoromskij, Zurich 2010(L17) 454
1. B.N. Khoromskij and V. Khoromskaia, Multigrid Tensor Approximation of Function Related Arrays.
SIAM J. on Sci. Comp., 31(4), 3002-3026 (2009).
2. V. Khoromskaia, Computation of the Hartree-Fock Exchange in the Tensor-structured Format.
Computational Methods in Applied Mathematics, Vol. 10(2010), No 2, 204-218.
3. B.N. Khoromskij, V. Khoromskaia, and H.-J. Flad, Numerical Solution of the Hartree-Fock Equation in
Multilevel Tensor-structured Format. Preprint 44/2009, MPI MIS Leipzig 2009 (SISC, accepted).
4. B.N. Khoromskij, Fast and Accurate Tensor Approximation of a Multivariate Convolution with Linear
Scaling in Dimension. J. of Comp. Appl. Math., 234 (2010) 3122-3139.
5. B.N. Khoromskij, and Ch. Schwab. Tensor-Structured Galerkin Approximation of Parametric and
Stochastic Elliptic PDEs. Preprint MPI MiS 9/2010, Leipzig 2010, SISC 2010, accepted.
6. B.N. Khoromskij, and I. Oseledets, Quantics-TT collocation approximation of parameter-dependent
and stochastic elliptic PDEs. Preprint MPI MiS 37/2010, Leipzig 2010 (CMAM 2010, accepted).
7. B.N. Khoromskij. Tensors-structured Numerical Methods in Scientific Computing: Survey on Recent
Advances. Preprint 21/2010, MPI MiS Leipzig 2010 (submitted).
Lect. 18. DMRG + QTT approach to high-dimensional QMD B. Khoromskij, Zurich 2010(L18)
Outline of Lecture 18.
1. Time evolution by spectral decomposition in quantum
molecular dynamics (QMD).
2. Problem setting.
3. Representation of the Hamiltonian matrix.
4. Numerical QTT representation.
5. QTT representation of multidimensional potential energy
surface (PES).
6. Sketch of the density matrix renormalization group
(DMRG) iteration.
7. Local EVP.
8. Numerics in the case of the Henon-Heiles PES in the wide
range of dimensions f ≤ 256. Linear scaling in f.
9. General conclusions on tensor numerical methods.
Tensor methods for time dependent problems B. Khoromskij, Zurich 2010(L18) 456
Parabolic BVP in S ⊂ Vn:

∂U/∂t − iAU = 0, U0 = T_S(U(0)), A symmetric.

The regularised solution operator by the QTT matrix exponential [BNK '10]:

U(t) = e^{iAt} U0 ≈ T_S(e^{iAt} B) T_S(B^{−1} U0), t ≥ 0, B^{−1} ≈ A.

Spectral decomposition (common in QMD): A Un = λn Un, n = 1, 2, ...,

U(t) ≈ Σ_{n=1}^{N} e^{iλn t} ⟨U0, Un⟩ Un.
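In dense form the spectral-decomposition propagator is a few lines of NumPy (a sketch; in the lecture the eigenpairs Un are computed in the QTT format instead of by a dense eigensolver):

```python
import numpy as np

def evolve_spectral(A, U0, t):
    """U(t) = exp(iAt) U0 via the sum  sum_n e^{i*lam_n*t} <U0, U_n> U_n."""
    lam, V = np.linalg.eigh(A)                 # symmetric A: A V = V diag(lam)
    return V @ (np.exp(1j * lam * t) * (V.T @ U0))

# example: 1D Dirichlet Laplacian as the symmetric generator A
n = 16
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
U0 = np.sin(np.pi * np.arange(1, n + 1) / (n + 1))  # first discrete eigenmode
Ut = evolve_spectral(A, U0, 0.5)
```

Since the propagator is unitary, ‖U(t)‖ = ‖U0‖, which gives a cheap consistency check; an eigenmode initial state just acquires the phase e^{iλt}.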
Time-space separation by Cayley transform, [Gavriluyk, BNK ’10] in progress
Implicit integrators by time stepping (Yukawa solvers).
Above methods apply to both heat-like and QMD problems.
Present lecture: methods based on spectral decomposition via
DMRG/QTT algorithms with examples in QMD [BNK, Oseledets ’10].
Spectral problems in QMD B. Khoromskij, Zurich 2010(L18) 457
The basic problem in quantum molecular dynamics is the time-dependent
molecular Schrödinger equation,

i ∂ψ/∂t = Hψ = (−(1/2)∆ + V)ψ, ψ(x, 0) = ψ0(x), x ∈ R^f,   (143)

where V : R^f → R is a (known) approximation to the potential energy surface
(PES) and ψ0(x) is an initial wavepacket (Meyer [3,4], Lubich [2]).

Eq. (143) is solved in time, and a high-quality spectrum of H can be
recovered from the solution, describing the vibrational motion of molecules.
The computation of the ground state (or several lower states) of H can
be considered as an ingredient in quantum molecular dynamics
simulations, providing the alternative to direct time-evolution of the
system by discrete time-integrators.
Finding the ground state amounts to the solution of the eigenvalue problem

Hψ = (−(1/2)∆ + V)ψ = Eψ, ψ = ψ(q1, ..., qf),   (144)

which has to be solved in the QTT tensor format.
Problem setting B. Khoromskij, Zurich 2010(L18) 458
H is the molecular Schrödinger operator

H = −(1/2)∆ + V,

where ∆ is the f-dimensional Laplace operator and V = V(q1, ..., qf) is
called the PES. Here q1, ..., qf are the degrees of freedom (for example,
coordinates of atoms in a molecule).

The values of V should be obtained from the solution of the electronic
Schrödinger equation using any reliable method for electronic structure
calculations (say, the Hartree-Fock equation).

The case of a polynomial potential is important. For example, a second-order
polynomial in normal coordinates yields the harmonic oscillator

V = Σ_{k=1}^{f} wk² qk²,

where wk are the vibrational frequencies, and f is the number of degrees of
freedom.
The solution decays exponentially as |q| → ∞.
Problem setting B. Khoromskij, Zurich 2010(L18) 459
The computational domain can be chosen to be an f-dimensional cube,
and Dirichlet boundary conditions can be imposed. In this cube a uniform
tensor grid with n points in each direction is introduced; the unknowns are
the values of the function ψ on this grid, so there are n^f unknown values.
Multiplication by V reduces to multiplication by a diagonal matrix.
For the Laplace operator, a finite-difference approximation is used, yielding
a matrix ∆^(f), which can be written as

∆^(f) = ∆1 ⊗ I ⊗ ... ⊗ I + ... + I ⊗ ... ⊗ ∆f,   (145)

where ∆i, i = 1, ..., f, are discretizations of the one-dimensional Laplace
operator with Dirichlet boundary conditions, and ⊗ is the tensor
(Kronecker) product of matrices.
In the simplest uniform grid case, we have
∆i = γitridiag[−1, 2,−1].
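For tiny f and n the Kronecker sum (145) can be formed densely to inspect its structure (a NumPy sketch with our own function names; avoiding this n^f × n^f object is exactly the point of the TT/QTT format):

```python
import numpy as np

def tridiag_laplacian(n, gamma=1.0):
    """1D Dirichlet Laplacian on n interior points: gamma * tridiag[-1,2,-1]."""
    return gamma * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))

def laplace_kronecker_sum(delta_1d_list):
    """Build Delta^(f) = Delta_1 (x) I ... + ... + I (x) ... (x) Delta_f
    densely, following (145)."""
    f = len(delta_1d_list)
    sizes = [D.shape[0] for D in delta_1d_list]
    total = np.zeros((int(np.prod(sizes)),) * 2)
    for i, Di in enumerate(delta_1d_list):
        term = np.eye(1)
        for j in range(f):  # identity in every direction except direction i
            term = np.kron(term, Di if j == i else np.eye(sizes[j]))
        total += term
    return total

# two-dimensional example on a 3 x 3 grid
L3 = tridiag_laplacian(3)
Delta2 = laplace_kronecker_sum([L3, L3])   # 9 x 9 matrix
```

A handy check: the eigenvalues of a Kronecker sum are all pairwise sums of the 1D eigenvalues, which the test below verifies.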
Representation of the matrix B. Khoromskij, Zurich 2010(L18) 460
Theoretical rank bounds for the polynomial PES revisited.
Recall the results of Lect. 14-16 (Thm. 14.4, Lem. 14.3). The Laplace
operator allows a TT-decomposition with ranks equal to 2:

rank_TT(∆^(f)) ≤ 2.

Suppose the uniform grid is chosen and the ∆i in (145) are
∆i = γi tridiag[−1, 2, −1]. In this case the QTT-ranks are bounded by 4:

rank_QTT(∆i) ≤ 4.

The potential V is discretized by collocation at the grid points, leading to
a diagonal matrix. The low-rank approximation of this matrix reduces to
the low-rank approximation of the function V(q1, ..., qf) on a tensor grid.
If the variables in V(q1, ..., qf) are separated,

V(q1, ..., qf) ≈ Σ_{k=1}^{r} Π_{i=1}^{f} vi(qi, k),

then the canonical rank of the respective tensor V does not exceed r,
hence the TT-ranks of V do not exceed r. But they can be much smaller.
Representation of the matrix B. Khoromskij, Zurich 2010(L18) 461
For particular potentials the ranks can be uniformly bounded in the dimension f.

Lem. 18.1. For a general homogeneous polynomial potential we have

V(q1, ..., qf) = Σ_{i1,...,is=1}^{f} a(i1, ..., is) Π_{k=1}^{s} q_{ik},   rank_TT(V) = C0 f^{[s/2]} + o(f^{[s/2]}).

For the harmonic potential, the QTT-ranks are bounded by 6:

V(q1, ..., qf) = Σ_{k=1}^{f} wk qk²,   rank_QTT(V) ≤ 6.

For the Henon-Heiles potential, the QTT-ranks are bounded by 7:

V(q1, ..., qf) = (1/2) Σ_{k=1}^{f} qk² + λ Σ_{k=1}^{f−1} ( qk² q_{k+1} − (1/3) qk³ ),   rank_QTT(V) ≤ 7.

Notice: the Tucker ranks in all these cases are equal to f ⇒ O(f^f) scaling.
The QTT-format gives storage and complexity polynomial in f: O(f r³ log N).
Can-to-QTT preprocessing: each rank-1 term is compressed to QTT,
and then the terms are added with compression; complexity O(R d f r³), N = 2^d.
This preprocessing step is performed only once for any particular potential.
Numerical QTT representation B. Khoromskij, Zurich 2010(L18) 462
In the case when V is given analytically, it can often be represented as a
separable expansion of the form

V(q1, ..., qf) ≈ Σ_{α=1}^{R} h1(q1, α) ... hf(qf, α),   (146)

but with a large number of terms R. This is true for the polynomial PES.

The conversion of V defined by (146) into the QTT format is performed in
the following steps. Each summand is converted into the QTT format by
converting the one-dimensional functions hk(qk, α), k = 1, ..., f, using
either the full_to_tt subroutine or, for functions like polynomials or
sine/cosine, known analytical QTT representations. Then their Kronecker
product is formed by the mkron function of the TT-Toolbox, which for given
tensors A1, ..., Af in the TT format computes their Kronecker product (a
df-dimensional tensor)

Vα = A1 × A2 × ... × Af.

The Kronecker product reduces to the concatenation of cores (no arithmetic operations).
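The statement that the TT Kronecker product is a mere concatenation of core lists can be checked directly. The NumPy sketch below uses our own minimal TT helpers, standing in for the TT-Toolbox mkron:

```python
import numpy as np

def tt_full(cores):
    """Contract a list of TT cores (each of shape (r_prev, n, r_next),
    boundary ranks 1) into the full tensor."""
    res = np.ones((1, 1))                        # (modes so far, current rank)
    for G in cores:
        res = np.tensordot(res, G, axes=(1, 0))  # -> (N, n, r_next)
        res = res.reshape(-1, G.shape[2])
    return res.reshape([G.shape[1] for G in cores])

def tt_kron(cores_a, cores_b):
    """Kronecker (outer) product of two TT tensors: just concatenate the
    core lists; no arithmetic is performed."""
    return cores_a + cores_b
```

Because the boundary ranks are 1, the contraction across the junction of the two core lists factorizes, so the full tensor of the concatenation equals the outer product of the two full tensors.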
Numerical QTT representation B. Khoromskij, Zurich 2010(L18) 463
The required tensor V is now given as a sum of R tensors in the
QTT format:

V = Σ_{α=1}^{R} Vα.

Performing the additions in the QTT format directly yields a tensor with
rank bounded by

R ( max_α rank_QTT(Vα) ).

Since R can be very large, the Vα are added one by one, approximately, by
the following scheme:

V^(1) := V1,   V^(k+1) = V^(k) + V_{k+1},   V^(k+1) := Tε(V^(k+1)),

where Tε is the rounding operator with relative accuracy ε in the QTT
format. If we assume that the ranks of the intermediate tensors V^(k) are of
order r, the complexity of the algorithm becomes O(R d f r³). Without
intermediate compression, it would be cubic in R.
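The add-and-compress loop can be imitated in 2D, with truncated-SVD rounding standing in for the QTT rounding operator Tε (a NumPy sketch; function names and test data are our own):

```python
import numpy as np

def round_eps(V, eps):
    """Rounding operator T_eps: truncated SVD with relative accuracy eps."""
    u, s, vt = np.linalg.svd(V, full_matrices=False)
    tol = eps * np.linalg.norm(s)
    r = len(s)
    while r > 1 and np.linalg.norm(s[r - 1:]) <= tol:
        r -= 1                                  # drop the smallest tail
    return (u[:, :r] * s[:r]) @ vt[:r]

def add_and_compress(terms, eps):
    """V(1) := V_1, V(k+1) := T_eps(V(k) + V_{k+1}): ranks stay bounded."""
    V = terms[0]
    for Vk in terms[1:]:
        V = round_eps(V + Vk, eps)
    return V

# many rank-1 terms whose column spaces span only 3 dimensions
x = np.linspace(0.0, 1.0, 50)
terms = [np.outer(x ** (k % 3), np.sin((k + 1) * x)) for k in range(30)]
V = add_and_compress(terms, 1e-12)
```

Each intermediate sum is compressed back to its (small) numerical rank, so the cost stays linear in the number of terms, mirroring the O(R d f r³) bound above.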
Minimization on QTT-manifold for high-dim. EVPs B. Khoromskij, Zurich 2010(L18) 464
A nonlinear optimization problem. The actual dimension equals df (below renamed d).

The Hamiltonian matrix H (for the QTT format, nk = 2):

H(i1, ..., id, j1, ..., jd) = H1(i1, j1) H2(i2, j2) ... Hd(id, jd),   (147)

where Hk(ik, jk) = [Hk(α_{k−1}, ik, jk, αk)] are the cores of the QTT
representation of H, with 1 ≤ αk ≤ Rk ≤ R, k = 1, ..., d, and 1 ≤ ik, jk ≤ nk.

Find an eigenvector ψ that solves

Hψ = Eψ, ψ ∈ S := QTT[r].   (148)

ψ = reshape(ψ, [n1, ..., nd]),   ψ(i1, ..., id) ≈ G1(i1) G2(i2) ... Gd(id).

Equations for the parameters defining the representation: for the EVP (148)
the standard way is to minimize the Rayleigh quotient,

ψ = argmin { (Hψ, ψ) : (ψ, ψ) = 1, rank_QTT(ψ) ≤ r }.

The rank parameter r controls the storage requirements and has to be chosen
as a compromise between accuracy and complexity.
Sketch of DMRG iteration B. Khoromskij, Zurich 2010(L18) 465
If all cores except Gk and G_{k+1} are fixed, we are left with a "small"
optimization problem in Gk and G_{k+1} (still nonlinear).

Linearisation: introduce a new superblock

W(ik, i_{k+1}) = Gk(ik) G_{k+1}(i_{k+1}), i.e., an r_{k−1} × nk × n_{k+1} × r_{k+1} tensor.

This is equivalent to agglomerating the modes k and k+1 in the TT
representation, yielding an n1 × ... × n_{k−1} × (nk n_{k+1}) × n_{k+2} × ... × nd
tensor (the merged middle core is optimized).

Imposing the norm constraint ‖ψ‖ = 1: the quadratic problem in W reduces
to an EVP for W under the constraint ‖ψ‖ = 1.

Before the minimization, we ensure that in the TT representation of ψ the
fixed cores Gs(is), s = 1, ..., k−1, are left-orthogonal, i.e.,

Σ_{is} Gs^T(is) Gs(is) = I_{rs},

and the cores Gs, s = k+2, ..., d, are right-orthogonal, i.e.,

Σ_{is} Gs(is) Gs^T(is) = I_{r_{s−1}}.

Then ‖ψ‖ equals the norm of the cores that we optimize; in matrix form,

Σ_{ik, i_{k+1}} ‖W(ik, i_{k+1})‖_F² = ‖ψ‖² = 1.
Decimation step and DMRG sweep B. Khoromskij, Zurich 2010(L18) 466
Decimation step: after W is obtained from the "local" eigenvalue
problem, it is approximated with some prescribed accuracy ε to recover
Gk and G_{k+1},

W(ik, i_{k+1}) ≈ Gk(ik) G_{k+1}(i_{k+1}).   (149)

The approximation (149) requires one SVDε: in index form, (149) reads

W(α_{k−1}, ik, i_{k+1}, α_{k+1}) ≈ Σ_{αk=1}^{rk} Gk(α_{k−1}, ik, αk) G_{k+1}(αk, i_{k+1}, α_{k+1}),

and it can be done by reshaping W into an r_{k−1} nk × n_{k+1} r_{k+1} matrix and
computing its SVDε. The rank rk is determined adaptively by the accuracy
parameter ε (a big advantage over standard ALS).
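The decimation step is a few lines in NumPy (a sketch; the function name `decimate` is ours):

```python
import numpy as np

def decimate(W, eps):
    """Split the superblock W of shape (r_km1, n_k, n_kp1, r_kp1) back into
    two TT cores by one truncated SVD, cf. (149); the new rank r_k is chosen
    adaptively from the accuracy eps."""
    r_km1, n_k, n_kp1, r_kp1 = W.shape
    u, s, vt = np.linalg.svd(W.reshape(r_km1 * n_k, n_kp1 * r_kp1),
                             full_matrices=False)
    tol = eps * np.linalg.norm(s)
    r_k = len(s)
    while r_k > 1 and np.linalg.norm(s[r_k - 1:]) <= tol:
        r_k -= 1                                  # adaptive rank choice
    Gk = u[:, :r_k].reshape(r_km1, n_k, r_k)      # left-orthogonal core
    Gk1 = (s[:r_k, None] * vt[:r_k]).reshape(r_k, n_kp1, r_kp1)
    return Gk, Gk1

# example: a superblock built from two random cores with bond rank 3
rng = np.random.default_rng(2)
W = np.einsum('aib,bjc->aijc', rng.standard_normal((2, 2, 3)),
              rng.standard_normal((3, 2, 2)))
Gk, Gk1 = decimate(W, 1e-12)
```

Keeping the singular values in the right factor leaves the left core orthogonal, which is exactly what the left-to-right sweep needs before moving on.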
DMRG sweep: Consists of the following steps:
(1) Cores G3, . . . , Gd are made right-orthogonal,
(2) Optimization for G1, and G2, then for G2 and G3 and so on, keeping
cores G1, . . . , Gk−1 left-orthogonal to make W of the unit Frobenius norm.
(3) After the last core is reached, the sweep is repeated from right to left,
and the process continues until convergence (stabilization of the Rayleigh
quotient, or stabilization of ψ).
Local EVP B. Khoromskij, Zurich 2010(L18) 467
Representation of the Rayleigh quotient: ψ is represented as

ψ(i1, ..., id) = G1(i1) ... G_{k−1}(i_{k−1}) W(ik, i_{k+1}) G_{k+2}(i_{k+2}) ... Gd(id).

(Hψ, ψ) includes the matrix-by-vector product y = Hψ, which is in the
TT format with the cores

Yk(ik) = Σ_{jk} Hk(ik, jk) ⊗ Gk(jk).

Representation of (Hψ, ψ):

(Hψ, ψ) = Γ1 Γ2 ... Γ_{k−1} · Γ · Γ_{k+2} ... Γd,

Γs = Σ_{is, js} Hs(is, js) ⊗ Gs(is) ⊗ Gs(js), s = 1, ..., d, s ≠ k, s ≠ k+1,

Γ = Σ_{ik, i_{k+1}, jk, j_{k+1}} Hk(ik, jk) H_{k+1}(i_{k+1}, j_{k+1}) ⊗ W(ik, i_{k+1}) ⊗ W(jk, j_{k+1}).

Γs is a matrix of size Rs rs² × R_{s+1} r_{s+1}²;
Γ is of size R_{k−1} r_{k−1}² × R_{k+1} r_{k+1}² (corresponding to the merged core).
Local EVP B. Khoromskij, Zurich 2010(L18) 468
The products

pk = Γ1 ... Γ_{k−1}   and   qk = Γ_{k+2} ... Γd

are a row vector of length R_{k−1} r_{k−1}² and a column vector of length
R_{k+1} r_{k+1}², respectively.

With pk and qk precomputed, the Rayleigh quotient becomes

(Hψ, ψ) = pk ( Σ_{ik, i_{k+1}, jk, j_{k+1}} Hk(ik, jk) H_{k+1}(i_{k+1}, j_{k+1}) ⊗ W(ik, i_{k+1}) ⊗ W(jk, j_{k+1}) ) qk.

Assuming rs ∼ r, Rs ∼ R, the total cost is O(n² R r³) with n = 2.

Local eigenvalue problem for W ("local" notation): the DMRG
optimization step is equivalent to the ALS step applied to the tensor with
the modes k, k+1 merged ⇒ i = (ik, i_{k+1}) is one long index, and j = (jk, j_{k+1}).

The function to be minimized is

f(W) = p ( Σ_{i,j} M(i, j) ⊗ W(i) ⊗ W(j) ) q,

where M(i, j), i, j = 1, ..., m, are R1 × R2 matrices, and W(i), i = 1, ..., m, are
r1 × r2 matrices.
Local EVP B. Khoromskij, Zurich 2010(L18) 469
Here p is a row vector of length R1 r1², and q is a column vector of length R2 r2².

The local optimization problem:

p ( Σ_{i,j} M(i, j) ⊗ W(i) ⊗ W(j) ) q → min,   Σ_i ‖W(i)‖_F² = 1.   (150)

The quadratic optimization problem (150) in W is reduced to the EVP for W,

M̂ w = E w, ‖w‖ = 1,

where w is W transformed into a "long vector" (MATLAB: w = W(:)).

Complexity: W contains 4r² elements (n = 2), i.e., the matrix M̂ is 4r² × 4r².
In the full matrix format, ranks r > 50 already imply 4r² > 10000, and in
practice one can encounter ranks of order several hundred.

Remedy: the storage of M̂ requires only 16r² memory cells to store M(i, j),
and the matrix-by-vector product M̂w costs O(R r³ + R² r²).

Iterative computation of the lowest eigenvalue in the local EVP can be an option.
Hamiltonian with Henon-Heiles pot. (dim. f ≤ 256) B. Khoromskij, Zurich 2010(L18) 470
The matrix H was approximated in the QTT format using the add-and-compress algorithm. The grid size was set to n = 128 = 2^7, and the number of dimensions f varies from 4 to 256.
Figure 29 shows the time to approximate the matrix; Figure 30 shows the dependence of the maximal and effective ranks on the number of DOFs.
The QTT-ranks of the solution are shown in Figure 31. The minima in this oscillating picture correspond to the TT-ranks, i.e., to the ranks in the absence of QTT tensorization; for the new (virtual) dimensions the ranks are larger but remain in the same range, while the local matrix size is reduced dramatically.
Summary of the QTT-DMRG iteration:
⋄ Vector, matrix and MLA complexities scale linearly in the dimension.
⋄ Global convergence of the DMRG sweep is robust.
⋄ The local preconditioned EVP solver is fast.
Further prospects: toward tensor networks, FCI electronic structure calculations, and general applications.
Exer. 18.1. Calculate the local DMRG-QTT matrix M̂ on the exact solution (the rank-2 QTT sine tensor) of the problem −∆ψ = λψ, where ∆ = ∆_DD^{(d)} is the 1D Dirichlet Laplacian on a uniform grid of size n = 2^d. Use the explicit rank-3 QTT representation of ∆_DD^{(d)} (Lect. 16) and the rank-2 representation of ψ (Lect. 15).
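For the Laplacian part of the exercise, one possible rank-3 QTT construction can be sketched as follows (Python/NumPy for illustration, although the exercises use MATLAB; the core convention chosen here — coarse index as the left Kronecker factor, P the upper shift — may differ from the one used in Lect. 16):

```python
import numpy as np

# A = tridiag(-1, 2, -1) = 2I - S - S^T on n = 2^d points, built from
#   I_{2n} = I_n x I2,
#   S_{2n} = I_n x P + S_n x P^T,   P = [[0, 1], [0, 0]].
I2 = np.eye(2)
P  = np.array([[0., 1.], [0., 0.]])
Z  = np.zeros((2, 2))

# transfer core G: (I_n, S_n, S_n^T) -> (I_2n, S_2n, S_2n^T)
G = np.array([[I2, P,   P.T],
              [Z,  P.T, Z  ],
              [Z,  Z,   P  ]])            # shape (3, 3, 2, 2), QTT rank 3

def qtt_laplacian(d):
    B = np.array([I2, P, P.T])            # first core: (I_2, S_2, S_2^T)
    for _ in range(d - 1):
        B = np.array([sum(np.kron(B[a], G[a, b]) for a in range(3))
                      for b in range(3)])
    return 2 * B[0] - B[1] - B[2]         # last "core": column (2, -1, -1)

d = 5
A = qtt_laplacian(d)
n = 2 ** d
A_ref = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
print(np.allclose(A, A_ref))  # True
```

The rank-2 QTT sine tensor for ψ follows analogously from the angle-addition formula sin(a + b) = sin a cos b + cos a sin b.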
Figure 29: Solution and approximation timings for the Henon-Heiles potential with d = 7 and varying f, ε = 10^{-6}.
Figure 30: Maximal and effective rank behaviour, d = 7, f varying, ε = 10^{-6}.
Figure 31: QTT-ranks of the solution with f = 32, d = 7 for different modes.
Status of tensor numerical methods in modern applications B. Khoromskij, Zurich 2010(L18) 474
Elliptic (parameter-dependent) eq.: Find u ∈ H_0^1(Ω) s.t.

    Hu := −div(a grad u) + V u = F  in Ω ⊂ R^d.

EVP: Find a pair (λ, u) ∈ R × H_0^1(Ω), 〈u, u〉 = 1, s.t.

    Hu = λu  in Ω ⊂ R^d,   u = 0 on ∂Ω.

Parabolic (hyperbolic) eq.: Find u : R^d × (0, ∞) → R, u(·, 0) ∈ H^2(R^d), s.t.

    σ ∂u/∂t + Hu = 0,   H = ∆_d + V(x_1, ..., x_d).
Tensor methods are gainfully adapted to the main challenges:
⋄ High spatial dimension: Ω = (−b, b)^d ⊂ R^d (d = 2, 3, ..., 100, ...).
⋄ Multiparametric equations: a(y, x), u(y, x), y ∈ R^M (M = 1, 2, ..., 100, ..., ∞).
⋄ Nonlinear, nonlocal (integral) operators V = V(x, u), singular potentials.
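The Kronecker (tensor) sum structure of ∆_d behind these formulations can be illustrated with a dense toy example (Python/NumPy; real tensor methods never form H explicitly — avoiding that is precisely the point):

```python
import numpy as np
from functools import reduce

def kron_all(mats):
    return reduce(np.kron, mats)

def laplace_kron_sum(A1, d):
    """Delta_d = sum_k I x ... x A1 x ... x I  (A1 in position k)."""
    n = A1.shape[0]
    I = np.eye(n)
    return sum(kron_all([A1 if j == k else I for j in range(d)])
               for k in range(d))

n, d = 4, 3
h = 1.0 / (n + 1)
A1 = (2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2   # 1D Dirichlet Laplacian
H = laplace_kron_sum(A1, d)                                    # 64 x 64 for n = 4, d = 3

# eigenvalues of a Kronecker sum are sums of the 1D eigenvalues
lam1 = np.linalg.eigvalsh(A1)
print(np.isclose(np.linalg.eigvalsh(H)[0], d * lam1[0]))  # True
```

The eigenvalue identity used in the check is exactly what makes rank-structured formats attractive for such operators: the d-dimensional spectrum is fully determined by 1D data.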
Literature to Lecture 18 B. Khoromskij, Zurich 2010(L18) 475
1. B.N. Khoromskij, and I.V. Oseledets. DMRG + QTT approach to high-dimensional quantum molecular
dynamics. Preprint MPI MiS 68/2010, Leipzig 2010, submitted.
2. I.V. Oseledets, Compact matrix form of the d-dimensional tensor decomposition. Preprint 09-01, INM
RAS, Moscow 2009.
3. Ch. Lubich, From quantum to classical molecular dynamics: reduced models and numerical analysis.
Zurich Lectures in Advanced Mathematics, EMS, 2008.
4. M.H. Beck, A. Jäckle, G.A. Worth, and H.-D. Meyer, The multiconfiguration time-dependent Hartree (MCTDH) method: A highly efficient algorithm for propagating wavepackets. Phys. Rep. 324 (2000), 1-105.
5. H.-D. Meyer, F. Gatti, and G.A. Worth, Multidimensional quantum dynamics: MCTDH Theory and
Applications, Wiley-VCH, Weinheim, 2009.
6. G. Vidal, Efficient classical simulation of slightly entangled quantum computations. Phys. Rev. Lett.,
91 (2003).
7. S.R. White, Density matrix algorithms for quantum renormalization groups. Phys. Rev. B., 48 (1993),
10345-10356.
Acknowledgments and best wishes for 2011! B. Khoromskij, Zurich 2010(L18) 476
Acknowledgments.
I gratefully acknowledge Christoph Schwab and Stefan Sauter for their kind invitation to present these lectures, for collaboration, and for creating a friendly and encouraging atmosphere during my stay in Zurich.

I am deeply grateful to my colleagues and coauthors Wolfgang Hackbusch, Eugene Tyrtyshnikov, Venera Khoromskaia, Ivan Oseledets, Ivan Gavriluyk, Heinz-Jurgen Flad, and to the MFTI students Vladimir Kazeev and Sergey Dolgov. Our effective collaboration in recent years has led to a rigorous understanding of tensor numerical methods. Many of our joint papers provided the basis for this lecture course.
Special thanks go to Christine Tobler (ETH Zurich) for her essential and qualified assistance with the MATLAB exercises for the course.

I am thankful to my students for their interest and patience.
Merry Christmas and a Happy New Year!!! :-) :-) :-)
http://personal-homepages.mis.mpg.de/bokh