Low ranks in computational Fourier analysis - TU Chemnitzpotts/cms/cms15/talk2kunis.pdf · Low ranks in computational Fourier analysis Part II: Fast matrix vector multiplication Stefan

Low ranks in computational Fourier analysis

Part II: Fast matrix vector multiplication

Stefan Kunis

Outline

Part I: Introduction

Part II: Fast matrix vector multiplication

Part III: Efficient Reconstruction

Outline of part II

Hierarchical matrices in a nutshell

Fast Laplace transform - hierarchical approximation

Sparse fast Fourier transform - butterfly approximation

Application I - photoacoustic imaging

Application II - evaluating polynomials in a disk


hierarchical matrices (Greengard, Rokhlin; Hackbusch; Börm, Grasedyck; Bebendorf; Fenn, Steidl)

model problem d = 1

source nodes $y_\ell \in [0, 1]$, coefficients $f_\ell \in \mathbb{R}$, $\ell = 1, \dots, N$, and target nodes $x_j \in [0, 1]$, $x_j \ne y_\ell$, $j, \ell = 1, \dots, N$, compute

$$u_j = \sum_{\ell=1}^{N} \frac{f_\ell}{|x_j - y_\ell|}$$

naive: $O(N^2)$ floating point operations
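The naive evaluation can be sketched in a few lines of NumPy. This is only a reference implementation of the direct sum under the assumption of randomly drawn nodes and coefficients (the names `naive_sum`, `x`, `y`, `f` are illustrative and not from the talk); it forms all N × N kernel values explicitly.

```python
import numpy as np

def naive_sum(x, y, f):
    """Direct evaluation of u_j = sum_l f_l / |x_j - y_l| in O(N^2) operations."""
    # Pairwise distances |x_j - y_l| as an array of shape (len(x), len(y)).
    dist = np.abs(x[:, None] - y[None, :])
    return (f[None, :] / dist).sum(axis=1)

# Hypothetical test data: random nodes in [0, 1] and random coefficients.
rng = np.random.default_rng(0)
N = 200
x = rng.uniform(0.0, 1.0, N)
y = rng.uniform(0.0, 1.0, N)
f = rng.standard_normal(N)
u = naive_sum(x, y, f)
print(u.shape)  # (200,)
```

Such a direct sum is mainly useful as a ground truth against which a fast approximate scheme can be compared.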


the kernel $\kappa : [0, 1] \times [0, 1] \to \mathbb{R}$

$$\kappa(x, y) = \frac{1}{|x - y|}$$

is asymptotically smooth


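$X = [x_{\min}, x_{\max}]$, $Y = [y_{\min}, y_{\max}]$

degenerate approximation $\tilde\kappa : X \times Y \to \mathbb{R}$

$$\tilde\kappa(x, y) = \sum_{s=0}^{p-1} (x - x_0)^s \cdot (y - x_0)^{-(s+1)}$$

admissibility condition

$$|x_{\max} - x_{\min}| = \operatorname{diam}(X) \le \operatorname{dist}(X, Y) = |y_{\min} - x_{\max}|$$

if $p \ge C\,|\log \varepsilon|$, then

$$\|\kappa - \tilde\kappa\|_{C(X \times Y)} \le \varepsilon$$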
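The exponential decay of the error under the admissibility condition can be observed numerically. The following sketch assumes that the expansion point x0 is the midpoint of X (the slide does not say how x0 is chosen) and uses the hypothetical block X = [0, 1/4], Y = [1/2, 3/4], for which diam(X) = dist(X, Y).

```python
import numpy as np

def degenerate_kernel(x, y, x0, p):
    """Truncated expansion sum_{s<p} (x - x0)^s (y - x0)^(-(s+1)) as a matrix."""
    s = np.arange(p)
    phi = (x[:, None] - x0) ** s               # phi_s(x) = (x - x0)^s
    psi = (y[:, None] - x0) ** (-(s + 1.0))    # psi_s(y) = (y - x0)^(-(s+1))
    return phi @ psi.T                         # rank at most p

# Assumed admissible block: diam(X) = 1/4 = dist(X, Y).
x = np.linspace(0.0, 0.25, 50)
y = np.linspace(0.5, 0.75, 50)
x0 = 0.125                                     # assumption: midpoint of X
exact = 1.0 / np.abs(x[:, None] - y[None, :])

errors = {p: np.max(np.abs(exact - degenerate_kernel(x, y, x0, p)))
          for p in (2, 4, 8, 16)}
for p, err in errors.items():
    print(p, err)
```

For this block the ratio |x - x0| / |y - x0| is at most 1/3, so the truncation error shrinks by roughly a factor of 3 with every additional term, in line with p growing like |log ε|.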


dyadic decomposition of X = [0, 1]

Level l = 0: X00, breakpoints 0, 1

Level l = 1: X10, X11, breakpoints 0, 1/2, 1

Level l = 2: X20, X21, X22, X23, breakpoints 0, 1/4, 1/2, 3/4, 1

Level l = 3: X30, X31, X32, X33, X34, X35, X36, X37, breakpoints 0, 1/8, 1/4, 3/8, 1/2, 5/8, 3/4, 7/8, 1


dyadic decompositions - two binary trees

X00

X10 X11

X20 X21 X22 X23

X30 X31 X32 X33 X34 X35 X36 X37

Y00

Y10 Y11

Y20 Y21 Y22 Y23

Y30 Y31 Y32 Y33 Y34 Y35 Y36 Y37


admissible pairs - a quadtree

X00 × Y00

X10 × Y10 X10 × Y11 X11 × Y10 X11 × Y11

X20 × Y22 X20 × Y23 X21 × Y22 X21 × Y23

X32 × Y34 X32 × Y35 X33 × Y34 X33 × Y35
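The quadtree of interval pairs can be generated recursively. The sketch below is one possible reading of the construction, not necessarily the exact scheme of the talk: a pair is kept as soon as it satisfies diam(X) ≤ dist(X, Y), otherwise both intervals are halved, and pairs that are still inadmissible on the finest level are marked as near field. The function names and the tuple layout are assumptions made for illustration.

```python
def is_admissible(X, Y):
    """Check diam(X) <= dist(X, Y) for intervals X = (a, b), Y = (c, d)."""
    diam = X[1] - X[0]
    dist = max(Y[0] - X[1], X[0] - Y[1], 0.0)
    return diam <= dist

def block_partition(X, Y, max_level, level=0):
    """Return a list of leaves (level, X, Y, kind) of the quadtree."""
    if is_admissible(X, Y):
        return [(level, X, Y, "admissible")]
    if level == max_level:
        return [(level, X, Y, "near")]
    xm = 0.5 * (X[0] + X[1])
    ym = 0.5 * (Y[0] + Y[1])
    leaves = []
    # Refine the inadmissible pair into its four children.
    for Xc in ((X[0], xm), (xm, X[1])):
        for Yc in ((Y[0], ym), (ym, Y[1])):
            leaves += block_partition(Xc, Yc, max_level, level + 1)
    return leaves

blocks = block_partition((0.0, 1.0), (0.0, 1.0), max_level=3)
for level in range(4):
    n_adm = sum(1 for b in blocks if b[0] == level and b[3] == "admissible")
    print("level", level, "admissible blocks:", n_adm)
```

With three levels, the first admissible pairs appear on level 2 (for example X20 × Y22), while pairs such as X21 × Y22 touch each other and have to be refined further.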


[Figure: matrix partitioning of X∗ × Y∗, H-matrix]

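$X_* = \{x_j \in [0, 1] : j = 1, \dots, N\}$, $Y_* = \{y_\ell \in [0, 1] : \ell = 1, \dots, N\}$ well distributed

local computations, admissible block in level $l$

$$\kappa(x_j, y_\ell) \approx \sum_{s=0}^{p-1} \varphi_s(x_j)\,\psi_s(y_\ell), \qquad K \approx \Phi \Psi^\top, \qquad \frac{2pN}{2^l}$$

total computations

$$O(N \log N\, |\log \varepsilon|)$$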



discrete Laplace transform (Rokhlin; Strain; Andersson)

model problem, d = 1

$T \subset [0, T_{\max}]$, $|T| = N$

$X \subset [0, X_{\max}]$, $|X| = N$

$\hat f = (\hat f_k)_{k \in T} \in \mathbb{C}^N$

evaluate sum of exponentials for $x \in X$

$$f(x) = \sum_{k \in T} \hat f_k\, e^{-kx}$$




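$\operatorname{diam}(T) \le \operatorname{dist}(T, 0)$ and $\operatorname{diam}(X) \le \operatorname{dist}(X, 0)$

kernel function $\kappa : [0, 1]^2 \to \mathbb{R}$, $\kappa(k, x) = e^{-kx}$

using singular value decomposition

[Figures: locally rank 1 approximation of κ; locally rank 1 error]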
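The local rank 1 behaviour can be reproduced with a singular value decomposition of a sampled block. The sketch assumes the admissible block T = X = [1/2, 1] inside [0, 1]^2, sampled at 64 equispaced points per direction; both choices are illustrative and not taken from the talk.

```python
import numpy as np

def svd_of_block(T, X):
    """Singular values of (e^{-kx}) and the max error of its best rank 1 part."""
    B = np.exp(-np.outer(X, T))
    U, s, Vt = np.linalg.svd(B)
    B1 = s[0] * np.outer(U[:, 0], Vt[0])       # rank 1 truncation of the SVD
    return s, np.max(np.abs(B - B1))

# Assumed admissible block: diam = dist to 0 = 1/2 in both variables.
T = np.linspace(0.5, 1.0, 64)
X = np.linspace(0.5, 1.0, 64)
s, err1 = svd_of_block(T, X)
print("rank 1 max error:", err1)
print("normalised singular values:", s[:6] / s[0])
```

The printed singular values drop by several orders of magnitude per index, which is the reason a small rank suffices on admissible blocks.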














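kernel function $\kappa : [0, T_{\max}] \times [0, X_{\max}] \to \mathbb{R}$

$$\kappa(k, x) = e^{-kx}$$

admissibility condition allowing low rank approximation

$$\operatorname{diam}(X) \le \operatorname{dist}(X, 0), \qquad \operatorname{diam}(T) \le \operatorname{dist}(T, 0)$$

subdivide both intervals geometrically

[Figure: the interval from 0 to Xmax subdivided into X1, X2, X3, X4, X5, with X1 adjacent to Xmax and the smaller intervals towards 0]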


let X and T be admissible and

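$$B = \left(e^{-kx}\right)_{x \in X,\, k \in T}$$

polynomial interpolation in $k$ and $x$ at $t_s = \cos\frac{2s-1}{2p}\pi$ (Trefethen)

$L_X \in \mathbb{R}^{|X| \times p}$, $L_T \in \mathbb{R}^{p \times |T|}$, Lagrange matrices, and

$$B_p = \left(e^{-(t_r + 1)(t_s + 1)\,\operatorname{diam} T\,\operatorname{diam} X / 4}\right)_{r,s=1}^{p}$$

Lemma (local error)

If $X, T \subset [0, \infty)$ are admissible, then

$$\|B - L_X B_p L_T\|_{1 \to \infty} \le 2 \cdot 4^{-p}$$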
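A factorisation of this type can be tried out with plain Chebyshev interpolation. The sketch below is a generic variant and not the normalised form of the slide: it maps the nodes t_s directly onto the smallest intervals containing X and T and samples the kernel there, so its middle matrix differs from the B_p above by scalings. The sample sets and all function names are assumptions.

```python
import numpy as np

def cheb_nodes(p, a, b):
    """Chebyshev nodes t_s = cos((2s-1)pi/(2p)), s = 1..p, mapped to [a, b]."""
    s = np.arange(1, p + 1)
    t = np.cos((2 * s - 1) * np.pi / (2 * p))
    return 0.5 * (a + b) + 0.5 * (b - a) * t

def lagrange_matrix(points, nodes):
    """L[i, s] = value of the s-th Lagrange basis polynomial at points[i]."""
    p = len(nodes)
    L = np.ones((len(points), p))
    for s in range(p):
        for r in range(p):
            if r != s:
                L[:, s] *= (points - nodes[r]) / (nodes[s] - nodes[r])
    return L

def interpolated_block(X, T, p):
    """Factors LX (|X| x p), Bp (p x p), LT (p x |T|) with B ~ LX @ Bp @ LT."""
    xn = cheb_nodes(p, X.min(), X.max())
    tn = cheb_nodes(p, T.min(), T.max())
    LX = lagrange_matrix(X, xn)
    LT = lagrange_matrix(T, tn).T
    Bp = np.exp(-np.outer(xn, tn))             # kernel sampled at the nodes
    return LX, Bp, LT

# Assumed admissible sets: diam(X) = 1 = dist(X, 0), diam(T) = 3 = dist(T, 0).
X = np.linspace(1.0, 2.0, 40)
T = np.linspace(3.0, 6.0, 50)
B = np.exp(-np.outer(X, T))
for p in (2, 4, 8):
    LX, Bp, LT = interpolated_block(X, T, p)
    err = np.max(np.abs(B - LX @ Bp @ LT))     # entrywise maximum error
    print(p, err, 2 * 4.0 ** (-p))
```

The entrywise maximum is the natural quantity to print here, since the 1 → ∞ operator norm of a matrix equals its largest entry in modulus.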


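Theorem (global error, complexity)

Let $N \in \mathbb{N}$, $\varepsilon > 0$, $X, T \subset [0, \infty)$, then a matrix vector product with $B = (e^{-x_j k_\ell})_{j,\ell=1,\dots,N}$ can be computed in

$$O\left(N \log\frac{1}{\varepsilon} + \log^3\frac{1}{\varepsilon}\,\log\frac{x_{\max} k_{\max}}{\varepsilon}\right)$$

floating point operations.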

[Figures: error vs. p (p = 1, ..., 19), N = 2^14; time vs. N (N = 2^1, ..., 2^22), p = 8]


model problem, d = 1

$T \subset [0, N]$, $|T| = N$

$X \subset [0, N]$, $|X| = N$

evaluate almost periodic function for $x \in X$

$$f(x) = \sum_{k \in T} \hat f_k\, e^{2\pi i k x / N}$$

naive: $f = A \hat f$ takes $O(N^2)$ floating point operations

FFT for nonequispaced nodes in time and frequency domain (nnFFT, Elbel, Steidl; Potts, Steidl, Tasche; Keiner, Knopp, Potts, K.; type-3 nuFFT, Greengard, Lee)

butterfly approximation scheme (Edelman; Michielsen, Boag; Chew, Song; Ying; O'Neil, Woolfe, Rokhlin; Candès, Demanet; Tygert)


Lemma (local error)

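Let $N, p \in \mathbb{N}$, $X, T \subset [0, N]$ fulfil the admissibility condition $\operatorname{diam}(T)\operatorname{diam}(X) \le N$, then

$$\|A - L_X A_p L_T\|_{1 \to \infty} \le 3 \cdot \left(\frac{\pi}{p - 1}\right)^p$$

SVD of an admissible block

lower bound (Widom)

$$C \left(\frac{\pi}{8}\right)^p \frac{1}{p!} \approx C' \left(\frac{1.06}{p}\right)^p$$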

[Figure: local error for p = 1, ..., 19, N = 2^10]
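The effect of the admissibility condition diam(T) diam(X) ≤ N on the rank can be checked with an SVD of sampled blocks. The sketch compares an assumed admissible block (diam product equal to N) with a block whose diam product is 64 N; the block sizes, the sampling and the tolerance 1e-10 are illustrative choices.

```python
import numpy as np

def numerical_rank(X, T, N, tol=1e-10):
    """Number of singular values of (e^{2 pi i x k / N}) above tol * sigma_1."""
    A = np.exp(2j * np.pi * np.outer(X, T) / N)
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

N = 1024
n = 128                                        # samples per block and direction

# Admissible: diam(X) * diam(T) = 32 * 32 = N.
rank_adm = numerical_rank(np.linspace(0, 32, n), np.linspace(0, 32, n), N)

# Not admissible: diam(X) * diam(T) = 256 * 256 = 64 N.
rank_big = numerical_rank(np.linspace(0, 256, n), np.linspace(0, 256, n), N)

print("admissible block:    ", rank_adm)
print("non-admissible block:", rank_big)
```

The numerical rank of the admissible block stays small and essentially independent of the number of samples, whereas it grows with the diam product once the condition is violated.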


[Figure with axes X and T]

dyadic decomposition of X = [0, N]

Level l = 0: X00, breakpoints 0, N

Level l = 1: X10, X11, breakpoints 0, N/2, N

Level l = 2: X20, X21, X22, X23, breakpoints 0, N/4, N/2, 3N/4, N


dyadic decompositions - two binary trees

X00

X10 X11

X20 X21 X22 X23

T00

T10 T11

T20 T21 T22 T23


admissible pairs - a butterfly graph, N = 4

X00 × T20 X00 × T21 X00 × T22 X00 × T23

X10 × T10 X10 × T11 X11 × T10 X11 × T11

X20 × T00 X21 × T00 X22 × T00 X23 × T00
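The pattern of admissible pairs can be enumerated directly: on level l, an interval of X from level l is paired with an interval of T from level L - l, where N = 2^L, so that the product of the two diameters is always N. The helper names below are hypothetical; the index pairs (level, position) mirror the labels X00, T20 and so on.

```python
def interval(N, level, index):
    """Dyadic interval number `index` on `level` of the decomposition of [0, N]."""
    width = N / 2 ** level
    return (index * width, (index + 1) * width)

def butterfly_pairs(L):
    """For N = 2**L, list per level l the pairs (X index, T index)."""
    return [[((l, m), (L - l, n))
             for m in range(2 ** l)
             for n in range(2 ** (L - l))]
            for l in range(L + 1)]

L = 2                                          # N = 4 as on the slide
N = 2 ** L
for l, pairs in enumerate(butterfly_pairs(L)):
    labels = [f"X{a}{b} x T{c}{d}" for (a, b), (c, d) in pairs]
    print("level", l, labels)
```

Every level contains exactly N pairs, and there are log2(N) + 1 levels, which is where the N log N factor in the operation count comes from.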



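Let $N \in \mathbb{N}$, $\varepsilon > 0$, $X, T \subset [0, N]$, then a matrix vector product with $A = (e^{2\pi i x_j k_\ell / N})_{j,\ell=1,\dots,N}$ can be computed in

$$O\left(N \log N \log^2\frac{N}{\varepsilon}\right)$$

floating point operations.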


[Figures, d = 3: sparse T; sparse X]


spherical means, $f : \mathbb{R}^d \to \mathbb{R}$

$$Mf(z, t) = \int_{S^{d-1}} f(z + t x)\, d\sigma(x), \qquad z \in S^{d-1},\ t \in [0, 2]$$

[Figures: $N^2$ function values; $N \times N$ measurements]


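spectral ansatz $e_k(x) = e^{2\pi i k x}$, $k \in I = [-\frac{N}{2}, \frac{N}{2})^d \cap \mathbb{Z}^d$,

$$M e_k(y, r) = \frac{\Gamma(\frac{d}{2})\, J_{\frac{d}{2}-1}(2\pi |k| r)}{(\pi |k| r)^{\frac{d}{2}-1}} \cdot e_k(y)$$

$d = 3$, $|I| = N^3$, $|Y| = N^2$, $|R| = N$

$$J_{\frac{1}{2}}(t) = \sqrt{\frac{2}{\pi t}} \cdot \sin t$$

butterfly sparse fast Fourier transform

$$T = \left\{ \binom{k}{\pm |k|} : k \in I \right\}, \qquad X = Y \times R$$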
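For d = 3 the two formulas above combine into a short worked computation, which also shows where the frequencies with last component ±|k| come from. The derivation only uses the formulas stated on the slide together with Γ(3/2) = √π/2 and assumes k ≠ 0.

```latex
\begin{align*}
J_{\frac12}(2\pi|k|r)
  &= \sqrt{\frac{2}{\pi \cdot 2\pi|k|r}}\,\sin(2\pi|k|r)
   = \frac{\sin(2\pi|k|r)}{\pi\sqrt{|k|r}}, \\
M e_k(y, r)
  &= \frac{\sqrt{\pi}}{2} \cdot \frac{\sin(2\pi|k|r)}{\pi\sqrt{|k|r}}
     \cdot \frac{1}{\sqrt{\pi|k|r}} \cdot e_k(y)
   = \frac{\sin(2\pi|k|r)}{2\pi|k|r}\, e^{2\pi i k y} \\
  &= \frac{1}{4\pi i\,|k|\,r}
     \left( e^{2\pi i (k y + |k| r)} - e^{2\pi i (k y - |k| r)} \right).
\end{align*}
```

Each term is a plane wave in the variable (y, r) with frequency (k, ±|k|), so evaluating the spherical means of a trigonometric polynomial reduces to a Fourier sum with the sparse frequency set T and the node set X = Y × R, up to the factor 1/(4πi|k|r).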


$d = 3$, problem size $N^3$, time complexity (Görner, Hielscher, K.)

naive $O(N^5)$

butterfly $O(N^3 \log^6 N)$

iterative reconstruction (Brandt, Dong, Görner, K.)

$$\|Mf - g\|_2^2 + \lambda \|f\|_{TV} \to \min$$

[Figures, d = 2: geometry; least squares; regularised]


model problem

$T := \{1, \dots, N\}$

$Z \subset \{z \in \mathbb{C} : |z| \le 1\}$, $|Z| = N$

evaluate the polynomial

$$f(z) = \sum_{k \in T} \hat f_k z^k, \qquad z \in Z$$


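idea: write $z = e^{-y} e^{2\pi i x}$, if $B = L_Y B_p L_T$, then

$$(A \odot B)\hat f = \left( L_Y \odot A \left(\operatorname{diag} \hat f\right) L_T^\top B_p^\top \right) \mathbf{1}$$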
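The identity behind this idea can be verified numerically with small random matrices. The sketch does not use any fast transform; it only checks that the entrywise product of A with a rank p matrix can be applied through p products with A, which is the step that lets a fast algorithm for A be reused. All names and sizes are illustrative assumptions.

```python
import numpy as np

def hadamard_lowrank_matvec(A, LY, Bp, LT, coeffs):
    """Apply (A o B) with B = LY @ Bp @ LT using p matrix vector products with A."""
    p = LY.shape[1]
    # Column s of W is the coefficient vector scaled by row s of Bp @ LT.
    W = np.diag(coeffs) @ LT.T @ Bp.T          # shape (n, p)
    AW = A @ W                                 # p products with A
    return (LY * AW) @ np.ones(p)              # weight by LY and sum over s

rng = np.random.default_rng(1)
n, p = 6, 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
LY = rng.standard_normal((n, p))
Bp = rng.standard_normal((p, p))
LT = rng.standard_normal((p, n))
coeffs = rng.standard_normal(n)

B = LY @ Bp @ LT
direct = (A * B) @ coeffs                      # entrywise product, then matvec
fast = hadamard_lowrank_matvec(A, LY, Bp, LT, coeffs)
print(np.max(np.abs(direct - fast)))
```

In the setting of the slide, A would be the Fourier matrix from the angular part of z and B the Laplace type matrix from its modulus, so each of the p products with A could be carried out by the butterfly scheme.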



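Let $N \in \mathbb{N}$, $\varepsilon > 0$, $z_j \in \{z \in \mathbb{C} : |z| \le 1\}$, then a matrix vector product with $C = (z_j^k)_{j,k=1,\dots,N}$ can be computed in

$$O\left(N \log N \log\frac{1}{\varepsilon}\,\log^3\frac{N}{\varepsilon}\right)$$

floating point operations.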


[Figures: error vs. p (p = 1, ..., 19), N = 2^14; time vs. N (N = 2^1, ..., 2^22), p = 8]

Summary

local low rank

Laplace transform - asymptotically smooth kernels

Fourier transform - Fourier integral operators

application in photoacoustic imaging

application for evaluating polynomials

think global, act local

$$O\left(N \log^a \frac{N}{\varepsilon}\right)$$

www.analysis.uos.de
