Low ranks in computational Fourier analysis
Part II: Fast matrix vector multiplication
Stefan Kunis
Outline
Part I: Introduction
Part II: Fast matrix vector multiplication
Part III: Efficient Reconstruction
Outline of part II
Hierarchical matrices in a nutshell
Fast Laplace transform - hierarchical approximation
Sparse fast Fourier transform - butterfly approximation
Application I - photoacoustic imaging
Application II - evaluating polynomials in a disk
Hierarchical matrices in a nutshell
hierarchical matrices (Greengard, Rokhlin; Hackbusch; Börm, Grasedyck; Bebendorf; Fenn, Steidl)
model problem d = 1
source nodes $y_\ell \in [0, 1]$, coefficients $f_\ell \in \mathbb{R}$, $\ell = 1, \dots, N$, and target nodes $x_j \in [0, 1]$, $x_j \neq y_\ell$, $j, \ell = 1, \dots, N$, compute
$$u_j = \sum_{\ell=1}^{N} \frac{f_\ell}{|x_j - y_\ell|}$$
naive: $O(N^2)$ floating point operations
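As a baseline for the fast methods, the naive summation can be sketched in a few lines of NumPy. The function name `naive_sum` and the toy data are illustrative assumptions, not part of the slides.

```python
import numpy as np

def naive_sum(x, y, f):
    """Return u_j = sum_l f_l / |x_j - y_l| by forming the full kernel matrix."""
    K = 1.0 / np.abs(x[:, None] - y[None, :])  # dense N x N matrix, O(N^2) entries
    return K @ f                               # O(N^2) floating point operations

# toy data (assumed): two targets, one source
x = np.array([0.0, 1.0])
y = np.array([0.5])
f = np.array([2.0])
u = naive_sum(x, y, f)
```

Both storage and work grow quadratically in N here, which is the cost the hierarchical approach aims to avoid.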
the kernel $\kappa : [0, 1] \times [0, 1] \to \mathbb{R}$,
$$\kappa(x, y) = \frac{1}{|x - y|},$$
is asymptotically smooth
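$X = [x_{\min}, x_{\max}]$, $Y = [y_{\min}, y_{\max}]$
degenerate approximation $\tilde\kappa : X \times Y \to \mathbb{R}$
$$\tilde\kappa(x, y) = \sum_{s=0}^{p-1} (x - x_0)^s \cdot (y - x_0)^{-(s+1)}$$
admissibility condition
$$|x_{\max} - x_{\min}| = \operatorname{diam}(X) \le \operatorname{dist}(X, Y) = |y_{\min} - x_{\max}|$$
if $p \ge C |\log \varepsilon|$, then
$$\|\kappa - \tilde\kappa\|_{C(X \times Y)} \le \varepsilon$$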
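The geometric decay of the truncation error can be checked numerically. The sketch below assumes that x0 is the midpoint of X (the slides do not fix x0) and uses the admissible pair X = [0, 1/4], Y = [1/2, 3/4] as a hypothetical example; the helper names are illustrative.

```python
import numpy as np

def kappa_tilde(x, y, x0, p):
    """Truncated expansion sum_{s<p} (x - x0)^s (y - x0)^(-(s+1)) of 1/(y - x), valid for y > x."""
    s = np.arange(p)
    return np.sum((x - x0) ** s / (y - x0) ** (s + 1))

def max_error(p, n=50):
    """Maximum absolute error on a grid of the assumed pair X = [0, 1/4], Y = [1/2, 3/4]."""
    X = np.linspace(0.0, 0.25, n)
    Y = np.linspace(0.5, 0.75, n)
    x0 = 0.125  # assumed expansion point: midpoint of X
    return max(abs(1.0 / abs(x - y) - kappa_tilde(x, y, x0, p)) for x in X for y in Y)

errors = {p: max_error(p) for p in (2, 4, 8, 16)}
for p, e in errors.items():
    print(p, e)
```

For this pair the ratio |x - x0| / |y - x0| is at most 1/3, so the error shrinks roughly like 3^(-p), consistent with choosing p proportional to |log ε|.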
dyadic decomposition of X = [0, 1]
Level l = 0: X00, breakpoints 0, 1
Level l = 1: X10 X11, breakpoints 0, 1/2, 1
Level l = 2: X20 X21 X22 X23, breakpoints 0, 1/4, 1/2, 3/4, 1
Level l = 3: X30 X31 X32 X33 X34 X35 X36 X37, breakpoints 0, 1/8, 1/4, 3/8, 1/2, 5/8, 3/4, 7/8, 1
dyadic decompositions - two binary trees
X00
X10 X11
X20 X21 X22 X23
X30 X31 X32 X33 X34 X35 X36 X37
Y00
Y10 Y11
Y20 Y21 Y22 Y23
Y30 Y31 Y32 Y33 Y34 Y35 Y36 Y37
admissible pairs - a quadtree
Level 0: X00 × Y00
Level 1: X10 × Y10, X10 × Y11, X11 × Y10, X11 × Y11
Level 2 (children of X10 × Y11): X20 × Y22, X20 × Y23, X21 × Y22, X21 × Y23
Level 3 (children of X21 × Y22): X32 × Y34, X32 × Y35, X33 × Y34, X33 × Y35
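The quadtree of pairs can be generated by recursively subdividing every pair that is not admissible. The following sketch uses the condition diam(X) ≤ dist(X, Y) on dyadic intervals; the function names and the tuple encoding (level, index of X, index of Y) are illustrative assumptions.

```python
def interval(level, m):
    """Dyadic interval number m on the given level, e.g. interval(2, 1) = [1/4, 1/2]."""
    return (m / 2**level, (m + 1) / 2**level)

def admissible(X, Y):
    """Admissibility condition diam(X) <= dist(X, Y)."""
    diam = X[1] - X[0]
    dist = max(Y[0] - X[1], X[0] - Y[1], 0.0)
    return diam <= dist

def admissible_pairs(max_level):
    """Walk the quadtree from X00 x Y00 and collect the admissible pairs."""
    found = []
    todo = [(0, 0, 0)]
    while todo:
        level, mx, my = todo.pop()
        if admissible(interval(level, mx), interval(level, my)):
            found.append((level, mx, my))
        elif level < max_level:
            # not admissible: split both intervals, giving four children
            todo += [(level + 1, 2 * mx + a, 2 * my + b) for a in (0, 1) for b in (0, 1)]
    return sorted(found)

pairs = admissible_pairs(3)
print(pairs)
```

In this run X20 × Y22, X20 × Y23 and X21 × Y23 come out admissible, while the adjacent pair X21 × Y22 is subdivided further, matching the excerpt of the tree listed on the slide.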
[Figure: matrix partitioning over $X_* \times Y_*$ (H-matrix)]
$X_* = \{x_j \in [0, 1] : j = 1, \dots, N\}$, $Y_* = \{y_\ell \in [0, 1] : \ell = 1, \dots, N\}$ well distributed
local computations, admissible block in level $l$
$$\kappa(x_j, y_\ell) \approx \sum_{s=0}^{p-1} \varphi_s(x_j) \psi_s(y_\ell), \quad K \approx \Phi \Psi^\top, \quad \frac{2pN}{2^l}$$
total computations
$$O(N \log N \, |\log \varepsilon|)$$
naive: $O(N^2)$ floating point operations
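Putting the pieces together, a hierarchical matrix vector product for the model problem can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation behind the slides: the expansion point x0 is taken as the midpoint of X, the cluster tree is rebuilt on the fly with index masks, and the names `h_matvec`, `leaf` and the default `p = 20` are hypothetical choices.

```python
import numpy as np

def naive_sum(x, y, f):
    """Dense evaluation of u_j = sum_l f_l / |x_j - y_l|."""
    return (1.0 / np.abs(x[:, None] - y[None, :])) @ f

def h_matvec(x, y, f, p=20, leaf=16):
    """Approximate the same sums with rank-p blocks on admissible pairs."""
    u = np.zeros(x.size)
    s = np.arange(p)

    def visit(ix, iy, X, Y):
        if ix.size == 0 or iy.size == 0:
            return
        diam = X[1] - X[0]
        dist = max(Y[0] - X[1], X[0] - Y[1], 0.0)
        if diam <= dist:
            # admissible block: K ~ Phi Psi^T with the truncated Taylor expansion
            x0 = 0.5 * (X[0] + X[1])  # assumed expansion point
            dy = y[iy] - x0
            Psi = np.sign(dy)[:, None] / dy[:, None] ** (s + 1)  # |Y| x p
            Phi = (x[ix] - x0)[:, None] ** s                     # |X| x p
            u[ix] += Phi @ (Psi.T @ f[iy])
        elif min(ix.size, iy.size) <= leaf:
            # small non-admissible block: evaluate directly
            u[ix] += naive_sum(x[ix], y[iy], f[iy])
        else:
            # subdivide both intervals and recurse on the four children
            xm, ym = 0.5 * (X[0] + X[1]), 0.5 * (Y[0] + Y[1])
            x_parts = [((X[0], xm), ix[x[ix] < xm]), ((xm, X[1]), ix[x[ix] >= xm])]
            y_parts = [((Y[0], ym), iy[y[iy] < ym]), ((ym, Y[1]), iy[y[iy] >= ym])]
            for Xc, ixc in x_parts:
                for Yc, iyc in y_parts:
                    visit(ixc, iyc, Xc, Yc)

    visit(np.arange(x.size), np.arange(y.size), (0.0, 1.0), (0.0, 1.0))
    return u

rng = np.random.default_rng(0)
N = 400
x, y = rng.random(N), rng.random(N)
f = rng.uniform(0.5, 1.5, N)  # positive coefficients, so relative errors are meaningful
u_fast = h_matvec(x, y, f)
u_ref = naive_sum(x, y, f)
rel_err = np.max(np.abs(u_fast - u_ref) / np.abs(u_ref))
print(rel_err)
```

Each admissible block is applied through its factors Φ and Ψ, so its cost is proportional to p times the number of points involved rather than to the number of matrix entries.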
Fast Laplace transform - hierarchical approximation
discrete Laplace transform (Rokhlin; Strain; Andersson)
model problem, d = 1
$T \subset [0, T_{\max}]$, $|T| = N$
$X \subset [0, X_{\max}]$, $|X| = N$
$f = (f_k)_{k \in T} \in \mathbb{C}^N$
evaluate sum of exponentials for $x \in X$
$$f(x) = \sum_{k \in T} f_k e^{-kx}$$
naive: $O(N^2)$ floating point operations
admissibility condition
$\operatorname{diam}(T) \le \operatorname{dist}(T, 0)$ and $\operatorname{diam}(X) \le \operatorname{dist}(X, 0)$
kernel function $\kappa : [0, 1]^2 \to \mathbb{R}$, $\kappa(k, x) = e^{-kx}$
using singular value decomposition
[Figures: locally rank 1 $\kappa$; locally rank 1 error; locally rank 2 error; locally rank 3 error]
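The fast decay of the singular values on an admissible block can be observed directly. The sketch below samples the kernel e^{-kx} on the block T = X = [1/2, 1], which satisfies diam ≤ dist to 0; the block and the grid size are assumptions chosen for illustration.

```python
import numpy as np

# assumed admissible block: diam = 0.5 <= dist(., 0) = 0.5 for both T and X
k = np.linspace(0.5, 1.0, 200)
x = np.linspace(0.5, 1.0, 200)
B = np.exp(-np.outer(x, k))

sigma = np.linalg.svd(B, compute_uv=False)  # singular values, descending
relative = sigma / sigma[0]
print(relative[:6])
```

The normalised value `relative[r]` equals the relative spectral-norm error of the best rank-r approximation, so a handful of terms already reproduces the block to several digits.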
kernel function $\kappa : [0, T_{\max}] \times [0, X_{\max}] \to \mathbb{R}$
$$\kappa(k, x) = e^{-kx}$$
admissibility condition allowing low rank approximation
$\operatorname{diam}(X) \le \operatorname{dist}(X, 0)$
$\operatorname{diam}(T) \le \operatorname{dist}(T, 0)$
subdivide both intervals geometrically
[Figure: $[0, X_{\max}]$ subdivided into intervals $X_1, \dots, X_5$]
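One way to realise such a geometric subdivision is repeated halving towards 0. The sketch below assumes the breakpoints X_max / 2^j, which the slides do not state explicitly, and checks the admissibility condition diam ≤ dist to 0 for each piece; the function names are illustrative.

```python
def geometric_subdivision(x_max, levels):
    """Intervals [x_max / 2^j, x_max / 2^(j-1)] for j = 1..levels-1, plus the remainder at 0."""
    parts = [(x_max / 2**j, x_max / 2**(j - 1)) for j in range(1, levels)]
    parts.append((0.0, x_max / 2**(levels - 1)))  # remainder touching 0
    return parts

def admissible(interval):
    """diam(X) <= dist(X, 0) for an interval X = [lo, hi] with lo >= 0."""
    lo, hi = interval
    return hi - lo <= lo

parts = geometric_subdivision(1.0, 5)
flags = [admissible(I) for I in parts]
print(list(zip(parts, flags)))
```

Under this assumption every interval except the remainder next to 0 is admissible, and only a logarithmic number of intervals is needed to reach a given smallest scale.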
let $X$ and $T$ be admissible and
$$B = \left(e^{-kx}\right)_{x \in X, k \in T}$$
polynomial interpolation in $k$ and $x$ at $t_s = \cos\frac{2s-1}{2p}\pi$ (Trefethen)
$L_X \in \mathbb{R}^{|X| \times p}$, $L_T \in \mathbb{R}^{p \times |T|}$, Lagrange matrices, and
$$B_p = \left(e^{-(t_r+1)(t_s+1)\operatorname{diam}T \operatorname{diam}X/4}\right)_{r,s=1}^{p}$$
Lemma (local error)
If $X, T \subset [0, \infty)$ are admissible, then
$$\|B - L_X B_p L_T\|_{1 \to \infty} \le 2 \cdot 4^{-p}$$
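The local factorisation can be tried out numerically. The sketch below is an illustration under assumptions: it evaluates the kernel at Chebyshev nodes mapped to the example intervals X = T = [1, 2] instead of reproducing the slide's closed form for B_p, and the helper names are hypothetical. The norm from l1 to l-infinity of a matrix is its largest entry in absolute value.

```python
import numpy as np

def chebyshev_nodes(a, b, p):
    """Nodes t_s = cos((2s - 1) pi / (2p)), s = 1..p, mapped from [-1, 1] to [a, b]."""
    s = np.arange(1, p + 1)
    t = np.cos((2 * s - 1) * np.pi / (2 * p))
    return 0.5 * (a + b) + 0.5 * (b - a) * t

def lagrange_matrix(nodes, points):
    """Entry (i, s) is the s-th Lagrange basis polynomial evaluated at points[i]."""
    L = np.ones((points.size, nodes.size))
    for s in range(nodes.size):
        for r in range(nodes.size):
            if r != s:
                L[:, s] *= (points - nodes[r]) / (nodes[s] - nodes[r])
    return L

def local_error(p, X=(1.0, 2.0), T=(1.0, 2.0), n=300):
    """Largest entry of |B - L_X B_p L_T| on an assumed admissible block."""
    x = np.linspace(*X, n)
    k = np.linspace(*T, n)
    xn, kn = chebyshev_nodes(*X, p), chebyshev_nodes(*T, p)
    B = np.exp(-np.outer(x, k))
    Bp = np.exp(-np.outer(xn, kn))   # kernel at the interpolation nodes
    LX = lagrange_matrix(xn, x)      # |X| x p
    LT = lagrange_matrix(kn, k).T    # p x |T|
    return np.max(np.abs(B - LX @ Bp @ LT))

errors = {p: local_error(p) for p in (2, 4, 8)}
for p, e in errors.items():
    print(p, e, 2 * 4.0 ** (-p))
```

On this block the observed errors stay below the bound 2 · 4^(-p) from the lemma; this is a check on one example, not a proof.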
Theorem (global error, complexity)
Let $N \in \mathbb{N}$, $\varepsilon > 0$, $X, T \subset [0, \infty)$, then a matrix vector product with $B = (e^{-x_j k_\ell})_{j,\ell=1,\dots,N}$ can be computed in
$$O\left(N \log\frac{1}{\varepsilon} + \log^3\frac{1}{\varepsilon} \log\frac{x_{\max} k_{\max}}{\varepsilon}\right)$$
floating point operations.
[Figures: error vs. $p$, $N = 2^{14}$; time vs. $N$, $p = 8$]
Sparse fast Fourier transform - butterfly approximation
model problem, d = 1
$T \subset [0, N]$, $|T| = N$
$X \subset [0, N]$, $|X| = N$
$f = (f_k)_{k \in T} \in \mathbb{C}^N$
evaluate almost periodic function for $x \in X$
$$f(x) = \sum_{k \in T} f_k e^{2\pi i k x/N}$$
naive: the matrix vector product $(f(x))_{x \in X} = A f$ takes $O(N^2)$ floating point operations
FFT for nonequispaced nodes in time and frequency domain (nnFFT, Elbel, Steidl; Potts, Steidl, Tasche; Keiner, Knopp, Potts, K.; type-3 nuFFT, Greengard, Lee)
butterfly approximation scheme (Edelman; Michielsen, Boag; Chew, Song; Ying; O’Neil, Woolfe, Rokhlin; Candès, Demanet; Tygert)
Lemma (local error)
Let $N, p \in \mathbb{N}$, $X, T \subset [0, N]$ fulfil the admissibility condition $\operatorname{diam}(T) \operatorname{diam}(X) \le N$, then
$$\|A - L_X A_p L_T\|_{1 \to \infty} \le 3 \cdot \left(\frac{\pi}{p - 1}\right)^p$$
SVD of an admissible block
lower bound (Widom)
$$C \left(\frac{\pi}{8}\right)^p \frac{1}{p!} \approx C' \left(\frac{1.06}{p}\right)^p$$
[Figure: local error, $N = 2^{10}$]
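The low rank of an admissible Fourier block can be observed numerically. The sketch below assumes N = 2^10 and random nodes in X = T = [0, 32], so that diam(T) diam(X) ≤ N; the sample size of 64 nodes per interval is an arbitrary choice for illustration.

```python
import numpy as np

N = 2**10
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 32.0, 64)  # nodes in X, diam(X) <= 32
k = rng.uniform(0.0, 32.0, 64)  # nodes in T, diam(T) <= 32, so the product is <= N

A = np.exp(2j * np.pi * np.outer(x, k) / N)  # admissible block of the Fourier matrix
sigma = np.linalg.svd(A, compute_uv=False)   # singular values, descending
relative = sigma / sigma[0]
print(relative[:14])
```

The normalised singular values fall off faster than geometrically, in line with the factorial term in the lower bound.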
dyadic decomposition of X = [0, N]
Level l = 0: X00, breakpoints 0, N
Level l = 1: X10 X11, breakpoints 0, N/2, N
Level l = 2: X20 X21 X22 X23, breakpoints 0, N/4, N/2, 3N/4, N
dyadic decompositions - two binary trees
X00
X10 X11
X20 X21 X22 X23
T00
T10 T11
T20 T21 T22 T23
admissible pairs - a butterfly graph, N = 4
X00 × T20 X00 × T21 X00 × T22 X00 × T23
X10 × T10 X10 × T11 X11 × T10 X11 × T11
X20 × T00 X21 × T00 X22 × T00 X23 × T00
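The butterfly pairing combines level l of the X tree with level L - l of the T tree, so the product of the diameters stays constant across levels. A small sketch, with the tuple encoding (level, index) and the function names as illustrative assumptions:

```python
def butterfly_pairs(L):
    """For each level l, all pairs of X intervals on level l with T intervals on level L - l."""
    levels = []
    for l in range(L + 1):
        pairs = [((l, m), (L - l, n)) for m in range(2**l) for n in range(2**(L - l))]
        levels.append(pairs)
    return levels

def diam_product(N, L, l):
    """diam(X) * diam(T) for dyadic intervals of [0, N] on levels l and L - l."""
    return (N / 2**l) * (N / 2**(L - l))

N, L = 4, 2
levels = butterfly_pairs(L)
for l, pairs in enumerate(levels):
    print(l, len(pairs), diam_product(N, L, l))
```

For N = 4 and L = 2 every level holds four pairs with diam(X) diam(T) = 4 = N, so all listed pairs satisfy the admissibility condition with equality.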
Theorem (global error, complexity)
Let $N \in \mathbb{N}$, $\varepsilon > 0$, $X, T \subset [0, N]$, then a matrix vector product with $A = (e^{2\pi i x_j k_\ell/N})_{j,\ell=1,\dots,N}$ can be computed in
$$O\left(N \log N \log^2\frac{N}{\varepsilon}\right)$$
floating point operations.
[Figures: $d = 3$, sparse $T$; sparse $X$]
Application I - photoacoustic imaging
spherical means, $f : \mathbb{R}^d \to \mathbb{R}$
$$Mf(z, t) = \int_{S^{d-1}} f(z + t x) \, d\sigma(x), \quad z \in S^{d-1}, \ t \in [0, 2]$$
[Figures: $N^2$ function values; $N \times N$ measurements]
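spectral ansatz $e_k(x) = e^{2\pi i k x}$, $k \in I = [-\frac{N}{2}, \frac{N}{2})^d \cap \mathbb{Z}^d$,
$$Me_k(y, r) = \frac{\Gamma(\frac{d}{2}) \, J_{\frac{d}{2}-1}(2\pi|k|r)}{(\pi|k|r)^{\frac{d}{2}-1}} \cdot e_k(y)$$
$d = 3$, $|I| = N^3$, $|\mathcal{Y}| = N^2$, $|\mathcal{R}| = N$
$$J_{\frac{1}{2}}(t) = \sqrt{\frac{2}{\pi t}} \cdot \sin t$$
butterfly sparse fast Fourier transform
$$T = \left\{ \binom{k}{\pm|k|} : k \in I \right\}, \quad X = \mathcal{Y} \times \mathcal{R}$$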
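For d = 3 the step from the Bessel formula to the frequency set with entries ±|k| can be written out as follows. This is a worked sketch assuming k ≠ 0, r > 0 and a surface measure σ normalised to total mass one.

```latex
\begin{align*}
Me_k(y, r)
  &= \frac{\Gamma(\tfrac{3}{2}) \, J_{\frac12}(2\pi|k|r)}{(\pi|k|r)^{\frac12}} \, e_k(y)
   = \frac{\sqrt{\pi}}{2} \sqrt{\frac{2}{\pi \cdot 2\pi|k|r}} \,
     \frac{\sin(2\pi|k|r)}{(\pi|k|r)^{\frac12}} \, e_k(y) \\
  &= \frac{\sin(2\pi|k|r)}{2\pi|k|r} \, e^{2\pi i k \cdot y}
   = \frac{1}{4\pi i |k| r}
     \left( e^{2\pi i (k \cdot y + |k| r)} - e^{2\pi i (k \cdot y - |k| r)} \right).
\end{align*}
```

Each term is an exponential with a frequency in R^4 of the form (k, ±|k|) evaluated at a node (y, r), which is the setting of a sparse Fourier transform; the factor 1/r depends only on the node and can be applied afterwards as a diagonal scaling.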
$d = 3$, problem size $N^3$, time complexity (Görner, Hielscher, K.)
naive $O(N^5)$
butterfly $O(N^3 \log^6 N)$
iterative reconstruction (Brandt, Dong, Görner, K.)
$$\|Mf - g\|_2^2 + \lambda \|f\|_{TV} \to \min$$
[Figures, $d = 2$: geometry; least squares; regularised]
Application II - evaluating polynomials in a disk
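model problem
$T := \{1, \dots, N\}$
$Z \subset \{z \in \mathbb{C} : |z| \le 1\}$, $|Z| = N$
$f = (f_k)_{k \in T} \in \mathbb{C}^N$
evaluate the polynomial
$$f(z) = \sum_{k \in T} f_k z^k, \quad z \in Z$$
naive: $O(N^2)$ floating point operations
idea: write $z = e^{-y} e^{2\pi i x}$, if $B = L_Y B_p L_T$, then
$$(A \odot B) f = \left( L_Y \odot A (\operatorname{diag} f) L_T^\top B_p^\top \right) \mathbf{1}$$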
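The identity for the Hadamard product can be verified with random matrices. In the sketch below all matrices are random stand-ins (assumptions for illustration), since only the algebraic structure B = L_Y B_p L_T matters for the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 6
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # stand-in for the Fourier part
LY = rng.standard_normal((n, p))   # stand-in for L_Y
Bp = rng.standard_normal((p, p))   # stand-in for B_p
LT = rng.standard_normal((p, n))   # stand-in for L_T
f = rng.standard_normal(n) + 1j * rng.standard_normal(n)

B = LY @ Bp @ LT                   # rank-p factor of the Hadamard product
lhs = (A * B) @ f                  # (A ⊙ B) f, formed densely
# right-hand side: p products with A, then a row-wise weighted sum
rhs = (LY * (A @ np.diag(f) @ LT.T @ Bp.T)) @ np.ones(p)
print(np.max(np.abs(lhs - rhs)))
```

The right-hand side needs only p matrix vector products with A, applied to the columns of diag(f) L_T^T B_p^T, so a fast transform for A carries over to A ⊙ B.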
Theorem (global error, complexity)
Let $N \in \mathbb{N}$, $\varepsilon > 0$, $z_j \in \{z \in \mathbb{C} : |z| \le 1\}$, then a matrix vector product with $C = (z_j^k)_{j,k=1,\dots,N}$ can be computed in
$$O\left(N \log N \log\frac{1}{\varepsilon} \log^3\frac{N}{\varepsilon}\right)$$
floating point operations.
[Figures: error vs. $p$, $N = 2^{14}$; time vs. $N$, $p = 8$]
Summary
local low rank
Laplace transform - asymptotically smooth kernels
Fourier transform - Fourier integral operators
application in photoacoustic imaging
application for evaluating polynomials
think global, act local
$$O\left(N \log^a \frac{N}{\varepsilon}\right)$$
www.analysis.uos.de