Fast Multipole Methods: Fundamentals & Applications › ... › Lecture1_Introduction_06.pdf · © Gumerov & Duraiswami, 2002 - 2006 Fast Multipole Methods: Fundamentals & Applications

© Gumerov & Duraiswami, 2002 - 2006

Fast Multipole Methods: Fundamentals & Applications

Ramani DuraiswamiNail A. Gumerov


Week 1. Introduction.

• What are multipole methods and what is this course about.

• Problems from physics, mathematics, computer vision, statistics, etc. that can be solved with fast multipole methods.

• Historical review of the fast multipole method. • Review of Complexity • Homework.


Course Mechanics• Background Needed

– Linear Algebra• Matrices, Vectors, Linear Systems, LU Decomposition, Eigenvalue

problem, SVD

– Numerical Analysis • (Interpolation, Approximation, Finite Differences, Taylor Series,

etc.)

– Programming• Matlab, C/C++ and/or FORTRAN

– For particular applications in your domain you will need to know the background material there

– E.g., familiarity with partial differential equations or integral equations, or interpolation, or …

• Participation essential!


Course Requirements• Ideally

– you are an applied student in CS, engineering, physics or industry, who has a problem in mind and would like to explore how the FMM could be applied to it

– or, you are Applied Math/Scientific Computation student who wants to learn this important algorithm

• At the end of the course you will have a familiarity with– Application of FMM to different problems– Data structures for FMM, and analysis related to it.

• Course focus is on applying the methods to achieve solution. – Theory will be developed as needed to proceed


Homework

• Will try to have it every week• Account for 35 % of your grade• Will not be excessive• Essential for learning --- must do as opposed to just

read.• Homework handed out last class of a week. • Due last class of next week • Thanksgiving week no homework

– (you can turn in the previous homework after thanksgiving)


Projects & Exams• There will be a final project that will require you to implement an

FMM algorithm in a field of your choice, – account for 25% of the grade.

• Project to be chosen latest by October 17.– Implementation (reimplementation) of an FMM algorithm – Use an existing FMM algorithm to solve an applied or mathematical problem– you can discuss these with us, and we will help you choose one, and over the

project– If you already have a project in mind you can discuss it with us

• Exams– intermediate exam worth 15%, week of October 16– final exam worth 25%. Finals Week


Class web & Mailing List

• http://www.umiacs.umd.edu/~ramani/cmsc878R• [email protected]

Discussions, announcements• Homework will be posted on the web

– No late homework (except by timely prior arrangement)– Hardcopy (no web or email submissions)

• Solutions will be posted after homework collected• Links to papers etc.


Introductions• Email addresses• What are your interests?• What do you want us to cover?

– Last three years outline is posted at the course web site


Scientific Computing• Do science on the computer• Accepted paradigm in many fields

• Model the problem as best as we can• Then represent the physical system discretely

Computational physicsComputational chemistryComputational biologyComputational fluid dynamicsComputational solid mechanicsStellar dynamicsMolecular ModelingMachine LearningData MiningMeteorology

Computational StatisticsComputational ElectromagneticsMedical Imaging & TomographyComputational AcousticsGeology and Ocean EngineeringCircuit designPhotonicsNanotechnology…


Problem Size• As problem size grows, in general fidelity of

representation grows• On the other hand

– cost of solving the problem – Memory required to store the discrete representation

also grow• Way the cost creases with size is the “memory” or “run

time” complexity of an algorithm• Need a language to talk about fast and low memory

algorithms• Asymptotic complexity


Asymptotic Equivalence

1)g()f(lim =⎟⎟

⎠

⎞⎜⎜⎝

⎛∞→ n

nn

• Let n the problem size • Two algorithms with costs f(n)and g(n) are said to be equivalent:

f(n) ~ g(n)


Little Oh

•Asymptotically smaller:

•f(n) = o(g(n))

0)g()f(lim =⎟⎟

⎠

⎞⎜⎜⎝

⎛∞→ n

nn


Big Oh

•Asymptotic Order of Growth:•f(n) = O(g(n))

∞<⎟⎟⎠

⎞⎜⎜⎝

⎛∞→ )g(

)f(suplimnn

n


The Oh’s

If f = o(g) or f ~ g then f = O(g)

lim = 0 lim = 1 lim < ∞


The Oh’s

If f = o(g), then g ≠ O(f)

lim = 0 lim = ∞fg

gf


Big Oh

•Equivalently,

•f(n) = O(g(n))

∃c,n0 ≥ 0 ∀n ≥ n0 |f(n)| ≤ c·g(n)


c·

ln c

Big Oh

f(x) = O(g(x))

f(x)

g(x)↑

logscale

↓

blue stays below red


Complexity• we will specify the function g(N) • The most common complexities are

– O(1) - not proportional to any variable number, i.e. a fixed/constant amount of time

– O(N) - proportional to the size of N (this includes a loop to N and loops to constant multiples of N such as 0.5N, 2N, 2000N - no matter what that is, if you double N you expect (on average) the program to take twice as long)

– O(N2) - proportional to N squared (you double N, you expect it to take four times longer - usually two nested loops both dependent on N).

– O(log N) - this is tricker to show - usually the result of binary splitting. – O(N log N) this is usually caused by doing log N splits but also doing N

amount of work at each "layer" of splitting.


Theta

•f(n) = Θ(g(n))

f(n) = O(g(n)) and g(n) = O(f(n))

Same Order of Growth:


Log complexity

• If you half data at each stage then number of stages until you have a single item is given (roughly) by log2 N. => binary search takes log2 N time to find an item.

• All logs grow a constant amount apart (homework) – So we normally just say log N not log2 N.

• Log N grows very slowly


What is the Fast Multipole Method?• An algorithm for achieving fast products of particular

dense matrices with vectors• Similar to the Fast Fourier Transform

– For the FFT, matrix entries are uniformly sampled complex exponentials

• For FMM, matrix entries are – Derived from particular functions– Functions satisfy known “translation” theorems

• Name is a bit unfortunate– What is a multipole? We will return to this …

• Why is this important?


Vectors and Matrices

• n×d dimensional matrix M and its transpose Mt

d dimensional column vector x and its transpose


Matrix vector product

s1 = m11 x1 + m12 x2 + … + m1d xd

s2 = m21 x1 + m22 x2 + … + m2d xd

…sn = mn1 x1 + mn2 x2 + … + mnd xd

• d products and sums per line

• N lines• Total Nd products

and Nd sums to calculate N entries

• If d∼ N then matrix multiplication cost is O(N2)

• Storage of the matrix entries is also O(N2)

• Matrix vector product is identical to a sum

si = ∑j=1d mij xj

• So algorithm for fast matrix vector products is also a fast summation algorithm


Linear Systems• Solve a system of equations

Mx=s• M is a N × N matrix, x is a N vector, s is a N vector• Direct solution (Gauss elimination, LU Decomposition,

SVD, …) all need O(N3) operations• Iterative methods typically converge in k steps with each

step needing a matrix vector multiply O(N2) – if properly designed, k<< N

• A fast matrix vector multiplication algorithm (O(N log N) operations) will speed all these algorithms


Is this important?• Argument:

– Moore’s law: Processor speed doubles every 18 months– If we wait long enough the computer will get fast enough and

let my inefficient algorithm tackle the problem• Is this true?

– Yes for algorithms with same asymptotic complexity– No!! For algorithms with different asymptotic complexity

• For a million variables, we would need about 16 generations of Moore’s law before a O(N2) algorithm was comparable with a O(N) algorithm

• Similarly, clever problem formulation can also achieve large savings.


Memory complexity

• Sometimes we are not able to fit a problem in available memory– Don’t care how long solution takes, just if we can solve it

• To store a N × N matrix we need N2 locations– 1 GB RAM = 10243 =1,073,741,824 bytes

– => largest N is 32,768• “Out of core” algorithms copy partial results to disk, and keep only

necessary part of the matrix in memory

• FMM allows reduction of memory complexity as well– Elements of the matrix required for the product can be generated

as needed


Parallel Processing• Another way to tackle large problems is to throw more hardware

(or more sophisticated hardware) at the problem• If the work can be distributed across several processors (say P of

them), then it can reduce the cost by some factor (ideally P).• Usually requires reformulation of the solution algorithm (unless it

is trivially parallel)• Was “hot” in the 80s.• Interest in this area again as purpose built parallel processing

engines such as Graphical Processing Units, Multicore processors, FPGAs, etc. become widely available


The need for fast algorithms• Grand challenge problems in large numbers of variables • Simulation of physical systems

– Electromagnetics of complex systems– Stellar clusters– Protein folding– Turbulence

• Learning theory– Kernel methods – Support Vector Machines

• Graphics and Vision– Light scattering …


• General problems in these areas can be posed in terms of millions (106) or billions (109) of variables

• Recall Avogadro’s numer (6.022 141 99 × 1023

molecules/mole


Dense and Sparse matrices• Operation estimates are for dense matrices.

– Majority of elements of the matrix are non-zero

• However in many applications matrices are sparse• A sparse matrix (or vector, or array) is one in which

most of the elements are zero. – If storage space is more important than access speed, it may be

preferable to store a sparse matrix as a list of (index, value) pairs.

– For a given sparsity structure it may be possible to define a fast matrix-vector product/linear system algorithm


• Can compute

In 5 operations instead of 25 operations• Sparse matrices are not our concern here

a1 0 0 0 00 a2 0 0 00 0 a3 0 00 0 0 a4 00 0 0 0 a5

x1

x2

x3

x4

x5

=

a1x1

a2x2

a3x3

a4x4

a5x5


Structured matrices• Fast algorithms have been found for many dense

matrices• Typically the matrices have some “structure”• Definition:

– A dense matrix of order N × N is called structured if its entries depend on only O(N) parameters.

• Most famous example – the fast Fourier transform


Fourier Analysis• Def.: mathematical techniques for

breaking up a signal into its components (sinusoids)

• Jean Baptiste Joseph Fourier (1768-1830)

• can represent any continuous periodic signal as a sum of sinusoidal waves

=

+

+


Notation

Fourier transform f(t) -> F[k]

F[k] =

For f, periodic with period p

f (t) e- 2 π i k t/p d t1/p ∫0

p

Inverse Fourier transform F[k] -> f(t)

F[k] e 2 π i k t/pΣk in Z

f(t) =

= f (t) cos(2 π k t/p) dt1/p ∫0

p

f (t) sin(2 π k t/p) dt− i / p∫0

p


-3 -2 -1 1 2 3

2

4

6

8

10

-2 -1 1

-1

-0.5

0.5

1

-3 -1 1 3

10

f(t) φ[n] = f(n h) h

DFTf (t) e- 2 π i k t/p d t∫0

p φ[n] e -2 π i k nh /pΣn =0

N-1


DFT and its inverse for periodic discrete data

Φ[k] =

This is automatically periodic in k with period N

Inverse is like Fourier series, but with only p terms

p = N hφ[n] e -2 π i k n h /pΣn =0

N-1

φ[n] e -2 π i k n / N= Σn=0

N-1

© Gumerov & Duraiswami, 2002 - 2006fF = FN

Discrete time Numerical Fourier Analysis

DFT is really just a matrix multiplication!

0 10 20 30 40 50 600

10

20

30

40

50

60

Freq

. ind

ex

time index

e -2 π i k m/N f[k]Σk =0

F [m] = 1/Ν

f[0]

f[1]

f[2]...

f[N-1]

F[0]

F[1]

F[2]...

F[N-1]

=

N-1


Fourier Matrices


Primitive Roots of Unity

• Example: The complex number e2pi/n is a primitive n-th root of unity, where

Real

Imaginary

w1=cos(π/4) + i

45o

cos(π/4)

w3

w5

w4

w7

w8

w6

w2 2

1

2 2

10 2 3 ( 1)

0

1. = 1

2. cos2 sin 2 1

3. S= ...... 0

in

nin in

njp j j j j n

p

e

e e i

π

π π

ω

ω π π

ω ω ω ω ω ω−

−

=

≠

⎛ ⎞= = = + =⎜ ⎟

⎝ ⎠

= + + + + + =∑

Check: if properties are satisfied

1−=i

n complex roots of unity equally spaced around the circle of unit radius centered at the origin of the complex plane.

A number ω is a primitive n-th root of unity, for n>1, ifωn = 1The numbers 1, ω, ω2, …, ωn-1 are all distinct


Roots of Unity: Properties• Property 1: Let ω be the principal nth root of unity. If n > 0,

then ωn/2 = -1.– Proof: ω = e 2π i / n ⇒ ωn/2 = e π i = -1. (Euler's formula)– Reflective Property: – Corollary: ωk+n/2= -ωk.

• Property 2: Let n > 0 be even, and let ω and ν be the principal nth and (n/2)th roots of unity. Then (ωk ) 2 = ν k .– Proof: (ωk ) 2 = e (2k)2π i / n = e (k) 2π i / (n / 2) = ν k .– Reduction Property: If ω is a primitive (2n)-th root of unity, then

ω2 is a primitive n-th root of unity.


• L3: Let n > 0 be even. Then, the squares of the n complex nth roots of unity are the n/2 complex (n/2) th

roots of unity.– Proof: If we square all of the nth roots of unity, then each (n/2)

th root is obtained exactly twice since:• L1 ⇒ ωk + n / 2 = - ωk

• thus, (ωk + n / 2) 2 = (ωk ) 2

• L2 ⇒ both of these = ν k

• ωk + n / 2 and ωk have the same square

• Inverse Property: If ω is a primitive root of unity, then ω -1=ωn-1

– Proof: ωωn-1=ωn=1


Fast Fourier Transform• Presented by Cooley and Tukey in 1965, but invented several

times, including by Gauss (1809) and Danielson & Lanczos(1948)

• Danielson Lanczos lemma


• So far we have seen what happens on the right hand side• How about the left hand side?• When we split the sums in two we have two sets of sums

with N/2 quantities for N points.• So the complexity is N2/2 + N2/2 =N2

• So there is no improvement• Need to reduce the sums on the right hand side as well

– We need to reduce the number of sums computed from 2N to a lower number

– Notice that the values corresponding to k and k+N/2 will be the same.

– The transforms Fek and Fo

k are periodic in k with length N/2. – So we need only compute half of them!


FFT• So DFT of order N can be expressed as sum of two

DFTs of order N/2 evaluated at N/2 points• Does this improve the complexity?• Yes (N/2)2+(N/2)2 =N2/2< N2

• But we are not done ….• Can apply the lemma recursively

• Finally we have a set of one point transforms• One point transform is identity


FFT Algorithm

if (n == 1) // n is a power of 2return a0

ω ← e 2π i / n

(e0,e1,e2,...,en/2-1) ← FFT(n/2, a0,a2,a4,...,an-2)(d0,d1,d2,...,dn/2-1) ← FFT(n/2, a1,a3,a5,...,an-1)

for k = 0 to n/2 - 1yk ← ek + ωk dk

yk+n/2 ← ek - ωk dk

return (y0,y1,y2,...,yn-1)

FFT (n, a0, a1, a2, . . . , an-1)

O(n) complex multiplies if we pre-compute ωk.

)log()()()2/(2)( nnOnTnOnTnT =⇒+=


Complexity• Each Fk is a sum of log2 N transforms and (factors)• There are N Fk s• So the algorithm is O(N log2 N)• This is a recursive algorithm


FFTtxfunction y = ffttx(x)%FFTTX Textbook Fast Finite

Fourier Transform.% FFTTX(X) computes the same

finite Fourier transform as FFT(X).% The code uses a recursive divide

and conquer algorithm for% even order and matrix-vector

multiplication for odd order.% If length(X) is m*p where m is

odd and p is a power of 2, the% computational complexity of this

approach is O(m^2)*O(p*log2(p)).

x = x(:);n = length(x);omega = exp(-2*pi*i/n);

if rem(n,2) == 0% Recursive divide and conquerk = (0:n/2-1)';w = omega .^ k;u = ffttx(x(1:2:n-1));v = w.*ffttx(x(2:2:n));y = [u+v; u-v];

else% The Fourier matrix.j = 0:n-1;k = j';F = omega .^ (k*j);y = F*x;

end


Scrambled Output of the FFT

a0, a1, a2, a3, a4, a5, a6, a7

a1, a3, a5, a7a0, a2, a4, a6

a3, a7a1, a5a0, a4 a2, a6

a0 a4 a2 a6 a1 a5 a3 a7

"bit-reversed" order


FFT and IFFT

The FFT has revolutionized many applications by reducing the complexity by a factor of almost n

Can relate many other matrices to the Fourier Matrix


Circulant Matrices

Toeplitz Matrices

Hankel Matrices

Vandermonde Matrices


• Modern signal processing very strongly based on the FFT

• One of the defining inventions of the 20th century


• Introduced by Rokhlin & Greengard in 1987• Called one of the 10 most significant advances in computing of the

20th century• Speeds up matrix-vector products (sums) of a particular type

• Above sum requires O(MN) operations.• For a given precision ε the FMM achieves the evaluation in O(M+N)

operations.

Fast Multipole Methods (FMM)


• Can accelerate matrix vector products – Convert O(N2) to O(N log N)

• However, can also accelerate linear system solution– Use iterative methods that take k steps and use a matrix vector

product at each step– Convert O(N3) to O(kN log N)


A very simple algorithm• Not FMM, but has some key ideas• Consider

S(xi)=∑j=1N αj (xi – yj)2 i=1, … ,M

• Naïve way to evaluate the sum will require MN operations• Instead can write the sum as

S(xi)=(∑j=1N αj )xi

2 + (∑j=1N αjyj

2) -2xi(∑j=1N αjyj )

– Can evaluate each bracketed sum over j and evaluate an expression of the type

S(xi)=β xi2 + γ -2xiδ

– Requires O(M+N) operations

• Key idea – use of analytical manipulation of series to achieve faster summation


Approximate evaluation

• FMM introduces another key idea or “philosophy”– In scientific computing we almost never seek exact answers– At best, “exact” means to “machine precision”

• So instead of solving the problem we can solve a “nearby”problem that gives “almost” the same answer

• If this “nearby” problem is much easier to solve, and we can bound the error analytically we are done.

• In the case of the FMM– Manipulate series to achieve approximate evaluation– Use analytical expression to bound the error

• FFT is exact … FMM can be arbitrarily accurate


Some FMM algorithms

• Molecular and stellar dynamics– Computation of force fields and dynamics

• Interpolation with Radial Basis Functions• Solution of acoustical scattering problems

– Helmholtz Equation

• Electromagnetic Wave scattering– Maxwell’s equations

• Fluid Mechanics: Potential flow, vortex flow– Laplace/Poisson equations

• Fast nonuniform Fourier transform


Applications – I Interpolation• Given a scattered data set with points and values {xi, fi}• Build a representation of the function f(x)

– That satisfies f(xi)=fi

– Can be evaluated at new points

• One approach use “radial-basis functions”f(x)=∑i

N αi R(x-xi) + p(x)fj=∑i

N αi R(xj-xi) + p(xj)• Two problems

– Determining αi

– Knowing αi determine the product at many new points xj

• Both can be solved via FMM (Cherrie et al, 2001; Gumerov & Duraiswami, 2006)


Applications 2• RBF interpolation

Cherrie et al 2001

Original unstructured dataRBF interpolationto regular spatial grid

Gumerov & Duraiswami, 2006


Applications 3• Sound scattering off rooms and bodies

– Need to know the scattering properties of the head and body (our interest)

( ) ( ) ( ) ( ) ( ) ( ), ;, ;

y

yy y

p y G x y kC x p x G x y k p y d

n nΓ

⎡ ⎤∂ ∂= − Γ⎢ ⎥∂ ∂⎣ ⎦

∫ ( ),4

ikeGπ

−

=−

x y

x yx y

2 2 0P k P∇ + =P i P gn

σ∂+ =

∂lim 0r

Pr ikPr→∞

∂⎛ ⎞− =⎜ ⎟∂⎝ ⎠


Incident plane wave

Mesh: 132072 elements65539 vertices

BEM/FMM computationsof acoustic scatteringfrom complex shape objects

250 Hz (kD=0.96) 2.5 kHz (kD=9.6) 25 kHz (kD=96)

1

-9

0

-5

dB

exp(ikz)


kD=0.96

Sound Pressure


kD=9.6

Sound Pressure


kD=96

Sound Pressure


125 randomly oriented ellipsoids127000 vertices and 253500 elements(1016 vertices and 2028 elements per ellipsoid)

Potential


1000 randomly oriented ellipsoids488000 vertices and 972000 elements

(488 vertices and 972 elements per ellipsoid) Potential



EM wave scattering• Similar to acoustic scattering• Send waves and measure scattered waves• Attempt to figure out object

from the measured waves• Need to know “Radar

cross-section”• Many applications

– Light scattering– Radar– Antenna design– ….

Darve 2001


Molecular and stellar dynamics• Many particles distributed in

space• Particles exert a force on each

other• Simplest case force obeys an

inverse-square law (gravity, coulombic interaction)

2

2 31

( ),

| |

Ni ji

i i i jj i jj i

d F F q qdt =

≠

−= =

−∑x xxx x


Fluid mechanics• Incompressible Navier Stokes Equation

• Laplace equation for potential and Poisson equation for vorticity

• Solved via particle methods …

Au ×∇+∇= φ( ) uuuu

u

2

0

∇=∇+⎟⎠⎞

⎜⎝⎛ ∇⋅+

∂∂

=⋅∇

µρ pt


Vortex Methods


Applications• Digital Signal Processing: analyzing and manipulating real-world

signals using a computer– Toys and consumer electronics– Speech recognition – Audio/video compression– Medical imaging– Communications– Radar

Documents

Fast Multipole Methods: Fundamentals & Applications › ... › Lecture1_Introduction_06.pdf · © Gumerov & Duraiswami, 2002 - 2006 Fast Multipole Methods: Fundamentals & Applications