



Page 1

Krylov methods for the computation of matrix functions

Jitse Niesen (University of Leeds)

in collaboration with Will Wright (Melbourne University)

Heriot-Watt University, March 2010

Page 2

Outline

- Definition of matrix functions
  - via series
  - via diagonalization
  - via contour integration
- Motivation
  - Centrality measure
  - Exponential integrators
- Direct methods for computation of the matrix exponential
  - Padé approximation with scaling and squaring
- Krylov method for computation of matrix functions
  - The basic idea
  - Implementation (sketch)
  - Leja point interpolation
- Experiment: Heston equation for pricing derivatives
- Conclusions

Page 3

Matrix functions via series

Given a scalar function f : C → C with Taylor series

f(x) = a0 + a1 x + (1/2) a2 x^2 + (1/6) a3 x^3 + · · · ,

the matrix function f : C^(n×n) → C^(n×n) is defined by

f(X) = a0 I + a1 X + (1/2) a2 X^2 + (1/6) a3 X^3 + · · · .

This talk concentrates on the matrix exponential

exp(X) = Σ_{n=0}^∞ (1/n!) X^n.

Matrix functions inherit many properties of the corresponding scalar functions; e.g., the solution of X′ = AX is X(t) = exp(At) X(0).

Definition needs to be adapted if scalar function f has singularities.
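The series definition translates directly into code. A minimal sketch (Python with NumPy assumed; truncation at 30 terms is only adequate when ||X|| is modest):

```python
import numpy as np

def expm_series(X, terms=30):
    """Matrix exponential via the truncated Taylor series
    exp(X) = I + X + X^2/2! + X^3/3! + ...  (converges for any X,
    but only efficiently when ||X|| is small)."""
    n = X.shape[0]
    result = np.eye(n)
    term = np.eye(n)
    for k in range(1, terms):
        term = term @ X / k          # term is now X^k / k!
        result = result + term
    return result

# Sanity check against the scalar case: a 1x1 matrix [a] gives e^a.
A = np.array([[0.5]])
print(np.allclose(expm_series(A), np.exp(0.5)))    # True

# A nilpotent example: N^2 = 0, so exp(N) = I + N exactly.
N = np.array([[0.0, 1.0], [0.0, 0.0]])
print(np.allclose(expm_series(N), np.eye(2) + N))  # True
```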

Page 4

Matrix functions via diagonalization

Scalar function: f(x) = a0 + a1 x + (1/2) a2 x^2 + (1/6) a3 x^3 + · · ·
Matrix function: f(X) = a0 I + a1 X + (1/2) a2 X^2 + (1/6) a3 X^3 + · · ·

Matrix function on diagonal matrix reduces to scalar function

f( diag(d1, d2, . . . , dn) ) = diag( f(d1), f(d2), . . . , f(dn) ).

If X can be diagonalized, X = V D V^{-1}, then

f(X) = Σ_{n=0}^∞ an (V D V^{-1})^n = Σ_{n=0}^∞ an V D^n V^{-1} = V f(D) V^{-1}.

This yields another definition of matrix functions. It can be extended to non-diagonalizable matrices by continuity.
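The identity f(X) = V f(D) V^{-1} is a one-liner once the eigendecomposition is available. A sketch (NumPy assumed; valid only for diagonalizable X):

```python
import numpy as np

def funm_diag(X, f):
    """Evaluate a matrix function via diagonalization:
    X = V D V^{-1}  =>  f(X) = V f(D) V^{-1}.
    Only valid when X is diagonalizable."""
    d, V = np.linalg.eig(X)
    return V @ np.diag(f(d)) @ np.linalg.inv(V)

# Symmetric matrices are always diagonalizable; check exp on a
# small example with eigenvalues +1 and -1.
X = np.array([[0.0, 1.0], [1.0, 0.0]])
E = funm_diag(X, np.exp)
# exp of this X is [[cosh 1, sinh 1], [sinh 1, cosh 1]].
expected = np.array([[np.cosh(1), np.sinh(1)],
                     [np.sinh(1), np.cosh(1)]])
print(np.allclose(E, expected))   # True
```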

Page 5

Matrix functions via contour integration

The third definition of matrix functions is

f(X) = (1/2πi) ∮ f(z) (zI − X)^{-1} dz,

where the integral is along a contour encircling the eigenvalues of X.

Equivalent to the previous definition by the residue theorem; the integrand has poles at the eigenvalues.

This definition is theoretically convenient because it always works.

References

1. Gantmacher, The Theory of Matrices, 1959.
2. Horn & Johnson, Topics in Matrix Analysis, 1991.
3. N. Higham, Functions of Matrices, 2008.

Page 6


Page 7

Centrality measures of a network

A graph is a collection of nodes, some of which are connected by edges. We want to know which nodes are “central”.

One centrality measure is the degree: the number of edges of a node. This is a local measure; we wish to extend it to incorporate global information.

Let X be the adjacency matrix of a graph: xij is 1 if there is an edge between nodes i and j and 0 otherwise.

The (i, j) entry of X^2 is Σ_k xik xkj. This counts the number of paths of length 2 from i to j.

The (i, i) entry of X^n counts the number of “n-cycles” that i is on. In particular, the (i, i) entry of X^2 is the degree.

Estrada & Rodríguez-Velázquez (2005) propose to use the diagonal entries of exp(X) = Σ_n (1/n!) X^n as a centrality measure.
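This centrality measure can be sketched with the same truncated series (NumPy assumed; the three-node path graph and the helper name are illustrative):

```python
import numpy as np

def subgraph_centrality(A, terms=30):
    """Diagonal of exp(A) for an adjacency matrix A, computed with
    the truncated series sum_n A^n / n!
    (Estrada & Rodriguez-Velazquez centrality)."""
    n = A.shape[0]
    E = np.eye(n)
    term = np.eye(n)
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    return np.diag(E)

# Path graph 1 -- 2 -- 3: the middle node lies on more closed walks,
# so it should score highest.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
c = subgraph_centrality(A)
print(int(np.argmax(c)))   # 1  (the middle node, 0-indexed)
```

Note the slide's claim checks out here too: the diagonal of A^2 is the degree vector (1, 2, 1).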

Page 8

Matrix functions solve differential equations

The solution of x′ = ax, x(0) = x0 is x(t) = exp(at) x0.
The solution of x′ = Ax, x(0) = x0 is x(t) = exp(tA) x0.

The solution of x′ = ax + b, x(0) = 0 is

x(t) = ∫_0^t exp(aτ) b dτ = ((exp(at) − 1)/a) b = t ϕ1(at) b,

where ϕ1(z) = (exp(z) − 1)/z.

The solution of x′ = Ax + b, x(0) = 0 is x(t) = t ϕ1(tA) b.

The solution of x′ = Ax + ct, x(0) = 0 is x(t) = t^2 ϕ2(tA) c, where ϕ2(z) = (exp(z) − 1 − z)/z^2.

These results can be combined by superposition.
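The scalar ϕ functions, with their removable singularities at 0, can be checked directly against the closed-form solutions above (plain Python; the test values a, b, t are arbitrary):

```python
import math

def phi1(z):
    # phi_1(z) = (e^z - 1)/z ; phi_1(0) = 1
    return 1.0 if z == 0 else math.expm1(z) / z

def phi2(z):
    # phi_2(z) = (e^z - 1 - z)/z^2 ; phi_2(0) = 1/2
    return 0.5 if z == 0 else (math.expm1(z) - z) / z**2

# Verify the scalar ODE x' = a x + b, x(0) = 0  =>  x(t) = t phi1(a t) b.
a, b, t = -2.0, 3.0, 0.7
x_exact = (math.exp(a * t) - 1.0) / a * b    # closed form from the slide
print(abs(t * phi1(a * t) * b - x_exact) < 1e-12)   # True
```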

Page 9

Exponential Euler method

Solution of x′(t) = Lx(t) + v0 + t v1 + (1/2) t^2 v2 + · · · , x(0) = x0, where x(t), x0, v0, v1, . . . are vectors and L is a matrix, is

x(t) = exp(tL) x0 + t ϕ1(tL) v0 + t^2 ϕ2(tL) v1 + t^3 ϕ3(tL) v2 + · · ·

Consider x′ = Lx + N(x) (L = linear, N = nonlinear).

Replace the nonlinear term by the constant N(x(0)) ≈ N(x(t)) (for small t) and use the results from the previous slide:

x(t) = exp(tL) x(0) + t ϕ1(tL) N(x(0)).

This leads to the exponential Euler method

xn+1 = exp(hL) xn + h ϕ1(hL) N(xn).

This method is not affected by stiffness in L. (Certaine 1960)
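One step of the method is a few lines in the scalar case (plain Python sketch; for a constant "nonlinearity" N(x) = b the step reproduces the exact solution, stiffness notwithstanding):

```python
import math

def phi1(z):
    # phi_1(z) = (e^z - 1)/z ; phi_1(0) = 1
    return 1.0 if z == 0 else math.expm1(z) / z

def exp_euler_step(x, h, L, N):
    """One step of the exponential Euler method (scalar version):
    x_{n+1} = exp(hL) x_n + h phi1(hL) N(x_n)."""
    return math.exp(h * L) * x + h * phi1(h * L) * N(x)

# When the nonlinearity is actually constant, N(x) = b, the method is
# exact: x(h) = exp(hL) x0 + h phi1(hL) b solves x' = Lx + b exactly.
L, b, x0, h = -100.0, 2.0, 1.0, 0.5          # very stiff L, large step
x1 = exp_euler_step(x0, h, L, lambda x: b)   # N ignores x here
x_exact = math.exp(h * L) * x0 + (math.exp(h * L) - 1.0) / L * b
print(abs(x1 - x_exact) < 1e-12)   # True, despite h*|L| = 50
```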

Page 10


Page 11

Computation using series

The series exp(X) = I + X + (1/2) X^2 + (1/6) X^3 + · · · is one method to compute the matrix exponential.

The series converges slowly away from the origin, so combine it with scaling and squaring: set Y = 2^{-k} X, compute exp(Y), and square the result k times. This uses the identity exp(X) = (exp((1/2)X))^2.

A further improvement is to use Padé approximation instead of polynomial approximation:

exp(X) ≈ (c0 I + c1 X + c2 X^2 + c3 X^3) (d0 I + d1 X + d2 X^2 + d3 X^3)^{-1},

where the coefficients ci and di are chosen so that the error is as small as possible for small X (in practice, we use more coefficients).

Padé with scaling and squaring is the most popular general-purpose method for computing the matrix exponential.

(Lawson 1967; Higham 2005; Al-Mohy & Higham 2009)
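A sketch of scaling and squaring (NumPy assumed; a truncated Taylor series stands in for the Padé approximant used in production codes, and the norm-based choice of k is a simple heuristic):

```python
import numpy as np

def expm_ss(X, terms=12):
    """Matrix exponential by scaling and squaring with a truncated
    Taylor series on the scaled matrix."""
    # Scale so that ||2^{-k} X||_1 <= 1/2.
    normX = np.linalg.norm(X, 1)
    k = max(0, int(np.ceil(np.log2(max(normX, 1e-16)) + 1)))
    Y = X / 2**k
    # Short series on the (small-norm) scaled matrix.
    n = X.shape[0]
    E = np.eye(n)
    term = np.eye(n)
    for j in range(1, terms):
        term = term @ Y / j
        E = E + term
    # Undo the scaling: exp(X) = exp(Y)^(2^k).
    for _ in range(k):
        E = E @ E
    return E

# Check on a matrix whose exponential is known in closed form:
# X = t * [[0,-1],[1,0]] exponentiates to a rotation by angle t.
t = 3.0
X = t * np.array([[0.0, -1.0], [1.0, 0.0]])
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
print(np.allclose(expm_ss(X), R))   # True
```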

Page 12

Computing matrix functions by diagonalization

If X = V D V^{-1} then f(X) = V f(D) V^{-1}. So compute the matrix function by first diagonalizing the matrix.

This works well for some matrices, in particular symmetric matrices. However, it fails if X is (close to) non-diagonalizable.

Computing matrix functions by integration

Use f(X) = (1/2πi) ∮ f(z) (zI − X)^{-1} dz and evaluate the contour integral with the trapezium rule.
(Kassam & Trefethen 2005; Schmelzer & Trefethen 2007)

This has its appeal, but it is expensive, especially if the eigenvalues are far apart or the matrix is very non-normal.

(Ashi, Cummings & Matthews 2009)
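The trapezium-rule evaluation can be sketched on a circle enclosing the spectrum (NumPy assumed; the circle parameters and node count N are illustrative choices, and the trapezium rule converges exponentially for this periodic analytic integrand):

```python
import numpy as np

def funm_contour(X, f, center, radius, N=64):
    """Evaluate f(X) = (1/2 pi i) contour-integral of f(z) (zI - X)^{-1} dz
    with the trapezium rule on a circle enclosing the eigenvalues of X."""
    n = X.shape[0]
    F = np.zeros((n, n), dtype=complex)
    for j in range(N):
        theta = 2 * np.pi * j / N
        z = center + radius * np.exp(1j * theta)
        # dz = i (z - center) d(theta); the i cancels the 1/(2 pi i).
        F += f(z) * (z - center) * np.linalg.inv(z * np.eye(n) - X)
    return F / N

X = np.array([[0.0, 1.0], [1.0, 0.0]])   # eigenvalues +1 and -1
E = funm_contour(X, np.exp, center=0.0, radius=2.0)
expected = np.array([[np.cosh(1), np.sinh(1)],
                     [np.sinh(1), np.cosh(1)]])
print(np.allclose(E.real, expected))   # True (imaginary part ~ 0)
```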

Page 13

Outline

I Definition of matrix functionsI via seriesI via diagonalizationI via contour integration

I MotivationI Centrality measureI Exponential integrators

I Direct methods for computation of matrix exponentialI Pade approximation with scaling and squaring

I Krylov method for computation of matrix functionsI The basic ideaI Implementation (sketch)I Leja point interpolation

I Experiment: Heston equation for prizing derivatives

I Conclusions

Page 14

The idea behind Krylov methods

In exponential integrators, and other applications, X is a large matrix. On the other hand, we need not f(X) but f(X)b.

A matrix-free method uses only matrix–vector products. This leads to the Krylov subspace

Km(X, b) = span{b, Xb, X^2 b, . . . , X^{m−1} b}.

This basis is very ill-conditioned, so use Gram–Schmidt (here called Arnoldi or Lanczos) to get an orthonormal basis:

Km(X, b) = span{v1, v2, v3, . . . , vm}.

Put the vectors vj in an n-by-m matrix Vm (where n is the size of X). The basis transformation is encoded in the m-by-m matrix Hm.

Page 15

The idea behind Krylov methods II

Considering the matrices as linear mappings:

- X is a map from the “big” space C^n to itself.
- Vm^T is the projection from C^n onto the “small” space Km(X, b).
- Vm is the extension from Km(X, b) back to C^n; Vm^T Vm = Id.
- Hm is the projection of the action of X onto Km(X, b); Hm = Vm^T X Vm.

Krylov methods use the approximation X ≈ Vm Hm Vm^T.

In the context of matrix functions, use

f(X) = Σ_{k=0}^∞ ak X^k ≈ Σ_{k=0}^∞ ak (Vm Hm Vm^T)^k = Σ_{k=0}^∞ ak Vm (Hm)^k Vm^T = Vm f(Hm) Vm^T.

The Krylov approximation replaces the big matrix X by the small matrix Hm.
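The whole construction fits in a short sketch: Arnoldi builds Vm and Hm, and exp is evaluated only on the small m-by-m matrix (NumPy assumed; no breakdown handling, and the test matrix, a 1D Laplacian, is an illustrative choice):

```python
import numpy as np

def arnoldi(X, b, m):
    """Arnoldi: orthonormal basis Vm of Km(X, b) plus the small
    matrix Hm = Vm^T X Vm, built one column at a time.
    (No breakdown handling -- fine for this demo.)"""
    n = len(b)
    V = np.zeros((n, m))
    H = np.zeros((m, m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = X @ V[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        if j + 1 < m:
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
    return V, H

def krylov_expm_times_b(X, b, m):
    """exp(X) b  ~  ||b|| Vm exp(Hm) e1, with exp(Hm) evaluated
    on the small matrix (here by diagonalization)."""
    V, H = arnoldi(X, b, m)
    d, W = np.linalg.eig(H)
    expH = (W @ np.diag(np.exp(d)) @ np.linalg.inv(W)).real
    return np.linalg.norm(b) * (V @ expH[:, 0])

# 1D Laplacian (symmetric, 100x100); m = 30 << n already suffices.
n = 100
X = -2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
b = np.ones(n)
approx = krylov_expm_times_b(X, b, m=30)
```

Note that only matrix–vector products with X are needed, which is the point: X never has to be formed densely or factorized.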

Page 16

Steps towards a practical method

- Estimate the error in the Krylov approximation.
- Use the error estimate to adaptively choose the dimension m of the Krylov subspace (and to correct the approximation).
- Use a recursion relation to combine several terms in one:

  exp(X) b0 + ϕ1(X) b1 + ϕ2(X) b2 = w0 + ϕ2(X) w2.

- Compute the ϕ function using the trick

  exp([X b; 0 0]) = [exp(X) ϕ1(X)b; 0 1].

- Problem: an a priori estimate suggests that the optimal m is proportional to the spectral radius ρ(X). Thus, introduce time stepping similar to scaling-and-squaring.

This is implemented in the matlab code phipm.
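The augmented-matrix trick is easy to verify on a toy example (NumPy sketch, not the phipm implementation; a diagonalization-based expm stands in for a real one):

```python
import numpy as np

def phi1_times_b(X, b, expm):
    """Compute phi_1(X) b via the augmented-matrix trick
    exp([[X, b], [0, 0]]) = [[exp(X), phi_1(X) b], [0, 1]]."""
    n = X.shape[0]
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = X
    M[:n, n] = b
    return expm(M)[:n, n]       # top-right block of the exponential

def expm_diag(M):
    """Toy expm via diagonalization, just for the demonstration."""
    d, V = np.linalg.eig(M)
    return (V @ np.diag(np.exp(d)) @ np.linalg.inv(V)).real

# Scalar sanity check: phi_1(a) b = (e^a - 1)/a * b.
a, b = -1.5, 2.0
got = phi1_times_b(np.array([[a]]), np.array([b]), expm_diag)
print(np.isclose(got[0], np.expm1(a) / a * b))   # True
```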

Page 17

Leja point interpolation

Krylov methods project the matrix function onto the Krylov subspace

Km(X, b) = span{b, Xb, X^2 b, . . . , X^{m−1} b}.

They approximate f(X)b by an element of Km(X, b). Thus, the approximation is of the form p(X)b, with p a polynomial. In fact, the Krylov method performs polynomial interpolation in the Ritz values (eigenvalues of Hm). This is good because the Ritz values approximate the eigenvalues of X. (Saad 1992)

But the Ritz values may not be the best interpolation points. If a bound on the eigenvalues of X is known, Leja points may be better; they show good performance in experiments.

(Caliari & Ostermann 2009; Tambue, Lord & Geiger 2010)
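Leja points themselves are cheap to generate by the usual greedy rule: each new point maximizes the product of distances to the points already chosen. A sketch on a discretized interval (NumPy assumed; the interval [-2, 0] and grid size are illustrative, and the interpolation step itself is not shown):

```python
import numpy as np

def leja_points(a, b, m, grid=2000):
    """Greedily select m (discrete) Leja points on [a, b]: start at the
    point of largest modulus, then repeatedly add the grid point that
    maximizes the product of distances to the points chosen so far."""
    zs = np.linspace(a, b, grid)
    pts = [zs[np.argmax(np.abs(zs))]]
    for _ in range(m - 1):
        prod = np.ones(grid)
        for p in pts:
            prod *= np.abs(zs - p)     # chosen points get product 0
        pts.append(zs[np.argmax(prod)])
    return np.array(pts)

pts = leja_points(-2.0, 0.0, 5)
print(pts[0], pts[1])   # -2.0 0.0  (the endpoints come first)
```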

Page 18

Outline

I Definition of matrix functionsI via seriesI via diagonalizationI via contour integration

I MotivationI Centrality measureI Exponential integrators

I Direct methods for computation of matrix exponentialI Pade approximation with scaling and squaring

I Krylov method for computation of matrix functionsI The basic ideaI Implementation (sketch)I Leja point interpolation

I Experiment: Heston equation for prizing derivatives

I Conclusions

Page 19

Pricing derivatives

The Black–Scholes model assumes that the value St of the underlying asset follows a geometric Brownian motion:

dSt = μ St dt + σ St dWt.

The no-arbitrage principle (no strategy guarantees a profit) implies that the price u of the derivative, given s (the price of the asset), satisfies

∂u/∂t + (1/2) σ^2 s^2 ∂^2u/∂s^2 + r s ∂u/∂s − r u = 0,

where r is the risk-free interest rate.

The Heston model assumes that volatility σ is not constant:

dSt = μ St dt + √νt St dWt^S,
dνt = κ(η − νt) dt + λ √νt dWt^ν.

The parameters are such that νt > 0.

Page 20

Heston PDE

Heston model + no-arbitrage principle + magic yields

∂u/∂t = (1/2) ν s^2 ∂^2u/∂s^2 + ρλνs ∂^2u/∂ν∂s + (1/2) λ^2 ν ∂^2u/∂ν^2 + r s ∂u/∂s + κ(η − ν) ∂u/∂ν − r u.

Add boundary conditions for modelling European option.

Use standard second-order finite differences and incorporate the boundary conditions to get an ODE of the form

u′ = Au + v1,   u(0) = v0.

The matrix A has size 5100 and 44800 non-zero elements. The ODE can be solved by evaluating ϕ-functions.

In ’t Hout (2007) advocates the use of ADI. We compare his ADI schemes with the standard code ode15s and our phipm (in matlab).
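This ODE has the closed-form solution u(t) = exp(tA) v0 + t ϕ1(tA) v1 from the earlier slides. A sketch on a tiny symmetric stand-in for A (NumPy assumed; the real Heston matrix is 5100-dimensional and sparse, so phipm uses Krylov subspaces rather than the diagonalization used here):

```python
import numpy as np

def solve_linear_ode(A, v0, v1, t):
    """Exact solution of u' = Au + v1, u(0) = v0:
    u(t) = exp(tA) v0 + t phi1(tA) v1,  phi1(z) = (e^z - 1)/z.
    Evaluated by diagonalization of a symmetric A (tiny demo only)."""
    d, Q = np.linalg.eigh(A)
    td = t * d
    e = np.exp(td)
    safe = np.where(td == 0.0, 1.0, td)            # guard phi1 at 0
    p1 = np.where(np.abs(td) > 1e-12, (e - 1.0) / safe, 1.0)
    return Q @ (e * (Q.T @ v0)) + t * (Q @ (p1 * (Q.T @ v1)))

# Hypothetical small stand-in for the finite-difference matrix A.
n = 5
A = -2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)   # 1D Laplacian
v0 = np.ones(n)
v1 = np.linspace(0.0, 1.0, n)
u = solve_linear_ode(A, v0, v1, 0.3)
```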

Page 21

Error vs time

[Log–log plot of maximum error against CPU time for phipm, Crank–Nicolson, Douglas, Craig, Hundsdorfer, and ode15s.]

Page 22

Krylov dimension m and step size τ

[Two panels against time t ∈ [0, 0.4]: the Krylov dimension m (between 30 and 40) and the step size τ (between 2×10^-3 and 4×10^-3).]

Page 23

Error estimate (blue) and actual error (red)

[Plot of the error estimate and the actual error against time t ∈ [0, 0.4]; both remain below 4×10^-7.]

Page 24

Conclusion

- Our matlab code looks good, but it needs more testing.
- Current work:
  - Rewrite the code in a compiled language (C++).
  - Investigate instability; compare the exponential trick with the direct method.
  - Extend the tests to exotic options (American, Asian, barrier).
- Other work shows our code can be used in exponential integrators to solve semi-linear PDEs. Excellent with spectral discretization, promising for finite differences yielding mildly stiff problems, disappointing for very stiff problems.
- Compare with other methods for evaluating ϕ functions, especially Leja point interpolation and also RD-rational approximations (Moret & Novati 2004).
- For more details, see: Niesen & Wright, A Krylov subspace algorithm for evaluating the ϕ-functions appearing in exponential integrators, arXiv:0907.4631.
- The matlab code is available at my home page.