56
Applications of trace estimation techniques Yousef Saad Department of Computer Science and Engineering University of Minnesota HPCSE 2017 - Solan, Czech republic May 22-25, 2017

Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

  • Upload
    others

  • View
    11

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Applications of trace estimation techniquesYousef Saad

Department of Computer Scienceand Engineering

University of Minnesota

HPCSE 2017 - Solan, Czech republicMay 22-25, 2017

Page 2: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

LARGE SYSTEMSH2 / HSS matrices

Ax=b GraphPartitioning

Model reduction A x = xλ DomainDecomposition

−∆ u = f

Preconditioning

Sparse matrices

Page 3: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Tran

sla

te H2 / HSS matrices

Conquer

Divide &

Data Sparsity

Ax=b GraphPartitioning

Model reduction A x = xλ DomainDecomposition

−∆ u = f

Preconditioning

TΣA = U V PCA ClusteringDimension

Reduction

Semi−Supervised

Learning

Regression

Sparse matricesLARGE SYSTEMS

LASSO

GraphLaplaceans

BIG DATA!

Page 4: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Introduction

ä Focus of this talk: Spectral densities

ä Spectral density == function that provides a global repre-sentation of the spectrum of a Hermitian matrix

ä Known in solid state physics as ‘Density of States’ (DOS)

ä Very useful in physics

ä Almost unknown (as a tool) in numerical linear algebra

Outline:

1. general introduction, 2. trace estimation, 3. the DOS,4. how to compute it, 5. how to use it (applications)

hpcse17 4

Page 5: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Introduction: A few examples

Problem 1: Compute Tr[inv[A]] the trace of the inverse.

ä Arises in cross validation methods [Stats]

ä Motivation for the work [Golub & Meurant, “Matrices, Mo-ments, and Quadrature”, 1993, Book with same title in 2009]

Problem 2: Compute Tr [ f (A)], f a certain function

ä Arises in many applications in Physics. Example:

ä Stochastic estimations of Tr ( f(A)) extensively used by quan-tum chemists to estimate Density of States, see

[H. Röder, R. N. Silver, D. A. Drabold, J. J. Dong, Phys. Rev. B.55, 15382 (1997)]. Will be covered in detail later

hpcse17 5

Page 6: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Problem 3: Compute diag[inv(A)] the diagonal of the inverse

ä Dynamic Mean Field Theory [DMFT, motivation for our workon this topic]. Related approach: Non Equilibrium Green’s Func-tion (NEGF) approach used to model nanoscale transistors.

ä Uncertainty quantification: diagonal of the inverse of a co-variance matrix needed [Bekas, Curioni, Fedulova ’09]

Problem 4: Compute diag[ f (A)] ; f = a certain function.

ä Arises in density matrix approaches in quantum modeling

f(ε) =1

1 + exp(ε−µkBT

)

Here, f = Fermi-Dirac operatorNote: when T → 0 then f →a step function.

ä Linear-Scaling methods

hpcse17 6

Page 7: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Problem 5: Estimate the numerical rank.

ä Amounts to counting the number of singular values above acertain threshold τ == Trace (φτ(ATA))..

φτ(t) is a certain step function.

Problem 6: Estimate the log-determinant (common in statis-tics)

log det(A) = Trace(log(A)) =∑ni=1 log(λi).

.... many others

hpcse17 7

Page 8: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Important tool: Stochastic Estimator

ä To estimate diagonal of B = f(A) (e.g., B = A−1), let:

Notation:

• d(B) = diag(B) [matlab notation]

•� and �: Elementwise multiplication and divi-sion of vectors

• {vj}: Sequence of s random vectors

Result: d(B) ≈

s∑j=1

vj �Bvj

� s∑j=1

vj � vj

C. Bekas , E. Kokiopoulou & YS (’05); C. Bekas, A. Curioni, I.Fedulova ’09; ...

hpcse17 8

Page 9: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Trace of a matrix

ä For the trace - take vectors of unit norm and

Trace(B) ≈1

s

s∑j=1

vTj Bvj

ä Hutchinson’s estimator : take random vectors with compo-nents of the form±1/

√n [Rademacher vectors]

ä Extensively studied in literature. See e.g.: Hutchinson ’89;H. Avron and S. Toledo ’11; G.H. Golub & U. Von Matt ’97;Roosta-Khorasani & U. Ascher ’15; ...

hpcse17 9

Page 10: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Typical convergence curve for stochastic estimator

ä Estimating the diagonal of inverse of two sample matrices

0 100 200 300 400 500 600 700 800 900 10000

0.05

0.1

0.15

0.2

0.25

0.3

0.35

# sampling vectors

Rela

tive e

rror

Af23560Orsreg

1

hpcse17 10

Page 11: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Alternative: standard probing

ä Several names for same method: “probing”; “CPR”, “SparseJacobian estimators”,..

Basis of the method: Color columns of matrix so that no twocolumns of the same color overlap.

Entries of same color canbe computed with 1 matvec

ä Corresponds to color-ing graph of ATA.

ä For problem of diag(A)need only color graph of A

1 3 161

1

(1)

(3)

(12)

(15)

1

1

5 20

1

1

1

(5)

(13)

(20)

12 13

hpcse17 11

Page 12: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

In summary:

ä Probing much more powerful when f(A) is known to benearly sparse (e.g. banded)..

ä Approximate pattern (graph) can be obtained inexpensively

ä Generally just a handful of probing vectors needed – Canbe obtained by coloring graph

ä However:

ä Not as general: need f(A) to be ‘ ε – sparse ’

hpcse17 12

Page 13: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

References:

• J. M. Tang and YS, A probing method for computing thediagonal of a matrix inverse, Numer. Lin. Alg. Appl., 19 (2012),pp. 485–501.

See also (improvements)

• Andreas Stathopoulos, Jesse Laeuchli, and Kostas OrginosHierarchical Probing for Estimating the Trace of the Matrix In-verse on Toroidal Lattices SISC, 2012. [somewhat specific toLattice QCD ]

• E. Aune, D. P. Simpson, J. Eidsvik [Statistics and Comput-ing 2012] combine probing with stochastic estimation. Goodimprovements reported.

hpcse17 13

Page 14: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

DENSITY OF STATES & APPLICATIONS

Page 15: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Density of States

ä Formally, the Density Of States (DOS) of a matrix A is

φ(t) =1

n

n∑j=1

δ(t− λj),

where: • δ is the Dirac δ-function or Dirac distribution• λ1 ≤ λ2 ≤ · · · ≤ λn are the eigenvalues of A

ä DOS is also referred to as the spectral density

ä Note: number of eigenvalues in an interval [a, b] is

µ[a,b] =

∫ b

a

∑j

δ(t− λj) dt ≡∫ b

anφ(t)dt .

hpcse17 15

Page 16: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Issue: How to deal with distributions?

ä Highly ‘discontinuous’, not easy to handle numerically

ä Solution for practical and theoretical purposes: replace φ bya regularized (‘blurred’) version φσ:

φσ(t) =1

n

n∑j=1

hσ(t− λj),

Where, for example:

hσ(t) =1

(2πσ2)1/2e−

t2

2σ2.

−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 10

5

10

15

20

25

30

35

40

hσ (t), σ = 0.1

hpcse17 16

Page 17: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

ä Smoothed φ(t) can be viewed as a distribution function ==probability of finding eigenvalues of A in a given infinitesimalinterval near t.

ä In Solid-State physics, λi’s represent single-particle energylevels.

ä So the DOS represents # of levels per unit energy.

ä Many uses in physics

hpcse17 17

Page 18: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

ä How to select smoothing parameter σ? Example for Si2

0 10 20 30 400

0.01

0.02

0.03

0.04

0.05

κ = 1.75, σ = 0.35

t

φ(t)

0 10 20 30 400

0.01

0.02

0.03

0.04

0.05

κ = 1.30, σ = 0.52

t

φ(t)

0 10 20 30 400

0.01

0.02

0.03

0.04

0.05

κ = 1.15, σ = 0.71

t

φ(t)

ä Higher σ → smoother curveä But loss of detail ..ä Compromise: σ = h

2√

2 log(κ),

ä h = resolution, κ = parameter > 1

0 10 20 30 400

0.005

0.01

0.015

0.02

0.025

0.03

0.035

0.04

0.045

κ = 1.08, σ = 0.96

t

φ(t)

hpcse17 18

Page 19: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Computing the DOS: The Kernel Polynomial Method

ä Used by Chemists to calculate the DOS – see Silver andRöder’94 , Wang ’94, Drabold-Sankey’93, + others

ä Basic idea: expand DOS into Chebyshev polynomials

ä Use trace estimator [discovered independently] to get tracesneeded in calculations

ä Assume change of variable done so eigenvalues lie in [−1, 1].

ä Include the weight function in the expansion so expand:

φ̂(t) =√1− t2φ(t) =

√1− t2 ×

1

n

n∑j=1

δ(t− λj).

Then, (full) expansion is: φ̂(t) =∑∞k=0µkTk(t).

hpcse17 19

Page 20: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

ä Expansion coefficients µk are formally defined by:

µk =2− δk0π

∫ 1

−1

1√1− t2

Tk(t)φ̂(t)dt

=2− δk0π

∫ 1

−1

1√1− t2

Tk(t)√1− t2φ(t)dt

=2− δk0nπ

n∑j=1

Tk(λj).

ä Here 2− δk0 == 1 when k = 0 and == 2 otherwise.

ä Note:∑Tk(λi) = Trace[Tk(A)]

ä Estimate this, e.g., via stochastic estimator

ä Generate random vectors v(1), v(2), · · · , v(nvec)

ä Assume normal distribution with zero mean

hpcse17 20

Page 21: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

ä Each vector is normalized so that ‖v(l)‖ = 1, l = 1, . . . , nvec.

ä Estimate the trace of Tk(A) with stochastisc estimator:

Trace(Tk(A)) ≈1

nvec

nvec∑l=1

(v(l))TTk(A)v(l).

ä Will lead to the desired estimate:

µk ≈2− δk0nπnvec

nvec∑l=1

(v(l))TTk(A)v(l).

ä To compute scalars of the form vTTk(A)v, exploit 3-termrecurrence of the Chebyshev polynomial:

Tk+1(A)v = 2ATk(A)v − Tk−1(A)v

so if we let vk ≡ Tk(A)v, we have

vk+1 = 2Avk − vk−1hpcse17 21

Page 22: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

ä Jackson smoothing can be used –

−1 −0.5 0 0.5 1

−2

0

2

4

6

8

10

12

14

16

18

t

φ(t)

Exactw/o Jackson

w/ Jackson

hpcse17 22

Page 23: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

An example: The Benzene matrix

>> TestKpmDosMatrix Benzene n =8219 nnz = 242669Degree = 40 # sample vectors = 10Elapsed time is 0.235189 seconds.

hpcse17 23

Page 24: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Use of the Lanczos Algorithm

ä Background: The Lanczos algorithm generates an orthonor-mal basis Vm = [v1, v2, · · · , vm] for the Krylov subspace:

span{v1, Av1, · · · , Am−1v1}

ä ... such that:V Hm AVm = Tm - with Tm =

α1 β2

β2 α2 β3

β3 α3 β4

. . .. . .βm αm

hpcse17 24

Page 25: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

ä Lanczos process builds orthogonal polynomials wrt to dotproduct: ∫

p(t)q(t)dt ≡ (p(A)v1, q(A)v1)

ä In theory vi’s defined by 3-term recurrence are orthogonal.

ä Let θi, i = 1 · · · ,m be the eigenvalues of Tm [Ritz values]

ä yi’s associated eigenvectors; Ritz vectors: {Vmyi}i=1:m

ä Ritz values approximate eigenvalues

ä Could compute θi’s then get approximate DOS from these

ä Problem: θi not good enough approximations – especiallyinside the spectrum.

hpcse17 25

Page 26: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

ä Better idea: exploit relation of Lanczos with (discrete) or-thogonal polynomials and related Gaussian quadrature:∫

p(t)dt ≈m∑i=1

aip(θi) ai =[eT1 yi

]2ä See, e.g., Golub & Meurant ’93, and also Gautschi’81, Goluband Welsch ’69.

ä Formula exact when p is a polynomial of degree≤ 2m+1

hpcse17 26

Page 27: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

ä Consider now∫p(t)dt =< p, 1 >= (Stieljes) integral≡

(p(A)v, v) =∑β2ip(λi) ≡< φv, p >

ä Then 〈φv, p〉 ≈∑aip(θi) =

∑ai 〈δθi, p〉 →

φv ≈∑

aiδθi

ä To mimick the effect of βi = 1, ∀i, use several vectors vand average the result of the above formula over them..

hpcse17 27

Page 28: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Other methods

ä The Lanczos spectroscopic approach : A sort of signalprocessing approach to detect peaks using Fourier analysis

ä The Delta-Chebyshev approach: Smooth φ with Gaussians,then expand Gaussians using Legendre polynomials

ä Haydock’s method: interesting ’classic’ approach in physics- uses Lanczos to unravel ‘near-poles’ of (A− εiI)−1

For details see:

• Approximating spectral densities of large matrices, Lin Lin,YS, and Chao Yang - SIAM Review ’16. Also in:[arXiv: http://arxiv.org/abs/1308.5467]

hpcse17 28

Page 29: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Experiments

ä Goal: to compare errors for similar number of matrix-vectorproducts

ä Example: Kohn-Sham Hamiltonian associated with a ben-zene molecule generated from PARSEC. n = 8, 219

ä In all cases, we use 10 sampling vectors

ä General observation: DGL, Lanczos, and KPM are best,

ä Spectroscopic method does OK

ä Haydock’s method [another method based on the Lanczosalgorithm] not as good

hpcse17 29

Page 30: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Method L1 error L2 error L∞ errorKPM w/ Jackson, deg=80 2.592e-02 5.032e-03 2.785e-03KPM w/o Jackson, deg=80 2.634e-02 4.454e-03 2.002e-03KPM Legendre, deg=80 2.504e-02 3.788e-03 1.174e-03Spectroscopic, deg=40 5.589e-02 8.652e-03 2.871e-03Spectroscopic, deg=100 4.624e-02 7.582e-03 2.447e-03DGL, deg=80 1.998e-02 3.379e-03 1.149e-03Lanczos, deg=80 2.755e-02 4.178e-03 1.599e-03Haydock, deg=40 6.951e-01 1.302e-01 6.176e-02Haydock, deg=100 2.581e-01 4.653e-02 1.420e-02

L1, L2, and L∞ error compared with the normalized “surro-gate” DOS for benzene matrix

ä Many more experiments in survey paper [L. Lin, YS, C.Yang, SIAM Review, 2015].

hpcse17 30

Page 31: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

What about matrix pencils?

ä DOS for generalized eigen-value problems

Ax = λBx

ä Assume: A is symmetric and B is SPD.

ä In principle: can just apply methods toB−1Ax = λx, usingB - inner products.

ä Requires factoring B. Too expensive [Think 3D Pbs]

? Observe: B is usually very *strongly* diagonally dominant.

ä Especially true after Left+Right Diag. scaling :

B̃ = S−1BS−1 S = diag(B)1/2

hpcse17 31

Page 32: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

ä General observation for FEM mass matrices. See Theorem3.2 in L. Kamenski, W. Huang, and H. Xu, Math. Comp.’14:

Theorem Condition number of scaled Galerkin mass matrixwith a simplicial mesh has a mesh-independent bound:

κ(S−1BS−1) ≤ d+ 2

Example: Matrix pair Kuu, Muu from Suite Sparse collection.

ä MatricesA andB have dimension n = 7, 102. nnz(A) =340, 200 nnz(B) = 170, 134.

ä After scaling by diagonals to have diag. entries equal toone, all eigenvalues of B are in interval

[0.6254, 1.5899]

hpcse17 32

Page 33: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Approximation theory to the rescue.

? Idea: Compute the DOS for the standard problem

B−1/2AB−1/2u = λu

ä Use a very low degree polynomial to approximate B−1/2.

ä We use Chebyshev expansions.

ä Degree k determined automatically by enforcing

‖t−1/2 − pk(t)‖∞ < tol

ä Theoretical results establish convergence that is exponentialwith respect to degree.

hpcse17 33

Page 34: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Example: Results for Kuu-Muu example

ä Using polynomials of degree 3 (!) to approximate B−1/2

ä Krylov subspace of dim. 30 (== deg. of polynomial in KPM)

ä 10 Sample vectors used

0 2 4 6 8 10 12 14 16

x 104

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8x 10

−5Kuu−Muu test −− m=30 Pol. Deg for B=3, n

vec = 10

DOS from Lanczos algorithm

From histogram

Lanczos

0 2 4 6 8 10 12 14 16

x 104

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8x 10

−5Kuu−Muu pair −− m=30 Pol. Deg for B=3, n

vec = 10

DOS from KPM

From histogram

KPM-Chebyshev

0 2 4 6 8 10 12 14 16

x 104

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8x 10

−5Kuu−Muu pair −− m=30 Pol. Deg for B=3, n

vec = 30

DOS from KPM−Legendre

From histogram

KPM-Legendre

hpcse17 34

Page 35: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Application 1: Eigenvalue counts

The problem: Given A (Hermitian) with eigenvalues λ1 ≤λ2 · · · ≤ λn find an estimate of the number µ[a,b] of eigenval-ues of A in interval [a, b].

Standard method: Sylvester inertia theorem. Requires twoLDLT factorizations→ expensive!

First alternative: integrate the Spectral Density in [a, b].

µ[a,b] ≈ n(∫ b

aφ̃(t)dt

)= n

m∑k=0

µk

(∫ b

a

Tk(t)√1− t2

dt

)= ...

Second method: Estimate traceof the related spectral projector P(→ ui’s = eigenvectors↔ λi’s)

P =∑

λi ∈ [a b]

uiuTi .

hpcse17 35

Page 36: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

ä We know: µ[a,b] = Tr (P ) . P is not available ... but canbe approximated. Note:

P = h(A) where h(t) =

{1 if t ∈ [a b]0 otherwise

ä Approximate h(t) by polynom.ψ(t) using Chebyshev expansionsä Then µ[a,b] ≈ Tr (ψ(A)) approxi-mated by a trace estimator:

µ[a,b] ≈1

nv

nv∑k=1

v>k ψ(A)vk

−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1−0.2

0

0.2

0.4

0.6

0.8

1

1.2Mid−pass polynom. filter [−1 .3 .6 1]; Degree = 80

Standard Cheb.Jackson−Cheb.

ä It turns out that the 2 methods are identical.

hpcse17 36

Page 37: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Application 2: “Spectrum Slicing”

ä Situation: very large number of eigenvalues to be computed

ä Goal: compute spectrum by slices by applying filtering

ä Apply Lanczos or Sub-space iteration to problem:

φ(A)u = µu

φ(t) ≡ a polynomial orrational function that en-hances wanted eigenvalues

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1−0.2

0

0.2

0.4

0.6

0.8

1

λ

φ (

λ )

λ

i

φ(λi)

Pol. of degree 32 approx δ(.5) in [−1 1]

hpcse17 37

Page 38: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Rationale. Eigenvectors on both ends of wanted spectrumneed not be orthogonalized against each other :

ä Idea: Get the spectrum by ‘slices’ or ’windows’ [e.g., a fewhundreds or thousands of pairs at a time]

ä Can use polynomial or rational filters

hpcse17 38

Page 39: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Compute slices separately

ä Deceivingly simple looking idea.

ä Issues:

• Deal with interfaces : duplicate/missing eigenvalues

• Window size [need estimate of eigenvalues]

• How to compute each slice? [polynomial / rationalfilters?, ..]

hpcse17 39

Page 40: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

A digression: the EVSL project

ä Newly released EVSL uses polynomial and rational filters

ä Each can be appealing in different situations.

Spectrum slicing: cut the overall interval containing the spec-trum into small sub-intervals and compute eigenpairs in eachsub-interval independently.

For each subinterval: select a filterpolynomial of a certain degree so itshigh part captures the wanted eigen-values. In illustration, the polynomialsare of degree 20 (left), 30 (middle),and 32 (right).

−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1−0.2

0

0.2

0.4

0.6

0.8

1

1.2

λ

φ (

λ )

hpcse17 40

Page 41: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

9/29/2016 Yousef Saad -- SOFTWARE

http://www-users.cs.umn.edu/~saad/software/ 1/6

S O F T W A R E

EVSL   a library of (sequential) eigensolvers based on spectrum slicing. Version 1.0released on [09/11/2016] EVSL provides routines for computing eigenvalues located in a given interval, and theirassociated eigenvectors, of real symmetric matrices. It also provides tools for spectrumslicing, i.e., the technique of subdividing a given interval into p smaller subintervals andcomputing the eigenvalues in each subinterval independently. EVSL implements apolynomial filtered Lanczos algorithm (thick restart, no restart) a rational filtered Lanczosalgorithm (thick restart, no restart), and a polynomial filtered subspace iteration.

ITSOL a library of (sequential) iterative solvers. Version 2 released. [11/16/2010] ITSOL can be viewed as an extension of the ITSOL module in the SPARSKIT package. Itis written in C and aims at providing additional preconditioners for solving general sparselinear systems of equations. Preconditioners so far in this package include (1) ILUK (ILUpreconditioner with level of fill) (2) ILUT (ILU preconditioner with threshold) (3) ILUC(Crout version of ILUT) (4) VBILUK (variable block preconditioner with level of fill ­ withautomatic block detection) (5) VBILUT (variable block preconditioner with threshold ­with automatic block detection) (6) ARMS (Algebraic Recursive Multilevel Solvers ­­includes actually several methods ­ In particular the standard ARMS and the ddPQ versionwhich uses nonsymmetric permutations). ZITSOL a complex version of some of the methods in ITSOL is also available. 

Page 42: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Levels of parallelism

Sli

ce

1S

lic

e 2

Sli

ce

3

Domain 1

Domain 2

Domain 3

Domain 4

Macro−task 1

The two main levels of parallelism in EVSL

hpcse17 42

Page 43: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

How do I slice a spectrum?

Analogue question:

How would I slice an onion if Iwant each slice to have aboutthe same mass?

Answer: Use the DOS.

hpcse17 43

Page 44: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

0 5 10 15 20−0.005

0

0.005

0.01

0.015

0.02

0.025

Slice spectrum into 8 with the DOS

DOS

ä We must have:

∫ ti+1

ti

φ(t)dt =1

nslices

∫ b

aφ(t)dt

hpcse17 44

Page 45: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Application 3: Estimating the rank

• Joint work with S. Ubaru

ä Very important problem in signal processing applications,machine learning, etc.

ä Often: a certain rank is selected ad-hoc. Dimension reduc-tion is application with this “guessed” rank.

ä Can be viewed as a particular case of the eigenvalue countproblem - but need a cutoff value..

hpcse17 45

Page 46: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Approximate rank, Numerical rank

ä Notion defined in various ways. A common one:

rε = min{rank(B) : B ∈ Rm×n, ‖A−B‖2 ≤ ε},

rε = Number of sing. values ≥ ε

ä Two distinct problems:

1. Get a good ε 2. Estimate number of sing. values≥ ε

ä We will need a cut-off value (’threshold’) ε.

ä Could use ‘noise level’ for ε, but not always available

hpcse17 46

Page 47: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Threshold selection

ä How to select a good threshold?

ä Answer: Obtain it from the DOS function

0.5 1 1.5 2

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

Exact DOS by KPM, deg = 30

λ

φ(λ)

KPM (Chebyshev)

0.5 1 1.5 2 2.5 3 3.5 4

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

Exact DOS by KPM, deg = 30

λ

φ(λ)

KPM (Chebyshev)

1 2 3 4 5 6 7 8 9

0.5

1

1.5

2

2.5

3

3.5

4

Exact DOS by KPM, deg = 30

λ

φ(λ)

KPM (Chebyshev)

(A) (B) (C)

Exact DOS plots for three different types of matrices.

hpcse17 47

Page 48: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

ä To find: point immediatly following the initial sharp dropobserved.

ä Simple idea: use derivative of DOS function φ

ä For an n×n matrix with eigenvalues λn ≤ λn−1 ≤ · · · ≤λ1:

ε = min{t : λn ≤ t ≤ λ1, φ′(t) = 0}.

ä In practice replace by

ε = min{t : λn ≤ t ≤ λ1, |φ′(t)| ≥ tol}

hpcse17 48

Page 49: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Experiments

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1−0.5

0

0.5

1

1.5

2

2.5

3DOS with KPM, deg = 50

λ

φ(λ

)

0 10 20 301250

1300

1350

1400Lanczos Approximation (matrix size=1961)

Number of vectors (1 −> 30)

Estimed

#eigenvaluesininterval

CumulativeAvg

Exact(rε)ℓ

(A) (B)(A) The DOS found by KPM.(B) Approximate rank estimation by The Lanczos method forthe example netz4504.

hpcse17 49

Page 50: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Tests with Matérn covariance matrices for grids

ä Important in statistical applications

Approximate Rank Estimation of Matérn covariance matrices

Type of Grid (dimension) Matrix # λi’s rεSize ≥ ε KPM Lanczos

1D regular Grid (2048× 1) 2048 16 16.75 15.801D no structure Grid (2048× 1) 2048 20 20.10 20.462D regular Grid (64× 64) 4096 72 72.71 72.902D no structure Grid (64× 64) 4096 70 69.20 71.232D deformed Grid (64× 64) 4096 69 68.11 69.45

ä For all test M(deg) = 50, nv=30

hpcse17 50

Page 51: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Application 4: The LogDeterminant

Evaluate the Log-determinant of A:

log det(A) = Trace(log(A)) =∑ni=1 log(λi).

A is SPD.

ä Estimating the log-determinant of a matrix equivalent toestimating the trace of the matrix function f(A) = log(A).

ä Can invoke Stochastic Lanczos Quadrature (SLQ) to esti-mate this trace.

hpcse17 51

Page 52: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Numerical example: A graph Laplacian california of size9664× 9664, nz ≈ 105 from the Univ. of Florida collection.

Rel. error vs degree

• 3 methods: Taylor Series,Chebyshev expansion, SLQ

• # starting vectors nv = 100in all three cases.

10 20 30 40 50

10−4

10−2

100

Comparison nv=100

Degree (5 −> 50)R

ela

tive

err

or

TaylorChebyshevLanczos

hpcse17 52

Page 53: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Runtime comparisons

0 2 4 6x 104

10−2

100

102

104

Matrix SizeRuntim

e(secs)

CholeskyTalyorChebyshevLanczos

Runtime comparison

hpcse17 53

Page 54: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Application 6: Log-likelihood.

Comes from parameter estimation for Gaussian processes

ä Objective is to maximize the log-likelihood function withrespect to a ‘hyperparameter’ vector ξ

log p(z | ξ) = −12

[z>S(ξ)−1z + log detS(ξ) + cst

]where z = data vector and S(ξ) == covariance matrix parame-terized by ξ

ä Can use the same Lanczos runs to estimate z>S(ξ)−1zand logDet term simultaneously.

hpcse17 54

Page 55: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Application 7: calculating nuclear norm

ä ‖X‖∗ =∑σi(X) =

∑√λi(XTX)

ä Generalization: Schatten p-norms

‖X‖∗,p = [∑σi(X)p]1/p

ä See:

J. Chen, S. Ubaru, YS, “Fast estimation of log-determinant andSchatten norms via stochastic Lanczos quadrature”, (Submit-ted).

hpcse17 55

Page 56: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/hpcse17.pdf · 2017-06-08 · Applications of trace estimation techniques Yousef Saad Department of Computer Science

Conclusion

ä Estimating traces is a key ingredient in many algorithms

ä Physics, machine learning, matrix algorithms, ..

ä .. many new problems related to ‘data analysis’ and ’statis-tics’, and in signal processing,

Q: Can we do better than standard random sampling?

hpcse17 56