
Gaussian Markov Random Fields

Gaussian Markov Random Fields: Theory and Applications

Håvard Rue

Department of Mathematical Sciences, NTNU, Norway

September 26, 2008

Gaussian Markov Random Fields

Part I

Definition and basic properties

Gaussian Markov Random Fields

Outline I

Introduction
  Why?
  Outline of this course
  What is a GMRF?
  The precision matrix
  Definition of a GMRF
  Example: Auto-regressive process
  Why are GMRFs important?
  Main features of GMRFs

Properties of GMRFs
  Interpretation of elements of Q
  Markov properties
  Conditional density
  Specification through full conditionals
  Proof via Brook's lemma

Gaussian Markov Random Fields

Introduction

Why?

Why is it a good idea to learn about Gaussian Markov random fields (GMRFs)?
That is a good question!

What’s there to learn? Isn’t all just a Gaussian?

That is also a good question!


Gaussian Markov Random Fields

Introduction

Why?

Why?

• Gaussians (most often in the form of a GMRF) are extensively used in statistical models.

• However, their excellent computational properties are often not exploited.

• There is a lack of knowledge of general-purpose and near-optimal numerical algorithms.

Gaussian Markov Random Fields

Introduction

Why?

Why?

• Fast(er) MCMC based inference

Recent developments (the really nice stuff...)

• Near-instant (compared to MCMC) approximate Bayesian inference is often possible

• Deterministic

• Relative error

• In practice: no “error”

Gaussian Markov Random Fields

Introduction

Why?

Longitudinal mixed effects model: Epil example from BUGS

Gaussian Markov Random Fields

Introduction

Why?

Longitudinal mixed effects model: Epil example from BUGS...

In this example

x = (a0, aBase, aTrt, aBT, aAge, aV4, b1j, bjk)

is a GMRF.

Two (hyper-)parameters

Precision(b1) and Precision(b)

Poisson-observations


Gaussian Markov Random Fields

Outline of this course

Outline of this course: Day I

Lecture 1 GMRFs: definition and basic properties

Lecture 2 Simulation algorithms for GMRFs

Lecture 3 Numerical methods for sparse matrices

Lecture 4 Intrinsic GMRFs

Lecture 5 Hierarchical GMRF models and MCMC algorithmsfor such models

Lecture 6 Case-studies

Gaussian Markov Random Fields

Outline of this course

Outline of this course: Day II

Lecture 1 Motivation for Approximate Bayesian inference

Lecture 2 INLA: Integrated Nested Laplace Approximations

Lecture 3 The inla-program: examples

Lecture 4 Using the R interface to the inla-program (Sara Martino)

Lecture 5 Case studies with R (Sara Martino)

Lecture 6 Discussion

Gaussian Markov Random Fields

Outline of this course

What is a GMRF?

What is a Gaussian Markov random field (GMRF)?

A GMRF is a simple construct

• A normally distributed random vector

x = (x1, . . . , xn)T

• Additional Markov properties:

xi ⊥ xj | x−ij

xi and xj are conditionally independent (CI).

Gaussian Markov Random Fields

Outline of this course

The precision matrix

If xi ⊥ xj | x−ij for a set of {i, j}, then we need to constrain the parametrisation of the GMRF.

• Covariance matrix: difficult

• Precision matrix: easy


Gaussian Markov Random Fields

Outline of this course

The precision matrix

Conditional independence and the precision matrix

The density of a zero-mean Gaussian is

π(x) ∝ |Q|^{1/2} exp(−½ x^T Q x)

Constraining the parametrisation to obey CI properties:

Theorem

xi ⊥ xj | x−ij ⇐⇒ Qij = 0

Gaussian Markov Random Fields

Outline of this course

The precision matrix

Use an (undirected) graph G = (V, E) to represent the CI properties:

V Vertices: 1, 2, . . . , n.
E Edges {i, j}:
  • No edge between i and j if xi ⊥ xj | x−ij.
  • An edge between i and j if xi and xj are not conditionally independent given x−ij.

Gaussian Markov Random Fields

Outline of this course

Definition of a GMRF

Definition of a GMRF

Definition (GMRF)
A random vector x = (x1, . . . , xn)^T is called a GMRF wrt the graph G = (V = {1, . . . , n}, E) with mean µ and precision matrix Q > 0, iff its density has the form

π(x) = (2π)^{−n/2} |Q|^{1/2} exp(−½ (x − µ)^T Q (x − µ))

and

Qij ≠ 0 ⇐⇒ {i, j} ∈ E for all i ≠ j.

Gaussian Markov Random Fields

Outline of this course

Example: Auto-regressive process

Simple example of a GMRF: an auto-regressive process of order 1

xt | xt−1, . . . , x1 ∼ N (φxt−1, 1), t = 2, . . . , n

and x1 ∼ N (0, (1− φ2)−1).

Tridiagonal precision matrix

Q =
[  1    −φ                     ]
[ −φ   1+φ²   −φ               ]
[        ⋱     ⋱      ⋱        ]
[             −φ   1+φ²   −φ   ]
[                   −φ     1   ]
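A small numerical sketch (the values of n and φ are assumed for illustration, not from the slides): build the tridiagonal AR(1) precision matrix and check that it is the inverse of the dense stationary AR(1) covariance, Cov(xs, xt) = φ^{|s−t|}/(1 − φ²).

```python
import numpy as np

n, phi = 6, 0.7                                  # assumed example values
Q = np.zeros((n, n))
Q[0, 0] = Q[-1, -1] = 1.0
Q[np.arange(1, n - 1), np.arange(1, n - 1)] = 1.0 + phi**2
Q[np.arange(n - 1), np.arange(1, n)] = -phi      # super-diagonal
Q[np.arange(1, n), np.arange(n - 1)] = -phi      # sub-diagonal

# Stationary AR(1) covariance: dense, even though Q is tridiagonal.
Sigma = phi ** np.abs(np.subtract.outer(np.arange(n), np.arange(n))) / (1 - phi**2)
print(np.allclose(np.linalg.inv(Sigma), Q))      # True: Q is the precision matrix
```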


Gaussian Markov Random Fields

Outline of this course

Why are GMRFs important?

Usage of GMRFs (I)

Structural time-series analysis

• Autoregressive models.

• Gaussian state-space models.

• Computational algorithms based on the Kalman filter and its variants.

Analysis of longitudinal and survival data

• temporal GMRF priors

• state-space approaches

• spatial GMRF priors

used to analyse longitudinal and survival data.

Gaussian Markov Random Fields

Outline of this course

Why are GMRFs important?

Usage of GMRFs (II)

Graphical models

• A key model

• Estimate Q and its (associated) graph from data.

• Often used in a larger context.

Semiparametric regression and splines

• Model a smooth curve in time or a surface in space.

• Intrinsic GMRF models and random walk models

• Discretely observed integrated Wiener processes are GMRFs

• GMRF models for coefficients in B-splines.

Gaussian Markov Random Fields

Outline of this course

Why are GMRFs important?

Usage of GMRFs (III)

Image analysis

• The first main area for spatial models

• Image restoration using the Wiener filter.

• Texture modelling and texture discrimination.

• Segmentation

• Deformable templates

• Object identification

• 3D reconstruction

• Restoring ultrasound images

Gaussian Markov Random Fields

Outline of this course

Why are GMRFs important?

Usage of GMRFs (IV)

Spatial statistics

• Latent GMRF model analysis of spatial binary data

• Geostatistics using GMRFs

• Analysis of data in social sciences and spatial econometrics

• Spatial and space-time epidemiology

• Environmental statistics

• Inverse problems


Gaussian Markov Random Fields

Outline of this course

Main features of GMRFs

Main features

• Analytically tractable

• Modelling using conditional independence

• Merging GMRFs using conditioning (hierarchical models)

• Unified framework for
  • understanding
  • representation
  • computation using numerical methods for sparse matrices

• Fits nicely into the MCMC world

• Can construct fast and reliable block-MCMC algorithms.

Gaussian Markov Random Fields

Properties of GMRFs

Constraining the parametrisation to obey CI properties

Theorem

xi ⊥ xj | x−ij ⇐⇒ Qij = 0

Gaussian Markov Random Fields

Properties of GMRFs

Proof

Start with the factorisation theorem.

Theorem

x ⊥ y | z ⇐⇒ π(x, y, z) = f(x, z) g(y, z)   (1)

for some functions f and g, and for all z with π(z) > 0.

Example
For

π(x, y, z) ∝ exp(x + xz + yz)

on some bounded region, we see that x ⊥ y | z. However, this is not the case for

π(x, y, z) ∝ exp(xyz)

Gaussian Markov Random Fields

Properties of GMRFs

Proof...

π(xi, xj, x−ij) ∝ exp(−½ Σ_{k,l} xk Qkl xl)
               ∝ exp(−½ xi xj (Qij + Qji) [term 1] − ½ Σ_{{k,l} ≠ {i,j}} xk Qkl xl [term 2]).

Term 1 involves xi xj iff Qij ≠ 0. Term 2 does not involve xi xj.

We now see that π(xi, xj, x−ij) = f(xi, x−ij) g(xj, x−ij) for some functions f and g, iff Qij = 0.
Now use the factorisation theorem.


Gaussian Markov Random Fields

Properties of GMRFs

Interpretation of elements of Q

Interpretation of elements of Q

Let x be a GMRF wrt G = (V, E) with mean µ and precision matrix Q > 0. Then

E(xi | x−i) = µi − (1/Qii) Σ_{j: j∼i} Qij (xj − µj),

Prec(xi | x−i) = Qii,  and

Corr(xi, xj | x−ij) = −Qij / √(Qii Qjj),   i ≠ j.
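A quick numerical check of these formulas (the precision matrix, mean and configuration below are arbitrary assumed examples): the conditional mean and precision read off Q agree with brute-force conditioning via the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.normal(size=(n, n))
Q = A @ A.T + n * np.eye(n)            # some positive-definite precision matrix
mu = rng.normal(size=n)
Sigma = np.linalg.inv(Q)

i = 2
others = [j for j in range(n) if j != i]
x = rng.normal(size=n)                 # an arbitrary configuration of x_{-i}

# conditional mean and precision from Q (the formulas above)
cond_mean_Q = mu[i] - (Q[i, others] @ (x[others] - mu[others])) / Q[i, i]
cond_prec_Q = Q[i, i]

# brute force via the covariance matrix
S_oo = Sigma[np.ix_(others, others)]
S_io = Sigma[i, others]
cond_mean_S = mu[i] + S_io @ np.linalg.solve(S_oo, x[others] - mu[others])
cond_var_S = Sigma[i, i] - S_io @ np.linalg.solve(S_oo, S_io)

print(np.isclose(cond_mean_Q, cond_mean_S), np.isclose(cond_prec_Q, 1 / cond_var_S))
```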

Gaussian Markov Random Fields

Properties of GMRFs

Markov properties

Markov properties

Let x be a GMRF wrt G = (V, E). Then the following are equivalent.
The pairwise Markov property:

xi ⊥ xj | x−ij if {i, j} ∉ E and i ≠ j.

Gaussian Markov Random Fields

Properties of GMRFs

Markov properties

Markov properties

Let x be a GMRF wrt G = (V, E). Then the following are equivalent.
The local Markov property:

xi ⊥ x−{i, ne(i)} | x_ne(i) for every i ∈ V.

Gaussian Markov Random Fields

Properties of GMRFs

Markov properties

Markov properties
Let x be a GMRF wrt G = (V, E). Then the following are equivalent.
The global Markov property:

xA ⊥ xB | xC

for all disjoint sets A, B and C, where C separates A and B, and A and B are non-empty.


Gaussian Markov Random Fields

Properties of GMRFs

Conditional density

(Induced) subgraph

Let A ⊂ VGA denote the graph restricted to A.

• remove all nodes not belonging to A, and• all edges where at least one node does not belong to A

ExampleA = 1, 2, then

VA = 1, 2 and EA = 1, 2

Gaussian Markov Random Fields

Properties of GMRFs

Conditional density

Conditional density I

Let V = A ∪ B where A ∩ B = ∅, and

x = (xA, xB),   µ = (µA, µB),   Q = [ QAA  QAB
                                      QBA  QBB ].

Result: xA | xB is then a GMRF wrt the subgraph GA with parameters µA|B and QA|B > 0, where

µA|B = µA − QAA^{-1} QAB (xB − µB)   and   QA|B = QAA.

Gaussian Markov Random Fields

Properties of GMRFs

Conditional density

Conditional density II

µA|B = µA − QAA^{-1} QAB (xB − µB)   and   QA|B = QAA.

• We have explicit knowledge of QA|B: it is the principal submatrix QAA.

• The subgraph GA does not change the structure within A; it only removes the nodes outside A and their edges.

• The conditional mean only depends on nodes in A ∪ ne(A).

• If QAA is sparse, then µA|B is the solution of a sparse linear system:

QAA (µA|B − µA) = −QAB (xB − µB)
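A sketch of this computation (the tridiagonal Q, the split into A and B, and the values of xB are all assumed for illustration): µA|B is obtained from one sparse solve with QAA.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

n = 100
main = 2.2 * np.ones(n)
off = -1.0 * np.ones(n - 1)
Q = sp.diags([off, main, off], offsets=[-1, 0, 1], format="csr")  # sparse precision
mu = np.zeros(n)

A = np.arange(50)                 # the block we condition on xB for
B = np.arange(50, n)
x_B = np.ones(len(B))             # assumed observed values of x_B

Q_AA = Q[A, :][:, A].tocsc()
Q_AB = Q[A, :][:, B]
mu_AB = mu[A] + spsolve(Q_AA, -Q_AB @ (x_B - mu[B]))
print(mu_AB[-5:])                 # conditional mean decays away from the conditioning set
```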

Gaussian Markov Random Fields

Properties of GMRFs

Conditional density

Conditional density III


Gaussian Markov Random Fields

Properties of GMRFs

Specification through full conditionals

Specification through full conditionals

An alternative to specifying a GMRF by its mean and precision matrix is to specify it implicitly through the full conditionals

π(xi |x−i ), i = 1, . . . , n

This approach was pioneered by Besag (1974,1975)

Also known as conditional autoregressions or CAR-models.

Gaussian Markov Random Fields

Properties of GMRFs

Specification through full conditionals

Example: AR-process

Specify full conditionals

E (x1 | x−1) = 0.3x2 Prec(x1 | x−1) = 1

E (x3 | x−3) = 0.2x2 + 0.4x4 Prec(x3 | x−3) = 2

and so on.


Gaussian Markov Random Fields

Properties of GMRFs

Specification through full conditionals

Example: Spatial model

Specify full conditionals for each i (assuming zero mean):

E(xi | x−i) = Σ_j wij xj,   Prec(xi | x−i) = κi

Gaussian Markov Random Fields

Properties of GMRFs

Specification through full conditionals

Potential problems

• Even though the “full conditionals” are well defined, are we sure that the joint density exists and is unique?

• What are the compatibility conditions required for a joint GMRF to exist with the prescribed Markov properties?

Gaussian Markov Random Fields

Properties of GMRFs

Specification through full conditionals

In general

Specify the full conditionals as normals with

E(xi | x−i) = µi − Σ_{j: j∼i} βij (xj − µj)   (2)

Prec(xi | x−i) = κi > 0   (3)

for i = 1, . . . , n, for some {βij : i ≠ j} and vectors µ and κ.

Clearly, ∼ is defined implicitly by the nonzero terms of {βij}.

Gaussian Markov Random Fields

Properties of GMRFs

Specification through full conditionals

E(xi | x−i) = µi − Σ_{j: j∼i} βij (xj − µj)   and   Prec(xi | x−i) = κi > 0

There must be a joint density π(x) with these as its full conditionals.

Comparing term by term with (2):

Qii = κi   and   Qij = κi βij

Q is symmetric, i.e., κi βij = κj βji.

We have a candidate for a joint density provided Q > 0.
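A sketch of this construction (the values of κ, β and the small line graph are assumed for illustration): assemble Q from the full conditionals and check the compatibility condition and Q > 0.

```python
import numpy as np

n = 4
kappa = np.array([1.0, 2.0, 2.0, 1.5])
beta = np.zeros((n, n))
# neighbours on a line; beta_ji is chosen so that kappa_i*beta_ij = kappa_j*beta_ji
beta[0, 1] = -0.4;  beta[1, 0] = kappa[0] * beta[0, 1] / kappa[1]
beta[1, 2] = -0.3;  beta[2, 1] = kappa[1] * beta[1, 2] / kappa[2]
beta[2, 3] = -0.4;  beta[3, 2] = kappa[2] * beta[2, 3] / kappa[3]

Q = np.diag(kappa) + np.diag(kappa) @ beta   # Q_ii = kappa_i, Q_ij = kappa_i * beta_ij
print(np.allclose(Q, Q.T))                   # symmetric: the full conditionals are compatible
print(np.all(np.linalg.eigvalsh(Q) > 0))     # Q > 0: a valid joint GMRF exists
```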


Gaussian Markov Random Fields

Properties of GMRFs

Proof via Brook’s lemma

Lemma (Brook's lemma)
Let π(x) be the density for x ∈ R^n and define Ω = {x ∈ R^n : π(x) > 0}. Let x, x′ ∈ Ω. Then

π(x)/π(x′) = ∏_{i=1}^{n}  π(xi | x1, . . . , x_{i−1}, x′_{i+1}, . . . , x′_n) / π(x′_i | x1, . . . , x_{i−1}, x′_{i+1}, . . . , x′_n)

           = ∏_{i=1}^{n}  π(xi | x′_1, . . . , x′_{i−1}, x_{i+1}, . . . , x_n) / π(x′_i | x′_1, . . . , x′_{i−1}, x_{i+1}, . . . , x_n).

Gaussian Markov Random Fields

Properties of GMRFs

Proof via Brook’s lemma

Assume µ = 0 and fix x′ = 0. Then

log π(x)/π(0) = −½ Σ_{i=1}^{n} κi xi²  −  Σ_{i=2}^{n} Σ_{j=1}^{i−1} κi βij xi xj   (4)

and

log π(x)/π(0) = −½ Σ_{i=1}^{n} κi xi²  −  Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} κi βij xi xj.   (5)

Since (4) and (5) must be identical, κi βij = κj βji for i ≠ j, and

log π(x) = const − ½ Σ_{i=1}^{n} κi xi² − ½ Σ_{i≠j} κi βij xi xj;

hence x is a zero-mean multivariate normal provided Q > 0.

Gaussian Markov Random Fields

Part II

Simulation algorithms for GMRFs

Gaussian Markov Random Fields

Outline I

Introduction

Simulation algorithms for GMRFs
  Task
  Summary of the simulation algorithms

Basic numerical linear algebra
  Cholesky factorisation and the Cholesky triangle
  Solving linear equations
  Avoid computing the inverse

Unconditional sampling
  Simulation algorithm
  Evaluating the log-density

Conditional sampling
  Canonical parameterisation
  Conditional distribution in the canonical parameterisation
  Sampling from a canonical parameterised GMRF


Gaussian Markov Random Fields

Outline II

Sampling under hard linear constraints
  Introduction
  General algorithm (slow)
  Specific algorithm with not too many constraints (fast)
  Example
  Evaluating the log-density

Sampling under soft linear constraints
  Introduction
  Specific algorithm with not too many constraints (fast)
  The algorithm
  Evaluate the log-density
  Example

Gaussian Markov Random Fields

Introduction

Sparse precision matrix

Recall that

E(xi − µi | x−i) = −(1/Qii) Σ_{j∼i} Qij (xj − µj),   Prec(xi | x−i) = Qii

In most cases:

• Total number of neighbours is O(n).

• Only O(n) of the n² terms in Q will be non-zero.

• Use this to construct exact simulation algorithms for GMRFs, using numerical algorithms for sparse matrices.

Gaussian Markov Random Fields

Introduction

Example of a typical precision matrix

Gaussian Markov Random Fields

Introduction

Example of a typical precision matrix

Each node has on average 7 neighbours.


Gaussian Markov Random Fields

Simulation algorithms for GMRFs

Task

Simulation algorithms for GMRFs

Can we take advantage of the sparse structure of Q?

• It is faster to factorise a sparse Q compared to a dense Q.

• The speedup depends on the “pattern” in Q, not only the number of non-zero terms.

Our task

• Formulate all algorithms to use only sparse matrices.

• Unconditional simulation

• Conditional simulation
  • Condition on a subset of variables
  • Condition on linear constraints
  • Condition on linear constraints with normal noise

• Evaluation of the log-density in all cases.

Gaussian Markov Random Fields

Simulation algorithms for GMRFs

Summary of the simulation algorithms

The result

In most cases, the cost is

• O(n) for temporal GMRFs

• O(n^{3/2}) for spatial GMRFs

• O(n²) for spatio-temporal GMRFs

including evaluation of the log-density.

Conditioning on k linear constraints adds O(k³).

These are general algorithms, depending only on the graph G, not on the numerical values in Q.

The core is numerical algorithms for sparse matrices.

Gaussian Markov Random Fields

Basic numerical linear algebra

Cholesky factorisation and the Cholesky triangle

Cholesky factorisation

If A > 0 is an n × n positive-definite matrix, then there exists a unique Cholesky triangle L, i.e. a lower-triangular matrix such that

A = LL^T

Computing L costs n³/3 flops.

This factorisation is the basis for solving systems like

Ax = b   or   AX = B

for k right-hand sides, or equivalently, computing

x = A^{-1}b   or   X = A^{-1}B


Gaussian Markov Random Fields

Basic numerical linear algebra

Solving linear equations

Algorithm 1 Solving Ax = b where A > 0
1: Compute the Cholesky factorisation, A = LL^T
2: Solve Lv = b
3: Solve L^T x = v
4: Return x

Step 2 is called forward-substitution and costs O(n²) flops.

The solution v is computed in a forward loop:

vi = (1/Lii) (bi − Σ_{j=1}^{i−1} Lij vj),   i = 1, . . . , n   (6)

Gaussian Markov Random Fields

Basic numerical linear algebra

Solving linear equations

Algorithm 1 Solving Ax = b where A > 0
1: Compute the Cholesky factorisation, A = LL^T
2: Solve Lv = b
3: Solve L^T x = v
4: Return x

Step 3 is called back-substitution and costs O(n²) flops.

The solution x is computed in a backward loop:

xi = (1/Lii) (vi − Σ_{j=i+1}^{n} Lji xj),   i = n, . . . , 1   (7)

Gaussian Markov Random Fields

Basic numerical linear algebra

Avoid computing the inverse

To compute A^{-1}B, where B is an n × k matrix, we do this by computing the solution X of

A Xj = Bj

for each of the k columns of X.

Algorithm 2 Solving AX = B where A > 0
1: Compute the Cholesky factorisation, A = LL^T
2: for j = 1 to k do
3:   Solve Lv = Bj
4:   Solve L^T Xj = v
5: end for
6: Return X

Gaussian Markov Random Fields

Unconditional sampling

Simulation algorithm

Sample x ∼ N(µ, Q^{-1})

If Q = LL^T and z ∼ N(0, I), then x defined by

L^T x = z

has covariance

Cov(x) = Cov(L^{-T} z) = (LL^T)^{-1} = Q^{-1}

Algorithm 3 Sampling x ∼ N(µ, Q^{-1})
1: Compute the Cholesky factorisation, Q = LL^T
2: Sample z ∼ N(0, I)
3: Solve L^T v = z
4: Compute x = µ + v
5: Return x
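A sketch of Algorithm 3, together with the log-density evaluation of the next slide (Q and µ below are assumed example values).

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(2)
n = 4
M = rng.normal(size=(n, n))
Q = M @ M.T + n * np.eye(n)                   # assumed example precision matrix
mu = np.zeros(n)

L = cholesky(Q, lower=True)                   # Q = L L^T
z = rng.standard_normal(n)
v = solve_triangular(L.T, z, lower=False)     # solve L^T v = z
x = mu + v                                    # x ~ N(mu, Q^{-1})

# log-density: -n/2 log(2 pi) + sum_i log L_ii - 1/2 (x-mu)^T Q (x-mu);
# for a freshly drawn sample the quadratic form equals z^T z.
q = (x - mu) @ Q @ (x - mu)
log_pi = -0.5 * n * np.log(2 * np.pi) + np.sum(np.log(np.diag(L))) - 0.5 * q
print(np.isclose(q, z @ z), log_pi)
```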


Gaussian Markov Random Fields

Unconditional sampling

Evaluating the log-density

The log-density is

log π(x) = −(n/2) log 2π + Σ_{i=1}^{n} log Lii − ½ (x − µ)^T Q (x − µ)

where the last term is the quadratic form q = (x − µ)^T Q (x − µ).

If x is sampled, then q = z^T z; otherwise, compute this term as

• u = x − µ
• v = Qu
• q = u^T v

Gaussian Markov Random Fields

Conditional sampling

Canonical parameterisation

The canonical parameterisation

A GMRF x wrt G having a canonical parameterisation (b, Q) has density

π(x) ∝ exp(−½ x^T Q x + b^T x)   (8)

i.e., precision matrix Q and mean µ = Q^{-1} b.

Write this as

x ∼ NC(b, Q).   (9)

The relation to the Gaussian distribution is that

N(µ, Q^{-1}) = NC(Qµ, Q)

Gaussian Markov Random Fields

Conditional sampling

Conditional distribution in the canonical parameterisation

Conditional simulation of a GMRF

Decompose x as

(xA, xB) ∼ N( (µA, µB),  [ QAA  QAB
                            QBA  QBB ]^{-1} )

Then xA | xB has canonical parameterisation

xA − µA | xB ∼ NC(−QAB (xB − µB), QAA)   (10)

Simulate using an algorithm for a canonical parameterisation.

Gaussian Markov Random Fields

Conditional sampling

Sampling from a canonical parameterised GMRF

Sample x ∼ NC(b, Q)

Recall that “NC(b, Q) = N(Q^{-1}b, Q^{-1})”, so we need to compute the mean as well.

Algorithm 4 Sampling x ∼ NC(b, Q)
1: Compute the Cholesky factorisation, Q = LL^T
2: Solve Lw = b
3: Solve L^T µ = w
4: Sample z ∼ N(0, I)
5: Solve L^T v = z
6: Compute x = µ + v
7: Return x
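A sketch of Algorithm 4 (Q and b are assumed example values): the mean µ = Q^{-1} b comes from the same Cholesky triangle used for sampling.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(3)
n = 4
M = rng.normal(size=(n, n))
Q = M @ M.T + n * np.eye(n)                    # assumed example precision
b = rng.normal(size=n)

L = cholesky(Q, lower=True)                    # step 1: Q = L L^T
w = solve_triangular(L, b, lower=True)         # step 2: L w = b
mu = solve_triangular(L.T, w, lower=False)     # step 3: L^T mu = w, so mu = Q^{-1} b
z = rng.standard_normal(n)                     # step 4
v = solve_triangular(L.T, z, lower=False)      # step 5: L^T v = z
x = mu + v                                     # step 6: x ~ N_C(b, Q)
print(np.allclose(Q @ mu, b))                  # mu really solves Q mu = b
```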


Gaussian Markov Random Fields

Sampling under hard linear constraints

Introduction

Sampling from x | Ax = e

• A is a k × n matrix, 0 < k < n, with rank k,
• e is a vector of length k.

This case occurs quite frequently:

A sum-to-zero constraint corresponds to k = 1, A = 1^T and e = 0.

We will denote this problem sampling under a hard constraint.

Note that π(x | Ax) is singular with rank n − k.

Gaussian Markov Random Fields

Sampling under hard linear constraints

Introduction

The linear constraint makes the conditional distribution Gaussian,

• but it is singular, as the rank of the constrained covariance matrix is n − k;
• more care must be exercised when sampling from this distribution.

We have that

E(x | Ax = e) = µ − Q^{-1}A^T (AQ^{-1}A^T)^{-1} (Aµ − e)   (11)

Cov(x | Ax = e) = Q^{-1} − Q^{-1}A^T (AQ^{-1}A^T)^{-1} AQ^{-1}   (12)

This is typically a dense-matrix case, which must be solved using general O(n³) algorithms (see the next frame for details).

Gaussian Markov Random Fields

Sampling under hard linear constraints

General algorithm (slow)

We can sample from this distribution as follows. As the covariance matrix is singular, we compute its eigenvalues and eigenvectors and write the covariance matrix as VΛV^T, where the columns of V are the eigenvectors and Λ is a diagonal matrix with the eigenvalues on the diagonal. This corresponds to a different factorisation than the Cholesky triangle, but any matrix C which satisfies CC^T = Σ will do. Note that k of the eigenvalues are zero. We can produce a sample by computing v = Cz, where C = VΛ^{1/2} and z ∼ N(0, I), and then adding the conditional mean. We can compute the log-density as

log π(x | Ax = e) = −((n − k)/2) log 2π − ½ Σ_{i: Λii > 0} log Λii − ½ (x − µ*)^T Σ⁻ (x − µ*)   (13)

where µ* is (11), and Σ⁻ = VΛ⁻V^T with (Λ⁻)ii = Λii^{-1} if Λii > 0 and zero otherwise. In total, this is a quite computationally demanding procedure, as the algorithm is not able to take advantage of the sparse structure of Q.

Gaussian Markov Random Fields

Sampling under hard linear constraints

Specific algorithm with not to many constraints (fast)

Conditioning via Kriging

There is an alternative algorithm which corrects for the constraints at nearly no cost if k ≪ n.

Let x ∼ N(µ, Q^{-1}), then compute

x* = x − Q^{-1}A^T (AQ^{-1}A^T)^{-1} (Ax − e).   (14)

Now x* has the correct conditional distribution!

AQ^{-1}A^T is a k × k matrix, hence its factorisation is fast to compute for small k.


Gaussian Markov Random Fields

Sampling under hard linear constraints

Specific algorithm with not to many constraints (fast)

Algorithm 5 Sampling x | Ax = e when x ∼ N(µ, Q^{-1})
1: Compute the Cholesky factorisation, Q = LL^T
2: Sample z ∼ N(0, I)
3: Solve L^T v = z
4: Compute x = µ + v
5: Compute V (n × k) = Q^{-1}A^T with Algorithm 2
6: Compute W (k × k) = AV
7: Compute U (k × n) = W^{-1}V^T using Algorithm 2
8: Compute c = Ax − e
9: Compute x* = x − U^T c
10: Return x*

If z = 0 in Algorithm 5, then x* is the conditional mean.
The extra cost is only O(k³) for large k!
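A sketch of Algorithm 5 for a sum-to-zero constraint (all numerical values are assumed for illustration); for a small dense example the kriging correction can be written directly.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular, cho_solve

rng = np.random.default_rng(4)
n = 6
M = rng.normal(size=(n, n))
Q = M @ M.T + n * np.eye(n)                 # assumed example precision
mu = rng.normal(size=n)
A = np.ones((1, n))                         # k = 1: sum-to-zero constraint
e = np.zeros(1)

L = cholesky(Q, lower=True)
z = rng.standard_normal(n)
x = mu + solve_triangular(L.T, z, lower=False)   # unconstrained sample (Algorithm 3)

V = cho_solve((L, True), A.T)               # V = Q^{-1} A^T   (n x k)
W = A @ V                                   # W = A Q^{-1} A^T (k x k)
U = np.linalg.solve(W, V.T)                 # U = W^{-1} V^T   (k x n)
x_star = x - U.T @ (A @ x - e)              # kriging correction
print(x_star.sum())                         # ~ 0: the constraint holds exactly
```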

Gaussian Markov Random Fields

Sampling under hard linear constraints

Example

Example
Let x1, . . . , xn be independent Gaussian variables with variances σi² and means µi.
To sample x conditioned on

Σ_i xi = 0

we first sample

xi ∼ N(µi, σi²),   i = 1, . . . , n

and compute the constrained sample x* as

x*_i = xi − c σi²   (15)

where

c = (Σ_j xj) / (Σ_j σj²)

Gaussian Markov Random Fields

Sampling under hard linear constraints

Evaluating the log-density

The log-density can be rapidly computed using the following identity:

π(x | Ax) = π(x) π(Ax | x) / π(Ax)   (16)

Each term on the rhs is easier to compute than the lhs.

π(x)-term: This is a GMRF, and the log-density is easy to compute using L computed in Algorithm 5, step 1.

Gaussian Markov Random Fields

Sampling under hard linear constraints

Evaluating the log-density

The log-density can be rapidly computed using the following identity:

π(x | Ax) = π(x) π(Ax | x) / π(Ax)   (16)

Each term on the rhs is easier to compute than the lhs.

π(Ax | x)-term: This is a degenerate density, which is either zero or a constant,

log π(Ax | x) = −½ log |AA^T|   (17)

The determinant of a k × k matrix can be found from its Cholesky factorisation.


Gaussian Markov Random Fields

Sampling under hard linear constraints

Evaluating the log-density

The log-density can be rapidly computed using the following identity:

π(x | Ax) = π(x) π(Ax | x) / π(Ax)   (16)

Each term on the rhs is easier to compute than the lhs.

π(Ax)-term: Ax is Gaussian with mean Aµ and covariance matrix AQ^{-1}A^T, with its Cholesky triangle available from Algorithm 5, step 7.

Gaussian Markov Random Fields

Sampling under soft linear constraints

Introduction

Sampling from x | Ax = ε

Let x be a GMRF which is observed by e, where e | x ∼ N(Ax, Σε).

• e is a vector of length k < n
• A is a k × n matrix with rank k
• Σε > 0 is the covariance matrix for the noise.

The conditional for x is

log π(x | e) = −½ (x − µ)^T Q (x − µ) − ½ (e − Ax)^T Σε^{-1} (e − Ax) + const   (18)

x | e ∼ NC(Qµ + A^T Σε^{-1} e,  Q + A^T Σε^{-1} A)   (19)

This precision matrix is most often a full matrix, and the nice sparse structure of Q is lost.

Gaussian Markov Random Fields

Sampling under soft linear constraints

Introduction

Example
Observe the sum of the xi's with unit-variance noise; the posterior precision is

Q + 11^T

which is a dense matrix.

In general, we have to sample from (18) using a general algorithm, which is computationally expensive for large n.

Gaussian Markov Random Fields

Sampling under soft linear constraints

Specific algorithm with not to many constraints (fast)

Alternative approach

Similar problem to sampling under a hard constraint Ax = e, but now “e” is stochastic:

Ax = ε   where   ε ∼ N(e, Σε).   (20)

We denote this case sampling under a soft constraint.

If we extend (14) to

x* = x − Q^{-1}A^T (AQ^{-1}A^T + Σε)^{-1} (Ax − ε),   (21)

where ε and x are samples from their respective distributions,

....then x* has the correct conditional distribution.


Gaussian Markov Random Fields

Sampling under soft linear constraints

The algorithm

Algorithm 6 Sampling x | Ax = ε when ε ∼ N(e, Σε) and x ∼ N(µ, Q^{-1})
1: Compute the Cholesky factorisation, Q = LL^T
2: Sample z ∼ N(0, I)
3: Solve L^T v = z
4: Compute x = µ + v
5: Compute V (n × k) = Q^{-1}A^T using Algorithm 2 with L
6: Compute W (k × k) = AV + Σε
7: Compute U (k × n) = W^{-1}V^T using Algorithm 2
8: Sample ε ∼ N(e, Σε)
9: Compute c = Ax − ε
10: Compute x* = x − U^T c
11: Return x*

When z = 0 and ε = e, then x* is the conditional mean.

Gaussian Markov Random Fields

Sampling under soft linear constraints

Evaluate the log-density

The log-density is computed via

π(x | e) = π(x) π(e | x) / π(e)   (22)

All Cholesky triangles required are available from the simulation algorithm.

π(x)-term: This is a GMRF.

π(e | x)-term: e | x is Gaussian with mean Ax and covariance Σε.

π(e)-term: e is Gaussian with mean Aµ and covariance matrix AQ^{-1}A^T + Σε.

Gaussian Markov Random Fields

Sampling under soft linear constraints

Example

Example

Let x1, . . . , xn be independent Gaussian variables with variances σi² and means µi. We now observe

e ∼ N(Σ_i xi, σε²)

To sample from π(x | e) we first sample

xi ∼ N(µi, σi²),   i = 1, . . . , n

and then

ε ∼ N(e, σε²)

A conditional sample x* is then

x*_i = xi − c σi²,   where   c = (Σ_j xj − ε) / (Σ_j σj² + σε²)

Gaussian Markov Random Fields

Part III

Numerical methods for sparse matrices


Gaussian Markov Random Fields

Outline I

Introduction

Cholesky factorisation

The Cholesky triangle
  Interpretation
  The zero-pattern in L
  Example: A simple graph
  Example: auto-regressive processes

Band matrices
  Bandwidth is preserved
  Cholesky factorisation for band-matrices

Reordering schemes
  Introduction
  Reordering to band-matrices
  Reordering using the idea of nested dissection

What to do with sparse matrices?

Gaussian Markov Random Fields

Outline II

A numerical case-study
  Introduction
  GMRF-models in time
  GMRF-models in space
  GMRF-models in time×space

Gaussian Markov Random Fields

Introduction

Numerical methods for sparse matrices

We have shown that computations on GMRFs can be expressed such that the main tasks are

1. compute the Cholesky factorisation of Q = LL^T, and

2. solve Lv = b and L^T x = z.

The second task is much faster than the first, but sparsity will be of advantage here also.

Gaussian Markov Random Fields

Introduction

The goal is to explain

• why a sparse Q allows for fast factorisation,
• how we can take advantage of it,
• why we gain if we permute the vertices before factorising the matrix,
• how statisticians can benefit from recent research in this area by numerical mathematicians.

At the end we present a small case study, factorising some typical matrices for GMRFs using classical and more recent methods.


Gaussian Markov Random Fields

Cholesky factorisation

How to compute the Cholesky factorisation

Qij = Σ_{k=1}^{j} Lik Ljk,   i ≥ j.

Define

vi = Qij − Σ_{k=1}^{j−1} Lik Ljk,   i ≥ j.

Then

• Ljj² = vj, and
• Lij Ljj = vi for i > j.

If we know vi for fixed j, then

Ljj = √vj   and   Lij = vi / √vj,   for i = j + 1, . . . , n.

This gives the jth column in L.

Gaussian Markov Random Fields

Cholesky factorisation

Cholesky factorisation of Q > 0

Algorithm 7 Computing the Cholesky triangle L of Q
1: for j = 1 to n do
2:   v_{j:n} = Q_{j:n, j}
3:   for k = 1 to j − 1 do v_{j:n} = v_{j:n} − L_{j:n,k} Ljk
4:   L_{j:n,j} = v_{j:n} / √vj
5: end for
6: Return L

The overall process involves n³/3 flops.
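A direct transcription of Algorithm 7 into Python (the test matrix is an assumed example), computing the Cholesky triangle column by column.

```python
import numpy as np

def cholesky_triangle(Q):
    n = Q.shape[0]
    L = np.zeros_like(Q, dtype=float)
    for j in range(n):
        v = Q[j:, j].astype(float).copy()     # v_{j:n} = Q_{j:n, j}
        for k in range(j):                    # subtract L_{j:n,k} * L_{jk}
            v -= L[j:, k] * L[j, k]
        L[j:, j] = v / np.sqrt(v[0])          # L_{j:n,j} = v_{j:n} / sqrt(v_j)
    return L

rng = np.random.default_rng(5)
M = rng.normal(size=(5, 5))
Q = M @ M.T + 5 * np.eye(5)                   # assumed example, Q > 0
L = cholesky_triangle(Q)
print(np.allclose(L @ L.T, Q))                # True
```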

Gaussian Markov Random Fields

The Cholesky triangle

Interpretation

Interpretation of L (I)

Let Q = LL^T. Then the solution of

L^T x = z   where z ∼ N(0, I)

is N(0, Q^{-1}) distributed.

Since L is lower triangular,

xn = zn / Lnn
x_{n−1} = (z_{n−1} − L_{n,n−1} xn) / L_{n−1,n−1}
. . .

Gaussian Markov Random Fields

The Cholesky triangle

Interpretation

Interpretation of L (II)

Theorem
Let x be a GMRF wrt the labelled graph G, with mean µ and precision matrix Q > 0. Let L be the Cholesky triangle of Q. Then for i ∈ V,

E(xi | x_{(i+1):n}) = µi − (1/Lii) Σ_{j=i+1}^{n} Lji (xj − µj),   and

Prec(xi | x_{(i+1):n}) = Lii².


Gaussian Markov Random Fields

The Cholesky triangle

The zero-pattern in L

Determine the zero-pattern in L (I)

Theorem
Let x be a GMRF wrt G, with mean µ and precision matrix Q > 0. Let L be the Cholesky triangle of Q and define, for 1 ≤ i < j ≤ n, the set

F(i, j) = {i + 1, . . . , j − 1, j + 1, . . . , n},

which is the future of i except j. Then

xi ⊥ xj | x_{F(i,j)} ⇐⇒ Lji = 0.

If we can verify that Lji is zero, we do not have to compute it when factorising Q.

Gaussian Markov Random Fields

The Cholesky triangle

The zero-pattern in L

Proof

Assume µ = 0 and fix 1 ≤ i < j ≤ n. Theorem 11 gives that

π(x_{i:n}) ∝ exp( −½ Σ_{k=i}^{n} Lkk² ( xk + (1/Lkk) Σ_{j=k+1}^{n} Ljk xj )² )
          = exp( −½ x_{i:n}^T Q^{(i:n)} x_{i:n} ),

where Q^{(i:n)}_{ij} = Lii Lji. Then

xi ⊥ xj | x_{F(i,j)} ⇐⇒ Lii Lji = 0,

which is equivalent to Lji = 0, since Lii > 0 as Q^{(i:n)} > 0.

Gaussian Markov Random Fields

The Cholesky triangle

The zero-pattern in L

Determine the zero-pattern in L (II)

The global Markov property provides a simple and sufficient criterion for checking whether Lji = 0.

Corollary
If F(i, j) separates i < j in G, then Lji = 0.

Corollary
If i ∼ j, then F(i, j) does not separate i < j.

The idea is simple:

• Use the global Markov property to check if Lji = 0.
• Compute only the non-zero terms in L, so that Q = LL^T.

Gaussian Markov Random Fields

The Cholesky triangle

Example: A simple graph

Example

Q =
× × ×
× × ×
× × ×
× × ×

L =
×
× ×
× ? ×
? × × ×


Gaussian Markov Random Fields

The Cholesky triangle

Example: A simple graph

Example

Q =
× × ×
× × ×
× × ×
× × ×

L =
×
× ×
× √ ×
× × ×

Gaussian Markov Random Fields

The Cholesky triangle

Example: auto-regressive processes

Example: AR(1)-process

xt | x_{1:(t−1)} ∼ N(φ x_{t−1}, σ²),   t = 1, . . . , n

Q =
× ×
× × ×
  × × ×
    × × ×
      × × ×
        × × ×
          × ×

L =
×
× ×
  × ×
    × ×
      × ×
        × ×
          × ×

Gaussian Markov Random Fields

Band matrices

Bandwidth is preserved

Bandwidth is preserved

Similarly, for an AR(p)-process:

• Q has bandwidth p.
• L has lower bandwidth p.

Theorem
Let Q > 0 be a band matrix with bandwidth p and dimension n. Then the Cholesky triangle of Q has (lower) bandwidth p.

...it is easy to modify existing Cholesky-factorisation code to use only entries where |i − j| ≤ p.

Gaussian Markov Random Fields

Band matrices

Cholesky factorisation for band-matrices

Avoid computing Lij and reading Qij for |i − j| > p.

Algorithm 8 Band-Cholesky factorisation of Q with bandwidth p
1: for j = 1 to n do
2:   λ = min{j + p, n}
3:   v_{j:λ} = Q_{j:λ,j}
4:   for k = max{1, j − p} to j − 1 do
5:     i = min{k + p, n}
6:     v_{j:i} = v_{j:i} − L_{j:i,k} Ljk
7:   end for
8:   L_{j:λ,j} = v_{j:λ} / √vj
9: end for
10: Return L

Cost is now n(p² + 3p) flops, assuming n ≫ p.
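A sketch of Algorithm 8 (the banded test matrix is an assumed example); entries with |i − j| > p are never touched, here stored in full matrices for clarity.

```python
import numpy as np

def band_cholesky(Q, p):
    n = Q.shape[0]
    L = np.zeros_like(Q, dtype=float)
    for j in range(n):
        lam = min(j + p, n - 1)                     # last row of the band in column j
        v = Q[j:lam + 1, j].astype(float).copy()
        for k in range(max(0, j - p), j):
            i = min(k + p, n - 1)                   # rows beyond i need no update: L is banded
            v[:i - j + 1] -= L[j:i + 1, k] * L[j, k]
        L[j:lam + 1, j] = v / np.sqrt(v[0])
    return L

# an assumed banded precision matrix with bandwidth p = 2
n, p = 8, 2
Q = 4 * np.eye(n) \
    + np.diag(-1.5 * np.ones(n - 1), 1) + np.diag(-1.5 * np.ones(n - 1), -1) \
    + np.diag(0.3 * np.ones(n - 2), 2) + np.diag(0.3 * np.ones(n - 2), -2)
L = band_cholesky(Q, p)
print(np.allclose(L @ L.T, Q), np.allclose(L, np.tril(L)))   # True True
```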


Gaussian Markov Random Fields

Reordering schemes

Introduction

Reorder the vertices

We can permute the vertices:

select one of the n! possible permutations and define the corresponding permutation matrix P, such that i_P = Pi, where i = (1, . . . , n)^T, is the new ordering of the vertices.

Choose P, if possible, such that

Q_P = PQP^T   (23)

is a band-matrix with a small bandwidth.

Gaussian Markov Random Fields

Reordering schemes

Introduction

• Impossible in general to obtain the optimal permutation; n! is too large!

• A sub-optimal ordering will do as well.

• Solve Qµ = b as follows:
  • b_P = Pb.
  • Solve Q_P µ_P = b_P.
  • Map the solution back, µ = P^T µ_P.

Gaussian Markov Random Fields

Reordering schemes

Reordering to band-matrices

Reordering to band-matrices


Gaussian Markov Random Fields

Reordering schemes

Reordering to band-matrices

Reordering to band-matrices

• Factorisation of Q_P required about 0.0018 seconds.

• Solving Qµ = b required about 0.0006 seconds.

• (On a 1200 MHz laptop.)

Gaussian Markov Random Fields

Reordering schemes

Reordering using the idea of nested dissection

More optimal reordering schemes

Global node ordered first:

Q =
× × × × × ×
× ×
×   ×
×     ×
×       ×
×         ×

L =
×
× ×
× √ ×
× √ √ ×
× √ √ √ ×
× √ √ √ √ ×

(√ marks fill-in.)

Global node ordered last:

Q =
×         ×
  ×       ×
    ×     ×
      ×   ×
        × ×
× × × × × ×

L =
×
  ×
    ×
      ×
        ×
× × × × × ×

(no fill-in)

Gaussian Markov Random Fields

Reordering schemes

Reordering using the idea of nested dissection

Nested dissection reordering (I)

The idea generalises as follows.

• Select a (small) set of nodes whose removal divides the graph into two disconnected subgraphs of almost equal size.
• Order the chosen nodes after ordering all the nodes in both subgraphs.
• Apply this procedure recursively to the nodes in each subgraph.

Costs in the spatial case:

• Factorisation O(n^{3/2})
• Fill-in O(n log n)
• Optimal in the order sense.

Gaussian Markov Random Fields

Reordering schemes

Reordering using the idea of nested dissection

Nested dissection reordering (II)


Gaussian Markov Random Fields

What to do with sparse matrices?

Statisticians and numerical methods for sparse matrices

Gupta (2002) summarises his findings on recent advances for sparse linear solvers:

“. . . recent sparse solvers have significantly improved the state of the art of the direct solution of general sparse systems. . . . recent years have seen some remarkable advances in the general sparse direct-solver algorithms and software.”

• Good news

• We just use their software

Gaussian Markov Random Fields

A numerical case-study

Introduction

A numerical case-study of typical GMRFs

Divide GMRF models into three categories.

1. GMRF models in time or on a line: auto-regressive models and models for smooth functions.
   Neighbours of xt are then those xs such that |s − t| ≤ p.

2. Spatial GMRF models: a regular lattice, or an irregular lattice induced by a tessellation or regions of land.
   Neighbours of xi (spatial index i) are those j spatially “close” to i, where “close” is defined from its context.

3. Spatio-temporal GMRF models. Often an extension of spatial models to also include dynamic changes.

Gaussian Markov Random Fields

A numerical case-study

Introduction

Also include some “global” nodes:

Let x be a GMRF with a common mean µ,

x | µ ∼ N(µ1, Q^{-1})   (24)

Assume µ ∼ N(0, σ²).

...then (x, µ) is also a GMRF, where the node µ is a neighbour of all the xi's.

Gaussian Markov Random Fields

A numerical case-study

Introduction

Two different algorithms

1. The band-Cholesky factorisation (BCF) as in Algorithm 8. Here we use the LAPACK routines DPBTRF and DTBSV for the factorisation and the forward/back-substitution, respectively, and the Gibbs-Poole-Stockmeyer algorithm for bandwidth reduction.

2. The Multifrontal Supernodal Cholesky factorisation (MSCF) implementation in the library TAUCS, using the nested dissection reordering from the library METIS.

Both solvers are available in the GMRFLib library.


Gaussian Markov Random Fields

A numerical case-study

Introduction

The tasks we want to investigate are

1. factorising Q into LL^T, and

2. solving LL^T µ = b.

Producing a random sample from the GMRF is half the cost of solving the linear system in step 2.

All tests reported here were conducted on a 1200 MHz laptop with 512 MB memory running Linux.

New machines are nearly 3–10 times as fast...

Gaussian Markov Random Fields

A numerical case-study

GMRF-models in time

GMRF models in time

Let Q be a band-matrix with bandwidth p and dimension n. For such a problem, using the BCF will be (theoretically) optimal, as the fill-in will be zero.

CPU-time (seconds):
               n = 10³            n = 10⁴            n = 10⁵
               p = 5     p = 25   p = 5     p = 25   p = 5     p = 25
  Factorise    0.0005    0.0019   0.0044    0.0271   0.0443    0.2705
  Solve        0.0000    0.0004   0.0031    0.0109   0.0509    0.1052

“Long and thin” is fast!
The MSCF is less optimal for band-matrices: fill-in and more complicated data structures.

Gaussian Markov Random Fields

A numerical case-study

GMRF-models in time

Add 10 global nodes:

CPU-time (seconds):
               n = 10³            n = 10⁴            n = 10⁵
               p = 5     p = 25   p = 5     p = 25   p = 5     p = 25
  Factorise    0.0119    0.0335   0.1394    0.4085   1.6396    4.1679
  Solve        0.0007    0.0035   0.0138    0.0306   0.1541    0.3078

• The fill-in is ≈ pn.

• The nested dissection ordering gives good results in all cases considered so far.

Gaussian Markov Random Fields

A numerical case-study

GMRF-models in space

Spatial GMRF models

Regular lattice


Gaussian Markov Random Fields

A numerical case-study

GMRF-models in space

Spatial GMRF models

Irregular lattice

Gaussian Markov Random Fields

A numerical case-study

GMRF-models in space

CPU-time (seconds):
                      n = 100²         n = 150²         n = 200²
            Method    3×3     5×5      3×3     5×5      3×3     5×5
  Factorise BCF       0.51    1.02     2.60    4.93     13.30   38.12
            MSCF      0.17    0.62     0.55    1.92     1.91    4.90
  Solve     BCF       0.03    0.05     0.10    0.16     0.24    0.43
            MSCF      0.01    0.04     0.04    0.11     0.08    0.21

• For the largest lattice the MSCF really outperforms the BCF.

• The reason is the O(n^{3/2}) cost for the MSCF compared to O(n²) for the BCF.

Gaussian Markov Random Fields

A numerical case-study

GMRF-models in time×space

Spatio-temporal GMRF models

Spatio-temporal GMRF models are often an extension of spatial GMRF models to account for time variation.

Use this model and the graph of Germany, for T = 10 and T = 100.

Gaussian Markov Random Fields

A numerical case-study

GMRF-models in time×space

CPU-time (seconds):
                without global nodes      10 global nodes
                T = 10     T = 100        T = 10     T = 100
  Factorise     0.25       39.96          0.31       39.22
  Solve         0.02       0.42           0.02       0.42

• The results show a quite heavy dependency on T.

• Spatio-temporal GMRFs are more demanding than spatial ones: O(n²) for an n^{1/3} × n^{1/3} × n^{1/3} cube.


Gaussian Markov Random Fields

Part IV

Intrinsic GMRFs

Gaussian Markov Random Fields

Outline I

Introduction

Definition of IGMRFs
  Improper GMRFs

IGMRFs of first order
  Forward differences
  On the line with regular locations
  On the line with irregular locations
  Why IGMRFs are useful in applications
  IGMRFs of first order on irregular lattices
  IGMRFs of first order on regular lattices

IGMRFs of second order
  RW2 model for regular locations
  The CRWk model for regular and irregular locations
  IGMRFs of general order on regular lattices
  IGMRFs of second order on regular lattices

Gaussian Markov Random Fields

Introduction

Intrinsic GMRFs (IGMRF)

• IGMRFs are improper, i.e., they have precision matrices not of full rank.

• Often used as prior distributions in various applications.

• Of particular importance are IGMRFs that are invariant to any trend that is a polynomial of the locations of the nodes, up to a specific order.

Gaussian Markov Random Fields

Definition of IGMRFs

Improper GMRFs

Improper GMRF

Definition
Let Q be an n × n SPSD matrix with rank n − k > 0. Then x = (x1, . . . , xn)^T is an improper GMRF of rank n − k with parameters (µ, Q), if its density is

π(x) = (2π)^{−(n−k)/2} (|Q|*)^{1/2} exp(−½ (x − µ)^T Q (x − µ)).   (25)

Further, x is an improper GMRF wrt the labelled graph G = (V, E), where

Qij ≠ 0 ⇐⇒ {i, j} ∈ E for all i ≠ j.

| · |* denotes the generalised determinant: the product of the non-zero eigenvalues.


Gaussian Markov Random Fields

Definition of IGMRFs

Improper GMRFs

Markov properties

The Markov properties are interpreted as those obtained from the limit of a proper density.

Let the columns of A^T span the null space of Q, and define

Q(γ) = Q + γ A^T A.   (26)

The conditional specification under Q(γ) tends to the corresponding one under Q as γ → 0, so

E(xi | x−i) = µi − (1/Qii) Σ_{j∼i} Qij (xj − µj)

is interpreted as the limit γ → 0.

Gaussian Markov Random Fields

IGMRFs of first order

IGMRF of first order

An intrinsic GMRF of first order is an improper GMRF of rank n − 1 where Q1 = 0.

The condition Q1 = 0 means that

Σ_j Qij = 0,   i = 1, . . . , n

Gaussian Markov Random Fields

IGMRFs of first order

Let µ = 0, so

E(xi | x−i) = −(1/Qii) Σ_{j: j∼i} Qij xj   (27)

with

−Σ_{j: j∼i} Qij / Qii = 1

• The conditional mean of xi is a weighted mean of its neighbours.

• No shrinking towards an overall level.

• Many IGMRFs are constructed such that the deviation from the overall level is a smooth curve in time or a smooth surface in space.

Gaussian Markov Random Fields

IGMRFs of first order

Forward differences

Definition (Forward difference)
Define the first-order forward difference of a function f(·) as

∆f(z) = f(z + 1) − f(z).

Higher-order forward differences are defined recursively:

∆^k f(z) = ∆ ∆^{k−1} f(z),

so

∆² f(z) = f(z + 2) − 2f(z + 1) + f(z)   (28)

and in general, for k = 1, 2, . . .,

∆^k f(z) = (−1)^k Σ_{j=0}^{k} (−1)^j (k choose j) f(z + j).


Gaussian Markov Random Fields

IGMRFs of first order

Forward differences

For a vector z = (z1, z2, . . . , zn)^T, ∆z has elements

∆zi = z_{i+1} − zi,   i = 1, . . . , n − 1.

The forward difference of kth order is an approximation to the kth derivative of f(z),

f′(z) = lim_{h→0} (f(z + h) − f(z)) / h

Gaussian Markov Random Fields

IGMRFs of first order

On the line with regular locations

IGMRFs of first order on the line

The location of node i is i (think “time”).

Assume independent increments

∆xi ∼ iid N(0, κ^{-1}),   i = 1, . . . , n − 1,   so   (29)

xj − xi ∼ N(0, (j − i) κ^{-1})   for i < j.   (30)

If the intersection between {i, . . . , j} and {k, . . . , l} is empty for i < j and k < l, then

Cov(xj − xi, xl − xk) = 0.   (31)

These properties coincide with those of a Wiener process.

Gaussian Markov Random Fields

IGMRFs of first order

On the line with regular locations

The density for x is

π(x | κ) ∝ κ^{(n−1)/2} exp(−(κ/2) Σ_{i=1}^{n−1} (∆xi)²)   (32)

         = κ^{(n−1)/2} exp(−(κ/2) Σ_{i=1}^{n−1} (x_{i+1} − xi)²),   (33)

or

π(x) ∝ κ^{(n−1)/2} exp(−(κ/2) x^T R x)   (34)

Gaussian Markov Random Fields

IGMRFs of first order

On the line with regular locations

The structure matrix

R =
[  1  −1                        ]
[ −1   2  −1                    ]
[     −1   2  −1                ]
[          ⋱   ⋱   ⋱            ]
[             −1   2  −1        ]
[                 −1   2  −1    ]
[                     −1   1    ]   (35)

• The n − 1 independent increments ensure that the rank of Q is n − 1.

• Denote this model by RW1(κ), or RW1 for short.
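A sketch (n is an assumed value): R in (35) equals D^T D, where D is the (n − 1) × n first-order difference matrix, which makes the rank and the invariance to adding a constant easy to check.

```python
import numpy as np

n = 8
D = np.diff(np.eye(n), axis=0)          # rows are the increments x_{i+1} - x_i
R = D.T @ D                             # the tridiagonal structure matrix in (35)
print(R[:3, :3])                        # [[1,-1,0],[-1,2,-1],[0,-1,2]]
print(np.allclose(R @ np.ones(n), 0))   # R 1 = 0: invariant to adding a constant
print(np.linalg.matrix_rank(R))         # n - 1
```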


Gaussian Markov Random Fields

IGMRFs of first order

On the line with regular locations

Full conditionals

xi | x−i, κ ∼ N(½ (x_{i−1} + x_{i+1}), 1/(2κ)),   1 < i < n.   (36)

Alternative interpretation:

• Fit a first-order polynomial

p(j) = β0 + β1 j,   (37)

locally through the points (i − 1, x_{i−1}) and (i + 1, x_{i+1}).

• The conditional mean is p(i).

Gaussian Markov Random Fields

IGMRFs of first order

On the line with regular locations

Example

Samples with n = 99 and κ = 1, obtained by conditioning on the constraint Σ xi = 0.

Gaussian Markov Random Fields

IGMRFs of first order

On the line with regular locations

Example

Marginal variances Var(xi ) for i = 1, . . . , n.

Gaussian Markov Random Fields

IGMRFs of first order

On the line with regular locations

Example

Corr(xn/2, xi ) for i = 1, . . . , n


Gaussian Markov Random Fields

IGMRFs of first order

On the line with irregular locations

The first order RW for irregular locations

The location of xi is si, but s_{i+1} − si is not constant. Assume

s1 < s2 < · · · < sn

and let

δi := s_{i+1} − si.   (38)

Gaussian Markov Random Fields

IGMRFs of first order

On the line with irregular locations

Wiener process

Consider xi as the realisation of a Brownian motion in continuous time, i.e. a Wiener process W(t), at time si.

Definition (Wiener process)

• A Wiener process with precision κ is a continuous-time stochastic process W(t) for t ≥ 0 with W(0) = 0 and such that the increments W(t) − W(s) are Gaussian with mean 0 and variance (t − s)/κ for any 0 ≤ s < t.

• Furthermore, increments over non-overlapping time intervals are independent.

• For κ = 1, this process is called a standard Wiener process.

Gaussian Markov Random Fields

IGMRFs of first order

On the line with irregular locations

Brownian bridge

The full conditional is in this case known as the Brownian bridge:

E(xi | x−i, κ) = (δi / (δ_{i−1} + δi)) x_{i−1} + (δ_{i−1} / (δ_{i−1} + δi)) x_{i+1}   (39)

and

Prec(xi | x−i, κ) = κ (1/δ_{i−1} + 1/δi).   (40)

κ is a precision parameter.

Gaussian Markov Random Fields

IGMRFs of first order

On the line with irregular locations

The precision matrix is now

Qij = κ ×
  1/δ_{i−1} + 1/δi   if j = i
  −1/δi              if j = i + 1
  0                  otherwise   (41)

for 1 < i < n.

A proper correction at the boundary gives the remaining diagonal terms Q11 = κ/δ1 and Qnn = κ/δ_{n−1}.


Gaussian Markov Random Fields

IGMRFs of first order

On the line with irregular locations

The joint density is

π(x | κ) ∝ κ^{(n−1)/2} exp(−(κ/2) Σ_{i=1}^{n−1} (x_{i+1} − xi)² / δi),   (42)

and is invariant to the addition of a constant.

Gaussian Markov Random Fields

IGMRFs of first order

On the line with irregular locations

• The interpretation of a RW1 model as a discretely observed Wiener process justifies the corrections needed.

• The underlying model is the same; it is only observed differently.


Gaussian Markov Random Fields

IGMRFs of first order

Why IGMRFs are useful in applications

When the mean is only locally constant

Alternative IGMRF:

π(x) ∝ κ^{(n−1)/2} exp(−(κ/2) Σ_{i=1}^{n} (xi − x̄)²)   (43)

where x̄ is the empirical mean of x.

Gaussian Markov Random Fields

IGMRFs of first order

Why IGMRFs are useful in applications

Assume n is even and

xi = 0 for 1 ≤ i ≤ n/2,   xi = 1 for n/2 < i ≤ n.   (44)

Thus x is locally constant with two levels.

Evaluating the density at this configuration under the RW1 model and under the alternative (43), we obtain

κ^{(n−1)/2} exp(−κ/2)   and   κ^{(n−1)/2} exp(−nκ/8)   (45)

The log-ratio of the densities is then of order O(n).


Gaussian Markov Random Fields

IGMRFs of first order

Why IGMRFs are useful in applications

• The RW1 model only penalises the local deviation from a constant level.

• The alternative penalises the global deviation from a constant level.

This local behaviour is advantageous in applications if the mean level of x is approximately or locally constant.

A similar argument also applies to polynomial IGMRFs of higher order, constructed using forward differences of order k as independent Gaussian increments.

Gaussian Markov Random Fields

IGMRFs of first order

IGMRFs of first order on irregular lattices

First order IGMRFs on irregular lattices

Consider the map of the 544 regions in Germany. Two regions are neighbours if they share a common border.

Gaussian Markov Random Fields

IGMRFs of first order

IGMRFs of first order on irregular lattices

Between neighbouring regions i and j, say, we define an “independent” Gaussian increment

xi − xj ∼ N(0, κ^{-1})   (46)

Assuming independent increments yields

π(x) ∝ κ^{(n−1)/2} exp(−(κ/2) Σ_{i∼j} (xi − xj)²).   (47)

“i ∼ j” denotes the set of all unordered pairs of neighbours.

The number of increments |i ∼ j| is larger than n, but the rank of the corresponding precision matrix is still n − 1.

Gaussian Markov Random Fields

IGMRFs of first order

IGMRFs of first order on irregular lattices

There are hidden constraints in the increments, due to the more complicated geometry of a lattice compared to the line.

Example
Let n = 3 where all nodes are neighbours. Then x1 − x2 = ε1, x2 − x3 = ε2, and x3 − x1 = ε3, where ε1, ε2 and ε3 are the increments.

This implies that

ε1 + ε2 + ε3 = 0

which is the ‘hidden’ linear constraint.


Gaussian Markov Random Fields

IGMRFs of first order

IGMRFs of first order on irregular lattices

Let ni denote the number of neighbours of region i. The precision matrix Q is

Qij = κ ×
  ni   if i = j,
  −1   if i ∼ j,
  0    otherwise,   (48)

and

xi | x−i, κ ∼ N( (1/ni) Σ_{j: j∼i} xj,  1/(ni κ) ).   (49)

Gaussian Markov Random Fields

IGMRFs of first order

IGMRFs of first order on irregular lattices

Example

Gaussian Markov Random Fields

IGMRFs of first order

IGMRFs of first order on irregular lattices

• In general there is no longer an underlying continuous stochastic process that we can relate to this density.

• If we change the spatial resolution or split a region into two new ones, we change the model.

Gaussian Markov Random Fields

IGMRFs of first order

IGMRFs of first order on irregular lattices

Weighted variants

Incorporate symmetric weights wij for each pair of adjacent nodes i and j.

For example, wij = 1/d(i, j).

Assuming independent increments

xi − xj ∼ N(0, 1/(wij κ)),   (50)

then

π(x) ∝ κ^{(n−1)/2} exp(−(κ/2) Σ_{i∼j} wij (xi − xj)²).   (51)


Gaussian Markov Random Fields

IGMRFs of first order

IGMRFs of first order on irregular lattices

The precision matrix is

Qij = κ ×
  Σ_{k: k∼i} wik   if i = j,
  −wij             if i ∼ j,
  0                otherwise,   (52)

and the full conditional of xi has mean and precision

(Σ_{j: j∼i} xj wij) / (Σ_{j: j∼i} wij)   and   κ Σ_{j: j∼i} wij,   (53)

respectively.

Gaussian Markov Random Fields

IGMRFs of first order

IGMRFs of first order on regular lattices

First order IGMRFs on regular lattices
For a lattice I_n with n = n_1 n_2 nodes, let i = (i_1, i_2) denote the node in the i_1-th row and i_2-th column.

Use the nearest four sites of i as its neighbours:

$$(i_1 + 1, i_2),\ (i_1 - 1, i_2),\ (i_1, i_2 + 1),\ (i_1, i_2 - 1).$$

The precision matrix is

$$Q_{ij} = \kappa\begin{cases} n_i & i = j,\\ -1 & i \sim j,\\ 0 & \text{otherwise},\end{cases} \qquad (54)$$

and the full conditionals for x_i are

$$x_i \mid x_{-i}, \kappa \sim \mathcal{N}\Big(\frac{1}{n_i}\sum_{j:j\sim i} x_j,\; \frac{1}{n_i\kappa}\Big). \qquad (55)$$

Gaussian Markov Random Fields

IGMRFs of first order

IGMRFs of first order on regular lattices

Limiting behaviour

This model has an important property:

The process converges to the de Wijs process, a Gaussian process with variogram

$$\log(\text{distance}).$$

This is important:

• there is a limiting process;
• the de Wijs process has a dense precision matrix, whereas the IGMRF is very sparse!

Gaussian Markov Random Fields

IGMRFs of second order

RW2 model for regular locations

The RW2 model for regular locations

Let s_i = i for i = 1, . . . , n, with a constant distance between consecutive nodes.

Use the second order increments

$$\Delta^2 x_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \kappa^{-1}) \qquad (56)$$

for i = 1, . . . , n − 2, to define the joint density of x:

$$\pi(x) \propto \kappa^{(n-2)/2}\exp\Big(-\frac{\kappa}{2}\sum_{i=1}^{n-2}(x_i - 2x_{i+1} + x_{i+2})^2\Big) \qquad (57)$$
$$= \kappa^{(n-2)/2}\exp\Big(-\frac{\kappa}{2}x^T R x\Big) \qquad (58)$$


Gaussian Markov Random Fields

IGMRFs of second order

RW2 model for regular locations

$$R = \begin{pmatrix}
1 & -2 & 1 & & & & & \\
-2 & 5 & -4 & 1 & & & & \\
1 & -4 & 6 & -4 & 1 & & & \\
 & 1 & -4 & 6 & -4 & 1 & & \\
 & & \ddots & \ddots & \ddots & \ddots & \ddots & \\
 & & & 1 & -4 & 6 & -4 & 1 \\
 & & & & 1 & -4 & 5 & -2 \\
 & & & & & 1 & -2 & 1
\end{pmatrix}. \qquad (59)$$

• Verify directly that Q S_1 = 0 and that the rank of Q is n − 2 (a small numerical check is sketched below).

• IGMRF of second order: invariant to adding a line (a first order polynomial) to x.

• Known as the second order random walk model, denoted by RW2(κ) or simply the RW2 model.
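A small numerical check (an illustrative sketch, not from the original slides) that builds R as the cross-product of the second-difference matrix and verifies Q S_1 = 0 and the rank deficiency of two:

import numpy as np

def rw2_structure(n):
    # D is the (n-2) x n second-difference matrix, so R = D^T D as in (57)-(59)
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]   # x_i - 2 x_{i+1} + x_{i+2}
    return D.T @ D

n = 8
R = rw2_structure(n)
S1 = np.column_stack([np.ones(n), np.arange(n)])  # constant and linear columns
print(np.allclose(R @ S1, 0.0))                   # R S_1 = 0
print(np.linalg.matrix_rank(R))                   # rank n - 2 = 6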

Gaussian Markov Random Fields

IGMRFs of second order

RW2 model for regular locations

Remarks

• This is the RW2 model defined and used in the literature.

• We cannot extend it consistently to the case where the locations are irregular.

• Similar problems occur if we increase the resolution from n to 2n locations, say.

• This is in contrast to the RW1 model.

• There exists an alternative (somewhat more involved) formulation with the desired continuous time interpretation.

Gaussian Markov Random Fields

IGMRFs of second order

RW2 model for regular locations

The conditional mean and precision are

$$\text{E}(x_i \mid x_{-i}, \kappa) = \frac{4}{6}(x_{i+1} + x_{i-1}) - \frac{1}{6}(x_{i+2} + x_{i-2}), \qquad (60)$$
$$\text{Prec}(x_i \mid x_{-i}, \kappa) = 6\kappa, \qquad (61)$$

respectively, for 2 < i < n − 2.

Gaussian Markov Random Fields

IGMRFs of second order

RW2 model for regular locations

Consider the second order polynomial

$$p(j) = \beta_0 + \beta_1 j + \frac{1}{2}\beta_2 j^2 \qquad (62)$$

and compute the coefficients by a local least squares fit to the points

$$(i-2, x_{i-2}),\ (i-1, x_{i-1}),\ (i+1, x_{i+1}),\ (i+2, x_{i+2}). \qquad (63)$$

It turns out that

$$\text{E}(x_i \mid x_{-i}, \kappa) = p(i).$$
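This identity is easy to check numerically; the following sketch (an assumed example, not from the slides) fits the polynomial (62) to the four points in (63) by least squares and compares p(i) with the conditional mean (60).

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)            # x_{i-2}, x_{i-1}, x_i, x_{i+1}, x_{i+2}

# Least squares fit of p(j) = b0 + b1*j + 0.5*b2*j^2 at offsets j = -2, -1, 1, 2
j = np.array([-2.0, -1.0, 1.0, 2.0])
A = np.column_stack([np.ones(4), j, 0.5 * j**2])
beta, *_ = np.linalg.lstsq(A, x[[0, 1, 3, 4]], rcond=None)
p_i = beta[0]                     # p(0), the fitted polynomial at the centre point

cond_mean = (4.0 / 6.0) * (x[3] + x[1]) - (1.0 / 6.0) * (x[4] + x[0])   # eq. (60)
print(np.allclose(p_i, cond_mean))   # True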


Gaussian Markov Random Fields

IGMRFs of second order

RW2 model for regular locations

Samples with n = 99 and κ = 1 by conditioning on the constraints.

Gaussian Markov Random Fields

IGMRFs of second order

RW2 model for regular locations

Marginal variances Var(xi ) for i = 1, . . . , n.

Gaussian Markov Random Fields

IGMRFs of second order

RW2 model for regular locations

Corr(xn/2, xi ) for i = 1, . . . , n

Gaussian Markov Random Fields

The CRWk model for regular and irregular locations

Random walk models: irregular locations

• Locations s1 < s2 < . . . < sn.

• Discretely observed (integrated) Wiener process

• Augment with the velocities

• Valid for any order

• Connection to spline-models


Gaussian Markov Random Fields

The CRWk model for regular and irregular locations

Random walk models: irregular locations

• Locations s1 < s2 < . . . < sn.

• Discretely observed (integrated) Wiener process

• Augment with the velocities

• Valid for any order

• Connection to spline-models

The joint precision matrix has the block tridiagonal form

$$\begin{pmatrix}
A_1 & B_1 & & & \\
B_1^T & A_2 + C_1 & B_2 & & \\
 & B_2^T & A_3 + C_2 & B_3 & \\
 & & \ddots & \ddots & B_{n-1} \\
 & & & B_{n-1}^T & C_{n-1}
\end{pmatrix}.$$

Gaussian Markov Random Fields

The CRWk model for regular and irregular locations

Random walk models: irregular locations

• Locations s1 < s2 < . . . < sn.

• Discretely observed (integrated) Wiener process

• Augment with the velocities

• Valid for any order

• Connection to spline-models

$$A_i = \begin{pmatrix}12/\delta_i^3 & 6/\delta_i^2\\ 6/\delta_i^2 & 4/\delta_i\end{pmatrix} \qquad B_i = \begin{pmatrix}-12/\delta_i^3 & 6/\delta_i^2\\ -6/\delta_i^2 & 2/\delta_i\end{pmatrix} \qquad C_i = \begin{pmatrix}12/\delta_i^3 & -6/\delta_i^2\\ -6/\delta_i^2 & 4/\delta_i\end{pmatrix}$$

where

$$\delta_i = s_{i+1} - s_i.$$
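A minimal sketch (an assumption-laden illustration, not library code) of how the augmented CRW2 precision matrix can be assembled from these blocks for given locations:

import numpy as np

def crw2_precision(s, kappa=1.0):
    # State vector (x_1, v_1, ..., x_n, v_n): positions augmented with velocities
    s = np.asarray(s, dtype=float)
    n = len(s)
    Q = np.zeros((2 * n, 2 * n))
    for i, d in enumerate(np.diff(s)):
        A = np.array([[12 / d**3,  6 / d**2], [ 6 / d**2, 4 / d]])
        B = np.array([[-12 / d**3, 6 / d**2], [-6 / d**2, 2 / d]])
        C = np.array([[12 / d**3, -6 / d**2], [-6 / d**2, 4 / d]])
        a, b = slice(2 * i, 2 * i + 2), slice(2 * i + 2, 2 * i + 4)
        Q[a, a] += A
        Q[a, b] += B
        Q[b, a] += B.T
        Q[b, b] += C
    return kappa * Q

Q = crw2_precision([0.0, 0.5, 1.7, 3.0])
print(np.linalg.matrix_rank(Q))   # 2n - 2 = 6: the model is invariant to adding a line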

Gaussian Markov Random Fields

IGMRFs of general order on regular lattices

IGMRFs of higher order on regular lattices

The precision matrix is orthogonal to a polynomial design matrix of a certain degree.

Definition (IGMRF of order k in dimension d)
An IGMRF of order k in dimension d is an improper GMRF of rank n − m_{k−1,d} where Q S_{k−1,d} = 0.

Gaussian Markov Random Fields

IGMRFs of second order on regular lattices

A second order IGMRF in two dimensions

Consider a regular lattice I_n in d = 2 dimensions where s_{i_1} = i_1 and s_{i_2} = i_2.

Choose the independent increments

$$\big(x_{(i_1+1,i_2)} + x_{(i_1-1,i_2)} + x_{(i_1,i_2+1)} + x_{(i_1,i_2-1)}\big) - 4x_{(i_1,i_2)}. \qquad (64)$$

The motivation for this choice is that (64) equals

$$\big(\Delta_{i_1}^2 + \Delta_{i_2}^2\big)\, x_{i_1-1,i_2-1}, \qquad (65)$$

which is invariant to adding a first order polynomial

$$p_{1,2}(i_1, i_2) = \beta_{00} + \beta_{10} i_1 + \beta_{01} i_2.$$


Gaussian Markov Random Fields

IGMRFs of second order on regular lattices

The precision matrix (apart from boundary effects) should have non-zero elements given by

$$-\big(\Delta_{i_1}^2 + \Delta_{i_2}^2\big)^2 = -\big(\Delta_{i_1}^4 + 2\Delta_{i_1}^2\Delta_{i_2}^2 + \Delta_{i_2}^4\big), \qquad (66)$$

which is a negative difference approximation to the biharmonic differential operator

$$\Big(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\Big)^2 = \frac{\partial^4}{\partial x^4} + 2\frac{\partial^4}{\partial x^2\partial y^2} + \frac{\partial^4}{\partial y^4}. \qquad (67)$$

The fundamental solution of the biharmonic equation

$$\Big(\frac{\partial^4}{\partial x^4} + 2\frac{\partial^4}{\partial x^2\partial y^2} + \frac{\partial^4}{\partial y^4}\Big)\phi(x, y) = 0 \qquad (68)$$

is the thin plate spline.

Gaussian Markov Random Fields

IGMRFs of second order on regular lattices

Full conditionals in the interior

$$\text{E}(x_i \mid x_{-i}) = \frac{1}{20}\Big(8\sum_{\text{nearest}} x_j \;-\; 2\sum_{\text{diagonal}} x_j \;-\; 1\sum_{\text{two-away}} x_j\Big), \qquad (69)$$

where the sums run over the four nearest neighbours, the four diagonal neighbours and the four neighbours two steps away along the axes, and

$$\text{Prec}(x_i \mid x_{-i}) = 20\kappa. \qquad (70)$$

Gaussian Markov Random Fields

IGMRFs of second order on regular lattices

Alternative IGMRFs in two dimensions

Using only the five-point stencil (the centre node and its four nearest neighbours) in (71) to obtain a difference approximation to

$$\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \qquad (72)$$

is not optimal.

The discretisation error differs at 45 degrees to the main directions, hence we should expect a "directional" effect.

Gaussian Markov Random Fields

IGMRFs of second order on regular lattices

Use an isotropic approximation (e.g. the "Mehrstellen" stencil), with weight −10/3 on the centre node, 2/3 on the four nearest neighbours and 1/6 on the four diagonal neighbours. (73)

The resulting full conditional in the interior is

$$\text{E}(x_i \mid x_{-i}) = \frac{1}{468}\big(144\,(\cdots) - 18\,(\cdots) + 8\,(\cdots) - 8\,(\cdots) - 1\,(\cdots)\big), \qquad (74)$$

where each group of terms sums the values of x over one neighbour configuration of the corresponding 5 × 5 stencil, and

$$\text{Prec}(x_i \mid x_{-i}) = 13\kappa. \qquad (75)$$


Gaussian Markov Random Fields

IGMRFs of second order on regular lattices

Samples.

Gaussian Markov Random Fields

IGMRFs of second order on regular lattices

Samples.

Gaussian Markov Random Fields

Part V

Hierarchical GMRF models

Gaussian Markov Random Fields

Outline I

Hierarchical GMRF models
  A simple example
  Classes of hierarchical GMRF models

A brief introduction to MCMC

Blocking
  Rate of convergence
  Example
  One-block algorithm

Block algorithms for hierarchical GMRF models
  General setup
  The one-block algorithm
  The sub-block algorithm
  GMRF approximations

Merging GMRFs
  Introduction


Gaussian Markov Random Fields

Outline II

Why is it so?

Gaussian Markov Random Fields

Hierarchical GMRF models

Hierarchical GMRF models
Characterised through several stages of observables and parameters.

A typical scenario is as follows.

Stage 1 Formulate a distributional assumption for the observables, dependent on latent parameters.

• For a time series of binary observations y, we may assume

$$y_i \sim \mathcal{B}(p_i), \quad i = 1, \ldots, n.$$

• We assume the observations to be conditionally independent.

Gaussian Markov Random Fields

Hierarchical GMRF models

Hierarchical GMRF models
Characterised through several stages of observables and parameters.

A typical scenario is as follows.

Stage 2 Assign a prior model, i.e. a GMRF, for the unknown parameters, here p_i.

• Choose an autoregressive model for the logit-transformed probabilities x_i = logit(p_i).

Gaussian Markov Random Fields

Hierarchical GMRF models

Hierarchical GMRF models
Characterised through several stages of observables and parameters.

A typical scenario is as follows.

Stage 3 Assign priors to the unknown parameters (or hyperparameters) of the GMRF:

• the precision parameter κ
• the "strength" of dependency.

Further stages can be added if needed.


Gaussian Markov Random Fields

Hierarchical GMRF models

A simple example

Tokyo rainfall data

Stage 1 Binomial data

$$y_i \sim \begin{cases}\text{Binomial}(2,\ p(x_i)) & \text{for most calendar days } i,\\ \text{Binomial}(1,\ p(x_i)) & \text{for 29 February (observed in one year only).}\end{cases}$$

Gaussian Markov Random Fields

Hierarchical GMRF models

A simple example

Tokyo rainfall data

Stage 2 Assume a smooth latent x:

$$x \sim \text{RW2}(\kappa), \qquad \text{logit}(p_i) = x_i$$

Gaussian Markov Random Fields

Hierarchical GMRF models

A simple example

Tokyo rainfall data

Stage 3 Gamma(α, β)-prior on κ

Gaussian Markov Random Fields

Hierarchical GMRF models

Classes of hierarchical GMRF models

Classes of hierarchical GMRF models

• Normal data
  • Block-MCMC

• Non-normal data that allows for a normal-mixture representation
  • Student-t distribution
  • Logistic and Laplace (binary regression)
  • Block-MCMC with auxiliary variables

• Non-normal data
  • Poisson
  • and others...
  • Block-MCMC with GMRF approximations


Gaussian Markov Random Fields

A brief introduction to MCMC

MCMC
Construct a Markov chain

$$\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(k)}, \ldots$$

that converges (under some conditions) to π(θ).

1. Start with θ^(0) where π(θ^(0)) > 0. Set k = 1.

2. Generate a proposal θ∗ from some proposal kernel q(θ∗ | θ^(k−1)). Set θ^(k) = θ∗ with probability

$$\alpha = \min\Big\{1,\ \frac{\pi(\theta^*)}{\pi(\theta^{(k-1)})}\,\frac{q(\theta^{(k-1)} \mid \theta^*)}{q(\theta^* \mid \theta^{(k-1)})}\Big\};$$

otherwise set θ^(k) = θ^(k−1).

3. Set k = k + 1 and go back to 2. (A minimal implementation sketch follows.)
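For concreteness, a minimal random-walk Metropolis sketch in Python (an illustrative assumption, not part of the slides); with a symmetric proposal the q-ratio in α cancels.

import numpy as np

def metropolis(log_pi, theta0, n_iter=10000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    chain = np.empty((n_iter, theta.size))
    lp = log_pi(theta)
    for k in range(n_iter):
        proposal = theta + step * rng.normal(size=theta.size)  # symmetric RW proposal
        lp_prop = log_pi(proposal)
        if np.log(rng.uniform()) < lp_prop - lp:                # accept with probability alpha
            theta, lp = proposal, lp_prop
        chain[k] = theta
    return chain

chain = metropolis(lambda t: -0.5 * np.sum(t**2), theta0=[3.0])  # standard normal target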

Gaussian Markov Random Fields

A brief introduction to MCMC

Depending on the specific choice of the proposal kernel q(θ∗|θ), very different algorithms result.

• When q(θ∗|θ) does not depend on the current value of θ, the proposal is called an independence proposal.

• When q(θ∗|θ) = q(θ|θ∗) we have a Metropolis proposal. These include so-called random-walk proposals.

The rate of convergence towards π(θ) and the degree of dependence between successive samples of the Markov chain (mixing) will depend on the chosen proposal.

Gaussian Markov Random Fields

A brief introduction to MCMC

Single-site algorithms

• Most MCMC algorithms have been based on updating each scalar component θ_i, i = 1, . . . , p, of θ conditional on θ_{−i}.

• Apply the MH-algorithm in turn to every component θ_i of θ with arbitrary proposal kernels q_i(θ∗_i | θ_i, θ_{−i}).

• As long as we update each component of θ, this algorithm will converge to the target distribution π(θ).

Gaussian Markov Random Fields

A brief introduction to MCMC

However...

• Such single-site updating can be disadvantageous if parameters are highly dependent in the posterior distribution π(θ).

• The problem is that the Markov chain may move around very slowly in its target (posterior) distribution.

• A general approach to circumvent this problem is to update parameters in larger blocks θ_j: vectors of components of θ.

• The choice of blocks is often controlled by what is possible to do in practice.

• Ideally, we should choose a small number of blocks with large dependence within the blocks but less dependence between blocks.


Gaussian Markov Random Fields

Blocking

Blocking

Assume the following:

• θ = (κ, x), where
  • κ is the precision and
  • x is a GMRF.

Two-block approach:

• sample x ∼ π(x|κ), and
• sample κ ∼ π(κ|x).

Often there is strong dependence between κ and x in the posterior; this is resolved using a joint update of (x, κ).

Why this modification is important and why it works is discussed next.

Gaussian Markov Random Fields

Blocking

Rate of convergence

Rate of convergence

Let θ^(1), θ^(2), . . . denote a Markov chain with target distribution π(θ) and initial value θ^(0) ∼ π(θ).

Rate of convergence ρ: how quickly E(h(θ^(t)) | θ^(0)) approaches the stationary value E(h(θ)) for all square π-integrable functions h(·).

Let ρ be the minimum number such that for all h(·) and for all r > ρ

$$\lim_{k\to\infty} \text{E}\Big[\big(\text{E}\big(h(\theta^{(k)}) \mid \theta^{(0)}\big) - \text{E}\big(h(\theta)\big)\big)^2 r^{-2k}\Big] = 0. \qquad (76)$$

Gaussian Markov Random Fields

Blocking

Example

Example

Let x be a first-order autoregressive process

$$x_t - \mu = \gamma(x_{t-1} - \mu) + \nu_t, \quad t = 2, \ldots, n, \qquad (77)$$

where |γ| < 1, the ν_t are iid zero-mean normals with variance σ², and x_1 ∼ N(µ, σ²/(1−γ²)).

Let γ, σ² and µ be fixed parameters.

At each iteration a single-site Gibbs sampler will sample x_t from the full conditional π(x_t | x_{−t}) for t = 1, . . . , n:

$$x_t \mid x_{-t} \sim \begin{cases}
\mathcal{N}\big(\mu + \gamma(x_2 - \mu),\ \sigma^2\big) & t = 1,\\[2pt]
\mathcal{N}\Big(\mu + \frac{\gamma}{1+\gamma^2}(x_{t-1} + x_{t+1} - 2\mu),\ \frac{\sigma^2}{1+\gamma^2}\Big) & t = 2, \ldots, n-1,\\[2pt]
\mathcal{N}\big(\mu + \gamma(x_{n-1} - \mu),\ \sigma^2\big) & t = n.
\end{cases}$$
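A sketch of one sweep of this single-site Gibbs sampler (illustrative code; the parameter values in the example run are chosen arbitrarily):

import numpy as np

def gibbs_sweep(x, mu, gamma, sigma2, rng):
    n = len(x)
    sd_mid = np.sqrt(sigma2 / (1 + gamma**2))
    x[0] = mu + gamma * (x[1] - mu) + rng.normal(0.0, np.sqrt(sigma2))
    for t in range(1, n - 1):
        m = mu + gamma / (1 + gamma**2) * (x[t - 1] + x[t + 1] - 2 * mu)
        x[t] = m + rng.normal(0.0, sd_mid)
    x[n - 1] = mu + gamma * (x[n - 2] - mu) + rng.normal(0.0, np.sqrt(sigma2))
    return x

rng = np.random.default_rng(1)
x = np.zeros(200)
for _ in range(1000):          # with gamma close to 1, these sweeps mix slowly
    x = gibbs_sweep(x, mu=0.0, gamma=0.95, sigma2=1.0, rng=rng)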

Gaussian Markov Random Fields

Blocking

Example

• For large n, the rate of convergence is

$$\rho = \frac{4\gamma^2}{(1+\gamma^2)^2}. \qquad (78)$$

• For |γ| close to one the rate of convergence can be slow: if γ = 1 − δ for small δ > 0, then ρ = 1 − δ² + O(δ³).

To circumvent this problem, we may update x in one block. This is possible as x is a GMRF.

This yields immediate convergence.


Gaussian Markov Random Fields

Blocking

Example

Relax the assumption of fixed hyperparameters.

Consider a hierarchical formulation where the mean of x_t, µ, is unknown and assigned a standard normal prior,

$$\mu \sim \mathcal{N}(0, 1) \quad\text{and}\quad x \mid \mu \sim \mathcal{N}(\mu 1, Q^{-1}),$$

where Q is the precision matrix of the GMRF x|µ.

The joint density of (µ, x) is normal.

We have two natural blocks, µ and x.

Gaussian Markov Random Fields

Blocking

Example

A two-block Gibbs sampler updates µ and x with samples from their full conditionals,

$$\mu^{(k)} \mid x^{(k-1)} \sim \mathcal{N}\Big(\frac{1^TQx^{(k-1)}}{1 + 1^TQ1},\ (1 + 1^TQ1)^{-1}\Big), \qquad x^{(k)} \mid \mu^{(k)} \sim \mathcal{N}(\mu^{(k)}1,\ Q^{-1}). \qquad (79)$$

The presence of the hyperparameter µ will slow down the convergence compared to the case when µ is fixed.

Due to the nice structure of (79) we can characterise explicitly the marginal chain of µ^(k).

Gaussian Markov Random Fields

Blocking

Example

Theorem
The marginal chain µ^(1), µ^(2), . . . from the two-block Gibbs sampler defined in (79), started in equilibrium, is a first-order autoregressive process

$$\mu^{(k)} = \phi\,\mu^{(k-1)} + \varepsilon_k,$$

where

$$\phi = \frac{1^TQ1}{1 + 1^TQ1} \qquad\text{and}\qquad \varepsilon_k \overset{\text{iid}}{\sim} \mathcal{N}(0, 1 - \phi^2).$$

Gaussian Markov Random Fields

Blocking

Example

Proof. It follows directly that the marginal chain µ^(1), µ^(2), . . . is a first-order autoregressive process.

The coefficient φ is found by computing the covariance at lag 1,

$$\begin{aligned}
\text{Cov}(\mu^{(k)}, \mu^{(k+1)}) &= \text{E}\big(\mu^{(k)}\mu^{(k+1)}\big) = \text{E}\big(\mu^{(k)}\,\text{E}\big(\mu^{(k+1)} \mid \mu^{(k)}\big)\big)\\
&= \text{E}\big(\mu^{(k)}\,\text{E}\big(\text{E}\big(\mu^{(k+1)} \mid x^{(k)}\big) \mid \mu^{(k)}\big)\big)\\
&= \text{E}\Big(\mu^{(k)}\,\text{E}\Big(\frac{1^TQx^{(k)}}{1 + 1^TQ1} \,\Big|\, \mu^{(k)}\Big)\Big)\\
&= \frac{1^TQ1}{1 + 1^TQ1}\,\text{Var}(\mu^{(k)}),
\end{aligned}$$

which is known to be φ times the variance Var(µ^(k)) for a first-order autoregressive process.


Gaussian Markov Random Fields

Blocking

Example

For our model

$$\phi = \frac{n(1-\gamma)^2/\sigma^2}{1 + n(1-\gamma)^2/\sigma^2} = 1 - \frac{\text{Var}(x_t)}{n}\,\frac{1-\gamma^2}{(1-\gamma)^2} + \mathcal{O}(1/n^2).$$

When n is large, φ is close to 1 and the chain will both mix and converge slowly even though we use a two-block Gibbs sampler.

Note that

• increasing the variance of x_t improves the convergence.

However, φ → 1 as n → ∞, which is bad.
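The behaviour of φ is easy to reproduce numerically; the sketch below (an assumed illustration, not from the slides) builds the AR(1) precision matrix implied by (77) and evaluates φ = 1ᵀQ1/(1 + 1ᵀQ1) for increasing n:

import numpy as np

def ar1_precision(n, gamma, sigma2=1.0):
    # Precision matrix of the stationary AR(1) process in (77)
    Q = np.zeros((n, n))
    idx = np.arange(n)
    Q[idx, idx] = 1.0 + gamma**2
    Q[0, 0] = Q[-1, -1] = 1.0
    Q[idx[:-1], idx[1:]] = Q[idx[1:], idx[:-1]] = -gamma
    return Q / sigma2

for n in (10, 100, 1000):
    Q = ar1_precision(n, gamma=0.9)
    s = np.ones(n) @ Q @ np.ones(n)   # 1^T Q 1, approximately n (1 - gamma)^2 / sigma^2
    print(n, s / (1.0 + s))           # phi approaches 1 as n grows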

Gaussian Markov Random Fields

Blocking

Example

What is going on?

(a) Trace of µ(k), and (b) the pairs (µ(k), 1T Qx(k)), with µ(k) onthe horizontal axis.

Gaussian Markov Random Fields

Blocking

One-block algorithm

One-block algorithm

So far: blocking improves the mixing mainly within the block. If there is strong dependence between blocks, the MCMC algorithm may still suffer from slow convergence.

Update (µ, x) jointly by delaying the accept/reject step until x is also updated:

$$\mu^* \sim q(\mu^* \mid \mu^{(k-1)}), \qquad x^* \mid \mu^* \sim \mathcal{N}(\mu^* 1, Q^{-1}), \qquad (80)$$

then accept/reject (µ∗, x∗) jointly.

Here, q(µ∗ | µ^(k−1)) can be a simple random-walk proposal or some other suitable proposal distribution.

Gaussian Markov Random Fields

Blocking

One-block algorithm

What is going on?

(a) Trace of µ(k), and (b) the pairs (µ(k), 1T Qx(k)), with µ(k) onthe horizontal axis.


Gaussian Markov Random Fields

Blocking

One-block algorithm

With a symmetric µ-proposal,

$$\alpha = \min\Big\{1,\ \exp\Big(-\frac{1}{2}\big((\mu^*)^2 - (\mu^{(k-1)})^2\big)\Big)\Big\}. \qquad (81)$$

• Only the marginal density of µ is needed in (81): we effectively integrate x out of the target density.

• The minor modification of delaying the accept/reject step until x is updated as well can give a large improvement.

• In this case: a random walk on a one-dimensional density. (A sketch of the full update is given below.)
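A sketch of the one-block update (80)-(81) for this toy model (illustrative code; the step size and the dense Cholesky factorisation are assumptions made for the example):

import numpy as np

def one_block_sampler(Q, n_iter=5000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    L = np.linalg.cholesky(Q)                 # Q = L L^T
    mu, x, mus = 0.0, np.zeros(n), []
    for _ in range(n_iter):
        mu_star = mu + step * rng.normal()    # symmetric random-walk proposal for mu
        # x* | mu* ~ N(mu* 1, Q^{-1}), sampled by solving L^T z = standard normals
        x_star = mu_star + np.linalg.solve(L.T, rng.normal(size=n))
        # Because x* is drawn from its full conditional, x cancels in alpha
        # and only the marginal density of mu remains, cf. (81)
        if np.log(rng.uniform()) < -0.5 * (mu_star**2 - mu**2):
            mu, x = mu_star, x_star
        mus.append(mu)
    return np.array(mus), x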

Gaussian Markov Random Fields

Block algorithms for hierarchical GMRF models

General setup

A more general setup

• Hyperparameters θ (low dimension)

• GMRF x | θ of size n

• Observe x with data y.

The posterior is

$$\pi(x, \theta \mid y) \propto \pi(\theta)\,\pi(x \mid \theta)\,\pi(y \mid x, \theta).$$

Assume we are able to sample from π(x|θ, y), i.e., the full conditional of x is a GMRF.

Gaussian Markov Random Fields

Block algorithms for hierarchical GMRF models

The one-block algorithm

The one-block algorithm

The following proposal update (θ, x) in one block:

θ∗ ∼ q(θ∗ | θ(k−1))

x∗ ∼ π(x | θ∗, y).(82)

The proposal (θ∗, x∗) is then accepted/rejected jointly.

We denote this as the one-block algorithm.

Gaussian Markov Random Fields

Block algorithms for hierarchical GMRF models

The one-block algorithm

Consider the θ-chain: we are then in fact sampling from the posterior marginal π(θ|y) using the proposal

$$\theta^* \sim q(\theta^* \mid \theta^{(k-1)}).$$

The dimension of θ is typically low (1–5, say).

The proposed algorithm should not experience any serious mixing problems.


Gaussian Markov Random Fields

Block algorithms for hierarchical GMRF models

The one-block algorithm

The one-block algorithm is not always feasible

The one-block algorithm is not always feasible, for the following reasons:

1. The full conditional of x can be a GMRF with a precision matrix that is not sparse. This will prohibit a fast factorisation, hence a joint update is feasible but not computationally efficient.

2. The data can be non-normal, so the full conditional of x is not a GMRF and sampling x∗ using (82) is not possible (in general).

Gaussian Markov Random Fields

Block algorithms for hierarchical GMRF models

The sub-block algorithm

...not sparse precision matrix

These cases can often be approached using sub-blocks of (θ, x): the sub-block algorithm.

Assume a natural splitting exists for both θ and x into

$$(\theta_a, x_a),\ (\theta_b, x_b)\ \text{and}\ (\theta_c, x_c). \qquad (83)$$

The sets a, b, and c do not need to be disjoint.

One class of examples where such an approach is fruitful is (geo-)additive models, where a, b, and c represent three different covariate effects with their respective hyperparameters.

Gaussian Markov Random Fields

Block algorithms for hierarchical GMRF models

GMRF approximations

...non-Gaussian full conditionals

Auxiliary variables

• can help achieve Gaussian full conditionals:
  • logit and probit regression models for binary and multi-categorical data, and
  • Student-t_ν distributed observations.

GMRF approximations

• approximate π(x|θ, y) using a second-order Taylor expansion.

• Prominent example: Poisson regression.

• Such approximations can be surprisingly accurate in many cases and can be interpreted as integrating x approximately out of π(x, θ|y).

Gaussian Markov Random Fields

Merging GMRFs

Introduction

Merging GMRFs using conditioning (I)

µ ∼ N (0, 1)

x− µ | µ ∼ AR(1)

z | x ∼ N (x, I)

y | z ∼ N (z, I)

• x∗ = (µ, x, z, y) is a GMRF

• x∗ | y is a GMRF


Gaussian Markov Random Fields

Merging GMRFs

Introduction

Merging GMRFs using conditioning (I)

µ ∼ N (0, 1)

x− µ | µ ∼ AR(1)

z | x ∼ N (x, I)

y | z ∼ N (z, I)

• x∗ = (µ, x, z, y) is a GMRF

• x∗ | y is a GMRF

• Additional hyperparameters θ

Gaussian Markov Random Fields

Merging GMRFs

Introduction

Merging GMRFs using conditioning (II)

Gaussian Markov Random Fields

Merging GMRFs

Why is it so?

Merging GMRFs using conditioning (III)

If

$$x \sim \mathcal{N}(0, Q^{-1}), \qquad y \mid x \sim \mathcal{N}(x, K^{-1}), \qquad z \mid x, y \sim \mathcal{N}(y, H^{-1}),$$

then

$$\text{Prec}(x, y, z) = \begin{pmatrix} Q + K & -K & 0\\ -K & K + H & -H\\ 0 & -H & H\end{pmatrix},$$

which is sparse if Q, K and H are.

Gaussian Markov Random Fields

Part VI

Applications


Gaussian Markov Random Fields

Outline I

Normal data
  Munich rental data

Normal-mixture response models
  Introduction
  Hierarchical-t formulations
  Binary regression
  Summary

Non-normal response models
  GMRF approximation
  Joint analysis of diseases

Gaussian Markov Random Fields

Normal data

Normal data: Munich rental guide

• Response variable y_i: rent per m²
• location
• floor space
• year of construction
• various indicator variables, such as
  • no central heating
  • no bathroom
  • large balcony

German law: an increase in the rent must be based on the 'average rent' of a comparable flat.

Gaussian Markov Random Fields

Normal data

Munich rental data

Spatial regression model

$$y_i \sim \mathcal{N}\big(\mu + x_{S(i)} + x_{C(i)} + x_{L(i)} + z_i^T\beta,\ 1/\kappa_y\big)$$

x_{S(i)}: floor space (continuous RW2)
x_{C(i)}: year of construction (continuous RW2)
x_{L(i)}: location (first order IGMRF)
β: parameters for the indicator variables

• 380 spatial locations
• 2035 observations
• sum-to-zero constraints on the spatial IGMRF model and the year-of-construction covariate

Gaussian Markov Random Fields

Normal data

Munich rental data

Inference using MCMC

Sub-block approach:

$$(x^S, \kappa_S),\ (x^C, \kappa_C),\ (x^L, \kappa_L),\ \text{and}\ (\beta, \mu, \kappa_y).$$

Update one block at a time, e.g.

$$\kappa_L^* \sim q(\kappa_L^* \mid \kappa_L), \qquad x^{L,*} \sim \pi(x^{L,*} \mid \text{the rest}),$$

and then accept/reject (κ_L^*, x^{L,*}) jointly.


Gaussian Markov Random Fields

Normal data

Munich rental data

Log-RW proposal for precisions

It is convenient to propose new precisions using

$$\kappa_{\text{new}} = \kappa_{\text{old}} \cdot f, \qquad f \sim \pi(f) \propto 1 + 1/f$$

for f ∈ [1/F, F] and F > 1.

With this choice

$$\frac{q(\kappa_{\text{old}} \mid \kappa_{\text{new}})}{q(\kappa_{\text{new}} \mid \kappa_{\text{old}})} = 1 \qquad\text{and}\qquad \text{Var}(\kappa_{\text{new}} \mid \kappa_{\text{old}}) \propto (\kappa_{\text{old}})^2.$$
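A sketch of this proposal (illustrative only; the rejection step used to draw f is just one of several possible ways to sample from π(f) ∝ 1 + 1/f):

import numpy as np

def propose_precision(kappa_old, F=2.0, rng=None):
    # kappa_new = kappa_old * f with f ~ pi(f) ∝ 1 + 1/f on [1/F, F];
    # with this density the Hastings ratio q(old|new)/q(new|old) equals 1.
    rng = np.random.default_rng() if rng is None else rng
    while True:                                   # rejection sampling, envelope 1 + F
        f = rng.uniform(1.0 / F, F)
        if rng.uniform() < (1.0 + 1.0 / f) / (1.0 + F):
            return kappa_old * f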

Gaussian Markov Random Fields

Normal data

Munich rental data

Full conditional π(x^{L,*} | the rest)

Introduce 'fake' data ỹ,

$$\tilde{y}_i = y_i - \big(\mu + x_{S(i)} + x_{C(i)} + z_i^T\beta\big).$$

The full conditional of x^L is

$$\pi(x^L \mid \text{the rest}) \propto \exp\Big(-\frac{\kappa_L}{2}\sum_{i\sim j}(x^L_i - x^L_j)^2\Big) \times \exp\Big(-\frac{\kappa_y}{2}\sum_k \big(\tilde{y}_k - x_{L(k)}\big)^2\Big).$$

The data ỹ do not introduce extra dependence between the x^L_i's, as ỹ_i acts as a noisy observation of x^L_i.

Gaussian Markov Random Fields

Normal data

Munich rental data

Denote by n_i the number of neighbours of location i, and let L(i) be the set of observations at location i,

$$L(i) = \{k : x_{L(k)} = x^L_i\},$$

with size |L(i)|.

The full conditional of x^L is a GMRF with parameters (Q^{-1}b, Q), where

$$b_i = \kappa_y\sum_{k\in L(i)} \tilde{y}_k \qquad\text{and}\qquad Q_{ij} = \begin{cases}\kappa_L n_i + \kappa_y|L(i)| & \text{if } i = j,\\ -\kappa_L & \text{if } i \sim j,\\ 0 & \text{otherwise.}\end{cases}$$

Gaussian Markov Random Fields

Normal data

Munich rental data

Results

Floor space


Gaussian Markov Random Fields

Normal data

Munich rental data

Results

Year of construction

Gaussian Markov Random Fields

Normal data

Munich rental data

ResultsLocation

Gaussian Markov Random Fields

Normal-mixture response models

Introduction

Normal-mixture representation

Theorem (Kelker, 1971)
If x has density f(x) symmetric around 0, then there exist independent random variables z and v, with z standard normal, such that x = z/v iff the derivatives of f(x) satisfy

$$\Big(-\frac{d}{dy}\Big)^k f(\sqrt{y}) \geq 0$$

for y > 0 and for k = 1, 2, . . ..

• Student-t

• Logistic and Laplace

Gaussian Markov Random Fields

Normal-mixture response models

Introduction

The corresponding mixing distributions for the precision parameter λ that generate these distributions as scale mixtures of normals:

Distribution of x      Mixing distribution of λ
Student-t_ν            G(ν/2, ν/2)
Logistic               1/(2K)², where K is Kolmogorov–Smirnov distributed
Laplace                1/(2E), where E is exponentially distributed


Gaussian Markov Random Fields

Normal-mixture response models

Hierarchical-t formulations

RW1 with t_ν increments

Replace the assumption of normally distributed increments by a Student-t_ν distribution to allow for larger jumps in the sequence x.

Introduce n − 1 independent G(ν/2, ν/2) scale mixture variables λ_i:

$$\Delta x_i \mid \lambda_i \overset{\text{iid}}{\sim} \mathcal{N}\big(0, (\kappa\lambda_i)^{-1}\big), \quad i = 1, \ldots, n-1.$$

Gaussian Markov Random Fields

Normal-mixture response models

Hierarchical-t formulations

Observe data y_i ∼ N(x_i, κ_y^{-1}) for i = 1, . . . , n.

The posterior density for (x, λ) is

$$\pi(x, \lambda \mid y) \propto \pi(x \mid \lambda)\,\pi(\lambda)\,\pi(y \mid x).$$

Note that

• x | (y, λ) is now a GMRF, while
• λ_1, . . . , λ_{n−1} | (x, y) are conditionally independent gamma distributed with parameters (ν + 1)/2 and (ν + κ(Δx_i)²)/2.

Use the sub-block algorithm with blocks (θ, x) and λ.
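A sketch of the gamma update for the scale-mixture variables (illustrative code; note that numpy parameterises the gamma by shape and scale, hence the 1/rate):

import numpy as np

def update_lambda(x, kappa, nu, rng):
    # lambda_i | x, y ~ Gamma(shape = (nu + 1)/2, rate = (nu + kappa * (Delta x_i)^2)/2)
    dx = np.diff(x)
    shape = (nu + 1.0) / 2.0
    rate = (nu + kappa * dx**2) / 2.0
    return rng.gamma(shape, 1.0 / rate)   # scale = 1 / rate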

Gaussian Markov Random Fields

Normal-mixture response models

Binary regression

Example: Binary regression

GMRF x and Bernoulli data

$$y_i \sim \mathcal{B}\big(g^{-1}(x_i)\big), \qquad g(p) = \begin{cases}\log(p/(1-p)) & \text{logit link}\\ \Phi^{-1}(p) & \text{probit link}\end{cases}$$

Equivalent representation using auxiliary variables w for the probit link:

$$\varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, 1), \qquad w_i = x_i + \varepsilon_i, \qquad y_i = \begin{cases}1 & \text{if } w_i > 0,\\ 0 & \text{otherwise.}\end{cases}$$
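With this representation, the auxiliary variables have truncated-normal full conditionals; a sketch of their update by CDF inversion (illustrative code, not from the slides):

import numpy as np
from scipy import stats

def sample_w(x, y, rng):
    # w_i | x_i, y_i is N(x_i, 1) truncated to (0, inf) if y_i = 1, to (-inf, 0] if y_i = 0
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=int)
    u = rng.uniform(size=x.shape)
    lo = np.where(y == 1, stats.norm.cdf(-x), 0.0)   # lower CDF bound for epsilon = w - x
    hi = np.where(y == 1, 1.0, stats.norm.cdf(-x))   # upper CDF bound
    return x + stats.norm.ppf(lo + u * (hi - lo))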

Gaussian Markov Random Fields

Normal-mixture response models

Binary regression

MCMC Inference

• The full conditional for the GMRF prior is a GMRF.

• The full conditional for the auxiliary variables is

$$\pi(w \mid \ldots) = \prod_i \pi(w_i \mid \ldots).$$

Use the sub-block algorithm with blocks (θ, x) and w.


Gaussian Markov Random Fields

Normal-mixture response models

Binary regression

Simple example: Tokyo rainfall data

Stage 1 Binomial data

$$y_i \sim \begin{cases}\text{Binomial}(2,\ p(x_i)) & \text{for most calendar days } i,\\ \text{Binomial}(1,\ p(x_i)) & \text{for 29 February (observed in one year only).}\end{cases}$$

Gaussian Markov Random Fields

Normal-mixture response models

Binary regression

Simple example: Tokyo rainfall data

Stage 2 Assume a smooth latent x:

$$x \sim \text{RW2}(\kappa), \qquad \text{logit}(p_i) = x_i$$

Gaussian Markov Random Fields

Normal-mixture response models

Binary regression

Simple example: Tokyo rainfall data

Stage 3 Gamma(α, β)-prior on κ

Gaussian Markov Random Fields

Normal-mixture response models

Binary regression

Simple example: Tokyo rainfall data

Block Single-site

Precision


Gaussian Markov Random Fields

Normal-mixture response models

Binary regression

Simple example: Tokyo rainfall data

Block Single-site

Probability

Gaussian Markov Random Fields

Normal-mixture response models

Summary

• These auxiliary variable methods work better than one might think.

• There is not that strong dependence between the blocks (θ, x) and w.

• The sampler behaves nearly as for normal data.

Gaussian Markov Random Fields

Non-normal response models

GMRF approximation

Non-normal response models

Example of a full conditional:

$$\pi(x \mid y) \propto \exp\Big(-\frac{1}{2}x^TQx - \sum_i \exp(x_i)\Big) \approx \exp\Big(-\frac{1}{2}(x - \mu)^T\big(Q + \text{diag}(c_i)\big)(x - \mu)\Big)$$

Construct the GMRF approximation:

• locate the mode x∗,
• expand to second order,
• to obtain the GMRF approximation;
• the graph is unchanged. (A sketch of this construction is given below.)
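A sketch of this construction for the exponential example above (illustrative code; it assumes a dense solve for clarity, and positive diagonal curvature terms so that the system is well conditioned):

import numpy as np

def gmrf_approximation(Q, n_iter=30, tol=1e-10):
    # Approximate pi(x) ∝ exp(-0.5 x^T Q x - sum_i exp(x_i)) by repeatedly
    # expanding sum_i exp(x_i) to second order around the current mean.
    mu = np.zeros(Q.shape[0])
    for _ in range(n_iter):
        c = np.exp(mu)                       # curvature of exp(x_i) at mu_i
        b = np.exp(mu) * (mu - 1.0)          # linear coefficient of the expansion
        mu_new = np.linalg.solve(Q + np.diag(c), b)
        if np.max(np.abs(mu_new - mu)) < tol:
            mu = mu_new
            break
        mu = mu_new
    return mu, Q + np.diag(np.exp(mu))       # mean (the mode) and precision

At convergence, Qµ = −exp(µ), i.e. the returned mean is the mode of the full conditional, and the returned precision is Q plus the diagonal curvature terms.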

Gaussian Markov Random Fields

Non-normal response models

GMRF approximation

Why use the mode?


Gaussian Markov Random Fields

Non-normal response models

GMRF approximation

How to find the mode?

• Numerical gradient-search methods
  • evaluate the gradient and Hessian in O(n) flops
  • faster for large GMRFs

• Newton–Raphson method
  • the current x is the initial value
  • Taylor-expand around x
  • compute the mean of the GMRF approximation
  • repeat this until convergence

• Further complications with linear constraints Ax = b.

Gaussian Markov Random Fields

Non-normal response models

GMRF approximation

“Taylor” or not? (I)

Consider a non-quadratic function g(x):

$$g(x) = g(x_0) + (x - x_0)g'(x_0) + \frac{1}{2}(x - x_0)^2 g''(x_0) + \ldots$$

• the error is zero at x_0
• the error typically increases with |x − x_0|

Gaussian Markov Random Fields

Non-normal response models

GMRF approximation

“Taylor” or not? (II)

Alternative approximation

$$g(x) \approx \tilde{g}(x) = a + bx + \frac{1}{2}cx^2, \qquad (84)$$

where a, b and c are such that

$$\tilde{g}(x_0) = g(x_0), \qquad \tilde{g}(x_0 \pm h) = g(x_0 \pm h).$$

• More "uniformly distributed" error
• Better for distribution approximation
• ...if h is a rough guess of the region of interest.

Prefer numerical approximations of derivatives to analytical expressions.
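A sketch of the fit in (84), matching the quadratic at x_0 and x_0 ± h (illustrative code):

import numpy as np

def quadratic_fit(g, x0, h):
    # Match a + b*x + 0.5*c*x^2 to g at x0 - h, x0 and x0 + h
    g0, gp, gm = g(x0), g(x0 + h), g(x0 - h)
    c = (gp - 2.0 * g0 + gm) / h**2           # central second difference
    b = (gp - gm) / (2.0 * h) - c * x0
    a = g0 - b * x0 - 0.5 * c * x0**2
    return a, b, c

a, b, c = quadratic_fit(np.exp, x0=0.0, h=1.0)   # c ≈ 1.086, versus g''(0) = 1 from the Taylor expansion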

Gaussian Markov Random Fields

Non-normal response models

Joint analysis of diseases

Joint analysis of diseases

SMR Oral SMR Lung


Gaussian Markov Random Fields

Non-normal response models

Joint analysis of diseases

Separate analysis: model

y_{ij}: number of cases in area i for cancer type j.

$$y_{ij} \sim \mathcal{P}\big(e_{ij}\exp(\eta_{ij})\big),$$

η_{ij}: log relative risk in area i for disease j; e_{ij}: known constants.

• Decompose the log relative risk η as

$$\eta = \mu 1 + u + v$$

• µ is the overall mean
• u is a spatially structured component (IGMRF)
• v is an unstructured component (random effects)

Gaussian Markov Random Fields

Non-normal response models

Joint analysis of diseases

Separate analysis: model

Posterior

$$\pi(x, \kappa \mid y) \propto \kappa_v^{n/2}\,\kappa_u^{(n-1)/2}\exp\Big(-\frac{1}{2}x^TQx\Big) \times \exp\Big(\sum_i y_i\eta_i - e_i\exp(\eta_i)\Big)\,\pi(\kappa)$$

$$Q = \begin{pmatrix}\kappa_\mu + n\kappa_v & \kappa_v 1^T & -\kappa_v 1^T\\ \kappa_v 1 & \kappa_u R + \kappa_v I & -\kappa_v I\\ -\kappa_v 1 & -\kappa_v I & \kappa_v I\end{pmatrix}$$

Q is a (2n + 1) × (2n + 1) matrix, where n = 544.

The spatially structured term u has a sum-to-zero constraint, 1ᵀu = 0.

Gaussian Markov Random Fields

Non-normal response models

Joint analysis of diseases

Separate analysis: MCMC

Joint update of all parameters:

$$\kappa_u^* \sim q(\kappa_u^* \mid \kappa_u), \qquad \kappa_v^* \sim q(\kappa_v^* \mid \kappa_v), \qquad x^* \sim \pi(x \mid \kappa^*, y).$$

Gaussian Markov Random Fields

Non-normal response models

Joint analysis of diseases

Separate analysis: Results

Oral Lung


Gaussian Markov Random Fields

Non-normal response models

Joint analysis of diseases

Joint analysis: Model

• Oral cavity and lung cancer both relate to tobacco smoking (u_1)

• Oral cancer relates to alcohol consumption (u_2)

• Latent shared and specific spatial components:

$$\eta_1 \mid u_1, u_2, \mu, \kappa \sim \mathcal{N}\big(\mu_1 1 + \delta u_1 + u_2,\ \kappa_{\eta_1}^{-1} I\big)$$
$$\eta_2 \mid u_1, u_2, \mu, \kappa \sim \mathcal{N}\big(\mu_2 1 + \delta^{-1} u_1,\ \kappa_{\eta_2}^{-1} I\big).$$

Gaussian Markov Random Fields

Non-normal response models

Joint analysis of diseases

Joint analysis: MCMC

Update all parameters in one block:

• simple (log-)RW proposals for the hyperparameters,
• sample (µ_1, η_1, u_1, µ_2, η_2, u_2) from the GMRF approximation,
• accept/reject jointly.

Gaussian Markov Random Fields

Non-normal response models

Joint analysis of diseases

Joint analysis: Results

“tobacco” “alcohol”

Gaussian Markov Random Fields

Non-normal response models

Joint analysis of diseases

Independence samplers

For many of the examples it is quite easy, or at least possible, to construct independence samplers:

• (more or less) exact samples,
• the basis for methods that avoid MCMC completely.

...continue tomorrow!


Gaussian Markov Random Fields

Part VII

Advanced topics

Gaussian Markov Random Fields

Outline I

Computing marginal variances for GMRFs∗

Introduction
Statistical derivation
Marginal variances under hard and soft constraints

Gaussian Markov Random Fields

Computing marginal variances for GMRFs∗

Introduction

Computing marginal variances for GMRFs∗

Let

$$Q = VDV^T,$$

• where D is a diagonal matrix, and
• V is a lower triangular matrix with ones on the diagonal.

The matrix identity

$$\Sigma = D^{-1}V^{-1} + (I - V^T)\Sigma$$

defines recursions which can be used to compute

• Var(x_i) and Cov(x_i, x_j) for i ∼ j

essentially without cost once the Cholesky triangle L is known. This is a result from 1973.

Gaussian Markov Random Fields

Computing marginal variances for GMRFs∗

Statistical derivation

Statistical derivation

Recall for a zero mean GMRF that

$$x_i \mid x_{i+1}, \ldots, x_n \sim \mathcal{N}\Big(-\frac{1}{L_{ii}}\sum_{k=i+1}^{n} L_{ki}x_k,\; 1/L_{ii}^2\Big), \quad i = n, \ldots, 1,$$

provides a sequential representation of the GMRF backward in "time" i.


Gaussian Markov Random Fields

Computing marginal variances for GMRFs∗

Statistical derivation

Multiplying by x_j, j ≥ i, and taking expectations yields

$$\Sigma_{ij} = \delta_{ij}/L_{ii}^2 - \frac{1}{L_{ii}}\sum_{k\in I(i)} L_{ki}\Sigma_{kj}, \quad j \geq i,\ i = n, \ldots, 1,$$

where I(i) is the set of those k for which L_{ki} is non-zero,

$$I(i) = \{k > i : L_{ki} \neq 0\},$$

and δ_{ij} is one if i = j and zero otherwise.

Gaussian Markov Random Fields

Computing marginal variances for GMRFs∗

Statistical derivation

We can use

$$\Sigma_{ij} = \delta_{ij}/L_{ii}^2 - \frac{1}{L_{ii}}\sum_{k\in I(i)} L_{ki}\Sigma_{kj}, \quad j \geq i,\ i = n, \ldots, 1,$$

to compute Σ_{ij} for each ij:

• outer loop i = n, . . . , 1
• inner loop j = n, . . . , i

Gaussian Markov Random Fields

Computing marginal variances for GMRFs∗

Statistical derivation

Example
Let n = 3, I(1) = {2, 3}, I(2) = {3}; then we get

$$\begin{aligned}
\Sigma_{33} &= \frac{1}{L_{33}^2}, &\quad \Sigma_{23} &= -\frac{1}{L_{22}}L_{32}\Sigma_{33},\\
\Sigma_{22} &= \frac{1}{L_{22}^2} - \frac{1}{L_{22}}L_{32}\Sigma_{32}, &\quad \Sigma_{13} &= -\frac{1}{L_{11}}(L_{21}\Sigma_{23} + L_{31}\Sigma_{33}),\\
\Sigma_{12} &= -\frac{1}{L_{11}}(L_{21}\Sigma_{22} + L_{31}\Sigma_{32}), &\quad \Sigma_{11} &= \frac{1}{L_{11}^2} - \frac{1}{L_{11}}(L_{21}\Sigma_{21} + L_{31}\Sigma_{31}),
\end{aligned}$$

where we also need to use that Σ is symmetric.

Gaussian Markov Random Fields

Computing marginal variances for GMRFs∗

Statistical derivation

• Assume we want to compute all marginal variances.

• To do so, we need to compute Σ_{ij} (or Σ_{ji}) for all ij in some set S.

• If the recursions can be solved by only computing Σ_{ij} for all ij ∈ S, we say that the recursions are solvable using S.


Gaussian Markov Random Fields

Computing marginal variances for GMRFs∗

Statistical derivation

From

$$\Sigma_{ij} = \delta_{ij}/L_{ii}^2 - \frac{1}{L_{ii}}\sum_{k\in I(i)} L_{ki}\Sigma_{kj}, \quad j \geq i,\ i = n, \ldots, 1, \qquad (85)$$

it is evident that S must satisfy

$$ij \in S \text{ and } k \in I(i) \implies kj \in S. \qquad (86)$$

We also need that ii ∈ S for i = 1, . . . , n.

Gaussian Markov Random Fields

Computing marginal variances for GMRFs∗

Statistical derivation

• S = V × V is a valid set, but we want |S| to be minimal to avoid unnecessary computations.

• Such a minimal set depends, however, on the numerical values in L, or Q implicitly.

• Denote by S(Q) a minimal set.

Theorem
The union of S(Q) for all Q > 0 with fixed graph G is a subset of

$$S^* = \{ij \in V \times V : j \geq i,\ i \text{ and } j \text{ are not separated by } F(i, j)\},$$

and S∗ is solvable.

Gaussian Markov Random Fields

Computing marginal variances for GMRFs∗

Statistical derivation

Proof. We first note that ii ∈ S∗ for i = 1, . . . , n, since i and i are not separated by F(i, i). We now verify that the recursions are solvable using S∗. The global Markov property ensures that if ij ∉ S∗ then L_ji = 0 for all Q > 0 with fixed graph G. We use this to replace I(i) with I∗(i) = {k > i : ik ∈ S∗} in (86), which is legal since I(i) ⊆ I∗(i) and the difference only identifies terms L_ki which are zero. It is now sufficient to show that

$$ij \in S^* \text{ and } ik \in S^* \implies kj \in S^*, \qquad (87)$$

which implies (86). Eq. (87) is trivially true for i ≤ k = j. Fix now i < k < j. Then ij ∈ S∗ says that there exists a path i, i_1, . . . , i_n, j, where i_1, . . . , i_n are all smaller than i, and ik ∈ S∗ says that there exists a path i, i′_1, . . . , i′_{n′}, k, where i′_1, . . . , i′_{n′} are all smaller than i. Then there is a path from k to i and from i to j where all nodes are less than or equal to i, but then also less than k since i < k. Hence, k and j are not separated by F(k, j), so kj ∈ S∗. Finally, since S∗ contains {11, . . . , nn} and only depends on G, it must contain the union of all S(Q), since each S(Q) is minimal.

Gaussian Markov Random Fields

Computing marginal variances for GMRFs∗

Statistical derivation

Interpretation of S∗

• S∗ is the set of all possible non-zero elements in L, based on G only.

• This is the set of L_ji's that are computed when computing the Cholesky factorisation Q = LLᵀ.

• Since L_ji ≠ 0 in general when i ∼ j, we also compute Cov(x_i, x_j) for i ∼ j.

• Some of the L_ij's might turn out to be zero, depending on the conditional independence properties of the marginal density of x_{i:n} for i = n, . . . , 1. This might cause a slight problem in practice (implementation dependent).


Gaussian Markov Random Fields

Computing marginal variances for GMRFs∗

Statistical derivation

General algorithm

for i = n, . . . , 1
    for decreasing j in I(i)
        compute Σ_{ij} from (85)
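A dense, unoptimised sketch of these recursions over S = V × V (illustrative code; a real implementation would only visit ij ∈ S∗ and exploit the sparsity of L):

import numpy as np

def marginal_covariances(Q):
    # Run recursion (85) for all pairs j >= i, using the Cholesky factor Q = L L^T
    n = Q.shape[0]
    L = np.linalg.cholesky(Q)
    Sigma = np.zeros((n, n))
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, i - 1, -1):
            s = sum(L[k, i] * Sigma[k, j] for k in range(i + 1, n))
            val = (1.0 / L[i, i]**2 if i == j else 0.0) - s / L[i, i]
            Sigma[i, j] = Sigma[j, i] = val
    return Sigma

Q = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
print(np.allclose(marginal_covariances(Q), np.linalg.inv(Q)))   # True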

Gaussian Markov Random Fields

Computing marginal variances for GMRFs∗

Statistical derivation

Band matrices

for i = n, . . . , 1
    for j = min(i + b_w, n), . . . , i
        compute Σ_{ij} from (85)

This is equivalent to the Kalman recursions for smoothing.

Gaussian Markov Random Fields

Computing marginal variances for GMRFs∗

Marginal variances under hard and soft constraints

Marginal variances under hard and soft constraints

Let Σ̃ be the covariance matrix with the constraints and Σ the one without.

Then Σ̃ relates to Σ as

$$\tilde{\Sigma} = \Sigma - Q^{-1}A^T\big(AQ^{-1}A^T\big)^{-1}AQ^{-1},$$

hence

$$\tilde{\Sigma}_{ii} = \Sigma_{ii} - \Big(Q^{-1}A^T\big(AQ^{-1}A^T\big)^{-1}AQ^{-1}\Big)_{ii}, \quad i = 1, \ldots, n,$$

for the hard constraint Ax = b.
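A dense sketch of this correction (illustrative code; in practice Q^{-1}Aᵀ is obtained by solving with the Cholesky factor rather than forming Σ explicitly):

import numpy as np

def constrained_variances(Q, A):
    # Marginal variances under the hard constraint A x = b
    Sigma = np.linalg.inv(Q)                         # unconstrained covariance
    W = Sigma @ A.T                                  # Q^{-1} A^T
    correction = W @ np.linalg.solve(A @ W, W.T)     # Q^{-1}A^T (A Q^{-1} A^T)^{-1} A Q^{-1}
    return np.diag(Sigma - correction)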

Gaussian Markov Random Fields

Computing marginal variances for GMRFs∗

Marginal variances under hard and soft constraints

Computational costs

• Time: O(n)
• Spatial: O(n log(n)²)
• Spatio-temporal: O(n^{5/3}).