
GLAM

Generalized Linear Array Models

Iain Currie

Heriot-Watt University

Universität Zürich

May 2014

Outline

• Theory

– Data

– Generalized Linear Models, GLMs

– The P-spline method

– P-splines in two dimensions

– Accelerated computation, GLAM

• Applications

– The smooth Lee-Carter family

England & Wales male mortality data (ONS)

Age: 40 to 90

Year: 1961 to 2009

Deaths: D

Exposures: E

D, E: 51 × 49 (ages × years)

[Figure: Raw mortality surface; axes: Age (40 to 90), Year, log(mortality)]

Generalized linear models

Structure

• Data: vectors y of deaths and e of exposures

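• Model: a model matrix $B$ of B-splines, a parameter vector $\theta$ and a link function (here the log)

$$\mu = E(y), \qquad \log \mu = \log e + B\theta$$

• Error distribution: Poisson

Algorithm

• Newton-Raphson (IWLS) algorithm

$$B' W_\delta B \theta = B' W_\delta z$$

where $z = B\theta + W_\delta^{-1}(y - \mu)$, evaluated at the current estimate of $\theta$, is the working vector and $W_\delta$ is a diagonal matrix of weights.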
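
The IWLS iteration above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: with a Poisson error and log link the weights are taken as $W_\delta = \mathrm{diag}(\mu)$, and the function name `fit_poisson_iwls`, the two-column toy design matrix (an intercept and a slope, not B-splines) and the simulated data are hypothetical, not from the talk.

```python
import numpy as np

def fit_poisson_iwls(B, y, e, n_iter=25, tol=1e-10):
    """Fit log(mu) = log(e) + B @ theta by IWLS (Poisson error, log link)."""
    offset = np.log(e)
    # Starting value: least squares on the log of the crude rates.
    theta = np.linalg.lstsq(B, np.log((y + 0.5) / e), rcond=None)[0]
    for _ in range(n_iter):
        eta = B @ theta
        mu = np.exp(offset + eta)
        w = mu                      # diagonal of W_delta (Poisson, log link)
        z = eta + (y - mu) / w      # working vector z = B theta + W^{-1}(y - mu)
        BtW = B.T * w               # B' W_delta without forming the diagonal matrix
        theta_new = np.linalg.solve(BtW @ B, BtW @ z)
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return theta

# Toy data (hypothetical): 50 cells, constant exposure, log-linear rate.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
B = np.column_stack([np.ones_like(x), x])
e = np.full(50, 1000.0)
theta_true = np.array([-4.0, 1.5])
y = rng.poisson(e * np.exp(B @ theta_true))

theta_hat = fit_poisson_iwls(B, y, e)
```

At convergence the score $B'(y - \mu)$ is numerically zero, which is a convenient check on any implementation.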

[Figure: Log mortality for E & W males, age 70, against Year (1961 to 2009), with the B-spline basis plotted below. Legend: observed mortality, B-spline regression, B-spline coefficients. DF = 28.]

Penalties

Eilers & Marx (1996) imposed penalties on differences between adjacent coefficients

$$(\theta_1 - 2\theta_2 + \theta_3)^2 + \ldots + (\theta_{c-2} - 2\theta_{c-1} + \theta_c)^2 = \theta' D_2' D_2 \theta$$

where $D_2$ is a second order difference matrix.

Algorithm

• Penalized Newton-Raphson (IWLS) algorithm

$$(B' W_\delta B + P)\theta = B' W_\delta z,$$

where $P = \lambda D_2' D_2$ is a roughness penalty.

• This is the method of P-splines.
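
As a small check on the penalty above, the following NumPy sketch builds $D_2$ by differencing an identity matrix and confirms that $\theta' D_2' D_2 \theta$ equals the sum of squared second differences. The helper `penalized_iwls_step` is a hypothetical name for one solve of the penalized system, with the weights `w` and working vector `z` supplied by the caller.

```python
import numpy as np

c = 8
# Second order difference matrix, (c - 2) x c; first row is [1, -2, 1, 0, ...].
D2 = np.diff(np.eye(c), n=2, axis=0)

# Coefficients on a quadratic: every second difference equals 2.
theta = np.array([0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0])
penalty_quadratic_form = theta @ D2.T @ D2 @ theta
penalty_direct = np.sum(np.diff(theta, n=2) ** 2)

def penalized_iwls_step(B, w, z, lam, D):
    """One penalized update: solve (B' W B + lam D'D) theta = B' W z."""
    BtW = B.T * w
    return np.linalg.solve(BtW @ B + lam * D.T @ D, BtW @ z)
```

Because a second order penalty annihilates linear sequences, a working vector that is exactly linear in the coefficient index is left unchanged by the penalty whatever the value of $\lambda$.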

Smoothing parameter selection

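Trade off fit against roughness by minimizing

$$\mathrm{BIC} = \mathrm{Deviance} + \log(n)\,\mathrm{ED},$$

where $n$ is the number of observations and ED is the effective dimension, or effective degrees of freedom, of the fitted model.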
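
The slide does not say how ED is computed. A common choice, assumed here, is the trace of the hat matrix, $\mathrm{ED} = \mathrm{tr}\{(B'W_\delta B + P)^{-1} B'W_\delta B\}$. The sketch below uses that definition together with the Poisson deviance; all function names are illustrative.

```python
import numpy as np

def effective_dimension(B, w, P):
    """Trace of the hat matrix: tr{(B'WB + P)^{-1} B'WB} (assumed definition)."""
    BtWB = (B.T * w) @ B
    return np.trace(np.linalg.solve(BtWB + P, BtWB))

def poisson_deviance(y, mu):
    """2 * sum{ y log(y / mu) - (y - mu) }, taking y log(y) = 0 when y = 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))

def bic(y, mu, ed):
    """BIC = Deviance + log(n) * ED."""
    return poisson_deviance(y, mu) + np.log(len(y)) * ed

# Toy check with an identity basis: no penalty gives ED = c, while a very
# heavy second order penalty leaves only the two unpenalized (linear) directions.
c = 10
B = np.eye(c)
w = np.ones(c)
D2 = np.diff(np.eye(c), n=2, axis=0)
ed_unpenalized = effective_dimension(B, w, 0.0 * D2.T @ D2)
ed_heavy = effective_dimension(B, w, 1e8 * D2.T @ D2)
```

In practice one would evaluate `bic` over a grid of $\lambda$ values, refitting the model at each, and keep the minimizer.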

[Figure: Log mortality for E & W males, age 70, against Year (1961 to 2009), with the B-spline basis plotted below. Legend: observed mortality, B-spline regression, B-spline coefficients, P-spline regression, P-spline coefficients. DF = 28; EDF = 16.]

2-dimensional smoothing

Let $B_a$, $n_a \times c_a$, be a 1-d B-spline model matrix defined along age.

Let $B_y$, $n_y \times c_y$, be a 1-d B-spline model matrix defined along year.

The 2-d model matrix is given by the Kronecker product

$$B = B_y \otimes B_a, \qquad n_a n_y \times c_a c_y.$$
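
To make the dimensions concrete, here is a NumPy sketch that builds two marginal bases and their Kronecker product for the 51 × 49 mortality grid. The basis construction (differences of truncated power functions on equally spaced knots) is one common recipe and is an assumption here, as are the cubic degree and the choice of 10 segments per margin.

```python
import math
import numpy as np

def bspline_basis(x, xl, xr, nseg, deg=3):
    """B-spline basis on equally spaced knots, built from differences of
    truncated power functions; returns a len(x) x (nseg + deg) matrix."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    P = np.maximum(x[:, None] - knots[None, :], 0.0) ** deg
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0)
    D = D / (math.factorial(deg) * dx ** deg)
    return (-1) ** (deg + 1) * P @ D.T

age = np.arange(40.0, 91.0)        # 51 ages, 40 to 90
year = np.arange(1961.0, 2010.0)   # 49 years, 1961 to 2009

Ba = bspline_basis(age, 40.0, 90.0, nseg=10)       # n_a x c_a = 51 x 13
By = bspline_basis(year, 1961.0, 2009.0, nseg=10)  # n_y x c_y = 49 x 13

# 2-d model matrix: rows run over ages within years,
# columns over age coefficients within year coefficients.
B = np.kron(By, Ba)                                # 2499 x 169
```

Each element of `B` is a product of one element of `By` and one of `Ba`, which is the structure the array methods later in the talk exploit.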

[Figure: 2d B-spline basis; axes: Age (40 to 90), Year, B-spline]

Penalties in 2-d

• Each regression coefficient is associated with the summit of one of the hills.

• Smoothness is ensured by penalizing the coefficients in rows and columns.

$$P = \lambda_a I_{c_y} \otimes D_a' D_a + \lambda_y D_y' D_y \otimes I_{c_a}$$
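
A minimal NumPy sketch of this penalty follows. It assumes second order differences in both directions (the slide does not state the order of $D_a$ and $D_y$) and that $\theta$ stacks the columns of the $c_a \times c_y$ coefficient array $\Theta$. The check confirms that $\theta' P \theta$ penalizes differences down the columns (age) and along the rows (year) of $\Theta$.

```python
import numpy as np

def penalty_2d(ca, cy, lam_a, lam_y, order=2):
    """P = lam_a * (I_cy kron Da'Da) + lam_y * (Dy'Dy kron I_ca)."""
    Da = np.diff(np.eye(ca), n=order, axis=0)
    Dy = np.diff(np.eye(cy), n=order, axis=0)
    P_age = np.kron(np.eye(cy), Da.T @ Da)
    P_year = np.kron(Dy.T @ Dy, np.eye(ca))
    return lam_a * P_age + lam_y * P_year

ca, cy = 5, 4
lam_a, lam_y = 2.0, 0.5
P = penalty_2d(ca, cy, lam_a, lam_y)

rng = np.random.default_rng(1)
Theta = rng.normal(size=(ca, cy))      # coefficient array
theta = Theta.reshape(-1, order="F")   # vec(Theta), column-major

quad_form = theta @ P @ theta
direct = (lam_a * np.sum(np.diff(Theta, n=2, axis=0) ** 2)
          + lam_y * np.sum(np.diff(Theta, n=2, axis=1) ** 2))
```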

Computational challenge

• $B = B_y \otimes B_a$ is large, maybe very large.

• Two smoothing parameters, $\lambda_a$ and $\lambda_y$, must be chosen.

GLAM

Definition: Row tensor of $X$, $n \times c$,

$$G(X) = [X \otimes 1_c'] * [1_c' \otimes X], \qquad n \times c^2,$$

where $*$ denotes the element-by-element product.
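
The row tensor translates directly into NumPy. In this sketch (the function name `row_tensor` is illustrative) row $i$ of $G(X)$ holds all $c^2$ pairwise products of the elements in row $i$ of $X$.

```python
import numpy as np

def row_tensor(X):
    """G(X) = [X kron 1'] * [1' kron X], an n x c^2 matrix."""
    ones = np.ones((1, X.shape[1]))
    return np.kron(X, ones) * np.kron(ones, X)

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
G = row_tensor(X)
```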




GLAM Algorithms

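Structure: $\log E[D] = \log E + B_a \Theta B_y'$

Linear function: $B\theta = [B_y \otimes B_a]\theta \equiv B_a \Theta B_y'$

(vector form $n_a n_y \times 1$; array form $n_a \times n_y$)

Inner product: $B' W_\delta B \equiv G(B_a)' W G(B_y)$

(vector form $c_a c_y \times c_a c_y$; array form $c_a^2 \times c_y^2$)

Standard errors: $\mathrm{diag}\{\mathrm{Var}(B\theta)\} \equiv G(B_a) S G(B_y)'$

(vector form is the diagonal of an $n_a n_y \times n_a n_y$ matrix; array form $n_a \times n_y$)

where $S$, $c_a^2 \times c_y^2$, $\equiv (B' W_\delta B)^{-1}$, $c_a c_y \times c_a c_y$.

Here $\equiv$ means the two sides hold the same elements, rearranged, and $W$ is the $n_a \times n_y$ array holding the diagonal of $W_\delta$.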
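
The three equivalences can be checked numerically. The sketch below uses small random matrices as stand-ins for $B_a$, $B_y$ and the weights (all hypothetical), and spells out the rearrangements hidden in $\equiv$ as `reshape`/`transpose` calls. It assumes $\theta$ and the weights are stored column-major, matching $B = B_y \otimes B_a$.

```python
import numpy as np

def row_tensor(X):
    """G(X) = [X kron 1'] * [1' kron X]."""
    ones = np.ones((1, X.shape[1]))
    return np.kron(X, ones) * np.kron(ones, X)

rng = np.random.default_rng(2)
na, ny, ca, cy = 6, 5, 3, 4
Ba = rng.normal(size=(na, ca))
By = rng.normal(size=(ny, cy))
Theta = rng.normal(size=(ca, cy))
W = rng.uniform(0.5, 2.0, size=(na, ny))   # array of weights

# Vector (GLM) form, which needs the full Kronecker product.
B = np.kron(By, Ba)
theta = Theta.reshape(-1, order="F")
w = W.reshape(-1, order="F")

# Linear function: B theta versus Ba Theta By'.
eta_vector = B @ theta
eta_array = Ba @ Theta @ By.T

# Inner product: B' W_delta B versus G(Ba)' W G(By), then rearranged.
inner_vector = (B.T * w) @ B
M = row_tensor(Ba).T @ W @ row_tensor(By)              # ca^2 x cy^2
inner_array = (M.reshape(ca, ca, cy, cy)
                .transpose(2, 0, 3, 1)
                .reshape(ca * cy, ca * cy))

# Standard errors: diag(B V B') versus G(Ba) S G(By)'.
V = np.linalg.inv(inner_vector)
var_vector = np.sum((B @ V) * B, axis=1)
S = (V.reshape(cy, ca, cy, ca)
      .transpose(1, 3, 0, 2)
      .reshape(ca * ca, cy * cy))
var_array = row_tensor(Ba) @ S @ row_tensor(By).T      # na x ny
```

The array forms never form $B$ itself, which is where the storage and speed gains come from; `B` is built here only to verify the results.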

GLAM

• Conceptually attractive

• Low memory footprint

• Very fast: marginal processing (Yates algorithm)

• Generalizes to d-dimensions

• Mixed model representation

Applications of GLAM

• Spatio-temporal modelling

• Multidimensional density estimation

• Longitudinal data analysis

• Variety trials

• Microarray analysis

• Respiratory disease modelling (age: 1-105; year: 1959-1998; month: 1-12).

Timings in 3-d

Data are $105 \times 40 \times 12 = 50400$

Times (seconds) to calculate $B' W_\delta B$

Coefficients    npar    GLM     GLAM    Ratio
6 × 6 × 6       216     20      1       20:1
7 × 7 × 7       343     200     2       100:1
8 × 8 × 8       512     2000    4       500:1
9 × 9 × 9       729     -       20      -

Smoothing a constrained GLM

Consider a constrained penalized GLM (Currie, 2013) with

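• Model matrix: $X$, $n \times p$, $r(X) = p - q$

• Constraints matrix: $H$, $q \times p$, $r([X' : H']) = p$, $H\theta = k$

• Penalty matrix: $P$, $p \times p$, acting on $\theta$

• Link: canonical

• Error: exponential family

• Algorithm: Newton-Raphson:

$$\begin{pmatrix} X'WX + P & H' \\ H & 0 \end{pmatrix} \begin{pmatrix} \theta \\ \omega \end{pmatrix} = \begin{pmatrix} X'Wz \\ k \end{pmatrix}$$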
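
One update of this bordered system can be sketched as follows. The function name `constrained_step` and the toy example are assumptions: a one-way layout with an intercept and three group indicators (rank 3 with $p = 4$, so $q = 1$), unit weights, no penalty, and the constraint that the group effects sum to zero.

```python
import numpy as np

def constrained_step(X, w, z, P, H, k):
    """Solve [[X'WX + P, H'], [H, 0]] [theta; omega] = [X'Wz; k]."""
    p = X.shape[1]
    q = H.shape[0]
    XtW = X.T * w
    lhs = np.block([[XtW @ X + P, H.T],
                    [H, np.zeros((q, q))]])
    rhs = np.concatenate([XtW @ z, np.atleast_1d(np.asarray(k, dtype=float))])
    solution = np.linalg.solve(lhs, rhs)
    return solution[:p], solution[p:]

# Rank-deficient design: intercept plus three group indicators.
groups = np.repeat(np.arange(3), 2)
X = np.column_stack([np.ones(6), np.eye(3)[groups]])
H = np.array([[0.0, 1.0, 1.0, 1.0]])   # group effects sum to zero
z = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0])

theta, omega = constrained_step(X, np.ones(6), z, np.zeros((4, 4)), H, 0.0)
```

With group means 2, 6 and 10 the constrained solution is an intercept of 6 and effects −4, 0 and 4. The multiplier $\omega$ is zero here because the constraint only fixes the identifiability of $\theta$ and does not change the fit.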

Lee-Carter model

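$$\log \lambda_{i,j} = \alpha_i + \beta_i \kappa_j, \qquad \sum \kappa_j = 0, \quad \sum \beta_i = 1$$

$$\log \Lambda = \alpha 1' + \beta \kappa'$$

The gnm package in R

library(gnm)
# Death, Age, Year and Off are data vectors; Off is the offset, e.g. log(exposure)
Age.F = factor(Age); Year.F = factor(Year)
gnm(Death ~ -1 + Age.F + Mult(Age.F, Year.F) + offset(Off),
    family = poisson(link = "log"))

NB: This is a vector calculation, not GLAM!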
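
The two constraints in the Lee-Carter model only fix its parameterization. As an illustrative NumPy sketch (function names and the random parameter values are hypothetical), any triple $(\alpha, \beta, \kappa)$ with $\sum \beta_i \neq 0$ can be mapped to one that satisfies $\sum \kappa_j = 0$ and $\sum \beta_i = 1$ without changing $\log \Lambda$.

```python
import numpy as np

def impose_lee_carter_constraints(alpha, beta, kappa):
    """Return (alpha, beta, kappa) with sum(kappa) = 0 and sum(beta) = 1,
    leaving alpha 1' + beta kappa' unchanged."""
    kappa_bar = kappa.mean()
    beta_sum = beta.sum()
    alpha_c = alpha + beta * kappa_bar          # absorb the mean of kappa
    beta_c = beta / beta_sum                    # rescale beta ...
    kappa_c = (kappa - kappa_bar) * beta_sum    # ... and compensate in kappa
    return alpha_c, beta_c, kappa_c

def log_rates(alpha, beta, kappa):
    """log Lambda = alpha 1' + beta kappa', an n_a x n_y array."""
    return alpha[:, None] + np.outer(beta, kappa)

rng = np.random.default_rng(3)
na, ny = 51, 49
alpha = rng.normal(-4.0, 1.0, size=na)
beta = rng.uniform(0.5, 1.5, size=na)
kappa = rng.normal(size=ny)

alpha_c, beta_c, kappa_c = impose_lee_carter_constraints(alpha, beta, kappa)
```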

[Figure: Lee-Carter fit in four panels: Alpha against Age; Beta against Age; Kappa against Year; log(mortality) against Year for Age 70]

Coupled GLMs

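GLM1: $\log \lambda_{i,j} = \alpha_i + \beta_i \kappa_j, \quad \sum \beta_i = 1$

$$\log E(y) = \log e + 1_{n_y} \otimes \alpha + X\beta, \qquad X = [\kappa \otimes I_{n_a}],$$

$$H = h' = 1_{n_a}', \qquad k = 1$$

$$X\beta \equiv \beta \kappa'$$

GLM2: $\log \lambda_{i,j} = \alpha_i + \beta_i \kappa_j, \quad \sum \kappa_j = 0$

$$\log E(y) = \log e + X\theta, \qquad X = [1_{n_y} \otimes I_{n_a} : I_{n_y} \otimes \beta],$$

$$H = h' = (0_{n_a}', 1_{n_y}'), \qquad k = 0$$

$$X\theta \equiv \alpha 1' + \beta \kappa'$$

GLM1 updates $\beta$ with $\alpha$ and $\kappa$ held fixed; GLM2 updates $\theta' = (\alpha', \kappa')$ with $\beta$ held fixed.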
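
The two model matrices can be checked against the array form $\alpha 1' + \beta \kappa'$ with a short NumPy sketch. The parameter values are random stand-ins (hypothetical), normalized so that both constraints hold, and vectors stack the columns of the $n_a \times n_y$ array.

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major order)."""
    return A.reshape(-1, order="F")

rng = np.random.default_rng(4)
na, ny = 5, 4
alpha = rng.normal(size=na)
beta = rng.uniform(0.5, 1.5, size=na)
beta = beta / beta.sum()            # sum(beta) = 1
kappa = rng.normal(size=ny)
kappa = kappa - kappa.mean()        # sum(kappa) = 0

log_Lambda = alpha[:, None] + np.outer(beta, kappa)

# GLM1: beta is the parameter; alpha and kappa are held fixed.
X1 = np.kron(kappa[:, None], np.eye(na))       # (na * ny) x na
fixed1 = np.kron(np.ones(ny), alpha)           # 1_ny kron alpha
H1 = np.ones((1, na))                          # constraint H1 beta = 1

# GLM2: theta = (alpha, kappa) is the parameter; beta is held fixed.
X2 = np.hstack([np.kron(np.ones((ny, 1)), np.eye(na)),
                np.kron(np.eye(ny), beta[:, None])])   # (na * ny) x (na + ny)
theta2 = np.concatenate([alpha, kappa])
H2 = np.concatenate([np.zeros(na), np.ones(ny)])[None, :]  # H2 theta = 0
```

`X2` has rank $n_a + n_y - 1$, one less than its number of columns, which is why GLM2 needs its single constraint.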

[Figure: Crossover with LC original; log(mortality) against Year, 2010 to 2050, for ages 41 and 42]

Delwarde-Denuit-Eilers model


$$\log \lambda_{i,j} = \alpha_i + \beta_i \kappa_j, \qquad \sum \kappa_j = 0, \quad \sum \beta_i = 1$$

where

$$\beta = B_a b.$$

Fitting Delwarde-Denuit-Eilers

Coupled GLMs


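GLM1: $\log \lambda_{i,j} = \alpha_i + \beta_i \kappa_j, \quad \sum \beta_i = 1, \quad \beta = B_a b$

$$\log E(y) = \log e + 1_{n_y} \otimes \alpha + Xb, \qquad X = [\kappa \otimes I_{n_a}] B_a,$$

$$H = h' = 1_{n_a}' B_a, \qquad k = 1,$$

$$P = \tau_\beta D_2' D_2$$

GLM2: as LC original.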
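
A brief NumPy sketch of the GLM1 ingredients for the smooth-$\beta$ model follows. The basis `Ba` is a random stand-in for a B-spline basis and the value of `tau_beta` is arbitrary; both are assumptions made only to check the algebra.

```python
import numpy as np

rng = np.random.default_rng(5)
na, ny, ca = 6, 4, 4
Ba = rng.uniform(size=(na, ca))     # stand-in for a B-spline basis along age
b = rng.normal(size=ca)             # spline coefficients of beta
kappa = rng.normal(size=ny)
tau_beta = 10.0                     # hypothetical smoothing parameter

beta = Ba @ b                                     # beta = Ba b
X = np.kron(kappa[:, None], np.eye(na)) @ Ba      # [kappa kron I_na] Ba
H = np.ones((1, na)) @ Ba                         # 1' Ba, so H b = sum(beta)
D2 = np.diff(np.eye(ca), n=2, axis=0)
P = tau_beta * D2.T @ D2                          # penalty on b
```

Note that $[\kappa \otimes I_{n_a}] B_a = \kappa \otimes B_a$, so the model matrix can be formed without the intermediate $n_a n_y \times n_a$ matrix.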

[Figure: Estimates of beta against Age; legend: LC, DDE]

[Figure: Crossover with LC and DDE; log(mortality) against Year, 2010 to 2050; legend: LC and DDE at ages 41 and 42]

[Figure: Irregularity with Lee-Carter; log(mortality) against Year, 2010 to 2050, for ages 51 to 55]

[Figure: Irregularity with DDE; log(mortality) against Year, 2010 to 2050, for ages 51 to 55]

LC(S): Smoothing α and β


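$$\log \lambda_{i,j} = \alpha_i + \beta_i \kappa_j, \qquad \sum \kappa_j = 0, \quad \sum \beta_i = 1$$

where

$$\beta = B_a b, \qquad \alpha = B_a a.$$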

Fitting smooth α and β

Coupled GLMs

GLM1: as DDE.


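GLM2: $\log \lambda_{i,j} = \alpha_i + \beta_i \kappa_j, \quad \sum \kappa_j = 0, \quad \alpha = B_a a$

$$\log E(y) = \log e + X\theta, \qquad X = [1_{n_y} \otimes B_a : I_{n_y} \otimes \beta],$$

$$\theta' = (a', \kappa'),$$

$$H = h' = (0_{c_a}', 1_{n_y}'), \qquad k = 0,$$

$$P = \mathrm{blockdiag}\{\tau_\alpha D_2' D_2,\; 0 \cdot I_{n_y}\}.$$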
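
The GLM2 ingredients for smooth $\alpha$ can be assembled in the same way. In this sketch `Ba` is again a random stand-in for a B-spline basis and `tau_alpha` is an arbitrary value (both hypothetical). The block-diagonal penalty acts on the spline coefficients $a$ and leaves $\kappa$ unpenalized.

```python
import numpy as np

rng = np.random.default_rng(6)
na, ny, ca = 6, 4, 4
Ba = rng.uniform(size=(na, ca))     # stand-in for a B-spline basis along age
a = rng.normal(size=ca)             # spline coefficients of alpha
beta = rng.uniform(0.5, 1.5, size=na)
kappa = rng.normal(size=ny)
kappa = kappa - kappa.mean()        # sum(kappa) = 0
tau_alpha = 5.0                     # hypothetical smoothing parameter

X = np.hstack([np.kron(np.ones((ny, 1)), Ba),         # 1_ny kron Ba
               np.kron(np.eye(ny), beta[:, None])])   # I_ny kron beta
theta = np.concatenate([a, kappa])                    # theta' = (a', kappa')
H = np.concatenate([np.zeros(ca), np.ones(ny)])[None, :]

D2 = np.diff(np.eye(ca), n=2, axis=0)
P = np.block([[tau_alpha * D2.T @ D2, np.zeros((ca, ny))],
              [np.zeros((ny, ca)), np.zeros((ny, ny))]])

# Array form of the linear predictor: (Ba a) 1' + beta kappa'.
log_Lambda = (Ba @ a)[:, None] + np.outer(beta, kappa)
```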

[Figure: Mortality differences in 2010; Mortality difference against Age (40 to 90); legend: LC, DDE, LC(S)]

[Figure: Mortality differences in 2050; Mortality difference against Age (40 to 90); legend: LC, DDE, LC(S)]

References

• Currie, Durban & Eilers (2006). Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society, Series B, 68, 259-280.

• Currie (2013). Smoothing constrained generalized linear models with applications to the Lee-Carter model. Statistical Modelling, 13, 69-93.

• Djeundje & Currie (2011). Smoothing dispersed counts with applications to mortality data. Annals of Actuarial Science, 5, 33-52. (Over-dispersion, amounts, joint models)

• Lee & Durban (2011). P-spline ANOVA-type interaction models for spatio-temporal smoothing. Statistical Modelling, 11, 49-69.

• Turner & Firth (2012). Generalized nonlinear models in R: An overview of the gnm package.