GLAM
Generalized Linear Array Models
Iain Currie
Heriot-Watt University
Universitat Zurich
May 2014
Outline
• Theory
– Data
– Generalized Linear Models, GLMs
– The P-spline method
– P-splines in two dimensions
– Accelerated computation, GLAM
• Applications
– The smooth Lee-Carter family
England & Wales male mortality data (ONS)
• Age: 40 to 90
• Year: 1961 to 2009
• Deaths: D
• Exposures: E
• D, E: 51 × 49
[Figure: Raw mortality surface. log(mortality) plotted against Age (40 to 90) and Year.]
Generalized linear models
Structure
• Data: vectors y of deaths and e of exposures
• Model: a model matrix B of B-splines
a parameter vector θ
a link function
$\mu = E(y)$, $\log \mu = \log e + B\theta$
• Error distribution: Poisson
Algorithm
• Newton-Raphson (IWLS) algorithm
$B' W_\delta B \theta = B' W_\delta z$
where $z = B\theta + W_\delta^{-1}(y - \mu)$ is the working vector and $W_\delta$ is a diagonal
matrix of weights.
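The IWLS iteration above can be sketched in NumPy as follows. This is a minimal illustration, not the author's code: the function name `poisson_iwls`, the least-squares starting values, the stopping rule and the simulated data are all assumptions made for the example. For the Poisson error with log link the diagonal of $W_\delta$ is $\mu$.

```python
import numpy as np

def poisson_iwls(B, y, e, n_iter=50, tol=1e-8):
    """Fit log(mu) = log(e) + B @ theta by IWLS (Poisson error, log link)."""
    # Starting values (an assumption): least squares on empirical log rates.
    theta = np.linalg.lstsq(B, np.log((y + 0.5) / e), rcond=None)[0]
    for _ in range(n_iter):
        eta = B @ theta                  # linear predictor without the offset
        mu = e * np.exp(eta)             # fitted means
        w = mu                           # diagonal of W_delta
        z = eta + (y - mu) / w           # working vector
        BtW = B.T * w                    # B' W_delta without forming the diagonal matrix
        theta_new = np.linalg.solve(BtW @ B, BtW @ z)
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta

# Simulated example (hypothetical data, quadratic basis instead of B-splines).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
B = np.column_stack([np.ones_like(x), x, x**2])
e = np.full(40, 1000.0)
y = rng.poisson(e * np.exp(-3.0 + 1.5 * x)).astype(float)

theta_hat = poisson_iwls(B, y, e)
mu_hat = e * np.exp(B @ theta_hat)
```

At convergence the score $B'(y - \mu)$ is zero, which gives a simple check on the fit.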
[Figure: Log mortality for E & W males age 70. Axes: Year (1961 to 2001), log(mortality); lower panel: B-spline basis. Legend: observed mortality, B-spline regression, B-spline coefficients. DF = 28.]
Penalties
Eilers & Marx (1996) imposed penalties on differences between adjacent
coefficients
$(\theta_1 - 2\theta_2 + \theta_3)^2 + \ldots + (\theta_{c-2} - 2\theta_{c-1} + \theta_c)^2 = \theta' D_2' D_2 \theta$
where D2 is a second order difference matrix.
Algorithm
• Penalized Newton-Raphson (IWLS) algorithm
$(B' W_\delta B + P)\theta = B' W_\delta z$,
where $P = \lambda D_2' D_2$ is a roughness penalty.
• This is the method of P-splines.
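A short NumPy sketch of the difference penalty and one penalized IWLS step. The helper name `pspline_step` and the test vector are illustrative assumptions; the second order difference matrix is built here by differencing an identity matrix, which is one convenient construction among several.

```python
import numpy as np

c = 8
D2 = np.diff(np.eye(c), n=2, axis=0)     # (c-2) x c second order difference matrix

# The quadratic form theta' D2' D2 theta equals the sum of squared second differences.
theta = np.arange(c, dtype=float) ** 2
quad = theta @ D2.T @ D2 @ theta
direct = np.sum((theta[:-2] - 2.0 * theta[1:-1] + theta[2:]) ** 2)

def pspline_step(B, w, z, lam):
    """Solve (B' W B + P) theta = B' W z with P = lam * D2' D2."""
    D = np.diff(np.eye(B.shape[1]), n=2, axis=0)
    P = lam * D.T @ D
    BtW = B.T * w                        # w holds the diagonal of W_delta
    return np.linalg.solve(BtW @ B + P, BtW @ z)

# With a very large lam the coefficients are pulled towards a straight line,
# because linear sequences have zero second differences.
rng = np.random.default_rng(0)
z = rng.standard_normal(c)
theta_smooth = pspline_step(np.eye(c), np.ones(c), z, lam=1e8)
```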
Smoothing parameter selection
Trade off fit against roughness by minimizing
$\mathrm{BIC} = \mathrm{Deviance} + \log(n)\,\mathrm{ED}$,
where ED is the effective dimension or degrees of freedom and $n$ is the number of observations.
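The criterion can be sketched as below. The slide does not say how ED is computed; this example assumes the usual choice for penalized regression, the trace of the hat matrix, $\mathrm{ED} = \mathrm{tr}\{(B'W_\delta B + P)^{-1} B'W_\delta B\}$, together with the standard Poisson deviance. The function names and the random test matrices are hypothetical.

```python
import numpy as np

def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum[y log(y/mu) - (y - mu)], with 0 log 0 = 0."""
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))

def effective_dimension(B, w, P):
    """Trace of the hat matrix (assumed definition of ED)."""
    BtWB = (B.T * w) @ B
    return np.trace(np.linalg.solve(BtWB + P, BtWB))

def bic(y, mu, ed):
    """BIC = Deviance + log(n) * ED."""
    return poisson_deviance(y, mu) + np.log(len(y)) * ed

# Illustration with a random full-rank model matrix and hypothetical weights.
rng = np.random.default_rng(3)
B = rng.random((30, 6))
w = rng.random(30) + 0.5
D2 = np.diff(np.eye(6), n=2, axis=0)
ed_unpenalized = effective_dimension(B, w, np.zeros((6, 6)))
ed_penalized = effective_dimension(B, w, 10.0 * D2.T @ D2)
```

Without a penalty ED equals the number of coefficients; with a second order penalty it lies between 2 (the dimension of the unpenalized linear part) and that number. In practice BIC would be evaluated over a grid of $\lambda$ values and the minimizer chosen.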
[Figure: Log mortality for E & W males age 70. Axes: Year (1961 to 2001), log(mortality); lower panel: B-spline basis. Legend: observed mortality, B-spline regression, B-spline coefficients, P-spline regression, P-spline coefficients. DF = 28, EDF = 16.]
2-dimensional smoothing
Let $B_a$, $n_a \times c_a$, be a 1-d B-spline model matrix defined along age.
Let $B_y$, $n_y \times c_y$, be a 1-d B-spline model matrix defined along year.
The 2-d model matrix is given by the Kronecker product
$B = B_y \otimes B_a$, $n_a n_y \times c_a c_y$.
[Figure: 2d B-spline basis. B-spline value plotted against Age (40 to 90) and Year.]
Penalties in 2-d
• Each regression coefficient is associated with the summit of one of the hills.
• Smoothness is ensured by penalizing the coefficients in rows and columns.
$P = \lambda_a I_{c_y} \otimes D_a' D_a + \lambda_y D_y' D_y \otimes I_{c_a}$
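The 2-d penalty can be assembled directly with Kronecker products. In this sketch the function name `penalty_2d` is hypothetical and second order differences are assumed for both $D_a$ and $D_y$, which the slide does not state. The coefficient vector is taken as $\theta = \mathrm{vec}(\Theta)$, stacking the columns of the $c_a \times c_y$ coefficient matrix.

```python
import numpy as np

def penalty_2d(ca, cy, lam_a, lam_y, order=2):
    """P = lam_a * (I_cy kron Da'Da) + lam_y * (Dy'Dy kron I_ca)."""
    Da = np.diff(np.eye(ca), n=order, axis=0)   # differences along age
    Dy = np.diff(np.eye(cy), n=order, axis=0)   # differences along year
    return (lam_a * np.kron(np.eye(cy), Da.T @ Da)
            + lam_y * np.kron(Dy.T @ Dy, np.eye(ca)))

ca, cy, lam_a, lam_y = 5, 4, 2.0, 3.0
P = penalty_2d(ca, cy, lam_a, lam_y)

# Check: the quadratic form penalizes columns (age) and rows (year) of Theta.
rng = np.random.default_rng(0)
Theta = rng.standard_normal((ca, cy))
theta = Theta.flatten(order="F")                # column-major vec
col_pen = np.sum(np.diff(Theta, n=2, axis=0) ** 2)
row_pen = np.sum(np.diff(Theta, n=2, axis=1) ** 2)
quad = theta @ P @ theta
```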
Computational challenge
• $B = B_y \otimes B_a$ is large, maybe very large.
• Two smoothing parameters, $\lambda_a$ and $\lambda_y$, must be chosen.
GLAM
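Definition: Row tensor of $X$, $n \times c$,
$G(X) = [X \otimes \mathbf{1}_c'] * [\mathbf{1}_c' \otimes X]$, $n \times c^2$,
where $\mathbf{1}_c$ is a vector of ones of length $c$ and $*$ is the element-by-element product.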
GLAM Algorithms
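Structure: $\log E[D] = \log E + B_a \Theta B_y'$
Linear function: $B\theta = [B_y \otimes B_a]\theta$ ($n_a n_y \times 1$) $\equiv B_a \Theta B_y'$ ($n_a \times n_y$)
Inner product: $B' W_\delta B$ ($c_a c_y \times c_a c_y$) $\equiv G(B_a)' W G(B_y)$ ($c_a^2 \times c_y^2$)
St errors: $\mathrm{diag}\{\mathrm{Var}(B\theta)\}$ (diagonal of $n_a n_y \times n_a n_y$) $\equiv G(B_a) S G(B_y)'$ ($n_a \times n_y$)
where $S$ ($c_a^2 \times c_y^2$) $\equiv (B' W_\delta B)^{-1}$ ($c_a c_y \times c_a c_y$)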
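The three array identities can be checked numerically on a small example. This is a sketch under stated assumptions: $\theta = \mathrm{vec}(\Theta)$ and $W_\delta = \mathrm{diag}(\mathrm{vec}(W))$ with column-major stacking, and "$\equiv$" read as "same elements, rearranged". The function names and the reshape/transpose bookkeeping are one way to do the rearrangement in NumPy, not necessarily the author's.

```python
import numpy as np

def row_tensor(X):
    """G(X) = [X kron 1'] * [1' kron X], an n x c^2 matrix."""
    ones = np.ones((1, X.shape[1]))
    return np.kron(X, ones) * np.kron(ones, X)

def glam_linear(Ba, Theta, By):
    """Linear predictor as an na x ny array: Ba Theta By'."""
    return Ba @ Theta @ By.T

def glam_inner(Ba, By, W):
    """B' W_delta B, rearranged from the ca^2 x cy^2 array G(Ba)' W G(By)."""
    ca, cy = Ba.shape[1], By.shape[1]
    M = row_tensor(Ba).T @ W @ row_tensor(By)       # ca^2 x cy^2
    M4 = M.reshape(ca, ca, cy, cy)                  # indices [k, k', l, l']
    return M4.transpose(2, 0, 3, 1).reshape(ca * cy, ca * cy)

def glam_variances(Ba, By, V):
    """diag{B V B'} as an na x ny array, for V of size ca*cy x ca*cy."""
    ca, cy = Ba.shape[1], By.shape[1]
    S = V.reshape(cy, ca, cy, ca).transpose(1, 3, 0, 2).reshape(ca * ca, cy * cy)
    return row_tensor(Ba) @ S @ row_tensor(By).T

# Small random example, compared against the explicit Kronecker product.
rng = np.random.default_rng(1)
na, ny, ca, cy = 6, 5, 3, 2
Ba, By = rng.random((na, ca)), rng.random((ny, cy))
Theta = rng.standard_normal((ca, cy))
W = rng.random((na, ny)) + 0.1                      # na x ny array of weights

B = np.kron(By, Ba)                                 # never needed by GLAM itself
theta = Theta.flatten(order="F")
w = W.flatten(order="F")
full_inner = (B.T * w) @ B
V = np.linalg.inv(full_inner)

ok_linear = np.allclose(B @ theta, glam_linear(Ba, Theta, By).flatten(order="F"))
ok_inner = np.allclose(full_inner, glam_inner(Ba, By, W))
ok_var = np.allclose(np.diag(B @ V @ B.T),
                     glam_variances(Ba, By, V).flatten(order="F"))
```

The array forms only ever touch the marginal matrices $B_a$ and $B_y$, which is where the saving in storage and time comes from.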
GLAM
• Conceptually attractive
• Low footprint
• Very fast - marginal processing (Yates algorithm)
• Generalizes to d-dimensions
• Mixed model representation
Applications of GLAM
• Spatio-temporal modelling
• Multidimensional density estimation
• Longitudinal data analysis
• Variety trials
• Microarray analysis
• Respiratory disease modelling (age: 1-105; year: 1959-1998, month: 1-12).
Timings in 3-d
Data are 105 × 40 × 12 = 50400
Times (seconds) to calculate $B' W_\delta B$

Coefficients   npar   GLM    GLAM   Ratio
6 × 6 × 6      216    20     1      20:1
7 × 7 × 7      343    200    2      100:1
8 × 8 × 8      512    2000   4      500:1
9 × 9 × 9      729    -      20     -
Smoothing a constrained GLM
Consider a constrained penalized GLM (Currie, 2013) with
• Model matrix: $X$, $n \times p$, $r(X) = p - q$
• Constraints matrix: $H$, $q \times p$, $r([X' : H']) = p$, $H\theta = k$
• Penalty matrix: P , p× p, acting on θ
• Link: canonical
• Error: exponential family
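• Algorithm: Newton-Raphson:
$$\begin{bmatrix} X'WX + P & H' \\ H & 0 \end{bmatrix} \begin{bmatrix} \theta \\ \omega \end{bmatrix} = \begin{bmatrix} X'Wz \\ k \end{bmatrix}$$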
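One step of the bordered system can be sketched as follows. The function name `constrained_step` and the one-way layout used to exercise it are assumptions for illustration; the example takes unit weights and no penalty so that the result can be compared with an ordinary least-squares fit.

```python
import numpy as np

def constrained_step(X, w, z, P, H, k):
    """Solve [[X'WX + P, H'], [H, 0]] [theta; omega] = [X'Wz; k]."""
    p, q = X.shape[1], H.shape[0]
    XtW = X.T * w                                   # w holds the diagonal of W
    A = np.block([[XtW @ X + P, H.T],
                  [H, np.zeros((q, q))]])
    rhs = np.concatenate([XtW @ z, k])
    sol = np.linalg.solve(A, rhs)
    return sol[:p], sol[p:]                         # theta, omega

# Rank-deficient example: intercept plus three group indicators (p = 4, q = 1).
X = np.column_stack([np.ones(12), np.kron(np.eye(3), np.ones((4, 1)))])
H = np.array([[0.0, 1.0, 1.0, 1.0]])                # group effects sum to k
k = np.array([0.0])
rng = np.random.default_rng(0)
z = rng.standard_normal(12)

theta, omega = constrained_step(X, np.ones(12), z, np.zeros((4, 4)), H, k)
fitted_ls = X @ np.linalg.lstsq(X, z, rcond=None)[0]
```

Because the constraint only fixes the parameterization, the fitted values $X\theta$ agree with the unconstrained least-squares fit while $H\theta = k$ holds exactly.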
Lee-Carter model
$\log \lambda_{i,j} = \alpha_i + \beta_i \kappa_j$, $\sum \kappa_j = 0$, $\sum \beta_i = 1$
$\log \Lambda = \alpha \mathbf{1}' + \beta \kappa'$
The gnm package in R
library(gnm)
# Age and Year as factors: one alpha and beta per age, one kappa per year
Age.F <- factor(Age); Year.F <- factor(Year)
gnm(Death ~ -1 + Age.F + Mult(Age.F, Year.F) + offset(Off),
    family = poisson(link = "log"))
NB: This is a vector calculation, not GLAM!
[Figure: Lee-Carter estimates, four panels: Alpha against Age (40 to 90); Beta against Age (40 to 90); Kappa against Year (1960 to 2010); log(mortality) against Year (1960 to 2010) for Age 70.]
Coupled GLMs
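GLM1: $\log \lambda_{i,j} = \alpha_i + \beta_i \kappa_j$, $\sum \beta_i = 1$
$\log E(y) = \log e + \mathbf{1}_{n_y} \otimes \alpha + X\beta$, $X = [\kappa \otimes I_{n_a}]$,
$H = h' = \mathbf{1}_{n_a}'$, $k = 1$
$X\beta \equiv \beta \kappa'$
GLM2: $\log \lambda_{i,j} = \alpha_i + \beta_i \kappa_j$, $\sum \kappa_j = 0$
$\log E(y) = \log e + X\theta$, $X = [\mathbf{1}_{n_y} \otimes I_{n_a} : I_{n_y} \otimes \beta]$,
$H = h' = (\mathbf{0}_{n_a}', \mathbf{1}_{n_y}')$, $k = 0$
$X\theta \equiv \alpha \mathbf{1}' + \beta \kappa'$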
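The two model matrices can be built and checked numerically. This sketch uses small random values for $\alpha$, $\beta$ and $\kappa$ (hypothetical, chosen to satisfy the constraints), omits the $\log e$ offset, and assumes $\theta' = (\alpha', \kappa')$ in GLM2 with data stacked column by column (year within which age runs fastest).

```python
import numpy as np

rng = np.random.default_rng(2)
na, ny = 4, 3
alpha = rng.standard_normal(na)
beta = rng.random(na)
beta /= beta.sum()                       # sum(beta) = 1
kappa = rng.standard_normal(ny)
kappa -= kappa.mean()                    # sum(kappa) = 0

# GLM1: kappa fixed, alpha enters through the offset, beta is the parameter.
X1 = np.kron(kappa[:, None], np.eye(na))             # (na*ny) x na
offset1 = np.kron(np.ones(ny), alpha)                # 1_ny kron alpha
eta1 = offset1 + X1 @ beta
H1 = np.ones((1, na))                                # constraint: H1 @ beta = 1

# GLM2: beta fixed, theta = (alpha, kappa) is the parameter.
X2 = np.hstack([np.kron(np.ones((ny, 1)), np.eye(na)),
                np.kron(np.eye(ny), beta[:, None])]) # (na*ny) x (na + ny)
theta2 = np.concatenate([alpha, kappa])
eta2 = X2 @ theta2
H2 = np.concatenate([np.zeros(na), np.ones(ny)])[None, :]  # H2 @ theta2 = 0

# Array form of the same predictor: log Lambda = alpha 1' + beta kappa'.
log_Lambda = np.outer(alpha, np.ones(ny)) + np.outer(beta, kappa)
```

Fitting then alternates between the two constrained GLMs, updating $\beta$ with $\kappa$ held fixed and $(\alpha, \kappa)$ with $\beta$ held fixed.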
[Figure: Crossover with LC original. log(mortality) against Year (2010 to 2050) for Age 41 and Age 42.]
Delwarde-Denuit-Eilers model
$\log \lambda_{i,j} = \alpha_i + \beta_i \kappa_j$, $\sum \kappa_j = 0$, $\sum \beta_i = 1$
where
$\beta = B_a b$.
Fitting Delwarde-Denuit-Eilers
Coupled GLMs
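GLM1: $\log \lambda_{i,j} = \alpha_i + \beta_i \kappa_j$, $\sum \beta_i = 1$, $\beta = B_a b$
$\log E(y) = \log e + \mathbf{1}_{n_y} \otimes \alpha + Xb$, $X = [\kappa \otimes I_{n_a}] B_a$,
$H = h' = \mathbf{1}_{n_a}' B_a$, $k = 1$
$P = \tau_\beta D_2' D_2$
GLM2: as LC original.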
[Figure: Estimates of beta. Beta against Age (40 to 90) for LC and DDE.]
[Figure: Crossover with LC and DDE. log(mortality) against Year (2010 to 2050) for LC: Age = 41, LC: Age = 42, DDE: Age = 41, DDE: Age = 42.]
[Figure: Irregularity with Lee-Carter. log(mortality) against Year (2010 to 2050) for Ages 51, 52, 53, 54 and 55.]
[Figure: Irregularity with DDE. log(mortality) against Year (2010 to 2050) for Ages 51, 52, 53, 54 and 55.]
LC(S): Smoothing α and β
$\log \lambda_{i,j} = \alpha_i + \beta_i \kappa_j$, $\sum \kappa_j = 0$, $\sum \beta_i = 1$
where
$\beta = B_a b$, $\alpha = B_a a$.
Fitting smooth α and β
Coupled GLMs
GLM1: as DDE.
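GLM2: $\log \lambda_{i,j} = \alpha_i + \beta_i \kappa_j$, $\sum \kappa_j = 0$, $\alpha = B_a a$
$\log E(y) = \log e + X\theta$, $X = [\mathbf{1}_{n_y} \otimes B_a : I_{n_y} \otimes \beta]$,
$\theta' = (a', \kappa')$,
$H = h' = (\mathbf{0}_{c_a}', \mathbf{1}_{n_y}')$, $k = 0$,
$P = \mathrm{blockdiag}\{\tau_\alpha D_2' D_2,\; 0 \cdot I_{n_y}\}$.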
[Figure: Mortality differences in 2010. Mortality difference against Age (40 to 90) for LC, DDE and LC(S).]
[Figure: Mortality differences in 2050. Mortality difference against Age (40 to 90) for LC, DDE and LC(S).]
References
• Currie, Durban, & Eilers (2006). Generalized linear array
models with applications to multidimensional smoothing.
Journal of the Royal Statistical Society, Series B, 68, 259-280.
• Currie (2013). Smoothing constrained generalized linear
models with applications to the Lee-Carter model. Statistical
Modelling, 13, 69-93.
• Djeundje & Currie (2011). Smoothing dispersed counts with
applications to mortality data. Annals of Actuarial Science, 5,
33-52. (Over-dispersion, amounts, joint models)
• Lee & Durban (2011). P-spline ANOVA-type interaction
models for spatio-temporal smoothing. Statistical Modelling,
11, 49-69.
• Turner & Firth (2012). Generalized nonlinear models in R:
An overview of the gnm package.