Source: pmcc/seminars/SASA/glm2.pdf

Generalized linear models II: Exponential families

Peter McCullagh

Department of Statistics, University of Chicago

Polokwane, South Africa, November 2013

Outline

Components of a GLM

Exponential families

Real exponential families

Maximum likelihood fitting

Parameter estimation

Components of a generalized linear model

- Observation: $Y \in \mathbb{R}^n$ with independent components (a very strong simplifying assumption)
- Distribution: exponential family, $Y_i \sim \mathrm{EF}(\theta_i)$, with mean-value parameter $\mu_i = E(Y_i)$; includes Poisson, binomial, exponential, hypergeometric, ...
- Linear part: $\eta = X\beta$; $\eta \in \mathcal{X} \subset \mathbb{R}^n$, e.g. factorial model, ...
- Link function: $\eta_i = g(\mu_i)$ (component-wise)
















Construction of an exponential family

(i) Observation space: $y \in \mathcal{S}$ ($\mathcal{S} = \mathbb{R}$)
(ii) Baseline distribution with density $f_0(y)$ on $\mathcal{S}$
(iii) Real-valued statistic $S(y)$
(iv) Moment generating function of the statistic $S(\cdot)$:

$$M_0(\theta) = \int_{\mathcal{S}} e^{\theta S(y)} \, f_0(y)\, dy$$

(v) $\Theta = \{\theta : M_0(\theta) < \infty\}$ (parameter space)
(vi) $K_0(\theta) = \log M_0(\theta)$ is the cumulant generating function
(vii) Weighted distribution, for $\theta \in \Theta$:

$$f_\theta(y) = \frac{e^{\theta S(y)} f_0(y)}{M_0(\theta)} = e^{\theta S(y) - K_0(\theta)} \cdot f_0(y)$$

(viii) Support of $f_\theta$ = support of $f_0$

Simplest types (natural exponential families): $\mathcal{S} = \mathbb{R}$ and $S(y) = y$.

Some properties of the family

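Moment generating function of $S(\cdot)$ under $f_\theta$:

$$M_\theta(t) = \int_{\mathcal{S}} e^{tS(y)} f_\theta(y)\, dy = \int_{\mathcal{S}} e^{tS(y)}\, \frac{e^{\theta S(y)} f_0(y)}{M_0(\theta)}\, dy = \frac{M_0(t+\theta)}{M_0(\theta)}$$

$$K_\theta(t) = K_0(\theta + t) - K_0(\theta)$$

The $r$th cumulant of $S$ under $f_\theta$ is $K_\theta^{(r)}(0) = K_0^{(r)}(\theta)$
Mean: $E_\theta(S) = K_0'(\theta)$
Variance: $\operatorname{var}_\theta(S) = K_0''(\theta) \ge 0$

$K_0(\cdot)$ is a convex function on $\Theta$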
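The mean and variance identities can be checked numerically. The following is a minimal sketch, assuming a made-up finite baseline (uniform on {0,...,5}) and the statistic S(y) = y; the helper names `tilt` and `K0` are illustrative, not part of the slides. It tilts the baseline by θ and compares the mean and variance under f_θ with finite-difference derivatives of K_0.

```python
import math

def tilt(f0, stat, theta):
    """Exponentially tilt a finite baseline pmf f0 (dict: y -> prob) by theta."""
    m0 = sum(math.exp(theta * stat(y)) * p for y, p in f0.items())  # M_0(theta)
    k0 = math.log(m0)                                               # K_0(theta)
    f_theta = {y: math.exp(theta * stat(y) - k0) * p for y, p in f0.items()}
    return f_theta, k0

def K0(f0, stat, theta):
    """Cumulant generating function of the baseline, evaluated at theta."""
    return tilt(f0, stat, theta)[1]

# Assumed baseline for illustration: uniform on {0,...,5}, with S(y) = y
f0 = {y: 1 / 6 for y in range(6)}
S = lambda y: y
theta, h = 0.4, 1e-4

f_theta, _ = tilt(f0, S, theta)
total = sum(f_theta.values())                        # should be 1
mean = sum(S(y) * p for y, p in f_theta.items())     # E_theta(S)
var = sum((S(y) - mean) ** 2 * p for y, p in f_theta.items())

# Central finite differences approximating K_0'(theta) and K_0''(theta)
k1 = (K0(f0, S, theta + h) - K0(f0, S, theta - h)) / (2 * h)
k2 = (K0(f0, S, theta + h) - 2 * K0(f0, S, theta) + K0(f0, S, theta - h)) / h**2

print(mean, k1)   # the two values should agree closely
print(var, k2)    # likewise
```

Because the baseline has finite support, M_0(θ) is finite for every θ, so Θ is the whole real line in this example.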

Examples of real exponential families

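Real: observation space is $\mathcal{S} = \mathbb{R}$

Gaussian family:
Baseline: $f_0(y) = \exp(-y^2/2)/\sqrt{2\pi}$
MGF: $M_0(\theta) = \exp(\theta^2/2)$
CGF: $K_0(\theta) = \theta^2/2$

Exponentially weighted distribution:

$$f_\theta(y) = e^{\theta y - \theta^2/2}\, e^{-y^2/2}/\sqrt{2\pi} = e^{-(y-\theta)^2/2}/\sqrt{2\pi}$$

Initial distribution $N(0,1)$:
Exp family $\{N(\theta,1) : \theta \in \mathbb{R}\}$, all with unit variance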

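The Poisson family

Baseline distribution: $\mathrm{Po}(1)$

$$f_0(y) = \frac{\exp(-1)}{y!}, \qquad y = 0, 1, \ldots$$

Generating functions:

$$M_0(\theta) = \sum_{y=0}^{\infty} \frac{e^{\theta y} e^{-1}}{y!} = \exp(e^\theta - 1), \qquad K_0(\theta) = e^\theta - 1$$

All cumulants are equal to one; the $r$th moment is $B_r$ (Bell number)
$\Theta = \{\theta : M_0(\theta) < \infty\} = \mathbb{R}$

Exponential family:

$$f_\theta(y) = \frac{\exp(y\theta - e^\theta + 1)\, e^{-1}}{y!} = \frac{e^{y\theta - e^\theta}}{y!} = \frac{\mu^y e^{-\mu}}{y!}$$

Initial baseline distribution $\mathrm{Po}(1)$ on the integers:
Exp family $\{\mathrm{Po}(\mu) : \mu > 0\}$, $\mu = e^\theta$

The $r$th cumulant of $\mathrm{Po}(\mu)$ is $K^{(r)}(\theta) = \mu$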
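Two of the Poisson claims lend themselves to a quick numerical check: the raw moments of Po(1) are Bell numbers, and tilting Po(1) by θ gives Po(e^θ). The sketch below assumes a truncation point N = 60 for the infinite sums (the Po(1) and Po(2) tails beyond that are negligible) and an arbitrary value θ = 0.7; the function names are illustrative.

```python
import math

def po_pmf(y, mu):
    """Poisson probability mass function."""
    return math.exp(-mu) * mu**y / math.factorial(y)

N = 60  # assumed truncation point for the infinite sums

def raw_moment(r):
    """r-th raw moment of Po(1), summed over y = 0,...,N-1."""
    return sum(y**r * po_pmf(y, 1.0) for y in range(N))

# First five raw moments of Po(1), rounded to the nearest integer
moments = [round(raw_moment(r)) for r in range(1, 6)]

# Tilt the Po(1) baseline by theta and compare with Po(mu), mu = exp(theta)
theta = 0.7
mu = math.exp(theta)
K0 = math.exp(theta) - 1
tilted = [math.exp(theta * y - K0) * po_pmf(y, 1.0) for y in range(N)]
direct = [po_pmf(y, mu) for y in range(N)]
max_gap = max(abs(a - b) for a, b in zip(tilted, direct))

print(moments)   # Bell numbers B_1,...,B_5
print(max_gap)   # difference at the level of rounding error
```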

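The Bernoulli family

Baseline distribution: Bernoulli coin-toss

$$f_0(y) = 1/2 \quad \text{for } y = 0, 1$$

Generating functions:

$$M_0(\theta) = (e^{\theta \cdot 0} + e^{\theta \cdot 1})/2 = (1 + e^\theta)/2, \qquad K_0(\theta) = \log(1 + e^\theta) - \log 2$$

$\Theta = \{\theta : M_0(\theta) < \infty\} = \mathbb{R}$

Exponential family:

$$f_\theta(y) = \begin{cases} 1/(1 + e^\theta) & y = 0 \\ e^\theta/(1 + e^\theta) & y = 1 \end{cases}$$

$\pi = e^\theta/(1 + e^\theta)$ is the mean of $f_\theta$

Initial baseline distribution $\mathrm{Ber}(1/2)$ on $\{0,1\}$
Exponential family: $\{\mathrm{Ber}(\pi) : 0 < \pi < 1\}$ on $\{0,1\}$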

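The binomial family

Baseline distribution: Binomial$(m, 1/2)$:

$$f_0(y; m) = \binom{m}{y} 2^{-m} \qquad (0 \le y \le m)$$

Generating functions:

$$M_0(\theta) = 2^{-m} \sum_{y=0}^{m} \binom{m}{y} e^{\theta y} = (1 + e^\theta)^m / 2^m$$

$$K_0(\theta) = m \log(1 + e^\theta) - m \log 2$$

Exponential family:

$$f_\theta(y; m) = \binom{m}{y} \frac{e^{\theta y}}{(1 + e^\theta)^m} = \binom{m}{y} \pi^y (1 - \pi)^{m-y}$$

$\Theta = \mathbb{R}$, $\pi = e^\theta/(1 + e^\theta)$, $0 < \pi < 1$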
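As a worked check (added for illustration, using only the binomial cumulant function and the general mean and variance identities), differentiating $K_0$ recovers the familiar binomial moments:

```latex
\begin{aligned}
K_0(\theta)   &= m\log(1+e^{\theta}) - m\log 2, \\
K_0'(\theta)  &= \frac{m\,e^{\theta}}{1+e^{\theta}} = m\pi, \\
K_0''(\theta) &= \frac{m\,e^{\theta}}{(1+e^{\theta})^{2}} = m\pi(1-\pi).
\end{aligned}
```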

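The Ewens family

Distribution on permutations $[n] \to [n]$: $\mathcal{S} = S_n$, where $[n] = \{1, \ldots, n\}$ and $\#S_n = n!$

$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 4 & 1 & 2 & 3 & 7 & 6 & 5 \end{pmatrix} = (1,4,3,2)(5,7)(6), \qquad \#\sigma = 3 \ \text{(number of cycles)}$$

Baseline distribution: $f_0(\sigma) = 1/n!$

Generating function: $\sum_\sigma \alpha^{\#\sigma} = \alpha^{\uparrow n}$ (Euler), where

$$\alpha^{\uparrow n} = \alpha(\alpha + 1) \cdots (\alpha + n - 1)$$

Generating functions:

$$M_0(\theta) = \sum_\sigma e^{\theta \#\sigma} f_0(\sigma) = \alpha^{\uparrow n}/n! \qquad (\alpha = e^\theta)$$

$$K_0(\theta) = \log(\alpha^{\uparrow n}) - \log(n!)$$

Weighted distribution on permutations:

$$f_\theta(\sigma) = \frac{\alpha^{\#\sigma}}{\alpha^{\uparrow n}}$$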
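Euler's identity for the cycle-count generating function can be verified by brute force for small n. This is a minimal sketch, assuming n = 5 and an arbitrary α = 1.8; `n_cycles` and `rising` are illustrative helper names. It also recounts the cycles of the 7-point example permutation.

```python
from itertools import permutations

def n_cycles(perm):
    """Number of cycles of a permutation given as a tuple i -> perm[i] (0-based)."""
    seen, count = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            count += 1
            j = start
            while j not in seen:   # walk around the cycle containing `start`
                seen.add(j)
                j = perm[j]
    return count

def rising(alpha, n):
    """Ascending factorial alpha (alpha + 1) ... (alpha + n - 1)."""
    out = 1.0
    for k in range(n):
        out *= alpha + k
    return out

# The example permutation on [7], shifted to 0-based labels
sigma = tuple(v - 1 for v in (4, 1, 2, 3, 7, 6, 5))
example_cycles = n_cycles(sigma)

n, alpha = 5, 1.8   # assumed values for illustration
lhs = sum(alpha ** n_cycles(p) for p in permutations(range(n)))
rhs = rising(alpha, n)

# The Ewens weights alpha^{#sigma} / alpha^{rising n} should sum to one
ewens_total = sum(alpha ** n_cycles(p) / rhs for p in permutations(range(n)))

print(example_cycles, lhs, rhs, ewens_total)
```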

The Ewens family on set partitions

Partition of $[7]$ into blocks: 1|23|4567 or 1234|57|6 or ...

Partitions of $[n]$ (a bracketed number counts the partitions of that type):
- $n = 2$: 12, 1|2
- $n = 3$: 123, 12|3, 13|2, 23|1, 1|2|3
- $n = 4$: 1234, 123|4 [4], 12|34 [3], 12|3|4 [6], 1|2|3|4

$\#\mathcal{P}_n$: 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975, ...

Make each permutation cycle into a block.
Induced marginal distribution on set partitions:

$$f_n(\sigma) = \frac{\alpha^{\#\sigma}}{\alpha^{\uparrow n}} \prod_{b \in \sigma} (\#b - 1)!$$

Also an exponential family:
Canonical statistic $\#\sigma$ (number of blocks)

Can talk of the Ewens distribution of $\#\sigma$ on $\{1, \ldots, n\}$
Cumulant function:

$$K(\theta) = \log\big(e^\theta (e^\theta + 1) \cdots (e^\theta + n - 1)\big)$$

versus $n \log(1 + e^\theta)$ for the binomial
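The induced distribution on set partitions can be checked by enumeration. The sketch below assumes n = 6 and an arbitrary α = 2.5, with illustrative helper names. It counts the partitions (a Bell number), confirms that the Ewens probabilities sum to one, and compares the mean number of blocks with the derivative of the cumulant function, K'(θ) = Σ_k α/(α + k) for k = 0,...,n-1 with α = e^θ.

```python
from math import factorial

def set_partitions(items):
    """Yield every partition of a list as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # put `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + part

def rising(alpha, n):
    """Ascending factorial alpha (alpha + 1) ... (alpha + n - 1)."""
    out = 1.0
    for k in range(n):
        out *= alpha + k
    return out

def ewens_prob(part, alpha, n):
    """alpha^{#blocks} * prod (#b - 1)! / alpha^{rising n}."""
    w = alpha ** len(part)
    for block in part:
        w *= factorial(len(block) - 1)
    return w / rising(alpha, n)

n, alpha = 6, 2.5   # assumed values for illustration
parts = list(set_partitions(list(range(1, n + 1))))
n_parts = len(parts)
total = sum(ewens_prob(p, alpha, n) for p in parts)

# Mean number of blocks versus K'(theta) = sum_k alpha / (alpha + k)
mean_blocks = sum(len(p) * ewens_prob(p, alpha, n) for p in parts)
k_prime = sum(alpha / (alpha + k) for k in range(n))

print(n_parts, total, mean_blocks, k_prime)
```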

Regression models

Sample: $n$ individuals or subjects $i = 1, \ldots, n$
Covariate $x_i$ for individual $i$
Response $Y_i$ for individual $i$

Distributional assumptions (exp family):
- density $f_i(y_i) = \exp(y_i \theta_i - K(\theta_i)) \times f_0(y_i)$
- $Y_i$, $Y_j$ independent for $i \ne j$
- $\mu_i = E(Y_i) = K'(\theta_i)$; $\operatorname{var}(Y_i) = K''(\theta_i) = V(\mu_i)$

Model for the vector $\mu$ as a function of $X$: $\mu = \ldots$

Convolution and dispersion parameter

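Suppose $Y_1, \ldots, Y_m$ are iid with density $f_\theta(y) = e^{\theta y - K_0(\theta)} f_0(y)$
What is the distribution of the sample mean $\bar{Y}$?

Answer first for $\theta = 0$: $f_0^{(m)}(y)$

In general: $e^{m(\theta y - K_0(\theta))} \times f_0^{(m)}(y)$

Suggests introducing a dispersion parameter $\sigma^2 = 1/\nu$:

$$f_\theta(y) = e^{\nu(\theta y - K_0(\theta))} \times f_0(y; \nu)$$

- $\nu$ is the effective sample size
- The mean $E(Y) = K_0'(\theta)$ is independent of $\nu$
- $\operatorname{var}(Y) = K_0''(\theta)/\nu = \sigma^2 K_0''(\theta)$
- $\sigma^2 = 1/\nu$ is the relative variance
- $\theta$, $\nu$ are orthogonal parameters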
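The step from m iid observations to the density of their average can be spelled out as follows. This is a sketch added for illustration; it reads $f_0^{(m)}$ as the density of the average of m baseline draws, which is an interpretation of the notation rather than something stated explicitly.

```latex
\begin{aligned}
\prod_{i=1}^{m} f_\theta(y_i)
  &= \prod_{i=1}^{m} e^{\theta y_i - K_0(\theta)} f_0(y_i)
   = e^{m(\theta \bar{y} - K_0(\theta))} \prod_{i=1}^{m} f_0(y_i),
   \qquad \bar{y} = \tfrac{1}{m}\textstyle\sum_i y_i, \\
\text{density of } \bar{Y} \text{ at } \bar{y}
  &= e^{m(\theta \bar{y} - K_0(\theta))} \, f_0^{(m)}(\bar{y}), \\
\text{CGF of } \bar{Y}:\quad
  & m\,\{K_0(\theta + t/m) - K_0(\theta)\}
  \;\Rightarrow\;
  E(\bar{Y}) = K_0'(\theta), \quad
  \operatorname{var}(\bar{Y}) = K_0''(\theta)/m .
\end{aligned}
```

The exponential factor depends on the data only through the average, which is why it passes through the marginalisation unchanged; replacing the integer m by a positive real ν gives the dispersion form.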

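Example: the gamma family

Standard gamma distribution:

$$f(y; \nu) = y^{\nu-1} e^{-y}/\Gamma(\nu); \qquad y > 0;\ \nu > 0$$

CGF: $K(t) = -\nu \log(1 - t) = \nu(t + t^2/2 + t^3/3 + t^4/4 + \cdots)$
Sum of exponentials: $E(Y) = \nu$; $\operatorname{var}(Y) = \nu$

Standard 2-parameter gamma distribution (rate $\lambda$):

$$\lambda^\nu y^{\nu-1} e^{-\lambda y}/\Gamma(\nu)$$

$K(t) = -\nu \log(1 - t/\lambda) = \nu\big(t/\lambda + t^2/(2\lambda^2) + \cdots\big)$
Mean $\mu = \nu/\lambda$; variance $\nu/\lambda^2$

Parameterization for GLMs:

$$\frac{\nu^\nu y^{\nu-1} e^{-\nu y/\mu}}{\mu^\nu \, \Gamma(\nu)}$$

$E(Y) = \mu$; $\operatorname{var}(Y) = \mu^2/\nu$; $1/\nu = \operatorname{var}(Y)/\mu^2$

GLM assumption: $\nu$ constant (constant c.v.); $\mu_i$ depends on $x_i$
Exp family parameterization: $\theta_i = -1/\mu_i$

Alternative non-GLM models:
- $\nu_i = \nu$ constant and $g(\lambda_i) = x_i' \beta$
- $\lambda_i = \lambda$ constant and $\log \nu_i = x_i' \beta$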
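To see how the GLM parameterization of the gamma density fits the dispersion form $e^{\nu(\theta y - K_0(\theta))} f_0(y; \nu)$, one possible factorisation is sketched below. The choices $K_0(\theta) = -\log(-\theta)$ and $f_0(y; \nu) = \nu^\nu y^{\nu-1}/\Gamma(\nu)$ are assumptions made for this illustration; here $f_0$ acts as a baseline weight rather than a normalised density, because $\theta = 0$ lies on the boundary of $\Theta = (-\infty, 0)$.

```latex
\begin{aligned}
\theta &= -1/\mu, \qquad K_0(\theta) = -\log(-\theta) = \log\mu, \\
e^{\nu(\theta y - K_0(\theta))} \cdot \frac{\nu^{\nu} y^{\nu-1}}{\Gamma(\nu)}
  &= e^{-\nu y/\mu}\, \mu^{-\nu} \cdot \frac{\nu^{\nu} y^{\nu-1}}{\Gamma(\nu)}
   = \frac{\nu^{\nu} y^{\nu-1} e^{-\nu y/\mu}}{\mu^{\nu}\,\Gamma(\nu)}, \\
K_0'(\theta) &= -1/\theta = \mu, \qquad
K_0''(\theta) = 1/\theta^{2} = \mu^{2} = V(\mu), \qquad
\operatorname{var}(Y) = V(\mu)/\nu = \mu^{2}/\nu .
\end{aligned}
```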

Generalized linear models: Key ideas

$Y_1, \ldots, Y_n$ are assumed independent
Density function of $Y_i$ at $y$:

$$e^{\nu_i(\theta_i y - K_0(\theta_i))} \times f_0(y; \nu_i)$$

Two-parameter family $(\theta_i, \nu_i)$ such that
- $\mu_i = E(Y_i) = K_0'(\theta_i)$
- $\operatorname{var}(Y_i) = K_0''(\theta_i)/\nu_i = V(\mu_i)/\nu_i$
- the variance function $V(\mu)$ is a characteristic of the family
- $V(\mu) = \mu$ for Poisson; $V(\mu) = \mu^2$ for gamma

GLM assumptions:
- $\nu_i = \nu$ (constant relative variance)
- $g(\mu_i) = \eta_i = x_i' \beta$; $g(\mu) = X\beta$, where $X$ is the design matrix
- The link function $g$ is part of the specification

Parameters to be estimated (learned): $(\beta, \nu)$

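Estimation of $\beta$ by maximum likelihood

Log likelihood derivatives w.r.t. $\beta$ (vector/matrix form):

$$\frac{\partial l}{\partial \beta} = \nu X' W \left( \frac{Y - \mu}{d\mu/d\eta} \right), \qquad \frac{\partial^2 l}{\partial \beta^2} = -\nu X' W X + \text{terms of zero mean}$$

$$W = \operatorname{diag}\{ (d\mu_i/d\eta_i)^2 / V_i \}$$

Fisher modification of Newton-Raphson scheme gives

$$(\hat\beta - \beta_0) = (X' W_0 X)^{-1} X' W_0 \left( \frac{Y - \mu_0}{d\mu/d\eta} \right)$$

$$(X' W_0 X) \hat\beta = X' W_0 X \beta_0 + X' W_0 \left( \frac{Y - \mu_0}{d\mu/d\eta} \right)$$

Entire F-N-R sequence is independent of $\nu \equiv 1/\sigma^2$

Asymptotic moments of $\hat\beta$:

$$E(\hat\beta) = \beta + O_p(n^{-1})$$

$$\operatorname{cov}(\hat\beta) = (X' W X)^{-1}/\nu = \sigma^2 (X' W X)^{-1}$$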
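The Fisher scoring update can be sketched in a few lines for one concrete case. The example below is a minimal illustration, assuming a Poisson response with log link (so dµ/dη = µ, V(µ) = µ and W = diag(µ)), made-up data, a design matrix whose first column is the intercept, and a starting value β = (log ȳ, 0, ...); none of these choices come from the slides. It uses plain Python lists rather than a matrix library, and ν never enters the iteration.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fisher_scoring_poisson(X, y, n_iter=25):
    """Fisher scoring for a Poisson GLM with log link (fixed number of steps)."""
    p = len(X[0])
    # Assumed start: intercept-only fit, first column of X being the intercept
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(n_iter):
        eta = [sum(xij * bj for xij, bj in zip(xi, beta)) for xi in X]
        mu = [math.exp(e) for e in eta]
        w = mu  # W = (dmu/deta)^2 / V(mu) = mu^2 / mu = mu
        # Working response: eta_0 + (y - mu_0) / (dmu/deta)
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        XtWX = [[sum(wi * xi[a] * xi[b] for wi, xi in zip(w, X))
                 for b in range(p)] for a in range(p)]
        XtWz = [sum(wi * xi[a] * zi for wi, xi, zi in zip(w, X, z))
                for a in range(p)]
        beta = solve(XtWX, XtWz)  # (X'WX) beta = X'W z
    return beta

# Made-up illustrative data: intercept plus one covariate
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 3, 2, 6, 9, 14]
X = [[1.0, xi] for xi in x]

beta_hat = fisher_scoring_poisson(X, y)
mu_hat = [math.exp(beta_hat[0] + beta_hat[1] * xi) for xi in x]

# At the maximum the score X'(y - mu) should vanish (canonical link)
score = [sum(xi[a] * (yi - m) for xi, yi, m in zip(X, y, mu_hat))
         for a in range(len(beta_hat))]
print(beta_hat, score)
```

Writing the update as a weighted least-squares solve for the working response z = Xβ₀ + (Y − µ₀)/(dµ/dη) is algebraically the same as the second displayed form of the scoring step.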

Dispersion estimation

In certain applications $\nu = 1$ is 'known': ignore this frame

Otherwise, the F-N-R sequence produces $\hat\beta$, $\hat\eta = X\hat\beta$, $\hat\mu = g^{-1}(\hat\eta)$

Dispersion = relative variance: $\sigma^2 = \operatorname{var}(Y_i)/V(\mu_i)$

Natural moment estimate:

$$\hat\sigma^2 = \frac{1}{n - p} \sum_i \frac{(Y_i - \hat\mu_i)^2}{V(\hat\mu_i)} = \frac{X^2}{n - p}$$

$p = \operatorname{rank}(X)$; $X^2$ is the generalized Pearson statistic

$\hat\sigma^2$ is consistent, but not the mle
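The moment estimate is simple to compute once fitted means are available. The sketch below uses made-up responses and made-up "fitted" means (they are not the output of any actual fit) with an assumed p = 2, and evaluates the Pearson statistic under the Poisson and gamma variance functions; `pearson_dispersion` is an illustrative name.

```python
def pearson_dispersion(y, mu_hat, variance, p):
    """Return (sigma2_hat, X2), where X2 = sum (y_i - mu_i)^2 / V(mu_i)."""
    x2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu_hat))
    return x2 / (len(y) - p), x2

# Hypothetical numbers for illustration only
y = [2, 7, 4, 12, 15]
mu_hat = [3.0, 5.0, 6.0, 10.0, 16.0]

# V(mu) = mu (Poisson) and V(mu) = mu^2 (gamma)
sigma2_pois, x2_pois = pearson_dispersion(y, mu_hat, lambda m: m, p=2)
sigma2_gamma, x2_gamma = pearson_dispersion(y, mu_hat, lambda m: m * m, p=2)

print(x2_pois, sigma2_pois)
print(x2_gamma, sigma2_gamma)
```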

R defaults

R defaults in summary(fit) and vcov(fit):
- $\sigma^2 = 1$ for Poisson and binomial
- $\sigma^2 = X^2/(n - p)$ otherwise: normal, gamma, ...

Arguably these are the right defaults, ...
Sometimes, but not always, appropriate

Over-riding the defaults:

    summary(glm(y~..., family=poisson()), dispersion=4.7)
    summary(glm(y~..., family=Gamma(link="log")), dispersion=1)

The deviance function
