43
1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University

1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

Embed Size (px)

Citation preview

Page 1: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

1

GLM I: Introduction to Generalized Linear Models

GLM I: Introduction to Generalized Linear Models

By Curtis Gary DeanDistinguished Professor of Actuarial Science

Ball State University

By Curtis Gary DeanDistinguished Professor of Actuarial Science

Ball State University

Page 2: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

2

Classical Multiple Linear RegressionClassical Multiple Linear Regression

Yi = a0 + a1Xi1 + a2Xi2 …+ amXim + ei

Yi are the response variables Xij are predictors

i subscript denotes ith observation

j subscript identifies jth predictor

Yi = a0 + a1Xi1 + a2Xi2 …+ amXim + ei

Yi are the response variables Xij are predictors

i subscript denotes ith observation

j subscript identifies jth predictor

Page 3: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

3

One Predictor: Yi = a0 + a1Xi One Predictor: Yi = a0 + a1Xi

60

65

70

75

60 65 70 75

Average Height of Parents (Inches)

Hei

gh

t o

f O

ldes

t S

on

60

65

70

75

60 65 70 75

Average Height of Parents (Inches)

Hei

gh

t o

f O

ldes

t S

on

Page 4: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

4

Classical Multiple Linear Regression: Solving for ai

Classical Multiple Linear Regression: Solving for ai

n

iimmii

m

XaXaaY

aaaG

1

2110

10

)(

),..,( Minimize

Page 5: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

5

Classical Multiple Linear RegressionClassical Multiple Linear Regression

μi = E[Yi] = a0 + a1Xi1 + …+ amXim

Yi is Normally distributed random variable with constant variance σ2

Want to estimate μi = E[Yi] for each i

μi = E[Yi] = a0 + a1Xi1 + …+ amXim

Yi is Normally distributed random variable with constant variance σ2

Want to estimate μi = E[Yi] for each i

Page 6: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

6

Response Yi has Normal Distribution

-12 -7 -2 3 8 μ i

Page 7: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

7

Problems with Traditional ModelProblems with Traditional Model

Number of claims is discrete

Claim sizes are skewed to the right

Probability of an event is in [0,1]

Variance is not constant across data points i

Nonlinear relationship between X’s and Y’s

Number of claims is discrete

Claim sizes are skewed to the right

Probability of an event is in [0,1]

Variance is not constant across data points i

Nonlinear relationship between X’s and Y’s

Page 8: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

8

Generalized Linear Models - GLMsGeneralized Linear Models - GLMs

Fewer restrictions

Y can model number of claims, probability of renewing, loss severity, loss ratio, etc.

Large and small policies can be put into one model

Y can be nonlinear function of X’s

Classical linear regression model is a special case

Fewer restrictions

Y can model number of claims, probability of renewing, loss severity, loss ratio, etc.

Large and small policies can be put into one model

Y can be nonlinear function of X’s

Classical linear regression model is a special case

Page 9: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

9

Generalized Linear Models - GLMsGeneralized Linear Models - GLMs

Same goal

Predict : μi = E[Yi]

Same goal

Predict : μi = E[Yi]

Page 10: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

10

Generalized Linear Models - GLMsGeneralized Linear Models - GLMs

g(μi )= a0 + a1Xi1 + …+ amXim

g( ) is the link function

E[Yi] = μi = g-1(a0 + a1Xi1 + …+ amXim)

Yi can be Normal, Poisson, Gamma, Binomial, Compound Poisson, …

Variance can be modeled

g(μi )= a0 + a1Xi1 + …+ amXim

g( ) is the link function

E[Yi] = μi = g-1(a0 + a1Xi1 + …+ amXim)

Yi can be Normal, Poisson, Gamma, Binomial, Compound Poisson, …

Variance can be modeled

Page 11: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

11

GLMs Extend Classical Linear RegressionGLMs Extend Classical Linear Regression

If link function is identity: g(μi) = μi

And Yi has Normal distribution

→ GLM gives same answer as Classical Linear Regression*

* Least squares and MLE equivalent for Normal dist.

If link function is identity: g(μi) = μi

And Yi has Normal distribution

→ GLM gives same answer as Classical Linear Regression*

* Least squares and MLE equivalent for Normal dist.

Page 12: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

12

Exponential Family of Distributions – Canonical Form

Exponential Family of Distributions – Canonical Form

parameter. nuisancea called often is

!interest ofparameter theis

)()(''][

)('][

,exp , ;

abYVar

bYE

yca

byyf

Page 13: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

13

Some Math Rules: RefresherSome Math Rules: Refresher

yxyx

xxx

xrx

yxxy

xxx

ex

r

x

lnln)/ln()6(

ln)ln()/1ln()5(

ln)ln()4(

lnln)ln()3(

]ln[exp]exp[ln)2(

)exp()1(

1

Page 14: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

14

Normal Distribution in Exponential FamilyNormal Distribution in Exponential Family

22

2

2

2

2

22

2

2

2

2

2

2ln2

2/exp

2

2exp

2

1lnexp

2

)(exp

2

1),;(

yy

yy

yyf

Page 15: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

15

Normal Distribution in Exponential FamilyNormal Distribution in Exponential Family

22

2

2

22

2

2

22

1)()(][ and

)(2/)(then

,)( and Let

2ln2

2/exp),;(

abYVar

bb

a

yyyf

θ b(θ)

Page 16: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

16

Poisson Distribution in Exponential FamilyPoisson Distribution in Exponential Family

)!ln(1

)(lnexp]Pr[

!lnexp]Pr[

!]Pr[

yy

yY

y

eyY

y

eyY

y

y

)!ln(1

)(lnexp]Pr[

!lnexp]Pr[

!]Pr[

yy

yY

y

eyY

y

eyY

y

y

θ

Page 17: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

17

Poisson Distribution in Exponential FamilyPoisson Distribution in Exponential Family

ed

dabYVar

eed

dbYE

aeb

e

2

2

)()(''][

)('][

1)(and)(

ln

ed

dabYVar

eed

dbYE

aeb

e

2

2

)()(''][

)('][

1)(and)(

ln

Page 18: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

18

Compound Poisson DistributionCompound Poisson Distribution

Y = C1 + C2 + . . . + CN

N is Poisson random variable Ci are i.i.d. with Gamma distribution

This is an example of a Tweedie distribution

Y is member of Exponential Family

Y = C1 + C2 + . . . + CN

N is Poisson random variable Ci are i.i.d. with Gamma distribution

This is an example of a Tweedie distribution

Y is member of Exponential Family

Page 19: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

19

Members of the Exponential FamilyMembers of the Exponential Family

• Normal• Poisson• Binomial• Gamma• Inverse Gaussian• Compound Poisson

Page 20: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

20

Variance StructureVariance Structure

E[Yi] = μi = b' (θi) → θi = b' (-1)(μi )

Var[Yi] = a(Φi) b' ' (θi) = a(Φi) V(μi)

Common form: Var[Yi] = Φ V(μi)/wi

Φ is constant across data but weights applied to data points

E[Yi] = μi = b' (θi) → θi = b' (-1)(μi )

Var[Yi] = a(Φi) b' ' (θi) = a(Φi) V(μi)

Common form: Var[Yi] = Φ V(μi)/wi

Φ is constant across data but weights applied to data points

Page 21: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

21

V(μ) Normal 0

Poisson Binomial (1-) Tweedie p, 1<p<2 Gamma 2

Inverse Gaussian 3

Recall: Var[Yi] = Φ V(μi)/wi

Variance Functions V(μ)

Page 22: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

22

Variance at Point and Fit Variance at Point and Fit

A Practioner’s Guide to Generalized Linear Models: A CAS Study Note

Page 23: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

23

Variance of Yi and Fit at Data Point iVariance of Yi and Fit at Data Point i

Var(Yi) is big → looser fit at data point i

Var(Yi) is small → tighter fit at data point i

Var(Yi) is big → looser fit at data point i

Var(Yi) is small → tighter fit at data point i

)Var(

1fit of Tightness

iY

Page 24: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

24

Why Exponential Family?Why Exponential Family?

Distributions in Exponential Family can model a variety of problems

Standard algorithm for finding coefficients a0, a1, …, am

Distributions in Exponential Family can model a variety of problems

Standard algorithm for finding coefficients a0, a1, …, am

Page 25: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

25

Modeling Number of ClaimsModeling Number of Claims

Number ofPolicy Sex Territory Claims in 5 Years

1 M 02 02 F 01 03 F 01 04 F 02 15 F 01 06 F 02 17 M 02 28 M 02 29 M 02 1

10 F 01 1: : : :

Page 26: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

26

Assume a Multiplicative ModelAssume a Multiplicative Model

μi = expected number of claims in five years

μi = BF,01 x CSex(i) x CTerr(i)

If i is Female and Terr 01

→ μi = BF,01 x 1.00 x 1.00

μi = expected number of claims in five years

μi = BF,01 x CSex(i) x CTerr(i)

If i is Female and Terr 01

→ μi = BF,01 x 1.00 x 1.00

Page 27: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

27

Multiplicative ModelMultiplicative Model

μi = exp(a0 + aSXS(i) + aTXT(i))

μi = exp(a0) x exp(aSXS(i)) x exp( aTXT(i))

i is Female → XS(i) = 0; Male → XS(i) = 1

i is Terr 01 → XT(i) = 0; Terr 02 → XT(i) = 1

μi = exp(a0 + aSXS(i) + aTXT(i))

μi = exp(a0) x exp(aSXS(i)) x exp( aTXT(i))

i is Female → XS(i) = 0; Male → XS(i) = 1

i is Terr 01 → XT(i) = 0; Terr 02 → XT(i) = 1

Page 28: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

28

Values of Predictor VariablesValues of Predictor Variables

Policy Sex XS(i) Territory XT(i)

1 M 1 02 12 F 0 01 03 F 0 01 04 F 0 02 15 F 0 01 06 F 0 02 17 M 1 02 18 M 1 02 19 M 1 02 1

10 F 0 01 0

Page 29: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

29

Natural Log Link FunctionNatural Log Link Function

ln(μi) = a0 + aSXS(i) + aTXT(i)

μi is in (0 , ∞ )

ln(μi) is in ( - ∞ , ∞ )

ln(μi) = a0 + aSXS(i) + aTXT(i)

μi is in (0 , ∞ )

ln(μi) is in ( - ∞ , ∞ )

Page 30: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

30

Poisson Distribution in Exponential FamilyPoisson Distribution in Exponential Family

eb

yy

yY

)(

ln

)!ln(1

lnexp]Pr[

eb

yy

yY

)(

ln

)!ln(1

lnexp]Pr[

Page 31: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

31

Natural Log is Canonical Link for PoissonNatural Log is Canonical Link for Poisson

θi = ln(μi)

θi = a0 + aSXS(i) + aTXT(i)

θi = ln(μi)

θi = a0 + aSXS(i) + aTXT(i)

Page 32: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

32

Estimating Coefficients a1, a2, .., amEstimating Coefficients a1, a2, .., am

Classical linear regression uses least squares

GLMs use Maximum Likelihood Method

Solution will exist for distributions in exponential family

Classical linear regression uses least squares

GLMs use Maximum Likelihood Method

Solution will exist for distributions in exponential family

Page 33: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

33

Likelihood and Log LikelihoodLikelihood and Log Likelihood

n

iii

n

iii

yfy

yLy

yfyL

111

1111

111

);(ln,..),..;(

,..)],..;(ln[,..),..;(

);(,..),..;(

Page 34: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

34

Find a0, aS, and aT for PoissonFind a0, aS, and aT for Poisson

)()(0

111

with

!ln,...),..;(

:Maximize

iTTiSSi

n

iiii

xaxaa

yeyy i

Page 35: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

35

Iterative Numerical Procedure to Find ai’sIterative Numerical Procedure to Find ai’s

Use statistical package or actuarial software

Specify link function and distribution type

“Iterative weighted least squares” is the numerical method used

Use statistical package or actuarial software

Specify link function and distribution type

“Iterative weighted least squares” is the numerical method used

Page 36: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

36

Solution to Our ExampleSolution to Our Example

a0 = -.288 → exp(-.288) = .75 aS = .262 → exp(.262) = 1.3 aT = .095 → exp(.095) = 1.1

μi = exp(a0) x exp(aSXS(i)) x exp( aTXT(i))

μi = .75 x 1.3XS(i) x 1.1XT(i)

i is Male,Terr 01 → μi = .75 x 1.31 x 1.10

a0 = -.288 → exp(-.288) = .75 aS = .262 → exp(.262) = 1.3 aT = .095 → exp(.095) = 1.1

μi = exp(a0) x exp(aSXS(i)) x exp( aTXT(i))

μi = .75 x 1.3XS(i) x 1.1XT(i)

i is Male,Terr 01 → μi = .75 x 1.31 x 1.10

Page 37: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

37

Testing New Drug TreatmentTesting New Drug Treatment

X1 X2 Y

Dosage Age Cure Value1.0 30 Yes 11.0 43 No 01.0 82 No 01.5 45 No 01.5 67 No 01.5 26 Yes 12.0 33 Yes 12.0 50 Yes 12.0 72 No 02.5 31 Yes 12.5 45 Yes 12.5 75 Yes 1

Page 38: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

38

Multiple Linear RegressionMultiple Linear Regression

Cure ActualPredicted

ProbabilityYes 1 0.5179No 0 0.3298No 0 -0.2345No 0 0.5366No 0 0.2183

Yes 1 0.8115Yes 1 0.9460Yes 1 0.7000No 0 0.3817

Yes 1 1.2107Yes 1 1.0081Yes 1 0.5740

Dependent Variable: Y

Cure ActualPredicted

ProbabilityYes 1 0.5179No 0 0.3298No 0 -0.2345No 0 0.5366No 0 0.2183

Yes 1 0.8115Yes 1 0.9460Yes 1 0.7000No 0 0.3817

Yes 1 1.2107Yes 1 1.0081Yes 1 0.5740

Dependent Variable: Y

Page 39: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

39

Logistic Regression ModelLogistic Regression Model

p = probability of cure, p in [0,1]

odds ratio: p/(1-p) in [0, + ∞ ]

ln[p/(1-p)] in [ - ∞ , + ∞ ]

ln[p/(1-p)] = a + b1X1 + b2X2

p = probability of cure, p in [0,1]

odds ratio: p/(1-p) in [0, + ∞ ]

ln[p/(1-p)] in [ - ∞ , + ∞ ]

ln[p/(1-p)] = a + b1X1 + b2X2

Link function

Page 40: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

40

Logistic Regression ModelLogistic Regression Model

X1 X2

Dosage Age Cure ValuePredicted

Probability1.0 30 Yes 1 0.5681.0 43 No 0 0.0001.0 82 No 0 0.0001.5 45 No 0 0.6481.5 67 No 0 0.0001.5 26 Yes 1 1.0002.0 33 Yes 1 1.0002.0 50 Yes 1 1.0002.0 72 No 0 0.0002.5 31 Yes 1 1.0002.5 45 Yes 1 1.0002.5 75 Yes 1 0.784

Dependent Variable YX1 X2

Dosage Age Cure ValuePredicted

Probability1.0 30 Yes 1 0.5681.0 43 No 0 0.0001.0 82 No 0 0.0001.5 45 No 0 0.6481.5 67 No 0 0.0001.5 26 Yes 1 1.0002.0 33 Yes 1 1.0002.0 50 Yes 1 1.0002.0 72 No 0 0.0002.5 31 Yes 1 1.0002.5 45 Yes 1 1.0002.5 75 Yes 1 0.784

Dependent Variable Y

Page 41: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

41

Which Exponential Family Distribution?Which Exponential Family Distribution?

Frequency: Poisson, {Negative Binomial}

Severity: Gamma

Loss ratio: Compound Poisson Pure Premium: Compound Poisson

How many policies will renew: Binomial

Frequency: Poisson, {Negative Binomial}

Severity: Gamma

Loss ratio: Compound Poisson Pure Premium: Compound Poisson

How many policies will renew: Binomial

Page 42: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

42

What link function?What link function?

Additive model: identity

Multiplicative model: natural log

Modeling probability of event: logistic

Additive model: identity

Multiplicative model: natural log

Modeling probability of event: logistic

Page 43: 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary

43