45
Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Embed Size (px)

Citation preview

Page 1: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

GENERALIZED LINEAR MODELS

Page 2: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

GENERALIZED LINEAR MODELSOVERVIEW

Page 3: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

OVERVIEW GENERAL LINEAR MODELS

0 1 1i i k ki iy x x 2~ . . . (0, )i i i d N

2var( )iy

Actually, proc glm

Page 4: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

OVERVIEW GENERALIZED LINEAR MODELS

• The distribution of the observations can come from the exponential family of distributions.

• The variance of the response variable is a specified function of its mean.

• X is fit to a function of E(y) (called a link function) suggested by the distribution of the observations:

g(E(y)) = g() = X

0 1 1( ( )) Xi i k kig E y x x …

Link function

Page 5: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

OVERVIEW LOGIT LINK FUNCTION FOR BINARY RESPONSE

Logit (pi)

Predictor

LogitTransform

Predictor

pi

logit( ) log1

pp

p

Page 6: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

OVERVIEW LOG LINK FUNCTION FOR COUNT DATA

Count Log(count)

LogTransform

Predictor Predictor

Page 7: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

OVERVIEW EXAMPLES OF GENERALIZED LINEAR MODELS

*Models often use the LOG link in practice.

Page 8: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

POISSON REGRESSION

Page 9: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

POISSON REGRESSION PROPERTIES AND EXAMPLES

• is one type of generalized linear model

• assumes that the response variable follows a Poisson distribution conditional on the values of the predictor variables

• can be used to model the number of occurrences of an event of interest or the rate of occurrence of an event of interest as a function of some predictor variables

• is most appropriate for rare events

Examples include • number of ear infections in

infants• number of equipment failures• colony counts for bacteria or

viruses• counts of a rare disease in a

population• number of fatal crashes at an

intersection• homicide rates in a given state• rate of insurance claims• number of infected areas per

unit volume of a tree• response rates to a marketing

campaign

• Response dist. should have small mean (<10 or even <5 and ideally ~1)

• If no, gamma and lognormal could be better choice

Page 10: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

POISSON REGRESSION POISSON VERSUS NORMAL DISTRIBUTION

Poisson distribution• is skewed to the right for rare events• is for nonnegative integer values• has only one parameter (the mean)• has a variance that is equal to the mean

Normal distribution• is symmetric• can be for negative as well as positive real values• has two unrelated parameters (mean and variance)

Page 11: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

POISSON REGRESSION MODEL

0 1 1 2 2log( ) ... k kX X X

0 1 1 2 2

0 1 1 2 2

( ... )k k

k k

X X X

XX X

e

e e e e

Page 12: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

POISSON REGRESSION PARAMETER ESTIMATES

ˆe multiplicative effect on ̂

for a one-unit change in X.

1̂e 1.20, then a one-unit increase in X1 yields a 20% increase in the estimated mean.

0.80, then a one-unit increase in X2 yields a 20% decrease in the estimated mean.

Example 1, if

Example 2, if

2ˆe

Page 13: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

POISSON REGRESSION ПРИМЕР: ДАННЫЕ

Gender

Number of Self-Diagnosed Ear Infections Age in Years

Frequent or Occasional Ocean Swimmer

Typical Swimming Location

Page 14: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

POISSON REGRESSION CATEGORICAL

GenderFrequent or Occasional Ocean Swimmer

Typical Swimming Location

Occ

asio

nal

Fre

q

Bea

ch

no

nB

each

Mal

e

Fem

ale

Page 15: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

POISSON REGRESSION INTERVAL

Age in Years

Page 16: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

POISSON REGRESSION ПРИМЕР

proc genmod data=sasuser.earinfection; class Swimmer (param=ref ref='Freq') Location (param=ref ref='Beach') Gender (param=ref ref='Male'); model infections = swimmer location gender age age*age / dist=poisson link=log type3;run;

Page 17: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

POISSON REGRESSION ПРИМЕР: PROC GENMOD OUTPUT

Scale = 1*

Page 18: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

POISSON REGRESSION

• Poisson regression models assume the variance is equal to the mean.

• Count data often exhibit variability exceeding the mean.

• Overdispersion leads to underestimates of the standard errors of parameter estimates.

• Overdispersion results in overestimates of the test statistic and liberal p-values.

OVERDISPERSION

• Subject heterogeneity due to an under-specified model• Outliers in the data• Positive correlation between the responses in clustered data

WHAT TO DO• Use the negative binomial

distribution [NOW]• Apply a multiplicative adjustment

factor (PSCALE or DSCALE option in the MODEL statement) [HW]

Page 19: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

NEGATIVE BINOMIAL REGRESSION

Page 20: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

NEGATIVE BINOMIAL REGRESSION

DISTRIBUTION AND MODEL

The negative binomial distribution• is the distribution for count data that permits the variance to exceed

the mean• enables the model to have greater flexibility in modeling the

relationship between the mean and the variance of the response variable than the Poisson model

Natural LogNegative

Binomial

Count

Variance Function

Link Function

DistributionResponse Variable

2k

Page 21: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

NEGATIVE BINOMIAL REGRESSION

DISPERSION PARAMETER K

• The dispersion parameter k is not allowed to vary over observations.

• The limiting case when the parameter k is equal to 0 corresponds to a Poisson regression model.

• When the parameter is greater than 0, overdispersion is evident and the standard errors will increase. The fitted values are similar, but the larger standard errors reflect the overdispersion uncaptured with the Poisson model.

Page 22: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

NEGATIVE BINOMIAL REGRESSION

ПРИМЕР

proc genmod data=sasuser.earinfection; class Swimmer (param=ref ref='Freq') Location (param=ref ref='Beach') Gender (param=ref ref='Male'); model infections = swimmer location gender age age*age / dist=negbin link=log type3;run;

Page 23: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

NEGATIVE BINOMIAL REGRESSION

ПРИМЕР: PROC GENMOD OUTPUT

Page 24: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

POISSON REGRESSION FOR RATES

Page 25: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

POISSON REGRESSION: RATES

RATES DATA: DEFINITION & EXAMPLES

• When events occur over time, space, or some other index of exposure, it is more relevant to model the rate at which they occur rather than the number of events.

• Rates provide the necessary standardization to make the outcomes comparable.

• You use the OFFSET= option in the MODEL statement in PROC GENMOD.

• How crime rates are related to the city’s unemployment rate

• How melanoma incidence rates are related to demographic variables

• How the rate of loan defaults is related to region of the country

• How response rates to marketing campaigns relate to known characteristics of the recipients

Page 26: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

• Log(T) is called the offset variable that has a coefficient equal to 1.

• The offset variable makes the fitted rate proportional to the index of exposure.

• For example, using the log of the population as an offset variable is the same as modeling the mean number of events proportional to population size.

POISSON REGRESSION: RATES

RATES DATA: OFFSET

0 1 1log k kx xT

0 1 1log( ) log( ) k kT x x …

0 1 1( )* k kx xT e …

OFFSET = Variable

Page 27: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

POISSON REGRESSION: RATES

SKIN CANCER IN TEXAS AND MINNESOTA

Incidence of nonmelanoma

skin cancer

City: Minneapolis-St. Paul Dallas-Fort Worth

Age_ 15-24Group: 25-34

35-44 45-54 55-64 65-74 75-84 85+

Page 28: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

POISSON REGRESSION: RATES

ПРИМЕР

proc genmod data=sasuser.skin; class City (param=ref ref='MSP') Age (param=ref ref='85+'); model cases = city age / offset=log_pop dist=poisson link=log type3;run;

Page 29: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

ZERO-INFLATED POISSON MODEL

Page 30: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

ZIP PURPOSE

• In some settings, the incidence of zero counts will be much greater than expected for the Poisson distribution.

• Poisson regression models will exhibit overdispersion when they are fit to data with an excess number of zeros.

• Zero-inflated Poisson (ZIP) models might be a better fit to the data.

Page 31: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

ZIP MODEL

• The population that can be modeled with the zero-inflated Poisson distribution is considered to consist of two types of responses.

• The first type gives Poisson distributed counts, which can produce the zero outcome or some other positive outcome.

• The second type always gives a zero count.

• Therefore, the relevant distribution is a mixture of a Poisson distribution and a distribution that is constant at zero.

Page 32: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

ZIP COMPONENTS

( )i ih z

( )i ig x MODEL statement

ZEROMODEL statement

proc genmod data=sasuser.roots; class bap photoperiod; model roots = photoperiod | bap

/ dist=zip link=log type3; zeromodel photoperiod;run;

Page 33: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

ZIP ПРИМЕР: ДАННЫЕ

photoperiod (hour)

concentration (M)

2.2 4.4 8.8 17.6

8 Number of roots

Number of roots

Number of roots

Number of roots

16 Number of roots

Number of roots

Number of roots

Number of roots

Page 34: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

ZIP ПРИМЕР

16 h

ours

8 ho

urs

Page 35: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

ZIP ПРИМЕР: РЕЗУЛЬТАТЫ

dist=zinb

Page 36: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

GAMMA REGRESSION

Page 37: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

GAMMA DISTRIBUTION

• is a skewed distribution for positive values• has a variance that is proportional to the squared mean• has lighter tails than a lognormal distribution

Var(y) [E(y)]2

gamma

Page 38: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

DISTRIBUTIONS COMPARISON

Normal (truncated)constant*

Poisson E(Y)

Gamma (E(Y))2

Lognormal (E(Y))2

Distribution Variance

100x

Page 39: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

GAMMA REGRESSION ПРИМЕР

proc univariate data=car; var price; histogram / gamma (alpha=est sigma=est theta=est color=blue w=2) vaxis=0 to 14 by 2 midpoints=8 to 50 by 2;run;

Page 40: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

GAMMA REGRESSION REG AND GENMOD RESULTS: RESIDUAL

PROC REG

PROC GENMOD, link=log PROC GENMOD, link=identity

proc genmod data=car; model price = hwympg hwympg2 horsepower / dist=gamma link=log /*identity*/ obstats id=model;run;

Page 41: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

SUMMARY

• Problem: • nonconstant variance

• Approaches: Transform the dependent variable Price (log). Fit a gamma regression model with the log link function. Fit a gamma regression model with the identity link

function.

PROBLEM for OLS

?

Page 42: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

СТРАХОВАНИЕ CASE STUDY

Page 43: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

GENMOD СТРАХОВАНИЕ

• Frequency - how often claims are made• Severity

• A typical way to model severity (claim amount) is by using a gamma distribution with a log link function

• Pure premium - it is the portion of the company’s expected cost that is “purely” attributed to loss• does not include the general expense of doing

business• Tweedie distribution

Page 44: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved .

GLM СТРАХОВАНИЕ: FREQUENCY & PURE PREMIUM

• ZIP• Tweedie distribution –

• PROC SEVERITY SAS/ETS

Page 45: Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS

Copy r igh t © 2013 , SAS Ins t i t u t e I nc . A l l r i gh t s res erved . sas.com

СПАСИБО!