Page 1: Count Variables

Poisson & Negative Binomial Regression

“Now I've got heartaches by the number,
Troubles by the score,
Every day you love me less,
Each day I love you more” (Ray Price)

Page 2: Count Variables

Count Variables

Number of times a particular event occurs to each case, usually within a given:
  • Time period (e.g., number of hospital visits per year)
  • Population size (e.g., number of registered sex offenders per 100,000 population), or
  • Geographical area (e.g., number of divorces per county or state)

Whole numbers that can range from 0 through $+\infty$

Page 3: Count Variables

Count DVs

Number of hospital visits, outpatient visits, services used, divorces, arrests, criminal offenses, symptoms, placements, children fostered, children adopted

Page 4: Count Variables

Overview

Poisson regression
  • Basic model for count DVs

Negative binomial regression
  • Alternative to Poisson regression
  • Less restrictive assumptions, and so greater generality

Page 5: Count Variables

Single (Dichotomous) IV Example

DV = number of foster children adopted
IV = marital status, 0 = unmarried, 1 = married
N = 285 foster mothers

Is there a difference in the number of foster children adopted by unmarried and married foster mothers?

Page 6: Count Variables

Distribution of Count DVs

Typically skewed positively with large percentage of 0 values

Page 7: Count Variables

Number of Foster Children Adopted

Page 8: Count Variables

Descriptive Statistics

Table 5.1

Why is a t-test for independent groups not appropriate here?

Page 9: Count Variables

Strength & Direction of Relationships

Being married increased the mean number of children adopted by a factor of 1.47 (47%)

1.112 / .754 = 1.47

100(1.47 – 1.00) = 47%

Page 10: Count Variables

Question & Answer

Is there a difference in the number of foster children adopted by unmarried and married foster mothers?

Yes. The mean number of children adopted by unmarried mothers is .75 and by married mothers 1.11. So, being married increased the mean number of children adopted by a factor of 1.47 (47%).

But, analysis incorrect because…

Page 11: Count Variables

Exposure

Opportunity for event to occur
  • Length of time, population size, geographical area, or other domain of interest

Number of years fostering varied across mothers, and so opportunity to adopt foster children varied
  • Unmarried mothers, M = 8.803
  • Married mothers, M = 7.254

Page 12: Count Variables

Rate

Count per unit of…
  • Time (e.g., number of children adopted per year)
  • Population (e.g., number of registered sex offenders per 100,000)
  • Geographical area (e.g., number of children below the poverty line per state)

Page 13: Count Variables

Rate (cont’d)

$\lambda = \mu / E$

$\lambda$ (lambda), mean population rate
  • Sometimes referred to as the incidence rate

$\mu$ (mu), mean population count
  • Sometimes referred to as incidence

$E$, exposure

Page 14: Count Variables

Rate (cont’d)

Example

$\text{rate}_{\text{Unmarried}} = .754 / 8.803 = .086$
  • .086 children adopted yearly (rate)

$\text{rate}_{\text{Married}} = 1.112 / 7.254 = .153$
  • .153 children adopted yearly (rate)

Page 15: Count Variables

Incidence Rate Ratio (IRR)

$\text{IRR} = \lambda_{\text{Married}} / \lambda_{\text{Unmarried}}$

Quantifies the direction and strength of relationship between IVs and DV

Being married increased the yearly adoption rate by a factor of 1.78 (78%)
  • .153 / .086 = 1.78
  • 100(1.78 – 1.00) = 78%
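The rate and IRR arithmetic on these slides can be reproduced in a few lines. This is a minimal sketch in Python (the deck itself works in SPSS); the variable names are illustrative, and the inputs are the group means reported above. With unrounded rates the ratio comes out near 1.79, while dividing the rounded rates (.153 / .086) gives the 1.78 shown on the slide.

```python
# Group means reported on the slides: mean count and mean years fostered.
mean_count = {"unmarried": 0.754, "married": 1.112}
mean_years = {"unmarried": 8.803, "married": 7.254}

# Rate = mean count / mean exposure (children adopted per year of fostering).
rate = {group: mean_count[group] / mean_years[group] for group in mean_count}

# Incidence rate ratio: married (numerator) relative to unmarried (denominator).
irr = rate["married"] / rate["unmarried"]
pct_change = 100 * (irr - 1)

# Same ratio computed from the rates after rounding to three decimals.
irr_from_rounded = round(rate["married"], 3) / round(rate["unmarried"], 3)

print({group: round(value, 3) for group, value in rate.items()})
print(round(irr, 2), round(pct_change), round(irr_from_rounded, 2))
```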

Page 16: Count Variables

Incidence Rate Ratio (IRR) (cont’d)

IRR = 1
  • Numerator group and denominator group have same incidence rate

IRR > 1
  • Numerator group has a higher incidence rate than denominator group

IRR < 1
  • Numerator group has a lower incidence rate than the denominator group

Potential range from 0 through $+\infty$

Page 17: Count Variables

Comparing IRR > 1 & IRR < 1

Compute reciprocal of one of the IRRs
  • e.g., IRR of 2.00 and an IRR of .50
  • Reciprocal of .50 is 2.00 (1 / .50 = 2.00)
  • IRRs are equal in size (but not in direction of the relationship)
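The reciprocal rule can be written as a small helper. This is only a sketch; the function name `comparable_size` is made up for illustration.

```python
def comparable_size(irr):
    """Express an IRR on the "greater than 1" side so sizes can be compared."""
    if irr <= 0:
        raise ValueError("An IRR must be greater than 0")
    # IRRs below 1 are replaced by their reciprocal; direction is ignored.
    return irr if irr >= 1 else 1 / irr

# An IRR of 2.00 and an IRR of .50 describe effects of the same size.
print(comparable_size(2.00), comparable_size(0.50))
```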

Page 18: Count Variables

Question & Answer

Is there a difference in the number of foster children adopted by unmarried and married foster mothers?

Yes. The yearly adoption rate for unmarried mothers is .09 and for married mothers .15. So, being married increased the yearly adoption rate by a factor of 1.78 (78%).

Page 19: Count Variables

Poisson Regression

Page 20: Count Variables

Single (Dichotomous) IV Example (ignoring exposure)

DV = number of foster children adopted
IV = marital status, 0 = unmarried, 1 = married
N = 285 foster mothers

Is there a difference in the number of foster children adopted by unmarried and married foster mothers?

Page 21: Count Variables

Statistical Significance

Tables 5.2, 5.3
  • Relationship between marital status and children adopted is statistically significant (Wald $\chi^2$ = 5.846, p = .016)
  • $H_0$: $\beta = 0$, $\beta \ge 0$, $\beta \le 0$, same as $H_0$: IRR = 1, IRR $\ge$ 1, IRR $\le$ 1
  • Likelihood ratio $\chi^2$ better than Wald
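The reported p-value can be checked by hand. The sketch below assumes the Wald statistic for a single coefficient is referred to a chi-square distribution with 1 degree of freedom; the function name is illustrative, and only the Python standard library is used.

```python
import math

def wald_p_value_1df(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # With 1 df, chi-square is a squared standard normal, so
    # P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2))

p = wald_p_value_1df(5.846)  # Wald chi-square reported on the slide
print(round(p, 3))
```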

Page 22: Count Variables

Slope

B = slope
  • Positive slope, positive relationship (IRR > 1)
  • Negative slope, negative relationship (IRR < 1)
  • 0 slope, no linear relationship (IRR = 1)

Page 23: Count Variables

Slope (cont’d)

B = .388
  • Positive relationship between marital status and children adopted
  • Married mothers adopt more children

Page 24: Count Variables

IRR & Percentage Change

Exp(B) = IRR = 1.474
% change = 100(1.474 - 1) = 47%

Married mothers adopt more children
  • Being married increased the mean number of children adopted by a factor of 1.47 (47%)

Page 25: Count Variables

Poisson Model

$\ln(\mu) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$, or $\ln(\mu) = \eta$

$\ln(\mu)$, log of mean count (“log link”)
  • e.g., log of mean number of children adopted

$\eta$, abbreviation for linear predictor (right hand side of this equation)

$k$ = number of independent variables

Page 26: Count Variables

Inverse (reverse) Link

$\mu = e^{\eta}$

$\mu$ is the mean count
  • e.g., mean number of children adopted

Page 27: Count Variables

$\ln(\mu)$ to $\mu$

$\ln(\text{mean}) = -.282 + (.388)(X_{\text{Married}})$

Unmarried mothers
  • $\ln(\text{mean}) = -.282 + (.388)(0) = -.282$
  • $\text{mean} = e^{-.282} = .754$
  • mean = .75 children adopted

Married mothers
  • $\ln(\text{mean}) = -.282 + (.388)(1) = .106$
  • $\text{mean} = e^{.106} = 1.112$
  • mean = 1.11 children adopted
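The inverse link is just exponentiation of the linear predictor. A minimal Python sketch, using the intercept and slope reported on this slide (the constant and function names are illustrative):

```python
import math

# Coefficients reported on the slide (model that ignores exposure).
INTERCEPT = -0.282
B_MARRIED = 0.388

def predicted_mean(married):
    """Inverse link: exponentiate the linear predictor to get the mean count."""
    eta = INTERCEPT + B_MARRIED * married
    return math.exp(eta)

for label, x in [("unmarried", 0), ("married", 1)]:
    print(label, round(predicted_mean(x), 3))
```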

Page 28: Count Variables

Question & Answer

Is there a difference in the number of foster children adopted by unmarried and married foster mothers?

Yes. The mean number of children adopted by unmarried mothers is .75 and by married mothers 1.11. So, being married increased the mean number of children adopted by a factor of 1.47 (47%).

But, analysis incorrect because…

Page 29: Count Variables

Single (Dichotomous) IV Example (with exposure)

Use SPSS to create an “offset” variable
  • Natural log of the exposure variable
  • Exposure variable must be > 0

compute lnYearsFostered = ln(YearsFostered).

Enter offset variable into the regression analysis
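To show what the offset does, here is a sketch in Python rather than SPSS, using a small made-up data set (not the foster-mother data). It relies on a property of this special case: with a single dichotomous IV and an offset of ln(exposure), the Poisson maximum-likelihood estimate of each group's rate is total events divided by total exposure, so the intercept and slope have a closed form. A model with more IVs would need an iterative fitting routine instead.

```python
import math

# Hypothetical records: (married, children_adopted, years_fostered).
records = [
    (0, 0, 10.0), (0, 1, 8.0), (0, 2, 12.0), (0, 0, 5.0),
    (1, 1, 6.0), (1, 3, 9.0), (1, 0, 4.0), (1, 2, 7.0),
]

# The offset is the natural log of exposure, so exposure must be > 0.
if any(years <= 0 for _, _, years in records):
    raise ValueError("Exposure must be greater than 0")
offsets = [math.log(years) for _, _, years in records]

def group_rate(group):
    """Total events divided by total exposure for one marital-status group."""
    events = sum(count for m, count, _ in records if m == group)
    exposure = sum(years for m, _, years in records if m == group)
    return events / exposure

intercept = math.log(group_rate(0))              # ln(rate) for unmarried
slope = math.log(group_rate(1) / group_rate(0))  # B for married

# Fitted counts: exp(linear predictor + offset) = rate * exposure.
fitted = [math.exp(intercept + slope * m + off)
          for (m, _, _), off in zip(records, offsets)]

print(round(intercept, 3), round(slope, 3), round(math.exp(slope), 3))
```

The fitted counts sum to the observed counts, which is the condition the maximum-likelihood estimates satisfy for this model.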

Page 30: Count Variables

Statistical Significance

Tables 5.4, 5.5
  • Relationship between marital status and yearly adoption rate is statistically significant (Wald $\chi^2$ = 13.131, p < .001)

Page 31: Count Variables

IRR & Percentage Change

Exp(B) = IRR = 1.789
% change = 100(1.789 - 1) = 79%

Married mothers adopt more children per year

Being married increased the yearly adoption rate by a factor of 1.79 (79%)

Page 32: Count Variables

$\ln(\lambda)$ to $\lambda$

$\ln(\text{rate}) = -2.457 + (.582)(X_{\text{Married}})$

Unmarried mothers
  • $\ln(\text{rate}) = -2.457 + (.582)(0) = -2.457$
  • $\text{rate} = e^{-2.457} = .086$
  • .09 children adopted yearly (rate)

Married mothers
  • $\ln(\text{rate}) = -2.457 + (.582)(1) = -1.875$
  • $\text{rate} = e^{-1.875} = .153$
  • .15 children adopted yearly (rate)

Page 33: Count Variables

Roadmap to Computations

Log of rates
  • $\ln(\lambda) = \eta$

Rates
  • $\lambda = e^{\eta}$

IRR
  • $\lambda_{(1)} / \lambda_{(0)}$

% change
  • 100(IRR - 1)
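The roadmap can be followed step by step in code. This sketch uses the intercept and slope reported for the marital status model with the offset; the function name `roadmap` is illustrative.

```python
import math

def roadmap(intercept, slope):
    """Follow the roadmap: log rates -> rates -> IRR -> % change."""
    log_rate = {x: intercept + slope * x for x in (0, 1)}  # ln(lambda) = eta
    rate = {x: math.exp(v) for x, v in log_rate.items()}   # lambda = e^eta
    irr = rate[1] / rate[0]                                # lambda(1) / lambda(0)
    pct = 100 * (irr - 1)                                  # % change
    return rate, irr, pct

# Coefficients reported for marital status with the offset included.
rate, irr, pct = roadmap(-2.457, 0.582)
print(round(rate[0], 3), round(rate[1], 3), round(irr, 2), round(pct))
```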

Page 34: Count Variables

Question & Answer

Is there a difference in the number of foster children adopted by unmarried and married foster mothers?

Yes. The yearly adoption rate for unmarried mothers is .09 and for married mothers .15. So, being married increased the yearly adoption rate by a factor of 1.79 (79%).

Page 35: Count Variables

Single (Quantitative) IV Example

DV = number of foster children adopted
IV = perceived responsibility for parenting (scale scores transformed to z-scores)
Offset variable = log of years fostered
N = 285 foster mothers

Do foster mothers who feel a greater responsibility to parent foster children adopt more foster children?

Page 36: Count Variables

Statistical Significance

Tables 5.6, 5.7
  • Relationship between parenting responsibility and yearly adoption rate is statistically significant (Wald $\chi^2$ = 10.045, p = .002)

Page 37: Count Variables

IRR & Percentage Change

Exp(B) = IRR = 1.202
% change = 100(1.202 - 1) = 20%

Mothers with greater parenting responsibility adopt more children per year

For every one-standard-deviation increase in parenting responsibility the yearly adoption rate increases by a factor of 1.20 (20%)

Page 38: Count Variables

$\ln(\lambda)$ to $\lambda$

$\ln(\text{rate}) = -2.008 + (.184)(X_{\text{zParentRole}})$

e.g., mean parenting responsibility (z = 0):
  • $\ln(\text{rate}) = -2.008 + (.184)(0) = -2.008$
  • $\text{rate} = e^{-2.008} = .13$
  • .13 children adopted yearly (rate)

Page 39: Count Variables

Figure

zParentRole.xls

Page 40: Count Variables

[Figure: Effect of Standardized Parenting Responsibility on Adoption Rate. x-axis: Standardized Parenting Responsibility; y-axis: Adoption Rate]

z:     -3    -2    -1     0     1     2     3
Rate: 0.08  0.09  0.11  0.13  0.16  0.19  0.23
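The plotted rates follow directly from the estimated equation. A minimal sketch, assuming the reported intercept (-2.008) and slope (.184) and evaluating the inverse link at z = -3 through 3:

```python
import math

INTERCEPT = -2.008  # reported intercept (log rate at z = 0)
B_Z = 0.184         # reported slope for standardized parenting responsibility

def yearly_rate(z):
    """Predicted yearly adoption rate at a given z-score."""
    return math.exp(INTERCEPT + B_Z * z)

z_values = list(range(-3, 4))
rates = [round(yearly_rate(z), 2) for z in z_values]
print(list(zip(z_values, rates)))
```

Each one-unit step in z multiplies the rate by the same factor, Exp(B), which is why the curve bends upward rather than rising in a straight line.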

Page 41: Count Variables

Question & Answer

Do foster mothers who feel a greater responsibility to parent foster children adopt more foster children?

Yes. For every one-standard-deviation increase in parenting responsibility the yearly adoption rate increases by a factor of 1.20 (20%). The yearly adoption rate is .09 for mothers two standard deviations below the mean, .13 for mothers with the mean, and .19 for mothers two standard deviations above the mean.

Page 42: Count Variables

Multiple IV Example

DV = number of foster children adopted
IV = perceived responsibility for parenting (scale scores transformed to z-scores)
IV = marital status, 0 = unmarried, 1 = married
Offset variable = log of years fostered
N = 285 foster mothers

Do foster mothers who take more responsibility for parenting adopt more foster children per year, controlling for marital status?

Page 43: Count Variables

Statistical Significance

Table 5.8
  • Relationship between set of IVs and yearly adoption rate is statistically significant ($\chi^2$ = 27.792, p < .001)
  • $H_0$: $\beta_1 = \beta_2 = \dots = \beta_k = 0$, same as $H_0$: $\text{IRR}_1 = \text{IRR}_2 = \dots = \text{IRR}_k = 1$

Page 44: Count Variables

Statistical Significance

Table 5.9
  • Relationship between parenting responsibility and yearly adoption rate is statistically significant, controlling for marital status ($\chi^2$ = 11.853, p = .001)
  • Relationship between marital status and yearly adoption rate is statistically significant, controlling for parenting responsibility ($\chi^2$ = 16.520, p < .001)

Page 45: Count Variables

Statistical Significance

Table 5.10
  • Relationship between parenting responsibility and yearly adoption rate is statistically significant, controlling for marital status (Wald $\chi^2$ = 11.576, p = .001)
  • Relationship between marital status and yearly adoption rate is statistically significant, controlling for parenting responsibility (Wald $\chi^2$ = 14.433, p < .001)

Page 46: Count Variables

IRR & Percentage Change: Parenting Responsibility

Exp(B) = IRR = 1.219
% change = 100(1.219 - 1) = 22%

Mothers with greater parenting responsibility adopt more children per year, controlling for marital status

For every one-standard-deviation increase in parenting responsibility the yearly adoption rate increases by a factor of 1.22 (22%), controlling for marital status

Page 47: Count Variables

IRR & Percentage Change: Marital Status

Exp(B) = IRR = 1.842
% change = 100(1.842 - 1) = 84%

Married mothers adopt more children per year, controlling for parenting responsibility

Being married increased the yearly adoption rate by a factor of 1.84 (84%), controlling for parenting responsibility

Page 48: Count Variables

$\ln(\lambda)$ to $\lambda$

$\ln(\text{rate}) = -2.498 + (.198)(X_{\text{zParentRole}}) + (.611)(X_{\text{Married}})$

e.g., mean parenting responsibility (z = 0) and unmarried mothers:
  • $\ln(\text{rate}) = -2.498 + (.198)(0) + (.611)(0) = -2.498$
  • $\text{rate} = e^{-2.498} = .08$
  • .08 children adopted yearly (rate)

Page 49: Count Variables

Figure

Married & zParentRole.xls

Page 50: Count Variables

[Figure: Effect of Standardized Parenting Responsibility and Marital Status on Adoption Rate. x-axis: Standardized Parenting Responsibility; y-axis: Adoption Rate]

z:          -3    -2    -1     0     1     2     3
Unmarried: 0.05  0.06  0.07  0.08  0.10  0.12  0.15
Married:   0.08  0.10  0.12  0.15  0.18  0.23  0.27
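With two IVs the same inverse-link calculation produces both lines of the figure. The sketch below assumes the reported Poisson coefficients (-2.498, .198, .611); it also prints the married-to-unmarried ratio at each z to illustrate that, without an interaction term, this ratio equals Exp(B) for marital status at every level of parenting responsibility.

```python
import math

# Reported Poisson coefficients for the two-IV model with offset.
INTERCEPT, B_Z, B_MARRIED = -2.498, 0.198, 0.611

def yearly_rate(z, married):
    """Predicted yearly adoption rate for a z-score and marital status (0/1)."""
    return math.exp(INTERCEPT + B_Z * z + B_MARRIED * married)

for z in range(-3, 4):
    unmarried, married = yearly_rate(z, 0), yearly_rate(z, 1)
    # The married/unmarried ratio is exp(B_MARRIED) at every value of z.
    print(z, round(unmarried, 2), round(married, 2), round(married / unmarried, 3))
```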

Page 51: Count Variables

Question & Answer

Do foster mothers who take more responsibility for parenting adopt more foster children per year, controlling for marital status?

Yes. For every one-standard-deviation increase in parenting responsibility the yearly adoption rate increases by a factor of 1.22 (22%), controlling for marital status.

Cont’d

Page 52: Count Variables

Question & Answer

Do foster mothers who take more responsibility for parenting adopt more foster children per year, controlling for marital status?

For unmarried mothers the yearly adoption rate is .06 for mothers two standard deviations below the mean, .08 for mothers with the mean, and .12 for mothers two standard deviations above the mean.

For married mothers the yearly adoption rate is .10 for mothers two standard deviations below the mean, .15 for mothers with the mean, and .23 for mothers two standard deviations above the mean.

Page 53: Count Variables

Assumptions Necessary for Testing Hypotheses

Equidispersion: variance equals the mean
  • Underdispersion: variance less than the mean
  • Overdispersion: variance larger than the mean
      • Typical
      • Overdispersion may lead us to believe that IVs are statistically significant when in fact they are not
      • Overdispersion can result from outliers and exclusion of relevant IVs, interaction terms, curvilinear terms and numerous other factors
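A first, rough look at dispersion is to compare the sample variance of the counts with their mean. The sketch below uses made-up counts and an illustrative helper name. It is only a crude unconditional check: the equidispersion assumption concerns the variance around the mean predicted from the IVs, so a formal judgment should come from the fitted model.

```python
from statistics import mean, variance

# Hypothetical counts (e.g., children adopted), not the foster-mother data.
counts = [0, 0, 0, 0, 1, 0, 2, 0, 0, 5, 1, 0, 0, 3, 0, 0, 1, 0, 0, 7]

def describe_dispersion(mean_value, variance_value, tol=1e-9):
    """Label the variance-to-mean relationship."""
    if abs(variance_value - mean_value) <= tol:
        return "equidispersion"
    return "overdispersion" if variance_value > mean_value else "underdispersion"

m = mean(counts)
v = variance(counts)  # sample variance (n - 1 in the denominator)
label = describe_dispersion(m, v)
print(m, round(v, 3), round(v / m, 3), label)
```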

Page 54: Count Variables

Assumptions Necessary for Testing Hypotheses (cont’d)

Assumptions discussed in GZLM lecture

See below concerning underdispersion, zero-inflation, censoring, truncation

Page 55: Count Variables

Negative Binomial Regression

Page 56: Count Variables

Negative Binomial Regression

Extension of Poisson regression

Allows overdispersion (but not underdispersion)

Standard method used to model overdispersed Poisson data

Given that overdispersion is the norm, the negative binomial model has more generality than the Poisson model

Page 57: Count Variables

Multiple IV Example

DV = number of foster children adopted
IV = perceived responsibility for parenting (scale scores transformed to z-scores)
IV = marital status, 0 = unmarried, 1 = married
Offset variable = log of years fostered
N = 285 foster mothers

Do foster mothers who take more responsibility for parenting adopt more foster children per year, controlling for marital status?

Page 58: Count Variables

Test for Overdispersion

Estimate a negative binomial regression

Negative binomial regression adds an ancillary parameter that allows overdispersion (but not underdispersion)

Page 59: Count Variables

Test for Overdispersion (cont’d)

Ancillary parameter directly related to amount of overdispersion

If data are not overdispersed ancillary parameter equals 0
  • Poisson regression is a negative binomial regression with an ancillary parameter of 0
  • Larger values indicate more overdispersion
      • Values typically range from 0 to about 4
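How the ancillary parameter relates to overdispersion can be seen from the variance function. The sketch below assumes the common NB2 parameterization, in which the variance is the mean plus the ancillary parameter times the squared mean; the slides do not state which parameterization the software uses, so treat that form, the function name, and the example mean as assumptions.

```python
def nb2_variance(mu, alpha):
    """Variance under the assumed NB2 form: mu + alpha * mu**2."""
    if alpha < 0:
        raise ValueError("alpha must be >= 0; underdispersion is not allowed")
    return mu + alpha * mu ** 2

mu = 1.5  # hypothetical mean count
for alpha in (0.0, 0.5, 1.0, 4.0):
    # alpha = 0 reproduces the Poisson case (variance equals the mean).
    print(alpha, nb2_variance(mu, alpha))
```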

Page 60: Count Variables

Test for Overdispersion (cont’d)

Table 5.14
  • Test of the null hypothesis that ancillary parameter equals 0
  • Rejection of this null hypothesis indicates overdispersion
      • p = .029 for alternative hypothesis that ancillary parameter > 0, so reject

Negative binomial regression used when data are overdispersed

Page 61: Count Variables

Statistical Significance

Table 5.15
  • Relationship between set of IVs and yearly adoption rate is statistically significant ($\chi^2$ = 8.68, p = .013)
  • $H_0$: $\beta_1 = \beta_2 = \dots = \beta_k = 0$, same as $H_0$: $\text{IRR}_1 = \text{IRR}_2 = \dots = \text{IRR}_k = 1$

Page 62: Count Variables

Statistical Significance

Table 5.16
  • Relationship between parenting responsibility and yearly adoption rate is statistically significant, controlling for marital status ($\chi^2$ = 4.854, p = .028)
  • Relationship between marital status and yearly adoption rate is statistically significant, controlling for parenting responsibility ($\chi^2$ = 4.710, p = .030)

Page 63: Count Variables

Statistical Significance

Table 5.17
  • Relationship between parenting responsibility and yearly adoption rate is statistically significant, controlling for marital status (Wald $\chi^2$ = 4.917, p = .027)
  • Relationship between marital status and yearly adoption rate is statistically significant, controlling for parenting responsibility (Wald $\chi^2$ = 4.845, p = .028)

Page 64: Count Variables

IRR & Percentage Change: Parenting Responsibility

Exp(B) = IRR = 1.254
% change = 100(1.254 - 1) = 25%

Mothers with greater parenting responsibility adopt more children per year, controlling for marital status

For every one-standard-deviation increase in parenting responsibility the yearly adoption rate increases by a factor of 1.25 (25%), controlling for marital status

Page 65: Count Variables

IRR & Percentage Change: Marital Status

Exp(B) = IRR = 1.760
% change = 100(1.760 - 1) = 76%

Married mothers adopt more children per year, controlling for parenting responsibility

Being married increased the yearly adoption rate by a factor of 1.76 (76%), controlling for parenting responsibility

Page 66: Count Variables

$\ln(\lambda)$ to $\lambda$

$\ln(\text{rate}) = -2.256 + (.227)(X_{\text{zParentRole}}) + (.565)(X_{\text{Married}})$

e.g., mean parenting responsibility (z = 0) and unmarried mothers:
  • $\ln(\text{rate}) = -2.256 + (.227)(0) + (.565)(0) = -2.256$
  • $\text{rate} = e^{-2.256} = .10$
  • .10 children adopted yearly (rate)

Page 67: Count Variables

Figure

(NB) Married & zParentRole.xls

Page 68: Count Variables

[Figure: Effect of Standardized Parenting Responsibility & Marital Status on Adoption Rate. x-axis: Standardized Parenting Responsibility; y-axis: Adoption Rate]

z:          -3    -2    -1     0     1     2     3
Unmarried: 0.05  0.07  0.08  0.10  0.13  0.16  0.21
Married:   0.09  0.12  0.15  0.18  0.23  0.29  0.36

Page 69: Count Variables

Question & Answer

Do foster mothers who take more responsibility for parenting adopt more foster children per year, controlling for marital status?

Yes. For every one-standard-deviation increase in parenting responsibility the yearly adoption rate increases by a factor of 1.25 (25%), controlling for marital status.

Cont’d

Page 70: Count Variables

Question & Answer

Do foster mothers who take more responsibility for parenting adopt more foster children per year, controlling for marital status?

For unmarried mothers the yearly adoption rate is .07 for mothers two standard deviations below the mean, .10 for mothers with the mean, and .16 for mothers two standard deviations above the mean.

For married mothers the yearly adoption rate is .12 for mothers two standard deviations below the mean, .18 for mothers with the mean, and .29 for mothers two standard deviations above the mean.

Page 71: Count Variables

Assumptions Necessary for Testing Hypotheses

Assumptions discussed in GZLM lecture

See below concerning underdispersion, zero-inflation, censoring, truncation

Page 72: Count Variables

Model Evaluation

Index plots
  • Leverage values
  • Standardized or unstandardized deviance residuals
  • Cook’s D

Graph and compare observed and estimated counts

Page 73: Count Variables

Analogs of $R^2$

None in standard use and each may give different results

Typically much smaller than $R^2$ values in linear regression

Difficult to interpret

Page 74: Count Variables

Multicollinearity

SPSS GZLM doesn’t compute multicollinearity statistics

Use SPSS linear regression

Problematic levels
  • Tolerance < .10 or VIF > 10
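Tolerance and VIF can be computed by hand when there are only two IVs, because the R² from regressing one IV on the other equals their squared correlation. The sketch below uses made-up IV values and illustrative function names; with more than two IVs, each IV would have to be regressed on all the others.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def tolerance_and_vif(x, y):
    """Two-IV case: tolerance = 1 - r^2 and VIF = 1 / tolerance."""
    tolerance = 1 - pearson_r(x, y) ** 2
    return tolerance, 1 / tolerance

# Hypothetical IV values (standardized scale score and a 0/1 indicator).
z_parent_role = [-1.2, -0.5, 0.0, 0.3, 0.8, 1.1, -0.9, 0.4]
married = [0, 0, 1, 0, 1, 1, 0, 1]

tol, vif = tolerance_and_vif(z_parent_role, married)
problematic = tol < 0.10 or vif > 10
print(round(tol, 3), round(vif, 3), problematic)
```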

Page 75: Count Variables

Additional Topics

Polytomous IVs

Curvilinear relationships

Interactions

Page 76: Count Variables

Overview of the Process

Select IVs and decide whether to test curvilinear relationships or interactions

Carefully screen and clean data

Transform and code variables as needed

Estimate regression model

Examine assumptions necessary to estimate Poisson or negative binomial regression model, examine model fit, and revise model as needed

Page 77: Count Variables

Overview of the Process (cont’d)

Test hypotheses about the overall model and specific model parameters, such as IRRs

Create tables and graphs to present results in the most meaningful and parsimonious way

Interpret results of the estimated model in terms of rates and IRRs, as appropriate

Page 78: Count Variables

Additional Regression Models for Count DVs

Generalized Poisson model
  • Data are under- or overdispersed

Poisson and negative binomial models for truncated samples
  • Truncation occurs when cases from the population of interest are excluded based on characteristics of the DV
      • e.g., cases with zero counts are excluded (e.g., only mothers who adopted one or more children are included in the sample)
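For the zero-count example, truncation changes the probability of every remaining count. The sketch below shows the standard zero-truncated Poisson probabilities, which rescale the ordinary Poisson probabilities by the chance of a nonzero count; the mean used is a made-up value and the function names are illustrative.

```python
import math

def poisson_pmf(y, mu):
    """Ordinary Poisson probability of observing count y."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def zero_truncated_poisson_pmf(y, mu):
    """P(Y = y | Y >= 1): Poisson probability rescaled by P(Y >= 1)."""
    if y < 1:
        return 0.0  # zero counts cannot appear in the truncated sample
    return poisson_pmf(y, mu) / (1 - math.exp(-mu))

mu = 1.1  # hypothetical mean of the untruncated distribution
total = sum(zero_truncated_poisson_pmf(y, mu) for y in range(1, 60))
print(round(zero_truncated_poisson_pmf(1, mu), 3), round(total, 6))
```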

Page 79: Count Variables

Additional Regression Models for Count DVs (cont’d)

Zero-inflated Poisson and negative binomial models and Hurdle models
  • Mix of two processes in the count variable, one that generates only zero counts, and another that generates both zero and positive counts
      • e.g., Some parents might not adopt because they are not interested in adopting (a process that generates only zero counts), and some parents might want to adopt but have not had the opportunity (a process that generates both zero and positive counts)
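The two-process idea can be made concrete with the zero-inflated Poisson probabilities. In this sketch `pi_zero` (the share of cases in the "only zeros" process) and `mu` are made-up values and the function names are illustrative. It covers the zero-inflated Poisson form only; hurdle and negative binomial versions are built differently.

```python
import math

def poisson_pmf(y, mu):
    """Ordinary Poisson probability of observing count y."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def zip_pmf(y, mu, pi_zero):
    """Zero-inflated Poisson: a point mass at 0 mixed with a Poisson."""
    # pi_zero: probability of belonging to the process that yields only zeros.
    count_part = (1 - pi_zero) * poisson_pmf(y, mu)
    return pi_zero + count_part if y == 0 else count_part

mu, pi_zero = 1.1, 0.3  # hypothetical values
total = sum(zip_pmf(y, mu, pi_zero) for y in range(60))
print(round(zip_pmf(0, mu, pi_zero), 3), round(poisson_pmf(0, mu), 3))
```

The probability of a zero exceeds the plain Poisson value because zeros arise from both processes.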

Page 80: Count Variables

Additional Regression Models for Count DVs (cont’d)

Poisson and negative binomial models for censored DVs
  • Censored variables are variables whose values are known over some range, but unknown beyond a certain value because they were recorded or collected only up (or down) to a certain value
      • e.g., Number of contacts between foster children and their biological parents measured as 0, 1, 2, or 3 or more per month are censored from above
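For the "3 or more" example, a censored Poisson model treats the top category as the probability of every count at or above the cap. A minimal sketch, with a made-up mean and illustrative function names:

```python
import math

def poisson_pmf(y, mu):
    """Ordinary Poisson probability of observing count y."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def censored_poisson_prob(recorded, mu, cap=3):
    """Probability of a recorded value when counts >= cap are lumped together."""
    if recorded < 0 or recorded > cap:
        raise ValueError("recorded value must be between 0 and cap")
    if recorded < cap:
        return poisson_pmf(recorded, mu)
    # "cap or more": 1 minus the probability of every smaller count.
    return 1 - sum(poisson_pmf(y, mu) for y in range(cap))

mu = 2.0  # hypothetical mean number of contacts per month
probs = [censored_poisson_prob(y, mu) for y in range(4)]
print([round(p, 3) for p in probs])
```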