Motivation Ordered Outcomes Nominal Outcomes Event Counts Conclusion

POLS/PUAD6080: Statistical Regression with Other Categorical Outcomes

David A. Hughes, Ph.D.

Auburn University at Montgomery

April 20, 2020


Overview

1 Motivation

2 Ordered Outcomes

3 Nominal Outcomes

4 Event Counts

5 Conclusion


Categorical data and regression analysis

• Last week, we examined binary dependent variables and discussed why the classical linear regression model (CLRM) was inappropriate.

• This week, we extend our analysis to other types of categorical dependent variables.

• These include ordered, nominal, and event-count outcomes.

• What are these, and how could we think about regressions given their levels of measurement?


Ordered Logit/Probit

• Start with a latent variable such that:

$$Y_i^* = \mu + u_i.$$

• Similar to how we motivated the binary response model, suppose that:

$$Y_i = j \quad \text{if} \quad \tau_{j-1} \le Y_i^* < \tau_j, \qquad j \in \{1, \ldots, J\},$$

with $\tau_0 = -\infty$ and $\tau_J = +\infty$.

• Therefore, Y has J ordered outcome categories and J − 1 estimable “cutpoints” (τ).


Estimating the ordered logit/probit

• We can express the probability of any particular discrete outcome on Y as:

$$\Pr(Y_i = j) = \Pr(\tau_{j-1} \le Y_i^* < \tau_j) = \Pr(\tau_{j-1} \le \mu + u_i < \tau_j).$$

• With minimal assumptions, we can substitute $X_i\beta$ for $\mu$:

$$\mu_i = X_i\beta.$$


Estimating the ordered logit/probit (cont’d.)

• We can rewrite the above equations as:

$$
\begin{aligned}
\Pr(Y_i = j \mid X, \beta) &= \Pr(\tau_{j-1} \le Y_i^* < \tau_j \mid X) \\
&= \Pr(\tau_{j-1} \le X_i\beta + u_i < \tau_j) \\
&= \Pr(\tau_{j-1} - X_i\beta \le u_i < \tau_j - X_i\beta) \\
&= \int_{-\infty}^{\tau_j - X_i\beta} f(u_i)\,du_i - \int_{-\infty}^{\tau_{j-1} - X_i\beta} f(u_i)\,du_i \\
&= F(\tau_j - X_i\beta) - F(\tau_{j-1} - X_i\beta),
\end{aligned}
$$

where f is the density for u, and F is the corresponding cdf.

• The intuition is that we “cut” the density at different points, and the probability of a given observation receiving the value of Y associated with a given interval is simply the area under the density curve between those points.


Estimating the ordered logit/probit (cont’d.)

• If we make certain distributional assumptions about the error terms, then we're well on our way to forming a likelihood. We're either going to assume that errors are distributed according to the logistic or the standard normal distribution.

• Proceeding with the standard normal, and assuming we have a three-outcome DV:

$$
\begin{aligned}
\Pr(Y_i = 1) &= \Phi(\tau_1 - X_i\beta) - 0 \\
\Pr(Y_i = 2) &= \Phi(\tau_2 - X_i\beta) - \Phi(\tau_1 - X_i\beta) \\
\Pr(Y_i = 3) &= 1 - \Phi(\tau_2 - X_i\beta).
\end{aligned}
$$
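
The three probabilities above can be sketched numerically. This is a minimal illustration, not part of the original slides: the function names and the values for the linear predictor and cutpoints are hypothetical, and the normal cdf is built from the standard library's error function.

```python
import math

def norm_cdf(z):
    """Standard normal cdf, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probs(xb, cutpoints, cdf=norm_cdf):
    """Return Pr(Y = j), j = 1..J, given X_i*beta and J-1 sorted cutpoints."""
    # Pad with F(tau_0 - xb) = 0 and F(tau_J - xb) = 1 so that every
    # category probability is a difference of adjacent cdf values.
    cum = [0.0] + [cdf(tau - xb) for tau in cutpoints] + [1.0]
    return [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical values: X_i*beta = 0.3, tau_1 = -0.5, tau_2 = 1.0
probs = ordered_probs(0.3, [-0.5, 1.0])
print(probs, sum(probs))
```

Passing a logistic cdf as the `cdf` argument would give the ordered logit version of the same calculation.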


What about the intercept?

• We often think of the τs in ordered models as being a seriesof “intercepts.”

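• In the binary model, the intercept determines Pr(Y = 1 | X_i = 0), that is, the baseline probability of being in either category of Y.

• An identification problem occurs if we try to estimate the intercept and all J − 1 cutpoints: adding the same constant to the intercept and to every cutpoint leaves each outcome probability unchanged. Stata resolves this by fixing the intercept at zero and estimating all of the cutpoints.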
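
The identification problem can be checked numerically. The sketch below uses hypothetical parameter values and a hypothetical helper name; it shows that shifting the intercept and every cutpoint by the same constant leaves the ordered probit probabilities unchanged, so the data cannot tell the two parameter sets apart.

```python
import math

def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probs(intercept, xb, cutpoints):
    """Ordered probit probabilities with an explicit intercept term."""
    cum = [0.0] + [norm_cdf(tau - intercept - xb) for tau in cutpoints] + [1.0]
    return [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical parameters: intercept 0 with cutpoints (-0.5, 1.0) ...
base = ordered_probs(0.0, 0.3, [-0.5, 1.0])
# ... versus intercept 2 with both cutpoints also shifted up by 2.
shifted = ordered_probs(2.0, 0.3, [1.5, 3.0])

print(base)
print(shifted)
```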


Estimating Ordered Logit/Probit in Stata

• Estimating an ordered logit (ologit) or an ordered probit (oprobit) involves basically the same syntax in Stata, so I will only present the syntax for the ordered probit.

• Basic syntax:

• oprobit depvar [indepvars] [if] [in] [weight] [, options]

• The if and in qualifiers and the weight specification work the same as with the standard logit and probit models.

• The options are very similar as well, with options for estimating the standard errors being the most commonly used (type help oprobit for a full list of options).


Example Output

. ologit afterlife age attendance partyid protestant catholic jewish

Iteration 0:   log likelihood =  -1176.262
Iteration 1:   log likelihood = -1080.5213
Iteration 2:   log likelihood = -1078.1519
Iteration 3:   log likelihood = -1078.1468
Iteration 4:   log likelihood = -1078.1468

Ordered logistic regression                     Number of obs     =      1,068
                                                LR chi2(6)        =     196.23
                                                Prob > chi2       =     0.0000
Log likelihood = -1078.1468                     Pseudo R2         =     0.0834

------------------------------------------------------------------------------
   afterlife |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0047425   .0036268     1.31   0.191    -.0023659     .011851
  attendance |   -.243044   .0267099    -9.10   0.000    -.2953945   -.1906935
     partyid |  -.0630709   .0332457    -1.90   0.058    -.1282312    .0020895
  protestant |  -.8593461   .1643058    -5.23   0.000     -1.18138   -.5373126
    catholic |  -.5447963   .1764368    -3.09   0.002     -.890606   -.1989865
      jewish |   .2911736    .458702     0.63   0.526    -.6078657    1.190213
-------------+----------------------------------------------------------------
       /cut1 |  -.8667024   .2075344                     -1.273462   -.4599425
       /cut2 |   .3661047   .2063156                     -.0382665    .7704759
       /cut3 |   1.408207   .2170333                      .9828294    1.833584
------------------------------------------------------------------------------
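
To see how the coefficients and cutpoints in this output translate into probabilities, the sketch below plugs them into the ordered logit formula. The coefficient and cutpoint values are copied from the output above; the respondent profile and the helper names are hypothetical, and the coding of each covariate is an assumption, so the printed numbers are illustrative only.

```python
import math

def logistic_cdf(z):
    """Cdf of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))

# Estimates copied from the ologit output above.
COEFS = {
    "age": 0.0047425, "attendance": -0.243044, "partyid": -0.0630709,
    "protestant": -0.8593461, "catholic": -0.5447963, "jewish": 0.2911736,
}
CUTS = [-0.8667024, 0.3661047, 1.408207]

def predicted_probs(x):
    """Pr(afterlife = j), j = 1..4, for a dict of covariate values."""
    xb = sum(COEFS[name] * x[name] for name in COEFS)
    cum = [0.0] + [logistic_cdf(cut - xb) for cut in CUTS] + [1.0]
    return [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical respondent; the variable codings are assumed, not documented.
respondent = {"age": 45, "attendance": 4, "partyid": 3,
              "protestant": 1, "catholic": 0, "jewish": 0}
probs = predicted_probs(respondent)
print([round(p, 4) for p in probs])
```

In practice Stata's predict and margins commands do this calculation; the point here is only that each probability is a difference of two logistic cdf values.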


Example Output Using Margins

• When compared to logit models, the margins command for ordered models will produce a much larger matrix due to the increased number of outcome categories.

• Even in a simple case, with 4 outcome categories and a single binary explanatory variable, the margins results can be somewhat confusing to interpret.

• The variables in the following examples are:

• A four-category variable on belief in an afterlife (firm belief to none)

• A binary measure of whether a respondent is Protestant (1 if yes, 0 else)

• A 7-point scale of a respondent's party ID (strong D to strong R)


Example Output Using Margins

. margins protestant

Predictive margins                              Number of obs     =      1,073
Model VCE    : OIM

1._predict   : Pr(afterlife==1), predict(pr outcome(1))
2._predict   : Pr(afterlife==2), predict(pr outcome(2))
3._predict   : Pr(afterlife==3), predict(pr outcome(3))
4._predict   : Pr(afterlife==4), predict(pr outcome(4))

-------------------------------------------------------------------------------------
                    |            Delta-method
                    |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
_predict#protestant |
               1 0  |    .475419   .0205397    23.15   0.000     .4351618    .5156761
               1 1  |   .7031778   .0198939    35.35   0.000     .6641865     .742169
               2 0  |   .2573557   .0150527    17.10   0.000     .2278529    .2868585
               2 1  |   .1746949   .0126143    13.85   0.000     .1499714    .1994185
               3 0  |   .1457501   .0129964    11.21   0.000     .1202776    .1712225
               3 1  |   .0721371   .0080107     9.01   0.000     .0564365    .0878377
               4 0  |   .1214752   .0123616     9.83   0.000     .0972469    .1457036
               4 1  |   .0499902   .0065168     7.67   0.000     .0372174     .062763
-------------------------------------------------------------------------------------
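
One way to read this table is to regroup the margins by the value of protestant. The sketch below copies the eight margins from the output above (the dictionary layout is just a convenience) and confirms that the four outcome probabilities sum to one within each group; it also computes the Protestant minus non-Protestant difference for each outcome.

```python
# Predicted probabilities copied from the margins output above,
# ordered by outcome category 1..4 within each value of protestant.
margins = {
    0: [0.475419, 0.2573557, 0.1457501, 0.1214752],   # non-Protestants
    1: [0.7031778, 0.1746949, 0.0721371, 0.0499902],  # Protestants
}

# Within each group the four category probabilities should sum to one.
totals = {group: sum(p) for group, p in margins.items()}

# Difference in predicted probability (Protestant minus non-Protestant).
diffs = [p1 - p0 for p0, p1 in zip(margins[0], margins[1])]

print(totals)
print([round(d, 4) for d in diffs])
```

The first difference is about 0.23, meaning Protestants are roughly 23 percentage points more likely to fall in category 1, and the gain is offset by lower probabilities in the other three categories.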


Example Output Using Margins

• A better approach with continuous and quasi-continuous variables may be to evaluate predicted probabilities from the minimum to maximum values, where theoretically appropriate.

[Figure: predicted probability of each response to “Is there an afterlife?” (Yes, Definitely; Yes, Probably; No, Probably Not; No, Definitely Not) for Non-Protestants and Protestants. Vertical axis: Probability, 0 to .8.]


Example output using margins

[Figure: predicted probability of each response to “Is there an afterlife?” (Yes, Definitely; Yes, Probably; No, Probably Not; No, Definitely Not) by party identification: Strong Democrat, Moderate Democrat, Leans Democrat, Independent, Leans Republican, Moderate Republican, Strong Republican. Vertical axis: Probability, 0 to .8.]


Example output using margins

[Figure: predicted probability of each response to “Is there an afterlife?” (Yes, Definitely; Yes, Probably; No, Probably Not; No, Definitely Not) by attendance: Never, < Once a Year, Once a Year, Several Times a Year, Once a Month, 2-3 Times a Month, Nearly Every Week, Every Week, > Once a Week. Vertical axis: Probability, 0 to 1.]


Motivating the model

• Consider a set of N individuals, i ∈ {1, 2, . . . , N}, with dependent variable $Y_i$ that takes on J unordered responses.

• Let $\Pr(Y_i = j) = P_{ij}$ and note that $\sum_{j=1}^{J} P_{ij} = 1$. That is, every i must choose exactly one of the J alternatives.

• Naturally, we want to allow $P_{ij}$ to vary as a function of some k independent variable(s), $X_i$, weighted by a k × 1 vector of parameters specific to that outcome, $\beta_j$.


Motivating the model (cont’d.)

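• As before, we'll make use of the exponential function:

$$P_{ij} = \exp(X_i\beta_j).$$

• However, $\sum_{j=1}^{J} \exp(X_i\beta_j) \ne 1$ in general, and the probabilities must sum to one. Therefore, we rescale each $P_{ij}$ by dividing it by the sum over all J outcomes:

$$\Pr(Y_i = j) \equiv P_{ij} = \frac{\exp(X_i\beta_j)}{\sum_{m=1}^{J} \exp(X_i\beta_m)}. \tag{1}$$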


Motivating the model (cont’d.)

• Equation (1) expresses what we will term the multinomial logit (MNL).

• Unfortunately, as specified, Equation (1) is unidentified.

• That is, there is an infinite set of $\beta_j$s that will render identical sets of probabilities.

• This problem is similar to what we encountered with the ordinal logit/probit vis-à-vis the constant term.
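
The lack of identification can be demonstrated with a few lines of arithmetic. In the sketch below, all values and names are hypothetical: adding the same vector c to every outcome's coefficient vector multiplies the numerator and denominator of the MNL probability by the same factor, so the probabilities do not change.

```python
import math

def mnl_probs(x, betas):
    """Multinomial logit probabilities: exp(x*b_j) / sum_m exp(x*b_m)."""
    scores = [math.exp(sum(xk * bk for xk, bk in zip(x, b))) for b in betas]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical covariates (constant plus one regressor) and J = 3 outcomes.
x = [1.0, 0.5]
betas = [[0.2, 0.1], [-1.0, 0.4], [0.7, -0.3]]

# Add the same arbitrary vector c to every outcome's coefficients.
c = [3.0, -2.0]
shifted = [[bk + ck for bk, ck in zip(b, c)] for b in betas]

p_original = mnl_probs(x, betas)
p_shifted = mnl_probs(x, shifted)
print(p_original)
print(p_shifted)
```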


Motivating the model (cont’d.)

• To address the identification problem, we constrain the parameters for one of the J outcomes to zero, making that category the baseline for comparison to the other outcomes.

• If we make the first category the baseline ($\beta_1 = 0$), then Equation (1) changes as such:

$$\Pr(Y_i = 1) = \frac{1}{1 + \sum_{m=2}^{J} \exp(X_i\beta'_m)},$$

where $\beta'_j = \beta_j - \beta_1$ represents the rescaled influence of the various Xs on a given outcome, relative to $\Pr(Y_i = 1)$.

• We express the probability of each of the other J − 1 alternatives as:

$$\Pr(Y_i = j) = \frac{\exp(X_i\beta'_j)}{1 + \sum_{m=2}^{J} \exp(X_i\beta'_m)}, \qquad j = 2, \ldots, J.$$
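
The baseline normalization can be verified against the unnormalized formula. The sketch below uses hypothetical coefficients and function names; it subtracts the first outcome's coefficients from the others and shows that the baseline-category expressions return the same probabilities as Equation (1).

```python
import math

def dot(x, b):
    return sum(xk * bk for xk, bk in zip(x, b))

def full_probs(x, betas):
    """Equation (1): one coefficient vector per outcome, no constraint."""
    scores = [math.exp(dot(x, b)) for b in betas]
    total = sum(scores)
    return [s / total for s in scores]

def baseline_probs(x, betas_prime):
    """Baseline form: betas_prime holds beta'_2..beta'_J, and beta'_1 = 0."""
    scores = [math.exp(dot(x, b)) for b in betas_prime]
    denom = 1.0 + sum(scores)
    return [1.0 / denom] + [s / denom for s in scores]

# Hypothetical values for illustration.
x = [1.0, 0.5]
betas = [[0.2, 0.1], [-1.0, 0.4], [0.7, -0.3]]

# Rescale relative to outcome 1: beta'_j = beta_j - beta_1.
betas_prime = [[bk - b1 for bk, b1 in zip(b, betas[0])] for b in betas[1:]]

full = full_probs(x, betas)
normalized = baseline_probs(x, betas_prime)
print(full)
print(normalized)
```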


Interpreting MLE coefficients for the MNL

• MLE yields a separate β for each of the alternatives (except the baseline, which is omitted because its parameters are set to zero).

• The coefficient on a given covariate reflects the change in the log odds of a given outcome relative to the omitted category, since $\ln(P_{ij}/P_{i1}) = X_i\beta'_j$ when category 1 is the baseline.


An example: Judicial departures

• Let's look at data on the choices elected judges make over their employment.

• In a given year, a judge can (1) stay on the bench, (2) retire for good, or (3) retire for some other work.

• For now, we'll just consider the lone covariate, “Election Year,” which is a dummy variable for whether the judge is currently up for election.


MNL output for judicial departures

. mlogit risk i.elecyr, base(0)

Multinomial logistic regression                 Number of obs     =      1,992
                                                LR chi2(2)        =      93.92
                                                Prob > chi2       =     0.0000
Log likelihood = -543.69618                     Pseudo R2         =     0.0795

------------------------------------------------------------------------------
        risk |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
0            |  (base outcome)
-------------+----------------------------------------------------------------
1            |
    1.elecyr |  -15.41396   469.3825    -0.03   0.974    -935.3867    904.5587
       _cons |  -2.605912   .0970537   -26.85   0.000    -2.796133    -2.41569
-------------+----------------------------------------------------------------
2            |
    1.elecyr |   2.814694   .4362368     6.45   0.000     1.959686    3.669703
       _cons |  -5.396386   .3788515   -14.24   0.000    -6.138921   -4.653851
------------------------------------------------------------------------------
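
Because the coefficients are on the log-odds scale, it can help to convert them to probabilities. The sketch below copies the four estimates from the output above and applies the baseline-category formulas; the function name is hypothetical. One hedged observation: the very large negative coefficient and standard error for outcome 1 in election years are the pattern one typically sees when an outcome almost never occurs in that cell, so the implied probability is effectively zero.

```python
import math

# Estimates copied from the mlogit output above (outcome 0 is the base).
COEFS = {
    1: {"elecyr": -15.41396, "cons": -2.605912},
    2: {"elecyr": 2.814694, "cons": -5.396386},
}

def predicted_probs(elecyr):
    """Pr(risk = 0, 1, 2) for elecyr = 0 or 1 under the baseline-category MNL."""
    expxb = {j: math.exp(b["cons"] + b["elecyr"] * elecyr)
             for j, b in COEFS.items()}
    denom = 1.0 + sum(expxb.values())
    probs = {0: 1.0 / denom}
    probs.update({j: e / denom for j, e in expxb.items()})
    return probs

for year in (0, 1):
    print(year, {j: round(p, 4) for j, p in predicted_probs(year).items()})
```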


Analyzing the output

• Note a few things:

• First, “0” (staying on the bench) is the omitted category.

• Second, the coefficient on election year is negative on outcome “1” (retire for good) but positive on outcome “2” (seek other employment).

• These coefficients would change if we changed the baseline category.

• Nevertheless, the model fit statistics would not.


Results with a different baseline category

. mlogit risk i.elecyr, base(2)

Multinomial logistic regression                 Number of obs     =      1,992
                                                LR chi2(2)        =      93.92
                                                Prob > chi2       =     0.0000
Log likelihood = -543.69618                     Pseudo R2         =     0.0795

------------------------------------------------------------------------------
        risk |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
0            |
    1.elecyr |  -2.814694   .4362368    -6.45   0.000    -3.669703   -1.959686
       _cons |   5.396386   .3788515    14.24   0.000     4.653851    6.138921
-------------+----------------------------------------------------------------
1            |
    1.elecyr |  -18.22866   469.3826    -0.04   0.969    -938.2017    901.7444
       _cons |   2.790474   .3894259     7.17   0.000     2.027214    3.553735
-------------+----------------------------------------------------------------
2            |  (base outcome)
------------------------------------------------------------------------------
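
The two sets of estimates are the same model expressed against different reference categories. As a check, the sketch below takes the base(0) coefficients reported earlier, subtracts the outcome 2 coefficients from each outcome, and recovers the base(2) estimates shown above up to rounding. The helper name is hypothetical.

```python
# (elecyr, _cons) coefficients from the base(0) output; the base is all zeros.
base0 = {
    0: (0.0, 0.0),
    1: (-15.41396, -2.605912),
    2: (2.814694, -5.396386),
}

def rebase(coefs, new_base):
    """Re-express coefficients relative to a different baseline outcome."""
    ref = coefs[new_base]
    return {j: tuple(b - r for b, r in zip(beta, ref))
            for j, beta in coefs.items()}

base2 = rebase(base0, 2)
for outcome, (b_elecyr, b_cons) in sorted(base2.items()):
    print(outcome, round(b_elecyr, 5), round(b_cons, 6))
```

The log likelihood, LR chi2, and pseudo R2 are identical across the two outputs because the predicted probabilities are unchanged.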


Presenting results from the MNL (cont’d.)

• The margins command in Stata is going to be a friend of ours.

• Returning to the previous example, let's include a few additional covariates (remember the unit of analysis is a judge-year).

• Let's estimate the likelihood of a given retirement strategy given the covariates Election Year, Workload, Vested, and Age, and let's interact Age with Vested.


Model results

Multinomial logistic regression                 Number of obs     =      1,932
                                                LR chi2(10)       =     187.21
                                                Prob > chi2       =     0.0000
Log likelihood = -482.84756                     Pseudo R2         =     0.1624

------------------------------------------------------------------------------
        risk |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
0            |  (base outcome)
-------------+----------------------------------------------------------------
1            |
    1.elecyr |  -15.49575   600.7121    -0.03   0.979     -1192.87    1161.878
    workload |  -.0000559   .0001426    -0.39   0.695    -.0003354    .0002237
    1.vested |  -5.669977   2.338391    -2.42   0.015    -10.25314   -1.086814
         age |   .0403759   .0288213     1.40   0.161    -.0161129    .0968647
             |
vested#c.age |
          1  |   .1031974   .0386273     2.67   0.008     .0274893    .1789055
             |
       _cons |  -5.601525   1.608329    -3.48   0.000    -8.753792   -2.449258
-------------+----------------------------------------------------------------
2            |
    1.elecyr |   2.899413   .4652694     6.23   0.000     1.987501    3.811324
    workload |    .000246   .0002613     0.94   0.346    -.0002661    .0007581
    1.vested |  -14.76489   6.873756    -2.15   0.032     -28.2372   -1.292574
         age |   .0310025   .0325234     0.95   0.340    -.0327422    .0947472
             |
vested#c.age |
          1  |   .2157043   .1047291     2.06   0.039     .0104391    .4209695
             |
       _cons |  -7.402557   1.843319    -4.02   0.000     -11.0154   -3.789718
------------------------------------------------------------------------------


Estimating predicted probabilities

• Let’s look at the interaction effect between age and vested.

• margins vested, at(age=(30(10)80))

• marginsplot


Graphed predicted probabilities

[Figure: Predictive Margins of vested with 95% CIs. Horizontal axis: age, 30 to 80. Vertical axis: Probability, 0 to 1. One line for each combination of vested (0, 1) and outcome (0, 1, 2).]


The Poisson process

• A good way to think about an event count outcome is as a tally of events that occur over time.

• Let λ denote the constant rate at which events occur; the expected number of events in a given period of length h is then λh.

• Then, for a sufficiently short interval, the probability that an event occurs in that interval is approximately λh, and the probability that it does not occur is 1 − λh.


The Poisson process (cont’d.)

• Let $Y_t$ denote the number of events that occur in interval t of length h.

• The probability that the number of events occurring in (t, t + h] equals some value y ∈ {0, 1, 2, 3, . . .} is:

$$\Pr(Y_t = y) = \frac{\exp(-\lambda h)(\lambda h)^y}{y!}. \tag{2}$$

• And if all the intervals are of equal length 1, Equation (2) becomes:

$$\Pr(Y_t = y) = \frac{\exp(-\lambda)\lambda^y}{y!}. \tag{3}$$
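
The Poisson probabilities are simple to compute directly. The sketch below is a minimal illustration with a hypothetical function name and a hypothetical rate of two events per period; setting h = 1 gives the unit-interval case.

```python
import math

def poisson_pmf(y, lam, h=1.0):
    """Pr(Y = y) for a Poisson process with rate lam over an interval of length h."""
    mu = lam * h  # expected number of events in the interval
    return math.exp(-mu) * mu ** y / math.factorial(y)

# Hypothetical rate of 2 events per period.
for y in range(6):
    print(y, round(poisson_pmf(y, 2.0), 4))
```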


The Poisson distribution

• Equations (2) and (3) give the Poisson distribution.

• Critically, we assume that events arrive at a constant rate (λ) and that values of $Y_t$ are independent across draws from the distribution.

• The parameter λ is interpreted as a rate, or the expected number of events during a given period, t. That is, E(Y) = λ.

• As λ increases:

• The mean of the distribution gets bigger (shockingly enough).

• The variance of the distribution gets larger too, and it turns out that E(Y) = Var(Y) = λ.

• The distribution approaches a normal distribution (relevant for deciding between MLE and OLS).
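
The equality of the mean and variance can be confirmed numerically from the probability mass function. The sketch below uses hypothetical rates and helper names, and works on the log scale so that large counts do not overflow.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability computed on the log scale for numerical stability."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def mean_and_variance(lam, y_max=400):
    """Mean and variance of the Poisson, summing the pmf over 0..y_max."""
    probs = [poisson_pmf(y, lam) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# Hypothetical rates: both moments should track lambda.
for lam in (0.5, 4.0, 25.0):
    mean, var = mean_and_variance(lam)
    print(lam, round(mean, 6), round(var, 6))
```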


Example: The Chief Justice’s Year-End Report

• As a running example, consider the number of requests the Chief Justice of the United States makes to Congress in his annual Year-End Report.

• Consider the following covariates: (1) public approval of the Court; (2, 3, 4) the Court's ideological distance to the House, Senate, and President; and (5) the number of years the chief justice has held his position.


Requests for institutional reform

[Figure: histogram of requests for institutional reform. Horizontal axis: Total number of requests, 0 to 15. Vertical axis: Frequency, 0 to 6.]


Poisson in Stata

• The standard framework for Poisson regressions follows what ought now to feel like a pretty familiar pattern:

• poisson y x1 x2 ... xk [if...] [, options]


Poisson results

. poisson requests_total gss_conf house sen pres cjtenurebyte

Iteration 0:   log likelihood = -95.624953
Iteration 1:   log likelihood = -95.432778
Iteration 2:   log likelihood = -95.432711
Iteration 3:   log likelihood = -95.432711

Poisson regression                              Number of obs     =         39
                                                LR chi2(5)        =      21.63
                                                Prob > chi2       =     0.0006
Log likelihood = -95.432711                     Pseudo R2         =     0.1018

--------------------------------------------------------------------------------
requests_total |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
      gss_conf |  -.0157822   .0292324    -0.54   0.589    -.0730767    .0415124
    house_dist |  -2.289797   1.082493    -2.12   0.034    -4.411444   -.1681492
      sen_dist |   .6036195    .852443     0.71   0.479    -1.067138    2.274377
     pres_dist |  -1.951108   .7159464    -2.73   0.006    -3.354337   -.5478791
  cjtenurebyte |   .0442441   .0179891     2.46   0.014     .0089861    .0795022
         _cons |   2.802411   .5752195     4.87   0.000     1.675001     3.92982
--------------------------------------------------------------------------------
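
Poisson coefficients are on the log scale of the expected count, so exponentiating them gives incidence-rate ratios. The sketch below copies the slope estimates from the output above; the dictionary and variable names are only a convenience for the illustration.

```python
import math

# Slope coefficients copied from the Poisson output above.
coefs = {
    "gss_conf": -0.0157822,
    "house_dist": -2.289797,
    "sen_dist": 0.6036195,
    "pres_dist": -1.951108,
    "cjtenurebyte": 0.0442441,
}

# exp(b) is the multiplicative change in the expected count
# for a one-unit increase in the covariate, all else equal.
irr = {name: math.exp(b) for name, b in coefs.items()}
pct_change = {name: 100.0 * (ratio - 1.0) for name, ratio in irr.items()}

for name in coefs:
    print(f"{name:>12}  IRR = {irr[name]:.4f}  ({pct_change[name]:+.1f}%)")
```

For example, each additional year of tenure multiplies the expected number of requests by about 1.045, an increase of roughly 4.5 percent. Stata reports the same ratios when the irr option is added to poisson.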


Using Stata to recover count estimates

• Stata's margins command will once again prove quite useful.

• For example, to estimate the predicted number of requests for institutional reform in a given year by a chief justice's tenure in office across a 15-year period:

margins, at(cjtenurebyte=(0(3)15))

• We can then use the marginsplot command to recover a graphical presentation of this relationship.


Exposure and offsets in Poisson models

• We've been modeling the number of outcomes in a given period so far.

• But what if there never were any opportunities during a given period for a non-zero outcome to occur?

• For example, if I were to model the number of congressional acts the Supreme Court invalidated in a given year, it might be pertinent to know if no congressional acts were even reviewed.


Exposure and offsets in Poisson models (cont’d.)

• We need to account for the exposure term, $M_i$ (the number of opportunities for an event in observation i), and the easiest way to do this is to include $\ln(M_i)$ as an “offset” in the model:

$$\lambda_i = \exp[X_i\beta + \ln(M_i)],$$

which constrains the coefficient on $\ln(M_i)$ to 1.

• In Stata, we use the exposure() option when estimating a Poisson.

• We could also include $\ln(M_i)$ as a covariate and estimate its effect on $E(Y_i)$ directly, examining its coefficient to see how close to one it really is.
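
The offset amounts to modeling a rate per unit of exposure. The sketch below uses a hypothetical linear predictor and hypothetical exposure values to show that exp(Xβ + ln M) equals M times exp(Xβ), so doubling the number of opportunities doubles the expected count.

```python
import math

def expected_count(xb, exposure):
    """lambda_i = exp(X_i*beta + ln(M_i)), with the offset coefficient fixed at 1."""
    return math.exp(xb + math.log(exposure))

# Hypothetical linear predictor and numbers of opportunities.
xb = -1.2
for m in (5, 10, 20):
    lam = expected_count(xb, m)
    # The implied rate per opportunity, lam / m, stays at exp(xb).
    print(m, round(lam, 4), round(lam / m, 4))
```

Note that an observation with zero exposure has no defined offset, since ln(0) is undefined; such observations carry no information about the rate.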


Some problems with Poisson

• We've made strong assumptions in setting up the Poisson model.

• First, we required that the probability of some event occurring is constant within a given period. And second, we required that the probability of some event occurring was independent of other events during the same period.

• But what if these assumptions were violated?


Contagion and dispersion

• Suppose I count the total number of leaves on my hydrangea over four periods: winter, spring, summer, and fall.

• How likely is it that the rate of occurrence (λ) is constant across all four seasons?

• Because observing one leaf increases the likelihood of observing another, we have a “positive contagion.”

• This increases the variance of the observed counts, which is bad mojo when we have assumed that E(Y) = Var(Y) = λ, and it leads to a problem known as “overdispersion.”


Testing for over-dispersion

• We can test whether we have over/under-dispersion in our data.

• The easiest way to do this is by running a negative binomial regression and checking the statistical significance of the dispersion parameter, α.


Addressing overdispersion

• If we have problems with dispersion, it makes sense to just go ahead and model it directly rather than to rely upon inadequate results from a Poisson.

• Dropping the assumption that λ is a constant, we can instead treat it as a random variable:

$$
\begin{aligned}
E(Y_i \mid u_i) \equiv \tilde{\lambda}_i &= \exp(X_i\beta + u_i) \\
&= \exp(X_i\beta)\exp(u_i) \\
&= \lambda_i\nu_i,
\end{aligned} \tag{4}
$$

where $\lambda_i = \exp(X_i\beta)$ and $\nu_i = \exp(u_i)$.

• All that's left now is to specify a distribution for $\nu_i = \exp(u_i)$. We usually use the Gamma distribution.
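
A small simulation illustrates why a random multiplier on the rate produces overdispersion. This is a sketch under stated assumptions: the rate, the dispersion value, the seed, and the function names are hypothetical, ν is drawn from a Gamma distribution with mean 1 and variance α, and the Poisson sampler is Knuth's simple multiplication method, which is adequate only for modest rates.

```python
import math
import random

def draw_poisson(lam, rng):
    """Draw one Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate(lam, alpha, n, seed=0):
    """Sample mean and variance of counts with rate lam * nu, where E(nu) = 1."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # Gamma with shape 1/alpha and scale alpha has mean 1 and variance alpha.
        nu = rng.gammavariate(1.0 / alpha, alpha) if alpha > 0 else 1.0
        draws.append(draw_poisson(lam * nu, rng))
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / (n - 1)
    return mean, var

print("Poisson (alpha = 0):  ", simulate(3.0, 0.0, 20000))
print("Mixture (alpha = 0.5):", simulate(3.0, 0.5, 20000))
```

With α = 0 the sample variance stays close to the mean, while with α = 0.5 the variance is well above the mean even though the mean itself is essentially unchanged.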


The negative binomial distribution

• If $\nu_i$ is assumed to be randomly distributed according to a one-parameter Gamma distribution with mean $E(\nu) = 1$ and variance $\mathrm{Var}(\nu) = \alpha$, then the marginal density of Y is said to be negative binomial:

$$\Pr(Y_i = y \mid \lambda_i, \alpha) = \frac{\Gamma(\alpha^{-1} + y)}{\Gamma(\alpha^{-1})\,\Gamma(y + 1)} \left(\frac{\alpha^{-1}}{\alpha^{-1} + \lambda_i}\right)^{\alpha^{-1}} \left(\frac{\lambda_i}{\lambda_i + \alpha^{-1}}\right)^{y}, \tag{5}$$

where Γ is the gamma function.

• We model $\lambda_i = \exp(X_i\beta)$, which has E(Y) = λ and Var(Y) = λ(1 + αλ), where α > 0.

• Note that as α → 0, the negative binomial reduces to the Poisson.
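
The negative binomial density can be evaluated with the log-gamma function, which makes it easy to confirm its mean, its variance, and its Poisson limit. The values of λ and α below are hypothetical, as are the function names.

```python
import math

def nb_pmf(y, lam, alpha):
    """Negative binomial probability with mean lam and dispersion alpha."""
    r = 1.0 / alpha
    log_p = (math.lgamma(r + y) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + lam))
             + y * math.log(lam / (lam + r)))
    return math.exp(log_p)

def poisson_pmf(y, lam):
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

# Hypothetical parameters: lam = 4 and alpha = 0.25, so Var(Y) = 4 * (1 + 1) = 8.
lam, alpha = 4.0, 0.25
probs = [nb_pmf(y, lam, alpha) for y in range(501)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(round(total, 6), round(mean, 6), round(var, 6))

# With alpha close to zero the two distributions nearly coincide.
print(nb_pmf(3, lam, 1e-6), poisson_pmf(3, lam))
```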


Negative binomial regressions in Stata

• To estimate the negative binomial in Stata, we simply replace poisson with nbreg.

• We really want to pay attention to the estimates of α and ln(α).

• If we find statistically significant evidence that α ≠ 0, then we likely shouldn't be using Poisson, as our data are overdispersed.


Example with data

Negative binomial regression                    Number of obs     =         39
                                                LR chi2(5)        =      12.69
Dispersion     = mean                           Prob > chi2       =     0.0264
Log likelihood = -93.718196                     Pseudo R2         =     0.0634

--------------------------------------------------------------------------------
requests_total |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
      gss_conf |  -.0165893   .0353904    -0.47   0.639    -.0859531    .0527746
    house_dist |  -2.168843   1.319143    -1.64   0.100    -4.754315    .4166289
      sen_dist |   .6064278   1.078424     0.56   0.574    -1.507245      2.7201
     pres_dist |  -1.896643   .8535061    -2.22   0.026    -3.569484   -.2238019
  cjtenurebyte |   .0444888   .0222564     2.00   0.046     .0008671    .0881106
         _cons |    2.76428   .6811576     4.06   0.000     1.429236    4.099324
---------------+----------------------------------------------------------------
      /lnalpha |  -2.384044   .7152737                     -3.785954    -.982133
---------------+----------------------------------------------------------------
         alpha |   .0921771   .0659318                      .0226872    .3745114
--------------------------------------------------------------------------------
LR test of alpha=0: chibar2(01) = 3.43                 Prob >= chibar2 = 0.032
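
The last rows of this output can be reproduced by hand from numbers reported in the two sets of results. The sketch below takes the log likelihoods from the Poisson and negative binomial outputs and the reported ln(α); the variable names are hypothetical. Because α = 0 sits on the boundary of the parameter space, the reference distribution is a 50:50 mixture of a point mass at zero and a chi-squared with one degree of freedom, so the usual tail probability is halved.

```python
import math

ll_poisson = -95.432711   # log likelihood from the Poisson output
ll_nbreg = -93.718196     # log likelihood from the negative binomial output
lnalpha = -2.384044       # reported /lnalpha

# alpha is the exponentiated /lnalpha estimate.
alpha = math.exp(lnalpha)

# Likelihood-ratio statistic comparing the negative binomial to the Poisson.
lr = 2.0 * (ll_nbreg - ll_poisson)

# Chi-squared(1) upper tail is erfc(sqrt(x / 2)); halve it for chibar2(01).
p_value = 0.5 * math.erfc(math.sqrt(lr / 2.0))

print(round(alpha, 7), round(lr, 2), round(p_value, 3))
```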


Assessing the model’s results

• We see from the results that the data appear to be overdispersed. When the chief justice asks for one form of institutional reform, he's likely to request even more.

• Also note what happened to the estimated effect of the CJ's distance to the House. When we didn't account for overdispersion, we found the CJ catering to House preferences, but under the negative binomial, we're unable to reject at the 0.05 level the null that distance to the House is unrelated to the CJ's requests for institutional reform.


Discussion

• In this unit, we analyzed three additional statistical regression models useful for the examination of categorical dependent variables.

• We learned that the ordered logit/probit is appropriate when the DV has a level of measurement that is ordinal.

• The multinomial logit is appropriate when the DV has a level of measurement that is nominal.

• And finally we learned that the Poisson or negative binomial is appropriate when our DV is an event count.
