PUBL0055: Introduction to Quantitative Methods
Lecture 10: Binary Dependent Variable Models
Jack Blumenau and Benjamin Lauderdale
1 / 58
Example: Representation in Parliament
E-petitions and MP support
A crucial question in political science is whether representatives are responsive to their constituents. We will examine this question by looking at signatures to an e-petition, and seeing whether MPs who received lots of signatures were more likely to support the petition in parliamentary debate.
• 𝑌 : MP supported (1) or opposed (0) the petition in debate
• 𝑋1: Number of petition signatures from the MP’s constituency
• 𝑋2: The party of the MP: Conservative (1) or Labour (0)
(To read the excellent paper on which this example is based, click here)
2 / 58
http://www.jackblumenau.com/papers/petitions.pdf
Example: Representation in Parliament
3 / 58
Example: Representation in Parliament
• The dependent variable has only 2 values, coded as 1 and 0
• 𝑌 = 1 if the MP supported the petition in a speech
• 𝑌 = 0 if the MP did not support the petition in a speech
Supported Petition   Frequency   Percent
Supported                   26        51
Opposed                     25        49
Total                       51       100
4 / 58
Binary dependent variables
Binary variables are those with two categories
• 𝑌 = 1 if something is “true”, or occurred
• 𝑌 = 0 if something is “not true”, or did not occur
Examples of binary response variables
• Survey questions: yes/no; agree/disagree
• In politics: vote/do not vote
• In medicine: have/do not have a certain condition
• In education: correct/incorrect; graduate/do not graduate; pass/fail
We have used binary variables as explanatory variables (𝑋) in previous weeks. This week we focus on models for binary dependent variables (𝑌 ).
5 / 58
Why do we need a new regression model?
• A regression for this dependent variable would be useful to describe how explanatory variables predict MP support for the petition
• Why can’t we just run a linear regression?
6 / 58
Outline
The Linear Probability Model
The Binary Logistic Regression Model
Interpretation
Predicted Probabilities
Inference
7 / 58
The Linear Probability Model
Linear Probability Model
The linear regression for binary outcome variables is known as the linear probability model:
Linear Probability Model
𝐸[𝑌 |𝑋1, 𝑋2, ..., 𝑋𝑘] = 𝛼 + 𝛽1𝑋1 + 𝛽2𝑋2 + ... + 𝛽𝑘𝑋𝑘
𝑃𝑟(𝑌 = 1|𝑋1, 𝑋2, ..., 𝑋𝑘) = 𝛼 + 𝛽1𝑋1 + 𝛽2𝑋2 + ... + 𝛽𝑘𝑋𝑘
Advantages:
• We can use a well-known model for a new class of phenomena
• Easy to interpret the marginal effects of 𝑋 variables
Disadvantages:
• The linear model assumes a continuous dependent variable; if the dependent variable is binary, we run into problems.
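A minimal sketch of the problem in R, using simulated data (not the lecture’s petitions data): fitting a binary outcome with lm() works mechanically, but nothing constrains the fitted values to the [0, 1] range.

```r
# Linear probability model via lm() on simulated binary data
set.seed(1)
x <- seq(0, 2500, length.out = 100)
y <- rbinom(100, 1, prob = x / 2500)   # P(Y = 1) rises linearly with x
lpm <- lm(y ~ x)                        # the linear probability model
coef(lpm)                               # slope = marginal effect of x on P(Y = 1)
# Predictions for large enough x exceed 1:
predict(lpm, newdata = data.frame(x = 10000))
```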
8 / 58
Linear Probability Model – Advantages
[Figure: linear probability model fit]
Linear Probability Model – Disadvantages
Problems with Linear Probability Model (I)
Predictions, ̂𝑌, are interpreted as the probability that 𝑌 = 1:

• 𝑃(𝑌 = 1) = ̂𝑌 = 𝛼 + 𝛽1𝑋
• Can be above 1 if 𝑋 is large enough
• Can be below 0 if 𝑋 is small enough
[Figure: supported petition in parliament (0/1) against number of signatures, with linear fit]
Problem: the linear regression can predict probabilities greater than 1 and less than 0.
10 / 58
Linear Probability Model – Disadvantages
Problems with Linear Probability Model (II)
The linear function may not be appropriate

• e.g. Does an additional 1000 signatures have the same effect going from 1000 to 2000 as from 3000 to 4000?
[Figure: probability of petition support against number of signatures, suggesting a non-linear relationship]

Implication: We need a model that will account for these types of non-constant effects.
11 / 58
12 / 58
The Binary Logistic Regression Model
Proportions and probabilities
• When we have a binary 𝑌, we are interested in the proportion of the subjects in the population for whom 𝑌 = 1
• We can also think of this as the probability 𝜋 that a randomly selected member of the population will have the value 𝑌 = 1 rather than 𝑌 = 0

𝜋 = 𝑝(𝑌 = 1)
1 − 𝜋 = 𝑝(𝑌 = 0)

• If 𝜋 = 0, no unit in the population has 𝑌 = 1; if 𝜋 = 1, every unit in the population has 𝑌 = 1.
• We want to model 𝜋, given one or more explanatory variables 𝑋.
13 / 58
Continuous predictor of petition support
Does petition support depend on the number of signatures an MP receives?
[Figure: supported petition in parliament (0/1) against number of signatures]
14 / 58
Binary predictor of petition support
Does petition support depend on the party of the MP?
             Opposed   Supported
Labour             8          20
Conservative      17           6

Table 2: Sample counts

             Opposed   Supported
Labour          0.29        0.71
Conservative    0.74        0.26

Table 3: Sample proportions
15 / 58
Conditional probabilities
• Consider the dummy variable 𝑋 = 1 if an MP is a member of the Conservative Party and 𝑋 = 0 if they are a member of the Labour Party
• We would like to estimate conditional probabilities of supporting the petition separately for these two groups:

̂𝑃 (𝑌 = 1|𝑋 = 0) = 𝜇𝑋=0 = 0.71
̂𝑃 (𝑌 = 1|𝑋 = 1) = 𝜇𝑋=1 = 0.26

• The estimated conditional probability ( ̂𝑃 ) of supporting the petition is higher for Labour MPs than for Conservative MPs
• More generally, we would like to model how the probability 𝜋 = 𝑃(𝑌 = 1) depends on one or more explanatory variables, which might be continuous.
16 / 58
How to model 𝜋?
• Linear regression model: the conditional mean is equal to a linear combination of explanatory variables:

𝐸(𝑌𝑖) = 𝜇𝑖 = 𝛼 + 𝛽1𝑋1𝑖 + ⋯

• Linear probability model: the conditional probability is equal to a linear combination of 𝑋:

𝐸(𝑌𝑖|𝑋𝑖) = 𝑃(𝑌𝑖 = 1|𝑋𝑖) = 𝜋𝑖 = 𝛼 + 𝛽1𝑋1𝑖 + ⋯

• However, we would like a way to make sure 0 ≤ 𝜋𝑖 ≤ 1
• We cannot use a linear model for 𝜋 directly
• Instead, we build a linear model for a transformation of 𝜋
17 / 58
From probabilities to odds
Odds: the ratio of the probabilities of the event and the non-event:
Odds = 𝑃(𝑌 = 1) / (1 − 𝑃(𝑌 = 1)) = 𝜋 / (1 − 𝜋)
• If the probability of supporting the petition is 𝜋 = 0.25…
  • the odds of supporting the petition are 0.25/0.75 = 0.33
  • the odds of not supporting the petition are 0.75/0.25 = 3
• Odds vs. probabilities 𝜋:
  • If odds > 1 → 𝑃(𝑌 = 1) > 𝑃(𝑌 = 0) → 𝜋 > 0.5
  • If odds < 1 → 𝑃(𝑌 = 1) < 𝑃(𝑌 = 0) → 𝜋 < 0.5
  • If odds = 1 → 𝑃(𝑌 = 1) = 𝑃(𝑌 = 0) → 𝜋 = 0.5
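The conversions above can be sketched in a few lines of R, using the slide’s value of 𝜋 = 0.25:

```r
# Converting a probability into odds, and back again
p <- 0.25
odds_support <- p / (1 - p)       # 0.25 / 0.75 = 0.33 (odds of supporting)
odds_oppose  <- (1 - p) / p       # 0.75 / 0.25 = 3    (odds of not supporting)
# Inverting the transformation recovers the probability:
p_recovered <- odds_support / (1 + odds_support)   # back to 0.25
```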
18 / 58
From probabilities to odds
[Figure: probability against odds]
Range of 𝜋 is (0, 1) — Range of odds is (0, ∞)
19 / 58
Conditional odds
Conditional odds: the odds of an event, conditional on another event:
             Opposed   Supported
Labour          0.29        0.71
Conservative    0.74        0.26

Table 4: Sample proportions
• Odds of supporting the petition if you are a Labour MP:

Ôdds𝐿 = ̂𝜋𝐿 / (1 − ̂𝜋𝐿) = 0.71 / (1 − 0.71) = 2.5
• Odds of supporting the petition if you are a Conservative MP:

Ôdds𝐶 = ̂𝜋𝐶 / (1 − ̂𝜋𝐶) = 0.26 / (1 − 0.26) = 0.35
20 / 58
Odds ratios
Odds ratio: the ratio of two conditional odds
• Describes the association between two variables
ÔR𝐿𝐶 = Ôdds𝐿 / Ôdds𝐶 = 2.5 / 0.35 ≈ 7.08

• Ôdds𝐿 is the odds that 𝑌 = 1 for Labour MPs
• Ôdds𝐶 is the odds that 𝑌 = 1 for Conservative MPs

Interpretation:

• The odds of a Labour MP supporting the petition are 7.08 times the odds of a Conservative MP supporting the petition
• This also means that the probability of supporting the petition is higher for Labour MPs than for Conservative MPs
• → being a Labour MP is associated with higher odds (and probability) of supporting the petition
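The conditional odds and the odds ratio can be reproduced directly from the sample counts in Table 2 (Labour: 20 supported, 8 opposed; Conservative: 6 supported, 17 opposed):

```r
# Conditional odds and the odds ratio from the sample counts
odds_lab <- 20 / 8                 # = 2.5
odds_con <- 6 / 17                 # ~ 0.35
or_lc    <- odds_lab / odds_con    # ~ 7.08
round(c(odds_lab, odds_con, or_lc), 2)
```

Note that the 7.08 on the slide comes from the unrounded odds (2.5 and 6/17), not from dividing the rounded values 2.5/0.35.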
21 / 58
Odds ratios
• In our example,
  • 𝑌 = supported the petition (1 = supported, 0 = opposed)
  • 𝑋 = party (1 = Labour, 0 = Conservative)
• The association is described by comparing odds of 𝑌 = 1 for levels of variable 𝑋
  • If odds ratio = 1 → no association between 𝑋 and 𝑌
  • If odds ratio > 1 → positive association between 𝑋 and 𝑌
  • If odds ratio < 1 → negative association between 𝑋 and 𝑌
22 / 58
From odds to log-odds
• Recall that we need to solve the problem that:
  • The linear predictor 𝛼 + 𝛽1𝑋1𝑖 + ⋯ can take values from −∞ to ∞
  • The probability 𝜋𝑖 must be between 0 and 1
• We now have the necessary pieces to solve the problem.
• Turning 𝜋𝑖 into the odds expanded the range to:

0 < 𝜋 / (1 − 𝜋) < +∞

• By taking the logarithm of the odds, the range expands further:

−∞ < log ( 𝜋 / (1 − 𝜋) ) < +∞

• This transformation is known as the logit transformation
23 / 58
From probabilities to log-odds
[Figure: probability against log-odds (the logistic curve)]
Range of 𝜋 is (0, 1) — Range of log-odds is (−∞, ∞)
24 / 58
The Binary Logistic Regression Model
The logistic regression model, also known as the logit model, is a model for the log-odds of an outcome:

• 𝑌 is a binary response variable, with values 0 and 1
• 𝑋1, … , 𝑋𝑘 are 𝑘 explanatory variables of any type
• Observations 𝑌𝑖 are statistically independent of each other
• For each observation 𝑖, the following equation holds:

log(Odds𝑖) = log ( 𝜋𝑖 / (1 − 𝜋𝑖) ) = 𝛼 + 𝛽1𝑋1𝑖 + ⋯ + 𝛽𝑘𝑋𝑘𝑖

where 𝛼 and 𝛽1, … , 𝛽𝑘 are the unknown parameters of the model, to be estimated from data
25 / 58
Model for the probabilities
• Although the model is written first for the log-odds, it also implies a model for the probabilities, 𝜋𝑖:

𝜋𝑖 = exp(𝛼 + 𝛽1𝑋1𝑖 + ⋯ + 𝛽𝑘𝑋𝑘𝑖) / (1 + exp(𝛼 + 𝛽1𝑋1𝑖 + ⋯ + 𝛽𝑘𝑋𝑘𝑖))

• This is always between 0 and 1
• The plots on the next slide give examples of

𝜋 = exp(𝛼 + 𝛽𝑋) / (1 + exp(𝛼 + 𝛽𝑋))

for a simple logistic model with one continuous 𝑋
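This inverse-logit transformation is easy to check in R; the built-in plogis() function computes exactly exp(𝑥)/(1 + exp(𝑥)):

```r
# The inverse logit maps any linear predictor into (0, 1)
inv_logit <- function(x) exp(x) / (1 + exp(x))
eta <- c(-10, -2, 0, 2, 10)          # example linear-predictor values
pi_hat <- inv_logit(eta)
all(pi_hat > 0 & pi_hat < 1)         # strictly between 0 and 1
all.equal(pi_hat, plogis(eta))       # identical to R's built-in
```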
26 / 58
Probabilities from a logistic model
[Figure: fitted logistic curves. Left panel: changing 𝛼 (with 𝛽 = 1), for 𝛼 = 0, 1, −2. Right panel: changing 𝛽 (with 𝛼 = 0), for 𝛽 = 1, 2, 0, −1]
27 / 58
28 / 58
Interpretation
Petition signatures and MP support
Consider again the following variables:
• 𝑌 : MP support for petition (1 = Supported, 0 = Opposed)
• 𝑋1: Number of petition signatures
• 𝑋2: Party of the MP (1 = Conservative, 0 = Labour)
We can estimate the logistic model using the glm function in R (the outcome and data-frame names below are reconstructed from the model output; your objects may be named differently):

logit_model <- glm(supported ~ signatures + party,
                   data = petitions, family = binomial(link = "logit"))
Interpretation of the coefficients
l̂og( 𝜋𝑖 / (1 − 𝜋𝑖) ) = −9.36 + 0.01 ∗ signatures𝑖 − 3.09 ∗ Conservative𝑖

##
## =============================
##                    Model 1
## -----------------------------
## (Intercept)        -9.36 **
##                    (3.09)
## signatures          0.01 ***
##                    (0.00)
## partyConservative  -3.09 *
##                    (1.23)
## -----------------------------
## AIC                29.23
## BIC                35.02
## Log Likelihood    -11.61
## Deviance           23.23
## Num. obs.          51
## =============================
## *** p < 0.001; ** p < 0.01; * p < 0.05
Some aspects of interpretation are straightforward:

• The sign of the coefficients indicates the direction of the associations
• 𝛽signatures > 0 → more signatures increase the probability of MP support
• 𝛽party < 0 → being a Conservative MP decreases the probability of MP support
30 / 58
Interpretation of the coefficients
l̂og( 𝜋𝑖 / (1 − 𝜋𝑖) ) = −9.36 + 0.01 ∗ signatures𝑖 − 3.09 ∗ Conservative𝑖

##
## =============================
##                    Model 1
## -----------------------------
## (Intercept)        -9.36 **
##                    (3.09)
## signatures          0.01 ***
##                    (0.00)
## partyConservative  -3.09 *
##                    (1.23)
## -----------------------------
## AIC                29.23
## BIC                35.02
## Log Likelihood    -11.61
## Deviance           23.23
## Num. obs.          51
## =============================
## *** p < 0.001; ** p < 0.01; * p < 0.05
Some aspects of interpretation are straightforward:

• The significance of the coefficients is still determined by

𝑧 = ̂𝛽 / 𝑆𝐸( ̂𝛽)
30 / 58
Interpretation of the coefficients
l̂og( 𝜋𝑖 / (1 − 𝜋𝑖) ) = −9.36 + 0.01 ∗ signatures𝑖 − 3.09 ∗ Conservative𝑖

##
## =============================
##                    Model 1
## -----------------------------
## (Intercept)        -9.36 **
##                    (3.09)
## signatures          0.01 ***
##                    (0.00)
## partyConservative  -3.09 *
##                    (1.23)
## -----------------------------
## AIC                29.23
## BIC                35.02
## Log Likelihood    -11.61
## Deviance           23.23
## Num. obs.          51
## =============================
## *** p < 0.001; ** p < 0.01; * p < 0.05
It is possible to interpret the coefficients directly…

• → a one-signature increase is associated with an increase of 𝛽signatures = 0.01 in the log-odds of MP support, holding party constant
• → Conservative MPs are associated with 𝛽party = −3.09 lower log-odds of petition support, holding signatures constant
• …but no-one thinks in terms of log-odds!
31 / 58
Interpretation of the coefficients
Instead of interpreting the log-odds ratios, we can convert the ̂𝛽 coefficients into (slightly) more intuitive odds ratios:

• exp( ̂𝛽signatures) = exp(0.01) = 1.01
• exp( ̂𝛽party) = exp(−3.09) = 0.05
In R:
round(exp(coef(logit_model)),2)
## (Intercept)  signatures  partyConservative
##        0.00        1.01               0.05
Where
• coef returns the estimated coefficients
• exp exponentiates the coefficients
• round rounds the results to 2 digits
32 / 58
Interpretation of the coefficients
• exp( ̂𝛽signatures) = 1.01: Controlling for party, an increase of 1 signature multiplies the odds of petition support by 1.01 (i.e. it increases the odds by 1%)
• exp( ̂𝛽party) = 0.05: Controlling for signatures, being a Conservative MP multiplies the odds of petition support by 0.05 (i.e. it decreases the odds by 95%)
33 / 58
Interpretation of the coefficients
• We can directly interpret the coefficients of the binary logistic regression as partial log-odds ratios
• We can exponentiate the coefficients, and then interpret them as partial odds ratios
• But this still requires having to think in terms of odds…
• Instead, we can directly calculate predicted probabilities from the model and communicate these instead
34 / 58
35 / 58
Predicted Probabilities
Calculating predicted probabilities
• The logistic regression gives us an equation for calculating the fitted log-odds that 𝑌 = 1 for a given set of 𝑋 values:

l̂og( 𝜋𝑖 / (1 − 𝜋𝑖) ) = ̂𝛼 + ̂𝛽1𝑋1 + ̂𝛽2𝑋2
• To recover the probability that 𝑌 = 1, we use

̂𝜋𝑖 = exp( ̂𝛼 + ̂𝛽1𝑋1𝑖 + ̂𝛽2𝑋2𝑖) / (1 + exp( ̂𝛼 + ̂𝛽1𝑋1𝑖 + ̂𝛽2𝑋2𝑖))

for selected values of the explanatory variables 𝑋1, … , 𝑋𝑘.¹
• Typically, we will calculate ̂𝜋 for different “profiles” of our 𝑋 variables, where we only change the values of one variable
¹ exp() is the exponential function, the inverse of the log() function
36 / 58
First differences
First differences
A simple way to communicate the effects of our 𝑋 variables on 𝑌 is to report first differences in the predicted probabilities. For example:

Δ𝜋 = 𝜋2 − 𝜋1

𝜋1 = exp(𝛼 + 𝛽1𝑋1 + 𝛽2𝑋2) / (1 + exp(𝛼 + 𝛽1𝑋1 + 𝛽2𝑋2))

𝜋2 = exp(𝛼 + 𝛽1𝑋1 + 𝛽2(𝑋2 + 1)) / (1 + exp(𝛼 + 𝛽1𝑋1 + 𝛽2(𝑋2 + 1)))

This allows us to describe how 𝜋 changes when 𝑋2 changes while holding 𝑋1 constant.
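The first-difference formulas above can be sketched as a small helper function. The coefficients here are passed in by hand, so this is illustrative only (note that plugging in the rounded slide coefficients gives slightly different numbers from the unrounded values R stores):

```r
# First difference in predicted probability when X2 moves from 0 to 1,
# holding X1 fixed at a chosen value
inv_logit <- function(x) exp(x) / (1 + exp(x))
first_diff <- function(alpha, beta1, beta2, x1) {
  pi1 <- inv_logit(alpha + beta1 * x1)           # X2 = 0
  pi2 <- inv_logit(alpha + beta1 * x1 + beta2)   # X2 = 1
  pi2 - pi1
}
# With a negative beta2 the first difference is negative (hypothetical values):
first_diff(alpha = 0, beta1 = 0.001, beta2 = -3, x1 = 1200)
```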
37 / 58
First differences
What is the probability of supporting a petition for a Labour MP who receives 1200 signatures?
##
## ============================
##                    Model 1
## ----------------------------
## (Intercept)        -9.36 **
##                    (3.09)
## signatures          0.01 ***
##                    (0.00)
## partyConservative  -3.09 *
##                    (1.23)
## ----------------------------
## Num. obs.          51
## ============================
𝜋1 = exp(𝛼 + 𝛽1𝑋1𝑖 + 𝛽2𝑋2𝑖) / (1 + exp(𝛼 + 𝛽1𝑋1𝑖 + 𝛽2𝑋2𝑖))

• Substitute 𝛼, 𝛽1, 𝛽2 with estimated values (the arithmetic below uses the unrounded coefficients stored in R, not the rounded values shown in the table)
• Set 𝑋1 = 1200 and 𝑋2 = 0
• 𝜋1 = exp(−9.36 + 0.01 ∗ 1200 − 3.09 ∗ 0) / (1 + exp(−9.36 + 0.01 ∗ 1200 − 3.09 ∗ 0))
• 𝜋1 = exp(0.14) / (1 + exp(0.14)) = 1.14 / 2.14
• 𝜋1 = 0.53
Result: the probability for an MP with these X values would be 0.53
38 / 58
First differences
What is the probability of supporting a petition for a Conservative MP who receives 1200 signatures?
##
## ============================
##                    Model 1
## ----------------------------
## (Intercept)        -9.36 **
##                    (3.09)
## signatures          0.01 ***
##                    (0.00)
## partyConservative  -3.09 *
##                    (1.23)
## ----------------------------
## Num. obs.          51
## ============================
𝜋2 = exp(𝛼 + 𝛽1𝑋1𝑖 + 𝛽2𝑋2𝑖) / (1 + exp(𝛼 + 𝛽1𝑋1𝑖 + 𝛽2𝑋2𝑖))

• Substitute 𝛼, 𝛽1, 𝛽2 with estimated values (again using the unrounded coefficients stored in R)
• Set 𝑋1 = 1200 and 𝑋2 = 1
• 𝜋2 = exp(−9.36 + 0.01 ∗ 1200 − 3.09 ∗ 1) / (1 + exp(−9.36 + 0.01 ∗ 1200 − 3.09 ∗ 1))
• 𝜋2 = exp(−2.96) / (1 + exp(−2.96)) = 0.05 / 1.05
• 𝜋2 = 0.05
Result: the probability for an MP with these X values would be 0.05
39 / 58
First differences
• In R, we can calculate the predicted probabilities using predict()
• To do so, we need to specify values for all the explanatory variables
predict(logit_model,
        newdata = data.frame(signatures = 1200, party = "Labour"),
        type = "response")

##         1
## 0.5337154
predict(logit_model,
        newdata = data.frame(signatures = 1200, party = "Conservative"),
        type = "response")

##         1
## 0.0493317
• where type = "response" tells R to calculate predicted probabilities
• If 𝜋1 = .53 and 𝜋2 = .049, then the first difference 𝜋2 − 𝜋1 = −.48
• → the probability of a Conservative MP supporting the petition is .48 lower than the probability of a Labour MP supporting the petition
40 / 58
Non-linear relationship between X and 𝜋
[Figure: predicted probability of support, 𝜋𝑖, against number of signatures for Labour MPs]

• The plot shows the predicted probability of MP support over the range of the signature variable for Labour party MPs
• Notice that the predictions are no longer linear: the effect of 𝑋 on 𝜋 is not constant
41 / 58
Non-linear relationship between X and 𝜋
[Figure: predicted probability of support, 𝜋𝑖, against number of signatures]

• Consider the change in 𝜋 that results from an increase in signatures from 500 to 1000
• → 𝜋 increases from 0 to about .18
42 / 58
Non-linear relationship between X and 𝜋
[Figure: predicted probability of support, 𝜋𝑖, against number of signatures]

• Consider the change in 𝜋 that results from an increase in signatures from 1000 to 1500
• → 𝜋 increases from .18 to about .98
• Implication: the same change in 𝑋 results in different changes in 𝜋 depending on which values of 𝑋 we consider
43 / 58
𝑋1,𝑋2 and 𝜋
[Figure: predicted probability of support, 𝜋𝑖, against number of signatures, for Labour and Conservative MPs]

• The plot shows the predicted probability of MP support over the range of signatures for Labour and Conservative MPs
• Question: Is the effect of party constant across the range of the signature variable?
• Answer: No!
44 / 58
𝑋1,𝑋2 and 𝜋
[Figure: predicted probability curves for Labour and Conservative MPs]

• Set the signature variable equal to 1000
• Calculate the difference in probability for Labour and Conservative MPs
• 𝜋𝐿 − 𝜋𝐶 ≈ 0.18
45 / 58
𝑋1,𝑋2 and 𝜋
[Figure: predicted probability curves for Labour and Conservative MPs]

• Set the signature variable equal to 1500
• Calculate the difference in probability for Labour and Conservative MPs
• 𝜋𝐿 − 𝜋𝐶 ≈ 0.57
• Implication: Even exactly the same change in 𝑋1 will result in different changes in 𝜋 depending on which values of 𝑋2 we consider
46 / 58
Summary
• No single number can describe the effect of 𝑋 on 𝜋 everywhere
• → The effect of a one-unit change in 𝑋1 will be different for different starting values of 𝑋1
• → The effect of a one-unit change in 𝑋1 will be different for different values of 𝑋2
• Because of this, best practice is to provide predicted probabilities for some key comparisons in order to describe the effects of your explanatory variables
47 / 58
48 / 58
Inference
Statistical inference for logistic regression
• Logistic regression is not estimated by ordinary least squares (OLS), but rather by maximum likelihood estimation (MLE).
• However, the estimates from this method still have approximately normally distributed sampling distributions.
• This feature means we can use familiar statistical tests:
  • Tests ask where a statistic (e.g., ̂𝛽) falls in the sampling distribution that would result under a null hypothesis (e.g., 𝛽 = 0).
  • We already know how to do hypothesis tests with normal sampling distributions.
49 / 58
Hypothesis Tests
Hypothesis tests for coefficients take a familiar form:
• We test against a null hypothesis of no effect – 𝐻0 ∶ 𝛽𝑖 = 0
• We compute a test statistic, the 𝑧 value:

𝑧 = ̂𝛽𝑖 / 𝑆𝐸( ̂𝛽𝑖)

• It is a 𝑧 value because we compute p-values from the standard normal distribution instead of the Student’s t
• We reject the null at the 95% level when |𝑧| ≥ 1.96
• These estimates become unstable in small samples (< 100)
50 / 58
Hypothesis test example
screenreg(logit_model)
##
## =============================
##                    Model 1
## -----------------------------
## (Intercept)        -9.36 **
##                    (3.09)
## signatures          0.01 ***
##                    (0.00)
## partyConservative  -3.09 *
##                    (1.23)
## -----------------------------
## AIC                29.23
## BIC                35.02
## Log Likelihood    -11.61
## Deviance           23.23
## Num. obs.          51
## =============================
## *** p < 0.001; ** p < 0.01; * p < 0.05
• We reject 𝐻0 if 𝑝 is small, e.g. < 0.05
• That is, if 𝑧 > 1.96 or 𝑧 < −1.96
• 𝑧 = 0.01 / 0.001 = 10
• 𝑝 = 0.0000001
• Do we reject 𝐻0?
• Yes. We can reject the null that the relationship between signatures and petition support is zero
51 / 58
Hypothesis test example
screenreg(logit_model)
##
## =============================
##                    Model 1
## -----------------------------
## (Intercept)        -9.36 **
##                    (3.09)
## signatures          0.01 ***
##                    (0.00)
## partyConservative  -3.09 *
##                    (1.23)
## -----------------------------
## AIC                29.23
## BIC                35.02
## Log Likelihood    -11.61
## Deviance           23.23
## Num. obs.          51
## =============================
## *** p < 0.001; ** p < 0.01; * p < 0.05
• We reject 𝐻0 if 𝑝 is small, e.g. < 0.05
• That is, if 𝑧 > 1.96 or 𝑧 < −1.96
• 𝑧 = −3.09 / 1.23 = −2.51
• 𝑝 = 0.012
• Do we reject 𝐻0?
• Yes. We can reject the null that the relationship between party and petition support is zero
52 / 58
Conclusion
What have we learned? (I)
• Many research questions in the social sciences require analysing binary outcomes
• While we can use linear regression to analyse these outcomes, OLS has some unattractive properties
• Logistic regression is a helpful alternative to OLS, which avoids the main problem: that probabilities need to be constrained to be between 0 and 1
• We need to be careful when interpreting the output of the model
53 / 58
Seminar
In seminars this week, you will learn to …
1. Implement some binary logistic regression models
2. Interpret the resulting coefficients
3. Calculate some fitted probabilities
54 / 58
What have we learned? (II)
Substantive finding:
• Politicians are more likely to speak on issues where local support for the issue is strong!

Question: Does this mean that higher numbers of signatures cause better parliamentary representation?
55 / 58
Logit and causality
• Logistic regression is a method for describing variation in observed data
• As with linear regression, we cannot claim to be describing a causal relationship unless we are confident that we have controlled for all possible confounding variables
• No new method will guarantee us a way of making causal statements!
56 / 58
From PUBL0055 to PUBL0050
In the “Introduction” module, we have covered:
1. Several commonly applied statistical methods for quantitative analysis
2. How to use R
3. An introduction to quantitative causal analysis
In the ‘Advanced’ module, we will cover:
1. More ‘cutting-edge’ statistical methods for quantitative analysis
2. More R!
3. In-depth exploration of the different approaches to causal analysis
4. More focus on developing research questions/designs in your own work
57 / 58
Thanks for watching, have a good break, and hopefully see many of you next term!
58 / 58