GVT 201 Data Analysis and Politics
Professor Elena Llaudet
Lecture 23 | April 11, 2017
Experiment
- The votes are in and the winner is: the experiment about the effects of music on productivity!
- Protocol
- On a related note: https://youtu.be/2HmQRb2uzlc?t=90
Next Thursday: Student Research Conference
8-9:15am Breakfast and posters
9:25-10:40am
- Panel: Equity, Identity, and American Democracy
- Panel: Lobbying and the Executive Branch
10:50am-12:05pm
- Panel: Variations of Political Participation in Elections
- Panel: International Relations
1:40-2:55pm
- Panel: Justice and Inequality
- Panel: Education Policy
- Panel: Internship Panel
3:05-4:20pm Student Panel
4:30-5:30pm Award Ceremony
5:30-7:00pm Alumni Speed Networking!
Highly recommend it (especially the alumni speed networking)! Attendance is mandatory (at least during our time slot). Location: Sargent Hall.
Plan for Today
- Review: Hypothesis Testing of Diffs-in-Means
- Hypothesis Testing of Coefficients
- Example: Do Women Promote Different Policies?
  1. Load and explore data
  2. Identify/calculate outcome and independent variables
  3. Estimate the effect of X on Y
  4. Identify whether the effect is statistically significant at the 95% confidence level using hypothesis testing
Uncertainty: Hypothesis Testing of Diffs-in-Means
- We have seen how to determine whether the effect of X on Y, as measured by the diffs-in-means estimator, is statistically significant at the 95% confidence level using hypothesis testing
  - $H_0: E(X_t) - E(X_c) = 0$ (no effect)
  - $H_1: E(X_t) - E(X_c) \neq 0$ (effect can be + or -)
  - we calculate the observed test statistic
  - we calculate the p-value of the observed test statistic
  - we reject the null hypothesis of no causal effect at the population level and conclude the effect is statistically significant at the 95% confidence level
    - if the p-value $\leq$ 0.05, or
    - if |observed test statistic| $\geq$ 1.96
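As a refresher, the review steps above can be sketched in R. This is only an illustration: the two groups are simulated (hypothetical) data, and the standard error is the usual one for a difference between two independent sample means, which these slides do not spell out.

```r
# Hypothetical data: simulated outcomes for a treatment and a control group
set.seed(201)
y_treat   <- rnorm(200, mean = 12, sd = 5)
y_control <- rnorm(200, mean = 10, sd = 5)

# Diffs-in-means estimator
diff_means <- mean(y_treat) - mean(y_control)

# Usual standard error of a difference between two independent sample means
se_diff <- sqrt(var(y_treat) / length(y_treat) +
                var(y_control) / length(y_control))

# Observed test statistic under H0: true difference is zero
z_obs <- diff_means / se_diff

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z_obs))

# The two decision rules from the slide
reject_by_p <- p_value <= 0.05
reject_by_z <- abs(z_obs) >= 1.96

c(z_obs = z_obs, p_value = p_value)
c(reject_by_p = reject_by_p, reject_by_z = reject_by_z)
```

The two rules are two views of the same cutoff, so they should give the same answer here.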
Uncertainty: Hypothesis Testing of Coefficients
- How about if we use regression estimates/coefficients to measure the effect of X on Y? How would we determine whether the effect is statistically significant at the 95% confidence level using hypothesis testing in this case?
- First, how can we calculate the diffs-in-means estimator using regression analysis?
  - By running a regression where X is binary and identifies treatment assignment
$$\hat{Y} = \hat{\alpha} + \hat{\beta} X \quad \text{where} \quad X = \begin{cases} 1 & \text{if Treatment} \\ 0 & \text{if Control} \end{cases}$$
  - In this case, what coefficient can be interpreted as the diffs-in-means estimator? $\hat{\beta}$
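A quick way to convince yourself of this is to check it on made-up data. In the sketch below, `treated` and `outcome` are hypothetical simulated variables (not from any dataset used in class); the point is only that the slope from `lm()` matches the difference in group means and the intercept matches the control-group mean.

```r
# Hypothetical data: binary treatment indicator and a simulated outcome
set.seed(23)
n <- 100
treated <- rep(c(0, 1), each = n / 2)
outcome <- 5 + 2 * treated + rnorm(n)

# Regression of the outcome on the binary treatment indicator
fit <- lm(outcome ~ treated)

# Diffs-in-means estimator and control-group mean, computed directly
control_mean  <- mean(outcome[treated == 0])
diff_in_means <- mean(outcome[treated == 1]) - control_mean

coef(fit)
c(control_mean = control_mean, diff_in_means = diff_in_means)

# Both comparisons should return TRUE
all.equal(unname(coef(fit)["treated"]), diff_in_means)
all.equal(unname(coef(fit)["(Intercept)"]), control_mean)
```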
- How about if we had more than one treatment? How can we calculate the multiple diffs-in-means estimators using regression analysis?
  - By running a regression where we have a binary $X_j$ identifying each of the j treatments and the baseline category is the control group
  - For example, if the experiment uses 2 different treatments
$$\hat{Y} = \hat{\alpha} + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2$$
$$X_1 = \begin{cases} 1 & \text{if Treatment 1} \\ 0 & \text{if Control} \end{cases} \quad \text{and} \quad X_2 = \begin{cases} 1 & \text{if Treatment 2} \\ 0 & \text{if Control} \end{cases}$$
  - In this case, what coefficient can be interpreted as the diffs-in-means estimator of treatment 1? $\hat{\beta}_1$
  - What coefficient can be interpreted as the diffs-in-means estimator of treatment 2? $\hat{\beta}_2$
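The two-treatment case can be checked the same way. The sketch below uses simulated (hypothetical) data with three groups. One assumption to flag: each dummy is coded 0 for every unit outside its own treatment group, so $X_1$ is also 0 for units in treatment 2 and vice versa.

```r
# Hypothetical data: a control group and two treatment groups
set.seed(11)
group <- rep(c("control", "treatment1", "treatment2"), each = 50)
y <- 10 + 3 * (group == "treatment1") - 1 * (group == "treatment2") + rnorm(150)

# One binary variable per treatment; control is the baseline category
x1 <- as.numeric(group == "treatment1")
x2 <- as.numeric(group == "treatment2")

fit2 <- lm(y ~ x1 + x2)

# Group means computed directly, for comparison
group_means <- tapply(y, group, mean)
diff_t1 <- unname(group_means["treatment1"] - group_means["control"])
diff_t2 <- unname(group_means["treatment2"] - group_means["control"])

coef(fit2)
c(diff_t1 = diff_t1, diff_t2 = diff_t2)

# Each slope should equal that treatment's difference from the control mean
all.equal(unname(coef(fit2)["x1"]), diff_t1)
all.equal(unname(coef(fit2)["x2"]), diff_t2)
```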
Hypothesis Testing of $\beta$: Formal Procedure
1. In this class, we will always test whether the population parameter ($\beta$) is zero (i.e., there is no effect of X on Y, which is what we ultimately want to refute)
   - $H_0: \beta = 0$
2. We will always use as our alternative hypothesis that the effect is different from zero and can be positive or negative
   - $H_1: \beta \neq 0$
3. The test statistic in this case is defined as:
$$\text{test statistic} = \frac{\hat{\beta} - \beta}{SE(\hat{\beta})}$$
   - where $\hat{\beta}$ is the observed coefficient
   - $\beta$ is the population parameter
   - $SE(\hat{\beta})$ stands for the standard error of $\hat{\beta}$
4. If n is large enough and the true $\beta$ is indeed zero:
$$\text{test statistic} = \frac{\hat{\beta}}{SE(\hat{\beta})} \overset{\text{if null is true}}{\sim} N(0, 1)$$
5. Calculate the observed test statistic and draw conclusions about $\beta$
   - We can compute the p-value (the probability, under the null, of the observed test statistic or a more extreme value)
     - p-value = P(test statistic $\geq$ |observed value|) + P(test statistic $\leq$ -|observed value|)
     - if p-value $\leq$ 0.05 (our cutoff) $\rightarrow$ reject the null hypothesis and conclude that the effect is statistically significant at the 95% confidence level
   - Or, we can compare |observed test statistic| to 1.96
     - if |observed test statistic| $\geq$ 1.96 $\rightarrow$ reject the null hypothesis and conclude that the effect is statistically significant at the 95% confidence level
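The five steps can be wrapped into a few lines of R. The function name `test_coefficient` and its arguments are hypothetical (they are not part of any package); the numbers plugged in are the estimate and standard error from this lecture's women-and-policy example, so treat this as a sketch of the procedure rather than a general tool.

```r
# Hypothetical helper: two-sided test of H0: beta = 0 using the normal approximation
test_coefficient <- function(estimate, std_error, cutoff = 0.05) {
  # Step 3-4: under the null, beta = 0, so the statistic is estimate / SE
  statistic <- (estimate - 0) / std_error
  # Step 5: p-value = upper-tail probability + lower-tail probability
  p_value <- pnorm(abs(statistic), lower.tail = FALSE) + pnorm(-abs(statistic))
  list(
    statistic   = statistic,
    p_value     = p_value,
    reject_by_p = p_value <= cutoff,
    reject_by_z = abs(statistic) >= 1.96
  )
}

# Estimate and standard error of the female-leader coefficient from the example
result <- test_coefficient(estimate = 9.252423, std_error = 3.947746)
result
```

With these inputs the statistic is about 2.34, so both rules reject the null.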
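Standard Error of $\hat{\beta}$
$$\text{test statistic} = \frac{\hat{\beta}}{SE(\hat{\beta})} \quad \text{where } SE(\hat{\beta}) \text{ is the standard error of } \hat{\beta}$$
- The standard error of $\hat{\beta}$ represents the estimated standard deviation of its sampling distribution (in the hypothetical repeated sampling)
$$SE(\hat{\beta}) = \sqrt{\frac{\frac{1}{n}\sum_{i=1}^{n} \hat{\epsilon}_i^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$
- No need to memorize the formula, but you do need to understand its implications
  - bigger residuals $\rightarrow$ noisier estimator
  - larger n $\rightarrow$ more precise estimator
  - more variation in x $\rightarrow$ more precise estimator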
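The three implications are easy to see by simulation. The sketch below codes the formula exactly as written on the slide and applies it to simulated (hypothetical) data; `se_slope` and `simulate_se` are made-up helper names. Note that the standard error R reports for `lm()` divides the sum of squared residuals by n - 2 rather than n, so it will differ slightly from this version.

```r
# Standard error of the slope, following the slide's formula
se_slope <- function(x, y) {
  fit <- lm(y ~ x)
  mean_sq_resid <- mean(resid(fit)^2)            # (1/n) * sum of squared residuals
  sqrt(mean_sq_resid / sum((x - mean(x))^2))     # divided by variation in x
}

# Hypothetical data generator: returns the slope's SE for one simulated sample
simulate_se <- function(n, noise_sd, x_sd) {
  x <- rnorm(n, sd = x_sd)
  y <- 1 + 2 * x + rnorm(n, sd = noise_sd)
  se_slope(x, y)
}

set.seed(5)
baseline <- simulate_se(n = 100,   noise_sd = 1, x_sd = 1)
noisier  <- simulate_se(n = 100,   noise_sd = 5, x_sd = 1)  # bigger residuals
larger_n <- simulate_se(n = 10000, noise_sd = 1, x_sd = 1)  # larger n
wider_x  <- simulate_se(n = 100,   noise_sd = 1, x_sd = 5)  # more variation in x

c(baseline = baseline, noisier = noisier, larger_n = larger_n, wider_x = wider_x)
```

Compared with the baseline, the standard error should go up with noisier residuals and down with a larger n or more spread in x.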
Example: Do Women Promote Different Policies?
- Randomized policy experiment in India, where some council seats were randomly assigned to women
- Particular policy outcomes we look into:
  - number of new and repaired drinking water facilities
  - number of new and repaired irrigation facilities
1. Load and explore data
data <- read.csv("women.csv")
head(data)
##   village female water irrigation
## 1     1.2      1    10          0
## 2     1.1      1     0          5
## 3     2.2      1     2          2
## 4     2.1      1    31          4
## 5     3.2      0     0          0
## 6     3.1      0     0          0
- village: village identifier
- female: whether the council seat was assigned to a woman (1) or not (0)
- water: number of new or repaired drinking water facilities
- irrigation: number of new or repaired irrigation facilities
2. Identify/calculate outcome and independent variables
- What model should we estimate if we want to run a regression whose coefficient is equivalent to the diffs-in-means estimator and can be interpreted as the estimated effect of having a female leader on the number of new and repaired drinking water facilities?
- In other words, what should be our Y?
  - water
- What should be our X?
  - female
- So the model to estimate is:
$$\widehat{\text{Drinking Water Facilities}}_i = \hat{\alpha} + \hat{\beta} \, \text{Female Leader}_i$$
3. Estimate the effect of X on Y. Does female leadership affect the number of new and repaired water facilities?
- So, we want to estimate the following regression:
$$\widehat{\text{Drinking Water Facilities}}_i = \hat{\alpha} + \hat{\beta} \, \text{Female Leader}_i$$
- What should be the R code?
regression <- lm(data$water ~ data$female)
regression
##
## Call:
## lm(formula = data$water ~ data$female)
##
## Coefficients:
## (Intercept)  data$female
##      14.738        9.252
- So, the estimated model is:
$$\widehat{\text{Drinking Water Facilities}}_i = 14.74 + 9.25 \, \text{Female Leader}_i$$
- Interpretation of $\hat{\alpha}$? 14.74
  - mathematically, $\hat{\alpha}$ is always $\hat{Y}$ when X = 0
  - in the special case where X is a binary/dummy variable that identifies treatment assignment, $\hat{\alpha}$ can be interpreted as the average outcome for the control group
  - in this case, then, $\hat{\alpha}$, which is estimated to be 14.74, is the average number of new and repaired drinking water facilities in villages with male leaders
- Interpretation of $\hat{\beta}$? 9.25
  - mathematically, $\hat{\beta}$ is always $\Delta \hat{Y}$ when $\Delta X = 1$
  - in the special case where X is a binary/dummy variable that identifies treatment assignment, $\hat{\beta}$ can be interpreted as the diffs-in-means estimator
  - in this case, then, $\hat{\beta}$ is the estimated average causal effect of female leadership on water facilities, i.e., having a female leader leads to 9.25 more new or repaired drinking water facilities on average, as compared to having a male leader
- Can this observed effect be due to noise alone (i.e., due to sampling variability)? In other words, is this effect statistically distinguishable from zero?
  - we need to do hypothesis testing
4. Identify whether the effect is statistically significant at the 95% confidence level using hypothesis testing
- What is our null hypothesis? $H_0: \beta = 0$
- What is our alternative hypothesis? $H_1: \beta \neq 0$
- What is the test statistic?
$$\text{test statistic} = \frac{\hat{\beta}}{SE(\hat{\beta})}$$
- If n is large enough and the null is true, how will the test statistic be distributed? Like the standard normal distribution: test statistic $\sim N(0, 1)$
- Then, if |observed test statistic| $\geq$ 1.96, we will reject the null hypothesis and conclude that the effect is statistically significant at the 95% confidence level
- OK, so let's calculate the observed test statistic...
- We want to calculate this:
$$\text{test statistic} = \frac{\hat{\beta}}{SE(\hat{\beta})}$$
- We already have $\hat{\beta}$. We need $SE(\hat{\beta})$
- One way to find the standard errors of the estimated coefficients is to use the function tidy() from the broom package (you may have to install the package first)
# if package not installed: install.packages("broom")
library(broom)
tidy(regression)
##          term  estimate std.error statistic      p.value
## 1 (Intercept) 14.738318  2.286300  6.446363 4.216474e-10
## 2 data$female  9.252423  3.947746  2.343723 1.970398e-02
- We can store the coefficient and standard error in objects, and then calculate the observed test statistic like so:
slope <- tidy(regression)$estimate[2] # estimate column, second row (data$female)
slope.se <- tidy(regression)$std.error[2] # std.error column, second row (data$female)
t.stat <- slope / slope.se
t.stat
## [1] 2.343723
- Or just look at the reported test statistic in the table
- Based on the observed test statistic of $\hat{\beta}$, is the effect of female leadership on drinking water facilities statistically significant at the 95% confidence level?
  - |observed test statistic| $\geq$ 1.96, so we reject the null and conclude that the effect IS statistically significant
- We could also go ahead and calculate the p-value associated with that observed test statistic
p.val <- 2 * pnorm(-abs(t.stat))
p.val
## [1] 0.01909235
- Based on the p-value associated with $\hat{\beta}$, is the effect of female leadership on drinking water facilities statistically significant at the 95% confidence level?
  - because p-value $\leq$ 0.05, we reject the null $\rightarrow$ the slope coefficient is statistically significant (i.e., distinguishable from zero) at the 95% confidence level
  - the effect IS statistically significant
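You may notice that the p-value computed with pnorm() (0.0191) is slightly different from the one in the tidy() table (0.0197). That is because the regression output uses Student's t distribution with the residual degrees of freedom instead of the standard normal; with a large n the two are very close. The sketch below shows the comparison on simulated (hypothetical) data, using base R's summary() instead of broom.

```r
# Hypothetical data: binary X and a simulated outcome
set.seed(42)
n <- 300
x <- rep(c(0, 1), each = n / 2)
y <- 1 + 0.3 * x + rnorm(n)

fit <- lm(y ~ x)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients
t_obs <- coefs["x", "t value"]

# p-value from the standard normal (as computed by hand in this lecture)
p_normal <- 2 * pnorm(-abs(t_obs))

# p-value from the t distribution with n - 2 degrees of freedom
p_t <- 2 * pt(-abs(t_obs), df = df.residual(fit))

c(p_normal = p_normal, p_t = p_t, reported = coefs["x", "Pr(>|t|)"])
```

The t-based value should match the reported one, and the normal-based value should be a touch smaller.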
- Does statistical significance mean that the effect is important/meaningful?
  - not necessarily; it just means that it's not likely to be 0
  - to gauge magnitude, we can compare the size of the effect (9.25) with the standard deviation of the outcome variable
sd(data$water)
## [1] 33.67894
slope/sd(data$water)
## [1] 0.2747243
  - the effect is equivalent to about 1/4 of the standard deviation of Y, which is a sizable effect (recall that if Y is normally distributed, about 2/3 of its data will be within 1 standard deviation of the mean)
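Both numbers in that last point can be reproduced without the dataset. The sketch below plugs in the coefficient and the standard deviation reported in this lecture (the object names are made up), and uses pnorm() to get the share of a normal distribution that lies within 1 standard deviation of the mean.

```r
# Values reported in this lecture's example
effect     <- 9.252423   # estimated coefficient on female
sd_outcome <- 33.67894   # sd(data$water)

# Effect size expressed in standard deviations of the outcome (roughly 0.27)
standardized_effect <- effect / sd_outcome
standardized_effect

# Share of a normal distribution within 1 standard deviation of its mean (about 0.68)
share_within_1sd <- pnorm(1) - pnorm(-1)
share_within_1sd
```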
Today's Class and Next
Today
- Hypothesis Testing with Regression Coefficients
  - Concept
  - Example
Next Thursday: Student Research Conference
Next Tuesday: A Suffolk Monday
Following Thursday: Another example of hypothesis testing with regression coefficients in class (very similar to PSet #11)
- bring your computers!!