Program Evaluation Midterm

Harvard ID: 30938380

A164 Program Evaluation

Midterm

4/2/2015

1. (5 points) Suppose one is studying the impact of some intervention, X, on an outcome, Y.

There is some other variable, Z, which one has not included in the analysis. Excluding Z from

a regression analysis will lead to a biased estimate of the effect of X if and only if two conditions

are true. What are the two conditions?

The two conditions necessary for omitted variable bias are (1) that Z is correlated with the

independent variable X, or with any other independent variable included in the model, and (2) that

Z is a determinant of Y, that is to say, the true regression coefficient of Z on Y is non-zero. Both

conditions must be present for there to be omitted variable bias.
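
The two conditions can be checked numerically. The sketch below is illustrative only: the true model Y = 1 + 2X + 3Z, the coefficient values, and the tiny data set are all hypothetical, chosen so the arithmetic is exact. It regresses Y on X alone and compares the slope with the true coefficient on X.

```python
def slope(xs, ys):
    """OLS slope from regressing ys on xs (with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x


BETA_X = 2.0  # hypothetical true effect of X on Y


def short_regression_slope(xs, zs, beta_z):
    """Slope on X when Z is left out; Y is generated without noise (hypothetical model)."""
    ys = [1.0 + BETA_X * x + beta_z * z for x, z in zip(xs, zs)]
    return slope(xs, ys)


xs = [0.0, 1.0, 2.0, 3.0]
z_correlated = [0.0, 1.0, 1.0, 2.0]      # moves with X
z_uncorrelated = [1.0, -1.0, -1.0, 1.0]  # zero sample covariance with X

# Both conditions hold: Z is correlated with X and affects Y.
biased = short_regression_slope(xs, z_correlated, beta_z=3.0)
# Condition (1) fails: Z affects Y but is uncorrelated with X.
no_bias_uncorrelated = short_regression_slope(xs, z_uncorrelated, beta_z=3.0)
# Condition (2) fails: Z is correlated with X but has no effect on Y.
no_bias_no_effect = short_regression_slope(xs, z_correlated, beta_z=0.0)

# The bias equals (coefficient of Z on Y) times (slope of Z on X).
predicted_bias = 3.0 * slope(xs, z_correlated)

print(biased, no_bias_uncorrelated, no_bias_no_effect, predicted_bias)
```

Only the first case returns a slope different from 2, and the gap matches the product of the coefficient of Z on Y and the slope from regressing Z on X.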

2. Researchers at a community college are evaluating a new approach to remedial coursework

for students who score poorly on a placement test in English. Students have a choice: take the

college’s traditional remedial English course (which does not grant college credit); or, take the

college-level English class immediately followed by an “accelerated learning” course, in which

the English instructor reviews those topics where the students need extra help. Five hundred

and ninety two (592) students chose the new two-course option and 5,545 students chose the

traditional remedial course. In addition to having data on the courses students took, you are

provided with data on students’ placement scores and student background measures

(race/ethnicity and Pell Grant eligibility). Suppose you want to analyze the impact of the new

approach on students’ subsequent probability of completing a college degree.

2a. (10 points, max 100 words) You have two options for choosing a comparison group: (i) use

the full comparison group of all those choosing the traditional remedial option (5,545 students)

and statistically adjust for student background and students’ scores on the placement test, or (ii)

find the subset of students in the full comparison group who most closely match the 592 students

in the treatment group in terms of student background and placement test scores. Describe the

advantages and disadvantages of (i) and (ii). Which would you choose: (i), (ii) or both (i) and

(ii)?

Option (i) affords a larger sample size, but differences in characteristics between treatment and

control units may cause over- or underestimation of the treatment effect. Option (ii) reduces

differences since treatment and control units are matched according to observable characteristics.

However, depending on the type of matching used, there could still be imbalance or a lack of

complete overlap between treatment and control, especially in the case of attrition or if matching

with replacement is not used. I would use both options in order to understand the treatment effect

in more detail if I had sufficient resources.
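
As a rough illustration of option (ii), the sketch below performs nearest-neighbor matching with replacement. It assumes, hypothetically, that each student is recorded as a (background, placement score) pair, that background must match exactly, and that the closest placement score wins. The function name and the records are invented for illustration and are not taken from the study.

```python
def match_with_replacement(treated, comparison):
    """For each treated student, return the index of the closest comparison student.

    Each student is a tuple (background, placement_score). A match must share the
    same background category; among those, the nearest placement score wins.
    Returns None when no comparison student shares the background (no overlap).
    """
    matches = []
    for background, score in treated:
        candidates = [
            (abs(score - c_score), idx)
            for idx, (c_background, c_score) in enumerate(comparison)
            if c_background == background
        ]
        matches.append(min(candidates)[1] if candidates else None)
    return matches


# Hypothetical records: (background category, placement score).
treated = [("pell", 42), ("pell", 55), ("pell", 44), ("no_pell", 60)]
comparison = [("pell", 40), ("pell", 58), ("pell", 43)]

matches = match_with_replacement(treated, comparison)
print(matches)
```

In this toy data the comparison student at index 2 is reused for two treated students, which is what matching with replacement permits, and the last treated student has no match at all, which is the lack of overlap mentioned in the answer.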

2b. (10 points, max 100 words) Suppose the state department of education has 10th grade math

and English scores for many of the same students. However, negotiating access to those data

will take time and potentially cause delays. Would it be worth the effort to gain access to those

test scores? Discuss how they might be used and the advantages and disadvantages of

negotiating access.

Gaining access is worthwhile. The data can further reduce selection bias. Matching will be

improved if the test scores are used as a proxy for otherwise unobserved student characteristics.

Furthermore, the data serve as a pretest and allow creation of treatment and control subsets;

consequently, treatment effects can be estimated separately by subgroup. Nevertheless, two issues may lead

to spurious subset groupings. First, there may not be sufficient variation in the test scores since all

study participants are remedial students. Second, a significant amount of time has passed since the

test’s administration, during which many study participants may have considerably changed.

3. In 1997, the School Choice Scholarships Foundation (SCSF) offered scholarships to low-

income families attending kindergarten through grade four in New York City public schools.

They could use the scholarships to transfer to private schools. The scholarships were worth up

to $1,400 annually and could be used for up to four years at both religious and secular schools.

The foundation received 20,000 applications and offered scholarships to 1,300 students selected

by lottery. Of those offered the scholarship, 24 percent never used the scholarship. Moreover,

when students were tested 2 years later, 35 percent of the total analysis sample (combining the

treatment and control groups) did not take the test.

3a. (10 points, max 100 words) Which poses a bigger challenge for internal validity—the 24

percent who never used the scholarship or the 35 percent who did not take the test at the end of

two years? Explain.

The 35% attrition is a greater challenge. Nonparticipation can be remedied through an intent-to-

treat analysis. Even though the estimated effect in an intent-to-treat analysis is of being assigned

to treatment rather than of the treatment itself, the benefits of random assignment to internal

validity are maintained. Using an intent-to-treat analysis will yield unbiased results unless there is

systematic attrition. Attrition always lowers statistical power. If sufficiently high, attrition

precludes the ability to make causal inferences. Furthermore, if attrition is correlated with

treatment assignment, the students who remain in the treatment and control groups are no longer

comparable, thus biasing effect estimates and threatening internal validity.

3b. (10 points, max 100 words) How might the researchers use the data to estimate the impact

of attending a private school? Describe the statistical method and specification they would use.

The researchers need to use instrumental variable estimation because students choosing to use

the scholarship are systematically different on some unmeasured characteristic. Using an

“instrument”, a factor that is related to the treatment variable but affects the outcome only through

the treatment, eliminates this selection bias. The instrumental variable estimator is represented

mathematically as:

$$\hat{\beta}_1^{IVE} = \frac{\bar{Y}_{I=1} - \bar{Y}_{I=0}}{\bar{X}_{I=1} - \bar{X}_{I=0}}$$

In the scholarship example, a dummy variable for whether a student wins or loses the lottery could

serve as an appropriate instrument since it affects private school attendance but, being randomly

assigned, is unrelated to the other determinants of student achievement on a standardized test.
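
The estimator can be sketched as a ratio of two differences in means: the difference in outcomes by lottery status divided by the difference in private school attendance by lottery status. In the sketch below, the 0.76 take-up rate follows from the 24 percent of winners who never used the scholarship. The test score means and the assumption that no lottery losers attended private school are hypothetical placeholders, not figures from the study.

```python
def wald_iv_estimate(y_winners, y_losers, x_winners, x_losers):
    """Instrumental variable (Wald) estimate with a binary instrument.

    y_*: mean outcome by lottery status.
    x_*: share attending private school by lottery status.
    """
    intent_to_treat = y_winners - y_losers  # effect of being offered the scholarship
    first_stage = x_winners - x_losers      # effect of the offer on attendance
    if first_stage == 0:
        raise ValueError("the instrument does not change treatment take-up")
    return intent_to_treat / first_stage


take_up_winners = 1 - 0.24  # from the text: 24 percent never used the scholarship
take_up_losers = 0.0        # assumption: no lottery losers attended private school
mean_score_winners = 51.9   # hypothetical
mean_score_losers = 50.0    # hypothetical

effect = wald_iv_estimate(
    mean_score_winners, mean_score_losers, take_up_winners, take_up_losers
)
print(round(effect, 2))
```

With these placeholder numbers, the 1.9-point difference between winners and losers is scaled up by the 0.76 take-up rate. The same estimate would come from two-stage least squares with the lottery dummy as the only instrument and no covariates.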

3c. (10 points, max 100 words) Does the impact estimated in (b) apply to the full sample of

scholarship applicants, or just to the subset who would have used the scholarship if offered (and

would not have attended a private school otherwise)? What are the implications for external

validity?

It applies to those who would have used the scholarship if offered. Instrumental variable estimates

identify the effect of treatment on “compliers”, those whose decision to participate in treatment

was affected by the offer of treatment. The estimate does not generalize to “always-takers”, those

who receive treatment through other means even if not randomized into the treatment group, nor

does it generalize to “never-takers”, those who do not take up treatment even when it is offered.

External validity is therefore limited to families resembling the compliers.

4. Suppose you are doing an evaluation of a new online math curriculum for students in grades

3, 4 and 5. Two-hundred schools have applied to participate. You randomly choose 100 schools

to receive the curriculum and 100 schools to be in the control group. When a school is chosen,

they can use the curriculum in all three grades.

4a. (10 points, max 75 words) In most schools, standardized testing only begins at the end of

grade 3. In other words, you will not be able to control for baseline test scores for students in

grade 3. Does that mean you cannot test for the impact of the curriculum on student

achievement gains during third grade?

It is still possible to test for the impact of the curriculum and obtain unbiased estimates. However,

statistical power will be lower without pretest scores as a control variable. Furthermore, adding

more regressors, such as pretest scores, is a useful indirect check on randomization. If

randomization succeeded, adding regressors should not materially change the estimate on the

treatment variable since they would be uncorrelated with the treatment.

4b. (10 points, max 75 words) Given the design above, you cannot estimate the differences in

the impact by school. Why? How would you change the design if you were interested in

estimating the differences in impact by school?

Treatment effects can only be identified by comparing treatment and control units within the same

block. In this study, whole schools are randomized, so every school is either entirely treated or

entirely control and no within-school comparison exists. To estimate the differences in impact by

school, I would randomize lower-level units, like classes or students, within each school.

4c. (10 points, max 75 words) Using the design above, the 200 schools are drawn from 4 large

school districts (District A, District B, District C and District D). How would you test whether

the impact of the program varied by district? (Write down the equation you would estimate and

the statistical test you would use to answer the question.)

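$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 T + \hat{\beta}_2 B + \hat{\beta}_3 C + \hat{\beta}_4 D + \hat{\beta}_5 (B \times T) + \hat{\beta}_6 (C \times T) + \hat{\beta}_7 (D \times T) + \hat{\beta}_8 X$$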

T is a dummy variable representing whether treatment was given. B, C, and D are dummy variables

representing their respective districts. Interactions between the district variables and the treatment

follow. Lastly, any control variables are represented by X. I can use a joint F-test to test the null

hypothesis that $\beta_5 = \beta_6 = \beta_7 = 0$. If the F-statistic is statistically significant, then I

reject the null hypothesis and conclude that there is impact variation by district.
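
The joint test compares the fit of the model with the three interaction terms against the fit of the same model without them. The sketch below computes that F statistic in plain Python. It is a minimal illustration under stated assumptions: the school-level scores are hypothetical, the control variables X are omitted, and all function names are invented for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ssr(rows, y):
    """Sum of squared residuals from an OLS fit of y on the regressors in rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def design(district, t, interactions):
    """Regressor row: intercept, T, B, C, D and, optionally, B*T, C*T, D*T."""
    b, c, d = (float(district == name) for name in "BCD")
    row = [1.0, t, b, c, d]
    if interactions:
        row += [b * t, c * t, d * t]
    return row


def joint_f(data):
    """F statistic for H0: beta5 = beta6 = beta7 = 0 (no impact variation by district)."""
    y = [score for _, _, score in data]
    full = [design(dist, t, True) for dist, t, _ in data]
    restricted = [design(dist, t, False) for dist, t, _ in data]
    ssr_u, ssr_r = ssr(full, y), ssr(restricted, y)
    q = 3                          # number of restrictions being tested
    df = len(data) - len(full[0])  # n - k in the unrestricted model
    return ((ssr_r - ssr_u) / q) / (ssr_u / df)


def make_schools(treated_means):
    """Two hypothetical schools per district-by-arm cell, at the cell mean -1 and +1."""
    data = []
    for district in "ABCD":
        for t, mean in ((0.0, 50.0), (1.0, treated_means[district])):
            data += [(district, t, mean - 1.0), (district, t, mean + 1.0)]
    return data


# Hypothetical data: the program raises scores by 2 points in A, B, C and by 6 in D.
varying = make_schools({"A": 52.0, "B": 52.0, "C": 52.0, "D": 56.0})
# Hypothetical data: the same 2-point impact in every district.
uniform = make_schools({"A": 52.0, "B": 52.0, "C": 52.0, "D": 52.0})

f_varying = joint_f(varying)
f_uniform = joint_f(uniform)
print(round(f_varying, 3), round(f_uniform, 3))
```

The statistic would then be compared with the critical value of the F distribution with 3 and n - k degrees of freedom. In this toy data set there are 3 and 8 degrees of freedom, for which the 5 percent critical value is roughly 4.07. The standard library has no F distribution, so the sketch stops at the statistic itself.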

4d. (5 points, max 75 words) For an additional $100,000, you could conduct a survey of the

teachers in the control group, to ask about the math curricula they are using, and whether they

are using an online math curriculum. How would such data be useful to know to understand

the magnitude of the treatment effect?

If teachers in the control group are using curricula that are very similar to the treatment curriculum,

the treatment-control contrast is small and the magnitude of the treatment effect will be

underestimated. In contrast, if control group teachers are using curricula that differ from the

treatment curriculum far more than the average curriculum does, then the magnitude of the

treatment effect will be overestimated.

5. (10 points, max 100 words) Suppose one had to choose between two standardized tests as an

outcome for an evaluation of an educational intervention. They are testing exactly the same

domains, and are drawing from the same bank of test items in each domain. However, one test

draws 50 percent more items from each domain and, therefore, is more reliable. However, you

fear that students will be less likely to complete the longer test. Ignoring the scoring costs,

describe an important advantage and an important disadvantage of using the shorter test.

One advantage is that students will be less likely to suffer from test fatigue. Consequently, they

will be more likely to concentrate and put their best effort into answering each question rather than

not caring about their performance on some or all of the test, thus decreasing the possibility of a

floor effect. One disadvantage, however, is that scores on the shorter test are less reliable: with

fewer items, more of the variation in student scores reflects chance rather than ability.
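
The reliability cost of the shorter test can be gauged with the Spearman-Brown formula, which is not part of the original answer and is offered here only as a rough guide. It assumes the added items are parallel to the existing ones, which seems plausible given that both tests draw on the same item bank. The 0.80 reliability for the shorter test is a hypothetical value.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test's length is multiplied by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


short_reliability = 0.80  # hypothetical reliability of the shorter test
long_reliability = spearman_brown(short_reliability, 1.5)  # 50 percent more items

print(round(long_reliability, 3))
```

Under these assumptions the longer test's reliability would be about 0.857, so the shorter test gives up a modest amount of reliability in exchange for a higher expected completion rate.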

6. The following table is from a study in which the researchers tested the effect of value-added

estimates for teachers on principals’ beliefs about each teacher’s effectiveness. The researchers

asked principals to subjectively rate teachers’ effectiveness at the beginning of the year. A

randomly chosen half of principals were provided with value-added estimates for each of their

teachers from prior school years. At the end of the school year, all principals (in the treatment

group that received value-added info as well as the control group that did not) were asked to

update their rating of each teacher’s effectiveness.

6a. (5 points, max 25 words) Using the results in the (1) set of columns, what does the table

imply about the impact of value-added estimates on principals’ perceptions of teachers’

effectiveness?

The table implies that providing value-added estimates to principals resulted in a 0.123 unit

increase in their perceptions of teachers’ effectiveness.

6b. (5 points, max 25 words) As described in the table note, the variable “precision” is a measure

of the statistical precision of the value-added estimates. (A higher value implies that the value-

added measure for a given teacher had a lower standard error. That is, the value-added for a

given teacher was measured with more precision.) What do the results in the (3) set of columns

imply about the relationship between the impact of the value-added measure on principals’

perceptions and the precision of that estimate?

Value-added estimates were precise (0.72 standard deviations above the mean) and, when provided,

resulted in a 0.087 unit increase in principal perceptions of teachers’ effectiveness.