
Page 1: Gilligan quantitative impact eval methods

INTERNATIONAL FOOD POLICY RESEARCH INSTITUTE

Quantitative Impact Evaluation Methods

Dan Gilligan, IFPRI

INTERNATIONAL LIVESTOCK RESEARCH INSTITUTE

Page 2: Gilligan quantitative impact eval methods

An Introduction to Quantitative Impact Evaluation

I. Why is impact evaluation important?

• What are appropriate goals for an impact evaluation?

• Monitoring and evaluation

II. How do you design an impact evaluation?

• The evaluation problem

• Measuring causal impact

• Impact evaluation methodologies

Page 3: Gilligan quantitative impact eval methods

Introduction (cont'd)

III. Impact Evaluation and Measurement Tools

• Choice of evaluation estimator

• Data requirements

• How to randomize

• Sample design

• Sample size

Page 4: Gilligan quantitative impact eval methods

What are appropriate goals for an impact evaluation?

Measure impact on important outcomes

• Need a limited set of outcome indicators that are easy to measure

Estimate the program's cost effectiveness

Explain which components of a program work best

Caution:

• Evaluations can only answer a limited number of questions

• Evaluations sometimes cannot explain what caused the impacts

Effective monitoring and qualitative assessments help to explain the context for impact evaluation results

Page 5: Gilligan quantitative impact eval methods

Indicators for Monitoring and Evaluation

INPUTS (monitoring): Financial and physical resources

- track resources used in the intervention

- e.g., budget support for local service delivery

OUTPUTS (monitoring): Goods and services generated

- more local government services delivered

- e.g., textbooks, food delivered, roads built

OUTCOMES (evaluation): Access, usage and satisfaction of users

- e.g., school attendance, vaccination rates

- food consumption, number of mobile phones

IMPACT (evaluation): Effect on living standards

- better welfare impacts (e.g., literacy, health)

- increase in participation, happiness

Page 6: Gilligan quantitative impact eval methods

II. How do you design an impact evaluation?

The central problem of impact evaluation

• Want to measure the impact of a program or “treatment” on outcomes

• How do we know measured impacts are due to the program?

• If we want to claim that the impacts observed are causal, we need an 'identification strategy': a way to attribute the observed effects to the program and not to other factors

Page 7: Gilligan quantitative impact eval methods

II. How do you design an impact evaluation?

Designing the impact evaluation

• Measure impact by comparing outcomes in households exposed to the treatment to what those outcomes would have been without that exposure—the counterfactual

• Problem: you cannot observe the counterfactual because program beneficiaries receive the treatment

• Need to construct a comparison group from nonbeneficiaries

• Comparison group makes it possible to control for other factors that affect the outcome

Ex: IFPRI evaluated the effect of Ethiopia's public works program (PSNP) on food consumption, but food prices rose at the same time; the comparison group removes the effect of rising prices from the impact estimates

Page 8: Gilligan quantitative impact eval methods

Suppose we observe an increase in outcome Y for beneficiaries over time after an intervention

Figure: the observed outcome rises from Y0 at baseline (t0) to Y1 at follow-up (t1) after the intervention.

Page 9: Gilligan quantitative impact eval methods

To measure impact, we need to remove the counterfactual from the observed outcome

Figure: the comparison group traces the counterfactual outcome, which ends at Y1* at follow-up (t1), below the observed Y1. Impact = Y1 - Y1*.

Page 10: Gilligan quantitative impact eval methods

What You Can Miss Without a Comparison Group

Figure: Impact of school feeding on anemia prevalence among girls age 10-13 (anemic = hemoglobin < 11 g/dL). Anemia prevalence (%) is shown in Round 1 and Round 2 for the school feeding (SFP), take-home rations (THR), and control (CTR) groups. Prevalence rose by about 13.9 percentage points in the control group while falling in both treatment groups, giving impacts of -19.2 percentage points for SFP and -17.2 percentage points for THR.

Page 11: Gilligan quantitative impact eval methods

Constructing a Comparison Group

Suppose we want to measure the impact of public works on household food security (calorie consumption)

Q: Why not compare average calorie consumption of PW beneficiaries to average calorie consumption of randomly selected nonbeneficiaries?

A: On average, nonbeneficiaries are different from beneficiaries in ways that make them an ineffective comparison group

Need to correct for pre-program differences between beneficiaries and nonbeneficiaries

• Beneficiaries are usually poorer; they also decided to participate

• If you don't control for this, impact estimates are biased

Page 12: Gilligan quantitative impact eval methods

Impact Evaluation Methodologies

Ways of constructing a control or comparison group

Randomization

Matching (including propensity score matching, covariate matching)

Regression discontinuity design (RDD)

Instrumental variables

Difference-in-differences

Page 13: Gilligan quantitative impact eval methods

Impact Evaluation Methodologies

Randomization

• Randomly assign communities or households into treatment and control groups before the program for the purpose of evaluation

random assignment makes it likely that treatment and control communities have identical characteristics on average at baseline

for safety nets, usually randomize at the community level

• Common approach: use phased rounds of program implementation and randomly decide which communities enter the program in each round

• Example of randomization from N. Uganda school feeding study
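
A minimal sketch of the phased randomization approach described above, written in Python for illustration; the community names, number of phases, and seed are hypothetical, not details from the Uganda study:

```python
import random

def assign_phases(communities, n_phases=2, seed=12345):
    """Randomly assign communities to program entry phases.
    Communities assigned to the last phase act as the control group
    until the program reaches them."""
    rng = random.Random(seed)      # fixed seed makes the draw reproducible and auditable
    shuffled = list(communities)   # copy so the original list is left untouched
    rng.shuffle(shuffled)
    phase_size = max(1, len(shuffled) // n_phases)
    return {
        community: min(i // phase_size, n_phases - 1)  # last phase absorbs any remainder
        for i, community in enumerate(shuffled)
    }

# Hypothetical example: 6 communities, phase 0 enters first, phase 1 serves as control for now
print(assign_phases(["A", "B", "C", "D", "E", "F"], n_phases=2))
```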

Page 14: Gilligan quantitative impact eval methods

Impact Evaluation Methodologies

Randomization

• How do you justify having a control group?

Justified if program cannot reach all communities at once

Some communities are always excluded

Main difference between control group and other nonbeneficiaries is that you interview the control group

Ex: transparency in the Nicaragua RPS evaluation, where randomization was done in public with media and politicians present

• There is consensus that a randomized-out control group provides the best estimate of counterfactual outcomes

Results of a good randomized evaluation will be convincing to everyone: you have solid evidence of the impact of the program

Page 15: Gilligan quantitative impact eval methods

Impact Evaluation Methodologies

Matching

• Match beneficiary and nonbeneficiary households by characteristics observed in a survey

• Estimate impact as the difference in weighted average outcomes between beneficiaries and matched nonbeneficiaries

• Propensity score matching matches households on estimated probability of being in the program

• With matching, the quality of the evaluation depends heavily on the quality of the data: not as convincing as randomization
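
A minimal sketch of propensity score matching, assuming household data in numpy arrays (X = baseline covariates, treated = 0/1 participation, y = outcome); the one-to-one nearest-neighbor rule and variable names are illustrative, not the estimator from any specific IFPRI evaluation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def psm_att(X, treated, y):
    """One-to-one nearest-neighbor propensity score matching.
    Returns the average treatment effect on the treated (ATT)."""
    # Step 1: estimated probability of being in the program (the propensity score)
    pscore = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

    treat_idx = np.where(treated == 1)[0]
    comp_idx = np.where(treated == 0)[0]

    # Step 2: match each beneficiary to the nonbeneficiary with the closest score
    gaps = []
    for i in treat_idx:
        j = comp_idx[np.argmin(np.abs(pscore[comp_idx] - pscore[i]))]
        gaps.append(y[i] - y[j])

    # Step 3: ATT is the mean outcome gap across matched pairs
    return float(np.mean(gaps))
```

In practice, one would also check the overlap in estimated scores between groups, as in the kernel density figure on the next page.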

Page 16: Gilligan quantitative impact eval methods

Propensity Score Matching

Figure: kernel density of the estimated propensity score (0 to 1) by treatment status, plotted separately for beneficiaries and non-beneficiaries to show the overlap between the two groups.

Page 17: Gilligan quantitative impact eval methods

Impact Evaluation Methodologies

Many of the projects being presented here may be able to rely on matching methods for their evaluation

• Need detailed data from the baseline or on variables that change very little over time (e.g., adult education level)

Tips on Using Propensity Score Matching

• Need variables that are correlated with the outcome and with the treatment

• Comparison households should come from the same community as treated households if possible; otherwise include many community-level variables
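
A simple diagnostic that complements these tips is to check covariate balance after matching. A rough sketch, assuming X_treated and X_matched hold the same baseline covariates for beneficiaries and their matched comparisons (hypothetical names):

```python
import numpy as np

def standardized_differences(X_treated, X_matched):
    """Standardized mean difference for each baseline covariate between
    beneficiaries and their matched comparison households.
    Values close to zero suggest matching balanced that characteristic."""
    mean_t = X_treated.mean(axis=0)
    mean_c = X_matched.mean(axis=0)
    pooled_sd = np.sqrt((X_treated.var(axis=0) + X_matched.var(axis=0)) / 2.0)
    pooled_sd = np.where(pooled_sd == 0, 1.0, pooled_sd)  # avoid dividing by zero
    return (mean_t - mean_c) / pooled_sd
```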

Page 18: Gilligan quantitative impact eval methods

Impact Evaluation Methodologies

Regression Discontinuity Design (RDD)

If program eligibility is based on a threshold for some characteristic (e.g., a poverty index), compare outcomes for households just above and just below the threshold

More useful for poverty programs targeted on easily observable and measurable criteria (e.g., poverty score, proxy means score, food insecurity score)
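
A minimal sketch of a sharp RDD estimate in Python, assuming households scoring below the cutoff are eligible; the variable names, bandwidth, and linear fit are illustrative only:

```python
import numpy as np

def rdd_impact(score, y, cutoff, bandwidth):
    """Sharp regression discontinuity: the impact is the jump in the outcome
    at the eligibility cutoff, estimated from separate linear fits on
    observations just below and just above it."""
    below = (score >= cutoff - bandwidth) & (score < cutoff)   # assumed eligible side
    above = (score >= cutoff) & (score <= cutoff + bandwidth)  # assumed ineligible side

    # Linear fit of the outcome on the score within the bandwidth on each side
    fit_below = np.polyfit(score[below], y[below], 1)
    fit_above = np.polyfit(score[above], y[above], 1)

    # Predicted outcomes at the cutoff from each side; the gap is the estimated impact
    return np.polyval(fit_below, cutoff) - np.polyval(fit_above, cutoff)
```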

Page 19: Gilligan quantitative impact eval methods

How RDD Measures Impact

Before start of the program

Figure: Pr(complete secondary school) plotted against the poverty score (roughly 20 to 45).

Page 20: Gilligan quantitative impact eval methods

How RDD Measures Impact

After the program

Figure: Pr(complete secondary school) plotted against the poverty score (roughly 20 to 45), shown separately for beneficiaries and nonbeneficiaries.

Page 21: Gilligan quantitative impact eval methods

How RDD Measures Impact

After the program

Figure: Pr(complete secondary school) plotted against the poverty score for beneficiaries and nonbeneficiaries; the vertical gap between the two lines at the eligibility threshold is the estimated IMPACT.

Page 22: Gilligan quantitative impact eval methods

Example of RDD from El Salvador RPS Evaluation

Figure 4. Change in enrollment rate of 7-12 year olds from 2006-2007 by distance from implied cluster threshold, 2006 and 2007 entry groups

Source: Impact Evaluation Survey Data, 2008


Page 23: Gilligan quantitative impact eval methods

Difference-in-Differences (DID)

Using any evaluation method, measure outcomes before and after the program begins to obtain “difference-in-differences” (DID) impact estimates

Impact = (T1-T0)-(C1-C0), where T0 and T1 are mean outcomes for the treatment group at baseline and follow-up, and C0 and C1 are the corresponding means for the comparison group
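
A small numeric illustration of the formula, with made-up group means (not taken from any evaluation):

```python
# Hypothetical mean outcomes (e.g., daily calories per adult equivalent); not from any study
T0, T1 = 1800.0, 2100.0   # treatment group: baseline, follow-up
C0, C1 = 1850.0, 2000.0   # comparison group: baseline, follow-up

# Difference-in-differences: change for the treated minus change for the comparison group
impact = (T1 - T0) - (C1 - C0)
print(impact)  # 300 - 150 = 150 extra calories attributed to the program
```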

Page 24: Gilligan quantitative impact eval methods

Cost Effectiveness

Comparisons of programs should focus on cost effectiveness.

• Cost effectiveness is most relevant for policy: Which program has the biggest impact per dollar spent?

• Impact evaluation methodology focuses on measuring program benefits—one side of cost effectiveness.

Would need to add a cost study similar to Caldés, Coady and Maluccio, IFPRI, 2004.
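
A trivial illustration of comparing two hypothetical programs on impact per dollar spent (all numbers invented):

```python
# Hypothetical impacts and costs, purely for illustration
programs = {
    "Program A": {"impact": 10.0, "cost_per_beneficiary": 40.0},  # e.g., percentage-point change
    "Program B": {"impact": 8.0,  "cost_per_beneficiary": 25.0},
}

for name, p in programs.items():
    # Cost effectiveness: impact achieved per dollar spent per beneficiary
    print(name, round(p["impact"] / p["cost_per_beneficiary"], 3))
```

A program with the smaller impact can still be more cost effective if it is cheap enough per beneficiary.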