Impact Evaluation: An Overview
Lori Beaman, PhD
RWJF Scholar in Health Policy
UC Berkeley
What is Impact Evaluation?
IE assesses how a program affects the well-being or welfare of individuals, households or communities (or businesses)
Well-being at the individual level can be captured by income and consumption, health outcomes or ideally both
At the community level, poverty levels or growth rates may be appropriate, depending on the question
Outline
Advantages of Impact Evaluation
Challenges for IE: Need for Comparison Groups
Methods for Constructing Comparison
IE Versus other M&E Tools
The key distinction between impact evaluation and other M&E tools is the focus on isolating the impact of the program from all other confounding effects
IE seeks to provide evidence of the causal link between an intervention and outcomes
Monitoring and IE
INPUTS: Financial and physical resources (e.g. spending in primary health care)
OUTPUTS: Goods and services generated (e.g. number of nurses, availability of medicine)
OUTCOMES: Access, usage and satisfaction of users (e.g. number of children vaccinated, percentage within 5 km of health center)
IMPACT: Effect on living standards and welfare (e.g. infant and child mortality, improved household income)
Monitoring and IE
[Diagram: INPUTS, OUTPUTS, OUTCOMES, IMPACTS in sequence. Labels: Gov’t/program production function; users meet service delivery; program impacts confounded by local, national, global effects; difficulty of showing causality.]
Logic Model: An Example
Consider a program providing Insecticide-Treated Nets (ITNs) to poor households
What are the inputs? Outputs? Outcomes? Impacts?
Logic Model: An Example
Inputs: # of ITNs; # of health or NGO employees to help dissemination
Outputs: # of ITNs received by HHs
Outcomes: ITNs utilized by # of households
Impact: Reduction in illness from malaria; increase in income; improvements in children’s school attendance and performance
Advantages of IE
To determine which projects are successful, we need a carefully designed impact evaluation strategy
This is useful for:
  Understanding if projects worked: justification for funding; scaling up
  Meta-analysis: learning from others
  Cost-benefit tradeoffs across projects
  Testing between different approaches of the same program, or different projects to meet a national indicator
Essential Methodology
The difficulty is determining what would have happened to the individuals or communities of interest in the absence of the project
The key component of an impact evaluation is constructing a suitable comparison group to proxy for the “counterfactual”
Problem: we can only observe people in one state of the world at one time
Before/After Comparisons
Why not collect data on individuals before and after the intervention (the reflexive comparison)? The difference in income, etc., would then be attributed to the project
Problem: many things change over time, including the project
  The country is growing and ITN usage is increasing generally (from 2000-2003 in NetMark data), so how do we know whether an increase in ITN use is due to the program or would have occurred in the absence of the program?
  Many factors affect the malaria rate in a given year
Example: Providing Insecticide-Treated Nets (ITNs) to Poor Households
The intervention: provide free ITNs to households in Zamfara
Program targets poor areas
Women have to enroll at the local NGO office in order to receive bednets
Starts in 2002, ends in 2003; we have data on malaria rates from 2001-2004
Scenario 1: we observe that the households in Zamfara to which we provided bednets show an increase in malaria from 2002 to 2003
[Figure: Basic Problem of Impact Evaluation: Scenario 1. Malaria rate (y-axis) by year, 2001-2004 (x-axis), with the treatment period marked. Line: Zamfara households with bednets. Points: A, C. Annotation: Impact = C - A? An increase in the malaria rate! Before/after comparisons underestimate the impact in a high rainfall year.]
[Figure: Basic Problem of Impact Evaluation: Scenario 1. Malaria rate (y-axis) by year, 2001-2004 (x-axis), with the treatment period marked. Lines: Zamfara households with bednets; “counterfactual” (Zamfara households if no bednets provided). Points: A, B, C. Annotation: Impact = C - B, a decline in the malaria rate! Impact ≠ C - A. Before/after comparisons underestimate the impact in a high rainfall year.]
[Figure: Basic Problem of Impact Evaluation: Scenario 2. Malaria rate (y-axis) by year, 2001-2004 (x-axis), with the treatment period marked. Lines: Zamfara households; “counterfactual” (Zamfara households if no bednets provided). Points: A, B, C. Annotation: TRUE impact = C - B; Impact ≠ C - A. Overestimated impact: bad rainfall.]
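The gap between the before/after estimate (C - A) and the true impact (C - B) can be made concrete with a small numeric sketch. All malaria rates below are hypothetical values chosen only for illustration (they do not come from the slides or from NetMark data): A is the pre-program rate, C the observed post-program rate, and B the counterfactual rate that we can never observe directly.

```python
def before_after_estimate(a, c):
    """Reflexive estimate: observed post-program rate minus pre-program rate."""
    return c - a


def true_impact(b, c):
    """Impact relative to the counterfactual rate B (what would have happened without nets)."""
    return c - b


# Hypothetical malaria cases per 100 people (illustrative numbers only).
# Scenario 1: a high rainfall year pushes malaria up for everyone.
a1, b1, c1 = 20, 35, 25
naive_1 = before_after_estimate(a1, c1)   # looks like the program raised malaria
actual_1 = true_impact(b1, c1)            # the program actually lowered it

# Scenario 2: malaria would have fallen even without the program.
a2, b2, c2 = 20, 15, 10
naive_2 = before_after_estimate(a2, c2)   # overstates the reduction
actual_2 = true_impact(b2, c2)

print("Scenario 1: before/after =", naive_1, "| true impact =", actual_1)
print("Scenario 2: before/after =", naive_2, "| true impact =", actual_2)
```

In the first case the reflexive comparison gets the sign wrong; in the second it gets the sign right but doubles the size of the effect.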
Comparison Groups
Instead of using before/after comparisons, we need to use comparison groups to proxy for the counterfactual
Two core problems in finding suitable groups:
  Programs are targeted: recipients receive the intervention for a particular reason
  Participation is voluntary: individuals who participate differ in observable and unobservable ways (selection bias)
Hence, a comparison of participants and an arbitrary group of non-participants can lead to misleading or incorrect results
Comparison 1: Treatment and Region B
Scenario 1: the reflexive comparison fails because of higher rainfall, and everyone experienced an increase in malaria rates
We compare the households in the program region to those in another region
We find that our “treatment” households in Zamfara have a larger increase in malaria rates than those in region B, Oyo. Did the program have a negative impact?
Not necessarily! Program placement is important: region B has better sanitation and is therefore less affected by rainfall (unobservable)
[Figure: Basic Problem of Impact Evaluation: Program Placement. Malaria rate (y-axis) by year, 2001-2004 (x-axis), with the treatment period marked; high rainfall. Line: “Treatment”: Zamfara. Points: A, D, E. Annotation: TRUE IMPACT: E - D.]
[Figure: Basic Problem of Impact Evaluation: Program Placement. Malaria rate (y-axis) by year, 2001-2004 (x-axis), with the treatment period marked. Lines: “Treatment”: Zamfara; Region B: Oyo. Points: A, B, C, D, E. Annotation: E - A > C - B because region B is less affected by rainfall; TRUE IMPACT: E - D. The impact is underestimated when using the region B comparison group in a high rainfall year.]
Comparison 2: Treatment vs. Neighbors
We compare “treatment” households with their neighbors. We think the sanitation and rainfall patterns are about the same.
Scenario 2: Let’s say we observe that treatment households’ malaria rates decrease more than comparison households’. Did the program work?
Not necessarily: there may be two types of households, types A and B, with type A households knowing how malaria is transmitted and also burning mosquito coils
Type A households were more likely to register with the program. However, their other characteristics mean they would have had lower malaria rates even in the absence of the ITNs (individual unobservables).
[Figure: Basic Problem of Impact Evaluation: Selection Bias. Comparing project beneficiaries (Type A) to neighbors (Type B). Malaria rates (y-axis) by year, Y1-Y4 (x-axis), with the treatment period marked. Lines: Type A HHs with project; Type B HHs. Annotation: observed difference.]
[Figure: Basic Problem of Impact Evaluation: Selection Bias. Malaria rates (y-axis) by year, Y1-Y4 (x-axis), with the treatment period marked. Lines: Type A HHs with project; Type A households; Type B HHs. Annotation: the observed difference is made up of the true impact and the selection bias.]
Participants are often different from non-participants
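The split of the observed difference into true impact and selection bias can be written out explicitly. The notation here is an assumption introduced for illustration and does not appear in the slides: Y^1 and Y^0 are the malaria rates a household would have with and without ITNs, and P = 1 marks program participants.

```latex
\begin{aligned}
\underbrace{E[Y^1 \mid P=1] - E[Y^0 \mid P=0]}_{\text{observed difference}}
&= \underbrace{E[Y^1 \mid P=1] - E[Y^0 \mid P=1]}_{\text{true impact on participants}} \\
&\quad + \underbrace{E[Y^0 \mid P=1] - E[Y^0 \mid P=0]}_{\text{selection bias}}
\end{aligned}
```

In the ITN example, the selection bias term is negative: Type A households would have had lower malaria rates than Type B households even without nets, so the observed difference overstates the reduction caused by the program.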
Basic Problem of Impact Evaluation: Spillover Effects
Another difficulty in finding a true counterfactual has to do with spillover or contagion effects
Example: ITNs will not only reduce malaria rates for those sleeping under nets, but may also lower overall rates because ITNs kill mosquitoes
Problem: children who did not receive “treatment” may also have lower malaria rates, and therefore higher school attendance rates
Generally leads to an underestimate of the treatment effect
[Figure: Basic Problem of Impact Evaluation: Spillover Effects. School attendance (y-axis) by year, 2001-2004 (x-axis), with the treatment period marked. Lines: “treatment” children; “control” group of children in neighborhood school. Points: A, B, C. Annotation: Impact = B - A; Impact ≠ B - C; C > A due to spillover from treatment children.]
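A short arithmetic sketch shows why spillovers shrink the estimated effect. The attendance figures are hypothetical and chosen only for illustration: A is attendance with no program anywhere, B is attendance of treated children, and C is attendance of the nearby control children, which is raised above A by spillover.

```python
def spillover_bias(counterfactual_a, treated_b, control_c):
    """Return (true impact, naive estimate, size of the underestimate)."""
    true_impact = treated_b - counterfactual_a   # Impact = B - A
    naive_estimate = treated_b - control_c       # what a B - C comparison reports
    return true_impact, naive_estimate, true_impact - naive_estimate


# Hypothetical school attendance rates in percent (illustrative only).
true_impact, naive_estimate, underestimate = spillover_bias(70, 85, 75)
print("true impact:", true_impact)
print("naive estimate:", naive_estimate)
print("underestimate:", underestimate)
```

The underestimate equals C - A, the benefit that leaked to the control group.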
Counterfactual: Methodology
We need a comparison group that is as identical as possible, in observable and unobservable dimensions, to those receiving the program, and that will not receive spillover benefits.
Number of techniques:
  Randomization as gold standard
  Various techniques of matching
How to construct a comparison group: building the counterfactual
1. Randomization
2. Difference-in-Difference
3. Regression discontinuity
4. Matching
  Pipeline comparisons
  Propensity score
1. Randomization
Individuals/communities/firms are randomly assigned into participation
Counterfactual: randomized-out group
Advantages:
  Often referred to as the “gold standard”: by design, selection bias is zero on average and the mean impact is revealed
  Perceived as a fair process of allocation with limited resources
Randomization: Disadvantages
Ethical issues, political constraints
Internal validity (exogeneity): people might not comply with the assignment (selective non-compliance)
External validity (generalizability): controlled experiments are usually run on a pilot, at small scale, so it is difficult to extrapolate the results to a larger population
Does not always solve the problem of spillovers
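The claim that randomization removes selection bias on average can be checked with a small simulation. This is a sketch under stated assumptions, not a model of any real program: the outcome model, the true effect of -5 cases per 100 people, and the unobserved household "trait" (for example, knowledge of how malaria is transmitted) are all hypothetical.

```python
import random
import statistics

TRUE_EFFECT = -5.0  # hypothetical effect of ITNs on malaria cases per 100 people


def outcome(trait, treated, rng):
    # Hypothetical outcome model: better-informed households (high trait)
    # have lower malaria rates whether or not they receive nets.
    return 30.0 - 4.0 * trait + (TRUE_EFFECT if treated else 0.0) + rng.gauss(0, 2)


def difference_in_means(assignments, rng):
    treated = [outcome(t, True, rng) for t, d in assignments if d]
    control = [outcome(t, False, rng) for t, d in assignments if not d]
    return statistics.mean(treated) - statistics.mean(control)


rng = random.Random(0)
traits = [rng.gauss(0, 1) for _ in range(10_000)]

# Voluntary enrollment: better-informed households are more likely to sign up.
self_selected = [(t, t + rng.gauss(0, 1) > 0) for t in traits]
# Random assignment: a coin flip that ignores household characteristics.
randomized = [(t, rng.random() < 0.5) for t in traits]

selected_est = difference_in_means(self_selected, rng)
randomized_est = difference_in_means(randomized, rng)
print(f"self-selected estimate: {selected_est:.2f}")
print(f"randomized estimate:    {randomized_est:.2f}")
```

Under these assumptions the self-selected comparison overstates the reduction in malaria, because enrolled households would have done better anyway, while the randomized comparison lands close to the true effect.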
When to Randomize
If funds are insufficient to treat all eligible recipients
  Randomization can be the most fair and transparent approach
The program is administered at the individual, household or community level
  Higher levels of implementation are difficult: example, trunk roads
Program will be scaled up: learning what works is very valuable
2. Difference-in-difference
Observations over time: compare observed changes in the outcomes for a sample of participants and non-participants
Identification assumption: the selection bias or unobservable characteristics are time-invariant (‘parallel trends’ in the absence of the program)
Counterfactual: changes over time for the non-participants
Diff-in-Diff: Continued
Constraint: requires at least two cross-sections of data, pre-program and post-program, on participants and non-participants
  Need to think about the evaluation ex-ante, before the program
  More valid if there are 2 pre-periods, so we can observe whether the trend is the same
Can in principle be combined with matching to adjust for pre-treatment differences that affect the growth rate
Implementing differences in differences: Different Strategies
Some arbitrary comparison group
Matched diff in diff
Randomized diff in diff
These are ordered from more problems to fewer problems; keep this in mind as we look at them graphically
Essential Assumptions of Diff-in-Diff
Initial difference must be time invariant
In absence of program, the change over time would be identical
[Figure: diff-in-diff diagram. Lines Y1 and Y0 over time from t=0 to t=1; labels: Impact, Y1*.]
Difference-in-Difference in ITN Example
Instead of comparing Zamfara to Oyo, compare Zamfara to Niger if:
  While Zamfara and Niger have different malaria rates and different ITN usage, we expect that they change in parallel
Use NetMark data to compare 2000 to 2003 in Zamfara and Niger states
Use additional data (GHS, NLSS) to compare incomes and sanitation infrastructure levels and changes prior to program implementation
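The diff-in-diff calculation itself is simple arithmetic: subtract the change in the comparison group from the change in the treatment group. A minimal sketch follows; the malaria rates for Zamfara and Niger are hypothetical numbers for illustration, not NetMark figures, and the function name is an assumption.

```python
def diff_in_diff(treat_pre, treat_post, comp_pre, comp_post):
    """Change in the treatment group minus change in the comparison group.

    Valid as an impact estimate only under parallel trends: without the
    program, both groups would have changed by the same amount.
    """
    change_treat = treat_post - treat_pre
    change_comp = comp_post - comp_pre
    return change_treat - change_comp


# Hypothetical malaria cases per 100 people (illustrative only).
zamfara_pre, zamfara_post = 30, 34   # treatment state: rises by 4
niger_pre, niger_post = 20, 29       # comparison state: rises by 9

impact = diff_in_diff(zamfara_pre, zamfara_post, niger_pre, niger_post)
print("diff-in-diff estimate:", impact)
```

Note that the two states start at different levels (30 versus 20). That initial gap does not bias the estimate as long as it would have stayed constant without the program.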
3. Regression discontinuity design
Exploit the rule generating assignment into a program given to individuals only above a given threshold. Assume there is a discontinuity in participation but not in counterfactual outcomes
Counterfactual: individuals just below the cut-off who did not participate
Advantages:
  “Identification” built into the program design
  Delivers marginal gains from the program around the eligibility cut-off point. Important for program expansion
Disadvantages:
  Threshold has to be applied in practice, and individuals should not be able to manipulate the score used in the program to become eligible
RDD in ITN Example
Program available for poor households
Eligibility criteria: must be below the national poverty line or have < 1 ha of land
Treatment group: those below the cut-off
  Those with income below the poverty line and therefore qualified for ITNs
Comparison group: those right above the cut-off
  Those with income just above the poverty line and therefore not eligible
RDD in ITN Example
Problems:
  How well enforced was the rule?
  Can the rule be manipulated?
  Local effect: may not be generalizable if program expands to households well above the poverty line
  Particularly relevant since NetMark data indicate low ITN usage across all socio-economic status groups
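The comparison around the cut-off can be sketched as a difference in mean outcomes between households just below and just above the poverty line. This is a deliberately simplified illustration: applied RDD work usually fits a regression on each side of the cut-off rather than comparing raw means, and the incomes, outcomes, cut-off and bandwidth below are all hypothetical.

```python
from statistics import mean


def rdd_local_difference(records, cutoff, bandwidth):
    """Mean outcome just below the cut-off (eligible) minus just above (not eligible).

    records: list of (income, outcome) pairs.
    """
    below = [y for x, y in records if cutoff - bandwidth <= x < cutoff]
    above = [y for x, y in records if cutoff <= x <= cutoff + bandwidth]
    if not below or not above:
        raise ValueError("no observations on one side of the cut-off within the bandwidth")
    return mean(below) - mean(above)


# Hypothetical (income, malaria cases per 100 people); poverty line set at 100.
records = [
    (80, 22), (90, 20), (95, 21), (98, 19),      # eligible for ITNs
    (102, 26), (105, 25), (110, 27), (130, 24),  # not eligible
]

estimate = rdd_local_difference(records, cutoff=100, bandwidth=10)
print("local estimate at the cut-off:", estimate)
```

Households at incomes 80 and 130 fall outside the bandwidth and are ignored, which is why the estimate is local: it says little about households far from the poverty line.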
4. Matching
Match participants with non-participants from a larger survey
Counterfactual: matched comparison group
Each program participant is paired with one or more non-participants that are similar based on observable characteristics
Assumes that, conditional on the set of observables, there is no selection bias based on unobserved heterogeneity
When the set of variables to match is large, often match on a summary statistic: the probability of participation as a function of the observables (the propensity score)
4. Matching
Advantages:
  Does not require randomization, nor baseline (pre-intervention) data
Disadvantages:
  Strong identification assumptions
  In many cases, may make interpretation of results very difficult
  Requires very good quality data: need to control for all factors that influence program placement
  Requires a sufficiently large sample size to generate the comparison group
Matching in Practice
Using statistical techniques, we match a group of non-participants with participants using variables like gender, household size, education, experience, land size, rainfall (to control for drought) and irrigation: as many observable characteristics not affected by the program intervention as possible
One common method: Propensity Score Matching
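The matching step of propensity score matching can be sketched as follows. The sketch assumes the propensity scores have already been estimated (in practice, typically with a logit or probit of participation on the observables); the scores, outcomes and function name are hypothetical. It uses nearest-neighbor matching with replacement, which is one of several possible matching rules.

```python
from statistics import mean


def nearest_neighbor_effect(participants, non_participants):
    """Average outcome gap between each participant and the closest non-participant.

    Both arguments are lists of (propensity_score, outcome) pairs.
    Matching is with replacement: a non-participant may be reused.
    """
    gaps = []
    for score, outcome in participants:
        _, matched_outcome = min(non_participants, key=lambda rec: abs(rec[0] - score))
        gaps.append(outcome - matched_outcome)
    return mean(gaps)


# Hypothetical (propensity score, malaria cases per 100 people).
participants = [(0.8, 18), (0.6, 22), (0.4, 25)]
non_participants = [(0.75, 24), (0.55, 26), (0.35, 30), (0.10, 35)]

effect = nearest_neighbor_effect(participants, non_participants)
print("matched estimate:", effect)
```

The non-participant with score 0.10 is never used because no participant resembles it. The estimate is only as good as the assumption that households with similar scores do not differ in unobserved ways.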
Matching in Practice: 2 Approaches
Approach 1: After program implementation, we match (within region) those who received ITNs with those who did not. Problem?
  Problem: households’ likelihood of usage is unobservable, so it is not included in the propensity score
  This creates selection bias
Approach 2: The program is allocated based on land size. After implementation, we match those eligible in region A with those in region B. Problem?
  Problems: same issues of individual unobservables, but lessened because we compare eligible to potentially eligible households
  Now there is a problem of unobservable factors across regions
An extension of matching: pipeline comparisons
Idea: compare those just about to get an intervention with those getting it now
Assumption: the stopping point of the intervention does not separate two fundamentally different populations
Example: extending irrigation networks
In ITN example: if only some communities within Zamfara receive ITNs in round 1, compare them to nearby communities that will receive ITNs in round 2
Difficulty with infrastructure: spillover effects or anticipatory effects may be strong