
Gold or Bronze? Methodological reflections on impact evaluation

Renewing Health Concertation Meeting
Brussels, 9 September 2011

Cristiano Codagnone
JRC – IPTS – IS Unit

The views expressed in this presentation are those of the author and do not necessarily reflect the position of the European Commission. Neither the Commission nor any person acting on behalf of the Commission can be held responsible for the use made of this presentation.


Mapping questions to approaches

Question: Did we do what we planned? To what extent? Where did we do it and where did we not?
Approach: Evaluation as control; implementation effectiveness = expected output minus actual output (zero means full effectiveness). Within the domain of M&OE; mostly about data and indicators.

Question: Could we produce the same output spending less? Who spent less, who spent more, and why?
Approach: Evaluation as control; implementation efficiency = output/input. If the intervention runs on multiple sites, M&OE data can be used for multivariate statistical analysis of input efficiency (e.g. Data Envelopment Analysis).

Question: Did we produce an effect contributing to the target outcome that can be attributed solely to our intervention?
Approach: Impact evaluation with a counterfactual approach; outcome effectiveness = proven treatment effect.

Question: Was the produced effect worth the costs?
Approach: Impact evaluation with a counterfactual approach; outcome cost-effectiveness = proven treatment effect net of costs (or an estimate of unit costs obtained as a net effect of the intervention through the counterfactual approach; for instance, the cost-per-QALY threshold used in the UK by NICE). A worked sketch follows.
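To make the cost-per-QALY logic concrete, here is a minimal sketch in Python; all figures (costs, QALYs, and the threshold) are invented for illustration and are not taken from NICE or from any study.

```python
# All figures below are invented for illustration, not real programme data.
cost_treated, cost_control = 12_000.0, 9_000.0   # mean cost per patient
qaly_treated, qaly_control = 6.3, 6.1            # mean QALYs per patient

# Incremental cost-effectiveness ratio: extra cost per extra QALY gained
icer = (cost_treated - cost_control) / (qaly_treated - qaly_control)

THRESHOLD = 30_000.0  # illustrative willingness-to-pay per QALY gained
verdict = "cost-effective" if icer <= THRESHOLD else "not cost-effective"
print(f"ICER: {icer:,.0f} per QALY -> {verdict}")
```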


Back to basics (1/6)

• Mr. Smith suffers from a headache and takes an aspirin.

• What happens after he takes the aspirin: Y1

• What would have happened had he not taken the aspirin: Y0 (the counterfactual)

• Causal effect under the counterfactual, 'successionist' concept of causation: Y1 – Y0, with ways to deal with the fact that Y0 is not observable (see the sketch below).
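A minimal sketch of the potential-outcomes idea in code, with invented numbers: each person carries two potential outcomes, but only the one matching the actual treatment is ever observed.

```python
import numpy as np

rng = np.random.default_rng(0)
y0 = rng.normal(60, 5, 5)          # headache intensity without aspirin
y1 = y0 - 20                       # ... with aspirin: 20 points of relief for all
t = np.array([1, 0, 1, 1, 0])      # who actually took the aspirin

y_observed = np.where(t == 1, y1, y0)   # the only column we ever see
# The individual effect y1 - y0 is well defined, yet never observable
# for any single person: the other potential outcome is always missing.
print(y_observed)
```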


Back to basics (2/6)

• Let us look at the question in formalised but simplified notation, first using a typical regression equation for estimation:

• Yi = αXi + βTi + εi (1)   where:

• T is a dummy variable equal to 1 for individuals receiving the treatment and 0 for those not receiving it;

• X denotes the observed characteristics, generally referred to as 'a bunch of Xs' (confounders);

• ε is an error term reflecting unobserved characteristics that also affect Y (simulated in the sketch below).
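A minimal simulation of equation (1), assuming for now that T is assigned at random so that cov(T, ε) = 0; all parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
alpha, beta = 0.5, 2.0                 # true parameters (hypothetical)

x = rng.normal(size=n)                 # observed confounder X
t = rng.integers(0, 2, size=n)         # random assignment: T independent of eps
eps = rng.normal(size=n)               # unobserved characteristics
y = alpha * x + beta * t + eps         # equation (1)

# OLS via least squares on the design matrix [1, X, T]
design = np.column_stack([np.ones(n), x, t])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coef)  # last entry estimates beta, close to 2.0
```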


Back to basics (3/6)

• The problem with estimating this equation springs from two facts:
– the intervention can entail a purposive definition of who is eligible for it and who is not;
– individuals can self-select into the intervention.

• Both are sources of unobservables feeding the error term ε, which ends up containing variables correlated with the dummy variable T; in formal notation, cov(T, ε) ≠ 0.

• This, however, is a clear violation of the orthogonality assumption of ordinary least squares regression: the independence of the regressors from the error term.

• Without further elements addressing this problem, the causal parameter β cannot be estimated (the sketch below shows the resulting bias).
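Rerunning the same simulation with self-selection instead of random assignment makes the violation concrete: once T depends on the unobserved term, cov(T, ε) ≠ 0 and OLS no longer recovers β (again with invented parameters).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
alpha, beta = 0.5, 2.0                            # true parameters (hypothetical)

x = rng.normal(size=n)
eps = rng.normal(size=n)                          # unobserved characteristics
t = (eps + rng.normal(size=n) > 0).astype(float)  # self-selection driven by eps
y = alpha * x + beta * t + eps                    # equation (1) again

design = np.column_stack([np.ones(n), x, t])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coef[2])  # well above 2.0: the estimate absorbs the selection bias
```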


Back to basics (4/6)

• Now let us look at the same problem using the more conceptual notation of equation (2):

• D = E(Yi(1) | Ti = 1) – E(Yi(0) | Ti = 0) (2)   where:

• E indicates the average (a linear operator); Yi is the general outcome variable; for treated individuals (Ti = 1) the value of Yi under treatment is written Yi(1), while for non-treated individuals (Ti = 0) it is written Yi(0).

• In (2), the average effect D of the intervention is posited as equal to the difference between the average outcomes of the treated and the non-treated.

• But treated and non-treated might not be similar prior to the intervention, so the difference above cannot be attributed solely to the intervention (see the sketch below).
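In code, equation (2) is just a difference of group means; a sketch with invented numbers shows how self-selection contaminates it.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
y0 = rng.normal(50, 10, n)                          # outcome without treatment
y1 = y0 + 2.0                                       # true treatment effect = 2
t = (y0 + rng.normal(0, 10, n) > 50).astype(bool)   # high-y0 units opt in

d = y1[t].mean() - y0[~t].mean()                    # equation (2)
print(d)  # well above 2: the groups were not similar to begin with
```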


Back to basics (5/6)

• Now let us add and subtract in equation (2) (which changes nothing) the expected outcome the treated would have had without treatment, E(Yi(0) | Ti = 1):

• D = E(Yi(1) | Ti = 1) – E(Yi(0) | Ti = 0) + [E(Yi(0) | Ti = 1) – E(Yi(0) | Ti = 1)]

• or, rearranging: D = [E(Yi(1) | Ti = 1) – E(Yi(0) | Ti = 1)] + [E(Yi(0) | Ti = 1) – E(Yi(0) | Ti = 0)]

• So D = ATE + B

• ATE is the average treatment effect (strictly, the average effect on the treated) and B is the selection bias. Because one does not know E(Yi(0) | Ti = 1), one cannot calculate the magnitude of the selection bias (the sketch below verifies the decomposition on simulated data, where Yi(0) is known for everyone).
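Because the data in such a sketch are simulated, the normally unobservable E(Yi(0) | Ti = 1) is available, and the decomposition D = ATE + B can be checked directly (continuing the hypothetical setup above).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
y0 = rng.normal(50, 10, n)
y1 = y0 + 2.0                                       # true effect = 2 (hypothetical)
t = (y0 + rng.normal(0, 10, n) > 50).astype(bool)   # self-selection as before

d = y1[t].mean() - y0[~t].mean()                    # naive difference, equation (2)
ate = (y1[t] - y0[t]).mean()                        # average effect on the treated
b = y0[t].mean() - y0[~t].mean()                    # B = E[Y0|T=1] - E[Y0|T=0]
print(d, ate + b)                                   # identical: D = ATE + B
```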


Back to basics (6/6)

• As a result, if one does not know how much of D is made up of selection bias, one can never know the true difference in outcomes between the treated and the control groups.

• Perfect randomisation solves the selection-bias problem because it ensures comparability between treated and non-treated: the two groups would have the same expected outcome in the absence of the intervention.

• This means that E[Yi(0) | Ti = 1] equals E[Yi(0) | Ti = 0], which, applied to the equation above, makes B equal to zero (checked numerically in the sketch below).

• This is the main reason why experimental evaluation designs based on randomisation have long been considered the gold standard.
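Under random assignment, the same computation gives B ≈ 0 and the naive difference recovers the true effect; relative to the previous sketch, only the assignment rule changes.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
y0 = rng.normal(50, 10, n)
y1 = y0 + 2.0                               # true effect = 2 (hypothetical)
t = rng.integers(0, 2, n).astype(bool)      # randomisation, independent of y0

d = y1[t].mean() - y0[~t].mean()            # equation (2) under an RCT
b = y0[t].mean() - y0[~t].mean()            # selection bias term
print(d, b)                                 # d close to 2, b close to 0
```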


To RCT or Not to RCT (1/X)

• Since Fisher (1926), randomisation has been the gold standard:
– Experimental (control/manipulability of the selection process)
– Treated and non-treated are equivalent in all relevant respects except exposure to the intervention
– Selection bias is eliminated and the internal validity of the estimates is ensured

• Berk, 'Randomized experiments as the bronze standard', Journal of Experimental Criminology, 2005, 1: 417-433:
– RCTs balance treatment and control groups on potential confounders to produce unbiased estimates of causal effects; no other research design confers this benefit so reliably
– Random assignment balances the two groups, on average, on all possible confounders, which means the estimate of the treatment effect can be unbiased
– Average effects substitute for individual effects: it is unlikely that every unit would have precisely the same response to an intervention, so scientifically it makes sense to look at averages
– Both critics and supporters are biased: expectations about RCTs are unrealistic


Gold or bronze

• Yet there is a catch. You must meet the condition of no interference, or SUTVA: the treatment condition to which a unit is assigned must have no impact on the response of any other unit.

• Policy problem: the key is to manipulate unambiguously; if the response depends on the pattern of assignment, even a random one, you lose control.

• If SUTVA is violated, there is no statistical fix.

• RCTs offer a number of benefits (easy to explain, face validity, easy to analyse), but only if these conditions are met, and they rest on important assumptions.


Gold or bronze?

“Randomised experiments rest on more complicated, subtle, and fragile foundations than some researchers appreciate. Proper implementation of randomized experiments is demanding. Textbook requirements are rarely met. Thus, randomized experiments are not the gold standard. But if the truth be told, there is no gold standard.” (Berk, 2005)


Contents

Conclusions


Crossroads One

• Are Renewing Health et al. interested only in M&OE?

– IF YES, THEN: MAST and the current work are more than enough; the implementation-readiness assessment is straightforward; mostly 'yes' answers to the brainstorming questions.

– IF NO (ALSO IE), THEN: a lot is still missing; choices are needed; the answers to the brainstorming questions change radically.

• Impact Evaluation (IE) is not simply about data and indicators: one must define, identify, and estimate counterfactuals.

Steps and comments:

1. Hypothesis formulation: about science (and convention, imagination); the definition of hypothetical counterfactuals (what happens if I take the aspirin versus if I do not).

2. Identification: about mathematical analysis on a hypothetical large sample with no sampling variation.

3. Estimation (statistical inference): about methods (and data); depending on steps 1 and 2 and on knowledge of the selection process, causal effects are statistically inferred using different methods (e.g. randomisation, regression discontinuity, difference-in-differences, propensity score matching, instrumental variables). A sketch of one such method follows.

Source: Heckman, J., 'Econometric Causality', International Statistical Review, 2008, 76(1), pp. 1-27.
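As an illustration of step 3, here is a minimal difference-in-differences sketch, one of the methods listed above; all data are simulated for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5_000
effect = 1.5                                     # true treatment effect (hypothetical)

treated = rng.integers(0, 2, n).astype(bool)
baseline = rng.normal(10, 2, n) + 3 * treated    # groups already differ at t-1
y_pre = baseline + rng.normal(0, 1, n)           # outcome at t-1
y_post = baseline + 0.8 + effect * treated + rng.normal(0, 1, n)  # common trend 0.8

# DiD: change over time among the treated minus change among the non-treated
did = (y_post[treated].mean() - y_pre[treated].mean()) \
    - (y_post[~treated].mean() - y_pre[~treated].mean())
print(did)  # close to 1.5 despite the baseline gap between the groups
```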


Scoping an IE problem

• Which is the unit of analysis (i)?
– Individuals/families?
– Health and social care producing units?
– Administrative territorial units?
– All of them? That is, we have (i), (j), and (k) units of analysis.

• Which is the time unit (t)?
– When is t+1? (the first period after which the intervention should have had an effect)
– Is there also a t+2, t+3? (continuous treatment with longitudinal unfolding of effects)
– How do we define t-1? (the pre-intervention period for which baseline data are available)

• Which is/are the outcome variable(s) Y? (an issue not orthogonal to (i)) So, are we estimating Yit or Y(1-n)ijkt?

• Are the SUTVA (Stable Unit Treatment Value Assumption) conditions met?
– Only one form of treatment and of non-treatment for each unit
– No interference across units (no contagion effect)

• Is the CS (Common Support) condition (comparing the comparable) respected across time and space? (see the sketch below)
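A minimal sketch of a common-support check, assuming propensity scores estimated with a logistic regression; the data and the trimming rule are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 5_000
x = rng.normal(size=(n, 3))                          # observed unit characteristics
t = (x[:, 0] + rng.normal(size=n) > 0).astype(int)   # selection on one covariate

# Propensity score: estimated probability of treatment given X
ps = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]

# Common support: keep units whose score lies where the two groups overlap
lo = max(ps[t == 1].min(), ps[t == 0].min())
hi = min(ps[t == 1].max(), ps[t == 0].max())
on_support = (ps >= lo) & (ps <= hi)
print(f"{on_support.mean():.1%} of units lie on common support")
```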


The best is the enemy of the good

• Everything is possible; complex methodological solutions are available. The counterfactual IE set-up can be extended to situations where:
– time is individual-specific;
– units can be observed at two and possibly more different times;
– there are multiple treatment states, corresponding to different types of treatment or levels of treatment intensity, or where treatment is continuous.

• Yet these extensions are demanding, and increased technical complexity may come at the cost of internal or external validity (back to 'castles in the air'?).

• Counterfactual IE should be applied to at most two well-defined outcomes:
– the most strategic outcome aimed at (addressing the direst societal challenge): policy-driven;
– an outcome whose analysis would greatly advance our knowledge of what works: science-driven.

• This requires thinking bigger than simply indicators and data: make strategic choices, followed by theoretical and methodological reasoning.

• Avoid repeating the 'one mile wide but one inch deep' mistake over and over again (especially when time and budget constraints are tight).


Answers to some of the brainstorming questions

• Convergence of telecare and telehealth?
– Yes for M&OE, with limited obstacles
– For IE, it depends on the selection of the outcome variable(s)

• A single methodology independent of duration and technology used?
– Probably yes for M&OE
– Very challenging, if not impossible, for IE

• eReadiness self-assessment mandatory?
– Very feasible and unproblematic

• Is evaluation worth pursuing?
– Yes, as long as it also includes some IE

• What can be done within the current funding of running projects?
– Extensive M&OE (but less than the full list from MAST)
– Very selective and well-defined IE, recalling that the best is the enemy of the good