Page 1:

American Lessons on Designing Reliable Impact Evaluations, from Studies of WIA and Its Predecessor Programs

Larry L. Orr, Independent Consultant
Stephen H. Bell, Abt Associates
Jacob A. Klerman, Abt Associates

Page 2:

The early evaluations

• 1960s: MDTA (pre/post)

• 1970s:

– YEDPA (400+ studies; various methods)

– CETA (comparison groups from national survey samples)

• 1980s:

– National Academy review of YEDPA studies found “little reliable information on the effectiveness of the programs” and recommended random assignment

– More than a dozen CETA evaluations produced widely divergent impact estimates – with essentially the same data (Barnow, 1987)

– DOL-convened expert panel recommended random assignment for evaluation of new Job Training Partnership Act (JTPA)

Page 3:

Evaluating the econometric evaluations

• LaLonde (1986) and Maynard and Fraker (1987) applied a variety of nonexperimental methods to data from a randomized trial and were unable to replicate the experimental estimates

• Since then, a number of replication studies have been conducted (see summaries in Glazerman et al., 2003; Bloom et al., 2005; and Pirog et al., 2009). No nonexperimental method has consistently replicated experimental results.
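In these replication studies (often called within-study comparisons), “replicating the experimental results” has a concrete meaning. In notation of our own (not the authors’): let $\hat{\Delta}_{RCT}$ be the impact estimate from the randomized trial and $\hat{\Delta}_{NX}$ the estimate obtained from the same data when the randomized control group is replaced by a constructed comparison group. The implied bias of the nonexperimental method is

\[
\widehat{\text{bias}} \;=\; \hat{\Delta}_{NX} \;-\; \hat{\Delta}_{RCT}.
\]

The finding summarized above is that no method has kept this gap reliably close to zero.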

Page 4:

The current consensus

• No known nonexperimental method can reliably produce unbiased estimates of the impact of training programs – this means that you can never know ex post whether you have a good estimate or not

• Randomized trials are the strongly preferred method of estimating training program impacts on technical grounds

• Randomized trials are also more intuitively understandable to policy makers than complex econometric methods

• Nonexperimental studies frequently give rise to technical controversy that detracts from their credibility and acceptance, whereas randomized trials are generally accepted by both evaluators and policy makers

Page 5:

Why is it so hard to obtain reliable results from nonexperimental studies?

• “Impact” is the difference between trainees’ actual outcomes (e.g., earnings) and what those outcomes would have been without training

• The fundamental problem of evaluation is to estimate what the trainees’ outcomes would have been without training (restated in notation below)

• To see how difficult that task is, consider the time path of earnings for the JTPA control group – individuals who were just like the trainees except that they didn’t get JTPA services…
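The definition above can be written compactly in potential-outcomes notation (the notation is ours, not the slides’): let $Y_i(1)$ be trainee $i$’s earnings with training, $Y_i(0)$ the earnings the same person would have earned without training, and $D_i = 1$ indicate receipt of training. Then

\[
\Delta \;=\; E[\,Y_i(1)\mid D_i=1\,] \;-\; E[\,Y_i(0)\mid D_i=1\,].
\]

The second term is never observed for trainees; random assignment estimates it with the mean outcome of the control group, while nonexperimental methods must reconstruct it from other data, which is the difficulty the figures on the next slides illustrate.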

Page 6:

Time path of earnings, control group, National JTPA Study

[Line chart: control-group mean earnings (vertical axis: Earnings, 200 to 800) from 12 months before through 18 months after program entry; horizontal axis: Months After Program Entry.]

Page 7:

What is the margin for error?

[Line chart: mean earnings of the Treatment Group and the Control Group (vertical axis: Earnings, 200 to 800) from 12 months before through 18 months after program entry; horizontal axis: Months After Program Entry.]

Page 8:

Time path of earnings, program and comparison groups, from Heinrich et al.

Page 9:

Our Conclusions/Recommendations (1)

• Random assignment is the only safe way to estimate the impacts of training programs

– Different nonexperimental approaches yield widely varying results

– In dozens of replication studies, nonexperimental methods have almost never satisfactorily replicated the experimental estimates

– The stakes are too high to accept the risk and uncertainty entailed in nonexperimental methods

– Nonexperimental evaluations inevitably shift the debate from substance to method

Page 10:

Our Conclusions/Recommendations (2)

• If the ESF does decide to use nonexperimental methods:

– Need to pay close attention to the timing of job loss and the pre-program dynamics of earnings when matching the comparison group (a necessary, but not sufficient, condition; see the sketch below)

– Before adopting any nonexperimental method, it should be demonstrated that it replicates multiple experimental results (Note that what should be tested is an algorithm that can be applied in other evaluations, not a set of estimates that are unique to a single evaluation.)
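A minimal sketch of how such a replication test could be run, assuming invented data, invented column names, and a deliberately simple nearest-neighbor matching rule (none of this is the authors’ method or code): a comparison group is matched to trainees on pre-program monthly earnings, and the resulting nonexperimental estimate is compared with the experimental benchmark computed from the randomized control group in the same data.

```python
# Hypothetical illustration only; data, column names, and matching rule are invented.
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
PRE_COLS = [f"pre_{m}" for m in range(12, 0, -1)]  # earnings 12..1 months before entry

def fake_sample(group, n, dip):
    """Invented earnings histories; `dip` mimics applicants' pre-program earnings dip."""
    pre = rng.normal(600.0, 150.0, size=(n, 12)) - dip
    df = pd.DataFrame(pre, columns=PRE_COLS)
    df["post_earn"] = pre.mean(axis=1) + rng.normal(100.0, 150.0, size=n)
    df["group"] = group
    return df

df = pd.concat([
    fake_sample("trainee", 500, dip=200),     # applicants assigned to the program
    fake_sample("control", 500, dip=200),     # randomized-out applicants (benchmark)
    fake_sample("comparison", 2000, dip=0),   # external comparison pool, no earnings dip
], ignore_index=True)

def nonexperimental_estimate(df):
    """1-to-1 nearest-neighbor match (with replacement) on pre-program earnings."""
    trainees = df[df["group"] == "trainee"]
    pool = df[df["group"] == "comparison"]
    nn = NearestNeighbors(n_neighbors=1).fit(pool[PRE_COLS].to_numpy())
    _, idx = nn.kneighbors(trainees[PRE_COLS].to_numpy())
    matched = pool.iloc[idx.ravel()]
    return trainees["post_earn"].mean() - matched["post_earn"].mean()

def experimental_benchmark(df):
    """Treatment-control difference in mean follow-up earnings from the randomized trial."""
    means = df.groupby("group")["post_earn"].mean()
    return means["trainee"] - means["control"]

# The test: the matching algorithm is credible only if this gap is small
# across multiple experimental benchmarks, not just one dataset.
print("estimated bias =", nonexperimental_estimate(df) - experimental_benchmark(df))
```

Because the slide stresses that what should be validated is an algorithm rather than a single set of estimates, the same matching function would have to be rerun, unchanged, against several experimental benchmarks before being trusted.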

Page 11:

Our Conclusions/Recommendations (3)

Learn from our mistakes – don’t spend 40 years repeating them!

Page 12:

For copies of these slides, contact…

[email protected]