
Forum

What Works in Program Evaluation

MICHAEL HENNESSY

ABSTRACT

The query “What works in program evaluation” can be answered on the basis of information already known to many evaluators. Documentation of the procedures needed to justify evaluative conclusions in summative evaluations of the impact of large, expensive social programs has led to four propositions proffered as reflective of what works in any such outcome evaluation.

Really, I’ve had it with all the arguing in the field. Thus, only relatively non-contentious comments are welcome. Please send them to the address above or use e-mail.

Applied researchers are always asking the question “What works?” in reference to their own particular interests. Twenty years ago this query was prominent in the areas of corrections and welfare reform. Later, the same question was asked of manpower training programs and programs for the educationally disadvantaged. Now, of course, identifying effective welfare reforms is once again an important policy issue, as are assessing proposals for restructuring health care financing and delivery and determining the empirical effects (both positive and negative) of affirmative action programs in promoting the educational and occupational advancement of minorities and women.

Only the most cynical view this concern with program effectiveness as pure self-interest on the part of researchers, program managers, funding agencies, politicians, or other stakeholders. While “insincere” policy analysis (Bozeman & Landsbergen, 1989) certainly exists, there is always a legitimate social and political interest in discovering which modalities of incarceration, occupational training, educational enhancement, medical service delivery, and the furthering of social equality are superior to (or, at least, no more harmful than) the status quo. Thus, “What works?” is a completely legitimate question to ask about social policies and the institutions that support them.

Curiously, the question of “What works?” is less often asked about the process of program evaluation itself.1

Michael Hennessy, PhD, MPH, Department of Sociology, 1555 Pierce Drive, Emory University, Atlanta, GA 30322; e-mail: [email protected].

Evaluation Practice, Vol. 16, No. 3, 1995, pp. 275-278. Copyright © 1995 by JAI Press, Inc.

ISSN: 0886-1633 All rights of reproduction in any form reserved.



This is an unfortunate situation since evaluation, like all applications of the scientific method, depends on the legitimacy of its procedures to justify its findings and produce credible conclusions. That is, to have faith in the outcomes of any evaluation, interested stakeholders must have prior faith in the evaluation design, data collection, and analysis processes that produce the conclusions.

I would additionally argue that the lack of emphasis on what works is doubly debilitating for evaluators because we actually do know “what works”: there is no need for a question mark in the title. In fact, program evaluation is one of the best documented fields of social science practice (perhaps exceeded in this only by the technical and procedural aspects of survey sampling and public opinion polling). Oddly enough, one explanation for this is that evaluation is a relatively new discipline and there are quite a few “evaluation practices” competing for intellectual and practical legitimacy. It is telling, for example, that all the founders of evaluation theory reviewed in Shadish, Cook, and Leviton (1991) are still practicing evaluators (or, by a less strict criterion, are still professionally active in their respective fields). Another reason for the documentation of evaluation practice is that it is typically adversarial in nature, and claims of superiority or inferiority (to which most evaluation studies can be reduced, like it or not) naturally produce enemies as well as allies. But both advocates and adversaries find it necessary to document and justify the procedures used to arrive at the evaluative conclusions.

The fact is, there is no real uncertainty about “What works” in evaluation. There are, however, philosophical dogmas, methodological preferences, organizational pressures (Hennessy & Sullivan, 1989), and particular institutional relationships (Moskowitz, 1993) that obscure the underlying unanimity about many important aspects of program evaluation that currently exists in the field. You doubt that statement? Then consider some propositions that summarize “What works” in program evaluation today. In my opinion, these propositions apply to any impact evaluation of large-scale, expensive social programs.

• Evaluators should be involved with program development and design at the initial stages, given appropriate arrangements to preclude bias. Where this is not possible and evaluators are brought in after programs and policies are in place, all “evaluation studies” should incorporate an initial evaluability assessment phase, the successful completion of which is the condition for additional evaluation expenditures, temporal or financial.

• Evaluators should generally only evaluate policies or programs that have explicit programmatic theory or those for which programmatic theory can be constructed and the relevant measures identified and operationalized. This is most critical in policy evaluations and process studies oriented toward program improvement, because to reform “welfare,” for example, it is necessary to have a model of the “welfare” process. Similarly, program improvement efforts are virtually impossible if components of the program and their interrelationships are not specified in advance.2 (A minimal sketch of such a specification follows this list.)

• While randomized experiments may be the purported “gold standard” in some scientific professions, when such approaches are not possible, evaluators should propose only quasi-experimental designs that are high in internal validity. Usually these will be either time series (Horn & Heerboth, 1982), “explicit selection” (Bell, Orr, Blomquist, & Cain, 1995; Trochim, 1984), or multiple comparison group quasi-experimental alternatives to classical randomized experiments.


• In all cases, evaluators should work strenuously to compensate for the deterioration of experimental control that may occur during implementation of randomized experiments, or for the problem of confounding variables in quasi-experimental designs, by use of strong measurements (qualitative or quantitative) of other causes of the outcomes of interest. In other words, more attention should be given to transforming “noise into signals” that can then be included in the subsequent analysis. Less attention should be paid to “design issues,” which are technically well understood but often intractable in program evaluation.3 (The second sketch following this list illustrates this kind of covariate-adjusted analysis.)
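
To make the second guideline concrete, consider a minimal, purely illustrative sketch of what an explicit programmatic theory looks like once components, intended outcomes, and operational measures are written down. The program, its components, and its measures are invented for the example; any comparable representation (a table, a logic-model diagram) would serve equally well.

# A minimal, hypothetical program theory ("logic model") for an imagined
# welfare-to-work program. Every component, outcome, and measure below is an
# illustrative assumption, not a claim about any actual program.

program_theory = {
    "job_search_assistance": {
        "intended_outcomes": ["employment_rate", "time_to_first_job"],
        "measures": {
            "employment_rate": "share of participants employed six months after exit",
            "time_to_first_job": "days from program exit to first verified hire",
        },
    },
    "child_care_subsidy": {
        "intended_outcomes": ["program_retention"],
        "measures": {
            "program_retention": "share of enrollees completing the full curriculum",
        },
    },
}

def unmeasured_outcomes(theory):
    """List outcomes the theory names but never operationalizes: the gap that
    makes a program difficult or impossible to evaluate."""
    gaps = []
    for component, spec in theory.items():
        for outcome in spec["intended_outcomes"]:
            if outcome not in spec["measures"]:
                gaps.append((component, outcome))
    return gaps

print(unmeasured_outcomes(program_theory))  # empty here; any entries flag trouble

Writing the theory down in this form also makes the evaluability question raised in the first guideline answerable before further evaluation time or money is spent.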

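As a second, purely illustrative sketch, here is the kind of analysis the last two guidelines point toward: an interrupted time-series model of a program outcome in which a measured “other cause” enters as a covariate, turning noise into signal. The data are simulated, and every variable name and effect size is an assumption made for the example.

import numpy as np

# Simulated monthly data for an imagined program that starts at month 24.
rng = np.random.default_rng(0)
n_months = 48
t = np.arange(n_months)
post = (t >= 24).astype(float)                   # 1 after the program begins
unemployment = 5 + rng.normal(0, 0.5, n_months)  # a measured "other cause"

# Outcome built from a secular trend, the covariate, and an assumed true
# program effect of -3.
y = 50 + 0.2 * t + 1.5 * unemployment - 3 * post + rng.normal(0, 1, n_months)

# Interrupted time-series regression: intercept, trend, program indicator,
# and the measured covariate that would otherwise be left as "noise."
X = np.column_stack([np.ones(n_months), t, post, unemployment])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"estimated program effect: {coef[2]:.2f}")    # should be near the true -3
print(f"estimated covariate effect: {coef[3]:.2f}")  # should be near the true 1.5

Omitting the unemployment column and refitting shows the cost of leaving a known cause unmeasured: the program-effect estimate becomes noisier, and in real data, where such causes are rarely independent of program exposure, it can also become biased.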
As stated earlier, these particular guidelines are oriented toward outcome or summative evaluation questions, but I see no reason why similar guidelines that summarize what works could not be proposed for formative and process evaluation purposes as well. More importantly, please note that none of these assertions about what works includes any requirement dictating a specific philosophy of science, a particular methodology or measurement strategy, an organizational location of the evaluator(s), or any peculiar motivation of the actors involved (e.g., Objective Scientist, Program Apologist, Bureaucratic Hatchet-Person, Conservative Budget-Cutter). In sum, whether the evaluation methods and underlying philosophy of science represent “first” generation, “fourth” generation, or “X” generation evaluation is irrelevant. The only issue is the appropriate link between the problem or object of the evaluation and the general guidelines given above for defining “What works.” Program evaluation as a scientific enterprise is methodologically neutral, and strenuous efforts should be made to maintain this neutrality and at the same time expand the range of methods and measurements used (Hennessy, 1982). And, no, these two dicta are not mutually exclusive or contradictory.

Readers of the above might respond by pointing out that these guidelines, if strictly applied, would probably reduce the number of outcome evaluations actually conducted. My brief answer is “Yes, but so what?” All evaluators know of some (perhaps many) evaluations that should never have been conducted due to lack of program theory, lack of measurable outcomes, or even lack of an operating program itself, although this knowledge is often disguised and denied for many worthy reasons (Worthen, 1995) and probably for some unworthy ones. But the practical implications of serious adherence to these guidelines are not the issue; the assertion here is not that all programs and policies should be evaluated, but that we know “What works” in program evaluation. In fact, adherence to these nonmethodological guidelines for evaluation would improve the quality of the large-scale impact evaluations that were done and the credibility of the evaluators who did them. Certainly this would be an unalloyed good. Still skeptical? Try a small thought experiment with the guidelines above and imagine doing an impact evaluation where none of them are followed. To my mind, that would be a quintessential example of a program evaluation that does not work.

NOTES

1. Actually, some evaluators might retort that it is the only question asked about evaluation, but these persons possibly spend too much time reading Evaluation Practice.

2. Which is not to say that evaluators are not useful in identifying program theory; they obviously are.

3. This is true of social science in general.


REFERENCES

Bell, S., Orr, L., Blomquist, J., & Cain, G. (1995). Program applicants as a comparison group in evaluating training programs. Kalamazoo, MI: W.E. Upjohn Institute.

Bozeman, B., & Landsbergen, D. (1989). Truth and credibility in sincere policy analysis: Alternative approaches for the production of policy-relevant knowledge. Evaluation Review, 13(4), 355-379.

Hennessy, M. (1982). The end of methodology? A review essay on evaluation research methods. Western Political Quarterly, 35(4), 606-612.

Hennessy, M., & Sullivan, M. (1989). Good organizational reasons for bad evaluation research. Evaluation Practice, 10(4), 41-50.

Horn, W., & Heerboth, J. (1982). Single case experimental designs and program evaluation. Evaluation Review, 6(3), 403-424.

Moskowitz, J. (1993). Why reports of outcome evaluations are often biased or uninterpretable. Evaluation and Program Planning, 16, 1-9.

Shadish, W., Cook, T., & Leviton, L. (1991). Foundations of program evaluation. Newbury Park, CA: Sage.

Trochim, W. (1984). Research design for program evaluation. Beverly Hills, CA: Sage.

Worthen, B. (1995). The unvarnished truth about logic-in-use and reconstructed logic in educational inquiry. Evaluation Practice, 16(2), 165-178.