
Good Organizational Reasons for Bad Evaluation Research

Michael Hennessy
Prevention Research Center, Berkeley, CA

Michael J. Sullivan
Freeman, Sullivan & Company, Berkeley, CA

In his recent critique of evaluation research, Lipsey (1988) identified many flaws in evaluation practice: the generally nontheoretical approach in specifying the implicit causal model of social programs, the limitations of concept definition and measurement operationalization, the necessity for "black box" assumptions concerning the definition of the program treatment, the use of inadequate research designs to estimate treatment effects, and the general lack of implementation research. Not surprisingly, these failings result in outcome or effectiveness research that is weak in inferential power and dominated by Type I and Type II errors in unknown proportions. He characterizes this situation as evaluation research "malpractice" since it is only through "ignorance or negligence" that such a situation could exist (Lipsey, 1988, p. 22). Based on his overview, he makes four recommendations (pp. 23-24):

1. Do less quantitative-comparative studies and more process-implementation studies.

2. Interpret the results cautiously, carefully, and with all appropriate caveats, and plan ahead for a meta-analysis which uses the results of individual studies as input.

3. Develop and utilize more sophisticated methodological and conceptual tools.

4. Devise [and presumably enforce] new standards that evaluation research must meet.

Like his critique of current practice, his remedies are conventional; all but the first reflect advice ritualistically given to students of evaluation (Cook & Campbell, 1979; Green & Lewis, 1986; Rossi & Freeman, 1982), and the advocacy of more training or higher standards is commonly heard whenever the weaknesses of evaluation research are discussed. For example, "re-tooling" for the technically retarded researcher has been advocated as part of a proposal to restructure program evaluation/social policy work done in the United States (Berk, Boruch, Chambers, Rossi, & Witte, 1985), and an extremely detailed set of "higher standards" has been developed in a congressionally mandated study (Boruch, Cordray, Pion, & Leviton, 1983).

AUTHORS' NOTE: Preparation of this article was supported in part by the National Institute on Alcoholism and Alcohol Abuse Research Center Grant AA06282 to the Prevention Research Center, Pacific Institute for Research and Evaluation. Our thanks are given to Catherine Hagan Hennessy for comments on earlier drafts.

Lipsey's recommendations (as well as the tone of the other works cited earlier) imply a single cause for the apparent failure of evaluation research as commonly practiced: incompetent researchers. Lipsey and others suggest the same sort of self-reforming remedies: better training, more sophisticated analytic technologies, higher standards, more honesty, and less ego. In other words, if evaluation researchers were smarter, better trained, and more scientific, better evaluations would result. As Lipsey (1988, p. 6) summarized:

Without substantial improvement in our concepts, methods, and standards of practice, I see little hope for quantitative-comparative evaluation research to contribute usefully to the development and testing of planned intervention.

Lipsey's points define what we will call here the "academic critique" of evaluation research.¹ They are of interest to us less for their truth value than because they conflict rather dramatically with our own experiences in doing evaluation research in private settings (both profit and nonprofit). We do not believe that we measure down to the low standards that Lipsey imputes to most evaluation researchers, and we disagree with his diagnosis of the causes of poor evaluations.

To understand our position, it helps to contrast Lipsey's critique of evaluation with the critiques we hear from our clients, usually project managers or program staff. While this "client critique" consists of a predictable litany of shortcomings, it is quite different from Lipsey's. The critique most commonly includes the following: (a) objections to the "academic" imposition of some kind of systematic plan concerning the data collection and sampling strategies proposed for an evaluation, (b) the "unnecessary" use of conceptual terms and multidimensional outcome measures, (c) the "confusing use" of "sophisticated" data analysis techniques (e.g., contrast-coded dummy variable regression) when other methods (e.g., hundreds of bivariate cross-tabulations) are deemed to be more easily understood and more appropriate (see the sketch following this paragraph), (d) our perverse tendency to "highlight" the major weaknesses of the final product rather than its strengths, and (e) our insistence on a stance of professional neutrality (i.e., any demeanor less than enthusiasm) when it is widely assumed that the program under scrutiny is a paragon of management efficiency, cost effectiveness, and an exemplar of its kind.
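The following is a minimal, hypothetical sketch of the two analysis styles named in point (c): a regression using contrast-coded (sum-coded) dummies for a categorical variable, and a single bivariate cross-tabulation of the same data. The toy data set, the site and improved variable names, and the use of Python with pandas and statsmodels are illustrative assumptions, not drawn from the article or from any client project.

```python
# Hypothetical illustration: the same toy data analyzed two ways.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "site": rng.choice(["A", "B", "C"], size=n),   # hypothetical program sites
    "improved": rng.integers(0, 2, size=n),        # hypothetical binary outcome
})

# The "academic" style: linear model with sum (deviation) contrast coding,
# so each site coefficient is read as a departure from the grand mean.
contrast_model = smf.ols("improved ~ C(site, Sum)", data=df).fit()
print(contrast_model.summary())

# The style clients often find easier to read: one bivariate cross-tabulation.
print(pd.crosstab(df["site"], df["improved"], margins=True))
```

Nothing in the regression output is wrong, but the cross-tabulation communicates the same descriptive pattern with far less statistical apparatus, which is precisely the client's point.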

Our point is that the weaknesses of evaluation from the client's point of view are completely contradictory to those of the academic critique. Academics want more theory, clients want less. Academics want greater experimental control, clients want less. Academics want causal models of the process and multiple measurements, clients want less. Academics want copious caveats, clients want none. And so it goes. How can the academic critique of evaluation practice be so different from the client critique? It is almost as if the academics and the clients are not talking about the same thing.

TWO ORGANIZATIONAL MODELS OF EVALUATION

We believe that the major reason for this remarkable contradiction is that the academic critique is based on an inaccurate model (or metaphor) of decision making in organizations. Furthermore, it is the features of the organization, not the level of professional training or I.Q. of the evaluator, that are the important factors in determining evaluation quality. To clarify this, we present the implicit model of organization underlying the academic and client critiques of evaluation, and then show how such a contradictory criticism of evaluation practice is simultaneously possible.

The Academic Model of Organizational Decision Making

Underlying most of the academic critique of evaluation research is a model of "rational" decision making. That is, the explicit function of evaluation research is to provide for the accumulation of "findings" that are derived from a set of strategically planned studies that address specific research questions. The results of such studies are used to inform (not define) a pluralistic decision-making process that balances both organizational priorities and resources and produces rationally optimal (if not always economically optimal) decisions. In summary, the role of evaluation in the organization is to assist decision makers and managers to decide or adjudicate between competing, data-informed, logically related arguments (DeYoung & Conner, 1982).

This is a very common metaphor for the evaluation process and one that is implicitly assumed by Lipsey as well as virtually all "theorists of evaluation" (Shadish & Epstein, 1987) regardless of their methodological persuasion. However, the important features of the metaphor are those that are not stated directly. The rational decision-making model assumes, for example, that the organizational environment is changing slowly, if at all, has already produced clear program goals and stationary research questions, and has allocated sufficient time and resources to come to a reasoned balancing of all relevant information. In other words, the context of the evaluation is, at worst, assumed to be irrelevant and, at best, reflects the priorities and capabilities of an organizational "think tank" or a research and development (R&D) facility (Hennessy, 1982).

The Client Model of Organizational Decision Making

The organizational model of decision making implicit in the client critique is quite different. Our clients view decision making as resulting from the interaction of group coalitions within the organization that are affected quite nondeterministically by formal position and administrative responsibility. These coalitions are formed around perceived common interest areas that, while perhaps stable for long periods, are considered inherently unstable and transient.

Evaluation research in this model is primarily a flexible tool to respond to demands from other coalitions, protect resources already acquired, and expand the influence of existing coalitions if possible. In a specific context, the coalitions determine the research problem and utilize evaluation to identify the relevant aspects of the defined problem (which may be either positive or negative) and suggest possible responses to the issue. However, it is important to note that the role of evaluation is not purely cynical or self-serving: the evaluator is not just a hatchet or a public relations person. There is, after all, a legitimate need in all organizations for the type of information that evaluation research can provide, a need that program managers actively, if somewhat ruefully, recognize (Wye, 1989).

This model of decision making corresponds in part to a number of proposed alternatives to the rational decision-making model. Both a "garbage can" (March & Olsen, 1976; Martin, 1982) metaphor and a "public grazing area" (Crane, 1988, p. 469) metaphor seem particularly appropriate. Under the garbage can metaphor, the four elements of organizational decision making (problems, decision makers, opportunities for making choices, and solutions) are in continual flux within the organization. As part of the public grazing area metaphor, evaluation research defines the grazing area of resources and programs, self-defined coalitions of interest correspond to the "species mix" competing for these resources, and "farmers" are the decision makers or managers of the organization.

However, more important than the selection of the appropriate metaphor is the empirical assessment of which model best describes the environment and activities of evaluators in organizations, and how the models can be used to explain the contradictions between what the academics and clients see as the failures of evaluation research. We know of two studies that explicitly focus on the role of evaluators in organizations. Peck and Rubin (1983) reported on the evaluation role of the Office of Evaluation (OE) of Community Planning and Development (CPD). They found that the evaluation process was enmeshed in a morass of political considerations that produced evaluators who were "often required to ignore basic research principles," and that "straightforward technically sophisticated evaluation may not serve the best interests of anyone" (Peck & Rubin, 1983, pp. 688-689). They summarized their experience through the development of the Three Rules of Evaluation at OE:

1. Speed in development of a project is of the essence; revision and modification can be achieved after the project is underway. "Quick and dirty" research is legitimate.

2. It is vital to employ a foot-in-the-door strategy. Entry to as many program offices as possible will enhance the image of the evaluation unit while also displaying its utility to the organization.

3. Bulk size, appearance, and quantity of evaluation reports count more than the quality of the content.

In a different organizational context, Kennedy (1983) reviewed the evaluation practices of 16 school districts and developed a typology of evaluator roles: technician, participant, management facilitator, and independent observer. Only two districts had the "independent observer" as a type of evaluator, which is surprising given that this type conforms most rigidly to the evaluator's role under the assumptions of the academic critique and the rational decision-making model (i.e., scientifically neutral and purposively not involved in decision making). In fact, Kennedy suggests that the adaptation to the decision-making style in the school district is the most important factor in developing a successful evaluation practice:

Successful adaptation had two main effects. It assured continuing organizational support for the evaluation enterprise, and it often resulted in failure to meet the professional standards of the evaluator's role. However, none of these evaluators perceived their organizational contexts as compromising their professional obligations. The context merely reflected the client's needs and, in so doing, defined the evaluator's job. In the eyes of the evaluators, adaptation was not failure but success, and the unhappiest evaluators were those who could not adapt and therefore could not serve. (Kennedy, 1983, p. 540)

EXPLAINING THE CONTRADICTIONS BETWEEN THE CRITIQUES

In light of the foregoing, it is clear how the academic and client critiques of the identical evaluator behavior can be so different. The academic critique of current evaluation practice implies a particular model of the research process (characterized by the values of scientific neutrality, experimental control, and statistical adjustments for the "confounding variables" that remain) that is consistent with a rational decision-making environment and, not surprisingly, is extremely successful in universities and R&D environments. However, as both Peck and Rubin's (1983) and Kennedy's (1983) analyses of evaluation researchers in the field show, the application of the academic evaluation research model to a nonrational decision-making context that values goal flexibility, political criteria for assessing quality, and the adaptation of "scientific" methods to the study of policy alternatives finds evaluators caught between their professional training and the demands of their occupational responsibilities.

From the viewpoint of the academic critique, the compromises necessary to resolve the contradictions between the academic model of evaluation and the decision-making processes of most organizations produce evaluation research practices that are "incoherent" (Sherrill, 1984), "unscientific" (Crane, 1988), and riddled with "malpractice," "ignorance," and "negligence" (Lipsey, 1988). However, the evaluator's insistence on these professional standards in most organizational environments simultaneously produces a contradictory client critique emphasizing the "unrealistic," "academic," or "impractical" dimensions of evaluator practice. Thus, by taking the academic critique of evaluation seriously, the evaluator experiences the worst of both worlds: criticism from academics highlighting the low standards and obviously limited intelligence displayed in the evaluation products, and criticism from clients highlighting the overly structured or too academic approach to the evaluation tasks.

LIVING (AND WORKING) WITH THE CONTRADICTION

We see no direct solution to this contradiction between accepted standards of evaluator training and the way in which most organizations make decisions.² However, we do see a great deal of evaluation research (both published and unpublished) that is of high quality, relevant and persuasive, socially meaningful, and professionally fulfilling.³ How is it that any respectable evaluation research is done at all? The answer must be that some evaluators successfully achieve Kennedy's "adaptation" to their environment: They match the tasks of evaluation with the decision-making styles of their organization.

Therefore, the first task of the evaluator (either in-house or hired from the outside) is to identify the type of decision-making style that exists in the organization. The best way to determine whether an evaluation project operates under the rules of garbage can or rational decision making is to consider the extent of the project's identification with or connectedness to the range of issues and problems that coexist in the organization. Under the rational decision-making model, projects are linked with perceived problems for long periods, and the appropriateness of this connection is rarely questioned. When project purpose and organizational problems are congruent in this way, projects are designed as solutions to a set of particular problems, and the project manager directs the program toward implementing its own particular technical or administrative "solution."

In contrast, within a garbage can decision-making environment, the connections between organizational problems and project goals are quite fluid and subject to continual debate and redefinition. Such a situation can be diagnosed through different indicators. If there are frequent changes in management structure, rapid introduction (and equally quick obsolescence) of problem-specific jargon, management entreaties toward redoubled efforts toward a new goal, or claims of "innovative" solutions to new problems, it is likely that the organization is in a garbage can decision-making mode. A more project-specific indicator is the encouragement of the project manager to expand the purposes and redefine the goals of the project, often to other sectors of the company or to new functions that were unrelated to the original design. In other words, if the project manager presents the project as a good solution looking for some appropriate (but as yet unidentified) problem, it is likely that garbage can decision making is at work.

Once the decision-making style is identified, the appropriate style of evaluation ("academic" or "adaptive") needs to be implemented. In rational decision-making environments, the evaluator can carry out research that conforms closely to the academic ideal: Evaluation research is used to assess the change in the problem condition caused by the operation of the program, usually in the form of a summative or "impact" evaluation, and to act as a feedback mechanism for the project manager so that the project can be altered to become more effective compared with other projects that address the same problem. This comparison makes it possible, of course, that the project is judged relatively ineffective, and the evaluator's assessment may lead to project overhaul or abandonment. But this is perfectly consistent with a rational decision-making environment: ineffective project components are redesigned, and unsuccessful projects are ended and others take their place.

Things are quite different for the evaluator faced with a nonrational decision-making process. As noted earlier, academic evaluation methods combined with garbage can decision-making styles produce serious contradictions, and the typical recommendations for improvement offered by academic critics are irrelevant in this situation. No amount of even "higher standards," more "sophisticated" analytic technologies, or greater "professionalism" can resolve the contradictions between academic evaluation research and nonrational decision making. What is worse, most of the recommendations are actively harmful since they would exacerbate the inappropriateness of the academic's preferred evaluation methods and techniques when applied to a nonrational decision-making environment.

Thus evaluators trapped in the garbage can must adopt an "adaptive" approach to evaluation by redefining their role and (probably) renouncing most of their standard methodological training based on the experimental paradigm. However, this does not imply that evaluation skills play no part in these types of organizations. As an alternative to impact studies, research and evaluation skills can be brought to bear in such nonrational systems by concentrating on defining organizational problems and determining their prevalence. That is, the evaluator's emphasis should be on identifying problems that the project can be fruitfully connected with, rather than assessing the effectiveness of the project in reducing a range of possible problem conditions. This means that Lipsey's first recommendation for improving evaluation practice is absolutely correct when evaluators find themselves in garbage can organizations, not because his recommendation corresponds to the academic ideal, but because formative evaluations and problem prevalence studies are important tasks for evaluators assigned to projects that are solutions searching for problems to solve.

CONCLUSION

High-quality evaluation research is a product of the evaluator's ability to diagnose decision-making styles and to fashion a professional role that is appropriate to decision making in their organization. While the resultant adaptation may not conform to the academic definition of the evaluator's responsibilities and the ideal function of evaluation research in most organizations, we see this fact more as a reflection of a narrow, formalized, and technically oriented conceptualization of "evaluation research" on the part of the academic critics rather than an accurate assessment of the intellectual competence of evaluation researchers or the quality of professional training (Shadish & Epstein, 1987) available to them.

If we are correct, then the critics of current evaluation practice could more usefully re-orient their efforts from evaluation design issues, which are both well known to practicing evaluators and exasperatingly difficult to implement for the reasons discussed earlier, to the study of factors that aid in the integration of evaluation research and organizational decision making: from the technical criticism of evaluators to the critical study of organizational technologies for evaluating. This effort on the part of academics would do more to increase the quality of evaluation research in organizations than have most of the "how-to" manuals, workbooks, primers, and handbooks on evaluation research published during the last decade.

NOTES

1. While Lipsey claims to be speaking from the evaluation research "trenches," his fortifications seem ivory-lined.

2. This dilemma was one of the things that attracted us to applied research in the first place. It is the constant tension between incentives for "selling out" and opportunities for "doing good" that makes evaluation research so interesting.

3. Without undue modesty, we claim to have done some ourselves.


REFERENCES

Berk, R., Boruch, R., Chambers, D., Rossi, P., & Witte, A. (1985). Social policy experimentation: A position paper. Evaluation Review, 9(4), 387-429.
Boruch, R., Cordray, D., Pion, G., & Leviton, L. (1983). Recommendations to Congress and their rationale. Evaluation Review, 7(1), 5-35.
Cook, T., & Campbell, D. (1979). Quasi-experimentation. Chicago: Rand McNally.
Crane, J. (1988). Evaluation as scientific research. Evaluation Review, 12(5), 467-482.
DeYoung, D., & Conner, R. (1982). Evaluator preconceptions about organizational decision making. Evaluation Review, 6(3), 431-440.
Green, L., & Lewis, F. (1986). Measurement and evaluation in health education and health promotion. Palo Alto, CA: Mayfield.
Hennessy, M. (1982). The end of methodology? A review essay on evaluation research methods. Western Political Quarterly, 35(4), 606-612.
Kennedy, M. (1983). The role of the in-house evaluator. Evaluation Review, 7(4), 519-541.
Lipsey, M. (1988). Practice and malpractice in evaluation research. Evaluation Practice, 9(4), 5-24.
March, J., & Olsen, J. (1976). Ambiguity and choice in organizations. Bergen: Universitetsforlaget.
Martin, J. (1982). A garbage can model of the research process. In J. McGrath, J. Martin, & R. Kulka (Eds.), Judgment calls in research (pp. 17-39). Beverly Hills, CA: Sage.
Peck, D., & Rubin, H. (1983). Bureaucratic needs and evaluation research. Evaluation Review, 7(5), 685-703.
Rossi, P., & Freeman, H. (1982). Evaluation: A systematic approach. Beverly Hills, CA: Sage.
Shadish, W., & Epstein, R. (1987). Patterns of program evaluation among members of the Evaluation Research Society and Evaluation Network. Evaluation Review, 11(5), 555-590.
Sherrill, S. (1984). Toward a coherent view of evaluation. Evaluation Review, 8(4), 443-466.
Wye, C. (1989). Increasing client involvement in evaluation: A team approach. In G. Barkdoll & J. Bell (Eds.), Evaluation and the federal decision maker. San Francisco: Jossey-Bass.