
1. Evaluation Use: Both Challenge and Mandate

The human condition: insidious prejudice, stultifying fear of the unknown, contagious avoidance, beguiling distortion of reality, awesomely selective perception, stupefying self-deception, profane rationalization, massive avoidance of truth—all marvels of evolution’s selection of the fittest. Evaluation is our collective effort to outwit these human propensities—when we choose to use it.

—Halcolm

On a cold November morning in Minnesota, some 15 coffee-clutching people in various states of wakefulness have gathered to discuss evaluation of a county welfare-to-work program. Citizen advisory board representatives are present; the county board and state representatives have arrived; and the internal evaluator is busy passing out handouts and setting up the PowerPoint presentation. We are assembled at this early hour to review the past year’s evaluation.

The evaluator begins by reviewing the problems getting started—fuzzy program goals, uncertain funding, incomplete program files, and software inadequacies. Data collection problems included staff resistance to doing additional paperwork, difficulty finding clients for follow-up interviews, and inconsistent state and county information systems. The evaluation was further hampered by management problems, staff turnover, unclear decision-making hierarchies, political undercurrents, trying to do too much, and an impossibly short timeline for reporting. Despite the problems, the evaluation had been completed and, putting the best face on a difficult situation, the evaluator explains that “the findings are tentative to be sure, but more than we knew a year ago.”

Advisory board members are clearly disappointed. One says, “The data just aren’t solid enough.” A county commissioner explains why Board decisions have been contrary to evaluation recommendations: “We didn’t really get the information we needed when we wanted it and it wasn’t what we wanted when we got it.” The room is filled with disappointment, frustration, defensiveness, cynicism, and more than a little anger. There are charges, countercharges, budget threats, moments of planning, and longer moments of explaining away problems. The chairperson ends the meeting in exasperation, lamenting: “What do we have to do to get evaluation results we can actually use?”

This book is an outgrowth of, and an answer to, that question.

Evaluation as Defined in the Encyclopedia of Evaluation

Evaluation is an applied inquiry process for collecting and synthesizing evidence that culminates in conclusions about the state of affairs, value, merit, worth, significance, or quality of a program, product, person, policy, proposal, or plan. Conclusions made in evaluations encompass both an empirical aspect (that something is the case) and a normative aspect (judgment about the value of something). It is the value feature that distinguishes evaluation from other types of inquiry, such as basic science research, clinical epidemiology, investigative journalism, or public polling. (Fournier 2005a:140)

Evaluation Use as a Critical Societal Issue

If the scene I have described were unique, it would merely represent a frustrating professional problem for the people involved. But if that scene is repeated over and over on many mornings, with many advisory boards, then the question of evaluation use would become what eminent sociologist C. Wright Mills called a critical public issue:

Issues have to do with matters that transcend these local environments of the individual and the range of his inner life. They have to do with the organization of many such milieux into the institutions of an historical society as a whole. . . . An issue, in fact, often involves a crisis in institutional arrangements (Mills 1959:8–9).

In my judgment, the challenge of using evaluation in appropriate and meaningful ways represents just such a crisis in institutional arrangements. How evaluations are used affects the spending of billions of dollars to fight problems of poverty, disease, ignorance, joblessness, mental anguish, crime, hunger, and inequality. How are programs that combat these societal ills to be judged? How does one distinguish effective from ineffective programs? And how can evaluations be conducted in ways that lead to use? How do we avoid producing reports that gather dust on bookshelves, unread and unused? These are the questions this book addresses, not just in general, but within a particular framework: utilization-focused evaluation.

But first, what are these things called “evaluations” that we hope to see used?


To evaluate something means determining its merit, worth, value, or significance. Program or project evaluations typically involve making the following kinds of judgments: How effective is the program? To what extent has the program been implemented as expected? Were the program’s goals achieved? What outcomes and results were achieved by the program? To what extent and in what ways did program participants benefit, if at all? What needs of participants were met? What unanticipated consequences resulted from the program? What are the strengths and weaknesses of the program, and how can it be improved? What worked and what didn’t work? What has been learned in this program that might be useful to other programs? To what extent do the benefits of the program provide sufficient value to justify the costs of the program? Should the program’s funding be maintained as is, increased, or decreased? Evaluations, then, typically describe and assess what was intended (goals and objectives), what happened that was unintended, what was actually implemented, and what outcomes and results were achieved. The evaluator will then discuss the implications of these findings, sometimes including items for future action and recommendations. In the simplest terms, evaluations are said to answer three questions:

What?

So What?

Now What?

Sometimes these evaluation questions are answered in formal reports. Some evaluative judgments flow from analyzing and discussing data from a program’s information system without producing a formal report; indeed, increasingly findings emerge as “the real-time production of streams of evaluative knowledge” (Rist 2006a:6–7; Stame 2006b:vii), rather than as discrete, stand-alone studies. Some evaluation reports are entirely internal to an organization for use by staff and administrators to support ongoing managerial decision making. Other evaluation reports are published or posted on the Internet to meet an obligation for public accountability or to share lessons learned.

The issue of evaluation use has emerged at the interface between science and action, between knowing and doing. It raises fundamental questions about human rationality, decision making, and knowledge applied to creation of a better world. And the issue is not just a concern of researchers. Sometimes it reaches the larger public, as in this classic newspaper headline, “Agency Evaluation Reports Disregarded by Legislators Who Requested Them” (see Exhibit 1.1).

A Broader Perspective: Using Information in the Knowledge Age

The challenge of evaluation use epitomizes the more general challenge of knowledge use in our times. Our age—the Age of Information, Knowledge, and Communications—has developed the capacity to generate, store, retrieve, transmit, and instantaneously communicate massive amounts of information. Our problem is keeping up with, sorting out, absorbing, prioritizing, and using information. Our technological capacity for gathering and computerizing information now far exceeds our human ability to process and make sense out of it all. We’re constantly faced with deciding what’s worth knowing and what to ignore.

Evaluators are “knowledge workers,” a term the great management scholar and consultant Peter Drucker introduced to describe anyone who produces knowledge, ideas, and information in contrast to more tangible products and services (Drucker 2003; Lowenstein 2006). The challenge we face is not just producing knowledge but the even greater challenge of getting people to use the knowledge we produce.

What? So What? Now What?

Glenda H. Eoyang, Executive Director of the Human Systems Dynamics Institute in Minnesota, describes how she uses this evaluative framework in managing her own organization.

At the Institute, we use three simple questions to help us distinguish emergencies from the merely emergent, to analyze multiple factors in the moment, and to align our diverse actions toward shared goals. These questions, though simple, are deeply powerful as we shape our work together toward adaptive action.

WHAT? What do we see? What does data tell us? What are the indicators of change or stability? What cues can we capture to see changing patterns as they emerge?

SO WHAT? So, what sense can we make of emerging data? What does it mean to us in this moment and in the future? What effect are current changes likely to have on us, our clients, our extended network, and our field of inquiry and action?

NOW WHAT? What are our options? What are our resources? When and how can we act—individually or collectively—to optimize opportunities in this moment and the next?

We and our clients have used these questions to move together toward decisive action.

A social service agency faced radical changes in public policy that would have a direct effect on their clients and the resources they had available to meet clients’ needs. What? So what? Now what?

A medical technology company focused on getting processes under control and ensuring lean, high-quality product development and deployment procedures. What? So what? Now what?

An organization in the midst of internal transformation faced backlash from disgruntled workers. What? So what? Now what?

A group of attorneys and their support staff recognized patterns of negative attitudes and disruptive relationships that sucked their energies and distracted them from productive work. What? So what? Now what?

In each of these cases, the three questions helped leadership focus on critical options and effective actions. What emerged was not a sophisticated and complicated plan for an unknowable future. No. What did emerge was a shared understanding of emerging challenges and clear focus on actions that could shift emergencies into emergent possibilities. (Eoyang 2006b)

SOURCE: Reprinted with permission of Glenda Eoyang.

Getting people to use what is known has become a critical concern across the different knowledge sectors of society. A major specialty in medicine (compliance research) is dedicated to understanding why so many people don’t follow their doctor’s orders. Common problems of information use underlie trying to get people to use seat belts, quit smoking, begin exercising, eat properly, and pay attention to evaluation findings. In the fields of nutrition, energy conservation, education, criminal justice, financial investment, human services, corporate management, public administration, philanthropy, international development—the list could go on and on—a central problem, often the central problem, is getting people to apply what is already known.

In agriculture, a major activity of university extension services is trying to get farmers to adopt new scientific methods.


EXHIBIT 1.1  Newspaper Column on Evaluation Use

Agency Evaluation Reports Disregarded by Legislators Who Had Requested Them

Minnesota lawmakers who mandated that state agencies spend a lot of employee hours and money developing performance evaluation reports pretty much ignored them. . . . The official word from the state legislative auditor’s evaluation of the performance evaluation process: Legislators who asked for the reports did not pay much attention to them. They were often full of boring and insignificant details. . . .

Thousands of employee hours and one million taxpayer dollars went into writing the 21 major state agency performance evaluation reports. The auditor reports the sad results:

• Only three of 21 state commissioners thought that the performance reports helped the governor make budget choices regarding their agencies.

• Only seven of 21 agencies were satisfied with the attention given the reports in the House committees reviewing their programs and budgets. And only one agency was satisfied with the attention it received in the Senate.

Agency heads also complained to legislative committees this year that the 1993 law mandating the reports was particularly painful because departments had to prepare new two-year budget requests and program justifications at the same time. That “dual” responsibility resulted in bureaucratic paperwork factories running overtime.

“Our experience is that few, if any, legislators have actually read the valuable information contained in our report . . . ,” one agency head told auditors. “The benefits of performance reporting will not materialize if one of the principal audiences is uninterested,” said another.

“If the Legislature is not serious about making the report ‘the key document’ in the budget decision process, it serves little value outside the agency,” said a third department head.

Mandating the reports and ignoring them looks like another misguided venture by the 201-member Minnesota Legislature. It is the fifth largest Legislature in the nation and during much of the early part of this year’s five-month session had little to do. With time on their hands, lawmakers could have devoted more time to evaluation reports. But if the reports were dull and of little value in evaluating successes of programs, can they be blamed for not reading them?

Gary Dawson, “State Journal” column, Saint Paul Pioneer Press, August 7, 1995: 4B

SOURCE: Reprinted with permission from the Saint Paul Pioneer Press.


Experienced agricultural extension agents like to tell the story of a young agent telling a farmer about the latest food production techniques. As he begins to offer advice, the farmer interrupts him and says, “No sense telling me all those new ideas, young man. I’m not doing half of what I know I should be doing now.”

I remember talking with a time management trainer who had done a follow-up study of people who had taken her workshop series. Few were applying the time management techniques they had learned. When she compared the graduates of her time management training with a sample of nonparticipants, the differences were not in how people in each group managed their time. The time management graduates had quickly fallen back into old habits. The difference was the graduates felt much guiltier about how they wasted time.

Research on adolescent pregnancy illustrates another dimension of the knowledge use problem. In a classic study, adolescent-health specialist Michael Resnik (1984) interviewed teenagers who became pregnant. He found very few cases in which the problem was a lack of information about contraception, about pregnancy, or about how to avoid pregnancies. The problem was that teens just didn’t apply what they knew. “There is an incredible gap between the knowledge and the application of that knowledge. In so many instances it’s heartbreaking—they have the knowledge, the awareness, and the understanding, but somehow it doesn’t apply to them” (p. 15).

Sometimes the stakes are incredibly high, as in using information to prevent genocide. Between April and June 1994, an estimated 800,000 Rwandans were killed in the space of 100 days. Lieutenant General Roméo Dallaire headed the small UN Peacekeeping Force in Rwanda. He filed detailed reports about the unspeakable horrors he and his troops witnessed. He documented the geographic scope of the massacre and the numbers of people being slaughtered. In reporting these findings to UN officials and Western governments, Dallaire pleaded for more peacekeepers and additional trucks to transport his woefully ill-equipped force. He sought authority to seize Hutu arms caches, but the narrow UN mandate didn’t allow him to disarm the militias. As bodies filled the streets and rivers, the general tried in vain to attract the world’s attention to what was going on. In an assessment that military experts now accept as realistic, Dallaire argued that with 5,000 well-equipped soldiers and a free hand to fight Hutu power, he could bring the genocide to a rapid halt. The United Nations, constrained by the domestic and international politics of Security Council members, ignored him. He asked the United States to block the Hutu radio transmissions that were provoking and guiding the massacre. The Clinton administration refused to do even that.

Instead, following the deaths of 10 Belgian peacekeepers assigned to protect the Prime Minister of Rwanda, Dallaire’s forces were cut to a mere 500 men, far too few to make a difference as one of the most horrific genocides in modern history unfolded. Dallaire, frustrated and disheartened by the passive attitude of world leaders, repeatedly confronted his superiors, trying to get them to deal with the data about what was going on, all to no avail. The international community occupied itself with arguing about the definition of genocide, placing blame elsewhere, and finding reasons not to intervene.


The highly respected Danish international development agency, Danida, sponsored a major retrospective evaluation of the Rwanda genocide seeking to extract lessons that might help the world avoid future such tragedies (Danida 2005). The United Nations (1996) undertook its own investigation, and Dallaire and Beardsley (2004) have provided their own account of what happened and why. The point, for our purposes, is that the Rwanda story included the refusal of international agencies and world leaders to take seriously and use the data they were given. Those who had the responsibility and capacity to act failed to pay attention to the evidence Dallaire provided them about the deteriorating situation and the consequences of a failure to act. While his efforts involved the highest stakes possible—saving human lives—evaluators across a broad range of sectors face the daily challenge of getting decision makers to take evidence of ineffectiveness seriously and act on the implications of the evidence. It was precisely this larger relevance of the Rwanda example that led to Dallaire being invited to keynote 2,330 evaluation professionals from 55 countries at the joint Canadian Evaluation Society and American Evaluation Association (AEA) international conference in Toronto in 2005. Following the keynote, he was awarded the Presidents’ Prize for Speaking Truth to Power. This award symbolizes one of the most important roles evaluators can be called on to play, a role that goes beyond technical competence and methodological rigor, a role that recognizes the inherently political nature of evaluation in a world where knowledge is power—the role of speaking truth to power.

The High Stakes of Evaluation Use

When the space shuttle Columbia disintegrated on February 1, 2003, killing all seven astronauts aboard, a comprehensive independent investigation ensued, conducted by a 13-member board of inquiry. While the direct mechanical problem was damage caused by a piece of foam insulation that came loose during liftoff, the more basic cause, investigators concluded, was the National Aeronautics and Space Administration’s (NASA’s) own culture, a culture of complacency nurtured by a string of successes since the 1986 Challenger disaster, which also killed seven. This led to a habit of relaxing safety standards to meet financial and time constraints, for example, defining a problem as insignificant so as not to require a fix that would cause delay. The Columbia Accident Investigation Board (2003) concluded in its 248-page report that the space agency lacked effective checks and balances, did not have an independent safety program, and had not demonstrated the characteristics of a learning organization.

In addition to detailing the technical factors behind Columbia’s breakup just minutes before its scheduled landing at the end of a 16-day science mission, the board’s report laid out the cultural factors behind NASA’s failings. It said NASA mission managers fell into the habit of accepting as normal some flaws in the shuttle system and tended to ignore, not recognize, or not want to hear about such problems even though they might foreshadow catastrophe. Such repeating patterns meant that flawed practices embedded in NASA’s organizational system continued for years and made substantial contributions to both accidents, the report concluded.

During Columbia’s last mission, NASA managers missed opportunities to evaluate possible damage to the craft’s heat shield from a strike on the left wing by flying foam insulation. Such insulation strikes had occurred on previous missions and, the report said, engineers within NASA had documented the dangers involved, but the evidence they submitted and the accompanying warnings they sent up the chain of command were ignored. This attitude of ignoring data that led to conclusions they didn’t like also contributed to the lack of interest among NASA managers in getting spy satellite photos of Columbia, images that might have identified the extent of damage on the shuttle. Over time, NASA managers had come to accept more and more risk in order to meet scheduled launch deadlines. But most of all, the report concluded, there was ineffective leadership that discouraged dissenting views on safety issues, ignored the evaluation findings of safety engineers, and ultimately created blind spots about the risk to the space shuttle of the foam insulation impact.

Sometimes, as in the examples of the Rwanda genocide and the Columbia shuttle disaster, important data are ignored. In other cases, the data-generating process itself is distorted and manipulated to create biased and distorted findings. Regardless of what one thinks of the U.S. invasion of Iraq to depose Saddam Hussein, both those who supported the war and those who opposed it have come to agree that the intelligence used to justify the invasion was deeply flawed and systematically distorted (U.S. Senate Select Committee on Intelligence 2004). Under intense political pressure to show sufficient grounds for military action, those charged with analyzing and evaluating intelligence data began doing what is sometimes called cherry-picking or stovepiping—selecting and passing on only those data that support preconceived positions and ignoring or repressing all contrary evidence (Hersh 2003). This is a problem of the misuse of evaluation findings, the shadow side of the challenge of increasing utility.

Getting evaluations used begins with having valid, accurate, relevant, and balanced findings that are worth using. Then, as the Rwanda example demonstrates, those with data have to get the attention of those who will make crucial decisions. As the NASA story shows, this is not just a matter of reaching individual decision makers but dealing with the whole culture of organizations to create a learning environment that is receptive to data-based decision making. And as the problem of Iraq prewar intelligence shows, evaluators committed to accurate and balanced reporting will have to deal with political obstacles and speak truth to power.

These are high-profile examples of evaluation neglect and misuse, but the challenges of getting evaluations used aren’t limited to such obviously high-stakes national and international initiatives. Look at today’s local news. What decisions are being reported about programs and policies in your community—decisions by city councils, school boards, county commissions, legislative committees, not-for-profit agencies, philanthropic foundations, and businesses? What does the news story tell you about the data that informed those decisions? What evidence was used? What was the quality of that evidence? How were data generated and presented as part of the decision-making process?

Decisions abound. Policy choices are all around us. Programs aimed at solving problems exist in every sector of society. And behind every one of these decisions, policy choices, and program initiatives is an evaluation story. What evaluative evidence, if any, was used in the decision making? What was the quality of that evidence? Asking these questions is at the foundation of utilization-focused evaluation.

These questions also have relevance at the personal level. Some degree of evaluative thinking is inherently involved in every decision you make. How did you decide what computer to purchase? Or what course to take? How do you and those you know make decisions about dating, marriage, nutrition, exercise, lifestyle, where to live, what to do for leisure, who to vote for (and whether to vote at all), what movie to see (and whether you thought it was any good), what books or magazines to read, when to see a doctor, and so on? We are all evaluators. But we are not all good at it, not always systematic or thoughtful, careful about seeking out and weighing evidence, and explicit about the criteria and values that underpin our interpretations of whatever evidence we have. Thus, as we consider how to enhance program decision making through systematic evaluation, you may pause now and again to consider the implications of this way of thinking for decisions you make in your personal life. Or maybe not. How will you decide?

These examples of the challenges of putting knowledge to use are meant to set a general context for the specific concern of this book: generating high-quality and highly relevant evaluation findings and then actually getting those findings used for program decision making and improvement. Although the problem of information use remains central to our age, we are not without knowledge about what to do. We’ve learned a few things about overcoming our human resistance to new knowledge and change, and over the past three decades of professional evaluation practice, we’ve learned a great deal about how to increase evaluation use. Before presenting what’s been learned, let’s set the context.

Historical Precedents

Today’s professional evaluators stand on the shoulders of much earlier practitioners, though they didn’t necessarily call what they did evaluation. The emperor of China established formal proficiency testing for public officials some 4,000 years ago. The book of Daniel in the Old Testament of the Bible opens with the story of an educational program evaluation in which King Nebuchadnezzar of Babylon created a 3-year civil service training program for Hebrew youth after his capture of Jerusalem. When Daniel objected to eating the King’s meat and wine, the program director, Melzar, agreed to an experimental comparison to evaluate how eating a kosher diet might affect the “countenance” of Daniel and his friends, Hananiah, Mishael, and Azariah. When they remained healthy after the pilot test period, he agreed to a permanent change for those who eschewed the Babylonian diet—the earliest documentation of using evaluation findings to change a program’s design.

Once you start looking, you can turn up all kinds of historical precedents for evaluation.

The great Lewis and Clark expedition through the central and western American wilderness had as its purpose evaluating the suitability of the interior rivers for transportation and the value of the land for settlement. The Louisiana Purchase, which they would reconnoiter, covered more than 2 million square kilometers of land extending from the Mississippi River to the Rocky Mountains, essentially doubling the size of the United States. They would travel all the way to the Pacific Ocean. On June 20, 1803, President Thomas Jefferson launched the exploration thusly:

The Object of your mission is to explore the Missouri river & such principal streams of it as by its course and communication with the waters of the Pacific ocean, whether the Columbia, Oregon, Colorado or any other river may offer the most direct and practicable water communication across this continent for the purpose of commerce.

The reports Lewis and Clark sent back to Jefferson were, for all intents and purposes, evaluation reports addressing the objectives set forth and much, much more. In effect, President Jefferson needed to find out what he had purchased, what we might call a retrospective evaluation. The reports of Lewis and Clark went well beyond their narrow, stated mission and included extensive inventories of plants and animals, including many new species, details about indigenous peoples, maps of the land, indeed, everything they did and all that they encountered over a 3-year period through lands that later became 11 states. They might be awarded a posthumous Guinness World Record for the evaluation that most exceeded its original scope of work. The impressive Saint Louis Gateway Arch on the banks of the Mississippi River, commemorating the Westward Expansion opened up by the Lewis and Clark Expedition, could be considered a monument to the impact of evaluation findings.

You get the idea. Just because something wasn’t officially labeled an evaluation report doesn’t mean it didn’t serve an evaluative function. Lewis and Clark gathered and reported extensive data to judge the merit, worth, significance, and value of the Louisiana Territory. Their descriptive data and judgments affected congressional policy, executive directives, and federal appropriations. If it reads like an evaluation, serves evaluative functions, and gets used like an evaluation, we might call it an evaluation.


Evaluating a Venture in Colonial America

In the 1730s, the settlement of the colony of Georgia by the English poor began as a philanthropic venture in colonial America complete with a detailed proposal (“blueprint”), grandiose goals, quite measurable outcomes, testable hypotheses about how to achieve outcomes (what we’d call today a “theory of change”), annual plans, internal evaluation reports from the staff to the Board of Trustees, independent site visits, multiple and conflicting stakeholders, divisive politics, bureaucratic ineptitude and micromanaging, pointed participant feedback about problems, a major problem with dropouts, ongoing efforts at project improvement, and ultimately, a judgment that the experiment failed, accompanied by a lofty explanation from the funders about why they were compelled to pull the plug, to wit:

“At first it was a trial, now it is an experiment; and certainly no man or society need be ashamed to own, that from unforeseen emergencies their hypothesis did misgive; and no person of judgment would censure for want of success where the proposal was probable; but all the world would exclaim against that person or society who, through mistaken notions of honor or positiveness of temper, would persist in pushing an experiment contrary to all probability, to the ruin of the adventurers.” (Boorstin 1958:96)


Thomas Jefferson was also among the recipients of another evaluation report, this one well before he became president, indeed, before the American Revolution. Jefferson and other founding fathers of the United States had become knowledgeable about and impressed with the Iroquois republic, a Native American people in the northeastern part of North America, which had continuously existed since the fourteenth or fifteenth century. The Iroquois Constitution, known as “The Great Law of Peace,” was an orally transmitted constitution for the union of five (later six) Indian nations: the Mohawk, Onondaga, Seneca, Oneida, Cayuga, and later the Tuscarora. Jefferson, Benjamin Franklin, John Adams, and George Washington were all familiar with the Iroquois polity and were influenced by its key ideas and processes in conceptualizing American government (Johansen 1998, 1987; Idarius 1998). In 1774, the Virginia Colony offered the Iroquois Confederacy scholarships to send six of their young men to Williamsburg College to be educated. The Iroquois Chiefs responded that they had already had some of their young men attend such a college and their evaluation of the results did not predispose them to accept the Virginia offer, which they, in fact, declined:

Several of our Young People were formerly brought up at the Colleges of the Northern Provinces; they were instructed in all of your Sciences; but, when they returned to us, they were bad runners, ignorant of every means of living in the Woods, unable to bear either Cold or Hunger, knew neither how to build a Cabin, take a Deer, or kill an Enemy, spoke our language imperfectly, were therefore neither fit for Hunters, Warriors, nor Counsellors, they were totally good for nothing. (Hopkins 1898:240)

Now that’s a clear, evidence-based, mince-no-words, evaluative judgment!

You never know where an evaluation report may turn up. Doing archival research in Tanzania, I found scores of reports by anthropologists, colonial managers, and English academics describing and assessing various and sundry failed attempts to settle the nomadic cattle-herding Wagogo people of the Dodoma Region in what had been central Tanganyika. My assignment was to wrestle lessons learned from the many failed settlement schemes spanning some 50 years of colonial rule so that the modern democratic government under President Julius Nyerere could find a new, more humane, and effective approach. None of the reports were titled “evaluations,” but all of them were fundamentally evaluative.

I stumbled across an explicit but still secret evaluation while visiting the Hiroshima Museum during evaluation training at Hiroshima University. One of the exhibits there describes how the American military’s “Target Committee” selected Hiroshima for trying out the first atomic bomb. Allied forces were engaged in heavy bombing throughout Japan, especially in and around Tokyo. Since the destructive power of the atomic bomb was unknown, a small number of major Japanese cities were excluded from routine bombing and carefully photographed, with inventories of buildings, infrastructure, and industries. On August 6, 1945, the nuclear weapon Little Boy was dropped on Hiroshima by the Enola Gay, a U.S. Army Air Forces B-29 bomber altered specifically to hold the bomb, killing an estimated 80,000 people and heavily damaging 80 percent of the city. An American military team subsequently completed a full evaluation of the bomb’s damage and impact but, according to the Museum display, that report has never been made public. Whether and how it was used is also, therefore, unknown.

Using Evaluative Thinking: A Transdisciplinary Perspective

Experience in thinking can be won, like all experience in doing something, only through practice.

—Philosopher Hannah Arendt (1963:4)

The historical examples of evaluations I have just reviewed provide some sense of the long and diverse history of evaluation reporting, but even more important, they illustrate the centrality of evaluative thinking in human affairs and inquiries of all kinds. Using evaluative thinking and reasoning is ultimately more important and has more far-reaching implications than merely using evaluation reports. This is why eminent philosopher and evaluation theorist Michael Scriven (2005b, 2004) has characterized evaluation as a transdiscipline, because every discipline, profession, and field engages in some form of evaluation, the most prominent examples being, perhaps, evaluations of students taking courses and completing disciplinary programs of study, and refereed journals in which new research is evaluated by peers to determine if it is worthy of publication. Evaluation is a discipline that serves other disciplines even as it is a discipline unto itself; thus its emergent transdisciplinary status (Coryn and Hattie 2006). Statistics, logic, and evaluation are examples of transdisciplines in that their methods, ways of thinking, and knowledge base are used in other areas of inquiry, e.g., education, health, social work, engineering, environmental studies, and so on (Mathison 2005:422). In studying evaluation use, then, we will be looking at not only the use of evaluation findings and reports but also what it means to use evaluative thinking.

The Emergence of Program Evaluation as a Formal Field of Professional Practice

There is nothing more difficult to take in hand, more perilous to conduct, or more uncertain in its success, than to take the lead in the introduction of a new order of things. Because the innovator has for enemies all those who have done well under the old conditions and lukewarm defenders in those who may do well under the new.

—Advice from The Prince (1513), Niccolo Machiavelli (1469–1527)

While evaluative thinking, inquiry, and judgments are as old as and inherent to our human species, formal and systematic evaluation as a field of professional practice is relatively recent. Like many poor people, evaluation in the United States has grown up in the “projects”—federal projects spawned by the Great Society legislation of the 1960s. When the federal government of the United States began to take a major role in alleviating poverty, hunger, and joblessness during the Depression of the 1930s, the closest thing to evaluation was the employment of a few jobless academics to write program histories. Some important evaluations began to be done after World War II, for example, an evaluation of the First Salzburg Seminar conducted by distinguished anthropologist Margaret Mead for the W. K. Kellogg Foundation in 1947 (Russon and Ryback 2003; Greene 2003; Patton 2003). In 1959, the U.S. Department of Health, Education, and Welfare published guidelines for evaluation (Herzog 1959). Preeminent evaluation researcher and author Carol Weiss has recounted finding a number of published evaluation studies from the late 1950s and early 1960s that informed her own first evaluation effort (Weiss 2004:163). But it was not until the massive federal expenditures on an awesome assortment of programs during the 1960s and 1970s that accountability in government began to mean more than financial audits or political head counts of opponents and proponents. Demand for systematic empirical evaluation of the effectiveness of government programs grew as government programs grew (Shadish and Luellen 2005; Aucoin and Heinzman 2000; House 1993; Wye and Sonnichsen 1992). At the same time, fear of, resistance to, and a backlash against evaluation accompanied evaluation’s growth as some program staff and agency managers looked on evaluation as a personal attack and feared that evaluation was merely a ruse for what was really a program termination agenda.

Educational evaluation accompanied the expansion of access to public schooling. Joseph Rice’s comparative study of spelling performance by 33,000 students in 1897 was a precursor of educational evaluation, which remains dominated by achievement testing. During the Cold War, after the Soviet Union launched Sputnik in 1957, calls for better educational assessments accompanied a critique born of fear that the education gap was even larger than the “missile gap.” Demand for independent evaluation accelerated with the growing realization, in the years after the 1954 Supreme Court Brown decision requiring racial integration of schools, that “separate and unequal” was still the norm rather than the exception. Passage of the U.S. Elementary and Secondary Education Act in 1965 contributed greatly to more comprehensive approaches to evaluation. The massive influx of federal money aimed at desegregation, innovation, compensatory education, greater equality of opportunity, teacher training, and higher student achievement was accompanied by calls for evaluation data to assess the effects on the nation’s children: To what extent did these changes really make an educational difference?

But education was only one arena in the War on Poverty of the 1960s. Great Society programs from the Office of Economic Opportunity were aimed at nothing less than the elimination of poverty. The creation of large-scale federal health programs, including community mental health centers, was coupled with a mandate for evaluation, often at a level of 1 percent to 3 percent of program budgets. Other major programs were created in housing, employment, services integration, community planning, urban renewal, welfare, and so on—the whole of which came to be referred to as “butter” (in contrast to “guns”) expenditures. In the 1970s, these Great Society programs collided head on with the Vietnam War, rising inflation, increasing taxes, and the fall from glory of Keynesian economics. All in all, it was what sociologists and social historians, with a penchant for understatement, would characterize as “a period of rapid social and economic change.”

Program evaluation as a distinct field of professional practice was born of two lessons from this period of large-scale social experimentation and government intervention: first, the realization that there is not enough money to do all the things that need doing and, second, even if there were enough money, it takes more than money to solve complex human and social problems. As not everything can be done, there must be a basis for deciding which things are worth doing. Enter evaluation.

High Hopes for Evaluation

One of the most appealing ideas of our century is the notion that science can be put to work to provide solutions to social problems.

—Political Sociologist Hans Zetterberg (quoted in Suchman [1967:1])

Evaluation and Rationality

The great sociologist Max Weber (1864–1920), founder of organizational sociology, predicted that modern institutions would be the foundation of ever-increasing rationality in human affairs. “Modernity, Weber said, is the progressive disenchantment of the world. Superstitions disappear; cultures grow more homogeneous; life becomes increasingly rational” (Menand 2006:84). Evaluation epitomizes Weber’s vision of rationality in the modern world. Donald Campbell (1917–1996) picked up the mantle of Weber’s work on the sociology of science and rearticulated his vision of modernity, explicitly incorporating evaluation as a cornerstone of rationality, expressed in the ideal of an “experimenting society”:

It would be an active society preferring exploratory innovation to inaction. . . . It will be an evolutionary, learning society. . . . It will be an honest society, committed to reality testing, to self-criticism, to avoiding self-deception. It will say it like it is, face up to the facts, be undefensive and open in self-presentation. (Campbell 1999:13)

The ascendance of applied social and behavioral sciences was driven by hope that knowledge could be used rationally to make the world a better place, that is, that social sciences would yield practical knowledge (Stehr 1992). In 1961, Harvard-educated President John F. Kennedy welcomed scientists to the White House as never before. Scientific perspectives were taken into account in the writing of new social legislation. Economists, historians, psychologists, political scientists, and sociologists were all welcomed into the public arena to share in the reshaping of modern postindustrial society. They dreamed of and worked for a new order of rationality in government—a rationality undergirded by social scientists who, if not exemplifying Plato’s philosopher-kings themselves, were at least ministers to philosopher-kings. Carol Weiss has captured the optimism of that period:

There was much hoopla about the rationality that social science would bring to the untidy world of government. It would provide hard data for planning . . . and give cause-and-effect theories for policy making, so that statesmen would know which variables to alter in order to effect the desired outcomes. It would bring to the assessment of alternative policies a knowledge of relative costs and benefits so that decision-makers could select the options with the highest payoff. And once policies were in operation, it would provide objective evaluation of their effectiveness so that necessary modifications could be made to improve performance. (Weiss 1977:4)

While pragmatists turned to evaluation as a commonsensical way to figure out what works and is worth funding, visionaries were conceptualizing evaluation as the centerpiece of a new kind of society: “the experimenting society.” Donald T. Campbell gave voice to this vision in his 1971 address to the American Psychological Association:

The experimenting society will be one which will vigorously try out proposed solutions to recurrent problems, which will make hard-headed and multidimensional evaluations of the outcomes, and which will move on to other alternatives when evaluation shows one reform to have been ineffective or harmful.

We do not have such a society today. (Campbell 1991:223)

Early visions for evaluation, then, focused on evaluation’s expected role in guiding funding decisions and differentiating the wheat from the chaff in federal programs. But as evaluations were implemented, a new role emerged: helping improve programs as they were implemented. The Great Society programs floundered on a host of problems: management weaknesses, cultural issues, and failure to take into account the enormously complex systems that contributed to poverty. Wanting to help is not the same as knowing how to help; likewise, having the money to help is not the same as knowing how to spend money in a helpful way. Many War on Poverty programs turned out to be patronizing, controlling, dependency generating, insulting, inadequate, misguided, overpromised, wasteful, and mismanaged. Evaluators were called on not only to offer final judgments about the overall effectiveness of programs but also to gather process data and provide feedback to help solve problems along the way.

By the mid-1970s, interest in evaluation had grown to the point where two professional organizations were established: the academically oriented Evaluation Research Society and the practitioner-oriented Evaluation Network. In 1986, they merged to form the American Evaluation Association. By that time, interest in evaluation had become international with the establishment of the Canadian Evaluation Society and the Australasian Evaluation Society.

One manifestation of the scope, pervasiveness, and penetration of the high hopes for evaluation is the number of evaluation studies conducted. As early as 1976, the Congressional Sourcebook on Federal Program Evaluations contained 1,700 citations of program evaluation reports issued by 18 U.S. executive branch agencies and the General Accounting Office (GAO) during fiscal years 1973 through 1975 (Office of Program Analysis, GAO 1976:1). In 1977, federal agencies spent $64 million on program evaluation and more than $1.1 billion on social research and development (Abramson 1978). The third edition of the Compendium of Health and Human Services Evaluation Studies (HHS 1983) contained 1,435 entries. The fourth volume of the U.S. Comptroller General’s directory of Federal Evaluations (GAO 1981) identified 1,429 evaluative studies from various U.S. federal agencies completed in fiscal year 1980. While the large number of and substantial funding for evaluations suggested great prosperity and acceptance, under the surface and behind the scenes a crisis was building—a utilization crisis.

Reality Check: Evaluations Largely Unused

By the end of the 1960s, it was becoming clear that evaluations of “Great Society” social programs were largely ignored or politicized. The utopian hopes for a scientific and rational society had somehow failed to be realized. The landing of the first human on the moon came and went, but poverty persisted despite the 1960s “war” on it—and research was still not being used as the basis for government decision making. While all types of applied social science suffered from underuse (Weiss 1977, 1972a), nonuse seemed to be particularly characteristic of evaluation studies. Ernest House (1972) put it this way: “Producing data is one thing! Getting it used is quite another” (p. 412). Williams and Evans (1969) wrote that “in the final analysis, the test of the effectiveness of outcome data is its impact on implemented policy. By this standard, there is a dearth of successful evaluation studies” (p. 119). Wholey et al. (1970) concluded that “the recent literature is unanimous in announcing the general failure of evaluation to affect decision making in a significant way” (p. 46). They went on to note that their own study “found the same absence of successful evaluations noted by other authors” (Wholey et al. 1970:48). There was little evidence to indicate that government planning offices had succeeded in linking social research and decision making. Seymour Deitchman (1976), in his Tale of Social Research and Bureaucracy, did not mince words: “The impact of the research on the most important affairs of state was, with few exceptions, nil” (p. 390). Weidman et al. (1973) concluded that “on those rare occasions when evaluation studies have been used . . . the little use that has occurred [has been] fortuitous rather than planned” (p. 15). In 1972, the eminent evaluation scholar Carol Weiss viewed underutilization as one of the foremost problems in evaluation research: “A review of evaluation experience suggests that evaluation results have not exerted significant influence on program decisions” (Weiss 1972c:10–11).

This conclusion was echoed by four prominent commissions and study committees: the U.S. House of Representatives Committee on Government Operations, Research and Technical Programs Subcommittee (1967); the Young Committee report published by the National Academy of Sciences (1968); the Report of the Special Commission on the Social Sciences for the National Science Foundation (1968); and the Social Science Research Council’s prospective on the Behavioral and Social Sciences (1969).

British economist L. J. Sharpe (1977) reviewed the European literature and commission reports on use of social scientific knowledge and reached a decidedly gloomy conclusion:

We are brought face to face with the fact that it has proved very difficult to uncover many instances where social science research has had a clear and direct effect on policy even when it has been specifically commissioned by government. (P. 45)

Ronald Havelock (1980) of the Knowledge Transfer Institute generalized that “there is a gap between the world of research and the world of routine organizational practice, regardless of the field” (p. 13). The same conclusions came forth time and again from different fields:

At the moment there seems to be no indication that evaluation, although the law of the land, contributes anything to educational practice, other than headaches for the researcher, threats for the innovators and depressing articles for journals devoted to evaluation. (Rippey 1973:9)

More recent utilization studies continue to show low levels of research use in government decision making (Landry, Lamari, and Amara 2003).

It can hardly come as a surprise, then, that support for evaluation began to decline. During the Reagan Administration in the 1980s, the U.S. GAO found that federal evaluation received fewer resources and that "findings from both large and small studies have become less easily available for use by the Congress and the public" (GAO 1987:4). In both 1988 and 1992, the GAO prepared status reports on program evaluation to inform changing executive branch administrations at the federal level.

We found a 22-percent decline in the number of professional staff in agency program evaluation units between 1980 and 1984. A follow-up study of 15 units that had been active in 1980 showed an additional 12-percent decline in the number of professional staff between 1984 and 1988. Funds for program evaluation also dropped substantially between 1980 and 1984 (down by 37 percent in constant 1980 dollars). . . . Discussions with the Office of Management and Budget offer no indication that the executive branch investment in program evaluation showed any meaningful overall increase from 1988 to 1992. (GAO 1992a:7)


An Evaluation Report Disappears into the Void—And an Area of Inquiry Is Born

Sociologist and Harvard professor Carol Weiss is recognized in the Encyclopedia of Evaluation as the "Founding Mother" of evaluation (Mathison 2005:449). She was also the first to give prominence to the issue of evaluation use, a deep-seated interest arising from her experience in the 1960s evaluating a government program that was part of the "War on Poverty."

"I was asked to evaluate a program in central Harlem. One of the program's goals was to bring black college students from universities in the south to work in central Harlem, to work in the schools, the hospitals and social agencies. They were trained and then they spent the year working in the community. When I finished my evaluation of the Harlem program, the report came out in 3 volumes. We sent copies of the report to Washington: I never heard a word from them! I had the feeling I could have just dumped it into the ocean and it would have made no difference. So, I asked myself: 'Why did they support and fund this evaluation if they were not going to pay any attention to it?' That's how I got interested in the uses of research: What was going on? What could researchers—or anyone else—do to encourage people to pay more attention to research?" (www.gse.harvard.edu/news/features/weiss09102001.html)

Weiss subsequently began studying and writing about knowledge utilization (Weiss 1977) and became one of the most influential contributors to our understandings of evaluation use, policy formulation, and organizational decision making (Alkin 2004). She has been one of the most visible and influential voices for the idea that cumulative evaluative evidence can contribute to significant program and policy changes, expressed in the aphorism: In Evidence Lies Change (Graff and Christou 2001). Bottom line: "Utility is what evaluation is all about" (Weiss 2004:161).



The GAO went on to conclude that its 1988 recommendations to enhance the federal government's evaluation function had gone unheeded: "The effort to rebuild the government's evaluation capacity that we called for in our 1988 transition series report has not been carried out" (GAO 1992a:7). Here, ironically, we have an evaluation report on evaluation going unused.

In 1995, the GAO provided another report to the U.S. Senate on Program Evaluation subtitled "Improving the Flow of Information to Congress." GAO analysts conducted follow-up case studies of three major federal program evaluations: the Comprehensive Child Development Program, the Community Health Centers Program, and the Chapter 1 Elementary and Secondary Education Act aimed at providing compensatory education services to low-income students. The analysts concluded that

lack of information does not appear to be the main problem. Rather, the problem seems to be that available information is not organized and communicated effectively. Much of the available information did not reach the [appropriate Senate] Committee, or reached it in a form that was too highly aggregated to be useful or that was difficult to digest. (GAO 1995:39)

Many factors affect evaluation use in Congress, but politics is always a dominant factor (Chelimsky 2007, 2006a, 2006b; Julnes and Rog 2007; Mohan and Sullivan 2007). Evaluation use throughout the U.S. federal government continued its spiral of decline through the early 1990s (Popham 1995; Wargo 1995; Chelimsky 1992). In many federal agencies, the emphasis shifted from program evaluation to inspection, auditing, and investigations (Smith 1992; Hendricks, Mangano, and Moran 1990). Then came attention to and adoption of performance monitoring for accountability, and the picture changed dramatically.

New Directions in Accountability: Reinventing Government

A predominant theme of the 1995 International Evaluation Conference in Vancouver was worldwide interest in reducing government programs and making remaining programs more effective and accountable. Decline in support for government programs was fueled by the widespread belief that such efforts were ineffective and wasteful. While the Great Society and War on Poverty programs of the 1960s had been founded on good intentions and high expectations, they came to be perceived as a failure. The "needs assessments" that had provided the rationales for those original programs had found that the poor, the sick, the homeless, and the uneducated—the needy of all kinds—needed services. So services and programs were created. Thirty years down the road from those original efforts, and billions of dollars later, most social indicators revealed little improvement. Poverty statistics, rates of homelessness, hard core unemployment and underemployment, multigenerational welfare recipients, urban degradation, and crime rates combined to raise questions about the effectiveness of services. Reports on effective programs (e.g., Guttmann and Sussman 1995; Kennedy School of Government 1995; Schorr 1988) received relatively little media attention compared with the relentless press about waste and ineffectiveness (Wortman 1995). In the 1990s, growing concerns about federal budget deficits and runaway entitlement costs intensified the debate about the effectiveness of government programs. Both conservatives and liberals were faced with public demands to know what had been achieved by all the programs created and all the money spent. The call for greater accountability became a watershed flowing at every level—national, state, and local; public sector, not-for-profit agencies, and the private sector (Mohan and Sullivan 2007; Chelimsky 2006a; Harvard Family Research Project 1996a, 1996b).

Clear answers were not forthcoming. Few programs could provide data on results achieved and outcomes attained. Internal accountability had come to center on how funds were spent (inputs monitoring), eligibility requirements (who gets services and client characteristics), how many people get services, what activities they participate in, and how many complete the program. These indicators of inputs, client characteristics, activities, and outputs (program completion) measured whether providers were following government rules and regulations rather than whether desired results were being achieved. Control had come to be exercised through audits, licensing, and service contracts rather than through measuring outcomes. The consequence was to make providers and practitioners compliance-oriented rather than results-focused. Programs were rewarded for doing the paperwork well rather than for making a difference in clients' lives.

Public skepticism turned to deep-seated cynicism. Polling data showed a widespread perception that "nothing works." As an aside, and in all fairness, this perception is not unique to the late twentieth century. In the nineteenth century, Spencer traced 32 acts of the British Parliament and discovered that 29 produced effects contrary to those intended (Edison 1983:5). Given today's public cynicism, three effective programs out of 32 might be considered a pretty good record.

More damning still, in modern times, the perception has grown that no relationship exists between the amount of money spent on a problem and the results accomplished, an observation made with a sense of despair by economist John Brandl in his keynote address to the AEA in New Orleans in 1988. Brandl, a professor in the Hubert H. Humphrey Institute of Public Affairs at the University of Minnesota (formerly its Director), was present at the creation of many human services programs during his days at the old Department of Health, Education and Welfare (HEW). He created the interdisciplinary Evaluation Methodology training program at the University of Minnesota. He later moved from being a policy analyst to being a policy formulator as a Minnesota state legislator. His opinions carried the weight of both scholarship and experience. In his keynote address to professional evaluators, he opined that no demonstrable relationship exists between program funding levels and impact, that is, between inputs and outputs; more money spent does not mean higher quality or greater results.

In a later article, Brandl updated his analysis. While his immediate focus was on Minnesota state government, his comments characterize general concerns about the effectiveness of government programs in the 1990s:

The great government bureaucracies of Minnesota and the rest of America today are failing for the same reason that the formerly Communist governments in Europe fell a few years ago. . . . There is no systematic accountability. People are not regularly inspired to do good work, rewarded for outstanding performance, or penalized for not accomplishing their tasks.

In bureaus, people are expected to do well because the rules tell them to do so. Indeed, often in bureaus here and abroad, able, idealistic workers become disillusioned and burned-out by a system that is not oriented to produce excellent results. No infusion of management was ever going to make operations of the Lenin shipyard in Gdansk effective.

Maybe—I would say surely—until systematic accountability is built into government, no management improvements will do the job. (Brandl 1994:13A)

Similar indictments of government effectiveness became the foundation for efforts at Performance Monitoring, Total Quality Management, Reengineering Government, Management by Objectives (MBO), Reinventing Government, and Managing for Results. Such public sector initiatives made greater accountability and performance monitoring, and increased use of evaluation, central to reform in U.S. federal and state governments, as well as governments around the world, notably Australia, Canada, New Zealand, and the United Kingdom (Moynihan 2006; Rogers 2006; Sears 2006). In this vein, Exhibit 1.2 illustrates the premises for results-oriented government as promulgated by Osborne and Gaebler (1992) in their influential and best-selling book Reinventing Government.

In the United States, the Clinton/Gore Administration's effort to "reinvent government" led to the 1993 Government Performance and Results Act (GPRA). This major legislation aimed to shift the focus of government decision making and accountability away from a preoccupation with reporting on activities to a focus on the results of those activities, such as real gains in employability, safety, responsiveness, or program quality. Under GPRA, U.S. federal government agencies are required to develop multiyear strategic plans, annual performance plans, and annual performance reports.


EXHIBIT 1.2 Premises of Reinventing Government

• What gets measured gets done.
• If you don't measure results, you can't tell success from failure.
• If you can't see success, you can't reward it.
• If you can't reward success, you're probably rewarding failure.
• If you can't see success, you can't learn from it.
• If you can't recognize failure, you can't correct it.
• If you can demonstrate results, you can win public support.

From Osborne and Gaebler (1992, chap. 5)



It is now an entrenched part of American politics that each new presidential administration will initiate new performance monitoring and accountability requirements. The Bush administration focused a good portion of its campaign rhetoric on performance, accountability, and results. To that end, in 2001, the Office of Management and Budget (OMB) began to develop a mechanism called the Program Assessment Rating Tool (PART) to help budget examiners and federal managers measure the effectiveness of government programs. A PART review aims to identify a program's strengths and weaknesses in order to inform funding and management decisions aimed at making the program more effective. The PART framework sets as its goal an evaluation of "all factors that affect and reflect program performance including program purpose and design; performance measurement, evaluations, and strategic planning; program management; and program results" (www.whitehouse.gov/omb/part). PART aims to examine program improvements over time and allow comparisons between similar programs. William Trochim (2006a), Chair of the American Evaluation Association Public Affairs Committee, observed, "PART is one of the more significant evaluation-related items emerging from the US federal government in many years." In Chapter 4, we shall examine the utility of these accountability initiatives.

Seemingly endless administrative reforms with a focus on accountability are by no means limited to the U.S. government. It is indicative of the political power and public appeal of accountability-oriented government reforms that one of the first things the newly elected Conservative government of Prime Minister Harper did in Canada was pass a 255-page "Accountability Act." In one review of the Act, political observer Robin Sears (2006) concluded that it was one more effort in a long tradition of trying "to tame the twin nightmares of every modern democracy: lousy management of public spending, and a broad conviction among voters that insiders get favours from government" (p. 19). Making government accountability meaningful, credible, and useful is one of the challenges facing all modern democracies (Chelimsky 2006a, 2006b).

Misuse of Evaluations

Utilization-focused evaluation can be located between two extremes. One extreme, as just discussed, is the oversimplified image of analyzing evaluation findings and then mechanically making instantaneous decisions based on those findings, for example, the simplistic expectation that PART effectiveness scores (or any simple grading system that categorizes results) should nicely match budget allocations (high scores equal more funds, low scores mean program termination). Real-world evaluation use, we shall find, is more complex, nuanced, and interpretative. Moving from data to action involves treading a path fraught with obstacles. Evaluators who successfully facilitate use of findings need technical skill, to be sure, but they also need to be good communicators, have political savvy, understand how organizations function, and know how to work with a variety of people with different learning and decision-making styles and competing interests.

If one extreme is an image of simple, mechanical, and immediate use, the other extreme is ignoring evaluation findings altogether, or worse, misusing them. Evaluation findings are not going to be methodologically or technically perfect. Debates about focus, measurement challenges, design weaknesses, sampling problems, and controversies about what the data mean are the rule rather than the exception. The world is a messy place. Programs are messy and complex. Studying the world and evaluating programs is difficult because, in doing so, we encounter what William James (1950) famously called "one great blooming, buzzing confusion" (p. 488). Clear, precise, certain, and noncontroversial findings are elusive, a modern chimera, especially on matters about which people have differing opinions and perspectives, which is just about everything. The imperfections of research designs and the difficulties of moving from evaluation findings to action are not, however, reasons to ignore evaluation findings altogether, or worse yet, manipulate the findings and interpretations to support preconceived positions and biases. Let's distinguish then, right here in the first chapter, between seriously taking evaluation findings into account as part of a complex and multifaceted process of deliberation versus ignoring findings altogether because the person getting the findings doesn't like how they came out, or that person manipulating the findings to make them come out the way he or she wants them to be.


I was working with a major, long-established organization. In a meeting with senior management to get the evaluation off to a good start, I asked them to tell me about an evaluation that had been useful to them, to start exploring the features that make evaluation useful. There was a long, nervous silence until one of them said, "None, really."

"Then I guess we'll have to do things quite differently," I said.

—An experienced evaluator



Thus, we face not only the challenge of increasing evaluation use. We also must be concerned with misuse, deception, and abuse. Marv Alkin (1990, 2004), an early theorist of user-oriented evaluation, has long emphasized that evaluators must attend to appropriate use, not just amount of use, and be concerned about misuse (Christie and Alkin 1999; Alkin and Coyle 1988). Ernest House, one of the most astute observers of how the evaluation profession has developed, observed in this regard: "Results from poorly conceived studies have frequently been given wide publicity, and findings from good studies have been improperly used" (1990a:26). The field faces a dual challenge then: supporting and enhancing appropriate uses while also working to eliminate improper uses (Patton 2005b).

In 2004, Philip A. Cooney, chief of staff for the White House Council on Environmental Quality, repeatedly edited government climate reports to play down links between greenhouse gas emissions and global warming. Before joining the Bush White House, Cooney had been a lobbyist at the American Petroleum Institute, the largest trade group representing the interests of the oil industry, where he led the oil industry's fight against limits on greenhouse gases. He was trained as a lawyer with a bachelor's degree in economics, but with no scientific training. News accounts (e.g., New York Times, June 8, 2005) reported that Cooney removed or adjusted descriptions of climate research that government scientists and their supervisors had already approved. The dozens of changes, while sometimes as subtle as the insertion of the phrase "significant and fundamental" before the word "uncertainties," tended to raise doubts about findings, despite a consensus among climate experts that the findings were robust. In one instance, he changed an October 2002 draft of a regularly published summary of government climate research, "Our Changing Planet," by adding the word "extremely" to this sentence: "The attribution of the causes of biological and ecological changes to climate change or variability is extremely difficult." In a section on the need for research into how warming might change water availability and flooding, he crossed out a paragraph describing the projected reduction of mountain glaciers and snowpack.

Such distortions don't just happen with politically motivated advisors protecting national policies. Evaluators at a local level regularly report efforts by program staff, administrators, and elected officials to alter their findings and conclusions. As I was working on this very section, I received a phone call from an evaluation colleague in a small rural community who had just received a request from an agency director to rewrite an evaluation report "with a more positive tone" and leave out some of the negative quotations from participants. She wanted help with language that would diplomatically but firmly explain that such alterations would be unethical.

There is irony here that, in a broad historical context, is worth noting. When the first edition of this book was published in 1978, just as the field of evaluation was emerging, the primary concern was getting anyone to pay any attention to evaluations and take findings seriously. In the last quarter century, a sea change has occurred in political rhetoric. Now, in the information and knowledge age, politicians, policymakers, business leaders, and not-for-profit advocates have learned that the public expects them to address problems through a research and evaluation lens. Public debates regularly include these questions: What do we actually know about this issue? What does the research show? What are evaluation findings about the effectiveness of attempted interventions and solutions? From 1981 to 2006, the frequency of articles about "accountability" in the New York Times increased some fivefold, from 124 to 624, a rate by 2006 of at least one per day (John Bare, Vice President, The Arthur M. Blank Foundation, Atlanta, Georgia, 2007 personal communication).

The irony is that as evaluation has become more prominent and more used, it has also become more subject to manipulation and abuse. Thus, critics of the Bush administration consider the Cooney manipulation of climate research to be business-as-usual in politics rather than an exception. The Centers for Disease Control, long the world's leading source for high-quality, credible research and evaluation, has been subject to such manipulation. Data about the ineffectiveness of abstinence-only sex education programs have been manipulated and suppressed; and clear evidence about the effectiveness of condoms in preventing HIV-AIDS has been denigrated. On the National Cancer Institute Web site, evidence about a correlation between abortions and cancer was fabricated and disseminated. There are a great many other examples, not least of which was the manipulation and distortion of intelligence about weapons of mass destruction to justify the Iraq invasion (Specter 2006). All political groups attempt to support their preferred ideological positions by championing empirical findings that support their beliefs and denigrating evidence that runs counter to their beliefs. In both the Clinton and the Bush administrations, findings about the effectiveness of needle exchanges to prevent HIV transmission were dismissed. Scholarship published by the National Society for the Study of Education has documented both the uses and the misuses of data for educational accountability, especially in the standards-based No Child Left Behind initiative of the U.S. federal government (Herman and Haertel 2005); misuses have resulted from lack of competence, inadequate resources, political pressures, and, in some cases, premeditation and corruption.

These examples illustrate some of the political and ethical challenges evaluators face in working to get evaluation findings taken seriously and used (Chelimsky 1995b). We'll look at these issues in depth in later chapters, especially how evaluators can meet their responsibility to assure the integrity and honesty of evaluations as called for in the Guiding Principles adopted by the AEA.

Evaluators display honesty and integrity in their own behavior and attempt to ensure the honesty and integrity of the entire evaluation process. (AEA Task Force on Guiding Principles for Evaluators 1995; see Exhibit 1.3)

Standards of Excellence for Evaluation

Concerns about ethics, the quality of evaluations, and making evaluations useful undergirded an early effort by professional evaluators to articulate standards of practice. To appreciate the importance of the standards, let's begin with some context. Prior to adoption of standards, many researchers took the position that their responsibility was merely to design studies, collect data, and publish findings; what decision makers did with those findings was not their problem. This stance removed from the evaluation researcher any responsibility for fostering use and placed all the blame for nonuse or underutilization on decision makers.


Academic aloofness from the messy world in which research findings are translated into action has long been a characteristic of basic scientific research. Before the field of evaluation generated its own standards in the late 1970s, criteria for judging evaluations were based on the quality standards of traditional social and behavioral sciences, namely, technical quality and methodological rigor. Use was ignored. Methods decisions dominated the evaluation design process. Methodological rigor meant experimental designs, quantitative data, and sophisticated statistical analysis. Whether decision makers understood such analyses was not the researcher's problem. Validity, reliability, measurability, and generalizability were the dimensions that received the greatest attention in judging evaluation research proposals and reports (e.g., Bernstein and Freeman 1975). Indeed, evaluators concerned about increasing a study's usefulness often called for ever more methodologically rigorous evaluations to increase the validity of findings, thereby hoping to compel decision makers to take findings seriously.

By the late 1970s, however, it was becoming clear that greater methodological rigor was not solving the use problem. Program staff and funders were becoming openly skeptical about spending scarce funds on evaluations they couldn't understand and/or found irrelevant. Evaluators were being asked to be "accountable" just as program staff members were supposed to be accountable. The questions emerged with uncomfortable directness: Who will evaluate the evaluators? How will evaluation be evaluated? It was in this context that professional evaluators began discussing standards.


EXHIBIT 1.3 Guiding Principles for Evaluators

Systematic Inquiry
Evaluators conduct systematic, data-based inquiries about what is being evaluated.

Competence
Evaluators provide competent performance to stakeholders.

Integrity/Honesty
Evaluators display honesty and integrity in their own behavior, and attempt to ensure the honesty and integrity of the entire evaluation process.

Respect for People
Evaluators respect the security, dignity, and self-worth of the respondents, program participants, clients, and other stakeholders with whom they interact.

Responsibilities for General and Public Welfare
Evaluators articulate and take into account the diversity of interests and values that may be related to the general and public welfare.

American Evaluation Association (AEA), 1995
Task Force on Guiding Principles for Evaluators

(See also Shadish, Newman, Scheirer, and Wye 1995)

For detailed elaboration and discussion of the specific Guiding Principles adopted by the American Evaluation Association, see www.eval.org/Publications/GuidingPrinciples.asp



The most comprehensive effort at developing standards was hammered out over 5 years by a 17-member committee appointed by 12 professional organizations, with input from hundreds of practicing evaluation professionals. The standards published by the Joint Committee on Standards in 1981 dramatically reflected the ways in which the practice of evaluation had matured. Just prior to publication, Dan Stufflebeam, Chair of the Committee, summarized the committee's work as follows:

The standards that will be published essentially call for evaluations that have four features. These are utility, feasibility, propriety and accuracy. And I think it is interesting that the Joint Committee decided on that particular order. Their rationale is that an evaluation should not be done at all if there is no prospect for its being useful to some audience. Second, it should not be done if it is not feasible to conduct it in political terms, or practicality terms, or cost effectiveness terms. Third, they do not think it should be done if we cannot demonstrate that it will be conducted fairly and ethically. Finally, if we can demonstrate that an evaluation will have utility, will be feasible and will be proper in its conduct, then they said we could turn to the difficult matters of the technical adequacy of the evaluation. (Stufflebeam 1980:90)

In 1994 and 2008, revised standards were published following extensive reviews spanning several years (Stufflebeam 2007; Patton 1994b). While some changes were made in the individual standards, the overarching framework of four primary criteria remained unchanged: utility, feasibility, propriety, and accuracy (see Exhibit 1.4). Specific standards have also been adapted to various international contexts (Russon and Russon 2004), but the overall framework has translated well cross-culturally. Taking the standards seriously has meant looking at the world quite differently. Unlike the traditionally aloof stance of purely academic researchers, professional evaluators are challenged to take responsibility for use. No more can we play the game of blaming the resistant decision maker. If evaluations are ignored or misused, we have to look at where our own practices and processes may have been inadequate. Implementation of a utility-focused, feasibility-conscious, propriety-oriented, and accuracy-based evaluation requires situational responsiveness, methodological flexibility, multiple evaluator roles, political sophistication, and substantial doses of creativity, all elements of utilization-focused evaluation.

Daniel Stufflebeam (2001), the guiding leader of the standards movement in evaluation, undertook a comprehensive, exhaustive, and independent review of how 22 different evaluation approaches stack up against the standards. No one was better positioned by knowledge, experience, prestige within the profession, and commitment to the standards to undertake such a challenging endeavor. He concluded, "Of the variety of evaluation approaches that emerged during the twentieth century, nine can be identified as strongest and most promising for continued use and development." Utilization-focused evaluation was among those nine, with the highest rating for adherence to the utility standards (p. 80).


Worldwide Surge in Demand for Evaluation

Interest in evaluation has surged in the new millennium, including a proliferation of different models and approaches (Stufflebeam and Shinkfield 2007). But no trend has been more important to evaluation in the last decade than its expanding global reach. In the 1970s and 1980s, professional evaluation associations began to appear: the Canadian Evaluation Society, the Australasian Evaluation Society, and the AEA. In 1995, evaluation professionals from 61 countries around the world came together at the first truly international evaluation conference in Vancouver, British Columbia. Ten years later, a second international conference in Toronto attracted 2,330 evaluation professionals from around the world. The 1990s also gave rise to the European Evaluation Society (founded in 1994 in the Hague) and the African Evaluation Association (founded in 1999 in Nairobi and having held its fourth continent-wide conference in Niamey, Niger, in 2007). Now there are more than 60 national evaluation associations around the world, in countries including Japan, Malaysia, Sri Lanka, Mongolia, Russia, Brazil, Colombia, Peru, South Africa, Zimbabwe, Niger, and New Zealand, to name but a few. In 2003 in Lima, Peru, the inaugural meeting of the new International Organization for Cooperation in Evaluation (IOCE) was held as an umbrella networking and support initiative for national and regional evaluation associations around the world.


EXHIBIT 1.4 Standards for Evaluation

UTILITY
The Utility Standards are intended to ensure that an evaluation will serve the practical information needs of intended users.

FEASIBILITY
The Feasibility Standards are intended to ensure that an evaluation will be realistic, prudent, diplomatic, and frugal.

PROPRIETY
The Propriety Standards are intended to ensure that an evaluation will be conducted legally, ethically, and with due regard for the welfare of those involved in the evaluation, as well as those affected by its results.

ACCURACY
The Accuracy Standards are intended to ensure that an evaluation will reveal and convey technically adequate information about the features that determine worth or merit of the program being evaluated.

(Joint Committee on Standards for Educational Evaluation 1994)

For the full set of detailed standards, see www.wmich.edu/evalctr/jc


The International Development Evaluation Association (IDEAS) was formed in Beijing in 2002 to support evaluators with special interests in developing countries; its first biennial conference was held in New Delhi in 2005. The Network for Monitoring, Evaluation, and Systematization of Latin America and the Caribbean (ReLAC) was formed in 2005 in Peru.


Commemorating 20 Years of Evaluation Scholarship

In 2005, the Canadian Journal of Program Evaluation (CJPE) published a special 20th anniversary issue (www.cjpe.ca). The volume featured articles on the state of the art of evaluation in Canada in a variety of domains of practice, including health, education, child welfare, social services, and government as well as two independent content analyses of CJPE since the publication of Volume 1, Number 1 in 1986. All issues are available online.

In 2007, the American Evaluation Association journal New Directions for Evaluation celebrated its 20th anniversary with a review of enduring issues: judging interpretations, theory-based evaluation, participatory evaluation, and cultural issues (Cousins and Whitmore 2007; Datta 2007a; King 2007b; Leviton 2007; Lipsey 2007c; Madison 2007; Mark 2007; Mathison 2007; Rogers 2007; Schwandt 2007a).

Evaluation capacity can be a crucial part of what the World Bank calls "the knowledge-based economy." The Bank's Knowledge Assessment Methodology (KAM 2006) is an interactive benchmarking tool aimed at helping countries identify the challenges and opportunities they face in making the transition to the knowledge-based economy. The World Bank, through its International Program for Development Evaluation Training, also offers annual month-long evaluation training for people throughout the developing world (Cousins 2006a).

International agencies have developed comprehensive guidelines for the conduct of evaluation (e.g., United Nations Development Programme 2007; United Nations Evaluation Group 2007, 2005a, 2005b; Danida 2006; Independent Evaluation Group 2006; International Organization for Management 2006; Organisation for Economic Co-operation and Development 2006; World Food Programme 2006; International Fund for Agricultural Development 2002). Various national associations have reviewed and adapted the Joint Committee Standards to their own socio-political contexts, as the African Evaluation Association (AfrEA) did in adopting African Evaluation Guidelines in 2007 (AfrEA 2007; Russon and Russon 2004). Evaluation texts and anthologies are available for specific countries and languages, e.g., Italy (Stame 2007), France (Ridde and Dagenais 2007), Japan (Patton and Nagao 2000), and New Zealand (Lunt, Davidson, and McKegg 2003).

Such globally interconnected efforts have made it possible for evaluation strategies and approaches to be shared worldwide. Thus, the globalization of evaluation supports our working together to increase our international understanding about factors that support program effectiveness and evaluation use. International perspectives also challenge Western definitions and cultural assumptions about how evaluations ought to be conducted and how quality ought to be judged. As the evaluation standards are translated into different languages, national associations are adding their own cultural nuances and adapting practices to fit local political, social, organizational, economic, and cultural contexts (Stufflebeam 2004b).

Governments around the world are building new systems for monitoring and evaluation, aiming to adapt results-based management and performance measurement to support development (Rist and Stame 2006). International agencies have also begun using evaluation to assess the full range of development efforts under way in developing countries. Most major international organizations have their own evaluation units with guidelines, protocols, conferences, training opportunities, Web sites, and resource specialists. In his keynote address to the international conference in Vancouver, Masafumi Nagao (1995), a cofounder of Japan's evaluation society, challenged evaluators to think globally even as they evaluate locally, that is, to consider how international forces and trends affect project outcomes even in small and remote communities. This book will include attention to how utilization-focused evaluation offers a process for adapting evaluation processes to address multicultural and international issues and constituencies.


The International Evaluation Challenge

In 2005, distinguished international leaders meeting in Bellagio, Italy, committed their support for impact evaluations of social programs in developing countries. Participants noted that in 2005, donor countries committed $34 billion to aid projects addressing health, education, and poverty in the developing world, but evaluation of results was rare and inadequate. Developing countries themselves spent hundreds of billions more on similar programs. The leaders from international agencies, governments, research organizations, and philanthropic foundations endorsed five principles for action.

1. Impact studies are beneficial

2. Knowledge is a public good

3. A collective initiative to promote impact studies is needed

4. The quality of impact studies is essential

5. The initiative should be complementary, strategic, transparent, and independent

(Evaluation Gap Working Group 2006)

The challenges and opportunities for evaluation extend well beyond government-supported programming. Because of the enormous size and importance of government efforts, program evaluation is inevitably affected by trends in the public sector, but evaluation has also been growing in importance in the private and independent sectors. Corporations, philanthropic foundations, not-for-profit agencies, and nongovernmental organizations (NGOs) worldwide are increasingly turning to evaluators for help in enhancing their effectiveness.

All this ferment means that evaluation has become a many-splendored thing—a rich tapestry of models, methods, issues, approaches, variations, definitions, jargon, concepts, theories, and practices. And therein lies the rub. How does one sort through the many competing and contradictory messages about how to conduct evaluations? The answer in this book is to stay focused on the issue of use—conducting evaluations that are useful and actually get used. And as Carol Weiss (1998b) observed in her keynote address to the AEA annual conference, the challenge is not just increasing use, "but more effective utilization, use for improving daily program practice and also use for making larger changes in policy and programming" (p. 30).

From Problem to Solution: Toward Use in Practice

The future of evaluation is tied to the future effectiveness of programs. Indictments of program effectiveness are, underneath, also indictments of evaluation. The original promise of evaluation was that it would point the way to effective programming. Later, that promise broadened to include providing ongoing feedback for improvements during implementation. Evaluation cannot be considered to have fulfilled its promise if, as is increasingly the case, the general perception is that few programs have attained desired outcomes, that "nothing works."

As this introduction and historical overview closes, we are called back to the early morning scene that opened this chapter: decision makers lamenting the disappointing results of an evaluation, complaining that the findings did not tell them what they needed to know. For their part, evaluators complain about many things as well, but for a long time their most common complaint has been that their findings are ignored (Weiss 1972d:319). The question from those who believe in the importance and potential utility of evaluation remains: "What has to be done to get results that are appropriately and meaningfully used?" This question has taken center stage as program evaluation has emerged as a distinct field of professional practice—and that is the question this book answers. In doing so, we recognize a lineage that extends much farther back than the more recent establishment of the evaluation profession.

Scientists now calculate that all living human beings are related to a single woman who lived roughly 150,000 years ago in Africa, a "mitochondrial Eve." . . . all humanity is linked to Eve through an unbroken chain of mothers. (Shreeve 2006:62)

And, adds Halcolm, she was an evaluator.

Follow-Up Exercises

1. Scan recent issues of local and/or national newspapers. Look for articles that report evaluation findings. Write a critique of the press report. Can you tell what was evaluated? Are the methods used in the evaluation discussed? How clear are the findings from the evaluation? Can you tell how the findings have been or will be used? How balanced and comprehensive is the press report?

2. See if you can locate the actual evaluation report discussed in the newspaper (Question 1 above). Many evaluation reports are posted on the Internet. Write your own newspaper report based on what you consider important in the evaluation. How is your press report different from the one you found in the newspaper? Why? What does this tell you about the challenges of disseminating evaluation findings to the general public?

3. Find the Web site for one of the international or national evaluation associations. Review the site and its offerings.

4. Most major philanthropic foundations, federal agencies, and international organizations have Web sites with access to evaluation policies, guidelines, and reports. Visit the evaluation sections of at least one government site and one nongovernmental site. Write a comparison of their information about and approaches to evaluation.

5. Review the full list of Program Evaluation Standards (www.wmich.edu/evalctr/jc). (a) Conduct a cultural assumptions analysis of the standards. What specific standards, if any, strike you as particularly Western in orientation? (b) Select and discuss at least two standards that seem to you unclear, that is, you aren't sure what you would have to do in an evaluation to meet that particular standard.
