Robert D. Behn
Harvard University

Why Measure Performance?
Different Purposes Require Different Measures

Performance measurement is not an end in itself. So why should public managers measure performance? Because they may find such measures helpful in achieving eight specific managerial purposes. As part of their overall management strategy, public managers can use performance measures to evaluate, control, budget, motivate, promote, celebrate, learn, and improve. Unfortunately, no single performance measure is appropriate for all eight purposes. Consequently, public managers should not seek the one magic performance measure. Instead, they need to think seriously about the managerial purposes to which performance measurement might contribute and how they might deploy these measures. Only then can they select measures with the characteristics necessary to help achieve each purpose. Without at least a tentative theory about how performance measures can be employed to foster improvement (which is the core purpose behind the other seven), public managers will be unable to decide what should be measured.

Everyone is measuring performance. Public managers are measuring the performance of their organizations, their contractors, and the collaboratives in which they participate. Congress, state legislatures, and city councils are insisting that executive-branch agencies periodically report measures of performance. Stakeholder organizations want performance measures so they can hold government accountable. Journalists like nothing better than a front-page bar chart that compares performance measures for various jurisdictions—whether they are average test scores for the city's schools or FBI uniform crime statistics for the state's cities. Moreover, public agencies are taking the initiative to publish compilations of their own performance measurements (Murphey 1999). A major trend among the nations that comprise the Organisation for Economic Co-operation and Development, concludes Alexander Kouzmin (1999) of the University of Western Sydney and his colleagues, is "the development of measurement systems which enable comparison of similar activities across a number of areas" (122) and which "help to establish a performance-based culture in the public sector" (123). "Performance measurement," writes Terrell Blodgett of the University of Texas and Gerald Newfarmer of Management Partners, Inc., is "(arguably) the hottest topic in government today" (1996, 6).

Why Measure Performance?

What is behind all of this measuring of performance?

What do people expect to do with the measures—other than use them to beat up on some underperforming agency, bureaucrat, or contractor? How are people actually using these performance measures? What is the rationale that connects the measurement of government's performance to some higher purpose? After all, neither the act of measuring performance nor the resulting data accomplishes anything itself; only when someone uses these measures in some way do they accomplish something. For what purposes do—or might—people measure the performance of public agencies, public programs, nonprofit and for-profit contractors, or the collaboratives of public, nonprofit, and for-profit organizations that deliver public services?

Why measure performance? Because measuring performance is good. But how do we know it is good? Because business firms all measure their performance, and everyone knows that the private sector is managed better than the public sector.

Robert D. Behn is a lecturer at Harvard University's John F. Kennedy School of Government and the faculty chair of its executive program Driving Government Performance. His research focuses on governance, leadership, and performance management. His latest book is Rethinking Democratic Accountability (Brookings Institution, 2001). He believes the most important performance measure is 1918: the last year the Boston Red Sox won the World Series. Email: [email protected].


Unfortunately, the kinds of financial ratios the business world uses to measure a firm's performance are not appropriate for the public sector. So what should public agencies measure? Performance, of course. But what kind of performance should they measure, how should they measure it, and what should they do with these measurements? A variety of commentators offer a variety of purposes:

• Joseph Wholey of the University of Southern California and Kathryn Newcomer of George Washington University observe that "the current focus on performance measurement at all levels of government and in nonprofit organizations reflects citizen demands for evidence of program effectiveness that have been made around the world" (1997, 92).

• In their case for performance monitoring, Wholey and the Urban Institute's Harry Hatry note that "performance monitoring systems are beginning to be used in budget formulation and resource allocation, employee motivation, performance contracting, improving government services and improving communications between citizens and government" (1992, 604), as well as for "external accountability purposes" (609).

• "Performance measurement may be done annually toimprove public accountability and policy decisionmaking," write Wholey and Newcomer, "or done morefrequently to improve management and programeffectiveness" (1997, 98).

• The Governmental Accounting Standards Board suggests that performance measures are "needed for setting goals and objectives, planning program activities to accomplish these goals, allocating resources to these programs, monitoring and evaluating the results to determine if they are making progress in achieving the established goals and objectives, and modifying program plans to enhance performance" (Hatry et al. 1990, v).

• Municipalities, note Mary Kopczynski of the Urban Institute and Michael Lombardo of the International City/County Management Association, can use comparative performance data in five ways: "(1) to recognize good performance and to identify areas for improvement; (2) to use indicator values for higher-performing jurisdictions as improvement targets by jurisdictions that fall short of the top marks; (3) to compare performance among a subset of jurisdictions believed to be similar in some way (for example, in size, service delivery practice, geography, etc.); (4) to inform stakeholders outside of the local government sector (such as citizens or business groups); and (5) to solicit joint cooperation in improving future outcomes in respective communities" (1999, 133).

• Advocates of performance measurement in local government, observes David Ammons of the University of North Carolina, "have promised that more sophisticated measurement systems will undergird management processes, better inform resource allocation decisions, enhance legislative oversight, and increase accountability" (1995, 37).

• Performance measurement, write David Osborne and Peter Plastrik in The Reinventor's Fieldbook, "enables officials to hold organizations accountable and to introduce consequences for performance. It helps citizens and customers judge the value that government creates for them. And it provides managers with the data they need to improve performance" (2000, 247).

• Robert Kravchuk of Indiana University and Ronald Schack of the Connecticut Department of Labor do not offer a specific list of purposes for measuring performance. Nevertheless, imbedded in their proposals for designing effective performance measures, they suggest a number of different purposes: planning, evaluation, organizational learning, driving improvement efforts, decision making, resource allocation, control, facilitating the devolution of authority to lower levels of the hierarchy, and helping to promote accountability (Kravchuk and Schack 1996, 348, 349, 350, 351).

Performance measures can be used for multiple purposes. Moreover, different people have different purposes. Legislators have different purposes than journalists. Stakeholders have different purposes than public managers. Consequently, I will focus on just those people who manage public agencies.

Eight Managerial Purposes for Measuring Performance

What purpose—exactly—is a public manager attempting to achieve by measuring performance? Even for this narrower question, the answer isn't obvious. One analyst admonishes public managers: "Always remember that the intent of performance measures is to provide reliable and valid information on performance" (Theurer 1998, 24). But that hardly answers the question. What will public managers do with all of this reliable and valid information? Producing reliable and valid reports of government performance is no end in itself. All of the reliable and valid data about performance is of little use to public managers if they lack a clear idea about how to use them or if the data are not appropriate for this particular use. So what, exactly, will performance measurement do, and what kinds of measures do public managers need to do this? Indeed, what is the logic behind all of this performance measurement—the causal link between the measures and the public manager's effort to achieve specific policy purposes?

Hatry offers one of the few enumerated lists of the uses of performance information. He suggests that public managers can use such information to perform ten different tasks: to (1) respond to elected officials' and the public's demands for accountability; (2) make budget requests; (3) do internal budgeting; (4) trigger in-depth examinations of performance problems and possible corrections; (5) motivate; (6) contract; (7) evaluate; (8) support strategic planning; (9) communicate better with the public to build public trust; and (10) improve. Hatry notes that improving programs is the fundamental purpose of performance measurement, and all but two of these ten uses—improving accountability and increasing communications with the public—"are intended to make program improvements that lead to improved outcomes" (1999b, 158, 157).

My list is slightly different. From the diversity of reasons for measuring performance, I think public managers have eight primary purposes that are specific and distinct (or only marginally overlapping). As part of their overall management strategy, the leaders of public agencies can use performance measurement to (1) evaluate; (2) control; (3) budget; (4) motivate; (5) promote; (6) celebrate; (7) learn; and (8) improve.

This list could be longer or shorter. For the measurement of performance, the public manager's real purpose—indeed, the only real purpose—is to improve performance. The other seven purposes are simply means for achieving this ultimate purpose. Consequently, the choice of how many subpurposes—how many distinct means—to include is somewhat arbitrary. But my major point is not. Instead, let me emphasize: The leaders of public agencies can use performance measures to achieve a number of very different purposes, and they need to carefully and explicitly choose their purposes. Only then can they identify or create specific measures that are appropriate for each individual purpose.

Of the various purposes that others have proposed for measuring performance, I have not included on my list: planning, decision making, modifying programs, setting performance targets, recognizing good performance, comparing performance, informing stakeholders, performance contracting, and promoting accountability. Why not? Because these are really subpurposes of one (or more) of the eight basic purposes. For example, planning, decision making, and modifying are implicit in two of my eight, more basic, purposes: budgeting and improving. The real reason that managers plan, or make decisions, or modify programs is to either reallocate resources or to improve future performance. Similarly, the reason that managers set performance targets is to motivate, and thus to improve. To compare performance among jurisdictions is—implicitly but undeniably—to evaluate them. Recognizing good performance is designed to motivate improvements. Informing stakeholders both promotes and gives them the opportunity to evaluate and learn. Performance contracting involves all of the eight purposes from evaluating to improving. And, depending upon what people mean by accountability, they may promote it by evaluating public agencies, by controlling them, or by motivating them to improve (table 1).

Table 1  Eight Purposes that Public Managers Have for Measuring Performance

The purpose     The public manager's question that the performance measure can help answer
Evaluate        How well is my public agency performing?
Control         How can I ensure that my subordinates are doing the right thing?
Budget          On what programs, people, or projects should my agency spend the public's money?
Motivate        How can I motivate line staff, middle managers, nonprofit and for-profit collaborators, stakeholders, and citizens to do the things necessary to improve performance?
Promote         How can I convince political superiors, legislators, stakeholders, journalists, and citizens that my agency is doing a good job?
Celebrate       What accomplishments are worthy of the important organizational ritual of celebrating success?
Learn           Why is what working or not working?
Improve         What exactly should who do differently to improve performance?

Purpose 1. To Evaluate: How Well Is This Government Agency Performing?

Evaluation is the usual reason for measuring performance. Indeed, many of the scholars and practitioners who are attempting to develop systems of performance measurement have come from the field of program evaluation. Often (despite the many different reasons cited earlier), no reason is given for measuring performance; instead, the evaluation purpose is simply assumed. People rarely state that their only (or dominant) rationale for measuring performance is to evaluate performance, let alone acknowledge there may be other purposes. It is simply there between the lines of many performance audits, budget documents, articles, speeches, and books: People are measuring the performance of this organization or that program so they (or others) can evaluate it.

In a report on early performance-measurement efforts under the Government Performance and Results Act of 1993, an advisory panel of the National Academy of Public Administration (NAPA) observed, "Performance measurement of program outputs and outcomes provides important, if not vital, information on current program status and how much progress is being made toward important program goals. It provides needed information as to whether problems are worsening or improving, even if it cannot tell us why or how the problem improvement (or worsening) came about" (NAPA 1994, 2). These sentences do not contain the words "evaluation" or "evaluate," yet they clearly imply the performance measurements will furnish some kind of assessment of program performance.

Of course, to evaluate the performance of a public agency, the public manager needs to know what that agency is supposed to accomplish. For this reason, two of the ten performance-measurement design principles developed by Kravchuk and Schack are to "formulate a clear, coherent mission, strategy, and objectives," and to "rationalize the programmatic structure as a prelude to measurement." Do this first, they argue, because "performance measurement must begin with a clear understanding of the policy objectives of a program, or multiprogram system," and because "meaningful measurement requires a rational program structure" (1996, 350). Oops. If public managers have to wait for the U.S. Congress or the local city council to formulate (for just one governmental undertaking) a clear, coherent mission, strategy, and objectives combined with a rationalized program structure, they will never get to the next step of measuring anything.

No wonder many public managers are alarmed by the evaluative nature of performance measurement. If there existed a clear, universal understanding of their policy objectives, and if they could manage within a rational program structure, they might find performance measurement less scary. But without an agreement on policy objectives, public managers know that others can use performance data to criticize them (and their agency) for failing to achieve objectives that they were not pursuing. And if given responsibility for achieving widely accepted policy objectives with an insane program structure (multiple constraints, inadequate resources, and unreasonable timetables), even the most talented managers may fall short of the agreed-upon performance targets.

Moreover, even if the performance measures are not collected for the explicit purpose of evaluation, this possibility is always implicit. And using performance data to evaluate a public agency is a tricky and sophisticated undertaking. Yet, a simple comparison of readily available data about similar (though rarely identical) agencies is the most common evaluative technique. Hatry (1999a) notes that intergovernmental comparisons of performance "focus primarily on indicators that can be obtained from traditional and readily available data sources." This is the common practice, he continues, because "the best outcome data cannot be obtained without new, or at least, substantially revised procedures" (104).

Often, however, existing or easily attainable data create an opportunity for simplistic, evaluative comparisons. Hatry writes that those who collect comparative performance data, as well as "the public, and the media must recognize that the data in comparative performance measurement efforts will only be roughly comparable" (1999a, 104). But will journalists, who must produce this evening's news or tomorrow's newspaper under very tight deadlines, recognize this, let alone explain it? And will the public, in their quick glance at an attractive bar chart, get this message? Hatry, himself, is not completely sanguine:

The ultimate question of comparative data is whether publication does more harm than good. More harm can occur if many of the measurements contain errors or are otherwise unfair, so that low performers are unfairly beaten up by the media and have to spend excessive amounts of time and effort attempting to explain and defend themselves.... On the other hand, if the data seem on the whole to encourage jurisdictions to explore why low performance has occurred and how they might better themselves, then such efforts will be worthwhile, even if a few agencies are unfairly treated. (Hatry 1999a, 104)

Whether the scholars, analysts, or managers like it, almost any performance measure can and will be used to evaluate a public agency's performance.

Purpose 2. To Control: How Can Public Managers Ensure Their Subordinates Are Doing the Right Thing?

Yes. Frederick Winslow Taylor is dead. Today, no manager believes the best way to influence the behavior of subordinates is to establish the one best way for them to do their prescribed tasks and then measure their compliance with this particular way. In the twenty-first century, all managers are into empowerment.

Nevertheless, it is disingenuous to assert (or believe) that people no longer seek to control the behavior of public agencies and public employees, let alone seek to use performance measurement to help them do so. Why do governments have line-item budgets? Today, no one employs the measurements of time-and-motion studies for control. Yet, legislatures and executive-branch superiors do establish performance standards—whether they are specific curriculum standards for teachers or sentencing standards for judges—and then measure performance to see whether individuals have complied with these mandates. After all, the central concern of principal-agent theory is how principals can control the behavior of their agents (Ingraham and Kneedler 2000, 238-39).

Indeed, the controlling style of management has a long and distinguished history. It has cleverly encoded itself into one of the rarely stated but very real purposes behind performance measurement. "Management control depends on measurement," writes William Bruns in a Harvard Business School note on "Responsibility Centers and Performance Measurement" (1993, 1). In business schools, accounting courses and accounting texts often explicitly use the word "control."

In their original explanation of the balanced scorecard, Robert Kaplan and David Norton note that business has a control bias: "Probably because traditional measurement systems have sprung from the finance function, the systems have a control bias. That is, traditional performance measurement systems specify the particular actions they want employees to take and then measure to see whether the employees have in fact taken those actions. In that way, the systems try to control behavior. Such measurement systems fit with the engineering mentality of the Industrial Age" (1992, 79). The same is true in the public sector. Legislatures create measurement systems that specify particular actions they want executive-branch employees to take and particular ways they want executive-branch agencies to spend money. Executive-branch superiors, regulatory units, and overhead agencies do the same. Then, they measure to see whether the agency employees have taken the specified actions and spent the money in the specified ways. Can't you just see Fred Taylor smiling?

Purpose 3. To Budget: On What Programs, People, or Projects Should Government Spend the Public's Money?

Performance measurement can help public officials to make budget allocations. At the macro level, however, the apportionment of tax monies is a political decision made by political officials. Citizens delegate to elected officials and their immediate subordinates the responsibility for deciding which purposes of government action are primary and which ones are secondary or tertiary. Thus, political priorities—not agency performance—drive macro budgetary choices.

Performance budgeting, performance-based budgeting, and results-oriented budgeting are some of the names commonly given to the use of performance measures in the budgetary process (Holt 1995-96; Jordan and Hackbart 1999; Joyce 1996, 1997; Lehan 1996; Melkers and Willoughby 1998, 2001; Thompson 1994; Thompson and Johansen 1999). But like so many other phrases in the performance-measurement business, they can mean different things to different people in different contexts. For example, performance budgeting may simply mean including historical data on performance in the annual budget request. Or it may mean that budgets are structured not around line-item expenditures (with performance purposes or targets left either secondary or implicit), but around general performance purposes or specific performance targets (with line-item allocations left to the managers of the units charged with achieving these purposes or targets). Or it may mean rewarding units that do well compared to some performance targets with extra funds and punishing units that fail to achieve their targets with budget cuts.

For improving performance, however, budgets are crude tools. What should a city do if its fire department fails to achieve its performance targets? Cut the department's budget? Or increase its budget? Or should the city manager fire the fire chief and recruit a public manager with a track record of fixing broken agencies? The answer depends on the specific circumstances that are not captured by the formal performance data. Certainly, cutting the fire department's budget seems like a counterproductive way to improve performance (though cutting the fire department's budget may be perfectly logical if the city council decides that fire safety is less of a political priority than educating children, fixing the sewers, or reducing crime). If analysis reveals the fire department is underperforming because it is underfunded—because, for example, its capital budget lacks the funds for cost-effective technology—then increasing the department's budget is a sensible response. But poor performance may be the result of factors that more (or less) money won't fix: poor leadership, the lack of a fire-prevention strategy to complement the department's fire-fighting strategy, or the failure to adopt industry training standards. Using budgetary increments to reward well-performing agencies and budgetary decrements to punish underperforming ones is not a strategy that will automatically fix (or even motivate) poor performers.

Nevertheless, line managers can use performance data to inform their resource-allocation decisions. Once elected officials have established macro political priorities, those responsible for more micro decisions may seek to invest their limited allocation of resources in the most cost-effective units and activities. And when making such micro budgetary choices, public managers may find performance measures helpful.

Purpose 4. To Motivate: How Can Public Managers Motivate Line Staff, Middle Managers, Nonprofit and For-Profit Collaborators, Stakeholders, and Citizens to Do the Things Necessary to Improve Performance?

Public managers may use performance measures to learn how to perform better. Or, if they already understand what it takes to improve performance, they may use the measures to motivate such behavior. And for this motivational purpose, performance measures have proven to be very useful.

The basic concept is that establishing performance goals—particularly stretch goals—grabs people's attention. Then the measurement of progress toward the goals provides useful feedback, concentrating their efforts on reaching these targets. In his book The Great Ideas of Management, Jack Duncan of the University of Alabama reports on the startling conclusion of research into the impact of goal setting on performance: "No other motivational technique known to date can come close to duplicating that record" (1989, 127).

To implement this motivational strategy, an agency's leadership needs to give its people a significant goal to achieve and then use performance measures—including interim targets—to focus people's thinking and work and to provide a periodic sense of accomplishment. Moreover, performance targets may also encourage creativity in evolving better ways to achieve the goal (Behn 1999); thus, measures that motivate improved performance may also motivate learning.

In New York City in the 1970s, Gordon Chase used performance targets to motivate the employees of the Health Services Administration (Rosenthal 1975; Levin and Sanger 1994). In Massachusetts in the 1980s, the leadership of the Department of Public Welfare used the same strategy (Behn 1991). And in the 1990s in Pennsylvania, the same basic approach worked in the Department of Environmental Protection (Behn 1997a). But perhaps the most famous application of performance targets to motivate public employees is Compstat, the system created by William Bratton, then commissioner of the New York Police Department, to focus the attention of precinct commanders on reducing crime (Silverman 2001, 88-89, 101).

Purpose 5. To Promote: How Can Public Managers Convince Political Superiors, Legislators, Stakeholders, Journalists, and Citizens that Their Agency Is Doing a Good Job?

Americans suspect their government is both ineffective and inefficient. Yet, if public agencies are to accomplish public purposes, they need the public's support. Performance measures can contribute to such support by revealing not only when government institutions are failing, but also when they are doing a good or excellent job. For example, the National Academy of Public Administration's Center for Improving Government Performance reports that performance measures can be used to "validate success; justify additional resources (when appropriate); earn customer, stakeholder, and staff loyalty by showing results; and win recognition inside and outside the organization" (NAPA 1999, 7).

Still, too many public managers fail to use performance measures to promote the value and contribution of their agency. "Performance-based measures," writes Harry Boone of the Council of State Governments, "provide a justification for the agency's existence," yet "many agencies cannot defend their effectiveness in performance-based terms" (1996, 10).

In a study, "Toward Useful Performance Measures," aNational Academy of Public Administration advisory panel(1994) asserts that "performance indicators can be a pow-erful tool in communicating program value and accom-plishments to a variety of constituencies" (23). In additionto "the use of performance measurement to communicateprogram success and worth" (9), the panel noted, the "ma-jor values of a performance measurement system" includeits potential "to enhance public trust" (9). That is, the panelargues, performance measurement can not only directlyestablish—and thus promote—the competence of specific

agencies and the value of particular programs; it also canindirectly establish, and thus promote, the competence andvalue of govemment in general.

Purpose 6. To Celebrate: What Accomplishments Are Worthy of the Important Organizational Ritual of Celebrating Success?

All organizations need to commemorate their accomplishments. Such rituals tie people together, give them a sense of their individual and collective relevance, and motivate future efforts. Moreover, by achieving specific goals, people gain a sense of personal accomplishment and self-worth (Locke and Latham 1984, 1990). Such celebrations need not be limited to one big party to mark the end of the fiscal year or the completion of a significant project. Small milestones along the way—as well as unusual achievements and unanticipated victories—provide an opportunity for impromptu celebrations that call attention to these accomplishments and to the people who made them happen. And such celebrations can help to focus attention on the next challenge.

Like all of the other purposes for measuring performance—with the sole and important exception of improvement—celebration is not an end in itself. Rather, celebration is important because it motivates, promotes, and recruits. Celebration helps to improve performance because it motivates people to improve further in the next year, quarter, or month. Celebration helps to improve performance because it brings attention to the agency, and thus promotes its competence. And this promotion—this attention—may even generate increased flexibility (from overhead agencies) and resources (from the guardians of the budget). Moreover, this promotion and attention attract another resource: dedicated people who want to work for a successful agency that is achieving important public purposes. Celebration may even attract potential collaborators from other organizations that have not received as much attention, and thus seek to enhance their own sense of accomplishment by shifting some of their energies to the high-performing collaborative (Behn 1991, 92-93).

Celebration also may be combined with learning. Rather than hold a party to acknowledge success and recognize its contributors, an informal seminar or formal presentation can realize the same purposes. Asking those who produced the unanticipated achievement or unusual victory to explain how they pulled it off celebrates their triumph; but it also provides others with an opportunity to learn how they might achieve a similar success (Behn 1991, 106-7).

Still, the link from measurement to celebration to improvement is the most indirect because it has to work through one of the other links—either motivation, budgeting, learning, or promotion. In the end, any reason for measuring performance is valid only to the extent that it helps to achieve the most basic purpose: to improve performance.


Purpose 7. To Learn: Why Is What Working or Not Working?

Performance measures contain information that can be used not only to evaluate, but also to learn. Indeed, learning is more than evaluation. The objective of evaluation is to determine what is working and what isn't. The objective of learning is to determine why.

To learn from performance measures, however, managers need some mechanism to extract information from the data. We may all believe that the data speak for themselves. This, however, is only because we each have buried in our brain some unconscious mechanism that has already made an implicit conversion of the abstract data into meaningful information. The data speak only through an interpreter that converts the collection of digits into analog lessons—that decodes the otherwise inscrutable numbers and provides a persuasive explanation. And often, different people use different interpreters, which explains how they can draw very different lessons from the same data.

Moreover, if managers have too many performance measures, they may be unable to learn anything. Carole Neves of the National Academy of Public Administration, James Wolf of Virginia Tech, and Bill Benton of Benton and Associates (1986) write that "in many agencies," because of the proliferation of performance measures, "there is more confusion or 'noise' than useful data." Theodore Poister and Gregory Streib of Georgia State University call this the "'DRIP' syndrome—Data Rich but Information Poor" (1999, 326). Thus, Neves and her colleagues conclude, "managers lack time or simply find it too difficult to try to identify good signals from the mass of numbers" (1986, 141).

From performance measures, public managers may learn what is not working. If so, they can stop doing it and reallocate money and people from this nonperforming activity to more effective undertakings (designed to achieve the identical or quite different purposes). Or they may learn what is working. If so, they can shift existing resources (or new resources that become available) to this proven activity. Learning can help with the budgeting of both money and people.

Furthermore, learning can help more directly with the improving. The performance measures can reveal not only whether an agency is performing well or poorly, but also why: What is contributing to the agency's excellent, fair, or poor performance—and what might be done to improve the components that are performing fairly or poorly?

In seeking to learn from performance measures, public managers frequently confront the black box enigma of social science research. The data—the performance measures—can reveal that an organization is performing well or poorly, but they don't necessarily reveal why. The performance measures can describe what is coming out of the black box of a public agency, as well as what is going in, but they don't necessarily reveal what is happening inside. How are the various inputs interacting to produce the outputs? What is the organizational black box actually doing to the inputs to convert them into the outputs? What is the societal black box actually doing to the outputs to convert them into the outcomes?

Public managers can, of course, create some measures of the processes going on inside the black box. But they cannot guarantee that the internal characteristics and processes of the black box they have chosen to measure are actually the ones that determine whether the inputs are converted into high-quality or low-quality outputs. Yet, the more internal processes that public managers choose to measure, the more likely they are to discover a few that correlate well with the outputs. Such correlations could, however, be purely random, or the factors that are identified by the correlations as significant contributors could merely be correlated with other factors that are the real causes. Converting performance data into an understanding of what is happening inside the black box is neither easy nor obvious.
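The risk of stumbling onto chance correlations grows with the number of process measures examined. As a rough illustration only (not anything drawn from Behn's argument or from real agency data), the following Python sketch generates a set of internal process measures that, by construction, have no relationship to an agency's outputs, and still finds one that appears to track them; all names and numbers are hypothetical.

```python
# Illustrative sketch: with enough unrelated process measures, a few will
# correlate with outputs purely by chance.
import random

random.seed(1)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

months = 24  # two years of monthly performance data
outputs = [random.gauss(100, 10) for _ in range(months)]

# Fifty internal process measures, none of which actually drives the outputs.
process_measures = {
    f"process_{i}": [random.gauss(50, 5) for _ in range(months)]
    for i in range(50)
}

correlations = {name: pearson(series, outputs)
                for name, series in process_measures.items()}
best = max(correlations, key=lambda name: abs(correlations[name]))
print(f"Strongest (spurious) correlation: {best}, r = {correlations[best]:+.2f}")
```

Run repeatedly with different seeds, the "strongest" process measure changes each time, which is exactly the point: a correlation discovered by screening many measures is weak evidence about what is really happening inside the black box.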

Purpose 8. To Improve: What Exactly Should Who Do Differently to Improve Performance?

Performance " 'measurement' is not an end in itself butmust be used by managers to make improvements" (NAPA1994, 22), emphasizes an advisory panel of the NationalAcademy of Public Administration. In fact, the word "im-prove" (or "improving" or "improvement") appears morethan a dozen times in this NAPA report. "Ideally," the panelconcludes, "performance data should be part of a continu-ous feedback loop that is used to report on program valueand accomplishment and identify areas where performanceis weak so that steps can be taken to promote improve-ments" (22). Yet, the panel also found "little evidence inmost [GRPA pilot performance] plans that the performanceinformation would be used to improve program perfor-mance" (8).

Similarly, Hatry argues the "fundamental purpose of performance information" is "to make program improvements" (1999b, 158). But how? What exactly is the connection between the measurement and the improvement? Who has to do what to convert the measurement into an improvement? Or does this just happen automatically? No, responds the NAPA panel: "measurement alone does not bring about performance improvement" (1994, 15).

For example, if the measurement produces some learning, someone then must convert that learning into an improvement. Someone has to intervene consciously and actively. But can any slightly competent individual pull this off? Or does it require a sophisticated appreciation of the strategies and pitfalls of converting measurement into improvement? To improve, an organization needs the capacity to adopt—and adapt—the lessons from its learning.

Learning from performance measures, however, is tricky. It isn't obvious what lessons public managers should draw about which factors are contributing to the good or poor performance, let alone how they might modify such factors to foster improvements. Improvement requires attention to the feedback—the ability to check whether the lessons postulated from the learning have been implemented in a way that actually changes organizational behavior so that it results in the better outputs and outcomes that the learning promised. Improvement is active, operational learning.

The challenge of learning from the performance measures is both intellectual and operational. Public managers who wish to use measurement to improve the performance of their agencies face two challenges: First, they have the intellectual challenge of figuring out how to learn which changes in plans, or procedures, or personnel might produce improvements. Then, they confront the operational challenge of figuring out how to implement the indicated changes.

There are a variety of standard mechanisms for using performance measures to evaluate. There exist some such mechanisms to control and budget. For the purposes of learning and improving, however, each new combination of policy objectives, political environment, budgetary resources, programmatic structure, operational capacity, regulatory constraints, and performance measures demands a more open-ended, qualitative analysis. For performance learning and performance improvement, there is no cookbook.

How does the measurement of performance beget improvement? Measurement can influence performance in a variety of ways, most of which are hardly direct or apparent. There exist a variety of feedback loops, though not all of them may be obvious, and the obvious ones may not function as expected or desired. Consequently, to measure an agency's performance in a way that can actually help improve its performance, the agency's leadership needs to think seriously not only about what it should measure, but also about how it might deploy any such measurements. Indeed, without at least some tentative theory about how the measurements can be employed to foster improvements, it is difficult to think about what should be measured.

Selection Criteria for Each Measurement Purpose

What kinds of performance measures are most appropriate for which purposes? It isn't obvious. Moreover, a measure that is particularly appropriate for one purpose may be completely useless for another. For example, "in many cases," Newcomer notes, "the sorts of measures that might effectively inform program improvement decisions may provide data that managers would not find helpful for resource allocation purposes" (1997, 8). Before choosing a performance measure, public managers must first choose their purpose.

Kravchuk and Schack note that no one measure or even one collection of measures is appropriate for all circumstances: "The search for a single array of measures for all needs should be abandoned, especially where there are divergent needs and interests among key users of performance information." Thus, they advocate "an explicit measurement strategy" that will "provide for the needs of all important users of performance information" (Kravchuk and Schack 1996, 350).

I take a similar approach. But, rather than worry about the needs of different kinds of users, I focus on the different purposes for which the users—specifically, public managers—can employ the performance measures. After all, different users want different measures because they have different purposes. But it is the nature of the purpose—not the nature of the user—that determines which characteristics of those measures will be most helpful. The usual admonition of performance measurement is, "Don't measure inputs. Don't measure processes. Don't measure outputs. Measure outcomes." But outcomes are not necessarily the best measure for all purposes.

Will a particular public manager find a certain performance measure helpful for a specific purpose? The answer depends not on the organizational position of that manager, but on whether this measure possesses the characteristics required by the manager's purpose (table 2).

Table 2  Characteristics of Performance Measures for Different Purposes

The purpose     To help achieve this purpose, public managers need
Evaluate        Outcomes, combined with inputs and with the effects of exogenous factors
Control         Inputs that can be regulated
Budget          Efficiency measures (specifically outcomes or outputs divided by inputs)
Motivate        Almost-real-time outputs compared with production targets
Promote         Easily understood aspects of performance about which citizens really care
Celebrate       Periodic and significant performance targets that, when achieved, provide people with a real sense of personal and collective accomplishment
Learn           Disaggregated data that can reveal deviancies from the expected
Improve         Inside-the-black-box relationships that connect changes in operations to changes in outputs and outcomes

Purpose 1: To Evaluate

Evaluation requires a comparison. To evaluate the performance of an agency, its managers have to compare that performance with some standard. Such a standard can come from past performance, from the performance of similar agencies, from a professional or industry standard, or from political expectations. But without such a basis for comparison, it is impossible to determine whether the agency is performing well or poorly.

And to compare actual performance against the performance criterion requires a variety of outcome measures, combined with some input (plus environmental, process, and output) measures. The focus, however, is on the outcomes. To evaluate a public agency—to determine whether it is achieving its public purpose—requires some measure of the outcomes that the agency was designed to affect. Only with outcome measures can public managers answer the effectiveness question: Did the agency achieve the results it set out to produce? Then, dividing by some input measures, they can ask the efficiency question: Did this agency produce these results in a cost-effective way? To answer either of these evaluative questions, a manager needs to measure outcomes.

Of course, the agency did not produce all of the outcomes alone. Other factors, such as economic conditions, affected them. Consequently, public managers also need to ask the impact question: What did the agency itself accomplish? What is the difference between the actual outcomes and the outcomes that would have occurred if the agency had not acted?

Another way of assessing an organization or program is to evaluate its internal operations. This is the best-practice question: How do the operations and practices of this organization or program compare with the ones that are known to be most effective and efficient? To conduct such a best-practice evaluation requires some process measures—appropriate descriptions of the organization's key internal operations that can be compared with some operational standards.

No one comparison of a single outcome measure with a single performance standard will provide a definitive evaluation. Rather, to provide a conscientious and credible picture of the agency's performance, an evaluation requires multiple measures compared with multiple standards.

Purpose 2: To Control

To control the behavior of agencies and employees, public officials need input requirements. Indeed, whenever you discover officials who are using input measures, you can be sure they are using them to control. To do this, officials need to measure the corresponding behavior of individuals and organizations and then compare this performance with the requirements to check who has and has not complied: Did the teachers follow the curricular requirements for the children in their classrooms? Did the judges follow the sentencing requirements for those found guilty in their courts? Often, such requirements are described only as guidelines: curriculum guidelines, sentencing guidelines. Do not be fooled. These guidelines are really requirements, and these requirements are designed to control. The measurement of compliance with these requirements is the mechanism of control.

Purpose 3: To Budget

To use performance measures for budgeting purposes, public managers need measures that describe the efficiency of various activities. Then, once political leaders have set macro budgetary priorities, agency managers can use efficiency measures to suggest the activities in which they should invest the appropriated funds. Why spend limited funds on some programs or organizations when the performance measures reveal that other programs or organizations are more efficient at achieving the political objectives behind the budget's macro allocations?

To use performance measures to budget, however, managers need not only data on outcomes (or outputs) for the numerator in the efficiency equation; they also need reliable cost data for the denominator. And these cost measures have to capture not only the obvious, direct costs of the agency or program, but also the hidden, indirect costs. Few governments, however, have created cost-accounting or activity-based-accounting systems that assign to each government function the full and accurate costs (Coe 1999, 112; Joyce 1997, 53, 56; Thompson 1994).

Budgeting usually concerns the allocation of dollars. But most public managers are constrained by a system of double budgeting. They must manage a fixed number of dollars and a fixed number of personnel slots. Thus, in attempting to maximize the productivity of these two constrained resources, they also need to budget their people. And to use performance measurement for this budgetary purpose, they need not only outcome (or output) measures for the numerator of their efficiency equation, but also input data in terms of people for the denominator. Public managers need to allocate their people to the activities with the highest productivity per person.
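To make the arithmetic of this efficiency equation concrete, here is a minimal Python sketch. The program names, outcome counts, and costs are invented purely for illustration; the only point it carries over from the text is that the denominator should include indirect as well as direct costs.

```python
# Minimal sketch with hypothetical numbers: ranking activities by an
# efficiency ratio -- outcomes (or outputs) divided by the full cost of
# the inputs, including indirect costs.
programs = {
    # program: (job placements produced, direct cost, indirect cost)
    "job-training site A": (300, 450_000, 50_000),
    "job-training site B": (220, 300_000, 40_000),
    "job-training site C": (410, 700_000, 90_000),
}

def efficiency(outcomes, direct, indirect):
    """Outcomes per dollar, using full (direct + indirect) costs."""
    return outcomes / (direct + indirect)

ranked = sorted(programs.items(),
                key=lambda item: efficiency(*item[1]),
                reverse=True)

for name, (outcomes, direct, indirect) in ranked:
    cost_per_outcome = (direct + indirect) / outcomes
    print(f"{name}: ${cost_per_outcome:,.0f} per placement")
```

The same ratio can be computed with staff positions in the denominator when the constrained resource is people rather than dollars.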

Purpose 4: To Motivate

To motivate people to work harder or smarter, public managers need almost-real-time measures of outputs to compare with production targets. Organizations don't produce outcomes; organizations produce outputs. And to motivate an organization to improve its performance, managers have to motivate it to improve what it actually does. Consequently, although public managers want to use outcome data to evaluate their agency's performance, they need output data to motivate better performance. Managers can't motivate people to do something they can't do; managers can't motivate people to affect something over which they have little or no influence; managers can't motivate people to produce an outcome they do not themselves produce.

Moreover, to motivate, managers have to collect and distribute the output data quickly enough to provide usable feedback. Those who produce the agency's outputs cannot adjust their production processes to respond to inadequacies or deficiencies unless they know how well they are doing against their current performance target. Eli Silverman of the John Jay College of Criminal Justice describes Compstat as "intelligence-led policing" (2001, 182). The New York Police Department collects, analyzes, and quickly distributes to managers at all levels—from commissioner to patrol sergeants—the data about the current patterns and concentrations of crime that are necessary to develop strategic responses.

This helps to explain why society attempts to motivate schools and teachers with test scores. The real, ultimate outcome that citizens seek from our public schools is children who grow up to become productive employees and responsible citizens. But using a measure of employee productivity and citizen responsibility to motivate performance creates a number of problems. First, it is very difficult to develop a widely acceptable measure of employee productivity (do we simply use wage levels?), let alone citizen responsibility (do we use voting participation?). Second, schools and teachers are not the only contributors to a future adult's productivity and responsibility. And third, the lag between when the schools and teachers do their work and when these outcomes can be measured is not just months or years, but decades. Thus, we never could feed these outcome measures back to the schools and teachers in time for them to make any adjustments. Consequently, as a society we must resort to motivating schools and teachers with outputs—with test scores that (presumably) measure how much a child has learned. And although we cannot determine whether schools and teachers are producing productivity or responsibility in future adults, citizens expect they will convey some very testable knowledge and skills.

Once an agency's leaders have motivated significant improvements using output targets, they can create some outcome targets. Output targets can motivate people to focus on improving their agency's internal processes, which produce the outputs. Outcome targets, in contrast, can motivate people to look outside their agency—to seek ways to collaborate with other individuals and organizations whose activities may affect (perhaps more directly) the outcomes and values the agency is really charged with producing (Bardach 1998; Sparrow 2000).

Purpose 5: To Promote

To convince citizens their agency is effective and efficient, public managers need easily understood measures of those aspects of performance about which many citizens personally care. And such performance may be only tangentially related to the agency's public purpose.

The National Academy of Public Administration, in its study of early performance-measurement plans under the Government Performance and Results Act, noted that "most plans recognized the need to communicate performance evaluation results to higher level officials, but did not show clear recognition that the form and level of data for these needs would be different than that for operating managers." NAPA emphasized that the needs of "department heads, the Executive Office of the President, and Congress" are "different and the special needs of each should be more explicitly defined" (1994, 23). Similarly, Kaplan and Norton stress that different customers have different concerns (1992, 73-74).

Consider a state division of motor vehicles. Its mission is safety—the safety of vehicle drivers, vehicle passengers, bicycle riders, and pedestrians. In pursuit of this mission, this agency inspects vehicles to ensure their safety equipment is working, and it inspects people to ensure they are safe drivers. When most citizens think about their division of motor vehicles, however, what is their greatest care? Answer: How long they will have to stand in line. If a state DMV wants to promote itself to the public, it has to emphasize just one obvious aspect of performance: the time people spend in line. To promote itself to the public, a DMV has to use this performance measure to convince citizens that the time they will spend in line is going down.

Ammons (1995) offers a "revolutionary" approach: "make performance measurement interesting." Municipalities, he argues, ought to adopt measures "that can capture the interest of local media and the public" (43)—particularly measures that "allow meaningful comparisons that to some degree put community pride at stake" (38). Such comparisons could be to a professional or industry standard. After all, as Ammons notes, "comparison with a standard captures attention, where raw information does not" (39). But he is even more attracted to interjurisdictional comparisons. For example, Ammons argues that the public pays attention to various rankings of communities (because, first, journalists pay attention to them). Thus, he wants measures that "are both revealing of operational efficiency and effectiveness and more conducive to cross-jurisdictional comparisons" (38)—measures that provide "opportunities for interesting and meaningful performance comparisons" (44).

Time spent in line is a measure that is both interesting and meaningful. But what should be the standard for comparison? Is the average time spent in line the most meaningful measure? Or do people care more about the probability they will spend more than some unacceptable time (say one hour) in line? Whatever time-in-line measure it chooses, a DMV may want to compare it with the same measure from neighboring (or similar) states. But will citizens in North Carolina really be impressed that they spend less time in a DMV line than do the citizens of South Carolina or North Dakota? Or will their most meaningful comparison be with the time spent in line at their local supermarket, bank, or fast-food franchise? A DMV manager who wants to promote the agency's competence to the public should compare its time-in-line performance with similar organizational performance that people experience every day. This is an easily understood performance characteristic about which citizens really care.

To do this, however, the agency must not only publish the performance data; it must also make them accessible both physically and psychologically. People must be able to obtain—perhaps not avoid—the measures; they must also find them easy to comprehend.

Purpose 6: To Celebrate

Before an agency can do any celebrating, its managers need to create a performance target that, when achieved, gives its employees and collaborators a real sense of personal and collective accomplishment. This target can be one that has also been used to motivate; it can be an annual target, or one of the monthly or quarterly targets into which an annual target has been divided. Once an agency has produced a tangible and genuine accomplishment that is worth commemorating, its managers need to create a festivity that is proportional to the significance of the achievement.

The verb "to celebrate" suggests a major undertaking—a big, end-of-the-fiscal-year bash, an awards ceremonywhen a most-wanted criminal is captured, or a victory partywhen a badly delayed project is completed on deadline.But private-sector managers celebrate lesser accomplish-ments; they use celebrations of significant successes toconvince their employees that their firm is full of winners(Peters and Austin 1985). Public managers have used thisstrategy, too (Behn 1991, 103-11). But to make the strat-egy work—to ensure that it creates motivation and thusimprovement—the agency's managers have to lead thecelebrations.

Purpose 7: To Learn

To learn, public managers need a large number and wide variety of measures—measures that provide detailed, disaggregated information on the various aspects of the various operations of the various components of the agency. When seeking to learn, caution Kravchuk and Schack, public managers need to "avoid excessive aggregation of information" (1996, 357).

Benchmarking is a traditional form of performance measurement that is designed to facilitate learning (Holloway, Francis, and Hinton 1999). It seeks to answer three questions: What is my organization doing well? What is my organization not doing well? What does my organization need to do differently to improve on what it is not doing well? The organization, public or private, identifies a critical internal process, measures it, and compares these data with similar measurements of the identical (or similar) processes of organizations that are recognized as (currently) the best. Any differences suggest not only that the organization needs to improve, but also provide a basis for identifying how it could achieve these improvements.

Benchmarking, write Kouzmin et al., is "an instrument for assessing organizational performance and for facilitating management transfer and learning from other benchmarked organizations" (1999, 121). Benchmarking, as they define it, "is a continuous, systematic process of measuring products, services and practices against organizations regarded to be superior with the aim of rectifying any performance 'gaps'" (123). Thus, they conclude, "benchmarking can, on the whole, be seen as a learning strategy" (131). Nevertheless, they caution, for this strategy to work, the organization must become a learning organization. Consequently, they conclude, "the learning effects of benchmarking are, to a very high degree, dependent on adequate organizational conditions and managerial solutions" (132).

Deciding which performance measures best facilitate learning is not easy. If public managers know what they need to do to improve performance, they don't need to learn it. But, if they don't know how they might improve, how do they go about learning it? Kravchuk and Schack note that a "measurement system is a reflection of what decision makers expect to see and how they expect to respond" (1996, 356). That is, when designing a performance-measurement system, when deciding what to measure, managers first will decide what they might see and then create a system to see it.

Real learning, however, is often triggered by the unexpected. As Richard Feynman, the Nobel Prize-winning physicist, explained, when experiments produce unexpected results, scientists start guessing at possible explanations (1965, 157). When Mendel crossed a dwarf plant with a large one, he found that he didn't get a medium-sized plant, but either a small or large one, which led him to discover the laws of heredity (Messadie 1991, 90). When the planet Uranus was discovered to be unexpectedly deviating from its predicted orbit, John Couch Adams and Urbain Le Verrier independently calculated the orbit of an unknown planet that could be causing this unanticipated behavior; then, Johann Gottfried Galle pointed his telescope in the suggested direction and discovered Neptune (Standage 2000). When Karl Jansky observed that the static on his radio peaked every 24 hours and that the peak occurred when the Milky Way was centered on his antenna, he discovered radio waves from space (Messadie 1991, 179). Scientific learning often emerges from an effort to explain the unexpected. So does management learning.

Yet how can public officials design a measurement system for the unexpected when they can't figure out what they don't expect? As Kravchuk and Schack write, "unexpected occurrences may not be apprehended by existing measures" (1996, 356). Nevertheless, the more disaggregated the data, the more likely they are to reveal deviancies that may suggest the need to learn. This is the value of management by walking around—or what might be called "data collection by walking around." The stories that people tell managers are the ultimate in disaggregation; one such story can provide a single deviate datum that the summary statistics have completely masked but that, precisely because it was unexpected, prods further investigation that can produce some real learning (and thus, perhaps, some real improvement).

In fact, the learning process may first be triggered by some deviance from the expected that appeared not in the formal performance data, but in the informal monitoring in which all public managers necessarily (if only implicitly) engage. Then, having noticed this deviancy—some aberration that doesn't fit previous patterns, some number so out of line that it jumps off the page, some subtle sign that suggests that something isn't quite right—the manager can create a measuring strategy to learn what caused the deviance and how it can be fixed or exploited.

Failure is a most obvious deviance from the expected and, therefore, provides a significant opportunity to learn. Indeed, a retrospective investigation into the causes of the failure will uncover a variety of measures that deviated from the expected—that is, either from the agency's prescribed behavior or from the predicted consequences of such prescriptions. Thus, failure provides multiple opportunities to learn (Petroski 1985; Sitkin 1992).

Yet, failure (particularly in the public sector) is usually punished—and severely. Thus, when a failure is revealed (or even presumed), people tend to hide the deviate data, for such data can be used to assign blame. Unfortunately, these are the same deviate data that are needed to learn.

As glaring departures from the expected, failures provide managers with obvious opportunities to learn. Most deviances, however, are more subtle. Thus, to learn from such deviances, managers must be intellectually prepared to recognize them and to examine their causes. They have to possess enough knowledge about the operation and behavior of their organization—and about the operation and behavior of their collaborators and society—to distinguish a significant deviance from a random aberration. And when they think they observe an interesting deviance, they need a learning strategy for probing the causes and possible implications.

Thus, Kravchuk and Schack (1996) caution, "organizational learning cannot depend upon measurement alone" (356)—that is, "performance measurement systems cannot replace the efforts of administrators to truly know, understand, and manage their programs" (350). Rather, they argue, the measures should indicate when the organization needs to undertake a serious effort at learning based on the "expert knowledge" (357) of its program managers and "other sources of performance information which can supplement the formal measures" (356). Thus, they suggest, "measures should be placed in a management-by-exception frame, where they are regarded as indicators that will serve to signal the need to investigate further" (357). Similarly, Neves, Wolf, and Benton write that "management indicators are intended to be provocative, to suggest to managers a few areas where it may be appropriate to investigate further why a particular indicator shows up the way it does" (1986, 129). The better the manager understands his or her agency and the political, social, and cultural environment in which it works, the better the manager is able to identify—from among the various deviances that are generated by formal and informal performance measures—the ones that are worthy of additional investigation.

Performance measures that diverge from the expected can create an opportunity to learn. But the measures themselves are more likely to suggest topics for investigation than to directly impart key operational lessons.

Purpose 8: To Improve

To ratchet up performance, public managers need to understand how they can influence the behavior of the people inside their agency (and its collaboratives) who produce their outputs and how they can influence the conduct of citizens who convert these outputs into outcomes. They need to know what is going on inside their organization—including the broader organization that consists of everything and everyone whose behavior can affect these outputs and outcomes. They need to know what is going on inside their entire, operational black box. They need inside-the-black-box data that explain how the inputs, environment, and operations they can change (influence or inspire) do (can, or might) cause (create, or contribute to) improvements in the outputs and outcomes. For example, a fire chief needs to understand how the budget input interacts (inside the fire department's black box) with people, equipment, training, and values to affect how the department's staff implements its fire-fighting strategy and its educational fire-prevention strategy—outputs that further interact with the behavior of citizens to produce the desired outcomes of fewer fires and fewer people injured or killed by fires that do occur.

Unfortunately, what is really going on inside the black box of any public agency is both complex and difficult to perceive. Much of it is going on inside the brains (often the subconscious brains) of the employees who work within the organization, the collaborators who somehow contribute to its outputs, and the citizens who convert these outputs into outcomes. Moreover, any single action may ripple through the agency, collaborators, and society as people adjust their behavior in response to seemingly small or irrelevant changes made by someone in some far-off corner. And when several people simultaneously take several actions, the ripples may interact in complex and unpredictable ways. It is very difficult to understand the black-box adjustments and interactions that happen when just a few of the inputs (or processes) are changed, let alone when many of them are changing simultaneously and perhaps in undetected ways.

Once the managers have figured out what is going on inside their black box, they have to figure out how the few things they can do are connected to the internal components they want to affect (because these components are, in turn, connected to the desired outputs or outcomes). How can changes in the budget's size or allocations affect people's behavior? How can changes in one core process affect other processes? How can changes in one strategy support or undermine other strategies? How might they influence people's behavior?

Specifically, how might various leadership activities ripple through the black box? How might frequent, informal recognition of clear, if modest, successes or public attention to some small wins activate others? How might an inspirational speech or a more dramatic statement of the agency's mission affect the diligence, intelligence, and creativity of both organizational employees and collaborating citizens? To improve performance, public managers need measures that illuminate how their own activities affect the behavior of all of the various humans whose actions affect the outputs and outcomes they seek.

Meaningful Performance Measurement Requires a Gauge and a Context

Abstract measures are worthless. To use a performance measure—to extract information from it—a manager needs a specific, comparative gauge, plus an understanding of the relevant context. A truck has been driven 6.0 million. Six million what? Six million miles? That's impressive. Six million feet? That's only 1,136 miles. Six million inches? That's not even 95 miles. Big deal—unless those 95 miles were driven in two hours along a dirt road on a very rainy night.

To use performance measures to achieve any of these eight purposes, the public manager needs some kind of standard with which the measure can be compared.

1. To use a measure to evaluate performance, public managers need some kind of desired result with which to compare the data, and thus judge performance.

2. To use a measure of performance to control behavior, public managers need first to establish the desired behavioral or input standard from which to gauge individual or collective deviance.

3. To use efficiency measures to budget, public managers need an idea of what is a good, acceptable, or poor level of efficiency.

4. To use performance measures to motivate people, public managers need some sense of what are reasonable and significant targets.

5. To use performance measures to promote an agency's competence, public managers need to understand what the public cares about.

6. To use performance measures to celebrate, public managers need to discern the kinds of achievements that employees and collaborators think are worth celebrating.

7. To use performance measures to learn, public managers need to be able to detect unexpected (and significant) developments and anticipate a wide variety of common organizational, human, and societal behaviors.

8. To use performance measures to improve, public managers need an understanding (or prediction) of how their actions affect the inside-the-black-box behavior of the people who contribute to their desired outputs and outcomes.

All of the eight purposes require (explicitly or implicitly) a baseline with which the measure can be compared. And, of course, the appropriate baseline depends on the context.

The standard against which to compare current performance can come from a variety of sources—each with its own advantages and liabilities. The agency may use its historical record as a baseline, looking to see how much it has improved. It may use comparative information from similar organizations, such as the data collected by the Comparative Performance Measurement Consortium organized by the International City/County Management Association (1999), or the effort to measure and compare the performance of local jurisdictions in North Carolina organized by the University of North Carolina (Rivenbark and Few 2000). Of course, comparative data also may come from dissimilar organizations; citizens may compare—implicitly or quite explicitly—the ease of navigating a government Web site with the ease of navigating those created by private businesses. Or the standard may be an explicit performance target established by the legislature, by political executives, or by career managers. Even to control, managers need some kind of Tayloristic standard to be met by those whose behavior they seek to control. Whether public managers want to evaluate, control, budget, motivate, promote, celebrate, learn, or improve, they need both a measure and a standard of performance.

The Political Complexities of Measuring Performance

Who will pick the purpose, the measure, and the performance standard? The leadership team of a public agency has both the opportunity and the responsibility. But others—elected executives and legislators, political appointees and budget officers, journalists and stakeholders, and of course individual citizens—have the same opportunity, and often the same responsibility. Consequently, the agency's managers may discover that a set of performance measures has been imposed on them.

In some ways, however, public managers have more flexibility in selecting the performance measures that will be used by outsiders than do private-sector managers. After all, investment analysts long ago settled on a specific collection of performance measures—from return on equity to growth in market share—that they use when examining a business. For public agencies, however, no such broadly applicable and widely acceptable performance measures exist. Thus, every time those outsiders—whether they are budget officers or stakeholders—wish to examine a particular agency's management, they have to create some performance measures.

Sometimes, some will. Sometimes, a legislator or a budget officer will know exactly how he or she thinks the performance of a particular public agency should be measured. Sometimes, none will. Sometimes, no outsider will be able to devise a performance measure that makes much sense. Sometimes, many will. Sometimes several outsiders—an elected executive, a newspaper editor, and a stakeholder organization—will each develop a performance measure (or several such measures) for the agency. And when this happens, these measures may well conflict.

Mostly, these outsiders use their performance measures to evaluate, control, budget, or punish. Some might say, "We need this performance measure to hold the agency accountable." By this, they really mean, "We need this performance measure to evaluate the agency and if (as we suspect) the agency doesn't measure up, we will punish it by cutting its budget (or saying nasty things that will be reported by journalists)." Outsiders are less likely to use performance measures to motivate, promote, or celebrate—though they could try to use them to force improvements.

Thus, the managers of a public agency may not have complete freedom to choose their own performance measures. They may have to pay attention to measures chosen by others. Even when they must respond to measures imposed by outsiders, however, the leaders of a public agency have not lost their obligation to create a collection of performance measures that they will use to manage the agency. The leadership team still must report the measures that outsiders are, legitimately, requesting. And they may be able to use some of these measures for one or more of their own eight purposes. But even when others have chosen their own measures of the agency's performance, its leaders still need to seriously examine the eight managerial purposes for which performance measures may prove useful and carefully select the best measures available for each purpose.

The Futile Search for the One Best Measure

"What gets measured gets done" is, perhaps, the mostfamous aphorism of performance measurement.^^ If youmeasure it, people will do it. Unfortunately, what peoplemeasure often is not precisely what they want done. Andpeople—responding to the explicit or implicit incentivesof the measurement—will do what people are measuring,not what these people actually want done. This is, as StevenKerr, now chief learning officer at Goldman Sachs, wiselyobserves, "the folly of rewarding A while hoping for B"(1975). Thus, although performance measures shape be-havior, they may shape behavior in both desirable and un-desirable ways.̂ *

For a business, the traditional performance measure has been the infamous bottom line—although any business has not just one bottom line, but many of them: a variety of financial ratios (return on equity, return on sales) that collectively suggest how well the firm is doing—or, at least, how well it has done. But as Kaplan and Norton observe, "many have criticized financial measures because of their well-documented inadequacies, their backward-looking focus, and their inability to reflect contemporary value-creating actions." Thus, Kaplan and Norton invented their now-famous balanced scorecard to give businesses a broader set of measures that capture more than the firm's most recent financial numbers. They want performance measures that answer four questions from four different perspectives:

• How do customers see us? (customer perspective)
• What must we excel at? (internal business perspective)
• Can we continue to improve and create value? (innovation and learning perspective)
• How do we look to shareholders? (financial perspective)

No single measure of performance answers all four questions (1992, 77, 72).

Similarly, there is no one magic performance measure that public managers can use for all of their eight purposes.


The search for the one best measurement is just as futile as the search for the one best way (Behn 1996). Indeed, this is precisely the argument behind Kaplan and Norton's balanced scorecard: Private-sector managers, they argue, "should not have to choose between financial and operational measures"; instead, business executives need "a balanced presentation of both financial and operational measures" (1992, 71). The same applies to public managers, who are faced with a more diverse set of stakeholders (not just customers and shareholders), a more contradictory set of demands for activities in which they ought to excel, and a more complex set of obstacles that must be overcome to improve and create value. Consequently, they need an even more heterogeneous family of measures than the four that Kaplan and Norton propose for business.

The leaders of a public agency should not go looking for their one magic performance measure. Instead, they should begin by deciding on the managerial purposes to which performance measurement may contribute. Only then can they select a collection of performance measures with the characteristics necessary to help them (directly and indirectly) achieve these purposes.

Notes

1. Okay, not everyone is measuring performance. From a survey of municipalities in the United States, Poister and Streib find that "some 38 percent of the [695] respondents indicate that their cities use performance measures, a significantly lower percentage than reported by some of the earlier surveys" (1999, 328). Similarly, Ammons reports on municipal governments' "meager record" of using performance measures (1995, 42). And, of course, people who report they are measuring performance may not really be using these measures for any real purpose. Joyce notes there is "little evidence that performance information is actually used in the process of making budget decisions" (1997, 59).

2. People can measure the performance of (1) a public agency; (2) a public program; (3) a nonprofit or for-profit contractor that is providing a public service; or (4) a collaborative of public, nonprofit, and for-profit organizations. For brevity, I usually mention only the agency—though I clearly intend my reference to a public agency's performance to include the performance of its programs, contractors, and collaboratives.

3. Although Hatry provides the usual list of different types of performance information—input, process, output, outcome, efficiency, workload, and impact data (1999b, 12)—when discussing his 10 different purposes (chapter 11), he refers almost exclusively to outcome measures.

4. These eight purposes are not completely distinct. For example, learning itself is valuable only when put to some use. Obviously, two ways to use the learning extracted from performance measurements are to improve and to budget. Similarly, evaluation is not an ultimate purpose; to be valuable, any evaluation has to be used either to redesign programs (to improve) or to reallocate resources (to budget) by moving funds into more valuable uses. Even the budgetary purpose is subordinate to improvement.

Indeed, the other seven purposes are all subordinate to improvement. Whenever public managers use performance measures to evaluate, control, budget, motivate, promote, celebrate, or learn, they do so only because these activities—they believe or hope—will help to improve the performance of government.

There is, however, no guarantee that every use of performance measures to budget or celebrate will automatically enhance performance. There is no guarantee that every controlling or motivational strategy will improve performance. Public managers who seek evaluation or learning measures as a step toward improving performance need to think carefully not only about why they are measuring, but also about what they will do with these measurements and how they will employ them to improve performance.

5. Jolie Bain Pillsbury deserves the credit for explicitly pointing out to me that distinctly different purposes for measuring performance exist. On April 16, 1996, at a seminar at Duke University, she defined five purposes: evaluate, motivate, learn, promote, and celebrate (Behn 1997b).

Others, however, have also observed this. For example, Osborne and Gaebler (1992), in their chapter on "Results-Oriented Government," employ five headings that capture five of my eight purposes: "If you don't measure results, you can't tell success from failure" (147) (evaluate); "If you can't see success, you can't reward it" (148) (motivate); "If you can't see success, you can't learn from it" (150) (learn); "If you can't recognize failure, you can't correct it" (152) (improve); "If you can demonstrate results, you can win public support" (154) (promote).

6. Anyone who wishes to add a purpose to this list should also define the characteristics of potential measures that will be most appropriate for this additional purpose.

7. But isn't promoting accountability a quite distinct and also very important purpose for measuring performance? After all, scholars and practitioners emphasize the connection between performance measurement and accountability. Indeed, it is Hatry's first use of performance information (1999b, 158). In a report commissioned by the Governmental Accounting Standards Board on what it calls "service efforts and accomplishments [SEA] reporting," Hatry, Fountain, and Sullivan (1990) note that SEA measurement reflects the board's desire "to assist in fulfilling government's duty to be publicly accountable and ... enable users to assess that accountability" (2). Moreover, they argue, without such performance measures, elected officials, citizens, and other users "are not able to fully assess the adequacy of the governmental entity's performance or hold it accountable for the management of taxpayer and other resources" (2-3). Indeed, they continue, elected officials and public managers have a responsibility "to be accountable by giving information that will assist the public in assessing the results of operations" (5).

But what exactly does it mean to hold government accountable? In a 1989 resolution, the Governmental Accounting Standards Board called SEA information "an essential element of accountability." Indeed, in this resolution, the agency "gave considerable weight to the concept of accountability: of 'being obliged to explain one's actions, to justify what one does'; of being required 'to answer to the citizenry, to justify the raising of public resources and the purposes for which they are used'" (Hatry et al. 1990, v). But does the phrase "hold government accountable" cover only the requirements to explain, justify, and answer? Or does accountability really mean punishment?

I find the use of the word "accountability" to be both ubiquitous and ambiguous. Yet it is difficult to examine how performance measurement will or might promote accountability without first deciding what citizens collectively mean by accountability—particularly, what we mean by accountability for performance. What does it mean to hold a public agency or manager accountable for performance? Presumably, this holding-people-accountable-for-performance process would employ some measure of performance. But what measures would be most useful for this promoting-accountability purpose? And how would those measures actually be used to promote accountability? (Or to revise the logical sequence: How might we use performance measures to promote accountability? Then, what measures would be most useful for promoting this accountability?) Before we as a polity can think analytically and creatively about how we might use performance measures to promote accountability, we need to think more analytically and creatively about what we mean by accountability. For a more detailed discussion of accountability—particularly of accountability for performance—see Behn (2001).

8. Joyce (1997) makes a similar argument: "The ability to measure performance is inexorably related to a clear understanding of what an agency or program is trying to accomplish" (50). Unfortunately, he continues, "the U.S. constitutional and political traditions, particularly at the national and state levels, work against this kind of clarity, because objectives are open to constant interpretation and reinterpretation at every stage of the policy process" (60).

9. Although a controlling approach to managing superior-subordinate relations may be out of style, the same is not necessarily true for how policy makers manage operating agencies. The New Public Management consists of two conflicting approaches: Letting the managers manage, versus making the managers manage (Kettl 1997, 447-48). And while the let-the-managers-manage strategy does, indeed, empower the managers of public agencies (for example, by giving them more flexibility), the make-the-managers-manage strategy does, in some ways, seek to control the managers. Yes, under a make-the-manager-manage performance contract, the manager has the flexibility to achieve his or her performance targets; at the same time, these output targets can be thought of as output "controls." I am grateful to an anonymous referee for this insight.

10. For a discussion of the pervasive efforts of public officials to control the behavior of their subordinates, see the classic discussion by Landau and Stout (1979).

11. For example, Robert Anthony's business texts on accounting include The Management Control Function (1988) and (with Vijay Govindarajan) Management Control Systems (1998). Similarly, his equivalent book (with David W. Young) for the public and nonprofit sectors is titled Management Control in Nonprofit Organizations (1999).

12. "Do management information systems lead to greater man-agement control?" Overman and Loraine (1994, 193) askthis question and conclude they do not. From an analysis of99 Air Force contracts, they could not fmd any relationshipbetween the quality, detail, and timeliness of informationreceived from the vendors and the cost, schedule, or qualityof the project. Instead, they argue, "information can sym-bolize other values in the organization" (194). Still, legisla-tures, overhead agencies, and line managers seek controlthrough performance measurement.

13. Melkers and Willoughby (2001) report that 47 of the 50 states have some form of performance budgeting, which they define "as requiring strategic planning regarding agency mission, goals and objectives, and a process that requests quantifiable data that provides meaningful information about program outcomes" (54). Yet when they asked people in both the executive and legislative branches of state government if their state had implemented performance budgeting, they found that "surprisingly, budgeters from a handful of states (10) disagreed across the branches as to implementation of performance budget in their state" (59). A handful? Melkers and Willoughby received responses from both branches from only 32 states. Thus, in nearly a third of the states that responded, the legislative-branch respondent disagreed with the executive-branch respondent. Not only is it difficult to define what performance budgeting is, it is also difficult to determine whether it has been implemented.

14. A more controversial use of performance measurement to motivate is the linking of performance data to an individual's pay. For a discussion, see Smith and Wallace (1995).

15. Williams, McShane, and Sechrest (1994) worry that "raw data may be misinterpreted by those without statistical training" (538), while at the same time "summaries of management information based on aggregate data are potentially dangerous to decision makers" (539). Moreover, they note the differing assumptions that managers and evaluators bring to the task of interpreting data: "The administrator often makes the implicit assumption that a project or operation is fully capable of succeeding" (541), while "the evaluator is apt to see the very core of his role as a questioning of the assumptions behind the project" (541). The evaluator starts with the assumption that the program doesn't work; the manager, of course, believes that it does.

16. Understanding what is going on inside the black box is difficult in all of the sciences. Physicists, for example, do not know what is going on inside the black box of gravity. They know what happens—they know that two physical objects attract each other, and they can calculate the strength of that attraction—but they don't understand how the inputs of mass and distance are converted into the output of physical attraction. Newton figured out that, to determine very accurately (in a very wide variety of circumstances) the force of attraction between two objects, you need only three measurable inputs: the mass of the first object, the mass of the second object, and the distance between them:

F = G × m₁ × m₂ / d²

(where G is the universal gravitational constant)

Unfortunately for physicists, this universal law doesn't work at the subatomic level: Here, the classical laws of gravitational and electrical attraction between physical objects do not hold. Thus, when confronted with their inability to even calculate (using an existing formula) what is happening inside one of their black boxes, physicists invent new concepts (Behn 1992)—in this case, for example, the strong force and the weak force—that they can use to produce calculations that match the behavior they observe. But this does not mean that physicists understand what creates these inside-the-black-box forces.

17. My black box of performance management differs from the one defined by Ingraham and Kneedler (2000) and Ingraham and Donahue (2000). In their "new performance model," the black box is government management, and the inputs are politics, policy direction, and resources, all of which are imbedded in a set of environmental factors or contingencies. In my conception, the black box is the agency (or, more accurately, the people who work in the agency), the collaborative (that is, the people who staff both the agency and its various partners), or society (the collection of citizens). Management and leadership are inputs that seek to improve the performance of the black box by convincing the environment to provide better inputs and by attempting to influence the diligence, intelligence, and creativity with which those inside the black box convert the other inputs into outputs and outcomes.

18. If all of the data are indeed random, analysts who use the traditional 5 percent threshold for statistical significance and who check out 20 potentially causal variables will identify (on average) one of these variables as statistically significant. If they test 100 variables, they will (on average) find five that are statistically significant.
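To make the arithmetic behind this note explicit, a minimal worked illustration (the symbols α, for the significance threshold, and m, for the number of truly random variables tested, are introduced here for convenience and assume independent tests):

\[
\mathbb{E}[\text{false positives}] = \alpha \times m, \qquad 0.05 \times 20 = 1, \qquad 0.05 \times 100 = 5.
\]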

19. Perhaps this explains why formal performance evaluation has attracted a larger following and literature than has performance learning, let alone performance improvement.

20. A note of caution: Using outcomes to evaluate an organization's performance makes sense, except that the organization's performance is not the only factor affecting the outcomes. Yet, cognitive psychologists have identified the "outcome effect" in the evaluation of managerial performance. This outcome effect causes the evaluators of a manager's decision to give more weight to the actual outcome than is warranted given the circumstances—particularly, the uncertainty—under which the original decision was made. That is, when an evaluator considers a decision made by a manager, who could only make an uncertain, probabilistic guess about the future state of one or more important variables, the evaluator will give higher marks when the outcome seems to validate the initial choice than when the outcome doesn't—even when the circumstances of the decision are precisely the same (Baron and Hershey 1988; Lipshitz 1989; Hawkins and Hastie 1990; Hershey and Baron 1992; Ghosh and Ray 2000).

21. In business, Kaplan and Norton (1992) emphasize, the challenge is to figure out how to make an "explicit linkage between operations and finance" (79). They emphasize, "the hard truth is that if improved [internal, operational] performance fails to be reflected in the bottom line, executives should reexamine the basic assumptions of their strategy and mission" (77).

The same applies in government. Public managers need an explicit link between operations and outcomes. If they use output (or process) measures to motivate people in their agency to ratchet up performance, and yet the outcomes that these outputs (or processes) are supposed to produce don't improve, they need to reexamine their strategy—and their assumptions about how these outputs (or processes) may or may not contribute to the desired outcomes.

22. In some ways, measures that are designed to motivate internal improvements in a public agency's performance appear to correspond to the measures that Kaplan and Norton design for their internal business perspective. Such internal measures help managers to focus on critical internal operations, write Kaplan and Norton. "Companies should decide what processes and competencies they must excel at and specify measures for each." Then, they continue, "managers must devise measures that are influenced by employees' actions" (1992, 74-75). That is, to motivate their employees to improve internal operations, a firm's leaders need output measures.

23. In January 2002, when he announced his campaign for state treasurer, Daniel A. Grabauskas emphasized that, as the Massachusetts registrar of motor vehicles, he had cut waiting time by over an hour (Klein 2002).

24. For the general public, NAPA's advisory panel suggests, performance measures need to be suitably summarized (NAPA 1994, 23).

25. When attempting to select a measure to promote a public agency's achievements, it is not obvious which performance measure will capture citizens' concerns. What, for example, do the citizens visiting their state division of motor vehicles really care about? They might care about how long they wait in line. They might care less about how long they wait in line if they know, when they first get in line, how long they will have to wait. Some might say they will be quite happy to wait 29 minutes, but not 30. Or, they might not care how long they wait as long as they can do it in a comfortable chair. Thus, before selecting a measure to promote the agency's performance, the agency's leadership should make some effort—through polls, focus groups, or customer surveys—to determine what the public wants.

Polls or focus groups, however, may produce only a theoretical answer. Have people who visit the DMV only biennially really thought through what they want? A customer survey—administered as people leave the DMV (or while they wait)—might produce a better sense of how people really measure the agency's performance.

26. Actually, managers don't need to identify the best practice. To improve their organization's performance, they need only to identify a better practice.

27. To promote the division of motor vehicles with the public, managers may simply publish the average length of time that citizens wait in line at the DMV office (compared, perhaps, with the average wait in similar lines). Waiting time is an easily understood concept. Yet, when an organization reports its waiting time, it rarely explains that this number is the average waiting time because this is what people implicitly assume.

To learn, however, the average wait is too much of a summary statistic. The average wait obscures all of the interesting deviances that can be found only in the disaggregated data: What branch has a wait time that is half the statewide average (and what can the division learn from this)? What day of the month or hour of the day has the longest wait time (and what might the division do about this)? From such deviances, the DMV's managers can learn what is working best within their agency and what needs to be fixed.

28. After all, if any individual had expected a major deviance, he or she presumably would have done something to prevent it. Of course, an individual might have anticipated this deviance but also anticipated that it would be labeled a minor mistake rather than a huge failure (and thus they, too, have an opportunity to learn because, although they had expected the event, they did not expect it would be called a failure). Or, an individual may have anticipated the failure but may not have been in a position to prevent it or to convince those who could prevent it that it would really happen. Or an individual may have anticipated the failure and hoped it would occur because its consequences (which were unanticipated by others) would further the individual's own agenda. Or an individual may have anticipated the failure but gambled that the probability of the failure, combined with its personal costs, was less than the certain personal costs of exposing his or her own responsibility for the causes behind this (potential, future) failure. Some people may have anticipated the failure, but certainly not everyone did.

29. The evaluator's ideal, of course, is that "only one new strategy should be introduced in one [organization]," while the baseline strategy "would go on in a demographically similar [organization]." Public executives, however, rarely are able to conduct the evaluator's "carefully controlled field experiment" (Karmen 2000, 95). Moreover, if the manager believes that two strategies will have a synergistic effect, he or she will—quite naturally—choose to implement them simultaneously in the same jurisdiction.

30. This suggests the limitations of performance budgeting as a strategy for improving performance: How much do budget officials know about how budget allocations affect the inside-the-black-box behaviors that improve performance? Do they really know which changes in the budget inputs will create the kind of complex inside-the-black-box interactions that can create improvements in organizational outputs, and thus societal outcomes?

31. Managers could simply allocate the available funds to the existing activities that are (using strictly internal comparisons) most efficient. Without some external standard of efficiency, however, they could spend all of their appropriations on completely inefficient operations.

32. Measuring performance against similar public agencies in a way that facilitates useful comparisons among jurisdictions is not easy. Agencies and jurisdictions organize themselves differently. They collect different kinds of data. They define inputs, processes, outputs, and outcomes differently. Consequently, obtaining comparable data is difficult—sometimes impossible. To make valid comparisons, someone must undertake the time-consuming task that Ammons, Coe, and Lombardo call "data cleaning" (2001, 102).

Still, even when perfectly similar data have been collected from multiple jurisdictions, making useful comparisons is also difficult. Is one city reporting higher performance data for a particular agency because its leadership is more inspiring or inventive, because the agency has inherited a more effective organizational structure, because its political and managerial leadership has adopted a strategy designed to focus on some outcomes and not others, because the city council established the agency's mission as a higher priority, because the city council was willing to invest in modern technology, or because more of its citizens are cooperating fully? Even comparing performance measures for such a fundamental public service as trash collection is not simple. For a more detailed discussion of why benchmarking performance results among local governments may be more difficult than theorized, see Coe (1999, 111).

33. Criticism of public-sector service delivery has increased in the last two decades because a number of the traditional process measures for public-sector services—such as the time spent in a line or the ease of navigating a Web site—can easily be compared with the same process measures for the private sector, and because many private-sector firms have made a concerted effort to improve the process measures that customers value. Once people become accustomed to a short wait at their bank or when calling a toll-free number—once they learn that it is technically possible to ensure the wait is brief—they expect the same short wait from all other organizations, both private and public.

34. For a discussion of how accountability has become a code word for punishment and how we might make it mean something very different, see Behn (2001).


35. Peters and Waterman (1982, 268) attribute it to Mason Haire.

36. For example, in business, Kaplan and Norton write, "return-on-investment and earnings-per-share can give misleading signals for continuous improvement and innovation" (1992, 71).

37. Kaplan and Norton also argue their balanced scorecard "guards against suboptimization." Because the leadership is measuring a variety of indicators of the organization's performance, people in the organization will avoid focusing on one measure (or one kind of measure) at the expense of some others; after all, an "improvement in one area may have been achieved at the expense of another." And, even if a part of the organization chooses to focus on one performance indicator and ignore the others, the leadership—because it is measuring a variety of things—is much more apt to catch this suboptimal behavior (1992, 72).

References

Ammons, David N. 1995. Overcoming the Inadequacies of Per-formance Measurement in Local Government: The Case ofLibraries and Leisure Services. Public Administration Review55(1): 37-47.

Ammons, David N., Charles Coe, and Michael Lombardo. 2001.Performance-Comparison Projects in Local Government: Par-ticipants' Perspectives. Public Administration Review 61(1):100-10.

Anthony, Robert N. 1988. The Management Control Function.Boston, MA: Harvard Business School Press.

Anthony, Robert N., and Vijay Govindarajan. 1998. Manage-ment Control Systems. 9th ed. Burr Ridge, IL: McGraw-Hill/Irwin.

Anthony, Robert, and David W. Young. 1999. Management Con-trol in Nonprofit Organizations. 6th ed. Burr Ridge, IL:McGraw-Hill/Irwin.

Bardach, Eugene. 1998. Getting Agencies to Work Together: ThePractice and Theory of Managerial Craftsmanship. Wash-ington, DC: Brookings Institution.

Baron, Jonathan, and John C. Hershey. 1988. Outcome Bias inDecision Evaluation. Journal of Personality and Social Psy-chology 54(4): 569-79.

Behn, Robert D. 1991. Leadership Counts: Lessons for PublicManagers. Cambridge, MA: Harvard University Press.

. 1992. Management and the Neutrino: The Search forMeaningful Metaphors. Public Administration Review 52(5):409-19.

. 1996. The Futile Search for the One Best Way. Govern-ing, July, 82.

-. 1997a. The Money-Back Guarantee. Governing, Sep-tember, 74.

-. 1997b. Linking Measurement to Motivation: A Chal-lenge for Education. In Improving Educational Performance:Local and Systemic Reforms, Advances in Educational Ad-ministration 5, edited by Paul W Thurston and James G. Ward,15-58. Greenwich, CT: JAI Press.

-. 1999. Do Goals Help Create Innovative Organizations?In Public Management Reform and Innovation: Research,Theory, and Application, edited by H. George Fredericksonand Jocelyn M. Johnston, 70-88. Tuscaloosa, AL: Universityof Alabama Press.

. 2001. Rethinking Democratic Accountability. Washing-ton, DC: Brookings Institution.

Blodgett, Terrell, and Gerald Newfarmer. 1996. PerformanceMeasurement: (Arguably) The Hottest Topic in GovernmentToday. Public Management, January, 6.

Boone, Harry. 1996. Proving Government Works. State Govern-ment News, May, 10-12.

Bruns, William J. 1993. Responsibility Centers and PerformanceMeasurement. Note 9-193-101. Boston, MA: Harvard Busi-ness School.

Coe, Charles. 1999. Local Government Benchmarking: Lessonsfrom Two Major Multigovemment Efforts. Public Adminis-tration Review 59(2): 110-23.

Duncan, W. Jack. 1989. Great Ideas in Management: Lessonsfrom the Founders and Foundations of Managerial Practice.San Francisco, CA: Jossey-Bass.

Feynman, Richard. 1965. The Character of Physical Law. Cam-bridge, MA: MIT Press.

Ghosh, Dipankar, and Manash R. Ray. 2000. Evaluating Mana-gerial Performance: Mitigating the "Outcome Effect." Jour-nal of Managerial Issues 12(2): 247-60.

Hatry, Harry P. 1999a. Mini-Symposium on IntergovernmentalComparative Perfonnance Data. Public Administration Re-view 59(2): 101-4.

. 1999b. Performance Measurement: Getting Results.Washington, DC: Urban Institute.

Hatry, Harry P., James R. Fountain, Jr., Jonathan M. Sullivan.1990. Overview. In Service Efforts and AccomplishmentsReporting: Its Time Has Come, edited by Harry P. Hatry, JamesR. Fountain, Jr., Jonathan M. Sullivan, and Lorraine Kremer,1^9. Norwalk, CT: Governmental Accounting and StandardsBoard.

Hatry, Harry P., James R. Fountain, Jr., Jonathan M. Sullivan,and Lorraine Kremer. 1990. Service Efforts arjd Accomplish-ments Reporting: Its Time Has Come. Norwalk, CT: Govern-mental Accounting and Standards Board.

Hawkins, Scott A., and Reid Hastie. 1990. Hindsight: BiasedJudgements of Past Events after the Outcomes are Known.Psychological Bulletin 107(3): 311-27.

Hershey, John C , and Jonathan Baron. 1992. Judgment by Out-comes: When Is It Justified? Organizational Behavior andHuman Decision Processes 53(1): 89-93.

Holloway, Jacky A., Graham A.J. Francis, and C. MatthewHinton. 1999. A Vehicle for Change? A Case Study of Per-formance Improvement in the "New" Public Sector. Interna-tional Journal of Public Sector Management 12(4): 351-65.

604 Public Administration Review • September/October 2003, Vol. 63, No. 5

Holt, Craig L. 1995-96. Performance Based Budgeting: Can ItReally Be Done? The Public Manager 24(4): 19-21.

Ingraham, Patricia W., and Amy E. Kneedler. 2000. Dissectingthe Black Box: Toward a Model and Measures of Govem-ment Management Performance. In Advancing Public Man-agement: New Developments in Theory, Methods, and Prac-tice, edited by Jeffrey L. Brudney, Laurence J. OToole, Jr.,and Hal G. Rainey, 235-52. Washington, DC: GeorgetownUniversity Press.

Ingraham, Patricia W., and Amy Kneedler Donahue. 2000. Dis-secting the Black Box Revisited: Characterizing GovemmentManagement Capacity. In Governance and Performance: NewPerspectives, edited by Carolyn J. Heinrich and Laurence E.Lynn, Jr., 292-318. Washington, DC: Georgetown Univer-sity Press.

Intemational City/County Management Association (ICMA).1999. Comparative Performance Measurement: FY1998 DataReport. Washington, DC: ICMA.

Jordon, Meagan M., and Merl M. Hackbart. 1999. PerformanceBudgeting and Performance Funding in the States: A StatusAssessment. Public Budgeting and Finance 19(1): 68-88.

Joyce, Philip G. 1996. Appraising Budget Appraisal: Can YouTake Politics Out of Budgeting. Public Budgeting and Fi-nance 16(4): 21-25.

. 1997. Using Performance Measures for Budgeting: ANew Beat, or Is It the Same Old Tune? In Using PerformanceMeasurement to Improve Public and Nonprofit Programs,New Directions for Evaluation 75, edited by Kathryn E. New-comer, 45-61. San Francisco, CA: Jossey-Bass.

Kaplan, Robert S., and David P. Norton. 1992. The BalancedScorecard—Measures that Drive Performance. Harvard Busi-ness Review 70( 1): 71 -91.

Karmen, Andrew. 2000. New York Murder Mystery: The TrueStory behind the Crime Crash in the 1990s. New York: NewYork University Press.

Kerr, Steve. 1975. On the Folly of Rewarding A, While Hopingfor B. Academy of Management Journal 18(4): 769-83.

Kettl, Donald F. 1997. The Global Revolution in Public Man-agement: Driving Themes, Missing Links. Journal of PolicyAnalysis and Management 16(3): 446-62.

Klein, Rick. 2002. Registry Chief Quits to Run for State Trea-surer. Boston Globe, January 10, B5.

Kopczynski, Mary, and Michael Lombardo. 1999. ComparativePerformance Measurement: Insights and Lessons Leamedfrom a Consortium Effort. Public Administration Review59(2): 124-34.

Kouzmin, Alexander, Elke Loffler, Helmut Klages, and NadaKorac-Kakabadse. 1999. Benchmarking and PerformanceMeasurement in Public Sectors. International Journal of Pub-lic Sector Management 12(2): 121^4.

Kravchuk, Robert S., and Ronald W. Schack. 1996. DesigningEffective Performance Measurement Systems under the Gov-emment Performance and Results Act of 1993. Public Ad-ministration Review 56(4): 348-58.

Landau, Martin, and Russell Stout, Jr. 1979. To Manage Is Notto Control: Or the Folly of Type II Errors. Public Administra-tion Review 39(2): 148-56.

Lehan, Edward Anthony. 1996. Budget Appraisal—The NextStep in the Quest for Better Budgeting? Public Budgetingand Finance 16(4): 3-20.

Levin, Martin A., and Mary Bryna Sanger. 1994. Making Gov-ernment Work: How Entrepreneurial Executives Turn BrightIdeas into Real Results. San Francisco, CA: Jossey-Bass.

Lipshitz, Raanan. 1989. Either a Medal or a Corporal: The Ef-fects of Success and Failure on the Evaluation of DecisionMaking and Decision Makers. Organizational Behavior andHuman Decision Processes 44(3): 380-95.

Locke, Edwin A., and Gary P. Latham. 1984. Goal Setting: AMotivational Technique That Works. Englewood Cliffs, NJ:Prentice Hall.

. 1990. A Theory of Goal Setting and Task Performance.Englewood Cliffs, NJ: Prentice Hall.

Melkers, Julia, and Katherine Willoughby. 1998. The State ofthe States: Performance-Based Budgeting Requirements in47 out of 50. Public Administration Review 58(1): 66-73.

Melkers, Julia, and Katherine Willoughby. 2001. Budgeters' Views of State Performance-Budgeting Systems: Distinctions across Branches. Public Administration Review 61(1): 54-64.

Messadie, Gerald. 1991. Great Scientific Discoveries. New York: Chambers.

Murphey, David A. 1999. Presenting Community-Level Data in an "Outcomes and Indicators" Framework: Lessons from Vermont's Experience. Public Administration Review 59(1): 76-82.

National Academy of Public Administration (NAPA). 1994. Toward Useful Performance Measurement: Lessons Learned from Initial Pilot Performance Plans Prepared under the Government Performance and Results Act. Washington, DC: NAPA.

National Academy of Public Administration (NAPA). 1999. Using Performance Data to Improve Government Effectiveness. Washington, DC: NAPA.

Neves, Carole M.P., James F. Wolf, and Bill B. Benton. 1986. The Use of Management Indicators in Monitoring the Performance of Human Service Agencies. In Performance and Credibility: Developing Excellence in Public and Nonprofit Organizations, edited by Joseph S. Wholey, Mark A. Abramson, and Christopher Bellavita, 129-48. Lexington, MA: Lexington Books.

Newcomer, Kathryn E. 1997. Using Performance Measures to Improve Programs. In Using Performance Measurement to Improve Public and Nonprofit Programs, New Directions for Evaluation 75, edited by Kathryn E. Newcomer, 5-13. San Francisco, CA: Jossey-Bass.

Osborne, David, and Ted Gaebler. 1992. Reinventing Government. Reading, MA: Addison-Wesley.

Osborne, David, and Peter Plastrik. 2000. The Reinventor's Fieldbook: Tools for Transforming Your Government. San Francisco, CA: Jossey-Bass.

Overman, E. Sam, and Donna T. Loraine. 1994. Information for Control: Another Management Proverb? Public Administration Review 54(2): 193-96.

Peters, Thomas J., and Robert H. Waterman, Jr. 1982. In Search of Excellence. New York: Harper and Row.


Peters, Tom, and Nancy Austin. 1985. A Passion for Excellence: The Leadership Difference. New York: Random House.

Petroski, Henry. 1985. To Engineer Is Human: The Role of Failure in Successful Design. New York: St. Martin's Press.

Poister, Theodore H., and Gregory Streib. 1999. Performance Measurement in Municipal Government: Assessing the State of the Practice. Public Administration Review 59(4): 325-35.

Rivenbark, William C., and Paula K. Few. 2000. Final Report on City Services for Fiscal Year 1998-99: Performance and Cost Data. Chapel Hill, NC: University of North Carolina-Chapel Hill, North Carolina Local Government Performance Measurement Project.

Rosenthal, Burton. 1975. Lead Poisoning (A) and (B). Cambridge, MA: Kennedy School of Government.

Silverman, Eli B. 2001. NYPD Battles Crime: Innovative Strategies in Policing. Boston, MA: Northeastern University Press.

Sitkin, Sim B. 1992. Learning through Failure: The Strategy of Small Losses. Research in Organizational Behavior 14: 231-66.

Smith, Kimberly J., and Wanda A. Wallace. 1995. Incentive Effects of Service Efforts and Accomplishments Performance Measures: A Need for Experimentation. International Journal of Public Administration 18(2/3): 383-407.

Sparrow, Malcolm K. 2000. The Regulatory Craft: Controlling Risks, Solving Problems, and Managing Compliance. Washington, DC: Brookings Institution.

Standage, Tom. 2000. The Neptune File: A Story of Astronomical Rivalry and the Pioneers of Planet Hunting. New York: Walker Publishing.

Theurer, Jim. 1998. Seven Pitfalls to Avoid When Establishing Performance Measures. Public Management 80(7): 21-24.

Thompson, Fred. 1994. Mission-Driven, Results-Oriented Budgeting: Fiscal Administration and the New Public Management. Public Budgeting and Finance 15(3): 90-105.

Thompson, Fred, and Carol K. Johansen. 1999. Implementing Mission-Driven, Results-Oriented Budgeting. In Public Management Reform and Innovation: Research, Theory, and Application, edited by H. George Frederickson and Jocelyn M. Johnston, 189-205. Tuscaloosa, AL: University of Alabama Press.

Wholey, Joseph S. 1997. Clarifying Goals, Reporting Results. In Progress and Future Directions in Evaluation: Perspectives on Theory, Practice and Methods, New Directions for Evaluation 76, edited by Debra J. Rog and Deborah Fournier, 95-105. San Francisco, CA: Jossey-Bass.

Wholey, Joseph S., and Harry P. Hatry. 1992. The Case for Performance Monitoring. Public Administration Review 52(6): 604-10.

Wholey, Joseph S., and Kathryn E. Newcomer. 1997. Clarifying Goals, Reporting Results. In Using Performance Measurement to Improve Public and Nonprofit Programs, New Directions for Evaluation 75, edited by Kathryn E. Newcomer, 91-98. San Francisco, CA: Jossey-Bass.

Williams, Frank P., III, Marilyn D. McShane, and Dale Sechrest. 1994. Barriers to Effective Performance Review: The Seduction of Raw Data. Public Administration Review 54(6): 537-42.
