



Chapter 8

EVALUATION RESEARCH

What Is the History of Evaluation Research?

What Is Evaluation Research?

What Are the Alternatives in Evaluation Designs?

Black Box or Program Theory
Researcher or Stakeholder Orientation
Quantitative or Qualitative Methods
Simple or Complex Outcomes

What Can an Evaluation Study Focus On?

Needs Assessment
Evaluability Assessment
Process Evaluation
Impact Analysis
Efficiency Analysis

Ethical Issues in Evaluation Research

Conclusion

D.A.R.E.: Drug Abuse Resistance Education, as you probably know, is offered in elementary schools across America. For parents worried about drug abuse among youth, or for any concerned citizen, the program has immediate appeal. It brings a special police officer into the schools once a week to speak with classes about the hazards of drug abuse and to establish a direct link between local law enforcement and young people. You only have to check out bumper stickers or attend a few P.T.A. meetings to learn that it’s a popular program.

And it is appealing. D.A.R.E. seems to improve relations between the schools and law enforcement and to create a positive image of the police in the eyes of students.

It’s a very positive program for kids . . . a way for law enforcement to interact with children in a nonthreatening fashion . . . DARE sponsored a basketball game. The middle school jazz band played . . . We had families there. . . . DARE officers lead activities at the [middle school] . . . Kids do woodworking and produce a play. (Taylor, 1999)

Yet when all is said and done, D.A.R.E. doesn’t work. That is, it doesn’t do what it was designed to do—lessen the use of illicit drugs among D.A.R.E. students, either while they are enrolled in the program or, more importantly, after they enter middle or high school. Research designed to evaluate D.A.R.E. using social science methods has repeatedly come to the same conclusion: Students who have participated in D.A.R.E. are no less likely to use illicit drugs than comparable students who have not participated in D.A.R.E. (Ringwalt et al., 1994).

If, like us, you have children who have enjoyed D.A.R.E., or were yourself a D.A.R.E. student, this may seem like a depressing way to begin a chapter on evaluation research. Nonetheless, it drives home an important point: In order to know whether social programs work, or how they work, we have to evaluate them systematically and fairly, whether we personally like the program or not. In this chapter, you will read about a variety of social program evaluations as we introduce the evaluation research process, highlight alternative approaches, illustrate the different types of evaluation research, and review common ethical concerns.

WHAT IS THE HISTORY OF EVALUATION RESEARCH?

Evaluation research is not a method of data collection, like survey research or experiments; nor is it a unique component of research designs, like sampling or measurement. Instead, evaluation research is research conducted for a distinctive purpose: to investigate social programs (such as substance abuse treatment programs, welfare programs, criminal justice programs, or employment and training programs). For each project, an evaluation researcher must select a research design and method of data collection that are useful for answering the particular research questions posed and appropriate for the particular program investigated.

So you can see why we placed this chapter after most of the others in the text. When you review or plan evaluation research, you have to think about the research process as a whole and how different parts of that process can best be combined.

The development of evaluation research as a major enterprise followed on the heels of the expansion of the federal government during the Great Depression and World War II. Large Depression-era government outlays for social programs stimulated interest in monitoring program output, and the military effort in World War II led to some of the necessary review and contracting procedures for sponsoring evaluation research. However, it was not until the Great Society programs of the 1960s that evaluation began to be required when new social programs were funded (Dentler, 2002; Rossi & Freeman, 1989:34). More than 100 contract research and development firms began in the United States between 1965 and 1975, and many federal agencies developed their own research units. The RAND Corporation expanded from its role as a U.S. Air Force planning unit into a major social research firm; SRI International spun off from Stanford University as a private firm; and Abt Associates in Cambridge, Massachusetts, which began in a garage in 1965, grew to employ more than 1,000 employees in 5 offices in the United States, Canada, and Europe. The World Bank and International Monetary Fund (IMF) began to require evaluation of the programs they fund in other countries (Dentler, 2002:147).

In the early 1980s, after this period of rapid growth, many evaluation research firms closed in tandem with the decline of many Great Society programs. However, the evaluation research enterprise has been growing again in more recent years. The Community Mental Health Act Amendments of 1975 (Public Law 94-63) required quality assurance (QA) reviews, which often involve evaluation-like activities (Patton, 2002:147–151). The Government Performance and Results Act of 1993 required some type of evaluation of all government programs (Government Performance Results Act of 1993, retrieved July 13, 2002). At century’s end, the federal government was spending about $200 million annually on evaluating $400 billion in domestic programs, and the 30 major federal agencies had between them 200 distinct evaluation units (Boruch, 1997). In 1999, the new Governmental Accounting Standards Board urged that more attention be given to “service efforts and accomplishments” in standard government fiscal reports (Campbell, 2001).

The growth of evaluation research is also reflected in the social science community. The American Evaluation Association was founded in 1986 as a professional organization for evaluation researchers (merging two previous associations) and is the publisher of an evaluation research journal. In 1999, evaluation researchers founded the Campbell Collaboration in order to publicize and encourage systematic review of evaluation research studies. Their online archive contains 10,449 reports on randomized evaluation studies (Davies, Petrosino, & Chalmers, 1999).

WHAT IS EVALUATION RESEARCH?

Exhibit 8.1 illustrates the process of evaluation research as a simple systems model. First, clients, customers, students, or some other persons or units—cases—enter the program as inputs. (Notice that this model regards programs as machines, with clients—people—seen as raw materials to be processed.) Students may begin a new school program, welfare recipients may enroll in a new job training program, or crime victims may be sent to a victim advocate. Resources and staff required by a program are also program inputs.



Inputs: Resources, raw materials, clients and staff that go into a program.

Next, some service or treatment is provided to the cases. This may be attendance in a class, assistance with a health problem, residence in new housing, or receipt of special cash benefits. This process of service delivery may be simple or complicated, short or long, but it is designed to have some impact on the cases, as inputs are consumed and outputs are produced.

The direct product of the program’s service delivery process is its output. Program outputs may include clients served, case managers trained, food parcels delivered, or arrests made. The program outputs may be desirable in themselves, but they primarily serve to indicate that the program is operating.

Program outcomes indicate the impact of the program on the cases that have been processed. Outcomes can range from improved test scores or higher rates of job retention to fewer criminal offenses and lower rates of poverty. There are likely to be multiple outcomes of any social program, some intended and some unintended, some viewed as positive and others viewed as negative.

Variation in both outputs and outcomes in turn influences the inputs to the program through a feedback process. If not enough clients are being served, recruitment of new clients may increase. If too many negative side effects result from a trial medication, the trials may be limited or terminated. If a program does not appear to lead to improved outcomes, clients may go elsewhere.
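The distinctions among inputs, outputs, outcomes, and feedback can be kept straight with a small illustration. The sketch below is not from the text; it uses hypothetical Python names simply to show that outputs (what the program produces) differ from outcomes (what changes for the cases), and that information about either can feed back to adjust the inputs.

```python
# A minimal sketch (hypothetical, not from the chapter) of the systems model in
# Exhibit 8.1: inputs -> program process -> outputs -> outcomes, with feedback.
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProgramCycle:
    inputs: List[str] = field(default_factory=list)    # clients, staff, resources entering the program
    outputs: List[str] = field(default_factory=list)   # direct products: clients served, parcels delivered
    outcomes: List[str] = field(default_factory=list)  # impacts on cases: test scores, job retention

    def feedback(self) -> str:
        """Return an adjustment to inputs based on outputs and outcomes."""
        if len(self.outputs) < len(self.inputs):
            return "recruit more clients"            # not enough cases are being served
        if "negative side effects" in self.outcomes:
            return "limit or terminate the program"  # harmful outcomes feed back to inputs
        return "continue as designed"


cycle = ProgramCycle(
    inputs=["client A", "client B", "client C"],
    outputs=["client A served"],
    outcomes=["improved job retention"],
)
print(cycle.feedback())  # -> "recruit more clients"
```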


Exhibit 8.1 A Model of Evaluation (inputs → program process → outputs → outcomes, with a feedback loop to inputs; program stakeholders surround the whole process)

Source: Adapted from Martin, Lawrence L. and Peter M. Kettner. 1996. Measuring the Performance of Human Service Programs. Sage Publications. Used with permission.


Program process: The complete treatment or service delivered by the program.

Outputs: The services delivered or new products produced by the program process.

Outcomes: The impact of the program process on the cases processed.

Feedback: Information about service delivery system outputs, outcomes, or operations that is available to any program inputs.

Evaluation research itself is really just a systematic approach to feedback; it strengthens the feedback loop through credible analyses of program operations and outcomes. Evaluation research also broadens this loop to include connections to parties outside of the program itself. A funding agency or political authority may mandate the research, outside experts may be brought in to conduct the research, and the evaluation research findings may be released to the public, or at least to funders, in a formal report.

The evaluation process as a whole, and the feedback in particular, can be understood only in relation to the interests and perspectives of program stakeholders. Stakeholders are those individuals and groups who have some basis of concern for the program. They might be clients, staff, managers, funders, or the public. The board of a program or agency, the parents or spouses of clients, the foundations that award program grants, the auditors who monitor program spending, the members of Congress—each is a potential program stakeholder, and each has an interest in the outcome of any program evaluation. Some may fund the evaluation, some may provide research data, some may review—or even approve—the research report (Martin & Kettner, 1996:3). Who the program stakeholders are, and what role they play in the program evaluation, can have tremendous consequences for the research.

Stakeholders: Individuals and groups who have some basis of concern with the program.

Do you see the difference between traditional social science and evaluation research (Posavac & Carey, 1997)? Social science is motivated by theoretical concerns and is guided by the standards of research methods, without consideration (ideally) for political factors. It examines specific organizations for what in general we can learn from them, not for improving that one organization. Practical ramifications, for particular programs, are not usually of any import. For evaluation research, on the other hand, the particular program and its impact are paramount. How the program works also matters—not to advance a theory, but to improve the program. Finally, stakeholders of all sorts—not an abstract “scientific community”—have a legitimate role in setting the research agenda, and may well intervene even when they aren’t supposed to. But overall there is no sharp boundary between the two approaches: In their attempt to explain how and why the program has an impact, and whether the program is needed, evaluation researchers often bring social theories into their projects, but for immediately practical aims.

WHAT ARE THE ALTERNATIVES IN EVALUATION DESIGNS?

Evaluation research tries to learn if, and how, real-world programs produce results. But that simple statement covers a number of important alternatives in research design, including the following:

• Black box or program theory—Do we care how the program gets results?
• Researcher or stakeholder orientation—Whose goals matter most?
• Quantitative or qualitative methods—Which methods provide the best answers?
• Simple or complex outcomes—How complicated should the findings be?

Black Box or Program Theory

Most evaluation research tries to determine whether a program has the intended effect. If the effect occurred, the program “worked”; if the effect didn’t occur, then, some would say, the program should be abandoned or redesigned. In this simple approach, the process by which a program produces outcomes is often treated as a “black box,” in which the “inside” of the program is unknown. The focus of such research is whether cases have changed as a result of their exposure to the program, between the time they entered as inputs and when they exited as outputs (Chen, 1990); the assumption is that program evaluation requires only the test of a simple input/output model, like that in Exhibit 8.1. There may be no attempt to “open the black box” of the program process.

But there are good reasons to open the black box and investigate how the process works (or doesn’t work). Consider recent research on welfare-to-work programs. The Manpower Demonstration Research Corporation reviewed findings from research on these programs in Florida, Minnesota, and Canada (Lewin, 2001a). In each location, adolescents with parents in a welfare-to-work program were compared to a control group of teenagers whose parents were also on welfare but were not enrolled in welfare-to-work. In all three locations, teenagers in the welfare-to-work program families did worse in school than those in the control group.



But why did requiring welfare mothers to get jobs hurt their children’s schoolwork? Unfortunately, because the researchers had not investigated program process—had not opened the black box—we can’t know for sure. Martha Zaslow, an author of the resulting research report, speculated (Lewin, 2001a) that

parents in the programs might have less time and energy to monitor their adolescents’ behavior once they were employed. . . . under the stress of working, they might adopt harsher parenting styles . . . the adolescents’ assuming more responsibilities at home when parents got jobs was creating too great a burden.

Unfortunately, as Ms. Zaslow admitted,

We don’t know exactly what’s causing these effects, so it’s really hard to say, at this point, what will be the long-term effects on these kids.

If an investigation of program process had been conducted, though, a program theory could have been developed. A program theory describes what has been learned about how the program has its effect. When a researcher has sufficient knowledge before the investigation begins, outlining a program theory can help to guide the investigation of program process in the most productive directions. This is termed a theory-driven evaluation.

A program theory specifies how the program is expected to operate and identifies which program elements are operational (Chen, 1990:32). In addition, a program theory specifies how a program is to produce its effects, and so improves understanding of the relationship between the independent variable (the program) and the dependent variable (the outcome or outcomes). For example, Exhibit 8.2 illustrates the theory for an alcoholism treatment program. It shows that persons entering the program are expected to respond to the combination of motivational interviewing and peer support. A program theory also can decrease the risk of failure when the program is transported to other settings, because it will help to identify the conditions required for the program to have its intended effect.

Program theory: A descriptive or prescriptive model of how a program operates and produces effects.

Program theory can be either descriptive or prescriptive (Chen, 1990). Descriptive theory specifies impacts that are generated and how this occurs. It suggests a causal mechanism, including intervening factors, and the necessary context for the effects. Descriptive theories are generally empirically based. On the other hand, prescriptive theory specifies what ought to be done by the program, and is not actually tested. Prescriptive theory specifies how to design or implement the treatment, what outcomes should be expected, and how performance should be judged. Comparison of the program’s descriptive and prescriptive theories can help to identify implementation difficulties and incorrect understandings that can be fixed (Patton, 2002:162–164).

Researcher or Stakeholder Orientation

Whose prescriptions specify how the program should operate, what outcomes it should achieve, or whom it should serve? Most social science assumes that the researcher specifies the questions, the applicable theory or theories, and the outcomes to be investigated. Social science research results are usually reported in a professional journal or at a professional conference, where scientific standards determine how the research is judged. In program evaluation, however, the research question is often set by the program sponsors or a government agency; in consulting projects for businesses, the client—a manager, perhaps, or a division president—decides what question researchers will study. It is to these authorities that research findings are reported. Most often this authority also specifies the outcomes to be investigated. The primary evaluator of evaluation research, then, is the funding agency, not the professional social science community. Evaluation research is research for a client, and its results may directly affect the services, treatments, or even punishments (in the case of prison studies, for example) that program users receive. In this case, the person who pays the piper gets to pick the tune.


Exhibit 8.2 The Program Theory for a Treatment Program for Homeless Alcoholics (recruitment: detox, shelters, hospitals, other; screening and assessment; program elements: motivational interviewing, peer support; outputs: reduced drinking, housing; outcomes: recidivism; with a feedback loop)


Should the evaluation researcher insist on designing the project and specifying its goals? Or should she accept the suggestions and goals of the funding agency? What role should program staff and clients play? What responsibility does the researcher have to politicians and taxpayers when evaluating government-funded programs?

The different answers that various evaluation researchers have given to these questions are reflected in different approaches to evaluation (Chen, 1990:66–68). Stakeholder approaches encourage researchers to be responsive to program stakeholders. Issues for study are to be based on the views of people involved with the program, and reports are to be made to program participants (Stake, 1975). The program theory is developed by the researcher to clarify and develop the key stakeholders’ theory of the program (Wholey, 1987). In one stakeholder approach, termed utilization-focused evaluation, the evaluator forms a task force of program stakeholders who help to shape the evaluation project so that they are most likely to use its results (Patton, 2002:171–175). In evaluation research termed action research or participatory research, program participants are engaged with the researchers as co-researchers and help to design, conduct, and report the research. One research approach, termed appreciative inquiry, eliminates the professional researcher altogether in favor of a structured dialogue about needed changes among program participants themselves (Patton, 2002:177–185).

Egon Guba and Yvonna Lincoln (1989:11) argue for a stakeholder approach in their book, Fourth Generation Evaluation:

The stakeholders and others who may be drawn into the evaluation are welcomed as equal partners in every aspect of design, implementation, interpretation, and resulting action of an evaluation—that is, they are accorded a full measure of political parity and control. . . . determining what questions are to be asked and what information is to be collected on the basis of stakeholder inputs.

Social science approaches, in contrast, emphasize the importance of researcher expertise and maintenance of some autonomy in order to develop the most trustworthy, unbiased program evaluation. They assume that “evaluators cannot passively accept the values and views of the other stakeholders” (Chen, 1990:78). Evaluators who adopt this approach derive a program theory from information they obtain on how the program operates and extant social science theory and knowledge, not from the views of stakeholders. In one somewhat extreme form of this approach, goal-free evaluation, researchers do not even permit themselves to learn what goals the program stakeholders have for the program. Instead, the researcher assesses and then compares the needs of participants to a wide array of program outcomes (Scriven, 1972). The goal-free evaluator wants to see the unanticipated outcomes and to remove any biases caused by knowing the program goals in advance.

Of course, there are disadvantages to both stakeholder and social science approaches to program evaluation. If stakeholders are ignored, researchers may find that participants are uncooperative, that their reports are unused, and that the next project remains unfunded. On the other hand, if social science procedures are neglected, standards of evidence will be compromised, conclusions about program effects will likely be invalid, and results are unlikely to be generalizable to other settings. These equally undesirable possibilities have led to several attempts to develop more integrated approaches to evaluation research.

Integrative approaches attempt to cover issues of concern to both stakeholders, including stakeholders in the group from which guidance is sought, and evaluators (Chen & Rossi, 1987:101–102). The emphasis given to either stakeholder or scientific concerns varies with the specific circumstances. Integrative approaches seek to balance being responsive to stakeholder concerns with being objective and scientifically trustworthy. When the research is planned, evaluators are expected to communicate and negotiate regularly with key stakeholders and to take their concerns into account. Findings from preliminary inquiries are reported back to program decision makers so that they can make improvements in the program before it is formally evaluated. But when the actual evaluation is conducted, the research team operates more autonomously, minimizing intrusions from program stakeholders.

Quantitative or Qualitative Methods

Quantitative evaluation research typically is used in attempts to identify the effects of a social program, whether by systematically tracking change over time or by comparing outcomes between an experimental and a control group: Did the response times of emergency personnel tend to decrease? Did the students’ test scores increase more in the experimental group than in the control group? Did housing retention improve for all subjects or just for those who were not substance abusers?

Qualitative methods can add much to quantitative evaluation studies, including depth, detail, nuance, and exemplary case studies (Patton, 2002). Perhaps the greatest contribution qualitative methods can make is in investigating program process—finding out what is “inside the black box.” With quantitative measures like staff contact hours or frequency of complaints you can track items such as service delivery, but finding out how clients experience the program can best be accomplished by observing program activities and interviewing staff and clients intensively.

For example, Tim Diamond’s (1992) observational study of work in a nursing home shows how the somewhat cool professionalism of new program aides was softened to include a greater sensitivity to interpersonal relations:



The tensions generated by the introductory lecture and . . . ideas of career professionalism were reflected in our conversations as we waited for the second class to get under way. Yet within the next half hour they seemed to dissolve. Mrs. Bonderoid, our teacher, saw to that. . . . “What this is going to take,” she instructed, “is a lot of mother’s wit.” “Mother’s wit,” she said, not “mother wit,” which connotes native intelligence irrespective of gender. She was talking about maternal feelings and skills. (Diamond, 1992:17)

Diamond could have asked the aides in surveys how satisfied they were with their training, but structured questions would not have revealed the subtle mix of cool professionalism and “mother’s wit.”

Qualitative methods also can uncover how different individuals react to the treatment. For example, a quantitative evaluation of student reactions to an adult basic skills program for new immigrants relied heavily on the students’ initial statements of their goals. However, qualitative interviews revealed that most new immigrants lacked sufficient experience in America to set meaningful goals; their initial goal statements simply reflected their eagerness to agree with their counselors’ suggestions (Patton, 2002:177–181).

Qualitative methods can, in general, help in understanding how social programs actually operate. In complex social programs, it is not always clear whether any particular features are responsible for the program’s effect (or noneffect). Lisbeth B. Schorr, director of the Harvard Project on Effective Interventions, and Daniel Yankelovich, president of Public Agenda, put it this way (Schorr & Yankelovich, 2000:A14): “Social programs are sprawling efforts with multiple components requiring constant mid-course corrections, the involvement of committed human beings, and flexible adaptation to local circumstances.”

The more complex the social program, the more value that qualitative methods can add to the evaluation process. Schorr and Yankelovich (2000:A14) point to the Ten Point Coalition, an alliance of black ministers that helped to reduce gang warfare in Boston through multiple initiatives, “ranging from neighborhood probation patrols to safe havens for recreation.” Qualitative methods would help to describe a complex, multifaceted program like this.

Simple or Complex Outcomes

Does the program have only one outcome? Unlikely. How many outcomes are anticipated? How many might be unintended? Which are direct consequences of the program, and which are indirect effects that are derived later from the direct effects? (Mohr, 1992) Do the longer-term outcomes follow directly from the immediate program outputs? Do the outputs—say, the increases in test scores at the end of the preparation course—definitely result in the desired outcomes (increased rates of college admission)? The decision to focus on one outcome rather than another, on a single outcome or on several, can have enormous implications.

Sometimes a single policy outcome is sought, but is found not to be sufficient, either methodologically or substantively. When Sherman and Berk (1984) evaluated the impact of an immediate arrest policy in cases of domestic violence in Minneapolis, they focused on recidivism—repeating the offense—as the key outcome. Similarly, the reduction of recidivism was the single desired outcome of prison boot camps, which began opening in the 1990s. Boot camps were military-style programs for prison inmates that provided tough, highly regimented activities and harsh punishment for disciplinary infractions with the goal of scaring inmates “straight.” But these single-purpose programs, both designed to reduce recidivism, turned out not to be quite so simple to evaluate. The Minneapolis researchers found that there were no adequate single sources for recidivism in domestic violence cases, so they had to hunt for evidence from court and police records, perform follow-up interviews with victims, and review family member reports. More easily measured variables, such as partners’ ratings of the accused’s subsequent behavior, received more attention. Boot camp research soon concluded that the experience did not reduce recidivism, but some participants felt that boot camps did have some beneficial effects (Latour, 2002):

[A staff member] saw things unfold that he had never witnessed among inmates and their caretakers. . . . profoundly affected the drill instructors and their charges . . . graduation ceremonies routinely reduced inmates . . . sometimes even supervisors to tears. . . . Here, it was a totally different experience.

Some now argue that the failure of boot camps to reduce recidivism was due to the lack of post-prison support, rather than to failure of the camps to promote positive change in inmates. Looking at recidivism rates alone would ignore some important positive results.

So in spite of the additional difficulties, most evaluation researchers attempt to measure multiple outcomes (Mohr, 1992). The result usually is a much more realistic, and richer, understanding of program impact.

Some of the multiple outcomes measured in the evaluation of Project New Hope appear in Exhibit 8.3. Project New Hope was an ambitious experimental evaluation of the impact of guaranteeing jobs to poor people (DeParle, 1999). It was designed to answer the following question: If low-income adults are given a job at a sufficient wage, above the poverty level, with child care and health care assured, how many would ultimately prosper?

In Project New Hope, 677 low-income adults in Milwaukee were offered a job involving work for 30 hours a week, as well as child care and health care benefits.



A control group did not receive the guaranteed jobs. The outcome? Only 27% of the 677 stuck with the job long enough to lift themselves out of poverty, and their earnings as a whole were only slightly higher than those of the control group. Levels of depression were not decreased nor self-esteem increased by the job guarantee. But there were some positive effects:


Exhibit 8.3 Multiple Outcomes Measured in the Evaluation of Project New Hope

Source: From DeParle, Jason. 1999. “Project to Rescue Needy Stumbles Against the Persistence of Poverty.” The New York Times, May 15: A1, A10. Reprinted with permission.


The number of people who never worked at all declined, and rates of health insurance and use of formal child care increased. Perhaps most importantly, the classroom performance and educational hopes of participants’ male children increased, with the boys’ test scores rising by the equivalent of 100 points on the SAT and their teachers ranking them as better behaved.

So did the New Hope program “work”? Clearly it didn’t live up to initial expectations, but it certainly showed that social interventions can have some benefits. Would the boys’ gains continue through adolescence? Longer-term outcomes would be needed. Why didn’t girls (who were already performing better than the boys) benefit from their parents’ enrollment in New Hope just as the boys did? A process analysis would add a great deal to the evaluation design. Collection of multiple outcomes, then, gives a better picture of program impact.

WHAT CAN AN EVALUATION STUDY FOCUS ON?

Evaluation projects can focus on a variety of different questions related to social programs and their impact:

• What is the level of need for the program?
• Can the program be evaluated?
• How does the program operate?
• What is the program’s impact?
• How efficient is the program?

Which question is asked will determine what research methods are used.

Needs Assessment

Is a new program needed, or an old one still required? Is there a need at all? A needs assessment attempts to answer this question with systematic, credible evidence. Need may be assessed by social indicators such as the poverty rate or the level of home ownership, interviews with local experts such as school board members or team captains, surveys of populations potentially in need, or focus groups with community residents (Rossi & Freeman, 1989).

It is not as easy as it sounds (Posavac & Carey, 1997). Whose definitions of need should be used? How will we deal with ignorance of need? How can we understand the level of need without understanding the social context? (Short answer to that one: We can’t!) What, after all, does “need” mean in the abstract?

Consider what kind of housing people need. The Boston McKinney Project, in which Russell Schutt (one of your authors) was a co-investigator, evaluated the merits of providing formerly homeless mentally ill people with group housing instead of individual housing (Goldfinger et al., 1997). The experiment involved asking what type of housing these people needed. We first examined the question by asking two clinicians to estimate which of the two housing alternatives would be best for each project participant (Goldfinger & Schutt, 1996) and by asking each participant which type of housing they wanted (Schutt & Goldfinger, 1996).

Exhibit 8.4 displays the findings. Clinicians recommended group housing for most participants, whereas most of the participants sought individual housing. In fact, there was no correspondence between the housing recommendations of the clinicians and the housing preferences of the participants (who did not know what the clinicians had recommended for them). So, which perspective reveals the level of need for group as opposed to individual housing? Of course there’s no completely objective answer to that question. You can see that policymakers’ values and their understanding of the nature of mental illness and homelessness will influence which perspective on “need” they prefer.

In general, it is a good idea to use multiple indicators of need. There is no absolute definition of need in this situation, nor is there in any but the simplest evaluation projects. A good evaluation researcher will do his or her best to capture different perspectives on need and then to help others make sense of the results.

Evaluability Assessment

Evaluation research will be pointless if the program itself cannot be evaluated. Yes, some type of study is always possible, but a study specifically to identify the effects of a particular program may not be possible within the available time and resources.


Exhibit 8.4 Type of Residence: Preferred and Recommended (bar chart comparing clinician recommendations and consumer preferences for independent versus supported housing, in percent)


So researchers may conduct an evaluability assessment to learn this in advance, rather than expend time and effort on a fruitless project.

Why might a social program not be evaluable?

• Management only wants to have its superior performance confirmed and does not really care whether the program is having its intended effects. This is a very common problem.

• Staff are so alienated from the agency that they don’t trust any attempt sponsored by management to check on their performance.

• Program personnel are just “helping people” or “putting in time” without any clear sense of what the program is trying to achieve.

• The program is not clearly distinct from other services delivered by the agency, and so can’t be evaluated by itself (Patton, 2002:164).

An evaluability assessment can help to solve the problems identified. Discussion with program managers and staff can result in changes in program operations. The evaluators may use the evaluability assessment to “sell” the evaluation to participants and sensitize them to the importance of clarifying their goals and objectives. Knowledge about the program gleaned through evaluability assessment can be used to refine evaluation plans.

Because they are preliminary studies to “check things out,” evaluability assessments often rely on qualitative methods. Program managers and key staff may be interviewed in depth, or program sponsors may be asked about the importance they attach to different goals. These assessments may also have an “action research” aspect, because the researcher presents the findings to program managers and encourages changes in program operations.

Process Evaluation

What actually happens in a social program? In the New Jersey Income Maintenance Experiment, some welfare recipients received higher payments than others (Kershaw & Fair, 1976). Simple enough, and not too difficult to verify that the right people received the intended treatment. In the Minneapolis experiment on the police response to domestic violence (Sherman & Berk, 1984), some individuals accused of assaulting their spouses were arrested, whereas others were just warned. This is a little bit more complicated, because the severity of the warning might have varied between police officers and, to minimize the risk of repeat harm, police officers were allowed to override the experimental assignment. In order to identify this deviation from the experimental design, the researchers would have had to keep track of the treatments delivered to each accused spouse and collect information on what officers actually did when they warned an accused spouse.



This would be process evaluation—research to investigate the process of service delivery.

Process evaluation: Evaluation research that investigates the process of service delivery.

Process evaluation is more important when more complex programs are evaluated. Many social programs comprise multiple elements and are delivered over an extended period of time, often by different providers in different areas. Due to this complexity, it is quite possible that the program as delivered is not the same for all program recipients, nor consistent with the formal program design.

The evaluation of D.A.R.E. by Research Triangle Institute researchers Christopher Ringwalt and others (1994) included a process evaluation designed to address these issues:

• Assess the organizational structure and operation of representative D.A.R.E. programs nationwide.

• Review and assess the factors that contribute to the effective implementation of D.A.R.E. programs nationwide.

• Assess how D.A.R.E. and other school-based drug prevention programs are tailored to meet the needs of specific populations.

The process evaluation (they called it an “implementation assessment”) was an ambitious research project, with site visits and informal interviews, discussions, and surveys of D.A.R.E. program coordinators and advisors. These data indicated that D.A.R.E. was operating as designed and was running relatively smoothly. Drug prevention coordinators in D.A.R.E. school districts rated the program components as much more satisfactory than did coordinators in school districts with other types of alcohol and drug prevention programs.

Process evaluation also can identify the specific aspects of the service delivery process that have an impact. This, in turn, will help to explain why the program has an effect and which conditions are required for these effects. This identifies the causal mechanism behind an effect. (In Chapter 5, you will remember, we described this as identifying the causal mechanism.) In the D.A.R.E. research, site visits revealed an insufficient number of officers and a lack of Spanish-language D.A.R.E. books in a largely Hispanic school. At the same time, classroom observations indicated engaging presentations and active student participation (Ringwalt et al., 1994:69, 70).

Process analysis of this sort can also help to show how apparently unambiguous findings may be incorrect. The apparently disappointing results of the Transitional Aid Research Project (TARP) provide an instructive lesson. TARP was a social experiment designed to determine whether financial aid during the transition from prison to the community would help released prisoners to find employment and avoid returning to crime. Two thousand participants in Georgia and Texas were randomized to receive a particular level of benefits over a particular period of time, or no benefits (the control group). Initially, it seemed that the payments had no effect: The rate of subsequent arrests for property or nonproperty crimes was not altered by TARP treatment condition.

But this wasn’t all there was to it. Peter Rossi tested a more elaborate causal model of TARP’s effects that is summarized in Exhibit 8.5. Participants who received TARP payments had more income to begin with and so had more to lose if they were arrested; therefore they were less likely to commit crimes. However, TARP payments also created a disincentive to work and therefore increased the time available in which to commit crimes. Thus, the positive direct effect of TARP (more to lose) was cancelled out by its negative indirect effect (more free time).
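The offsetting logic can be made concrete with a small path-model calculation. The numbers below are invented purely for illustration and are not estimates from the TARP study: the direct path lowers the chance of arrest, the indirect path through reduced work raises it, and the two can sum to no net change.

```python
# Purely illustrative path-model numbers (not estimates from the TARP study).
payments_to_arrests_direct = -0.10   # more income to lose -> fewer arrests
payments_to_time_worked = -0.50      # payments discourage work
time_worked_to_arrests = -0.20       # more work -> less free time -> fewer arrests

# The indirect path multiplies its two links: payments -> time worked -> arrests.
payments_to_arrests_indirect = payments_to_time_worked * time_worked_to_arrests  # +0.10

total_effect = payments_to_arrests_direct + payments_to_arrests_indirect
print(total_effect)  # 0.0: the beneficial direct path and the harmful indirect path cancel
```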

The term formative evaluation may be used instead of process evaluation when the evaluation findings are used to help shape and refine the program (Rossi & Freeman, 1989). Formative evaluation procedures that are incorporated into the initial development of the service program can specify the treatment process and lead to changes in recruitment procedures, program delivery, or measurement tools (Patton, 2002:220).

Formative evaluation: Process evaluation that is used to shape and refine program operations.


Exhibit 8.5 Model of TARP Effects (path model linking TARP treatment, TARP payments, time worked, time in prison, and other exogenous variables to property and nonproperty arrests)

Source: From Chen, Huey-Tsyh. 1990. Theory-Driven Evaluations. Sage Publications. Used with permission.


You can see the “formative” element in the following government report on the performance of the Health Care Financing Administration (HCFA):

While HCFA’s performance report and plan indicate that it is making some progress toward achieving its Medicare program integrity outcome, progress is difficult to measure because of continual goal changes that are sometimes hard to track or that are made with insufficient explanation. Of the five fiscal year 2000 program integrity goals it discussed, HCFA reported that three were met, a fourth unmet goal was revised to reflect a new focus, and performance data for the fifth will not be available until mid-2001. HCFA plans to discontinue three of these goals. Although the federal share of Medicaid is projected to be $124 billion in fiscal year 2001, HCFA had no program integrity goal for Medicaid for fiscal year 2000. HCFA has since added a developmental goal concerning Medicaid payment accuracy. (U.S. Government Accounting Office, 2001)

Process evaluation can employ a wide range of indicators. Program coverage can be monitored through program records, participant surveys, community surveys, and analysis of utilizers versus dropouts and ineligibles. Service delivery may be monitored through service records completed by program staff, a management information system maintained by program administrators, and reports by program recipients (Rossi & Freeman, 1989).

Qualitative methods are often a key component of process evaluation studies because they can be used to elucidate and understand internal program dynamics—even those that were not anticipated (Patton, 2002:159; Posavac & Carey, 1997). Qualitative researchers may develop detailed descriptions of how program participants engage with each other, how the program experience varies for different people, and how the program changes and evolves over time.

Impact Analysis

The core questions of evaluation research are: Did the program work? Did it have the intended result? This kind of research is variously called impact analysis, impact evaluation, or summative evaluation. Formally speaking, impact analysis compares what happened after a program was implemented with what would have happened had there been no program at all.

Think of the program—such as a new strategy for combating domestic violence or an income supplement—as an independent variable, and the result it seeks as a dependent variable. The D.A.R.E. program (independent variable), for instance, tries to reduce drug use (dependent variable). If the program is present, we should expect less drug use. In a more elaborate study, we might have multiple values of the independent variable, for instance, comparing conditions of “no program,” “D.A.R.E. program,” and “other drug/alcohol education.”

As in other areas of research, an experimental design is the preferred method for maximizing internal validity—that is, for making sure your causal claims about program impact are justified. Cases are assigned randomly to one or more experimental treatment groups and to a control group so that there is no systematic difference between the groups at the outset (see Chapter 5). The goal is to achieve a fair, unbiased test of the program itself, so that the judgment about the program’s impact is not influenced by differences between the types of people who are in the different groups. It can be a difficult goal to achieve, because the usual practice in social programs is to let people decide for themselves whether they want to enter a program, and also to establish eligibility criteria that ensure that people who enter the program are different from those who do not (Boruch, 1997). In either case, a selection bias is introduced.

But sometimes researchers are able to conduct well-controlled experiments. Robert Drake, Gregory McHugo, Deborah Becker, William Anthony, and Robin Clark (1996) evaluated the impact of two different approaches to providing employment services for people diagnosed with severe mental disorders, using a randomized experimental design. One approach, group skills training (GST), emphasized preemployment skills training and used separate agencies to provide vocational and mental health services. The other approach, individual placement and support (IPS), provided vocational and mental health services in a single program and placed people directly into jobs without preemployment skills training. The researchers hypothesized that GST participants would be more likely to obtain jobs during the 18-month study period than would IPS participants.

Their experimental design is depicted in Exhibit 8.6. Cases were assigned randomly to the two groups, and then:

1. Both groups received a pretest.

2. One group received the experimental intervention (GST), and the other received the IPS approach.

3. Both groups received three posttests, at 6, 12, and 18 months.

Contrary to the researchers’ hypothesis, the IPS participants were twice as likely to obtain a competitive job as the GST participants. The IPS participants also worked more hours and earned more total wages. Although this was not the outcome Drake et al. had anticipated, it was valuable information for policy makers and program planners—and the study was rigorously experimental.
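A minimal sketch of the logic of such an impact analysis appears below. The group size and follow-up counts are hypothetical, not the Drake et al. data; the point is only to show random assignment followed by a simple comparison of outcome rates between the two conditions.

```python
# A minimal sketch (hypothetical counts, not the Drake et al. data) of random
# assignment followed by a comparison of job-attainment rates between groups.
import random

random.seed(1)
participants = [f"case_{i}" for i in range(100)]
random.shuffle(participants)
gst_group, ips_group = participants[:50], participants[50:]  # random assignment to two conditions

# Hypothetical follow-up counts of participants who obtained a competitive job.
gst_employed = 12
ips_employed = 24

gst_rate = gst_employed / len(gst_group)
ips_rate = ips_employed / len(ips_group)
print(f"GST: {gst_rate:.0%}  IPS: {ips_rate:.0%}  ratio: {ips_rate / gst_rate:.1f}x")
```

Because assignment to GST or IPS is random, any sizable difference between the two rates can be attributed to the program rather than to preexisting differences between the groups.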

Program impact also can be evaluated with quasi-experimental designs (see Chapter 5), nonexperimental designs, or field research methods, without a randomized experimental design. If program participants can be compared to nonparticipants who are reasonably comparable except for their program participation, causal conclusions about program impact can still be made. However, researchers must evaluate carefully the likelihood that factors other than program participation might have resulted in the appearance of a program effect. For example, when a study at New York’s maximum-security prison for women found that “education [i.e., classes] is found to lower risk of new arrest,” the conclusions were immediately suspect: The research design did not ensure that the women who enrolled in the prison classes were the same as those who were not, “leaving open the possibility that the results were due, at least in part, to self-selection, with the women most motivated to avoid reincarceration being the ones who took the college classes” (Lewin, 2001b). Such nonequivalent control groups are often our only option, but you should be alert to their weaknesses.

Impact analysis is an important undertaking that fully deserves the attention it has been given in government program funding requirements. However, you should realize that more rigorous evaluation designs are less likely to conclude that a program has the desired effect; as the standard of proof goes up, success is harder to demonstrate. The prevalence of “null findings” (or “we can’t be sure it works”) has led to a bit of gallows humor among evaluation researchers.

The Output/Outcome/Downstream Impact Blues

Donors often say,
And this is a fact,
Get out there and show us
Your impact


Exhibit 8.6 Randomized Comparative Change Design: Employment Services for People with Severe Mental Disorders

Experimental group: R   O1   X (preemployment skills training)   O2 (6 months)   O3 (12 months)   O4 (18 months)
Comparison group:   R   O1                                       O2 (6 months)   O3 (12 months)   O4 (18 months)

Key: R = Random assignment; O = Observation (employment status at pretest or posttest); X = Experimental treatment

Source: From Drake, Robert E., Gregory J. McHugo, Deborah R. Becker, William A. Anthony, and Robin E. Clark. 1996. “The New Hampshire Study of Supported Employment for People with Severe Mental Illness.” Journal of Consulting and Clinical Psychology, 64: 391–399. Used with permission.


You must change people’s lives
And help us take the credit
Or next time you want funding
You just might not get it.

Chorus

So donors wake up
From your impossible dream.
You drop in your funding
A long way upstream.
The waters they flow,
They mingle, they blend
So how can you take credit
For what comes out in the end?

—Terry Smutylo, director, Evaluation, International Development Research Centre, Ottawa, Canada (excerpt reprinted in Patton, 2002:154)

Efficiency Analysis

Are the program’s financial benefits sufficient to offset the program’s costs? The answer to this question is provided by a cost-benefit analysis. How much does it cost to achieve a given effect? This answer is provided by a cost-effectiveness analysis. For understandable reasons, program funders often require one or both of these types of efficiency analysis.

A cost-benefit analysis must (obviously) identify the specific costs and benefits to be studied, but my “benefit” may easily be a “cost” to you. Program clients, for instance, will certainly have a different perspective on these issues than do taxpayers or program staff. Exhibit 8.7 lists factors that can be considered costs or benefits in a supported employment program, from the standpoint of participants and taxpayers (Schalock & Butterworth, 2000). Note that some anticipated impacts of the program (e.g., taxes and subsidies) are a cost to one group but a benefit to the other, and some impacts are not relevant to either.

After potential costs and benefits have been identified, they must be measured. This need is highlighted in recent government programs (Campbell, 2001):

The Governmental Accounting Standards Board’s (GASB) mission is to establish and improve standards of accounting and financial reporting for state and local governments in the United States. In June 1999, the GASB issued a major revision to current reporting requirements (“Statement 34”). The new reporting will provide information that citizens and other users can utilize to gain an understanding of the financial position and cost of programs for a government and a descriptive management’s discussion and analysis to assist in understanding a government’s financial results.



In addition to measuring services and their associated costs, a cost-benefit analysis must be able to make some type of estimation of how clients benefited from the program and what the economic value of this benefit was. A recent study of therapeutic communities provides a clear illustration. A therapeutic community is a method for treating substance abuse in which abusers participate in an intensive, structured living experience with other addicts who are attempting to stay sober. Because the treatment involves residential support as well as other types of services, it can be quite costly. Are those costs worth it?

Sacks et al. (2002) conducted a cost-benefit analysis of a modified therapeutic community (TC), in which 342 homeless mentally ill chemical abusers were randomly assigned to either a TC or a “treatment-as-usual” comparison group. Employment status, criminal activity, and utilization of health care services were each measured for the three months prior to entering treatment and the three months after treatment.


Analysis Variable                                  Participant     Taxpayer

Monetized Costs and Benefit Variables
  Operating costs                                  --              Cost
  Savings in alternative programs                  --              Benefit
  Gross wages                                      Benefit         --
  Forgone wages                                    Cost            --
  Fringe benefits                                  Benefit         --
  Taxes withheld                                   Cost            Benefit
  Interest on taxes withheld                       --              Benefit
  Taxes refunded                                   Benefit         Cost
  Reduction in subsidies due to tax credits        --              Cost

Nonmonetized Benefits
  Increased self-sufficiency                       Benefit         --
  Increased independent living                     Benefit         --
  Increased quality of life                        Benefit         --

Exhibit 8.7 Potential Costs and Benefits of a Social Program, by Beneficiary

Source: From Schalock, Robert and John Butterworth. 2000. A Benefit-Cost Analysis Model for Social Service Agencies. Boston: Institute for Community Inclusion. Used with permission.


Earnings from employment in each period were adjusted for costs incurred by criminal activity and utilization of health care services.

Was it worth it? The average cost of TC treatment for a client was $20,361. In comparison, the economic benefit (based on earnings) to the average TC client was $305,273, which declined to $273,698 after comparing post- to pre-program earnings. After adjusting for the cost of the program, the benefit was still $253,337. The resulting benefit-cost ratio was 13:1, although this ratio declined to only 5.2:1 after further adjustments (for cases with extreme values). Nonetheless, the TC program studied seems to have had a substantial benefit relative to its costs.
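The arithmetic behind these figures can be reproduced directly from the numbers reported above. The sketch below uses our own variable names and omits the further adjustment for extreme values that yielded the 5.2:1 ratio:

# Figures reported by Sacks et al. (2002)
cost_per_client = 20_361      # average cost of TC treatment per client
adjusted_benefit = 273_698    # economic benefit after the post- vs. pre-program comparison

net_benefit = adjusted_benefit - cost_per_client           # 253,337 after subtracting program cost
benefit_cost_ratio = adjusted_benefit / cost_per_client    # about 13.4, reported as roughly 13:1

print(f"Net benefit per client: ${net_benefit:,}")
print(f"Benefit-cost ratio: {benefit_cost_ratio:.1f} to 1")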

ETHICAL ISSUES IN EVALUATION RESEARCH

Whenever you evaluate the needs of clients, or analyze the impact of a program, you directly affect people’s lives. Social workers want to believe their efforts matter; drug educators think they’re preventing drug abuse. Homeless people have problems and may really appreciate the services an agency provides. Program administrators have bosses to please, foundations need big programs to fund, and domestic violence, for instance, is a real problem—and finding solutions to it matters. Participants and clients in social programs, then, are not just subjects eager to take part in your research; they care about your findings, deeply. This produces serious ethical as well as political challenges for the evaluation researcher (Boruch, 1997:13; Dentler, 2002:166).

There are many specific ethical challenges in evaluation research:

• How can confidentiality be preserved when the data are owned by a government agency or are subject to discovery in a legal proceeding?
• Who decides what burden an evaluation project can impose upon participants?
• Can a research decision legitimately be shaped by political considerations?
• Must findings be shared with all stakeholders, or only with policy makers?
• Will a randomized experiment yield more defensible evidence than the alternatives?
• Will the results actually be used?

Is it fair to randomly assign persons to receive some social program or benefit? One answer has to do with the scarcity of these resources. If not everyone who is eligible for a program can receive it, due to resource limitations, what fairer way to distribute the benefits than through a lottery? This is exactly the process that is involved in a randomized experimental design.
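Implementing such a lottery is simple. The following sketch (our illustration, with hypothetical client identifiers and an assumed ten funded openings) randomly divides an eligibility list into a program group and a wait-listed comparison group:

import random

# Hypothetical eligibility list and number of funded openings (illustration only)
eligible_clients = ["client_%02d" % i for i in range(1, 21)]   # 20 eligible clients
funded_slots = 10                                              # only 10 can be served

random.shuffle(eligible_clients)                     # the lottery: every client has an equal chance
program_group = eligible_clients[:funded_slots]      # receive the program
comparison_group = eligible_clients[funded_slots:]   # wait-listed comparison group

print("Program group:   ", sorted(program_group))
print("Comparison group:", sorted(comparison_group))

Because each eligible client has the same chance of selection, the lottery is arguably fairer than first-come, first-served allocation, and it simultaneously creates the comparison group needed for an impact analysis.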

The Health Research Extension Act of 1985 (Public Law 99-158) mandated that the Department of Health and Human Services require all research organizations receiving federal funds to have an Institutional Review Board (IRB) to assess all research for adherence to ethical practice guidelines. There are six federally mandated criteria (Boruch, 1997:29–33):

• Are risks minimized?
• Are risks reasonable in relation to benefits?
• Is the selection of individuals equitable? (Randomization implies this.)
• Is informed consent given?
• Are the data monitored?
• Are privacy and confidentiality assured?

Evaluation researchers must consider whether it will be possible to meet each of these criteria long before they even design a study. The problem of maintaining subject confidentiality is particularly thorny because researchers, in general, are not legally protected from the requirement that they provide evidence requested in legal proceedings. However, several federal statutes have been passed specifically to protect research data about vulnerable populations from legal disclosure requirements. For example, the Crime Control and Safe Streets Act (28 CFR Part 11) includes the following stipulation (Boruch, 1997:60):

Copies of [research] information [about persons receiving services under the act or the subject of inquiries into criminal behavior] shall be immune from legal process and shall not, without the consent of the persons furnishing such information, be admitted as evidence or used for any purpose in any action, suit, or other judicial or administrative proceedings.

When it appears that it will be difficult to meet the ethical standards in an evaluation project, at least from the perspective of some of the relevant stakeholders, modifications should be considered in the study design. Several steps can be taken to lessen any possibly detrimental program impact (Boruch, 1997:67–68):

• Alter the group allocation ratios to minimize the number in the untreated control group (see the sketch following this list).
• Use the minimum sample size required to be able to adequately test the results.
• Test just parts of new programs, rather than the entire program.
• Compare treatments that vary in intensity (rather than presence or absence).
• Vary treatments between settings, rather than among individuals within a setting.
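To give a rough sense of the trade-off involved in the first two suggestions, the sketch below runs a conventional power calculation (our illustration, not taken from Boruch; it assumes a two-sided test on a difference in means with a standardized effect size of 0.3, 5% significance, and 80% power):

from math import ceil
from scipy.stats import norm

def group_sizes(effect_size, alpha=0.05, power=0.80, treated_per_control=1.0):
    """Approximate group sizes for comparing two group means.

    treated_per_control is the allocation ratio: 2.0 means two treated
    clients are assigned for every untreated control.
    """
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    # Solve 1/n_treated + 1/n_control = (effect_size / z) ** 2 for n_control
    n_control = (z / effect_size) ** 2 * (1 + 1 / treated_per_control)
    return ceil(treated_per_control * n_control), ceil(n_control)

print(group_sizes(0.3))                           # equal allocation: about (175, 175)
print(group_sizes(0.3, treated_per_control=2.0))  # 2:1 allocation: about (262, 131)

Under these assumptions, moving from equal allocation to a 2:1 ratio shrinks the untreated control group from about 175 to about 131 clients, at the price of a somewhat larger total sample (roughly 393 rather than 350).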


CONCLUSION

In social policy circles, hopes for evaluation research are high: Society would benefit from the programs that work well, that accomplish their goals, and that serve people who genuinely need them. At least that is the hope. Unfortunately, there are many obstacles to realizing this hope. Because social programs and the people who use them are complex, evaluation research designs can easily miss important outcomes or aspects of the program process. Because the many program stakeholders all have an interest in particular results from the evaluation, researchers can be subjected to an unusual level of cross-pressures and demands. Because the need to include program stakeholders in research decisions may undermine adherence to scientific standards, research designs can be weakened. Because program administrators may want to believe their programs really work well, researchers may be pressured to avoid null findings or, if they are not responsive, find their research report ignored. Plenty of well-done evaluation research studies wind up in a recycling bin or hidden away in a file cabinet. Because the primary audience for evaluation research reports consists of program administrators, politicians, or members of the public, evaluation findings may need to be oversimplified, distorting the results (Posavac & Carey, 1997).

The rewards of evaluation research are often worth the risks, however. Evaluation research can provide social scientists with rare opportunities to study complex social processes, with real consequences, and to contribute to the public good. Although they may face unusual constraints on their research designs, most evaluation projects can also result in high-quality analyses and publications in reputable social science journals. In many respects, evaluation research is an idea whose time has come. We may never achieve Donald Campbell’s vision of an “experimenting society” (Campbell & Russo, 1999), in which research is consistently used to evaluate new programs and to suggest constructive changes, but we are close enough to continue trying.

K E Y T E R M S


cost-benefit analysis
cost-effectiveness analysis
efficiency analysis
evaluability assessment
evaluation research
formative evaluation
impact analysis
inputs
integrative approach to evaluation
needs assessment
outcomes
outputs
process evaluation
program process
program theory
social science approach to evaluation
stakeholder approach to evaluation
stakeholders
summative evaluation
theory-driven evaluation


H I G H L I G H T S

• Evaluation research is social research that is conducted for a distinctive purpose: to investigate social programs.

• The development of evaluation research as a major enterprise followed on the heels of the expansion of the federal government during the Great Depression and World War II.

• The evaluation process can be modeled as a feedback system, with inputs entering the program, which generates outputs and then outcomes, which feed back to program stakeholders and affect program inputs.

• The evaluation process as a whole, and the feedback process in particular, can only be understood in relation to the interests and perspectives of program stakeholders.

• The process by which a program has an effect on outcomes is often treated as a “black box,” but there is good reason to open the black box and investigate the process by which the program operates and produces, or fails to produce, an effect.

• A program theory may be developed before or after an investigation of program process is completed. It can be either descriptive or prescriptive.

• Evaluation research is research for a client, and its results may directly affect the services, treatments, or punishments that program users receive. Evaluation researchers differ in the extent to which they attempt to orient their evaluations to program stakeholders.

• Qualitative methods are useful in describing the process of program delivery.

• Multiple outcomes are often necessary to understand program effects.

• There are five primary types of program evaluation: needs assessment, evaluability assessment, process evaluation (including formative evaluation), impact analysis (also termed summative evaluation), and efficiency (cost-benefit) analysis.

• Evaluation research raises complex ethical issues because it may involve withholding desired social benefits.

E X E R C I S E S

Discussing Research

1. Would you prefer that evaluation researchers use a stakeholder or a social science approach? Compare and contrast these perspectives and list at least four arguments for the one you favor.

2. Is it ethical to assign people to receive some social benefit on a random basis? Form two teams and debate the ethics of the TARP randomized evaluation of welfare payments described in this chapter.

Finding Research

1. Inspect the Web site maintained by the Governmental Accounting Standards Board: www.seagov.org. Read and report on performance measurement in government as described in one of the case studies.


2. Describe the resources available for evaluation researchers at one of the following three Web sites: www.wmich.edu/evalctr/; www.stanford.edu/~davidf/empowermentevaluation.html; www.worldbank.org/oed/.

Critiquing Research

1. Read and summarize an evaluation research report published in the Evaluation and Program Planning journal. Be sure to identify the type of evaluation research that is described.

2. Select one of the evaluation research studies described in this chapter, read the original report (book or article) about it, and review its adherence to the ethical guidelines for evaluation research. Which guidelines do you feel are most important? Which are most difficult to adhere to?

Doing Research

1. Propose a randomized experimental evaluation of a social program with which you are familiar. Include in your proposal a description of the program and its intended outcomes. Discuss the strengths and weaknesses of your proposed design.

2. Identify the key stakeholders in a local social or educational program. Interview several stakeholders to determine what their goals are for the program and what tools they use to assess goal achievement. Compare and contrast the views of each stakeholder and try to account for any differences you find.
