DOCUMENT RESUME

ED 030 948    EA 002 270

By-Stake, Robert E.
Toward a Technology for the Evaluation of Educational Programs.
Pub Date-67
Note-17p.; Pages 1-12 in PERSPECTIVES OF CURRICULUM EVALUATION, AERA Monograph Series on Curriculum Evaluation, edited by Ralph W. Tyler and others, Rand McNally & Co., Chicago, 1967.
Available from-Rand McNally & Company, Box 7600, Chicago, Illinois 60680 (complete document 102 pages, $2.00).
EDRS Price-MF-$0.25 HC-$0.95
Descriptors-Achievement Tests, *Curriculum Evaluation, Diagnostic Tests, *Evaluation Criteria, *Evaluation Techniques, Literature Reviews, Measurement, *Measurement Goals, *Program Evaluation
Identifiers-AERA, American Educational Research Association

Reflecting an increased awareness of the need for comprehensive curriculum evaluation, a monograph series has been initiated, focusing on major aspects of curriculum design and development. This introduction to the series defines curriculum evaluation as the collection, processing, and interpretation of two main kinds of data: (1) objective descriptions of a curriculum's goals, environments, personnel, methods, content, and outcomes; and (2) personal judgments by the evaluator of the curriculum's goals, environments, etc. Available tests related to the evaluation of instruction seldom go beyond achievement testing. New techniques of observation and judgment of total curriculums are needed, with greater attention given to diagnostic testing, task analyses, and evaluation of goals. As reported in the growing literature on measurement and evaluation, special techniques employed in the behavioral sciences need to be utilized in curriculum evaluation. Through its sponsorship of the monograph series, the American Educational Research Association recognizes its obligation as well as its opportunity to cultivate a methodology for the evaluation of education programs. (X)



Page 2: ,.P.r. ,.MWs.r.. · DOCUMENT RESUME ED 030 948 EA 002 270 By-Stake, Robert E. Toward a Technology for the Evaluation of Educational Prmrams. Pub Date 67 Note-17p.: Pages 1.12 in PERSPECTIVES

AERA MONOGRAPH SERIES

ON CURRICULUM EVALUATION

PERSPECTIVES
OF CURRICULUM EVALUATION

Ralph Tyler
Robert Gagné

Michael Scriven

AMERICAN EDUCATIONAL RESEARCH ASSOCIATION
MONOGRAPH SERIES
ON CURRICULUM EVALUATION

Committee on Curriculum Evaluation:
Harold Berlak
Leonard S. Cahen
Richard A. Dershimer
Christine McGuire
Jack C. Merwin
Ernst Z. Rothkopf
James P. Shaver
Robert E. Stake

Coordinating Editor:
Robert E. Stake

Editorial Associates:
J. Myron Atkin
Peter A. Taylor

Editorial Consultants for this Issue:
John B. Gilpin
J. Thomas Hastings
Thomas O. Maguire

American Educational Research Association
1126 Sixteenth Street, N. W.
Washington, D. C. 20036

Perspectives of Curriculum Evaluation

Ralph W. Tyler
Center for Advanced Study in the Behavioral Sciences

Robert M. Gagné
University of California, Berkeley

Michael Scriven
Indiana University

U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION POSITION OR POLICY.

Rand McNally & Company
Chicago

RAND McNALLY EDUCATION SERIES

B. OTHANEL SMITH, Series Editor

"PERMISSION TO REPRODUCE THIS COPYRIGHTED MATERIAL HAS BEEN GRANTED BY Barbara Siefken, Rand McNally & Company TO ERIC AND ORGANIZATIONS OPERATING UNDER AGREEMENTS WITH THE U.S. OFFICE OF EDUCATION. FURTHER REPRODUCTION OUTSIDE THE ERIC SYSTEM REQUIRES PERMISSION OF THE COPYRIGHT OWNER."

Copyright © 1967 by Rand McNally & Company

All rights reserved
Printed in U.S.A. by Rand McNally & Company
Library of Congress Catalog Card Number 66-30794

Contents

Toward a Technology for the Evaluation of Educational Programs: Robert E. Stake 1

Changing Concepts of Educational Evaluation: Ralph W. Tyler 13

Curriculum Research and the Promotion of Learning: Robert M. Gagné 19

The Methodology of Evaluation: Michael Scriven 39

Aspects of Curriculum Evaluation: A Synopsis: J. Stanley Ahmann 84

Bibliography 90

Toward a Technology for the Evaluation of Educational Programs

"Let the buyer beware," declared the 19th-century individualist. What faith in human judgment! Then, as now, some "buyers" lacked the needed powers of judgment, but those who had them were expected to use them, and those who did not were said to deserve no concern. Yesteryear's spokesman for individualism advocated self-responsibility for one's choices and protested against government and consumer-collective action in the marketplace.

The marketplace in the last third of the 20th century will feature, along with provisions for sustenance, comfort, leisure, and longevity, a great array of products for the never-ending education of an inquisitive populace. Both government and private corporations have already initiated vast new production lines. Revolutionary curricula are emerging. Should the buyer beware? Can the buyer beware? What agencies are prepared to evaluate these educational products and programs? What steps should be taken to gain an understanding of an Operation Headstart or a school system designed by Litton Industries? We little understand the traditional operations of school systems. How can we understand the new?

How much we expect society to help the individual make decisions, in the marketplace and out, has changed greatly over the last 80 years. Most people now believe that government must not only protect against the grossly negligent and wanton, but must also license and standardize the conduct of legitimate business. Nongovernment agencies such as consumer organizations, professional associations, and producer self-regulatory bodies have been created to help provide information for judgment and decision.

How about the educational consumer? Can the teacher, superintendent, and curriculum coordinator choose wisely? Far too little information is now available. Little is known about the merit and shortcoming of products and programs. For excellence in education we need excellent books and excellent teachers, but our methods of recognizing excellence are inadequate. For a few years, at least, there will be little quality control of goods produced by Research and Development Centers, by the growing curriculum-innovation projects, and by the newborn instruction industry. Much of the forthcoming educational output will be excellent, but not all. We grade the eggs a buyer cannot grade for himself, and we legislate automobile safety standards. Yet far more crucially than eggs or automobiles, educational programs shape our future society. Should educational programs continue to escape formal evaluation?

ACCURACY VERSUS COMPLETENESS

"Let's call a spade a spade," declares a 20th-century logical positivist. What faith in perspicacity! To treat a spade properly we must recognize it as a spade. To specify the impact of an educational program we must be able to perceive impact.

Measurement specialists are proud of their perspicacity. "If it exists," they say, "it exists in quantity; and if it exists in quantity, it can be measured." It follows that if an educational program has an impact, that impact can be measured. Most specialists in educational testing and measurement believe they can do the job. The general public and most members of the educational profession presume that after having analyzed his data the "testing man" can state in precise terms the worth of a curriculum. The language of the Elementary and Secondary Education Act of 1965, Title I, implies that capability to evaluate is presently within our command. But the fluidity of our experiments and the bluntness of our tests deny us that capability. Neither quantity nor quality of impact is measured.

These are not, however, the greatest of our measurement problems: a spade is not just a spade. We do not have labels to identify each spade, or each educational program, so that it can be understood by label alone. Each needs ample description. Each differs from the others in a multitude of ways, and representation by title alone or by some composite score or rating leaves much of the story untold.

Our measurements are not perfectly accurate. We could devote ourselves to improving the precision of our instruments, but are there not higher-priority tasks? For the evaluation of curricula, I believe that we should postpone our concern for greater precision. We should demonstrate first our awareness of a full array of teaching and learning phenomena. We should extend to this array our ability to observe and pass judgment. We should commit ourselves to a more complete description.

New techniques of observation and judgment need to be developed. In fact, we need a new technology of educational evaluation. We need new paradigms, new methods, and new findings to help the buyer beware, to help the teacher capitalize on new devices, to help the developer create new materials, and to help all of us to understand the changing educational enterprise.

PROFESSIONAL TOOLS AND TACTICS

It is not uncommon today for educational psychologists and measurement specialists to serve as advisors to evaluation projects. The inclination of these professionals, not surprisingly, is to use their most refined tools and techniques. Most of these tools and techniques were developed for differentiating among individual students, not for measuring the impact of an instructional program. Although differences in impact are indeed related to differences in student groups, curriculum evaluation and student evaluation require different measurement tactics.

Measurement consultants usually recommend specification of objectives in behavioral terms, experimental studies rather than status studies, and testing with instruments of empirically demonstrated reliability. Clearly these recommendations have their merit, but they can misguide evaluation efforts. J. Myron Atkin (1963) and Elliot Eisner (1966) have indicated how behavioral specification may disembody an educator's purpose. Lee Cronbach (1963) has indicated how a preoccupation with reliability can drain away an evolving test's content validity. Experimental controls are needed in the laboratory, regression equations are needed in the admissions office, and behavioral language is an essential consideration in test construction, but such techniques may not facilitate the conduct of an evaluation project.

Within the school, teachers and administrators evaluate their programs. Usually their purpose is self-improvement. When approaching the task in a formal way, they choose checklists and questionnaires as tools. Unfortunately, their inquiries are seldom validated, their attention to student achievement is negligible, and they seldom consider alternate ways of teaching. Still, these lay evaluations can be admired. They do attend to important facets of the situation that are absent from the reports of measurement experts. New measurement tools and tactics should be devised for what is of obvious concern to the lay evaluators.

The official accreditation agencies and accreditation associations of the nation have not accepted the role of evaluator. They have established certain minimum standards. Each standard is believed to be related to quality education. When a school is rated, the extent to which standards are met is indicated, but the real worth of its educational program is not apparent in an accreditation report.

What strategies and tactics are needed for real evaluation? The writers of these first monographs, Ralph Tyler, Robert Gagné, and Michael Scriven, urge more attention to diagnostic testing, to task analyses, and to evaluation of goals. The approaches they offer are not completely new, but the attempt to bring them together for curriculum evaluation is all too new. Some of us see in these techniques the beginnings of a technology of evaluation. Our guess is that this technology will draw from instructional technology, psychometric-testing technology, social-survey technology, communication technology, and others; and that it will become a contributor to the understanding of evaluation in areas other than education.

The skeptical reader may respond that neither new tactics nor new tools are needed, that available tools used in the right way can do the job. Later, I will try to show why we should not expect certain common tools to be useful for evaluation, but first I want to specify what I mean by "curriculum evaluation."

A DEFINITION OF CURRICULUM EVALUATION

A curriculum is an educational program. It can be informally organized: what a craftsman teaches an apprentice; or formally organized: what is taught in an instructional film. A curriculum, defined in this way, could be a mere lesson, or it could be the curricular program of a comprehensive high school, or the entire educational program of a nation. A curriculum may be specified in terms of what the teacher will do, in terms of what the student will be exposed to, or, as Gagné does in this issue, in terms of student achievement.

Educational programs are characterized by their purposes, their content, their environments, their methods, and the changes they bring about. Usually there are messages to be conveyed, relationships to be demonstrated, concepts to be symbolized, understandings and skills to be acquired. Evaluation is complex because each of the many characteristics requires separate attention.

The purpose of educational evaluation is expository: to acquaint the audience with the workings of certain educators and their learners. It differs from educational research in its orientation to a specific program rather than to variables common to many programs. A full evaluation results in a story, supported perhaps by statistics and profiles. It tells what happened. It reveals perceptions and judgments that different groups and individuals hold, obtained, I hope, by objective means. It tells of merit and shortcoming. As a bonus, it may offer generalizations ("The moral of the story is ...") for the guidance of subsequent educational programs.

Curriculum evaluation requires collection, processing, and interpretation of data pertaining to an educational program. For a complete evaluation, two main kinds of data are collected: (1) objective descriptions of goals, environments, personnel, methods and content, and outcomes; and (2) personal judgments as to the quality and appropriateness of those goals, environments, etc. The curriculum evaluator has such diverse tasks as weighing the outcomes of a training institute against previously stated objectives, comparing the costs of two courses of study, collecting judgments of the social worth of a certain goal, and determining the skill or sophistication needed for students commencing a certain scholastic experience. These evaluative efforts should lead to better decision-making: to better development, better selection, and better use of curricula.

SOME LIMITATIONS OF AVAILABLE TESTS

Most contemporary evaluations of instruction begin and end with achievement testing. A large number of standardized tests are available. Many of these tests have been developed with appropriate attention to the Standards for Educational and Psychological Tests and Manuals (American Psychological Association, 1966) and to such well-considered guidelines as those in Educational Measurement (Lindquist, 1951, now in revision). It is important to our concern here to emphasize that these tests have been developed to provide reliable discrimination among individual students. Discriminability among students is important for instruction and guidance, but for development and selection of curricula, tests are needed that discriminate among curricula. Different rules for test administration are possible, and different criteria of test development are appropriate, when the tests are to be used to discriminate among curricula.

For the usual standardized achievement test, the test author writes a large number of "general-coverage" or "general-skill" items. If certain content areas are unlikely to be encountered by many students, the author avoids them. Items on special content, even when valid, show up poorly on item analyses, and are weeded out. Since the items of a standardized achievement test are meant to be fair to students of all curricula, they are aimed at what is common to all. By intent, the standardized achievement test is unlikely to encompass the scope or penetrate to the depth of a particular curriculum being evaluated.

Items having a strong relationship with general intelligence usually look good in an item analysis. These items correlate highly among themselves and moderately with almost any achievement items. Since they are indirect measures of achievement which successfully predict subsequent performance, they are accepted by teachers and counselors as well as test developers. But indirect measurement of achievement is irrelevant, even offensive, to many curriculum developers and supervisors of instruction. They want to know what has been learned. They want to know what deficiencies remain in student understanding. The standardized test does not tell them.
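
In modern terms, the item-analysis screen described above amounts to retaining items whose point-biserial correlation with the total-test score is high. The following Python sketch is an illustration, not part of the monograph; the toy data and the function name are invented for the example. It shows why a "general-coverage" item tracking overall ability earns a higher discrimination index than an item tied to one curriculum's special content, and so survives the weeding.

```python
import math

def point_biserial(item_scores, total_scores):
    """Correlation between a 0/1 item and the total-test score.

    This is the classical discrimination index: items with a high
    value look good in an item analysis; items with a low value
    are candidates for removal.
    """
    n = len(item_scores)
    mean_t = sum(total_scores) / n
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in total_scores) / n)
    passed = [t for i, t in zip(item_scores, total_scores) if i == 1]
    p = len(passed) / n                    # proportion answering correctly
    mean_pass = sum(passed) / len(passed)  # mean total score of passers
    return (mean_pass - mean_t) / sd_t * math.sqrt(p / (1 - p))

# Toy norm sample: six students with total-test scores and two items.
totals = [10, 8, 9, 4, 3, 2]
general_item = [1, 1, 1, 0, 0, 0]  # passed by the high scorers: survives
special_item = [1, 0, 0, 1, 0, 1]  # curriculum-specific: weeded out

print(point_biserial(general_item, totals) >
      point_biserial(special_item, totals))  # → True
```

Nothing here depends on any particular test; the point is only that an item analysis run on a mixed-curriculum norm sample, by construction, favors items measuring what is common across curricula.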

Apart from clinical experience, our only current basis for interpreting most test performances is the frequency distribution of "total-test" scores collected from a norm group. Reputable test publishers have been reluctant to endorse subtest scores or to provide item response information. Clearly, individual-student decisions resting on responses to one or just a few items are questionable. Unlike the counselor, the curriculum supervisor does not concentrate on individual-student decisions. He must explain the variance among curricula. Test developers could help him by providing item data or, better still, by constructing separate subtests for each specific curricular objective. That would be a departure from current practice.

Please do not misunderstand me. I am not belittling our standardized achievement tests. I am favorably impressed with their usefulness for counseling students. But they are not equally useful for evaluation. I am dismayed by my colleagues who believe that these same tests can be used to satisfy the needs of the curriculum evaluator.


SOME NEEDED THINKING

The evaluator needs a battery of standard operating procedures. Procedures depend on criteria. Criteria depend on rationales. Rationales depend on theories. From evaluation theory to practice, new thinking is needed.

Regarding curriculum development, we need standard ways of translating aims and needs into practices. Our measurement and programmed-instruction specialists have developed taxonomies of objectives. Our classroom-learning-laboratory personnel have developed principles of instruction governing sequences of rules and examples, schedules for practice and review, hierarchies of understanding, etc. But there is no "compiler language," no grand scheme for deriving educational activities from given objectives. We need lesson-writing paradigms, including subroutines for helping an author maintain a pace, control reading difficulty, organize review exercises, discover inconsistencies, optimize redundancies, etc. Things like these, done today intuitively by authors and editors, should be done more explicitly, with routine checks on the quality of the materials written.

Whether accomplished by author and editor or by author and computer, the derivation of lessons should be examined on logical grounds. Today the evaluator lacks a rational procedure for checking the logic of the development of a curriculum. He needs ways to measure the correspondence between the intent of a lesson plan and the original goal. Does it require a thorough understanding of the subject matter? Should he employ a logician? We do not know.

As a part of this evaluation, communication integrity should be considered. Much of education includes the conveying of a certain message to a student audience. From the time a message is conceived until the students are exposed to it, a considerable transformation of the message is likely to take place. Does the author say what he wants to say? Does the teacher say what the author intended him to say? This concern applies whether the author is a subject-matter expert, e.g., a nuclear physicist, or the final transmitter, e.g., the physics teacher himself. Some writings are more illuminating than others, some homework problems are more pertinent than others, some demonstrations are more applicable than others. Some teachers use the right words but obscure the message; others refine and extend the message. To understand one important quality of a curriculum we must appraise the fidelity of its communications.

We need techniques for representing the perspectives held by different people. Although they use the same language, different people see things differently. Do parents and the school board, consultants and the regular staff see things differently? Although two groups respond differently to a question, they may see the same merit in certain instruction. We need better devices for scaling perceptions of objectives. We need better procedures for processing judgments. At the beginning these procedures will not have the precision of an aptitude test or the elegance of an interest inventory, but even crude attempts to scale perceptions should be useful.

What are appropriate and inappropriate roles for the classroom teacher in curriculum evaluation? Can we capitalize upon the considerable ability of teachers to estimate which of two demonstrated teaching techniques is more likely to accomplish a particular long-term goal? Through training we could refine the teacher's powers of observation and estimation to make his contribution both technically sound and educationally valid. It is not unreasonable to conjecture that some day the primary role of the classroom teacher may be as a curriculum trouble-shooter, a conceptually oriented monitor, an evaluator: the essential link between the school's provision of a standard learning situation and the modification of it to accommodate the uniqueness of the student.

Several of the needs listed in this section call for psychometric thinking, the province of the psychologist. Other needs listed here and elsewhere call for help from the sociologist, the communications expert, the linguist, the philosopher, the anthropologist, and the economist. Can we find men of such pursuits to think with us as we develop our methodology of evaluation? I believe we must.

PRECURSORS OF A LITERATURE

Educational evaluation has not been without its champions. The social science literature includes many relevant works. A few will be cited here; more extensive coverage can be found in the bibliography.

Psychometric testing, for example, has been thoroughly discussed. For our purposes, the testing literature is nicely represented by Educational Measurement (Lindquist, 1951), with its farthest extension toward curriculum being Tyler's chapter on the measurement of learning. Among other fine writings on the evaluation of learning are those of Dressel and Mayhew, whose 1954 report is widely accepted as a textbook aimed at the evaluation of course offerings, and portions of Ahmann and Glock's Evaluating Pupil Growth (1963). Many measurement projects deserve attention, but only two reports will be mentioned here: Project TALENT (Flanagan, 1964) and the National Assessment of Educational Progress (Tyler, 1966).

Techniques for the evaluation of teaching directly apply to curriculum evaluation. Prominent among writings in this area are Gage's Handbook of Research on Teaching (1963), and publications of McKeachie (1959) and Simpson and Seidman (1962).

Conant (1959), Gardner (1961), and Trump (1960) have made thorough but nontechnical evaluations of the nation's schools. Defining educational goals for the nation has been a continuing undertaking of the Educational Policies Commission (1959, 1961). More immediate instructional objectives have been the concern of Bloom (1956), Krathwohl (1964), Lindvall (1964), and their colleagues. The study of educational decision-making has been relatively neglected, but noteworthy are the works of Cronbach and Gleser (1964) and James (1963).

School environments, notably college environments, have been the focus of study by Astin (1961) and Pace (1965-66). Benson (1961), Carlson et al. (1965), and Mort (see Mort and Furno, 1960) have considered economic and social aspects of school systems. Questions concerning curriculum development have been discussed extensively by Taba (1962) and in a collection edited by Heath (1964b). On the general topic of innovation in education, Clark and Guba (1965), Miles (1964), and Pellegrin (1966) are frequently cited.

Innovation in measurement methodology is apparent in the literature. Methods well established in other branches of educational research have found applications in curriculum evaluation. Psychological scaling (Torgerson, 1958), Osgood's semantic differential (1957), and Flanders' interaction analysis (1961) are examples. The Damrin-Glaser tab-testing methods, adapted for group testing by McGuire (1966), seem to have particular promise.

As Britton (1964) found, much of the literature relevant to curriculum evaluation exists in impermanent form: office papers, conference handouts, etc. Many valuable illustrative pieces are virtually unknown because they were written only for the persons concerned with a particular curriculum. City-wide and state-wide evaluations get little attention outside their jurisdiction, but some have generated documents and instruments worthy of wider distribution. Some of the more noteworthy studies have occurred in Baltimore, and in the states of New York, Pennsylvania, Florida, and California. Illustrative materials are sometimes available from consulting agencies such as the American Institute for Research; the Center for Instructional Research and Curriculum Evaluation at the University of Illinois; the Educational Testing Service; the Institute for Administrative Research at Teachers College, Columbia; and the Research and Development Center at UCLA.

THE CHALLENGE TO AERA

A professional organization sees no more clearly than its most sighted member, and seldom so well. Its actions usually serve more to consolidate than to extend, more to permit its members to tell of past deeds and future hopes than to propel them toward an institutional goal. So it has been with the American Educational Research Association.

It is possible for the more sighted members of almost any organization to become its officers. And so it has been with AERA.

In the early 1960's there were few independent sallies and little clamoring from the membership for the development of evaluation techniques. The officers of AERA, however, were then considering a possible impetus to evaluation efforts. They were aware that many new curricula were coming from such novel sources as National Science Foundation course-content improvement projects; that many special vocational programs were being initiated; that education had become a major instrument of war against poverty; and that the proliferation of programs now defies the local administrator's efforts to understand them on a personal basis.

Other professional organizations were also recognizing the need. The Association for Supervision and Curriculum Development, like AERA an affiliate of the National Education Association, devoted its Second National Conference on Curriculum Projects (Ammons and Gilchrist, 1965) to evaluation problems. The American Personnel and Guidance Association formed a subdivision called the Association for Measurement and Evaluation in Guidance. In 1965 a joint committee chaired by A. A. Lumsdaine and sponsored by AERA, the Department of Audio-Visual Instruction of the National Education Association, and the American Psychological Association prepared "Recommendations for Reporting the Effectiveness of Programmed Instruction Materials," a set of guidelines more general than the title suggests (Joint Committee



on Programed Instruction and Teaching Machines, 1963). Noteworthy publications have been provided by the American Association of School Administrators, the American Council on Education, the National Citizens' Council for Better Schools, the National School Boards Association, and the National Association of Secondary School Principals. The need for evaluation has not escaped the attention of any of these professional organizations, but none commands the broad research purview or the measurement skill to apply to that need. AERA does.

None of the publications cited in these two sections strongly encourages the belief that curriculum evaluation can be accomplished by currently available tests, checklists, and visitation routines. New tools and techniques are needed. With its involvement in the development and refinement of educational curricula, AERA is in a unique position. More than any other professional organization, it faces the obligation, and the opportunity, to cultivate a methodology for the evaluation of education programs.

THE COMMITTEE ON CURRICULUM EVALUATION

In 1964 President Lee J. Cronbach appointed an ad hoc committee to study possible AERA contributions. This committee, composed of N. L. Gage, Wells Hively II, John R. Mayor, and myself, reported early in 1965 that a number of activities were warranted. It recommended in particular that a regular committee be named, that conferences be sponsored by AERA, and that a series of monographs be published.

Acting upon this report and upon his own perception of educational affairs, President Benjamin S. Bloom in 1965 commissioned an AERA Committee on Curriculum Evaluation to develop guidelines for quality control: model evaluation procedures to accompany the development and revision of educational curricula. Members of the 1965 committee were J. Stanley Ahmann, Leonard S. Cahen, Arthur Wells Foshay, Christine McGuire, Jack C. Merwin, Ernst Rothkopf, Richard A. Dershimer (ex officio), and myself. Harold Berlak and James P. Shaver were added in 1966 by President Julian C. Stanley. This committee, like its predecessor, concluded that guidelines limited to contemporary testing and inquiry procedures were inadequate; that special observation, data-reduction, and decision-making techniques were needed; and that AERA should encourage writing and discussion of theory and




rationales for such techniques. As a first project, this Monograph Series was proposed. It was approved by the AERA Board of Directors early in 1966.

AERA, of course, has no writers of its own. The Monograph Series was created to attract contributions from members and nonmembers alike. Many disciplines should be represented. A distinguished educator, a distinguished psychologist, and a distinguished philosopher have contributed to this first issue. It is expected that economists, social anthropologists, communications specialists, school administrators, and classroom teachers will be among the authors of future issues.

Some issues will contain several monographs; most perhaps will be devoted to a single monograph. Attention will range across such diverse topics as decision-making, educational goals, innovation, merit in teaching, merit in textbooks, the measurement of change, content validity, and the politics of education; in short, to any topic that contributes to the scholarly study and technical practice of evaluation in education.

This Monograph Series is not a new professional journal. It will be published aperiodically, to meet a current need. It will be continued only as long as the priority of the need remains high. The Series will exist as a medium for writings too lengthy and too elaborate for journal publication. It will include discourse. Some of this discourse will be speculative; some may even be supplicatory. Although some of the contributions will be theoretical and abstract, the ultimate purpose of the Series is to serve the practitioner. The primary criterion for acceptance of a manuscript will be whether, in the long run, what the author has to say will facilitate the development of palatable, comprehensive, and dependable evaluation procedures.

At this point, we do not know what directions this Monograph Series will take or what services it will render. Will it aid the curriculum developer? Will it ultimately help the buyer beware? We are convinced that the purposes of evaluation should be reconsidered, that our resources should be inventoried, that new models of evaluation should be proposed, and that new tactics should be discussed. The monographs in this Series should serve these ends.

ROBERT E. STAKE
