ED 146 233    TM 006 639

TITLE: Standardized Testing Issues: Teachers' Perspectives. Reference and Resource Series.
INSTITUTION: National Education Association, Washington, D.C.
PUB DATE: 77
NOTE: 95p.
AVAILABLE FROM: National Education Association, 1201 Sixteenth Street N.W., Washington, D.C. 20036 (Stock Number 1501-0-00, $5.75)
EDRS PRICE: MF-$0.83 Plus Postage. HC Not Available from EDRS.
DESCRIPTORS: Achievement Tests; *Change Strategies; Criterion Referenced Tests; Elementary School Students; Elementary Secondary Education; Evaluation Methods; Minority Groups; *Standardized Tests; Student Reaction; *Student Testing; *Teacher Attitudes; Teacher Responsibility; *Test Bias; *Testing Problems; Testing Programs; Test Interpretation
IDENTIFIERS: *Alternatives to Standardized Testing

ABSTRACT: The problems associated with standardized testing are illustrated in this collection of articles. Alternatives to current practices and strategies for change are suggested. The contributors discuss the roles and responsibilities of groups concerned with student evaluation systems, the testing of minority group and non-English-speaking students, problems in using students' test results for evaluation of teachers, and teachers' perspectives on testing alternatives. The 1975 report of the National Education Association (NEA) Task Force on Testing and a report of the 1972 NEA Conference on Civil and Human Rights in Education are appended. (Author/MV)

Documents acquired by ERIC include many informal unpublished materials not available from other sources. ERIC makes every effort to obtain the best copy available. Nevertheless, items of marginal reproducibility are often encountered and this affects the quality of the microfiche and hardcopy reproductions ERIC makes available via the ERIC Document Reproduction Service (EDRS). EDRS is not responsible for the quality of the original document. Reproductions supplied by EDRS are the best that can be made from the original.




  • Standardized Testing Issues: Teachers' Perspectives

    Reference & Resource Series


  • Standardized Testing Issues: Teachers' Perspectives

    Reference & Resource Series

    National Education Association
    Washington, D.C.

  • Copyright © 1977
    National Education Association of the United States

    Stock No. 1501-0-00

    Note

    The opinions expressed in this publication should not be construed as representing the policy or position of the National Education Association. Materials published as part of the NEA Reference & Resource Series are intended to be discussion documents for teachers who are concerned with specialized interests of the profession.

    Library of Congress Cataloging in Publication Data

    National Education Association of the United States.
    Standardized testing issues.

    (Reference and resource series)
    Includes bibliographical references.
    1. Examinations--United States. I. Title. II. Series.
    LB3051.N3 1977 372.1'2'6 77-24041
    ISBN 0-8106-1501-0

    Acknowledgments

    "Glossary of Measurement Terms" (in "Guidelines and Cautions for Considering Criterion-Referenced Testing" by Bernard McKenna) is excerpted from the revised edition of A Glossary of Measurement Terms: A Basic Vocabulary for Evaluation and Testing, published by CTB/McGraw-Hill, Del Monte Research Park, Monterey, California 93940. Reprinted by permission of the publisher.

    The following articles are reprinted with permission from Today's Education:

    "An Alternative to Blanket Standardized Testing" by Richard J. Stiggins.

    "Criticisms of Standardized Testing" by Milton G. Holmen and Richard F. Docter.

    "The Looking-Glass World of Testing" by Edwin F. Taylor.

    "One Way It Can Be" by Brenda S. Engel.

    "A Summary of Alternatives"

    "A Teacher Views Criterion-Referenced Tests" by Jean S. Blachford.

    "Teacher-Made Tests: An Alternative to Standardized Tests" by Frances Quinto.

    "The Testing of Minority Children: A Neo-Piagetian Approach" by Edward A. De Avila and Barbara Havassy.

    "The Way It Is" by Charlotte Darehshori.

    "What's Wrong with Standardized Testing?" by Bernard McKenna.

  • CONTENTS

    What's Wrong with Standardized Testing? by Bernard McKenna

    The Looking-Glass World of Testing by Edwin F. Taylor 11

    The Way It Is by Charlotte Darehshori 16

    One Way It Can Be by Brenda S. Engel 20

    Roles and Responsibilities of Groups Concerned with Student Evaluation Systems by Bernard McKenna 24

    Why Should All Those Students Take All Those Tests? 30

    A Teacher Views Criterion-Referenced Tests by Jean S. Blachford 33

    Guidelines and Cautions for Considering Criterion-Referenced Testing by Bernard McKenna 35

    The Testing of Minority Children: A Neo-Piagetian Approach by Edward A. De Avila and Barbara Havassy 43

    Criticisms of Standardized Testing by Milton G. Holmen and Richard F. Docter 48

    Problems in Using Pupil Outcomes for Teacher Evaluation by Robert S. Soar and Ruth M. Soar 52

    Teacher-Made Tests: An Alternative to Standardized Tests by Frances Quinto 58

    An Alternative to Blanket Standardized Testing by Richard J. Stiggins 60

    A Summary of Alternatives 63

    Appendices

    Tests and Use of Tests: NEA Conference on Civil and Human Rights in Education, 1972 65

    Report of the NEA Task Force on Testing, 1975 81

    Contributors 91

    Footnotes and References 94

  • "Girl number twenty," said Mr. Gradgrind, squarely pointing with his square forefinger, "I don't know that girl. Who is that girl?"

    "Sissy Jupe, sir," explained number twenty, blushing, standing up, and curtseying.

    "Sissy is not a name," said Mr. Gradgrind. "Don't call yourself Sissy. Call yourself Cecilia."

    "It's father as calls me Sissy, sir," returned the young girl in a trembling voice, and with another curtsey.

    "Then he has no business to do it," said Mr. Gradgrind. "Tell him he mustn't. Cecilia Jupe. Let me see. What is your father?"

    "He belongs to the horse-riding [the circus], if you please, sir."

    Mr. Gradgrind frowned, and waved off the objectionable calling with his hand.

    "We don't want to know anything about that, here. You mustn't tell us about that, here. Your father breaks horses, don't he?"

    "If you please, sir, when they can get any to break, they do break horses in the ring, sir."

    "You mustn't tell us about the ring here. Very well, then. Describe your father as a horsebreaker. He doctors sick horses, I dare say?"

    "Oh, yes, sir."

    "Very well, then. He is a veterinary surgeon, a farrier, and horsebreaker. Give me your definition of a horse."

    (Sissy Jupe thrown into the greatest alarm by this demand.)

    "Girl number twenty unable to define a horse!" said Mr. Gradgrind, for the general behoof of all the little pitchers. "Girl number twenty possessed of no facts in reference to one of the commonest of animals! Some boy's definition of a horse."

    "Bitzer," said Thomas Gradgrind. "Your definition of a horse."

    "Quadruped. Graminivorous. Forty teeth, namely twenty-four grinders, four eye-teeth, and twelve incisive. Sheds coat in the spring; in marshy countries, sheds hoofs, too. Hoofs hard, but requiring to be shod with iron. Age known by marks in mouth." Thus (and much more) Bitzer.

    "Now, girl number twenty," said Mr. Gradgrind, "you know what a horse is."

    from Book the First, "Sowing": Chapter Two, "Murdering the Innocents," of Hard Times by Charles Dickens (1854).

  • WHAT'S WRONG WITH STANDARDIZED TESTING?
    by Bernard McKenna

    In the social sciences, economics is known as the dismal science. In education the "dismal science" has to be standardized testing.

    - Its history is ominous.
    - Much test content is unimportant or irrelevant.
    - The structure and formats of the tests are confusing and misleading.
    - The process of administering the tests is demeaning, wasteful of time, and counterproductive.
    - The application of statistics that result from test scores distorts reality.
    - It is difficult, if not impossible, to ensure that test results will be used either to improve student learning or to help teachers improve instruction.

    The paragraphs that follow develop each of these points.

    Intelligence and achievement testing began in the United States about the turn of the century and is closely associated with developments in France. The story is well known of how the French minister of public instruction commissioned Alfred Binet to construct a test to identify students whose aptitudes were so low that they should be placed in special schools. Binet soon found himself opposing those philosophers who supported the idea that intelligence is a fixed quantity. He said, "We must protest and react against this brutal pessimism."

    But the Americans who were influential in bringing the Binet test to America, Lewis Terman of Stanford University and Henry Goddard of the Vineland Training School in New Jersey, espoused the "brutal pessimism." Terman's translation became the widely used Stanford-Binet IQ Test.

    The U.S. Public Health Service commissioned Goddard to administer the Binet test to immigrants at the receiving station on Ellis Island. The test results "showed" that 87 percent of Russians, 83 percent of Jews, 80 percent of Hungarians, and 79 percent of Italians were feebleminded. Consequently, the percentage of aliens deported for feeblemindedness rose by 350 percent in 1913. A history to be proud of? A record leading to enlightenment? For shame!

    The next gathering of destructive test data was during World War I when mental tests were given en masse to draftees. Analysis of these results immediately following the war resulted in their discriminatory use against Blacks--to demonstrate that Blacks had lower IQs than Whites. And so it goes. Between then and now is a history of further "refinement" of essentially the same content and formats, of the misuse and abuse of the same kinds of IQ tests that so destructively dealt with immigrants and minority groups in the early 1900's and during World War I.

    The history of standardized achievement testing is only slightly less dismal than that of IQ tests. Edward L. Thorndike developed the first formal achievement tests in 1904. The main reason for achievement testing was not to assess student progress or improve teaching but to establish the profession of psychology as a science separate from philosophy. Never mind the students and teachers and their needs. The psychologists saw the opportunity to be considered scientists if they came up with precise measuring tools with which to ply their trade. Thorndike wrote that "the nature of educational measurement is the same as that of all scientific measurement." And so the course was set, a course that has never been reversed: The evaluation of student progress would be considered in the same realm with measuring tolerances of automobile pistons or the trajectory of missiles.

    The near panic among the American public created by Russia's launching of the Sputnik in 1957 led to vastly increased testing programs. This overemphasis on the use of tests resulted in several published warnings of the dangers of such proliferation. Testing Testing Testing, by a joint committee of national educational associations, and The Tyranny of Testing, by Banesh Hoffmann, were among them. But these warnings went unheeded. And before the end of the 60's, evaluation guidelines of Title I of the Elementary and Secondary Education Act resulted in even more standardized testing.

    By the early 1970's, the situation had become so oppressive that warnings were once again heralded. A national task force of the National Education Association and two substantive and penetrating issues of the National Elementary Principal (March-April 1975, July-August 1975) were among those sounding the alarm. Even as this article goes to press, the Reader's Digest carries a warning piece on the potential dangers of standardized testing and a report out of London discredits a British psychologist's studies of identical twins, a major source for the conclusion that IQ is innate.

    At the same time a movement called performance-based education calls for more testing, much of which is or promises to become standardized in one form or another. One is reminded of the refrain, "When will they ever learn, when will they ever learn?"

    Ralph Tyler's observation that standardized tests get "small answers to small questions" is apt. The content of the tests evaluates little more than the ability to recall facts, define words, and do routine calculations. Obviously, not all these things are unimportant, but even in the reading and mathematics parts of these tests, many of the questions are inane. The mathematics sections emphasize mechanical calculation at a time when inexpensive electronic calculators are available to the general public. And the tests make almost no provision for evaluating a student's ability to estimate or to measure real things--important skills needed for functioning as workers and citizens. As one prominent mathematician has put it, "The concepts sections of most of the commonly used achievement tests suffer from the fact that they trivialize the concepts." Large percentages of the items in standardized tests, particularly in IQ tests, are limited to word definitions, all of which are learnable and tell little about students' general aptitudes. Further, the words to be defined are often obscure, infrequently used or encountered in reading, writing, and speaking.

    If the content of the basic skills tests and IQ tests is poor, that in the social studies is infinitely worse. For example, the social-studies part of one nationally prominent test reflects little of contemporary curriculum change and improvement in this subject area. One review states that it totally neglects "the art of discovery" and "process," both very much a part of accepted teaching strategies today. And of science test items, a scientist-researcher says, "They are incorrect, misleading, skewed in emphasis, and irrelevant."

    The content of standardized tests emphasizes getting "right answers," almost totally neglecting the thought process by which the answers are arrived at. Interestingly, a recent Gallup poll indicates that a major educational concern of parents is that the schools help students think for themselves.

    Much else that is wrong with the substance of standardized tests can be only briefly cited here:

    - Test content does not reflect local instructional objectives or specific curriculums.
    - Much of the content is unimportant or irrelevant to anything students need to know or understand.
    - Test content measures mainly recall-type learning, neglecting the higher thought processes--analyzing, synthesizing, and drawing generalizations and applying them to new phenomena.
    - The tests give an incomplete picture of student learning progress, because items that all or almost all students have learned are removed from the tests in order to keep the norming procedure statistically sound.
    - The test maker uses a language that is not commonly used in other activities in the real world.
    - Test items are unduly complex and require too many different manipulations; sometimes instructions for the items are unclear.
    - Test vocabularies and illustrations are often unfamiliar to those who are not of white middle-class cultures or for whom English is a second language; that is, the tests are culturally and linguistically biased.

    The test formats are unimaginative, restrictive of creative thinking, and confusing. The multiple-choice mentality that is sometimes referred to in jest ("A, B, C, or none of the above") is more than a cliche. Large numbers of items in most standardized tests are multiple-choice. The assertion that students become more able to think for themselves by learning to respond to multiple-choice items offers a simplistic solution to a complex problem. In fact, there is some evidence that the reverse is true. Because each multiple-choice item must appear somewhat plausible as an answer in order to minimize guessing, more than one answer can often be logically assumed to be right. This works particular hardships on those who think most creatively and innovatively.

    Because of space limitations, test illustrations and pictures frequently are out of proportion: An eraser is about the same size as an automobile, houses are smaller than people, etc.

    The need for speed in taking the tests imposes an artificial structure that is not characteristic of real-life tasks. Obviously, students need to learn to work rapidly and accurately. But in the real world, not much is comparable to answering 40 multiple-choice items in 60 minutes, or whatever.

    Standardized testing uses up inordinate amounts of precious instructional time. Thousands of hours go into testing that might better be used in individualizing instruction and planning for teaching. In terms of cost efficiency, the testing business runs into hundreds of millions of dollars, the results of which provide little or no help to students and teachers.

    Testing situations generate fear, imply mistrust, and generally threaten and demean students. The emphasis on competition, the pressure of time, and the measures used to discourage cheating cause students to have lowered self-concepts and to feel insecure and mistrusted.

    Testing settings are frequently physically intolerable: Time periods of testing are too long, instructions are blared out on public address systems, and large groups of students are herded into cafeterias or auditoriums where they work on their laps.

    In spite of the evidence, the test makers say that there isn't much wrong with the content, structure, and formats of the tests. And while they admit that there are abuses in reporting, interpreting, and using the results, they assume little responsibility for this. They argue that if administrators and teachers would just interpret and use the results properly everything would be all right.

    Well, everything wouldn't be all right.

    Surely practitioners can improve test interpretation and usage, but proper interpretation and usage are almost unattainable because of the kind of substance and formats mentioned in the preceding paragraphs. It is nearly impossible to separate content and structure from usage--content and structure, in large part, determine usage.

    Even if it were possible to separate content and structure from usage, large problems of usage would still remain. Let us examine some of them.

    The standardization process in testing leads to reporting of results in terms of averages (norms). This distribution of scores along a range ensures that half the students will be below average no matter how well they do. Since there is nothing beyond subjective judgment to determine how "good" average (or above or below average) is, it is possible that "below average" represents good progress on some tests and "above average" represents poor progress on others.

    On the matter of interpretation of results, a major fault with standardized testing is attributing to the findings much more meaning than they deserve--assuming that verbal and quantitative scores stand for general intelligence, for example. Guilford and his associates confirmed long ago that the intellect has many dimensions, of which verbal and quantitative abilities are only a part.

    The statement of a prominent psychometrist that if she had just one measure of intelligence it would be vocabulary represents the kind of narrow point of view about interpreting test scores that does disservice to both those who are tested and those who use scores to make decisions that may affect human beings throughout their lives. On the achievement-test side, a student's ability (or inability) to respond "correctly" to more than half the items on a standardized achievement test in biology or social studies tells too little of his or her potential in either of these subjects to be a basis for broad-range decision making.

    Yet decisions are made regularly on such narrow data, decisions that may limit or deny students' opportunities. On the basis of standardized test results, students are categorized, grouped, and pigeonholed; placed in classes for the retarded; excluded from particular courses of study; prohibited from pursuing advanced programs; barred from particular institutions; and even denied job opportunities. And all this, sometimes on as small a basis as two or three wrong answers, answers to questions that themselves may be highly questionable.

    Even when test results are not used in formal decision-making processes, they affect practitioners' expectations of particular students. "Mary is in the lower quartile. There is not much use spending time on her; she just doesn't have it," is an attitude that test results create. But Mary may "have it," and the reasons for the low test scores may have been the particular testing situation, or Mary's physical or emotional situation at testing time. Or Mary may "have it" in many ways not evaluated by the tests. But since the tests themselves create the impression that they measure what's important or most of what's important, Mary may not get much attention after scoring low on them.

    Decision making on the basis of standardized test scores goes far beyond the classroom. School administrators use test scores in comparing classrooms and school buildings and make decisions on programs and personnel accordingly, school boards and legislatures use scores to determine the allocation of resources, and the public judges the overall quality of education on the basis of the scores they read about in the papers. None of these uses is appropriate. All of them assume that the tests indicate much more than any group-administered standardized test is capable of.

    Most important, for students and teachers, the test results are too broad and general to provide diagnosis of individual student learning problems, and they don't help teachers select the most appropriate teaching methodologies for individual students or groups of students.

    The schools and colleges of America should not use group-administered norm-referenced standardized intelligence, aptitude, and achievement tests. As Jerrold Zacharias, prominent physicist and professor emeritus at Massachusetts Institute of Technology, pointed out in the National Elementary Principal, it is not sufficient to "retreat to catch phrases like 'I know these tests are not very good, but they are all we have.' There are many other ways to assess a child's general competence. They may not look numerical or scientific, but they are known to every teacher and every school principal who reads this journal."

    Among such other ways are objectives-referenced (criterion-referenced) tests, of which teacher-made tests are a part, individual diagnostic instruments, interviews of students to determine their progress and learning needs, evaluation of the products of student work and their live performances, simulation, contracts with students, student self-evaluation, and peer evaluation.

    Almost no one wants less rigorous evaluation of student learning progress. If the American schools are to respond effectively to agreed-on goals and objectives, more and better evaluation procedures will be required. But one thing is certain: Large-scale mass-administered standardized testing programs will not accomplish this mission.

    Most teachers are well aware of this. They need to use their expertise, professional judgment, and influence with other educators and the public to end such testing programs in their school systems. And individually and collectively, they need to influence the testing industry, state education departments, and other groups to reallocate their large resources to research, develop, field test, and disseminate a broad range of alternatives to standardized tests for evaluating student learning progress and to help teachers improve instruction.


  • THE LOOKING-GLASS WORLD OF TESTING
    by Edwin F. Taylor

    Take a look at this multiple-choice question:

    Scientists study three basic kinds of things: animals, vegetables, and
        people
        stars
        minerals
        foods
        religions

    "Animal, vegetable, or mineral" is a way to divide up the world in the game "20 Questions." It has nothing to do with what scientists study. In fact, scientists study (among other things) people and stars and minerals and foods and (if you include archaeology) religions. The description of science implied by this test question is nonsense. No sense.

    That question is from a standardized achievement test for elementary school children. (We'll mention later the meanings of standardized and achievement.)

    Now look at this question from another test:

    If 1/2 of 6 is 3, then 1/4 of 8 is

    Never mind the answer (which is also presented as multiple-choice): What does the question mean? "If..., then..." usually means that one thing follows logically from something else. What is the logical connection between 1/2 of 6 and 1/4 of 8? There isn't any. No logic.

    Here is a third question from the same page of the same test as the preceding one:

    Different melons weighed 12 lb, 10 lb, 22 lb, 15 lb, and 16 lb. How many pounds did the middle-sized one weigh?

    Before answering the question, think of a cantaloupe or honeydew melon in a supermarket: What does it weigh? A small one, 2 or 3 pounds; a big one, 7 or 8 pounds. The question says "12 lb, 10 lb, 22 lb, 15 lb, and 16 lb." Good grief, they are all huge! None of them is middle-sized. They are unreal. No reality.

    No sense, no logic, no reality. That is the impression you get from reading through test after standardized test. At first you think there must be some mistake, some one or two test makers who do a particularly poor job. And some tests are truly terrible. But all of them I have read are at least bad.

    Test makers clearly live in some sort of fantasy world. That would be all right by me except that my children and yours are judged by their standards. In order to succeed on these important tests, our children must adopt their crazy logic and distorted view of reality.

    From the outside, the testing business seems useful, helpful, normal, and impressive. Most people want to know how well their children are doing in school and how well their school is doing in comparison with other schools. Each test has been tried out with thousands of children ("standardized") so one expects that all the bugs have been worked out of it.

    But the tests themselves are secret, in the sense that parents and other public groups cannot examine and discuss them. And as soon as you look inside the tests, you realize that instead of being useful, helpful, normal, and impressive, they are none of the above. One feels like Alice in Through the Looking-Glass, who stepped into the fantasy world behind the mirror over her fireplace.

    Then she began looking about, and noticed that what could be seen from the old room was quite common and uninteresting, but that all the rest was as different as possible. For instance, the pictures on the wall next the fire seemed to be all alive, and the very clock on the chimney-piece (you know you can only see the back of it in the Looking-Glass) had got the face of a little old man, and grinned at her.

    In this chapter we take a very brief stroll around the looking-glass world of standardized achievement tests. (Achievement tests examine what you know or do, as opposed to aptitude or intelligence tests which examine, supposedly, what your potential for learning is.) To keep the story simple, we will quote examples only from tests for elementary and junior high school students (for children up to age 13 or 14).

    As you look at one of these test questions, do not congratulate yourself for knowing the "right" answer: that is to be trapped behind the looking glass. Instead, think about the logic and reality of the question itself, the number of different ways it can be interpreted by children from a variety of backgrounds, how many of the given multiple-choice answers could be correct, and where a child must look out for a trick, a trap, or a simple mistake by the test maker.

    Two of the questions that began this chapter are examples of looking-glass arithmetic: the manipulation of numbers. But numbers themselves become weirdly distorted in standardized tests. Try this question:

    How many hundreds are in 20 tens?

    Never mind the answer itself: What possible use will the answer have? Does any scientist, doctor, lawyer, shopkeeper, or homeowner need to know how to answer this question? The test maker will mention something about "place value," which means that children should realize that 20 + 1 equals 21 and not 30. But if children have this kind of trouble, you help them with it rather than teach them some jargon.

    Apart from its uselessness, the question contains a linguistic trap. Since 20 tens equal 200, there are therefore two hundreds in 20 tens. So the answer is 200, right? Wrong! But never mind.

Here is another question about numbers, in fact the number zero.

    36. Which of these are names for zero?

        I.   0 + 10
        II.  0 x 10
        III. 0 ÷ 10

        A. II only
        B. I and II only
        C. II and III only
        D. I, II, and III

First of all, what does "names for zero" mean? I know four names for zero: null, cold, zip, and zilch. None of them appears among the answers, so try again. Apparently 0 x 10 is a name for zero. This name for zero is called Roman numeral II. Another name for zero is called III. The answer is "II and III." This answer is called letter C. In order to answer the question the poor child has to keep in mind simultaneously all these names and names for names. He or she may feel like Alice when the White Knight explains the names for his song:

"The name of the song is called 'Haddocks' Eyes.'"

"Oh, that's the name of the song, is it?" Alice said, trying to feel interested.

    The Original Looking-Glass Achievement Test

    "Can you do .Addition?" the White Queenasked. "What's one and one and one and one andone and one and one and one and one and one ?"

    "I don't know," said Alice. "I lost count.""She ca'n't do Addition,"" the Red Queen

    interrupted. "Can you do' Subtraction? Take ninefrom eight." ,`

    "Nine from eight, I ca'n't, you know," Alicereplied readily, "but "

    "She ca'n't do Subtraction," said the WhiteQueen. "Can .yob do Division? Divide a loaf by aknifewhat's the answer to that?"

    "I suppose"' Alice, was beginning, but theRed Queen answered for her. "Bread and Butter,of course. Try. another Subtraction sum. Take abone from a dog: what rt ns?"

    Alice considered. "The bone wouldn't remain,of course, if I took itand the dog-wouldn't re-main: -it would come to bite meand I'm sure Ishouldn't remain!".

    "Then you think nothing would remain?"said the Red Queen.

    "I think that's the answer.""Wrong, as usual," said the Red Queen.. "The

    dog's temper would remain.""But I don't see how""Why, look here!" the Red Queen cried. "The

    dog would lose its temper, wouldn't it?"Perhaps it would," Alice replied cautiously."Then if the dog went away, its temper would

    remain!" the Queen exclaimed triumphantly.Alice said as gravely as she could, "They

    might go different ways." But she couldn't helpthinking to ,herself, "What dreadful nonsense weare talking!"

    "She ca'n't do sums a bit!" the Queens saidtogether, with great emphasis.


  • "No, you don't understand," .the Knight said,looking a little vexed. "That's what the name iscalled. Thc name really is The Aged, Aged Man.' "

    "Then I o, to have said, 'That's what thesong is called'?" Alice corrected herself.

    "No, you oughtn'tcthat's quite another thing;The song is called 'Ways and Means',.. but that'sonly What it's called, you know',"

    "Well, what is the song, then?" said Alice,who was by this tie com?letelyhewildered.

    "I was coming. to that,"the Knight said. "The"song really is '4-sitting on a Gate; and the tune'smy own invention."

Here is an example of what my colleague Judah Schwartz calls "A is to B as C is to almost anything":

    Pullman was to railway cars what
        Whitney was to ...
        Goodyear was to rubber
        Jefferson was to cotton
        Boston was to beans
        [don't know]

Since there is no unique relationship between different kinds of things (such as a person and a product), the item asks, in effect, "What am I thinking?" The result is to penalize inventiveness. Boston produced beans just as surely as Pullman produced railway cars. Tests are full of this kind of question, particularly the college entrance examinations.

In no field is the unreality of the test maker's world more apparent than in science. Here is a looking-glass question about mirrors:

    What does this picture of a boy looking at himself in a mirror illustrate?

        focusing
        transparency
        dispersion
        reflection
        [don't know]

This is one of many, many examples of a multiple-choice problem in which all the choices are correct. The picture of a boy looking at himself in a mirror illustrates focusing on his eyes (and ours!); it illustrates transparency of the glass; it illustrates color fringes due to different speeds of light of different wave-lengths in the glass (called dispersion; the original figure is two-color with blue and black, so is "in color"); and it certainly illustrates reflection. If I cannot choose one among these correct answers, will I be given full credit for choosing the answer "don't know"?

    Along with "content," the enterprise ofscience itself as. pictured, in achievement tests isseriously distorted. One example began this article.Here is anothei- one:

    Which method is used by scientists to discover new facts?

        talking and listening
        reading and writing
        revising and amending
        experimenting and observing

What does facts mean? Experimental data? Then clearly "experimenting and observing" is the correct answer. But experimental data are not "discovered" as some kind of surprise: They are recorded as the result of planned experiments. Maybe "new facts" means "new theories." New theories can be discovered, but how are they discovered? Under what circumstances have you had new ideas? While talking or listening or reading or writing or revising or amending or experimenting or observing? Yes! And while dozing or waking or sitting or walking or bicycling or.... In truth, this question seriously misrepresents the enterprise of science. In order to answer the question at all, the child must adopt the fantasy world of the test maker.

Are all test items as bad as the ones we have shown? No, but a significant percentage are, the percentage being greater or smaller depending on how you set your sights. Banesh Hoffmann, author of The Tyranny of Testing, has a standing offer for test publishers: On any standardized achievement test not concerned merely with trivial facts or routine arithmetical operations, he guarantees that reasonable people will agree that at least 10 percent of the questions are seriously faulty.

He is clearly being conservative; it should not be difficult to find significant faults with 20 percent of standardized test items. Indeed, if one is allowed to object on principle to crowded graphic layout, a separate answer sheet, or the multiple-choice format itself, then the failure rate for test questions themselves can approach 100 percent. But even if only 10 percent are faulty, this constitutes a serious indictment of these tests, since a variation of 10 percent in the number of "correct" answers can oftentimes determine whether a child is placed at the top or in the middle of his or her "reference group."

Why are achievement tests so bad? I believe that the primary reason is the test maker's goal of lining up children along a single line by asking, "Who has the higher score?" The inhumane notion that people can and should be compared with one another along a line is the fundamental error that leads to the looking-glass world of testing and its perversion of our educational system.

This notion also leads to the brainless use of statistics in the development of tests. The standardized test is constructed initially by selecting questions from a large reservoir composed by "item writers." The preliminary version is then tried out with different groups of children, each group large enough to provide "statistically significant" information on whether or not each test item discriminates between children in the way that the test makers wish to discriminate. Typically, a revision of the test is tried out with a large selection of children in order to "standardize" the results for different groups.

The test items that survive this selection process are those that make the "appropriate" discriminations between children and not necessarily those that are logical, correct, or clearly laid out, or that actually test the skills that society holds to be important.
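The selection rule described here can be made concrete. A common discrimination index is the item-total ("point-biserial") correlation: an item survives if children with high total scores tend to get it right and children with low total scores tend to get it wrong. The sketch below is a hypothetical illustration of that rule, not any publisher's actual procedure; the 0.3 threshold and the data are invented. Note that nothing in it ever looks at an item's content:

```python
def point_biserial(item_scores, total_scores):
    """Correlation between right/wrong (0/1) on one item and total test score."""
    n = len(item_scores)
    mean_t = sum(total_scores) / n
    sd_t = (sum((t - mean_t) ** 2 for t in total_scores) / n) ** 0.5
    p = sum(item_scores) / n                     # proportion answering correctly
    if p in (0.0, 1.0) or sd_t == 0:
        return 0.0                               # item cannot discriminate
    mean_right = sum(t for i, t in zip(item_scores, total_scores) if i) / sum(item_scores)
    return (mean_right - mean_t) / sd_t * (p / (1 - p)) ** 0.5

def select_items(responses, threshold=0.3):
    """Keep the items whose discrimination index exceeds the threshold.
    responses: one 0/1 answer vector per child (one entry per item)."""
    totals = [sum(child) for child in responses]
    kept = []
    for j in range(len(responses[0])):
        item = [child[j] for child in responses]
        if point_biserial(item, totals) >= threshold:
            kept.append(j)
    return kept
```

An ambiguous or trick question that happens to separate high scorers from low scorers passes this filter as easily as a sound one, which is exactly the author's point.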

This entire process of test development can in principle take place without any child's sitting down with a sensitive adult to try out the questions and discuss which of the difficulties are important and relevant and which are trivial, irrelevant, or caused by the form or layout of the test itself. Until test makers get a lot closer to real, individual children, the children who take their tests have the terrible choice between remaining real (and failing) and becoming part of the test maker's dream (and losing their own reality).

If we know how tests come to be as bad as they are, why do they remain so bad? I believe that the answer is summarized in one word: secrecy. So much time and effort go into trying out each test item with large numbers of children that it becomes a valuable property in its own right. To make such an item public is to destroy its power to compare children with one another. The result is that parents as a group cannot see the tests by which their children are judged. Until parents and teachers can compare notes and seek advice on tests exposed to the light of day, there will be no opportunity for their natural outrage to lead to tests improved in content, humaneness, and connection to the real world.

What shall we make of all this? Shall we laugh or shall we cry? In our outrage shall we demand an end to all achievement testing? Some parents, teachers' organizations, and school boards may decide so, and their choice should be respected. Others will continue to feel that children and teachers need to know how well they are doing and that schools need to report to parents and other taxpayers how well children have mastered the skills that society thinks essential to its proper operation. In order to do this task humanely, test development and use must be altered fundamentally.

The first and essential step is to stop comparing one child with other children (so-called "norm-referencing") and instead to try determining whether a child performs the necessary tasks well enough ("criterion referencing"). The best example from the adult world is the automobile driver's test: The driving skills you are expected to demonstrate are not secret, and you either do well enough now or you have to try again later.
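The contrast between the two reporting philosophies can be sketched in a few lines (the 80-percent cutoff below is an invented illustration, not a real licensing standard):

```python
def norm_referenced(raw_score, peer_scores):
    """Report standing relative to other children: the fraction who scored lower."""
    below = sum(1 for s in peer_scores if s < raw_score)
    return below / len(peer_scores)              # percentile rank, 0.0 to 1.0

def criterion_referenced(raw_score, max_score, passing_fraction=0.8):
    """Report only whether the child performed the tasks well enough."""
    return raw_score / max_score >= passing_fraction
```

Like the driver's test, the criterion-referenced verdict is unaffected by how everyone else did; the norm-referenced rank necessarily changes whenever the other children improve.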

Second, test developers must sit down with children individually, watch them take the test, and talk with them afterward about which questions were clear and important and which were confusing or demeaning. The children who try out the tests must be from diverse ethnic and cultural backgrounds, both because tests must not discriminate on these bases and also because all children will benefit from the use of tests that are made understandable for as wide a variety of children as possible.

Third, the usefulness of tests must be judged by how soon after completion children and teachers know which answers are in error and what misunderstandings may have resulted in a given incorrect answer.

Fourth, for tests of "practical skills" such as classifying, describing, narrating, measuring, estimating, graphing, mapping, and doing word problems, test makers must show that performance on the test compares with ability to carry out similar tasks in settings as near to real life as possible.

Finally, when skills can be clearly related to test performance, parents, teachers, and administrators must speak for society by deciding what level of performance on each test shall be called "good enough." As a check on this process, tests must be made public, at least after they have been given locally.

It's a long road back through the looking glass, but some of us are starting down it.


THE WAY IT IS
by Charlotte Darekshori

One of the main goals of education is to implement humanistic programs in our schools. Yet incorporated in these programs as one of the evaluative tools is one of the most dehumanizing practices in education: standardized testing.

While most of us talk in terms of individualized approaches, we employ tests that are constructed to compare child with child, class with class, and school with school. We use tests that not only give us a basis for comparing children, but are purposefully built to "fail" a certain percentage of them.

We tell parents not to compare their child with peers or siblings because this could be damaging to the child's self-concept; we tell children not to compare themselves with others. How then can we justify our practice of using standardized tests that make just such comparisons?

As a teacher, I have found it harder and harder to justify standardized testing philosophically, but it is even more difficult to justify the cruelty of subjecting young children to the act of testing itself.

In giving standardized tests we place children in positions over which they have no control, then we direct them to perform illogical tasks and to act as if everything were perfectly logical.

Taking a standardized test is a bizarre experience for beginning first grade students. Its scenario comes complete with written parts for both teacher and student: For the first time in the children's school careers, perhaps in their lives, they are interacting with an adult who is reading from a script that dictates what, how, and when he or she will react to them. In this play, ...

Her stomach begins to feel funny; she holds the pencil more tightly.

The teacher goes on, "You should have marked the s-t. You hear the sound that s-t makes at the beginning of stamp. You do not hear c-l or b-l. You should not have marked these."

The teacher's aide is walking around the room looking at papers. She comes to Melanie. "Melanie, do you understand how to mark your answers?"

Melanie looks up at the aide. "I know I'm supposed to mark in one of these circles. We did that yesterday, but I can't tell which one to mark."

    "Just take a good guess and go on.""But, I don't know, tcan't read yet."The aide pats her on the shoulder, "Just do

    the best you can."The: teacher goes on, "We are ready to.begin.

    If you do not understand what you are to do,.r/tiseyour hand. If you are not sure of an answer,itnarkthe one that your think is right. If you change youranswer, erase the wrong one. If you want me -torepeat any question, raise ythir hand."

    By "this time, Melanie and most of the otherchildren are so confused by the maze of instruc-tions, they can't even formulate a question.

Since no one raises a hand, the teacher continues, "First we are going to listen for sounds at the end of words. Is everyone ready for Number 1? Look at the picture of the drum.... Mark your answer."

Melanie stares at her paper. She doesn't know what the teacher is asking her to do. She looks around, feeling panic. Since many of the children now have their hands up, she puts hers up. The aide finishes with one of the other children and comes to her side. "I don't know what to do," Melanie whispers, tears in her eyes. The aide can only repeat what the teacher has said.

The teacher goes through 27 more items, including ones in which the children have to be able to distinguish between e, u, and i as the sound heard in the middle of first, and u, o, or e as the sound heard in the middle of rug.

Everyone greets recess with cheers: the children are exhausted; the aide and the teacher are exhausted.

After recess, the children come back into class and see the test booklets still on their desks. They groan and protest. The teacher gets them settled down and begins the routine again.


In a situation like the one above, the children tend to feel that they are failures; they never suspect that something may be wrong with the test. The teacher, too, is a victim in this testing process, because he or she is made to feel that any problem in carrying out the test is caused by the way he or she has administered it. According to the testing manual, "the teacher or examiner who makes the announcement should guard against arousing anxiety in the students."

During testing week some children remove themselves from the intolerable situation by either "playing sick" or actually becoming sick.

On the second day of testing, Melanie did not want to come to school, but her mother felt it was important for her to go and "not get in the habit of staying home just because something she didn't like was happening." In this way Melanie's mother, like many other parents, helped to support the practice of testing, feeling that it is a necessary evil. The parent thus joins with the school in further convincing the child that something is wrong with the child, not with the test.

Melanie, however, had an asthma attack during the math portion of this test and got to go home anyway. During the rest of the year, she was frequently absent and very reluctant to try new tasks.

In the second grade, the same pattern continued. Melanie's experiences with testing seem to have changed what started as a positive school experience into a negative one.

Unfortunately, this student is not unique. Two or three days of testing frequently damage the self-esteem of many first graders. It is hard to overstate the negative impact of this test on young children.

Other children deal with standardized testing by not really trying, by just marking answers and going through the motions. On the reading comprehension part of the test given above, the children were required to read sentences such as "The prince took a drink and changed into a frog." Only two children in this class were able to read at all.

The children were given 15 minutes for this part of the test. Most went through it marking any bubble that struck their fancy and finished the test in two or three minutes. Some made nice designs with the bubbles. Only the two little boys who could read took more than five minutes for the test. One of them became frustrated because the teacher wouldn't help him with a word, so he put his head down on his desk and refused to finish the test.

The effects of tests on children are tragic and cruel. The vicious cycle of labeling and testing follows children throughout their school experiences, influencing both teacher and parental attitudes toward them and, what is worse, their attitudes toward themselves.

Much has been written about the effects of testing on teachers' attitudes toward students. We must now contend with a third party in this unhealthy situation. Federal and state programs require increased parent participation, so parents have access to information which they might not otherwise be aware of.

Usually parents whose child has low scores believe either the child or the school is failing.

Teachers who know the limitations of these scores are reluctant to tell parents a first or second grade child is ranked "below average." Providing this information to the parent only perpetuates the labeling of young children. We must question, however, any use whatsoever of a score that is so tainted that we wish to withhold it from a child's parents. If a score is that misleading and damaging in its effects, we must examine the wisdom of even having it available.

We must also question the educational soundness of writing objectives based on raising scores on standardized tests. Suppose a school gets government money for a program to bring all students' scores that are in the lower two quartiles up to the upper two. The tests are constructed, however, to obtain a certain distribution of scores among all four quartiles. The two lower quartiles will by definition always contain a certain proportion of students' scores, so the programs are destined to fall short of their objectives.
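The arithmetic behind this objection is worth spelling out. Quartiles are defined by rank within the reference group, so half of that group occupies the lower two quartiles no matter how much every child's raw score rises. A minimal demonstration (the scores are invented):

```python
def lower_half_count(scores):
    """Count the scores in the lower two quartiles (ranked below the median)."""
    cutoff = sorted(scores)[len(scores) // 2]    # median splits the quartiles
    return sum(1 for s in scores if s < cutoff)

before = [40, 45, 50, 55, 60, 65, 70, 75]
after = [s + 30 for s in before]                 # every child gains 30 points

# lower_half_count(before) and lower_half_count(after) are both 4:
# half the group is still "below average" after the across-the-board gain.
```

A program judged by quartile membership within the normed distribution is therefore judged against a target that recedes exactly as fast as the children advance.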

It is difficult for a teacher to have worked hard all year only to get the results of standardized tests and find that, technically, the teacher and class both have failed. This year nearly every child in our school is in the lower two quartiles in reading or math or both. Since the main goal of the program at our school is to bring these children into the two top quartiles, the program has failed.

The teachers and staff of our school can accept this failure intellectually because we feel it is only a "paper failure." Emotionally, however, we become frustrated when faced with a list of scores that says our students are failing academically.

These tests also negatively affect the programs that they evaluate. In schools where the staff is professional and secure, the influence of these tests is minimal as the staff tries to keep the children's needs in mind and teach to these needs, not to the tests. Even so, the need to compare skill achievement with that in other schools gives the tests influence. Because evaluation techniques and standardized tests have not kept pace with curriculum development and theories of child development, that influence is regressive.

In other schools the situation is worse. At one school where I taught, great emphasis was placed on the test results. Predictably, teachers did everything possible to improve the test scores. Since the only two areas evaluated on the test were math and reading, teachers concentrated on these two areas almost to the exclusion of art, social studies, and music. Recess and lunch time were cut down in order to give more instructional time in math and reading. Timing was manipulated to make the pretest scores lower than the posttest scores. For pretesting, all tests were given in one day, on a Monday; for posttesting, they were given at a more leisurely pace on Tuesday, Wednesday, and Thursday, days when the children were usually more settled.

Tests and work sheets covering the material on the test finally came to be the curriculum at the school. The pressure to look good on tests brought about wide fluctuations in students' test scores: gains of two or three years one year and regression the next.

It seems, then, that little of value is derived from these tests, other than using the scores as criteria for deciding which schools will get federal money for new programs. (Why not throw darts?)

To student, teacher, and parent, the tests are equally devastating. One teacher at William Penn Elementary School (Bakersfield, California) put it very succinctly, "How do standardized tests help me in the classroom? Well, they helped three children ruin their pants and one child have an asthma attack."

Teachers have talked about the damaging effects of standardized tests for years. Perhaps if they refused to give the tests, changes and reforms would result.


One immediate change could be to exclude children from standardized testing until they actually have the skills that these tests are supposed to be testing. Teachers could use their judgment to decide who should take the tests.

Because these tests are not diagnostic and are supposed to be more valid (although this, too, is questionable) for a group than for an individual, test results should not be linked with an individual student but only with a group.

On a long-term basis, test manufacturers should design tests based on the developmental levels of young children, not adults. In curriculum we realized years ago that the child is a unique kind of being and not just a smaller version of a grown-up. Merely updating the old model of the standardized test, as testing companies have done in the past and continue to do, is not enough.


ONE WAY IT CAN BE
by Brenda S. Engel

In the spring of 1976, the Cambridge Alternative Public School, then in its fourth year, had generally avoided administering the standardized tests ordinarily required of Cambridge public schools. At that time, however, pressure from the school department was increasing; the assistant superintendent for elementary education felt that he needed concrete evidence of the quality of the education offered at the school.

With parent support, the school had taken an antitesting position (similar, on several points, to that taken by the NEA). The school felt that scheduling standardized tests disrupted the educational process, that the tests made many children anxious, that the tests penalized minority children, and that their influence on teaching and the curriculum could be disastrous for an innovative school. But the school community (teachers, administration, and parents) also had a strong interest in carrying out some form of evaluation in order to corroborate their confidence in the school. So much for the situation.

At this point, at the request of a parent-staff committee, I was employed as an independent consultant to try out some means of evaluation that might be satisfactory to both the school community and the school department.

We settled on the third grade for the alternative evaluation because it was a well-balanced class in regard to age, sex, and race. Four teachers would be directly involved, since the 29 children in the grade were fairly evenly divided among four classrooms (each contained mixed ages). Most of the children involved had been in the school from its inception.

In order to keep the size of the undertaking manageable during the first experimental year, we identified three areas of the curriculum for assessment (math, reading, and art) and proceeded to make an overall plan, to outline an implementation schedule, and to design the actual instruments of evaluation.

The evaluation was to be carried out over a five-week period toward the end of the school year. We hoped the instruments of evaluation would do the following:

    Give each child various ways to demonstrate his or her abilities.
    Take into consideration the varied economic, cultural, and linguistic backgrounds of the children.
    Elicit original responses and creative thinking.
    Assess significant aspects of education.
    Gain information about children's learning as directly as possible.

We also hoped that the evaluation would cause a minimum of disruption in the school and that it would not be a negative experience for the children. The actual work of the assessment was to be shared among a number of people with different jobs in the school or with different relationships to it.

When the evaluation was completed, we hoped to present a report in clear, readable, and usable form. We planned to make it more descriptive than judgmental, both noncomparative and nonnumerical, and useful to teachers as well as informative to administrators and parents.

The matrix shown in Figure 1 describes what we intended to assess and how we intended to do it. The Areas to Be Assessed are listed across the top and the Means of Assessment down the left-hand side. Teacher statements led the list of Means of Assessment. Each teacher gave opinions (which we determined through lengthy interviews) of each child's progress in each area of learning: his or her understanding of the decimal system, sense of estimation and of probability, and so on across the matrix, ending with the child's ability to solve original problems.

The teacher statements began the evaluation process and supplied the guidelines, in both content and approach, for conducting the rest of the assessment. The teachers' opinions of each child's ability in each curriculum area set the stage for what followed, particularly for our observations of, and interviews with, the children.

The second Means of Assessment, classroom observations, was necessarily open-ended and directed more toward quality of work and involvement than toward skills. An observer spent about half a day in each of the four classrooms, focusing particularly on the third graders and recording the observations in anecdotal form.

A parent committee drew up and distributed parent questionnaires, the third item in the Means column. The areas checked on the matrix represent only some of the subjects covered in the questionnaires; other subjects were matters of general interest to the school and were not part of the assessment.

The oral and written tests were made up specifically for the occasions (i.e., nonstandardized). They were to inventory the children's abilities in the specified areas as simply as possible.

Following is an example from such a test. It was designed to measure children's ability to estimate as part of the mathematical skills assessed on the matrix.

    About how much does your teacher weigh?
    Which do you think weighs more, a bicycle or a horse?
    About how long is your thumb?
    About how high is the ceiling in this room?
    About how long does it take you to brush your teeth?
    About how long will it be before you are a grown-up?
    About how long is summer vacation?
    About how many children are there in this school?
    About how many pieces of bread are there in a loaf?

Classroom teachers, with the help of graduate students, gathered the next items on the list: a collection of work samples (current), previous test results, and summaries of school records.

Finally, when all these data were assembled in folders, we conducted an interview with each child to fill in any gaps in the information and clear up possible ambiguities or contradictions.

Most important, each area of learning was examined in a variety of ways designed to cross-check each other. No judgments were made on the basis of a single means or single occasion.

Another central and challenging consideration was the form of the final report, which had to be designed for the requirements of widely different constituencies: the school department, school administration, teachers, parents, and children.

The school department was interested in a concise statement focused mainly on skills. Parents, although they varied in their expectations of the evaluations (some looking for cognitive, others affective, assessment), were primarily interested in detailed reports on their own children. Teachers were looking for confirmation of their own perceptions, for further insights, and for implications for the curriculum. The school administration shared all these interests. The children themselves, if they were at all aware of the nature of the process, were looking for personal reassurance.

Our reporting system had three parts: a summary sheet for each child, a key (with school department expectations, not norms, underlined), and documentation (folders containing results of, or notes on, all the Means of Assessment). By glancing at only the summary sheet, one could gain a general impression of achievement; one could read the summary sheet in detail, using the key to identify specific skills; or one could scrutinize the actual evidence on which the summary sheet was based.

It would be misleading to suggest that all of this does not add up to a substantial amount of work. Having gone through this process once, however, those of us involved now believe that the same approach could be carried out in a variety of ways and over differing lengths of time.

A teacher, or group of teachers, could custom design his or her own matrix, listing the subject matter to be assessed across the top (as shown in Figure 1) and the feasible means of assessment down the lefthand column. The matrix itself, once it has been filled out, can provide the framework for the assessment process. For instance, one might limit the means to teacher statements, parent questionnaires, written tests, and work samples. Similarly, one could limit the areas to be scrutinized. (It is important, however, to assess each area in at least three ways.) Later, then, when the time comes to write a test or plan a questionnaire, one has only to look across the horizontal row from a particular means to identify its content. Classroom teachers can formulate the specific questions to be asked without much difficulty.
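The matrix just described is simple enough to prototype. The sketch below is a hypothetical illustration in Python (the subject areas, means, and cell entries are invented for the example, not taken from the study): looking down a column counts how many ways an area is assessed, and looking across a row tells a test writer what content that instrument must cover.

```python
# Hypothetical sketch of the assessment matrix described above:
# subject areas across the top, means of assessment down the side.
# All names and cell entries are illustrative, not from the original study.

AREAS = ["reading", "writing", "mathematics", "social development"]
MEANS = ["teacher statements", "parent questionnaire",
         "written test", "work samples"]

# A cell is True when that means will be used to assess that area.
matrix = {mean: {area: False for area in AREAS} for mean in MEANS}

matrix["teacher statements"]["reading"] = True
matrix["written test"]["reading"] = True
matrix["work samples"]["reading"] = True
matrix["parent questionnaire"]["social development"] = True

def coverage(area):
    """Count how many means assess a given area (the text asks for >= 3)."""
    return sum(matrix[mean][area] for mean in MEANS)

def content_for(mean):
    """Look across a row to see what a test or questionnaire must cover."""
    return [area for area in AREAS if matrix[mean][area]]

print(coverage("reading"))          # 3 -> meets the three-ways guideline
print(content_for("written test"))  # ['reading']
```

Filling in such a grid first, and only then writing the instruments, is what lets the matrix serve as the framework for the whole assessment.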

After the scheduled information has been collected, the teacher can fill in a report for each child, viewing the assessment as a more-than-adequate substitute for the usual reports and tests, not as an addition to them.

How to assess the assessment in relation to our expectations? At this point, it is important to reemphasize that the purpose of this alternative evaluation was to compile as informative and as comprehensive a picture of each child's abilities and skills as possible, not to compare children's achievements or rates of growth with those of other children. In this context, the findings have promise.

The children, by their own accounts during their interviews, seemed to enjoy the process, which was neither seriously interruptive nor damaging. With the additional specific information about each student gained from the assessment, teachers felt that they could do a better job of individualizing the educational program for each child. Perhaps most important, the assessment itself did not violate the educational climate we were trying to protect and to which we were committed.


ROLES AND RESPONSIBILITIES OF GROUPS CONCERNED WITH STUDENT EVALUATION SYSTEMS*

by Bernard McKenna

The roles and responsibilities delineated below for specific groups of persons particularly concerned with student evaluation are based on findings of and positions taken by the NEA Task Force on Testing. (The report of this task force is contained on pages 81-90.) These recommended roles and responsibilities are considered essentials for achieving the following goals:

- Sound and fair development of evaluation systems
- Appropriate distribution and administration of evaluation systems
- Accurate and fair interpretation of the results
- Relevant and constructive action programs based on the results.

A. Teachers, Individually or Collectively Through Their Associations, as Appropriate, Should Do the Following:

1. Seek representation on school district, testing industry, and government (state and federal) decision-making groups for test development (e.g., Educational Testing Service, National Institute of Education), become involved in item analysis and selection, and provide feedback on content and format of tests.

2. Plan and negotiate for, or otherwise reach agreement with the school administration on, released time and district in-service education programs to prepare members in the use of tests.

3. Plan professional activities in the area of testing for all members of the association.

4. Seek and participate in in-service training in the area of testing to learn to construct and evaluate teacher-made tests, to learn about objective- or criterion-referencing, to learn about alternative assessment tools, to learn appropriate reporting procedures, to develop an awareness of the variety of tests and their purposes, to keep abreast of latest research findings, and to develop the ability to analyze and criticize standardized tests as they relate to school and district programs and goals.

5. Work to influence test makers and the local and state school systems and secure from them a firm commitment to evaluation programs that will lead to the improvement of instruction.

6. Keep parents and other interested community groups informed about trends and promising developments in evaluation procedures and about unsound testing practices.

7. Negotiate for, or otherwise reach agreement with the school administration on, provisions guaranteeing teacher lead time for preparation for testing, appropriate testing conditions and scheduling, and follow-up time for scoring. Provisions should spell out teachers' appropriate role in the test-scoring process, e.g., to remedy the inordinate amount of time spent on hand scoring.

8. Thoroughly familiarize themselves with tests to be given (assuming they have been furnished with appropriate background materials and sufficient time for learning about administration of the instruments).

9. Develop an understanding of their students' cultural and socio-economic backgrounds and sensitivity to their individual needs and problems in order to avoid the use of irrelevant and biased testing.

10. Periodically review tests to determine their relevancy to instructional goals and objectives and their timeliness, and recommend to the school administration and the testing industry abandonment of irrelevant and outmoded tests.

*The term "evaluation system" is used instead of "tests" because it is believed that a wide variety of alternatives to tests should and can be developed through research and tryout leading to their validation for evaluation purposes.

11. Secure by appropriate means, from the school or school district administration as deemed necessary, the right to determine what tests will be administered, when they will be given, and at what intervals. They should also secure the right to determine exemptions from testing.

12. Secure by appropriate means, from the school or school district administration as deemed necessary, the right to determine proper physical arrangements and time frames for testing as appropriate for themselves and their students. Time allowed should be sufficient for thorough orientation of students to the test being given, and for scoring and reporting results.

13. Be responsible for providing a non-threatening attitudinal atmosphere for students during testing sessions, given the proper conditions.

14. Assure that machine-scored results are validated by hand scoring a sample of tests.

15. Take an objective approach in interpreting test results, never using them as a weapon against students.

16. Seek to ensure that test results are not used to categorize students into homogeneous groups or as a criterion for student admission to programs of their choice.

17. Strive for accuracy in interpreting test results, relating them to socio-economic factors affecting individual students.

18. Have respect for student privacy in interpreting test results and manifest that respect by working to secure school district policies guaranteeing students' privacy in the reporting and dissemination of test results, which should not be for public information.

19. Urge strict enforcement of the federal Privacy Act affecting pupil records.

20. Work to secure legislation which will prevent publication of test scores.


21. Work to secure legislation which will prevent the use of test results as a basis for allocation of local, state, or federal educational funding.

22. Assure that test results are not compared among classrooms or buildings or with other districts or regions.

23. Report on test results in a manner appropriate to a varied audience: students, parents, media, professionals.

24. Recommend general and specific program improvements to the school and school district administrations, and to effect the improvements, identify the needed resources and remedial measures and programs.

25. Secure through the appropriate means, from the school or school district administration as deemed necessary, the stipulation that test results will not be used in evaluating teacher performance. (Teachers should be held accountable for conducting the best instructional process possible under existing conditions, not for guaranteeing learning.)

26. Take a position in favor of the inclusion of courses in tests and measurements in all teacher preparation programs, and provide input on testing problems and issues to their own professional governance boards or commissions to help in the formulation of standards and requirements for teacher education and licensure. (There is little evidence that most preparing institutions or states specifically require or encourage classroom teachers to acquire the knowledge and skills necessary for using tests.)

B. Other Professional Associations Should Do the Following:

1. Search out and synthesize information on all issues associated with the development, use, and abuse of tests and communicate to the members any information affecting them or their students.

2. Organize study committees of members knowledgeable in testing to develop policies, guidelines, and procedures for testing. Such committees should seek input from all members and consultation from experts in the field.

3. Serve in a "watchdog" capacity on the introduction and administration of curriculum-related tests to assure their appropriateness for schools, and communicate regional concerns to the testing industry.

4. Pursue needed changes in school curriculum programs as identified through the results of testing, this in cooperation with other associations in the region which represent comparable educational and socioeconomic conditions.

5. Identify alternatives to standardized testing.

6. Provide background information and regional concerns to those responsible for drafting or introducing state legislation, and work for passage of legislation to regulate types of tests and uses of the results. These efforts should include calling for the testing of students in their dominant language (except, for example, proficiency tests in English).

7. Urge strict enforcement of the federal Privacy Act affecting pupil records.

C. Students, Individually or Collectively as Appropriate, Should Do the Following:

1. Seek a role in the development of tests through representation on school, district, and testing industry committees and by providing feedback on test content and format.

2. Take positions against the use of measurement instruments that they feel are biased and will lead to unfair results on the basis of race, sex, socioeconomic status, language, or culture, and make these positions known to the school and school district administration and the testing industry.

3. Make every effort, assuming they have been afforded proper orientation, to thoroughly understand the purpose, and intended uses of results, of any test to be administered in which they will be involved. Students should have the right to refuse to take a test known to be racially, culturally, or otherwise biased.

4. Seek a role in determining the conditions of test administration, including scheduling, preparation, length, location, and facilities. (Many tests are administered under adverse conditions, with little attention given to the total physical environment and insufficient time allowed for orientation.)

5. Call attention to any physical or attitudinal pressures in the administration of tests which they feel threaten them or their performance.

6. Insist that they be given a thorough explanation of test results in a meaningful way and in language they can understand.

7. Take a position on the use of test results, demanding guarantees of privacy and the right to determine to whom the results will be released, insisting that results not be used to demean or categorize them or to deny them admission to programs of their choice, and urging strict enforcement of the federal Privacy Act, which affects pupil records.

8. Seek a role in deciding on alternatives for meeting student needs as identified through the results of testing, insist on the right to choose from among alternatives, and become involved in the planning of remedial programs.

Local Student Action for Education and Student NEA groups might assume the leadership role in involving all students in the evaluation programs of the school or school district and serve as the voice of student opinion and the vehicle for their protection against the adverse effects of evaluation.

D. Minority Groups Should Do the Following:

1. Actively seek representation and involvement on testing industry and school system decision-making groups for test development and use.

2. Urge test makers to (a) revise tests in consideration of minority differences, eliminating culture-related items from current tests and working toward cross-cultural instruments, (b) research ethnic and regional test requirements and withdraw tests found to be inappropriate to the population being tested, and (c) explore and recommend alternative forms of student evaluation.

3. Request from the testing industry documentation on norming procedures and population bases for norming.

4. Keep members informed of improper test procedures and seek support or legal assistance where tests and results are misused.

5. Urge minority students to refuse to take tests which are found to be biased and urge minority teachers to refuse to administer such tests.

6. Work to prevent the invasion of student privacy in interpretation and use of test results.

7. Work for legislation to prevent publication of test scores and for enforcement of the federal Privacy Act affecting pupil records.

8. Promote legislation to prevent the use of test scores as a basis for allocation of local, state, or federal educational funding.

9. Take strong positions and action against the use of test results for tracking, to denigrate minority intelligence, or to deny students entrance to programs.

10. Expose the erroneous contentions of Shockley and Jensen that some groups in society are genetically less intelligent than others. (The typical group test is considered (a) an unreliable measure of mental ability and (b) to be biased against minorities, having been standardized on a different kind of population.)

11. Actively seek changes in curriculum (including textbooks) to reflect minority concerns and diagnostic services based on student needs as identified by appropriate testing.

12. Seek community support and funds for appropriate new or experimental education programs based on needs identified through means other than testing.


13. Become involved in planning and providing pre- and in-service education for teachers to orient them to minority problems and needs related to testing.

14. Seek public awareness of and concern for minority problems in testing, and pressure community media to help keep the public informed, especially on issues related to proper interpretation and use of test results.

15. Form coalitions for action in the development and use of tests.

E. The Testing Industry Should Do the Following:

1. Include in test development substantial numbers of persons from all groups that have an interest in and knowledge about testing, particularly representatives of classroom teachers and minority groups.

2. Be responsible for producing culturally fair and bias-free tests that contain relevant items.

3. Work with all concerned groups in constantly monitoring, updating, and revising their tests. The industry should immediately withdraw out-of-date tests from the market, as recommended by those who use them.

4. Take regional diversities into consideration in constructing tests to ensure relevance of test items.

5. Correlate tests to current and developing curricula.

6. Improve sampling techniques and broaden sampling bases.

7. Undertake in-depth research and development to perfect a wide variety of alternatives to standardized norm-referenced tests.

8. Provide with each test copy a cover document specifying what the test is designed for (to reveal depth of subject knowledge, to verify reading comprehension, to establish equivalency, etc.) and what groups (e.g., "early childhood," "later elementary") it is appropriate for. The document should also include a release form for student signature, testifying that "I understand the purpose of the test..." or "I am taking this test under protest...."

9. Provide an up-to-date manual with each standardized test, issued in English and other appropriate language editions depending on the student population. The manual should give clear and complete information for administration of the test, including proper physical arrangements; define proper and improper uses of the test, warning particularly against using the test for purposes of teacher evaluation; explain various ways of interpreting results, providing information on the basis of norming to ensure proper interpretation and including a "Surgeon General's warning" on the dangers of misinterpretation; and delineate limitations of the test.

10. Provide with each test, not just benchmarks, but a range of scoring norms.

11. Constantly monitor the distribution of standardized tests to ensure proper use, respond promptly to charges of misuse, and refuse to sell tests or report scores where misuse is evident.

12. Provide in-service training for teachers and administrators in the use of standardized tests; provide consultants and test administrators to assist teachers in giving tests and developing sensitivity to testing conditions; and have representatives available as resource persons for interpretation of test results.

13. Provide information on the use of standardized tests and interpretation of results to schools of education and urge them to include courses in tests and measurements in their required professional preparation for teachers. Such courses should include instruction on limitations of tests, potential bias, and a broad range of alternatives to testing.

14. Develop recommendations for curriculum revisions as related to test results in order to help teachers in planning remedial programs for students.

15. Establish an extensive PR program to keep the public informed on testing issues and developments, issuing information materials in English and other language editions.

F. School Administrators Should Do the Following:

1. Ensure that, when appropriate, all tests to be administered reflect the uniqueness of the geographic region in which they are administered and that locally developed and standardized tests reflect updated curriculum.

2. Involve teachers, students, and parents in decision making related to the testing program.

3. Ensure that all teachers who must administer tests are provided with adequate supplies for the students, proper physical arrangements, and thorough orientation time, including practice testing.

4. Provide released time for teachers for in-service training in the administration of tests.

5. Ensure that test results are not used to label students, that the confidentiality of scores is protected in a professional manner, and that the federal Privacy Act affecting pupil records is enforced in school buildings and districts.

6. Make available to teachers or specialists tools for diagnostic purposes and training in their use.

7. Keep parents informed about test results (using nontechnical language) and keep the school board informed about the limitations and possible misuses of tests.

8. Continually evaluate the total testing program.

G. Appropriate College and University Personnel Should Do the Following:

1. Serve a research function, providing to NEA and other concerned groups and to faculty in the school of education their findings on the uses and misuses of standardized tests (including their own testing devices), test bias, and alternatives.

2. Serve in a consultative capacity to the testing industry, providing information on student population and needs, new curricula, college admission policies, scholarships, equal opportunity programs, and the like.

3. Serve in a consultative capacity to school systems, for in-service teacher education and for decision making about curriculum changes based on the results of testing.

4. Seek the involvement of practitioners in decision making relating to professional preparation in tests and measurements.

5. Monitor test results from school districts in their region in relation to new directions for open admissions, equal opportunity programs, scholarships, etc., and keep junior and senior high schools informed about the relationship of test scores to admission policies and program choice.

6. Form coalitions to influence legislation and provide expert testimony on the proper uses of tests and test results.

H. Government Agencies Should Do the Following:

1. The U.S. Congress should legislate restraints on the use of tests that prevent equal educational opportunity.

2. The appropriate federal agencies should

- Provide quality control of testing by taking steps to restrain the testing industry from publishing tests that are improperly constructed and by monitoring instruments to ensure their constant updating.
- Provide technical assistance and information to educators and the public regarding test development and use.
- Increase research efforts in standardized tests and alternatives.
- Assure that teachers are involved in decision making about the use of revenue-sharing funds as they apply to the school system's testing program.

3. State education agencies should
- Provide consultant services, financial assistance, and models for quality in-service education for teachers on the proper administration of tests and on limitations of test results.
- Provide for alternatives to standardized tests for state assessment programs.
- Prevent the improper distribution and administration of large-scale assessment program materials by instituting sampling procedures as opposed to blanket testing.

4. Local education agencies should
- Provide released time and quality in-service education for teachers and other school personnel on the administration of tests and use of results.
- Prevent misuse of large-scale assessment instruments by instituting sampling procedures as opposed to blanket testing.

5. Education agencies at all levels should
- Involve teachers in decision making on test development.
- Provide the funds for innovative programs to develop alternatives to standardized testing and interpretation.
- Provide the funds for long-range experimental testing programs.


WHY SHOULD ALL THOSE STUDENTS TAKE ALL THOSE TESTS?

The NEA Task Force on Testing, in its first interim report, states:

The Task Force believes there is overkill in the use of standardized tests, and that the intended purposes of testing can be accomplished through less use of standardized tests, through sampling techniques where tests are used, and through a variety of alternatives to tests....

Representatives of the testing industry and others told the Task Force that sampling of student populations could be as effective as the blanket application of tests that is now so common. Some suggested that such procedures, in addition to increasing the assurance of privacy rights, would conserve time, effort, and financial expenditure.1

The blanket use of tests (every-pupil testing) in some state assessment and local testing programs appears to require inordinate amounts of time and resources on the part of teachers, other personnel involved in test administration and interpretation, and the students themselves.

Criticisms of the blanket use of tests have come from a variety of prominent researchers, evaluators, and other educators.

House, Rivers, and Stufflebeam, in their evaluation of the Michigan accountability system, concurred that in that state:

Statewide testing as presently executed also raises the question of the feasibility of every-pupil testing. This practice appears to be of dubious value when the cost of such an undertaking is compared with the resulting benefits to local level personnel.... The local, and hence overall, costs could be reduced by a matrix sampling plan which requires that each student tested take only a few items.... In the long run, a matrix sampling plan will be the only one feasible from a cost and time standpoint. The cost and time required for every-pupil testing for the whole state would be horrendous.... We feel that it [strict adherence to a statewide testing model] will result in useless expenditures of monies and manpower, in addition to producing unwarranted disruptions of the educational programs within a great number of schools.2

In a paper entitled "Criteria for Evaluating State Education Accountability Systems," the National Education Association has laid down fifteen basic principles, one of which is as follows:

If the state desires test data for its own planning purposes, it should use proven matrix sampling techniques which will not reveal schools and which will greatly reduce costs.

Matrix sampling techniques can give an accurate picture of the state by various categories much more efficiently than testing each child with an entire instrument.3
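The mechanics of matrix sampling can be sketched in a few lines. The following Python fragment is a hypothetical illustration, not part of the NEA paper: the counts (a 60-item test, 12 items per student, 300 students) are invented for the example. Each student answers only one-fifth of the items, yet every item is still answered by enough students that group-level results remain estimable.

```python
# Minimal sketch of matrix (item) sampling, under assumed numbers:
# a 60-item pool, 12 items per student, 300 students.
import random

random.seed(0)
NUM_ITEMS = 60
ITEMS_PER_STUDENT = 12
students = [f"student_{i}" for i in range(300)]

# Assign each student a random subset of items instead of the whole test.
assignments = {s: random.sample(range(NUM_ITEMS), ITEMS_PER_STUDENT)
               for s in students}

# Count how many students see each item. With these numbers, each item
# is answered by roughly 60 students on average, enough for group
# statistics, while every child's testing time drops to one-fifth of
# what blanket (every-pupil, every-item) testing would require.
exposure = [0] * NUM_ITEMS
for subset in assignments.values():
    for item in subset:
        exposure[item] += 1

print(min(exposure), max(exposure))
```

The trade-off is the one the quoted evaluators describe: group (state- or district-level) estimates survive intact, but no individual student receives a full score, which is why the technique suits planning data rather than individual reporting.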

It was with such admonitions as these in mind that this chapter was written. And while some procedures are appropriate for evaluating all students in one way or another for particular purposes, it would appear that there is gross overuse of blanket testing procedures.

To help teachers and other educators better understand some main considerations related to sampling, the NEA obtained permission from Dr. Frank Womer, Michigan School Testing Service, University of Michigan, to reproduce material from a monograph of his on developing assessment programs.4 In addition, Dr. Womer prepared, especially for this paper, a section on item sampling. Dr. Womer's recommendations follow the excerpts from his monograph.

    Determining Whether Sampling Is To Be Used

The decision whether to test an entire population or use a sample involves a combination of concerns. Clearly there are policy considerations; clearly there are psychometric5 considerations; clearly there are data collection considerations; and clearly there are cost considerations. The best possible staff and consultant thinking on this question should be brought to an advisory committee for them to consider very carefully.

Probably the most crucial consideration is a policy one, since psychometrics, data collection, and cost generally would argue on the side of sampling rather than using an entire population. If it is deemed wise for policy reasons to test all students in a population, that preference, typically, will have to be weighed against available resources and technology; so we will consider first the policy implications of the two choices.

One needs to look carefully at the purposes and goals of a specific assessment program in determining whether sampling is appropriate. If all of the specific purposes and objectives of an assessment program can be met by group results, then sampling should be considered.

The only assessment situation that clearly calls for common data collection on all members of the population is