Harmer, Chapter 13




Testing

• Reasons for testing students
• Good tests
• Test types
• Marking tests
• Designing tests

Reasons for testing students

At various stages during their learning, students may need or want to be tested on their ability in the English language. If they arrive at a school and need to be put in a class at an appropriate level, they do a placement test. This often takes the form of a number of discrete (indirect) items (see below), coupled with an oral interview and perhaps a longer piece of writing. The purpose of the test is to find out not only what students know, but also what they don't know. As a result, they can be placed in an appropriate class.

At various stages during a term or semester, we may give students progress tests. These have the function of seeing how students are getting on with the lessons, and how well they have assimilated what they have been taught over the last week, two weeks or a month.

At the end of a term, semester or year, we may want to do a final achievement test (sometimes called an exit test) to see how well students have learnt everything. Their results on this test may determine what class they are placed in next year (in some schools, failing students have to repeat a year), or may be entered into some kind of school-leaving certificate. Typically, achievement tests include a variety of test types and measure the students' abilities in all four skills, as well as their knowledge of grammar and vocabulary.

Many students enter for public examinations such as those offered by the University of Cambridge ESOL, Pitman or Trinity College in the UK, and, in the US, the University of Michigan, as well as TOEFL and TOEIC. These proficiency tests are designed to show what level a student has reached at any one time, and are used by employers and universities, for example, who want a reliable measure of a student's language abilities.

So far in this chapter we have been talking about testing in terms of 'one-off' events, usually taking place at the end of a period of time (except for placement tests). These 'sudden death' events (where ability is measured at a particular point in time) are very different from continuous assessment, where the students' progress is measured as it is happening, and where the measure of a student's achievement is the work done all through the learning period and not just at the end. One form of continuous assessment is the language portfolio, where students collect examples of their work over time, so that these pieces of work can all be taken into account when an evaluation is made of their language progress and achievement. Such portfolios (called dossiers in this case) are part of the CEF (Common European Framework), which also asks language learners to complete language passports (showing their language abilities in all the languages they speak) and language biographies (describing their experiences and progress).


There are other forms of continuous assessment, too, which allow us to keep an eye on how well our students are doing. Such continuous recording may involve, among other things, keeping a record of who speaks in lessons and how often they do it, how compliant students are with homework tasks and how well they do them, and also how well they interact with their classmates.

Some students seem to be well suited to taking progress and achievement tests as the main way of having their language abilities measured. Others do less well in such circumstances and are better able to show their abilities in continuous assessment environments. The best solution is probably a judicious blend of both.

Good tests

Good tests are those that do the job they are designed to do and which convince the people taking and marking them that they work. Good tests also have a positive rather than a negative effect on both students and teachers.

A good test is valid. This means that it does what it says it will. In other words, if we say that a certain test is a good measure of a student's reading ability, then we need to be able to show that this is the case. There is another kind of validity, too, in that when students and teachers see the test, they should think it looks like the real thing - that it has face validity. As they sit in front of their test paper or in front of the screen, the students need to have confidence that this test will work (even if they are nervous about their own abilities). However reliable the test is (see below), face validity demands that the students think it is reliable and valid.

A good test should have marking reliability. Not only should it be fairly easy to mark, but anyone marking it should come up with the same result as someone else. However, since different people can (and do) mark differently, there will always be the danger that where tests involve anything other than computer-scorable questions, different results will be given by different markers. For this reason, a test should be designed to minimise the effect of individual marking styles.

When designing tests, one of the things we have to take into account is the practicality of the test. We need to work out how long it will take both to sit the test and also to mark it. The test will be worthless if it is so long that no one has the time to do it. In the same way, we have to think of the physical constraints of the test situation. Some speaking tests, especially for international exams, ask not only for an examiner but also for an interlocutor (someone who participates in a conversation with a student). But this is clearly not practical for teachers working on their own.

Tests have a marked washback/backwash effect, whether they are public exams or institution-designed progress or achievement tests. The washback effect occurs when teachers see the form of the test their students are going to have to take and then, as a result, start teaching for the test. For example, they concentrate on teaching the techniques for answering certain types of question rather than thinking in terms of what language students need to learn in general. This is completely understandable since teachers want as many of their students as possible to pass the test. Indeed, teachers would be careless if they did not introduce their students to the kinds of test item they are likely to encounter in the exam.
But this does not mean that teachers should allow such test preparation to dominate their lessons and deflect from their main teaching aims and procedures.

The washback effect has a negative effect on teaching if the test fails to mirror our teaching, because then we will be tempted to make our teaching fit the test, rather than the other way round. Many modern public examinations have improved greatly from their more traditional versions, so that they often do reflect contemporary teaching practice. As a result, the washback effect does not have the baleful influence on teaching which we have been discussing.

When we design our own progress and achievement tests, we need to try to ensure that we are not asking students to do things which are completely different from the activities they have taken part in during our lessons. That would clearly be unfair.

Finally, we need to remember that tests have a powerful effect on student motivation. Firstly, students often work a lot harder than normal when there is a test or examination in sight. Secondly, they can be greatly encouraged by success in tests, or, conversely, demotivated by doing badly. For this reason, we may want to try to discourage students from taking public examinations that they are clearly going to fail, and when designing our own progress and achievement tests, we may want to consider the needs of all our students, not just the ones who are doing well. This does not mean writing easy tests, but it does suggest that when writing progress tests, especially, we do not want to design the test so that students fail unnecessarily - and are consequently demotivated by the experience.

Test types

When designing tests, we can either write discrete items, or ask students to become involved in more integrative language use. Discrete-item testing means only testing one thing at a time (e.g. testing a verb tense or a word), whereas integrative testing means asking students to use a variety of language and skills to complete a task successfully. A further distinction needs to be made between direct and indirect test items. A direct test item is one that asks students to do something with language (e.g. write a letter, read and reply to a newspaper article or take part in a conversation). Direct test items are almost always integrative. Indirect test items are those which test the students' knowledge of language rather than getting them to use it. Indirect test items might focus on, say, word collocations (see page 75) or the correct use of modal verbs (see page 69). Direct test items have more to do with activation, whereas indirect items are more closely related to study - that is, the construction of language.

Indirect test items

There are many different ways of testing the students' knowledge of language construction. We will look at three of the most common.

Multiple choice

Multiple-choice questions are those where students are given alternatives to choose from, as in the following example:


Circle the correct answer.
You must _____ here on time.
a to get   b getting   c to have get   d get


Sometimes students are instructed to choose the 'correct' answer (because only one answer is possible), as in the example above. But sometimes, instead, they can be told to choose the 'best' answer (because, although more than one answer is possible, one stands out as the most appropriate), e.g.

Circle the best answer.
Police are worried about the level of _____ crime.
a juvenile   b childish   c young   d infant

Multiple-choice questions have the great advantage of being easy to mark. Answer sheets can be read by computer, or can be marked by putting a transparency over the answer sheet which shows the circled correct letters. Markers do not have to worry, then, about the language in the questions; it is simply a matter of checking the correct letters for each question.

One problem with multiple-choice questions lies in the choice of distractors, that is, the three incorrect (or inappropriate) answers. For while it may not be difficult to write one obvious distractor (e.g. answer a 'to get' in the first example above), because that is a mistake that students commonly make, it becomes less easy to come up with three items which will all sort out those students who know how this piece of language works from the ones who don't. In other words, there is a danger that we will either distract too many students (even those who should get the question right) or too few (in which case the question has not done its job of differentiating students).

Multiple-choice questions can be used to test reading and listening comprehension (we can also use true/false questions for this: students circle 'T' or 'F' next to statements concerning material they have just read or listened to).

The washback effect of multiple-choice questions leads some people to find them unattractive, since training students to be good at multiple-choice questions may not help them to become better language learners. And there is a limit to how much we can test with this kind of indirect item. Nevertheless, multiple-choice questions are very attractive in terms of scorer reliability.
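Since answer sheets can be read by computer, the marking step itself is easy to automate. Here is a minimal Python sketch of that idea; the answer key and the student's responses are invented for illustration:

```python
# A minimal sketch of computer-scoring multiple-choice answers.
# The answer key and the student's responses are invented for illustration.

answer_key = {1: "d", 2: "a"}  # question number -> correct letter

def score_answers(responses, key):
    """Count how many of a student's circled letters match the key."""
    return sum(1 for q, letter in responses.items() if key.get(q) == letter)

student = {1: "d", 2: "b"}
print(f"Score: {score_answers(student, answer_key)}/{len(answer_key)}")  # Score: 1/2
```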


Fill-in and cloze

This extremely common form of indirect testing involves the examinee writing a word in a gap in a sentence or paragraph, e.g.

Yesterday I went a _____ the cinema b _____ my friend Clare. I enjoyed the film c _____ she did not.

Gap-fill (or fill-in) items like this are fairly easy to write, though it is often difficult to leave a gap where only one item is possible. In such cases, we will need to be aware of what different answers we can accept. They also make marking a little more complex, though we can design answer sheets where students only have to write the required word against different letters, e.g.

a _____
b _____
c _____

A variation on fill-ins and gap-fills is the cloze procedure, where gaps are put into a text at regular intervals (say every sixth word). As a result, without the test writer having to think about it too much, students are forced to produce a wide range of different words based on everything from collocation to verb formation, etc., as in the following example.

All around the world, students a _____ all ages are learning to b _____ English, but their reasons for c _____ to study English can differ d _____. Some students, of course, only e _____ English because it is on f _____ curriculum at primary or secondary g _____, but for others, studying the h _____ reflects some kind of a i _____.
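The gap-making step of the procedure is mechanical and easy to automate; deciding whether each gap is actually answerable is where the test writer's judgment comes in, as discussed below. A minimal Python sketch, with an invented sample passage and deliberately naive punctuation handling:

```python
import string

def make_cloze(text, n=6):
    """Blank out every nth word, returning the gapped text and the answers."""
    words = text.split()
    gapped, answers = [], []
    for i, word in enumerate(words, start=1):
        if i % n == 0:
            answers.append(word.strip(string.punctuation))
            gapped.append("_____")
        else:
            gapped.append(word)
    return " ".join(gapped), answers

sample = ("All around the world, students of all ages are learning to "
          "speak English, but their reasons for wanting to study English "
          "can differ greatly.")
gapped_text, key = make_cloze(sample)
print(gapped_text)
print(key)
```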

The random selection of gaps (every sixth word) is difficult to use in all circumstances. Sometimes the sixth word will be impossible to guess - or will give rise to far too many alternatives (e.g. gaps c and d above). Most test designers use a form of modified cloze to counteract this situation, trying to adhere to some kind of random distribution (e.g. making every sixth word into a blank), but using their common sense to ensure that students have a chance of filling in the gaps successfully - and thus demonstrating their knowledge of English.

Transformation

In transformation items students are asked to change the form of words and phrases to show their knowledge of syntax and word grammar. In the following test type they are given a sentence and then asked to produce an equivalent sentence using a given word:

Rewrite the sentence so that it means the same. Use the word in bold.
Could I borrow five pounds, please?
lend
_________________________________________

In order to complete the item successfully, the students not only have to know the meaning of borrow and lend, but also how to use them in grammatical constructions.

A variation of this technique is designed to focus more exactly on word grammar. Here, students have to complete lines in a text using the correct form of a given word, e.g.

It was a terrifying performance.                                  terrify
The acrobats showed _____ no fear even though                     absolute
their feats of _____ shocked the crowd into stunned silence.      dare

These kinds of transformations work very well as a test of the students' underlying knowledge of grammar and vocabulary. However, the items are quite difficult to construct.


There are many other kinds of indirect test item. We can ask students to put jumbled words in order to make correct sentences and questions. We can ask them to identify and correct mistakes, or match the beginnings and ends of sentences. Our choice of test item will depend on which, if any, of these techniques we have used in our teaching, since it will always be unfair to give students test items unlike anything they have seen before.

Direct test items

In direct test items, we ask students to use language to do something, instead of just testing their knowledge of how the language itself works. We might ask our students to write instructions for a simple task (such as using a vending machine or assembling a shelving system) or to give an oral mini-presentation. There is no real limit to the kinds of tasks we might ask students to perform. The following list gives some possibilities:

Reading and listening

Some reading and listening test items look a bit like indirect items (e.g. when students are given multiple-choice questions about a particular word in a text, or have to answer T/F questions about a particular sentence). But at other times we might ask students to choose the best summary of what they have heard or read. We might ask them to put a set of pictures in order as they read or listen to a story, or complete a phone message form (for a listening task) or fill out a summary form (for a reading task). Many reading and listening tests are a blend of direct and indirect testing. We can ask students direct language- or text-focused questions as well as testing their global understanding.

Writing

Direct tests of writing might include getting students to write leaflets based on information supplied in an accompanying text, or having them write compositions, such as narrative and discursive essays. We can ask students to write 'transactional letters' (that is, letters replying to an advertisement, or something they have read in the paper, etc). In transactional writing we expect students to include and refer to information they are given.

Speaking

We can interview students, or we can put them in pairs and ask them to perform a number of tasks. These might include having them discuss the similarities and differences between two pictures (see information-gap activities on page 129); they might discuss how to furnish a room, or talk about any other topic we select for them. We can ask them to role-play certain situations (see page 125), such as buying a ticket or asking for information in a shop, or we might ask them to talk about a picture we show them.

When designing direct test items for our students, we need to remember two crucial facts. The first is that, as with indirect tests, direct tests should have items which look like the kind of tasks students have been practising in their lessons. In other words, there is no point in giving students tasks which, because they are unfamiliar, confuse them. The result of this will be that students cannot demonstrate properly how well they can use the language, and this will make the test worthless.

The second is that direct test items are much more difficult to mark than indirect items. This is because our response to a piece of writing or speaking will almost certainly be very subjective - unless we do something to modify this subjectivity. We will now go on to look at how this can be done.
Marking tests

The marking of tests is reasonably simple if the markers only have to tick boxes or individual words (though even here human error can creep in). Things are a lot more complex, however, when we have to evaluate a more integrative piece of work. One way of marking a piece of writing, for example, is to give an overall score (say A or B, or 65%). This will be based on our experience of the level we are teaching and on our 'gut-instinct' reaction to what we read. This is the way that many essays are marked in various different branches of education, and sometimes such marking can be highly appropriate. However, because this is a highly subjective phenomenon, our judgment can be heavily swayed by factors we are not consciously aware of. Most students will remember times when they didn't understand why they were given a low mark for an essay which looked remarkably similar to one of their classmates' higher-scoring pieces.

There are two ways of countering the danger of undue subjectivity. The first is to involve other people. When two or three people look at the same piece of work and, independently, give it a score, we can have more confidence in the evaluation of the writing than if just one person looks at it.

The other way of making the marking more objective is to use marking scales for a range of different items. If we are marking a student's oral presentation, we might use the following scales:

[Marking scales: categories such as grammar and pronunciation, each scored from 0 to 5]

This kind of scale forces us to look at our students' speaking in more detail than is allowed by an overall impression grade. It also allows for differences in individual performance: a student may get marked down on, say, pronunciation, but gain marks for grammar. A final score of, say, 18 out of a total of 25 may reflect his or her ability more accurately than a one-mark grade will do. But we are still left with the problem of knowing exactly why we should give a student 2 rather than 3 for pronunciation. What makes students score 5 for grammar? What would make us give them 1 or 4 instead? Subjectivity is still an issue here (though it is less problematic because we are asking ourselves to evaluate different aspects of the students' performance).
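Combining scale scores is simple arithmetic, and averaging the totals of two independent markers addresses both suggestions at once. A minimal Python sketch; the five categories and the scores are invented for illustration:

```python
# A minimal sketch of combining analytic marking scales (each 0-5).
# Category names and the two markers' scores are invented for illustration.

CATEGORIES = ["grammar", "vocabulary", "pronunciation", "fluency", "interaction"]

def total(scores):
    """Sum one marker's 0-5 scores across all categories (max 25 here)."""
    return sum(scores[c] for c in CATEGORIES)

marker_a = {"grammar": 4, "vocabulary": 3, "pronunciation": 2, "fluency": 4, "interaction": 5}
marker_b = {"grammar": 4, "vocabulary": 4, "pronunciation": 3, "fluency": 4, "interaction": 4}

# Independent totals, then the mean of the two markers' totals.
totals = [total(marker_a), total(marker_b)]
print(f"Marker totals: {totals}; final mark: {sum(totals) / len(totals)}/25")
```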

One way of trying to make marking scales more objective is to write careful descriptions of what the different scores for each category actually represent. Here, for example, is a scale for assessing writing, which uses descriptions:

(Scores: 5 Exemplary, 4 Strong, 3 Satisfactory, 2 Developing, 1 Weak)

Ideas
5 Exemplary: Original treatment of ideas, well developed from start to finish; focused topic with relevant, strong supporting detail.
4 Strong: Clear, interesting ideas enhanced by appropriate details.
3 Satisfactory: Evident main idea with some supporting details. May have some irrelevant material, gaps in needed information.
2 Developing: Some attempt at support but main topic may be too general or confused by irrelevant details.
1 Weak: Writing lacks a central idea; development is minimal or non-existent, wanders.

Organisation
5 Exemplary: Effectively organised in a logical and interesting way. Has a creative and engaging introduction and conclusion.
4 Strong: Structure moves the reader smoothly through the text. Well organised with an inviting introduction and a satisfying closure.
3 Satisfactory: Organisation is appropriate but conventional. There is an obvious attempt at an introduction and conclusion.
2 Developing: An effort has been made to organise the piece, but it may be a 'list' of events. The introduction and conclusion are not well developed.
1 Weak: A lack of structure makes this piece hard to follow. Lead and conclusion may be weak or non-existent.

Voice
5 Exemplary: Passionate, compelling, full of energy and commitment. Shows emotion and generates an emotional response from the reader.
4 Strong: Expressive, engaging, sincere tone with good sense of audience. Writer behind the words comes through occasionally.
3 Satisfactory: Pleasant but not distinctive tone and persona. Voice is appropriate to audience and purpose.
2 Developing: Voice may be mechanical, artificial or inappropriate. Writer seems to lack a sense of audience.
1 Weak: Writing tends to be flat or stiff. Style does not suit audience or purpose.

Word choice
5 Exemplary: Carefully chosen words convey strong, fresh, vivid images consistently throughout the piece.
4 Strong: [descriptor missing in the source]
3 Satisfactory: Word choice is functional and appropriate with some attempt at description; may overuse adjectives and adverbs.
2 Developing: Words may be correct but mundane; writing uses patterns of conversation rather than book language and structure.
1 Weak: Word choice is monotonous; may be repetitious or immature. Limited vocabulary range.

Sentence fluency
5 Exemplary: High degree of craftsmanship; control of rhythm and flow so the writing sounds almost musical to read aloud. Variation in sentence length and forms adds interest and rhythm.
4 Strong: The piece has an easy flow and rhythm with a good variety of sentence lengths and structures.
3 Satisfactory: The writing shows some general sense of rhythm and flow, but many sentences follow a similar structure.
2 Developing: Many similar sentence beginnings and patterns with little sense of rhythm; sounds choppy to read aloud. May have many short sentences or run-ons.
1 Weak: No real sentence sense - may ramble or sound choppy to read aloud.

Conventions
5 Exemplary: The writing contains few, if any, errors in conventions. The writer shows control over a wide range of conventions for this grade level.
4 Strong: Generally, the writing is free from errors, but there may be occasional errors in more complex words and sentence constructions.
3 Satisfactory: Occasional errors are noticeable but minor. The writer uses conventions with enough skill to make the paper easily readable.
2 Developing: The writing suffers from more frequent errors, inappropriate to the grade level, but a reader can still follow it.
1 Weak: Errors in conventions make the writing difficult to follow. The writer seems to know some conventions, but confuses many more.

A marking scale for writing


This framework suggests that the students' writing will be marked fairly and objectively. But it is extremely cumbersome, and for teachers to use it well, they will need training and familiarity with the different descriptions provided here. When marking tests - especially progress tests we design ourselves - we need to strike a balance between totally subjective one-mark-only evaluation on the one hand, and over-complexity in marking-scale frameworks on the other.

Designing tests

When we write tests for our classes, we need to bear in mind the characteristics of good tests which we discussed on pages 167-168. We will think very carefully about how practical our tests will be in terms of time (including how long it will take us to mark them).

When writing progress tests, it is important to try to work out what we want to achieve, especially since the students' results in a progress test will have an immediate effect on their motivation. As a consequence, we need to think about how difficult we want the test to be. Is it designed so that only the best students will pass, or should everyone get a good mark? Some test designers, especially for public exams, appear to have an idea of how many students should get a high grade, what percentage of examinees should pass satisfactorily, and what an acceptable failing percentage would look like.

Progress tests should not work like that, however. Their purpose is only to see how well the students have learnt what they have been taught. Our intention, as far as possible, should be to allow the students to show us what they know and can do, not what they don't know and can't do.

When designing tests for our classes, it is helpful to make a list of the things we want to test. This list might include grammar items (e.g. the present continuous) or direct tasks (e.g. sending an email to arrange a meeting). When we have made our lists, we can decide how much importance to give to each item. We can then reflect these different levels of importance either by making specific elements take up most of the time (or space) on the test, or by weighting the marks to reflect the importance of a particular element. In other words, we might give a writing task double the marks of an equivalent indirect test item to reflect our belief in the importance of direct test types (a sketch of this kind of weighting appears at the end of this section).

When we have decided what to include, we write the test. However, it is important that we do not just hand it straight over to the students to take. It will be much more sensible to show the test first to colleagues, who frequently notice things we had not thought of. If possible, it is a good idea to try the test out with students of roughly the same level as the ones it is designed for. This will show us if there are any items which are more difficult (or easier) than we thought, and it will highlight any items which are unclear - or which cause unnecessary problems.

Finally, once we have given the test and marked it, we should see if we need to make any changes to it if we are to use some or all of it again.

It is not always necessary to write our own tests, however. Many coursebooks now include test items or test generators which can be used instead of home-grown versions. However, such tests may not take account of the particular situation or learning experiences of our own classes.
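Weighting of this kind is simple to apply at marking time. A minimal Python sketch, assuming an invented test blueprint in which the direct writing task carries double the weight of each indirect element:

```python
# A minimal sketch of weighting marks on a progress test.
# The blueprint and the raw scores are invented for illustration.

# element: (raw maximum, weight) - the writing task counts double here.
blueprint = {
    "multiple choice": (20, 1),
    "cloze": (10, 1),
    "writing task": (10, 2),
}

def weighted_percentage(raw_scores):
    """Convert raw scores to a single weighted percentage."""
    got = sum(raw_scores[e] * w for e, (_, w) in blueprint.items())
    out_of = sum(m * w for m, w in blueprint.values())
    return 100 * got / out_of

print(weighted_percentage({"multiple choice": 15, "cloze": 8, "writing task": 7}))  # 74.0
```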


Conclusions

In this chapter we have:
• discussed the different reasons that students take tests ...