Understanding Assessments: What they Mean and What they Do

Dylan Wiliam (@dylanwiliam)

www.dylanwiliamcenter.com www.dylanwiliam.net

Initial assumptions

•  Any assessment system should be designed to assess the school’s curriculum, rather than having to design the curriculum to fit the school’s assessment system.

•  Since each school’s curriculum should be designed to meet local needs, there cannot be a one-size-fits-all assessment system; each school’s assessment system will be different.

•  There are, however, a number of principles that should govern the design of assessment systems, and

•  There is some science here: knowledge that people need in order to avoid doing things that are just wrong.

Assessment: A cautionary tale

            A    B    C    D    E    F    G    H  Total
Adams     100   30   47   72   40   75   30   47    441
Brown      90   38   43   60   20   65   48   70    434
Collins    61   36   40   45   41   55   62   80    420
Dorkin     63   32   51   90   30   70   47   35    418
Evans      56   55   41   82   45   40   49   41    409
Fuller     80   45   49   64   65   45   38   20    406
Grant      23   47   45   55   60   80   32   60    402
Howell     40   35   52   70   56   20   60   65    398
Iman       85   40   60   40   28   51   55   30    389
Jones      72   54   50   10   25   35   66   75    387
Keller     48   57   55   34   70   60   36   10    370
Lant       10   60   59   20   35   30   70   58    342
Mean       61   44   49   54   43   52   49   49

Equalizing the range for each subject

            A    B    C    D    E    F    G    H  Total
Adams     100    0   35   77   40   92    0   53    397
Brown      89   27   15   63    0   75   45   86    400
Collins    57   20    0   44   42   58   80  100    401
Dorkin     59    7   55  100   20   83   43   36    403
Evans      51   83    5   90   50   33   48   44    404
Fuller     78   50   45   68   90   42   20   14    407
Grant      14   57   25   56   80  100    5   71    408
Howell     33   17   60   75   72    0   75   79    411
Iman       83   34  100   38   16   52   62   29    414
Jones      69   80   50    0   10   25   90   93    417
Keller     42   90   75   30  100   67   15    0    419
Lant        0  100   95   12   30   17  100   69    423
Mean       56   47   47   54   46   54   49   56

And using class ranks in each subject…

            A    B    C    D    E    F    G    H  Total
Adams       1   12    8    3    7    2   12    7     52
Brown       2    8   10    6   12    4    7    3     52
Collins     7    9   12    8    6    6    3    1     52
Dorkin      6   11    5    1    9    3    8    9     52
Evans       8    3   11    2    5    9    6    8     52
Fuller      4    6    7    5    2    8    9   11     52
Grant      11    5    9    7    3    1   11    5     52
Howell     10   10    4    4    4   12    4    4     52
Iman        3    7    1    9   10    7    5   10     52
Jones       5    4    6   12   11   10    2    2     52
Keller      9    2    3   10    1    5   10   12     52
Lant       12    1    2   11    8   11    1    6     52
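The three ways of combining the same marks can be reproduced with a short sketch. The Python below is illustrative only: it assumes that "equalizing the range" means a linear rescaling of each subject so that its lowest score becomes 0 and its highest becomes 100, and that rank 1 goes to the highest score in a subject. The function names are hypothetical, and the rescaled totals are left unrounded, so they can differ from the slide's figures by a point or so.

```python
# Raw scores from the cautionary-tale table: 12 students, subjects A to H.
SCORES = {
    "Adams":   [100, 30, 47, 72, 40, 75, 30, 47],
    "Brown":   [90, 38, 43, 60, 20, 65, 48, 70],
    "Collins": [61, 36, 40, 45, 41, 55, 62, 80],
    "Dorkin":  [63, 32, 51, 90, 30, 70, 47, 35],
    "Evans":   [56, 55, 41, 82, 45, 40, 49, 41],
    "Fuller":  [80, 45, 49, 64, 65, 45, 38, 20],
    "Grant":   [23, 47, 45, 55, 60, 80, 32, 60],
    "Howell":  [40, 35, 52, 70, 56, 20, 60, 65],
    "Iman":    [85, 40, 60, 40, 28, 51, 55, 30],
    "Jones":   [72, 54, 50, 10, 25, 35, 66, 75],
    "Keller":  [48, 57, 55, 34, 70, 60, 36, 10],
    "Lant":    [10, 60, 59, 20, 35, 30, 70, 58],
}
N_SUBJECTS = 8


def subject_columns():
    """Return one list of scores per subject (column of the table)."""
    return [[row[j] for row in SCORES.values()] for j in range(N_SUBJECTS)]


def raw_totals():
    """Add up each student's raw marks."""
    return {name: sum(row) for name, row in SCORES.items()}


def rescaled_totals():
    """Rescale every subject to run from 0 to 100, then add up (assumed method)."""
    cols = subject_columns()
    return {
        name: sum(
            100 * (x - min(col)) / (max(col) - min(col))
            for x, col in zip(row, cols)
        )
        for name, row in SCORES.items()
    }


def rank_totals():
    """Add up each student's class rank per subject (1 = highest score)."""
    cols = subject_columns()
    return {
        name: sum(1 + sum(other > x for other in col) for x, col in zip(row, cols))
        for name, row in SCORES.items()
    }


raw = raw_totals()
rescaled = rescaled_totals()
ranks = rank_totals()
for name in SCORES:
    print(f"{name:8s} raw={raw[name]:4d} rescaled={rescaled[name]:6.1f} ranks={ranks[name]}")
```

On the raw totals Adams comes first and Lant last; after rescaling the order is reversed; and on summed ranks every student ties. The data have not changed, only the way they are combined.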

Before we can assess…

•  The ‘backward design’ of an education system
  –  Where do we want our students to get to?
      •  ‘Big ideas’
  –  What are the ways they can get there?
      •  Learning progressions
  –  When should we check on/report progress?
      •  Inherent and useful checkpoints

Big ideas

•  A “big idea”
  –  helps make sense of apparently unrelated phenomena
  –  is generative, in that it can be applied in new areas

Big ideas in reading

•  Writing is an attempt to communicate meaning
•  Making sense of text often requires making connections between sentences
•  Writers often choose words for the effect they have on the listener/reader
•  The hero’s journey (Campbell, 1949)
•  …

Learning progressions

What is it that gets better when someone gets better at reading?

The “seductive allure” of neuroscience

Cortical language localization

•  117 individuals (aged 4 to 80) undergoing frontal or frontotemporoparietal craniotomies as a treatment for epilepsy

•  Subjects were shown line drawings of familiar objects and asked to name what they had seen while exposed regions of the cerebral cortex were stimulated with electric current

•  Naming errors were taken as indicating that the region in question was essential to language

Ojemann, Ojemann, Lettich, and Berger (1989)

[Figure: two maps of zones of the cerebral cortex. The first shows the number of patients with a site in each zone (out of 117); the second shows the percentage of patients with a site in each zone with significant naming errors in that zone.]

“All models are wrong; some are useful”

“Since all models are wrong the scientist cannot obtain a ‘correct’ one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity.” (Box, 1976 p. 792)

Learning progressions

•  What gets better when students get better at reading?
  –  Phonemic awareness
  –  Phonics
  –  Fluency
  –  Vocabulary
  –  Text comprehension

National Reading Panel (2001)

The “simple” view of reading

[Figure labels: background knowledge, vocabulary, language structures, verbal reasoning, literacy knowledge, sight recognition, decoding, phonological awareness]

Scarborough (2001)

Expanded model of reading (Willingham, 2017)

[Figure labels: letters, translation rules, word sounds, syntactic rules, idea web, spellings, situation model, word meanings, sentence representation]

Copy this

ЖӘШІК

Reading skills: what are they really?

“A manifold, contained in an intuition which I call mine, is represented, by means of the synthesis of the understanding, as belonging to the necessary unity of self-consciousness; and this is effected by means of the category.”

What is the main idea of this passage?

A.  Without a manifold, one cannot call an intuition ‘mine.’
B.  Intuition must precede understanding.
C.  Intuition must occur through a category.
D.  Self-consciousness is necessary to understanding

Hirsch (2006)

Lost in translation?

•  “Comprehension depends on constructing a mental model that makes the elements fall into place and, equally important, enables the listener or reader to supply essential information that is not explicitly stated. In language use, there is always a great deal that is left unsaid and must be inferred. This means that communication depends on both sides, writer and reader, sharing a basis of unspoken knowledge. This large dimension of tacit knowledge is precisely what is not being taught adequately in our schools.”

Hirsch (2009 loc. 176)

Domain knowledge and memory

•  3rd (N=64), 5th (N=67) and 7th (N=54) grade students from Heidelberg, Germany, tested on reading expertise and soccer knowledge
  –  13-item questionnaire on soccer knowledge
  –  standardized reading comprehension test

•  Students heard (twice) and read a well-structured readable story on a young player’s experiences in a soccer game

•  Tested 15 minutes later with a cloze version of the test with 20 blanks

Schneider, Körkel, and Weinert (1989)

                               Reading ability
                               High     Low
Knowledge of soccer   High     16.4     17.0
                      Low      11.1     11.0

Assessment

Written examinations

“They have perverted the best efforts of teachers, and narrowed and grooved their instruction; they have occasioned and made well nigh imperative the use of mechanical and rote methods of teaching; they have occasioned cramming and the most vicious habits of study; they have caused much of the overpressure charged upon schools, some of which is real; they have tempted both teachers and pupils to dishonesty; and last but not least, they have permitted a mechanical method of school supervision.”

(White, 1888 pp. 517-518)

Campbell’s law

“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” (Campbell, 1976 p. 49)
  –  All performance indicators lose their meaning when adopted as policy targets
  –  The clearer you are about what you want, the more likely you are to get it, but the less likely it is to mean anything

The “Lake Wobegon” effect

[Figure: grade equivalents (vertical axis, 3.4 to 4.4) by year (1986 to 1990), with the series labelled Test C, Test B, Test C]

Koretz, Linn, Dunbar and Shepard (1991)

Effects of narrow assessment

•  Incentives to teach to the test
  –  Focus on some subjects at the expense of others
  –  Focus on some aspects of a subject at the expense of others
  –  Focus on some students at the expense of others (“bubble” students)

•  Consequences
  –  Learning that is
      •  Narrow
      •  Shallow
      •  Transient

Getting assessment right

What is an assessment?

•  An assessment is a procedure for making inferences
  –  We give students things to do
  –  We collect the evidence
  –  We draw conclusions

•  Key question: “Once you know the assessment outcome, what do you know?”

•  For any test:
  –  some inferences are warranted (valid)
  –  some are not

Validity

•  Evolution of the idea of validity
  –  A property of a test
  –  A property of students’ scores on a test
  –  A property of inferences drawn on the basis of test results

•  “One validates not a test but an interpretation of data arising from a specified procedure” (Cronbach, 1971)

•  Consequences
  –  No such thing as a valid (or indeed invalid) assessment
  –  No such thing as a biased assessment
  –  Formative and summative are descriptions of inferences

Meanings and consequences of assessment

•  Evidential basis
  –  What does the assessment result mean?

•  Consequential basis
  –  What does the assessment result do?

•  Assessment literacy (Stiggins, 1991)
  –  Do you know what this assessment result means?
  –  Does it have utility for its intended use?
  –  What message does this assessment send to students (and other stakeholders) about the achievement outcomes we value?
  –  What is likely to be the effect of this assessment on students?

Validity revisited

“Validity is an integrative evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment.” (Messick, 1989 p. 13)

•  Social consequences:
  –  “Right concern, wrong concept” (Popham, 1997)

Quality in assessment

•  Threats to validity
  –  Construct-irrelevant variance
      •  Systematic: good performance on the assessment requires abilities not related to the construct of interest
      •  Random: good performance is related to chance factors, such as luck (effectively poor reliability)
  –  Construct under-representation
      •  Good performance on the assessment can be achieved without demonstrating all aspects of the construct of interest

Discussion

•  Working as a group, try to frame one validity issue as an issue of construct-irrelevant variance or of construct under-representation.

Understanding reliability

Understanding test scores

•  Consider a test of students’ ability to spell words drawn from a bank of 1000 words.

•  What we can conclude depends on:
  –  The size of the sample
  –  The way the sample was drawn
  –  Students’ knowledge of the sample
  –  The amount of notice given

Samples and reliability

•  Suppose we ask a student to spell 20 of the words drawn at random, at five different times of the day, with the following results
  –  15 17 14 15 14
  –  On average, the student scores 15 out of 20
  –  Our best guess is the student can spell 750 of the 1000 words

•  If the results were:
  –  20 12 17 10 16
  –  Our best guess is still that the student knows 750 of the 1000 spellings
  –  But now we are much less certain about this

Some examples

Example 1
Actual score                    15   17   14   15   14
Difference from average          0   +2   -1    0   -1
Average error                    0 (by definition!)
Standard deviation of errors     1.2

Example 2
Actual score                    20   12   17   10   16
Difference from average         +5   -3   +2   -5   +1
Average error                    0 (by definition!)
Standard deviation of errors     4.0
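The arithmetic behind the two examples can be checked with a few lines of Python. This is a minimal sketch rather than anything taken from the slides: `summarize` is a hypothetical helper, and it uses the sample standard deviation (divisor n − 1), which is the version that reproduces the quoted values of 1.2 and 4.0.

```python
from statistics import mean, stdev

WORD_BANK = 1000      # words in the bank
WORDS_PER_TEST = 20   # words sampled on each occasion


def summarize(scores):
    """Summarize repeated 20-word spelling tests for one student."""
    avg = mean(scores)
    return {
        "average": avg,
        # Scale the average up from the 20-word sample to the whole bank.
        "estimated_words_known": WORD_BANK * avg / WORDS_PER_TEST,
        "differences": [s - avg for s in scores],
        # Sample standard deviation of the scores around their average.
        "sd_of_errors": stdev(scores),
    }


example_1 = summarize([15, 17, 14, 15, 14])
example_2 = summarize([20, 12, 17, 10, 16])
print(example_1)
print(example_2)
```

Both students get the same best guess of 750 words, but the spread of the second set of scores is more than three times that of the first, which is why the estimate is much less certain.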

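Quantifying reliability

•  The “standard error of measurement” or “SEM” is just the standard deviation of the errors averaged over all test takers

•  The reliability of the test is:

\[ \text{reliability} = 1 - \frac{\text{SEM}^2}{\text{SD}^2} \]

   where SD is the standard deviation of the observed scores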

Relationship of reliability and error

•  For a test with an average score of 50, and a standard deviation of 15 (so that most scores range from 20 to 80), errors of measurement are as follows:

Reliability   Standard error of measurement
0.70          8.2
0.75          7.5
0.80          6.7
0.85          5.8
0.90          4.7
0.95          3.4
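The figures in this table are consistent with the standard classical test theory relation SEM = SD × √(1 − reliability). The sketch below assumes that relation, since the slides do not spell it out at this point, and uses a hypothetical function name.

```python
from math import sqrt


def standard_error_of_measurement(reliability, sd):
    """SEM under the classical relation SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


SD = 15  # standard deviation of observed scores, as in the example above
sem_table = {
    r: round(standard_error_of_measurement(r, SD), 1)
    for r in (0.70, 0.75, 0.80, 0.85, 0.90, 0.95)
}
for reliability, sem in sem_table.items():
    print(f"{reliability:.2f}  {sem}")
```

Because the SEM falls with the square root of (1 − reliability), raising reliability from 0.70 to 0.90 does not even halve the error.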

What does this mean?

•  Consider a class of 25 students taking a reading test
  –  with a reliability of 0.85
  –  an average score of 50
  –  a standard deviation of 15 (most scores range from 20 to 80)

•  Then
  –  17 students get a score within 6 points of their true score
  –  7 students get a score that is more than 6 points, but less than 12 points from their true score
  –  and one student gets a score that differs from their true score by more than 12 points

•  Unfortunately…
  –  you won’t know which student
  –  and you won’t know if their score was higher or lower than it should have been
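One way to arrive at counts like 17, 7 and 1 is to treat measurement errors as normally distributed with a standard deviation equal to the SEM. That normality assumption, and the function name below, are not from the slides; the sketch only shows that the quoted numbers follow from them.

```python
from math import sqrt
from statistics import NormalDist


def expected_counts(n_students, reliability, sd, band):
    """Expected numbers of students within `band` points of their true score,
    between `band` and 2 * `band` points away, and more than 2 * `band` away.

    Assumes normally distributed errors with SD equal to the SEM.
    """
    sem = sd * sqrt(1 - reliability)
    errors = NormalDist(mu=0, sigma=sem)
    within_one = errors.cdf(band) - errors.cdf(-band)
    within_two = errors.cdf(2 * band) - errors.cdf(-2 * band)
    return (
        n_students * within_one,
        n_students * (within_two - within_one),
        n_students * (1 - within_two),
    )


counts = expected_counts(n_students=25, reliability=0.85, sd=15, band=6)
print([round(c) for c in counts])
```

These are expected values for a class of 25; any particular class will vary around them.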

[Figures: five scatter plots of observed score against true score (both axes from 0 to 100), for reliabilities of 0.75, 0.80, 0.85, 0.90 and 0.95]

Understanding what this means in practice

Grouping students by ability

Using tests for grouping students by ability

Using a test with a reliability of 0.9, and with a predictive validity of 0.7, to group 100 students into four ability groups:

                              should be in
                      group 1  group 2  group 3  group 4
students    group 1      23        9        3        0
placed in   group 2       9       12        6        3
            group 3       3        6        7        4
            group 4       0        3        4        8

Only 50% of the students are in the “right” group
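The 50% figure is simply the diagonal of the placement table. The sketch below reads the two empty cells in the table as zeros, an assumption supported by the fact that the counts then add up to 100 students, and also counts how far the misplaced students are from the group they should be in.

```python
# Rows: group a student was placed in; columns: group the student should be in.
# The two blank cells in the table are treated as 0 (assumption).
PLACEMENTS = [
    [23, 9, 3, 0],
    [9, 12, 6, 3],
    [3, 6, 7, 4],
    [0, 3, 4, 8],
]


def misplacement_counts(table):
    """Count students by how many groups away from the right group they are."""
    by_distance = {}
    for placed, row in enumerate(table):
        for should_be, count in enumerate(row):
            distance = abs(placed - should_be)
            by_distance[distance] = by_distance.get(distance, 0) + count
    return by_distance


total_students = sum(sum(row) for row in PLACEMENTS)
by_distance = misplacement_counts(PLACEMENTS)
print(total_students, by_distance)
```

On this reading, half the students are in the right group, 38 are one group out, and 12 are two groups out.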

Diagnostic testing

The limits of diagnostic testing

•  120-item multiple choice test for teacher licensure
  –  Four major subject areas
      •  language arts/reading
      •  mathematics
      •  social studies
      •  science
  –  30 items per subject area
  –  Sub-score reliabilities range from 0.71 to 0.83

How reliable are 10-item subtest scores?

•  Items for each subject area ranked in order of difficulty (i.e., 1 to 30)

•  Three parallel 10-item forms created in each subject area:
  –  Form A: items 1, 4, 7, … 28
  –  Form B: items 2, 5, 8, … 29
  –  Form C: items 3, 6, 9, … 30

•  Sub-score reliabilities in the range 0.40 to 0.60
•  On form A, 271 examinees scored 7 in mathematics and 3 in science
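The way the three parallel forms are built, by dealing the difficulty-ranked items out in turn, can be sketched as follows. The function name and its defaults are hypothetical; only the item numbering comes from the slide.

```python
def parallel_forms(n_items=30, n_forms=3):
    """Deal items ranked 1..n_items by difficulty into n_forms forms in turn,
    so that each form spans the full range of difficulty."""
    return {
        chr(ord("A") + k): list(range(k + 1, n_items + 1, n_forms))
        for k in range(n_forms)
    }


forms = parallel_forms()
for name, items in forms.items():
    print(name, items)
```

Each form gets every third item, so the three 10-item forms should be of roughly equal difficulty.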

Scores of 271 students on form B

                          Science subscore
Math subscore     1   2   3   4   5   6   7   8   9  10
      1           0   0   0   1   1   1   0   0   0   0
      2           0   0   0   1   3   1   2   0   0   0
      3           1   0   0   1   2   4   3   1   1   1
      4           0   0   2   7   7   6   4   0   1   0
      5           0   1   1   1  10  14   8   5   1   1
      6           2   0   1   5  10  11  15   8   1   1
      7           0   1   4   4   4  11  10   7   4   0
      8           0   1   1   5  12  13   7   5   4   0
      9           0   0   1   1   6   3   7   4   3   0
     10           0   0   0   1   1   2   1   1   0   0

Sinharay, Gautam and Halberman (2010)

110 out of 271 (41%) examinees got a better form B score in science than mathematics
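The count of 110 can be recovered directly from the cross-tabulation by adding up the cells where the science subscore exceeds the mathematics subscore. The sketch below assumes the layout read from the table (rows are mathematics subscores 1 to 10, columns are science subscores 1 to 10); the variable names are illustrative.

```python
# Rows: mathematics subscore 1..10; columns: science subscore 1..10.
FORM_B_COUNTS = [
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 3, 1, 2, 0, 0, 0],
    [1, 0, 0, 1, 2, 4, 3, 1, 1, 1],
    [0, 0, 2, 7, 7, 6, 4, 0, 1, 0],
    [0, 1, 1, 1, 10, 14, 8, 5, 1, 1],
    [2, 0, 1, 5, 10, 11, 15, 8, 1, 1],
    [0, 1, 4, 4, 4, 11, 10, 7, 4, 0],
    [0, 1, 1, 5, 12, 13, 7, 5, 4, 0],
    [0, 0, 1, 1, 6, 3, 7, 4, 3, 0],
    [0, 0, 0, 1, 1, 2, 1, 1, 0, 0],
]

total = 0
better_in_science = 0
same_in_both = 0
for math_score, row in enumerate(FORM_B_COUNTS, start=1):
    for science_score, count in enumerate(row, start=1):
        total += count
        if science_score > math_score:
            better_in_science += count
        elif science_score == math_score:
            same_in_both += count

better_in_math = total - better_in_science - same_in_both
print(total, better_in_science, same_in_both, better_in_math)
print(f"{100 * better_in_science / total:.0f}% did better in science")
```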

What does this mean?

•  A student scoring 7 on mathematics and 3 on science would probably want to improve the latter
•  But 110 of the 271 examinees got a better score in science than mathematics on Form B
•  Correlation of science subscores on Forms A and B is 0.48
•  Correlation of science subscore on Form A with total score on Form B is 0.63
•  In other words, the total score on the total test is a better guide to the score on a sub-test than another score on the same sub-test

Measuring progress

Reliability, standard errors, and progress

Grade     Reliability   SEM as a percentage of annual progress
1         0.89          26%
2         0.85          56%
3         0.82          76%
4         0.83          39%
5         0.83          55%
6         0.89          46%
Average   0.85          49%

In other words, the standard error of measurement of this reading test is equal to six months’ progress by a typical student

In other words…

•  In a class of 25 students, if they have all made exactly the expected progress, and they are tested with a typical reading test every six months:
  –  Four will appear to have made no progress or gone backwards
  –  Four will appear to have made at least twice as much progress as expected
  –  And again, you won’t know which students are which…
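The two counts of four can be reproduced under a simple model, which is an assumption rather than something stated on the slide: every student's true six-month progress equals one SEM, and the measured progress is that true progress plus a normally distributed error with a standard deviation of one SEM. The function name is hypothetical.

```python
from statistics import NormalDist


def apparent_progress_counts(n_students=25, true_progress_in_sems=1.0):
    """Expected numbers of students who appear to make no progress (or go
    backwards) and who appear to make at least double the expected progress.

    Assumes measured progress = true progress + normal error with SD of 1 SEM.
    """
    measured = NormalDist(mu=true_progress_in_sems, sigma=1.0)
    none_or_backwards = n_students * measured.cdf(0.0)
    at_least_double = n_students * (1 - measured.cdf(2 * true_progress_in_sems))
    return none_or_backwards, at_least_double


none_or_backwards, at_least_double = apparent_progress_counts()
print(round(none_or_backwards), round(at_least_double))
```

If the errors on both the earlier and the later test were counted separately, the spread of measured progress would be larger still, so these counts are, if anything, on the low side.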

True and observed growth scores

Pre-test average: 50
Post-test average: 60
Pre-test SD: 15
Change SD: 2
Test reliability: 0.85
Progress reliability: 0.04

[Figure: scatter plot of observed progress against true progress]

Fortunately…

•  While progress measures for individuals are rather unreliable, progress measures for groups are much more reliable.

•  As rules of thumb:
  –  For individual students, progress measures are meaningful only if the progress is more than twice the standard error of measurement of the test being used to measure progress
  –  For a class of 25 students, progress measures are meaningful if the progress is more than half the standard error of measurement of the test being used to measure progress

Curriculum-based measurement

•  Used for a variety of purposes including screening, benchmarking, and progress monitoring

•  Avoids the problems of measuring change scores because it focuses on multiple assessments of status.

•  Depends on a clear view of what will be learned by the end of the instructional sequence.

•  However, it is not a panacea

Review of studies of CBM-R

“Such studies suggest that even under the best conditions (i.e., high-quality probe sets and tightly controlled conditions), (a) a minimum of 5 or 6 weeks of data with multiple data points collected per week are needed to inform routine instructional decisions and (b) a minimum of 12 weeks of data with multiple data points collected per week are needed to make special education eligibility decisions” (p. 12)

Ardoin, Christ, Morena, Cormier, and Klingbeil (2013)

“…at this point, there are no studies to suggest that an individual student's progress can be accurately determined using CBM-R progress monitoring data” (p. 14)

“Furthermore, trainers and publishers of CBM-R materials should neither suggest to school teachers and other educators that CBM-R progress monitoring data can be used as a primary outcome measure to evaluate individual student growth over short periods of time nor train them in current CBM-R decision rules.” (pp. 14-15)

Ardoin, Christ, Morena, Cormier, and Klingbeil (2013)

Discussion question

•  From what you have heard so far, what are the key challenges regarding the design of reading assessment for your school/district?

Evidence-centered design

•  Conceptual assessment framework
  –  Student model: what are we assessing?
      •  “Degree of difficulty” model
      •  “Marks for style” model
      •  “Support” model
  –  Evidence model: what evidence do we want?
  –  Task model: where will the evidence come from?
  –  Four-process architecture
      •  Task selection
      •  Task presentation
      •  Evidence identification
      •  Evidence accumulation

Mislevy, Almond and Lukas (2003); Almond, Steinberg and Mislevy (2003)

Task selection

Kinther Layticks

Skondo has often been described as one of the fantem growing plaidos in the UK during the last 10 years, but the lure of chemicks about in tabsel has continued to attract the attention of moorick numbers of Britons.

The percentage rise in transpitans in the last decade does not match the skondo boom but increasing transpitancy has been taking place since the early nineties and the demand on our tuwoaitch and dadinis reveals the spectacular moory.

Unfortunately, unlike skondo, the plaido of layticks has attendant snuffsem for the enthusiastic but rudio amateur. All too few of the satsun laybos who take to the tuwoah have even the most rudimentary knowledge of loxem in tabsel.

1.  Name two popular plaidos.
2.  Have there been many deaths from Skondo?
3.  Which country has a lot of kinther layticks?
4.  Write down two precautions to take for layticks
5.  What is snuffsem about skondo?
6.  What would you find in dadinis?

Task presentation

Item formats

•  “No assessment technique has been rubbished quite like multiple choice, unless it be graphology” (Wood, 1991, p. 32)

•  Myths about multiple-choice items
  –  They are biased against females
  –  They assess only candidates’ ability to spot or guess
  –  They test only lower-order skills

Diagnostic questions in English

In a piece of persuasive writing, which of these would be the best thesis statement?

A.  The typical TV show has 9 violent incidents
B.  There is a lot of violence on TV
C.  The amount of violence on TV should be reduced
D.  Some programs are more violent than others
E.  Violence is included in programs to boost ratings
F.  Violence on TV is interesting
G.  I don’t like the violence on TV
H.  The essay I am going to write is about violence on TV

Evidence identification

Referents in assessment

•  Norm-referenced
  –  a group who were assessed previously

•  Cohort-referenced
  –  the group assessed at the same time

•  Criterion-referenced
  –  explicit and precise performance criteria

•  Ipsative
  –  defined only within an individual

•  Construct-referenced
  –  a shared construct in a community of practice

Quality

“Maxims cannot be understood, still less applied by anyone not already possessing a good practical knowledge of the art. They derive their interest from our appreciation of the art and cannot themselves either replace or establish that appreciation”. (Polanyi, 1958 p. 50).

“Quality doesn’t have to be defined. You understand it without definition. Quality is a direct experience independent of and prior to intellectual abstractions”. (Pirsig, 1991 p. 64).

Evidence accumulation

Memory on land and underwater

•  18 (5f, 13m) student members of a university diving club were tested on their recall of two- and three-syllable words from four 36-word lists taken from the Toronto Word Bank spoken to them twice.

•  Students learned, and were tested on, the words while underwater, and while on the shore, resulting in four conditions:
  –  DD (learn dry, recall dry)
  –  DW (learn dry, recall wet)
  –  WD (learn wet, recall dry)
  –  WW (learn wet, recall wet)

Memory is context-dependent

                              Recall environment
                              Dry      Wet
Learning environment   Dry    13.5     8.6
                       Wet    8.4      11.4

No significant main effects; interaction effect: F = 22.0; df = 1, 12; p < 0.001

Godden and Baddeley (1975)

Alcohol and memory

                                            Number of items correct
                                            Day 1    Day 2
Day 1: sober; day 2: sober                  17       17
Day 1: sober; day 2: intoxicated            17       11
Day 1: intoxicated; day 2: sober            18       13
Day 1: intoxicated; day 2: intoxicated      16       16

•  32 adults (aged 22 to 43) asked to memorize a map and a 19-item set of instructions for a journey
•  Half did so sober and half at the legal limit for intoxication
•  The following day, half of them were tested sober and half at the legal limit for intoxication.

Lowe (1981)

Discussion question

•  How will you decide how much evidence is needed to decide whether a student has learned something?

Recording

Sylvie and Bruno Concluded (Carroll, 1893)

“That’s another thing we’ve learned from your Nation,” said Mein Herr, “map-making. But we’ve carried it much further than you. What do you consider the largest map that would be really useful?”

“About six inches to the mile.”

“Only six inches!” exclaimed Mein Herr. “We very soon got to six yards to the mile. Then we tried a hundred yards to the mile. And then came the grandest idea of all! We actually made a map of the country, on the scale of a mile to the mile!”

“Have you used it much?” I enquired.

“It has never been spread out, yet,” said Mein Herr: “the farmers objected: they said it would cover the whole country, and shut out the sunlight! So we now use the country itself, as its own map, and I assure you it does nearly as well.”

What is a grade?

“…an inadequate report of an inaccurate judgment by a biased and variable judge of the extent to which a student has attained an undefined level of mastery of an unknown proportion of an indefinite material.” (Dressel, quoted in Chickering, 1983 p. 12)

Reporting

Effects of feedback

•  Kluger & DeNisi (1996)
•  Review of 3000 research reports
•  Excluding those:
  –  without adequate controls
  –  with poor design
  –  with fewer than 10 participants
  –  where performance was not measured
  –  without details of effect sizes
•  left 131 reports, 607 effect sizes, involving 12652 individuals
•  On average feedback does improve performance, but
  –  Effect sizes very different in different studies
  –  In 38% (50 out of 131) of studies, effect sizes were negative

Getting feedback right is hard

Response type      Feedback indicates performance…
                   exceeds goal               falls short of goal
Change behavior    Exert less effort          Increase effort
Change goal        Increase aspiration        Reduce aspiration
Abandon goal       Decide goal is too easy    Decide goal is too hard
Reject feedback    Feedback is ignored        Feedback is ignored

Meanings and consequences of school grades

•  Two rationales for grading
  –  Meanings
      •  Assessment as evidentiary reasoning
      •  Assessment outcomes as supports for making inferences (e.g., about student achievement)
  –  Consequences
      •  Assessment outcomes as rewards and punishments
      •  Assessments create incentives for students to do what we want them to do
  –  These two rationales interact, and conflict
      •  achievement grades for completion of homework
      •  achievement grades for effort
      •  penalties for late submission
      •  zeroes for missing work

Dual-pathway theory (Boekaerts, 2006)

•  Long-term learning goals are translated into short-term learning intentions

•  Dynamic comparisons of task and situational demands with personal resources, taking into account:
  –  Current perceptions of the task
  –  Beliefs about the subject or task
  –  Beliefs about “ability” and the role of effort in the subject
  –  Interest in the subject (personal vs. situational)
  –  Previous experiences on similar tasks
  –  Costs and benefits

And then it comes down to…

•  Resulting activation of energy along one of two pathways
  –  Wellbeing
  –  Growth

•  We need assessment systems that push our students towards a focus on growth, rather than wellbeing

Summary

•  Before we can assess, we need clear models of progression

•  Validity is not a property of tests or assessments, but of inferences, which are weakened by
  –  construct under-representation
  –  construct-irrelevant variance

•  Reliability is a key requirement for validity
•  Limited test reliability has particularly severe consequences for change scores and diagnosis

•  Assessments are important for what they do as well as what they mean