Dylan Wiliam (@dylanwiliam)
Understanding Assessments: What they Mean and What they Do
www.dylanwiliamcenter.com www.dylanwiliam.net
Initial assumptions
• Any assessment system should be designed to assess the school’s curriculum rather than having to design the curriculum to fit the school’s assessment system.
• Since each school’s curriculum should be designed to meet local needs, there cannot be a one-size-fits-all assessment system; each school’s assessment system will be different.
• There are, however, a number of principles that should govern the design of assessment systems, and
• There is some science here: knowledge that people need in order to avoid doing things that are just wrong.
Assessment: A cautionary tale
           A    B    C    D    E    F    G    H   Total
Adams    100   30   47   72   40   75   30   47    441
Brown     90   38   43   60   20   65   48   70    434
Collins   61   36   40   45   41   55   62   80    420
Dorkin    63   32   51   90   30   70   47   35    418
Evans     56   55   41   82   45   40   49   41    409
Fuller    80   45   49   64   65   45   38   20    406
Grant     23   47   45   55   60   80   32   60    402
Howell    40   35   52   70   56   20   60   65    398
Iman      85   40   60   40   28   51   55   30    389
Jones     72   54   50   10   25   35   66   75    387
Keller    48   57   55   34   70   60   36   10    370
Lant      10   60   59   20   35   30   70   58    342
Mean      61   44   49   54   43   52   49   49
Equalizing the range for each subject
           A    B    C    D    E    F    G    H   Total
Adams    100    0   35   77   40   92    0   53    397
Brown     89   27   15   63    0   75   45   86    400
Collins   57   20    0   44   42   58   80  100    401
Dorkin    59    7   55  100   20   83   43   36    403
Evans     51   83    5   90   50   33   48   44    404
Fuller    78   50   45   68   90   42   20   14    407
Grant     14   57   25   56   80  100    5   71    408
Howell    33   17   60   75   72    0   75   79    411
Iman      83   34  100   38   16   52   62   29    414
Jones     69   80   50    0   10   25   90   93    417
Keller    42   90   75   30  100   67   15    0    419
Lant       0  100   95   12   30   17  100   69    423
Mean      56   47   47   54   46   54   49   56
And using class ranks in each subject…
           A    B    C    D    E    F    G    H   Total
Adams      1   12    8    3    7    2   12    7     52
Brown      2    8   10    6   12    4    7    3     52
Collins    7    9   12    8    6    6    3    1     52
Dorkin     6   11    5    1    9    3    8    9     52
Evans      8    3   11    2    5    9    6    8     52
Fuller     4    6    7    5    2    8    9   11     52
Grant     11    5    9    7    3    1   11    5     52
Howell    10   10    4    4    4   12    4    4     52
Iman       3    7    1    9   10    7    5   10     52
Jones      5    4    6   12   11   10    2    2     52
Keller     9    2    3   10    1    5   10   12     52
Lant      12    1    2   11    8   11    1    6     52
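The three tables above can be reproduced from the raw marks alone. The sketch below is one possible reading of the slides, not the author's own calculation: it assumes that "equalizing the range" means rescaling each subject linearly so that its lowest mark becomes 0 and its highest 100, and that rank 1 goes to the highest mark. The function names are made up for illustration.

```python
# Raw marks from the "cautionary tale" table: one row per student, subjects A to H.
SCORES = {
    "Adams":   [100, 30, 47, 72, 40, 75, 30, 47],
    "Brown":   [90, 38, 43, 60, 20, 65, 48, 70],
    "Collins": [61, 36, 40, 45, 41, 55, 62, 80],
    "Dorkin":  [63, 32, 51, 90, 30, 70, 47, 35],
    "Evans":   [56, 55, 41, 82, 45, 40, 49, 41],
    "Fuller":  [80, 45, 49, 64, 65, 45, 38, 20],
    "Grant":   [23, 47, 45, 55, 60, 80, 32, 60],
    "Howell":  [40, 35, 52, 70, 56, 20, 60, 65],
    "Iman":    [85, 40, 60, 40, 28, 51, 55, 30],
    "Jones":   [72, 54, 50, 10, 25, 35, 66, 75],
    "Keller":  [48, 57, 55, 34, 70, 60, 36, 10],
    "Lant":    [10, 60, 59, 20, 35, 30, 70, 58],
}
N_SUBJECTS = 8


def raw_totals():
    """Sum the unadjusted marks for each student."""
    return {name: sum(marks) for name, marks in SCORES.items()}


def range_equalized_totals():
    """Assumed method: rescale each subject to run from 0 (lowest) to 100 (highest)."""
    totals = {name: 0.0 for name in SCORES}
    for j in range(N_SUBJECTS):
        column = [marks[j] for marks in SCORES.values()]
        low, high = min(column), max(column)
        for name, marks in SCORES.items():
            totals[name] += 100 * (marks[j] - low) / (high - low)
    return totals


def rank_totals():
    """Sum each student's class rank per subject (rank 1 = highest mark)."""
    totals = {name: 0 for name in SCORES}
    for j in range(N_SUBJECTS):
        # No subject has tied marks, so a plain sort gives unambiguous ranks.
        ordered = sorted(SCORES, key=lambda name: SCORES[name][j], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return totals


if __name__ == "__main__":
    raw, equalized, ranks = raw_totals(), range_equalized_totals(), rank_totals()
    for name in SCORES:
        print(f"{name:8s} raw={raw[name]:4d} equalized={equalized[name]:6.1f} ranks={ranks[name]}")
```

The same twelve students come out in a different order under each aggregation rule, and are indistinguishable under rank totals, which is the point of the cautionary tale.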
Before we can assess…
• The ‘backward design’ of an education system
  – Where do we want our students to get to?
    • ‘Big ideas’
  – What are the ways they can get there?
    • Learning progressions
  – When should we check on/report progress?
    • Inherent and useful checkpoints
Big ideas
• A “big idea”
  – helps make sense of apparently unrelated phenomena
  – is generative in that it can be applied in new areas
Big ideas in reading
• Writing is an attempt to communicate meaning
• Making sense of text often requires making connections between sentences
• Writers often choose words for the effect they have on the listener/reader
• The hero’s journey (Campbell, 1949)
• …
Learning progressions
What is it that gets better when someone gets better at reading?
The “seductive allure” of neuroscience
Cortical language localization
• 117 individuals (aged 4 to 80) undergoing frontal or frontotemporoparietal craniotomies as a treatment for epilepsy
• Subjects were shown line drawings of familiar objects and asked to name what they had seen while exposed regions of the cerebral cortex were stimulated with electric current
• Naming errors were taken as indicating that the region in question was essential to language
Ojemann, Ojemann, Lettich, and Berger (1989)
[Figure: cortical map. Number of patients with a site in each zone (out of 117)]
Ojemann, Ojemann, Lettich, and Berger (1989)
[Figure: cortical map. Percentage of patients with a site in each zone with significant naming errors in that zone]
Ojemann, Ojemann, Lettich, and Berger (1989)
“All models are wrong; some are useful”
“Since all models are wrong the scientist cannot obtain a ‘correct’ one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity.” (Box, 1976 p. 792)
Learning progressions
• What gets better when students get better at reading?
  – Phonemic awareness
  – Phonics
  – Fluency
  – Vocabulary
  – Text comprehension
National Reading Panel (2001)
The “simple” view of reading
Scarborough (2001)
[Figure labels: Background knowledge; Vocabulary; Language structures; Verbal reasoning; Literacy knowledge; Sight recognition; Decoding; Phonological awareness]
Expanded model of reading (Willingham, 2017)
[Figure labels: Letters; Translation rules; Word sounds; Syntactic rules; Idea web; Spellings; Situation model; Word meanings; Sentence representation]
Copy this
ЖӘШІК
Reading skills: what are they really?
“A manifold, contained in an intuition which I call mine, is represented, by means of the synthesis of the understanding, as belonging to the necessary unity of self-consciousness; and this is effected by means of the category.”
What is the main idea of this passage?
A. Without a manifold, one cannot call an intuition ‘mine.’
B. Intuition must precede understanding.
C. Intuition must occur through a category.
D. Self-consciousness is necessary to understanding
Hirsch (2006)
Lost in translation?
• “Comprehension depends on constructing a mental model that makes the elements fall into place and, equally important, enables the listener or reader to supply essential information that is not explicitly stated. In language use, there is always a great deal that is left unsaid and must be inferred. This means that communication depends on both sides, writer and reader, sharing a basis of unspoken knowledge. This large dimension of tacit knowledge is precisely what is not being taught adequately in our schools.”
Hirsch (2009 loc. 176)
Domain knowledge and memory
• 3rd (N=64), 5th (N=67) and 7th (N=54) grade students from Heidelberg, Germany, tested on reading expertise and soccer knowledge
  – 13-item questionnaire on soccer knowledge
  – standardized reading comprehension test
• Students heard (twice) and read a well-structured readable story on a young player’s experiences in a soccer game
• Tested 15 minutes later with a cloze version of the test with 20 blanks
Schneider, Körkel, and Weinert (1989)
[Figure: results by knowledge of soccer (high/low) and reading ability (high/low); values shown: 16.4, 17.0, 11.1, 11.0]
Assessment
Written examinations
“They have perverted the best efforts of teachers, and narrowed and grooved their instruction; they have occasioned and made well nigh imperative the use of mechanical and rote methods of teaching; they have occasioned cramming and the most vicious habits of study; they have caused much of the overpressure charged upon schools, some of which is real; they have tempted both teachers and pupils to dishonesty; and last but not least, they have permitted a mechanical method of school supervision.”
(White, 1888 pp. 517-518)
Campbell’s law
“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” (Campbell, 1976 p. 49)
  – All performance indicators lose their meaning when adopted as policy targets
  – The clearer you are about what you want, the more likely you are to get it, but the less likely it is to mean anything
The “Lake Wobegon” effect
[Figure: grade equivalents (axis from 3.4 to 4.4) by year, 1986 to 1990; series labelled Test C, Test B, Test C]
Koretz, Linn, Dunbar and Shepard (1991)
Effects of narrow assessment
• Incentives to teach to the test
  – Focus on some subjects at the expense of others
  – Focus on some aspects of a subject at the expense of others
  – Focus on some students at the expense of others (“bubble” students)
• Consequences
  – Learning that is
    • Narrow
    • Shallow
    • Transient
Getting assessment right
What is an assessment?
• An assessment is a procedure for making inferences
  – We give students things to do
  – We collect the evidence
  – We draw conclusions
• Key question: “Once you know the assessment outcome, what do you know?”
• For any test:
  – some inferences are warranted (valid)
  – some are not
Validity
• Evolution of the idea of validity
  – A property of a test
  – A property of students’ scores on a test
  – A property of inferences drawn on the basis of test results
• “One validates not a test but an interpretation of data arising from a specified procedure” (Cronbach, 1971)
• Consequences
  – No such thing as a valid (or indeed invalid) assessment
  – No such thing as a biased assessment
  – Formative and summative are descriptions of inferences
Meanings and consequences of assessment
• Evidential basis
  – What does the assessment result mean?
• Consequential basis
  – What does the assessment result do?
• Assessment literacy (Stiggins, 1991)
  – Do you know what this assessment result means?
  – Does it have utility for its intended use?
  – What message does this assessment send to students (and other stakeholders) about the achievement outcomes we value?
  – What is likely to be the effect of this assessment on students?
Validity revisited
“Validity is an integrative evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment.” (Messick, 1989 p. 13)
• Social consequences:
  – “Right concern, wrong concept” (Popham, 1997)
Quality in assessment
• Threats to validity
  – Construct-irrelevant variance
    • Systematic: good performance on the assessment requires abilities not related to the construct of interest
    • Random: good performance is related to chance factors, such as luck (effectively poor reliability)
  – Construct under-representation
    • Good performance on the assessment can be achieved without demonstrating all aspects of the construct of interest
Discussion
• Working as a group, try to frame one validity issue as an issue of construct-irrelevant variance or of construct under-representation.
Understanding reliability
Understanding test scores
• Consider a test of students’ ability to spell words drawn from a bank of 1000 words.
• What we can conclude depends on:
  – The size of the sample
  – The way the sample was drawn
  – Students’ knowledge of the sample
  – The amount of notice given
Samples and reliability
• Suppose we ask a student to spell 20 of the words drawn at random, at five different times of the day, with the following results
  – 15 17 14 15 14
  – On average, the student scores 15 out of 20
  – Our best guess is the student can spell 750 of the 1000 words
• If the results were:
  – 20 12 17 10 16
  – Our best guess is still that the student knows 750 of the 1000 spellings
  – But now we are much less certain about this
Some examples
Example 1
Actual score                    15   17   14   15   14
Difference from average          0   +2   -1    0   -1
Average error                    0 (by definition!)
Standard deviation of errors     1.2
Example 2
Actual score                    20   12   17   10   16
Difference from average         +5   -3   +2   -5   +1
Average error                    0 (by definition!)
Standard deviation of errors     4.0
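The two examples can be checked with a few lines of Python. This is a minimal sketch rather than the author's calculation: the function name `summarize` is made up, and it assumes the "standard deviation of errors" is the sample standard deviation (dividing by n - 1), since that is the version that gives the 1.2 and 4.0 shown above.

```python
from statistics import mean, stdev

BANK_SIZE = 1000    # words in the spelling bank
TEST_LENGTH = 20    # words sampled on each occasion


def summarize(scores):
    """Average score, per-occasion errors, their spread, and the implied bank estimate."""
    average = mean(scores)
    errors = [score - average for score in scores]
    return {
        "average": average,
        "errors": errors,
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "sd_of_errors": stdev(scores),
        "estimated_known_words": average / TEST_LENGTH * BANK_SIZE,
    }


if __name__ == "__main__":
    for label, scores in (("Example 1", [15, 17, 14, 15, 14]),
                          ("Example 2", [20, 12, 17, 10, 16])):
        result = summarize(scores)
        print(label, result["average"], result["errors"],
              round(result["sd_of_errors"], 1), result["estimated_known_words"])
```

Both examples give the same best guess of 750 words; only the spread of the errors differs.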
Quantifying reliability
• The “standard error of measurement” or “SEM” is just the standard deviation of the errors averaged over all test takers
• The reliability of the test is: reliability = 1 − SEM² / SD², where SD is the standard deviation of the observed scores
Relationship of reliability and error
• For a test with an average score of 50, and a standard deviation of 15 (so that most scores range from 20 to 80), errors of measurement are as follows:
Reliability   Standard error of measurement
0.70          8.2
0.75          7.5
0.80          6.7
0.85          5.8
0.90          4.7
0.95          3.4
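The values in this table are consistent with the standard classical test theory relation SEM = SD × √(1 − reliability). The sketch below assumes that relation; the function name is illustrative only.

```python
from math import sqrt


def standard_error_of_measurement(reliability, sd):
    """Assumed classical test theory relation: SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1.0 - reliability)


if __name__ == "__main__":
    SD = 15  # standard deviation of observed scores, as in the slide
    for reliability in (0.70, 0.75, 0.80, 0.85, 0.90, 0.95):
        sem = standard_error_of_measurement(reliability, SD)
        print(f"{reliability:.2f}  {sem:.1f}")
```

Note how slowly the error shrinks: raising reliability from 0.70 to 0.90 does not even halve the SEM.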
What does this mean?
• Consider a class of 25 students taking a reading test
  – with a reliability of 0.85
  – an average score of 50
  – a standard deviation of 15 (most scores range from 20 to 80)
• Then
  – 17 students get a score within 6 points of their true score
  – 7 students get a score that is more than 6 points, but less than 12 points from their true score
  – and one student gets a score that differs from their true score by more than 12 points
• Unfortunately…
  – you won’t know which student
  – and you won’t know if their score was higher or lower than it should have been
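The counts of 17, 7 and 1 follow if measurement errors are roughly normally distributed and the slide's 6 and 12 points are read as one and two standard errors of measurement (the SEM here is about 5.8). Both of those are assumptions of this sketch, as is the function name.

```python
from math import sqrt
from statistics import NormalDist


def expected_error_bands(n_students=25, reliability=0.85, sd=15.0):
    """Expected numbers of students within 1 SEM, between 1 and 2 SEM, and beyond 2 SEM."""
    sem = sd * sqrt(1.0 - reliability)
    errors = NormalDist(mu=0.0, sigma=sem)  # assumed normal measurement error
    within_one = errors.cdf(sem) - errors.cdf(-sem)
    within_two = errors.cdf(2 * sem) - errors.cdf(-2 * sem)
    return {
        "sem": sem,
        "within_1_sem": n_students * within_one,
        "between_1_and_2_sem": n_students * (within_two - within_one),
        "beyond_2_sem": n_students * (1.0 - within_two),
    }


if __name__ == "__main__":
    for band, value in expected_error_bands().items():
        print(f"{band}: {value:.1f}")
```

These are expected values across many classes; any single class of 25 will vary around them.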
[Figures: five scatter plots of observed score and true score (both axes 0 to 100), one for each test reliability: 0.75, 0.80, 0.85, 0.90 and 0.95]
Understanding what this means in practice
Grouping students by ability
Using tests for grouping students by ability
Using a test with a reliability of 0.9, and with a predictive validity of 0.7, to group 100 students into four ability groups:
                                 should be in
students placed in    group 1   group 2   group 3   group 4
group 1                  23         9         3
group 2                   9        12         6         3
group 3                   3         6         7         4
group 4                             3         4         8
Only 50% of the students are in the “right” group
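The 50% figure can be read straight off the table: it is the share of students on the diagonal. The sketch below treats the two blank cells as zero, which is an assumption, although it is the only reading under which the table sums to 100 students.

```python
# Rows: group a student was placed in; columns: group the student should be in.
# Assumption: the two blank cells in the slide's table are zeros.
PLACEMENTS = [
    [23, 9, 3, 0],
    [9, 12, 6, 3],
    [3, 6, 7, 4],
    [0, 3, 4, 8],
]


def misplacement_counts(table):
    """Count students by how many groups away from the 'right' group they were placed."""
    counts = {}
    for placed, row in enumerate(table):
        for should_be, n in enumerate(row):
            distance = abs(placed - should_be)
            counts[distance] = counts.get(distance, 0) + n
    return counts


if __name__ == "__main__":
    counts = misplacement_counts(PLACEMENTS)
    total = sum(counts.values())
    for distance in sorted(counts):
        print(f"{distance} group(s) out: {counts[distance]} of {total}")
```

On this reading, 38 students are one group out and 12 are two groups out.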
Diagnostic testing
The limits of diagnostic testing
• 120-item multiple choice test for teacher licensure
  – Four major subject areas
    • language arts/reading
    • mathematics
    • social studies
    • science
  – 30 items per subject area
  – Sub-score reliabilities range from 0.71 to 0.83
How reliable are 10-item subtest scores?
• Items for each subject area ranked in order of difficulty (i.e., 1 to 30)
• Three parallel 10-item forms created in each subject area:
  – Form A: items 1, 4, 7, … 28
  – Form B: items 2, 5, 8, … 29
  – Form C: items 3, 6, 9, … 30
• Sub-score reliabilities in the range 0.40 to 0.60
• On form A, 271 examinees scored 7 in mathematics and 3 in science
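The form construction above amounts to dealing the difficulty-ranked items out in turn. The sketch below reproduces it, and also applies the Spearman-Brown formula to show roughly what reliability one would expect when a 30-item subtest is cut to 10 items. The Spearman-Brown step is an addition for illustration, not something the slides state, and the function names are made up.

```python
def parallel_forms(n_items=30, n_forms=3):
    """Deal items ranked 1..n_items into forms: A gets 1, 4, 7, ...; B gets 2, 5, 8, ...; etc."""
    return [list(range(start, n_items + 1, n_forms)) for start in range(1, n_forms + 1)]


def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


if __name__ == "__main__":
    form_a, form_b, form_c = parallel_forms()
    print("Form A:", form_a)
    print("Form B:", form_b)
    print("Form C:", form_c)
    # Reported 30-item sub-score reliabilities range from 0.71 to 0.83.
    for r in (0.71, 0.83):
        print(f"30-item reliability {r:.2f} -> predicted 10-item {spearman_brown(r, 1 / 3):.2f}")
```

The predicted values (about 0.45 to 0.62) are in the same region as the reported 0.40 to 0.60.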
Scores of 271 students on form B
                        Science subscore
Math subscore    1    2    3    4    5    6    7    8    9   10
      1          0    0    0    1    1    1    0    0    0    0
      2          0    0    0    1    3    1    2    0    0    0
      3          1    0    0    1    2    4    3    1    1    1
      4          0    0    2    7    7    6    4    0    1    0
      5          0    1    1    1   10   14    8    5    1    1
      6          2    0    1    5   10   11   15    8    1    1
      7          0    1    4    4    4   11   10    7    4    0
      8          0    1    1    5   12   13    7    5    4    0
      9          0    0    1    1    6    3    7    4    3    0
     10          0    0    0    1    1    2    1    1    0    0
Sinharay, Puhan, and Haberman (2010)
110 out of 271 (41%) examinees got a better form B score in science than mathematics
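The 110 can be recovered by adding up every cell of the table in which the science subscore is higher than the mathematics subscore. The sketch below assumes that rows are mathematics subscores and columns are science subscores, both running from 1 to 10; that orientation is inferred from the slide layout.

```python
# Form B results for the 271 examinees.
# Assumption: row index = mathematics subscore (1-10), column index = science subscore (1-10).
FORM_B_COUNTS = [
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 3, 1, 2, 0, 0, 0],
    [1, 0, 0, 1, 2, 4, 3, 1, 1, 1],
    [0, 0, 2, 7, 7, 6, 4, 0, 1, 0],
    [0, 1, 1, 1, 10, 14, 8, 5, 1, 1],
    [2, 0, 1, 5, 10, 11, 15, 8, 1, 1],
    [0, 1, 4, 4, 4, 11, 10, 7, 4, 0],
    [0, 1, 1, 5, 12, 13, 7, 5, 4, 0],
    [0, 0, 1, 1, 6, 3, 7, 4, 3, 0],
    [0, 0, 0, 1, 1, 2, 1, 1, 0, 0],
]


def count_science_higher(table):
    """Number of examinees whose science subscore exceeds their mathematics subscore."""
    return sum(
        count
        for math_score, row in enumerate(table, start=1)
        for science_score, count in enumerate(row, start=1)
        if science_score > math_score
    )


if __name__ == "__main__":
    total = sum(sum(row) for row in FORM_B_COUNTS)
    higher = count_science_higher(FORM_B_COUNTS)
    print(f"{higher} of {total} ({100 * higher / total:.0f}%) scored higher in science")
```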
What does this mean?
• A student scoring 7 on mathematics and 3 on science would probably want to improve the latter
• But 110 of the 271 examinees got a better score in science than mathematics on Form B
• Correlation of science subscores on Forms A and B is 0.48
• Correlation of science subscore on Form A with total score on Form B is 0.63
• In other words, the total score on the total test is a better guide to the score on a sub-test than another score on the same sub-test
Measuring progress
Reliability, standard errors, and progress
Grade     Reliability   SEM as a percentage of annual progress
1         0.89          26%
2         0.85          56%
3         0.82          76%
4         0.83          39%
5         0.83          55%
6         0.89          46%
Average   0.85          49%
In other words, the standard error of measurement of this reading test is equal to six months’ progress by a typical student
In other words…
• In a class of 25 students, if they have all made exactly the expected progress, and they are tested with a typical reading test every six months:
  – Four will appear to have made no progress or gone backwards
  – Four will appear to have made at least twice as much progress as expected
  – And again, you won’t know which students are which…
True and observed growth scores
Pre-test average: 50
Post-test average: 60
Pre-test SD: 15
Change SD: 2
Test reliability: 0.85
Progress reliability: 0.04
[Figure: scatter plot of true progress and observed progress]
Fortunately…
• While progress measures for individuals are rather unreliable, progress measures for groups are much more reliable.
• As rules of thumb:
  – For individual students, progress measures are meaningful only if the progress is more than twice the standard error of measurement of the test being used to measure progress
  – For a class of 25 students, progress measures are meaningful if the progress is more than half the standard error of measurement of the test being used to measure progress
Curriculum-based measurement
• Used for a variety of purposes including screening, benchmarking, and progress monitoring
• Avoids the problems of measuring change scores because it focuses on multiple assessments of status.
• Depends on a clear view of what will be learned by the end of the instructional sequence.
• However, it is not a panacea
Review of studies of CBM-R
“Such studies suggest that even under the best conditions (i.e., high-quality probe sets and tightly controlled conditions), (a) a minimum of 5 or 6 weeks of data with multiple data points collected per week are needed to inform routine instructional decisions and (b) a minimum of 12 weeks of data with multiple data points collected per week are needed to make special education eligibility decisions” (p. 12)
Ardoin, Christ, Morena, Cormier, and Klingbeil (2013)
“…at this point, there are no studies to suggest that an individual student's progress can be accurately determined using CBM-R progress monitoring data” (p. 14)
“Furthermore, trainers and publishers of CBM-R materials should neither suggest to school teachers and other educators that CBM-R progress monitoring data can be used as a primary outcome measure to evaluate individual student growth over short periods of time nor train them in current CBM-R decision rules.” (pp. 14-15)
Ardoin, Christ, Morena, Cormier, and Klingbeil (2013)
Discussion question
• From what you have heard so far, what are the key challenges regarding the design of reading assessment for your school/district?
Evidence-centered design
• Conceptual assessment framework
  – Student model: what are we assessing?
    • “Degree of difficulty” model
    • “Marks for style” model
    • “Support” model
  – Evidence model: what evidence do we want?
  – Task model: where will the evidence come from?
  – Four-process architecture
    • Task selection
    • Task presentation
    • Evidence identification
    • Evidence accumulation
Mislevy, Almond and Lukas (2003); Almond, Steinberg and Mislevy (2003)
Task selection
Kinther Layticks
Skondo has often been described as one of the fantem growing plaidos in the UK during the last 10 years, but the lure of chemicks about in tabsel has continued to attract the attention of moorick numbers of Britons.
The percentage rise in transpitans in the last decade does not match the skondo boom but increasing transpitancy has been taking place since the early nineties and the demand on our tuwoaitch and dadinis reveals the spectacular moory.
Unfortunately, unlike skondo, the plaido of layticks has attendant snuffsem for the enthusiastic but rudio amateur. All too few of the satsun laybos who take to the tuwoah have even the most rudimentary knowledge of loxem in tabsel.
1. Name two popular plaidos.
2. Have there been many deaths from Skondo?
3. Which country has a lot of kinther layticks?
4. Write down two precautions to take for layticks
5. What is snuffsem about skondo?
6. What would you find in dadinis?
Task presentation
Item formats
• “No assessment technique has been rubbished quite like multiple choice, unless it be graphology” (Wood, 1991, p. 32)
• Myths about multiple-choice items
  – They are biased against females
  – They assess only candidates’ ability to spot or guess
  – They test only lower-order skills
Diagnostic questions in English
In a piece of persuasive writing, which of these would be the best thesis statement?
A. The typical TV show has 9 violent incidents
B. There is a lot of violence on TV
C. The amount of violence on TV should be reduced
D. Some programs are more violent than others
E. Violence is included in programs to boost ratings
F. Violence on TV is interesting
G. I don’t like the violence on TV
H. The essay I am going to write is about violence on TV
Evidence identification
Referents in assessment
• Norm-referenced
  – a group who were assessed previously
• Cohort-referenced
  – the group assessed at the same time
• Criterion-referenced
  – explicit and precise performance criteria
• Ipsative
  – defined only within an individual
• Construct-referenced
  – a shared construct in a community of practice
Quality
“Maxims cannot be understood, still less applied by anyone not already possessing a good practical knowledge of the art. They derive their interest from our appreciation of the art and cannot themselves either replace or establish that appreciation”. (Polanyi, 1958 p. 50).
“Quality doesn’t have to be defined. You understand it without definition. Quality is a direct experience independent of and prior to intellectual abstractions”. (Pirsig, 1991 p. 64).
Evidence accumulation
Memory on land and underwater
• 18 (5f, 13m) student members of a university diving club were tested on their recall of two- and three-syllable words from four 36-word lists taken from the Toronto Word Bank spoken to them twice.
• Students learned, and were tested on, the words while underwater, and while on the shore, resulting in four conditions:
  – DD (learn dry, recall dry)
  – DW (learn dry, recall wet)
  – WD (learn wet, recall dry)
  – WW (learn wet, recall wet)
Memory is context-dependent
                        Recall environment
Learning environment    Dry     Wet
Dry                     13.5    8.6
Wet                     8.4     11.4
No significant main effects; interaction effect: F = 22.0; df = 1, 12; p < 0.001
Godden and Baddeley (1975)
Alcohol and memory
Number of items correct
                                          Day 1   Day 2
Day 1: sober; day 2: sober                 17      17
Day 1: sober; day 2: intoxicated           17      11
Day 1: intoxicated; day 2: sober           18      13
Day 1: intoxicated; day 2: intoxicated     16      16
• 32 adults (aged 22 to 43) asked to memorize a map and a 19-item set of instructions for a journey
• Half did so sober and half at the legal limit for intoxication
• The following day, half of them were tested sober and half at the legal limit for intoxication.
Lowe (1981)
Discussion question
• How will you decide how much evidence is needed to decide whether a student has learned something?
Recording
Sylvie and Bruno Concluded (Carroll, 1893)
“That’s another thing we’ve learned from your Nation,” said Mein Herr, “map-making. But we’ve carried it much further than you. What do you consider the largest map that would be really useful?”
“About six inches to the mile.”
“Only six inches!” exclaimed Mein Herr. “We very soon got to six yards to the mile. Then we tried a hundred yards to the mile. And then came the grandest idea of all! We actually made a map of the country, on the scale of a mile to the mile!”
“Have you used it much?” I enquired.
“It has never been spread out, yet,” said Mein Herr: “the farmers objected: they said it would cover the whole country, and shut out the sunlight! So we now use the country itself, as its own map, and I assure you it does nearly as well.”
What is a grade?
“…an inadequate report of an inaccurate judgment by a biased and variable judge of the extent to which a student has attained an undefined level of mastery of an unknown proportion of an indefinite material.” (Dressel, quoted in Chickering, 1983 p. 12)
Reporting
Effects of feedback
• Kluger & DeNisi (1996)
• Review of 3000 research reports
• Excluding those:
  – without adequate controls
  – with poor design
  – with fewer than 10 participants
  – where performance was not measured
  – without details of effect sizes
• left 131 reports, 607 effect sizes, involving 12652 individuals
• On average feedback does improve performance, but
  – Effect sizes very different in different studies
  – In 38% (50 out of 131) of studies, effect sizes were negative
Getting feedback right is hard
Response type      Feedback indicates performance…
                   exceeds goal               falls short of goal
Change behavior    Exert less effort          Increase effort
Change goal        Increase aspiration        Reduce aspiration
Abandon goal       Decide goal is too easy    Decide goal is too hard
Reject feedback    Feedback is ignored        Feedback is ignored
Meanings and consequences of school grades
• Two rationales for grading
  – Meanings
    • Assessment as evidentiary reasoning
    • Assessment outcomes as supports for making inferences (e.g., about student achievement)
  – Consequences
    • Assessment outcomes as rewards and punishments
    • Assessments create incentives for students to do what we want them to do
  – These two rationales interact, and conflict
    • achievement grades for completion of homework
    • achievement grades for effort
    • penalties for late submission
    • zeroes for missing work
Dual-pathway theory (Boekaerts, 2006)
• Long-term learning goals are translated into short-term learning intentions
• Dynamic comparisons of task and situational demands with personal resources, taking into account:
  – Current perceptions of the task
  – Beliefs about the subject or task
  – Beliefs about “ability” and the role of effort in the subject
  – Interest in the subject (personal vs. situational)
  – Previous experiences on similar tasks
  – Costs and benefits
And then it comes down to…
• Resulting activation of energy along one of two pathways
  – Wellbeing
  – Growth
• We need assessment systems that push our students towards a focus on growth, rather than wellbeing
Summary
• Before we can assess, we need clear models of progression
• Validity is not a property of tests or assessments, but of inferences, which are weakened by
  – construct under-representation
  – construct-irrelevant variance
• Reliability is a key requirement for validity
• Limited test reliability has particularly severe consequences for change scores and diagnosis
• Assessments are important for what they do as well as what they mean