10/25/17
Revitalizing ‘Pore’ Assessment Practices: Valid and Reliable Assessment Tools in Medical Education
Sarah Shaffer, DO – University of Iowa
Amy Thompson, MD – University of Cincinnati
Alice Chuang, MD, MEd – University of North Carolina
None of the authors nor their spouses have any relevant financial disclosures.
Learning Objectives
• The learner will be able to identify elements of a poorly constructed multiple-choice question.
• The learner will recognize reliability and validity threats to observational assessments and discuss actions to minimize these.
• The learner will be able to implement key steps to create a performance evaluation (simulation or objective structured clinical exam).
“Only when we are clear what it is we want to assess can we devise effective strategies for achieving that.”
Hammersley, 1987
Why are reliability and validity so confusing?
• Campbell & Fiske, 1967
Reliability is the agreement between two efforts to measure the same trait through maximally similar methods.
Validity is represented in the agreement between two attempts to measure the same trait through maximally different methods.
• Black & Champion, 1976
The reliability of a measuring instrument is defined as the ability of the instrument to measure consistently the phenomenon it is designed to measure.
The validity of a measuring instrument is defined as the property of a measure that allows the researcher to say that the instrument measures what he says it measures…
Hammersley, 1987
• High precision, low accuracy: random error small, systematic error large
• High precision, high accuracy: random error small, systematic error small
• Better precision, better accuracy: random error large, systematic error large
• Low precision, low accuracy: random error smaller, systematic error large
Validity – Construct we are measuring
Reliability – Instrument we are using to measure the construct, and its ability to produce valid scores
Let’s review…
Validity¹
• “Do scores accurately reflect the presence of a property?”
• “Accuracy in our ability to measure an observable property and correlate with an unobservable property.”
• Do scores accurately reflect the amount of a property? (precision)
Reliability
• “Repetition of the study would result in the same data and conclusions”²
• “Ability of an instrument to produce valid scores.”¹
• An understood requirement for a test to be valid
• A measure of random error in the instrument
¹ Hammersley, 1987
² Goode & Hatt, 1952
How tall am I?
Validity
Reliability
Pregnancy test example – 50% of patients are truly pregnant
Hour #1 results: 50 positive, 50 negative
Hour #2 results: 20 positive, 80 negative
BEST RESULTS: ??%
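One way to read the "??%" in this example is as a best-case bound: knowing only how many positives and negatives each hour produced, how accurate could the results possibly be, and how often could the two hours agree? The sketch below works under that reading. The function name `best_case_match` and the assumption that the same 100 patients are tested at both hours are illustrative and do not come from the slides.

```python
def best_case_match(pos_a: int, neg_a: int, pos_b: int, neg_b: int) -> float:
    """Largest possible fraction of patients on which two labelings agree,
    given only how many positives and negatives each labeling contains."""
    total = pos_a + neg_a
    if total == 0 or total != pos_b + neg_b:
        raise ValueError("both labelings must cover the same non-empty group")
    # At most min(pos_a, pos_b) patients can be positive in both labelings,
    # and at most min(neg_a, neg_b) can be negative in both.
    return (min(pos_a, pos_b) + min(neg_a, neg_b)) / total


# Assumed: 100 patients, 50 truly pregnant and 50 not.
truth = (50, 50)
hour1 = (50, 50)
hour2 = (20, 80)

print(best_case_match(*hour1, *truth))  # best-case accuracy of hour 1 -> 1.0
print(best_case_match(*hour2, *truth))  # best-case accuracy of hour 2 -> 0.7
print(best_case_match(*hour1, *hour2))  # best-case agreement between hours -> 0.7
```

Under these assumptions the two runs can agree on at most 70% of patients, and the Hour #2 results can be at most 70% accurate. These are upper bounds only: the counts alone cannot show that either run was actually correct for any individual patient.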
Written Tests
• Two major formats
  • Constructed Response: require formation of a unique written response
  • Selected Response: require choice of a best or correct answer from a set list of options in response to a prompt or “question stem”
Miller’s Pyramid
Kirkpatrick’s Criteria for Learning Outcomes
4. Results
  A. Change in organizational practice
  B. Benefits to patients/clients
3. Behavior
  A. Transfer learning to workplace
  B. Learners apply new knowledge and skills
2. Learning
  A. Change attitudes/perceptions
  B. Change knowledge/skills
1. Reaction
  Student satisfaction
Demonstration of learning
Pros & Cons of Written Tests
• Strengths
  • In-depth & non-cued (CR)
  • Test multiple aspects (CR)
  • Ease of partial credit scoring (CR)
  • Test broad content (SR)
  • Objective (SR)
  • Reproducible (SR)
  • Efficient (SR)
• Limitations
  • Subjective scoring (CR)
  • Issues with reproducibility (CR)
  • Inefficient (CR)
  • Difficult to write well (SR)
  • Issues with publicizing (SR)
Steps to creation of a Written Test
• Selected Response
  • Determine appropriateness of SR written test format
  • Determine content to be assessed
  • Write SR question(s) with consideration of:
    • Formatting
    • Style
    • Stem-specific issues
    • Option-specific issues
  • Score SR item(s)
• Constructed Response
  • Determine appropriateness of CR written test format
  • Determine the content to be assessed
  • Write CR prompt(s) or question(s)
  • Prepare ideal answer(s)
  • Content experts read & score CR item(s)
“Any aspect of a cognitive educational achievement can be tested by means of either the multiple-choice or the true-false form.”
Roger L. Ebel, 1972
SR Written Test: Determine Content
• Learner level
• Background information or instruction
• Goal of content gained by learner
• Time (etc.) constraints
• Test author & test reviewer(s) should be content experts
SR Written Test: Write MCQs
• SR test item anatomy = stem + question + options
• A proposed taxonomy of 31 principles of writing effective MCQs is available
• HIGH POINTS
  • Focus on a single content element or issue
  • Stem must contain adequate information
  • Question should be obvious
  • Brevity is best for answer options
  • Every option should be plausible
  • Three options is generally best
SR Written Test: Review & Scoring
• Review by another content expert(s)
  • Apply principles of writing effective MCQs
  • Consider content over/under-representation
  • Consider balance of higher-order & lower-order cognitive objectives (2/3 and 1/3, respectively, are suggested)
• Adequate time for review & revision
• Scoring should be relatively simple with a single best and/or correct answer
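The suggested 2/3 higher-order to 1/3 lower-order balance can be checked mechanically once a reviewer has tagged each draft item. The following is a minimal sketch. The tag names `"higher"` and `"lower"`, the function names, and the 10-percentage-point tolerance are assumptions made for illustration and are not part of the slides.

```python
from collections import Counter


def cognitive_balance(item_levels):
    """Return the fraction of items tagged 'higher' and 'lower'."""
    if not item_levels:
        raise ValueError("need at least one tagged item")
    counts = Counter(item_levels)
    total = len(item_levels)
    return {level: counts[level] / total for level in ("higher", "lower")}


def meets_suggested_balance(item_levels, target_higher=2 / 3, tolerance=0.10):
    """True if the share of higher-order items is within `tolerance`
    of the suggested two-thirds target (tolerance is an assumed value)."""
    share = cognitive_balance(item_levels)["higher"]
    return abs(share - target_higher) <= tolerance


# Hypothetical 30-item draft: 14 higher-order and 16 lower-order items.
draft = ["higher"] * 14 + ["lower"] * 16
print(cognitive_balance(draft))
print(meets_suggested_balance(draft))  # False: too few higher-order items
```

A reviewer could run a check like this after each revision round to see whether rewritten items have shifted the balance.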
Examples: Selected Response MCQs
SR - Possible threats to validity/sources of error
• Choice of content ➡ adequate input from multiple content experts, adequate time to refine list
• Content-validity considerations
  • Over-representation
  • Degradation due to inefficiency
• Construction of MCQs or other SR items ➡ training of skill, peer review & feedback
• Attempts to correct for guessing ➡ construct-irrelevant variance ➡ limits the number of options and the number of items
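The slides do not say which correction for guessing they have in mind. The classical version is formula scoring, which subtracts a fraction of the wrong answers from the number right. The sketch below shows that formula only as an illustration. The function name and the sample numbers are hypothetical.

```python
def formula_score(right: int, wrong: int, options_per_item: int) -> float:
    """Classical correction for guessing: right - wrong / (k - 1),
    where k is the number of options per item. Omitted items are not penalized."""
    if options_per_item < 2:
        raise ValueError("an item needs at least two options")
    return right - wrong / (options_per_item - 1)


# Hypothetical 40-item test with three options per item:
# 28 right, 9 wrong, 3 omitted.
print(formula_score(28, 9, 3))  # 23.5
```

Because omitted items carry no penalty, the corrected score depends partly on how willing an examinee is to guess, which is a trait the test was not written to measure. That is one way such a correction can add construct-irrelevant variance.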
CR - Possible threats to validity/sources of error
• Scoring is inherently subjective ➡
  • Model answers
    • Contain required components
    • Reviewed for accuracy
    • Reviewed for completeness
  • Create a scoring rubric
  • Two independent readers for each
  • Read all answers to same question
  • Train, re-train or replace
• Content under-representation
  • Prompts sample knowledge “deeply” but not “broadly”
• Types of examinee “bluffing”
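When two independent readers score the same set of answers, their agreement can be quantified to decide whether to train, re-train or replace a reader. The slides do not name a statistic. Cohen's kappa is one common choice and is used here purely as an illustration, with hypothetical rubric scores.

```python
from collections import Counter


def cohens_kappa(reader_a, reader_b):
    """Cohen's kappa: agreement between two readers beyond what chance would give."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("readers must score the same non-empty set of answers")
    n = len(reader_a)
    # Observed agreement: share of answers given the same score by both readers.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement: from each reader's own distribution of scores.
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    expected = sum(freq_a[score] * freq_b[score] for score in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both readers gave every answer the same single score
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (0-3) from two readers on eight answers.
reader_a = [2, 3, 1, 2, 0, 3, 2, 1]
reader_b = [2, 3, 1, 1, 0, 3, 2, 2]
print(round(cohens_kappa(reader_a, reader_b), 2))  # 0.65
```

Kappa treats every disagreement as equally serious. For ordinal rubric scores a weighted variant may be a better fit, which is a design choice this sketch leaves open.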
Interactive Activity
Oral Exams
Observational assessments: aka Clinical Performance Evaluation
Clinical Performance Evaluation
• Commonly used for formative evaluation
• Often represents major component of composite grade
• Vary from simple to complex
  • Single rater, single setting
  • Single rater, multiple settings
  • Multiple raters, single rater type, single setting
  • Multiple raters, single rater type, multiple settings
  • Multiple raters, multiple rater types, single setting
  • Multiple raters, multiple rater types, multiple settings
• Checklists and rating forms used
Raters and Rating Scale limitations
“The more precise the scale, the more difficult it is to achieve high levels of validity. And, indeed, there is often a temptation to be more precise than the level of validity with which an object can be measured justifies.”
Hammersley, 1987
Designer Regular people
Standardized Assessment (OSCE)
• Often predictable
• Specific participants
• Roles constant
• Level of participation consistent
• Training controlled
• Specific content
• Patient complexity controlled
• Simulated world
Clinical Performance Assessment
• Unpredictable
• Participants can vary
• Roles are complex
• Level of participation varies
• Participants (likely) not formally trained
• Content varies
• Patient complexity varies
• Real consequences
Assessment in Clinical Environments
• Instruction and assessment are inseparable
• Two-way interactive relationships develop
• Subject to pitfalls and biases inherent in human relationships
• Assessors are often learning and working at the same time
• Learners quickly take on a pretense or cloak of competence that shapes evaluation.
Validity Threats in Clinical Evaluations
• Too few observations of behavior
• Incomplete observations
• Checklist items inappropriately written
• Evaluator bias
• Systematic evaluator error
• Bluffing of evaluators
• Poorly trained evaluators
Steps to minimize validity threats
• Assessment goals are defined and match assessment tool
• Checklist items are generated by key stakeholders
• Checklist items are clearly written and tested with a variety of evaluators
• Evaluators have clear sense of the assessment goal
• Evaluators know how to use the assessment tool/instrument
• Learners have clear sense of assessment goal
Best Practices
• Instruments should allow for a broad range of clinical situations
• Use multiple evaluators when possible
• Keep instruments short and focused
• Formative assessments should be separate from those for advancement
• Give sufficient time for assessments so that ratings are not rushed.
• Minimize lag time between observations and evaluation completion
• No more than 7 response options per item
Williams, Klamen, & McGaghie, 2003; McGaghie, Butter, & Kaye, 2009
Interactive Activity
Performance Assessments
• Definition: Simulated real-life task, performed “in vitro”
• Examples:
  • Driver’s license exam
  • SCUBA certification
  • USMLE Step 2
  • OSCE
Pros and Cons (compared to Observational Assessments)
• Strengths
  • Can control setting
  • Increased standardization
  • Improved patient safety because using SPs
  • Not disruptive to actual patient care
• Limitations
  • Complex logistically
  • Challenging to simulate real life
  • Expensive
Steps
• Identify task
• Create case
• Create assessment tools
• Pilot
• Make adjustments
Identify task
• What competencies and skills will be assessed?
  • May need to reference school competencies
  • May require a needs assessment
  • May need to review EPAs
• Do you want to test discrete skills (e.g. taking a sexual history)?
• Do you want to test a comprehensive patient encounter (e.g. full H&P with differential and write-up)?
Creating the case/Writing the Scripts
• SP script should contain rich detail including:
  • Personal characteristics of the patient
  • Demographics
  • Special instructions (e.g. affect, props, mood)
  • Backstory
  • Full medical history
  • Proposed responses to questions
  • Plausible lab results
• Include also:
  • Door note
  • Checklists
  • Task to be accomplished by student
  • Guidelines
  • Feedback
  • Filling out forms
Scoring the performance
• Checklist: contains specific tasks that the student has performed, does not require expert rater, yes or no, behaviors, objective
• Rating Scale: Likert scale to indicate quality of what was done, more subjective
• Case-specific Checklist: tasks developed by panel of experts for specific scenario, should be able to be completed by non-expert, behaviors, objective
• Rubric: used for written products or post-encounter
• ALSO - MUST DECIDE THE STANDARD FOR PASSING: CERTAIN CUMULATIVE SCORE or CERTAIN PERFORMANCE on each case
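The two passing standards named above can lead to different decisions for the same examinee. The sketch below contrasts them. The function names, cut scores, and case scores are hypothetical and chosen only to show the difference.

```python
def passes_cumulative(case_scores, cumulative_cut):
    """Pass if the total across all cases reaches the cut score,
    so a strong case can offset a weak one."""
    return sum(case_scores) >= cumulative_cut


def passes_each_case(case_scores, per_case_cut):
    """Pass only if every individual case reaches the cut score."""
    return all(score >= per_case_cut for score in case_scores)


# Hypothetical three-station OSCE, each case scored out of 10.
scores = [9, 8, 3]
print(passes_cumulative(scores, cumulative_cut=18))  # True: total is 20
print(passes_each_case(scores, per_case_cut=6))      # False: third case is below 6
```

Which rule fits depends on whether weak performance on any single case should be allowed to stand, a judgment the slide leaves to the exam designer.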
Examples: Show Choir Audition Rubric
Scale: Poor (0 points), Fair (1 point), Good (2 points), Superior (3 points)
Singing Skills
• Poor: Sings with as much expression as a wet noodle, cannot identify tune candidate is singing, cannot identify lyrics of song due to poor pronunciation
• Fair: Minimally expressive, pitch off significantly on occasion, diction unclear at times
• Good: Very expressive, sings on pitch most of the time with minor errors, diction clear most of the time
• Superior: Artistically expressive, sings on pitch, diction consistently clear
Dancing Skills
• Poor: Has 2 left feet, unable to learn new steps, continues to dance like MC Hammer despite different choreography demonstrated
• Fair: Missteps frequently, minimal artistic expression in dance moves, difficulty learning new choreography after 3 demonstrations
• Good: Occasionally missteps, but overall dance steps are accurate, adapts choreography fairly rapidly
• Superior: Quick and nimble, dances artistically, able to learn new choreography quickly
Enthusiasm
• Poor: Freely admits not knowing what GLEE is
• Fair: Endorses enjoyment of GLEE, but unable to identify favorite character
• Good: Has watched 70% of GLEE episodes
• Superior: Has seen every episode of GLEE, all GLEE albums confirmed in iTunes library, has been to GLEE LIVE each summer
Possible threats to validity/error
• Checklist items interpreted differently by different people -> faculty development, use multiple inputs for creation of checklist
• Some items/some cases vary by difficulty -> use multiple cases
• Some cases ambiguous with multiple interpretations -> pilot carefully, use PDSA cycle to improve
• SPs of variable quality -> careful training of SPs
• Raters have variable scores secondary to halo effect, leniency, severity, etc. -> use behaviorally anchored scoring, establish frame of reference
• One rater/SP performs inconsistently -> remove them or train
• One particular group performs significantly differently -> may be an occasion-specific factor (such as disappointing presidential election), try to control factors that can be controlled.
Interactive Activity
• Please create a 3-station OSCE that will test if an individual knows how to prepare for bed. This OSCE should test skills in the areas of 1) dental hygiene, 2) toileting and 3) appropriate attire. Remember the steps:
  • Identify task
  • Create case
  • Create assessment tools
  • Pilot
  • Make adjustments
Each table should choose to create a station for this OSCE – select skill 1, 2, or 3.
Thank you for joining us!
Sarah Shaffer, DO – University of Iowa
Amy Thompson, MD – University of Cincinnati
Alice Chuang, MD, MEd – University of North Carolina
References
• Downing SM and Yudkowsky R, eds. Assessment in Health Professions Education. New York: Routledge. 2009.
• Casey PM et al. To the point: review in medical education—the Objective Structured Clinical Examination. Am J Obstet Gynecol 2009 Jan;200(1):25-34.
• Hammersley, 1987
• Goode & Hatt, 1952
• Williams, Klamen & McGaghie, 2003
• McGaghie, Butter & Kaye, 2009