Peering Through the Looking Glass: How Advances in Technology, Psychometrics and Philosophy are Altering the Assessment Landscape in Medical Education

André F. De Champlain, PhD
Director, Psychometrics and Assessment Services

Miriam Friedman Ben-David Lecture
17th Ottawa Conference on the Assessment of Competence in Medicine & the Healthcare Professions
Mar. 22, 2016 – Perth, Australia
Through the Looking Glass & What Alice Found There
Charles Dodgson (Lewis Carroll)
• Sequel to Alice's Adventures in Wonderland (1871)

Key theme: Inverse reflection
• Reflection on an alternative world which lies on the other side of a mirror
• Distortion of sense (Jabberwocky)
• Portmanteau (Jabberwocky): a linguistic blend of words
  ◦ Webinar (web + seminar)
  ◦ Brunch (breakfast + lunch)
Education & Assessment as Willing Partners
• Mechanisms through which learning occurs have shifted
• From traditional (paper-based) to electronic media
  ◦ Tablet & mobile device-based learning is ubiquitous (e.g., MedPage Today, QuantiaMD)
  ◦ Linear to exponential growth of knowledge in medicine
Education & Assessment as Willing Partners
• From a traditional view of education
  ◦ Teacher-centered, with high exam scores as the main goal
• To alternative models which stress learning, retention & integration of knowledge & skills using a host of assessment modalities (e.g., PBL, competency-based education)
Education & Assessment as Willing Partners

• Evolution of learning models & modalities not completely mirrored by similar changes in educational assessment
• Educational assessment must evolve alongside learning models or risk fostering an antagonistic relationship
• Educational assessment must (Bennett, 2002):
  ◦ Provide meaningful information
  ◦ Satisfy multiple purposes
  ◦ Use modern conceptions of competency as a design basis
  ◦ Design for positive impact & engagement
  ◦ Use technology to achieve substantive goals
Rethinking the Nuts & Bolts of Assessment

• Reconceptualising assessment
  ◦ Over the past two decades, a tremendous amount of thought & activity has been aimed at proposing models of assessment & related processes that are:
    – More transparent & flexible
    – Better linked to learning activities
    – More informative from an educational standpoint
  ◦ Concurrently, effort has been devoted to improving the processes necessary to support these re-envisioned assessments
• Revisitation of assessment's epistemological core
  ◦ What world lies on the other side of the assessment mirror?
Rethinking the Nuts & Bolts of Assessment

• Assessment paradigm shift
  ◦ Programmatic assessment (van der Vleuten et al., 2012)
  ◦ Post-modern test theory (Mislevy, 1997)
  ◦ Cognitively-based assessment of, for & as learning (CBAL; Bennett, 2010)
• Use of technology to improve
  ◦ Test development practices (automated item generation [AIG])
  ◦ Marking of open-ended responses & narrative text
Rethinking the Nuts & Bolts of Assessment
• Aim: illustrate the evolution of the assessment paradigm, technology & scoring at the Medical Council of Canada
Assessment Paradigm Shift
• Increasing dissatisfaction with established educational assessment models, which assume that:
  ◦ A candidate's "true" competency level can be measured with standardized, context-free tools & further confirmed by highly reproducible, unambiguous statistical results
  ◦ A linear relationship holds between learning & assessment
    – Discrete, episodic hurdles to overcome
    – Unlinked assessments
Assessment Paradigm Shift

• Concerns
  ◦ Lack of an overarching framework (program) to guide the design of assessment tools along an educational continuum
    – Plea for a macroscopic rather than microscopic view of assessment (de Rosnay, 1979)
  ◦ Reductionist lens applied to what is a complex, adaptive system with interconnected components & dynamic relationships
  ◦ Missed opportunity to view learning & assessment in a rich, recursive relationship
    – Both activities can dynamically inform each other
    – Feeding information forward
Programmatic Assessment

• Calls for a deliberate, arranged set of longitudinal assessment activities
• Joint attestation of all data points for decision & remediation purposes (a minimal sketch follows)
• Input of expert professional judgment is a cornerstone of this model
• Purposeful link between assessment & learning/remediation
  ◦ Dynamic, recursive relationships between assessment & learning points
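Below is a minimal sketch of what such a longitudinal data point and a joint review might look like in code; the record fields, names, and summary rule are illustrative assumptions, not the MCC's actual data model.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class AssessmentDataPoint:
    """One observation in a longitudinal program of assessment (hypothetical schema)."""
    learner_id: str
    tool: str            # e.g., "MCQ exam", "workplace observation", "narrative"
    stakes: str          # "low" or "high"
    observed_on: date
    result: float        # score normalized to 0-1, or a rubric level
    narrative: str = ""  # qualitative comments that feed forward into learning

def summarize_portfolio(points: list[AssessmentDataPoint]) -> dict[str, float]:
    """Summarize ALL data points by tool. In programmatic assessment these
    summaries inform, but never replace, expert committee judgment."""
    by_tool: dict[str, list[float]] = {}
    for p in points:
        by_tool.setdefault(p.tool, []).append(p.result)
    return {tool: sum(results) / len(results) for tool, results in by_tool.items()}
```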
Programmatic Assessment

• Application of a program evaluation framework to assessment
  ◦ Systematic collection of data to answer specific questions about a program
• Gaining popularity within several medical education settings
  ◦ Competency-based workplace learning
  ◦ Medical schools (e.g., Dalhousie University, University of Toronto), among others
Programmatic Assessment

• Systemic models of assessment and learning are also popular in other settings
• K-12 education (Bennett, 2012; Educational Testing Service)
  ◦ Cognitively based assessment of, for & as learning (CBAL)
    – Documents what students have achieved ('of learning')
    – Helps identify how to plan & adjust instruction ('for learning')
    – Considered by students and teachers to be a worthwhile educational experience in and of itself ('as learning')
• We are treading on well-travelled ground!
Dipping our Toes into the Programmatic Assessment Waters: The MCC Experience
Assessment Review Task Force (ARTF)
• As the MCC approached its 100th anniversary (2012), a task force of eminent Canadian medical educators was convened to undertake a reflective & strategic review of the MCC's assessment purposes and objectives, their structure & their alignment with the requirements of the MCC's major stakeholders
• The task force's report, published in 2011, contained six recommendations, including validating & updating the blueprint for MCC examinations, offering exams with greater flexibility, enhancing & standardizing IMG assessments & engaging in in-practice assessment
Tacit Sub-Recommendation: Macro-Analysis

A challenge intimated in the ARTF report pertains to the need to conduct a macro-analysis & review of the MCC Qualifying Examination (MCCQE):
• Applying a systemic (macroscopic) lens to the MCCQE as an integrated examination system & not simply as a restricted number of episodic hurdles (MCCQE Parts I & II)
• How are the components of the MCCQE interconnected & how do they inform key markers along a physician's educational & professional continuum?
• How can the MCCQE progress towards embodying an integrated, logically planned & sequenced system of assessments that mirrors the Canadian physician's journey?
Programmatic Assessment Refocuses the Debate
Reductionism
• A system reduced to its most basic elements (i.e., it corresponds to the sum of its parts)
• Decision point I: MCCQE Part I
• Decision point II: MCCQE Part II

vs.

Emergentism
• A system is more than the sum of its parts & also depends on complex interdependencies amongst its component parts
• Decision point I: Purposeful integration of MCCQE Part I scores with other data elements
• Decision point II: Purposeful integration of MCCQE Part II scores with other data elements
Towards a Programmatic View of Assessment in Canada
Assessment Continuum for Canada (ACC)
• Purpose:
  ◦ To define the 'life of a physician' from the beginning of medical school to retirement in terms of assessments
  ◦ To propose a common, national framework of assessment using a programmatic approach
• Composition includes representation from the MCC, UGME/PGME, certification colleges, FMRAC and MRAs
• First step includes defining the CMG physician pathway with key partners
Assessment Continuum for Canada (ACC)

• Next steps include a retreat to begin to define a program of assessment (April 2016)
• Starting points to develop a program of assessment:
  ◦ Various ongoing North American EPA projects
  ◦ Milestone projects (RCPSC, ACGME)
  ◦ CanMEDS 2015 (RCPSC) & Triple C curriculum (CFPC)
  ◦ MCCQE blueprint
  ◦ …and many others
• Critical to develop an overarching framework (program) prior to specifying elements & relationships in this program
Practical Implications of Programmatic Assessment

• Programmatic assessment is predicated on more frequent & flexible assessment via a variety of tools:
  ◦ Traditional exam formats
  ◦ Lower-stakes, in-practice observations
  ◦ Narratives, etc.
• This shift impacts core assessment tasks, including test development & scoring activities
  ◦ Assessments need to be developed, administered & scored more frequently
• The remainder of this talk illustrates how technology is helping the MCC optimize test development & scoring activities to better support programmatic assessment
  ◦ AIG and automated marking as examples
Enhancing & Supplementing Test Development: AIG
What is AIG?
• Automated item generation (AIG) is the process of using item models to generate test items with the aid of computer technology
• AIG uses a three-stage process for generating items, in which the cognitive mechanism required to solve the items is identified & manipulated to create new items (a 'cognitive map')
Enhancing & Supplementing Test Development: AIG
Three steps (one possible encoding follows below):
1. Identification of the PROBLEM (e.g., post-operative fever)
2. Specification of SOURCES of information required to diagnose the problem (e.g., type of surgery, physical examination, etc.)
3. Description of VARIABLES & LEVELS within each information source (e.g., guarding & rebound, timing of fever, calf tenderness, etc.) needed to create different instances of the problem
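One way to represent such a cognitive map in software, used by the generation sketch a few slides further on; the class and field names here are hypothetical, not the MCC's internal format.

```python
from dataclasses import dataclass, field

@dataclass
class CognitiveModel:
    """Encodes the three AIG steps for one clinical problem (hypothetical schema)."""
    problem: str                                                   # Step 1: the PROBLEM
    sources: list[str] = field(default_factory=list)               # Step 2: SOURCES of information
    variables: dict[str, list[str]] = field(default_factory=dict)  # Step 3: VARIABLES & LEVELS

post_op_fever = CognitiveModel(
    problem="post-operative fever",
    sources=["type of surgery", "physical examination", "timing of fever"],
    variables={
        "Type of Surgery": ["gastrectomy", "appendectomy"],
        "Timing of Fever": ["1", "3", "6"],
        "Physical Examination": ["red & tender wound", "calf tenderness"],
    },
)
```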
Enhancing & Supplementing Test Development: AIG
A traditional, committee-based, item-writing workshop
Enhancing & Supplementing Test Development: AIG
The beginnings of a cognitive map (post-operative fever)
Enhancing & Supplementing Test Development: AIG

• A cognitive model can be viewed as a template, a rendering, or a mold of the assessment task (i.e., a target where we want to place the content for the item); an instantiation sketch follows below

• A 54-year-old woman has a <Type of Surgery>. On post-operative day <Timing of Fever>, the patient has a temperature of 38.5°C. Physical examination reveals <Physical Examination>. Which one of the following is the best next step?
  ◦ Type of Surgery: gastrectomy, right hemicolectomy, left hemicolectomy, appendectomy, laparoscopic cholecystectomy
  ◦ Timing of Fever: 1 to 6 days
  ◦ Physical Examination: red & tender wound, guarding & rebound, abdominal tenderness, calf tenderness
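As a rough illustration of how the item model above can be instantiated, the sketch below simply crosses all variable levels; a production AIG system adds constraints so that only clinically coherent combinations (with a defensible keyed answer and distractors) survive, and those rules are omitted here.

```python
from itertools import product

# Item model (stem) with placeholders for the manipulated variables.
STEM = ("A 54-year-old woman has a {surgery}. On post-operative day {day}, "
        "the patient has a temperature of 38.5°C. Physical examination "
        "reveals {finding}. Which one of the following is the best next step?")

surgeries = ["gastrectomy", "right hemicolectomy", "left hemicolectomy",
             "appendectomy", "laparoscopic cholecystectomy"]
days = range(1, 7)  # post-operative days 1 to 6
findings = ["a red & tender wound", "guarding & rebound",
            "abdominal tenderness", "calf tenderness"]

# Cross all levels: 5 surgeries x 6 days x 4 findings = 120 candidate stems
# generated from a single cognitive model.
items = [STEM.format(surgery=s, day=d, finding=f)
         for s, d, f in product(surgeries, days, findings)]

print(len(items))  # 120
print(items[0])
```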
Enhancing & Supplementing Test Development: AIG
Lessons learned since 2011:
• Test development efforts
  ◦ Thousands of items generated across 50+ cognitive maps
  ◦ Significantly improved test development process
  ◦ Deliberately modeled process for distractors (diagnostic feedback)
  ◦ Predictive identification accuracy ranged from 32-52% across four experts, with an average accuracy rate of 42%
• Psychometric efforts (250+ piloted AIG items)
  ◦ On average, AIG items are comparable in difficulty & discrimination (based on classical & IRT statistics)
  ◦ Stronger distractors (directly attributable to the AIG process)
Streamlining the Scoring of Open Responses & Narrative Text: Automated Essay Scoring (AES)
• Programmatic assessment calls for a combination of low- & high-stakes assessment for remediation & decision purposes
• Lower-stakes assessments include narrative & other open-ended qualitative measures
• Scoring of these open-ended measures is typically based on human ratings & can be very resource-intensive
• How might we streamline the scoring of such tasks?
  ◦ Automated essay scoring (AES)
Streamlining the Scoring of Open Responses & Narrative Text: Automated Essay Scoring (AES)
• The MCCQE Part I
  ◦ Completed as a licensing requirement in Canada towards the end of UGME
  ◦ Two-part exam (196 A-type MCQs + clinical decision-making [CDM] short-menu and write-in questions)

• Scoring process for write-in questions
  ◦ >50 physicians hired to score write-in responses over two to three days
  ◦ Scoring can only occur on weekends
  ◦ Costly & logistically unsustainable
Streamlining the Scoring of Open Responses & Narrative Text: Automated Essay Scoring (AES)
• Our solution: AES (sketched below)
  ◦ AES offers a promising alternative for supplementing hand-scoring of written-response items
  ◦ With AES, a computer builds scoring models based on previously scored (human) responses & applies the scoring model(s) to subsequent examinations
  ◦ AES relies on natural language processing (NLP) and machine-learning algorithms
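A deliberately simple sketch of that general recipe, using scikit-learn as a stand-in (the slides do not describe the MCC's actual NLP features or learning algorithms): vectorize previously human-scored responses, fit a classifier, then apply it to new responses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Previously (human-)scored write-in responses for ONE CDM question.
train_texts = ["order a chest x-ray", "chest radiograph", "start antibiotics",
               "ct scan of the abdomen", "obtain chest x ray", "give iv fluids"]
train_scores = [1, 1, 0, 0, 1, 0]  # human-assigned key: 1 = correct, 0 = incorrect

# NLP features (character n-gram TF-IDF tolerates spelling variants)
# plus a machine-learning classifier.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_scores)

# The frozen model is then applied to responses from later administrations.
print(model.predict(["order chest xray"]))  # expected: [1]
```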
Streamlining the Scoring of Open Responses & Narrative Text: Automated Essay Scoring (AES)
Our research to date:
• Under what conditions does AES work properly?
• What is the level of agreement between human raters & machine?
• How does this agreement compare to inter-human ratings?
• What is the impact of scoring CDM write-in questions using AES on overall pass/fail rates?
Streamlining the Scoring of Open Responses & Narrative Text: Automated Essay Scoring (AES)
Agreement rates:
• Overall, human-machine agreement is very high
  ◦ Median kappa = 0.89 for spring 2015 MCCQE Part I CDM write-in questions (range: 0.56-0.99)
• Human-human agreement is slightly higher (Shermis, 2015)
  ◦ Median kappa = 0.96 for spring 2015 MCCQE Part I CDM write-in questions (range: 0.56-1.00)
• Humans were not always correct!
  ◦ For a few CDM write-in questions, humans scored consistently, but incorrectly
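Cohen's kappa corrects raw agreement for agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e). A quick illustration with made-up ratings (not MCC data):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical human vs. machine scores for ten write-in responses.
human   = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
machine = [1, 0, 1, 1, 0, 1, 1, 1, 1, 0]

# Observed agreement p_o = 0.9; chance agreement p_e = 0.54 for these marginals.
print(round(cohen_kappa_score(human, machine), 2))  # 0.78
```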
Streamlining the Scoring of Open Responses & Narrative Text: Automated Essay Scoring (AES)
Impact on pass/fail decisions
• Pass/fail decision agreement rates were greater than or equal to 0.96 whether candidates were scored by machine or human
• All Cohen's kappa statistics were greater than or equal to 0.93

Spring 2015 MCCQE Part I    Cohen's kappa
Overall                     0.96
English                     0.96
French                      0.95
Streamlining the Scoring of Open Responses & Narrative Text: Automated Essay Scoring (AES)
Conclusions:
• Overall, agreement was very high
  ◦ Results are aligned with findings from past research
• AES cannot be applied using a one-size-fits-all approach
  ◦ Performance of AES is less optimal in some conditions
    – With some polytomous items
    – With French models (very small sample sizes)
• Limited impact of machine scoring on ability estimates and pass/fail decisions
• MCC is confident that AES can considerably streamline the scoring of open-ended items
Conclusions
• Re-emergence of competency-based educational models necessitates an analogous broadening of our assessment frameworks & strategies
• Systemic models have challenged us, as assessment experts, to broaden our panoply of strategies & to purposefully link assessment to learning in a recursive fashion (assessment engineering)
• The implementation of linked models is predicated on more frequent & broad assessments of performance
Conclusions
• Technology provides several useful means by which:
  ◦ Test development processes can be significantly improved, systematized & streamlined to support programmatic assessment (exam engineering)
  ◦ Scoring of open-ended, qualitative narratives can be accomplished in an efficient & defensible manner
• The mind is not a vessel to be filled, but a fire to be kindled. (Plutarch)