
André F. De Champlain, PhD
Director, Psychometrics and Assessment Services

Miriam Friedman Ben-David Lecture
17th Ottawa Conference on the Assessment of Competence in Medicine & the Healthcare Professions
Mar. 22, 2016 – Perth, Australia

Peering Through the Looking Glass: How Advances in Technology, Psychometrics and Philosophy are Altering the Assessment Landscape in Medical Education


I do not have any conflicts of interest to report.


Through the Looking Glass & What Alice Found There

Charles Dodgson (Lewis Carroll)
• Sequel to Alice's Adventures in Wonderland (1871)
• Key theme: inverse reflection
◦ Reflection on an alternative world which lies on the other side of a mirror
◦ Distortion of sense (Jabberwocky)
◦ Portmanteau (Jabberwocky)
– Linguistic blend of words, e.g., webinar (web + seminar), brunch (breakfast + lunch)


Education & Assessment as Willing Partners

• The mechanisms through which learning occurs have shifted

• From traditional (paper-based) to electronic media
◦ Tablet & mobile device-based learning is ubiquitous (e.g., MedPage Today, QuantiaMD, etc.)
◦ Linear to exponential growth of knowledge in medicine


Education & Assessment as Willing Partners

• From a traditional view of education
◦ Teacher-centered, with high exam scores as the main goal
• To alternative models that stress learning, retention & integration of knowledge & skills using a host of assessment modalities (e.g., PBL, competency-based education, etc.)


Education & Assessment as Willing Partners
• Evolution of learning models & modalities not completely mirrored by similar changes in educational assessment
• Educational assessment must evolve alongside learning models or risk fostering an antagonistic relationship
• Educational assessment must (Bennett, 2002):
◦ Provide meaningful information
◦ Satisfy multiple purposes
◦ Use modern conceptions of competency as a design basis
◦ Design for positive impact & engagement
◦ Use technology to achieve substantive goals


Rethinking the Nuts & Bolts of Assessment
• Reconceptualising assessment
◦ Over the past two decades, a tremendous amount of thought & activity has been aimed at proposing models of assessment & related processes that are:
– More transparent & flexible
– Better linked to learning activities
– More informative from an educational standpoint
◦ Concurrently, effort has been devoted to improving the processes necessary to support these re-envisioned assessments
• Revisitation of assessment's epistemological core
◦ What world lies on the other side of the assessment mirror?


Rethinking the Nuts & Bolts of Assessment
• Assessment paradigm shift
◦ Programmatic assessment (van der Vleuten et al., 2012)
◦ Post-modern test theory (Mislevy, 1997)
◦ Cognitively-based assessment of, for & as learning (CBAL; Bennett, 2010)
• Use of technology to improve
◦ Test development practices (automated item generation [AIG])
◦ Marking of open-ended responses & narrative text


Rethinking the Nuts & Bolts of Assessment

• Illustrating the evolution of the assessment paradigm, technology & scoring at the Medical Council of Canada (MCC)


Assessment Paradigm Shift

• Increasing dissatisfaction with established educational assessment models, which assume that:
◦ A candidate's “true” competency level can be measured with standardized, context-free tools & further confirmed by highly reproducible, unambiguous statistical results
◦ A linear relationship holds between learning & assessment
– Discrete, episodic hurdles to overcome
– Unlinked assessments


Assessment Paradigm Shift
• Concerns
◦ Lack of an overarching framework (program) to guide the design of assessment tools along an educational continuum
– Plea for a macroscopic rather than microscopic view of assessment (de Rosnay, 1979)
◦ Reductionist lens applied to what is a complex, adaptive system with interconnected components & dynamic relationships
◦ Missed opportunity to view learning & assessment in a rich, recursive relationship
– Both activities can dynamically inform each other
– Feeding information forward


Programmatic Assessment

• Calls for a deliberate, arranged set of longitudinal assessment activities

• Joint attestation of all data points for decision & remediation purposes

• Input of expert professional judgment is a cornerstone of this model

• Purposeful link between assessment & learning/remediation
• Dynamic, recursive relationships between assessment & learning points


Programmatic Assessment

• Application of a program evaluation framework to assessment
◦ Systematic collection of data to answer specific questions about a program
• Gaining popularity within several medical education settings
◦ Competency-based workplace learning
◦ Medical schools (e.g., Dalhousie University, University of Toronto, etc.)
◦ Etc.


Programmatic Assessment

• Systemic models of assessment and learning are also popular in other settings
• K-12 education (Bennett, 2012; Educational Testing Service)
◦ Cognitively based assessment of, for & as learning (CBAL)
– Documents what students have achieved (‘of learning’)
– Helps identify how to plan & adjust instruction (‘for learning’)
– Considered by students and teachers to be a worthwhile educational experience in and of itself (‘as learning’)

• We are treading on well-travelled ground!


Dipping our Toes into the Programmatic Assessment Waters: The MCC Experience

Assessment Review Task Force (ARTF)
• As the MCC approached its 100th anniversary (2012), a task force of eminent Canadian medical educators was convened to undertake a reflective & strategic review of the MCC's assessment purposes and objectives, their structure & their alignment with the MCC's major stakeholder requirements
• The resulting report, published in 2011, contained six recommendations, including validating & updating the blueprint for MCC examinations, offering exams with greater flexibility, enhancing & standardizing IMG assessments & engaging in in-practice assessment


Tacit Sub-Recommendation: Macro-Analysis
A challenge intimated in the ARTF report pertains to the need to conduct a macro-analysis & review of the MCC Qualifying Examination (MCCQE)
• Applying a systemic (macroscopic) lens to the MCCQE as an integrated examination system & not simply as a restricted number of episodic hurdles (MCCQE Parts I & II)

• How are the components of the MCCQE interconnected & how do they inform key markers along a physician’s educational & professional continuum?

• How can the MCCQE progress towards embodying an integrated, logically planned & sequenced system of assessments that mirrors the Canadian physician’s journey?


Programmatic Assessment Refocuses the Debate

Reductionism
• A system reduced to its most basic elements (i.e., it corresponds to the sum of its parts)
• Decision point I: MCCQE Part I
• Decision point II: MCCQE Part II

vs.

Emergentism
• A system is more than the sum of its parts & also depends on complex interdependencies amongst its component parts
• Decision point I: purposeful integration of MCCQE Part I scores with other data elements
• Decision point II: purposeful integration of MCCQE Part II scores with other data elements


Towards a Programmatic View of Assessment in Canada

Assessment Continuum for Canada (ACC)
• Purpose:
◦ To define the ‘life of a physician’ from the beginning of medical school to retirement in terms of assessments
◦ To propose a common, national framework of assessment using a programmatic approach

• Composition includes representation from the MCC, UGME/PGME, certification colleges, FMRAC and MRAs

• First step includes defining the CMG physician pathway with key partners


CMG Competency Assessment Pathway


Assessment Continuum for Canada (ACC)
• Next steps include a retreat to begin to define a program of assessment (April 2016)
• Starting points to develop a program of assessment
◦ Various ongoing North American EPA projects
◦ Milestone projects (RCPSC, ACGME)
◦ CanMEDS 2015 (RCPSC) & Triple C curriculum (CFPC)
◦ MCCQE blueprint
◦ …and many others

• Critical to develop an overarching framework (program) prior to specifying elements & relationships in this program


Practical Implications of Programmatic Assessment
• Programmatic assessment is predicated on more frequent & flexible assessment via a variety of tools:
◦ Traditional exam formats
◦ Lower-stakes, in-practice observations
◦ Narratives, etc.

• This shift impacts core assessment tasks including test development & scoring activities

• Assessments need to be developed, administered & scored more frequently

• How technology is helping the MCC optimize test development & scoring activities to better support programmatic assessment

• AIG and automated marking as examples


Enhancing & Supplementing Test Development: AIG

What is AIG?
• Automated item generation (AIG) is the process of using item models to generate test items with the aid of computer technology

• AIG uses a three-stage process for generating items where the cognitive mechanism required to solve the items is identified & manipulated to create new items (‘cognitive map’)


Enhancing & Supplementing Test Development: AIG

Three steps:
1. Identification of the PROBLEM (e.g., post-operative fever)

2. Specification of SOURCES of information required to diagnose the problem (e.g., type of surgery, physical examination, etc.)

3. Description of VARIABLES & LEVELS within each information source (e.g., guarding & rebound, timing of fever, calf tenderness, etc.) needed to create different instances of the problem


Enhancing & Supplementing Test Development: AIG

A traditional, committee-based, item-writing workshop


Enhancing & Supplementing Test Development: AIG

The beginnings of a cognitive map (post-operative fever)


Enhancing & Supplementing Test Development: AIG


Enhancing & Supplementing Test Development: AIG
• A cognitive model can be viewed as a template, a rendering, or a mold of the assessment task (i.e., a target where we want to place the content for the item)

• A 54-year-old woman has a <Type of Surgery>. On post-operative day <Timing of Fever>, the patient has a temperature of 38.5°C. Physical examination reveals <Physical Examination>. Which one of the following is the best next step?

• Type of Surgery: Gastrectomy, right hemicolectomy, left hemicolectomy, appendectomy, laparoscopic cholecystectomy

• Timing of Fever: 1 to 6 days
• Physical Examination: Red & tender wound, guarding & rebound, abdominal tenderness, calf tenderness
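To make the cognitive-model idea above concrete, here is a minimal sketch of how such a template could be instantiated by exhaustively combining the variable levels. The variable names and Python tooling are illustrative assumptions; the MCC's actual AIG software and distractor-modelling process are not shown.

# Minimal sketch (not the MCC's AIG software): fill the template stem with
# every combination of the variable levels listed in the cognitive model.
from itertools import product

STEM = ("A 54-year-old woman has a {surgery}. On post-operative day {day}, "
        "the patient has a temperature of 38.5°C. Physical examination reveals "
        "{finding}. Which one of the following is the best next step?")

surgeries = ["gastrectomy", "right hemicolectomy", "left hemicolectomy",
             "appendectomy", "laparoscopic cholecystectomy"]
days = range(1, 7)  # timing of fever: post-operative days 1 to 6
findings = ["a red & tender wound", "guarding & rebound",
            "abdominal tenderness", "calf tenderness"]

# Each combination yields a candidate stem; in practice, clinically
# implausible combinations would be filtered out by the cognitive model.
stems = [STEM.format(surgery=s, day=d, finding=f)
         for s, d, f in product(surgeries, days, findings)]

print(len(stems))   # 5 x 6 x 4 = 120 candidate stems
print(stems[0])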


Enhancing & Supplementing Test Development: AIG

Lessons learned since 2011:
• Test development efforts
◦ Thousands of items generated across 50+ cognitive maps
◦ Significantly improved test development process
◦ Deliberate, modeled process for distractors (diagnostic feedback)
◦ Predictive identification accuracy ranged from 32-52% across four experts, with an average accuracy rate of 42%
• Psychometric efforts (250+ piloted AIG items)
◦ On average, AIG items are comparable in difficulty & discrimination (based on classical & IRT statistics)
◦ Stronger distractors (directly attributable to the AIG process)
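As a rough illustration of the classical statistics referenced in the last bullet, this is a minimal sketch of how item difficulty (proportion correct) and discrimination (corrected item-total point-biserial) can be computed from a 0/1-scored response matrix. The data and function are invented for illustration; this is not the MCC's operational psychometric pipeline, and IRT calibration is not shown.

# Classical item statistics from a candidates-by-items matrix of 0/1 scores.
import numpy as np

def classical_item_stats(responses: np.ndarray):
    difficulty = responses.mean(axis=0)         # p-value (proportion correct) per item
    total = responses.sum(axis=1)               # total score per candidate
    discrimination = np.empty(responses.shape[1])
    for j in range(responses.shape[1]):
        rest = total - responses[:, j]          # corrected (item-removed) total score
        discrimination[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return difficulty, discrimination

rng = np.random.default_rng(0)
demo = (rng.random((200, 10)) > 0.4).astype(int)   # fake data: 200 candidates x 10 items
p, r = classical_item_stats(demo)
print(np.round(p, 2), np.round(r, 2))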


Streamlining the Scoring of Open Responses & Narrative Text: Automated Essay Scoring (AES)

• Programmatic assessment calls for a combination of low- & high-stakes assessment for remediation & decision purposes

• Lower-stakes assessments include narrative & other open-ended qualitative measures

• Scoring of these open-ended measures is typically based on human ratings & can be very resource-intensive

• How might we streamline the scoring of such tasks?
◦ Automated essay scoring (AES)


Streamlining the Scoring of Open Responses & Narrative Text: Automated Essay Scoring (AES)

• The MCCQE Part I
◦ MCCQE Part I is completed as a licensing requirement in Canada towards the end of UGME
◦ Two-part exam (196 A-type MCQs + clinical decision-making [CDM] short-menu and write-in questions)
• Scoring process for write-in questions
◦ >50 physicians hired to score write-in responses over two to three days
◦ Scoring can only occur on weekends
◦ Costly & logistically unsustainable


Streamlining the Scoring of Open Responses & Narrative Text: Automated Essay Scoring (AES)

• Our solution: AES
◦ AES offers a promising alternative for supplementing hand-scoring of written-response items

• With AES, a computer builds scoring models based on previously-scored (human) responses & applies scoring model(s) to subsequent examinations

• AES relies on natural language processing (NLP) and machine-learning algorithms (MLA)
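The following is a minimal sketch of the general approach described above: build a scoring model from previously human-scored responses and apply it to new responses. The TF-IDF features, logistic-regression classifier and toy data are illustrative assumptions, not the MCC's operational NLP features or algorithms.

# Train on human-scored write-in responses, then score new responses automatically.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical human-scored responses (1 = acceptable answer, 0 = not).
train_texts = ["order a chest x-ray", "order a chest xray",
               "start broad-spectrum antibiotics", "reassess tomorrow"]
train_scores = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(train_texts, train_scores)          # learn the scoring model from human ratings

new_responses = ["chest x-ray", "give antibiotics"]
print(model.predict(new_responses))           # machine-assigned scores for new answers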


Streamlining the Scoring of Open Responses & Narrative Text: Automated Essay Scoring (AES)

Our research to date:
• Under what conditions does AES work properly?

• What is the level of agreement between human raters & machine?

• How does this agreement compare to inter-human ratings?

• What is the impact of scoring CDM write-in questions using AES on overall pass/fail rates?


Streamlining the Scoring of Open Responses & Narrative Text: Automated Essay Scoring (AES)

Agreement rates:
• Overall, human-machine agreement is very high
◦ Median kappa = 0.89 for spring 2015 MCCQE Part I CDM write-in questions (range: 0.56-0.99)
• Human-human agreement is slightly higher (Shermis, 2015)
◦ Median kappa = 0.96 for spring 2015 MCCQE Part I CDM write-in questions (range: 0.56-1.00)
• Humans were not always correct!
◦ For a few CDM write-in questions, humans scored consistently, but incorrectly
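For reference, the agreement statistic quoted above (Cohen's kappa) can be computed as in the minimal sketch below; the rater scores here are invented purely for illustration and are not MCC data.

# Compare machine scores against a human rater, and two humans against each other.
from sklearn.metrics import cohen_kappa_score

human_1 = [1, 0, 1, 1, 0, 1, 0, 1]   # hypothetical scores from one human rater
human_2 = [1, 0, 1, 1, 0, 1, 1, 1]   # hypothetical scores from a second human rater
machine = [1, 0, 1, 0, 0, 1, 0, 1]   # hypothetical AES scores for the same responses

print(cohen_kappa_score(human_1, machine))   # human-machine agreement
print(cohen_kappa_score(human_1, human_2))   # human-human agreement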


Streamlining the Scoring of Open Responses & Narrative Text: Automated Essay Scoring (AES)

Impact on pass/fail decisions
• Pass/fail decision agreement rates greater than or equal to 0.96 whether candidates were scored by machine or human
• All Cohen's kappa statistics greater than or equal to 0.93

Spring 2015 MCCQE Part I    Cohen's kappa
Overall                     0.96
English                     0.96
French                      0.95


Streamlining the Scoring of Open Responses & Narrative Text: Automated Essay Scoring (AES)

Conclusions:
• Overall, agreement was very high
• Results are aligned with findings from past research
• AES cannot be implemented using a one-size-fits-all approach
◦ Performance of AES is less optimal in some conditions
– With some polytomous items
– With French models (very small sample sizes)
• Limited impact of machine scoring on ability estimates and pass/fail decisions
• The MCC is confident that AES can considerably streamline the scoring of open-ended items


Conclusions

• Re-emergence of competency-based educational models necessitates an analogous broadening of our assessment frameworks & strategies

• Systemic models have challenged assessment experts to broaden our panoply of strategies & to purposefully link assessment to learning in a recursive fashion (assessment engineering)

• The implementation of linked models is predicated on more frequent & broad assessments of performance


Conclusions

• Technology provides several useful means by which:
◦ Test development processes can be significantly improved, systematized & streamlined to support programmatic assessment (exam engineering)
◦ Scoring of open-ended, qualitative narrative can be accomplished in an efficient & defensible manner

• The mind is not a vessel to be filled, but a fire to be kindled. (Plutarch)

THANK YOU!

André F. De Champlain, [email protected]
