Yoon Soo Park, PhD
September 21, 2017
Validity Studies on the Assessment of Clinical Reasoning Skills Using Patient Notes
University of Tokyo Medical Education Lecture
Overview
1. Overview of Clinical Reasoning: Diagnostic Justification
2. USMLE Step 2 Clinical Skills Examination
3. Validity Studies of Patient Note Scoring Rubric
– Five Projects (Years 2012 to 2017)
– Gather Validity Evidence
4. Implications for Validity Research
Clinical Reasoning (1)
Data Gathering: History, Physical Examination, Lab Results
Differential Diagnosis → Working Diagnosis → Plan
Diagnoses listed do not always relate to actual findings!
[Figure: Findings 1–5 linked, with question marks, to Diagnoses A, B, and C]
Clinical Reasoning (2)
• Organize and use knowledge
• Collect pertinent data about patients
• Generate appropriate DDx given chief complaint
• Make good diagnostic decisions using data
Diagnostic Justification
1. How to assess?
2. Validity?
Clinical Reasoning – Diagnostic Justification
[Figure: Findings 1–5 explicitly linked to Diagnoses A and B]
Diagnostic Justification Study – 4th-year Medical Students
Williams and Klamen, Academic Medicine, 2012
Williams and Klamen Study
• Case specificity – Justification scores variable across cases
• Task specificity
• >20% of students had borderline or poor diagnostic justification on >50% of cases
USMLE Step 2 Clinical Skills: Patient Note Format
Old Format (Before June 2012):
1. Document H & PE
2. DDX
3. Plan for immediate workup
New Format (After June 2012):
1. Document H & PE
2. Ranked and Justified DDX
3. Plan for immediate workup
USMLE Step 2 Clinical Skills (1)
Standardized Patient Encounter (15 minutes) → Patient Note Writing (10 minutes)
[Figure: Medical student sees the patient, then writes the patient note]
Repeat for 12 stations
• Medical schools → align curriculum and assessment
• Develop scoring rubrics
• Train faculty raters to score PNs
SP Encounter (about 15 min.) → Patient Note (about 10 min.)
Scoring by Standardized Patients:
• Physical Examination
Scoring by Physician Raters:
1. Documentation
2. Differential Diagnosis
3. Diagnostic Justification
4. Workup Plan
USMLE Step 2 Clinical Skills (2)
History:
• 75-year-old woman, sudden onset back pain while lifting turkey from oven
• Pain in midline, mid-thoracic
• PMH: fracture of foot bone last year
• Meds: Acetaminophen for occasional headache …
Physical Examination:
• Well-nourished woman, looks her age, alert and lucid
• Appears in pain, especially with movements
• Point tenderness over T8
• Straight leg test negative; sensation intact …
Sample note excerpt:
• Diagnosis: Pulled back muscle
• Justification: sudden onset back pain with lifting; pain worse with movement; range of motion limited by pain
• Workup: imaging of back/spine
Five Sources of Validity Evidence (AERA, APA, & NCME, 2014)
• Content – content sampled, blueprint
• Response process – responses, rater issues
• Internal structure – reliability, psychometrics
• Relations to other variables – correlations between assessment scores and other measures
• Consequences – impact on pass-fail rates, curriculum
Education and Research Cycle: University of Illinois – College of Medicine
[Figure: Cycle – Assessments (Graduation Competency Examination) → Psychometric Analysis / Research Paper → Generate New Questions → Education Committee Discussion → Curricular Change / New Questions → Quality Improvement / Accreditation Agency]
• 1.1 Continuous Quality Improvement – A medical school engages in ongoing planning and continuous improvement processes … achievement of measurable outcomes …
• 8.4 Program Evaluation – A medical school collects and uses a variety of outcome data, including national norms of accomplishment, to demonstrate the extent to which medical students are achieving medical education program objectives and to enhance medical education program quality.
Standards for Accreditation of Medical Education Programs Leading to the MD Degree, Liaison Committee on Medical Education (LCME), 2016
Validity Research: Patient Note Rubric
[Timeline, 2012–2017: Paper 1 – Development; Paper 2 – Reliability and Psychometrics; Paper 3 – Rater Issues; Paper 4 – Composite Measure; Paper 5 – Multisite Study]
Validity Research Agenda (Sireci 2013; Kane 2013; Messick 1995)
Sources of validity evidence: Content, Internal Structure, Relations with Other Variables, Response Process, Consequences
Projects:
• Paper 1 (2012–2013)
• Paper 2 (2013–2015)
• Paper 3 (2014–2016)
• Paper 4 (2015–2016)
• Paper 5 (2015–2017)
[Table: each paper marked against the sources of validity evidence it addresses]
Patient Note Scoring Rubric – Task with Description, Score, and Anchors

1. Documentation: Documentation of findings in history (Hx) and physical examination (PE) (30 points)
   1. Key Hx and PE findings are missing or incorrect
   2. Some key positive findings present but poorly documented or disorganized, or missing pertinent negatives
   3. Most key positive findings well documented and organized; may miss a few pertinent negatives
   4. All key information present, concise and well organized, with little irrelevant information

2. DDX: Differential Diagnosis (30 points)
   1. [0–1 of 3] or [0 of 2] of the correct diagnoses listed
   2. [2 of 3] or [1 of 2] of the correct diagnoses listed, in any order
   3. All diagnoses listed, incorrect rank order
   4. All diagnoses listed and correctly rank ordered

3. Justification: Justification of Differential Diagnosis (30 points)
   1. No justification provided OR many missing or incorrect links between findings and Dx
   2. Some missing or incorrect links between findings and Dx
   3. Only a few missing or incorrect attributions, which would not impact Dx
   4. Links to diagnoses are correct and complete

4. Workup: Plan for Immediate Diagnostic Workup (10 points)
   1. Diagnostic workup places patient in unnecessary risk or danger
   2. Ineffective plan for diagnostic workup – essential tests missed, irrelevant tests included
   3. Reasonable plan for diagnostic workup; may have some unnecessary tests or miss a few essential tests
   4. Plan for diagnostic workup is effective and efficient, includes all essential tests, and few or no unnecessary tests
Literature Review and Interview/Focus Groups
• As described in the literature?
• What language do learners use?
Conceptualization of Construct ↔ Prospective Learners: do they match?
Literature Review
• Determine whether the construct already exists
• Alignment between construct and theory
Interview / Focus Group
Conduct Cognitive Interviews
• Read and paraphrase it – "tell me what you think it is asking, in your own words"
• Ask more focused questions to evaluate whether respondents agree:
  – Observable?
  – Relevant?
  – Wording clear?
  – Responses clear?
  – Other?
Pilot Study – Task Score: Documentation
1. Documentation of Key Hx and PE (30 points)
   1. Key Hx and PE findings are missing or incorrect – 7%
   2. Some key positive findings present but poorly documented or disorganized or missing pertinent negatives – 27%
   3. Most key positive findings well documented and organized, may miss a few pertinent negatives – 57%
   4. All key information present, concise and well organized with little irrelevant info – 9%
Pilot Study – Task Score: Differential Diagnosis
2. Differential Diagnosis (30 points)
   1. [0–1 of 3] or [0 of 2] of the correct diagnoses listed – 11%
   2. [2 of 3] or [1 of 2] of the correct diagnoses listed, in any order – 34%
   3. All diagnoses listed, incorrect rank order – 29%
   4. All diagnoses listed and correctly rank ordered – 25%
Pilot Study – Task Score: Justification
3. Justification (30 points)
   1. No justification provided OR many missing or incorrect links between findings and Dx – 6%
   2. Some missing or incorrect links between findings and Dx – 34%
   3. Only a few missing or incorrect attributions, which would not impact Dx – 55%
   4. Links to diagnoses are correct and complete – 5%
Pilot Study – Task Score: Workup
4. Plan for Immediate Dx Workup (10 points)
   1. Diagnostic workup places patient in unnecessary risk or danger – 7%
   2. Ineffective plan for diagnostic workup – essential tests missed, irrelevant tests included – 54%
   3. Reasonable plan for diagnostic workup, may have some unnecessary tests or miss a few essential tests – 36%
   4. Plan for diagnostic workup is effective and efficient, includes all essential tests, and few or no unnecessary tests – 3%
Students who received scores of "Poor" or "Borderline" on Justification
# Cases | % of students
0 | 9
1 | 31
2 | 30
3 | 21
4 | 8
5 | 1
Consistent with findings from the Williams Study –
>20% of students had borderline or poor diagnostic justification on >50% of cases.
Students need more practice linking findings and diagnoses!
[Figure: Findings 1–5 linked to Diagnoses A and B]
Yudkowsky et al, Academic Medicine, 2015
Paper 2: Reliability and Psychometrics (Project Years: 2013–2015)
True Variance
[Figure: Distribution of scores – x-axis: Score (X), y-axis: Frequency]
True Variance = Total Variance – Error
Total Variance = True Variance + Error
Reliability = "true" variance / ("true" variance + error variance)
Reliability: Example
Reliability = True Variance / Total Variance = True / (True + Error)
Example (Note: Hypothetical!)
Test Mean = 65, Standard Deviation = 5, Total Variance = 25
Total (True + Error) = 25 → True = 15, Error = 10
Reliability = 15 / 25 = 15 / (15 + 10) = .60
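As a quick illustration of the arithmetic above, here is a minimal Python sketch of the classical reliability ratio; the inputs are the hypothetical example values (true variance 15, error variance 10), not estimates from the study.

```python
# Minimal sketch of the classical reliability ratio illustrated above.
# Inputs are the hypothetical example values (true variance = 15, error variance = 10).

def reliability(true_variance: float, error_variance: float) -> float:
    """Reliability = "true" variance / ("true" variance + error variance)."""
    total_variance = true_variance + error_variance
    return true_variance / total_variance

print(reliability(true_variance=15, error_variance=10))  # prints 0.6
```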
Internal Structure: Reliability
• Reliability with 5 cases
  – G-coefficient: .47
  – Φ-coefficient: .43
• Variance components
  – Students: 6%
  – Student–case interaction (case specificity): 20%
• Need 15 cases to reach a Φ-coefficient of .70 (illustrated in the sketch after the citation below)
Park et al, Advances in Health Sciences Education, 2016
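To show how a projection like "need 15 cases to reach a Φ of .70" is obtained, here is a minimal decision-study (D-study) sketch for a persons × cases design; the variance components below are illustrative placeholders, not the components estimated in the paper.

```python
# Hypothetical D-study sketch for a persons x cases (p x c) design.
# Phi = person variance / (person variance + absolute error variance),
# where absolute error variance = (case variance + person-by-case variance) / n_cases.
# The variance components below are illustrative placeholders, not the published estimates.

def phi_coefficient(var_person: float, var_case: float, var_pc_error: float,
                    n_cases: int) -> float:
    """Absolute (Phi) coefficient when averaging scores over n_cases cases."""
    absolute_error = (var_case + var_pc_error) / n_cases
    return var_person / (var_person + absolute_error)

for n in (5, 10, 15, 20):
    print(n, round(phi_coefficient(var_person=0.06, var_case=0.05,
                                   var_pc_error=0.20, n_cases=n), 2))
```

With these placeholder components, Φ rises as cases are added, which is the logic behind projecting the number of cases needed for a target reliability.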
Paper 3: Rater Issues (Project Years: 2014–2016)
Rater Agreement
Rater 1 | Rater 2 | Agreement
3 | 2 | Adjacent
4 | 4 | Exact
1 | 3 | Discrepant
2 | 4 | Discrepant
1 | 4 | Discrepant
3 | 4 | Adjacent
1 | 1 | Exact
2 | 2 | Exact
4 | 2 | Discrepant
1 | 1 | Exact
• How well do raters agree?
• Exact: % exact match
• Adjacent: % match within ±1 adjacent point
• Discrepant: % discrepant by ±2 or more points
Agreement: % Exact = 40%, % Adjacent = 20%, % Discrepant = 40%
Other agreement indices:
• Kappa
• Intraclass Correlations
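A minimal sketch of the exact/adjacent/discrepant tallies, assuming the ten rater score pairs in the example table above:

```python
from collections import Counter

# Rater score pairs from the example table above (Rater 1, Rater 2).
pairs = [(3, 2), (4, 4), (1, 3), (2, 4), (1, 4),
         (3, 4), (1, 1), (2, 2), (4, 2), (1, 1)]

def classify(r1: int, r2: int) -> str:
    """Label a pair of ratings as exact, adjacent (±1), or discrepant (±2 or more)."""
    diff = abs(r1 - r2)
    if diff == 0:
        return "exact"
    if diff == 1:
        return "adjacent"
    return "discrepant"

counts = Counter(classify(r1, r2) for r1, r2 in pairs)
for label in ("exact", "adjacent", "discrepant"):
    print(f"% {label}: {100 * counts[label] / len(pairs):.0f}%")
# Expected: % exact: 40%, % adjacent: 20%, % discrepant: 40%
```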
[Figure: Projections in Absolute SEM – absolute SEM (y-axis, roughly 3.5 to 7) vs. number of cases (5–12, x-axis), plotted separately for 1, 2, 3, and 4 raters]
Adding a rater or creating a new case:
which is optimal?
Adding a rater (double scoring) ≈ 2 new cases!
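The trade-off can be checked with a small generalizability-theory sketch for a fully crossed persons × cases × raters design; the variance components here are made-up placeholders chosen only to illustrate how the absolute SEM responds to adding a rater versus adding cases.

```python
import math

# Hypothetical generalizability sketch for a fully crossed persons x cases x raters design.
# Absolute error variance = var_c/n_c + var_r/n_r + var_pc/n_c + var_pr/n_r
#                           + var_cr/(n_c*n_r) + var_pcr_e/(n_c*n_r).
# All variance components below are made-up placeholders.

def absolute_sem(n_cases: int, n_raters: int,
                 var_c=0.05, var_r=0.01, var_pc=0.20,
                 var_pr=0.02, var_cr=0.01, var_pcr_e=0.15) -> float:
    """Absolute standard error of measurement for n_cases cases and n_raters raters."""
    error_variance = (var_c / n_cases + var_r / n_raters
                      + var_pc / n_cases + var_pr / n_raters
                      + var_cr / (n_cases * n_raters)
                      + var_pcr_e / (n_cases * n_raters))
    return math.sqrt(error_variance)

print(round(absolute_sem(n_cases=5, n_raters=1), 3))  # baseline: 5 cases, single rater
print(round(absolute_sem(n_cases=5, n_raters=2), 3))  # add a rater (double scoring)
print(round(absolute_sem(n_cases=7, n_raters=1), 3))  # add two cases instead
```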
Rater Training
Training Materials
1. Detailed case information
2. "Gold Standard" exemplar note
3. Case-specific scoring guidelines
4. 3 sample notes (actual student notes)
5. 10 extra notes (optional cases to review)
Training (2 sessions: 3 hours total)
1. Moderator presented summary of case
2. Discussion of 3 sample notes
3. Discrepancies in scoring discussed
4. Update scoring guidelines and "Gold Standard" exemplar note
How to combine different educational experiences?
• Example: How can weights be specified to optimize signal?
[Figure: Patient Note, Written Exam, and SP Encounter (OSCE) combined, with weights to be determined ("?"), into a Composite Measure]
Composite Scores: Weighted Mean vs Kane Method
Weighted Mean Score:
• SP Physical Exam (SP-PE): Score = 80%, Weight = .30
• Patient Note (PN): Score = 65%, Weight = .70
• Weighted Mean Score = 80% × .30 + 65% × .70 = 69.5%
Composite Score: Kane Method:
• Standardize SP-PE and PN scores
• Take the weighted mean of the standardized scores
• Adjust composite score variance → Kane Composite Score
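Here is a minimal sketch contrasting the two approaches; the examinee scores are made-up placeholders, and the final "adjust composite score variance" step of the Kane method is represented here only as rescaling the standardized composite to a chosen reporting mean and SD.

```python
import statistics

# Hypothetical component scores (percent) for a small group of examinees.
sp_pe = [80, 72, 65, 90, 78]   # SP Physical Exam (SP-PE)
pn    = [65, 70, 60, 85, 72]   # Patient Note (PN)
w_sp_pe, w_pn = 0.30, 0.70     # nominal weights from the example above

# 1) Simple weighted mean of the raw percent scores.
weighted_mean = [w_sp_pe * a + w_pn * b for a, b in zip(sp_pe, pn)]

# 2) Kane-style composite (sketch): standardize each component first so the
#    nominal weights are not distorted by unequal component variances, then
#    take the weighted mean and rescale to a chosen reporting scale.
def z_scores(xs):
    mean, sd = statistics.mean(xs), statistics.stdev(xs)
    return [(x - mean) / sd for x in xs]

z_composite = [w_sp_pe * a + w_pn * b
               for a, b in zip(z_scores(sp_pe), z_scores(pn))]
target_mean, target_sd = 70, 10          # arbitrary reporting scale
sd_c = statistics.stdev(z_composite)
kane_composite = [target_mean + target_sd * (z / sd_c) for z in z_composite]

print(weighted_mean[0])    # 69.5 for the first examinee (80 x .30 + 65 x .70)
print(round(kane_composite[0], 1))
```

The design point this illustrates: with raw weighted means, a component with a larger variance carries more effective weight than its nominal weight; standardizing first keeps the intended weights.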
Weights and Composite Score Reliability
Psychometrically – what "weight" maximizes composite score reliability?
[Figure: Composite score reliability (y-axis, roughly .25 to .60) vs. Patient Note weight, 0–100% (x-axis), for two composites: SP-PE and PN; SP-Hx, SP-PE, and PN. Reliability values of .42, .46, and .51 are annotated on the curves.]
Pass-Fail Classification
[Figure: Composite score (y-axis, 0–100) vs. weighted GCE % score for the Integrated Clinical Encounter (x-axis, 40–80)]
Misclassification: n = 16, 4.5%
Weighted GCE % cutscore = 52%
Composite cutscore = 9
Misclassification: n = 1, 0.3%
Classification accuracy (κ) = .77
Park et al, Academic Medicine, in press
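To make the classification-accuracy figure concrete, here is a minimal sketch of Cohen's κ between two pass/fail decisions; the cutscores echo those above (GCE 52%, composite 9), but the student scores are made-up placeholders, so the resulting κ is only illustrative.

```python
# Sketch: classification agreement (Cohen's kappa) between two pass/fail decisions.
# Cutscores echo the slide (GCE 52%, composite 9); the scores are made-up placeholders.

gce_scores = [48, 55, 60, 50, 70, 53, 45, 66, 58, 51]
composite_scores = [8, 10, 12, 8, 15, 10, 7, 13, 11, 9]
gce_cut, composite_cut = 52, 9

gce_pass = [s >= gce_cut for s in gce_scores]
comp_pass = [s >= composite_cut for s in composite_scores]

n = len(gce_pass)
observed = sum(a == b for a, b in zip(gce_pass, comp_pass)) / n
p_gce, p_comp = sum(gce_pass) / n, sum(comp_pass) / n
expected = p_gce * p_comp + (1 - p_gce) * (1 - p_comp)   # chance agreement
kappa = (observed - expected) / (1 - expected)
print(round(kappa, 2))
```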
Paper 5: Multisite Study (Project Years: 2015–2016)
Validity Evidence and Scoring Guidelines for Standardized Patient Encounters and Patient Notes from a Multisite Study of Clinical Performance Examinations in Seven Medical Schools
Yoon Soo Park, PhD, Abbas Hyderi, MD, MPH, Nancy Heine MEd, RN, ANP, Win May MD, PhD, Andrew Nevins, MD, Ming Lee, PhD, Georges Bordage, MD, PhD, and Rachel Yudkowsky, MD, MHPE
Why multisite validity study?
• Enhance psychometric rigor → Validity
Psychometric
• Greater heterogeneity of learner samples
• Reliability and validity inferences
Educational
• Broaden mix of clinical cases
• Shared case development
• Develop common rater training protocols
• Evaluate program
• Many more!
Multisite Participants
Validity Evidence:
• UI College of Medicine (n = 160)
• California Consortium for the Assessment of Clinical Competence (CCACC) (n = 830)
  1. Loma Linda University
  2. UC Davis
  3. UCLA
  4. UC San Diego
  5. University of Southern California
  6. Stanford University
Generalize inferences? Cross-institutional consequential evidence
Multisite Validity Evidence
• Consistent with previous findings
• Notable school-effect differences on patient note scores
  – Evidence from variance components
  – Differences in effective teaching of clinical reasoning
• Ability to "justify" diagnoses → most discriminating
• Develop case-specific scoring guidelines – more accessible
[Figure: Patient Note scores (%) by School ID (1–7), with 95% confidence intervals; y-axis roughly 65–80%]
Program Evaluation
Why is performance on Medical School Clinical Skills Exam important?
[Figure: Correlations between medical school clinical skills exam components and the Step 2 CS Exam – Standardized Patient Encounter, Patient Note, and Communication and Interpersonal Skills; r = .32, r = .75, and r = .56]
Berg et al, Academic Medicine, 2008
Cuddy et al, Academic Medicine, 2016
Residency Performance: Significant Prediction!
Current Studies on the Patient Note (1) – Revisit Project (8 medical schools)
Reflect and Revisit Patient
Blatt et al, Journal of General Internal Medicine, 2007
Reflection Study – Standardized Patient Encounter
Current Studies on the Patient Note (2) – Non-Physician Rater (15 medical schools)
• Scoring Patient Note – Expensive
• Cost of training and scoring notes – UIC estimates:
– Physician: $40,869 (excluding case development costs)
– Non‐Physicians: $9,990
• California pilot study (4 medical schools) – Physician and Non-Physician correlation: r = .64
Translational Science in Education
[Figure: Translational pipeline comparison – McGaghie WC, Science Translational Medicine, 2010]
K-12 Education (example): Teacher training and development → Teacher knowledge and skills → Classroom teaching → Student achievement (Allen et al, Science, 2011)
Medical Education: Trainee clinical skills and knowledge → Clinics and offices for better health care → Patient or public health outcome
Stages: T1 (education setting) → T2 (clinic) → T3 (clinic and community)
Yoon Soo Park, PhD – email: [email protected]