Introduction to Test Development
Graham McMahon, MD, MMSc.
Sarah E. Peyre, EdD
Educational Research Methods Program
Learning Objectives
  Understand the pros and cons of the various question types for written examinations
  Learn how to determine
    Item difficulty
    Item discrimination
  Understand the psychometrics of a high-stakes test
    Validity
    Reliability
    Standard setting
Come to our Workshop!
  Work in small groups to…
    Review problematic multiple-choice items
    Establish validity and reliability for a test
    Participate in a standard-setting exercise
Question Types – Pros and Cons
  Essay items
  Short-answer and completion items
  Matching items
  True-false and multiple-choice tests
  Interviews
  Portfolios
  All can be scored, and all can be subject to test development.
Multiple-Choice Items
  Stem: An 85-year-old woman has difficulty raising her arms above her head and combing her hair. She has morning aches in her shoulders and neck. Her reflexes are symmetrical and normal. There is no muscle tenderness or joint swelling.
  Lead-in: Which one of the following laboratory tests should be obtained to confirm the most likely diagnosis?
  Responses (one correct response; the others are distractors):
    A. Anti-nuclear antibody.
    B. Erythrocyte sedimentation rate.
    C. Serum concentration of creatine kinase.
    D. Serum concentration of angiotensin-converting enzyme.
    E. Urine microscopy.
Tips for writing discriminant MCQs
  Be sure that each item reflects a clearly defined learning outcome
  Stem
    The stem of the item should be self-contained and written in clear and precise language.
    Avoid ‘trigger’ words (e.g. pill-rolling tremor)
    Negatives, excepts, absolutes and qualifiers in question stems are no-no’s.
  Responses
    All answers should be plausible and homogeneous
    Items need to be independent of one another
    Answer choices should be similar in length and grammatical form
    List answer choices in alphabetical or numerical order
    Avoid ‘all of the above’ as a response
    Avoid technical flaws (tense or plurality, for example)
Pros and Cons of MCQs
  Pros
    Useful for measuring learning outcomes at almost any level
    Easy to understand
    Easy to score
    Easily analyzed for effectiveness
    Allow broad coverage efficiently
  Cons
    Good questions
      Take a long time to write
      Are difficult to write
    Constrain creative responses from learners
    May have more than one correct answer
Item Analysis
  Qualitative: looks at whether the content matches the information, attitude, characteristic or behavior being assessed
  Quantitative:
    Item difficulty
    Item discrimination
Determining item difficulty
  Item difficulty: the percentage of participants who get that item correct
  Item difficulty scores can range from 0 to 100%
    Low value = high difficulty
    High value = low difficulty
  Difficulty level      Percent correct
  High (Difficult)      <= 30%
  Medium (Moderate)     > 30% and < 80%
  Low (Easy)            >= 80%
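
The definition and bands above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample responses are hypothetical, and the band boundaries are the ones quoted on the slide.

```python
def item_difficulty(responses):
    """Percentage of test takers who answered the item correctly.

    `responses` holds one boolean per test taker (True = correct).
    """
    if not responses:
        raise ValueError("need at least one response")
    return 100.0 * sum(responses) / len(responses)


def difficulty_band(percent_correct):
    """Map a percent-correct value onto the three bands from the slide."""
    if percent_correct <= 30:
        return "High (Difficult)"
    if percent_correct < 80:
        return "Medium (Moderate)"
    return "Low (Easy)"


# Hypothetical item: 12 of 40 students answered correctly.
responses = [True] * 12 + [False] * 28
p = item_difficulty(responses)
print(p, difficulty_band(p))
```

Note that a low percentage means a hard item, so the value is really an "easiness" score despite its name.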
  [Figure: number of students achieving each score (0 to 100) for a hard exam, a normal exam and an easy exam]
Discrimination Index
  Index of discrimination: the percentage of people in one extreme group who answered correctly minus the percentage in the other extreme group
  Item discrimination scores can range from -1.00 to +1.00
  Example
    100 test takers: 20 of the top 25 students were correct, but only 5 of the lowest 25 students were correct.
    DI = (20 - 5)/25 = 0.6
  The Discrimination Index distinguishes, for each item, between the performance of students who did well on the exam and students who did poorly.
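
As a rough sketch, the index can be computed from the number of correct answers in the upper and lower groups, assuming both groups have the same size. The function name is hypothetical. For the example's counts (20 and 5 correct in groups of 25) the arithmetic is 15/25 = 0.6.

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """Proportion correct in the upper group minus proportion correct
    in the lower group (both groups assumed to have `group_size` members)."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (upper_correct - lower_correct) / group_size


# Slide example: top 25 and bottom 25 of 100 test takers.
di = discrimination_index(upper_correct=20, lower_correct=5, group_size=25)
print(round(di, 2))  # 0.6
```

A negative result means the weaker students outperformed the stronger ones on that item, which usually points to a flawed item or a wrong answer key.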
  Item Discrimination (D) by Item Difficulty
  Item Discrimination (D)    High      Med       Low
  D <= 0%                    review    review    review
  0% < D < 30%               ok        review    ok
  D >= 30%                   ok        ok        ok
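
The review grid can be expressed as a small decision rule. The sketch below assumes the grid reads as follows: any item with D at or below 0 is reviewed, and items with D between 0 and 0.30 are reviewed only when their difficulty is medium (more than 30% and less than 80% correct). The function name is hypothetical.

```python
def needs_review(percent_correct, d):
    """Return True when the review grid flags the item.

    `percent_correct` is the item difficulty (0 to 100);
    `d` is the discrimination index (-1.0 to +1.0).
    """
    if d <= 0:
        return True   # no or negative discrimination: always review
    if d >= 0.30:
        return False  # discriminates well at any difficulty
    # Weak discrimination is tolerated on very hard or very easy items,
    # but not on medium-difficulty items.
    return 30 < percent_correct < 80


print(needs_review(percent_correct=50, d=0.10))  # True
print(needs_review(percent_correct=90, d=0.10))  # False
```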
Item Analysis Report
  The left half shows percentages, the right half counts.
  The correct option is indicated in parentheses.
  Point Biserial is similar to the discrimination index, but is not based on fixed upper and lower groups. For each item, it compares the mean score of students who chose the correct answer to the mean score of students who chose the wrong answer.
  [Figure: sample item analysis report, with callouts for the percentages, the counts, and the order ID and group number]
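
One common way to compute a point-biserial coefficient is sketched below. It uses the standard formula (mean of the correct group minus mean of the incorrect group, divided by the standard deviation of all scores, times the square root of p times q), which may differ in detail from what a particular reporting package does. The function name and the sample data are hypothetical.

```python
from math import sqrt
from statistics import fmean, pstdev


def point_biserial(item_correct, total_scores):
    """Point-biserial correlation between one item and the total score.

    `item_correct` holds one boolean per student (True = correct);
    `total_scores` holds the matching total test scores.
    """
    right = [s for c, s in zip(item_correct, total_scores) if c]
    wrong = [s for c, s in zip(item_correct, total_scores) if not c]
    if not right or not wrong:
        raise ValueError("item needs both correct and incorrect responses")
    sd = pstdev(total_scores)  # population SD of all total scores
    if sd == 0:
        raise ValueError("total scores have no spread")
    p = len(right) / len(total_scores)  # proportion answering correctly
    return (fmean(right) - fmean(wrong)) / sd * sqrt(p * (1 - p))


# Hypothetical data for five students.
item_correct = [True, True, True, False, False]
total_scores = [9, 8, 7, 5, 4]
print(round(point_biserial(item_correct, total_scores), 2))
```

Because every student contributes, the coefficient is less sensitive than the discrimination index to where the upper and lower cut-offs happen to fall.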
Test Validity
  Validity: the extent to which inferences made from a test are appropriate, meaningful, or useful.
  Does my test measure what it is intended to measure?
  Content validity
    Expert review
  Criterion validity (predictive/concurrent)
    Scores can be related to another known metric
  Construct validity
    Successfully differentiates between levels of learners
Kissing Cousins
  A test cannot be valid until it is reliable.
Test Reliability
  Reliability: measuring the underlying construct consistently (trustworthiness/stability)
    Test-retest reliability
    Alternate-forms reliability
    Internal consistency reliability (Cronbach’s alpha)
    Inter-rater reliability
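
Of the reliability measures listed, internal consistency is the easiest to compute from a single test administration. The sketch below uses the usual formula for Cronbach's alpha, k/(k-1) times (1 minus the sum of the item variances divided by the variance of the total scores). The function name and the small score matrix are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    `scores` is a list of rows, one per student; each row holds that
    student's score on every item (for example 1 = correct, 0 = wrong).
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have no spread")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical results: five students, four items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(scores), 2))  # 0.8
```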
How do I set a passing grade?
  Standard Setting
    Norm-referenced: z-scores
      Number of standard deviations below the mean
    Criterion-referenced: Angoff method
      A panel of experts is asked to evaluate each item and estimate the fraction of minimally competent students who would answer it correctly
      Ratings are averaged across the experts for each item, discussed, and then summed to get the panel's raw cut score
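
Both approaches reduce to short calculations, sketched below under simple assumptions. For the Angoff method, each expert supplies one fraction per item and the discussion step is ignored. For the norm-referenced cut, the number of standard deviations below the mean is a free parameter. All names, ratings and scores are hypothetical.

```python
from statistics import fmean, pstdev


def angoff_cut_score(ratings):
    """Raw cut score from Angoff ratings.

    `ratings[e][i]` is expert e's estimate of the fraction of minimally
    competent students who would answer item i correctly.
    """
    n_items = len(ratings[0])
    item_means = [fmean([expert[i] for expert in ratings])
                  for i in range(n_items)]
    return sum(item_means)  # expected raw score of a borderline student


def norm_referenced_cut(scores, sds_below_mean=1.0):
    """Cut score placed a chosen number of SDs below the cohort mean."""
    return fmean(scores) - sds_below_mean * pstdev(scores)


# Hypothetical panel: three experts rating a four-item test.
ratings = [
    [0.6, 0.8, 0.5, 0.9],
    [0.7, 0.7, 0.4, 0.9],
    [0.5, 0.9, 0.6, 0.9],
]
print(round(angoff_cut_score(ratings), 2))  # 2.8 out of 4 items

# Hypothetical cohort scores, cut one SD below the mean.
print(round(norm_referenced_cut([60, 70, 80, 90, 100]), 1))  # 65.9
```

The Angoff cut depends only on the items, while the norm-referenced cut moves with each cohort's performance.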
Thank you!
Welcome to Our Workshop on Test Development!
Graham McMahon, MD, MMSc.
Sarah E. Peyre, EdD
Educational Research Methods
The Academy at Harvard Medical School
Outline
  Learning Objectives
  Creating MCQ Items
    Item Template
    Item Flaws
    Tips for Success
  Establishing Validity and Reliability for a Test
  Mock Standard Setting
Item Creation
  Consider beginning with the end in mind
  What is it that you think the medical student should demonstrate that he/she knows or knows how to do?
  This should be an objective from your lesson plan.
  [Diagram: Objectives, Learning Activities, Evaluation]
Item Stems: Clinical Vignettes
  Things to consider:
    Patient description (46-year-old female)
    Functional disability (difficulty rising from a seated position, but has no difficulty flexing her legs)
  The question based on this item template:
    A 46-year-old female has difficulty rising from a seated position, but has no difficulty flexing her legs. Which of the following muscles has been injured?
    [Objective: Identify and explain the function of the muscles in the…. ]
Item Creation
  Lead-in: The most likely diagnosis is
    Options: disorders, diseases
    Objective: Describe the signs and symptoms of X. Compare and contrast the signs and symptoms of X, Y and Z.
  Lead-in: Which of the following additional symptoms would you expect to be present?
    Options: symptoms
    Objective: same as above
  Lead-in: The most likely cause is
    Options: bacteria, toxins, medications, metabolic defects
    Objective: List and explain the causes of X.
  Lead-in: The most likely mechanism is
    Options: disease mechanisms, pharmacologic mechanisms
    Objective: Diagram and explain the mechanism of drug X.
Item Templates
  Other considerations:
    Age, gender, race, ethnicity
    Site of care (ER, office visit)
    Presenting complaint
      presents for a routine physical exam
      presents with a headache
    Duration
    Patient history, family history
      There is no history of…
      He has a history of…
    Physical findings
    Lab values, imaging studies, pathology reports
    Treatment, subsequent findings
Item Creation
  Add the lead-in (question) and the options
  Which of the following pulmonary variables is most likely to be lower than normal in this patient?
    A. Alveolar-arterial PO2 difference
    B. Compliance of the lung
    C. Oncotic pressure of the alveolar fluid
    D. Work of breathing
    E. Residual volume
Item Creation: Taking Recall up to Another Level
  Recall question:
    What area is supplied with blood by the posterior inferior cerebellar artery?
    [Objective: Identify the areas of the brain supplied by the major cerebral arteries.]
Item Creation: Taking Recall up to Another Level
  Application question:
    A 62-year-old man develops left-sided limb ataxia, Horner’s syndrome, nystagmus and loss of facial pain and temperature sensation. Which artery is most likely to be occluded?
    [Objective: Differentiate the signs and symptoms that would occur upon occlusion of each of the major cerebral arteries.]
Your Turn!
  Review the distributed questions and identify strengths and weaknesses in each.
Question
  Acute intermittent porphyria is the result of a defect in the biosynthetic pathway for
    A. collagen
    B. corticosteroid
    C. fatty acid
    D. glucose
    E. heme
Rewritten….
  An otherwise healthy 33-year-old male has mild weakness and occasional episodes of steady, severe abdominal pain with some cramping but no diarrhea. One aunt and a cousin have had similar episodes. During an episode, his abdomen is distended, and bowel sounds are decreased. Neurological examination shows mild weakness in the upper arms. These findings suggest a defect in the biosynthetic pathway for:
    A. collagen
    B. corticosteroid
    C. fatty acid
    D. glucose
    E. heme
Question
  A 52-year-old male presents to the office with a one-week history of flank pain and hematuria. Past medical history is unremarkable. Physical examination reveals a left-sided abdominal mass. The greatest risk factor for renal cell carcinoma is
    A. diabetes
    B. female gender
    C. hyperlipidemia
    D. low body mass index
    E. smoking
Question
  Which of the following is a correct statement about cystic fibrosis (CF)?
    A. The incidence of CF is 1:2000.
    B. Children with CF usually die in their teens.
    C. Males with CF are sterile.
    D. CF is an autosomal recessive disease.
    E. Symptoms of CF only appear in infancy.
  What other flaws can you detect in this question?
Item Flaws: Unfocused items
  Which of the following is correct regarding [topic]?
    There is not enough information in the stem to answer the question without looking at the options.
    The responses are disparate.
    The distractors have to be 100% false. Thus, the question basically becomes a true/false question.
    Avoid these!
  A 45-year-old man comes to the physician because of a 6-week history of a non-productive cough. An X-ray film of the chest shows a 0.8 cm well-circumscribed peripheral nodule in the right lung. Biopsy shows a necrotizing granuloma. Which of the following is the most likely diagnosis?
    (A) Pulmonary embolus
    (B) Small cell carcinoma
    (C) Pseudomonas aeruginosa infection
    (D) Histoplasma capsulatum
    (E) Herpes pneumonitis
    (F) Metastatic renal cell carcinoma
  A healthy 57-year-old woman comes to the physician because of a 2 cm mass in her right breast. Biopsy reveals an invasive ductal carcinoma. Which of the following is the most important prognostic factor?
    (A) High grade tumor cytology
    (B) Infiltrative nature of tumor into benign breast
    (C) Numerous mitotic figures
    (D) Amount of tumor fibrosis
    (E) Presence of lymph node metastasis
    (F) Number of plasma cells in tumor
  A 63-year-old man comes to the physician because of a 6-week history of progressive dyspnea on exertion, orthopnea, and ankle edema. He has received multiagent chemotherapy for Waldenström’s macroglobulinemia for the past year. Urinalysis shows proteinuria. A bone marrow biopsy shows a partial response to therapy with ongoing marrow involvement still identified. Which of the following is the most likely diagnosis?
    (A) Cardiac amyloidosis
    (B) Viral myocarditis
    (C) Cardiac sarcoidosis
    (D) Myocardial infarct
    (E) Hypertrophic cardiomyopathy
A question submitted
  In aortic stenosis what other abnormal heart sounds might accompany the resulting murmur?
    A. Physiological splitting of S2
    B. An accentuated S2
    C. Paradoxical splitting of S2
    D. A muffled S2
Revised question
  A 60-year-old patient with an active lifestyle is found to have a systolic murmur on a routine physical exam. He currently has no symptoms. If this were aortic stenosis, what other abnormal heart sounds might accompany the systolic murmur?
    A. Physiological splitting of S2
    B. An accentuated S2
    C. Paradoxical splitting of S2
    D. A muffled S2
Summary
  Utilize action verbs to write objectives
  Write your exam items based on the objectives
  Tie the clinical vignette to the lead-in
  Choose appropriate options with one best answer
  Avoid technical flaws
  Utilize an item checklist to ensure that you have done all you can to write the best items possible.
  Pretest your items
Establishing Validity and Reliability (Groups)
Standard Setting (Groups)
Item Discrimination: Examples
  Number of students per group = 100
  Item No.    Correct in upper 1/4    Correct in lower 1/4    Item Discrimination Index
  1           90                      20                       0.7
  2           80                      70                       0.1
  3           100                     0                        1
  4           100                     100                      0
  5           50                      50                       0
  6           20                      60                      -0.4
Distracter Analysis: Examples
  Item 1                         A*    B     C     D     E     Omit
  % of students in upper ¼       20    5     0     0     0     0
  % of students in the middle    15    10    10    10    5     0
  % of students in lower ¼       5     5     5     10    0     0
  (*) marks the correct answer.
  Item 2                         A     B     C     D*    E     Omit
  % of students in upper ¼       0     5     5     15    0     0
  % of students in the middle    0     10    15    5     20    0
  % of students in lower ¼       0     5     10    0     10    0
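
A simple screen for such tables is sketched below. It rests on two assumptions that are common rules of thumb rather than anything stated on the slides: a distractor nobody selects is not doing any work, and a distractor chosen more often by the upper group than by the lower group deserves a second look. The function name and the labels are hypothetical; the percentages are those of the two example items.

```python
def distractor_report(upper, middle, lower, key):
    """Label each distractor of one item.

    `upper`, `middle` and `lower` map each option to the percentage of
    all students in that group who chose it; `key` is the correct option.
    """
    report = {}
    for option in upper:
        if option == key:
            continue  # only distractors are screened
        chosen = upper[option] + middle[option] + lower[option]
        if chosen == 0:
            report[option] = "never chosen"
        elif upper[option] > lower[option]:
            report[option] = "attracts stronger students"
        else:
            report[option] = "working"
    return report


# Item 2 from the example table (correct answer D).
item2 = distractor_report(
    upper={"A": 0, "B": 5, "C": 5, "D": 15, "E": 0},
    middle={"A": 0, "B": 10, "C": 15, "D": 5, "E": 20},
    lower={"A": 0, "B": 5, "C": 10, "D": 0, "E": 10},
    key="D",
)
print(item2["A"])  # never chosen
```

On Item 2, option A draws no students at all, so replacing it with a more plausible alternative would be a reasonable first revision.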