Introduction to Test Development Graham McMahon, MD, MMSc. Sarah E. Peyre, EdD Educational Research Methods Program

Page 1: Introduction to Test Development Graham McMahon, MD, MMSc. Sarah E. Peyre, EdD Educational Research Methods Program

Introduction to Test Development

Graham McMahon, MD, MMSc.
Sarah E. Peyre, EdD
Educational Research Methods Program

Page 2: Learning Objectives

  • Understand the pros and cons of various question types for written examinations
  • Learn how to determine item difficulty and item discrimination
  • Understand the psychometrics of a high-stakes test: validity, reliability, standard setting

Page 3: Come to our Workshop!

Work in small groups to:
  • Review problematic multiple-choice items
  • Establish validity and reliability for a test
  • Participate in a standard-setting exercise

Page 4: Question Types: Pros and Cons

  • Essay items
  • Short-answer and completion items
  • Matching items
  • True-false and multiple-choice tests
  • Interviews
  • Portfolios

All can be scored and can be subject to test development.

Page 5: Multiple-Choice Items

Stem: An 85-year-old woman has difficulty raising her arms above her head and combing her hair. She has morning aches in her shoulders and neck. Her reflexes are symmetrical and normal. There is no muscle tenderness or joint swelling.

Lead-in: Which one of the following laboratory tests should be obtained to confirm the most likely diagnosis?

Responses (one correct response plus distractors):
A. Anti-nuclear antibody.
B. Erythrocyte sedimentation rate.
C. Serum concentration of creatine kinase.
D. Serum concentration of angiotensin-converting enzyme.
E. Urine microscopy.

Page 6: Tips for writing discriminating MCQs

Be sure that each item reflects a clearly defined learning outcome.

Stem
  • The stem of the item should be self-contained and written in clear and precise language.
  • Avoid 'trigger' words (e.g. pill-rolling tremor).
  • Avoid negatives, "except" phrasing, absolutes and qualifiers in question stems.

Responses
  • All answers should be plausible and homogeneous.
  • Items need to be independent of one another.
  • Answer choices should be similar in length and grammatical form.
  • List answer choices in alphabetical or numerical order.
  • Avoid 'all of the above' as a response.
  • Avoid technical flaws (inconsistent tense or plurality, for example).

Page 7: Pros and Cons of MCQs

Pros
  • Useful for measuring learning outcomes at almost any level
  • Easy to understand
  • Easy to score
  • Easily analyzed for effectiveness
  • Allow broad coverage efficiently

Cons
  • Good questions take a long time to write
  • Good questions are difficult to write
  • Constrain creative responses from learners
  • May have more than one correct answer

Page 8: Item Analysis

  • Qualitative: looks at whether the content matches the information, attitude, characteristic or behavior being assessed
  • Quantitative: item difficulty and item discrimination

Page 9: Determining item difficulty

Item difficulty is the percentage of participants who get that item correct. Item difficulty scores can range from 0 to 100%:
  • Low value = high difficulty
  • High value = low difficulty

Difficulty level      Percentage answering correctly
High (Difficult)      <= 30%
Medium (Moderate)     > 30% and < 80%
Low (Easy)            >= 80%

[Figure: number of students achieving each score (0 to 100) for a hard exam, a normal exam and an easy exam]
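As a rough illustration of the definition above, item difficulty can be computed from a list of right/wrong responses and then mapped to the slide's three bands. This is only a sketch: the function names `item_difficulty` and `difficulty_band` are made up for this example, and the cutoffs are simply the 30% and 80% thresholds quoted on the slide.

```python
from typing import List


def item_difficulty(responses: List[bool]) -> float:
    """Percentage of test takers who answered the item correctly (0 to 100)."""
    if not responses:
        raise ValueError("need at least one response")
    return 100.0 * sum(responses) / len(responses)


def difficulty_band(percent_correct: float) -> str:
    """Map a percentage correct to the slide's bands (hypothetical helper)."""
    if percent_correct <= 30:
        return "high"    # few students correct: a difficult item
    if percent_correct < 80:
        return "medium"  # moderate item
    return "low"         # most students correct: an easy item


if __name__ == "__main__":
    # 3 of 10 students answered correctly.
    answers = [True] * 3 + [False] * 7
    pct = item_difficulty(answers)
    print(pct, difficulty_band(pct))
```

Note that a low percentage means a hard item, so the "high" band corresponds to the smallest values.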

Page 10: Discrimination Index

The Discrimination Index distinguishes, for each item, between the performance of students who did well on the exam and students who did poorly.

Index of discrimination: the percentage of people answering the item correctly in one extreme group (top scorers) minus the percentage answering correctly in the other extreme group (bottom scorers).

Item discrimination scores can range from -1.00 to +1.00.

Example: 100 test takers; 20 of the top 25 students were correct, but only 5 of the lowest 25 students were correct.

DI = (20 - 5)/25 = 0.6

Item Discrimination (D)    High difficulty    Medium difficulty    Low difficulty
D <= 0%                    review             review               review
0% < D < 30%               ok                 review               ok
D >= 30%                   ok                 ok                   ok
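The calculation and the review table above can be sketched in a few lines. The function names are hypothetical, and the review rule is just a direct transcription of the table; for the slide's example (20 and 5 correct in groups of 25) the arithmetic gives 15/25 = 0.6.

```python
def discrimination_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """(correct in upper group - correct in lower group) / students per group."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (upper_correct - lower_correct) / group_size


def review_flag(d: float, difficulty: str) -> str:
    """Apply the slide's review table; difficulty is 'high', 'medium' or 'low'."""
    if d <= 0:
        return "review"
    if d < 0.30:
        # Only medium-difficulty items with weak discrimination are flagged.
        return "review" if difficulty == "medium" else "ok"
    return "ok"


if __name__ == "__main__":
    d = discrimination_index(upper_correct=20, lower_correct=5, group_size=25)
    print(round(d, 2), review_flag(d, "medium"))
```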

Page 11: Item Analysis Report

  • The left half shows percentages, the right half counts.
  • The correct option is indicated in parentheses.
  • Point biserial is similar to the discrimination index, but is not based on fixed upper and lower groups. For each item, it compares the mean score of students who chose the correct answer to the mean score of students who chose the wrong answer.

[Figure: item analysis report with percentages on the left and counts on the right; labels include order, ID and group number]
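A minimal sketch of the point-biserial idea described above, using the standard textbook formula r = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean total scores of students who got the item right and wrong, s is the population standard deviation of all total scores, and p and q are the proportions right and wrong. The function name and the sample data are invented for illustration; reporting software may use a variant (for example, excluding the item from the total score).

```python
from math import sqrt
from statistics import mean, pstdev
from typing import Sequence


def point_biserial(item_correct: Sequence[bool], total_scores: Sequence[float]) -> float:
    """Correlation between getting one item right and the total test score."""
    if len(item_correct) != len(total_scores):
        raise ValueError("inputs must have the same length")
    right = [s for c, s in zip(item_correct, total_scores) if c]
    wrong = [s for c, s in zip(item_correct, total_scores) if not c]
    if not right or not wrong:
        raise ValueError("undefined when everyone or no one answers correctly")
    sd = pstdev(total_scores)
    if sd == 0:
        raise ValueError("undefined when all total scores are equal")
    p = len(right) / len(total_scores)
    return (mean(right) - mean(wrong)) / sd * sqrt(p * (1 - p))


if __name__ == "__main__":
    # Made-up data: the three highest scorers got the item right.
    correct = [True, True, True, False, False, False]
    scores = [9, 8, 7, 3, 2, 1]
    print(round(point_biserial(correct, scores), 3))
```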

Page 12: Test Validity

Validity: the extent to which inferences made from a test are appropriate, meaningful, or useful. Does my test measure what it is intended to measure?
  • Content validity: expert review
  • Criterion validity (predictive/concurrent): scores can be related to another known metric
  • Construct validity: successfully differentiates between levels of learners
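One common way to relate scores to another known metric, as in the criterion validity point above, is a Pearson correlation between the test scores and the external measure. The sketch below assumes that approach; the slides do not prescribe a particular statistic, and the function name and data are illustrative only.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between test scores (xs) and a criterion measure (ys)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up data: new test scores vs. scores on an established exam.
    new_test = [55, 62, 70, 81, 90]
    established = [50, 65, 68, 85, 88]
    print(round(pearson_r(new_test, established), 2))
```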

Page 13: Kissing Cousins

A test cannot be valid unless it is reliable.

Page 14: Test Reliability

Reliability: the test measures the underlying construct consistently (trustworthiness/stability).
  • Test-retest reliability
  • Alternate-forms reliability
  • Internal consistency reliability (Cronbach's alpha)
  • Inter-rater reliability
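Cronbach's alpha, mentioned above for internal consistency, is usually computed as alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The following is a small sketch of that standard formula; the function name and the `scores[student][item]` layout are assumptions made for this example.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a matrix laid out as scores[student][item]."""
    n_students = len(scores)
    n_items = len(scores[0]) if n_students else 0
    if n_students < 2 or n_items < 2:
        raise ValueError("need at least 2 students and 2 items")
    # Sample variance of each item across students.
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of each student's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Made-up 0/1 item scores for four students on three items.
    data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(round(cronbach_alpha(data), 2))
```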

Page 15: How do I set a passing grade?

Standard Setting
  • Norm-referenced: Z-scores
      The passing score is set a number of standard deviations below the mean.
  • Criterion-referenced: Angoff method
      A panel of experts is asked to evaluate each item and estimate the fraction of minimally competent students who would answer that item correctly.
      Ratings are averaged across the experts for each item, discussed, and then summed to get the panel's raw cut score.
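Both approaches above reduce to short calculations. The sketch below assumes a `ratings[expert][item]` layout of Angoff estimates (fractions between 0 and 1) and uses the population standard deviation for the norm-referenced cut; the function names, the data layout and the choice of standard deviation are illustrative assumptions, not something the slides specify. The discussion step between averaging and summing is not modeled.

```python
from statistics import mean, pstdev
from typing import Sequence


def angoff_cut_score(ratings: Sequence[Sequence[float]]) -> float:
    """Average each item's ratings across experts, then sum over items."""
    if not ratings or not ratings[0]:
        raise ValueError("need at least one expert and one item")
    n_items = len(ratings[0])
    if any(len(expert) != n_items for expert in ratings):
        raise ValueError("every expert must rate every item")
    item_means = [mean([expert[i] for expert in ratings]) for i in range(n_items)]
    return sum(item_means)  # raw cut score, in number of items


def norm_referenced_cut(scores: Sequence[float], sd_below_mean: float) -> float:
    """Passing score set a given number of standard deviations below the mean."""
    return mean(scores) - sd_below_mean * pstdev(scores)


if __name__ == "__main__":
    # Two hypothetical experts rating a two-item test.
    panel = [[0.5, 0.75], [0.5, 0.25]]
    print(angoff_cut_score(panel))
    print(round(norm_referenced_cut([60, 70, 80], 1.0), 2))
```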

Page 16: Thank you!

Page 17: Welcome to Our Workshop on Test Development!

Graham McMahon, MD, MMSc.
Sarah E. Peyre, EdD
Educational Research Methods
The Academy at Harvard Medical School

Page 18: Outline

  • Learning Objectives
  • Creating MCQ Items
      Item Template
      Item Flaws
      Tips for Success
  • Establishing Validity and Reliability for a Test
  • Mock Standard Setting

Page 19: Item Creation

Consider beginning with the end in mind.
  • What is it that you think the medical student should demonstrate that he/she knows or knows how to do?
  • This should be an objective from your lesson plan.

[Figure: Objectives, Learning Activities, Evaluation]

Page 20: Item Stems: Clinical Vignettes

Things to consider:
  • Patient description (46-year-old female)
  • Functional disability (difficulty rising from a seated position, but has no difficulty flexing her legs)

The question based on this item template:

A 46-year-old female has difficulty rising from a seated position, but has no difficulty flexing her legs. Which of the following muscles has been injured?

[Objective: Identify and explain the function of the muscles in the…. ]

Page 21: Item Creation

Lead-in: The most likely diagnosis is
Options: disorders, diseases
Objective: Describe the signs and symptoms of X. Compare and contrast the signs and symptoms of X, Y and Z.

Lead-in: Which of the following additional symptoms would you expect to be present?
Options: symptoms
Objective: same as above

Lead-in: The most likely cause is
Options: bacteria, toxins, medications, metabolic defects
Objective: List and explain the causes of X.

Lead-in: The most likely mechanism is
Options: disease mechanisms, pharmacologic mechanisms
Objective: Diagram and explain the mechanism of drug X.

Page 22: Item Templates

Other considerations:
  • Age, gender, race, ethnicity
  • Site of care (ER, office visit)
  • Presenting complaint
      presents for a routine physical exam
      presents with a headache
  • Duration
  • Patient history, family history
      There is no history of…
      He has a history of…
  • Physical findings
  • Lab values, imaging studies, pathology reports
  • Treatment, subsequent findings

Page 23: Item Creation

Add the lead-in (question) and the options:

Which of the following pulmonary variables is most likely to be lower than normal in this patient?

A. Alveolar-arterial PO2 difference
B. Compliance of the lung
C. Oncotic pressure of the alveolar fluid
D. Work of breathing
E. Residual volume

Page 24: Item Creation: Taking Recall up to Another Level

Recall question:
What area is supplied with blood by the posterior inferior cerebellar artery?

[Objective: Identify the areas of the brain supplied by the major cerebral arteries.]

Page 25: Item Creation: Taking Recall up to Another Level

Application question:
A 62-year-old man develops left-sided limb ataxia, Horner’s syndrome, nystagmus and loss of facial pain and temperature. Which artery is most likely to be occluded?

[Objective: Differentiate the signs and symptoms that would occur upon occlusion of each of the major cerebral arteries.]

Page 26: Your Turn!

Review the distributed questions and identify strengths and weaknesses in each.

Page 27: Question

Acute intermittent porphyria is the result of a defect in the biosynthetic pathway for

A. collagen
B. corticosteroid
C. fatty acid
D. glucose
E. heme

Page 28: Rewritten…

An otherwise healthy 33-year-old male has mild weakness and occasional episodes of steady, severe abdominal pain with some cramping but no diarrhea. One aunt and a cousin have had similar episodes. During an episode, his abdomen is distended, and bowel sounds are decreased. Neurological examination shows mild weakness in the upper arms. These findings suggest a defect in the biosynthetic pathway for:

A. collagen
B. corticosteroid
C. fatty acid
D. glucose
E. heme

Page 29: Question

A 52-year-old male presents to the office with a one-week history of flank pain and hematuria. Past medical history is unremarkable. Physical examination reveals a left-sided abdominal mass. The greatest risk factor for renal cell carcinoma is

A. diabetes
B. female gender
C. hyperlipidemia
D. low body mass index
E. smoking

Page 30: Question

Which of the following is a correct statement about cystic fibrosis (CF)?

A. The incidence of CF is 1:2000.
B. Children with CF usually die in their teens.
C. Males with CF are sterile.
D. CF is an autosomal recessive disease.
E. Symptoms of CF only appear in infancy.

What other flaws can you detect in this question?

Page 31: Item Flaws: Unfocused items

Which of the following is correct regarding [topic]?

  • There is not enough information in the stem to answer the question without looking at the options.
  • The responses are disparate.
  • The distractors have to be 100% false. Thus, the question basically becomes a true/false question. Avoid these!

Page 32

A 45-year-old man comes to the physician because of a 6 week history of a non-productive cough. An X-ray film of the chest shows a 0.8 cm well circumscribed peripheral nodule in the right lung. Biopsy shows a necrotizing granuloma. Which of the following is the most likely diagnosis?

(A) Pulmonary embolus
(B) Small cell carcinoma
(C) Pseudomonas aeruginosa infection
(D) Histoplasma capsulatum
(E) Herpes pneumonitis
(F) Metastatic renal cell carcinoma

Page 33

A healthy 57-year-old woman comes to the physician because of 2 cm mass in her right breast. Biopsy reveals an invasive ductal carcinoma. Which of the following is the most important prognostic factor?

(A) High grade tumor cytology
(B) Infiltrative nature of tumor into benign breast
(C) Numerous mitotic figures
(D) Amount of tumor fibrosis
(E) Presence of Lymph node metastasis
(F) Number of plasma cells in tumor

Page 34

A 63-year-old man comes to the physician because of a 6-week history of progressive dyspnea on exertion, orthopnea, and ankle edema. He has received multiagent chemotherapy for Waldenström’s macroglobulinemia for the past year. Urinalysis shows proteinuria. A bone marrow biopsy shows a partial response to therapy with ongoing marrow involvement still identified. Which of the following is the most likely diagnosis?

(A) Cardiac amyloidosis
(B) Viral myocarditis
(C) Cardiac sarcoidosis
(D) Myocardial infarct
(E) Hypertrophic cardiomyopathy

Page 35: A question submitted

In aortic stenosis what other abnormal heart sounds might accompany the resulting murmur?

A. Physiological splitting of S2
B. An accentuated S2
C. Paradoxical splitting of S2
D. A muffled S2

Page 36: Revised question

A 60-year-old patient with an active lifestyle is found to have a systolic murmur on a routine physical exam. He currently has no symptoms. If this were aortic stenosis, what other abnormal heart sounds might accompany the systolic murmur?

A. Physiological splitting of S2
B. An accentuated S2
C. Paradoxical splitting of S2
D. A muffled S2

Page 40: Summary

  • Utilize action verbs to write objectives
  • Write your exam items based on the objectives
  • Tie the clinical vignette to the lead-in
  • Choose appropriate options with one best answer
  • Avoid technical flaws
  • Utilize an item checklist to ensure that you have done all you can to write the best items possible
  • Pretest your items

Page 41: Establishing Validity and Reliability (Groups)

Page 42: Standard Setting (Groups)

Page 43: Graham McMahon

[email protected]

Page 44: Item Discrimination: Examples

Number of students per group = 100

Item No.    Correct in upper 1/4    Correct in lower 1/4    Item Discrimination Index
1           90                      20                      0.7
2           80                      70                      0.1
3           100                     0                       1
4           100                     100                     0
5           50                      50                      0
6           20                      60                      -0.4
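The index values in this example can be reproduced by taking the number correct in the upper group, subtracting the number correct in the lower group, and dividing by the group size of 100. A short sketch, with made-up variable and function names:

```python
from typing import Dict, List, Tuple

GROUP_SIZE = 100  # students per group, as stated on the slide

# (item number, correct in upper 1/4, correct in lower 1/4)
EXAMPLE_ITEMS: List[Tuple[int, int, int]] = [
    (1, 90, 20),
    (2, 80, 70),
    (3, 100, 0),
    (4, 100, 100),
    (5, 50, 50),
    (6, 20, 60),
]


def discrimination_indices(items: List[Tuple[int, int, int]], group_size: int) -> Dict[int, float]:
    """Return {item number: discrimination index} for each row."""
    return {item: (upper - lower) / group_size for item, upper, lower in items}


if __name__ == "__main__":
    for item, d in discrimination_indices(EXAMPLE_ITEMS, GROUP_SIZE).items():
        note = "negative: low scorers did better" if d < 0 else ""
        print(item, round(d, 1), note)
```

Items 4 and 5 both score 0 even though one is answered by everyone and the other by half of each group, which is why the index is read alongside item difficulty.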

Page 45: Distracter Analysis: Examples

(*) marks the correct answer.

Item 1                         A*    B     C     D     E     Omit
% of students in upper ¼       20    5     0     0     0     0
% of students in the middle    15    10    10    10    5     0
% of students in lower ¼       5     5     5     10    0     0

Item 2                         A     B     C     D*    E     Omit
% of students in upper ¼       0     5     5     15    0     0
% of students in the middle    0     10    15    5     20    0
% of students in lower ¼       0     5     10    0     10    0
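Tables like these can be screened automatically. The sketch below applies two commonly used checks, which are assumptions of this example rather than rules stated on the slide: a distractor that no one chooses is doing no work, and a distractor chosen by more upper-group than lower-group students may be misleading strong students. The function name and data layout are hypothetical.

```python
from typing import Dict, List

# Percent of all students choosing each option, as [upper 1/4, middle, lower 1/4].
ITEM_1: Dict[str, List[int]] = {
    "A": [20, 15, 5], "B": [5, 10, 5], "C": [0, 10, 5], "D": [0, 10, 10], "E": [0, 5, 0],
}
ITEM_2: Dict[str, List[int]] = {
    "A": [0, 0, 0], "B": [5, 10, 5], "C": [5, 15, 10], "D": [15, 5, 0], "E": [0, 20, 10],
}


def flag_distractors(item: Dict[str, List[int]], key: str) -> Dict[str, str]:
    """Return {option: reason} for distractors that may need review."""
    flags: Dict[str, str] = {}
    for option, (upper, middle, lower) in item.items():
        if option == key:
            continue  # the correct answer is not a distractor
        if upper + middle + lower == 0:
            flags[option] = "chosen by no one"
        elif upper > lower:
            flags[option] = "attracts more upper-group than lower-group students"
    return flags


if __name__ == "__main__":
    print(flag_distractors(ITEM_1, key="A"))
    print(flag_distractors(ITEM_2, key="D"))
```

On the slide's data, only option A of Item 2 is flagged, since no student selected it.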