Evaluating the use of 'none of the above' in multiple choice testing

Matt Pachai

McMaster University

• Dr. Joe Kim

• Dr. David DiBattista

• Yvonne Chen

• The Pedagogy Research Lab

Acknowledgements

1) The goal of multiple choice (MC)

2) None of the above (NOTA) in MC

3) The present experiment

4) Future directions and implications

Outline

• What are your goals in testing students?

–Assessment?

–Discrimination?

– Learning?

Goals of Testing

• Haladyna and Downing (1989a) examined 46 textbook passages on MC

• Produced 43 recommendations for a “good” question

MC Guidelines

• Use Positives, not Negatives, in the Stem

• Avoid None of the Above

• Avoid complex (Type K) questions

Sample Guidelines

• Which of the following would not increase obedience in the Milgram experiment?

i. Moving the experimenter to another room

ii. Moving the experiment to a run down building

iii. Dressing the experimenter in dirty clothes

iv. Moving the learner closer to the teacher

a) i and ii

b) ii and iii

c) i, ii, and iii

d) iii and iv

e) None of the above

A Bad Question

• Only half of these recommendations were empirically examined

• A clear need for rigorous examination remains

Empirical Support

Haladyna and Downing, 1989b

• How do we examine our test’s ability to achieve our goals?

–Difficulty: Percent Correct

–Discrimination: Point-biserial correlation

– Learning: Retention

Measurement Tools

• A simple way to measure knowledge at two levels

• Students:

–How many questions did each student answer correctly?

• Concepts:

–What percentage of students got a particular question correct?

Performance
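The two levels of performance described above can be sketched in a few lines of Python. This is only an illustration: the `responses` matrix is made-up data, and the function names `student_scores` and `percent_correct` are hypothetical, not part of the study.

```python
# Hypothetical scored responses: rows are students, columns are questions
# (1 = correct, 0 = incorrect). These values are invented for illustration.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]


def student_scores(matrix):
    """Student level: number of questions each student answered correctly."""
    return [sum(row) for row in matrix]


def percent_correct(matrix):
    """Concept level: percentage of students who got each question correct."""
    n_students = len(matrix)
    return [100.0 * sum(column) / n_students for column in zip(*matrix)]


scores = student_scores(responses)
difficulty = percent_correct(responses)
print(scores)
print(difficulty)
```

A lower percent correct means a more difficult question, which is the sense in which difficulty is used in the rest of the talk.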

• A measure of a question’s ability to discriminate between students

• What is the correlation between students' answers to a particular question (correct or incorrect) and each student's final score?

Point-Biserial Correlation
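As a rough sketch, the point-biserial correlation is the ordinary Pearson correlation computed between a 0/1 item vector and a continuous score. The function name `point_biserial` and the data below are hypothetical. Whether the item itself is counted in each student's final score is not stated in the slides, so the sketch simply takes the scores as given.

```python
import math


def point_biserial(item, totals):
    """Pearson correlation between a 0/1 item vector and students' scores."""
    n = len(item)
    mean_x = sum(item) / n
    mean_y = sum(totals) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(item, totals))
    var_x = sum((x - mean_x) ** 2 for x in item)
    var_y = sum((y - mean_y) ** 2 for y in totals)
    if var_x == 0 or var_y == 0:
        # Undefined when every student answers the item the same way
        # or every student has the same score.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Made-up data: six students, one question, and their final scores.
item = [1, 1, 0, 1, 0, 0]
totals = [9, 8, 5, 7, 6, 3]
r = point_biserial(item, totals)
print(round(r, 3))
```

A value near zero means the question does not separate high-scoring from low-scoring students; larger positive values mean stronger discrimination.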

Percentage of students at each final grade selecting each option (* = correct answer)

Grade    A     B     C*    D
A        0     0     90    10
B        5     2     83    11
C        5     1     66    27
D        23    5     35    37
F        32    7     37    24

Point-Biserial Correlation

Point-biserial correlation = 0.32

• Cognitive psychologists have extensively studied retention of material

• Basic Paradigm:

– Session 1: teach a concept

– Session 2: test retention after a delay

Retention Experiments

• Numerous studies suggest testing improves learning

The Positive Testing Effect

Carpenter et al., 2008; Roediger and Karpicke, 2006

• Flawed questions are more difficult (Downing, 2005)

• Test flaws may hurt high-achieving students more than low-achieving students (Tarrant and Ware, 2008)

The Impact of Flaws

• Previous studies classify flawed questions based on a large number of guidelines

• Hard to decipher which specific flaws have which specific effects

Specific Flaws

• In a recent review, 48% of textbook authors agreed that NOTA should be avoided (Haladyna et al., 2002)

The Case of NOTA

• The few studies examining NOTA have produced mixed results

• NOTA may:

– increase difficulty and discrimination

–not change difficulty and discrimination

– increase difficulty but not discrimination

Empirical Evidence

• “When NOTA is correct… it rewards examinees with serious knowledge deficiencies or misinformation” … “Any stem or option format that reduces an item’s ability to distinguish between candidates with full and misinformation should not be used” (Gross, 1994)

Mixed Messages

• “NOTA should remain an option in the item-writer’s toolbox, as long as its use is appropriately considered. However, given the complexity of its effects, NOTA should generally be avoided by novice item writers.” (Haladyna et al., 2002)

Mixed Messages

• What effect does NOTA have on:

–Assessment?

–Discrimination?

– Learning? (not addressed today)

General Questions

• We examined NOTA on two of our Introductory Psychology examinations (approx. 3000 students/year)

• Advantages of our population:

–A large class

–Highly motivated students

– Topical questions, basic and applied

Our Study

• Five versions of each test were produced

• Each test contained 5 experimental questions, randomly distributed

Test Design

• Each test version had one question in each of the following conditions:

–No NOTA (control)

–NOTA as key

–NOTA replacing distractor #1

–NOTA replacing distractor #2

–NOTA replacing distractor #3

Conditions

      FORM 1     FORM 2     FORM 3     FORM 4     FORM 5
Q1    Normal     NOTA key   NOTA D1    NOTA D2    NOTA D3
Q2    NOTA D3    Normal     NOTA key   NOTA D1    NOTA D2
Q3    NOTA D2    NOTA D3    Normal     NOTA key   NOTA D1
Q4    NOTA D1    NOTA D2    NOTA D3    Normal     NOTA key
Q5    NOTA key   NOTA D1    NOTA D2    NOTA D3    Normal

Summary of Design
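The design summarized above is a cyclic rotation of the five conditions across forms, so every question appears once in each condition and every form contains each condition once. A minimal sketch of that rotation follows; the names `CONDITIONS`, `condition_for`, and `design` are hypothetical and chosen only for illustration.

```python
# Condition labels in the order they appear across Q1 in the design table.
CONDITIONS = ["Normal", "NOTA key", "NOTA D1", "NOTA D2", "NOTA D3"]


def condition_for(question, form, conditions=CONDITIONS):
    """Condition for a 1-based question number on a 1-based form number."""
    # Each successive question shifts the condition sequence one form right.
    return conditions[(form - question) % len(conditions)]


# Rebuild the table: one row per question, one entry per form.
design = {
    f"Q{q}": [condition_for(q, f) for f in range(1, 6)]
    for q in range(1, 6)
}

for question, row in design.items():
    print(question, row)
```

Rotating rather than fixing one condition per question keeps question content from being confounded with condition.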

• Harlow's studies of infant monkeys raised with surrogate mothers indicated that infants became attached to the surrogate mother:

a) from which food was most often delivered.

b) that provided the most contact comfort.

c) that was present when danger was presented.

d) that was present for the greatest amount of time.

Sample Question: Normal

b) that was present when danger was presented.

c) that was present for the greatest amount of time.

d) None of the above

Sample Question: NOTA Key

a) that provided the most contact comfort.

b) that was present when danger was presented.

Sample Question: NOTA D1

c) that was present when danger was presented.

• Distractors were recoded as either high frequency, middle frequency, or low frequency selections

• Harlow's studies of infant monkeys raised with surrogate mothers indicated that infants became attached to the surrogate mother:

a) from which food was most often delivered. (HF: 19%)

b) that provided the most contact comfort.

c) that was present when danger was presented. (LF: 4%)

d) that was present for the greatest amount of time. (MF:

Recoding Distractors
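One way the recoding could work is to rank a question's three distractors by how often they were selected in the Normal condition. This is only a sketch under that assumption: the function name `recode_distractors` and the selection rates below are hypothetical and are not the study's data.

```python
def recode_distractors(selection_rates):
    """Label three distractors HF, MF, LF by descending selection rate."""
    # Ties keep their original order because Python's sort is stable.
    ranked = sorted(selection_rates, key=selection_rates.get, reverse=True)
    return dict(zip(ranked, ["HF", "MF", "LF"]))


# Hypothetical selection rates for the three distractors of one question
# (the keyed option is excluded). Values are invented for illustration.
rates = {"a": 0.30, "c": 0.05, "d": 0.12}
labels = recode_distractors(rates)
print(labels)
```

Recoding this way makes the conditions comparable across questions, since "distractor #1" may be a popular choice on one question and a rare one on another.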

• Independent Variable: Condition

– Normal

– NOTA-Key

– NOTA-HF

– NOTA-MF

– NOTA-LF

• Dependent Variables

– Performance (% correct)

– Discrimination (point-biserial correlation)

Analysis

Performance

[Figure: performance by condition (Normal, NOTA-Key, NOTA-HF, NOTA-MF, NOTA-LF); * = p < 0.001]

Discrimination

[Figure: discrimination by condition (Normal, NOTA-Key, NOTA-HF, NOTA-MF, NOTA-LF); p > 0.05]

• What effect does NOTA have on:

–Assessment:

• Key: Increased difficulty

• Distractor: Less effective than a good distractor

–Discrimination: No effect

– Learning: Negative testing effect? (Odegard and Koen, 2007)

Implications

• When NOTA is the correct answer, do the students selecting it know the truth?

– Fill in the correct response for a bonus

Future Directions

• Understanding the specific effects of writing “errors” is highly important

• Test writers should be thoughtful in question writing

–Questions should be matched to the goals of the test

General Conclusions

Evaluating the use of ‘none of the above’ in multiple choice testing

Questions?

• Carpenter, S. K., Pashler, H., Wixted, J. T., & Vul, E. (2008). The effects of tests on learning and forgetting. Memory & Cognition, 36(2), 438-448.

• Downing, S. M. (2005). The effects of violating standard item writing principles on tests and students: The consequences of using flawed test items on achievement examinations in medical education. Advances in Health Sciences Education, 10(2), 133-143.

• Gross, L. J. (1994). Logical versus empirical guidelines for writing test items: The case of "none of the above." Evaluation & the Health Professions, 17(1), 123-126.

• Haladyna, T. M., & Downing, S. M. (1989a). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 1, 37–50.

• Haladyna, T. M., & Downing, S. M. (1989b). The validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 1, 51–78.

• Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-309.

• Odegard, T. N., & Koen, J. D. (2007). "None of the above" as a correct and incorrect alternative on a multiple-choice test: Implications for the testing effect. Memory, 15(8), 873-885.

• Roediger, H. L., III, & Karpicke, J. D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17(3), 249-255.

• Tarrant, M., & Ware, J. (2008). Impact of item-writing flaws in multiple-choice questions on student achievement in high-stakes nursing assessments. Medical Education, 42(2), 198-206.

References
