Computerized Adaptive Testing: What is it and How Does it Work? Presented by Matthew Finkelman, Ph.D.



  • Slide 1
  • Slide 2
  • Computerized Adaptive Testing: What is it and How Does it Work? Presented by Matthew Finkelman, Ph.D.
  • Slide 3
  • Goals of this session Learn about Computerized Adaptive Testing (CAT). Crash course in Item Response Theory (IRT). Combining CAT with IRT. History, pros and cons of CAT. Answer questions.
  • Slide 4
  • Not to be confused with Computerized Adaptive Testing: Not as cute, but far fewer hairballs.
  • Slide 5
  • PART I Introduction to CAT
  • Slide 6
  • Why should you care? There are already operational assessments that use CAT. Some believe it will revolutionize classroom testing in the future. Fascinating idea that speaks to the potential of computers to find new uses in education. Item Response Theory is all over testing now.
  • Slide 7
  • OK, so what is CAT? A type of assessment where a question is displayed on a monitor. Students use the mouse to select an answer. The computer chooses the next question based on previous responses. The next question is displayed on the monitor, or else the test ends.
  • Slide 8
  • A graphical representation Questions chosen depend on prior responses
  • Slide 9
  • Analogy: A Game of 20 Questions I am thinking of an object. You have 20 yes-or-no questions to figure it out. Would you write out all your questions ahead of time? 1) Is it an animal? 2) Is it a vegetable? 3) Is it blue? 4) Is it red? 5) Is it bigger than a car? 6) Etc.
  • Slide 10
  • 20 Questions, Continued Isn't it more effective to base your next question on previous answers? 1) Is it an animal? NO. 2) Is it a vegetable? YES. 3) Is it commonly found in a salad? YES. 4) Is it green? NO. 5) Would Bugs Bunny eat it? YES.
  • Slide 11
  • Same principle used in CAT The computer keeps track of each student's pattern of responses so far. As the test progresses, we learn more about the individual student. Choose the next question (item) to get maximal info about that particular student's level of ability. Purpose of assessment: Get the best possible information about students.
  • Slide 12
  • Some items are more informative than others? Sure! Some items are easier than others: 2 + 2 vs. 54389 + 34697. Some items are more relevant than others: 3 + 7 vs. an Academy Awards question. Some items are better at distinguishing proficient students from those who need improvement.
  • Slide 13
  • Which is most informative? Suppose we have only 2 types of students: Advanced and Beginning. Use the test to classify each student. Which item below is the best for this purpose?
    Item   P(Correct|Advanced)   P(Correct|Beginning)
    1      52%                   52%
    2      75%                   34%
    3      100%                  0%
  • Slide 14
  • Item 3 is the best Item 1 is completely useless. Item 2 gives some information. Item 3 is all you need!
    Item   P(Correct|Advanced)   P(Correct|Beginning)
    1      52%                   52%
    2      75%                   34%
    3      100%                  0%
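To make the comparison concrete, here is a minimal sketch in Python (the gap criterion below is one simple, illustrative way to score separation between the two groups; it is not a method prescribed by the slides):

```python
# Rank items for a two-class (Advanced vs. Beginning) classification task.
# Criterion: the gap between the two groups' probabilities of answering
# correctly; a larger gap means the item separates the groups better.

items = {
    1: (0.52, 0.52),   # (P(Correct | Advanced), P(Correct | Beginning))
    2: (0.75, 0.34),
    3: (1.00, 0.00),
}

for item_id, (p_adv, p_beg) in items.items():
    print(f"Item {item_id}: separation = {abs(p_adv - p_beg):.2f}")
# Item 1 -> 0.00 (useless), Item 2 -> 0.41 (some info), Item 3 -> 1.00 (decisive)
```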
  • Slide 15
  • But wait: Wouldn't we choose Item 3 for ALL students? If so, why customize a test for an individual student? Answer: For some students, Item A is more informative. For others, Item B is more informative.
  • Slide 16
  • When is Item A more informative than Item B? Item A: 2 + 2. Item B: (34 + 68) / 2. If you've answered many difficult items correctly, Item A is a waste of time. If you've answered many easy items incorrectly, Item B is too hard. Thus, give Item B to high-performing students, Item A to low-performing students.
  • Slide 17
  • Isn't that unfair? It seems like CAT penalizes students for performing well at the start. If we give different items to different students, how can we compare their performances? The above question arises whether we use CAT or not. Item Response Theory to the rescue!
  • Slide 18
  • Summary of Part I CAT customizes assessment based on previous responses, as in 20 Questions. Certain items are more informative than others. For some students, Item A is more informative; for others, Item B is. When we give different items to different students, we need a way to relate student performances (Item Response Theory).
  • Slide 19
  • PART II Crash Course in Item Response Theory
  • Slide 20
  • Item Response Theory (IRT) Quantifies the relation between examinees and test items. For each item, gives the probability of a correct response by ability level. Provides a means for estimating the ability of examinees and describing the characteristics of items. Places examinees on a common scale when they have taken different items.
  • Slide 21
  • The IRT Model: One item
  • Slide 22
  • Different items have different curves
  • Slide 23
  • Where did those curves come from? In IRT, ability is denoted by θ. Probability of a correct response is P(θ) = c + (1 - c) / (1 + e^(-a(θ - b))). Each item has its own values of a, b, and c. We know them from pre-testing. a is the discrimination: related to the slope. b is the difficulty: harder item, higher b. c is the guessing parameter: chance of a lucky guess.
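In code, that curve is a one-liner. A minimal sketch of the standard 3PL response function (Python here and in the sketches below; the language and function names are mine, not from the slides):

```python
import math

def prob_3pl(theta, a, b, c):
    """3PL model: probability of a correct response at ability theta.

    a = discrimination (slope), b = difficulty, c = guessing (left asymptote).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A harder item (higher b) gives a lower chance of success at the same theta:
print(round(prob_3pl(0.0, a=1.0, b=-1.0, c=0.2), 2))  # easy item: ~0.78
print(round(prob_3pl(0.0, a=1.0, b=1.0, c=0.2), 2))   # hard item: ~0.42
```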
  • Slide 24
  • Effect of the a parameter: Larger a increases the slope in the middle. All curves shown have equal b and c parameters.
  • Slide 25
  • Effect of the b parameter: Larger b means a harder item. All curves shown have equal a and c parameters.
  • Slide 26
  • Effect of the c parameter: c is the left asymptote. All curves shown have equal a and b parameters.
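Evaluating the prob_3pl sketch at a few settings shows the same three effects numerically (the parameter values are arbitrary; outputs rounded):

```python
import math

def prob_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

theta = 0.5
# Larger a: steeper curve, so P moves further from the midpoint (b=0, c=0.2 fixed).
print([round(prob_3pl(theta, a, 0.0, 0.2), 2) for a in (0.5, 1.0, 2.0)])
# -> [0.65, 0.7, 0.78]
# Larger b: harder item, lower P (a=1, c=0.2 fixed).
print([round(prob_3pl(theta, 1.0, b, 0.2), 2) for b in (-1.0, 0.0, 1.0)])
# -> [0.85, 0.7, 0.5]
# Larger c: higher floor; far below the difficulty, P stays near c (a=1, b=0).
print([round(prob_3pl(-4.0, 1.0, 0.0, c), 2) for c in (0.0, 0.1, 0.25)])
# -> [0.02, 0.12, 0.26]
```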
  • Slide 27
  • Wait a minute: What do you mean by a student with an ability of 1.0? Does an ability of 0.0 mean that a student has NO ability? What if my student has a reading ability of -1.2? What in the world does that mean???
  • Slide 28
  • The ability scale Ability is on an arbitrary scale that just happens to be centered around 0.0. We use arbitrary scales all the time: Fahrenheit, Celsius, decibels. Nevertheless, we need more user-friendly reporting: scaled scores on a conventional scale like 200-300.
  • Slide 29
  • Giving a score for each student First assign an ability (θ) value to each student (say, -3 to 3). The student is given the value of θ that is most consistent with his/her responses. The better he/she does on the test, the higher the value of θ that he/she receives. The computer converts the θ score to a scaled score. Report the final score!
  • Slide 30
  • Assigning scores Set of answers: (C, C, I, C, C, I, I, C, C, C, I, C, C). We know which items were taken by each student: a, b, c parameters. If Student 1's items were easier than Student 2's, that is taken into account through the item parameters. Student 1: θ = 1.25, scaled score = 290. Student 2: θ = 0.65, scaled score = 268. Can compare students who took different items!!!
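A sketch of what "most consistent with his/her responses" can mean in practice: a maximum-likelihood estimate of θ found by grid search, followed by a linear conversion to a 200-300 reporting scale. The item parameters, response pattern, and conversion formula are all made up for illustration:

```python
import math

def prob_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Sum log P(theta) for correct answers, log (1 - P(theta)) for incorrect."""
    total = 0.0
    for (a, b, c), correct in zip(items, responses):
        p = prob_3pl(theta, a, b, c)
        total += math.log(p if correct else 1.0 - p)
    return total

# Hypothetical item parameters (a, b, c) and responses (True = correct):
items = [(1.0, -0.5, 0.20), (1.2, 0.3, 0.15), (0.9, 1.0, 0.25), (1.1, 0.0, 0.20)]
responses = [True, True, False, True]

# Grid search for the theta in [-3, 3] that best explains the responses:
grid = [i / 100.0 for i in range(-300, 301)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, items, responses))

scaled = 250 + (50.0 / 3.0) * theta_hat   # illustrative map of [-3, 3] onto 200-300
print(f"theta = {theta_hat:.2f}, scaled score = {scaled:.0f}")
```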
  • Slide 31
  • Summary of Part II If you didn't get all that, don't worry. Just remember: In IRT, different items have different curves (depending on a, b, c parameters). IRT allows us to give scores on the same scale, even when students take different items. These features are critical in CAT. So how do we choose which items to give?
  • Slide 32
  • PART III Combining CAT with IRT
  • Slide 33
  • CAT Reminder CAT customizes assessment based on previous responses. For some students, Item A is more informative; for others, Item B is. With IRT, it's OK to give different items to different students.
  • Slide 34
  • Which item would you choose next?
    PREVIOUS RESPONSES:
    10 + 19 = ? Answered correctly.
    27 + 38 = ? Answered incorrectly.
    12 + 26 = ? Answered incorrectly.
    POSSIBLE ITEMS TO GIVE NEXT:
    18 + 9 = ?   13 + 17 = ?   14 + 20 = ?
  • Slide 35
  • Item selection to match ability/difficulty Want to give items appropriate to ability. 2 + 2 is not informative for high-performing students; (34 + 68) / 2 is not informative for low-performing students. Student has taken 10 items, awaits the 11th. Classic approach: Give the item whose difficulty (b) is closest to the current estimate of θ (see the sketch below).
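A sketch of that classic rule, using the θ = -1.2 example from the next slide (the item pool is hypothetical):

```python
# Classic selection rule: give the unused item whose difficulty (b) is
# closest to the current ability estimate. Pool maps item name -> b.

theta_hat = -1.2
pool = {"easy": -1.0, "medium": 0.0, "hard": 1.5}

next_item = min(pool, key=lambda name: abs(pool[name] - theta_hat))
print(next_item)  # "easy": b = -1.0 lies closest to theta = -1.2
```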
  • Slide 36
  • Which item is better for θ = -1.2? (Plot: the student's ability marked on the θ scale against the curves of an easier item and a harder item.)
  • Slide 37
  • More complex item selection Previous method: Match difficulty to ability. This criterion only uses the b parameter and θ. Recall that the a parameter is related to the slope, and c is the guessing parameter. Shouldn't we consider those when choosing the next item?
  • Slide 38
  • Another item selection method Ideal item: Low value of c; value of b close to θ; high value of a. Fisher Information combines these factors into a single number. Choose the item with the highest Fisher Info.
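For the 3PL model, the textbook form of an item's Fisher information at ability θ is I(θ) = a² · (Q/P) · ((P - c)/(1 - c))², where P = P(θ) is the 3PL probability and Q = 1 - P: it grows with a, shrinks with c, and peaks when b is near θ. A minimal sketch (function names are mine, not from the slides):

```python
import math

def prob_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def fisher_info_3pl(theta, a, b, c):
    """3PL item information: a^2 * (Q / P) * ((P - c) / (1 - c))^2."""
    p = prob_3pl(theta, a, b, c)
    return a**2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c))**2

def pick_item(theta_hat, pool):
    """Select the item id whose Fisher information at theta_hat is highest."""
    return max(pool, key=lambda i: fisher_info_3pl(theta_hat, *pool[i]))
```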
  • Slide 39
  • Game: Which item would you choose? Suppose our current estimate of θ is 0.6.
    Item   a     b      c
    1      0.8   -0.2   0.25
    2      1.1   0.4    0.15
    3      1.0   2.2    0.18
  • Slide 40
  • Results If matching ability (estimate of θ = 0.6) with difficulty, we would give Item 2. If using Fisher Info, we would give Item 2.
    Item   a     b      c
    1      0.8   -0.2   0.25
    2      1.1   0.4    0.15
    3      1.0   2.2    0.18
  • Slide 41
  • Round 2 Suppose our current estimate of θ is 0.7.
    Item   a     b      c
    1      1.3   0.9    0.20
    2      1.1   0.6    0.22
    3      0.8   -0.1   0.10
  • Slide 42
  • Round 2 Results If matching ability (estimate of θ = 0.7) with difficulty, we would give Item 2. If using Fisher Info, we would give Item 1.
    Item   a     b      c
    1      1.3   0.9    0.20
    2      1.1   0.6    0.22
    3      0.8   -0.1   0.10
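Running both selection rules over the two rounds reproduces the answers above (a self-contained sketch repeating the 3PL definitions):

```python
import math

def prob_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def fisher_info_3pl(theta, a, b, c):
    p = prob_3pl(theta, a, b, c)
    return a**2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c))**2

rounds = [
    (0.6, {1: (0.8, -0.2, 0.25), 2: (1.1, 0.4, 0.15), 3: (1.0, 2.2, 0.18)}),
    (0.7, {1: (1.3, 0.9, 0.20), 2: (1.1, 0.6, 0.22), 3: (0.8, -0.1, 0.10)}),
]

for theta_hat, pool in rounds:                    # pool: item id -> (a, b, c)
    by_difficulty = min(pool, key=lambda i: abs(pool[i][1] - theta_hat))
    by_fisher = max(pool, key=lambda i: fisher_info_3pl(theta_hat, *pool[i]))
    print(f"theta = {theta_hat}: difficulty match -> Item {by_difficulty}, "
          f"Fisher info -> Item {by_fisher}")
# Round 1: both rules pick Item 2. Round 2: difficulty matching picks Item 2,
# but Fisher information picks Item 1, whose higher a pays off near theta.
```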
  • Slide 43
  • Summary of Part III Tailor items to be most informative about the individual student's ability. Do this by combining CAT with IRT. One method: Match difficulty with the current estimate of θ. Another method: Take all parameters into account via Fisher Info.
  • Slide 44
  • PART IV Historical and Practical Considerations
  • Slide 45
  • Brief history of CAT Flexilevel testing: The "Choose Your Own Adventure" of assessment. Multi-stage tests: A routing test followed by a second (tailored) stage of testing. Full-blown CAT was studied heavily in the 1970s; item selection methods were proposed. CAT has been used in real testing.
  • Slide 46
  • Problem: Content Balance In testing, must balance content (e.g., a Math test covering algebra, geometry, and number sense). What if all your most informative items come from the same content strand? In practice, dozens of constraints for each CAT: Content, topics, enemies list, etc. CAT solution: Pick the most informative item among those still in play (sketched below).
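A sketch of "most informative among those in play": filter the pool by content quotas first, then apply the usual argmax. The pool, strands, and quota logic are all invented for illustration:

```python
# Constrained selection: the most informative item wins, but only among items
# whose content strand still has quota remaining on this test.

pool = {
    # item id: (Fisher information at current theta, content strand)
    "A": (0.42, "algebra"),
    "B": (0.39, "geometry"),
    "C": (0.35, "number sense"),
    "D": (0.48, "algebra"),
}
quota = {"algebra": 2, "geometry": 1, "number sense": 1}  # max items per strand
given = ["A"]                                             # already administered

def in_play(item):
    info, strand = pool[item]
    used = sum(1 for g in given if pool[g][1] == strand)
    return item not in given and used < quota[strand]

candidates = [i for i in pool if in_play(i)]
print(max(candidates, key=lambda i: pool[i][0]))  # "D": best info still in play
```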
  • Slide 47
  • Problem: Test security CAT is administered on multiple occasions. Different students, different items; however, some items are more popular than others. Person A takes the exam, memorizes items, tells Person B. Person B takes the exam and benefits from Person A's information. CAT solution: Limit the number of times each item can be administered.
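A sketch of the exposure-limiting idea: track how often each item has been administered and drop overexposed items from consideration. The cap, counts, and tie-breaking are illustrative, not a specific published method:

```python
import random

EXPOSURE_CAP = 0.20                    # no item on more than 20% of tests
times_given = {"A": 300, "B": 90, "C": 40}
tests_so_far = 1000

# Keep only items still under the exposure cap:
available = [i for i in times_given
             if times_given[i] / tests_so_far < EXPOSURE_CAP]
print(available)                       # ['B', 'C']; 'A' is overexposed, held back

# In a real CAT the informativeness criterion would choose among the
# survivors; here we simply pick one at random as a stand-in.
next_item = random.choice(available)
```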
  • Slide 48
  • Problems persisted CAT lovers were none too happy about this
  • Slide 49
  • CAT Pros Convenient administration. Immediate scoring. Items maximally informative: Exams just as accurate, with shorter tests. Items at the correct level: High-performing students not bored, low-performing students not overwhelmed.
  • Slide 50
  • CAT Cons Limited by technology. Potential bias versus students with less computer experience. Content balance less exact than paper-and-pencil testing. Test security.
  • Slide 51
  • Use of CAT in diagnostic testing Want to use CAT where the pros are in effect but the cons are not. Diagnostic testing: Give each student a list of strengths & weaknesses, helping teachers focus instruction. Content naturally balanced. Low-stakes, so test security is less of an issue. Still reap the benefits of adaptive testing.
  • Slide 52
  • Final summary Introduction to CAT: Benefits of giving different items to different students. Crash course in IRT. Using IRT to select items in a CAT. History, pros and cons of CAT. How should CAT be used in the future?
  • Slide 53
  • It's all about student learning. Period. www.measuredprogress.org