Computerized Adaptive Testing: What is it and How Does it Work?
Presented by Matthew Finkelman, Ph.D.
Slide 3
Goals of this session
Learn about Computerized Adaptive Testing (CAT)
Crash course in Item Response Theory (IRT)
Combining CAT with IRT
History, pros and cons of CAT
Answer questions
Slide 4
Not to be confused with...
Computerized Adaptive Testing: Not as cute, but far fewer hairballs.
Slide 5
PART I: Introduction to CAT
Slide 6
Why should you care?
There are already operational assessments that use CAT
Some believe it will revolutionize classroom testing in the future
Fascinating idea that speaks to the potential of computers to have new uses in education
Item Response Theory is all over testing now
Slide 7
OK, so what is CAT?
A type of assessment where a question is displayed on a monitor
Students use mouse to select answer
Computer chooses next question based on previous responses
Next question is displayed on monitor, or else test ends
Slide 8
A graphical representation
Questions chosen depend on prior responses
Slide 9
Analogy: A Game of 20 Questions
I am thinking of an object. You have 20 yes-or-no questions to figure it out.
Would you write out all your questions ahead of time?
1) Is it an animal?
2) Is it a vegetable?
3) Is it blue?
4) Is it red?
5) Is it bigger than a car?
6) Etc.
Slide 10
20 Questions, Continued
Isn't it more effective to base your next question on previous answers?
1) Is it an animal? NO.
2) Is it a vegetable? YES.
3) Is it commonly found in a salad? YES.
4) Is it green? NO.
5) Would Bugs Bunny eat it? YES.
Slide 11
Same principle used in CAT
Computer keeps track of each student's pattern of responses so far
As test progresses, learn more about individual student
Choose next question (item) to get maximal info about that particular student's level of ability
Purpose of assessment: Get best possible information about students
Slide 12
Some items are more informative than others?
Sure!
Some items are easier than others: 2 + 2 vs. 54389 + 34697
Some items are more relevant than others: 3 + 7 vs. Academy Awards question
Some items are better at discerning proficient students from those who need improvement
Slide 13
Which is most informative?
Suppose we have only 2 types of students: Advanced and Beginning
Use the test to classify each student
Which item below is the best for this purpose?

Item   P(Correct|Advanced)   P(Correct|Beginning)
1      52%                   52%
2      75%                   34%
3      100%                  0%
Slide 14
Item 3 is the best
Item 1 is completely useless
Item 2 gives some information
Item 3 is all you need!

Item   P(Correct|Advanced)   P(Correct|Beginning)
1      52%                   52%
2      75%                   34%
3      100%                  0%
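
One way to make "informative" concrete is to ask how much a single response changes our belief that a student is Advanced. The sketch below is illustrative only: it assumes a 50/50 prior between the two groups (not stated on the slide), that Item 1 has the same 52% for both groups (which is what makes it useless), and the function name posterior_advanced is made up for this example.

```python
def posterior_advanced(p_adv, p_beg, correct, prior_adv=0.5):
    """Posterior probability that a student is Advanced after one response.

    p_adv, p_beg: P(Correct | Advanced) and P(Correct | Beginning).
    prior_adv: assumed prior share of Advanced students (50/50 by default).
    """
    like_adv = p_adv if correct else 1.0 - p_adv
    like_beg = p_beg if correct else 1.0 - p_beg
    numerator = like_adv * prior_adv
    denominator = numerator + like_beg * (1.0 - prior_adv)
    return numerator / denominator

# item number -> (P(Correct|Advanced), P(Correct|Beginning)) from the table
items = {1: (0.52, 0.52), 2: (0.75, 0.34), 3: (1.00, 0.00)}

for item, (p_adv, p_beg) in items.items():
    after_correct = posterior_advanced(p_adv, p_beg, correct=True)
    after_wrong = posterior_advanced(p_adv, p_beg, correct=False)
    print(f"Item {item}: P(Advanced | correct) = {after_correct:.2f}, "
          f"P(Advanced | incorrect) = {after_wrong:.2f}")
```

Under these assumptions Item 1 leaves the belief at 0.50 whatever the answer, Item 2 moves it part of the way, and Item 3 moves it all the way to 1.00 or 0.00.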
Slide 15
But wait...
Wouldn't we choose Item 3 for ALL students? If so, why customize a test for an individual student?
Answer: For some students, Item A is more informative. For others, Item B is more informative.
Slide 16
When is Item A more informative than Item B?
Item A: 2 + 2
Item B: (34 + 68) / 2
If you've answered many difficult items correctly, Item A is a waste of time
If you've answered many easy items incorrectly, Item B is too hard
Thus, give Item B to high-performing students, Item A to low-performing students
Slide 17
Isn't that unfair?
It seems like CAT penalizes students for performing well at start
If we give different items to different students, how can we compare their performances?
The above question arises whether we use CAT or not
Item Response Theory to the rescue!
Slide 18
Summary of Part I
CAT customizes assessment based on previous responses, as in 20 Questions
Certain items more informative than others
For some students, Item A is more informative; for others, Item B is
When we give different items to different students, we need a way to relate student performances (Item Response Theory)
Slide 19
PART II: Crash Course in Item Response Theory
Slide 20
Item Response Theory (IRT)
Quantifies the relation between examinees and test items
For each item, gives probability of correct response by ability level
Provides a means for estimating ability of examinees, describing characteristics of items
Places examinees on common scale when they have taken different items
Slide 21
The IRT Model: One item
Slide 22
Different items have different curves
Slide 23
Where did those curves come from?
In IRT, ability is denoted by θ
Probability of a correct response is $P(\theta) = c + \frac{1 - c}{1 + e^{-a(\theta - b)}}$
Each item has its own values of a, b, and c. We know them from pre-testing
a is the discrimination: Related to the slope
b is the difficulty: Harder item, higher b
c is the guessing parameter: Chance of lucky guess
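
A model with a discrimination, a difficulty and a guessing parameter is usually the three-parameter logistic (3PL) model. The sketch below assumes that form, without the 1.7 scaling constant that some texts put in the exponent; the function name p_correct and the example item are illustrative, not taken from the slides.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct response at ability theta.

    a: discrimination, b: difficulty, c: guessing parameter.
    Assumes the logistic form without a scaling constant.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: moderate discrimination, average difficulty,
# and a 20% chance of a lucky guess.
a, b, c = 1.0, 0.0, 0.2
for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"theta = {theta:+.1f}  P(correct) = {p_correct(theta, a, b, c):.3f}")
```

The printed probabilities rise with ability, start near the guessing level c for very low ability, and approach 1 for very high ability.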
Slide 24
Effect of the a parameter
Larger a increases the slope in the middle
All curves shown have equal b and c parameters
Slide 25
Effect of the b parameter
Larger b means harder item
All curves shown have equal a and c parameters
Slide 26
Effect of the c parameter
c is the left asymptote
All curves shown have equal a and b parameters
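
The three effects can be read off the curve's formula. The block below assumes the three-parameter logistic form (without a 1.7 scaling constant), which may differ in detail from the model behind the plots:

```latex
\begin{aligned}
P(\theta) &= c + \frac{1 - c}{1 + e^{-a(\theta - b)}} \\
\lim_{\theta \to -\infty} P(\theta) &= c
  && \text{($c$ is the left asymptote)} \\
P(b) &= \frac{1 + c}{2}
  && \text{(at $\theta = b$ the curve is halfway between $c$ and 1)} \\
P'(b) &= \frac{a\,(1 - c)}{4}
  && \text{(slope in the middle grows with $a$)}
\end{aligned}
```

So c sets the floor, b sets where the curve crosses its midpoint (a larger b pushes that point to the right, meaning a harder item), and a scales the slope at that point.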
Slide 27
Wait a minute...
What do you mean by a student with an ability of 1.0?
Does an ability of 0.0 mean that a student has NO ability?
What if my student has a reading ability of -1.2? What in the world does that mean???
Slide 28
The ability scale
Ability is on an arbitrary scale that just happens to be centered around 0.0
We use arbitrary scales all the time: Fahrenheit, Celsius, decibels
Nevertheless, need more user-friendly reporting: scaled scores on conventional scale like 200-300
Slide 29
Giving a score for each student
First assign an ability (θ) value to each student (say, -3 to 3)
Student is given the value of θ that is most consistent with his/her responses
The better he/she does on the test, the higher the value of θ that he/she receives
Computer converts the θ score to a scaled score
Report final score!
Slide 30
Assigning scores
Set of answers: (C,C,I,C,C,I,I,C,C,C,I,C,C)
We know which items were taken by each student: a, b, c parameters
If Student 1's items were easier than Student 2's, take into account through item parameters
Student 1: θ = 1.25, scaled score = 290
Student 2: θ = 0.65, scaled score = 268
Can compare students who took different items!!!
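
"Most consistent with the responses" is commonly implemented as maximum likelihood. The sketch below is one simple way to do that, not necessarily the scoring method used operationally: it assumes the 3PL model, searches a grid of θ values from -3 to 3, and uses made-up item parameters. The conversion to a scaled score is left out because the slides do not give it.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct response (assumed model form)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of a response pattern; responses are 1 (C) or 0 (I)."""
    total = 0.0
    for correct, (a, b, c) in zip(responses, items):
        p = p_correct(theta, a, b, c)
        total += math.log(p) if correct else math.log(1.0 - p)
    return total

def estimate_theta(responses, items):
    """Grid search over theta in [-3, 3] for the best-fitting ability."""
    grid = [i / 100.0 for i in range(-300, 301)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

# Hypothetical (a, b, c) parameters: both students give the same pattern of
# answers, but each of Student 2's items is 1.5 units harder.
responses = [1, 1, 1, 0, 0]
easy_items = [(1.0, b, 0.2) for b in (-1.5, -1.0, -0.5, 0.0, 0.5)]
hard_items = [(a, b + 1.5, c) for (a, b, c) in easy_items]

theta_1 = estimate_theta(responses, easy_items)
theta_2 = estimate_theta(responses, hard_items)
print(f"Student 1 (easier items): theta = {theta_1:.2f}")
print(f"Student 2 (harder items): theta = {theta_2:.2f}")
```

The same pattern of correct and incorrect answers earns a higher θ when the items were harder, which is how the item parameters let us compare students who took different items.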
Slide 31
Summary of Part II
If you didn't get all that, don't worry
Just remember: In IRT, different items have different curves (depending on a, b, c parameters)
IRT allows us to give scores on the same scale, even when students take different items
These features critical in CAT
So how do we choose which items to give?
Slide 32
PART III: Combining CAT with IRT
Slide 33
CAT Reminder
CAT customizes assessment based on previous responses
For some students, Item A is more informative; for others, Item B is
With IRT, it's OK to give different items to different students
Slide 34
Which item would you choose next?
PREVIOUS RESPONSES:
10 + 19 = ? Answered correctly.
27 + 38 = ? Answered incorrectly.
12 + 26 = ? Answered incorrectly.
POSSIBLE ITEMS TO GIVE NEXT:
18 + 9 = ?
13 + 17 = ?
14 + 20 = ?
Slide 35
Item selection to match ability/difficulty
Want to give items appropriate to ability
2 + 2 is not informative for high-performing students; (34 + 68) / 2 is not informative for low-performing students
Student has taken 10 items, awaits 11th
Classic approach: Give item whose difficulty (b) is closest to current ability estimate (θ)
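
The classic approach amounts to a nearest-value lookup. A minimal sketch, with a made-up three-item pool and a hypothetical function name closest_difficulty:

```python
def closest_difficulty(theta_hat, pool):
    """Return the id of the item whose difficulty b is nearest theta_hat.

    pool maps an item id to its difficulty (b) parameter.
    """
    return min(pool, key=lambda item_id: abs(pool[item_id] - theta_hat))

# Hypothetical pool: item id -> difficulty (b)
pool = {"easy": -1.5, "medium": 0.0, "hard": 1.4}

print(closest_difficulty(-1.2, pool))  # → easy
print(closest_difficulty(0.9, pool))   # → hard
```

In a real test the pool would also exclude items the student has already seen.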
Slide 36
Which item is better for θ = -1.2?
Figure labels: Ability of a student; Easier item; Harder item
Slide 37
More complex item selection
Previous method: Match difficulty to ability
This criterion only uses b parameter and θ
Recall that a parameter is related to slope, c is guessing parameter
Shouldn't we consider those when choosing next item?
Slide 38
Another item selection method
Ideal item: Low value of c; value of b close to θ; high value of a
Fisher Information combines these factors into a single number
Choose item with highest Fisher Info
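
For reference, the standard Fisher Information expression for a three-parameter logistic item is shown below. It assumes the logistic form without a scaling constant; if a constant of 1.7 is used in the exponent, a is replaced by 1.7a throughout.

```latex
I(\theta) = a^2 \, \frac{1 - P(\theta)}{P(\theta)}
            \left( \frac{P(\theta) - c}{1 - c} \right)^2,
\qquad
P(\theta) = c + \frac{1 - c}{1 + e^{-a(\theta - b)}}
```

The expression grows with the square of a, shrinks as c rises, and falls off as b moves away from θ, which matches the description of the ideal item.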
Slide 39
Game: Which item would you choose?
Suppose our current estimate of θ is 0.6

Item   a     b      c
1      0.8   -0.2   0.25
2      1.1    0.4   0.15
3      1.0    2.2   0.18
Slide 40
Results
If matching ability (estimate = 0.6) with difficulty, we would give Item 2
If using Fisher Info, we would give Item 2

Item   a     b      c
1      0.8   -0.2   0.25
2      1.1    0.4   0.15
3      1.0    2.2   0.18
Slide 41
Round 2
Suppose our current estimate of θ is 0.7

Item   a     b      c
1      1.3    0.9   0.20
2      1.1    0.6   0.22
3      0.8   -0.1   0.10
Slide 42
Round 2 Results
If matching ability (estimate = 0.7) with difficulty, we would give Item 2
If using Fisher Info, we would give Item 1

Item   a     b      c
1      1.3    0.9   0.20
2      1.1    0.6   0.22
3      0.8   -0.1   0.10
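
Both rounds of the game can be checked numerically. The sketch below assumes the 3PL model and its standard Fisher Information formula (no 1.7 scaling constant); the function names are illustrative, and the item parameters are copied from the two tables.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct response (assumed model form)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b, c):
    """Fisher Information of a 3PL item at ability theta."""
    p = p_correct(theta, a, b, c)
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def pick_by_difficulty(theta, items):
    """Item whose b is closest to theta."""
    return min(items, key=lambda i: abs(items[i][1] - theta))

def pick_by_info(theta, items):
    """Item with the highest Fisher Information at theta."""
    return max(items, key=lambda i: fisher_info(theta, *items[i]))

# item number -> (a, b, c)
round_1 = {1: (0.8, -0.2, 0.25), 2: (1.1, 0.4, 0.15), 3: (1.0, 2.2, 0.18)}
round_2 = {1: (1.3, 0.9, 0.20), 2: (1.1, 0.6, 0.22), 3: (0.8, -0.1, 0.10)}

for label, theta, items in (("Round 1", 0.6, round_1), ("Round 2", 0.7, round_2)):
    print(label,
          "| match difficulty -> Item", pick_by_difficulty(theta, items),
          "| Fisher Info -> Item", pick_by_info(theta, items))
```

Under these assumptions the sketch reproduces the stated results: Item 2 by both methods in the first round, and in Round 2, Item 2 by matching difficulty but Item 1 by Fisher Info, because Item 1's higher a outweighs its slightly less well-matched b.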
Slide 43
Summary of Part III
Tailor items to be most informative about individual student's ability
Do this by combining CAT with IRT
One method: Match difficulty with current estimate of θ
Another method: Take all parameters into account via Fisher Info
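
Putting the pieces together, a whole adaptive test is a loop: pick the most informative unused item, record the response, re-estimate θ, repeat. The sketch below is a simplified illustration, not an operational design. It assumes the 3PL model, grid-search maximum likelihood, a fixed test length, a made-up 25-item pool, and a deterministic stand-in for the examinee; run_cat, answer and pool are hypothetical names.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct response (assumed model form)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b, c):
    """Fisher Information of a 3PL item at ability theta."""
    p = p_correct(theta, a, b, c)
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def estimate_theta(history):
    """Grid-search maximum likelihood over theta in [-3, 3]."""
    def log_lik(theta):
        total = 0.0
        for item, correct in history:
            p = p_correct(theta, *item)
            total += math.log(p) if correct else math.log(1.0 - p)
        return total
    grid = [i / 100.0 for i in range(-300, 301)]
    return max(grid, key=log_lik)

def run_cat(pool, answer, test_length, start_theta=0.0):
    """Administer test_length items, re-estimating theta after each one."""
    remaining = dict(pool)
    history, theta_hat = [], start_theta
    for _ in range(test_length):
        # Choose the unused item with the highest information at theta_hat.
        item_id = max(remaining,
                      key=lambda i: fisher_info(theta_hat, *remaining[i]))
        item = remaining.pop(item_id)
        history.append((item, answer(item)))
        theta_hat = estimate_theta(history)
    return theta_hat, history

# Hypothetical pool: 25 items (a, b, c) with difficulties from -3.0 to 3.0.
pool = {i: (1.2, -3.0 + 0.25 * i, 0.2) for i in range(25)}

def answer(item):
    """Simplified examinee: correct whenever difficulty b is at most 1.0."""
    return item[1] <= 1.0

theta_hat, history = run_cat(pool, answer, test_length=10)
print(f"final estimate after {len(history)} items: {theta_hat:.2f}")
```

A real examinee answers probabilistically rather than by a hard cutoff, and operational tests add stopping rules and the constraints discussed in Part IV.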
Slide 44
PART IV: Historical and Practical Considerations
Slide 45
Brief history of CAT
Flexilevel testing: The "Choose Your Own Adventure" of assessment
Multi-stage tests: Routing test followed by second (tailored) stage of testing
Full-blown CAT studied heavily in the '70s; item selection methods proposed
CAT has been used in real testing
Slide 46
Problem: Content Balance
In testing, must balance content (e.g., Math test of algebra, geometry, number sense)
What if all your most informative items come from the same content strand?
In practice, dozens of constraints for each CAT: Content, topics, enemies list, etc.
CAT solution: Pick most informative item among those in play
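
"Among those in play" can be sketched as a filter applied before the usual information-based choice. The example below is illustrative only: it handles a single kind of constraint (a target count per content strand), assumes the 3PL Fisher Information formula, and uses a made-up pool and blueprint.

```python
import math

def fisher_info(theta, a, b, c):
    """Fisher Information of a 3PL item at ability theta (assumed model)."""
    p = c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def pick_with_content_balance(theta, pool, counts, targets):
    """Most informative item among strands that still need items.

    counts: items given so far per strand; targets: items required per strand.
    Assumes at least one strand is still open.
    """
    open_strands = {s for s, target in targets.items()
                    if counts.get(s, 0) < target}
    in_play = [item for item in pool if item["strand"] in open_strands]
    return max(in_play,
               key=lambda item: fisher_info(theta, item["a"], item["b"], item["c"]))

# Hypothetical pool and blueprint.
pool = [
    {"id": "A1", "strand": "algebra", "a": 1.4, "b": 0.5, "c": 0.2},
    {"id": "A2", "strand": "algebra", "a": 1.3, "b": 0.6, "c": 0.2},
    {"id": "G1", "strand": "geometry", "a": 0.9, "b": 0.4, "c": 0.2},
    {"id": "N1", "strand": "number sense", "a": 0.7, "b": 0.0, "c": 0.2},
]
targets = {"algebra": 2, "geometry": 2, "number sense": 2}
counts = {"algebra": 2, "geometry": 0, "number sense": 1}  # algebra is full

print(pick_with_content_balance(0.6, pool, counts, targets)["id"])
```

Here the two algebra items are the most informative at θ = 0.6, but the algebra strand is already full, so the choice falls to the best remaining item from the other strands.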
Slide 47
Problem: Test security
CAT administered on multiple occasions
Different students, different items; however, some items more popular than others
Person A takes exam, memorizes items, tells Person B. Person B takes exam, benefits from Person A's information
CAT solution: Limit how often each item can be administered
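
A limit on administrations can be sketched as a cap on each item's exposure rate. This is a deliberately simple rule for illustration, not a specific published exposure-control method; the function name, item ids and counts are made up.

```python
def pick_with_exposure_cap(ranked_items, times_given, tests_given, max_rate):
    """First item in the ranking whose exposure rate is still under the cap.

    ranked_items: item ids ordered from most to least informative.
    times_given: how many past tests included each item.
    tests_given: total number of past tests.
    """
    for item_id in ranked_items:
        rate = times_given.get(item_id, 0) / tests_given if tests_given else 0.0
        if rate < max_rate:
            return item_id
    return None  # every candidate is over the cap

# Hypothetical data: ranking for the current student and usage counts
# from 1000 earlier administrations.
ranked_items = ["item_17", "item_42", "item_08"]
times_given = {"item_17": 310, "item_42": 120, "item_08": 45}

print(pick_with_exposure_cap(ranked_items, times_given,
                             tests_given=1000, max_rate=0.25))  # → item_42
```

The most informative item has already appeared in 31% of tests, so with a 25% cap the student gets the next-best item instead, trading a little information for better security.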
Slide 48
Problems persisted
CAT lovers were none too happy about this
Slide 49
CAT Pros
Convenient administration
Immediate scoring
Items maximally informative: Exams just as accurate, with shorter tests
Items at correct level: High-performing students not bored, low-performing students not overwhelmed
Slide 50
CAT Cons
Limited by technology
Potential bias versus students with less computer experience
Content balance less exact than paper-and-pencil testing
Test security
Slide 51
Use of CAT in diagnostic testing
Want to use CAT where pros are in effect but cons are not
Diagnostic testing: Give each student a list of strengths & weaknesses, helping teachers focus instruction
Content naturally balanced
Low-stakes, so test security less of an issue
Still reap benefits of adaptive testing
Slide 52
Final summary
Introduction to CAT: Benefits of giving different items to different students
Crash course in IRT
Using IRT to select items in a CAT
History, pros and cons of CAT
How should CAT be used in the future?
Slide 53
It's all about student learning. Period.
www.measuredprogress.org