A Taxonomy of Adaptive Testing. Robert J. Mislevy, Measurement, Statistics & Evaluation, University of Maryland, in collaboration with Roy Levy (Arizona State University) and John T. Behrens (Cisco Systems, Inc.). Presented at the Fifth Annual Technology for Second Language Learning Conference, September 21-22, 2007, Iowa State University, Ames, Iowa, USA.
TSLL 07, Slide 1, September 22, 2007
A Taxonomy of Adaptive Testing
Robert J. Mislevy
Measurement, Statistics & Evaluation
University of Maryland
in collaboration with
Roy Levy, Arizona State University
John T. Behrens, Cisco Systems, Inc.
Presented at the Fifth Annual Technology for Second Language Learning Conference,
September 21-22, 2007, Iowa State University, Ames, Iowa, USA
Slide 2
Terminology & Concepts for Adaptive Testing
Adaptive testing
» Most familiar as item response theory-based computer-adaptive testing (IRT-CAT)
Can take a broader perspective of evidentiary reasoning
We will look at the interplay among inferences and data gathering
A taxonomy of configurations
» IRT-CAT plus many others
Slide 3
Taxonomy based on three dimensions …
Claim status
Observation status
Locus of control
Slide 4
Background for the dimensions
Glenn Shafer’s “Frame of discernment”
Evidence-centered assessment design
Slide 5
“Frame of discernment”
From Shafer’s (1976) A Mathematical Theory of Evidence.
It’s all the possible combinations of values of the variables you are working with.
“Frame” emphasizes how it effectively circumscribes a universe in which inference will take place
“Discern” = “detect, recognize, distinguish”
Property of you as much as property of world
Depends on what you know and what your purpose is
Slide 6
“Frame of discernment”
Frames of discernment can evolve over time, as beliefs, knowledge, and aims unfold.
E.g., dip for the party? Medical diagnosis
Move from one frame of discernment to another by
» ascertaining values of some variables, dropping others,
» adding new variables or refining current ones,
» constructing a different frame when observations cause rethinking of assumptions or goals
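A minimal Python sketch of the idea on Slides 5 and 6, assuming a frame can be represented as the Cartesian product of each variable's possible values. The variable names (`reading`, `item_1`, `listening`) and the helper functions are hypothetical illustrations, not anything taken from Shafer (1976) or from the talk.

```python
from itertools import product


def frame_of_discernment(variables):
    """Return every combination of values of the variables in play."""
    names = list(variables)
    return [dict(zip(names, combo))
            for combo in product(*(variables[n] for n in names))]


def ascertain(variables, name, value):
    """Fix one variable at its observed value, which shrinks the frame."""
    updated = dict(variables)
    updated[name] = [value]
    return updated


def add_variable(variables, name, values):
    """Add a new variable, which enlarges the frame."""
    updated = dict(variables)
    updated[name] = list(values)
    return updated


def drop_variable(variables, name):
    """Drop a variable that is no longer of interest."""
    return {n: v for n, v in variables.items() if n != name}


# Hypothetical starting frame: one proficiency variable, one observable.
variables = {"reading": ["low", "high"], "item_1": ["wrong", "right"]}
frame = frame_of_discernment(variables)  # 2 x 2 combinations
```

Calling `ascertain(variables, "item_1", "right")` and rebuilding the frame leaves only the combinations consistent with the observation, which is one of the moves between frames listed on Slide 6.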
Slide 7
Evidence-Centered Design
Mislevy, Steinberg, & Almond (2003) “On the structure of educational assessments.”
Educational assessment as evidentiary argument:
We reason from the things students say, do, or make in a handful of particular settings, to what they know, can do in various situations, or have accomplished, as more broadly construed.
All elements of an assessment, from analysis of domain, through design, to operation, are based on building then embodying such an argument in operational procedures.
Slide 8
Toulmin’s Argument Structure
[Diagram: Data leads to (“so”) Claim, “since” Warrant, supported by Backing, “unless” Alternative explanation.]
Slide 9
An Assessment Design Argument
[Diagram: Toulmin structure applied to assessment. Data concerning the situation and data concerning the performance of a student acting in an assessment situation lead to (“so”) a claim about the student in some frame of discernment, by way of Warrant and Backing.]
Callouts:
» Information pertinent to addressing the claims is accumulated in terms of student-model variables (SMVs).
» Aspects of performance that bear on claims are captured in terms of observable variables (OVs).
» What we actually see/hear the student say, do, or make.
» What aspects of the situation are important for the possibility of inference about the examinee?
» Formative assessments often have highly specific claims; summative assessments tend to have broader claims.
Slide 10
Adaptive Testing
[Diagram: the same assessment argument (data concerning situation and performance, “so” claim about student in some frame of discernment; Warrant, Backing), annotated with a cycle:]
1. Somebody selects situation for getting information
2. Examinee acts
3. Evaluation of performance in light of current targeted claim
4. Update belief about claim
5. Somebody has choice about whether to refocus claim
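The five-step cycle on Slide 10 can be sketched as a loop. This is only an illustration under stated assumptions: the `Session` class, the callback names, and the additive belief update are hypothetical stand-ins, and who supplies `select_task` and `refocus` (examiner or examinee) is exactly what the taxonomy's locus-of-control dimension varies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Session:
    """Hypothetical state for one examinee."""
    claim: str                                           # SMV currently targeted
    belief: Dict[str, float] = field(default_factory=dict)
    history: List[tuple] = field(default_factory=list)


def run_cycle(
    session: Session,
    select_task: Callable[[Session], str],
    act: Callable[[str], str],
    evaluate: Callable[[str, str, str], float],
    refocus: Callable[[Session], Optional[str]],
    steps: int,
) -> Session:
    for _ in range(steps):
        task = select_task(session)                       # 1. somebody selects a situation
        response = act(task)                              # 2. examinee acts
        score = evaluate(session.claim, task, response)   # 3. evaluate against current claim
        # 4. update belief about the claim (simple running sum, an assumption)
        session.belief[session.claim] = session.belief.get(session.claim, 0.0) + score
        session.history.append((session.claim, task, score))
        new_claim = refocus(session)                      # 5. somebody may refocus the claim
        if new_claim is not None:
            session.claim = new_claim
    return session
```

A fixed-claim design corresponds to a `refocus` callback that always returns `None`; a fixed-observation design corresponds to a `select_task` callback that ignores the session state.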
Slide 11
What is an adaptive test?
At a given time in an assessment system, the set of student-model variables and observable variables constitutes a frame of discernment.
An adaptive test is one in which the frame of discernment changes over time as a function of the values of observations.
Ways it might change are the basis of the taxonomy.
Slide 12
Claim Status
Is the claim part of the frame of discernment, i.e., SMVs, fixed or evolving?
i.e., do the SMVs at issue stay the same or change (as opposed to knowledge about SMVs)?
Slide 13
Observation status
Is the data part of the frame of discernment, i.e., OVs, fixed or evolving?
i.e., does the choice of OVs that can be made stay the same or change as more information is obtained?
Slide 14
Locus of Control
If the claim part of the frame is changing as the test proceeds, who decides how it should change:
The examiner or the examinee?
If the data part of the frame is changing as the test proceeds, who decides how it should change:
The examiner or the examinee?
Claim status (rows) by observation status (columns):
| Claim status | Observation: Fixed | Observation: Adaptive, Examiner Determined | Observation: Adaptive, Examinee Determined |
| Fixed | 1. Usual, linear test | 2. IRT-CAT | |
| Adaptive: Examiner Determined | | | |
| Adaptive: Examinee Determined | | | |
Labels added in successive builds of this table: “User friendly” testing; Guided / diagnostic; Self-guided / diagnostic
Slide 18
Cell 1: Fixed, examiner-controlled claim; Fixed, examiner-controlled observation
Traditional assessments in which …
» the same kind of claim(s) / inferences / SMVs apply to everyone; they were decided on by the examiner a priori,
» the tasks presented are determined by the examiner a priori,
» the examiner determines the sequence of tasks a priori.
Neither the frame of discernment nor the gathering of evidence varies in response to values of observable variables or their impact on beliefs about SMVs.
Slide 19
Cell 2: Fixed, examiner-controlled claim; Adaptive, examiner-controlled observation
Same claims space (SMVs) for everyone
» the claims (SMVs) were decided on by the examiner,
» the tasks presented are determined by examiner a priori,
But in light of the unfolding pattern of responses, examiner selects items to maximize accuracy
IRT-CAT (can be multivariate; Segall, 1996)
Binet’s original individually-administered intelligence test
Lord’s Flexi-level scheme
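One common way an examiner-controlled CAT "selects items to maximize accuracy" is to pick the unused item with the greatest Fisher information at the current ability estimate. The sketch below assumes a two-parameter logistic (2PL) model and a maximum-information rule; the slides do not say which model or rule is meant, and the item bank shown is made up for illustration.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (a: discrimination, b: difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta, bank, used):
    """Pick the unused item that is most informative at the current estimate."""
    candidates = [item for item in bank if item not in used]
    return max(candidates, key=lambda item: item_information(theta, *bank[item]))


# Hypothetical item bank: item id -> (discrimination a, difficulty b).
bank = {"easy": (1.0, -2.0), "medium": (1.0, 0.0), "hard": (1.0, 2.0)}
first = select_next_item(0.0, bank, used=set())
```

With equal discriminations, the rule reduces to choosing the item whose difficulty is closest to the current estimate, so the claim (theta) stays fixed while the observations adapt.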
Slide 20
Cell 3: Fixed, examiner-controlled claim; Adaptive, examinee-controlled observation
Same claims space (SMVs) for everyone
» the claims (SMVs) were decided on by the examiner.
But the examinee is able to determine the tasks as he/she chooses. “User friendly”
Pole-vaulting competition
Self-adaptive SAT (Wise et al., 1992): Student chooses items by page or bin, grouped by difficulty. IRT scoring takes difficulty into account. (Also see Wright, 1977.)
Guard against nonignorable missingness (free throws)
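To illustrate how IRT scoring can take item difficulty into account when examinees pick their own items, here is a small sketch assuming a Rasch (one-parameter) model and a Newton-Raphson maximum-likelihood estimate. The model choice, function name, and difficulty values are assumptions for illustration and are not taken from Wise et al. (1992).

```python
import math


def rasch_mle(responses, difficulties, iterations=25):
    """Maximum-likelihood ability estimate under a Rasch model.

    responses: 1 for correct, 0 for incorrect.
    difficulties: difficulty of each item the examinee chose.
    The estimate is undefined for all-correct or all-incorrect patterns.
    """
    theta = 0.0
    for _ in range(iterations):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        theta += gradient / information  # Newton-Raphson step
    return theta


# Same number correct (2 of 4), but on an easy bin versus a hard bin.
easy_bin = rasch_mle([1, 1, 0, 0], [-1.0, -1.0, -1.0, -1.0])
hard_bin = rasch_mle([1, 1, 0, 0], [1.0, 1.0, 1.0, 1.0])
```

The examinee who chose the harder bin receives the higher ability estimate for the same raw score, which is the sense in which the scoring accounts for difficulty. It does not by itself address the nonignorable-missingness concern noted on the slide.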
Slide 21
Cell 4: Adaptive, examiner-controlled claim; fixed, examiner-controlled observation
Same tasks (OVs) for everyone
Same presentation of tasks, determined a priori by examiner.
But examiner determines claims (SMVs) for examinee in light of responses.
E.g.,
» MMPI: same 100’s of items for everyone, but examiner may compute different scales for different patients.
» Diagnostic “reading record” test in language testing
Note: Need multidimensional claim space in Cells 4-9.
Slide 22
Cell 5: Adaptive, examiner-controlled claim; adaptive, examiner-controlled observation
Claims may diverge for different examinees in light of data
Different tasks for different examinees, to be optimal in light of the claims examiner wants to make about them as individuals
E.g.,
» Triage in medicine, followed by different diagnostics
» Adaptive MMPI: different items for everyone, adaptively selected for different scales for different patients.
» Differential strategies in math (Tatsuoka)
» Adaptive diagnosis in language testing
Slide 23
Cell 6: Adaptive, examiner-controlled claim; adaptive, examinee-controlled observation
Examiners can home in on different claims for different examinees in light of data, but
Examinees have at least some control over task selection.
E.g.,
» Self-adaptive tests, but along dimensions controlled by examiner. Multivariate SA-SAT, examiner’s inferences.
» Diagnostic / placement tests, homing in on different remedial needs of students, but allowing for lower-stress choices of groups/pages of tasks like in Cell 3.
Thus examiner tailors claims part of frame of discernment, examinee tailors observations part given claims.
Slide 24
Cell 7: Adaptive, examinee-controlled claims; fixed, examiner-controlled observations
Examinees all take same examiner-determined items in examiner-determined way, but …
Examinees can home in on different claims of their choosing in light of data.
E.g.,
» MMPI, but examinee determines which scales to compute & analyze.
» Oral reading of a fixed sample, automated parsing; student determines what to work on next (maybe could be done with Ordinate-like setup?)
Slide 25
Cell 8: Adaptive, examinee-controlled claims; adaptive, examiner-controlled observations
Examinee chooses the claim, at beginning or adaptively,
examiner controls task presentation for optimal precision.
E.g., structured self-diagnosis:
» MMPI, where examinee determines which scales to focus on and is presented items adaptively for those scales.
» Oral readings w. automated parsing: student determines what to work on next, then examiner-selected samples to focus on what examinee wants to follow up on.
» SIGI: Sequential exploration of career interests; examinee chooses categories and system asks adaptive questions.
Slide 26
Cell 9: Adaptive, examinee-controlled claims; adaptive, examinee-controlled observations
Examinees control both the claims and the tasks to yield observations for those claims.
The examinee selects the claims to focus on and then has input into what data will be observed.
Feedback from system to help examinee figure out what they want to know, then offer them choices about directions to go to refine information they receive
(continued)
Slide 27
Cell 9, continued: Adaptive, examinee-controlled claims; adaptive, examinee-controlled observations
E.g., guided self-diagnosis:
» Central challenge in retrieval systems in libraries: organize materials and search terms to help patrons find the information they might want
» Amazon: “Customers who looked at these books you selected also looked at…”
» Multivariate SA-SAT practice exploration space
» Language testing self-diagnosis: Start with common passage or list of areas, do diagnostics, use results to refine testing for areas you are interested in.
Claim status (rows) by observation status (columns):
| Claim status | Observation: Fixed | Observation: Adaptive, Examiner Determined | Observation: Adaptive, Examinee Determined |
| Fixed | 1. Usual, linear test | 2. IRT-CAT | 3. Self-adapting tests, e.g., SA-SAT (Wise et al., 1992) |
| Adaptive: Examiner Determined | 4. MMPI, examiner decides how to pursue analysis | 5. Examiner chooses target, Multidim CAT | 6. Examiner chooses target in Multidim SA-SAT |
| Adaptive: Examinee Determined | 7. MMPI, examinee decides how to pursue analysis | 8. Examinee chooses target, Multidim CAT | 9. Examinee chooses target & tasks in Multidim SA-SAT |
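The nine-cell numbering follows directly from the two status dimensions, with locus of control folded into the "adaptive" levels. A minimal sketch, assuming rows index claim status and columns index observation status as in the summary table; the `Status` enum and `cell_number` function are hypothetical names introduced here.

```python
from enum import Enum


class Status(Enum):
    FIXED = 0
    ADAPTIVE_EXAMINER = 1   # adaptive, examiner determined
    ADAPTIVE_EXAMINEE = 2   # adaptive, examinee determined


def cell_number(claim: Status, observation: Status) -> int:
    """Cell 1-9: rows are claim status, columns are observation status."""
    return 3 * claim.value + observation.value + 1


# Example entries copied from the summary table.
CELL_EXAMPLES = {
    1: "Usual, linear test",
    2: "IRT-CAT",
    3: "Self-adapting tests, e.g., SA-SAT",
    4: "MMPI, examiner decides how to pursue analysis",
    5: "Examiner chooses target, Multidim CAT",
    6: "Examiner chooses target in Multidim SA-SAT",
    7: "MMPI, examinee decides how to pursue analysis",
    8: "Examinee chooses target, Multidim CAT",
    9: "Examinee chooses target & tasks in Multidim SA-SAT",
}

irt_cat = CELL_EXAMPLES[cell_number(Status.FIXED, Status.ADAPTIVE_EXAMINER)]
```

For instance, a fixed claim with examiner-determined adaptive observations maps to Cell 2, the IRT-CAT case.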
Slide 29
Conclusion
Assessments involving adaptive claims have yet to achieve the prominence of adaptive-observation assessments.
» History, up-front work, solving known “centralized” problems
User-controlled assessment not seen as assessment
User modeling literature will be important
Cells 8 & 9 good for self-directed learning in a supported environment
» Like user-modeling strategies for buying cars, choosing movies, finding information in library systems.