A Taxonomy of Adaptive Testing

TSLL 07, September 22, 2007

Robert J. Mislevy

Measurement, Statistics & Evaluation

University of Maryland

in collaboration with

Roy Levy, Arizona State University

John T. Behrens, Cisco Systems, Inc.

Presented at the Fifth Annual Technology for Second Language Learning Conference,

September 21-22, 2007, Iowa State University, Ames, Iowa, USA

Terminology & Concepts for Adaptive Testing

Adaptive testing

  » Most familiar as item response theory-based computer-adaptive testing (IRT-CAT)

Can take a broader perspective of evidentiary reasoning

We will look at the interplay among inferences and data gathering

A taxonomy of configurations

  » IRT-CAT plus many others

Taxonomy based on three dimensions:

  » Claim status
  » Observation status
  » Locus of control

Background for the dimensions

Glenn Shafer’s “Frame of discernment”

Evidence-centered assessment design

“Frame of discernment”

From Shafer’s (1976) A mathematical theory of evidence.

It’s all the possible combinations of values of the variables you are working with.

“Frame” emphasizes how it effectively circumscribes a universe in which inference will take place.

“Discern” = “detect, recognize, distinguish”

Property of you as much as property of the world

Depends on what you know and what your purpose is

“Frame of discernment”

Frames of discernment can evolve over time, as beliefs, knowledge, and aims unfold.

  » E.g., dip for the party? medical diagnosis

Move from one frame of discernment to another by

  » ascertaining values of some variables, dropping others,
  » adding new variables or refining current ones,
  » constructing a different frame when observations cause rethinking of assumptions or goals

Evidence-Centered Design

Mislevy, Steinberg, & Almond (2003) “On the structure of educational assessments.”

Educational assessment as evidentiary argument:

We reason from the things students say, do, or make in a handful of particular settings, to what they know, can do in various situations, or have accomplished, as more broadly construed.

All elements of an assessment, from analysis of domain, through design, to operation, are based on building then embodying such an argument in operational procedures.

Toulmin’s Argument Structure

[Diagram: Toulmin’s argument structure. Recoverable labels: Warrant (“since”), Backing, Alternative explanation (“unless”), and “so”.]

An Assessment Design Argument

[Diagram: the assessment design argument. Labels: Claim about student in some frame of discernment; Data concerning situation; Data concerning performance; Student acting in assessment situation; Warrant; Backing.]

Information pertinent to addressing the claims is accumulated in terms of student-model variables (SMVs).

Aspects of performance that bear on claims are captured in terms of observable variables (OVs).

What we actually see/hear the student say, do, or make.

What aspects of the situation are important for the possibility of inference about the examinee?

Formative assessments often have highly specific claims; summative assessments tend to have broader claims.

Adaptive Testing

[Diagram: the assessment design argument (Data concerning situation; Data concerning performance; Student acting in assessment situation; Claim about student in some frame of discernment; Warrant; Backing), annotated with these numbered steps:]

1. Somebody selects situation for getting information

2. Examinee acts

3. Evaluation of performance in light of current targeted claim

4. Update belief about claim

5. Somebody has choice about whether to refocus claim
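
The five steps above can be read as a loop. The following is only a minimal sketch of that reading, not a procedure given in the talk: every callable (`select_task`, `administer`, `evaluate`, `update_belief`, `refocus`) is a hypothetical placeholder, and belief is reduced to a single number for illustration.

```python
from typing import Callable, List, Optional, Tuple

def run_adaptive_assessment(
    claim: str,
    belief: float,
    select_task: Callable[[str, float], Optional[str]],
    administer: Callable[[str], str],
    evaluate: Callable[[str, str, str], int],
    update_belief: Callable[[float, int], float],
    refocus: Callable[[str, float], str],
    max_tasks: int = 10,
) -> Tuple[str, float, List[Tuple[str, str, int]]]:
    """Run the select / act / evaluate / update / refocus cycle.

    All callables are hypothetical stand-ins supplied by the caller.
    Returns the final claim, the final belief, and a history of
    (claim, task, score) triples.
    """
    history: List[Tuple[str, str, int]] = []
    for _ in range(max_tasks):
        task = select_task(claim, belief)      # 1. somebody selects a situation
        if task is None:                       # nothing left to administer
            break
        work = administer(task)                # 2. examinee acts
        score = evaluate(task, work, claim)    # 3. evaluate in light of the current claim
        belief = update_belief(belief, score)  # 4. update belief about the claim
        history.append((claim, task, score))
        claim = refocus(claim, belief)         # 5. somebody may refocus the claim
    return claim, belief, history
```

Who supplies `select_task` and `refocus` (examiner or examinee), and whether they actually depend on earlier scores, is what distinguishes the configurations in the taxonomy.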

What is an adaptive test?

At a given time in an assessment system, the set of student-model variables and observable variables constitutes a frame of discernment.

An adaptive test is one in which the frame of discernment changes over time as a function of the values of observations.

Ways it might change are the basis of the taxonomy.
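
One way to make this definition concrete is to treat a frame as a pair of variable sets and a test as a rule that maps the observations so far to a frame. The sketch below is an illustration under those assumptions only; `Frame`, `is_adaptive`, and the two example policies are hypothetical names, not part of the talk. Histories are compared only at equal length, so that a frame that merely changes with time (and not with observed values) does not count as adaptive.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

@dataclass(frozen=True)
class Frame:
    """A frame of discernment: the SMVs and OVs currently in play."""
    smvs: FrozenSet[str]
    ovs: FrozenSet[str]

# A policy maps the observed values so far to the current frame.
Policy = Callable[[Tuple[int, ...]], Frame]

def is_adaptive(policy: Policy, histories: Sequence[Tuple[int, ...]]) -> bool:
    """True if two equal-length observation histories yield different frames."""
    frame_by_length: Dict[int, Frame] = {}
    for history in histories:
        frame = policy(history)
        seen = frame_by_length.setdefault(len(history), frame)
        if seen != frame:
            return True
    return False

def linear_policy(history: Tuple[int, ...]) -> Frame:
    # Same SMV and same items no matter what has been observed.
    return Frame(frozenset({"theta"}), frozenset({"item1", "item2", "item3"}))

def branching_policy(history: Tuple[int, ...]) -> Frame:
    # A correct first response brings a harder item into the frame.
    follow_up = "hard_item" if history and history[0] == 1 else "easy_item"
    return Frame(frozenset({"theta"}), frozenset({"item1", follow_up}))
```

With the histories `[(0,), (1,)]`, `is_adaptive(linear_policy, ...)` is false and `is_adaptive(branching_policy, ...)` is true, because only the second rule lets observed values change the frame.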

Claim Status

Is the claim part of the frame of discernment, i.e., SMVs, fixed or evolving?

i.e., do the SMVs at issue stay the same or change (as opposed to knowledge about SMVs)?

Observation status

Is the data part of the frame of discernment, i.e., OVs, fixed or evolving?

i.e., does the choice of OVs that can be made stay the same or change as more information is obtained?

Locus of Control

If the claim part of the frame is changing as the test proceeds, who decides how it should change:

The examiner or the examinee?

If the data part of the frame is changing as the test proceeds, who decides how it should change:

The examiner or the examinee?

Taxonomy grid: claim status (rows) by observation status (columns)

Observation status columns: Fixed; Adaptive: Examiner Determined; Adaptive: Examinee Determined

Claim status Fixed: 1. Usual, linear test; 2. IRT-CAT; “User friendly” testing

Claim status Adaptive: Examiner Determined: Guided / diagnostic

Claim status Adaptive: Examinee Determined: Self-guided / diagnostic

Cell 1: Fixed, examiner-controlled claim; Fixed, examiner-controlled observation

Traditional assessments in which …

  » the same kind of claim(s) / inferences / SMVs apply to everyone,
  » they were decided on by the examiner a priori,
  » the tasks presented are determined by the examiner a priori,
  » the examiner determines the sequence of tasks a priori.

Neither the frame of discernment nor the gathering of evidence varies in response to values of observable variables or their impact on beliefs about SMVs.

Cell 2: Fixed, examiner-controlled claim; Adaptive, examiner-controlled observation

Same claims space (SMVs) for everyone

  » the claims (SMVs) were decided on by the examiner,
  » the tasks presented are determined by examiner a priori,

But in light of the unfolding pattern of responses, the examiner selects items to maximize accuracy.

  » IRT-CAT (can be multivariate; Segall, 1996)
  » Binet’s original individually-administered intelligence test
  » Lord’s Flexi-level scheme
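
The slide does not say how items are selected to maximize accuracy. A common choice in IRT-CAT is to pick the unused item with the greatest Fisher information at the current ability estimate; the sketch below assumes that rule and a two-parameter logistic model. The function names and the item pool in the usage note are made up for illustration.

```python
import math
from typing import Dict, Set, Tuple

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response (logistic metric)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(
    theta: float,
    pool: Dict[str, Tuple[float, float]],  # item id -> (discrimination a, difficulty b)
    used: Set[str],
) -> str:
    """Return the unused item with maximum information at theta.

    Raises ValueError if every item in the pool has been used.
    """
    candidates = {item: params for item, params in pool.items() if item not in used}
    if not candidates:
        raise ValueError("item pool exhausted")
    return max(candidates, key=lambda item: item_information(theta, *candidates[item]))
```

For a hypothetical pool `{"easy": (1.0, -1.5), "medium": (1.0, 0.0), "hard": (1.0, 1.5)}`, an examinee currently estimated near `theta = 0.2` would be given `"medium"`, and one near `theta = 1.4` would be given `"hard"`. The claim (the SMV `theta`) is the same for everyone; only the observations differ.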

Cell 3: Fixed, examiner-controlled claim; Adaptive, examinee-controlled observation

Same claims space (SMVs) for everyone the claims (SMVs) were decided on by the examiner.

But the examinee is able to determine the tasks, however he/she chooses. “User friendly”

  » Pole-vaulting competition
  » Self-adaptive SAT (Wise et al., 1992): Student chooses items by page or bin, grouped by difficulty. IRT scoring takes difficulty into account. (Also see Wright, 1977.)

Guard against nonignorable missingness (free throws)
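
To illustrate what "IRT scoring takes difficulty into account" can mean, here is a small sketch that assumes a Rasch model with known item difficulties and estimates ability by maximum likelihood with Newton-Raphson. It is not the scoring procedure of the cited studies, and the difficulties in the usage note are invented.

```python
import math
from typing import Sequence

def rasch_mle(
    responses: Sequence[int],
    difficulties: Sequence[float],
    max_iter: int = 50,
    tol: float = 1e-8,
) -> float:
    """Maximum-likelihood ability estimate under the Rasch model.

    responses are 0/1 scores on the items the examinee chose;
    difficulties are the (assumed known) difficulties of those items.
    """
    if len(responses) != len(difficulties):
        raise ValueError("responses and difficulties must have equal length")
    total = sum(responses)
    if total == 0 or total == len(responses):
        raise ValueError("no finite MLE for all-correct or all-incorrect patterns")
    theta = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))  # d logL / d theta
        information = sum(p * (1.0 - p) for p in probs)          # -d2 logL / d theta2
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta
```

Two examinees who each answer two of three items correctly receive different estimates if one chose items of difficulty -1.0 and the other chose items of difficulty 1.0: the second estimate is higher by exactly the difference in difficulty. This handles the difference in chosen difficulty only; it does not address the nonignorable-missingness concern noted on the slide.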

Cell 4: Adaptive, examiner-controlled claim; fixed, examiner-controlled observation

Same tasks (OVs) for everyone

Same presentation of tasks, determined a priori by examiner.

But examiner determines claims (SMVs) for examinee in light of responses. E.g.,

  » MMPI: the same hundreds of items for everyone, but the examiner may compute different scales for different patients.
  » Diagnostic “reading record” test in language testing

Note: Need multidimensional claim space in Cells 4-9.
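
A toy sketch of the Cell 4 pattern: every examinee answers the same items, and the examiner decides afterwards which scales (claims) to compute in light of the responses. The scale names, item keys, and cutoff below are invented for illustration and have nothing to do with the actual MMPI scales.

```python
from typing import Dict, List

# Hypothetical scoring keys: which items feed which scale.
SCALE_ITEMS: Dict[str, List[str]] = {
    "scale_A": ["q1", "q2", "q3"],
    "scale_B": ["q3", "q4"],
    "scale_C": ["q5", "q6"],
}

def score_scale(responses: Dict[str, int], scale: str) -> int:
    """Sum the 0/1 responses on the items keyed to one scale."""
    return sum(responses[item] for item in SCALE_ITEMS[scale])

def examiner_report(
    responses: Dict[str, int],
    screen: str = "scale_A",
    cutoff: int = 2,
) -> Dict[str, int]:
    """Score a screening scale, then choose the follow-up scale from its result.

    The observations (items) are fixed; only the claims pursued vary.
    """
    report = {screen: score_scale(responses, screen)}
    follow_up = "scale_B" if report[screen] >= cutoff else "scale_C"
    report[follow_up] = score_scale(responses, follow_up)
    return report
```

Two patients with different answers to the same six items end up with reports on different scales, which is why the claim space has to be multidimensional from Cell 4 onward.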

Cell 5: Adaptive, examiner-controlled claim; adaptive, examiner-controlled observation

Claims may diverge for different examinees in light of data

Different tasks for different examinees, to be optimal in light of the claims examiner wants to make about them as individuals

E.g.,

  » Triage in medicine, followed by different diagnostics
  » Adaptive MMPI: different items for everyone, adaptively selected for different scales for different patients.
  » Differential strategies in math (Tatsuoka)
  » Adaptive diagnosis in language testing

Cell 6: Adaptive, examiner-controlled claim; adaptive, examinee-controlled observation

Examiners can home in on different claims for different examinees in light of data, but

Examinees have at least some control over task selection.

E.g.,

  » Self-adaptive tests, but along dimensions controlled by examiner. Multivariate SA-SAT, examiner’s inferences.
  » Diagnostic / placement tests, homing in on different remedial needs of students, but allowing for lower-stress choices of groups/pages of tasks like in Cell 3.

Thus the examiner tailors the claims part of the frame of discernment; the examinee tailors the observations part given the claims.

Cell 7: Adaptive, examinee-controlled claims; fixed, examiner-controlled observations

Examinees all take same examiner-determined items in examiner-determined way, but …

Examinees can home in on different claims of their choosing in light of data.

E.g.,

  » MMPI, but examinee determines which scales to compute & analyze.
  » Oral reading of a fixed sample, automated parsing: student determines what to work on next (maybe could be done with Ordinate-like setup?)

Cell 8: Adaptive, examinee-controlled claims; adaptive, examiner-controlled observations

Examinee chooses the claim, at beginning or adaptively; examiner controls task presentation for optimal precision.

E.g., structured self-diagnosis:

  » MMPI, where examinee determines which scales to focus on and is presented items adaptively for those scales.
  » Oral readings with automated parsing: student determines what to work on next, then examiner-selected samples focus on what examinee wants to follow up on.
  » SIGI: Sequential exploration of career interests; examinee chooses categories and system asks adaptive questions.

Cell 9: Adaptive, examinee-controlled claims; adaptive, examinee-controlled observations

Examinees control both the claims and the tasks to yield observations for those claims.

The examinee selects the claims to focus on and then has input into what data will be observed.

Feedback from system to help examinees figure out what they want to know, then offer them choices about directions to go to refine the information they receive

(continued)

Cell 9, continued: Adaptive, examinee-controlled claims; adaptive, examinee-controlled observations

E.g., guided self-diagnosis:

  » Central challenge in retrieval systems in libraries: organize materials and search terms to help patrons find the information they might want
  » Amazon: “Customers who looked at these books you selected also looked at…”
  » Multivariate SA-SAT practice exploration space
  » Language testing self-diagnosis: Start with common passage or list of areas, do diagnostics, use results to refine testing for areas you are interested in.

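Taxonomy grid: claim status (rows) by observation status (columns)

Observation status columns: Fixed; Adaptive: Examiner Determined; Adaptive: Examinee Determined

Claim status Fixed:
  1. Usual, linear test
  2. IRT-CAT
  3. Self-adapting tests, e.g., SA-SAT (Wise et al., 1992)

Claim status Adaptive: Examiner Determined:
  4. MMPI: examiner decides how to pursue analysis
  5. Examiner chooses target, Multidim CAT
  6. Examiner chooses target in Multidim SA-SAT

Claim status Adaptive: Examinee Determined:
  7. MMPI: examinee decides how to pursue analysis
  8. Examinee chooses target, Multidim CAT
  9. Examinee chooses target & tasks in Multidim SA-SAT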
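
The cell numbering follows directly from the two status dimensions, with locus of control folded into the adaptive statuses. A minimal sketch of that mapping, using hypothetical names (`Status`, `cell_number`, `statuses`) that do not appear in the talk:

```python
from enum import IntEnum
from typing import Tuple

class Status(IntEnum):
    """Status of the claim part or the observation part of the frame."""
    FIXED = 0
    ADAPTIVE_EXAMINER = 1  # adaptive, examiner determined
    ADAPTIVE_EXAMINEE = 2  # adaptive, examinee determined

def cell_number(claim: Status, observation: Status) -> int:
    """Cell 1-9: claim status picks the row, observation status the column."""
    return 3 * int(claim) + int(observation) + 1

def statuses(cell: int) -> Tuple[Status, Status]:
    """Inverse mapping: return (claim status, observation status) for a cell."""
    if not 1 <= cell <= 9:
        raise ValueError("cell must be between 1 and 9")
    claim, observation = divmod(cell - 1, 3)
    return Status(claim), Status(observation)
```

For example, `cell_number(Status.FIXED, Status.ADAPTIVE_EXAMINER)` gives 2 (IRT-CAT), and `statuses(9)` gives examinee-determined adaptation on both dimensions.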

Conclusion

Assessments involving adaptive claims have yet to achieve the prominence of adaptive-observation assessments.

  » History, up-front work, solving known “centralized” problems

User-controlled assessment not seen as assessment

User modeling literature will be important

Cells 8 & 9 good for self-directed learning in a supported environment

  » Like user-modeling strategies for buying cars, choosing movies, finding information in library systems.
