Classical and Bayesian Computerized Adaptive Testing Algorithms

Richard J. Swartz

Department of Biostatistics ([email protected])

Outline

• Principle of computerized adaptive testing
• Basic statistical concepts and notation
• Trait estimation methods
• Item selection methods
• Comparisons between methods
• Current CAT research topics

Computerized Adaptive Tests (CAT)

• First developed for assessment testing
• Test tailored to an individual
  – Only questions relevant to the individual’s trait level
  – Shorter tests
• Sequential adaptive selection problem
• Requires an item bank
  – Fit with IRT models
  – Extensive initial development before CAT implementation

Item Bank Development I

• Qualitative item development
  – Content experts
  – Response categories
• Test model fit
  – Likelihood ratio based methods
  – Model fit indices

Item Bank Development II

• Test assumption: unidimensionality
  – Factor analysis
  – Confirmatory factor analysis
  – Multidimensional IRT models
• Test assumption: local independence (check for local dependence)
  – Residual correlation after the 1st factor is removed
  – Multidimensional IRT models

Item Bank Development III

• Test assumption: invariance
  – DIF = differential item functioning
    • Over time and across groups (e.g., men vs. women)
    • Many different methods (logistic regression method, area between response curves, and others)

CAT Implementation

[Figure: CAT implementation diagram. Items numbered 1 to 15 are arranged along a scale from low depression to high depression; items 2, 5, and 15 appear with response options a, b, c.]

CAT Item Selection

Basic Concepts/Notation

Latent trait of interest: $\theta$

Bank of items: $B$

Set of items administered after stage $k$: $A_k$

Set of items remaining at stage $k$: $R_k = B \setminus A_{k-1}$

Response to item $i$ ($m_i$ possible categories): $u_i \in \{1, 2, 3, \ldots, m_i\}$

Single item administered at stage $k$: $i_k$

Response to item $i_k$ at stage $k$: $u_{i_k}$

Basic Concepts/Notation II

Vector of responses up to stage $k$:

$$\mathbf{u}_{A_k} = (u_{i_1}, u_{i_2}, u_{i_3}, \ldots, u_{i_k})$$

Probability of response $u_i$ to item $i$ (IRT model), with item parameters $\boldsymbol{\xi}_i$:

$$P_i(u_i \mid \theta, \boldsymbol{\xi}_i)$$

Local independence assumption: for two items $i$ and $j$, given $\theta$,

$$P_i(u_i \mid \theta, \boldsymbol{\xi}_i) \text{ is independent of } P_j(u_j \mid \theta, \boldsymbol{\xi}_j)$$

TRAIT ESTIMATION

Estimating Traits

• Assumes item parameters are known
• The estimate represents the individual’s ability
• Done sequentially in CAT
• Estimate is updated after each additional response
  – Maximum likelihood estimator
  – Bayesian estimators

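Likelihood

• Model describing a person’s response pattern:

$$L(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k}) = f(\mathbf{u}_{A_k} \mid \theta, \boldsymbol{\xi}_{A_k}) = \prod_{j \in A_k} P_j(u_j \mid \theta, \boldsymbol{\xi}_j)$$

where $\boldsymbol{\xi}_{A_k} = (\boldsymbol{\xi}_{i_1}, \boldsymbol{\xi}_{i_2}, \boldsymbol{\xi}_{i_3}, \ldots, \boldsymbol{\xi}_{i_k})$ and $\boldsymbol{\xi}_i$ denotes the item parameters for item $i$.

• $P_j(u_j \mid \theta, \boldsymbol{\xi}_j)$ depends on the IRT model used
• The product form holds because responses are locally independent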
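As a minimal sketch of the likelihood above, the following assumes the logistic graded response model (the model named later for the real item bank). The function names, the `(a, b)` item representation, and the two example items are hypothetical and chosen only for illustration.

```python
import math

def grm_probs(theta, a, b):
    """Category probabilities P(U = 1..m | theta) for one graded-response item.

    a is the discrimination; b is the increasing list of m - 1 thresholds.
    """
    # Cumulative curves P(U >= c), padded with 1 (lowest) and 0 (past highest).
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [cum[c] - cum[c + 1] for c in range(len(b) + 1)]

def log_likelihood(theta, responses, items):
    """ln L(theta | u, xi): local independence turns the product into a sum of logs."""
    return sum(
        math.log(grm_probs(theta, a, b)[u - 1])  # categories are coded 1..m
        for u, (a, b) in zip(responses, items)
    )

# Hypothetical two-item pattern, four categories per item.
items = [(1.5, [-1.0, 0.0, 1.0]), (2.0, [-0.5, 0.5, 1.5])]
responses = [2, 3]
print(log_likelihood(0.0, responses, items))
```

Working on the log scale avoids numerical underflow when many item probabilities are multiplied.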

Maximum Likelihood Estimate

• Frequentist: the “most likely” value of $\theta$ to have generated the responses
• Consistency and efficiency depend on the selection method and item bank used
• Does not always exist

$$\hat{\theta}^{ML}_{A_k} \equiv \hat{\theta}^{ML}(\mathbf{u}_{A_k}) = \arg\max_{\theta} L(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k})$$
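The sketch below illustrates the maximum likelihood estimate with a plain grid search, again assuming a graded response model. The grid bounds, step size, function names, and example items are assumptions for illustration; a production implementation would more likely use Newton-Raphson. When every response falls in the lowest category the likelihood is monotone in $\theta$, so the search ends on the grid boundary, which is how the "does not always exist" case shows up numerically.

```python
import math

def grm_probs(theta, a, b):
    """Graded-response category probabilities for categories 1..m."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [cum[c] - cum[c + 1] for c in range(len(b) + 1)]

def log_likelihood(theta, responses, items):
    """Sum of log category probabilities under local independence."""
    return sum(math.log(grm_probs(theta, a, b)[u - 1])
               for u, (a, b) in zip(responses, items))

def grid_mle(responses, items, lo=-4.0, hi=4.0, step=0.01):
    """Return (theta_hat, at_boundary) from a grid search over [lo, hi]."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    best = max(grid, key=lambda t: log_likelihood(t, responses, items))
    # A maximizer on the edge of the grid signals that no finite MLE exists.
    return best, best in (grid[0], grid[-1])

# Hypothetical items: (discrimination, thresholds).
items = [(1.5, [-1.0, 0.0, 1.0]), (2.0, [-0.5, 0.5, 1.5])]
theta_mixed, boundary_mixed = grid_mle([2, 3], items)  # interior maximum
theta_low, boundary_low = grid_mle([1, 1], items)      # all-lowest pattern
print(theta_mixed, boundary_mixed, theta_low, boundary_low)
```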

Bayesian Framework

• $\theta$ is a random variable
• A distribution on $\theta$ describes knowledge prior to data collection (prior distribution)
• Information about $\theta$ (the trait) is updated as data are collected (posterior distribution)
• Describes a distribution of $\theta$ values instead of a point estimate

Bayes Rule

• Combines information about $\theta$ (prior) with information from the data (likelihood)

Prior: $g(\theta)$; likelihood: $L(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k}) = f(\mathbf{u}_{A_k} \mid \theta, \boldsymbol{\xi}_{A_k})$

Posterior:

$$g(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k}) = \frac{f(\mathbf{u}_{A_k} \mid \theta, \boldsymbol{\xi}_{A_k})\, g(\theta)}{\int f(\mathbf{u}_{A_k} \mid \theta, \boldsymbol{\xi}_{A_k})\, g(\theta)\, d\theta}$$

• Posterior $\propto$ Likelihood × Prior
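Bayes rule can be sketched numerically by evaluating likelihood × prior on a grid of $\theta$ values and normalizing. The standard normal prior, the grid from -4 to 4, the graded response model, and all names below are assumptions made for this illustration.

```python
import math

def grm_probs(theta, a, b):
    """Graded-response category probabilities for categories 1..m."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [cum[c] - cum[c + 1] for c in range(len(b) + 1)]

def log_likelihood(theta, responses, items):
    return sum(math.log(grm_probs(theta, a, b)[u - 1])
               for u, (a, b) in zip(responses, items))

def posterior_weights(responses, items, grid):
    """Normalized posterior over the grid: likelihood x N(0, 1) prior."""
    # The prior's normalizing constant cancels, so exp(-theta^2 / 2) suffices.
    unnorm = [math.exp(log_likelihood(t, responses, items) - 0.5 * t * t)
              for t in grid]
    total = sum(unnorm)  # discrete stand-in for the integral in the denominator
    return [w / total for w in unnorm]

grid = [-4.0 + 0.05 * i for i in range(161)]
items = [(1.5, [-1.0, 0.0, 1.0]), (2.0, [-0.5, 0.5, 1.5])]  # hypothetical
w_high = posterior_weights([4, 4], items, grid)
w_low = posterior_weights([1, 1], items, grid)
mode_high = grid[w_high.index(max(w_high))]  # grid-based MAP estimate
mode_low = grid[w_low.index(max(w_low))]
print(mode_low, mode_high)
```

Unlike the likelihood alone, the posterior has an interior mode even for all-lowest or all-highest response patterns, because the prior pulls the estimate back toward its center.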

Maximum A Posteriori (MAP) Estimate

$$\hat{\theta}^{MAP}_{A_k} \equiv \hat{\theta}^{MAP}(\mathbf{u}_{A_k}) = \arg\max_{\theta} g(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k})$$

• Properties:
  – With a uniform prior, equivalent to the MLE over the support of the prior
  – For some prior/likelihood combinations, the posterior can be multimodal

Expected A Posteriori (EAP) Estimate

$$\hat{\theta}^{EAP}_{A_k} = E[\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k}] = \int \theta\, g(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k})\, d\theta$$

• Properties:
  – Always exists for a proper prior
  – Easy to calculate with numerical integration techniques
  – Prior influences the estimate

Posterior Variance

• Describes the variability of $\theta$
• Can be used as a conditional standard error of measurement (SEM) for a given response pattern

$$\mathrm{Var}(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k}) = \int \left(\theta - E[\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k}]\right)^2 g(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k})\, d\theta$$
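The EAP estimate and the posterior variance can both be approximated with simple quadrature on a grid. This is a sketch under assumptions: a standard normal prior, a graded response model, an evenly spaced grid on [-4, 4], and hypothetical item parameters and function names.

```python
import math

def grm_probs(theta, a, b):
    """Graded-response category probabilities for categories 1..m."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [cum[c] - cum[c + 1] for c in range(len(b) + 1)]

def posterior_weights(responses, items, grid):
    """Normalized likelihood x N(0, 1) prior evaluated on the grid."""
    unnorm = []
    for t in grid:
        log_post = -0.5 * t * t
        for u, (a, b) in zip(responses, items):
            log_post += math.log(grm_probs(t, a, b)[u - 1])
        unnorm.append(math.exp(log_post))
    total = sum(unnorm)
    return [w / total for w in unnorm]

def eap_and_variance(responses, items, grid):
    """Posterior mean (EAP) and posterior variance by grid quadrature."""
    w = posterior_weights(responses, items, grid)
    eap = sum(t * wi for t, wi in zip(grid, w))
    var = sum((t - eap) ** 2 * wi for t, wi in zip(grid, w))
    return eap, var

grid = [-4.0 + 0.05 * i for i in range(161)]
items = [(1.5, [-1.0, 0.0, 1.0]), (2.0, [-0.5, 0.5, 1.5])]  # hypothetical
eap0, var0 = eap_and_variance([], [], grid)         # no data: prior only
eap_hi, var_hi = eap_and_variance([4, 4], items, grid)
sem_hi = math.sqrt(var_hi)                          # conditional SEM
print(eap0, var0, eap_hi, sem_hi)
```

With no responses the result reproduces the prior mean and (approximately) the prior variance, which is a quick sanity check on the quadrature.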

ITEM SELECTION

Item Selection Algorithms

• Choose the item that is “best” for the individual being tested
• Define “best”
  – Most information about the trait estimate
  – Greatest reduction in expected variability of the trait estimate

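Fisher’s Information

• Information of a given item at a trait value

$$\begin{aligned}
I_{\mathbf{U}_{A_k}}(\hat{\theta}_{A_{k-1}}) &= -E\left[\frac{\partial^2}{\partial\theta^2} \ln L(\theta \mid \mathbf{U}_{A_k}, \boldsymbol{\xi}_{A_k})\right]_{\theta = \hat{\theta}_{A_{k-1}}} \\
&= -\sum_{j \in A_k} E\left[\frac{\partial^2}{\partial\theta^2} \ln P_j(U_j \mid \theta, \boldsymbol{\xi}_j)\right]_{\theta = \hat{\theta}_{A_{k-1}}} \\
&= \sum_{j \in A_k} I_{U_j}(\hat{\theta}_{A_{k-1}})
\end{aligned}$$

Observed information:

$$J_{\mathbf{u}_{A_k}}(\hat{\theta}_{A_{k-1}}) = -\frac{\partial^2}{\partial\theta^2} \ln L(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k}) \Big|_{\theta = \hat{\theta}_{A_{k-1}}}$$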

Maximum Fisher’s Information

• Myopic algorithm
• Pick the item $i_k$ at stage $k$ ($i_k \in R_k$) that maximizes Fisher’s information at the current trait estimate (classically the MLE):

$$\begin{aligned}
i_k &= \arg\max_j \left\{ I_{\mathbf{U}_{A_{k-1}}, U_j}(\hat{\theta}^{ML}_{A_{k-1}}) : j \in R_k \right\} \\
&= \arg\max_j \left\{ I_{\mathbf{U}_{A_{k-1}}}(\hat{\theta}^{ML}_{A_{k-1}}) + I_{U_j}(\hat{\theta}^{ML}_{A_{k-1}}) : j \in R_k \right\} \\
&= \arg\max_j \left\{ I_{U_j}(\hat{\theta}^{ML}_{A_{k-1}}) : j \in R_k \right\}
\end{aligned}$$
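The last line of the derivation says that only the candidate item's own information matters, since the information already collected does not depend on $j$. A minimal sketch of that selection step follows, assuming graded response items; the three-item bank, the item identifiers, and the function names are hypothetical.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a graded-response item: sum of (P')^2 / P."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    dcum = [a * c * (1.0 - c) for c in cum]
    return sum((dcum[c] - dcum[c + 1]) ** 2 / (cum[c] - cum[c + 1])
               for c in range(len(b) + 1))

def select_mfi(bank, remaining, theta_hat):
    """Return the id of the remaining item with maximum information at theta_hat."""
    return max(remaining, key=lambda j: item_information(theta_hat, *bank[j]))

# Hypothetical bank: item id -> (discrimination, thresholds).
bank = {
    1: (1.0, [-1.0, 0.0, 1.0]),
    2: (2.0, [1.0, 1.3, 1.6]),
    3: (2.0, [-2.0, -1.5, -1.0]),
}
theta_hat = 1.3                             # current trait estimate
first = select_mfi(bank, [1, 2, 3], theta_hat)
second = select_mfi(bank, [1, 3], theta_hat)  # after the first pick is removed
print(first, second)
```

The current estimate can be any point estimate (MLE, MAP, or EAP); the selection rule itself does not change.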

MFI Selection

[Figure: MFI selection illustrated at $\hat{\theta} = 1.3$.]

Minimum Expected Posterior Variance (MEPV)

• Selects the item that yields the minimum predicted posterior variance given previous responses
• Uses the predictive distribution
• Is a myopic Bayesian decision-theoretic approach (minimizes Bayes risk)
• First described by Owen (1969, 1975)

Predictive Distribution

• Predict the probability of a response to an item given previous responses

$$p(u_i \mid \mathbf{u}_{A_k}) = \int P_i(u_i \mid \theta, \boldsymbol{\xi}_i)\, g(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k})\, d\theta$$
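The predictive distribution is the item's category probabilities averaged over the current posterior. The sketch below approximates the integral on a grid and assumes a standard normal prior and graded response items; all names and parameter values are hypothetical.

```python
import math

def grm_probs(theta, a, b):
    """Graded-response category probabilities for categories 1..m."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [cum[c] - cum[c + 1] for c in range(len(b) + 1)]

def posterior_weights(responses, items, grid):
    """Normalized likelihood x N(0, 1) prior evaluated on the grid."""
    unnorm = []
    for t in grid:
        log_post = -0.5 * t * t
        for u, (a, b) in zip(responses, items):
            log_post += math.log(grm_probs(t, a, b)[u - 1])
        unnorm.append(math.exp(log_post))
    total = sum(unnorm)
    return [w / total for w in unnorm]

def predictive(candidate, responses, items, grid):
    """p(u_i = r | previous responses) for r = 1..m, as a list."""
    a, b = candidate
    w = posterior_weights(responses, items, grid)
    pred = [0.0] * (len(b) + 1)
    for t, wt in zip(grid, w):
        for r, p in enumerate(grm_probs(t, a, b)):
            pred[r] += p * wt  # average the item curve over the posterior
    return pred

grid = [-4.0 + 0.05 * i for i in range(161)]
administered = [(1.5, [-1.0, 0.0, 1.0]), (2.0, [-0.5, 0.5, 1.5])]  # hypothetical
candidate = (1.2, [-0.8, 0.2, 1.2])
pred_after_high = predictive(candidate, [4, 4], administered, grid)
pred_after_low = predictive(candidate, [1, 1], administered, grid)
print(pred_after_high, pred_after_low)
```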

Bayesian Decision Theory

• Dictates optimal (sequential adaptive) decisions
• In addition to the prior and likelihood, specify a loss function (squared-error loss):

$$l\left(\theta, \hat{\theta}(\mathbf{u}_{A_{k-1}}, u_{i_k})\right) = \left(\theta - \hat{\theta}(\mathbf{u}_{A_{k-1}}, u_{i_k})\right)^2$$

Bayesian Decision Theory: Item Selection

• Optimal estimator for squared-error loss is the posterior mean (EAP)
• Select the item that minimizes the Bayes risk:

$$\text{Bayes risk} = E_{U_j \mid \mathbf{u}_{A_{k-1}}}\left[ E_{\theta \mid \mathbf{u}_{A_{k-1}}, U_j}\left[ l\left(\theta, \hat{\theta}(\mathbf{u}_{A_{k-1}}, U_j)\right) \right] \right]$$

$$i_k = \arg\min_j \left\{ E_{U_j \mid \mathbf{u}_{A_{k-1}}}\left[ E_{\theta \mid \mathbf{u}_{A_{k-1}}, U_j}\left[ \left(\theta - \hat{\theta}^{EAP}(\mathbf{u}_{A_{k-1}}, U_j)\right)^2 \right] \right] : j \in R_k \right\}$$

The inner expectation is the posterior variance given $U_j$; the outer expectation is taken over the predicted response to item $j$.



• Pick the item $i_k$ remaining in the bank at stage $k$ ($i_k \in R_k$) that minimizes the expected posterior variance (with respect to the predictive distribution):

$$i_k = \arg\min_j \left\{ \sum_{r=1}^{m_j} p(U_j = r \mid \mathbf{u}_{A_{k-1}})\, \mathrm{Var}(\theta \mid \mathbf{u}_{A_{k-1}}, U_j = r) : j \in R_k \right\}$$
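The MEPV rule can be sketched directly from the formula: for each candidate item, weight the posterior variance that would result from each possible response by that response's predictive probability, then pick the item with the smallest total. The code assumes a standard normal prior, grid quadrature, and graded response items; the bank and all names are hypothetical.

```python
import math

def grm_probs(theta, a, b):
    """Graded-response category probabilities for categories 1..m."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [cum[c] - cum[c + 1] for c in range(len(b) + 1)]

def posterior(grid, prior, history):
    """Normalized posterior weights; history is a list of (item, response)."""
    w = list(prior)
    for (a, b), u in history:
        w = [wi * grm_probs(t, a, b)[u - 1] for wi, t in zip(w, grid)]
    total = sum(w)
    return [wi / total for wi in w]

def variance(grid, w):
    mean = sum(t * wi for t, wi in zip(grid, w))
    return sum((t - mean) ** 2 * wi for t, wi in zip(grid, w))

def expected_posterior_variance(item, grid, prior, history):
    """Sum over categories r of p(U_j = r | u) * Var(theta | u, U_j = r)."""
    a, b = item
    w = posterior(grid, prior, history)
    total = 0.0
    for r in range(1, len(b) + 2):
        # Predictive probability of category r given the previous responses.
        p_r = sum(grm_probs(t, a, b)[r - 1] * wi for t, wi in zip(grid, w))
        w_r = posterior(grid, prior, history + [(item, r)])
        total += p_r * variance(grid, w_r)
    return total

def select_mepv(bank, remaining, grid, prior, history):
    return min(remaining, key=lambda j: expected_posterior_variance(
        bank[j], grid, prior, history))

grid = [-4.0 + 0.05 * i for i in range(161)]
prior = [math.exp(-0.5 * t * t) for t in grid]   # N(0, 1), unnormalized
bank = {1: (1.0, [-1.0, 0.0, 1.0]),              # hypothetical bank
        2: (2.0, [1.0, 1.3, 1.6]),
        3: (2.0, [-2.0, -1.5, -1.0])}
history = [(bank[2], 3)]                         # item 2 answered in category 3
remaining = [1, 3]
epv = {j: expected_posterior_variance(bank[j], grid, prior, history)
       for j in remaining}
chosen = select_mepv(bank, remaining, grid, prior, history)
current_var = variance(grid, posterior(grid, prior, history))
print(chosen, epv, current_var)
```

By the law of total variance, the expected posterior variance of any candidate cannot exceed the current posterior variance, which is a useful check on an implementation.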

Other Information Measures

• Weighted measures
  – Maximum Likelihood Weighted Fisher’s Information (MLWI)
  – Maximum Posterior Weighted Fisher’s Information (MPWI)
• Kullback-Leibler information: global information measure

Hybrid Algorithms

• Maximum Expected Information (MEI)
  – Use observed information
  – Predict information for next item
• Maximum Expected Posterior Weighted Information (MEPWI)
  – Use observed information
  – Predict information for next item
  – Weight with posterior
  – MEPWI ≈ MPWI

Mix-N-Match

• MAP with a uniform prior to approximate the MLE
• MFI using EAP instead of MLE (any point information function)

• Use EAP for item selection, but MLE for the final trait estimate

COMPARISONS

Study Design

• Real item bank
  – Depressive symptom items (62)
  – 4 categories (fit with the graded response IRT model)
• Peaked bank: items have “narrow” coverage
• Flat bank: items have “wider” coverage
• Fixed length: 5-, 10-, and 20-item CATs

Datasets Used

• Post hoc simulation using real data:
  – 730 patients and caregivers at MDA
  – Real bank only
• Simulated data:
  – $\theta$ grid: -3 to 3 by 0.5
  – 500 “simulees” per $\theta$ value
  – Simulated and real banks
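The simulated-data design can be sketched as follows: for each $\theta$ on the grid, draw 500 response patterns from the item model. The graded response items, the seed, and the function names are assumptions for illustration only and do not reproduce the study's actual banks.

```python
import math
import random

def grm_probs(theta, a, b):
    """Graded-response category probabilities for categories 1..m."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [cum[c] - cum[c + 1] for c in range(len(b) + 1)]

def simulate_responses(theta, items, rng):
    """Draw one response (coded 1..m) per item for a simulee with trait theta."""
    pattern = []
    for a, b in items:
        probs = grm_probs(theta, a, b)
        categories = range(1, len(probs) + 1)
        pattern.append(rng.choices(categories, weights=probs)[0])
    return pattern

rng = random.Random(0)                            # fixed seed for repeatability
theta_grid = [-3.0 + 0.5 * i for i in range(13)]  # -3 to 3 by 0.5
items = [(1.5, [-1.0, 0.0, 1.0]), (2.0, [-0.5, 0.5, 1.5])]  # hypothetical bank
n_simulees = 500
data = {
    theta: [simulate_responses(theta, items, rng) for _ in range(n_simulees)]
    for theta in theta_grid
}
print(len(data), len(data[0.0]), data[0.0][0])
```

Because the true $\theta$ is known for every simulee, estimates from each selection method can be compared against it directly.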

Real Item Bank Characteristics

[Figure: test information (axis 0 to 80) and standard error (axis 0.0 to 2.0) of the real item bank plotted against theta from -4 to 4, with a rug of tick marks along the theta axis.]

Real Bank, Real Data, 5 Items

[Figure: SE plotted against the theta estimate (-2 to 2), one panel per method, each annotated with p(SE <= .32): MFI 0.53; MLWI 0.55; MPWI 0.54; MEPV 0.55; MEI (Fisher) 0.55; MEI (Observed) 0.55; Random 0.02; Theta 0.53; Fixed (static) 0.49; Full Bank 0.94.]

Real Bank, Real Data, 5 Items

Selection Criterion   Mean SE^2   RMSD     CORR
MFI                   0.1463      0.3763   0.9069
MLWI                  0.1432      0.3736   0.9094
MPWI                  0.1396      0.3738   0.9080
MEPV                  0.1388      0.3598   0.9149
MEI (Fisher’s)        0.1388      0.3632   0.9134
MEI (Observed)        0.1388      0.3616   0.9139
Random                0.2369      0.4567   0.8565

Peaked Bank, Sim. Data, 5 Items

[Figure: SE plotted against the theta estimate (-3 to 3), one panel per method, each annotated with p(SE <= .32): MFI 0.44; MLWI 0.26; MPWI 0.43; MEPV 0.44; MEI (Fisher) 0.44; MEI (Observed) 0.44; Random 0; Theta 0.55; Fixed (static) 0.12.]

Peaked Bank, Sim. Data, 5 Items

Selection Criterion   BIAS     RMSE     CORR
MFI                   0.0283   0.3923   0.9822
MLWI                  0.0678   0.4798   0.9724
MPWI                  0.0261   0.3898   0.9822
MEPV                  0.0232   0.3871   0.9822
MEI (Fisher’s)        0.0299   0.3903   0.9824
MEI (Observed)        0.0283   0.3911   0.9823
Random                0.0095   0.8378   0.9233
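For readers who want to compute the table's column headings on their own simulations, here is a minimal sketch. It assumes the usual definitions (bias as the mean of estimate minus true value, RMSE as the root of the mean squared difference, CORR as the Pearson correlation); the slides do not spell these out, and the function names and toy numbers are hypothetical.

```python
import math

def bias(true, est):
    """Mean signed error: average of (estimate - true)."""
    return sum(e - t for t, e in zip(true, est)) / len(true)

def rmse(true, est):
    """Root mean squared error between estimates and true values."""
    return math.sqrt(sum((e - t) ** 2 for t, e in zip(true, est)) / len(true))

def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy check: every estimate is exactly 0.5 above the true value.
true_theta = [0.0, 1.0, 2.0]
est_theta = [0.5, 1.5, 2.5]
print(bias(true_theta, est_theta), rmse(true_theta, est_theta),
      pearson(true_theta, est_theta))
```

A constant offset shows up in bias and RMSE but leaves the correlation at 1, which is why the three columns are read together.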

Summary

• Polytomous items
  – Choi and Swartz, in press
  – Classic MFI with MLE, and MLWI, do not perform as well as the others
  – MFI with EAP and all other criteria perform essentially the same
• Dichotomous items
  – van der Linden, 1998
  – MFI with MLE does not perform as well as all others*
  – Difference is more pronounced for shorter tests

Adaptations/Active Research Areas

• Constrained adaptive tests/content balancing
• Exposure control
• a-stratified adaptive testing
• Item selection including burden
• Cheating detection
• Response times

Thank You!

References and Further Reading

Choi SW, Swartz RJ. (in press). Comparison of CAT item selection criteria for polytomous items. Applied Psychological Measurement.

Owen RJ. (1969). A Bayesian approach to tailored testing (Research Report 69-92). Princeton, NJ: Educational Testing Service.

Owen RJ. (1975). A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. Journal of the American Statistical Association, 70, 351-356.

van der Linden WJ. (1998). Bayesian item selection criteria for adaptive testing. Psychometrika, 63(2), 201-216.

van der Linden WJ, Glas CAW (Eds.). (2000). Computerized Adaptive Testing: Theory and Practice. Dordrecht; Boston: Kluwer Academic.

MLE Properties

• Usually has desirable asymptotic properties
• Consistency and efficiency depend on the selection criteria and item bank
• A finite estimate does not exist when all responses fall in category 1 or all fall in category $m$
