Classical and Bayesian Computerized Adaptive Testing Algorithms
Richard J. Swartz
Department of Biostatistics ([email protected])
Outline
• Principle of computerized adaptive testing
• Basic statistical concepts and notation
• Trait estimation methods
• Item selection methods
• Comparisons between methods
• Current CAT research topics
Computerized Adaptive Tests (CAT)
• First developed for assessment testing
• Test tailored to an individual
– Only questions relevant to individual trait level
– Shorter tests
• Sequential adaptive selection problem
• Requires item bank
– Fit with IRT models
– Extensive initial development before CAT implementation
Item Bank Development I
• Qualitative item development
– Content experts
– Response categories
• Test model fit
– Likelihood ratio based methods
– Model fit indices
Item Bank Development II
• Test assumption: Unidimensionality
– Factor analysis
– Confirmatory factor analysis
– Multidimensional IRT models
• Test assumption: Local independence
– Residual correlation after 1st factor removed
– Multidimensional IRT models
Item Bank Development III
• Test assumption: Invariance
– DIF = differential item functioning
• Over time and across groups (e.g., men vs. women)
• Many different methods (logistic regression method, area between response curves, and others)
CAT Implementation
[Figure: items numbered 1–15 placed along a scale from "Lo Depression" to "Hi Depression", with response categories a, b, c shown for selected items.]
CAT Item Selection
[Figure]
Basic Concepts/ Notation
• Latent trait of interest: $\theta$
• Bank of items: $B$
• Set of items administered after stage $k$: $A_k$
• Set of items remaining at stage $k$: $R_k = B \setminus A_{k-1}$
• Response to item $i$ ($m_i$ possible categories): $u_i \in \{1, 2, 3, \ldots, m_i\}$
• Single item administered at stage $k$: $i_k$
• Response to item $i_k$ at stage $k$: $u_{i_k}$
Basic Concepts/ Notation II
• Vector of responses up to stage $k$:
$$\mathbf{u}_{A_k} = (u_{i_1}, u_{i_2}, u_{i_3}, \ldots, u_{i_k})$$
• Probability of response $u_i$ to item $i$ (IRT model), with item parameters $\xi_i$:
$$P_i(u_i \mid \theta, \xi_i)$$
• Local independence assumption: for two items $i$ and $j$, given $\theta$,
$$P_i(u_i \mid \theta, \xi_i) \text{ is independent of } P_j(u_j \mid \theta, \xi_j)$$
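As a concrete instance of the IRT response probability, here is a minimal sketch of category probabilities under a graded response model (the model the comparison study later in this deck fits to its items). The function name `grm_probs`, the logistic parameterization, and the parameter values are illustrative assumptions, not taken from the slides.

```python
import math

def grm_probs(theta, a, bs):
    """Category probabilities P(u = 1..m | theta) for one graded-response item.

    a  : discrimination parameter (assumed parameterization)
    bs : increasing thresholds b_1 < ... < b_{m-1}
    """
    # Cumulative curves P(u >= c), padded with 1 on the left and 0 on the right.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[c] - cum[c + 1] for c in range(len(bs) + 1)]

# Hypothetical 4-category item evaluated at theta = 0.
probs = grm_probs(0.0, 1.5, [-1.0, 0.0, 1.0])
```

With thresholds placed symmetrically around the evaluation point, the lowest and highest categories receive equal probability, which is a quick sanity check on the parameterization.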
TRAIT ESTIMATION
Estimating Traits
• Assumes item parameters are known
• Represents the individual’s ability
• Done sequentially in CAT
• Estimate is updated after each additional response
– Maximum Likelihood Estimator
– Bayesian Estimators
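Likelihood
• Model describing a person’s response pattern:
$$L(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k}) = f(\mathbf{u}_{A_k} \mid \theta, \boldsymbol{\xi}_{A_k}) = \prod_{j \in A_k} P_j(u_j \mid \theta, \xi_j) \quad \text{(locally independent)}$$
where $\xi_i$ denotes the item parameters for item $i$, $\boldsymbol{\xi}_{A_k} = (\xi_{i_1}, \xi_{i_2}, \xi_{i_3}, \ldots, \xi_{i_k})$, and $P_j(u_j \mid \theta, \xi_j)$ depends on the IRT model used.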
Maximum Likelihood Estimate
• Frequentist: the trait value most “likely” to have generated the responses
• Consistency and efficiency depend on the selection methods and item bank used
• Does not always exist
$$\hat{\theta}^{ML}_{A_k} \equiv \arg\max_{\theta} L(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k})$$
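The following sketch illustrates the maximum likelihood estimate and why it "does not always exist". It assumes a graded response model and a simple grid search over a bounded range; the helper names, the grid bounds, and the three hypothetical items are assumptions made for illustration only.

```python
import math

def grm_probs(theta, a, bs):
    """Graded-response category probabilities (assumed parameterization)."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
    return [cum[c] - cum[c + 1] for c in range(len(bs) + 1)]

def log_likelihood(theta, items, responses):
    """ln L(theta | u): a sum over items, which relies on local independence."""
    return sum(math.log(grm_probs(theta, a, bs)[u - 1])
               for (a, bs), u in zip(items, responses))

def mle_on_grid(items, responses, lo=-4.0, hi=4.0, n=161):
    """Grid-search stand-in for arg max over theta, restricted to [lo, hi]."""
    grid = [lo + (hi - lo) * t / (n - 1) for t in range(n)]
    return max(grid, key=lambda th: log_likelihood(th, items, responses))

# Three hypothetical 4-category items, each given as (a, [b_1, b_2, b_3]).
items = [(1.5, [-1.0, 0.0, 1.0]), (1.0, [-0.5, 0.5, 1.5]), (2.0, [-1.5, -0.5, 0.5])]

theta_mixed = mle_on_grid(items, [2, 3, 2])  # mixed pattern: interior maximum
theta_floor = mle_on_grid(items, [1, 1, 1])  # all lowest category: runs to the bound
```

For the all-lowest pattern the likelihood decreases monotonically in theta, so the search returns the lower grid bound; without the bound there would be no finite maximizer.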
Bayesian Framework
• $\theta$ is a random variable
• A distribution on $\theta$ describes knowledge prior to data collection (Prior distribution)
• Update information about $\theta$ (Trait) as data is collected (Posterior distribution)
• Describes distribution of $\theta$ values instead of a point estimate
Bayes Rule
• Combines information about $\theta$ (prior) with information from the data (Likelihood)
$$\text{posterior: } g(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k}) = \frac{f(\mathbf{u}_{A_k} \mid \theta, \boldsymbol{\xi}_{A_k})\, g(\theta)}{\int f(\mathbf{u}_{A_k} \mid \theta, \boldsymbol{\xi}_{A_k})\, g(\theta)\, d\theta}$$
$$\text{prior: } g(\theta); \qquad \text{Likelihood: } L(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k}) = f(\mathbf{u}_{A_k} \mid \theta, \boldsymbol{\xi}_{A_k})$$
• Posterior $\propto$ Likelihood × Prior
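Bayes rule can be sketched numerically by evaluating likelihood × prior on a grid and normalizing, with the sum standing in for the integral in the denominator. The normal prior, the grid, the graded response model, and all names and parameter values below are assumptions chosen for illustration.

```python
import math

def grm_probs(theta, a, bs):
    """Graded-response category probabilities (assumed parameterization)."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
    return [cum[c] - cum[c + 1] for c in range(len(bs) + 1)]

def posterior_on_grid(items, responses, grid, prior_sd=1.0):
    """Normalized posterior weights g(theta | u) on an equally spaced grid."""
    weights = []
    for th in grid:
        prior = math.exp(-0.5 * (th / prior_sd) ** 2)  # N(0, prior_sd^2), unnormalized
        like = 1.0
        for (a, bs), u in zip(items, responses):
            like *= grm_probs(th, a, bs)[u - 1]         # local independence
        weights.append(like * prior)
    total = sum(weights)                                # discrete normalizing constant
    return [w / total for w in weights]

grid = [-4.0 + 0.05 * t for t in range(161)]
items = [(1.5, [-1.0, 0.0, 1.0]), (1.0, [-0.5, 0.5, 1.5]), (2.0, [-1.5, -0.5, 0.5])]

post = posterior_on_grid(items, [2, 3, 2], grid)
theta_map = grid[post.index(max(post))]                 # grid approximation of the MAP

prior_only = posterior_on_grid([], [], grid)            # no data: posterior equals prior
theta_prior_mode = grid[prior_only.index(max(prior_only))]
```

The same weights can be reused for any posterior summary: their maximizer approximates the posterior mode, and weighted sums approximate posterior expectations.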
Maximum A Posteriori (MAP) Estimate
• Properties:
– Uniform prior: equivalent to MLE over the support of the prior
– For some prior/likelihood combinations, the posterior can be multimodal
$$\hat{\theta}^{MAP}_{A_k} \equiv \arg\max_{\theta} g(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k})$$
Expected A Posteriori (EAP) Estimate
• Properties:
– Always exists for a proper prior
– Easy to calculate with numerical integration techniques
– Prior influences estimate
$$\hat{\theta}^{EAP}_{A_k} \equiv E[\theta \mid \mathbf{u}_{A_k}] = \int \theta\, g(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k})\, d\theta$$
Posterior Variance
• Describes variability of $\theta$
• Its square root can be used as the conditional Standard Error of Measurement (SEM) for a given response pattern.
$$\mathrm{Var}(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k}) = \int \left(\theta - E[\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k}]\right)^2 g(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k})\, d\theta$$
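A minimal sketch of the EAP estimate and posterior variance by numerical integration, using a plain Riemann sum on an equally spaced grid. The standard normal prior, the graded response model, and all function names and item parameters are assumptions for illustration, not the author's implementation.

```python
import math

def grm_probs(theta, a, bs):
    """Graded-response category probabilities (assumed parameterization)."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
    return [cum[c] - cum[c + 1] for c in range(len(bs) + 1)]

def posterior_weights(items, responses, grid):
    """Posterior on the grid under an assumed N(0, 1) prior."""
    ws = []
    for th in grid:
        w = math.exp(-0.5 * th * th)
        for (a, bs), u in zip(items, responses):
            w *= grm_probs(th, a, bs)[u - 1]
        ws.append(w)
    total = sum(ws)
    return [w / total for w in ws]

def eap_and_variance(items, responses, grid):
    """Posterior mean (EAP) and posterior variance as weighted sums."""
    post = posterior_weights(items, responses, grid)
    eap = sum(w * th for w, th in zip(post, grid))
    var = sum(w * (th - eap) ** 2 for w, th in zip(post, grid))
    return eap, var

grid = [-4.0 + 0.05 * t for t in range(161)]
items = [(1.5, [-1.0, 0.0, 1.0]), (1.0, [-0.5, 0.5, 1.5]), (2.0, [-1.5, -0.5, 0.5])]

prior_mean, prior_var = eap_and_variance([], [], grid)  # before any response
eap, var = eap_and_variance(items, [2, 3, 2], grid)     # after three responses
sem = math.sqrt(var)                                    # conditional SEM
```

Before any response the summaries reproduce the prior (mean near 0, variance near 1); after three responses the variance shrinks, and its square root serves as the conditional SEM.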
ITEM SELECTION
Item Selection Algorithms
• Choose the item that is “best” for the individual being tested
• Define “best”
– Most information about trait estimate
– Greatest reduction in expected variability of trait estimate
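Fisher’s Information
• Information of a given item at a trait value
$$I_{\mathbf{U}_{A_k}}(\hat{\theta}_{k-1}) = E\left[\left(\frac{\partial}{\partial\theta} \ln L(\theta \mid \mathbf{U}_{A_k}, \boldsymbol{\xi}_{A_k})\right)^2\right]_{\theta = \hat{\theta}_{k-1}}$$
$$= -E\left[\frac{\partial^2}{\partial\theta^2} \ln L(\theta \mid \mathbf{U}_{A_k}, \boldsymbol{\xi}_{A_k})\right]_{\theta = \hat{\theta}_{k-1}}$$
$$= -\sum_{j \in A_k} E\left[\frac{\partial^2}{\partial\theta^2} \ln P_j(U_j \mid \theta, \xi_j)\right]_{\theta = \hat{\theta}_{k-1}}$$
$$= \sum_{j \in A_k} I_{U_j}(\hat{\theta}_{k-1})$$
• Observed information:
$$J_{\mathbf{u}_{A_k}}(\hat{\theta}_{k-1}) = -\frac{\partial^2}{\partial\theta^2} \ln L(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k})\Big|_{\theta = \hat{\theta}_{k-1}}$$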
Maximum Fisher’s Information
• Myopic algorithm
• Pick the item $i_k$ at stage $k$ ($i_k \in R_k$) that maximizes Fisher’s information at the current trait estimate (classically the MLE):
$$i_k = \arg\max_j \left\{ I_{\mathbf{U}_{A_{k-1}}, U_j}(\hat{\theta}^{ML}_{k-1}) : j \in R_k \right\}$$
$$= \arg\max_j \left\{ I_{\mathbf{U}_{A_{k-1}}}(\hat{\theta}^{ML}_{k-1}) + I_{U_j}(\hat{\theta}^{ML}_{k-1}) : j \in R_k \right\}$$
$$= \arg\max_j \left\{ I_{U_j}(\hat{\theta}^{ML}_{k-1}) : j \in R_k \right\}$$
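Because the information already accumulated does not depend on the candidate item, MFI selection reduces to picking the remaining item with the largest item information at the current estimate. A minimal sketch under an assumed graded response model, with a hypothetical three-item bank keyed by item id:

```python
import math

def item_information(theta, a, bs):
    """Fisher information of one graded-response item at theta."""
    star = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
    info = 0.0
    for c in range(len(bs) + 1):
        p = star[c] - star[c + 1]
        dp = a * (star[c] * (1.0 - star[c]) - star[c + 1] * (1.0 - star[c + 1]))
        info += dp * dp / p
    return info

def select_mfi(theta_hat, bank, remaining):
    """Return the id in `remaining` with maximum information at theta_hat."""
    return max(sorted(remaining),
               key=lambda j: item_information(theta_hat, *bank[j]))

# Hypothetical bank: identical item shapes centred at -1.5, 0 and 1.5.
bank = {
    1: (1.5, [-2.0, -1.0]),
    2: (1.5, [-0.5, 0.5]),
    3: (1.5, [1.0, 2.0]),
}

first_pick = select_mfi(1.3, bank, {1, 2, 3})   # item located nearest 1.3
second_pick = select_mfi(1.3, bank, {1, 2})     # item 3 already administered
```

In a full CAT loop the selected id would be removed from the remaining set and the trait estimate updated before the next call.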
MFI - Selection
[Figure: MFI selection illustrated at the current trait estimate $\hat{\theta} = 1.3$.]
Minimum Expected Posterior Variance (MEPV)
• Selects the item that yields the minimum predicted posterior variance given previous responses
• Uses predictive distribution
• Is a myopic Bayesian decision theoretic approach (minimizes Bayes risk)
• First described by Owen (1969, 1975)
Predictive Distribution
• Predict the probability of a response to an item given previous responses
$$p(u_i \mid \mathbf{u}_{A_k}) = \int P_i(u_i \mid \theta, \xi_i)\, g(\theta \mid \mathbf{u}_{A_k}, \boldsymbol{\xi}_{A_k})\, d\theta$$
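The predictive distribution averages the item's category probabilities over the current posterior. The sketch below does this on a grid, taking the posterior weights as an input; here they are standard normal prior weights, as they would be before any response. The model, names, and values are illustrative assumptions.

```python
import math

def grm_probs(theta, a, bs):
    """Graded-response category probabilities (assumed parameterization)."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
    return [cum[c] - cum[c + 1] for c in range(len(bs) + 1)]

def predictive_probs(item, post, grid):
    """p(u_i = r | previous responses) for each category r of a candidate item."""
    a, bs = item
    out = [0.0] * (len(bs) + 1)
    for w, th in zip(post, grid):
        for c, p in enumerate(grm_probs(th, a, bs)):
            out[c] += w * p   # weight the model probability by posterior mass
    return out

grid = [-4.0 + 0.05 * t for t in range(161)]
raw = [math.exp(-0.5 * th * th) for th in grid]   # assumed N(0, 1) weights
total = sum(raw)
post = [w / total for w in raw]

pred = predictive_probs((1.5, [-1.0, 0.0, 1.0]), post, grid)
```

Because each row of category probabilities sums to one and the weights sum to one, the predictive probabilities form a proper distribution over the item's categories.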
Bayesian Decision Theory
• Dictates optimal (sequential adaptive) decisions
• In addition to prior and likelihood, specify a loss function (squared error loss):
$$l\left(\theta, \hat{\theta}(\mathbf{u}_{A_{k-1}}, u_{i_k})\right) = \left(\theta - \hat{\theta}(\mathbf{u}_{A_{k-1}}, u_{i_k})\right)^2$$
Bayesian Decision Theory: Item Selection
• Optimal estimator for squared-error loss is the posterior mean (EAP)
• Select the item that minimizes Bayes risk:
$$\text{Bayes risk} = E_{U_j \mid \mathbf{u}_{A_{k-1}}}\left\{ E_{\theta \mid \mathbf{u}_{A_{k-1}}, U_j}\left[ l\left(\theta, \hat{\theta}(\mathbf{u}_{A_{k-1}}, U_j)\right) \right] \right\}$$
$$i_k = \arg\min_j \left\{ E_{U_j \mid \mathbf{u}_{A_{k-1}}}\left[ E_{\theta \mid \mathbf{u}_{A_{k-1}}, U_j}\left( \theta - \hat{\theta}^{EAP} \right)^2 \right] : j \in R_k \right\}$$
– Inner expectation: posterior predictive variance given $U_j$
– Outer expectation: expectation over the predicted response to item $j$
Minimum Expected Posterior Variance (MEPV)
• Pick the item $i_k$ remaining in the bank at stage $k$ ($i_k \in R_k$) that minimizes the expected posterior variance (with respect to the predictive distribution):
$$i_k = \arg\min_j \left\{ \sum_{r=1}^{m_j} p(U_j = r \mid \mathbf{u}_{A_{k-1}})\, \mathrm{Var}(\theta \mid \mathbf{u}_{A_{k-1}}, U_j = r) : j \in R_k \right\}$$
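The MEPV criterion can be sketched on a grid: for each candidate item and each possible response, compute the predictive probability and the updated posterior variance, then average. The graded response model, the standard normal starting posterior, the helper names, and the hypothetical bank are all assumptions for illustration.

```python
import math

def grm_probs(theta, a, bs):
    """Graded-response category probabilities (assumed parameterization)."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
    return [cum[c] - cum[c + 1] for c in range(len(bs) + 1)]

def post_var(post, grid):
    """Variance of theta under normalized grid weights."""
    mean = sum(w * th for w, th in zip(post, grid))
    return sum(w * (th - mean) ** 2 for w, th in zip(post, grid))

def expected_post_var(item, post, grid):
    """Sum over r of p(U_j = r | u) * Var(theta | u, U_j = r)."""
    a, bs = item
    rows = [grm_probs(th, a, bs) for th in grid]
    risk = 0.0
    for r in range(len(bs) + 1):
        joint = [w * row[r] for w, row in zip(post, rows)]
        pred = sum(joint)                       # predictive probability of r
        if pred > 0.0:
            updated = [x / pred for x in joint]  # posterior after observing r
            risk += pred * post_var(updated, grid)
    return risk

def select_mepv(bank, remaining, post, grid):
    """Return the remaining item id with the smallest expected posterior variance."""
    return min(sorted(remaining),
               key=lambda j: expected_post_var(bank[j], post, grid))

grid = [-4.0 + 0.05 * t for t in range(161)]
raw = [math.exp(-0.5 * th * th) for th in grid]  # assumed N(0, 1) starting posterior
post = [w / sum(raw) for w in raw]

bank = {1: (1.5, [-3.5, -2.5]), 2: (1.5, [-0.5, 0.5]), 3: (1.5, [2.5, 3.5])}
pick = select_mepv(bank, {1, 2, 3}, post, grid)
risk_pick = expected_post_var(bank[pick], post, grid)
```

With the posterior mass centred at zero, the item whose thresholds sit near zero gives the largest expected reduction in variance, so it is selected over the two items located in the tails.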
Other Information Measures
• Weighted measures
– Maximum Likelihood Weighted Fisher’s Information (MLWI)
– Maximum Posterior Weighted Fisher’s Information (MPWI)
• Kullback-Leibler information: global information measure
Hybrid Algorithms
• Maximum Expected Information (MEI)
– Use observed information
– Predict information for next item
• Maximum Expected Posterior Weighted Information (MEPWI)
– Use observed information
– Predict information for next item
– Weight with posterior
– MEPWI ≈ MPWI
Mix-N-Match
• MAP with uniform prior to approximate MLE
• MFI using EAP instead of MLE (any point information function)
• Use EAP for item selection, but MLE for final trait estimate
COMPARISONS
Study Design
• Real item bank
– Depressive symptom items (62)
– 4 categories (fit with Graded Response IRT Model)
• Peaked bank: items have “narrow” coverage
• Flat bank: items have “wider” coverage
• Fixed length: 5-, 10-, and 20-item CATs
Datasets Used
• Post hoc simulation using real data:
– 730 patients and caregivers at MDA
– Real bank only
• Simulated data:
– $\theta$ grid: -3 to 3 by 0.5
– 500 “simulees” per $\theta$
– Simulated and real banks
Real Item Bank Characteristics
[Figure: test information (axis 0 to 80) and standard error (axis 0.0 to 2.0) plotted against theta from -4 to 4; legend: Information, SE.]
Real Bank, Real Data, 5 Items
[Figure: one panel per method, plotting SE against theta (theta from -2 to 2, SE from 0.0 to 0.6). Reported p(SE ≤ .32) per panel: MFI 0.53; MLWI 0.55; MPWI 0.54; MEPV 0.55; MEI (Fisher) 0.55; MEI (Observed) 0.55; Random 0.02; Theta 0.53; Fixed (static) 0.49; Full Bank 0.94.]
Real Bank, Real Data, 5 Items
Selection Criterion   Mean SE²   RMSD     CORR
MFI                   0.1463     0.3763   0.9069
MLWI                  0.1432     0.3736   0.9094
MPWI                  0.1396     0.3738   0.9080
MEPV                  0.1388     0.3598   0.9149
MEI (Fisher’s)        0.1388     0.3632   0.9134
MEI (Observed)        0.1388     0.3616   0.9139
Random                0.2369     0.4567   0.8565
Peaked Bank, Sim. Data, 5 Items
[Figure: one panel per method, plotting SE against theta (theta from -3 to 3, SE from 0.0 to 0.6). Reported p(SE ≤ .32) per panel: MFI 0.44; MLWI 0.26; MPWI 0.43; MEPV 0.44; MEI (Fisher) 0.44; MEI (Observed) 0.44; Random 0; Theta 0.55; Fixed (static) 0.12.]
Peaked Bank, Sim. Data, 5 Items
Selection Criterion   BIAS     RMSE     CORR
MFI                   0.0283   0.3923   0.9822
MLWI                  0.0678   0.4798   0.9724
MPWI                  0.0261   0.3898   0.9822
MEPV                  0.0232   0.3871   0.9822
MEI (Fisher’s)        0.0299   0.3903   0.9824
MEI (Observed)        0.0283   0.3911   0.9823
Random                0.0095   0.8378   0.9233
Summary
• Polytomous items (Choi and Swartz, in press)
– Classic MFI with MLE, and MLWI, not as good as others.
– MFI with EAP, and all others, essentially perform similarly.
• Dichotomous items (van der Linden, 1998)
– MFI with MLE not as good as all others*
– Difference more pronounced for shorter tests
Adaptations/ Active Research Areas
• Constrained adaptive tests/ content balancing
• Exposure control
• A-stratified adaptive testing
• Item selection including burden
• Cheating detection
• Response times
Thank You!
References and Further Reading
Choi SW, Swartz RJ. (in press). Comparison of CAT item selection criteria for polytomous items. Applied Psychological Measurement.
Owen RJ. (1969). A Bayesian approach to tailored testing (Research Report 69-92). Princeton, NJ: Educational Testing Service.
Owen RJ. (1975). A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. Journal of the American Statistical Association, 70, 351-356.
van der Linden WJ. (1998). Bayesian item selection criteria for adaptive testing. Psychometrika, 63(2), 201-216.
van der Linden WJ, Glas CAW. (Eds.). (2000). Computerized Adaptive Testing: Theory and Practice. Dordrecht; Boston: Kluwer Academic.
MLE Properties
• Usually has desirable asymptotic properties
• Consistency and efficiency depend on selection criteria and item bank
• Finite estimate does not exist for repeated responses in categories 1 or $m$