
Exploring the Full-Information Bifactor Model in Vertical Scaling With Construct Shift

Ying Li and Robert W. Lissitz

Contents

1. Unidimensionality of tests at each grade level
2. Test construct invariance across grades

Bifactor model

Vertical scaling

1. Computational simplicity for estimation

2. Ease of interpretation

3. Vertical scaling problems across grades

Questions

Purpose

• to propose and evaluate a bifactor model for IRT vertical scaling that can incorporate construct shift across grades while extracting a common scale,

• to evaluate the robustness of the UIRT model in parameter recovery, and

• to compare parameter estimates from the bifactor and UIRT models

Method

• Bifactor model

• Gibbons and Hedeker (1992) generalized the work of Holzinger and Swineford (1937) to derive a bifactor model for dichotomously scored item response data.

θ_0 : general factor (ability)
a_i0 : item discrimination parameter for the general factor
θ_s : group-specific factor (ability)
a_is : item discrimination parameter for the group-specific factor
d_i : overall multidimensional item difficulty
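With this notation, the Gibbons and Hedeker (1992) full-information bifactor model for a dichotomous item i associated with group-specific dimension s can be written in its normal-ogive form (a logistic version is analogous):

\[
P\!\left(y_{ij}=1 \mid \theta_{0},\theta_{s}\right)=\Phi\!\left(a_{i0}\theta_{0}+a_{is}\theta_{s}+d_{i}\right),
\]

where \(\Phi\) is the standard normal distribution function and each item loads on the general dimension and at most one group-specific dimension.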

Simulation

• Common item design

• Bifactor model for generating data

• Concurrent calibration
  – Stable results

• Sample size

• Number or percentage of common items
  – Test length: 60

• Variance of grade-specific factor: degree of construct shift

• 100 replications

Data generation

Item discrimination parameters were set deliberately and repeatedly at 1.2, 1.4, 1.6, 1.8, 2.0, and 2.2 for the general dimension, and fixed at 1.7 for the grade-specific dimensions.
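A minimal sketch of how responses for one grade group could be generated under this setup, using a logistic bifactor model: the cycling general-dimension discriminations (1.2–2.2), the fixed grade-specific discrimination of 1.7, and the test length of 60 follow the design above, while the difficulty distribution, sample size, and grade-specific variance shown here are illustrative assumptions. Under the common-item design, a subset of these items would also be administered to the adjacent grade.

```python
import numpy as np

rng = np.random.default_rng(2024)

n_persons = 1000      # illustrative sample size (varied in the study)
n_items = 60          # test length, per the design above
spec_var = 0.50       # variance of the grade-specific factor (degree of construct shift); varied in the study

# Item parameters: general-dimension discriminations cycle through the stated values,
# grade-specific discriminations are fixed at 1.7, difficulties/intercepts assumed N(0, 1).
a_general = np.tile([1.2, 1.4, 1.6, 1.8, 2.0, 2.2], n_items // 6)
a_specific = np.full(n_items, 1.7)
d = rng.normal(0.0, 1.0, n_items)

# Person abilities: general factor N(0, 1); grade-specific factor N(0, spec_var), independent of it.
theta_general = rng.normal(0.0, 1.0, n_persons)
theta_specific = rng.normal(0.0, np.sqrt(spec_var), n_persons)

# Logistic bifactor IRF: P(y = 1) = 1 / (1 + exp(-(a0*theta0 + as*thetas + d))).
logit = np.outer(theta_general, a_general) + np.outer(theta_specific, a_specific) + d
prob = 1.0 / (1.0 + np.exp(-logit))
responses = (rng.random((n_persons, n_items)) < prob).astype(int)
```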

Identification of Bifactor Model Estimation

• For the general dimension, the variance of the general latent dimension was fixed to 1, and the discrimination parameters (loadings) were freely estimated in the study

• For the grade-specific dimensions (s = 1, 2, . . . , k), the discrimination parameters a_is (loadings) were fixed to the true parameter value 1.7, so that the variances of the grade-specific dimensions could be freely estimated

• The common items answered by multiple groups were constrained to a single set of item parameters in the multiple-group concurrent calibration
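In symbols, the identification constraints just listed amount to

\[
\operatorname{Var}(\theta_{0})=1,\qquad a_{is}=1.7\ \text{(fixed)},\qquad \operatorname{Var}(\theta_{s})\ \text{free},\qquad s=1,\dots,k,
\]

with one set of item parameters shared across the groups that take each common item.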

Model Estimation

• Multigroup concurrent calibration was implemented

• The computer program IRTPRO (Cai, Thissen, & du Toit, 2011), using marginal maximum likelihood (MML) estimation with an EM algorithm, was used to estimate the models
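To make "MML with an EM algorithm" concrete, here is a minimal, self-contained sketch for a unidimensional 2PL model with a fixed-quadrature normal prior. It illustrates the general E-step/M-step logic only; it is not IRTPRO's implementation and omits the bifactor structure, the multiple groups, and the common-item constraints used in the study.

```python
import numpy as np
from scipy.optimize import minimize

def mml_em_2pl(responses, n_quad=21, n_iter=50):
    """Toy MML-EM for a unidimensional 2PL: P(y=1) = logistic(a*theta + d), theta ~ N(0, 1)."""
    n_persons, n_items = responses.shape

    # Fixed quadrature approximation to the standard normal ability prior.
    nodes = np.linspace(-4.0, 4.0, n_quad)
    weights = np.exp(-0.5 * nodes**2)
    weights /= weights.sum()

    a = np.ones(n_items)
    d = np.zeros(n_items)

    for _ in range(n_iter):
        # E-step: posterior weight of each quadrature node for each person.
        p = 1.0 / (1.0 + np.exp(-(np.outer(nodes, a) + d)))          # n_quad x n_items
        p = np.clip(p, 1e-10, 1.0 - 1e-10)
        log_lik = responses @ np.log(p).T + (1 - responses) @ np.log(1.0 - p).T
        post = np.exp(log_lik - log_lik.max(axis=1, keepdims=True)) * weights
        post /= post.sum(axis=1, keepdims=True)                      # n_persons x n_quad

        # Expected counts at each node: persons (n_k) and correct responses (r).
        n_k = post.sum(axis=0)
        r = post.T @ responses                                       # n_quad x n_items

        # M-step: maximize the expected complete-data log-likelihood item by item.
        for i in range(n_items):
            def neg_ll(par, ri=r[:, i]):
                pi = 1.0 / (1.0 + np.exp(-(par[0] * nodes + par[1])))
                pi = np.clip(pi, 1e-10, 1.0 - 1e-10)
                return -np.sum(ri * np.log(pi) + (n_k - ri) * np.log(1.0 - pi))
            a[i], d[i] = minimize(neg_ll, [a[i], d[i]], method="Nelder-Mead").x

    return a, d
```

The same marginal-likelihood idea applies to the bifactor model, where Gibbons and Hedeker's (1992) dimension-reduction result keeps the numerical integration tractable.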

Evaluation criteria

RMSE = root mean square error; SE = standard error; SS = sample size; CI = common item; VR = variance of grade-specific factor.
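The abbreviation key above implies the usual recovery indices; a small sketch of how bias, RMSE, and empirical SE might be computed over the 100 replications (the array shapes are illustrative assumptions):

```python
import numpy as np

def recovery_indices(estimates, true_values):
    """estimates: (n_replications, n_parameters) array; true_values: (n_parameters,)."""
    errors = estimates - true_values
    bias = errors.mean(axis=0)                      # mean signed error per parameter
    rmse = np.sqrt((errors ** 2).mean(axis=0))      # root mean square error per parameter
    se = estimates.std(axis=0, ddof=1)              # empirical SE: SD of estimates across replications
    return bias, rmse, se
```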

Results

Person parameters

• Person parameter estimates for the general dimension were better recovered than those for the grade-specific dimensions when the degree of construct shift was small or moderate

• Sample size affected estimation accuracy

Group parameters

• Group mean abilities were overestimated

UIRT

• Discrimination: overestimated

• Difficulty: well recovered

• Construct shift affected person and group mean estimates

ANOVA Effects for the Simulated Factors

• Three-way tests of between-subjects effects (ANOVA)
  – On bias, RMSE, and SE

Simulated factors: sample size, degree of construct shift, percentage of common items

• Bifactor model: small~moderate effects of sample size; no~small effects of construct shift; small bias in d associated with the percentage of common items

• UIRT: small~moderate effects of sample size; large effects of construct shift on person and group mean ability; large bias in d

• Large SEs for the group mean abilities and the grade-specific variance parameter
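A sketch of how such a three-way between-subjects ANOVA on a recovery outcome could be run with statsmodels; the data file and column names (simulation_results.csv, rmse, sample_size, construct_shift, common_items) are hypothetical placeholders, not the authors' materials:

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical replication-level results: one row per replication and condition.
df = pd.read_csv("simulation_results.csv")

# Three-way between-subjects ANOVA with all interactions, here on RMSE.
model = smf.ols("rmse ~ C(sample_size) * C(construct_shift) * C(common_items)", data=df).fit()
print(anova_lm(model, typ=2))
```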

Comparison

• UIRT overestimated the discrimination parameters

• Person and group mean parameters: less accurate

Real data

• Fall 2006 Michigan mathematics assessments
• Grades 3, 4, and 5
• 4,000 randomly sampled examinees

• Bifactor vs UIRT

Variance estimation

R = 0.983

Discussion

• Sample size affected estimation accuracy and stability

• The variance of the grade-specific dimension affected stability

• Be cautious about construct shift

• Polytomously scored / mixed item formats
• Incorporate covariates
• Longitudinal studies

• Should common items measure two group-specific abilities?

• The item discrimination parameters were fixed to the true value

• Multidimensional IRT?
