Detecting Item Parameter Drift in a CAT program using the Rasch Measurement Model
Mayuko Simon, David Chayer, Pam Hermann, and Yi Du
Data Recognition Corporation, April 2012


Page 1

Detecting Item Parameter Drift in a CAT program using the Rasch Measurement Model

Mayuko Simon, David Chayer, Pam Hermann, and Yi Du

Data Recognition Corporation, April 2012

Page 2

How should banked item parameters be checked?

• The idea for this study arose when the authors needed to augment a large existing bank of CAT items whose item parameters had already been estimated.

Page 3

Re-calibration of banked item parameters and item parameter drift

• Recalibration is recommended at periodic intervals

• CAT item data form a sparse matrix, and the range of student abilities encountered by each item is limited

Page 4

What would be a reasonable way to recalibrate items?

• The methods can be applied to:
  – Maintenance of a CAT item bank
  – Detecting item parameter drift
  – Calibration of field-test items

Page 5

How did other researchers calibrate/re-calibrate CAT data?

• Imputation of missing responses to avoid sparseness (Harmes, Parshall, and Kromrey, 2003)

• Calibrate field-test (FT) items by anchoring operational items (Wang and Wiley, 2004)

• Calibrate FT items by anchoring ability (Kingsbury, 2009)

• Use ability estimates to calibrate item parameters and detect drift (Stocking, 1988)

Page 6

Simulation study
• 300 items in the item bank
• 20,000 simulated student responses; abilities drawn from N(0,1)
• Known item parameter drift (10% of the item bank)
• Various drift sizes (a simulation sketch follows below)
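
As a rough illustration of this setup (not the authors' actual generating code), the sketch below simulates sparse, CAT-like Rasch responses with known drift. The banked-difficulty distribution, the 40-item test length, and the purely random item selection standing in for the CAT algorithm are all assumptions.

    import numpy as np

    rng = np.random.default_rng(2012)
    n_items, n_students, items_per_student = 300, 20_000, 40   # test length is an assumption

    b_bank = rng.normal(0.0, 1.0, n_items)    # banked difficulties (assumed distribution)
    theta = rng.normal(0.0, 1.0, n_students)  # abilities ~ N(0,1), as on the slide

    # Known drift for 10% of the bank; Condition 1 uses positive shifts only.
    drifted = rng.choice(n_items, n_items // 10, replace=False)
    b_true = b_bank.copy()
    b_true[drifted] += rng.choice([0.1, 0.2, 0.3, 0.4, 0.5], drifted.size)

    # Sparse response matrix: NaN marks items a student never saw.
    responses = np.full((n_students, n_items), np.nan)
    for s in range(n_students):
        taken = rng.choice(n_items, items_per_student, replace=False)      # stand-in for CAT item selection
        p = 1.0 / (1.0 + np.exp(-(theta[s] - b_true[taken])))              # Rasch probability of success
        responses[s, taken] = (rng.random(items_per_student) < p).astype(float)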

Page 7

Design

• Easy items (d < -1.5), 10 items: Condition 1 drift sizes = 0.1, 0.2, 0.3, 0.4, 0.5; Condition 2 drift sizes = -0.1, -0.2, -0.3, -0.4, -0.5, 0.1, 0.2, 0.3, 0.4, 0.5; Control condition = no change

• Medium items (-1.5 ≤ d ≤ 1.5), 10 items: Condition 1 drift sizes = 0.1, 0.2, 0.3, 0.4, 0.5; Condition 2 drift sizes = -0.1, -0.2, -0.3, -0.4, -0.5, 0.1, 0.2, 0.3, 0.4, 0.5; Control condition = no change

• Difficult items (d > 1.5), 10 items: Condition 1 drift sizes = 0.1, 0.2, 0.3, 0.4, 0.5; Condition 2 drift sizes = -0.1, -0.2, -0.3, -0.4, -0.5, 0.1, 0.2, 0.3, 0.4, 0.5; Control condition = no change

Page 8

Four calibration methods in this study

1. Anchor person ability (AP)
2. Anchor person ability and anchor the difficulties of 200 of the 300 items (API)
3. Use the Displacement value from the Winsteps output (see the sketch below)
4. Item-by-item calibration (IBI)
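
For method 3, Winsteps reports a displacement statistic for anchored parameters, roughly the shift the current data would apply to the anchored difficulty. The sketch below shows only the flagging rule applied to such values (using the 0.4-logit cutoff from the evaluation slide); it does not reproduce the Winsteps computation itself, and the example values are hypothetical.

    import numpy as np

    def flag_by_displacement(displacement, cutoff=0.4):
        """Flag items whose displacement (re-estimated minus anchored
        difficulty, in logits) exceeds the cutoff in absolute value."""
        return np.abs(np.asarray(displacement, dtype=float)) > cutoff

    # Hypothetical displacements read from Winsteps item output.
    flags = flag_by_displacement([0.05, -0.52, 0.38, 0.61])   # -> [False, True, False, True]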

Page 9

IBI: Item-by-item calibration
• A vector of responses for an item
• A vector of abilities for the students who took the item
• Same concept as logistic regression, but Winsteps is used to calibrate (a sketch of the idea follows this list)
• No sparseness involved
• Less data are needed (especially when not all items in the bank need to be checked)
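
The slides use Winsteps for the per-item calibration. As an illustration of the same idea under the Rasch model only, here is a hedged sketch that estimates one item's difficulty by maximum likelihood with the students' ability estimates held fixed, which is equivalent to a logistic regression with the slope fixed at 1 and ability entering as an offset.

    import numpy as np

    def rasch_item_difficulty(responses, theta, n_iter=50, tol=1e-8):
        """Estimate a single item's Rasch difficulty by Newton-Raphson,
        treating the ability vector `theta` as fixed (anchored).
        Assumes the item has both correct and incorrect responses;
        otherwise the maximum-likelihood estimate diverges."""
        r = np.asarray(responses, dtype=float)
        theta = np.asarray(theta, dtype=float)
        b = 0.0
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-(theta - b)))           # model probability of a correct response
            step = (p - r).sum() / (p * (1.0 - p)).sum()     # Newton step on the item log-likelihood
            b += step
            if abs(step) < tol:
                break
        return b

Each item to be checked needs only the vector of scored responses and the ability estimates of the students who actually saw it, so no sparse response matrix is ever assembled.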

Page 10

Evaluation
• One-sample t-test with alpha = 0.01 for AP, API, and IBI
• Cutoff value of 0.4 for the Displacement method
• Type I error rate
• Type II error rate
• Sensitivity (Type II error + sensitivity = 1)
• RMSE (average difference from the banked value for flagged items)
• BIAS (average bias from the banked value for flagged items)

(A sketch computing these summary measures follows below.)
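
The slide does not spell out exactly how the one-sample t-test is formed, so the sketch below takes the flagging decisions as given and only computes the reported summary measures. The arrays `flagged` and `true_drifted` are hypothetical boolean vectors over items, and RMSE and bias are computed, as on the slide, over flagged items only.

    import numpy as np

    def drift_detection_summary(flagged, true_drifted, b_recalibrated, b_banked):
        """Flagging accuracy plus recovery of the banked difficulties."""
        flagged = np.asarray(flagged, dtype=bool)
        true_drifted = np.asarray(true_drifted, dtype=bool)
        diff = np.asarray(b_recalibrated, dtype=float) - np.asarray(b_banked, dtype=float)

        type_i = (flagged & ~true_drifted).sum() / max((~true_drifted).sum(), 1)   # false alarms among stable items
        type_ii = (~flagged & true_drifted).sum() / max(true_drifted.sum(), 1)     # misses among drifted items
        sensitivity = 1.0 - type_ii                                                # Type II error + sensitivity = 1

        d = diff[flagged]                                                          # flagged items only
        rmse = float(np.sqrt(np.mean(d ** 2))) if d.size else float("nan")
        bias = float(np.mean(d)) if d.size else float("nan")
        return {"type_I": type_i, "type_II": type_ii,
                "sensitivity": sensitivity, "RMSE": rmse, "BIAS": bias}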

Page 11

Type I error rate

[Bar chart: Type I error rate (0 to 0.035) for AP, API, Displacement, and IBI under the Control, Condition 1, and Condition 2 conditions]

• Type I error is inflated even for the Control condition
• Condition 1 had a higher Type I error rate

* Averaged over 40 replications

Page 12

Type II error rate

[Bar chart: Type II error rate (0 to 0.8) for AP, API, Displacement, and IBI under Condition 1 and Condition 2]

• Type II error for the Displacement method is too high
• Condition 1 had a higher Type II error rate

* Averaged over 40 replications

Page 13

Sensitivity

[Bar chart: Sensitivity (0 to 0.8) for AP, API, Displacement, and IBI under Condition 1 and Condition 2]

• Sensitivity for the Displacement method is too low
• Condition 1 had lower sensitivity

* Averaged over 40 replications

Page 14

Items with small sample sizes and small drift are difficult to flag correctly.

Page 15
Page 16

Type II errors occurred for items with small sample sizes and/or small drift.

[Scatter plots with annotated regions: items with small drift, items with small N, items with large drift]

Page 17

[Plots with annotations indicating the same items across panels]

Page 18

Which method's re-calibrated item difficulties are closer to the banked values?

• The median RMSE is similar across the three methods
• IBI has less variance in RMSE than AP

Page 19

Which method shows less bias in the re-calibrated item difficulties?

• All three methods have very small bias
• IBI has less variance in bias than AP

Page 20

Conclusion
• Use caution with the Displacement value to identify item parameter drift.
• AP, API, and IBI worked reasonably well.
• Item parameter drift is difficult to detect for items with small drift or small sample sizes.
• Compared to AP, IBI had less variance in RMSE and bias.
• Item parameter drift in one direction (Condition 1) would cause more bias in the final ability estimates, leading to higher Type I and Type II error rates.

Page 21

Limitations and Future Study
• The proportion of items with item parameter drift was 10% of the bank.
  – How would the results change with different proportions? What about the size of the drift?
• Only the Rasch model was used.
  – What about other models and software?
• The minimum sample size was 10.
  – What about different minimum sample sizes (e.g., 30, 50)?
• No iterative procedure (no update of drifted item difficulties).
  – Do the results improve if the procedure is run iteratively, updating the difficulty after drift is detected?