1
IRT basics: Theory and parameter estimation
Wayne C. Lee, David Chuah, Patrick Wadlington, Steve Stark, & Sasha Chernyshenko
2
Overview
How do I begin a set of IRT analyses?
  What do I need?
    Software
    Data
  What do I do?
    Input/syntax files
    Examination of output
On-line!
3
“Eye-ARE-What?”
Item response theory (IRT)
  A set of probabilistic models that…
    Describe the relationship between a respondent’s magnitude on a construct (a.k.a. latent trait; e.g., extraversion, cognitive ability, affective commitment)…
    And his or her probability of a particular response to an individual item
4
But what does that buy you?
Provides more information than classical test theory (CTT)
  Classical test statistics depend on the set of items and the sample examined
  IRT modeling is not dependent on the sample examined
Can examine item bias/measurement equivalence and provide conditional standard errors of measurement
5
Before we begin…
Data preparation
  Raw data must be recoded if necessary (negatively worded items must be reverse coded such that all items in the scale indicate a positive direction)
  Dichotomization (optional)
    Reducing multiple options into two separate values (0, 1; right, wrong)
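The two recoding steps above can be sketched in a few lines of Python. This is only an illustration: the item names, the 1-5 response range, and the cut point of 4 are assumptions made for the example, not values taken from the tutorial data.

```python
def reverse_code(response, lowest=1, highest=5):
    """Reverse-score a response so that high values indicate the positive pole."""
    return lowest + highest - response

def dichotomize(response, cut=4):
    """Collapse an ordered response into 1 (at or above the cut) or 0 (below it)."""
    return 1 if response >= cut else 0

# Hypothetical raw responses from one person on a 1-5 Likert scale.
raw = {"item1": 2, "item2_neg": 5, "item3": 4}
negatively_worded = {"item2_neg"}  # assumed list of reverse-keyed items

recoded = {name: reverse_code(value) if name in negatively_worded else value
           for name, value in raw.items()}
binary = {name: dichotomize(value) for name, value in recoded.items()}

print(recoded)
print(binary)
```

Reverse coding is done before dichotomizing so that the cut point means the same thing for every item.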
6
Calibration and validation files
Data are split into two separate files
  Calibration sample for estimating IRT parameters
  Validation sample for assessing the fit of the model to the data
Data files for the programs that we will be discussing must be in ASCII/text format
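A minimal sketch of the split and of the fixed-width text layout, assuming a random half-and-half split and a record made of a 4-character ID followed by one character per item (the layout later read with the (4A1,10A1) format). The respondent data, the seed, and the function names are hypothetical.

```python
import random

def split_sample(records, seed=0):
    """Randomly split records into calibration and validation halves."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def to_ascii_line(person_id, responses, id_width=4):
    """Fixed-width record: zero-padded ID followed by one character per item."""
    return f"{person_id:0{id_width}d}" + "".join(str(r) for r in responses)

# Hypothetical data: 6 respondents, 10 dichotomous items each.
rng = random.Random(1)
records = [(pid, [rng.randint(0, 1) for _ in range(10)]) for pid in range(1, 7)]

calibration, validation = split_sample(records)
lines = [to_ascii_line(pid, responses) for pid, responses in calibration]
print("\n".join(lines))
```

Writing `lines` to disk with `open(path, "w", encoding="ascii")` would give a plain text file of the kind the programs expect.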
7
Investigating dimensionality
The models presented make a common assumption of unidimensionality
  Hattie (1985) reviewed 30 techniques
  Some propose the ratio of the 1st eigenvalue to the 2nd eigenvalue (Lord, 1980)
On-line we describe how to examine the eigenvalues following Principal Axis Factoring (PAF)
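The eigenvalue ratio can be illustrated with a rough sketch. Note the substitutions: it uses Pearson correlations and plain power iteration with deflation on the full correlation matrix, not PAF and not tetrachoric correlations, so it only approximates the procedure described on-line. The response data and function names are hypothetical.

```python
import math

def correlation_matrix(rows):
    """Pearson correlations between the columns of a list of equal-length rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    dev = [[r[j] - means[j] for j in range(k)] for r in rows]
    ss = [sum(d[j] ** 2 for d in dev) for j in range(k)]
    return [[sum(d[i] * d[j] for d in dev) / math.sqrt(ss[i] * ss[j])
             for j in range(k)] for i in range(k)]

def top_eigenpair(matrix, iterations=1000):
    """Largest eigenvalue and eigenvector of a symmetric positive semi-definite matrix."""
    k = len(matrix)
    vec = [1.0 + 0.1 * i for i in range(k)]  # deliberately non-uniform start
    value = 0.0
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        value = math.sqrt(sum(x * x for x in new))
        if value == 0.0:
            break
        vec = [x / value for x in new]
    return value, vec

def eigenvalue_ratio(corr):
    """Ratio of the first to the second eigenvalue of a correlation matrix."""
    first, vec = top_eigenpair(corr)
    k = len(corr)
    # Deflation: remove the first component, then find the largest remaining one.
    deflated = [[corr[i][j] - first * vec[i] * vec[j] for j in range(k)]
                for i in range(k)]
    second, _ = top_eigenpair(deflated)
    return float("inf") if second == 0.0 else first / second

# Hypothetical dichotomous responses: 8 respondents by 4 items.
responses = [[1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1],
             [0, 1, 0, 0], [1, 0, 1, 1], [0, 0, 0, 1], [1, 1, 1, 1]]
print(round(eigenvalue_ratio(correlation_matrix(responses)), 2))
```

A larger ratio points toward a dominant first factor; what counts as large enough is a judgment call that this sketch does not settle.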
8
PAF and scree plots
If the data are dichotomous, factor analyze tetrachoric correlations
  Assume a continuum underlies the item responses
[Figure: scree plot showing a dominant first factor]
9
Two models presented
The Three Parameter Logistic model (3PL)
  For dichotomous data
  E.g., cognitive ability tests
Samejima's Graded Response model
  For polytomous data where options are ordered along a continuum
  E.g., Likert scales
Common models among applied psychologists
10
The 3PL model
Three parameters:
  a = item discrimination
  b = item extremity/difficulty
  c = lower asymptote, “pseudo-guessing”
Theta refers to the latent trait
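The slide's equation did not survive extraction, so here is a hedged sketch of the standard 3PL item response function, P(theta) = c + (1 - c) / (1 + exp(-D a (theta - b))). The scaling constant D = 1.7 is an assumption (it puts the logistic curve on the normal-ogive metric; set D = 1.0 for the plain logistic metric), and the item parameters below are hypothetical.

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """Probability of the keyed response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Hypothetical item: moderate discrimination, average difficulty, some guessing.
for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(theta, round(p_3pl(theta, a=1.0, b=0.0, c=0.2), 3))
```

At theta = b the probability is halfway between c and 1, a larger a makes the curve steeper around b, and c sets the floor that very low theta values approach.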
11
Effect of the “a” parameter
Small “a,” poor discrimination
12
Effect of the “a” parameter
Larger “a,” better discrimination
13
Effect of the “b” parameter
Low “b,” “easy item”
14
Effect of the “b” parameter
Higher “b,” more difficult item
“b” is inversely related to the CTT p value (proportion correct)
15
Effect of the “c” parameter
c=0, asymptote at zero
16
Effect of the “c” parameter
With c > 0, “low ability” respondents may still endorse the correct response
17
Estimating 3PL parameters
DOS version of BILOG (Scientific Software)
  Multiple files in directory, but small size overall
  Easier to estimate parameters for a large number of scales or experimental groups
Data file must be saved as ASCII text
  ID number
  Individual responses
Input file (ASCII text)
18
BILOG input file (*.BLG)
AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
>COMMENT
>GLOBAL DFN='AGR2_CAL.DAT', NIDW=4, NPARM=3, OFNAME='OMIT.KEY',
        SAVE;
>SAVE SCO = 'AGR2_CAL.SCO', PARM = 'AGR2_CAL.PAR', COV =
      'AGR2_CAL.COV';
>LENGTH NITEMS=(10);
>INPUT SAMPLE=99999;
(4A1,10A1)
>TEST TNAME=AGR;
>CALIB NQPT=40, CYC=100, NEW=30, CRIT=.001, PLOT=0;
>SCORE MET=2, IDIST=0, RSC=0, NOPRINT;
Title line: the first line of the file
19
BILOG input file (*.BLG): >GLOBAL command
DFN = data file name
NIDW = characters in ID field
NPARM = number of item parameters (3 for the 3PL)
OFNAME = file for missing (omitted) responses
20
BILOG input file (*.BLG): >SAVE command
Requested files for: scoring (SCO), parameters (PARM), covariances (COV)
21
BILOG input file (*.BLG): >LENGTH and >INPUT commands
NITEMS = number of items
SAMPLE = sample size
22
BILOG input file (*.BLG): format statement and >TEST command
(4A1,10A1) = FORTRAN statement for reading data (4 one-character ID columns, then 10 one-character item responses)
TNAME = name of scale/measure
23
BILOG input file (*.BLG): >CALIB command
Estimation specifications (not the default for BILOG)
24
BILOG input file (*.BLG): >SCORE command
Scoring: Maximum likelihood, no prior distribution of scale scores, no rescaling
25
Phase one output file (*.PH1)
CLASSICAL ITEM STATISTICS FOR SUBTEST AGR
              NUMBER   NUMBER                     ITEM*TEST CORRELATION
ITEM  NAME     TRIED    RIGHT  PERCENT  LOGIT/1.7    PEARSON   BISERIAL
-----------------------------------------------------------------------
   1  0001    1500.0   1158.0    0.772       0.72      0.535      0.742
   2  0002    1500.0    991.0    0.661       0.39      0.421      0.545
   3  0003    1500.0   1354.0    0.903       1.31      0.290      0.500
   4  0004    1500.0   1187.0    0.791       0.78      0.518      0.733
   5  0005    1500.0    970.0    0.647       0.36      0.566      0.728
   6  0006    1500.0   1203.0    0.802       0.82      0.362      0.519
   7  0007    1500.0    875.0    0.583       0.20      0.533      0.674
   8  0008    1500.0    810.0    0.540       0.09      0.473      0.594
   9  0009    1500.0   1022.0    0.681       0.45      0.415      0.542
  10  0010    1500.0    869.0    0.579       0.19      0.426      0.538
-----------------------------------------------------------------------
Can indicate problems in parameter estimation
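The classical statistics in this table can be approximately reproduced from the counts. The sketch below recomputes PERCENT and LOGIT/1.7 for item 1 and converts its Pearson (point-biserial) correlation to a biserial with the usual textbook formula; that BILOG uses exactly this conversion is an assumption, although the result agrees with the printed value to within rounding.

```python
import math
from statistics import NormalDist

def logit_over_1_7(n_right, n_tried):
    """Log-odds of a keyed response, divided by the 1.7 scaling constant."""
    p = n_right / n_tried
    return math.log(p / (1.0 - p)) / 1.7

def biserial_from_point_biserial(r_pb, p):
    """Textbook conversion of a point-biserial correlation to a biserial one."""
    z = NormalDist().inv_cdf(p)        # normal deviate that cuts off proportion p
    ordinate = NormalDist().pdf(z)     # height of the normal curve at that cut
    return r_pb * math.sqrt(p * (1.0 - p)) / ordinate

# Item 1 from the table: 1158 keyed responses out of 1500, Pearson r = 0.535.
p1 = 1158.0 / 1500.0
print(round(p1, 3))
print(round(logit_over_1_7(1158.0, 1500.0), 2))
print(round(biserial_from_point_biserial(0.535, p1), 3))
```

Because the biserial is always larger in absolute value than the point-biserial, and increasingly so for extreme p values, it helps to look at both columns before flagging an item.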
26
Phase two output file (*.PH2)
CYCLE 12: LARGEST CHANGE = 0.00116
-2 LOG LIKELIHOOD = 15181.4541
CYCLE 13: LARGEST CHANGE = 0.00071
[FULL NEWTON STEP]
-2 LOG LIKELIHOOD = 15181.2347
CYCLE 14: LARGEST CHANGE = 0.00066
Check for convergence
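Checking convergence by eye works for one scale; for many runs it can be scripted. A minimal sketch, assuming the phase two log keeps the "CYCLE n: LARGEST CHANGE = x" wording shown here and that convergence means the last reported change is below the CRIT value from the input file (.001); the function names are made up for the example.

```python
import re

PHASE2_LOG = """\
CYCLE 12: LARGEST CHANGE = 0.00116
-2 LOG LIKELIHOOD = 15181.4541
CYCLE 13: LARGEST CHANGE = 0.00071
[FULL NEWTON STEP]
-2 LOG LIKELIHOOD = 15181.2347
CYCLE 14: LARGEST CHANGE = 0.00066
"""

def largest_changes(log_text):
    """Return (cycle, largest change) pairs in the order they appear."""
    pattern = re.compile(r"CYCLE\s+(\d+):\s+LARGEST CHANGE\s*=\s*([0-9.]+)")
    return [(int(cycle), float(change)) for cycle, change in pattern.findall(log_text)]

def converged(log_text, crit=0.001):
    """True when the last reported change is smaller than the criterion."""
    changes = largest_changes(log_text)
    return bool(changes) and changes[-1][1] < crit

print(largest_changes(PHASE2_LOG))
print(converged(PHASE2_LOG))
```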
27
Phase three output file (*.PH3)
Theta estimation
Scoring of individual respondents
Required for DTF analyses
28
Parameter file (specified, *.PAR)
AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
>COMMENT
 1 10 10
0001AGR 111 1.130784 1.533393 -0.737439 0.652148 0.147203 0.101834 0.185726 0.135455 0.078989 0.053688
0002AGR 211 0.360630 0.870309 -0.414371 1.149018 0.132796 0.087236 0.097709 0.098866 0.129000 0.054461
0003AGR 311 1.474175 0.743095 -1.983831 1.345723 0.197127 0.108974 0.084487 0.250499 0.153003 0.087578
0004AGR 411 1.196368 1.256263 -0.952323 0.796012 0.090901 0.087856 0.114710 0.123613 0.072684 0.042937
0005AGR 511 0.544388 1.403904 -0.387767 0.712300 0.056774 0.071490 0.133486 0.080438 0.067727 0.026086
0006AGR 611 0.892399 0.777440 -1.147869 1.286273 0.173882 0.093109 0.082096 0.152846 0.135828 0.075829
0007AGR 711 0.174395 1.369223 -0.127368 0.730341 0.088135 0.083777 0.159712 0.085084 0.085190 0.032376
In each item record, “a” is the second numeric field, “b” the third, and “c” the fifth
FORTRAN format for reading a, b, and c: (32X,2F12.6,12X,F12.6)
29
PARTO3PL output (*.3PL)
0001AGR 111 1.130784 1.533393 -0.737439 0.652148 0.147203
0002AGR 211 0.360630 0.870309 -0.414371 1.149018 0.132796
0003AGR 311 1.474175 0.743095 -1.983831 1.345723 0.197127
0004AGR 411 1.196368 1.256263 -0.952323 0.796012 0.090901
0005AGR 511 0.544388 1.403904 -0.387767 0.712300 0.056774
0006AGR 611 0.892399 0.777440 -1.147869 1.286273 0.173882
0007AGR 711 0.174395 1.369223 -0.127368 0.730341 0.088135
0008AGR 811 0.042231 0.979045 -0.043135 1.021403 0.056546
0009AGR 911 0.441586 0.839144 -0.526234 1.191691 0.129646
0010AGR 1011 0.104452 0.879683 -0.118738 1.136773 0.101087
a, b, and c are the second, third, and fifth numeric columns
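A short sketch of pulling a, b, and c out of records like these. It splits on whitespace as the lines appear on the slide, which is an assumption: the real file is fixed-width and is read with the FORTRAN format shown earlier. The labels "intercept" and "dispersion" for the first and fourth columns are also an interpretation, supported by the fact that the first column equals -a*b and the fourth equals 1/a in every record.

```python
RECORDS = """\
0001AGR 111 1.130784 1.533393 -0.737439 0.652148 0.147203
0002AGR 211 0.360630 0.870309 -0.414371 1.149018 0.132796
0003AGR 311 1.474175 0.743095 -1.983831 1.345723 0.197127
"""

def parse_record(line):
    """Extract item parameters from one whitespace-separated record."""
    fields = line.split()
    intercept, a, b, dispersion, c = (float(x) for x in fields[-5:])
    return {"item": fields[0][:4], "a": a, "b": b, "c": c,
            "intercept": intercept, "dispersion": dispersion}

items = [parse_record(line) for line in RECORDS.splitlines()]
for item in items:
    # Consistency check on the assumed column meanings.
    gap = abs(item["intercept"] + item["a"] * item["b"])
    print(item["item"], item["a"], item["b"], item["c"], gap < 1e-4)
```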
30
Scoring and covariance files
Like the *.PAR file, specifically requested
*.COV - Provides parameters as well as the variances/covariances between the parameters
  Necessary for DIF analyses
*.SCO - Provides ability score information for each respondent
31
Samejima's Graded Response model
Used when options are ordered along a continuum, as with Likert scales
  v = response to the polytomously scored item i
  k = particular option
  a = discrimination parameter
  b = extremity parameter
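The model equation itself is missing from the extracted slide. As a hedged sketch, the usual form of the graded response model gives the probability of option k as the difference between two adjacent boundary curves, each a two-parameter logistic in theta. The version below assumes the plain logistic metric (any 1.7 scaling constant absorbed into a), and the item parameters are hypothetical.

```python
import math

def boundary_prob(theta, a, b_k):
    """Probability of responding above the boundary located at b_k."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b_k)))

def category_probs(theta, a, thresholds):
    """Option probabilities for one graded item; thresholds must be increasing."""
    cumulative = [1.0] + [boundary_prob(theta, a, b) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical five-option item with four extremity (b) parameters.
probs = category_probs(theta=0.0, a=2.0, thresholds=[-2.0, -1.0, 0.5, 1.5])
print([round(p, 3) for p in probs])
```

An item with m options has m - 1 extremity parameters, and the option probabilities at any theta sum to one.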
32
Sample SGR Plot
“Low option”
“High option”
Low discrimination (a=0.4)
33
Sample SGR Plot
Better discrimination (a=2)
34
Running MULTILOG
MULTILOG for DOS
Example with DOS batch file
INFORLOG with MULTILOG
  INFORLOG is typically interactive
  Process automated with batch file and an input file (described on-line)
*.IN1 (parameter estimation)
*.IN2 (scoring)
35
The first input file (*.IN1)
CALIBRATION OF AGREEABLENESS GRADED RESPONSE MODEL
>PRO IN RA NI=10 NE=1500 NCHAR=4 NG=1;
>TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
>EST NC=50;
>SAVE;
>END;
5
01234
1111111111
2222222222
3333333333
4444444444
5555555555
(4A1,10A1)
Title line: the first line of the file
36
The first input file (*.IN1): >PRO command
NI = number of items
NE = number of examinees
NCHAR = characters in the ID field
NG=1 = single group
37
The first input file (*.IN1): >TEST command
GR = SGR model
NC = number of options for each item
38
The first input file (*.IN1): >EST and >END commands
>EST NC=50 = number of cycles for estimation
>END; = end of command syntax
39
The first input file (*.IN1): response codes
5 = number of response codes
01234 = five characters denoting five options
40
The first input file (*.IN1): recoding lines
The rows of 1s through 5s give the recoding of options for MULTILOG (one row per response code, one column per item)
41
The second input file (*.IN2)
SCORING AGREEABLENESS SCALE SGR MODEL
>PRO SCORE IN RA NI=10 NE=1500 NCHAR=4 NG=1;
>TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
>START;
Y
>SAVE;
>END;
5
12345
1111111111
2222222222
3333333333
4444444444
5555555555
(4A1,10A1)
SCORE on the >PRO line = scoring
Y = Yes to INFORLOG (parameters in a separate file)
42
Running MULTILOG
Run the batch file
*.IN1 produces *.LS1 (*.lis file renamed as *.ls1)
  Check it to ensure that the data were read in and the model specified correctly
  Also provides a report of the estimation procedure with the estimated item parameters
Things of note…
43
ITEM 1: 5 GRADED CATEGORIES
        P(#)  ESTIMATE  (S.E.)
A         1     1.99    (0.12)
B( 1)     2    -3.03    (0.18)
B( 2)     3    -2.35    (0.11)
B( 3)     4    -0.98    (0.06)
B( 4)     5     2.01    (0.10)

@THETA:    -2.0  -1.5  -1.0  -0.5   0.0   0.5   1.0   1.5   2.0
I(THETA):  1.08  1.04  1.05  0.81  0.49  0.35  0.47  0.79  0.99

OBSERVED AND EXPECTED COUNTS/PROPORTIONS IN
CATEGORY(K):    1     2     3     4     5
OBS. FREQ.     21    44   277  1050   108
OBS. PROP.   0.01  0.03  0.18  0.70  0.07
EXP. PROP.   0.01  0.03  0.19  0.70  0.07

“a” includes a 1.7 scaling factor
Frequencies for each option
Collapsing options
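The I(THETA) row can be cross-checked against the item 1 estimates. The sketch below applies the standard item information formula for the graded response model (the sum over options of the squared slope of each option curve divided by its probability), on the logistic metric with no extra 1.7 factor, consistent with the note that “a” already includes it. That MULTILOG computes the row exactly this way is an assumption; with the rounded estimates the results agree with the printed values to about two decimals.

```python
import math

def grm_information(theta, a, thresholds):
    """Item information for a graded response item at a given theta."""
    # Boundary curves, padded with 1 (below the first option) and 0 (above the last).
    star = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    info = 0.0
    for k in range(len(star) - 1):
        p_k = star[k] - star[k + 1]                      # probability of option k
        slope = a * (star[k] * (1.0 - star[k])
                     - star[k + 1] * (1.0 - star[k + 1]))  # derivative of p_k
        info += slope ** 2 / p_k
    return info

# Item 1 estimates from the MULTILOG output.
a_hat = 1.99
b_hat = [-3.03, -2.35, -0.98, 2.01]
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(grm_information(theta, a_hat, b_hat), 2))
```

The dip in information around theta = 0 reflects the wide gap between the third and fourth extremity parameters, where a single option absorbs most responses.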
44
Scoring output
*.IN2 produces *.LS2
  Last portion of the file contains the person parameters (estimated theta, standard error, the number of iterations used, and the respondent's ID number).
45
What now?
Review
  Data requirements for IRT
  Two models: 3PL (dichotomous), SGR (polytomous), more on-line!
MODFIT
  Can plot IRFs, ORFs
  Model-data fit: Input parameters, validation sample