1

IRT basics: Theory and parameter estimation

Wayne C. Lee, David Chuah, Patrick Wadlington, Steve Stark, & Sasha Chernyshenko

2

Overview

How do I begin a set of IRT analyses?
What do I need?
  Software
  Data
What do I do?
  Input/syntax files
  Examination of output
On-line!

3

“Eye-ARE-What?”

Item response theory (IRT)
  A set of probabilistic models that describe the relationship between a respondent’s magnitude on a construct (a.k.a. latent trait; e.g., extraversion, cognitive ability, affective commitment) and his or her probability of a particular response to an individual item

4

But what does that buy you?

Provides more information than classical test theory (CTT)
  Classical test statistics depend on the set of items and the sample examined
  IRT modeling is not dependent on the sample examined
Can examine item bias/measurement equivalence and provide conditional standard errors of measurement

5

Before we begin…

Data preparation
  Raw data must be recoded if necessary (negatively worded items must be reverse coded so that all items in the scale point in a positive direction)
  Dichotomization (optional): reducing multiple options to two separate values (0, 1; right, wrong)
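The two recoding steps above can be sketched as follows. This is a minimal illustration, assuming a 5-point Likert item scored 1 to 5; the function names, the example responses, and the cutoff of 4 are assumptions made for the example, not part of the tutorial.

```python
def reverse_code(response, low=1, high=5):
    """Reverse a response on a low..high scale (1<->5, 2<->4, 3 stays 3)."""
    if not low <= response <= high:
        raise ValueError(f"response {response} outside {low}..{high}")
    return low + high - response


def dichotomize(response, cutoff=4):
    """Collapse a multi-option response to 0/1 (cutoff is an assumed choice)."""
    return 1 if response >= cutoff else 0


# Hypothetical respondent: the items at index 1 and 3 are negatively worded.
raw = [5, 1, 4, 2, 3]
negatively_worded = {1, 3}

recoded = [
    reverse_code(r) if i in negatively_worded else r
    for i, r in enumerate(raw)
]
binary = [dichotomize(r) for r in recoded]

print(recoded)  # [5, 5, 4, 4, 3]
print(binary)   # [1, 1, 1, 1, 0]
```

Where to place the cutoff is a substantive decision about the scale, so it is left as a parameter here.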

6

Calibration and validation files

Data are split into two separate files
  Calibration sample for estimating IRT parameters
  Validation sample for assessing the fit of the model to the data
Data files for the programs that we will be discussing must be in ASCII/text format
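One way to produce the two samples is a random split followed by writing fixed-width text records. The sketch below assumes an even split, a 4-character ID, and one character per item; the split proportion, the function names, and the generated data are all assumptions for illustration.

```python
import random


def split_sample(records, seed=0):
    """Randomly split records into two halves: (calibration, validation)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def to_ascii_line(person_id, responses):
    """Fixed-width record: 4-character ID, then one character per item."""
    return f"{person_id:04d}" + "".join(str(r) for r in responses)


# Hypothetical data: 6 respondents with 10 dichotomous responses each.
rng = random.Random(42)
records = [(pid, [rng.randint(0, 1) for _ in range(10)]) for pid in range(1, 7)]

calibration, validation = split_sample(records, seed=1)
cal_lines = [to_ascii_line(pid, resp) for pid, resp in calibration]
val_lines = [to_ascii_line(pid, resp) for pid, resp in validation]

# Each list can then be written out as plain ASCII text, for example:
# with open("AGR2_CAL.DAT", "w", encoding="ascii") as f:
#     f.write("\n".join(cal_lines) + "\n")
```

Fixing the seed keeps the split reproducible, so the same respondents land in the calibration file on every run.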

7

Investigating dimensionality

The models presented make a common assumption of unidimensionality
Hattie (1985) reviewed 30 techniques
Some propose the ratio of the 1st eigenvalue to the 2nd eigenvalue (Lord, 1980)
On-line we describe how to examine the eigenvalues following Principal Axis Factoring (PAF)
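The eigenvalue-ratio check can be sketched without any statistics package. The code below is a simplified stand-in, not the PAF procedure described on-line: it takes an ordinary correlation matrix as input (no communality estimates on the diagonal) and finds the two largest eigenvalues by power iteration with deflation. The function names and the toy matrix are assumptions.

```python
import math


def _matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def _normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def dominant_eigenpair(matrix, iterations=500):
    """Largest eigenvalue/eigenvector of a symmetric matrix (power iteration)."""
    vector = _normalize([float(i + 1) for i in range(len(matrix))])
    for _ in range(iterations):
        vector = _normalize(_matvec(matrix, vector))
    product = _matvec(matrix, vector)
    eigenvalue = sum(v * p for v, p in zip(vector, product))  # Rayleigh quotient
    return eigenvalue, vector


def first_two_eigenvalues(corr):
    """Return (first eigenvalue, second eigenvalue, their ratio)."""
    lam1, v1 = dominant_eigenpair(corr)
    n = len(corr)
    # Deflation: remove the first component, then repeat the power iteration.
    deflated = [
        [corr[i][j] - lam1 * v1[i] * v1[j] for j in range(n)] for i in range(n)
    ]
    lam2, _ = dominant_eigenpair(deflated)
    return lam1, lam2, lam1 / lam2


# Toy matrix: three items that all correlate 0.5 with one another.
toy = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
lam1, lam2, ratio = first_two_eigenvalues(toy)
print(round(lam1, 3), round(lam2, 3), round(ratio, 3))  # 2.0 0.5 4.0
```

For dichotomous items the input would be a tetrachoric correlation matrix, which this sketch does not compute.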

8

PAF and scree plots

If the data are dichotomous, factor analyze tetrachoric correlations
  Assume a continuum underlies item responses
[Scree plot: dominant first factor]

9

Two models presented

The Three Parameter Logistic model (3PL)
  For dichotomous data
  E.g., cognitive ability tests
Samejima's Graded Response model
  For polytomous data where options are ordered along a continuum
  E.g., Likert scales
Common models among applied psychologists

10

The 3PL model

Three parameters:
  a = item discrimination
  b = item extremity/difficulty
  c = lower asymptote, “pseudo-guessing”
Theta refers to the latent trait
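The slide's equation did not survive extraction, so here is a minimal sketch of the standard 3PL item response function, P(theta) = c + (1 - c) / (1 + exp(-D a (theta - b))). The scaling constant D = 1.7 is an assumption (it puts "a" on the normal-ogive metric); the function name and example values are illustrative.

```python
import math


def p_3pl(theta, a, b, c, scale=1.7):
    """3PL probability of the keyed response at trait level theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-scale * a * (theta - b)))


# At theta == b the curve sits halfway between c and 1.
print(round(p_3pl(0.0, a=1.0, b=0.0, c=0.2), 3))   # 0.6

# Far below b the probability approaches the lower asymptote c.
print(round(p_3pl(-6.0, a=1.0, b=0.0, c=0.2), 3))  # 0.2

# A larger "a" gives a bigger change in probability around b (steeper curve).
for a in (0.5, 2.0):
    change = p_3pl(0.5, a, 0.0, 0.0) - p_3pl(-0.5, a, 0.0, 0.0)
    print(a, round(change, 3))
```

Changing b shifts the whole curve along the theta axis, while c raises its floor.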

11

Effect of the “a” parameter

Small “a,” poor discrimination

12

Effect of the “a” parameter

Larger “a,” better discrimination

13

Effect of the “b” parameter

Low “b,” “easy item”

14

Effect of the “b” parameter

Higher “b,” more difficult item

“b” is inversely related to the CTT p value (proportion correct): the higher the “b,” the lower the p

15

Effect of the “c” parameter

c=0, asymptote at zero

16

Effect of the “c” parameter

c > 0: “low ability” respondents may endorse the correct response

17

Estimating 3PL parameters

DOS version of BILOG (Scientific Software)
  Multiple files in directory, but small size overall
  Easier to estimate parameters for a large number of scales or experimental groups
Data file must be saved as ASCII text
  ID number
  Individual responses
Input file (ASCII text)

18

BILOG input file (*.BLG)

AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
>COMMENT
>GLOBAL DFN='AGR2_CAL.DAT', NIDW=4, NPARM=3, OFNAME='OMIT.KEY',
        SAVE;
>SAVE SCO = 'AGR2_CAL.SCO', PARM = 'AGR2_CAL.PAR', COV = 'AGR2_CAL.COV';
>LENGTH NITEMS=(10);
>INPUT SAMPLE=99999;
(4A1,10A1)
>TEST TNAME=AGR;
>CALIB NQPT=40, CYC=100, NEW=30, CRIT=.001, PLOT=0;
>SCORE MET=2, IDIST=0, RSC=0, NOPRINT;

Annotations
  First line: title line
  >GLOBAL: data file name (DFN), characters in ID field (NIDW), parameters (NPARM), file for missing (OFNAME)
  >SAVE: requested files for scoring, parameters, covariances
  >LENGTH: number of items
  >INPUT: sample size
  (4A1,10A1): FORTRAN statement for reading data
  >TEST: name of scale/measure
  >CALIB: estimation specifications (not the default for BILOG)
  >SCORE: scoring with maximum likelihood, no prior distribution of scale scores, no rescaling
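The FORTRAN format (4A1,10A1) describes a fixed-width record: four single-character fields for the ID, then ten single-character fields for the item responses. A minimal sketch of reading one such record follows; the function name and the example record are assumptions for illustration.

```python
def parse_record(line, id_width=4, n_items=10):
    """Split one fixed-width record into (ID, list of item responses).

    Mirrors (4A1,10A1): id_width ID characters, then one character per item.
    """
    line = line.rstrip("\n")
    if len(line) < id_width + n_items:
        raise ValueError("record shorter than the format requires")
    person_id = line[:id_width]
    responses = list(line[id_width:id_width + n_items])
    return person_id, responses


# Hypothetical record: ID "0007" followed by ten 0/1 responses.
person_id, responses = parse_record("00071011001110")
print(person_id)   # 0007
print(responses)   # ['1', '0', '1', '1', '0', '0', '1', '1', '1', '0']
```

Running a check like this over the data file before calibration is a cheap way to catch records that are too short or shifted by a column.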

25

Phase one output file (*.PH1)

CLASSICAL ITEM STATISTICS FOR SUBTEST AGR

             NUMBER   NUMBER                  ITEM*TEST CORRELATION
ITEM  NAME    TRIED    RIGHT  PERCENT  LOGIT/1.7  PEARSON  BISERIAL
-------------------------------------------------------------------
   1  0001   1500.0   1158.0    0.772       0.72    0.535     0.742
   2  0002   1500.0    991.0    0.661       0.39    0.421     0.545
   3  0003   1500.0   1354.0    0.903       1.31    0.290     0.500
   4  0004   1500.0   1187.0    0.791       0.78    0.518     0.733
   5  0005   1500.0    970.0    0.647       0.36    0.566     0.728
   6  0006   1500.0   1203.0    0.802       0.82    0.362     0.519
   7  0007   1500.0    875.0    0.583       0.20    0.533     0.674
   8  0008   1500.0    810.0    0.540       0.09    0.473     0.594
   9  0009   1500.0   1022.0    0.681       0.45    0.415     0.542
  10  0010   1500.0    869.0    0.579       0.19    0.426     0.538
-------------------------------------------------------------------

Can indicate problems in parameter estimation
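The PERCENT and LOGIT/1.7 columns can be recomputed from the counts: PERCENT is the number right divided by the number tried, and LOGIT/1.7 matches ln(p / (1 - p)) / 1.7 for every item in the table. The sketch below does that arithmetic with the counts from the output; the variable and function names are assumptions.

```python
import math

# (item name, number tried, number right) copied from the Phase 1 table.
counts = [
    ("0001", 1500, 1158), ("0002", 1500, 991), ("0003", 1500, 1354),
    ("0004", 1500, 1187), ("0005", 1500, 970), ("0006", 1500, 1203),
    ("0007", 1500, 875), ("0008", 1500, 810), ("0009", 1500, 1022),
    ("0010", 1500, 869),
]


def percent_and_logit(tried, right):
    """Proportion correct and its logit divided by 1.7."""
    p = right / tried
    return p, math.log(p / (1.0 - p)) / 1.7


stats = {name: percent_and_logit(tried, right) for name, tried, right in counts}

for name, (p, logit) in stats.items():
    print(f"{name}  {p:.3f}  {logit:.2f}")
```

The first line printed is `0001  0.772  0.72`, matching the first row of the table. Items with proportions very close to 0 or 1 would give extreme logits, which is one way this table can flag items likely to cause estimation trouble.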

26

Phase two output file (*.PH2)

CYCLE 12: LARGEST CHANGE = 0.00116

-2 LOG LIKELIHOOD = 15181.4541


[FULL NEWTON STEP]

-2 LOG LIKELIHOOD = 15181.2347


Check for convergence

27

Phase three output file (*.PH3)

Theta estimation
Scoring of individual respondents
Required for differential test functioning (DTF) analyses

28

Parameter file (specified, *.PAR)

AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
>COMMENT
1 10 10
0001AGR 111  1.130784  1.533393 -0.737439  0.652148  0.147203  0.101834  0.185726  0.135455  0.078989  0.053688
0002AGR 211  0.360630  0.870309 -0.414371  1.149018  0.132796  0.087236  0.097709  0.098866  0.129000  0.054461
0003AGR 311  1.474175  0.743095 -1.983831  1.345723  0.197127  0.108974  0.084487  0.250499  0.153003  0.087578
0004AGR 411  1.196368  1.256263 -0.952323  0.796012  0.090901  0.087856  0.114710  0.123613  0.072684  0.042937
0005AGR 511  0.544388  1.403904 -0.387767  0.712300  0.056774  0.071490  0.133486  0.080438  0.067727  0.026086
0006AGR 611  0.892399  0.777440 -1.147869  1.286273  0.173882  0.093109  0.082096  0.152846  0.135828  0.075829
0007AGR 711  0.174395  1.369223 -0.127368  0.730341  0.088135  0.083777  0.159712  0.085084  0.085190  0.032376

“a,” “b,” and “c” are the second, third, and fifth numeric fields of each item record
FORTRAN format for reading them: (32X,2F12.6,12X,F12.6)

29

PARTO3PL output (*.3PL)

0001AGR 111   1.130784  1.533393 -0.737439  0.652148  0.147203
0002AGR 211   0.360630  0.870309 -0.414371  1.149018  0.132796
0003AGR 311   1.474175  0.743095 -1.983831  1.345723  0.197127
0004AGR 411   1.196368  1.256263 -0.952323  0.796012  0.090901
0005AGR 511   0.544388  1.403904 -0.387767  0.712300  0.056774
0006AGR 611   0.892399  0.777440 -1.147869  1.286273  0.173882
0007AGR 711   0.174395  1.369223 -0.127368  0.730341  0.088135
0008AGR 811   0.042231  0.979045 -0.043135  1.021403  0.056546
0009AGR 911   0.441586  0.839144 -0.526234  1.191691  0.129646
0010AGR 1011  0.104452  0.879683 -0.118738  1.136773  0.101087
                               a         b                   c
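The two numeric columns that are not labeled can be checked against "a" and "b". In the listing, the first column equals -a*b and the fourth equals 1/a to within rounding, which is consistent with an intercept and a dispersion; those two names are an assumption here, since the slides label only "a," "b," and "c." A short sketch of the check, using the values from the listing:

```python
# (column 1, a, b, column 4, c) for each item, copied from the listing.
rows = [
    (1.130784, 1.533393, -0.737439, 0.652148, 0.147203),
    (0.360630, 0.870309, -0.414371, 1.149018, 0.132796),
    (1.474175, 0.743095, -1.983831, 1.345723, 0.197127),
    (1.196368, 1.256263, -0.952323, 0.796012, 0.090901),
    (0.544388, 1.403904, -0.387767, 0.712300, 0.056774),
    (0.892399, 0.777440, -1.147869, 1.286273, 0.173882),
    (0.174395, 1.369223, -0.127368, 0.730341, 0.088135),
    (0.042231, 0.979045, -0.043135, 1.021403, 0.056546),
    (0.441586, 0.839144, -0.526234, 1.191691, 0.129646),
    (0.104452, 0.879683, -0.118738, 1.136773, 0.101087),
]

TOLERANCE = 1e-5  # the file prints six decimals, so allow for rounding

checks = []
for item, (col1, a, b, col4, c) in enumerate(rows, start=1):
    intercept_ok = abs(col1 - (-a * b)) < TOLERANCE   # column 1 vs. -a*b
    dispersion_ok = abs(col4 - 1.0 / a) < TOLERANCE   # column 4 vs. 1/a
    checks.append((item, intercept_ok, dispersion_ok))

print(all(i_ok and d_ok for _, i_ok, d_ok in checks))  # True
```

A check like this is also a quick way to confirm that the right columns were picked up as "a" and "b" before the parameters are passed to another program.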

30

Scoring and covariance files

Like the *.PAR file, specifically requested
*.COV - Provides parameters as well as the variances/covariances between the parameters
  Necessary for differential item functioning (DIF) analyses
*.SCO - Provides ability score information for each respondent

31

Samejima's Graded Response model

Used when options are ordered along a continuum, as with Likert scales
  v = response to the polytomously scored item i
  k = particular option
  a = discrimination parameter
  b = extremity parameter
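The slide's equation did not survive extraction. In the usual statement of the graded response model, the probability of choosing option k is the difference between two adjacent boundary curves, each a two-parameter logistic function with the item's "a" and one of its "b" values. The sketch below follows that form, assuming "a" is on the logistic metric (no separate 1.7 constant); the function name and parameter values are hypothetical.

```python
import math


def sgr_category_probs(theta, a, thresholds):
    """Probability of each ordered option under the graded response model.

    thresholds: increasing b values, one fewer than the number of options.
    """
    def boundary(b):
        # Probability of responding above the boundary located at b.
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    cumulative = [1.0] + [boundary(b) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical 5-option item with symmetric thresholds.
probs = sgr_category_probs(theta=0.0, a=2.0, thresholds=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])
print(round(sum(probs), 6))  # 1.0
```

With symmetric thresholds and theta = 0 the middle option is the most likely one; raising theta shifts probability toward the higher options.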

32

Sample SGR Plot

Low discrimination (a=0.4)
Plot labels: “Low option,” “High option”

33

Sample SGR Plot

Better discrimination (a=2)

34

Running MULTILOG

MULTILOG for DOS
Example with DOS batch file
INFORLOG with MULTILOG
  INFORLOG is typically interactive
  Process automated with batch file and an input file (described on-line)
*.IN1 (parameter estimation)
*.IN2 (scoring)

35

The first input file (*.IN1)

CALIBRATION OF AGREEABLENESS GRADED RESPONSE MODEL
>PRO IN RA NI=10 NE=1500 NCHAR=4 NG=1;
>TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
>EST NC=50;
>SAVE;
>END;
5
01234
1111111111
2222222222
3333333333
4444444444
5555555555
(4A1,10A1)

Annotations
  First line: title line
  >PRO: number of items (NI), examinees (NE), characters in the ID field (NCHAR), single group (NG=1)
  >TEST: SGR model (GR), number of options for each item (NC)
  >EST: number of cycles for estimation
  >END;: end of command syntax
  5 and 01234: five characters denoting five options
  1111111111 through 5555555555: recoding of options for MULTILOG
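The recoding block at the end of the input file can be read as a lookup table: one row per response character, with one digit per item giving the category that character is recoded to. The sketch below illustrates that reading with the values from the *.IN1 file; the function names and the example response string are assumptions, and the code only mimics the lookup rather than anything MULTILOG itself runs.

```python
def build_recode_table(codes, category_rows):
    """Map each data character to its per-item category numbers.

    codes: the characters that appear in the data file, e.g. "01234".
    category_rows: one string per code, one digit per item.
    """
    if len(codes) != len(category_rows):
        raise ValueError("need one category row per response code")
    return {
        code: [int(ch) for ch in row]
        for code, row in zip(codes, category_rows)
    }


def recode(responses, table):
    """Translate one respondent's raw characters into category numbers."""
    return [table[ch][item] for item, ch in enumerate(responses)]


codes = "01234"
category_rows = [str(k) * 10 for k in range(1, 6)]  # "1111111111" ... "5555555555"
table = build_recode_table(codes, category_rows)

# Hypothetical respondent with ten raw responses coded 0-4.
print(recode("0413220134", table))  # [1, 5, 2, 4, 3, 3, 1, 2, 4, 5]
```

Because each row holds one digit per item, individual items could be given a different mapping simply by changing the digit in that item's position.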

41

The second input file (*.IN2)

SCORING AGREEABLENESS SCALE SGR MODEL
>PRO SCORE IN RA NI=10 NE=1500 NCHAR=4 NG=1;
>TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
>START;
Y
>SAVE;
>END;
5
12345
1111111111
2222222222
3333333333
4444444444
5555555555
(4A1,10A1)

Annotations
  SCORE on the >PRO line: scoring
  Y after >START;: yes to INFORLOG (parameters in a separate file)

42

Running MULTILOG

Run the batch file
*.IN1 -> *.LS1 (*.lis file renamed as *.ls1)
  Ensure that the data were read in and the model specified correctly
  Also provides a report of the estimation procedure with the estimated item parameters
Things of note…

43

ITEM 1: 5 GRADED CATEGORIES
        P(#)  ESTIMATE  (S.E.)
A         1      1.99   (0.12)
B( 1)     2     -3.03   (0.18)
B( 2)     3     -2.35   (0.11)
B( 3)     4     -0.98   (0.06)
B( 4)     5      2.01   (0.10)

@THETA:    -2.0  -1.5  -1.0  -0.5   0.0   0.5   1.0   1.5   2.0
I(THETA):  1.08  1.04  1.05  0.81  0.49  0.35  0.47  0.79  0.99

OBSERVED AND EXPECTED COUNTS/PROPORTIONS IN
CATEGORY(K):     1     2     3     4     5
OBS. FREQ.      21    44   277  1050   108
OBS. PROP.    0.01  0.03  0.18  0.70  0.07
EXP. PROP.    0.01  0.03  0.19  0.70  0.07

Annotations
  A: “a” includes a 1.7 scaling factor
  OBS. FREQ.: frequencies for each option; check whether sparsely used options should be collapsed
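Two quick checks on this output can be sketched in a few lines: recomputing the observed proportions from the frequencies to spot sparsely used options, and dividing "a" by 1.7 to express it without the scaling factor. The 5% threshold for calling an option sparse is an assumption chosen for illustration, as are the variable names.

```python
# Observed frequencies for item 1, copied from the output above.
obs_freq = [21, 44, 277, 1050, 108]
total = sum(obs_freq)
obs_prop = [round(f / total, 2) for f in obs_freq]

print(total)     # 1500
print(obs_prop)  # [0.01, 0.03, 0.18, 0.7, 0.07]

# Flag options chosen by fewer than MIN_PROP of respondents (assumed cutoff).
MIN_PROP = 0.05
sparse = [k for k, f in enumerate(obs_freq, start=1) if f / total < MIN_PROP]
print(sparse)    # [1, 2]

# "a" as reported includes the 1.7 scaling factor; dividing removes it.
a_reported = 1.99
a_rescaled = round(a_reported / 1.7, 2)
print(a_rescaled)  # 1.17
```

Options 1 and 2 together account for only 65 of the 1,500 responses here, which is the kind of pattern that prompts the question of collapsing them.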

44

Scoring output

*.IN2 -> *.LS2
Last portion of the file contains the person parameters (estimated theta, standard error, the number of iterations used, and the respondent's ID number)

45

What now?

Review
  Data requirements for IRT
  Two models: 3PL (dichotomous), SGR (polytomous), more on-line!
MODFIT
  Can plot IRFs, ORFs
  Model-data fit: input parameters, validation sample
