Introduction to Item Response Theory
Adam Wyse and Ji Zeng, Psychometricians
Michigan Department of Education, Office of Educational Assessment and Accountability





Focus of this session

• The other session discussed Classical Test Theory (CTT).

• The focus of this session is on Item Response Theory (IRT) and how IRT is used at MDE.


Basic Differences between CTT and IRT

• Focus on item performance (IRT) versus Total Test performance (CTT).

• Population dependent statistics (CTT) versus population independent statistics (IRT).

• Test specific statistics (CTT) versus Test independent statistics (IRT).

• Definition, which cannot be tested (CTT), versus a model, which can be tested (IRT).

• Few assumptions (CTT) versus several assumptions (IRT).


What is IRT?

• Relates student ability and item characteristics to the probability of obtaining a particular score on an item.

• Many IRT models exist, including models for multiple-choice, short answer, and constructed response items.

• Models differ in how probabilities are related to student ability and item characteristics.


IRT assumptions

• Monotonicity: A more able person has a higher probability of responding correctly to an item than a less able person.

• Local independence: after controlling for ability, the response to one item is independent of, and does not influence, the probability of responding correctly to another item.

• Item and person parameters do not change across populations.
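Local independence is what allows the probability of a whole response string to be written as a product of item probabilities. As a sketch for right/wrong items, writing $x_i \in \{0, 1\}$ for the scored response to item $i$ and $P_i(\theta)$ for that item's probability of a correct response (this notation is assumed here and does not appear on the slides):

$$P(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} P_i(\theta)^{x_i} \, [1 - P_i(\theta)]^{1 - x_i}$$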


Unidimensionality

• Models used by MDE also assume unidimensionality.
  – A single underlying construct is measured by the assessment (e.g. mathematics achievement or reading achievement).


Common IRT Models

• Multiple-Choice and Short Answer Items
  – Rasch Model (MEAP, MEAP-Access, MI-Access FI, ELPA)
  – 2 PL Model (NAEP)
  – 3 PL Model (MME, NAEP)

• Constructed Response Items
  – Partial Credit Model (MEAP Writing, MEAP-Access Writing, MI-Access FI Expressing Ideas, ELPA)
  – Generalized Partial Credit Model (MME Writing, NAEP)

Note: NAEP is not analyzed or administered by MDE. It is a test administered by the federal government!


The Rasch Model (sometimes called the 1 PL Model)

$$P(\theta) = \frac{e^{(\theta - b)}}{1 + e^{(\theta - b)}}$$


The Rasch Model

• An item characteristic curve for a sample MEAP item.

[Figure: "Simple IRT Model". Item characteristic curve with Achievement (-3 to 3) on the horizontal axis and Probability of correct response (0.0 to 1.0) on the vertical axis. Annotations: item difficulty; inflection point; 50% probability of correct response.]
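The annotations on this curve can be checked numerically. The following is a minimal sketch of the Rasch equation in Python; the function name and the difficulty value are illustrative assumptions, not taken from any MDE item.

```python
import math

def rasch_probability(theta, b):
    """Probability of a correct response under the Rasch model.

    theta is the student's ability and b is the item difficulty.
    """
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

# Hypothetical item difficulty, chosen only for illustration.
b = 0.5

at_difficulty = rasch_probability(b, b)   # ability equal to item difficulty
below = rasch_probability(b - 1.0, b)     # one unit less able
above = rasch_probability(b + 1.0, b)     # one unit more able

print(at_difficulty, below, above)
```

When ability equals the item difficulty the exponent is zero, so the probability is exactly 50%, which is the point marked on the curve. Less able students fall below it and more able students above it.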


The 3 PL Model

$$P(\theta) = c + (1 - c) \, \frac{e^{Da(\theta - b)}}{1 + e^{Da(\theta - b)}}$$


The 3 PL Model

• An item characteristic curve for a sample MME item.

[Figure: "More Complex IRT Model". Item characteristic curve with Achievement (-3 to 3) on the horizontal axis and Probability of correct response (0.0 to 1.0) on the vertical axis. Annotations: item difficulty; slope at inflection point indicates how well the item discriminates between high and low achievers; probability halfway between item "guessability" and 1; item "guessability".]
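As a rough check on these annotations, here is a minimal Python sketch of the 3 PL equation. The parameter values are hypothetical, and the default D = 1.7 is the conventional scaling constant, assumed here because the slides do not state its value.

```python
import math

def three_pl_probability(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3 PL model.

    a: discrimination, b: difficulty, c: lower asymptote ("guessability").
    D is a scaling constant; 1.7 is an assumed, conventional value.
    """
    z = D * a * (theta - b)
    return c + (1.0 - c) * math.exp(z) / (1.0 + math.exp(z))

# Hypothetical item parameters, for illustration only.
a, b, c = 1.2, 0.0, 0.2

p_at_difficulty = three_pl_probability(b, a, b, c)    # halfway between c and 1
p_very_low = three_pl_probability(-10.0, a, b, c)     # approaches c

print(p_at_difficulty, p_very_low)
```

At theta = b the probability is (1 + c) / 2, the point halfway between the "guessability" and 1, and for very low ability it flattens out at c rather than at 0.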


Rasch vs. 3 PL

What features do the Rasch and 3 PL models have in common?

What features of the Rasch and 3 PL models are different?


Rasch vs. 3PL

• In the Rasch model, the difference between student ability and the item difficulty parameter drives the probability of a correct response. All other elements in the equation are constants.
  – Therefore, when you see the plots of multiple items, the curves have the same shape and differ only by a constant shift in their location on the scale (shown in the diagram on the next slide).
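The "differ only by location" point can be illustrated with a short Python sketch. The two difficulty values are hypothetical; the check is that shifting the ability axis by the difference in difficulty makes the two curves coincide.

```python
import math

def rasch_probability(theta, b):
    """Rasch probability of a correct response (same equation, rearranged)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Ability values from -3.0 to 3.0 in steps of 0.5.
thetas = [x / 2.0 for x in range(-6, 7)]

# Two hypothetical item difficulties.
b_easy, b_hard = -0.5, 1.0
shift = b_hard - b_easy

curve_easy = [rasch_probability(t, b_easy) for t in thetas]
# The harder item's curve, read off 'shift' units further along the scale.
curve_hard_shifted = [rasch_probability(t + shift, b_hard) for t in thetas]

max_gap = max(abs(p - q) for p, q in zip(curve_easy, curve_hard_shifted))
print(max_gap)
```

The largest gap between the two lists is zero up to floating-point error, so the harder item's curve is the easier item's curve moved to the right.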


10 Rasch item characteristic curves

[Figure: ten Rasch item characteristic curves, with θ (-3 to 3) on the horizontal axis and Probability of correct response (0.0 to 1.0) on the vertical axis.]


Rasch vs. 3PL

• In the 3 PL model, the difference between ability and difficulty is still the critical piece. However, the discrimination parameter changes the influence of that difference for each item. Furthermore, the minimum possible result of the equation is set by the ‘c’ parameter.
  – If c > 0.00, the probability of a correct response never falls below c, so it is always greater than 0.
  – Item characteristic curves will vary by location on the scale as well as by lower asymptote (c parameter) and slope (a parameter).
  – Knowing how difficult an item is compared to another is still relevant, but it is not the only piece of information that leads to differences between items.


10 3 PL item characteristic curves

[Figure: ten 3 PL item characteristic curves, with θ (-3 to 3) on the horizontal axis and Probability of correct response (0.0 to 1.0) on the vertical axis.]


2 PL Model

• How do you end up with the 2 PL model?
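One way to answer this, starting from the 3 PL equation on the earlier slide: setting the lower asymptote to $c = 0$ removes the "guessability" term and leaves a model with only the discrimination and difficulty parameters:

$$P(\theta) = \frac{e^{Da(\theta - b)}}{1 + e^{Da(\theta - b)}}$$

Fixing the slope as well, so that $Da = 1$ for every item, recovers the Rasch equation $P(\theta) = e^{(\theta - b)} / (1 + e^{(\theta - b)})$.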


Test Characteristic Curves

• Relates achievement to scores examinees are expected to receive on the assessment.

• Sum of Item Characteristic Curves in IRT.

• Defined the same way for Rasch and 3 PL models.
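The "sum of item characteristic curves" definition can be sketched in a few lines of Python. The ten difficulty values below are hypothetical and are not the items behind the slide's example curve.

```python
import math

def rasch_probability(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_score(theta, difficulties):
    """Test characteristic curve: sum of the item characteristic curves."""
    return sum(rasch_probability(theta, b) for b in difficulties)

# Ten hypothetical Rasch item difficulties.
difficulties = [-2.0, -1.5, -1.0, -0.5, -0.25, 0.25, 0.5, 1.0, 1.5, 2.0]

for theta in (-4, -2, 0, 2, 4):
    print(theta, round(expected_score(theta, difficulties), 2))
```

The expected number-correct score rises with ability and stays between 0 and the number of items. For a 3 PL test the same sum would be taken over 3 PL item probabilities.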


Example of Test Characteristic Curve for 10 Rasch Items

[Figure: test characteristic curve, with θ (-4 to 4) on the horizontal axis and Expected Number Correct Score (0 to 10) on the vertical axis.]


Partial Credit Model (PCM)

$$P_{ix}(\theta) = P(X_i = x \mid \theta) = \frac{\exp\left(\sum_{k=0}^{x} (\theta - b_{ik})\right)}{\sum_{h=0}^{m_i} \exp\left(\sum_{k=0}^{h} (\theta - b_{ik})\right)}$$


Generalized Partial Credit Model (GPCM)

$$P_{ix}(\theta) = P(X_i = x \mid \theta) = \frac{\exp\left(\sum_{k=0}^{x} a_i (\theta - b_{ik})\right)}{\sum_{h=0}^{m_i} \exp\left(\sum_{k=0}^{h} a_i (\theta - b_{ik})\right)}$$
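To make the double sum concrete, here is a minimal Python sketch that computes the probability of each score category for one constructed response item. It assumes the usual convention that the k = 0 term of the inner sum is defined as 0; the function name and the step parameters are hypothetical.

```python
import math

def gpcm_probabilities(theta, a, steps):
    """Category probabilities for scores 0..m under the GPCM.

    steps holds the step parameters b_i1..b_im. The k = 0 term of the
    inner sum is taken to be 0 (an assumed, conventional definition).
    Setting a = 1 gives the Partial Credit Model.
    """
    cumulative = [0.0]                       # inner sum for score 0
    for b in steps:
        cumulative.append(cumulative[-1] + a * (theta - b))
    numerators = [math.exp(v) for v in cumulative]
    total = sum(numerators)                  # denominator: sum over all scores
    return [n / total for n in numerators]

# Hypothetical two-step item (possible scores 0, 1, 2).
steps = [-0.5, 1.0]
pcm = gpcm_probabilities(0.0, 1.0, steps)    # Partial Credit Model
gpcm = gpcm_probabilities(0.0, 1.3, steps)   # Generalized Partial Credit Model

print(pcm)
print(gpcm)
```

Each list has one probability per possible score and sums to 1. The only difference between the two models is the item slope applied to every step.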


How do we get there?

• IRT models depend on item and person parameters.
• Item and person parameters have to be estimated.
• A person-by-item matrix is needed to begin the process.

MME Science
010100101111000111101110101111000101100110110011011000011101001011010111011001010011


IRT Estimation

• The person-by-item matrix is input into an IRT estimation program.

• The program uses an estimation algorithm (a set of mathematical rules) to come up with a solution.

• The end products are best estimates of the item parameters and person abilities.
  – Item parameters are the ‘guessability’, discrimination, and difficulty parameters.
  – Person parameters are the ability estimates we use to create a student’s scale score.


Estimating Ability

• For the 3PL/GPCM, people who share the same response string (same pattern of correct and incorrect responses and same scores on constructed response items) will have the same ability estimate.
  – It is possible for people with the same raw score to end up with different ability estimates.
• In the Rasch/PCM, the raw score is used to derive the abilities.
  – Each person with the same raw score will have the same estimate of ability.
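The contrast between the two families can be illustrated with a small Python sketch. It uses a crude grid search for the maximum likelihood ability estimate, which is only a stand-in for the estimation algorithms in operational software. The item parameters, the two response strings, and D = 1.7 for the 3 PL items are all hypothetical assumptions.

```python
import math

def p_correct(theta, a, b, c, D):
    """3 PL probability; a = 1, c = 0, D = 1 gives the Rasch model."""
    z = D * a * (theta - b)
    return c + (1.0 - c) / (1.0 + math.exp(-z))

def log_likelihood(theta, responses, items, D):
    """Log-likelihood of a 0/1 response string, assuming local independence."""
    total = 0.0
    for x, (a, b, c) in zip(responses, items):
        p = p_correct(theta, a, b, c, D)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

def ml_theta(responses, items, D):
    """Crude estimate: the best theta on a grid from -4.00 to 4.00."""
    grid = [i / 100.0 for i in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items, D))

# Hypothetical five-item test: (a, b, c) for each item.
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
rasch_items = [(1.0, b, 0.0) for b in difficulties]
three_pl_items = [(1.0, b, 0.2) for b in difficulties]

# Two response strings with the same raw score of 3.
pattern_a = [1, 1, 1, 0, 0]   # the three easiest items correct
pattern_b = [1, 0, 1, 0, 1]   # an easy item missed, the hardest item correct

rasch_a = ml_theta(pattern_a, rasch_items, D=1.0)
rasch_b = ml_theta(pattern_b, rasch_items, D=1.0)
three_pl_a = ml_theta(pattern_a, three_pl_items, D=1.7)
three_pl_b = ml_theta(pattern_b, three_pl_items, D=1.7)

print(rasch_a, rasch_b)
print(three_pl_a, three_pl_b)
```

Under the Rasch items both strings lead to the same estimate, because only the raw score matters. Under the 3 PL items the second string gets a lower estimate: a correct answer on the hardest item is partly attributed to guessing, while a miss on an easy item counts against the student.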


Common IRT Software Packages

• Rasch/PCM:
  – WINSTEPS
  – FACETS
  – CONQUEST

• 3PL/GPCM:
  – PARSCALE
  – MULTILOG
  – BILOG-MG (cannot be used for constructed response items)


Uses of IRT

• Item/Test Information
• Conditional Standard Error of Measurement
• Creation of Scale Scores
• Standard Setting
• Equating/Linking
• Test assembly/Test Design
• Differential Item Functioning (DIF)


Item/Test Information

• Each IRT model has an item information function.

• Item information provides an indicator of the accuracy of ability estimates at each location.

• Test information is the sum of item information over items.
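As a worked case, for the Rasch model as written on the earlier slide (with no scaling constant), the item information function has a simple closed form. Writing $P_i(\theta)$ for the probability of a correct response to item $i$ and $I_i(\theta)$ for its information (the subscripted notation is assumed here):

$$I_i(\theta) = P_i(\theta)\,[1 - P_i(\theta)], \qquad I(\theta) = \sum_{i=1}^{n} I_i(\theta)$$

Each Rasch item is therefore most informative where $P_i(\theta) = 0.5$, that is, at $\theta = b_i$, where $I_i(\theta) = 0.25$.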


Item Information (Rasch item)

[Figure: item information curve for a Rasch item, with θ (-4 to 4) on the horizontal axis and Information (0.0 to 1.0) on the vertical axis.]


Test information (10 Rasch items)

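[Figure: test information curve for 10 Rasch items, with θ (-4 to 4) on the horizontal axis and Information (0 to 5) on the vertical axis.]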


Conditional Standard Error of Measurement

• Equal to the reciprocal of the square root of the Test Information Function.

• Provides an indicator of assessment accuracy at each ability level.

• MDE reports the conditional standard error of measurement for students’ scale scores.


Conditional Standard Error of Measurement

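[Figure: conditional standard error of measurement curve, with θ (-4 to 4) on the horizontal axis and Conditional Standard Error of Measurement (0 to 5) on the vertical axis.]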


Theta to scale score transformation

• Remember the linear equation?
  – y = mx + b

• MDE uses linear equations to transform θ (Ability) to scale scores.

• Different transformation for each grade, content area, and assessment.

• Performance levels are determined by the student’s scale score.
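A linear transformation of this kind might look like the following Python sketch. The slope and intercept are hypothetical placeholders, not MDE's actual constants, and rounding to a whole number is also an assumption.

```python
def theta_to_scale_score(theta, slope, intercept):
    """Linear transformation y = m*x + b from ability to scale score.

    Rounding to a whole number is an assumption made for this sketch.
    """
    return round(slope * theta + intercept)

def theta_se_to_scale_se(theta_se, slope):
    """A standard error on the theta scale stretches by the same slope."""
    return abs(slope) * theta_se

# Hypothetical constants for one grade, content area, and assessment.
SLOPE, INTERCEPT = 25.0, 500.0

for theta in (-1.0, 0.0, 1.0):
    print(theta, theta_to_scale_score(theta, SLOPE, INTERCEPT))

print(theta_se_to_scale_se(0.4, SLOPE))
```

Because the transformation is linear, the ordering of students is unchanged and standard errors are simply multiplied by the slope.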


Example of a Raw to Scale Score Table

Raw Score   Scale Score   PL   SE
0           385           4    44
1           414           4    25
2           432           4    18
…           …             …    …
9           478           4    10
10          482           3    10
…           …             …    …
14          497           3    9
15          500           2    9
…           …             …    …
23          534           2    11
24          540           1    12
…           …             …    …


Standard Setting

• Process of establishing cut scores on the score scale of an assessment.

• Involves groups of teachers, administrators, and content experts who make cut score recommendations.

• Recommendations are based on panelists’ understanding of students and content as well as assessment characteristics.

• MDE has applied IRT based standard setting methods (e.g. Bookmark and Body of Work).

• State Board of Education sets final cut scores after considering panelists’ cut score recommendations.


Equating/Linking

• Process of placing scores from different test administrations onto a common scale so that scores can be used interchangeably.

• Equating adjusts for differences in difficulty between test forms.

• IRT facilitates equating/linking by assuming item parameters for common items do not change over time.

• Many IRT linking methods exist for creating a common scale once this assumption is made.

• MDE uses the Stocking-Lord procedure for MME and the fixed parameter method for the other assessments.
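To give a feel for what a linking method does, here is a Python sketch of one of the simpler ones, the mean/sigma method. This is not the Stocking-Lord procedure or the fixed parameter method that MDE uses; it is swapped in only because it fits in a few lines. All difficulty values are made up for illustration.

```python
import statistics

def mean_sigma_constants(b_new, b_base):
    """Slope A and intercept B that place the new scale on the base scale.

    Uses the difficulties of the common items as estimated on each scale,
    relying on the assumption that those items have not changed.
    """
    A = statistics.pstdev(b_base) / statistics.pstdev(b_new)
    B = statistics.mean(b_base) - A * statistics.mean(b_new)
    return A, B

# Hypothetical common-item difficulties on the base scale.
b_base = [-1.0, 0.0, 0.5, 1.5]
# The same items on a new form's scale, constructed for this example so
# that the two scales differ by a known linear relationship.
b_new = [(b - 0.3) / 1.1 for b in b_base]

A, B = mean_sigma_constants(b_new, b_base)
b_linked = [A * b + B for b in b_new]   # new-form difficulties on the base scale

print(A, B)
print(b_linked)
```

The same A and B would then be applied to ability estimates from the new form (theta on the base scale = A * theta + B) so that scores from both administrations are on a common scale.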


Test Design/Assembly

• MDE checks item and content characteristics when creating new test forms.

• Make test information as large as possible near the cut scores to make performance level classifications as accurate as possible.

• Make sure that the IRT test information and test characteristic curves for alternate test versions are as close to each other as possible.

• Why do we want the test information functions and test characteristic curves to be as similar as possible?


Differential Item Functioning (DIF)

• DIF refers to the situation where examinees with the same ability differ on average in their item performance depending on subgroup membership.

• MDE checks for DIF for each subgroup comparison (e.g. males vs. females) tested on the assessment that has a large enough sample size.

• Items identified as exhibiting DIF are reviewed by a panel of teachers and content experts to make sure that they are fair to all subgroups of examinees being tested.


Summary

• You were introduced to IRT models and how they are used by MDE.

• Goal is that you leave with a greater understanding of how MDE assessments are scored, scaled, and interpreted.

• In addition, you now should have some ‘tools’ that can assist you in your own analyses.


Contact Information

Adam Wyse, (517)-373-2435
[email protected]

Ji Zeng, (517)-241-3517
[email protected]

Please feel free to contact us if you have any questions ☺