Test Reliability & Development Using IRT University of Kansas Item Response Theory Stats Camp ‘07

Test Reliability & Development Using IRT - Jonathan … Reliability & Development Using IRT University of Kansas Item Response Theory Stats Camp ‘07 Overview • Reliability with

Overview

• Reliability with IRT
  – Item and Test Information Functions
    • Concepts
    • Equations
    • Uses and Examples

• Optimal Test Design

Reliability with IRT

• We all know that reliability (precision) is a desirable property for an assessment.

• The more reliable a test is, the more precisely we can measure the construct.

• For any scaling procedure (IRT or CTT), as reliability goes up, the standard error of measurement goes down.

Reliability with IRT

• In CTT, reliability is a one-number summary of test precision, and there is a corresponding single standard error of measurement that is used for any test score.

• In IRT, test precision is conceptualized as something called Information, which is conditional on the trait level being measured.
  – Some tests measure certain trait levels very well but measure others poorly…
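For contrast, the single CTT standard error of measurement is conventionally computed as SEM = SD_X √(1 − r_XX′), where SD_X is the observed-score standard deviation and r_XX′ is the reliability coefficient. A minimal Python sketch; the SD and reliability values below are hypothetical:

```python
import math

def ctt_sem(sd_observed: float, reliability: float) -> float:
    """Classical test theory SEM: one value applied to every test score."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be in [0, 1]")
    return sd_observed * math.sqrt(1.0 - reliability)

# Hypothetical test: observed-score SD of 10 points, reliability of 0.91.
print(round(ctt_sem(10.0, 0.91), 2))
```

The same ± SEM band applies to every examinee, whatever their score; IRT instead makes the error a function of θ.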

Reliability with IRT

• A further advantage of IRT with respect to evaluating reliability is that we can consider the amount of Information an item and/or a test provides.

• In CTT, measures of item quality exist, but these are only indirectly related to what the reliability of the test will be.

Item Information Function

• “Item Information” indicates an item’s usefulness for assessing ability.

• By “usefulness” we basically mean how good an item is at distinguishing examinees with lower ability levels from those with higher ability levels.

• Information ≈ precision

[Figure: P(u = 1 | θ) and Info(θ), each plotted against ability (θ)]

Item Information Function

• Items are most informative where the slope of the ICC is steepest, which happens when:
  – bj is relatively close to θi,
  – aj is relatively high, and
  – cj is relatively low.

• If cj = 0, an item provides its maximum information when θi = bj
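To see where an item is most informative, the sketch below evaluates the 3-PL ICC and its information, [P′(θ)]² / (P Q), on a grid and reports the θ with the largest value. It is a minimal Python sketch, assuming the common scaling constant D = 1.7 (the slides do not state D) and reusing the parameters of the next example (a = 1.0, c = 0.0, b = 1.0 or 2.0):

```python
import math

D = 1.7  # scaling constant (assumed; 1.702 is also common)

def p3pl(theta, a, b, c):
    """3-PL item characteristic curve P(u = 1 | theta)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def item_info(theta, a, b, c):
    """Item information [P'(theta)]^2 / (P Q), using the analytic 3-PL slope."""
    p = p3pl(theta, a, b, c)
    q = 1.0 - p
    slope = D * a * (p - c) * q / (1.0 - c)  # steepness of the ICC at theta
    return slope ** 2 / (p * q)

# Ability grid from -3 to 3 in steps of 0.01.
grid = [i / 100 for i in range(-300, 301)]

def theta_of_max_info(a, b, c):
    """Grid point where the item information is largest."""
    return max(grid, key=lambda t: item_info(t, a, b, c))

# With c = 0, information peaks at theta = b.
for b in (1.0, 2.0):
    print(b, theta_of_max_info(a=1.0, b=b, c=0.0))
```

Setting c above zero (for example c = 0.2) moves the peak slightly above b.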

[Figure: P(u = 1 | θ) versus ability (θ); a = 1.0, c = 0.0, b = 1.0 or 2.0]

[Figure: Info(θ) versus ability (θ); a = 1.0, c = 0.0, b = 1.0 or 2.0]

[Figure: P(u = 1 | θ) versus ability (θ); b = −1.0, c = 0.2, a = 1.0 or 0.5]

[Figure: Info(θ) versus ability (θ); b = −1.0, c = 0.2, a = 1.0 or 0.5]

[Figure: P(u = 1 | θ) versus ability (θ); a = 1.0, b = 0.0, c = 0.0 or 0.2]

[Figure: Info(θ) versus ability (θ); a = 1.0, b = 0.0, c = 0.0 or 0.2]

Item Information Function

• IMPORTANT: information is a function of θ, which means that an item could be very informative for some ability levels and relatively uninformative for others.

• Example: difficult items are informative at higher ability levels, but don’t tell us much about lower ability levels (because lower-ability examinees get most of those items wrong!)

[Figure: P(u = 1 | θ) versus ability (θ); c = 0.0, a = 1.2 or 0.8, b = 1.0 or 0.0]

[Figure: Info(θ) versus ability (θ); c = 0.0, a = 1.2 or 0.8, b = 1.0 or 0.0]

Item Information Function for the 3-PL

$$I_j(\theta) = \frac{[P_j'(\theta)]^2}{P_j(\theta)\,Q_j(\theta)} = \frac{D^2 a_j^2 (1 - c_j)}{\left[c_j + e^{D a_j(\theta - b_j)}\right]\left[1 + e^{-D a_j(\theta - b_j)}\right]^2}$$

Notes on IIF

• The roles of aj and cj are easy to see:
  – as aj increases, information increases
  – as cj increases, information decreases

• As ability moves away from bj (in either direction), the denominator increases, so information approaches zero.
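The closed-form 3-PL information can be checked against its definition, [P′(θ)]² / (P Q), and used to confirm the notes above. A minimal Python sketch, assuming D = 1.7 and hypothetical item parameters; the derivative in the check is a central-difference approximation:

```python
import math

D = 1.7  # scaling constant (assumed value)

def iif_3pl(theta, a, b, c):
    """Closed-form 3-PL item information (the formula on the previous slide)."""
    z = D * a * (theta - b)
    return (D ** 2 * a ** 2 * (1.0 - c)) / ((c + math.exp(z)) * (1.0 + math.exp(-z)) ** 2)

def iif_by_definition(theta, a, b, c, h=1e-5):
    """[P'(theta)]^2 / (P Q), with P' approximated by a central difference."""
    p = lambda t: c + (1.0 - c) / (1.0 + math.exp(-D * a * (t - b)))
    dp = (p(theta + h) - p(theta - h)) / (2.0 * h)
    return dp ** 2 / (p(theta) * (1.0 - p(theta)))

# The closed form agrees with the definition (hypothetical item).
print(abs(iif_3pl(0.3, 1.2, 0.0, 0.2) - iif_by_definition(0.3, 1.2, 0.0, 0.2)) < 1e-6)

# Roles of the parameters, evaluated at theta = b = 0:
print(iif_3pl(0.0, 1.0, 0.0, 0.0), iif_3pl(0.0, 0.5, 0.0, 0.0))  # higher a -> more info
print(iif_3pl(0.0, 1.0, 0.0, 0.0), iif_3pl(0.0, 1.0, 0.0, 0.2))  # higher c -> less info
print(iif_3pl(3.0, 1.0, 0.0, 0.0))                              # far from b -> near zero
```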

Maximum Information

If cj = 0, then Information is maximized at bj

If cj > 0, then Information is maximized at an ability level slightly greater than bj

$$\theta_{\max} = b_j + \frac{1}{D a_j}\ln\left[0.5\left(1 + \sqrt{1 + 8 c_j}\right)\right]$$
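The θmax formula can be cross-checked by brute force: evaluate the 3-PL information on a fine grid and locate its peak. A minimal Python sketch, assuming D = 1.7 and using the earlier example item (a = 1.0, b = −1.0, c = 0.2):

```python
import math

D = 1.7  # scaling constant (assumed value)

def iif_3pl(theta, a, b, c):
    """Closed-form 3-PL item information."""
    z = D * a * (theta - b)
    return (D * a) ** 2 * (1.0 - c) / ((c + math.exp(z)) * (1.0 + math.exp(-z)) ** 2)

def theta_max(a, b, c):
    """Ability level at which a 3-PL item's information peaks."""
    return b + math.log(0.5 * (1.0 + math.sqrt(1.0 + 8.0 * c))) / (D * a)

a, b, c = 1.0, -1.0, 0.2
closed_form = theta_max(a, b, c)

# Brute force: grid from b - 2 to b + 2 in steps of 0.0001.
grid = [b + i / 10000 for i in range(-20000, 20001)]
brute_force = max(grid, key=lambda t: iif_3pl(t, a, b, c))
print(round(closed_form, 3), round(brute_force, 3))
```

With c = 0 the logarithm term is ln(1) = 0, so θmax reduces to bj, as the slide states.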

Test Information Function

• Just like we add up ICCs to get a TCC, we add up IIFs to get a TIF.

• Information will continue to increase as we add test items, therefore increasing precision.

• All things equal, longer tests provide increased measurement precision.

Test Information Function

• Defined for a set of items at each point along the ability (θ) scale

• Test information is influenced by the ‘quality’ and the number of test items

$$I(\theta) = \sum_{j=1}^{n} I_j(\theta)$$
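Summing item-level curves gives the test-level curves: ICCs add up to the TCC, E(X | θ), and IIFs add up to the TIF. A minimal Python sketch, assuming D = 1.7; the four items are hypothetical and are not the items used in the slides:

```python
import math

D = 1.7  # scaling constant (assumed value)

def p3pl(theta, a, b, c):
    """3-PL item characteristic curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def iif_3pl(theta, a, b, c):
    """Closed-form 3-PL item information."""
    z = D * a * (theta - b)
    return (D * a) ** 2 * (1.0 - c) / ((c + math.exp(z)) * (1.0 + math.exp(-z)) ** 2)

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score E(X | theta)."""
    return sum(p3pl(theta, *item) for item in items)

def tif(theta, items):
    """Test information function: the sum of the item information functions."""
    return sum(iif_3pl(theta, *item) for item in items)

# Hypothetical item parameters (a, b, c).
items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.0), (0.8, 1.0, 0.1), (1.5, 0.5, 0.2)]

for theta in (-1.0, 0.0, 1.0):
    print(theta, round(tcc(theta, items), 2), round(tif(theta, items), 2))

# Adding an item raises the TIF everywhere, since every IIF is positive.
longer = items + [(1.0, 0.0, 0.0)]
print(all(tif(t, longer) > tif(t, items) for t in (-1.0, 0.0, 1.0)))
```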

[Figure: P(u = 1 | θ) versus ability (θ) for the items of an example test]

[Figure: expected score E(X | θ) versus ability (θ), vertical axis 0 to 8]

[Figure: Info(θ) versus ability (θ) for the items of the example test]

[Figure: Info(θ) versus ability (θ), vertical axis 0 to 4]

[Figure: Info(θ) versus ability (θ), vertical axis 0 to 4]

Conditional Error for Maximum Likelihood Estimates

• One of the great benefits of IRT scaling is that measurement precision and error can now be considered conditional on θ.

Conditional Error for Maximum Likelihood Estimates

• Standard error of an MLE is determined by:

$$SE(\hat\theta) = \frac{1}{\sqrt{I(\hat\theta)}}$$

Conditional Standard Error

• The imprecision of ability estimation is therefore inversely related to the amount of Information available at that ability level: the SE is inversely proportional to the square root of the Information.

• Since Information increases with the quality and number of items, the SE conversely decreases…which hopefully makes some sense!

[Figure: 8-item Test Information Function; Info(θ) and SE(θ) versus ability (θ)]

[Figure: Info(θ) and SE(θ) versus ability (θ)]

Information may be spread across a relatively wide range…

[Figure: Info(θ) and SE(θ) versus ability (θ)]

…or maximized around an ability level of interest (e.g., a cutscore).

Info and SE Example

At θ = 1.0, I(θ = 1) = 9:

$$SE(\hat\theta) = \frac{1}{\sqrt{I(\hat\theta)}} = \frac{1}{\sqrt{9}} = 0.33$$

If θ̂ = 1.0, SE(θ̂) = 0.33.

Info and SE Example

At θ = 0.0, I(θ = 0) = 3:

$$SE(\hat\theta) = \frac{1}{\sqrt{I(\hat\theta)}} = \frac{1}{\sqrt{3}} = 0.58$$

If θ̂ = 0.0, SE(θ̂) = 0.58.

Info and SE Example

At θ = −1.0, I(θ = −1) = 1:

$$SE(\hat\theta) = \frac{1}{\sqrt{I(\hat\theta)}} = \frac{1}{\sqrt{1}} = 1.0$$

If θ̂ = −1.0, SE(θ̂) = 1.0.
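The three worked examples follow directly from SE(θ̂) = 1/√I(θ̂). A minimal Python sketch using the information values stated on these slides:

```python
import math

def se_from_info(info):
    """Conditional standard error of the ML ability estimate: 1 / sqrt(I)."""
    if info <= 0:
        raise ValueError("information must be positive")
    return 1.0 / math.sqrt(info)

# Test information at each ability estimate, from the worked examples.
examples = {1.0: 9.0, 0.0: 3.0, -1.0: 1.0}  # theta_hat -> I(theta_hat)
for theta_hat, info in examples.items():
    print(f"theta = {theta_hat:+.1f}: I = {info:.1f}, SE = {se_from_info(info):.2f}")
```

Because of the square root, halving the SE at a given θ requires four times as much information there.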

95% Confidence Interval

• Because MLEs are asymptotically normally distributed, we create a 95% confidence interval around a point estimate of ability by adding and subtracting 1.96 standard errors:

• Estimate ± 1.96 × SE (recall the critical values of the standard normal distribution)

[Figure: Standard Normal Distribution, probability versus ability, with area 0.025 in each tail and 0.95 in the middle]

95% Confidence Interval

• For θ = 1, SE = 0.33: 1.0 ± 0.65
  – We can be 95% confident that the examinee’s true ability lies between 0.35 and 1.65

• For θ = 0, SE = 0.58: 0.0 ± 1.14
  – We can be 95% confident that the examinee’s true ability lies between −1.14 and 1.14

• For θ = −1, SE = 1.0: −1.0 ± 1.96
  – We can be 95% confident that the examinee’s true ability lies between −2.96 and 0.96
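These intervals are θ̂ ± z × SE with z ≈ 1.96. A minimal Python sketch that takes z from the standard library's normal distribution rather than hard-coding it; the (θ̂, SE) pairs are the ones from the slide:

```python
from statistics import NormalDist

def confidence_interval(theta_hat, se, level=0.95):
    """Normal-theory CI for an ML ability estimate: theta_hat ± z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return theta_hat - z * se, theta_hat + z * se

for theta_hat, se in [(1.0, 0.33), (0.0, 0.58), (-1.0, 1.0)]:
    lo, hi = confidence_interval(theta_hat, se)
    half_width = (hi - lo) / 2.0
    print(f"{theta_hat:+.1f} ± {half_width:.2f}: ({lo:.2f}, {hi:.2f})")
```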

95% Confidence Interval

• As information increases…
  – SE decreases
  – CI becomes narrower
  – Increased trust in ability estimate

• As information decreases…
  – SE increases
  – CI becomes wider
  – Decreased trust in ability estimate

Notes on IIF and TIF

• Note that the contribution of Ij(θ) to I(θ) does not depend on the particular combination of test items.
  – Each item contributes independently

• This is a very big advantage of IRT over CTT: reliability can be described conditionally (as information), and each item’s contribution to it does not depend on which other items are on the test.

Mini-CTT lesson

• In CTT, item discrimination (quality) is the item-total correlation.

• This depends on the item itself, but is also influenced by the other test items.

• Adding items changes the total score, thus changing the correlation.

• Therefore, it’s difficult to anticipate the reliability of a test when creating a form from a bank of previously piloted items, unless those items all appeared together.

CTT versus IRT

• In IRT, item quality is Information, which is affected by aj, bj, cj, and θ.

• An item’s information function will be independent of the other items on the test, as will its contribution to the TIF.

• Adding more and/or better items will increase TIF, but won’t impact any IIF.

• Therefore, it’s easy to anticipate the reliability of a test when creating a form from a bank of previously piloted items.

Excel Spreadsheet Demo

• Show Excel Spreadsheet containing eight items, their ICCs, TCC, IIFs, TIF and SE.

• Specify different item parameters and determine how changes affect the resulting graphs.

Uses of Item and Test Information Functions

1) Providing conditional SE of trait
2) Building a test to meet desired statistical specifications
3) Revising an existing test
4) Comparing tests

Conditional SE

• As previously stated, the precision (reliability) and imprecision (error) of a test scaled with IRT is conditional on θ.

• Tests may be better or worse for measuring certain trait levels

Test Development

• From a pool of previously piloted test items, IRT makes it relatively easy to switch items in and out and determine what the resulting Information function will be.

• This tells the test maker what the conditional standard errors will be, too.
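Because each item's information is fixed once it has been calibrated, the conditional SE of any candidate form can be computed before the form is ever administered. A minimal Python sketch, assuming D = 1.7; the item bank, item IDs, forms, and cutscore are all hypothetical:

```python
import math

D = 1.7  # scaling constant (assumed value)

def iif_3pl(theta, a, b, c):
    """Closed-form 3-PL item information."""
    z = D * a * (theta - b)
    return (D * a) ** 2 * (1.0 - c) / ((c + math.exp(z)) * (1.0 + math.exp(-z)) ** 2)

def form_se(theta, form, bank):
    """Conditional SE at theta for a form given as a list of item IDs from the bank."""
    info = sum(iif_3pl(theta, *bank[item_id]) for item_id in form)
    return 1.0 / math.sqrt(info)

# Hypothetical bank of previously calibrated items: ID -> (a, b, c).
bank = {
    "i01": (1.1, -1.5, 0.2), "i02": (0.9, -0.5, 0.1), "i03": (1.4, 0.0, 0.2),
    "i04": (1.3, 0.8, 0.1), "i05": (1.6, 1.0, 0.2), "i06": (0.7, 2.0, 0.0),
}
cutscore = 1.0  # hypothetical ability level of interest

form_a = ["i01", "i02", "i03", "i06"]
form_b = ["i04", "i02", "i03", "i05"]  # form A with i01, i06 swapped for items near the cut

for name, form in (("A", form_a), ("B", form_b)):
    print(name, round(form_se(cutscore, form, bank), 3))
```

Swapping items in and out changes only the terms of the sum; no re-analysis of the remaining items is needed.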

Test Development

• Another benefit to test development is that multiple forms may be built to the same statistical specifications.

• This process is often referred to as “Pre-equating.”

• Building strictly parallel forms is always difficult, but these procedures can help.

Test Revision

• Likewise, test items may be removed from previously existing forms (e.g., to create a “short form” of a test).

• Test items may also need to be added if the previous form is found to be unreliable.

• Estimating the new reliability of the test is straightforward with IRT

Test Revision

• In CTT, such test revisions require the assumption that the deleted or added items are of comparable statistical quality to those already on the test.
  – Spearman-Brown prophecy formula
  – This may or may not be true!
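The Spearman-Brown prophecy formula predicts the reliability of a test whose length is multiplied by a factor k, ρ* = kρ / (1 + (k − 1)ρ), and it holds only under the comparable-quality assumption above. A minimal Python sketch; the reliability of 0.80 is hypothetical:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after multiplying test length by length_factor.

    Assumes the added or removed items are parallel to (of comparable
    quality to) the items already on the test.
    """
    k = length_factor
    return k * reliability / (1.0 + (k - 1.0) * reliability)

# Hypothetical form with reliability 0.80:
print(round(spearman_brown(0.80, 0.5), 3))  # short form at half length
print(round(spearman_brown(0.80, 2.0), 3))  # doubled length
```

The IRT approach instead sums the actual information of the items that remain or are added.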

Comparing Tests

• When comparing the reliability (i.e., precision) of two test forms, it’s useful to compute the ratio of their information at each value of θ.

• This ratio is known as the relative efficiency of a test: RE(θ).

• Consider the two example TIFs shown earlier.

[Figure: Info(θ) and SE(θ) versus ability (θ)]

Information targeted around a cutscore. We’ll call this “Form X”.

[Figure: Info(θ) and SE(θ) versus ability (θ)]

Information spread across a wide range. We’ll call this “Form Y”.

$$RE(\theta) = \frac{I_X(\theta)}{I_Y(\theta)}, \qquad I_X(\theta) = \text{info for Form X at } \theta, \quad I_Y(\theta) = \text{info for Form Y at } \theta$$

Suppose at θ = 1, I_X(θ) = 9.0 and I_Y(θ) = 3.6. Then

$$RE(\theta = 1) = \frac{9}{3.6} = 2.5$$
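Relative efficiency is a simple ratio of the two forms' information at the same θ. Because SE = 1/√I, the ratio of the two forms' standard errors at that θ is √RE. A minimal Python sketch using the values from the example:

```python
import math

def relative_efficiency(info_x, info_y):
    """RE(theta) = I_X(theta) / I_Y(theta); values above 1 favor Form X."""
    if info_y <= 0:
        raise ValueError("Form Y information must be positive")
    return info_x / info_y

info_x, info_y = 9.0, 3.6  # I_X(1) and I_Y(1) from the example
re_at_1 = relative_efficiency(info_x, info_y)
se_ratio = math.sqrt(re_at_1)  # SE_Y / SE_X, since SE = 1 / sqrt(I)
print(f"RE(1) = {re_at_1:.2f}; SE_Y / SE_X = {se_ratio:.2f}")
```

Since information is additive, Form Y would need roughly 2.5 times as many items of its current quality to match Form X at θ = 1.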

[Figure: Info(θ) versus ability (θ)]

In the region θ = 1, Form X is 2.5 times more efficient than Form Y.

[Figure: Info(θ) versus ability (θ)]

In the region θ ≈ 0.10, Form X is just as efficient as Form Y.

[Figure: Info(θ) versus ability (θ)]

In the region θ = −1, Form X is LESS efficient than Form Y: RE(θ) = 0.23.

[Figure: RE(θ) versus ability (θ)]

Form X is more efficient than Form Y above the point θ ≈ 0.1.

[Figure: RE(θ) versus ability (θ)]

Form Y is more efficient than Form X below the point θ ≈ 0.1.

Next…

• Test Score Equating using IRT