ITEM RESPONSE THEORY MODELS & COMPOSITE MEASURES

Sharon-Lise T. Normand

FOCUS: How to deal with data that are dichotomous or ordinal?

Notation: “s” indexes subject; “j” indexes item (or measure); $\theta_s$ = “true” unobserved score.

WHAT IS AN ITEM RESPONSE THEORY (IRT) MODEL?

A statistical model that relates the probability of response to an item to item-specific parameters and to the subject’s underlying latent trait.

Classical Test Theory

• Estimate reliability of items (coefficient $\alpha$).

• Model:

$Y_{sj} = \theta_s + \epsilon_{sj}$

$Y_{sj}$ = response

$\theta_s$ = underlying trait

$\epsilon_{sj}$ = error, Normal with expectation 0 and constant variance.
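As a rough illustration of the reliability estimate used in classical test theory, the sketch below computes coefficient alpha (Cronbach's alpha) from a subjects-by-items table. It assumes the coefficient named on this slide is Cronbach's alpha; the function name `cronbach_alpha` and the example responses are hypothetical and not taken from the slides.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a subjects-by-items table of numeric responses.

    scores: list of rows, one row per subject, each row holding k item scores.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across subjects.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each subject's total score.
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 subjects, 3 items scored 0 to 4.
example_scores = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 3, 4],
    [4, 4, 4],
]
print(round(cronbach_alpha(example_scores), 3))
```

Items that move together across subjects push the value toward 1; the statistic summarizes the whole scale and carries no item-specific parameters.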

Item Response Theory

• Estimate discriminating ability of items using item-specific parameters.

• Responses within a subject are independent conditional on latent trait.

• Normality & constant variance not assumed

$\theta_s \sim N(0,1)$

DICHOTOMOUS OR ORDINAL RESPONSES

Item response formulation: observed response is $y_{sj}$; generalized linear model formulation:

$h(P(y_{sj} = 1 \mid \theta_s)) = \alpha_j(\theta_s - \beta_j)$

• $h$ = link function (logit or probit)

• $\alpha_j$ and $\beta_j$ are “item” parameters.
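To make the role of the link function concrete, here is a minimal sketch that inverts the logit and probit links for the same linear predictor. The names `alpha` (discrimination), `beta` (difficulty), and `response_probability` are assumptions chosen for illustration, as are the parameter values.

```python
import math


def inverse_logit(eta):
    """Logistic CDF: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def inverse_probit(eta):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def response_probability(theta, alpha, beta, link="logit"):
    """P(y = 1 | theta) when h(P) = alpha * (theta - beta)."""
    eta = alpha * (theta - beta)
    if link == "logit":
        return inverse_logit(eta)
    if link == "probit":
        return inverse_probit(eta)
    raise ValueError("link must be 'logit' or 'probit'")


# Hypothetical item: alpha = 1.5, beta = 0.0.
for theta in (-2.0, 0.0, 2.0):
    print(theta,
          round(response_probability(theta, 1.5, 0.0, "logit"), 3),
          round(response_probability(theta, 1.5, 0.0, "probit"), 3))
```

Both links give probability 0.5 when the trait equals the item's difficulty; they differ in how quickly the curve approaches 0 and 1.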

RASCH MODEL (1-PARAMETER LOGISTIC)

Simplest IRT model.

• $Y_{sj}$ = 1 if subject s responds correctly to item j and 0 otherwise.

$\theta_s$ = latent ability for subject s.

$\beta_j$ = difficulty of jth item.

Probability subject s responds correctly to the jth item:

$$P(Y_{sj}=1 \mid \theta_s) = \frac{\exp(\theta_s - \beta_j)}{1+\exp(\theta_s - \beta_j)}$$
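The Rasch probability can be evaluated directly. The sketch below tabulates it for three subjects with traits 1, 0, and -1 over a grid of item difficulties; the function name `rasch_probability` and the difficulty grid are illustrative assumptions.

```python
import math


def rasch_probability(theta, beta):
    """P(Y = 1 | theta) for an item of difficulty beta under the Rasch model."""
    return math.exp(theta - beta) / (1.0 + math.exp(theta - beta))


difficulties = [-4, -2, 0, 2, 4]  # illustrative grid of item difficulties
for theta in (1, 0, -1):
    row = [round(rasch_probability(theta, b), 3) for b in difficulties]
    print(f"theta={theta:>2}: {row}")
```

A subject has an even chance of a correct response on an item whose difficulty equals that subject's ability, and the chance falls as the difficulty rises.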

RASCH MODEL: 3 SUBJECTS WITH DIFFERENT TRAITS

[Figure: probability of correct response (0.0 to 1.0) plotted against item difficulty $\beta$ (-4 to 4), with one curve each for $\theta$ = 1, $\theta$ = 0, and $\theta$ = -1.]

2-PARAMETER LOGISTIC

• $Y_{sj}$ = 1 if subject s responds correctly to item j and 0 otherwise.

$\theta_s$ = latent ability for subject s.

$\beta_j$ = difficulty of jth item.

$\alpha_j$ = discrimination of jth item ($\alpha_j > 0$)

Probability subject s responds correctly to the jth item:

$$P(Y_{sj}=1 \mid \theta_s) = \frac{\exp(\alpha_j(\theta_s - \beta_j))}{1+\exp(\alpha_j(\theta_s - \beta_j))}$$
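One way to see what the discrimination parameter does is to look at the slope of the response curve where the trait equals the item difficulty; for the logistic form that slope is one quarter of the discrimination. The sketch below checks this numerically with a central finite difference. The function names and the step size `h` are assumptions for illustration; the discrimination values 0.5, 1, and 3 are the ones shown in the deck's 2-parameter figures.

```python
import math


def two_pl_probability(theta, alpha, beta):
    """P(Y = 1 | theta) under the 2-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


def slope_at_difficulty(alpha, beta, h=1e-5):
    """Central finite-difference slope of the curve at theta = beta."""
    upper = two_pl_probability(beta + h, alpha, beta)
    lower = two_pl_probability(beta - h, alpha, beta)
    return (upper - lower) / (2.0 * h)


for alpha in (0.5, 1.0, 3.0):
    print(alpha, round(slope_at_difficulty(alpha, beta=1.0), 4), alpha / 4.0)
```

Items with larger discrimination separate subjects just below and just above the difficulty more sharply, which is why their curves look steeper.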

2-PARAMETER LOGISTIC: 3 ITEMS ($\beta$ = 1)

[Figure: probability of correct response (0.0 to 1.0) plotted against $\theta$ (-4 to 4) for three items with $\alpha$ = 3, $\alpha$ = 0.5, and $\alpha$ = 1.]

2-PARAMETER LOGISTIC: 3 ITEMS & DIFFERENT $\beta$’s

[Figure: probability of correct response (0.0 to 1.0) plotted against $\theta$ (-4 to 4) for three items. Legend: Item 1: $\alpha$ = 1, $\beta$ = 1; Item 2: $\alpha$ = 3, $\beta$ = 0; Item 3: $\alpha$ = 0.5, $\beta$ = 1. Curve annotations: $\alpha$ = 3, $\beta$ = 0; $\alpha$ = 0.5, $\beta$ = 1; $\alpha$ = 1, $\beta$ = -1.]

EXAMPLE 1: HOSPITAL QUALITY FOR HEART FAILURE

(3376 US Hospitals; 2005)

| Measure | Median # Eligible Patients [10th; 90th] | Median % Compliant [10th; 90th] |
|---|---|---|
| LVF assessment | 200 [51; 580] | 89 [64; 98] |
| ACE or ARB for LVSD | 34 [5; 120] | 83 [60; 100] |
| Smoking cessation advice | 25 [1; 98] | 79 [40; 100] |
| Discharge instructions | 135 [0; 469] | 55 [15; 87] |

Teixeira-Pinto and Normand – Statistics in Medicine (2008)

LVF = left ventricular function; LVSD = left ventricular systolic dysfunction

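EXAMPLE 1: Hospital Performance

• $Y_{sj}$ = no. of eligible cases in sth hospital getting treatment j.

$n_{sj}$ = no. of cases in sth hospital eligible for treatment j.

$\beta_{0j}$ = “difficulty” of the jth process measure.

$\beta_{1j}$ = “discriminating” ability of the jth process measure.

$\theta_s$ = underlying quality of care for sth hospital.

$$Y_{sj} \mid \theta_s \sim \mathrm{Bin}(n_{sj}, p_{sj}), \quad \text{where } \mathrm{logit}(p_{sj}) = \beta_{1j}(\theta_s - \beta_{0j}) \text{ and } \theta_s \overset{iid}{\sim} N(0,1)$$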
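The hospital model can be read generatively: draw a latent quality from a standard normal, turn it into a compliance probability for each process measure through the logit link, and draw a binomial count out of the eligible cases. The sketch below simulates one hospital under that reading. The eligible counts are the medians from the heart failure table; the difficulty and discrimination values, the names `beta0` and `beta1`, and the seed are hypothetical, not estimates from the paper.

```python
import math
import random


def compliance_probability(theta, beta0, beta1):
    """Inverse logit of beta1 * (theta - beta0)."""
    return 1.0 / (1.0 + math.exp(-beta1 * (theta - beta0)))


def simulate_hospital(n_eligible, beta0, beta1, rng):
    """Draw one hospital's latent quality and its compliant counts per measure."""
    theta = rng.gauss(0.0, 1.0)  # latent quality, standard normal
    counts = []
    for n, b0, b1 in zip(n_eligible, beta0, beta1):
        p = compliance_probability(theta, b0, b1)
        # Binomial(n, p) draw, built as a sum of n Bernoulli(p) trials.
        counts.append(sum(1 for _ in range(n) if rng.random() < p))
    return theta, counts


# Median eligible counts from the table: LVF, ACE/ARB, smoking, discharge.
n_eligible = [200, 34, 25, 135]
beta0 = [-2.0, -1.5, -1.2, -0.2]  # hypothetical "difficulty" values
beta1 = [1.0, 0.8, 0.9, 1.2]      # hypothetical "discrimination" values

rng = random.Random(2005)  # fixed seed so the sketch is reproducible
theta, counts = simulate_hospital(n_eligible, beta0, beta1, rng)
print(round(theta, 2), counts)
```

Because every measure shares the same latent quality, the simulated compliance counts within a hospital are correlated even though they are independent given that quality.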

LS = Latent Score

Correlation

| | Dx Inst. | LVF | ACEI/ARB | Smoke |
|---|---|---|---|---|
| Dx Inst. | 1 | | | |
| LVF | .39 | 1 | | |
| ACEI/ARB | .28 | .34 | 1 | |
| Smoke | .55 | .38 | .21 | 1 |

Comparing Composites: (Teixeira-Pinto and Normand, Statistics in Medicine (2008))

2005 Data

EXAMPLE 2: BASIS-32

Background. BASIS-32, an instrument to assess subjective distress, was originally developed using classical test theory based on a sample of psychiatric inpatients from one hospital.

Data. Self-reports of symptom and problem difficulty obtained from 2,656 psychiatric inpatients discharged from 13 US hospitals between May 2001 and April 2002. (BASIS-32 = Behavior and Symptom Identification Scale)

Normand, Belanger, Eisen – Health Services Outcomes Research Methodology (2006)

Provide the answer that best describes the degree of difficulty you have been experiencing in each area during the PAST WEEK.

• Managing day-to-day life

• Being able to feel close to others

• Household responsibilities

• Depression, hopelessness

• Leisure time or recreational activities

• Controlling temper, outbursts of anger, violence

• Adjusting to major life stresses

• Drinking alcoholic beverages

• Relationships with family members

• Developing independence

• Getting along with people outside family

• Lack of self-confidence, feeling bad about yourself

• Isolation or feelings of loneliness

• Manic, bizarre behavior

Response Options: 0 = No difficulty; 1 = A little difficulty; 2 = Moderate difficulty; 3 = Quite a bit of difficulty; 4 = Extreme difficulty.

GRADED RESPONSE MODEL (IRT MODEL)

When response options are ordinal categorical, e.g., $Y_{sj}$ = 0, 1, 2, 3, or 4, where

0 = No difficulty; 1 = A little difficulty; 2 = Moderate difficulty; 3 = Quite a bit of difficulty; 4 = Extreme difficulty,

we need to model the probability of responding in each category.

GRADED RESPONSE MODEL

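Probability subject s responds in threshold category k or higher:

$$P(Y_{sj} \ge k \mid \theta_s) = P^*_{jk}(\theta_s) = \frac{\exp[\alpha_j(\theta_s - \beta_{jk})]}{1 + \exp[\alpha_j(\theta_s - \beta_{jk})]}$$

$\theta_s$ = latent trait (e.g., subjective distress)

$\alpha_j$ = discrimination of jth item ($\alpha_j > 0$)

$\beta_{j4} \ge \beta_{j3} \ge \beta_{j2} \ge \beta_{j1}$ = threshold parameters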
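The cumulative probabilities above give the probability of each individual response category by differencing: the chance of answering exactly k is the chance of answering k or higher minus the chance of answering k + 1 or higher, with the conventions that "0 or higher" has probability 1 and "5 or higher" has probability 0. A minimal sketch, assuming ordered thresholds; the function names and the parameter values are illustrative and not estimates from the slides.

```python
import math


def cumulative_probability(theta, alpha, beta_k):
    """P(Y >= k | theta) for threshold beta_k under the graded response model."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta_k)))


def category_probabilities(theta, alpha, thresholds):
    """P(Y = k | theta) for k = 0..K, given K increasing thresholds."""
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    # Pad with P(Y >= 0) = 1 and P(Y >= K + 1) = 0, then take differences.
    cumulative = [1.0]
    cumulative += [cumulative_probability(theta, alpha, b) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Illustrative item: alpha = 2.0, four thresholds for responses 0..4.
probs = category_probabilities(theta=0.0, alpha=2.0,
                               thresholds=[-1.0, -0.3, 0.3, 1.0])
print([round(p, 3) for p in probs])
```

The ordering of the thresholds is what keeps every category probability non-negative, since each cumulative probability is then no larger than the one before it.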

CUMULATIVE PROBABILITIES

[Figure: two panels plotting the cumulative probabilities $P_1^*(\theta)$ through $P_4^*(\theta)$ (.0 to 1.0) against $\theta$ (-4.5 to 4.5), one panel with $\alpha$ = 6.00 and the other with $\alpha$ = 0.90.]

CUMULATIVE PROBABILITIES

[Figure: two panels plotting cumulative probabilities (.0 to 1.0) against $\theta$ (-4.5 to 4.5). One panel has thresholds $(\beta_{i1}, \beta_{i2}, \beta_{i3}, \beta_{i4})$ = (-2.45, -1.87, -1.10, -0.63); the other has $(\beta_{i1}, \beta_{i2}, \beta_{i3}, \beta_{i4})$ = (-2.45, -2.40, -2.35, -2.30).]

BASIS-10 IRT Parameter Estimates

| Item | $\alpha_j$ | $\beta_{j1}$ | $\beta_{j2}$ | $\beta_{j3}$ | $\beta_{j4}$ | $I_i(\theta)$ |
|---|---|---|---|---|---|---|
| Managing Life | 2.87 | -1.03 | -0.36 | 0.25 | 1.06 | 2.08 |
| Responsibilities at Home | 2.51 | -0.80 | -0.22 | 0.37 | 1.13 | 1.63 |
| Responsibilities Outside Home | 2.44 | -0.79 | -0.25 | 0.30 | 1.06 | 1.53 |
| Coping with Problems | 3.13 | -1.33 | -0.68 | -0.13 | 0.66 | 2.33 |
| Concentrating | 3.15 | -0.96 | -0.33 | 0.16 | 0.97 | 2.41 |
| Thinking Clearly | 3.10 | -0.89 | -0.28 | 0.27 | 1.02 | 2.36 |
| Sad or Depressed | 1.99 | -1.58 | -0.66 | -0.12 | 0.84 | 1.08 |
| Ending Your Life | 1.40 | -0.16 | 0.75 | 1.30 | 2.22 | 0.51 |
| Feel Nervous | 1.97 | -1.16 | -0.27 | 0.26 | 1.13 | 1.08 |
| Feel Afraid | 1.92 | -0.87 | -0.03 | 0.52 | 1.34 | 1.01 |

Concluding Remarks: (Kaplan & Normand 2006)

| Pros | Cons |
|---|---|
| Summarizes a large amount of information into a simpler measure | May be difficult to interpret – what do the units mean? |
| Facilitates provider ranking | Difficult to validate |
| Improves reliability of provider measure and thus reduces the number of individual quality measures that need to be collected | Does not necessarily guide quality improvement; the individual quality measures are needed |
| Fairer to providers – different ways to get good composite scores | Quality information may be wasted or hidden in the composite measure |
| Reduces the time frame over which quality is assessed by effectively increasing the sample size | The weighting scheme to create the composite score may not be transparent (scoring) |
| Not new or unique to health care: intelligence, aptitude, mental illness, and personality used for over a century; economics/business; education (student and teacher performance); clinical trials | |