1
ITEM RESPONSE THEORY MODELS & COMPOSITE MEASURES
Sharon-Lise T. Normand
FOCUS: How to deal with data that are dichotomous or ordinal?
“s” indexes the subject; “j” indexes the item (or measure); $\theta_s$ = “true” unobserved score.
2
WHAT IS AN ITEM RESPONSE THEORY (IRT) MODEL?
A statistical model that relates the probability of response to an item to item-specific parameters and to the subject’s underlying latent trait.
3
Classical Test Theory
• Estimate reliability of items (coefficient $\alpha$).
• Model:
$Y_{sj} = \theta_s + \epsilon_{sj}$
$Y_{sj}$ = response
$\theta_s$ = underlying trait
$\epsilon_{sj}$ = error, Normal with expectation 0 and constant variance.
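As a rough illustration of the reliability estimate named above, the sketch below computes coefficient alpha from a subjects-by-items score matrix using the usual variance-ratio formula. The function name `cronbach_alpha` and the example responses are hypothetical and do not come from the slides.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: list of subjects, each a list of k item responses."""
    k = len(scores[0])
    # Sample variance of each item across subjects.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each subject's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 4 subjects, 3 items.
example = [[2, 3, 3], [1, 1, 2], [4, 4, 3], [0, 1, 1]]
print(round(cronbach_alpha(example), 3))
```

When every item gives the same score for a subject, the item variances account for exactly 1/k of the total-score variance and the coefficient equals 1.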
Item Response Theory
• Estimate discriminating ability of items using item-specific parameters.
• Responses within a subject are independent conditional on latent trait.
• Normality & constant variance not assumed
$\theta_s \sim N(0,1)$
4
DICHOTOMOUS OR ORDINAL RESPONSES
Item response formulation: observed response is $y_{sj}$; generalized linear model formulation:
$h(P(y_{sj} = 1 \mid \theta_s)) = \alpha_j(\theta_s - \beta_j)$
• h = link function (logit or probit); $\alpha_j$ and $\beta_j$ are “item” parameters.
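The generalized linear model formulation can be sketched by inverting the link function. This is a minimal illustration, assuming the linear predictor $\alpha_j(\theta_s - \beta_j)$ from the slide; the function names and the parameter values in the loop are hypothetical.

```python
import math

def inv_logit(eta):
    # Inverse of the logit link.
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    # Inverse of the probit link: standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def response_prob(theta, alpha, beta, link="logit"):
    """P(y = 1 | theta) for an item with discrimination alpha and difficulty beta."""
    eta = alpha * (theta - beta)
    if link == "logit":
        return inv_logit(eta)
    if link == "probit":
        return inv_probit(eta)
    raise ValueError("link must be 'logit' or 'probit'")

# Hypothetical item: alpha = 1.5, beta = 0.5.
for theta in (-1.0, 0.5, 2.0):
    print(theta,
          round(response_prob(theta, 1.5, 0.5, "logit"), 3),
          round(response_prob(theta, 1.5, 0.5, "probit"), 3))
```

Under either link, a subject whose trait equals the item's $\beta_j$ has response probability 0.5.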
5
RASCH MODEL (1-PARAMETER LOGISTIC)
Simplest IRT model
• $Y_{sj}$ = 1 if subject s responds correctly to item j and 0 otherwise.
$\theta_s$ = latent ability for subject s.
$\beta_j$ = difficulty of jth item.
Probability subject s responds correctly to jth item:
$P(Y_{sj}=1 \mid \theta_s) = \dfrac{\exp(\theta_s - \beta_j)}{1+\exp(\theta_s - \beta_j)}$
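A small sketch of the Rasch probability, tabulated for three subjects with traits 1, 0 and -1 over a range of item difficulties. The function name `rasch_prob` and the difficulty grid are illustrative choices, not part of the slides.

```python
import math

def rasch_prob(theta, beta):
    """P(Y = 1 | theta) under the Rasch model with item difficulty beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

# Item difficulties from -4 to 4 in steps of 1.
difficulties = range(-4, 5)
for theta in (1, 0, -1):
    row = [round(rasch_prob(theta, b), 2) for b in difficulties]
    print(f"theta={theta:>2}: {row}")
```

For a fixed item, the subject with the larger trait always has the higher probability of a correct response, and each subject's probability is 0.5 where the difficulty equals that subject's trait.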
6
[Figure. RASCH MODEL: 3 SUBJECTS WITH DIFFERENT TRAITS. Probability of Correct Response (0.0 to 1.0) plotted against Beta ($\beta$ = DIFFICULTY, -4 to 4), one curve each for Theta=1, Theta=0, Theta=-1.]
7
2-PARAMETER LOGISTIC
• $Y_{sj}$ = 1 if subject s responds correctly to item j and 0 otherwise.
$\theta_s$ = latent ability for subject s.
$\beta_j$ = difficulty of jth item.
$\alpha_j$ = discrimination of jth item ($\alpha_j > 0$)
Probability subject s responds correctly to jth item:
$P(Y_{sj}=1 \mid \theta_s) = \dfrac{\exp(\alpha_j(\theta_s - \beta_j))}{1+\exp(\alpha_j(\theta_s - \beta_j))}$
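One way to see what the discrimination parameter does: at $\theta_s = \beta_j$ the probability is 0.5 whatever $\alpha_j$ is, and the slope of the curve there equals $\alpha_j/4$. The sketch below checks this numerically with a finite difference; the function names and the three discrimination values are illustrative assumptions.

```python
import math

def two_pl_prob(theta, alpha, beta):
    """P(Y = 1 | theta) under the 2-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

def slope_at(theta, alpha, beta, h=1e-5):
    # Central finite-difference approximation to dP/dtheta.
    upper = two_pl_prob(theta + h, alpha, beta)
    lower = two_pl_prob(theta - h, alpha, beta)
    return (upper - lower) / (2 * h)

beta = 1.0  # common difficulty for all three hypothetical items
for alpha in (0.5, 1.0, 3.0):
    p = two_pl_prob(beta, alpha, beta)
    print(f"alpha={alpha}: P at theta=beta is {p}, slope is about {slope_at(beta, alpha, beta):.4f}")
```

A larger $\alpha_j$ therefore separates subjects just below and just above the difficulty more sharply.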
8
[Figure. 2-PARAMETER LOGISTIC: 3 ITEMS ($\beta$ = 1). Probability of Correct Response (0.0 to 1.0) plotted against Theta (-4 to 4), one curve each for $\alpha$ = 3, $\alpha$ = 0.5, $\alpha$ = 1.]
9
[Figure. 2-PARAMETER LOGISTIC: 3 ITEMS & DIFFERENT $\beta$'s. Probability of Correct Response (0.0 to 1.0) plotted against Theta (-4 to 4). Legend: Item 1: alpha=1, beta=1; Item 2: alpha=3, beta=0; Item 3: alpha=.5, beta=1. Curve labels: $\alpha$ = 3, $\beta$ = 0; $\alpha$ = 0.5, $\beta$ = 1; $\alpha$ = 1, $\beta$ = -1.]
10
EXAMPLE 1:
HOSPITAL QUALITY FOR HEART FAILURE
(3376 US Hospitals; 2005)
Measure                     Median # Eligible Patients    Median % Compliant
                            [10th; 90th]                  [10th; 90th]
LVF assessment              200 [51; 580]                 89 [64; 98]
ACE or ARB for LVSD         34 [5; 120]                   83 [60; 100]
Smoking cessation advice    25 [1; 98]                    79 [40; 100]
Discharge instructions      135 [0; 469]                  55 [15; 87]
Teixeira-Pinto and Normand – Statistics in Medicine (2008)
LVF = left ventricular function; LVSD = left ventricular systolic dysfunction
11
EXAMPLE 1: Hospital Performance
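• $Y_{sj}$ = no. of eligible cases in sth hospital getting treatment j.
$\beta_{0j}$ = “difficulty” of the jth process measure.
$\beta_{1j}$ = “discriminating” ability of the jth process measure.
$\theta_s$ = underlying quality of care for sth hospital.
$Y_{sj} \mid n_{sj}, \theta_s \sim \mathrm{Bin}(n_{sj}, p_{sj})$ where $\mathrm{logit}(p_{sj}) = \beta_{1j}(\theta_s - \beta_{0j})$ and $\theta_s \overset{iid}{\sim} N(0,1)$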
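The binomial model for one hospital can be sketched as a log-likelihood in the latent quality. This is an illustration only: it assumes $n_{sj}$ is the number of eligible cases, treats the item parameters as known, ignores the N(0,1) distribution on the latent quality, and uses a crude grid search. All counts and parameter values are hypothetical and are not estimates from the cited paper.

```python
import math

def compliance_prob(theta, beta0, beta1):
    # logit(p) = beta1 * (theta - beta0)
    return 1.0 / (1.0 + math.exp(-beta1 * (theta - beta0)))

def binom_loglik(y, n, p):
    # Log of the Binomial(n, p) probability of observing y successes.
    return math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log(1.0 - p)

def hospital_loglik(theta, counts, params):
    """counts: (y, n) per measure; params: (beta0, beta1) per measure."""
    return sum(
        binom_loglik(y, n, compliance_prob(theta, b0, b1))
        for (y, n), (b0, b1) in zip(counts, params)
    )

# Hypothetical hospital with two process measures.
counts = [(178, 200), (28, 34)]
params = [(-2.0, 1.0), (-1.5, 0.8)]  # hypothetical (beta0, beta1) values
grid = [i / 10 for i in range(-30, 31)]
best = max(grid, key=lambda t: hospital_loglik(t, counts, params))
print("grid value of theta with the highest likelihood:", best)
```

Measures with more eligible cases contribute more terms of information, so they pull the estimate of the hospital's quality more strongly than sparsely observed measures.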
12
LS = Latent Score
Correlation    Dx Inst.    LVF    ACEI/ARB    Smoke
Dx Inst.       1
LVF            .39         1
ACEI/ARB       .28         .34    1
Smoke          .55         .38    .21         1
14
EXAMPLE 2: BASIS-32
Background. BASIS-32, an instrument to assess subjective distress, was originally developed using classical test theory based on a sample of psychiatric inpatients from one hospital.
Data. Self-reports of symptom and problem difficulty obtained from 2,656 psychiatric inpatients discharged from 13 US hospitals between May 2001 and April 2002. (BASIS-32 = Behavior and Symptom Identification Scale)
Normand, Belanger, Eisen – Health Services Outcomes Research Methodology (2006)
15
Provide the answer that best describes the degree of difficulty you have been experiencing in each area during the PAST WEEK.
• Managing day-to-day life
• Being able to feel close to others
• Household responsibilities
• Depression, hopelessness
• Leisure time or recreational activities
• Controlling temper, outbursts of anger, violence
• Adjusting to major life stresses
• Drinking alcoholic beverages
• Relationships with family members
• Developing independence
• Getting along with people outside family
• Lack of self-confidence, feeling bad about yourself
• Isolation or feelings of loneliness
• Manic, bizarre behavior
Response Options: 0 = No difficulty; 1 = A little difficulty; 2 = Moderate difficulty; 3 = Quite a bit of difficulty; 4 = Extreme difficulty.
16
GRADED RESPONSE MODEL (IRT MODEL)
When response options are ordinal categorical, e.g., Ysj = 0, 1, 2, 3, or 4 where
0 = No difficulty; 1 = A little difficulty; 2 = Moderate difficulty; 3 = Quite a bit of difficulty; 4 = Extreme difficulty
Need to model probability of responding in each category.
17
GRADED RESPONSE MODEL
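Probability subject s responds in threshold category k or higher:
$P(Y_{sj} \ge k \mid \theta_s) = P^*_{jk}(\theta_s) = \dfrac{\exp[\alpha_j(\theta_s - \beta_{jk})]}{1 + \exp[\alpha_j(\theta_s - \beta_{jk})]}$
$\theta_s$ = latent trait (e.g., subjective distress)
$\alpha_j$ = discrimination of jth item ($\alpha_j > 0$)
$\beta_{j4} \ge \beta_{j3} \ge \beta_{j2} \ge \beta_{j1}$ = threshold parameters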
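The probability of responding in exactly one category follows by differencing adjacent cumulative probabilities, with the conventions that responding in category 0 or higher has probability 1 and responding above the top category has probability 0. A minimal sketch, with hypothetical function names and hypothetical parameter values:

```python
import math

def cumulative_prob(theta, alpha, beta_k):
    """P(Y >= k | theta) for the threshold beta_k."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta_k)))

def category_probs(theta, alpha, thresholds):
    """thresholds: increasing list [beta_1, ..., beta_K]; returns P(Y = 0), ..., P(Y = K)."""
    # Pad with P(Y >= 0) = 1 and P(Y >= K + 1) = 0.
    cum = [1.0] + [cumulative_prob(theta, alpha, b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical item with five response options (0 to 4).
probs = category_probs(theta=0.0, alpha=2.0, thresholds=[-1.0, -0.3, 0.4, 1.2])
for k, p in enumerate(probs):
    print(f"P(Y = {k}) = {p:.3f}")
```

The ordering of the thresholds is what keeps every difference non-negative, so the five probabilities form a proper distribution.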
18
[Figure. CUMULATIVE PROBABILITIES. Two panels, probability (.0 to 1.0) plotted against $\theta$ (-4.5 to 4.5): one for $\alpha$ = 6.00 and one for $\alpha$ = 0.90, with curves labeled $P_1^*(\theta)$ and $P_4^*(\theta)$.]
19
[Figure. CUMULATIVE PROBABILITIES. Two panels, probability (.0 to 1.0) plotted against $\theta$ (-4.5 to 4.5): one with thresholds ($\beta_{i1}$, $\beta_{i2}$, $\beta_{i3}$, $\beta_{i4}$) = (-2.45, -1.87, -1.10, -0.63) and one with (-2.45, -2.40, -2.35, -2.30).]
20
BASIS-10 IRT Parameter Estimates
Item                             $\alpha_j$   $\beta_{j1}$   $\beta_{j2}$   $\beta_{j3}$   $\beta_{j4}$   $I_i(\theta)$
Managing Life                    2.87    -1.03    -0.36     0.25    1.06    2.08
Responsibilities at Home         2.51    -0.80    -0.22     0.37    1.13    1.63
Responsibilities Outside Home    2.44    -0.79    -0.25     0.30    1.06    1.53
Coping with Problems             3.13    -1.33    -0.68    -0.13    0.66    2.33
Concentrating                    3.15    -0.96    -0.33     0.16    0.97    2.41
Thinking Clearly                 3.10    -0.89    -0.28     0.27    1.02    2.36
Sad or Depressed                 1.99    -1.58    -0.66    -0.12    0.84    1.08
Ending Your Life                 1.40    -0.16     0.75     1.30    2.22    0.51
Feel Nervous                     1.97    -1.16    -0.27     0.26    1.13    1.08
Feel Afraid                      1.92    -0.87    -0.03     0.52    1.34    1.01
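To get a feel for the estimates, the sketch below computes the expected item score at a given trait level for two rows of the table. It assumes the graded response model with a logistic link and uses the identity that the expected score equals the sum of the cumulative probabilities P(Y ≥ k) over k = 1, ..., 4. The names `ITEMS` and `expected_score` are hypothetical.

```python
import math

# (alpha_j, [beta_j1, beta_j2, beta_j3, beta_j4]) copied from two rows of the table.
ITEMS = {
    "Sad or Depressed": (1.99, [-1.58, -0.66, -0.12, 0.84]),
    "Ending Your Life": (1.40, [-0.16, 0.75, 1.30, 2.22]),
}

def expected_score(theta, alpha, thresholds):
    # E[Y | theta] = sum over k >= 1 of P(Y >= k | theta).
    return sum(1.0 / (1.0 + math.exp(-alpha * (theta - b))) for b in thresholds)

for name, (alpha, thresholds) in ITEMS.items():
    print(name, round(expected_score(0.0, alpha, thresholds), 2))
```

At an average trait level of 0, the item with the lower thresholds ("Sad or Depressed") has the higher expected score, which matches the reading that higher thresholds mark response categories endorsed only at higher levels of distress.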
21
Concluding Remarks: (Kaplan & Normand 2006)
Pros
• Summarizes a large amount of information into a simpler measure
• Facilitates provider ranking
• Improves reliability of provider measure and thus reduces the number of individual quality measures that need to be collected
• Fairer to providers: different ways to get good composite scores
• Reduces the time frame over which quality is assessed by effectively increasing the sample size
• Not new or unique to health care: intelligence, aptitude, mental illness, and personality used for over a century; economics/business; education (student and teacher performance); clinical trials
Cons
• May be difficult to interpret: what do the units mean?
• Difficult to validate
• Does not necessarily guide quality improvement; the individual quality measures are needed
• Quality information may be wasted or hidden in the composite measure
• The weighting scheme to create the composite score may not be transparent (scoring)