

STATISTICS IN MEDICINE
Statist. Med. 2005; 24:2953–2962
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/sim.2159

Estimation of attributable risk for case-control studies with multiple matching

Kung-Jong Lui∗,†

Department of Mathematics and Statistics, San Diego State University, San Diego, CA 92182-7720, U.S.A.

SUMMARY

Kuritz and Landis considered case-control studies with multiple matching and proposed an asymptotic interval estimator of the attributable risk based on Wald's statistic. Using Monte Carlo simulation, Kuritz and Landis showed that their interval estimator could perform well when the number of matched sets was large (≥ 100). In practice, however, the number of matched sets is often moderate or small. In this paper, we evaluate the performance of Kuritz and Landis' interval estimator when the number of matched sets is small or moderate, and we compare it with four other interval estimators. The coverage probability of Kuritz and Landis' interval estimator tends to fall below the desired confidence level when the probability of exposure among cases is large. In these cases, the interval estimator using the logarithmic transformation and the two interval estimators derived from the quadratic equations developed here generally improve the coverage probability of Kuritz and Landis' interval estimator, especially when the number of matched sets is small. Furthermore, we find that an interval estimator derived from a quadratic equation is consistently more efficient than Kuritz and Landis' interval estimator. The interval estimator using the logit transformation performs poorly when the underlying odds ratio (OR) is close to 1, but it can be useful when both the probability of exposure among cases and the underlying OR are moderate or large. Copyright © 2005 John Wiley & Sons, Ltd.

KEY WORDS: multiple matching; attributable risk; interval estimation; case-control studies; coverage probability; efficient

1. INTRODUCTION

Kuritz and Landis [1] proposed an asymptotic interval estimator of the attributable risk (AR) [2] using Wald's statistic for case-control studies with multiple matching. They also used Monte Carlo simulation to show that their interval estimator could perform well when the number of matched sets was large (≥ 100). In practice, however, the number of matched sets is often moderate or small. For example, the matched case-control study of endometrial cancer reported by Mack et al. [3] had only 59 matched sets with information on exposure to conjugated estrogens. The point estimator of the AR depends on a ratio of two random variables, so its sampling distribution may be skewed. As a result, the interval estimator using Wald's statistic is unlikely to perform well when the number of matched sets is small or even moderate. This motivates us to look for simple alternative interval estimators that may improve on Kuritz and Landis' interval estimator.

∗Correspondence to: Kung-Jong Lui, Department of Mathematics and Statistics, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182-7720, U.S.A.
†E-mail: [email protected]
Received 18 August 2004. Accepted 15 November 2004. Copyright © 2005 John Wiley & Sons, Ltd.

In this paper, we consider five interval estimators:
- the interval estimator using Wald's statistic proposed by Kuritz and Landis [1];
- the interval estimator using the logarithmic transformation [4, 5];
- two interval estimators derived from two simple quadratic equations developed here;
- the interval estimator using the logit transformation [6].

Using Monte Carlo simulation, we evaluate and compare the finite-sample performance of these estimators in a variety of situations. We identify situations in which Kuritz and Landis' interval estimator is of limited use, and we give a general guideline for choosing among the interval estimators in different situations.

2. NOTATIONS AND INTERVAL ESTIMATORS

Consider estimation of the AR, defined as the proportion of cases that would be prevented by eliminating a risk factor from the population. Let $E_1$ and $E_0$ denote exposure and non-exposure to the risk factor, respectively. Similarly, let $D_1$ and $D_0$ denote the diseased and non-diseased populations. Let $P(D_1|E_i)$ denote the probability of developing the disease in the exposed ($i = 1$) and non-exposed ($i = 0$) populations. The AR is then given by [7]

$$AR = P(E_1|D_1)\,\frac{RR - 1}{RR} \qquad (1)$$

where $P(E_1|D_1)$ denotes the population proportion of cases exposed to the risk factor under investigation, and $RR = P(D_1|E_1)/P(D_1|E_0)$ denotes the relative risk (RR) of disease. When the underlying disease is rare, the RR and the OR are approximately equal, so we can approximate AR (1) by

$$AR = P(E_1|D_1)\,\frac{OR - 1}{OR} = P(E_1|D_1)\,\phi_E \qquad (2)$$

where $\phi_E = (OR - 1)/OR$. In what follows, we assume that the underlying disease is rare enough that the difference between AR (1) and AR (2) is negligible.

Consider a case-control study in which, for each randomly selected case $l$ ($l = 1, 2, \ldots, n$) from the case population, we match a random number $k$ ($k = 1, 2, \ldots, K$) of controls on certain nuisance confounders, forming $n$ matched sets. Let $n_{ijk}$ denote the observed number of matched sets with $i$ ($i = 1, 0$) exposed case and $j$ ($j = 0, 1, \ldots, k$) exposed controls out of $k$ randomly matched controls, where $\sum_i \sum_j \sum_k n_{ijk} = n$. Kuritz and Landis [1] noted that the frequencies $n_{ijk}$ follow the multinomial distribution with parameters $n$ and the vector of cell probabilities $p_{ijk}$. The maximum likelihood estimator (MLE) of $p_{ijk}$ is therefore $\hat{p}_{ijk} = n_{ijk}/n$. By the functional invariance property of the MLE, the MLE of $P(E_1|D_1)$ ($= p_{1\cdot\cdot} = \sum_k \sum_j p_{1jk}$) is $\hat{p}_{1\cdot\cdot} = \sum_k \sum_j \hat{p}_{1jk}$. This is simply the sample proportion of cases who are exposed to the risk factor. To estimate the OR, we commonly use the Mantel–Haenszel (MH) estimator [1, 8],

$$\widehat{OR}_{MH} = \frac{\sum_k \sum_j w_{rjk}\,\hat{p}_{1jk}}{\sum_k \sum_j w_{sjk}\,\hat{p}_{0jk}} \qquad (3)$$

where $w_{rjk} = (k - j)/(k + 1)$ and $w_{sjk} = j/(k + 1)$. These give the point estimate [1]

$$\widehat{AR} = \hat{p}_{1\cdot\cdot}\,\hat{\phi}_E \qquad (4)$$

where $\hat{\phi}_E = (\widehat{OR}_{MH} - 1)/\widehat{OR}_{MH}$.

Using the delta method, we can derive the asymptotic variance of $\widehat{AR}$ as [1]

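$$\mathrm{Var}(\widehat{AR}) = \Big\{\phi_E^2\,p_{1\cdot\cdot} + 2\phi_E(1 - \phi_E)\,p_{1\cdot\cdot} + \frac{p_{1\cdot\cdot}^2}{R^2}\Big[(1 - \phi_E)^2 \sum_k \sum_j w_{rjk}^2\,p_{1jk} + \sum_k \sum_j w_{sjk}^2\,p_{0jk}\Big] - AR^2\Big\}\Big/n \qquad (5)$$

where $R = \sum_k \sum_j w_{rjk}\,p_{1jk}$. To obtain the estimated asymptotic variance $\widehat{\mathrm{Var}}(\widehat{AR})$, we substitute the following into (5):
- $\hat{p}_{1\cdot\cdot}$ for $p_{1\cdot\cdot}$;
- $\hat{\phi}_E$ for $\phi_E$;
- $\hat{p}_{ijk}$ for $p_{ijk}$ (and hence $\hat{R}$ for $R$);
- $\widehat{AR}$ for AR.

On the basis of (4) and $\widehat{\mathrm{Var}}(\widehat{AR})$, Kuritz and Landis [1] proposed the asymptotic $100(1 - \alpha)$ per cent confidence interval for the AR

$$\Big[\widehat{AR} - Z_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\widehat{AR})},\ \min\Big\{\widehat{AR} + Z_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\widehat{AR})},\ 1\Big\}\Big] \qquad (6)$$

where $Z_\alpha$ is the upper $100\alpha$th percentile of the standard normal distribution, and $\min\{a, b\}$ denotes the smaller of $a$ and $b$.

When the number of matched sets $n$ is moderate or small, the sampling distribution of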

$\widehat{AR}$ (4) can be skewed. To improve the normal approximation, we may use the logarithmic transformation [4]. Define $\hat{\gamma} = 1 - \widehat{AR}$. Except when $\widehat{OR}_{MH}$ is 0 or $\infty$, the estimate $\widehat{AR}$ is finite and less than 1, and hence $\hat{\gamma} > 0$. Thus, when $n$ is moderate or large ($\geq 50$), $\log(\hat{\gamma})$ is likely to be defined in most situations encountered in practice, unless the probability of exposure $p_{1\cdot\cdot}$ is quite small or the underlying OR is very large. Furthermore, because $\widehat{\mathrm{Var}}(\log(\hat{\gamma})) = \widehat{\mathrm{Var}}(\widehat{AR})/\hat{\gamma}^2$, we obtain the asymptotic $100(1 - \alpha)$ per cent confidence interval for the AR

$$\Big[1 - \hat{\gamma}\exp\Big(Z_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\log(\hat{\gamma}))}\Big),\ 1 - \hat{\gamma}\exp\Big(-Z_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\log(\hat{\gamma}))}\Big)\Big] \qquad (7)$$
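Estimators (3) through (7) can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the data are held in a dictionary that maps each cell $(i, j, k)$ to its count $n_{ijk}$. The function names (`ar_estimates`, `wald_ci`, `log_ci`) and the example counts are hypothetical; they are not part of the paper or of any library.

```python
import math
from statistics import NormalDist


def _w_r(j, k):
    """MH weight w_rjk = (k - j)/(k + 1), used for sets with an exposed case."""
    return (k - j) / (k + 1)


def _w_s(j, k):
    """MH weight w_sjk = j/(k + 1), used for sets with an unexposed case."""
    return j / (k + 1)


def ar_estimates(counts):
    """Estimates (3)-(5) from counts[(i, j, k)] = n_ijk (hypothetical input format)."""
    n = sum(counts.values())
    p = {cell: c / n for cell, c in counts.items()}           # MLE p_ijk = n_ijk / n
    p1 = sum(v for (i, _, _), v in p.items() if i == 1)      # p_1.. (exposed cases)
    r = sum(_w_r(j, k) * v for (i, j, k), v in p.items() if i == 1)
    s = sum(_w_s(j, k) * v for (i, j, k), v in p.items() if i == 0)
    if r == 0 or s == 0:
        raise ValueError("MH estimate is 0 or infinite; the intervals do not apply")
    or_mh = r / s                                             # MH estimate (3)
    phi = (or_mh - 1) / or_mh                                 # estimate of phi_E
    ar = p1 * phi                                             # point estimate (4)
    sum_r2 = sum(_w_r(j, k) ** 2 * v for (i, j, k), v in p.items() if i == 1)
    sum_s2 = sum(_w_s(j, k) ** 2 * v for (i, j, k), v in p.items() if i == 0)
    braces = (phi ** 2 * p1 + 2 * phi * (1 - phi) * p1
              + p1 ** 2 / r ** 2 * ((1 - phi) ** 2 * sum_r2 + sum_s2))
    var = (braces - ar ** 2) / n                              # (5) with estimates plugged in
    return {"n": n, "p1": p1, "or_mh": or_mh, "phi": phi, "ar": ar, "var": var}


def wald_ci(ar, var, alpha=0.05):
    """Kuritz-Landis interval (6); the upper limit is truncated at 1."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(var)
    return ar - half, min(ar + half, 1.0)


def log_ci(ar, var, alpha=0.05):
    """Interval (7), built on the scale of log(gamma) with gamma = 1 - AR."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    gamma = 1 - ar
    se_log = math.sqrt(var) / gamma                           # sqrt(Var(AR)) / gamma
    return 1 - gamma * math.exp(z * se_log), 1 - gamma * math.exp(-z * se_log)


# Hypothetical counts n_ijk for 40 matched sets with k = 2 or 3 controls each.
counts = {(1, 0, 2): 10, (1, 1, 2): 5, (1, 2, 2): 3, (0, 0, 2): 6,
          (0, 1, 2): 4, (1, 0, 3): 8, (0, 1, 3): 2, (0, 0, 3): 2}
est = ar_estimates(counts)
print(est["or_mh"], est["ar"])        # 86/11 (about 7.82) and about 0.567
print(wald_ci(est["ar"], est["var"]))
print(log_ci(est["ar"], est["var"]))
```

Returning the intermediate quantities as a dictionary keeps $\widehat{\mathrm{Var}}(\widehat{AR})$ available for the other interval estimators, all of which reuse it.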

When the number of matched sets $n$ is large, we have

$$P\Big((\widehat{AR} - AR)^2/\mathrm{Var}(\widehat{AR}) \leq Z_{\alpha/2}^2\Big) \approx 1 - \alpha.$$
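The step from this probability statement to the quadratic inequality (8) can be sketched as follows. Write $V$ for the braced term of (5) without the final $-AR^2$, so that $\mathrm{Var}(\widehat{AR}) = (V - AR^2)/n$. The symbols $V$ and $\hat{V}$ are introduced only for this sketch. Keep AR unestimated in this variance and replace $V$ by its estimate $\hat{V}$. The event inside the probability then becomes

$$(\widehat{AR} - AR)^2 \leq \frac{Z_{\alpha/2}^2}{n}\big(\hat{V} - AR^2\big) \iff \Big(1 + \frac{Z_{\alpha/2}^2}{n}\Big)AR^2 - 2\,\widehat{AR}\,AR + \Big(\widehat{AR}^2 - \frac{Z_{\alpha/2}^2}{n}\,\hat{V}\Big) \leq 0,$$

This has the form of (8), with $A = 1 + Z_{\alpha/2}^2/n$, $B = \widehat{AR}$, and $C = \widehat{AR}^2 - Z_{\alpha/2}^2\hat{V}/n$.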


Thus, we obtain the quadratic equation in AR:

$$A\,AR^2 - 2B\,AR + C \leq 0 \qquad (8)$$

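where $A = 1 + Z_{\alpha/2}^2/n$, $B = \widehat{AR}$, and

$$C = \widehat{AR}^2 - Z_{\alpha/2}^2\Big\{\hat{\phi}_E^2\,\hat{p}_{1\cdot\cdot} + 2\hat{\phi}_E(1 - \hat{\phi}_E)\,\hat{p}_{1\cdot\cdot} + \frac{\hat{p}_{1\cdot\cdot}^2}{\hat{R}^2}\Big[(1 - \hat{\phi}_E)^2 \sum_k \sum_j w_{rjk}^2\,\hat{p}_{1jk} + \sum_k \sum_j w_{sjk}^2\,\hat{p}_{0jk}\Big]\Big\}\Big/n$$

and $\hat{R} = \sum_k \sum_j w_{rjk}\,\hat{p}_{1jk}$. Because the quadratic coefficient $A$ in (8) is positive, the left-hand side of (8) is a convex function of AR, so the AR values satisfying (8) form an interval. The inequality $B^2 - AC > 0$ can be shown to hold whenever $\widehat{\mathrm{Var}}(\widehat{AR}) > 0$. Hence an asymptotic $100(1 - \alpha)$ per cent confidence interval for the AR is given by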

$$\Big[\frac{B - \sqrt{B^2 - AC}}{A},\ \min\Big\{\frac{B + \sqrt{B^2 - AC}}{A},\ 1\Big\}\Big] \qquad (9)$$
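Interval (9) depends on the data only through $\widehat{AR}$, $n$, and the braced term of (5). That term can be recovered from an already computed $\widehat{\mathrm{Var}}(\widehat{AR})$ as $n\widehat{\mathrm{Var}}(\widehat{AR}) + \widehat{AR}^2$. Below is a minimal Python sketch under that assumption; the function name `quadratic_ci` is hypothetical. The last lines use rounded values taken from the example in Section 5, so agreement with the reported limits is only approximate.

```python
import math
from statistics import NormalDist


def quadratic_ci(ar_hat, var_hat, n, alpha=0.05):
    """Interval (9): the AR values with A*AR^2 - 2*B*AR + C <= 0, as in (8).

    The braced term of (5), without -AR^2, is recovered as n*var_hat + ar_hat**2,
    so the unknown AR itself never has to be plugged into the variance.
    """
    z2 = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    braces = n * var_hat + ar_hat ** 2
    a = 1 + z2 / n
    b = ar_hat
    c = ar_hat ** 2 - z2 * braces / n
    root = math.sqrt(b * b - a * c)          # B^2 - AC > 0 whenever var_hat > 0
    return (b - root) / a, min((b + root) / a, 1.0)


# Rounded inputs from the Section 5 example: AR = 0.658, n = 59, and the variance
# implied by the reported Wald interval (6), [0.477, 0.839].
z = NormalDist().inv_cdf(0.975)
var_example = ((0.839 - 0.477) / 2 / z) ** 2
print(quadratic_ci(0.658, var_example, 59))  # approximately (0.438, 0.798)
```

Unlike (6), this interval is not centred at $\widehat{AR}$: its midpoint is $B/A = \widehat{AR}/(1 + Z_{\alpha/2}^2/n)$.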

When one control is matched to each case ($k = 1$ for all $l = 1, 2, \ldots, n$), interval estimator (9) reduces to the one proposed elsewhere [9] for case-control studies with one-to-one matching. When deriving interval estimator (9), we do not need to estimate the parameter AR in $\mathrm{Var}(\widehat{AR})$ (5). Compared with interval estimator (6), we may therefore gain efficiency by reducing the number of parameters estimated in the variance [9]. Following the same idea, if we replace the component $p_{1\cdot\cdot}\phi_E$ in the second term of (5) by AR, we obtain the following quadratic equation, which differs slightly from (8):

$$A^*AR^2 - 2B^*AR + C^* \leq 0 \qquad (10)$$

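where $A^* = 1 + Z_{\alpha/2}^2/n$, $B^* = \widehat{AR} + Z_{\alpha/2}^2(1 - \hat{\phi}_E)/n$, and

$$C^* = \widehat{AR}^2 - Z_{\alpha/2}^2\Big\{\hat{\phi}_E^2\,\hat{p}_{1\cdot\cdot} + \frac{\hat{p}_{1\cdot\cdot}^2}{\hat{R}^2}\Big[(1 - \hat{\phi}_E)^2 \sum_k \sum_j w_{rjk}^2\,\hat{p}_{1jk} + \sum_k \sum_j w_{sjk}^2\,\hat{p}_{0jk}\Big]\Big\}\Big/n$$

Again, we can show that the inequality $B^{*2} - A^*C^* > 0$ holds whenever $\widehat{\mathrm{Var}}(\widehat{AR}) > 0$. Thus, an asymptotic $100(1 - \alpha)$ per cent confidence interval for the AR is given by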

$$\Big[\frac{B^* - \sqrt{B^{*2} - A^*C^*}}{A^*},\ \min\Big\{\frac{B^* + \sqrt{B^{*2} - A^*C^*}}{A^*},\ 1\Big\}\Big] \qquad (11)$$
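Interval (11) can be computed the same way as (9), with two changes. The term $2\hat{\phi}_E(1 - \hat{\phi}_E)\hat{p}_{1\cdot\cdot}$ is removed from the braced term, and $B^*$ is shifted by $Z_{\alpha/2}^2(1 - \hat{\phi}_E)/n$. Below is a minimal Python sketch, again assuming the braced term of (5) is recovered as $n\widehat{\mathrm{Var}}(\widehat{AR}) + \widehat{AR}^2$; the name `quadratic_ci_star` is hypothetical. The example values are rounded figures from Section 5, so the output matches the reported interval only approximately.

```python
import math
from statistics import NormalDist


def quadratic_ci_star(ar_hat, var_hat, n, p1_hat, phi_hat, alpha=0.05):
    """Interval (11) from inequality (10).

    Starts from the braced term of (5), recovered as n*var_hat + ar_hat**2, and drops
    2*phi*(1 - phi)*p1, which (10) moves into B* by writing p1*phi as AR.
    """
    z2 = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    braces = n * var_hat + ar_hat ** 2
    braces_star = braces - 2 * phi_hat * (1 - phi_hat) * p1_hat
    a = 1 + z2 / n                                 # A*
    b = ar_hat + z2 * (1 - phi_hat) / n            # B*
    c = ar_hat ** 2 - z2 * braces_star / n         # C*
    root = math.sqrt(b * b - a * c)                # B*^2 - A*C* > 0 when var_hat > 0
    return (b - root) / a, min((b + root) / a, 1.0)


# Rounded values from the Section 5 example: p1 = 0.797, OR_MH = 5.75, AR = 0.658,
# n = 59, and the variance implied by the Wald interval [0.477, 0.839].
z = NormalDist().inv_cdf(0.975)
var_example = ((0.839 - 0.477) / 2 / z) ** 2
phi_example = (5.75 - 1) / 5.75
print(quadratic_ci_star(0.658, var_example, 59, 0.797, phi_example))  # about (0.451, 0.806)
```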

To improve the normal approximation, Leung and Kupper [6] suggested the logit transformation $\log(\widehat{AR}/(1 - \widehat{AR}))$. By the delta method, $\widehat{\mathrm{Var}}(\log(\widehat{AR}/(1 - \widehat{AR}))) = \widehat{\mathrm{Var}}(\widehat{AR})/(\widehat{AR}(1 - \widehat{AR}))^2$. This leads to the following asymptotic $100(1 - \alpha)$ per cent confidence interval for the AR:

$$\Big[\Big\{1 + \frac{1 - \widehat{AR}}{\widehat{AR}}\exp\Big(Z_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\log(\widehat{AR}/(1 - \widehat{AR})))}\Big)\Big\}^{-1},\ \Big\{1 + \frac{1 - \widehat{AR}}{\widehat{AR}}\exp\Big(-Z_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\log(\widehat{AR}/(1 - \widehat{AR})))}\Big)\Big\}^{-1}\Big] \qquad (12)$$
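Interval (12) is the inverse-logit image of a symmetric interval on the logit scale. Below is a minimal Python sketch; the function name `logit_ci` is hypothetical, and the example inputs are rounded values from Section 5. The estimate must lie strictly between 0 and 1, which is why (12) cannot be used when $\widehat{AR} \leq 0$.

```python
import math
from statistics import NormalDist


def logit_ci(ar_hat, var_hat, alpha=0.05):
    """Interval (12), built on the logit scale log(AR / (1 - AR))."""
    if not 0 < ar_hat < 1:
        # A non-positive estimate makes (12) inapplicable.
        raise ValueError("interval (12) needs 0 < AR estimate < 1")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se_logit = math.sqrt(var_hat) / (ar_hat * (1 - ar_hat))  # delta method
    ratio = (1 - ar_hat) / ar_hat
    return (1 / (1 + ratio * math.exp(z * se_logit)),
            1 / (1 + ratio * math.exp(-z * se_logit)))


z = NormalDist().inv_cdf(0.975)
var_example = ((0.839 - 0.477) / 2 / z) ** 2    # implied by the Wald interval in Section 5
print(logit_ci(0.658, var_example))              # approximately (0.463, 0.811)
```

Both limits are always strictly positive. If the true AR is 0 (OR = 1), no interval of this form can cover it.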

3. MONTE CARLO SIMULATION

We use Monte Carlo simulation to evaluate and compare the finite-sample performance of interval estimators (6), (7), (9), (11), and (12). To allow the exposure probability of cases to vary across matched sets, we assume that the exposure probability $P_l(E_1|D_1)$ ($l = 1, 2, \ldots, n$) follows the beta distribution with mean $\mu$ ($= p_{1\cdot\cdot}$) and variance $\mu(1 - \mu)/(T + 1)$. For fixed $\mu$, the larger the value of $T$, the smaller the variation of $P_l(E_1|D_1)$ between cases, so $T$ can be regarded as a measure of that variation. When $\mu = 1/2$ and $T = 2$, the beta distribution reduces to the uniform distribution over (0, 1).

Given OR and $P_l(E_1|D_1)$, the probability of exposure for matched controls in the $l$th matched set is uniquely determined:
$$P_l(E_1|D_0) = \frac{P_l(E_1|D_1)}{P_l(E_1|D_1) + OR\,(1 - P_l(E_1|D_1))}.$$
To allow the number $k$ of matched controls to vary, we assume that $k$ follows the probability mass function $P(K = k) = 1/3$ for $k = 2, 3, 4$.

We consider the following situations:
- probability of exposure in the case population: $p_{1\cdot\cdot}$ ($= \mu$) $= 0.20, 0.50, 0.80$;
- OR of exposure between cases and controls: $1, 2, 4, 8$;
- number of matched sets: $n = 30, 50, 100$;
- variation of the exposure probability in the case population: $T = 2, 9$.

For each configuration, we generate 10 000 samples of $n$ matched sets, each consisting of one case and a random number $k$ of matched controls. From these samples, we calculate the coverage probability and the average length of the 95 per cent confidence intervals given by interval estimators (6), (7), (9), (11), and (12).

An interval estimator may well have a larger coverage probability than another and yet be more efficient with respect to average length. When comparing interval estimators, we therefore need to consider the coverage probability and the average length simultaneously. An ideal interval estimator consistently has a coverage probability larger than or approximately equal to the desired confidence level, and an average length shorter than the others.

If either $\sum_k \sum_j w_{rjk}\hat{p}_{1jk}$ or $\sum_k \sum_j w_{sjk}\hat{p}_{0jk}$ is 0, the MH estimate $\widehat{OR}_{MH}$ (3) is 0 or $\infty$, and none of interval estimators (6), (7), (9), (11), and (12) applies. We therefore calculate the coverage probability and the average length over the samples for which $\sum_k \sum_j w_{rjk}\hat{p}_{1jk} > 0$ and $\sum_k \sum_j w_{sjk}\hat{p}_{0jk} > 0$. As long as $0 < \widehat{OR}_{MH} < \infty$, the confidence limits of (6), (7), (9), and (11) exist. When $\widehat{AR} \leq 0$, however, interval estimator (12) cannot be applied either. For completeness, we calculate the proportion of samples with $\widehat{OR}_{MH} = 0$ or $\infty$, and the proportion of samples for which interval estimator (12) cannot be applied.
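The data-generating model described above can be sketched in Python with the standard library only. The Beta parameters come from matching the stated mean $\mu$ and variance $\mu(1 - \mu)/(T + 1)$, since a Beta$(a, b)$ variable with mean $\mu$ has variance $\mu(1 - \mu)/(a + b + 1)$. Several details are our own assumptions for illustration, not specifications from the paper: exposures within a matched set are independent, and the function names and the seed are arbitrary.

```python
import random


def beta_params(mu, t):
    """Beta(a, b) with mean mu and variance mu*(1 - mu)/(t + 1): take a + b = t."""
    return mu * t, (1 - mu) * t


def control_exposure_prob(p_case, odds_ratio):
    """P_l(E1|D0) implied by P_l(E1|D1) and the OR."""
    return p_case / (p_case + odds_ratio * (1 - p_case))


def simulate_matched_sets(n, mu, odds_ratio, t, rng):
    """One simulated sample of n matched sets, returned as counts[(i, j, k)].

    Assumes that, given P_l(E1|D1) and P_l(E1|D0), the case and each of its
    k controls are exposed independently.
    """
    a, b = beta_params(mu, t)
    counts = {}
    for _ in range(n):
        p_case = rng.betavariate(a, b)                    # exposure probability of case l
        p_ctrl = control_exposure_prob(p_case, odds_ratio)
        k = rng.choice((2, 3, 4))                         # P(K = k) = 1/3
        i = int(rng.random() < p_case)                    # 1 if the case is exposed
        j = sum(rng.random() < p_ctrl for _ in range(k))  # number of exposed controls
        counts[(i, j, k)] = counts.get((i, j, k), 0) + 1
    return counts


rng = random.Random(2005)
sample = simulate_matched_sets(50, 0.8, 4.0, 2, rng)     # p1.. = 0.8, OR = 4, n = 50, T = 2
print(sum(sample.values()))                              # 50
```

Repeating this for 10 000 samples per configuration gives the raw material for the coverage and length estimates. The samples with $\widehat{OR}_{MH}$ equal to 0 or $\infty$ would still have to be set aside, as described above.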


4. RESULTS

Table I summarizes two probabilities: that of obtaining an MH estimate $\widehat{OR}_{MH}$ (3) of 0 or $\infty$, and that of failing to apply interval estimator (12). The situations covered are:
- probability of exposure in the case population: $p_{1\cdot\cdot} = 0.20, 0.50, 0.80$;
- OR of exposure between cases and controls: $1, 2, 4, 8$;
- number of matched sets: $n = 30, 50, 100$;
- number of matched controls $k$: probability mass function $P(K = k) = 1/3$ for $k = 2, 3, 4$;
- variation of the exposure probability in the case population: $T = 2$.

Except in the extreme cases where OR is large (= 8) and $n$ is small (= 30), the probability of obtaining $\widehat{OR}_{MH}$ equal to 0 or $\infty$ is small or even negligible. However, the probability of failing to apply interval estimator (12) can be substantial when the underlying OR equals 1, even when $n$ is large (Table I).

Table II summarizes the coverage probability and the average length of the 95 per cent confidence intervals using (6), (7), (9), (11), and (12), for the same configurations as in Table I.

First, interval estimator (7) generally performs well with respect to coverage probability (i.e. the estimated coverage probability is ≥ 94 per cent) in the situations considered here.

Second, when $p_{1\cdot\cdot} = 0.8$, the coverage probability of Kuritz and Landis' interval estimator (6) using Wald's statistic is often below the desired 95 per cent level by more than 2 per cent (Table II). Interval estimator (9), derived from the quadratic equation (8), generally improves on (6) with respect to both coverage probability and average length. For example, when $p_{1\cdot\cdot} = 0.8$, OR = 2, and $n = 30$, the 95 per cent confidence interval using (6) has coverage probability 91 per cent and estimated average length 1.25. By contrast, the interval using (9) has coverage probability 95 per cent and estimated average length 1.18. In fact, Table II shows that (9) consistently has an estimated average length smaller than or equal to that of (6).

Third, interval estimator (12) using the logit transformation has coverage probability 0 when OR = 1, but it can be useful when the underlying $p_{1\cdot\cdot}$ and OR are both large.

The findings for $T = 9$ are essentially similar to those presented in Tables I and II, so for brevity we do not present them.

Table I. The estimated probability of obtaining the Mantel–Haenszel estimate $\widehat{OR}_{MH}$ equal to 0 or $\infty$, and the estimated probability of failing to apply interval estimator (12), in the situations where the exposure probability in the case population $p_{1\cdot\cdot} = 0.20, 0.50, 0.80$; the odds ratio OR $= 1, 2, 4, 8$; the number of matched sets $n = 30, 50, 100$; and the random number $K$ of matched controls has probability $P(K = k) = 1/3$, where $k = 2, 3, 4$. Each entry is calculated on the basis of 10 000 repeated samples.

            MH estimate = 0 or ∞         Interval estimator (12)
p1..   OR   n = 30   n = 50   n = 100    n = 30   n = 50   n = 100
0.20   1    0.007    0.000    0.000      0.517    0.503    0.504
       2    0.010    0.000    0.000      0.178    0.101    0.034
       4    0.052    0.009    0.000      0.090    0.020    0.000
       8    0.174    0.054    0.003      0.180    0.055    0.003
0.50   1    0.000    0.000    0.000      0.497    0.497    0.500
       2    0.001    0.000    0.000      0.096    0.045    0.006
       4    0.006    0.000    0.000      0.010    0.001    0.000
       8    0.037    0.005    0.000      0.037    0.005    0.000
0.80   1    0.006    0.000    0.000      0.491    0.497    0.505
       2    0.013    0.001    0.000      0.137    0.069    0.021
       4    0.035    0.004    0.000      0.044    0.007    0.000
       8    0.088    0.018    0.000      0.088    0.018    0.000

Table II. The estimated coverage probability and average length (in parentheses) of the 95 per cent confidence interval using (6), (7), (9), (11), and (12) in the situations where the exposure probability in the case population $p_{1\cdot\cdot} = 0.20, 0.50, 0.80$; the odds ratio OR $= 1, 2, 4, 8$; the number of matched sets $n = 30, 50, 100$; and the random number $K$ of matched controls has probability $P(K = k) = 1/3$, where $k = 2, 3, 4$. Each entry is calculated on the basis of 10 000 repeated samples.

p1..  OR  AR      n  (6)           (7)           (9)           (11)          (12)
0.2   1   0.00   30  0.95 (0.59)   0.96 (0.60)   0.97 (0.55)∗  0.94 (0.67)   0.00 (0.68)
                 50  0.96 (0.43)   0.97 (0.43)   0.97 (0.41)∗  0.94 (0.45)   0.00 (0.64)
                100  0.96 (0.29)   0.96 (0.29)   0.96 (0.28)∗  0.94 (0.29)   0.00 (0.58)
      2   0.10   30  0.96 (0.40)   0.96 (0.40)   0.97 (0.37)∗  0.99 (0.41)   0.92 (0.52)
                 50  0.95 (0.30)   0.95 (0.30)   0.96 (0.29)∗  0.97 (0.30)   0.92 (0.42)
                100  0.95 (0.21)∗  0.95 (0.21)∗  0.96 (0.21)∗  0.97 (0.21)∗  0.94 (0.28)
      4   0.15   30  0.94 (0.33)   0.94 (0.33)   0.94 (0.31)∗  0.99 (0.32)   0.96 (0.40)
                 50  0.94 (0.25)   0.94 (0.25)   0.94 (0.24)∗  0.99 (0.25)   0.95 (0.29)
                100  0.94 (0.18)∗  0.94 (0.18)∗  0.94 (0.18)∗  0.97 (0.18)∗  0.96 (0.19)
      8   0.18   30  0.93 (0.30)   0.93 (0.30)   0.92 (0.29)   0.98 (0.29)∗  0.97 (0.33)
                 50  0.93 (0.23)   0.93 (0.23)   0.93 (0.23)   0.96 (0.23)∗  0.97 (0.24)
                100  0.94 (0.17)   0.94 (0.17)   0.94 (0.16)∗  0.96 (0.16)∗  0.96 (0.17)
0.5   1   0.00   30  0.94 (1.12)   0.96 (1.19)   0.95 (1.06)∗  0.92 (1.10)   0.00 (0.81)
                 50  0.95 (0.83)   0.96 (0.85)   0.95 (0.80)∗  0.93 (0.82)   0.00 (0.77)
                100  0.95 (0.57)   0.95 (0.58)   0.96 (0.56)∗  0.95 (0.57)   0.00 (0.72)
      2   0.25   30  0.94 (0.68)   0.95 (0.70)   0.97 (0.64)∗  0.96 (0.65)   0.92 (0.63)
                 50  0.95 (0.52)   0.95 (0.53)   0.96 (0.50)∗  0.96 (0.50)∗  0.94 (0.53)
                100  0.95 (0.36)∗  0.95 (0.37)   0.95 (0.36)∗  0.96 (0.36)∗  0.94 (0.38)
      4   0.38   30  0.94 (0.50)   0.95 (0.51)   0.96 (0.48)∗  0.99 (0.48)∗  0.96 (0.48)∗
                 50  0.94 (0.39)   0.94 (0.39)   0.95 (0.38)   0.97 (0.37)∗  0.96 (0.37)∗
                100  0.95 (0.27)∗  0.95 (0.28)   0.95 (0.27)∗  0.96 (0.27)∗  0.96 (0.27)∗
      8   0.44   30  0.94 (0.42)   0.94 (0.43)   0.93 (0.41)   0.97 (0.41)   0.98 (0.40)∗
                 50  0.94 (0.33)   0.95 (0.33)   0.95 (0.32)∗  0.96 (0.32)∗  0.96 (0.32)∗
                100  0.94 (0.23)∗  0.95 (0.23)∗  0.94 (0.23)∗  0.95 (0.23)∗  0.95 (0.23)∗
0.8   1   0.00   30  0.90 (2.56)   0.96 (3.23)∗  0.91 (2.42)   0.90 (2.44)   0.00 (0.89)
                 50  0.92 (1.79)   0.95 (2.02)∗  0.93 (1.73)   0.92 (1.74)   0.00 (0.87)
                100  0.94 (1.18)   0.95 (1.25)   0.94 (1.16)∗  0.93 (1.16)   0.00 (0.83)
      2   0.40   30  0.91 (1.25)   0.95 (1.47)   0.95 (1.18)∗  0.94 (1.18)∗  0.93 (0.77)
                 50  0.92 (0.91)   0.95 (0.99)   0.94 (0.88)   0.94 (0.88)   0.94 (0.70)∗
                100  0.94 (0.63)   0.95 (0.66)   0.95 (0.62)   0.95 (0.62)   0.95 (0.57)∗
      4   0.60   30  0.92 (0.72)   0.97 (0.83)   0.98 (0.70)   0.98 (0.69)   0.97 (0.60)∗
                 50  0.92 (0.55)   0.95 (0.59)   0.96 (0.54)   0.96 (0.53)   0.96 (0.50)∗
                100  0.93 (0.38)   0.94 (0.40)   0.95 (0.38)   0.95 (0.38)   0.96 (0.37)∗
      8   0.70   30  0.95 (0.51)   0.96 (0.57)   0.98 (0.51)   0.99 (0.50)   1.00 (0.47)∗
                 50  0.93 (0.38)   0.95 (0.41)   0.97 (0.38)   0.98 (0.38)   0.97 (0.36)∗
                100  0.94 (0.27)   0.95 (0.28)   0.96 (0.27)   0.96 (0.27)   0.96 (0.26)∗

Boldface means that the estimated coverage probability is less than the desired 95 per cent confidence level by ≥ 2 per cent.
∗Indicates that the corresponding interval estimator has the shortest average length among the interval estimators considered here with an estimated coverage probability ≥ 94 per cent.
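The AR column of Table II appears to follow directly from approximation (2), with $P(E_1|D_1) = p_{1\cdot\cdot}$. The short Python check below recomputes it. It uses exact decimal arithmetic with round-half-up, so that ties such as 0.175 print as 0.18 as in the table; that rounding convention is our assumption. The helper name `approx_ar` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def approx_ar(p_case_exposed, odds_ratio):
    """Approximation (2): AR = P(E1|D1) * (OR - 1) / OR, in exact decimal arithmetic."""
    p = Decimal(str(p_case_exposed))
    orr = Decimal(str(odds_ratio))
    return p * (orr - 1) / orr


# AR column of Table II, keyed by (p1.., OR).
table_ar = {
    (0.2, 1): "0.00", (0.2, 2): "0.10", (0.2, 4): "0.15", (0.2, 8): "0.18",
    (0.5, 1): "0.00", (0.5, 2): "0.25", (0.5, 4): "0.38", (0.5, 8): "0.44",
    (0.8, 1): "0.00", (0.8, 2): "0.40", (0.8, 4): "0.60", (0.8, 8): "0.70",
}
for (p, orr), reported in table_ar.items():
    value = approx_ar(p, orr).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    print(p, orr, value, reported)
```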

5. AN EXAMPLE

To illustrate interval estimators (6), (7), (9), (11), and (12), we consider the case-control study of endometrial cancer with 59 matched sets that have information on exposure to conjugated estrogens [3, 10]. Because these data appear in many places [1, 3, 10, 11], we do not repeat them here. From these data, the MH estimate $\widehat{OR}_{MH}$ (3) is 5.75 and the point estimate $\widehat{AR}$ (4) is 65.8 per cent. Interval estimators (6), (7), (9), (11), and (12) give the 95 per cent confidence intervals [0.477, 0.839], [0.419, 0.799], [0.438, 0.798], [0.451, 0.806], and [0.463, 0.811], respectively. Interval estimate (6) is shifted to the right compared with the others, and interval estimate (7) is the longest of these interval estimates. Both the probability of exposure in the case population and the underlying OR are high ($\hat{p}_{1\cdot\cdot} = 0.797$ and $\widehat{OR}_{MH} = 5.75$). In this case, the coverage probability of (6) may therefore be slightly below the desired confidence level (Table II).
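The arithmetic behind these statements can be checked directly from the reported numbers. The sketch below uses only values stated in the text; the variable names are illustrative. It recomputes $\widehat{AR}$ from (4) and compares the lengths and midpoints of the five reported intervals.

```python
p1_hat, or_mh = 0.797, 5.75
ar_hat = p1_hat * (or_mh - 1) / or_mh          # point estimate (4)
print(round(ar_hat, 3))                         # 0.658

intervals = {
    "(6)": (0.477, 0.839), "(7)": (0.419, 0.799), "(9)": (0.438, 0.798),
    "(11)": (0.451, 0.806), "(12)": (0.463, 0.811),
}
lengths = {name: hi - lo for name, (lo, hi) in intervals.items()}
midpoints = {name: (lo + hi) / 2 for name, (lo, hi) in intervals.items()}
print(max(lengths, key=lengths.get))            # (7): the longest interval
print(max(midpoints, key=midpoints.get))        # (6): shifted furthest to the right
```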

6. DISCUSSION

As noted before, comparing interval estimators requires considering coverage probability and average length simultaneously; otherwise we may draw misleading inferences. For example, for $p_{1\cdot\cdot} = 0.20$ and OR = 1, the coverage probability of interval estimator (9) is above the desired 95 per cent level. The coverage probabilities of interval estimators (6), (7), and (11) are close to or slightly below that level (Table II). Judging only by coverage probability in these situations, we would wrongly conclude that interval estimator (9) is more conservative than the others. As Table II shows, however, interval estimator (9) is the most efficient of these four estimators in these situations with respect to average length. Similarly, when $p_{1\cdot\cdot} \geq 0.50$ and OR $\geq 4$ (i.e. AR is large), interval estimator (12) using the logit transformation tends to have larger coverage probability than both the desired 95 per cent level and the other estimators. Again, Table II shows that in these cases interval estimator (12) is actually the most efficient, not the most conservative, of all the interval estimators considered here.

The coverage probability of Kuritz and Landis' interval estimator (6) can be much less than the desired confidence level when the probability $p_{1\cdot\cdot}$ of exposure in the case group is large (say, 0.80). Interval estimator (7), which uses the logarithmic transformation, can improve the coverage probability of (6) but may lose efficiency relative to (6). Interval estimator (9), derived from a quadratic equation, generally improves both the coverage probability and the efficiency of (6). In fact, when neither the probability of exposure nor the underlying OR is large, (9) is often the most efficient interval estimator among those whose coverage probability is no more than 1 per cent below the desired 95 per cent level (Table II). This agrees with the results found elsewhere [9] for one-to-one matching.

The coverage probability of (12) is 0 when the underlying OR = 1 (Table II). If OR is 1, the underlying AR equals 0 and therefore always falls below the lower limit of (12), which is always positive. Furthermore, if the underlying OR is close to 1, there is a large chance of obtaining $\widehat{OR}_{MH}$ less than 1, and hence a negative estimate $\widehat{AR}$ (4). This explains why the probability of failing to apply (12) is large for OR = 1 even when the number $n$ of matched sets is large (Table I). On the other hand, if we have prior knowledge that both $p_{1\cdot\cdot}$ and OR are large, (12) can still be a useful interval estimator (Table II).

All interval estimators considered here are derived from large-sample theory. Even so, Table II shows that different interval estimators can perform reasonably well in certain situations with as few as 30 matched sets. For example, when $p_{1\cdot\cdot} \leq 0.50$ and OR $\leq 4$, interval estimators (6), (7), and (9) have coverage probabilities larger than or approximately equal to 95 per cent in all the situations in Table II. In these cases, (9) is probably the best of the three with respect to efficiency. When OR $\geq 4$, interval estimators (11) and (12) also perform well even with a small number (= 30) of matched sets. In these situations, (11) is preferable to (12) for a small $p_{1\cdot\cdot}$ (= 0.20), while (12) is preferable to (11) for a large $p_{1\cdot\cdot}$ (= 0.80). When $n$ is large ($\geq 100$), interval estimators (6), (7), (9), and (11) are essentially equivalent, and all are appropriate in the situations considered here.

Extending the approach [1] considered here to case-control studies with frequency matching (i.e. multiple cases and controls in each matched set) is neither obvious nor easy. However, the methods published elsewhere [11] can produce interval estimators of the AR for frequency matching. A detailed evaluation and comparison of the finite-sample performance of various methods for frequency-matched data is beyond the scope of this paper and is a topic for future research.

In summary, the coverage probability of Kuritz and Landis' interval estimator [1] tends to fall below the desired confidence level when the probability of exposure among cases is large. In these cases, the interval estimator using the logarithmic transformation and the interval estimators derived from the two quadratic equations generally improve the coverage probability of Kuritz and Landis' interval estimator. Furthermore, the interval estimator derived from one of the quadratic equations, (9), is more efficient than Kuritz and Landis' interval estimator in almost all the situations considered here. Finally, the interval estimator using the logit transformation performs poorly when the underlying odds ratio (OR) is close to 1, but it can be useful when both the probability of exposure and the underlying OR are moderate or large.

ACKNOWLEDGEMENTS

The author wishes to thank the referee for many valuable comments and helpful suggestions to improve the clarity of this paper.


REFERENCES

1. Kuritz SJ, Landis JR. Attributable risk estimation from matched case-control data. Biometrics 1988; 44:355–367.
2. Levin ML. The occurrence of lung cancer in man. Acta Unio Internationalis Contra Cancrum 1953; 9:531–541.
3. Mack TM, Pike MC, Henderson BE, Pfeffer RI, Gerkins VR, Arthur BS, Brown SE. Estrogens and endometrial cancer in a retirement community. New England Journal of Medicine 1976; 294:1262–1267.
4. Fleiss JL. Statistical Methods for Rates and Proportions (2nd edn). Wiley: New York, 1981.
5. Walter SD. The estimation and interpretation of attributable risk in health research. Biometrics 1976; 32:829–849.
6. Leung HM, Kupper LL. Comparisons of confidence intervals for attributable risk. Biometrics 1981; 37:293–302.
7. Miettinen OS. Proportion of disease caused or prevented by a given exposure, trait or intervention. American Journal of Epidemiology 1974; 99:325–332.
8. Fleiss JL. The Mantel–Haenszel estimator in case-control studies with varying number of controls matched to each case. American Journal of Epidemiology 1984; 120:943–952.
9. Lui K-J. Interval estimation of the attributable risk in case control studies with matched pairs. Journal of Epidemiology and Community Health 2001; 55:885–890.
10. Breslow NE, Day NE. Statistical Methods in Cancer Research, vol. 1. The Analysis of Case-Control Studies. IARC Scientific Publication No. 32. International Agency for Research on Cancer: Lyon, 1980.
11. Greenland S. Variance estimators for attributable fraction estimates consistent in both large strata and sparse data. Statistics in Medicine 1987; 6:701–708.
