
Biostatistics (2000), 1, 1, pp. 113–122
Printed in Great Britain

Small-sample bias and corrections for conditional maximum-likelihood odds-ratio estimators

SANDER GREENLAND

Department of Epidemiology, UCLA School of Public Health, Los Angeles, CA 90095-1772, USA

SUMMARY

A number of small-sample corrections have been proposed for the conditional maximum-likelihood estimator of the odds ratio for matched pairs with a dichotomous exposure. I here contrast the rationale and performance of several corrections, specifically those that generalize easily to multiple conditional logistic regression. These corrections or Bayesian analyses with informative priors may serve as diagnostics for small-sample problems. Points are illustrated with a small exact performance comparison and with an example from a study of electrical wiring and childhood leukemia. The former comparison suggests that small-sample bias may be more prevalent than commonly realized.

Keywords: Bias; Case-control studies; Conditional logistic regression; Cox model; Epidemiologic methods; Likelihood analysis; Logistic models; Matching; Odds ratio; Proportional hazards; Relative risk; Risk assessment.

1. INTRODUCTION

The conditional maximum-likelihood (CML) estimator of a common odds ratio for matched pairs was introduced by Kraus (1960) and has since become a mainstay of epidemiologic analysis (Breslow and Day, 1980; Clayton and Hills, 1993; Kelsey et al., 1996; Rothman and Greenland, 1998). Jewell (1984), however, described the severe small-sample bias that can arise in the estimator, and derived and compared some bias corrections. Since then other corrections and comparisons have appeared. The present note contrasts several corrections that have an obvious Bayesian rationale or a straightforward extension to conditional-logistic regression. A new estimator is introduced that is a minor adaptation of formulas for ordinary logistic regression. Estimators are illustrated in an exact performance comparison, and in a matched-pair study of power lines and childhood leukemia (Ebi et al., 1999). The former comparison suggests that bias may be a frequent problem in small or overmatched studies.

The CML odds-ratio estimators have positive probability of being infinite and so have infinite exact expectations, even though they are unbiased to first order. Following earlier literature (Jewell, 1984, 1986; Liu, 1989), for ease of writing I will use the term ‘bias’ to refer to bias of higher order. One can also view the bias problem as one in which estimates far above the true parameter value occur with unacceptably high probability.

2. APPROXIMATE BIAS CORRECTIONS

Several approximate corrections have been proposed and evaluated for matched odds-ratio estimates with a dichotomous exposure and for 2 × 2-table (unmatched) odds-ratio estimators (Jewell, 1984, 1986; Becker, 1989; Liu, 1989; Walter and Cook, 1991). These corrections are of two forms: those that correct for bias on the logarithmic scale, and those that correct for bias on the odds-ratio (arithmetic) scale.

© Oxford University Press (2000)

2.1. Logarithmic corrections

It is possible to adapt a well-known bias correction for unconditional ML estimators (Byth and McLachlan, 1978; Anderson and Richardson, 1979; Schaefer, 1983; Cordeiro and McCullagh, 1991) to matched-pair CML estimators. The contribution of a matched pair with case regressor vector $x_1$ and control regressor vector $x_0$ to the conditional logistic likelihood (Breslow and Day, 1980; Clayton and Hills, 1993) simplifies to $\mathrm{expit}(d'\beta)$, where $\beta$ is the vector of logistic coefficients, $d = x_1 - x_0$, and $\mathrm{expit}(\eta) = (1 + e^{-\eta})^{-1}$ is the logistic transform. The full conditional likelihood thus can be written in the form of a no-intercept unconditional logistic likelihood for binomial observations defined by $n(x_1, x_0)$ ‘successes’ out of $n(x_1, x_0) + n(x_0, x_1)$ trials, where $n(x_1, x_0)$ is the number of pairs with case regressor $x_1$ and control regressor $x_0$. The distribution of $n(x_1, x_0)$ is binomial given $n(x_1, x_0) + n(x_0, x_1)$; if $x_1 = x_0$, the distribution does not depend on $\beta$ and hence concordant pairs do not contribute to the likelihood.
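The simplification to $\mathrm{expit}(d'\beta)$ can be written out in one line. The following is a sketch of the standard argument, assuming a pair-specific intercept, written $\alpha$ here (a symbol that does not appear in the original), which cancels on conditioning:

```latex
\begin{align*}
\Pr(\text{the case has } x_1 \mid \text{one case in the pair})
  &= \frac{e^{\alpha + x_1'\beta}}{e^{\alpha + x_1'\beta} + e^{\alpha + x_0'\beta}}
   = \frac{1}{1 + e^{-(x_1 - x_0)'\beta}}
   = \mathrm{expit}(d'\beta), \\
\mathrm{expit}(-d'\beta) &= 1 - \mathrm{expit}(d'\beta),
  \qquad \mathrm{expit}(0) = \tfrac{1}{2}.
\end{align*}
```

The second line shows why a pair with the regressors reversed counts as a binomial ‘failure’, and why a concordant pair ($d = 0$) contributes the constant $\tfrac12$ whatever the value of $\beta$.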

Let $i$ index the pairs, let $D$ be the diagonal matrix of observed pair differences $d_i$, $p$ the vector of conditional probabilities $p_i = \mathrm{expit}(d_i'\beta)$ for the pairs, $W$ the diagonal matrix $\mathrm{diag}[p_i(1 - p_i)]$, and $H = D'WD$. The second-order approximation to the bias in the CML estimator is then $b = H^{-1}D'Wr$, where $r_i = d_i'H^{-1}d_i\,(p_i - \tfrac12)$; see Cordeiro and McCullagh (1991). A bias correction is obtained by using the CMLE $\hat\beta$ to compute $b$, then subtracting the result $\hat b$ from $\hat\beta$ (Anderson and Richardson, 1979; Schaefer, 1983); a corresponding variance estimate for $\hat\beta - \hat b$ may be computed by the delta method (Bishop et al., 1975, Chapter 14).

For discrete $x$ there is another logarithmic-scale correction, due to Haldane, which adds $\tfrac12$ to each cell (here, pair count) and then applies ML to the augmented counts (Bishop et al., 1975; Good, 1983; Jewell, 1984). For a binary $x$, the resulting ‘augmented likelihood’ for $\beta$ is identical to the posterior distribution for $\beta$ under a Jeffreys prior (Leonard and Hsu, 1994). The augmented counts are sometimes multiplied by a constant to restore the sample total to its original value (Bishop et al., 1975), which affects only the variance estimates.

2.2. Arithmetic corrections

For any nonconstant estimator $\hat\theta$, $E(\hat\theta) = \theta$ implies $E(e^{\hat\theta}) > e^{\theta}$; consequently, the above estimators undercorrect for bias on the arithmetic scale (Jewell, 1984). This raises the issue of whether one should examine the odds ratios or the log odds ratios. A common presumption is that one should focus on the log odds ratios because of the extreme asymmetry of the distribution of the odds ratios. I and others maintain that this presumption is an example of ignoring context to suit the statistics. In a well-designed study, the odds ratios, not their logs, are proportional to disease rates (Rothman and Greenland, 1998). These rates, in turn, are proportional to the overall costs of disease (Morgenstern and Greenland, 1990). The magnitudes of these costs are a primary target of interest for public health and subsequent policy debates, and hence the relevant estimation errors are proportional to arithmetic, not logarithmic, errors in relative-risk estimates.

Several approximate bias corrections for the arithmetic scale have been proposed for discrete $x$ (Bishop et al., 1975; Good, 1983; Jewell, 1984). These turn out to be very close to the Laplace estimator obtained by adding 1 rather than $\tfrac12$ to each cell (Bishop et al., 1975; Good, 1983). Unlike the others, this Laplace correction is invariant under exposure recoding and has a simple Bayesian derivation using a uniform prior on $\mathrm{expit}(\beta)$ (Bishop et al., 1975; Good, 1983); this prior is equivalent to the mean-zero logistic prior on $\beta$ with c.d.f. $\mathrm{expit}(\beta)$.


2.3. Bayes estimators

If one believes that random error is a major contributor to the results, it would be natural to pursue estimators with even lower expected squared error (ESE) than the corrected estimators, such as a Bayes estimator based on a prior that is (hopefully) more concentrated near the true coefficient vector than the priors implicit in the above procedures.

This leaves the task of specifying the prior. For ease of illustration, consider a matched-pair study of a dichotomous exposure, with no covariates. In this case $\beta$ is the pair-specific log odds ratio. Many epidemiologic controversies about harmful effects revolve around whether the true relative risk (which the odds ratio is supposed to approximate) is 1 versus 1.5 or 1 versus 2, with virtually no prior probability given to values above 3 or 4 by anyone, largely because most estimates are below 2. The electric power-cancer literature is an example (Portier and Wolfe, 1998; Greenland et al., 2000b). Other examples can be found in the nutrition and diet literature, such as in the coffee–heart disease controversy (Greenland, 1993a). In these contexts, the upper prior percentiles derived from normal($\mu$, $\tau^2$) distributions for $\beta$ with $\mu$ close to zero and $\tau^2$ between $\tfrac12$ and 1 correspond more closely to meta-analysis results and to the spectrum of expert opinions than do percentiles derived from the priors implicit in the Haldane or Laplace estimators. For example, the upper 90th prior percentiles for the odds ratio under normal(0,1) and normal(0, $\tfrac12$) priors for $\beta$ are 3.6 and 1.9, whereas the upper 90th odds-ratio percentiles under the Haldane and Laplace priors are 40 and 9.
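The prior percentiles quoted above can be checked with a few lines of Python. This is a minimal sketch, not the author's computation: the function names are hypothetical, the Haldane prior is taken to be the Jeffreys beta(1/2, 1/2) prior on expit(β) and the Laplace prior the uniform prior on expit(β), as the text describes. Under these assumptions the normal(0,1), Haldane, and Laplace figures come out as quoted; the 1.9 figure is matched when 1/2 is read as the prior standard deviation, whereas a prior variance of 1/2 gives about 2.5.

```python
import math
from statistics import NormalDist


def normal_prior_or_percentile(var, q=0.90, mean=0.0):
    """q-th percentile of exp(beta) when beta ~ normal(mean, var)."""
    z = NormalDist().inv_cdf(q)
    return math.exp(mean + z * math.sqrt(var))


def beta_prior_or_percentile(a, q=0.90):
    """q-th percentile of the odds p/(1-p) when p = expit(beta) ~ beta(a, a).

    Only the two closed-form cases needed here are handled:
    a = 1 (uniform, Laplace) and a = 1/2 (Jeffreys/arcsine, Haldane).
    """
    if a == 1.0:
        p = q                                   # uniform: quantile of p is q
    elif a == 0.5:
        p = math.sin(math.pi * q / 2.0) ** 2    # arcsine-distribution quantile
    else:
        raise ValueError("only a = 0.5 and a = 1 are handled in this sketch")
    return p / (1.0 - p)


print(round(normal_prior_or_percentile(1.0), 1))    # 3.6  (variance 1)
print(round(normal_prior_or_percentile(0.25), 1))   # 1.9  (standard deviation 1/2)
print(round(normal_prior_or_percentile(0.5), 1))    # 2.5  (variance 1/2)
print(round(beta_prior_or_percentile(0.5)))         # 40   (Haldane)
print(round(beta_prior_or_percentile(1.0)))         # 9    (Laplace)
```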

3. EXACT RESULTS FOR DICHOTOMOUS MATCHED-PAIR STUDIES

In the case of a matched-pair study of a dichotomous exposure with no covariates, it is easy to compute the bias of odds-ratio estimates directly from the exact conditional distribution of the discordant pairs (Jewell, 1984); there is no need for approximate or simulation studies. Let $u$ and $v$ be the numbers of discordant pairs with the case exposed and with the control exposed. The conditional distribution of $u$ is binomial with probability $\mathrm{expit}(\beta)$ and total $N = u + v$; the CML, Haldane, and Laplace odds-ratio estimates are $u/v$, $(u + \tfrac12)/(v + \tfrac12)$, and $(u + 1)/(v + 1)$ (Breslow and Day, 1980; Good, 1983; Jewell, 1984; Clayton and Hills, 1993); the Mantel–Haenszel and CML estimators are identical in this case. One must use an ad hoc redefinition of the CML estimator at $v = 0$ to give it a finite mean; following Jewell (1984), I equated it to the Haldane estimator when a zero occurred. The log-scale CML bias correction simplifies to

$$b = (u - v)^2 / (2uvN) \qquad (1)$$

so the corrected CML estimate is $u/(v e^{b})$ (again, with ad hoc replacement by the Haldane estimator when a zero occurred). The posterior mode of the log odds ratio $\beta$ under a normal(0, $\tau^2$) prior is the solution $\hat\beta$ of

$$\hat\beta = [u - N \cdot \mathrm{expit}(\hat\beta)]\,\tau^2. \qquad (2)$$

I here evaluate $e^{\hat\beta}$ with $\tau^2 = 1$ as a ‘Bayes point estimator’ of the odds ratio, because it is a special case of logistic penalized-likelihood estimators studied elsewhere (Breslow and Clayton, 1993; Greenland, 1997; Breslow et al., 1998).
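The five point estimators just described are simple enough to write down directly. The sketch below is illustrative only: the function names are hypothetical, the Haldane fallback is applied whenever either discordant count is zero (the text states the rule for a zero cell), and equation (2) is solved by bisection, which is one convenient choice rather than a method named in the paper.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def haldane(u, v):
    return (u + 0.5) / (v + 0.5)


def laplace(u, v):
    return (u + 1.0) / (v + 1.0)


def cml(u, v):
    # Ad hoc rule from the text: use Haldane when a zero cell occurs.
    return haldane(u, v) if u == 0 or v == 0 else u / v


def cml_corrected(u, v):
    if u == 0 or v == 0:
        return haldane(u, v)
    b = (u - v) ** 2 / (2.0 * u * v * (u + v))   # equation (1)
    return u / (v * math.exp(b))


def bayes_mode(u, v, tau2=1.0, tol=1e-12):
    """exp of the root of equation (2), located by bisection."""
    n = u + v

    def f(beta):
        # Decreasing in beta; its root is the posterior mode.
        return (u - n * expit(beta)) * tau2 - beta

    lo, hi = -n * tau2, n * tau2    # f(lo) >= 0 >= f(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


# With u = 7, v = 1 the estimators spread out considerably.
for name, est in [("CML", cml), ("CMLC", cml_corrected), ("Haldane", haldane),
                  ("Laplace", laplace), ("Bayes", bayes_mode)]:
    print(name, round(est(7, 1), 2))
```

With u = 7 and v = 1 the uncorrected estimate is 7, Haldane gives 5 and Laplace gives 4, which illustrates how strongly the choice of correction matters when one discordant count is very small.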

Table 1 presents the exact expectations of the above estimators under various scenarios. When the true odds ratio was small (2 or less), the Bayes-normal(0,1) estimator appeared least biased, though little different from the Laplace estimator, but was severely overcorrected (biased downward) for odds ratios of 4 or more. Excepting the rather extreme case of N = 8 and true odds ratio ω = 8, the Laplace estimator was nearly unbiased in all cases examined. As expected, the uncorrected CML estimator had considerable bias even when the true odds ratio was 1, and the corrected-CML and Haldane estimators were arithmetically undercorrected.


Table 1. Expected values and percent probabilities of twice truth or more for odds-ratio estimators in a matched-pair study of dichotomous exposure.* OR = odds ratio, N = number of discordant pairs, CMLC = CML with ML bias correction, Bayes = Bayes estimator using normal(0,1) prior for β (see text)

                     Expected value                      % Probability ≥ 2 × true OR
True OR   N    CML    CMLC   Haldane  Laplace  Bayes     CML   CMLC  Haldane  Laplace  Bayes
1         8    1.4    1.3    1.3      1.2      1.1       14    14    14       14       4
          16   1.2    1.1    1.1      1.1      1.1       11    11    11       11       4
          24   1.1    1.1    1.1      1.1      1.1       8     3     3        3        3
1.2       8    1.8    1.6    1.6      1.4      1.3       21    21    21       6        6
          16   1.4    1.4    1.4      1.3      1.3       8     8     8        8        3
          24   1.3    1.3    1.3      1.3      1.3       8     3     3        3        3
1.5       8    2.3    2.1    2.0      1.7      1.5       32    11    11       11       2
          16   1.8    1.8    1.7      1.6      1.5       17    7     7        7        2
          24   1.7    1.7    1.6      1.6      1.5       10    4     4        4        1
2         8    3.2    2.9    2.8      2.2      1.8       20    20    20       20       4
          16   2.6    2.4    2.4      2.2      1.9       17    6     6        6        1
          24   2.3    2.3    2.2      2.1      2.0       6     6     6        6        2
4         8    6.4    5.7    5.6      3.8      2.6       17    17    17       17       0
          16   6.2    5.3    5.2      4.2      3.1       14    14    14       14       0
          24   5.4    4.8    4.7      4.2      3.4       11    11    11       3        0
8         8    9.9    9.2    9.1      5.5      3.4       39    39    39       0        0
          16   12.5   10.7   10.7     7.2      4.4       15    15    15       15       0
          24   12.6   10.4   10.5     7.9      5.1       24    6     6        6        0

*CML and CMLC set equal to Haldane when zero cell occurs.

The bias results are in good accord with those in Jewell (1984). As mentioned earlier, however, not everyone is comfortable with bias as a criterion for evaluating ratio estimators. Therefore, the table also presents the probabilities that the estimates will exceed twice the true odds ratio. For the CML estimator, these upper-tail probabilities can remain appreciable even with a substantial number of discordant pairs, and only the Bayes estimator does consistently better by this criterion.
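Because the number of exposed-case discordant pairs is binomial given N, the entries of Table 1 are finite sums and can be recomputed exactly. The following sketch shows one way to do so for the closed-form estimators; the function names are hypothetical, and the Haldane fallback is applied whenever either discordant count is zero, as the table footnote indicates.

```python
import math


def haldane(u, v):
    return (u + 0.5) / (v + 0.5)


def laplace(u, v):
    return (u + 1.0) / (v + 1.0)


def cml(u, v):
    return haldane(u, v) if u == 0 or v == 0 else u / v


def cml_corrected(u, v):
    if u == 0 or v == 0:
        return haldane(u, v)
    b = (u - v) ** 2 / (2.0 * u * v * (u + v))   # equation (1)
    return u / (v * math.exp(b))


def binom_pmf(u, n, p):
    return math.comb(n, u) * p ** u * (1.0 - p) ** (n - u)


def exact_mean(est, n, odds_ratio):
    """Exact expectation of est(u, v) with u ~ binomial(n, OR / (1 + OR))."""
    p = odds_ratio / (1.0 + odds_ratio)          # expit(log OR)
    return sum(binom_pmf(u, n, p) * est(u, n - u) for u in range(n + 1))


def tail_prob(est, n, odds_ratio, factor=2.0):
    """Exact probability that the estimate is at least factor times the truth."""
    p = odds_ratio / (1.0 + odds_ratio)
    return sum(binom_pmf(u, n, p) for u in range(n + 1)
               if est(u, n - u) >= factor * odds_ratio)


estimators = [cml, cml_corrected, haldane, laplace]
print([round(exact_mean(e, 8, 4.0), 1) for e in estimators])  # [6.4, 5.7, 5.6, 3.8]
print(round(100 * tail_prob(cml, 8, 1.0)))                    # 14
```

The first printed row matches the expected-value entries of Table 1 for a true odds ratio of 4 with N = 8, and the second matches the CML tail probability for a true odds ratio of 1 with N = 8.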

Evaluations were also made using arithmetic and logarithmic expected-squared error as performance criteria; in both, CML was worst and Laplace was best over all the cases shown. I also computed exact coverages of the approximate 95% Wald-type intervals centered on log odds-ratio estimators, as well as exact and score intervals, for the situations in Table 1. These results are not shown because all exhibited over 95% coverage in almost all the situations examined, although the Laplace correction produced by far the narrowest average width and closest to nominal coverage, with score intervals also doing well. Studies of intervals for binomial proportions have found that score intervals exhibit better performance than CML, Wald, likelihood-ratio, and even exact intervals; see Agresti and Coull (1998) for references. Interestingly, the latter authors observed that adding two to each cell count produced Wald intervals for p = expit(β) that performed nearly as well as the score intervals; this corresponds to using an approximate posterior interval for p derived from a beta(2,2) prior.


Table 2. Case-specular pairs from analysis of back-yard electrical lines and childhood leukemia

               Specular back-yard lines
Case:          3-phase   Secondary   None
3-phase        15        24          11
Secondary      11        107         9
None           0         1           81

4. AN EXAMPLE

A case-specular study involves case-control pairs in which the ‘case’ is a case house and the ‘control’ is a reflection of the case house across the street (Zaffanella et al., 1998); under certain assumptions, ordinary matched-pair likelihoods can be used to analyze such data (Greenland, 1999). Table 2 gives data from a case-specular study of electrical wiring and childhood leukemia (Ebi et al., 1999). Of the 259 pairs available for this example, only 56 were discordant, and only one of these pairs had a case with no back-yard power line.

Represent line type by two indicators, $t_1$ for 3-phase line (1 = yes, 0 = no) and $t_2$ for secondary line, and let $\omega(t_1, t_2)$ be the ratio of leukemia odds at exposure $(t_1, t_2)$ versus (0,0) within matching strata. The usual conditional-logistic model for the regression of leukemia risk on $(t_1, t_2)$ is equivalent to the conditional (stratum-specific) odds-ratio model

$$\omega(t_1, t_2) = \exp(\beta_1 t_1 + \beta_2 t_2).$$

Row 1 of Table 3 gives the CML odds-ratio estimates (with 95% Wald confidence limits) from fitting this model to the example data. The intervals fall above the range of estimates obtained from other studies of wiring and leukemia, and both point estimates are at least ten times what one would expect based on all the evidence to date (including twenty or so other epidemiologic studies, most of them larger than this one) (Portier and Wolfe, 1998; Greenland et al., 2000b).

While epidemiologic validity problems may have contributed to the apparent exaggeration of the estimates, the data are uninformative about those problems. We can, however, examine the extent to which this appearance depends on the analysis method. Row 2 of Table 3 provides the results from an exact logistic-regression software program. The point estimates are hardly different from the CML estimates because they are in fact only slightly modified CML estimates (LogXact, 1993). More disturbing is the fact that the exact limits appear even more exaggerated than the CML Wald limits. The exact limits are known to cover at or above the nominal rate if there are no epidemiologic biases (Breslow and Day, 1980), and so suggest no exaggeration in the CML intervals. Nonetheless, the results are extraordinarily unstable. Row 3 of Table 3 shows the impact on the CML results of reclassifying as unexposed just one of the eleven cases in the secondary/3-phase cell. This minor change puts one pair in the empty cell in Table 2, and halves the estimates. Conversely, reclassifying as exposed the single unexposed case in a discordant pair makes the CML estimates infinite.
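The instability described above can be reproduced from the discordant cells of Table 2 alone. The sketch below fits the two-parameter conditional likelihood by Newton-Raphson with step-halving; the function name, the data encoding, and the choice of optimizer are assumptions made for illustration and are not taken from the study's software. Each discordant pair contributes expit(d′β), where d is the case-minus-specular difference in the indicators (t1, t2).

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_cml(pairs, iters=50):
    """Maximize sum n * log expit(d1*b1 + d2*b2); return (exp(b1), exp(b2)).

    pairs: list of (d1, d2, n) with d = case minus specular indicators.
    """
    def loglik(b1, b2):
        return sum(n * math.log(expit(d1 * b1 + d2 * b2)) for d1, d2, n in pairs)

    b1 = b2 = 0.0
    for _ in range(iters):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for d1, d2, n in pairs:
            p = expit(d1 * b1 + d2 * b2)
            g1 += n * d1 * (1.0 - p)          # score
            g2 += n * d2 * (1.0 - p)
            w = n * p * (1.0 - p)             # information weights
            h11 += w * d1 * d1
            h12 += w * d1 * d2
            h22 += w * d2 * d2
        det = h11 * h22 - h12 * h12
        s1 = (h22 * g1 - h12 * g2) / det      # Newton direction
        s2 = (h11 * g2 - h12 * g1) / det
        step = 1.0
        while step > 1e-8 and loglik(b1 + step * s1, b2 + step * s2) < loglik(b1, b2):
            step /= 2.0                       # halve until likelihood does not drop
        b1 += step * s1
        b2 += step * s2
    return math.exp(b1), math.exp(b2)


# Discordant cells of Table 2: (t1 difference, t2 difference, pair count).
table2 = [(1, -1, 24), (1, 0, 11), (-1, 1, 11), (0, 1, 9), (-1, 0, 0), (0, -1, 1)]
# Row 3 scenario: one secondary/3-phase case reclassified as unexposed.
moved = [(1, -1, 24), (1, 0, 11), (-1, 1, 10), (0, 1, 9), (-1, 0, 1), (0, -1, 1)]

or1, or2 = fit_cml(table2)
print(round(or1), round(or2))    # 32 14
print(fit_cml(moved))            # roughly half the values above
```

The sketch is not meant for the last scenario in the paragraph, in which no finite maximum exists.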

Rows 4–6 of Table 3 give the ML-bias corrected, Haldane, and Laplace estimates. The two logarithmic corrections reduce the estimates by about half while the Laplace correction reduces the estimates by about two-thirds. Nonetheless, the estimates still appear implausibly large relative to previous studies.

Table 3. Odds-ratio estimates for 3-phase and secondary back-yard power-line exposure, from case-specular analysis of childhood leukemia. CML = conditional maximum likelihood

Method                     3-phase          Secondary
1. CML                     32 (4.0,253)     14 (1.8,107)
2. Exact*                  30 (4.5,1328)    14 (2.1,507)
3. CML moving one pair     16 (3.4,72)      6.8 (1.5,30)
4. CML bias corrected†     19 (3.6,105)     8.7 (1.7,45)
5. Haldane‡                16 (3.5,78)      7.4 (1.6,34)
6. Laplace§                11 (2.9,43)      5.2 (1.4,19)
7. Bayes β ∼ N(0,1)        12 (3.5,40)      4.9 (1.7,14)
8. Bayes β ∼ N(0,1/2)      8.6 (3.0,25)     3.6 (1.6,8.5)
9. Pairing ignored         2.4 (1.4,4.1)    1.2 (0.81,1.7)

* Modified CML point estimates and exact limits from LogXact
† Using approximate bias correction for ML estimates
‡ Add 1/2 to each cell and renormalize
§ Add 1 to each cell and renormalize

Row 7 of Table 3 gives approximate posterior medians and 95% intervals derived from the second derivative of the log posterior density, based on a bivariate normal prior for $\beta_2$, $\beta_1 - \beta_2$ in the odds-ratio model above, with prior means of zero, prior variances of 1, and a prior correlation of 0.5; $\beta_2$ and $\beta_1 - \beta_2$ are the log odds ratios comparing secondary to no line and 3-phase to secondary. The prior variance for $\beta_1$ is 1 + 1 + 2(0.5) = 3, which yields an upper 90th prior percentile for the odds ratio $e^{\beta_1}$ comparing 3-phase to no line of 9.2. The results resemble the Laplace estimates, but with narrower intervals; this narrowing is as expected, given the lighter tails of the normal prior in comparison to the Laplace prior. Row 8 is derived using the same prior means and correlation, but with prior variances of $\tfrac12$. This change implies prior variance of 1.5 for $\beta_1$ and an upper 90th prior percentile for $e^{\beta_1}$ of 3; although the results are still implausibly large, their magnitude is easily attributable to random error and (not unlikely) other sources of bias.
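The prior variance implied for β1 follows from the usual rule for the variance of a sum of correlated components, since β1 = β2 + (β1 − β2). A small sketch of the row 7 calculation, with hypothetical function names:

```python
import math
from statistics import NormalDist


def var_of_sum(var_a, var_b, corr):
    """Variance of A + B given the two variances and their correlation."""
    return var_a + var_b + 2.0 * corr * math.sqrt(var_a * var_b)


def upper_or_percentile(var, q=0.90):
    """q-th percentile of exp(beta) for beta ~ normal(0, var)."""
    return math.exp(NormalDist().inv_cdf(q) * math.sqrt(var))


v1 = var_of_sum(1.0, 1.0, 0.5)                  # prior variance of beta_1, row 7
print(v1, round(upper_or_percentile(v1), 1))    # 3.0 9.2
```

The same two functions can be applied to other prior variances and correlations to see how quickly the implied upper percentile for the 3-phase odds ratio grows.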

A referee suggested examining the estimates obtained by breaking the pairing and using the crude unmatched data. These are presented in row 9 of Table 3. Because of the strong positive association of the pair exposures, collapsing across pairs produces estimates that are less than a tenth of the CML estimates; the results are also much more precise and consistent with the literature. The latter consistency may largely reflect a fortuitous cancellation of biases, for the crude (collapsed) odds ratio is known to be biased toward the null when the pair exposures are positively correlated (Siegel and Greenhouse, 1973). Nonetheless, the crude odds ratio also has lower variance, which has led some authors to suggest averaging the stratified and crude estimators to minimize expected squared error (Liang and Zeger, 1988; Kalish, 1990; Greenland, 1991). In the present example, the tremendous drop in the odds ratio upon collapsing is just what one should expect given the extremely high correlation of the exposure (line type) with the main matching factors (neighborhood and housing type) implicit in the use of specular controls.

For a more detailed discussion of this example and similar bias in a conventional matched case-control study, see Greenland et al. (2000a).

5. DISCUSSION

The present paper has focused on situations in which there are too few pairs to support CML estimation of even one parameter. The problems can become more acute in multiple logistic regression. These problems, formalized as sparse-data inconsistency, have long been recognized in unconditional ML estimators, and in fact CML estimators were developed to address these problems (Breslow and Day, 1980; Breslow, 1981). Unfortunately, an analogous problem occurs in CML estimators when pair-counts are sparse.

The formal equivalence of the matched-pair conditional likelihood to an unconditional likelihood allows one to map results for the latter to the former. As an example, consider a matched-pair study of an indicator $x$ in which the investigator wishes to control an unmatched nominal covariate $z$ whose number of levels increases at the same rate as the total number of pairs $M$. Entering this covariate into the conditional logistic model as a series of indicators (dummy variables) will then produce a conditional likelihood with $O(M)$ nuisance parameters (the $z$ indicator coefficients), from which it follows by arguments parallel to those in Breslow (1981) that the CML estimator $\hat\beta$ of the $x$ coefficient will be inconsistent. Because of the formal equivalence of conditional logistic and Cox-model partial likelihoods, the same type of problem can afflict proportional-hazards analyses, although the bias would not be as severe because each failure (case) would be matched to many nonfailures at each failure time.

The example in Table 2 may seem extreme, but studies reporting similarly large odds ratios based on sparse matched or stratified data are not uncommon, especially in analyses in which many covariates are entered in the conditional logistic model or in which the data are divided into small subgroups (for examples, see Daling et al., 1994; Witte et al., 1994; Abenhaim et al., 1996; Feychting et al., 1998; Schwartzbaum et al., 1998). Such large reported estimates should call attention to potential bias problems. Of perhaps greater concern, however, is the possibility of unnoticed small-sample bias in modest, plausible results. Uncontrolled study biases, like selection bias, misclassification, and residual confounding, can easily make the odds-ratio parameter $e^{\beta}$ equal to 1.2 or even 1.5 when no underlying causal effect is present (Kelsey et al., 1996; Rothman and Greenland, 1998). As apparent from Table 1, small-sample bias can then operate on this biased parameter to generate CMLEs of 2 or more, which seem less plausibly explained by study biases. The contribution of such synergistic bias effects to the generation of controversial results may be considerable when most studies have few exposed cases.

Another potential for harmful synergy can arise from unnecessary matching. If the matching factor is related only to the exposure, such overmatching increases the variance of the CML estimator of the odds ratio by reducing the number of discordant matched sets available for analysis (Miettinen, 1970; Thomas and Greenland, 1983). An additional consequence of this reduction is an increase in the small-sample bias of the odds-ratio estimator. Some older writings on the impact of matching (e.g., Chase, 1968) did not encounter these problems because they focused on tests of the null hypothesis under random matching (which does not increase concordance) or focused on the difference in proportions, whose variance decreases as the pairwise correlation (and hence concordancy) increases, and which is exactly unbiased for the average pairwise difference in response probabilities. In case-control studies, however, the ‘response’ is exposure status, and so the response difference is of no direct interest.

The present paper concerns the poor behavior of CML odds-ratio estimators under conditions common in epidemiology (studies with few discordant matched pairs). This behavior does not have a simple relation to the behavior of tests of the null hypothesis. Consider the behavior of the Wald test for univariate $\beta$, treating $W = \hat\beta/\mathrm{SE}(\hat\beta)$ as a standard normal statistic for testing $\beta = 0$. $W$ exhibits quite different pathologies from $\hat\beta$. It has long been known that $W$ can eventually decline as $|\hat\beta|$ gets larger given a fixed sample size, due to the fact that $\mathrm{SE}(\hat\beta)$ can increase more rapidly than $\hat\beta$ as the latter increases. For example, with matched pairs, the CML odds ratio is $u/v$ and $\mathrm{SE}(\hat\beta) = (1/u + 1/v)^{1/2}$, so $N = 24$ discordant pairs and $u = 22$ yield $e^{\hat\beta} = 22/2 = 11$ and $W = \ln(11)/(1/22 + 1/2)^{1/2} = 3.25$, whereas $N = 24$ and $u = 23$ yield $e^{\hat\beta} = 23$ and $W = \ln(23)/(1/23 + 1/1)^{1/2} = 3.07$. Thus, $W$ declines as $e^{\hat\beta}$ explodes. This type of behavior can result in the power of the Wald test dropping as $|\beta| \to \infty$ given fixed $N$ (Hauck and Donner, 1977; Vaeth, 1985).
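The arithmetic of this example is easy to reproduce. In the sketch below the function name is hypothetical; it simply evaluates the Wald statistic from the two discordant-pair counts using the standard error formula given in the text.

```python
import math


def wald_statistic(u, v):
    """W = log(u/v) / sqrt(1/u + 1/v) for positive discordant-pair counts."""
    beta_hat = math.log(u / v)
    se = math.sqrt(1.0 / u + 1.0 / v)
    return beta_hat / se


print(round(wald_statistic(22, 2), 2))   # 3.25, odds ratio 11
print(round(wald_statistic(23, 1), 2))   # 3.07, odds ratio 23
```

Moving a single pair from v to u doubles the estimated odds ratio yet lowers W, which is the non-monotone behavior the paragraph describes.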


6. RECOMMENDATIONS

The simplest diagnostic for small-sample or sparse-data problems is close tabular examination of basic data. In the above example, the possibility of small-sample artifacts did not occur to the co-investigator who first presented the CML odds ratios to the research team, simply because the total number of pairs (259) seemed quite large. Even a more sophisticated summary, noting there are fifty-six discordant pairs (twenty-eight ‘informative’ pairs per parameter), would not have signaled problems. Only the full pair table (Table 2) shows the pair sparsity.

Full tabulation may seem impractical or unreliable when multiple covariates (some perhaps continuous) are entered in the model. A crude rule of thumb, adapted from an oft-cited rule for unconditional logistic regression (Peduzzi et al., 1996), would require at least ten discordant matched sets per estimated parameter. This rule, however, fails dramatically in the above example. I therefore suggest that a Bayesian or (more generally, when applicable) an hierarchical Bayes analysis can serve as a diagnostic, in the following sense: if the results change dramatically between a CML analysis and a Bayesian analysis with scientifically reasonable priors, one at least has a warning of severe data limitations. This use of a Bayesian analysis need entail no commitment to the Bayesian results over the CML results, but may serve to temper reliance on the CML results in formulating conclusions. For this purpose, simple approximate fitting methods may suffice (Greenland, 1993b; Witte and Greenland, 1996; Greenland, 1997; Breslow et al., 1998), although even these are not invulnerable to sparse-data bias (Neuhaus and Segal, 1997).

A fully Bayesian analysis with scientifically sensible priors is of course the Bayesian solution to the sample-size problem, provided one uses a fitting method appropriate for small samples. Frequentists might argue instead in favor of formal bias corrections or exact analysis. Formal bias corrections for the multiple-regression case are currently only available for the coefficients, which, as argued above, are not the final parameters of interest for public-health purposes. Exact analysis has more serious shortcomings. Exact intervals are constructed to ensure at least nominal coverage of the true parameter value. This assurance extends to all parameter values, no matter how absurdly large. The cost is that exact intervals tend to expand to values beyond the CML intervals, driving them even further from the Bayesian posterior intervals than the CML intervals. They thus can be even more misleading than the CML intervals when (as seems inevitable) they are interpreted as posterior intervals by the consumer. On the practical side, the capacity of exact programs remains limited, despite remarkable computing advances.

Regardless of how one chooses to deal with it, the potential for small-sample bias in results from asymptotic procedures needs to be checked more routinely than is current practice. The development of easily programmed sample-size diagnostics for commercial software would be of particular value; formal bias corrections might serve well in this role.

ACKNOWLEDGEMENTS

The author thanks Kris Ebi, David Savitz, and Luciano Zaffanella for use of the example data, and the referees for helpful comments.

REFERENCES

ABENHAIM, L., MORIDE, Y., BRENOT, F., RICH, S., BENICHOU, J., KURZ, X., HIGENBOTTAM, T., OAKLEY, C., WOUTERS, E., AUBIER, M., et al. (1996). Appetite-suppressant drugs and the risk of primary pulmonary hypertension. New England Journal of Medicine 335, 609–616, Table 3.

AGRESTI, A. AND COULL, B. A. (1998). Approximate is better than ‘exact’ for interval estimation of binomial proportions. American Statistician 52, 119–126.

ANDERSON, J. A. AND RICHARDSON, S. C. (1979). Logistic discrimination and bias correction in maximum likelihood estimation. Technometrics 21, 71–78.

BECKER, S. (1989). A comparison of maximum likelihood and Jewell’s estimators of the odds ratio and relative risk in single 2 × 2 tables. Statistics in Medicine 8, 987–996.

BISHOP, Y. M. M., FIENBERG, S. E. AND HOLLAND, P. W. (1975). Discrete Multivariate Analysis: Theory and Practice. Cambridge, MA: MIT Press.

BRESLOW, N., LEROUX, B. AND PLATT, R. (1998). Approximate hierarchical modelling of discrete data in epidemiology. Statistical Methods in Medical Research 7, 49–62.

BRESLOW, N. E. (1981). Odds ratio estimators when the data are sparse. Biometrika 68, 73–84.

BRESLOW, N. E. AND CLAYTON, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88, 9–25.

BRESLOW, N. E. AND DAY, N. E. (1980). Statistical Methods in Cancer Epidemiology. I. The Analysis of Case-Control Studies. Lyon: IARC.

BYTH, K. AND MCLACHLAN, G. T. (1978). The biases associated with maximum likelihood methods of estimation of the multivariate logistic risk function. Communications in Statistics A7, 877–890.

CHASE, G. R. (1968). On the efficiency of matched pairs in Bernoulli trials. Biometrika 55, 365–369.

CLAYTON, D. AND HILLS, M. (1993). Statistical Models in Epidemiology. New York: Oxford University Press.

CORDEIRO, G. M. AND MCCULLAGH, P. (1991). Bias correction in generalized linear models. Journal of the Royal Statistical Society B 53, 629–643.

DALING, J. R., MALONE, K. E., VOIGT, L. F., WHITE, E. AND WEISS, N. S. (1994). Risk of breast cancer among young women: relationship to induced abortion. Journal of the National Cancer Institute 86, 1584–1592.

EBI, K. L., ZAFFANELLA, L. E. AND GREENLAND, S. (1999). Application of the case-specular method to two studies of wire codes and childhood cancers. Epidemiology 10, 398–404.

FEYCHTING, M., FORSSEN, U., RUTQUIST, L. E. AND AHLBOHM, A. (1998). Magnetic fields and breast cancer in Swedish adults residing near high-voltage power lines. Epidemiology 9, 392–397.

GOOD, I. J. (1983). Some history of the hierarchical Bayesian methodology. In Good Thinking ed. Good, I.J. Chapter 9, 95–105. Minneapolis, MN: University of Minnesota Press.

GREENLAND, S. (1991). Reducing mean squared error in the analysis of stratified epidemiologic studies. Biometrics 47, 773–775.

GREENLAND, S. (1993a). A meta-analysis of coffee, myocardial infarction, and sudden coronary death. Epidemiology 4, 366–374.

GREENLAND, S. (1993b). Methods for epidemiologic analyses of multiple exposures: A review and a comparative study of maximum-likelihood, preliminary testing, and empirical-Bayes regression. Statistics in Medicine 12, 717–736.

GREENLAND, S. (1997). Second-stage least squares versus penalized quasi-likelihood for fitting hierarchical models in epidemiologic analysis. Statistics in Medicine 16, 515–526.

GREENLAND, S. (1999). A unified approach to the analysis of case-distribution (case-only) studies. Statistics in Medicine 18, 1–15.

GREENLAND, S., SCHWARTZBAUM, J. A. AND FINKLE, W. D. (2000a). Problems due to small samples and sparse data in conditional logistic regression analysis. American Journal of Epidemiology 151, in press.

GREENLAND, S., SHEPPARD, A. S., KAUNE, W. T., POOLE, C. AND KELSH, M. A. (2000b). A pooled analysis of magnetic fields, wire codes, and childhood leukemia. Epidemiology 11, in press.

HAUCK, W. W. AND DONNER, A. (1977). Wald’s test as applied to hypotheses in logit analysis. Journal of the American Statistical Association 72, 851–853.

JEWELL, N. P. (1984). Small-sample bias of point estimators of the odds ratio from matched sets. Biometrics 40, 421–435.

JEWELL, N. P. (1986). On the bias of commonly used measures of association for 2 × 2 tables. Biometrics 42, 351–358.

KALISH, L. A. (1990). Reducing mean-squared error in the analysis of pair-matched case-control studies. Biometrics 46, 493–499.

KELSEY, J. L., WHITTEMORE, A. S., EVANS, A. S. AND THOMPSON, W. D. (1996). Methods in Observational Epidemiology. New York: Oxford University Press.

KRAUS, A. S. (1960). Comparison of a group with disease and a control group from the same families, in search of possible etiologic factors. American Journal of Public Health 50, 303–311.

LEONARD, T. AND HSU, J. S. J. (1994). The Bayesian analysis of categorical data: a selective review. In Aspects of Uncertainty ed. Freeman, P. R. and Smith, A. F. M. Chapter 18, 283–310. New York: Wiley.

LIANG, K.-Y. AND ZEGER, S. L. (1988). On the use of concordant pairs in matched case-control studies. Biometrics 44, 1145–1156.

LIU, K.-J. (1989). A note on the estimate of the relative risk when sample sizes are small (letter). Biometrics 45, 1030–1031.

LOGXACT (1993). Cambridge, MA, Cytel.

MIETTINEN, O. S. (1970). Matching and design efficiency in retrospective studies. American Journal of Epidemiology 91, 111–118.

MORGENSTERN, H. AND GREENLAND, S. (1990). Graphing ratio measures of effect. Journal of Clinical Epidemiology 43, 539–542.

NEUHAUS, J. M. AND SEGAL, M. R. (1997). An assessment of approximate maximum likelihood estimators in generalized linear models. In Modelling Longitudinal and Spatially Correlated Data: Methods, Applications, and Future Directions ed. Gregoire, T. G., Brillinger, D. R., Diggle, P. J., Russek-Cohen, E., Warren, W. G. and Wolfinger, R. D. 11–22. New York: Springer.

PEDUZZI, P., CONCATO, J., KEMPER, E., HOLFORD, T. R. AND FEINSTEIN, A. R. (1996). A simulation study of the number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology 49, 1373–1379.

PORTIER, C. J. AND WOLFE, M. S. (1998). Assessment of Health Effects from Exposure to Power-line Frequency Electric and Magnetic Fields. Research Triangle Park, NC; National Institute of Environmental Health Sciences.

ROTHMAN, K. J. AND GREENLAND, S. (1998). Modern Epidemiology (2nd edn). Philadelphia: Lippincott-Raven.

SCHAEFER, R. L. (1983). Bias correction in maximum-likelihood logistic regression. Statistics in Medicine 2, 71–78.

SCHWARTZBAUM, J. A., FISHER, J. L. AND CORNWELL, D. G. (1998). Role of dietary energy and cured meat consumption in adult glioma risk (abstract). American Journal of Epidemiology 147, S7.

SIEGEL, D. G. AND GREENHOUSE, S. W. (1973). Validity in estimating relative risk in case-control studies. Journal of Chronic Diseases 42, 687–688.

THOMAS, D. C. AND GREENLAND, S. (1983). The relative efficiencies of matched and independent sample designs for case-control studies. Journal of Chronic Diseases 36, 685–697.

VAETH, M. (1985). On the use of Wald’s test in exponential families. International Statistics Review 53, 199–214.

WALTER, S. D. AND COOK, R. J. (1991). A comparison of several point estimators of the odds ratio in a single 2 × 2 contingency table. Biometrics 47, 795–811.

WITTE, J. S. AND GREENLAND, S. (1996). Simulation study of hierarchical regression. Statistics in Medicine 15, 1161–1170.

WITTE, J. S., GREENLAND, S., HAILE, R. W. AND BIRD, C. L. (1994). Hierarchical regression analysis applied to a study of multiple dietary exposures and breast cancer. Epidemiology 5, 612–621.

ZAFFANELLA, L. E., SAVITZ, D. A., GREENLAND, S. AND EBI, K. L. (1998). The residential case-specular method to study wire codes, magnetic fields, and disease. Epidemiology 9, 16–20.

[Received June 28, 1999. Revised October 25, 1999]