Correcting for Selection Bias in Randomized Clinical Trials


Vance W. Berger, NCI

9/15/05 FDA/Industry Workshop, DC

Outline

1. What do we expect of randomization? (4)

2. Chronological bias (2)

3. Randomized blocks (3)

4. Selection bias (7)

5. Correcting selection bias (5)

6. Further reading (4)

1. What Do We Expect? (1/4)

The success of randomization has often been questioned in randomized trials, because of baseline imbalances [1].

For example, Schor [2] raised this concern in The University Group Diabetes Program.

Altman [3] raised this concern for a randomized comparison of talc to mustine for control of pleural effusions [4].

1. What Do We Expect? (2/4)

Because of an imbalance in the numbers of patients randomized to each group (134 vs. 116), the Western Washington Intracoronary Streptokinase Trial statisticians were “particularly concerned in verifying that the randomization process had been carried out as planned” [5].

Weiss, Gill, and Hudis [6] audited a randomized South African trial of high-dose chemotherapy for metastatic breast cancer [7], noted imbalances in the numbers of patients allocated over time, and concluded that “It is unlikely that this sequence of treatment assignments could have occurred if the study were truly randomized.”

1. What Do We Expect? (3/4)

In a randomized study of a culturally sensitive AIDS education program [8], Marcus [9] hypothesized that “subjects with lower baseline knowledge scores … may have been channeled into the treatment group”, because of baseline imbalances across the randomized groups.

Jordhoy et al. [10] discussed a cluster randomized trial of palliative care conducted at the Palliative Medicine Unit of Trondheim University Hospital and noted that “The individual patient results [meaning baseline imbalances] suggested that diagnosis was not randomly distributed across the two groups”.

1. What Do We Expect? (4/4)

Two common themes emerge from all of these challenges of ostensibly randomized trials.

Questions are raised when either 1) the numbers of subjects do not match expectations or 2) the baseline characteristics of the participants differ greatly across the randomized groups.

Clearly, then, we expect more from randomized trials than just that they be randomized, and in fact randomization does not always create the balanced groups we would have hoped for.

2. Chronological Bias (1/2)

How can baseline imbalances be large enough that one would question the success of the randomization?

Completely unrestricted randomization ensures independence, but allows for unbalanced group sizes, and so is not used very often in practice.

Instead, some form of restricted randomization is used to ensure balanced group sizes at the end of the trial.

The random allocation rule makes this terminal balance in group sizes its only restriction, and so it allows for large baseline imbalances during the trial.

Suppose that many more early allocations are to one group, and more late allocations are to the other group.

Suppose further that the covariate distribution changes during the course of the trial; this is quite likely.

2. Chronological Bias (2/2)

There could be more females early, but during the trial another trial opens up just for females, so there are more males in this trial henceforth.

Gender is confounded with time, which, because of the imbalance, is confounded with treatments.

This is chronological bias [11], although the name is a misnomer as chronological bias does not systematically favor one group or the other.

Still, it is one cause of baseline imbalances.

The only way to control chronological bias is to introduce restrictions on the randomization.

3. Randomized Blocks (1/3)

Perhaps the most common form of restricted randomization is randomized or permuted blocks.

The idea is to force perfect balance every so often.

Block sizes may be fixed (e.g., 4) or varied (e.g., 2 & 4), and the random allocation rule is used within each block to ensure perfect balance in the block.

In unmasked trials, prior allocations are known.

Once all but one group has been exhausted in the block (e.g., EECC with size 4), all remaining allocations to that block will be deterministic.
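The contrast between the random allocation rule and permuted blocks can be sketched in a few lines. This is only an illustration: the function names, the trial size of 40, and the seed are assumptions made here, not part of the talk.

```python
import random

def random_allocation_rule(n, rng):
    """Shuffle n/2 E's and n/2 C's: balance is forced only at the end."""
    seq = ["E"] * (n // 2) + ["C"] * (n // 2)
    rng.shuffle(seq)
    return seq

def permuted_blocks(n, block_size, rng):
    """Apply the random allocation rule separately within each block."""
    seq = []
    for _ in range(n // block_size):
        seq.extend(random_allocation_rule(block_size, rng))
    return seq

def max_running_imbalance(seq):
    """Largest |#E - #C| seen at any point while patients accrue."""
    diff, worst = 0, 0
    for arm in seq:
        diff += 1 if arm == "E" else -1
        worst = max(worst, abs(diff))
    return worst

rng = random.Random(2005)  # arbitrary seed, chosen only for repeatability
rar = random_allocation_rule(40, rng)
blocked = permuted_blocks(40, 4, rng)
print("random allocation rule:", max_running_imbalance(rar))
print("permuted blocks of 4:  ", max_running_imbalance(blocked))
```

Both sequences end perfectly balanced, but only the blocked one is guaranteed never to drift by more than half a block size along the way.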

3. Randomized Blocks (2/3)

In fact, in an EECC block even the 2nd is predictable, as one can use knowledge of the 1st allocation to do better than guessing.

Let P{E} be the proportion of remaining assignments to the experimental group E.

If there is 1:1 allocation between experimental group E and control C, with block size 4:

P{E} before each allocation in the block:

Block   1st   2nd   3rd   4th
CCEE    2/4   2/3   2/2   1/1
EECC    2/4   1/3   0/2   0/1
CECE    2/4   2/3   1/2   1/1
ECEC    2/4   1/3   1/2   0/1
CEEC    2/4   2/3   1/2   0/1
ECCE    2/4   1/3   1/2   1/1
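The P{E} values for blocks of size 4 can be regenerated mechanically. The sketch below assumes only the definition given above (proportion of remaining assignments to E); the helper name `p_e_sequence` is made up for this example, and Python's `Fraction` prints reduced fractions (1/2 rather than 2/4).

```python
from fractions import Fraction
from itertools import permutations

def p_e_sequence(block):
    """P{E} just before each allocation in one block, e.g. 'EECC'."""
    remaining_e = block.count("E")
    probs = []
    for position, arm in enumerate(block):
        remaining = len(block) - position
        probs.append(Fraction(remaining_e, remaining))
        if arm == "E":
            remaining_e -= 1  # one fewer E left for later positions
    return probs

# All six distinct 1:1 blocks of size 4.
blocks = sorted(set("".join(p) for p in permutations("EECC")))
for block in blocks:
    print(block, [str(p) for p in p_e_sequence(block)])
```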

3. Randomized Blocks (3/3)

Only the 1st allocation of an EECC or CCEE block is unpredictable, and only the 1st and 3rd of CECE, CEEC, ECEC, or ECCE blocks are unpredictable.

Even if the investigator has never actually seen the allocation sequence, he or she will still know P{E} at the time a patient is considered for trial entry.

In fact, the investigator will know both P{E} (the predicted treatment assignment) and the set of covariates specific to the patient being considered.

Only if P{E} equals the unconditional probability (or 0.5 with 1:1 allocation) is there no prediction.
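To put a number on "better than guessing", one can count how often an investigator would guess correctly in a block of size 4. The guessing strategy here is an assumption added for illustration (always name the arm with more allocations remaining, flip a coin when P{E}=0.5); it is not described in the slides.

```python
from fractions import Fraction
from itertools import permutations

def expected_correct_guesses(block):
    """Expected number of correct guesses in one block when the guesser
    names the arm with more allocations remaining (coin flip on ties)."""
    remaining = {"E": block.count("E"), "C": block.count("C")}
    total = Fraction(0)
    for arm in block:
        other = "C" if arm == "E" else "E"
        if remaining[arm] > remaining[other]:
            total += 1               # the favored arm is the actual arm
        elif remaining[arm] == remaining[other]:
            total += Fraction(1, 2)  # P{E} = 0.5: no information
        remaining[arm] -= 1
    return total

# The six 1:1 blocks of size 4 are equally likely under the random allocation rule.
blocks = sorted(set("".join(p) for p in permutations("EECC")))
average = sum(expected_correct_guesses(b) for b in blocks) / len(blocks)
print(average, "of 4 =", average / 4)
```

Under these assumptions the guesser is right 17/24 of the time, compared with 1/2 when nothing is known.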

4. Selection Bias Mechanism (1/7)

Many authors state that, as a consequence of randomization, any baseline imbalances in a randomized trial must be random in origin.

Yet selection bias occurs if healthier patients are enrolled when P{E}>0.5 and sicker patients are enrolled when P{E}<0.5 (or vice versa).

Of course, this is not a concern in masked trials, because unmasking is required for P{E} to assume any value other than the uninformative 0.5.

But in practice, are there any truly masked trials?

4. Selection Bias Mechanism (2/7)

It will help to define our terms carefully.

Some define masked trials as those in which nobody knows who got what until the end.

Indeed, this is the objective of masking; to define randomization similarly in terms of its objective is to define a trial to be randomized if and only if any of its baseline imbalances are random.

And yet one cannot help but recall Socrates asking if an act was pious because the heavens approved, or if the heavens approved because it was pious.

4. Selection Bias Mechanism (3/7)

Just as one cannot confer with Zeus to inquire as to his approval of an action one is contemplating, so too is one unable to verify that each observed baseline imbalance was of a random origin.

This ideal would have to be a consequence, and not the definition, of randomization, and we are now left to wonder – what is randomization?

To make randomization, masking, and allocation concealment useful concepts, and avoid circular logic, we must define these three terms as actions that one can take (processes), and not as the realization of their intended outcomes [12].

4. Selection Bias Mechanism (4/7)

The process of randomization is nothing more, or less, than constructing treatment groups by randomly selecting non-overlapping subsets of the set of all accession numbers to be used [13].

Note that this definition allows one to actually conduct a randomized trial (it is an action).

Can one eliminate selection bias as a consequence of randomization according to the definition?

Without allocation concealment (often defined as masking of each allocation only until a treatment is assigned to the patient in question), the answer is clearly no, but perfect masking implies perfect allocation concealment, which implies no bias.

4. Selection Bias Mechanism (5/7)

But do masking & allocation concealment claims confer true allocation concealment (and no bias)?

The process of masking, or not telling patients or physicians who got what, is clearly worthwhile, but information is not often contained very well.

Tell-tale side effects, e.g., may lead to unmasking.

Sealed envelopes have been held up to lights, files have been raided, and fake patients have been called in to ascertain the next allocation [14].

So the effect of masking may not match its goal.

Unmasking may lead to evaluation biases; if it occurs after the patients have been selected then it should not lead to selection bias; however …

4. Selection Bias Mechanism (6/7)

Most RCTs use restricted randomization (blocks).

The patterns in the allocation sequence allow for prediction of the future allocations based on knowledge of the past ones, and selection bias [1].

So even “masked” randomized trials with planned allocation concealment are not immune [12].

One can compute the expected imbalance in a binary covariate to be 50% with blocks of size 2, 42% (block size 4), or 28% (block size 6) [15].

The result is artificially large test statistics and posterior probabilities, artificially low p-values, and artificially narrow confidence intervals.

4. Selection Bias Mechanism (7/7)

Example: 20 blocks of size two each (10 ‘CE’ blocks, 10 ‘EC’ blocks).

For ‘CE’, P{E}=0.5, then 1.0.

For ‘EC’, P{E}=0.5, then 0.0.

Females respond better than males.

All patients randomized (20 male, 20 female):

P{E}=0.0 (10 male); P{E}=0.5 (10 male, 10 female); P{E}=1.0 (10 female)

Control Group (25% female, 75% male)

Experimental Group (75% female, 25% male)

[Figure: allocation flow diagram; surviving labels are “Permeable” (50%, 50%) and “Selectively semi-permeable” (100%, 100%).]
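The group compositions in this example (20 blocks of size two, 20 males and 20 females) can be reproduced by bookkeeping alone. The sketch assumes the selection rule implied by the slide: a female is enrolled whenever P{E}=1.0, a male whenever P{E}=0.0, and first-position patients (P{E}=0.5) are split evenly by sex within each block type. That even split and all names in the code are assumptions for illustration.

```python
from collections import Counter

groups = {"E": Counter(), "C": Counter()}
blocks = ["CE"] * 10 + ["EC"] * 10

for i, (first_arm, second_arm) in enumerate(blocks):
    # 1st position: P{E}=0.5, no prediction possible. Alternate sexes so each
    # block type gets 5 females and 5 males there (assumed even split).
    first_sex = "F" if i % 2 == 0 else "M"
    # 2nd position: allocation is known in advance, so the patient is selected:
    # P{E}=1.0 -> enroll a female, P{E}=0.0 -> enroll a male.
    second_sex = "F" if second_arm == "E" else "M"
    groups[first_arm][first_sex] += 1
    groups[second_arm][second_sex] += 1

for arm in ("C", "E"):
    n = sum(groups[arm].values())
    print(arm, {sex: count / n for sex, count in sorted(groups[arm].items())})
```

The control group comes out 25% female and the experimental group 75% female, a 50-point imbalance produced entirely by the second position of each block.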

5. Correcting Selection Bias (1/5)

Selection bias can be prevented, detected, and corrected, but specialized methods are needed.

Recall that E & C are the experimental & control treatment groups (TG), respectively; P{E} is the proportion of E allocations remaining in the block.

If E is superior to C, then treatment group TG and response Y are correlated, as are P{E} and TG.

P{E} should be unbalanced, possibly prognostic.

But P{E} should not predict Y within a given TG.

Consider two patients who receive E, one known up front to get E (P{E}=1), one not (P{E}=0.50).

5. Correcting Selection Bias (2/5)

If E[Y|TG=E, P{E}] depends on P{E}, then P{E} is on the causal pathway of the mechanism of action of E; this would suggest selection bias.

For example, consider a study with 24 patients, 12 blocks of size two each, six each of EC and CE.

P{E}=0.5 if block position BP=1, P{E}=0 if BP=2 (EC block), and P{E}=1 if BP=2 (CE block).

Suppose that the response data turn out as follows (responders / patients).

TG    BP=2, P{E}=0   BP=1, P{E}=1/2   BP=2, P{E}=1   Total
C     0/6            3/6              0/0            3/12
E     0/0            3/6              6/6            9/12

5. Correcting Selection Bias (3/5)

Fisher’s exact p-values are 0.04 (two-sided) or 0.02 (one-sided) for comparing either E to C or EC blocks to CE blocks; p=0.0003 one-sided or p=0.0007 two-sided for testing for trend in P{E} binomial proportions (Jonckheere-Terpstra).

So P{E} is even more predictive than treatment is!

Without allocation concealment P{E} is a perfect predictor of treatment group (TG), but allocation concealment (meaning the ability to predict but not observe) separates the effects of P{E} and TG.
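The Fisher's exact p-values for E versus C (9/12 versus 3/12 responders) can be checked directly from the hypergeometric tail. This is a minimal standard-library sketch; the function name is made up here, and the two-sided value is obtained by doubling, which is only appropriate because this particular table has symmetric margins. The Jonckheere-Terpstra trend p-values are not reproduced.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] with fixed margins
    (upper tail of the hypergeometric distribution)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)

# E: 9 responders, 3 non-responders; C: 3 responders, 9 non-responders.
p_one = fisher_one_sided(9, 3, 3, 9)
p_two = min(1.0, 2 * p_one)  # doubling is valid here: the margins are symmetric
print(round(p_one, 4), round(p_two, 4))
```

Rounded to two decimals these give 0.02 and 0.04, matching the values quoted on the slide.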

5. Correcting Selection Bias (4/5)

The Berger-Exner test of selection bias [16] exploits this separation of effects, and is based on the ability of P{E} to predict Y, adjusting for TG.
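The idea of letting P{E} predict Y while holding TG fixed can be illustrated with the 24-patient example. The sketch below only tabulates response rates by P{E} within each treatment group; it is not the published Berger-Exner test statistic, and the data layout and function name are assumptions for illustration.

```python
# Responders / patients from the 24-patient example, keyed by (TG, P{E}).
data = {
    ("C", 0.0): (0, 6),
    ("C", 0.5): (3, 6),
    ("E", 0.5): (3, 6),
    ("E", 1.0): (6, 6),
}

def response_rates_within(tg):
    """Response rate at each P{E} value among patients who received `tg`."""
    return {p_e: r / n
            for (group, p_e), (r, n) in sorted(data.items())
            if group == tg}

for tg in ("C", "E"):
    print(tg, response_rates_within(tg))
```

Within each group the response rate rises with P{E} (0 to 0.5 in C, 0.5 to 1 in E), which is the pattern that would suggest selection bias.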

The quantity P{E} can also be used to correct for selection bias, because there is no bias within a group of patients with the same P{E} value.

That is, P{E} is a balancing score much like the propensity score (used in observational studies).

P{E} functions as the propensity score, and was termed the “reverse propensity score” [17].

So compare TGs within P{E} values [17] to ensure that the comparisons are free of bias.
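Comparing treatment groups within P{E} values can be sketched on the same 24-patient example. This is a plain stratified difference in response rates under assumed names, not the estimator of [17]; strata containing only one treatment group contribute no comparison.

```python
# Responders / patients from the 24-patient example, keyed by (TG, P{E}).
data = {
    ("C", 0.0): (0, 6),
    ("C", 0.5): (3, 6),
    ("E", 0.5): (3, 6),
    ("E", 1.0): (6, 6),
}

def treatment_difference_by_stratum(data):
    """E-minus-C response rate within each P{E} stratum holding both arms."""
    diffs = {}
    for p_e in sorted({p for _, p in data}):
        if ("E", p_e) in data and ("C", p_e) in data:
            r_e, n_e = data[("E", p_e)]
            r_c, n_c = data[("C", p_e)]
            diffs[p_e] = r_e / n_e - r_c / n_c
    return diffs

crude = 9 / 12 - 3 / 12  # unadjusted E-minus-C difference
print("crude difference:", crude)
print("within P{E}:", treatment_difference_by_stratum(data))
```

The crude difference of 0.5 disappears in the only stratum where both groups are present (P{E}=0.5), so in this constructed example the apparent treatment effect is entirely attributable to selection.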

5. Correcting Selection Bias (5/5)

That is, the suggestion is to use the reverse propensity score (RPS) as a covariate, although it is an unusual covariate.

We might call the RPS a “reverse causality” covariate, because it does not bring about better outcomes but rather suggests that the patient was found to possess attributes that would do so.

So the RPS is a credential that reflects selection based on all attributes, but is not itself an attribute.

Further work is needed to clarify if the RPS should replace or supplement other covariates.

6. Further Reading (1/4)

More information is available -- just send me a message and I will send you articles.

Vance Berger, [email protected]@nih.gov, (301) 435-5303

6. Further Reading (2/4)

[1]. Berger, VW, Weinstein, S (2004). Ensuring the Comparability of Comparison Groups: Is Randomization Enough? Controlled Clinical Trials 25, 515-524.

[2]. Schor, S. (1971). The University Group Diabetes Program: A Statistician Looks at the Mortality Results. JAMA 217, 12, 1671-1675.

[3]. Altman, D. G. (1985). Comparability of Randomized Groups. The Statistician 34, 125-136.

[4]. Fentiman, I. S., Rubens, R. D., Hayward, J. L. (1983). Control of Pleural Effusions in Patients with Breast Cancer. Cancer 52, 737-739.

[5]. Hallstrom, A., Davis, K. (1988). Imbalance in Treatment Assignments in Stratified Blocked Randomization. Controlled Clinical Trials 9, 375-382.

[6]. Weiss, R. B., Gill, G. G., and Hudis, C. A. (2001). An On-Site Audit of the South African Trial of High-Dose Chemotherapy for Metastatic Breast Cancer and Associated Publications. Journal of Clinical Oncology 19, 11, 2771-2777.

6. Further Reading (3/4)

[7]. Bezwoda, W. R., Seymour, L., and Dansey, R. D. (1995). High-Dose Chemotherapy with Hematopoietic Rescue as Primary Treatment for Metastatic Breast Cancer: A Randomized Trial. Journal of Clinical Oncology 13, 2483-2489.

[8]. Stevenson, H. C., Davis, G. (1994). Impact of Culturally Sensitive AIDS Video Education on the AIDS Risk Knowledge of African American Adolescents. AIDS Education and Prevention 6, 40-52.

[9]. Marcus SM (2001). Sensitivity Analysis for Subverting Randomization in Controlled Trials. Statistics in Medicine 20, 545-555.

[10]. Jordhoy, M. S., Fayers, P. M., Ahlner-Elmqvist, M., Kaasa, S. (2002). Lack of Concealment May Lead To Selection Bias in Cluster Randomized Trials of Palliative Care. Palliative Medicine 16, 43-49.

[11]. Matts, J. P. and McHugh, R. B. (1983). Conditional Markov chain design for accrual clinical trials. Biometrical Journal 25, 563-577.

[12]. Berger, VW, Christophi, CA (2003). Randomization Technique, Allocation Concealment, Masking, and Susceptibility of Trials to Selection Bias. JMASM 2, 1, 80-86.

[13]. Berger, VW (2004). Selection Bias and Baseline Imbalances in Randomized Trials. Drug Information Journal 38, 1-2.

6. Further Reading (4/4)

[14]. Berger, VW (2005). Selection Bias and Covariate Imbalances in Randomized Clinical Trials. John Wiley & Sons, Chichester.

[15]. Berger, VW (2005). Quantifying the Magnitude of Baseline Covariate Imbalances Resulting from Selection Bias in Randomized Clinical Trials (with discussion). Biometrical Journal 47, 2, 119-139.

[16]. Berger, VW, Exner, DV (1999). Detecting Selection Bias in Randomized Clinical Trials. Controlled Clinical Trials 20, 319-327.

[17]. Berger, VW (2005). The Reverse Propensity Score To Manage Baseline Imbalances in Randomized Trials. Statistics in Medicine 24, in press.
