Multiplicity in Clinical Trials Ziad Taib Biostatistics AstraZeneca March 12, 2012


Multiplicity in Clinical Trials

Ziad Taib

Biostatistics

AstraZeneca

March 12, 2012

Issues

• The multiplicity problem
• Sources of multiplicity in clinical trials
• Bonferroni
• Holm
• Hochberg
• Closed test procedures
• FDR (Benjamini-Hochberg)

The multiplicity problem

• When we perform one test there is a risk of α (e.g. 5%) of a falsely significant result, i.e. rejecting H0 (no effect) when it is actually true.

• What about the risk for at least one false significant result when performing many tests?

Greater or smaller?

When performing 20 independent tests at α = 0.05, we should expect about one significant result even when no true difference exists.

Probability of at least one false significant result (α = 0.05):

Number of tests (k)    Probability
1                      0.05
2                      0.0975
5                      0.226
10                     0.401
50                     0.923

P(at least one false positive result) = 1 − P(zero false positive results) = 1 − (1 − 0.05)^k
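The table values follow directly from the formula; a minimal Python sketch reproducing them:

```python
# Probability of at least one false positive among k independent
# tests, each performed at significance level alpha.
def fwe(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

for k in [1, 2, 5, 10, 50]:
    print(k, round(fwe(k), 4))
# 1  -> 0.05
# 2  -> 0.0975
# 50 -> 0.9231
```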

Multiplicity Dimensions

• A. Multiple treatments

• B. Multiple variables

• C. Multiple time points

• D. Interim analyses

• E. Subgroup analyses

The multiplicity problem

• Doing a lot of tests will give us significant results just by chance.

• We want to find methods to control this risk (error rate).

• The same problem arises when considering many confidence intervals simultaneously.

Family wise error rate

• FWE = probability of observing a false positive finding in any of the tests undertaken

• While there may be different opinions about the need to adjust:
  – Regulatory authorities are concerned about any false claims for the effectiveness of a drug, not just for the claim based on the primary endpoint(s)
  – So we will need to demonstrate adequate control of the FWE rate

• It's not just about the p-value!
  – True! Estimates and confidence intervals are important too
  – Ideally, multiplicity methods need to handle these as well

Procedures for controlling the probability of false significances

• Bonferroni

• Holm

• Hochberg

• Closed tests

• FDR

Bonferroni

• N different null hypotheses H1, … HN

• Calculate corresponding p-values p1, … pN

• Reject Hk if and only if pk < α/N

Variation: the limits may be unequal as long as they sum up to α

Conservative

Bonferroni’s inequality

• P(Ai) = P(reject H0i when it is true) = α/N

P(reject at least one hypothesis falsely) = P(A1 ∪ … ∪ AN) ≤ P(A1) + … + P(AN) = N · (α/N) = α

Example of Bonferroni correction

• Suppose we have N = 3 t-tests.
• Assume target alpha = 0.05.
• The Bonferroni-corrected significance level is alpha/N = 0.05/3 ≈ 0.0167.
• Unadjusted p-values are p1 = 0.001, p2 = 0.013, p3 = 0.074.
  – p1 = 0.001 < 0.0167, so reject H01
  – p2 = 0.013 < 0.0167, so reject H02
  – p3 = 0.074 > 0.0167, so do not reject H03
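A minimal Python sketch of the Bonferroni rule, run on the example above (True means reject the null hypothesis):

```python
# Bonferroni: compare every p-value to alpha / N.
def bonferroni(pvals, alpha=0.05):
    cutoff = alpha / len(pvals)          # 0.05 / 3 ≈ 0.0167 here
    return [p < cutoff for p in pvals]   # True = reject H0

print(bonferroni([0.001, 0.013, 0.074]))  # [True, True, False]
```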

Holm

• N different null hypotheses H01, … H0N

• Calculate corresponding p-values p1, … pN

• Order the p-values from the smallest to the largest, p(1) < ….<p(N)

• Start with the smallest p-value and reject H(j) as long as p(j) < α/(N − j + 1)

Example of Holm’s test

• Suppose we have N = 3 t-tests.
• Assume target alpha = 0.05.
• Unadjusted p-values are p1 = 0.001, p2 = 0.013, p3 = 0.074.
• For the jth test, calculate alpha(j) = alpha/(N − j + 1).
  – For test j = 1, alpha(1) = 0.05/(3 − 1 + 1) = 0.0167. The observed p1 = 0.001 is less than 0.0167, so we reject the null hypothesis.
  – For test j = 2, alpha(2) = 0.05/(3 − 2 + 1) = 0.05/2 = 0.025. The observed p2 = 0.013 is less than 0.025, so we reject the null hypothesis.
  – For test j = 3, alpha(3) = 0.05/(3 − 3 + 1) = 0.05. The observed p3 = 0.074 is greater than 0.05, so we do not reject the null hypothesis.
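Holm's step-down procedure can be sketched in a few lines of Python; this runs on the example above and stops at the first non-significant test:

```python
# Holm's step-down procedure: test sorted p-values against
# alpha/(N - j + 1) and stop at the first failure.
def holm(pvals, alpha=0.05):
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # smallest p first
    reject = [False] * n
    for j, i in enumerate(order):
        if pvals[i] < alpha / (n - j):   # j is 0-based: alpha/(N - j + 1)
            reject[i] = True
        else:
            break                        # stop; accept all remaining hypotheses
    return reject

print(holm([0.001, 0.013, 0.074]))  # [True, True, False]
```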

Hochberg

• N different null hypotheses H1, … HN

• Calculate corresponding p-values p1, … pN

• Order the p-values from the smallest to the largest, p(1) < ….<p(N)

• Start with the largest p-value. If p(N) < α, stop and declare all comparisons significant at level α (i.e. reject H(1) … H(N)). Otherwise accept H(N) and go to the next step.

• If p(N−1) < α/2, stop and declare H(1) … H(N−1) significant. Otherwise accept H(N−1) and go to the next step.

• …
• At step k: if p(N−k+1) < α/k, stop and declare H(1) … H(N−k+1) significant. Otherwise accept H(N−k+1) and go to the next step.
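The step-up scan can be sketched as follows: starting from the largest p-value, the first time p(N−k+1) < α/k we reject that hypothesis and every hypothesis with a smaller p-value.

```python
# Hochberg's step-up procedure: scan from the largest p-value down;
# at step k compare against alpha / k.
def hochberg(pvals, alpha=0.05):
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)  # largest first
    reject = [False] * n
    for k, i in enumerate(order, start=1):
        if pvals[i] < alpha / k:
            for j in order[k - 1:]:      # reject this and all smaller p-values
                reject[j] = True
            break
    return reject

print(hochberg([0.009, 0.011, 0.012, 0.134, 0.512]))  # [True, True, True, False, False]
```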

Closed procedures - stepwise

• Pre-specify the order of the tested hypotheses. Test at the 5% level until the first non-significant result.

• Order of the tested hypotheses stated in the protocol
  – Dose-response
  – Factorial designs
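A minimal sketch of such a fixed-sequence procedure: each hypothesis is tested at the full 5% level in the pre-specified order, and testing stops at the first non-significant result. The p-values in the usage line are hypothetical.

```python
# Fixed-sequence (stepwise) testing: hypotheses are tested in a
# pre-specified order, each at the full alpha level, stopping at
# the first non-significant result.
def fixed_sequence(pvals_in_order, alpha=0.05):
    reject = []
    for p in pvals_in_order:
        if p < alpha:
            reject.append(True)
        else:
            break                # stop; remaining hypotheses are not tested
    reject += [False] * (len(pvals_in_order) - len(reject))
    return reject

print(fixed_sequence([0.001, 0.020, 0.300, 0.010]))  # [True, True, False, False]
```

Note that the last hypothesis is not rejected even though its p-value is below 0.05: once the sequence stops, no later hypothesis can be tested. This is what protects the FWE at level α.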

Example

• Assume we performed N = 5 hypothesis tests simultaneously and want to control the FWE at level 0.05. The p-values obtained were

p(1) 0.009

p(2) 0.011

p(3) 0.012

p(4) 0.134

p(5) 0.512

• Bonferroni: 0.05/5=0.01. Since only p(1) is less than 0.01 we reject H(1) but accept the remaining hypotheses.

• Holm: p(1), p(2) and p(3) are less than 0.05/5, 0.05/4 and 0.05/3 respectively, so we reject the corresponding hypotheses H(1), H(2) and H(3). But p(4) = 0.134 > 0.05/2 = 0.025, so we stop and accept H(4) and H(5).

• Hochberg:
  – 0.512 is not less than 0.05, so we accept H(5)
  – 0.134 is not less than 0.025, so we accept H(4)
  – 0.012 is less than 0.05/3 ≈ 0.0167, so we reject H(1), H(2) and H(3)
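The three conclusions above can be checked numerically; a self-contained Python sketch over the same five p-values:

```python
# Numerical check of the worked example (alpha = 0.05,
# p-values already sorted from smallest to largest).
alpha, p = 0.05, [0.009, 0.011, 0.012, 0.134, 0.512]
n = len(p)

# Bonferroni: compare every p-value to alpha/N = 0.01.
bonf = [pi < alpha / n for pi in p]

# Holm (step-down): alpha/(N - j + 1) for j = 1..N, stop at first failure.
holm = []
for j, pi in enumerate(p, start=1):
    holm.append(pi < alpha / (n - j + 1) and all(holm))

# Hochberg (step-up): find the first k (largest p-value first)
# with p(N - k + 1) < alpha/k, then reject everything below it.
hoch = [False] * n
for k in range(1, n + 1):
    if p[n - k] < alpha / k:
        hoch[: n - k + 1] = [True] * (n - k + 1)
        break

print(bonf)  # [True, False, False, False, False]
print(holm)  # [True, True, True, False, False]
print(hoch)  # [True, True, True, False, False]
```

As on the slide, Bonferroni rejects only H(1), while Holm and Hochberg both reject H(1), H(2) and H(3).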

Questions or Comments?