Biostatistics Case Studies 2007


Peter D. Christenson

Biostatistician

http://gcrc.labiomed.org/Biostat

Session 5:

Demonstrating Lack of Treatment Effect: Equivalence or Non-inferiority

Terminology

Superiority and/or Inferiority Study:

• Two or more treatments are assumed equal and the study is designed to find overwhelming evidence of a difference.

• Usually, one treatment is a control, sham, or placebo.

• Most common comparative study type.

• It is rare to assess only one of superiority or inferiority (“one-sided” statistical tests), unless one of them is biologically impossible.


Equivalence Study:

• Two treatments are assumed to differ and the study is designed to find overwhelming evidence that they are equal.

• Usually, the quantity of interest is a measure of biological activity or potency, and the “treatments” are drugs or lots or batches of drugs.

• AKA, bioequivalence.

• Sometimes used to compare clinical outcomes for two active treatments, e.g., statins or vaccines, if neither treatment can be considered standard or accepted. This usually requires large numbers of subjects.


Non-Inferiority Study:

• Usually a new treatment or regimen is compared with an accepted treatment or regimen or standard of care.

• The new treatment is assumed inferior to the standard, and the study is designed to find overwhelming evidence that it is at least nearly as good, i.e., non-inferior. It may have other advantages, e.g., oral vs. injected administration.

• A negative inferiority study fails to detect inferiority, but does not necessarily give evidence for non-inferiority.

• The accepted treatment is usually known to be efficacious already, but an added placebo group may also be used.

• The distinguishing feature is the attempt to prove a negative (no meaningful difference), not the one-sidedness of the inference.

Case Study

$p_{\text{ASA+PPI}} = 1.5\%$

Demonstrate: $p_{\text{clop}} - p_{\text{ASA+PPI}} \le 4\%$

N = 145/group. Power = 80% for what?

Typical Analysis: Inferiority or Superiority

H0: $p_{\text{clop}} - p_{\text{ASA+PPI}} = 0\%$
H1: $p_{\text{clop}} - p_{\text{ASA+PPI}} \ne 0\%$
H1 → therapies differ

α = 0.05
Power = 80% for $\Delta = |p_{\text{clop}} - p_{\text{ASA+PPI}}|$ = ?

[Figure: 95% CIs for $p_{\text{clop}} - p_{\text{ASA+PPI}}$ relative to 0. Panels: “Clop inferior” (CI entirely above 0), “Clop superior” (CI entirely below 0), “No diff detected*” (CI includes 0).]

[Not used in this paper]

* and 80% chance that a Δ of (?) or more would be detected.







So, with N = 331/group there is an 80% chance that a Δ of 4% or more would be detected.

Detectable Δ = 5.5% - 1.5% = 4%

α = 0.05
Power = 80% for $\Delta = |p_{\text{clop}} - p_{\text{ASA+PPI}}| = 4\%$
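The N = 331/group figure can be reproduced with the usual normal-approximation sample-size formula for comparing two proportions. A minimal Python sketch, assuming a two-sided pooled-variance z test without continuity correction (the online calculator's exact method is not stated on the slides); the function name is hypothetical:

```python
import math
from statistics import NormalDist


def n_per_group_two_sided(p1: float, p2: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects per group to detect p2 - p1 with a two-sided two-proportion z test.

    Assumption: pooled variance under H0, unpooled under H1, no continuity
    correction.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # two-sided critical value
    z_beta = z(power)
    p_bar = (p1 + p2) / 2
    root_n = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(root_n ** 2 / (p2 - p1) ** 2)


# 1.5% (ASA+PPI) vs 5.5% (clop): a 4% difference.
print(n_per_group_two_sided(0.015, 0.055))  # 331
```

Under these assumptions the formula gives 331 per group, more than twice the 145/group actually enrolled.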


Note that this could be formulated as two one-sided tests (TOST):

H0: $p_{\text{clop}} - p_{\text{ASA+PPI}} \le 0\%$
H1: $p_{\text{clop}} - p_{\text{ASA+PPI}} > 0\%$
H1 → clop inferior
α = 0.025
Power = 80% for $p_{\text{clop}} - p_{\text{ASA+PPI}} = 4\%$

H0: $p_{\text{clop}} - p_{\text{ASA+PPI}} \ge 0\%$
H1: $p_{\text{clop}} - p_{\text{ASA+PPI}} < 0\%$
H1 → clop superior
α = 0.025
Power = 80% for $p_{\text{clop}} - p_{\text{ASA+PPI}} = -4\%$
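The two-sided test at α = 0.05 rejects exactly when one of the two one-sided tests at α = 0.025 rejects, because the two-sided p-value is twice the smaller one-sided p-value. A small sketch, assuming a normal (z) test on an estimated difference with a known standard error; the function names and the example numbers are hypothetical, not from the paper:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def one_sided_p_values(diff: float, se: float) -> tuple[float, float]:
    """One-sided p-values for diff = p_clop - p_ASA+PPI with standard error se.

    Returns (p for H1: diff > 0, clop inferior;
             p for H1: diff < 0, clop superior).
    """
    z = diff / se
    return 1 - STD_NORMAL.cdf(z), STD_NORMAL.cdf(z)


def two_sided_p_value(diff: float, se: float) -> float:
    """Two-sided p-value: twice the smaller one-sided p-value."""
    return 2 * min(one_sided_p_values(diff, se))


def decide(diff: float, se: float, alpha_each: float = 0.025) -> str:
    """Run both one-sided tests, each at alpha_each."""
    p_inferior, p_superior = one_sided_p_values(diff, se)
    if p_inferior < alpha_each:
        return "clop inferior"
    if p_superior < alpha_each:
        return "clop superior"
    return "no difference detected"


# Hypothetical estimate: diff = 5 percentage points, se = 2 points (z = 2.5).
print(decide(5.0, 2.0))                       # clop inferior
print(round(two_sided_p_value(5.0, 2.0), 4))  # 0.0124
```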

Demonstrating Equivalence

H0: $|p_{\text{clop}} - p_{\text{ASA+PPI}}| \ge E\%$
H1: $|p_{\text{clop}} - p_{\text{ASA+PPI}}| < E\%$
H1 → therapies “equivalent”, within E


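Demonstrating Equivalence

H0: $|p_{\text{clop}} - p_{\text{ASA+PPI}}| \ge 4\%$
H1: $|p_{\text{clop}} - p_{\text{ASA+PPI}}| < 4\%$
H1 → equivalence

α = 0.05
Power = 80% for $p_{\text{clop}} - p_{\text{ASA+PPI}} = 0\%$

Note that this could be formulated as two one-sided tests (TOST):

H0: $p_{\text{clop}} - p_{\text{ASA+PPI}} \le -4\%$
H1: $p_{\text{clop}} - p_{\text{ASA+PPI}} > -4\%$
H1 → clop non-superior
α = 0.025
Power = 80% for $p_{\text{clop}} - p_{\text{ASA+PPI}} = 0\%$

H0: $p_{\text{clop}} - p_{\text{ASA+PPI}} \ge 4\%$
H1: $p_{\text{clop}} - p_{\text{ASA+PPI}} < 4\%$
H1 → clop non-inferior
α = 0.025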

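[Figure: 95% CIs for $p_{\text{clop}} - p_{\text{ASA+PPI}}$, with reference lines at -4%, 0, and 4%. Panels: “Clop non-superior” (CI entirely above -4%), “Clop non-inferior” (CI entirely below 4%), “Equivalence*” (CI entirely between -4% and 4%).]

* both non-superior and non-inferior.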
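The three panels amount to comparing the two ends of one 95% CI with the margins. A sketch, assuming the CI and the one-sided tests at α = 0.025 rest on the same normal approximation, so that each test rejects exactly when the matching CI bound clears its threshold; inputs are in percentage points, and the function name and example intervals are hypothetical:

```python
def ci_conclusions(lower: float, upper: float, margin: float) -> dict[str, bool]:
    """Conclusions supported by a 95% CI for p_clop - p_ASA+PPI.

    margin is E (4 percentage points in this case study).
    """
    non_superior = lower > -margin  # rejects H0: diff <= -E
    non_inferior = upper < margin   # rejects H0: diff >= E
    return {
        "clop non-superior": non_superior,
        "clop non-inferior": non_inferior,
        "equivalent": non_superior and non_inferior,  # both H0s rejected
    }


print(ci_conclusions(-1.5, 2.5, 4.0))  # all three True
print(ci_conclusions(-1.5, 5.0, 4.0))  # only non-superior is True
```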

This Paper: Inferiority and Non-Inferiority

Apparently, two one-sided tests (TOST), but only one explicitly powered:

α = 0.025
Power = 80% for $p_{\text{clop}} - p_{\text{ASA+PPI}}$ = ?%

α = 0.025

The authors chose E = 4% as the largest difference between the therapies for which they would still be considered equivalent.

[Figure: 95% CIs for $p_{\text{clop}} - p_{\text{ASA+PPI}}$, with reference lines at -4%, 0, and 4%. Panels: “Clop inferior” (CI entirely above 0), “Clop non-inferior” (CI entirely below 4%), “Non-clinical” inferiority* (CI above 0 but entirely below 4%).]

* clop is statistically inferior, but not enough for clinical significance.
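The paper's pair of one-sided tests can also be read off a single 95% CI. A sketch of the decision rule in the figure, assuming percentage-point inputs and E = 4; the function name and the example intervals are hypothetical:

```python
def this_paper_decision(lower: float, upper: float, margin: float = 4.0) -> str:
    """Classify a 95% CI for p_clop - p_ASA+PPI under the inferiority and
    non-inferiority tests (each at one-sided alpha = 0.025)."""
    inferior = lower > 0           # rejects H0: diff <= 0
    non_inferior = upper < margin  # rejects H0: diff >= E
    if inferior and non_inferior:
        return '"non-clinical" inferiority'
    if inferior:
        return "clop inferior"
    if non_inferior:
        return "clop non-inferior"
    return "neither shown"


for ci in [(0.5, 3.0), (-1.0, 3.0), (1.0, 6.0), (-1.0, 6.0)]:
    print(ci, this_paper_decision(*ci))
```

The last case, a CI that covers both 0 and 4%, is the one the study cannot resolve at this N.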

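Decisions:

Observed results: $p_{\text{clop}} = 8.6\%$; $p_{\text{ASA+PPI}} = 0.7\%$; 95% CI for $p_{\text{clop}} - p_{\text{ASA+PPI}}$ = 3.4% to 12.4%

[Figure: the observed CI plotted on an axis of $p_{\text{clop}} - p_{\text{ASA+PPI}}$ marked at -4%, 0, 4%, and 12%.]

Clop inferior: the whole CI lies above 0. Non-inferiority is not shown, because the upper limit (12.4%) exceeds the 4% margin.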
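The reported numbers can be checked for internal consistency and run through the decision rule. A sketch, assuming the CI is a symmetric Wald-type interval (estimate ± 1.96 × SE); since the analysis used cumulative 12-month incidence rather than simple percentages, the implied SE is only a rough back-calculation, not a value reported by the authors:

```python
from statistics import NormalDist

# Reported results (percentages), from the slide.
p_clop, p_asa_ppi = 8.6, 0.7
ci_lower, ci_upper = 3.4, 12.4  # 95% CI for p_clop - p_ASA+PPI
margin = 4.0                    # E chosen by the authors

observed_diff = p_clop - p_asa_ppi       # 7.9 percentage points
ci_midpoint = (ci_lower + ci_upper) / 2  # also 7.9: CI centred on the estimate
half_width = (ci_upper - ci_lower) / 2   # 4.5 percentage points
# Assumption: symmetric Wald-type interval, so half-width = 1.96 * SE.
implied_se = half_width / NormalDist().inv_cdf(0.975)

clop_inferior = ci_lower > 0           # whole CI above 0
clop_non_inferior = ci_upper < margin  # upper bound must fall below E

print(f"diff = {observed_diff:.1f}, midpoint = {ci_midpoint:.1f}, SE ~ {implied_se:.2f}")
print(clop_inferior, clop_non_inferior)  # True False
```

A half-width of 4.5 points is wider than the 4-point margin itself, which is why this N could not have established non-inferiority even for a small true difference.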

Power for Test of Clopidogrel Non-Inferiority

α = 0.025
Power = 80% for $p_{\text{clop}} - p_{\text{ASA+PPI}} = 0\%$
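The paper's N = 145/group is consistent with the standard normal-approximation sample-size formula for a non-inferiority comparison of two proportions. A sketch, assuming unpooled variances, one-sided α = 0.025, both true rates equal to 1.5% (true difference 0%), and the 4% margin; the function name is hypothetical, and the authors' own calculation method is not given on the slides:

```python
import math
from statistics import NormalDist


def n_per_group_noninferiority(p_new: float, p_std: float, margin: float,
                               alpha: float = 0.025, power: float = 0.80) -> int:
    """Subjects per group to show p_new - p_std < margin (higher rate = worse).

    One-sided normal approximation with unpooled variances; the expected
    true difference is p_new - p_std.
    """
    z = NormalDist().inv_cdf
    variance_sum = p_new * (1 - p_new) + p_std * (1 - p_std)
    distance = margin - (p_new - p_std)  # gap between the truth and the margin
    return math.ceil((z(1 - alpha) + z(power)) ** 2 * variance_sum / distance ** 2)


print(n_per_group_noninferiority(p_new=0.015, p_std=0.015, margin=0.04))  # 145
```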

Power for Test of Clopidogrel Inferiority

α = 0.025
Power = 80% for $p_{\text{clop}} - p_{\text{ASA+PPI}} = 7.3\%$

Detectable Δ = 8.8% - 1.5% = 7.3%
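The detectable Δ for the inferiority test can be approximated by searching for the clop rate at which power reaches 80% with N = 145/group. A sketch, assuming a one-sided pooled-variance z test at α = 0.025, a 1.5% ASA+PPI rate, and power that increases with the clop rate; under these assumptions the result comes out a little above 7%, close to but not exactly the calculator's 7.3%, presumably because the calculator's method differs. Function names are hypothetical:

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_inferiority(p_std: float, p_new: float, n: int, alpha: float = 0.025) -> float:
    """Approximate power of a one-sided z test of H1: p_new > p_std."""
    p_bar = (p_std + p_new) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n)  # pooled SE under H0
    se_alt = math.sqrt((p_std * (1 - p_std) + p_new * (1 - p_new)) / n)  # SE under H1
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(((p_new - p_std) - z_crit * se_null) / se_alt)


def detectable_rate(p_std: float, n: int, alpha: float = 0.025,
                    target_power: float = 0.80) -> float:
    """p_new at which power reaches target_power, found by bisection."""
    lo, hi = p_std, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        if power_inferiority(p_std, mid, n, alpha) < target_power:
            lo = mid
        else:
            hi = mid
    return hi


p_new = detectable_rate(p_std=0.015, n=145)
print(f"p_clop = {p_new:.2%}, detectable delta = {p_new - 0.015:.2%}")
```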

Conclusions: This Paper

• In this paper, clop was so inferior that the investigators were apparently lucky to have enough power to detect it. With this N, the CI was too wide to detect a smaller therapy difference.

• The investigators justify testing non-inferiority of clop only (and not of aspirin + Nexium) by the lower desirability of combination therapy (?).

• This would have been a good approach to sizing and powering a study of a new competing therapy against a standard, had the N needed to detect clop inferiority also been considered.

• Note that the power calculations were based on simple percentages of subjects, whereas cumulative 12-month incidence was used in the analysis. I am not aware of power calculations for equivalence tests based on survival analysis.

Conclusions: General

• “Negligibly inferior” would be a better term than non-inferior.

• All inference can be based on confidence intervals.

• Pre-specify the comparisons to be made. Cannot test for both non-inferiority and superiority.

• Compute power for one comparison or for several, e.g., non-inferiority and inferiority. Power can differ between comparisons.

• Very careful consideration must be given to the choice of the margin of equivalence (4% here). The study is worthless if others in the field would consider your margin too large.

FDA Guidelines

• http://www.fda.gov/cder/guidance/4155fnl.pdf

• FDA has at least 4 major concerns:

1. Need strong evidence that standard treatment is effective.

2. Must have acceptable margin of equivalence that is much smaller than the effect of the standard over placebo.

3. Trial design must be very close to that which established the effectiveness of the standard treatment.

4. Study conduct must be high quality. This sounds like business-speak about “excellence”, but it really refers to the fact that superiority studies are conservative by nature: e.g., non-compliance and misclassification bias the results toward no effect. The same flaws in a non-inferiority study have the same bias, which makes it easier to falsely demonstrate non-inferiority.

Appendix: Possible Errors in Study Conclusions

Typical study to demonstrate superiority/inferiority

Study Claims \ Truth | H0: No Effect           | H1: Effect
No Effect            | Correct (Specificity)   | Error (Type II)
Effect               | Error (Type I)          | Correct (Sensitivity = Power)

Set α = 0.05 → Specificity = 95%.
Power: maximize; choose N for 80%.

Appendix: Graphical Representation of Power

Typical study to demonstrate superiority/inferiority

[Figure: distributions of the effect (Group B mean - Group A mean) under H0 (true effect = 0) and HA (true effect = 3), N = 100 per group; effect in study = 1.13. Larger Ns give narrower curves.]

\\\ = Probability of concluding HA if H0 is true: 5%.
/// = Probability of concluding H0 if HA is true: 41%. Power = 100 - 41 = 59%.

Power is greater with a larger N, a true effect larger than 3, and/or less subject heterogeneity.
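The three levers named above (N, true effect, heterogeneity) can be checked numerically. A sketch, assuming a two-sided z test comparing two group means with a known common SD; the SD of 15 is hypothetical because the slide does not give one, so the printed values are not meant to reproduce the figure's 59%:

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_sample_means(effect: float, sd: float, n_per_group: int,
                           alpha: float = 0.05) -> float:
    """Power of a two-sided z test of Group B mean - Group A mean (known common SD)."""
    se = sd * math.sqrt(2 / n_per_group)  # SE of the difference in means
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect / se
    # Probability that the observed effect falls beyond either critical value.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


# True effect 3 as in the figure; SD = 15 is a hypothetical value.
for n in (50, 100, 200):
    print(n, round(power_two_sample_means(3.0, 15.0, n), 2))
```

With a true effect of 0 the function returns α itself, the \\\ area in the figure.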

Appendix: Online Study Size / Power Calculator

www.stat.uiowa.edu/~rlenth/Power

Does NOT include tests for equivalence, non-inferiority, or non-superiority.
