
1

From bench to the bedside

Statistics Issues in RCT

Ferran Torres
Biostatistics and Data Management Platform
IDIBAPS - Hospital Clinic Barcelona
Universitat Autònoma Barcelona

EMA: Scientific Advice Working Party (SAWP), Biostatistics Working Party (BSWP).

2

Disclaimer

• The opinions expressed today are personal views and should not be understood or quoted as being made on behalf of any organization.

– Regulatory
  • Spanish Medicines Agency (AEMPS)
  • European Medicines Agency (EMA)
    – Scientific Advice Working Party (SAWP)
    – Biostatistics Working Party (BSWP)

– Hospital - Academic - Independent Research
  • IDIBAPS. Hospital Clinic Barcelona
  • Autonomous University of Barcelona (UAB)
  • SCREN. Spanish Clinical Trials Platform

Documentation

3

Documentation

• Power Point presentation
• Selected references
• Direct links to guidelines

4

http://ferran.torres.name/edu/stats_rct

Password: stats_rct

5

Globalisation

• Lack of harmonisation: data to register in all regions (Japan, USA, EU)
• Similar basic technical requirements
• International Conference on Harmonisation (ICH): www.ich.org

6

Regulatory Agencies

7

8

9

• CPMP/EWP/908/99 CPMP Points to Consider on Multiplicity issues in Clinical Trials

• CPMP/EWP/2863/99 Points to Consider on Adjustment for Baseline Covariates

• CPMP/2330/99 Points to Consider on Application with 1.) Meta-analyses and 2.) One Pivotal study

• Choice of a Non-Inferiority Margin

• CPMP/EWP/482/99 Points to Consider on Switching between Superiority and Non-inferiority

• CPMP/EWP/1776/99 Points to Consider on Missing Data

• CHMP/EWP/83561/05 Guideline on Clinical Trials in Small Populations

• CHMP/EWP/2459/02 Reflection Paper on Methodological Issues in Confirmatory Clinical Trials with Flexible Design and Analysis Plan

Regulatory Guidances

10

“Scientific Recommendations”

• CONSORT Statement: Summary, General, non-inferiority

• Lancet: Methodological & Statistics Series

• BMJ: Statistics Notes (Bland & Altman)

11

http://www.equator-network.org

12

Today’s talk is on statistics

13

14

15

Basic statistics

• Why Statistics?
• Samples and populations
• P-Value
• Statistical errors
• Sample size
• Confidence Intervals
• Interpretation of CI: superiority, non-inferiority, equivalence

16

The role of statistics

“Thus statistical methods are no substitute for common sense and objectivity. They should never aim to confuse the reader, but instead should be a major contributor to the clarity of a scientific argument.”

The role of statistics. Pocock SJ. Br J Psychiat 1980; 137:188-190

17

Why Statistics?

•Variation!!!!

BACKGROUND

SAMPLE AND POPULATIONS
P-VALUE AND CONFIDENCE INTERVALS

18

19


20

Population and Samples

Target Population

Population of the Study

Sample

21

Extrapolation

Sample → Population

Inferential analysis: statistical tests, confidence intervals

Study results → “Conclusions”

22

P-value

• The p-value is a “tool” to answer the question:
  – Could the observed results have occurred by chance*?
  – Remember:
    • Decision given the observed results in a SAMPLE
    • Extrapolating results to the POPULATION

*: accounts exclusively for random error, not bias

p < .05: “statistically significant”

23

P-value: an intuitive definition

• The p-value is the probability of observing data at least as extreme as ours when the null hypothesis is true (no differences exist)

• Steps:
  1) Calculate the treatment difference in the sample (A-B)
  2) Assume that both treatments are equal (A=B) and then…
  3) …calculate the probability of obtaining a difference at least as large as the one observed, given the assumption in step 2
  4) Conclude according to that probability:
     a. p<0.05: the differences are unlikely to be explained by chance; we assume that the treatment explains the differences
     b. p>0.05: the differences could be explained by chance; we assume that chance may explain the differences
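The four steps above can be sketched with a permutation test. This is only one possible way to obtain a p-value, chosen here because it mirrors the steps literally; the function name, the number of permutations and the outcome values are all hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the difference in means (A - B)."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))      # step 1: observed difference
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                # step 2: if A = B, group labels are exchangeable
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:               # step 3: at least as large as observed
            extreme += 1
    # +1 in numerator and denominator counts the observed arrangement itself
    return (extreme + 1) / (n_perm + 1)

# Hypothetical outcome values for two treatment groups
group_a = [5.1, 6.0, 5.8, 6.4, 5.9, 6.2, 5.7, 6.1]
group_b = [4.2, 4.9, 5.0, 4.4, 4.8, 5.1, 4.6, 4.7]

p = permutation_p_value(group_a, group_b)
print(f"p = {p:.4f}")                      # step 4: compare with 0.05
```

Note that the result only addresses random error: a small p-value says nothing about bias in how the two groups were formed.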

24

Factors influencing statistical significance

• Signal: Difference

• Noise (background): Variance (SD)

• Quantity: Quantity of data

[Figure: two density plots of the difference (x-axis: difference, -2 to 4; y-axis: density)]

25

26

P-value. Some reflections

• The p-value tells us NOTHING about clinical or scientific importance, only how compatible the results are with chance alone.

• A “very low” p-value does NOT imply:
  – Clinical relevance (NO!!!)
  – Magnitude of the treatment effect (NO!!)

• With ↑ n or ↓ variability ⇒ ↓ p

• Please never compare p-values!! (NO!!!)
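A small sketch of why a p-value does not measure the size of an effect: the same difference gives very different p-values depending on the sample size. It assumes a two-sample z-test with known common SD and equal group sizes; the difference (2 units) and SD (20 units) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_p(diff, sd, n_per_group):
    """Two-sided p-value for a difference in means (known common SD, equal groups)."""
    se = sd * sqrt(2 / n_per_group)        # standard error of the difference
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same difference and same variability, increasing sample size
for n in (20, 100, 500, 2000):
    print(n, round(two_sample_z_p(diff=2.0, sd=20.0, n_per_group=n), 4))
```

The treatment effect is identical in every row; only the quantity of data changes, which is why p-values from different trials should not be compared.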

27

Interval Estimation

Confidence interval: confidence limit (lower), sample statistic (point estimate), confidence limit (upper)

Intuitive interpretation:

“A probability that the population parameter falls somewhere within the interval”

28

95% CI

• Better than p-values…
  – …use the data collected in the trial to give an estimate of the treatment effect size, together with a measure of how certain we are of our estimate

• A CI is a range of values within which the “true” treatment effect is believed to be found, with a given level of confidence.
  – A 95% CI is built so that, over repeated trials, the interval will contain the ‘true’ treatment effect 95% of the time

• Generally, a 95% CI is calculated as
  – Sample Estimate ± 1.96 x Standard Error
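As a minimal sketch of "Sample Estimate ± 1.96 x Standard Error", the following computes a 95% CI for a difference between two proportions. The Wald standard error used here is one common choice and is an assumption, as are the function name and the event counts.

```python
from math import sqrt

def risk_difference_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Difference in proportions (test - control) with a Wald confidence interval."""
    p_t, p_c = events_t / n_t, events_c / n_c
    diff = p_t - p_c                                   # sample estimate
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return diff, diff - z * se, diff + z * se

# Hypothetical trial: 30/200 events on test, 45/200 events on control
diff, lower, upper = risk_difference_ci(30, 200, 45, 200)
print(f"difference = {diff:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

In this made-up example the interval includes 0, so the data are compatible with no difference even though the point estimate favours the test arm.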

29

Superiority study

[Figure: 95% CI of the difference d. d > 0: positive effect (test better); d = 0: no differences; d < 0: negative effect (control better)]

DESIGN

STATISTICAL ERRORS
SAMPLE SIZE
MINIMUM IMPORTANT CLINICAL DIFFERENCE (MICD)

30

31

Type I & II Error & Power

                         Reality (population)
Conclusion (sample)      A=B                  A≠B
“A=B” (p>0.05)           OK                   Type II error (β)
“A≠B” (p<0.05)           Type I error (α)     OK

33

Type I & II Error & Power

• Type I Error (α)
  – False positive
  – Rejecting the null hypothesis when in fact it is true
  – Standard: α=0.05
  – In words, chance of finding statistical significance when in fact there truly was no effect

• Type II Error (β)
  – False negative
  – Failing to reject the null hypothesis when in fact the alternative is true
  – Standard: β=0.20 or 0.10
  – In words, chance of not finding statistical significance when in fact there was an effect

• Power = 1 - β

34

Sample size and MICD

$n = \dfrac{C \times \text{Variance}}{(\text{MICD})^2}$

C: function of α and β
MICD: Minimum Important Clinical Difference

Minimum Important Clinical Difference (MICD or MID)

• “Smallest difference that is considered clinically important; this can be a specified difference (the Minimum Important Clinical Difference, MICD)”

• One can observe a difference between two groups, or within one group over time, that is statistically significant but small.

• With a large enough sample size, even a tiny difference could be statistically significant.

• The MID is the smallest difference that we care about.
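The relation n = C x Variance / (MICD)² can be illustrated as follows. The slide only says that C depends on α and β; the specific form used here, C = 2(z_{1-α/2} + z_{1-β})² for comparing two means with equal group sizes, is a standard normal approximation and is an assumption, as are the function name and the example values.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sd, micd, alpha=0.05, power=0.80):
    """Approximate patients per group to detect a mean difference of `micd`."""
    z = NormalDist().inv_cdf
    c = 2 * (z(1 - alpha / 2) + z(power)) ** 2   # C: function of alpha and beta
    return ceil(c * sd ** 2 / micd ** 2)          # n = C x Variance / MICD^2

# Hypothetical blood-pressure trial: SD = 20 mmHg
print(n_per_group(sd=20, micd=2))   # MICD of 2 mmHg
print(n_per_group(sd=20, micd=1))   # halving the MICD roughly quadruples n
```

Because the MICD enters squared in the denominator, the choice of the smallest difference worth detecting usually drives the sample size more than any other input.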

35

Effect scales
ABSOLUTE AND RELATIVE DIFFERENCES

37

Absolute and Relative Scales

• Incidence: events / population at risk

• Absolute Risk Reduction (ARR): Incidence in control – Incidence in test

• Relative Risk Reduction (RRR): (Incidence in control – Incidence in test) / Incidence in control

• Number Needed to Treat (NNT): 1 / ARR

• Relative Risk (RR): Incidence in test / Incidence in control

38

39

Absolute and Relative effects

Risks …

P0      P1      Difabs   Difrel   RR      OR
80.0%   75.0%   -5.0%    -6.3%    0.938   0.750
15.0%   10.0%   -5.0%    -33.3%   0.667   0.630
15.0%   14.0%   -1.0%    -6.7%    0.933   0.922
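The rows of the table can be recomputed with a few lines. This is a sketch that assumes P0 is the risk in the control group and P1 the risk in the test group (the slide does not label them); the function name is hypothetical.

```python
def effect_measures(p0, p1):
    """Absolute and relative effect measures for risks p0 and p1 (as proportions)."""
    def odds(p):
        return p / (1 - p)
    return {
        "abs_diff": p1 - p0,            # Difabs
        "rel_diff": (p1 - p0) / p0,     # Difrel
        "rr": p1 / p0,                  # relative risk
        "or": odds(p1) / odds(p0),      # odds ratio
    }

for p0, p1 in [(0.80, 0.75), (0.15, 0.10), (0.15, 0.14)]:
    m = effect_measures(p0, p1)
    print(f"{p0:.0%} {p1:.0%} "
          f"{m['abs_diff']:+.1%} {m['rel_diff']:+.1%} "
          f"RR={m['rr']:.3f} OR={m['or']:.3f}")
```

The first two rows share the same absolute difference but very different relative effects, and the OR moves away from the RR as the baseline risk gets higher.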

40

RR & OR

• RR or OR > 1: Risk Factor

• RR or OR = 1: Absence of effect

• RR or OR < 1: Protection Factor

41

RR & OR

[Diagram: 4 exposed subjects (2 ill) and 4 non-exposed subjects (1 ill)]

Rate in exposed: 2/4 => 0.50
Rate in non-exposed: 1/4 => 0.25
RR = 2

Odds in exposed: 2/2 => 1
Odds in non-exposed: 1/3
OR = 3

Example

• Treatment A: relative risk of 0.81

• Treatment B: reduction of 19% in risk

• Treatment C: absolute rate reduction of 3%

• Treatment D: survival increase from 84% to 87%

• Treatment E: relative mortality reduction of 19%

• Treatment F: avoids 1 death per 33 treated patients

42

Example

• Treatment A: relative risk of 0.81
  RR = 13% / 16% => 0.81

• Treatment B: reduction of 19% in risk
  RRR = 1-0.81 => 19%

• Treatment C: absolute rate reduction of 3%
  ARR = 16% - 13% => 3%

• Treatment D: survival increase from 84% to 87%
  ARR = 87% - 84% = 16% - 13% = 3%

• Treatment E: relative mortality reduction of 19%
  RRR = (16% - 13%) / 16% = 19%, or 100*(1-RR) => 19%

• Treatment F: avoids 1 death per 33 treated patients
  NNT = 33; ARR = 1/33 = 0.03 = 3%
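All six descriptions correspond to the same pair of mortality rates, 16% on control and 13% on treatment. The following sketch derives each figure from those two numbers; the function name is hypothetical.

```python
def summarize(control_risk, test_risk):
    """Return ARR, RR, RRR and NNT for two risks given as proportions."""
    arr = control_risk - test_risk      # absolute risk reduction
    rr = test_risk / control_risk       # relative risk
    rrr = 1 - rr                        # relative risk reduction
    nnt = 1 / arr                       # number needed to treat
    return arr, rr, rrr, nnt

arr, rr, rrr, nnt = summarize(control_risk=0.16, test_risk=0.13)
print(f"RR = {rr:.2f}, RRR = {rrr:.0%}, ARR = {arr:.0%}, NNT = {nnt:.0f}")
```

The six treatments are therefore one and the same result, presented on different scales.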

43

CLINICAL RELEVANCE-INTERPRETATION

SUPERIORITY, NON-INFERIORITY AND EQUIVALENCE DESIGNS

44

45

Superiority study

[Figure: 95% CI of the difference d. d > 0: positive effect (test better); d = 0: no differences; d < 0: negative effect (control better)]

46

Superiority

[Figure: confidence intervals of five example trials (1 to 5) on the Treatment-Control axis; right of 0: treatment more effective, left of 0: treatment less effective]

47

Equivalence

[Figure: the five example confidence intervals (1 to 5) on the Treatment-Control axis, with a lower and an upper equivalence boundary around 0]

48

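Non-Inferiority

[Figure: the five example confidence intervals (1 to 5) on the Treatment-Control axis, judged against a non-inferiority boundary]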

49

Main efficacy End-Point

[Figure: bar chart (0% to 100%) of Active vs Placebo with data labels 40%, 10% and 30%, elements A, B and P, and the annotation “1/2? 1/3?”]

50


[Figure: bar chart (0% to 100%) for Active 1, Active 2, Active 3, Placebo 1, Placebo 2 and Placebo 3; data labels 40%, 15%, 45%, 40%, 20%, 10%]

51

JAMA 2002; 287: 1807-1814

51

52


[Figure: bar chart (0% to 100%) for Active REF, Placebo and Active Test; data labels 40%, 10% and 30%]

53

54

56

MARKET

[Diagram: chain of successive comparisons: A-P, A-B, B-C, C-D, E-D]

57

58

Superiority

[Figure: confidence intervals of five example trials (1 to 5) on the Treatment-Control axis; right of 0: treatment more effective, left of 0: treatment less effective]

59

Statistical and Clinical Superiority

[Figure: the five example confidence intervals (1 to 5) on the Treatment-Control axis, compared with 0 and with the MICD]

60

[Figure: Treatment-Control axis with 0 and the upper equivalence boundary, separating non-relevant/negative effects from relevant effects; example confidence intervals labelled Inferiority, Equivalence, Non-inferiority, Statistical Superiority, and Statistical and Clinical Superiority]

61

Effect Size & Sample Size

Relative effect (%)   Power* (%)   Absolute difference (mmHg)
-------------------------------------------------------------
0%                    4.9%         0.0
10%                   5.9%         0.2
20%                   8.5%         0.4
30%                   13.3%        0.6
40%                   20.2%        0.8
50%                   28.2%        1.0
60%                   39.3%        1.2
70%                   49.3%        1.4
80%                   61.1%        1.6
90%                   71.0%        1.8
100%                  80.4%        2.0
-------------------------------------------------------------
*Statistical power assuming constant variability (SD=20mmHg)

62

Key statistical issues

• Multiplicity
• Subgroups: interaction & confounding
• Superiority and non-inferiority (and )
• Adjustment by covariates
• Missing data
• Others
  – Interim analyses
  – Meta-analysis vs one pivotal study
  – Flexible designs

63

MULTIPLICITY

64

Roland Garros Tournament 1999, 1st Round

Carlos Moyá vs Markus Hipfl

                                     Moyá               Hipfl
Total games won                      22                 24
Total points won                     147                146
1st serve                            62%                69%
Aces                                 5                  3
Double faults                        4                  5
% winners on 1st serve               63 of 95 = 66%     61 of 96 = 64%
% winners on 2nd serve               25 of 58 = 43%     20 of 44 = 45%
Winners (including service)          30                 56
Unforced errors                      62                 75
Break points won                     6 of 21 = 29%      6 of 27 = 22%
Net approaches                       48 of 71 = 68%     29 of 41 = 71%
Fastest serve speed                  200 KPH            193 KPH
Average 1st serve speed              157 KPH            141 KPH
Average 2nd serve speed              132 KPH            126 KPH

Set             1   2   3   4   5
Carlos Moyá     3   1   6   6   6
Markus Hipfl    6   6   4   4   4

65

Lancet 2005; 365: 1591–95

To say it colloquially,

torture the data until they speak...

66

Torturing data…

– Investigators examine additional endpoints, manipulate group comparisons, do many subgroup analyses, and undertake repeated interim analyses.

– Investigators should report all analytical comparisons implemented. Unfortunately, they sometimes hide the complete analysis, handicapping the reader’s understanding of the results.

Lancet 2005; 365: 1591–95

68

Multiplicity

K independent hypotheses: $H_{01}, H_{02}, \ldots, H_{0K}$

S significant results ($p < \alpha$)

$\Pr(S \ge 1 \mid H_{01} \cap H_{02} \cap \ldots \cap H_{0K} = H_{0\cdot}) = 1 - \Pr(S = 0 \mid H_{0\cdot}) = 1 - (1 - \alpha)^K$

K    Pr(S>=1|H0.)    K    Pr(S>=1|H0.)
1    0.0500          10   0.4013
2    0.0975          15   0.5367
3    0.1426          20   0.6415
4    0.1855          25   0.7226
5    0.2262          30   0.7854
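The table follows directly from 1 - (1 - α)^K with α = 0.05. A minimal sketch that reproduces it, assuming independent tests as stated above (the function name is hypothetical):

```python
def familywise_error(k, alpha=0.05):
    """Probability of at least one false positive among k independent tests."""
    return 1 - (1 - alpha) ** k

for k in (1, 2, 3, 4, 5, 10, 15, 20, 25, 30):
    print(f"{k:>2}  {familywise_error(k):.4f}")
```

With only 14 independent tests at the 5% level, the chance of at least one false positive already exceeds 50%.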

69

Some examples

                      case A    case B    case C
Variables             2         5         5
Times                 2         4         4
Subgroups             2         3         3
Comparisons           1         1         3

Total                 8         60        180
False positive rate   33.66%    95.39%    99.99%

70

Multiplicity

• Bonferroni correction (simplified version)
  – K tests with an overall significance level of α
  – Each test can be tested at the α/K level

• Example:
  – 5 independent tests
  – Global level of significance = 5%
  – Each test should be tested at the 1% level: 5% / 5 => 1%
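The simplified Bonferroni rule can be written in a few lines. The p-values below are hypothetical and the function name is an assumption for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag each test as significant at the alpha / K level."""
    threshold = alpha / len(p_values)       # e.g. 5% / 5 tests = 1%
    return [p <= threshold for p in p_values]

# Five hypothetical p-values, global significance level 5%
p_values = [0.004, 0.020, 0.030, 0.008, 0.250]
print(bonferroni(p_values))
```

Two of these p-values (0.020 and 0.030) would be called significant at the unadjusted 5% level but not after the correction.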

71

But this is the simplified version for the general public

73

Some strategies to cope with multiple comparisons

74

Handling Multiplicity in Variables

• Scenario 1: One Primary Variable
  – Identify one primary variable; other variables are secondary
  – Trial is positive if and only if the primary variable shows significant (p < 0.05), positive results

75

76


• Scenario 2 Divide Type I Error

– Identify two (or more) co-primary variables

– Divide the 0.05 experiment-wise Type I error over these co-primary variables, e.g., 0.04 for the 1st, and 0.01 for the 2nd co-primary variable

– Trial is positive if at least one of the co-primary variables shows significant, positive results

77


• Scenario 3: Sequentially Rejective Procedure
  – Identify n co-primary variables, e.g., n = 3
  – Order the obtained p-values
    • Interpret the variable with the smallest p-value at the 0.05/3 level;
    • if significant, then interpret the variable with the 2nd smallest p-value at the 0.05/2 level;
    • if significant, then interpret the variable with the largest p-value at the 0.05 level.
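One standard sequentially rejective procedure is Holm's step-down method, which tests the smallest p-value at α/n, the next at α/(n-1), and so on, stopping at the first non-significant result. The sketch below assumes that this is the procedure intended in Scenario 3; the function name and the p-values are hypothetical.

```python
def holm(p_values, alpha=0.05):
    """Holm step-down procedure; returns one rejection flag per hypothesis."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])   # smallest p first
    rejected = [False] * n
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (n - rank):   # alpha/n, alpha/(n-1), ..., alpha
            rejected[i] = True
        else:
            break                               # stop at the first non-significant test
    return rejected

# Three hypothetical co-primary variables
print(holm([0.030, 0.012, 0.020]))   # every step passes its threshold
print(holm([0.030, 0.018, 0.020]))   # smallest p fails 0.05/3, nothing is rejected
```

Compared with the simplified Bonferroni rule, only the first step uses α/n, so later hypotheses are tested at progressively less strict levels.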

78


• Scenario 4: Hierarchy
  – Pre-specify a hierarchy among n co-primary variables
  – All tested at the same level:
    • interpret the 1st variable at the 0.05 level; if significant, then
    • interpret the 2nd variable at the 0.05 level; if significant, then
    • interpret the 3rd variable at the 0.05 level.
    • …The test procedure stops when a test is not significant.
  – Trial is positive if the first co-primary variable shows a significant, positive result
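The hierarchical (fixed-sequence) strategy can be sketched as follows: every variable is tested at the full 0.05 level, but only while all earlier variables in the pre-specified order were significant. Function name and p-values are hypothetical.

```python
def fixed_sequence(p_values_in_order, alpha=0.05):
    """Test hypotheses in their pre-specified order, stopping at the first failure."""
    results = []
    still_testing = True
    for p in p_values_in_order:
        significant = still_testing and p <= alpha
        results.append(significant)
        still_testing = significant          # once a test fails, the rest cannot be claimed
    return results

# Hypothetical p-values for the 1st, 2nd and 3rd variable in the hierarchy
print(fixed_sequence([0.01, 0.20, 0.001]))
```

The third variable has the smallest p-value but cannot be claimed, because the second one failed; this is the price paid for testing each step at the unadjusted level.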

79

Role of Secondary Variables

• Secondary variables can be claimed if and only if
  – the primary variable shows significant results, and
  – the comparisons related to the secondary variables are also protected under the same Type I error rate as the primary variable.

• Similar procedures to those already discussed can be used to protect the Type I error

81

SUBGROUPS

82

Subgroups

• Indiscriminate subgroup analyses pose serious multiplicity concerns. Problems reverberate throughout the medical literature. Even after many warnings, some investigators doggedly persist in undertaking excessive subgroup analyses.

Lancet 2000; 355: 1033–34
Lancet 2005; 365: 1657–61

Confounding & Interaction

83

84

Confounding

[Figure: effect d=6% overall; d=0% in Non-Smokers and d=0% in Smokers]

Confounding

• A situation in which a measure of the effect of an exposure on risk is distorted because of the association of exposure with other factor(s) that influence the outcome under study.

• Criteria for confounding
  – Factor is associated with exposure
  – Factor is associated with disease in the absence of exposure
  – Factor is not in the causal path between exposure and outcome

85

Confounding

[Diagram: Exposure → Outcome, with a third variable linked to both]

To be a confounding factor, two conditions must be met:

• Be associated with exposure, without being the consequence of exposure
• Be associated with outcome, independently of exposure (not an intermediary)

86

87

Interaction

[Figure: effect d=5% overall; d=0.7% for Age < 45 years and d=11.5% for Age >= 45 years]

88

Interaction & Subgroups

ISIS-2: Vascular death by star signs

                  Gemini/Libra             Other star signs
                  Aspirin     Placebo      Aspirin     Placebo
Vascular death    150         147          654         868
Total             1357        1442         7228        7157
                  11.1%       10.2%        9.0%        12.1%
                  p=0.42045, d=-0.9        p<0.0001, d=3.1

Interaction p = 0.019

Lancet 1988; 2: 349–60.

89

Changes from ISIS-2 results

Lancet 2005; 365: 1657–61

90

Simpson’s Paradox

                  Experimental    Control
                  n (%)           n (%)
ALL     Success   70 (70%)        60 (60%)
        Failure   30 (30%)        40 (40%)
        Total     100             100

91

Simpson’s Paradox cont.

                  Experimental    Control
                  n (%)           n (%)
MALE    Success   10 (33%)        24 (40%)
        Failure   20 (67%)        36 (60%)
        Total     30              60

FEMALE  Success   60 (86%)        36 (90%)
        Failure   10 (14%)        4 (10%)
        Total     70              40

ALL     Success   70 (70%)        60 (60%)
        Failure   30 (30%)        40 (40%)
        Total     100             100
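The reversal can be checked directly from the counts in the tables: the experimental arm has the lower success rate in both sexes and yet the higher rate overall. A small sketch (variable and function names are hypothetical):

```python
# (successes, total) per arm and stratum, taken from the tables above
strata = {
    "male":   {"experimental": (10, 30), "control": (24, 60)},
    "female": {"experimental": (60, 70), "control": (36, 40)},
}

def stratum_rate(sex, arm):
    successes, total = strata[sex][arm]
    return successes / total

def pooled_rate(arm):
    successes = sum(strata[sex][arm][0] for sex in strata)
    total = sum(strata[sex][arm][1] for sex in strata)
    return successes / total

for sex in strata:
    print(sex, f"{stratum_rate(sex, 'experimental'):.0%}",
          f"{stratum_rate(sex, 'control'):.0%}")
print("all", f"{pooled_rate('experimental'):.0%}", f"{pooled_rate('control'):.0%}")
```

The explanation lies in the unequal sex distribution: 70% of the experimental arm are women, who have much higher success rates, against 40% of the control arm, so sex acts as a confounder of the pooled comparison.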

92

• “The answer to a randomized controlled trial that does not confirm one’s beliefs is not the conduct of several subanalyses until one can see what one believes. Rather, the answer is to re-examine one’s beliefs carefully.”

–BMJ 1999; 318: 1008–09.

93

Lancet 2005; 365: 1657–61

94

the question is NOT: ‘Is the treatment effect in this subgroup statistically significantly different from zero?’

BUT…are there any differences in the treatment effect between the various subgroups?

The correct statistical procedures are either a test of heterogeneity or a test for interaction
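A test for interaction compares the treatment effect between subgroups instead of testing each subgroup against zero. The sketch below uses a simple z-test on the difference between two subgroup risk differences; this particular test, the function names and the counts are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def risk_diff(events_t, n_t, events_c, n_c):
    """Risk difference (control - test) and its variance."""
    p_t, p_c = events_t / n_t, events_c / n_c
    var = p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c
    return p_c - p_t, var

def interaction_p(subgroup_1, subgroup_2):
    """Two-sided p-value for a difference in treatment effect between two subgroups."""
    d1, v1 = risk_diff(*subgroup_1)
    d2, v2 = risk_diff(*subgroup_2)
    z = (d1 - d2) / sqrt(v1 + v2)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical counts: (events test, n test, events control, n control)
small_subgroup = (30, 200, 45, 200)
large_subgroup = (80, 800, 120, 800)
print(f"interaction p = {interaction_p(small_subgroup, large_subgroup):.2f}")
```

In this made-up example the effect is of similar size in both subgroups, so the interaction test gives no evidence of heterogeneity, even though a separate test in each subgroup could reach significance only in the larger one.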

95

Subgroups

• Recommendations:
  – 1) Examine the global effect
  – 2) Test for the interaction
  – 3) Plan adjustments for confirmatory analyses
  – 4) Some points which increase the credibility:
    • Pre-specification
    • Biologic plausibility

96

Lancet 2005; 365: 176–86

HOW TO CONTROL FOR CONFOUNDERS?

• IN STUDY DESIGN…

– RESTRICTION of subjects according to potential confounders (i.e. simply don’t include confounder in study)

– MATCHING subjects on potential confounder thus assuring even distribution among study groups

– RANDOM ALLOCATION of subjects to study groups to attempt to even out unknown confounders

• IN DATA ANALYSIS…

– RESTRICTION is still possible at the analysis stage but it means throwing away data

– IMPLEMENT A MATCHED-DESIGN after you have collected data (frequency or group)

– STRATIFIED ANALYSIS to control for confounders

– MODEL FITTING using adjustment techniques

97

98

MULTIPLE INSPECTIONS

99

Interim Analyses in the CDP

[Figure: Z value (-2 to +2) by month of follow-up (10 to 100)]

(Month 0 = March 1966, Month 100 = July 1974)

Coronary Drug Project Mortality Surveillance. Circulation. 1973;47:I-1

http://clinicaltrials.gov/ct/show/NCT00000483;jsessionid=C4EA2EA9C3351138F8CAB6AFB723820A?order=23

100

Lancet 2005; 365: 1657–61

101

Sequential designs

1) Sample size re-estimation

2) Group Sequential Methods

3) Alpha (Beta) Spending Functions

4) Repeated Confidence Intervals

5) Stochastic Curtailment

6) Bayesian Methods

7) Likelihood based Methods

105

Group Sequential Methods

      O'Brien & Fleming    Peto               Pocock
K     z        α'          z        α'        z        α'

1     2.782    0.005       2.576    0.010     2.178    0.029
2     1.967    0.049       1.969    0.049     2.178    0.029

1     3.438    0.001       2.576    0.010     2.289    0.022
2     2.431    0.015       2.576    0.010     2.289    0.022
3     1.985    0.047       1.969    0.049     2.289    0.022

1     4.084    0.000       3.291    0.001     2.361    0.018
2     2.888    0.004       3.291    0.001     2.361    0.018
3     2.358    0.018       3.291    0.001     2.361    0.018
4     2.042    0.041       1.969    0.049     2.361    0.018

1     4.555    0.000       3.291    0.001     2.413    0.016
2     3.221    0.001       3.291    0.001     2.413    0.016
3     2.630    0.009       3.291    0.001     2.413    0.016
4     2.277    0.023       3.291    0.001     2.413    0.016
5     2.037    0.042       1.969    0.049     2.413    0.016
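The second figure in each pair of the table appears to be the two-sided nominal significance level that corresponds to the z boundary. Under that assumption, it can be recovered from z with the standard normal distribution (the function name is hypothetical):

```python
from statistics import NormalDist

def nominal_alpha(z):
    """Two-sided nominal significance level for a boundary z."""
    return 2 * (1 - NormalDist().cdf(z))

# Boundaries for two analyses, taken from the table
for label, z in [("O'Brien & Fleming, look 1", 2.782),
                 ("O'Brien & Fleming, look 2", 1.967),
                 ("Pocock, each look", 2.178)]:
    print(f"{label}: z = {z}, nominal alpha = {nominal_alpha(z):.3f}")
```

This only converts boundaries into nominal levels; deriving the boundaries themselves requires the joint distribution of the repeated test statistics and is not attempted here.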

107

CONCLUSION

108

109

110

111

The role of statistics

“Thus statistical methods are no substitute for common sense and objectivity. They should never aim to confuse the reader, but instead should be a major contributor to the clarity of a scientific argument.”

The role of statistics. Pocock SJ Br J Psychiat 1980; 137:188-190

112

http://ferran.torres.name/edu/stats_rct

Password: stats_rct

BACK-UP

113

114

RANDOMIZATION & COVARIATES

115

116

Adjustment

• The objective should not be to compensate for imbalance (randomisation) but to improve the precision

• Avoid adjusting for post-randomization variables

• In RCTs, never use this widespread strategy: “adjust by any baseline significant variable (5% or 10% level)”

117

Testing for “baseline homogeneity”

• All observed differences are known with certainty to be due to chance.

• We must not test for it: there is no alternative hypothesis whose truth can be supported by such a test.

• If significant, the estimator is still unbiased

• Balance:
  – Decreases the variance and increases the power.
  – It has no effect on type I error.

118

Stratification

• A priori

• May desire to have treatment groups balanced with respect to prognostic or risk factors (covariates)

• For large studies, randomization “tends” to give balance
• For smaller studies a better guarantee may be needed

• Useful only to a limited extent (especially for small trials); avoid too many variables (i.e. many empty or partly filled strata)

119

Observed imbalance…

• NEVER justifies a post-hoc adjustment:
  – Randomization is more important
  – The treatment effect is unbiased without adjustment (randomization)
  – The type I error level already accounts for “chance error”
  – Post-hoc: data-driven analyses
  – Multiplicity issues: allowing a post-hoc adjustment increases the type I error

120

Adjusted Analyses

• ‘ When the potential value of an adjustment is in doubt, it is often advisable to nominate the unadjusted analysis as the one for primary attention, the adjusted analysis being supportive.’

122

MISSING DATA

123

Ex: LOCF & linear extrapolation

[Figure: ADAS-Cog score (4 to 36; higher = worse, lower = better) versus time (0 to 18 months), comparing LOCF with linear regression; the gap between the two is the bias]

124

Ex: Early drop-out due to AE

[Figure: ADAS-Cog score (4 to 36; higher = worse, lower = better) versus time (0 to 18 months) for Placebo and Active. Bias: favours Active]

125

Ex: Early drop-out due to lack of Efficacy

[Figure: ADAS-Cog score (4 to 36; higher = worse, lower = better) versus time (0 to 18 months) for Placebo and Active. Bias: favours Placebo]

126

Drop-outs and missing data

[Diagram: randomisation (RND) to A or B, followed from Baseline through Visit 1 and Visit 2 to the Last Visit; the arms differ in the frequency of drop-outs (≠ frequencies)]

127

Drop-outs and missing data

[Diagram: randomisation (RND) to A or B, followed from Baseline through Visit 1 and Visit 2 to the Last Visit; the arms differ in the timing of drop-outs (≠ timing)]

128

MD and incorrect use of populations (1)

• Design: surgery vs medical treatment in bilateral carotid stenosis (Sackett et al., 1985)
• Primary variable: number of patients presenting TIA, stroke or death
• Distribution of patients:
  – Randomised patients: 167
    • Surgical treatment: 94
    • Medical treatment: 73
  – Patients who did not complete the study because of a stroke in the initial phases of hospitalisation:
    • Surgical treatment: 15 patients
    • Medical treatment: 1 patient

129


First analysis performed:

Per Protocol (PP) population: patients who completed the study

Analysis
  – Surgical treatment: 43 / (94 - 15) = 43 / 79 = 54%
  – Medical treatment: 53 / (73 - 1) = 53 / 72 = 74%
  – Risk reduction: 27%, p = 0.02

130


The final analysis is as follows:

Intention-to-Treat (ITT) population: all randomised patients

Analysis
  – Surgical treatment: 58 / 94 = 62%
  – Medical treatment: 54 / 73 = 74%
  – Risk reduction: 18%, p = 0.09 (PP: 27%, p = 0.02)

Conclusions:
• The correct analysis population is the ITT
• Surgical treatment has not been shown to be significantly superior to medical treatment

131

Handling of MD

• Methods for imputation:
  – Many techniques
  – No gold standard for every situation
  – In principle, all methods may be valid:
    • From simple methods to more complex ones:
      – From LOCF to multiple imputation methods
      – Worst case, “mean methods”
    • Multiple imputation
    • But their appropriateness has to be justified

• Statistical approaches less sensitive to MD:
  – Mixed models
  – Survival models
  • They assume no relationship between treatment and the missing outcome, and generally this cannot be assumed.

132

Relationship of MD with

1) Treatment
2) Outcome

133

134

[Figure: schematic efficacy tables for arms A and B (complete data; same amount of missing data in both arms; more missing data in A than in B), with bar charts of success rates by arm for observed (Obs) and missing (MD) data]

135

Missing data related to: treatment: no; outcome: no

                      A        B
Missingness           -        -
Success (observed)    5.0%     12.0%
Success (M.D.)        -        -

      A               B
      n      %        n      %
S     5      5.0%     12     12.0%
F     95     95.0%    88     88.0%
      100    100%     100    100%

% dif -7.0%; OR 0.386; RR 0.417

[Figure: efficacy diagram for arms A and B with complete data; bar chart of success rate by arm (Obs, MD)]

136

Missing data related to: treatment: no; outcome: no

                      A        B
Missingness           30.0%    30.0%
Success (observed)    5.0%     12.0%
Success (M.D.)        5.0%     12.0%

Observed data:
      A               B
      n      %        n      %
S     3.5    5.0%     8.4    12.0%
F     66.5   95.0%    61.6   88.0%
      70     100%     70     100%

% dif -7.0%; OR 0.386; RR 0.417

All patients (observed + missing):
      A               B
      n      %        n      %
S     5      5.0%     12     12.0%
F     95     95.0%    88     88.0%
      100    100%     100    100%

% dif -7.0%; OR 0.386; RR 0.417

[Figure: efficacy diagram for arms A and B with the same amount of missing data in each arm; bar chart of success rate by arm (Obs, MD)]

137

Missing data related to: treatment: yes; outcome: no

                      A        B
Missingness           30.0%    10.0%
Success (observed)    5.0%     12.0%
Success (M.D.)        5.0%     12.0%

Observed data:
      A               B
      n      %        n      %
S     3.5    5.0%     10.8   12.0%
F     66.5   95.0%    79.2   88.0%
      70     100%     90     100%

% dif -7.0%; OR 0.386; RR 0.417

All patients (observed + missing):
      A               B
      n      %        n      %
S     5      5.0%     12     12.0%
F     95     95.0%    88     88.0%
      100    100%     100    100%

% dif -7.0%; OR 0.386; RR 0.417

[Figure: efficacy diagram for arms A and B with more missing data in A than in B; bar chart of success rate by arm (Obs, MD)]

138

Missing data related to: treatment: no; outcome: yes

                      A        B
Missingness           30.0%    30.0%
Success (observed)    5.0%     12.0%
Success (M.D.)        10.0%    17.0%

Observed data:
      A               B
      n      %        n      %
S     3.5    5.0%     8.4    12.0%
F     66.5   95.0%    61.6   88.0%
      70     100%     70     100%

% dif -7.0%; OR 0.386; RR 0.417

All patients (observed + missing):
      A               B
      n      %        n      %
S     6.5    6.5%     13.5   13.5%
F     93.5   93.5%    86.5   86.5%
      100    100%     100    100%

% dif -7.0%; OR 0.445; RR 0.481

[Figure: efficacy diagram for arms A and B with the same amount of missing data in each arm; bar chart of success rate by arm (Obs, MD)]

139

Missing data related to: treatment: no; outcome: yes

                      A        B
Missingness           30.0%    30.0%
Success (observed)    5.0%     12.0%
Success (M.D.)        50.0%    50.0%

Observed data:
      A               B
      n      %        n      %
S     3.5    5.0%     8.4    12.0%
F     66.5   95.0%    61.6   88.0%
      70     100%     70     100%

% dif -7.0%; OR 0.386; RR 0.417

All patients (observed + missing):
      A               B
      n      %        n      %
S     18.5   18.5%    23.4   23.4%
F     81.5   81.5%    76.6   76.6%
      100    100%     100    100%

% dif -4.9%; OR 0.743; RR 0.791

[Figure: efficacy diagram for arms A and B with the same amount of missing data in each arm; bar chart of success rate by arm (Obs, MD)]

140

Missing data related to: treatment: yes; outcome: yes

                      A        B
Missingness           30.0%    10.0%
Success (observed)    5.0%     12.0%
Success (M.D.)        50.0%    50.0%

Observed data:
      A               B
      n      %        n      %
S     3.5    5.0%     10.8   12.0%
F     66.5   95.0%    79.2   88.0%
      70     100%     90     100%

% dif -7.0%; OR 0.386; RR 0.417

All patients (observed + missing):
      A               B
      n      %        n      %
S     18.5   18.5%    15.8   15.8%
F     81.5   81.5%    84.2   84.2%
      100    100%     100    100%

% dif 3%; OR 1.210; RR 1.171

[Figure: efficacy diagram for arms A and B with more missing data in A than in B; bar chart of success rate by arm (Obs, MD)]
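The scenarios above can be recomputed from three inputs per arm: the success rate among observed patients, the proportion of missing data, and the (unknown in practice) success rate among patients with missing data. A sketch with hypothetical function names:

```python
def true_success_rate(obs_rate, miss_rate, md_rate):
    """Success rate over all randomised patients (observed + missing)."""
    return (1 - miss_rate) * obs_rate + miss_rate * md_rate

def compare(p_a, p_b):
    """Difference, relative risk and odds ratio of A versus B."""
    return {
        "diff": p_a - p_b,
        "rr": p_a / p_b,
        "or": (p_a / (1 - p_a)) / (p_b / (1 - p_b)),
    }

def scenario(miss_a, miss_b, md_a, md_b, obs_a=0.05, obs_b=0.12):
    return compare(true_success_rate(obs_a, miss_a, md_a),
                   true_success_rate(obs_b, miss_b, md_b))

observed_only = compare(0.05, 0.12)             # what a complete-case analysis sees
related_to_outcome = scenario(0.3, 0.3, 0.5, 0.5)
related_to_both = scenario(0.3, 0.1, 0.5, 0.5)

for name, r in [("observed only", observed_only),
                ("MD related to outcome", related_to_outcome),
                ("MD related to treatment and outcome", related_to_both)]:
    print(f"{name}: dif = {r['diff']:+.1%}, OR = {r['or']:.3f}, RR = {r['rr']:.3f}")
```

When missingness depends on both treatment and outcome, the direction of the effect reverses (OR above 1) even though the observed data are exactly the same in every scenario.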

Handling of MD

141

Best way to deal with Missing Data: Don’t have any!!!

• Methods for imputation:
  – Many techniques
  – No gold standard for every situation
  – In principle, “almost any method may be valid”
    => But their appropriateness has to be justified

142

Handling of MD

• Avoidance of missingness:
  – In the design and conduct, all efforts should be directed towards minimising the amount of missing data likely to occur.
  – Despite these efforts some missing values will generally be expected.

• The way these missing observations are handled may substantially affect the conclusions of the study.

143

Statistical framework

• Applicability of methods based on a classification according to missingness generation mechanisms:
  – missing completely at random (MCAR)
  – missing at random (MAR)
  – missing not at random (MNAR)

Rubin (1976)

144

Missing Data Mechanisms

• MCAR - missing completely at random
  – Neither observed nor unobserved outcomes are related to dropout

• MAR - missing at random
  – Given the observed data, unobserved outcomes are not related to dropout; they can be predicted from the observed data

• MNAR - missing not at random
  – Unobserved outcomes are related to dropout

145

MAR methods

• MAR assumption
  – MD depends on the observed data
  – The behaviour of the post drop-out observations can be predicted with the observed data
  – It seems reasonable and it is not a strong assumption, at least a priori
  – In RCTs, the reasons for withdrawal are known
  – Other assumptions seem stronger and more arbitrary

146

147

Options after withdrawal

[Figure: score (0 to 36; higher = worse, lower = better) versus time (0 to 18 months), showing possible trajectories after withdrawal]

However…

• It is reasonable to consider that the treatment effect will somehow cease/attenuate after withdrawal

• If there is a good response, MAR will not “predict” a bad response

• =>MAR assumption not suitable for early drop-outs because of safety issues

• In this context MAR seems likely to be anti-conservative

148

The main analysis: what should it reflect?

A) The “pure” treatment effect:
  – Estimation using the “on treatment” effect after withdrawal
  – Ignore effects (changes) after treatment discontinuation
  – Does not mix up efficacy and safety

B) The expected treatment effect in “usual clinical practice” conditions

149

General Strategies

• Complete-case analysis
• “Weighting methods” & Dummy variable/category
• Imputation methods
  – Single Imputation / Multiple Imputation
• Analysing data as incomplete
• MNAR methods
• Other methods

150

Complete-case analysis

• a.k.a. Available Data Only (ADO)

• “Case deletion”:
  – Listwise deletion (a.k.a. complete-case analysis):
    • delete all cases with a missing value on any of the variables in the analysis. Only use complete cases.
  – Pairwise deletion (a.k.a. available-case analysis):
    • use all available cases for computation of any sample moment

• Only OK if missing data are MCAR (very strong assumption)
  – Parameter estimates unbiased
  – Standard errors appropriate?

• But it can result in substantial loss of statistical power

151

Complete-case analysis

• Complete case analysis:
  – Bias, power and variability
  – Not generally appropriate.
  – Exceptions:
    – Exploratory studies, especially in the initial phases of drug development.
    – Secondary supportive analysis in confirmatory trials (robustness)

152

General Strategies





153

“Weighting methods” & Dummy variable/category

• “Weighting methods”:
  – To construct weights for incomplete/under-represented cases
  – Sometimes considered as a form of imputation

• Dummy variable/category adjustment
  – Cohen & Cohen (1985); produces biased coefficient estimates (see Jones’ 1996 JASA article)

Utility: observational studies; exploratory analyses

154

General Strategies





155

Single Imputation

• Substitute a value for each missing value. Some of the ways to choose this value:
  – Mean Estimation
    • Replace missing data with the mean of non-missing values.
  – Class Imputation methods
    • Stratify and sort by key covariates, replace missing data from another record in the same stratum.
  – Predict missing values from Regression
    • Impute each independent variable on the basis of other independent variables in the model.
  – LOCF / BOCF
  – Other single imputation methods:
    • Rank/Score based methods
    • Worst (best) case
    • EM estimation

156

Mean Imputation

Scatterplots are from Joe Schafer’s website

157

Regression methods


Imputation methods

• LOCF and variants

  – Bias:
    • Depends on the amount and timing of drop-outs
    • Ex: the condition under study has a worsening course
      – Conservative:
        » Drop-outs because of lack of efficacy in the control group
      – Anticonservative:
        » Drop-outs because of intolerance in the test group

  – Use: only under the MCAR assumption and if there are no trends with time

  – BOCF is useful in some cases, such as a chronic pain trial
    • It is reasonable to assume that when a patient withdraws and treatment is stopped, pain levels return to baseline levels.
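A minimal sketch of the two carry-forward rules. The function names `locf` and `bocf` and the pain scores are illustrative, not taken from any particular package.

```python
def locf(visits):
    """Last observation carried forward: replace each missing value (None)
    with the most recent observed value. Leading None values stay missing."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def bocf(visits):
    """Baseline observation carried forward: replace each missing value
    (None) with the baseline value, taken here as the first entry."""
    baseline = visits[0]
    return [baseline if value is None else value for value in visits]


# Hypothetical pain scores (0-10) at baseline and four visits; the patient
# withdraws after the second post-baseline visit.
pain = [8, 6, 4, None, None]

print(locf(pain))  # [8, 6, 4, 4, 4]
print(bocf(pain))  # [8, 6, 4, 8, 8]
```

For this patient LOCF keeps the improvement seen before withdrawal, whereas BOCF assumes the pain returns to its baseline level once treatment stops.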


Ex: LOCF & linear extrapolation

[Figure: ADAS-Cog score (0 to 36; higher = worse, lower = better) against time (0 to 18 months), comparing LOCF with linear regression; the gap between the two is labelled "Bias".]

Ex: Early drop-out due to AE

[Figure: ADAS-Cog score (0 to 36; higher = worse, lower = better) against time (0 to 18 months) for placebo and active. Bias: favours active.]

Ex: Early drop-out due to lack of efficacy

[Figure: ADAS-Cog score (0 to 36; higher = worse, lower = better) against time (0 to 18 months) for placebo and active. Bias: favours placebo.]

Example of interpolation: regression imputation

[Figure: ADAS-Cog score (0 to 36) against time (0 to 18 months).]

Single Imputation

• Substitute a value for each missing value.

Some of the ways to choose this value:

  – Mean Estimation

  – Class Imputation methods

  – Predict missing values from Regression

  – LOCF / BOCF

  – Other single imputation methods:
    • Rank/Score based methods
    • Worst (best) case

Single Imputation Pros - Cons

• Advantages of single imputation
  – Allows standard complete-data methods of analysis to be used
  – Incorporates the data collector's knowledge

• Disadvantages of single imputation
  – Inferences based on the imputed data set might be too sharp (the uncertainty due to imputation is ignored, so standard errors are understated)
  – Correlations can be biased


Analysing data as incomplete

• Direct Estimation:
  – GEE analysis
  – Likelihood methods
  – Bayesian estimation with Markov chain Monte Carlo (e.g. Metropolis-Hastings)

NMAR procedures (usually use one of these procedures or their extensions)

• Time to event variables


• For continuous responses:
  – mixed-effect models for repeated measures (MMRM)

• For categorical responses and count data:
  – marginal models (e.g. generalized estimating equations, GEE)
  – random-effects models (e.g. generalized linear mixed models, GLMM)

• Missing data (MD) are not imputed

• Information is borrowed from cases where the information is available

• MAR assumption


Time to event analysis

• When the outcome measure is time to event, survival models which take censored observations into account are often used.

• Many standard survival methods assume that censoring is non-informative, i.e. that there is no relationship between the response and the missing outcome.

• Violations of this assumption could lead to biased results, especially when data are missing due to withdrawal.
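A minimal sketch of how a survival method uses censored observations, here a hand-written Kaplan-Meier estimator. The follow-up times and event flags are invented. The second call, which counts every withdrawal as an event, is only one crude way (an assumption of this sketch, not a method from the slides) of seeing how much the result depends on how withdrawals are handled.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate. events[i] is True for an observed event and
    False for a censored time. Returns [(t, S(t))] at each event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival, curve = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = sum(1 for tt, e in data if tt == t and e)
        n_leaving = sum(1 for tt, e in data if tt == t)
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored subjects leave the risk set
        i += n_leaving
    return curve


# Hypothetical follow-up times (months); False = withdrew (censored).
times = [2, 3, 3, 5, 8]
events = [True, False, True, True, False]

censored_curve = kaplan_meier(times, events)
worst_case_curve = kaplan_meier(times, [True] * len(times))

print(censored_curve)
print(worst_case_curve)
```

If withdrawals are related to the outcome, the truth may lie anywhere between such extremes, which is why the censoring assumption matters.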

General Strategies

• Analysing data as incomplete

• Other methods

3 Steps in Multiple Imputation (MI)

1. Create imputations (>1 for each missing value)

2. Analyze the imputed datasets

3. Combine the results
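Step 3 is usually done with Rubin's rules. The slide does not name them, so treat this as an assumption about what "combine" means. The sketch takes the estimates and their squared standard errors from steps 1 and 2 as given; the numbers are invented.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m point estimates and their variances (squared standard
    errors) from m imputed data sets using Rubin's rules."""
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance
    total = within + (1 + 1 / m) * between
    return pooled_estimate, total


# Hypothetical treatment-effect estimates and variances from m = 5 data sets.
estimates = [1.8, 2.1, 2.0, 2.3, 1.8]
variances = [0.25, 0.24, 0.26, 0.25, 0.25]

pooled_estimate, total_variance = pool_rubin(estimates, variances)
print(pooled_estimate, total_variance)
```

The between-imputation term is what single imputation leaves out: it makes the pooled variance larger than the average variance of the individual analyses.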


Multiple Imputation

Advantages of MI

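• Imputing more than one value for each missing observation => the uncertainty about the imputed values is carried into the analysis

• Combining the results gives efficient and unbiased estimates (when the imputation model is appropriate)

=> Correct inference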


NMAR Missing Data

• Pattern Mixture Models

• Selection Models

• Other:
  – Auxiliary Variables
    • Can alleviate NMAR bias (if they correlate highly with the missing values)
  – Shared parameter/Joint models

Be extremely cautious in the interpretation!

Other methods

• Retrieval of data after withdrawal
  – Assessment may be affected by external treatments, but reflects clinical practice
  – Balance: possible influence of external treatments after withdrawal vs. possible bias due to the process of imputation or direct estimation
  – Not biased when there are no effective treatments in the particular setting

• Responder analysis
  – Patients who drop out for reasons likely to be treatment related (such as lack of efficacy or safety issues) are considered non-responders.
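A minimal sketch of the responder rule. The reason codes, the 50% improvement threshold, the field names and the patients are all hypothetical. The slide only specifies that treatment-related drop-outs count as non-responders.

```python
TREATMENT_RELATED = {"lack of efficacy", "adverse event"}  # hypothetical codes


def is_responder(patient, threshold=50.0):
    """Hypothetical rule: a responder has at least `threshold` % improvement.
    Drop-outs for treatment-related reasons are always non-responders."""
    if patient["dropout_reason"] in TREATMENT_RELATED:
        return False
    if patient["improvement_pct"] is None:
        # Missing outcome for another reason: not covered by the rule above.
        raise ValueError("no rule defined for this patient")
    return patient["improvement_pct"] >= threshold


patients = [
    {"id": 1, "improvement_pct": 62.0, "dropout_reason": None},
    {"id": 2, "improvement_pct": 35.0, "dropout_reason": None},
    {"id": 3, "improvement_pct": None, "dropout_reason": "lack of efficacy"},
    {"id": 4, "improvement_pct": 70.0, "dropout_reason": "adverse event"},
]

status = [is_responder(p) for p in patients]
responder_rate = sum(status) / len(status)
print(status, responder_rate)
```

Patient 4 illustrates the point of the rule: a good last value does not make a responder if the patient stopped because of an adverse event.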


Definition of the different study populations

Objective: to evaluate the efficacy of a weight-loss programme compared with usual advice

Design: randomised clinical trial

Candidates: 790

Obese: 320

Intervention group: 161

Control group: 159

Refused: 59    Spontaneous request: 54

Completed: 102    Completed: 105

Intervention group: 161    Control group: 159

Refused: 59    Spontaneous request: 54

Completed: 102    Completed: 105

            Intervention group   Control group
Option A    161                  159
Option B    102                  105
Option C    59                   54
Option D    156                  164
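The four options can be reproduced from the flow counts. The sketch assumes, from the slide layout, that the 59 refusals occurred in the intervention group and the 54 spontaneous requests in the control group. The labels in the comments (as randomised, completers, switchers, as treated) are an interpretation, not wording from the slide.

```python
randomised = {"intervention": 161, "control": 159}
completed = {"intervention": 102, "control": 105}
refused = 59              # assumed: randomised to intervention, refused it
spontaneous_request = 54  # assumed: randomised to control, asked for the programme

# Option A: everyone analysed in the group they were randomised to.
option_a = (randomised["intervention"], randomised["control"])

# Option B: only those who completed the study.
option_b = (completed["intervention"], completed["control"])

# Option C: only those who departed from their allocation.
option_c = (refused, spontaneous_request)

# Option D: patients regrouped according to what they actually received.
option_d = (
    randomised["intervention"] - refused + spontaneous_request,
    randomised["control"] - spontaneous_request + refused,
)

print(option_a, option_b, option_c, option_d)
```

Only option A keeps the groups exactly as randomisation created them. Options A and D both keep all 320 patients, but D regroups them according to what the patients themselves did after randomisation.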


Multivariate analysis

Main advantages

• Adjustment for imbalanced variables

• Adjustment for differing baseline values

• Multivariate significance test

Multivariate analysis: Simpson's paradox (1951)

• In 1973, 8,442 men and 4,321 women applied for admission to the University of California, Berkeley.

• 44% of the men and 30% of the women were admitted.

• The federal government accused the university of sex discrimination.



Applications and admissions in the 6 main degree programmes.

OR (men) = 1.54: the odds of being admitted are 1.54 times higher for a man.

Programme  Men      Men       Proportion  Women    Women     Proportion  Odds ratio
           applied  admitted  admitted    applied  admitted  admitted    (men vs. women)
A          825      512       0.62        108      89        0.82        0.35
B          560      353       0.63        25       17        0.68        0.80
C          325      120       0.37        593      202       0.34        1.13
D          417      138       0.33        375      202       0.54        0.42
E          191      53        0.28        393      94        0.24        1.22
F          373      22        0.06        341      24        0.07        0.83
Total      2691     1198      0.45        1835     628       0.34        1.54
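The last column of the table can be recomputed from the counts. The sketch below uses only the numbers shown in the table and treats the column as an odds ratio (men versus women), which is what the values correspond to.

```python
# (programme, men applied, men admitted, women applied, women admitted)
rows = [
    ("A", 825, 512, 108, 89),
    ("B", 560, 353, 25, 17),
    ("C", 325, 120, 593, 202),
    ("D", 417, 138, 375, 202),
    ("E", 191, 53, 393, 94),
    ("F", 373, 22, 341, 24),
]


def odds_ratio(m_app, m_adm, w_app, w_adm):
    """Odds of admission for men divided by odds of admission for women."""
    return (m_adm / (m_app - m_adm)) / (w_adm / (w_app - w_adm))


per_programme = {name: round(odds_ratio(*counts), 2) for name, *counts in rows}

# Pool the counts over all programmes, ignoring the programme applied to.
totals = [sum(column) for column in zip(*(r[1:] for r in rows))]
pooled = round(odds_ratio(*totals), 2)

print(per_programme)
print(totals, pooled)
```

Four of the six programme-specific odds ratios are below 1 (women favoured), yet the pooled odds ratio is 1.54 (men favoured). The reversal appears only when the programme applied to is ignored.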



• In most programmes women had higher admission rates than men (A, B, D, F).

• The exceptions were few (C, E).

So where does the idea of discrimination come from?

Programme  Proportion of   Proportion of    Odds ratio
           men admitted    women admitted   (men vs. women)
A          0.62            0.82             0.35
B          0.63            0.68             0.80
C          0.37            0.34             1.13
D          0.33            0.54             0.42
E          0.28            0.24             1.22
F          0.06            0.07             0.83
Total      0.45            0.34             1.54



• Note that programmes A and B are very easy to get into and that many men apply to them.

• Women, however, tend to apply to programme F, which is very hard to get into.

Programme  Share of men's   Proportion of   Share of women's   Proportion of
           applications     men admitted    applications       women admitted
A          0.31             0.62            0.06               0.82
B          0.21             0.63            0.01               0.68
C          0.12             0.37            0.32               0.34
D          0.15             0.33            0.20               0.54
E          0.07             0.28            0.21               0.24
F          0.14             0.06            0.19               0.07
Total                       0.45                               0.34
