From bench to the bedside
Statistics Issues in RCT
Ferran Torres
Biostatistics and Data Management Platform
IDIBAPS - Hospital Clinic Barcelona
Universitat Autònoma Barcelona
EMA: Scientific Advice Working Party (SAWP), Biostatistics Working Party (BSWP)
2
Disclaimer
• The opinions expressed today are personal views and should not be understood or quoted as being made on behalf of any organization.
– Regulatory
  • Spanish Medicines Agency (AEMPS)
  • European Medicines Agency (EMA)
    – Scientific Advice Working Party (SAWP)
    – Biostatistics Working Party (BSWP)
– Hospital - Academic - Independent Research
  • IDIBAPS. Hospital Clinic Barcelona
  • Autonomous University of Barcelona (UAB)
  • SCREN. Spanish Clinical Trials Platform
3
Documentation
• Power Point presentation
• Selected References
• Direct links to guidelines
4
http://ferran.torres.name/edu/stats_rct
Password: stats_rct
5
Globalisation
LACK OF HARMONISATION
Data to register in all regions
Similar Basic Technical Requirements
JAPAN, USA, EU
INTERNATIONAL CONFERENCES HARMONISATION
www.ich.org
6
Regulatory Agencies
Regulatory Guidances
• CPMP/EWP/908/99 CPMP Points to Consider on Multiplicity issues in Clinical Trials
• CPMP/EWP/2863/99 Points to Consider on Adjustment for Baseline Covariates
• CPMP/2330/99 Points to Consider on Application with 1.) Meta-analyses and 2.) One Pivotal study
• Choice of a Non-Inferiority Margin
• CPMP/EWP/482/99 Points to Consider on Switching between Superiority and Non-inferiority
• CPMP/EWP/1776/99 Points to Consider on Missing Data
• CHMP/EWP/83561/05 Guideline on Clinical Trials in Small Populations
• CHMP/EWP/2459/02 Reflection Paper on Methodological Issues in Confirmatory Clinical Trials with Flexible Design and Analysis Plan
10
“Scientific Recommendations”
• CONSORT Statement: Summary, General, Non-inferiority
• Lancet: Methodological & Statistics Series
• BMJ: Statistics Notes (Bland & Altman)
11
http://www.equator-network.org
12
Today’s talk is on statistics
Basic statistics
• Why Statistics?
• Samples and populations
• P-Value
• Statistical errors
• Sample size
• Confidence Intervals
• Interpretation of CI: superiority, non-inferiority, equivalence
16
The role of statistics
“Thus statistical methods are no substitute for common sense and objectivity. They should never aim to confuse the reader, but instead should be a major contributor to the clarity of a scientific argument.”
The role of statistics. Pocock SJ. Br J Psychiat 1980; 137:188-190
17
Why Statistics?
•Variation!!!!
BACKGROUND
SAMPLE AND POPULATIONS
P-VALUE AND CONFIDENCE INTERVALS
18
19
p
20
Population and Samples
Target Population
Population of the Study
Sample
21
Extrapolation
Sample
Population
Inferential analysis
Statistical Tests
Confidence Intervals
Study Results
“Conclusions”
22
P-value
• The p-value is a “tool” to answer the question:
– Could the observed results have occurred by chance*?
– Remember:
  • Decision given the observed results in a SAMPLE
  • Extrapolating results to POPULATION
*: accounts exclusively for the random error, not bias
p < .05: “statistically significant”
23
P-value: an intuitive definition
• The p-value is the probability of observing data at least as extreme as ours when the null hypothesis is true (no differences exist)
• Steps:
1) Calculate the treatment difference in the sample (A-B)
2) Assume that both treatments are equal (A=B) and then…
3) …calculate the probability of obtaining a difference at least as large as the one observed, given the assumption in step 2
4) Conclude according to the probability:
a. p<0.05: the differences are unlikely to be explained by chance
– we assume that the treatment explains the differences
b. p>0.05: the differences could be explained by chance
– we assume that chance explains the differences
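The four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the outcome values are hypothetical, the function name `two_sided_p_value` is invented for this example, and the probability is taken from a normal approximation rather than from any specific test used in the talk.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def two_sided_p_value(sample_a, sample_b):
    """Return (observed difference, two-sided p-value) under H0: A = B."""
    # Step 1: treatment difference observed in the sample (A - B)
    diff = mean(sample_a) - mean(sample_b)
    # Step 2: under A = B the difference varies around 0 with this standard error
    se = sqrt(stdev(sample_a) ** 2 / len(sample_a)
              + stdev(sample_b) ** 2 / len(sample_b))
    # Step 3: probability of a difference at least as large as the observed one
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, p

# Hypothetical outcome values for two treatment arms (not from the slides)
arm_a = [10, 12, 11, 13, 12, 11, 10, 12]
arm_b = [8, 9, 7, 9, 8, 10, 8, 9]

diff, p = two_sided_p_value(arm_a, arm_b)
# Step 4: compare p with the conventional 0.05 threshold
print(f"difference = {diff:.3f}, p = {p:.4g}, significant = {p < 0.05}")
```

With samples this small a t-based test would usually be preferred; the normal approximation is used here only to keep the sketch dependency-free.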
24
Factors influencing statistical significance
• Signal: Difference
• Noise (background): Variance (SD)
• Quantity: Quantity of data
[Figure: two density plots of the treatment difference; x-axis: difference (-2 to 4), y-axis: density]
25
26
P-value. Some reflections
• It tells us NOTHING about clinical or scientific importance; only that the results are unlikely to be due to chance.
• A “very low” p-value does NOT imply:
– Clinical relevance (NO!!!)
– Magnitude of the treatment effect (NO!!)
• With larger n or lower variability, p decreases
• Please never compare p-values!! (NO!!!)
27
Interval Estimation
Confidence interval
Sample statistic (point estimate)
Confidence limit (lower)
Confidence limit (upper)
Intuitive interpretation:
“A probability that the population parameter falls somewhere within the interval”
28
95%CI
• Better than p-values…
– …use the data collected in the trial to give an estimate of the treatment effect size, together with a measure of how certain we are of our estimate
• CI is a range of values within which the “true” treatment effect is believed to be found, with a given level of confidence.
– 95% CI is a range of values within which the ‘true’ treatment effect will lie 95% of the time
• Generally, 95% CI is calculated as
– Sample Estimate ± 1.96 x Standard Error
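As a minimal sketch of "Sample Estimate ± 1.96 x Standard Error", the snippet below computes a confidence interval for a difference between two proportions. The event counts are hypothetical and the function name `ci_difference_in_proportions` is an assumption made for this illustration; the simple (Wald) standard error is one common choice, not necessarily the one used in the talk.

```python
from math import sqrt
from statistics import NormalDist

def ci_difference_in_proportions(events_a, n_a, events_b, n_b, level=0.95):
    """Return (estimate, lower limit, upper limit) for the difference A - B."""
    p_a, p_b = events_a / n_a, events_b / n_b
    estimate = p_a - p_b
    # Standard error of a difference between two independent proportions
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # For level = 0.95 this multiplier is approximately 1.96
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate, estimate - z * se, estimate + z * se

# Hypothetical trial: 30/100 events in arm A versus 40/100 in arm B
est, lower, upper = ci_difference_in_proportions(30, 100, 40, 100)
print(f"difference = {est:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

In this made-up example the interval includes 0, so the data are compatible with no difference between arms.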
29
Superiority study
[Figure: 95% CI (IC95%) on the scale of the difference d. d > 0: + effect (Test better); d = 0: no differences; d < 0: - effect (Control better)]
DESIGN
STATISTICAL ERRORS
SAMPLE SIZE
MINIMUM IMPORTANT CLINICAL DIFFERENCE (MICD)
30
31
Type I & II Error & Power
                       Reality (Population)
Conclusion (sample)    A=B                 A≠B
“A=B” (p>0.05)         OK                  Type II error (β)
“A≠B” (p<0.05)         Type I error (α)    OK
33
Type I & II Error & Power
• Type I Error (α)
– False positive
– Rejecting the null hypothesis when in fact it is true
– Standard: α=0.05
– In words, chance of finding statistical significance when in fact there truly was no effect
• Type II Error (β)
– False negative
– Accepting the null hypothesis when in fact the alternative is true
– Standard: β=0.20 or 0.10
– In words, chance of not finding statistical significance when in fact there was an effect
34
Sample size and MICD
$$n = \frac{C \times \text{Variance}}{(\text{MICD})^2}$$
C: function of α and β
MICD: Minimum Important Clinical Difference

Minimum Important Clinical Difference (MICD or MID)
• “Smallest difference that is considered clinically important”; this can be a specified difference, the MICD.
• One can observe a difference between two groups, or within one group over time, that is statistically significant but small.
• With a large enough sample size, even a tiny difference could be statistically significant.
• The MID is the smallest difference that we care about.
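The formula n = C x Variance / (MICD)² can be made concrete with a short sketch. The assumptions here are not from the slides: two parallel groups, a continuous outcome with a common SD, a two-sided test, and the usual normal approximation, for which C = 2 (z₁₋α/₂ + z₁₋β)². The function name `n_per_group` and the input values are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sd, micd, alpha=0.05, beta=0.20):
    """Patients per group needed to detect a difference of size `micd`."""
    z = NormalDist()
    # C depends only on the chosen type I (alpha) and type II (beta) errors
    c = 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(1 - beta)) ** 2
    # n = C x Variance / MICD^2, rounded up to a whole patient
    return ceil(c * sd ** 2 / micd ** 2)

# Hypothetical inputs: SD = 20 units; halving the MICD roughly quadruples n
print(n_per_group(sd=20, micd=10))
print(n_per_group(sd=20, micd=5))
```

The second call shows why the choice of MICD dominates the sample size: n grows with the inverse square of the difference one wants to detect.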
35
Effect scales
ABSOLUTE AND RELATIVE DIFFERENCES
37
Absolute and Relative Scales
• Incidence: events / population at risk
• Absolute Risk Reduction (ARR): Incidence in Test – Incidence in Control
• Relative Risk Reduction (RRR): (Incidence in Test – Incidence in Control) / Incidence in Control
• Number Needed to Treat (NNT): 1 / ARR
• Relative Risk (RR): Incidence in Test / Incidence in Control
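These definitions translate directly into code. The sketch below follows the sign convention of the list above (Test minus Control), reports NNT from the absolute value of the ARR, and adds the odds ratio as an extra measure; the function name `effect_measures` and the input risks are illustrative assumptions.

```python
def effect_measures(risk_control, risk_test):
    """Absolute and relative effect measures from two incidences (0-1 scale)."""
    arr = risk_test - risk_control            # absolute difference (Test - Control)
    rrr = arr / risk_control                  # relative difference
    rr = risk_test / risk_control             # relative risk
    odds_test = risk_test / (1 - risk_test)
    odds_control = risk_control / (1 - risk_control)
    odds_ratio = odds_test / odds_control     # not in the list above; added for comparison
    nnt = 1 / abs(arr) if arr != 0 else float("inf")
    return {"ARR": arr, "RRR": rrr, "RR": rr, "OR": odds_ratio, "NNT": nnt}

# Illustrative incidences: 15% in control, 10% in test
measures = effect_measures(risk_control=0.15, risk_test=0.10)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

The same absolute difference can correspond to very different relative effects depending on the control incidence, which is why both scales are worth reporting.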
38
39
Absolute and Relative effects
Risks …
P0      P1      Dif abs   Dif rel   RR      OR
80.0%   75.0%   -5.0%     -6.3%     0.938   0.750
15.0%   10.0%   -5.0%     -33.3%    0.667   0.630
15.0%   14.0%   -1.0%     -6.7%     0.933   0.922
40
RR & OR
• RR or OR > 1: Risk Factor
• RR or OR = 1: Absence of effect
• RR or OR < 1: Protection Factor
41
RR & OR
[Figure: Exposed and Non-Exposed groups of 4 subjects each; ill subjects: 2 exposed, 1 non-exposed]
Rate in Exposed: 2/4 => 0.50
Rate in Non-Exposed: 1/4 => 0.25
RR = 2
Odds in Exposed: 2/2 => 1
Odds in Non-Exposed: 1/3
OR = 3
Example
• Treatment A: relative risk of 0.81
• Treatment B: reduction of 19% in risk
• Treatment C: absolute rate reduction of 3%
• Treatment D: survival increase from 84% to 87%
• Treatment E: relative mortality reduction of 19%
• Treatment F: avoids 1 death per 33 treated patients
42
Example
• Treatment A: relative risk of 0.81
  RR = 13% / 16% => 0.81
• Treatment B: reduction of 19% in risk
  RRR = 1 - 0.81 => 19%
• Treatment C: absolute rate reduction of 3%
  ARR = 16% - 13% => 3%
• Treatment D: survival increase from 84% to 87%
  ARR = 87% - 84% = 16% - 13% = 3%
• Treatment E: relative mortality reduction of 19%
  RRR = (16% - 13%) / 16% = 19%, or 100*(1-RR) => 19%
• Treatment F: avoids 1 death per 33 treated patients
  NNT = 33; ARR = 1/33 = 0.03 = 3%
43
CLINICAL RELEVANCE-INTERPRETATION
SUPERIORITY, NON-INFERIORITY AND EQUIVALENCE DESIGNS
44
45
Superiority study
[Figure: 95% CI (IC95%) on the scale of the difference d. d > 0: + effect (Test better); d = 0: no differences; d < 0: - effect (Control better)]
46
[Figure: Superiority. Five example confidence intervals (1-5) for Treatment-Control plotted against 0; axis annotated “Treatment more effective” and “Treatment less effective”]
47
[Figure: Equivalence. Five example confidence intervals (1-5) for Treatment-Control plotted against 0 and the lower and upper equivalence boundaries]
48
[Figure: Non-Inferiority. Five example confidence intervals (1-5) for Treatment-Control plotted against 0 and the lower equivalence boundary]
49
[Figure: bar chart, Main efficacy End-Point (0% to 100%), Active vs Placebo; values shown: 40%, 10%, 30%; labels A, B, P and “1/2 ?”, “1/3 ?”]
50
[Figure: bar chart, Main efficacy End-Point (0% to 100%), Active 1-3 and Placebo 1-3; values shown: 40%, 15%, 45%, 40%, 20%, 10%]
51
JAMA 2002; 287: 1807-1814
51
52
[Figure: bar chart, Main efficacy End-Point (0% to 100%), Active REF, Placebo and Active Test; values shown: 40%, 10%, 30%]
[Figure: MARKET diagram with successive pairwise comparisons: A-P, A-B, B-C, C-D, E-D]
57
58
[Figure: Superiority. Five example confidence intervals (1-5) for Treatment-Control plotted against 0; axis annotated “Treatment more effective” and “Treatment less effective”]
59
[Figure: Statistically and Clinically Superiority. Five example confidence intervals (1-5) for Treatment-Control plotted against 0 and the MICD]
60
[Figure: Treatment-Control axis with 0 and the lower and upper equivalence boundaries, marking the regions Inferiority, Equivalence, Non-inferiority, Statistical Superiority, and Statistically and Clinically Superiority; non-relevant/negative effect versus relevant effect]
61
Effect Size & Sample Size
Relative Effect Size (%)   Power* (%)   Absolute difference (mmHg)
0%                         4.9%         0.0
10%                        5.9%         0.2
20%                        8.5%         0.4
30%                        13.3%        0.6
40%                        20.2%        0.8
50%                        28.2%        1.0
60%                        39.3%        1.2
70%                        49.3%        1.4
80%                        61.1%        1.6
90%                        71.0%        1.8
100%                       80.4%        2.0
*Statistical power assuming constant variability (SD=20mmHg)
62
Key statistical issues
• Multiplicity
• Subgroups: interaction & confounding
• Superiority and non-inferiority (and )
• Adjustment by covariates
• Missing data
• Others
– Interim analyses
– Meta-analysis vs one pivotal study
– Flexible designs
63
MULTIPLICITY
64
Roland Garros Tournament 1999, 1st Round
Carlos Moyá vs Markus Hipfl

                                Moyá              Hipfl
Total games won                 22                24
Total points won                147               146
1st serve                       62%               69%
Aces                            5                 3
Double faults                   4                 5
% points won on 1st serve       63 of 95 = 66%    61 of 96 = 64%
% points won on 2nd serve       25 of 58 = 43%    20 of 44 = 45%
Winners (including serve)       30                56
Unforced errors                 62                75
Break points won                6 of 21 = 29%     6 of 27 = 22%
Net approaches                  48 of 71 = 68%    29 of 41 = 71%
Fastest serve speed             200 KPH           193 KPH
Average 1st serve speed         157 KPH           141 KPH
Average 2nd serve speed         132 KPH           126 KPH

Set             1   2   3   4   5
Carlos Moyá     3   1   6   6   6
Markus Hipfl    6   6   4   4   4
65
Lancet 2005; 365: 1591–95
To say it colloquially,
torture the data until they speak...
66
Torturing data…
– Investigators examine additional endpoints, manipulate group comparisons, do many subgroup analyses, and undertake repeated interim analyses.
– Investigators should report all analytical comparisons implemented. Unfortunately, they sometimes hide the complete analysis, handicapping the reader’s understanding of the results.
Lancet 2005; 365: 1591–95
68
Multiplicity
K independent hypotheses: $H_{01}, H_{02}, \dots, H_{0K}$
S significant results ($p < \alpha$)
$$\Pr(S \ge 1 \mid H_{01} \cap H_{02} \cap \dots \cap H_{0K} = H_{0\cdot}) = 1 - \Pr(S = 0 \mid H_{0\cdot}) = 1 - (1 - \alpha)^K$$

K    Pr(S>=1|H0.)    K    Pr(S>=1|H0.)
1    0.0500          10   0.4013
2    0.0975          15   0.5367
3    0.1426          20   0.6415
4    0.1855          25   0.7226
5    0.2262          30   0.7854
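The tabulated probabilities follow from 1 - (1 - α)^K with α = 0.05. A minimal sketch that recomputes them is shown here; the function name `familywise_error_rate` is an assumption for this illustration, and the formula holds for independent tests only.

```python
def familywise_error_rate(k, alpha=0.05):
    """Probability of at least one false positive among k independent tests."""
    return 1 - (1 - alpha) ** k

# Same values of K as in the table above
for k in (1, 2, 3, 4, 5, 10, 15, 20, 25, 30):
    print(f"K = {k:2d}   Pr(S >= 1 | H0) = {familywise_error_rate(k):.4f}")
```

With correlated tests the inflation is smaller than this formula suggests, but it is still present.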
69
Some examples

                      case A    case B    case C
Variables             2         5         5
Times                 2         4         4
Subgroups             2         3         3
Comparisons           1         1         3
Total                 8         60        180
False positive rate   33.66%    95.39%    99.99%
70
Multiplicity
• Bonferroni correction (simplified version)
– K tests with a significance level of α
– Each test can be tested at the α/K level
• Example:
– 5 independent tests
– Global level of significance = 5%
– Each test should be tested at the 1% level: 5% / 5 => 1%
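The simplified Bonferroni rule can be sketched as follows. The p-values are hypothetical and the function name `bonferroni` is invented for this example.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold and which tests are significant at it."""
    threshold = alpha / len(p_values)   # each test uses alpha / K
    return threshold, [p <= threshold for p in p_values]

# Five hypothetical p-values tested at a global 5% level
threshold, significant = bonferroni([0.003, 0.02, 0.04, 0.008, 0.30])
print(f"per-test level = {threshold:.3f}")
print(significant)
```

Only the tests below the 1% per-test level are declared significant, even though three others are below the unadjusted 5%.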
71
But this is the simplified version for the general public
73
Some strategies to deal with multiple comparisons
74
Handling Multiplicity in Variables
• Scenario 1: One Primary Variable
– Identify one primary variable; other variables are secondary
– Trial is positive if and only if primary variable shows significant (p < 0.05), positive results
75
76
Handling Multiplicity in Variables
• Scenario 2 Divide Type I Error
– Identify two (or more) co-primary variables
– Divide the 0.05 experiment-wise Type I error over these co-primary variables, e.g., 0.04 for the 1st, and 0.01 for the 2nd co-primary variable
– Trial is positive if at least one of the co-primary variables shows significant, positive results
77
Handling Multiplicity in Variables
• Scenario 3: Sequentially Rejective Procedure
– Identify n co-primary variables, e.g., n = 3
– Order the obtained p-values
  • Interpret the variable with the smallest p-value at the 0.05/3 level;
  • if significant, then interpret the variable with the 2nd smallest p-value at the 0.05/2 level;
  • if positive, then interpret the variable with the highest p-value at the 0.05 level.
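One well-known sequentially rejective procedure is Holm's step-down method, which starts from the smallest p-value and tests at α/n, α/(n-1), …, α, stopping at the first non-significant result. It is used here purely as an illustration and is not necessarily the exact variant intended on the slide; the function name `holm` and the p-values are hypothetical.

```python
def holm(p_values, alpha=0.05):
    """Holm step-down: return True for each hypothesis that is rejected."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # Thresholds: alpha/m for the smallest, then alpha/(m-1), ..., alpha
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # stop at the first non-significant test
    return rejected

# Three hypothetical co-primary variables
print(holm([0.010, 0.020, 0.040]))
print(holm([0.040, 0.001, 0.030]))
```

In the second call only the variable with p = 0.001 is rejected, because 0.030 exceeds the next threshold of 0.05/2.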
78
Handling Multiplicity in Variables
• Scenario 4: Hierarchy
– Pre-specify hierarchy among n co-primary variables
– All tested at the same level
  • interpret 1st variable at 0.05 level; if significant, then
  • interpret 2nd variable at 0.05 level; if positive, then
  • interpret 3rd variable at 0.05 level.
  • …Test procedure stops when a test is not significant.
– Trial is positive if first co-primary variable shows significant, positive result
79
Role of Secondary Variables
• Secondary variables can be claimed if and only if
– the primary variable shows significant results, and
– the comparisons related to the secondary variables are also protected under the same Type I error rate as the primary variable.
• Similar procedures as already discussed can be used to protect Type I error
81
SUBGROUPS
82
Subgroups
• Indiscriminate subgroup analyses pose serious multiplicity concerns. Problems reverberate throughout the medical literature. Even after many warnings, some investigators doggedly persist in undertaking excessive subgroup analyses.
Lancet 2000; 355: 1033–34Lancet 2005; 365: 1657–61
Confounding & Interaction
83
84
Confounding
[Figure: Non-Smokers and Smokers; values shown: d=6%, d=0%, d=0%]
Confounding
• A situation in which a measure of the effect of an exposure on risk is distorted because of the association of exposure with other factor(s) that influence the outcome under study.
• Criteria for confounding
– Factor is associated with exposure
– Factor is associated with disease in the absence of exposure
– Factor is not in the causal path between exposure and outcome
85
Confounding
[Diagram: Exposure, Outcome and a Third variable]
To be a confounding factor, two conditions must be met:
• Be associated with exposure, without being the consequence of exposure
• Be associated with outcome, independently of exposure (not an intermediary)
86
87
Interaction
[Figure: Age < 45 years and Age >= 45 years; values shown: d=5%, d=0.7%, d=11.5%]
88
Interaction & Subgroups
ISIS-2: Vascular death by Star signs

Gemini/Libra        Aspirin    Placebo
Vascular Death      150        147
Total               1357       1442
                    11.1%      10.2%
p=0.42045, d=-0.9

Other Star Signs    Aspirin    Placebo
Vascular Death      654        868
Total               7228       7157
                    9.0%       12.1%
p<0.0001, d=3.1

Interaction p = 0.019
Lancet 1988; 2: 349–60.
89
Changes from ISIS-2 results
Lancet 2005; 365: 1657–61
90
Simpson’s Paradox
            Experimental   Control
            n (%)          n (%)
ALL
  Success   70 (70%)       60 (60%)
  Failure   30 (30%)       40 (40%)
  Total     100            100

Simpson’s Paradox cont.
            Experimental   Control
            n (%)          n (%)
MALE
  Success   10 (33%)       24 (40%)
  Failure   20 (67%)       36 (60%)
  Total     30             60
FEMALE
  Success   60 (86%)       36 (90%)
  Failure   10 (14%)       4 (10%)
  Total     70             40
ALL
  Success   70 (70%)       60 (60%)
  Failure   30 (30%)       40 (40%)
  Total     100            100
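The reversal can be checked with a few lines of code using the counts from the slide tables. The data structure and the helper names `success_rate` and `pooled` are choices made for this sketch.

```python
# (successes, total) per arm within each stratum, taken from the slide tables
strata = {
    "male":   {"experimental": (10, 30), "control": (24, 60)},
    "female": {"experimental": (60, 70), "control": (36, 40)},
}

def success_rate(successes, total):
    return successes / total

def pooled(strata, arm):
    """Add up successes and totals for one arm across all strata."""
    successes = sum(s[arm][0] for s in strata.values())
    total = sum(s[arm][1] for s in strata.values())
    return successes, total

for name, arms in strata.items():
    exp = success_rate(*arms["experimental"])
    ctl = success_rate(*arms["control"])
    print(f"{name:7s} experimental {exp:.0%}  control {ctl:.0%}")

exp_all = success_rate(*pooled(strata, "experimental"))
ctl_all = success_rate(*pooled(strata, "control"))
print(f"all     experimental {exp_all:.0%}  control {ctl_all:.0%}")
```

The experimental arm does worse within each sex but better overall, because the experimental arm contains proportionally more women, who have higher success rates in both arms.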
92
• “The answer to a randomized controlled trial that does not confirm one’s beliefs is not the conduct of several subanalyses until one can see what one believes. Rather, the answer is to re-examine one’s beliefs carefully.”
–BMJ 1999; 318: 1008–09.
93
Lancet 2005; 365: 1657–61
94
the question is NOT: ‘Is the treatment effect in this subgroup statistically significantly different from zero?’
BUT…are there any differences in the treatment effect between the various subgroups?
The correct statistical procedures are either a test of heterogeneity or a test for interaction
95
Subgroups
• Recommendations:
– 1) Examine the global effect
– 2) Test for the interaction
– 3) Plan adjustments for confirmatory analyses
– 4) Some points which increase the credibility:
  • Pre-specification
  • Biologic plausibility
96
Lancet 2005; 365: 176–86
HOW TO CONTROL FOR CONFOUNDERS?
• IN STUDY DESIGN…
– RESTRICTION of subjects according to potential confounders (i.e. simply don’t include confounder in study)
– MATCHING subjects on potential confounder thus assuring even distribution among study groups
– RANDOM ALLOCATION of subjects to study groups to attempt to even out unknown confounders
• IN DATA ANALYSIS…
– RESTRICTION is still possible at the analysis stage but it means throwing away data
– IMPLEMENT A MATCHED-DESIGN after you have collected data (frequency or group)
– STRATIFIED ANALYSIS to control for confounders
– MODEL FITTING using adjustment techniques
97
98
MULTIPLE INSPECTIONS
99
Interim Analyses in the CDP
[Figure: Z value (-2 to +2) by month of follow-up (10 to 100); Month 0 = March 1966, Month 100 = July 1974]
Coronary Drug Project Mortality Surveillance. Circulation. 1973;47:I-1
http://clinicaltrials.gov/ct/show/NCT00000483;jsessionid=C4EA2EA9C3351138F8CAB6AFB723820A?order=23
100
Lancet 2005; 365: 1657–61
101
Sequential designs
1) Sample size re-estimation
2) Group Sequential Methods
3) Alpha (Beta) Spending Functions
4) Repeated Confidence Intervals
5) Stochastic Curtailment
6) Bayesian Methods
7) Likelihood based Methods
105
Group Sequential Methods

       O'Brien & Fleming    Peto               Pocock
K      z        α'          z        α'        z        α'
1      2.782    0.005       2.576    0.010     2.178    0.029
2      1.967    0.049       1.969    0.049     2.178    0.029

1      3.438    0.001       2.576    0.010     2.289    0.022
2      2.431    0.015       2.576    0.010     2.289    0.022
3      1.985    0.047       1.969    0.049     2.289    0.022

1      4.084    0.000       3.291    0.001     2.361    0.018
2      2.888    0.004       3.291    0.001     2.361    0.018
3      2.358    0.018       3.291    0.001     2.361    0.018
4      2.042    0.041       1.969    0.049     2.361    0.018

1      4.555    0.000       3.291    0.001     2.413    0.016
2      3.221    0.001       3.291    0.001     2.413    0.016
3      2.630    0.009       3.291    0.001     2.413    0.016
4      2.277    0.023       3.291    0.001     2.413    0.016
5      2.037    0.042       1.969    0.049     2.413    0.016
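The nominal significance level at each look appears to be the two-sided tail area beyond the tabulated critical value z. A small sketch of that conversion, with the function name `nominal_alpha` chosen for this illustration:

```python
from statistics import NormalDist

def nominal_alpha(z):
    """Two-sided nominal significance level for a critical value z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Critical values for two looks, as tabulated above
for label, z in [("O'Brien & Fleming, look 1", 2.782),
                 ("O'Brien & Fleming, look 2", 1.967),
                 ("Peto, look 1", 2.576),
                 ("Pocock, each look", 2.178)]:
    print(f"{label:28s} z = {z:.3f}  nominal alpha = {nominal_alpha(z):.3f}")
```

This only converts boundaries to nominal levels; deriving the boundaries themselves requires the joint distribution of the sequential test statistics and is not attempted here.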
107
CONCLUSION
108
109
110
111
The role of statistics
“Thus statistical methods are no substitute for common sense and objectivity. They should never aim to confuse the reader, but instead should be a major contributor to the clarity of a scientific argument.”
The role of statistics. Pocock SJ Br J Psychiat 1980; 137:188-190
112
http://ferran.torres.name/edu/stats_rct
Password: stats_rct
BACK-UP
113
114
RANDOMIZATION & COVARIATES
115
116
Adjustment
• The objective should not be to compensate for imbalance (randomisation) but to improve precision
• Avoid adjusting for post-randomization variables
• In RCT, never use this widespread strategy: “adjust by any baseline significant variable (5% or 10% level)”
117
Testing for “baseline homogeneity”
• All observed differences are known with certainty to be due to chance.
• We must not test for it: there is no alternative hypothesis whose truth can be supported by such a test.
• If significant, the estimator is still unbiased
• Balance:
– Decreases the variance and increases the power.
– It has no effect on type I error.
118
Stratification
• A priori
• May desire to have treatment groups balanced with respect to prognostic or risk factors (co-variates)
• For large studies, randomization “tends” to give balance
• For smaller studies a better guarantee may be needed
• Useful only to a limited extent (especially for small trials); avoid too many variables (i.e. many empty or partly filled strata)
119
Observed imbalance…
• NEVER justifies post-hoc adjustment:
– Randomization is more important
– The treatment effect is unbiased without adjustment (randomization)
– Type I error level takes “chance error” into account
– Post-hoc: data-driven analyses
– Multiplicity issues: type I error increases when a post-hoc adjustment is allowed
120
Adjusted Analyses
• ‘ When the potential value of an adjustment is in doubt, it is often advisable to nominate the unadjusted analysis as the one for primary attention, the adjusted analysis being supportive.’
122
MISSING DATA
123
Ex: LOCF & linear extrapolation
[Figure: ADAS-Cog score (4 to 36; higher = worse, lower = better) over time (0 to 18 months), comparing LOCF with linear regression; the gap between them is labelled Bias]
124
Ex: Early drop-out due to AE
[Figure: ADAS-Cog score (4 to 36; higher = worse, lower = better) over time (0 to 18 months) for Placebo and Active. Bias: favours Active]
125
Ex: Early drop-out due to lack of Efficacy
[Figure: ADAS-Cog score (4 to 36; higher = worse, lower = better) over time (0 to 18 months) for Placebo and Active. Bias: favours Placebo]
126
Drop-outs and missing data
[Diagram: randomisation (RND) to A or B, followed from Baseline through Visit 1 and Visit 2 to Last Visit; ≠ Frequencies]
127
Drop-outs and missing data
[Diagram: randomisation (RND) to A or B, followed from Baseline through Visit 1 and Visit 2 to Last Visit; ≠ Timing]
128
MD and incorrect use of analysis populations (1)
• Design: Surgery vs medical treatment in bilateral carotid stenosis (Sackett et al., 1985)
• Primary variable: number of patients presenting TIA, stroke or death
• Distribution of patients:
– Randomized patients: 167
  • Surgical treatment: 94
  • Medical treatment: 73
– Patients who did not complete the study because of a stroke in the initial phases of hospitalization:
  • Surgical treatment: 15 patients
  • Medical treatment: 1 patient
129
MD and incorrect use of analysis populations (2)
First analysis performed:
• Per Protocol (PP) population: patients who completed the study
• Analysis
– Surgical treatment: 43 / (94 - 15) = 43 / 79 = 54%
– Medical treatment: 53 / (73 - 1) = 53 / 72 = 74%
– Risk reduction: 27%, p = 0.02
130
MD and incorrect use of analysis populations (3)
The definitive analysis is as follows:
• Intention-to-Treat (ITT) population: all randomized patients
• Analysis
– Surgical treatment: 58 / 94 = 62%
– Medical treatment: 54 / 73 = 74%
– Risk reduction: 18%, p = 0.09 (PP: 27%, p = 0.02)
• Conclusions:
– The correct analysis population is the ITT
– Surgical treatment has not been shown to be significantly superior to medical treatment
131
Handling of MD
• Methods for imputation:
– Many techniques
– No gold standard for every situation
– In principle, all methods may be valid:
  • Simple methods to more complex:
    – From LOCF to multiple imputation methods
    – Worst Case, “Mean methods”
  • Multiple Imputation
  • But their appropriateness has to be justified
• Statistical approaches less sensitive to MD:
– Mixed models
– Survival models
• They assume no relationship between treatment and the missing outcome, and generally this cannot be assumed.
132
Relationship of MD with
1) Treatment
2) Outcome
133
134
[Figures: schematic data grids of Efficacy by arm (A, B), with observed values marked “X” and missing values marked “.”, and bar charts of observed (Obs) and missing (MD) percentages by arm]
135
Missingness related to treatment (Trt): no; related to outcome (Outc.): no
Missingness: A = none, B = none
Success (observed): A = 5.0%, B = 12.0%
Success (M.D.): A = not applicable, B = not applicable

All patients    A              B
                n (%)          n (%)
Success         5 (5.0%)       12 (12.0%)
Failure         95 (95.0%)     88 (88.0%)
Total           100 (100%)     100 (100%)
% dif = -7.0%; OR = 0.386; RR = 0.417
136
Missingness related to treatment (Trt): no; related to outcome (Outc.): no
Missingness: A = 30.0%, B = 30.0%
Success (observed): A = 5.0%, B = 12.0%
Success (M.D.): A = 5.0%, B = 12.0%

Observed data   A              B
                n (%)          n (%)
Success         3.5 (5.0%)     8.4 (12.0%)
Failure         66.5 (95.0%)   61.6 (88.0%)
Total           70 (100%)      70 (100%)
% dif = -7.0%; OR = 0.386; RR = 0.417

All patients    A              B
                n (%)          n (%)
Success         5 (5.0%)       12 (12.0%)
Failure         95 (95.0%)     88 (88.0%)
Total           100 (100%)     100 (100%)
% dif = -7.0%; OR = 0.386; RR = 0.417
137
Missingness related to treatment (Trt): yes; related to outcome (Outc.): no
Missingness: A = 30.0%, B = 10.0%
Success (observed): A = 5.0%, B = 12.0%
Success (M.D.): A = 5.0%, B = 12.0%

Observed data   A              B
                n (%)          n (%)
Success         3.5 (5.0%)     10.8 (12.0%)
Failure         66.5 (95.0%)   79.2 (88.0%)
Total           70 (100%)      90 (100%)
% dif = -7.0%; OR = 0.386; RR = 0.417

All patients    A              B
                n (%)          n (%)
Success         5 (5.0%)       12 (12.0%)
Failure         95 (95.0%)     88 (88.0%)
Total           100 (100%)     100 (100%)
% dif = -7.0%; OR = 0.386; RR = 0.417
138
Missingness related to treatment (Trt): no; related to outcome (Outc.): yes
Missingness: A = 30.0%, B = 30.0%
Success (observed): A = 5.0%, B = 12.0%
Success (M.D.): A = 10.0%, B = 17.0%

Observed data   A              B
                n (%)          n (%)
Success         3.5 (5.0%)     8.4 (12.0%)
Failure         66.5 (95.0%)   61.6 (88.0%)
Total           70 (100%)      70 (100%)
% dif = -7.0%; OR = 0.386; RR = 0.417

All patients    A              B
                n (%)          n (%)
Success         6.5 (6.5%)     13.5 (13.5%)
Failure         93.5 (93.5%)   86.5 (86.5%)
Total           100 (100%)     100 (100%)
% dif = -7.0%; OR = 0.445; RR = 0.481
139
Missingness related to treatment (Trt): no; related to outcome (Outc.): yes
Missingness: A = 30.0%, B = 30.0%
Success (observed): A = 5.0%, B = 12.0%
Success (M.D.): A = 50.0%, B = 50.0%

Observed data   A              B
                n (%)          n (%)
Success         3.5 (5.0%)     8.4 (12.0%)
Failure         66.5 (95.0%)   61.6 (88.0%)
Total           70 (100%)      70 (100%)
% dif = -7.0%; OR = 0.386; RR = 0.417

All patients    A              B
                n (%)          n (%)
Success         18.5 (18.5%)   23.4 (23.4%)
Failure         81.5 (81.5%)   76.6 (76.6%)
Total           100 (100%)     100 (100%)
% dif = -4.9%; OR = 0.743; RR = 0.791
140
Missingness related to treatment (Trt): yes; related to outcome (Outc.): yes
Missingness: A = 30.0%, B = 10.0%
Success (observed): A = 5.0%, B = 12.0%
Success (M.D.): A = 50.0%, B = 50.0%

Observed data   A              B
                n (%)          n (%)
Success         3.5 (5.0%)     10.8 (12.0%)
Failure         66.5 (95.0%)   79.2 (88.0%)
Total           70 (100%)      90 (100%)
% dif = -7.0%; OR = 0.386; RR = 0.417

All patients    A              B
                n (%)          n (%)
Success         18.5 (18.5%)   15.8 (15.8%)
Failure         81.5 (81.5%)   84.2 (84.2%)
Total           100 (100%)     100 (100%)
% dif = 3%; OR = 1.210; RR = 1.171
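The scenarios above can be recomputed with a short sketch: the success rate among all patients is a weighted mix of the rate in observed patients and the (in practice unknown) rate in patients with missing data. The helper names `arm_success_rate` and `compare` are assumptions for this illustration; the inputs are the percentages shown on the slides.

```python
def arm_success_rate(observed_rate, missing_fraction, rate_among_missing):
    """Success rate in all patients of one arm, observed and missing combined."""
    return ((1 - missing_fraction) * observed_rate
            + missing_fraction * rate_among_missing)

def compare(rate_a, rate_b):
    """Return (difference A - B, odds ratio, relative risk)."""
    diff = rate_a - rate_b
    odds_ratio = (rate_a / (1 - rate_a)) / (rate_b / (1 - rate_b))
    return diff, odds_ratio, rate_a / rate_b

# No missing data: what the observed-data analysis reports in every scenario
print(compare(0.05, 0.12))

# Missingness related to both treatment (30% in A, 10% in B) and outcome
# (50% success among patients with missing data in both arms)
rate_a = arm_success_rate(0.05, 0.30, 0.50)
rate_b = arm_success_rate(0.12, 0.10, 0.50)
print(compare(rate_a, rate_b))
```

In the last scenario the direction of the effect reverses once the missing patients are accounted for, although the observed-data analysis is unchanged.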
Handling of MD
141
Best way to deal with Missing Data: Don’t have any!!!
• Methods for imputation:
– Many techniques
– No gold standard for every situation
– In principle, “almost any method may be valid”
=> But their appropriateness has to be justified
142
Handling of MD
• Avoidance of missingness:
– In the design and conduct all efforts should be directed towards minimising the amount of missing data likely to occur.
– Despite these efforts some missing values will generally be expected.
• The way these missing observations are handled may substantially affect the conclusions of the study.
143
Statistical framework
• Applicability of methods is based on a classification according to missingness generation mechanisms:
– missing completely at random (MCAR)
– missing at random (MAR)
– missing not at random (MNAR)
Rubin (1976)
144
Missing Data Mechanisms
• MCAR - missing completely at random
– Neither observed nor unobserved outcomes are related to dropout
• MAR - missing at random
– Unobserved outcomes are not related to dropout; they can be predicted from the observed data
• MNAR - missing not at random
– Unobserved outcomes are related to dropout
145
MAR methods
• MAR assumption
– MD depends on the observed data
– the behaviour of the post drop-out observations can be predicted with the observed data
– It seems reasonable and it is not a strong assumption, at least a priori
– In RCT, the reasons for withdrawal are known
– Other assumptions seem stronger and more arbitrary
Options after withdrawal
[Figure: score (4 to 36; higher = worse, lower = better) over time (0 to 18 months)]
However…
• It is reasonable to consider that the treatment effect will somehow cease/attenuate after withdrawal
• If there is a good response, MAR will not “predict” a bad response
• =>MAR assumption not suitable for early drop-outs because of safety issues
• In this context MAR seems likely to be anti-conservative
148
The main analysis: what should it reflect?
A) The “pure” treatment effect:
– Estimation using the “on treatment” effect after withdrawal
– Ignore effects (changes) after treatment discontinuation
– Does not mix up efficacy and safety
B) The expected treatment effect in “usual clinical practice” conditions
149
General Strategies
• Complete-case analysis
• “Weighting methods” & Dummy variable/category
• Imputation methods
– Single Imputation / Multiple Imputation
• Analysing data as incomplete
• MNAR methods
• Other methods
150
Complete-case analysis
• a.k.a. Available Data Only (ADO)
• “Case deletion”:
– Listwise deletion (a.k.a. complete-case analysis):
  • delete all cases with a missing value on any of the variables in the analysis. Only use complete cases.
– Pairwise deletion (a.k.a. available-case analysis)
  • use all available cases for computation of any sample moment
• Only OK if missing data are MCAR (very strong assumption)
– Parameter estimates unbiased
– Standard errors appropriate?
• But, can result in substantial loss of statistical power
Complete-case analysis
• Complete case analysis:
– Bias, power and variability
– Not generally appropriate.
– Exceptions:
  • Exploratory studies, especially in the initial phases of drug development.
  • Secondary supportive analysis in confirmatory trials (robustness)
152
153
“Weighting methods” & Dummy variable/category
• “Weighting methods”:
– To construct weights for incomplete/under-represented cases
– Sometimes considered as a form of imputation
• Dummy variable/category adjustment
– Cohen & Cohen (1985); produces biased coefficient estimates (see Jones’ 1996 JASA article)
Utility: observational studies; exploratory analyses
154
155
Single Imputation
• Substitute a value for each missing value. Some of the ways to choose this value:
– Mean Estimation
  • Replace missing data with the mean of non-missing values.
– Class Imputation methods
  • Stratify and sort by key covariates, replace missing data from another record in the same strata.
– Predict missing values from Regression
  • Impute each independent variable on the basis of other independent variables in model.
– LOCF / BOCF
– Other single imputation methods:
  • Rank/Score based methods
  • Worst (best) case
  • EM estimation
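Three of the simplest single-imputation rules listed above can be sketched in plain Python. Missing values are represented by `None`; the function names and the example scores are hypothetical, and the sketch says nothing about whether any of these rules is appropriate for a given trial.

```python
from statistics import mean

def mean_imputation(values):
    """Replace each missing value with the mean of the non-missing values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def locf(visits):
    """Last observation carried forward within one patient's visit sequence."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)  # stays None until a first value has been observed
    return filled

def bocf(visits):
    """Baseline observation carried forward (baseline = first visit)."""
    return [visits[0] if v is None else v for v in visits]

# Hypothetical scores: one variable across patients, then one patient across visits
print(mean_imputation([20, None, 24, 28]))
print(locf([20, 22, None, None]))
print(bocf([20, 22, None, None]))
```

All three rules treat the imputed values as if they had been observed, so the resulting standard errors tend to be too small.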
156
Mean Imputation
Scatterplots are from Joe Schafer’s website
157
Regression methods
158
Imputation methods
• LOCF and variants
– Bias:
• Depends on the amount and timing of drop-outs
• Ex: the condition under study has a worsening course
– Conservative:
» Drop-outs because of lack of efficacy in the control group
– Anticonservative:
» Drop-outs because of intolerance in the test group
– Use: only under the MCAR assumption and if there are no trends over time
– BOCF is useful in some cases, such as a chronic pain trial
• It is reasonable to assume that when a patient withdraws and treatment is stopped, pain levels return to baseline levels.
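As an illustration of the two rules, here is a minimal sketch with a hypothetical patient who withdraws after the second follow-up visit. The function names `locf` and `bocf` and the pain scores are invented for this example.

```python
def locf(visits):
    """Carry the last observed value forward over missing (None) visits."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def bocf(visits):
    """Replace every missing visit with the baseline (first) value."""
    baseline = visits[0]
    return [baseline if value is None else value for value in visits]


# Hypothetical pain scores: baseline plus four follow-up visits.
scores = [8, 6, 5, None, None]

locf_scores = locf(scores)
bocf_scores = bocf(scores)
print("LOCF:", locf_scores)
print("BOCF:", bocf_scores)
```

LOCF freezes the patient at the last (improved) score, while BOCF assumes the benefit is lost once treatment stops; which of the two is conservative depends on the disease course and on which arm the drop-outs come from.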
159
160
Ex: LOCF & linear extrapolation
[Figure: ADAS-Cog score (0 to 36; higher = worse, lower = better) vs. time (0 to 18 months). Labels: LOCF, linear regression, bias.]
Ex: Early drop-out due to AE
[Figure: ADAS-Cog score (0 to 36; higher = worse, lower = better) vs. time (0 to 18 months). Curves: Placebo, Active. Bias: favours Active.]
161
Ex: Early drop-out due to lack of Efficacy
[Figure: ADAS-Cog score (0 to 36; higher = worse, lower = better) vs. time (0 to 18 months). Curves: Placebo, Active. Bias: favours Placebo.]
162
Example of interpolation / Regression imputation
[Figure: ADAS-Cog score (0 to 36) vs. time (0 to 18 months).]
163
Single Imputation
• Substitute a value for each missing value. Some of the ways to choose this value:
– Mean Estimation
– Class Imputation methods
– Predict missing values from Regression
– LOCF / BOCF
– Other single imputation methods:
• Rank/Score based methods
• Worst (best) case
164
Single Imputation Pros - Cons
• Advantages-Single Imputation
– Allows standard complete-data methods of analysis to be used
– Incorporates the data collector’s knowledge
• Disadvantages-Single Imputation
– Inferences based on the imputed data set might be too sharp (imputed values are treated as if they had been observed)
– Correlations can be biased
165
General Strategies
• Complete-case analysis
• “Weighting methods” & Dummy variable/category
• Imputation methods
– Single Imputation / Multiple Imputation
• Analysing data as incomplete
• MNAR methods
• Other methods
166
Analysing data as incomplete
• Direct Estimation:
– GEE analysis
– Likelihood methods
– Bayesian estimation with Markov chain Monte Carlo (e.g. Metropolis-Hastings)
NMAR procedures usually use one of these methods or their extensions.
• Time to event variables
167
Analysing data as incomplete
• For continuous responses:
– mixed-effect models for repeated measures (MMRM)
• For categorical responses and count data:
– marginal models (e.g. generalized estimating equations, GEE)
– random-effects models (e.g. generalized linear mixed models, GLMM)
• Missing data (MD) are not imputed
• Information is borrowed from cases where the information is available
• MAR assumption
168
Analysing data as incomplete
Time to event analysis
• When the outcome measure is time to event, survival models that take censored observations into account are often used.
• Many standard survival methods assume that censoring is non-informative, i.e. that there is no relationship between the response and the fact that the outcome is missing.
• Violations of this assumption could lead to biased results, especially when data are missing due to withdrawal.
169
General Strategies
• Complete-case analysis
• “Weighting methods” & Dummy variable/category
• Imputation methods
– Single Imputation / Multiple Imputation
• Analysing data as incomplete
• Other methods
170
3 Steps in Multiple Imputation (MI)
1. Create imputations (>1 for each missing value)
2. Analyze the imputed datasets
3. Combine the results
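Step 3 is commonly carried out with Rubin's rules; the slide does not name the combining method, so that choice is an assumption here. The sketch below pools a treatment-effect estimate across m = 5 imputed datasets. The function name `pool_rubin` and all numbers are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool per-dataset estimates and their variances with Rubin's rules."""
    m = len(estimates)
    pooled_estimate = statistics.mean(estimates)
    within = statistics.mean(variances)       # average sampling variance
    between = statistics.variance(estimates)  # variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled_estimate, math.sqrt(total)


# Hypothetical treatment-effect estimates and standard errors,
# one pair per imputed dataset (steps 1 and 2 are assumed done).
estimates = [2.1, 1.8, 2.4, 2.0, 2.2]
std_errors = [0.50, 0.52, 0.49, 0.51, 0.50]

pooled, pooled_se = pool_rubin(estimates, [se ** 2 for se in std_errors])
print(f"pooled estimate: {pooled:.3f}")
print(f"pooled SE:       {pooled_se:.3f}")
```

The pooled standard error is larger than any single-dataset standard error because the between-imputation variance carries the uncertainty about the missing values.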
171
172
Multiple Imputation
173
Multiple Imputation
Advantages of MI
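• Imputing more than one value for each missing observation reflects the uncertainty about the imputed values
• Combining the results yields efficient and unbiased estimates, provided the imputation model and the MAR assumption are appropriate
=> Correct inference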
174
General Strategies
• Complete-case analysis
• “Weighting methods” & Dummy variable/category
• Imputation methods
– Single Imputation / Multiple Imputation
• Analysing data as incomplete
• MNAR methods
• Other methods
175
NMAR Missing Data
• Pattern Mixture Models
• Selection Models
• Other:
– Auxiliary variables
• Can alleviate NMAR bias (if they correlate highly with the missing values)
– Shared parameter/Joint models
Be extremely cautious in the interpretation!
176
Other methods
• Retrieval of data after withdrawal
– Assessments may be affected by external treatments, but reflect clinical practice
– Balance: possible influence of external treatments after withdrawal vs. possible bias due to the process of imputation or direct estimation
– Not biased when there are no effective treatments in the particular setting
• Responder analysis
– Patients who drop out for reasons likely to be treatment related (such as lack of efficacy or safety issues) are considered non-responders.
177
178
Definition of the different study populations
179
Objective: To evaluate the efficacy of a weight-reduction programme versus usual advice
Design: Randomised clinical trial
Candidates: 790
Obese: 320
Intervention group: 161
Control group: 159
Refusal: 59    Spontaneous request: 54
Completed: 102    Completed: 105
180
Intervention group: 161
Control group: 159
Refusal: 59    Spontaneous request: 54
Completed: 102    Completed: 105
           Intervention group   Control group
Option A   161                  159
Option B   102                  105
Option C   59                   54
Option D   156                  164
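The four options can be reproduced from the flow counts. The sketch below assumes that the 59 refusals occurred in the intervention arm and the 54 spontaneous requests in the control arm; the labels in the comments (as randomised, completers, as treated) are conventional names and do not appear on the slide.

```python
randomised = {"intervention": 161, "control": 159}
completed = {"intervention": 102, "control": 105}
refused = 59    # assumed: randomised to intervention, did not follow the programme
requested = 54  # assumed: randomised to control, asked for the programme

# Option A: everyone analysed in the arm they were randomised to.
option_a = (randomised["intervention"], randomised["control"])

# Option B: only patients who completed the study.
option_b = (completed["intervention"], completed["control"])

# Option C: only the patients who switched arms.
option_c = (refused, requested)

# Option D: patients analysed by what they actually received.
option_d = (
    randomised["intervention"] - refused + requested,
    randomised["control"] - requested + refused,
)

for name, counts in zip("ABCD", (option_a, option_b, option_c, option_d)):
    print(f"Option {name}: intervention={counts[0]}, control={counts[1]}")
```

Under these assumptions the arithmetic matches the counts shown for Option D, which suggests it is an analysis by treatment received, whereas Option A keeps the groups as randomised.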
Multivariate analysis
Main advantages
• Adjustment for imbalanced variables
• Adjustment for differing baseline values
• Multivariate significance test
Multivariate analysis: Simpson’s paradox (1951)
• In 1973, 8,442 men and 4,321 women applied for admission to the University of Berkeley.
• 44% of the men and 30% of the women were admitted.
• The federal government accused the University of Berkeley of sex discrimination.
Multivariate analysis: Simpson’s paradox (1951)
Application and admission rates in the 6 main programmes.
OR for men = 1.54: the odds of being admitted are 1.54 times higher for a man.
Programme  Men      Men       Proportion    Women    Women     Proportion      OR
           applied  admitted  men admitted  applied  admitted  women admitted  (men vs. women)
A          825      512       0.62          108      89        0.82            0.35
B          560      353       0.63          25       17        0.68            0.80
C          325      120       0.37          593      202       0.34            1.13
D          417      138       0.33          375      202       0.54            0.42
E          191      53        0.28          393      94        0.24            1.22
F          373      22        0.06          341      24        0.07            0.83
Total      2691     1198      0.45          1835     628       0.34            1.54
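The odds ratios in the last column can be recomputed from the counts. This sketch uses the figures exactly as they appear in the table; the helper name `odds_ratio` is introduced only for illustration.

```python
# (men applied, men admitted, women applied, women admitted), as given in the table.
table = {
    "A": (825, 512, 108, 89),
    "B": (560, 353, 25, 17),
    "C": (325, 120, 593, 202),
    "D": (417, 138, 375, 202),
    "E": (191, 53, 393, 94),
    "F": (373, 22, 341, 24),
}


def odds_ratio(men_applied, men_admitted, women_applied, women_admitted):
    """Odds of admission for men divided by odds of admission for women."""
    odds_men = men_admitted / (men_applied - men_admitted)
    odds_women = women_admitted / (women_applied - women_admitted)
    return odds_men / odds_women


per_programme = {name: round(odds_ratio(*row), 2) for name, row in table.items()}

# Pooled (crude) odds ratio: add the counts over programmes first.
totals = tuple(sum(column) for column in zip(*table.values()))
pooled = round(odds_ratio(*totals), 2)

print(per_programme)
print("pooled:", pooled)
```

Four of the six programme-level odds ratios are below 1 (women favoured), yet the pooled odds ratio is above 1, which is the reversal known as Simpson's paradox.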
Multivariate analysis: Simpson’s paradox (1951)
• In most programmes women had higher admission rates than men (A, B, D, F).
• The exceptions were small in magnitude (C, E).
So where does the idea of discrimination come from?
Programme  Proportion    Proportion      OR
           men admitted  women admitted  (men vs. women)
A          0.62          0.82            0.35
B          0.63          0.68            0.80
C          0.37          0.34            1.13
D          0.33          0.54            0.42
E          0.28          0.24            1.22
F          0.06          0.07            0.83
Total      0.45          0.34            1.54
Multivariate analysis: Simpson’s paradox (1951)
• Note that programmes A and B are very easy to get into and that many men apply to them.
• Women, however, tend to apply to programme F, which is very hard to get into.
Programme  Share of male  Proportion    Share of female  Proportion
           applicants     men admitted  applicants       women admitted
A          0.31           0.62          0.06             0.82
B          0.21           0.63          0.01             0.68
C          0.12           0.37          0.32             0.34
D          0.15           0.33          0.20             0.54
E          0.07           0.28          0.21             0.24
F          0.14           0.06          0.19             0.07
Total                     0.45                           0.34
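The overall admission rate of each sex is the average of the programme-level rates weighted by the share of applicants in each programme. The sketch below checks this with the counts from the earlier admissions table. As an extra step that is not on the slides, it also applies the men's application pattern to the women's admission rates (a simple direct standardisation), to show what the women's overall rate would have been had they applied like the men.

```python
# (men applied, men admitted, women applied, women admitted) per programme.
table = {
    "A": (825, 512, 108, 89),
    "B": (560, 353, 25, 17),
    "C": (325, 120, 593, 202),
    "D": (417, 138, 375, 202),
    "E": (191, 53, 393, 94),
    "F": (373, 22, 341, 24),
}

men_total = sum(row[0] for row in table.values())
women_total = sum(row[2] for row in table.values())

# Overall rate = sum over programmes of (share of applicants) x (admission rate).
men_overall = sum(
    (m_app / men_total) * (m_adm / m_app) for m_app, m_adm, _, _ in table.values()
)
women_overall = sum(
    (w_app / women_total) * (w_adm / w_app) for _, _, w_app, w_adm in table.values()
)

# Women's programme-level rates weighted by the men's application shares.
women_standardised = sum(
    (m_app / men_total) * (w_adm / w_app) for m_app, _, w_app, w_adm in table.values()
)

print(f"men, overall:               {men_overall:.2f}")
print(f"women, overall:             {women_overall:.2f}")
print(f"women, men's applying mix:  {women_standardised:.2f}")
```

With the same application mix as the men, the women's overall rate exceeds the men's, so the crude difference comes from where each sex applied rather than from the admission rates within programmes.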