THE RELIABILITY AND UNRELIABILITY OF SUB-GROUP ANALYSES
JEFFREY L. PROBSTFIELD, MD, FACP, FACC, FAHA, FESC, FSCT
Professor of Medicine (Cardiology)
University of Washington
Research grants- Abbott, Boehringer Ingelheim, King, Sanofi-Aventis Pharmaceuticals, NHLBI, NCI;
Consultantship-King Pharmaceuticals; no stocks, options or BOD positions
DILEMMA
“The response of the average patient to therapy is not necessarily the response of
the patient being treated.”
Bernard, 1865
SUBGROUP ANALYSES DRIVEN BY:
“Should all patients be given XYZ before, during, or after ABC or
can/should treatment be limited to a select group?”
SUBGROUPS- PETO
• Only one thing is worse than doing subgroup analyses: believing the results
PETO: HOW TO SPOIL A GOOD TRIAL RESULT
1. Undertake many data-dependent subgroup analyses.
2. Find some subgroups where treatment has no significant effect (or even, perhaps, no apparent effect whatsoever).
3. Publish the findings in such a way that many readers believe them.
CARE RESULTS - NO CHD RISK REDUCTION BELOW LDL-C 125 mg/dL
“…Although our finding cannot be considered definitive and requires confirmation, it suggests that an LDL cholesterol level of 125 mg per deciliter may be an approximate lower boundary for a clinically important influence of the LDL cholesterol level on coronary heart disease…”
NEJM 1996;335:1001-1009
CHOLESTEROL LEVELS AND CHD RISK REDUCTION
Cholesterol and Recurrent Events (CARE)
4159 Participants
Plasma LDL-C    Participants With Events    % Risk Reduction
(mg/dL)         Placebo     Pravastatin     (95% CI)
≤ 137           269         210             23 (8 to 36)
> 137           280         220             24 (10 to 36)
NEJM 1996;335:1001-1009
CHOLESTEROL LEVELS AND CHD RISK REDUCTION
Cholesterol and Recurrent Events (CARE)
4159 Participants
Plasma LDL-C    Participants With Events    % Risk Reduction
(mg/dL)         Placebo     Pravastatin     (95% CI)
< 125           93          89              -3 (-38 to 23)
125-150         311         239             26 (13 to 38)
> 150-175       145         102             35 (17 to 50)
NEJM 1996;335:1001-1009
CARE SUBGROUP ANALYSIS: RISK REDUCTION BELOW BASELINE LDL-C (mg/dL)
Concerns about divisions described
Plasma LDL-C     Participants (N)    % CHD Risk Reduction    95% CI
> x (20%)        850                 N/A                     N/A
> 150 mg/dL      953                 35                      17 to 50
125 - 150        2355                26                      13 to 38
< 137.5          2090                23                      8 to 36
< 130            1386                15                      N/A
< 127            1034                10                      N/A
< 125 (20%)      851                 -3                      -38 to 23
HPS
LDL-C        N       RRR CHD
<3 (<116)    6793    33%
>3 <3.5      5063    25%
>3.5         8680    42%
LDL-C        N       Δ LDL-C    E      RRR CHD
<100         3421    -35        69     22%
100-130      7068    -37        86     28%
>130         9927    -39        104    24%
No interaction with Vitamin Cocktail
PROPER SUBGROUP
“A common set
of baseline parameters”
IMPROPER SUBGROUP:
“Characterized by a variable
measured after randomization”
Hypokalemia Associated With Diuretic Use and CV Events in SHEP
Franse, et al. Hypertension. 2000;35:1025-1030
CHD Event Rate by Year 1 K+ Strata
[Figure: cumulative CHD event rate (0.00 to 0.15) versus years to CHD (0 to 5), plotted for three strata: Y1 K+ < 3.5, Y1 K+ = 3.5-5.4, Y1 K+ > 5.4]
                  HR      95% CI
Hyper/Normo-K+    1.28    0.69, 2.40
Hypo/Normo-K+     1.03    0.82, 1.30
EXAMPLES OF IMPROPER SUBGROUPS:
1. Responders vs. nonresponders
2. Adherers vs Non-adherers
FIVE-YEAR MORTALITY ACCORDING TO BASE-LINE CHOLESTEROL AND CHANGE FROM BASE-LINE, ADJUSTED FOR 40 BASE-LINE CHARACTERISTICS
                                 Treatment Group
Baseline       Cholesterol   Clofibrate                     Placebo
Cholesterol    Change        No. of Pts.   % mortality      No. of Pts.   % mortality
(MG/DL)
< 250          All men       507           20.0 ± 1.8       1319          19.9 ± 1.1
> 250          All men       490           17.5 ± 1.7       1216          20.6 ± 1.2
All men        Fall          680           17.2 ± 1.4       1376          20.7 ± 1.1
All men        Rise          317           22.2 ± 2.3       1159          19.7 ± 1.2
< 250          Fall          295           16.0 ± 2.1       614           21.2 ± 1.6
< 250          Rise          212           25.5 ± 3.0       705           18.7 ± 1.5
> 250          Fall          385           18.1 ± 2.0       762           20.2 ± 1.5
> 250          Rise          105           15.5 ± 3.5       454           21.3 ± 1.9
FIVE-YEAR MORTALITY: PATIENTS GIVEN CLOFIBRATE OR PLACEBO, ACCORDING TO CUMULATIVE ADHERENCE TO PROTOCOL PRESCRIPTION
                     Treatment Group
                     Clofibrate                          Placebo
Adherence            No. of Pts.   % mortality           No. of Pts.   % mortality
< 80%                357           24.6 ± 2.3 (22.5)     882           28.2 ± 1.5 (25.8)
> 80%                708           15.0 ± 1.3 (15.7)     1813          15.1 ± 0.8 (16.4)
Total study group    1065          18.2 ± 1.2 (18.0)     2695          19.4 ± 0.8 (19.5)
STRUCTURED HYPOTHESES
• Carefully state hypothesis
• Allow analyses to capture the effect
INTERACTION (Differential Subgroup Effect)
“A treatment effect that
differs by subgroup.”
QUANTITATIVE INTERACTION
Different amount (quantity) of
benefit in various subgroups.
QUALITATIVE INTERACTION
True Benefit in some subgroups
and True Harm in others
(1 of over 700)
QUALITATIVE DIFFERENCES - WHY NOT?
• Extremes excluded
• Lack of replication in other studies
BIASES AND ERRORS IN DETERMINING SUBGROUP
EFFECTS
1. Subgroups lack statistical power
2. Random variation - widely divergent estimates of treatment benefit
3. Statistical multiplicity
4. Post-hoc analyses - extreme results the product of random errors
5. Replication - a posteriori vs. a priori
FURBERG AND BYINGTON
CIRCULATION 1983;67:I98-I101
• 146 Subgroups in BHAT: Few defined a priori
• Distribution of subgroup results - Gaussian
• Impact of change on data set inversely related to sample size. (Participants or deaths gives similar distribution)
3 CRITERIA FOR CONFIDENCE IN SUBGROUP FINDING
• Dose response relationship
• Independent findings within the study
• Replication by outside trial
EXPECTED EFFECTS OF TRIAL SIZE ON TRIAL RESULTS
Total no. of deaths   (Approx. no. of       Approx. probability of       Comments that might
in trial              patients randomized   failing to achieve P<0.01    be made of size
(treated+control)     if risk 10%)          significance if true risk    before trial begins
                                            reduction is 1/4
0-50                  (under 500)           Over 0.9                     Utterly inadequate
50-150                (1000)                0.7-0.9                      Probably inadequate
150-350               (3000)                0.3-0.7                      Probably adequate, possibly not
350-650               (6000)                0.1-0.3                      Possibly adequate
Over 650              (10,000)              Under 0.1                    Definitely adequate
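The failure probabilities in this table can be roughly reproduced with a normal approximation. The sketch below is only an illustration, not the method used to build the table. It assumes two equal-sized arms and treats death counts as Poisson, so the difference in deaths between arms has variance close to the total number of deaths. The function names `normal_cdf` and `prob_failing` are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Two-sided critical value of the standard normal for P < 0.01.
Z_CRIT_P01 = 2.5758

def prob_failing(total_deaths, risk_reduction=0.25, z_crit=Z_CRIT_P01):
    """Approximate probability of NOT reaching significance.

    Assumptions (mine, not the slide's): equal allocation, and death
    counts treated as Poisson, so Var(control deaths - treated deaths)
    is about total_deaths.
    """
    rr = 1.0 - risk_reduction
    # Expected excess of control over treated deaths is
    # total_deaths * (1 - rr) / (1 + rr); dividing by sqrt(total_deaths)
    # gives the expected z statistic.
    expected_z = math.sqrt(total_deaths) * (1.0 - rr) / (1.0 + rr)
    power = normal_cdf(expected_z - z_crit)
    return 1.0 - power

for deaths in (50, 150, 350, 650, 1000):
    print(f"{deaths:5d} deaths: P(fail) ~ {prob_failing(deaths):.2f}")
```

Under these assumptions the computed values fall inside the bands quoted in the table, which is the point of the slide: power is driven by the number of deaths, not by the number of patients randomized.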
Actual effects of trial size on trial results. Relationship between the total number of deaths in the two treatment groups and the result actually attained, in the 24 trials of a treatment (long-term beta-blockade) that reduces the odds of death by about 22 ± 4%
                                                                          No. of trials resulting in:
Total no. of deaths   (Mean no. of patients   Statistical power           P<0.05    Non-sigt.  Non-sigt.  P<0.05
in trial              randomized)                                         against   against    favorable  favorable
(β-bl.+plac.)
0-50                  (255)                   Utterly inadequate          0         5          5          0
50-150                (861)                   Probably inadequate         0         1          9          1
150-350               (2925)                  Possibly adequate,          0         0          1          2
                                              probably not
350-650               (No such β-bl.          Probably adequate           -         -          -          -
                      trials exist)
Over 650              (No such β-bl.          Definitely adequate         -         -          -          -
                      trials exist)
TOTAL                 (866)                   Inadequate separately,      0         6          15         3
                                              adequate only in aggregate
SUBGROUP EFFECT
Treatment effect in a specific proper subgroup.
Must be significantly different from
overall effect!!
HYPOTHETICAL SUBGROUP EFFECTS ILLUSTRATING THE “PLAY OF CHANCE” IN A TRIAL THAT SHOWS CLEAR OVERALL BENEFIT
                  Treated (%)       Control (%)       Risk Decrease (%)    p Value
Overall result    240/3,000 (8)     300/3,000 (10)    20                   < 0.01
Subgroup A        80/1,000 (8)      100/1,000 (10)    20                   NS
Subgroup B        70/1,000 (7)      110/1,000 (11)    36                   < 0.001
Subgroup C        90/1,000 (9)      90/1,000 (9)      0                    NS
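The risk decreases in this hypothetical table follow directly from the counts, and an unadjusted test shows how the same overall effect scatters across subgroups. The sketch below is a minimal illustration using a pooled two-proportion z test. The choice of test and the function names are assumptions, not taken from the slides, so the p values need not match the table exactly.

```python
import math

def two_sided_p(events_t, n_t, events_c, n_c):
    """Two-sided p value from a pooled two-proportion z test."""
    p_t, p_c = events_t / n_t, events_c / n_c
    pooled = (events_t + events_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    if se == 0:
        return 1.0
    z = (p_c - p_t) / se
    return math.erfc(abs(z) / math.sqrt(2))

def risk_decrease_pct(events_t, n_t, events_c, n_c):
    """Relative risk reduction, in percent, of treated versus control."""
    return 100 * (1 - (events_t / n_t) / (events_c / n_c))

# (treated events, treated n, control events, control n) from the table.
rows = {
    "Overall":    (240, 3000, 300, 3000),
    "Subgroup A": (80, 1000, 100, 1000),
    "Subgroup B": (70, 1000, 110, 1000),
    "Subgroup C": (90, 1000, 90, 1000),
}

for name, counts in rows.items():
    print(f"{name:11s} risk decrease {risk_decrease_pct(*counts):5.1f}%  "
          f"p = {two_sided_p(*counts):.4f}")
```

Subgroup A has the same point estimate as the overall result yet is not significant on its own, because it carries only a third of the events.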
MULTIPLE COMPARISONS
Example:
• 1,000 participants, mortality rate = 10%
• Treat A, Treat B: treatments equally effective
• 10 subgroups (equal size) randomly formed
Relative Risk      Probability
Reduction to:      (percent)
.5                 99
.33                80
.1                 5
• Nominal p value - inappropriate
• Conservative approach - p/Sn
• Especially important in trial where main outcome is not statistically significant
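The multiplicity problem can be shown with simple arithmetic. The sketch below assumes independent subgroup tests and reads "p/Sn" as the significance level divided by the number of subgroups (a Bonferroni-style correction). Both are my reading of the slide rather than something it states, and the function names are hypothetical.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent tests is nominally
    significant when there is no true difference anywhere."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_subgroups, alpha=0.05):
    """Per-subgroup significance threshold: alpha / number of subgroups."""
    return alpha / n_subgroups

n_subgroups = 10
nominal = prob_at_least_one_false_positive(n_subgroups)
threshold = bonferroni_threshold(n_subgroups)
corrected = prob_at_least_one_false_positive(n_subgroups, alpha=threshold)

print(f"Nominal 0.05 in each of {n_subgroups} subgroups: "
      f"P(at least one false positive) = {nominal:.2f}")
print(f"Corrected per-subgroup threshold: {threshold:.3f}")
print(f"With the corrected threshold: {corrected:.3f}")
```

With ten subgroups, testing each at a nominal 0.05 gives roughly a 40% chance of at least one spurious "significant" subgroup. The corrected threshold brings that back below 5%.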
Subgroup Results
Mortality and Morbidity
Subgroup              % pts
Overall               100
Beta blocker (yes)    35
no beta blocker       65
ACEI (yes)            93
no ACEI*              7
[Figure: forest plot of relative risk by subgroup, axis from 0.50 to 1.25, labeled Valsartan better / Valsartan worse. Values shown on the plot: .87 (.77, .97); 44.0%, P=.0002]
                       RR of death    P
no ACEI                41%            <0.05*
ACEI + Beta blocker    42%            0.009
*n=366
Cohn. N Engl J Med. 2001; *FDA analysis/package insert
CV Death or Hospitalization for CHF
                          Candesartan    Placebo
                          event/n        event/n
Diabetes          No      680/2715       815/2721
                  Yes     470/1088       495/1075
Hypertension      No      484/1710       579/1703
                  Yes     666/2093       731/2093
ACEIs             No      586/2230       688/2244
                  Yes     564/1573       622/1552
Beta blocker      No      611/1701       710/1695
                  Yes     539/2102       600/2101
Spironolactone    No      880/3160       1041/3167
                  Yes     270/643        269/629
Overall                   1150/3803      1310/3796
Test for interaction: P=0.09, P=0.51, P=0.32, P=0.19, P=0.17
[Figure: forest plot of hazard ratios by subgroup, axis from 0.6 to 1.4, labeled candesartan better / placebo better]
EXAMPLE OF “SUBGROUPING” IN THE SECOND INTERNATIONAL STUDY OF INFARCT SURVIVAL (ISIS-2): ASTROLOGY AND ASPIRIN
Vascular Mortality at Week 5
                                             Aspirin (%)           Placebo (%)            Odds Decrease (% ± SD)
Patients born under Libra and Gemini         150/1,357 (11.1)      147/1,442 (10.2)       8% adverse (NS)
Patients born under other “birth signs”      654/7,228 (9.0)       868/7,157 (12.1)       26% ± 5 (p<0.0001)
Overall results                              804/8,587 (9.4)       1,016/8,600 (11.8)     23% ± 4 (p<0.0001)
ORDERED SUBGROUPS
• Strong biological rationale
• Reflects natural ordering
• Correct for multiplicity
• Only indicate those as significant which have a p value less than p/Sn
Reduction of Stroke According to Sex
                    Stroke Rates
           N        Active Rx    Placebo    % Difference
Males      2046     5.5%         8.7%       -38%
Females    2690     4.7%         7.3%       -31%
GISSI-1
• Overall result - streptokinase treatment, 20% reduction in total mortality
• Benefit confined to:
– Anterior MI
– Age under 65 years
– Treatment within 6 hours
• Subsequent trials and pooled results do not confirm
HYPOTHETICAL EXAMPLE OF ORDERED SUBGROUPS: RELATIVE RISK AS A FUNCTION OF TIME OF THROMBOLYTIC THERAPY
                    14-d Mortality (%)
Hours After Pain    Treated    Control    Relative Risk    P
≤ 2                 10         20         .50              .01
> 2 – 4             11         16         .70              NS
> 4 – 8             17         22         .77              NS
> 8 – 12            20         25         .80              NS
> 12                23         24         .95              NS
Overall             14         21         .67              .001
SUBGROUPS DEFINED A-PRIORI
• Suggestive differential subgroup effect
• State in design of new trial
• Publish (multiplicity, design analysis, plan over-sampling)
• Test within an existing data set
STROKE SUBGROUP HYPOTHESIS
                                On BP Meds at ICV    Off BP Meds at ICV
Share of participants           35%                  65%
Net reduction in stroke rate    10%                  40%
80% power to detect 30% treatment difference
STROKE EVENTS BY MEDICATION STATUS
GROUP       N       NFS    FS    CS
Not on Medications
Active      1584    64     5     67
Placebo     1577    88     11    96
χ² = 6.72, P = .0096; Relative Risk (active/placebo) = 0.69, 95% CI = 0.51-0.95
On Medications
Active      781     32     5     36
Placebo     794     61     3     63
χ² = 5.11, P = .0237; Relative Risk (active/placebo) = 0.57, 95% CI = 0.38-0.85
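Whether the two strata above really differ is a question for an interaction test, not for comparing the two p values. The sketch below compares crude log relative risks with a z test. It assumes the CS column is the count of participants with any stroke. The crude ratios it computes may differ slightly from the relative risks on the slide, which may come from a different method. The function names are hypothetical.

```python
import math

def log_rr_and_var(events_a, n_a, events_p, n_p):
    """Crude log relative risk (active/placebo) and its approximate variance."""
    log_rr = math.log((events_a / n_a) / (events_p / n_p))
    var = 1 / events_a - 1 / n_a + 1 / events_p - 1 / n_p
    return log_rr, var

def interaction_test(stratum_1, stratum_2):
    """z test for a difference between two subgroup log relative risks.

    Returns (rr_1, rr_2, z, two-sided p).
    """
    l1, v1 = log_rr_and_var(*stratum_1)
    l2, v2 = log_rr_and_var(*stratum_2)
    z = (l1 - l2) / math.sqrt(v1 + v2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(l1), math.exp(l2), z, p

# (active events, active N, placebo events, placebo N); events = CS column,
# assumed here to be participants with any stroke.
not_on_meds = (67, 1584, 96, 1577)
on_meds = (36, 781, 63, 794)

rr_off, rr_on, z, p_interaction = interaction_test(not_on_meds, on_meds)
print(f"RR not on meds = {rr_off:.2f}, RR on meds = {rr_on:.2f}")
print(f"Interaction z = {z:.2f}, p = {p_interaction:.2f}")
```

Under these assumptions the interaction p value is far from significant: both strata show benefit, and the data give no reliable evidence that the effect differs by medication status.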
MRFIT RESEARCH GROUP, AM J CARDIOL 1985;55:1-15
ECG ABNORMALITIES AT BASELINE
                   Present                                 Absent
                   Person   No. of   Per 1000              Person   No. of   Per 1000
                   Yrs.     Deaths   Person Yrs.           Yrs.     Deaths   Person Yrs.
Not on Diuretic    6074     9        1.48                  17,356   35       2.02
On Diuretic        6433     38       5.91                  14,399   33       2.29
SUBGROUP HYPOTHESIS
Will the treatment of ISH reduce the frequency of major coronary events more in those free of baseline ECG
abnormalities than in those with such abnormalities?
OTHER COMBINED EVENTS BY TREATMENT GROUP
Event                     Active    Placebo    Rel. Risk    95% CI
Nonfatal MI/CHD Death     104       141        0.73         0.57-0.94
Stroke/MI/CHD/Death       199       289        0.67         0.56-0.81
CHD                       140       184        0.75         0.60-0.94
CVD                       289       414        0.68         0.58-0.79
NONFATAL MI & CHD DEATH BY BASELINE ECG ABNORMALITIES
Treatment    Number    Events    Rate per 100 (SE)
With baseline ECG abnormalities
Active       1429      67        6.0 (0.8)
Placebo      1426      96        8.0 (0.9)
χ² = 5.73, P = 0.02; Rel. Risk = 0.69, 95% CI = 0.50-0.94
Without baseline ECG abnormalities
Active       903       35        4.5 (0.8)
Placebo      922       43        4.6 (0.7)
χ² = 0.70, P = 0.40; Rel. Risk = 0.83, 95% CI = 0.53-1.29
SUDDEN DEATH BY BASELINE ECG ABNORMALITIES
Treatment    Number    Events    Rate per 100 (SE)
With baseline ECG abnormalities
Active       1429      15        1.2 (0.3)
Placebo      1426      17        1.3 (0.4)
χ² = 0.14, P = 0.71; Rel. Risk = 0.88, 95% CI = 0.44-1.75
Without baseline ECG abnormalities
Active       903       8         1.2 (0.4)
Placebo      922       5         0.6 (0.3)
χ² = 0.77, P = 0.38; Rel. Risk = 1.64, 95% CI = 0.54-5.01
SUBGROUPS DEFINED A-POSTERIORI
• “Grist” for formulating hypothesis
• Watch for alternative definitions!
• Should be clearly stated and reported as an estimate of effect with appropriate confidence interval
SUBGROUPS AND MONITORING TRIALS
• Use statistically sound monitoring method
• Interference with main trial endpoint - rare
• Formulate hypothesis and test prospectively
• Terminate subgroup
• “Mega trials” - special problems
BUCHWALD AND COLLEAGUES: POSCH, NEJM 1990;323:946-955
• Cholesterol lowering by ileal bypass for secondary prevention of CHD
• 838 participants randomized
• Primary outcome - Fatal and non-fatal CHD
– 62 vs 49, p=0.164
• Subgroup, EF<50% vs >50%
– Final 24 vs 39, p=0.021
– post-hoc analysis 6 vs 17
– after observed phenomenon - 18 vs 22
ESSENTIALS IN SUBGROUP REPORTING
• PRESPECIFICATION OF SUBGROUPS
• NUMBER OF SUBGROUPS
• NUMBER OF SUBGROUP OUTCOMES
• STATISTICAL METHODS (INTERACTION)
• NUMBER OF SIGNIFICANT SUBGROUPS FOUND
• EMPHASIS OF SUBGROUP VS PRIMARY OUTCOME
Hernandez AV, et al. Am Heart J. 2006;151:257-64.
SUGGESTIONS TO APPROPRIATELY PERFORM AND INTERPRET SUBGROUP ANALYSIS
DICTUM 1
“The treatment effect in all subgroups of patients without obvious contraindications to treatment is likely to be qualitatively (in the same direction) similar.”
DICTUM 2
“The treatment effect in all subgroups of patients without obvious contraindications to treatment is likely to be quantitatively (difference in degree) dissimilar even when effects appear to be identical.”
DICTUM 3
“Estimates of treatment effect within a subgroup chosen for special emphasis are usually ‘biased’ and so the most appropriate estimate in a subgroup is closer to the overall result.”
KEY POINTS IN SUBGROUP ANALYSES & INTERPRETATION I
Design
1. State clearly a few important and plausible subgroup hypotheses in advance. Include the direction of the expected effect.
2. Rank the subgroup hypotheses in order of plausibility.
3. Calculate power to detect key subgroup effects. If it is inadequate, consider building adequate power to detect key subgroup effects.
4. State whether the trial will be continued even after the overall results are convincing but the subgroup effects are not significant. Decide whether the primary method of monitoring will focus on the subgroups or on the overall trial.
5. State primary analytical methods in advance.
KEY POINTS IN SUBGROUP ANALYSES AND INTERPRETATION II
Monitoring a trial
1. Rigorous evidence of benefit or harm in subgroups postulated a priori: consider selective discontinuation of that subgroup.
2. Evidence of benefit or harm in unexpected subgroups: postulate a hypothesis to be tested in the remaining part of the study.
KEY POINTS IN SUBGROUP ANALYSES & INTERPRETATION III
Analyses and interpretation
1. Use statistical methods that capture the framework of the prior hypotheses.
2. Place greater emphasis on the overall results than on what may be apparent within a particular subgroup.
3. Distinguish between prior and data-derived hypotheses. Do not calculate p values for data-derived hypotheses because such p values usually bear little resemblance to what could occur if the hypothesis were tested independently in another study.
4. Use tests of “interactions” and/or correct for multiplicity of statistical comparisons. (“Nominal” p values are usually misleading.)
5. Interpret the results in the context of similar data from other trials, from the architecture of the entire set of data on all patients, and from principles of biological coherence.
KEY POINTS IN SUBGROUP ANALYSES AND INTERPRETATION IV
Improper subgroups and analyses
1. Avoid analyses of subgroups based on post-randomization response, adherence, etc.
2. Avoid emphasizing nominal p values.
3. Do not emphasize data-derived analyses or analyses based on post-hoc definitions of subgroups.