THE RELIABILITY AND UNRELIABILITY OF SUB-GROUP ANALYSES

JEFFREY L. PROBSTFIELD, MD, FACP, FACC, FAHA, FESC, FSCT

Professor of Medicine (Cardiology)

University of Washington

Research grants- Abbott, Boehringer Ingelheim, King, Sanofi-Aventis Pharmaceuticals, NHLBI, NCI;

Consultantship-King Pharmaceuticals; no stocks, options or BOD positions

DILEMMA

“The response of the average patient to therapy is not necessarily the response of

the patient being treated.”

Bernard, 1865

SUBGROUP ANALYSES DRIVEN BY:

“Should all patients be given XYZ before, during, or after ABC or

can/should treatment be limited to a select group?”

SUBGROUPS- PETO

• Only one thing is worse than doing subgroup analyses---believing the results

PETO: HOW TO SPOIL A GOOD TRIAL RESULT

1. Undertake many data-dependent subgroup analyses.

2. Find some subgroups where treatment has no significant effect (or even, perhaps, no apparent effect whatsoever).

3. Publish the findings in such a way that many readers believe them.

CARE RESULTS - NO CHD RISK REDUCTION BELOW LDL-C 125 mg/dL

“…Although our finding cannot be considered definitive and requires confirmation, it suggests that an LDL cholesterol level of 125 mg per deciliter may be an approximate lower boundary for a clinically important influence of the LDL cholesterol level on coronary heart disease…”

NEJM 1996;335:1001-1009

CHOLESTEROL LEVELS AND CHD RISK REDUCTION

Cholesterol and Recurrent Events (CARE)

4159 Participants

Participants

Plasma LDL-C    Participants With Events      % Risk Reduction
(mg/dL)         Placebo      Pravastatin      (95% CI)
≤ 137           269          210              23 (8 to 36)
> 137           280          220              24 (10 to 36)

NEJM 1996;335:1001-1009

CHOLESTEROL LEVELS AND CHD RISK REDUCTION

Cholesterol and Recurrent Events (CARE)

4159 Participants

Participants

Plasma LDL-C    Participants With Event       % Risk Reduction
(mg/dL)         Placebo      Pravastatin      (95% CI)
< 125           93           89               -3 (-38 to 23)
125-150         311          239              26 (13 to 38)
> 150-175       145          102              35 (17 to 50)

NEJM 1996;335:1001-1009

CARE SUBGROUP ANALYSIS RISK REDUCTION BELOW BASELINE LDL-C mg/dL

Concerns about divisions described

Plasma LDL-C     Participants (N)    % CHD Risk Reduction    95% CI
> x (20%)        850                 N/A                     N/A
> 150 mg/dL      953                 35                      17 to 50
125 - 150        2355                26                      13 to 38
< 137.5          2090                23                      8 to 36
< 130            1386                15                      N/A
< 127            1034                10                      N/A
< 125 (20%)      851                 -3                      -38 to 23

HPS

LDL-C         N       RRR CHD
<3 (<116)     6793    33%
>3 <3.5       5063    25%
>3.5          8680    42%

LDL-C       N       Δ LDL-C    E      RRR CHD
<100        3421    -35        69     22%
100-130     7068    -37        86     28%
>130        9927    -39        104    24%

No interaction with Vitamin Cocktail

PROPER SUBGROUP

“A common set

of baseline parameters”

IMPROPER SUBGROUP:

“Characterized by a variable

measured after randomization”

Hypokalemia Associated With Diuretic Use and CV Events in SHEP

Franse, et al. Hypertension. 2000;35:1025-1030

[Figure: CHD Event Rate by Year 1 K+ Strata. X-axis: Years to CHD (0 to 5). Curves: Y1 K+ < 3.5; Y1 K+ = 3.5-5.4; Y1 K+ > 5.4]

                  HR      95% CI
Hyper/Normo-K+    1.28    0.69, 2.40
Hypo/Normo-K+     1.03    0.82, 1.30

EXAMPLES OF IMPROPER SUBGROUPS:

1. Responders vs. nonresponders

2. Adherers vs Non-adherers

FIVE-YEAR MORTALITY ACCORDING TO BASE-LINE CHOLESTEROL AND CHANGE

FROM BASE-LINE, ADJUSTED FOR 40 BASE-LINE CHARACTERISTICS

Treatment Group:                  Clofibrate                     Placebo
Baseline       Cholesterol    No. of Pts.   % mortality      No. of Pts.   % mortality
Cholesterol    Change
< 250          All men        507           20.0 ± 1.8       1319          19.9 ± 1.1
> 250          All men        490           17.5 ± 1.7       1216          20.6 ± 1.2
All men        Fall           680           17.2 ± 1.4       1376          20.7 ± 1.1
All men        Rise           317           22.2 ± 2.3       1159          19.7 ± 1.2
< 250          Fall           295           16.0 ± 2.1       614           21.2 ± 1.6
< 250          Rise           212           25.5 ± 3.0       705           18.7 ± 1.5
> 250          Fall           385           18.1 ± 2.0       762           20.2 ± 1.5
> 250          Rise           105           15.5 ± 3.5       454           21.3 ± 1.9

FIVE-YEAR MORTALITY: PATIENTS GIVEN CLOFIBRATE OR PLACEBO, ACCORDING TO CUMULATIVE ADHERENCE TO PROTOCOL PRESCRIPTION

Treatment Group:          Clofibrate                          Placebo
Adherence            No. of Pts.   % mortality          No. of Pts.   % mortality
< 80%                357           24.6 ± 2.3 (22.5)    882           28.2 ± 1.5 (25.8)
> 80%                708           15.0 ± 1.3 (15.7)    1813          15.1 ± 0.8 (16.4)
Total study group    1065          18.2 ± 1.2 (18.0)    2695          19.4 ± 0.8 (19.5)
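The adherence table above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it treats the quoted "±" values as standard errors of independent percentages (an assumption, not stated on the slide) and computes a z statistic for poor versus good adherers within each arm. The function and variable names are hypothetical.

```python
import math

# Five-year mortality (%) and the "±" values quoted in the adherence table.
# Assumption: the "±" values are standard errors of independent estimates.
placebo = {"lt80": (28.2, 1.5), "gt80": (15.1, 0.8)}
clofibrate = {"lt80": (24.6, 2.3), "gt80": (15.0, 1.3)}


def z_for_difference(a, b):
    """z statistic for the difference of two independent percentages."""
    (pct_a, se_a), (pct_b, se_b) = a, b
    return (pct_a - pct_b) / math.sqrt(se_a ** 2 + se_b ** 2)


z_placebo = z_for_difference(placebo["lt80"], placebo["gt80"])
z_clofibrate = z_for_difference(clofibrate["lt80"], clofibrate["gt80"])

print(f"placebo: poor vs good adherers, z = {z_placebo:.1f}")
print(f"clofibrate: poor vs good adherers, z = {z_clofibrate:.1f}")
```

Under these assumptions the mortality gap between poor and good adherers is at least as striking in the placebo arm as in the clofibrate arm, which is why adherence, a post-randomization variable, defines an improper subgroup.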

STRUCTURED HYPOTHESES

• Carefully state hypothesis

• Allow analyses to capture the effect

INTERACTION (Differential Subgroup Effect)

“A treatment effect that

differs by subgroup.”

QUANTITATIVE INTERACTION

Different amount (quantity) of

benefit in various subgroups.

QUALITATIVE INTERACTION

True Benefit in some subgroups

and True Harm in others

(1 of over 700)

QUALITATIVE DIFFERENCES -WHY NOT?

• Extremes excluded

• Lack of replication in other studies

BIASES AND ERRORS IN DETERMINING SUBGROUP

EFFECTS

1. Subgroups lack statistical power

2. Random variation - widely divergent estimates of treatment benefit

3. Statistical multiplicity

4. Post-hoc analyses - extreme results the product of random errors

5. Replication - a posteriori vs. a priori
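The first item in the list above (subgroups lack statistical power) can be made concrete with a rough calculation. This is a minimal sketch using a normal approximation for comparing two proportions; the event rates (10% vs 8%), the sample sizes, and the function name `approx_power` are illustrative assumptions, not values taken from any trial discussed here.

```python
from statistics import NormalDist


def approx_power(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z test.

    Normal approximation with an unpooled standard error; the negligible
    contribution of the opposite tail is ignored.
    """
    norm = NormalDist()
    se = (p_control * (1 - p_control) / n_per_arm
          + p_treated * (1 - p_treated) / n_per_arm) ** 0.5
    z_crit = norm.inv_cdf(1 - alpha / 2)
    return norm.cdf(abs(p_control - p_treated) / se - z_crit)


# Hypothetical trial: 3,000 per arm overall, one subgroup holding a third of them.
power_overall = approx_power(0.10, 0.08, 3000)
power_subgroup = approx_power(0.10, 0.08, 1000)

print(f"overall power:  {power_overall:.2f}")
print(f"subgroup power: {power_subgroup:.2f}")
```

With these assumed inputs the whole trial has roughly three chances in four of detecting the effect, while a subgroup one third the size has roughly one in three, even though the true effect is identical.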

FURBERG AND BYINGTON, CIRCULATION 1983;67:I98-I101

• 146 Subgroups in BHAT: Few defined a priori

• Distribution of subgroup results - Gaussian

• Impact of change on data set inversely related to sample size. (Participants or deaths gives similar distribution)

3 CRITERIA FOR CONFIDENCE IN SUBGROUP FINDING

• Dose response relationship

• Independent findings within the study

• Replication by outside trial

EXPECTED EFFECTS OF TRIAL SIZE ON TRIAL RESULTS

Total no. of deaths in trial

(treated+control)

(Approx. no. of patients randomized if risk 10%)

Approx Probability of failing to achieve

1 P<0.01 significance if true risk reduction is

Comments that might be made of size before trial

begins

0-50 (under 500) Over 0.9 Utterly inadequate

50-150 (1000) 0.7-0.9 Probably inadequate

150-350 (3000) 0.3-0.7 Possibly adequate, probably not

350-650 (6000) 0.1-0.3 Probably adequate

Over 650 (10,000) Under 0.1 Definitely Adequate

Actual effects of trial size on trial results. Relationship between the total number of deaths in the two treatment groups and the result actually attained, in the 24 trials of a treatment (long-term beta-blockade) that reduces the odds of death by about 22 ± 4%

                                                                                    No. of trials resulting in:
Total no. of deaths   (Mean no. of patients       Statistical power                 P<0.05     Non-sigt.   Non-sigt.    P<0.05
(β-bl. + plac.)       randomized)                                                   against    against     favorable    favorable
0-50                  (255)                       Utterly inadequate                0          5           5            0
50-150                (861)                       Probably inadequate               0          1           9            1
150-350               (2925)                      Possibly adequate, probably not   0          0           1            2
350-650               (No such β-bl. trials)      Probably adequate                 -          -           -            -
Over 650              (No such β-bl. trials)      Definitely adequate               -          -           -            -
TOTAL                 (866)                       Inadequate separately,            0          6           15           3
                                                  adequate only in aggregate

SUBGROUP EFFECT

Treatment effect in a specific proper subgroup.

Must be significantly different from

overall effect!!

HYPOTHETICAL SUBGROUP EFFECTS ILLUSTRATING THE “PLAY OF CHANCE” IN A TRIAL THAT SHOWS

CLEAR OVERALL BENEFIT

                  Treatment (%)     Control (%)       Risk Decrease    p Value
Overall result    240/3,000 (8)     300/3,000 (10)    20               < 0.01
Subgroup A        80/1,000 (8)      100/1,000 (10)
Subgroup B        70/1,000 (7)      110/1,000 (11)    36               < 0.001
Subgroup C        90/1,000 (9)      90/1,000 (9)
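One way to ask whether the three hypothetical subgroups above differ from one another by more than chance is a heterogeneity test. The sketch below uses Cochran's Q on log relative risks with inverse-variance weights; this is one common choice, not necessarily the method behind the slide. It assumes the first count in each row is the treated arm and the second the control arm.

```python
import math

# (treated events, treated n, control events, control n) from the table above.
# Assumption: the first column is the treated arm, the second the control arm.
subgroups = {
    "A": (80, 1000, 100, 1000),
    "B": (70, 1000, 110, 1000),
    "C": (90, 1000, 90, 1000),
}


def log_rr_and_variance(e_t, n_t, e_c, n_c):
    """Log relative risk and its large-sample variance."""
    log_rr = math.log((e_t / n_t) / (e_c / n_c))
    variance = 1 / e_t - 1 / n_t + 1 / e_c - 1 / n_c
    return log_rr, variance


estimates = [log_rr_and_variance(*counts) for counts in subgroups.values()]
weights = [1 / var for _, var in estimates]
pooled = sum(w * lr for w, (lr, _) in zip(weights, estimates)) / sum(weights)

# Cochran's Q: weighted squared deviations of subgroup estimates from the pooled one.
q = sum(w * (lr - pooled) ** 2 for w, (lr, _) in zip(weights, estimates))

# With three subgroups Q has 2 degrees of freedom, and the chi-square
# upper-tail probability for 2 df has the closed form exp(-Q / 2).
p_heterogeneity = math.exp(-q / 2)

print(f"pooled RR = {math.exp(pooled):.2f}, Q = {q:.2f}, p = {p_heterogeneity:.3f}")
```

Under these assumptions the heterogeneity test does not reach the conventional 0.05 level, so the apparent contrast between Subgroup B and Subgroup C is compatible with the play of chance around a common effect.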

MULTIPLE COMPARISONS

Example:

• 1,000 participants; Mortality Rate = 10%

• Treat A vs. Treat B: Treatments equally effective

• 10 subgroups (equal size) randomly formed

Relative Risk Reduction to:    Probability (percent)
.33                            80
.1                             5

• Nominal p value - inappropriate

• Conservative approach - p/Sn

• Especially important in trial where main outcome is not statistically significant
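The multiplicity problem and the conservative p/Sn rule above can be sketched numerically. The example assumes the subgroup tests are independent and reads "p/Sn" as the nominal significance level divided by the number of subgroups examined (a Bonferroni-style correction); both are assumptions made for illustration, and the function names are hypothetical.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one nominally significant result among
    n_tests independent tests when no true effect exists."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under the conservative p/Sn rule."""
    return alpha / n_tests


n_subgroups = 10  # matches the 10 randomly formed subgroups in the example
fwer = familywise_error(0.05, n_subgroups)
threshold = bonferroni_threshold(0.05, n_subgroups)

print(f"chance of a spurious 'significant' subgroup: {fwer:.2f}")
print(f"per-subgroup threshold after correction: {threshold:.3f}")
```

With ten independent subgroups tested at a nominal 0.05, the chance of at least one false-positive subgroup is about 40%, which is why nominal p values are described as inappropriate.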

Mortality and Morbidity

[Figure: forest plot by subgroup (Overall; Beta blocker (yes); no beta blocker; ACEI (yes); no ACEI*). X-axis: 0.50 to 1.0, Valsartan better vs. Valsartan worse. Labeled values: .87 (.77, .97); 44.0%, P=.0002]

Cohn. N Engl J Med. 2001; *FDA analysis/package insert

Subgroup Results

                       RR of death    P
no ACEI                41%            <0.05*
ACEI + Beta blocker    42%            0.009

*n=366

CV Death or Hospitalization for CHF

Subgroup                  Candesartan event/n    Placebo event/n
Diabetes          No      680/2715               815/2721
                  Yes     470/1088               495/1075
Hypertension      No      484/1710               579/1703
                  Yes     666/2093               731/2093
ACEIs             No      586/2230               688/2244
                  Yes     564/1573               622/1552
Beta blocker      No      611/1701               710/1695
                  Yes     539/2102               600/2101
Spironolactone    No      880/3160               1041/3167
                  Yes     270/643                269/629
Overall                   1150/3803              1310/3796

Test for interaction (in the order listed on the slide): P=0.09, P=0.51, P=0.32, P=0.19, P=0.17

[Figure: forest plot of hazard ratios by subgroup. X-axis: Hazard ratio, 0.6 to 1.4; candesartan better below 1.0, placebo better above 1.0]

EXAMPLE OF “SUBGROUPING” IN THE SECOND INTERNATIONAL STUDY OF INFARCT SURVIVAL (ISIS-2): ASTROLOGY AND ASPIRIN

Vascular Mortality at Week 5

                                            Aspirin (%)         Placebo (%)           Odds Decrease (% ± SD)
Patients born under Libra and Gemini        150/1,357 (11.1)    147/1,442 (10.2)      8% adverse (NS)
Patients born under other “birth signs”     654/7,228 (9.0)     868/7,157 (12.1)      26% ± 5 (p<0.0001)
Overall results                             804/8,587 (9.4)     1,016/8,600 (11.8)    23% ± 4 (p<0.0001)

ORDERED SUBGROUPS

• Strong biological rationale

• Reflects natural ordering

• Correct for multiplicity

• Only indicate those as significant which have a p value less than p/Sn

Reduction of Stroke According to Sex

                   Stroke Rates
           N       Active Rx    Placebo    % Difference
Males      2046    5.5%         8.7%       -38%
Females    2690    4.7%         7.3%       -31%

GISSI-1

• Overall result - streptokinase treatment, 20% reduction in total mortality

• Benefit confined to:

– Anterior MI

– Age ≤ 65 years

– Treatment ≤ 6 hours

• Subsequent trials and pooled results do not confirm

HYPOTHETICAL EXAMPLE OF ORDERED SUBGROUPS: RELATIVE RISK AS A

FUNCTION OF TIME OF THROMBOLYTIC THERAPY

Hours After Pain    14-d Mortality (%)      Relative Risk    p
                    Treated    Control
≤ 2                 10         20           .50              .01
> 2 – 4             11         16           .70              NS
> 4 – 8             17         22           .77              NS
> 8 – 12            20         25           .80              NS
> 12                23         24           .95              NS
Overall             14         21           .67              .001

SUBGROUPS DEFINED A-PRIORI

• Suggestive differential subgroup effect

• State in design of new trial

• Publish (multiplicity, design analysis, plan over-sampling)

• Test within an existing data set

STROKE SUBGROUP HYPOTHESIS

                                On BP Meds at ICV    Off BP Meds at ICV
Participants                    35%                  65%
Net reduction in stroke rate    10%                  40%

80% power to detect 30% treatment difference

STROKE EVENTS BY MEDICATION STATUS

GROUP      N       NFS    FS    CS

Not on Medications
Active     1584    64     5     67
Placebo    1577    88     11    96
χ² = 6.72, P = .0096; Relative Risk (active/placebo) = 0.69, 95% CI = 0.51-0.95

On Medications
Active     781     32     5     36
Placebo    794     61     3     63
χ² = 5.11, P = .0237; Relative Risk (active/placebo) = 0.57, 95% CI = 0.38-0.85
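Whether the treatment effect really differs between participants on and off medications is a question for an interaction test, not for two separate p values. The sketch below compares crude log relative risks from the counts above with a z test. It assumes the CS column is the combined stroke count, and crude relative risks from counts may differ slightly from the slide's values, which may come from a different (for example, time-to-event) method.

```python
import math

# (events, participants) taken from the CS column of the table above.
# Assumption: CS is the combined stroke count used for the relative risks.
off_meds = {"active": (67, 1584), "placebo": (96, 1577)}
on_meds = {"active": (36, 781), "placebo": (63, 794)}


def log_rr_and_variance(stratum):
    """Crude log relative risk (active/placebo) and its large-sample variance."""
    (e_a, n_a), (e_p, n_p) = stratum["active"], stratum["placebo"]
    log_rr = math.log((e_a / n_a) / (e_p / n_p))
    variance = 1 / e_a - 1 / n_a + 1 / e_p - 1 / n_p
    return log_rr, variance


lr_off, var_off = log_rr_and_variance(off_meds)
lr_on, var_on = log_rr_and_variance(on_meds)

# z test for the difference between the two log relative risks.
z = (lr_off - lr_on) / math.sqrt(var_off + var_on)
p_interaction = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value

print(f"RR off meds = {math.exp(lr_off):.2f}, RR on meds = {math.exp(lr_on):.2f}")
print(f"interaction z = {z:.2f}, p = {p_interaction:.2f}")
```

On this crude analysis both strata show benefit and the interaction test is far from significant, so the data give no support to a differential subgroup effect by medication status.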

MRFIT RESEARCH GROUP, AM J CARDIOL 1985;55:1-15

ECG ABNORMALITIES AT BASELINE

                   Present                                        Absent
                   Person Yrs.   No. of Deaths   Per 1000         Person Yrs.   No. of Deaths   Per 1000
                                                 Person Yrs.                                    Person Yrs.
Not on Diuretic    6074          9               1.48             17,356        35              2.02
On Diuretic        6433          38              5.91             14,399        33              2.29
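The rates in the table above follow directly from deaths and person-years, and the contrast that generated the subgroup hypothesis is the ratio of those rates within each ECG stratum. The sketch below reproduces that arithmetic; the dictionary layout and names are illustrative.

```python
# (deaths, person-years) from the MRFIT table above.
cells = {
    ("ecg_present", "no_diuretic"): (9, 6074),
    ("ecg_present", "diuretic"): (38, 6433),
    ("ecg_absent", "no_diuretic"): (35, 17356),
    ("ecg_absent", "diuretic"): (33, 14399),
}


def rate_per_1000(deaths, person_years):
    """Death rate per 1000 person-years."""
    return 1000 * deaths / person_years


rates = {key: rate_per_1000(*value) for key, value in cells.items()}

# Rate ratio (on diuretic / not on diuretic) within each ECG stratum.
ratio_ecg_present = rates[("ecg_present", "diuretic")] / rates[("ecg_present", "no_diuretic")]
ratio_ecg_absent = rates[("ecg_absent", "diuretic")] / rates[("ecg_absent", "no_diuretic")]

for key, rate in rates.items():
    print(key, f"{rate:.2f} per 1000 person-years")
print(f"rate ratio, ECG abnormalities present: {ratio_ecg_present:.2f}")
print(f"rate ratio, ECG abnormalities absent:  {ratio_ecg_absent:.2f}")
```

The roughly fourfold ratio among those with baseline ECG abnormalities rests on only 9 deaths in the comparison cell, which is the kind of small-number finding that calls for a prospectively stated hypothesis and independent testing.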

SUBGROUP HYPOTHESIS

Will the treatment of ISH reduce the frequency of major coronary events more in those free of baseline ECG

abnormalities than in those with such abnormalities?

OTHER COMBINED EVENTS BY TREATMENT GROUP

Event                    Active    Placebo    Rel. Risk    95% CI
Nonfatal MI/CHD Death    104       141        0.73         0.57-0.94
Stroke/MI/CHD/Death      199       289        0.67         0.56-0.81
CHD                      140       184        0.75         0.60-0.94
CVD                      289       414        0.68         0.58-0.79

NONFATAL MI & CHD DEATH BY BASELINE ECG ABNORMALITIES

Treatment    Number    Events    Rate per 100 (SE)

With baseline ECG abnormalities
Active       1429      67        6.0 (0.8)
Placebo      1426      96        8.0 (0.9)
χ² = 5.73, P = 0.02; Rel. Risk = 0.69, 95% CI = 0.50-0.94

Without baseline ECG abnormalities
Active       903       35        4.5 (0.8)
Placebo      922       43        4.6 (0.7)
χ² = 0.70, P = 0.40; Rel. Risk = 0.83, 95% CI = 0.53-1.29

SUDDEN DEATH BY BASELINE ECG ABNORMALITIES

Treatment    Number    Events    Rate per 100 (SE)

With baseline ECG abnormalities
Active       1429      15        1.2 (0.3)
Placebo      1426      17        1.3 (0.4)
χ² = 0.14, P = 0.71; Rel. Risk = 0.88, 95% CI = 0.44-1.75

Without baseline ECG abnormalities
Active       903       8         1.2 (0.4)
Placebo      922       5         0.6 (0.3)
χ² = 0.77, P = 0.38; Rel. Risk = 1.64, 95% CI = 0.54-5.01

SUBGROUPS DEFINED A-POSTERIORI

• “Grist” for formulating hypothesis

• Watch for alternative definitions!

• Should be clearly stated and reported as an estimate of effect with appropriate confidence interval

SUBGROUPS AND MONITORING TRIALS

• Use statistically sound monitoring method

• Interference with main trial endpoint - rare

• Formulate hypothesis and test prospectively

• Terminate subgroup

• “Mega trials” - special problems

BUCHWALD AND COLLEAGUES: POSCH, NEJM 1990;323:946-955

• Cholesterol lowering by ileal bypass for secondary prevention of CHD

• 838 participants randomized

• Primary outcome - Fatal and non-fatal CHD

– 62 vs 49, p=0.164

• Subgroup, EF<50% vs >50%

– Final 24 vs 39, p=0.021

– post-hoc analysis 6 vs 17

– after observed phenomenon - 18 vs 22

ESSENTIALS IN SUBGROUP REPORTING

• PRESPECIFICATION OF SUBGROUPS

• NUMBER OF SUBGROUPS

• NUMBER OF SUBGROUP OUTCOMES

• STATISTICAL METHODS (INTERACTION)

• NUMBER OF SIGNIFICANT SUBGROUPS FOUND

• EMPHASIS OF SUBGROUP VS PRIMARY OUTCOME

Hernandez AV, et al. Am Heart J. 2006;151:257-64.

SUGGESTIONS TO APPROPRIATELY PERFORM AND INTERPRET SUBGROUP ANALYSIS

DICTUM 1

“The treatment effect in all subgroups of patients without obvious contraindications to treatment is likely to be qualitatively (in the same direction) similar.”

DICTUM 2

“The treatment effect in all subgroups of patients without obvious contraindications to treatment is likely to be quantitatively (difference in degree) dissimilar even when effects appear to be identical.”

DICTUM 3

“Estimates of treatment effect within a subgroup chosen for special emphasis are usually ‘biased’ and so the most appropriate estimate in a subgroup is closer to the overall result.”

KEY POINTS IN SUBGROUP ANALYSES & INTERPRETATION I

Design

1. State clearly a few important and plausible subgroup hypotheses in advance. Include the direction of the expected effect.

2. Rank the subgroup hypotheses in order of plausibility.

3. Calculate power to detect key subgroup effects. If it is inadequate, consider building adequate power to detect key subgroup effects.

4. State whether the trial will be continued even after the overall results are convincing but the subgroup effects are not significant. Decide whether the primary method of monitoring will focus on the subgroups or on the overall trial.

5. State primary analytical methods in advance.

KEY POINTS IN SUBGROUP ANALYSES AND INTERPRETATION II

Monitoring a trial

1. Rigorous evidence of benefit or harm in subgroups postulated a priori: consider selective discontinuation of that subgroup.

2. Evidence of benefit or harm in unexpected subgroups: postulate a hypothesis to be tested in the remaining part of the study.

KEY POINTS IN SUBGROUP ANALYSES & INTERPRETATION III

Analyses and interpretation

1. Use statistical methods that capture the framework of the prior hypotheses.

2. Place greater emphasis on the overall results than on what may be apparent within a particular subgroup.

3. Distinguish between prior and data-derived hypotheses. Do not calculate p values for data-derived hypotheses because such p values usually bear little resemblance to what could occur if the hypothesis were tested independently in another study.

4. Use tests of “interactions” and/or correct for multiplicity of statistical comparisons. (“Nominal” p values are usually misleading.)

5. Interpret the results in the context of similar data from other trials, from the architecture of the entire set of data on all patients, and from principles of biological coherence.

KEY POINTS IN SUBGROUP ANALYSES AND INTERPRETATION IV

Improper subgroups and analyses

1. Avoid analyses of subgroups based on post-randomization response, adherence, etc.

2. Avoid emphasizing nominal p values.

3. Do not emphasize data-derived analyses or analyses based on post-hoc definitions of subgroups.
