
Two-stage individual participant data meta-analysis and flexible forest plots

David Fisher
MRC Clinical Trials Unit Hub for Trials Methodology Research at UCL
[email protected]

2013 UK Stata Users Group Meeting, Cass Business School, London

Outline of presentation
• Introduction to individual patient data (IPD) meta-analysis (MA)
  • IPD vs aggregate-data (AD) MA
  • “One-stage” vs “two-stage” IPD MA
• The ipdmetan command
  • Basic use; comparison with metan
  • Covariate interactions
  • Combining AD with IPD
  • Advanced syntax
• The forestplot command
  • Interface with ipdmetan
  • Stand-alone use and “stacking”
• Summary and conclusion

Introduction to IPD meta-analysis

• Meta-analysis (MA):
  • Uses statistical methods to combine the results of “similar” trials to give a single estimate of effect
  • Increases power and precision
  • Assesses whether treatment effects are similar across trials (heterogeneity)
• Aggregate data (AD) vs IPD:
  • “Traditional” MAs gather results from publications
    • Results are aggregated across all patients in each trial; nothing is known about individual patients
  • IPD MAs gather raw data from trial investigators
    • Ensures all relevant patients are included
    • Ensures a similar analysis across all trials
    • Allows more complex analyses, e.g. patient-level interactions

“One-stage” IPD MA

• Consider a linear regression (extension to GLMs or time-to-event regressions is straightforward)
• For a one-stage IPD MA (i = trial, j = patient):
  $$ y_{ij} = \alpha_i + (\beta + u_i)\, x_{ij} $$
  where α_i = trial-specific intercepts (fitted as trial indicators) and β = overall treatment effect estimated across all trials i (with optional random effect u_i)
• Examples in Stata:
  • Fixed effects: regress y x i.trial
  • Random effects: xtmixed y x i.trial || trial: x, nocons
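As a sketch of what the fixed-effects one-stage model estimates: regress y x i.trial fits one common slope β plus a separate intercept α_i for each trial. By the Frisch-Waugh-Lovell theorem, that slope equals the slope obtained after centring both y and x within each trial, so only within-trial variation in x contributes. The plain-Python function below is a hypothetical helper (not part of any Stata command) that computes this estimate from data grouped by trial; it does not cover the random-effects (xtmixed) version.

```python
from collections.abc import Mapping, Sequence


def one_stage_common_slope(trials: Mapping[str, Sequence[tuple[float, float]]]) -> float:
    """Common slope beta from y_ij = alpha_i + beta * x_ij (fixed trial intercepts).

    Equivalent to the x coefficient from `regress y x i.trial`: each trial's
    x and y are centred on that trial's own means, then a single slope is fitted.
    """
    sxy = 0.0  # sum over trials of within-trial cross-products
    sxx = 0.0  # sum over trials of within-trial squared deviations of x
    for points in trials.values():
        n = len(points)
        x_bar = sum(x for x, _ in points) / n
        y_bar = sum(y for _, y in points) / n
        for x, y in points:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    return sxy / sxx


# Two trials with different intercepts but the same true slope (0.5)
data = {
    "A": [(0.0, 1.0), (1.0, 1.5), (2.0, 2.0)],
    "B": [(0.0, -3.0), (1.0, -2.5)],
}
print(one_stage_common_slope(data))  # 0.5
```

The trial intercepts (1 and -3) differ, but because each trial is centred on its own means they never enter the slope estimate.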

“Two-stage” IPD MA

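• For a two-stage IPD MA, first fit the model separately within each trial:
  $$ y_{(1)j} = \alpha_{(1)} + \beta_{(1)}\, x_{(1)j} \quad \text{for trial 1} $$
  …
  $$ y_{(i)j} = \alpha_{(i)} + \beta_{(i)}\, x_{(i)j} \quad \text{for trial } i $$
  …
• Then:
  $$ \hat{\beta} = \frac{\sum_i w_i \hat{\beta}_{(i)}}{\sum_i w_i} \quad \text{and} \quad \operatorname{se}(\hat{\beta}) = \frac{1}{\sqrt{\sum_i w_i}}, \qquad \text{where } w_i = \frac{1}{\operatorname{se}(\hat{\beta}_{(i)})^2} $$
• Weights w_i may be altered to give random effects, e.g. DerSimonian & Laird
• Straightforward, but currently messy in Stata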
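The second stage is plain arithmetic once each trial's β̂_(i) and se(β̂_(i)) are available. A minimal Python sketch of fixed-effect inverse-variance pooling following the formulas on this slide; the function name is hypothetical, and the two-sided p-value uses the normal approximation z = β̂ / se(β̂).

```python
import math


def inverse_variance_pool(estimates, ses):
    """Fixed-effect (inverse-variance) pooling of per-trial estimates.

    Returns (pooled estimate, its SE, z statistic, two-sided p-value).
    Estimates should be on a scale where pooling is additive, e.g. log HR.
    """
    weights = [1.0 / se ** 2 for se in ses]          # w_i = 1 / se_i^2
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_w
    pooled_se = 1.0 / math.sqrt(total_w)              # se = 1 / sqrt(sum w_i)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))            # = 2 * (1 - Phi(|z|))
    return pooled, pooled_se, z, p


# Two trials; the second is twice as precise, so it gets four times the weight
pooled, se, z, p = inverse_variance_pool([1.0, 3.0], [1.0, 0.5])
print(round(pooled, 3), round(se, 3))  # 2.6 0.447
```

For hazard ratios, the log HRs are pooled and the pooled estimate and its confidence limits are exponentiated only for display.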

Treatment-covariate interactions

• Assessment of patient-level covariate interactions is a great advantage of IPD (z = patient-level covariate):
  $$ y_{ij} = \alpha_i + \beta x_{ij} + \gamma z_{ij} + \delta x_{ij} z_{ij} $$
• Arguably best done with “one-stage”
  • Main effects, interactions (and correlations) estimated simultaneously
• But a basic analysis is also possible with “two-stage”
  • Relative effect (interaction coefficient) only
  • Same (inverse-variance) approach as for main effects
  • Ensures no estimation bias from between-trial effects
  • Can be presented in a forest plot, with assessment of heterogeneity etc.
  • Discussed in a published paper (Fisher 2011)
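In the two-stage approach, the interaction model is fitted within each trial and only the interaction coefficient is carried forward and pooled, exactly like a main effect. A sketch in the notation of the two-stage slide, with trial-specific parameters indexed by (i):

```latex
% Stage 1: within trial i
y_{(i)j} = \alpha_{(i)} + \beta_{(i)}\, x_{(i)j} + \gamma_{(i)}\, z_{(i)j} + \delta_{(i)}\, x_{(i)j} z_{(i)j}

% Stage 2: inverse-variance pooling of the interaction estimates only
\hat{\delta} = \frac{\sum_i w_i \hat{\delta}_{(i)}}{\sum_i w_i},
\qquad w_i = \frac{1}{\operatorname{se}(\hat{\delta}_{(i)})^2},
\qquad \operatorname{se}(\hat{\delta}) = \frac{1}{\sqrt{\sum_i w_i}}
```

Because each δ̂_(i) comes from comparisons between patients within the same trial, differences in covariate distribution between trials cannot enter the pooled interaction. This is the sense in which the two-stage approach avoids bias from between-trial effects.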

“One-stage” vs “two-stage”

Pros
  One-stage:
    - All coefficients & correlations estimated simultaneously
    - Flexible & extendable model structure
  Two-stage:
    - Natural extension of AD MA
    - Easily presentable in forest plots
    - Applicable to any set of effect estimates and SEs (incl. interactions)
    - Negligible difference to one-stage in most common scenarios

Cons
  One-stage:
    - Requires more statistical expertise
    - Challenging in certain situations, e.g. random effects with time-to-event data
    - Not a natural fit with forest plots
  Two-stage:
    - Only a single estimate can be pooled, which limits complexity (e.g. interactions)
    - Theoretically inferior in (at least) some scenarios

Example data

• IPD MA of randomised trials of post-operative radiotherapy (PORT) in non-small cell lung cancer
  • Trial ID (k = 11)
  • Patient ID (n = 2343)
  • Treatment arm
• Outcome is censored time to death from any cause (overall survival)
  • Time to event (from randomisation)
  • Event type (death or censoring)
• Certain covariate measurements are also available, though not necessarily for all trials or patients
  • Disease stage (a factor, but treated as continuous)
  • (+ others)

ipdmetan syntax

ipdmetan, study(trialid) eform : stcox arm, strata(sex)

• ipdmetan options go after the comma, before the colon
• estimation_command and its options go after the colon

Uses “prefix” command syntax:

ipdmetan [exp_list], study(study_ID) [ipd_options ad(aggregate_data_options) forestplot(forest_plot_options)] : estimation_command ...

Example: by default, ipdmetan pools the coefficient on the first independent variable (excluding baseline factor levels)

Trials included: 11
Patients included: 2342

Meta-analysis pooling of main (treatment) effect estimate arm
using Fixed-effects
--------------------------------------------------------------------
trial reference      |
number               |   Effect   [95% Conf. Interval]   % Weight
---------------------+----------------------------------------------
belgium              |    1.456      1.072      1.979       11.09
EORTC 08861          |    1.643      0.913      2.956        3.02
LILLE                |    1.568      1.060      2.319        6.81
...                  |      ...        ...        ...         ...
---------------------+----------------------------------------------
Overall effect       |    1.178      1.064      1.305      100.00
--------------------------------------------------------------------

Test of overall effect = 1:  z = 3.153  p = 0.002

Heterogeneity Measures
---------------------------------------------------
               |     value      df     p-value
---------------+-----------------------------------
Cochrane Q     |     15.88      10       0.103
I² (%)         |     37.0%
Modified H²    |     0.588
tau²           |    0.0180
---------------------------------------------------

I² = between-study variance (tau²) as a percentage of total variance (tau² plus the typical within-study variance)
Modified H² = ratio of tau² to the typical within-study variance
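The heterogeneity measures in the table all follow from Cochran's Q. A sketch, assuming the standard DerSimonian-Laird moment estimator for tau² and the Higgins-Thompson "typical" within-study variance s². With these, I² = (Q - df)/Q = tau²/(tau² + s²) and modified H² = (Q - df)/df = tau²/s², which matches the output above (5.88/15.88 = 37.0%, 5.88/10 = 0.588). The chi-squared tail probability uses a simple series for the regularized incomplete gamma function. All function names are hypothetical, and the random-effects re-weighting at the end is the DerSimonian & Laird option mentioned on the two-stage slide.

```python
import math


def chi2_sf(q, df):
    """P(chi-squared with df degrees of freedom > q), via the series for the
    lower regularized incomplete gamma function (adequate for moderate q)."""
    a, x = df / 2.0, q / 2.0
    if x <= 0.0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def heterogeneity(estimates, ses):
    """Cochran's Q, I^2, modified H^2, DerSimonian-Laird tau^2 and the
    corresponding random-effects pooled estimate."""
    if len(estimates) < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / s ** 2 for s in ses]
    sw, sw2 = sum(w), sum(wi ** 2 for wi in w)
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sw
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    df = len(estimates) - 1
    excess = max(0.0, q - df)                      # truncated at zero
    tau2 = excess / (sw - sw2 / sw)                # DerSimonian-Laird
    s2 = df * sw / (sw ** 2 - sw2)                 # "typical" within-study variance
    # Random effects: add tau^2 to every within-study variance, then re-weight
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]
    pooled_re = sum(wi * b for wi, b in zip(w_re, estimates)) / sum(w_re)
    return {
        "Q": q, "df": df, "p": chi2_sf(q, df),
        "I2": excess / q if q > 0 else 0.0,
        "H2_modified": excess / df,
        "tau2": tau2, "s2": s2,
        "pooled_re": pooled_re, "se_re": 1.0 / math.sqrt(sum(w_re)),
    }


h = heterogeneity([0.0, 1.0, 3.0], [1.0, 0.5, 1.0])
print(round(h["Q"], 3), round(h["I2"], 3), round(h["H2_modified"], 3))  # 4.833 0.586 1.417
```

As a check on the tail probability, chi2_sf(15.88, 10) gives about 0.103, the p-value shown for Q in the table.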

• Output style is similar to that of metan or metaan
• Column headings are taken from variable labels (e.g. “trial reference number”)

Basic forest plot

[Figure: forest plot of main (treatment) effects by trial; plotted values below]

trial reference number                  Effect (95% CI)     % Weight
belgium                                 1.46 (1.07, 1.98)      11.09
LCSG 773                                1.12 (0.83, 1.53)      11.13
CAMS                                    1.03 (0.77, 1.38)      12.20
MRC LU11                                0.96 (0.74, 1.24)      16.00
EORTC 08861                             1.64 (0.91, 2.96)       3.02
SLOVENIA                                0.89 (0.54, 1.49)       3.97
LILLE                                   1.57 (1.06, 2.32)       6.81
GETCB 04CB86                            1.14 (0.80, 1.62)       8.48
GETCB 05CB86                            1.44 (1.13, 1.83)      17.84
ITALY                                   0.69 (0.40, 1.20)       3.49
KOREA                                   1.16 (0.76, 1.76)       5.98
Overall (I-squared = 37.0%, p = 0.103)  1.18 (1.06, 1.31)     100.00

Forest plot of covariate interactions

ipdmetan, study(trialid) eform interaction keepall : stcox arm##c.stage
• By default, the coefficient on the first interaction term is pooled

Trials included: 8
Patients included: 1962

Meta-analysis pooling of interaction effect estimate 1.arm#c.stage
using Fixed-effects

[Figure: forest plot of treatment-by-stage interaction effects; plotted values below]

trial reference number                  Effect (95% CI)     % Weight
belgium                                 0.92 (0.61, 1.40)      18.70
LCSG 773                                0.76 (0.40, 1.45)       8.11
CAMS                                    0.77 (0.43, 1.39)       9.49
MRC LU11                                0.62 (0.36, 1.07)      11.26
EORTC 08861                             0.39 (0.14, 1.09)       3.16
GETCB 04CB86                            0.94 (0.50, 1.77)       8.22
GETCB 05CB86                            0.97 (0.72, 1.30)      38.35
KOREA                                   2.09 (0.70, 6.27)       2.73
SLOVENIA                                (Insufficient data)
LILLE                                   (Insufficient data)
ITALY                                   (Insufficient data)
Overall (I-squared = 2.7%, p = 0.409)   0.87 (0.72, 1.04)     100.00

Inclusion of aggregate data

• I don’t have a separate aggregate dataset, so I will create one artificially from my IPD dataset

. ** Generate artificial trial subgrouping

. gen subgroup = inlist(trialid, 1, 8, 12, 15)

. label define subgroup_ 0 "Trial group 1" 1 "Trial group 2"

. label values subgroup subgroup_

. ** Run ipdmetan within one of the subgroups; save the results dataset

. qui ipdmetan, study(trialid) by(subgroup) nooverall nograph saving(subgroup1.dta) : stcox arm if subgroup==1, strata(sex)

(Aside: contents of subgroup1.dta)

_use  trialid  _labels           _ES   _seES    _lci    _uci   _wgt   _NN
   1        1  belgium         0.376   0.156   0.069   0.682  0.286   202
   1        8  EORTC 08861     0.496   0.300  -0.091   1.084  0.078   105
   1       12  LILLE           0.450   0.200   0.058   0.841  0.176   163
   1       15  GETCB 05CB86    0.362   0.123   0.120   0.603  0.460   539
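The _ES, _seES, _lci and _uci columns in this resultsset appear to be on the log hazard-ratio scale; eform only changes how results are displayed. As a quick check (a sketch, assuming normal-theory 95% limits _ES ± 1.96 × _seES), exponentiating the belgium row reproduces the belgium line of the earlier screen output (1.456, 1.072, 1.979) to within rounding of the stored values. The helper name is hypothetical.

```python
import math
from statistics import NormalDist


def eform_row(es, se, level=0.95):
    """Exponentiate a log-scale estimate and its normal-theory CI limits."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% CI
    return math.exp(es), math.exp(es - z * se), math.exp(es + z * se)


hr, lci, uci = eform_row(0.376, 0.156)   # belgium: _ES and _seES from subgroup1.dta
print(f"{hr:.3f} {lci:.3f} {uci:.3f}")
```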

Inclusion of aggregate data: Syntax

. ipdmetan, study(trialid) eform nooverall ad(subgroup1.dta, byad) : stcox arm if subgroup==0, strata(sex)

• nooverall: do not pool IPD and aggregate data together
• ad(): aggregate data syntax
• After the colon: estimation_command
• byad: treat IPD and aggregate data as subgroups

Inclusion of aggregate data: Screen output

Trials included from IPD: 7
Patients included: 1333
Trials included from aggregate data: 4
Patients included: 1009

Pooling of main (treatment) effect estimate arm
using Fixed-effects
-------------------------------------------------------------------
trial reference     |
number              |   Effect   [95% Conf. Interval]   % Weight
--------------------+----------------------------------------------
IPD                 |
LCSG 773            |    1.123      0.827      1.526       11.13
CAMS                |    1.029      0.768      1.378       12.20
...                 |      ...
Subgroup effect     |    1.021      0.896      1.163       61.25
--------------------+----------------------------------------------
Aggregate           |
belgium             |    1.456      1.072      1.979       11.09
EORTC 08861         |    1.643      0.913      2.956        3.02
...                 |      ...
Subgroup effect     |    1.479      1.256      1.743       38.75
-------------------------------------------------------------------

Tests of effect size = 1:
IPD         z = 0.305  p = 0.760
Aggregate   z = 4.682  p = 0.000

Inclusion of aggregate data: Forest plot

[Figure: forest plot with IPD and aggregate data as subgroups; plotted values below (weights are percentages within each subgroup)]

trial reference number                  Effect (95% CI)     % Weight
IPD
LCSG 773                                1.12 (0.83, 1.53)      18.18
CAMS                                    1.03 (0.77, 1.38)      19.92
MRC LU11                                0.96 (0.74, 1.24)      26.12
SLOVENIA                                0.89 (0.54, 1.49)       6.48
GETCB 04CB86                            1.14 (0.80, 1.62)      13.85
ITALY                                   0.69 (0.40, 1.20)       5.69
KOREA                                   1.16 (0.76, 1.76)       9.76
Subtotal (I-squared = 0.0%, p = 0.740)  1.02 (0.90, 1.16)     100.00
Aggregate
belgium                                 1.46 (1.07, 1.98)      28.61
EORTC 08861                             1.64 (0.91, 2.96)       7.79
LILLE                                   1.57 (1.06, 2.32)      17.56
GETCB 05CB86                            1.44 (1.13, 1.83)      46.03
Subtotal (I-squared = 0.0%, p = 0.964)  1.48 (1.26, 1.74)     100.00

Advanced syntax example: non-“e-class” estimation command

ipdmetan (u[1,1]/V[1,1]) (1/sqrt(V[1,1])), study(trialid) eform   ///
    ad(subgroup1.dta, byad)                                       ///
    lcols(evrate=_d %3.2f "Event rate")                           ///
    rcols(u[1,1] %5.2f "o-E(o)" V[1,1] %5.1f "V(o)")              ///
    forest(nooverall nostats nowt)                                ///
    : sts test arm if subgroup==0, mat(u V)

• The effect estimate and its SE are not available from e(b), so they must be specified manually as expressions: here (u[1,1]/V[1,1]) and (1/sqrt(V[1,1])), using the matrices u and V saved by sts test
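The two expressions in this command, u[1,1]/V[1,1] and 1/sqrt(V[1,1]), are the Peto one-step estimate of the log hazard ratio, (O - E)/V, and its standard error 1/sqrt(V), where O - E and V come from the log-rank test. One consequence: the inverse-variance weights are then w = V, so the pooled estimate reduces to sum(O - E)/sum(V). A plain-Python sketch with hypothetical function names, checked against the LCSG 773 row of the example resultsset later in this talk (o-E(o) = 4.77, V(o) = 41.0, _ES = 0.116, 95% CI -0.190 to 0.422):

```python
import math


def peto_log_hr(o_minus_e, v):
    """Peto one-step log hazard ratio and its SE from log-rank O-E and V."""
    return o_minus_e / v, 1.0 / math.sqrt(v)


def peto_pool(o_minus_e_list, v_list):
    """Inverse-variance pooling with w_i = V_i reduces to sum(O-E) / sum(V)."""
    total_v = sum(v_list)
    return sum(o_minus_e_list) / total_v, 1.0 / math.sqrt(total_v)


# LCSG 773 (values as displayed in the forest plot columns)
es, se = peto_log_hr(4.77, 41.0)
print(f"{es:.3f} ({es - 1.96 * se:.3f}, {es + 1.96 * se:.3f})")  # 0.116 (-0.190, 0.422)

# The seven IPD trials in this example
oe = [4.77, 1.07, -2.48, -2.56, 4.95, -4.50, 3.06]
v = [41.0, 44.9, 59.4, 15.6, 31.6, 13.2, 22.4]
pooled, pooled_se = peto_pool(oe, v)
```

The pooled value, 4.31/228.1 ≈ 0.019, agrees with the IPD Subtotal _ES in the resultsset.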

Advanced syntax example: columns of data in forestplot

(Same command as the previous example.)
• lcols(evrate=_d %3.2f "Event rate"): mean of a variable currently in memory (here the failure indicator _d); note the user-assigned name evrate, chosen to match a variable name in the aggregate dataset
• rcols(u[1,1] %5.2f "o-E(o)" V[1,1] %5.1f "V(o)"): collect lists of returned statistics

Advanced syntax example: Forest plot

[Figure: forest plot (nostats, nowt) with extra data columns; column values below]

trial reference number                  Event rate    o-E(o)      V(o)
IPD
LCSG 773                                      0.72      4.77      41.0
CAMS                                          0.58      1.07      44.9
MRC LU11                                      0.78     -2.48      59.4
SLOVENIA                                      0.85     -2.56      15.6
GETCB 04CB86                                  0.68      4.95      31.6
ITALY                                         0.51     -4.50      13.2
KOREA                                         0.81      3.06      22.4
Subtotal (I-squared = 0.0%, p = 0.710)        0.69      3.24     229.6
Aggregate
belgium                                       0.83
EORTC 08861                                   0.43
LILLE                                         0.64
GETCB 05CB86                                  0.50
Subtotal (I-squared = 0.0%, p = 0.964)

• u and V (columns “o-E(o)” and “V(o)”) do not appear in the aggregate dataset, so they are not plotted for the aggregate trials
• A subtotal of the extra columns (e.g. “Event rate”) cannot be calculated for the aggregate data

The forestplot command

• Does not perform any calculations or estimation; simply plots existing data as a forest plot
• Overall/subgroup estimates, spacings, labels, text columns etc. need to be created and arranged in advance:
  • Ordering and spacing, and marking of subgroup/overall estimates for plotting as “diamonds”: variable _use
  • Principal left-hand data column (study IDs, heterogeneity text etc.; string format): variable _labels
  • This setup is done automatically by ipdmetan before passing the data to forestplot (but can also be done manually by the user)
• Multiple datasets can be passed to forestplot at once to create a single large “stacked” plot on a common x-axis
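To illustrate the division of labour described above, here is a toy text-mode renderer (hypothetical, in Python, and unrelated to how forestplot itself draws with twoway). It performs no estimation: it simply draws rows of precomputed estimates and limits, using a _use-style code to choose between a study marker and a summary marker. The codes are taken from the example resultsset later in this talk (1 = study; 3 and 5 = subgroup and overall summaries; other codes = label-only rows); treat that mapping as an assumption.

```python
def render_forest(rows, lo, hi, width=41, label_width=20):
    """Draw a crude text forest plot from precomputed rows.

    rows: iterable of (use, label, estimate, lower, upper); estimate may be None
    for label-only rows (headings, heterogeneity text, blank lines).
    lo, hi: x-axis range; values outside it are clamped to the edges.
    """
    def column(value):
        c = round((value - lo) / (hi - lo) * (width - 1))
        return min(max(c, 0), width - 1)

    lines = []
    for use, label, est, lower, upper in rows:
        cells = [" "] * width
        if est is not None:
            for c in range(column(lower), column(upper) + 1):
                cells[c] = "-"                            # confidence interval
            cells[column(est)] = "o" if use == 1 else "*"  # study vs summary
        lines.append(f"{label:<{label_width}.{label_width}}|{''.join(cells)}")
    return lines


rows = [
    (0, "IPD", None, None, None),
    (1, "LCSG 773", 0.116, -0.190, 0.422),
    (1, "CAMS", 0.024, -0.269, 0.316),
    (3, "Subtotal", 0.019, -0.111, 0.149),
]
print("\n".join(render_forest(rows, lo=-1.0, hi=1.0)))
```

Because the renderer only reads columns, any dataset with the right layout can be drawn, whether it came from a meta-analysis or was assembled by hand.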

forestplot syntax

forestplot [varlist] [if] [in] [, plot_options graph_options using_option]

• varlist = manually specified variable names to plot
• plot_options control the data plotting (within the plot region)
• graph_options control the surroundings (outside the plot region, i.e. the graph region)
• using_option represents one or more options that allow suitable datasets (or parts of datasets) to be fed to forestplot, possibly with different plot_options, to form a single large forest plot on a single x-axis

using_option syntax

using(filenamelist [if] [in] [, plot_options]) [using(filenamelist [if] [in] [, plot_options]) ...]

• filenamelist is a list of one or more Stata-format datasets
  • parts may be specified with [if] [in]
  • the same filename can appear more than once
  • the order of filenames determines placement in the graph
• Different plot_options may be specified to each using() option
  • For the same options applied to multiple files, place the files in one filenamelist
  • For different options applied to each file, place each file in a different using() option

plot_options syntax
• Based on metan syntax; options refer to different parts of the forest plot
• Most options appropriate to the underlying twoway plot type are acceptable, with some exceptions

Option     Function                                    twoway plot type
boxopt     Weighted boxes for study point estimates    scatter [aweight]
pointopt   Points for study point estimates            scatter
ciopt      Lines for confidence intervals              rspike or pcarrow
diamopt    Diamond for summary estimate                pcspike (x4)
olineopt   Vertical line through summary estimate      rspike

Example forestplot dataset (“resultsset” from the last ipdmetan example)

_use _by _study  _labels            _ES    _lci    _uci    _wgt  evrate  u_1_1_  V_1_1_     _NN
   0   1         IPD
   1   1      3  LCSG 773         0.116  -0.190   0.422   0.111    0.72    4.77    41.0
   1   1      5  CAMS             0.024  -0.269   0.316   0.121    0.58    1.07    44.9
   1   1      6  MRC LU11        -0.042  -0.296   0.213   0.160    0.78   -2.48    59.4
   1   1      9  SLOVENIA        -0.164  -0.660   0.332   0.042    0.85   -2.56    15.6
   1   1     14  GETCB 04CB86     0.157  -0.192   0.506   0.085    0.68    4.95    31.6
   1   1     13  ITALY           -0.341  -0.881   0.199   0.036    0.51   -4.50    13.2
   1   1     16  KOREA            0.136  -0.278   0.550   0.061    0.81    3.06    22.4
   3   1         Subtotal         0.019  -0.111   0.149   0.615    0.69    3.24   229.6
   4   1         (I-squared = 0.0%, p = 0.710)
   4   1
   0   2         Aggregate
   1   2     17  belgium          0.376   0.069   0.682   0.110    0.83                     202
   1   2     18  EORTC 08861      0.496  -0.091   1.084   0.030    0.43                     105
   1   2     19  LILLE            0.450   0.058   0.841   0.068    0.64                     163
   1   2     20  GETCB 05CB86     0.362   0.120   0.603   0.177    0.50                     539
   3   2         Subtotal         0.392   0.228   0.556   0.385                            1009
   4   2         (I-squared = 0.0%, p = 0.964)
   4   2
   4             Heterogeneity between groups: p = 0.000
   5             Overall          0.162   0.061   0.264   1.000                            1009
   4             (I-squared = 38.4%, p = 0.093)

• _ES, _lci, _uci, _wgt: estimates, CIs and weights
• evrate, u_1_1_, V_1_1_: extra data columns
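The “Heterogeneity between groups” p-value can be reproduced approximately from the two Subtotal rows of this dataset. A sketch, assuming ipdmetan uses the usual fixed-effect test (Cochran's Q computed on the subgroup estimates, chi-squared on one degree of freedom for two subgroups) and recovering each subtotal's SE from its 95% CI width; the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Recover a standard error from a symmetric normal-theory CI."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)


def two_group_heterogeneity(estimates, ses):
    """Fixed-effect pooled estimate and Cochran's Q test between two subgroup
    estimates (1 df, so the chi-squared tail is erfc(sqrt(Q / 2)))."""
    if len(estimates) != 2:
        raise ValueError("this sketch handles exactly two subgroups")
    w = [1.0 / s ** 2 for s in ses]
    pooled = sum(wi * b for wi, b in zip(w, estimates)) / sum(w)
    q = sum(wi * (b - pooled) ** 2 for wi, b in zip(w, estimates))
    return pooled, q, math.erfc(math.sqrt(q / 2))


# Subtotal rows from the resultsset: IPD (Peto) and aggregate (Cox) log HRs
ses = [se_from_ci(-0.111, 0.149), se_from_ci(0.228, 0.556)]
pooled, q, p = two_group_heterogeneity([0.019, 0.392], ses)
print(f"pooled = {pooled:.3f}, Q = {q:.1f}, p = {p:.4f}")
```

The pooled value comes out close to the Overall _ES of 0.162, and p is well below 0.001, consistent with the displayed p = 0.000.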

“Stacking” of forest plots
• Imagine:
  • the dataset on the previous slide is saved as ipdtest.dta
  • we want IPD boxes to be red, and AD boxes to be green
• We proceed as follows:
  • Run forestplot with two using() options, one for each part of the plot, with the same filename
    (Alternatively: run ipdmetan twice and save under different filenames)
  • Specify our desired plot_options as suboptions to using()

forestplot, using(ipdtest.dta if _by==1, boxopt(mcolor(red)))   ///
    using(ipdtest.dta if _by==2, boxopt(mcolor(green)))           ///
    lcols(evrate) rcols(u_1_1_ V_1_1_) nooverall nostats nowt

[Figure: stacked forest plot produced by the command above: IPD trials (LCSG 773 to KOREA; subtotal I-squared = 0.0%, p = 0.710) and aggregate trials (belgium to GETCB 05CB86; subtotal I-squared = 0.0%, p = 0.964) on a common x-axis, with columns “Event rate”, “o-E(o)” and “V(o)”]

Summary and conclusion

• IPD is increasingly used, and its advantages are widely accepted
  • Large numbers of MA scientists use two-stage models for analysing IPD
  • Currently, only AD MA (e.g. metan) and one-stage IPD (e.g. xtmixed) commands exist in Stata
• ipdmetan is a universal command for two-stage IPD MA
• forestplot is a flexible forest plot command
  • It does not carry out analysis itself, so it is not restricted by it
  • It may be useful outside the MA context (e.g. presenting trial subgroups)

Further information

• Other related programs (all call forestplot by default):
  • admetan: calls ipdmetan to analyse AD (a direct alternative to metan)
  • ipdover: fits a model within a series of subgroups
  • petometan: performs meta-analysis of time-to-event data using the Peto (log-rank) method
• Release on SSC, and a Stata Journal article, planned for the near future

Thank you!

• Questions, requests, bug reports: [email protected]

• Thanks to:
  • Jayne Tierney, Patrick Royston
  • Ross Harris (author of metan) for advice & support
  • Assorted colleagues for testing

• Reference:
  • Fisher D. J. et al. 2011. Journal of Clinical Epidemiology 64: 949-67