View
222
Download
2
Category
Tags:
Preview:
Citation preview
Interactions With Continuous Variables – Extensions of the Multivariable Fractional
Polynomial Approach
Willi SauerbreiInstitut of Medical Biometry and Informatics University Medical Center Freiburg, Germany
Patrick RoystonMRC Clinical Trials Unit, London, UK
2
Overview
• Issues in regression models• (Multivariable) fractional polynomials
(MFP)• Interactions of continuous variable with
– Binary variable– Continuous variable– Time
• Summary
3
Observational StudiesSeveral variables, mix of continuous and (ordered) categorical
variables
Different situations:– prediction– explanation
Explanation is the main interest here:• Identify variables with (strong) influence on the outcome• Determine functional form (roughly) for continuous
variables
The issues are very similar in different types of regression models (linear regression model, GLM, survival models ...)
Use subject-matter knowledge for modelling ...... but for some variables, data-driven choice inevitable
4
Regression modelsX=(X1, ...,Xp) covariate, prognostic factorsg(x) = ß1 X1 + ß2 X2 +...+ ßp Xp (assuming effects are linear)
normal errors (linear) regression model
Y normally distributedE (Y|X) = ß0 + g(X)Var (Y|X) = σ2I
logistic regression model
Y binary
Logit P (Y|X) = ln
survival times T survival time (partly censored) Incorporation of covariates
0β)X0P(Y
)X1P(Y g(X)
(t)expλ)Xλ(t 0 (g(X))
6
Continuous variables – The problem
“Quantifying epidemiologic risk factors using non-parametric regression: model selection remains the greatest challenge”Rosenberg PS et al, Statistics in Medicine 2003; 22:3369-3381
Discussion of issues in (univariate) modelling with splines
Trivial nowadays to fit almost any model
To choose a good model is much harder
8
Building multivariable regression models
Before dealing with the functional form, the ‚easier‘ problem of model selection:
variable selection assuming that the effect of each continuous variable is linear
9
Multivariable models - methods for variable selection
Full model– variance inflation in the case of multicollinearity
Stepwise procedures prespecified (in, out) and actual significance level?
• forward selection (FS)• stepwise selection (StS)• backward elimination (BE)
All subset selection which criteria?• Cp Mallows = (SSE / ) - n+ p 2 • AIC Akaike Information Criterion = n ln (SSE / n) + p 2• BIC Bayes Information Criterion = n ln (SSE / n) + p ln(n)
fit penalty
Combining selection with Shrinkage
Bayes variable selection Recommendations???
Central issue: MORE OR LESS COMPLEX MODELS?
2σ̂
10
Backward elimination is a sensible approach
- Significance level can be chosen- Reduces overfitting
Of course required• Checks• Sensitivity analysis• Stability analysis
11
Traditional approaches a) Linear function
- may be inadequate functional form- misspecification of functional form may lead to
wrong conclusions
b) ‘best‘ ‘standard‘ transformation
c) Step function (categorial data)- Loss of information- How many cutpoints?- Which cutpoints?- Bias introduced by outcome-dependent choice
Continuous variables – what functional form?
13
Continuous variables – newer approaches
• ‘Non-parametric’ (local-influence) models– Locally weighted (kernel) fits (e.g. lowess)– Regression splines– Smoothing splines
• Parametric (non-local influence) models– Polynomials– Non-linear curves– Fractional polynomials
• Intermediate between polynomials and non-linear curves
14
Fractional polynomial models
• Describe for one covariate, X• Fractional polynomial of degree m for X with powers p1,
… , pm is given byFPm(X) = 1 X p1 + … + m X pm
• Powers p1,…, pm are taken from a special set {2, 1, 0.5, 0, 0.5, 1, 2, 3}
• Usually m = 1 or m = 2 is sufficient for a good fit
• Repeated powers (p1=p2)
1 X p1 + 2 X p1log X
• 8 FP1, 36 FP2 models
16
Examples of FP2 curves- single power, different coefficients
(-2, 2)
Y
x10 20 30 40 50
-4
-2
0
2
4
17
Our philosophy of function selection
• Prefer simple (linear) model• Use more complex (non-linear) FP1 or FP2
model if indicated by the data• Contrasts to more local regression modelling
– Already starts with a complex model
18
299 events for recurrence-free survival time (RFS) in 686 patients with complete data
7 prognostic factors, of which 5 are continuous
Example: Prognostic factors
GBSG-study in node-positive breast cancer
20
χ2 df p-value
Any effect? Best FP2 versus null 17.61 4 0.0015
Linear function suitable?Best FP2 versus linear 17.03 3 0.0007
FP1 sufficient?Best FP2 vs. best FP1 11.20 2 0.0037
Function selection procedure (FSP)
Effect of age at 5% level?
21
Many predictors – MFP
With many continuous predictors selection of best FP for each becomes more difficult MFP algorithm as a standardized way to variable and function selection
(usually binary and categorical variables are also available)
MFP algorithm combinesbackward elimination withFP function selection procedures
22
P-value 0.9 0.2 0.001
Continuous factors Different results with different analysesAge as prognostic factor in breast cancer (adjusted)
23
Results similar? Nodes as prognostic factor in breast cancer (adjusted)
P-value 0.001 0.001 0.001
24
Multivariable FP
f X X X X Xa12
10 5
4 5 60 50 1 2
, ; ; ex p . ;. .
Model choosen out ofmore than a million possible models, one model selected
Model - Sensible?- Interpretable?- Stable?
Bootstrap stability analysis (see R & S 2003)
Final Model in breast cancer example
age grade nodes progesterone
25
Example: Risk factors
• Whitehall 1
– 17,370 male Civil Servants aged 40-64 years, 1670 (9.7%) died
– Measurements include: age, cigarette smoking, BP, cholesterol, height, weight, job grade
– Outcomes of interest: all-cause mortality at 10 years logistic regression
26
Fractional polynomials First degree Second degree Power p
Deviance Difference
Powers p q
Deviance Difference
Powers p q
Deviance Difference
Powers p q
Deviance difference
-2 -74.19 -2 -2 26.22* -1 1 12.97 0 2 7.05 -1 -43.15 -2 -1 24.43 -1 2 7.80 0 3 3.74 -0.5 -29.40 -2 -0.5 22.80 -1 3 2.53 0.5 0.5 10.94 0 -17.37 -2 0 20.72 -0.5 -0.5 17.97 0.5 1 9.51 0.5 -7.45 -2 0.5 18.23 -0.5 0 16.00 0.5 2 6.80 1 0.00 -2 1 15.38 -0.5 0.5 13.93 0.5 3 4.41 2 6.43* -2 2 8.85 -0.5 1 11.77 1 1 8.46 3 0.98 -2 3 1.63 -0.5 2 7.39 1 2 6.61 -1 -1 21.62 -0.5 3 3.10 1 3 5.11 -1 -0.5 19.78 0 0 14.24 2 2 6.44 -1 0 17.69 0 0.5 12.43 2 3 6.45 -1 0.5 15.41 0 1 10.61 3 3 7.59
Whitehall 1Systolic blood pressureDeviance difference in comparison to a straight line for FP(1) and FP(2) models
28
Presentation of models for continuous covariates
• The function + 95% CI gives the whole story• Functions for important covariates should always be
plotted• In epidemiology, sometimes useful to give a more
conventional table of results in categories• This can be done from the fitted function
29
Whitehall 1Systolic blood pressure Odds ratio from final FP(2) model LogOR= 2.92 – 5.43X-2 –14.30* X –2 log XPresented in categories Systolic blood pressure (mm Hg) Range ref. point
Number of men at risk dying
OR (model-based) Estimate 95%CI
90 88 27 3 2.47 1.75, 3.49 91-100 95 283 22 1.42 1.21, 1.67 101-110 105 1079 84 1.00 - 111-120 115 2668 164 0.94 0.86, 1.03 121-130 125 3456 289 1.04 0.91, 1.19 131-140 135 4197 470 1.25 1.07, 1.46 141-160 150 2775 344 1.77 1.50, 2.08 161-180 170 1437 252 2.87 2.42, 3.41 181-200 190 438 108 4.54 3.78, 5.46 201-240 220 154 41 8.24 6.60, 10.28 241-280 250 5 4 15.42 11.64, 20.43
30
Whitehall 1MFP analysis
Covariate FP etc.
Age Linear
Cigarettes 0.5
Systolic BP -1, -0.5
Total cholesterol Linear
Height Linear
Weight -2, 3
Job grade In
No variables were eliminated by the MFP algorithm
Assuming a linear function weight is eliminated by backward elimination
31
Detecting predictive factors (interaction with treatment)
• Don’t investigate effects in separate subgroups!
• Investigation of treatment/covariate interaction requires statistical tests
• Care is needed to avoid over-interpretation
• Distinguish two cases:- Hypothesis generation: searching several interactions
- Specific predefined hypothesis
• For current bad practise - see Assmann et al (Lancet 2000)
InteractionsMotivation – I
32
Continuous by continuous interactions
• usually linear by linear product term
• not sensible if main effect (prognostic effect) is non-linear
• mismodelling the main effect may introduce spurious interactions
Motivation - II
33
Detecting predictive factors(treatment – covariate interaction)
• Most popular approach- Treatment effect in separate subgroups- Has several problems (Assman et al 2000)
• Test of treatment/covariate interaction required - For `binary`covariate standard test for interaction
available
• Continuous covariate- Often categorized into two groups
34
Categorizing a continuous covariate
• How many cutpoints?• Position of the cutpoint(s)• Loss of information loss of power
35
Standard approach
• Based on binary predictor• Need cut-point for continuous predictor• Illustration - problem with cut-point approach
TAM*ER interaction in breast cancer (GBSG-study)
P-v
alu
e (
log s
cale
)
ER cutpoint, fmol/l0 5 10 15 20
.01
.02
.05
.1
.2
.5
36
Treatment effect by subgroup
Log
haza
rd r
atio
for
trea
tmen
t
ER cutpoint, fmol/l
Low ER High ER
0 5 10 15 20-.5
0
.5
1
37
New approaches for continuous covariates
• STEPPSubpopulation treatment effect pattern plots
Bonetti & Gelber 2000
• MFPIMultivariable fractional polynomial
interaction approach
Royston & Sauerbrei 2004
38
STEPP
Sequences of overlapping subpopulations
Sliding window Tail oriented
Con
tin.
cova
riate
2g-1 subpopulations(here g=8)
39
STEPP
Estimates in subpopulations• No interaction
treatment effects ‚similar‘ in all subpopulations• Plot effects in subpopulations
40
STEPP
Overlapping populations, therefore correlation
between treatment effects in subpopulations
Simultaneous confidence band and tests
proposed
41
MFPI• Have one continuous factor X of interest
• Use other prognostic factors to build an adjustment model, e.g. by MFP
Interaction part – with or without adjustment
• Find best FP2 transformation of X with same powers in each treatment group
• LRT of equality of reg coefficients
• Test against main effects model(no interaction) based on 2 with 2df
• Distinguish predefined hypothesis - hypothesis searching
42
At risk 1: 175 55 22 11 3 2 1
At risk 2: 172 73 36 20 8 5 1
0.0
00.2
50.5
00.7
51.0
0P
rop
ort
ion
aliv
e
0 12 24 36 48 60 72Follow-up (months)
(1) MPA(2) Interferon
RCT: Metastatic renal carcinomaComparison of MPA with interferon N = 347, 322 Death
43
• Is the treatment effect similar in all patients?
Sensible questions?- Yes, from our point of view
• Ten factors available for the investigation of treatment – covariate interactions
Overall: Interferon is better (p<0.01)
44
-4-2
02
Tre
atm
ent effect, log r
ela
tive h
azard
5 10 15 20White cell count
Original data
Treatment effect function for WCC
Only a result of complex (mis-)modelling?
MFPI
45
0.0
00
.25
0.5
00
.75
1.0
0P
roport
ion a
live
0 12 24 36 48 60 72
Group I
0.0
00
.25
0.5
00
.75
1.0
0
0 12 24 36 48 60 72
Group II
0.0
00
.25
0.5
00
.75
1.0
0P
roport
ion a
live
0 12 24 36 48 60 72Follow-up (months)
Group III
0.0
00
.25
0.5
00
.75
1.0
0
0 12 24 36 48 60 72Follow-up (months)
Group IV
Treatment effect in subgroups defined by WCC
HR (Interferon to MPA; adjusted values similar) overall: 0.75 (0.60 – 0.93)I : 0.53 (0.34 – 0.83) II : 0.69 (0.44 – 1.07)III : 0.89 (0.57 – 1.37) IV : 1.32 (0.85 –2.05)
Does the MFPI model agree with the data?Check proposed trend
47
STEPP as check of MFPI
.1.2
.51
2R
ela
tive
haz
ard
4 8 12 16 20White cell count
STEPP – tail-oriented, g = 6
48
0.5
11.
5D
ensi
ty
0 .2 .4 .6 .8 1p
MFPI – Type I error Random permutation of a continuous covariate (haemoglobin) no interaction
Distribution of P-value from test of interaction1000 runs, Type I error: 0.054
49
Continuous by continuous interactionsMFPIgen
• Have Z1 , Z2 continuous and X confounders
• Apply MFP to X, Z1 and Z2, forcing Z1 and Z2 into the model. FP functions f1(Z1) and f2(Z2) will be selected for Z1 and Z2
• Add term f1(Z1)* f2(Z2) to the model chosen and use LRT for test of interaction
• Often f1(Z1) and/or f2(Z2) are linear
• Check all pairs of continuous variables for an interaction
• Check (graphically) interactions for artefacts
• Use forward stepwise if more than one interaction remains
• Low significance level for interactions
50
InteractionsWhitehall 1
Consider only age and weight
Main effects:age – linearweight – FP2 (-1,3)
Interaction?
Include age*weight-1 + age*weight3
into the model
LRT: χ2 = 5.27 (2df, p = 0.07)
no (strong) interaction
51
Erroneously assume that the effect of weight is linear
Interaction?
Include age*weight into the model
LRT: χ2 = 8.74 (1df, p = 0.003)
hightly significant interaction
52
Model check:
categorize age in 4 equal sized groups
Compute running line smooth of the binary outcome on weight in each group
53
Whitehall 1: check of age x weight interaction-4
-3-2
-10
Logi
t(pr
(dea
th))
40 60 80 100 120 140weight
1st quartile 2nd quartile3rd quartile 4th quartile
54
Running line smooth are about
parallel across age groups
no (strong) interactions
smoothed probabilities are about equally spaced effect of age is linear
55
Erroneously assume that the effect of
weight is linear
Estimated slopes of weight in age-groups indicates strong qualitative interaction between age und weight
56
Whitehall 1:P-values for two-way interactions from MFPIgen
*FP transformations
chol*age highly significant
57
Presentation of interactions
Whitehall 1: age*chol interaction
Effect (adjusted) for 10th, 35th, 65th and 90th centile
58
Age*Chol interaction
Chol ‚low‘ : age has an effect
Chol ‚high‘ : age has no effect
Age ‚low‘ : chol has an effect
Age ‚high‘ : chol has no effect
59
Age*Chol interactionDoes the model fit? Check in 4 subgroups
Linearity of chol: okBut: Slopes are not monotonically ordered Lack of fit of linear*linear interaction
62
Extending the Cox model
• Cox model
(t | X) = 0(t) exp (X)
• Relax PH-assumption dynamic Cox model
(t | X) = 0(t) exp ((t) X)
HR(x,t) – function of X and time t
• Relax linearity assumption
(t | X) = 0(t) exp ( f (X))
63
Causes of non-proportionality
• Effect gets weaker with time• Incorrect modelling
– omission of an important covariate– incorrect functional form of a covariate– different survival model is appropriate
64
Non-PH – What can be done?
Non-PH - Does it matter ?
- Is it real ?
Non-PH is large and real– stratify by the factor
(t|X, V=j) = j (t) exp (X )• effect of V not estimated, not tested • for continuous variables grouping necessary
– Partition time axis– Model non-proportionality by time-dependent
covariate
65
Example: Time-varying effects
Rotterdam breast cancer data2982 patients1 to 231 months follow-up time1518 events for RFI (recurrence free interval)Adjuvant treatment with chemo- or hormonal therapy according to clinic guidelines70% without adjuvant treatment
Covariatescontinuous age, number of positive nodes, estrogen, progesterone
categorical menopausal status, tumor size, grade
66
• Treatment variables ( chemo , hormon) will be analysed as usual covariates
• 9 covariates , partly strong correlation (age-meno; estrogen-progesterone; chemo, hormon – nodes )
variable selection
• Use multivariable fractional polynomial approach for model selection in the Cox proportional hazards model
67
Assessing PH-assumption
• Plots– Plots of log(-log(S(t))) vs log t should be parallel for groups– Plotting Schoenfeld residuals against time to identify patterns in
regression coefficients– Many other plots proposed
• Testsmany proposed, often based on Schoenfeld residuals,most differ only in choice of time transformation
• Partition the time axis and fit models seperatly to each time interval
• Including time-by-covariate interaction terms in the model and estimate the log hazard ratio function
69
Factor
SE
p-value t rank(t) Log(t) Sqrt(t)
X1 – age -0.01 0.002 0.082 0.243 0.329 0.149
X3a – size 0.29 0.057 0.000 0.000 0.001 0.000
X4b – grade 0.39 0.064 0.189 0.198 0.129 0.164
X5e – nodes -1.71 0.081 0.002 0.000 0.000 0.000
X8 - chemo-T -0.39 0.085 0.091 0.008 0.023 0.034
X9 – horm-T -0.45 0.073 0.014 0.001 0.000 0.002
Index 1.00 0.039 0.000 0.000 0.000 0.000
estimates
test of time-varying effect for different time transformations
Selected model with MFP
70
Including time – by covariate interaction(Semi-) parametric models for β(t)
• model (t) x = x + x g(t)calculate time-varying covariate x g(t)
fit time-varying Cox model and test for 0
plot (t) against t
• g(t) – which form?– ‘usual‘ function, eg t, log(t)– piecewise– splines– fractional polynomials
71
MFPTime algorithmMotivation
• Multivariable strategy required to select– Variables which have influence on outcome– For continuous variables determine functional form of
the influence (‚usual‘ linearity assumption sensible?)– Proportional hazards assumption sensible or does a
time-varying function fit the data better?
72
MFPTime algorithm (1)
• Determine (time-fixed) MFP model M0
possible problems
variable included, but effect is not constant in time
variable not included because of short term effect only
• Consider short term period only
Additional to M0 significant variables?
This gives M1
73
MFPTime algorithm (2)For all variables (with transformations) selected from full time-period and short time-period
• Investigate time function for each covariate in forward stepwise fashion - may use small P value
• Adjust for covariates from selected model• To determine time function for a variable
compare deviance of models ( 2) from
FPT2 to null (time fixed effect) 4 DFFPT2 to log 3 DFFPT2 to FPT1 2 DF
• Use strategy analogous to stepwise to add
time-varying functions to MFP model M1
74
Development of the modelVariable Model M0 Model M1 Model M2
β SE β SE β SE
X1 -0.013 0.002 -0.013 0.002 -0.013 0.002
X3b - - 0.171 0.080 0.150 0.081
X4 0.39 0.064 0.354 0.065 0.375 0.065
X5e(2) -1.71 0.081 -1.681 0.083 -1.696 0.084
X8 -0.39 0.085 -0.389 0.085 -0.411 0.085
X9 -0.45 0.073 -0.443 0.073 -0.446 0.073
X3a 0.29 0.057 0.249 0.059 - 0.112 0.107
logX6 - - -0.032 0.012 - 0.137 0.024
X3a(log(t)) - - - - - 0.298 0.073
logX6(log(t)) - - - - 0.128 0.016
Index 1.000 0.039 1.000 0.038 0.504 0.082
Index(log(t)) - - - - -0.361 0.052
76
Final model includes time-varying functions for
progesterone ( log(t) ) and
tumor size ( log(t) )
Prognostic ability of the Index vanishes in time
77
Software sources MFP
• Most comprehensive implementation is in Stata– Command mfp is part since Stata 8 (now Stata 10)
• Versions for SAS and R are available– SAS
www.imbi.uni-freiburg.de/biom/mfp
– R version available on CRAN archive• mfp package
• Extensions to investigate interactions• So far only in Stata
78
Concluding comments – MFP
• FPs use full information - in contrast to a priori categorisation
• FPs search within flexible class of functions (FP1 and FP(2)-44 models)
• MFP is a well-defined multivariate model-building strategy – combines search for transformations with BE
• Important that model reflects medical knowledge,
e.g. monotonic / asymptotic functional forms
79
Towards recommendations for model-building by selection of variables and functional forms for continuous predictors under
several assumptionsIssue Recommendation
Variable selection procedure Backward elimination; significance level as key tuning parameter, choice depends on the aim of the study
Functional form for continuous covariates
Linear function as the 'default', check improvement in model fit by fractional polynomials. Check derived function for undetected local features
Extreme values or influential points Check at least univariately for outliers and influential points in continuous variables. A preliminary transformation may improve the model selected. For a proposal see R & S 2007
Sensitivity analysis Important assumptions should be checked by a sensitivity analysis. Highly context dependent
Check of model stability The bootstrap is a suitable approach to check for model stability
Complexity of a predictor A predictor should be 'as parsimonious as possible'
Sauerbrei et al. SiM 2007
80
Interactions
• Interactions are often ignored by analysts• Continuous categorical has been studied in
FP context because clinically very important• Continuous continuous is more complex• Interaction with time important for long-term FU
survival data
81
MFP extensions
• MFPI – treatment/covariate interactions In contrast to STEPP it avoids categorisation
• MFPIgen – interaction between two continuous variables
• MFPT – time-varying effects in survival data
82
Summary
Getting the big picture right is more important than optimising aspects and ignoring others
• strong predictors
• strong non-linearity
• strong interactions• strong non-PH in survival model
83
Harrell FE jr. (2001): Regression Modeling Strategies. Springer.
Royston P, Altman DG. (1994): Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling (with discussion). Applied Statistics, 43, 429-467.
Royston P, Altman DG, Sauerbrei W (2006): Dichotomizing continuous predictors in multiple regression: a bad idea. Statistics in Medicine, 25, 127-141.
Royston P, Sauerbrei W. (2004): A new approach to modelling interactions between treatment and continuous covariates in clinical trials by using fractional polynomials. Statistics in Medicine, 23, 2509-2525.
Royston P, Sauerbrei W. (2005): Building multivariable regression models with continuous covariates, with a practical emphasis on fractional polynomials and applications in clinical epidemiology. Methods of Information in Medicine, 44, 561-571.
Royston P, Sauerbrei W. (2007): Improving the robustness of fractional polynomial models by preliminary covariate transformation: a pragmatic approach. Computational Statistics and Data Analysis, 51: 4240-4253.
Royston P, Sauerbrei W (2008): Multivariable Model-Building - A pragmatic approach to regression analysis based on fractional polynomials for continuous variables. Wiley.
Sauerbrei W. (1999): The use of resampling methods to simplify regression models in medical statistics. Applied Statistics, 48, 313-329.
Sauerbrei W, Meier-Hirmer C, Benner A, Royston P. (2006): Multivariable regression model building by using fractional polynomials: Description of SAS, STATA and R programs. Computational Statistics & Data Analysis, 50, 3464-3485.
Sauerbrei W, Royston P. (1999): Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. Journal of the Royal Statistical Society A, 162, 71-94.
Sauerbrei W, Royston P, Binder H (2007): Selection of important variables and determination of functional form for continuous predictors in multivariable model building. Statistics in Medicine, 26:5512-28.
Sauerbrei W, Royston P, Look M. (2007): A new proposal for multivariable modelling of time-varying effects in survival data based on fractional polynomial time-transformation. Biometrical Journal, 49: 453-473.
Sauerbrei W, Royston P, Zapien K. (2007): Detecting an interaction between treatment and a continuous covariate: a comparison of two approaches. Computational Statistics and Data Analysis, 51: 4054-4063.
Schumacher M, Holländer N, Schwarzer G, Sauerbrei W. (2006): Prognostic Factor Studies. In Crowley J, Ankerst DP (ed.), Handbook of Statistics in Clinical Oncology, Chapman&Hall/CRC, 289-333.
References
Recommended