Making fractional polynomial models more robust

Willi Sauerbrei
Institute of Medical Biometry and Informatics, University Medical Center Freiburg, Germany

Patrick Royston
MRC Clinical Trials Unit, London, UK


An interesting dataset

• From Johnson (J Statistics Education 1996)
• Percent body fat measurements in 252 men
• 13 continuous covariates comprising age, weight, height, 10 body circumference measurements
• Used by Johnson to illustrate some of the problems of multiple regression analysis (collinearity etc.)

The problem …

[Figure: percentage body fat against abdominal circumference (cm) with linear, FP1 and FP2 fits, shown for all data (case 39 marked) and excluding case 39. Lower panels: leverage (log scale) against abdominal circumference (cm).]

Effect of case 39 on FP analysis (P-values for non-linear effects)

Comparison        All data    Omit case 39
FP2 vs. linear    0.000       0.73
FP2 vs. FP1       0.035       0.85

Non-linearity depends on case 39

This case has an undue influence on the results of the FP analysis

Would have similar influence on other flexible models, e.g. splines
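Leverage is what lets a single extreme covariate value dominate a fit. In simple linear regression with an intercept, the leverage of observation i is h_i = 1/n + (x_i - mean)^2 / sum_j (x_j - mean)^2. The sketch below computes it; the abdomen values are made up for illustration and are not the bodyfat data.

```python
def leverages(x):
    """Leverage h_i for simple linear regression of y on x (with intercept)."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((v - mean) ** 2 for v in x)
    return [1.0 / n + (v - mean) ** 2 / sxx for v in x]


# Hypothetical abdominal circumferences (cm); the last value is extreme.
abdomen = [80.0, 85.0, 88.0, 90.0, 92.0, 95.0, 100.0, 148.0]
h = leverages(abdomen)
for value, lev in zip(abdomen, h):
    print(f"{value:6.1f}  leverage = {lev:.3f}")
```

The leverages always sum to the number of fitted parameters (here 2), so one observation with leverage close to 1 leaves very little for the rest of the sample.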

Brief reminder: fractional polynomial models

• For one covariate, X
• Fractional polynomial of degree m for X with powers $p_1, \dots, p_m$ is given by
  $FP_m(X) = \beta_1 X^{p_1} + \dots + \beta_m X^{p_m}$
• Powers $p_1, \dots, p_m$ are taken from a special set $\{-2, -1, -0.5, 0, 0.5, 1, 2, 3\}$, where power 0 denotes $\log X$
• In clinical data, m = 1 or m = 2 is usually sufficient for a good fit
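As a minimal sketch of this definition, the function below evaluates an FP function for given distinct powers, using the convention that power 0 stands for log X. The names `fp_power` and `fp_value` and the coefficients in the example are hypothetical, chosen only for illustration.

```python
import math

# The special set of FP powers; 0 is read as log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_power(x, p):
    """Return x**p, with p = 0 meaning log(x). Requires x > 0."""
    if x <= 0:
        raise ValueError("fractional polynomials need x > 0")
    return math.log(x) if p == 0 else x ** p


def fp_value(x, powers, betas):
    """beta_1 * x^p1 + ... + beta_m * x^pm for distinct powers p1..pm."""
    if len(powers) != len(betas):
        raise ValueError("need one coefficient per power")
    return sum(b * fp_power(x, p) for p, b in zip(powers, betas))


# Hypothetical FP2 with powers (-1, 2) and made-up coefficients.
print(fp_value(4.0, powers=(-1, 2), betas=(8.0, 0.5)))  # 8/4 + 0.5*16 = 10.0
```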

FP1 and FP2 models

• FP1 models are simple power transformations
• $1/X^2$, $1/X$, $1/\sqrt{X}$, $\log X$, $\sqrt{X}$, $X$, $X^2$, $X^3$
  8 models of the form $\beta_0 + \beta_1 X^p$
• FP2 models have combinations of the powers
  For example $\beta_0 + \beta_1 (1/X) + \beta_2 X^2$
  28 models
• Also 'repeated powers' models
  For example (1, 1): $\beta_0 + \beta_1 X + \beta_2 X \log X$
  8 models
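The model counts above can be checked by enumerating the power set directly. The sketch below also builds the two FP2 basis terms, using the standard rule that a repeated power (p, p) gives the terms X^p and X^p log X. The helper name `fp2_basis` is hypothetical.

```python
import math
from itertools import combinations

FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

fp1_models = [(p,) for p in FP_POWERS]            # one power each
fp2_distinct = list(combinations(FP_POWERS, 2))   # two different powers
fp2_repeated = [(p, p) for p in FP_POWERS]        # 'repeated powers'


def fp2_basis(x, p1, p2):
    """Return the two FP2 basis terms for x > 0.

    For a repeated power the second term is multiplied by log(x),
    e.g. (1, 1) gives x and x*log(x).
    """
    def h(p):
        return math.log(x) if p == 0 else x ** p

    first = h(p1)
    second = h(p2) * math.log(x) if p1 == p2 else h(p2)
    return first, second


print(len(fp1_models), len(fp2_distinct), len(fp2_repeated))  # 8 28 8
print(fp2_basis(2.0, -1, 2))  # (0.5, 4.0)
```

In total there are 8 FP1 models and 28 + 8 = 36 FP2 models to compare for a single covariate.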

Bodyfat: case 39 also influences a multivariable FP model

Covariate    All data; case 39 omitted
Height       1
Abdomen      1; 1
Biceps       3, 3
Wrist        1; 1
Weight       1
Thigh        -2, -2

Case 39 is extreme for several covariates

A conceptual solution: preliminary transformation of X

[Figure: transformed value g(Abdomen) against abdomen (cm).]

Bodyfat revisited

[Figure: percentage body fat against abdominal circumference (cm) with linear, FP1 and FP2 fits, shown for the original data (case 39 marked) and after preliminary transformation. Lower panels: leverage (log scale) against abdominal circumference (cm).]

Preliminary transformation: effect on multivariable FP analysis

Apply preliminary transformation to all predictors in bodyfat data

Covariate    Original data                  Transformed data
             (all data; case 39 omitted)    (all data; case 39 omitted)
Height       1
Abdomen      1; 1                           1; 1
Biceps       3, 3
Wrist        1; 1                           1; 1
Weight       1                              1; 1
Thigh        -2, -2                         -2; -2

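The transformation (1)

Let $z = (x - \bar{x})/s$ for a sample of data

Transformation is

$$g(z, \varepsilon) = \frac{\ln\left[\dfrac{\Phi(z) + \varepsilon}{1 - \Phi(z) + \varepsilon}\right] + \varepsilon^*}{2\varepsilon^*}, \quad \text{where } \varepsilon^* = \ln\left(1 + \frac{1}{\varepsilon}\right)$$

and $\Phi$ is the standard normal distribution function

Take $\varepsilon = 0.01$ for best results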
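A minimal sketch of the transformation, reading the formula on this slide as g(z, eps) = [ln((Phi(z) + eps) / (1 - Phi(z) + eps)) + eps*] / (2 eps*) with eps* = ln(1 + 1/eps) and Phi the standard normal distribution function. The function names are hypothetical, and s is assumed here to be the sample standard deviation with divisor n - 1.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def standardise(xs):
    """z = (x - mean) / s, with s the sample SD (n - 1 divisor, an assumption)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return [(x - mean) / sd for x in xs]


def g(z, eps=0.01):
    """Preliminary transformation g(z, eps), with values between 0 and 1."""
    eps_star = math.log(1.0 + 1.0 / eps)
    phi = normal_cdf(z)
    logit_like = math.log((phi + eps) / (1.0 - phi + eps))
    return (logit_like + eps_star) / (2.0 * eps_star)


# The transform is close to linear centrally and flattens at the extremes.
for z in (-6.0, -2.0, -1.0, 0.0, 1.0, 2.0, 6.0):
    print(f"z = {z:5.1f}   g = {g(z):.4f}")
```

Because the result depends on x only through Phi(z), an observation far out in the tail is mapped to a value only slightly beyond its less extreme neighbours, which is what limits its leverage.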

The transformation (2)

• $0 < g(z, \varepsilon) < 1$ for any $z$ and $\varepsilon$
• $g(z, \varepsilon)$ tends to asymptotes 0 and 1 as $z$ tends to $-\infty$ and $+\infty$ respectively
• $g(z, \varepsilon)$ looks like a straight line centrally, smoothly truncated at the extremes

The transformation (3)

$\varepsilon = 0.01$ is nearly linear in central region

[Figure: g(z, eps) against z on [-4, 4] for epsilon = 0.0001, 0.01 and 1, with a straight line on [-2, 2] for eps = 0.01.]

The transformation (4)

• FP functions (including transformations such as log) are sensitive to values of x near 0
• To avoid this effect, shift the origin of $g(z, \varepsilon)$ to the right
• Simple linear transformation of $g(z, \varepsilon)$ to the interval $(\delta, 1)$ does this
• Simulation studies support $\delta = 0.2$
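The shift can be sketched as follows, assuming the linear map is lower + (1 - lower) * g, which is the increasing linear map taking (0, 1) onto (lower, 1). Here `delta` is our name for the lower limit of the interval (0.2 on this slide), `g` repeats the transformation with Phi taken as the standard normal distribution function, and `g_shifted` is a hypothetical helper name.

```python
import math


def g(z, eps=0.01):
    """Preliminary transformation with values between 0 and 1."""
    eps_star = math.log(1.0 + 1.0 / eps)
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return (math.log((phi + eps) / (1.0 - phi + eps)) + eps_star) / (2.0 * eps_star)


def g_shifted(z, eps=0.01, delta=0.2):
    """Map g from (0, 1) onto (delta, 1) so transformed values stay away from 0."""
    return delta + (1.0 - delta) * g(z, eps)


# After the shift, FP terms such as log or 1/x remain bounded at the low end.
for z in (-6.0, -2.0, 0.0, 2.0, 6.0):
    v = g_shifted(z)
    print(f"z = {z:5.1f}   shifted g = {v:.4f}   log = {math.log(v):.4f}")
```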

Example 2: Whitehall 1 study

• 17,370 male Civil Servants aged 40-64 years
• Covariates: age, cigarette smoking, BP, cholesterol, height, weight, job grade
• Outcome of interest: all-cause mortality, analysed by logistic regression
• Interested in risk as function of covariates
• Several continuous covariates
• Risk functions: preliminary transformation

Multivariable FP modelling with or without preliminary transformation

[Figure: Whitehall 1, multivariable FP analysis, original vs. transformed covariates. Six panels show probability of death against age at entry, cigarettes/day, systolic BP, total cholesterol (mmol/l), weight (kg) and height (cm). Green vertical lines show the 1st and 99th centiles of X.]

Comments and conclusions

• Issue of robustness affects FP and other models
• Standard analysis of influence may identify problematic points but does not tell you what to do
• Proposed preliminary transformation is effective in reducing leverage of extreme covariate values
  - Lowers the chance that FP and other flexible models will contain artefacts in curve shape
  - Transformation looks complicated, but graph shows idea is really quite simple, like double truncation
• May be concerned about possible bias in fit at extreme values of X following transformation