Introduction Web site for slides www.geos.ed.ac.uk/homes/graab SLS synthetic survival data

SLS synthetic survival data - geos.ed.ac.uk · PDF fileSLS synthetic survival data. ... Only these three shaded groups born in 1970s ... 3.96 is 95% significance level. Survival by


Introduction

Web site for slides

www.geos.ed.ac.uk/homes/graab

SLS synthetic survival data


SLS synthetic data

Derived from a real SLS extract

But contains no real people

See back page of notes for details

Start by looking at one table from it

Notes page 5


Deaths by age group

                agegroup
Year of death   20-29  30-39  40-49  50-59  60-69  70-75  Total
1991                8     12     41     94    218    165    538
1992               15     42     72    221    470    367   1187
1993               14     54    109    256    620    426   1479
1994               28     64    120    304    727    490   1733
1995               25     53    145    317    854    528   1922
1996               36     47    155    341    803    523   1905
1997               41     75    172    407    888    522   2105
1998               33     67    166    427    904    510   2107
1999               45     79    214    413    847    513   2111
2000               45     85    169    407    866    447   2019
2001               42     93    202    450    899    441   2127
2002               52    115    215    474    837    459   2152
2003               39    101    246    515    821    383   2105
2004               49    102    226    515    896    384   2172
2005               37    117    255    494    831    362   2096
2006               52    105    252    521    860    311   2101
2007               52    112    242    440    744    280   1870
2008               55    119    270    544    780    258   2026
Total             668   1442   3271   7140  13865   7369  33755


Age at death by birth cohort

Birth Cohort 1910 1920 1930 1940 1950 1960 1970

Mean age at death


Answers

Birth Cohort 1910 1920 1930 1940 1950 1960 1970

Mean age at death 81 76 67 57 48 38 30

Number of deaths 3858 15051 8475 3822 1663 786 100


Youngest ages

Age at 1991 Census   Birthday from   Birthday to   Age at end 2008 (born From / born To)
20                   Apr-70          Mar-71        38 / 37
21                   Apr-69          Mar-70        39 / 38
22                   Apr-68          Mar-69        40 / 39
23                   Apr-67          Mar-68        41 / 40

Only these three shaded groups born in 1970s
Followed up from age 20/21 to 37/38
Hence mean age at death 30


Oldest ages

Age at 1991 Census   Birthday from   Birthday to   Age at end 2008 (born From / born To)
71                   Apr-19          Mar-20        89 / 88
72                   Apr-18          Mar-19        90 / 89
73                   Apr-17          Mar-18        91 / 90
74                   Apr-16          Mar-17        92 / 91

Only these three shaded groups born in 1910s
Followed up from age 71-74 to 88-91
Hence mean age at death 81


Extra information

Birth Cohort 1910s 1920s 1930s 1940s 1950s 1960s 1970s

Mean age at death 81 76 67 57 48 38 30

Number of deaths 3858 15051 8475 3822 1663 786 100


Most important for youngest group

Data are CENSORED, strictly RIGHT CENSORED, meaning that the follow up ended before the event happened, as in the two lower examples

[Figure: follow-up timelines from 1991 to 2008; one example is labelled "Moved to France 2001"]


Time origin at birth – oldest group

[Figure: individual timelines on an age axis marked at ages 0, 20, 40, 60 and 80]

The middle person died before we started the follow up, so we will not capture them in the study at all. This is called left truncation.


Summarising and plotting survival data

Questions about practical 1?

Now continue


Survival data with no censoring (fake)


Kaplan-Meier calculations for SLS data

Time (months)  At risk at start  Died  Exited  Proportion died         Proportion survived P(t)  S(t) KM
0.50           175059            25    0       25/175059 = 0.000143    0.999857191               0.999857
1.50           175034            44    19      44/175034 = 0.000251    0.99974862                0.999605
2.50           174971            53    11      53/174971 = 0.0003029   0.999697093               0.999303
3.50           174907            66    16      66/174907 = 0.0003773   0.999622657               0.998925
4.50           174825            0     11      0/174825 = 0            1                         0.998925
5.50           174814            70    12      70/174814 = 0.0004004   0.999599574               0.998526

Table 3: Product limit calculations for the first few months of the synthetic SLS data

continued
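The product-limit arithmetic in Table 3 can be sketched in a few lines of Python. This is a minimal illustration rather than the course's own code: the rows are copied from the table, the function names are made up for this example, and the proportion dying is taken as deaths divided by the number at risk at the start of the interval. People who exit leave the next risk set but do not count as deaths.

```python
# Rows copied from Table 3: (time in months, at risk at start, died, exited).
ROWS = [
    (0.5, 175059, 25, 0),
    (1.5, 175034, 44, 19),
    (2.5, 174971, 53, 11),
    (3.5, 174907, 66, 16),
    (4.5, 174825, 0, 11),
    (5.5, 174814, 70, 12),
]


def product_limit(rows):
    """Return [(time, P(t), S(t))]: P(t) = 1 - died/at_risk, S(t) = running product of P(t)."""
    survival = 1.0
    table = []
    for time, at_risk, died, _exited in rows:
        p_survive = 1.0 - died / at_risk
        survival *= p_survive
        table.append((time, p_survive, survival))
    return table


def next_at_risk(at_risk, died, exited):
    """Both deaths and exits (censored people) leave the risk set for the next interval."""
    return at_risk - died - exited


KM_TABLE = product_limit(ROWS)
for time, p_survive, survival in KM_TABLE:
    print(f"{time:4.1f}  P(t)={p_survive:.9f}  S(t)={survival:.6f}")
```

The printed S(t) values may differ from the table in the last decimal place, since the table appears to truncate rather than round.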


Plot of KM for whole SLS sample

Can we get mean or median survival time?


Example of survival summaries from a study of survival after surgery

                             No tissue loss, AP>=50   Tissue loss, AP>=50   AP <50
Median
5-year survival
Restricted mean to 7 years


Answers - approximate

                             No tissue loss, AP>=50   Tissue loss, AP>=50   AP <50
Median                       >8 yrs                   5.2 yrs               3.6 yrs
5-year survival              75%                      42%                   30%
Restricted mean to 7 years   6.8 yrs                  4.3 yrs               3.4 yrs
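The three summaries in this table can all be read off a Kaplan-Meier step curve. The sketch below shows one way to do it, assuming the curve is stored as a list of (time, S(t)) steps. The two curves are made-up numbers for illustration, not the surgery study data, and the function names are hypothetical.

```python
def survival_at(curve, t):
    """S(t) for a step curve given as [(time, S)] sorted by time; S is 1 before the first step."""
    current = 1.0
    for time, surv in curve:
        if time > t:
            break
        current = surv
    return current


def median_survival(curve):
    """First time at which S(t) is 0.5 or lower; None if the curve never gets there."""
    for time, surv in curve:
        if surv <= 0.5:
            return time
    return None


def restricted_mean(curve, horizon):
    """Area under the step curve between time 0 and the horizon."""
    area, prev_time, prev_surv = 0.0, 0.0, 1.0
    for time, surv in curve:
        if time >= horizon:
            break
        area += prev_surv * (time - prev_time)
        prev_time, prev_surv = time, surv
    return area + prev_surv * (horizon - prev_time)


# Hypothetical curves (time in years), invented for this example.
POOR = [(1, 0.9), (2, 0.8), (4, 0.6), (6, 0.45), (8, 0.3)]
GOOD = [(2, 0.95), (6, 0.8)]

print(median_survival(POOR), survival_at(POOR, 5), restricted_mean(POOR, 7))
print(median_survival(GOOD))  # None: the median is beyond the end of follow-up
```

When the curve never falls to 50%, the median can only be reported as greater than the length of follow-up, which is why the first column of the table gives ">8 yrs". The restricted mean is always defined because it only uses the curve up to the chosen horizon.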


Afternoon session

Going over practical 2

Modelling survival


Practical 2 answers

Age group in 1991         20-29   30-39   40-49   50-59   60-69   70-74
Median survival (years)       -       -       -       -    16.7    10.5
Restricted mean (years)    17.6    17.4    17.0    15.8    13.3    10.6
5 year survival           99.8%   99.4%   98.5%   95.5%   88.2%   78.0%

19.4% died during follow up
Median survival >17.7 years
Tests by sexten9
Chi-squared tests all give values >500 on 1 degree of freedom; 3.84 is the 95% significance level


Survival by sexten9


Survival by Higher education


By economic activity


Social class


With origin changed


Note Y axis


Confounding?

Variable   Description                                      Finding
sexten9    Sex recorded at 91 Census                        Women survive longer
agegroup   10 year age groups                               Big effect
Hed        Higher education qualification at 1991 Census    Big effect
econpo9    Economic activity at 1991 Census                 Retired and perm sick poorest survival
sclas9     Social class at 1991 Census                      III non manual seem to be best? Out of work worst
urbgro9    6 fold urban rural classification                Most rural best


Modelling survival

Models of survival could use mean survival time – and some do

But censoring can make this tricky

Commonest method of modelling survival is via the HAZARD function

Notes page 20


Hazard rate

Rate of failure for those still at risk

Higher hazard means shorter survival

The exponential distribution has a constant hazard rate

But that doesn’t mean a linear decline in survival

Because the number at risk keeps falling
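A small numerical sketch of the point about the exponential distribution. The hazard value of 0.1 per year is an arbitrary choice for illustration. With a constant hazard, the chance of dying in the next year is the same for everyone still at risk, but the absolute drop in the survival curve shrinks each year because fewer people remain.

```python
import math

HAZARD = 0.1  # hypothetical constant hazard per year, chosen only for illustration


def survival(t):
    """Exponential survival function S(t) = exp(-hazard * t)."""
    return math.exp(-HAZARD * t)


def prob_die_next_year(t):
    """Probability of dying in (t, t+1] for someone still alive at t."""
    return 1.0 - survival(t + 1) / survival(t)


def drop_next_year(t):
    """Absolute fall in the survival curve between t and t+1."""
    return survival(t) - survival(t + 1)


for year in (0, 10, 20, 30):
    print(year, round(survival(year), 3),
          round(prob_die_next_year(year), 4), round(drop_next_year(year), 4))
```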


Exponential survival


Survival and cumulative hazard plots


Cumulative hazard plots

For exponential distributions?

As people age?

Following a major operation
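As a reminder of the standard definitions (the symbols H, h, S and \lambda are introduced here and are not taken from the slides), the cumulative hazard is the integral of the hazard, so the slope of a cumulative hazard plot at any time is the hazard at that time. For a constant hazard the plot is a straight line through the origin:

```latex
\begin{align*}
H(t) &= \int_0^t h(u)\,du = -\log S(t), & S(t) &= e^{-H(t)} \\
h(t) &= \lambda \;\Longrightarrow\; H(t) = \lambda t, & S(t) &= e^{-\lambda t}
\end{align*}
```

A plot that steepens over time therefore indicates a rising hazard, and one that flattens indicates a falling hazard.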


Cox 1972 “Regression models and life tables”

Time in months  At risk at start  Died  Exited  Proportion died
0.50            175059            25    0       25/175059
1.50            175034            44    19      44/175034
2.50            174971            53    11      53/174971

The proportion at risk who die in the interval approximates the hazard.

The hazards can be compared across groups


Cox Proportional Hazard models

h(group1) = h0(t)

h(group2) = K h0(t)

where K is the hazard ratio.

In terms of logs

log(h(group1)) = log(h0(t))

log(h(group2)) = log(K) + log(h0(t))


Cox PH model for age group

Compared to age group 20-29

Hazard age 30-39 is 2.25 times

Hazard age 40-49 is 5.81 times

……..

Hazard age 70-74 is 84.0 times


Adjusting for age in PH model of effect of education

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         Hed |  0.4231624   0.0088321   -41.20   0.000     .4062011     .440832
------------------------------------------------------------------------------

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    agegroup |
          30 |   2.295549   .1080288    17.66   0.000     2.093288    2.517354
          40 |   5.851488   .2496112    41.42   0.000     5.382153    6.361751
          50 |    16.3997   .6668447    68.79   0.000     15.14343    17.76018
          60 |   42.73684   1.702482    94.26   0.000     39.52697    46.20736
          70 |    81.4729   3.314068   108.18   0.000     75.22963     88.2343
             |
         Hed |   .6087054   .0127545   -23.69   0.000     .5842134    .6342241
------------------------------------------------------------------------------
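The z statistic and confidence interval in output like this are calculated on the log hazard scale and then transformed back. The sketch below reproduces the unadjusted Hed row from its hazard ratio and standard error. It assumes the reported standard error is for the hazard ratio itself (a delta-method value), so dividing it by the hazard ratio gives the standard error of the log hazard ratio. The function name is made up for this example.

```python
import math
from statistics import NormalDist


def z_and_interval(hazard_ratio, se_hazard_ratio, level=0.95):
    """Recover z and the confidence interval from a hazard ratio and its standard error.

    Assumes se_hazard_ratio is a delta-method standard error, so that
    se(log HR) = se(HR) / HR.
    """
    log_hr = math.log(hazard_ratio)
    se_log_hr = se_hazard_ratio / hazard_ratio
    z = log_hr / se_log_hr
    critical = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    lower = math.exp(log_hr - critical * se_log_hr)
    upper = math.exp(log_hr + critical * se_log_hr)
    return z, lower, upper


# Unadjusted Hed row from the first table above.
z, lower, upper = z_and_interval(0.4231624, 0.0088321)
print(f"z = {z:.2f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

Because the interval is built on the log scale it is not symmetric around the hazard ratio, which is visible in the reported limits.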


Plots for checking PH models


Adjusted analysis for social class = Class I as baseline

Category              Hazard relative   Hazard adjusted   From delayed entry
                      to baseline       for agegroup      model (part 2)
Class II              0.99
Class III NM          0.88
Class III M           1.55
Class IV              1.49
Class V               1.82
Not worked 10 years   4.31


Adjusted analysis for social class = Class I as baseline

Category              Hazard relative   Hazard adjusted   From delayed entry
                      to baseline       for agegroup      model (part 2)
Class II              0.99              0.94
Class III NM          0.88              0.96
Class III M           1.55              1.52
Class IV              1.49              1.48
Class V               1.82              1.40
Not worked 10 years   4.31              1.70


Adjusted analysis for social class = Class I as baseline

Category              Hazard relative   Hazard adjusted   From delayed entry
                      to baseline       for agegroup      model (part 2)
Class II              0.99              0.94              0.94
Class III NM          0.88              0.96              0.96
Class III M           1.55              1.52              1.55
Class IV              1.49              1.48              1.47
Class V               1.82              1.40              1.42
Not worked 10 years   4.31              1.70              1.72


Extensions of the Cox Model

Time dependent covariates
The effect of a factor changes over the follow-up period

Delayed entry
Involves changing the origin of time
Data are left truncated so that their follow up time does not start at zero


Time-dependent Cox model

The effect of a factor can change over the follow-up time, or can only come into effect over certain times. Examples:

Comparison of types of surgery may vary over time, e.g. a higher risk at operation followed by longer term improvements

People who experience widowhood may have increased risk of death compared to those remaining married

Periods when an elderly person is visiting a day centre may reduce their hazard of being taken into hospital


Time dependent example

[Figure: follow-up timelines from the start in 1991 to 2008]

How does a young person’s hazard of a first diagnosis of a mental illness change while they are at university?


Time dependent example

[Figure: follow-up timelines from the start in 1991 to 2008]

A covariate for University by time gets switched on for certain time periods. The coefficient for this measures the difference in hazards between those at university and others not at university.


Time dependent example

[Figure: follow-up timelines from the start in 1991 to 2008]

Programs differ in how you define time-dependent covariates. Some require you to split up the follow-up records into two or more periods for each individual (R and Stata), while others (SPSS and SAS) require you to write some code to define the periods.
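The record-splitting approach can be sketched as follows. This is an illustration of the idea only, not the syntax of any of the packages named above: the function name, the tuple layout and the example times are all assumptions made for this sketch. One follow-up record is cut at the times a 0/1 covariate switches on and off, and the event indicator is carried only by the last episode.

```python
def split_follow_up(start, end, event, on, off):
    """Split one follow-up record into episodes with a time-dependent 0/1 covariate.

    start, end: follow-up times; event: 1 if the event happened at `end`, else 0.
    on, off: times at which the covariate switches on and off.
    Returns a list of (episode_start, episode_end, covariate, event) tuples.
    """
    cut_points = {start, end}
    cut_points.update(c for c in (on, off) if start < c < end)
    cuts = sorted(cut_points)
    episodes = []
    for t0, t1 in zip(cuts, cuts[1:]):
        exposed = 1 if on <= t0 < off else 0
        # Only the final episode can end in the event; earlier ones end in a split.
        episodes.append((t0, t1, exposed, event if t1 == end else 0))
    return episodes


# Hypothetical person: followed for 17.7 years, at university from year 1.5 to 5.5,
# event at the end of follow-up.
for episode in split_follow_up(0.0, 17.7, 1, on=1.5, off=5.5):
    print(episode)
```

Each episode then enters the model as a separate row with its own entry and exit time, so the person contributes to the "at university" hazard only during the middle episode.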


Delayed entry models

[Figure: individual timelines on an age axis marked at ages 0, 20, 40, 60 and 80]

For each person calculate their age when they come in to follow-up and their age at the terminal event or censoring. Then define your survival data with an entry time and an exit time. Sometimes called a counting process method.
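A minimal sketch of the entry-time and exit-time calculation described above. The dates are assumptions for illustration: follow-up is taken to start on 21 April 1991 (the 1991 Census date) and to end on 31 December 2008, the birth date is invented, and the function names are hypothetical.

```python
from datetime import date

CENSUS_1991 = date(1991, 4, 21)      # assumed start of follow-up
END_OF_FOLLOW_UP = date(2008, 12, 31)  # assumed end of follow-up


def age_in_years(birth, when):
    """Age at a given date as a decimal number of years."""
    return (when - birth).days / 365.25


def counting_process_record(birth, exit_date, died, entry_date=CENSUS_1991):
    """Return (entry_age, exit_age, event) with age as the time scale."""
    return (age_in_years(birth, entry_date), age_in_years(birth, exit_date), int(died))


def at_risk(record, age):
    """A person is in the risk set at `age` only between their entry and exit ages."""
    entry_age, exit_age, _event = record
    return entry_age < age <= exit_age


# Hypothetical person born in June 1919 who was still alive at the end of 2008.
record = counting_process_record(date(1919, 6, 15), END_OF_FOLLOW_UP, died=False)
print(record)
print(at_risk(record, 60), at_risk(record, 80))
```

The person is not in the risk set at age 60 because they were not yet under observation then. That is how left truncation is handled: each person is compared with others only at ages when both were being followed.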


Cohort effects from a counting process model

[Figure: individual timelines on an age axis marked at ages 0, 20, 40, 60 and 80]



Delayed entry models

Allow modelling of cohort effects by comparing survival of different cohorts at the same age.

[Figure: age axis marked at 0, 20, 30, 40, 50, 60, 70 and 80, with one follow-up bar for each birth cohort from the 1910s to the 1970s]


Cohort effect from delayed entry model

Cohort   Frequency   Ln(hazard ratio)   Std. error   Hazard ratio
1910          4792               1.04         0.12           2.84
1920         25405               0.81         0.12           2.24
1930         28279               0.56         0.12           1.75
1940         33051               0.24         0.12           1.27
1950         36309              -0.01         0.11           0.99
1960         40003              -0.15         0.11           0.86
1970          7220               0.00                        1.00


Males survive even worse in the 21st century

Variables in the Equation

                   B     SE    Sig.   Exp(B)
sexten9         .473   .016    .000    1.605
agegroup                      0.000
agegroup(1)   -4.480   .041   0.000     .011
agegroup(2)   -3.673   .029   0.000     .025
agegroup(3)   -2.724   .021   0.000     .066
agegroup(4)   -1.667   .017   0.000     .189
agegroup(5)    -.681   .014   0.000     .506
sex21st         .123   .022    .000    1.131
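The Exp(B) column is the exponential of the coefficient B, and coefficients add on the log scale, so hazard ratios multiply. The sketch below checks the reported values and combines the two sex terms. It rests on two assumptions that the output itself does not state: that sexten9 is coded so that Exp(B) is the hazard for men relative to women, and that sex21st is a time-dependent term switched on for men during the part of follow-up in the 21st century.

```python
import math

# B and Exp(B) copied from the output above.
COEFFICIENTS = {
    "sexten9": (0.473, 1.605),
    "agegroup(1)": (-4.480, 0.011),
    "agegroup(2)": (-3.673, 0.025),
    "agegroup(3)": (-2.724, 0.066),
    "agegroup(4)": (-1.667, 0.189),
    "agegroup(5)": (-0.681, 0.506),
    "sex21st": (0.123, 1.131),
}

for name, (b, reported) in COEFFICIENTS.items():
    print(f"{name:12s} exp({b:+.3f}) = {math.exp(b):.3f}  (reported {reported})")

# Under the assumptions in the lead-in: male/female hazard ratio before 2000,
# and the same ratio in the 21st century once the extra term is switched on.
hr_before = math.exp(COEFFICIENTS["sexten9"][0])
hr_21st = math.exp(COEFFICIENTS["sexten9"][0] + COEFFICIENTS["sex21st"][0])
print(f"before: {hr_before:.3f}, 21st century: {hr_21st:.3f}")
```

On that reading, the hazard ratio for men rises from about 1.6 to about 1.8 in the later period, which matches the slide title.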