SJS SDI_13 1 Design of Statistical Investigations Stephen Senn 13 Cohort Studies

Design of Statistical Investigations

Stephen Senn

13 Cohort Studies

Two Major Types of Epidemiological Observational Study

This section is partly based on Rothman.

• Cohort study
  – Sometimes referred to as a prospective study
    • But this term is best avoided
      – Some cohort studies are not prospective
• Case-control study
  – Sometimes referred to as a retrospective study
    • But this term is also best avoided
      – Since some cohort studies are retrospective

Cohorts

• A cohort was the tenth part of a legion of Roman soldiers

• It is used by epidemiologists to mean a group of individuals followed over time
  – Analogy of a body of marching men

• In demography it is sometimes used to distinguish generational as opposed to cross-sectional approaches

Cohort Study

• The epidemiological equivalent of a clinical trial

• Subjects are compared according to exposure

• Followed up for outcome

• However, unlike in a clinical trial, the exposure is not assigned

Example Obs_2

• John Snow & Cholera, London 1854
• Two different companies supplied water to London
  – Lambeth
    • 26,107 houses
    • 14 houses with fatal attacks
  – Southwark and Vauxhall
    • 40,046 houses
    • 286 houses with fatal attacks
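The comparison between the two water companies can be made concrete with a little arithmetic. The following is a minimal sketch in Python (the variable names are illustrative, not part of the original slides) that turns the counts above into rates per 10,000 houses and a relative risk.

```python
# Snow's 1854 figures as quoted on the slide: houses supplied, and houses
# with fatal cholera attacks, by water company.
houses = {"Lambeth": 26107, "Southwark and Vauxhall": 40046}
fatal = {"Lambeth": 14, "Southwark and Vauxhall": 286}

# Risk per house in each exposure group.
risk = {company: fatal[company] / houses[company] for company in houses}

for company, r in risk.items():
    print(f"{company}: {10000 * r:.1f} houses with fatal attacks per 10,000")

# Relative risk: Southwark and Vauxhall compared with Lambeth.
relative_risk = risk["Southwark and Vauxhall"] / risk["Lambeth"]
print(f"Relative risk: {relative_risk:.1f}")
```

On these figures a house supplied by Southwark and Vauxhall was roughly 13 times as likely to have a fatal attack as one supplied by Lambeth. This is a crude comparison and makes no allowance for other differences between the houses.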

Obs_2 Notes

• Sampling is by exposure
  – Lambeth company versus Southwark & Vauxhall company
• The study is retrospective, however
• Snow obtained data once the outbreak was known

Population at Risk

• Population chosen should be capable (in principle) of suffering event of interest

• Standard requirement that population at risk must be free of disease of interest at outset
  – Argument is that you cannot develop a disease if you already have it
• WARNING. Consider how this agrees with our notions of causality.

Closed and Open Cohorts

• Closed cohort
  – Membership is defined at outset
  – Numbers can only get smaller as study progresses
• Open cohort (dynamic cohort)
  – Can take on new members as study progresses
  – Usually defined geographically

Confounding

• Confounding is the major problem of observational studies

• We fear that the presence of hidden variables (confounders) rather than the variable under study may explain results

• In the extreme case we have a complete reversal known as “Simpson’s Paradox”

Simpson’s Paradox: Obs_3 Berkeley Example

• Case of graduate admissions to the University of California, Berkeley, in the early 1970s

• Data by sex show that a lower proportion of females are admitted

status |sex
       |Female |Male   |RowTotl|
-------+-------+-------+-------+
accept | 628   |1198   |1826   |
       |0.34   |0.45   |       |
-------+-------+-------+-------+
reject |1207   |1493   |2700   |
       |0.66   |0.55   |       |
-------+-------+-------+-------+
ColTotl|1835   |2691   |4526   |
-------+-------+-------+-------+

Logistic Regression

Call: glm(formula = p.accepted ~ sex, family = binomial, weights = n.applied)

Coefficients:
                 Value Std. Error    t value
(Intercept) -0.4367435 0.03132617 -13.941811
        sex  0.2166095 0.03132617   6.914651

Males have a significantly higher admission rate
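With a single two-level factor the logistic model is saturated, so its coefficients can be recovered directly from the 2 × 2 table. The sketch below, in Python, assumes that sex was coded -1 for females and +1 for males. That coding is an assumption, not stated on the slides, but it reproduces the reported values. The standard error uses the usual large-sample formula for a log odds ratio.

```python
import math

# Counts from the admissions table by sex.
accept = {"female": 628, "male": 1198}
reject = {"female": 1207, "male": 1493}

# Log odds of admission for each sex.
log_odds = {s: math.log(accept[s] / reject[s]) for s in accept}

# Assumed coding (not stated in the source): sex = -1 for females, +1 for males.
# The intercept is then the mean log odds and the sex coefficient is half
# the difference in log odds (males minus females).
intercept = (log_odds["male"] + log_odds["female"]) / 2
sex_coef = (log_odds["male"] - log_odds["female"]) / 2

# Large-sample standard error of the log odds ratio, halved to match the coding.
se_log_or = math.sqrt(sum(1 / n for n in (*accept.values(), *reject.values())))
sex_se = se_log_or / 2

# Odds of admission, males versus females.
odds_ratio = math.exp(2 * sex_coef)

print(f"intercept = {intercept:.7f}")
print(f"sex       = {sex_coef:.7f} (SE {sex_se:.7f}, ratio {sex_coef / sex_se:.3f})")
print(f"odds ratio, males versus females = {odds_ratio:.2f}")
```

Under this coding the crude odds of admission for males are about one and a half times those for females.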

sex = female
faculty|status
       |accept |reject |RowTotl|
-------+-------+-------+-------+
1      | 89    | 19    |108    |
       |0.82   |0.18   |0.059  |
-------+-------+-------+-------+
2      | 17    |  8    |25     |
       |0.68   |0.32   |0.014  |
-------+-------+-------+-------+
3      |202    |391    |593    |
       |0.34   |0.66   |0.32   |
-------+-------+-------+-------+
4      |202    |173    |375    |
       |0.54   |0.46   |0.2    |
-------+-------+-------+-------+
5      | 94    |299    |393    |
       |0.24   |0.76   |0.21   |
-------+-------+-------+-------+
6      | 24    |317    |341    |
       |0.07   |0.93   |0.19   |
-------+-------+-------+-------+
ColTotl|628    |1207   |1835   |
       |0.34   |0.66   |       |
-------+-------+-------+-------+

sex = male
faculty|status
       |accept |reject |RowTotl|
-------+-------+-------+-------+
1      |512    |313    |825    |
       |0.62   |0.38   |0.31   |
-------+-------+-------+-------+
2      |353    |207    |560    |
       |0.63   |0.37   |0.21   |
-------+-------+-------+-------+
3      |120    |205    |325    |
       |0.37   |0.63   |0.12   |
-------+-------+-------+-------+
4      |138    |279    |417    |
       |0.33   |0.67   |0.15   |
-------+-------+-------+-------+
5      | 53    |138    |191    |
       |0.28   |0.72   |0.071  |
-------+-------+-------+-------+
6      | 22    |351    |373    |
       |0.059  |0.94   |0.14   |
-------+-------+-------+-------+
ColTotl|1198   |1493   |2691   |
       |0.45   |0.55   |       |
-------+-------+-------+-------+

Logistic Regression 2

Call: glm(formula = p.accepted ~ sex + faculty, family = binomial, weights = n.applied)

Coefficients:
                  Value Std. Error     t value
(Intercept) -0.55965018 0.03934316 -14.2248409
        sex -0.16941771 0.04053734  -4.1792997
   faculty1 -0.01350673 0.05494827  -0.2458081
   faculty2 -0.46041466 0.03363771 -13.6874569
   faculty3 -0.13233587 0.02145413  -6.1683158
   faculty4 -0.25473989 0.02143709 -11.8831374
   faculty5 -0.42417774 0.02617420 -16.2059518

Males have a significantly lower admission rate
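The change of sign once faculty is allowed for can be checked without fitting a model. The sketch below, in Python, compares the crude odds ratio with a Mantel-Haenszel odds ratio stratified by faculty. Mantel-Haenszel is a substitute used here for illustration and is not the logistic regression shown on the slide, so the two adjusted estimates will be close but not identical. The counts are those of the two stratified tables.

```python
# (accept, reject) counts by faculty, from the tables stratified by sex.
female = {1: (89, 19), 2: (17, 8), 3: (202, 391),
          4: (202, 173), 5: (94, 299), 6: (24, 317)}
male = {1: (512, 313), 2: (353, 207), 3: (120, 205),
        4: (138, 279), 5: (53, 138), 6: (22, 351)}


def rate(cell):
    """Proportion accepted in an (accept, reject) cell."""
    accepted, rejected = cell
    return accepted / (accepted + rejected)


# Crude odds ratio (males versus females), ignoring faculty.
total_f = [sum(cell[i] for cell in female.values()) for i in (0, 1)]
total_m = [sum(cell[i] for cell in male.values()) for i in (0, 1)]
crude_or = (total_m[0] * total_f[1]) / (total_m[1] * total_f[0])

# Mantel-Haenszel odds ratio (males versus females), stratified by faculty.
numerator = denominator = 0.0
for faculty in female:
    a, b = male[faculty]      # male accept, male reject
    c, d = female[faculty]    # female accept, female reject
    n = a + b + c + d
    numerator += a * d / n
    denominator += b * c / n
mh_or = numerator / denominator

# Faculties in which the female admission rate exceeds the male rate.
females_ahead = [f for f in female if rate(female[f]) > rate(male[f])]

print(f"crude odds ratio:           {crude_or:.2f}")
print(f"Mantel-Haenszel odds ratio: {mh_or:.2f}")
print(f"faculties where females have the higher rate: {females_ahead}")
```

The crude odds ratio is above 1 and the stratified one is below 1. Females have the higher admission rate in four of the six faculties, not in all of them.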

A Paradox?

• If we do not take faculty into account admission is more difficult for females

• If we allow for the faculty the reverse is the case

• In the extreme case (not quite here), when the trend in each and every stratum is the opposite of the overall trend, we have “Simpson’s paradox”.

Simpson’s Paradox?

• Given some information we come to one conclusion

• Given further information we come to the opposite conclusion

• This is worrying because, given yet further information we might restore the original conclusion.

• But is this a paradox? Consider the following story.

Reversal of Opinion: An Illustrative Story

“In the Welsh legend, the returning Llewelyn is met by his hound Gelert at the castle door. Its muzzle is flecked with blood. In the nursery the scene is one of savage disorder and the infant son is missing. Only once the hound has been put to the sword is the child heard to cry and discovered safe and sound by the body of a dead wolf. The additional evidence reverses everything: Llewelyn and not his hound is revealed as a faithless killer.” (From chapter 1 of Senn, SJ, Dicing with Death)

So reversal of opinion is not a purely statistical phenomenon. It is a human one, we accept. So why do we regard this as being a “paradox”?

Obs_4: Poole Diabetic Cohort (Julious and Mullee)

                Type of Diabetes
           Non-insulin    Insulin
           dependent      dependent     All Patients
Censored   326 (60)       253 (71)      579
Dead       218 (40)       105 (29)      323
Total      544 (100)      358 (100)     902

           Non-insulin    Insulin
           dependent      dependent     All Patients

Subjects aged ≤ 40
Censored   15 (100)       129 (99)      144
Dead       0 (0)          1 (1)         1
Total      15 (100)       130 (100)     145

Subjects aged > 40
Censored   311 (59)       124 (54)      435
Dead       218 (41)       104 (46)      322
Total      529 (100)      228 (100)     757
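The reversal in the Poole cohort can be verified from the counts alone. The sketch below, in Python, computes the proportion dead overall and within each age stratum. It takes the first stratum to be subjects aged 40 or under, which is an assumption inferred from the second stratum being "> 40". The names are illustrative only.

```python
# (dead, total) by type of diabetes, from the Poole cohort tables.
overall = {"non-insulin dependent": (218, 544), "insulin dependent": (105, 358)}
by_age = {
    # Assumed label: the first stratum is taken to be age 40 or under.
    "aged <= 40": {"non-insulin dependent": (0, 15), "insulin dependent": (1, 130)},
    "aged > 40": {"non-insulin dependent": (218, 529), "insulin dependent": (104, 228)},
}


def death_rate(cell):
    """Proportion dead in a (dead, total) cell."""
    dead, total = cell
    return dead / total


def report(label, groups):
    for diabetes_type, cell in groups.items():
        print(f"{label:12s} {diabetes_type:22s} {100 * death_rate(cell):5.1f}% dead")


report("all ages", overall)
for stratum, groups in by_age.items():
    report(stratum, groups)

# Overall, non-insulin dependent subjects do worse ...
worse_overall = (death_rate(overall["non-insulin dependent"])
                 > death_rate(overall["insulin dependent"]))
# ... yet within each age stratum they do no worse.
no_worse_in_each_stratum = all(
    death_rate(groups["non-insulin dependent"]) <= death_rate(groups["insulin dependent"])
    for groups in by_age.values()
)
print(worse_overall, no_worse_in_each_stratum)
```

Both flags are true for these data: the overall comparison and the two age-specific comparisons point in opposite directions, because the non-insulin dependent subjects are predominantly in the older stratum.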

“Suppose that the numbers in the table remain the same but refer now to a clinical trial in some life-threatening condition and we replace “Type of Diabetes” by “Treatment” and “non-insulin dependent” by “A” and “insulin-dependent” by “B” and “Subjects” by “Patients”. An incautious interpretation of the table would then lead us to a truly paradoxical conclusion. Treating young patients with A rather than B is beneficial (or at least not harmful – the numbers of deaths 0 in the one case and 1 in the other are very small). Treating older patients with A rather than B is beneficial. However, the overall effect of switching patients from B to A would be to increase deaths overall.” From Dicing with Death

“In his brilliant book, Causality(1), Judea Pearl gives Simpson’s paradox pride of place. Many statisticians have taken Simpson’s paradox to mean that judgements of causality based on observational studies are ultimately doomed. We could never guarantee that further refined observation would not lead to a change in opinion. Pearl points out, however, that we are capable of distinguishing causality from association because there is a difference between seeing and doing. In the case of the trial above we may have seen that the trial is badly imbalanced but we know that the treatment given cannot affect the age of the patient at baseline, that is to say before the trial starts. However, age very plausibly will affect outcome and so it is a factor that should be taken account of when judging the effect of treatment. If in future we change a patient’s treatment we will not (at the moment we change it) change their age. So there is no paradox. We can improve the survival of both young and the old and will not, in acting in this way, adversely affect the survival of the population as a whole.” Dicing with Death

(1) Pearl, J. (2000) Causality. Cambridge University Press, New York.

Lessons

• Confounders can be a problem for cohort studies

• We may need to measure many potential confounders

• We will almost certainly need to include them in our models

• Interpretation may have to be cautious.

Questions

A survey of women in Whickham, England, in 1972-1974, with a 20-year follow-up, gave the results recorded on the following slide.

• Do the results show smoking to be dangerous?

• What explanation can you think of for the result?

• What further data would you like to see?

Obs_5: Whickham Wonders

status |smoker?
       |no     |yes    |RowTotl|
-------+-------+-------+-------+
alive  |502    |443    |945    |
       |0.69   |0.76   |       |
-------+-------+-------+-------+
dead   |230    |139    |369    |
       |0.31   |0.24   |       |
-------+-------+-------+-------+
ColTotl|732    |582    |1314   |
-------+-------+-------+-------+
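As a starting point for the questions, the crude 20-year risks can be computed from the table. The sketch below, in Python, is an unadjusted comparison only. It uses nothing beyond the four counts shown and the names are illustrative.

```python
# Counts from the smoking-by-status table at 20-year follow-up.
alive = {"non-smoker": 502, "smoker": 443}
dead = {"non-smoker": 230, "smoker": 139}

# Crude 20-year risk of death in each group.
risk = {group: dead[group] / (dead[group] + alive[group]) for group in dead}

# Crude risk ratio, smokers versus non-smokers.
risk_ratio = risk["smoker"] / risk["non-smoker"]

for group, r in risk.items():
    print(f"{group}: {100 * r:.1f}% dead")
print(f"crude risk ratio (smokers versus non-smokers): {risk_ratio:.2f}")
```

The crude risk ratio is below 1. Whether that says anything about the effect of smoking depends on how the two groups differ in other respects, which this table does not show.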

More Questions

• What sort of interaction is described?

• What explanations can you think of?

• What further information would you like to have?

Look at the study by Best et al., described on the next slide.

Obs_6: Best et al.

1. The relationship between blood cyclosporin concentration (CyACb) and a patient's risk of organ rejection following heart-lung (HL) transplantation was investigated.
2. Longitudinal data were collected for 90 days post-operation for 31 HL transplant recipients. Following exploratory analysis, a multiple logistic regression model with a binary outcome variable representing presence or absence of lung rejection (as defined on biopsy findings and/or intention to treat) in the next 5 days was fitted to the data.
3. A significant interaction between time post-transplant and CyACb was found. During weeks 1-3, the relative risk (RR) of rejection per unit increase in log(e) (5-day mean CyACb) was reduced: RR = 0.29, 95% confidence interval (CI) = (0.12, 0.72). After 3 post-operative weeks, this trend was reversed: RR = 1.61, 95% CI = (0.96, 2.70).

Best, Trull, Tan, Hue, Spiegelhalter, Gore, Wallwork, Brit J Clin Pharm, 1992
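When reading the two relative risks quoted in the abstract, it helps to look at them on the log scale. The sketch below, in Python, assumes the intervals are of the usual Wald type, symmetric about log(RR). That is an assumption, since the abstract does not say how they were computed. It recovers the implied standard errors and checks whether each interval excludes 1.

```python
import math

# Relative risks and 95% confidence limits quoted in the abstract.
estimates = {
    "weeks 1-3": (0.29, 0.12, 0.72),
    "after week 3": (1.61, 0.96, 2.70),
}

Z = 1.96  # normal quantile for a two-sided 95% interval

results = {}
for period, (rr, lower, upper) in estimates.items():
    # If the interval is symmetric on the log scale (assumed), the geometric
    # mean of the limits should reproduce the point estimate.
    midpoint = math.sqrt(lower * upper)
    # Standard error of log(RR) implied by the width of the interval.
    se_log_rr = (math.log(upper) - math.log(lower)) / (2 * Z)
    excludes_one = upper < 1 or lower > 1
    results[period] = (midpoint, se_log_rr, excludes_one)
    print(f"{period}: RR = {rr}, geometric mean of limits = {midpoint:.2f}, "
          f"SE of log RR = {se_log_rr:.2f}, interval excludes 1: {excludes_one}")
```

The geometric means agree with the quoted estimates, which is consistent with the assumed form of interval. Only the interval for weeks 1-3 excludes 1.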