Leicester Warwick Medical School Health and Disease in Populations Cohort Studies Paul Burton

Leicester Warwick Medical School

Health and Disease in Populations

Cohort Studies

Paul Burton

Lecture Objectives

1. Describe the logical basis of, and the practical problems involved in, cohort studies of disease incidence

2. Compare incidence rates or mortality rates between two groups of individuals within a cohort by calculating the incidence rate ratio (IRR) (internal comparisons)

Lecture Objectives

3. Compare disease incidence or mortality in a study cohort with that in a reference population using standardisation methods (e.g. the SMR) (external comparisons)

4. Describe the factors determining the precision of an estimated relative risk

Relevant reading in prescribed book: Farmer R, Miller D. Chapter 5, pp 47-55

Investigating the aetiology (cause) of a disease

• Comparison of incidence rates between groups
• So far we have focused on ‘natural experiments’, often based on routine data
• Examples:
  • Leukaemia and nuclear plants
  • Skin cancer and living in Brighton
  • Mortality rates in different social classes
• But there are serious limitations on what can realistically be achieved

What would be an ideal study?

• Basic scientific method: compare ‘like with like’
• Two identical groups, differing only in exposure status
• Differences can then reasonably be attributed to that exposure


• A problem
  • How to get two identical groups differing only in exposure status, when exposure status is linked to many other characteristics?
  • e.g. Smoking is linked to: alcohol, type of occupation, ethnicity, social class, exercise levels ...


• The ideal solution
  • an experiment (force equality)
• Next best
  • randomisation (later in module)
• Next best
  • a cohort study (measure and record factors that may determine inequality)

A simple cohort study with a ‘within cohort comparison’

• e.g. Does smoking cause asthma?
• Recruit disease-free individuals
• Classify into exposed versus unexposed
  • e.g. current smokers versus non-smokers
• Follow up over time and:
  • (a) Count the person-years ‘at risk’ (p-y)
  • (b) Count how many develop asthma (d)
  • (c) Calculate incidence rate (IR = d / p-y)

A simple cohort study with a ‘within cohort comparison’

• Do this for each exposure group separately:
  • IR_SMOKERS = d_SMOKERS / p-y_SMOKERS
  • IR_NON-SMOKERS = d_NON-SMOKERS / p-y_NON-SMOKERS
• Relative risk = Incidence rate ratio:
  • IRR = IR_SMOKERS / IR_NON-SMOKERS
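The within-cohort comparison can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the counts used (20 cases in 2,000 p-y versus 10 cases in 4,000 p-y) are hypothetical.

```python
def incidence_rate(events, person_years):
    """Incidence rate: events per person-year at risk (IR = d / p-y)."""
    if person_years <= 0:
        raise ValueError("person-years at risk must be positive")
    return events / person_years


def incidence_rate_ratio(d_exposed, py_exposed, d_unexposed, py_unexposed):
    """IRR: incidence rate in the exposed divided by that in the unexposed."""
    return incidence_rate(d_exposed, py_exposed) / incidence_rate(
        d_unexposed, py_unexposed
    )


# Hypothetical counts, for illustration only.
ir_exposed = incidence_rate(20, 2_000)
ir_unexposed = incidence_rate(10, 4_000)
irr = incidence_rate_ratio(20, 2_000, 10, 4_000)
print(f"IR exposed   = {1000 * ir_exposed:.1f} per 1,000 p-y")
print(f"IR unexposed = {1000 * ir_unexposed:.1f} per 1,000 p-y")
print(f"IRR          = {irr:.2f}")
```

Keeping the rate calculation separate from the ratio means the same function serves each exposure group.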

A worked example

• 1,000 children followed from birth to age 5 yrs
  • 300: at least one parent smoked in the home
  • 700: neither parent smoked in the home
• p-y_SMOKE = 300 × 5 p-y = 1,500 p-y
• p-y_NON-SMOKE = 700 × 5 p-y = 3,500 p-y

A worked example

• Smoke exposed: 75 diagnosed with asthma
• Smoke unexposed: 105 diagnosed with asthma
• IR_SMOKERS = 75/1,500 = 50 per 1,000 p-y
• IR_NON-SMOKERS = 105/3,500 = 30 per 1,000 p-y

A worked example

• IRR = 1.667
• e.f. = $\exp\left(2\sqrt{\frac{1}{75} + \frac{1}{105}}\right)$
• e.f. = 1.35
• 95% CI: 1.667 ÷ 1.35 to 1.667 × 1.35
  • i.e. 1.23 to 2.25
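The worked example can be checked with a short Python sketch. It assumes the error factor is e.f. = exp(2 × √(1/d+ + 1/d-)), which reproduces the 1.35 quoted above; the function names are illustrative. The code does not round the error factor before using it, so its upper limit comes out marginally higher (about 2.26) than the hand-calculated 2.25.

```python
import math


def error_factor(d_exposed, d_unexposed):
    """95% error factor for an IRR, based on the two event counts."""
    return math.exp(2 * math.sqrt(1 / d_exposed + 1 / d_unexposed))


def irr_with_ci(d_exposed, py_exposed, d_unexposed, py_unexposed):
    """Return (IRR, lower 95% limit, upper 95% limit)."""
    irr = (d_exposed / py_exposed) / (d_unexposed / py_unexposed)
    ef = error_factor(d_exposed, d_unexposed)
    # The interval is multiplicative: divide and multiply by the error factor.
    return irr, irr / ef, irr * ef


# Counts from the worked example: 75 cases in 1,500 p-y vs 105 in 3,500 p-y.
irr, lower, upper = irr_with_ci(75, 1_500, 105, 3_500)
print(f"IRR = {irr:.3f}, e.f. = {error_factor(75, 105):.3f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```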

Advantages over looking at routine data

• You can study exposures/personal characteristics which aren’t collected routinely

• Opportunity to obtain more detailed information on outcomes and exposures

• Ability to collect additional data on confounding variables

Prospective cohort studies

• All cohort studies involve prospective follow up. That is:
  • Recruit and define exposure status in disease-free individuals
  • Follow up: count p-y and d
• Data collection may start in the future (e.g. start follow-up in 2003). This is a conventional prospective cohort study.

‘Historical’ or ‘retrospective’ cohort study

• Alternatively, may collect follow-up data starting in the past:
  • e.g. recruit and define exposure status in disease-free individuals from 1990 using historical records
  • Follow up: count p-y and d (possibly using historical records)
• Start of study 2003 but start of follow-up 1990
• This is a historical cohort study

Exposure data may be binary or in several categories or continuous

• Lung cancer death rates per 100,000 p-y, by cigarettes per day

Cigarettes per day     0    1-14   15-24    25+
Men                   10      78     127    251
Women                  7       9      45    208
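With several exposure categories, each category's rate can be compared with the non-smokers' rate to show how the rate ratio changes with dose. The sketch below does this for the table above; the function name and the choice of the "0" category as baseline are assumptions made for illustration.

```python
# Lung cancer death rates per 100,000 person-years, copied from the table.
rates = {
    "Men": {"0": 10, "1-14": 78, "15-24": 127, "25+": 251},
    "Women": {"0": 7, "1-14": 9, "15-24": 45, "25+": 208},
}


def rate_ratios(by_category, baseline="0"):
    """Rate ratio of each exposure category relative to the baseline category."""
    base = by_category[baseline]
    return {category: rate / base for category, rate in by_category.items()}


for sex, by_category in rates.items():
    ratios = rate_ratios(by_category)
    summary = ", ".join(f"{cat}: {ratio:.1f}" for cat, ratio in ratios.items())
    print(f"{sex} (rate ratio vs non-smokers) -> {summary}")
```

In both sexes the rate ratio rises with the number of cigarettes smoked per day, reaching roughly 25 to 30 in the heaviest category.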

Comparisons can be made internally or against an external reference population

• Comparison of sub-cohorts (internal comparison)

Sub-cohort         Person-years   Observed cases (or deaths)   Rate
Non-exposed (-)    p-y-           d-                           r- = d- / p-y-
Exposed (+)        p-y+           d+                           r+ = d+ / p-y+

Comparisons can be made internally or against an external reference population

IRR = r+ / r-

• Random variation:
  • e.f. = $\exp\left(2\sqrt{\frac{1}{d_+} + \frac{1}{d_-}}\right)$
• Internal comparisons:
  • If either subcohort is small, the e.f. is large
  • d+ = 3, d- = 10: e.f. = 3.73
  • d+ = 3, d- = 10,000: e.f. = 3.17
  • d+ = 30, d- = 10,000: e.f. = 1.44
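A short bound, added here as an aside and not part of the original slides, shows why the smaller subcohort dominates. Taking e.f. = exp(2 × √(1/d+ + 1/d-)), the sum under the square root always exceeds the reciprocal of the smaller count:

```latex
\frac{1}{d_+} + \frac{1}{d_-} > \frac{1}{\min(d_+, d_-)}
\quad\Longrightarrow\quad
\text{e.f.} > \exp\!\left(\frac{2}{\sqrt{\min(d_+, d_-)}}\right)
% With d_+ = 3: e.f. > \exp(2/\sqrt{3}) \approx 3.17, however large d_- becomes.
```

This is why raising d- from 10 to 10,000 only moves the e.f. from 3.73 to 3.17, whereas raising d+ from 3 to 30 brings it down to 1.44.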

External comparisons

• SMR = Observed number of cases (O) / Expected number of cases (E)
• e.f. = $\exp\left(2\sqrt{\frac{1}{O}}\right)$

Calculating expected cases

• Usually cohorts observed over long periods
• People age during the study
• Rates in reference population change during the study

Extending the SMR approach

• Calculate separate ‘number of expected cases (or deaths)’ for each age group in each different calendar time period, e.g. age 50-54 in period 1985-1989:
  • obtain reference population’s age-specific rates for each calendar period from routine sources
  • multiply these rates by the appropriate cells’ person-years to estimate the expected cases (or deaths) in each cell

Extending the SMR approach

• Expected deaths are then summed over all cells (i.e. over all age groups and all periods)

• Can also add additional classification variables: e.g. age-sex specific rates at each calendar time period. But limited to variables which are recorded in the routine data!

• Lexis diagram

Deaths from IHD and p-y exposure in test population

Age group   1980-1989                       1990-1999
45-54       p-y = 100,000, deaths = 5       p-y = 110,000, deaths = 4
55-64       p-y = 90,000, deaths = 50       p-y = 90,000, deaths = 35
65-74       p-y = 40,000, deaths = 500      p-y = 50,000, deaths = 400

• O = 5 + 4 + 50 + 35 + 500 + 400 = 994

Rates from reference population

Age group   1980-1989                                1990-1999
45-54       death rate = 10/100,000                  death rate = 8/100,000
            E deaths = 100,000 × (10/100,000) = 10   E deaths = 110,000 × (8/100,000) = 8.8
55-64       death rate = 100/100,000                 death rate = 80/100,000
            E deaths = 90,000 × (100/100,000) = 90   E deaths = 90,000 × (80/100,000) = 72
65-74       death rate = 2,000/100,000               death rate = 1,600/100,000
            E deaths = 40,000 × (2,000/100,000) = 800   E deaths = 50,000 × (1,600/100,000) = 800

• E = 10 + 8.8 + 90 + 72 + 800 + 800 = 1780.8
• SMR = O/E = 994/1780.8 = 0.558 or 55.8%
• e.f. = $\exp\left(2\sqrt{\frac{1}{994}}\right)$ = 1.065
• 95% CI = 55.8 ×/÷ 1.065 = (52.4, 59.4)
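The cell-by-cell calculation of expected deaths can be sketched in Python using the figures from the two tables above. The dictionary layout and function name are illustrative assumptions, and the error factor is taken as e.f. = exp(2 × √(1/O)). Because the code carries unrounded values through, the last decimal of its confidence limits may differ slightly from the hand calculation.

```python
import math

# (age group, period) -> (person-years, observed deaths) in the study cohort
cohort = {
    ("45-54", "1980-1989"): (100_000, 5),
    ("45-54", "1990-1999"): (110_000, 4),
    ("55-64", "1980-1989"): (90_000, 50),
    ("55-64", "1990-1999"): (90_000, 35),
    ("65-74", "1980-1989"): (40_000, 500),
    ("65-74", "1990-1999"): (50_000, 400),
}

# (age group, period) -> reference death rate per 100,000 person-years
reference_rate_per_100k = {
    ("45-54", "1980-1989"): 10,
    ("45-54", "1990-1999"): 8,
    ("55-64", "1980-1989"): 100,
    ("55-64", "1990-1999"): 80,
    ("65-74", "1980-1989"): 2_000,
    ("65-74", "1990-1999"): 1_600,
}


def smr_with_ci(cohort, reference_rate_per_100k):
    """Return (O, E, SMR, lower 95% limit, upper 95% limit)."""
    observed = sum(deaths for _, deaths in cohort.values())
    # Expected deaths: reference rate times cohort person-years, cell by cell.
    expected = sum(
        py * reference_rate_per_100k[cell] / 100_000
        for cell, (py, _) in cohort.items()
    )
    smr = observed / expected
    ef = math.exp(2 * math.sqrt(1 / observed))
    return observed, expected, smr, smr / ef, smr * ef


observed, expected, smr, lower, upper = smr_with_ci(cohort, reference_rate_per_100k)
print(f"O = {observed}, E = {expected:.1f}, SMR = {100 * smr:.1f}%")
print(f"95% CI: {100 * lower:.1f}% to {100 * upper:.1f}%")
```

Adding a further classification variable (for example sex) would only mean extending the cell keys, provided the reference rates are available at that level of detail.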

Precision

• External comparisons: e.f. = $\exp\left(2\sqrt{\frac{1}{O}}\right)$

Number of events (O)   e.f.
10                     1.88
50                     1.33
100                    1.22
500                    1.09
1000                   1.07

Precision

• Internal comparisons: e.f. = $\exp\left(2\sqrt{\frac{1}{d_-} + \frac{1}{d_+}}\right)$

O      e.f. for SMR   d-     d+     e.f. for IRR
100    1.22           50     50     1.49
                      10     90     1.95
500    1.09           250    250    1.20
                      50     450    1.35
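The comparison in the table can be reproduced with the sketch below, which assumes the two error-factor formulas e.f. = exp(2 × √(1/O)) for an SMR and e.f. = exp(2 × √(1/d- + 1/d+)) for an IRR. The function names are illustrative.

```python
import math


def ef_smr(observed):
    """Error factor for an SMR based on O observed events."""
    return math.exp(2 * math.sqrt(1 / observed))


def ef_irr(d_unexposed, d_exposed):
    """Error factor for an IRR based on the events in each subcohort."""
    return math.exp(2 * math.sqrt(1 / d_unexposed + 1 / d_exposed))


# The same total number of events, split in different ways between subcohorts.
for d_minus, d_plus in [(50, 50), (10, 90), (250, 250), (50, 450)]:
    total = d_minus + d_plus
    print(
        f"O = {total}: e.f. for SMR = {ef_smr(total):.2f}; "
        f"d- = {d_minus}, d+ = {d_plus}: e.f. for IRR = {ef_irr(d_minus, d_plus):.2f}"
    )
```

For a fixed total number of events, the internal comparison is always less precise than the external one, and an uneven split between subcohorts widens the interval further.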

External comparisons

• Useful when not possible to use subcohorts, but:
  • Often limited data for reference population
  • Often no incidence data
  • Make do with mortality data
  • Study and reference populations may not be comparable (selection bias)
• The ‘healthy worker effect’
  • Many occupational cohorts yield SMRs well below 100%: employment is restricted to healthy individuals

Problems with cohort studies

• Large and resource intensive
• Take a long time (historical cohorts take less time)
• Rigorous definitions of disease and exposure
• Expensive
• Intensive/invasive investigation
• Difficulty avoiding high drop out
• Results take a long time
• Ethical dilemmas
• Can become politically charged
• Not very good for rare diseases
• Difficulty with confounding (especially unknown confounders)

Why do cohort studies at all?

• Detailed and prospective assessment of exposure, outcomes and confounders is a huge scientific benefit (historical cohorts can be less good)
• Wish to study a range of different outcomes
• Wish to study a rare exposure
• Conditions which fluctuate with age
  • Randomly
  • Systematically
