OVERVIEW OF EPIDEMIOLOGICAL STUDY DESIGN
SPRING 2020
Kristin Wall, PhD
Department of Epidemiology
OUTLINE
Definition of epidemiology
Measures of disease frequency and association
Study designs
Descriptive studies
Analytic studies
Experimental (Randomized trials)
Observational (Cohort studies, case-control studies, cross-sectional studies)
DEFINITION OF EPIDEMIOLOGY
Epidemiology is the study of the distribution and determinants of disease (i.e., health related
states or events)
Determinants (i.e., Exposures) → Disease (i.e., Outcomes)
We may be interested in quantifying the distribution of disease using measures of frequency
Measures of frequency and their definitions
Risk (“Cumulative Incidence”, CI): Probability that a new disease will occur in a given time period
Rate (“Incidence density”, ID): A measure of how quickly new diseases develop in a given time period
Prevalence (“Point prevalence”): A measure of existing disease at a particular point in time
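To make the three definitions concrete, here is a minimal sketch that computes each measure. All counts are hypothetical and invented purely for illustration; they do not come from any study in these slides.

```python
# Hypothetical cohort: every number below is invented for illustration.
at_risk_at_start = 1000   # disease-free people at the start of follow-up
new_cases = 50            # new cases observed during 5 years of follow-up
person_years = 4800.0     # total time at risk contributed by all participants

risk = new_cases / at_risk_at_start   # cumulative incidence over 5 years
rate = new_cases / person_years       # incidence density, cases per person-year

# Prevalence uses existing cases at one point in time (hypothetical survey).
surveyed = 2000
existing_cases = 120
prevalence = existing_cases / surveyed

print(f"5-year risk: {risk:.1%}")
print(f"Rate: {rate * 1000:.1f} per 1,000 person-years")
print(f"Point prevalence: {prevalence:.1%}")
```

Note that risk is a proportion tied to a stated time period, the rate carries units of inverse time, and prevalence counts existing rather than new cases.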
We may be interested in quantifying the association between determinants and disease using measures of association
MEASURES OF ASSOCIATION
Comparison of disease frequency between two groups (typically comparing frequencies between the exposed and the unexposed) using differences and ratios
Measures of Frequency → Measures of Association
Risk → Risk Ratio, Risk Difference
Rate → Rate Ratio
Prevalence → Prevalence Ratio, Prevalence Difference
Data layout (2x2 tables)

Risk-based
        E+     E-
D+      A      B
D-      C      D
Total   A+C    B+D
Measures of frequency:
$\text{Risk}_{E+} = \frac{A}{A+C}$
$\text{Risk}_{E-} = \frac{B}{B+D}$
Measures of association:
Risk ratio: $\frac{\text{Risk}_{E+}}{\text{Risk}_{E-}} = \frac{A/(A+C)}{B/(B+D)}$
Risk difference: $\text{Risk}_{E+} - \text{Risk}_{E-} = \frac{A}{A+C} - \frac{B}{B+D}$

Rate-based
              E+      E-
D+            A       B
Person-time   PT1     PT0
Measures of frequency:
$\text{Rate}_{E+} = \frac{A}{PT_1}$
$\text{Rate}_{E-} = \frac{B}{PT_0}$
Measures of association:
Rate ratio: $\frac{\text{Rate}_{E+}}{\text{Rate}_{E-}} = \frac{A/PT_1}{B/PT_0}$
Rate difference: $\text{Rate}_{E+} - \text{Rate}_{E-} = \frac{A}{PT_1} - \frac{B}{PT_0}$

Prevalence-based
        E+     E-
D+      A      B
D-      C      D
Total   A+C    B+D
Measures of frequency:
$\text{Prevalence}_{E+} = \frac{A}{A+C}$
$\text{Prevalence}_{E-} = \frac{B}{B+D}$
Measures of association:
Prevalence ratio: $\frac{\text{Prevalence}_{E+}}{\text{Prevalence}_{E-}} = \frac{A/(A+C)}{B/(B+D)}$
Prevalence difference: $\text{Prevalence}_{E+} - \text{Prevalence}_{E-} = \frac{A}{A+C} - \frac{B}{B+D}$
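The 2x2 formulas above can be sketched as two small functions. The function names and the example counts are assumptions made for illustration only; the cell labels (A, B, C, D, PT1, PT0) follow the data layouts in the slides.

```python
def risk_measures(a, b, c, d):
    """Return (risk ratio, risk difference) from a risk-based 2x2 table.

    a, b: diseased (D+) among exposed (E+) and unexposed (E-)
    c, d: non-diseased (D-) among exposed and unexposed
    The same arithmetic gives the prevalence ratio and prevalence
    difference when the table holds existing rather than new cases.
    """
    risk_exposed = a / (a + c)
    risk_unexposed = b / (b + d)
    return risk_exposed / risk_unexposed, risk_exposed - risk_unexposed


def rate_measures(a, b, pt1, pt0):
    """Return (rate ratio, rate difference) from cases and person-time."""
    rate_exposed = a / pt1
    rate_unexposed = b / pt0
    return rate_exposed / rate_unexposed, rate_exposed - rate_unexposed


# Hypothetical counts, chosen only to exercise the formulas.
rr, rd = risk_measures(a=30, b=10, c=70, d=90)
idr, idd = rate_measures(a=30, b=10, pt1=1000.0, pt0=2000.0)
print(f"Risk ratio {rr:.2f}, risk difference {rd:.2f}")
print(f"Rate ratio {idr:.2f}, rate difference {idd:.3f} per person-year")
```

Ratios are unitless, while the rate difference keeps the units of the rate (cases per unit of person-time).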
TYPES OF EPIDEMIOLOGIC RESEARCH
Epidemiologic studies can be divided into two broad categories:
DESCRIPTIVE STUDIES:
Measure disease frequencies in certain groups
Hypothesis generating
Example: A study found that in the US, 5 in 165 pregnancies among obese women that make it to 20 weeks gestation end in stillbirth
ANALYTIC STUDIES:
Compare disease frequencies between groups (typically the exposed and unexposed) (i.e., calculate measures of association)
Hypothesis testing
Example: A study found that obese women are more likely to have a stillbirth than women of normal weight
TYPES OF EPIDEMIOLOGIC STUDIES
Descriptive studies
Analytic studies
Experimental - The investigator manipulates (controls) the exposure condition under study.
Clinical trials
Observational - The investigator does not manipulate the exposure, but rather just observes exposure status.
Cohort studies
Case-control studies
Cross-sectional studies
In any analytic study, your first step is to precisely define your exposure(s) of interest and define your outcome(s) of interest
RANDOMIZED CONTROLLED TRIAL
[Figure: flow over time. Disease-free, at-risk population → Sample → Randomized by exposure → Treatment (exposed) or Standard of care* (unexposed) → Disease or No disease in each arm]
*Or placebo if no standard of care exists
Enroll participants who are at risk to develop the outcome (e.g., disease free)
Participants are followed longitudinally to see if they develop the outcome
Active manipulation of the exposure by the investigator is the hallmark that distinguishes experimental studies from observational studies
Experimenter often randomly assigns study subjects to exposure conditions
On average, randomized study groups should be comparable on ALL factors except for exposure status
Attempts to control for factors (both known and unknown) which could affect outcome of interest
These ‘factors’ of concern are called ‘confounders’
Units of randomization
Individual: Exposure is allocated to individuals (unit of randomization is an individual)
Cluster: Exposure is allocated to entire groups (unit of randomization is a group, e.g., Communities, families, hospitals, schools, etc.)
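The difference between the two units of randomization can be sketched as follows. This is a minimal illustration using simple (coin-flip) allocation; the function names, arm labels, and the village example are hypothetical, and real trials often use blocked or stratified schemes instead.

```python
import random


def randomize_individuals(participant_ids, seed=0):
    """Individual randomization: each participant gets their own coin flip."""
    rng = random.Random(seed)
    return {pid: rng.choice(["treatment", "control"]) for pid in participant_ids}


def randomize_clusters(cluster_of, seed=0):
    """Cluster randomization: each cluster gets one coin flip.

    cluster_of maps participant id -> cluster id; every member of a
    cluster inherits the arm assigned to that cluster.
    """
    rng = random.Random(seed)
    clusters = sorted(set(cluster_of.values()))
    arm_of_cluster = {c: rng.choice(["treatment", "control"]) for c in clusters}
    return {pid: arm_of_cluster[c] for pid, c in cluster_of.items()}


# Hypothetical example: 12 participants living in 3 villages.
village_of = {pid: f"village_{pid % 3}" for pid in range(12)}
individual_arms = randomize_individuals(list(village_of))
cluster_arms = randomize_clusters(village_of)
print(individual_arms)
print(cluster_arms)
```

Under cluster randomization, participants from the same village always share an arm, which is why the group rather than the person is the unit of randomization.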
Blinding
Single-blind: participants do not know which study arm they are in during the study
Double-blind: neither the investigator nor the study participants know which arm participants are in
Can also blind data assessors
Trial objectives
Efficacy: does the intervention work in the people who receive it? Calculate measures of association among participants with perfect compliance
Effectiveness: does the intervention work in the people who are offered it? Calculate measures of association among all participants, regardless of whether they complied with the intervention (this is called an intention-to-treat analysis, i.e., “Analyze what you randomize”)
Effectiveness is closer to a measure of the ‘real-world’ effect of an intervention
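The contrast between the two analyses can be sketched with a small invented dataset. Every record below is hypothetical; the point is only that the intention-to-treat analysis keeps everyone in their assigned arm, while the efficacy-style analysis restricts to participants who complied.

```python
# Hypothetical trial records: (assigned_arm, complied, developed_outcome).
records = (
    [("treatment", True, True)] * 8 + [("treatment", True, False)] * 72
    + [("treatment", False, True)] * 6 + [("treatment", False, False)] * 14
    + [("control", True, True)] * 20 + [("control", True, False)] * 80
)


def risk(rows):
    """Proportion of rows in which the outcome developed."""
    return sum(1 for _, _, outcome in rows if outcome) / len(rows)


def risk_ratio(rows):
    """Risk in the treatment arm divided by risk in the control arm."""
    treated = [r for r in rows if r[0] == "treatment"]
    control = [r for r in rows if r[0] == "control"]
    return risk(treated) / risk(control)


# Intention to treat: all participants, analyzed by the arm they were assigned.
itt_rr = risk_ratio(records)
# Efficacy-style analysis: only participants who complied with their assignment.
compliers_rr = risk_ratio([r for r in records if r[1]])

print(f"Intention-to-treat risk ratio: {itt_rr:.2f}")
print(f"Compliers-only risk ratio: {compliers_rr:.2f}")
```

In this made-up example the compliers-only estimate is further from 1 than the intention-to-treat estimate. Restricting to compliers can also undo the comparability that randomization provides, which is one reason the intention-to-treat result is usually reported.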
COMMON TRIAL DESIGNS
Parallel
Each group receives one exposure. Exposures are administered concurrently.
Crossover
Each group receives each exposure, one after another. May need to have a washout period between treatments.
Advantages: Each participant serves as their own control, which can minimize bias
STRENGTHS OF TRIALS
Randomization yields treatment and control groups that are similar with respect to both known and unknown variables
Blinding minimizes bias
The randomized, double-blind controlled trial is generally considered the “gold standard” design
LIMITATIONS OF TRIALS
Eligibility criteria (trials often use highly selected study participants) and non-participation may limit generalizability
Trials are often risky, expensive, and time-consuming
Many participants and study staff (labor intensive)
Many years of follow-up (e.g., from start of clinical testing to FDA market approval: 7-10 years)
Expensive (e.g., cost of taking a drug through FDA approval: $55 million to $800 million)
May not be feasible or ethical (e.g., random assignment of exposure is not always ethical)
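TRIALS

Everyone is followed for (close to) the entire study period
Data layout (2x2 table):
        E+     E-     Total
D+      A      B      A+B
D-      C      D      C+D
Total   A+C    B+D    Total
Measure of frequency: Risk
$\text{Risk}_{E+} = \frac{A}{A+C}$
$\text{Risk}_{E-} = \frac{B}{B+D}$
Measures of association:
Risk Ratio (RR): $\frac{\text{Risk}_{E+}}{\text{Risk}_{E-}} = \frac{A/(A+C)}{B/(B+D)}$
Risk Difference: $\text{Risk}_{E+} - \text{Risk}_{E-} = \frac{A}{A+C} - \frac{B}{B+D}$

Follow-up time is NOT necessarily the same for everyone
Data layout:
        E+     E-     Total
D+      A      B      A+B
PT      PT1    PT0    PT
Measure of frequency: Rate
$\text{Rate}_{E+} = \frac{A}{PT_1}$
$\text{Rate}_{E-} = \frac{B}{PT_0}$
Measures of association:
Rate Ratio (IDR): $\frac{\text{Rate}_{E+}}{\text{Rate}_{E-}} = \frac{A/PT_1}{B/PT_0}$
Rate Difference: $\text{Rate}_{E+} - \text{Rate}_{E-} = \frac{A}{PT_1} - \frac{B}{PT_0}$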
COHORT STUDY
[Figure: flow over time. Disease-free, at-risk population → Sample → Exposed or Unexposed → Disease or No disease in each group]
Enroll participants who are at risk to develop the outcome (e.g., disease free)
Participants are followed longitudinally to see if they develop the outcome
PROSPECTIVE VS RETROSPECTIVE COHORTS
Prospective cohort: Exposed and unexposed subjects are enrolled and followed before outcome is measured
Advantages: Temporality of exposure and disease is well-established; data collection under your control, for your purposes
Disadvantages: More costly and time-consuming
Retrospective cohort: Makes use of a previous cohort (outcomes have already occurred)
Advantages: Less expensive and less time-consuming
Disadvantages: Rely on data not compiled for the purposes of your study; cannot change how data were collected
COHORT STUDY EXAMPLE: RISK-BASED MEASURES
                  Smokers   Non-smokers   Total
Lung cancer       20        2             22
No lung cancer    980       998           1978
Total*            1000      1000          2000
*25-year cohort
Consider a 25-year retrospective cohort study of the association between smoking and lung cancer
What is the 25-year risk of lung cancer among smokers?
20/1000 = 2%
What is the 25-year risk of lung cancer among non-smokers?
2/1000 = 0.2%
What is the risk ratio?
2% / 0.2% = 10
The risk of lung cancer in smokers is 10 times the risk of lung cancer in non-smokers over 25 years of follow-up
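The arithmetic in this example can be reproduced with a short sketch. The counts come from the table above; the risk difference is an extra step not computed on the slide, obtained from the risk difference formula defined earlier.

```python
# Counts from the 25-year smoking and lung cancer cohort table.
cases_smokers, total_smokers = 20, 1000
cases_nonsmokers, total_nonsmokers = 2, 1000

risk_smokers = cases_smokers / total_smokers            # 25-year risk, exposed
risk_nonsmokers = cases_nonsmokers / total_nonsmokers   # 25-year risk, unexposed

risk_ratio = risk_smokers / risk_nonsmokers
risk_difference = risk_smokers - risk_nonsmokers

print(f"Risk in smokers: {risk_smokers:.1%}")
print(f"Risk in non-smokers: {risk_nonsmokers:.1%}")
print(f"Risk ratio: {risk_ratio:.1f}")
print(f"Risk difference: {risk_difference * 1000:.0f} per 1,000 over 25 years")
```

The ratio describes the relative size of the two risks, while the difference describes the excess risk among smokers on the absolute scale.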
COHORT STUDIES
Strengths
Can study several diseases
Useful for examining rare exposures
Limitations
Often costly and (for prospective studies) time-consuming
Inefficient for rare diseases or diseases with long induction/latency periods
Recall: the induction period is the time between exposure and disease onset; the latency period is the time between disease onset and clinical detection
Loss of subjects from migration, lack of participation, withdrawal and death may introduce bias
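COHORT STUDIES

Everyone is followed for (close to) the entire study period
Data layout (2x2 table):
        E+     E-     Total
D+      A      B      A+B
D-      C      D      C+D
Total   A+C    B+D    Total
Measure of frequency: Risk
$\text{Risk}_{E+} = \frac{A}{A+C}$
$\text{Risk}_{E-} = \frac{B}{B+D}$
Measures of association:
Risk Ratio (RR): $\frac{\text{Risk}_{E+}}{\text{Risk}_{E-}} = \frac{A/(A+C)}{B/(B+D)}$
Risk Difference: $\text{Risk}_{E+} - \text{Risk}_{E-} = \frac{A}{A+C} - \frac{B}{B+D}$

Follow-up time is NOT necessarily the same for everyone
Data layout:
        E+     E-     Total
D+      A      B      A+B
PT      PT1    PT0    PT
Measure of frequency: Rate
$\text{Rate}_{E+} = \frac{A}{PT_1}$
$\text{Rate}_{E-} = \frac{B}{PT_0}$
Measures of association:
Rate Ratio (IDR): $\frac{\text{Rate}_{E+}}{\text{Rate}_{E-}} = \frac{A/PT_1}{B/PT_0}$
Rate Difference: $\text{Rate}_{E+} - \text{Rate}_{E-} = \frac{A}{PT_1} - \frac{B}{PT_0}$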
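When follow-up differs between participants, person-time is accumulated per participant before rates are computed. The sketch below assumes a simple record format and uses invented follow-up data; none of it comes from a real cohort.

```python
# Hypothetical follow-up records: (exposed, years_at_risk, developed_disease).
follow_up = [
    (True, 5.0, False), (True, 2.5, True), (True, 4.0, False), (True, 1.5, True),
    (False, 5.0, False), (False, 5.0, False), (False, 3.0, True),
    (False, 5.0, False), (False, 2.0, False),
]


def incidence_rate(records, exposed):
    """Cases divided by total person-time within one exposure group."""
    group = [r for r in records if r[0] == exposed]
    cases = sum(1 for _, _, diseased in group if diseased)
    person_time = sum(years for _, years, _ in group)
    return cases / person_time


rate_exposed = incidence_rate(follow_up, True)      # 2 cases over 13.0 person-years
rate_unexposed = incidence_rate(follow_up, False)   # 1 case over 20.0 person-years

rate_ratio = rate_exposed / rate_unexposed
rate_difference = rate_exposed - rate_unexposed

print(f"Rate ratio: {rate_ratio:.2f}")
print(f"Rate difference: {rate_difference:.3f} cases per person-year")
```

Each participant contributes time only while at risk, so someone who develops disease or leaves the study early adds less person-time to the denominator.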
CASE-CONTROL STUDY
[Figure: Population → Sample → Diseased (‘Cases’) and Non-diseased (‘Controls’); each group is then classified as Exposed or Unexposed]
Step 1. Select individuals with the disease (cases) and those without the disease (controls)
Step 2. Then inquire about prior exposures
Cannot measure disease frequency (can compare measures of exposure frequency to generate useful measures of association)
CASE-CONTROL STUDY EXAMPLE: OUTBREAK OF DIARRHEA AT A RESORT HOTEL
Exposure: Eating at the salad bar
Health outcome: Diarrhea
Cases: Individuals with diarrhea
Controls: A small sample of disease-free guests
            Ate at salad bar   Did not eat at salad bar   Total
Cases       17                 20                         37
Controls    7                  26                         33
Total       24                 46                         70
We cannot measure disease frequency. Why not?
The number of individuals who do or do not get the disease is fixed by the investigator
We can compare measures of exposure frequency to generate useful measures of association
This measure of association is called the Odds ratio (OR)
OR = (A*D) / (B*C) = (17*26) / (20*7) ≈ 3.2
The odds of having diarrhea among those who ate at the salad bar are about 3 times the odds of diarrhea among those who did not eat at the salad bar
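The odds ratio for the salad bar example can be checked with a few lines of arithmetic. The counts come from the outbreak table; the sketch also computes the odds of exposure in cases and in controls, whose ratio gives the same number as the cross-product AD/BC.

```python
# Counts from the resort hotel outbreak table.
a, b = 17, 20   # cases: ate at salad bar, did not eat at salad bar
c, d = 7, 26    # controls: ate at salad bar, did not eat at salad bar

# Cross-product form of the odds ratio.
odds_ratio = (a * d) / (b * c)

# Equivalent form: odds of exposure among cases over odds among controls.
exposure_odds_cases = a / b
exposure_odds_controls = c / d
exposure_odds_ratio = exposure_odds_cases / exposure_odds_controls

print(f"Odds ratio (AD/BC): {odds_ratio:.2f}")
print(f"Exposure odds ratio: {exposure_odds_ratio:.2f}")
```

Because the investigator fixes the number of cases and controls, only the exposure odds within each group are interpretable, yet the resulting ratio matches the cross-product odds ratio.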
CASE-CONTROL STUDIES
Strengths
Can evaluate multiple exposures
Useful for studying rare diseases or diseases with long induction/latency periods
In general, less expensive and can be conducted more quickly
Commonly used in outbreak investigations
Limitations
Not useful for studying rare exposures
Generally, only designed to study one disease
Frequency of disease cannot be estimated directly
Susceptible to bias if not carefully designed
Especially susceptible to exposure misclassification if relying on recall
Sampling controls without bias is not always easy
The purpose of controls is to represent the distribution of the exposure in the source population that gave rise to the cases. Identifying this population is not always easy.
SOME IMPORTANT CASE-CONTROL STUDIES
1950s
Association between lung cancer and smoking
1970s/1980s
Sexual practices and HIV/AIDS
1990s
Diet and cancers
Ongoing
Identifying the source of Legionnaires' disease outbreaks
CASE-CONTROL STUDIES
Data layout (2x2 table):
        E+     E-     Total
D+      A      B      A+B
D-      C      D      C+D
Total   A+C    B+D    N
Measures of disease frequency: None
Measures of association: $OR = \frac{AD}{BC}$
CROSS-SECTIONAL STUDY
[Figure: Sample drawn at one point in time (‘snapshot’ of time), classified as Currently exposed or Currently unexposed and as Existing disease or No existing disease]
Which came first, the exposure or the outcome?
CROSS-SECTIONAL STUDY: EXAMPLE
Research question
What is the association between coronary artery calcification (exposure) and late-life depression (outcome)?
Methods
Sample: n=1,920 elderly men and women in Rotterdam
Researchers measured the prevalence of coronary artery calcification and the prevalence of depressive symptoms
                     Coronary calc >500   Coronary calc <=500   Total
Depression (D+)      28                   53                    81
Not depressed (D-)   511                  1328                  1839
Total                539                  1381                  1920
• What is the prevalence of depression among those with coronary calcification >500?
• 28/539 = 5.2%
• What is the prevalence of depression among those with coronary calcification <=500?
• 53/1381 = 3.8%
• What is the prevalence ratio?
• 5.2% / 3.8% = 1.4
• The prevalence of depression among the exposed (coronary calcification >500) is 1.4 times the prevalence of depression among the unexposed
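The prevalence calculations can be reproduced as follows. The counts come from the table in this example; the prevalence difference and prevalence odds ratio are extra steps not worked on the slide, using the same cell layout (A, B, C, D) as the other 2x2 tables.

```python
# Counts from the coronary calcification and depression table.
a, b = 28, 53       # depressed: calcification >500, calcification <=500
c, d = 511, 1328    # not depressed: calcification >500, calcification <=500

prevalence_exposed = a / (a + c)
prevalence_unexposed = b / (b + d)

prevalence_ratio = prevalence_exposed / prevalence_unexposed
prevalence_difference = prevalence_exposed - prevalence_unexposed
prevalence_odds_ratio = (a * d) / (b * c)

print(f"Prevalence, exposed: {prevalence_exposed:.1%}")
print(f"Prevalence, unexposed: {prevalence_unexposed:.1%}")
print(f"Prevalence ratio: {prevalence_ratio:.2f}")
print(f"Prevalence difference: {prevalence_difference:.1%}")
print(f"Prevalence odds ratio: {prevalence_odds_ratio:.2f}")
```

With a prevalence this low in both groups, the prevalence odds ratio lands close to the prevalence ratio.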
From this study design, we cannot establish whether exposure preceded disease or vice versa (temporal sequence unclear)
Hard to establish cause and effect
[Figure: possible pathways in either direction between coronary artery calcification and depression in the elderly, labeled: lack of exercise, poor eating, biological changes]
CROSS-SECTIONAL STUDIES
Strengths
Relatively quick and inexpensive.
Can evaluate several exposures and several diseases (outcomes) at the same time.
Can be used to generate pilot data for a study with clear temporality between exposure and disease.
Helpful for evaluating disease burden or exposure burden in a population (prevalence).
Limitations
Generally, cannot establish whether exposure preceded disease or disease influenced exposure (temporal sequence unclear)
Usually not useful for establishing etiologic relationships
Length-biased sampling: May miss diseases or exposures with short duration and overrepresent those with long duration
CROSS-SECTIONAL STUDIES
Data layout (2x2 table), prevalence-based:
        E+     E-     Total
D+      A      B      A+B
D-      C      D      C+D
Total   A+C    B+D    Total
Measure of frequency: Prevalence
$\text{Prevalence}_{E+} = \frac{A}{A+C}$
$\text{Prevalence}_{E-} = \frac{B}{B+D}$
Measures of association:
Prevalence Ratio (PR) = $\frac{\text{Prevalence}_{E+}}{\text{Prevalence}_{E-}} = \frac{A/(A+C)}{B/(B+D)}$
Prevalence Difference (PD) = $\text{Prevalence}_{E+} - \text{Prevalence}_{E-} = \frac{A}{A+C} - \frac{B}{B+D}$
Prevalence odds ratio (POR) = $\frac{AD}{BC}$
WHICH STUDY DESIGN TO CHOOSE?
Ideal study design is often unattainable
There is no true “hierarchy” of study designs – the best realistic study design depends on:
Research question
Ethics
Whether outcome is rare (or has long induction/latency period)
Whether exposure is rare
Available resources: trials and cohorts especially can be resource-intensive
GOAL WHEN DESIGNING AN ANALYTIC STUDY
To obtain the best evidence either in support of, or in refutation of, the study hypothesis within the constraints of time, resources, ethics and other practical considerations.
Rarely (if ever) does one study settle a research question. Rather, our knowledge of the determinants of health outcomes advances by an accumulation of different types of evidence from different studies.