OVERVIEW OF EPIDEMIOLOGICAL STUDY DESIGN · SPRING 2020 · 2020. 8. 9.

Kristin Wall, PhD

Department of Epidemiology


OUTLINE

Definition of epidemiology

Measures of disease frequency and association

Study designs

Descriptive studies

Analytic studies

Experimental (Randomized trials)

Observational (Cohort studies, case-control studies, cross-sectional studies)


DEFINITION OF EPIDEMIOLOGY

Epidemiology is the study of the distribution and determinants of disease (i.e., health-related states or events)

Determinants (i.e., Exposures) → Disease (i.e., Outcomes)

We may be interested in quantifying the distribution of disease using measures of frequency

Measures of frequency and their definitions:

Risk (“cumulative incidence”, CI): the probability that new disease will occur in a given time period

Rate (“incidence density”, ID): a measure of how quickly new disease develops over a given time period (new cases per unit of person-time)

Prevalence (“point prevalence”): a measure of existing disease at a particular point in time


We may be interested in quantifying the association between determinants and disease using measures of association

MEASURES OF ASSOCIATION

Measure of frequency → Measures of association

Risk → Risk ratio, risk difference

Rate → Rate ratio

Prevalence → Prevalence ratio, prevalence difference

Measures of association compare disease frequency between two groups (typically the exposed and the unexposed) using differences and ratios


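DATA LAYOUT (2x2 TABLES)

Risk-based:

          E+      E-
D+        A       B
D-        C       D
Total     A+C     B+D

$$\text{Risk}_{E+} = \frac{A}{A+C}, \qquad \text{Risk}_{E-} = \frac{B}{B+D}$$

$$\text{Risk ratio} = \frac{\text{Risk}_{E+}}{\text{Risk}_{E-}} = \frac{A/(A+C)}{B/(B+D)}$$

$$\text{Risk difference} = \text{Risk}_{E+} - \text{Risk}_{E-} = \frac{A}{A+C} - \frac{B}{B+D}$$

Rate-based:

              E+      E-
D+            A       B
Person-time   PT1     PT0

$$\text{Rate}_{E+} = \frac{A}{PT_1}, \qquad \text{Rate}_{E-} = \frac{B}{PT_0}$$

$$\text{Rate ratio} = \frac{\text{Rate}_{E+}}{\text{Rate}_{E-}} = \frac{A/PT_1}{B/PT_0}$$

$$\text{Rate difference} = \text{Rate}_{E+} - \text{Rate}_{E-} = \frac{A}{PT_1} - \frac{B}{PT_0}$$

Prevalence-based:

          E+      E-
D+        A       B
D-        C       D
Total     A+C     B+D

$$\text{Prevalence}_{E+} = \frac{A}{A+C}, \qquad \text{Prevalence}_{E-} = \frac{B}{B+D}$$

$$\text{Prevalence ratio} = \frac{\text{Prevalence}_{E+}}{\text{Prevalence}_{E-}} = \frac{A/(A+C)}{B/(B+D)}$$

$$\text{Prevalence difference} = \text{Prevalence}_{E+} - \text{Prevalence}_{E-} = \frac{A}{A+C} - \frac{B}{B+D}$$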
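The formulas above can be sketched as two small Python helpers, one for risk-based (or prevalence-based) tables and one for rate-based tables. This is an illustrative sketch only: the names `Comparison`, `compare`, `risk_based`, and `rate_based` are hypothetical, the counts are made up, and person-time is assumed to be in person-years.

```python
from typing import NamedTuple


class Comparison(NamedTuple):
    exposed: float     # frequency (risk, rate, or prevalence) in E+
    unexposed: float   # the same frequency measure in E-
    ratio: float       # exposed / unexposed
    difference: float  # exposed - unexposed


def compare(freq_exposed: float, freq_unexposed: float) -> Comparison:
    """Build both measures of association from two measures of frequency."""
    return Comparison(freq_exposed, freq_unexposed,
                      freq_exposed / freq_unexposed,
                      freq_exposed - freq_unexposed)


def risk_based(a: int, b: int, c: int, d: int) -> Comparison:
    """Risk- or prevalence-based 2x2 table (columns E+/E-, rows D+/D-)."""
    return compare(a / (a + c), b / (b + d))


def rate_based(a: int, b: int, pt1: float, pt0: float) -> Comparison:
    """Rate-based table: A and B new cases over person-time PT1 (E+) and PT0 (E-)."""
    return compare(a / pt1, b / pt0)


# Hypothetical counts, for illustration only
risk = risk_based(a=20, b=10, c=80, d=90)
print(f"Risk ratio = {risk.ratio:.2f}, risk difference = {risk.difference:.2f}")

rate = rate_based(a=30, b=10, pt1=1500, pt0=2000)
print(f"Rate ratio = {rate.ratio:.2f}, rate difference = {rate.difference:.3f} per person-year")
```

Because the prevalence-based table has the same layout as the risk-based table, `risk_based` also returns the prevalence ratio and prevalence difference when given prevalence counts.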

TYPES OF EPIDEMIOLOGIC RESEARCH

Epidemiologic studies can be divided into two broad categories:

DESCRIPTIVE STUDIES:

Measure disease frequencies in certain groups

Hypothesis generating

Example: A study found that in the US, 5 in 165 pregnancies among obese women that reach 20 weeks' gestation end in stillbirth

ANALYTIC STUDIES:

Compare disease frequencies between groups (typically the exposed and unexposed), i.e., calculate measures of association

Hypothesis testing

Example: A study found that obese women are more likely to have a stillbirth than women of normal weight

TYPES OF EPIDEMIOLOGIC STUDIES

Descriptive studies

Analytic studies

Experimental: the investigator manipulates (controls) the exposure condition under study.

Clinical trials

Observational: the investigator does not manipulate the exposure, but rather just observes exposure status.

Cohort studies

Case-control studies

Cross-sectional studies

In any analytic study, your first step is to precisely define your exposure(s) of interest and your outcome(s) of interest

RANDOMIZED CONTROLLED TRIAL

Figure: a sample of a disease-free, at-risk population is randomized by exposure to treatment (exposed) or standard of care* (unexposed); both groups are followed over time for disease or no disease.

*Or placebo if no standard of care exists

Enroll participants who are at risk of developing the outcome (e.g., disease-free)

Participants are followed longitudinally to see if they develop the outcome

Active manipulation of the exposure by the investigator is the hallmark that distinguishes experimental studies from observational studies

The experimenter often randomly assigns study subjects to exposure conditions

On average, randomized study groups should be comparable on ALL factors except for exposure status

Randomization attempts to control for factors (both known and unknown) that could affect the outcome of interest

These ‘factors’ of concern are called ‘confounders’

Units of randomization

Individual: Exposure is allocated to individuals (unit of randomization is an individual)

Cluster: Exposure is allocated to entire groups (unit of randomization is a group, e.g., communities, families, hospitals, schools)
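A minimal sketch of the two units of randomization, assuming simple 1:1 allocation by shuffling (real trials often use blocked or stratified schemes instead). The function names, the seed values, the 30% "smoker" covariate, and the school clusters are all hypothetical. The balance check illustrates the point above that randomized groups should, on average, be comparable on other factors.

```python
import random


def randomize_individuals(ids, seed=None):
    """Simple 1:1 randomization: shuffle the units, then send the first half to treatment."""
    rng = random.Random(seed)
    units = list(ids)
    rng.shuffle(units)
    half = len(units) // 2
    return {unit: ("treatment" if i < half else "control") for i, unit in enumerate(units)}


def randomize_clusters(members_by_cluster, seed=None):
    """Cluster randomization: whole groups are the unit; every member inherits its group's arm."""
    cluster_arm = randomize_individuals(members_by_cluster, seed)  # iterating a dict yields its keys
    return {person: cluster_arm[cluster]
            for cluster, members in members_by_cluster.items()
            for person in members}


# Hypothetical population of 10,000 people, 30% of whom smoke (a potential confounder)
people = range(10_000)
smoker = {p: p % 10 < 3 for p in people}
arms = randomize_individuals(people, seed=2020)
for arm in ("treatment", "control"):
    group = [p for p, assigned in arms.items() if assigned == arm]
    share = sum(smoker[p] for p in group) / len(group)
    print(f"{arm}: n = {len(group)}, smokers = {share:.1%}")

# Hypothetical clusters (e.g., schools): all children in a school share one arm
schools = {"school_A": [1, 2, 3], "school_B": [4, 5], "school_C": [6, 7, 8], "school_D": [9]}
by_school = randomize_clusters(schools, seed=7)
print(by_school)
```

The smoker share should come out close to 30% in both arms even though smoking status was never used in the allocation.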



Blinding

Single-blind: the participants do not know which study arm they are in

Double-blind: neither the investigators nor the study participants know which arm each participant is in

Can also blind data assessors

Trial objectives

Efficacy: does the intervention work in the people who receive it? Calculate measures of association among participants with perfect compliance

Effectiveness: does the intervention work in the people who are offered it? Calculate measures of association among all participants, regardless of whether they complied with the intervention (this is called an intention-to-treat analysis, i.e., “Analyze what you randomize”)

Effectiveness is closer to a measure of the ‘real-world’ effect of an intervention
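The difference between the two objectives can be sketched with hypothetical trial records: 100 participants per arm, 20 of whom in the treatment arm did not take the intervention. The intention-to-treat risk ratio compares arms as randomized; restricting to compliers is one simple way to approximate "participants with perfect compliance." All names and counts here are made up, and because compliance itself is not randomized, a compliers-only comparison can be biased.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    arm: str        # arm assigned at randomization: "treatment" or "control"
    complied: bool  # took the assigned intervention as intended
    event: bool     # developed the outcome during follow-up


def risk(group):
    return sum(p.event for p in group) / len(group)


def risk_ratio(participants, per_protocol=False):
    """Intention-to-treat by default; per_protocol=True keeps only compliers."""
    if per_protocol:
        participants = [p for p in participants if p.complied]
    treated = [p for p in participants if p.arm == "treatment"]
    control = [p for p in participants if p.arm == "control"]
    return risk(treated) / risk(control)


def make_group(arm, complied, n, events):
    """Hypothetical helper: n participants, the first `events` of whom have the outcome."""
    return [Participant(arm, complied, i < events) for i in range(n)]


trial = (make_group("treatment", True, 80, 8)
         + make_group("treatment", False, 20, 6)
         + make_group("control", True, 100, 20))

print(f"Effectiveness (intention-to-treat) RR: {risk_ratio(trial):.2f}")
print(f"Efficacy (compliers only) RR: {risk_ratio(trial, per_protocol=True):.2f}")
```

With these made-up numbers the intention-to-treat estimate (0.70) is weaker than the compliers-only estimate (0.50), because non-compliers in the treatment arm dilute the effect.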

COMMON TRIAL DESIGNS

Parallel

Each group receives one exposure. Exposures are administered concurrently.

Crossover

Each group receives each exposure, one after another. A washout period between treatments may be needed.

Advantages: Each participant serves as their own control, which can minimize bias

STRENGTHS OF TRIALS

Randomization yields treatment and control groups that are similar with respect to both known and unknown variables

Blinding minimizes bias

The randomized, double-blind, controlled clinical trial is generally considered the “gold standard” design

LIMITATIONS OF TRIALS

Eligibility criteria (trials often use highly selected study participants) and non-participation may limit generalizability

Trials are often risky, expensive, and time-consuming

Many participants and study staff (labor intensive)

Many years of follow-up (e.g., from the start of clinical testing to FDA market approval: 7 to 10 years)

Expensive (e.g., cost of taking a drug through FDA approval: $55 to 800 million)

May not be feasible or ethical (e.g., random assignment of exposure is not always ethical)

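TRIALS: DATA LAYOUT (2x2 TABLES)

Everyone is followed for (close to) the entire study period: risk-based table

          E+      E-      Total
D+        A       B       A+B
D-        C       D       C+D
Total     A+C     B+D     Total

Measures of frequency (risk):

$$\text{Risk}_{E+} = \frac{A}{A+C}, \qquad \text{Risk}_{E-} = \frac{B}{B+D}$$

Measures of association:

$$\text{Risk ratio (RR)} = \frac{\text{Risk}_{E+}}{\text{Risk}_{E-}} = \frac{A/(A+C)}{B/(B+D)}, \qquad \text{Risk difference} = \frac{A}{A+C} - \frac{B}{B+D}$$

Follow-up time is NOT necessarily the same for everyone: rate-based table

              E+      E-      Total
D+            A       B       A+B
Person-time   PT1     PT0     PT

Measures of frequency (rate):

$$\text{Rate}_{E+} = \frac{A}{PT_1}, \qquad \text{Rate}_{E-} = \frac{B}{PT_0}$$

Measures of association:

$$\text{Rate ratio (IDR)} = \frac{\text{Rate}_{E+}}{\text{Rate}_{E-}} = \frac{A/PT_1}{B/PT_0}, \qquad \text{Rate difference} = \frac{A}{PT_1} - \frac{B}{PT_0}$$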


COHORT STUDY

Figure: a sample of a disease-free, at-risk population is classified as exposed or unexposed; both groups are followed over time for disease or no disease.

Enroll participants who are at risk of developing the outcome (e.g., disease-free)

Participants are followed longitudinally to see if they develop the outcome

PROSPECTIVE VS RETROSPECTIVE COHORTS

Prospective cohort: Exposed and unexposed subjects are enrolled before the outcome has occurred and followed forward in time

Advantages: Temporality of exposure and disease is well established; data collection is under your control, for your purposes

Disadvantages: More costly and time-consuming

Retrospective cohort: Makes use of a previously assembled cohort (outcomes have already occurred)

Advantages: Less expensive and less time-consuming

Disadvantages: Relies on data not compiled for the purposes of your study; you cannot change how the data were collected

COHORT STUDY EXAMPLE: RISK-BASED MEASURES

                  Smokers   Non-smokers   Total
Lung cancer       20        2             22
No lung cancer    980       998           1978
Total*            1000      1000          2000

*25-year cohort

Consider a 25-year retrospective cohort study of the association between smoking and lung cancer

What is the 25-year risk of lung cancer among smokers?

20/1000 = 2%

What is the 25-year risk of lung cancer among non-smokers?

2/1000 = 0.2%

What is the risk ratio?

2% / 0.2% = 10

The risk of lung cancer in smokers is 10 times the risk of lung cancer in non-smokers over 25 years of follow-up
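The same arithmetic in Python, using the counts from the table above. It also computes the risk difference, which the slide's formulas define but the worked example does not report; expressing it per 1,000 people is just a presentation choice.

```python
# 2x2 counts from the 25-year smoking and lung cancer cohort above
cases_smokers, n_smokers = 20, 1000
cases_nonsmokers, n_nonsmokers = 2, 1000

risk_smokers = cases_smokers / n_smokers            # 25-year risk among the exposed (E+)
risk_nonsmokers = cases_nonsmokers / n_nonsmokers   # 25-year risk among the unexposed (E-)
risk_ratio = risk_smokers / risk_nonsmokers
risk_difference = risk_smokers - risk_nonsmokers    # absolute excess risk among smokers

print(f"Risk among smokers: {risk_smokers:.1%}")          # 2.0%
print(f"Risk among non-smokers: {risk_nonsmokers:.1%}")   # 0.2%
print(f"Risk ratio: {risk_ratio:.1f}")                    # 10.0
print(f"Risk difference: {risk_difference * 1000:.0f} per 1,000 over 25 years")  # 18
```

The ratio says smokers had 10 times the risk; the difference says that corresponds to about 18 additional lung cancer cases per 1,000 people over 25 years.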

COHORT STUDIES

Strengths

Can study several diseases

Useful for examining rare exposures

Limitations

Often costly and (for prospective studies) time-consuming

Inefficient for rare diseases or diseases with long induction/latency periods

Recall: the induction period is the time between exposure and disease onset; the latency period is the time between disease onset and clinical detection

Loss of subjects through migration, lack of participation, withdrawal, and death may introduce bias

COHORT STUDIES: DATA LAYOUT (2x2 TABLES)

Cohort studies use the same data layouts and measures as trials:

Everyone is followed for (close to) the entire study period: risk-based table; measures of frequency are $\text{Risk}_{E+} = A/(A+C)$ and $\text{Risk}_{E-} = B/(B+D)$; measures of association are the risk ratio (RR) and risk difference

Follow-up time is NOT necessarily the same for everyone: rate-based table with person-time PT1 (exposed) and PT0 (unexposed); measures of frequency are $\text{Rate}_{E+} = A/PT_1$ and $\text{Rate}_{E-} = B/PT_0$; measures of association are the rate ratio (IDR) and rate difference
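When follow-up time differs between people, PT1 and PT0 are sums of individual time at risk. Below is a sketch with hypothetical follow-up records (the `Participant` class and all values are made up), assuming each person contributes time until disease onset or until they leave observation (loss to follow-up, death, or end of study).

```python
from dataclasses import dataclass


@dataclass
class Participant:
    exposed: bool
    years_followed: float    # time at risk: until disease onset, loss to follow-up, death, or study end
    developed_disease: bool


# Hypothetical follow-up records; people leave the risk set at different times
cohort = [
    Participant(True, 5.0, False), Participant(True, 2.5, True),
    Participant(True, 4.0, False), Participant(True, 1.5, True),
    Participant(False, 5.0, False), Participant(False, 5.0, False),
    Participant(False, 3.0, True), Participant(False, 5.0, False),
]


def cases_and_person_time(records, exposed):
    """Return (new cases, total person-time) for the exposed or unexposed group."""
    group = [p for p in records if p.exposed == exposed]
    return sum(p.developed_disease for p in group), sum(p.years_followed for p in group)


a, pt1 = cases_and_person_time(cohort, exposed=True)   # A and PT1
b, pt0 = cases_and_person_time(cohort, exposed=False)  # B and PT0
rate_exposed, rate_unexposed = a / pt1, b / pt0

print(f"A = {a}, PT1 = {pt1} person-years; B = {b}, PT0 = {pt0} person-years")
print(f"Rate ratio (IDR) = {rate_exposed / rate_unexposed:.2f}")
print(f"Rate difference = {(rate_exposed - rate_unexposed) * 1000:.0f} per 1,000 person-years")
```

Using person-time rather than head counts keeps people who were observed for only part of the study from being treated as if they were followed for the whole period.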

CASE-CONTROL STUDY

Figure: from the population, a sample is selected on disease status (diseased ‘cases’ and non-diseased ‘controls’); each group is then classified as exposed or unexposed.

Cannot measure disease frequency (but can compare measures of exposure frequency to generate useful measures of association)

Step 1. Select individuals with the disease (cases) and those without the disease (controls)

Step 2. Then inquire about prior exposures

CASE-CONTROL STUDY EXAMPLE:

OUTBREAK OF DIARRHEA AT A RESORT HOTEL

Exposure: eating at the salad bar

Health outcome: diarrhea

Cases: individuals with diarrhea

Controls: a small sample of disease-free guests

            Ate at salad bar   Did not eat at salad bar   Total
Cases       17                 20                         37
Controls    7                  26                         33
Total       24                 46                         70


We cannot measure disease frequency. Why not?

The number of individuals who do or do not have the disease is fixed by the investigator

We can compare measures of exposure frequency to generate useful measures of association

This measure of association is called the odds ratio (OR)

$$OR = \frac{A \times D}{B \times C} = \frac{17 \times 26}{20 \times 7} = \frac{442}{140} \approx 3.2$$

The odds of diarrhea among those who ate at the salad bar are about 3 times the odds of diarrhea among those who did not eat at the salad bar
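The step from "compare exposure frequency" to AD/BC can be made explicit: the odds of exposure among cases (A/B) divided by the odds of exposure among controls (C/D) simplifies algebraically to AD/BC. A short sketch using the salad-bar counts; the function name `odds_ratio` is illustrative.

```python
def odds_ratio(a, b, c, d):
    """Case-control 2x2 table.

    a, b: exposed and unexposed cases; c, d: exposed and unexposed controls.
    """
    exposure_odds_cases = a / b       # odds of exposure among cases
    exposure_odds_controls = c / d    # odds of exposure among controls
    return exposure_odds_cases / exposure_odds_controls  # algebraically equal to (a*d)/(b*c)


salad_bar_or = odds_ratio(a=17, b=20, c=7, d=26)
print(f"OR = {salad_bar_or:.2f}")  # OR = 3.16
```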


CASE-CONTROL STUDIES

Strengths

Can evaluate multiple exposures

Useful for studying rare diseases or diseases with long induction/latency periods

In general, less expensive and can be conducted more quickly

Commonly used in outbreak investigations


Limitations

Not useful for studying rare exposures

Generally, only designed to study one disease

Frequency of disease cannot be estimated directly

Susceptible to bias if not carefully designed

Especially susceptible to exposure misclassification if relying on recall

Sampling controls without bias is not always easy

The purpose of controls is to represent the distribution of the exposure in the source population that gave rise to the cases; identifying this population is not always easy

SOME IMPORTANT CASE-CONTROL STUDIES

1950s: association between lung cancer and smoking

1970s/1980s: sexual practices and HIV/AIDS

1990s: diet and cancers

Ongoing: identifying the sources of Legionnaires' disease outbreaks


CASE-CONTROL STUDIES: DATA LAYOUT (2x2 TABLE)

          E+      E-      Total
D+        A       B       A+B
D-        C       D       C+D
Total     A+C     B+D     N

Measures of disease frequency: none

Measure of association:

$$OR = \frac{AD}{BC}$$

CROSS-SECTIONAL STUDY

Figure: a sample is drawn at one point in time (a ‘snapshot’ in time); each person is classified as currently exposed or currently unexposed, and as having existing disease or no existing disease.

Which came first, the exposure or the outcome?

CROSS-SECTIONAL STUDY: EXAMPLE

Research question

What is the association between coronary artery calcification (exposure) and late-life depression (outcome)?

Methods

Sample: n=1,920 elderly men and women in Rotterdam

Researchers measured the prevalence of coronary artery calcification and the prevalence of depressive symptoms


                      Coronary calcification >500   Coronary calcification ≤500   Total
Depression (D+)       28                            53                            81
Not depressed (D-)    511                           1328                          1839
Total                 539                           1381                          1920

• What is the prevalence of depression among those with coronary calcification?

• 28/539 = 5.2%

• What is the prevalence of depression among those without coronary calcification?

• 53/1381 = 3.8%

• What is the prevalence ratio?

• 5.2% / 3.8% = 1.4

• The prevalence of disease among the exposed is 1.4 times the prevalence of disease among the unexposed


From this study design, we cannot establish whether exposure preceded disease or vice versa (temporal sequence unclear)

Hard to establish cause and effect

Figure: diagram linking coronary artery calcification, lack of exercise, poor eating, biological changes, and depression in the elderly.

CROSS-SECTIONAL STUDIES

Strengths

Relatively quick and inexpensive.

Can evaluate several exposures and several diseases (outcomes) at the same time.

Can be used to generate pilot data for a study with clear temporality between exposure and disease.

Helpful for evaluating disease burden or exposure burden in a population (prevalence).

Limitations

Generally, cannot establish whether exposure preceded disease or disease influenced exposure (temporal sequence unclear)

Usually not useful for establishing etiologic relationships

Length-bias sampling: May miss diseases or exposures with short duration and overrepresent those with long duration
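Length-bias sampling can be seen in a small deterministic simulation under made-up assumptions: one new case per day for 1,000 days, alternating between short (2-day) and long (50-day) episodes, so half of all new cases are long. A cross-sectional snapshot on day 500 captures only the cases still ongoing that day. The function and variable names are hypothetical.

```python
def prevalent_on(day, cases):
    """Cases with existing disease on `day`: onset on or before that day and not yet resolved."""
    return [(onset, duration) for onset, duration in cases if onset <= day < onset + duration]


# Hypothetical incident cases: (onset day, duration in days).
# Even-numbered days produce a short (2-day) episode, odd-numbered days a long (50-day) one.
cases = [(day, 2 if day % 2 == 0 else 50) for day in range(1000)]

share_long_incident = sum(duration == 50 for _, duration in cases) / len(cases)

snapshot = prevalent_on(500, cases)  # the cross-sectional sample
share_long_prevalent = sum(duration == 50 for _, duration in snapshot) / len(snapshot)

print(f"Long-duration share among all new cases:   {share_long_incident:.0%}")   # 50%
print(f"Long-duration share in the day-500 sample: {share_long_prevalent:.0%}")  # 96%
```

Long episodes are present on far more days, so the snapshot picks up 25 long cases and only 1 short one, even though the two types occur equally often.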

CROSS-SECTIONAL STUDIES: DATA LAYOUT (2x2 TABLE)

Prevalence-based:

          E+      E-      Total
D+        A       B       A+B
D-        C       D       C+D
Total     A+C     B+D     Total

Measure of frequency: prevalence

$$\text{Prevalence}_{E+} = \frac{A}{A+C}, \qquad \text{Prevalence}_{E-} = \frac{B}{B+D}$$

Measures of association:

$$\text{Prevalence ratio (PR)} = \frac{\text{Prevalence}_{E+}}{\text{Prevalence}_{E-}} = \frac{A/(A+C)}{B/(B+D)}$$

$$\text{Prevalence difference (PD)} = \text{Prevalence}_{E+} - \text{Prevalence}_{E-} = \frac{A}{A+C} - \frac{B}{B+D}$$

$$\text{Prevalence odds ratio (POR)} = \frac{AD}{BC}$$
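Applying these cross-sectional measures to the Rotterdam counts from the example above gives a short check. Computed from unrounded prevalences, the prevalence ratio is about 1.35 (the 1.4 in the example comes from rounded percentages), and the prevalence odds ratio is close to it because depression is uncommon in this sample.

```python
# Rotterdam cross-sectional counts from the example above
a, b = 28, 53      # depressed: calcification >500 (E+), calcification <=500 (E-)
c, d = 511, 1328   # not depressed: E+, E-

prev_exposed = a / (a + c)      # 28/539
prev_unexposed = b / (b + d)    # 53/1381
prevalence_ratio = prev_exposed / prev_unexposed
prevalence_difference = prev_exposed - prev_unexposed
prevalence_odds_ratio = (a * d) / (b * c)

print(f"PR = {prevalence_ratio:.2f}, PD = {prevalence_difference:.3f}, "
      f"POR = {prevalence_odds_ratio:.2f}")
```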

OPEN EPI


http://www.openepi.com/Menu/OE_Menu.htm


WHICH STUDY DESIGN TO CHOOSE?

Ideal study design is often unattainable

There is no true “hierarchy” of study designs – the best realistic study design depends on:

Research question

Ethics

Whether the outcome is rare (or has a long induction/latency period)

Whether the exposure is rare

Available resources: trials and cohorts especially can be resource-intensive


GOAL WHEN DESIGNING AN ANALYTIC STUDY

To obtain the best evidence either in support of, or in refutation of, the study hypothesis within the constraints of time, resources, ethics, and other practical considerations.

Rarely (if ever) does one study settle a research question. Rather, our knowledge of the determinants of health outcomes advances through an accumulation of different types of evidence from different studies.
