Observational Study Designs and Studies of Medical Tests Michael A. Kohn, MD, MPP 25 August 2009


Outline

• Single-Sentence Study Description (review)
• Observational study designs
  -- Cohort, Double Cohort
  -- Case-Control
  -- Cross-sectional
• Studies of Medical Tests
  -- Diagnostic Test Accuracy
  -- Prognostic Test Accuracy
• Examples of observational designs (“Name that Design”)

Single-Sentence Study Description

(Unless Studying a Medical Test)

“The [cute acronym] study is a [DESIGN] study of the association* between [predictor] and [outcome] in [study population].”

“The SCOTCH Study is a cohort study of the association* between HPV infection and development of cutaneous squamous cell carcinoma in renal transplant recipients.”

* Interested in causal association.

Single-Sentence Description If Studying a Test

“The [cute acronym] study is a [DESIGN] study of [test] as a [diagnostic/prognostic] test for [disease/outcome] in [study population].”

“The 3D-ERUS Study is a cross-sectional study of the accuracy of endorectal ultrasound in re-staging rectal cancer relative to the gold standard of surgical pathology after neoadjuvant chemoradiation in patients with locally invasive rectal cancer.”

Single-Sentence Study Description

Exercise for section today: Present your study with a sentence like this.

Study Design

• Not just a matter of semantics
• Weaknesses and strengths associated with each study design
• Different measures of disease association
• Worth getting right or at least thinking about

Study Designs

Experimental
-- Randomized controlled trial

Observational (today’s topic)
-- Cohort
-- Double Cohort (exposed-unexposed)
-- Case-control
-- Cross-sectional

Predictor Type and Experimental vs. Observational Design

• Predictor = treatment or screening program

-- experiment (randomized controlled trial)*

-- observational study of a treatment or program

• Predictor = exposure or risk factor

-- observational study of an exposure or risk factor

• Predictor = test result

-- observational study of a test

* Not all treatments or screening programs require RCTs to prove effectiveness.

OBSERVATIONAL STUDIES

• Only option if predictor is a potentially harmful exposure, risk factor, or test.
• Even if the predictor is an intervention, RCT may not be feasible
• Confounding is an issue*
• More intellectually interesting than RCTs?

* Except in studies of tests, where the issue isn’t confounding but how much the test adds to information that is already available.

Note on Figures

The following schematics of observational study designs assume:

Predictor = Risk Factor
Outcome = Disease

Cohort Study

Cohort Studies

1) Determine predictor status on a sample from a single population (defined by something other than the predictor).

2) Exclude any potential subjects who already have the outcome.

3) Follow sample over time and attempt to determine outcome on all subjects.

Cohort Studies

Can identify individuals lost to follow up

Can estimate overall incidence of outcome in the population (e.g., cases/person-year)

Measure of disease association is the relative risk (RR) or relative hazard (RH)

Double Cohort Study

Double Cohort (Exposed-Unexposed) Studies

1) Sample study subjects based on predictor status.

2) Exclude potential subjects in whom outcome has already occurred.

3) Attempt to determine outcome in all subjects over time.

Double Cohort (Exposed-Unexposed) Studies

Can identify individuals lost to follow up

Cannot estimate overall incidence of outcome in the population (e.g., cases/person-year)

Measure of disease association is the relative risk (RR) or relative hazard (RH)

Cohort Studies: Sampling Frame vs. Time Frame

Time Frame: All cohort studies are longitudinal (follow patients over time).

Sampling Frame:
• Double cohort study -- samples on predictor status
• Cohort study -- starts with a cross-sectional sample

Cohort Studies: Prospective vs. Retrospective

Prospective – Predictor status collected as part of this study

Retrospective – Predictor status collected by someone else in the past (another study, medical records, etc.)

Don’t worry too much about retrospective vs. prospective!

Case-Control Study

Case-Control Study

1) Separately sample subjects with the outcome (cases) and without the outcome (controls)

2) Attempt to determine predictor status on all subjects in both outcome groups

Case-Control Study

Cannot identify individuals lost to follow up (no such thing as “lost to follow up”, since by definition outcome status is known)

Cannot calculate prevalence (or incidence) of outcome

Measure of disease association is the Odds Ratio (OR)

Trying to replicate a nested case-control study, in which the cases and controls come from the same cohort.

Nested Case-Control Study

Cross-Sectional Study

Cross-Sectional Study

Attempt to determine predictor and outcome status on all patients in a single population (defined by something other than predictor and outcome).

Cross-Sectional Study

• Cannot identify individuals lost to follow up (no such thing as “lost to follow up”)
• Can calculate prevalence but not incidence
• Measure of disease association is the Relative Prevalence (RP).
• Time frame is the same as for a case-control study; both discussed in DCR3, Chapter 8

Cohort Studies Start with a Cross-Sectional Study

Eliminate subjects who already have disease

Causal Association Between Predictor and Outcome

• Most observational studies: Does predictor cause outcome?

• Studies of diagnostic/prognostic test accuracy: Test result does not cause outcome.

Studies of Medical Tests

Causality irrelevant.

Not enough to show that test result is associated with disease status or outcome*.

Need to estimate parameters (e.g., sensitivity and specificity) describing test performance.

*Although if it isn’t, you can stop.

Studies of Diagnostic Test Accuracy for Prevalent Disease

Predictor = Test Result
Outcome = Disease status as determined by Gold Standard

Designs:

Case-control (sample separately from disease positive and disease negative groups)

Cross-sectional (sample from the whole population of interest)

Dichotomous Tests

          Disease +                 Disease -
Test +    a  (True Positives)       b  (False Positives)
Test -    c  (False Negatives)      d  (True Negatives)
Total     a + c                     b + d
          (Total With Disease)      (Total Without Disease)

Sensitivity = a/(a + c)
Specificity = d/(b + d)
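The two formulas above can be sketched in a few lines of Python. The cell counts below are hypothetical and chosen only for illustration; the function names are not from the lecture.

```python
def sensitivity(a: int, c: int) -> float:
    """Proportion of disease-positive subjects (a + c) with a positive test (a)."""
    return a / (a + c)


def specificity(b: int, d: int) -> float:
    """Proportion of disease-negative subjects (b + d) with a negative test (d)."""
    return d / (b + d)


# Hypothetical 2x2 counts (not from any study):
# a = true positives, b = false positives, c = false negatives, d = true negatives
a, b, c, d = 45, 10, 5, 90
print(f"Sensitivity = {sensitivity(a, c):.2f}")
print(f"Specificity = {specificity(b, d):.2f}")
```

Note that each function uses only one column of the table, which is why both can be estimated even when the Disease + and Disease - groups are sampled separately.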

Sensitivity and Specificity

Sensitivity
PID = “Positive In Disease”
Proportion of D+ patients with “+” test result

Specificity
NIH = “Negative in Health”
Proportion of D- patients with “–” test result

Studies of Dx Tests

Importance of Sampling Scheme

If sampling separately from Disease+ and Disease– groups (case-control sampling), cannot calculate prevalence, positive predictive value, or negative predictive value.

Dx Test: Case-Control Sampling

          Disease +                 Disease –
          (Sampled Separately)      (Sampled Separately)
Test +    a  (True Positives)       b  (False Positives)
Test -    c  (False Negatives)      d  (True Negatives)
Total     a + c                     b + d
          (Total With Disease)      (Total Without Disease)

Sensitivity = a/(a + c)
Specificity = d/(b + d)

Dx Test: Cross-sectional Sampling

          Disease +                 Disease -                   Total
Test +    a  (True Positives)       b  (False Positives)        a + b  (Total Positives)
Test -    c  (False Negatives)      d  (True Negatives)         c + d  (Total Negatives)
Total     a + c                     b + d                       a + b + c + d
          (Total With Disease)      (Total Without Disease)     (Total N)

Prevalence = (a + c)/N
Positive Predictive Value = a/(a + b)
Negative Predictive Value = d/(c + d)
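As a minimal sketch of the cross-sectional quantities, the three formulas can be computed from the four cells of the table. The `TwoByTwo` container and the counts are hypothetical, introduced only for illustration, and the results are meaningful only when the whole population of interest was sampled.

```python
from typing import NamedTuple


class TwoByTwo(NamedTuple):
    a: int  # true positives
    b: int  # false positives
    c: int  # false negatives
    d: int  # true negatives


def prevalence(t: TwoByTwo) -> float:
    """(a + c) / N, where N is the total number of subjects."""
    return (t.a + t.c) / (t.a + t.b + t.c + t.d)


def ppv(t: TwoByTwo) -> float:
    """Positive predictive value: a / (a + b)."""
    return t.a / (t.a + t.b)


def npv(t: TwoByTwo) -> float:
    """Negative predictive value: d / (c + d)."""
    return t.d / (t.c + t.d)


# Hypothetical cross-sectional sample of 200 patients (not from any study).
table = TwoByTwo(a=30, b=20, c=10, d=140)
print(f"Prevalence = {prevalence(table):.2f}")
print(f"PPV = {ppv(table):.2f}")
print(f"NPV = {npv(table):.2f}")
```

Unlike sensitivity and specificity, each of these uses cells from both disease columns, so the ratio of diseased to non-diseased subjects in the sample matters.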

Immunohistochemical Test for ARVC*

                                  ARVC
Immunohistochemical Test     Yes     No     Total
Positive                      10      2        12
Negative                       1      9        10
Total                         11     11

Sensitivity = 10/11 = 91%

Specificity = 9/11 = 82%

*N Engl J Med. 2009 Mar 12;360(11):1075-84.

Immunohistochemical Test for ARVC*

                                  ARVC
Immunohistochemical Test     Yes     No     Total
Positive                      10      2        12
Negative                       1      9        10
Total                         11     11

PPV = 10/12 = 83%

NPV = 9/10 = 90%

*N Engl J Med. 2009 Mar 12;360(11):1075-84.

Your patient has a negative result on this test. Does the NPV of 90% mean he still has a 10% chance of ARVC?
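One way to think about this question: the table has 11 subjects with ARVC and 11 without, so the "prevalence" in the sample is 50%, which looks like a product of separate sampling rather than a property of any clinical population. The sketch below recomputes NPV from sensitivity and specificity at a chosen prevalence using Bayes' rule. The 1% prevalence is a hypothetical value for illustration, not an estimate from the study.

```python
def npv_at_prevalence(sens: float, spec: float, prev: float) -> float:
    """NPV = P(D- | T-) computed from sensitivity, specificity, and prevalence."""
    true_negatives = spec * (1 - prev)      # P(T- and D-)
    false_negatives = (1 - sens) * prev     # P(T- and D+)
    return true_negatives / (true_negatives + false_negatives)


sens, spec = 10 / 11, 9 / 11  # from the ARVC table

# At the 50% "prevalence" built into the table, we recover NPV = 9/10.
print(f"NPV at 50% prevalence: {npv_at_prevalence(sens, spec, 0.50):.3f}")

# At a hypothetical 1% prevalence (assumption for illustration only),
# the chance of disease after a negative test is far below 10%.
print(f"NPV at 1% prevalence:  {npv_at_prevalence(sens, spec, 0.01):.4f}")
```

Under these assumptions, the 90% figure only describes the study sample; the probability of disease after a negative test depends on the pretest probability in the patient at hand.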

Sample Size Calculations for Studies of Diagnostic Test Accuracy

Sensitivity and Specificity are descriptive proportions.*

Choose N with disease to estimate sensitivity with the desired precision.

Choose N without disease to estimate specificity with the desired precision.

*Table 6E, page 91 DCR3
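A rough sketch of this kind of calculation, assuming the usual normal approximation for the confidence interval of a proportion: N = 4 z² p(1 - p) / W², where p is the expected proportion and W is the desired total width of the interval. The function name and the example inputs are hypothetical, and the cited table may round or compute its entries differently.

```python
import math
from statistics import NormalDist


def n_for_proportion(expected: float, total_width: float,
                     confidence: float = 0.95) -> int:
    """Subjects needed to estimate a proportion with a CI of the given total width.

    Uses the normal approximation N = 4 * z^2 * p * (1 - p) / W^2.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(4 * z ** 2 * expected * (1 - expected) / total_width ** 2)


# Hypothetical planning inputs: expected sensitivity 0.90, 95% CI of total
# width 0.10 (i.e., roughly 0.85 to 0.95).
n_diseased = n_for_proportion(expected=0.90, total_width=0.10)
print(f"Subjects WITH disease needed: {n_diseased}")
```

The same function would be called a second time with the expected specificity to size the group without disease.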

Likelihood Ratio

LR(result) = P(result | D+) / P(result | D-)

             P(result) in patients WITH disease
           = -------------------------------------
             P(result) in patients WITHOUT disease

See DCR3, Chapter 12, page 191
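A short sketch of how a likelihood ratio is used: it multiplies the pretest odds of disease to give the posttest odds. The function names are illustrative, and the 20% pretest probability is a hypothetical value, not taken from the lecture.

```python
def likelihood_ratio(p_result_given_disease: float,
                     p_result_given_no_disease: float) -> float:
    """LR(result) = P(result | D+) / P(result | D-)."""
    return p_result_given_disease / p_result_given_no_disease


def posttest_probability(pretest_probability: float, lr: float) -> float:
    """Convert probability to odds, multiply by the LR, convert back."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)


# LR of a positive result in the ARVC table: P(+ | ARVC) = 10/11, P(+ | no ARVC) = 2/11
lr_pos = likelihood_ratio(10 / 11, 2 / 11)
print(f"LR(+) = {lr_pos:.1f}")

# Hypothetical pretest probability of 20% (assumption for illustration only).
print(f"Posttest probability = {posttest_probability(0.20, lr_pos):.2f}")
```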

Sample Size Calculations for Studies of Diagnostic Test Accuracy

Size the sample to estimate a likelihood ratio with the desired precision.

See DCR3, Chapter 12, page 191

Studies of Prognostic Tests for Incident Outcomes

Predictor = Test Result
Outcome = Development of outcome or time to development of outcome.

Design:

Cohort study

Studies of Prognostic Tests for Incident Outcomes

Prognostic test “result” is often a probability of having the outcome by a future time point (e.g. risk of death or recurrence by 5 years).

Need to assess both calibration and discrimination.
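To make the two terms concrete, here is a minimal sketch with made-up data. Discrimination is summarized by the c-statistic (the proportion of event/non-event pairs in which the subject with the event had the higher predicted risk), and calibration is checked in its crudest form by comparing the mean predicted risk with the observed proportion of events. The data and function names are hypothetical, and a real analysis would also look at calibration within risk strata.

```python
def c_statistic(predicted, outcome):
    """Proportion of (event, non-event) pairs ranked correctly; ties count 1/2."""
    events = [p for p, y in zip(predicted, outcome) if y == 1]
    non_events = [p for p, y in zip(predicted, outcome) if y == 0]
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def calibration_in_the_large(predicted, outcome):
    """Return (mean predicted risk, observed proportion with the outcome)."""
    return sum(predicted) / len(predicted), sum(outcome) / len(outcome)


# Hypothetical predicted 5-year risks and observed outcomes (1 = event).
predicted = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcome = [0, 0, 0, 1, 0, 1, 1, 1]

print(f"c-statistic = {c_statistic(predicted, outcome):.4f}")
mean_pred, observed = calibration_in_the_large(predicted, outcome)
print(f"Mean predicted = {mean_pred:.2f}, observed = {observed:.2f}")
```

A test can discriminate well and still be poorly calibrated (for example, if every predicted risk were doubled, the c-statistic would not change), which is why both need to be assessed.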

Comparing Predictions

Evidence-Based Diagnosis, Chapter 7

Jan. 30, 2008 Issue of Statistics in Medicine*

*Pencina et al. Stat Med. 2008 Jan 30;27(2):157-72.

Examples

Name that observational study design

JIFee

Babies born at Kaiser with neonatal hyperbilirubinemia (Bili > 25) are compared with randomly selected “controls” from the same birth cohort.

Outcome measure is IQ and neurologic status at age 5 years.

No difference in IQ or fraction with neurologic disability between the “case” and “control” groups.

Newman, T. B., P. Liljestrand, et al. (2006). N Engl J Med 354(18): 1889-900.

JIFee

Design? (Be Careful)

RRISK (Reproductive Risk Factors for Incontinence at Kaiser)

Random sample of 2100 women aged 40-69 yo

Interview, self report, diaries to determine whether they have the outcome, urinary incontinence.

Chart abstraction of L&D/surgical records to establish predictor status

RRISK

Design?

HIV Tropism and Rapid Progression*

Is HIV CXCR4 (as opposed to CCR5) tropism a predictor of rapid progression in acutely infected HIV patients?

Molecular tropism assay is “high end” and labor-intensive. Have funding to perform a total of 80 assays.

UCSF OPTIONS cohort follows patients acutely infected with HIV. Has banked serum from near time of acute infection.

* Vivek Jain’s Project

Identify the 40 patients with the most rapid progression (Group 1) and randomly select 40 others from the UCSF OPTIONS cohort (Group 2).

Run the tropism assay on banked serum for these 80 patients and compare results between Group 1 and Group 2.

HIV Tropism and Rapid Progression (continued)

Design?

HIV Tropism and Rapid Progression

Year of Visit and Provision of Analgesia in ED Patients with RLQ Pain

Subjects: Patients presenting to the SFGH ED with RLQ pain in 1998 and 2003

Predictor: Year of presentation Outcome: Receipt of parenteral analgesia

Neighbor ML, Baird C*, Kohn MA. Changing Opioid Use for Right Lower Quadrant Abdominal Pain in the ED. Acad Emerg Med 2005; 12(12): 1216-20.

*UCSF MSIII

Year of Visit and Provision of Analgesia in ED Patients with RLQ Pain

Analgesia

             Analgesia
Year     Yes     No     Total     Prevalence
2003      72     65       137        53%
1998      43    144       187        23%

Relative Prevalence = 2.3
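The relative prevalence above can be reproduced directly from the table. This is a minimal sketch; the function names are illustrative.

```python
def prevalence(with_outcome: int, total: int) -> float:
    """Proportion of subjects in a group who have the outcome."""
    return with_outcome / total


def relative_prevalence(exposed_yes: int, exposed_total: int,
                        reference_yes: int, reference_total: int) -> float:
    """Prevalence in the exposed group divided by prevalence in the reference group."""
    return (prevalence(exposed_yes, exposed_total)
            / prevalence(reference_yes, reference_total))


# Counts from the table: 2003 is the "exposed" year, 1998 the reference year.
prev_2003 = prevalence(72, 137)
prev_1998 = prevalence(43, 187)
rp = relative_prevalence(72, 137, 43, 187)

print(f"2003: {prev_2003:.0%}  1998: {prev_1998:.0%}  Relative Prevalence = {rp:.1f}")
```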

Year of Visit and Provision of Analgesia in ED Patients with RLQ Pain

Design?

Year of Visit and Provision of Analgesia in ED Patients with RLQ Pain

1. Patients who get abdominal CTs are much more likely to get analgesia (59% vs. 22%)

2. CTs were much more common in 2003 than in 1998 (56% vs 20%)

Is the increase in analgesia rates between 1998 and 2003 wholly explained by increased CT scanning?

Year of Visit and Provision of Analgesia in ED Patients with RLQ Pain

CT YES
Year     Analg     No Analg     Total     Prev.
2003       47         30          77       61%
1998       21         17          38       55%
Prev. Ratio 1.1

CT NO
Year     Analg     No Analg     Total     Prev.
2003       25         35          60       42%
1998       22        127         149       15%
Prev. Ratio 2.8

Year of Visit and Provision of Analgesia in ED Patients with RLQ Pain

Is the increase in analgesia rates between 1998 and 2003 wholly explained by increased CT scanning?

NO. In the group that did not receive CTs, analgesia rate was almost 3x higher in 2003 than in 1998.
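The stratified comparison can be sketched as follows: compute the 2003-vs-1998 prevalence ratio separately within each CT stratum. The counts come from the stratified table; the data layout and names are just one convenient choice.

```python
# (analgesia yes, total) for each year within each CT stratum
strata = {
    "CT YES": {"2003": (47, 77), "1998": (21, 38)},
    "CT NO": {"2003": (25, 60), "1998": (22, 149)},
}


def prevalence_ratio(stratum: dict) -> float:
    """Prevalence of analgesia in 2003 divided by prevalence in 1998."""
    yes_2003, total_2003 = stratum["2003"]
    yes_1998, total_1998 = stratum["1998"]
    return (yes_2003 / total_2003) / (yes_1998 / total_1998)


for name, stratum in strata.items():
    print(f"{name}: Prev. Ratio = {prevalence_ratio(stratum):.1f}")
```

If increased CT scanning fully explained the change, the ratio within each stratum would be close to 1; the ratio in the CT NO stratum is what argues against that.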

Enhancing Causal Inference

Stratification

DCR 3rd Ed. Chapter 9 pp. 137.

Also, Appendix 9A

Smoking as a confounder of the relationship between coffee drinking and MI.

Causal Inference and Confounding in Observational Studies

(Next Week)

B-hCG Example if Time

B-hCG and Ectopic Pregnancy

Subjects: All women with non-zero serum B-hCGs presenting to the SFGH ED for abdominal pain or vaginal bleeding between 9/1/96 and 6/30/99.

Predictor: B-hCG Level

Outcome: Pregnancy type (ectopic, spontaneous ab, normal IUP) determined on medical record review by trained abstractors using explicit criteria

Kohn MA, et al. Acad Emerg Med 2003;10(2):119-26.

B-hCG and Ectopic Pregnancy

Results: 845 patients, pregnancy type (ectopic vs. intrauterine) could not be established in 115, leaving 730 for analysis

HCG        EP     IUP     Total     Risk
< 1500     40     118       158      25%
> 1500     56     516       572      10%
Total      96     634       730

Risk Ratio 2.59

B-hCG and Ectopic Pregnancy

Design?

B-hCG and Ectopic Pregnancy

HCG        EP     IUP     Total     Risk
< 1500     40     118       158      25%
> 1500     56     516       572      10%
Total      96     634       730      13%

Risk Ratio 2.59

Your pregnant patient with abdominal pain has an HCG < 1500. Is her risk of ectopic pregnancy 25%?

B-hCG and Ectopic Pregnancy

β-hCG Distribution of Ectopic Pregnancy (EP) and Intrauterine Pregnancy (IUP)

β-hCG (mIU/mL)     EP            IUP            Likelihood Ratio
< 1500             40   42%      118   19%      2.24 (1.68 - 2.98)
1500-50000         55   57%      313   49%      1.16 (0.96 - 1.40)
> 50000*            1    1%      203   32%      0.03 (0.01 - 0.23)
Total              96  100%      634  100%

*Sensitivity and specificity of B-hCG < 50,000 mIU/mL for EP were 0.99 (95% CI: 0.94 - 1.00) and 0.32 (0.28-0.36)
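The point estimates in the Likelihood Ratio column can be reproduced from the counts: for each β-hCG interval, divide the proportion of EP patients in that interval by the proportion of IUP patients in that interval. This sketch covers the point estimates only, not the intervals shown in parentheses; the variable names are illustrative.

```python
# (EP count, IUP count) for each beta-hCG interval, from the table
counts = {
    "< 1500": (40, 118),
    "1500-50000": (55, 313),
    "> 50000": (1, 203),
}

ep_total = sum(ep for ep, _ in counts.values())     # 96
iup_total = sum(iup for _, iup in counts.values())  # 634


def interval_lr(ep: int, iup: int) -> float:
    """LR(result) = P(result | EP) / P(result | IUP)."""
    return (ep / ep_total) / (iup / iup_total)


for interval, (ep, iup) in counts.items():
    print(f"{interval:>12}: LR = {interval_lr(ep, iup):.2f}")
```

Because each LR divides one within-column proportion by another, it does not depend on the mix of EP and IUP patients in the sample.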