Case-Control Studies November 18 2004 Epidemiology 511 W. A. Kukull


History

• Lane-Claypon (1926): first case-control study, of reproductive experience and breast cancer

• Sociology used case-control methods in the 1920s and 1930s

• Wynder & Graham (1950) and others linked cigarette smoking to lung cancer

• Cornfield (1951) direct standardization to control confounding

• Mantel and Haenszel (JNCI, 1959) stratified analysis

Case Control Studies

[Diagram: Cases (with disease) and Controls (no disease) are each classified by history of exposure: Exposed vs. Not Exposed.]

Population-based, Incident, Case-Control Study

[Diagram: from the Study Base or Population, enroll all new cases of Disease “X” that meet study criteria, plus a “sample” of persons without Disease “X” (controls).]

“Controls” must be able to become diseased and must be sampled without regard to “exposure”.

“Study base” concepts

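[Diagram: Eligible subjects become cohort study enrollees and are followed over time. Some are lost, die, or refuse before disease develops; the rest end follow-up as disease cases or non-diseased. Questions posed for a case-control study drawn from this base: Enroll all? Sample? Bias?]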

Nested Case Control

[Diagram: Cohort study population (exposed and unexposed) splits into those who developed disease (Cases) and those who did not develop disease. “Controls” are sampled or selected from the remaining unaffected when each case is diagnosed.]

Use data or biomarkers collected at cohort entry to determine exposure status for cases and controls.
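Selecting controls from those still unaffected when each case is diagnosed is often called risk-set (incidence-density) sampling. The following is a minimal Python sketch of that idea, not the lecture's own procedure; the record keys `id`, `exit_time`, and `is_case`, the function name, and the toy cohort are all hypothetical.

```python
import random

def sample_risk_set_controls(cohort, n_controls=1, seed=0):
    """Risk-set sampling sketch.

    cohort: list of dicts with hypothetical keys
        'id', 'exit_time' (time of diagnosis or end of follow-up), 'is_case'.
    Returns a list of (case_id, [control_ids]) tuples, ordered by diagnosis time.
    """
    rng = random.Random(seed)
    matched_sets = []
    cases = sorted((p for p in cohort if p["is_case"]),
                   key=lambda p: p["exit_time"])
    for case in cases:
        t = case["exit_time"]
        # Risk set: everyone still under follow-up and unaffected at time t.
        # Someone who becomes a case later is still eligible as a control now.
        # Exposure is never consulted, so sampling is blind to exposure.
        risk_set = [p for p in cohort
                    if p["id"] != case["id"] and p["exit_time"] > t]
        k = min(n_controls, len(risk_set))
        controls = rng.sample(risk_set, k)
        matched_sets.append((case["id"], [c["id"] for c in controls]))
    return matched_sets

# Toy cohort (made-up values for illustration only)
cohort = [
    {"id": 1, "exit_time": 2.0, "is_case": True},
    {"id": 2, "exit_time": 5.0, "is_case": False},
    {"id": 3, "exit_time": 4.0, "is_case": True},
    {"id": 4, "exit_time": 1.0, "is_case": False},  # lost before the first case
    {"id": 5, "exit_time": 6.0, "is_case": False},
]
sets = sample_risk_set_controls(cohort, n_controls=2)
```

In this toy cohort, subject 4 leaves follow-up before any case occurs and so can never be chosen, while subject 3 may serve as a control for subject 1 before becoming a case.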

Selecting Cases

• Must cases be “representative” of all persons who have the disease?
– What about female cases?
– Severe cases?
– Early-onset cases?
– Cases from Omaha?

• They must be selected independently of exposure (and define a study base)

Principles of Comparability (after Wacholder et al., AJE 1992;135:1019-28)

• Case-control comparisons should be made within subjects from the same study base; (selection bias)

• Effects of other factors on the disease-exposure association being studied should be minimized; (confounding)

• Errors in exposure measurement should be non-differential; (information bias)

Case-Control comparability (after Koepsell & Weiss, 2003)

• Comparability and “representativeness”
– If each of the study cases had not developed the disease, would they have been included in the study base/population?
– If each of the non-cases in the study base/population had developed the disease, would they have been included as a case?

• Can we characterize the study base/population?

Representativeness? Ambiguous interpretation

• Cases may be restricted to any type of case
– The case definition will define the “study base” or source population for controls

• Cases do not need to be a “random sample” of the entire diseased universe to be valid
– Case selection and inclusion criteria will affect the research questions that can be answered

Selecting Cases

• Disease criteria: clinical or histopathological evidence?

• Hospitalized cases or cases from a registry?
– May need to include more than one hospital

• Incident or prevalent cases?
– Survival bias? A prevalent case sample may miss persons dying early in the course of disease
– Health services factors plus risk factors

Cohort design similarities

• Suppose we could find all new cases of ALS in a particular town as they occur

• Could we conduct a cohort study?
– We know the population of the town
– How would we measure exposures for everyone?

• Could we take a “sample” of the persons without ALS and compare them to cases?

Sources of Controls: Are they part of the same study base?

• Community or population-based
– RDD (random-digit dialing)

• Friend or spouse

• Neighborhood

• Hospitalized patients
– unrelated to exposure; multiple diagnoses

• Medicare and government lists

• HMO enrollees

Timing

• Specify a reference time
– e.g., diagnosis for cases, similar time for controls

• Determine exposure before reference time
– later exposures don’t count

• What if enrolled controls later get the disease of interest?
– Controls at enrollment are compared to cases at enrollment
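The reference-time rule can be expressed as a simple filter: only exposures dated before a subject's reference date count. This is a minimal Python sketch under that reading; the function name, the record layout, and all dates are hypothetical.

```python
from datetime import date

def exposed_before_reference(exposure_dates, reference_date):
    """Return True if any exposure occurred strictly before the reference date."""
    return any(d < reference_date for d in exposure_dates)

# Hypothetical subjects: the case uses the diagnosis date as reference;
# the control is assigned a comparable reference date.
case = {
    "reference": date(2004, 3, 1),
    "exposures": [date(2001, 6, 15), date(2004, 5, 2)],
}
control = {
    "reference": date(2004, 3, 1),
    "exposures": [date(2004, 4, 10)],  # after the reference date: does not count
}

case_exposed = exposed_before_reference(case["exposures"], case["reference"])
control_exposed = exposed_before_reference(control["exposures"], control["reference"])
```

Here the case counts as exposed because of the 2001 exposure, and the control counts as unexposed because the only exposure falls after the reference date.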

Exposure and Onset: how we think of onset influences potential relevant exposures

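[Timeline diagram: Biologic onset of disease, then Disease detectable by screen, then Sx/Dx, then Outcome. Along the timeline, potentially effective exposures (the critical period) are distinguished from irrelevant exposures, with an uncertain region marked “???”.]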

Comparable exposure periods for controls? Setting a “reference age/time”

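[Timeline diagram on an Age/time axis: Case age at biologic onset of disease, then Disease first detectable by screen, then Case age at Sx/Dx, then Outcome. Potentially effective exposures (the critical period) are distinguished from irrelevant exposures, with an uncertain region marked “???”.]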

Proxy Respondents: When the subject can’t respond

• Spouse, sibling, child, friend

• Use for both case and control
– when proxies systematically over- or underestimate exposure
– when control responses and their proxies are poorly correlated
– when there is no information relating proxy and subject responses

Choosing controls (1)

• We want to study computer use as a risk factor for carpal tunnel syndrome among 16-year-old women

• We find cases through the hospital neurology clinic

• We enroll the best friend of each case (same age and sex), who has no symptoms, as a control, for a paired case-control design


• We want to study the effect of smoking on carotid artery stenosis/occlusion

• Cases selected from UWMC vascular clinic
– carotid Doppler: >70% stenosis

• Controls are selected from UWMC pulmonary clinic
– carotid Doppler: <20% stenosis

• Determine smoking Hx for each
– What would we expect to find?


• Selection bias
– controls representative of the study base?
– selection related to exposure hx?

• Information bias (recall and other)
– is the exposure measured with the same accuracy in cases as in controls?

• Residual confounding: unmeasured factors

• Statistical power

Example: study base (after MacMahon and Trichopoulos)

• Case-control study of induced abortion (IA) and subsequent ectopic pregnancy (EP)
– Cases: 26 women with EP and one previous pregnancy
– 3 controls for each case from the same maternity hospital (matched on age, education and pregnancy order)

• Result : odds ratio of 10.0

Example: IA and EP (2)

• Were controls representative of the case “study base”?

• Controls were generally completing pregnancy (in general, women completing pregnancy were less likely to have had an IA)

• EP cases are diagnosed early in gestation

• A later study, with only “new” pregnancy controls, showed OR = 1.9, not 10.0

Measuring Exposure

• Recall bias
– Are cases more/less likely to recall exposure than controls?

• Limitations of recall
– Is the person’s recall valid?

• Comparability in cases and controls

• Validity (accuracy) of measures

Comparability of Exposure Information

• Does incompleteness or inaccuracy occur to a different degree in cases vs. controls?
– Same degree?

• What would happen to the odds ratio in either case?
– attenuation (reduction toward null value)?
– exaggeration?
– spurious association?

Comparability Considerations: Exposure Measurement

• Provisions for “sensitive” questions (illicit drug use, sex, income)

• Biologic specimens: lab methods, storage degradation

• Excessively long interviews
– psychological testing
– food frequency
– occupational and extended family history

Obtaining Comparability: Exposure measurement

• Keep staff unaware of hypothesis and case/control status

• Place and circumstances of interview
– equal proportions of cases and controls interviewed by each staff member

• Use information recorded prior to time of diagnosis (hospital or pharmacy records)

• Direct Measurement (EMF, radiation)

Evidence of Comparability

• Are there similar proportions of “missing” data in cases and controls?
– Does the time duration of interviews differ?

• Ask about etiologically irrelevant characteristics and assess response
– e.g., if studying pancreatic cancer and coffee, ask also about tonsillectomy and hemorrhoids

• More than one source

Example: True classification

                    MI      No MI
Illegal drug        135     50
No illegal drug     95      180
Total               230     230

OR = 5.1
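The odds ratio for the true-classification table is the cross-product of its four cells. A minimal Python sketch, with the function name `odds_ratio` chosen here for illustration:

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Cross-product odds ratio for a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Counts from the true-classification table: 135 and 95 among MI cases,
# 50 and 180 among controls.
true_or = odds_ratio(135, 95, 50, 180)
print(round(true_or, 1))  # 5.1
```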

Example: Reporting accuracy cases .90; controls .20
Differential misclassification

                    MI (case)   No MI (control)
Illegal drug        112         10
No illegal drug     118         220
Total               230         230

OR = 20.8

Example: Reporting accuracy cases .20; controls .20
Non-differential misclassification

                    MI (case)   No MI (control)
Illegal drug        27          10
No illegal drug     203         220
Total               230         230

OR = 2.9

Example summary

• Differential misclassification may bias the odds ratio in either direction

• Non-differential misclassification usually biases toward the null (1.0)
– under some circumstances it may bias away from the null
– Don’t always trust it to underestimate the true effect
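The misclassification examples can be approximated by applying a reporting accuracy to the truly exposed counts (135 cases, 50 controls) and recomputing the odds ratio. This Python sketch assumes that accuracy means the fraction of truly exposed subjects who report exposure and that unexposed subjects never over-report; both assumptions and all function names are illustrative. With .20 in both groups it reproduces the printed counts (27 and 10) and OR of about 2.9. With .90 for cases the same arithmetic gives 121.5 exposed cases rather than the 112 printed, so the sketch's differential OR (about 24.6) differs from the printed 20.8, although the direction of the bias is the same.

```python
def observed_table(true_exposed_cases, true_exposed_controls,
                   n_cases, n_controls, case_accuracy, control_accuracy):
    """Apply under-reporting of exposure to a true 2x2 table.

    Assumption of this sketch: only truly exposed subjects can be
    misclassified (as unexposed); unexposed subjects never over-report.
    Returns (a, b, c, d) = exposed cases, unexposed cases,
    exposed controls, unexposed controls as observed.
    """
    a = true_exposed_cases * case_accuracy
    c = true_exposed_controls * control_accuracy
    b = n_cases - a
    d = n_controls - c
    return a, b, c, d

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio."""
    return (a * d) / (b * c)

# Same accuracy in both groups (non-differential)
nondiff = observed_table(135, 50, 230, 230, 0.20, 0.20)
# Cases report far more completely than controls (differential)
diff = observed_table(135, 50, 230, 230, 0.90, 0.20)

print(round(odds_ratio(135, 95, 50, 180), 1))  # true OR
print(round(odds_ratio(*nondiff), 1))          # biased toward the null
print(round(odds_ratio(*diff), 1))             # biased away from the null
```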

Example: Recall Bias Comparability of information

• We are studying prenatal maternal infection and congenital malformations.

• True incidence of infection is 15% for both cases and controls

• If case mothers recall 60% of their true infections and control mothers recall 10% of theirs

• Results will show a 9% infection rate in cases and 1.5% among controls (ratio of proportions 6.0; OR ≈ 6.5), even though the true OR is 1.0
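The recall-bias arithmetic can be checked directly. In this minimal Python sketch (function names are illustrative), the ratio of the reported proportions is 6.0, while the odds ratio computed from the same proportions is about 6.5; the true odds ratio is 1.0 because true infection frequency is 15% in both groups.

```python
def reported_proportion(true_proportion, recall_fraction):
    """Proportion reporting exposure when only a fraction of true exposures is recalled."""
    return true_proportion * recall_fraction

def odds(p):
    """Convert a proportion to odds."""
    return p / (1 - p)

p_cases = reported_proportion(0.15, 0.60)     # 9% of case mothers report infection
p_controls = reported_proportion(0.15, 0.10)  # 1.5% of control mothers report infection

proportion_ratio = p_cases / p_controls
apparent_or = odds(p_cases) / odds(p_controls)
true_or = odds(0.15) / odds(0.15)

print(round(proportion_ratio, 1), round(apparent_or, 1), true_or)
```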

Matching: to reduce potential confounding by design

• Individual matching requires a “matched” analysis
– pairs are the unit of analysis; OR = b/c (ratio of the two kinds of discordant pairs)

• Frequency matching uses a standard, stratified or unmatched analysis
– May require that cases be enrolled before controls
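For individually matched pairs, OR = b/c uses only the discordant pairs. This Python sketch assumes the common labelling in which b counts pairs where the case is exposed and the control is not, and c counts the reverse; the function name and the pair counts are made up for illustration.

```python
def matched_pair_or(pairs):
    """Matched-pair odds ratio from (case_exposed, control_exposed) booleans.

    b = pairs with case exposed, control unexposed
    c = pairs with case unexposed, control exposed
    Concordant pairs do not contribute to the estimate.
    """
    b = sum(1 for case_exp, ctrl_exp in pairs if case_exp and not ctrl_exp)
    c = sum(1 for case_exp, ctrl_exp in pairs if ctrl_exp and not case_exp)
    if c == 0:
        raise ValueError("OR is undefined when there are no pairs of type c")
    return b / c

# Hypothetical study of 100 matched pairs
pairs = (
    [(True, True)] * 10      # both exposed (concordant)
    + [(True, False)] * 30   # b: only the case exposed
    + [(False, True)] * 10   # c: only the control exposed
    + [(False, False)] * 50  # neither exposed (concordant)
)
print(matched_pair_or(pairs))  # 3.0
```

The 60 concordant pairs have no effect on the result, which is why heavy matching on factors tied to exposure can waste information.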

Indications for Matching

• If the unmatched groups have little overlap on the factor (and the factor is associated with disease)

• Small studies of rare diseases with several confounding variables

• To account for unmeasured confounders, through a surrogate measure, e.g., neighborhood

• Especially, when the matched factor is a STRONG confounder

Matching: Potential problems

• Finding a match for the 93 y/o man from Ballard, who was a fisherman, is married, has five children, and drinks socially
– Age, neighborhood, occupation, marital status, offspring, alcohol use: too much matching?

• Once you have “matched” on a factor you cannot study that factor
– Why? Because we have artificially established equal proportions of that factor in the cases and the controls

Overmatching

• Matching on a variable intermediate in the causal pathway
– Suppose smoking alters cholesterol, and cholesterol is associated with CVD
– What if we matched on cholesterol?

• Don’t match on factors related to exposure of interest but not to disease
– e.g., contraceptives and embolism: religion is related to contraceptive use but not to embolism

Case Control Studies: Limitations

• Inefficient when exposure is rare

• Cannot compute incidence rates directly

• Sometimes difficult to establish temporal relationship between exposure and disease

• Prone to biases:
– Selection bias
– Information bias, specifically RECALL bias

Case Control Studies: Strengths

• Relatively quick and inexpensive

• Good for diseases with long latent periods

• Optimal for RARE diseases

• Can examine multiple etiologic factors for a single disease

Conclusion

• Define a study base

• Select diseased and non-diseased persons

• Measure history of “exposure”

• Compare exposure hx in cases and controls

• Assess possibility of bias
– misclassification and non-comparability of exposure data
– inappropriate study base sampling; timing
