A systematic error (caused by the investigator
or the subjects) that causes an incorrect (over-
or under-) estimate of an association.
[Figure: a scale of the Risk Ratio, Rate Ratio, or Odds Ratio running from 0 to 10, with 1.0 marking the null.]
Random Error and Bias

[Figure: target diagrams comparing estimates to the true value and the null. Panels show: little random or systematic error; little random error, but biased (precise, but biased: here random error is small, but systematic errors have led to an inaccurate estimate); and both random error and bias.]
Suppose we conducted a study multiple times in an identical way.
Random Error
True Relationship in the Population:
             Diseased        Not
Exposed        20,000    200,000
Unexposed      10,000    500,000

Unbiased Sample (1 in 100):
             Diseased        Not
Exposed           200      2,000
Unexposed         100      5,000

Biased Estimates:
             Diseased        Not             Diseased        Not
Exposed           175      2,000                  100      2,100
Unexposed         125      5,000                  100      5,000

These samples are misleading about the true relationship in the population. There are 4 mechanisms that can produce this distortion when samples are taken.
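The distortion above is easy to verify numerically. A minimal Python sketch (counts taken from the tables above; the helper function is ours, not from the lecture) computes the odds ratio for each table:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table:
    a = exposed & diseased,   b = exposed & not diseased,
    c = unexposed & diseased, d = unexposed & not diseased."""
    return (a * d) / (b * c)

# True relationship in the population
print(odds_ratio(20_000, 200_000, 10_000, 500_000))   # 5.0
# An unbiased 1-in-100 sample preserves the odds ratio
print(odds_ratio(200, 2_000, 100, 5_000))             # 5.0
# Biased samples distort it
print(round(odds_ratio(175, 2_000, 125, 5_000), 2))   # 3.5
print(round(odds_ratio(100, 2_100, 100, 5_000), 2))   # 2.38
```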
Selection Bias: when there is over- or under-sampling that distorts the picture.
Information Bias: when there are errors in classification of exposure or outcome.
Confounding: when other factors affecting the outcome are unevenly distributed between the comparison groups.
Random Error
Selection Bias
Occurs when selection, enrollment, or continued participation
in a study is somehow dependent on the likelihood of having
the exposure of interest or the outcome of interest.
Selection bias can cause an overestimate or underestimate of the association.
Selection Bias
Selection bias can occur in several ways:

1. Selection of a comparison group ("controls") that is not representative of the population that produced the cases in a case-control study. (Control selection bias)
2. Differential loss to follow-up in a cohort study, such that the likelihood of being lost to follow-up is related to outcome status and exposure status. (Loss to follow-up bias)
3. Refusal, non-response, or agreement to participate that is related to the exposure and disease. (Self-selection bias)
Selection bias can occur in a case-control study if controls are more (or less) likely to be selected if they have the exposure.

Do women of low SES have a higher risk of cervical cancer?
Cases: 100 hospital cases from MGH.
Controls: 200 controls from a door-to-door survey of the neighborhood around the hospital during the work day.
Selection Bias in a Case-Control Study
Problems:
1. The SES of people living around the hospital may generally differ from that of the population that produced the cases.
2. The door-to-door method of selecting controls may tend to select people of lower SES.
Selection Bias in a Case-Control Study

True:
             Case   Control
Low SES        75       100
High SES       25       100
OR = 3.0

Control selection bias (more likely to select a control with low SES):
             Case   Control
Low SES        75       120
High SES       25        80
OR = 2.0
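The same arithmetic can be sketched in Python, in the exposure-odds form (counts from the tables above; the function name is ours):

```python
def exposure_odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """OR = (odds of exposure among cases) / (odds of exposure among controls)."""
    return (case_exp / case_unexp) / (ctrl_exp / ctrl_unexp)

# Representative controls: the true OR
print(exposure_odds_ratio(75, 25, 100, 100))   # 3.0
# Door-to-door controls over-sample low SES: biased OR
print(exposure_odds_ratio(75, 25, 120, 80))    # 2.0
```

Over-sampling exposed controls inflates the denominator odds, pulling the estimate toward the null here; under-sampling them would push it away.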
Selection bias is not caused by differences in
other potential risk factors (confounding).
It is caused by selecting controls who are more
(or less) likely to have the exposure of interest.
Selection bias can occur in a case-control
study when controls are more (or less) likely
to be selected if they have the exposure.
If a control had developed cervical cancer,
would she have been included in the case
group? (“Would” criterion)
If the answer is “not necessarily,” then there
is likely to be a problem with selection bias.
Control Selection Bias
- The “Would” Criterion
Are the controls a representative sample
of the population that produced the cases?
If low SES were associated with cervical cancer with OR = 3.0, MA would look like this (2,000,000 women over age 20 in MA, and about 200 cases of cervical cancer per year):

Entire Population:
                     Cancer Cases      Normal
Low SES (<median)             150   1,000,000
High SES (>median)             50   1,000,000
Sample:
             Cancer Cases   Controls
Low SES                75        120
High SES               25         80

Cases are referred to MGH from all over, so their SES distribution is the same as the state's, i.e. 3 to 1. But controls selected from the area around MGH may have lower SES than MA as a whole.

OR = (75/25) / (120/80) = 2.0 (Biased)
Are mothers of children with
hemi-facial microsomia
more often diabetic?
Cases are referred, but what if
controls are selected from the
general pediatrics ward at MGH?
The referral mechanism of controls might be very
different from that of the cases with microsomia.
Could mothers of controls be more or less likely
to be diabetic than the cases (regardless of any
association between diabetes and microsomia)?
How would you select controls for this study?
Selection bias can be introduced into case-control studies with
low response or participation rates if the likelihood of responding
or participating is related to both the exposure and outcome.
Example: A case-control study explored an association between
family history of heart disease (exposure) and the presence of
heart disease in subjects. Volunteers are recruited from an
HMO. Subjects with heart disease may be more likely to
participate if they have a family history of disease.
Self-Selection Bias in a Case-Control Study

True:
             Case   Control
Fam. Hx+      300       200
Fam. Hx-      200       300
OR = 2.25

With self-selection (participation rates in parentheses):
             Case        Control
Fam. Hx+     240 (80%)   120 (60%)
Fam. Hx-     120 (60%)   180 (60%)
OR = 3.0

The best solution is to work toward high participation (>80%) in all groups.
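The effect of unequal participation can be reproduced with a short Python sketch (counts and participation rates from the tables above; the dictionary keys are our own shorthand):

```python
def odds_ratio(a, b, c, d):
    # (case-exposed x control-unexposed) / (case-unexposed x control-exposed)
    return (a * d) / (b * c)

# Full source population (TRUTH)
full = {"case+": 300, "case-": 200, "ctrl+": 200, "ctrl-": 300}
# Participation: 80% among cases with a family history, 60% everywhere else
rates = {"case+": 0.80, "case-": 0.60, "ctrl+": 0.60, "ctrl-": 0.60}
obs = {k: full[k] * rates[k] for k in full}

print(odds_ratio(full["case+"], full["case-"], full["ctrl+"], full["ctrl-"]))  # 2.25
print(odds_ratio(obs["case+"], obs["case-"], obs["ctrl+"], obs["ctrl-"]))      # 3.0
```

Because the extra participation falls in only one cell (exposed cases), the observed OR is pushed away from the truth.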
In a retrospective cohort study selection bias
occurs if selection of exposed & non-exposed
subjects is somehow related to the outcome.
What will be the result if the investigators are
more likely to select an exposed person if
they have the outcome of interest?
Selection Bias in a
Retrospective Cohort Study
Example:
Investigating occupational exposure (an organic
solvent) occurring 15-20 yrs. ago in a factory.
Exposed & unexposed subjects are enrolled based
on employment records, but some records were lost.
Suppose there was a greater likelihood of retaining
records of those who were exposed & got disease.
Selection Bias in a
Retrospective Cohort Study
True:
             Diseased    Not
Exposed           100    900
Unexposed          50    950
RR = 2.0

With selection bias:
             Diseased    Not
Exposed            99    720
Unexposed          40    760
RR = 2.42

20% of employee health records were lost or discarded, except among "solvent" workers who reported illness (1% loss). Workers in the exposed group were more likely to be included if they had the outcome of interest.
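A Python sketch of the record-loss arithmetic (retention fractions taken from the slide: 99% for exposed workers who reported illness, 80% for everyone else):

```python
def risk_ratio(exp_d, exp_n, unexp_d, unexp_n):
    """RR = risk in the exposed / risk in the unexposed."""
    return (exp_d / (exp_d + exp_n)) / (unexp_d / (unexp_d + unexp_n))

true_rr = risk_ratio(100, 900, 50, 950)            # 2.0
# Differential record retention
kept = [round(100 * 0.99), round(900 * 0.80),
        round(50 * 0.80), round(950 * 0.80)]       # [99, 720, 40, 760]
biased_rr = risk_ratio(*kept)
print(true_rr, round(biased_rr, 2))                # 2.0 2.42
```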
Differential “referral” or
diagnosis of subjects
Rubber workers vs. the general population: how do mortality rates compare?

The general population is often used in occupational studies of mortality, since data are readily available, and it is mostly unexposed.
The main disadvantage is bias by the “healthy worker
effect.” The employed work force (mostly healthy)
generally has lower rates of mortality and disease than
the general population (with healthy & ill people).
The “Healthy Worker” Effect
Can be considered a form of selection bias
because the general population controls have a
higher probability of getting the outcome (death).
Comparing Mortality In:
• Exposed work force vs. unexposed work force: true RR = 1.3
• Exposed work force vs. overall population: apparent RR = 1.1

The overall population carries additional risk in the unemployed, and so is more likely to have the outcome than the employed work force that produced the cases.
Enrollment into a prospective cohort study will not be
biased by the outcome, because the outcome has not
occurred at enrollment.
However, prospective cohort studies can have selection bias if the exposure groups have differential retention of subjects with the outcomes of interest. This can cause either an over- or under-estimate of the association.
Differential Retention (Loss to Follow Up)
in Prospective Cohort Studies
Selection Bias in a Prospective Cohort Study

True:
         TE    Normal
OC+      20     9,980
OC-      10     9,990
RR = 2.0

Loss-to-follow-up bias (more 'events' lost in one exposure group):
         TE    Normal
OC+       8     5,980
OC-       8     5,990
RR = 1.0
Differential loss to follow-up in a prospective cohort study on oral contraceptives (OC) & thromboembolism (TE). If OC were associated with TE with RR = 2.0 (TRUTH), the 2x2 table for all subjects would look like this:

Without Losses:
         TE    Normal
OC+      20     9,980
OC-      10     9,990

There is 40% loss to follow-up overall, but a greater tendency to lose OC users with TE results in a de facto selection. If OC users with TE are more likely to be lost than non-OC users with TE:

Final Sample:
         TE    Normal
OC+       8     5,980
OC-       8     5,990

RR = (8/5,988) / (8/5,998) = 1.0 (Biased)
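The loss-to-follow-up arithmetic, sketched in Python with the counts from the slide:

```python
def risk_ratio(exp_d, exp_n, unexp_d, unexp_n):
    """RR = risk in the exposed / risk in the unexposed."""
    return (exp_d / (exp_d + exp_n)) / (unexp_d / (unexp_d + unexp_n))

# Full cohort (TRUTH): RR = 2.0
print(risk_ratio(20, 9_980, 10, 9_990))              # 2.0
# After ~40% loss overall, with OC users who had TE lost most often
print(round(risk_ratio(8, 5_980, 8, 5_990), 2))      # 1.0
```

Losing events preferentially from one exposure group erases the true two-fold difference entirely.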
Observation Bias (Information Bias)

A biased measure of association due to incorrect categorization: subjects are misclassified with respect to their risk factor status or their outcome, i.e., errors in classification.

Non-differential misclassification (random): if errors are about the same in both groups, they tend to minimize any true difference between the groups (bias toward the null).

Differential misclassification (non-random): if information is better in one group than another, the association may be over- or underestimated.
Misclassification Bias
Non-Differential Misclassification

Occurs when errors in exposure or outcome status occur with approximately equal frequency in the groups being compared.

1. Equally inaccurate memory of exposures in both groups.
   Example: Case-control study of heart disease and past activity: difficulty remembering your specific exercise frequency, duration, and intensity over many years.
2. Recording and coding errors in records and databases.
   Example: ICD-9 codes in hospital discharge summaries.
3. Using surrogate measures of exposure.
   Example: Using prescriptions for anti-hypertensive medications as an indication of treatment.
4. Non-specific or broad definitions of exposure or outcome.
   Example: "Do you smoke?" to define exposure to tobacco smoke (vs. how much, how often, how long).
Recording or Coding Errors:
Example:
When patients are discharged, the MD dictates a
summary which is transcribed. Diagnoses and
procedures noted on the summary are encoded (ICD-9
codes) and sent to the MA Health Data Consortium.
1. MDs don’t list all relevant diagnoses.
2. Coders assign incorrect codes (they aren’t MDs).
Errors occur in 25-30% of records.
Non-Differential Misclassification
Example: A case-control study comparing CAD cases & controls for history of diabetes. Only half of the diabetics are correctly recorded as such, in cases and controls alike.

True Relationship:
              CAD   Controls
Diabetes       40        10
No diabetes    60        90
OR = (40 x 90) / (10 x 60) = 6.0

With Nondifferential Misclassification:
              CAD   Controls
Diabetes       20         5
No diabetes    80        95
OR = (20 x 95) / (5 x 80) = 4.75
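The "half of diabetics missed" scenario amounts to a sensitivity of 50% applied equally to both groups. A minimal Python sketch (counts from the tables above; the helper names are ours):

```python
def odds_ratio(a, b, c, d):
    # (case-exposed x control-unexposed) / (case-unexposed x control-exposed)
    return (a * d) / (b * c)

def misclassify(diabetic, non_diabetic, sensitivity=0.5):
    """Record only `sensitivity` of true diabetics as diabetic; the rest are
    counted as non-diabetic. The same error rate applies to both groups, so
    the misclassification is non-differential."""
    recorded = diabetic * sensitivity
    return recorded, non_diabetic + (diabetic - recorded)

true_or = odds_ratio(40, 60, 10, 90)                              # 6.0
obs_or = odds_ratio(*misclassify(40, 60), *misclassify(10, 90))   # 4.75
print(true_or, obs_or)
```

Because the same fraction of exposed subjects leaks into the unexposed cell in each group, the estimate moves toward the null, never past it.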
Non-Differential Misclassification

Effect: With a dichotomous exposure (e.g., smoking vs. non-smoking), non-differential misclassification minimizes differences & causes an underestimate of effect, i.e. "bias toward the null." ("Null" means no difference: a relative risk of 1.0.)
Validation

When data from the Nurses' Health Study were used to examine the association between obesity and heart disease, information on exposure (BMI) was obtained from self-reported weights on a questionnaire. Could they have under-reported?

"Self-reported weights were validated in a subsample of 184 NHS participants living in the Boston, MA area and were highly correlated with actual measured weights (r = 0.96)."

Cho E, Manson JE, et al.: A Prospective Study of Obesity and Risk of Coronary Heart Disease Among Diabetic Women. Diabetes Care 25:1142–1148, 2002.
Differential Misclassification

Occurs when errors in classification of exposure or outcome are more frequent in one group than the other.

• Differences in accurately remembering exposures (unequal).
  Example: Mothers of children with birth defects will remember the drugs they took during pregnancy better than mothers of normal children (maternal recall bias).
• Interviewer or recorder bias.
  Example: The interviewer has a subconscious belief about the hypothesis.
• More accurate information in one of the groups.
  Example: Case-control study with cases from one facility and controls from another, with differences in record keeping.
(Differential) Recall Bias

People with disease may remember exposures differently (more or less accurately) than those without disease. (Note: If the groups have the same % of errors based on faulty memory, that's non-differential misclassification.)

To Minimize:
• Use a control group that has a different disease (unrelated to the disease under study).
• Use questionnaires that are constructed to maximize accuracy and completeness. Ask specific questions. More accuracy means fewer differences.
• For socially sensitive questions, such as alcohol and drug use or sexual behaviors, use a self-administered questionnaire instead of an interviewer.
• If possible, assess past exposures from biomarkers or from pre-existing records.
(Differential) Interviewer Bias (& recorder bias in record reviews)

Systematic differences in soliciting, recording, or interpreting information ("Are you sure? Think harder!").

Minimized by:
• Blinding the interviewers if possible.
• Using standardized questionnaires consisting of closed-end, easy-to-understand questions with appropriate response options.
• Training all interviewers to adhere to the question-and-answer format strictly, with the same degree of questioning for both cases and controls.
• Obtaining data or verifying data by examining pre-existing records (e.g., medical records or employment records) or assessing biomarkers.
Effects of Bias

Non-differential misclassification biases toward the null. Recall bias and interviewer bias are differential and can bias toward or away from the null.
Misclassification of Outcome
Can Also Introduce Bias
… but it usually has much less of an impact than
misclassification of exposure, because:
1. Most of the problems with misclassification occur with
respect to exposure status, not outcome.
2. There are a number of mechanisms by which
misclassification of exposure can be introduced, but
most outcomes are more definitive and there are few
mechanisms that introduce errors in outcome.
3. Most outcomes are relatively uncommon.
4. Misclassification of outcome will generally bias toward
the null, so if an association is demonstrated, if
anything the true effect might be slightly greater.
A study is conducted to see if serum cholesterol screening reduces the rate of heart attacks. 1,500 members of an HMO are offered the opportunity to participate in the screening program, & 600 volunteer to be screened. Their rates of MI are compared to those of randomly selected members who were not invited to be screened. After 3 years of follow-up, rates of MI are found to be significantly less in the screened group.

Any concerns?
1. Nope
2. Differential misclassification
3. Interviewer bias
4. Recall bias
5. Selection bias
Abdominal Aortic Aneurysm (AAA)
Background Information on Abdominal Aortic Aneurysms
Usually asymptomatic (surgery if > 5 cm.)
Discovered during routine abdominal
exam by palpation, or
Seen on x-ray or ultrasound of
abdomen (done for other reasons).
Diagnosis of Abdominal Aortic Aneurysms
Known risk factors:
Age
Male gender
Smoking
Hypertension
A vascular surgery (referral) service in So. Africa reviewed records of elective peripheral vascular surgery. (Costa & Robbs: Br. J. Surg. 1986)

           AAA   Other*   Total
Black       60    1,242   1,302
White      260      620     880
Total      320    1,862
*'Other': a variety of readily apparent conditions.

OR = 0.12 (0.09 – 0.15)

Conclusion: AAA is uncommon in Blacks and more often due to infections.
Could there have been selection bias?
1. Yes
2. No
A possibility of misclassification?

"AAA in blacks are more often due to infectious causes."

                             Blacks   Whites
Atherosclerotic                 34%      99%
Inflammatory or Infectious      47%     0.5%
Uncertain etiology              19%     0.0%

"All black patients were screened for TB … and for syphilis."

1. No
2. Yes, random.
3. Yes, differential.
More Details About the Study

                                White   Black
Male:Female                       2:1     1:1
Mean age                         49.4    67.1
Admitted for uncontrolled HBP      0%     17%
Smoking                           76%     48%

(Known risk factors: age, male gender, smoking, hypertension)
Environmental tobacco smoke and tobacco related
mortality in a prospective study of Californians, 1960-98. James E. Enstrom, Geoffrey C. Kabat. BMJ 2003;326:1057
118,094 adults enrolled in an ACS cancer study in 1959 were
followed until 1998. For “never smokers married to ever
smokers” compared with “never smokers married to never
smokers”:
RR in Males RR in Females
Heart disease 0.94 (0.85 - 1.05) 1.01 (0.94 - 1.08)
Lung cancer 0.75 (0.42 - 1.35) 0.99 (0.72 - 1.37)
Chr. Pulm. Dis. 1.27 (0.78 - 2.08) 1.13 (0.80 - 1.58)
Conclusions: The results do not support a causal relation
between environmental tobacco smoke and tobacco related
mortality, although they do not rule out a small effect.
Any potential selection bias in the ETS study?
1. I don’t think so.
2. Yes, there was a potential for it.
Any potential information bias in the ETS study?
1. I don’t think so.
2. Non-differential misclassification.
3. Differential misclassification.
4. Interviewer bias.
5. Recall bias.
Are Analgesic Drugs Associated with
Increased Risk of Renal Failure?
Case-Control study by Perneger et al.
Cases found in the renal dialysis registry for
all of Maryland, Virginia, West Virginia, & D.C.
Controls: random digit dial from the same
geographic area.
Data: Non-blinded phone interview asking about
lifetime analgesic use.
Conclusion: Acetaminophen & NSAIDs increase the risk of renal failure, but not aspirin.

Lifetime use        OR     95% CI
Acetaminophen
  0-999             1.0       -
  1000-4999         2.0    1.3-3.2
  >5000             2.4    1.2-4.8
Aspirin
  0-999             1.0       -
  1000-4999         0.5    0.4-0.7
  >5000             1.0    0.6-1.8
NSAIDs
  0-999             1.0       -
  1000-4999         0.6    0.3-1.1
  >5000             8.8    1.1-71.8

Could any biases have influenced the conclusion?
Case-Control Study: Analgesic Use & Renal Failure
This was not a terrible paper. However, is it possible that there were any potentially important sources of bias? Think in a structured way. Think about how they collected the data. Consider each of the following:

• Could there have been selection bias?
• Was loss to follow-up a problem?
• Could interviewer bias have affected results?
• Could recall bias have affected results?
• Is it possible that the conclusions were influenced by non-differential misclassification?
Avoiding Bias

• Select subjects by a similar mechanism.
• Maintain follow-up in prospective studies.
• Blind interviewers.
• For case-control studies, get subjects with an equal tendency to remember.
• Use clear, homogeneous definitions of disease & exposure.
• Get accurate data collected in a similar way.
• Confirm data; use error trapping during data entry.

Once it's in the study, you can't fix it.