CHAPTER 11. Confounding
Objectives. Students should be able to:
a) Define: weighted average, arithmetic average
b) Define: confounding effect, confounding variable
c) Describe the structure of a crude relative risk in terms of
weighted averages
d) Define and distinguish strategies for the control of
confounding: direct and indirect standardization, Mantel-
Haenszel procedure, restriction, randomization, matching,
multivariate analysis
e) Distinguish between control strategies used in design and
those used in analysis
f) Define effect-modification; distinguish between confounding
and effect modification
Assignment: Confounding
Confounding 11.2
Outline
1. Weighted average
2. Confounding effect
3. Confounding variable
A risk factor for the disease under study
Associated with exposure
Not in the causal pathway
4. Weights in estimation of relative risk
5. Control of confounding: stratification
Direct standardization
Indirect standardization
Mantel-Haenszel procedure
6. Control of confounding: matching (a separate lecture)
7. Control of confounding: other strategies
Restriction
Randomization
Multivariate analysis
8. Overview of control strategies
9. Effect modification
Compulsory readings
Hennekens CH, Buring JE. Chapter 12.
LECTURE NOTES ON CONFOUNDING
(prepared by Jean-François Boivin and William Hodge)
The design and conduct of epidemiologic studies can be affected
by three types of biases: selection bias, misclassification
bias, and confounding bias. Other types of biases may arise at
the level of analysis, for example through inappropriate
modelling assumptions. The subject of this section is
confounding.
WEIGHTED AVERAGE
Central to the understanding of confounding and how to control
it is the concept of weighted average. The arithmetic mean is
one particular type of weighted average, in which every value
being averaged receives the same weight (insert 1). With other
weights, a weighted average may be very different from the
arithmetic mean. Provided that the weights are non-negative and
sum to one, however, it will always fall between the minimum and
maximum values in the set of numbers being averaged.
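The two properties described above can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function name `weighted_mean` and the example values and weights are assumptions chosen for illustration.

```python
def weighted_mean(values, weights):
    """Weighted average; weights must be non-negative and sum to one."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to one")
    return sum(w * x for w, x in zip(weights, values))


values = [0.60, 0.20]                         # hypothetical values to average
equal = weighted_mean(values, [0.5, 0.5])     # equal weights: arithmetic mean
skewed = weighted_mean(values, [0.95, 0.05])  # unequal weights: pulled toward 0.60

print(equal, skewed)
```

With equal weights the result is the arithmetic mean of the two values; with unequal weights the result moves toward the heavily weighted value but stays between the minimum and the maximum.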
CONFOUNDING EFFECT
A confounder is an extraneous variable that totally or partially
accounts for the apparent effect of the study exposure on the
outcome. It may even mask an underlying true association or
reverse it. Some examples will serve to illustrate these
possibilities. In table 1, the exposure variable is vitamin
deficiency and the outcome is depression. Is vitamin deficiency
a risk factor for depression? The overall relative risk is 2.5,
giving the erroneous impression that it is a risk factor.
However, if we stratify the results by age category, we see that
among old subjects the relative risk is 1.0 and among the young
it is also 1.0. Hence there is no relationship between vitamin
deficiency and depression. The confounding effect of age totally
accounts for the relationship between vitamin deficiency and
depression.
A confounding effect may partially account for the effect of the
exposure. In table 2, the overall relative risk of heart
disease in subjects with a red meat diet is 2.4, but when we
stratify the results by sex, the relative risk is 2.0 in each
stratum. Hence sex is partially responsible for the association
between red meat diet and heart disease.
In table 3, the relative risk of respiratory disease after
exposure to air pollution is 1.0, suggesting that there is no
association between this exposure and the outcome. If we
stratify by smoking status, however, we see that the relative
risk in each category is actually 3.3. This represents an
example of a confounding effect masking a true association.
Finally, in table 4, we see an example of how confounding can
reverse the true exposure-outcome relationship. Looking at the
effect of drug A relative to drug B on the risk of death, one
would conclude from the crude data that subjects exposed to drug
A have a 2.5-fold increase in the risk of dying relative to
drug B patients. However, if we stratify by asthma severity
status, the relative risk is 0.5 in each stratum: drug A patients
are in fact protected from dying relative to drug B patients.
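The comparison of crude and stratum-specific relative risks in these examples can be reproduced with a few lines of code. This is a minimal Python sketch using the counts of tables 1 and 4; the function names `relative_risk` and `crude_and_stratified` and the tuple layout are illustrative assumptions, not part of the original notes.

```python
def relative_risk(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def crude_and_stratified(strata):
    """strata: list of (cases_exp, n_exp, cases_unexp, n_unexp) tuples."""
    stratum_rrs = [relative_risk(*s) for s in strata]
    # The crude table is obtained by summing the strata cell by cell.
    totals = [sum(column) for column in zip(*strata)]
    return relative_risk(*totals), stratum_rrs


# Table 1: vitamin deficiency and depression (old, young)
table1 = [(1200, 2000, 60, 100), (20, 100, 200, 1000)]
# Table 4: drug A versus drug B and death (severe asthma, mild asthma)
table4 = [(200, 1000, 4, 10), (2, 100, 4, 100)]

crude1, rrs1 = crude_and_stratified(table1)
crude4, rrs4 = crude_and_stratified(table4)
print(round(crude1, 2), rrs1)
print(round(crude4, 2), rrs4)
```

In both tables the crude value is far from the stratum-specific values, which is the signature of confounding described in the text.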
We will now define the concept of confounding more formally.
CONFOUNDING VARIABLE
A confounding variable is a risk factor for the disease under
study and is also associated with the exposure under study.
Some refinements of this definition are needed. While the
confounder must be a risk factor for the outcome under study,
this relationship need not be causal. Causality is a complex
philosophical and scientific concept and will be addressed later
in this course. Age and sex are examples of variables which
often are confounders without being causal risk factors.
A factor which meets the definition for confounding but is an
intermediary in the exposure-outcome pathway is not a
confounder. For example, if a high fat diet causes increased
cholesterolemia which in turn results in an increased risk of
myocardial infarction, the intermediary variable cholesterolemia
is not a confounder.
WEIGHTS IN THE ESTIMATION OF RELATIVE RISK
Return to table 1. The overall or crude relative risk of
depression was found to be 2.5, suggesting that vitamin
deficiency was a risk factor for depression. However, when we
stratify by age, we see that the relative risk in each stratum
is really 1.0, indicating that age confounds the relationship
between vitamin deficiency and depression. Why has this
happened? Table 5 shows data from table 1, from a different
perspective. We see that age is a risk factor for depression
(relative risk = 3.0). We also see that age is associated with
vitamin deficiency (odds ratio = 200). That age is a risk factor for depression is
a fact that cannot be modified. However, the association
between age and vitamin deficiency is a design feature of the
study that can be manipulated by modifying the weights used.
Insert 2 demonstrates that subjects with vitamin deficiency and
those without vitamin deficiency receive different weights in
the estimation of the crude relative risk. For subjects with
vitamin deficiency, the weights heavily favor the old age group
whereas for those without vitamin deficiency, the weights are
heavily tipped toward the young age group. Hence the crude
comparison of subjects with vitamin deficiency and subjects
without vitamin deficiency represents to a large extent the
comparison of old and of young subjects. In tables 6 and 7, we
replace the crude weights with a single set of weights applied to
subjects with and without vitamin deficiency alike, and the
confounding effect disappears. Choosing appropriate weights
represents one approach for controlling confounding, which leads
us into our next topic.
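The role of the weights can be made explicit in code. The sketch below, in Python, rebuilds the crude relative risk of table 1 as a ratio of two weighted averages and then repeats the calculation with a common set of weights. The variable and function names are illustrative assumptions, and the equal weights of 0.5 correspond to the equal stratum sizes chosen in table 7.

```python
# Stratum-specific risks of depression (identical in exposed and unexposed)
risk = {"old": 0.60, "young": 0.20}

# Numbers of subjects per age group (table 1)
n_exposed = {"old": 2000, "young": 100}     # with vitamin deficiency
n_unexposed = {"old": 100, "young": 1000}   # without vitamin deficiency


def weights(counts):
    """Turn stratum sizes into weights that sum to one."""
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()}


def weighted_risk(risk, w):
    """Weighted average of the stratum-specific risks."""
    return sum(risk[k] * w[k] for k in risk)


crude_exposed = weighted_risk(risk, weights(n_exposed))      # about 0.58
crude_unexposed = weighted_risk(risk, weights(n_unexposed))  # about 0.24
crude_rr = crude_exposed / crude_unexposed                   # about 2.5

# Same weights in both groups (assumed here: 0.5 and 0.5, as in table 7)
common = {"old": 0.5, "young": 0.5}
adjusted_rr = weighted_risk(risk, common) / weighted_risk(risk, common)

print(round(crude_rr, 2), adjusted_rr)
```

Any common set of weights gives an adjusted relative risk of 1 here, because the stratum-specific risks are the same in both exposure groups.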
METHODS TO CONTROL CONFOUNDING
(A) STRATIFICATION
Stratification refers to a group of methods which yield a
summary measure of association that is an average of
stratum-specific values. If, for example, sex were a potential
confounding variable, the measure of association for males and
females could be calculated separately. By definition, each
stratum-specific estimate is now unconfounded by sex. One
option is to report the unconfounded measure of association
separately for each stratum. This, however, has the
disadvantage of being unparsimonious. Usually an overall pooled
measure of association, representing a weighted average of the
measure of association in each stratum, is calculated. The
pooled measure of association will fall between the lower and
upper limits of the stratified measures of association. Table 8
demonstrates the relationship between the use of a drug and
development of a rash. There is strong confounding by sex, as
the crude relative risk of 0.63 falls outside the range of the
stratum-specific relative risks (2.0 and 1.5). By using a
weighting system which yields a weighted average of these two
values, the corrected or standardized relative risk will fall
between 1.5 and 2.0. Various stratification methods are
distinguished on the basis of how they determine weights used in
the analysis. The main stratification methods include direct
standardization, indirect standardization, and the Mantel-
Haenszel procedure.
Standardization
Both direct and indirect standardization consist of obtaining a
weighted average of stratum-specific risks, using within each
stratum the same weights for exposed and unexposed subjects.
Let us use the example from table 8 to work through these two
methods. Table 9 demonstrates the crude weights and risks for
this example. One can see from this table that the confounding
arises as a result of the extreme difference in the values of
the weights between the two groups. Drug A was predominantly
prescribed to males and drug B predominantly to females. In
table 9, these weights are changed and standard arbitrary
weights are chosen. The summary unconfounded estimate then
becomes 1.57. In table 10, indirect standardization is
performed. The method is identical to direct standardization,
except that the weights are no longer arbitrary: the distribution
of the exposed subjects (drug A) across strata provides the
weights. The indirectly standardized estimate is 1.8.
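Both calculations follow the same recipe with different standard populations, which can be sketched in Python as follows. The function name `standardized_rr` and the dictionary layout are illustrative assumptions; the risks and populations are those of tables 8 to 10.

```python
# Stratum-specific risks of rash (table 8)
risk_a = {"male": 0.20, "female": 0.60}   # drug A (exposed)
risk_b = {"male": 0.10, "female": 0.40}   # drug B (unexposed)


def standardized_rr(risk_exp, risk_unexp, standard_pop):
    """Relative risk after applying the same stratum weights to both groups."""
    total = sum(standard_pop.values())
    w = {k: n / total for k, n in standard_pop.items()}
    r_exp = sum(risk_exp[k] * w[k] for k in w)
    r_unexp = sum(risk_unexp[k] * w[k] for k in w)
    return r_exp / r_unexp


# Table 9: arbitrary standard population (1000 males, 2000 females)
direct = standardized_rr(risk_a, risk_b, {"male": 1000, "female": 2000})

# Table 10: the drug A population supplies the weights (1000 males, 100 females)
indirect = standardized_rr(risk_a, risk_b, {"male": 1000, "female": 100})

print(round(direct, 2), round(indirect, 2))
```

Computed without rounding the intermediate risks, the two ratios are about 1.56 and 1.86. Tables 9 and 10 report 1.57 and 1.8 because they divide risks already rounded to two decimals. Either way, both values lie between the stratum-specific relative risks of 1.5 and 2.0.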
Mantel-Haenszel procedure
A common method used to obtain an unconfounded estimate of the
odds ratio is the Mantel-Haenszel procedure. The principle of
the Mantel-Haenszel procedure is to give the largest weights to
the stratum-specific estimates with the smallest variance. The
weights assigned to the stratum-specific values are therefore
approximately inversely proportional to the variance of each
estimate. Table 11 illustrates the use of the Mantel-Haenszel
procedure in a study of occupation and lung dysfunction. The
crude odds ratio is 0.54 but the stratum-specific odds ratios
(stratified by age) range from 1.42 to 2.7. Using the
Mantel-Haenszel procedure, the adjusted odds ratio is 2.2.
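The notes do not spell out the Mantel-Haenszel weights. The Python sketch below uses the standard Mantel-Haenszel odds ratio estimator, in which stratum i receives a weight proportional to b_i c_i / n_i; this formula, the function name `mantel_haenszel_or` and the cell labels a, b, c, d are assumptions brought in for illustration. The counts are those of table 11.

```python
def mantel_haenszel_or(strata):
    """Standard Mantel-Haenszel pooled odds ratio.

    strata: list of (a, b, c, d) with
      a = exposed with disease,    b = unexposed with disease,
      c = exposed without disease, d = unexposed without disease.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Table 11: manufacture (exposed) versus service (unexposed), by age group
strata = [(10, 5, 90, 95),      # young
          (20, 15, 80, 85),     # middle age
          (80, 600, 20, 400)]   # old

stratum_ors = [a * d / (b * c) for a, b, c, d in strata]
or_mh = mantel_haenszel_or(strata)

# Equivalent view: a weighted average of the stratum-specific odds ratios,
# with weights proportional to b*c/n, normalized to sum to one.
raw = [b * c / (a + b + c + d) for a, b, c, d in strata]
w = [r / sum(raw) for r in raw]
or_as_average = sum(wi * ori for wi, ori in zip(w, stratum_ors))

# Crude odds ratio from the collapsed table
a, b, c, d = (sum(col) for col in zip(*strata))
crude_or = a * d / (b * c)

print([round(x, 2) for x in stratum_ors], round(or_mh, 2), round(crude_or, 2))
```

The old stratum, which contains the most subjects, receives the largest weight, so the pooled estimate sits closer to its odds ratio than to the other two.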
(B) MATCHING
The question of matching will be covered in a separate lecture.
(C) RESTRICTION
In restriction, the investigator allows only subjects in one
category of the potential confounding variable to be included in
the study. For example, if the association between asbestos
exposure and lung cancer was to be studied, and one wanted to
control for cigarette smoking as an important confounding
variable, one could restrict the study to nonsmokers, or to
smokers. One disadvantage of restriction is that it may reduce
the number of subjects available for study.
(D) RANDOMIZATION
Randomization is a very important way to control confounding and
has unique advantages. In randomized studies, confounders,
whether known or unknown, are expected on average to be equally
distributed between the randomized groups. It is this
presumptive ability to control even unknown confounders that
makes randomization such an attractive way to control
confounding.
(E) MULTIVARIATE ANALYSIS
By far the most important and practical way to control
confounding in modern epidemiology is multivariate modelling.
This topic is the subject of several textbooks and specialized
courses and will not be developed here.
OVERVIEW OF CONTROL STRATEGIES
Table 12 summarizes at which level of a study specific
strategies aimed at the control of confounding are used.
Randomization and restriction are used in the design phase while
stratification and multivariate analysis are used in the
analysis. Matching is used in the design phase and depending on
the study design may also need to be taken into account in the
analysis.
EFFECT MODIFICATION
Effect modification (synonym: interaction) is sometimes confused
with confounding. While confounding is a bias to be controlled
for, effect modification is a phenomenon to be assessed and
explored. Does the effect of diabetes mellitus on risk of
coronary heart disease differ across sex or age group? For
example, are diabetic males more likely to suffer from coronary
heart disease than diabetic females? Such relationships must be
studied to better understand this clinical and epidemiologic
question.
When a measure of association differs across strata, effect
modification exists. Effect modification may exist with or
without confounding. Table 13 gives examples. In the first
example, confounding exists because the crude relative risk of
0.63 does not fall between the stratum-specific relative risks
of 1.5 and 2.0. Effect modification also exists: the stratum-
specific relative risks of 1.5 and 2.0 are different. In the
second example, confounding is not present as the crude relative
risk of 2.36 falls between the stratum-specific relative risks of
2.0 and 6.0; effect modification is certainly present as the
stratum-specific relative risks of 2.0 and 6.0 are very different. By
the same reasoning, in the third example, confounding exists but
effect modification does not. Finally, in the fourth example,
there is neither confounding nor effect modification. We have
assumed in all of these examples that the sample sizes were
large and therefore that these estimates of relative risk were
very precise.
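The reasoning applied to the four examples of table 13 can be written as a small rule. The Python sketch below is only an illustration of the heuristic used in these notes (crude value outside the range of the stratum-specific values suggests confounding; unequal stratum-specific values indicate effect modification). The function names and the tolerance of 0.05 are assumptions, and, like the text, the sketch ignores sampling variability.

```python
def relative_risk(cases_exp, n_exp, cases_unexp, n_unexp):
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def describe(strata, tol=0.05):
    """Return (confounding, effect_modification) for a list of strata.

    Each stratum is (cases_exp, n_exp, cases_unexp, n_unexp).
    tol is an assumed tolerance for treating two relative risks as equal.
    """
    rrs = [relative_risk(*s) for s in strata]
    crude = relative_risk(*[sum(col) for col in zip(*strata)])
    confounding = crude < min(rrs) - tol or crude > max(rrs) + tol
    effect_modification = max(rrs) - min(rrs) > tol
    return confounding, effect_modification


examples = {
    1: [(200, 1000, 20, 200), (60, 100, 800, 2000)],
    2: [(200, 1000, 20, 200), (60, 100, 200, 2000)],
    3: [(200, 1000, 20, 200), (80, 100, 800, 2000)],
    4: [(200, 1000, 20, 200), (200, 1000, 20, 200)],
}
results = {k: describe(v) for k, v in examples.items()}
print(results)
```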
Effect modification depends on the measure of association.
Consider the third example of table 13. Based on the
relative risk, there is no effect modification as the relative
risk in each stratum is 2.0. The risk difference in the first
stratum, however, is 0.1 (200/1000 - 20/200) and in the second
stratum it is 0.4 (80/100 - 800/2000). Hence on the risk
difference scale there is effect modification.
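The dependence on the scale of measurement can be verified directly. This is a minimal Python sketch using the counts of the third example of table 13; the names `strata`, `ratios` and `differences` are illustrative assumptions.

```python
# Third example of table 13: (cases_exp, n_exp, cases_unexp, n_unexp)
strata = {
    "stratum 1": (200, 1000, 20, 200),
    "stratum 2": (80, 100, 800, 2000),
}

ratios, differences = {}, {}
for name, (cases_exp, n_exp, cases_unexp, n_unexp) in strata.items():
    risk_exp = cases_exp / n_exp
    risk_unexp = cases_unexp / n_unexp
    ratios[name] = risk_exp / risk_unexp       # relative risk
    differences[name] = risk_exp - risk_unexp  # risk difference

print(ratios)       # same value in both strata
print(differences)  # different values in the two strata
```

The relative risks agree across strata while the risk differences do not, so whether effect modification is reported depends on which measure of association is examined.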
INSERT 1 DEFINITION OF AVERAGE AND WEIGHTED AVERAGE
1. Mean: a value that lies within a range of values and is
computed according to a prescribed law (synonym: average)
(Webster's New Collegiate Dictionary, 1977).
2. Arithmetic mean: a value that is computed by dividing the
sum of a set of terms by the number of terms (Webster).
Algebraically:
\[ \text{Mean of } x = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \text{(I)} \]
3. Weighted mean (or weighted average): a value that is
computed by adding a set of terms weighted in such a way
that the sum of the weights is equal to one.
Algebraically:
\[ \text{Mean of } x = w_1 x_1 + w_2 x_2 + \dots + w_n x_n = \sum_{i=1}^{n} w_i x_i, \quad \text{where } \sum_{i=1}^{n} w_i = 1 \qquad \text{(II)} \]
The arithmetic mean is a special case of the weighted mean
where weights are equal for all i's. If one defines w_i =
1/n in equation II above, one obtains:
\[ \text{Mean of } x = x_1\left(\frac{1}{n}\right) + x_2\left(\frac{1}{n}\right) + \dots + x_n\left(\frac{1}{n}\right) = \frac{x_1 + x_2 + \dots + x_n}{n} \]
TABLE 1 CONFOUNDING TOTALLY ACCOUNTS FOR EXPOSURE EFFECT
A COHORT STUDY

OLD
                    Vitamin deficiency
                       +         -
Depression    +      1200        60
              -       800        40
Total                2000       100
\[ \text{Relative risk} = \frac{1200/2000}{60/100} = 1.0 \]

YOUNG
                    Vitamin deficiency
                       +         -
Depression    +        20       200
              -        80       800
Total                 100      1000
\[ \text{Relative risk} = \frac{20/100}{200/1000} = 1.0 \]

ALL
                    Vitamin deficiency
                       +         -
Depression    +      1220       260
              -       880       840
Total                2100      1100
\[ \text{Relative risk} = \frac{1220/2100}{260/1100} = 2.5 \]
TABLE 2 CONFOUNDING PARTIALLY ACCOUNTS FOR EXPOSURE EFFECT
(source: Boivin JF, Wacholder S. Conditions for confounding of the risk ratio and of the odds ratio. American Journal of Epidemiology 1985; 121:152-158.)
A COHORT STUDY

MALES
                      Red meat diet
                       +         -
Heart disease  +     1400       200
               -      280       280
Total                1680       480
\[ \text{Relative risk} = \frac{1400/1680}{200/480} = 2.0 \]

FEMALES
                      Red meat diet
                       +         -
Heart disease  +       44        20
               -      220       220
Total                 264       240
\[ \text{Relative risk} = \frac{44/264}{20/240} = 2.0 \]

ALL
                      Red meat diet
                       +         -
Heart disease  +     1444       220
               -      500       500
Total                1944       720
\[ \text{Relative risk} = \frac{1444/1944}{220/720} = 2.4 \]
TABLE 3 CONFOUNDING MASKS THE EFFECT OF EXPOSURE
A COHORT STUDY

SMOKERS
                          Air pollution
                           +         -
Respiratory disease  +     30        67
                     -     70       666
Total                     100       733
\[ \text{Relative risk} = \frac{30/100}{67/733} = 3.3 \]

NON-SMOKERS
                          Air pollution
                           +         -
Respiratory disease  +     14         1
                     -    412        99
Total                     426       100
\[ \text{Relative risk} = \frac{14/426}{1/100} = 3.3 \]

ALL
                          Air pollution
                           +         -
Respiratory disease  +     44        68
                     -    482       765
Total                     526       833
\[ \text{Relative risk} = \frac{44/526}{68/833} = 1.0 \]
TABLE 4 CONFOUNDING REVERSES THE EFFECT OF EXPOSURE
A COHORT STUDY

SEVERE ASTHMA
                   Drug
                 A        B
Death    +      200        4
         -      800        6
Total          1000       10
\[ \text{Relative risk} = \frac{200/1000}{4/10} = 0.5 \]

MILD ASTHMA
                   Drug
                 A        B
Death    +        2        4
         -       98       96
Total           100      100
\[ \text{Relative risk} = \frac{2/100}{4/100} = 0.5 \]

ALL PATIENTS
                   Drug
                 A        B
Death    +      202        8
         -      898      102
Total          1100      110
\[ \text{Relative risk} = \frac{202/1100}{8/110} = 2.5 \]
TABLE 5 CONDITIONS FOR CONFOUNDING IN THE EXAMPLE OF VITAMIN DEFICIENCY AND DEPRESSION

(1) Age is associated with depression

In subjects without vitamin deficiency:
                     Old     Young
Depression    +       60       200
              -       40       800
Total                100      1000
\[ \text{Relative risk} = \frac{60/100}{200/1000} = 3.0 \]

In subjects with vitamin deficiency:
                     Old     Young
Depression    +     1200        20
              -      800        80
Total               2000       100
\[ \text{Relative risk} = \frac{1200/2000}{20/100} = 3.0 \]

(2) Age is associated with vitamin deficiency
                             Old     Young
Vitamin deficiency    +     2000       100
                      -      100      1000
\[ \text{Odds ratio} = \frac{2000 \times 1000}{100 \times 100} = 200 \]
INSERT 2 EXPRESSING DATA FROM TABLE 1 IN TERMS OF WEIGHTS
In tables 1 and 5, it can be seen that the stratum-specific risks of depression among subjects with vitamin deficiency are 0.60 (old) and 0.20 (young). The crude risk, seen in table 1, is 1220/2100 = 0.58. This crude risk is a weighted average of 0.60 (old) and 0.20 (young) and the weights are 2000/2100 and 100/2100, respectively. Thus,
\[ (0.60)\frac{2000}{2100} + (0.20)\frac{100}{2100} = \frac{1200 + 20}{2100} = 0.58 \]
Similarly, the stratum-specific risks of depression among subjects without vitamin deficiency are 0.60 (old) and 0.20 (young) and the crude risk is 260/1100 = 0.24. The weights are 100/1100 and 1000/1100, respectively:
\[ (0.60)\frac{100}{1100} + (0.20)\frac{1000}{1100} = \frac{60 + 200}{1100} = \frac{260}{1100} = 0.24 \]
It can be seen that for each age group, a different weight is used in the subjects with vitamin deficiency and those without:
Crude relative risk:
\[ \frac{(0.60)\frac{2000}{2100} + (0.20)\frac{100}{2100}}{(0.60)\frac{100}{1100} + (0.20)\frac{1000}{1100}} = 2.5 \]
Old terms (first term of numerator and denominator): relative risk = 1. Young terms (second term of numerator and denominator): relative risk = 1.
Most of the information which goes into the crude relative risk is the risk of 0.60 in the exposed and of 0.20 in the unexposed.
TABLE 6 VITAMIN DEFICIENCY AND DEPRESSION: SELECTING NEW WEIGHTS
A COHORT STUDY

OLD
                    Vitamin deficiency
                       +         -
Depression    +       60%       60%
              -       40%       40%
Total                 N1        N2

YOUNG
                    Vitamin deficiency
                       +         -
Depression    +       20%       20%
              -       80%       80%
Total                 N3        N4

Select N1, N2, N3, N4 such that
\[ \frac{N_1 N_4}{N_2 N_3} = 1 \]
For example: N1, N2, N3, N4 = 100
then . . . table 7
TABLE 7 VITAMIN DEFICIENCY AND DEPRESSION: RESULTS WITH NEW WEIGHTS

OLD
                    Vitamin deficiency
                       +         -
Depression    +        60        60
              -        40        40
Total                 100       100
Relative risk = 1

YOUNG
                    Vitamin deficiency
                       +         -
Depression    +        20        20
              -        80        80
Total                 100       100
Relative risk = 1

ALL
                    Vitamin deficiency
                       +         -
Depression    +        80        80
              -       120       120
Total                 200       200
Relative risk = 1

This represents the principle of stratified analysis. The existing weights are replaced by other, more appropriate weights.
TABLE 8 CONFOUNDING REVERSES THE EFFECT OF EXPOSURE
A COHORT STUDY

MALE
                  Drug
                A        B
Rash    +      200       20
        -      800      180
Total         1000      200
\[ \text{Relative risk} = \frac{200/1000}{20/200} = 2.0 \]

FEMALE
                  Drug
                A        B
Rash    +       60      800
        -       40     1200
Total          100     2000
\[ \text{Relative risk} = \frac{60/100}{800/2000} = 1.5 \]

ALL
                  Drug
                A        B
Rash    +      260      820
        -      840     1380
Total         1100     2200
\[ \text{Relative risk} = \frac{260/1100}{820/2200} = 0.63 \]
TABLE 9 DIRECT STANDARDIZATION
Replace crude weights by appropriate weights (the standard weights). Using data from table 8:

              Drug A                                Drug B
          Risk   Population   Crude weight      Risk   Population   Crude weight
male      0.20      1000       1000/1100        0.10       200        200/2200
female    0.60       100        100/1100        0.40      2000       2000/2200

Replace the observed population distribution by a standard arbitrary one. The one constraint is that the population weights must be identical within sexes.

              Drug A                                Drug B
          Risk   Standard population   Weight   Risk   Standard population   Weight
male      0.20          1000         1000/3000  0.10          1000         1000/3000
female    0.60          2000         2000/3000  0.40          2000         2000/3000

Drug A:
\[ (0.20)\frac{1000}{3000} + (0.60)\frac{2000}{3000} = 0.47 \]
Drug B:
\[ (0.10)\frac{1000}{3000} + (0.40)\frac{2000}{3000} = 0.30 \]
Relative risk: 0.47/0.30 = 1.57 (an average of stratum-specific
values of the relative risk; the stratum-specific values in table 8 were 2.0 and 1.5)
TABLE 10 INDIRECT STANDARDIZATION
Replace the crude weights by weights in the exposed group

Drug treatment
              A = Exposed                              B = Unexposed
          Risk   Crude population   Crude weights   Risk   Standard population =    New weights =
                                                           Crude population in A    Crude weights in A
male      0.20        1000            1000/1100     0.10          1000                 1000/1100
female    0.60         100             100/1100     0.40           100                  100/1100

Drug A:
\[ (0.20)\frac{1000}{1100} + (0.60)\frac{100}{1100} = 0.24 \]
Drug B:
\[ (0.10)\frac{1000}{1100} + (0.40)\frac{100}{1100} = 0.13 \]
Relative risk: 0.24/0.13 = 1.8 (an average of the
stratum-specific values of the relative risk)
The relative risk is more specifically called 'indirectly standardized relative risk.'
TABLE 11 MANTEL-HAENSZEL PROCEDURE
Procedure to obtain a weighted average of odds ratios (applicable in cohort and case-control studies; the original paper dealt with case-control studies)
Example: A cohort study of lung dysfunction in occupational groups

Young
                           OCCUPATION
                     Manufacture    Service
Dysfunction    +          10            5
               -          90           95
Total                    100          100
\[ OR_y = \frac{10 \times 95}{5 \times 90} = 2.1 \]
\[ \text{MH weight for young} = W_y \approx \frac{1}{\text{Variance}(OR_y)} \]

Middle age
                     Manufacture    Service
Dysfunction    +          20           15
               -          80           85
Total                    100          100
\[ OR_m = \frac{20 \times 85}{15 \times 80} = 1.42 \]
\[ \text{MH weight for middle age} = W_m \approx \frac{1}{\text{Variance}(OR_m)} \]

Old
                     Manufacture    Service
Dysfunction    +          80          600
               -          20          400
Total                    100         1000
\[ OR_o = \frac{80 \times 400}{600 \times 20} = 2.7 \]
\[ \text{MH weight for old} = W_o \approx \frac{1}{\text{Variance}(OR_o)} \]
TABLE 11 (continued)

All ages
                     Manufacture    Service
Dysfunction    +         110          620
               -         190          580
Total                    300         1200
\[ OR = \frac{110 \times 580}{620 \times 190} = 0.54 \]
\[ W_y + W_m + W_o = 1 \]
\[ OR_{MH} = W_y(OR_y) + W_m(OR_m) + W_o(OR_o) = W_y(2.1) + W_m(1.42) + W_o(2.7) = 2.2 \]
Reference: Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 1959;22:719-748.
TABLE 12 OVERVIEW OF CONTROL STRATEGIES FOR CONFOUNDING

Strategy                 Level of use
Randomization            Design
Restriction              Design
Stratification           Analysis
Multivariate analysis    Analysis (represents an extension of stratification in which stratum-specific estimates are influenced by values of other strata)
Matching                 Design and sometimes analysis, depending on the measure of association being estimated
TABLE 13 EFFECT MODIFICATION AND CONFOUNDING

1. Effect modification, confounding
               Stratum 1        Stratum 2        Strata 1 + 2
Exposure        +     -          +     -           +      -
Disease  +     200    20         60   800         260    820
         -     800   180         40  1200         840   1380
               RR = 2.0         RR = 1.5         RR = 0.63

2. Effect modification, no confounding
               Stratum 1        Stratum 2        Strata 1 + 2
Exposure        +     -          +     -           +      -
Disease  +     200    20         60   200         260    220
         -     800   180         40  1800         840   1980
               RR = 2.0         RR = 6.0         RR = 2.36

3. Confounding, no effect modification
               Stratum 1        Stratum 2        Strata 1 + 2
Exposure        +     -          +     -           +      -
Disease  +     200    20         80   800         280    820
         -     800   180         20  1200         820   1380
               RR = 2.0         RR = 2.0         RR = 0.68

4. No effect modification, no confounding
               Stratum 1        Stratum 2        Strata 1 + 2
Exposure        +     -          +     -           +      -
Disease  +     200    20        200    20         400     40
         -     800   180        800   180        1600    360
               RR = 2.0         RR = 2.0         RR = 2.0