IBE
Multiple logistic regression
Ulrich Mansmann, Alexander Crispin
Department of Medical Informatics, Biometry, and Epidemiology
Ludwig Maximilians University Munich
Overview
• Applications and contraindications
• Basics
• Interpreting the regression coefficients
• Maximum likelihood estimation
• Likelihood ratio test
• Wald test, confidence intervals
• Modeling strategies
Regression models in epidemiology and other health sciences
• De facto standard for control of confounding and effect modification
• Standard implies:
– Tried and tested for decades
– No longer avant-garde statistical methodology
• Much more flexible than e.g. Mantel-Haenszel analyses
Logistic regression: applications
• Many studies with binary = dichotomous outcome:
– Cross-sectional studies
– Case-control studies without matching
– Cohort studies with cumulative incidence data
Logistic regression: assumptions
• One dichotomous outcome variable (dependent variable)
• Multiple, differently scaled factors (independent variables) influencing the outcome
• No collinearity: each independent variable provides unique information
• Independence of study subjects
A robust method with only a few assumptions that could be violated.
Contraindications
Problem: outcome not binary
– Quantitative outcome → Linear regression
– Ordinal outcome → Ordinal logistic regression
– Person-time data (incidence densities) → Cox proportional hazards regression
Problem: subjects not independent
– Matched case-control studies → Conditional logistic regression
– Cluster samples → GEE models, multi-level models
Problem: no individual data
– Aggregate data on numbers of events → Poisson regression
Examples of multi-collinearity
• Several indicators measuring the same factor
– Indicators of socio-economic status (SES): income, property, education, prestige
• Causal chains
– Calorie intake → adiposity → metabolic syndrome → type 2 diabetes mellitus → arteriosclerosis → myocardial infarction (MI)
One dichotomous outcome as a function of multiple factors
• The event probability P of an event is a function of multiple variables x1, x2, ..., xn:

P = f(x1, ..., xn)

• At the end of the study, all individual outcome probabilities are known:
– Subjects without event: P = 0
– Subjects with event: P = 1
If the true function f were known...
• Clinicians could predict individual disease risks from individual risk factor patterns (prediction).
• Scientists could tell the true effects of risk factors (adjustment, confounder control):
– In many studies, there is one exposure of primary interest.
– All other independent variables are merely nuisance factors.
Bad news, good news
• Bad news: the true function f is and will remain unknown.
• Good news: we can estimate the function f from empirical data.
How should the function f behave?
[Figure: risk (0 to 1) as a function of risk factor level, 0–900]
• Low level of exposure: risk near 0
• High level of exposure: risk near 1
• Sigmoidal (S-shaped) increase with rising levels of exposure
Two variables: one exposure of primary interest, one confounder
[Figure: 3D surface of risk (0 to 1) over exposure and confounder levels, 100–800]
• S-shaped risk increase with higher levels of the confounder
• Exposure itself has no effect
• Exposure associated with confounder
More than two independent variables
• We have no intuitive understanding of higher-dimensional relationships.
• However, we can do computations involving more than three dimensions.
Regression equation for logistic regression

logit(P) = ln(P / (1 − P)) = β1x1 + β2x2 + β12x1x2 + ... + α

• Left-hand side of the equation: logit of the event probability ("log odds")
• Right-hand side: linear predictor (familiar from multiple linear regression)
• β1x1, β2x2: main effects, each weighted by a regression coefficient
• β12x1x2: interaction term with its regression coefficient
• α: intercept
Suboptimal alternative: linear probability model
[Figure: straight regression line for P over x, extending below 0 and above 1]
• The linear predictor is convenient and comes naturally.
• However, standard linear regression is not ideal for modeling probabilities:
– Biologically, a sigmoidal function makes more sense than a straight line.
– The linear probability model leads to impossible probability estimates below 0 or above 1.
Motivation for the logit transformation
• If we want to use a linear predictor, we must transform the straight line into a sigmoidal function graph.
• Link function: we need a function that links our linear predictor to the non-linear probability.
• The most common link function is the logistic function.
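To make the link concrete, here is a minimal pure-Python sketch of the logistic function (the input values are arbitrary): it maps any real-valued linear predictor onto a probability strictly between 0 and 1, with the sigmoidal shape described above.

```python
import math

def logistic(eta):
    # Logistic (inverse-logit) link: maps the whole real line onto (0, 1)
    return 1.0 / (1.0 + math.exp(-eta))

low = logistic(-10.0)   # large negative linear predictor -> risk near 0
mid = logistic(0.0)     # zero linear predictor -> risk exactly 0.5
high = logistic(10.0)   # large positive linear predictor -> risk near 1
```

Whatever values the linear predictor takes, the output never leaves the (0, 1) interval, which is exactly what the linear probability model fails to guarantee.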
Reminder: disease probability as function of risk factor RF
[Figure: sigmoidal risk curve (0 to 1) over risk factor level 0–900]
odds = p / (1 − p); p = odds / (1 + odds)
Odds as function of RF
[Figure: odds (0 to 20) over risk factor level 0–900]
odds = p / (1 − p); p = odds / (1 + odds)
Log odds as function of RF
[Figure: log odds (−10 to 10) over risk factor level 0–900]
odds = p / (1 − p); p = odds / (1 + odds)
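The conversions between probability, odds, and log odds used in these three plots can be checked in a few lines of Python (the example probability is arbitrary):

```python
import math

p = 0.2                     # arbitrary example probability
odds = p / (1 - p)          # 0.25
log_odds = math.log(odds)   # about -1.386

# The transformations invert each other:
p_back = math.exp(log_odds) / (1 + math.exp(log_odds))
```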
Why use the logistic function?
• There are other functions with sigmoidal graphs and values between 0 and 1.
• Example: distribution function of the Normal distribution (probit transformation)
[Figure: risk over risk factor level 0–800, Normal distribution function]
Interpretation of the regression coefficient βi (1)
• Expected change of logit(P) associated with an increase of the independent variable xi by one unit.
• Exponentiation of βi yields the odds ratio for an increase of the independent variable xi by one unit.
• Since all regression coefficients are estimated simultaneously, all odds ratios are automatically adjusted for confounding by all other independent variables in the model.
• Under the rare disease assumption the odds ratio is a good approximation of the relative risk.
• (If the outcome is not a rare event, the odds ratio may still be used as an association measure in its own right.)
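As a sketch of the exponentiation step, with purely hypothetical coefficients (not from any real fit):

```python
import math

# Hypothetical fitted coefficients, for illustration only
coefficients = {"smoking": 0.69, "physical_activity": -0.36, "age_per_year": 0.05}

# Exponentiating each coefficient yields the adjusted odds ratio
# for a one-unit increase of that variable.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}
```

Here the positive coefficient maps to an odds ratio above 1 (increased risk), the negative one to an odds ratio below 1 (protective effect), matching the table on the next slide.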
Interpretation of the regression coefficient βi (2)
Regression coefficient Odds ratio Interpretation
βi < 0 ORi < 1 Protective effect
βi = 0 ORi = 1 No effect
βi > 0 ORi > 1 Increased risk
Derivation: high risk subject H and low risk subject L
• Let L and H be two persons with almost identical risk factor patterns.
• Only difference: H has a higher risk because his value of risk factor x1 exceeds L's by exactly one unit.
Log odds of subjects L and H

(I)  ln(P_L / (1 − P_L)) = β1x1 + β2x2 + ... = β1x1 + C

(II) ln(P_H / (1 − P_H)) = β1(x1 + 1) + β2x2 + ... = β1(x1 + 1) + C
Equation II minus equation I yields the log odds ratio

ln(P_H / (1 − P_H)) − ln(P_L / (1 − P_L)) = β1(x1 + 1) + C − (β1x1 + C)

ln OR = ln[ (P_H / (1 − P_H)) / (P_L / (1 − P_L)) ] = β1(x1 + 1) − β1x1 = β1

OR = e^β1
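A numeric sanity check of this derivation, with made-up coefficients and covariate values:

```python
import math

# Made-up coefficients, for illustration only
alpha, b1, b2 = -4.0, 0.8, 0.3
x1, x2 = 2.0, 5.0            # subject L; subject H has x1 + 1

logit_L = b1 * x1 + b2 * x2 + alpha
logit_H = b1 * (x1 + 1) + b2 * x2 + alpha

log_or = logit_H - logit_L   # all shared terms cancel, leaving b1
odds_ratio = math.exp(log_or)
```

The difference of the two logits equals b1 exactly, regardless of the values of the other covariates, which is why the odds ratio is adjusted for them.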
What is "one unit"?
• Quantitative variables
• Dichotomous variables
• Polytomous variables
• Ordinal variables
Quantitative independent variables
• The risk associated with a unit increase of body weight depends on the measurement unit:– Pounds?– Kilograms?– Hundredweights?– Metric tons?
Quantitative independent variables: increases by multiple units
• The linear predictor says: when xi increases by one unit, logit(P) increases by the constant amount βi.
• It follows: when xi increases by k units, logit(P) increases by k times βi.
• When xi increases by k units, we have to multiply the odds ratios for a unit increase k times:

OR_k = e^(k·βi) = (e^βi)^k = e^βi × e^βi × ... × e^βi (k factors)

• Level of the linear predictor: additive model. Level of the odds ratio: multiplicative model.
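The equivalence of the additive and multiplicative views can be verified directly (βi and k are hypothetical):

```python
import math

beta_i = 0.2   # hypothetical coefficient per one-unit increase
k = 5          # increase of k = 5 units

or_additive = math.exp(k * beta_i)          # additive on the logit scale
or_multiplicative = math.exp(beta_i) ** k   # multiplicative on the OR scale
```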
The model assumption of a linear increase of logit(P) need not make sense…
[Figure: U-/J-shaped risk curve over exposure level 0–900]
Log odds for J-shaped risk
[Figure: log odds (−3 to 0) over exposure level 0–900, with a fitted straight line]
• Log odds: definitely no straight line!
• This would be the regression line...
"One unit" for dichotomous xi: a matter of coding…
• Frequently used because intuitive: cornered effects coding
– Exposure present: xi = 1
– Exposure not present: xi = 0
– Difference between exposed and unexposed subjects: one unit
• Unfortunately not uncommon: centered effects coding
– Exposure present: xi = 1
– Exposure not present: xi = −1
– Difference between exposed and unexposed subjects: two (!!) units
• Be careful with your pocket calculator: when using centered effects coding, βi is only half the expected size.
A remark on quantitative and dichotomous risk factors
• If a biological phenomenon is expressed as a dichotomous factor, there may be an enormous difference between xi = 0 and xi = 1 (example: arterial hypertension yes/no):
– βi >> 0 or βi << 0
– ORi >> 1 or ORi << 1
• If xi is quantitative, an increase by one unit usually implies a small risk increase (example: systolic blood pressure in mm Hg):
– βi near 0
– ORi near 1
Polytomous nominal independent variables
• Nominal variables with more than two values must be recoded using dichotomous dummy variables.
• If the nominal variable has k possible values, we need k−1 dummies.
Dummy coding: example

Medication   Dummy 1   Dummy 2   Dummy 3   Interpretation
Placebo      0         0         0         Reference category
Ibuprofen    1         0         0         Ibuprofen vs. placebo
Diclofenac   0         1         0         Diclofenac vs. placebo
Celecoxib    0         0         1         Celecoxib vs. placebo
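A minimal sketch of this recoding in Python; the helper `dummy_code` is a hypothetical illustration, not a standard library function:

```python
def dummy_code(values, reference):
    # k categories -> k-1 dummy variables; the reference category gets all zeros
    categories = [c for c in sorted(set(values)) if c != reference]
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

rows = dummy_code(
    ["Placebo", "Ibuprofen", "Diclofenac", "Celecoxib", "Placebo"],
    reference="Placebo",
)
```

Each resulting row holds k−1 = 3 dummies; a subject on placebo is coded all zeros, so every coefficient is interpreted against the placebo reference.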
Ordinal independent variables
• Dummy coding as with polytomous nominal variables
• (In case of a constant risk increase per category, ordinal variables are sometimes handled as quantitative ones.)
Dummy coding of quantitative independent variables
• Popular solution for the problem of U-shaped risks:
– Group quantitative values into classes.
– Proceed as with natural ordinal variables (dummy coding).

Other solutions for U- and J-shaped risks
• Polynomials: include e.g. xi, xi^2, and xi^3 in the model.
• Fractional polynomials: try combinations of xi^−3, xi^−2, xi^−1, xi^−1/2, ln(xi), xi^1/2, xi, xi^2, and xi^3.
[Figure: graph of x^3 over 0.5–2.0]
Effect modification
• When estimating the risk function, all effects of independent variables in the model are considered simultaneously.
• This means automatic mutual adjustment for confounding.
• But: What do we do about effect modifiers?
Effect modification
• Interaction of two or more independent variables
• Most common: two-way interactions
– Exposure: xi
– Potential effect modifier: xj
– Interaction term: xi × xj
• Effect modification: regression coefficient βij of the interaction term significantly different from 0.
logit(P) = β1x1 + β2x2 + β12x1x2 + ... + α   (β12x1x2: interaction term with its regression coefficient)
Adjusting the OR in case of effect modification
• Remember stratified analysis à la Mantel-Haenszel?
• If there was evidence for effect modification, you reported stratum-specific risk estimates for each value of the effect modifier.
• Logistic regression is no different: the exposure effect can't be quantified by one single risk estimate.
Adjusted odds ratio of the exposure:

OR = e^(βi + βij·xj)

Case #1: xj = 0: OR = e^βi
Case #2: xj = 1: OR = e^(βi + βij) = e^βi × e^βij
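The two cases can be sketched in a few lines, with hypothetical coefficients b_i and b_ij:

```python
import math

# Hypothetical coefficients: exposure x_i, effect modifier x_j
b_i = 0.5    # main effect of the exposure
b_ij = 0.9   # interaction coefficient

def exposure_or(x_j):
    # OR for a one-unit increase of the exposure at modifier level x_j
    return math.exp(b_i + b_ij * x_j)

or_without_modifier = exposure_or(0)   # case 1: x_j = 0
or_with_modifier = exposure_or(1)      # case 2: x_j = 1
```

Because the exposure odds ratio depends on x_j, one number cannot summarize the exposure effect; stratum-specific odds ratios must be reported, just as in a Mantel-Haenszel analysis with effect modification.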
Estimation of absolute risks (1)
• It's easy to derive probabilities from the log odds:
– Cohort studies: cumulative incidences
– Cross-sectional studies: prevalences
– Case-control studies: no meaningful interpretation (arbitrary mix of cases and controls)
Estimation of absolute risks (2)

ln(P / (1 − P)) = Σ βixi + α

P / (1 − P) = e^(Σ βixi + α)

P = e^(Σ βixi + α) / (1 + e^(Σ βixi + α)) = 1 / (1 + e^−(Σ βixi + α))
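A sketch of risk prediction from a hypothetical fitted model (all coefficients invented for illustration):

```python
import math

# Hypothetical fitted model: logit(P) = 0.8*x1 + 0.3*x2 - 4.0
betas = [0.8, 0.3]
alpha = -4.0

def predicted_risk(x):
    lp = sum(b * xi for b, xi in zip(betas, x)) + alpha  # linear predictor
    return 1.0 / (1.0 + math.exp(-lp))                   # back-transform to P

risk = predicted_risk([2.0, 5.0])
```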
Interpretation of the intercept α
• When all xi equal 0:

P = 1 / (1 + e^−(Σ βi·0 + α)) = 1 / (1 + e^−α)

• The intercept α quantifies the baseline risk.
Intercept α from case-control studies
• One can't estimate absolute risks from case-control studies.
• So there's no way of estimating a baseline risk.
• For case-control data, there's no meaningful interpretation of the intercept α.
Maximum likelihood estimation of the regression coefficients (1)
• We're looking for a combination of coefficients βi and α that…
– ... fits our empirical data best.
– ... gives the most plausible explanation for our findings.
– ... maximizes the likelihood of our findings.
Maximum likelihood estimation of the regression coefficients (2)
• For any subject j with event, the model should predict a high individual event probability: p̂j → 1
• For any subject j without event, the model should predict a low probability: p̂j → 0, i.e. (1 − p̂j) → 1
Maximum likelihood estimation of the regression coefficients (3)
Likelihood L: a measure for the goodness of fit between model and observed reality:

L = ∏ (subjects with event) p̂j × ∏ (subjects w/o event) (1 − p̂j)
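For a toy example with invented predicted probabilities, the likelihood is just the product of the appropriate factors:

```python
# Hypothetical predicted probabilities from some fitted model
p_hat_with_event = [0.9, 0.8, 0.7]   # subjects who had the event
p_hat_without_event = [0.2, 0.1]     # subjects who did not

L = 1.0
for p in p_hat_with_event:
    L *= p              # contribution: predicted probability of the event
for p in p_hat_without_event:
    L *= (1.0 - p)      # contribution: predicted probability of NO event
```

A model that assigns high probabilities to observed events and low probabilities to non-events yields a likelihood closer to 1; a poorly fitting model yields a likelihood near 0.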
Maximum likelihood estimation of the regression coefficients (4)
• Technically, the computer doesn't work with the likelihood itself.
• Instead, it uses the deviance as a measure of badness of fit.
• An iterative optimization process adjusts initial estimates of the regression coefficients so that the deviance D is minimized.
• (And the likelihood L is maximized.)
[Figure: deviance D decreasing from about 14 toward 0 as the likelihood L rises from 0 to 1]

D = −2 × ln L
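The iterative optimization described above can be sketched as plain gradient descent on the deviance for one risk factor; this is a pure-Python illustration with invented data (real software uses faster Newton-type iterations):

```python
import math

# Toy data: risk factor x and binary outcome y, invented for illustration
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0,   0,   1,   0,   1,   1]

def deviance(beta, alpha):
    d = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(beta * xi + alpha)))
        d -= 2.0 * (yi * math.log(p) + (1 - yi) * math.log(1.0 - p))
    return d

beta = alpha = 0.0   # initial estimates
step = 0.01
for _ in range(20000):
    g_beta = g_alpha = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(beta * xi + alpha)))
        g_beta += 2.0 * (p - yi) * xi   # dD/dbeta
        g_alpha += 2.0 * (p - yi)       # dD/dalpha
    beta -= step * g_beta               # move downhill on the deviance
    alpha -= step * g_alpha
```

After the loop, the deviance is smaller than at the starting values, and the slope is positive, matching the upward trend of the toy data.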
Likelihood-ratio test: comparison of nested models
• Larger model:
– Number of independent variables in the larger model: m ("many")
• Smaller (reduced) model:
– Number of independent variables in the smaller model: f ("fewer")
– f ∈ {0, 1, ..., m−1}
• Nested models:
– The independent variables in the smaller model must be a subset of the variables in the larger one.
– You can't use the LR test to compare models with disjoint sets of predictors.
Likelihood-ratio test: null hypothesis
H0: the regression coefficients of all m−f independent variables not included in the reduced model are equal to 0.

That is: none of these m−f independent variables makes a difference...
Likelihood-ratio test: test statistic

LR = D_S − D_L ~ χ²(m−f) under H0

• Likelihood ratio statistic LR: deviance of the small model minus deviance of the large model
• Under H0, LR follows a χ² distribution with m−f degrees of freedom.
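A sketch with hypothetical deviances; for one degree of freedom the χ² p-value has a closed form via the complementary error function:

```python
import math

# Hypothetical deviances of two nested models (m - f = 1)
D_small = 140.2   # reduced model, one variable fewer
D_large = 135.3   # larger model

LR = D_small - D_large   # test statistic, df = m - f = 1

# Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x/2))
p_value = math.erfc(math.sqrt(LR / 2.0))
```

Since LR = 4.9 exceeds the 5% critical value of 3.84 for 1 df, the extra variable is significant here; for general df you would use a χ² table or a statistics library.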
Likelihood-ratio test: applications
• Global test (f = 0): is the full model better than the empty null model?
• Significance test for single predictors (f = m−1)
• Significance test for multiple predictors at a time (0 < f < m−1), e.g. dummy variables coding one categorical predictor:
– Software output contains separate Wald tests for each dummy variable, but these are not meaningful.
– A test for all dummy variables together is required.
Likelihood-ratio test: pitfalls
• The LR test can't be used to compare models that are not nested.
• The LR test is only valid if the same observations are used for fitting both models:
– Statistical software uses only cases without missing values.
– Fewer variables ⇒ fewer missing values ⇒ more usable cases for the smaller model
Confidence intervals for the regression coefficients
• Statistical software outputs standard errors for the regression coefficients.
• It's easy to derive confidence limits for regression coefficients and odds ratios:

βi ± 1.96 × SE(βi)

e^(βi ± 1.96 × SE(βi))
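A sketch with a hypothetical coefficient estimate and standard error:

```python
import math

# Hypothetical estimate and standard error from software output
beta_hat = 0.69
se = 0.25

lower = beta_hat - 1.96 * se   # 95% CI for the coefficient
upper = beta_hat + 1.96 * se
or_lower = math.exp(lower)     # 95% CI for the odds ratio
or_upper = math.exp(upper)
```

Because the whole odds ratio interval lies above 1, this hypothetical effect would be significant at the 5% level.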
Same logic: Wald test
• Comfortable alternative to LR tests for examining the roles of single independent variables.
• Null hypothesis H0: βi = 0
• Test statistic: z or (mathematically equivalent) χ² with df = 1:

z = βi / SE(βi)   or   χ²(1) = (βi / SE(βi))²
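The Wald test in a few lines, again with hypothetical values; the two-sided p-value for z uses the complementary error function:

```python
import math

# Hypothetical estimate and standard error
beta_hat = 0.69
se = 0.25

z = beta_hat / se                              # Wald z statistic
chi2 = z * z                                   # equivalent chi-square, df = 1
p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided p from the Normal
```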
Modeling: selection of relevant predictors
Occam's razor:"Entities must not be multiplied beyond what is necessary."
What does this imply for regression models?
• We should look for the most parsimonious model that gives a valid picture.
• Whenever two competing models have equal explanatory power, we should choose the one with fewer independent variables.
A question of precision
• The more independent variables in the model, the lower the precision of the effect estimates:
– Many variables → wide CI
– Few variables → narrow CI
"Everything should be made as simple as possible, but not simpler."
• There's no need to include irrelevant variables in the model.
• However: validity overrides precision. Relevant variables mustn't be missed.
Two aims of modeling – two different modeling strategies
1. Prediction of outcomes
2. Adjustment: valid estimation of the effect of one exposure
Modeling strategy 1: prediction models
• Objective: to find a parsimonious model that predicts the outcomes optimally.
– A priori, all independent variables are of equal interest.
– Independent variables without explanatory power should not appear in the final model.
• Approaches:
– Forward selection (p-value driven)
– Backward elimination (p-value driven)
– Stepwise selection (combination of forward selection and backward elimination)
– Best subset selection (e.g. based on AIC or SC)
Modeling strategy 2: adjusting the exposure effect for other variables
• Our interest is solely in the effect of one selected exposure, to be estimated validly and precisely.
• If the result is valid, there is no problem with an OR near 1.
• The exposure is included in the model, even if its effect is statistically not significant.
Modeling strategy 2: overview
• Step 1: preselection of potential confounders and effect modifiers
• Step 2: identification of effect modifiers (p-value driven)
• Step 3: identification of confounders (change-of-estimate criterion)
Step 1: pre-selection of potential confounders and effect modifiers
• Common sense
• Scientific knowledge: every risk factor for the outcome is a potential confounder
• Exploratory data analysis:
– Bivariate analyses
– Stratified analyses
– In this phase, the aim is high sensitivity for detecting potentially relevant variables, not high specificity → use a high alpha level (0.1 or 0.2).
Step 2: identification of effect modifiers
• Formulate interaction terms of the exposure and all suspected effect modifiers:
– Two-way interactions
– If necessary: 3-way or higher-dimensional interactions
– No interactions that don't involve the exposure
• Formulate the full model with all main effects and all interaction terms.
• Use p-value based backward elimination to remove non-significant interactions.
– Significant interaction means effect modification (important finding).
– No matter what happens later: significant interaction terms and the main effects involved are included in all subsequent models.
Where do we stand now?
• At this stage, the estimate of the exposure effect is as valid as possible:
– All effect modifiers are identified.
– Since all main effects are still in the model, confounding is under control.
• However, the estimate of the exposure effect may not be as precise as it could be:
– Lots of variables in the model → wide confidence interval
Step 3: identification of confounders
• Eliminate main effects that are not needed:
– Main effects not involved in significant interactions
– Main effects that are not confounders
• Change-in-estimate criterion:
– When eliminating a variable, the (maximally valid) estimate of the exposure effect must not change materially.
– If it does, the eliminated variable is a confounder and must be reintroduced into the model.
• This may cause elimination of established risk factors.
– No problem: if the risk factor is not associated with the exposure, it is not a confounder.
– Be careful: the resulting model is not appropriate for predicting outcomes.
Conclusion
• Logistic regression is an established standard for analysis of a wide range of studies:
– Cross-sectional studies
– Unmatched case-control studies
– Cohort studies with cumulative incidence data
• It is not a panacea. Other study designs require other methods:
– Matched case-control studies: conditional logistic regression
– Cohort studies with person-time data: Cox proportional hazards regression