
IBE

Multiple logistic regression

Ulrich Mansmann, Alexander Crispin
Department of Medical Informatics, Biometry, and Epidemiology

Ludwig Maximilians University Munich

2

Overview

• Applications and contraindications
• Basics
• Interpreting the regression coefficients
• Maximum likelihood estimation
• Likelihood ratio test
• Wald test, confidence intervals
• Modeling strategies


Applications and contraindications

6

Regression models in epidemiology and other health sciences

• De facto standard for control of confounding and effect modification

• Standard implies:
– Tried and tested for decades
– No longer avant-garde statistical methodology

• Much more flexible than e.g. Mantel-Haenszel analyses

7

Logistic regression: applications

• Many studies with a binary (= dichotomous) outcome:
– Cross-sectional studies
– Case-control studies without matching
– Cohort studies with cumulative incidence data

8

Logistic regression: assumptions

• One dichotomous outcome variable (dependent variable)

• Multiple, differently scaled factors (independent variables) influencing the outcome

• No collinearity: each independent variable provides unique information

• Independence of study subjects

A robust method with only a few assumptions that could be violated.

9

Contraindications

Problem                    Example                                   Better try...
Outcome not binary         Quantitative outcome                      Linear regression
                           Ordinal outcome                           Ordinal logistic regression
                           Person-time data (incidence densities)    Cox proportional hazards regression
Subjects not independent   Matched case-control studies              Conditional logistic regression
                           Cluster samples                           GEE models, multi-level models
No individual data         Aggregate data on numbers of events       Poisson regression

10

Examples of multi-collinearity

• Several indicators measuring the same factor
– Indicators of socio-economic status (SES): income, property, education, prestige

• Causal chains
– Calorie intake → adiposity → metabolic syndrome → type 2 diabetes mellitus → arteriosclerosis → myocardial infarction (MI)

14

Basics of logistic regression

15

One dichotomous outcome as a function of multiple factors

• The event probability P of an individual is a function of multiple variables x1, x2, ..., xn:

\[ P = f(x_1, \dots, x_n) \]

• At the end of the study, all individual outcome probabilities are known:
– Subjects without event: P = 0
– Subjects with event: P = 1

16

If the true function f were known...

• Clinicians could predict individual disease risks from individual risk factor patterns. (Prediction)

• Scientists could tell the true effects of risk factors. (Adjustment, confounder control)
– In many studies, there is one exposure of primary interest.
– All other independent variables are merely nuisance factors.

17

Bad news, good news

• Bad news: the true function f is and will remain unknown.

• Good news: we can estimate the function f from empirical data.

18

How should the function f behave?

[Figure: risk (0 to 1) plotted against risk factor level (0–900)]

• Low level of exposure: risk near 0
• High level of exposure: risk near 1
• Sigmoidal (S-shaped) increase with rising levels of exposure

19

Two variables: one exposure of primary interest, one confounder

[Figure: risk surface (0 to 1) over exposure and confounder levels (100–800 each)]

• S-shaped risk increase with higher levels of the confounder
• The exposure itself has no effect
• The exposure is associated with the confounder

20

More than two independent variables

• We have no intuitive understanding of higher-dimensional relationships.

• However, we can do computations involving more than three dimensions.

21

Mathematical details

22

Regression equation for logistic regression

\[ \mathrm{logit}(P) = \ln\frac{P}{1-P} = \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \dots + \alpha \]

• Left-hand side of the equation: the logit of the event probability ("log odds")
• Right-hand side: the linear predictor (familiar from multiple linear regression)
• β1x1, β2x2: main effects, each weighted by a regression coefficient
• β12x1x2: interaction term with its regression coefficient
• α: intercept

23

Suboptimal alternative: linear probability model

[Figure: straight regression line for P against x, leaving the range 0 to 1]

• The linear predictor is convenient and comes naturally.
• However, standard linear regression is not ideal for modeling probabilities:
– Biologically, a sigmoidal function makes more sense than a straight line.
– The linear probability model leads to impossible probability estimates below 0 or above 1.

24

Motivation for the logit transformation

• If we want to use a linear predictor, we must transform the straight line into a sigmoidal function graph.

• Link function: we need a function that links our linear predictor to the non-linear probability.

• The most common link function is the logistic function.

25

Reminder: disease probability as a function of risk factor RF

[Figure: sigmoidal risk curve (0 to 1) against risk factor level (0–900)]

p = odds/(1+odds); odds = p/(1−p)

26

Odds as a function of RF

[Figure: odds (0 to 20) against risk factor level (0–900)]

p = odds/(1+odds); odds = p/(1−p)

27

Log odds as a function of RF

[Figure: log odds (−10 to 10) against risk factor level (0–900)]

p = odds/(1+odds); odds = p/(1−p)
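To make the three scales concrete, here is a minimal Python sketch (the helper names are ours, not from the slides) converting between probability, odds, and log odds:

    import numpy as np

    def odds(p):
        # odds = p / (1 - p)
        return p / (1.0 - p)

    def log_odds(p):
        # logit(p) = ln(p / (1 - p))
        return np.log(odds(p))

    def inv_logit(lo):
        # back-transformation: p = 1 / (1 + e^(-log odds))
        return 1.0 / (1.0 + np.exp(-lo))

    for p in (0.1, 0.5, 0.9):
        print(p, odds(p), log_odds(p), inv_logit(log_odds(p)))

Note how p = 0.5 corresponds to odds = 1 and log odds = 0, the midpoint of the sigmoid.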

28

Why use the logistic function?

• There are other functions with sigmoidal graphs and values between 0 and 1.
• Example: distribution function of the Normal distribution (probit transformation)

[Figure: Normal distribution function as an alternative sigmoidal risk curve]

29

A nice feature of logistic regression…

\[ OR_i = e^{\beta_i} \qquad (e = 2.718\ldots) \]

30

Interpretation of the regression coefficient βi (1)

• Expected change of logit(P) associated with an increase of the independent variable xi by one unit.

• Exponentiation of βi yields the odds ratio for an increase of the independent variable xi by one unit.

• Since all regression coefficients are estimated simultaneously, all odds ratios are automatically adjusted for confounding by all other independent variables in the model.

• Under the rare disease assumption, the odds ratio is a good approximation of the relative risk.

• (If the outcome is not a rare event, the odds ratio may still be used as an association measure in its own right.)
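As a sketch of how this looks in practice (Python with statsmodels on simulated data; the coefficients below are hypothetical):

    import numpy as np
    import statsmodels.api as sm

    # Simulated data: 200 subjects, two independent variables
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 2))
    # Assumed true model: logit(P) = 0.8*x1 - 0.5*x2 - 0.3
    p = 1 / (1 + np.exp(-(0.8 * x[:, 0] - 0.5 * x[:, 1] - 0.3)))
    y = rng.binomial(1, p)

    res = sm.Logit(y, sm.add_constant(x)).fit(disp=False)
    print(res.params)              # alpha, beta_1, beta_2 on the logit scale
    print(np.exp(res.params[1:]))  # OR_i = e^(beta_i), mutually adjusted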

31

Interpretation of the regression coefficient βi (2)

Regression coefficient   Odds ratio   Interpretation
βi < 0                   ORi < 1      Protective effect
βi = 0                   ORi = 1      No effect
βi > 0                   ORi > 1      Increased risk

32

Derivation: high risk subject H and low risk subject L

• Let L and H be two persons with almost identical risk factor patterns.

• Only difference: H has a higher risk because his value of risk factor x1 exceeds L's by exactly one unit.

33

Log odds of subjects L and H

\[ \text{(I)}\quad \ln\frac{P_L}{1-P_L} = \beta_1 x_1 + \beta_2 x_2 + \dots + \alpha = \beta_1 x_1 + C \]

\[ \text{(II)}\quad \ln\frac{P_H}{1-P_H} = \beta_1 (x_1+1) + \beta_2 x_2 + \dots + \alpha = \beta_1 (x_1+1) + C \]

(C collects all terms that are identical for L and H.)

34

Equation II minus equation I yields the log odds ratio

\[ \text{(II)} - \text{(I)}: \quad \ln\frac{P_H}{1-P_H} - \ln\frac{P_L}{1-P_L} = \beta_1(x_1+1) + C - (\beta_1 x_1 + C) = \beta_1 \]

\[ \ln OR = \ln\frac{P_H/(1-P_H)}{P_L/(1-P_L)} = \beta_1 \]

\[ OR = e^{\beta_1} \]

35

What is "one unit"?

• Quantitative variables
• Dichotomous variables
• Polytomous variables
• Ordinal variables

36

Quantitative independent variables

• The risk associated with a unit increase of body weight depends on the measurement unit:
– Pounds?
– Kilograms?
– Hundredweights?
– Metric tons?

37

Quantitative independent variables: increases by multiple units

• The linear predictor says: when xi increases by one unit, logit(P) increases by the constant amount βi.
• It follows: when xi increases by k units, logit(P) increases by k times βi.
• When xi increases by k units, we have to multiply the odds ratio for a unit increase k times:

\[ OR = e^{k\beta_i} = \left(e^{\beta_i}\right)^k = \prod_k e^{\beta_i} = e^{\beta_i} \times e^{\beta_i} \times \dots \]

• On the level of the linear predictor: additive model. On the level of the odds ratio: multiplicative model.
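A quick numerical check of this additive/multiplicative duality (Python; the coefficient is hypothetical):

    import numpy as np

    beta_i = 0.05   # hypothetical coefficient per unit
    k = 10          # increase by 10 units

    print(k * beta_i)             # change of logit(P): additive
    print(np.exp(k * beta_i))     # odds ratio: e^(k*beta_i)
    print(np.exp(beta_i) ** k)    # identical: (e^(beta_i))^k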

38

The model assumption of a linear increase of logit(P) need not make sense…

[Figure: U-/J-shaped risk (0 to 1) against exposure (0–900)]

39

Odds for J-shaped risk

[Figure: odds against exposure (0–900)]

40

Log odds for J-shaped risk

[Figure: log odds (−3 to 0) against exposure (0–900), with a fitted straight regression line for comparison]

• Log odds: definitely no straight line!
• This would be the regression line...

41

"One unit" for dichotomous xi: a matter of coding…

• Frequently used because intuitive: cornered effects coding
– Exposure present: xi = 1
– Exposure not present: xi = 0
⇒ Difference between exposed and unexposed subjects: one unit

• Unfortunately not uncommon: centered effects coding
– Exposure present: xi = 1
– Exposure not present: xi = −1
⇒ Difference between exposed and unexposed subjects: two (!!) units

Be careful with your pocket calculator: when using centered effects coding, βi is only half the expected size.
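The factor-of-two effect is easy to verify by fitting the same simulated data under both codings (Python with statsmodels; the data are made up):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    exposed = rng.binomial(1, 0.5, size=500)
    p = 1 / (1 + np.exp(-(-1.0 + 0.9 * exposed)))   # assumed true beta = 0.9
    y = rng.binomial(1, p)

    x_cornered = sm.add_constant(exposed.astype(float))              # 0 / 1
    x_centered = sm.add_constant(np.where(exposed == 1, 1.0, -1.0))  # -1 / +1

    b_cornered = sm.Logit(y, x_cornered).fit(disp=False).params[1]
    b_centered = sm.Logit(y, x_centered).fit(disp=False).params[1]
    print(b_cornered, b_centered)   # b_centered is exactly half of b_cornered

(Exactly half, because the centered variable equals 2 × the 0/1 variable − 1.)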

42

A remark on quantitative and dichotomous risk factors

• If a biological phenomenon is expressed as a dichotomous factor, there may be an enormous difference between xi = 0 and xi = 1 (example: arterial hypertension yes/no):
– βi >> 0 or βi << 0
– ORi >> 1 or ORi << 1

• If xi is quantitative, an increase by one unit usually implies a small risk increase (example: systolic blood pressure in mm Hg):
– βi near 0
– ORi near 1

43

Polytomous nominal independent variables

• Nominal variables with more than two values must be recoded using dichotomous dummy variables.

• If the nominal variable has k possible values, we need k−1 dummies.

44

Dummy coding: example

Medication   Dummy 1   Dummy 2   Dummy 3   Interpretation
Placebo      0         0         0         Reference category
Ibuprofen    1         0         0         Ibuprofen vs. placebo
Diclofenac   0         1         0         Diclofenac vs. placebo
Celecoxib    0         0         1         Celecoxib vs. placebo
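In practice, such dummies need not be typed by hand; pandas, for example, can generate them (a sketch; the column layout follows pandas conventions):

    import pandas as pd

    meds = ["Placebo", "Ibuprofen", "Diclofenac", "Celecoxib", "Placebo"]
    cat = pd.Categorical(
        meds, categories=["Placebo", "Ibuprofen", "Diclofenac", "Celecoxib"]
    )

    # drop_first=True drops the first category, making Placebo the reference
    dummies = pd.get_dummies(cat, drop_first=True)
    print(dummies)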

45

Ordinal independent variables

• Dummy coding as with polytomous nominal variables

• (In case of a constant risk increase per category, ordinal variables are sometimes handled as quantitative ones.)

46

Dummy coding of quantitative independent variables

• Popular solution for the problem of U-shaped risks:
– Group quantitative values into classes.
– Proceed as with natural ordinal variables (dummy coding).

Other solutions for U- and J-shaped risks

• Polynomials:
– Include e.g. xi, xi^2, and xi^3 in the model.

• Fractional polynomials:
– Try combinations of xi^−3, xi^−2, xi^−1, xi^−1/2, ln(xi), xi^1/2, xi, xi^2, and xi^3.
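A polynomial model is fitted by simply adding powers of xi as extra columns of the design matrix (Python sketch on simulated U-shaped data):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = rng.uniform(0.5, 2.0, size=400)
    # Assumed U-shaped true risk on the logit scale
    true_logit = 2.0 * (x - 1.2) ** 2 - 1.0
    y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

    X = sm.add_constant(np.column_stack([x, x**2, x**3]))  # x, x^2, x^3
    res = sm.Logit(y, X).fit(disp=False)
    print(res.params)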

47

[Figure: graph of x^3 for x between 0.5 and 2.0]

48

Effect modification

• When estimating the risk function, all effects of independent variables in the model are considered simultaneously.

• This means automatic mutual adjustment for confounding.

• But: What do we do about effect modifiers?

49

Effect modification

• Interaction of two or more independent variables
• Most common: two-way interactions
– Exposure: xi
– Potential effect modifier: xj
– Interaction term: xi × xj

• Effect modification: regression coefficient βij of the interaction term significantly different from 0.

\[ \mathrm{logit}(P) = \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \dots + \alpha \]

(β12x1x2: interaction term with its regression coefficient)

50

Adjusting the OR in case of effect modification

• Remember stratified analysis à la Mantel-Haenszel?

• If there was evidence for effect modification, you reported stratum-specific risk estimates for each value of the effect modifier.

• Logistic regression is no different: the exposure effect can't be quantified by one single risk estimate.

Adjusted odds ratio of the exposure:

\[ OR = e^{\beta_i + \beta_{ij} x_j} \]

• Case #1: xj = 0: \( OR = e^{\beta_i} \)
• Case #2: xj = 1: \( OR = e^{\beta_i + \beta_{ij}} = e^{\beta_i} \cdot e^{\beta_{ij}} \)

51

Estimation of absolute risks (1)

• It's easy to derive probabilities from the log odds:
– Cohort studies: cumulative incidences
– Cross-sectional studies: prevalences
– Case-control studies: no meaningful interpretation (arbitrary mix of cases and controls)

52

Estimation of absolute risks (2)

\[ \ln\frac{P}{1-P} = \sum_i \beta_i x_i + \alpha \]

\[ \frac{P}{1-P} = e^{\sum_i \beta_i x_i + \alpha} \]

\[ P = \frac{e^{\sum_i \beta_i x_i + \alpha}}{1 + e^{\sum_i \beta_i x_i + \alpha}} = \frac{1}{1 + e^{-\left(\sum_i \beta_i x_i + \alpha\right)}} \]

53

Interpretation of the intercept α

• When all xi equal 0:

\[ P = \frac{1}{1 + e^{-\left(\sum_i \beta_i \cdot 0 + \alpha\right)}} = \frac{1}{1 + e^{-\alpha}} \]

• The intercept α quantifies the baseline risk.

54

Intercept α from case-control studies

• One can't estimate absolute risks from case-control studies.

• So there's no way of estimating a baseline risk.

• For case-control data, there's no meaningful interpretation of the intercept α.

55

Maximum likelihood estimation of the regression coefficients (1)

• We're looking for a combination of coefficients βi and α that…
– … fits our empirical data best.
– … gives the most plausible explanation for our findings.
– … maximizes the likelihood of our findings.

56

Maximum likelihood estimation of the regression coefficients (2)

• For any subject j with an event, the model should predict a high individual event probability: p̂j → 1

• For any subject j without an event, the model should predict a low probability: p̂j → 0 ⇔ 1 − p̂j → 1

57

Maximum likelihood estimation of the regression coefficients (3)

Likelihood L: a measure for the goodness of fit between model and observed reality:

\[ L = \prod_{\text{subjects with events}} \hat{p}_j \times \prod_{\text{subjects w/o events}} (1 - \hat{p}_j) \]

58

Maximum likelihood estimation of the regression coefficients (4)

• Technically, the computer doesn't work with the likelihood itself.
• Instead, it uses the deviance as a measure of badness of fit.
• An iterative optimization process adjusts initial estimates of the regression coefficients so that the deviance D is minimized.
• (And the likelihood L is maximized.)

[Figure: deviance plotted against likelihood]

\[ D = -2 \ln L \]
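In statsmodels, for example, the iterative fit reports the maximized log likelihood, from which the deviance follows directly (a sketch on simulated data):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    x = rng.normal(size=(300, 2))
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.7 * x[:, 0] - 0.4 * x[:, 1]))))

    res = sm.Logit(y, sm.add_constant(x)).fit(disp=False)  # iterative ML fit
    print(res.llf)        # maximized log likelihood ln(L)
    print(-2 * res.llf)   # deviance D = -2 ln(L)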

59

Inference statistics

60

Likelihood-ratio test: comparison of nested models

• Larger model:
– Number of independent variables in the larger model: m ("many")

• Smaller (reduced) model:
– Number of independent variables in the smaller model: f ("fewer")
– f ∈ {0, 1, ..., m−1}

• Nested models:
– The independent variables in the smaller model must be a subset of the variables in the larger one.
– You can't use the LR test to compare models with disjoint sets of predictors.

61

Likelihood-ratio test: null hypothesis

H0: The regression coefficients of all m−f independent variables not included in the reduced model are equal to 0.

That is: none of these m−f independent variables makes a difference...

62

Likelihood-ratio test: test statistic

\[ LR = D_S - D_L \;\overset{H_0}{\sim}\; \chi^2_{m-f} \]

• Likelihood ratio LR: difference of the deviances of the small and the large model
• Under H0, LR follows a χ2 distribution with m−f degrees of freedom.

63

Likelihood-ratio test: applications

• Global test (f = 0): Is the full model better than the empty null model?

• Significance test for single predictors (f = m−1)

• Significance test for multiple predictors at a time (0 < f < m−1), e.g. dummy variables coding one categorical predictor:
– Software output contains separate Wald tests for each dummy variable, but these are not meaningful.
– A test for all dummy variables together is required.
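A joint LR test of several predictors can be computed by fitting the nested models and comparing deviances (Python sketch; here two extra columns stand in for the dummies of one categorical predictor):

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import chi2

    rng = np.random.default_rng(4)
    x = rng.normal(size=(400, 3))      # column 0: exposure; 1-2: dummies
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.6 * x[:, 0] + 0.3 * x[:, 1]))))

    large = sm.Logit(y, sm.add_constant(x)).fit(disp=False)           # m = 3
    small = sm.Logit(y, sm.add_constant(x[:, [0]])).fit(disp=False)   # f = 1

    LR = (-2 * small.llf) - (-2 * large.llf)   # D_S - D_L
    print(LR, chi2.sf(LR, 2))                  # df = m - f = 2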

64

Likelihood-ratio test: pitfalls

• The LR test can't be used to compare models that are not nested.

• The LR test is only valid if the same number of observations is used for fitting both models:
– Statistical software uses only cases without missing values.
– Fewer variables ⇒ fewer missing values ⇒ more usable cases for the smaller model

65

Confidence intervals for the regression coefficients

• Statistical software outputs standard errors for the regression coefficients.

• It's easy to derive confidence limits for regression coefficients and odds ratios:

\[ \beta_i \pm 1.96 \times SE(\beta_i) \]

\[ e^{\beta_i \pm 1.96 \times SE(\beta_i)} \]

66

Same logic: Wald test

• Comfortable alternative to LR tests for examining the roles of single independent variables.
• Null hypothesis H0: βi = 0
• Test statistic: z or (mathematically equivalent) χ2 (df = 1)

\[ z = \frac{\beta_i}{SE(\beta_i)} \quad \text{or} \quad \chi^2_{(1)} = \left(\frac{\beta_i}{SE(\beta_i)}\right)^2 \]
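Both the confidence limits and the Wald statistic come straight from the estimates and their standard errors (Python sketch on simulated data):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    x = rng.normal(size=(300, 1))
    y = rng.binomial(1, 1 / (1 + np.exp(-0.8 * x[:, 0])))

    res = sm.Logit(y, sm.add_constant(x)).fit(disp=False)
    b, se = res.params[1], res.bse[1]

    ci_beta = (b - 1.96 * se, b + 1.96 * se)   # 95% CI for beta_i
    ci_or = tuple(np.exp(ci_beta))             # 95% CI for the odds ratio
    z = b / se                                 # Wald statistic
    print(ci_beta, ci_or, z, z**2)             # z^2 ~ chi2 with df = 1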

67

Modeling: selection of relevant predictors

Occam's razor: "Entities must not be multiplied beyond what is necessary."

68

What does this imply for regression models?

• We should look for the most parsimonious model that gives a valid picture.

• Whenever two competing models have equal explanatory power, we should choose the one with fewer independent variables.

69

A question of precision

• The more independent variables in the model, the lower the precision of the effect estimates:
– Many variables: wide CIs
– Few variables: narrow CIs

70

"Everything should be made as simple as possible, but not simpler."

• There's no need to include irrelevant variables in the model.

• However: validity overrides precision. Relevant variables mustn't be missed.

71

Two aims of modeling – two different modeling strategies

1. Prediction of outcomes
2. Adjustment: valid estimation of the effect of one exposure

72

Modeling strategy 1: prediction models

• Objective: to find a parsimonious model that predicts the outcomes optimally.
– A priori, all independent variables are of equal interest.
– Independent variables without explanatory power should not appear in the final model.

• Approaches (a sketch of p-value-driven backward elimination follows below):
– Forward selection (p-value driven)
– Backward elimination (p-value driven)
– Stepwise selection (combination of forward selection and backward elimination)
– Best subset selection (e.g. based on AIC or SC)
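A minimal sketch of p-value-driven backward elimination (Python with statsmodels; real software offers more refined versions, and the alpha threshold is a choice, not a fixed rule):

    import numpy as np
    import statsmodels.api as sm

    def backward_eliminate(y, X, alpha=0.05):
        # Repeatedly drop the least significant variable until all
        # p-values are <= alpha (or only one variable is left).
        cols = list(range(X.shape[1]))
        while True:
            res = sm.Logit(y, sm.add_constant(X[:, cols])).fit(disp=False)
            pvals = res.pvalues[1:]            # skip the intercept
            worst = int(np.argmax(pvals))
            if pvals[worst] <= alpha or len(cols) == 1:
                return cols, res
            del cols[worst]

    rng = np.random.default_rng(6)
    X = rng.normal(size=(500, 4))              # only columns 0 and 1 matter
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.6 * X[:, 1]))))
    kept, final = backward_eliminate(y, X)
    print(kept)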

73

Modeling strategy 2: adjusting the exposure effect for other variables

• Our interest is solely in the effect of one selected exposure, which is to be estimated validly and precisely.

• If the result is valid, there is no problem with an OR near 1.

• The exposure is included in the model even if its effect is not statistically significant.

74

Modeling strategy 2: overview

• Step 1: preselection of potential confounders and effect modifiers

• Step 2: identification of effect modifiers (p-value driven)

• Step 3: identification of confounders (change-of-estimate criterion)

75

Step 1: pre-selection of potential confounders and effect modifiers

• Common sense
• Scientific knowledge: every risk factor for the outcome is a potential confounder
• Exploratory data analysis:
– Bivariate analyses
– Stratified analyses
– In this phase, the aim is high sensitivity for detecting potentially relevant variables, not high specificity ⇒ use a high alpha level (0.1 or 0.2).

76

Step 2: identification of effect modifiers

• Formulate interaction terms of the exposure and all suspected effect modifiers:
– Two-way interactions
– If necessary: 3-way or higher-dimensional interactions
– No interactions that don't involve the exposure

• Formulate the full model with all main effects and all interaction terms.

• Use p-value based backward elimination to remove non-significant interactions.
– A significant interaction means effect modification (an important finding).
– No matter what happens later: significant interaction terms and the main effects involved are included in all subsequent models.

77

Where do we stand now?

• At this stage, the estimate of the exposure effect is as valid as possible:
– All effect modifiers are identified.
– Since all main effects are still in the model, confounding is under control.

• However, the estimate of the exposure effect may not be as precise as it could be:
– Lots of variables in the model ⇒ wide confidence interval

78

Step 3: identification of confounders

• Eliminate main effects that are not needed (see the sketch after this list):
– Main effects not involved in significant interactions
– Main effects that are not confounders

• Change-in-estimate criterion:
– When eliminating a variable, the (maximally valid) estimate of the exposure effect must not change materially.
– If it does, the eliminated variable is a confounder and must be reintroduced into the model.

• This may cause elimination of established risk factors.
– No problem: if the risk factor is not associated with the exposure, it's not a confounder.
– Be careful: the resulting model is not appropriate for predicting outcomes.
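A sketch of the change-in-estimate check (Python on simulated data; the 10% threshold mentioned in the comment is a common rule of thumb, not a fixed rule):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    conf = rng.normal(size=600)                       # potential confounder
    expo = rng.binomial(1, 1 / (1 + np.exp(-conf)))   # exposure depends on it
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * expo + 0.7 * conf - 1.0))))

    full = sm.Logit(y, sm.add_constant(np.column_stack([expo, conf]))).fit(disp=False)
    redu = sm.Logit(y, sm.add_constant(expo.astype(float))).fit(disp=False)

    b_full, b_redu = full.params[1], redu.params[1]
    change = abs(b_redu - b_full) / abs(b_full)   # relative change of beta_expo
    print(b_full, b_redu, change)   # e.g. change > 0.10: reintroduce the variable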

79

Conclusion

• Logistic regression is an established standard for the analysis of a wide range of studies:
– Cross-sectional studies
– Unmatched case-control studies
– Cohort studies with cumulative incidence data

• It is not a panacea. Other study designs require other methods:
– Matched case-control studies: conditional logistic regression
– Cohort studies with person-time data: Cox proportional hazards regression