1 Multivariable Modeling. 2 nAdjustment by statistical model for the relationships of predictors to the outcome. nRepresents the frequency or magnitude

1

Multivariable Modeling

2

Multivariable ModelingAdjustment by statistical model for the relationships of predictors to the outcome.

Represents the frequency or magnitude of one phenomenon as a mathematical function of “predictors” and random variation.

–The phenomenon modeled may be continuous, e.g. HDL cholesterol, or categorical, e.g. survival or death.

–The model consists of

a mathematical form of an equation to predict some aspect of the distribution of the predicted phenomenon, e.g. mean cholesterol or probability of death

a probability law that describes, on a group basis, how individuals vary from what the equation predicts

3

Multivariable Modeling The prediction equation typically includes.

– The exposure of interest.– Other exposures of potential importance.– Potential confounders.

The predictors may be.– Continuous variables.– Categorical variables with two or more categories, or.– Combinations of these.

The data are used to estimate coefficients of the prediction equation, and the magnitude of random variation.

The coefficients represent the statistical effects of the various predictors, assuming that the other predictors are adjusted for by holding them constant.– The prediction equation relating the outcome to any single

predictor, holding the others constant, may be linear, quadratic, cyclical, or of many other forms.

4

Multivariable Modeling: Multiple Linear Regression Models the mean of a quantitative outcome as a function of the

values of predictor variables. Assumes independent observations with approximately

Gaussian (normal) distributions. Contrary to what the name suggests, these models need not be

linear in the predictor variables. They are always, however, linear in the coefficients by which the values of predictor variables are multiplied.

Example:

E(y) = 1x1+2x2+3x3+1z1+2z2

where x1 is the value of the exposure of interest, x2 and x3 are values of other variables that may biologically affect y, and z1 and z2 are possible confounders. The x’s and z’s may be continuous, or values of 0-1 “dummy variables” representing categories of qualitative variables.

The Greek coefficients are estimated from the observed data.

5

Example

Outcome is systolic BP = 30 X1 is age 1=1.5 X2 is a dummy variable for gender (male=1, female=0) 2 =10 Equation is: Mean BP =30+ 1.5 X1+ 10 X2+..+… So mean systolic blood pressure=

30 +(1.5 X age)+ 10X0 for women And for men = 30+(1.5Xage) +10X1

6

Interpretation

The most important item here is 1 =1.5 mm Hg.

This will be reported as follows: We found an association between age and

SBP, with a mean increase of 1.5 mmHg for every increase in age of 1 year after adjustment for ….

This applies equally to men and women with the mean SBP being 10 mmHg higher in men at every age.

7

Note that gender is not an effect modifier because the 1.5 mmHg correlation is same for men and women

Gender here is independently associated with the outcome

Could be a confounder in your crude calculations if it is associated with age in your sample

But even if it is a confounder in the crude calculation, the 1.5 mmHg correlation is already adjusted for gender. (It is adjusted for all the other variables in the equation)

8

Multiple Linear Regression When a confounder is added into the equation, the

beta of the exposure you are interested in becomes “adjusted” for the confounder.

That is to say “This is the correct association after a confounder is taken into account”.

This is how confounders are searched for in regression. By adding each into the equation and finding out whether the for the exposure of interest changes.

Whenever a gets close to zero that variable is taken out.

9

Regression

Think of it as each risk factor is adjusted for all the other risk factors in the model.

10

Interpretaion

“There were 10 factors significantly associated with the outcome in univariate analysis. In multivariate analysis only factors 1-5 remained significant.”

Factors 1-5 are truly associated with the outcome. Factors 6-10 are not independently associated

with the outcome. Factors 6-10 were confounded by factors 1-5. for factors 6-10 became 0 after adjustment for

factors 1-5

11

Multivariable Modeling: Multiple linear regression

Thus, 1 represents the predicted change in the mean value of y associated with an increase of one unit in the variable represented by x1, with the variables represented by the other x’s and z’s held fixed.

This type of model accommodates effect modification through the use of interaction terms, e.g., x1z2, which allows the effect of a change of one unit in x1 to vary with the value of z2.

E(y) = 1x1+2x2+3x3+1z1+2z2+ x1z2.

is thus a difference of differences: how the effect on y of a one unit increase in x1 is itself modified by a one unit increase in z2.

12

Multivariable Modeling: Multiple linear regression

E(y) = ………………………+ x1z2.

13

Example

Mean SBP=………+ 0.5 X age X race Mean SBP= 30 +1.5 X age+ 10 X (1 for men and 0 for women) + 0.5 X age X (0 for white and 1 for black) That is to say for every increase of 1 yr

BP goes up 1.5 in whites but 1.5 + 0.5 = 2 in blacks.

14

Multivariable Modeling: Multiple logistic regression

Models the probability of a dichotomous outcome as a function of the values of predictor variables.

Assumes independent observations with binomial distributions.

The right side of equation is the same as linear regression. The left side (the outcome) is different

Natural log of the odds of outcome = 1x1+2x2+3x3+1z1+2z2

e.g. the odds of response to cancer chemotherapy, and the other symbols are all as defined above for multiple linear regression.

15

Multivariable Modeling: Multiple logistic regression

More specifically: When x1 represents levels of a dichotomous predictor by the

values 0 (absent) and 1 (present), then

exp(1) is the predicted odds ratio relating predictor to outcome,

e.g., smoking to lung cancer,

adjusted for other possible predictors and confounders.

When x1 represents values of a quantitative predictor, then

exp(1) is the odds ratio between predictor and outcome,

e.g., stroke and diastolic blood pressure,

associated with a one unit increase in the predictor,

and adjusted for other possible predictors and confounders.

16

Multivariable Modeling: Multiple Logistic Regression

Again multiple variables will be introduced to see if the OR for others will become 1 ( or close to 1). Or if the associated p-values will become NS.

These variables are then dropped out of the equation because they were not truly associated with the outcome but were only confounded by the other variables.

17

Multivariable Modeling: Multiple Logistic Regression

At the end the relevant variables’ ORs will be reported and also interactions will be reported.

The OR will be reported as the adjusted OR for that association. (Adjusted for all the variables in the model)

18

Interpretaion

Moderate alcohol consumption protects from coronary disease (OR =0.56)

It is well established that moderate alcohol consumption CAUSES an increase in HDL.

It has been postulated that alcohol’s coronary protective effect is mediated by raising HDL.

“When HDL level was introduced into the model the RR for moderate drinking increased from 0.56 to 0.77 but remained significant.”

19

Interpretaion

HDL explains some but not all of alcohol’s coronary protective effect.

the RR for alcohol (0.77) is independent of it’s effect on HDL.

Alcohol offers more protection (RR 0.56) through its effect on raising HDL.

Some of the protective effect is mediated (not confounded) by HDL

Both HDL and alcohol are truly and independently associated with decreased coronary events.

20

Not a confounder

HDL is not a confounder Why did we adjust for it? When should you do that? When should you not?

21

Propensity Scores

If there isn’t enough outcomes you can’t use logistic regression.

Rifampin and Pyrazinamide versus Isoniazid for latent TB.

411 patients. 18 cases of hepatotoxicity. Not randomized. Patients at higher risk for hepatotoxicity

received R/P.

22

Propensity Scores

A crude comparison would be unfair to R/P.

Need some “adjustment’. Typically we use logistic regression to

look for any and all factors associated with hepatotoxicity and adjust for those.

When the outcome is rare this cannot be done.

23

Propensity Scores

We can look for factors associated with the treatment choice.

Certain variables (e.g. alcohol use) make a patient more likely to receive R/P.

These factors are given numeric scores. The higher the score the higher the

propensity to be treated with R/P

24

Propensity Scores

You calculate propensity score for every patient.

Compare patients with equal propensity scores as to the incidence of the outcome.

There might be 90 patients with the same propensity score.

They all are moderate alcohol drinkers, they all had remote history of hepatitis, and so on.

This can accommodate many many variables.

25

Those 90 patients

With identical high propensity scores had a high likelihood of receiving R/P.

Guess what 75 of them received P/R and only 15 received INH.

But NOW we can compare the incidence of that outcome in these 75 to these 15.

26

Typically

5 groups using the quintiles of the score are used.

27

In a Clinical Trial of Platelet Inhibitor Data were collected regarding

outcomes (death etc.) Also we have information about who

received early statin therapy. But receiving early statin was not a

random process. Totally up to clinical discretion

28

We want to study

The association between early statin therapy and outcome.

BUT The patients who received statins are

very different than those who didn’t.

29

Crude event rates

Would be unfair comparison. PROPENSITY SCORES We find out what factors were

associated with statin use. For example younger patients were

more likely to receive statin.

30

Propensity Scores

Are then used to classify patients by quintile of increasing probability of early statin initiation. (The 1st quintile least likely, the 5th most likely).

Patients within each quintile were similar in their likelihood to receive a statin.

31

32

33

Patients in 1st quintile

Were least likely to receive statin Of 2391 patients144 received statin and

2247 did not. All these 2391 patients are very similar

in all confounding factors and can be compared.

Documents

1 Multivariable Modeling. 2 nAdjustment by statistical model for the relationships of predictors to the outcome. nRepresents the frequency or magnitude