Week 6: Model selection

Overview

Questions from last week

Model selection in multivariable analysis

-bivariate significance

-interaction and confounding

Discussion of the 3 articles

Data analysis discussion

Univariate, bivariate, and multivariate analysis: a review

Univariate, continuous variable
• Test used: mean, median, standard deviation; histogram
• Purpose (outcome variable): to assess whether the distribution is normal
• Purpose (exposure variable): to examine the distribution, missing values, etc.

Univariate, categorical variable
• Test used: frequency distribution
• Purpose (outcome variable): to assess frequency
• Purpose (exposure variables): to assess frequency (are there enough observations in each category?) and missing values

Univariate, bivariate, and multivariate analysis: a review

Bivariate, for exposure groups
• Continuous: t-test between exposure groups
• Categorical: chi-square test
• Purpose: to assess differences between groups prior to analysis, and to look for possible confounding relationships

Bivariate, for outcome groups
• Continuous: t-test
• Categorical: odds ratio
• Purpose: to look for significant differences in the outcome variable by exposure variables (the ‘crude’ analysis)
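As a rough illustration of the crude analysis for a categorical exposure, the sketch below computes an odds ratio and an approximate 95% confidence interval from a 2x2 table. The counts are hypothetical, and the function name `odds_ratio_ci` and the choice of the Woolf (log-based) interval are assumptions made for this example, not something taken from the slides.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with an approximate (Woolf) confidence interval.

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) from the four cell counts
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high

# Hypothetical counts, made up for illustration only
crude_or, ci_low, ci_high = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(f"crude OR = {crude_or:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An interval that includes 1 means the crude association is not significant at the 5% level.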

Univariate, bivariate, and multivariate analysis: a review

Multivariate, for continuous outcomes
• Test used: linear regression analysis
• Purpose: to examine the relationship between each exposure variable and the outcome variable, controlling for all the other variables in the model
• A high R² is desired

Multivariate, for binary (yes/no) outcomes
• Test used: logistic regression analysis
• Purpose: to examine the relationship between each exposure variable and the outcome variable, controlling for all the other variables in the model (the ‘adjusted’ analysis)

Back to the mathematical model

• In linear regression (Y’ = A + β1X1 + β2X2 + β3X3), Y’ (known as Y prime) is the predicted value of the outcome variable

• A is the Y-axis intercept

• β1 is the coefficient assigned to X1 through regression

• X1 is the value of the exposure variable

• For logistic regression the model is:

ln( Y’ / (1 - Y’) ) = A + β1X1 + β2X2 + β3X3
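To make the logistic model concrete, here is a minimal sketch that turns the right-hand side (the log-odds) back into a predicted probability Y’. The intercept and coefficients are hypothetical values chosen only for illustration, and the function name `predicted_probability` is an assumption of this example.

```python
import math

def predicted_probability(intercept, coefficients, exposures):
    """Invert ln(Y' / (1 - Y')) = A + b1*X1 + b2*X2 + ... to get Y'."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, exposures))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical model, made up for illustration only
A = -2.0
betas = [0.8, 0.5, -0.3]

p = predicted_probability(A, betas, exposures=[1, 0, 1])
print(f"predicted probability: {p:.3f}")

# exp(beta) is the odds ratio for a one-unit increase in that exposure,
# holding the other variables in the model constant
odds_ratio_x1 = math.exp(betas[0])
print(f"odds ratio for X1: {odds_ratio_x1:.2f}")
```

Exponentiating a coefficient is what turns the regression output into the ‘adjusted’ odds ratio reported for a logistic model.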

Model selection

• A ‘full’ model is one that includes all the variables

• A ‘null’ model is one that includes only the intercept

• Selection of which variables to include can be done by you, by the computer, or both

• Types of selection:

• Forward, backward, stepwise

Backward selection

• Starts with a full model

• Removes variables starting with the least significant variable

• Often the best approach to start with
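The backward procedure can be sketched as a loop. This is only a skeleton under stated assumptions: `fit_p_values` is a hypothetical placeholder for whatever routine refits the model and returns a p-value per variable, the 0.05 cut-off is an assumed default, and the p-values in `toy_fit` are made up so the example runs on its own.

```python
def backward_select(variables, fit_p_values, alpha=0.05):
    """Start from the full model and drop the least significant variable
    until every remaining variable has p < alpha.

    fit_p_values(selected) must refit the model on `selected` and return
    {variable: p_value}. It stands in for a real fitting routine.
    """
    selected = list(variables)
    while selected:
        p_values = fit_p_values(selected)
        worst = max(selected, key=lambda v: p_values[v])
        if p_values[worst] < alpha:
            break  # everything left is significant
        selected.remove(worst)
    return selected

def toy_fit(selected):
    # Made-up p-values: "bmi" becomes significant once "weight" is removed
    p = {"age": 0.01, "smoker": 0.03, "weight": 0.60, "bmi": 0.20}
    if "weight" not in selected:
        p["bmi"] = 0.04
    return {v: p[v] for v in selected}

final = backward_select(["age", "smoker", "weight", "bmi"], toy_fit)
print(final)
```

The model is refit after every removal because the significance of the remaining variables can change once a variable leaves the model.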

• What do you get when you cross a statistician with a chiropractor?

• You get an adjusted R squared from a BACKward regression problem!

Forward selection

• Starts with a null model

• Enters the variables into the model starting with the most significant

• Can miss important associations or interactions
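A minimal sketch of forward selection follows, including a toy case where it misses a pair of variables that only matter together. As assumptions of this example: `fit_p_values` is a hypothetical placeholder for a real model-fitting routine, the 0.05 entry threshold is an assumed default, and the p-values in `toy_fit` are invented.

```python
def forward_select(candidates, fit_p_values, alpha=0.05):
    """Start from the null model and add the most significant candidate
    until no remaining candidate would enter with p < alpha."""
    selected = []
    remaining = list(candidates)
    while remaining:
        # p-value each candidate would have if added to the current model
        trial = {v: fit_p_values(selected + [v])[v] for v in remaining}
        best = min(remaining, key=lambda v: trial[v])
        if trial[best] >= alpha:
            break  # nothing left is significant enough to enter
        selected.append(best)
        remaining.remove(best)
    return selected

def toy_fit(selected):
    # Made-up p-values: "dose" and "duration" are only significant together
    p = {"age": 0.02, "dose": 0.30, "duration": 0.40}
    if "dose" in selected and "duration" in selected:
        p["dose"], p["duration"] = 0.01, 0.01
    return {v: p[v] for v in selected}

chosen = forward_select(["age", "dose", "duration"], toy_fit)
print(chosen)
```

Because candidates are tested one at a time, neither "dose" nor "duration" ever enters in this toy example, even though the pair would be significant in a model containing both.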

Stepwise selection

• Starts with a full or null model (usually a full model, i.e. backward stepwise)

• Adds or removes variables based on their significance in the model

• Looks at each variable itself and at its relationship with the other variables in the model

• Can be considered the best automatic model selection method, especially when there are many exposure variables

Maximum likelihood model fitting

• Most logistic regression software uses maximum likelihood to fit the model

• The log-likelihood (LL) is calculated based on predicted and actual outcomes

• A good model has a NON-significant LL

• A goodness-of-fit chi-square is calculated (it usually compares a constant-only model to the one you created):

chi-square = (-2LL of the null model) - (-2LL of your model), with df = number of exposure variables

• A good model has a significant goodness-of-fit chi-square
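The goodness-of-fit chi-square described above can be worked through in a few lines. This is a sketch, not the output of any particular package: the two -2LL values are hypothetical, and `chi2_sf` is a small helper written for this example (valid for whole-number df and a non-negative chi-square) so that no statistics library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    half = x / 2.0
    # Base cases: df = 1 (odd df) or df = 2 (even df)
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(half))
    else:
        a, q = 1.0, math.exp(-half)
    # Recurrence Q(a + 1, h) = Q(a, h) + h^a * e^(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1.0
    return q

def likelihood_ratio_test(neg2ll_null, neg2ll_model, n_exposures):
    """Chi-square = -2LL(null) - (-2LL(model)), df = number of exposure variables."""
    chi_square = neg2ll_null - neg2ll_model
    return chi_square, chi2_sf(chi_square, n_exposures)

# Hypothetical -2LL values, made up for illustration only
chi_square, p_value = likelihood_ratio_test(138.6, 120.3, n_exposures=3)
print(f"chi-square = {chi_square:.1f}, df = 3, p = {p_value:.4f}")
```

A small p-value here means the fitted model predicts the outcome significantly better than the constant-only model.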

Linear regression model fitting

• Uses the same principles as logistic regression

• Often starts with a full model

• You need to examine 2 things:

-the R² and adjusted R²

-changes in significance of each variable as the model changes

• The goal is to achieve the model with the highest adjusted R²
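The reason to compare models on adjusted R² rather than R² can be seen in a short sketch. The formulas are the standard ones; the sample size and R² values are hypothetical, and the function names are assumptions of this example.

```python
def r_squared(observed, predicted):
    """Proportion of outcome variance explained by the model."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalise R-squared for the number of exposure variables in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical comparison: 30 observations, made-up R-squared values
adj_small = adjusted_r_squared(0.50, n_obs=30, n_predictors=3)
adj_large = adjusted_r_squared(0.51, n_obs=30, n_predictors=6)
print(f"3 predictors: adjusted R2 = {adj_small:.3f}")
print(f"6 predictors: adjusted R2 = {adj_large:.3f}")
```

R² never decreases when a variable is added, so the larger model looks slightly better on R² alone. Its adjusted R² is lower in this example because the three extra variables add almost nothing.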

Confounding and effect modification

• A confounder is a variable that is associated with both the exposure variable and the outcome variable, but is not on the causal pathway between them

• E.g., smoking can be a confounding variable in the relationship between drinking alcohol and oral cancer

• Effect modification (interaction) occurs when an exposure variable has a different effect in different subgroups of the population

• E.g., the effectiveness of a form to reduce medication errors can depend on whether the form is for home or the ED

• These need to be considered when fitting a regression model
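One simple way to look for confounding before modelling is to compare the crude odds ratio with stratum-specific and pooled (Mantel-Haenszel) odds ratios. The sketch below does this for the alcohol, smoking and oral cancer example. All counts are hypothetical and were chosen so that the pattern is easy to see; the use of the Mantel-Haenszel estimator is a choice made for this example, not something stated in the slides.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed with/without outcome; c, d = unexposed with/without."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata of a third variable."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical counts, made up for illustration only:
# (drinkers with cancer, drinkers without, non-drinkers with, non-drinkers without)
smokers = (80, 120, 20, 30)
non_smokers = (5, 45, 20, 180)

# Crude table: ignore smoking and add the two strata together
crude = odds_ratio(*(x + y for x, y in zip(smokers, non_smokers)))
stratum_ors = [odds_ratio(*s) for s in (smokers, non_smokers)]
adjusted = mantel_haenszel_or([smokers, non_smokers])

print(f"crude OR = {crude:.2f}")
print(f"stratum ORs = {[round(o, 2) for o in stratum_ors]}")
print(f"adjusted (Mantel-Haenszel) OR = {adjusted:.2f}")
```

With these made-up counts the crude odds ratio is about 2.7 while both stratum-specific odds ratios equal 1, the pattern expected from confounding. Stratum-specific odds ratios that differ from each other would instead point to effect modification.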

For next week

• Read articles

• Start modelling your own data using the appropriate multivariable technique

• Think about model selection, interactions and possibility of confounding