
Items to consider - 3: Multicollinearity

- The relationship between IVs... when IVs are highly correlated with one another
- What to do:
  - Examine the correlation matrix of all IVs & the DV to detect any multicollinearity
  - Look for r's between IVs in excess of .70
  - If detected, it is generally best (or at least simplest) to re-run the MLR and eliminate one of the offending IVs from the model (see model reduction, later)
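A minimal sketch of this screening step using pandas; the data and column names are made up for illustration (echoing the GPA example later in the deck):

```python
import pandas as pd

# Made-up example data: three IVs and the DV (college GPA)
df = pd.DataFrame({
    "hs_gpa":   [3.1, 2.8, 3.6, 3.9, 2.5, 3.3, 3.8, 2.9],
    "sat":      [1100, 980, 1250, 1380, 900, 1180, 1320, 1010],
    "attitude": [4.0, 3.5, 4.5, 4.8, 3.0, 4.2, 4.6, 3.4],
    "col_gpa":  [2.9, 2.6, 3.4, 3.8, 2.3, 3.1, 3.7, 2.7],
})

corr = df.corr()                     # correlation matrix of all IVs & the DV
print(corr.round(2))

ivs = ["hs_gpa", "sat", "attitude"]
for i, a in enumerate(ivs):          # flag IV pairs past the .70 rule of thumb
    for b in ivs[i + 1:]:
        r = corr.loc[a, b]
        if abs(r) > 0.70:
            print(f"possible multicollinearity: r({a}, {b}) = {r:.2f}")
```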


Multicollinearity - what is it?

- It's to do with the unique and shared variance of the IVs with the criterion & with themselves
- Must establish what unique variance on each predictor (IV) is related to variance on the criterion (DV)
- Example 1 (graphical):
  - y: freshman college GPA
  - predictor 1: high school GPA
  - predictor 2: SAT total score
  - predictor 3: attitude toward education

Multicollinearity - what is it?

[Venn diagram: circles for x1, x2, and y. Circle = variance for a variable; overlap = shared variance (only 2 predictors shown here). Labelled regions include the common variance in y that both predictors 1 and 2 account for, and the variance in y accounted for by predictor 2 after the effect of predictor 1 has been partialled out.]


Multicollinearity - what is it?

[Venn diagram as above: circle = variance for a variable; overlap = shared variance (only 2 predictors shown here). Total R^2 = .66, or 66%.]

Multicollinearity - what is it?

[Venn diagram as above: circle = variance for a variable; overlap = shared variance (only 2 predictors shown here). Total R^2 = .33, or 33%.]


Multicollinearity - what is it?

- Example 2 (words):
  - y: freshman college GPA
  - predictor 1: high school GPA
  - predictor 2: SAT total score
  - predictor 3: attitude toward education


Multicollinearity - what is it?

- variance in college GPA predictable from variance in high school GPA
- residual variance in SAT related to variance in college GPA
- residual variance in attitude related to variance in college GPA
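One way to put numbers on this decomposition is hierarchical regression: a predictor's unique contribution (its squared semipartial correlation) is the drop in R^2 when that predictor is removed from the full model. A sketch on simulated data, with all variable names and coefficients invented for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
hs_gpa = rng.normal(3.0, 0.5, n)
sat = 300 * hs_gpa + rng.normal(0, 150, n)   # SAT overlaps with HS GPA
attitude = rng.normal(4.0, 0.7, n)           # largely independent of both
col_gpa = (0.6 * hs_gpa + 0.001 * sat + 0.1 * attitude
           + rng.normal(0, 0.3, n))

def r2(*predictors):
    """R^2 for a model regressing college GPA on the given predictors."""
    X = sm.add_constant(np.column_stack(predictors))
    return sm.OLS(col_gpa, X).fit().rsquared

full = r2(hs_gpa, sat, attitude)
print(f"full-model R^2 = {full:.3f}")
# Unique contribution of each predictor = R^2 lost when it is dropped
print(f"unique R^2 for SAT      = {full - r2(hs_gpa, attitude):.3f}")
print(f"unique R^2 for attitude = {full - r2(hs_gpa, sat):.3f}")
```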

Multicollinearity - what is it?

Consider these three correlation matrices:

  A:      X1   X2   X3
     Y    .2   .1   .3
     X1        .5   .4
     X2             .6

  B:      X1   X2   X3
     Y    .6   .5   .7
     X1        .2   .3
     X2             .2

  C:      X1   X2   X3
     Y    .6   .7   .7
     X1        .7   .6
     X2             .8

Which would we expect to have the largest overall R^2, and which would we expect to have the smallest?

Multicollinearity - what is it?

- R will be at least .7 for B & C, but only at least .3 for A
- No chance of R^2 for A getting much larger, because the intercorrelations of the Xs are as large for A as for B & C


Multicollinearity - what is it?

- R will probably be largest for B
  - Its predictors are correlated with Y
  - Not much redundancy among the predictors
- R probably greater in B than in C, as C has considerable redundancy among its predictors
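This can be checked directly: for standardized variables, the squared multiple correlation is R^2 = r_xy' R_xx^(-1) r_xy, where R_xx is the matrix of predictor intercorrelations and r_xy the vector of predictor-criterion correlations. A quick numeric check of A, B, and C:

```python
import numpy as np

# Predictor intercorrelations (R_xx) and predictor-Y correlations (r_xy)
# transcribed from the matrices above
matrices = {
    "A": ([[1.0, 0.5, 0.4],
           [0.5, 1.0, 0.6],
           [0.4, 0.6, 1.0]], [0.2, 0.1, 0.3]),
    "B": ([[1.0, 0.2, 0.3],
           [0.2, 1.0, 0.2],
           [0.3, 0.2, 1.0]], [0.6, 0.5, 0.7]),
    "C": ([[1.0, 0.7, 0.6],
           [0.7, 1.0, 0.8],
           [0.6, 0.8, 1.0]], [0.6, 0.7, 0.7]),
}

for name, (R_xx, r_xy) in matrices.items():
    r_xy = np.array(r_xy)
    rsq = r_xy @ np.linalg.solve(np.array(R_xx), r_xy)
    print(f"{name}: R^2 = {rsq:.2f}")
# Prints roughly A: 0.12, B: 0.75, C: 0.56 -- B largest, A smallest,
# and C well below B despite similar correlations with Y.
```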

What effect does the big M have?

- Can inflate the standard errors (SEs) of the regression coefficients (those involved in the multicollinearity)
  - This can lead to nonsignificant findings for those coefficients
  - So predictors that are significant when used in isolation may not be significant when used together
- Can also lead to imprecision in the regression coefficients (mistakes in estimating the change in Y for a unit change in the IV)
- So a model with multicollinearity is misleading, & can have redundancy among the predictors
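A small simulation of that standard-error inflation (all numbers illustrative): the same predictor is fitted alone and then alongside a near-duplicate of itself:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.2, size=n)   # r(x1, x2) is roughly .98
y = x1 + x2 + rng.normal(size=n)

solo = sm.OLS(y, sm.add_constant(x1)).fit()
both = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

print(f"x1 alone:   b = {solo.params[1]:.2f}, SE = {solo.bse[1]:.2f}")
print(f"x1 with x2: b = {both.params[1]:.2f}, SE = {both.bse[1]:.2f}")
# The SE on x1 is several times larger in the joint model, so a predictor
# that is clearly significant alone can fail to reach significance here.
```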

What do we do about the big M?

- Many opinions, e.g. O'Brien (2007), "A Caution Regarding Rules of Thumb for Variance Inflation Factors", Quality & Quantity, 41(5), 673-690
- Can use VIF (variance inflation factor) and tolerance values in SPSS (problem variables are those with VIF > 4)
- Can painstakingly examine all possible versions of the model (putting each predictor in 1st)
- Here we'll simply signal multicollinearity with an r > .70 and enforce removal of at least one of the variables, and signal possible multicollinearity with an r between .5 and .7 and suggest examining the model with and without one of the variables
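The same VIF/tolerance check outside SPSS, sketched with statsmodels' variance_inflation_factor (tolerance is simply 1/VIF); the data are simulated:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 150
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.4, size=n)   # collinear with x1
x3 = rng.normal(size=n)                         # independent

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i, name in enumerate(["x1", "x2", "x3"], start=1):   # index 0 = constant
    vif = variance_inflation_factor(X, i)
    flag = "  <-- possible problem (VIF > 4)" if vif > 4 else ""
    print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}{flag}")
```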

The Goal of MLR - the big picture

- What we're trying to do is create a model predicting a DV that explains as much of the variance in that DV as possible, while at the same time:
  - Meeting the assumptions of MLR
  - Best managing the other issues: sample size, number of predictors, outliers, multicollinearity, r with the dependent variable, significance in the model
  - Being parsimonious (can be very important)