
Measurement

Normality

Outliers

Homoscedasticity

Linearity

Independence

Assumptions in Multiple Regression Analysis


Violations

Is there a violation?

How severe?

Can it be avoided?

Likely effects?

Can it be minimized?

What can I do?


Key Assumptions

• Issues of Measurement

• Normal Distribution

• Minimization of Outliers

• Homoscedasticity

• Relationships are Linear

• Independence


Sources of assumption violations

• Design issues

• Statistical issues

• Both


Strength and Weakness


Issues of Measurement

www.nearingzero.net


Issues of Measurement

• Unreliability of measures

• Scale violations

• Multicollinearity


Unreliability

Unreliable measures affect the interpretation of regression

• Relationships are underestimated

• Type II error implications


An illustration using multiple regression

Reading fluency

Decoding

Time spent reading

Fluency training


How are variables measured?

1. Nominal
2. Ordinal
3. Interval
4. Ratio


Measures of Central Tendency


DUMMY CODING

Dichotomous, nominal, or ordinal variables are permitted (with “dummy” coding).

1. Black
2. White
3. Hispanic
4. Other

1 if score is Black, otherwise 0.

1 if score is White, otherwise 0.

1 if score is Other, otherwise 0.

Hispanic, the category with no dummy variable of its own, serves as the reference group.
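The coding scheme above can be sketched in a few lines of Python. This is only an illustration: the function name `dummy_code` and the `is_...` variable names are made up for the example, and treating Hispanic as the reference group is an assumption read off the three dummy variables the slide lists.

```python
# Category labels from the slide. REFERENCE is the category that gets
# no dummy variable (an assumption based on the three dummies listed).
CATEGORIES = ["Black", "White", "Hispanic", "Other"]
REFERENCE = "Hispanic"


def dummy_code(value):
    """Return 0/1 indicators for every category except the reference."""
    if value not in CATEGORIES:
        raise ValueError(f"unknown category: {value!r}")
    return {
        f"is_{category.lower()}": int(value == category)
        for category in CATEGORIES
        if category != REFERENCE
    }


for label in ["Black", "White", "Hispanic", "Other"]:
    print(label, dummy_code(label))
```

A case in the reference group scores 0 on all three dummy variables, so four categories need only three predictors in the regression.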


Multicollinearity

• You want to measure two DIFFERENT constructs

• But your two measures are very highly correlated

• So you probably aren’t measuring two different constructs

• Eliminate or combine? Keep?


Multicollinearity

To check mathematically for the presence of multicollinearity, SPSS allows you to run tolerance and Variance Inflation Factor (VIF) statistics.

• Tolerance: the closer the value is to zero, the more multicollinearity exists in the model (rule of thumb: 0.20).

• VIF: high values are a problem (rule of thumb: 4.0).
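For readers working outside SPSS, the two statistics can be sketched by hand. The example below covers only the special case of two predictors, where tolerance is 1 minus the squared correlation between them and VIF is the reciprocal of tolerance. The function names and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tolerance_and_vif(x1, x2):
    """Tolerance and VIF for a model with exactly two predictors.

    With two predictors, the R-squared from regressing one on the
    other equals their squared correlation.
    """
    r_squared = pearson_r(x1, x2) ** 2
    tolerance = 1.0 - r_squared  # zero when the predictors are perfectly correlated
    return tolerance, 1.0 / tolerance


# Made-up predictor scores that correlate at r = 0.8.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
tolerance, vif = tolerance_and_vif(x1, x2)
print(round(tolerance, 2), round(vif, 2))
```

With more than two predictors, each predictor would have to be regressed on all of the others to get its R-squared; that step is left out of this sketch.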

Homoscedasticity

Violated when variances are unequal at different levels of a variable


Equal Variances and Residuals

http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html


Violation: Heteroscedasticity

• At each level of the independent variable (low to high), there is unequal variance for the residuals in the outcome variable.

Residual plot (axis labels: Earnings, Importance; DV: $ donated)


Homoscedasticity

• Does the same variance exist across all levels of your independent variable?

http://pareonline.net Practical Assessment, research, and evaluation online—used with permission


Heteroscedasticity

Residual plots against the IV (low to high): a fan shape and a bow tie shape

http://pareonline.net
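A fan-shaped residual plot can also be spotted numerically. The sketch below fits a one-predictor regression, splits the residuals at the middle of the IV, and compares the two variances. It is a rough check under stated assumptions, not a formal test: the function names are invented, the data are made up so that the noise grows with the IV, and how large a ratio should count as a problem is left open.

```python
from statistics import mean, variance


def fit_line(x, y):
    """Ordinary least squares intercept and slope for one predictor."""
    mean_x, mean_y = mean(x), mean(y)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return mean_y - slope * mean_x, slope


def residual_variance_ratio(x, y):
    """Variance of residuals in the upper half of the IV over the lower half."""
    intercept, slope = fit_line(x, y)
    # Pair each IV value with its residual and sort by the IV.
    pairs = sorted((a, b - (intercept + slope * a)) for a, b in zip(x, y))
    half = len(pairs) // 2
    low = [residual for _, residual in pairs[:half]]
    high = [residual for _, residual in pairs[half:]]
    return variance(high) / variance(low)


# Made-up data: the error grows as the IV grows, which produces a fan.
x = list(range(1, 11))
noise = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, -1.0]
y = [a + e for a, e in zip(x, noise)]

ratio = residual_variance_ratio(x, y)
print(round(ratio, 1))
```

A ratio near 1 is what homoscedastic residuals would give; a ratio well above 1 matches the fan pattern. A bow tie pattern would need a different split, for example the middle of the IV range against the two ends.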

Normality

Graphic and Numeric Investigations


Normality allows

• The calculation of means and standard deviations

• Better inference to a population of interest


Normality


• Skewness

• Kurtosis

• Outliers


Residuals

• Should also be normally distributed

• Have a mean of 0

• Have equal variances at all values of the predictors (IV)


Abnormality Detectives

GRAPHS

CALCULATIONS


Histograms


Visual Inspection

• Histograms


Box & Whisker Plots


Figure: box-and-whisker plots with each quartile segment labeled 25% and outliers labeled.


Box Plots


Visual Inspection

Box Plots


Visual Inspection

Box plot problems


P-Plots


Probability plot


MCI Total


Calculation Methods to Assess Normality

Skew = 0

Kurtosis = 0


Skewness & Kurtosis

Skewness rule of thumb:

0 = ideal
1 = okay
1 - 2 = uh oh
>2 = oh no

Values less than twice their standard error are considered good enough.

Descriptive Statistics

Variable            N    Mean      s.d.      Skewness  Std. Error  Kurtosis  Std. Error
MCIKNOWL            175  32.9429   4.89042   -.341     .184        .132      .365
MCITOTAL            175  125.0457  14.31975  -.163     .184        -.758     .365
Valid N (listwise)  175
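The "twice the standard error" check can be sketched as follows. The standard error formulas are the usual ones for sample skewness and kurtosis; the sketch assumes these are what SPSS reports, and for N = 175 they do give the .184 and .365 printed in the table. The function names are invented, and the statistics are copied from the table, not recomputed from raw scores.

```python
import math


def se_skewness(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of sample (excess) kurtosis for a sample of size n."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1.0) / ((n - 3) * (n + 5)))


def within_twice_se(statistic, standard_error):
    """Slide's rule: good enough if less than twice the standard error."""
    return abs(statistic) < 2 * standard_error


n = 175
checks = {
    "MCIKNOWL skewness": within_twice_se(-0.341, se_skewness(n)),
    "MCIKNOWL kurtosis": within_twice_se(0.132, se_kurtosis(n)),
    "MCITOTAL skewness": within_twice_se(-0.163, se_skewness(n)),
    "MCITOTAL kurtosis": within_twice_se(-0.758, se_kurtosis(n)),
}
for name, ok in checks.items():
    print(name, "good enough" if ok else "exceeds twice its standard error")
```

Applied to the table, three of the four values pass. The MCITOTAL kurtosis of -.758 is slightly larger in absolute size than twice its standard error (about .73).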


Outliers

Handy z-score formula

z = (score - mean) / standard deviation

You wonder if a score of 150 is an outlier. Your data set has a mean of 100, s.d. of 15. Figure it out.

http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html
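The exercise above can be worked through in a few lines. The z-score formula is the one on the slide; the cutoff of 3 standard deviations is an assumption added for illustration, since the slides do not state one.

```python
def z_score(score, mean, sd):
    """Distance of a score from the mean, in standard deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (score - mean) / sd


# Values from the slide: score 150, mean 100, s.d. 15.
z = z_score(150, 100, 15)

# Assumed convention, not from the slides: flag |z| greater than 3.
CUTOFF = 3.0
is_outlier = abs(z) > CUTOFF

print(round(z, 2), is_outlier)
```

The score sits about 3.33 standard deviations above the mean, so it would be flagged under the assumed cutoff. Whether to keep it is a separate judgment.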


What to do with Outliers?

Those might be perfectly good observations. Keep them and live with your results.

Delete them! The outliers are throwing everything off. Just get rid of them.

“Statistics do not exclude data, analysts do.”

Good & Hardin Common Errors in Statistics (and How to Avoid Them) 2003 p. 139


Retain, Discard, Do Nothing, Look at Both?

Abelson, 1995 Statistics as Principled Argument p. 70

“Abelson’s Third Law: Never flout a convention just once.”


Influence Statistics

How much does each individual data point influence the parameter estimates? You can find out.

1. Calculate your parameter estimate [mean] for all of the variables

2. Recalculate the estimate with one data point excluded

3. Compare the results of each estimate (first and second) and note the differences.
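The three steps above can be sketched for the mean, the estimate the slide names. The function name and the scores are made up for illustration; SPSS reports related influence statistics for regression coefficients, which this sketch does not reproduce.

```python
from statistics import mean


def leave_one_out_influence(scores):
    """Change in the mean when each data point is excluded in turn."""
    full_estimate = mean(scores)  # step 1: estimate from all of the data
    changes = []
    for i in range(len(scores)):
        reduced = scores[:i] + scores[i + 1:]  # step 2: drop one point
        changes.append(mean(reduced) - full_estimate)  # step 3: compare
    return changes


# Made-up scores with one extreme value.
scores = [98, 100, 102, 100, 150]
changes = leave_one_out_influence(scores)

# The point whose removal moves the mean the most.
most_influential = max(range(len(scores)), key=lambda i: abs(changes[i]))
print(scores[most_influential], changes[most_influential])
```

Dropping the score of 150 lowers the mean from 110 to 100, while dropping any other score raises it by 2.5 to 3 points, so the extreme value is the one driving the estimate.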


Transformations

• Log transformations (take the logarithm of every score, and make it into a new variable)

• Taking the square root of scores is also a common transformation

• Both are accomplished under “compute” in SPSS
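Outside SPSS, the same two transformations can be sketched in Python. The function names are invented, and base 10 is an assumption, since the slide does not say which logarithm to take. Both functions need scores they can handle: logs need positive values and square roots need non-negative ones.

```python
import math


def log_transform(scores):
    """Base-10 log of every score, returned as a new variable."""
    if any(s <= 0 for s in scores):
        raise ValueError("log transformation needs positive scores")
    return [math.log10(s) for s in scores]


def sqrt_transform(scores):
    """Square root of every score, returned as a new variable."""
    if any(s < 0 for s in scores):
        raise ValueError("square root transformation needs non-negative scores")
    return [math.sqrt(s) for s in scores]


# Made-up, strongly right-skewed scores.
raw = [1, 10, 100, 1000]
logged = log_transform(raw)
rooted = sqrt_transform([1, 4, 9, 16])
print(logged, rooted)
```

The logged scores are evenly spaced even though the raw scores are not, which is how the transformation pulls in a long right tail. The cost is that results are now on a log scale and no longer in the original units.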


Transformations

• Good for distributions

• Bad for interpretations


Assumptions for data used in multivariate analyses

Assumption 1

• At each value of the dependent variable, the distribution of residuals is normal.

Assumption 2

• The variance of residuals at every set of values for the independent variables is equal. This is known as homoscedasticity; its violation is known as heteroscedasticity.


Assumptions for data used in multivariate analyses

Assumption 3

• The mean value of the residual equals zero at each value of the dependent variable. This is an extension of the bivariate assumption that the relationship between IV and DV is linear.

Assumption 4

• For any 2 cases, the expected correlation between the residuals is zero. This is the independence assumption (non-autocorrelation).


Assumption of Linearity

http://pareonline.net

We expect to be able to predict one variable, on the basis of the value of another variable, and the values in question are best represented in a linear fashion.


What does non-linear look like?

Figure panels: Curvilinear relationship (two panels); Curvilinear relationship, asymptotic


An important relationship

Figure axes: Good research judgment; Experience with research issues


Dixon and Reid 2000


Sample


Instruments


Dixon & Reid 2000


Hypotheses and Results

Diagram labels: NLE, PLE, Depression (High, Normal, Low)

PLE moderates the effect of NLE on symptoms of depression


Dixon & Reid 2000

Figure labels: High positive life events; Low positive life events; Depression from Normal to High


Which Comes First?

Diagram labels: NLE, Depression


Dixon & Reid 2000
