
Assumptions

“Essentially, all models are wrong, but some are useful”

George E.P. Box

Your model has to be wrong… but that’s OK if it’s illuminating!

Linear Model Assumptions

• Absence of Collinearity
• Normality of Errors
• Homoskedasticity of Errors
• No influential data points
• Independence


Absence of Collinearity

Baayen (2008: 182)

Where does collinearity come from?

…most often, from predictor variables that are correlated with each other.

Demo
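A minimal R sketch of what correlated predictors do, using simulated data (the sample size, the correlation of 0.95 and the true coefficients of 1 are arbitrary assumptions, not the original demo). The same model is fitted once with two strongly correlated predictors and once with two unrelated ones; the standard error of the x1 coefficient grows with collinearity. The variance inflation factor (VIF) is computed by hand as 1 / (1 - R^2) from regressing x1 on the other predictor; car::vif() is a common packaged alternative.

```r
set.seed(1)
n <- 200

# Two predictors with (population) correlation 0.95...
x1 <- rnorm(n)
x2_cor <- 0.95 * x1 + sqrt(1 - 0.95^2) * rnorm(n)
# ...and, for comparison, an unrelated second predictor
x2_ind <- rnorm(n)

# Same true coefficients (1 and 1) in both scenarios
y_cor <- x1 + x2_cor + rnorm(n)
y_ind <- x1 + x2_ind + rnorm(n)

m_cor <- lm(y_cor ~ x1 + x2_cor)
m_ind <- lm(y_ind ~ x1 + x2_ind)

# Standard error of the x1 coefficient in a fitted model
se_x1 <- function(m) summary(m)$coefficients["x1", "Std. Error"]
se_cor <- se_x1(m_cor)
se_ind <- se_x1(m_ind)

# VIF for x1: 1 / (1 - R^2) of x1 regressed on the other predictor
vif_x1 <- function(other) 1 / (1 - summary(lm(x1 ~ other))$r.squared)
vif_cor <- vif_x1(x2_cor)
vif_ind <- vif_x1(x2_ind)

print(c(se_cor = se_cor, se_ind = se_ind, vif_cor = vif_cor, vif_ind = vif_ind))
```

The coefficient estimates stay unbiased in both models; what collinearity costs is precision, which is why the standard error (and hence the p-value) of x1 suffers.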

What to do?

No influential data points

Baayen (2008: 189-190)

• Leverage
• DFbeta
• (…and much more)

Leave-one-out influence diagnostics

Winter & Matlock (2013)
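A minimal R sketch of leverage and DFbeta with simulated data (29 well-behaved points plus one deliberately extreme point are assumptions of the example). hatvalues() returns the leverage of each point, dfbeta() returns the change in each coefficient when that point is left out, and refitting the model without each point in turn reproduces the dfbeta() magnitudes by hand (the sketch compares them up to sign).

```r
set.seed(6)

# Hypothetical data: 29 well-behaved points plus one extreme point
x <- c(rnorm(29), 6)
y <- c(x[1:29] + rnorm(29), -6)
xmdl <- lm(y ~ x)

# Leverage: diagonal of the hat matrix; large for unusual x values
lev <- hatvalues(xmdl)

# DFbeta: change in each coefficient when a single point is left out
dfb <- dfbeta(xmdl)

# The same leave-one-out idea by hand: refit without point i, compare coefficients
loo <- t(sapply(seq_along(y), function(i) {
  coef(xmdl) - coef(lm(y ~ x, subset = -i))
}))

# Point with the largest influence on the slope
print(which.max(abs(dfb[, "x"])))
```

For a linear model the leave-one-out change can be computed exactly without refitting, which is what dfbeta() does; the explicit loop is only there to show what the numbers mean.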


Normality of Errors

The error (not the data!) is assumed to be normally distributed.

So, the residuals should be normally distributed.

xmdl = lm(y ~ x)
hist(residuals(xmdl))

qqnorm(residuals(xmdl))
qqline(residuals(xmdl))
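To illustrate "the error, not the data", here is a minimal R sketch with simulated data (the two-level predictor and the effect size of 10 are assumptions): y itself is bimodal, but the residuals of xmdl are normal. As a numeric companion to the Q-Q plot, it computes the correlation between sample and theoretical quantiles obtained from qqnorm(..., plot.it = FALSE); values close to 1 mean the points lie close to a straight line. This correlation is an illustration, not a formal normality test.

```r
set.seed(2)
n <- 500

# Hypothetical example: a two-level predictor makes y itself bimodal,
# while the errors are normal
x <- rep(c(0, 1), each = n / 2)
y <- 10 * x + rnorm(n)

xmdl <- lm(y ~ x)
res <- residuals(xmdl)

# Correlation between sample and theoretical normal quantiles
# (values near 1 = points close to the Q-Q line)
qq_cor <- function(v) {
  q <- qqnorm(v, plot.it = FALSE)
  cor(q$x, q$y)
}

qq_y <- qq_cor(y)
qq_res <- qq_cor(res)
print(c(qq_y = qq_y, qq_res = qq_res))

# Draw the plots from the slide only in an interactive session
if (interactive()) {
  hist(res)
  qqnorm(res)
  qqline(res)
}
```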


Homoskedasticity of Errors

The error (not the data!) is assumed to have equal variance across the predicted values.

So, the residuals should have equal variance across the predicted values.
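A minimal R sketch with simulated data (an error SD that grows with x is an assumed scenario). Besides the residual plot of fitted values against residuals, it computes a crude ratio of residual variance above vs. below the median fitted value. The ratio is an ad hoc illustration, not a standard test; values far from 1 point to the typical "fan" shape in the residual plot.

```r
set.seed(3)
n <- 500
x <- runif(n, 1, 10)

# Hypothetical data: constant error SD vs. error SD growing with x
y_const <- 2 * x + rnorm(n, sd = 2)
y_fan <- 2 * x + rnorm(n, sd = 0.5 * x)

# Crude check: residual variance above vs. below the median fitted value
var_ratio <- function(m) {
  f <- fitted(m)
  r <- residuals(m)
  hi <- f > median(f)
  var(r[hi]) / var(r[!hi])
}

m_const <- lm(y_const ~ x)
m_fan <- lm(y_fan ~ x)
ratios <- c(const = var_ratio(m_const), fan = var_ratio(m_fan))
print(ratios)

# The residual plot itself, drawn only in an interactive session
if (interactive()) plot(fitted(m_fan), residuals(m_fan))
```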

WHAT TO DO IF NORMALITY/HOMOSKEDASTICITY IS VIOLATED?

Either: do nothing, but report the violation

Or: report the violation and apply transformations

Two types of transformations

• Linear transformations: leave the shape of the distribution intact (centering, scaling)
• Nonlinear transformations: do change the shape of the distribution
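A small R sketch of the difference, assuming a simulated right-skewed (log-normal) variable. Centering and scaling leave the sample skewness unchanged, while the log transformation (one common nonlinear choice, picked here for illustration) changes the shape. The skewness helper is a simple moment-based definition written for this example.

```r
set.seed(4)
x <- rlnorm(1000)  # hypothetical right-skewed variable

# Sample skewness: mean cubed deviation divided by SD cubed
skewness <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5

skew <- c(
  raw      = skewness(x),
  centered = skewness(x - mean(x)),           # linear: subtract the mean
  scaled   = skewness(as.numeric(scale(x))),  # linear: center, divide by SD
  logged   = skewness(log(x))                 # nonlinear
)
print(round(skew, 2))
```

Because a linear transformation only shifts and stretches the axis, it cannot fix skewed or fanning residuals; only a nonlinear transformation can.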

[Figure: before transformation vs. after transformation]

Still bad… but better!


• Normality of Errors: histogram of residuals, Q-Q plot of residuals
• Homoskedasticity of Errors: residual plot

Assumptions

• Absence of Collinearity
• No influential data points
• Independence
• Normality of Errors
• Homoskedasticity of Errors

Independence

What is independence?

[Figure: common experimental data. A subject responds to several items (Item #1, …), each with several repetitions (Rep 1, Rep 2, Rep 3).]

Pseudoreplication = Disregarding Dependencies

Subject1 Item1
Subject1 Item2
Subject1 Item3
…
Subject2 Item1
Subject2 Item2
Subject2 Item3
…

Machlis et al. (1985): “pooling fallacy”

Hurlbert (1984): “pseudoreplication”

Hierarchical data is everywhere

• Typological data (e.g., Bell 1978, Dryer 1989, Perkins 1989; Jaeger et al., 2011)
• Organizational data
• Classroom data

[Figure: typological example with German, French, English, Spanish, Italian, Swedish, Norwegian, Finnish, Hungarian, Turkish, Romanian]

[Figure: classroom example, Class 1 vs. Class 2]

Intraclass Correlation (ICC)
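The ICC is the proportion of the total variance that lies between groups: between-group variance / (between-group variance + within-group variance). A minimal R sketch, assuming simulated classroom-style data (50 classes of 10 students, equal between- and within-class variance, so a true ICC of 0.5) and using the classic one-way ANOVA estimator; data without any class effect are included for comparison.

```r
set.seed(5)
n_groups <- 50  # e.g., classes (hypothetical)
k <- 10         # students per class

# One-way ANOVA estimator of the ICC:
# (MS_between - MS_within) / (MS_between + (k - 1) * MS_within)
icc_anova <- function(y, g, k) {
  ms <- anova(lm(y ~ factor(g)))[["Mean Sq"]]
  (ms[1] - ms[2]) / (ms[1] + (k - 1) * ms[2])
}

g <- rep(seq_len(n_groups), each = k)
class_effect <- rnorm(n_groups, sd = 1)
y_clustered <- class_effect[g] + rnorm(n_groups * k, sd = 1)  # true ICC = 0.5
y_indep <- rnorm(n_groups * k, sd = 1)                        # true ICC = 0

iccs <- c(clustered = icc_anova(y_clustered, g, k),
          independent = icc_anova(y_indep, g, k))
print(round(iccs, 2))
```

The larger the ICC, the less new information each additional observation from the same class carries, which is exactly what an analysis treating all rows as independent ignores.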

[Figure: simulation for 16 subjects; Type I error rate for pseudoreplication vs. items analysis]
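A minimal R sketch of the same phenomenon under an assumed design (not necessarily the one behind the figure): 16 subjects in a between-subjects comparison (8 per condition), 20 trials per subject, by-subject variation, and no true effect. Treating every trial as independent (pseudoreplication) is compared with analyzing one mean per subject; the proportion of p < .05 results across simulated data sets estimates the Type I error rate.

```r
set.seed(7)

# Assumed design: 16 subjects, 8 per condition (between subjects),
# 20 trials per subject, and NO true difference between conditions
n_subj <- 16
n_trials <- 20
n_sim <- 1000

simulate_once <- function() {
  subject <- rep(seq_len(n_subj), each = n_trials)
  subj_cond <- rep(c("A", "B"), each = n_subj / 2)  # condition of each subject
  condition <- subj_cond[subject]                   # condition of each trial
  # By-subject random intercepts (SD 1) plus trial-level noise (SD 1)
  y <- rnorm(n_subj)[subject] + rnorm(n_subj * n_trials)

  # Pseudoreplication: every trial treated as an independent data point
  p_pseudo <- t.test(y[condition == "A"], y[condition == "B"])$p.value

  # Respecting the dependency: average per subject, then compare subjects
  subj_means <- as.numeric(tapply(y, subject, mean))
  p_subj <- t.test(subj_means[subj_cond == "A"],
                   subj_means[subj_cond == "B"])$p.value

  c(pseudo = p_pseudo < 0.05, by_subject = p_subj < 0.05)
}

# Proportion of significant results = Type I error rate (nominal: 0.05)
rates <- rowMeans(replicate(n_sim, simulate_once()))
print(rates)
```

The by-subject analysis stays near the nominal 5%, whereas the pseudoreplicated analysis rejects the (true) null hypothesis far more often, because it treats 20 correlated trials per subject as 20 independent pieces of evidence.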

Interpretational Problem: What’s the population for inference?

Violating the independence assumption makes the p-value… meaningless.


That’s it (for now)
