Linear Mixed Models
Introduction to Statistics
Carl von Ossietzky Universität Oldenburg
Fakultät III - Sprach- und Kulturwissenschaften
Introduction
Example taken from H. Baayen (2008), Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. New York: Cambridge University Press.
Subjects listen to items presented auditorily over headphones. White noise is added or not added. Does white noise influence the speed of lexical access?
Dependent variable: lexical decision latencies as a measure of the speed of lexical access.
Random effects
Items and subjects are sampled randomly from populations of items and subjects. Replicating the experiment would involve selecting other items and other subjects.
Random-effect terms: randomly sampled from a much larger population, modeled as random variables with a mean of zero and unknown variance.
Fixed effects
Presence or absence of white noise is a treatment factor with two levels (noise versus no noise).
The treatment factor is repeatable for any set of subjects and sentences. The number of levels is fixed, and each of the levels can be repeated.
Fixed-effect terms: repeatable levels, factors defined by means of contrasts.
Mixed effects
Linear mixed model (LMM): a statistical model containing both fixed effects and random effects, that is, mixed effects.
An LMM is a kind of regression analysis.
Model
Multiple linear regression analysis:
\[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i \]
where \(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip}\) is the population mean response and the \(\varepsilon_i\) represent the deviations (residuals, errors) from the population mean response.
Subjects show individual differences, and likewise items do. By-subject variation and by-item variation are represented in the error term.
Problem: multiple responses from the same subject cannot be regarded as independent from each other. Mutatis mutandis, the same holds for items.
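The non-independence problem can be made concrete with a small simulation (a sketch in Python with NumPy; all names and parameter values are our own, not from the experiment): when every subject contributes a random intercept, two responses from the same subject are positively correlated.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects = 200

# Each subject gets a random intercept b_i (by-subject variation);
# the residual noise is independent across observations.
b = rng.normal(0.0, 1.0, size=n_subjects)
first_response = b + rng.normal(0.0, 1.0, size=n_subjects)
second_response = b + rng.normal(0.0, 1.0, size=n_subjects)

# Two responses from the same subject share b_i and are therefore
# correlated, violating the independence assumption of ordinary
# multiple regression (theoretical correlation: 1 / (1 + 1) = 0.5).
r = np.corrcoef(first_response, second_response)[0, 1]
print(round(r, 2))
```

With equal random-intercept and residual variances the within-subject correlation is 0.5; ordinary regression wrongly assumes it is 0.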
Model
Classical solution: averaging over items for a subjects analysis, or averaging over subjects for an items analysis.
Disadvantage: either by-item variation or by-subject variation is disregarded.
Linear mixed models: random effects (by-subject and by-item variation) are modeled. Averaging is not necessary, and both kinds of variation are taken into account.
A random-effect term specifies that the model will make by-subject adjustments for the average of the response variable by means of small changes to the intercept. Similarly for by-item variation.
By-subject and by-item variation is no longer represented in the error term.
Model
Linear mixed model with p fixed variables and q random variables:
\[ y_{ij} = \beta_0 + \beta_1 x_{ij1} + \dots + \beta_p x_{ijp} + b_{i1} z_{ij1} + \dots + b_{iq} z_{ijq} + \varepsilon_{ij} \]
where:
\(y_{ij}\) is the value of the response variable for the jth of \(n_i\) observations in the ith of M groups;
\(\beta_1 \dots \beta_p\) are the fixed-effect coefficients, which are identical for all groups;
\(x_{ij1} \dots x_{ijp}\) are the fixed-effect regressors for observation j in group i;
\(b_{i1} \dots b_{iq}\) are the random-effect coefficients for group i; the random effects therefore vary by group;
\(z_{ij1} \dots z_{ijq}\) are the random-effect regressors;
\(\varepsilon_{ij}\) is the error for observation j in group i.
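As an illustration, the special case with one fixed regressor (p = 1) and a random intercept only (q = 1, with z_ij1 = 1) can be simulated directly from this formula. The sketch below is in Python with NumPy; all parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(42)
M, n = 30, 20                    # M groups, n observations per group
beta0, beta1 = 6.0, -0.5         # fixed-effect coefficients (invented)
sigma_b, sigma_e = 0.4, 0.2      # random-intercept and residual SDs (invented)

x = rng.uniform(0.0, 1.0, size=(M, n))       # fixed-effect regressor x_ij
b = rng.normal(0.0, sigma_b, size=(M, 1))    # random intercept b_i per group
eps = rng.normal(0.0, sigma_e, size=(M, n))  # error eps_ij

# y_ij = beta0 + beta1 * x_ij + b_i + eps_ij
y = beta0 + beta1 * x + b + eps

# The random intercepts shift whole groups up or down, so the group
# means spread out by roughly sigma_b:
print(round(float(y.mean(axis=1).std()), 2))
```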
Advantages
No assumption of homogeneity of regression slopes: ANCOVA requires this, but with LMMs we can explicitly model this variability in regression slopes.
No assumption of independence: AN(C)OVA and regression models require this, but LMMs do not.
Missing data are no problem: LMMs can deal with missing data as long as the missing data meet the so-called missing-at-random definition.
Repeated measures design
An experimental design in which we have multiple subjects responding to multiple items is referred to as a repeated measures design.
This is also known as a model with crossed random effects for subjects and items. In our example, our subjects listen to all items, and all items are heard by all subjects.
Hierarchical design
Hierarchical design, nested design, or multilevel model: there is a hierarchical structure.
Several schools are investigated; per school, several classrooms are investigated; per classroom, several students are investigated.
Three levels: the highest level is school, the middle level is classroom, the lowest level is student.
Example
We focus on a repeated measures design.
Example: Angela Jochmann (Niederlandistik) studies the effects of fast speech on the processing of canonical and non-canonical sentences.
Two experiments with V2 or RC sentences were conducted. We focus on the V2 sentences.
21 female and 22 male students of the University of Oldenburg, with an age range from 19 to 30 years (mean 23.35 years), were tested.
Participants read a target word on a screen. After 800 ms the word vanished and a fixation cross appeared on the screen, simultaneously accompanied by the auditory stimulus.
Example
The task was to press a button as soon as the target word was detected in the sentence. Reaction times were measured from the onset of the auditory target until the button press.
An extended response window of 1000 ms was implemented to account for length differences of the stimuli.
After each sentence, a written yes/no question was shown on the screen and participants had to answer via button press.
Response latencies were recorded from the beginning of the visual presentation of the question until the button press.
Response latencies were measured. We created a new variable logReactionTime representing the logarithmic response latencies.
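The log transform behind logReactionTime compresses the long right tail typical of latency data. A minimal sketch in Python (the latency values are invented, and we assume the natural logarithm, since the slides do not state the base):

```python
import numpy as np

# Hypothetical response latencies in milliseconds (right-skewed,
# as reaction-time data usually are; values are invented).
latencies = np.array([420.0, 510.0, 480.0, 950.0, 610.0, 1400.0])

log_reaction_time = np.log(latencies)  # the new variable logReactionTime

# The slowest response is more than 3x the fastest on the raw scale,
# but the spread shrinks dramatically on the log scale.
print(round(float(latencies.max() / latencies.min()), 2))
print(round(float(log_reaction_time.max() / log_reaction_time.min()), 2))
```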
Example
Material consisted of items (i.e. sentences) from the OLACS corpus (Oldenburger Linguistically and Audiologically Controlled Sentences; Uslar et al., 2010, 2013).
50 items, each of which was presented 6 times to the subject. 25 items have a canonical SVO structure, 25 items have a non-canonical OVS structure.
Examples:
SVO: Der kleine Junge umarmt den dicken Nikolaus. ('The little boy hugs the fat Santa Claus.')
OVS: Den dicken Nikolaus umarmt der kleine Junge. (same meaning, object-first word order)
All sentences had three different measuring points, or regions of interest (ROI). ROIs were specified on the first noun, on the verb, and on the second adjective.
Example
All stimuli were recorded by a female semi-professional speaker at a slow to normal speaking rate.
The duration of the stimuli was measured in milliseconds. On the basis of these measurements, stimuli were uniformly time-compressed to 65%, 50% and 35% of the original speaking rate.
Example
Random factors: Subject (41 subjects), Item (50 items)
Fixed factors: Condition (SVO, OVS), ROI (first noun, verb, second adjective), Compression
Response variable: logReactionTime
By-subject variation
By-item variation
Assumptions
1. Linearity: The residual plot should not show any obvious pattern. If you find a curve or another pattern, there is no linearity.
When performing a regression analysis in SPSS, make a scatter plot with the predicted values on the x axis and the residuals on the y axis.
2. No perfect multicollinearity: When two or more predictor variables are highly correlated, meaning that one can be linearly predicted from the others with a non-trivial degree of accuracy, we call this multicollinearity.
Make scatterplots and calculate correlation coefficients for each pair of predictors. The rs should be lower than 0.9.
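This pairwise check is easy to script. The sketch below (Python with NumPy; the predictor data are invented for illustration) flags any pair whose |r| exceeds 0.9:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Invented predictors: x2 is almost a linear function of x1, x3 is unrelated.
x1 = rng.normal(size=n)
x2 = 0.97 * x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)

predictors = {"x1": x1, "x2": x2, "x3": x3}
names = list(predictors)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = np.corrcoef(predictors[a], predictors[b])[0, 1]
        flag = " <-- multicollinearity risk" if abs(r) > 0.9 else ""
        print(f"r({a}, {b}) = {r:+.2f}{flag}")
```

Only the (x1, x2) pair is flagged; the other correlations stay well below the 0.9 threshold.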
Assumptions
3. Homoskedasticity: The variability of the data should be approximately equal across the range of the predicted values. At each level of the predictors, the variance of the residuals should be constant.
The residuals need to have roughly a similar amount of deviation from the predicted values. A good residual plot essentially looks blob-like.
4. Normality of residuals: This assumption is the least important and is sometimes not even mentioned.
Perform a Shapiro-Wilk test on the residuals and make a normal quantile plot of the residuals.
5. Absence of influential data points: Consider the absolute standardized residuals.
Do not automatically remove outliers and influential points!
1. Linearity / 3. Homoskedasticity
Residual plot: residuals plotted against the predicted logarithmic reaction times.
2. No perfect multicollinearity
We have just one covariate, namely Compression.
4. Normality of residuals
Normal quantile plot of the residuals. The Kolmogorov-Smirnov test gives p < 0.001.
Results for the Shapiro-Wilk test were not given by SPSS.
5. Absence of influential datapoints
Cook's distance is not available for mixed models in SPSS. We try to find outliers by investigating the residuals. This is not exactly the same as finding influential cases.
Standardize the residuals: Analyze, Descriptive Statistics, Descriptives. Move Residuals under Variable(s). Check Save standardized values as variables. Click on OK. A new column contains the standardized residuals.
No residuals should have an absolute value larger than 3.29, no more than 1% should have an absolute value larger than 2.58, and no more than 5% should have an absolute value larger than 1.96.
We found 0.7%, 2.1% and 5.3% respectively for the three criteria.
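These cut-offs are two-sided standard-normal quantiles: under normality roughly 5% of values exceed 1.96 in absolute value, 1% exceed 2.58, and 0.1% exceed 3.29. A small helper (our own, in Python; not part of the SPSS workflow) computes the observed shares for any column of standardized residuals:

```python
import numpy as np

def outlier_shares(std_resid):
    """Share of |standardized residuals| beyond each cut-off."""
    z = np.abs(np.asarray(std_resid, dtype=float))
    return {cut: float(np.mean(z > cut)) for cut in (1.96, 2.58, 3.29)}

# Sanity check on standard-normal draws: the shares land near the
# theoretical tail probabilities of 5%, 1% and 0.1%.
rng = np.random.default_rng(3)
shares = outlier_shares(rng.normal(size=100_000))
for cut, share in shares.items():
    print(f"|z| > {cut}: {share:.2%}")
```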
SPSS
A subject is a variable that groups participants (or subjects).
SPSS
Results based on REML or ML will not differ much. ML provides a description of the fit of the full model, which is required if you want to compare models. REML only takes the random parameters into account.
Results
AIC is a goodness-of-fit measure that is corrected for the number of parameters being estimated. It is not intrinsically interpretable, but can be used for comparing models. A smaller value represents a better fit to the data.
Results
The column Estimate contains the bs, i.e. the estimated \(\beta\)s.
Effect size fixed factors
The best way to calculate \(R^2\) seems to be the one proposed by Nakagawa & Schielzeth (2013); see http://jslefche.wordpress.com/2013/03/13/r2-for-linear-mixed-effects-models/
We show an easier way, proposed by Xu (2003), Measuring explained variation in linear mixed effects models. Statistics in Medicine, 22:3527-3541. See http://onlinelibrary.wiley.com/doi/10.1002/sim.1572/pdf.
Compare a model including both random and fixed factors with a model which includes the random factors only.
Formula:
\[ R^2 = 1 - \frac{\text{variance residuals, model random \& fixed}}{\text{variance residuals, model random only}} \]
Effect size fixed factors
\[ R^2 = 1 - \frac{\text{variance residuals, model random \& fixed}}{\text{variance residuals, model random only}} = 1 - \frac{0.018559}{0.019115} = 2.9\% \]
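Plugging the two residual variances reported above into Xu's formula reproduces the 2.9% (a quick check in Python):

```python
# Residual variances from the two fitted models reported above.
var_random_and_fixed = 0.018559  # model with random & fixed factors
var_random_only = 0.019115       # model with random factors only

r_squared = 1 - var_random_and_fixed / var_random_only
print(f"{r_squared:.1%}")  # 2.9%
```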
Effect size fixed factors
How do we calculate the effect size per factor? Assume predictors P1 and P2, and assume that a model having P1 only as predictor has a higher \(R^2\) than a model having P2 only as predictor.
The effect size for P2 is then calculated as:
\[ R^2 = 1 - \frac{\text{variance residuals, model P1 \& P2 \& random}}{\text{variance residuals, model P1 \& random}} \]
Effect size random factors
Compare a model including both random and fixed factors with a model which includes the fixed factors only.
Formula:
\[ R^2 = 1 - \frac{\text{variance residuals, model random \& fixed}}{\text{variance residuals, model fixed only}} \]
Effect size random factors
\[ R^2 = 1 - \frac{\text{variance residuals, model random \& fixed}}{\text{variance residuals, model fixed only}} = 1 - \frac{0.018559}{0.028492} = 34.9\% \]
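The same computation for the random factors, using the variances reported above:

```python
# Residual variances from the two fitted models reported above.
var_random_and_fixed = 0.018559  # model with random & fixed factors
var_fixed_only = 0.028492        # model with fixed factors only

r_squared = 1 - var_random_and_fixed / var_fixed_only
print(f"{r_squared:.1%}")  # 34.9%
```

The random factors thus account for a far larger share of the residual variance than the fixed factors in this data set.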
Multiple comparisons
Centring
Centring is the process of transforming a variable into deviations around a fixed point. This is especially useful for predictor variables.
Simplest is grand mean centring: for a given variable, subtract from each score the mean of all scores for that variable.
Centring is a useful way to combat multicollinearity between predictor variables. Grand mean centring of predictors does not affect the model fit; the predicted values and the residuals will be the same.
Centring can also be used in ordinary multiple linear regression analysis.
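That grand mean centring leaves the fit untouched can be verified in a few lines of ordinary least squares (Python with NumPy; the data are invented):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0.0, 100.0, size=50)            # invented predictor
y = 2.0 + 0.3 * x + rng.normal(size=50)         # invented response

def fitted_values(x, y):
    """Ordinary least squares fit with an intercept; returns predictions."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

pred_raw = fitted_values(x, y)
pred_centred = fitted_values(x - x.mean(), y)   # grand mean centring

# The intercept changes, but the fitted values (and hence the residuals)
# are identical up to floating-point precision.
print(bool(np.allclose(pred_raw, pred_centred)))
```

Centring is a pure reparametrization: the intercept absorbs the shift, so predictions and residuals do not change.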
Interactions
Adding the interactions involving Condition and Compression causes the main effect of Condition to become non-significant, probably due to a strong correlation between those interactions and the main effect of Condition.
Interactions
After centring the covariate Compression around its grand mean (62.431569), the fixed factor Condition has become significant again.
Generalized Linear Mixed Models
When using linear mixed models (LMMs) we assume that the response being modeled is on a continuous scale.
Sometimes we can bend this assumption a bit if the response is an ordinal response with a moderate to large number of levels.
However, an LMM is not suitable for modelling a binary response or a response that represents a count. For these we use generalized linear mixed models (GLMMs).
GLMMs are not available in SPSS; use Generalized Estimating Equations instead. GLMMs are available in R.
Generalized Estimating Equations
Generalized Estimating Equations (GEE) were introduced by Liang and Zeger (1986). GEEs are a popular alternative to the likelihood-based GLMM, which is more sensitive to variance structure specification.
GEEs belong to a class of semiparametric regression techniques. They are useful for:
longitudinal data: subjects are measured at different points in time;
hierarchical data: measurements are taken on subjects who share a common characteristic, such as belonging to the same litter.
The response variable may be linear, ordinal or binary!
Generalized Estimating Equations
Under correct model specification and mild regularity conditions, parameter estimates from GEEs are consistent.
Assumptions:
the dependent variable is linearly related to the predictors (when the dependent variable is non-normally distributed, a non-identity link function should be selected);
the number of groups is relatively high (a rule of thumb is no fewer than 10, possibly more than 30; Norton et al., 1996);
the observations in different clusters are independent (although within-group observations may correlate).