
  • Linear Mixed Models

    Introduction to Statistics

    Carl von Ossietzky Universität Oldenburg

    Fakultät III - Sprach- und Kulturwissenschaften

    1

  • Introduction

    Example taken from H. Baayen (2008), Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. New York: Cambridge University Press.

    Subjects listen to items presented auditorily over headphones. White noise is added or not added. Does white noise influence the speed of lexical access?

    Dependent variable: lexical decision latencies as a measure of speed of lexical access.

    2

  • Random effects

    Items and subjects are sampled randomly from populations of items and subjects. Replicating the experiment would involve selecting other items and other subjects.

    Random-effect terms: randomly sampled from a much larger population, modeled as random variables with a mean of zero and unknown variance.

    3

  • Fixed effects

    Presence or absence of white noise is a treatment factor with two levels (noise versus no noise).

    The treatment factor is repeatable for any set of subjects and sentences. The number of levels is fixed, and each of the levels can be repeated.

    Fixed-effect terms: repeatable levels, factors defined by means of contrasts.

    4

  • Mixed effects

    Linear mixed model (LMM): a statistical model containing both fixed effects and random effects, that is, mixed effects.

    LMM is a kind of regression analysis.

    5

  • Model

    Multiple linear regression analysis:

    y_i = β_0 + β_1 x_i1 + β_2 x_i2 + ... + β_p x_ip + ε_i

    where β_0 + β_1 x_i1 + β_2 x_i2 + ... + β_p x_ip is the population mean response and the ε_i represent the deviations (residuals, errors) from the population mean response.

    Subjects show individual differences, and likewise items do. By-subject variation and by-item variation is represented in the error term.

    Problem: multiple responses from the same subject cannot be regarded as independent from each other. Mutatis mutandis, the same holds for items.

    6

  • Model

    Classical solution: averaging over items for a subjects analysis or averaging over subjects for an items analysis.

    Disadvantage: either by-item variation or by-subject variation is disregarded.

    Linear mixed models: random effects (by-subject and by-item variation) are modeled. Averaging is not necessary, and both kinds of variation are taken into account.

    A random-effect term specifies that the model will make by-subject adjustments to the average of the response variable by means of small changes to the intercept. Similarly for by-item variation.

    By-subject and by-item variation is no longer represented in the error term.

    7

  • Model

    Linear mixed model with p fixed variables and q random variables:

    y_ij = β_0 + β_1 x_ij1 + ... + β_p x_ijp + b_i1 z_ij1 + ... + b_iq z_ijq + ε_ij

    where:

    y_ij is the value of the response variable for the jth of n_i observations in the ith of M groups;

    β_1 ... β_p are the fixed-effect coefficients, which are identical for all groups;

    x_ij1 ... x_ijp are the fixed-effect regressors for observation j in group i;

    b_i1 ... b_iq are the random-effect coefficients for group i; the random effects, therefore, vary by group;

    z_ij1 ... z_ijq are the random-effect regressors;

    ε_ij is the error for observation j in group i.
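As an illustrative sketch (not from the slides), a model of this form with one fixed effect and by-group random intercepts can be fitted in Python with statsmodels' MixedLM; the data and variable names (y, x, subject) are simulated and purely hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_subj, n_obs = 20, 10
subj = np.repeat(np.arange(n_subj), n_obs)
x = rng.normal(size=n_subj * n_obs)
b = rng.normal(scale=0.5, size=n_subj)  # by-subject random intercepts b_i
y = 2.0 + 1.5 * x + b[subj] + rng.normal(scale=0.2, size=n_subj * n_obs)
df = pd.DataFrame({"y": y, "x": x, "subject": subj})

# Fixed effect of x, random intercept per subject
model = smf.mixedlm("y ~ x", df, groups=df["subject"])
fit = model.fit()
print(fit.params["x"])  # estimated fixed slope, close to the true 1.5
```

The `groups` argument is what turns the ordinary regression into a mixed model: it tells the fitter which rows share a random intercept.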

    8

  • Advantages

    No assumption of homogeneity of regression slopes: ANCOVA requires this, but with LMMs we can explicitly model this variability in regression slopes.

    No assumption of independence: AN(C)OVA and regression models require this, but LMMs do not.

    Missing data is no problem: LMMs can deal with missing data as long as the missing data meet the so-called missing-at-random definition.

    9

  • Repeated measures design

    An experimental design in which we have multiple subjects responding to multiple items is referred to as a repeated measures design.

    This is also known as a model with crossed random effects for subjects and items. In our example: our subjects listen to all items, and all items are heard by all subjects.

    10

  • Hierarchical design

    Hierarchical design, nested design, or multilevel model: there is a hierarchical structure.

    Several schools are investigated; per school, several classrooms are investigated; per classroom, several students are investigated.

    Three levels: the highest level is school, the middle level is classroom, the lowest level is student.

    11

  • Example

    We focus on a repeated measures design.

    Example: Angela Jochmann (Niederlandistik) studies the effects of fast speech on the processing of canonical and non-canonical sentences.

    Two experiments with V2 or RC sentences were conducted. We focus on the V2 sentences.

    21 female and 22 male students of the University of Oldenburg with an age range from 19 to 30 years (mean 23.35 years) were tested.

    Participants read a target word on a screen. After 800 ms the word vanished and a fixation cross appeared on the screen, simultaneously accompanied by the auditory stimulus.

    12

  • Example

    The task was to press a button as soon as the target word was detected in the sentence. Reaction times were measured from the onset of the auditory target until button press.

    An extended response window of 1000 ms was implemented to account for length differences of the stimuli.

    After each sentence, a written yes/no question was shown on the screen and participants had to answer via button press. Response latencies were recorded from the beginning of the visual presentation of the question until button press.

    Response latencies were measured. We created a new variable logReactionTime representing the logarithmic response latencies.
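The logReactionTime transformation itself is a one-liner; a minimal sketch with made-up latencies (the slide does not state whether natural or base-10 logs were used; natural logs are shown here):

```python
import numpy as np

rt_ms = np.array([650.0, 820.0, 540.0, 1310.0])  # illustrative raw latencies in ms
log_rt = np.log(rt_ms)                           # natural-log latencies ("logReactionTime")
print(log_rt.round(2))
```

Log-transforming latencies compresses the long right tail typical of reaction-time data, which helps with the homoskedasticity and normality-of-residuals assumptions discussed later.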

    13

  • Example

    14

  • Example

    Material consisted of items (i.e. sentences) from the OLACS corpus (Oldenburger Linguistically and Audiologically Controlled Sentences; Uslar et al., 2010, 2013).

    50 items, each of them offered 6 times to the subject. 25 items have a canonical SVO structure, 25 items have a non-canonical OVS structure.

    Examples:

    SVO: Der kleine Junge umarmt den dicken Nikolaus ('The little boy hugs the fat Santa Claus')

    OVS: Den dicken Nikolaus umarmt der kleine Junge (object-fronted: 'The fat Santa Claus, the little boy hugs')

    All sentences had three different measuring points, or regions of interest (ROI). ROIs were specified on the first noun, on the verb, and on the second adjective.

    15

  • Example

    All stimuli were recorded by a female semi-professional speaker at a slow to normal speaking rate.

    The duration of the stimuli is measured in milliseconds. On the basis of these measurements, stimuli were uniformly time-compressed to 65%, 50% and 35% of the original speaking rate.

    16

  • Example

    Random factors: Subject (41 subjects), Item (50 items)

    Fixed factors: Condition (SVO, OVS), ROI (first noun, verb, second adjective), Compression

    Response variable: logReactionTime

    17

  • By-subject variation

    18

  • By-item variation

    19

  • Assumptions

    1. Linearity: the residual plot should not show any obvious pattern. If you find a curve or another pattern, there is no linearity.

    When performing a regression analysis in SPSS, make a scatter plot with predicted values on the x axis and the residuals on the y axis.

    2. No perfect multicollinearity: when two or more predictor variables are highly correlated, meaning that one can be linearly predicted from the others with a non-trivial degree of accuracy, we call this multicollinearity.

    Make scatterplots and calculate correlation coefficients for each pair of predictors. The r's should be lower than 0.9.
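The pairwise-correlation screen can be sketched in a few lines (simulated predictors with illustrative names, not the data from the example):

```python
import numpy as np

rng = np.random.default_rng(0)
p1 = rng.normal(size=100)
p2 = 0.5 * p1 + rng.normal(size=100)  # a moderately correlated second predictor

r = np.corrcoef(p1, p2)[0, 1]         # Pearson r for the pair
print(abs(r) < 0.9)                   # rule of thumb from the slide: |r| below 0.9
```

In practice you would compute this for every pair of predictors; any pair with |r| at or above 0.9 signals problematic multicollinearity.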

    20

  • Assumptions

    3. Homoskedasticity: the variability of the data should be approximately equal across the range of the predicted values. At each level of the predictors the variance of the residuals should be constant.

    The residuals need to show a roughly similar amount of deviation from the predicted values. A good residual plot essentially looks blob-like.

    4. Normality of residuals: this assumption is the least important and is sometimes not even mentioned.

    Perform a Shapiro-Wilk test on the residuals and make a normal quantile plot of the residuals.

    5. Absence of influential data points: consider the absolute standardized residuals.

    Do not automatically remove outliers and influential points!
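The Shapiro-Wilk check from assumption 4 can be sketched in Python with scipy (simulated residuals as a stand-in for the model residuals):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
residuals = rng.normal(size=200)  # stand-in for the fitted model's residuals
w, p = stats.shapiro(residuals)   # a small p suggests non-normal residuals
print(round(w, 3), round(p, 3))
```

The test statistic W is close to 1 for normal-looking data; pair the test with a normal quantile plot, as the slide recommends, since with large samples even trivial departures can yield small p-values.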

    21

  • 1. Linearity / 3. Homoskedasticity

    Residual plot: residuals plotted against the predicted logarithmic reaction times.

    22

  • 2. No perfect multicollinearity

    We have just one covariate, namely Compression.

    23

  • 4. Normality of residuals

    Normal quantile plot of the residuals. The Kolmogorov-Smirnov test gives p < 0.001. Results for the Shapiro-Wilk test were not given by SPSS.

    24

  • 5. Absence of influential datapoints

    Cook's distance is not available for mixed models in SPSS. We try to find outliers by investigating the residuals. This is not exactly the same as finding influential cases.

    Standardize the residuals: Analyze, Descriptive Statistics, Descriptives. Move Residuals under Variable(s). Check Save standardized values as variables. Click on OK. A new column contains the standardized residuals.

    No residual should have an absolute value larger than 3.29, no more than 1% should have an absolute value larger than 2.58, and no more than 5% should have an absolute value larger than 1.96.

    We found 0.7%, 2.1% and 5.3% respectively for the three criteria.
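That screening rule is easy to code directly; a sketch with simulated standard-normal residuals standing in for the real ones:

```python
import numpy as np

rng = np.random.default_rng(3)
resid = rng.normal(size=5000)             # stand-in residuals
z = (resid - resid.mean()) / resid.std()  # standardized residuals

# For standard-normal residuals these proportions are about 0.1%, 1% and 5%.
for cut in (3.29, 2.58, 1.96):
    print(cut, (np.abs(z) > cut).mean())
```

Comparing the observed proportions against the 0.1%/1%/5% benchmarks reproduces the check the slide performs by hand in SPSS.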

    25

  • SPSS

    A subject variable is a variable that groups the observations by participant (or subject).

    26

  • SPSS

    27

  • SPSS

    28

  • SPSS

    29

  • SPSS

    Results based on REML or ML will not differ much. ML provides a description of the fit of the full model, which is required if you want to compare models. REML only takes into account the random parameters.

    30

  • SPSS

    31

  • SPSS

    32

  • SPSS

    33

  • Results

    34

  • Results

    AIC is a goodness-of-fit measure that is corrected for the number of parameters being estimated. It is not intrinsically interpretable, but can be used for comparing models. A smaller value represents a better fit to the data.

    35

  • Results

    36

  • Results

    The column Estimate contains the b's, i.e. the estimated β's.

    37

  • Effect size fixed factors

    The best way to calculate R² seems to be the one proposed by Nakagawa & Schielzeth (2013), see: http://jslefche.wordpress.com/2013/03/13/r2-for-linear-mixed-effects-models/

    We show an easier way, proposed by Xu (2003), Measuring explained variation in linear mixed effects models. Statistics in Medicine, 22:3527-3541. See http://onlinelibrary.wiley.com/doi/10.1002/sim.1572/pdf.

    Compare a model including both random and fixed factors with a model which includes the random factors only.

    Formula:

    R² = 1 − (variance of residuals, model with random & fixed factors) / (variance of residuals, model with random factors only)

    38

  • Effect size fixed factors

    R² = 1 − (variance of residuals, model with random & fixed factors) / (variance of residuals, model with random factors only)

    = 1 − 0.018559 / 0.019115 ≈ 2.9%

    39

  • Effect size fixed factors

    How do we calculate the effect size per factor?

    Assume predictors P1 and P2. Assume a model having P1 only as predictor has a higher R² than a model having P2 only as predictor.

    The effect size for P2 is calculated as:

    R² = 1 − (variance of residuals, model with P1 & P2 & random factors) / (variance of residuals, model with P1 & random factors)

    40

  • Effect size random factors

    Compare a model including both random and fixed factors with a model which includes the fixed factors only.

    Formula:

    R² = 1 − (variance of residuals, model with random & fixed factors) / (variance of residuals, model with fixed factors only)

    41

  • Effect size random factors

    R² = 1 − (variance of residuals, model with random & fixed factors) / (variance of residuals, model with fixed factors only)

    = 1 − 0.018559 / 0.028492 ≈ 34.9%
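Both effect sizes follow directly from the residual variances reported on the slides, so the whole computation fits in a few lines:

```python
var_full = 0.018559          # residual variance, model with random & fixed factors
var_random_only = 0.019115   # residual variance, model with random factors only
var_fixed_only = 0.028492    # residual variance, model with fixed factors only

r2_fixed = 1 - var_full / var_random_only   # effect size of the fixed factors
r2_random = 1 - var_full / var_fixed_only   # effect size of the random factors

print(round(100 * r2_fixed, 1))    # 2.9
print(round(100 * r2_random, 1))   # 34.9
```

Here the subject and item random factors explain far more variance than the fixed factors, which is typical for reaction-time data.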

    42

  • Multiple comparisons

    43

  • Multiple comparisons

    44

  • Centring

    Centring is the process of transforming a variable into deviations around a fixed point. This is especially useful for predictor variables.

    Simplest is grand mean centring: for a given variable, subtract the mean of all scores for that variable from each score.

    Centring is a useful way to combat multicollinearity between predictor variables. Grand mean centring of predictors does not affect the model fit; the predicted values and the residuals will be the same.

    Centring can also be used in ordinary multiple linear regression analysis.
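Grand mean centring is itself a one-liner; a sketch with illustrative predictor values (not the Compression data) also showing the multicollinearity effect:

```python
import numpy as np

x = np.array([35.0, 50.0, 65.0, 35.0, 50.0, 65.0])  # illustrative predictor values
xc = x - x.mean()                                   # grand mean centring

print(xc.mean())  # 0.0: centred scores average to zero

# Centring reduces the correlation between a predictor and terms built from it,
# e.g. a quadratic term:
print(abs(np.corrcoef(x, x * x)[0, 1]))    # close to 1 for raw x
print(abs(np.corrcoef(xc, xc * xc)[0, 1])) # 0 here, since the values are symmetric
```

The same mechanism is why centring Compression on the interaction slides restores the significance of the Condition main effect: the interaction terms are no longer nearly collinear with it.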

    45

  • Interactions

    46

  • Interactions

    Adding the interactions involving Condition and Compression causes the main effect of Condition to become non-significant, probably due to a strong correlation between those interactions and the main effect of Condition.

    47

  • Interactions

    After centring the covariate Compression around its grand mean (62.431569), the fixed factor Condition has become significant again.

    48

  • Generalized Linear Mixed Models

    When using Linear Mixed Models (LMMs) we assume that the response being modeled is on a continuous scale.

    Sometimes we can bend this assumption a bit if the response is an ordinal response with a moderate to large number of levels.

    However, an LMM is not suitable for modelling a binary response or a response that represents a count. For these we use generalized linear mixed models (GLMMs).

    GLMMs are not available in SPSS; use Generalized Estimating Equations instead. GLMMs are available in R.

    49

  • Generalized Estimating Equations

    Generalized Estimating Equations (GEE) were introduced by Liang and Zeger (1986). GEEs are a popular alternative to the likelihood-based GLMM, which is more sensitive to variance structure specification.

    GEEs belong to a class of semiparametric regression techniques.

    Useful for:

    longitudinal data: subjects are measured at different points in time;

    hierarchical data: measurements are taken on subjects who share a common characteristic, such as belonging to the same litter.

    The response variable may be linear, ordinal or binary!

    50

  • Generalized Estimating Equations

    Under correct model specification and mild regularity conditions, parameter estimates from GEEs are consistent.

    Assumptions:

    the dependent variable is linearly related to the predictors (when the dependent variable is non-normally distributed, a non-identity link function should be selected);

    the number of groups is relatively high (a rule of thumb is no fewer than 10, possibly more than 30; Norton et al., 1996);

    the observations in different clusters are independent (although within-group observations may correlate).

    51