

    EPS 651

    Multivariate Analysis of Variance:

Hotelling's T²

    Some new material from Chapter 4 and part of a review

    Basically, this is a review and so is next week



1. Assuming that a finding of no difference (non-significance) means that the groups are the same. Finding no effect does NOT say anything about the probability that the groups really are the same.

2. Running multiple dependent univariate tests rather than MANOVA.

3. Running a control group as a straw dog.

4. Employing pre- and posttests on extremes.

5. Running correlations to show differences.

6. Not running pilots to determine effect sizes.

7. Trying to measure too many independent variables (or dependent variables) with too few subjects.


In selecting participants for group assignment, it appears that you have employed failure to find statistical differences as a strategy for showing that participants did not differ with regard to particular demographic characteristics. For example, on p. 5 within the Method section you indicate:

There were no significant differences in demographic characteristics of the two groups on age, t(2, 12) = 0.814868, p > .05, income, t(2, 12) = 0.176824, p > .05, and level of education in years post-high school, t(2, 12) = 0.07276, p > .05.


I think it is important to be very clear about what it means to fail to reject the null hypothesis (forgive the annoying use of jargon and double negatives), but this is the language of null hypothesis testing and we are obligated to employ it properly. Failing to reject the null hypothesis does not in any way suggest that groups are essentially the same. Failing to reject the null hypothesis only means that one cannot rule out the chance that the differences between groups could be a function of sampling error. Thus, I can see no methodological advantage to including any of these details within a revised version of your manuscript.


Stevens: There are two reasons one should be interested in using more than one dependent variable when comparing two treatments.

1. Any treatment worth its salt will affect the subjects in more than one way; hence the need for several criterion measures.

2. Through the use of several criterion measures we can obtain a more complete and detailed description of the phenomenon under investigation, whether it is reading achievement, math achievement, self concept, physiological stress, teacher effectiveness, or counselor effectiveness.


    Principles of (K-group) MANOVA

    Testing multivariate effects

    Post-hoc tests and contrasts

    Assumptions in MANOVA

    Independent observations

    Multivariate normality

    Homogeneous covariance matrices

Overview: From my point of view, the concepts in Stevens are at least as difficult as (if not more difficult than) the calculations


Multivariate test

Partitioning of the total covariance (SSCP) matrix:

Total SSCP = Between SSCP + Within SSCP
T = B + W

Test statistic, Wilks' Λ:

Λ = |W| / |B + W| = |W| / |T|

Multivariate analysis: pairwise comparison of vectors of means for all k groups.

Special case: k = 2 groups gives Hotelling's T².

The univariate t procedure does not need post hoc tests; the multivariate procedure tests p dependent variables.

Hotelling's T² is the two-group special case of MANOVA, where K = number of groups and p = number of dependent variables (with K groups there are K(K − 1)/2 pairwise comparisons of mean vectors).
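For reference, here is a compact statement of the two-group statistic and its exact F transformation (standard results, stated here for convenience; they match the worked Problem 4 example later in the deck):

```latex
T^2 = \frac{n_1 n_2}{n_1 + n_2}\,
      (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\,\mathbf{S}^{-1}\,
      (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2),
\qquad
F = \frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)\,p}\,T^2
\;\sim\; F_{\,p,\; n_1 + n_2 - p - 1}
```

Here S is the pooled within-group covariance matrix and p is the number of dependent variables.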


    Univariate case

In each group the dependent variable follows a normal distribution (mean μi and variance σ²)

    Effect of violation:

    ANOVA is robust with respect to Type I error

    Power attenuated by kurtosis (what does this mean?)

    ANOVA Assumption

    MANOVA Requires:

    Variances and covariances must be relatively equal

See Levene's test and related details

    More stringent assumption: more restrictions

    Effect of violation:

For equal group sizes, actual α levels are very close to the nominal levels.

For unequal group sizes:
Large variances w/ small groups: liberal F test.
Large variances w/ large groups: conservative F test.

MANOVA: Always remember Homogeneity of Variance


Levene's Test (Available in SPSS)


is employed to assess the equality of variances in different samples. Several multivariate procedures entail the assumption that variances of the populations from which different samples are drawn are equal. (Otherwise what? As per the previous slide.)

    Levene's test assesses the null hypothesis that the population variances are

    equal (homogeneity). If the resulting p-value of Levene's test is less than

    some critical value (typically 0.05), the obtained differences in sample

    variances are unlikely to have occurred based on random sampling. Thus, the

    null hypothesis of equal variances is rejected and it is concluded that there

    is a difference between the variances in the population.

As a potential test item: what is wrong with the rest of this picture? Hint: If the null is NOT rejected, then we assume?

In any event, MANOVA is NOT robust with regard to violation of homogeneity. I recommend simply looking at your covariance matrix and looking at your distributions in Excel. Keep in mind, things can go south quickly when this assumption is violated.
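As an illustration (not from the slides), here is a minimal sketch of Levene's test in SciPy. The two groups below are the Y1 scores from the Problem 4 data used later in the deck; everything else about the setup is assumed for demonstration only.

```python
# Minimal sketch: Levene's test for equality of variances with SciPy.
import numpy as np
from scipy import stats

group1 = np.array([1, 2, 3, 5, 2], dtype=float)  # Problem 4, group 1, Y1
group2 = np.array([4, 5, 6], dtype=float)        # Problem 4, group 2, Y1

# H0: the population variances are equal (homogeneity).
stat, p_value = stats.levene(group1, group2, center="median")
print(f"Levene W = {stat:.3f}, p = {p_value:.3f}")

# If p < .05 we reject H0 and conclude the variances differ;
# if p >= .05 we fail to reject H0 (we do NOT "accept" equality).
if p_value < 0.05:
    print("Reject H0: variances appear unequal.")
else:
    print("Fail to reject H0: no evidence of unequal variances.")
```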



Matrix multiplication example:

A = | 2  1  3 |      B = |  1   0 |
    | 4  5  6 |          |  2   4 |
                         | -1   5 |

Row 1 × Col 1: 2(1) + 1(2) + 3(-1) = 1
Row 1 × Col 2: 2(0) + 1(4) + 3(5)  = 19
Row 2 × Col 1: 4(1) + 5(2) + 6(-1) = 8
Row 2 × Col 2: 4(0) + 5(4) + 6(5)  = 50

Product matrix C = AB = |  1  19 |
                        |  8  50 |
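A quick check of the product above with NumPy (a sketch of my own, not part of the original slides):

```python
import numpy as np

A = np.array([[2, 1, 3],
              [4, 5, 6]])
B = np.array([[ 1, 0],
              [ 2, 4],
              [-1, 5]])

C = A @ B        # matrix product
print(C)         # [[ 1 19]
                 #  [ 8 50]]
```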


In MLR we employed raw scores; in MANOVA we use deviation scores


Or by using the matrix method with deviation scores. Note: variances appear to be relatively close in this example


    Matrix method: What needs to be done with the

    transpose?


(SSCP matrices shown on the slide)
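To make the "transpose" step concrete, here is a small sketch (mine, not from the text) that computes an SSCP matrix as the deviation-score matrix premultiplied by its transpose. The data are the Group 1 scores from the Problem 4 example later in the deck.

```python
import numpy as np

# Group 1 raw scores on Y1 and Y2 (from the Problem 4 data).
X = np.array([[1, 9],
              [2, 3],
              [3, 4],
              [5, 4],
              [2, 5]], dtype=float)

Xd = X - X.mean(axis=0)      # deviation scores
SSCP = Xd.T @ Xd             # transpose times deviation matrix
print(SSCP)                  # [[ 9.2 -8. ]
                             #  [-8.  22. ]]
```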




    Look at the respective variances in each of the two matrices







In MLR we employed raw scores; in MANOVA we use deviation scores

    Just a CSV file set up



Indicate GPs, DVs, and number of scores


    Open CSV and load data
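As a rough illustration of what such a CSV layout can look like (this is a sketch of my own, not the program from the slides; the column names and the use of pandas are assumptions), the file holds a group column plus one column per DV:

```python
# Sketch: a group column plus two DV columns, loaded with pandas.
import pandas as pd
from io import StringIO

csv_text = """group,y1,y2
1,1,9
1,2,3
1,3,4
1,5,4
1,2,5
2,4,8
2,5,6
2,6,7
"""

df = pd.read_csv(StringIO(csv_text))
print(df.groupby("group")[["y1", "y2"]].mean())   # group means for each DV
```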



    If everything works:




4.5 Three Post Hoc Procedures

1. Roy-Bose simultaneous confidence intervals (least powerful but most informative)

2. Univariate t-tests with Bonferroni (fairly powerful and most often used)

3. Using univariate tests set at the .05 level (most powerful but way too liberal)

Basically, run a t-test on each set of scores (see the sketch after this slide).

Here, the Client Self Acceptance variable shows p = .004287 × 2 = .00857 in favor of the Adlerian group.
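A minimal sketch of post hoc option 2, univariate t-tests with a Bonferroni adjustment (my own illustration, not the book's printout; the data are the Problem 4 scores used later in the deck, and the per-test α is .05 divided by the number of DVs):

```python
import numpy as np
from scipy import stats

# Problem 4 data: two groups, two dependent variables (Y1, Y2).
g1 = np.array([[1, 9], [2, 3], [3, 4], [5, 4], [2, 5]], dtype=float)
g2 = np.array([[4, 8], [5, 6], [6, 7]], dtype=float)

p = g1.shape[1]              # number of dependent variables
alpha_each = 0.05 / p        # Bonferroni-adjusted per-test alpha

for j in range(p):
    t, pval = stats.ttest_ind(g1[:, j], g2[:, j])   # pooled-variance t test
    flag = "significant" if pval < alpha_each else "not significant"
    print(f"Y{j+1}: t = {t:.3f}, p = {pval:.4f} ({flag} at {alpha_each:.4f})")
```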


    SPSS run on same data:

    see http://www.ats.ucla.edu/stat/spss/


    SPSS data setup (I just pull from an existing CSV)

    Data View

    Variable View


Select GLM and Multivariate

    Select Model, Contrasts, Post Hoc, and Options


    I suggest spending some time reading and exploring

    Again, see http://www.ats.ucla.edu/stat/spss/ et al


    Descriptive Stats


SPSS Two-Group MANOVA Table (see R²)

a. Exact statistic
b. Design: Intercept + GROUP

Hotelling = 21, F = 9

The text describes a way to run a regression on these data by combining scores


Cohen's f²

Cohen's f² is an appropriate effect size measure to use in the context of an F-test for ANOVA or multiple regression. The f² effect size measure for multiple regression is defined as:

f² = (R²AB − R²A) / (1 − R²AB)

where R²A is the variance accounted for by a set of one or more independent variables A (constant), and R²AB is the combined variance accounted for by A and another set of one or more independent variables B.

(Could be a test item)
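A tiny sketch of the f² computation just defined (the R² values here are hypothetical, chosen only to show the arithmetic):

```python
def cohens_f2(r2_a: float, r2_ab: float) -> float:
    """Cohen's f^2 for the increment of set B over set A."""
    return (r2_ab - r2_a) / (1.0 - r2_ab)

# Hypothetical example: A alone explains 20% of the variance,
# A and B together explain 35%.
print(round(cohens_f2(0.20, 0.35), 3))   # 0.231
```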

As per Stevens, under the level heading 4.8 Multivariate Regression Analysis for the Sample Problem: subjects in group 1 are dummy coded as 1s and subjects in group 2 are coded as 0s (strange but functional).


    In this arrangement, Xs are treated as the dependent variable


Under these conditions, the coefficient of determination indicates the extent to which the variables Y1 and Y2 are predictive of group membership.
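A sketch of this dummy-coding idea (mine, not Stevens' printout), using the small two-group data set from Problem 4 below rather than the chapter's sample problem. For two groups, the regression R² is tied to Hotelling's T² by the standard identity T² = (N − 2)R² / (1 − R²).

```python
import numpy as np

# Problem 4 data: group 1 coded 1, group 2 coded 0 (dummy coding).
Y = np.array([[1, 9], [2, 3], [3, 4], [5, 4], [2, 5],    # group 1
              [4, 8], [5, 6], [6, 7]], dtype=float)       # group 2
z = np.array([1, 1, 1, 1, 1, 0, 0, 0], dtype=float)

# Regress the group code on Y1 and Y2 (with an intercept).
X = np.column_stack([np.ones(len(z)), Y])
beta, *_ = np.linalg.lstsq(X, z, rcond=None)
z_hat = X @ beta
r2 = 1 - np.sum((z - z_hat) ** 2) / np.sum((z - z.mean()) ** 2)

# Two-group identity: Hotelling's T^2 = (N - 2) * R^2 / (1 - R^2).
N = len(z)
T2 = (N - 2) * r2 / (1 - r2)
print(f"R^2 = {r2:.3f}, T^2 = {T2:.2f}")   # roughly R^2 = 0.729, T^2 = 16.14
```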


(Problem 4a, a good test prep item) Consider the following data for a two-group, two-dependent-variable problem:

        T1              T2
     Y1    Y2        Y1    Y2
      1     9         4     8
      2     3         5     6
      3     4         6     7
      5     4
      2     5


    4.7 Multivariate Significance But No Univariate Significance

    Just as with ANOVA, there are several ways in which this can happen.

    Violation of Assumptions or Strong Within Group Correlations

See Stevens' discussion of this issue.


    a) Compute W, i.e., the pooled within-SSCP matrix

    b) Find the pooled within covariance matrix and indicate what

    each of the elements in the matrix represents

c) Find Hotelling's T²

d) What is the multivariate null hypothesis in symbolic form?

e) Test the null hypothesis at the .05 level. What is your decision?

f) Run post hoc comparisons

        T1              T2
     Y1    Y2        Y1    Y2
      1     9         4     8
      2     3         5     6
      3     4         6     7
      5     4
      2     5


Here N (the total n across both groups) = 8 and k (the number of groups) = 2. Keep these in mind for the calculations that follow.


Mean differences on subtest scores, obtained by subtracting the mean of the second group from the mean of the first group:

        T1              T2
     Y1    Y2        Y1    Y2
      1     9         4     8
      2     3         5     6
      3     4         6     7
      5     4
      2     5

Group 1: M(Y1) = 2.6, M(Y2) = 5
Group 2: M(Y1) = 5,   M(Y2) = 7

Differences: 2.6 − 5 = −2.4 and 5 − 7 = −2

Group 1: [transpose] × [deviation matrix]

| -1.6  -0.6   0.4   2.4  -0.6 |     | -1.6   4 |
|  4    -2    -1    -1     0   |  ×  | -0.6  -2 |
                                     |  0.4  -1 |
                                     |  2.4  -1 |
                                     | -0.6   0 |


Calculate the matrix product for the first group:

W1 = |  9.2   -8 |
     | -8     22 |

Group 2: [transpose] × [deviation matrix]

| -1   0   1 |     | -1   1 |
|  1  -1   0 |  ×  |  0  -1 |
                   |  1   0 |


Calculate the matrix product for the second group:

W2 = |  2   -1 |
     | -1    2 |

Problem 4 continued

The within SSCP matrices for groups 1 and 2 are, respectively:

W1 = |  9.2   -8 |        W2 = |  2   -1 |
     | -8     22 |             | -1    2 |

Therefore the pooled within SSCP matrix is:

W = W1 + W2 = | 11.2   -9 |
              | -9     24 |

Btw, look at the respective covariances in each of the two matrices; what do you think?


b) The pooled within covariance matrix is given by S = W / (n1 + n2 − 2) = W / 6:

S = |  1.87  -1.5 |
    | -1.5    4   |

1.87 is the variance for y1, 4 is the variance for y2, and −1.5 is the covariance between y1 and y2.

The determinant of S is (1.87)(4) − (−1.5)² ≈ 5.23 (do you see why?)

    What will happen to the signs in the next step?

c) Hotelling's T² is given by:

T² = [n1·n2 / (n1 + n2)] (ȳ1 − ȳ2)' S⁻¹ (ȳ1 − ȳ2)

where ȳ1 and ȳ2 are the vectors of means for groups 1 and 2, S⁻¹ is the inverse of the pooled covariance matrix, and 5.23 is the determinant used in forming that inverse.


Now, the means for y1 and y2 in group 1 are 2.6 and 5, while the means for y1 and y2 in group 2 are 5 and 7. Thus, ȳ1 − ȳ2 = (−2.4, −2).

Therefore, T² = (15/8)(8.59) ≈ 16.1.


d) The multivariate null hypothesis is that the population mean vectors are equal, i.e., μ1 = μ2 (equality on both DVs jointly).

e) To test the multivariate null hypothesis we use the exact F transformation of T²:

F = [(n1 + n2 − p − 1) / ((n1 + n2 − 2)p)] T², with df = p and n1 + n2 − p − 1

Here F = (5/12)(16.1) ≈ 6.71, with df = 2 and 5.

Since 6.71 > 5.79 we reject the multivariate null hypothesis and conclude that the groups differ somewhere in the set of 2 variables.
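To check the arithmetic in parts a through e, here is a short NumPy sketch (mine, not from Stevens) that reproduces W, S, T², and the F transformation for the Problem 4 data. With unrounded values it gives T² ≈ 16.14 and F ≈ 6.73; the slide's 6.71 comes from carrying the rounded entries 1.87 and 5.23.

```python
import numpy as np

g1 = np.array([[1, 9], [2, 3], [3, 4], [5, 4], [2, 5]], dtype=float)
g2 = np.array([[4, 8], [5, 6], [6, 7]], dtype=float)
n1, n2 = len(g1), len(g2)

# a) Pooled within-groups SSCP matrix.
W = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in (g1, g2))

# b) Pooled within covariance matrix.
S = W / (n1 + n2 - 2)

# c) Hotelling's T^2.
d = g1.mean(axis=0) - g2.mean(axis=0)          # (-2.4, -2.0)
T2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.inv(S) @ d

# e) Exact F transformation with df = p and n1 + n2 - p - 1.
p = g1.shape[1]
F = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2

print(np.round(S, 2))               # [[ 1.87 -1.5 ] [-1.5   4.  ]]
print(round(T2, 2), round(F, 2))    # about 16.14 and 6.73
```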


    Check your answers with

    psychNet

Let's look at the Post Hoc details. Not all that outstanding.


    SSCP:



    Inverse of

    Covariance Matrix


Since 6.71 > 5.79 (df = 2 & 5; see Table A.3) we reject the multivariate null hypothesis and conclude that the groups differ on the set of 2 variables, but not by very much.

    Given the apparent lack of homogeneity and the small

    sample sizes, how about the Post Hoc comparisons?

Mean differences on subtest scores, obtained by subtracting the mean of the second group from the mean of the first group:

        T1              T2
     Y1    Y2        Y1    Y2
      1     9         4     8
      2     3         5     6
      3     4         6     7
      5     4
      2     5

Group 1: M(Y1) = 2.6, M(Y2) = 5
Group 2: M(Y1) = 5,   M(Y2) = 7

Differences: 2.6 − 5 = −2.4 and 5 − 7 = −2


Oops! Just looking at the first set of means, 2.6 and 5.


    Still Oops!


    10) Ambrose (1985) compared elementary schoolchildren who received instruction on the clarinet via

    programmed instruction (experimental group) vs.

    those who received instruction via traditional

    classroom instruction on the following six

    performance aspects: interpretation (interp), tone,

    rhythm, intonation (inton), tempo, and articulation

    (artic).

The data, representing the average of two judges' ratings, are listed below, with GPID = 1 referring to the experimental group and GPID = 2 referring to the control group:


c) Setting overall α = .05 and using the Bonferroni inequality approach, which of the individual variables are significant, and hence contributing to the overall multivariate significance?

Using Bonferroni, each variable is tested for significance at the .05/6 = .0083 level of significance. From the printout the following variables are significant:


Between-Subjects SSCP Matrix


    SOM classes built on Raw Scores

    By comparison, these are Not well differentiated by the SOM

    Int Tone Rhy Inton Tem Artic


We will start with Power Analysis after the test.

For practice I would run both problems, and several more, by hand several times to make sure everything matches.

Next week we will review, mostly chapters 2 and 3.