
Inference for Multivariate Samples

Arne Bathke, Arne.Bathke@sbg.ac.at

Paris Lodron Universität Salzburg

Innsbruck, 8 May 2013

With: Solomon W. Harrar, Woody Burchett, Amanda Ellis

Arne Bathke (Arne.Bathke@sbg.ac.at) · Inference for Multivariate Samples


Data Structure

Multivariate data from factorial designs

$p$ different variables (endpoints) are measured, index $k = 1, \dots, p$,

under $a$ different conditions (treatments, sub-populations), index $i = 1, \dots, a$,

with $n_i$ subjects (experimental units) per condition, index $j = 1, \dots, n_i$.

The $a$ conditions may themselves have a structure (factorial design).

We assume that observations on different subjects are independent.

Research question: Is there a difference between the $a$ conditions?


Data Example I: Clinical Study with Multiple Endpoints

$a = 3$ treatment groups

$n_i = 15$ patients with panic disorder in each group

$p = 2$ response variables:

Clinical Global Impression (CGI), rated by the investigator on a seven-point ordinal scale

Patient’s Global Impression (PGI), rated by the patient on the same scale

Question: Do the treatments have different effects? Which one works best?


Data Example II: Crabapple Scab

$a = 63$ varieties of crabapples ($a$ large)

$n_i = 3$ to $5$ replicates of each variety ($n_i$ small)

Disease resistance evaluated at $p = 4$ times during the growing season

Ordinal response: each tree rated on a scale from 0 to 5

Question: Do the 63 varieties differ with regard to their disease resistance?


Data Example III: Kentucky Behavioral Risk Factor Survey

$a = 120$ counties in Kentucky

Random sample of size $n_i = 9$ to $n_i = 20$ from each county

$p = 3$ response variables (3 risk factors for developing type II diabetes):

body mass index (numerical)

exercise activity (yes/no)

education (ordinal, 6 levels)

Question: Do the 120 counties differ with regard to risk factors for diabetes?


Data Example IV: Foliar Diseases of Strawberry

$a = 4$ treatments (three fungicides and a control)

Sample sizes $n_i = 4$ plots each

$p = 4$ response variables:

total weight of harvested fruit (quantitative)

percentage of fruit with symptoms of Botrytis, and of other species (2 quantitative variables)

severity of Phomopsis leaf blight on a scale from 0 to 3, where 0 represents disease-free (ordinal)

Question: Do the four treatments have different protective effects?


Summary of Data Characteristics

One factor (“one-way layout”) with $a$ levels (to keep notation simple)

Typical hypothesis:

Are the $a$ samples from the same population (multivariate distribution)?

or: Do the $a$ treatments have the same effect?

Multivariate observations with $p$ variables.

Variables can be quantitative or ordinal (or a mix of both).

Sample sizes $n_i$ can be different.

Any one of $a$, $n_i$, $p$ can be large or small:

Panic disorder: $a = 2$, $n_i = 15$, $p = 2$

Crabapples: $a = 63$, $n_i = 3$ to $5$, $p = 4$

Risk factor survey: $a = 120$, $n_i = 9$ to $20$, $p = 3$

Strawberries: $a = 4$, $n_i = 4$, $p = 4$


Multivariate

Why Use a Multivariate Approach?

Often, data sets similar to those mentioned above are analyzed using univariate methods: each response variable is treated separately.

Univariate analysis is always “easier” than multivariate analysis, and several well-known univariate nonparametric inference methods are available.

However, multivariate methods are (most of the time) “better” and more effective:


Treatment effects may be present only when the joint distribution of the variables is considered, not in the marginal distributions.


Univariate effects in the individual variables may be too small to be detected, in particular when a multiple-testing correction is applied, while the combined effect of the variables is strong enough to be significant.


Rencher and Scott (1990): parametric case under normality

The experimentwise error rate is kept well under control by the following procedure:

1. Conduct a Wilks’ Lambda MANOVA at level α.

2. If it is significant, conduct the $p$ univariate F tests.
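The first of these points, an effect that shows up only in the joint distribution, can be made concrete with a small simulation. This is a minimal sketch in Python (the talk itself uses R) under assumed settings that are not taken from the talk: two hypothetical groups whose two variables both have standard normal margins, with correlation +0.9 in one group and -0.9 in the other. The helper names `sample_group`, `pearson` and `marginal_summary` are illustrative only.

```python
import math
import random
import statistics

def sample_group(rho, n, rng):
    """Draw n pairs (x, y) with standard normal margins and correlation rho."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = z1
        y = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2  # still N(0, 1) marginally
        pairs.append((x, y))
    return pairs

def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def marginal_summary(pairs):
    """Mean and standard deviation of each variable, taken separately."""
    xs, ys = zip(*pairs)
    return (statistics.fmean(xs), statistics.stdev(xs),
            statistics.fmean(ys), statistics.stdev(ys))

rng = random.Random(2013)
group1 = sample_group(0.9, 2000, rng)   # hypothetical "treatment 1"
group2 = sample_group(-0.9, 2000, rng)  # hypothetical "treatment 2"

for name, g in (("group 1", group1), ("group 2", group2)):
    summary = ", ".join(f"{v:.2f}" for v in marginal_summary(g))
    print(f"{name}: mean/sd of x and y = {summary}; corr = {pearson(g):.2f}")
```

The marginal summaries agree up to sampling noise, so comparing x alone or y alone compares two samples from the same N(0, 1) distribution; only a procedure that looks at (x, y) jointly can see the opposite correlations.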


Multivariate Data or Repeated Measures?

Repeated Measures

one variable observed at several points in time or space

commensurate observations (same measurement scale and units)

inference should be invariant under affine linear transformations of all observations

hypotheses, e.g., comparing time points

Multivariate Data

different variables, possibly in different units

inference should be invariant under componentwise monotone transformations

hypotheses about the total observation vectors

Still, the two settings have much in common.
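Invariance under componentwise monotone transformations is exactly what replacing each variable by its ranks provides, which is the route the rank-based methods below take. The Python sketch below (the data and the helper name `midranks` are hypothetical) computes midranks for one variable and checks that a log or a cube transformation leaves them unchanged.

```python
import math

def midranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the block of tied values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

# Hypothetical measurements of one variable on six subjects.
weights = [2.1, 0.4, 3.3, 1.0, 2.1, 5.8]
logged = [math.log(w) for w in weights]  # strictly increasing transformation
cubed = [w ** 3 for w in weights]        # another strictly increasing transformation

print(midranks(weights))
print(midranks(logged) == midranks(weights), midranks(cubed) == midranks(weights))
```

A mean difference between subjects computed on `weights` and on `logged` would generally change, whereas any statistic computed from the ranks is the same for all three versions of the data.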


MANOVA

(M)ANOVA

Goal: Multivariate (M) Analysis of Variance (ANOVA)

Recall: ANOVA

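$a$ groups with respective sample sizes $n_i$; $N = \sum_{i=1}^{a} n_i$

$F = H/E$, where

$$H = \frac{1}{a-1}\sum_{i=1}^{a} n_i\,(\bar X_{i\cdot} - \bar X_{\cdot\cdot})^2 \quad\text{and}\quad E = \frac{1}{N-a}\sum_{i=1}^{a}\sum_{j=1}^{n_i}(X_{ij} - \bar X_{i\cdot})^2.$$

Under normality and the null hypothesis, $F \sim F(a-1, N-a)$.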
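As a numerical check of these definitions, here is a Python sketch that computes $H$, $E$ and $F$ directly from the formulas. The function name `anova_f` and the toy data are illustrative only.

```python
def anova_f(groups):
    """One-way ANOVA: return F = H / E and its degrees of freedom (a - 1, N - a)."""
    a = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    # H: between-group mean square
    H = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)) / (a - 1)
    # E: pooled within-group mean square
    E = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (N - a)
    return H / E, (a - 1, N - a)

F, df = anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(F, df)
```

For these three toy groups the means are 2, 3 and 6, so $H = 13$ and $E = 1$, giving $F \approx 13$ on $(2, 6)$ degrees of freedom.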


MANOVA

Multivariate (M) Analysis of Variance (ANOVA)

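$a$ groups with respective sample sizes $n_i$; $N = \sum_{i=1}^{a} n_i$

$p$ variables

$$H(X) = \frac{1}{a-1}\sum_{i=1}^{a} n_i\,(\bar{\mathbf X}_{i\cdot} - \bar{\mathbf X}_{\cdot\cdot})(\bar{\mathbf X}_{i\cdot} - \bar{\mathbf X}_{\cdot\cdot})' \quad\text{and}$$

$$E(X) = \frac{1}{N-a}\sum_{i=1}^{a}\sum_{j=1}^{n_i}(\mathbf X_{ij} - \bar{\mathbf X}_{i\cdot})(\mathbf X_{ij} - \bar{\mathbf X}_{i\cdot})'.$$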

How to combine these into one test statistic?
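Each quadratic form is now a $p \times p$ matrix. A dependency-free Python sketch (plain nested lists; `manova_h_e` is a hypothetical helper name) that builds $H(X)$ and $E(X)$ exactly as defined; with $p = 1$ it reduces to the univariate $H$ and $E$ of the previous slide.

```python
def mean_vector(rows):
    """Componentwise mean of a list of p-vectors."""
    p = len(rows[0])
    return [sum(r[k] for r in rows) / len(rows) for k in range(p)]

def add_scaled_outer(acc, u, weight):
    """acc += weight * u u' (in place)."""
    for r in range(len(u)):
        for c in range(len(u)):
            acc[r][c] += weight * u[r] * u[c]

def manova_h_e(groups):
    """Return H(X), E(X); groups is a list of samples, each a list of p-vectors."""
    a = len(groups)
    N = sum(len(g) for g in groups)
    p = len(groups[0][0])
    grand = mean_vector([x for g in groups for x in g])
    H = [[0.0] * p for _ in range(p)]
    E = [[0.0] * p for _ in range(p)]
    for g in groups:
        m = mean_vector(g)
        add_scaled_outer(H, [mk - gk for mk, gk in zip(m, grand)], len(g) / (a - 1))
        for x in g:
            add_scaled_outer(E, [xk - mk for xk, mk in zip(x, m)], 1.0 / (N - a))
    return H, E

# Hypothetical data: a = 3 groups, n_i = 3, p = 2 (second variable = 2 * first).
groups = [
    [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]],
    [[2.0, 4.0], [3.0, 6.0], [4.0, 8.0]],
    [[5.0, 10.0], [6.0, 12.0], [7.0, 14.0]],
]
H, E = manova_h_e(groups)
print(H)  # approximately [[13, 26], [26, 52]]
print(E)  # approximately [[1, 2], [2, 4]]
```

Because the second variable is an exact multiple of the first, $E(X)$ is singular here; this is one situation in which an ordinary inverse of $E$ does not exist.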


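Lawley-Hotelling: $T_{LH} = \operatorname{tr}(HE^{-}) = \sum_l \lambda_l$

Bartlett-Nanda-Pillai: $T_{BNP} = \operatorname{tr}\big(H(H+E)^{-}\big) = \sum_l \frac{\lambda_l}{1+\lambda_l}$

Wilks’ Lambda: $T_{WL} = -\log\frac{\det(E)}{\det(E+H)}$, where $\frac{\det(E)}{\det(E+H)} = \prod_l \frac{1}{1+\lambda_l}$

where $A^{-}$ is the Moore-Penrose generalized inverse of $A$, and $\lambda_l$ are the eigenvalues of $HE^{-1}$.

Classical MANOVA assumes multivariate normality. How would you check this assumption?

Even so, the null distributions are rather complicated.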
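For $p = 2$ the three statistics and their eigenvalue forms can be checked by hand. The Python sketch below assumes a nonsingular $E$, so the ordinary inverse stands in for the Moore-Penrose inverse; the helper names and the example matrices are hypothetical.

```python
import math

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def matmul2(x, y):
    return [[sum(x[r][k] * y[k][c] for k in range(2)) for c in range(2)] for r in range(2)]

def madd2(x, y):
    return [[x[r][c] + y[r][c] for c in range(2)] for r in range(2)]

def trace2(m):
    return m[0][0] + m[1][1]

def manova_statistics(H, E):
    """T_LH, T_BNP, T_WL for 2x2 H and nonsingular 2x2 E."""
    t_lh = trace2(matmul2(H, inv2(E)))
    t_bnp = trace2(matmul2(H, inv2(madd2(H, E))))
    t_wl = -math.log(det2(E) / det2(madd2(E, H)))
    return t_lh, t_bnp, t_wl

def eigenvalues2(m):
    """Eigenvalues of a 2x2 matrix with real spectrum, such as H E^{-1}."""
    tr, d = trace2(m), det2(m)
    disc = math.sqrt(max(tr * tr - 4 * d, 0.0))  # clamp tiny negative rounding error
    return (tr + disc) / 2, (tr - disc) / 2

H = [[2.0, 1.0], [1.0, 2.0]]
E = [[1.0, 0.0], [0.0, 1.0]]
t_lh, t_bnp, t_wl = manova_statistics(H, E)
lams = eigenvalues2(matmul2(H, inv2(E)))

print(t_lh, sum(lams))
print(t_bnp, sum(l / (1 + l) for l in lams))
print(math.exp(-t_wl), math.prod(1 / (1 + l) for l in lams))
```

Here $HE^{-1} = H$ has eigenvalues 3 and 1, so $T_{LH} = 4$, $T_{BNP} = 3/4 + 1/2 = 1.25$ and $T_{WL} = \log 8$.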


Nonparametric

Nonparametric (Alternative to) Multivariate Analysis of Variance

$a$ groups with respective sample sizes $n_i$; $N = \sum_{i=1}^{a} n_i$

$p$ variables (“$p$ endpoints”)

No assumption of multivariate normality.

The $p$ variables need not even be measured on the same scale!

Example 1: measure on each person a binary variable (exercise status), an ordinal variable (education), and a quantitative variable (BMI).

Example 2: measure on each plant the proportion of sick leaves (quantitative) and the weight of fruit (quantitative), and assign a healthiness score (ordinal).


Conventional wisdom: this type of data should not be analyzed using multivariate methods.

Often used: O’Brien’s heuristic rank procedure for multiple endpoints.

Inspiration for our approach: recent developments in rank-based methods for factorial designs (Brunner et al., 1990s and later).
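For orientation, here is a Python sketch of O’Brien’s rank-sum procedure as it is commonly described (the description is ours, not from the talk): rank each endpoint over all subjects, add up each subject’s ranks across the endpoints, and compare the groups with a univariate one-way ANOVA on these per-subject scores. The data and helper names are hypothetical.

```python
def midranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def anova_f(groups):
    """One-way ANOVA F statistic for a list of samples of numbers."""
    a, N = len(groups), sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    H = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means)) / (a - 1)
    E = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (N - a)
    return H / E

def obrien_scores(groups):
    """Rank every endpoint over all subjects, then sum each subject's ranks."""
    subjects = [x for g in groups for x in g]
    p = len(subjects[0])
    rank_cols = [midranks([s[k] for s in subjects]) for k in range(p)]
    sums = [sum(col[idx] for col in rank_cols) for idx in range(len(subjects))]
    scores, pos = [], 0
    for g in groups:  # split the per-subject sums back into their groups
        scores.append(sums[pos:pos + len(g)])
        pos += len(g)
    return scores

# Hypothetical data: a = 3 groups of 2 subjects, p = 2 endpoints.
groups = [
    [[1.0, 10.0], [2.0, 12.0]],
    [[3.0, 11.0], [4.0, 14.0]],
    [[5.0, 13.0], [6.0, 15.0]],
]
scores = obrien_scores(groups)
F = anova_f(scores)
print(scores, round(F, 3))
```

Collapsing all endpoints into one score per subject is what makes the procedure heuristic: it is sensitive mainly to alternatives in which the endpoints shift in the same direction.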


Notation for CR1F

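$p$ different variables (endpoints), $k = 1, \dots, p$

$a$ different conditions (treatments, sub-populations), $i = 1, \dots, a$

$n_i$ subjects (experimental units) per condition, $j = 1, \dots, n_i$

$$
\begin{array}{cccc}
\text{Sample } 1 & \text{Sample } 2 & \cdots & \text{Sample } a \\
X^{(1)}_{11}\; X^{(1)}_{12} \cdots X^{(1)}_{1n_1} & X^{(1)}_{21}\; X^{(1)}_{22} \cdots X^{(1)}_{2n_2} & \cdots & X^{(1)}_{a1}\; X^{(1)}_{a2} \cdots X^{(1)}_{a,n_a} \\
X^{(2)}_{11}\; X^{(2)}_{12} \cdots X^{(2)}_{1n_1} & X^{(2)}_{21}\; X^{(2)}_{22} \cdots X^{(2)}_{2n_2} & \cdots & X^{(2)}_{a1}\; X^{(2)}_{a2} \cdots X^{(2)}_{a,n_a} \\
\vdots & \vdots & & \vdots \\
X^{(p)}_{11}\; X^{(p)}_{12} \cdots X^{(p)}_{1n_1} & X^{(p)}_{21}\; X^{(p)}_{22} \cdots X^{(p)}_{2n_2} & \cdots & X^{(p)}_{a1}\; X^{(p)}_{a2} \cdots X^{(p)}_{a,n_a}
\end{array}
$$

Ranks are denoted by $R$ instead of $X$.

Each row (each variable) is ranked separately, over all $N$ observations.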
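A Python sketch of this ranking step. The helper names and data are hypothetical, and we assume midranks for ties, as is usual in this rank-based framework; ties matter for ordinal variables such as a 0 to 3 severity score.

```python
def midranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def rank_by_variable(groups):
    """Replace X by R: rank each variable over all N observations of all groups."""
    pooled = [x for g in groups for x in g]
    p = len(pooled[0])
    cols = [midranks([x[k] for x in pooled]) for k in range(p)]
    ranked, pos = [], 0
    for g in groups:
        ranked.append([[cols[k][pos + j] for k in range(p)] for j in range(len(g))])
        pos += len(g)
    return ranked

groups = [
    [[10.2, 1], [8.5, 0]],  # hypothetical sample 1: (weight, ordinal score)
    [[12.0, 3], [9.1, 1]],  # hypothetical sample 2
]
print(rank_by_variable(groups))
```

The two observations with score 1 share rank 2.5, while the quantitative variable receives the ranks 1 to 4 without ties.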


Model for CR1F

Independent multivariate observations $\mathbf X_{ij} = (X^{(1)}_{ij}, \dots, X^{(p)}_{ij})' \sim F_i$

$F_i$ are $p$-variate distributions

Null hypothesis $H_0^F: F_1 = \cdots = F_a$

Asymptotics: either $a$ or $n$ tends to infinity


Balanced Design

Tests for the Balanced CR1F

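Balanced: $n_1 = \dots = n_a = n$ and $N = n \cdot a$.

We consider four types of test statistics based on the quadratic forms

$$H(R) = \frac{1}{a-1}\sum_{i=1}^{a} n\,(\bar{\mathbf R}_{i\cdot} - \bar{\mathbf R}_{\cdot\cdot})(\bar{\mathbf R}_{i\cdot} - \bar{\mathbf R}_{\cdot\cdot})' \quad\text{and}\quad E(R) = \frac{1}{N-a}\sum_{i=1}^{a}\sum_{j=1}^{n}(\mathbf R_{ij} - \bar{\mathbf R}_{i\cdot})(\mathbf R_{ij} - \bar{\mathbf R}_{i\cdot})'.$$

1. ANOVA-type: $T_A = \dfrac{\operatorname{tr}(H)}{\operatorname{tr}(E)}$

2. Lawley-Hotelling: $T_{LH} = \operatorname{tr}(HE^{-}) = \sum_l \lambda_l$

3. Bartlett-Nanda-Pillai: $T_{BNP} = \operatorname{tr}\big(H(H+E)^{-}\big) = \sum_l \frac{\lambda_l}{1+\lambda_l}$

4. Wilks’ Lambda: $T_{WL} = -\log\frac{\det(E)}{\det(E+H)}$, where $\frac{\det(E)}{\det(E+H)} = \prod_l \frac{1}{1+\lambda_l}$

where $A^{-}$ is the Moore-Penrose generalized inverse of $A$, and $\lambda_l$ are the eigenvalues of $HE^{-1}$.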
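Since a trace only involves diagonal entries, $\operatorname{tr}(H(R))$ and $\operatorname{tr}(E(R))$ are sums of the univariate between- and within-group mean squares of the ranked variables, so $T_A$ needs no matrix algebra at all. A minimal Python sketch for the balanced case; the function names and the toy data are hypothetical.

```python
def midranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def ranked_traces(groups):
    """tr(H(R)) and tr(E(R)) for a balanced one-way layout of p-vectors."""
    a, n = len(groups), len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("balanced design expected")
    N, p = a * n, len(groups[0][0])
    pooled = [x for g in groups for x in g]
    tr_h = tr_e = 0.0
    for k in range(p):
        r = midranks([x[k] for x in pooled])         # ranks of variable k over all N
        samples = [r[i * n:(i + 1) * n] for i in range(a)]
        grand = sum(r) / N                           # equals (N + 1) / 2 for ranks
        means = [sum(s) / n for s in samples]
        tr_h += n * sum((m - grand) ** 2 for m in means) / (a - 1)
        tr_e += sum((v - m) ** 2 for s, m in zip(samples, means) for v in s) / (N - a)
    return tr_h, tr_e

groups = [[[1.0, 5.0], [2.0, 6.0]], [[3.0, 7.0], [4.0, 8.0]]]  # a = 2, n = 2, p = 2
tr_h, tr_e = ranked_traces(groups)
t_a = tr_h / tr_e
print(tr_h, tr_e, t_a)
```

In the toy example both variables have ranks 1, 2 in the first sample and 3, 4 in the second, so each contributes 4 to $\operatorname{tr}(H)$ and 0.5 to $\operatorname{tr}(E)$, giving $T_A = 8$.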


Balanced CR1F: Asymptotic Results for Large n

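The following results hold under the null hypothesis $H_0$:

$$\mathrm{ANOVA}_n = f\cdot T_A = f\cdot\frac{\operatorname{tr}(H)}{\operatorname{tr}(E)} = (a-1)\cdot\frac{(\operatorname{tr}E)(\operatorname{tr}H)}{\operatorname{tr}(E^2)} \quad\text{approximately } \chi^2_f, \text{ with } f = (a-1)\cdot\frac{(\operatorname{tr}E)^2}{\operatorname{tr}(E^2)}$$

$$\mathrm{LH}_n = (a-1)\operatorname{tr}(HE^{-1}) \quad\text{asymptotically } \chi^2_{p(a-1)}$$

$$\mathrm{BNP}_n = (N-a)\cdot\operatorname{tr}\Big((a-1)H\big[(a-1)H + (N-a)E\big]^{-}\Big) \quad\text{asymptotically } \chi^2_{\rho(a-1)},$$

where $\rho \le p$ is the rank of the covariance matrix

$$\mathrm{WL}_n = a(n-1)\,T_{WL} \quad\text{asymptotically } \chi^2_{p(a-1)}$$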
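The approximation for $\mathrm{ANOVA}_n$ only needs three traces. A Python sketch with hypothetical names and matrices; for a p-value one would evaluate the upper tail of $\chi^2_f$ at the statistic, for instance with `scipy.stats.chi2.sf(stat, f)`, which accepts non-integer degrees of freedom (SciPy is not used below so the sketch stays dependency-free).

```python
def trace(m):
    return sum(m[i][i] for i in range(len(m)))

def trace_of_square(m):
    """tr(M^2) = sum over r, c of m[r][c] * m[c][r]."""
    size = len(m)
    return sum(m[r][c] * m[c][r] for r in range(size) for c in range(size))

def anova_n(H, E, a):
    """Return (ANOVA_n, f) for the balanced CR1F, large-n approximation."""
    f = (a - 1) * trace(E) ** 2 / trace_of_square(E)
    return f * trace(H) / trace(E), f

# Hypothetical rank-based matrices for a = 3 groups and p = 2 variables.
H = [[4.0, 1.0], [1.0, 4.0]]
E = [[1.0, 0.5], [0.5, 1.0]]
stat, f = anova_n(H, E, a=3)
print(stat, f)  # compare stat with the upper tail of chi^2 with f degrees of freedom
```

With these matrices $\operatorname{tr}E = 2$, $\operatorname{tr}(E^2) = 2.5$ and $\operatorname{tr}H = 8$, so $f = 3.2$ and $\mathrm{ANOVA}_n = 12.8$, which also equals $(a-1)(\operatorname{tr}E)(\operatorname{tr}H)/\operatorname{tr}(E^2)$.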


Balanced CR1F: Asymptotic Results for Large a

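Under the null hypothesis $H_0$, the following standardized test statistics are asymptotically ($a \to \infty$; $n$, $p$ fixed) standard normal.

$$\mathrm{ANOVA}_a = \sqrt{\frac{a(n-1)}{2n}}\cdot(T_A - 1)\,\frac{\operatorname{tr}(E)}{\sqrt{\operatorname{tr}(E^2)}} = \sqrt{\frac{a(n-1)}{2n}}\cdot\frac{\operatorname{tr}(H) - \operatorname{tr}(E)}{\sqrt{\operatorname{tr}(E^2)}} = \sqrt{\frac{ap(n-1)}{2n}}\,(T_A - 1) \quad\text{(under eigenvalue conditions)}$$

$$\mathrm{LH}_a = \sqrt{\frac{a(n-1)}{2n\rho}}\cdot\big[\operatorname{tr}(HE^{-}) - r_1\big], \quad\text{where } r_1 = \operatorname{rank}(E).$$

$$\mathrm{BNP}_a = \sqrt{\frac{a(n-1)}{2n\rho}}\cdot\frac{N-1}{N-a}\Big((N-1)\operatorname{tr}\Big(H\big[(a-1)H + (N-a)E\big]^{-}\Big) - r_2\Big), \quad\text{where } r_2 = \operatorname{rank}\big[(a-1)H + (N-a)E\big].$$

$$\mathrm{WL}_a = \sqrt{\frac{a(n-1)n}{2p}}\,\Big[T_{WL} + p\log\Big(\frac{n-1}{n}\Big)\Big]$$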
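A Python sketch of the balanced large-$a$ standardization, using the middle form of $\mathrm{ANOVA}_a$ and checking it against the first form. The trace values are hypothetical, and we assume, as for the F-type statistic, that only large values count as evidence against $H_0$, so the p-value is the upper standard normal tail (via Python's `statistics.NormalDist`).

```python
import math
from statistics import NormalDist

def anova_a(tr_h, tr_e, tr_e2, a, n):
    """Balanced CR1F, large a: sqrt(a(n-1)/(2n)) * (tr H - tr E) / sqrt(tr E^2)."""
    return math.sqrt(a * (n - 1) / (2 * n)) * (tr_h - tr_e) / math.sqrt(tr_e2)

# Hypothetical trace values for a = 50 groups of n = 4 subjects.
a, n = 50, 4
tr_h, tr_e, tr_e2 = 2.2, 2.0, 2.5
z = anova_a(tr_h, tr_e, tr_e2, a, n)

# Same value via the first form, (T_A - 1) * tr(E) / sqrt(tr(E^2)).
t_a = tr_h / tr_e
z_alt = math.sqrt(a * (n - 1) / (2 * n)) * (t_a - 1) * tr_e / math.sqrt(tr_e2)

p_value = 1 - NormalDist().cdf(z)  # upper tail, assuming large values speak against H0
print(round(z, 4), round(z_alt, 4), round(p_value, 4))
```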


Unbalanced

Unbalanced CR1F

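Test statistics can be defined in terms of the following matrices:

$$H_1 = \frac{1}{a-1}\sum_{i=1}^{a} n_i\,(\bar{\mathbf R}_{i\cdot} - \bar{\mathbf R}_{\cdot\cdot})(\bar{\mathbf R}_{i\cdot} - \bar{\mathbf R}_{\cdot\cdot})'$$

$$H_2 = \frac{1}{a-1}\sum_{i=1}^{a} (\bar{\mathbf R}_{i\cdot} - \tilde{\mathbf R}_{\cdot\cdot})(\bar{\mathbf R}_{i\cdot} - \tilde{\mathbf R}_{\cdot\cdot})', \quad\text{with } \tilde{\mathbf R}_{\cdot\cdot} = \frac{1}{a}\sum_{i=1}^{a}\bar{\mathbf R}_{i\cdot}$$

$$E_1 = \frac{1}{N-a}\sum_{i=1}^{a}\sum_{j=1}^{n_i}(\mathbf R_{ij} - \bar{\mathbf R}_{i\cdot})(\mathbf R_{ij} - \bar{\mathbf R}_{i\cdot})'$$

$$E_2 = \frac{1}{a-1}\sum_{i=1}^{a}\Big(1 - \frac{n_i}{N}\Big)\frac{1}{n_i - 1}\sum_{j=1}^{n_i}(\mathbf R_{ij} - \bar{\mathbf R}_{i\cdot})(\mathbf R_{ij} - \bar{\mathbf R}_{i\cdot})'$$

$$E_3 = \frac{1}{a}\sum_{i=1}^{a}\frac{1}{n_i(n_i-1)}\sum_{j=1}^{n_i}(\mathbf R_{ij} - \bar{\mathbf R}_{i\cdot})(\mathbf R_{ij} - \bar{\mathbf R}_{i\cdot})'$$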


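Equivalently, in matrix notation, with the $p \times N$ rank matrix $\mathbf R = (\mathbf R_{11}, \dots, \mathbf R_{a,n_a})$, $J_n = \mathbf 1_n\mathbf 1_n'$, $P_n = I_n - \frac{1}{n}J_n$, and $\bigoplus$ denoting the direct sum (block-diagonal matrix):

$$H_1 = \frac{1}{a-1}\,\mathbf R\Big(\bigoplus_{i=1}^{a}\frac{1}{n_i}J_{n_i} - \frac{1}{N}J_N\Big)\mathbf R'$$

$$H_2 = \frac{1}{a-1}\,\mathbf R\Big[\Big(\bigoplus_{i=1}^{a}\frac{1}{n_i}\mathbf 1_{n_i}\Big)P_a\Big(\bigoplus_{i=1}^{a}\frac{1}{n_i}\mathbf 1_{n_i}'\Big)\Big]\mathbf R'$$

$$E_1 = \frac{1}{N-a}\,\mathbf R\Big(\bigoplus_{i=1}^{a}P_{n_i}\Big)\mathbf R'$$

$$E_2 = \frac{1}{a-1}\,\mathbf R\Big[\bigoplus_{i=1}^{a}\Big(1 - \frac{n_i}{N}\Big)\frac{1}{n_i-1}P_{n_i}\Big]\mathbf R'$$

$$E_3 = \frac{1}{a}\,\mathbf R\Big(\bigoplus_{i=1}^{a}\frac{1}{n_i(n_i-1)}P_{n_i}\Big)\mathbf R'$$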


Different Matrix Pairs

Multivariate tests are now constructed similarly to the balanced case.

They are based on one of the pairs $(H_1, E_1)$, $(H_1, E_2)$, or $(H_2, E_3)$.

In each of these pairs, both matrices are, under the null hypothesis, consistent estimators of the same covariance matrix.

In a balanced design, each of the three pairs leads to the same test statistic.
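A quick numerical check of the last statement, as a Python sketch that computes only the traces of the five matrices, variable by variable, from within- and between-group sums of squares. The data are hypothetical rank vectors, and $H_2$ centers the group means at their unweighted average, as the $P_a$ in its matrix form prescribes.

```python
def trace_summary(groups):
    """Traces of H1, H2, E1, E2, E3; groups = list of samples of p-vectors (ranks)."""
    a = len(groups)
    ns = [len(g) for g in groups]
    N = sum(ns)
    p = len(groups[0][0])
    tr = {"H1": 0.0, "H2": 0.0, "E1": 0.0, "E2": 0.0, "E3": 0.0}
    for k in range(p):
        cols = [[x[k] for x in g] for g in groups]
        means = [sum(c) / len(c) for c in cols]
        grand = sum(sum(c) for c in cols) / N  # weighted grand mean
        unweighted = sum(means) / a            # mean of the group means (P_a)
        ss = [sum((v - m) ** 2 for v in c) for c, m in zip(cols, means)]
        tr["H1"] += sum(ni * (m - grand) ** 2 for ni, m in zip(ns, means)) / (a - 1)
        tr["H2"] += sum((m - unweighted) ** 2 for m in means) / (a - 1)
        tr["E1"] += sum(ss) / (N - a)
        tr["E2"] += sum((1 - ni / N) * s / (ni - 1) for ni, s in zip(ns, ss)) / (a - 1)
        tr["E3"] += sum(s / (ni * (ni - 1)) for ni, s in zip(ns, ss)) / a
    return tr

def trace_ratios(groups):
    """tr(H)/tr(E) for the pairs (H1, E1), (H1, E2) and (H2, E3)."""
    t = trace_summary(groups)
    return t["H1"] / t["E1"], t["H1"] / t["E2"], t["H2"] / t["E3"]

balanced = [[[1.0], [2.0], [4.0]], [[3.0], [5.0], [6.0]], [[7.0], [8.0], [9.0]]]
unbalanced = [[[1.0], [2.0]], [[3.0], [4.0], [5.0], [6.0]]]
print(trace_ratios(balanced))    # three (numerically) equal values
print(trace_ratios(unbalanced))  # approximately (8.727, 13.5, 13.5)
```

In the unbalanced example the ratio from $(H_1, E_1)$ is $96/11 \approx 8.73$, while $(H_1, E_2)$ and $(H_2, E_3)$ both give 13.5; with only $a = 2$ groups those two happen to coincide, but in general the three ratios need not agree.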

Define

$$\bar n = \frac{1}{a}\sum_{i=1}^{a} n_i \quad\text{and}\quad \tilde n = \frac{1}{a}\sum_{i=1}^{a}\frac{1}{n_i}.$$

Under $H_0: F_1 = \cdots = F_a$, as $a \to \infty$ with $n_i$ and $p$ fixed, the following standardized multivariate nonparametric test statistics have asymptotically a standard normal distribution (Harrar and B., JMVA 2008).


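Unbalanced CR1F: Large-$a$ Asymptotics for the ANOVA-Type Statistic

ANOVA-type statistic based on $(H_1, E_1)$:

$$\sqrt{\frac{a}{\tau^{(1)}_{AN}}}\left(\frac{\operatorname{tr}(H_1)}{\operatorname{tr}(E_1)} - 1\right), \quad\text{where}$$

$$\tau^{(1)}_{AN} = \frac{1}{(\operatorname{tr}E_1)^2}\left(\frac{2\bar n}{\bar n - 1}\operatorname{tr}(E_1^2) + \frac{\bar n(\bar n\tilde n - 1)}{(\bar n - 1)^2}\Big(\mu_4 - 2\operatorname{tr}(E_1^2) - (\operatorname{tr}E_1)^2\Big)\right)$$

and

$$\mu_4 = \frac{1}{N}\sum_{i=1}^{a}\sum_{j=1}^{n_i}\Big[\Big(\mathbf R_{ij} - \frac{N+1}{2}\mathbf 1\Big)'\Big(\mathbf R_{ij} - \frac{N+1}{2}\mathbf 1\Big)\Big]^2.$$

Based on $(H_1, E_2)$:

$$\sqrt{\frac{a}{\tau^{(2)}_{AN}}}\left(\frac{\operatorname{tr}(H_1)}{\operatorname{tr}(E_2)} - 1\right), \quad\text{where}\quad \tau^{(2)}_{AN} = \frac{\operatorname{tr}(E_2^2)}{(\operatorname{tr}E_2)^2}\cdot\frac{2}{a}\sum_{i=1}^{a}\frac{n_i}{n_i-1}\Big(1 - \frac{n_i}{N}\Big)^2.$$

Based on $(H_2, E_3)$:

$$\sqrt{\frac{a}{\tau^{(3)}_{AN}}}\left(\frac{\operatorname{tr}(H_2)}{\operatorname{tr}(E_3)} - 1\right), \quad\text{where}\quad \tau^{(3)}_{AN} = \frac{\operatorname{tr}(E_2^2)}{(\operatorname{tr}E_3)^2}\cdot\frac{2}{a}\sum_{i=1}^{a}\frac{1}{n_i(n_i-1)}.$$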


Unbalanced CR1F: Asymptotic Results

Large $a$: similar results can be formulated for the other three types of test statistics.

Large $n$: the results shown above for large $n$ remain valid in the unbalanced case as $\min_i n_i \to \infty$.

The ANOVA-type and Lawley-Hotelling-type statistics proposed by Munzel and Brunner (2000a,b) are based on $(H_2, E_3)$.

The Bartlett-Nanda-Pillai-type statistic suggested by Harrar and B. (JMVA 2008) is based on $(H_1, E_1)$.

Wilks’ Lambda is traditionally based on $(H_1, E_1)$, but asymptotic results are available for both versions (StatProbL 2011).


Approximations

CR1F: Finite Approximations

We have considered the following approximations and investigated their relative performance through simulations.

1. Moment estimators
   1. ANOVA-type statistic: Brunner, Dette, and Munk (1997); Srivastava and Fujikoshi (2006)
   2. Lawley-Hotelling test: McKeon (1974)
   3. Bartlett-Nanda-Pillai test: Muller (1998)
   4. Wilks’ Lambda: Davis (1979)

2. Asymptotic expansions (Edgeworth, Cornish-Fisher)
   1. Chi-squared distribution expansions (LH, BNP)
   2. Fujikoshi (1975) (LH, BNP)

3. Permutation/randomization method


R-Package npmv

http://cran.r-project.org/web/packages/npmv/index.html

After installing npmv, call the package with the following code:

library(npmv)
nonpartest(sberry, vars = c('weight', 'bot', 'fungi', 'rating'), permreps = 1000)

The user has the following options:

specify explanatory variable

choose which response variables to include

choose the number of permutations for the randomization test (default 10000)

turn off plots (default on)


Test Statistics

The code produces the four newly developed nonparametric multivariate test statistics mentioned above: ANOVA type, Wilks’ Lambda type, Lawley-Hotelling type, and Bartlett-Nanda-Pillai type. The output provides the p-values for the F-approximations described above and for the corresponding permutation tests.

npmv output

Test Test Statistic P-value Perm.Test P-value

[1,] ANOVA type test 2.984 0.019 0.003

[2,] McKeon approx. for Lawley Hotelling Test 5.769 0.002 0.003

[3,] Muller approx. for Bartlett-Nanda-Pillai Test 2.501 0.009 0.006

[4,] Wilks Lambda 4.166 0.001 0.001


Permutation Test

CR1F: exchangeability under the null hypothesis is assumed.

CR1F, current project: the permutation test seems to work also under a less stringent hypothesis (no exchangeability).

CR2F: the permutation test appears to work, too.
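Exchangeability is what justifies the randomization test. Under the null hypothesis, group labels can be reassigned to whole observation vectors without changing the joint distribution. Below is a minimal Python sketch of that principle, not the npmv implementation. It assumes a user-supplied statistic. `between_group_trace` is a hypothetical example statistic and is not one of the four tests above.

```python
import random


def between_group_trace(groups):
    """Hypothetical example statistic: weighted between-group sum of squares
    of the group mean vectors, summed over all variables."""
    pooled = [row for g in groups for row in g]
    p = len(pooled[0])
    grand = [sum(r[k] for r in pooled) / len(pooled) for k in range(p)]
    total = 0.0
    for g in groups:
        mean = [sum(r[k] for r in g) / len(g) for k in range(p)]
        total += len(g) * sum((mean[k] - grand[k]) ** 2 for k in range(p))
    return total


def permutation_test(groups, statistic, reps=999, seed=1):
    """Randomization test: reassign whole observation vectors (rows) to the
    groups at random, keeping the group sizes fixed. Large values of the
    statistic count as evidence against the null hypothesis."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [row for g in groups for row in g]
    observed = statistic(groups)
    exceed = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # rows move as units, so within-subject dependence is kept
        start, perm_groups = 0, []
        for n in sizes:
            perm_groups.append(pooled[start:start + n])
            start += n
        if statistic(perm_groups) >= observed:
            exceed += 1
    # the observed labelling is counted too, so the p-value is never 0
    return observed, (exceed + 1) / (reps + 1)
```

`permutation_test(groups, between_group_trace)` returns the observed statistic and a Monte Carlo p-value. With `reps=999`, the smallest attainable p-value is 0.001.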


Analysis of Data Example

Test    Statistic  Num df  Den df  p-value  Perm. p
AF          2.984   6.836  27.343   0.0191   0.0040
AFS         2.984   9.024  36.095   0.0092
LHMcK       8.241      12      12   0.0025   0.0027
BNPMu       1.477  15.333  42.167   0.0060   0.0061
WLF          4.17      12  24.103   0.0014   0.0014

Table: Do the four treatments have different protective effects? Results from different multivariate nonparametric tests (NB: three quantitative and one ordinal response variable) for the strawberry data.


Further Analysis of Data Example

The null hypothesis of equal multivariate (here 4-variate) distributions is rejected.

More detailed look:

- Test the multivariate hypotheses of equality (no treatment effect) for all 6 pairs and 4 triplets of variables, using the same multivariate test [or use the closure test principle].
- Test the marginal hypotheses of no treatment effect for each of the four variables individually (exact Kruskal-Wallis test).
- With the Bonferroni-Holm adjustment, or using the closure test principle, exactly those hypotheses that include the variable Botrytis turn out significant.
- Further pairwise comparisons between the four treatments with respect to the variable Botrytis: with regard to the percentage of Botrytis, it appears that the use of any of the fungicides is better than no spray.
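The bookkeeping behind the closed testing scheme can be sketched in Python. The sketch assumes that a p-value is already available for every subset of variables: from the multivariate test for sets of variables, and from the Kruskal-Wallis test for single variables. The variable names are taken from the npmv call.

```python
from itertools import combinations

VARIABLES = ("weight", "bot", "fungi", "rating")  # response names from the npmv call


def intersection_hypotheses(variables):
    """All non-empty subsets of the variables. Subset S stands for
    'no treatment effect on the variables in S'."""
    return [s for r in range(1, len(variables) + 1)
            for s in combinations(variables, r)]


def closed_test(pvalues, alpha=0.05):
    """Closure principle: reject H_S iff every H_T with T containing S
    (the intersections involving H_S) is rejected at level alpha."""
    return {s for s in pvalues
            if all(pvalues[t] <= alpha for t in pvalues if set(s) <= set(t))}


def holm_adjust(pvalues):
    """Bonferroni-Holm step-down adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_max = [0.0] * m, 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted
```

With four variables there are 15 hypotheses: 4 single variables, 6 pairs, 4 triplets and the full set. Take hypothetical p-values of 0.01 for every hypothesis that involves `bot` and 0.30 for all others. In that case `closed_test` rejects exactly the 8 hypotheses that contain `bot`.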


Conventional Wisdom (CW) and New Findings (1)

CW: http://en.wikiversity.org/wiki/Advanced_ANOVA/MANOVA

CW: Assumption $n_i > p$. Not true.

CW: Assumption of multivariate normality. Not necessary for the nonparametric approach.

CW: Sensitive to the effect of outliers. Not when you use the nonparametric approach.

CW: MANOVA can tolerate a few outliers, particularly if their scores are not too extreme and there is a reasonable N. If there are too many outliers, or very extreme scores, consider deleting these cases or transforming the variables involved. No. See above.
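The rank transformation shows why outliers lose their grip in a nonparametric approach. This minimal Python sketch assumes, as in rank-based methods of this kind, that each response variable is replaced by its mid-ranks over all observations. Ties, as in an ordinal rating, receive the average of the ranks they occupy.

```python
def midranks(values):
    """1-based mid-ranks: tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # extend the block of tied values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos..end
        for t in range(pos, end + 1):
            ranks[order[t]] = avg
        pos = end + 1
    return ranks


def rank_transform(rows):
    """Replace each variable (column) by its mid-ranks over all rows."""
    ranked = [midranks(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*ranked)]
```

Making the largest value 1000 times larger leaves every rank unchanged. The same change would dominate a mean or a covariance matrix.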


Conventional Wisdom (CW) and New Findings (2)

CW: MANOVA works best when the DVs (dependent variables) are only moderately correlated. New nonparametric tests also work when variables are negatively or highly positively correlated.

CW: When correlations are low, consider running separate ANOVAs. Consider always first running the nonparametric alternative to MANOVA.

CW: If error variances are not homogeneous, use a more conservative critical α-level for determining significance for that variable in the univariate F-test. Don't mess with it. Use the nonparametric alternative to MANOVA.

Finally: it is available in R and easy to use.


Simulated α-Level, Nonparametric vs. Parametric

Figure: Simulated α-level of the proposed nonparametric and of parametric multivariate tests in comparison. Sample size between n = 4 and 128 per level, a = 4 levels, p = 4 variables. The underlying distribution is multivariate normal with the correlation structure of the strawberry data and 10% contamination. One variable is rounded to an ordinal scale.
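The data-generating step of such a simulation can be sketched in Python. Several details are assumptions because the caption does not specify them. The correlation matrix in the usage test is hypothetical; the strawberry correlation structure is not reproduced here. The contamination model scales a draw by `contam_scale` with probability 0.10. The ordinal rounding maps values to the categories 1 to 5.

```python
import math
import random


def cholesky(corr):
    """Lower-triangular L with L L' = corr (corr must be positive definite)."""
    p = len(corr)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(corr[i][i] - s)
            else:
                L[i][j] = (corr[i][j] - s) / L[j][j]
    return L


def simulate_sample(n, corr, rng, contamination=0.10, contam_scale=5.0,
                    ordinal_var=None, levels=5):
    """n draws from N(0, corr). With probability `contamination` a draw is
    multiplied by `contam_scale` (hypothetical contamination model).
    Column `ordinal_var` is rounded and clipped to 1..levels (hypothetical)."""
    L = cholesky(corr)
    p = len(corr)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        x = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(p)]  # x = L z
        if rng.random() < contamination:
            x = [contam_scale * v for v in x]
        if ordinal_var is not None:
            x[ordinal_var] = min(levels, max(1, round(x[ordinal_var]) + 3))
        rows.append(x)
    return rows
```

Under the null hypothesis, all a = 4 groups come from the same distribution, for example `[simulate_sample(n, corr, rng, ordinal_var=3) for _ in range(4)]`.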


Simulated α-Level, Nonparametric vs. Parametric

Figure: Simulated α-level of the proposed nonparametric and of parametric multivariate tests in comparison. Sample size n = 4, number of samples between a = 4 and 128, p = 4 variables. The underlying distribution is multivariate normal with the correlation structure of the strawberry data and 10% contamination. One variable is rounded to an ordinal scale.


Simulated α-Level, Nonparametric vs. Parametric

Figure: Simulated α-level of the proposed nonparametric and of parametric multivariate tests in comparison. Sample size n = 4, a = 4 levels, p = 4 variables. The underlying distribution is multivariate normal with correlations between the variables ranging from -0.3 to +1, and 10% contamination. One variable is rounded to an ordinal scale.


Simulated Power, Nonparametric vs. Parametric

Figure: Simulated power of the proposed nonparametric and of parametric multivariate tests in comparison. Sample size n = 4, a = 4 levels, p = 4 variables. The underlying distribution is multivariate normal with the correlation structure of the strawberry data and 10% contamination. One variable is rounded to an ordinal scale. The alternative is a location shift in two variables, (0, 0, 1, 2).
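Each point in such plots is a rejection frequency over simulation replications. The Python sketch below computes that summary together with its binomial Monte Carlo standard error. The standard error indicates how far a simulated α-level can deviate from the nominal level by chance alone. The number of replications behind the figures is not stated here, so the 10,000 used below is hypothetical.

```python
import math


def rejection_rate(pvalues, alpha=0.05):
    """Empirical size (under H0) or power (under H1): the share of
    p-values <= alpha, with its binomial Monte Carlo standard error."""
    r = len(pvalues)
    rate = sum(1 for pv in pvalues if pv <= alpha) / r
    se = math.sqrt(rate * (1.0 - rate) / r)
    return rate, se


def mc_se_at_nominal(alpha, reps):
    """Standard error of a simulated level if the true level equals alpha."""
    return math.sqrt(alpha * (1.0 - alpha) / reps)
```

With 10,000 replications at α = 0.05, the standard error is about 0.0022. Simulated levels between roughly 0.046 and 0.054 are therefore compatible with a test that holds its level exactly.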


CR2F Model

$Y_{ijk}$: independent $p$-dimensional random vectors

mean vectors $\mu_{ij}$

covariance matrices $\Sigma_{ij}$

$i = 1, \dots, a$, $j = 1, \dots, b$, and $k = 1, \dots, n_{ij}$

Model: $\mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$

$\alpha_i$, $\beta_j$, and $\gamma_{ij}$ are unknown constants corresponding to the effects due to factors A and B and the interaction AB

Assume identifiability constraints.
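Once identifiability constraints are fixed, the effects can be recovered from the cell mean vectors. The Python sketch below assumes the usual unweighted sum-to-zero constraints: sum over i of alpha_i = 0, sum over j of beta_j = 0, and gamma_ij sums to zero over i and over j. The slides do not state which constraints are used.

```python
def decompose(cell_means):
    """cell_means[i][j] is the p-vector mean of cell (i, j).
    Returns (mu, alpha, beta, gamma) under unweighted sum-to-zero constraints."""
    a, b = len(cell_means), len(cell_means[0])
    p = len(cell_means[0][0])

    def avg(vectors):
        vectors = list(vectors)
        return [sum(v[k] for v in vectors) / len(vectors) for k in range(p)]

    row = [avg(cell_means[i][j] for j in range(b)) for i in range(a)]
    col = [avg(cell_means[i][j] for i in range(a)) for j in range(b)]
    mu = avg(row)
    alpha = [[row[i][k] - mu[k] for k in range(p)] for i in range(a)]
    beta = [[col[j][k] - mu[k] for k in range(p)] for j in range(b)]
    # interaction = double-centred cell means
    gamma = [[[cell_means[i][j][k] - row[i][k] - col[j][k] + mu[k]
               for k in range(p)] for j in range(b)] for i in range(a)]
    return mu, alpha, beta, gamma
```

The interaction part is the same double centring that appears in the AB test statistic.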


CR2F Hypotheses

$H_0^{(A)}: \alpha_i = 0$ for $i = 1, 2, \dots, a$ – no main effects of the levels of factor A

$H_0^{(A|B)}: \alpha_i + \gamma_{ij} = 0$ for $i = 1, 2, \dots, a$ and $j = 1, 2, \dots, b$ – no simple effects of the levels of factor A

$H_0^{(B)}: \beta_j = 0$ for $j = 1, 2, \dots, b$ – no main effects of the levels of factor B

$H_0^{(B|A)}: \beta_j + \gamma_{ij} = 0$ for $i = 1, 2, \dots, a$ and $j = 1, 2, \dots, b$ – no simple effects of the levels of factor B

$H_0^{(AB)}: \gamma_{ij} = 0$ for $i = 1, 2, \dots, a$ and $j = 1, 2, \dots, b$ – no interaction effects of the levels of factor A and the levels of factor B


CR2F Test Statistics: AB Interaction

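$$H^{(AB)} = \frac{1}{(a-1)(b-1)} \sum_{i=1}^{a} \sum_{j=1}^{b} \big(\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot}\big)\big(\cdots\big)' = \frac{1}{(a-1)(b-1)}\, \bar{\mathbf{Y}}_{\cdot}\,(P_a \otimes P_b)\,\bar{\mathbf{Y}}_{\cdot}'$$

$$G = \frac{1}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{1}{n_{ij}(n_{ij}-1)} \sum_{k=1}^{n_{ij}} \big(Y_{ijk} - \bar{Y}_{ij\cdot}\big)\big(Y_{ijk} - \bar{Y}_{ij\cdot}\big)' = \frac{1}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{1}{n_{ij}} S_{ij}$$

where $S_{ij}$ is the sample covariance matrix of cell $(i, j)$,

$$\bar{\mathbf{Y}}_{\cdot} = \big(\bar{Y}_{11\cdot}, \dots, \bar{Y}_{1b\cdot}, \bar{Y}_{21\cdot}, \dots, \bar{Y}_{ab\cdot}\big), \qquad \bar{Y}_{\cdot\cdot\cdot} = \frac{1}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} \bar{Y}_{ij\cdot},$$

and $\bar{Y}_{i\cdot\cdot}$ and $\bar{Y}_{\cdot j\cdot}$ are the unweighted averages of the cell mean vectors over $j$ and over $i$, respectively.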


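Estimate of the “within” variability: $G$ is the average of the estimators of the (co)variances of the cell mean vectors.

$E(G) = \Sigma$ and, under $H_0^{(AB)}$, $E\big(H^{(AB)}\big) = \Sigma$, where here $\Sigma = \frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b} n_{ij}^{-1}\Sigma_{ij}$.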
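Both statistics can be computed directly from the cell data. The Python sketch below uses nested lists and no linear-algebra library, and follows the definitions of $H^{(AB)}$ and $G$. `h_ab_and_g` works from the double-centred cell means and the within-cell deviations. `h_ab_kron` evaluates the equivalent Kronecker form. It assumes that $P_m = I_m - J_m/m$ is the centring matrix, which is what makes the two expressions for $H^{(AB)}$ agree.

```python
def mean_vec(rows):
    """Component-wise mean of a list of equal-length vectors."""
    p = len(rows[0])
    return [sum(r[k] for r in rows) / len(rows) for k in range(p)]


def add_outer(acc, u, weight):
    """acc += weight * u u' (in place)."""
    for r in range(len(u)):
        for c in range(len(u)):
            acc[r][c] += weight * u[r] * u[c]


def h_ab_and_g(data):
    """data[i][j] is the list of p-vectors of cell (i, j), each n_ij >= 2.
    Returns (H_AB, G) as p x p nested lists."""
    a, b = len(data), len(data[0])
    p = len(data[0][0][0])
    cm = [[mean_vec(data[i][j]) for j in range(b)] for i in range(a)]
    row = [mean_vec(cm[i]) for i in range(a)]                          # Ybar_i..
    col = [mean_vec([cm[i][j] for i in range(a)]) for j in range(b)]   # Ybar_.j.
    grand = mean_vec(row)                                              # Ybar_...
    H = [[0.0] * p for _ in range(p)]
    G = [[0.0] * p for _ in range(p)]
    for i in range(a):
        for j in range(b):
            d = [cm[i][j][k] - row[i][k] - col[j][k] + grand[k] for k in range(p)]
            add_outer(H, d, 1.0 / ((a - 1) * (b - 1)))
            n = len(data[i][j])
            for y in data[i][j]:
                e = [y[k] - cm[i][j][k] for k in range(p)]
                add_outer(G, e, 1.0 / (a * b * n * (n - 1)))
    return H, G


def centering(m):
    """P_m = I_m - J_m / m."""
    return [[(1.0 if r == c else 0.0) - 1.0 / m for c in range(m)] for r in range(m)]


def kron(A, B):
    """Kronecker product of square matrices A (m x m) and B (q x q)."""
    q = len(B)
    size = len(A) * q
    return [[A[r // q][c // q] * B[r % q][c % q] for c in range(size)]
            for r in range(size)]


def h_ab_kron(cell_means):
    """H_AB via Ybar (P_a kron P_b) Ybar' / ((a-1)(b-1)); the columns of Ybar
    are ordered (1,1), ..., (1,b), (2,1), ..., (a,b)."""
    a, b = len(cell_means), len(cell_means[0])
    cols = [cell_means[i][j] for i in range(a) for j in range(b)]
    K = kron(centering(a), centering(b))
    p, f = len(cols[0]), 1.0 / ((a - 1) * (b - 1))
    return [[f * sum(cols[s][r] * K[s][t] * cols[t][c]
                     for s in range(a * b) for t in range(a * b))
             for c in range(p)] for r in range(p)]
```

The two routes to $H^{(AB)}$ should agree up to rounding. Comparing them is a cheap check on the ordering convention for the columns.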


More Notation

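$$\mu^{(\psi)}_{ij} = \begin{cases} \mu + \alpha_i + \beta_j & \text{if } \psi = AB,\\ \mu + \beta_j & \text{if } \psi = A \text{ or } A|B,\\ \mu + \alpha_i & \text{if } \psi = B \text{ or } B|A. \end{cases}$$

$\Omega$ is a fixed matrix, and

$$v_1(\Omega) = \lim_{a\to\infty} \frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b} \frac{\operatorname{tr}\big((\Omega\Sigma_{ij})^2\big)}{n_{ij}(n_{ij}-1)}, \qquad v_2(\Omega) = \lim_{a\to\infty} \frac{1}{ab}\sum_{i=1}^{a}\sum_{j\neq j'} \frac{\operatorname{tr}(\Omega\Sigma_{ij}\Omega\Sigma_{ij'})}{n_{ij}\,n_{ij'}},$$

assuming the limits exist.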
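For plug-in purposes, the averages inside the two limits can be evaluated for the current number of levels $a$. The Python sketch below writes out the matrix helpers so the example has no dependencies. `sigmas`, `ns` and `omega` are hypothetical inputs, either true or estimated covariance matrices. The consistent-variance theorem for this model does not plug in $S_{ij}$ for the first term; it estimates that term through $\Psi_{ij}(\Omega)$ instead.

```python
def matmul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[r][t] * B[t][c] for t in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


def trace(A):
    return sum(A[r][r] for r in range(len(A)))


def v1_v2_finite(sigmas, ns, omega):
    """Averages inside the limits defining v_1(Omega) and v_2(Omega) for the
    current a. sigmas[i][j] is Sigma_ij (p x p), ns[i][j] is n_ij."""
    a, b = len(sigmas), len(sigmas[0])
    v1 = v2 = 0.0
    for i in range(a):
        om_sig = [matmul(omega, sigmas[i][j]) for j in range(b)]  # Omega Sigma_ij
        for j in range(b):
            n = ns[i][j]
            v1 += trace(matmul(om_sig[j], om_sig[j])) / (n * (n - 1))
            for jp in range(b):
                if jp != j:  # ordered pairs j != j'
                    v2 += trace(matmul(om_sig[j], om_sig[jp])) / (n * ns[i][jp])
    return v1 / (a * b), v2 / (a * b)
```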


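CR2F Test Statistics: Testing A, AB, A|B, B|A

Theorem

Let $\psi = AB, A, A|B$ or $B|A$. Suppose that under the hypothesis $H_0^{(\psi)}$ the $Y_{ijk}$ are independently distributed with mean vector $\mu^{(\psi)}_{ij}$ and covariance matrix $\Sigma_{ij}$ for $i = 1, \dots, a$, $j = 1, \dots, b$ and $k = 1, \dots, n_{ij}$. Then, under technical assumptions,

$$\sqrt{a}\,\operatorname{tr}\big((H^{(\psi)} - G)\,\Omega\big) \xrightarrow{\mathcal{L}} N\big(0, \tau^2_\psi(\Omega)\big)$$

as $a \to \infty$ with $n_{ij}$ and $b$ bounded, where

$$\tau^2_\psi(\Omega) = \begin{cases} \dfrac{2}{b}\Big(v_1(\Omega) + \dfrac{v_2(\Omega)}{(b-1)^2}\Big) & \text{when } \psi = AB,\\[1ex] \dfrac{2}{b}\big(v_1(\Omega) + v_2(\Omega)\big) & \text{when } \psi = A,\\[1ex] \dfrac{2}{b}\, v_1(\Omega) & \text{when } \psi = A|B,\\[1ex] \dfrac{2}{b^2}\Big(v_1(\Omega) + \dfrac{v_2(\Omega)}{(b-1)^2}\Big) & \text{when } \psi = B|A. \end{cases}$$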
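Given estimates of $v_1(\Omega)$ and $v_2(\Omega)$, for instance from the consistent variance estimators of the next theorem, the statistic can be standardized and referred to the standard normal. The case table in this Python sketch is transcribed from the theorem. The upper-tail rejection region is an assumption. It is reasonable because, for positive semidefinite $\Omega$, departures from $H_0$ make $\operatorname{tr}((H^{(\psi)} - G)\Omega)$ tend to be positive.

```python
import math


def tau2(psi, v1, v2, b):
    """Asymptotic variance tau^2_psi(Omega), transcribed from the theorem."""
    if psi == "AB":
        return 2.0 / b * (v1 + v2 / (b - 1) ** 2)
    if psi == "A":
        return 2.0 / b * (v1 + v2)
    if psi == "A|B":
        return 2.0 / b * v1
    if psi == "B|A":
        return 2.0 / b ** 2 * (v1 + v2 / (b - 1) ** 2)
    raise ValueError(f"unknown hypothesis {psi!r}")


def standardized_test(trace_diff, a, psi, v1_hat, v2_hat, b):
    """trace_diff = tr((H - G) Omega). Returns z = sqrt(a) * trace_diff / tau_hat
    and its upper-tail standard normal p-value (one-sided: an assumption)."""
    z = math.sqrt(a) * trace_diff / math.sqrt(tau2(psi, v1_hat, v2_hat, b))
    p_upper = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_upper
```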


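Theorem (Consistent Variance Estimation)

Let the model and assumptions be as in the previous theorem. Further assume that the eighth-order moments of $Y_{ijk}$ exist, and define

$$\Psi_{ij}(\Omega) = \frac{1}{4c_{ij}} \sum_{(k_1,k_2,k_3,k_4)\in K} \Omega\,(Y_{ijk_1}-Y_{ijk_2})(Y_{ijk_1}-Y_{ijk_2})'\,\Omega\,(Y_{ijk_3}-Y_{ijk_4})(Y_{ijk_3}-Y_{ijk_4})',$$

where $K$ is the set of all quadruples $\kappa = (k_1, k_2, k_3, k_4)$ of indices from $\{1, \dots, n_{ij}\}$ in which no element of $\kappa$ equals any other element of $\kappa$, and $c_{ij} = n_{ij}(n_{ij}-1)(n_{ij}-2)(n_{ij}-3)$. Then, as $a \to \infty$,

$$\frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b} \frac{\operatorname{tr}\big(\Psi_{ij}(\Omega)\big)}{n_{ij}(n_{ij}-1)} - \frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b} \frac{\operatorname{tr}\big((\Omega\Sigma_{ij})^2\big)}{n_{ij}(n_{ij}-1)} = o_p(1)$$

and

$$\frac{1}{ab}\sum_{i=1}^{a}\sum_{j\neq j'} \frac{\operatorname{tr}(\Omega S_{ij}\Omega S_{ij'})}{n_{ij}\,n_{ij'}} - \frac{1}{ab}\sum_{i=1}^{a}\sum_{j\neq j'} \frac{\operatorname{tr}(\Omega\Sigma_{ij}\Omega\Sigma_{ij'})}{n_{ij}\,n_{ij'}} = o_p(1).$$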
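$\Psi_{ij}(\Omega)$ is a U-statistic. Because the four indices are distinct, $Y_{ijk_1} - Y_{ijk_2}$ and $Y_{ijk_3} - Y_{ijk_4}$ are independent, and each has covariance $2\Sigma_{ij}$. Each summand therefore has expectation $4(\Omega\Sigma_{ij})^2$, so $\operatorname{tr}(\Psi_{ij}(\Omega))$ is unbiased for $\operatorname{tr}((\Omega\Sigma_{ij})^2)$. The Python sketch below computes that trace for one cell by brute force over all ordered quadruples. This is affordable for the small, bounded cell sizes the theorem has in mind. `rows` and `omega` are hypothetical inputs.

```python
from itertools import permutations


def bilinear(u, omega, v):
    """u' Omega v."""
    return sum(u[r] * omega[r][c] * v[c]
               for r in range(len(u)) for c in range(len(v)))


def trace_psi(rows, omega):
    """tr(Psi_ij(Omega)) for one cell: rows are the n_ij >= 4 observation
    vectors of that cell, omega is the fixed p x p matrix."""
    n = len(rows)
    if n < 4:
        raise ValueError("need at least 4 observations per cell")
    c = n * (n - 1) * (n - 2) * (n - 3)
    total = 0.0
    for k1, k2, k3, k4 in permutations(range(n), 4):  # pairwise distinct indices
        d1 = [x - y for x, y in zip(rows[k1], rows[k2])]
        d3 = [x - y for x, y in zip(rows[k3], rows[k4])]
        # tr(Omega d1 d1' Omega d3 d3') = (d3' Omega d1) * (d1' Omega d3)
        total += bilinear(d3, omega, d1) * bilinear(d1, omega, d3)
    return total / (4 * c)
```

Differences of observations cancel the cell mean, so the result does not change when all observations of a cell are shifted by the same vector.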


CR2F Test Statistics: Testing B

Theorem

Suppose the $Y_{ijk}$ are independently distributed with mean vector $\mu^{(B)}_{ij} = \mu + \alpha_i$ and covariance matrix $\Sigma_{ij}$ for $i = 1, \dots, a$, $j = 1, \dots, b$ and $k = 1, \dots, n_{ij}$. Then, under the assumption $\frac{1}{a}\sum_{i=1}^{a} n_{ij}^{-1}\Sigma_{ij} = O(1)$ as $a \to \infty$, and another technical assumption,

$$(b-1)\operatorname{tr}\big(H^{(B)}\Omega\big) \xrightarrow{\mathcal{L}} \sum_{k=1}^{(b-1)p} \lambda_k\, \chi^2_{1,k},$$

where $n_{ij}$ and $b$ are bounded and $\lambda_k$ is the $k$th largest eigenvalue of $\Lambda$ defined by

$$\Lambda = (P_b \otimes \Omega^{1/2}) \Big(\frac{1}{a}\sum_{i=1}^{a} \bigoplus_{j=1}^{b} \frac{1}{n_{ij}}\Sigma_{ij}\Big) (P_b \otimes \Omega^{1/2}).$$

Here, $\chi^2_{1,k}$, $k = 1, 2, \dots, (b-1)p$, denote independent chi-square random variables, each with one degree of freedom.
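The limit is a weighted sum of independent $\chi^2_1$ variables, and its distribution has no simple closed form. The Python sketch below shows two common ways to evaluate it. Both assume the eigenvalues $\lambda_k$, or estimates of them, are already available. The first is a Monte Carlo tail probability. The second is a two-moment match by a scaled chi-square $g\,\chi^2_f$, a Satterthwaite/Box-type approximation. It is shown as a generic technique and is not necessarily the approximation used by the authors.

```python
import random


def weighted_chisq_pvalue_mc(observed, lambdas, reps=100_000, seed=1):
    """Monte Carlo estimate of P(sum_k lambda_k * chi2_{1,k} >= observed)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        draw = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in lambdas)
        if draw >= observed:
            exceed += 1
    return exceed / reps


def scaled_chisq_match(lambdas):
    """(g, f) such that g * chi2_f has the same mean and variance as the limit."""
    s1 = sum(lambdas)
    s2 = sum(lam * lam for lam in lambdas)
    return s2 / s1, s1 * s1 / s2
```

The match works because $g f = \sum_k \lambda_k$ and $2 g^2 f = 2\sum_k \lambda_k^2$, which are the mean and variance of the limiting distribution.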


Literature

A Bathke & SW Harrar (2008). Nonparametric methods in multivariate factorial designs for large number of factor levels. Journal of Statistical Planning and Inference.

A Bathke, SW Harrar & LV Madden (2008). How to compare small multivariate samples using nonparametric tests. Computational Statistics & Data Analysis.

SW Harrar & A Bathke (2008). A nonparametric version of the Bartlett-Nanda-Pillai multivariate test: asymptotics, approximations, and applications. American Journal of Mathematical and Management Sciences.

SW Harrar & A Bathke (2008). Nonparametric methods for unbalanced multivariate data and many factor levels. Journal of Multivariate Analysis.

A Bathke, SW Harrar & MR Ahmad (2009). Some contributions to the analysis of multivariate data. Biometrical Journal.

SW Harrar & A Bathke (2010). A modified robust two-factor multivariate analysis of variance: asymptotics and small sample approximations. Annals of the Institute of Statistical Mathematics.

C Liu, A Bathke & SW Harrar (2011). A nonparametric version of Wilks' lambda – asymptotic results and small sample approximations. Statistics & Probability Letters.
