
Permutation tests in multivariate analysis

Michele Scardi

Department of Biology, ‘Tor Vergata’ University

Rome, Italy

Email: mscardi@mclink.it
URL: http://www.michele.scardi.name

The Fourth BIOSAFENET Seminar – January 12-15, 2009

International Centre for Genetic Engineering and Biosafety (ICGEB), Ca’ Tron di Roncade, Italy

Ecology and agriculture

A few reasons why studying GMO effects on NTO is relevant to ecologists

Agriculture and biodiversity

Intermediate disturbance hypothesis!

Modified from EEA Report 2/2006

Intensive farmland

High Nature Value farmland


CAP and community ecology

• In the past, Common Agricultural Policy mainly promoted the expansion of agricultural production.

• A complete re-arrangement of the funding scheme now puts the environment at the center of farming policy.

• Preserving biodiversity (and therefore monitoring agroecosystems) is one of the main goals of the CAP.

• Biodiversity can only be monitored through community ecology studies.

• Similar targets have been set in other fields (e.g. Water Framework Directive).


EU and GMOs

• In December 2008, the Council of the EU adopted the following conclusions on GMOs:

“GMOs, … , give rise to discussion and questions, within the scientific community and society at large regarding their possible impact on health, environment and ecosystems.”

• Therefore, there is a growing need for ecological research on agroecosystems.

Permutation tests

Because bugs don’t know they’re supposed to be normally distributed…


A basic (univariate) permutation test

• Sample A: 28, 32, 45 (mA=35.00)

• Sample B: 22, 25, 29 (mB=25.33)

• H0: no difference between means

• Six observations can be arranged in two groups of three in 20 different combinations:

$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!} = \frac{6!}{3!\,(6-3)!} = \frac{720}{6 \cdot 6} = 20$$

A basic (univariate) permutation test

                  Sample A                Sample B
Permutation   a1  a2  a3    mA        b1  b2  b3    mB      mA-mB
     1        28  32  45   35.00      25  22  29   25.33     9.67
     2        28  32  25   28.33      45  22  29   32.00    -3.67
     3        28  32  22   27.33      25  45  29   33.00    -5.67
     4        28  32  29   29.67      25  22  29   25.33     4.33
     5        28  25  45   32.67      32  22  29   27.67     5.00
     6        28  22  45   31.67      25  32  29   28.67     3.00
     7        28  29  45   34.00      25  22  32   26.33     7.67
     8        25  32  45   34.00      28  22  29   26.33     7.67
     9        22  32  45   33.00      25  28  29   27.33     5.67
    10        29  32  45   35.33      25  22  28   25.00    10.33
    11        25  22  45   30.67      28  32  29   29.67     1.00
    12        25  32  22   26.33      28  45  29   34.00    -7.67
    13        28  25  22   25.00      25  28  32   28.33    -3.33
    14        25  29  45   33.00      28  32  29   29.67     3.33
    15        25  32  29   28.67      28  22  45   31.67    -3.00
    16        28  25  29   27.33      25  32  45   34.00    -6.67
    17        22  29  45   32.00      28  32  29   29.67     2.33
    18        22  32  29   27.67      28  22  45   31.67    -4.00
    19        28  22  29   26.33      25  32  45   34.00    -7.67
    20        25  22  29   25.33      28  32  45   35.00    -9.67

A basic (univariate) permutation test

• The difference between the Sample A and Sample B means is mA-mB=9.67

• By permuting our data we obtained an empirical distribution of the values this difference takes when H0 (equal means) is true

• How large is the original difference relative to the empirical distribution?

• Is it so large that H0 is probably false?

A basic (univariate) permutation test

mA-mB for the 20 permutations:
9.67, -3.67, -5.67, 4.33, 5.00, 3.00, 7.67, 7.67, 5.67, 10.33, 1.00, -7.67, -3.33, 3.33, -3.00, -6.67, 2.33, -4.00, -7.67, -9.67

[Figure: frequency distribution of the permuted mA-mB values]

In 2+1 cases out of 19+1: |mA-mB| ≥ 9.67

P(H0: mA=mB) = (2+1)/(19+1) = 0.15 > 0.05

We cannot reject H0: mA=mB

The P-value cannot be less than 1 / (n. of permutations)
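The procedure on the last few slides can be sketched in a few lines of Python. This is only a minimal illustration, not the software used for the seminar: the function name `exact_permutation_test` is made up here, and the test is assumed to be two-sided on the difference between means. Because it enumerates every split by machine, it is also a handy cross-check on a hand-built permutation table.

```python
from itertools import combinations

def exact_permutation_test(a, b):
    """Two-sided exact permutation test on the difference between two means."""
    pooled = list(a) + list(b)
    observed = sum(a) / len(a) - sum(b) / len(b)
    diffs = []
    # Enumerate every way of choosing which pooled observations form group A.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diffs.append(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    # The observed arrangement is one of the enumerated splits, so it is counted.
    # A small tolerance guards against floating-point noise in the comparison.
    extreme = sum(1 for d in diffs if abs(d) >= abs(observed) - 1e-9)
    return observed, diffs, extreme / len(diffs)

observed, diffs, p_value = exact_permutation_test([28, 32, 45], [22, 25, 29])
print(len(diffs), round(observed, 2), p_value)
```

With only 20 possible splits the P-value can never fall below 1/20, which is the lower bound stated on the slide.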


What permutation tests do ecologists use?

• ANOSIM, MRPP, NPMANOVA, Mantel Test and Indicator Species Analysis are quite popular

• Other permutation tests have been developed to suit specific needs (e.g. testing significance of relationships between environmental variables and species in CCA)

• If you have to solve a particular ecological problem, chances are you can develop your own permutation test

A toy multivariate problem (4 treatments and a community with only 2 species)…

[Figure: scatter plot of species 2 (0 to 16) against species 1 (0 to 24), with samples from Treatment A, Treatment B, Treatment C and Treatment D]

…and a toy permutation test

• Differences between samples are measured as Euclidean distances

• The test statistic is the average distance between group centroids (i.e. average responses to treatments)

• If the test statistic is large enough with respect to those obtained by data permutation, then differences between groups are significant.
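The toy test described above can be sketched as follows. The data points are hypothetical (the slide's values are only available as a plot), the function names are invented for this illustration, and random permutations are used instead of a full enumeration.

```python
import math
import random
from itertools import combinations

def mean_centroid_distance(samples, labels):
    """Average Euclidean distance between all pairs of group centroids."""
    groups = {}
    for point, label in zip(samples, labels):
        groups.setdefault(label, []).append(point)
    centroids = [tuple(sum(c) / len(pts) for c in zip(*pts))
                 for pts in groups.values()]
    pairs = list(combinations(centroids, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def centroid_permutation_test(samples, labels, n_perm=999, seed=0):
    """Permute treatment labels and compare the statistic with the observed one."""
    rng = random.Random(seed)
    observed = mean_centroid_distance(samples, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_centroid_distance(samples, shuffled) >= observed - 1e-12:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical abundances (species 1, species 2), three replicates per treatment.
samples = [(2, 3), (3, 4), (4, 3), (8, 12), (9, 13), (10, 12),
           (15, 5), (16, 6), (17, 5), (21, 14), (22, 15), (23, 14)]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3
toy_stat, toy_p = centroid_permutation_test(samples, labels)
print(round(toy_stat, 2), toy_p)
```

Shuffling labels rather than data rows gives the same reference distribution and keeps the code short.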

[Figure: Species 1 vs. Species 2 plot showing the distances between average responses to treatments (between centroids)]

Mantel test

X matrix (e.g. geographical distances)

    A     B     C     D     E
A   0.0   1.2   2.6   1.8   3.2
B   1.2   0.0   3.1   0.5   2.7
C   2.6   3.1   0.0   1.1   4.2
D   1.8   0.5   1.1   0.0   3.4
E   3.2   2.7   4.2   3.4   0.0

Y matrix (e.g. species dissimilarity)

    A     B     C     D     E
A   0.00  0.29  0.56  0.45  0.49
B   0.01  0.00  0.48  0.06  0.12
C   0.21  0.17  0.00  0.27  0.59
D   0.07  0.04  0.16  0.00  0.02
E   0.45  0.34  0.78  0.21  0.00

Problem: are X and Y (in)dependent of each other?

Mantel statistics

$$Z = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} x_{ij}\, y_{ij}$$

Range: [0, ∞)

$$R = \frac{1}{\frac{n(n-1)}{2} - 1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{x_{ij} - \bar{x}}{s_x} \cdot \frac{y_{ij} - \bar{y}}{s_y}$$

Range: [-1, 1], i.e. correlation between xij and yij


Mantel statistics

A B C D E

A 0.0 1.2 2.6 1.8 3.2

B 1.2 0.0 3.1 0.5 2.7

C 2.6 3.1 0.0 1.1 4.2

D 1.8 0.5 1.1 0.0 3.4

E 3.2 2.7 4.2 3.4 0.0

A B C D E

A 0.00 0.29 0.56 0.45 0.49

B 0.29 0.00 0.48 0.06 0.12

C 0.56 0.48 0.00 0.27 0.59

D 0.45 0.06 0.27 0.00 0.02

E 0.49 0.12 0.59 0.02 0.00

$$Z = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} x_{ij}\, y_{ij}$$

Z = 8.867

Products of the corresponding elements above the diagonal of X (first matrix above) and Y (second matrix above):

xij    yij    xij·yij
1.2    0.29   0.348
2.6    0.56   1.456
1.8    0.45   0.810
3.2    0.49   1.568
3.1    0.48   1.488
0.5    0.06   0.030
2.7    0.12   0.324
1.1    0.27   0.297
4.2    0.59   2.478
3.4    0.02   0.068

[Figure: scatter plot of yij (Y, 0 to 0.7) against xij (X, 0 to 5)]

NB: the Mantel statistic is largest when large xij are multiplied by large yij, i.e. when the two matrices have the same structure
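As a quick check of the worked example, both Mantel statistics can be computed from the upper triangles of the two symmetric 5×5 matrices shown on the slides. The helper names below are invented for this sketch, and the standard deviations are assumed to be sample standard deviations (denominator d-1, with d the number of pairs), which is what makes R an ordinary Pearson correlation.

```python
import math

X = [[0.0, 1.2, 2.6, 1.8, 3.2],
     [1.2, 0.0, 3.1, 0.5, 2.7],
     [2.6, 3.1, 0.0, 1.1, 4.2],
     [1.8, 0.5, 1.1, 0.0, 3.4],
     [3.2, 2.7, 4.2, 3.4, 0.0]]
Y = [[0.00, 0.29, 0.56, 0.45, 0.49],
     [0.29, 0.00, 0.48, 0.06, 0.12],
     [0.56, 0.48, 0.00, 0.27, 0.59],
     [0.45, 0.06, 0.27, 0.00, 0.02],
     [0.49, 0.12, 0.59, 0.02, 0.00]]

def upper_triangle(m):
    """Elements above the diagonal, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n - 1) for j in range(i + 1, n)]

def mantel_z(x, y):
    """Sum of cross-products of corresponding off-diagonal elements."""
    return sum(a * b for a, b in zip(upper_triangle(x), upper_triangle(y)))

def mantel_r(x, y):
    """Standardized Mantel statistic (correlation between xij and yij)."""
    xs, ys = upper_triangle(x), upper_triangle(y)
    d = len(xs)
    mean_x, mean_y = sum(xs) / d, sum(ys) / d
    s_x = math.sqrt(sum((v - mean_x) ** 2 for v in xs) / (d - 1))
    s_y = math.sqrt(sum((v - mean_y) ** 2 for v in ys) / (d - 1))
    return sum((a - mean_x) / s_x * (b - mean_y) / s_y
               for a, b in zip(xs, ys)) / (d - 1)

z_value = mantel_z(X, Y)
r_value = mantel_r(X, Y)
print(round(z_value, 3))  # 8.867, as on the slide
```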

Mantel test

Rows and columns of X are permuted together and Z is recomputed against the unchanged Y. In this first permutation, objects A and D are swapped:

    A     B     C     D     E
A   0.0   0.5   1.1   1.8   3.4
B   0.5   0.0   3.1   1.2   2.7
C   1.1   3.1   0.0   2.6   4.2
D   1.8   1.2   2.6   0.0   3.2
E   3.4   2.7   4.2   3.2   0.0

Actual value:
Z = 8.867

Permutations:
ZP1 = 8.365
ZP2 = 7.834
ZP3 = 6.897
ZP4 = 8.531
ZP5 = 8.885
…
ZPn = 7.852

Mantel test

H0: X and Y are independent

[Figure: frequency distribution of Z over the permutations; the actual value, Z=8.867, lies in the upper 3% of the distribution]

p(Z)=0.03

Therefore, we reject H0
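The permutation step can be sketched as below. This is an illustration under stated assumptions, not the program behind the slides: the function names are invented, the test is one-tailed (large Z), and all n! orderings are enumerated, which is only feasible for very small matrices (random permutations would be used otherwise). The P-value it returns for the 5×5 example need not equal the illustrative 0.03 shown above.

```python
from itertools import permutations

X = [[0.0, 1.2, 2.6, 1.8, 3.2],
     [1.2, 0.0, 3.1, 0.5, 2.7],
     [2.6, 3.1, 0.0, 1.1, 4.2],
     [1.8, 0.5, 1.1, 0.0, 3.4],
     [3.2, 2.7, 4.2, 3.4, 0.0]]
Y = [[0.00, 0.29, 0.56, 0.45, 0.49],
     [0.29, 0.00, 0.48, 0.06, 0.12],
     [0.56, 0.48, 0.00, 0.27, 0.59],
     [0.45, 0.06, 0.27, 0.00, 0.02],
     [0.49, 0.12, 0.59, 0.02, 0.00]]

def permuted_z(x, y, order=None):
    """Mantel Z after reordering rows and columns of x together."""
    n = len(x)
    order = list(range(n)) if order is None else list(order)
    return sum(x[order[i]][order[j]] * y[i][j]
               for i in range(n - 1) for j in range(i + 1, n))

def mantel_exact_p(x, y):
    """One-tailed P-value from all orderings of the objects in x."""
    observed = permuted_z(x, y)
    values = [permuted_z(x, y, order) for order in permutations(range(len(x)))]
    hits = sum(1 for z in values if z >= observed - 1e-9)
    return observed, hits / len(values)

# Swapping objects A and D (indices 0 and 3) reproduces ZP1 from the slide.
zp1 = permuted_z(X, Y, (3, 1, 2, 0, 4))
z_obs, p_exact = mantel_exact_p(X, Y)
print(round(zp1, 3), round(z_obs, 3), p_exact)
```

Only one matrix needs to be permuted: permuting both would just relabel the same set of pairings.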

Mantel test

• The test can be performed on any pair of distance or similarity matrices, for instance:
– two different groups of organisms
– environmental data and species composition
– etc.

• If the matrices are large enough (D>30), the test statistic is approximately distributed like Student’s t

Jaccard similarity between the 21 samples (rows and columns: 9 W samples, then 12 E samples), followed by the matching "same basin?" matrix (1=yes, 0=no):

1 0.5 0.9 0.1 0.6 1.0 0.3 0.9 0.7 0.8 0.8 0.8 0.6 0.9 0.0 0.6 0.4 0.6 0.2 0.2 0.7

0.5 1 0.3 0.7 0.3 0.2 0.4 0.6 0.9 0.1 0.0 0.8 0.5 0.9 0.1 0.8 0.1 0.8 0.5 0.9 0.7

0.9 0.3 1 0.0 0.4 0.7 0.1 0.9 0.5 0.0 0.4 0.4 0.9 0.5 0.6 0.5 0.0 0.1 0.4 0.3 0.5

0.1 0.7 0.0 1 0.7 1.0 0.4 0.3 0.1 0.8 0.1 0.5 0.2 0.5 0.7 0.9 0.3 0.6 0.9 0.7 0.5

0.6 0.3 0.4 0.6 1 0.7 0.6 0.2 0.8 0.3 0.9 0.2 0.9 0.1 0.7 0.6 0.9 0.6 0.6 0.2 0.6

1.0 0.2 0.7 1.0 0.7 1 0.0 0.6 0.3 0.0 0.2 0.8 0.5 0.2 0.1 0.5 0.7 0.6 0.8 0.7 0.6

0.3 0.4 0.1 0.4 0.6 0.0 1 0.6 0.3 0.2 0.4 0.6 0.9 0.5 0.0 0.1 0.7 0.3 0.5 0.6 0.5

0.9 0.6 0.9 0.3 0.2 0.6 0.6 1 0.4 0.6 0.6 0.2 0.4 0.6 0.4 0.5 0.4 0.3 0.6 0.2 0.1

0.7 0.9 0.5 0.1 0.8 0.3 0.3 0.4 1 0.7 0.6 0.7 0.5 0.0 0.7 0.9 0.3 0.8 0.1 0.5 0.8

0.8 0.1 0.0 0.8 0.3 0.0 0.2 0.6 0.7 1 0.9 0.8 0.7 0.2 0.5 0.2 0.6 0.8 0.6 0.6 0.6

0.8 0.0 0.4 0.1 0.9 0.2 0.4 0.6 0.6 0.9 1 0.3 0.3 0.3 0.0 0.8 0.2 0.4 0.8 0.1 0.8

0.8 0.8 0.4 0.5 0.2 0.8 0.6 0.2 0.7 0.8 0.3 1 0.9 0.1 0.2 1.0 0.6 0.6 0.4 0.6 1.0

0.6 0.5 0.9 0.2 0.9 0.5 0.9 0.4 0.5 0.7 0.3 0.9 1 0.6 0.9 0.7 0.0 0.9 0.9 0.8 0.9

0.9 0.9 0.5 0.5 0.1 0.2 0.5 0.6 0.0 0.2 0.3 0.1 0.6 1 0.8 0.6 0.5 0.1 0.1 0.5 0.6

0.0 0.1 0.6 0.7 0.7 0.1 0.0 0.4 0.7 0.5 0.0 0.2 0.9 0.8 1 0.5 0.4 0.2 0.7 0.2 0.3

0.6 0.8 0.5 0.9 0.6 0.5 0.1 0.5 0.9 0.2 0.8 1.0 0.7 0.6 0.5 1 0.3 0.9 0.1 0.8 0.6

0.4 0.1 0.0 0.3 0.9 0.7 0.7 0.4 0.3 0.6 0.2 0.6 0.0 0.5 0.4 0.3 1 0.5 0.1 0.9 0.3

0.6 0.8 0.1 0.6 0.6 0.6 0.3 0.3 0.8 0.8 0.4 0.6 0.9 0.1 0.2 0.9 0.5 1 0.6 0.7 0.2

0.2 0.5 0.4 0.9 0.6 0.8 0.5 0.6 0.1 0.6 0.8 0.4 0.9 0.1 0.7 0.1 0.1 0.6 1 0.1 0.2

0.2 0.9 0.3 0.7 0.2 0.7 0.6 0.2 0.5 0.6 0.1 0.6 0.8 0.5 0.2 0.8 0.9 0.7 0.1 1 0.5

0.7 0.7 0.5 0.5 0.6 0.6 0.5 0.1 0.8 0.6 0.8 1.0 0.9 0.6 0.3 0.6 0.3 0.2 0.2 0.5 1

1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0

1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0

1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0

1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0

1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0

1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0

1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0

1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0

1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1

Mantel test

Matrices: Jaccard similarity; Same basin? (1=yes, 0=no). Rows and columns are ordered W, then E.

Years: 1969-1970
Samples: 9 (W), 12 (E)
Depth: 570-2550 m

W and E basins are separated by two shallow sills

H0: structure of deep copepod assemblages is independent of basin

R=0.37

p<0.001

H0 was rejected, concluding that (according to common opinion) Eastern and Western basins actually had different deep copepod assemblages

But… More about this at the end of the presentation!


A few more facts about Mantel test

• Mantel test assumes that relationships between data in the two matrices are linear

• Therefore, it is sensitive to non-linearity (as well as to outliers)

• A rank-based Mantel test can also be performed, although it is not very popular

• Partial Mantel correlation (e.g. using 3 matrices) can be computed

• Mantel test can be performed on non-symmetrical matrices

Measuring ecological resemblance

• Similarity coefficients
– Range: [0,1]
– S=1 when similarity between species lists is maximum

• Dissimilarity coefficients
– D=1-S
– Range: [0,1]
– D=0 when similarity between species lists is maximum
– Some of them are metric

• Distance coefficients
– Range: [0,∞) or [0,Dmax]
– D=0 when two species lists are identical (in some cases proportional)
– Most of them are metric

Metric, i.e. they can be used to arrange objects, samples, etc. in a Euclidean space


Measuring ecological resemblance

• Legendre & Legendre (1983,1998) list 25 similarity coefficients and 14 metric distances, but many others have been used.

• Different measures of ecological resemblance may lead to different results.

• Selection of an optimal way of measuring ecological resemblance is inherently subjective.

A few examples…

Jaccard:

$$S_{jk} = \frac{a}{a + b + c}$$

Sørensen:

$$S_{jk} = \frac{2a}{2a + b + c}$$

Steinhaus (Bray-Curtis):

$$S_{jk} = \frac{2 \sum_{i=1}^{p} \min(x_{ij}, x_{ik})}{\sum_{i=1}^{p} (x_{ij} + x_{ik})}$$

Euclidean:

$$D_{jk} = \sqrt{\sum_{i=1}^{p} (x_{ij} - x_{ik})^2}$$

Manhattan:

$$D_{jk} = \sum_{i=1}^{p} |x_{ij} - x_{ik}|$$

Canberra:

$$D_{jk} = \sum_{i=1}^{p} \frac{|x_{ij} - x_{ik}|}{x_{ij} + x_{ik}}$$

A smart pick: asymmetric similarity coefficients ignore absence data (they rely upon the lowest level of information, i.e. presence)

Data matrix x: nsp = 12 species (rows), nst = 6 stations (columns)

           station
species    1    2    3    4    5    6
   1       0    1    4    0    0    0
   2       0    0    4   10    0    0
   3       0    1    5    1   16   29
   4       0    0    0    0    0    1
   5       0    0    4   14    0    0
   6       0    1    0    7    0    0
   7       0    0    4    6   15   18
   8       0    0    0    0    8    7
   9       0    0    0   12    8    9
  10      15    8    0    0   12   20
  11      17   19    0    7    0    0
  12       0    1    0    9    0    0

Manhattan (quantitative):

$$D4_{jk} = \sum_{i=1}^{nsp} |x_{ij} - x_{ik}|$$

D4     1     2     3     4     5     6
1      0    13    53    84    67    86
2     13     0    48    77    72    97
3     53    48     0    61    62    87
4     84    77    61     0    95   118
5     67    72    62    95     0    27
6     86    97    87   118    27     0

Jaccard (qualitative):

$$S3_{jk} = \frac{a_{jk}}{a_{jk} + b_{jk} + c_{jk}}$$

S3     1      2      3      4      5      6
1      1      0.333  0      0.111  0.167  0.143
2      0.333  1      0.222  0.4    0.222  0.2
3      0      0.222  1      0.444  0.25   0.222
4      0.111  0.4    0.444  1      0.3    0.273
5      0.167  0.222  0.25   0.3    1      0.833
6      0.143  0.2    0.222  0.273  0.833  1

Canberra (quantitative):

$$D5_{jk} = \sum_{i=1}^{nsp} \frac{|x_{ij} - x_{ik}|}{x_{ij} + x_{ik}}$$

D5     1     2     3     4     5     6
1      0     4.36  7     8.42  5.11  6.14
2      4.36  0     8.27  8.01  8.08  9.36
3      7     8.27  0     6.85  7.1   8.34
4      8.42  8.01  6.85  0     8.51  9.58
5      5.11  8.08  7.1   8.51  0     1.76
6      6.14  9.36  8.34  9.58  1.76  0
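The coefficients listed in this part of the talk are easy to write down directly. The sketch below uses invented function names and takes each sample as a plain list of abundances; as an assumption of this sketch, species absent from both samples are skipped in the Canberra sum to avoid 0/0 terms. The two example vectors are stations 1 and 2 of the 12-species data matrix.

```python
import math

def jaccard(u, v):
    """a / (a + b + c): shared species over species found in either sample."""
    a = sum(1 for p, q in zip(u, v) if p > 0 and q > 0)
    b = sum(1 for p, q in zip(u, v) if p > 0 and q == 0)
    c = sum(1 for p, q in zip(u, v) if p == 0 and q > 0)
    return a / (a + b + c)

def sorensen(u, v):
    a = sum(1 for p, q in zip(u, v) if p > 0 and q > 0)
    only_one = sum(1 for p, q in zip(u, v) if (p > 0) != (q > 0))  # b + c
    return 2 * a / (2 * a + only_one)

def steinhaus(u, v):
    """Bray-Curtis similarity."""
    return 2 * sum(min(p, q) for p, q in zip(u, v)) / (sum(u) + sum(v))

def euclidean(u, v):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))

def manhattan(u, v):
    return sum(abs(p - q) for p, q in zip(u, v))

def canberra(u, v):
    # Double zeros are skipped (assumption of this sketch).
    return sum(abs(p - q) / (p + q) for p, q in zip(u, v) if p + q > 0)

station_1 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 17, 0]
station_2 = [1, 0, 1, 0, 0, 1, 0, 0, 0, 8, 19, 1]
print(manhattan(station_1, station_2),           # 13
      round(jaccard(station_1, station_2), 3),   # 0.333
      round(canberra(station_1, station_2), 2))  # 4.36
```

The three printed values match the entries for stations 1 and 2 in the D4, S3 and D5 matrices.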

ANOSIM (ANalysis Of SIMilarities)

[Figure: similarity matrix for control and treatment (or impact) samples, divided into similarities within groups and similarities between groups]

Problem: are differences between groups large enough to say that groups are different from each other?

H0: mean rank of similarities between groups is equal to mean rank of similarities within groups, i.e. rb = rw

$$R = \frac{r_b - r_w}{N(N-1)/4}$$

ANOSIM (ANalysis Of SIMilarities)

[Figure: the similarities are sorted and ranked; the mean rank is computed within groups (rw, n=6 similarities) and between groups (rb, n=9 similarities) for N=6 samples]

$$R = \frac{r_b - r_w}{N(N-1)/4} = 0.5$$

$$R = \frac{r_b - r_w}{N(N-1)/4}$$

R values with n=6 within-group and n=9 between-group similarities:

R= 0.50 (rw= 5.75, rb= 9.50)
R= 0.20 (rw= 7.08, rb= 8.61)
R= 0.19 (rw= 7.17, rb= 8.56)
R= -0.26 (rw= 9.17, rb= 7.22)
...

[Figure: frequency distribution (0% to 25%) and cumulative frequency (0% to 100%) of R over the permutations, with R ranging from -0.4 to 0.8]

p(R)=0.10
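A minimal ANOSIM sketch is given below. It is not the program used for the slides: the function names are invented, distances are ranked in ascending order (rank 1 = most similar pair, equivalent to ranking similarities in descending order), and random label permutations are used. The data set is an assumption of convenience: it is the small treatment/control table (10 species, 3 + 3 samples) from the MRPP example of this talk, with Euclidean distances.

```python
import math
import random
from itertools import combinations

COUNTS = [  # rows: species 1-10; columns: T1, T2, T3, C1, C2, C3
    [0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 1, 1], [0, 1, 1, 9, 5, 4],
    [1, 0, 1, 0, 0, 0], [0, 0, 0, 1, 1, 0], [8, 12, 9, 0, 1, 2],
    [5, 8, 4, 9, 11, 8], [0, 0, 1, 4, 6, 2], [2, 4, 5, 3, 5, 6],
    [12, 6, 8, 1, 2, 0]]
LABELS = ["T", "T", "T", "C", "C", "C"]

def average_ranks(values):
    """Rank from 1 (smallest value); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def anosim_r(dist, labels):
    n = len(labels)
    pairs = list(combinations(range(n), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
    r_w = sum(within) / len(within)
    r_b = sum(between) / len(between)
    return (r_b - r_w) / (n * (n - 1) / 4)

def anosim_test(dist, labels, n_perm=999, seed=0):
    rng = random.Random(seed)
    observed = anosim_r(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if anosim_r(dist, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

samples = list(zip(*COUNTS))  # one tuple of 10 counts per sample
dist = [[math.dist(p, q) for q in samples] for p in samples]
anosim_stat, anosim_p = anosim_test(dist, LABELS)
print(round(anosim_stat, 3), anosim_p)
```

With two groups of three samples there are only 10 distinct splits, so even a perfectly separated data set cannot give a P-value much below 0.10.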

Final remarks about ANOSIM

• In case of a posteriori pairwise comparisons, don’t forget the Bonferroni correction: corrected p(R) = p(R) × n. of pairwise comparisons

• Two-way analyses can also be performed

• The most distinctive feature of ANOSIM is that it uses ranks of similarities (so it is not sensitive to outliers, but it may be too sensitive to small differences)

Problem: is the fish assemblage composition in a Marine Protected Area different from the neighbouring sites?

[Figure: sampling sites INSIDE and OUTSIDE the Marine Protected Area]


Other approaches

• ANOSIM is loosely related to ANOVA-like tests, but it can be easily integrated into explorative analyses based on ordination or clustering

• Other approaches are more similar to ANOVA and to ANOVA users’ way of thinking: MRPP and NPMANOVA

• Exactly like ANOVA, they support more complex experimental designs

MRPP (Multi-Response Permutation Procedure)

• It is based on actual distances (usually Euclidean), not ranks

• The test statistic is Δ, a weighted average of within-group distances

• Significance of Δ is assessed by the empirical distribution of permuted Δs, but an approximation to Student’s t is also available (quite handy for very large data sets)

• There are different methods for weighting Δ according to group size

MRPP

• PRO: not using ranks, it can detect differences in mean values as well as differences in data dispersion

• CON: biases due to distance metric are more pronounced than in ANOSIM

• Output
– observed Δ
– significance of observed Δ
– expected Δ
– a ratio between observed and expected Δ

MRPP

Raw data (T1-T3: treatment; C1-C3: control)

             T1   T2   T3   C1   C2   C3
species 1     0    0    0    1    1    0
species 2     0    0    0    0    1    1
species 3     0    1    1    9    5    4
species 4     1    0    1    0    0    0
species 5     0    0    0    1    1    0
species 6     8   12    9    0    1    2
species 7     5    8    4    9   11    8
species 8     0    0    1    4    6    2
species 9     2    4    5    3    5    6
species 10   12    6    8    1    2    0

Euclidean distances

      T1    T2    T3    C1    C2    C3
T1    0.0   8.2   5.4  17.4  16.1  15.1
T2    8.2   0.0   5.7  15.9  14.2  12.4
T3    5.4   5.7   0.0  15.3  13.9  11.9
C1   17.4  15.9  15.3   0.0   5.6   6.9
C2   16.1  14.2  13.9   5.6   0.0   5.8
C3   15.1  12.4  11.9   6.9   5.8   0.0

Distances within groups: T1-T2, T1-T3, T2-T3 (treatment) and C1-C2, C1-C3, C2-C3 (control). All the other distances are distances between groups.

T = (8.2+5.4+5.7)/3 = 6.43

C = (5.6+6.9+5.8)/3 = 6.10

Weight = ni/sum(ni)

Observed Δ = 6.43*3/6 + 6.10*3/6 = 6.26

Expected Δ = overall average distance (i.e. between and within groups) = 11.32

$$R = 1 - \frac{\text{observed } \Delta}{\text{expected } \Delta} = 0.447$$

MRPP output

-------------------------------------------------
 MRPP -- Multi-Response Permutation Procedures
 [Michele Scardi, 1999]
-------------------------------------------------

Input data: 6 objects, 10 variables, 2 groups
Weighting option: C(I) = n(I)/sum(n(I))
Distance measure: Euclidean

Group # 1, n = 3, avg(d) = 6.43
Group # 2, n = 3, avg(d) = 6.10

Test statistic, T = -2.95
Observed delta = 6.26
Expected delta = 11.32
Variance of delta = 2.94
Skewness of delta = -2.52
Within group agreement, R = 0.447
P-value of a <= delta = 0.022

R → 1 when Δ → 0, i.e. R is large when within-group distance is small

Groups are significantly more homogeneous than expected (therefore, differences between groups are significant)
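The main quantities in the output above (observed delta, expected delta and R) can be reproduced with a short sketch. Function names are invented here, group weights are assumed to be n(I)/sum(n(I)) as in the listing, every group is assumed to hold at least two samples, and significance is estimated from random label permutations rather than from the approximation that produced T and the reported P-value, so the P-value will differ.

```python
import math
import random
from itertools import combinations

COUNTS = [  # rows: species 1-10; columns: T1, T2, T3, C1, C2, C3
    [0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 1, 1], [0, 1, 1, 9, 5, 4],
    [1, 0, 1, 0, 0, 0], [0, 0, 0, 1, 1, 0], [8, 12, 9, 0, 1, 2],
    [5, 8, 4, 9, 11, 8], [0, 0, 1, 4, 6, 2], [2, 4, 5, 3, 5, 6],
    [12, 6, 8, 1, 2, 0]]
LABELS = ["T", "T", "T", "C", "C", "C"]

def mrpp_delta(dist, labels):
    """Weighted mean of the average within-group distances."""
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, []).append(index)
    delta = 0.0
    for members in groups.values():
        pairs = list(combinations(members, 2))
        mean_within = sum(dist[i][j] for i, j in pairs) / len(pairs)
        delta += len(members) / len(labels) * mean_within
    return delta

def mrpp_test(dist, labels, n_perm=999, seed=0):
    rng = random.Random(seed)
    observed = mrpp_delta(dist, labels)
    all_pairs = list(combinations(range(len(labels)), 2))
    expected = sum(dist[i][j] for i, j in all_pairs) / len(all_pairs)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small delta means tight groups, so the lower tail is tested.
        if mrpp_delta(dist, shuffled) <= observed + 1e-12:
            hits += 1
    agreement = 1 - observed / expected
    return observed, expected, agreement, (hits + 1) / (n_perm + 1)

samples = list(zip(*COUNTS))
dist = [[math.dist(p, q) for q in samples] for p in samples]
delta_obs, delta_exp, agreement, mrpp_p = mrpp_test(dist, LABELS)
print(round(delta_obs, 2), round(delta_exp, 2), round(agreement, 3), mrpp_p)
```

Computed from unrounded distances, the observed delta comes out about 0.01 lower than the 6.26 obtained from the one-decimal distances on the slide.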

Arpaia et al. (2007). Composition of Arthropod Species Assemblages in Bt-expressing and Near Isogenic Eggplants in Experimental Fields. Environ. Entomol. 36(1): 213-227


NPMANOVA

• Non-Parametric Multivariate ANalysis Of VAriance

• Test statistics

– SStotal = sum of squared distances between all observations and the overall centroid

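– SSwithin = sum of squared distances between the observations in each group and their group centroid

– SSamong = SStotal-SSwithin

– Pseudo-F = ratio of SSamong to SSwithin, each divided by its degrees of freedom (a-1 and N-a, for a groups and N observations)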

NPMANOVA

[Figure: Species 1 vs. Species 2 plot illustrating SSamong and SSwithin]


NPMANOVA

• Test is based on permutation of a distance matrix and computation of new Pseudo-F

• Original Pseudo-F is then compared to the empirical distribution of permuted Pseudo-F values

• Traditional ANOVA output (Pseudo-F)

• One-way, two-way and higher-level ANOVA designs


NPMANOVA

• It can be applied any time ANOSIM or MRPP can be used, and it seems more robust

• It allows higher level ANOVA designs, including designs with interactions

• Any distance coefficient can be used

• Like ANOVA in multiple regression, it can be used to evaluate model functioning: e.g. community structure=f(disturbance,time)

• Software available from the Author’s web site for complex analysis designs

NPMANOVA

Raw data (T1-T3: treatment; C1-C3: control)

             T1   T2   T3   C1   C2   C3
species 1     0    0    0    1    1    0
species 2     0    0    0    0    1    1
species 3     0    1    1    9    5    4
species 4     1    0    1    0    0    0
species 5     0    0    0    1    1    0
species 6     8   12    9    0    1    2
species 7     5    8    4    9   11    8
species 8     0    0    1    4    6    2
species 9     2    4    5    3    5    6
species 10   12    6    8    1    2    0

Non-parametric Multivariate Analysis of Variance

Source       df      SS      MS      F       P
---------------------------------------------------
Treatment     1    288.0   288.0   14.4   0.0981
Residuals     4     80.0    20.0
Total         5    368.0
---------------------------------------------------
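The sums of squares in the table above can be reproduced from the pairwise distances alone. The sketch below assumes a one-way design and Euclidean distances, and uses the distance-based identities SStotal = (sum of squared distances between all pairs)/N and SSwithin = sum over groups of (sum of squared within-group distances)/n. Function names are invented, and the P-value comes from random label permutations, so it will only be close to the tabulated one.

```python
import math
import random
from itertools import combinations

COUNTS = [  # rows: species 1-10; columns: T1, T2, T3, C1, C2, C3
    [0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 1, 1], [0, 1, 1, 9, 5, 4],
    [1, 0, 1, 0, 0, 0], [0, 0, 0, 1, 1, 0], [8, 12, 9, 0, 1, 2],
    [5, 8, 4, 9, 11, 8], [0, 0, 1, 4, 6, 2], [2, 4, 5, 3, 5, 6],
    [12, 6, 8, 1, 2, 0]]
LABELS = ["T", "T", "T", "C", "C", "C"]

def pseudo_f(dist, labels):
    """One-way sums of squares and pseudo-F from a distance matrix."""
    n = len(labels)
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, []).append(index)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = sum(
        sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
        for members in groups.values())
    ss_among = ss_total - ss_within
    a = len(groups)
    f = (ss_among / (a - 1)) / (ss_within / (n - a))
    return ss_total, ss_within, ss_among, f

def npmanova_test(dist, labels, n_perm=999, seed=0):
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)[3]
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled)[3] >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

samples = list(zip(*COUNTS))
dist = [[math.dist(p, q) for q in samples] for p in samples]
ss_total, ss_within, ss_among, f_obs = pseudo_f(dist, LABELS)
f_stat, npmanova_p = npmanova_test(dist, LABELS)
print(round(ss_total, 1), round(ss_within, 1), round(ss_among, 1), round(f_obs, 1))
# 368.0 80.0 288.0 14.4, as in the ANOVA table
```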

NPMANOVA vs. MRPP

[Figure: Species 1 vs. Species 2 plot; NPMANOVA compares SSamong with SSwithin]

[Figure: Species 1 vs. Species 2 plot; MRPP compares observed with expected within-group distances]

Indicator Species Analysis

• Indicator Species Analysis identifies species that are significantly more frequent and/or abundant in a group of samples

• Each species is associated with a vector of Indicator Values (IVs), i.e. with an IV for each group of samples

• Significance of IVs is tested by permutation of the raw data matrix

Indicator Species Analysis

Relative abundance of species j in group k (from counts, biomass, etc.):

$$RA_{kj} = \frac{x_{kj}}{\sum_{k=1}^{g} x_{kj}}$$

Average frequency of occurrence of species j in group k (from presence or absence, i.e. 0 or 1):

$$RF_{kj} = \frac{\sum_{i=1}^{n_k} b_{ijk}}{n_k}$$

The Indicator Value (IV) is obtained by combining relative abundances (RA) and average frequencies of occurrence (RF):

$$IV_{kj} = 100 \times RA_{kj} \times RF_{kj}$$
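The three formulas can be sketched for a single species as follows. Several things here are assumptions of this illustration: the function names, the reading of x_kj as the mean abundance of the species in group k, and the use of the largest IV across groups as the statistic tested by permuting sample labels. The example abundances are those of species 6 in the treatment/control table used for MRPP and NPMANOVA.

```python
import random

def indicator_values(abundances, labels):
    """IV (0-100) of one species for each group of samples."""
    groups = {}
    for value, label in zip(abundances, labels):
        groups.setdefault(label, []).append(value)
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    total = sum(means.values())
    result = {}
    for g, values in groups.items():
        ra = means[g] / total if total > 0 else 0.0           # relative abundance
        rf = sum(1 for x in values if x > 0) / len(values)    # frequency of occurrence
        result[g] = 100 * ra * rf
    return result

def indicator_test(abundances, labels, n_perm=999, seed=0):
    """Permutation test on the largest IV of the species."""
    rng = random.Random(seed)
    observed = max(indicator_values(abundances, labels).values())
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max(indicator_values(abundances, shuffled).values()) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

species_6 = [8, 12, 9, 0, 1, 2]
groups_6 = ["T", "T", "T", "C", "C", "C"]
iv = indicator_values(species_6, groups_6)
iv_max, iv_p = indicator_test(species_6, groups_6)
print(iv, iv_p)
```

In this example the species occurs in every treatment sample and holds 29/32 of the summed group means there, so IV is 90.625 for the treatment group and 6.25 for the control group.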

Indicator Species Analysis

Gut contents of Merluccius merluccius

Group:            CTL   IMP
N:                 43    20

                  INDVAL
 n   ID          CTL   IMP      p      taxon
--------------------------------------------------------
 1   EUPHAS        4    44    0.017    Euphasiacea
 2   THYSAN        0    25    0.004    Thysanopoda aequalis
 3   RESPES       26     1    0.046    Fish scales
 4   RESCRO        0    20    0.009    Unid. Crustaceans
 5   MISIDA        5     0    0.595    Misidacea
 6   DECAPO        2     3    0.999    Decapoda
 7   CEFALO        5     0    0.554    Cephalopoda
 8   CHLORO        0     5    0.299    Chlorotocus crassicornis
 9   CRANGO        2     0    0.999    Crangon sp
10   SARDIN        0     5    0.299    Sardina pilchardus
11   ROCINE        0     5    0.307    Rocinela sp
12   POLICH        2     0    0.999    Polychaeta
--------------------------------------------------------

Fish don’t feed on polluted sea bed!

Arpaia et al. (2007). Composition of Arthropod Species Assemblages in Bt-expressing and Near Isogenic Eggplants in Experimental Fields. Environ. Entomol. 36(1): 213-227

As for the deep copepods… Back to the Mantel test

Page 54: Permutation tests in multivariate analysis Michele Scardi Department of Biology ‘Tor Vergata’ University Rome, Italy Email: mscardi@mclink.it URL:

[Figure: two 21 × 21 sample-by-sample matrices, with rows and columns ordered by basin (9 W samples, then 12 E samples). Jaccard similarity: pairwise similarity between samples. Same basin? (1=yes, 0=no): a block matrix with 1 for pairs of samples from the same basin and 0 otherwise.]

Years: 1969-1970
Samples: 9 (W), 12 (E)
Depth: 570-2550 m

W and E basins are separated by two shallow sills

H0: structure of deep copepod assemblages is independent of basin

Mantel test: R=0.37, p<0.001

H0 was rejected, concluding that (according to common opinion) Eastern and Western basins actually had different deep copepod assemblages
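The test on this slide can be sketched in a few lines of Python. This is only a minimal illustration, not the analysis actually run for the copepod data: it assumes the standardized Mantel statistic is the Pearson correlation between the off-diagonal elements of the two matrices, uses a one-tailed test, and works on made-up toy similarities. The function names (`pearson`, `upper_triangle`, `mantel`) and all parameter values are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(m, order):
    """Off-diagonal upper triangle of m after reordering rows and columns."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(a, b, n_perm=999, seed=0):
    """One-tailed Mantel test: returns (r, p) for square symmetric matrices."""
    rng = random.Random(seed)
    identity = list(range(len(a)))
    b_vec = upper_triangle(b, identity)
    r_obs = pearson(upper_triangle(a, identity), b_vec)
    as_large = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # permute rows and columns of a together
        if pearson(upper_triangle(a, order), b_vec) >= r_obs:
            as_large += 1
    return r_obs, as_large / (n_perm + 1)


# Toy data (NOT the copepod data): 9 "W" and 12 "E" samples.
basin = ["W"] * 9 + ["E"] * 12
n = len(basin)
same_basin = [[1 if basin[i] == basin[j] else 0 for j in range(n)]
              for i in range(n)]

# Made-up similarities: higher within a basin than between basins.
toy_rng = random.Random(42)
similarity = [[1.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        base = 0.7 if basin[i] == basin[j] else 0.3
        similarity[i][j] = similarity[j][i] = base + toy_rng.uniform(-0.1, 0.1)

r, p = mantel(similarity, same_basin)
print(f"R = {r:.2f}, p = {p:.3f}")
```

Rows and columns of one matrix are permuted together, so that each permutation corresponds to relabelling the samples, which is what H0 (assemblage structure independent of basin) allows.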


But…

• Deep copepod density is very, very low

• In many cases only a single specimen of a given species was found in the whole sample

• 45 out of 65 species were found in both basins, and only 19 among them accounted for 96% of the specimens

• The number of samples was very small

• When a very rare species was not found in a sample, was it actually absent from the sampling area or just too sparse to be always collected?


Simulation

• Re-sampling was simulated 500 times, assuming that species found in a real sample with only 1 or 2 specimens had a 50% chance of being found again

• Mantel tests were performed after each simulated re-sampling

• H0 was rejected in only 73 out of 500 cases (14.6%)

• So, are the deep copepod assemblages from Western and Eastern Basins really different?

Happy ending!

Recent studies confirmed that there are no differences between W and E deep copepod assemblages.
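The re-sampling step of the simulation could look roughly like the sketch below. It is an illustration under stated assumptions, not the original code: species with 3 or more specimens are assumed to be always found again, species absent from a sample stay absent, and the Jaccard similarity of two empty samples is set to 0. The names `resample_presence` and `jaccard` and the toy counts are hypothetical. Each simulated presence/absence table would then be turned into a Jaccard similarity matrix and passed to a Mantel test.

```python
import random


def resample_presence(counts, rng, rare_max=2, p_refind=0.5):
    """Simulate one re-sampling of a samples-by-species table of counts.

    Species with 1..rare_max specimens are found again with probability
    p_refind; more abundant species are assumed to be always found.
    """
    simulated = []
    for sample in counts:
        row = []
        for c in sample:
            if c == 0:
                row.append(0)  # absent stays absent
            elif c <= rare_max:
                row.append(1 if rng.random() < p_refind else 0)
            else:
                row.append(1)  # assumption: always found again
        simulated.append(row)
    return simulated


def jaccard(u, v):
    """Jaccard similarity between two presence/absence lists."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0  # assumption for empty samples


# Toy counts (NOT the copepod data): 3 samples x 4 species.
counts = [[12, 1, 0, 2],
          [8, 0, 1, 1],
          [15, 2, 2, 0]]

rng = random.Random(1)
presence = resample_presence(counts, rng)
print(presence)
print(jaccard(presence[0], presence[1]))
```

Repeating this 500 times and counting how often H0 is rejected gives a proportion like the one on the slide, 73/500 = 14.6%.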


Conclusions

Only two more slides!


Permutation tests in 6 easy steps

1. Define a problem and a null hypothesis

2. Define an appropriate statistic

3. Compute that statistic for your data set

4. Permute your data many times in a way that is consistent with the null hypothesis

5. Obtain an empirical distribution of the test statistic

6. Compare the original value of the test statistic to the empirical distribution: is it large (or small) enough to reject the null hypothesis?
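As a minimal sketch, the six steps can be mapped onto a few lines of Python. The example assumes the simplest possible design: two groups, H0 "group labels are exchangeable", the difference between group means as the statistic, and a two-tailed comparison. The function names and the data are made up for illustration.

```python
import random


def mean_difference(a, b):
    """Step 2: the test statistic (difference between group means)."""
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_test(group_a, group_b, statistic, n_perm=999, seed=0):
    """Two-tailed permutation test for two groups: returns (observed, p)."""
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)           # step 3
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)                          # step 4: reassign labels
        null.append(statistic(pooled[:n_a], pooled[n_a:]))  # step 5
    # Step 6: how often is a permuted value at least as extreme?
    extreme = sum(1 for s in null if abs(s) >= abs(observed))
    return observed, (extreme + 1) / (n_perm + 1)


# Step 1: H0 = the two groups do not differ (made-up data).
group_a = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2]
group_b = [3.9, 4.1, 3.5, 4.4, 4.0, 3.8]

observed, p = permutation_test(group_a, group_b, mean_difference)
print(f"observed difference = {observed:.2f}, p = {p:.3f}")
```

The observed value is counted among the permutations (the `+ 1` terms), so the p-value can never be exactly zero.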


The bottom line

• Permutation tests are very useful in ecological studies, as they don’t rely upon some assumptions that are seldom met in ecological applications, e.g. (multi)normal distribution of data.

• However, some assumptions must still be met:
– variances (dispersion) must be homogeneous
– effects (in case of complex designs) must be additive
– data (observations) must be independent

• You can design/adapt your own test, provided that it is consistent with H0 (and other general assumptions)


Thank you!

More info about ecological data analysis (+Machine Learning and other related topics)?

Email: mscardi@mclink.it
URL: http://www.michele.scardi.name (or just google Michele Scardi)

PAST software package:

http://folk.uio.no/ohammer/past (or just google PAST)

Marti J. Anderson’s web site (NPMANOVA):

http://www.stat.auckland.ac.nz/~mja