ENV 2A7Y Community Analysis in Ecology

Academic year 2004/5

Alastair Grant

University of East Anglia

Course home page (http://www.uea.ac.uk/~e130/env2b7y.htm)

Alastair Grant home page (http://www.uea.ac.uk/~e130/)

Centre for Ecology, Evolution and Conservation at the University of East Anglia (http://www.uea.ac.uk/ceec/)

Lecture 1 (Wednesday Au02, 1000): overview of statistical analysis, plus picking up bits and pieces

Aims of Course

• Learn to identify organisms using formal keys
  – A vital and saleable ecological skill.
• Carry out a quantitative survey
• Learn to do and interpret multivariate statistical analysis of ecological data
  – Again, vital and saleable skills.

Coursework deadlines different from last year

• Collection of organisms
  – Autumn week 7 (Monday).
• Survey report
  – Spring week 3 (Wednesday).
• Deadlines are at 1400 hours, and are strictly enforced.
  – For details of the system for extensions etc., see:
  – http://www.uea.ac.uk/env/ueanetwk/studentrecords/Webpages/regulations.htm

Collection

• Set of correctly identified and well presented organisms.

• Accompanied by a short report justifying identifications
  – Demonstrates that you have identified them using a proper key.
  – e.g. "The presence of a horn in the centre of the forehead distinguishes this as a unicorn rather than a horse".
  – Can be presented as notes accompanying specimens.

First practical session

• Practical class, 10-12, Wednesday Au03, in ENV Lab A.

• Bring along any problem specimens for help.

• Attendance optional – if you are happy with identifications, no need to come.

• Computer-based practicals begin Au07. Attendance again optional.

Any Questions?

Statistical analysis

• Ordination
  – Identifying gradients/trends in the data
• Classification
  – Identifying groups in the data
  – Communities and super-organisms
  – (NVC)

Statistical Analysis

• Begin with SPSS
  – PCA (Principal Components Analysis): the simplest ordination method
  – Cluster Analysis
• Then think about:
  – Canoco
  – Primer
  – Including looking at the relationship between ecology and environment.

[Figure: scatter plot of VAR00001 (vertical axis, 0 to 10) against VAR00002 (horizontal axis, 4.5 to 8.5)]

Cluster Analysis – identifies groups

[Figure: scatter plot of V2 (vertical axis, 2 to 16) against VAR00001 (horizontal axis, 0 to 12)]

Example where PCA will be helpful

Organising data

Site id   Replicate   Altitude (m)   Sp 1   Sp 2   Sp 3   Sp 4   Etc.
Field 1   1           10             7      5      12     7
Field 1   2           10             10     9      6      12
Etc.
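The layout above (one row per sample, one column per species) can be sketched in Python as follows. This is only an illustration, assuming the data are held as comma-separated text; the column names `site_id`, `replicate`, `altitude_m` and `sp1` to `sp4` are made up here and are not part of the course handouts.

```python
import csv
import io

# The two example rows from the slide. Column names are hypothetical.
raw = """site_id,replicate,altitude_m,sp1,sp2,sp3,sp4
Field 1,1,10,7,5,12,7
Field 1,2,10,10,9,6,12
"""

rows = list(csv.DictReader(io.StringIO(raw)))
species_columns = ["sp1", "sp2", "sp3", "sp4"]

# Samples-by-species matrix: one row per sample, one column per species.
abundance_matrix = [[int(row[s]) for s in species_columns] for row in rows]

# Environmental variables are kept alongside, one value per sample.
altitude = [float(row["altitude_m"]) for row in rows]
```

Keeping species counts and environmental variables in separate columns of the same table makes it straightforward to analyse the species data first and relate the results to the environment afterwards.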

How to do the basic statistical analysis

The bare minimum

• Everyone should carry out:
  – PCA
  – Cluster analysis (using K-means clustering)
  – Plot site scores on PCA with different symbols for different clusters
  – Discuss patterns in community composition
  – Discuss relationship of this to environmental variables.

• Handouts on statistics are at: http://www.uea.ac.uk/~e130/2b7ymethods.htm

Handouts you’ve already had

• Instructions on putting data into SPSS
• Transforming data (and filling in zeros)
• Running basic PCA and K-Means Clustering

Handouts today

• Screenshot of web links to stats handouts.
• Annotated examples of SPSS output for PCA and Cluster analysis
• Relationship between Environmental variables and Ecology, and instructions on Primer and Canoco.
• Extract from Clarke and Warwick (1994) discussing MDS

Before you start….

• Count data may need transformation
  – Often skewed
  – Large counts dominate analysis
• Use 4th root for counts (√√)
• Don’t transform % cover data
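As a rough illustration of the fourth-root transformation mentioned above, the sketch below takes the square root of the square root of each count. The counts are invented for the example, and the function name is hypothetical.

```python
import math


def fourth_root_transform(counts):
    """Return the fourth root (square root of the square root) of each count."""
    return [math.sqrt(math.sqrt(c)) for c in counts]


counts = [0, 1, 16, 81, 625]  # hypothetical counts, not course data
transformed = fourth_root_transform(counts)
```

Here counts ranging from 0 to 625 are compressed to the range 0 to 5, so a single very abundant species has much less influence on the analysis.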

Topic 1. Cluster analysis

• Group together samples that are similar to each other

• K-means clustering
  – You define the number of clusters
  – If distinct groups are present, will pick them out
  – If no distinct groups, will give roughly equal-sized clusters that are easy to work with

Hierarchical clustering

• No need to specify cluster number up front
• Will identify real groups in data, if they are present
• If groups are not clearly distinct, results are messy
• An optional extra. Examples on handout.

K-means clustering

• Analyse>Classify>K-Means Cluster
• Copy names of variables containing species data into “variables” box
• Specify the number of clusters
• Click on SAVE button, and tick the Save Cluster Memberships box. Then click Continue
• Click OK to run analysis
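For readers who want to see what a K-means run does, here is a minimal pure-Python sketch of the basic algorithm: assign each sample to its nearest centre, move each centre to the mean of its members, and repeat until nothing changes. It is not SPSS's implementation (SPSS chooses its initial centres by its own procedure), the sample values are invented, and clusters are numbered from 0 here rather than from 1.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(samples, initial_centres, max_iterations=100):
    """Basic K-means; the number of clusters is len(initial_centres)."""
    centres = [list(c) for c in initial_centres]
    membership = None
    for _ in range(max_iterations):
        # Step 1: assign each sample to its nearest centre.
        new_membership = [
            min(range(len(centres)), key=lambda k: squared_distance(s, centres[k]))
            for s in samples
        ]
        if new_membership == membership:
            break  # no sample changed cluster, so the run has converged
        membership = new_membership
        # Step 2: move each centre to the mean of its member samples.
        for k in range(len(centres)):
            members = [s for s, m in zip(samples, membership) if m == k]
            if members:  # an empty cluster keeps its previous centre
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return membership, centres


# Hypothetical % abundances of three species in six samples.
samples = [
    [80, 15, 5], [75, 20, 5], [70, 20, 10],
    [10, 30, 60], [15, 25, 60], [5, 35, 60],
]
membership, centres = k_means(samples, initial_centres=[samples[0], samples[3]])
```

The returned `centres` play the same role as the "Final Cluster Centers" table in SPSS output: each is the mean abundance of every species within one cluster.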

Dataset 2. Species x, y and z

Quick Cluster

The first two tables can be ignored for most purposes

Initial Cluster Centers

              Cluster
       1     2     3     4     5
X     36    16    55    69     5
Y     33    62     8    31    23
Z     31    22    37     0    71

Iteration History (a)

                 Change in Cluster Centers
Iteration       1        2        3        4        5
1           1.620     .000    3.512   10.927   11.334
2           5.410     .000     .000     .000   10.513
3            .000     .000     .000     .000     .000

a. Convergence achieved due to no or small distance change. The maximum distance by which any center has changed is .000. The current iteration is 3. The minimum distance between initial centers is 32.180.

Mean abundance of each species in each cluster

Final Cluster Centers

              Cluster
       1     2     3     4     5
X     39    16    53    64    13
Y     33    62    11    27    33
Z     28    22    36     9    54

Number of Cases in each Cluster

Cluster   1      8.000
          2      1.000
          3      5.000
          4     12.000
          5      4.000
Valid           30.000
Missing           .000

No distinct clusters

[Figure: scatter plot of REGR factor score 2 for analysis 2 (vertical axis, -.2 to .3) against REGR factor score 1 for analysis 2 (horizontal axis, -.1 to .2); points marked by cluster number, 1 to 5]

c.f. species a, b and c


[Figure: scatter plot with REGR factor score 2 for analysis 4 on the vertical axis (-.2 to .2) and a horizontal axis running from -.1 to .1; points marked by cluster number, 1 to 5]

Topic 2. Principal Components Analysis (PCA)

• Simplest of a group of methods known as Factor Analysis
  – Data on two species: graph on paper
  – Data on three species: 3D graph on computer
  – Data on 30 species: nightmare
• PCA takes a multidimensional set of data
• Rotates the data in space so that they can be plotted with the minimum distortion.
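The rotation described above can be sketched with NumPy as an eigen-decomposition of the species covariance matrix. This is an illustrative sketch, not what SPSS does internally: the function name and data are hypothetical, and SPSS may scale its saved factor scores differently, so the numbers are not expected to match SPSS output.

```python
import numpy as np


def pca_covariance(abundances):
    """PCA on the covariance matrix of a samples-by-species array."""
    data = np.asarray(abundances, dtype=float)
    centred = data - data.mean(axis=0)            # centre each species on its mean
    cov = np.cov(centred, rowvar=False)           # species-by-species covariance
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # returned in ascending order
    order = np.argsort(eigenvalues)[::-1]         # put the largest variance first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    scores = centred @ eigenvectors               # sample positions on the rotated axes
    explained = eigenvalues / eigenvalues.sum()   # proportion of variance per axis
    return eigenvalues, eigenvectors, scores, explained


# Hypothetical % abundances of three species in five samples (rows sum to 100).
samples = [
    [70, 20, 10],
    [55, 25, 20],
    [40, 35, 25],
    [25, 40, 35],
    [10, 50, 40],
]
eigenvalues, eigenvectors, scores, explained = pca_covariance(samples)
```

Because each row sums to 100, the three species carry only two independent dimensions of variation, so the third eigenvalue is zero apart from rounding error. Plotting the first two columns of `scores` against each other gives the kind of ordination diagram used in the rest of the lecture.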

For a good description of PCA see

http://www.okstate.edu/artsci/botany/ordinate/PCA.htm

To carry out PCA

• Analyse>Data Reduction>Factor
• Copy species abundances into variables box
• Click on the Extraction button and select “covariance matrix”
• Click on the Scores button and check the “Save as Variables” box
• If any variables contain all zeros, SPSS will sulk
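One way to avoid the all-zeros problem noted above is to check for such species before running the analysis. The sketch below is a hypothetical helper, assuming the data are held as a mapping from species name to a list of abundances; the species names and values are invented.

```python
def drop_all_zero_species(table):
    """Return a copy of the table without species that are zero in every sample."""
    return {
        name: values
        for name, values in table.items()
        if any(v != 0 for v in values)
    }


# Hypothetical counts for three species in four samples.
table = {
    "species_a": [3, 0, 5, 1],
    "species_b": [0, 0, 0, 0],  # never recorded, so it carries no information
    "species_c": [0, 2, 0, 0],
}
cleaned = drop_all_zero_species(table)
dropped = sorted(set(table) - set(cleaned))
```

In SPSS itself the equivalent step is simply to leave those variables out of the variables box.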

Dataset 1. PCA

• Data on the abundance of three species – x, y and z

• Abundances sum to 100%
• 21 samples

Scatter plot of all pairs of variables

[Figure: scatter plot matrix of X, Y and Z]

Pearson correlation coefficients

Correlations

                              X          Y          Z
X   Pearson Correlation     1.000     -.780**    -.880**
    Sig. (2-tailed)          .         .000       .000
    N                        21        21         21
Y   Pearson Correlation    -.780**    1.000       .389
    Sig. (2-tailed)         .000       .          .081
    N                        21        21         21
Z   Pearson Correlation    -.880**     .389      1.000
    Sig. (2-tailed)         .000       .081       .
    N                        21        21         21

**. Correlation is significant at the 0.01 level (2-tailed).
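For reference, the Pearson correlation coefficient reported in the table above can be computed by hand as the sum of cross-products of deviations divided by the square root of the product of the two sums of squares. The sketch below is a minimal illustration with invented values; it does not compute the significance levels that SPSS reports.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical % abundances of two species in five samples.
species_x = [70, 55, 40, 25, 10]
species_y = [20, 25, 35, 40, 50]
r_xy = pearson_r(species_x, species_y)
```

In this invented example species y increases as species x decreases, so `r_xy` is strongly negative, which is the same kind of pattern as the negative correlations between X and the other two species in the table.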

Eigenvalues - proportion of variance explained

How long is the sausage?

Component   Eigenvalue   % variance   Cumulative % variance
1           1101         88.1%        88.1%
2           148          11.9%        100%
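The percentage of variance for each component is its eigenvalue divided by the sum of all eigenvalues. The sketch below applies this to the two eigenvalues shown on the slide (1101 and 148). Since those displayed values are rounded, the result agrees with the slide's 88.1% and 11.9% only to within rounding.

```python
eigenvalues = [1101, 148]  # as displayed on the slide (rounded values)

total = sum(eigenvalues)
percent_variance = [100 * ev / total for ev in eigenvalues]

# Running total gives the cumulative % variance column.
cumulative = []
running = 0.0
for p in percent_variance:
    running += p
    cumulative.append(running)
```

A first component that accounts for nearly 90% of the variance means that most of the pattern in the data lies along a single axis: the "sausage" is long and thin.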

Rescaled half of Component Matrix. Correlations with species abundances

     Component 1   Component 2
X      -0.999        -0.048
Y       0.748         0.663
Z       0.902        -0.431

Scores on factor 1 against scores on factor 2


[Figure: scatter plot with REGR factor score 2 for analysis 2 on the vertical axis (-.2 to .2) and a horizontal axis running from -.08 to .08. Arrow annotations label Y and Z, and Species X and Species Y&Z.]

Topic 3. Environmental variables

• Formal statistical tests of relationships
  – Canonical Correspondence Analysis (CCA) in Canoco
  – Permutation tests (BIOENV procedure in Primer)
• Simple (often visual) approaches
  – Correlate environmental variables with principal components (may not work well)

Calculate means for each cluster

[Figure: error-bar plot of 95% CI for SALINITY (vertical axis, 0 to 30) by CLUSTER (1.00, 2.00, 3.00); N = 5 in each cluster]
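Calculating the mean of an environmental variable for each cluster can be sketched as below. The salinity values and cluster memberships are invented, and the function name is hypothetical. The sketch gives means only; the 95% confidence intervals shown in the SPSS plot would need an additional calculation.

```python
from collections import defaultdict
from statistics import mean


def mean_by_cluster(values, clusters):
    """Mean of an environmental variable within each cluster."""
    groups = defaultdict(list)
    for value, cluster in zip(values, clusters):
        groups[cluster].append(value)
    return {cluster: mean(members) for cluster, members in sorted(groups.items())}


# Hypothetical salinity readings and saved cluster memberships for six samples.
salinity = [5, 7, 6, 20, 22, 21]
clusters = [1, 1, 1, 2, 2, 2]
cluster_means = mean_by_cluster(salinity, clusters)
```

If the cluster means differ markedly, as they do in this invented example, that suggests the groups picked out by the cluster analysis are related to the environmental variable.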

Plot symbols of different sizes on PCA graph

What next?

• Make sure that you’ve done PCA and cluster analysis

• Interpret the results in ecological terms
• Assess relationships with environmental variables
• Then:
  – MDS or DCA if necessary (arch effect)
  – Correspondence analysis to look at relationships between species

Schedule for remainder of course

• 10-12 on Wednesdays.
• Help available with data analysis.
  – Week 7 and 9, Arts 1.02
  – Week 11 and 12, ENV Lab D
• Turn up if you need help
• Lectures in week 8 and 11
• Two more computer-based sessions, SP 01 and SP 02