ENV 2A7Y Community Analysis in Ecology
Academic year 2004/5
Alastair Grant
University of East Anglia
Course home page (http://www.uea.ac.uk/~e130/env2b7y.htm)
Alastair Grant home page (http://www.uea.ac.uk/~e130/)
Centre for Ecology, Evolution and Conservation at the University of East Anglia (http://www.uea.ac.uk/ceec/)
Lecture 1 (Wednesday Au02, 1000): overview of statistical analysis, plus picking up bits and pieces
Aims of Course
• Learn to identify organisms using formal keys
– A vital and saleable ecological skill.
• Carry out a quantitative survey
• Learn to do and interpret multivariate statistical analysis of ecological data
– Again, vital and saleable skills.
Coursework deadlines different from last year
• Collection of organisms
– Autumn week 7 (Monday).
• Survey report
– Spring week 3 (Wednesday).
• Deadlines are at 1400 hours, and are strictly enforced.
– For details of the system for extensions etc., see:
– http://www.uea.ac.uk/env/ueanetwk/studentrecords/Webpages/regulations.htm
Collection
• Set of correctly identified and well presented organisms.
• Accompanied by a short report justifying identifications
– Demonstrates that you have identified them using a proper key.
– e.g. "The presence of a horn in the centre of the forehead distinguishes this as a unicorn rather than a horse".
– Can be presented as notes accompanying specimens.
First practical session
• Practical class, 10-12, Wednesday Au03, in ENV Lab A.
• Bring along any problem specimens for help.
• Attendance optional – if you are happy with identifications, no need to come.
• Computer-based practicals begin Au07. Attendance is again optional.
Any Questions?
Statistical analysis
• Ordination
– Identifying gradients/trends in the data
• Classification
– Identifying groups in the data
– Communities and super-organisms
– (NVC)
Statistical Analysis
• Begin with SPSS
– PCA (Principal Components Analysis), the simplest ordination method
– Cluster Analysis
• Then think about:
– Canoco
– Primer
– Including looking at the relationship between ecology and environment.
Cluster Analysis – identifies groups
[Figure: scatter plot of VAR00001 (y-axis, 0 to 10) against VAR00002 (x-axis, 4.5 to 8.5)]
Example where PCA will be helpful
[Figure: scatter plot of V2 (y-axis, 2 to 16) against VAR00001 (x-axis, 0 to 12)]
Organising data
Site id   Replicate   Altitude (m)   Sp 1   Sp 2   Sp 3   Sp 4   Etc.
Field 1   1           10             7      5      12     7
Field 1   2           10             10     9      6      12
Etc.
How to do the basic statistical analysis
The bare minimum
• Everyone should carry out:
– PCA
– Cluster analysis (using K-means clustering)
– Plot site scores on PCA with different symbols for different clusters
– Discuss patterns in community composition
– Discuss relationship of this to environmental variables.
• Handouts on statistics are at: http://www.uea.ac.uk/~e130/2b7ymethods.htm
Handouts you’ve already had
• Instructions on putting data into SPSS
• Transforming data (and filling in zeros)
• Running basic PCA and K-Means Clustering
Handouts today
• Screenshot of web links to stats handouts.
• Annotated examples of SPSS output for PCA and Cluster analysis
• Relationship between Environmental variables and Ecology and instructions on Primer and Canoco.
• Extract from Clarke and Warwick (1994) discussing MDS
Before you start…
• Count data may need transformation
– Often skewed
– Large counts dominate analysis
• Use 4th root for counts (√√, i.e. take the square root twice)
• Don’t transform % cover data
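As an illustration of what the 4th root transformation does to count data, here is a minimal Python sketch. The counts are invented for the example and the function name `fourth_root` is hypothetical; in the course itself the transformation is done in SPSS as described on the handout.

```python
import math


def fourth_root(counts):
    """Return the 4th root of each count by taking the square root twice."""
    return [math.sqrt(math.sqrt(c)) for c in counts]


# Hypothetical counts for one species across five samples.
counts = [0, 1, 16, 81, 10000]
transformed = fourth_root(counts)
print(transformed)  # [0.0, 1.0, 2.0, 3.0, 10.0]
```

Before transformation the largest count is 10000 times the count of 1; afterwards it is only 10 times larger, so it no longer dominates the analysis. Zeros stay at zero.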
Topic 1. Cluster analysis
• Group together samples that are similar to each other
• K-means clustering.
– You define number of clusters
– If distinct groups are present, will pick them out
– If no distinct groups, will give roughly equal sized clusters that are easy to work with
Hierarchical clustering
• No need to specify cluster number up front
• Will identify real groups in data, if they are present
• If groups are not clearly distinct, results are messy
• An optional extra. Examples on handout.
K-means clustering
• Analyse>Classify>K-Means Cluster
• Copy names of variables containing species data into “variables” box
• Specify the number of clusters
• Click on SAVE button, and tick the Save Cluster Memberships box. Then click Continue
• Click OK to run analysis
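To see roughly what K-means does behind the menu, the following is a minimal Python sketch of the standard iterative procedure (Lloyd's algorithm). It is not SPSS's Quick Cluster implementation, which chooses its initial centres differently. The function name `kmeans` and the six-site data set are assumptions made up for the illustration.

```python
import math
import random


def kmeans(samples, k, max_iter=100, seed=0):
    """Cluster rows of `samples` into k groups; return (centres, labels)."""
    rng = random.Random(seed)
    # Start from k randomly chosen samples as the initial cluster centres.
    centres = [list(row) for row in rng.sample(samples, k)]
    labels = [None] * len(samples)
    for _ in range(max_iter):
        # Assign every sample to its nearest centre (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(row, centres[c]))
            for row in samples
        ]
        if new_labels == labels:  # no sample changed cluster: converged
            break
        labels = new_labels
        # Move each centre to the mean of the samples assigned to it.
        for c in range(k):
            members = [row for row, lab in zip(samples, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return centres, labels


# Hypothetical % abundances of three species at six sites (two clear groups).
sites = [
    [60, 30, 10], [62, 28, 10], [58, 31, 11],
    [10, 30, 60], [12, 28, 60], [9, 33, 58],
]
centres, labels = kmeans(sites, k=2)
print(labels)   # cluster membership, the equivalent of the saved SPSS variable
print(centres)  # mean abundance of each species in each cluster
```

The `labels` list plays the role of the cluster membership variable that SPSS saves, and `centres` corresponds to the Final Cluster Centers table.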
Dataset 2. Species x, y and z
Quick Cluster
The first two tables can be ignored for most purposes

Initial Cluster Centers
       Cluster
        1     2     3     4     5
X      36    16    55    69     5
Y      33    62     8    31    23
Z      31    22    37     0    71

Iteration History (a)
             Change in Cluster Centers
Iteration        1        2        3        4        5
1            1.620     .000    3.512   10.927   11.334
2            5.410     .000     .000     .000   10.513
3             .000     .000     .000     .000     .000
a. Convergence achieved due to no or small distance change. The maximum distance by which any center has changed is .000. The current iteration is 3. The minimum distance between initial centers is 32.180.
Mean abundance of each species in each cluster

Final Cluster Centers
       Cluster
        1     2     3     4     5
X      39    16    53    64    13
Y      33    62    11    27    33
Z      28    22    36     9    54

Number of Cases in each Cluster
Cluster   1     8.000
          2     1.000
          3     5.000
          4    12.000
          5     4.000
Valid          30.000
Missing          .000
No distinct clusters
[Figure: scatter plot of REGR factor score 2 against REGR factor score 1 (analysis 2), with points marked by cluster number (1 to 5)]
c.f. species a, b and c
[Figure: scatter plot of REGR factor score 2 against REGR factor score 1 (analysis 4), with points marked by cluster number (1 to 5)]
Topic 2. Principal Components Analysis (PCA)
• Simplest of a group of methods known as Factor Analysis
– Data on two species: graph on paper
– Data on three species: 3D graph on computer
– Data on 30 species: nightmare
• PCA takes a multidimensional set of data
• Rotates the data in space so that they can be plotted with the minimum distortion.
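The idea of rotating the data can be made concrete with just two species, where the rotation has a closed form. The Python sketch below is an illustration only: the function name `pca_2d` and the abundances are invented, and SPSS may scale the scores it saves differently from the unscaled scores computed here.

```python
import math


def pca_2d(xs, ys):
    """PCA of two variables on the covariance matrix.

    Returns ((eig1, eig2), pc1_scores, pc2_scores).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]  # centre each variable on its mean
    dy = [y - mean_y for y in ys]
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(d * d for d in dx) / (n - 1)
    c = sum(d * d for d in dy) / (n - 1)
    b = sum(p * q for p, q in zip(dx, dy)) / (n - 1)
    # Angle of the axis along which the data vary most.
    theta = 0.5 * math.atan2(2 * b, a - c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Rotate the centred data onto the new axes.
    pc1 = [p * cos_t + q * sin_t for p, q in zip(dx, dy)]
    pc2 = [-p * sin_t + q * cos_t for p, q in zip(dx, dy)]
    # Eigenvalues = variance along each new axis.
    half_diff = math.hypot((a - c) / 2, b)
    eig1 = (a + c) / 2 + half_diff
    eig2 = (a + c) / 2 - half_diff
    return (eig1, eig2), pc1, pc2


# Hypothetical abundances of two species in five samples.
sp1 = [2, 4, 6, 8, 10]
sp2 = [1, 3, 2, 5, 4]
(eig1, eig2), pc1, pc2 = pca_2d(sp1, sp2)
print(f"axis 1 holds {100 * eig1 / (eig1 + eig2):.1f}% of the variance")
```

The rotation changes nothing about the distances between samples; it only chooses axes so that as much of the variation as possible lies along the first one.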
For a good description of PCA see
http://www.okstate.edu/artsci/botany/ordinate/PCA.htm
To carry out PCA
• Analyse>Data Reduction>Factor
• Copy species abundances into variables box
• Click on the extraction button and select “covariance matrix”
• Click on the Scores button and check the “Save as Variables” box
• If any variables contain all zeros, SPSS will sulk
Dataset 1. PCA
• Data on the abundance of three species – x, y and z
• Abundances sum to 100%
• 21 samples
Scatter plot of all pairs of variables
[Figure: scatter plot matrix of X, Y and Z]
Pearson correlation coefficients

Correlations
                              X          Y          Z
X   Pearson Correlation     1.000     -.780**    -.880**
    Sig. (2-tailed)          .         .000       .000
    N                       21         21         21
Y   Pearson Correlation     -.780**   1.000       .389
    Sig. (2-tailed)          .000      .          .081
    N                       21         21         21
Z   Pearson Correlation     -.880**    .389      1.000
    Sig. (2-tailed)          .000      .081       .
    N                       21         21         21
** Correlation is significant at the 0.01 level (2-tailed).
Eigenvalues - proportion of variance explained
How long is the sausage?

Component   Eigenvalue   % variance   Cumulative % variance
1           1101         88.1%        88.1%
2           148          11.9%        100%
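The percentages in the eigenvalue table are each eigenvalue divided by the sum of the eigenvalues. A short Python check using the two eigenvalues printed on the slide is given below. Because the slide shows eigenvalues rounded to whole numbers, the results should agree with the 88.1% and 11.9% in the table only to within rounding.

```python
from itertools import accumulate

# Eigenvalues as printed on the slide (rounded to whole numbers).
eigenvalues = [1101, 148]

total = sum(eigenvalues)
percent = [100 * ev / total for ev in eigenvalues]
cumulative = list(accumulate(percent))

for i, (ev, pct, cum) in enumerate(zip(eigenvalues, percent, cumulative), start=1):
    print(f"Component {i}: eigenvalue {ev}, {pct:.2f}% of variance, {cum:.2f}% cumulative")
```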
Rescaled half of Component Matrix. Correlations with species abundances

     Component 1   Component 2
X    -0.999        -0.048
Y     0.748         0.663
Z     0.902        -0.431
Scores on factor 1 against scores on factor 2
[Figure: scatter plot of REGR factor score 2 (y-axis, -.2 to .2) against REGR factor score 1 (x-axis, -.08 to .08) for analysis 2. Arrow annotations: axis 1 runs between species X and species Y & Z; axis 2 runs between species Y and species Z.]
Topic 3. Environmental variables
• Formal statistical tests of relationships
– Canonical Correspondence Analysis (CCA) in Canoco
– Permutation tests (BIOENV procedure in Primer)
• Simple (often visual) approaches
– Correlate environmental variables with principal components (may not work well)
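The simple approach of correlating an environmental variable with a principal component amounts to computing a Pearson correlation coefficient between the saved site scores and the environmental measurements. A minimal Python sketch follows. The function name `pearson_r`, the scores and the salinity values are all invented for illustration; in the course the same calculation would be run in SPSS.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(p * q for p, q in zip(dx, dy))
    sum_xx = sum(p * p for p in dx)
    sum_yy = sum(q * q for q in dy)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical site scores on principal component 1 and salinity at each site.
pc1_scores = [-0.08, -0.05, -0.01, 0.02, 0.04, 0.08]
salinity = [4.0, 9.0, 12.0, 18.0, 21.0, 29.0]
print(f"r = {pearson_r(pc1_scores, salinity):.3f}")
```

A strong correlation suggests that the component reflects that environmental gradient. A weak one does not rule out a relationship, which is why the slide warns that this approach may not work well.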
Calculate means for each cluster
[Figure: error-bar plot of mean salinity with 95% CI (y-axis, 0 to 30) for clusters 1 to 3; N = 5 in each cluster]
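The per-cluster means and 95% confidence intervals shown in this kind of error-bar plot can be reproduced with a short calculation. The Python sketch below is illustrative only: the function name `cluster_means` and the salinity values are made up, and the critical value 2.776 is the two-tailed 5% point of Student's t for 4 degrees of freedom, so it applies only when each cluster has 5 samples.

```python
import math
from collections import defaultdict


def cluster_means(clusters, values, t_crit):
    """Return {cluster: (mean, lower, upper)} with a t-based confidence interval."""
    groups = defaultdict(list)
    for cluster, value in zip(clusters, values):
        groups[cluster].append(value)
    summary = {}
    for cluster, vals in sorted(groups.items()):
        n = len(vals)
        mean = sum(vals) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / (n - 1))
        half_width = t_crit * sd / math.sqrt(n)  # t times the standard error
        summary[cluster] = (mean, mean - half_width, mean + half_width)
    return summary


# Hypothetical data: cluster membership and salinity for 15 sites.
cluster = [1] * 5 + [2] * 5 + [3] * 5
salinity = [5, 6, 7, 6, 6, 15, 17, 16, 14, 18, 28, 27, 30, 29, 26]

T_CRIT_DF4 = 2.776  # Student's t, 95% two-tailed, 4 degrees of freedom (n = 5)
for c, (mean, lower, upper) in cluster_means(cluster, salinity, T_CRIT_DF4).items():
    print(f"cluster {c}: mean {mean:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

If the intervals for different clusters do not overlap, that is a simple visual indication that the clusters differ in that environmental variable.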
Plot symbols of different sizes on PCA graph
What next?
• Make sure that you’ve done PCA and cluster analysis
• Interpret the results in ecological terms
• Assess relationships with environmental variables
• Then:
– MDS or DCA if necessary (arch effect)
– Correspondence analysis to look at relationships between species
Schedule for remainder of course
• 10-12 on Wednesdays.
• Help available with data analysis.
– Week 7 and 9, Arts 1.02
– Week 11 and 12, ENV Lab D
• Turn up if you need help
• Lectures in week 8 and 11
• Two more computer-based sessions, SP 01 and SP 02