What we Measure vs. What we Want to Know

"Not everything that counts can be counted, and not everything that can be counted counts." - Albert Einstein

Scales, Transformations, Vectors and Multi-Dimensional Hyperspace

• All measurement is a proxy for what is really of interest - The Relationship between them

• The scale of measurement and the scale of analysis and reporting are not always the same - Transformations

• We often make measurements that are highly correlated - Multi-component Vectors

Multivariate Description

Gulls Variables

[Figure: scatterplot matrix of the four gulls variables Weight, Wing, Bill and H.and.B]

Scree Plot

[Figure: scree plot of gulls.pca2, showing the variances of Comp.1 to Comp.4 on a scale from 0.0 to 3.0]

Output

> gulls.pca2$loadings

Loadings:

        Comp.1 Comp.2 Comp.3 Comp.4
Weight  -0.505 -0.343  0.285  0.739
Wing    -0.490  0.852 -0.143  0.116
Bill    -0.500 -0.381 -0.742 -0.232
H.and.B -0.505 -0.107  0.589 -0.622

> summary(gulls.pca2)

Importance of components:

                          Comp.1     Comp.2     Comp.3
Standard deviation     1.8133342 0.52544623 0.47501980
Proportion of Variance 0.8243224 0.06921464 0.05656722
Cumulative Proportion  0.8243224 0.89353703 0.95010425
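The link between the standard deviations and the proportions in this summary can be checked by hand: each proportion is the squared standard deviation divided by the total variance. The sketch below is in Python rather than R. Because Comp.4 is not shown in the output, it infers the total variance from Comp.1, which is an assumption of the sketch rather than something the output states.

```python
# Standard deviations and proportions copied from the summary output above.
sdev = [1.8133342, 0.52544623, 0.47501980]
reported_prop = [0.8243224, 0.06921464, 0.05656722]

# Comp.4 is not shown, so the total variance is inferred from Comp.1
# (an assumption of this sketch): total = sd^2 / proportion.
total_variance = sdev[0] ** 2 / reported_prop[0]

# Proportion of variance for each component, and the running total.
proportions = [s ** 2 / total_variance for s in sdev]
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

for k, (p, c) in enumerate(zip(proportions, cumulative), start=1):
    print(f"Comp.{k}: proportion {p:.4f}, cumulative {c:.4f}")
```

The recomputed proportions for Comp.2 and Comp.3 agree with the reported ones, so the first three components together carry about 95% of the variability.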

Bi-Plot

[Figure: biplot of the 362 gull observations on Comp.1 and Comp.2, with arrows for the variables Weight, Wing, Bill and H.and.B]

Environmental Gradients

Inferring Gradients from Attribute Data (e.g. species)

Indirect Gradient Analysis

• Environmental gradients are inferred from species data alone

• Three methods:
  – Principal Component Analysis - linear model
  – Correspondence Analysis - unimodal model
  – Detrended CA - modified unimodal model

Terschelling Dune Data

PCA gradient - site plot

[Figure: PCA Plot for Dune Species Data; sites plotted on PCA 1 against PCA 2, coded by Management (biodynamic, hobby, nature, standard)]

PCA gradient - site/species biplot

[Figure: Biplot for Dune Species Data; Axis 1 against Axis 2, with abbreviated species labels (e.g. Ach mil, Agr sto, Lol per) and site groups marked standard, nature, and biodynamic & hobby]

Making Effective Use of Environmental Variables

Approaches

• Use single responses in linear models of environmental variables

• Use axes of a multivariate dimension reduction technique as responses in linear models of environmental variables

• Constrain the multivariate dimension reduction into the factor space defined by the environmental variables

Dimension Reduction (Ordination) ‘Constrained’ by the Environmental Variables

Constrained?

Working with the Variability that we Can Explain

• Start with all the variability in the response variables.

• Replace the original observations with their fitted values from a model employing the environmental variables as explanatory variables (discarding the residual variability).

• Carry out gradient analysis on the fitted values.
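The second step above, replacing observations with fitted values, can be sketched with ordinary least squares. The example is in Python, uses one environmental variable and two responses, and all numbers are made up for illustration. It shows only the split of the variability into an explained and a residual part; a constrained ordination would then run the gradient analysis on the fitted table.

```python
# Illustrative (made-up) data: one environmental variable and two responses
# measured at six sites. None of these numbers come from the slides.
moisture = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
responses = {
    "species_a": [2.0, 3.5, 3.0, 5.5, 6.0, 7.5],
    "species_b": [8.0, 6.5, 7.0, 4.0, 4.5, 2.0],
}

def mean(values):
    return sum(values) / len(values)

def fitted_values(x, y):
    """Least-squares fit of y on x; returns the fitted value at each site."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [intercept + slope * a for a in x]

def sum_of_squares(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

total_ss = 0.0      # all the variability in the responses
explained_ss = 0.0  # variability of the fitted values only
fitted = {}
for name, y in responses.items():
    fitted[name] = fitted_values(moisture, y)
    total_ss += sum_of_squares(y)
    explained_ss += sum_of_squares(fitted[name])

residual_ss = total_ss - explained_ss  # the part that is discarded
print(f"total {total_ss:.2f} = explained {explained_ss:.2f} + residual {residual_ss:.2f}")
```

The `fitted` table has the same shape as the original responses, which is why the same ordination methods can be applied to it.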

Unconstrained/Constrained

• Unconstrained ordination axes correspond to the directions of the greatest variability within the data set.

• Constrained ordination axes correspond to the directions of the greatest variability of the data set that can be explained by the environmental variables.

Direct Gradient Analysis

• Environmental gradients are constructed from the relationship between species and environmental variables

• Three methods:
  – Redundancy Analysis - linear model
  – Canonical (or Constrained) Correspondence Analysis - unimodal model
  – Detrended CCA - modified unimodal model

Dune Data Unconstrained

[Figure: correspondence analysis of the dune data, CA1 against CA2, with abbreviated species labels and site numbers 1 to 20]

Dune Data Constrained

[Figure: canonical correspondence analysis of the dune data, CCA1 against CCA2, with abbreviated species labels, site numbers 1 to 20, arrows for the constraints (Manure.L, Manure.Q, Manure.C, Manure^4, Moisture.L, Moisture.Q, Moisture.C, A1) and centroids for Manure0 to Manure4 and Moisture1, Moisture2, Moisture4, Moisture5]

How Similar are Objects/Samples/Individuals/Sites?

Similarity approaches, or what do we mean by similar?

Different types of data

example

Continuous data : height

Categorical data

ordered (ordinal) : growth rate (very slow, slow, medium, fast, very fast)

not ordered (nominal) : fruit colour (yellow, green, purple, red, orange)

Binary data : fruit / no fruit

Different scales of measurement

example

Large Range : soil ion concentrations

Restricted Range : air pressure

Constrained : proportions

Large numbers : altitude

Small numbers : attribute counts

Do we standardise measurement scales to make them equivalent? If so what do we lose?
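A small sketch of what is at stake, in Python with made-up values: with raw measurements the variable with the largest numbers (altitude) dominates the Euclidean distance, while after rescaling each variable to mean 0 and standard deviation 1 both variables contribute on an equal footing. The site names and values are hypothetical.

```python
import math

# Illustrative (made-up) sites: altitude in metres (large numbers) and the
# proportion of bare ground (constrained to 0..1).
sites = {
    "A": (120.0, 0.10),
    "B": (125.0, 0.90),
    "C": (480.0, 0.12),
}

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def standardise(table):
    """Rescale each variable to mean 0 and standard deviation 1 (z-scores)."""
    names = list(table)
    columns = list(zip(*table.values()))
    scaled_columns = []
    for col in columns:
        m = sum(col) / len(col)
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1))
        scaled_columns.append([(v - m) / sd for v in col])
    return {name: tuple(row) for name, row in zip(names, zip(*scaled_columns))}

raw_ab = euclidean(sites["A"], sites["B"])  # differ mainly in bare ground
raw_ac = euclidean(sites["A"], sites["C"])  # differ mainly in altitude
z = standardise(sites)
z_ab = euclidean(z["A"], z["B"])
z_ac = euclidean(z["A"], z["C"])

print(f"raw:          A-B {raw_ab:.2f}, A-C {raw_ac:.2f}")
print(f"standardised: A-B {z_ab:.2f}, A-C {z_ac:.2f}")
```

What is lost is the original units and any real difference in how variable the measurements are: after standardising, a variable that barely changes across sites counts as much as one that changes a great deal.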

Similarity matrix

We define a similarity between units – like the correlation between continuous variables.

(also can be a dissimilarity or distance matrix)

A similarity can be constructed as an average of the similarities between the units on each variable.

(can use weighted average)

This provides a way of combining different types of variables.
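As a sketch of this averaging idea, in the spirit of Gower's general similarity coefficient, the Python example below scores each variable between 0 and 1 and then takes a (weighted) mean. The plants, their values and the equal default weights are all illustrative assumptions.

```python
# Each unit has a continuous variable (height), an unordered categorical
# variable (fruit colour) and a binary variable (fruit present).
# The values are illustrative, not taken from the slides.
units = {
    "plant1": {"height": 10.0, "colour": "red", "fruit": 1},
    "plant2": {"height": 14.0, "colour": "red", "fruit": 0},
    "plant3": {"height": 30.0, "colour": "green", "fruit": 1},
}

heights = [u["height"] for u in units.values()]
height_range = max(heights) - min(heights)

def similarity(a, b, weights=(1.0, 1.0, 1.0)):
    """Weighted average of per-variable similarities, each between 0 and 1."""
    per_variable = [
        1.0 - abs(a["height"] - b["height"]) / height_range,  # continuous
        1.0 if a["colour"] == b["colour"] else 0.0,           # categorical
        1.0 if a["fruit"] == b["fruit"] else 0.0,             # binary
    ]
    return sum(w * s for w, s in zip(weights, per_variable)) / sum(weights)

s12 = similarity(units["plant1"], units["plant2"])
s13 = similarity(units["plant1"], units["plant3"])
print(f"plant1 vs plant2: {s12:.3f}")
print(f"plant1 vs plant3: {s13:.3f}")
```

Dividing the height difference by the observed range is one common way to put a continuous variable on the same 0 to 1 scale as the matching scores; other scalings are possible.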

Distance metrics

relevant for continuous variables:

• Euclidean

• city block or Manhattan

[Figure: two diagrams, each showing the distance between points A and B]

(also many other variations)
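The two metrics named above differ only in how the per-variable differences are combined. A minimal Python sketch, with two hypothetical points A and B:

```python
import math

def euclidean(a, b):
    """Straight-line distance between points a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City block distance: sum of absolute differences along each axis."""
    return sum(abs(x - y) for x, y in zip(a, b))

A = (1.0, 2.0)
B = (4.0, 6.0)
print(euclidean(A, B))  # 5.0
print(manhattan(A, B))  # 7.0
```

The city block distance is never smaller than the Euclidean distance, and the two agree only when the points differ on a single variable.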

Similarity coefficients for binary data

• simple matching: count if both units 0 or both units 1

• Jaccard: count only if both units 1

(also many other variants, e.g. Bray-Curtis)

simple matching can be extended to categorical data

[Figure: two 2 x 2 grids of the possible pairs 0,1 / 1,1 / 0,0 / 1,0]
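The difference between the two coefficients is whether joint absences (0,0) count as agreement. A short Python sketch, using two hypothetical sites scored for the presence of eight species:

```python
def match_counts(u, v):
    """Count joint presences, joint absences and mismatches."""
    both_present = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_absent = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    mismatches = len(u) - both_present - both_absent
    return both_present, both_absent, mismatches

def simple_matching(u, v):
    """Agreements of either kind (1,1 or 0,0) over all variables."""
    both_present, both_absent, _ = match_counts(u, v)
    return (both_present + both_absent) / len(u)

def jaccard(u, v):
    """Joint presences over variables present in at least one unit.

    Joint absences are ignored; undefined if both units are all zeros.
    """
    both_present, _, mismatches = match_counts(u, v)
    return both_present / (both_present + mismatches)

# Illustrative presence/absence records for eight species at two sites.
site1 = [1, 1, 0, 0, 0, 0, 1, 0]
site2 = [1, 0, 0, 0, 0, 0, 1, 0]
print(simple_matching(site1, site2))  # 0.875
print(jaccard(site1, site2))
```

Here the five shared absences push simple matching to 7/8, while Jaccard gives 2/3. Which is appropriate depends on whether a shared absence is informative for the data at hand.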

A Distance Matrix

Uses of Distances

Distance/Dissimilarity can be used to:

• Explore dimensionality in data using Principal coordinate analysis (PCO or PCoA)

• Provide a basis for clustering/classification

UK Wet Deposition Network

[Figure: ordination of the UK Wet Deposition Network monitoring sites, Dim1 against Dim2 (both axes from -400 to 400), with each site labelled by name]

Grouping methods

Cluster Analysis

• hierarchical
  – divisive: put everything together and split (monothetic / polythetic)
  – agglomerative: keep everything separate and join the most similar points (classical cluster analysis)

• non-hierarchical
  – k-means clustering

Clustering methods

Agglomerative hierarchical

• Single linkage or nearest neighbour
  – finds the minimum spanning tree: shortest tree that connects all points
  – chaining can be a problem

• Complete linkage or furthest neighbour
  – compact clusters of approximately equal size (makes compact groups even when none exist)

• Average linkage methods
  – between single and complete linkage
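The linkage rules differ only in how the distance between two clusters is defined. The Python sketch below is a deliberately naive agglomerative procedure (it recomputes every distance at every step, so it suits only small examples). The points are made up, with steadily widening gaps so that the chaining behaviour of single linkage shows up.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_distance(c1, c2, points, linkage):
    """Distance between two clusters (lists of point indices)."""
    pairwise = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    if linkage == "single":      # nearest neighbour
        return min(pairwise)
    if linkage == "complete":    # furthest neighbour
        return max(pairwise)
    if linkage == "average":     # mean of all pairwise distances
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerate(points, n_clusters, linkage):
    """Start with every point separate and repeatedly join the closest pair."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b], points, linkage)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Illustrative points on a line with steadily widening gaps.
points = [(0.0,), (1.0,), (2.1,), (3.3,), (4.6,), (6.0,)]
single = agglomerate(points, 2, "single")
complete = agglomerate(points, 2, "complete")
print("single:  ", single)    # [[0, 1, 2, 3, 4], [5]]
print("complete:", complete)  # [[0, 1, 2, 3], [4, 5]]
```

Single linkage chains the points together one at a time and leaves a lone point as the second cluster, whereas complete linkage returns two more compact groups from the same data.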

From Alexandria to Suez

Hierarchical Clustering

[Figure: Cluster Dendrogram, hclust (*, "complete") on des.dist; leaves are the samples ALEX01 to ALEX10 and CSRA11 to CSRA35, vertical axis Height from 0.0 to 1.0; shown twice]

[Figure: ordination of the samples on CA1 against CA2, each sample plotted as +]

Building and testing models

Basically you just approach this in the same way as for multiple regression – so there are the same issues of variable selection, interactions between variables, etc.

However, the basis of any statistical test that relies on distributional assumptions is more problematic, so there is much greater use of randomisation tests and permutation procedures to evaluate the statistical significance of results.
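A permutation procedure can be sketched in a few lines. The Python example below tests a correlation by shuffling one variable many times and asking how often the shuffled data give a correlation at least as strong as the observed one. The data, the number of permutations and the seed are illustrative assumptions; ordination software typically permutes a different test statistic, but the logic is the same.

```python
import random

def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(x, y, n_permutations=999, seed=1):
    """Proportion of arrangements with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(correlation(x, y))
    shuffled = list(y)
    count = 1  # the observed arrangement counts as one permutation
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between x and y
        if abs(correlation(x, shuffled)) >= observed:
            count += 1
    return count / (n_permutations + 1)

# Illustrative (made-up) values: an ordination axis score and an
# environmental variable measured at ten sites.
axis_score = [-1.8, -1.1, -0.9, -0.2, 0.1, 0.3, 0.8, 1.0, 1.5, 2.1]
moisture = [1.0, 2.0, 1.5, 3.0, 2.5, 4.0, 3.5, 5.0, 4.5, 6.0]
p = permutation_p_value(axis_score, moisture)
print(f"permutation p-value: {p:.3f}")
```

No distributional assumption is needed: the reference distribution is built from the data themselves. The smallest attainable p-value is 1 / (n_permutations + 1).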

Some Examples

Part of Fig 4.

What Technique?

Response variable(s) ... | Predictor(s): No                                 | Predictor(s): Yes
... is one               | • distribution summary                           | • regression models
... are many             | • indirect gradient analysis (PCA, CA, DCA, MDS) | • direct gradient analysis
                         | • cluster analysis                               | • constrained cluster analysis
                         |                                                  | • discriminant analysis (CVA)

Raw Data

Linear Regression

Two Regressions

Principal Components

[Figures: four scatterplots of Diameter (8 to 20) against Height (65 to 85), one under each of the four titles above]
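The contrast between regression and principal components can be made concrete for two variables. The Python sketch below assumes that "Two Regressions" refers to regressing Diameter on Height and Height on Diameter; the measurements are made up and are not the data behind the plots. It works on the raw covariances, so the principal axis depends on the measurement units.

```python
import math

# Illustrative (made-up) tree measurements; not the data from the slides.
height = [65.0, 70.0, 72.0, 75.0, 78.0, 80.0, 83.0, 85.0]
diameter = [9.0, 11.0, 10.5, 13.0, 15.5, 14.0, 18.0, 17.5]

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

sxx = covariance(height, height)
syy = covariance(diameter, diameter)
sxy = covariance(height, diameter)

# Regression of diameter on height: minimises vertical residuals.
slope_y_on_x = sxy / sxx
# Regression of height on diameter, drawn in the same (height, diameter)
# plane: minimises horizontal residuals.
slope_x_on_y = syy / sxy
# First principal component: direction of greatest variance, which
# minimises perpendicular distances to the line.
angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
slope_pc1 = math.tan(angle)

print(f"diameter on height: {slope_y_on_x:.3f}")
print(f"first component:    {slope_pc1:.3f}")
print(f"height on diameter: {slope_x_on_y:.3f}")
```

For positively correlated variables the slope of the first principal component lies between the two regression slopes, because it treats both variables symmetrically instead of designating one as the response.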

Models of Species Response

There are (at least) two models:-

• Linear - species increase or decrease along the environmental gradient

• Unimodal - species rise to a peak somewhere along the environmental gradient and then fall again

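Linear

[Figure: linear response curve; gradient axis from -0.4 to +0.4, response axis from 0.0 to 7.0]

Unimodal

[Figure: unimodal response curve; gradient axis from -2.5 to +3.5, response axis from 0.0 to 250.0]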
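The two response models can be written as simple functions of position along the gradient. In the Python sketch below the unimodal model is taken to be a Gaussian curve, which is a common choice but an assumption here. The parameter names (`optimum`, `tolerance`, `maximum`) and all values are illustrative, chosen only to fall within the axis ranges of the two plots.

```python
import math

def linear_response(x, intercept, slope):
    """Abundance changes at a constant rate along the gradient."""
    return intercept + slope * x

def unimodal_response(x, optimum, tolerance, maximum):
    """Gaussian curve: abundance peaks at the optimum and falls either side."""
    return maximum * math.exp(-0.5 * ((x - optimum) / tolerance) ** 2)

# Gradient positions -2.5, -2.0, ..., 3.5 for the unimodal curve.
gradient = [i / 2 for i in range(-5, 8)]
unimodal = [unimodal_response(x, optimum=0.5, tolerance=1.0, maximum=250.0)
            for x in gradient]
# A short stretch of gradient for the linear curve.
linear = [linear_response(x, intercept=3.5, slope=8.0) for x in (-0.4, 0.0, 0.4)]

peak_position = gradient[unimodal.index(max(unimodal))]
print("linear:", [round(v, 2) for v in linear])
print("unimodal peak at", peak_position, "with abundance", max(unimodal))
```

Over a short enough stretch of gradient a unimodal curve looks roughly linear, which is one reason the choice between linear and unimodal methods depends on the length of gradient sampled.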

Non-metric multidimensional scaling

NMDS maps the observed dissimilarities onto an ordination space by trying to preserve their rank order in a low number of dimensions (often 2) – but the solution is linked to the number of dimensions chosen

it is like a non-linear version of PCO

define a stress function and look for the mapping with minimum stress (e.g. the sum of squared residuals in a monotonic regression of the distances in NMDS space on the original dissimilarities)

need to use an iterative process, so try with many different starting points and convergence is not guaranteed
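To show what a stress function measures, the Python sketch below evaluates one for a given configuration; it does not perform the iterative search. It uses a normalised, square-rooted form of the sum of squared residuals, in the style of Kruskal's stress-1, which is a choice made for this sketch. The monotonic regression is done with the pool-adjacent-violators algorithm, ties in the dissimilarities are not treated specially, and the dissimilarities and configurations are made up.

```python
import math

def monotone_fit(values):
    """Pool adjacent violators: least-squares non-decreasing fit."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while a block mean exceeds the one after it.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

def stress(dissimilarities, config):
    """Stress of a configuration: 0 means the rank order is fully preserved."""
    pairs = sorted(dissimilarities, key=dissimilarities.get)  # rank order
    dist = [math.dist(config[i], config[j]) for i, j in pairs]
    fitted = monotone_fit(dist)
    residual = sum((d - f) ** 2 for d, f in zip(dist, fitted))
    return math.sqrt(residual / sum(d ** 2 for d in dist))

# Illustrative dissimilarities between three units.
dissimilarities = {("a", "b"): 0.2, ("a", "c"): 0.5, ("b", "c"): 0.9}

# One configuration that preserves the rank order and one that does not.
good = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (-2.0, 0.0)}
bad = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (1.0, 0.0)}
print(stress(dissimilarities, good))  # 0.0
print(stress(dissimilarities, bad))
```

Only the rank order of the dissimilarities enters the calculation, which is what makes the method non-metric.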

Procrustes rotation

[Figure: two "Procrustes errors" plots, each showing Dimension 1 against Dimension 2]

used to compare graphically two separate ordinations
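For two-dimensional ordinations the rotation has a closed form, so the idea can be sketched without any matrix library. The Python example below centres and scales both configurations, rotates one onto the other by least squares, and reports the per-point residuals (the "Procrustes errors"). It is a simplified sketch: reflections are not considered, and the configurations are made up.

```python
import math

def centre_and_scale(points):
    """Move the centroid to the origin and scale to unit sum of squares."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    centred = [(x - cx, y - cy) for x, y in points]
    size = math.sqrt(sum(x * x + y * y for x, y in centred))
    return [(x / size, y / size) for x, y in centred]

def procrustes_2d(target, other):
    """Rotate `other` onto `target`; returns rotated points and errors."""
    t = centre_and_scale(target)
    o = centre_and_scale(other)
    # Least-squares rotation angle for two-dimensional configurations.
    num = sum(ox * ty - oy * tx for (tx, ty), (ox, oy) in zip(t, o))
    den = sum(ox * tx + oy * ty for (tx, ty), (ox, oy) in zip(t, o))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * x - s * y, s * x + c * y) for x, y in o]
    errors = [math.hypot(rx - tx, ry - ty)
              for (rx, ry), (tx, ty) in zip(rotated, t)]
    return rotated, errors

# Illustrative check: `other` is `target` rotated by 40 degrees, enlarged
# three times and shifted, so the errors should be essentially zero.
target = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 3.0)]
angle = math.radians(40.0)
other = [(3 * (math.cos(angle) * x - math.sin(angle) * y) + 5.0,
          3 * (math.sin(angle) * x + math.cos(angle) * y) - 1.0)
         for x, y in target]
rotated, errors = procrustes_2d(target, other)
print(max(errors))
```

With two real ordinations of the same sites the errors will not vanish; sites with large errors are the ones on which the two ordinations disagree most.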
