Method comparison on graph based models for species occurrences prediction - Methods And First Results - Jörn Vorwald BTU Cottbus


Page 1: Jörn Vorwald BTU Cottbus

Method comparison on graph based models for species occurrences prediction

- Methods And First Results -

Jörn Vorwald, BTU Cottbus

Page 2: Jörn Vorwald BTU Cottbus

Overview

1. Motivation

2. Basics

a) Graph Theory

b) Model Classification

3. Methods

a) Field Ecology

b) GIS

c) Statistics

4. Results

5. Outlook

Overview – Motivation - Basics – Methods – Results - Outlook

[Figure: plot with a vertical axis from 0.0 to 1.0 over the model pairs dl_ga, dl_kr, dl_nn, dl_vo, dl_fo, ga_kr, ga_nn, ga_vo, ga_fo, kr_nn, kr_vo, kr_fo, nn_vo, nn_fo, vo_fo]

$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{T_i^2}{n_i} - 3(n+1)$$

Page 3: Jörn Vorwald BTU Cottbus

Motivation

• Atlas project for grasshoppers and bush crickets in Brandenburg, started in 1996
• First 63 sampling sites in 1997: 61 in SPN and Cottbus, 2 outside; damselflies and dragonflies added to the investigation
• Next 60 sites in 1998, all in SPN and Cottbus
• Completion in 1999
• Last 35 sites in 2000 in SPN, with the target of local aggregation
• In 1999, first idea beyond atlases: an information theory based approach answering the question 'How much information is enough when you cannot get complete information?'
• In 2003, second idea: compare graph based models for the prediction of species occurrences

Page 4: Jörn Vorwald BTU Cottbus

From Atlas Project To Modelling

• Brandenburg
  • 299 TK-25
• SPN/CB
  • 27 TK-25
  • 22 selected
  • contain 88 TK-10
  • 65 selected, 158 sites
  • 50 for buffering, 106 sites

Page 5: Jörn Vorwald BTU Cottbus

Basics – Graph Theory

• What is a graph?

• What are special graphs?

• What is adjacency in graphs?

• What are weighted edges?

• What kinds of graphs are common in ecological modelling?

• What kinds of graphs are used in my approach?

Page 6: Jörn Vorwald BTU Cottbus

Graph Theory - Graphs

• A graph is a system of points and of lines connecting the points (Bodendiek & Lang 1995).

• More precisely, a graph consists of a set of points and a set of lines connecting points. The set of lines may be empty. Usually the points are called vertices and the lines are called edges.

Page 7: Jörn Vorwald BTU Cottbus

Graph Theory – Special Graphs

• Complete graphs

• Wheels

• Stars

• Cycles

• Trees

• Platonic graphs

• Petersen graph

Page 8: Jörn Vorwald BTU Cottbus

Graph Theory - Adjacency

• When an edge connects two vertices, the vertices are called 'incident' to the edge, or, equivalently, the edge is incident to each vertex. The two vertices are then adjacent to each other.

[Figure: example graphs with vertices v1 to v4, x1 to x3 and z1 to z5, and edges e1 to e8]

Page 9: Jörn Vorwald BTU Cottbus

Graph Theory – Edge-weighting

• Each edge can be weighted by adding a special attribute.

• Some important problems of graph theory and computer science are related to weighted graphs (e.g. optimisation problems, travelling salesman problem).

[Figure: graph with vertices z1 to z5 and weighted edges e1 to e8]

e1 = 6   e5 = 5
e2 = 4   e6 = 5
e3 = 8   e7 = 6
e4 = 3   e8 = 7

Σe = 44
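The edge weights listed on this slide can be checked with a few lines of Python. This is only a minimal sketch: the weights are copied from the slide, but the slide text does not say which vertices each edge joins, so the edges are keyed by name only. The function name `total_weight` is made up for illustration.

```python
# Edge weights as given on the slide. The end points of each edge are not
# recoverable from the text, so edges are identified by name only.
edge_weights = {"e1": 6, "e2": 4, "e3": 8, "e4": 3,
                "e5": 5, "e6": 5, "e7": 6, "e8": 7}


def total_weight(weights):
    """Return the sum of all edge weights of a graph."""
    return sum(weights.values())


print(total_weight(edge_weights))  # 44, matching the slide's sum
```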

Page 10: Jörn Vorwald BTU Cottbus

Graphs In Ecological Modelling

• Graph based models are rare in ecology.

• Two kinds of graphs were found in the literature review:
  • Voronoi (Dirichlet, Thiessen) tessellation (e.g. Byers 1992, Mercier & Baujard 1997, Okabe et al. 2000)
  • Gabriel graph (Gabriel & Sokal 1969)

• In graphs, usually 'only' adjacency can be used for modelling.

• Polygon methods can introduce more realistic assumptions about abiotic and biotic factors influencing sampling sites or target organisms.

Page 11: Jörn Vorwald BTU Cottbus

Graphs In This Approach

• Delaunay triangulation (dl)

• Gabriel graph (ga)

• Minimum spanning tree by Kruskal algorithm (kr)

• Nearest neighbours (nn)

• Voronoi tessellation (vo)

Page 12: Jörn Vorwald BTU Cottbus

Delaunay Triangulation

• In a Delaunay triangulation, triangles of 3 vertices and 3 edges each are established to partition the complete surface of interest, i.e. the area between the vertices.
• Algorithm ('divide and conquer'):
  • A triangle of edges is drawn between three points.
  • The Delaunay constraint is checked:
    • No fourth point lies within the circumcircle of the triangle.
    • (Additional: the sum of two angles is greater than 30°.)
  • A second triangle is drawn.
  • The Delaunay constraint is checked again …
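The circumcircle constraint named above can be sketched as follows. This is an illustrative check, not the Java program used in the study; the function names are hypothetical, and the test uses the standard determinant form of the in-circle predicate. Points exactly on the circle are treated as outside.

```python
def orientation(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, p):
    """Return True if point p lies strictly inside the circumcircle of abc."""
    if orientation(a, b, c) < 0:
        b, c = c, b  # the determinant test expects counter-clockwise order
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0


triangle = ((0, 0), (4, 0), (0, 4))
print(in_circumcircle(*triangle, (2, 2)))  # True: violates the constraint
print(in_circumcircle(*triangle, (5, 5)))  # False: triangle may be kept
```

A triangle satisfies the Delaunay constraint when `in_circumcircle` is False for every other point of the set.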

Page 14: Jörn Vorwald BTU Cottbus

Gabriel Graph

• A Gabriel graph is constructed similarly to a Delaunay triangulation.
• In practice, edges may be rejected from the graph due to external conditions.
• Algorithm:
  • Draw an edge between the two points with minimal distance (nearest neighbours).
  • Check the constraint: no third point may lie within the circle that has the edge as its diameter.
  • Draw an edge between one of the first points and a third point.
  • Check the constraint again. If the new edge violates the constraint, it is rejected from the graph.
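The Gabriel constraint (no third point inside the circle with the edge as diameter) can be illustrated with a brute-force sketch. It tests every pair of points against all others rather than following the incremental procedure of the slide, and the function names are hypothetical. Points lying exactly on the circle are accepted here, which is an assumption.

```python
def is_gabriel_edge(p, q, points):
    """True if no other point lies strictly inside the circle with diameter pq."""
    mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2          # circle centre
    r2 = ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) / 4      # squared radius
    return all((x - mx) ** 2 + (y - my) ** 2 >= r2
               for (x, y) in points if (x, y) != p and (x, y) != q)


def gabriel_graph(points):
    """Return all Gabriel edges as pairs of point indices (brute force)."""
    n = len(points)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if is_gabriel_edge(points[i], points[j], points)]


points = [(0, 0), (2, 0), (1, 0.5)]
print(gabriel_graph(points))  # [(0, 2), (1, 2)]: edge 0-1 is rejected
```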

Page 16: Jörn Vorwald BTU Cottbus

Minimum Spanning Tree

• A minimum spanning tree is a tree that contains all vertices and whose sum of edge lengths is not greater than that of any other spanning tree.
• Algorithm (Kruskal):
  • Choose an edge with minimal distance (nearest neighbours). If more than one exists, choose one at random.
  • Choose a second edge with minimal or next larger distance.
  • Choose a third edge under the same condition.
  • Check the constraint: the edges must not form a cycle. If they do, reject the last chosen edge.
  • Choose a new edge. Check the constraint again.
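Kruskal's procedure as outlined above can be sketched in a few lines. This is a generic textbook version using a union-find structure for the cycle check, not the program used in the study; ties are broken by index order instead of at random, and the function name `kruskal` is chosen for illustration.

```python
from math import dist  # Python 3.8+


def kruskal(points):
    """Return the edges (i, j, length) of a minimum spanning tree."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the representative of i's component.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All edges of the complete graph, ordered by ascending length.
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    tree = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # the edge closes no cycle: accept it
            parent[ri] = rj
            tree.append((i, j, length))
    return tree


tree = kruskal([(0, 0), (1, 0), (3, 0), (3, 1)])
print([(i, j) for i, j, _ in tree])  # [(0, 1), (2, 3), (1, 2)]
```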

Page 18: Jörn Vorwald BTU Cottbus

Nearest Neighbours

• A nearest neighbour graph is usually a set of disconnected subgraphs, in which each vertex is connected to the vertex at minimum distance. (Nevertheless, a vertex may get a connection to two vertices.)
• Algorithm:
  • Calculate the distances within a complete graph. Order the distances ascending.
  • Start with the minimum distance and draw an edge.
  • Check the constraint: all vertices must be included.
  • Continue with the next larger distance and draw a new edge.
  • Check the constraint again.
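A nearest neighbour graph can also be built directly from its definition, by linking each vertex to its closest other vertex. The sketch below takes that route rather than the ordered-distance procedure on the slide; the function name is hypothetical and ties are resolved by index order, which is an assumption.

```python
from math import dist  # Python 3.8+


def nearest_neighbour_graph(points):
    """Return the undirected edges linking each point to its nearest neighbour."""
    edges = set()
    for i, p in enumerate(points):
        others = (k for k in range(len(points)) if k != i)
        j = min(others, key=lambda k: dist(p, points[k]))
        edges.add((min(i, j), max(i, j)))  # store each edge once
    return sorted(edges)


points = [(0, 0), (1, 0), (3, 0), (10, 0), (11, 0)]
print(nearest_neighbour_graph(points))  # [(0, 1), (1, 2), (3, 4)]
```

In this example the graph falls into two disconnected subgraphs, and vertex 1 is connected to two vertices, as noted on the slide.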

Page 20: Jörn Vorwald BTU Cottbus

Voronoi Diagram

• A Voronoi diagram is the dual graph of a Delaunay triangulation, i.e. each edge within a Voronoi diagram is perpendicular to an edge within the Delaunay triangulation.
• Each point within a Voronoi cell is nearer to the centre of that cell than to any other cell centre.
• Algorithm:
  • Select two points (e.g. the top-left point and its nearest neighbour) and temporarily draw a line between them.
  • Draw an edge through the midpoint of the line, perpendicular to it; remove the line.
  • Select a third point and temporarily draw lines between it and all its neighbours. Create edges perpendicular to each of the lines. Cut the edges at their intersection points. Remove the lines.
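The defining property of a Voronoi cell (every location belongs to the cell of its nearest centre) can be illustrated without constructing the cell edges. The sketch below only answers the membership question by brute force; it does not build the diagram, and the function name and coordinates are hypothetical.

```python
from math import dist  # Python 3.8+


def voronoi_cell_of(location, sites):
    """Return the index of the site whose Voronoi cell contains `location`."""
    return min(range(len(sites)), key=lambda i: dist(location, sites[i]))


sites = [(0, 0), (10, 0), (0, 10)]      # hypothetical cell centres
print(voronoi_cell_of((2, 1), sites))   # 0
print(voronoi_cell_of((8, 3), sites))   # 1
print(voronoi_cell_of((1, 9), sites))   # 2
```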

Page 22: Jörn Vorwald BTU Cottbus

Model Classification

• Multidimensional vector of classes

• Classifications are rare in the literature reviewed:
  • Levins (1966), Sharpe (1990), Refsgaard (1996), eWater Ltd. (2006)

• Classifications are rarely reflected upon

• Not every model can be classified explicitly

Page 23: Jörn Vorwald BTU Cottbus

Model Classifications

• By type
  • mechanistic
  • statistical
• By time complexity
  • static
  • dynamic
• By species complexity
  • single species
  • multiple species
• By data distribution
  • localised
  • gridded
• By purpose
  • screening
  • research
  • planning, monitoring, assessment

Page 24: Jörn Vorwald BTU Cottbus

Model Classifications

• By extent
  • local (x=1)
  • regional (x=2)
  • continental (x=3)
• By number
  • presence only (y=1)
  • presence and absence (y=2)
  • activity, abundance (y=3)
• By background
  • empirical (z=1)
  • causal (z=2)

Page 25: Jörn Vorwald BTU Cottbus

Model Classification

[Figure: plot with the axes extent, number and background, showing the positions of Byers (1992), Boyce (2003), B & E (1993), Ferrier (2002) and Vorwald (2006)]

• Byers (1992)
  • statistical
  • static
  • 3 bark beetle species, each treated as a single species
  • extent: bark of a single tree
  • presence only
  • localised data
  • for research
  • causal
• Boyce et al. (2003)
  • statistical: logistic regression
  • dynamic: summer/winter
  • single species: elk
  • extent: Yellowstone National Park
  • relative abundance
  • localised data
  • for monitoring
  • empirical
• Buckland & Elston (1993)
  • statistical: GLM
  • static
  • single species: green woodpecker, red deer
  • extent: north-east Scotland
  • relative abundance
  • gridded data
  • for screening
  • causal
• Ferrier et al. (2002)
  • statistical
  • static
  • community level
  • extent: North East New South Wales
  • presence/absence
  • gridded data
  • for monitoring
  • causal
• Vorwald (2006)
  • statistical
  • static
  • community level
  • extent: CB/SPN
  • relative activity
  • localised data
  • for screening
  • empirical

Page 26: Jörn Vorwald BTU Cottbus

Methods – Field Ecology

• Selection of sampling sites
  • First site set: one site in each topographic map 1:10,000 within SPN or CB
  • Second set: same procedure
  • Third set: unobserved topographic map (1:10,000) squares within 4 selected topographic maps 1:25,000, with one site each
• Criteria:
  • Preferably grassland with a gradient in wetness
  • Preferably open water (creek, river, pond or lake)
  • Preferably old trees on or near the site

Page 27: Jörn Vorwald BTU Cottbus

Methods – Field Ecology

• Observation
  • Visual observation (grasshoppers, bush crickets, damselflies and dragonflies)
  • Net capturing (all groups), specimen collection
  • Acoustic observation (grasshoppers and bush crickets)
    • By ear
    • With bat detector support
• Documentation
  • Field forms
  • Database

Page 28: Jörn Vorwald BTU Cottbus

Methods – GIS

• Preparation:
  • Sets of sampling and buffer sites exported from the database to plain text files
  • Calculation of graphs within an adapted Java program
  • Export of results to plain text files
  • Import of text file information into GIS for visualisation and preparation of intersection
• Intersection of Voronoi diagrams in GIS, export of relevant information on the intersected polygons to plain text files
• Calculation of species vectors in the database

Page 29: Jörn Vorwald BTU Cottbus

KNOWN  BUF_DS     SHAPE           PREDICT
97     05         97_05           98
97     06         97_06           98
97     05_06      97_05_06        98
98     05         98_05           97
98     06         98_06           97
98     05_06      98_05_06        97
97_1   97_2_05_1  97_1_97_2_05_1  00
97_1   97_2_06_1  97_1_97_2_06_1  00

GIS - Preparation

ID  SHORT           START  SUBSET  X_COORD        Y_COORD
1   Jessern         1997           4651304,13163  5768510,75401
2   Groß Drewitz    1997           4679614,60514  5767082,00114
3   TÜP Lieberose   1997           4660128,00351  5757345,31492
4   Staakow         1997           4665327,07645  5764012,82831
12  Weidenweg       1997   2       4646409,33013  5749301,96544
13  Paulicks Mühle  1997   2       4646105,05869  5743732,47509
14  Byhleguhre      1997   2       4650020,89988  5750267,69655
29  Dahlitz         1997   1       4653737,66796  5739262,97754
30  Zahsow          1997   1       4656727,79354  5739109,94034
31  Koselmühle      1997   1       4651528,22304  5733255,17356

NodeId  NodeX       NodeY
1       4651304.13  5768510.75
2       4679614.61  5767082
3       4660128     5757345.31
4       4665327.08  5764012.83
12      4646409.33  5749301.97
13      4646105.06  5743732.48
14      4650020.9   5750267.7
29      4653737.67  5739262.98
30      4656727.79  5739109.94
31      4651528.22  5733255.17
32      4658004.59  5733498.77

Line output (line id, x, y; two rows per line, one for each end point):

0  4632582.89000  5745551.93000
0  4636575.00000  5741010.00000
1  4638150.00000  5755400.00000
1  4632582.89000  5745551.93000
2  4638155.00000  5733600.00000
2  4636575.00000  5741010.00000
3  4638735.00000  5734745.00000
3  4638155.00000  5733600.00000
4  4638735.00000  5734745.00000
4  4636575.00000  5741010.00000
5  4639150.48000  5730300.92000
5  4638155.00000  5733600.00000

Neighbour output (site, neighbour, distance):

180  186  6047.0
186  180  6047.0
173  180  11313.0
180  173  11313.0
187  186  7577.0
186  187  7577.0
188  187  1284.0
187  188  1284.0
188  186  6627.0
186  188  6627.0
192  187  3446.0
187  192  3446.0
192  188  4463.0
188  192  4463.0
193  192  6464.0
192  193  6464.0
182  186  7531.0
186  182  7531.0

Page 31: Jörn Vorwald BTU Cottbus

GIS - Preparation

• 76 point sets for input (one file each):
  • 62 for all graph types (26 with 'known' and 'buffer' points, 36 with 'known', 'buffer' and points to 'predict')
  • 14 for graph types except Voronoi graphs (6 with 'known' points (without 'buffer'), 8 with 'known' points and points to 'predict')
• 670 files as output:
  • 62 * 5 (graph types) for all lines
  • 62 * 4 for all neighbouring points for all types except Voronoi
  • 14 * 4 for lines of graph types except Voronoi
  • 14 * 4 for neighbouring points of graph types except Voronoi
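The file counts above add up as stated; the short check below simply redoes the arithmetic with the numbers from the slide. The variable names are made up for illustration.

```python
# Input point sets (numbers taken from the slide).
sets_all_graphs = 26 + 36        # with buffer points, usable for all 5 graph types
sets_without_voronoi = 6 + 8     # without buffer points, 4 graph types only
input_files = sets_all_graphs + sets_without_voronoi

# Output files: line files plus neighbour files.
output_files = (sets_all_graphs * 5            # lines, all graph types
                + sets_all_graphs * 4          # neighbours, all types except Voronoi
                + sets_without_voronoi * 4     # lines, types except Voronoi
                + sets_without_voronoi * 4)    # neighbours, types except Voronoi

print(input_files, output_files)  # 76 670
```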

Page 32: Jörn Vorwald BTU Cottbus

GIS - Intersection

• Each Voronoi diagram with 'known' sites and 'buffer' sites has to be intersected with the corresponding Voronoi diagram with the sites to 'predict' added.
• Buffering to avoid 'edge effects' (Kenkel et al. 1989)
• 36 intersections in total
• Split into small polygons with two parents:
  • One from a known or buffering site
  • One from a site to be predicted
• Calculation of the area of each polygon and of its ratio to the area of the parent
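The area step can be illustrated outside a GIS. The sketch below assumes each polygon is available as an ordered list of (x, y) vertices and uses the shoelace formula; in the study the areas came from the GIS itself, so the function names and the toy coordinates are purely illustrative.

```python
def polygon_area(vertices):
    """Area of a simple polygon given as ordered (x, y) vertices (shoelace formula)."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]   # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def area_share(part, parent):
    """Share of the parent polygon's area covered by `part`, in percent."""
    return 100.0 * polygon_area(part) / polygon_area(parent)


parent = [(0, 0), (2, 0), (2, 2), (0, 2)]   # hypothetical parent cell
part = [(0, 0), (1, 0), (1, 1), (0, 1)]     # hypothetical intersection polygon
print(area_share(part, parent))             # 25.0
```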

Page 34: Jörn Vorwald BTU Cottbus

GIS - Intersection

Voronoi cell vertices (cell id, x, y):

1  4649437.15685  5770571.55337
1  4648965.43159  5765840.89544
1  4649350.00000  5765310.16310
1  4653966.56933  5767238.15705
1  4654150.52601  5767459.05431
1  4653141.97435  5771837.51857
1  4653062.68257  5771885.73293
2  4675843.70664  5767525.42210
2  4677881.38377  5764961.49765
2  4681057.63544  5765466.76608
2  4678293.73528  5771173.28156
2  4677786.85026  5771184.85724
2  4677461.30816  5770960.24708
3  4655085.49844  5757481.47363
3  4660252.59948  5756432.55224
3  4662308.69461  5760767.06035
3  4661109.53508  5761496.04101
3  4658889.96932  5761673.50719
3  4655171.68288  5759479.60279

Intersected polygons (polygon id, cell in the diagram with sites to predict, cell of known or buffering site, area share in %):

113  211  211  100
114  175  175  100
115  176  176  100
116  177  177  100
117  64   2    74.4
118  64   10   3.57
119  64   167  3.63
120  64   169  1.48
121  64   170  16.92
122  65   3    75.11
123  65   15   19.25
124  65   18   3.65
125  65   177  1.98
126  66   3    1.96
127  66   4    77.17
128  66   164  19.56
129  66   165  1.32

Page 35: Jörn Vorwald BTU Cottbus

Calculation Of Species Vectors

• In this approach, a vector is a set of attributes.

• The relevant attributes are the 'counts' of the species sampled on the sites.

• A species count is the maximum detection class in which the species has been observed.

Class name  Number of observations  Class centre (2^(n-1))
1           1                       1
2           2 … 4                   2
4           5 … 9                   8
5           10 … 19                 16
6           20 … 49                 32
7           ≥ 50                    64
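The class table can be expressed as two small lookup functions. This is a sketch built only from the table rows; using class 0 for "not observed" is an assumption (the slides use 0 in the species vectors but do not define it in the table), and the function names are hypothetical.

```python
# (lowest count, highest count, class name) as listed in the table;
# note that the table contains no class 3.
CLASS_RANGES = [(1, 1, 1), (2, 4, 2), (5, 9, 4), (10, 19, 5), (20, 49, 6)]


def detection_class(n_observations):
    """Map a number of observations to its detection class."""
    if n_observations <= 0:
        return 0                      # assumption: 0 means "not observed"
    for low, high, cls in CLASS_RANGES:
        if low <= n_observations <= high:
            return cls
    return 7                          # 50 or more observations


def class_centre(cls):
    """Class centre 2^(n-1) for class n (0 stays 0)."""
    return 0 if cls == 0 else 2 ** (cls - 1)


for count in (1, 3, 7, 12, 30, 80):
    cls = detection_class(count)
    print(count, cls, class_centre(cls))
```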

Page 36: Jörn Vorwald BTU Cottbus

Calculation Of Species Vectors

• Tables in the database (filled by Visual Basic programs):
  • Neighbouring sample sites from the Java output (incl. distances)
  • Voronoi cell intersections from the GIS output (incl. areas)
  • Prediction table with sample sites and prediction subsets (incl. 'found' as observed values) as rows, species as columns and species counts as table values
• Filling the prediction table

Page 37: Jörn Vorwald BTU Cottbus

Calculation Of Species Vectors

• Filling the prediction table
  • For each site to be predicted: iteration over its neighbours as defined by the graph type
  • Sum of all distances, used to calculate the relation (relative weight) of each neighbour
  • Calculation of the prediction relation using the 'real' number (i.e. the class converted to its class centre)
  • Sum of all relations, reconverted to a class
• Similar procedure for Voronoi cells, using areas instead of distances

Page 38: Jörn Vorwald BTU Cottbus

Calculation Example

• Site 77 within the Gabriel graph with known sites '97' and buffer set '05'
• Neighbours: 14, 15, 16, 17
• Distances (site, neighbour, distance):

  77  14  4132.0
  77  15  4413.0
  77  16  3500.0
  77  17  3642.0

• Σdist = 15,687
• Relations (site, neighbour, relation):

  77  14  0.24
  77  15  0.22
  77  16  0.27
  77  17  0.27

• Vector calculation

Page 39: Jörn Vorwald BTU Cottbus

Vector Calculation Example

• Relations of the neighbours of site 77: 14: 0.24, 15: 0.22, 16: 0.27, 17: 0.27
• Vectors for observation
  • o14 <- c(0,0,6,5,6,2,7,0,4,7,6,4, ... ,2,0,0,0,2,2,0,2)
  • o15 <- c(0,0,4,6,6,5,4,0,0,7,6,2, ... ,0,0,0,6,4,4,0,0)
  • o16 <- c(0,0,5,7,5,5,6,0,0,5,6,2, ... ,0,0,0,0,0,2,0,4)
  • o17 <- c(0,0,4,5,5,4,5,0,0,7,6,1, ... ,2,0,0,6,4,1,0,1)
• Calculation of the prediction using an interim transformation to 'abundances' and retransformation to observation classes
  • p77 <- c(o14 * 0.24 + o15 * 0.22 + o16 * 0.27 + o17 * 0.27)
  • p77 <- c(0,0,5,6,6,5,6,0,2,7,6,2, ... ,2,0,0,5,2,2,0,2)
• Calculations for sample sites to be predicted within the prediction subsets of sites, for each graph: 7,238
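The vector calculation for site 77 can be retraced with the sketch below. It uses the first twelve values of each observation vector and the relations 0.24, 0.22, 0.27 and 0.27 exactly as printed on the slides. Two points are assumptions rather than the author's documented method: classes are converted to class centres with 2^(n-1) (class 0 stays 0), and the thresholds in `to_class` for the retransformation are inferred from the printed values (in particular, what happens below 1 is a guess). The function names are hypothetical.

```python
def class_centre(cls):
    """Interim 'abundance' for a class: 2^(n-1), with 0 kept as 0."""
    return 0 if cls == 0 else 2 ** (cls - 1)


def to_class(value):
    """Retransform a weighted abundance to a class (thresholds are inferred)."""
    if value < 0.5:
        return 0        # assumption: very small values count as absent
    if value <= 1:
        return 1
    if value < 5:
        return 2
    if value < 10:
        return 4
    if value < 20:
        return 5
    if value < 50:
        return 6
    return 7


def predict(neighbour_vectors, weights):
    """Weighted sum of class centres per species, converted back to classes."""
    return [to_class(sum(w * class_centre(c) for c, w in zip(column, weights)))
            for column in zip(*neighbour_vectors)]


# First twelve species values of the neighbours of site 77 (from the slide).
o14 = [0, 0, 6, 5, 6, 2, 7, 0, 4, 7, 6, 4]
o15 = [0, 0, 4, 6, 6, 5, 4, 0, 0, 7, 6, 2]
o16 = [0, 0, 5, 7, 5, 5, 6, 0, 0, 5, 6, 2]
o17 = [0, 0, 4, 5, 5, 4, 5, 0, 0, 7, 6, 1]
weights = [0.24, 0.22, 0.27, 0.27]

print(predict([o14, o15, o16, o17], weights))
# [0, 0, 5, 6, 6, 5, 6, 0, 2, 7, 6, 2]
```

For these twelve species the result agrees with the p77 vector shown on the slide.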

Page 40: Jörn Vorwald BTU Cottbus

Methods - Statistics

• Preparation:
  • Export of the values to be calculated in the statistics environment from the database (prediction table) to plain text files
  • Export of statistics scripts from the database
• Calculation of statistics in the statistics environment R
• Export of results to plain text files
• Import of statistics results into the database
• Visualisation of results in R or in a spreadsheet program

Page 41: Jörn Vorwald BTU Cottbus

Statistics Preparation

• Export of the values to be calculated in the statistics environment from the database (prediction table) to plain text files
  • p77 <- c(0,0,5,6,6,5,6,0,2,7,6,2, ... ,2,0,0,5,2,2,0,2)

dl  ga  kr  nn  vo  fo
0   0   0   0   0   0
0   0   0   0   0   0
5   5   5   5   5   4
6   6   6   7   6   6
6   6   6   5   6   6
5   5   4   5   4   5
6   6   6   6   6   0
0   0   0   0   0   4
2   2   2   0   2   0
7   7   6   5   7   6
6   6   6   6   6   6
2   2   2   2   2   2

• Export of statistics script from database: 1,506 tests

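# Redirect the test output from the console to a result file
sink(file = "U:/diss/r/kruskal_wallis/output/kruskal_result.txt", append = FALSE)
# site: 77
# Read the predictions of the five graph models and the observation (fo)
kktst <- read.table("U:/diss/r/kruskal_wallis/input/77-97_05_98.dat", header = TRUE)
# Stack the six columns into one vector
site_77 <- c(kktst$dl, kktst$ga, kktst$kr, kktst$nn, kktst$vo, kktst$fo)
# Grouping factor: six groups of 86 values each, one group per column
ps_97_05_98 <- factor(rep(1:6, c(86, 86, 86, 86, 86, 86)))
kruskal.test(site_77, ps_97_05_98)
# Send output back to the console
sink(file = NULL)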

Page 42: Jörn Vorwald BTU Cottbus

Statistics Calculation In R

• Kruskal-Wallis rank sum test for each site within each prediction set: models vs. observation (1,506 operations)
• Correlation using the R method "kendall", i.e. a rank based measure of association, for each site within each prediction set: each model vs. each other (incl. observation), i.e. 15 pairs for each of the 1,506 cases (22,590 operations)
• Group building by model comparison, e.g. all Delaunay triangulations vs. all Gabriel graphs, or all Voronoi tessellations vs. all observations: Kruskal-Wallis rank sum test for the comparison of correlation coefficients (106 operations)
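To make the first test more concrete, the Kruskal-Wallis statistic H = 12 / (n(n+1)) · Σ T_i² / n_i − 3(n+1) can be computed by hand as in the sketch below. It is a plain textbook version with average ranks for ties but without the tie correction that R's `kruskal.test` applies, so its values can differ slightly from R's on data with many ties. The function names are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend the run of tied values
        rank = (i + j) / 2 + 1          # average rank of positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = [v for group in groups for v in group]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])   # T_i
        total += rank_sum ** 2 / len(group)               # T_i^2 / n_i
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


print(round(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 6))  # 7.2
```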

Page 43: Jörn Vorwald BTU Cottbus

Data And Result Handling

• Calculation of statistics in R
• Export of results to plain text files
• Import of statistics results into the database by a text wrapping routine in Visual Basic
• Visualisation of results in R or in a spreadsheet program

Kruskal-Wallis rank sum test
data: site_77 and ps_97_05_98
Kruskal-Wallis chi-squared = 19.9643, df = 5, p-value = 0.001269

Page 44: Jörn Vorwald BTU Cottbus

Results

• Kruskal-Wallis rank sum test for each site within each prediction set: models vs. observation
• Correlation using Kendall's τ for each site within each prediction set: each model vs. each other (incl. observation)
• Kruskal-Wallis rank sum test for the correlation coefficients of model comparisons
• Advantages and limits of methods

Page 45: Jörn Vorwald BTU Cottbus

Models vs. Observations

• Rows: site
• Columns: prediction set
• Cells: p-value of the Kruskal-Wallis rank sum test (models vs. observation)

ID  97_05_98   97_05_06_98  97_06_98
64  0.3244     0.3614       0.3333
65  0.000544   0.000628     0.00049
66  0.9658     0.997        0.9992
67  1.04E-13   5.76E-13     5.76E-13
68  0.01031    0.01031      0.01031
69  0.000687   0.001286     0.01912
70  0.000157   0.000157     0.000202
71  7.17E-06   7.17E-06     7.17E-06
72  0.04321    0.04321      0.04321
73  0.07101    0.07337      0.07337
74  0.001006   0.001813     0.001813
75  0.6383     0.6383       0.6383
76  4.2E-05    4.2E-05      4.2E-05
77  0.001269   0.001616     0.001616
78  0.004537   0.004537     0.004537
79  0.001985   0.001985     0.001985
80  0.000532   0.000532     0.000532

• The significance level depends little on the prediction set
• Large differences between groups of sites
• Low correlation without statistical significance

Page 46: Jörn Vorwald BTU Cottbus

Models vs. Observations

• A pattern can be recognised
• Not independent of the prediction set
• Differences between groups of sites

ID  00_97_1      00_97_1_98_1  00_97_2_05_1_97_1  00_97_2_05_1_97_1_98_1
29  0.002261     0.0005179     0.0009002          0.0009412
30  0.1128       0.006092      0.1652             0.01229
31  0.01191      0.0007397     0.01549            0.0007618
32  0.3468       0.002578      0.4438             0.005085
33  0.1623       0.001004      0.2149             0.001887
34  1.943E-08    8.225E-14     0.0006627          0.001043
35  0.001144     0.1223        0.001701           0.1072
36  0.0000441    0.04833       0.000004346        0.0573237
42  0.00001053   0.01478       0.000007105        0.01173
43  0.000003024  1.205E-10     0.000009334        5.344E-10
44  0.998        0.004016      0.9995             0.0655
45  0.001063     0.001963      0.0007176          0.003086
46  0.009834     0.01047       0.01081            0.0128
47  0.0004521    1.375E-07     0.001374           0.0001993
48  0.09602      0.5212        0.0000969          0.2836
49  0.6537       0.0006065     0.3804             0.0002888

• Low correlation without statistical significance

Page 47: Jörn Vorwald BTU Cottbus

Model Correlations

• Independent of the prediction set
• Model comparison creates groups:
  • Delaunay triangulations are similar to Gabriel graphs, and similar to Voronoi tessellations
  • Minimum spanning trees are similar to nearest neighbour graphs
• Observations are less similar to each model than dissimilar models are to each other

ID  PS           dl_ga  dl_kr     dl_nn     dl_vo     …  nn_fo      vo_fo
77  97_98        1      0.907748  0.619755               0.4039747
77  97_05_98     1      0.907748  0.619755  0.993317     0.4039747  0.578376
77  97_05_06_98  1      0.900415  0.626944  0.991985     0.4039747  0.582251
77  97_06_98     1      0.900415  0.626944  0.991985     0.4039747  0.582251

Page 48: Jörn Vorwald BTU Cottbus

Final Kruskal-Wallis test

[Figure: plot with a vertical axis from 0.0 to 1.0 over the model pairs dl_ga, dl_kr, dl_nn, dl_vo, dl_fo, ga_kr, ga_nn, ga_vo, ga_fo, kr_nn, kr_vo, kr_fo, nn_vo, nn_fo, vo_fo]

• Delaunay and Gabriel are very similar, as are MST and NN.
• MST and NN are different from Delaunay as well as from Gabriel.
• Observations are different from all other models.

Page 49: Jörn Vorwald BTU Cottbus

Advantages And Limits

• The models are easy to implement.
• The models are easy to understand.
• The models are easy to extend.
• The less connected the graph is, i.e. the smaller the set of edges or the fewer neighbours a single vertex has, the lower the probability of connections between known sites and sites to be predicted: a decreasing number of edges increases the error rate.
• Border effects are an important limitation, not only for Voronoi cells (cf. Byers 1992) but for all graph types: spatially outlying sites can be predicted only poorly or not at all.

Page 50: Jörn Vorwald BTU Cottbus

Limits: Graph Connections

[Figure: site 64 and the surrounding sites 2, 4, 8, 9, 10, 11, 67, 165 to 171, 178 and 179]

Intersected polygons (polygon id, cell in the diagram with sites to predict, cell of known or buffering site, area share in %):

116  177  177  100
117  64   2    74.4
118  64   10   3.57
119  64   167  3.63
120  64   169  1.48
121  64   170  16.92
122  65   3    75.11
123  65   15   19.25

-> 22.03% (3.63 + 1.48 + 16.92) from unknown (buffering) sites

Page 51: Jörn Vorwald BTU Cottbus

Limits: Graph Connections

[Figure: site 86 and the surrounding sites 4, 6, 9, 11, 18, 19, 21 to 25, 85, 179 and 183 to 185]

Intersected polygons (polygon id, cell in the diagram with sites to predict, cell of known or buffering site, area share in %):

208  85  22  52.99
209  86  8   1.42
210  86  9   0.4
211  86  11  4.12
212  86  22  5.92
213  86  23  81.98
214  86  25  6.16
215  87  22  45.13

-> 0.0% from unknown (buffering) sites

Page 52: Jörn Vorwald BTU Cottbus

Limits: Graph Connections

[Figure: site 1 and the surrounding sites 66, 159 to 165 and 172 to 177]

Intersected polygons (polygon id, cell in the diagram with sites to predict, cell of known or buffering site, area share in %):

113  211  211  100
114  1    160  31.73
115  1    161  16.48
116  1    162  0.01
117  1    163  22.59
118  1    172  0.34
119  1    175  28.74
120  1    176  0.12
121  2    64   75.47

-> 100.0% from unknown (buffering) sites

Page 53: Jörn Vorwald BTU Cottbus

Limits: Site Introduction

• There is a difference between introducing the sites to be predicted one after the other and introducing many sites simultaneously.
• The order of introduction is important due to the local reorganisation of the graph.
• E.g. leaving out site 81:
  • Sites 18, 20, 78 and 80 would lose one neighbour (81).
  • All would get a new neighbour (18: 20, 20: 18, 78: 80, 80: 78).

[Figure: maps of the numbered sampling sites; graphics not recoverable]


More Limits

• Species richness is low in all taxa investigated.

• The landscape heterogeneity of the study area is low.


Outlook - Questions

• What is the best model?

• Are there differences between buffered and unbuffered prediction sets?

• Are there differences between Orthoptera and Odonata, i.e. is the applicability of the models independent of the species group?

• Why not use geostatistics?

• How should errors be handled: which errors occur, and how are they propagated?


What Is The Best Model?


• The problem: the models seemed to perform better than ‘reality’ (the observation), so there is no yardstick for assessment of the kind used in the modelling literature.

• First: observation is just another model, a point that the modelling literature ignores.

• Second: observation seemed to have no variation because it rests on ‘single’ acts of observation.

• The solution: simulate further ‘observations’ using the same models that are being tested.


Best Model - Example


• The method is called ‘leave one out’, i.e.

• take all sites but one

• ‘predict’ the result for the omitted site

• take all sites but another one

• …

• All graph types have to be included, which yields not one ‘observation’ but several: 5, i.e. the method does not work for Voronoi diagrams.
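The leave-one-out steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the site coordinates and values are hypothetical, and a plain nearest-neighbour rule stands in for whichever graph model is being tested. The names `leave_one_out` and `nearest_neighbour_predict` are invented for this sketch.

```python
import math

# Hypothetical sites: (x, y, observed value). Not data from the study.
SITES = [(0, 0, 1), (1, 0, 1), (5, 5, 0), (6, 5, 0), (3, 2, 0)]


def nearest_neighbour_predict(train, target_xy):
    """Stand-in model: copy the value of the closest remaining site."""
    nearest = min(train, key=lambda s: math.dist(s[:2], target_xy))
    return nearest[2]


def leave_one_out(sites, predict):
    """Omit each site in turn and 'predict' it from all the others."""
    results = []
    for i, (x, y, observed) in enumerate(sites):
        train = sites[:i] + sites[i + 1:]  # all sites but one
        results.append((observed, predict(train, (x, y))))
    return results


results = leave_one_out(SITES, nearest_neighbour_predict)
hit_rate = sum(obs == pred for obs, pred in results) / len(results)
print(results, hit_rate)
```

Passing a different `predict` function for each graph type would give one simulated ‘observation’ per model from the same set of sites.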



Best Model - Example

dl  ga  kr  nn  fo
0   0   0   0   0
0   0   0   0   0
5   5   5   5   4
6   6   6   7   6
6   6   6   5   6
5   5   4   5   5
6   6   6   6   0
0   0   0   0   4
2   2   2   0   0
7   7   6   5   6
6   6   6   6   6
2   2   2   2   2
2   2   2   2   1
2   2   4   5   0
0   0   0   0   0
2   2   2   4   2
0   0   0   0   0

[Figure: map of sampling sites 64 to 123]

dl  ga  kr  nn  fo  fd  fg  fk  fn
0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0
5   5   5   5   4   4   4   4   0
6   6   6   7   6   5   5   4   4
6   6   6   5   6   5   5   5   4
5   5   4   5   5   4   4   4   4
6   6   6   6   0   4   4   4   2
0   0   0   0   4   2   2   0   0
2   2   2   0   0   2   2   0   0
7   7   6   5   6   6   6   5   4
6   6   6   6   6   6   6   5   4
2   2   2   2   2   4   4   4   4
2   2   2   2   1   0   0   0   0
2   2   4   5   0   1   1   2   2
0   0   0   0   0   0   0   0   0
2   2   2   4   2   2   2   0   0
0   0   0   0   0   0   0   0   0
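The model columns of the table can be compared pair by pair. The slides do not state which statistic is used for the pairwise comparison, so the sketch below uses an assumed, deliberately simple measure: the share of rows on which two models give the same value. It takes the first five columns (dl, ga, kr, nn, fo) as printed; the function name `pairwise_agreement` is invented for this illustration.

```python
from itertools import combinations

MODELS = ["dl", "ga", "kr", "nn", "fo"]

# Rows copied from the first five columns of the table on the slide.
ROWS = [
    (0, 0, 0, 0, 0), (0, 0, 0, 0, 0), (5, 5, 5, 5, 4), (6, 6, 6, 7, 6),
    (6, 6, 6, 5, 6), (5, 5, 4, 5, 5), (6, 6, 6, 6, 0), (0, 0, 0, 0, 4),
    (2, 2, 2, 0, 0), (7, 7, 6, 5, 6), (6, 6, 6, 6, 6), (2, 2, 2, 2, 2),
    (2, 2, 2, 2, 1), (2, 2, 4, 5, 0), (0, 0, 0, 0, 0), (2, 2, 2, 4, 2),
    (0, 0, 0, 0, 0),
]


def pairwise_agreement(models, rows):
    """Share of rows with identical values, for every pair of models."""
    columns = {m: [row[i] for row in rows] for i, m in enumerate(models)}
    agreement = {}
    for a, b in combinations(models, 2):
        matches = sum(x == y for x, y in zip(columns[a], columns[b]))
        agreement[f"{a}_{b}"] = matches / len(rows)
    return agreement


agreement = pairwise_agreement(MODELS, ROWS)
for pair, share in agreement.items():
    print(f"{pair}: {share:.2f}")
```

The pair labels follow the `dl_ga` naming pattern used on the plot axes. The plots also include the Voronoi model (vo), which has no column in this table.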


[Figure: pairwise model comparisons for dl_ga, dl_kr, dl_nn, dl_vo, dl_fo, ga_kr, ga_nn, ga_vo, ga_fo, kr_nn, kr_vo, kr_fo, nn_vo, nn_fo, vo_fo; vertical axis from 0.0 to 1.0]

[Figure: pairwise model comparisons for all 45 pairs of dl, ga, kr, nn, vo, fo, fd, fg, fk and fn; vertical axis from 0.0 to 1.0]


What Is The Best Model?


• The solution: the best model is the Voronoi tessellation, followed by the Delaunay triangulation and the Gabriel graph.

• Voronoi tessellation considers not only distances to single (known) sites but entire ‘recruiting areas’.

• It can be extended in the context of ecology and landscape ecology by introducing landscape parameters (e.g. connectivity, habitat suitability).

• Because the two models are dual, the Delaunay triangulation is an equivalent of the Voronoi tessellation.

• Gabriel graphs are not far from the Delaunay triangulation, but it is not plausible that sites are excluded from influence merely by an additional constraint.
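The relation between the Gabriel graph and the Delaunay triangulation can be checked on a toy example. The sketch below is a brute-force illustration, not the implementation used in the study: the point coordinates are hypothetical, the function names are invented here, and integer coordinates are assumed so that all geometric tests are exact. An edge is a Gabriel edge when no other site lies inside the circle that has the edge as its diameter. For sites that are not all collinear, every Gabriel edge is also a Delaunay edge, so the Gabriel graph is the Delaunay triangulation thinned out by that one additional constraint.

```python
from itertools import combinations


def gabriel_edges(points):
    """Edges (i, j) whose diametral circle contains no other point."""
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        (px, py), (qx, qy) = points[i], points[j]
        # k is strictly inside the circle with diameter pq exactly when
        # the angle p-k-q is obtuse, i.e. (p - k) . (q - k) < 0.
        blocked = any(
            (px - kx) * (qx - kx) + (py - ky) * (qy - ky) < 0
            for k, (kx, ky) in enumerate(points)
            if k not in (i, j)
        )
        if not blocked:
            edges.add((i, j))
    return edges


def orientation(a, b, c):
    """Positive if a, b, c turn counter-clockwise, zero if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circle through a, b and c."""
    rows = []
    for p in (a, b, c):
        x, y = p[0] - d[0], p[1] - d[1]
        rows.append((x, y, x * x + y * y))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    return det * orientation(a, b, c) > 0


def delaunay_edges(points):
    """Edges of all triangles whose circumcircle contains no other point."""
    edges = set()
    n = len(points)
    for i, j, k in combinations(range(n), 3):
        a, b, c = points[i], points[j], points[k]
        if orientation(a, b, c) == 0:
            continue  # collinear triple, no circumcircle
        others = (points[m] for m in range(n) if m not in (i, j, k))
        if not any(in_circumcircle(a, b, c, d) for d in others):
            edges.update({(i, j), (i, k), (j, k)})
    return edges


# Hypothetical sites with integer coordinates.
sites = [(0, 0), (4, 0), (2, 1), (1, 4), (5, 3)]
gabriel = gabriel_edges(sites)
delaunay = delaunay_edges(sites)
print(sorted(delaunay - gabriel))  # Delaunay edges dropped by the Gabriel rule
```

The brute-force loops are cubic to quartic in the number of sites, which is acceptable for a handful of points but not for a full survey; a computational geometry library would be the usual choice there.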


Why No Geostatistics?


• It was developed for static processes (geology).

• Autocorrelation is one of its central concepts.

• There is no external validation of the autocorrelation concept: it depends on the dispersion of the data points, independent of scale.


Why Simple Models?


• Why only simplified models with adjacency as the only factor?

• We tend to observe the wrong parameters.

• Programmatic literature:

• Why don’t we believe the models? (Aber 1997)

• Does vegetation suit our models? (Bio 2000)


Outlook


• Decoursey 1992. Developing models with more detail: do more algorithms give more truth?


Acknowledgements

U. Bröring, S. Flemming, H. Vorwald, G. Wiegleb


Acknowledgements

Thank you for your attention.


References

• Aber, J. D. 1997. Why don’t we believe the models? Bull. Ecol. Soc. Am. 78: 232–233.

• Bio, A. M. F. 2000. Does vegetation suit our models? Assessing species distribution in environmental space. Nederlandse Geografische Studies 265. Krug/Faculteit Ruimtelijke Wetenschappen, Universiteit Utrecht, Utrecht, The Netherlands. 206 pp.

• Boyce, M. S., Mao, J. S., Merrill, E. H., Fortin, D., Turner, M. G., Fryxell, J. & Turchin, P. 2003. Scale and heterogeneity in habitat selection by elk in Yellowstone National Park. Ecoscience 10(4): 421–431.

• Buckland, S. T. & Elston, D. A. 1993. Empirical models for the spatial distribution of wildlife. Journal of Applied Ecology 30: 478–95.

• Byers, J. A. 1992. Dirichlet tessellation of bark beetle spatial attack points. Journal of Animal Ecology 61: 759–768.

• Decoursey, D. G. 1992. Developing models with more detail: do more algorithms give more truth? Weed Technol. 6: 709–715.

• eWater Ltd. 2006. Series on model choice. 1. General approaches to modelling and practical issues of model choice. http://www.toolkit.net.au/cgi-bin/WebObjects/toolkit.woa/wa/modelChoice (accessed 20.09.2006)

• Ferrier, S., Drielsma, M., Manion, G. & Watson, G. 2002. Extended statistical approaches to modeling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modeling, 11: 2309–2338.

• Gabriel, K. R. & Sokal, R. R. 1969. A new statistical approach to geographic variation analysis. Syst. Zool. 18: 259–270.

• Kenkel, N. C., Hoskins, J. A. & Hoskins, W. D. 1989. Edge effects in the use of area polygons to study competition. Ecology 70: 272–274.

• Levins, R. 1966. The strategy of model building in population ecology. Am. Sci. 54: 421–431.

• Mercier, F. & Baujard, O. 1997. Proceedings of GeoComputation ’97 & SIRC ’97: 161–171.

• Okabe, A., Boots, B., Sugihara, K. & Chiu, S. N. 2000. Spatial tessellations: concepts and applications of Voronoi diagrams. 2nd ed. John Wiley & Sons, Chichester, UK.

• Refsgaard, J. C. 1996. Terminology, modelling protocol and classification of hydrological model codes. In: Refsgaard, J. C. & Abbott, M. B. (eds.) Distributed hydrological modelling. Kluwer: 17–40.

• Sharpe, P. J. A. 1990. Forest modeling approaches: compromises between generality and precision. In: Dixon, R. K., Meldahl, R. S., Ruark, G. A. & Warren, W. G. (eds.) Process Modeling of Forest Growth Responses to Environmental Stress. Timber Press, Portland, OR: 180–190.