Analysing the link between traits & invasive spread in German flora: accounting for residence time
Joint work between Eva Küster, Ingolf Kühn ~ UFZ
Adam Butler, Stijn Bierman, Glenn Marion ~ BioSS
Athens ALARM meeting, January 2007
Introduction
• Direct data on the arrival, establishment & spread of invasive species are typically not available at the national or pan-European levels
• Indirect data about the traits & current spatial distribution of species that invaded in the past can be used to identify correlative relationships between traits and invasive success, accounting for phylogeny
• Data on traits are often missing or ambiguous, however, creating serious problems for the analysis – we look at how to address these using Bayesian methods
• We analyse data on German vascular plants
  • Biolflor (www.ufz.de/biolflor): database with information on traits & phylogeny of 3660 species
  • Florkart (www.floraweb.de): database with information on presence/absence of 4000+ species for 2995 grid cells within Germany
• We look at neophyte species (arrivals since 1490), excluding ephemerophytes: there are 388 such species
• We use the # of grid cells occupied as a measure of invasive success
Data
Niche breadth in Germany
# hemerobic levels
Urbanity
# of habitat types
# of vegetation formations
# phytosociological classes
Genetics
Ploidy
DNA content
Morphology
Life form
Growth form
Life span
Generative reproductive cycles
Propagation & dispersal
Types of storage organs
Existence of storage organs
Types of shoot metamorphoses
Types of root metamorphoses
Flowering phenology
Beginning of flowering season
Length of flowering season
End of flowering season
Floral & reproductive biology
Strategy types of reproduction
Mating strategy
Pollen vector
Flower colour
Floral UV pattern
Floral UV reflection
Blossom type
Diaspores & germinules
Types of diaspores
Weights of diaspores
Weights of germinules
Native global distribution
Floristic zones of native area
# floristic zones in native area
Continent of native area
# continents in native area
Native in old or new world?
Oceanity of native area
Amplitude of oceanity
Leaf traits
Leaf persistence
Leaf anatomy
Leaf form
Invasive history
Mode of introduction
Residence time
Life strategy
Ecological strategy
Ruderal life strategy
Current analysis by UFZ
Küster, Kühn and Klotz (in prep.)
• Regress log(# grid cells occupied) onto each of the ~40 individual traits in turn, in the presence of phylogenetic variables
• Retain only traits that are significant at the 95% level, exclude non-predictive traits, & then use cluster analysis to further reduce the set of traits
• Use AIC to select the best model from within this set of traits, including interactions
• At all stages, use only those species that have complete data for all traits currently in the model
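A minimal sketch of the trait-by-trait screening step, for illustration only. It assumes continuous traits, a fixed set of phylogenetic covariates (the slides recompute these for each species set), and a normal-approximation cut-off of |t| > 1.96 in place of an exact test. The function name `screen_traits` and the synthetic data are hypothetical, and the cluster-analysis and AIC steps are not shown.

```python
import numpy as np

def screen_traits(log_cells, traits, phylo, t_crit=1.96):
    """Return the names of traits whose regression coefficient has |t| > t_crit.

    log_cells : (n,) array of log(# grid cells occupied)
    traits    : dict mapping trait name -> (n,) array, NaN marks a missing value
    phylo     : (n, k) array of phylogenetic covariates (held fixed here)
    """
    kept = []
    for name, x in traits.items():
        ok = ~np.isnan(x)  # complete cases for this trait only
        X = np.column_stack([np.ones(ok.sum()), phylo[ok], x[ok]])
        y = log_cells[ok]
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        dof = len(y) - X.shape[1]
        sigma2 = resid @ resid / dof
        cov = sigma2 * np.linalg.inv(X.T @ X)
        t_stat = beta[-1] / np.sqrt(cov[-1, -1])  # trait is the last column
        if abs(t_stat) > t_crit:
            kept.append(name)
    return kept

# Synthetic example: one informative trait (with gaps) and one pure-noise trait.
rng = np.random.default_rng(0)
n = 200
phylo = rng.normal(size=(n, 2))
signal = rng.normal(size=n)
noise_trait = rng.normal(size=n)
log_cells = 1.0 + 0.8 * signal + 0.3 * phylo[:, 0] + rng.normal(scale=0.5, size=n)
signal_with_gaps = signal.copy()
signal_with_gaps[:20] = np.nan  # 20 species lack this trait
kept = screen_traits(log_cells, {"signal": signal_with_gaps, "noise": noise_trait}, phylo)
print(kept)
```

Note that each regression silently drops the species with a missing value for that trait, which is the complete-case behaviour described above.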
Phylogenetic correction
Küster, Kühn and Klotz (in prep.)
• Compute the patristic distance matrix based on the phylogenetic codes given in Biolflor
• For the current set of species:
  • apply a principal coordinate analysis to the relevant part of the distance matrix
  • retain only axes associated with positive eigenvalues
  • then retain the axes that account for the first 80% of variation
  • then regress log(# grid cells occupied) onto the remaining axes and retain only those that are significant at the 95% level
• The phylogenetic variables need to be recomputed whenever the set of species is changed
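The first three steps of the phylogenetic correction can be sketched as a classical principal coordinate analysis. This is an illustrative sketch, not the UFZ code: the function name `pcoa_axes` is hypothetical, and "the first 80% of variation" is read here as the smallest set of leading axes whose positive eigenvalues sum to at least 80% of the positive total. The final regression-based filtering of axes is not shown.

```python
import numpy as np

def pcoa_axes(dist, var_fraction=0.8):
    """Principal coordinates of a symmetric distance matrix.

    Keeps axes with positive eigenvalues, then the leading axes that together
    account for at least `var_fraction` of the positive-eigenvalue total.
    """
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (dist ** 2) @ J             # Gower double-centring
    eigval, eigvec = np.linalg.eigh(B)
    order = np.argsort(eigval)[::-1]           # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]
    pos = eigval > 1e-10                       # drop zero and negative eigenvalues
    eigval, eigvec = eigval[pos], eigvec[:, pos]
    cum = np.cumsum(eigval) / eigval.sum()
    k = int(np.searchsorted(cum, var_fraction) + 1)
    return eigvec[:, :k] * np.sqrt(eigval[:k])

# Toy check: four "species" placed on a line, so one axis recovers all distances.
coords = np.array([0.0, 1.0, 3.0, 7.0])
dist = np.abs(coords[:, None] - coords[None, :])
axes = pcoa_axes(dist)
recovered = np.abs(axes[:, 0][:, None] - axes[:, 0][None, :])
print(axes.shape)
```

Because the axes depend on which rows of the distance matrix are included, the function has to be re-run whenever the species set changes.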
Missing data
• A large number of species are currently excluded from the final analysis as data are missing on some of their traits
• This is inefficient, & could potentially lead to bias if the data are missing not at random
• The missing data arise from different sources:
  • there being no record in the Biolflor database
  • the qualifier in Biolflor suggesting that data quality is poor
  • multiple states being recorded for a particular trait
  • a very rare state being recorded
Residence times
• Residence time is a particularly important variable because
  • it has good explanatory power to describe occupancy
  • it partly accounts for the dynamic nature of invasive processes
  • it allows us to make time-specific predictions about occupancy
• However, data on German residence times are only available for
171 species, & for 35 of these only to the nearest century
• Some auxiliary data are available for neighbouring countries
• How can we properly include residence time into the analysis,
given the large proportion of missing data?
Species                        Region            Time
Amaranthus deflexus L.         Germany           1889
Aesculus hippocastanum L.      Germany           16th century
Acer negundo L.                Czech Republic    1699
                               Germany           18th century
Oenothera depressa Greene      Germany           Early 19th century
Oxalis fontana Bunge           Central Europe    17th century ?
                               Germany           1807
Epilobium ciliatum Raf.        Central Europe    1871 / since 1971
                               Germany           1927
Nepeta grandiflora M. Bieb.    Germany           ca. 1900
Agrostis scabra Willd.         Central Europe    1909
                               Germany           1960
Work at BioSS
• The aims of our research on this at BioSS:
  • to explore how sensitive the results of inferences are to the assumptions that we make about missing data
  • to analyse the data in such a way that species with missing data for some traits do not need to be excluded
  • to relate the outputs from the analysis to invasive risk
• We work with the Biolflor-Florkart data, and focus upon missing
data for residence times; however, the methodological ideas are
widely applicable
Application to toolkit
• Application to the prediction of invasive risk
  • e.g. use traits & phylogeny to infer the number of cells that a recently arrived species is likely to occupy after N years of residence
• This number is uncertain, so it will be a probability
distribution rather than a single number
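A sketch of how such a predictive distribution could be simulated, assuming a linear model for log occupancy of the kind set out under "Bayesian modelling". The posterior draws below are made-up numbers for illustration, not results from the analysis, and the function name `predictive_cells` is hypothetical.

```python
import math
import random
import statistics

def predictive_cells(draws, x, z, n_years, rng):
    """Simulate one occupancy value per posterior draw (alpha, beta, gamma, delta, sigma)."""
    sims = []
    for alpha, beta, gamma, delta, sigma in draws:
        mean_log = alpha + beta * x + gamma * z + delta * n_years
        sims.append(math.exp(rng.gauss(mean_log, sigma)))
    return sims

rng = random.Random(42)
# Hypothetical posterior draws; in practice these would come from the MCMC output.
draws = [
    (rng.gauss(2.0, 0.2), rng.gauss(0.6, 0.2), rng.gauss(0.3, 0.1),
     rng.gauss(0.01, 0.002), abs(rng.gauss(1.0, 0.05)))
    for _ in range(2000)
]

sims50 = predictive_cells(draws, x=1.0, z=0.0, n_years=50, rng=rng)
sims150 = predictive_cells(draws, x=1.0, z=0.0, n_years=150, rng=rng)

cuts = statistics.quantiles(sims50, n=40)     # cut points every 2.5%
lo, med, hi = cuts[0], cuts[19], cuts[-1]     # 2.5%, 50%, 97.5%
print(f"after 50 years: median {med:.0f} cells, 95% interval ({lo:.0f}, {hi:.0f})")
```

Each simulated value combines parameter uncertainty (one posterior draw) with residual variation (one normal draw), which is why the output is a distribution rather than a single number.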
Bayesian methods
• An alternative approach to statistical modelling and inference, in
which data are regarded as fixed and parameters are regarded
as random
• Increasingly widely used: due to improvements in computational
power it is now often possible to fit more advanced models
using Bayesian inference than using classical statistical methods
• Particularly suitable for problems that involve missing data
• Implemented using free software called WinBUGS:
extremely powerful but not particularly user-friendly…
Bayesian modelling
Notation: for species i:
yi = # of grid cells occupied
ri = residence time
xi = other trait data
zi = phylogenetic variables
Basic model
log yi ~ N(α + βxi + γzi + δri, σ²)
…just the same as a GLM
Prior distributions
We use uninformative priors
α, β, γ, δ ~ N(0, 1000)
σ² ~ Gamma(1/1000, 1/1000)
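For readers without WinBUGS, the basic model can be sketched with a small Gibbs sampler. This is an illustration under stated assumptions rather than the implementation used for the results: N(0, 1000) is read as a normal prior with variance 1000, the Gamma(1/1000, 1/1000) prior is placed on the precision 1/σ² (the usual WinBUGS convention), and the data are synthetic. The function name `gibbs_linear` is hypothetical.

```python
import numpy as np

def gibbs_linear(X, y, n_iter=3000, burn_in=1000, prior_var=1000.0,
                 a0=0.001, b0=0.001, seed=1):
    """Gibbs sampler for y ~ N(X coef, 1/tau), coef_j ~ N(0, prior_var),
    tau ~ Gamma(a0, rate=b0). Returns draws of (coef, sigma^2) after burn-in."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    tau = 1.0
    draws = np.empty((n_iter - burn_in, p + 1))
    for it in range(n_iter):
        # coef | tau, y is multivariate normal
        V = np.linalg.inv(tau * XtX + np.eye(p) / prior_var)
        V = (V + V.T) / 2.0                     # guard against rounding asymmetry
        m = V @ (tau * Xty)
        coef = rng.multivariate_normal(m, V)
        # tau | coef, y is Gamma; numpy takes a scale, so invert the rate
        resid = y - X @ coef
        tau = rng.gamma(a0 + n / 2.0, 1.0 / (b0 + 0.5 * resid @ resid))
        if it >= burn_in:
            draws[it - burn_in] = np.append(coef, 1.0 / tau)
    return draws

# Synthetic data: columns are intercept, trait x, phylogenetic axis z, residence r.
rng = np.random.default_rng(0)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
true_coef = np.array([3.0, 0.6, -0.4, 0.8])     # alpha, beta, gamma, delta
y = X @ true_coef + rng.normal(scale=0.5, size=n)

draws = gibbs_linear(X, y)
post_mean = draws.mean(axis=0)
prob_delta_positive = (draws[:, 3] > 0).mean()  # analogue of P(parameter > 0)
print(post_mean.round(2), prob_delta_positive)
```

The last quantity is the kind of summary reported in the results tables as P(parameter > 0).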
• Recast the UFZ methodology in a Bayesian context, and implement this in WinBUGS
• Use this to explore potential refinements or extensions to the current analysis
• Assess sensitivity to the assumptions about missing data, phylogenetic dependence and distribution of the response variable (log-normal or Binomial)
• Implementation is in WinBUGS
• Develop ways of dealing more efficiently with missing data
• Bayesian
LPJ code: Ben Smith, Stephen Sitch, Sybil Schapoff
CRU data: David Viner
GCM data: PCMDI
Statistical methods: Jonathan Rougier, Chris Glasbey
Uncertainty analysis: Bjoern Reineking, Stijn Bierman
MCMC details:
Burn-in = 5000, Sample = 2000
Thinning ratio = 1:50
Imputation
• When data on residence times are missing, we can assume that they are random variables
• We can use data on the other traits, phylogeny & number of grid cells occupied to infer the distribution of the residence time for a particular species i
e.g.
log ri ~ N(exp{a + bxi + czi + dyi}, s2)
• Use of the cut function ensures this does not bias inferences about α, β, γ, δ and σ²
Results: Ploidy
Polyploid vs diploid

Estimate (SE) for trait effect    Classical      Bayesian
Trait                             .580 (.226)    .587 (.225)
Trait + Phylogeny                 .636 (.220)    .656 (.211)
Trait + Phylogeny + Residence     .790 (.347)    .630 (.216) [cut]
                                                 .761 (.199) [full]

Pink result based on 124 species
Other results based on 345 species
42 species excluded

P(parameter > 0)    Main model    Imputation model
β / b               > .99         .14
γ / c               1, .94        1, .84
δ / d               > .99         .99
Results: Ploidy
Imputed values
[Figure not reproduced]
Results: Ploidy
Predictions
[Figure not reproduced]
Results: Duration of flowering

Estimate (SE) for trait effect    Classical      Bayesian
Trait                             .362 (.084)    .358 (.080)
Trait + Phylogeny                 .329 (.083)    .326 (.081)
Trait + Phylogeny + Residence     .298 (.113)    .229 (.082) [cut]
                                                 .204 (.076) [full]

Pink result based on 135 species
Other results based on 379 species
8 species excluded

P(parameter > 0)    Main model    Imputation model
β / b               > .99         .97
γ / c               .99           > .99
δ / d               > .99         > .99
Results: End of flowering

Estimate (SE) for trait effect    Classical      Bayesian
Trait                             .207 (.060)    .206 (.058)
Trait + Phylogeny                 .167 (.060)    .166 (.059)
Trait + Phylogeny + Residence     .275 (.106)    .096 (.061) [cut]
                                                 .227 (.060) [full]

Pink result based on 135 species
Other results based on 379 species
8 species excluded

P(parameter > 0)    Main model    Imputation model
β / b               .96           .17
γ / c               .98           > .99
δ / d               > .99         > .99
Results: Pollen vector

Estimate (SE) for trait effect    Wind vs Self                   Insect vs Self
                                  Classical      Bayesian        Classical      Bayesian
Trait                             -1.16 (.38)    -1.16 (.37)     -0.71 (.32)    -0.72 (.31)
Trait + Phylogeny                 -0.79 (.38)    -0.81 (.36)     -0.72 (.32)    -0.72 (.31)
Trait + Phylogeny + Residence     -1.22 (.51)    -0.57 (.37)     -0.74 (.43)    -0.39 (.33)
                                                 -0.51 (.32)                    -0.56 (.27)

P(parameter > 0)    Main model       Imputation model
β / b               .06, .13         .06, .82
γ / c               < .01, < .01     < .01, .08
δ / d               > .99            .99

Pink result: 108 species
Other results: 329 species
58 species excluded
Results: Shoot metamorphoses

Contrast    Model    Classical       Bayesian
a vs no     T        0.64 (.34)      0.64 (.34)
            T+P      0.68 (.34)      0.70 (.34)
            T+P+R    0.61 (.62)      0.82 (.35)
rh vs no    T        -1.06 (.35)     -1.05 (.34)
            T+P      -0.79 (.37)     -0.82 (.37)
            T+P+R    0.26 (.63)      -0.70 (.35)
p vs no     T        0.09 (.34)      0.10 (.32)
            T+P      0.05 (.34)      0.08 (.37)
            T+P+R    -0.02 (.65)     0.23 (.33)
z vs no     T        -1.12 (.65)     -1.04 (.65)
            T+P      -0.24 (.75)     -0.26 (.75)
            T+P+R    ?               -0.06 (.69)
Significance of trait effect in Bayesian model: posterior probability that β > 0

Trait                         Contrast                Trait only    Trait + phylogeny    CUT: T + P + residence
Ploidy                        polyploid vs diploid    > .99         > .99                > .99
Length of flowering season                            > .99         > .99                > .99
End of flowering season                               > .99         > .99                .94
Shoot                         a vs none               .97           .98                  .99
                              rh vs none              < .01         .01                  .02
                              p vs none               .62           .59                  .75
                              z vs none               .05           .36                  .47
Pollen vector                 wind vs self            < .01         .01                  .06
                              insect vs self          .01           .01                  .12

(Note: posterior probability that δ > 0 is always > 0.99)
Further work 1:
Data Not Missing at Random
• Our model assumes that the data on residence times are missing at random, as does the approach of excluding missing data
• We can also consider possible mechanisms by which the missing data might be related to the variables of interest
Let oi = 1 if residence time observed for species i, 0 otherwise
• We could assume that
oi ~ Binomial(1, logit⁻¹{A + Bxi + Czi + Dyi + Eri})
• The parameter E cannot be estimated, but we can assess sensitivity to the value of it; we assume here that E is negative
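A small simulation illustrates why the sign of E matters. This sketch keeps only the intercept A and the residence term E from the model above (B, C and D are set to zero for simplicity), uses a standardised residence variable, and all names and numbers are hypothetical.

```python
import math
import random
import statistics

def inv_logit(u):
    return 1.0 / (1.0 + math.exp(-u))

def observed_subset(residence, A, E, rng):
    """Keep each residence value with probability logit^-1(A + E * r)."""
    return [r for r in residence if rng.random() < inv_logit(A + E * r)]

rng = random.Random(0)
# Standardised (log) residence times for 5000 hypothetical species.
residence = [rng.gauss(0.0, 1.0) for _ in range(5000)]

mean_all = statistics.fmean(residence)
mean_obs_mar = statistics.fmean(observed_subset(residence, A=0.0, E=0.0, rng=rng))
mean_obs_nmar = statistics.fmean(observed_subset(residence, A=0.0, E=-2.0, rng=rng))

print(f"all species:       {mean_all:.2f}")
print(f"observed, E = 0:   {mean_obs_mar:.2f}")
print(f"observed, E = -2:  {mean_obs_nmar:.2f}")
```

With E = 0 the observed species are a random subset, so their mean matches the full set. With E negative, long-resident species are less likely to have a recorded arrival date, so the observed values understate residence times and the missing ones should be imputed as longer.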
Results: End of flowering

Model                             Trait effect: estimate (SE) for β    Mean (Q2.5%, Q97.5%) imputed residence
Trait only                        .206 (.058)                          -
+ Phylogeny                       .166 (.059)                          -
+ Residence, MAR, CUT             .096 (.061)                          114 (34, 355)
+ Residence, MAR, full            .227 (.060)                          104 (27, 351)
+ Residence, NMAR, CUT, E = -1    .094 (.062)                          145 (44, 454)
+ Residence, NMAR, CUT, E = -2    .096 (.064)                          191 (55, 619)
+ Residence, NMAR, CUT, E = -3    .090 (.058)                          315 (73, 916)
Further work 2:
Multiple traits
• Relatively low proportions of missing data for the other key traits: we can simply exclude these species when we look at traits individually, but this is more problematic when we look at the effects of multiple traits
• Most “missing data” for the other key traits arise because rare or duplicate trait states are recorded in Biolflor
• We would like to incorporate this information directly into the analysis, rather than attempting to impute the missing values
• We can deal with duplicate states either by:
  • assuming that the parameter for species that have both states is the average of the parameters for the two states; or
  • including a separate parameter for species that have duplicate traits
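The two options correspond to two ways of coding the design matrix. The sketch below uses pollen vector with "self" as the baseline level. The function names, the level list and the `+`-joined labels are illustrative choices, not taken from Biolflor.

```python
STATES = ["self", "wind", "insect"]   # "self" is the baseline level

def row_average(states):
    """Design row under the averaging assumption: each recorded state
    contributes weight 1/k, so the effect is the mean of the state effects."""
    w = 1.0 / len(states)
    return [1.0] + [w if s in states else 0.0 for s in STATES[1:]]

def row_separate(states, levels):
    """Design row when every combination of states has its own parameter."""
    key = "+".join(sorted(states))
    return [1.0] + [1.0 if key == lv else 0.0 for lv in levels]

# Non-baseline levels for the separate-parameter coding.
LEVELS = ["wind", "insect", "insect+self", "self+wind", "insect+wind"]

print(row_average({"wind"}))                   # single state
print(row_average({"wind", "self"}))           # half of the wind effect
print(row_average({"wind", "insect"}))         # mean of wind and insect effects
print(row_separate({"wind", "self"}, LEVELS))  # own indicator column
```

Under the averaging option, a species recorded as both wind-pollinated and selfing gets half of the wind-vs-self effect. This matches the "Average of parameters" column in the table of estimates, where the Both vs Diploid value .296 (.113) is half of the Polyploid vs Diploid value .592 (.226).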
Trait                           # treated as missing in current analysis    # with no record at all
Ploidy                          42                                          13
Length of flowering season      8                                           8
End of flowering season         8                                           8
Pollen vector                   58                                          37
Shoot metamorphoses             59                                          1
Any of the above five traits    134                                         54
Species                        Pollen vector    Qualifier
Acer negundo L.                Wind             Always
Adonis annua L.                Selfing          Unknown
                               Insects          Unknown
Alcea rosea L.                 Selfing          At failure of outcrossing
                               Insects          The rule
Artemisia dracunculus L.       Wind             The rule
Diplotaxis muralis (L.) DC.    Selfing          The rule
                               Selfing          At failure of outcrossing
                               Insects          The rule
Elodea canadensis Michx.       Water            The rule
Epilobium ciliatum Raf.        Selfing          The rule
                               Cleistogamy      The rule
Missing data in current analysis

Method to deal with duplicates:           Exclude         Average of parameters    Separate parameter
Ploidy           Polyploid vs Diploid     .636 (.220)     .592 (.226)              .641 (.225)
                 Both vs Diploid          -               .296 (.113)              .747 (.396)
Pollen vector    Wind vs Self             -.795 (.376)    -.508 (.376)             -.653 (.384)
                 Insect vs Self           -.716 (.315)    -.683 (.310)             -.748 (.314)
                 Water vs Self            -               -.094 (.935)             -.134 (.933)
                 Insect+Self vs Self      -               -.342 (.155)             -.138 (.614)
                 Wind+Self vs Self        -               -.254 (.188)             2.18 (1.42)
                 Wind+Insect vs Self      -               -.596 (.244)             .099 (1.99)

Classical analysis, model = Traits + Phylogeny
Further work 3:
Auxiliary residence time data
• The imputation model allows us to draw inferences about residence times for species where the arrival date is unknown
• The performance of the imputation model depends upon it containing regressors that are strongly correlated with residence time in Germany
• Possibility of using data on residence in a neighbouring country, ni, as an explanatory variable:
log ri ~ N(exp{a + bxi + czi + dyi + eni }, s2)
Further work 4:
Climate change
• UFZ are using the species-level model to identify key
traits for invasive success, & then a spatial approach
to estimate impact of environmental change on these
• A non-spatial approach might involve grouping cells
according to environmental characteristics, & fitting the
species-level model separately for each group of cells
• We are interested in comparing these approaches