Lepidoptera: Where They Are and When They Fly
Roy Adams, UC Davis
Ryan Smith, Union College
Camila Matamala-Ost, Oregon State
Chris Mattioli, Providence College in Providence, Rhode Island
Elizabeth Cowdery, Cornell
Grace Zalenski, Lewis & Clark
Slide 2
Lepidoptera
Lepidoptera is the second-largest order in Insecta
Approximately 600 species of moths occur in the H.J. Andrews Experimental Forest
Relatively little is known about them
Slide 3
Diverse Ecological Roles
Ecosystem functions: prey, defoliators, pollinators, decomposers
Stages of lifecycle: egg, caterpillar, pupa, moth
Assessing Environmental Impacts
Temperature
  Caterpillar growth rate is affected by temperature
  A caterpillar must reach a certain critical size to enter the pupal stage
  The majority of moths in the Pacific Northwest overwinter as an egg or in a cocoon
  Many species won't emerge unless they undergo a period of cold (diapause)
Plant Nutrition
  Caterpillars are sensitive to nitrogen and water content
  High water content enhances growth
  Theoretically, moth abundance and/or emergence could be linked to changes in the nitrogen and water content of plants
Moths are a source of food, are impacted by the abundance of their own food source, and are sensitive to changes in temperature and in the nutritional quality of plants.
Slide 6
Slide 7
Moth Sampling
Universal black light traps: 22 W circular bulbs, 12 V batteries
Set 1-2 hours before sunset
Moths are attracted to the light and stunned by insecticide and acrylic vanes
Intervals of 1+ weeks
Biased towards phototactic night-flying moths (the majority)
Data not used this summer; will be used in an island biogeography study
Slide 8
Moth Identification
Slide 9
Moth Data Used in Modeling
Sampled with the same method by Jeffrey Miller, 2004-2008
Emergence uses data from 20 sites trapped 30+ times
Moth distribution includes data from the biological inventory survey
  Almost 40% of sites trapped only once
  More than half trapped either once or twice
[Image: Feralia deceptiva]
Slide 10
Vegetation Sampling
Purpose: test the hypothesis that moths are distributed near host plants by contributing to a database of vegetation data at moth sampling sites
32 sites
100 meters in 4 directions
All vascular plant species except fern allies other than horsetails
To learn more about host plants
[Image: Polystichum munitum]
Slide 11
Phenology and Climate Change
"As difficult as it is to predict precisely how the planet will warm over the next century or so, it is even harder to refine predictions of how those changes will affect specific species." [1]
What are the drivers of moth emergence?
  Moths are poikilothermic
How will climate change influence moth emergence?
  Due to human-induced climate change over the last decade, phenology has become one of the leading indicators of species response to environmental change [2]
Will this have an effect on other animals?
[1] Barringer, Felicity. Trout Fishing in a Climate Changed America. New York Times Green Blog, 16/7/2011.
[2] Roy, DB & Sparks, TH. Phenology of British butterflies and climate change. Global Change Biology (2000) 6, 407-416.
Slide 12
Emergence Objectives
Improve on the previous model
Create a model that can predict on which day moths will emerge
Use degree days instead of Julian days, analogous to growing degree days (GDD) for plants
Slide 13
[Figures: model counts with Julian days as the interval; degree-day curve; model counts with degree days as the interval]
Slide 14
Degree Days
Took maximum temperature data from the H.J. Andrews
Assigned trap sites to meteorological (Met.) stations and reference (Ref.) stands
Interpolated missing data
Discuss procedure for calculations
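The degree-day calculation can be sketched roughly as follows. This is a minimal sketch, not the procedure used in the study: it assumes linear interpolation for missing days, accumulates only the amount by which each daily maximum exceeds a base threshold, and runs on made-up temperatures. The function names and the threshold value are hypothetical.

```python
def fill_gaps(temps):
    """Fill None entries by linear interpolation between known neighbours.

    Gaps at either end copy the nearest known value (an assumption).
    """
    known = [i for i, t in enumerate(temps) if t is not None]
    if not known:
        raise ValueError("need at least one observed temperature")
    filled = list(temps)
    for i, t in enumerate(temps):
        if t is not None:
            continue
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if before and after:
            lo, hi = before[-1], after[0]
            weight = (i - lo) / (hi - lo)
            filled[i] = temps[lo] + weight * (temps[hi] - temps[lo])
        else:
            filled[i] = temps[before[-1] if before else after[0]]
    return filled


def cumulative_degree_days(daily_max, threshold):
    """Running total of degrees by which each daily maximum exceeds threshold."""
    total, running = 0.0, []
    for t in daily_max:
        total += max(0.0, t - threshold)
        running.append(total)
    return running


# Hypothetical daily maximum temperatures (deg C); None marks a missing day.
daily_max = fill_gaps([10.0, None, 14.0, 9.0])
print(cumulative_degree_days(daily_max, threshold=11.0))
```

Trap counts can then be indexed by the cumulative value on the trapping day instead of by the calendar day.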
Slide 15
Thermal Climate of the H.J. Andrews Experimental Forest
PRISM estimated mean monthly maximum and minimum temperature maps showing topographic effects of radiation and sky view factors. Provided by Jonathan W. Smith and EISI 2010.
The Model
Uses abundance data from trapping
Estimates parameters of emergence and abundance curves from trap counts
Optimizes parameter estimates to create emergence and abundance curves
Slide 18
$P(j,k)$
We assume we catch all moths flying at trap time
$P(j,k)$ is the probability that a moth emerges in interval $j$ and has a natural death time in interval $k$
Measures abundance
Slide 19
Variables
In the original model, $P(j,k)$ is found by numerically integrating the joint density
$Q(j,k)$ and $q_j$ are computed successively
The likelihood function uses $q_j$ to optimize parameter estimates
Emergence time:
Lifespan:
Slide 20
Obtaining our parameters
$\alpha$ = Pr(moth caught by trap)
$m$ = # moths flying
$q_j$ = Pr(moth trapped at $t_j$)
Assume (a constant)
Slide 21
Multinomial distribution
Slide 22
Convergence in distribution...
where the $F_i$ are Poisson random variables
As $m \to \infty$ and $\alpha \to 0$, we assume $m\alpha$ (expected value of moths caught) approaches some constant
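The Poisson limit can be checked numerically in a simplified setting. The sketch below assumes a single trapping interval, so the number of moths caught is binomial with m trials and a small per-moth catch probability p, and it compares that distribution with a Poisson distribution of the same mean. This is an illustration of the limit only, not the model's multinomial likelihood; the function names and the chosen values of m and p are hypothetical.

```python
import math


def binomial_pmf(k, m, p):
    """Pr(exactly k of m moths are caught), each caught with probability p."""
    if k > m:
        return 0.0
    return math.comb(m, k) * p**k * (1.0 - p) ** (m - k)


def poisson_pmf(k, lam):
    """Pr(count = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def total_variation(m, p, k_max=60):
    """Half the summed absolute difference between the two pmfs (0 = identical).

    The sum is truncated at k_max, which is ample for the small means used here.
    """
    lam = m * p
    return 0.5 * sum(
        abs(binomial_pmf(k, m, p) - poisson_pmf(k, lam)) for k in range(k_max + 1)
    )


# Hold the expected catch m * p fixed at 2 while m grows and p shrinks.
for m, p in [(10, 0.2), (100, 0.02), (1000, 0.002)]:
    print(m, p, total_variation(m, p))
```

The printed distance shrinks as m grows with the expected catch held fixed, which is why a Poisson likelihood is a reasonable stand-in when many moths are flying and each is unlikely to be caught.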
Slide 23
Distribution, cont.
$m$ and $\alpha$ are unknown
If $m$ is large and $\alpha$ small enough, the likelihood will be very close to Poisson
The model uses the multinomial distribution:
Slide 24
Incorporating degree days
Degree day values:
Each moth has an emergence threshold, $D$
Now define
Slide 25
Changes
Compute $P(j,k)$ differently because $T_e$ is discrete
Single set of parameters for each species, rather than separate sets for each trap and year
AIC: measure of fit
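For reference, the Akaike information criterion is AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L is the maximized likelihood; lower values indicate a better trade-off between fit and complexity. The sketch below shows how a pooled parameter set could be compared with separate per-trap sets. The log-likelihoods and parameter counts are hypothetical placeholders, not results from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical values for illustration only.
pooled = aic(log_likelihood=-120.0, n_params=4)     # one parameter set per species
per_trap = aic(log_likelihood=-115.0, n_params=12)  # separate set per trap and year

print(pooled, per_trap)
```

In this made-up case the per-trap fit has the higher likelihood but also the higher AIC, because its gain in fit does not pay for the extra parameters.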
Slide 26
Slide 27
Slide 28
Slide 29
[Figure panels: 3G, Days Since May 1st; 3G, Degree Days]
Slide 30
Slide 31
[Figure panels: 5O, Days Since May 1st; 5O, Degree Days]
Slide 32
Slide 33
Future Work
Degree Days
  Revisit interpolation methods
  Experiment with different degree thresholds and starting dates
Model
  Multinomial vs. Poisson
  Multiple traps for one year
Take new data into account:
  Vegetation surveys
  Elevation, aspect, watershed, habitat
Slide 34
Species Distribution Model Applications
Combine numerical observations and relevant variables (often environmental and spatial) to predict species distribution in space and/or time
Why do this?
  Ecological insight, further research topics
  Land use management and conservation planning
Slide 35
SDMs and Machine Learning
Supervised machine learning
  Use training data $\{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$ to arrive at a function $f(x_i) \approx y_i$.
  Split data into training, test, and validation sets.
Assume that if a moth species exists at a site, we have trapped it at least once at that site.
Slide 36
SDMs and Machine Learning
Training set: half of the original data set, used in initially learning and fitting the function. Certain algorithms require their parameters to be tuned for optimum performance. This is accomplished by testing the model against a validation set, a subset of the training set.
Test set: the other half of the original data set, separate from the training set. After parameter tuning, the function's accuracy can be evaluated by running it on the test set.
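The split described above might look like the following. This is a minimal sketch: the half-and-half division follows the slide, while the size of the validation subset (a quarter of the training half), the random seed, and the function name are assumptions made for illustration.

```python
import random


def split_sites(n_sites, validation_fraction=0.25, seed=0):
    """Return (fitting, validation, test) lists of site indices.

    Half of the sites form the training set and the rest the test set;
    the validation set is carved out of the training half.
    """
    indices = list(range(n_sites))
    random.Random(seed).shuffle(indices)
    half = n_sites // 2
    training, test = indices[:half], indices[half:]
    n_validation = int(len(training) * validation_fraction)
    validation, fitting = training[:n_validation], training[n_validation:]
    return fitting, validation, test


fitting, validation, test = split_sites(100)
print(len(fitting), len(validation), len(test))
```

Parameters are tuned by fitting on `fitting` and scoring on `validation`; the `test` indices are touched only once, for the final accuracy estimate.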
Slide 37
Quantifying Accuracy
The area under the receiver operating characteristic curve (AUC) is used as our measure of accuracy for the distribution maps.
It is the probability that a randomly selected positive instance (moth presence) is ranked higher than a randomly selected negative instance (moth absence).
AUC = 0.5 indicates a random guess.
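The ranking interpretation of AUC translates directly into code. The sketch below compares every presence score with every absence score and counts ties as half a win; the function name and the example scores are hypothetical.

```python
def auc(scores, labels):
    """Fraction of (presence, absence) pairs in which presence scores higher.

    labels use 1 for presence and 0 for absence; ties count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up predicted presence scores for four sites.
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

Three of the four presence/absence pairs are ranked correctly in this example, so the AUC is 0.75.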
Slide 38
Learning Algorithms
Algorithm                                Corresponding R package
Random Forest                            randomForest
Logistic Regression                      glmnet
Support Vector Machines                  e1071
Generalized Boosted Regression Models    gbm
Slide 39
Random Forest
Ensemble method
Grows decision trees by combining bagging with the random selection of features.
A decision tree is a model of decisions and their outcomes. Internal nodes represent points where a decision is made, and the leaves represent the outcomes.
Bagging is the process of randomly sampling with replacement from the set of training examples and constructing a decision tree from each bag.
Random forest also randomly selects a subset of features to consider at each split, rather than using the whole set of features.
Tuning Random Forest
After creating n bags and growing n trees, new data can be classified by taking a vote of all the trees' predictions.
The number of trees grown can be altered and tuned, as can the number of nodes in each tree.
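Bagging, random feature selection, and voting can be sketched in a few lines. This is a deliberately simplified illustration, not the randomForest package: each "tree" has a single split (a stump), so the random feature subset is drawn once per tree, and the toy presence/absence data are made up. All function names are hypothetical.

```python
import random
from collections import Counter


def majority(labels, default):
    """Most common label, or default when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(rows, features):
    """One-split tree: best (feature, threshold) among the candidate features."""
    overall = majority([y for _, y in rows], None)
    best = None
    for f in features:
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            l_lab, r_lab = majority(left, overall), majority(right, overall)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]


def fit_forest(rows, n_trees, n_candidate_features, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        bag = [rng.choice(rows) for _ in rows]  # sample with replacement
        features = rng.sample(range(n_features), n_candidate_features)
        forest.append(fit_stump(bag, features))
    return forest


def predict(forest, x):
    """Classify x by a majority vote over all trees."""
    votes = Counter(l if x[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]


# Made-up sites: two informative covariates, one uninformative; label 1 = presence.
rows = [((0.1, 0.2, 1), 0), ((0.2, 0.1, 2), 0), ((0.3, 0.3, 1), 0), ((0.4, 0.2, 2), 0),
        ((0.6, 0.8, 1), 1), ((0.7, 0.7, 2), 1), ((0.8, 0.9, 1), 1), ((0.9, 0.6, 2), 1)]
forest = fit_forest(rows, n_trees=25, n_candidate_features=2)
print(predict(forest, (0.0, 0.0, 1)), predict(forest, (1.0, 1.0, 2)))
```

Here `n_trees` plays the role of the number of trees to tune; a fuller implementation would also expose tree size, which this one-split sketch fixes.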
Slide 42
Logistic Regression
$P(y = 1 \mid x) = \dfrac{1}{1 + e^{-t}}$, where $t = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$
Attempt to find appropriate values of the coefficients $\beta$ to weight the covariates.
Slide 43
Tuning Logistic Regression
It is often optimal to restrict the number and size of these coefficients in regression.
A combination of penalty terms called the elastic net achieves these restrictions.
The penalty term takes the form: $\lambda\left[\frac{1-\alpha}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1\right]$
The parameters $\alpha$ and $\lambda$ are tuned. $\alpha$ controls which term is more important. $\lambda$ controls the weight of the entire expression.
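To make the roles of the two tuning parameters concrete, the sketch below evaluates the logistic probability and an elastic net penalty of the glmnet-style form lambda * [(1 - alpha)/2 * (sum of squared coefficients) + alpha * (sum of absolute coefficients)]. It assumes the intercept is left unpenalized; the function names and coefficient values are hypothetical.

```python
import math


def predict_probability(beta0, beta, x):
    """P(y = 1 | x) for intercept beta0 and coefficient list beta."""
    t = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-t))


def elastic_net_penalty(beta, alpha, lam):
    """lam * [ (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1 ]."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * ((1.0 - alpha) / 2.0 * l2_squared + alpha * l1)


beta = [3.0, -4.0]  # made-up coefficients
print(elastic_net_penalty(beta, alpha=1.0, lam=0.5))  # absolute-value term only
print(elastic_net_penalty(beta, alpha=0.0, lam=0.5))  # squared term only
```

Setting alpha to 1 leaves only the absolute-value term, which tends to drive some coefficients to exactly zero, while alpha of 0 leaves only the squared term, which shrinks coefficients without removing them.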
Slide 44
Support Vector Machines
Non-probabilistic classifier.
Attempts to construct a hyperplane in n-dimensional feature space to separate two possible classes of data.
The most desirable hyperplane is the one with the largest margin, that is, the greatest distance to the nearest training point of either class.
Slide 45
Tuning Support Vector Machines
Often the data is not linearly separable.
Kernel functions map the data into a space where a separating hyperplane can be constructed more easily.
Kernel options: linear, radial, sigmoid, polynomial
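The four kernels can be written out explicitly. The sketch below uses the functional forms documented for the e1071 `svm` function (with `gamma`, `coef0`, and `degree` parameters); the default values shown are placeholders chosen for this example, not the package's defaults.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def radial_kernel(u, v, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v), polynomial_kernel(u, v, degree=2), radial_kernel(u, u))
```

Each kernel returns a similarity between two covariate vectors, so choosing the kernel and its parameters is part of tuning the classifier.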
Slide 46
Generalized Boosted Regression Models
Ensemble method
Loss function: a measure that represents the loss in predictive performance of a model.
GBMs construct an initial regression tree that maximally reduces the loss function.
A regression tree is a decision tree whose outputs are real-valued.
Slide 47
Generalized Boosted Regression Models
To further reduce the loss function, new trees are added.
At the second step, a regression tree is fitted to the residuals (the variation in the response left unexplained) of the first tree.
The model now updates to contain two terms, and residuals are taken from the two-term model.
The process continues in this stage-wise fashion until the number of trees reaches a specified parameter, n.trees.
Fitted values update with each new tree addition.
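The stage-wise procedure can be illustrated with a small sketch. It assumes squared-error loss, one-split regression trees on a single covariate, and a shrinkage factor applied to each new tree; the shrinkage value, function names, and data are hypothetical, and the gbm package itself is considerably more general.

```python
def mean(values):
    return sum(values) / len(values)


def fit_stump(xs, residuals):
    """Regression stump: threshold and side means minimizing squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # drop the largest x so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def boost(xs, ys, n_trees, shrinkage=0.5):
    """Fitted values after adding n_trees stumps, each fitted to current residuals."""
    fitted = [mean(ys)] * len(ys)
    for _ in range(n_trees):
        residuals = [y - f for y, f in zip(ys, fitted)]
        t, lm, rm = fit_stump(xs, residuals)
        fitted = [f + shrinkage * (lm if x <= t else rm) for x, f in zip(xs, fitted)]
    return fitted


def mse(ys, fitted):
    return mean([(y - f) ** 2 for y, f in zip(ys, fitted)])


# Made-up response values along a single covariate.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0.0, 1.0, 3.0, 6.0, 7.0, 4.0, 2.0, 0.0]
for n_trees in (0, 1, 20):
    print(n_trees, mse(ys, boost(xs, ys, n_trees)))
```

The training error falls as trees are added, which is why the number of trees has to be tuned against held-out data rather than training error alone.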
Slide 48
Tuning GBM
Like the other learning algorithms, GBM also has parameters to be tuned:
  The number of trees to be constructed and added.
  The number of nodes in each tree (interaction depth).
Acknowledgements
NSF, OSU, OSU Arthropod Museum, Matt Cox, Steve Highland, Tom Dietterich, Dan Sheldon, Olivia Poblacion, Julia Jones, Desiree Tullos, Jorge Ramirez, John & Emily Vera, Jeff Miller/Paul C. Hammond