Dealing with detection error in site occupancy surveys: What can we do with a single survey?

Running title: Site occupancy estimation

Subhash R. Lele¹, Monica Moreno¹ and Erin Bayne²
¹Department of Mathematical and Statistical Sciences
²Department of Biological Sciences
University of Alberta
Edmonton, AB T6G 2G1 (Canada).

Corresponding author:
Subhash R. Lele
Email: [email protected]
Phone: (780) 492 4290
Fax: (780) 492 6826

Word count: 5608
Number of tables: 2
Number of figures: 5


Summary:

Site occupancy probabilities of target species are commonly used in ecological studies, e.g. to monitor the current status of and trends in biodiversity. Detection error introduces bias in the estimators of site occupancy. Existing methods for estimating occupancy probability in the presence of detection error use replicate surveys and assume that site occupancy status remains constant across surveys. This assumption is known as population closure. We present an approach for estimating site occupancy probability in the presence of detection error that requires only a single survey and does not require the assumption of population closure. In place of the closure assumption, this method requires covariates that affect either detection or occupancy or both. This single survey approach facilitates the analysis of historical data sets where replicate surveys are unavailable, of situations where replicate surveys are expensive to conduct, and of cases where the assumption of closure is not met. This method saves significant amounts of time, energy, and money in ecological surveys without sacrificing statistical validity.

Key words: Abundance estimation, Biodiversity, Breeding Bird Survey, Closed population, Data cloning, Penalized likelihood, Species occurrence.

Introduction:

The ability to accurately measure the distribution and abundance of species is at the core of ecological research and monitoring (Krebs 1999). Understanding and quantifying species distributions is essential to predict the effects of global climate change, colonization by exotic species or changes in land use. Many sampling methods and statistical analyses have been developed to estimate species abundance. Most methods are designed to measure density, that is, the number of individuals per unit area. While density is a desirable state variable to report, there are a number of practical problems in estimating it (Krebs 1999). Even when it is possible to measure density accurately, the cost of doing so can be prohibitive for large-scale applications. This has led many research programs and large-scale monitoring initiatives to rely on rates of occurrence as coarse measures of species abundance.

Collecting presence-absence data at a series of locations has become a preferred method of evaluating ecological status and trends because of the simplicity of data collection. Presence-absence data are often analyzed to estimate the relationship between site occupancy probability and site characteristics; the resulting models are then used to predict the number of sites in a larger landscape that are occupied by the species. Most applications of species presence-absence data have used logistic regression or contingency table analysis to associate species with temporal, habitat or spatial covariates. The assumption inherent in this approach is that the proportion of sites where a species is detected is equivalent to the actual occupancy rate. Occupancy rate is the true proportion of sites where a species occurs. The proportion of sites with detections estimates it without bias only when the detection probability is equal to 1. The statistical analysis of presence-absence data and the ability to draw conclusions from such data have been questioned (MacKenzie et al. 2002) because detection is seldom perfect, hence true occupancy rates are generally underestimated.

Detection probability is the probability that a species is observed at a particular site during the survey period, given that it is present. Detection probability can be less than one for a variety of reasons. Observers may be unable to detect the presence of the species due to survey-specific conditions such as rain, temperature or lack of visibility at the time of the survey. Detection rate can vary between habitats because the structure of vegetation alters the ability of an observer to see or hear a species. Imperfect detection, if not taken into account, leads to a biased estimator of occupancy probability (MacKenzie et al., 2002). Recently several researchers (MacKenzie et al., 2002; Tyre et al., 2003; Stauffer et al., 2004; Gu and Swihart, 2004; Martin et al., 2005) have developed methods for estimating occupancy probability when the probability of detection is less than 1.

A common requirement of current methods for estimating occupancy probability when detection probability is less than 1 is that sites must be sampled repeatedly. Repeated sampling can take the form of visiting multiple locations within a larger area of interest or of visiting the same location at different times. In a repeated-visit approach, the target species is recorded as being detected or not detected at each visit. At locations where the species is present, detection error will occasionally result in the species not being detected even though it is present at the site. Assuming the true occupancy status does not change over the duration of the repeated visits, called the closed-population assumption, changes between detection and non-detection at a particular site can be attributed to detection error. This allows estimation of detection probability, which can be used to correct biases in the naïve estimator of occupancy probability.

Repeatedly visiting sites to estimate detection error is not always feasible. Returning to a site more than once multiplies the cost of most monitoring and research programs. For example, if an observer has to return on a different day, which may be required to ensure that the additional assumption of independence of surveys is met, the cost of travel is effectively doubled. Requiring multiple visits reduces the number of sites that can be visited within a given sampling season for the same cost, reducing the generality of the results to broader areas. Repeated visits to study sites can also have an adverse effect on the survival of the individuals under observation (Rotella et al., 2000), making multiple visits undesirable from a conservation perspective. Analysis of the vast historical datasets (i.e. monitoring data) that were collected without multiple visits is also not possible in this context (Hirzel et al., 2002). Perhaps most importantly, the validity of the repeated visit methodology depends crucially on the assumption that the population is closed, that is, that site occupancy status remains the same throughout the study period. To ensure closure many researchers have used very short time intervals between revisits. However, the closure assumption can be violated, even on very short time scales, because of within-territory movement. Bayne et al. (2010) show that within-territory movement can introduce severe biases in occupancy estimates. Replicate visit methodology also assumes statistical independence between surveys, which is likely to be violated with shorter durations between revisits (Kendall and White 2009).

Despite the potential limitations of repeated visits for estimating occupancy rate, there is a growing belief among ecologists that repeat surveys are essential if predictive wildlife-habitat models are to be useful. For example, Bolker (2008, page 333) claims: "there is no way to identify catchability—the probability that you will observe an individual—from a single observational sample; you simply don't have the information to estimate how many animals or plants you failed to count". We whole-heartedly agree that ecologists need to be more rigorous in reporting accurate occupancy rates so that valid comparisons can be made across studies. However, does this mean that single-visit data and the resulting inferences are without value, or is there something that can be done with single-visit data that allows us to account for detection error?

Fortunately, the claim made by Bolker (2008) is false; multiple surveys are not essential for estimating site occupancy parameters in the presence of detection error. In general, site occupancy and detection probability parameters can be estimated using a single survey provided (a) the probability of occupancy and the probability of detection depend on covariates, and (b) the set of covariates that affect occupancy and the set of covariates that affect detection differ by at least one variable. We surveyed nearly 100 papers on site occupancy and found that, when covariates were used for modeling occupancy and detection, most of them satisfied these conditions. Clearly, the conditions needed for single survey based estimation are not esoteric and are often satisfied in practice. In the following, we describe a methodology for dealing with detection error in single surveys, present simulation results that show the methodology works, and present an analysis of a common type of data for which only single survey data have been available for over 50 years (Breeding Bird Survey).

Statistical model and estimation procedure:

Let us assume that there are $n$ sites that will be surveyed in the study area. Let $Y_i = 1$ if the i-th site is occupied and $Y_i = 0$ if the i-th site is unoccupied. These are the true states that are unobserved. Let $W_i = 1$ if the i-th site is 'observed to be occupied' and $W_i = 0$ if the i-th site is 'observed to be unoccupied'. The probability of occupancy is denoted by $\psi_i = P(Y_i = 1)$ and the probability of detection by $p_i = P(W_i = 1 \mid Y_i = 1)$. We assume that if the species is not present, it will not be misidentified and hence $P(W_i = 1 \mid Y_i = 0) = 0$. Simple probability calculations show that $P(W_i = 1) = p_i \psi_i$ and $P(W_i = 0) = 1 - p_i \psi_i$. These probabilities can depend on the habitat and other covariates. Let $X$ denote the set of covariates that affect occupancy, and $Z$ denote the set of covariates that affect detection. Some covariates may affect only detection, some covariates may affect only occupancy and some covariates may affect both detection and occupancy. For example, type of forest cover may affect both occupancy and detection, whereas time of the day or weather conditions may affect only detection. Thus, some of the covariates in the sets $X$ and $Z$ might be the same. With this notation, $\psi_i = \psi(X_i, \beta)$ and $p_i = p(Z_i, \theta)$. These functions should be such that $0 \le \psi(X_i, \beta) \le 1$ and $0 \le p(Z_i, \theta) \le 1$.
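
The "simple probability calculations" mentioned above can be written out with the law of total probability. The symbols here are notational assumptions: $Y_i$ is the true state, $W_i$ the observed state, $\psi_i$ the occupancy probability and $p_i$ the detection probability at site $i$.

```latex
\begin{align*}
P(W_i = 1) &= P(W_i = 1 \mid Y_i = 1)\,P(Y_i = 1) + P(W_i = 1 \mid Y_i = 0)\,P(Y_i = 0) \\
           &= p_i\,\psi_i + 0 \cdot (1 - \psi_i) = p_i\,\psi_i, \\
P(W_i = 0) &= 1 - P(W_i = 1) = 1 - p_i\,\psi_i .
\end{align*}
```

Because $p_i \le 1$, the probability of observing the species never exceeds the occupancy probability, which is why the raw proportion of sites with detections tends to underestimate occupancy.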

The necessary conditions under which the parameters $(\beta, \theta)$ are identifiable using single survey data are:

1) There should exist at least one covariate that affects either the probability of detection or the probability of occupancy.

2) The sets of covariates $X$ (occupancy) and $Z$ (detection) should be such that there is at least one covariate that is in one set but not the other.
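
A small numerical check may help show why some covariate is needed. The sketch below assumes constant occupancy and detection probabilities (called `psi` and `p` here, hypothetical names) and a made-up detection record; it is an illustration of the non-identifiable case, not the authors' software.

```python
import math

def loglik_constant(w, psi, p):
    """Single-visit log-likelihood when occupancy (psi) and detection (p)
    are the same at every site."""
    q = psi * p  # probability that a site is observed to be occupied
    detections = sum(w)
    return detections * math.log(q) + (len(w) - detections) * math.log(1.0 - q)

# Hypothetical detection record: 30 detections among 100 sites.
w = [1] * 30 + [0] * 70

# Two different parameter pairs with the same product psi * p = 0.3.
ll_a = loglik_constant(w, psi=0.6, p=0.5)
ll_b = loglik_constant(w, psi=0.5, p=0.6)
print(ll_a, ll_b)  # the two values coincide
```

The likelihood depends on the two constants only through their product, so the data cannot separate them; covariates that enter occupancy and detection differently are what break this tie.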

From a survey of previous applications of site occupancy models, it seems that most practical situations satisfy these conditions. In fact, in 94 out of 100 cases, the covariates that affect detection and the covariates that affect occupancy were disjoint. A limitation of our methodology is that the parameters cannot be estimated when both the probability of occupancy and the probability of detection are constant. However, it appears that this restriction is not important in practice, as in most papers we reviewed the probabilities of occupancy and detection were seldom both constant. It is not possible to provide a general result about identifiability conditions under every possible model. However, the data cloning method (Lele et al., 2007, 2010) can be used both for estimation and for detecting possible non-estimability. The parameters are estimable if and only if the posterior variance converges to zero as the number of clones increases. This test is built into our software for the analysis of single survey site occupancy data (Solymos and Moreno, 2010).

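The goal of the statistical analysis is to estimate the occupancy parameters $\beta$ and the detection parameters $\theta$ given the observations $W = (W_1, W_2, \ldots, W_n)$. The likelihood function for these data is

$$L(\beta, \theta; W) = \prod_{i=1}^{n} \{p(Z_i, \theta)\,\psi(X_i, \beta)\}^{W_i} \{1 - p(Z_i, \theta)\,\psi(X_i, \beta)\}^{1 - W_i} \qquad (1)$$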
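
As an illustration of likelihood (1), the following sketch evaluates its logarithm when both occupancy and detection use a logistic link. The function names, the toy data and the choice of link are assumptions made for the example; the text allows any functions that keep both probabilities between 0 and 1.

```python
import math

def expit(x):
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def linear_predictor(coefs, covariates):
    """Intercept plus the dot product of slopes and covariate values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], covariates))

def single_visit_loglik(beta, theta, w, x, z):
    """Log of likelihood (1): each site contributes a Bernoulli term with
    success probability p_i * psi_i."""
    total = 0.0
    for w_i, x_i, z_i in zip(w, x, z):
        psi_i = expit(linear_predictor(beta, x_i))   # occupancy probability
        p_i = expit(linear_predictor(theta, z_i))    # detection probability
        q_i = psi_i * p_i                            # P(site observed occupied)
        total += math.log(q_i) if w_i == 1 else math.log(1.0 - q_i)
    return total

# Tiny hypothetical data set: one occupancy and one detection covariate.
w = [1, 0, 0, 1]
x = [[0.5], [-1.0], [0.2], [1.5]]
z = [[0.0], [1.0], [-0.5], [0.3]]
ll = single_visit_loglik(beta=[0.0, 1.0], theta=[0.0, -0.5], w=w, x=x, z=z)
print(ll)
```

A function of this shape can be handed to any numerical optimizer when the sample is large; the small-sample difficulties discussed in the text are a property of the likelihood surface, not of this evaluation.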

Maximum likelihood estimators (MLE) are obtained by maximizing the likelihood function (1) with respect to the occupancy and detection parameters $(\beta, \theta)$. If the sample size is large, one can use any numerical optimization technique to obtain the MLE. However, for small samples, this likelihood function is not a well-behaved, concave function. The problem of an ill-behaved likelihood function is not unique to the single survey situation. Moreno and Lele (2010) show that even in the multiple survey approach used by MacKenzie et al. (2002), when the number of sites and/or the number of surveys is small, the likelihood function is ill-behaved and numerical maximization may not reach the proper solution. Moreno and Lele (2010) use penalized likelihood to obtain estimators from a multiple visit approach that are somewhat biased in small samples but have substantially smaller mean squared error than the MLE. They also show that the confidence intervals based on the penalized likelihood estimators are substantially shorter than those based on the standard likelihood and have close to nominal coverage. The concept of the maximum penalized likelihood estimator (MPLE), its intuitive justification, and simulation results comparing its properties with those of the usual maximum likelihood estimator under various sample sizes and different levels of occupancy and detection error are presented in Moreno and Lele (2010). Similar improvements over the standard maximum likelihood estimator are obtained using the penalization idea in the single survey situation as well; see the simulation results in the appendix. We emphasize here that the difference between the MPLE and the MLE vanishes as the sample size increases. The penalization simply stabilizes the likelihood function for small sample sizes. The penalized likelihood estimators in the single visit case are obtained using the following steps.

Step 1: Obtain the MLE for the occupancy and detection parameters $(\beta, \theta)$ by maximizing the likelihood function in (1). Let us denote these by $(\hat{\beta}_M, \hat{\theta}_M)$.

Step 2: Obtain the naïve estimator of the occupancy parameters $\beta$ by maximizing

$$\prod_{i=1}^{n} \psi(X_i, \beta)^{W_i} \{1 - \psi(X_i, \beta)\}^{1 - W_i}$$

This estimator, denoted by $\hat{\beta}_{naive}$, is based on the assumption that there is no detection error. It is stable but biased, with the magnitude of the bias depending on how large the detection error is.

Step 3: Obtain the naïve estimator of the detection parameters $\theta$ by maximizing

$$\prod_{i=1}^{n} p(Z_i, \theta)^{W_i} \{1 - p(Z_i, \theta)\}^{1 - W_i}$$

This estimator, denoted by $\hat{\theta}_{naive}$, is based on the assumption that all sites are occupied. This estimator is stable but biased.

Step 4: Maximize the penalized likelihood function with respect to $(\beta, \theta)$ [penalized likelihood equation and the definitions of its two penalty weights not recoverable from the source]. The penalty weights involve $(\bar{\psi}_{naive}, \bar{p}_{naive})$ and $(\bar{\psi}_M, \bar{p}_M)$, which denote the average occupancy and detection probabilities under the naïve method of estimation and the MLE respectively. Justification for this penalty function is along the same lines as described in Moreno and Lele (2010). The confidence intervals for the MPLE can be based on the bootstrap technique and have been shown to have good coverage (Moreno and Lele, 2010). A computer program written in R to implement this method is available in Solymos and Moreno (2010).

As described in Moreno and Lele (2010), because the penalty weights converge to zero as the sample size increases, the penalized likelihood function approaches the likelihood function if the number of sites is large. Penalization simply stabilizes the likelihood function for small sample sizes. If the MLE of the average detection probability is high, naïve estimates of the occupancy parameters are reasonable. In this case, the first component of the penalty function is large, thus shrinking the occupancy parameters towards their naïve estimates. Similarly, if the MLE of the average occupancy is high, naïve estimates of the detection parameters are reasonable. In this case, the second component of the penalty function is large, thus shrinking the detection parameters towards their naïve estimates.

In Step 1 of the penalized likelihood estimation algorithm, we need to compute the MLE and its variance. If the number of sites is smaller than 100, using a gradient-based optimization technique to locate the maximum is tricky because it tends to lead to nearly singular Hessian matrices (Moreno and Lele 2010). Hence we cannot use the inverse of the Hessian matrix to approximate the variance of the MLE. Instead of using a local, gradient-based technique to find the MLE and its variance in Step 1, we use a global stochastic search method called data cloning (Lele et al. 2007, 2010), a variant of the well-known simulated annealing method. In data cloning, as in simulated annealing, the MLE is obtained as the mean of the posterior distribution. This avoids the task of numerically differentiating a non-smooth function. To obtain the variance of the MLE, one can either use the bootstrap (Efron and Tibshirani 1993) or approximate it by the variance of the posterior distribution multiplied by the number of clones (Lele et al. 2007, 2010). This avoids the problem of inverting a nearly singular Hessian matrix to approximate the variance of the MLE. The penalized likelihood function (Step 4) is maximized using standard numerical optimization techniques.
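
The behaviour that data cloning relies on can be seen in a toy model where the posterior is available in closed form. The sketch below is not the occupancy model and not the authors' implementation: it assumes a plain binomial sample with a Beta prior (both hypothetical choices) and repeats the data `clones` times.

```python
def cloned_posterior(successes, trials, clones, prior_a=2.0, prior_b=2.0):
    """Posterior mean and variance of a binomial probability after the data
    are repeated `clones` times; the Beta prior is conjugate, so no MCMC
    is needed in this toy case."""
    a = prior_a + clones * successes
    b = prior_b + clones * (trials - successes)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var

successes, trials = 12, 40             # hypothetical data; the MLE is 12 / 40
mle = successes / trials
mle_var = mle * (1.0 - mle) / trials   # usual asymptotic variance of the MLE

results = {}
for k in (1, 10, 100, 1000):
    mean, var = cloned_posterior(successes, trials, k)
    results[k] = (mean, var, k * var)
    print(k, round(mean, 4), round(var, 6), round(k * var, 6))
```

As the number of clones grows, the posterior mean moves from the prior-influenced value towards the MLE, the posterior variance shrinks towards zero, and the number of clones times the posterior variance approaches the asymptotic variance of the MLE. For a non-estimable parameter the posterior variance would not shrink in this way, which is the diagnostic described earlier.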

Simulation results:

These simulations have two goals. The first goal is to support the claim that the parameters are estimable using a single survey. If the parameters are consistently estimable then, as we increase the sample size, the estimates should converge to the true values. The second goal is to show that these estimators give reasonable inferences in practical situations.

To achieve these goals, and to cover a variety of scenarios commonly found in this type of analysis, a total of 54 cases were simulated. These cases were defined by considering different levels of factors such as the sample size, the type of link function, the probabilities of occupancy and detection, and the configuration of the covariates, i.e. whether or not there was a covariate common to both occupancy and detection.

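For each case, 100 data sets were generated using two covariates for occupancy and two covariates for detection. For the case with no common covariates, the probability of occupancy was calculated using the logistic link

$$\psi_i = \frac{\exp(\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i})}{1 + \exp(\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i})}$$

where the covariate values were generated using [distributions not recoverable from the source]. Similarly, the probability of detection was calculated using either the logistic link or the log-log link [formula not recoverable from the source], where the covariate values were generated using [distributions not recoverable from the source].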

For the common covariate cases, the covariate [symbol not recoverable from the source] for the occupancy model was taken as the common one for both. For instance, if the common covariate is a continuous one and the link for both occupancy and detection is the logistic link, the probabilities of occupancy and detection for site i are both logistic functions of linear predictors that include that covariate [the two formulas are not recoverable from the source].
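
A data-generating step of this kind might look like the sketch below. The coefficient values, the standard normal covariates and the function names are all assumptions made for illustration, since the distributions and coefficients used in the simulations are not given in this text.

```python
import math
import random

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def simulate_single_visit(n_sites, beta, theta, rng):
    """Generate one single-visit data set with no common covariates."""
    data = []
    for _ in range(n_sites):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]   # occupancy covariates (assumed N(0, 1))
        z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]   # detection covariates (assumed N(0, 1))
        psi = expit(beta[0] + beta[1] * x[0] + beta[2] * x[1])
        p = expit(theta[0] + theta[1] * z[0] + theta[2] * z[1])
        y = 1 if rng.random() < psi else 0               # true, unobserved state
        w = 1 if (y == 1 and rng.random() < p) else 0    # detection, no false positives
        data.append({"x": x, "z": z, "psi": psi, "p": p, "y": y, "w": w})
    return data

rng = random.Random(2010)
sites = simulate_single_visit(5000, beta=[-1.0, 1.0, -0.5],
                              theta=[-1.0, 0.8, 0.5], rng=rng)

mean_psi = sum(s["psi"] for s in sites) / len(sites)
true_rate = sum(s["y"] for s in sites) / len(sites)
naive_rate = sum(s["w"] for s in sites) / len(sites)
print(mean_psi, true_rate, naive_rate)
```

Comparing `naive_rate` with `true_rate` shows the bias that motivates the method: the proportion of sites with detections falls short of the proportion of occupied sites whenever detection is imperfect.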

The set of parameters was selected to obtain the level of occupancy and detection required for each case, and the estimates were obtained using the maximum penalized likelihood estimator described in the section on the statistical model and estimation procedure. Figures 1 and 2 present the results from two representative cases. A complete summary of the results obtained for the 54 cases is available in the electronic appendix.

Figure 1 shows box-plots of the estimated parameters when the mean probability of occupancy is 0.27, the mean probability of detection is 0.27, and the covariates for occupancy and detection are separable. Clearly, as the sample size increases, the distributions of the estimates become more symmetric and their centers move closer to the true values. The spread of the distributions also decreases as the sample size increases. Figure 2 presents the results for the case in which a discrete covariate is common to both occupancy and detection. Again, the mean probabilities of occupancy and detection are low (0.27 and 0.31 respectively). As in the separable covariates case, the centers of the distributions get closer to the true values and the variance decreases as the sample size increases.

For most of the situations considered in our simulations (see electronic appendix), the mean occupancy and mean detection probability can be estimated reasonably well at sample sizes of 100-200, whereas good estimation of the regression coefficients requires sample sizes of 300 or larger. See Figure 3 for an example. If the main goal of an analysis is estimation of the mean occupancy rate, one does not have to worry as much about sample size as when accurate estimation of the regression coefficients per se is the objective.

These results, along with the results in the appendix, show that the occupancy and detection parameters are identifiable using single survey data. This holds even when the sets of covariates for occupancy and detection have some overlap.

Illustrative data analysis

To illustrate the estimation of the parameters of an occupancy model using a single survey, we consider detected/not-detected data for Ovenbirds (Seiurus aurocapilla). Data were collected in 1999 using Breeding Bird Survey (BBS) protocols (Downes and Collins 2003) in the boreal plains eco-region of Saskatchewan. The goal of the study was to determine whether the occupancy of this species was influenced by the amount of forest around each survey point. Data were collected along 36 BBS routes, each consisting of 50 survey locations separated by 800 meters. To increase the independence of observations we used every second survey point along each route (thus consecutive points were 1.6 km apart) in our analysis (n = 900 survey locations). Attributes of the forest type and the amount of forest remaining within a 400 m radius were estimated from the Saskatchewan Digital Land Cover Project (MacTavish 1995).

The habitat requirements of the Ovenbird are well understood (Hobson and Bayne 2002) and we expected that the probability that a location was occupied by the Ovenbird would be positively influenced by the amount of deciduous forest remaining (forest deciduous proportion). We also included longitude, as the study covered an east-west gradient over 1000 kilometers in length, although a priori we were not sure what effect this would have on occupancy.

We expected four factors to influence detection probability: observer, time of day, time of year, and amount of forest. Observers differ in their ability to hear birds, in part because of skill but also because of fundamental differences in the distance over which they hear things. In general, male songbirds sing very regularly early in the breeding season, making it easy to detect individuals that are present. As the breeding season progresses, however, the males spend less time singing as they focus on other activities. This often results in lower detectability later in the breeding season. We therefore included Julian date as a variable influencing detection error. Male songbirds also have a tendency to sing earlier in the day, shortly after sunrise, and then later in the morning focus on guarding the mate or foraging. To account for this, we included time of the day as a factor influencing detectability. Detectability can also be influenced by habitat attributes. In more open environments it is plausible that birds can be heard from long distances, increasing the chance that an individual is detected (Schieck 1997). Alternatively, in areas with more forest the chance of multiple males singing may be higher, increasing detection probability relative to areas with less forest where only one individual may exist.

We considered several different models and used the Akaike Information Criterion (AIC) to select the final model. We also used the Receiver Operating Characteristic (ROC) curve and the area under the ROC curve (AUC) to heuristically assess the fit of the full model, detection and occupancy together. Table 1 gives the details of the various models that were considered and the corresponding AIC and AUC values. The final model has an AUC of 0.82, indicating fairly good predictive capacity for the full model, detection and occupancy together. It also has a smaller AIC value than the other candidate models.
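
For readers who want to reproduce this kind of comparison, the two criteria can be computed as in the sketch below. The log-likelihood values, scores and labels are hypothetical; in the single-survey setting the score for a site would plausibly be its fitted probability of being observed occupied (detection times occupancy), which is an assumption about how the "full model" AUC was formed.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion; smaller values are preferred."""
    return 2.0 * n_params - 2.0 * log_likelihood

def auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the share of (detected, not detected) pairs in which the detected site
    has the higher score, with ties counting one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical fitted scores and observed detections for six sites.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.62, 0.10, 0.35, 0.40, 0.05, 0.80]
print(auc(scores, labels))

# Hypothetical candidate models: (maximized log-likelihood, number of parameters).
print(aic(-100.0, 4), aic(-99.5, 6))
```

In the toy comparison the richer model gains too little log-likelihood to offset its two extra parameters, so the smaller model has the lower AIC.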

Table 2 presents the estimated parameters, the 90% confidence intervals and the estimated standard errors for occupancy and detectability. Figure 4 depicts graphically how the probabilities of occupancy and detection vary with the covariates. The confidence intervals and the standard errors were estimated using 200 bootstrap samples. As expected, the proportion of deciduous forest has a positive effect on the probability of occupancy. This relationship was best fit using a log-transformation of forest deciduous proportion. Longitude was not statistically significant, but its estimated effect suggested that the Ovenbird occupancy rate increased as surveys were done further west.
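
The bootstrap step can be sketched as follows. This is a generic percentile bootstrap over sites with 200 resamples and a 90% level, matching the numbers quoted above; the resampling scheme, the function name and the stand-in estimator (a simple detection proportion instead of a refitted occupancy model) are assumptions, since the text does not specify which bootstrap variant was used.

```python
import random
import statistics

def percentile_bootstrap(data, estimator, n_boot=200, level=0.90, seed=1):
    """Resample sites with replacement, re-estimate each time, and return the
    percentile interval together with the bootstrap standard error."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    estimates.sort()
    tail = (1.0 - level) / 2.0
    lower = estimates[round(tail * n_boot)]
    upper = estimates[round((1.0 - tail) * n_boot) - 1]
    return lower, upper, statistics.stdev(estimates)

# Hypothetical detection record: 30 detections among 100 sites.
w = [1] * 30 + [0] * 70
lower, upper, se = percentile_bootstrap(w, estimator=lambda d: sum(d) / len(d))
print(lower, upper, se)
```

In an actual analysis the `estimator` argument would refit the full model to each resampled data set and return the parameter of interest.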

The amount of forest cover was the strongest predictor of detection probability. Detection probability was highest in areas with higher forest cover. This suggests that increased numbers of birds in areas with more forest increase the probability of detecting the species, while in areas with low forest cover the reduced numbers of birds mean that the chances of detecting the species, given that it is present, are considerably lower. Although not strictly statistically relevant according to AIC (and hence not included in the final model), detection probability did differ among observers and was affected by the time of the day and Julian date in a sensible fashion (Figure 5). One observer (SVW) was much more likely than the others to detect birds in areas with less forest. Previous experience in other projects has demonstrated that this individual is able to hear birds over far greater distances than other people, so this result was not unexpected. Detection probability was negatively related to Julian date, indicating that decreased singing activity later in the season reduced observers' ability to detect birds that were present. Time of day had a positive relationship with detection probability. This was somewhat unexpected. However, time of day was the least significant effect and surveys were done in a very narrow time window (4:00 to 9:00 local time). The estimated mean probability of occupancy for all the sites, based on the final model, was 0.523, with a mean probability of detection of 0.466. The mean probability of occurrence without correcting for detection error, on the other hand, was 0.297.

Discussion

Ecologists have long relied on single surveys to estimate the relative abundance of organisms between habitats. The key assumption underlying this approach is that, by randomizing sources of detection error across the variables of primary interest, the correct relative pattern of habitat suitability is revealed even though the absolute abundance or occupancy rate is underestimated. However, multiple survey techniques have demonstrated that patterns in habitat selection can be strongly influenced by detection error when habitat conditions influence the ability to detect an organism. In addition, multiple survey methods have been useful in shifting ecologists away from relative measures to absolute estimates of abundance, facilitating informed decision-making and better ecological inference. When the crucial assumptions of population closure and independence of surveys are satisfied and costs are not a major issue, multiple survey methods will generally provide statistically more efficient estimators than a single survey based approach. However, multiple survey methods have assumptions that, if not met, will result in biased estimates of occupancy rate.

The concept of a closed population is clear in some situations. For stationary organisms, the assumption of population closure is met quite naturally because detection error cannot be due to movement of the organism out of the sampling area. Detection error in such situations is clearly due to sampling conditions such as weather, date, time of day, observer effects, or the ability to see a species in one habitat over another. For mobile organisms, on the other hand, a closed population is difficult to define. For example, many wildlife ecologists record species presence using point surveys whereby individuals heard or seen within a given radius of the observer's location are counted. For species that are not territorial and move widely across the landscape, multiple visit methodologies could fail to provide accurate estimates of occupancy rate. Even for territorial species, the spatial scale of sampling relative to the spatial scale of movement of the animal creates problems for multiple visit surveys. For example, when only part of an individual animal's territory is within the sampling area, an observer may detect the animal imperfectly over multiple visits simply because the animal is in a part of its territory that is outside the sampling area during one visit and inside the sampling area during another visit. Bayne et al. (2010) show that within-territory movement can introduce biases in occupancy estimates that dramatically overestimate density. See also Rota et al. (2009) for other factors that result in the assumption of closure being violated.

Financial and logistical costs are pivotal in the design of effective monitoring and scientific studies to track biodiversity. Proponents of multiple visit methods often suggest that the increased costs of multiple visits are negligible. Admittedly, if the repeated visits occur on the same day, the travel costs to a site will be relatively small. However, whether such an approach will achieve independence of visits is not well established. In addition, monitoring programs such as the Breeding Bird Survey (BBS) already maximize the number of stops that an observer can make during the ideal period of observations for birds. Requiring multiple visits would force the BBS and other monitoring programs to return to the same locations on a different day. Returning to a site on a different day increases travel costs and personnel time, which will typically reduce the total number of sites that can be visited during a survey season in direct proportion to the number of sites that have to be revisited. We estimate that a shift from a single visit to four visits for a bird monitoring program we are involved with in northern Alberta, where many sites are visited by helicopter, would increase the annual cost of monitoring 720 stations from ~$180,000 to ~$700,000.
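The cost figures above can be turned into per-station numbers with a few lines of arithmetic. The sketch below uses only the approximate totals quoted in the text; the fixed-budget scenario (keeping the single-visit budget and asking how many stations could then receive four visits) is a hypothetical illustration, and all function and variable names are assumptions made for this example.

```python
SINGLE_VISIT_TOTAL = 180_000  # approximate annual cost, one visit per station
FOUR_VISIT_TOTAL = 700_000    # approximate annual cost, four visits per station
STATIONS = 720


def cost_per_station(total_cost: float, stations: int) -> float:
    """Average annual cost of monitoring one station."""
    return total_cost / stations


def stations_affordable(budget: float, per_station_cost: float) -> int:
    """Number of whole stations that a fixed budget can cover."""
    return int(budget // per_station_cost)


single = cost_per_station(SINGLE_VISIT_TOTAL, STATIONS)
four = cost_per_station(FOUR_VISIT_TOTAL, STATIONS)
cost_ratio = FOUR_VISIT_TOTAL / SINGLE_VISIT_TOTAL
# Hypothetical scenario: budget held at the single-visit level.
reduced_stations = stations_affordable(SINGLE_VISIT_TOTAL, four)

if __name__ == "__main__":
    print(f"cost per station, single visit: ${single:,.0f}")
    print(f"cost per station, four visits:  ${four:,.0f}")
    print(f"cost ratio: {cost_ratio:.2f}")
    print(f"stations affordable at fixed budget: {reduced_stations}")
```

With these rounded totals, the per-station cost rises from about $250 to about $972, and a budget held at the single-visit level would cover roughly 185 stations with four visits instead of 720 with one.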

The development of the single survey approach provides an additional tool to ecologists that allows for correction of detection error, does not have the critical assumption of population closure, and has the logistical flexibility of conventional single-survey designs.

Acknowledgements:

This work was partially supported by grants from NSERC and Alberta Biodiversity Monitoring Initiative, Environment Canada.


Literature cited:

Bayne, E., Lele, S. & Solymos, P. (2010) Bias in the estimation of bird density and relative abundance when the closure assumption of multiple survey approaches is violated: A simulation study (submitted manuscript).

Bolker, B. (2008) Ecological Models and Data in R. Princeton University Press, Princeton, NJ.

Downes, C.M. & Collins, B.T. (2003) The Canadian breeding bird survey, 1967-2000. Canadian Wildlife Service, Progress Notes # 219. National Wildlife Research Centre, Ottawa, ON.

Gu, W. & Swihart, R.K. (2004) Absent or undetected? Effects of non-detection of species occurrence on wildlife–habitat models. Biological Conservation 116: 195-203.

Hirzel, A.H., Hausser, J., Chessel, D. & Perrin, N. (2002) Ecological niche factor analysis: how to compute habitat suitability maps without absence data. Ecology 83: 2027-2036.

Hobson, K.A. & Bayne, E.M. (2002) Breeding bird communities in boreal forest of Western Canada: Consequences of “unmixing” the mixed woods. Condor 102: 759-769.

Krebs, C.J. (1985) Ecology: The experimental analysis of distribution and abundance (3rd edition). Harper and Row, New York, USA.

Krebs, C.J. (1999) Ecological Methodology (2nd edition). Pearson Education Press, USA.

Lele, S., Dennis, B. & Lutscher, F. (2007) Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. Ecology Letters 10: 551-563.

Lele, S.R., Nadeem, K. & Schmuland, B. (2010) Estimability and Likelihood Inference For Generalized Linear Mixed Models Using Data Cloning. Journal of the American Statistical Association (in press).


MacKenzie, D.I., Nichols, J.D., Lachman, G.B., Droege, S., Royle, J.A. & Langtimm, C.A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83: 2248-2255.

Martin, T.G., Wintle, B.A., Rhodes, J.R., Kuhnert, P.M., Field, S.A., Low-Choy, S.J., Tyre, A.J. & Possingham, H.P. (2005) Zero tolerance ecology: improving ecological inference by modeling the source of zero observations. Ecology Letters 8: 1235-1246.

MacTavish, P. (1995) Saskatchewan digital landcover mapping project. Report I-4900-15-B-95. Saskatchewan Research Council, Saskatoon, SK.

Moreno, M. & Lele, S.R. (2010) Improved estimation of site occupancy using penalized likelihood. Ecology 91: 341-346.

Rotella, J.J., Taper, M.L. & Hansen, A.J. (2000) Correcting nesting success estimates for possible observer effects: maximum likelihood estimates of daily survival rates with reduced bias. Auk 117: 92-109.

Schieck, J. (1997) Biased Detection of Bird Vocalizations Affects Comparisons of Bird Abundance among Forested Habitats. The Condor 99: 179-190.

Solymos, P. & Moreno, M. (2010) Analyzing Single Visit Site Occupancy Data with Detection Error. R package version 1.0-0. URL: http://cran.r-project.org/package=occupy

Stauffer, H.B., Ralph, C.J. & Miller, S.L. (2004) Ranking habitat for marbled murrelets: a new conservation approach for species with uncertain detection. Ecological Applications 14: 1374-1383.

Tyre, A.J., Tenhumberg, B., Field, S.A., Niejalke, D., Parris, K. & Possingham, H.P. (2003) Improving precision and reducing bias in biological surveys: estimating false-negative error rates. Ecological Applications 13: 1790-1801.


Table 1. Models for the ovenbird data sorted from smallest to largest Akaike Information Criterion (AIC). We also provide the Schwarz Information Criterion (BIC) and the area under the Receiver Operating Characteristic curve (AUC). The smaller the AIC or BIC, the better the model fit; the larger the AUC, the better the fit.

| Model | Occupancy model covariates | Detection model covariates | AUC | AIC | BIC |
|---|---|---|---|---|---|
| 1 | Log(proportion of deciduous forest) | Proportion of forest | 0.823 | 825.500 | 844.656 |
| 2 | Log(proportion of deciduous forest) | Proportion of forest, Julian date, time of day | 0.826 | 826.550 | 855.283 |
| 3 | Log(proportion of deciduous forest); log(non deciduous forest) | Proportion of forest | 0.823 | 827.129 | 851.074 |
| 4 | Proportion of deciduous forest, longitude | Proportion of forest, Julian date, time of day, observer | 0.828 | 827.907 | 875.797 |
| 5 | Log(proportion of deciduous forest), longitude | Proportion of forest, Julian date, time of day | 0.826 | 828.490 | 862.013 |
| 6 | Log(proportion of deciduous forest), longitude | Proportion of forest, Julian date, time of day, observer | 0.827 | 830.775 | 878.664 |
| 7 | Log(proportion of agricultural area), longitude | Proportion of forest, Julian date, time of day, observer | 0.818 | 841.286 | 889.175 |
| 8 | Proportion of agricultural area, longitude | Proportion of forest, Julian date, time of day, observer | 0.820 | 843.457 | 891.347 |
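As a reading aid for Table 1, the sketch below computes AIC differences from the best model and compares the orderings implied by AIC and BIC. The numbers are copied from the table; the code structure and names are illustrative assumptions and not part of the original analysis.

```python
# (model number, AUC, AIC, BIC), copied from Table 1
TABLE_1 = [
    (1, 0.823, 825.500, 844.656),
    (2, 0.826, 826.550, 855.283),
    (3, 0.823, 827.129, 851.074),
    (4, 0.828, 827.907, 875.797),
    (5, 0.826, 828.490, 862.013),
    (6, 0.827, 830.775, 878.664),
    (7, 0.818, 841.286, 889.175),
    (8, 0.820, 843.457, 891.347),
]
AIC_COL, BIC_COL = 2, 3


def delta_aic(rows):
    """AIC difference between each model and the lowest-AIC model."""
    best = min(row[AIC_COL] for row in rows)
    return {row[0]: round(row[AIC_COL] - best, 3) for row in rows}


def rank_by(rows, column):
    """Model numbers ordered from smallest to largest value in `column`."""
    return [row[0] for row in sorted(rows, key=lambda row: row[column])]


deltas = delta_aic(TABLE_1)
aic_order = rank_by(TABLE_1, AIC_COL)
bic_order = rank_by(TABLE_1, BIC_COL)

if __name__ == "__main__":
    print("delta AIC:", deltas)
    print("AIC order:", aic_order)
    print("BIC order:", bic_order)
```

Both criteria place model 1 first. BIC, whose penalty per parameter exceeds that of AIC for all but very small samples, moves model 3 (fewer covariates) ahead of model 2 and model 5 ahead of model 4.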


Table 2. Estimated parameters, confidence intervals and standard errors for the occupancy and detection model for Ovenbird occupancy survey data.

| Model | Covariates | Point estimate (90% CI) |
|---|---|---|
| Occupancy model | Intercept | -0.255 (-0.843, 0.566) |
| Occupancy model | Log(proportion of deciduous forest) | 1.546 (0.836, 3.561) |
| Detection model | Intercept | 0.476 (0.047, 1.127) |
| Detection model | Proportion of forest area | 0.951 (0.588, 1.763) |
| | Average occupancy probability | 0.496 (0.405, 0.643) |
| | Average detection probability | 0.489 (0.377, 0.608) |
| | Naïve estimate of average occupancy | 0.297 (0.278, 0.318) |


Figure 1. Simulations showing estimability of the parameters using a single survey when the covariates that affect occupancy and detection are distinct. The parameters correspond to the occupancy and correspond to the detection models. As the sample size increases, the estimates converge to the true value.


Figure 2. Simulations showing estimability of the parameters using a single survey when there is a categorical common covariate that affects occupancy and detection. The parameters correspond to the occupancy and correspond to the detection models. As the sample size increases, the estimates converge to the true value.


Figure 3. Simulations showing estimability of the mean occupancy and detection for both cases: using separable covariates, and using a categorical common covariate that affects occupancy and detection.


Figure 4. (a) Probability of occupancy, (b) probability of detection, and (c) Receiver Operating Characteristic curve for the Ovenbird data.


Figure 5. Estimates of the effects of observer, Julian date and time of day on the probability of detection for the Ovenbird data.