Model-based Inference for Rare and
Clustered Populations from Adaptive
Cluster Sampling using Auxiliary Variables
Izabel Nolau de Souza
Advisors: Kelly Cristina Mota Goncalves
and
Joao Batista de Morais Pereira
Universidade Federal do Rio de Janeiro
Instituto de Matemática
Departamento de Métodos Estatísticos
2020
Master's dissertation submitted to the Graduate Program in Statistics of the Institute of
Mathematics of the Universidade Federal do Rio de Janeiro (UFRJ), in partial fulfillment
of the requirements for the degree of Master in Statistics. Approved by:
Kelly Cristina Mota Goncalves
Dr.Sc. - IM/UFRJ - Advisor.
Joao Batista de Morais Pereira
Dr.Sc. - IM/UFRJ - Co-advisor.
Carlos Antonio Abanto-Valle
Dr.Sc. - IM/UFRJ.
Fernando Antonio da Silva Moura
Dr.Sc. - IM/UFRJ.
Pedro Luis do Nascimento Silva
Dr.Sc. - ENCE.
Rio de Janeiro, RJ - Brazil
April 30, 2020
Acknowledgments
The completion of this project is due not only to me but also to everyone who was
directly or indirectly involved. The sharing of countless doubts, uncertainties,
achievements and lessons learned was enormous and constant.
I first thank God for giving me health and strength, not only during the master's
program but at all times, allowing all of this to happen.
I thank my parents, Izilda and Fernando Jose, for all the love, support and encouragement
they have always given me, celebrating my achievements as if they were their own.
Thank you for the education you provided me and for believing in me so much! To
all the family members and friends who supported me and followed my entire journey,
thank you very much!
I thank my advisors, Kelly and Joao, for the encouragement, attention and support they
gave me and for the corrections they made to this work in order to improve it. Thank you
not only for your friendship but also for all the concern you have always shown
for my future and for always believing in my potential.
I thank all the teachers I have had throughout my life, especially the
professors who accompanied me during my undergraduate and master's studies, who were so
patient with me and who contributed so much to my education.
I thank professors Carlos Antonio Abanto-Valle (UFRJ), Fernando Antonio
Moura (UFRJ) and Pedro Luis do Nascimento (ENCE) for agreeing to serve on my
examination committee.
Finally, I thank the Conselho Nacional de Desenvolvimento Científico e Tecnológico
(CNPq) for the financial support of my studies.
Abstract
Rare populations, such as endangered animals and plants, drug users and individuals
with rare diseases, tend to cluster in regions. Adaptive cluster sampling is generally
applied to obtain information from clustered and sparse populations since it increases
survey effort in areas where the individuals of interest are observed. This work
proposes a unit-level model which assumes that counts are related to auxiliary variables.
The auxiliary information also improves the sampling process, since it assigns different
weights to the cells, and allows the cell counts to be spatially referenced. The proposed
model fits rare and grouped populations, arranged over a
regular grid, in a Bayesian framework. The approach is compared to alternative methods
using simulated data and a real experiment in which adaptive samples were drawn from an
African Buffaloes population in a 24,108 km² area of East Africa. Simulation studies show
that the model is efficient under several settings, validating the methodology proposed
in this dissertation for practical situations.
Keywords: Informative sampling, MCMC, spatial sampling, zero-inflated count data
Contents
1 Introduction
2 Inference for finite populations
2.1 Fixed population approach
2.2 Superpopulation models
2.2.1 Informative sampling design
3 Adaptive cluster sampling
3.1 Thompson's (1990) approach
3.2 Extensions of adaptive cluster sampling
3.3 Model-based inference
4 Model for rare and clustered populations under adaptive sampling using covariates
4.1 Proposed model for cell counts using covariates
4.1.1 Model inference
4.1.1.1 Allocating procedure
4.1.2 Sampling procedure
4.1.2.1 Border-sampling procedure
4.2 A first simulation study
4.3 Comparison with the aggregated model
4.3.1 A design-based experiment evaluating the sample fraction
4.3.2 A real application
4.4 Model-based experiment under different settings
5 Conclusions
A Full conditional posterior distributions of the parameters in the proposed model
A.1 Full conditional posterior distribution of θ
A.2 Full conditional posterior distribution of α
A.3 Full conditional posterior distribution of β
A.4 Full conditional posterior distribution of quantities (Xs, Ps, Ys, ηs)
A.5 Full conditional posterior distribution of ρ
B Assessment of MCMC with real data
List of Figures
3.1 Illustration of adaptive cluster sampling procedure for a rare clustered population
distributed in a region with M = 400 grid cells. Figure (a) presents the
initial sample of m = 10 cells in dark grey. From this sample, in Figure (b),
neighbors are added to the sample whenever there is at least one observation
(black dot) in the selected cell, finally setting the sample presented in Figure (c).
3.2 Illustration of important concepts in adaptive cluster sampling: bold bordered
squares correspond to the observed cluster, gray squares are the network units,
and the hatched part are the border units. The unit initially selected is in
darker gray.
4.1 Allocation method illustration of two out-of-sample networks of sizes 3 and 2,
based on weights λ (gray background). The lighter the cells’ color, the higher
the λ value (intensity) of that cell. The white cells with bold borders, whose
weights are equal to zero, are the sampled networks’ cells and the hatched ones
correspond to the nonempty networks’ border cells. The red borders surround
cells that can be drawn in each stage of the procedure. The cells that compose
the first and second allocated networks are blue-painted and green-painted,
respectively. The example proceeds as follows: draw one of the red-surrounded
cells of Panel (a). The sampled cell is blue-indicated in Panel (b) and only its
neighbors can be drawn to keep building this network. In Panels (c) and (d)
the allocation of the 3-sized network is finished, and we can draw any cell with
the red border to start the allocation of the 2-sized network. Panels (e) and (f)
present the cells chosen to compose this network in green.
4.2 Proposed sampling procedure illustration of a population (points) distributed
in a region with M = 400 cells and weights (gray background) used in this
scheme. The lighter the cells’ color, the higher the weight of that cell. The
white cells, whose weights are equal to zero, with bold borders are the sampled
networks’ cells and hatched cells correspond to the nonempty networks’ border
cells. Panels (a) and (b) present the same grayscale, since all cells have constant
weight in the first stage; and Panels (c) and (d) show different shades of gray
due to the second stage’s different weights.
4.3 Values of the generated covariate (gray background) and counts for each nonempty
cell in a grid with M = 400 cells.
4.4 Plot of the generated covariate versus the population counts and covariate’s
histogram, where the black points in the histogram represent the nonempty cells.
4.5 Boxplots with measurements of the point and 95% credibility interval estimates
for T over 100 simulations obtained for the fits of MAN, MDN and MDC models.
4.6 Relative frequency of the number of networks sampled over 100 simulations
obtained for the fits of MAN, MDN and MDC models.
4.7 Boxplots with measurements of the point and 95% credibility interval estimates
for T over 500 simulations obtained for the fits of the disaggregated and aggregated
models, considering different sample sizes and proportions.
4.8 Relative frequency of the number of networks sampled over 500 simulations
obtained for the fits of the disaggregated and aggregated models, considering
different sample sizes and proportions.
4.9 Mean (white line) and 2.5% and 97.5% quantiles (gray bars) of the relative bias
estimates of T over 500 simulations obtained for the fits of the disaggregated
and aggregated models, considering different sample sizes, proportions, numbers
of networks sampled (number of simulations given below the gray bars) and
coverage (above the gray bars).
4.10 Altitude in a logarithm scale (gray background) and counts of African Buffaloes
over parts of Kenya and Tanzania in 2010 in a grid with M = 391 cells.
4.11 Plot of altitude in a logarithm scale versus counts of African Buffaloes, and
covariate’s histogram, where the black points in the histogram represent the
nonempty cells.
4.12 Boxplots with measurements of the point and 95% credibility interval estimates
for T over 500 simulations obtained for the fits of the disaggregated and aggregated
models to real data.
4.13 Boxplots with measurements of the point and 95% credibility interval estimates
for T over 100 simulations for each number of networks sampled, obtained for
the fits of the disaggregated and aggregated models to real data.
4.14 Boxplots with measurements of the point and 95% credibility interval estimates
for T over 100 simulations for each number of networks sampled, obtained for
the fits of the disaggregated and aggregated models.
4.15 Maps of the posterior mean of African Buffalo counts η(c) (gray background)
for all out-of-sample cells c of R, for each number of sampled networks, and its
population (points) distributed in a region with 391 cells. The lighter the cells’
color, the higher the posterior mean of that cell. The blue cells are the sampled
networks’ cells and the hatched cells correspond to the nonempty networks’
border cells.
4.16 Boxplots with measurements of the point and 95% credibility interval estimates
of the proposed model and population parameters over 500 simulations for different
values of α, β and T.
B.1 Trace plot with the posterior densities of α, β and T obtained from the fits of
the disaggregated and the aggregated models to real data, for each number of
networks sampled (NS). The black line represents the true value of T.
List of Tables
4.1 Summary measurements of the point and 95% credibility interval estimates of
the population total T, obtained by fitting MAN, MDN and MDC models under
100 samples according to each model.
4.2 Values of the sample sizes m, m1 and m2, according to each fixed percentage.
4.3 Summary measurements of the point and 95% credibility interval estimates for
T over 500 simulations obtained for the fits of the disaggregated and aggregated
models, considering different sample sizes and proportions.
4.4 Percentage of networks sampled over 500 simulations under each model.
4.5 Summary measurements of the point and 95% credibility interval estimates for
T over 100 simulations for different numbers of networks sampled, obtained for
the fits of the disaggregated and aggregated models.
4.6 Summary measurements of the point and interval estimates of the population
total, obtained by fitting the disaggregated and aggregated models and Raj’s
estimator.
4.7 Summary measurements of the point and 95% credibility interval estimates of
the proposed model and population parameters over 500 simulations for different
values of α, β and T.
B.1 Geweke convergence diagnostic for some of the parameters estimated for the
real population for both models.
Chapter 1
Introduction
In several statistical surveys, there are obstacles in data collection since the study
object is hard to observe either because it is a rare population, exhibits a pattern of
sparsely distributed groups in a region, or is mobile over time. Examples of populations
with these characteristics include endangered animals and plants, ethnic minorities, drug
users, individuals with rare diseases, and recent immigrants. Assume that the population
of interest is spatially distributed in a region of interest, where a regular grid with M
equal-sized cells is superimposed. Denote the partitioned region by R = {c1, . . . , cM}.
Let η(c) denote the number of individuals of the population within the grid cell c, for
all c ∈ R, that is, this cell’s count. The objective is to estimate a rare and clustered
population total $T = \sum_{c \in R} \eta(c)$.
Under traditional sampling methods of grid cells, a subset of m < M cells is drawn
and their respective counts η(c) are observed. Due to the population characteristics,
small sample sizes result in large numbers of empty grid cells, for which η(c) = 0, leading
us to inaccurate estimates of the population quantity of interest. In this context, adaptive
cluster sampling, introduced by Thompson (1990), is a way to surmount this difficulty by
increasing survey effort around non-empty grid cells of the sample. From an initial sample
of m grid cells, when we find a non-empty grid cell, for which η(c) ≠ 0, we also sample
its neighbors (cells sharing a common edge with the current one) and continue surveying
until we obtain a set of contiguous non-empty grid cells surrounded by empty grid cells.
In this way, empty grid cells bring no further survey effort. To be effective, adaptive
cluster sampling therefore requires some prior knowledge about the structure of the
underlying population, which may be obtained from a preliminary survey.
According to Thompson (1990), the set of contiguous non-empty grid cells is called a
network; this set plus its neighboring empty grid cells are together named a cluster; and
empty cells are defined as one-sized networks. Therefore, R is exhaustively partitioned
into disjoint networks, and the final sample contains empty and non-empty networks.
Thompson (1990) treated empty edge cells as unobserved. From an initial random
sample of grid cells drawn without replacement, inclusion probabilities are assigned to
the sampled networks and used to construct design-unbiased estimators of T and their
variances. Note that the networks are the basis of the analysis and, although the initial
selection of cells is without replacement, the same network can be selected more than
once, a problem that Thompson (1990) solved by allowing multiple inclusions of networks.
Edge cells can be incorporated into the estimator through its Rao-Blackwell improved
version, obtained by taking the conditional expectation given the minimal sufficient
statistic. These estimators were described and computed for small sample sizes in Thompson
(1990). Further, Salehi M. & Seber (1997) proposed a scheme whereby the networks are
selected one by one without replacement, which avoids selecting the same network more
than once.
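The network, cluster and edge-cell definitions above can be made concrete with a short sketch. The Python code below is only an illustration under stated assumptions: the function names (`neighbors`, `find_networks`, `edge_cells`) and the 4 x 4 toy grid are hypothetical and are not taken from Thompson (1990) or from this dissertation. It partitions a grid of counts into networks, treating every empty cell as a network of size one, and collects the empty cells adjacent to a non-empty network.

```python
from collections import deque


def neighbors(cell, n_rows, n_cols):
    """Yield the cells sharing a common edge with `cell` (4-neighbourhood)."""
    r, c = cell
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < n_rows and 0 <= cc < n_cols:
            yield (rr, cc)


def find_networks(counts):
    """Partition the grid into disjoint networks.

    A non-empty network is a maximal set of edge-connected non-empty cells;
    every empty cell is a network of size one.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    seen, networks = set(), []
    for r in range(n_rows):
        for c in range(n_cols):
            if (r, c) in seen:
                continue
            seen.add((r, c))
            if counts[r][c] == 0:
                networks.append({(r, c)})
                continue
            # Breadth-first search over contiguous non-empty cells.
            network, queue = {(r, c)}, deque([(r, c)])
            while queue:
                current = queue.popleft()
                for nb in neighbors(current, n_rows, n_cols):
                    if nb not in seen and counts[nb[0]][nb[1]] > 0:
                        seen.add(nb)
                        network.add(nb)
                        queue.append(nb)
            networks.append(network)
    return networks


def edge_cells(network, counts):
    """Empty cells adjacent to a non-empty network (network + edge cells = cluster)."""
    n_rows, n_cols = len(counts), len(counts[0])
    return {nb for cell in network
            for nb in neighbors(cell, n_rows, n_cols)
            if counts[nb[0]][nb[1]] == 0}


# Hypothetical toy grid: one network of three cells and one isolated non-empty cell.
toy_counts = [
    [0, 0, 0, 0],
    [0, 3, 1, 0],
    [0, 0, 2, 0],
    [5, 0, 0, 0],
]
toy_networks = find_networks(toy_counts)
big_network = {(1, 1), (1, 2), (2, 2)}
big_cluster = big_network | edge_cells(big_network, toy_counts)
```

In this toy grid the network sizes add up to the number of cells, which reflects the statement that R is exhaustively partitioned into disjoint networks.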
Several studies have been conducted using adaptive sampling designs on real popu-
lations. For example, Smith et al. (1995) studied the methodology for rare species of
waterfowl, Su & Quinn (2003) discussed adaptive cluster sampling with order statistics
and a stopping rule for a fish population, Philippi (2005) showed that it is a viable al-
ternative for the estimation of occurrences in local populations of low-abundance plants
and Gattone et al. (2016) applied it to negatively correlated data.
Thompson & Seber (1996) examined some general ideas about model-based inference
approaches for adaptive sampling. Among model-based approaches, Bayesian methods
have shown promising results. In particular, Bayesian inference methods for adaptive
cluster sampling designs have been developed in Rapley & Welsh (2008) and Goncalves &
Moura (2016), which incorporate prior knowledge that the population is rare and grouped
for both inference and sample design. Rapley & Welsh (2008) provided a model at the
network level, while Goncalves & Moura (2016) modeled at the cell level, considering
heterogeneity among units belonging to different clusters. Neither work took into
account the spatial locations of the networks, a fact that does not cause any loss of
information about the population total since, under the model, it does not depend on
where the networks are located.
A possible approach to spatially model clustered data is by using point processes
(Diggle, 1975; Baddeley & Turner, 2000; Brix & Diggle, 2001), where the clusters are
considered as points and have no internal spatial structure, although there is a spatial
relationship between them. Rapley & Welsh (2008) locate the clusters and give them a
spatial size by superimposing a grid on a region containing a clustered population and
modeling it within this grid structure. In this case, it is assumed that the intensity of
the counts in each cluster is proportional to its size. However, this assumption is not
always valid. In some situations, cells that belong to the same cluster can have different
intensities, e.g. the border cells can have a lower incidence rate than the central
ones. Moreover, a cluster can have a higher incidence of the phenomenon not because of
its size but due to other factors that influence its distribution, such as a spatially
referenced covariate.
This work aims to present a disaggregated model, at cell level, which assumes that the
intensity in each cell of a cluster is related to an available covariate value. The proposed
model fits rare and grouped populations, disposed over a regular grid, in a Bayesian
framework. The key idea of this dissertation is the improvement of the population
estimates through the use of grid cells as analysis units and the incorporation of
additional information into the model. Based on this extra information, we also propose
an improved sampling process, in which cells are drawn with different probabilities, and
we can spatially reference the estimates of the cell counts. Introducing additional
information is an intuitive idea, provided that prior knowledge indicates a
relationship between the occurrence of the phenomenon and some covariate.
This dissertation is organized as follows. Chapter 2 introduces the notation of finite
population sampling used throughout the text, as well as the design-based and
superpopulation approaches. Chapter 3 presents the methodology of adaptive sampling
plans, some extensions, and the model-based approach proposed by Rapley & Welsh
(2008), which motivated the ideas of this work. Chapter 4 introduces the proposed model,
proposes a new sampling procedure and discusses aspects of inference. It also presents
simulation studies that assess the effectiveness of the proposed model and of the one
proposed by Rapley & Welsh (2008), and that consider the estimation of model parameters
for populations with different degrees of rarity and clustering. A real population is
also presented and used to evaluate the performance of the proposed model. Finally,
Chapter 5 concludes with a brief discussion of the advantages of our methodology and
suggestions for further research.
Chapter 2
Inference for finite populations
This chapter presents the notation and definitions from the theory of finite population
sampling that will be used throughout this work. In this context, there
are two possible approaches: (i) the fixed population approach, where each population unit is
associated with a fixed but unknown real number, that is the value of the variable under
study; and (ii) superpopulation approach, where each population unit is associated with
a random variable for which a stochastic structure is specified, and the actual value
associated with the population unit is treated as the outcome of this random variable. In
Section 2.1, the first approach is presented, and in Section 2.2, the second one, for which
the sample design could be considered relevant to perform Bayesian inference about the
model parameters.
2.1 Fixed population approach
According to Cassel et al. (1977), a finite population, for which we are interested in a
characteristic n, is a collection of M units, denoted by the index set P = {1, . . . ,M}, with
M < ∞ assumed known. It is important to remark that each unit i, i = 1, . . . ,M, is
associated with a value ni. Thus, when data are observed, we should record
not only these values but also the unit that produced each measurement. The
complete observation is denoted by the pair (i, ni) and, therefore, there are M pairs for
the whole population.
In statistical inference, a parameter is frequently treated as an unknown quantity
indexing a probability distribution. Define $n = (n_1, \ldots, n_M)'$ as the parameter of the
finite population, belonging to a parameter space defined in $\mathbb{R}^M$. Any real function of
$n_1, \ldots, n_M$ is called a parametric function. Examples include the number of people with
some disease in M neighborhoods, or the number of animals of a particular species in
M locations. Inference in finite populations is usually made about a specific parametric
function, such as the population total $T = \sum_{i=1}^{M} n_i$, the population mean $\mu = T/M$ or
the population variance $\sigma^2 = \sum_{i=1}^{M} (n_i - \mu)^2 / M$. In particular, in this work, we aim to
estimate the population total.
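As a small numerical illustration of these parametric functions, the sketch below computes the total, mean and variance for a made-up population vector. The vector `toy_n` and the function name `population_summaries` are hypothetical and serve only to show the formulas at work; note that the variance divides by M, as in the definition above.

```python
def population_summaries(n):
    """Return the total T, mean mu and variance sigma^2 (divisor M) of the vector n."""
    M = len(n)
    total = sum(n)
    mean = total / M
    variance = sum((n_i - mean) ** 2 for n_i in n) / M
    return total, mean, variance


# Hypothetical counts for M = 8 units; most units are empty, as in a rare population.
toy_n = [0, 0, 4, 0, 1, 0, 0, 3]
T, mu, sigma2 = population_summaries(toy_n)
```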
We make statistical inferences about these parametric functions based on information
obtained from a sample of the population P (more details can be seen in Cassel et al.
(1977)). A sequence s = {i1, . . . , im} such that ij ∈ P , for j = 1, . . . ,m, is called an
ordered sample, or simply a sample, of size m. The label ij is called the j-th component
of s. Finite population sampling based on randomization of the sample differs from
other parts of statistics since it treats the population as fixed. In this approach, the
probabilistic mechanism of sample selection is a predetermined randomization procedure
called the sample design. It is represented by a probability function, known as the sampling
plan, on the set S of all possible samples s, where [s] denotes the probability of selecting
the sample s. A sample design [·] is called non-informative if and only if [·] is a function
that does not depend on the values of n associated with s. Otherwise, it is called an
informative sampling plan and is denoted by [s | n].
Once s is selected, the observed result can be specified as the set of pairs d = {(i, ni) :
i ∈ s}. In some cases, the interest is only in the values of n and not in the full pair, so
define $n_s = \{n_i : i \in s\}$. Let $\bar{s} = P - s$ and so $n_{\bar{s}} = \{n_i : i \in \bar{s}\}$ be the values of n
associated with the units that do not belong to the sample.
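The distinction between non-informative and informative designs can be illustrated by enumerating a sampling plan on a tiny population. The sketch below is an assumption-laden toy, not a design used in this work: samples are treated as unordered subsets for simplicity (the text defines them as ordered sequences), `srs_design` is simple random sampling without replacement, and `size_biased_design` is a hypothetical plan whose probabilities are proportional to one plus the sum of the sampled values.

```python
from itertools import combinations
from math import comb


def srs_design(M, m):
    """Non-informative plan: every size-m subset has probability 1 / C(M, m)."""
    p = 1 / comb(M, m)
    return {s: p for s in combinations(range(1, M + 1), m)}


def size_biased_design(n, m):
    """Hypothetical informative plan: [s | n] proportional to 1 + sum of n_i for i in s."""
    M = len(n)
    weights = {s: 1 + sum(n[i - 1] for i in s)
               for s in combinations(range(1, M + 1), m)}
    norm = sum(weights.values())
    return {s: w / norm for s, w in weights.items()}


toy_n = [0, 0, 4, 1]                       # hypothetical population values
non_informative = srs_design(M=4, m=2)     # does not look at toy_n at all
informative = size_biased_design(toy_n, m=2)
```

Changing `toy_n` leaves `non_informative` untouched but changes `informative`, which is the sense in which the second plan depends on the values of n.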
In the following section, we present the approach based on superpopulation models,
where the sample remains fixed, the value associated with each population unit is treated
as the outcome of its random variable, and inferences refer to a hypothetical
superpopulation in which a probability law governs the variables of interest.
2.2 Superpopulation models
Another inferential approach for finite populations is based on superpopulation
models. The process of statistical inference from a sample comprises a set of principles
and procedures that may include, for example, knowledge of some random process that
generated the true unknown value of the characteristic of interest for each unit of the
population. This process is represented by a model that is used as a basis for making
inferences.
While in conventional sampling theory the values of the variable of interest corresponding
to the population units are treated as fixed constants, under the superpopulation
approach the value of the population vector n = (n1, . . . , nM)′ is considered a realization
of the random vector N = (N1, . . . , NM)′, for which there is a joint distribution of all
values of the population.
According to the model, suppose that, given a parametric vector θ ∈ Θ, N follows
a probability distribution denoted by [N | θ]. Let N = (N1, . . . , NM)′ be the population
vector generated according to the distribution [N | θ]. Let H be the vector containing
additional variables associated with the structure of the population and suppose that the
joint distribution of H, which depends on a vector parameter ψ, is given by [H | ψ].
2.2.1 Informative sampling design
In a wide range of sampling designs, the sample selection mechanism may depend on
the values of the variables of interest in the population. This situation characterizes an
informative sampling plan. A typical example is a case-control study, where the sample
is selected such that there are cases (units with a given condition of interest) and controls
(units without this condition), and one is interested in modelling the indicator of presence
or absence of the condition as a function of predictor variables. This indicator is one of
the research variables and is considered in the sample selection mechanism.
Under the approach of superpopulation models, it is important to analyze whether the
selection probabilities of the population elements are related to the response variables. In
this case, it is relevant for inference to take into consideration the sampling plan, either
in the model design or in the construction of the likelihood function.
Let v be the set of variables that are fully observed for all units of the population
P . Even when we are primarily interested in some aspect of the distribution of N , v can
provide information through a regression setting. In other words, if v is fully observed,
then a model for N | v can lead to more precise inference about new values of N than
would be obtained by modeling N alone.
According to Gelman et al. (1995), when considering data collection, it is useful to
split the joint probability model into two parts: (i) the model for the underlying complete
data, N — including observed and unobserved components; and (ii) the model for the
sample s. The complete-data likelihood of sample s, vector N, and variables H, given
the parameters in the model and covariates v, is given by:
[s,N,H | v,θ,ψ] = [s | N,H][N | v,H,θ][H | ψ], (2.1)
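which depends on the complete data N. In fact, the information obtained from a sample
is (s,Ns,Hs). Therefore, the likelihood of the observed data, assuming continuity of the
model’s quantities, without loss of generality, is:
\[
\begin{aligned}
{}[s, N_s, H_s \mid v, \theta, \psi] &= \int\!\!\int [s, N, H \mid v, \theta, \psi] \, dN_{\bar{s}} \, dH_{\bar{s}} \\
&= \int\!\!\int [s \mid N, H]\,[N \mid v, H, \theta]\,[H \mid \psi] \, dN_{\bar{s}} \, dH_{\bar{s}}.
\end{aligned}
\]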
Under the Bayesian approach, we are interested in obtaining the posterior distribution
of the parametric vector. In this case, the joint posterior distribution of the model
parameters θ and ψ, given the observed information, (s,Ns,Hs,v), is:
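\[
\begin{aligned}
{}[\theta, \psi \mid s, N_s, H_s, v] &\propto [\theta, \psi]\,[s, N_s, H_s \mid v, \theta, \psi] \\
&= [\theta, \psi] \int\!\!\int [s, N, H \mid v, \theta, \psi] \, dN_{\bar{s}} \, dH_{\bar{s}} \\
&= [\theta, \psi] \int\!\!\int [s \mid N, H]\,[N \mid v, H, \theta]\,[H \mid \psi] \, dN_{\bar{s}} \, dH_{\bar{s}}.
\end{aligned}
\tag{2.2}
\]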
The posterior distribution of θ is obtained by integrating the expression (2.2) over ψ,
as follows:
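\[
\begin{aligned}
{}[\theta \mid s, N_s, H_s, v] &\propto \int\!\!\int\!\!\int [\theta, \psi]\,[s \mid N, H]\,[N \mid v, H, \theta]\,[H \mid \psi] \, dN_{\bar{s}} \, dH_{\bar{s}} \, d\psi \\
&\propto \int\!\!\int\!\!\int [\theta]\,[\psi \mid \theta]\,[s \mid N, H]\,[N \mid v, H, \theta]\,[H \mid \psi] \, dN_{\bar{s}} \, dH_{\bar{s}} \, d\psi \\
&\propto [\theta] \int\!\!\int\!\!\int [\psi \mid \theta]\,[s \mid N, H]\,[N \mid v, H, \theta]\,[H \mid \psi] \, dN_{\bar{s}} \, dH_{\bar{s}} \, d\psi.
\end{aligned}
\tag{2.3}
\]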
If we decide to ignore the sampling design, we can compute the joint posterior distri-
bution of the model parameters θ and ψ by conditioning only on Ns, Hs and v but not
s, as follows:
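\[
\begin{aligned}
{}[\theta, \psi \mid N_s, H_s, v] &\propto [\theta, \psi]\,[N_s, H_s \mid v, \theta, \psi] \\
&= [\theta, \psi] \int\!\!\int [N \mid v, H, \theta]\,[H \mid \psi] \, dN_{\bar{s}} \, dH_{\bar{s}}.
\end{aligned}
\tag{2.4}
\]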
The posterior distribution of θ, ignoring the sampling design, is obtained from the
expression (2.4) and is given by:
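\[
[\theta \mid N_s, H_s, v] \propto [\theta] \int\!\!\int\!\!\int [\psi \mid \theta]\,[N \mid v, H, \theta]\,[H \mid \psi] \, dN_{\bar{s}} \, dH_{\bar{s}} \, d\psi.
\tag{2.5}
\]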
When unobserved data supplies no information, i.e. when [θ | Ns,Hs,v] given in
(2.5) equals [θ | s,Ns,Hs,v] given in (2.3), the sample design is called ignorable (with
respect to the proposed model). In general, sampling plans involve some knowledge of the
structure of the population, such as stratification, clustering, and unequal selection
probabilities (complex sampling).
In this case, a sufficient condition to ensure design ignorability is [s | N,H] =
[s | Ns,Hs]. The important consequence of this condition is that, from (2.3), it follows
that if the sampling plan is ignorable with respect to the parameter of interest θ, then
[θ | s,Ns,Hs,v] = [θ | Ns,Hs,v]. Thus, the additional information brought through s
can be discarded when one wishes to make inference about θ; otherwise, it cannot be
ignored. Mistakenly ignoring an informative sampling plan in inference may negatively
affect the parameter estimates.
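The step from the sufficient condition to the equality of the two posteriors can be spelled out as follows. This is a short sketch of the argument using only (2.3) and (2.5), where $\bar{s}$ denotes the units outside the sample; it assumes, as in the factorization (2.1), that the sampling plan $[s \mid N_s, H_s]$ does not depend on $\theta$ or $\psi$.

```latex
\begin{aligned}
{}[\theta \mid s, N_s, H_s, v]
  &\propto [\theta] \int\!\!\int\!\!\int [\psi \mid \theta]\,[s \mid N, H]\,
     [N \mid v, H, \theta]\,[H \mid \psi] \, dN_{\bar{s}} \, dH_{\bar{s}} \, d\psi
     && \text{by (2.3)} \\
  &= [s \mid N_s, H_s]\,[\theta] \int\!\!\int\!\!\int [\psi \mid \theta]\,
     [N \mid v, H, \theta]\,[H \mid \psi] \, dN_{\bar{s}} \, dH_{\bar{s}} \, d\psi
     && \text{since } [s \mid N, H] = [s \mid N_s, H_s] \\
  &\propto [\theta] \int\!\!\int\!\!\int [\psi \mid \theta]\,
     [N \mid v, H, \theta]\,[H \mid \psi] \, dN_{\bar{s}} \, dH_{\bar{s}} \, d\psi
     && \text{dropping the factor free of } \theta \\
  &\propto [\theta \mid N_s, H_s, v]
     && \text{by (2.5)}.
\end{aligned}
```

The factor $[s \mid N_s, H_s]$ leaves the integral because it involves none of the integration variables $N_{\bar{s}}$, $H_{\bar{s}}$ and $\psi$. Since both sides are probability distributions in $\theta$, proportionality implies equality.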
In this work, the approach based on the superpopulation model will be used, focusing
on inference about the model parameters and the prediction of T from data obtained by
adaptive cluster sampling, which is an informative sampling plan. As usual, we will avoid
evaluating the integrals presented in this section by simply drawing posterior simulations
of the joint vector of unknowns, $(N_{\bar{s}}, H_{\bar{s}}, \theta, \psi)$, and then focusing on estimates.
Chapter 3
Adaptive cluster sampling
In the context of rare and grouped populations, it is usual to divide the region of
interest into cells. When we use traditional sampling methods to sample a population
with these characteristics, small samples of cells result in a large number of empty cells,
leading us to inaccurate estimates of the population quantity of interest. Adaptive cluster
sampling is an alternative to deal with this difficulty since it allows us to increase survey
effort in the vicinity of regions where individuals of interest are found by using information
from the observed values to be more successful in collecting additional cells.
In Section 3.1, we present the adaptive sampling plan proposed by Thompson (1990),
since it is a suitable sampling plan for the type of population we aim to study in this work.
In Section 3.2 some extensions of this sampling plan are briefly presented. Finally, in
Section 3.3 the model-based approach proposed by Rapley & Welsh (2008) is presented,
for which the sampling plan is relevant to perform Bayesian inference about the model
parameters.
3.1 Thompson's (1990) approach
Adaptive cluster sampling is generally used when we are dealing with sparse and
clustered populations, since this method helps to estimate the population totals more
accurately. Consider a population spread non-homogeneously over a region, for instance, one
in which there are clusters, with a grid of size M superimposed upon it. In this case,
traditional sampling may estimate the population total inefficiently: if the sample
includes several clusters the population total will be overestimated, and if the sample
includes very few clusters, it will be underestimated. In this situation, using an adaptive
sampling strategy provides more efficient estimators and, therefore, should be preferred
in most cases.
Initially proposed by Thompson (1990), the method has been shown to be effective in
epidemiological research and studies on rare diseases, animals and plants. From an
initial sample of m grid cells, when a selected cell contains a member of the population of
interest, the cells sharing a common edge with the current cell are also sampled and we
continue surveying until we obtain a set of contiguous nonempty grid cells surrounded
by empty grid cells. This procedure is intuitive since, when the population is grouped,
one expects to find an element close to others with similar characteristics. In this way,
empty grid cells bring no further survey effort. Hence, to be effective, adaptive
cluster sampling requires some prior knowledge about the structure of the underlying
population, which may be obtained from a preceding survey.
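The expansion rule just described can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the dissertation: the function names, the representation of cells as (row, column) pairs and the use of a queue are assumptions made here.

```python
from collections import deque


def edge_neighbors(cell, n_rows, n_cols):
    """Cells sharing a common edge with `cell` on an n_rows x n_cols grid."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < n_rows and 0 <= j < n_cols]


def adaptive_cluster_sample(occupied, initial_cells, n_rows, n_cols):
    """Expand an initial sample of cells into an adaptive cluster sample.

    occupied: set of (row, col) cells containing at least one individual
    (hypothetical input format). Returns (sampled, edge): all surveyed
    cells, and the empty surveyed cells bordering a surveyed nonempty cell.
    """
    sampled = set()
    queue = deque(initial_cells)
    while queue:
        cell = queue.popleft()
        if cell in sampled:
            continue
        sampled.add(cell)
        if cell in occupied:
            # A nonempty cell triggers the survey of its edge neighbours.
            queue.extend(edge_neighbors(cell, n_rows, n_cols))
    edge = {
        cell
        for cell in sampled - occupied
        if any(n in occupied and n in sampled
               for n in edge_neighbors(cell, n_rows, n_cols))
    }
    return sampled, edge
```

Empty cells never extend the queue, which is the sense in which they bring no further survey effort.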
In Figure 3.1, the method is illustrated for a population distributed over a region
partitioned into M = 400 grid cells. The sampling procedure starts with a simple random
sample without replacement of m = 10 units, which are displayed in gray in the grid
(Figure 3.1a). Note that, from the 10 cells selected, only 2 of them contain a member of
the population of interest. Next, the units neighboring these 2 units are also included in
the sample (Figure 3.1b). We continue surveying neighbors until the process is finalized,
Figure 3.1c, with 45 sampled cells, represented by the highlighted ones.
[Figure 3.1, three panels: (a) Initial sample, (b) Sampling process, (c) Final sample]
Figure 3.1: Illustration of adaptive cluster sampling procedure for a rare clustered population
distributed in a region with M = 400 grid cells. Figure (a) presents the initial sample of
m = 10 cells in dark grey. From this sample, in Figure (b), neighbors are added to the sample
whenever there is at least one observation (black dot) in the selected cell, finally setting the
sample presented in Figure (c).
The set of contiguous cells containing members of the population makes up a network,
while the network together with its empty boundary cells is termed a cluster. These
boundary cells are called edge cells. It is also convenient to define every single empty
cell as a network, so an edge cell is, in fact, a network of size one. These definitions
are illustrated in Figure 3.2, which is part of the sample seen in Figure 3.1. The squares
with bold border correspond to the observed cluster, the squares in gray make up the
nonempty network and the hatched cells represent the border units. The unit initially
selected is in darker gray.
Figure 3.2: Illustration of important concepts in adaptive cluster sampling: bold bordered
squares correspond to the observed cluster, gray squares are the network units, and the hatched
cells are the border units. The unit initially selected is in darker gray.
Although the initial sampled cells are distinct, a cluster may include more than one
unit of the initial sample, i.e., if two non-border units in the same cluster are initially
selected, then this cluster can occur twice in the final sample. Therefore, an adaptive
cluster sample that begins with a selection without replacement of m initial units has
a number of distinct nonempty networks less than or equal to m. Moreover, the final
number of sampled cells is a random variable and cannot be fixed in advance.
When the data are analyzed, the networks become the unit of analysis and the respective
boundaries are ignored if they did not appear in the original sample (Thompson
& Seber, 1996). Networks are used as the unit of analysis because it is possible to calculate
inclusion probabilities, based on their size, for each of them in the model. In addition,
since the networks are disjoint, they form a partition of the region of interest. Besides,
grid cells within networks usually have a dependence structure, and considering
the network as the unit of analysis avoids the need to define this structure explicitly in the
model.
Conventional estimators under an adaptive cluster sampling design tend to be biased
since nonempty cells are sampled disproportionately. Based on this idea, Thompson
(1990) obtained an unbiased estimator under this sample design for the population mean
and allowed multiple inclusions of networks. Moreover, Thompson (1990)’s approach
lets edge cells be incorporated into the estimator by taking its conditional expectation
given the minimal sufficient statistic, which yields its Rao-Blackwell improved version.
These estimators were described and computed for small sample sizes in Thompson
(1990). From this work, some extensions of this sampling design, in addition to the
initial selection based on simple random sampling, appeared in the literature and are
presented below.
3.2 Extensions of adaptive cluster sampling
Several extensions to simple random sampling — e.g., stratified sampling — can also
be applied to adaptive sampling. Methods for stratified adaptive cluster sampling were
first proposed in Thompson (1991b) and another extension was proposed in Thompson
(1991a). In these approaches, primary sampling units, for example, groups of units
arranged in strips or rectangles, are defined and then randomly sampled. If a member
of the population is found within a primary unit, secondary units outside of the primary
unit are added to the sample in the same way as in normal adaptive cluster sampling
(Thompson, 1991a). Further, Salehi M. & Seber (1997) proposed a scheme whereby the
networks are selected one by one without replacement, avoiding the selection of the same
network more than once.
Thompson & Seber (1996) examined some general ideas about model-based inference
approaches for adaptive sampling. The likelihood-based methods, such as Bayesian es-
timation, showed promising results among model-based approaches, some of which are
detailed in the next section.
3.3 Model-based inference
Bayesian inference methods for adaptive cluster sampling designs have been developed
in Rapley & Welsh (2008) and Goncalves & Moura (2016), which incorporate prior
knowledge that the population is rare and grouped for both inference and sample design.
Rapley & Welsh (2008) provided a model at the network level, while Goncalves &
Moura (2016) modeled at the cell level, considering heterogeneity among units belonging
to different clusters. Neither work took into account the spatial locations of the
networks, which does not cause any loss of information about the population total
since, under the model, the total does not depend on where the networks are located. In this
chapter, we will focus on Rapley & Welsh (2008)’s model.
Rapley & Welsh (2008) proposed a complex model, which uses networks as units
of analysis. Therefore, we refer to this model as the aggregate model. The use of the
Bayesian approach is a natural extension of the idea of adaptive cluster sampling because
it incorporates the prior knowledge that the rare population is grouped for both inference
and sample design. To illustrate their proposal, Rapley & Welsh (2008) compared their
estimators with the estimators developed in Thompson (1990) through a simulation study,
showing them to be efficient, especially when prior knowledge is available.
Let R be a region containing a rare, clustered population, over which a regular grid
partitioned into M cells overlaps. A cell is considered nonempty if it contains at least
one member of the population and empty otherwise. Let X ≤ M be the number of
nonempty cells in R. Let P ≤ X be the number of nonempty networks in R, where a
network is defined as in Thompson (1990). Let $Y_i$ be the number of nonempty cells within
the nonempty network $i$, for $i = 1, \ldots, P$, so that $\mathbf{Y} = (Y_1, \ldots, Y_P)'$ is the vector
with the number of nonempty cells within each nonempty network and $X = \sum_{i=1}^{P} Y_i$.
Note that there are $M - X$ empty cells, which are defined as one-sized empty networks,
so there are $M - X + P$ networks in R. We can extend the P-dimensional vector $\mathbf{Y}$ to
an $(M - X + P)$-dimensional vector given by $\mathbf{Z} = (\mathbf{Y}', \mathbf{1}'_{M-X})'$, where $\mathbf{1}_{M-X}$ is a vector
of ones of dimension $M - X$. Thus, $Z_i = Y_i$ if the i-th network is nonempty
and $Z_i = 1$ otherwise, for $i = 1, \ldots, M - X + P$. Let $N_i$ be the count of a given
phenomenon of interest in the nonempty network $i$, so that $\mathbf{N} = (N_1, \ldots, N_P)'$
denotes the vector with the population total in each of the nonempty networks.
In order to perform inference about the population total $T = \sum_{i=1}^{P} N_i$, one must specify
the joint distribution of {X,P,Y,N} for the entire population and the sampling mecha-
nism that provides a particular sample of m networks from the M −X + P networks in
the population. First, we model the nonempty network structure and then, conditional
on it, model the count on the nonempty networks. Since the model applies to nonempty
cells, to avoid degeneration problems it is assumed that there is at least one nonempty
cell in R, so distributions are left truncated at zero. The proposed model can be written
as follows:
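\[
\begin{aligned}
N_i \mid P, Y_i, \gamma &\sim \text{independent truncated Poisson}(\gamma Y_i), & i &= 1, \ldots, P, \\
\mathbf{Y} \mid X, P &\sim \mathbf{1}_P + \text{Multinomial}\left(X - P, \tfrac{1}{P}\mathbf{1}_P\right), & Y_i &= 1, \ldots, X - P, \\
P \mid X, \beta &\sim \text{truncated Binomial}(X, \beta), & P &= 1, \ldots, X, \\
X \mid \alpha &\sim \text{truncated Binomial}(M, \alpha), & X &= 1, \ldots, M.
\end{aligned}
\tag{3.1}
\]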
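To make the hierarchy in (3.1) concrete, the sketch below simulates one population from it, top down, using only the Python standard library. It is an illustration under assumptions made here (function names, rejection sampling for the truncated Binomial, inversion for the truncated Poisson) and is not the code used by Rapley & Welsh (2008) or in this dissertation.

```python
import math
import random


def truncated_binomial(n, p, rng):
    """Binomial(n, p) conditioned on being at least 1 (rejection sampling)."""
    while True:
        x = sum(rng.random() < p for _ in range(n))
        if x >= 1:
            return x


def truncated_poisson(lam, rng):
    """Poisson(lam) conditioned on being at least 1 (inverse CDF)."""
    u = rng.random() * (-math.expm1(-lam))  # uniform over the mass on k >= 1
    k = 1
    pmf = lam * math.exp(-lam)
    cdf = pmf
    while u > cdf and pmf > 0.0:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def simulate_aggregate_model(M, alpha, beta, gamma, rng):
    """Draw (X, P, Y, N) following the structure of model (3.1)."""
    X = truncated_binomial(M, alpha, rng)   # number of nonempty cells
    P = truncated_binomial(X, beta, rng)    # number of nonempty networks
    Y = [1] * P                             # every network has at least one cell
    for _ in range(X - P):                  # Multinomial(X - P, 1/P) on top
        Y[rng.randrange(P)] += 1
    N = [truncated_poisson(gamma * y, rng) for y in Y]
    return X, P, Y, N
```

For instance, `simulate_aggregate_model(400, 0.1, 0.2, 5.0, random.Random(1))` returns one simulated population on a grid of 400 cells; the parameter values are arbitrary choices for illustration.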
The model in (3.1) is applied to samples collected according to the adaptive method
proposed by Salehi M. & Seber (1997), which consists of observing $Y_i$ for $i \in s$ sequentially.
Since the sample design depends on the structure of the population, which is
unknown, the sampling is informative and should be incorporated into
the likelihood function of the model to perform inference. Therefore, the next step is to
define the probability of selecting a sample s = {i1, . . . , im}, i.e., [s]. If a cell that
belongs to a nonempty network is sampled, then the entire network must
be observed, and thus the probability of selecting a network is weighted according to its
size. To illustrate the construction of the probability of selection of a sample, consider
a population consisting of eight networks of sizes {1, 1, 1, 1, 3, 3, 5, 5}, which cover 20 cells
in total, from which the sample with network sizes {5, 1, 5, 3}, in this order, is taken.
The probability of selecting the first network is equal to the probability of selecting a
network of size 5, which is $\frac{5 \times 2}{20}$; the probability of selecting a network of size 1
in the second step, given the previous one, is $\frac{1 \times 4}{15}$; and thus the probability of
selection of this particular sample is equal to
\[
\frac{5 \times 2}{20} \times \frac{1 \times 4}{20 - 5} \times \frac{5 \times 1}{20 - 5 - 1} \times \frac{3 \times 2}{20 - 5 - 1 - 5} = \frac{2}{63}.
\]
Therefore, the probability of selection of a particular sample can be generalized as
follows:
\[
[s \mid X, P, \mathbf{Y}] = \prod_{j=1}^{m} \frac{Z_{i_j} \times g_{i_j,j}}{\sum_{i=1}^{M-X+P} Z_i - \sum_{k=0}^{j-1} Z_{i_k}}, \tag{3.2}
\]
where $g_{i_j,j}$ is the number of networks of size $Z_{i_j}$ unselected after $j - 1$ networks have
been selected and $Z_{i_0} = 0$.
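Expression (3.2) can be checked numerically against the eight-network example. The short Python sketch below is illustrative only; the function name and the choice of exact rational arithmetic are assumptions made here.

```python
from fractions import Fraction


def selection_probability(network_sizes, sampled_sizes):
    """Probability (3.2) of an ordered sample, given all network sizes Z.

    network_sizes: sizes of every network in the population.
    sampled_sizes: sizes of the sampled networks, in order of selection.
    """
    remaining = list(network_sizes)
    total = sum(remaining)              # cells not yet removed from the population
    prob = Fraction(1)
    for z in sampled_sizes:
        g = remaining.count(z)          # unselected networks with the same size
        if g == 0:
            raise ValueError(f"no unselected network of size {z}")
        prob *= Fraction(z * g, total)
        remaining.remove(z)             # the selected network leaves the population
        total -= z
    return prob
```

With the population of sizes {1, 1, 1, 1, 3, 3, 5, 5} and the ordered sample {5, 1, 5, 3}, the function multiplies the same four fractions as in the worked example.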
Making an equivalence with the notation presented in Section 2.2, the following cor-
respondence can be obtained: M = M −X+P , H = (X,P,Y′)′, θ = γ and ψ = (α, β)′.
Note that the probability of selecting the sample s does not depend directly on N but
depends on the variables associated with the population structure, so it is said that the
sampling plan is informative with respect to H.
Chapter 4
Model for rare and clustered
populations under adaptive sampling
using covariates
Rapley & Welsh (2008) proposed a model that uses networks as units of analysis,
avoiding introducing spatial components in the model, which may facilitate the inference.
In this case, it is assumed that the intensity of the counts in each cluster is proportional
to its size. However, this assumption is not always valid, e.g. a cluster can have a
higher incidence of the phenomenon due to external factors that influence its disposition,
such as a spatially referenced covariate. Thus, proposing a disaggregated model can
be interesting in many contexts with rare and clustered populations. The objective of
this chapter is to present a disaggregated model at cell-level, which assumes that the
intensity of the counts in each cell of a cluster is related to an available covariate value.
The proposed model fits rare and grouped populations sampled under adaptive cluster
design. Therefore, the probability of selecting a given sample should be incorporated into
the model likelihood function. Introducing additional information in the model seems to
be an intuitive idea, provided that the prior knowledge indicates a relationship between
the phenomenon occurrence and some covariate.
In Section 4.1, the model is introduced, a new sampling procedure is proposed and
aspects of inference are discussed. Section 4.2 presents a simulation study for assessing
the effectiveness of the proposed model and the model proposed by Rapley & Welsh
(2008). In Section 4.3 both approaches are compared through a design-based perspective
under different scenarios, as well as a real data application. Finally, a simulation study to
evaluate the estimation of model parameters under different degrees of rare and clustered
populations is presented in Section 4.4.
4.1 Proposed model for cell counts using covariates
Suppose the phenomenon of interest is related to covariates, the values of which are
available for each one of the cells in R. Let $C$ be the set of all nonempty cells of R and
$\bar{C}$ the set containing all empty cells of R. Let $\eta(c)$ be the count of a given phenomenon
of interest in the cell $c$, and $\mathbf{v}_c = (1, v_1(c), \ldots, v_k(c))'$ the vector with the $k$ covariates
associated with cell $c$, for all $c \in R$. Let $\boldsymbol{\eta}$ be the set with the counts for all nonempty
cells, that is, $\boldsymbol{\eta} = \{\eta(c) \mid c \in C\}$.
In order to perform inference about the population total $T = \sum_{c \in C} \eta(c)$, one must
specify the joint distribution of $\{X, P, \mathbf{Y}, \boldsymbol{\eta}\}$ for the entire population and the sampling
mechanism that provides a particular sample of $m$ networks from the $M - X + P$ networks
in the population.
First, we model the nonempty network structure and then, conditional on it, model the
count on the nonempty network’s cells, similarly to Rapley & Welsh (2008)’s approach.
Since the model applies to nonempty cells, to avoid degeneration problems it is assumed
that there is at least one nonempty cell in R, so distributions are left truncated at zero.
The proposed model can be written as follows:
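\[
\begin{aligned}
\eta(c) \mid \mathbf{v}_c, \boldsymbol{\theta} &\sim \text{truncated Poisson}(\lambda(c)), & \eta(c) &\geq 1,\ c \in C, \\
\mathbf{Y} \mid X, P &\sim \mathbf{1}_P + \text{Multinomial}\left(X - P, \tfrac{1}{P}\mathbf{1}_P\right), & Y_i &= 1, \ldots, X - P, \\
P \mid X, \beta &\sim \text{truncated Binomial}(X, \beta), & P &= 1, \ldots, X, \\
X \mid \alpha &\sim \text{truncated Binomial}(M, \alpha), & X &= 1, \ldots, M,
\end{aligned}
\tag{4.1}
\]
where $\lambda(c) = \exp\{\mathbf{v}_c'\boldsymbol{\theta}\}$, $\boldsymbol{\theta} = (\theta_0, \theta_1, \ldots, \theta_k)'$ is the vector of regression coefficients
associated with $\mathbf{v}_c$ and $\mathbf{1}_P + \text{Multinomial}(\cdot)$ represents the Multinomial distribution
truncated at one. Note that the $M - X$ empty cells have counts equal to zero,
that is, $\eta(c) = 0$ for all $c \in \bar{C}$.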
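As a small illustration of the cell-level part of (4.1), the sketch below computes the log-linear intensity λ(c) and the probability function and mean of a Poisson count truncated at zero. The function names are hypothetical, and the closed form λ/(1 − e^{−λ}) for the mean is the standard result for the zero-truncated Poisson rather than something stated in the text.

```python
import math


def intensity(v, theta):
    """lambda(c) = exp(v_c' theta); v must include the leading 1 for the intercept."""
    return math.exp(sum(vi * ti for vi, ti in zip(v, theta)))


def truncated_poisson_pmf(k, lam):
    """P(eta = k | eta >= 1) for a Poisson(lam) count."""
    if k < 1:
        return 0.0
    log_pmf = -lam + k * math.log(lam) - math.lgamma(k + 1)
    return math.exp(log_pmf) / (-math.expm1(-lam))


def truncated_poisson_mean(lam):
    """E(eta | eta >= 1) = lam / (1 - exp(-lam))."""
    return lam / (-math.expm1(-lam))
```

Because of the truncation, the expected count of a nonempty cell is always above both 1 and λ(c), which matters most when the intensity is small.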
Making an equivalence with the notation presented in Section 2.2, the following correspondence
can be obtained: $\mathbf{H} = (X, P, \mathbf{Y}')'$ and $\boldsymbol{\psi} = (\alpha, \beta)'$. According to the sampling
procedure, the drawn sample is composed of networks, for which the cells that compose
each of them and their respective counts are observed and used to model the population
total at a more disaggregated level. Furthermore, we can make an equivalence between the
vectors η and N, since the first one contains the counts for each of the M cells of R and
the second one for each of the M − X + P networks of R. Let $N_i$ be the count associated
with the i-th network, which can be empty or nonempty. If the i-th network is empty, then
it is composed of a single cell and $N_i = \eta(c)$, where c is the cell that composes the i-th
network. On the other hand, if the i-th network is nonempty, then it is composed of a set
of cells $c_i$ and $N_i = \sum_{c \in c_i} \eta(c)$. Moreover, the probability of selecting the sample s
does not depend directly on the quantities of the model, since they do not appear in
its expression, but on the allocation process that produces the set of cells that compose
unsampled networks and, consequently, the set $G_{i_j,j}$ in equation (4.3). Thus, it is said that
the sampling plan is informative.
4.1.1 Model inference
The sampling procedure entails observing $Y_i$ for the networks $\{i_1, \ldots, i_m\}$ and the
counts $\eta(c)$ for their respective cells. Since the adaptive cluster sampling procedure depends on
the population structure, it is characterized as an informative sampling design and the
probability of selecting the sample $s = \{i_1, \ldots, i_m\}$ of m networks, $[s \mid X, P, \mathbf{Y}]$, should
be incorporated into the model likelihood function. Let the subscript $s$ identify
the observed component and $\bar{s}$ the unobserved component, and define $\mathbf{Y} = (\mathbf{Y}_s', \mathbf{Y}_{\bar{s}}')'$,
$X = X_s + X_{\bar{s}}$ and $P = P_s + P_{\bar{s}}$ to distinguish between observed and unobserved quantities.
Let $C_s$ be the set of the sample’s nonempty cells, i.e., the cells that compose the networks
with sizes $\mathbf{Y}_s$, and $C_{\bar{s}}$ be the set of the out-of-sample nonempty cells, i.e., the cells that
compose the non-sampled networks with sizes $\mathbf{Y}_{\bar{s}}$. Thus, define $\boldsymbol{\eta} = (\boldsymbol{\eta}_s', \boldsymbol{\eta}_{\bar{s}}')'$, where
$\boldsymbol{\eta}_s = \{\eta(c) \mid c \in C_s\}$ and $\boldsymbol{\eta}_{\bar{s}} = \{\eta(c) \mid c \in C_{\bar{s}}\}$. A natural predictor of the population
total T is given by:
\[
\hat{T} = \sum_{c \in C_s} \eta(c) + \sum_{c \in C_{\bar{s}}} \hat{\eta}(c),
\]
where $\hat{\eta}(c)$ represents the posterior mean of the count of the cell c, for $c \in C_{\bar{s}}$.
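One possible Monte Carlo reading of this predictor is sketched below in Python. It assumes, as a simplification introduced here, that each retained MCMC iteration stores the simulated counts of its out-of-sample nonempty cells, so that averaging the simulated out-of-sample totals over iterations plays the role of the sum of posterior means. The function name and input format are hypothetical.

```python
def predict_total(observed_counts, unsampled_draws):
    """Observed total plus the Monte Carlo mean of the simulated unobserved total.

    observed_counts: counts eta(c) of the sampled nonempty cells.
    unsampled_draws: one list per retained MCMC iteration with the simulated
    counts of the out-of-sample nonempty cells (an empty list when the
    iteration generated no out-of-sample nonempty cell).
    """
    observed = float(sum(observed_counts))
    if not unsampled_draws:
        return observed
    unobserved = sum(sum(draw) for draw in unsampled_draws) / len(unsampled_draws)
    return observed + unobserved
```

A cell that is absent from the out-of-sample set in a given iteration contributes zero to that iteration, which is consistent with treating non-allocated cells as empty.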
Following the Bayesian paradigm, independent priors are assumed for the unknown
parameters θ, α and β, and their marginal prior distributions are denoted, respectively,
by [θ], [α] and [β]. Let [θ] be a non-informative Normal prior with zero mean vector
and covariance matrix $\sigma_\theta^2 \mathbf{I}_{k+1}$, where $\mathbf{I}_{k+1}$ denotes the (k+1)-dimensional identity matrix
and $\sigma_\theta^2 = 10^4$. For α we assume a Beta($a_\alpha$, $b_\alpha$) distribution with $a_\alpha = 3$ and $b_\alpha = 15$,
and for β a Beta($a_\beta$, $b_\beta$) distribution with $a_\beta = 1$ and $b_\beta = 9$. The prior distributions of
α and β are chosen to reflect the fact that α and β are necessarily small in a rare and
clustered population, as considered in Rapley & Welsh (2008). In this case, the objective
is not only to estimate the parameters of the model based on a sample, but also to make
predictions of the unobserved parts.
The joint distribution of all the quantities in the model is:
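\[
\begin{aligned}
[\boldsymbol{\eta}, \mathbf{Y}, P, X, \boldsymbol{\theta}, \beta, \alpha]
&= [s \mid X, P, \mathbf{Y}]\,[\boldsymbol{\eta} \mid \boldsymbol{\theta}]\,[\mathbf{Y} \mid X, P]\,[P \mid X, \beta]\,[X \mid \alpha]\,[\boldsymbol{\theta}]\,[\alpha]\,[\beta] \\
&\propto [s \mid X, P, \mathbf{Y}] \times \prod_{c \in C} \frac{\exp\{-\exp\{\mathbf{v}_c'\boldsymbol{\theta}\} + \eta(c)\,\mathbf{v}_c'\boldsymbol{\theta}\}}{\eta(c)!\,(1 - \exp\{-\exp\{\mathbf{v}_c'\boldsymbol{\theta}\}\})} \\
&\quad \times (x - p)! \prod_{i=1}^{p} \frac{1}{(y_i - 1)!} \left(\frac{1}{p}\right)^{y_i - 1}
\times \frac{\binom{x}{p} \beta^{p} (1 - \beta)^{x - p}}{1 - (1 - \beta)^{x}}
\times \frac{\binom{M}{x} \alpha^{x} (1 - \alpha)^{M - x}}{1 - (1 - \alpha)^{M}} \\
&\quad \times \exp\left\{-\frac{1}{2\sigma_\theta^2}\boldsymbol{\theta}'\boldsymbol{\theta}\right\} \times \alpha^{a_\alpha - 1}(1 - \alpha)^{b_\alpha - 1} \times \beta^{a_\beta - 1}(1 - \beta)^{b_\beta - 1}.
\end{aligned}
\tag{4.2}
\]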
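For MCMC it is usually more convenient to work with the logarithm of (4.2). The Python sketch below evaluates the log of the right-hand side of (4.2) term by term, leaving out the factor [s | X, P, Y], which would have to be added separately. The function names, argument layout and default hyperparameters (taken from the prior specification above) are choices made here for illustration, not the implementation used in the dissertation.

```python
import math


def log_binom(n, k):
    """Logarithm of the binomial coefficient n choose k."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_joint_kernel(eta, V, y, M, theta, alpha, beta, sigma2=1e4,
                     a_alpha=3.0, b_alpha=15.0, a_beta=1.0, b_beta=9.0):
    """Log of the kernel in (4.2), excluding the factor [s | X, P, Y].

    eta: counts (>= 1) of the nonempty cells; V: their covariate vectors,
    each with a leading 1; y: sizes of the nonempty networks.
    """
    x, p = sum(y), len(y)
    out = 0.0
    for count, v in zip(eta, V):                      # truncated Poisson counts
        lin = sum(vi * ti for vi, ti in zip(v, theta))
        lam = math.exp(lin)
        out += -lam + count * lin - math.lgamma(count + 1)
        out -= math.log(-math.expm1(-lam))
    # Y | X, P: shifted Multinomial; note that (y_i - 1)! = Gamma(y_i)
    out += math.lgamma(x - p + 1) - sum(math.lgamma(yi) for yi in y)
    out -= (x - p) * math.log(p)
    # P | X, beta and X | alpha: Binomials truncated at zero
    out += log_binom(x, p) + p * math.log(beta) + (x - p) * math.log(1 - beta)
    out -= math.log(1 - (1 - beta) ** x)
    out += log_binom(M, x) + x * math.log(alpha) + (M - x) * math.log(1 - alpha)
    out -= math.log(1 - (1 - alpha) ** M)
    # Priors: Normal kernel for theta, Beta kernels for alpha and beta
    out -= sum(t * t for t in theta) / (2.0 * sigma2)
    out += (a_alpha - 1) * math.log(alpha) + (b_alpha - 1) * math.log(1 - alpha)
    out += (a_beta - 1) * math.log(beta) + (b_beta - 1) * math.log(1 - beta)
    return out
```

Working on the log scale avoids underflow in the products over cells and networks; a Metropolis-Hastings acceptance ratio is then a difference of two such values.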
We perform inference via MCMC to obtain samples from the resulting posterior distribution.
The full conditional posterior distributions and the methods adopted to sample
from each of them are detailed in Appendix A. In comparison with the sampling procedure
proposed by Salehi M. & Seber (1997), our improved sampling process tends to
draw a greater number of networks, providing samples that may include all networks
from R (see details in Subsection 4.1.2). Thus, our proposal distribution, unlike that of
Rapley & Welsh (2008), may generate no out-of-sample nonempty cells and,
consequently, no out-of-sample networks (see details in Appendix A). The estimation
procedure consists of the following steps:
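(1) Initialize the counter j = 2 and set initial values for the parameters and quantities
of the model: $\boldsymbol{\theta}^{(1)}$, $\alpha^{(1)}$, $\beta^{(1)}$, $X_{\bar{s}}^{(1)}$, $P_{\bar{s}}^{(1)}$, $\mathbf{Y}_{\bar{s}}^{(1)}$ and $\boldsymbol{\eta}_{\bar{s}}^{(1)}$;
(2) Update the model parameters θ, α and β from the conditional distributions:
$[\boldsymbol{\theta} \mid \alpha^{(j-1)}, \beta^{(j-1)}, X^{(j-1)}, P^{(j-1)}, \mathbf{Y}^{(j-1)}, \boldsymbol{\eta}^{(j-1)}]$,
$[\alpha \mid \boldsymbol{\theta}^{(j)}, \beta^{(j-1)}, X^{(j-1)}, P^{(j-1)}, \mathbf{Y}^{(j-1)}, \boldsymbol{\eta}^{(j-1)}]$,
$[\beta \mid \boldsymbol{\theta}^{(j)}, \alpha^{(j)}, X^{(j-1)}, P^{(j-1)}, \mathbf{Y}^{(j-1)}, \boldsymbol{\eta}^{(j-1)}]$,
described in Appendix A;
(3) Generate the non-sampled quantities $X_{\bar{s}}$, $P_{\bar{s}}$ and $\mathbf{Y}_{\bar{s}}$ according to the proposal
distribution described in Section A.4;
(4) Allocate the $P_{\bar{s}}$ networks of $\mathbf{Y}_{\bar{s}}$ according to the allocating procedure described in
Subsection 4.1.1.1;
(5) Generate $\boldsymbol{\eta}_{\bar{s}}$ and jointly update $X_{\bar{s}}$, $P_{\bar{s}}$, $\mathbf{Y}_{\bar{s}}$ and $\boldsymbol{\eta}_{\bar{s}}$ from the conditional distribution:
$[X_{\bar{s}}, P_{\bar{s}}, \mathbf{Y}_{\bar{s}}, \boldsymbol{\eta}_{\bar{s}} \mid \boldsymbol{\theta}^{(j)}, \alpha^{(j)}, \beta^{(j)}, X_s, P_s, \mathbf{Y}_s, \boldsymbol{\eta}_s]$;
(6) Increment the counter j to j + 1 and iterate from (2).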
Note that the regression coefficients θ are updated in step (2) based only on the
sample information. Moreover, from them, we can easily obtain the Poisson distribution’s
intensity $\lambda(c) = \exp\{\mathbf{v}_c'\boldsymbol{\theta}\}$ for any non-sampled cell c of R, which is used later to estimate
η(c) for every nonempty, non-sampled cell c of R. Let λ be the set of intensities assigned
to all cells of R. Then, after generating the non-sampled quantities $X_{\bar{s}}$, $P_{\bar{s}}$ and $\mathbf{Y}_{\bar{s}}$, all that
remains is to find out which cells form each of these $P_{\bar{s}}$ networks in step (4), according
to the allocating procedure presented in Subsection 4.1.1.1.
4.1.1.1 Allocating procedure
Determining the cells that compose the out-of-sample nonempty networks is a crucial
step in the estimation of the proposed model, since the resulting allocation directly impacts
the cells that compose $C_{\bar{s}}$ and the estimated value of $\boldsymbol{\eta}_{\bar{s}}$. The generated out-of-sample
networks are allocated sequentially, according to their sizes: the bigger networks are
allocated first and the smaller ones later. It is assumed that the bigger the size of the
network, the higher its cells’ intensity values. Note that the cells that compose the set
of out-of-sample cells, $C_{\bar{s}}$, must not be part of the set of sampled cells, $C_s$, nor of
the sampled nonempty networks’ borders (otherwise, we would be modifying a
network previously sampled).
The allocating procedure aims to draw the cells that compose each generated
out-of-sample network according to given weights. In this case, we use the set
of intensities λ, although one could sample the cells based on other practical weights.
The weights λ of the cells in $C_s$ and of the visited borders are set to zero. An example of
this procedure is illustrated in Figure 4.1. The allocation of a network of size Y
proceeds as follows: draw an available cell c with probability proportional to the weights
λ and, if Y > 1, draw another cell from the neighbors of that cell and continue to
draw neighboring cells until we obtain a set of Y contiguous nonempty grid cells
surrounded by empty grid cells. Then remove this network from the population, select
one of the remaining grid cells with probability proportional to the weights λ and proceed
in this way until all the $P_{\bar{s}}$ networks have been allocated. Note that the cells that were not
chosen to be part of $C_{\bar{s}}$ are assumed to be empty.
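The allocation of a single network can be sketched in Python as follows. Several details are assumptions made here and are not specified in the text: neighboring cells are also drawn with probability proportional to their weights, cells with zero weight are treated as unavailable, the function returns None when the network cannot be completed, and the border of an allocated network is blocked by setting its weights to zero. All names are hypothetical.

```python
import random


def neighbors(cell, n_rows, n_cols):
    """Cells sharing a common edge with `cell` on the grid."""
    r, c = cell
    cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in cand if 0 <= i < n_rows and 0 <= j < n_cols]


def allocate_network(size, weights, n_rows, n_cols, rng):
    """Grow one network of `size` contiguous cells, or return None on failure.

    weights: dict mapping every cell (row, col) to a nonnegative weight.
    """
    available = sorted(c for c, w in weights.items() if w > 0)
    if not available:
        return None
    start = rng.choices(available, weights=[weights[c] for c in available])[0]
    network = {start}
    while len(network) < size:
        frontier = sorted({n for c in network
                           for n in neighbors(c, n_rows, n_cols)
                           if n not in network and weights[n] > 0})
        if not frontier:
            return None
        nxt = rng.choices(frontier, weights=[weights[c] for c in frontier])[0]
        network.add(nxt)
    return network


def block_network(network, weights, n_rows, n_cols):
    """Zero the weights of a network and of its border cells (in place)."""
    for cell in network:
        weights[cell] = 0.0
        for n in neighbors(cell, n_rows, n_cols):
            weights[n] = 0.0
```

Calling `block_network` after each allocation keeps later networks from touching earlier ones, so every allocated network stays surrounded by empty cells. How a failed allocation should be handled (for example, by redrawing the starting cell) is left open in this sketch.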
[Figure 4.1, six panels: (a) → (b) → (c) and (d) → (e) → (f)]
Figure 4.1: Illustration of the allocation method for two out-of-sample networks of sizes 3 and 2,
based on weights λ (gray background). The lighter the cells’ color, the higher the λ value
(intensity) of that cell. The white cells with bold borders, whose weights are equal to zero, are
the sampled networks’ cells and the hatched ones correspond to the nonempty networks’ border
cells. The red borders surround cells that can be drawn in each stage of the procedure. The
cells that compose the first and second allocated networks are painted blue and green,
respectively. The example proceeds as follows: draw one of the red-bordered cells of Panel
(a). The drawn cell is indicated in blue in Panel (b) and only its neighbors can be drawn to keep
building this network. In Panels (c) and (d) the allocation of the 3-sized network is finished, and
we can draw any cell with a red border to start the allocation of the 2-sized network. Panels
(e) and (f) present the cells chosen to compose this network in green.
4.1.2 Sampling procedure
A variation of the sampling procedure of Salehi M. & Seber (1997) is proposed
here to improve the sampling process, aiming to sample more nonempty networks.
Let π be the set of sampling weights assigned to all cells of R and π(c) the weight for a
given cell c. The procedure consists of sampling a grid cell from the set of M grid cells
with probability proportional to the weights π and, if it is nonempty, observing the entire
network containing the selected grid cell. After removing this network from the population, a
new cell is selected from the remaining set of grid cells and the method proceeds in this
way until we have selected m networks in the sample. Note that a nonempty network is
surrounded by empty cells that make up its border, and these border cells can still be
sampled in later draws.
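The sequential draw of networks can be sketched as follows in Python. The representation is an assumption made for illustration: every cell is mapped to the label of its network, with each empty cell being its own network of size one, and the names used are hypothetical.

```python
import random


def sample_networks(cell_to_network, weights, m, rng):
    """Draw up to m distinct networks, one at a time, without replacement.

    cell_to_network: dict mapping each cell to the label of its network.
    weights: dict mapping each cell to its sampling weight pi(c).
    """
    available = dict(weights)            # copy, so the caller's weights survive
    selected = []
    for _ in range(m):
        cells = [c for c, w in available.items() if w > 0]
        if not cells:
            break                        # nothing left to draw
        cell = rng.choices(cells, weights=[available[c] for c in cells])[0]
        net = cell_to_network[cell]
        selected.append(net)
        # Remove the whole network containing the drawn cell from the population.
        for c in [c for c in available if cell_to_network[c] == net]:
            del available[c]
    return selected
```

With constant weights, a network is drawn with probability proportional to the number of cells it contains, since any of its cells leads to its selection.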
The sampling process improvement proposed in this dissertation, illustrated in Figure
4.2, is divided into two stages and is based on weights that are used to draw the sample.
In the first stage, m1 networks are selected considering grid cells with equal weights, i.e.,
π(c) is constant for all c ∈ R. For each selected cell, the sampling procedure continues
until all nonempty cells in the neighborhood are observed and stops when empty units are
visited. Thus, the networks are selected with probability proportional to their size. Note
that, during this process, although the border cells are visited, they are not added to the
sample. Based on the fit of the proposed model in equation (4.1) to this first sample with m1
networks, we obtain the vector of weights ω for all non-sampled cells of R, which are
used to select the second sample. Let ω(c) be the weight defined by the posterior mean
of η(c), for each cell c ∈ R. Note that the higher the posterior mean of a cell count,
the higher the chance of selecting that cell. Due to the inference process, the weights ω
associated with the border cells are set to zero. Since the cells of the networks in the
first sample must not be drawn in the second sampling stage, the weights associated
with these cells are set to zero too. Then, a second sample of m2 networks is
drawn with probability proportional to the weights ω. Hence, the final sample is
given by $s = s_1 \cup s_2 = \{i_1, \ldots, i_{m_1}, i_{m_1+1}, \ldots, i_{m_1+m_2}\}$, with size $m = m_1 + m_2$.
[Figure 4.2, four panels: (a) Population and constant weights → 1st stage → (b) First sample: m1 networks; (c) First sample and weights ω → 2nd stage → (d) Final sample: m1 + m2 networks]
Figure 4.2: Illustration of the proposed sampling procedure for a population (points) distributed in
a region with M = 400 cells and weights (gray background) used in this scheme. The lighter
the cells’ color, the higher the weight of that cell. The white cells with bold borders, whose
weights are equal to zero, are the sampled networks’ cells and hatched cells correspond to the
nonempty networks’ border cells. Panels (a) and (b) present the same grayscale, since all cells
have constant weight in the first stage; and Panels (c) and (d) show different shades of gray
due to the second stage’s different weights.
To motivate the notation for the probability of selecting a given sample, consider a
population consisting of networks of size Z from which we obtain the ordered sample
s = {i1, . . . , im}. The probability of selecting the j-th network of the sample, that is
a network of size Zij , is given by the sum of probabilities of selecting each unselected
network of size Zij after j − 1 networks have been observed, since networks with the
same size are considered alike. Thus, the probability of selecting a network in the sample
depends on its size Zi, which is only observed for the sampled networks after their
selection in the sample.
Let cj be the set of sampled cells in the j-th draw. Thus, cj is composed of the drawn
grid cell and, if it is nonempty, cj contains the entire network containing the selected grid
cell. Let Gij ,j be the set of cells that compose unselected networks of size Zij after j − 1
networks have been selected. Thus, in general, the probability of selecting the sample
s = {i1, . . . , im} of m networks is given by:
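$$[s \mid X, P, Y] = \prod_{j=1}^{m} \frac{\sum_{g \in G_{i_j,j}} \pi(g)}{\sum_{r \in R} \pi(r) - \sum_{k=0}^{j-1} \sum_{c \in c_k} \pi(c)}, \qquad (4.3)$$
where π(c) represents the weight of the cell c and is:
$$\pi(c) = \begin{cases} \text{constant}, & \text{if } c \in s_1; \\ \omega(c), & \text{if } c \in s_2. \end{cases}$$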
Note that in equation (4.3), the index j represents the j-th draw, so c ∈ s1 for j =
1, . . . ,m1, and c ∈ s2 for j = m1 + 1, . . . ,m. When the proposed model in equation
(4.1) is fitted to the first sample (to obtain the weights ω), the weights π(c) are constant
and the probability given in expression (4.3) matches the probability of selecting a
sample s given in Rapley & Welsh (2008). On the other hand, unlike in Rapley
& Welsh (2008), the probability of selecting a given sample s does not depend directly
on the quantities of the model, but on the weights of the networks’ cells.
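As a hedged illustration of expression (4.3), the sketch below evaluates the product for a toy population. It assumes a single fixed weight per cell (in the two-stage scheme the weights change between stages), treats unselected networks with the same number of cells as alike, and takes c_0 as the empty set. The function name and the toy data are hypothetical and are not taken from the original implementation.

```python
def ordered_sample_probability(weights, networks, sample_order):
    """Evaluate the product in (4.3) for an ordered sample of networks.

    weights:      dict cell -> pi(c), assumed fixed across draws
    networks:     dict network id -> list of cells (a partition of R)
    sample_order: network ids in the order they were drawn
    """
    total_weight = sum(weights.values())
    removed_weight = 0.0  # weight of the cells sampled in earlier draws
    selected = set()
    probability = 1.0
    for net in sample_order:
        size = len(networks[net])
        # Cells of every not-yet-selected network with the same size (G_{i_j, j})
        numerator = sum(
            weights[c]
            for other, cells in networks.items()
            if other not in selected and len(cells) == size
            for c in cells
        )
        probability *= numerator / (total_weight - removed_weight)
        selected.add(net)
        removed_weight += sum(weights[c] for c in networks[net])
    return probability


# Toy population: four cells, one two-cell network and two single-cell networks
toy_networks = {"A": [0, 1], "B": [2], "C": [3]}
toy_weights = {c: 1.0 for c in range(4)}
p = ordered_sample_probability(toy_weights, toy_networks, ["B", "C", "A"])
```

With constant weights, the three factors for the order B, C, A are 2/4, 1/3 and 2/2.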
The cells that compose each non-sampled network are defined from their allocation
process (described in Subsection 4.1.1.1), which directly impacts the weights ω used to
select the second sample. Therefore, the proposed model must properly determine the
cells that compose the out-of-sample nonempty networks. If a cell is part of Cs in a
large number of MCMC iterations, this cell tends to be a nonempty cell of R and the
associated posterior mean will be high, while, if a cell does not compose Cs in a large
number of MCMC iterations, this cell tends to be an empty cell of R. It is expected that
this novel sampling method based on weights will lead us to a more efficient selection of
networks, as we are assigning higher chances to the cells where the phenomenon of interest
is expected to be found and avoiding sampling in areas where the expected intensity of
the phenomenon’s occurrence is low.
The proposed sampling methodology consists of the following steps:
(1) Consider a region R containing a rare, clustered population, partitioned into M
cells and draw an adaptive cluster sample of m1 networks, which is equivalent to
drawing a sample of m1 networks with probability proportional to their sizes, i.e.,
the elements of the vector of probabilities π are constant;
(2) Fit the proposed model in equation (4.1) to this first sample to obtain the posterior
mean of the cells’ counts η(c), given in the vector ω, which will be used as weights
to select the second sample;
(3) Since the cells of the networks in the first sample must not be drawn in the second
sampling stage, set to zero the weights associated with the cells of the first sample,
as well as those of the nonempty networks’ border cells;
(4) From the remaining cells of R, draw m2 networks with probability proportional
to the weights ω;
(5) Finally, fit the proposed model in equation (4.1) to the final sample of size m =
m1 +m2 to estimate the population total.
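Steps (1) to (4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: `fit_weights` is a hypothetical stand-in for fitting the model in equation (4.1) and returning the posterior means ω, the population is a toy one-dimensional strip, and the final model fit of step (5) is omitted.

```python
import random


def draw_networks(weights, network_of, cells_of, n_draws, rng):
    """Draw up to n_draws distinct networks, selecting one cell per draw with
    probability proportional to its weight; drawn networks get weight zero."""
    weights = dict(weights)  # work on a copy
    drawn = []
    for _ in range(n_draws):
        candidates = [c for c, w in weights.items() if w > 0]
        if not candidates:
            break  # nothing left to draw
        cell = rng.choices(candidates, weights=[weights[c] for c in candidates])[0]
        net = network_of[cell]
        drawn.append(net)
        for c in cells_of[net]:
            weights[c] = 0.0
    return drawn


def two_stage_sample(counts, network_of, cells_of, border_of, m1, m2, fit_weights, rng):
    # Step (1): constant weights in the first stage
    s1 = draw_networks({c: 1.0 for c in counts}, network_of, cells_of, m1, rng)
    # Step (2): hypothetical stand-in for the fit of model (4.1)
    omega = dict(fit_weights(s1))
    # Step (3): zero the weights of first-stage cells and of the border cells
    # of nonempty first-stage networks
    for net in s1:
        for c in cells_of[net]:
            omega[c] = 0.0
        if any(counts[c] > 0 for c in cells_of[net]):
            for c in border_of[net]:
                omega[c] = 0.0
    # Step (4): second stage, probability proportional to omega
    s2 = draw_networks(omega, network_of, cells_of, m2, rng)
    return s1, s2


# Toy strip of 8 cells; cells 1 and 2 form one nonempty network, cell 6 another
counts = {0: 0, 1: 3, 2: 2, 3: 0, 4: 0, 5: 0, 6: 1, 7: 0}
network_of = {0: 0, 1: 1, 2: 1, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7}
cells_of = {0: [0], 1: [1, 2], 3: [3], 4: [4], 5: [5], 6: [6], 7: [7]}
border_of = {1: [0, 3], 6: [5, 7]}
s1, s2 = two_stage_sample(counts, network_of, cells_of, border_of, 2, 2,
                          lambda first: {c: 1.0 + c for c in counts},
                          random.Random(1))
```

The second stage may return fewer than m2 networks in a toy population this small, since all remaining weights can become zero.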
4.1.2.1 Border-sampling procedure
Through the proposed sampling method, we survey a selected grid cell and, if it is
nonempty, the entire network containing the selected grid cell. It is important to remark
that nonempty networks are surrounded by empty cells that compose their border, which
are not removed from R unless they are drawn as an empty network. Thus, a surveyed
border cell can be drawn later, although we know that it is empty.
To avoid surveying the same border cell twice, we propose an alternative sampling
method, given as follows: draw a grid cell from R with probability proportional to the
weights π, survey that grid cell and, if it is nonempty, survey the entire network contain-
ing the selected cell. After removing this network and its border from the population,
select a new cell from the remaining set of grid cells and proceed in this way until we
have selected m networks in the sample. In practice, proceeding this way is equivalent
to surveying clusters instead of networks, though the final sample structure is the same as
before but containing the border cells’ information. Note that the only change in this
method, in comparison with the method previously presented in Subsection 4.1.2, is that
the border cells cannot be re-sampled.
The inference procedure is the same as Subsection 4.1.1 except for the joint distri-
bution of all the quantities in the model (4.2) since the probability of selecting a given
sample s has changed. Let cj be the set of sampled cells in the j-th draw. In this case, cj
is composed of the drawn grid cell and, if it is nonempty, the entire network containing
the selected grid cell plus its border. This subtle change in the sets cj, for j = 1, . . . ,m,
incorporates the sampling modification. Thus, the expression of the probability of se-
lecting the sample s of m networks is the same as in (4.3) except for the definition of the
set cj.
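The modified set cj can be illustrated with a short sketch that builds a network's border on a rectangular grid. It assumes a first-order neighbourhood (the four cells sharing an edge), which is a common choice in adaptive cluster sampling but is an assumption here; the function names are hypothetical.

```python
def network_border(network, n_rows, n_cols):
    """Cells adjacent (north, south, east, west) to a network but outside it."""
    network = set(network)
    border = set()
    for row, col in network:
        for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            neighbor = (row + d_row, col + d_col)
            inside_grid = 0 <= neighbor[0] < n_rows and 0 <= neighbor[1] < n_cols
            if inside_grid and neighbor not in network:
                border.add(neighbor)
    return border


def cluster_cells(network, n_rows, n_cols):
    """Set c_j under the border-sampling procedure: the network plus its border."""
    return set(network) | network_border(network, n_rows, n_cols)


# Toy example: a two-cell network inside a 4 x 4 grid
toy_network = [(1, 1), (1, 2)]
toy_border = network_border(toy_network, 4, 4)
toy_cluster = cluster_cells(toy_network, 4, 4)
```

Under the network-sampling procedure, cj would be `toy_network` alone; under the border-sampling procedure it is `toy_cluster`.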
4.2 A first simulation study
In order to assess the effectiveness of each methodology, we compared the results
of our approach to those obtained in Rapley & Welsh (2008). In this Section, we will
refer to the model of Rapley & Welsh (2008) as the ‘network-sampling aggregated model’
(MAN) since the sampling procedure proposed by Salehi M. & Seber (1997) allows the
networks’ border cells to be re-selected. Analogously, the model proposed in Section 4.1
considering the sampling procedure first presented in Subsection 4.1.2 will be referred
to as the ‘network-sampling disaggregated model’ (MDN), and the one considering the
methodology proposed in Subsection 4.1.2.1 as the ‘cluster-sampling disaggregated model’
(MDC), since the sampling procedure does not allow the networks’ border cells to be re-
sampled, i.e., the entire cluster is sampled.
The fixed population used here was generated based on the disaggregated model
presented in equation (4.1) according to the fixed parameters (α, β) = (0.1, 0.1) and
(θ0, θ1) = (2.7, 0.5). The fictional covariate was simulated from a Gaussian process. Figure
4.3 shows the generated population and covariate in a grid with M = 400 cells. Note
that these counts are sparse and clustered, motivating the use of adaptive sampling.
We can observe from Figure 4.4 that the higher the generated covariate value, the
higher the associated count of nonempty cells. Moreover, the simulated event of interest
is associated with higher values of the covariate, since it does not occur in cells
with lower covariate values.
The study consists of drawing 100 nonempty samples of m = 40 networks of the
population according to each method. The proposed sampling methodologies are divided
Figure 4.3: Values of the generated covariate (gray background) and counts for each nonempty
cell in a grid with M = 400 cells.
[Figure 4.4 panels: scatter plot with axes Counts and Covariate (legend: nonempty cell, empty cell); histogram of the covariate with axis Density.]
Figure 4.4: Plot of the generated covariate versus the population counts and covariate’s his-
togram, where the black points in the histogram represent the nonempty cells.
into two stages, where we sample m1 = 20 networks randomly and m2 = 20 networks
based on weights ω, according to the proposed sampling methodology. Note that we
are using an initial sample of size 50% of m to obtain these weights and one can adjust
this percentage (see Subsection 4.3.1). In the study performed in this Section, we are
omitting samples that consist of only empty networks since our proposed models require
at least one nonempty network in the first sample to adjust the weights ω properly.
The MCMC algorithm was implemented in the R programming language, v. 3.6.1 (R
Core Team, 2019). For each sample and fitted model, we ran two parallel chains starting
from different initial values, let each chain run for 40,100 iterations, discarded the first
100 as burn-in, and stored every 20th iteration to obtain 2,000 independent samples.
We used the diagnostic tools available in the package CODA (Plummer et al., 2006) to
check convergence of the chains. Convergence results of Subsection 4.3.2 are available in
Appendix B.
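The number of stored draws follows directly from the run length, burn-in and thinning interval quoted above. The small check below is only an arithmetic sketch; the function name is hypothetical, and the count is per chain (whether the two chains were pooled afterwards is not stated here).

```python
def retained_iterations(n_iterations, burn_in, thin):
    """Number of draws kept per chain after discarding burn-in and thinning."""
    return len(range(burn_in, n_iterations, thin))


# 40,100 iterations, first 100 discarded, every 20th iteration stored
kept_per_chain = retained_iterations(40_100, 100, 20)
```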
A summary comparison of the population total estimators, using the relative root
mean square error (RRMSE), relative absolute error (RAE), relative bias
(RB), relative width (RW) and the empirical coverage of the 95% credibility intervals
measured in percentages (Cov.), is presented in Table 4.1. These results are obtained
considering all 100 samples generated according to each model.
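For readers who want to reproduce this kind of summary, the sketch below computes the five measures over a set of simulated samples. The formulas are one common convention and are an assumption here: the exact definitions used in this work are given outside this section and may differ (for instance, if the measures are computed per sample from the posterior draws). The function name and the toy numbers are hypothetical.

```python
import math


def summary_measures(estimates, intervals, true_total):
    """Relative error measures of point estimates and (lower, upper) intervals."""
    n = len(estimates)
    errors = [est - true_total for est in estimates]
    rb = sum(errors) / n / true_total                      # relative bias
    rae = sum(abs(e) for e in errors) / n / true_total     # relative absolute error
    rrmse = math.sqrt(sum(e * e for e in errors) / n) / true_total
    rw = sum(upper - lower for lower, upper in intervals) / n / true_total
    cov = 100.0 * sum(lower <= true_total <= upper for lower, upper in intervals) / n
    return {"RRMSE": rrmse, "RAE": rae, "RB": rb, "RW": rw, "Cov": cov}


# Toy check with two simulated samples and a true total of 100
toy = summary_measures([90.0, 110.0], [(80.0, 120.0), (105.0, 130.0)], 100.0)
```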
Table 4.1 shows that the Bayes estimator produced by MDN fit has a smaller RRMSE
and RAE than MAN and MDC’s estimators. The model MAN produced the smallest
RB, but it seems to be less efficient than the other models according to the other error
measurements. Although the MDC produces 95% credibility intervals with higher cover-
age percentages than the others, its width (RW) is not the smallest one. The proposed
model MDN appears to be more efficient when applied to these artificial samples.
Model   RRMSE   RAE     RB       RW      Cov.
MAN     0.296   0.236   -0.009   0.879    95.88
MDN     0.265   0.209   -0.022   0.756    97.94
MDC     0.283   0.217    0.014   0.878   100.00
Table 4.1: Summary measurements of the point and 95% credibility interval estimates of
the population total T , obtained by fitting MAN, MDN and MDC models under 100 samples
according to each model.
In a similar way, Figure 4.5 shows the boxplots of the RRMSE, RAE, RB and RW of
the Bayes estimators obtained when fitting each model, based on all 100 samples. Here
again, we see that the RRMSE, RAE and RW obtained for MDN model are lower than
the others. Note that the RB distributions are quite similar although MDN has a smaller
variability than the others.
[Figure 4.5 panels: RRMSE, RAE, RB and RW; boxplots for MAN, MDN and MDC.]
Figure 4.5: Boxplots with measurements of the point and 95% credibility interval estimates
for T over 100 simulations obtained for the fits of MAN, MDN and MDC models.
Finally, we present the barplots with the relative frequency of the number of networks
sampled from 100 fits of each model in Figure 4.6. Note that the sampling procedure of
the proposed disaggregated models tends to sample more networks than that of the
aggregated one. In particular, the MDN sampling procedure provides more samples
containing the whole population than the others.
[Figure 4.6 panels: MAN, MDN and MDC; axes: number of networks sampled (1 to 5) and relative frequency.]
Figure 4.6: Relative frequency of the number of networks sampled over 100 simulations ob-
tained for the fits of MAN, MDN and MDC models.
Since, among the proposed methodologies, the MDN model (which allows the border
cells to be re-sampled) yielded better results than the MDC model, we will focus on
studying the properties of the MDN model from now on, including a more extensive
comparison with the aggregated model and an application to real data.
4.3 Comparison with the aggregated model
To assess the effectiveness of our proposed methodology, we compared the results of
our approach considering the MDN model of Section 4.2, which will be simply referred
to as disaggregated model (MD), to those obtained in Rapley & Welsh (2008), called
aggregated model (MA). The first comparison consists of a design-based experiment,
where the numbers of networks m1 and m2 selected are studied, and the second one is a
real experiment with an African Buffalo population in an area of East Africa.
4.3.1 A design-based experiment evaluating the sample fraction
The purpose of this simulation study is to compare the performance of the aggregated
and disaggregated models when the population is generated according to the disaggre-
gated model, and study how the choice of the numbers of networks m1 and m2 selected,
respectively, in the first and second sampling stages, affect the population total estimates.
We considered twelve scenarios to evaluate how the sample size m and the numbers m1
and m2 of networks selected in the first and second sampling stages, respectively, affect
the population total estimates under the disaggregated model. We fixed the total sample
size m ∈ {30, 40, 50} and the percentage of the m networks to be sampled in the first
sampling stage at {35%, 50%, 65%, 80%}, i.e., the numbers m1 and m2 depend on these
percentages. We used the same population generated in the simulation study presented
in Section 4.2, which is distributed in a region with M = 400 cells, drew 500 samples
according to each scenario and methodology, and fitted both models to evaluate their
performances. Note that the aggregated model’s sampling methodology considers only
one sample of size m, collected as in the first sampling stage of the proposed methodology.
Table 4.2 shows the values of the sample size m and the respective m1 and m2 of networks
selected in the first and second sampling stages, according to each fixed percentage.
        35%        50%        65%        80%
m     m1   m2    m1   m2    m1   m2    m1   m2
30    10   20    15   15    20   10    24    6
40    14   26    20   20    26   14    32    8
50    18   32    25   25    32   18    40   10
Table 4.2: Values of the sample sizes m, m1 and m2, according to each fixed percentage.
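The entries of Table 4.2 can be recomputed from m and the first-stage percentage. The sketch below assumes that ties are rounded half to even (for example, 10.5 becomes 10 and 17.5 becomes 18); this rule reproduces every entry of the table, but it is inferred from the table and is not stated in the text. The function name is hypothetical.

```python
from fractions import Fraction


def stage_sizes(m, first_stage_percent):
    """Split m networks into (m1, m2) given the first-stage percentage."""
    # round() on a Fraction rounds half to even, avoiding floating-point ties
    m1 = round(Fraction(m * first_stage_percent, 100))
    return m1, m - m1


table_4_2 = {
    m: [stage_sizes(m, pct) for pct in (35, 50, 65, 80)]
    for m in (30, 40, 50)
}
```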
Table 4.3 displays some of the frequentist properties of the estimators obtained by
fitting the proposed disaggregated and aggregated models. In general, increasing the
sample size leads to smaller errors and narrower intervals (RW), so, as expected, the errors
associated with the disaggregated model for m = 50 are smaller than the ones for
m = 30 and m = 40. Note that, as the percentage of networks sampled in the
first sampling stage decreases, the disaggregated model performs better according to the
RRMSEs and RAEs, since their values become smaller. Moreover, the error values associated
with the aggregated model are higher than the ones obtained under the proposed model
fit, except when the sampling proportion is fixed at 80%, for m = 40 (both measures) and
m = 30 (RAE only). The relative bias of the fitted MA is, in absolute value, no larger than
the ones produced by the MD models, and both seem to underestimate the population total T.
The relative widths of the proposed model are smaller than the ones provided by the
aggregated model in all scenarios, while still producing higher coverages. Overall, the
disaggregated model presents a better performance than the aggregated model.
Figure 4.7 presents the boxplot of some of the measurements associated with the
estimates for T , and again it suggests that the disaggregated model performs better
m    Model   RRMSE   RAE     RB       RW      Cov.
30   MD35%   0.334   0.263   -0.030   1.027   100.00
30   MD50%   0.335   0.267   -0.052   1.010   100.00
30   MD65%   0.343   0.278   -0.082   1.030   100.00
30   MD80%   0.350   0.287   -0.106   1.029   100.00
30   MA      0.357   0.283   -0.015   1.119    99.00
40   MD35%   0.262   0.207   -0.022   0.761    99.80
40   MD50%   0.267   0.214   -0.042   0.762    99.60
40   MD65%   0.275   0.224   -0.070   0.768    99.20
40   MD80%   0.302   0.252   -0.110   0.791    98.40
40   MA      0.294   0.235   -0.018   0.867    96.00
50   MD35%   0.226   0.178   -0.015   0.619    99.80
50   MD50%   0.227   0.180   -0.022   0.603    98.60
50   MD65%   0.235   0.190   -0.045   0.614    98.60
50   MD80%   0.245   0.202   -0.073   0.617    98.40
50   MA      0.252   0.204   -0.015   0.697    90.80
Table 4.3: Summary measurements of the point and 95% credibility interval estimates for T
over 500 simulations obtained for the fits of the disaggregated and aggregated models, consid-
ering different sample sizes and proportions.
taking into account the variation of these values. In particular, there is an increase in
RRMSEs and RAEs quartiles as we increase the percentage of networks sampled in the
first sampling stage.
[Figure 4.7 panels: RRMSE, RAE, RB and RW for m = 30, m = 40 and m = 50; boxplots for MD35%, MD50%, MD65%, MD80% and MA.]
Figure 4.7: Boxplots with measurements of the point and 95% credibility interval estimates
for T over 500 simulations obtained for the fits of the disaggregated and aggregated models,
considering different sample sizes and proportions.
Figure 4.8 presents the barplots with the relative frequency of the number of networks
sampled from 500 simulations, according to each scenario and sampling methodology.
Note that the proposed methodology provides a higher number of samples containing
the entire population than the aggregated model and, in particular, the lower the
percentage of networks sampled in the first stage, the greater the number of networks
sampled, since the weights are used earlier. Moreover, as expected, we observe more
nonempty sampled networks as we increase the sample size.
[Figure 4.8 panels: rows m = 30, m = 40 and m = 50; columns MD35%, MD50%, MD65%, MD80% and MA; axes: number of networks sampled (1 to 5) and relative frequency.]
Figure 4.8: Relative frequency of the number of networks sampled over 500 simulations ob-
tained for the fits of the disaggregated and aggregated models, considering different sample
sizes and proportions.
To evaluate the effect of the number of networks sampled on the total estimates, Figure
4.9 shows some measurements of the relative bias estimates of T from 500 simulations,
according to each scenario and sampling methodology. For each scenario and sample size
combination, the number of simulations distributed among the five possible numbers of
sampled networks is given below the gray bars, with few simulations in some cases, and
the coverage of the 95% credibility intervals is shown above the gray bars. Our proposal
distribution may lead to no out-of-sample nonempty cells and, consequently, no
out-of-sample networks, unlike the approach of Rapley & Welsh (2008). Therefore,
when we sample five networks, the aggregated model tends to overestimate more than the
disaggregated models, and its coverage is zero regardless of the sample size m. Notice that
for m = 30, all 95% credibility intervals associated with the disaggregated models include the
true value of T, while for m = 40, it happens when we sample more than one network
and, for m = 50, when we sample more than two networks. Moreover, among the
disaggregated models, the model that provides the largest coverage for all sample sizes
is MD35%.
[Figure 4.9 panels: m = 30, m = 40 and m = 50; vertical axis: relative bias; horizontal axis: number of networks sampled (1 to 5), with bars for MD35%, MD50%, MD65%, MD80% and MA.]
Figure 4.9: Mean (white line) and 2.5% and 97.5% quantiles (gray bars) of the relative bias
estimates of T over 500 simulations obtained for the fits of the disaggregated and aggregated
models, considering different sample sizes, proportions, numbers of networks sampled (number
of simulations given below the gray bars) and coverage (above the gray bars).
Based on this study, the disaggregated model provides a more efficient sample, with
a larger number of networks than the aggregated model. Regarding the estimators’
performance, the MD35% model proved to be the most efficient and, therefore, we will
concentrate on studying the properties of this model in an application to real data.
4.3.2 A real application
In this subsection, we analyze the performance of the disaggregated model with 35%
of m networks sampled in the first sampling stage (MD35%) and the aggregated model
(MA) using a real dataset. To simplify the notation, we will refer to the MD35% model
as MD. The study variable considered is the number of African Buffaloes in an area
of East Africa, while the auxiliary variable is the altitude (in meters). The choice of
Buffalo and altitude was motivated by the fact that Buffaloes drink a lot of water (Prins,
1996) and their spatial distribution depends on the prevailing climatic conditions (Bennitt
et al., 2014), which are related to altitude. Thus, areas with higher temperatures (lower
altitude) lose surface water (lakes or rivers) to evaporation, attracting few or
no Buffaloes.
The data on African Buffalo were obtained from maps produced from an aerial census.
The census was conducted by the Kenya Wildlife Service, the Tanzania Wildlife Research
Institute, and other partners during the wet season of 2010, covering an area of
about 24,108 km². The area covered was the Amboseli-West Kilimanjaro/Magadi-Natron
cross-border landscape, which covers parts of Kenya and Tanzania. The auxiliary data
over the study area were obtained from the Shuttle Radar Topography Mission (SRTM)
database, freely available for download from https://www2.jpl.nasa.gov/srtm/. In
particular, we used the altitude on a logarithmic scale as the covariate, since its values
have a smaller order of magnitude. Figure 4.10 presents the distribution and counts of the
Buffalo in the study region along with pixels of the auxiliary variable and shows that
Buffaloes are mostly found in areas of higher altitudes.
Figure 4.10: Altitude in a logarithm scale (gray background) and counts of African Buffaloes
over parts of Kenya and Tanzania in 2010 in a grid with M = 391 cells.
From Figure 4.11, we can notice that Buffaloes tend not to be in areas in which
the associated covariate has extreme values, i.e. they are concentrated in areas with
intermediate log altitude values.
Based on the relationship between the Buffalo counts and altitude, it seems natural
to add the square of the covariate as another explanatory covariate, allowing us to model
more accurately the effect of log altitude, which has a non-linear relationship with the
Figure 4.11: Plot of altitude in a logarithm scale versus counts of African Buffaloes, and
covariate’s histogram, where the black points in the histogram represent the nonempty cells.
Buffalo counts. Since we are using highly correlated covariates, centering them helps
the numerical schemes to converge. Thus, let v(c) be the centered log altitude for
c ∈ R, and v2(c) its square. Now, the covariate vector associated with the
cell c in the proposed model (4.1) is given by vc, for all c ∈ R. Introducing the squared
log altitude requires accounting for the high correlation between covariates in the proposal
distribution, which is detailed in Section A.1.
Recall that in the allocation process, described in Subsection 4.1.1, the available
cells of R are sampled with probability proportional to λ, the set of intensities assigned
to the cells of R, which is obtained through information from sampled nonempty cells. Note
that the real population used in this section (see Figure 4.10) is extremely rare, with
small networks. Thus, the sample may contain few nonempty cells and,
consequently, little information to estimate λ. Therefore, in this application, the weights
used in the allocation process will be the probability of each cell being nonempty, which
is estimated from all sampled cells (empty and nonempty). Let φ(c) = 1 if c is a
nonempty grid cell and φ(c) = 0 otherwise. Define ν(c) = P (φ(c) = 1), the probability
that c is a nonempty cell, and ν = (ν(c1), . . . , ν(cM))′ the vector of these probabilities, for
all cells c of R. Thus, the available cells of R are sampled with probability proportional to
ν. To obtain these probabilities, the following structure will be included in the proposed
model (4.1):
$$\phi(c) \mid v_c, \rho \sim \text{Bernoulli}\left(\frac{1}{1 + \exp\{-v_c' \rho\}}\right), \quad c \in R, \qquad (4.4)$$
where ρ = (ρ0, ρ1, ρ2)′ represents the regression coefficients vector associated with the
covariates vc = (1, v(c), v2(c))′. Note that, to estimate ρ, we will use the information
from all empty and nonempty sample cells, unlike θ, which only takes into account
nonempty cells of the sample. The full conditional posterior distribution of ρ and the
methods adopted to sample from it are detailed in Appendix A, Section A.5.
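To make the structure in (4.4) concrete, the sketch below evaluates ν(c) for a few cells given a value of ρ. It is only an illustration: the log altitudes and the coefficients are made-up numbers (not estimates from the Buffalo data), the centering uses the mean over the listed cells, and the function name is hypothetical.

```python
import math


def nonempty_probabilities(log_altitudes, rho):
    """nu(c) = 1 / (1 + exp(-v_c' rho)) with v_c = (1, v(c), v(c)^2)."""
    mean_log_altitude = sum(log_altitudes) / len(log_altitudes)
    probabilities = []
    for value in log_altitudes:
        v = value - mean_log_altitude            # centered log altitude v(c)
        linear_predictor = rho[0] + rho[1] * v + rho[2] * v * v
        probabilities.append(1.0 / (1.0 + math.exp(-linear_predictor)))
    return probabilities


# Made-up inputs: a negative rho_2 favors intermediate log altitudes
toy_nu = nonempty_probabilities([6.0, 7.0, 8.0], (0.0, 0.0, -1.0))
# Allocation weights: cells are drawn with probability proportional to nu
toy_allocation = [p / sum(toy_nu) for p in toy_nu]
```

With a negative coefficient on the squared term, the probability of being nonempty peaks at intermediate log altitudes, which is the qualitative pattern described for Figure 4.11.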
Following the Bayesian paradigm, an independent prior is also assumed for the unknown
parameter vector ρ, and its marginal prior distribution is denoted by [ρ]. Let [ρ] be a
non-informative prior with zero mean vector and covariance matrix σ_ρ² I_{k+1}, where I_{k+1}
denotes the (k + 1)-dimensional identity matrix. Since the African Buffalo population
is extremely sparse and clustered, we set the prior distribution parameters of α, presented
in Subsection 4.1.1, as aα = 3 and bα = 50, to reflect the fact that α is necessarily
small in this population. For β, we maintain the prior parameters set as aβ = 1 and
bβ = 9.
The study consists of drawing 500 samples of m = 40 networks from the real population
according to each method. The proposed sampling methodology is divided into two
stages, with 35% of the m = 40 networks sampled in the first stage, that is,
m1 = 14 networks are sampled randomly and m2 = 26 networks are sampled based on the
weights ω. In this study, we are omitting samples that consist of only empty networks since our
proposed model requires at least one nonempty network in the first sample to adjust the
weights ω properly.
Figure 4.12 presents the boxplots of the RRMSE, RAE, RB and RW of the Bayes
estimators obtained from the disaggregated and aggregated model fits, based on 500
simulations. Note that these measures’ distributions under the disaggregated model are
wider and higher than under the aggregated model.
[Figure 4.12 panels: RRMSE, RAE, RB and RW; boxplots for MD and MA.]
Figure 4.12: Boxplots with measurements of the point and 95% credibility interval estimates
for T over 500 simulations obtained for the fits of the disaggregated and aggregated models to
real data.
On the other hand, since the African Buffaloes population is extremely sparse and
clustered, many samples consist of networks with few cells each. These samples are ex-
pected to be of limited use in accurately estimating the population total. Thus, the
results must ultimately be affected. Table 4.4 shows that 79.4% of the simulations based
on MD and 84.4% based on MA contain one or two nonempty networks sampled. Moreover,
the aggregated model never sampled five nonempty networks. Thus, it would
be interesting to fix the number of nonempty networks sampled when each method is
used.
Number of nonempty networks sampled      1      2      3     4     5
Disaggregated model (%)               35.4   44.0   18.0   2.2   0.4
Aggregated model (%)                  36.4   48.0   13.6   2.0   0.0
Table 4.4: Percentage of networks sampled over 500 simulations under each model.
Thus, to facilitate the comparison between the models, we repeated the study
with 100 samples of the real population according to each method for each fixed final
number of nonempty networks sampled, from one to five. Henceforth, we will refer to
the number of nonempty networks sampled simply as the number of networks sampled.
Table 4.5 displays some of the frequentist properties of the estimators for T obtained
by fitting the disaggregated and aggregated models, for each number of networks sampled.
When the number of networks sampled is fixed at four or five, the proposed model
performs better than the aggregated one according to all the criteria. In general, the
relative biases associated with the disaggregated model are smaller than the ones produced
by the aggregated model, except when we have two networks sampled. Also, the coverages
of the proposed model are at least as high as those of the aggregated model and, as we
increase the number of networks sampled, the MA’s coverage becomes smaller. In particular,
with five networks sampled, none of the 100 95% credibility intervals associated with the
aggregated model includes the true value of T.
                              Number of networks sampled
              1                 2                 3                 4                 5
         MD       MA       MD       MA       MD       MA       MD       MA       MD       MA
RRMSE    1.074    0.869    0.823    0.809    0.774    0.893    0.794    1.056    0.757    1.311
RAE      0.662    0.610    0.578    0.621    0.538    0.696    0.572    0.855    0.550    1.117
RB       0.339    0.538    0.374    0.364    0.428    0.585    0.549    0.832    0.550    1.117
RW       3.189    2.561    2.442    2.069    2.207    2.186    2.079    2.387    1.895    2.593
Cov.   100.00   100.00   100.00    64.00   100.00    51.00   100.00    32.00    96.00     0.00
Table 4.5: Summary measurements of the point and 95% credibility interval estimates for T
over 100 simulations for different numbers of networks sampled, obtained for the fits of the
disaggregated and aggregated models.
Figure 4.13 presents the boxplots of some of the previous summary measurements for
T , and the conclusion is analogous to the previous one. In particular, there is a decreas-
ing behavior of the disaggregated model’s relative widths as we increase the number of
networks sampled.
[Figure 4.13 panels: RRMSE, RAE, RB and RW against the number of networks sampled (1 to 5); boxplots for MD and MA.]
Figure 4.13: Boxplots with measurements of the point and 95% credibility interval estimates
for T over 100 simulations for each number of networks sampled, obtained for the fits of the
disaggregated and aggregated models to real data.
Finally, Table 4.6 presents a summary comparison of the estimators of the population total T
under the disaggregated and aggregated models, using RRMSE, RAE, RB and RW, based on the
500 simulations obtained by pooling the 100 simulations for each of one to five networks
sampled. Additionally, we compared these results to the ones obtained by applying Raj's
unbiased estimator, detailed in Salehi M. & Seber (1997). This estimator of the population
total is based only on the information contained in the selected networks, i.e., it ignores
the information in the border cells. In this case, we used a normal approximation to build
the 95% confidence interval for the population total. Table 4.6 shows that the aggregated
model and Raj's estimator both have larger RRMSEs, RAEs and RBs than our proposed
estimator, although it is well known that Raj's estimator is unbiased. Raj's estimator has
a much larger variance than its counterparts. The 95% credibility intervals produced by the
aggregated model have a coverage (49.4%) far below the nominal level and below that of the
other two estimators. Furthermore, our proposed model appears to be more efficient when
applied to these data.
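The comparison criteria used above can be computed from the simulation output along the
following lines. This is a minimal sketch, assuming the usual definitions relative to the
true total T (RRMSE as root mean squared error over T, RAE as mean absolute error over T,
RB as mean error over T, RW as mean interval width over T, and Cov. as the percentage of
intervals containing T). The function name `summary_measures` and the toy numbers are
hypothetical and are not taken from the simulation study.

```python
import numpy as np

def summary_measures(estimates, lower, upper, true_total):
    """Frequentist summaries of point and interval estimates of a total.

    Assumed definitions (relative to the true total T):
      RRMSE = sqrt(mean((est - T)^2)) / T
      RAE   = mean(|est - T|) / T
      RB    = mean(est - T) / T
      RW    = mean(upper - lower) / T
      Cov   = 100 * share of intervals that contain T
    """
    est = np.asarray(estimates, dtype=float)
    lo = np.asarray(lower, dtype=float)
    hi = np.asarray(upper, dtype=float)
    err = est - true_total
    return {
        "RRMSE": float(np.sqrt(np.mean(err ** 2)) / true_total),
        "RAE": float(np.mean(np.abs(err)) / true_total),
        "RB": float(np.mean(err) / true_total),
        "RW": float(np.mean(hi - lo) / true_total),
        "Cov": float(100.0 * np.mean((lo <= true_total) & (true_total <= hi))),
    }

# Toy input (hypothetical): four simulated replicates around a true total of 100.
toy = summary_measures(
    estimates=[90.0, 110.0, 100.0, 120.0],
    lower=[70.0, 95.0, 80.0, 105.0],
    upper=[115.0, 130.0, 125.0, 140.0],
    true_total=100.0,
)
```

Applied to the point estimates and interval limits of each model over the replicates, a
function of this kind would produce one row of a table such as Table 4.6.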
Figure 4.14 shows the boxplots of some measurements of the Bayes estimators obtained
when fitting each model and of Raj's estimator.
Estimator              RRMSE   RAE     RB      RW      Cov. (%)
Disaggregated model    0.845   0.580   0.448   2.362   99.2
Aggregated model       0.987   0.780   0.687   2.359   49.4
Raj's estimator        1.018   1.018   0.967   4.222   99.0
Table 4.6: Summary measurements of the point and interval estimates of the population total,
obtained by fitting the disaggregated and aggregated models and Raj's estimator.
[Figure 4.14: four boxplot panels (RRMSE, RAE, RB and RW), each with boxplots labelled MD,
MA and Raj.]
Figure 4.14: Boxplots with measurements of the point and 95% credibility interval estimates
for T over 100 simulations for each number of networks sampled, obtained for the fits of the
disaggregated and aggregated models.
In addition to other advantages previously seen, incorporating covariates into the
model allows us to locate the out-of-sample nonempty cells spatially. Figure 4.15 presents,
for one sample of each number of networks sampled, a map of the posterior mean of the
African Buffalo counts, i.e., the posterior mean of η(c) for every out-of-sample cell c of R.
Note that the lighter a cell's color, the higher its posterior mean. Due to the
allocation process, the count estimates associated with the border cells of the sampled
nonempty networks are equal to zero (hatched black cells). Note that the maps become darker
as we increase the number of sampled networks, i.e., samples with more nonempty networks
tend to yield lower estimates of the out-of-sample counts. Moreover, considering one to four
networks sampled, the out-of-sample nonempty networks are located in lighter areas,
indicating that the model can predict where they are. In the case with five networks
sampled, there are no out-of-sample nonempty networks; nevertheless, given the proposed
model structure and the relation between the Buffalo counts and the covariate, we believe
that the lighter areas would be more conducive to the establishment of new populations.
[Figure 4.15: five map panels: (a) One network sampled; (b) Two networks sampled; (c) Three
networks sampled; (d) Four networks sampled; (e) Five networks sampled.]
Figure 4.15: Maps of the posterior mean of African Buffalo counts η(c) (gray background) for
every out-of-sample cell c of R, for each number of sampled networks, and the population
(points) distributed in a region with 391 cells. The lighter a cell's color, the higher the
posterior mean of that cell. The blue cells are the sampled networks' cells and the hatched
cells correspond to the nonempty networks' border cells.
Figure B.1 in Appendix B shows the trace plots of the posterior samples of the
parameters α and β and of the population total T when fitting the disaggregated model for
one of the samples selected for each number of networks sampled. Table B.1 in Appendix
B presents the values of the Geweke criterion. Analyzing Figure B.1 and Table B.1 leads
us to conclude that convergence appears to have been reached. The same conclusion was
reached for all 600 samples selected from this population.
4.4 Model-based experiment under different settings
To examine the proposed methodology's performance under several scenarios, 500
populations were generated for each of the four scenarios considered, which were created
by varying the values of the parameters (α, β). In particular, populations were simulated
for the 4 pairs (α, β) with α, β ∈ {0.10, 0.15}, which were set to create different degrees
of rarity and clustering. Then, an adaptive cluster sample of final size m = 40 was selected
from each population, with 35% of it sampled randomly, i.e., the first-stage sample size is
m1 = 14 and the second-stage sample size is m2 = 26.
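The layout of this experiment can be written down in a few lines. The snippet below only
illustrates the arithmetic stated above (the four (α, β) scenarios and the split of m = 40
into m1 and m2); the variable names are hypothetical and no population is simulated here.

```python
from itertools import product

# Parameter grid: every pair (alpha, beta) with both values in {0.10, 0.15}.
values = (0.10, 0.15)
scenarios = list(product(values, repeat=2))

# Final sample size and the share drawn at random in the first stage.
m = 40
first_stage_share = 0.35
m1 = round(first_stage_share * m)  # first-stage sample size
m2 = m - m1                        # second-stage sample size

n_populations = 500  # populations generated per scenario
```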
Table 4.7 shows summary statistics with some frequentist measurements of the posterior
distributions of the model parameters for each of the four evaluated scenarios. Note
that the less rare and clustered the population is, the narrower the 95% credibility
intervals are, and the greater the tendency for the model to underestimate its parameters.
Moreover, the RRMSEs and RAEs do not vary much. In addition, the rarer and more
clustered the population, the larger the coverage of the 95% credibility intervals for the
population total T, while the coverages for α and β are close to the nominal level.
          (α, β) = (0.10, 0.10)        (α, β) = (0.10, 0.15)
          T        α        β          T        α        β
RRMSE     0.306    0.352    0.562      0.321    0.340    0.467
RAE       0.238    0.288    0.465      0.262    0.279    0.393
RB        0.006    0.037   -0.066     -0.092   -0.069   -0.146
RW        0.794    1.036    1.690      0.883    1.076    1.375
Cov.      95.00    93.60    95.00      99.00    98.80    95.00

          (α, β) = (0.15, 0.10)        (α, β) = (0.15, 0.15)
          T        α        β          T        α        β
RRMSE     0.242    0.273    0.490      0.292    0.314    0.412
RAE       0.209    0.236    0.402      0.264    0.278    0.345
RB       -0.114   -0.109   -0.022     -0.216   -0.209   -0.126
RW        0.495    0.701    1.513      0.562    0.723    1.225
Cov.      89.60    93.80    97.00      86.40    91.60    96.00
Table 4.7: Summary measurements of the point and 95% credibility interval estimates of the
proposed model and population parameters over 500 simulations for different values of α, β and
T .
Figure 4.16 presents the boxplots of some of the previous summary measurements for
T. Note that, fixing α, as we switch β from 0.10 to 0.15, the RRMSEs, RAEs and RWs
increase. Additionally, T is slightly underestimated as the population becomes rarer and
more clustered.
Finally, considering the population total T, the scenario generated with α = 0.15 and
β = 0.10 yielded the lowest RRMSE and RAE, as well as the smallest relative width.
[Figure 4.16: four boxplot panels (RRMSE, RAE, RB and RW), each with one boxplot per
scenario (α, β) = (0.10, 0.10), (0.10, 0.15), (0.15, 0.10) and (0.15, 0.15).]
Figure 4.16: Boxplots with measurements of the point and 95% credibility interval estimates
of the proposed model and population parameters over 500 simulations for different values of
α, β and T .
Chapter 5
Conclusions
We have considered the problem of estimating the total number of individuals in a
rare and clustered population. A regular grid is superimposed on the region of interest,
which locates the clusters, gives them a spatial size and allows us to model, within this
grid structure, the number of individuals selected by adaptive cluster sampling as
described by Salehi M. & Seber (1997).
Our approach is to model the observed counts of the selected grid cells and to use
a model-based analysis to estimate the population total using the auxiliary information
of covariates. To include this extra knowledge in the model, we proposed a model that is
more flexible than the one introduced by Rapley & Welsh (2008), since it models at the cell
level instead of the network level and assumes that the intensity in each cell of a cluster
is related to the available covariates' values. Despite the higher computational cost, the
proposed methodology, considering a grid with 400 cells, still runs on a home computer
(Core i7, 16 GB) in an acceptable time (about 30 minutes on average).
As evidenced by the simulation studies in Sections 4.2 and 4.3, the incorporation of
covariates into the model improved the sampling process by yielding more sampled
networks. In practical situations, increasing the number of sampled networks directs and
enhances the use of human and material resources, reducing the expenses involved in the
sampling procedure. Besides, it is also possible to locate observed and unobserved networks
spatially, highlighting areas more conducive to the establishment of the studied
population. Moreover, despite the challenges inherent in the spatial prediction problem,
more specifically in the allocation of the networks and their counts across the region of
interest, the resulting maps adequately indicated where the population under study is
located. We also modified the MCMC scheme, which improved inference: our proposal
distribution allows for no out-of-sample networks, and the performance of our Bayes
estimator is substantially better than that of the one proposed by Rapley & Welsh (2008).
Simulation studies have assessed different scenarios varying the percentage of net-
works drawn in the first sampling stage and the parameters used to generate artificial
populations. Our methodology has yielded satisfactory results and, in most cases, bet-
ter than those obtained without additional information according to various comparison
criteria. In the analyzed application, covariate information was successfully incorporated
into the model by including quadratic terms in the linear predictor, evidencing the flex-
ibility of the model to incorporate available auxiliary information. Simulation studies
with a real population have shown that the results are quite satisfactory according to
several comparison criteria, validating the methodology proposed in this dissertation for
practical situations.
The main findings of this work encourage an extension of this model to other spatial
structures that reveal more information about the population. It is reasonable to study
the expected cost of the proposed sampling scheme, since sampling more nonempty networks
involves visiting more cells. Moreover, the proposed methodology requires at least
one nonempty network in the sample to fit the model. It is of interest to propose an
alternative method that, instead of fixing m1, samples networks until a stopping criterion
is reached. Furthermore, Goncalves & Moura (2016) proposed a mixture model at a
disaggregated level that allows the assumption of heterogeneity between networks. Since
both models are disaggregated, it would be interesting to compare these methodologies.
Finally, we intend to develop a package in the R programming language that allows the
replication of the main results of this dissertation.
Appendix A
Full conditional posterior distributions of the parameters in the proposed model
In this section we present the posterior full conditional distributions of the components
of the parameter vector $\Theta = (\boldsymbol{\eta}_{\bar s}, \mathbf{Y}_{\bar s}, P_{\bar s}, X_{\bar s}, \boldsymbol{\theta}, \beta, \alpha)'$.
We denote the posterior full conditional of a parameter $\phi$ in $\Theta$ by $[\phi \mid \cdots]$.
A.1 Full conditional posterior distribution of θ
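The posterior full conditional of $\boldsymbol{\theta} = (\theta_0, \theta_1, \ldots, \theta_k)'$ is proportional to
\[
[\boldsymbol{\theta} \mid \cdots] \propto \prod_{c \in C_s}
\frac{\exp\{-\exp\{\mathbf{v}_c'\boldsymbol{\theta}\} + \eta(c)\,\mathbf{v}_c'\boldsymbol{\theta}\}}
     {1 - \exp\{-\exp\{\mathbf{v}_c'\boldsymbol{\theta}\}\}}
\times \exp\left\{-\frac{1}{2\sigma^2_{\theta}}\,\boldsymbol{\theta}'\boldsymbol{\theta}\right\},
\]
which does not have an analytical closed form. We use the block Metropolis-Hastings
algorithm with a multivariate Normal proposal, whose mean vector is the current value of
the parameter and whose covariance matrix is fixed at $\mathbf{I}_{k+1}\sigma^2_{\theta^*}$, where the term
$\sigma^2_{\theta^*}$ controls the acceptance rates and $\mathbf{I}_{k+1}$ denotes the $(k+1)$-dimensional identity matrix.
Considering the structure presented in Subsection 4.3.2, where the squared log altitude
is introduced into the model, the covariance matrix is fixed at $(\mathbf{V}'\mathbf{V})^{-1}\sigma^2_{\theta^*}$, where the
term $\sigma^2_{\theta^*}$ controls the acceptance rates and $\mathbf{V}$ is the matrix with rows $\mathbf{v}_c'$, for every
nonempty cell $c$ of the sample, that is, $c \in C_s$.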
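The block update of θ described above can be sketched as follows. This is a minimal
illustration, assuming the identity-scaled proposal covariance and a zero-truncated Poisson
likelihood with a Normal prior as in the expression for [θ | ···]; the function names, the
toy design matrix and the toy counts are hypothetical and only serve to make the code
runnable.

```python
import numpy as np

def log_full_conditional_theta(theta, V, eta, sigma2_theta):
    """Log of the unnormalized full conditional of theta.

    V has one row v_c' per sampled nonempty cell, eta holds the counts
    eta(c) >= 1, and sigma2_theta is the prior variance.
    """
    lin = V @ theta                       # v_c' theta for every cell
    lam = np.exp(lin)                     # Poisson intensities
    log_trunc = np.log(-np.expm1(-lam))   # log(1 - exp(-lam))
    loglik = np.sum(-lam + eta * lin - log_trunc)
    logprior = -0.5 * theta @ theta / sigma2_theta
    return float(loglik + logprior)

def mh_step_theta(theta, V, eta, sigma2_theta, sigma2_star, rng):
    """One block random-walk Metropolis-Hastings update of theta."""
    # Proposal: Normal centred at the current value, covariance sigma2_star * I.
    proposal = theta + np.sqrt(sigma2_star) * rng.standard_normal(theta.size)
    log_ratio = (log_full_conditional_theta(proposal, V, eta, sigma2_theta)
                 - log_full_conditional_theta(theta, V, eta, sigma2_theta))
    # The Normal proposal is symmetric, so the ratio involves the target only.
    if np.log(rng.uniform()) < log_ratio:
        return proposal, True
    return theta, False

# Toy data (hypothetical): intercept plus one covariate, 30 nonempty cells.
rng = np.random.default_rng(0)
V = np.column_stack([np.ones(30), rng.normal(size=30)])
eta = rng.integers(1, 6, size=30)  # positive counts, for illustration only
theta = np.zeros(2)
draws = []
for _ in range(200):
    theta, _ = mh_step_theta(theta, V, eta, 100.0, 0.05, rng)
    draws.append(theta.copy())
draws = np.array(draws)
```

In such a sketch the value passed as `sigma2_star` plays the role of the tuning term that
controls the acceptance rate: larger values propose bigger moves and are accepted less often.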
A.2 Full conditional posterior distribution of α
The posterior full conditional of $\alpha$ is proportional to
\[
[\alpha \mid \cdots] \propto
\frac{\alpha^{x_s + x_{\bar s} + a_\alpha - 1}\,(1-\alpha)^{M - x_s - x_{\bar s} + b_\alpha - 1}}
     {1 - (1-\alpha)^{M}},
\]
which is close to, but not exactly, a Beta distribution because of the truncation term in
the denominator. We use the Metropolis-Hastings algorithm with a
$\text{Beta}(x_s + x_{\bar s} + a_\alpha,\; M - x_s - x_{\bar s} + b_\alpha)$ proposal.
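Because the Beta proposal matches the numerator of this full conditional, the
Metropolis-Hastings ratio simplifies considerably. The short derivation below is a sketch
under the assumption that the proposal is drawn independently of the current value; the
symbols $\alpha^{(t)}$ (current value), $\alpha^{*}$ (proposed value), $q$ (proposal density) and $A$
(acceptance probability) are notation introduced here for illustration.

```latex
\[
A\left(\alpha^{(t)}, \alpha^{*}\right)
  = \min\left\{1,\;
      \frac{[\alpha^{*} \mid \cdots]\; q\left(\alpha^{(t)}\right)}
           {[\alpha^{(t)} \mid \cdots]\; q\left(\alpha^{*}\right)}
    \right\}
  = \min\left\{1,\;
      \frac{1 - \left(1 - \alpha^{(t)}\right)^{M}}
           {1 - \left(1 - \alpha^{*}\right)^{M}}
    \right\},
\]
```

since the full conditional equals $q(\alpha)$ divided by $1 - (1-\alpha)^{M}$ up to a constant, so the
Beta kernels cancel. A proposed value smaller than the current one is therefore always
accepted, while a larger one is accepted with probability slightly below one.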
A.3 Full conditional posterior distribution of β
The posterior full conditional of $\beta$ is proportional to
\[
[\beta \mid \cdots] \propto
\frac{\beta^{p_s + p_{\bar s} + a_\beta - 1}\,(1-\beta)^{x_s + x_{\bar s} - p_s - p_{\bar s} + b_\beta - 1}}
     {1 - (1-\beta)^{x}},
\]
which is close to, but not exactly, a Beta distribution because of the truncation term in
the denominator. We use the Metropolis-Hastings algorithm with a
$\text{Beta}(p_s + p_{\bar s} + a_\beta,\; x_s + x_{\bar s} - p_s - p_{\bar s} + b_\beta)$ proposal.
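A Metropolis-Hastings step with a Beta proposal of this kind can be sketched as below. The
code assumes an independence proposal, so that the Beta kernels of target and proposal
cancel and only the truncation terms enter the acceptance ratio. The function names and the
toy counts are hypothetical; `n_trials` stands for the exponent in the truncation term.

```python
import numpy as np

def acceptance_ratio(current, proposal, n_trials):
    """Ratio of truncation terms left after the Beta kernels cancel."""
    return (1.0 - (1.0 - current) ** n_trials) / (1.0 - (1.0 - proposal) ** n_trials)

def mh_step_truncated_beta(current, shape1, shape2, n_trials, rng):
    """One independence Metropolis-Hastings update.

    Target (unnormalized): Beta(shape1, shape2) kernel divided by
    1 - (1 - value)**n_trials. Proposal: Beta(shape1, shape2).
    """
    proposal = rng.beta(shape1, shape2)
    if proposal <= 0.0:  # guard against a zero denominator in the ratio
        return current, False
    if rng.uniform() < acceptance_ratio(current, proposal, n_trials):
        return proposal, True
    return current, False

# Toy counts (hypothetical): p_s + p_sbar = 6 networks, x = x_s + x_sbar = 30 cells,
# and a uniform prior, i.e. both prior shape parameters equal to 1.
rng = np.random.default_rng(42)
n_networks, n_cells = 6, 30
shape1 = n_networks + 1.0
shape2 = n_cells - n_networks + 1.0
beta_value = 0.2
chain = []
for _ in range(500):
    beta_value, _ = mh_step_truncated_beta(beta_value, shape1, shape2, n_cells, rng)
    chain.append(beta_value)
chain = np.array(chain)
```

The same function could serve for α by passing the corresponding shape parameters and M as
`n_trials`.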
A.4 Full conditional posterior distribution of quantities $(X_{\bar s}, P_{\bar s}, \mathbf{Y}_{\bar s}, \boldsymbol{\eta}_{\bar s})$
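The joint posterior full conditional of $(X_{\bar s}, P_{\bar s}, \mathbf{Y}_{\bar s}, \boldsymbol{\eta}_{\bar s})$ is proportional to
\[
\begin{aligned}
[X_{\bar s}, P_{\bar s}, \mathbf{Y}_{\bar s}, N_{\bar s}, \boldsymbol{\eta}_{\bar s} \mid \cdots]
\propto{} & \prod_{j=1}^{m}
  \frac{\sum_{c \in G_{i_j,j}} \pi(c)}
       {\sum_{c \in R} \pi(c) - \sum_{k=0}^{j-1} \sum_{c \in e_k} \pi(c)} \\
& \times \prod_{c \in C_{\bar s}}
  \frac{\exp\{-\exp\{\mathbf{v}_c'\boldsymbol{\theta}\} + \eta(c)\,\mathbf{v}_c'\boldsymbol{\theta}\}}
       {\eta(c)!\,\left(1 - \exp\{-\exp\{\mathbf{v}_c'\boldsymbol{\theta}\}\}\right)} \\
& \times \left(\frac{1}{p_s + p_{\bar s}}\right)^{\sum_{i=1}^{p_s + p_{\bar s}} (y_i - 1)}
  \frac{1}{(p_s + p_{\bar s})!}\,
  \frac{\beta^{p_{\bar s}}(1-\beta)^{x_{\bar s} - p_{\bar s}}}{1 - (1-\beta)^{x_s + x_{\bar s}}}\,
  \frac{\alpha^{x_{\bar s}}(1-\alpha)^{x_{\bar s}}}{(M - x_s + x_{\bar s})!},
\end{aligned}
\]
which does not have an analytical closed form. We use the Metropolis-Hastings algorithm
for sampling $(X_{\bar s}, P_{\bar s}, \mathbf{Y}_{\bar s}, \boldsymbol{\eta}_{\bar s})$ jointly. From the proposal distribution, it is
straightforward to sample $(X_{\bar s}, P_{\bar s}, \mathbf{Y}_{\bar s}, \boldsymbol{\eta}_{\bar s})$ and jointly accept or reject these values
using the joint full conditional above as the target distribution.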
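To generate $X_{\bar s}$, it is useful to draw $X$ from a discrete uniform distribution with support
in the set $\{X^{*} \pm k : k = 1, \ldots, 5\}$, where $X^{*}$ is the current value of $X$. Then, set
$X_{\bar s} = X - X_s$, ensuring that $X_s < X < M$, since the number of nonempty cells in $R$
is at most $M$ and it is known that there are $X_s$ nonempty cells in $R$. Note that $P_{\bar s}$ is
the number of nonempty networks formed out of the $X_{\bar s}$ nonempty grid cells. Then, $P_{\bar s}$
is generated by sampling from the truncated Binomial$(X_{\bar s}, \beta)$ distribution. Notice that
$\mathbf{Y}_{\bar s}$ is the number of nonempty grid cells in each of the $P_{\bar s}$ networks, so we generate $\mathbf{Y}_{\bar s}$
from the $\mathbf{1}_{P_{\bar s}} + \text{Multinomial}\left(X_{\bar s} - P_{\bar s}, \tfrac{1}{P_{\bar s}}\mathbf{1}_{P_{\bar s}}\right)$ distribution. Then, the set of cells
$C_{\bar s}$ that compose the out-of-sample nonempty networks is established by the allocation
process of $\mathbf{Y}_{\bar s}$, described in Subsection 4.1.1. From the covariates associated with the
cells in $C_{\bar s}$, we generate the elements of $\boldsymbol{\eta}_{\bar s}$ from the truncated
Poisson$(\exp\{\mathbf{v}_c'\boldsymbol{\theta}\})$ distribution, for $c \in C_{\bar s}$. Therefore, the proposal distribution is
\[
\begin{aligned}
[X_{\bar s}, P_{\bar s}, \mathbf{Y}_{\bar s}, \boldsymbol{\eta}_{\bar s}]_{\text{prop}}
={} & \frac{1}{10} \times \frac{x_{\bar s}!}{p_{\bar s}!}\,
  \frac{\beta^{p_{\bar s}}(1-\beta)^{x_{\bar s} - p_{\bar s}}}{1 - (1-\beta)^{x_{\bar s}}}
  \times \prod_{i \notin s} \frac{1}{(y_i - 1)!} \left(\frac{1}{p_{\bar s}}\right)^{y_i - 1} \\
& \times \prod_{c \in C_{\bar s}}
  \frac{\exp\{-\exp\{\mathbf{v}_c'\boldsymbol{\theta}\} + \eta(c)\,\mathbf{v}_c'\boldsymbol{\theta}\}}
       {\eta(c)!\,\left(1 - \exp\{-\exp\{\mathbf{v}_c'\boldsymbol{\theta}\}\}\right)}.
\end{aligned}
\tag{A.1}
\]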
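The sequence of draws behind the proposal (A.1) can be sketched as follows. This is an
illustration under simplifying assumptions, not the implementation used in this work: the
allocation process of Subsection 4.1.1 is replaced by a placeholder that simply takes the
first rows of a candidate covariate matrix, a proposed X outside the admissible range is
returned as `None` (to be treated as a rejection), and all function and variable names are
hypothetical.

```python
import numpy as np

def zero_truncated_binomial(n, p, rng):
    """Binomial(n, p) conditioned on being at least 1 (rejection sampling)."""
    while True:
        draw = rng.binomial(n, p)
        if draw >= 1:
            return int(draw)

def zero_truncated_poisson(lam, rng):
    """Poisson(lam) conditioned on being at least 1, elementwise."""
    out = np.zeros(lam.shape, dtype=int)
    pending = np.ones(lam.shape, dtype=bool)
    while pending.any():
        out[pending] = rng.poisson(lam[pending])
        pending = out == 0
    return out

def draw_proposal(x_current, x_s, M, beta, theta, V_candidates, rng):
    """Draw (X_sbar, P_sbar, Y_sbar, eta_sbar); None if X is out of range."""
    # Uniform draw on {X* - 5, ..., X* - 1, X* + 1, ..., X* + 5}.
    step = int(rng.integers(1, 6)) * int(rng.choice([-1, 1]))
    x_new = x_current + step
    if not (x_s < x_new < M):
        return None  # assumption: out-of-range proposals count as rejections
    x_bar = x_new - x_s
    p_bar = zero_truncated_binomial(x_bar, beta, rng)
    # Each network gets one cell plus a multinomial share of the remaining cells.
    y_bar = 1 + rng.multinomial(x_bar - p_bar, np.full(p_bar, 1.0 / p_bar))
    # Placeholder for the allocation process: take the first x_bar candidate cells.
    V_bar = V_candidates[:x_bar]
    eta_bar = zero_truncated_poisson(np.exp(V_bar @ theta), rng)
    return x_bar, p_bar, y_bar, eta_bar

# Toy setting (hypothetical): M = 50 cells, 10 sampled nonempty cells, current X = 20.
rng = np.random.default_rng(1)
V_candidates = np.column_stack([np.ones(40), rng.normal(size=40)])
proposal = None
while proposal is None:
    proposal = draw_proposal(20, 10, 50, 0.15, np.array([0.5, 0.2]), V_candidates, rng)
x_bar, p_bar, y_bar, eta_bar = proposal
```

The factor 1/10 in (A.1) corresponds to the ten equally likely values of the uniform step.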
The improved sampling procedure, detailed in Subsection 4.1.2, tends to draw a
greater number of networks than sampling without the weights ω. Therefore, the final
sample $s$ (made up of the first and second samples) may include all networks from $R$.
Thus, it is plausible that the proposal distribution of the disaggregated model fitted to
the final sample leads to no out-of-sample nonempty cells, that is, $X_{\bar s} = 0$. Note
that, in this case, the number of nonempty cells in $R$, $X$, can take the value $X_s$. So, when
generating $X$, set $X_{\bar s} = X - X_s$, but ensuring that $X_s \le X < M$. If $X = X_s$, then
$X_{\bar s} = 0$ and the quantities $P_{\bar s}$, $\mathbf{Y}_{\bar s}$ and $\boldsymbol{\eta}_{\bar s}$ are necessarily equal to zero. Therefore,
the proposal distribution when $X_{\bar s} = 0$ is
\[
[X_{\bar s}, P_{\bar s}, \mathbf{Y}_{\bar s}, \boldsymbol{\eta}_{\bar s}]_{\text{prop}} = \frac{1}{10}.
\]
A.5 Full conditional posterior distribution of ρ
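The posterior full conditional of $\boldsymbol{\rho}$ (from Subsection 4.3.2) is proportional to
\[
[\boldsymbol{\rho} \mid \cdots] \propto \prod_{c \in \{C_s, C_{\bar s}\}}
\left(\frac{1}{1 + \exp\{-\mathbf{v}_c'\boldsymbol{\rho}\}}\right)^{\phi(c)}
\left(\frac{\exp\{-\mathbf{v}_c'\boldsymbol{\rho}\}}{1 + \exp\{-\mathbf{v}_c'\boldsymbol{\rho}\}}\right)^{1 - \phi(c)}
\times \exp\left\{-\frac{1}{2\sigma^2_{\rho}}\,\boldsymbol{\rho}'\boldsymbol{\rho}\right\},
\]
which does not have an analytical closed form. We use the block Metropolis-Hastings
algorithm with a multivariate Normal proposal, whose mean vector is the current value
of the parameter and whose covariance matrix is fixed at $(\mathbf{V}'\mathbf{V})^{-1}\sigma^2_{\rho^*}$, where the term
$\sigma^2_{\rho^*}$ controls the acceptance rates and $\mathbf{V}$ is the matrix with rows $\mathbf{v}_c'$, for all cells $c$ of
the sample, that is, $c \in \{C_s, C_{\bar s}\}$.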
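The update of ρ differs from a plain random walk in that the proposal covariance is scaled
by the inverse of V'V. A minimal sketch is given below, assuming that φ(c) is a 0/1
indicator for each cell, as the logistic form of the full conditional suggests; the function
names and the toy data are hypothetical.

```python
import numpy as np

def log_full_conditional_rho(rho, V, phi, sigma2_rho):
    """Log of the unnormalized full conditional of rho (logistic likelihood)."""
    lin = V @ rho
    log_p1 = -np.logaddexp(0.0, -lin)  # log of 1 / (1 + exp(-v_c' rho))
    log_p0 = -lin + log_p1             # log of exp(-v_c' rho) / (1 + exp(-v_c' rho))
    loglik = np.sum(phi * log_p1 + (1 - phi) * log_p0)
    return float(loglik - 0.5 * rho @ rho / sigma2_rho)

def proposal_covariance(V, sigma2_star):
    """Covariance (V'V)^{-1} * sigma2_star of the Normal proposal."""
    return sigma2_star * np.linalg.inv(V.T @ V)

def mh_step_rho(rho, V, phi, sigma2_rho, cov, rng):
    """One block Metropolis-Hastings update of rho with a symmetric proposal."""
    proposal = rng.multivariate_normal(rho, cov)
    log_ratio = (log_full_conditional_rho(proposal, V, phi, sigma2_rho)
                 - log_full_conditional_rho(rho, V, phi, sigma2_rho))
    if np.log(rng.uniform()) < log_ratio:
        return proposal, True
    return rho, False

# Toy data (hypothetical): intercept plus one covariate, 50 cells.
rng = np.random.default_rng(7)
V = np.column_stack([np.ones(50), rng.normal(size=50)])
phi = rng.integers(0, 2, size=50)
cov = proposal_covariance(V, sigma2_star=2.0)
rho = np.zeros(2)
for _ in range(100):
    rho, _ = mh_step_rho(rho, V, phi, 100.0, cov, rng)
```

Scaling by the inverse of V'V lets the proposal follow the correlation between the
coefficients induced by the design matrix, which can help when covariates such as a term
and its square are strongly correlated.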
Appendix B
Assessment of MCMC with real data
In Section 4.3, we compared the results of our approach to those obtained using the
model proposed by Rapley & Welsh (2008). This appendix presents the convergence
results of the design-based experiment with a real population, displayed in Subsection
4.3.2. We evaluated the convergence of two parallel chains according to each number of
networks sampled from the real population. The results are presented in Table B.1 and
Figure B.1.
             Number of networks sampled
Parameter    1        2        3        4        5
α           -2.04     0.96    -0.53    -0.90     0.94
β           -2.18     1.50     1.14     0.20     0.54
T           -1.78     0.52    -0.12    -1.08     1.22
Table B.1: Geweke convergence diagnostic for some of the parameters estimated for the real
population for both models.
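To make the reading of such a table concrete, the following sketch computes a simplified
version of the Geweke score, which compares the mean of an early segment of a chain with
the mean of a late segment. It is a stand-in for illustration only: segment variances are
plain sample variances, which ignores autocorrelation, whereas the original diagnostic uses
spectral density estimates at frequency zero. The function name and the toy chains are
hypothetical.

```python
import numpy as np

def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke z-score comparing early and late parts of a chain.

    Simplification (assumption): the variance of each segment mean is the
    sample variance divided by the segment length, which is only adequate
    for roughly uncorrelated draws.
    """
    x = np.asarray(chain, dtype=float)
    a = x[: int(first * x.size)]              # early segment
    b = x[x.size - int(last * x.size):]       # late segment
    var_a = a.var(ddof=1) / a.size
    var_b = b.var(ddof=1) / b.size
    return float((a.mean() - b.mean()) / np.sqrt(var_a + var_b))

# Toy chains (hypothetical): one with a clear upward drift, one that alternates.
rng = np.random.default_rng(3)
drifting = rng.normal(size=2000) + np.linspace(0.0, 3.0, 2000)
z_drifting = geweke_z(drifting)
z_alternating = geweke_z(np.tile([0.0, 1.0], 500))
```

Under stationarity the score is approximately standard Normal, so values within about
±1.96 are usually read as showing no evidence against convergence at the 5% level. In R,
the coda package cited in the bibliography provides the full diagnostic as `geweke.diag`.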
[Figure B.1: trace plots of α, β and T over iterations 0 to 2000, one row of panels per
number of networks sampled (NS = 1 to 5).]
Figure B.1: Trace plot with the posterior densities of α, β and T obtained from the fits of the
disaggregated and the aggregated models to real data, for each number of networks sampled
(NS). The black line represents the true value of T .
Bibliography
Baddeley, A. & Turner, R. (2000), ‘Practical maximum pseudolikelihood for spatial
point patterns: (with discussion)’, Australian & New Zealand Journal of Statistics
42(3), 283–322.
Bennitt, E., Bonyongo, M. C. & Harris, S. (2014), ‘Habitat selection by African buffalo
(Syncerus caffer) in response to landscape-level fluctuations in water availability on two
temporal scales’, PLoS ONE 9(7), e101346.
Brix, A. & Diggle, P. J. (2001), ‘Spatiotemporal prediction for log-Gaussian Cox processes’,
Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63(4), 823–841.
Cassel, C. M., Sarndal, C. E. & Wretman, J. H. (1977), Foundations of inference in
survey sampling, New York: Wiley.
Diggle, P. J. (1975), ‘Robust density estimation using distance methods’, Biometrika
62(1), 39–48.
Gattone, S. A., Mohamed, E., Dryver, A. L. & Munnich, R. T. (2016), ‘Adaptive cluster
sampling for negatively correlated data’, Environmetrics 27(2), E103–E113.
Gelman, A., Carlin, J. B., Stern, H. S. & Rubin, D. B. (1995), Bayesian data analysis,
Chapman & Hall.
Goncalves, K. C. M. & Moura, F. A. S. (2016), ‘A mixture model for rare and clustered
populations under adaptive cluster sampling’, Bayesian Analysis 11, 519–544.
Philippi, T. (2005), ‘Adaptive cluster sampling for estimation of abundances within local
populations of low-abundance plants’, Ecology 86(5), 1091–1100.
Plummer, M., Best, N., Cowles, K. & Vines, K. (2006), ‘CODA: Convergence diagnosis
and output analysis for MCMC’, R News 6(1), 7–11.
URL: https://journal.r-project.org/archive/
Prins, H. (1996), Ecology and behaviour of the African buffalo: social inequality and
decision making, Vol. 1, Springer Science & Business Media.
R Core Team (2019), R: A Language and Environment for Statistical Computing, R
Foundation for Statistical Computing, Vienna, Austria.
URL: https://www.R-project.org/
Rapley, V. & Welsh, A. (2008), ‘Model-based inferences from adaptive cluster sampling’,
Bayesian Analysis 3, 717–736.
Salehi M., M. & Seber, G. A. F. (1997), ‘Adaptive cluster sampling with networks selected
without replacement’, Biometrika 84(1), 209–219.
Smith, D. R., Conroy, M. J. & Brakhage, D. H. (1995), ‘Efficiency of adaptive cluster
sampling for estimating density of wintering waterfowl’, Biometrics pp. 777–788.
Su, Z. & Quinn, T. J. (2003), ‘Estimator bias and efficiency for adaptive cluster sam-
pling with order statistics and a stopping rule’, Environmental and Ecological Statistics
10(1), 17–41.
Thompson, S. K. (1990), ‘Adaptive cluster sampling’, Journal of the American Statistical
Association 85, 1050–1059.
Thompson, S. K. & Seber, G. A. F. (1996), Adaptive sampling, Wiley Series in Probability
and Statistics, Wiley.