
Model-based Inference for Rare and

Clustered Populations from Adaptive

Cluster Sampling using Auxiliary Variables

Izabel Nolau de Souza

Advisors: Kelly Cristina Mota Gonçalves and João Batista de Morais Pereira

Universidade Federal do Rio de Janeiro

Instituto de Matemática

Departamento de Métodos Estatísticos

2020


Master's dissertation submitted to the Programa de Pós-graduação em Estatística of the Instituto de Matemática, Universidade Federal do Rio de Janeiro (UFRJ), in partial fulfillment of the requirements for the degree of Master in Statistics. Approved by:

Kelly Cristina Mota Gonçalves

Dr.Sc. - IM/UFRJ - Advisor.

João Batista de Morais Pereira

Dr.Sc. - IM/UFRJ - Co-advisor.

Carlos Antonio Abanto-Valle

Dr.Sc. - IM/UFRJ.

Fernando Antonio da Silva Moura

Dr.Sc. - IM/UFRJ.

Pedro Luis do Nascimento Silva

Dr.Sc. - ENCE.

Rio de Janeiro, RJ - Brazil

April 30, 2020


Acknowledgments

The completion of this project is due not only to me, but also to everyone who was directly or indirectly involved. The sharing of countless doubts, uncertainties, achievements, and lessons learned was immense and constant.

I first thank God for giving me health and strength, not only during the master's program but at all times, allowing all of this to happen.

I thank my parents, Izilda and Fernando José, for all the love, support, and encouragement they have always given me, celebrating my achievements as if they were their own. Thank you for the education you provided me and for believing in me so much! To all the family members and friends who supported me and followed my entire journey, thank you very much!

I thank my advisors Kelly and João for the encouragement, attention, and support they gave me, and for the corrections they made to this work in order to improve it. Thank you not only for your friendship, but also for the concern you have always shown for my future and for always believing in my potential.

I thank all the teachers I have had throughout my life, especially the professors who accompanied me during my undergraduate and master's studies, who were so patient with me and contributed so much to my education.

I thank professors Carlos Antonio Abanto-Valle (UFRJ), Fernando Antonio Moura (UFRJ), and Pedro Luis do Nascimento (ENCE) for agreeing to serve on my examination committee.

Finally, I thank the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for the financial support of my studies.


Abstract

Rare populations, such as endangered animals and plants, drug users and individuals

with rare diseases, tend to cluster in regions. Adaptive cluster sampling is generally

applied to obtain information from clustered and sparse populations since it increases

survey effort in areas where the individuals of interest are observed. This work proposes a unit-level model which assumes that counts are related to auxiliary variables. The auxiliary information improves the sampling process by assigning different weights to the cells, and it allows the cells to be referenced spatially. The proposed model fits rare and clustered populations, arranged over a regular grid, in a Bayesian framework. The approach is compared to alternative methods using simulated data and a real experiment in which adaptive samples were drawn from an African buffalo population in a 24,108 km² area of East Africa. Simulation studies show

that the model is efficient under several settings, validating the methodology proposed

in this dissertation for practical situations.

Keywords: Informative sampling, MCMC, spatial sampling, zero-inflated count data


Contents

1 Introduction

2 Inference for finite populations

2.1 Fixed population approach

2.2 Superpopulation models

2.2.1 Informative sampling design

3 Adaptive cluster sampling

3.1 Thompson's (1990) approach

3.2 Extensions of adaptive cluster sampling

3.3 Model-based inference

4 Model for rare and clustered populations under adaptive sampling using covariates

4.1 Proposed model for cell counts using covariates

4.1.1 Model inference

4.1.1.1 Allocating procedure

4.1.2 Sampling procedure

4.1.2.1 Border-sampling procedure

4.2 A first simulation study

4.3 Comparison with the aggregated model

4.3.1 A design-based experiment evaluating the sample fraction

4.3.2 A real application

4.4 Model-based experiment under different settings

5 Conclusions

A Full conditional posterior distributions of the parameters in the proposed model

A.1 Full conditional posterior distribution of θ

A.2 Full conditional posterior distribution of α

A.3 Full conditional posterior distribution of β

A.4 Full conditional posterior distribution of quantities (Xs, Ps, Ys, ηs)

A.5 Full conditional posterior distribution of ρ

B Assessment of MCMC with real data


List of Figures

3.1 Illustration of the adaptive cluster sampling procedure for a rare clustered population distributed in a region with M = 400 grid cells. Figure (a) presents the initial sample of m = 10 cells in dark grey. From this sample, in Figure (b), neighbors are added to the sample whenever there is at least one observation (black dot) in the selected cell, finally setting the sample presented in Figure (c).

3.2 Illustration of important concepts in adaptive cluster sampling: bold bordered squares correspond to the observed cluster, gray squares are the network units, and the hatched part are the border units. The unit initially selected is in darker gray.

4.1 Allocation method illustration of two out-of-sample networks of sizes 3 and 2, based on weights λ (gray background). The lighter the cells' color, the higher the λ value (intensity) of that cell. The white cells with bold borders, whose weights are equal to zero, are the sampled networks' cells and the hatched ones correspond to the nonempty networks' border cells. The red borders surround cells that can be drawn in each stage of the procedure. The cells that compose the first and second allocated networks are blue-painted and green-painted, respectively. The example proceeds as follows: draw one of the red-surrounded cells of Panel (a). The sampled cell is blue-indicated in Panel (b) and only its neighbors can be drawn to keep building this network. In Panels (c) and (d) the allocation of the 3-sized network is finished, and we can draw any cell with the red border to start the allocation of the 2-sized network. Panels (e) and (f) present the cells chosen to compose this network in green.


4.2 Proposed sampling procedure illustration of a population (points) distributed in a region with M = 400 cells and weights (gray background) used in this scheme. The lighter the cells' color, the higher the weight of that cell. The white cells, whose weights are equal to zero, with bold borders are the sampled networks' cells and hatched cells correspond to the nonempty networks' border cells. Panels (a) and (b) present the same grayscale, since all cells have constant weight in the first stage; and Panels (c) and (d) show different shades of gray due to the second stage's different weights.

4.3 Values of the generated covariate (gray background) and counts for each nonempty cell in a grid with M = 400 cells.

4.4 Plot of the generated covariate versus the population counts and covariate's histogram, where the black points in the histogram represent the nonempty cells.

4.5 Boxplots with measurements of the point and 95% credibility interval estimates for T over 100 simulations obtained for the fits of MAN, MDN and MDC models.

4.6 Relative frequency of the number of networks sampled over 100 simulations obtained for the fits of MAN, MDN and MDC models.


4.7 [...] for T over 500 simulations obtained for the fits of the disaggregated and aggregated models, considering different sample sizes and proportions.

4.8 Relative frequency of the number of networks sampled over 500 simulations obtained for the fits of the disaggregated and aggregated models, considering different sample sizes and proportions.

4.9 Mean (white line) and 2.5% and 97.5% quantiles (gray bars) of the relative bias estimates of T over 500 simulations obtained for the fits of the disaggregated and aggregated models, considering different sample sizes, proportions, numbers of networks sampled (number of simulations given below the gray bars) and coverage (above the gray bars).

4.10 Altitude on a logarithmic scale (gray background) and counts of African Buffaloes over parts of Kenya and Tanzania in 2010 in a grid with M = 391 cells.

4.11 Plot of altitude on a logarithmic scale versus counts of African Buffaloes, and covariate's histogram, where the black points in the histogram represent the nonempty cells.



4.12 [...] for T over 500 simulations obtained for the fits of the disaggregated and aggregated models to real data.

4.13 [...] for T over 100 simulations for each number of networks sampled, obtained for the fits of the disaggregated and aggregated models to real data.

4.14 [...] for T over 100 simulations for each number of networks sampled, obtained for the fits of the disaggregated and aggregated models.

4.15 Maps of the posterior mean of African Buffalo counts η(c) (gray background) for every out-of-sample cell c of R, for each number of sampled networks, and its population (points) distributed in a region with 391 cells. The lighter the cells' color, the higher the posterior mean of that cell. The blue cells are the sampled networks' cells and the hatched cells correspond to the nonempty networks' border cells.

4.16 [...] of the proposed model and population parameters over 500 simulations for different values of α, β and T.

B.1 Trace plot with the posterior densities of α, β and T obtained from the fits of the disaggregated and the aggregated models to real data, for each number of networks sampled (NS). The black line represents the true value of T.


List of Tables

4.1 Summary measurements of the point and 95% credibility interval estimates of the population total T, obtained by fitting MAN, MDN and MDC models under 100 samples according to each model.

4.2 Values of the sample sizes m, m1 and m2, according to each fixed percentage.

4.3 Summary measurements of the point and 95% credibility interval estimates for T over 500 simulations obtained for the fits of the disaggregated and aggregated models, considering different sample sizes and proportions.

4.4 Percentage of networks sampled over 500 simulations under each model.

4.5 Summary measurements of the point and 95% credibility interval estimates for T over 100 simulations for different numbers of networks sampled, obtained for the fits of the disaggregated and aggregated models.

4.6 Summary measurements of the point and interval estimates of the population total, obtained by fitting the disaggregated and aggregated models and Raj's estimator.

4.7 Summary measurements of the point and 95% credibility interval estimates of the proposed model and population parameters over 500 simulations for different values of α, β and T.

B.1 Geweke convergence diagnostic for some of the parameters estimated for the real population for both models.


Chapter 1

Introduction

In several statistical surveys, there are obstacles in data collection since the study

object is hard to observe either because it is a rare population, exhibits a pattern of

sparsely distributed groups in a region, or is mobile over time. Examples of populations

with these characteristics include endangered animals and plants, ethnic minorities, drug

users, individuals with rare diseases, and recent immigrants. Assume that the population

of interest is spatially distributed in a region of interest, where a regular grid with M

equal-sized cells is superimposed. Denote the partitioned region by $R = \{c_1, \ldots, c_M\}$. Let η(c) denote the number of individuals of the population within the grid cell c, for all c ∈ R, that is, this cell's count. The objective is to estimate a rare and clustered population total $T = \sum_{c \in R} \eta(c)$.

Under traditional sampling methods of grid cells, a subset of m < M cells is drawn

and their respective counts η(c) are observed. Due to the population characteristics,

small sample sizes result in large numbers of empty grid cells, for which η(c) = 0, leading

us to inaccurate estimates of the population quantity of interest. In this context, adaptive

cluster sampling, introduced by Thompson (1990), is a way to surmount this difficulty by

increasing survey effort around non-empty grid cells of the sample. From an initial sample

of m grid cells, when we find a non-empty grid cell, for which η(c) ≠ 0, we also sample its neighbors (cells sharing a common edge with the current one) and continue surveying until we obtain a set of contiguous non-empty grid cells surrounded by empty grid cells. In this way, empty grid cells bring no further survey effort. Therefore, adaptive cluster sampling requires some prior knowledge about the structure of the underlying population, which may be obtained from a preliminary survey, to be effective.

According to Thompson (1990), the set of contiguous non-empty grid cells is called a

network; this set plus its neighboring empty grid cells are together named a cluster; and


empty cells are defined as one-sized networks. Therefore, R is exhaustively partitioned

into disjoint networks, and the final sample contains empty and non-empty networks.

Thompson (1990) treated empty edge cells as unobserved. From an initial random sample of grid cells drawn without replacement, inclusion probabilities are assigned to the sampled networks and used to construct design-unbiased estimators of T and their variances. Note that the networks are the basis of the analysis and, although the initial selection of cells is without replacement, the same network can be selected more than once, a problem that Thompson (1990) solved by allowing multiple inclusions of networks. Edge cells can be incorporated into the estimator by taking conditional expectations given the minimal sufficient statistic, which yields the Rao-Blackwell improved version of the estimator. These estimators were described and computed for small sample sizes in Thompson (1990). Further, Salehi & Seber (1997) proposed a scheme whereby the networks are selected one by one without replacement, which avoids selecting the same network more than once.
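To make the role of the network inclusion probabilities concrete, the following Python sketch computes a Horvitz-Thompson-type estimate of T from the distinct networks intersected by the initial sample. The closed form used for the inclusion probability of a network with x cells, 1 − C(M − x, m)/C(M, m), is the form usually associated with Thompson (1990); it is not stated in this text, so it should be read as an assumption. The function names and the toy numbers are hypothetical.

```python
from math import comb


def network_inclusion_prob(M, m, size):
    """Probability that an initial simple random sample of m cells (out of M)
    hits at least one of the `size` cells of a network (assumed formula)."""
    return 1 - comb(M - size, m) / comb(M, m)


def ht_total(M, m, networks):
    """Horvitz-Thompson-type estimate of the population total T.

    `networks` lists (size, total count) for each DISTINCT network hit by the
    initial sample; empty cells are one-sized networks with total 0 and
    therefore contribute nothing to the sum.
    """
    return sum(
        total / network_inclusion_prob(M, m, size)
        for size, total in networks
        if total > 0
    )


# Hypothetical outcome: a grid with M = 400 cells, an initial sample of m = 10
# cells that hit two non-empty networks and eight empty cells.
hit_networks = [(3, 12), (2, 5)] + [(1, 0)] * 8
estimate = ht_total(400, 10, hit_networks)
```

Counting each distinct network only once is what makes repeated selections of the same network harmless in this version of the estimator; edge cells that were not in the initial sample are not used.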

Several studies have been conducted using adaptive sampling designs on real popu-

lations. For example, Smith et al. (1995) studied the methodology for rare species of

waterfowl, Su & Quinn (2003) discussed adaptive cluster sampling with order statistics

and a stopping rule for a fish population, Philippi (2005) showed that it is a viable al-

ternative for the estimation of occurrences in local populations of low-abundance plants

and Gattone et al. (2016) applied it to negatively correlated data.

Thompson & Seber (1996) examined some general ideas about model-based inference approaches for adaptive sampling. Bayesian methods showed promising results among model-based approaches. In addition, Bayesian inference methods for adaptive cluster sampling designs have been developed in Rapley & Welsh (2008) and Gonçalves & Moura (2016), which incorporate prior knowledge that the population is rare and grouped for both inference and sample design. Rapley & Welsh (2008) provided a model at the network level, while Gonçalves & Moura (2016) modeled at the cell level, considering heterogeneity among units belonging to different clusters. Neither work took into account the spatial locations of the networks, which causes no loss of information about the population total since, under the model, it does not depend on where the networks are located.

A possible approach to spatially model clustered data is by using point processes

(Diggle, 1975; Baddeley & Turner, 2000; Brix & Diggle, 2001), where the clusters are

considered as points and have no internal spatial structure, although there is a spatial

relationship between them. Rapley & Welsh (2008) place the clusters and give them a

spatial size by superimposing a grid on a region containing a clustered population and


modeling it within this grid structure. In this case, it is assumed that the intensity of

the counts in each cluster is proportional to its size. However, this assumption is not

always valid. In some situations, cells that belong to the same cluster can have different

intensities, e.g. the border cells can present a smaller incidence rate than the central

ones. Moreover, a cluster can have a higher incidence of the phenomenon, not because of its size, but due to other factors that influence its disposition, such as a spatially referenced covariate.

This work aims to present a disaggregated model, at cell level, which assumes that the

intensity in each cell of a cluster is related to an available covariate value. The proposed

model fits rare and grouped populations, arranged over a regular grid, in a Bayesian framework. The key idea of this dissertation is the improvement of the population estimates through the use of grid cells as analysis units and the incorporation of additional information into the model. Based on this extra information, we also propose an improved sampling process, where different probabilities are assigned to draw the cells, and we can spatially reference the estimates of the cell counts. Introducing additional information

seems to be an intuitive idea, provided that the prior knowledge indicates that there is

a relationship between the phenomenon occurrence and some covariate.

This dissertation is organized as follows. Chapter 2 introduces the notation of finite population sampling used throughout the text, as well as the design-based and superpopulation approaches. Chapter 3 presents the methodology of adaptive sampling plans, some extensions, and the model-based approach proposed by Rapley & Welsh (2008), which motivated the ideas of this work. Chapter 4 introduces the proposed model, proposes a new sampling procedure, and discusses aspects of inference. It also presents simulation studies that assess the effectiveness of the proposed model and of the one proposed by Rapley & Welsh (2008), and that consider the estimation of model parameters for populations with different degrees of rarity and clustering. A real population is also presented and used to evaluate the performance of the proposed model. Finally, Chapter 5 concludes with a brief discussion of the advantages of our methodology and suggestions for further research.


Chapter 2

Inference for finite populations

In this chapter, we present important notation and definitions from the theory of finite population sampling that will be used throughout this work. In this context, there

are two possible approaches: (i) fixed population approach, where each population unit is

associated with a fixed but unknown real number, that is the value of the variable under

study; and (ii) superpopulation approach, where each population unit is associated with

a random variable for which a stochastic structure is specified, and the actual value

associated with the population unit is treated as the outcome of this random variable. In

Section 2.1, the first approach is presented, and in Section 2.2, the second one, for which

the sample design could be considered relevant to perform Bayesian inference about the

model parameters.

2.1 Fixed population approach

According to Cassel et al. (1977), a finite population, for which we are interested in a characteristic n, is a collection of M units, denoted by the index set $P = \{1, \ldots, M\}$, with $M < \infty$ assumed known. It is important to remark that each unit i, i = 1, ..., M, has an associated value $n_i$. Thus, when data are observed, we should record not only these values, but also the respective unit that produced each measurement. The complete observation is denoted by the pair $(i, n_i)$ and, therefore, there are M pairs for the whole population.

In statistical inference, a parameter is frequently treated as an unknown quantity indexing a probability distribution. Define $n = (n_1, \ldots, n_M)'$ as the parameter of the finite population, belonging to a parameter space defined in $\mathbb{R}^M$. Any real function of $n_1, \ldots, n_M$ is called a parametric function. For example, the values $n_i$ may be the number of people with some disease in each of M neighborhoods, or the number of animals of a particular species in each of M locations. Inference in finite populations is usually made about a specific parametric function, such as the population total $T = \sum_{i=1}^{M} n_i$, the population mean $\mu = T/M$ or the population variance $\sigma^2 = \sum_{i=1}^{M} (n_i - \mu)^2 / M$. In particular, in this work, we aim to estimate the population total.

We make statistical inferences about these parametric functions based on information

obtained from a sample of the population P (more details can be seen in Cassel et al.

(1977)). A sequence $s = \{i_1, \ldots, i_m\}$ such that $i_j \in P$, for $j = 1, \ldots, m$, is called an ordered sample, or simply a sample, of size m. The label $i_j$ is called the j-th component of s. Finite population sampling based on randomization of the sample differs from other parts of statistics since it treats the population as fixed. In this approach, the probabilistic mechanism of sample selection is a predetermined randomization procedure called the sample design. It is represented by a probability function, known as the sampling plan, on the set S of all possible samples s, where [s] denotes the probability of selecting the sample s. A sample design [·] is called non-informative if and only if [·] is a function that does not depend on the values of n associated with s. Otherwise, it is called an informative sampling plan and is denoted by [s | n].

Once s is selected, the observed result can be specified as the set of pairs $d = \{(i, n_i) : i \in s\}$. In some cases, the interest is only in the values of n and not in the full pair, so define $n_s = \{n_i : i \in s\}$. Let $\bar{s} = P - s$, so that $n_{\bar{s}} = \{n_i : i \in \bar{s}\}$ contains the values of n associated with the units that do not belong to the sample.
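The notation of this section can be illustrated with a small Python sketch. The population values, the sample size and the seed below are hypothetical, and simple random sampling without replacement is used only as an example of a non-informative design.

```python
import random

# Hypothetical finite population: value n_i for each unit i = 1, ..., M.
n = [0, 0, 3, 0, 7, 0, 0, 1, 0, 0]
M = len(n)
P = list(range(1, M + 1))            # index set P = {1, ..., M}

# Parametric functions of (n_1, ..., n_M).
T = sum(n)                           # population total
mu = T / M                           # population mean
sigma2 = sum((n_i - mu) ** 2 for n_i in n) / M   # population variance

# A sample s of size m drawn by simple random sampling without replacement;
# the selection does not look at the values n_i, so it is non-informative.
rng = random.Random(42)
m = 4
s = sorted(rng.sample(P, m))
s_bar = [i for i in P if i not in s]             # complement of s in P

d = [(i, n[i - 1]) for i in s]                   # observed pairs (i, n_i)
n_s = [n[i - 1] for i in s]                      # sampled values
n_s_bar = [n[i - 1] for i in s_bar]              # unobserved values
```

Whatever sample is drawn, the total splits as T = sum(n_s) + sum(n_s_bar), which is why predicting T amounts to predicting the unobserved part.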

In the following section, we present the approach based on superpopulation models,

where the sample remains fixed, and the value related with the population unit is treated

as the outcome of its associated random variable, and the inferences refer to a hypothetical

superpopulation, in which a probability law governs the variables of interest.

2.2 Superpopulation models

Another inferential approach for finite populations is based on superpopulation mod-

els. The process of statistical inferences from a sample comprises a set of principles

and procedures that may include, for example, knowledge of some random process that

generated the true unknown value of the characteristic of interest for each unit of the

population. This process is represented by a model that is used as a basis for making

inferences.

While in conventional sampling theory values of the variable of interest corresponding


to the population units are treated as fixed constants, under the superpopulation approach the value of the population vector $n = (n_1, \ldots, n_M)'$ is considered a realization of the random vector $N = (N_1, \ldots, N_M)'$, for which there is a joint distribution of all values of the population.

According to the model, suppose that, given a parametric vector $\theta \in \Theta$, the population vector $N = (N_1, \ldots, N_M)'$ is generated according to a probability distribution denoted by [N | θ]. Let H be the vector containing additional variables associated with the structure of the population and suppose that the joint distribution of H, which depends on a vector parameter ψ, is given by [H | ψ].

2.2.1 Informative sampling design

In a wide range of sampling designs, the sample selection mechanism may depend on

the values of the variables of interest in the population. This situation characterizes an

informative sampling plan. A typical example is a case-control study, where the sample

is selected such that there are cases (units with a given condition of interest) and controls

(units without this condition), and one is interested in modelling the indicator of presence

or absence of the condition as a function of predictor variables. This indicator is one of

the research variables and is considered in the sample selection mechanism.

Under the approach of superpopulation models, it is important to analyze whether the

selection probabilities of the population elements are related to the response variables. In

this case, it is relevant for inference to take into consideration the sampling plan, either

in the model design or in the construction of the likelihood function.

Let v be the set of variables that are fully observed for all units of the population

P . Even when we are primarily interested in some aspect of the distribution of N , v can

provide information through a regression setting. In other words, if v is fully observed,

then a model for N | v can lead to more precise inference about new values of N than

would be obtained by modeling N alone.

According to Gelman et al. (1995), when considering data collection, it is useful to

split the joint probability model into two parts: (i) the model for the underlying complete

data, N — including observed and unobserved components; and (ii) the model for the

sample s. The complete-data likelihood of sample s, vector N, and variables H, given

the parameters in the model and covariates v, is given by:

[s,N,H | v,θ,ψ] = [s | N,H][N | v,H,θ][H | ψ], (2.1)

which depends on the complete data N. In fact, the information obtained from a sample


is (s,Ns,Hs). Therefore, the likelihood of the observed data, assuming continuity of the

model’s quantities, without loss of generality, is:

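$$
\begin{aligned}
[s, N_s, H_s \mid v, \theta, \psi] &= \int\!\!\int [s, N, H \mid v, \theta, \psi] \, dN_{\bar{s}} \, dH_{\bar{s}} \\
&= \int\!\!\int [s \mid N, H]\,[N \mid v, H, \theta]\,[H \mid \psi] \, dN_{\bar{s}} \, dH_{\bar{s}}.
\end{aligned}
$$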

Under the Bayesian approach, we are interested in obtaining the posterior distribution

of the parametric vector. In this case, the joint posterior distribution of the model

parameters θ and ψ, given the observed information, (s,Ns,Hs,v), is:

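$$
\begin{aligned}
[\theta, \psi \mid s, N_s, H_s, v] &\propto [\theta, \psi]\,[s, N_s, H_s \mid v, \theta, \psi] \\
&= [\theta, \psi] \int\!\!\int [s, N, H \mid v, \theta, \psi] \, dN_{\bar{s}} \, dH_{\bar{s}} \\
&= [\theta, \psi] \int\!\!\int [s \mid N, H]\,[N \mid v, H, \theta]\,[H \mid \psi] \, dN_{\bar{s}} \, dH_{\bar{s}}.
\end{aligned}
\tag{2.2}
$$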

The posterior distribution of θ is obtained by integrating the expression (2.2) over ψ,

as follows:

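$$
\begin{aligned}
[\theta \mid s, N_s, H_s, v] &\propto \int\!\!\int\!\!\int [\theta, \psi]\,[s \mid N, H]\,[N \mid v, H, \theta]\,[H \mid \psi] \, dN_{\bar{s}} \, dH_{\bar{s}} \, d\psi \\
&\propto \int\!\!\int\!\!\int [\theta]\,[\psi \mid \theta]\,[s \mid N, H]\,[N \mid v, H, \theta]\,[H \mid \psi] \, dN_{\bar{s}} \, dH_{\bar{s}} \, d\psi \\
&\propto [\theta] \int\!\!\int\!\!\int [\psi \mid \theta]\,[s \mid N, H]\,[N \mid v, H, \theta]\,[H \mid \psi] \, dN_{\bar{s}} \, dH_{\bar{s}} \, d\psi.
\end{aligned}
\tag{2.3}
$$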

If we decide to ignore the sampling design, we can compute the joint posterior distri-

bution of the model parameters θ and ψ by conditioning only on Ns, Hs and v but not

s, as follows:

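$$
\begin{aligned}
[\theta, \psi \mid N_s, H_s, v] &\propto [\theta, \psi]\,[N_s, H_s \mid v, \theta, \psi] \\
&= [\theta, \psi] \int\!\!\int [N \mid v, H, \theta]\,[H \mid \psi] \, dN_{\bar{s}} \, dH_{\bar{s}}.
\end{aligned}
\tag{2.4}
$$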

The posterior distribution of θ, ignoring the sampling design, is obtained from the

expression (2.4) and is given by:

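$$
[\theta \mid N_s, H_s, v] \propto [\theta] \int\!\!\int\!\!\int [\psi \mid \theta]\,[N \mid v, H, \theta]\,[H \mid \psi] \, dN_{\bar{s}} \, dH_{\bar{s}} \, d\psi.
\tag{2.5}
$$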

When unobserved data supplies no information, i.e. when [θ | Ns,Hs,v] given in

(2.5) equals [θ | s,Ns,Hs,v] given in (2.3), the sample design is called ignorable (with


respect to the proposed model). In general, sampling plans involve some knowledge of the

structure of the population, such as stratification, conglomeration, and unequal selection

probabilities (complex sampling).

In this case, a sufficient condition to ensure design ignorability is $[s \mid N, H] = [s \mid N_s, H_s]$. The important consequence of this condition is that, from (2.3), it follows that if the sampling plan is ignorable with respect to the parameter of interest θ, then $[\theta \mid s, N_s, H_s, v] = [\theta \mid N_s, H_s, v]$. Thus, the additional information brought through s can be discarded when one wishes to make inference about θ; otherwise, it cannot be ignored. Mistakenly ignoring an informative sampling plan in inference may negatively affect the parameter estimates.
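The step from the sufficient condition to ignorability can be written out explicitly. The following is a sketch of the argument, using only the quantities defined in this section and writing $\bar{s}$ for the set of units outside the sample:

```latex
\begin{aligned}
[\theta \mid s, N_s, H_s, v]
  &\propto [\theta] \int\!\!\int\!\!\int
     [\psi \mid \theta]\,[s \mid N, H]\,[N \mid v, H, \theta]\,[H \mid \psi]
     \; dN_{\bar{s}}\, dH_{\bar{s}}\, d\psi \\
  &= [s \mid N_s, H_s]\;[\theta] \int\!\!\int\!\!\int
     [\psi \mid \theta]\,[N \mid v, H, \theta]\,[H \mid \psi]
     \; dN_{\bar{s}}\, dH_{\bar{s}}\, d\psi \\
  &\propto [\theta \mid N_s, H_s, v].
\end{aligned}
```

The first line is (2.3). In the second line the design factor, which under the condition depends only on observed quantities, is constant with respect to the integration variables and leaves the integral. Since it is also free of θ, it is absorbed into the proportionality constant, and what remains is the right-hand side of (2.5).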

In this work, the approach based on the superpopulation model will be used, focusing

on inference about the model parameters and the prediction of T from data obtained by

adaptive cluster sampling, which is an informative sampling plan. As usual, we will avoid evaluating the integrals presented in this section by simply drawing posterior simulations of the joint vector of unknowns, $(N_{\bar{s}}, H_{\bar{s}}, \theta, \psi)$, and then focusing on estimates.
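The idea of replacing the integrals by posterior simulation can be sketched in Python with a deliberately simple toy model. This is not the model proposed in this dissertation: the sketch assumes independent Poisson counts with a common rate θ, a Gamma(a, b) prior (shape a, rate b) and an ignorable simple random sample, and it has no H or ψ. All names and numbers are hypothetical.

```python
import math
import random


def draw_poisson(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method
    (adequate for the small rates used in this toy example)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_total(n_s, M, a=1.0, b=1.0, draws=1000, seed=0):
    """Posterior draws of T under the toy Poisson-Gamma model.

    Each iteration draws theta from its posterior given the sampled counts,
    then draws the unobserved counts given theta, so no integral is evaluated.
    """
    rng = random.Random(seed)
    m, observed = len(n_s), sum(n_s)
    totals = []
    for _ in range(draws):
        # theta | n_s ~ Gamma(shape = a + sum(n_s), rate = b + m);
        # random.gammavariate takes a scale, hence the reciprocal.
        theta = rng.gammavariate(a + observed, 1.0 / (b + m))
        unobserved = sum(draw_poisson(rng, theta) for _ in range(M - m))
        totals.append(observed + unobserved)
    return totals


# Hypothetical sample of m = 10 cell counts from a grid with M = 50 cells.
n_s = [0, 0, 3, 0, 1, 0, 0, 0, 2, 0]
totals = simulate_total(n_s, M=50)
posterior_mean_T = sum(totals) / len(totals)
```

Summaries of T, such as its posterior mean or a credibility interval, are then read directly from the simulated values. Under an informative plan such as adaptive cluster sampling, the design factor would also have to enter the simulation.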


Chapter 3

Adaptive cluster sampling

In the context of rare and grouped populations, it is usual to divide the region of

interest into cells. When we use traditional sampling methods to sample a population

with these characteristics, small samples of cells result in a large number of empty cells,

leading us to inaccurate estimates of the population quantity of interest. Adaptive cluster

sampling is an alternative to deal with this difficulty since it allows us to increase survey

effort in the vicinity of regions where individuals of interest are found by using information

from the observed values to be more successful in collecting additional cells.

In Section 3.1, we present the adaptive sampling plan proposed by Thompson (1990),

since it is a suitable sampling plan for the type of population we aim to study in this work.

In Section 3.2 some extensions of this sampling plan are briefly presented. Finally, in

Section 3.3 the model-based approach proposed by Rapley & Welsh (2008) is presented,

for which the sampling plan is relevant to perform Bayesian inference about the model

parameters.

3.1 Thompson's (1990) approach

Adaptive cluster sampling is generally used when we are dealing with sparse and

clustered populations, since this method helps to more accurately estimate the population

totals. Consider a population spread non-homogeneously over a region, for instance, one

in which there are clusters, with a grid of size M superimposed upon it. In this case, the

population total may be estimated inefficiently: if the sample includes several clusters

the population total will be overestimated, and if the sample includes very few clusters,

it will be underestimated. In this situation, using an adaptive sampling strategy provides

more efficient estimators and, therefore, should be preferred in most cases.


Initially proposed by Thompson (1990), the method has been shown to be effective in

epidemiological research and studies on rare diseases, animals and plants. From an

initial sample of m grid cells, when a selected cell contains a member of the population of

interest, the cells sharing a common edge with the current cell are also sampled and we

continue surveying until we obtain a set of contiguous non-empty grid cells surrounded

by empty grid cells. This procedure is intuitive, since one expects to find

an element with similar characteristics to another in its vicinity when the population is

grouped. In this way, empty grid cells bring no further survey effort. Hence, adaptive

cluster sampling requires some prior knowledge about the structure of the underlying

population, which may be obtained from a preceding survey, to be effective.

In Figure 3.1, the method is illustrated for a population distributed over a region

partitioned into M = 400 grid cells. The sampling procedure starts with a simple random

sample without replacement of m = 10 units, which are displayed in gray in the grid

(Figure 3.1a). Note that, from the 10 cells selected, only 2 of them contain a member of

the population of interest. Next, the units neighboring these 2 units are also included in

the sample (Figure 3.1b). We continue surveying neighbors until the process is complete

(Figure 3.1c), with 45 sampled cells, shown highlighted.

[Figure 3.1 panels: (a) Initial sample; (b) Sampling process; (c) Final sample.]

Figure 3.1: Illustration of adaptive cluster sampling procedure for a rare clustered population

distributed in a region with M = 400 grid cells. Figure (a) presents the initial sample of

m = 10 cells in dark grey. From this sample, in Figure (b), neighbors are added to the sample

whenever there is at least one observation (black dot) in the selected cell, finally setting the

sample presented in Figure (c).
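The expansion step illustrated in Figure 3.1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work: the grid is represented as a dictionary of cell counts, neighbors are the cells sharing an edge, and the names `expand_sample` and `adaptive_cluster_sample` as well as the toy 4 x 4 grid are hypothetical.

```python
import random
from collections import deque


def expand_sample(counts, initial):
    """Grow an initial set of cells into the final adaptive sample.

    counts maps (row, col) -> number of individuals in that cell.
    Returns the set of all visited cells (networks plus their edge cells).
    """
    observed = set()
    queue = deque(initial)
    while queue:
        cell = queue.popleft()
        if cell in observed:
            continue
        observed.add(cell)
        if counts[cell] > 0:
            # Nonempty cell: also survey the cells sharing an edge with it.
            r, c = cell
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in counts and nb not in observed:
                    queue.append(nb)
    return observed


def adaptive_cluster_sample(counts, m, rng):
    """Draw m initial cells by simple random sampling, then expand them."""
    initial = rng.sample(sorted(counts), m)
    return initial, expand_sample(counts, initial)


# Toy 4 x 4 grid (assumed for illustration) with one network of three cells.
counts = {(r, c): 0 for r in range(4) for c in range(4)}
counts.update({(1, 1): 2, (1, 2): 1, (2, 2): 4})

initial, sample = adaptive_cluster_sample(counts, m=3, rng=random.Random(0))
```

Empty cells are added to the sample but never expanded, which is why they bring no further survey effort, whereas hitting any cell of the network pulls in the whole network and its surrounding edge cells.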

A set of contiguous cells containing members of the population makes up a network,

while the network together with the empty boundary cells that surround it

is termed a cluster. These boundary cells are called edge cells. It is

also convenient to define every empty cell as a network of its own, so an edge cell is, in fact,

a network of size one. These definitions are illustrated in Figure 3.2, which is part of

the sample seen in Figure 3.1. The squares with bold border correspond to the observed


cluster, the squares in gray make up the nonempty network and the hatched cells represent

the border units. The unit initially selected is in darker gray.


Figure 3.2: Illustration of important concepts in adaptive cluster sampling: bold bordered

squares correspond to the observed cluster, gray squares are the network units, and the hatched

part are the border units. The unit initially selected is in darker gray.

Although the initial sampled cells are distinct, a cluster may include more than one

unit of the initial sample, i.e., if two non-border units in the same cluster are initially

selected, then this cluster can occur twice in the final sample. Therefore, an adaptive

cluster sample, which begins with a selection without replacement of m initial units, has

at most m distinct nonempty networks. Thus, the final number of

sampled cells is a random variable and cannot be fixed in advance.

When the data are analyzed, the networks become the unit of analysis and the respective

boundaries are ignored if they have not appeared in the original sample (Thompson

& Seber, 1996). Networks are used as the unit of analysis because it is possible to calculate

inclusion probabilities, based on their size, for each of them in the model. In addition,

since the networks are disjoint, they form a partition over the region of interest. Besides

that, grid cells within networks usually have a structure of dependency and considering

the network as a unit of analysis avoids the need to explicitly define this structure in the

model.

Conventional estimators under an adaptive cluster sampling design tend to be biased since nonempty cells are sampled disproportionately. Based on this idea, Thompson (1990) obtained an unbiased estimator of the population mean under this sample design that allows multiple inclusions of networks. Moreover, Thompson (1990)’s approach lets edge cells be incorporated into the estimator by taking its conditional expectation given the minimal sufficient statistic, which yields its Rao-Blackwell improved version. These estimators were described and computed for small sample sizes in Thompson (1990). Building on this work, some extensions of this sampling design, going beyond an initial selection based on simple random sampling, appeared in the literature and will be presented below.


3.2 Extensions of adaptive cluster sampling

Several extensions to simple random sampling — e.g., stratified sampling — can also

be applied to adaptive sampling. Methods for stratified adaptive cluster sampling were

first proposed in Thompson (1991b) and another extension was proposed in Thompson

(1991a). In these approaches, primary sampling units, for example, groups of units

arranged in strips or rectangles, are defined and then randomly sampled. If a member

of the population is found within a primary unit, secondary units outside of the primary

unit are added to the sample in the same way as in normal adaptive cluster sampling

(Thompson, 1991a). Further, Salehi M. & Seber (1997) proposed a scheme whereby the

networks are selected one by one without replacement, avoiding the selection of the same

network more than once.

Thompson & Seber (1996) examined some general ideas about model-based inference

approaches for adaptive sampling. Likelihood-based methods, such as Bayesian estimation,

showed promising results among model-based approaches, some of which are

detailed in the next section.

3.3 Model-based inference

Bayesian inference methods for adaptive cluster sampling designs have been developed in Rapley & Welsh (2008) and Gonçalves & Moura (2016), which incorporate prior knowledge that the population is rare and grouped for both inference and sample design. Rapley & Welsh (2008) provided a model at the network level, while Gonçalves & Moura (2016) modeled at the cell level, considering heterogeneity among units belonging to different clusters. Neither work takes into account the spatial locations of the networks, which does not cause any loss of information about the population total since, under the model, it does not depend on where the networks are located. In this chapter, we will focus on Rapley & Welsh (2008)’s model.

Rapley & Welsh (2008) proposed a complex model, which uses networks as units

of analysis. Therefore, we refer to this model as the aggregate model. The use of the

Bayesian approach is a natural extension of the idea of adaptive cluster sampling because

it incorporates the prior knowledge that the rare population is grouped for both inference

and sample design. To illustrate their proposal, Rapley & Welsh (2008) compare their

estimators with the estimators developed in Thompson (1990) through a simulation study,

showing that their estimators are efficient, especially when prior knowledge is available.


Let R be a region containing a rare, clustered population, on which a regular grid of M cells is superimposed. A cell is considered nonempty if it contains at least one member of the population and empty otherwise. Let X ≤ M be the number of nonempty cells in R. Let P ≤ X be the number of nonempty networks in R, where a network is defined as in Thompson (1990). Let $Y_i$ be the number of nonempty cells within the nonempty network i, for i = 1, . . . , P, and therefore $\mathbf{Y} = (Y_1, \ldots, Y_P)'$ is the vector with the number of nonempty cells within each nonempty network, so that $X = \sum_{i=1}^{P} Y_i$. Note that there are M − X empty cells, which are defined as one-sized empty networks, so there are M − X + P networks in R. We can extend the P-dimensional vector $\mathbf{Y}$ to an (M − X + P)-dimensional vector given by $\mathbf{Z} = (\mathbf{Y}', \mathbf{1}'_{M-X})'$, where $\mathbf{1}_{M-X}$ is a vector of ones of dimension M − X. Thus, it follows that $Z_i = Y_i$ if the i-th network is a nonempty one and $Z_i = 1$ otherwise, for i = 1, . . . , M − X + P. Let $N_i$ be the count of a given phenomenon of interest in the nonempty network i and, therefore, $\mathbf{N} = (N_1, \ldots, N_P)'$ denotes the vector with the population total in each of the nonempty networks.
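The bookkeeping between Y, X, P and Z can be checked with a short sketch. The helper name `extend_network_sizes` and the numerical values are assumptions chosen only for illustration.

```python
def extend_network_sizes(Y, M):
    """Build Z = (Y', 1'_{M-X})' from the nonempty-network sizes Y.

    Y: list with the number of cells in each nonempty network.
    M: total number of grid cells in the region.
    """
    X = sum(Y)   # number of nonempty cells
    P = len(Y)   # number of nonempty networks
    if X > M:
        raise ValueError("more nonempty cells than grid cells")
    Z = list(Y) + [1] * (M - X)   # each empty cell is a network of size one
    return X, P, Z


# Illustrative values: four nonempty networks on a grid of 20 cells.
X, P, Z = extend_network_sizes([3, 3, 5, 5], M=20)
```

With these values the extended vector has M − X + P = 8 entries, and its entries add up to M, since the networks partition the grid.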

In order to perform inference about the population total $T = \sum_{i=1}^{P} N_i$, one must specify

the joint distribution of {X,P,Y,N} for the entire population and the sampling mechanism

that provides a particular sample of m networks from the M − X + P networks in

the population. First, we model the nonempty network structure and then, conditional

on it, model the count on the nonempty networks. Since the model applies to nonempty

cells, to avoid degeneration problems it is assumed that there is at least one nonempty

cell in R, so distributions are left truncated at zero. The proposed model can be written

as follows:

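\[
\begin{aligned}
N_i \mid P, Y_i, \gamma &\sim \text{independent truncated Poisson}(\gamma Y_i), & i &= 1, \ldots, P,\\
\mathbf{Y} \mid X, P &\sim \mathbf{1}_P + \text{Multinomial}\left(X - P, \tfrac{1}{P}\mathbf{1}_P\right), & Y_i &= 1, \ldots, X - P,\\
P \mid X, \beta &\sim \text{truncated Binomial}(X, \beta), & P &= 1, \ldots, X,\\
X \mid \alpha &\sim \text{truncated Binomial}(M, \alpha), & X &= 1, \ldots, M.
\end{aligned}
\tag{3.1}
\]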
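To make the hierarchy in (3.1) concrete, the following sketch draws one population from it with NumPy, going from X down to N. It is only an illustration under stated assumptions: the zero truncation is handled by simple rejection sampling (a choice made here for brevity), and the function names and parameter values are hypothetical.

```python
import numpy as np


def rtrunc_binomial(n, p, rng):
    """Binomial(n, p) conditioned on being at least 1 (rejection sampling)."""
    while True:
        x = int(rng.binomial(n, p))
        if x >= 1:
            return x


def rtrunc_poisson(lam, rng):
    """Poisson(lam) conditioned on being at least 1 (rejection sampling)."""
    while True:
        n = int(rng.poisson(lam))
        if n >= 1:
            return n


def simulate_aggregate_model(M, alpha, beta, gamma, rng):
    """Draw (X, P, Y, N) from the top of the hierarchy in (3.1) downwards."""
    X = rtrunc_binomial(M, alpha, rng)            # nonempty cells
    P = rtrunc_binomial(X, beta, rng)             # nonempty networks
    # Each network has at least one cell; the remaining X - P cells are
    # spread over the P networks with equal probabilities.
    Y = 1 + rng.multinomial(X - P, np.full(P, 1.0 / P))
    N = np.array([rtrunc_poisson(gamma * y, rng) for y in Y])
    return X, P, Y, N


rng = np.random.default_rng(0)
X, P, Y, N = simulate_aggregate_model(M=400, alpha=0.1, beta=0.2, gamma=5.0, rng=rng)
T = N.sum()   # population total for this simulated population
```

Rejection sampling is adequate here because the untruncated draws are rarely zero for these values; for very small rates an inverse-CDF sampler would be a more efficient choice.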

The model in (3.1) is applied to samples collected according to the adaptive method proposed by Salehi M. & Seber (1997), which consists of observing $Y_i$ for i ∈ s sequentially. Since the sample design depends on the structure of the population, which is unknown, it is an informative sampling design and should be incorporated into the likelihood function of the model to perform inference. Therefore, the next step is to define the probability of selecting a sample $s = \{i_1, \ldots, i_m\}$, i.e., [s]. If a cell that belongs to a nonempty network is sampled, then the entire network must be observed, and thus the probability of selecting a network is weighted according to its size. To illustrate the construction of the probability of selection of a sample, consider a population consisting of eight networks of sizes {1, 1, 1, 1, 3, 3, 5, 5}, from which the sample {5, 1, 5, 3} is taken. The probability of selecting the first network is equal to the probability of selecting a network of size 5, which is equal to $\frac{5 \times 2}{20}$; the probability of selecting a network of size 1 in the second step, given the previous one, is $\frac{1 \times 4}{15}$; and thus the probability of selection of this particular sample is equal to

\[
\frac{5 \times 2}{20} \times \frac{1 \times 4}{20 - 5} \times \frac{5 \times 1}{20 - 5 - 1} \times \frac{3 \times 2}{20 - 5 - 1 - 5}.
\]

Therefore, the probability of selection of a particular sample can be generalized as follows:

\[
[s \mid X, P, \mathbf{Y}] = \prod_{j=1}^{m} \frac{Z_{i_j} \times g_{i_j,j}}{\sum_{i=1}^{M-X+P} Z_i - \sum_{k=0}^{j-1} Z_{i_k}}, \tag{3.2}
\]

where $g_{i_j,j}$ is the number of networks of size $Z_{i_j}$ unselected after j − 1 networks have been selected and $Z_{i_0} = 0$.
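The formula in (3.2) can be evaluated directly for the eight-network example above. The sketch below uses exact rational arithmetic; the function name `selection_probability` is hypothetical and the code is meant only as a check of the formula, not as part of the estimation procedure.

```python
from collections import Counter
from fractions import Fraction


def selection_probability(sizes, sample):
    """Probability (3.2) of an ordered sample of network sizes.

    sizes: sizes Z_i of all networks in the population.
    sample: sizes Z_{i_j} of the selected networks, in order of selection.
    """
    remaining = Counter(sizes)   # size -> number of unselected networks (g)
    total = sum(sizes)           # sum of Z_i over networks not yet selected
    prob = Fraction(1)
    for z in sample:
        g = remaining[z]
        if g == 0:
            raise ValueError(f"no unselected network of size {z}")
        prob *= Fraction(z * g, total)
        remaining[z] -= 1        # one network of this size is now selected
        total -= z               # its cells leave the denominator
    return prob


p = selection_probability([1, 1, 1, 1, 3, 3, 5, 5], [5, 1, 5, 3])
```

For this example the four factors are 10/20, 4/15, 5/14 and 6/9, whose product simplifies to 2/63, roughly 0.032.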

Making an equivalence with the notation presented in Section 2.2, the following correspondence

can be obtained: M = M − X + P, H = (X, P, Y′)′, θ = γ and ψ = (α, β)′.

Note that the probability of selecting the sample s does not depend directly on N but

depends on the variables associated with the population structure, so it is said that the

sampling plan is informative with respect to H.


Chapter 4

Model for rare and clustered

populations under adaptive sampling

using covariates

Rapley & Welsh (2008) proposed a model that uses networks as units of analysis,

avoiding introducing spatial components in the model, which may facilitate the inference.

In this case, it is assumed that the intensity of the counts in each cluster is proportional

to its size. However, this assumption is not always valid, e.g. a cluster can have a

higher incidence of the phenomenon due to external factors that influence its disposition,

such as a spatially referenced covariate. Thus, proposing a disaggregated model can

be interesting in many contexts with rare and clustered populations. The objective of

this chapter is to present a disaggregated model at cell-level, which assumes that the

intensity of the counts in each cell of a cluster is related to an available covariate value.

The proposed model fits rare and grouped populations sampled under adaptive cluster

design. Therefore, the probability of selecting a given sample should be incorporated into

the model likelihood function. Introducing additional information in the model seems to

be an intuitive idea, provided that the prior knowledge indicates a relationship between

the phenomenon occurrence and some covariate.

In Section 4.1, the model is introduced, a new sampling procedure is proposed and

aspects of inference are discussed. Section 4.2 presents a simulation study for assessing

the effectiveness of the proposed model and the model proposed by Rapley & Welsh

(2008). In Section 4.3 both approaches are compared through a design-based perspective

under different scenarios, as well as a real data application. Finally, a simulation study to

evaluate the estimation of model parameters under different degrees of rare and clustered


populations is presented in Section 4.4.

4.1 Proposed model for cell counts using covariates

Suppose the phenomenon of interest is related to covariates, the values of which are

available for each one of the cells in R. Let C be the set of all nonempty cells of R and

$\bar{C}$ the set containing all empty cells of R. Let η(c) be the count of a given phenomenon

of interest in the cell c, and vc = (1, v1(c), . . . , vk(c))′ the vector with the k covariates

associated with cell c, for all c ∈ R. Let η be the set with the counts for all nonempty

cells, that is, η = {η(c) | c ∈ C}.

In order to perform inference about the population total $T = \sum_{c \in C} \eta(c)$, one must

specify the joint distribution of {X,P,Y,η} for the entire population and the sampling

mechanism that provides a particular sample of m networks from the M − X + P networks in the population.

First, we model the nonempty network structure and then, conditional on it, model the

count on the nonempty network’s cells, similarly to Rapley & Welsh (2008)’s approach.

Since the model applies to nonempty cells, to avoid degeneration problems it is assumed

that there is at least one nonempty cell in R, so distributions are left truncated at zero.

The proposed model can be written as follows:

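\[
\begin{aligned}
\eta(c) \mid \mathbf{v}_c, \boldsymbol{\theta} &\sim \text{truncated Poisson}(\lambda(c)), & \eta(c) &\geq 1,\ c \in C,\\
\mathbf{Y} \mid X, P &\sim \mathbf{1}_P + \text{Multinomial}\left(X - P, \tfrac{1}{P}\mathbf{1}_P\right), & Y_i &= 1, \ldots, X - P,\\
P \mid X, \beta &\sim \text{truncated Binomial}(X, \beta), & P &= 1, \ldots, X,\\
X \mid \alpha &\sim \text{truncated Binomial}(M, \alpha), & X &= 1, \ldots, M,
\end{aligned}
\tag{4.1}
\]

where $\lambda(c) = \exp\{\mathbf{v}_c'\boldsymbol{\theta}\}$, $\boldsymbol{\theta} = (\theta_0, \theta_1, \ldots, \theta_k)'$ represents the regression coefficients vector associated with $\mathbf{v}_c$ and $\mathbf{1}_P + \text{Multinomial}(\cdot)$ represents the Multinomial distribution truncated at one. Note that the M − X empty cells have their respective counts equal to zero, that is, η(c) = 0, for all $c \in \bar{C}$.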
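The cell-level part of (4.1) combines a log-linear intensity with a zero-truncated Poisson distribution. The sketch below, using only the Python standard library, evaluates the intensity, the log-probability of a count and the mean of the truncated distribution. The function names and the covariate and coefficient values are hypothetical, chosen only to illustrate the formulas.

```python
import math


def intensity(v, theta):
    """lambda(c) = exp(v_c' theta) for a covariate vector v (with leading 1)."""
    return math.exp(sum(vk * tk for vk, tk in zip(v, theta)))


def trunc_poisson_logpmf(n, lam):
    """Log-probability of a count n >= 1 under a zero-truncated Poisson(lam)."""
    if n < 1:
        return -math.inf
    # log of lam^n e^{-lam} / (n! (1 - e^{-lam}))
    return (-lam + n * math.log(lam) - math.lgamma(n + 1)
            - math.log(-math.expm1(-lam)))


def trunc_poisson_mean(lam):
    """Mean of a zero-truncated Poisson(lam): lam / (1 - e^{-lam})."""
    return lam / (-math.expm1(-lam))


# Illustrative cell with one covariate: v_c = (1, 0.5), theta = (0.2, 1.0).
lam = intensity((1.0, 0.5), (0.2, 1.0))
pmf = [math.exp(trunc_poisson_logpmf(n, lam)) for n in range(1, 101)]
```

Because the distribution is truncated at zero, the expected count in a nonempty cell is always larger than λ(c), and the difference matters most when λ(c) is small.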

Making an equivalence with the notation presented in Section 2.2, the following correspondence can be obtained: H = (X, P, Y′)′ and ψ = (α, β)′. According to the sampling procedure, the drawn sample is composed of networks, from which the cells that compose each of them and their respective counts are observed and used to model the population total at a more disaggregated level. Furthermore, we can make an equivalence between the vectors η and N, since the first one contains the counts for each of the M cells of R and the second one for each of the M − X + P networks of R. Let $N_i$ be the count associated with the i-th network, which can be empty or nonempty. If the i-th network is empty, then it is composed of a single cell and $N_i = \eta(c)$, where c is the cell that composes the i-th network. On the other hand, if the i-th network is nonempty, then it is composed of a set of cells $c_i$ and $N_i = \sum_{c \in c_i} \eta(c)$. Moreover, the probability of selecting the sample s does not depend directly on the quantities of the model, since they do not appear in its expression, but on the allocation process that produces the set of cells that compose unsampled networks and, consequently, the set $G_{i_j,j}$ in equation (4.3). Thus, it is said that the sampling plan is informative.

4.1.1 Model inference

The sampling procedure entails observing $Y_i$ for the networks $\{i_1, \ldots, i_m\}$ and the counts η(c) for their respective cells. Since the adaptive cluster sampling procedure depends on the population structure, it is characterized as an informative sampling design and the probability of selecting the sample $s = \{i_1, \ldots, i_m\}$ of m networks, [s | X,P,Y], should be incorporated into the model likelihood function. Let the subscript $s$ identify the observed component and $\bar{s}$ the unobserved component, and define $\mathbf{Y} = (\mathbf{Y}_s', \mathbf{Y}_{\bar{s}}')'$, $X = X_s + X_{\bar{s}}$ and $P = P_s + P_{\bar{s}}$ to distinguish between observed and unobserved quantities. Let $C_s$ be the set of the sample’s nonempty cells, i.e., the cells that compose the networks with sizes $\mathbf{Y}_s$; and $C_{\bar{s}}$ be the set of the out-of-sample nonempty cells, i.e., the cells that compose the non-sampled networks with sizes $\mathbf{Y}_{\bar{s}}$. Thus, define $\boldsymbol{\eta} = (\boldsymbol{\eta}_s', \boldsymbol{\eta}_{\bar{s}}')'$, where $\boldsymbol{\eta}_s = \{\eta(c) \mid c \in C_s\}$ and $\boldsymbol{\eta}_{\bar{s}} = \{\eta(c) \mid c \in C_{\bar{s}}\}$. A natural predictor of the population total T is given by:

\[
\hat{T} = \sum_{c \in C_s} \eta(c) + \sum_{c \in C_{\bar{s}}} \hat{\eta}(c),
\]

where $\hat{\eta}(c)$ represents the posterior mean of the count of the cell c, for $c \in C_{\bar{s}}$.
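Given posterior draws of the unobserved counts, the predictor of T above amounts to adding the observed counts to the per-cell posterior means. The sketch below assumes, purely for illustration, that the draws are stored as an array with one row per MCMC iteration and one column per non-sampled cell, with a zero wherever a cell is empty in that iteration; the function name and the numbers are hypothetical.

```python
import numpy as np


def predict_total(observed_counts, unobserved_draws):
    """Observed total plus the sum of posterior means of the unobserved counts.

    observed_counts: counts eta(c) for the sampled nonempty cells.
    unobserved_draws: array of shape (n_draws, n_unsampled_cells).
    """
    observed_counts = np.asarray(observed_counts, dtype=float)
    draws = np.asarray(unobserved_draws, dtype=float)
    eta_hat = draws.mean(axis=0)   # posterior mean count of each non-sampled cell
    return observed_counts.sum() + eta_hat.sum()


# Made-up numbers: three sampled cells, three non-sampled cells, four draws.
observed = [3, 1, 2]
draws = [[0, 2, 0],
         [1, 0, 0],
         [2, 1, 0],
         [1, 1, 0]]
T_hat = predict_total(observed, draws)
```

Averaging the per-iteration totals instead of the per-cell counts gives the same value, since both are linear in the draws.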

Following the Bayesian paradigm, independent priors are assumed for the unknown parameters θ, α and β, and their marginal prior distributions are denoted, respectively, by [θ], [α] and [β]. Let [θ] be a non-informative Normal prior with a zero mean vector and covariance matrix $\sigma_\theta^2 I_{k+1}$, where $I_{k+1}$ denotes the (k+1)-dimensional identity matrix and $\sigma_\theta^2 = 10^4$. For α we assumed a Beta($a_\alpha$, $b_\alpha$) distribution with $a_\alpha = 3$ and $b_\alpha = 15$, and for β a Beta($a_\beta$, $b_\beta$) distribution with $a_\beta = 1$ and $b_\beta = 9$. The prior distributions of α and β are chosen to reflect the fact that α and β are necessarily small in a rare and clustered population, as considered in Rapley & Welsh (2008). In this case, the objective is not only to estimate the parameters of the model based on a sample, but also to make predictions of the unobserved parts.
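As a quick illustration of what these hyperparameters imply, the standard formula for the mean of a Beta(a, b) distribution, a/(a + b), gives

\[
E[\alpha] = \frac{a_\alpha}{a_\alpha + b_\alpha} = \frac{3}{18} \approx 0.167,
\qquad
E[\beta] = \frac{a_\beta}{a_\beta + b_\beta} = \frac{1}{10} = 0.1,
\]

so both priors concentrate their mass on small values, in line with the rarity and clustering assumptions.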


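The joint distribution of all the quantities in the model is:

\[
\begin{aligned}
&[\boldsymbol{\eta}, \mathbf{Y}, P, X, \boldsymbol{\theta}, \beta, \alpha] \\
&\quad = [s \mid X, P, \mathbf{Y}]\,[\boldsymbol{\eta} \mid \boldsymbol{\theta}]\,[\mathbf{Y} \mid X, P]\,[P \mid X, \beta]\,[X \mid \alpha]\,[\boldsymbol{\theta}]\,[\alpha]\,[\beta] \\
&\quad \propto [s \mid X, P, \mathbf{Y}] \times \prod_{c \in C} \frac{\exp\{-\exp\{\mathbf{v}_c'\boldsymbol{\theta}\} + \eta(c)\,\mathbf{v}_c'\boldsymbol{\theta}\}}{\eta(c)!\,\left(1 - \exp\{-\exp\{\mathbf{v}_c'\boldsymbol{\theta}\}\}\right)} \\
&\qquad \times (x - p)! \prod_{i=1}^{p} \frac{1}{(y_i - 1)!}\left(\frac{1}{p}\right)^{y_i - 1}
\times \frac{\binom{x}{p}\beta^{p}(1 - \beta)^{x - p}}{1 - (1 - \beta)^{x}}
\times \frac{\binom{M}{x}\alpha^{x}(1 - \alpha)^{M - x}}{1 - (1 - \alpha)^{M}} \\
&\qquad \times \exp\left\{-\frac{1}{2\sigma_\theta^2}\boldsymbol{\theta}'\boldsymbol{\theta}\right\}
\times \alpha^{a_\alpha - 1}(1 - \alpha)^{b_\alpha - 1}
\times \beta^{a_\beta - 1}(1 - \beta)^{b_\beta - 1}.
\end{aligned}
\tag{4.2}
\]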

We perform inference via MCMC to obtain samples from the resulting posterior distribution. The full conditional posterior distributions and the methods adopted to sample from each of them are detailed in Appendix A. In comparison with the sampling procedure proposed by Salehi M. & Seber (1997), our improved sampling process tends to draw a greater number of networks, providing samples that may include all networks from R (see details in Subsection 4.1.2). Thus, our proposal distribution, unlike Rapley & Welsh (2008)’s approach, may lead to no out-of-sample nonempty cells and, consequently, no out-of-sample networks (see details in Appendix A). The estimation procedure consists of the following steps:

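(1) Initialize the counter j = 2 and set initial values for the parameters and quantities of the model: $\boldsymbol{\theta}^{(1)}$, $\alpha^{(1)}$, $\beta^{(1)}$, $X_{\bar{s}}^{(1)}$, $P_{\bar{s}}^{(1)}$, $\mathbf{Y}_{\bar{s}}^{(1)}$ and $\boldsymbol{\eta}_{\bar{s}}^{(1)}$;

(2) Update the model parameters θ, α and β from the conditional distributions:

\[
\begin{aligned}
&[\boldsymbol{\theta} \mid \alpha^{(j-1)}, \beta^{(j-1)}, X^{(j-1)}, P^{(j-1)}, \mathbf{Y}^{(j-1)}, \boldsymbol{\eta}^{(j-1)}],\\
&[\alpha \mid \boldsymbol{\theta}^{(j)}, \beta^{(j-1)}, X^{(j-1)}, P^{(j-1)}, \mathbf{Y}^{(j-1)}, \boldsymbol{\eta}^{(j-1)}],\\
&[\beta \mid \boldsymbol{\theta}^{(j)}, \alpha^{(j)}, X^{(j-1)}, P^{(j-1)}, \mathbf{Y}^{(j-1)}, \boldsymbol{\eta}^{(j-1)}],
\end{aligned}
\]

described in Appendix A;

(3) Generate the non-sampled quantities $X_{\bar{s}}$, $P_{\bar{s}}$ and $\mathbf{Y}_{\bar{s}}$ according to the proposal distribution described in Section A.4;

(4) Allocate the $P_{\bar{s}}$ networks of $\mathbf{Y}_{\bar{s}}$ according to the allocating procedure described in Subsection 4.1.1.1;

(5) Generate $\boldsymbol{\eta}_{\bar{s}}$ and jointly update $X_{\bar{s}}$, $P_{\bar{s}}$, $\mathbf{Y}_{\bar{s}}$ and $\boldsymbol{\eta}_{\bar{s}}$ from the conditional distribution:

\[
[X_{\bar{s}}, P_{\bar{s}}, \mathbf{Y}_{\bar{s}}, \boldsymbol{\eta}_{\bar{s}} \mid \boldsymbol{\theta}^{(j)}, \alpha^{(j)}, \beta^{(j)}, X_s, P_s, \mathbf{Y}_s, \boldsymbol{\eta}_s];
\]

(6) Increment the counter j to j + 1 and iterate from (2).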

Note that the regression coefficients θ are updated in step (2) based only on the sample information. Moreover, from them, we can easily obtain the Poisson distribution’s intensity $\lambda(c) = \exp\{\mathbf{v}_c'\boldsymbol{\theta}\}$ for any non-sampled cell c of R, which is used later to estimate η(c) for every nonempty non-sampled cell c of R. Let λ be the set of intensities assigned to all cells of R. Then, after generating the non-sampled quantities $X_{\bar{s}}$, $P_{\bar{s}}$ and $\mathbf{Y}_{\bar{s}}$, all that remains is to find out which cells form each of these $P_{\bar{s}}$ networks in step (4), according to the allocating procedure presented in Subsection 4.1.1.1.

4.1.1.1 Allocating procedure

Determining the cells that compose the out-of-sample nonempty networks is a crucial step in the proposed model estimation, since the resulting allocation directly impacts the cells that compose $C_{\bar{s}}$ and the estimated value of $\boldsymbol{\eta}_{\bar{s}}$. The generated out-of-sample networks are allocated sequentially, according to their size: the bigger networks are allocated first and the smaller ones later. It is assumed that the bigger the size of the network, the higher its cells’ intensity values. Note that the cells that compose the set of the out-of-sample cells, $C_{\bar{s}}$, must not be part of the set of sampled cells, $C_s$, nor of the sampled nonempty networks’ borders (otherwise, we would be modifying a previously sampled network).

The allocating procedure aims to draw the cells that compose each generated out-of-sample network according to given weights. In this case, we will use the set of intensities λ, although one could sample the cells based on other practical weights. The weights λ of the cells in $C_s$ and of the visited border cells are set to zero. An example of this procedure is illustrated in Figure 4.1. The allocation of a network of size Y proceeds as follows: draw an available cell c with probability proportional to the weights λ and, if Y > 1, draw another cell from the neighbors of that cell and continue to draw neighboring cells until we obtain a set of Y contiguous nonempty grid cells surrounded by empty grid cells. Then remove this network from the population, select one of the remaining grid cells with probability proportional to the weights λ and proceed in this way until we have allocated all the $P_{\bar{s}}$ networks. Note that the cells that were not chosen to be part of $C_{\bar{s}}$ are assumed to be empty.
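The allocation described above can be sketched as follows. This is a rough illustration rather than the procedure as implemented in this work, and it rests on several assumptions that are not stated in the text: neighboring cells are also drawn with probability proportional to λ, the border of each allocated network is made unavailable by setting its weights to zero, and a network that cannot be completed is simply redrawn from a new starting cell. All function names are hypothetical.

```python
import numpy as np


def neighbours(cell, shape):
    """Cells sharing an edge with `cell` inside a grid of the given shape."""
    r, c = cell
    cand = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
    return [(i, j) for i, j in cand if 0 <= i < shape[0] and 0 <= j < shape[1]]


def draw_cell(cells, weights, rng):
    """Draw one cell from `cells` with probability proportional to its weight."""
    w = np.array([weights[c] for c in cells], dtype=float)
    return cells[int(rng.choice(len(cells), p=w / w.sum()))]


def allocate_network(size, weights, rng, max_tries=100):
    """Allocate one network of `size` contiguous cells with positive weight."""
    shape = weights.shape
    available = [(int(i), int(j)) for i, j in np.argwhere(weights > 0)]
    if not available:
        raise RuntimeError("no available cells")
    for _ in range(max_tries):
        network = [draw_cell(available, weights, rng)]
        while len(network) < size:
            # Available neighbours of the cells already in the network.
            frontier = sorted({nb for c in network for nb in neighbours(c, shape)
                               if weights[nb] > 0 and nb not in network})
            if not frontier:
                break   # dead end: restart from a new seed (assumption)
            network.append(draw_cell(frontier, weights, rng))
        if len(network) == size:
            return network
    raise RuntimeError("could not allocate a network of the requested size")


def allocate_all(sizes, weights, rng):
    """Allocate all networks, largest first, zeroing used cells and borders."""
    weights = np.array(weights, dtype=float)   # work on a copy
    networks = []
    for size in sorted(sizes, reverse=True):
        net = allocate_network(size, weights, rng)
        for c in net:
            weights[c] = 0.0
            for nb in neighbours(c, weights.shape):
                weights[nb] = 0.0   # border cells must remain empty
        networks.append(net)
    return networks


rng = np.random.default_rng(1)
nets = allocate_all([2, 3], np.ones((6, 6)), rng)
```

Zeroing the border of each allocated network is what keeps two allocated networks from touching and thereby merging into a single larger network.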


[Figure 4.1 panels: (a) → (b) → (c); (d) → (e) → (f).]

Figure 4.1: Allocation method illustration of two out-of-sample networks of sizes 3 and 2,

based on weights λ (gray background). The lighter the cells’ color, the higher the λ value

(intensity) of that cell. The white cells with bold borders, whose weights are equal to zero, are

the sampled networks’ cells and the hatched ones correspond to the nonempty networks’ border

cells. The red borders surround cells that can be drawn in each stage of the procedure. The

cells that compose the first and second allocated networks are blue-painted and green-painted,

respectively. The example proceeds as follows: draw one of the red-surrounded cells of Panel

(a). The sampled cell is blue-indicated in Panel (b) and only its neighbors can be drawn to keep

building this network. In Panels (c) and (d) the allocation of the 3-sized network is finished, and

we can draw any cell with the red border to start the allocation of the 2-sized network. Panels

(e) and (f) present the cells chosen to compose this network in green.

4.1.2 Sampling procedure

A variation of the sampling procedure proposed by Salehi M. & Seber (1997) is proposed

here to improve the sampling process, aiming to sample more nonempty networks.

Let π be the set of sampling weights assigned to all cells of R and π(c) the weight for a

given cell c. The procedure consists of sampling a grid cell from the set of M grid cells

with probability proportional to the weights π and, if it is nonempty, observing the entire network

containing the selected grid cell. After removing this network from the population, a

new cell is selected from the remaining set of grid cells and the method proceeds in this

way until we have selected m networks in the sample. Note that a nonempty network is

surrounded by empty cells that make up its border and can be resampled.

The sampling process improvement proposed in this dissertation, illustrated in Figure


4.2, is divided into two stages and is based on weights that are used to draw the sample.

In the first stage, m1 networks are selected considering grid cells with equal weights, i.e.

π(c) is constant for all c ∈ R. The sampling procedure continues until all nonempty

cells in the neighborhood are observed and stops when empty units are visited. Thus,

the networks are selected with probability proportional to their size. Note that, during

this process, although the border cells are visited, they are not added to the sample.

Based on the fit of the proposed model in equation (4.1) to this first sample with m1

networks, we obtain the vector of weights ω for all non-sampled cells of R, which are

used to select the second sample. Let ω(c) be the weight defined by the posterior mean of η(c), for each cell c ∈ R. Note that the higher the posterior mean of a cell count, the greater the chance of selecting that cell. Due to the inference process, the weights ω associated with the border cells are set to zero. Since the cells of the networks in the first sample must not be drawn in the second sampling stage, the weights associated with these cells are set to zero too. Then, a second sample of m2 networks is drawn with probability proportional to the weights ω. Hence, the final sample will be given by $s = s_1 \cup s_2 = \{i_1, \ldots, i_{m_1}, i_{m_1+1}, \ldots, i_{m_1+m_2}\}$, with size $m = m_1 + m_2$.

[Figure 4.2 panels: (a) Population and constant weights; (b) First sample: m1 networks; (c) First sample and weights ω; (d) Final sample: m1 + m2 networks. The first stage leads from (a) to (b) and the second stage from (c) to (d).]

Figure 4.2: Proposed sampling procedure illustration of a population (points) distributed in a region with M = 400 cells and weights (gray background) used in this scheme. The lighter the cells’ color, the higher the weight of that cell. The white cells with bold borders, whose weights are equal to zero, are the sampled networks’ cells and hatched cells correspond to the nonempty networks’ border cells. Panels (a) and (b) present the same grayscale, since all cells have constant weight in the first stage; and Panels (c) and (d) show different shades of gray due to the second stage’s different weights.

To motivate the notation for the probability of selecting a given sample, consider a
population consisting of networks of size Z from which we obtain the ordered sample
s = {i_1, . . . , i_m}. The probability of selecting the j-th network of the sample, that is,
a network of size Z_{i_j}, is given by the sum of the probabilities of selecting each unselected
network of size Z_{i_j} after j − 1 networks have been observed, since networks with the
same size are considered alike. Thus, the probability of selecting a network in the sample
depends on its size Z_i, which is only observed for the sampled networks after their
selection in the sample.

Let c_j be the set of sampled cells in the j-th draw. Thus, c_j is composed of the drawn
grid cell and, if it is nonempty, the entire network containing the selected grid
cell. Let G_{i_j, j} be the set of cells that compose the unselected networks of size Z_{i_j} after j − 1
networks have been selected. Thus, in general, the probability of selecting the sample
s = {i_1, . . . , i_m} of m networks is given by:

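$$
[s \mid X, P, Y] = \prod_{j=1}^{m} \frac{\sum_{g \in G_{i_j, j}} \pi(g)}{\sum_{r \in R} \pi(r) - \sum_{k=0}^{j-1} \sum_{c \in c_k} \pi(c)}, \qquad (4.3)
$$

where π(c) represents the weight of the cell c and is:

$$
\pi(c) = \begin{cases} \text{constant}, & \text{if } c \in s_1; \\ \omega(c), & \text{if } c \in s_2. \end{cases}
$$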

Note that in equation (4.3), the index j represents the j-th draw, so c ∈ s1 for j =
1, . . . , m1, and c ∈ s2 for j = m1 + 1, . . . , m. When the proposed model in equation
(4.1) is fitted to the first sample (to obtain the weights ω), the weights π(c) are constant
and the probability given in expression (4.3) matches the probability of selecting a
sample s given in Rapley & Welsh (2008). On the other hand, unlike in Rapley
& Welsh (2008), the probability of selecting a given sample s does not depend directly
on the quantities of the model, but on the weights of the networks’ cells.
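As an illustration of how expression (4.3) can be evaluated, the following Python sketch computes the probability of an ordered sample on a toy region. It is not the original implementation (which was written in R): the function name, the data structures and the toy region are hypothetical, and for simplicity a single weight vector π is used for all draws, whereas in the two-stage scheme the weights switch from constant to ω at draw m1 + 1.

```python
from fractions import Fraction


def sample_probability(networks, weights, sample):
    """Evaluate expression (4.3) for an ordered sample of networks.

    networks: list of sets of cell ids that partition the region R
              (an empty cell is a network of size one)
    weights:  dict mapping each cell id to its weight pi(c)
    sample:   ordered list of indices into `networks` (i_1, ..., i_m)
    """
    total = sum(Fraction(w) for w in weights.values())
    removed = Fraction(0)   # weight of the cells surveyed in earlier draws
    selected = set()        # indices of the networks already in the sample
    prob = Fraction(1)
    for i in sample:
        size = len(networks[i])
        # G_{i_j, j}: cells of the still-unselected networks with the same size
        numerator = sum(
            Fraction(weights[c])
            for k, net in enumerate(networks)
            if k not in selected and len(net) == size
            for c in net
        )
        prob *= numerator / (total - removed)
        selected.add(i)
        removed += sum(Fraction(weights[c]) for c in networks[i])
    return prob


# Hypothetical toy region: six cells, two networks of size two, two of size one.
toy_networks = [{0, 1}, {2}, {3}, {4, 5}]
toy_weights = {cell: 1 for cell in range(6)}   # constant weights (first stage)
print(sample_probability(toy_networks, toy_weights, [0, 2]))  # prints 1/3
```

With constant weights, the first factor is 4/6 (four cells belong to unselected networks of size two) and the second is 2/4 (two cells in networks of size one, out of the four cells not yet surveyed), giving 1/3.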

The cells that compose each non-sampled network are defined from their allocation

process (described in Subsection 4.1.1.1), which directly impacts the weights ω used to

select the second sample. Therefore, the proposed model must properly determine the

cells that compose the out-of-sample nonempty networks. If a cell is part of Cs in a

large number of MCMC iterations, this cell tends to be a nonempty cell of R and the

associated posterior mean will be high, while, if a cell does not compose Cs in a large

number of MCMC iterations, this cell tends to be an empty cell of R. It is expected that

this novel sampling method based on weights will lead us to a more efficient selection of

networks, as we are assigning higher chances to the cells where the phenomenon of interest

is expected to be found and avoiding sampling in areas where the expected intensity of

the phenomenon’s occurrence is low.

The proposed sampling methodology consists of the following steps:

(1) Consider a region R containing a rare, clustered population, partitioned into M

cells and draw an adaptive cluster sample of m1 networks, which is equivalent to

drawing a sample of m1 networks with probability proportional to their sizes, i.e.,

the elements of the vector of probabilities π are constant;

(2) Fit the proposed model in equation (4.1) to this first sample to obtain the posterior

mean of the cells’ counts η(c), given in the vector ω, which will be used as weights

to select the second sample;

(3) Since the cells of the networks in the first sample must not be drawn in the second
sampling stage, set the weights associated with the cells of the first sample to zero, as
well as those of the nonempty networks’ border cells;

(4) From the remaining cells of R, draw m2 networks with probability proportional
to the weights ω;

(5) Finally, fit the proposed model in equation (4.1) to the final sample of size m =

m1 +m2 to estimate the population total.
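The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, neighbourhoods are taken to be the four adjacent cells, and the model fits of steps (2) and (5) are not reproduced, so the vector `omega` is a user-supplied stand-in for the posterior means obtained from the first fit.

```python
import random


def neighbours(cell, n_rows, n_cols):
    """Up, down, left and right neighbours of a grid cell (assumed definition)."""
    r, c = cell
    steps = ((-1, 0), (1, 0), (0, -1), (0, 1))
    return [(r + dr, c + dc) for dr, dc in steps
            if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols]


def network_of(cell, counts, n_rows, n_cols):
    """The cell itself if empty, else the connected set of nonempty cells."""
    if counts[cell] == 0:
        return {cell}
    network, stack = {cell}, [cell]
    while stack:
        for nb in neighbours(stack.pop(), n_rows, n_cols):
            if counts[nb] > 0 and nb not in network:
                network.add(nb)
                stack.append(nb)
    return network


def border_of(network, counts, n_rows, n_cols):
    """Empty cells adjacent to the cells of a network."""
    return {nb for cell in network
            for nb in neighbours(cell, n_rows, n_cols) if counts[nb] == 0}


def draw_networks(weights, m, counts, n_rows, n_cols, rng):
    """Draw m networks; each draw picks a cell with probability proportional
    to its weight and then removes the whole network of that cell."""
    weights = dict(weights)
    sample = []
    while len(sample) < m:
        cells = sorted(c for c, w in weights.items() if w > 0)
        if not cells:
            break
        cell = rng.choices(cells, weights=[weights[c] for c in cells])[0]
        network = network_of(cell, counts, n_rows, n_cols)
        sample.append(network)
        for c in network:
            weights[c] = 0.0
    return sample


def two_stage_sample(counts, omega, m1, m2, n_rows, n_cols, seed=0):
    rng = random.Random(seed)
    # Step (1): first stage with constant weights.
    s1 = draw_networks({c: 1.0 for c in counts}, m1, counts, n_rows, n_cols, rng)
    # Step (2) is the model fit; `omega` stands in for its output here.
    # Step (3): zero the weights of first-sample cells and of the borders
    # of the nonempty networks in the first sample.
    w2 = dict(omega)
    for network in s1:
        zeroed = set(network)
        if any(counts[c] > 0 for c in network):
            zeroed |= border_of(network, counts, n_rows, n_cols)
        for c in zeroed:
            w2[c] = 0.0
    # Step (4): second stage with probability proportional to omega.
    s2 = draw_networks(w2, m2, counts, n_rows, n_cols, rng)
    return s1, s2


# Hypothetical 4 x 4 population with two nonempty networks.
n_rows = n_cols = 4
counts = {(r, c): 0 for r in range(n_rows) for c in range(n_cols)}
counts[(0, 0)], counts[(0, 1)], counts[(3, 3)] = 3, 2, 5
omega = {c: 1.0 + counts[c] for c in counts}   # placeholder weights, not a model fit
s1, s2 = two_stage_sample(counts, omega, m1=3, m2=3, n_rows=n_rows, n_cols=n_cols)
```

Drawing a cell with constant weight and keeping its whole network is what makes the first stage equivalent to sampling networks with probability proportional to their sizes, as stated in step (1).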

4.1.2.1 Border-sampling procedure

Through the proposed sampling method, we survey a selected grid cell and, if it is
nonempty, the entire network containing the selected grid cell. It is important to remark
that nonempty networks are surrounded by empty cells that compose their border, which
are not removed from R unless they are drawn as an empty network. Thus, a surveyed
border cell can be drawn later, although we know that it is empty.

To avoid surveying the same border cell twice, we propose an alternative sampling

method, given as follows: draw a grid cell from R with probability proportional to the

weights π, survey that grid cell and, if it is nonempty, survey the entire network contain-

ing the selected cell. After removing this network and its border from the population,

select a new cell from the remaining set of grid cells and proceed in this way until we

have selected m networks in the sample. In practice, proceeding this way is equivalent
to surveying clusters instead of networks, though the final sample structure is the same as
before but containing the border cells’ information. Note that the only change in this
method, in comparison with the method previously presented in Subsection 4.1.2, is that
the border cells cannot be re-sampled.

The inference procedure is the same as Subsection 4.1.1 except for the joint distri-

bution of all the quantities in the model (4.2) since the probability of selecting a given

sample s has changed. Let cj be the set of sampled cells in the j-th draw. In this case, cj

is composed of the drawn grid cell and, if it is nonempty, the entire network containing

the selected grid cell plus its border. This subtle change in the sets cj, for j = 1, . . . ,m,

incorporates the sampling modification. Thus, the expression of the probability of se-

lecting the sample s of m networks is the same as in (4.3) except for the definition of the

set cj.
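A minimal Python sketch of the border-sampling draw is given below. The function and argument names are hypothetical, and the networks and borders are assumed to be known in advance for the toy population. One further assumption of the sketch is that a border cell already surveyed as an empty network is not counted again in a later set cj.

```python
import random


def border_sampling(weights, network_of, border_of, m, seed=0):
    """Draw m clusters (network plus border) sequentially.

    weights:    dict cell -> weight pi(c)
    network_of: dict cell -> frozenset with the cells of that cell's network
    border_of:  dict cell -> frozenset with the border cells of that network
                (empty for an empty cell)
    """
    rng = random.Random(seed)
    remaining = set(weights)
    sample = []
    while len(sample) < m:
        cells = sorted(c for c in remaining if weights[c] > 0)
        if not cells:
            break
        cell = rng.choices(cells, weights=[weights[c] for c in cells])[0]
        # c_j: the drawn network plus its border, restricted to cells
        # that have not been surveyed in an earlier draw
        c_j = (network_of[cell] | border_of[cell]) & remaining
        sample.append(c_j)
        remaining -= c_j   # network and border leave the population
    return sample


# Hypothetical strip of six cells; cells 1 and 2 form the only nonempty network.
nonempty = frozenset({1, 2})
strip_network = {c: nonempty if c in nonempty else frozenset({c}) for c in range(6)}
strip_border = {c: frozenset({0, 3}) if c in nonempty else frozenset() for c in range(6)}
strip_weights = {c: 1.0 for c in range(6)}
clusters = border_sampling(strip_weights, strip_network, strip_border, m=3)
```

Compared with the network-sampling sketch of Subsection 4.1.2, the only difference is the set removed after each draw, which mirrors the change in the definition of cj.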

4.2 A first simulation study

In order to assess the effectiveness of each methodology, we compared the results

of our approach to those obtained in Rapley & Welsh (2008). In this Section, we will

refer to the model of Rapley & Welsh (2008) as the ‘network-sampling aggregated model’

(MAN) since the sampling procedure proposed by Salehi M. & Seber (1997) allows the

networks’ border cells to be re-selected. Analogously, the model proposed in Section 4.1

considering the sampling procedure first presented in Subsection 4.1.2 will be referred

to as the ‘network-sampling disaggregated model’ (MDN), and the one considering the

methodology proposed in Subsection 4.1.2.1 as the ‘cluster-sampling disaggregated model’

(MDC), since the sampling procedure does not allow the networks’ border cells to be re-

sampled, i.e., the entire cluster is sampled.

The fixed population used here was generated based on the disaggregated model

presented in equation (4.1) according to the fixed parameters (α, β) = (0.1, 0.1) and

(θ0, θ1) = (2.7, 0.5). The fictional covariate was simulated from a Gaussian process. Figure

4.3 shows the generated population and covariate in a grid with M = 400 cells. Note

that these counts are sparse and clustered, motivating the use of adaptive sampling.

We can observe from Figure 4.4 that the higher the generated covariate value, the

higher the associated count of nonempty cells. Moreover, the simulated interest event is

associated with higher values of the covariate since there is no occurrence of that in cells

with lower covariate values.

Figure 4.3: Values of the generated covariate (gray background) and counts for each nonempty
cell in a grid with M = 400 cells.

[Figure 4.4 axes: Covariate versus Counts (legend: Nonempty cell, Empty cell); histogram of the covariate (Density).]

Figure 4.4: Plot of the generated covariate versus the population counts and covariate’s
histogram, where the black points in the histogram represent the nonempty cells.

The study consists of drawing 100 nonempty samples of m = 40 networks of the
population according to each method. The proposed sampling methodologies are divided
into two stages, where we sample m1 = 20 networks randomly and m2 = 20 networks
based on weights ω, according to the proposed sampling methodology. Note that we
are using an initial sample of size 50% of m to obtain these weights and one can adjust
this percentage (see Subsection 4.3.1). In the study performed in this Section, we are
omitting samples that consist of only empty networks since our proposed models require
at least one nonempty network in the first sample to adjust the weights ω properly.

The MCMC algorithm was implemented in the R programming language, v. 3.6.1 (R
Core Team, 2019). For each sample and fitted model, we ran two parallel chains starting
from different initial values, let each chain run for 40,100 iterations, discarded the first
100 as burn-in, and stored every 20th iteration to obtain 2,000 independent samples per chain.
We used the diagnostic tools available in the package CODA (Plummer et al., 2006) to
check convergence of the chains. Convergence results of Subsection 4.3.2 are available in
Appendix B.

Table 4.1 presents a summary comparison of the population total estimators using the
relative root mean square error (RRMSE), relative absolute error (RAE), relative bias
(RB), relative width (RW) and the empirical coverage of the 95% credibility intervals,
measured in percentages (Cov.). These results are obtained
considering all 100 samples generated according to each model.
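To make these measures concrete, the sketch below computes them from a set of point estimates and 95% credibility interval limits. The formulas are common textbook definitions and are an assumption here; the exact definitions adopted in this work are those given where the measures are introduced, and the function name and toy numbers are hypothetical.

```python
import math


def summary_measures(estimates, lowers, uppers, true_total):
    """Relative error measures of estimates of a known population total."""
    n = len(estimates)
    errors = [est - true_total for est in estimates]
    widths = [u - l for l, u in zip(lowers, uppers)]
    covered = [l <= true_total <= u for l, u in zip(lowers, uppers)]
    return {
        "RRMSE": math.sqrt(sum(e * e for e in errors) / n) / true_total,
        "RAE": sum(abs(e) for e in errors) / n / true_total,
        "RB": sum(errors) / n / true_total,
        "RW": sum(widths) / n / true_total,
        "Cov": 100.0 * sum(covered) / n,   # coverage in percent
    }


# Hypothetical toy values: two simulations with true total T = 100.
measures = summary_measures([90, 110], [80, 105], [100, 120], true_total=100)
print(measures)
```

In this toy case the two errors cancel, so the relative bias is zero while RRMSE and RAE are both 0.1, which illustrates why RB alone does not describe the accuracy of an estimator.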

Table 4.1 shows that the Bayes estimator produced by the MDN fit has a smaller RRMSE
and RAE than the MAN and MDC estimators. The MAN model produced the smallest
RB in absolute value, but it seems to be less efficient than the other models according to
the other error measurements. Although the MDC produces 95% credibility intervals with
a higher coverage percentage than the others, its relative width (RW) is not the smallest one.
The proposed model MDN appears to be more efficient when applied to these artificial samples.

Model   RRMSE   RAE     RB       RW      Cov. (%)
MAN     0.296   0.236   -0.009   0.879    95.88
MDN     0.265   0.209   -0.022   0.756    97.94
MDC     0.283   0.217    0.014   0.878   100.00

Table 4.1: Summary measurements of the point and 95% credibility interval estimates of

the population total T , obtained by fitting MAN, MDN and MDC models under 100 samples

according to each model.

In a similar way, Figure 4.5 shows the boxplots of the RRMSE, RAE, RB and RW of

the Bayes estimators obtained when fitting each model, based on all 100 samples. Here

again, we see that the RRMSE, RAE and RW obtained for MDN model are lower than

the others. Note that the RB distributions are quite similar although MDN has a smaller

variability than the others.

[Figure 4.5 panels: RRMSE, RAE, RB and RW, each with boxplots for MAN, MDN and MDC.]

Figure 4.5: Boxplots with measurements of the point and 95% credibility interval estimates
for T over 100 simulations obtained for the fits of MAN, MDN and MDC models.

Finally, Figure 4.6 presents the barplots with the relative frequency of the number of
networks sampled from 100 fits of each model. Note that the sampling procedure of
the proposed disaggregated models tends to sample more networks than the aggregated
one. In particular, the MDN sampling procedure provides more samples containing the
whole population than the others.

[Figure 4.6 panels: MAN, MDN and MDC; horizontal axis: number of networks sampled (1 to 5); vertical axis: relative frequency.]

Figure 4.6: Relative frequency of the number of networks sampled over 100 simulations
obtained for the fits of MAN, MDN and MDC models.

Since, among the proposed methodologies, the MDN model (which allows the border
cells to be re-sampled) yielded better results than the MDC model, we will focus on
studying the properties of the MDN model from now on, as well as on a more extensive
comparison with the aggregated model and an application to real data.

4.3 Comparison with the aggregated model

To assess the effectiveness of our proposed methodology, we compared the results of

our approach considering the MDN model of Section 4.2, which will be simply referred

to as disaggregated model (MD), to those obtained in Rapley & Welsh (2008), called

aggregated model (MA). The first comparison consists of a design-based experiment,

where the numbers of networks m1 and m2 selected are studied, and the second one is a

real experiment with an African Buffalo population in an area of East Africa.

4.3.1 A design-based experiment evaluating the sample fraction

The purpose of this simulation study is to compare the performance of the aggregated
and disaggregated models when the population is generated according to the
disaggregated model, and to study how the choice of the numbers of networks m1 and m2
selected, respectively, in the first and second sampling stages, affects the population
total estimates. We considered twelve scenarios to evaluate how the sample size m and the
numbers m1 and m2 affect the population total estimates under the disaggregated model.
We fixed the total sample size m ∈ {30, 40, 50} and the percentage of the m networks to be
sampled in the first sampling stage at {35%, 50%, 65%, 80%}, i.e., the numbers m1 and m2
depend on these percentages. We used the same population generated in the simulation
study presented in Section 4.2, which is distributed in a region with M = 400 cells, drew
500 samples according to each scenario and methodology, and fitted both models to
evaluate their performances. Note that the aggregated model’s sampling methodology
considers only one sample of size m, collected as in the first sampling stage of the proposed
methodology. Table 4.2 shows the values of the sample size m and the respective numbers
m1 and m2 of networks selected in the first and second sampling stages, according to each
fixed percentage.

         35%         50%         65%         80%
m      m1   m2     m1   m2     m1   m2     m1   m2
30     10   20     15   15     20   10     24    6
40     14   26     20   20     26   14     32    8
50     18   32     25   25     32   18     40   10

Table 4.2: Values of the sample sizes m, m1 and m2, according to each fixed percentage.
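The values in Table 4.2 appear to be consistent with taking m1 as the given percentage of m rounded to the nearest integer, with ties resolved towards the even integer (for instance, 10.5 becomes 10 and 19.5 becomes 20). This rounding rule is an inference from the table, not something stated in the text; the short Python sketch below, with a hypothetical function name, reproduces the table under that assumption.

```python
from fractions import Fraction


def stage_sizes(m, percent):
    """Split m networks into (m1, m2), with m1 equal to percent% of m
    rounded half-to-even (the behaviour of round() on a Fraction)."""
    m1 = round(Fraction(m * percent, 100))
    return m1, m - m1


for m in (30, 40, 50):
    print(m, [stage_sizes(m, p) for p in (35, 50, 65, 80)])
```

Exact rational arithmetic is used so that ties such as 10.5 and 32.5 are not disturbed by floating-point representation.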

Table 4.3 displays some of the frequentist properties of the estimators obtained by
fitting the proposed disaggregated and aggregated models. In general, increasing the
sample size leads to smaller errors and relative widths (RW), so it is expected that errors
associated with the disaggregated model for m = 50 are smaller than the ones that
consider m = 30 and m = 40. Note that, as the percentage of networks sampled in the
first sampling stage decreases, the disaggregated model performs better according to the
RRMSEs and RAEs, since their values become smaller. Moreover, these error values
associated with the aggregated model are higher than the ones obtained under the proposed
model fit, except when the sampling proportion is fixed at 80% with m = 40 (both
measures) or with m = 30 (RAE only). The relative biases of the fitted MA models are
smaller in absolute value than the ones produced by the MD models and both seem
to underestimate the population total T. The relative widths of the proposed model
are smaller than the ones provided by the aggregated model in all scenarios, while
still producing higher coverages. Overall, the disaggregated model presents a better
performance than the aggregated model.

m    Model   RRMSE   RAE     RB       RW      Cov. (%)
30   MD35%   0.334   0.263   -0.030   1.027   100.00
30   MD50%   0.335   0.267   -0.052   1.010   100.00
30   MD65%   0.343   0.278   -0.082   1.030   100.00
30   MD80%   0.350   0.287   -0.106   1.029   100.00
30   MA      0.357   0.283   -0.015   1.119    99.00
40   MD35%   0.262   0.207   -0.022   0.761    99.80
40   MD50%   0.267   0.214   -0.042   0.762    99.60
40   MD65%   0.275   0.224   -0.070   0.768    99.20
40   MD80%   0.302   0.252   -0.110   0.791    98.40
40   MA      0.294   0.235   -0.018   0.867    96.00
50   MD35%   0.226   0.178   -0.015   0.619    99.80
50   MD50%   0.227   0.180   -0.022   0.603    98.60
50   MD65%   0.235   0.190   -0.045   0.614    98.60
50   MD80%   0.245   0.202   -0.073   0.617    98.40
50   MA      0.252   0.204   -0.015   0.697    90.80

Table 4.3: Summary measurements of the point and 95% credibility interval estimates for T
over 500 simulations obtained for the fits of the disaggregated and aggregated models,
considering different sample sizes and proportions.

Figure 4.7 presents the boxplot of some of the measurements associated with the
estimates for T , and again it suggests that the disaggregated model performs better
taking into account the variation of these values. In particular, there is an increase in
RRMSEs and RAEs quartiles as we increase the percentage of networks sampled in the
first sampling stage.

[Figure 4.7 panels: RRMSE, RAE, RB and RW, each for m = 30, m = 40 and m = 50, with boxplots for MD35%, MD50%, MD65%, MD80% and MA.]

Figure 4.7: Boxplots with measurements of the point and 95% credibility interval estimates
for T over 500 simulations obtained for the fits of the disaggregated and aggregated models,
considering different sample sizes and proportions.

Figure 4.8 presents the barplots with the relative frequency of the number of networks
sampled from 500 simulations, according to each scenario and sampling methodology.
Note that the proposed methodology provides a higher number of samples containing
the entire population than the aggregated model and, in particular, the lower the
percentage of networks sampled in the first stage, the greater the number of networks
sampled, since the weights are used earlier. Moreover, as expected, we observed more
nonempty sampled networks as we increased the sample size.

[Figure 4.8 panels: rows m = 50, m = 40 and m = 30; columns MD35%, MD50%, MD65%, MD80% and MA; horizontal axis: number of networks sampled (1 to 5); vertical axis: relative frequency.]

Figure 4.8: Relative frequency of the number of networks sampled over 500 simulations
obtained for the fits of the disaggregated and aggregated models, considering different sample
sizes and proportions.

To evaluate the effect of the number of networks sampled on the total estimates, Figure
4.9 shows some measurements of the relative bias estimates of T from 500 simulations,
according to each scenario and sampling methodology. For each scenario and sample size
combination, the number of simulations distributed among the five possible numbers of
sampled networks is given below the gray bars, with few simulations in some cases, and
the coverage of the 95% credibility intervals is shown above the gray bars. Our proposal
distribution may lead to no out-of-sample nonempty cells and, consequently, no
out-of-sample networks, unlike the approach of Rapley & Welsh (2008). Therefore,
when we sample five networks, the aggregated model tends to overestimate more than the
disaggregated models, and its coverage is zero regardless of the sample size m. Notice that
for m = 30, all 95% credibility intervals associated with the disaggregated models include the
true value of T , while for m = 40 this happens when we sample more than one network
and, for m = 50, when we sample more than two networks. Moreover, among the
disaggregated models, the model that provides the largest coverage for all sample sizes
is MD35%.

[Figure 4.9 panels: m = 30, m = 40 and m = 50; horizontal axis: number of networks sampled (1 to 5), with bars for MD35%, MD50%, MD65%, MD80% and MA; vertical axis: relative bias.]

Figure 4.9: Mean (white line) and 2.5% and 97.5% quantiles (gray bars) of the relative bias
estimates of T over 500 simulations obtained for the fits of the disaggregated and aggregated
models, considering different sample sizes, proportions, numbers of networks sampled (number
of simulations given below the gray bars) and coverage (above the gray bars).

Based on this study, the disaggregated model provides a more efficient sample, with
a larger number of networks than the aggregated model. Regarding the estimators’
performance, the MD35% model proved to be more efficient and, therefore, we will
concentrate on studying the properties of this model in an application to real data.

4.3.2 A real application

In this subsection, we analyze the performance of the disaggregated model with 35%
of the m networks sampled in the first sampling stage (MD35%) and the aggregated model
(MA) using a real dataset. In order to simplify the notation, we will refer to the MD35% model
as MD. The study variable considered is the number of African Buffaloes in an area
of East Africa, while the auxiliary variable is the altitude (in meters). The choice of
Buffalo and altitude was motivated by the fact that Buffaloes drink a lot of water (Prins,
1996) and their spatial distribution depends on the prevailing climatic conditions (Bennitt
et al., 2014), which are related to the altitude. Thus, areas with higher temperatures (lower
altitude) lose surface water (lakes or rivers) due to evaporation, attracting few or
no Buffaloes.

The data on African Buffalo was obtained from maps produced from an aerial census.

The census was conducted by the Kenya Wildlife Service, the Tanzania Wildlife Research

Institute, and other partners during the wet season in the year 2010, covering an area of
about 24,108 km². The area covered was the Amboseli-West Kilimanjaro/Magadi-Natron
cross-border landscape, which covers parts of Kenya and Tanzania. The auxiliary data
over the study area were obtained from the Shuttle Radar Topography Mission (SRTM)
database, freely available for download from https://www2.jpl.nasa.gov/srtm/. In

particular, we used the altitude in a logarithm scale as covariate, since its values have a

smaller order of magnitude. Figure 4.10 presents the distribution and counts of the Buf-

falo in the study region along with pixels of auxiliary variables and shows that Buffaloes

are mostly found in areas of higher altitudes.

Figure 4.10: Altitude in a logarithm scale (gray background) and counts of African Buffaloes

over parts of Kenya and Tanzania in 2010 in a grid with M = 391 cells.

From Figure 4.11, we can notice that Buffaloes tend not to be in areas in which

the associated covariate has extreme values, i.e. they are concentrated in areas with

intermediate log altitude values.

Figure 4.11: Plot of altitude in a logarithm scale versus counts of African Buffaloes, and
covariate’s histogram, where the black points in the histogram represent the nonempty cells.

Based on the relationship between the Buffalo counts and altitude, it seems natural
to add the square of the covariate as another explanatory covariate, allowing us to model
more accurately the effect of log altitude, which has a non-linear relationship with the
Buffalo counts. Since we are using highly correlated covariates, centering them is helpful
for the numerical schemes to converge. Thus, let v(c) be the centered log altitude for
c ∈ R, and v²(c) its respective square. Now, the covariate vector associated with the
cell c in the proposed model (4.1) is given by vc, for all c ∈ R. Introducing the squared
log altitude requires taking into account the high correlation between covariates in the
proposal distribution, which is detailed in Section A.1.
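The benefit of centering can be checked numerically. The sketch below uses a hypothetical, evenly spaced set of log altitudes (not the real data) and a hand-written Pearson correlation to compare the correlation between the covariate and its square before and after centering.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log altitudes, evenly spaced between 6.0 and 8.0.
log_altitude = [6.0 + 0.1 * i for i in range(21)]
mean_log = sum(log_altitude) / len(log_altitude)
centered = [a - mean_log for a in log_altitude]

# Correlation between the covariate and its square, raw and centered.
raw_corr = pearson(log_altitude, [a * a for a in log_altitude])
centered_corr = pearson(centered, [v * v for v in centered])
print(raw_corr, centered_corr)   # close to 1 versus close to 0
```

For a symmetric set of values such as this one, centering removes the linear correlation between the covariate and its square almost entirely; with real, asymmetric data the reduction would typically be smaller but still substantial.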

Recall that in the allocation process, described in Subsection 4.1.1, the available
cells of R are sampled with probability proportional to λ, the set of intensities assigned
to the cells of R, which is obtained through information from sampled nonempty cells. Note
that the real population used in this section (see Figure 4.10) is extremely rare, with
small networks. Thus, the sample may contain only a few nonempty cells and,
consequently, little information to estimate λ. Therefore, in this application, the weights
used in the allocation process will be the probability of each cell not being empty, which
is estimated from all sampled cells (empty and nonempty). Let φ(c) = 1 if c is a
nonempty grid cell and φ(c) = 0 otherwise. Define ν(c) = P (φ(c) = 1), the probability
that c is a nonempty cell, and ν = (ν(c1), . . . , ν(cM))′ the set of these probabilities, for
all cells c of R. Thus, the available cells of R are sampled with probability proportional to
ν. To obtain these probabilities, the following structure will be included in the proposed
model (4.1):

$$
\phi(c) \mid v_c, \rho \sim \text{Bernoulli}\left( \frac{1}{1 + \exp\{-v_c^{\prime} \rho\}} \right), \quad c \in R, \qquad (4.4)
$$

where ρ = (ρ0, ρ1, ρ2)′ represents the regression coefficients vector associated with the
covariates vc = (1, v(c), v²(c))′. Note that, to estimate ρ, we will use the information
from all empty and nonempty sample cells, unlike θ, which only takes into account
nonempty cells of the sample. The full conditional posterior distribution of ρ and the
methods adopted to sample from it are detailed in Appendix A, Section A.5.
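As a small numerical illustration of the structure in (4.4), the Python sketch below evaluates the probability ν(c) of a cell being nonempty for a given coefficient vector. The function name and the coefficient values are hypothetical and chosen only for illustration; they are not estimates obtained in this work.

```python
import math


def nonempty_probability(v, rho):
    """nu(c) under (4.4), where v is the centered log altitude of the cell
    and rho = (rho0, rho1, rho2) are the regression coefficients."""
    linear = rho[0] + rho[1] * v + rho[2] * v * v   # v_c' rho
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients: a negative rho2 favours intermediate altitudes.
rho = (-1.0, 0.0, -2.0)
centered_log_altitude = [-1.0, -0.5, 0.0, 0.5, 1.0]
nu = [nonempty_probability(v, rho) for v in centered_log_altitude]

# Allocation probabilities proportional to nu, as used in the allocation process.
allocation = [p / sum(nu) for p in nu]
print(nu)
print(allocation)
```

With a negative coefficient for the squared term, ν(c) peaks at intermediate values of the centered log altitude and decreases towards both extremes, which is the pattern described for the Buffalo counts.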

Following the Bayesian paradigm, independent priors are also assumed for the unknown
parameters ρ, whose marginal prior distribution is denoted by [ρ]. Let [ρ] be a
non-informative prior with a zero-mean vector and covariance matrix σ_ρ² I_{k+1}, where I_{k+1}
denotes the (k + 1)-dimensional identity matrix. Since the African Buffalo population
is extremely sparse and clustered, we set the prior distribution parameters of α,
presented in Subsection 4.1.1, to aα = 3 and bα = 50, to reflect the fact that α is necessarily
small in this population. For β, we keep the prior parameters set as aβ = 1 and
bβ = 9.

The study consists of drawing 500 samples of m = 40 networks from the real population
according to each method. The proposed sampling methodology is divided into two
stages, where 35% of the m = 40 networks are sampled in the first stage, that is,
m1 = 14 networks are sampled randomly and m2 = 26 networks are sampled based on the
weights ω. In this study, we are omitting samples that consist of only empty networks since our
proposed model requires at least one nonempty network in the first sample to adjust the
weights ω properly.

Figure 4.12 presents the boxplots of the RRMSE, RAE, RB and RW of the Bayes

estimators obtained from the disaggregated and aggregated model fits, based on 500

simulations. Note that these measures’ distributions under the disaggregated model are

wider and higher than under the aggregated model.

[Figure 4.12 panels: RRMSE, RAE, RB and RW, each with boxplots for MD and MA.]

Figure 4.12: Boxplots with measurements of the point and 95% credibility interval estimates
for T over 500 simulations obtained for the fits of the disaggregated and aggregated models to
real data.

On the other hand, since the African Buffalo population is extremely sparse and
clustered, many samples consist of networks with few cells each. These samples are
expected to be of limited use in accurately estimating the population total. Thus, the
results must ultimately be affected. Table 4.4 shows that 79.4% of the simulations based
on MD and 84.4% based on MA contain one or two nonempty networks sampled.
Moreover, the aggregated model never sampled five nonempty networks. Thus, it would
be interesting to fix the number of nonempty networks sampled when each method is
used.

Number of nonempty networks sampled       1      2      3      4      5
Disaggregated model (%)                35.4   44.0   18.0    2.2    0.4
Aggregated model (%)                   36.4   48.0   13.6    2.0    0.0

Table 4.4: Percentage of networks sampled over 500 simulations under each model.

Therefore, to facilitate the comparison between the models, we repeated the study
with 100 samples of the real population according to each method, fixing the final number
of nonempty networks sampled; that is, for each number of nonempty networks sampled,
from one to five, we consider 100 simulations. Henceforth, we will refer to the number of
nonempty networks sampled simply as the number of networks sampled.

Table 4.5 displays some of the frequentist properties of the estimators for T obtained
by fitting the disaggregated and aggregated models, for each number of networks sampled.
When the number of networks sampled is fixed at four or five, the proposed model
performs better than the aggregated one according to all the criteria. In general, the
relative biases associated with the disaggregated model are smaller than the ones produced
by the aggregated model, except when we have two networks sampled. Also, the coverages
of the proposed model are at least as high as those of the aggregated model and, as we
increase the number of networks sampled, the MA’s coverage becomes smaller. In particular,
with five networks sampled, none of the 100 95% credibility intervals associated with the
aggregated model includes the true value of T .


Networks sampled          1                 2                 3                 4                 5
                     MD       MA       MD       MA       MD       MA       MD       MA       MD       MA
RRMSE              1.074    0.869    0.823    0.809    0.774    0.893    0.794    1.056    0.757    1.311
RAE                0.662    0.610    0.578    0.621    0.538    0.696    0.572    0.855    0.550    1.117
RB                 0.339    0.538    0.374    0.364    0.428    0.585    0.549    0.832    0.550    1.117
RW                 3.189    2.561    2.442    2.069    2.207    2.186    2.079    2.387    1.895    2.593
Cov. (%)          100.00   100.00   100.00    64.00   100.00    51.00   100.00    32.00    96.00     0.00

Table 4.5: Summary measurements of the point and 95% credibility interval estimates for T over 100 simulations for different numbers of networks sampled, obtained for the fits of the disaggregated (MD) and aggregated (MA) models.


Figure 4.13 presents the boxplots of some of the previous summary measurements for T, and the conclusion is analogous to the previous one. In particular, the relative widths under the disaggregated model decrease as the number of networks sampled increases.

Figure 4.13: Boxplots of the RRMSE, RAE, RB and RW for T over 100 simulations for each number of networks sampled, obtained for the fits of the disaggregated (MD) and aggregated (MA) models to real data.

Finally, a summary comparison of the estimators of the population total T under the disaggregated and aggregated models, using RRMSE, RAE, RB and RW, is presented in Table 4.6, based on the 500 simulations obtained by pooling the 100 simulations with one to five networks sampled. Additionally, we compared these results to the ones obtained by applying Raj’s unbiased estimator, detailed in Salehi M. & Seber (1997). This estimator of the population total is based only on the information contained in the selected networks, i.e., it ignores the information in the border cells. In this case, we used a normal approximation to set the 95% confidence interval for the population total. Table 4.6 shows that both the aggregated model’s estimator and Raj’s estimator have larger RRMSEs, RAEs and RBs than our proposed estimator, although it is well known that Raj’s estimator is unbiased. Raj’s estimator has a much larger variance than its counterparts. The aggregated model produces 95% credibility intervals with lower empirical coverage than the others. Furthermore, our proposed model appears to be more efficient when applied to these data.
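As a reminder of what the normal approximation used for Raj’s estimator amounts to, the sketch below builds a two-sided interval of the form estimate ± z × standard error. The function name `normal_ci` and the numbers in the usage lines are hypothetical and are not taken from the study.

```python
from statistics import NormalDist


def normal_ci(estimate, std_error, level=0.95):
    """Two-sided normal-approximation interval: estimate +/- z * std_error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 when level = 0.95
    return estimate - z * std_error, estimate + z * std_error


# Hypothetical point estimate and standard error, for illustration only.
lower, upper = normal_ci(estimate=1000.0, std_error=250.0)
print(lower, upper)
```

The interval is symmetric around the point estimate, so a large variance of the estimator translates directly into a wide interval.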

Figure 4.14 shows the boxplots of some measurements of the Bayes estimators ob-

tained when fitting each model and Raj’s estimator.
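Since each number of networks sampled contributes the same number of simulations (100), pooling them should amount to averaging the five columns of Table 4.5 for each model. The check below, a sketch that uses only the published table values, suggests that the model rows of Table 4.6 agree with these averages up to rounding.

```python
# Rows of Table 4.5: values for 1, 2, 3, 4 and 5 networks sampled.
table_4_5 = {
    "MD": {
        "RRMSE": [1.074, 0.823, 0.774, 0.794, 0.757],
        "RAE": [0.662, 0.578, 0.538, 0.572, 0.550],
        "RB": [0.339, 0.374, 0.428, 0.549, 0.550],
        "RW": [3.189, 2.442, 2.207, 2.079, 1.895],
        "Cov.": [100.0, 100.0, 100.0, 100.0, 96.0],
    },
    "MA": {
        "RRMSE": [0.869, 0.809, 0.893, 1.056, 1.311],
        "RAE": [0.610, 0.621, 0.696, 0.855, 1.117],
        "RB": [0.538, 0.364, 0.585, 0.832, 1.117],
        "RW": [2.561, 2.069, 2.186, 2.387, 2.593],
        "Cov.": [100.0, 64.0, 51.0, 32.0, 0.0],
    },
}

# Model rows of Table 4.6 (pooled over the 500 simulations).
table_4_6 = {
    "MD": {"RRMSE": 0.845, "RAE": 0.580, "RB": 0.448, "RW": 2.362, "Cov.": 99.2},
    "MA": {"RRMSE": 0.987, "RAE": 0.780, "RB": 0.687, "RW": 2.359, "Cov.": 49.4},
}

# Average each measure over the five numbers of networks sampled.
pooled = {
    model: {name: sum(values) / len(values) for name, values in rows.items()}
    for model, rows in table_4_5.items()
}

# Largest absolute gap between the averages and the published pooled values.
max_gap = max(
    abs(pooled[model][name] - table_4_6[model][name])
    for model in pooled
    for name in pooled[model]
)
print(max_gap)
```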

                        RRMSE     RAE      RB      RW    Cov. (%)
Disaggregated model     0.845   0.580   0.448   2.362     99.2
Aggregated model        0.987   0.780   0.687   2.359     49.4
Raj’s estimator         1.018   1.018   0.967   4.222     99.0

Table 4.6: Summary measurements of the point and interval estimates of the population total, obtained by fitting the disaggregated and aggregated models and Raj’s estimator.

Figure 4.14: Boxplots of the RRMSE, RAE, RB and RW for T over 100 simulations for each number of networks sampled, obtained for the fits of the disaggregated (MD) and aggregated (MA) models and for Raj’s estimator (Raj).

In addition to other advantages previously seen, incorporating covariates into the model allows us to locate the out-of-sample nonempty cells spatially. Figure 4.15 presents,

for one sample of each number of networks sampled, a map of the posterior mean of the African Buffalo counts, i.e., the posterior mean of η(c) for every out-of-sample cell c of R. Note that the lighter the cell’s color, the higher the posterior mean of that cell. Due to the allocation process, the count estimates associated with the border cells of the sampled nonempty networks are equal to zero (hatched black cells). Note that the maps become darker as we increase the number of sampled networks, i.e., samples with more nonempty networks tend to yield lower out-of-sample count estimates. Moreover, with one to four networks sampled, the out-of-sample nonempty networks are located in lighter areas, indicating that the model can predict where they are. With five networks sampled, there are no out-of-sample nonempty networks; nevertheless, given the proposed model structure and the relation between the Buffalo counts and the covariate, we believe that the lighter areas would be more conducive to the establishment of new populations.

Figure B.1 in Appendix B shows the trace plots with the posterior distributions of the parameters α and β and the population total T when fitting the disaggregated model to one of the samples selected for each number of networks sampled. Table B.1 in Appendix B presents the values of the Geweke criterion. Analyzing Figure B.1 and Table B.1 leads us to conclude that convergence appears to have been reached. The same conclusion was reached for all 600 samples selected from this population.

Figure 4.15: Maps of the posterior mean of African Buffalo counts η(c) (gray background) for every out-of-sample cell c of R, for each number of sampled networks, and its population (points) distributed in a region with 391 cells. Panels: (a) one network sampled; (b) two networks sampled; (c) three networks sampled; (d) four networks sampled; (e) five networks sampled. The lighter the cell’s color, the higher the posterior mean of that cell. The blue cells are the sampled networks’ cells and the hatched cells correspond to the nonempty networks’ border cells.

4.4 Model-based experiment under different settings

To examine the performance of the proposed methodology under several scenarios, 500 populations were generated for each of four scenarios, which were created by varying the values of the parameters (α, β). In particular, populations were simulated for the 4 pairs (α, β) with α, β ∈ {0.10, 0.15}, which were set to create different degrees of rarity and clustering. Then, an adaptive cluster sample of final size m = 40 was selected from each population, with 35% of it being sampled randomly, i.e., the first-stage sample size is m1 = 14 and the second-stage sample size is m2 = 26.
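The design of this experiment can be written down in a few lines. The sketch below only enumerates the four (α, β) scenarios and derives the stage sizes from m = 40 and the 35% share; it does not generate populations, and the variable names are chosen here for illustration.

```python
from itertools import product

# The four scenarios: every pair (alpha, beta) with alpha, beta in {0.10, 0.15}.
values = (0.10, 0.15)
scenarios = list(product(values, repeat=2))

m = 40                 # final sample size
initial_share = 0.35   # proportion of the sample drawn randomly in the first stage
m1 = round(initial_share * m)   # first-stage sample size
m2 = m - m1                     # second-stage sample size
n_populations = 500             # populations generated per scenario

print(scenarios, m1, m2)
```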

Table 4.7 shows summary statistics with some frequentist measurements of the posterior distributions of the model parameters for each of the four evaluated scenarios. Note that the less rare and clustered the population is, the narrower the 95% credibility intervals are, and the greater the tendency for the model to underestimate its parameters. Moreover, the RRMSEs and RAEs do not vary much. In addition, the rarer and more clustered the population, the larger the coverage of the 95% credibility intervals for the population total T, while the coverages for α and β are close to the nominal level.

              (α, β) = (0.10, 0.10)         (α, β) = (0.10, 0.15)
                 T        α        β           T        α        β
RRMSE          0.306    0.352    0.562       0.321    0.340    0.467
RAE            0.238    0.288    0.465       0.262    0.279    0.393
RB             0.006    0.037   -0.066      -0.092   -0.069   -0.146
RW             0.794    1.036    1.690       0.883    1.076    1.375
Cov. (%)       95.00    93.60    95.00       99.00    98.80    95.00

              (α, β) = (0.15, 0.10)         (α, β) = (0.15, 0.15)
                 T        α        β           T        α        β
RRMSE          0.242    0.273    0.490       0.292    0.314    0.412
RAE            0.209    0.236    0.402       0.264    0.278    0.345
RB            -0.114   -0.109   -0.022      -0.216   -0.209   -0.126
RW             0.495    0.701    1.513       0.562    0.723    1.225
Cov. (%)       89.60    93.80    97.00       86.40    91.60    96.00

Table 4.7: Summary measurements of the point and 95% credibility interval estimates of the proposed model and population parameters over 500 simulations for different values of α, β and T.

Figure 4.16 presents the boxplots of some of the previous summary measurements for T. Note that, fixing α, as β goes from 0.10 to 0.15, the RRMSEs, RAEs and RWs increase. Additionally, the underestimation of T grows as α and β increase, that is, as the population becomes less rare and less clustered, in agreement with the relative biases in Table 4.7.

Finally, considering the population total T, the scenario generated with α = 0.15 and β = 0.10 yielded the lowest RRMSE and RAE, as well as the smallest relative width.

Figure 4.16: Boxplots of the RRMSE, RAE, RB and RW for the estimates of the proposed model and population parameters over 500 simulations for different values of α, β and T.

Chapter 5

Conclusions

We have considered the problem of estimating the total number of individuals in a rare and clustered population. A regular grid is superimposed on the region of interest, locating the clusters, giving them a spatial size and allowing us to model, within this grid structure, the number of individuals selected by adaptive cluster sampling, as described by Salehi M. & Seber (1997).

Our approach is to model the observed counts of the selected grid cells and to use a model-based analysis to estimate the population total using the auxiliary information of covariates. To include this extra knowledge, we proposed a model that is more flexible than the one introduced by Rapley & Welsh (2008), since it models the counts at the cell level instead of the network level and assumes that the intensity in each cell of a cluster is related to the values of the available covariates. Despite the higher computational cost, the proposed methodology, considering a grid with 400 cells, still runs on a home computer (Core i7, 16 GB) in an acceptable time (about 30 minutes on average).

As evidenced by the simulation studies in Sections 4.2 and 4.3, incorporating covariates into the model improved the sampling process by yielding more sampled networks. In practical situations, increasing the number of sampled networks directs and enhances the use of human and material resources, reducing the expenses involved in the sampling procedure. Besides, it is also possible to spatially locate observed and unobserved networks, highlighting areas more conducive to the establishment of the studied population. Moreover, despite the challenges inherent in the spatial prediction problem, more specifically in the allocation of the networks and their counts across the region of interest, the resulting maps adequately indicated where the population under study is located. We also modified the MCMC scheme, obtaining advances in inference: our proposal distribution may lead to no out-of-sample networks, and the performance of our Bayes estimator is substantially better than that of the one proposed by Rapley & Welsh (2008).

Simulation studies have assessed different scenarios varying the percentage of net-

works drawn in the first sampling stage and the parameters used to generate artificial

populations. Our methodology has yielded satisfactory results and, in most cases, bet-

ter than those obtained without additional information according to various comparison

criteria. In the analyzed application, covariate information was successfully incorporated

into the model by including quadratic terms in the linear predictor, evidencing the flex-

ibility of the model to incorporate available auxiliary information. Simulation studies

with a real population have shown that the results are quite satisfactory according to

several comparison criteria, validating the methodology proposed in this dissertation for

practical situations.

The main findings of this work encourage an extension of this model to other spatial structures that reveal more information about the population. It is also reasonable to study the expected cost of the proposed sampling scheme, since sampling more nonempty networks involves visiting more cells. Moreover, the proposed methodology requires at least one nonempty network in the sample to fit the model. It is of interest to propose an alternative method that, instead of fixing m1, samples networks until a stopping criterion is reached. Furthermore, Gonçalves & Moura (2016) proposed a mixture model at a disaggregated level that allows for heterogeneity between networks. Since both models are disaggregated, it would be interesting to compare these methodologies. Finally, we intend to develop a package in the R programming language that allows the replication of the main results of this dissertation.

Appendix A

Full conditional posterior distributions of the parameters in the proposed model

In this appendix we present the posterior full conditional distributions of the components of the parameter vector $\Theta = (\boldsymbol{\eta}_{\bar s}, \mathbf{Y}_{\bar s}, P_{\bar s}, X_{\bar s}, \boldsymbol{\theta}, \beta, \alpha)'$. We denote the posterior full conditional of a parameter $\phi$ in $\Theta$ by $[\phi \mid \cdots]$.

A.1 Full conditional posterior distribution of θ

The posterior full conditional of $\boldsymbol{\theta} = (\theta_0, \theta_1, \ldots, \theta_k)'$ is proportional to

$$
[\boldsymbol{\theta} \mid \cdots] \propto \prod_{c \in C_s} \frac{\exp\{-\exp\{\mathbf{v}_c'\boldsymbol{\theta}\} + \eta(c)\,\mathbf{v}_c'\boldsymbol{\theta}\}}{1 - \exp\{-\exp\{\mathbf{v}_c'\boldsymbol{\theta}\}\}} \times \exp\left\{-\frac{1}{2\sigma_{\theta}^{2}}\,\boldsymbol{\theta}'\boldsymbol{\theta}\right\},
$$

which does not have a closed form. We use the block Metropolis-Hastings algorithm with a multivariate Normal proposal, whose mean vector is the current value of the parameter and whose covariance matrix is fixed at $I_{k+1}\sigma_{\theta^*}^{2}$, where the term $\sigma_{\theta^*}^{2}$ controls the acceptance rates and $I_{k+1}$ denotes the $(k+1)$-dimensional identity matrix.

Considering the structure presented in Subsection 4.3.2, where the squared log altitude is introduced to the model, the covariance matrix is fixed at $(V'V)^{-1}\sigma_{\theta^*}^{2}$, where the term $\sigma_{\theta^*}^{2}$ controls the acceptance rates and $V$ is the matrix with rows $\mathbf{v}_c'$, for every nonempty cell $c$ of the sample, that is, $c \in C_s$.
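A minimal Python sketch of one such block update is given below, assuming the identity-scaled proposal covariance $I_{k+1}\sigma_{\theta^*}^{2}$. The function names, the toy data and the tuning values are hypothetical and serve only to illustrate the mechanics; because the Normal random-walk proposal is symmetric, the acceptance probability reduces to the ratio of the target densities.

```python
import math
import random


def log_target(theta, cells, sigma2_theta):
    """Log full conditional of theta, up to an additive constant.

    cells is a list of (v_c, eta_c): covariate vector and count of cell c.
    """
    total = 0.0
    for v, eta in cells:
        lin = sum(vj * tj for vj, tj in zip(v, theta))  # v_c' theta
        lam = math.exp(lin)
        # zero-truncated Poisson kernel: exp{-lam + eta * lin} / (1 - exp{-lam})
        total += -lam + eta * lin - math.log(-math.expm1(-lam))
    total -= sum(t * t for t in theta) / (2.0 * sigma2_theta)  # Normal prior
    return total


def rw_step(theta, cells, sigma2_theta, sigma2_star, rng):
    """One block update with a Normal proposal centred at the current theta."""
    sd = math.sqrt(sigma2_star)
    proposal = [t + rng.gauss(0.0, sd) for t in theta]
    log_ratio = (log_target(proposal, cells, sigma2_theta)
                 - log_target(theta, cells, sigma2_theta))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal, True
    return theta, False


# Hypothetical toy data: intercept plus one covariate, with positive counts.
cells = [((1.0, 0.2), 3), ((1.0, -0.5), 1), ((1.0, 1.1), 7), ((1.0, 0.4), 2)]
rng = random.Random(42)
theta, accepted = [0.0, 0.0], 0
for _ in range(3000):
    theta, moved = rw_step(theta, cells, sigma2_theta=100.0,
                           sigma2_star=0.05, rng=rng)
    accepted += moved
acceptance_rate = accepted / 3000
print(acceptance_rate)
```

In this sketch `sigma2_star` plays the role of $\sigma_{\theta^*}^{2}$: larger values propose bigger moves and lower the acceptance rate.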


A.2 Full conditional posterior distribution of α

The posterior full conditional of $\alpha$ is proportional to

$$
[\alpha \mid \cdots] \propto \frac{\alpha^{x_s + x_{\bar s} + a_\alpha - 1}\,(1-\alpha)^{M - x_s - x_{\bar s} + b_\alpha - 1}}{1 - (1-\alpha)^{M}},
$$

which is close to a Beta distribution but is not truly a Beta distribution, due to the truncation term. We use the Metropolis-Hastings algorithm with a $\text{Beta}(x_s + x_{\bar s} + a_\alpha,\; M - x_s - x_{\bar s} + b_\alpha)$ proposal.
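With this choice the proposal does not depend on the current value, and the Beta kernel of the target cancels against the proposal density, so the acceptance probability reduces to the ratio of truncation terms, $\min\{1, [1-(1-\alpha)^M]/[1-(1-\alpha^{\text{new}})^M]\}$. The Python sketch below illustrates the update under that reading; the function names and the numerical settings (M = 400, 40 nonempty cells, a Beta(1, 1) prior) are hypothetical.

```python
import random


def accept_prob(current, proposed, M):
    """Acceptance probability for the independence Beta proposal.

    The Beta kernels of target and proposal cancel, so only the ratio of the
    truncation terms 1 - (1 - alpha)^M remains.
    """
    num = 1.0 - (1.0 - current) ** M
    den = 1.0 - (1.0 - proposed) ** M
    return min(1.0, num / den)


def update_alpha(current, x_total, M, a_alpha, b_alpha, rng):
    """One Metropolis-Hastings update; x_total stands for x_s + x_sbar."""
    proposed = rng.betavariate(x_total + a_alpha, M - x_total + b_alpha)
    if rng.random() < accept_prob(current, proposed, M):
        return proposed
    return current


# Hypothetical setting, for illustration only.
rng = random.Random(1)
alpha, draws = 0.5, []
for _ in range(2000):
    alpha = update_alpha(alpha, x_total=40, M=400,
                         a_alpha=1.0, b_alpha=1.0, rng=rng)
    draws.append(alpha)
kept = draws[500:]                      # discard the first draws as burn-in
posterior_mean = sum(kept) / len(kept)
print(posterior_mean)
```

When M is large the truncation terms are very close to one, so almost every proposal is accepted. The same construction applies to the update of β, with x in place of M in the truncation term.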

A.3 Full conditional posterior distribution of β

The posterior full conditional of $\beta$ is proportional to

$$
[\beta \mid \cdots] \propto \frac{\beta^{p_s + p_{\bar s} + a_\beta - 1}\,(1-\beta)^{x_s + x_{\bar s} - p_s - p_{\bar s} + b_\beta - 1}}{1 - (1-\beta)^{x}},
$$

where $x = x_s + x_{\bar s}$, which is close to a Beta distribution but is not truly a Beta distribution, due to the truncation term. We use the Metropolis-Hastings algorithm with a $\text{Beta}(p_s + p_{\bar s} + a_\beta,\; x_s + x_{\bar s} - p_s - p_{\bar s} + b_\beta)$ proposal.

A.4 Full conditional posterior distribution of quantities $(X_{\bar s}, P_{\bar s}, \mathbf{Y}_{\bar s}, \boldsymbol{\eta}_{\bar s})$

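The joint posterior full conditional of $(X_{\bar s}, P_{\bar s}, \mathbf{Y}_{\bar s}, \boldsymbol{\eta}_{\bar s})$ is proportional to

$$
\begin{aligned}
[X_{\bar s}, P_{\bar s}, \mathbf{Y}_{\bar s}, \boldsymbol{\eta}_{\bar s} \mid \cdots] \propto{}& \prod_{j=1}^{m} \frac{\sum_{c \in G_{i_j, j}} \pi(c)}{\sum_{c \in R} \pi(c) - \sum_{k=0}^{j-1} \sum_{c \in e_k} \pi(c)} \\
&\times \prod_{c \in C_{\bar s}} \frac{\exp\{-\exp\{\mathbf{v}_c'\boldsymbol{\theta}\} + \eta(c)\,\mathbf{v}_c'\boldsymbol{\theta}\}}{\eta(c)!\,\left[1 - \exp\{-\exp\{\mathbf{v}_c'\boldsymbol{\theta}\}\}\right]} \\
&\times \left(\frac{1}{p_s + p_{\bar s}}\right)^{\sum_{i=1}^{p_s + p_{\bar s}} (y_i - 1)} \frac{1}{(p_s + p_{\bar s})!}\; \frac{\beta^{p_{\bar s}} (1-\beta)^{x_{\bar s} - p_{\bar s}}}{1 - (1-\beta)^{x_s + x_{\bar s}}}\; \frac{\alpha^{x_{\bar s}} (1-\alpha)^{-x_{\bar s}}}{(M - x_s - x_{\bar s})!},
\end{aligned}
$$

which does not have a closed form. We use the Metropolis-Hastings algorithm to sample $(X_{\bar s}, P_{\bar s}, \mathbf{Y}_{\bar s}, \boldsymbol{\eta}_{\bar s})$ jointly. It is straightforward to sample $(X_{\bar s}, P_{\bar s}, \mathbf{Y}_{\bar s}, \boldsymbol{\eta}_{\bar s})$ from the proposal distribution and to jointly accept or reject these values using the full conditional above as the target distribution.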

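To generate $X_{\bar s}$, it is useful to draw $X$ from a discrete uniform distribution with support in the set $\{X^* \pm k : k = 1, \ldots, 5\}$, where $X^*$ is the current value of $X$. Then, set $X_{\bar s} = X - X_s$, ensuring that $X_s < X < M$, since the number of nonempty cells in $R$ is at most $M$ and it is known that there are $X_s$ nonempty cells in $R$. Note that $P_{\bar s}$ is the number of nonempty networks formed out of the $X_{\bar s}$ nonempty grid cells. Then, $P_{\bar s}$ is generated by sampling from the truncated $\text{Binomial}(X_{\bar s}, \beta)$ distribution. Notice that $\mathbf{Y}_{\bar s}$ is the number of nonempty grid cells in each of the $P_{\bar s}$ networks, so we generate $\mathbf{Y}_{\bar s}$ from the $\mathbf{1}_{P_{\bar s}} + \text{Multinomial}\left(X_{\bar s} - P_{\bar s}, \frac{1}{P_{\bar s}}\mathbf{1}_{P_{\bar s}}\right)$ distribution. Then, the set of cells $C_{\bar s}$ that compose the out-of-sample nonempty networks is established from the allocation process for $\mathbf{Y}_{\bar s}$, described in Subsection 4.1.1. From the covariates associated with the cells in $C_{\bar s}$, we generate the elements of $\boldsymbol{\eta}_{\bar s}$ from the truncated $\text{Poisson}(\exp\{\mathbf{v}_c'\boldsymbol{\theta}\})$ distribution, for $c \in C_{\bar s}$. Therefore, the proposal distribution is

$$
[X_{\bar s}, P_{\bar s}, \mathbf{Y}_{\bar s}, \boldsymbol{\eta}_{\bar s}]_{\text{prop}} = \frac{1}{10} \times \frac{x_{\bar s}!}{p_{\bar s}!}\, \frac{\beta^{p_{\bar s}} (1-\beta)^{x_{\bar s} - p_{\bar s}}}{1 - (1-\beta)^{x_{\bar s}}} \times \prod_{i \notin s} \frac{1}{(y_i - 1)!} \left(\frac{1}{p_{\bar s}}\right)^{y_i - 1} \times \prod_{c \in C_{\bar s}} \frac{\exp\{-\exp\{\mathbf{v}_c'\boldsymbol{\theta}\} + \eta(c)\,\mathbf{v}_c'\boldsymbol{\theta}\}}{\eta(c)!\,\left[1 - \exp\{-\exp\{\mathbf{v}_c'\boldsymbol{\theta}\}\}\right]}. \tag{A.1}
$$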
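The generation steps just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the dissertation's implementation: the allocation process of Subsection 4.1.1 is replaced by a hypothetical callable `allocate`, the intensity $\exp\{\mathbf{v}_c'\boldsymbol{\theta}\}$ is supplied by a hypothetical callable `rate_of_cell`, the truncated distributions are sampled by simple rejection, and a draw of $X$ outside $X_s < X < M$ is simply discarded (how the constraint is enforced is not specified in the text).

```python
import math
import random


def zero_truncated_binomial(n, p, rng):
    """Binomial(n, p) conditioned on being at least 1 (simple rejection)."""
    while True:
        k = sum(rng.random() < p for _ in range(n))
        if k >= 1:
            return k


def zero_truncated_poisson(lam, rng):
    """Poisson(lam) conditioned on being at least 1 (simple rejection)."""
    limit = math.exp(-lam)
    while True:
        k, prod = 0, rng.random()      # Knuth's multiplication method
        while prod > limit:
            k += 1
            prod *= rng.random()
        if k >= 1:
            return k


def propose(x_current, x_s, M, beta, rate_of_cell, allocate, rng):
    """Draw (X_sbar, P_sbar, Y_sbar, eta_sbar); None if X leaves its support."""
    step = rng.choice([k for k in range(-5, 6) if k != 0])   # X* +/- k, k = 1..5
    x = x_current + step
    if not (x_s < x < M):
        return None                    # assumption: such a move is rejected
    x_out = x - x_s                    # out-of-sample nonempty cells
    p_out = zero_truncated_binomial(x_out, beta, rng)
    # Y = 1 + Multinomial(x_out - p_out, equal probabilities over p_out networks)
    y_out = [1] * p_out
    for _ in range(x_out - p_out):
        y_out[rng.randrange(p_out)] += 1
    cells = allocate(y_out)            # hypothetical stand-in for Subsection 4.1.1
    eta_out = [zero_truncated_poisson(rate_of_cell(c), rng) for c in cells]
    return x_out, p_out, y_out, eta_out


# Hypothetical usage: constant intensity and a dummy allocation of cell labels.
rng = random.Random(7)
draw = propose(x_current=30, x_s=12, M=400, beta=0.15,
               rate_of_cell=lambda c: 2.0,
               allocate=lambda y: list(range(sum(y))),
               rng=rng)
print(draw)
```

Rejection sampling is used here only because it is short and transparent; it becomes slow when the probability of a zero is high.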

The improved sampling procedure, detailed in Subsection 4.1.2, tends to draw a greater number of networks than sampling without the weights $\omega$. Therefore, the final sample $s$ (made up of the first and second samples) may include all networks from $R$. Thus, it is plausible that the proposal distribution of the disaggregated model fitted to the final sample may lead to no out-of-sample nonempty cells, that is, $X_{\bar s} = 0$. Note that, in this case, the number of nonempty cells in $R$, $X$, can assume the value $X_s$. So, when generating $X$, set $X_{\bar s} = X - X_s$, but ensuring that $X_s \le X < M$. If $X = X_s$, then $X_{\bar s} = 0$ and the quantities $P_{\bar s}$, $\mathbf{Y}_{\bar s}$ and $\boldsymbol{\eta}_{\bar s}$ are necessarily equal to zero. Therefore, the proposal distribution when $X_{\bar s} = 0$ is

$$
[X_{\bar s}, P_{\bar s}, \mathbf{Y}_{\bar s}, \boldsymbol{\eta}_{\bar s}]_{\text{prop}} = \frac{1}{10}.
$$

A.5 Full conditional posterior distribution of ρ

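The posterior full conditional of $\boldsymbol{\rho}$ (from Subsection 4.3.2) is proportional to

$$
[\boldsymbol{\rho} \mid \cdots] \propto \prod_{c \in \{C_s, C_{\bar s}\}} \left(\frac{1}{1 + \exp\{-\mathbf{v}_c'\boldsymbol{\rho}\}}\right)^{\phi(c)} \left(\frac{\exp\{-\mathbf{v}_c'\boldsymbol{\rho}\}}{1 + \exp\{-\mathbf{v}_c'\boldsymbol{\rho}\}}\right)^{1 - \phi(c)} \times \exp\left\{-\frac{1}{2\sigma_{\rho}^{2}}\,\boldsymbol{\rho}'\boldsymbol{\rho}\right\},
$$

which does not have an analytical closed-form. We use the block Metropolis-Hastings algorithm with a multivariate Normal proposal, whose vector mean is the current value of the parameter and the covariance matrix is fixed at $(V'V)^{-1}\sigma_{\rho^*}^{2}$, where the term $\sigma_{\rho^*}^{2}$ controls the acceptance rates and $V$ is the matrix with rows $\mathbf{v}_c'$, for all cells $c$ of the sample, that is $c \in \{C_s, C_{\bar s}\}$.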


Appendix B

Assessment of MCMC with real data

In Section 4.3, we compared the results of our approach to those obtained using the

model proposed by Rapley & Welsh (2008). This appendix presents the convergence

results of the design-based experiment with a real population, displayed in Subsection

4.3.2. We evaluated the convergence of two parallel chains according to each number of

networks sampled from the real population. The results are presented in Table B.1 and

Figure B.1.


             Number of networks sampled
Parameter       1       2       3       4       5
α           -2.04    0.96   -0.53   -0.90    0.94
β           -2.18    1.50    1.14    0.20    0.54
T           -1.78    0.52   -0.12   -1.08    1.22

Table B.1: Geweke convergence diagnostic for some of the parameters estimated for the real population for both models.
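For readers unfamiliar with this diagnostic, the sketch below shows the idea behind a Geweke-type statistic: the mean of an early segment of the chain is compared with the mean of a late segment through a z-score. It is a deliberately simplified version that uses plain sample variances and therefore assumes roughly independent draws, whereas the diagnostic proper (for instance, `geweke.diag` in the coda package listed in the bibliography) estimates the variances from the spectral density at frequency zero to account for autocorrelation. The 10% and 50% fractions are the conventional choices, and the chains used here are synthetic.

```python
import math
import random
from statistics import mean, variance


def geweke_z_naive(chain, first=0.1, last=0.5):
    """Z-score comparing the first and last segments of a chain.

    Simplification: sample variances are used, so autocorrelation is ignored.
    """
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[n - int(last * n):]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Synthetic chains for illustration: one stationary, one with a clear drift.
rng = random.Random(2020)
stationary = [rng.gauss(0.0, 1.0) for _ in range(2000)]
drifting = [0.01 * i + rng.gauss(0.0, 1.0) for i in range(2000)]

z_stationary = geweke_z_naive(stationary)
z_drifting = geweke_z_naive(drifting)
print(z_stationary, z_drifting)
```

Values of the statistic far from zero indicate that the early and late parts of the chain have different means, which argues against convergence.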

Figure B.1: Trace plot with the posterior densities of α, β and T obtained from the fits of the disaggregated and the aggregated models to real data, for each number of networks sampled (NS). The black line represents the true value of T.

Bibliography

Baddeley, A. & Turner, R. (2000), ‘Practical maximum pseudolikelihood for spatial point patterns: (with discussion)’, Australian & New Zealand Journal of Statistics 42(3), 283–322.

Bennitt, E., Bonyongo, M. C. & Harris, S. (2014), ‘Habitat selection by African buffalo (Syncerus caffer) in response to landscape-level fluctuations in water availability on two temporal scales’, PLoS ONE 9(7), e101346.

Brix, A. & Diggle, P. J. (2001), ‘Spatiotemporal prediction for log-Gaussian Cox processes’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63(4), 823–841.

Cassel, C. M., Särndal, C. E. & Wretman, J. H. (1977), Foundations of inference in survey sampling, New York: Wiley.

Diggle, P. J. (1975), ‘Robust density estimation using distance methods’, Biometrika 62(1), 39–48.

Gattone, S. A., Mohamed, E., Dryver, A. L. & Münnich, R. T. (2016), ‘Adaptive cluster sampling for negatively correlated data’, Environmetrics 27(2), E103–E113.

Gelman, A., Carlin, J. B., Stern, H. S. & Rubin, D. B. (1995), Bayesian data analysis, Chapman & Hall.

Gonçalves, K. C. M. & Moura, F. A. S. (2016), ‘A mixture model for rare and clustered populations under adaptive cluster sampling’, Bayesian Analysis 11, 519–544.

Philippi, T. (2005), ‘Adaptive cluster sampling for estimation of abundances within local populations of low-abundance plants’, Ecology 86(5), 1091–1100.

Plummer, M., Best, N., Cowles, K. & Vines, K. (2006), ‘CODA: Convergence diagnosis and output analysis for MCMC’, R News 6(1), 7–11.
URL: https://journal.r-project.org/archive/

Prins, H. (1996), Ecology and behaviour of the African buffalo: social inequality and decision making, Vol. 1, Springer Science & Business Media.

R Core Team (2019), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria.
URL: https://www.R-project.org/

Rapley, V. & Welsh, A. (2008), ‘Model-based inferences from adaptive cluster sampling’, Bayesian Analysis 3, 717–736.

Salehi M., M. & Seber, G. A. F. (1997), ‘Adaptive cluster sampling with networks selected without replacement’, Biometrika 84(1), 209–219.

Smith, D. R., Conroy, M. J. & Brakhage, D. H. (1995), ‘Efficiency of adaptive cluster sampling for estimating density of wintering waterfowl’, Biometrics pp. 777–788.

Su, Z. & Quinn, T. J. (2003), ‘Estimator bias and efficiency for adaptive cluster sampling with order statistics and a stopping rule’, Environmental and Ecological Statistics 10(1), 17–41.

Thompson, S. K. (1990), ‘Adaptive cluster sampling’, Journal of the American Statistical Association 85, 1050–1059.

Thompson, S. K. & Seber, G. A. F. (1996), Adaptive sampling, Wiley Series in Probability and Statistics, Wiley.
