
Stochastic Modelling of Fractures in Rock Masses

EPSRC grant number GR/R94602

Technical Report 4 (03/R02)

Spatial Point Density Estimation

C. XU, P. Dowd, K. Mardia, R. Fowell

Department of Mining and Mineral Engineering The University of Leeds

March 2003


1. Introduction

After spatial pattern analysis of the point dataset, the next step towards modelling is point density estimation. The purpose of the estimation is to construct a model, either parametric or non-parametric, which best describes the underlying point density of the dataset. The model can then be used as a reference to model spatial point phenomena which are believed to resemble the characteristics represented by the data. As defined in Section 4.1 of Report 3, the point density, denoted λ(X), at point X of a point dataset can be expressed theoretically by the following limit:

$$\lambda(X) = \lim_{|dX| \to 0} \frac{E[N(dX)]}{|dX|}$$

where X is the location variable; in the two-dimensional case, X = (x, y), with x and y the horizontal and vertical coordinates respectively. dX is an infinitesimal volume containing location X, N(V) is the number of points within volume V and E[·] is the expected value. Clearly λ(X) is a quantity defined at a point, which in practice can only be estimated approximately from surrounding data. In this context, the selection of dX, such as the bandwidth in non-parametric estimation (see below), is therefore directly responsible for the approximation accuracy. In the case of a homogeneous Poisson process (stationary spatial points), λ(X) = λ (a constant) and the best unbiased estimate of λ is λ̂ = N/V, where N is the total number of points within the whole volume V being considered. In general there are two distinct approaches to density estimation, namely:

• Non-parametric approach
• Parametric approach
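Before either approach is applied, the homogeneous baseline λ̂ = N/V mentioned above can be sketched as follows (a minimal illustration; the helper names are ours, not from the report):

```python
import math
import random

def poisson_count(mean, rng):
    """Draw N ~ Poisson(mean) by Knuth's method (adequate for modest means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def intensity_estimate(points, volume):
    """Best unbiased estimate of a constant intensity: lambda-hat = N / V."""
    return len(points) / volume

# Homogeneous Poisson process with intensity 100 on the unit square
rng = random.Random(42)
n = poisson_count(100.0, rng)
points = [(rng.random(), rng.random()) for _ in range(n)]
print(intensity_estimate(points, 1.0))  # close to the true intensity 100
```

For a stationary pattern this is the whole story; the rest of the report concerns the non-stationary case, where λ varies over the region.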

The non-parametric approach does not assume any underlying parametric density model. Density values at chosen locations are estimated directly from surrounding data and a discrete model is normally presented as the result. The parametric approach, on the other hand, proposes an underlying parametric model first and then estimates the parameters of the proposed model according to an optimality (either minimisation or maximisation) criterion. The optimisation in this case is carried out at the global scale, over the whole region ℜ being considered. By contrast, non-parametric estimation is data-driven and is concerned only with local-scale variations. There are two basic types of optimisation criterion used in parametric estimation of point density:

• Maximum likelihood
• Least squares

For the maximum likelihood method, the likelihood function based on the proposed model is derived first. For example, the likelihood function of a non-homogeneous Poisson point process can in general be expressed as:

$$l(\theta) = \left[ \prod_{i=1}^{N} \lambda_\theta(X_i) \right] \cdot e^{-\int_V \lambda_\theta(u)\,du}$$

where θ represents the vector of parameters in the proposed model, N is the total number of events (points) within the region and Xi (i=1,2,..,N) are their locations and


V is again the volume of interest; λθ(·) in the expression is the proposed parametric density model. The estimated parameters θ̂ will be the set which maximises the likelihood function, such as the one shown above. The least squares method, on the other hand, is a minimisation problem. A function quantifying the squared difference in some characteristic statistic between the actual dataset and the proposed model is constructed. In general this can be expressed as:

$$s(\theta) = \int \left[ f_m(t;\theta) - f_D(t) \right]^2 \cdot c \, dt$$

where f is a chosen statistic, f_m is the statistic constructed from the model with parameter θ to be estimated, and f_D is the statistic calculated from the data; c is a weighting constant and t ranges over the domain on which the statistic f is defined. The estimated parameters θ̂ will be the set which minimises the squared difference s(θ) between the model and actual statistics. In practice, for point density estimation, the K function is normally chosen as f.

There is also an additional type of density estimation, namely spectral density estimation, which sits between the non-parametric and parametric approaches. Its goal is to derive a parametric model describing the density, and in that sense it is a kind of parametric estimation. The estimation, however, adopts direct estimation from surrounding data (local-scale significance) and is not optimised in any sense, similar to the non-parametric approach. The essential features of this approach are given in Shinozuka [], which states that a random process such as a density can be simulated by a series of cosine functions with random frequencies (or wave numbers). This can be expressed in the following form:

$$f(X) = \sum_{k=1}^{N} \sqrt{2\, S(\varpi_k)\, \Delta\varpi} \cdot \cos(\varpi_k X + \phi_k)$$

where N is the number of discrete points dividing the range of the spectral function S(ϖ), Δϖ is the width of each frequency interval, and φk is a random phase angle uniformly distributed within [0, 2π]. S(ϖ) can be estimated from higher-order analysis of f(X). If we write the second-order moment of the stationary process f(X) as:

$$C(\xi) = E[f(X_1)\, f(X_2)], \qquad \xi = X_2 - X_1$$

the spectral function S(ω) is then defined by (Shinozuka []):

$$S(\omega) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} C(\xi) \cos(\omega\xi) \, d\xi$$

In the case of point processes, f(X) is the density function λ(X) and C(ξ) is the second-order density λ2(ξ), as used in Report 3. This is the approach adopted by Lee & Einstein [] in their research. To make the method workable, they introduced the assumption that, for their case studies, the directional densities in the x and y directions are independent, i.e., λ(X) = λ(x, y) can be written as:

$$\lambda(x, y) = a + b \cdot \left[ \lambda_x(x) \cdot \lambda_y(y) \right]$$

Lee & Einstein then estimated λx(x) and λy(y) independently by the spectral function method above, using directional second-order density functions.
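A one-dimensional sketch of this cosine-series (Shinozuka-type) simulation may clarify the mechanics; the spectral density S(ω) = e^{−ω} used here is an assumed toy model, not one estimated from data:

```python
import math
import random

def spectral_simulation(S, omega_max, n_terms, x, rng):
    """Evaluate one realisation of
    f(x) = sum_k sqrt(2*S(w_k)*dw) * cos(w_k*x + phi_k),
    with random phases phi_k uniform on [0, 2*pi]."""
    d_omega = omega_max / n_terms
    value = 0.0
    for k in range(1, n_terms + 1):
        omega_k = (k - 0.5) * d_omega            # midpoint of the k-th band
        phi_k = rng.uniform(0.0, 2.0 * math.pi)  # random phase angle
        value += math.sqrt(2.0 * S(omega_k) * d_omega) * math.cos(omega_k * x + phi_k)
    return value

S = lambda omega: math.exp(-omega)  # assumed toy one-sided spectral density
f0 = spectral_simulation(S, 10.0, 200, 0.0, random.Random(1))
```

Each call with a fresh random generator produces a new realisation of the process; fixing the seed reproduces the same one.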


In our research we are not going to follow this route, because of a few disadvantages of the approach described above:

• The assumption of independence of the densities in the x and y directions is a serious drawback of this approach, as it is hardly ever true in practice.

• The estimation of the second-order density λ2 is as difficult as, if not more difficult than, the estimation of λ itself. Some functional form of λ2 is normally assumed to simplify the estimation process.

• The approach lacks a way to judge the effectiveness of the estimation.

A different approach to density estimation is via Fourier series. This relies on the fact that any function, including a density function f(X), can be expressed as the sum of an infinite Fourier expansion, i.e.,

$$f(X) = \sum_{k=0}^{\infty} f_k \cdot \phi_k(X)$$

where

$$\phi_0(X) = 1, \quad \phi_{2m-1}(X) = \sqrt{2}\cos(2\pi m X), \quad \phi_{2m}(X) = \sqrt{2}\sin(2\pi m X), \quad m = 1, 2, \ldots$$

$$f_k = E[\phi_k(X)], \text{ which can be estimated from data as } \hat{f}_k = \frac{1}{n}\sum_{i=1}^{n} \phi_k(X_i)$$

We are not going to follow this route either, for reasons similar to those given for the spectral function approach. As Silverman [] states, the summation of the Fourier expansion above is not a good estimate of f(X): it converges only to a sum of delta functions at the observations.

1.1 Diagnostics

Once the density model has been fitted, either parametric or non-parametric, the effectiveness of the fitted model must be tested. Cressie [] proposed a two-dimensional residual test of point patterns based on Berman. The idea is outlined below. If the region ℜ being considered is the rectangle [0,a]×[0,b], for each event (xi, yi) ∈ ℜ, i = 1, 2, …, N, introduce:

$$u_i = \int_0^{x_i} \int_0^b \lambda(x, y) \, dy \, dx$$

$$v_i = \int_0^{y_i} \int_0^a \lambda(x, y) \, dx \, dy$$

where λ(x, y) is the fitted density model. Then the events {ui} and {vi} are realisations of a one-dimensional homogeneous Poisson process with unit intensity, and tests can be conducted for this verification. This is an indirect approach to verification. I am still not convinced that the events {ui} and {vi} are realisations of a one-dimensional homogeneous Poisson process, given that λ(x, y) itself defines a non-homogeneous point process. In our research this verification will not be adopted. In my opinion, the most direct verification of the fitted model is to examine the differences between the basic point process statistics of the dataset and those derived


from the fitted model. For example, if fd(t) is the statistic derived from the data and fm(t) that from the fitted model, then by comparing fd(t) with fm(t) over the whole valid range of the function one can judge the effectiveness of the fitted model; or, by comparing fd(t) with the fm(t) of different models, one can identify the best model. If the theoretical fm(t) is not available, fm can be resolved by Monte Carlo simulation based on the fitted model, in which case the confidence envelope of fm can also be given. For example, if 100 MC simulations are conducted, their corresponding fm^i(t), i = 1, 2, …, 100, can be calculated. Rearranging the fm^i(t) for each t value in ascending order gives the 90% confidence interval for each t as [fm^5(t), fm^95(t)]; see the illustration in Figure 1.1, in which such an envelope is shown. An effective fitted model is one for which the data statistic fd(t) follows the average of the fm^i(t) and is contained within the confidence envelope over the whole range. This framework for the basic judgement of the effectiveness of a fitted model will be used in the current research. The statistic f mentioned in this context can be the inter-event distance statistic H(t), the nearest event distance statistic G(y), the point to nearest event distance statistic F(x) or the K-function K(t).
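The Monte Carlo envelope construction just described can be sketched for the nearest event distance statistic G(t); here a homogeneous Poisson process stands in for the fitted model, and the function names are ours:

```python
import math
import random

def nearest_event_cdf(points, t_grid):
    """Empirical nearest-event distance distribution G(t) evaluated on t_grid."""
    dists = []
    for i, (x1, y1) in enumerate(points):
        dists.append(min(math.hypot(x1 - x2, y1 - y2)
                         for j, (x2, y2) in enumerate(points) if j != i))
    n = len(dists)
    return [sum(1 for d in dists if d <= t) / n for t in t_grid]

def mc_envelope(simulate, t_grid, n_sims, lo_pct, hi_pct, rng):
    """Pointwise [lo_pct%, hi_pct%] order-statistic envelope of G(t)
    over n_sims simulations of the (fitted) model."""
    curves = [nearest_event_cdf(simulate(rng), t_grid) for _ in range(n_sims)]
    lo_i = max(0, round(lo_pct / 100 * n_sims) - 1)
    hi_i = min(n_sims - 1, round(hi_pct / 100 * n_sims) - 1)
    envelope = []
    for j in range(len(t_grid)):
        column = sorted(curve[j] for curve in curves)
        envelope.append((column[lo_i], column[hi_i]))
    return envelope

# stand-in "fitted model": 40 uniform points on the unit square
simulate = lambda rng: [(rng.random(), rng.random()) for _ in range(40)]
t_grid = [0.02 * i for i in range(1, 11)]
env = mc_envelope(simulate, t_grid, 100, 5, 95, random.Random(0))
```

The data curve G_d(t) would then be overlaid on `env`; points of G_d(t) falling outside the envelope flag a misfit at that scale.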

Figure 1.1 Diagnostics for the effectiveness of the fitted density model (shown: the data statistic fd(t); the average fm(t); and the upper and lower confidence limits of fm(t))

(Note: the notation f is used to represent a generic function of interest in this report; its actual meaning should be read from context.)

1.2 Datasets to be used for examples

The datasets used for the density estimation exercises will be either generated data or the two actual datasets used in the pattern analysis report. For the generated point datasets, three different point processes will be used:

• Non-homogeneous process
• Cluster process, and
• Cox process

The homogeneous process can be viewed as a special case of the non-homogeneous process. For the non-homogeneous case, the density model used for generating the points is:

Non-homogeneous model: $$\lambda(u, v) = 800 \cdot e^{-u - 2v}$$
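A sketch of how points could be drawn from this model by thinning (the total candidate count uses a normal approximation to the Poisson distribution, which is adequate for a mean of 800; the function name is ours):

```python
import math
import random

def simulate_nonhomogeneous(lam, lam_max, rng):
    """Simulate a non-homogeneous Poisson process on the unit square by
    thinning: candidates from a homogeneous process of intensity lam_max
    are kept with probability lam(u, v) / lam_max."""
    # candidate count ~ Poisson(lam_max); normal approximation for a large mean
    n = max(0, round(rng.gauss(lam_max, math.sqrt(lam_max))))
    points = []
    for _ in range(n):
        u, v = rng.random(), rng.random()
        if rng.random() < lam(u, v) / lam_max:
            points.append((u, v))
    return points

lam = lambda u, v: 800.0 * math.exp(-u - 2.0 * v)  # the non-homogeneous model
points = simulate_nonhomogeneous(lam, 800.0, random.Random(0))
```

The expected number of retained points is the integral of λ over the unit square, roughly 219 for this model.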


For the cluster process, the model used is a Neyman–Scott process on [0,1]² with parent intensity ρ = 30 and mean number of offspring per parent E[S] = 20, with a bi-normal offspring distribution:

$$h(x, y) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}}, \qquad \sigma = 0.025$$

For the Cox process, the random intensity is defined as a normal distribution with mean and variance specified as follows:

$$\mathrm{mean}(u, v) = 800 \cdot e^{-u - 2v}$$

$$\mathrm{variance}(u, v) = 120 \cdot e^{-u - 2v}$$

In the above expressions, u and v are the horizontal and vertical coordinates. For the simulated datasets, the region for the simulation will always be the unit square, i.e., [0,1]×[0,1]. Of the two actual datasets, the first is from Lee & Einstein []. It can be divided into two sub-datasets. The data are displayed in Figures 1.2 – 1.4 below. For more detailed descriptions, see Report 3 [].
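The Cox model above could be simulated cell by cell as sketched below; this draws an independent normal intensity in each grid cell (the report does not specify a spatial correlation for the random intensity here, so none is imposed, and the helper names are ours):

```python
import math
import random

def poisson_count(mean, rng):
    """Knuth's method; fine for the small per-cell means used here."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_cox(mean_fn, var_fn, m, rng):
    """Cox process on the unit square: per cell of an m-by-m grid, draw a
    normal intensity (truncated at zero), then a Poisson number of points
    placed uniformly in the cell."""
    cell = 1.0 / m
    points = []
    for i in range(m):
        for j in range(m):
            u, v = (i + 0.5) * cell, (j + 0.5) * cell  # cell centre
            lam = max(0.0, rng.gauss(mean_fn(u, v), math.sqrt(var_fn(u, v))))
            for _ in range(poisson_count(lam * cell * cell, rng)):
                points.append((i * cell + rng.random() * cell,
                               j * cell + rng.random() * cell))
    return points

mean_fn = lambda u, v: 800.0 * math.exp(-u - 2.0 * v)
var_fn = lambda u, v: 120.0 * math.exp(-u - 2.0 * v)
pts = simulate_cox(mean_fn, var_fn, 20, random.Random(0))
```

Averaged over realisations, the expected point count matches the non-homogeneous model, since the mean intensity surface is the same.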

Figure 1.2 (a) Example 1 (fracture traces)

Figure 1.2 (b) Example 1 (Points data set)

Figure 1.3 (a) Example 1 – fracture trace 1

Figure 1.3 (b) Example 1 - fracture set 1


Figure 1.4 (a) Example 1 – fracture trace 2

Figure 1.4 (b) Example 1 - fracture set 2

The second dataset is from La Pointe []. Again, it can be divided into two subsets. The data are displayed in Figures 1.5 – 1.7 below. For more details, see Report 3 [].

Figure 1.5 (a) Example 2 (fracture traces)

Figure 1.5 (b) Example 2 (Points data set)

Figure 1.6 (a) Example 2 – fracture trace set 1

Figure 1.6 (b) Example 2 – fracture set 1


Figure 1.7 (a) Example 2 – fracture trace set 2

Figure 1.7 (b) Example 2 – fracture set 2

2. Non-parametric density estimation

Non-parametric density estimation can serve two purposes: the estimated density can be used as a model for future reference in its own right, or it can serve as a preliminary estimation stage which helps to build a more complicated or a parametric model. In either case the estimation is simple and straightforward and therefore should always be conducted as the first stage in the density modelling process. Three categories of non-parametric approach are discussed here.

2.1 Unbiased non-parametric density estimation

Recall the definition of density:

$$\lambda(X) = \lim_{|dX| \to 0} \frac{E[N(dX)]}{|dX|}$$

In the two-dimensional case, suppose N(s, a) is the number of points within a square of side a (area a²) at location s, i.e., the quadrat count within that area. From the definition, N(s, a)/a² is an unbiased estimator of

$$\frac{1}{a^2} \int_{a^2} \lambda(s + u) \, du$$

If a is small enough that λ(s) is approximately constant inside the square, this quantity equals the density at location s, λ(s). In other words, N(s, a)/a² is then an unbiased estimator of λ(s) for all s ∈ ℜ. Similar arguments can be drawn for the three-dimensional case. The above definition is based on a square a×a centred at location s. A square grid can sometimes introduce undesirable effects because of its four corners, and it may be more suitable to use a circle instead. If there is some preferential direction in the point distribution, i.e., if the spatial density correlation is anisotropic, an ellipse with its major axis oriented in the main correlation direction will be more appropriate. Figure 2.1 illustrates the different shapes used for the estimation. More discussion of their effect on the density model is given by example in later sections.
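A minimal sketch of this quadrat-count estimator for the square window (the function name is ours):

```python
def quadrat_density(points, s, a):
    """Unbiased estimator N(s, a) / a**2: count the points falling inside
    the a-by-a square centred at s and divide by its area."""
    sx, sy = s
    count = sum(1 for (x, y) in points
                if abs(x - sx) <= a / 2.0 and abs(y - sy) <= a / 2.0)
    return count / (a * a)

# two points inside the 0.1-by-0.1 window around (0.5, 0.5), one outside
pts = [(0.5, 0.5), (0.52, 0.5), (0.9, 0.9)]
print(quadrat_density(pts, (0.5, 0.5), 0.1))  # about 200 points per unit area
```

A circular or elliptical window only changes the membership test; the count is still divided by the window area.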


(a). Square grid (b). Circle grid (c). Ellipse grid

Figure 2.1 Different grids used for unbiased estimation of λ(s)

The size of the area A (in the case of the square grid, A = a×a) plays a very important role in estimating the model. In density estimation, A is termed the smoothing window, as its size effectively determines the range within the region taken into account when estimating λ(s), and hence the smoothness of the estimated model: the larger the size, the smoother the density model.

2.1.1 Unbiased non-parametric density estimation of simulated patterns

For demonstration purposes we give examples only for the non-homogeneous and cluster processes; the Cox process can be treated as a non-homogeneous process in non-parametric estimation. For the non-homogeneous model given in Figure 2.2 (a), various types of unbiased density estimate are displayed in Figure 2.2 (b)–(h).

Figure 2.2 (a) Non-homogeneous Fig. 2.2 (b) Square grid, BW=0.2

Fig. 2.2 (c) Square grid, BW=0.4 Fig. 2.2 (d) Square grid, BW=0.5


Fig. 2.2 (e) Circle grid, Radius=0.1 Fig. 2.2 (f) Circle grid, Radius=0.2

Fig. 2.2 (g) Circle grid, Radius=0.25 Fig. 2.2 (h) Circle grid, Radius=0.4

For the cluster process, the point dataset is given in Figure 2.3 (a). Density estimation results are given in Figure 2.3 (b)–(h).


Figure 2.3 (a) Cluster process Fig. 2.3 (b) Square grid, BW=0.025

Fig. 2.3 (c) Square grid, BW=0.05 Fig. 2.3 (d) Square grid, BW=0.1

Fig. 2.3 (e) Circle grid, Radius=0.0125 Fig. 2.3 (f) Circle grid, Radius=0.025


Fig. 2.3 (g) Circle grid, Radius=0.05 Fig. 2.3 (h) Circle grid, Radius=0.1

2.1.2 Unbiased non-parametric density estimation of the two example datasets

As can be seen from the above examples, none of the estimates gives satisfactory results for the generated datasets (see the theoretical model displayed in Figure 2.16). It was therefore decided that this method would not be pursued any further and will not be used in the current research; accordingly, the analysis of the two actual datasets by this method has not been conducted.

2.2 Kernel density estimation

Kernel density estimation is a direct extension of density estimation by histograms. Taking the one-dimensional case as an example, the density function f(x) estimated directly by histogram can be written as:

$$\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h} \, w\!\left(\frac{x - x_i}{h}\right), \qquad w(u) = \begin{cases} \tfrac{1}{2} & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases}$$
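The histogram-type estimator just given, sketched directly (the function name is ours):

```python
def histogram_estimate(x, samples, h):
    """f-hat(x) = (1/N) * sum_i (1/h) * w((x - x_i)/h),
    with the box weight w(u) = 1/2 for |u| < 1, else 0."""
    w = lambda u: 0.5 if abs(u) < 1.0 else 0.0
    return sum(w((x - xi) / h) for xi in samples) / (len(samples) * h)

print(histogram_estimate(0.0, [0.0, 2.0], 1.0))  # 0.25: only the sample at 0 contributes
```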

w(u) is a simple weight function which gives the maximum weight of ½ to a sample if it lies within the range x ± h, i.e., the bin width of the histogram is 2h. The density model estimated by histogram is not continuous but a “jumpy” function: it has a discontinuity at every sample point due to the sudden jump in the weight function. The “jumps” occur at xi ± h (i = 1, 2, …, N) and the function f̂(x) has zero derivative everywhere else. An illustrative example is given in Figure 2.4 (a). This estimate is very sensitive to the origin and to the selection of bin width, and the discontinuities have serious undesirable consequences for future effective use of the model. Some of these problems can be overcome by replacing the discontinuous weight function w(u) with a continuous kernel function k(u), i.e., by estimating f(x) as:

$$\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h} \, k\!\left(\frac{x - x_i}{h}\right)$$

The kernel function k(u) is a continuous function within its effective range and is used to assign different weights to different samples according to their distance from the point x where the density value is being estimated. Figure 2.5 illustrates the process of density estimation by the kernel method. Each sample point has its own kernel function profile (weight function), which represents its influence on the density value in the surrounding area. The kernel profiles are identical for all sample points, meaning that all sample points are equally important in determining the final overall density model. The summation of all the kernel function “bumps” will then be the


estimated model f̂(x). Figure 2.4 (b) gives a possible estimate, by the kernel estimator, of the f(x) shown in Figure 2.4 (a).
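The same estimate with a smooth kernel, here the Gaussian k(u) = e^{−u²/2}/√(2π), can be sketched as follows (function name ours):

```python
import math

def kernel_estimate(x, samples, h):
    """f-hat(x) = (1/N) * sum_i (1/h) * k((x - x_i)/h), Gaussian kernel."""
    k = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return sum(k((x - xi) / h) for xi in samples) / (len(samples) * h)

# a single sample at 0 with h = 1 reproduces the kernel profile itself
print(kernel_estimate(0.0, [0.0], 1.0))  # 1/sqrt(2*pi), about 0.3989
```

Unlike the histogram version, this estimate is smooth in x: moving x slightly changes every weight continuously rather than switching samples in and out of a bin.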

Figure 2.4 Density estimation by weighting functions: (a) histogram estimator; (b) kernel estimator

Figure 2.5 Kernel estimate showing the individual kernels (adapted from Silverman [ ])

The value of h in the above discussion, like the area A discussed in the previous section, is termed the smoothing parameter or smoothing bandwidth. It plays the decisive role in the final outcome of the density model; its choice is of primary importance, whereas the kernel function itself is only of secondary importance. More discussion is given in later sections. There is no real restriction on the form of the kernel function as long as it satisfies the following condition:

$$\int_{-\infty}^{+\infty} k(u) \, du = 1$$

Any kernel function satisfying this condition will give reasonable answers. This weak condition leaves a rich choice of possible k(u). In general, k(u) is a symmetrical function with a finite range; if a symmetrical function is used, the following condition follows immediately:

$$\int_{-\infty}^{+\infty} u \cdot k(u) \, du = 0$$

Some common kernel functions used in the literature are listed in Table 2.1 below.

Name of kernel | k(u)
Biweight | k(u) = (15/16)(1 − u²)² for |u| < 1; 0 otherwise
Gaussian | k(u) = (1/√(2π)) e^{−u²/2}
Epanechnikov | k(u) = (3/(4√5))(1 − u²/5) for |u| < √5; 0 otherwise

Table 2.1 Some commonly used kernels for the one-dimensional case

Turning now to two-dimensional density estimation for point processes, the physical meaning of the density definition is not quite the same as in the one-dimensional case discussed above, which comes directly from the histogram definition. The fundamental principle, that the density value can be estimated from the summation of kernel “bumps” based on surrounding data, can nevertheless still be used. There are two basic differences: firstly, the kernel “bumps” are now two-dimensional surfaces (such as bell-shaped surfaces), and secondly, the normalising factor changes from N (the number of samples) to p(s), which, by definition, is the average value (height) of the kernel “bump” at location s. In expression:

$$\hat{\lambda}_h(s) = \frac{1}{p_h(s)} \sum_{i=1}^{N} \frac{1}{h^2} \, k\!\left(\frac{s - X_i}{h}\right), \qquad p_h(s) = \int_\Re \frac{1}{h^2} \, k\!\left(\frac{s - w}{h}\right) dw, \quad w = (u, v)$$

where h, again, is termed the smoothing parameter, smoothing bandwidth or smoothing window. The selection of its value is again of primary importance for the final result of the model; it is for this reason that the density value λ and the normalising factor p in the formula are subscripted by h. The normalising factor ph(s) is the volume of the kernel (e.g., the volume covered by the bell-shaped surface) divided by h², the area of the smoothing window, and hence can be regarded as the average height of the kernel function. For kernels fully enclosed by the region ℜ, this factor is constant and every location receives the same treatment. When the location s is near the edge of the region, however, the factor decreases and therefore increases the λ value correspondingly. This accounts for possible contributions to the λ estimate from sample points which lie outside the region and are not observed; it is for this reason that ph(s) is termed the edge correction factor by Diggle [1985]. The requirement on possible kernel functions in the 2D case is again weak. Compared with h, which is of primary importance in the estimation, the choice of k(u) is of secondary importance, and any reasonable kernel will give close to optimal results. Table 2.2 lists some of the common 2D kernel functions. Note that in the table, u = (x, y)ᵀ, i.e., uᵀ = (x y).

Name of kernel | k(u)
Simple | k(x, y) = (9/16)[1 − (x/h)²][1 − (y/h)²] for |x| < h and |y| < h; 0 otherwise
Binormal | k(u) = (1/(2π)) e^{−uᵀu/2}
Epanechnikov | k(u) = (2/π)(1 − uᵀu) for uᵀu < 1; 0 otherwise
Biweight | k(u) = (3/π)(1 − uᵀu)² for uᵀu < 1; 0 otherwise

Table 2.2 Common kernel functions used for 2D point density estimation

The first three types of kernel will be used in our examples.

2.2.1 Kernel density estimation of simulated patterns

The same non-homogeneous model was used to generate the example dataset, shown in Figure 2.6 (a). Density estimates using different kernels are displayed in Figure 2.6 (b)–(n) below.
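A minimal 2D sketch using the binormal kernel; for simplicity the edge correction factor p_h(s) is taken as 1 (i.e., a location well inside the region), so this version under-corrects near the boundary:

```python
import math

def kde2d(s, points, h):
    """lambda-hat_h(s) = sum_i (1/h**2) * k((s - X_i)/h), with the binormal
    kernel k(u) = exp(-u.u/2) / (2*pi) and p_h(s) taken as 1 (no edge
    correction)."""
    sx, sy = s
    total = 0.0
    for (x, y) in points:
        ux, uy = (sx - x) / h, (sy - y) / h
        total += math.exp(-0.5 * (ux * ux + uy * uy)) / (2.0 * math.pi)
    return total / (h * h)

# intensity at a location coinciding with a single event, h = 1
print(kde2d((0.0, 0.0), [(0.0, 0.0)], 1.0))  # 1/(2*pi), about 0.1592
```

Note that, unlike the one-dimensional probability density earlier, this is an intensity (points per unit area), so there is no division by N.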

Figure 2.6 (a) Non-homogeneous Fig. 2.6 (b) Simple kernel, BW=0.05


Fig. 2.6 (c) Simple kernel, BW=0.1 Fig. 2.6 (d) Simple kernel, BW=0.2

Fig. 2.6 (e) Simple kernel, BW=0.3 Fig. 2.6 (f) Binormal kernel, BW=0.05

Fig. 2.6 (g) Binormal kernel, BW=0.1 Fig. 2.6 (h) Binormal kernel, BW=0.2

Fig. 2.6 (i) Binormal kernel, BW=0.3 Fig. 2.6 (j) Epanechnikov kernel, BW=0.05


Fig. 2.6 (k) Epanechnikov kernel, BW=0.1 Fig. 2.6 (l) Epanechnikov kernel, BW=0.2

Fig. 2.6 (m) Epanechnikov kernel, BW=0.3 Fig. 2.6 (n) Epanechnikov kernel, BW=0.4

As can be seen from the above examples, the simple kernel and the Epanechnikov kernel give very similar results and produce a stable model at a bandwidth of around 0.3 (see the theoretical model displayed in Figure 2.16). The binormal kernel, on the other hand, gives a stable estimate at a smaller bandwidth of around 0.2. This behaviour is due to the effective range taken into account by the kernel function. For the first two kernels, the effective range is the bandwidth itself, i.e., points outside the bandwidth range are not considered. For the binormal kernel, however, the tails of the function mean that the effective range of points included in the density estimation extends beyond the specified bandwidth; as a consequence, a smoother image can be obtained with a smaller bandwidth than for the other two kernels. For the cluster example, the generated points are shown in Figure 2.7 (a) and density estimates for different kernels and bandwidths are given in Figure 2.7 (b)–(l).


Figure 2.7 (a) Cluster example Fig. 2.7 (b) Simple kernel, BW=0.025

Fig. 2.7 (c) Simple kernel, BW=0.05 Fig. 2.7 (d) Simple kernel, BW=0.1

Fig. 2.7 (e) Simple kernel, BW=0.2 Fig. 2.7 (f) Binormal kernel, BW=0.01

Fig. 2.7 (g) Binormal kernel, BW=0.025 Fig. 2.7 (h) Binormal kernel, BW=0.05


Fig. 2.7 (i) Binormal kernel, BW=0.1 Fig. 2.7 (j) Epanechnikov kernel, BW=0.025

Fig. 2.7 (k) Epanechnikov kernel, BW=0.05 Fig. 2.7 (l) Epanechnikov kernel, BW=0.1

Again, the binormal kernel reaches an acceptable estimate at a smaller bandwidth (about 0.025) than the other two methods (0.05). Comparing the two examples demonstrated above, it is not difficult to conclude that, in general, a smaller bandwidth is needed if smaller-scale details are to be represented correctly by the model. In the non-homogeneous case, the global-scale trend is more important than the local details around points, and a large bandwidth will normally give better estimation results. In the case of the cluster process, details of the individual clusters are of main interest, and a smaller bandwidth is therefore needed to reveal all the details. A large bandwidth can also be used deliberately to reveal the global variation trend in the density model, see the examples in Figure 2.7 (e) and (i); this is important in helping to establish the underlying parent process, especially when the parent process is non-homogeneous. More discussion can be found in later sections.

2.2.2 Kernel density estimation of the two example datasets

Since kernel estimates can provide satisfactory results, we use these methods to examine the two actual datasets mentioned in Section 1. For dataset 1, the kernel estimation results are presented in Figure 2.8; for the first subset of dataset 1, results are given in Figure 2.9, and for the second subset in Figure 2.10. Figures 2.11 – 2.13 show the kernel estimation results for dataset 2, including its two subsets.


Fig. 2.8 (a) Dataset 1 - full Fig. 2.8 (b) Simple kernel, BW=0.5

Fig. 2.8 (c) Simple kernel, BW=1.0 Fig. 2.8 (d) Simple kernel, BW=2.0

Fig. 2.8 (e) Binormal kernel, BW=0.25 Fig. 2.8 (f) Binormal kernel, BW=0.5

Fig. 2.8 (g) Binormal kernel, BW=1.0 Fig. 2.8 (h) Epanechnikov kernel, BW=0.5


Fig. 2.8 (i) Epanechnikov kernel, BW=1.0 Fig. 2.8 (j) Epanechnikov kernel, BW=2.0

Fig. 2.9 (a) Dataset 1 – subset 1 Fig. 2.9 (b) Simple kernel, BW=0.75

Fig. 2.9 (c) Simple kernel, BW=1.5 Fig. 2.9 (d) Simple kernel, BW=2.0

Fig. 2.9 (e) Binormal kernel, BW=0.5 Fig. 2.9 (f) Binormal kernel, BW=0.75


Fig. 2.9 (g) Binormal kernel, BW=1.0 Fig. 2.9 (h) Epanechnikov kernel, BW=0.75

Fig. 2.9 (i) Epanechnikov kernel, BW=1.5 Fig. 2.9 (j) Epanechnikov kernel, BW=2.0

Fig. 2.10 (a) Dataset 1 – subset 2 Fig. 2.10 (b) Simple kernel, BW=0.5

Fig. 2.10 (c) Simple kernel, BW=1.0 Fig. 2.10 (d) Simple kernel, BW=2.0


Fig. 2.10 (e) Binormal kernel, BW=0.25 Fig. 2.10 (f) Binormal kernel, BW=0.5

Fig. 2.10 (g) Binormal kernel, BW=1.0 Fig. 2.10 (h) Epanechnikov kernel, BW=0.5

Fig. 2.10 (i) Epanechnikov kernel, BW=1.0 Fig. 2.10 (j) Epanechnikov kernel, BW=2.0

Fig. 2.11 (a) Dataset 2 - full Fig. 2.11 (b) Simple kernel, BW=1.0


Fig. 2.11 (c) Simple kernel, BW=2.0 Fig. 2.11 (d) Simple kernel, BW=4.0

Fig. 2.11 (e) Binormal kernel, BW=0.5 Fig. 2.11 (f) Binormal kernel, BW=1.0

Fig. 2.11 (g) Binormal kernel, BW=2.0 Fig. 2.11 (h) Epanechnikov kernel, BW=1.0

Fig. 2.11 (i) Epanechnikov kernel, BW=2.0 Fig. 2.11 (j) Epanechnikov kernel, BW=4.0


Fig. 2.12 (a) Dataset 2 – subset 1 Fig. 2.12 (b) Simple kernel, BW=1.0

Fig. 2.12 (c) Simple kernel, BW=2.0 Fig. 2.12 (d) Simple kernel, BW=4.0

Fig. 2.12 (e) Binormal kernel, BW=0.5 Fig. 2.12 (f) Binormal kernel, BW=1.0

Fig. 2.12 (g) Binormal kernel, BW=2.0 Fig. 2.12 (h) Epanechnikov kernel, BW=1.0


Fig. 2.12 (i) Epanechnikov kernel, BW=2.0 Fig. 2.12 (j) Epanechnikov kernel, BW=4.0

Fig. 2.13 (a) Dataset 2 – subset 2 Fig. 2.13 (b) Simple kernel, BW=1.0

Fig. 2.13 (c) Simple kernel, BW=2.0 Fig. 2.13 (d) Simple kernel, BW=4.0

Fig. 2.13 (e) Binormal kernel, BW=0.5 Fig. 2.13 (f) Binormal kernel, BW=1.0


Fig. 2.13 (g) Binormal kernel, BW=2.0 Fig. 2.13 (h) Epanechnikov kernel, BW=1.0

Fig. 2.13 (i) Epanechnikov kernel, BW=2.0 Fig. 2.13 (j) Epanechnikov kernel, BW=4.0

For the full dataset 1, the suitable bandwidth using the simple or Epanechnikov kernels is somewhere between 1.0 and 2.0, giving density models like those shown in Figure 2.8 (d) and (j). For the binormal kernel, the bandwidth required for an appropriate estimate is again smaller, in the range 0.5 – 1.0, and the final density model looks like that shown in Figure 2.8 (g). In this case the suitable bandwidth is dictated mainly by the two aggregations in the top-middle part of the picture. For the first subset of dataset 1, the two aggregations are removed and the bandwidth required for a suitable estimate therefore increases slightly: for the simple and Epanechnikov kernels it is somewhere between 1.5 and 2.0, and for the binormal kernel between 0.75 and 1.0, with final density models like those shown in Figure 2.9 (d), (g) and (j). In the second subset, however, the two aggregations are present and the suitable bandwidth is similar to that used for the full dataset; the final model will be like those shown in Figure 2.10 (d), (g) and (j).

For the second dataset, no point aggregation is apparent and a greater bandwidth is needed to obtain a suitable estimate. The simple and Epanechnikov kernels give suitable estimates with a bandwidth somewhere between 2.0 and 4.0, with estimated models like those shown in Figure 2.11 (d) and (j). For the binormal kernel the suitable bandwidth is between 1.0 and 2.0, with a result like that shown in Figure 2.11 (g). The two subsets of dataset 2 reveal much the same characteristics, and the suitable model will be like those shown in Figure 2.12 (d), (g) or (j) for subset 1 and Figure 2.13 (d), (g) or (j) for subset 2.

Clearly there is a need for an objective way to judge the effectiveness of the modelling process. In the above discussions, the "suitability" of the model is only


judged by visual inspection of different models derived from different modelling techniques or different modelling parameters. Certain difficulties arise in deriving criteria for objectively evaluating the modelling process. The problem, however, can be addressed by examining the squared errors in some statistics between the dataset and Monte Carlo simulations from the fitted model. More discussion is given in Section 2.4.

2.3 Distance density estimation

Suppose the density at location s, λ(s), is to be estimated. If the distance from s to its k-th nearest point (event) is r_k(s), a circle of this radius centred at s encloses exactly k events, including the k-th event. On the other hand, the expected number of events inside the circle is λ(s)⋅A, where A = π⋅r_k² is the area of the circle. If λ(s) does not vary much inside the circle, k/A is then an unbiased estimate of λ(s), i.e.,

λ̂(s) = k/A = k / (π⋅r_k²(s))

To take the edge effect into account, A can be replaced by the area of the intersection between the circle and the region ℜ being considered, i.e.,

λ̂(s) = k / |A ∩ ℜ|

As can be seen from the definition, distance estimation is quite similar to the unbiased estimation described in Section 2.1. There is a fundamental difference, however: the distance estimator is event oriented, while the unbiased estimator works through arbitrary search windows. A direct consequence of this difference is that the distance estimator never gives a zero estimate, while the unbiased estimator guarantees nothing (except unbiasedness). Similar to the h value used in the kernel estimator, k in the distance estimator is termed the smoothing parameter: the larger the selected k, the smoother the estimated density model.

2.3.1 Distance density estimation of simulated patterns

For demonstration purposes, we use the same generated point patterns as in the last two sections. For the non-homogeneous case, the density estimates using different smoothing parameters are displayed in Figure 2.14 below. For the cluster example shown in Figure 2.15 (a), the corresponding distance estimates are presented in Figure 2.15 (b)-(f).
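The estimator above can be sketched directly; the names below are our own, and the edge correction by |A ∩ ℜ| is omitted for brevity.

```python
import numpy as np

def knn_density(points, s, k):
    """Distance (k-th nearest-neighbour) density estimate at location s.

    The circle centred at s whose radius is the distance r_k to the k-th
    nearest event contains exactly k events, so k / (pi * r_k^2) estimates
    the local density (edge correction ignored in this sketch).
    """
    d = np.sqrt(np.sum((points - np.asarray(s, dtype=float))**2, axis=1))
    rk = np.sort(d)[k - 1]            # distance to the k-th nearest event
    return k / (np.pi * rk**2)
```

Because the circle always contains k events, the estimate is never zero, in line with the remark above; a larger k averages over a wider circle and so smooths the estimate.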


Figure 2.14 (a) Non-homogeneous Fig. 2.14 (b) Distance method, k=10

Fig. 2.14 (c) Distance method, k=20 Fig. 2.14 (d) Distance method, k=50

Figure 2.15 (a) Cluster model Fig. 2.15 (b) Distance method, k=5


Fig. 2.15 (c) Distance method, k=10 Fig. 2.15 (d) Distance method, k=20

Fig. 2.15 (e) Distance method, k=30 Fig. 2.15 (f) Distance method, k=40

Note that for the cluster example the best smoothing parameter (the number of points used for the density estimation) is about 20, which coincides with the mean number of daughters (off-springs) of the Neyman-Scott process used.

2.3.2 Distance density estimation of two example datasets

Compared with the kernel methods, the distance method does not give results as satisfactory (see the theoretical model displayed in Figure 2.16), and this method will therefore not be pursued any further in this research. Distance analysis of the two actual datasets will not be conducted.

2.4 General remarks about non-parametric estimation

Two general conclusions can be drawn from the above analysis:

• The kernel estimator in general provides better estimates than the other two non-parametric estimation methods

• Non-parametric methods are extremely sensitive to the bandwidth used

Clearly there is a need to judge the goodness of fit of the estimated model in an objective manner. On the one hand this helps to determine the effectiveness of the estimation, and on the other it helps to choose a suitable bandwidth. The most common way to look at the goodness of fit of a model is to examine the squared error of the fitted model. For example, the squared error of the fitted model for the non-homogeneous example case can be tested as shown in Figure


2.16 below. The squared error so calculated is an effective way of quantifying the goodness of fit of the model. There is a problem here, however: in practice λ(x,y) is not known, and the squared error of the density model therefore cannot be calculated directly; other tests are needed. One practical way is to test the squared errors of some statistics instead of testing the density model directly. For example, we can calculate the squared error of the difference in some statistic between the data and the fitted model as:

Squared error in statistics = ∫ [s_D(t) − ŝ_M(t)]² dt

and use this as an index for the judgement. s(⋅) in the expression represents some statistic, for example the K-function; s_D is estimated directly from the data and s_M can be calculated by Monte Carlo simulation. More research can be done in this area to explore this issue.

Figure 2.16 Examining the goodness of fit by squared errors: the theoretical model λ(x,y) is compared with the estimated model λ̂(x,y) through

Squared error = ∫_A [λ(x,y) − λ̂(x,y)]² dx dy

2.5 Effectiveness of the fitted non-parametric model

As a final remark for this section, we now examine the effectiveness of the fitted non-parametric models. Choosing the "best" model would obviously involve the statistics of all the models concerned; in the exercises in this section, however, we choose only one non-parametric model for each dataset, and only the full datasets are analysed.


2.5.1 Dataset 1 – full

For dataset 1 (full), reproduced in Figure 2.17 (a), the non-parametric model chosen for this exercise is the one estimated by the normal kernel with a bandwidth of 1.0, shown in Figure 2.17 (b). 100 Monte Carlo simulations based on the model are conducted, and the distance and K-function statistics are then calculated for the dataset and the simulated points, together with the 95% confidence envelopes of the statistics. One of the realisations is given in Figure 2.17 (c), and the statistics for the data and the simulated points are presented in Figure 2.17 (d) – (j) below.
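The envelope construction just described can be sketched generically; `simulate` and `statistic` below are placeholders for the fitted-model simulator and any of the summary statistics (H, G, F or K), and the names are our own.

```python
import numpy as np

def mc_envelope(simulate, statistic, n_sim, rng, level=0.95):
    """Pointwise Monte Carlo confidence envelope and mean curve of a
    summary statistic under the fitted model.

    simulate(rng)  -> one simulated point pattern from the fitted model
    statistic(pts) -> the summary statistic evaluated as a 1-D array
    """
    sims = np.array([statistic(simulate(rng)) for _ in range(n_sim)])
    alpha = 1.0 - level
    lo = np.quantile(sims, alpha / 2.0, axis=0)
    hi = np.quantile(sims, 1.0 - alpha / 2.0, axis=0)
    return lo, sims.mean(axis=0), hi
```

The data statistic is then plotted against the mean curve and the (lo, hi) envelope: a well-fitting model keeps the data curve close to the mean and inside the envelope.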

Fig. 2.17 (a) Dataset 1 - full Fig. 2.17 (b) Non-parametric model used

Fig. 2.17 (c) One realisation of the model Fig. 2.17 (d) Inter-event distance H(t)

Fig. 2.17 (e) Inter-event distance (histogram)

Fig. 2.17 (f) Nearest-event distance G(t)


Fig. 2.17 (g) Nearest-event distance (histogram)

Fig. 2.17 (h) Point-nearest-event distance F(t)

Fig. 2.17 (i) f(x) - histogram Fig. 2.17 (j) K-function K(t)

Inspection of these statistics suggests that the model used is very effective: the statistics of the dataset follow consistently the average statistics derived from the 100 simulations. This, however, will not be the case if another model is chosen. For example, if the non-parametric model calculated from the binormal kernel with bandwidth 1.0 (Figure 2.18 a) is used, the inter-event statistics and the K-function are as shown in Figure 2.18 (b) – (d). Though the statistics of the actual dataset still follow the average values, a greater (though not significant) discrepancy can be observed compared with Figure 2.17, especially at small distances. This supports the argument discussed above: a method is needed to judge the optimal bandwidth to be used in kernel estimation.


Fig. 2.18 (a) Model with bandwidth 1.5 Fig. 2.18 (b) Inter-event distance H(t)

Fig. 2.18 (c) Nearest-event distance G(t) Fig. 2.18 (d) K-function K(t)

2.5.2 Dataset 2 – full

Following the same procedure, we choose the non-parametric model estimated by the binormal kernel with bandwidth 1.0 to test the effectiveness of the model. The whole process is repeated and the results are displayed in Figure 2.19 below.

Fig. 2.19 (a) Dataset 2 - full Fig. 2.19 (b) Non-parametric model used


Fig. 2.19 (c) One realisation of the model Fig. 2.19 (d) Inter-event distance H(t)

Fig. 2.19 (e) Inter-event distance (histogram)

Fig. 2.19 (f) Nearest-event distance G(t)

Fig. 2.19 (g) Nearest-event distance (histogram)

Fig. 2.19 (h) Point-nearest-event distance F(t)


Fig. 2.19 (i) f(x) - histogram Fig. 2.19 (j) K-function K(t)

As can be seen, the average statistics of the dataset and of the simulations from the model are again almost identical, implying that a suitable non-parametric model has been used. Note that the blue curves in the figures are the theoretical statistics for a homogeneous process; compared with dataset 1, dataset 2 is "closer" to the homogeneous case.

3. Maximum likelihood parametric density estimation

There are two fundamental steps in maximum likelihood parametric density estimation: firstly, a parametric model needs to be proposed and, secondly, the likelihood function needs to be constructed. There is no unique way to propose a suitable parametric model; the most likely approach lies in the application itself. Based on the application we can gather some prior knowledge about the possible forms of the parametric model. Several forms may be proposed, all of them logical derivations from this knowledge, and subsequent analyses are then used to identify the optimal one to be used as the future reference. If no prior knowledge is available, the best starting point is to conduct a non-parametric estimation first: by examining the point pattern and the estimated non-parametric model, one can reach some appropriate proposals, and further analyses are then used to test the effectiveness of the fitted model. A logical conclusion from this process is that it will be difficult to obtain an optimal parametric model, as any subsequent optimisation is only within the scope of the proposed models, which in practice are limited.

Once a parametric model has been proposed, the construction of the likelihood function is more straightforward. For a non-homogeneous (including homogeneous) process, the likelihood function can be written as (Daley & Vere-Jones, 1988):

l(θ) = [ ∏_{i=1..N} λ_θ(X_i) ] ⋅ exp( −∫_V λ_θ(u,v) du dv )

For a Cox process, because it is a two-step random process (a doubly stochastic model), the likelihood function also consists of two components. The final point data are realised from an intermediate density model (non-homogeneous in general), λ(X), the second stochastic component, and the likelihood of this realisation is therefore the same as in the non-homogeneous case, i.e.,

l(θ) = [ ∏_{i=1..N} λ_θ(X_i) ] ⋅ exp( −∫_V λ_θ(u,v) du dv )


We write λ(X) as λ_θ(X) because the intermediate model is controlled by some parameters θ. This likelihood function can be interpreted as the probability density of obtaining N events located at the data points within the region. Because the intermediate model parameters θ, and hence the density λ_θ(X), are a realisation of the underlying Cox model (the first stochastic component), the likelihood of this model realisation within the region can be written as (Cressie, 1993):

l(β) = ∫ l(θ) ⋅ g_β(θ) dθ = ∫ [ ∏_{i=1..N} λ_θ(X_i) ] ⋅ exp( −∫_V λ_θ(u,v) du dv ) ⋅ g_β(θ) dθ

where g_β(θ) is the density function of the intermediate parameters θ and β are the parameters of the underlying Cox model.

For a cluster process, the only feasible way to adopt the maximum likelihood approach is first to separate the parent and daughter processes. This is itself extremely difficult in most cases, but suppose it can be done. If we assume, for example, that the locations of the daughters around their parents are bi-normally distributed with dispersion parameter σ, then the likelihood function for the cluster process can be written as:

l(θ,σ) = [ ∏_{i=1..n} λ_θ(w_i) ] ⋅ exp( −∫_V λ_θ(u,v) du dv ) ⋅ ∏_{k=1..N} (1/(2πσ²)) ⋅ exp( −(u_k² + v_k²)/(2σ²) )

where the first and second parts are the likelihoods of the parent and daughter realisations respectively, n is the number of clusters and (u_k, v_k) are the offsets of the daughters from their parents. As can be seen from the expression, the success of the maximum likelihood method depends on a successful partition of the data points into logical clusters.

Once the likelihood function has been derived, we can estimate the parameters θ, β or σ by maximising it: the estimates are the parameter sets which give the maximum value of the l(⋅) function.

3.1 Implementation issues

As stated in Cressie (1993), the likelihood maximisation problem will not in general admit a closed-form solution, so numerical techniques must be used. In applying a numerical method, only certain points within the range of the parameters being considered are used to maximise the function; charting the likelihood value against the parameters will therefore in general help the optimisation process. Because of computer truncation problems with small numbers, the log-likelihood function is normally used instead. For example, in the non-homogeneous case, instead of maximising the likelihood function directly (which is in general extremely small), we maximise the following log-likelihood function:

log(l(θ)) = Σ_{i=1..N} log[λ_θ(X_i)] − ∫_V λ_θ(u,v) du dv

This log-likelihood function is in general much easier to manipulate.

A serious problem exists in evaluating the likelihood function for the Cox process. For each set of Cox model parameters β, a random process specifying λ(X) is defined by the corresponding parameters θ, with θ = f(β). In other words, at each location the actual density value itself is a random variable defined by θ, i.e., λ(X, θ).


This introduces problems in calculating the likelihood of the density realisation at the sample locations: at each sample location the likelihood has to be evaluated over the whole probability space of the random process, and this has to be done for all samples. Clearly this process, even if feasible in real applications, would be extremely expensive; furthermore, since we are optimising β, the process would have to be repeated many times over the search space.

A practical way to evaluate the likelihood for the Cox process is by Monte Carlo integration. In a nutshell, Monte Carlo (MC) integration is numerical integration by random sampling: the integral ∫ f(x)⋅g(x) dx can be approximated by the average of sampled values of f(x), provided g(x) is a probability density function and the sampling points generated from g(x) are independent, i.e.,

∫ f(x)⋅g(x) dx ≈ (1/M) Σ_{i=1..M} f(x_i),   if ∫ g(x) dx = 1 and the x_i are independent samples from g(x)

The variance of the MC integration is

σ²_MC = σ²_g / M

i.e., as M → ∞, σ²_MC → 0 and the MC calculation converges to the integral. The same technique can also be applied to multi-dimensional integrations. Back to our problem, the evaluation of the likelihood function for the Cox process.
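As a quick numerical check of the MC integration idea (a side sketch with our own names, not part of the report's implementation), the integral of x²⋅g(x) with g the uniform density on [0,1] is exactly 1/3:

```python
import numpy as np

def mc_integrate(f, sampler, m):
    """Approximate the integral of f(x) * g(x) dx by the sample mean
    (1/M) * sum f(x_i), where the x_i are independent draws from g."""
    x = sampler(m)
    return f(x).mean()

# Example: integral of x^2 over [0, 1] under the uniform density is 1/3.
rng = np.random.default_rng(0)
approx = mc_integrate(lambda x: x**2, lambda m: rng.random(m), 100_000)
```

With M = 100,000 draws the approximation is within about 0.003 of 1/3 with high probability, illustrating the 1/M decay of the variance.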

g_β(θ) in the function is a probability density describing the random density field in the region, and the integration is over the whole probability space. If we generate random samples from g_β(θ), we can then approximate the likelihood for the Cox process as follows:

l(β) ≈ (1/M) Σ_{m=1..M} [ ∏_{i=1..N} λ_θm(X_i) ] ⋅ exp( −∫_V λ_θm(u,v) du dv )

or in log-likelihood terms:

log(l(β)) ≈ (1/M) Σ_{m=1..M} [ Σ_{i=1..N} log[λ_θm(X_i)] − ∫_V λ_θm(u,v) du dv ]

Note that each sample of the parameter vector θ here is equivalent to one realisation of the Cox density field in the region. As can be seen, the MC evaluation described here is still very expensive, but it nevertheless provides a more practical way of obtaining the result than direct integration.

3.2 Maximum likelihood density estimation of simulated patterns

The first exercise in maximum likelihood density estimation is directed at some simulated point patterns; non-homogeneous and Cox processes are used. For these patterns the underlying models, and therefore the forms of the proposed parametric models, are known, and the maximum likelihood technique can be used to estimate the parameters from the generated point patterns. In this case it may be more informative to provide an estimate of the distribution of the parameters rather than just a single point estimate: multiple realisations from the


same underlying model provide multiple estimated values for each parameter, from which the distribution can be derived.

The non-homogeneous model used in this exercise is

λ(x,y) = 800 ⋅ e^(−x−2y)

for a unit square region ℜ = [0,1] × [0,1]. This model is used to generate 200 realisations, which are used to estimate the parameters λo, a and b of the following proposed parametric model:

λ(x,y) = λo ⋅ e^(−ax−by)

One of the realisations is shown in Figure 3.1 (a) and the estimated distributions for λo, a and b are given in Figure 3.1 (b), (c) and (d).
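The estimation for one realisation can be sketched as follows. The closed form of the intensity integral over the unit square is used in the log-likelihood, and a crude grid search (placed around plausible parameter values) stands in for whatever numerical optimiser was actually employed; the names and the thinning-based simulation are our own.

```python
import numpy as np

def loglik(points, lam0, a, b):
    """Log-likelihood of lambda(x,y) = lam0 * exp(-a x - b y) on the unit
    square; the intensity integral has the closed form
    lam0 * (1 - e^-a)/a * (1 - e^-b)/b."""
    x, y = points[:, 0], points[:, 1]
    log_terms = np.sum(np.log(lam0) - a * x - b * y)
    integral = lam0 * (1 - np.exp(-a)) / a * (1 - np.exp(-b)) / b
    return log_terms - integral

def grid_mle(points):
    """Crude grid search for (lam0, a, b); a proper numerical optimiser
    would normally be used instead of this sketch."""
    best, best_ll = None, -np.inf
    for lam0 in np.linspace(600, 1000, 21):
        for a in np.linspace(0.5, 1.5, 21):
            for b in np.linspace(1.5, 2.5, 21):
                ll = loglik(points, lam0, a, b)
                if ll > best_ll:
                    best, best_ll = (lam0, a, b), ll
    return best
```

A realisation can be simulated by thinning a homogeneous pattern of rate 800 with retention probability e^(−x−2y); the grid maximiser then recovers parameters close to (800, 1, 2), mirroring the distributions in Figure 3.1.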


Figure 3.1 (a) Non-homogeneous Points Fig. 3.1 (b) Estimated distribution for λo


Figure 3.1 (c) Estimated distribution for a Fig. 3.1 (d) Estimated distribution for b

As can be seen from the distributions, the maximum likelihood estimates honour the true parameter values of the model quite well, especially for parameters a and b; λo is slightly over-estimated by the method. All the distributions, however, are more or less normal.

The Cox model used in this exercise is defined as a normal distribution with mean and variance specified as follows:

mean of λ(x,y) = 800 ⋅ e^(−x−2y)
variance of λ(x,y) = 120 ⋅ e^(−x−2y)
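One realisation of this Cox model can be generated by first drawing the intensity field and then the points. The discretisation of the unit square and the names below are our own choices, and negative draws of the normal intensity are truncated at zero.

```python
import numpy as np

def simulate_cox(rng, n_grid=20):
    """One realisation of the Cox model: a normally distributed intensity
    with mean 800*exp(-x-2y) and variance 120*exp(-x-2y), discretised on an
    n_grid x n_grid partition of the unit square, followed by a Poisson
    number of points placed uniformly in each cell."""
    h = 1.0 / n_grid
    xc = (np.arange(n_grid) + 0.5) * h                 # cell centres
    X, Y = np.meshgrid(xc, xc, indexing="ij")
    m = 800.0 * np.exp(-X - 2.0 * Y)
    v = 120.0 * np.exp(-X - 2.0 * Y)
    lam = np.clip(rng.normal(m, np.sqrt(v)), 0.0, None)  # realised intensity
    counts = rng.poisson(lam * h * h)                    # points per cell
    pts = []
    for i in range(n_grid):
        for j in range(n_grid):
            u = rng.random((counts[i, j], 2)) * h
            u[:, 0] += i * h
            u[:, 1] += j * h
            pts.append(u)
    return np.concatenate(pts) if pts else np.empty((0, 2))
```

The expected number of points per realisation is the integral of the mean intensity, roughly 219 for this model, with extra variability coming from the random intensity field.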

The proposed model is then:


mean of λ(x,y) = λo ⋅ e^(−ax−by)
variance of λ(x,y) = vo ⋅ e^(−ax−by)

To save running time, 100 realisations of the Cox model are generated, each providing a point estimate of the parameters λo, vo, a and b of the proposed parametric model, and 20 Monte Carlo sampling points are used to evaluate the Cox likelihood function. Figure 3.2 (a) shows a typical realisation and Figure 3.2 (b) – (e) show the estimated distributions of the model parameters.

Figure 3.2 (a) Cox process


Fig. 3.2 (b) Parameter distribution for λo


Figure 3.2 (c) Parameter distribution for a Fig. 3.2 (d) Parameter distribution for b


Figure 3.2 (e) Parameter distribution for vo Fig. 3.2 (f) One realisation of λ(.)


As can be seen from the estimated distributions, parameters λo, a and b of the original model are honoured quite satisfactorily by the estimation. vo, however, demonstrates


a poor estimate. The reason could be that this parameter contributes only the additional random component of the density model and is in general of secondary importance in making up the final realisation; the direct consequence of this lesser role is greater difficulty in estimating the parameter from the realisations. More investigation of this point may be helpful.

3.3 Maximum likelihood density estimation of two example datasets

For the two actual datasets, the main difficulty in maximum likelihood estimation lies in proposing a suitable parametric model. One way to tackle the issue is to examine the non-parametric model obtained in the last section and then propose the model accordingly. Some fitting processes can also be introduced here, but there seems to be no universal solution.

3.3.1 Dataset 1

The full dataset 1 and a non-parametric model are reproduced from the last section in Figure 3.3 (a) and (b). By examining this model, one can visually identify two peak values, at locations (13.4, 17.3) and (16.4, 13.4). A direct proposal for the parametric model is then a non-homogeneous model which is the combination of two normal functions, i.e.,

λ(x,y) = a + b ⋅ e^(−[(x−13.4)²/c² + (y−17.3)²/d²]) + u ⋅ e^(−[(x−16.4)²/v² + (y−13.4)²/w²])

which is a model with seven parameters. Parameters c, d, v and w address the anisotropy in the model, b and u are the contributions to the density from the two normal components, and a is the background density, which for the current exercise is treated as a constant. Certainly, simpler or more complicated models of this kind could be proposed: for example, if anisotropy is not to be considered, c and d can be combined into one parameter, as can v and w; furthermore, correlation between x and y can be built into the model by adding an extra product term in x and y.

Fig. 3.3 (a) Dataset 1 - full Fig. 3.3 (b) A non-parametric model


Fig. 3.3 (c) Sub-region ℜ' for the model Fig. 3.3 (d) The parametric model

There is, however, a large proportion of unpopulated white space within the region ℜ being considered. These sub-regions (white spaces) are always difficult to model together with the populated areas using a single parametric model. One way to tackle this problem is by coordinate transformation, as used in Lee and Einstein's research; another is to define a sub-region of ℜ, say ℜ', and to consider the model only within the defined region. For dataset 1, the defined sub-region ℜ' is shown in Figure 3.3 (c). The final parametric model proposed for this dataset is therefore:

λ(x,y) = a + b ⋅ e^(−[(x−13.4)²/c² + (y−17.3)²/d²]) + u ⋅ e^(−[(x−16.4)²/v² + (y−13.4)²/w²])   for (x,y) ∈ ℜ'
λ(x,y) = 0   otherwise

Using the maximum likelihood method described, a direct estimate gives the following parameters:

• a = 0.62
• b = 8.1
• c = 4.1
• d = 0.45
• u = 4.2
• v = 2.5
• w = 1.6

The estimated model is illustrated in Figure 3.3 (d). Note that the major part of the region ℜ' is dominated by the background density a = 0.62 and that the model, as expected, gives a continuously smooth density within the region. The likelihood functions for each of the estimated parameters are displayed in Figure 3.4.
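For reference, the fitted model with the parameter values quoted above can be evaluated directly. The function name is our own, and membership of the sub-region ℜ' is passed in as a flag, since ℜ' is defined only graphically in Figure 3.3 (c).

```python
import numpy as np

# Parameter values estimated above
a, b, c, d, u, v, w = 0.62, 8.1, 4.1, 0.45, 4.2, 2.5, 1.6

def intensity(x, y, in_subregion=True):
    """Fitted two-peak parametric density lambda(x, y); zero outside the
    sub-region R' (membership supplied by the caller)."""
    if not in_subregion:
        return 0.0
    peak1 = b * np.exp(-((x - 13.4)**2 / c**2 + (y - 17.3)**2 / d**2))
    peak2 = u * np.exp(-((x - 16.4)**2 / v**2 + (y - 13.4)**2 / w**2))
    return a + peak1 + peak2
```

At the first peak centre (13.4, 17.3) the density is approximately a + b ≈ 8.72, while far from both peaks it falls back to the background level a = 0.62, consistent with the smooth surface in Figure 3.3 (d).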



Fig. 3.4 (a) Likelihood function for a Fig. 3.4 (b) Likelihood function for b


Fig. 3.4 (c) Likelihood function for c Fig. 3.4 (d) Likelihood function for d


Fig. 3.4 (e) Likelihood function for u Fig. 3.4 (f) Likelihood function for v

From these likelihood functions it is not difficult to see that the likelihood is sensitive to the choices of parameters d, u and w, but not to the others. d and w represent the ranges covered in the y direction by the two normal functions, which suggests anisotropic behaviour in the data, i.e., the variation of density is more severe in the y direction. The levelling behaviour of the likelihood function in parameters c and v likewise suggests a longer range of continuity in the x direction.

Fig. 3.4 (g) Likelihood function for w


Estimation of distributions for the parameters is not possible using only the maximum likelihood method and a single dataset. MCMC, however, may provide a means of doing this; more work on this will follow. To measure the effectiveness of the fitted model, we conduct 100 Monte Carlo simulations using the model. Figure 3.5 (a) shows one of the realisations. We then conduct the distance and K-function analyses of both the data and the simulated datasets. The rest of the figures in Figure 3.5 compare the results between them. Also displayed in the figures are the 95% confidence envelopes derived from the simulations.
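Monte Carlo realisations of a non-homogeneous model such as the one above can be generated from any bounded intensity λ(x,y) by thinning (Lewis–Shedler): simulate a homogeneous process at an upper-bound rate and keep each candidate with probability λ/λ_max. A minimal sketch, assuming an illustrative single-bump intensity rather than the full fitted model:

```python
import math
import random

def simulate_inhomogeneous(lam, lam_max, width, height, rng):
    """Lewis-Shedler thinning: generate a homogeneous Poisson pattern at
    rate lam_max over a width x height rectangle, then retain each
    candidate point (x, y) with probability lam(x, y) / lam_max."""
    # Draw the homogeneous count N ~ Poisson(lam_max * area) by summing
    # unit-rate exponential inter-arrival times until the sum exceeds the mean.
    mean = lam_max * width * height
    n, total = 0, -math.log(rng.random())
    while total <= mean:
        n += 1
        total += -math.log(rng.random())
    points = []
    for _ in range(n):
        x, y = rng.uniform(0.0, width), rng.uniform(0.0, height)
        if rng.random() < lam(x, y) / lam_max:
            points.append((x, y))
    return points

# Illustrative intensity: uniform background plus one Gaussian bump
# (the bump location and scales here are hypothetical, not the fitted values).
def lam(x, y):
    return 0.6 + 8.0 * math.exp(-(x - 15.0) ** 2 / 4.0 - (y - 10.0) ** 2 / 1.0)

pts = simulate_inhomogeneous(lam, lam_max=8.6, width=30.0, height=20.0,
                             rng=random.Random(1))
```

Repeating this 100 times and computing the distance statistics of each realisation gives the simulation envelopes referred to above.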

Fig. 3.5 (a) One realisation of the model Fig. 3.5 (b) Inter-event distance H(t)

Fig. 3.5 (c) Inter-event distance (histogram) Fig. 3.5 (d) Nearest-event distance G(t)

Fig. 3.5 (e) Nearest-event distance (histogram)

Fig. 3.5 (f) Point-nearest-event distance F(t)


Fig. 3.5 (g) f(x) - histogram    Fig. 3.5 (h) K-function K(t)

In these figures, the blue lines (the smooth curves) are calculated theoretically for a homogeneous point process. Significant departures from the homogeneous case are obvious, as expected. Comparing the Monte Carlo simulation statistics with the actual data statistics, we soon realise that the model is not a very good fit for the data. In the K-function (Figure 3.5 h), for example, the dataset K(t) follows only the lower 95% confidence envelope curve, not the average curve as expected for a good fit. At a distance of around 5.0 the dataset K(t) even goes outside the envelope, though only marginally. The shapes of the K-functions of the dataset and the simulations, however, are very similar, implying that the major structures of the dataset have been described properly by the model. The discrepancy is mainly in the small-scale structures. This is also confirmed by examining the nearest-event distance statistics, G(t) (Figure 3.5 d). In this figure, the dataset G(t) is almost always outside the simulated G(t) curve. The higher values mean that the dataset always has more pairs of points at small nearest-event distances. A direct inspection of a single realisation of the model (Figure 3.5 a) and the actual dataset (Figure 3.3 a) also suggests that the density of the two point aggregation patterns in the simulated points is not as high as that in the actual dataset. Compared with the statistics for the non-parametric model shown in Figure 2.17, this parametric model gives a poorer fit. This is true in most cases, as a parametric model always uses some kind of approximation (smoothing) in order to obtain a parametric form, while a non-parametric model is derived directly from the data and will therefore in general give a better fit.
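The K-function comparisons above rely on an empirical estimator of K̂(t). A minimal sketch of the standard estimator, without the edge correction that a full analysis would normally apply:

```python
import math

def k_hat(points, t_values, area):
    """Naive empirical K-function: K_hat(t) = (A / N^2) * number of ordered
    pairs (i, j), i != j, whose separation is at most t. Edge effects are
    ignored, so values at large t are biased low near the boundary."""
    n = len(points)
    dists = [math.hypot(points[i][0] - points[j][0],
                        points[i][1] - points[j][1])
             for i in range(n) for j in range(i + 1, n)]
    return [area / (n * n) * 2 * sum(1 for d in dists if d <= t)
            for t in t_values]

# Two points a unit apart in a unit-area region: K_hat jumps from 0 to
# (A / N^2) * 2 = 0.5 once t reaches their separation.
ks = k_hat([(0.0, 0.0), (0.0, 1.0)], [0.5, 1.5], area=1.0)
```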
The parametric model fitted above can, however, be improved by introducing an extra component to describe the random variations of the actual density around the non-homogeneous model, i.e., by using a Cox model. To make this clearer, let us display the density models using grid cells instead of contour plots. Figure 3.6 (a) is the grid-cell colour plot of the non-parametric model displayed in Figure 3.3 (b), and Figure 3.6 (b) is the grid-cell colour plot of the fitted parametric model given above. As can be seen, the parametric model gives a much smoother density field. In other words, the extra random component in the field is not modelled by the parametric model. Figure 3.6 (c) is the difference in density values between those displayed in Figure 3.6 (a) and (b). This field is the error, by either over-estimation or under-estimation at different locations, in modelling the density field using the parametric model. This field can also be regarded as the random component which is not modelled by the parametric model (note that this random field will differ if a different non-parametric field is used). The histogram of the density values in this random field is given in Figure 3.6 (d).

Fig. 3.6 (a) λ of a non-parametric model Fig. 3.6 (b) λ of the parametric model

Fig. 3.6 (c) λ of [λ (a) – λ (b)]    Fig. 3.6 (d) Histogram of λ in (c)

The mean and variance of the random density field displayed in Figure 3.6 (c) are m = 0.091 and v = 0.406. As can be seen from Figure 3.6 (c) and (d), the random error field has no preferential trend and the values follow an approximately normal distribution. This suggests that we can model this random component by a Gaussian random process. By incorporating this random process in the parametric model derived above, we effectively create a Cox process model for the dataset. The modified parametric model for the dataset will therefore be a Cox model defined as follows:

$$\lambda(x,y)=\begin{cases} C(x,y)\ \text{(a Cox model defined below)}, & (x,y)\in\Re' \\ 0, & \text{otherwise} \end{cases}$$

where the Cox model C(x,y) is defined as a normal random process with mean and variance given below:

$$C(x,y):\quad \text{mean}=a+b\,e^{-\frac{(x-13.4)^2}{c}-\frac{(y-17.3)^2}{d}}+u\,e^{-\frac{(x-16.4)^2}{v}-\frac{(y-13.4)^2}{w}},\qquad \text{variance}=q$$

Parameters a, b, c, d, u, v and w take the same values as estimated above, and q models the random component discussed above. q can be a constant, or a location-dependent function if the variation of the variance can be described in an analytical form.


If we set q = 0.2 for dataset 1, one of the realisations of the density field λ(x,y) is shown in Figure 3.7 (a) below. A realisation of the points is shown in Figure 3.7 (b).
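One way to realise the Cox model numerically is to discretise the region into grid cells, add N(0, q) noise to the parametric mean in each cell (truncated at zero, since a density cannot be negative), and then draw a Poisson count of points per cell. A sketch under those assumptions, with a constant mean standing in for the fitted parametric surface:

```python
import math
import random

def cox_realisation(mean_fn, q, width, height, nx, ny, rng):
    """One realisation of a Cox process on a grid: the intensity in each
    cell is mean_fn at the cell centre plus N(0, q) noise truncated at
    zero; a Poisson count of points is then placed uniformly in the cell."""
    def poisson(mu, rng):
        # Knuth-style Poisson sampler (inversion by multiplying uniforms).
        target, prod, n = math.exp(-mu), rng.random(), 0
        while prod > target:
            prod *= rng.random()
            n += 1
        return n
    dx, dy = width / nx, height / ny
    points = []
    for i in range(nx):
        for j in range(ny):
            cx, cy = (i + 0.5) * dx, (j + 0.5) * dy
            cell_lam = max(0.0, mean_fn(cx, cy) + rng.gauss(0.0, math.sqrt(q)))
            for _ in range(poisson(cell_lam * dx * dy, rng)):
                points.append((i * dx + rng.random() * dx,
                               j * dy + rng.random() * dy))
    return points

# Constant mean 1.3 is a hypothetical stand-in for the fitted surface;
# q = 0.2 matches the value used for dataset 1 above.
pts = cox_realisation(lambda x, y: 1.3, q=0.2, width=10.0, height=10.0,
                      nx=20, ny=20, rng=random.Random(7))
```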

Fig. 3.7 (a) One realisation of λ(x,y) Fig. 3.7 (b) One realisation of the model

Note that, statistically, the Cox model defined above is identical to the non-homogeneous model presented previously. This is due to the fact that the random component added to the model has a statistical mean of zero. We should therefore expect the statistics derived from the Monte Carlo simulations based on the Cox model to be similar to those derived from the non-homogeneous model. A few examples are given in Figure 3.8. These figures bear a considerable amount of similarity to their counterparts displayed in Figure 3.5. Nevertheless, the Cox model does provide a way to model the random variations in the actual density model which is not available using the non-homogeneous approach.

Fig. 3.8 (a) H(t) - Histogram    Fig. 3.8 (b) G(t) - Histogram


Fig. 3.8 (c) F(x) - Histogram Fig. 3.8 (d) K-function K(t) Figure 3.8 Statistics derived from Monte Carlo simulations based on the Cox model

The fitted model for dataset 1, with the estimated parameter values substituted, is explicitly:

f(x,y) = 8.1·exp(−(x−13.4)²/4.1 − (y−17.3)²/0.45) + 4.2·exp(−(x−16.4)²/2.5 − (y−13.4)²/1.6) + 0.62

3.3.2 Dataset 2

For the full dataset 2, as shown in Figure 3.9 (a), a suitable non-parametric model (based on the results in Section 2 of this report) is reproduced in Figure 3.9 (b). If we define an effective region ℜ’ as shown in Figure 3.9 (c) and restrict our considerations to within that region, a homogeneous density model is an appropriate proposal, i.e.,

$$\lambda(x,y)=\begin{cases} a, & (x,y)\in\Re' \\ 0, & \text{otherwise} \end{cases}$$

Note that the homogeneity only holds within ℜ’. Globally, the process is non-homogeneous. Region ℜ’ is defined according to the original boundary of the fracture dataset, which has a very clear definition even for the gap between the top and bottom fracture sets, as shown in Figure 3.9 (d).

Fig. 3.9 (a) Dataset 2 - full Fig. 3.9 (b) A non-parametric model


Fig. 3.9 (d) The original fracture image Fig. 3.9 (c) Sub-region ℜ’ for the model


A direct estimation using the maximum likelihood method gives the parameter a the value a = 1.34. The likelihood function is shown in Figure 3.10 (a). Figure 3.10 (b) gives the estimated model in the form of a contour plot.
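For a homogeneous Poisson model the log-likelihood has the simple form l(a) = N log a − aA (up to an additive constant), so a likelihood curve like the one in Figure 3.10 can be reproduced by a direct scan. A sketch with hypothetical counts chosen so that the analytical MLE N/A is 1.34:

```python
import math

def loglik_homog(a, n_points, area):
    """Log-likelihood (up to an additive constant) of intensity a for a
    homogeneous Poisson process: l(a) = N log a - a A."""
    return n_points * math.log(a) - a * area

# Hypothetical data: 134 points in a region of area 100, so the analytical
# MLE is N / A = 1.34; the grid scan should land on the nearest candidate.
N, A = 134, 100.0
grid = [0.1 * k for k in range(1, 31)]          # candidate intensities 0.1..3.0
best = max(grid, key=lambda a: loglik_homog(a, N, A))
```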

Fig. 3.10 (a) Likelihood function for a    Fig. 3.10 (b) Contour plot for the model

We proceed with the estimation of some statistics, as usual, to test the effectiveness of the model. These are given in Figure 3.11. As can be seen, the model fits the dataset quite satisfactorily. Due to the smoothing effect of the parametric model, however, it gives a poorer fit than the non-parametric model analysed in Figure 2.17, as expected.

Fig. 3.11 (a) One realisation of the model Fig. 3.11 (b) Inter-event distance H(t)


Fig. 3.11 (c) H(t) - (histogram)    Fig. 3.11 (d) Nearest-event distance G(t)

Fig. 3.11 (e) Nearest-event distance (histogram)

Fig. 3.11 (f) Point-nearest-event distance F(t)

Fig. 3.11 (g) f(x) - histogram    Fig. 3.11 (h) K-function K(t)

The parametric model fitted above can, again, be modified to take into account the random component which cannot be described by the parametric model. Procedures similar to those used in the last section can be used for this modification. Figure 3.12 (a) and (b) are the grid-cell plots of the non-parametric model used in Figure 2.17 and the parametric model estimated above. Figure 3.12 (c) shows their difference, i.e., the errors (relative to the specified non-parametric model) of the parametric model. The histogram of the errors is shown in Figure 3.12 (d).


Fig. 3.12 (a) λ of a non-parametric model Fig. 3.12 (b) λ of the parametric model

Fig. 3.12 (c) λ of [λ (a) – λ (b)]    Fig. 3.12 (d) Histogram of λ in (c)

The model can also be improved by a more accurate descriptive model for the λ displayed in Figure 3.12 (a). For example, the two peak values at around (10,18) and (22,4) can be modelled in a similar way to that used for the dataset 1 example.

4. Least-squared parametric density estimation

Another category of parametric estimation of interest to us is least-squared estimation. The principle behind this technique is the minimisation of the discrepancy in some statistic between the data and the parametric model. The K-function is normally used for this exercise. If the K-function calculated from the data is K̂(t) and the theoretical K-function derived from the parametric model with parameter array θ is K(t,θ), the discrepancy between the two quantities can be written as:

$$D(\theta)=\int_0^{t_o}\left[\hat{K}(t)^c-K(t,\theta)^c\right]^2 dt$$

where t_o and c are termed tuning constants by Diggle []. The estimated parameters will be the set θ̂ which minimises the above discrepancy, D(θ). There are two important elements that need to be addressed in order to apply this method successfully. The first issue is the accurate representation of K(t,θ) for a given parametric model. From the literature, the only process which has an exact K-function representation is the homogeneous Poisson process, $K(t,\theta)=\pi t^2$. The theoretical model for the Neyman-Scott cluster process can also be derived (see Diggle [] and Cressie []) for the special case of a bi-normal distribution of daughter locations as:

$$K(t,\rho,\sigma)=\pi t^2+\frac{1}{\rho}\left(1-e^{-\frac{t^2}{4\sigma^2}}\right)$$

where the daughter distribution function is:

$$h(x,y)=\frac{1}{2\pi\sigma^2}\,e^{-\frac{x^2+y^2}{2\sigma^2}}$$

This theoretical K-function, however, only covers the relations between daughter points from the same parent and is therefore only valid for t up to the cluster size. Bear in mind that in most cases the cluster size is small compared with the size of the region. For this reason only a very small range of the K-function is valid and usable; in general the valid range is only about 10%-15% of the side length of a rectangular region. If the theoretical K-function is not available, it can be replaced by the average of K-functions from Monte Carlo simulations, i.e.,

$$\hat{K}(t,\theta)=\frac{1}{M}\sum_{i=1}^{M}\hat{K}_i(t)$$

Here M should be large enough to give a stable estimate K̂(t,θ). As can be expected, the replacement of the theoretical K(t,θ) by the Monte Carlo simulated K̂(t,θ) is an extremely time-consuming process. Trade-offs between the accuracy of the representation of K(t,θ), the number of MC simulations M and the search space for the parameters θ are always necessary to reach a conclusion within a reasonable amount of time. Based on experience, a minimum value of M = 20 should be used.

The second important issue for successful implementation of least-squared estimation is the selection of the tuning constants, t_o and c. In most cases the choices of values are subjective and mainly based on trial and error. Diggle suggested in general t_o = 0.25 (for a unit square region), with c = 0.5 for more regular point patterns and c = 0.25 for more aggregated patterns. In my opinion, the selections of t_o and c are not independent. t_o basically defines the range of the K and K̂ functions to be considered. c defines the weight of the contributions from K and K̂ to D(θ) at different distances t. K(t) is always a monotonically increasing function of t, and therefore in practice the discrepancy between K and K̂ will in general also increase with t. However, the smaller the constant c, the smaller the contribution of this discrepancy towards D(θ). In other words, the ever-increasing discrepancy between K and K̂ becomes less and less important as the constant c gets smaller. In this sense, c also implicitly defines an effective t range where the discrepancy between K and K̂ is important, and therefore its choice is related to t_o. The reason that Diggle chose a small c in his research is that the theoretical K-function he used is only valid for a small range of t (in the example presented, the valid range is only t_max = 0.15). As he chose t_o = 0.25 for his least-squared estimation exercise, a small value of c = 0.25 is necessary to minimise the effect of the discrepancy between K and K̂ in the range 0.15 – 0.25 (where the theoretical K-function is not valid). Based on experience, I think the choices of t_o and c should be made in a more objective way. t_o should be chosen according to the valid range of the theoretical K(t,θ), e.g., t_o = 0.15 in Diggle’s example. If K(t,θ) is replaced by Monte Carlo simulation values, the valid range of K̂(t,θ) can be up to t_o = 0.75 (for a unit square region) in most practical cases (see Report 3 for the discussion of valid K-function ranges). For general usage, the choice of t_o = 0.50 (50% of the smaller side length of a rectangular region) will serve the purpose. The choice of c, on the other hand, should be made according to the scale of interest. If there is no particular requirement, i.e., pattern characteristics at all scales are of equal interest and importance, c should be set at 1.0. If small-scale pattern is the main focus of the investigation, c should be small, such as c = 0.25 in Diggle’s example. If large-scale characteristics are more important than small-scale ones, it is possible to use a value of c greater than 1.0.

4.1 Implementation issues

Apart from the two problems discussed above, there is one additional issue for the implementation of the method: the search space for the parameters. Since the least-squared minimisation is an expensive process, one may start with a coarse search grid and gradually increase the number of search points by tracking down the possible ranges of the parameters for the minimum. This can be done automatically or in a supervised, interactive manner. There is always a possibility in this implementation that the minimum found is only of local significance. The chance of this happening, however, is small in point density estimation as D(θ) is normally a simple function of θ.
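The discrepancy D(θ) and a coarse-grid search over it can be sketched as follows, using the exact Poisson K(t) = πt² as a stand-in one-parameter family K(t,θ) = θt² (so the true value is θ = π; the grid and family here are purely illustrative):

```python
import math

def discrepancy(k_hat, k_model, t_values, c):
    """D(theta): rectangle-rule approximation of the integral of
    (K_hat(t)^c - K(t, theta)^c)^2 over [0, t_o] on the t grid."""
    dt = t_values[1] - t_values[0]
    return dt * sum((kh ** c - km ** c) ** 2
                    for kh, km in zip(k_hat, k_model))

t_values = [0.01 * k for k in range(1, 51)]        # t in (0, 0.5], so t_o = 0.5
k_data = [math.pi * t * t for t in t_values]       # "data" K: exactly Poisson
# Coarse scan of the hypothetical family K(t, theta) = theta * t^2.
candidates = [2.0 + 0.2 * k for k in range(13)]    # theta in 2.0 .. 4.4
best = min(candidates,
           key=lambda th: discrepancy(k_data,
                                      [th * t * t for t in t_values],
                                      t_values, c=0.5))
# A second, finer scan around `best` would then refine the estimate.
```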
4.2 Least-squared density estimation of simulated patterns

In the following sections we examine density estimation using the least-squared technique for some simulated point patterns.

4.2.1 Non-homogeneous case

We start by examining the behaviour of the method on simulated point patterns. Our first example is a non-homogeneous example with the density defined below:

$$\lambda(x,y)=100\,e^{-x-2y}$$

for a unit square region ℜ = [0,1]×[0,1]. This is a non-homogeneous process similar to that used in the last two sections, but with reduced density to save computation time. This model is used to generate 100 realisations, which are used to estimate the parameters λ_o, a and b of the following proposed parametric model:

$$\lambda(x,y)=\lambda_o\,e^{-ax-by}$$

The tuning constants selected for the least-squared estimation of these parameters are t_o = 0.5 and c = 1.0, and 20 Monte Carlo simulations are used to estimate the theoretical K(t). One of the realisations of the original model is shown in Figure 4.1 (a) and the estimated distributions for λ_o, a and b are given in Figure 4.1 (b), (c) and (d).


Figure 4.1 (a) Non-homogeneous Points    Fig. 4.1 (b) Estimated distribution for λo

Figure 4.1 (c) Estimated distribution for a    Fig. 4.1 (d) Estimated distribution for b

As can be seen from the distributions, the least-squared estimator does not give satisfactory estimated distributions for the parameters. This is especially true compared with the results obtained by the maximum likelihood estimator (cf. Figure 3.1). There are many reasons for this outcome, but two main ones stand out:

1. The first reason is an artefact of the original model used. The point density may not be high enough, so that the non-homogeneous point pattern characteristics are not obvious; and the number of Monte Carlo simulations used to derive the theoretical K̂(t,θ) may not be large enough, so that the approximated K-function is not stable. These problems can be remedied, for example by using λ_o = 800 in the above model instead of 100, and 100 MC simulations instead of 20 in the estimation. We do not, however, predict a considerable amount of improvement from this modification. If time permits, we will come back to this point and carry out some further verification.

2. The second reason is fundamental and is far more serious than first thought. It is the ineffectiveness of the least-squared method for modelling a non-homogeneous point process. In other words, the least-squared method described above may not be applicable at all in the non-homogeneous case. The method is based on minimising the squared difference between the K-functions of the data and the theoretical values. Unlike the likelihood function l(θ), the K-function is not a location-dependent entity: it only describes the pattern characteristics, and the exact locations of the point patterns are not relevant. A simple example is given in Figure 4.2. In this example, the same point pattern is located in two different places inside the region and both have identical empirical K-functions, as shown in (b) and (d) in the figure.
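The location-independence of the K-function is easy to verify directly: translating a pattern leaves all inter-point distances, and hence the empirical K-function, unchanged. A minimal check using the naive (edge-uncorrected) estimator, with an arbitrary illustrative cluster:

```python
import math

def k_hat_at(points, t, area):
    """Naive empirical K at a single distance t (no edge correction)."""
    n = len(points)
    pairs = sum(1 for i in range(n) for j in range(n)
                if i != j and math.hypot(points[i][0] - points[j][0],
                                         points[i][1] - points[j][1]) <= t)
    return area * pairs / (n * n)

cluster = [(0.10, 0.10), (0.15, 0.12), (0.12, 0.18), (0.20, 0.15)]
shifted = [(x + 0.6, y + 0.7) for (x, y) in cluster]  # same pattern, relocated
# Translation preserves every pairwise distance, so the two empirical
# K-values agree even though the patterns occupy different locations.
same = k_hat_at(cluster, 0.1, 1.0) == k_hat_at(shifted, 0.1, 1.0)
```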

Figure 4.2 (a) A point pattern Fig. 4.2 (b) K-function for (a)

Figure 4.2 (c) Point pattern (a) relocated Fig. 4.2 (d) K-function for (c)

The point put forward by this argument is this: a parametric density model is a location-dependent function but the K-function is not. Using a location-independent statistic to model a location-dependent function is doomed to fail. This is exactly what is shown in the example given in Figure 4.1. We can look at the example again from a different angle. In Figure 4.3 (a) and (c), the models used to generate the point patterns are two different non-homogeneous density functions:

$$\lambda(x,y)=800\,e^{-x-2y}\qquad\text{(for Figure 4.3 a)}$$

$$\lambda(x,y)=800\,e^{-(1-x)-2(1-y)}=800\,e^{-3}\,e^{x+2y}\qquad\text{(for Figure 4.3 c)}$$

The two point patterns, however, have identical K-functions, as shown in Figure 4.3 (b) and (d). If we propose the following parametric form to fit a model for the point pattern in Figure 4.3 (a):

$$\lambda(x,y)=\lambda_o\,e^{ax+by}$$

and the least-squared technique described above is used, we will have an equal chance of obtaining the answers { λ_o = 800, a = -1.0, b = -2.0 }, which is the correct set, and { λ_o = 39.8 (= 800/e³), a = 1.0, b = 2.0 }, which is totally inappropriate for describing the density field shown in Figure 4.3 (a).


Figure 4.3 (a) A point pattern Fig. 4.3 (b) K-function for (a)

Figure 4.3 (c) Another point pattern Fig. 4.3 (d) K-function for (c)

In conclusion, the least-squared technique described above cannot be used for parametric modelling of a non-homogeneous point process. The current literature has an incorrect perception of the application of this technique and hopefully it will not be long before this is put right. This does not mean that the least-squared technique is totally inadequate for point density modelling. In fact, the idea of the technique is universal: the minimisation of the squared difference between the statistics of the dataset and of the model being considered. The key point is the selection of the statistic. Based on the above discussion, the use of the K-function in this context, as recommended in the current literature (see Diggle [ ]), is not suitable. For the technique to be usable and effective, a location-dependent statistic must be used. There is one situation, however, in which the least-squared method discussed above can be used without any modification. This is the case when the density within the region is itself invariant in space, i.e., location-independent. In point processes, this situation corresponds to a homogeneous point process. An application is discussed below in dealing with the cluster process as, in our case, the parent points are modelled as a homogeneous process.

4.2.2 Cluster cases

For the cluster cases we use the Neyman-Scott cluster process. The parent process is assumed to be a homogeneous process with density λ_o. Each parent produces a Poisson number of daughters with mean 20. The daughter points follow a bi-normal distribution with parameter σ, i.e., the distribution function is:


$$h(x,y)=\frac{1}{2\pi\sigma^2}\,e^{-\frac{x^2+y^2}{2\sigma^2}}$$

For the Neyman-Scott cluster process with bi-normal daughter distribution, the theoretical K(t,λo,σ) can be expressed in functional form as follows (see Diggle [] and Cressie []):

$$K(t,\lambda_o,\sigma)=\pi t^2+\frac{1}{\lambda_o}\left(1-e^{-\frac{t^2}{4\sigma^2}}\right)$$

As discussed above, however, this theoretical solution is only valid for t up to the cluster size. From the pattern analysis we know that this means the valid range for the above K-function solution is only up to the first maximum – the first “bump” in the K-function curve, cf. Report 3 []. Caution is therefore needed in applying this solution. We can, however, use Monte Carlo simulations to approximate the theoretical K-function curve, which should extend the valid range for K̂(t,λo,σ) to at least 50% of the shorter side length of a rectangular region. These two different approaches are used below. As the first example of a cluster process, we choose λ_o = 5 and σ = 0.1, i.e., a small number of parents with widely spread daughter points. Figure 4.4 (a) is one of the realisations from the model and Figure 4.4 (b) is its corresponding K(t) function. Because of the large cluster size, the theoretical K-function given above is valid up to t_o = 0.3. Using the tuning constants t_o = 0.3 and c = 1.0, the estimated distributions for parameters λ_o and σ obtained using the theoretical solution for K(t,λo,σ) are given in Figure 4.4 (c) and (d). As can be seen from the results, though the estimates give an indication of the original parameters, the quality of the estimated distributions is quite poor. If we use the parameters λ_o = 5 and σ = 0.05 to generate the point datasets, the corresponding estimated distributions, again using the theoretical K(t,λo,σ) function, are as shown in Figure 4.4 (e) and (f). As can be seen, the quality of the estimation improves over the first example but is still poor. The improvement in estimation quality is mainly due to the fact that a greater dispersion parameter σ causes the daughter points to be more spread out, so the point pattern departs more from cluster characteristics and the theoretical K(t,λo,σ) becomes less applicable. However, if we use Monte Carlo simulation in this case to approximate the theoretical K-function, K̂(t,λo,σ), the quality of the estimated distributions improves considerably (at least the major portion of the estimated values surround the original parameter values), as shown in Figure 4.4 (g) and (h).
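The simulation step for these Neyman-Scott experiments can be sketched as follows (unit square, Poisson parents, Poisson(20) daughters with bi-normal displacements; for simplicity this sketch ignores the edge effect of parents lying just outside the square):

```python
import math
import random

def poisson(mu, rng):
    """Knuth-style Poisson sampler (inversion by multiplying uniforms)."""
    target, prod, n = math.exp(-mu), rng.random(), 0
    while prod > target:
        prod *= rng.random()
        n += 1
    return n

def neyman_scott(lam_parent, mean_daughters, sigma, rng):
    """Neyman-Scott process on the unit square: parents are homogeneous
    Poisson(lam_parent); each spawns a Poisson(mean_daughters) number of
    daughters displaced by independent N(0, sigma^2) offsets."""
    points = []
    for _ in range(poisson(lam_parent, rng)):
        px, py = rng.random(), rng.random()
        for _ in range(poisson(mean_daughters, rng)):
            points.append((px + rng.gauss(0.0, sigma),
                           py + rng.gauss(0.0, sigma)))
    return [(x, y) for (x, y) in points if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0]

def k_theory(t, lam_parent, sigma):
    """Theoretical K(t) for this process (valid only for small t)."""
    return math.pi * t * t + (1.0 - math.exp(-t * t / (4.0 * sigma ** 2))) / lam_parent

pts = neyman_scott(lam_parent=5.0, mean_daughters=20.0, sigma=0.05,
                   rng=random.Random(3))
```

Averaging the empirical K-functions of many such realisations gives the Monte Carlo approximation K̂(t,λo,σ) used in place of the short-range theoretical formula.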


Figure 4.4 (a) One realisation Fig. 4.4 (b) Typical K-function

Figure 4.4 (c) Estimated distribution for λ̂o (λo = 5) – using formula    Fig. 4.4 (d) Estimated distribution for σ̂ (σ = 0.1) – using formula

Figure 4.4 (e) Estimated distribution for λ̂o (λo = 5) – using formula    Fig. 4.4 (f) Estimated distribution for σ̂ (σ = 0.05) – using formula


Figure 4.4 (g) Estimated distribution for λ̂o (λo = 5) – using MC    Fig. 4.4 (h) Estimated distribution for σ̂ (σ = 0.05) – using MC

The considerable difference in estimation quality reveals that the theoretical K(t,λo,σ) function must be used with great care. It is only valid for a small range of t and it is only usable when the point pattern expresses clear cluster characteristics. Figure 4.5 demonstrates this argument. In this figure, the original (correct) model is displayed as curve 1. Because we specify t_o = 0.5, the estimation process will try to derive a model which minimises the squared difference between the theoretical model and the empirical curve (shown as the red dots) within the range t∈[0,0.5]. The result is the incorrect model shown as curve 2 in the figure, which does, however, minimise the squared difference within that range. Experience suggests that the more the characteristics of the point pattern depart from those of point clusters, the more serious the problem of incorrect estimation. If we revisit Figure 4.4 (c) and (d), we can see that a considerable proportion of the estimated parameters go beyond the upper limit of the estimation ranges being considered. These are the situations illustrated in Figure 4.5. Using Monte Carlo simulation to estimate the theoretical K-function, K̂(t,λo,σ), however, eliminates the problem, as the simulation automatically takes into account the cluster characteristics and the validity range. The downside of this approach is that it is an extremely time-consuming process.

4.2.3 Cox case

In general, a Cox process will have a location-dependent density model. From the discussion in Section 4.2.1, the least-squared technique is therefore not suitable for this case and we will not conduct any analysis for this process.

Figure 4.5 Incorrect K(t) estimation



4.3 Least-squared density estimation of two example datasets

We now turn to the least-squared estimation for the two example datasets used in these reports. For reasons of time, only analyses of the full datasets will be conducted.

4.3.1 Dataset 1 – full

For the same reason as given in Section 4.2.3, the least-squared analysis for dataset 1 is skipped.

4.3.2 Dataset 2 – full

For the second dataset (full), we use the homogeneous parametric density model proposed in Section 3.3, as given below:

$$\lambda(x,y)=\begin{cases} a, & (x,y)\in\Re' \\ 0, & \text{otherwise} \end{cases}$$

The region ℜ’ again gives the boundary within which the density function is defined, as shown in Figure 4.6 (a). As mentioned above, the least-squared technique described above can be applied directly to an invariant density field without any modification. Direct estimation using the technique gives the value 1.25 for the parameter a [an unbiased estimate for a is N/A(ℜ’), where N is the number of points within the area A defined by region ℜ’]. Figure 4.6 (b) shows the squared error function for parameter values ranging from 0 to 2.0. At first sight it is not easy to determine the “best” parameter. This is caused by the small number of Monte Carlo simulations used to derive the theoretical K̂(t,θ) function; we use only 6 MC simulations to save computing time. If a larger number is chosen, say 50 – 100, a much more stable result can be obtained. The reciprocal of the squared error function, as shown in Figure 4.6 (c), helps to identify the “best” least-squared parameter. This parameter, from Figure 4.6 (c), is estimated to be 1.25.

Fig. 4.6 (a) Sub-region ℜ′ for the model
Fig. 4.6 (b) Squared error function


Fig. 4.6 (c) Reciprocal of squared error function

To test the effectiveness of the fitted model, we again conduct 100 Monte Carlo simulations using the model with a = 1.25. Figure 4.7 (a) shows one of the realisations. We then conduct the distance and K-function analyses on both the data and the simulated datasets; the remaining panels of Figure 4.7 compare the results. Also displayed in the figures are the 95% confidence envelopes derived from the simulations.
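The construction of such pointwise 95% confidence envelopes can be sketched as follows. This is a minimal sketch on the unit square using a homogeneous Poisson model and the nearest-event distance G(t); the function names, point count, and t-grid are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def g_hat(pts, ts):
    """Empirical nearest-event distance distribution G(t)."""
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-distances
    nn = d.min(axis=1)                   # nearest-neighbour distances
    return np.array([np.mean(nn <= t) for t in ts])

ts = np.linspace(0.0, 0.2, 21)
n = 100

# "data": one homogeneous Poisson realisation on the unit square
data = rng.random((n, 2))
g_data = g_hat(data, ts)

# pointwise 95% envelope from 100 simulations of the fitted model
sims = np.array([g_hat(rng.random((n, 2)), ts) for _ in range(100)])
lo = np.percentile(sims, 2.5, axis=0)
hi = np.percentile(sims, 97.5, axis=0)

inside = np.mean((g_data >= lo) & (g_data <= hi))
print(f"fraction of G(t) values inside the envelope: {inside:.2f}")
```

If the data curve stays largely inside the envelope, the fitted model is judged compatible with the data, which is the criterion applied to the panels of Figure 4.7.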

Fig. 4.7 (a) One realisation of the model
Fig. 4.7 (b) Inter-event distance H(t)

Fig. 4.7 (c) Inter-event distance (histogram)
Fig. 4.7 (d) Nearest-event distance G(t)


Fig. 4.7 (e) Nearest-event distance (histogram)

Fig. 4.7 (f) Point-nearest-event distance F(t)

Fig. 4.7 (g) f(x) – histogram
Fig. 4.7 (h) K-function K(t)

The following two conclusions can be drawn from the analysis:

• The fitted model is effective in a global sense as all the statistics of the dataset are honoured satisfactorily.

• Compared with the analysis conducted in Section 3.4 and displayed in Figure 3.11, the least-squares model gives a slightly better fit in this case (the model derived by the maximum likelihood method has parameter a = 1.34, compared with the 1.25 derived here).

5. General conclusions

Density estimation is one of the most important issues in point process modelling. Two categories of methods are available for this purpose:

1. Non-parametric methods, using one of the following approaches:
• Unbiased estimate
• Kernel estimate
• Distance estimate

2. Parametric methods, divided into two approaches:
• Maximum likelihood method
• Least-squares method
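As a concrete illustration of the kernel estimate listed among the non-parametric approaches, a minimal 2-D Gaussian kernel density estimator might look like the following. The bandwidth value, the evaluation grid, and the simulated data are illustrative assumptions, not the implementation used in this report.

```python
import numpy as np

rng = np.random.default_rng(3)

def kernel_density(pts, grid_xy, h):
    """2-D Gaussian kernel estimate of point density (points per unit area).

    pts     : (n, 2) event locations
    grid_xy : (m, 2) locations at which to evaluate the density
    h       : bandwidth (smoothing factor)
    """
    diff = grid_xy[:, None, :] - pts[None, :, :]
    sq = np.sum(diff ** 2, axis=-1)
    # sum of kernels (not an average): the estimate integrates to n,
    # so its units are points per unit area, matching lambda(X)
    return np.sum(np.exp(-sq / (2 * h * h)), axis=1) / (2 * np.pi * h * h)

pts = rng.random((200, 2))            # 200 events on the unit square
grid = rng.random((50, 2))            # evaluation locations
dens = kernel_density(pts, grid, h=0.1)
print("mean estimated density:", dens.mean())
```

The choice of h controls the smoothing: a small h reproduces local fluctuations (and noise), a large h flattens the estimate towards the global average, which is the subjectivity issue raised in the conclusions below.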


Based on the above analysis, we can conclude that, in general, non-parametric estimation using kernel functions is very effective and can give a very accurate representation of the actual dataset considered. This approach, however, is data-driven and is reliable only when enough representative data are present; it completely ignores the underlying physical (possibly functional) process that generates the point pattern. The selection of an appropriate bandwidth (smoothing factor) is subjective and sometimes difficult, and more research into choosing a suitable bandwidth objectively is suggested. Nevertheless, the non-parametric method provides a quick and effective tool for modelling the density. The derived model can be used in its own right, or it can serve as a preliminary estimate to help build an appropriate parametric model.

The maximum likelihood (ML) parametric method also proves to be very effective. As long as a suitable parametric density form is proposed, ML estimation gives a consistent model that maximises the likelihood of the displayed point pattern. The difficulty in applying this method is twofold: firstly, an appropriate parametric model has to be constructed, and this determines the final effectiveness of the model; secondly, the likelihood function must be derived, which for complex point processes may prove difficult.

The least-squares technique based on K-functions, on the other hand, is found to be ineffective for modelling point processes. The only exception is when the point process is homogeneous, in which case density estimation is not much of an issue; in this sense it can be used for modelling a cluster process whose parent process is homogeneous. For other cases it is almost useless. For this technique to work properly, a location-dependent statistic must be used in place of the location-independent K-function. More research into this point is suggested.

6. References
1. Cressie, N. A. C., Statistics for spatial data, John Wiley & Sons, Inc., New York, 1993.
2. Diggle, P., On parameter estimation for spatial point processes, J. R. Statist. Soc. B, Vol. 40, No. 2, pp. 178-181, 1978.
3. Diggle, P., Statistical analysis of spatial point patterns, Academic Press, 1983.
4. La Pointe, P. R., Pattern analysis and simulation of joints for rock engineering, in Comprehensive rock engineering: principles, practice & projects, Chapter 10, p. 230, J. A. Hudson (editor-in-chief), Pergamon, Oxford, 1993.
5. Lee, J. S., Einstein, H. H. & Veneziano, D., Stochastic and centrifuge modelling of jointed rock, Part III – stochastic and topological fracture geometry model, Technical Report MIT CE R-90-25, MIT, 1990.
6. Silverman, B. W., Choosing the window width when estimating a density, Biometrika, Vol. 65, No. 1, pp. 1-11, 1978.
7. Silverman, B. W., Density estimation, Chapman and Hall, London, 1986.
8. Xu, C., Dowd, P., Mardia, K., Fowell, R., Basic analysis of spatial point patterns, Technical Report 3 (EPSRC GR/R94602), Department of Mining and Mineral Engineering, University of Leeds, January 2003.