
Department of Information Technology

Modelling Bivariate Distributions Using Kernel Density Estimation

Alexander Bilock, Carl Jidling and Ylva Rydin

Project in Computational Science

26 January 2016


Abstract

Kernel density estimation is a topic covering methods for computing continuous estimates of the underlying probability density function of a data set. A wide range of approximation methods are available for this purpose; these include the use of binning on coarser grids and the fast Fourier transform (FFT) in order to speed up the calculations. A key factor in the kernel density estimation process is the selection of the so-called kernel bandwidth. The aim of this project is to implement different kernel density estimation approaches proposed in the literature and compare their performance in terms of speed and accuracy. Matlab is used as the main environment for the implementation. The results show that using the FFT can speed up the calculation with almost maintained accuracy if the data is binned on a dense grid. Some general advice for the selection of the kernel bandwidth is also discussed.


Contents

1 Introduction
2 Univariate kernel density estimates
3 Bivariate kernel density estimates
4 Error Estimation
5 Approximations
  5.1 Binning
  5.2 Fourier transform
6 Bandwidth selection
  6.1 Plug-in bandwidth selection
  6.2 Cross validation
    6.2.1 Smoothed cross validation
  6.3 Pre-transformation
7 Applications of KDE
  7.1 Cloud transform
  7.2 Examples with real data
8 Method and results
  8.1 Comparison of binning methods
  8.2 Comparison of KDE-calculation methods
  8.3 Comparison of bandwidth selection methods
9 Summary and conclusions
A Comparison of binning methods
B Comparison of KDE-calculation methods
C Comparison of bandwidth selection methods


1 Introduction

In many fields of science data exploration is of significant importance. In one dimension, investigating the properties of a data set can often be done intuitively. However, in higher dimensions detecting properties such as skewness and multi-modality may be difficult. In lower dimensions histograms can be used to reveal some of the properties, but making a smooth estimate of the underlying probability density function (PDF) is often desired. A popular method for doing that is kernel density estimation (KDE). The purpose of this work is to implement two-dimensional KDEs in Matlab using different methods and investigate them in terms of accuracy and speed.

In Sections 2 and 3 the theory for kernel density estimation is presented. Error estimation is introduced in Section 4. Section 5 describes approximate ways of calculating KDEs in order to increase the speed. In Section 6 the bandwidth concept is introduced, with a walk-through of existing algorithms. An application field for KDEs is introduced in Section 7, including some examples with geostatistical data. Section 8 presents the methods of, and results from, the performance study. Conclusions and analysis are found in Section 9.

2 Univariate kernel density estimates

One way to explore the properties of a data set is by constructing a histogram. If the histogram is normalised, it yields a non-smooth representation of the PDF. A KDE is used to get a smooth estimate of the PDF. The univariate KDE $\hat f$ of the PDF $f$ is defined as

$$\hat f(x, h) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - x_i) \qquad (1)$$

for a dataset with $n$ samples $x = [x_1, x_2, x_3, \ldots, x_n]$ from $f$. The kernel function $K_h(u) = \frac{1}{h}K(u/h)$ is a symmetric and non-negative function fulfilling $\int_{\mathbb{R}} K_h(u)\,du = 1$. There is a wide range of kernels, although the kernel function does not have a significant impact on the estimator. In this work the two most commonly used have been considered, namely the Gaussian kernel


$$K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}u^2}, \qquad (2)$$

and the Epanechnikov kernel

$$K(u) = \frac{3}{4}(1 - u^2)\,\mathbf{1}_{\{|u|<1\}}, \qquad (3)$$

where $\mathbf{1}_{\{\cdot\}}$ is the indicator function

$$\mathbf{1}_{\{|u|<1\}} = \begin{cases} 1 & \text{if } |u| < 1, \\ 0 & \text{otherwise.} \end{cases}$$

The main difference between those kernels is that while the Gaussian kernel has infinite support (it is non-zero everywhere), the Epanechnikov kernel is non-zero only on a limited domain. The parameter $h$ is called the bandwidth of the kernel. The choice of $h$ is the most important factor regarding the accuracy of the estimate. The bandwidth selection methods used in this project are described in Section 6.

A simple visualisation is seen in Figure 1. It shows a KDE of the dataset $x = [-3.9;\ -2.9;\ -1.8;\ 0.9;\ 1.5;\ 2]$, calculated with a Gaussian kernel and $h = 0.75$. For comparison, a histogram constructed from the same points is shown as well. In the left panel the blue dots are the data points and the red curves are the kernels evaluated at each point. The green curve is the final KDE.
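To make equation (1) concrete, the following Matlab sketch (illustrative, not the project code) evaluates the estimate for the data set and bandwidth above:

```matlab
% Univariate Gaussian KDE of the six-point data set above, eq. (1).
x  = [-3.9; -2.9; -1.8; 0.9; 1.5; 2];   % data points
h  = 0.75;                              % bandwidth
xg = linspace(-6, 5, 400)';             % evaluation grid
f  = zeros(size(xg));
for i = 1:numel(x)
    u = (xg - x(i)) / h;
    f = f + exp(-0.5*u.^2) / (sqrt(2*pi)*h);   % K_h(xg - x_i)
end
f = f / numel(x);                       % average of the n kernels
plot(xg, f)                             % the green curve in Figure 1(a)
```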

3 Bivariate kernel density estimates

In the bivariate case the data points are represented by two vectors $x_1 = [x_{11}, x_{12}, x_{13}, \ldots, x_{1n}]$ and $x_2 = [x_{21}, x_{22}, x_{23}, \ldots, x_{2n}]$, where $x_i = (x_{1i}, x_{2i})$ is a sample from a bivariate distribution $f$. In analogy with the univariate case, the bivariate kernel density estimate is defined as

$$\hat f(x, H) = \frac{1}{n}\sum_{i=1}^{n} K_H(x - x_i). \qquad (4)$$


[Figure 1: (a) KDE, showing the data points, the individual kernels and the resulting kernel estimate; (b) histogram of the same data. Axes: x versus density.]

Figure 1: Kernel density estimation and histogram from a dataset with 6 points.

Here the bandwidth is the positive definite matrix

$$H = \begin{bmatrix} h_{11} & h_{12} \\ h_{12} & h_{22} \end{bmatrix}, \qquad (5)$$

and the kernel function $K_H$ is a symmetric and non-negative function fulfilling $\int_{\mathbb{R}^2} K_H(u)\,du = 1$. In the bivariate case $K_H(u) = |H|^{-1/2} K(H^{-1/2}u)$.

As in the univariate case, the bivariate kernels used in this work have been the Gaussian kernel,

$$K(u) = \frac{1}{2\pi}\, e^{-\frac{1}{2}u^T u}, \qquad (6)$$

and the Epanechnikov kernel,

$$K(u) = \frac{2}{\pi}(1 - u^T u)\,\mathbf{1}_{\{u^T u < 1\}}. \qquad (7)$$

Figure 2 demonstrates the difference between a bivariate histogram and a kernel density estimate. It shows a dataset generated from a combination of two bivariate normal distributions, visualised through a scatter plot, a histogram, a Gaussian kernel density estimate and the true PDF.
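For reference, a direct evaluation of (4) with the Gaussian kernel on a grid can be sketched in a few lines of Matlab (illustrative; X is an assumed n-by-2 data matrix and H an assumed 2-by-2 bandwidth matrix):

```matlab
% Direct evaluation of the bivariate KDE (4) with a Gaussian kernel.
[Gx, Gy] = meshgrid(linspace(-3, 3, 60), linspace(-4, 3, 60));
Hi    = inv(H);                       % inverse bandwidth matrix
const = 1 / (2*pi*sqrt(det(H)));      % Gaussian normalisation |H|^(-1/2)/(2*pi)
fhat  = zeros(size(Gx));
for i = 1:size(X, 1)
    dx = Gx - X(i,1);  dy = Gy - X(i,2);
    q  = Hi(1,1)*dx.^2 + 2*Hi(1,2)*dx.*dy + Hi(2,2)*dy.^2;  % (x-X_i)' H^-1 (x-X_i)
    fhat = fhat + const * exp(-0.5*q);
end
fhat = fhat / size(X, 1);
```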


[Figure 2: (a) Scatter plot; (b) True density; (c) Histogram; (d) Kernel density estimate.]

Figure 2: Comparison between scatter plot, histogram and KDE for a dataset generated from two normal distributions.

4 Error Estimation

To assess the closeness of a kernel density estimator to the target density, an error criterion must be used. A common error measure for kernel density estimation is the Mean Integrated Square Error (MISE):

$$\mathrm{MISE}(\hat f) = \mathbb{E}\left[\int \left(\hat f(x, H) - f(x)\right)^2 dx\right]. \qquad (8)$$

Since the MISE depends on the true density $f$, it can only be calculated for data sets drawn from known distributions. The MISE can be approximated by the Integrated Mean Square Error (IMSE), whose expression is obtained by moving the expectation in (8) inside the integral. The IMSE can be calculated numerically using, for instance, Monte Carlo integration. The algorithm goes as follows:

6

Page 8: Modelling Bivariate Distributions Using Kernel Density ... · Modelling Bivariate Distributions Using Kernel Density Estimation Alexander Bilock, Carl Jidling and Ylva Rydin ... Matlab

• Generate $m$ datasets, each with $n$ random points from the density $f$, together with a uniform grid $[X, Y]$.

• Generate a set of $k$ uniformly distributed random points $x_c$ on the grid.

• For each one of the $m$ datasets a KDE $\hat f$ is calculated and evaluated on the grid.

• Use linear interpolation to obtain an approximation $\hat f(x_c, h)$ of the KDE at the random points $x_c$.

• The Mean Squared Error (MSE) at each point is given by
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(\hat f_i(x_c, h) - f(x_c)\right)^2. \qquad (9)$$

• The Integrated Mean Square Error is approximated as $\mathrm{IMSE} = \overline{\mathrm{MSE}} \cdot A$, where $\overline{\mathrm{MSE}}$ is the mean of the MSE over all Monte Carlo points and $A$ is the area of the domain spanned by the grid $[X, Y]$.
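A Matlab sketch of this procedure could look as follows (illustrative; sample_from_f, kde_on_grid and ftrue are assumed helpers for sampling from the known density, evaluating the KDE on the grid, and evaluating the true density, H is an assumed bandwidth matrix, and the domain is taken to be [-3, 3]^2 for concreteness):

```matlab
% Monte Carlo estimation of the IMSE, following the steps above.
m = 100; n = 1000; k = 10000;
[X, Y] = meshgrid(linspace(-3, 3, 60));
xc = -3 + 6*rand(k, 1);  yc = -3 + 6*rand(k, 1);   % Monte Carlo points
A  = 6 * 6;                                        % area of the domain

sqerr = zeros(k, m);
for i = 1:m
    data = sample_from_f(n);                       % n random points from f
    Fhat = kde_on_grid(data, X, Y, H);             % KDE evaluated on the grid
    fi   = interp2(X, Y, Fhat, xc, yc, 'linear');  % KDE at the MC points
    sqerr(:, i) = (fi - ftrue(xc, yc)).^2;
end
MSE  = mean(sqerr, 2);    % eq. (9): mean over the m data sets, per point
IMSE = mean(MSE) * A;     % mean over the MC points times the domain area
```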

In some situations it is more interesting to study the Integrated Square Error (ISE). The difference from the IMSE calculation above is that no mean is taken to form the MSE; instead, the squared error is saved for each data set. The result can thereafter be integrated as above to form the ISE and presented, e.g., in box plots to visualise the deviations from its mean value, which then is an approximate MISE.

Provided the number of sample points and the bandwidth matrix, exact values of the MISE can be calculated in closed form if $f$ is a combination of normal distributions and $K$ is the Gaussian kernel, as described by equation (4.16) in [2]. This closed form can be used in comparison studies of bandwidth selection methods.

The Asymptotic MISE (AMISE) is an approximation of the MISE used in bandwidth selection, since it depends on the bandwidth $h$ in a simpler way. In Wand and Jones (1995) [2] it is stated that, under certain assumptions on $f$, $h$ and $K$,

$$\mathrm{AMISE}(\hat f) = (nh)^{-1}R(K) + h^4\left(\frac{\mu_2(K)}{2}\right)^2 R(f''), \qquad (10)$$

where $R(L) = \int L^2(x)\,dx$ and $\mu_2(L) = \int x^2 L(x)\,dx$ for any function $L$.


5 Approximations

5.1 Binning

In many practical applications direct computation of the kernel density estimate is too computationally expensive. One strategy to reduce the computational load is to use binning. Instead of calculating the kernel estimates at each data point, an approximation is made by binning the data on the grid where the KDE is calculated. In this way the number of kernel evaluations is changed from $O(nM)$ to $O(M^2)$, where $M$ is the number of grid points (in any dimension). This implies that binning reduces the computational burden provided that the number of data points exceeds the number of grid points (neglecting the time required for the binning itself). The expression for the approximate, binned KDE in dimension $d$ is

$$\hat f(x_i) = \frac{1}{n}\sum_{l_1=1}^{M_1}\cdots\sum_{l_d=1}^{M_d} K_H(x_i - x_l)\, c_l, \qquad (11)$$

where $c_l$ is the weight assigned to the grid point $x_l$.

The two most commonly used binning rules are simple binning and linear binning. In the univariate case, simple binning assigns a unit mass to the grid point nearest to the data point $x$. In the case of linear binning, $x$ gives a weighted contribution to both of the surrounding grid points: if $y$ and $z$ are the left and right surrounding grid points, the weighted masses are $(z - x)/(z - y)$ for $y$ and $(x - y)/(z - y)$ for $z$. The extension to the bivariate case and higher dimensions is straightforward; the line between the two closest grid points in one dimension is replaced by the area enclosed by the four surrounding grid points in the bivariate case, and so on with volumes in higher dimensions. The approximation by linear binning is considerably more accurate than simple binning. Moreover, the number of grid points can be a quarter as many for linear binning as compared to simple binning with maintained accuracy [1]. Figure 3 illustrates a bivariate example of linear binning.
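A minimal sketch of the univariate rule (illustrative, assuming an equally spaced grid g that covers all data points) could look as follows; the bivariate version computes the same weights in each direction and multiplies them over the four surrounding grid points:

```matlab
% Univariate linear binning on an equally spaced grid g (illustrative).
function c = linbin(x, g)
    d = g(2) - g(1);                      % grid spacing
    c = zeros(size(g));
    for i = 1:numel(x)
        j = floor((x(i) - g(1)) / d) + 1; % index of the left grid point
        if j >= 1 && j < numel(g)
            w = (x(i) - g(j)) / d;        % relative position in the cell, (x-y)/(z-y)
            c(j)   = c(j)   + (1 - w);    % mass (z-x)/(z-y) to the left point
            c(j+1) = c(j+1) + w;          % mass (x-y)/(z-y) to the right point
        end
    end
end
```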

5.2 Fourier transform

As described in Section 5.1, an approximation of the KDE can be calculated by binning the data and assigning a weight to each grid point.



Figure 3: Bivariate linear binning with green markers as data, the mesh represented by blue lines and scaled weight contributions as filled red circles.

The more the number of data points exceeds the number of grid points, the faster the binned calculation will be compared to the calculation according to the definition. The speed can be increased further by making use of the fast Fourier transform (FFT). The key point is that expression (11) for the binned approximation can be rewritten in the form of a convolution,

$$\hat f_j = \sum_{l_1=-L_1}^{L_1}\cdots\sum_{l_d=-L_d}^{L_d} c_{j-l}\, k_l, \qquad (12)$$

where $L_i = M_i - 1$, although it can be shrunk for a slightly reduced computational burden. Furthermore,

$$k_l = \frac{1}{n} K_H(\delta_1 l_1, \ldots, \delta_d l_d),$$

where $\delta_i$ is the mesh size in direction $i$. With the convolution form (12), the Fourier transform can easily be applied, and using the FFT is recommended since the computational load is reduced from $O(M^2)$ to $O(M\log M)$.

An FFT method for KDE calculations is presented by Wand in [1]. This algorithm, however, suffers from the drawback of not allowing unconstrained bandwidth matrices. A corrected version of the algorithm was recently presented by Gramacki and Gramacki in [3], which is the one used in the implementation in this work.

As can be seen in Section 8, the FFT method surpasses the binned calculation (11) in terms of computation time. Regarding accuracy, no numerical difference has been detected. However, the FFT method may introduce some visual artifacts, as seen in Figure 4. This is assumed to be caused by numerical errors due to the limited precision of the floating point format. Attempts to remove the effect by extended zero-padding of the computational domain were unsuccessful.
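To illustrate the idea, the following Matlab sketch evaluates (12) with zero-padded FFTs for a diagonal bandwidth matrix (the unconstrained case requires the reformulation in [3]); c is assumed to hold the linear binning weights on an M1-by-M2 grid with spacings dx and dy, and n is the sample size:

```matlab
% FFT evaluation of the binned KDE (12) for diagonal H (illustrative).
[M1, M2] = size(c);
[L2g, L1g] = meshgrid((-(M2-1):(M2-1))*dy, (-(M1-1):(M1-1))*dx);
k = exp(-0.5*(L1g.^2/H(1,1) + L2g.^2/H(2,2))) ...
    / (2*pi*sqrt(H(1,1)*H(2,2)));            % K_H at the grid offsets; 1/n applied below
P = 2.^nextpow2([size(k,1) + M1, size(k,2) + M2]);  % zero-padding avoids wrap-around
F = real(ifft2(fft2(c, P(1), P(2)) .* fft2(k, P(1), P(2)))) / n;
fhat = F(M1:(2*M1-1), M2:(2*M2-1));          % valid part of the convolution
```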

[Figure 4: (a) Linear binning; (b) Linear binning with FFT.]

Figure 4: Approximate versions of the KDE in Figure 2(d). Linear binning has been used in both cases, but the KDE to the right has been calculated using FFT. It is seen that this has introduced some artifacts.

6 Bandwidth selection

An implementation of kernel density estimation requires the selection of a bandwidth, denoted $h$ in the univariate case and $H$ in the bivariate. The choice of bandwidth has been shown to be of greater importance than the actual choice of kernel [2]. Figure 5 demonstrates the importance of an appropriate bandwidth. In 5(a) the KDE is over-smoothed, caused by a too large value of $h$, and it therefore misses some of the distribution's structural behaviour. On the other hand, a too small $h$, as in 5(b), makes the KDE under-smoothed. In 5(c) the bandwidth is calculated according to Silverman's "rule of thumb" described in [2], and the KDE seems to capture the actual bimodality of the distribution.


[Figure 5: (a) h = 2; (b) h = 0.05; (c) h = 0.23. Each panel shows the histogram and the kernel estimate.]

Figure 5: Kernel density estimation with a Gaussian kernel for three different values of h and a data set with 2000 sample points from a combined normal density.

In the univariate case it is possible to choose the bandwidth by inspection. This is done by calculating the KDE for a large value of $h$ and then decreasing $h$ until the KDE in some sense looks satisfactory. This approach is also possible in the bivariate case, but in higher dimensions the data cannot be visualised intuitively. Visual inspection of the bandwidth also assumes some knowledge of the data, for example the positions of the modes. In many situations the distribution is totally unknown and an automatic bandwidth selection is preferred in order to avoid the problems of the inspection method.

The previously mentioned "rule of thumb" is a bandwidth selection method which is very easy to understand and implement. The "rule of thumb" gives a satisfying result in many situations and can serve as a useful starting point. However, the method lacks in terms of robustness and optimality. A more robust and in some sense optimal alternative to the "rule of thumb" is to try to minimise the AMISE. Calculating such a bandwidth in the univariate case is manageable but becomes very complex in higher dimensions. The extension to bivariate bandwidth selection increases the complexity significantly, since the bivariate bandwidth $H$ is the matrix defined in equation (5). Often some simplification can be made by considering diagonal $H$, and in some cases it has been shown that a diagonal $H$ can be sufficient [6]. On the other hand, a diagonal $H$ does not support an arbitrary change of the kernel orientation, which in some cases is quite crucial. In the next two sections the main classes of bandwidth selection methods will be presented, namely plug-in methods (PI) and cross-validation (CV).
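For reference, one common univariate form of this rule with a Gaussian kernel is the normal reference rule $h = 1.06\,\hat\sigma\, n^{-1/5}$ (see [2]), which in Matlab is a one-liner:

```matlab
% Normal reference "rule of thumb" bandwidth (univariate, Gaussian kernel).
h = 1.06 * std(x) * numel(x)^(-1/5);
```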

6.1 Plug-in bandwidth selection

As previously mentioned, most available bandwidth selection methods aim to minimise the asymptotic error estimate AMISE. In the univariate case the following expression for the optimal bandwidth $h_{\mathrm{AMISE}}$ can be obtained by differentiating the AMISE expression (10) with respect to $h$ and setting the derivative equal to zero:

$$h_{\mathrm{AMISE}} = \left[\frac{R(K)}{\mu_2(K)^2 R(f'')\, n}\right]^{1/5}. \qquad (13)$$

Usually the only unknown quantity in the expression above is the actual probability density function $f$. In the plug-in method $R(f'')$ is replaced by the kernel functional estimator $\hat\psi_4(g)$, obtained from the formula

$$\hat\psi_r(g) = n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n} L_g^{(r)}(X_i - X_j), \qquad (14)$$

where $L_g$ is an appropriate kernel and $g$ is the pilot bandwidth. The pilot bandwidth is usually chosen by applying the formula for the AMISE optimal bandwidth again:

$$g_{\mathrm{AMISE}} = \left[\frac{2K^{(4)}(0)}{-\mu_2(K)\,\psi_6\, n}\right]^{1/7}. \qquad (15)$$

This has the effect of introducing $\psi_6$, which requires a new pilot bandwidth to be estimated; every new estimate $\hat\psi_r$ will depend on $\psi_{r+2}$. The common solution to this problem is, at some point, to estimate $\psi_r$ with an easily obtained estimate such as the rule of thumb instead of an AMISE-based approximation. This yields a variety of plug-in methods, differing in the number of stages in which kernel functional estimators are obtained before the simple estimate is applied. If $k$ stages are applied before the simple estimate, it is referred to as a $k$-stage plug-in method. Several versions of the PI method have been developed. The most well-known univariate plug-in selector is the algorithm developed by Sheather and Jones (1991) [4].
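As an illustration of (14), the following sketch estimates the univariate functional $\psi_4$ with a Gaussian pilot kernel, using that $\phi^{(4)}(u) = (u^4 - 6u^2 + 3)\phi(u)$ and $L_g^{(4)}(u) = g^{-5}L^{(4)}(u/g)$:

```matlab
% Kernel functional estimator (14) for r = 4 with a Gaussian pilot kernel.
function p = psi4hat(x, g)
    n  = numel(x);
    D  = (x(:) - x(:)') / g;                        % pairwise (X_i - X_j)/g
    L4 = (D.^4 - 6*D.^2 + 3) .* exp(-0.5*D.^2) / sqrt(2*pi);  % 4th derivative of phi
    p  = sum(L4(:)) / (n^2 * g^5);                  % n^(-2) * sum of L_g^(4)
end
```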

The plug-in method can be extended to several dimensions, first shown by Wand and Jones (1993) [6] and refined and optimised by Duong and Hazelton (2003) [5]. In the bivariate case the plug-in method aims to minimise the bivariate AMISE,

$$\mathrm{AMISE}\,\hat f(H) = n^{-1}|H|^{-1/2}R(K) + \frac{1}{4}\mu_2(K)^2\,(\mathrm{vech}^T H)\,\psi_4\,(\mathrm{vech}\, H), \qquad (16)$$

where vech denotes the following half-vectorisation operation:

$$\mathrm{vech}\, H = \mathrm{vech}\begin{bmatrix} h_{11} & h_{12} \\ h_{12} & h_{22} \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{22} \end{bmatrix}^T. \qquad (17)$$

The $3 \times 3$ matrix $\psi_4$ is defined as

$$\psi_4 = \begin{bmatrix} \psi_{40} & 2\psi_{31} & \psi_{22} \\ 2\psi_{31} & 4\psi_{22} & 2\psi_{13} \\ \psi_{22} & 2\psi_{13} & \psi_{04} \end{bmatrix}, \qquad (18)$$

where

$$\psi_{r_1 r_2} = \int_{\mathbb{R}^2} f^{(r_1, r_2)}(x)\, f(x)\, dx$$

and

$$f^{(r_1, r_2)}(x) = \frac{\partial^4}{\partial x_1^{r_1}\, \partial x_2^{r_2}}\, f(x)$$


is the corresponding partial derivative of $f$ with respect to $x_1$ and $x_2$.

As in the univariate case, $\psi_{r_1, r_2}$ has to be estimated. A commonly used estimate is

$$\hat\psi_{(r_1, r_2)}(G) = n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n} K_G^{(r_1, r_2)}(X_i - X_j), \qquad (19)$$

where $G$ is the pilot bandwidth matrix. In Duong and Hazelton (2003) [5] it is suggested that this matrix should be of the form $G = g^2 I$. Choosing $g$ can be done in a similar way as in the univariate case: for each entry $\psi_{(r_1, r_2)}$ in $\psi_4$, $g = g_{\mathrm{AMSE}}$ is chosen such that it minimises the Asymptotic Mean Square Error approximation

$$\mathrm{AMSE}\,\hat\psi_{(r_1,r_2)}(g) = 2n^{-2}g^{-2(r_1+r_2)-2}\,\psi_0\, R(K^{(r_1+r_2)}) + \left(n^{-1}g^{-(r_1+r_2)-2}K^{(r_1+r_2)}(0) + \frac{1}{2}g^2\mu_2(K)\left(\psi_{r_1+2,r_2} + \psi_{r_1,r_2+2}\right)\right)^2. \qquad (20)$$

This method may produce matrices $\psi_4$ that are not positive definite, in which case a minimum of the objective function does not exist. To solve this issue, Duong and Hazelton suggest another approach: instead of finding one optimal $g$ for each entry in $\psi_4$, the $g = g_{4,\mathrm{SAMSE}}$ that minimises the sum

$$\mathrm{SAMSE} = \sum_{r_1+r_2=4} \mathrm{AMSE}\,\hat\psi_{(r_1,r_2)}(g)$$

should be calculated and used as a common $g$ for all entries in $\psi_4$. A closed form expression for $g_{4,\mathrm{SAMSE}}$ is stated in Duong and Hazelton (2003) [5]. In analogy with the univariate case, the estimate of $g$ depends on $\psi_{r_1,r_2}$, and therefore an easy estimate of $\psi_{r_1,r_2}$ has to be made at some stage.

The plug-in method as described above requires higher order derivatives. It is therefore not possible to implement the method for the Epanechnikov kernel, since its derivatives of second order and higher are all equal to zero.


6.2 Cross validation

The most commonly used bandwidth selectors besides PI belong to the class using cross-validation (CV). Generally, methods based on CV can be applied to any kernel; this differs from the PI methods, which usually require higher order derivatives. The MISE previously defined in equation (8) can be rewritten as

$$\mathrm{MISE}(h) = \mathbb{E}\left[\int\left(\hat f(x, h) - f(x)\right)^2 dx\right] = \mathbb{E}\left[\int \hat f(x, h)^2\, dx - 2\int \hat f(x, h) f(x)\, dx + \int f(x)^2\, dx\right]. \qquad (21)$$

CV aims to minimise the MISE, which is equivalent to keeping the estimate $\hat f(x)$ as close to $f(x)$ as possible. The third term in (21) is independent of the bandwidth, and the equivalent minimisation problem can be written as

$$\mathrm{MISE}(h) - \int f(x)^2\, dx = \mathbb{E}\left[\int \hat f(x, h)^2\, dx - 2\int \hat f(x, h) f(x)\, dx\right]. \qquad (22)$$

The calculation of the first term on the right-hand side is quite straightforward, since it only involves known quantities. However, the second term complicates things, since it involves the unknown quantity $f(x)$. Several versions of bandwidth selection methods using CV have been developed, but the main focus here has been to investigate smoothed cross-validation.

6.2.1 Smoothed cross validation

The most commonly used bandwidth selector within the CV family is smoothed cross-validation (SCV). SCV can be seen as a general method for bandwidth selection, and it usually performs better than other CV methods. The method is presented here for the bivariate case. SCV uses the following pilot estimate to approximate $f$ in equation (22):

$$\hat f_L(x, G) = n^{-1}\sum_{i=1}^{n} L_G(x - X_i). \qquad (23)$$


Here $L$ is an appropriate kernel and $G$ the pilot bandwidth. This gives the objective function

$$\mathrm{SCV} = n^{-1}R(K)|H|^{-1/2} + n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(K_H * K_H * L_G * L_G - 2K_H * L_G * L_G + L_G * L_G\right)(X_i - X_j), \qquad (24)$$

where $*$ denotes convolution. This method is similar to PI in the sense that a pilot estimate is used. As in PI estimation, the choice of $G$ is important and there are different ways to choose it; usually it is chosen in the same way as $g$ in the PI selection in Section 6.1. Since that method cannot be applied to the Epanechnikov kernel, neither can this version of SCV. The convolutions in equation (24) simplify considerably if a Gaussian kernel is used, since in that case there is a closed form expression [8].
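The closed form rests on the fact that convolutions of Gaussian densities are again Gaussian: writing $\phi_\Sigma$ for the $N(0, \Sigma)$ density,

$$\phi_{\Sigma_1} * \phi_{\Sigma_2} = \phi_{\Sigma_1 + \Sigma_2},$$

so with Gaussian $K$ and $L$ each term in (24) collapses to a single normal density, e.g. $K_H * K_H * L_G * L_G = \phi_{2H + 2G}$, evaluated at $X_i - X_j$.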

6.3 Pre-transformation

Bandwidth selection methods with pilot bandwidths often require some sort of pre-transformation of the data [5]. This is of particular importance when the data is scaled differently along the coordinate axes. The two main methods for pre-transforming the data are sphering and scaling. Both methods use the variance of the dataset in order to make it more uniformly scaled. After the transformation the bandwidth can be calculated and the data transformed back into its original form. The preferred method is not always obvious, although some general recommendations can be given [5]. If the data in some sense has a different local orientation compared to the global one, the sphering method can destroy the local structures of the distribution. On the other hand, if the entire dataset is skewed, sphering can yield a considerably more accurate result than scaling. Figure 6(a) shows an example where the sphering pre-transformation can be suitable, while Figure 6(b) shows an example where scaling can be more suitable due to the difference in orientation of the two modes.
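A minimal sketch of the two transformations for an n-by-2 data matrix X (illustrative; select_bandwidth stands in for any selector from this section) could be:

```matlab
% Pre-transformation of the data before bandwidth selection.
S = diag(std(X));             % scaling: per-coordinate standard deviations
% S = chol(cov(X), 'lower');  % sphering: use the full covariance instead
Z  = X / S';                  % transformed data, rows z_i = inv(S) * x_i
HZ = select_bandwidth(Z);     % assumed helper: any selector from Section 6
H  = S * HZ * S';             % bandwidth transformed back to the original scale
```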


[Figure 6: (a) Correlated Gaussian; (b) Asymmetric bimodal.]

Figure 6: Two examples of distributions where the two different scaling methods will give significantly different results.

7 Applications of KDE

7.1 Cloud transform

Application fields of kernel density estimation include the so-called cloud transform (see Kolbjørnsen and Abrahamsen [11]). In this context, the term is used equivalently to the conditional cumulative distribution $F(y|x)$, which can be estimated from data according to the following expression:

F (y|x) =n∑

i=1kd

(x−Xi

h

)·K1

(y − Yi

hy

)/ n∑i=1

kd

(x−Xi

h

)(25)

where

K1(y) =∫ y

−∞k1(t)dt.

If the data is bivariate, then $d = 1$ and $k_d = k_1$ is a one-dimensional kernel. For illustration, Figure 7 shows the estimator of the conditional cumulative distribution $F(y|x)$ for the scattered data in Figure 2(a).
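For bivariate data the estimator is only a few lines of Matlab; the sketch below (illustrative) evaluates (25) at a single point (x0, y0) with a Gaussian $k_1$, whose integral $K_1$ is the standard normal CDF:

```matlab
% Cloud transform (25) at one point (x0, y0); X, Y are the data vectors,
% h and hy the bandwidths in the x- and y-direction.
w     = exp(-0.5*((x0 - X)/h).^2);             % k1((x0 - X_i)/h); constants cancel in the ratio
K1    = 0.5*(1 + erf((y0 - Y)/(hy*sqrt(2))));  % integral of the Gaussian kernel
Fcond = sum(w .* K1) / sum(w);                 % estimated F(y0 | x0)
```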



Figure 7: The estimated conditional cumulative distribution F(y|x) for the scattered data in Figure 2(a).

7.2 Examples with real data

This section contains examples of kernel density estimates and cloud transforms for petro-elastic data. Scatter plots of porosity versus acoustic impedance, for two different wells separately and for both wells together, are presented in Figure 8. Kernel density estimates and the conditional cumulative distributions of the porosity given the acoustic impedance are seen in Figures 9 and 10, respectively. To produce the plots in the latter figures, the Gaussian kernel has been used and the bandwidths have been generated with the plug-in method (see Section 6). Corresponding plots for data sets of log permeability versus porosity are shown in Figures 11, 12 and 13.


[Figure 8: (a) Well 1; (b) Well 2; (c) Well 1 and 2. Axes: acoustic impedance versus porosity.]

Figure 8: Scatter plots of acoustic impedance and porosity for two wells.

[Figure 9: (a) Well 1; (b) Well 2; (c) Well 1 and 2.]

Figure 9: Kernel density estimates for the data shown in Figure 8.


[Figure 10: (a) Well 1; (b) Well 2; (c) Well 1 and 2.]

Figure 10: Conditional cumulative distributions of porosity given acoustic impedance for the data shown in Figure 8.

[Figure 11: (a) Well 1; (b) Well 2; (c) Well 3; (d) All wells together. Axes: porosity versus log permeability.]

Figure 11: Scatter plots of log permeability and porosity for three different wells and all wells together.


[Figure 12: (a) Well 1; (b) Well 2; (c) Well 3; (d) All wells together.]

Figure 12: Kernel density estimates for the data shown in Figure 11.

[Figure 13: (a) Well 1; (b) Well 2; (c) Well 3; (d) All wells together.]

Figure 13: Conditional cumulative distributions of log permeability given porosity for the data shown in Figure 11.


8 Method and results

All code for comparison and testing is written in Matlab, including the calculations of the KDEs. The code communicates with R in order to use its ks package (developed by Tarn Duong) for the bandwidth selection process. Also, C is used to speed up the linear binning algorithm.

The testing is done by comparing the results obtained with the different methods to the true values from a known underlying density. A set of four target densities (picked from the larger set in [10]) is used for these studies, all built up from combinations of normal distributions and representing different properties. The four densities are shown in Figure 14.

[Figure 14: (a) Target density 1: Uncorrelated Gaussian; (b) Target density 2: Correlated Gaussian; (c) Target density 3: Strongly skewed; (d) Target density 4: Asymmetric bimodal.]

Figure 14: The four target densities used in the tests.

The tests are carried out on a system running Scientific Linux 6.5 with an AMD Opteron (Bulldozer) 6282SE CPU at 2.6 GHz.


8.1 Comparison of binning methods

To verify and extend the results on binning methods in [1], a comparison test is performed on the four target densities in Figure 14. In order to compare simple and linear binning, simulations are done to estimate the relative mean integrated squared error (RMISE) as defined by Wand in [1],

$$\mathrm{RMISE} = \mathbb{E}\int \{\tilde f(x) - f(x)\}^2\, dx \Big/\ \mathbb{E}\int \{\hat f(x) - f(x)\}^2\, dx, \qquad (26)$$

where $\tilde f$ denotes the binned estimate and $\hat f$ the KDE calculated according to the definition.

In words, the RMISE is the MISE of the binned estimate divided by the MISE of the KDE calculated according to the definition. The study is done on the aforementioned target densities. The denominator MISE is calculated with the closed form expression, while the numerator is estimated with the IMSE calculated as described in Section 4 (so the RMISE is actually estimated as the RIMSE). An equally spaced grid with $M = M_1 = M_2$ is used for four different values of $M$. For each $M$, four different numbers of sample points are investigated. 100 data sets are generated to approximate the MSE, and 10 000 uniformly distributed random points are used for the Monte Carlo integration, points at which the KDE is approximated with linear interpolation. For each number of sample points, the bandwidth $H$ is chosen with the plug-in method for an initial data set and then used throughout for the remaining data sets and grid sizes.

Parts of the results are seen in Figure 15, while the remaining figures are found in Appendix A. For each target density, it is seen that linear binning yields a more accurate result than simple binning for almost all combinations of grid and sample sizes. The absolute difference is most significant for small sample sizes and on coarser grids, situations in which good approximations are naturally harder to make. Note, however, that the relative difference increases as the grid size grows. One should also note that the RIMSE values grow with the sample size, which implies that larger samples require more grid points to reduce the binning error.

It should be recalled that there is an additional uncertainty introduced by the linear interpolation, which grows larger as the grid size shrinks. This "extra" level of approximation is also the reason why even smaller grid sizes are not used in the test.


[Figure 15: (a)-(d) Target densities 1-4. Axes: number of grid points (30, 60, 120, 240) versus log RIMSE.]

Figure 15: log RIMSE versus the number of grid points for the target densities when using 1000 samples. Stars and circles correspond to simple binning and linear binning, respectively.

Regarding speed, linear binning is generally faster (all time comparison plots are found in Appendix A). This result may be surprising, considering that linear binning is a more complicated algorithm than simple binning. The explanation is found in our implementation: simple binning is implemented purely in Matlab, while the more complex linear binning algorithm is partly written in C to speed up the execution of an expensive for-loop. With these implementation differences in mind, the results should not be used to draw any general conclusions about how the methods compare in terms of speed. In any case, the binning time is small compared to the time required for the actual KDE computation (see also Section 8.2). In conclusion, the choice of binning method should be based on the accuracy comparison, and thus linear binning is preferable.

8.2 Comparison of KDE-calculation methods

The aim of the test described in this section is to compare the KDEs computed by definition (4), the binned estimate (11) and the binned estimate computed with FFT (12). Linear binning (lb) is used, since it is the preferable binning method according to Section 8.1. The ISE for the three methods is compared through box plots. The ISE values are calculated as described in Section 4, using 1000 data sets and 10 000 uniformly distributed random points for the Monte Carlo integration.

The tests are performed on data sets generated from the four target densities seen in Figure 14. For each density, three different sample sizes are used: {100, 1000, 10000}. Each test is performed with both the Epanechnikov and the Gaussian kernel on two different grid sizes, 30×30 and 60×60. The Gaussian bandwidth is chosen once for each target density using the plug-in method on a data set consisting of 10 000 points. The bandwidth used for the Epanechnikov kernel is obtained by scaling the Gaussian bandwidth with a factor 1/6, as described in Duong (2015) [12]. The test results are presented in Figures 16 and 17.

The results are similar for target densities 1, 2 and 4. For these densities the ISE values are in the same range on both grids and for all three KDE methods. The main difference in ISE is seen when the sample size is increased: on the 30×30 grid the KDE by definition improves more than the binned estimates as the number of points increases. This is expected and follows from the fact that the RIMSE grows when the sample size increases, as described in Section 8.1.

The ISE behaviour for calculations using linear binning with or without FFT cannot be distinguished. This is expected, since the only observed difference between the methods is the visual artifacts described in Section 5.2. Another observed pattern is that increasing the sample size $n$ of the dataset improves the accuracy for all methods. This is intuitive, since a larger sample contains more information about the estimated PDF.

An unexpected result is that the binned estimates in some cases have a smaller mean value (approximate MISE) than the KDE by definition. This occurs mainly when the number of grid points is larger than the number of sample points.

The results for the strongly skewed target density 3 stand out compared to the others. For this density the test on the 30×30 grid yields an ISE about 1000 times larger than for the other densities, and the estimate by definition performs significantly better than the binned estimates. On the 60×60 grid the ISE is about 100 times larger than for the other densities, and the binned estimates have a smaller mean value for all sample sizes. These slightly unexpected results are probably a consequence of the density's strong skewness. A general recommendation is to use a dense grid and a large sample size if a KDE approximation is calculated from a very skewed data set.

The results for the Gaussian and the Epanechnikov kernel are similar but not identical. The kernel with the best performance varies between the different methods and target densities. The Epanechnikov kernel has been proved to be the most efficient [2], which is not observed in this test. The reason could be that the target densities used are combinations of normal distributions, which may yield an advantage for the infinitely supported Gaussian kernel. Furthermore, the bandwidth matrices are algorithmically chosen for the Gaussian kernel and adapted to the Epanechnikov kernel using a scale factor. In a sense, this may cause the Epanechnikov bandwidth to be less optimal than the Gaussian.

For each parameter setting the mean execution time of each KDE method is recorded. This is found to be independent of target density and kernel type; hence the execution times for one combination of kernel and target density are representative of all remaining settings. Figure 18 shows the results for target density 1 and the Epanechnikov kernel. The remaining figures from the time study are found in Appendix B.

Some general patterns are observed for all test densities. The execution time increases as the sample size increases. As mentioned in Section 5.1, the number of kernel evaluations required for the KDE by definition is $O(nM)$, where $M$ is the number of grid points. This can be observed in Figure 18, where the time for the KDE by definition, denoted def, increases proportionally to the sample size. For binned estimates the kernel evaluations are $O(M^2)$, although one must also take into account the time required by the binning algorithm, which is $O(nM)$ as shown in [1]. However, the computational burden of the binning procedure is significantly smaller than that of the actual KDE calculation. This is also clear from Figure 18, where it is seen that the execution time for the KDE by definition grows rapidly compared to the binned estimates.

Furthermore, using FFT results in an enormous speed-up compared to the other methods. The ratio of the times required for the binned KDE estimates computed with and without FFT is about 3 % on the 30×30 mesh. This ratio shrinks to about 0.5 % on the 60×60 grid, and would shrink further on finer meshes due to the FFT's speed benefits, as discussed in Section 5.2. Due to this speed-up it is strongly recommended to use FFT in binned estimations.


[Figure 16: box plots of log(ISE) for the Gaussian and Epanechnikov kernels on target densities 1-4, at sample sizes 100, 1000 and 10000. Legend: KDE by definition, linear binning, FFT linear binning.]

Figure 16: Results of the KDE method test on the 30×30 grid.


[Figure 17: box plots of log(ISE) for the Gaussian and Epanechnikov kernels on target densities 1-4, at sample sizes 100, 1000 and 10000. Legend: KDE by definition, linear binning, FFT linear binning.]

Figure 17: Results of the KDE method test on the 60×60 grid.


[Figure 18: mean computational times on the 30 x 30 and 60 x 60 grids for target density 1, for KDE by definition (def), linear binning (lb) and FFT linear binning (fft) at sample sizes 10^2, 10^3 and 10^4; separate panels zoom in on the FFT times.]

Figure 18: Mean computation times for the Epanechnikov kernel on target density 1.

8.3 Comparison of bandwidth selection methods

A benefit of using combined normal distributions as target densities is that they allow exact computations of the MISE value, provided the number of sample points and the bandwidth matrix, as described by equation (4.16) in [2]. This way, it is not necessary to carry out the thorough IMSE calculations in the same way as for the KDE comparison. Instead one can simply use the bandwidth matrices suggested by the selection methods, calculate the corresponding exact MISE values and compare the results.

The main interest in this work is to compare the performance of the plug-in and smoothed cross-validation selection methods. As mentioned in the introduction to Section 8, R's ks package is used in the implementation. This package allows a number of different options for bandwidth selection. The focus is to investigate how PI and SCV compare when using one- and two-stage methods and the two different pre-transformations, sphering and scaling. Sphering is denoted with a star (*); scaling is used if nothing else is stated. The total parameter setting yields eight combinations of bandwidth selection methods. For faster execution, the bandwidth algorithms bin the data on a grid of size 60×60; earlier observations have shown that this decreases the execution time by orders of magnitude without any notable loss of accuracy.

The test is carried out on the four target densities in Figure 14 using a range of sample sizes. For each sample size and target density the ks package is used to calculate the bandwidth, which is then used to compute the exact MISE value. This procedure is repeated for 100 data sets. In Figures 19-21 the MISE is visualised in the form of box plots for the sample sizes n = {100, 1000, 10000}.

It is seen in Figures 19-21 that the pre-transformation is an important factor. The results seem to agree with the arguments of Section 6.3, since sphering performs better on the skewed densities 2 and 3, while scaling is better for the "multiply directed" density 4. In general the two-stage method performs as well as or better than the one-stage method. No distinct conclusion can be drawn regarding the difference between PI and SCV. On target density 1, PI and SCV show similar results for all sample sizes. For target densities 2 and 3 there are some clear differences in accuracy, but the picture is not consistent: PI outperforms SCV on target density 3 for all sample sizes, while SCV seems to be the more robust selector on target density 2. Both selectors show similar accuracy on target density 4, even if PI is slightly better. The results also indicate a slight difference in dispersion between the two selectors, with PI having the lowest.

Besides the accuracy study, the execution time for each bandwidth method is recorded. No difference in execution time could be detected between the target densities. The result for target density 1 is presented in Figure 22 (the remaining figures are found in Appendix C). PI is faster than SCV in all cases investigated. The difference is especially large for the one-stage method and for the smallest sample size n = 100, in which case the speed of PI completely surpasses that of SCV. A possible explanation is that the objective function for SCV is hard to minimise for a small sample size. Increasing the sample size to n = 1000 yields significantly reduced execution times for the SCV method; a further increase to n = 10000 does not reduce the execution times to the same extent. This can probably be explained by the binning approximation used in the bandwidth selection method. The one-stage method is faster than the two-stage method in all investigated cases, although the differences are in some cases very small, especially for SCV. The patterns of the remaining time plots are roughly similar to the one in Figure 22.


[Figure 19: box plots of log(MISE) for Plug-in and SCV, each with 1-stage, 2-stage, 1-stage* and 2-stage* variants, on target densities 1-4.]

Figure 19: Box plots showing the accuracy of the different bandwidth selection methods for a sample size of 100 points.

[Figure 20: box plots of log(MISE) for Plug-in and SCV, each with 1-stage, 2-stage, 1-stage* and 2-stage* variants, on target densities 1-4.]

Figure 20: Box plots showing the accuracy of the different bandwidth selection methods for a sample size of 1000 points.


[Figure 21: box plots of log(MISE) for Plug-in and SCV, each with 1-stage, 2-stage, 1-stage* and 2-stage* variants, on target densities 1-4.]

Figure 21: Box plots showing the accuracy of the different bandwidth selection methods for a sample size of 10000 points.

[Figure 22: execution times in seconds for PI1, PI2, PI1*, PI2*, SCV1, SCV2, SCV1* and SCV2*; panels (a) sample size 100, (b) sample size 1000, (c) sample size 10000.]

Figure 22: Execution times for the bandwidth selection methods on target density 1 for sample sizes {100, 1000, 10000}.


9 Summary and conclusions

In data exploration KDE is a useful tool for finding underlying PDFs. In this project the focus has been to investigate the properties of different approximations and methods in order to identify an efficient and accurate estimate. The main focus has been on binning, bandwidth selection and the use of FFT. The test carried out in Section 8.1 shows that linear binning is more accurate than simple binning. Regarding the KDE calculations, the sample size and the grid size are the most important factors for accuracy. A denser grid makes the binned estimate more reliable, while on the other hand it requires additional computations. Using FFT is shown to be faster than the KDE by definition; however, the KDE by definition is more accurate on a coarse grid. Therefore our recommendation is to use FFT on a dense grid for a good trade-off between accuracy and speed.

The bandwidth selection can be seen as one of the more crucial parts of the KDE calculations. Some general recommendations can be given from the tests carried out in Section 8.3, even though the results strongly depend on the shape of the target density. First of all, the data should be pre-transformed correctly: if only one orientation is present in the data set, sphering should be seen as the preferable pre-transformation due to its non-destructive properties, while for data with multiple orientations scaling is the preferable method. A two-stage method should be considered due to its more robust and solid performance compared to the one-stage counterpart; however, the computational cost is higher for the two-stage method.

Regarding execution times, the real bottleneck of the calculations is the selection of the bandwidth. Since the bandwidth selection is performed through an external call to already existing software, profiling is hard to perform. The ks package contains highly evolved code, including calls to C, to improve the speed of time-consuming parts. Compared to the bandwidth selection, the binning and the actual KDE calculation are usually fast.

Since the implemented KDE calculation makes use of already existing software, portability is somewhat tricky. As an extension of this work it would be highly interesting to have all the code written in Matlab, both for portability and for analysis purposes. It would also be desirable to have a bandwidth selection developed specifically for the Epanechnikov kernel instead of scaling the Gaussian bandwidth. This would not work for the PI approach, since it requires higher-order derivatives; it should, however, be possible to implement for SCV, although some practical issues must then be dealt with, such as the convolution in equation 24 and the choice of pilot kernel. Another interesting aspect would be to perform the tests on a non-Gaussian target density.
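For reference, the bandwidth scaling mentioned above can be illustrated in the univariate case through canonical bandwidths. This is only a sketch of the general idea, assuming a Gaussian-kernel bandwidth hG has already been selected; it is not necessarily the exact rescaling used in the project.

% Univariate canonical-bandwidth conversion, Gaussian -> Epanechnikov,
% using delta = (R(K)/mu2(K)^2)^(1/5) for each kernel K.
% hG: a previously selected Gaussian-kernel bandwidth (assumed given).
deltaG = (1/(2*sqrt(pi)))^(1/5);   % Gaussian: R(K) = 1/(2*sqrt(pi)), mu2 = 1
deltaE = 15^(1/5);                 % Epanechnikov: R(K) = 3/5, mu2 = 1/5
hE = hG * (deltaE/deltaG);         % roughly hE = 2.214*hG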

In addition to the theoretical results presented and discussed in this report, the source code and its implementation have been an important part of the work. Anyone interested in the subject who wishes to make use of these resources is welcome to contact the authors at any of the e-mail addresses found below.

Contact Information

Alexander Bilock: [email protected]
Carl Jidling: [email protected]
Ylva Rydin: [email protected]

Acknowledgements

Thanks to our supervisor David Marquez for his support and comments.


References

[1] M. P. Wand, Fast Computation of Multivariate Kernel Estimators, Journal of Computational and Graphical Statistics, (1994).

[2] M. P. Wand, M. C. Jones, Kernel Smoothing, Chapman & Hall, 1st edition, (1995).

[3] A. Gramacki, J. Gramacki, FFT-Based Fast Computation of Multivariate Kernel Density Estimators with Unconstrained Bandwidth Matrices, (2015).

[4] M. C. Jones, S. J. Sheather, Using non-stochastic terms to advantage in integrated squared density derivatives, (1991).

[5] T. Duong, M. Hazelton, Plug-In Bandwidth Matrices for Bivariate Kernel Density Estimation, Nonparametric Statistics, (2003).

[6] M. P. Wand, M. C. Jones, Comparison of smoothing parameterizations in bivariate kernel density estimation, (1993).

[7] M. P. Wand, M. C. Jones, Multivariate Plug-in Bandwidth Selection, (1993).

[8] T. Duong, M. Hazelton, Cross-validation Bandwidth Matrices for Multivariate Kernel Density Estimation, (2005).

[9] S. R. Sain, K. A. Baggerly, D. W. Scott, Cross-validation of multivariate densities, (1994).

[10] J. E. Chacón, Cross-validation Bandwidth Matrices for Multivariate Kernel Density Estimation, The Canadian Journal of Statistics, Vol. 34, No. 4, (2006).

[11] O. Kolbjørnsen, P. Abrahamsen, Theory of the Cloud Transform for Applications, Geostatistics Banff, Vol. 1 (7th International Geostatistics Congress), (2004).

[12] T. Duong, Spherically symmetric multivariate beta family kernels, Statistics and Probability Letters, Volume 104, (2015).


Appendix

A Comparison of binning methods

Accuracy

Stars and circles correspond to simple binning and linear binning, respectively. A one-dimensional sketch of the two binning rules follows.
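As a reminder of the difference between the two rules compared in the figures below, here is a minimal one-dimensional sketch; x, g and the counter names are illustrative, and the project's implementation is two-dimensional.

% Minimal 1D sketch of simple vs. linear binning.
% x: sample vector; g: equispaced grid covering the range of x.
d  = g(2) - g(1);
cs = zeros(size(g));                  % simple-binning counts
cl = zeros(size(g));                  % linear-binning counts
for xi = x(:)'
    [~, j] = min(abs(g - xi));        % simple: all mass to nearest node
    cs(j)  = cs(j) + 1;
    k = min(floor((xi - g(1))/d) + 1, numel(g) - 1);
    w = (xi - g(k))/d;                % linear: split the unit mass between
    cl(k)   = cl(k)   + (1 - w);      % the two neighbouring grid nodes
    cl(k+1) = cl(k+1) + w;
end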

Figure 23: Target density 1. [Figure: log RIMSE versus number of grid points (30, 60, 120, 240) for sample sizes n = 100, 1000, 10000 and 100000.]


Figure 24: Target density 2. [Figure: log RIMSE versus number of grid points (30, 60, 120, 240) for sample sizes n = 100, 1000, 10000 and 100000.]

Figure 25: Target density 3. [Figure: log RIMSE versus number of grid points (30, 60, 120, 240) for sample sizes n = 100, 1000, 10000 and 100000.]


Figure 26: Target density 4. [Figure: log RIMSE versus number of grid points (30, 60, 120, 240) for sample sizes n = 100, 1000, 10000 and 100000.]

Execution time

Figure 27: Target density 1. [Figure: binning time in seconds versus number of grid points (30, 60, 120, 240) for linear and simple binning, at sample sizes n = 100, 1000, 10000 and 100000.]


Figure 28: Target density 2. [Figure: binning time in seconds versus number of grid points (30, 60, 120, 240) for linear and simple binning, at sample sizes n = 100, 1000, 10000 and 100000.]

Figure 29: Target density 3. [Figure: binning time in seconds versus number of grid points (30, 60, 120, 240) for linear and simple binning, at sample sizes n = 100, 1000, 10000 and 100000.]


Figure 30: Target density 4. [Figure: binning time in seconds versus number of grid points (30, 60, 120, 240) for linear and simple binning, at sample sizes n = 100, 1000, 10000 and 100000.]


B Comparison of KDE-calculation methods

Execution time

Figure 31: Epanechnikov kernel, 30 x 30 grid. [Figure: mean computational time for target densities 1-4; methods def, lb and fft at sample sizes 10^2, 10^3 and 10^4, with companion panels zooming in on the fft times (scale x10^-3 s).]


Figure 32: Gaussian kernel, 30 x 30 grid. [Figure: mean computational time for target densities 1-4; methods def, lb and fft at sample sizes 10^2, 10^3 and 10^4, with companion panels zooming in on the fft times (scale x10^-3 s).]


Figure 33: Epanechnikov kernel, 60 x 60 grid. [Figure: mean computational time for target densities 1-4; methods def, lb and fft at sample sizes 10^2, 10^3 and 10^4, with companion panels zooming in on the fft times.]


Figure 34: Gaussian kernel, 60 x 60 grid. [Figure: mean computational time for target densities 1-4; methods def, lb and fft at sample sizes 10^2, 10^3 and 10^4, with companion panels zooming in on the fft times.]


C Comparison of bandwidth selection methods

ISE

Figure 35: 100 points. [Figure: log(MISE) for the Gaussian kernel on target densities 1-4; methods compared are Plug-in and SCV, each in 1-stage, 2-stage, 1-stage* and 2-stage* variants.]


Figure 36: 1000 points. [Figure: log(MISE) for the Gaussian kernel on target densities 1-4; methods compared are Plug-in and SCV, each in 1-stage, 2-stage, 1-stage* and 2-stage* variants.]

Figure 37: 10000 points. [Figure: log(MISE) for the Gaussian kernel on target densities 1-4; methods compared are Plug-in and SCV, each in 1-stage, 2-stage, 1-stage* and 2-stage* variants.]


Execution time

Figure 38: Sample size: 100 points. [Figure: execution time in seconds for PI1, PI2, PI1*, PI2*, SCV1, SCV2, SCV1* and SCV2* on target densities 1-4.]


Figure 39: Sample size: 1000 points. [Figure: execution time in seconds for PI1, PI2, PI1*, PI2*, SCV1, SCV2, SCV1* and SCV2* on target densities 1-4.]

Figure 40: Sample size: 10000 points. [Figure: execution time in seconds for PI1, PI2, PI1*, PI2*, SCV1, SCV2, SCV1* and SCV2* on target densities 1-4.]
