
A New Hybrid Estimation Method for the Generalized Pareto Distribution

Chunlin Wang

Department of Mathematics and Statistics, University of Calgary

May 18, 2011

A New Hybrid Estimation Method for the GPD Chunlin Wang (UCalgary) May 18, 2011 1/32


1 Introduction
  The Generalized Pareto Distribution
  Application

2 Estimation of the GPD Parameters
  A Review of Literature
  The Maximum Likelihood Estimation
  The Maximum Goodness-of-Fit Estimation
  A New Hybrid Estimation Method

3 Simulation Study
  Bias and MSE Comparisons

4 An Example
  An Example: Bilbao waves data

5 Final Conclusions



The Generalized Pareto Distribution


The Generalized Pareto Distribution (GPD) is a two-parameter family of distributions first introduced by Pickands (1975) with the distribution function (cdf)

$$F_{\sigma,k}(x) = \begin{cases} 1 - (1 - kx/\sigma)^{1/k}, & \text{if } k \neq 0, \\ 1 - e^{-x/\sigma}, & \text{if } k = 0, \end{cases} \tag{1}$$

and the probability density function (pdf)

$$f_{\sigma,k}(x) = \begin{cases} \sigma^{-1}(1 - kx/\sigma)^{1/k - 1}, & \text{if } k \neq 0, \\ \sigma^{-1} e^{-x/\sigma}, & \text{if } k = 0, \end{cases} \tag{2}$$

where σ > 0 and −∞ < k < ∞ are the scale and shape parameters, and the domain of x is (0, ∞) when k ≤ 0 or (0, σ/k) when k > 0. We denote the above distribution by GPD(σ, k).
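As a minimal sketch of (1) and (2) in code, the functions below evaluate the cdf and pdf in this parameterization. The quantile function is not on the slide; it is obtained here by inverting (1), and all function names are illustrative.

```python
import math

def gpd_cdf(x, sigma, k):
    """CDF (1) of GPD(sigma, k) in the parameterization used here."""
    if x <= 0:
        return 0.0
    if k == 0:
        return 1.0 - math.exp(-x / sigma)
    base = 1.0 - k * x / sigma
    if base <= 0.0:  # can only happen when k > 0 and x >= sigma / k
        return 1.0
    return 1.0 - base ** (1.0 / k)

def gpd_pdf(x, sigma, k):
    """Density (2); zero outside the support."""
    if x < 0:
        return 0.0
    if k == 0:
        return math.exp(-x / sigma) / sigma
    base = 1.0 - k * x / sigma
    if base <= 0.0:
        return 0.0
    return base ** (1.0 / k - 1.0) / sigma

def gpd_quantile(u, sigma, k):
    """Inverse of (1): the x with F(x) = u, for 0 <= u < 1 (derived, not from the slides)."""
    if k == 0:
        return -sigma * math.log(1.0 - u)
    return sigma / k * (1.0 - (1.0 - u) ** k)

# Round trip: the quantile of u mapped back through the cdf should return u.
x = gpd_quantile(0.3, sigma=2.0, k=0.5)
print(x, gpd_cdf(x, sigma=2.0, k=0.5))
```

The quantile function also gives a simple way to simulate from GPD(σ, k) by applying it to uniform random numbers.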





The GPD is important because of its versatility and flexibility. Special cases of the GPD are:

when k = 1, the GPD becomes the uniform distribution on [0, σ];

when k = 0, the GPD becomes the exponential distribution with mean σ, obtained as the limit k → 0;

when k < 0, the GPD reduces to the Pareto distribution (PD).

The mean of the GPD is σ/(1 + k) and the variance is σ²/[(1 + k)²(1 + 2k)], but the mean and variance exist only if k > −1 and k > −1/2, respectively. In general, the rth central moment of the GPD exists only if k > −1/r.
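The moment formulas can be checked numerically. The sketch below approximates E[X] and Var[X] by a midpoint rule applied to the quantile function (the inverse of (1), an assumption of this sketch rather than something stated on the slide) and compares them with σ/(1 + k) and σ²/[(1 + k)²(1 + 2k)]. The parameter values are arbitrary.

```python
import math

def gpd_quantile(u, sigma, k):
    """Inverse of the cdf (1) for k != 0."""
    return sigma / k * (1.0 - (1.0 - u) ** k)

def moments_by_quadrature(sigma, k, m=200_000):
    """Midpoint-rule approximations of E[X] and Var[X], using X = Q(U), U ~ Uniform(0, 1)."""
    s1 = 0.0
    s2 = 0.0
    for j in range(m):
        q = gpd_quantile((j + 0.5) / m, sigma, k)
        s1 += q
        s2 += q * q
    mean = s1 / m
    return mean, s2 / m - mean * mean

sigma, k = 2.0, 0.25  # illustrative values with k > -1/2, so both moments exist
mean_num, var_num = moments_by_quadrature(sigma, k)
mean_formula = sigma / (1.0 + k)
var_formula = sigma ** 2 / ((1.0 + k) ** 2 * (1.0 + 2.0 * k))
print(mean_num, mean_formula)
print(var_num, var_formula)
```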




Graphing the GPD

Figure 1 shows the density functions of the GPD with σ = 1 fixed.

[Figure 1: The density functions of the GPD with different k, with σ = 1 fixed. Panel "k > 0": f(x) against x for k = 0.1, 0.5, 0.75, 1, 1.25. Panel "k <= 0": f(x) against x for k = 0, −0.5, −2.]


Application

Application: Peaks Over Thresholds (POT)

In extreme value theory, there are generally two methods for modeling extremes:

The classical approach is based on the limiting distribution of the maxima or minima of a sequence of i.i.d. random variables, which turns out to be the generalized extreme value distribution (GEVD).

The GPD was introduced to model the exceedances X_i − t over a high threshold, where {X_i} are the sample observations and t is a given threshold; examples are flood levels of rivers, heights of waves, etc.

An attractive and useful feature of the GPD in this application is its stability. It is easily shown that if X follows a GPD(σ, k), then for any level t the conditional distribution of X − t given X > t is GPD(σ − kt, k).
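The stability property follows from the survival function 1 − F(x) = (1 − kx/σ)^{1/k}, since (1 − k(t + y)/σ)/(1 − kt/σ) = 1 − ky/(σ − kt). The sketch below checks this identity numerically for a few illustrative parameter values; the function names are hypothetical.

```python
import math

def gpd_sf(x, sigma, k):
    """Survival function 1 - F(x) of GPD(sigma, k), from (1)."""
    if k == 0:
        return math.exp(-x / sigma)
    return max(1.0 - k * x / sigma, 0.0) ** (1.0 / k)

def conditional_excess_sf(y, t, sigma, k):
    """P(X - t > y | X > t) for X ~ GPD(sigma, k); t must lie inside the support."""
    return gpd_sf(t + y, sigma, k) / gpd_sf(t, sigma, k)

# Compare the conditional excess distribution with GPD(sigma - k*t, k).
for sigma, k, t in [(2.0, -0.3, 1.5), (2.0, 0.4, 1.0), (2.0, 0.0, 1.0)]:
    for y in (0.1, 0.5, 2.0):
        lhs = conditional_excess_sf(y, t, sigma, k)
        rhs = gpd_sf(y, sigma - k * t, k)
        print(sigma, k, t, y, lhs, rhs)
```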



A Review of Literature


Given a random sample from the GPD, most of the existing estimation methods for the GPD parameters σ and k suffer from theoretical or computational problems.

The maximum likelihood (ML) method, the most classical and important method of estimation in statistics, has been considered by DuMouchel (1983), Davison (1984), Smith (1984, 1985), Grimshaw (1993), Choulakian and Stephens (2001), and the references therein. We present the ML method in more detail in the next section.

Hosking and Wallis (1987) and Dupuis and Tsao (1998) studied some alternative estimation methods to the method of moments (MOM) and the probability-weighted moments (PWM) method.

Castillo and Hadi (1997) proposed an elemental percentile method (EPM), which makes full use of the order statistics by first equating the GPD distribution function to all pairs of order statistics and then using the median of the resulting values as the overall estimates of σ and k.

Luceno (2006) proposed the maximum goodness-of-fit estimation (MGFE) method based on the family of empirical distribution function (EDF) statistics. In fact, this method dates back to Wolfowitz (1953, 1957) under the more general name of minimum distance estimation. We investigate the MGFE method carefully in the next section and borrow some of its ideas to develop our new hybrid estimation method.

Zhang (2007) suggested the likelihood moment estimation (LME) method for the GPD to overcome the computational problems faced by the ML method.

Zhang and Stephens (2009) provided a new efficient estimation method based on the likelihood and the empirical Bayesian method (EBM). However, as indicated in their paper, this method is quite sensitive to the choice of the shape of the prior distribution.

To improve the poor performance of the EBM estimators in heavy-tailed cases, Zhang (2010) introduced a modified EBM (EBM*) that uses a more reliable and adaptive prior. The main conclusion of that paper was that the EBM* generally outperforms the other existing estimation procedures in the range −6 < k < 1/2, in terms of estimation bias and efficiency.



The Maximum Likelihood Estimation

The Estimating Equations

Given a random sample X = (X_1, X_2, ..., X_n) from the GPD with the cdf given in (1), the log-likelihood function is given by

$$l(\sigma, k; X) = -n \log \sigma - \left(1 - \frac{1}{k}\right) \sum_{i=1}^{n} \log\left(1 - \frac{kX_i}{\sigma}\right).$$

To find the maximum of the log-likelihood over the parameter space A = {k < 0, σ > 0} ∪ {k > 0, σ/k > X_(n)}, set the first derivatives of the GPD log-likelihood with respect to k and σ to zero, which gives the following estimating equations:

$$\begin{cases} n(k - 1) = -\sum_{i=1}^{n} \log\left(1 - \dfrac{kX_i}{\sigma}\right) + (k - 1) \sum_{i=1}^{n} \left(1 - \dfrac{kX_i}{\sigma}\right)^{-1}, \\[2ex] k = -n^{-1} \sum_{i=1}^{n} \log\left(1 - \dfrac{kX_i}{\sigma}\right). \end{cases}$$




The Estimating Equations

As pointed out by Davison (1984), the above bivariate maximization can be reduced to a one-dimensional search because the two estimating equations depend on the parameters only through the ratio θ = k/σ (θ < 1/X_(n)), and for a given value of θ a closed-form expression for k is available. It is therefore natural and convenient to reparameterize (σ, k) as (θ, k).

Starting from the log-likelihood function of (θ, k) and substituting k = −n^{-1} Σ_{i=1}^{n} log(1 − θX_i), we obtain the profile log-likelihood function of θ:

$$l(\theta; X) = -n - \sum_{i=1}^{n} \log(1 - \theta X_i) - n \log\left[-\frac{1}{n\theta} \sum_{i=1}^{n} \log(1 - \theta X_i)\right]. \tag{3}$$
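A quick way to see that (3) is the log-likelihood evaluated along the curve k = k(θ), σ = k(θ)/θ is to compute both sides. The sketch below does this for a small made-up sample; the data values and function names are illustrative only.

```python
import math

def gpd_loglik(sigma, k, xs):
    """Log-likelihood l(sigma, k; X) for k != 0."""
    n = len(xs)
    return -n * math.log(sigma) - (1.0 - 1.0 / k) * sum(
        math.log(1.0 - k * x / sigma) for x in xs
    )

def k_given_theta(theta, xs):
    """Closed-form k for a given theta = k / sigma."""
    return -sum(math.log(1.0 - theta * x) for x in xs) / len(xs)

def profile_loglik(theta, xs):
    """Profile log-likelihood (3); needs theta != 0 and theta < 1 / max(xs)."""
    n = len(xs)
    k = k_given_theta(theta, xs)
    # -sum(log(1 - theta * x)) equals n * k, and the bracket in (3) equals k / theta.
    return -n + n * k - n * math.log(k / theta)

xs = [0.2, 0.5, 0.9, 1.4, 2.3]  # made-up data, so theta must be below 1 / 2.3
for theta in (0.3, -0.5):
    k = k_given_theta(theta, xs)
    sigma = k / theta
    print(theta, profile_loglik(theta, xs), gpd_loglik(sigma, k, xs))
```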




Computing the MLE

Suppose a local maximum of (3) can be found numerically at θ_MLE over the parameter space B = {θ < 1/X_(n)}. Then the MLEs of σ and k are given by

$$\hat{k}_{\mathrm{MLE}} = -n^{-1} \sum_{i=1}^{n} \log(1 - \hat{\theta}_{\mathrm{MLE}} X_i) \quad \text{and} \quad \hat{\sigma}_{\mathrm{MLE}} = \hat{k}_{\mathrm{MLE}} / \hat{\theta}_{\mathrm{MLE}}. \tag{4}$$

However, finding θ_MLE numerically can be difficult: the first derivative of (3) may have more than one root, and convergence problems may occur as θ approaches its boundary, so the constraint θ < 1/X_(n) must be enforced.

An algorithm for computing the MLE for the GPD parameters was designed in Grimshaw (1993).
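The one-dimensional search can be sketched as follows. This is not Grimshaw's algorithm: it is a plain grid scan for an interior local maximum of (3), followed by a golden-section refinement. The lower end of the search range, the grid size, and the simulated data are all assumptions of this sketch.

```python
import math
import random

def profile_loglik(theta, xs):
    """Profile log-likelihood (3); theta must be below 1 / max(xs)."""
    n = len(xs)
    if abs(theta) < 1e-10:  # limit k -> 0: exponential fit with sigma = sample mean
        return -n - n * math.log(sum(xs) / n)
    k = -sum(math.log1p(-theta * x) for x in xs) / n
    return -n + n * k - n * math.log(k / theta)

def golden_max(f, a, b, tol=1e-10):
    """Golden-section search for a maximum of f on [a, b]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

def gpd_mle(xs, grid_size=400, theta_lo=None):
    """Return (sigma, k, theta) at the best interior local maximum found on the grid."""
    n = len(xs)
    lo = -5.0 * n / sum(xs) if theta_lo is None else theta_lo  # assumed lower end
    hi = (1.0 - 1e-6) / max(xs)  # stay strictly inside theta < 1 / X_(n)
    grid = [lo + (hi - lo) * j / (grid_size - 1) for j in range(grid_size)]
    vals = [profile_loglik(t, xs) for t in grid]
    # Only interior peaks count: (3) can grow without bound near the upper boundary.
    peaks = [j for j in range(1, grid_size - 1)
             if vals[j] >= vals[j - 1] and vals[j] >= vals[j + 1]]
    if not peaks:
        raise ValueError("no interior local maximum found on the grid")
    j = max(peaks, key=lambda i: vals[i])
    theta = golden_max(lambda t: profile_loglik(t, xs), grid[j - 1], grid[j + 1])
    k = -sum(math.log1p(-theta * x) for x in xs) / n
    return k / theta, k, theta  # equation (4)

# Simulated GPD(1, -0.2) sample via the inverse of the cdf (1).
rng = random.Random(1)
xs = [1.0 / -0.2 * (1.0 - (1.0 - rng.random()) ** -0.2) for _ in range(200)]
sigma_hat, k_hat, theta_hat = gpd_mle(xs)
print(sigma_hat, k_hat, theta_hat)
```

Restricting attention to interior peaks reflects the slide's wording "a local maximum": the boundary value of the grid is never accepted as the estimate.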




Computing the MLE

When k < 1/2, Smith (1984) proved that, under proper regularity conditions, the ML estimators given in (4) are asymptotically normally distributed with asymptotic variances achieving the Cramer-Rao lower bound. Specifically, we have

$$\begin{bmatrix} \hat{\sigma}_{\mathrm{MLE}} \\ \hat{k}_{\mathrm{MLE}} \end{bmatrix} \sim N\left( \begin{bmatrix} \sigma \\ k \end{bmatrix},\ n^{-1} \begin{bmatrix} 2\sigma^2(1 - k) & \sigma(1 - k) \\ \sigma(1 - k) & (1 - k)^2 \end{bmatrix} \right), \quad k < 1/2.$$

When k ≥ 1/2, Smith (1984) identified the problem as the non-regular case, since the regularity conditions fail to hold and convergence problems may occur.

When k > 1, the MLE does not exist because the likelihood function tends to infinity as the endpoint σ/k approaches X_(n).
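The asymptotic covariance matrix translates directly into large-sample standard errors. The helper below is a small sketch with a hypothetical function name; it refuses k ≥ 1/2, where the result above does not apply.

```python
import math

def mle_asymptotic_cov(sigma, k, n):
    """Asymptotic covariance matrix of (sigma_MLE, k_MLE), valid only for k < 1/2."""
    if k >= 0.5:
        raise ValueError("the regular asymptotic theory requires k < 1/2")
    c = (1.0 - k) / n
    return [[2.0 * sigma ** 2 * c, sigma * c],
            [sigma * c, (1.0 - k) * c]]

cov = mle_asymptotic_cov(sigma=1.0, k=0.0, n=50)
se_sigma = math.sqrt(cov[0][0])
se_k = math.sqrt(cov[1][1])
print(se_sigma, se_k)
```

In practice the unknown σ and k would be replaced by their estimates, which is a plug-in approximation.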



The Maximum Goodness-of-Fit Estimation

EDF Statistics

Given a random sample X = (X_1, X_2, ..., X_n) from a continuous distribution function F(x; θ), let F_n(x) denote the empirical distribution function (EDF), that is,

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{X_i}(x),$$

where I_{X_i}(x) = 1 if X_i ≤ x, and I_{X_i}(x) = 0 if X_i > x.

Any statistic that measures the discrepancy between F_n(x) and F(x; θ) is called an EDF statistic. Such statistics were originally used to test the goodness-of-fit (GOF) of a continuous probability distribution to sample data.




EDF Statistics

There are two main classes of EDF statistics: the supremum EDF statistics, which include the Kolmogorov-Smirnov (KS) statistic and the Kuiper statistic; and the integral EDF statistics, which include the Cramer-von Mises (CM) statistic, the Anderson-Darling (AD) statistic, and others.

In Luceno (2006), the idea of GOF was borrowed for parameter estimation for the GPD. The proposed maximum goodness-of-fit estimator (MGFE) is obtained by minimizing any of the EDF statistics with respect to the unknown parameters σ and k. We focus only on the MGFE based on the AD statistic.




Computing the MGFE

For the GPD with the cdf F(x; σ, k), the AD statistic A²(σ, k) is defined as

$$A^2(\sigma, k) = n \int_{-\infty}^{\infty} \{F_n(x) - F(x; \sigma, k)\}^2 \, \{F(x; \sigma, k)(1 - F(x; \sigma, k))\}^{-1} \, dF(x; \sigma, k).$$

For computational purposes, the AD statistic can be expressed in an alternative form, since F_n(x) is a step function with a jump at each order statistic. Applying the probability integral transformation to the ordered sample, we denote z_i = F(x_(i); σ, k), i = 1, ..., n. Then the AD statistic A²(σ, k) can be written as

$$A^2(\sigma, k) = -n - \frac{1}{n} \sum_{i=1}^{n} \{(2i - 1) \ln z_i + (2n + 1 - 2i) \ln(1 - z_i)\}. \tag{5}$$
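The computational form (5) is short to implement. The sketch below evaluates it for a GPD fit; the function names and the sample values are illustrative, and every observation is assumed to lie strictly inside the support so that each z_i is in (0, 1).

```python
import math

def ad_statistic(z):
    """A^2 from (5); z must hold F(x_(i)) for the ordered sample, each in (0, 1)."""
    n = len(z)
    s = sum((2 * i - 1) * math.log(zi) + (2 * n + 1 - 2 * i) * math.log(1.0 - zi)
            for i, zi in enumerate(z, start=1))
    return -n - s / n

def gpd_ad(sigma, k, xs):
    """AD statistic of a sample against GPD(sigma, k)."""
    def cdf(x):
        if k == 0:
            return 1.0 - math.exp(-x / sigma)
        return 1.0 - (1.0 - k * x / sigma) ** (1.0 / k)
    return ad_statistic([cdf(x) for x in sorted(xs)])

xs = [0.9, 0.2, 2.3, 0.5, 1.4]  # made-up data; order does not matter
print(gpd_ad(sigma=1.5, k=0.2, xs=xs))
print(gpd_ad(sigma=1.5, k=-0.5, xs=xs))
```

For a single observation with z = 1/2, formula (5) gives 2 ln 2 − 1, which agrees with evaluating the integral definition directly.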




Computing the MGFE

The final estimates σ_MGFE and k_MGFE of the GPD parameters are obtained by minimizing the AD statistic A²(σ, k; x) given in (5) with respect to the unknown parameters σ and k. The minimization should be performed carefully over the parameter space A = {k < 0, σ > 0} ∪ {k > 0, σ/k > X_(n)}.

In general, the MGFE technique was shown to be able to handle GPD parameter estimation when the MLE and other methods fail, and even in the context of generalized linear models. However, the two-dimensional numerical optimization can be complex and relatively time-consuming, and a well-specified starting point (σ^(0), k^(0)) can be useful.



A New Hybrid Estimation Method

Motivation

As we have discussed, the MLE can possess high large-sample efficiency whenever it exists in a restricted parameter space, while the MGFE has small bias and can always be found provided the initial point is well chosen.

To take advantage of both the MGFE and the MLE, we propose a new hybrid estimation method, which relies primarily on the MGFE to maintain the small bias and then improves the efficiency by incorporating the useful maximum likelihood information. At the same time, the computational effort is greatly reduced.




Computing the New Hybrid Estimates

Under the reparameterization θ = k/σ for the GPD, the MLE of k and θ must satisfy k = −n^{-1} Σ_{i=1}^{n} log(1 − θX_i).

For the MGFE based on the AD statistic A²(σ, k; X), we can consider the reparameterized version and substitute the above maximum likelihood relationship into it to obtain a simplified univariate minimization problem.

Specifically, we consider minimizing the target function G, so the problem becomes a univariate minimization with the maximum likelihood relationship as a constraint:

$$\min_{\theta \in B} G(\theta; X) = \min_{(\sigma, k) \in A} \left\{ A^2(\sigma, k; X) \;\middle|\; \theta = k/\sigma,\ k = -n^{-1} \sum_{i=1}^{n} \log(1 - \theta X_i) \right\}.$$





The target function G based on the AD statistic can be written in a simple computational form, where X_(1) ≤ ... ≤ X_(n) denotes the ordered sample as in (5):

$$G(\theta; X) = -n - \frac{1}{n} \sum_{i=1}^{n} \left\{ (2i - 1) \log\left[1 - (1 - \theta X_{(i)})^{-n / \sum_j \log(1 - \theta X_j)}\right] - n(2n + 1 - 2i) \frac{\log(1 - \theta X_{(i)})}{\sum_j \log(1 - \theta X_j)} \right\}. \tag{6}$$

In POT applications the sample size is usually small. To reduce the bias in such cases, our extensive simulations suggest an effective adjustment to G(θ; X): replace the first n in the last term by (n − 0.5). As n gets larger, the effect of this adjustment vanishes.
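A minimal sketch of the target function (6), with the (n − 0.5) adjustment as an option, might look as follows. It works with the ratio r_i = log(1 − θX_(i)) / Σ_j log(1 − θX_j), so that ln(1 − z_i) = −n r_i; handling θ = 0 through the limit r_i = X_(i) / Σ_j X_j is an addition of this sketch, not something stated on the slide.

```python
import math

def hybrid_target(theta, xs, adjust=True):
    """G(theta; X) from (6); requires theta < 1 / max(xs) and positive data."""
    x = sorted(xs)
    n = len(x)
    if theta >= 1.0 / x[-1]:
        raise ValueError("theta must be below 1 / X_(n)")
    if abs(theta) < 1e-10:
        total = sum(x)  # limiting case k -> 0 (exponential fit)
        ratios = [xi / total for xi in x]
    else:
        logs = [math.log1p(-theta * xi) for xi in x]
        total = sum(logs)
        ratios = [v / total for v in logs]
    c = n - 0.5 if adjust else n  # small-sample adjustment of the last term
    s = 0.0
    for i, r in enumerate(ratios, start=1):
        # log z_i = log(1 - exp(-n r_i)); the last term is c (2n + 1 - 2i) r_i
        s += (2 * i - 1) * math.log(-math.expm1(-n * r)) - c * (2 * n + 1 - 2 * i) * r
    return -n - s / n

xs = [0.2, 0.5, 0.9, 1.4, 2.3]  # made-up data
print(hybrid_target(0.3, xs, adjust=False))
print(hybrid_target(0.3, xs, adjust=True))
```

With adjust=False the value equals the AD statistic (5) evaluated at k = k(θ) and σ = k(θ)/θ, which gives a convenient consistency check.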





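Our new hybrid estimator θ_NEW of θ is defined as the value of θ at which G(θ; X) is minimized subject to the boundary condition θ < 1/X_(n).

Finally, the new hybrid estimators σ_NEW and k_NEW can be calculated as

$$\hat{k}_{\mathrm{NEW}} = -n^{-1} \sum_{i=1}^{n} \log(1 - \hat{\theta}_{\mathrm{NEW}} X_i) \quad \text{and} \quad \hat{\sigma}_{\mathrm{NEW}} = \hat{k}_{\mathrm{NEW}} / \hat{\theta}_{\mathrm{NEW}}. \tag{7}$$

It is easy to see that the new hybrid estimators k_NEW and σ_NEW always give valid estimates: since θ_NEW < 1/X_(n), k_NEW and θ_NEW have the same sign, so σ_NEW > 0, and σ_NEW/k_NEW = 1/θ_NEW > X_(n) whenever k_NEW > 0.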
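Putting the pieces together, the hybrid estimate can be sketched as a one-dimensional minimization of the adjusted G followed by (7). The optimizer here (a grid scan plus golden-section refinement), the lower end of the search range, and the simulated data are assumptions of this sketch, not the author's implementation.

```python
import math
import random

def hybrid_target(theta, x_sorted):
    """Adjusted G(theta; X) from (6) for an already sorted sample."""
    n = len(x_sorted)
    if abs(theta) < 1e-10:
        total = sum(x_sorted)  # k -> 0 limit
        ratios = [x / total for x in x_sorted]
    else:
        logs = [math.log1p(-theta * x) for x in x_sorted]
        total = sum(logs)
        ratios = [v / total for v in logs]
    s = sum((2 * i - 1) * math.log(-math.expm1(-n * r))
            - (n - 0.5) * (2 * n + 1 - 2 * i) * r
            for i, r in enumerate(ratios, start=1))
    return -n - s / n

def golden_min(f, a, b, tol=1e-10):
    """Golden-section search for a minimum of f on [a, b]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

def hybrid_fit(xs, grid_size=400, theta_lo=None):
    """Return (sigma, k, theta) minimizing the adjusted G over theta < 1 / X_(n)."""
    x = sorted(xs)
    n = len(x)
    lo = -5.0 * n / sum(x) if theta_lo is None else theta_lo  # assumed lower end
    hi = (1.0 - 1e-6) / x[-1]
    grid = [lo + (hi - lo) * j / (grid_size - 1) for j in range(grid_size)]
    vals = [hybrid_target(t, x) for t in grid]
    j = min(range(grid_size), key=vals.__getitem__)
    a, b = grid[max(j - 1, 0)], grid[min(j + 1, grid_size - 1)]
    theta = golden_min(lambda t: hybrid_target(t, x), a, b)
    k = -sum(math.log1p(-theta * xi) for xi in x) / n
    return k / theta, k, theta  # equation (7)

# Simulated GPD(1, 0.25) sample via the inverse of the cdf (1).
rng = random.Random(2)
xs = [1.0 / 0.25 * (1.0 - (1.0 - rng.random()) ** 0.25) for _ in range(200)]
sigma_hat, k_hat, theta_hat = hybrid_fit(xs)
print(sigma_hat, k_hat, theta_hat)
```

Plotting `hybrid_target` over the grid is a cheap way to confirm visually that the reported minimizer is the global one on the search range.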




Inference

Because the new hybrid method combines the maximum goodness-of-fit and the maximum likelihood methods, it is not easy to derive the asymptotic variances of the new estimators. Fortunately, the bootstrap resampling method introduced by Efron (1977) provides an alternative way to approximate the distributions of the new hybrid estimators, and from the bootstrap samples we can calculate their standard errors.

The use of the bootstrap method to find standard errors for other estimators of the GPD parameters has already been suggested by many authors. One reason for preferring the bootstrap method is that the resulting confidence intervals for the parameters always make sense, in that they satisfy the endpoint constraints.
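The bootstrap step can be sketched as below. Several choices are assumptions of this sketch: the resampling is parametric (simulating from the fitted GPD), the intervals are percentile intervals, the number of replicates B is kept small, and the estimator is a crude grid-only minimizer of the adjusted G in (6) to keep the block short and fast.

```python
import math
import random
import statistics

def gpd_sample(n, sigma, k, rng):
    """Draw n values from GPD(sigma, k), k != 0, by inverting the cdf (1)."""
    return [sigma / k * (1.0 - (1.0 - rng.random()) ** k) for _ in range(n)]

def hybrid_fit(xs, grid_size=150):
    """Crude grid-only minimizer of the adjusted G in (6); returns (sigma, k)."""
    x = sorted(xs)
    n = len(x)
    lo, hi = -5.0 * n / sum(x), (1.0 - 1e-6) / x[-1]  # assumed search range
    best = None
    for j in range(grid_size):
        theta = lo + (hi - lo) * j / (grid_size - 1)
        if abs(theta) < 1e-9:
            continue  # skip the k = 0 singularity of the formula
        logs = [math.log1p(-theta * xi) for xi in x]
        total = sum(logs)
        g = -n - sum(
            (2 * i - 1) * math.log(-math.expm1(-n * v / total))
            - (n - 0.5) * (2 * n + 1 - 2 * i) * v / total
            for i, v in enumerate(logs, start=1)
        ) / n
        if best is None or g < best[0]:
            best = (g, theta, -total / n)
    _, theta, k = best
    return k / theta, k

def parametric_bootstrap(xs, estimator, B=100, seed=0):
    """Bootstrap standard errors and 95% percentile intervals for (sigma, k)."""
    rng = random.Random(seed)
    sigma_hat, k_hat = estimator(xs)
    reps = [estimator(gpd_sample(len(xs), sigma_hat, k_hat, rng)) for _ in range(B)]
    out = {}
    for name, vals in (("sigma", sorted(r[0] for r in reps)),
                       ("k", sorted(r[1] for r in reps))):
        lower = vals[int(0.025 * B)]
        upper = vals[min(B - 1, int(0.975 * B))]
        out[name] = {"se": statistics.stdev(vals), "ci95": (lower, upper)}
    return out

data = gpd_sample(50, sigma=1.0, k=0.3, rng=random.Random(1))
result = parametric_bootstrap(data, hybrid_fit)
print(result)
```

Because every replicate is itself a valid estimate, the interval endpoints automatically respect the parameter constraints, which is the advantage noted above.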



Bias and MSE Comparisons

Finite Sample Simulation

We include only the classical MLE, the MGFE based on the AD statistic, and the improved EBM* in the finite-sample comparisons.

The range of k considered is −6 < k < 2, which covers all the ranges used previously in the literature. It includes the commonly used range −1 < k < 1/2, the non-regular range k > 1/2 where the MLE has trouble, and the range k < −1/2 where the GPD has infinite variance.

It is already known that the MLE has severe problems when k > 1/2. To deal with this unusual behavior of the MLE as k approaches 1/2 in the simulation, we employ the quasi-maximum likelihood (QML) method used in Luceno (2006), which replaces the MLE (σ_MLE, k_MLE) by

$$\hat{k}_{\mathrm{QML}} = -(n - 1)^{-1} \sum_{i=1}^{n-1} \log\left(1 - \frac{X_{(i)}}{X_{(n)}}\right) \quad \text{and} \quad \hat{\sigma}_{\mathrm{QML}} = \hat{k}_{\mathrm{QML}} X_{(n)}.$$
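The QML estimator is available in closed form, so it is only a few lines of code. The sketch below uses a hypothetical function name and assumes the sample maximum is unique (a tie with the maximum would make one logarithm undefined).

```python
import math

def qml_estimate(xs):
    """Quasi-maximum likelihood estimates (sigma_QML, k_QML) from the ordered sample."""
    x = sorted(xs)
    n = len(x)
    k = -sum(math.log(1.0 - xi / x[-1]) for xi in x[:-1]) / (n - 1)
    return k * x[-1], k

sigma_qml, k_qml = qml_estimate([3.0, 1.0, 4.0, 2.0])
print(sigma_qml, k_qml)
```

By construction σ_QML / k_QML = X_(n), so the estimated upper endpoint coincides with the sample maximum.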




Bias Comparison

Without loss of generality, the scale parameter σ is taken to be 1 because the estimates for the GPD are invariant with respect to the value of σ.

The bias, a widely accepted criterion for measuring the accuracy of an estimator, is calculated for the finite sample size n = 50 based on 10,000 random samples.

The biases of the different estimators of σ and k are plotted against k in Figure 2. We see that our new hybrid estimators significantly reduce the estimation biases for σ and k, especially when compared with the MGFE and the MLE, which supply the original ideas behind them.




Bias Comparison

[Figure 2: The bias of parameter estimation. Panel "Bias for scale, n=50": bias(sigma) against k. Panel "Bias for shape, n=50": bias(k) against k. Curves: NEW(AD), MGFE(AD), MLE, EBM*; k ranges from −6 to 2.]



MSE Comparison

The mean square error (MSE), a widely accepted criterion for measuring the overall quality of an estimator, is calculated for the finite sample size n = 50 based on 10,000 random samples.

The MSEs of the different estimators of σ and k are plotted against k in Figure 3. From the figure, we see that our new hybrid estimators always possess comparable MSEs, and improve over the MLE for estimating the scale σ and over the MGFE for estimating the shape k.
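The bias and MSE computations behind such comparisons can be sketched with a small Monte Carlo loop. To keep the block short, the estimator plugged in here is the closed-form QML estimator at an illustrative k = 1.5, and the number of replications is reduced from the 10,000 used in the study; both are choices of this sketch.

```python
import math
import random
import statistics

def gpd_sample(n, sigma, k, rng):
    """Draw n values from GPD(sigma, k), k != 0, by inverting the cdf (1)."""
    return [sigma / k * (1.0 - (1.0 - rng.random()) ** k) for _ in range(n)]

def qml_estimate(xs):
    """Closed-form QML estimates (sigma, k), used here only as an example estimator."""
    x = sorted(xs)
    k = -sum(math.log(1.0 - xi / x[-1]) for xi in x[:-1]) / (len(x) - 1)
    return k * x[-1], k

def bias_and_mse(estimator, sigma, k, n=50, reps=2000, seed=0):
    """Monte Carlo bias and MSE of an estimator returning (sigma_hat, k_hat)."""
    rng = random.Random(seed)
    est = [estimator(gpd_sample(n, sigma, k, rng)) for _ in range(reps)]
    out = {}
    for name, truth, vals in (("sigma", sigma, [e[0] for e in est]),
                              ("k", k, [e[1] for e in est])):
        bias = statistics.fmean(vals) - truth
        mse = statistics.fmean((v - truth) ** 2 for v in vals)
        out[name] = (bias, mse)
    return out

summary = bias_and_mse(qml_estimate, sigma=1.0, k=1.5)
for name, (bias, mse) in summary.items():
    print(name, bias, mse)
```

Since MSE equals variance plus squared bias, an estimator can have small bias and still a large MSE, which is why the two criteria are reported separately.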




MSE Comparison

[Figure 3: The efficiency of parameter estimation. Panel "MSE for scale, n=50": MSE(sigma) against k. Panel "MSE for shape, n=50": MSE(k) against k; k ranges from −6 to 2.]


An Example: Bilbao waves data


To illustrate the advantages of the new hybrid estimation procedure, we present a real-world example originally analyzed in Castillo and Hadi (1997), which consists of the zero-crossing hourly mean periods (in seconds) of the sea waves measured in the Bilbao bay, Spain. This data set was later revisited in Luceno (2006) and in Zhang and Stephens (2009). Only the 197 observations with periods above 7 seconds were taken into consideration.

We model these data by the GPD using the threshold t = 7.5, following the above-mentioned authors. The table below provides the estimated GPD parameters for the Bilbao waves data using different estimators.

            σ                               k
t    m      MLE    EBM*   MGFE   Hybrid     MLE    EBM*   MGFE   Hybrid
7.5  154    1.860  1.722  1.632  1.626      0.768  0.686  0.614  0.620





To check graphically whether the minimum of the target function G defined in (6) is reached at θ_NEW = k_NEW/σ_NEW = 0.3812, G(θ; X) and its first derivative are plotted for the Bilbao waves data at t = 7.5. The boundary condition for this data set is θ < 1/X_(n) = 1/2.4 = 0.4167.

[Figure: "The plot of G for the Bilbao waves data", G(θ) against θ, and "The plot of first derivative of G for the Bilbao waves data", dG/dθ against θ; θ ranges from −0.4 to 0.4.]





The following figure shows the histograms of B = 1000 parametric bootstrap samples of σ_NEW and k_NEW for the Bilbao waves data. The parametric bootstrap standard errors of the hybrid estimates are se(σ_NEW) = 0.167 and se(k_NEW) = 0.090, and the corresponding 95% bootstrap confidence intervals for σ and k are (1.288, 1.949) and (0.413, 0.771).

[Figure: "Histogram of 1000 parametric bootstrap samples of sigma" and "Histogram of 1000 parametric bootstrap samples of k"; frequency on the vertical axis.]



Final Comments

A new hybrid estimating procedure has been introduced for the GPD parameters, and it has several advantages.

First, the new hybrid estimates are easily obtained by optimizing a single-parameter function using standard algorithms, and the existence and feasibility of the hybrid estimates can even be verified graphically.

Second, unlike some other existing methods, the new hybrid estimates can always be found over the entire parameter space.

Third, the standard errors and confidence intervals can be easily calculated by the bootstrap method.

Finally, the simulation study of bias and MSE showed that the proposed hybrid estimators greatly improve over the MLE and the MGFE, and compare well with the other existing methods.



Acknowledgements

THANK YOU!

