
On interval and point estimators based on a penalization of the modified profile likelihood




Statistics and Probability Letters 82 (2012) 1285–1289


On interval and point estimators based on a penalization of the modified profile likelihood

Laura Ventura a,∗, Walter Racugno b

a Department of Statistics, University of Padova, Italy
b Department of Mathematics and Informatics, University of Cagliari, Italy

Article info

Article history: Received 18 February 2012; Received in revised form 20 March 2012; Accepted 20 March 2012; Available online 30 March 2012.

Keywords: Bias; Expected length; Exponential family; Matching prior; Mean squared error.

Abstract

In the presence of a nuisance parameter, one widely shared approach to likelihood inference on a scalar parameter of interest is based on the profile likelihood and its various modifications. In this paper, we add a penalization to the modified profile likelihood, based on a suitable matching prior, and we discuss the frequency properties of interval estimators and point estimators derived from this penalized modified profile likelihood. Two simulation studies illustrate the improvement of the proposed penalized modified profile likelihood over its counterparts.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Let us consider a model with a scalar parameter of interest $\psi$, a $d$-dimensional nuisance parameter $\lambda$ and likelihood function $L(\psi, \lambda) = L(\psi, \lambda; y)$, with $y = (y_1, \ldots, y_n)$ a random sample. Standard methods for inference about $\psi$ are based on the profile likelihood $L_p(\psi) = L(\psi, \hat\lambda_\psi)$, with $\hat\lambda_\psi$ the maximum likelihood estimator (MLE) of $\lambda$ for fixed $\psi$. Several examples of bad behaviour of $L_p(\psi)$ have led to adjustments of the form

$$L_{mp}(\psi) = L_p(\psi)\,M(\psi), \qquad (1)$$

for a suitably defined correction term $M(\psi)$; see, for instance, Severini (2000, Chapter 9). Reduction of the score bias is the motivation for adjusting $L_p(\psi)$ in McCullagh and Tibshirani (1990), while other proposals (see e.g. Pace and Salvan (2006)) aim to approximate some target likelihood.

In this paper, we propose a penalization of $L_{mp}(\psi)$ from a new perspective, which aims to improve interval estimators and point estimators for $\psi$. The penalization is based on a matching prior $\pi_{mp}(\psi)$ on $\psi$ only, i.e. a prior for which there is strong agreement between Bayesian and frequentist inference and which validates the use of $L_{mp}(\psi)$ for Bayesian inference. In this context, the matching prior $\pi_{mp}(\psi)$ can be interpreted as a non-negative weight function on $\psi$. We show that the penalized modified profile likelihood

$$L^*_{mp}(\psi) = L_{mp}(\psi)\,\pi_{mp}(\psi) \qquad (2)$$

has better properties than (1), in the sense that: the expected length (EL) of confidence intervals (CIs) based on $L^*_{mp}(\psi)$ and the mean squared error (MSE) of the maximizer of $L^*_{mp}(\psi)$ are less than those based on $L_{mp}(\psi)$; in exponential families, $L^*_{mp}(\psi)$ equals Firth's procedure (Firth, 1993) for bias reduction of the MLE.

∗ Corresponding author. Tel.: +39 049 8274177; fax: +39 049 8274170. E-mail addresses: [email protected] (L. Ventura), [email protected] (W. Racugno).

0167-7152/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.spl.2012.03.025


The interface between Bayesian and frequentist inference has been the subject of considerable recent interest. For instance, this has led to the investigation of integrated likelihoods for non-Bayesian inference (Liseo, 1993; Severini, 2007, 2011), to the use of pseudo-likelihoods for Bayesian inference (see e.g. Lazar (2003), Chang and Mukerjee (2006), Racugno et al. (2010) and Pauli et al. (2011)), and to the development of matching priors (see Datta and Mukerjee (2004) and Ventura et al. (2009)). In particular, the agreement between frequentist and Bayesian inference arising from matching priors makes their study of interest from the frequentist viewpoint as well. The possibility of adjusting a likelihood using priors, even if quite differently motivated, is considered also in Firth (1993), Liseo (1993), Mukerjee and Reid (1999), and Ventura and Racugno (2011). Here, we focus on the class of matching priors for $\psi$ derived from $L_{mp}(\psi)$.

2. The matching prior

Since the modified profile likelihood $L_{mp}(\psi)$ depends only on the data and the parameter of interest $\psi$, it can also be used in the Bayesian framework as a genuine likelihood to construct a posterior distribution for $\psi$ of the form $\pi_{mp}(\psi|y) \propto L_{mp}(\psi)\,\pi_{mp}(\psi)$, where $\pi_{mp}(\psi)$ is the matching prior on $\psi$ only, given by Ventura et al. (2009)

$$\pi_{mp}(\psi) \propto i_{\psi\psi\cdot\lambda}(\psi, \hat\lambda_\psi)^{1/2}, \qquad (3)$$

with $i_{\psi\psi\cdot\lambda}(\psi, \lambda) = i_{\psi\psi}(\psi, \lambda) - i_{\psi\lambda}(\psi, \lambda)\, i_{\lambda\lambda}(\psi, \lambda)^{-1}\, i_{\lambda\psi}(\psi, \lambda)$ the partial information, and $i_{\psi\psi}(\psi, \lambda)$, $i_{\psi\lambda}(\psi, \lambda)$, $i_{\lambda\lambda}(\psi, \lambda)$, and $i_{\lambda\psi}(\psi, \lambda)$ the blocks of the expected Fisher information $i(\psi, \lambda)$ from $L(\psi, \lambda)$ for one observation. Note that (3) can be interpreted as a Jeffreys-type prior associated with $L_{mp}(\psi)$.
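The partial information entering (3) is a Schur complement of the full Fisher information matrix. A minimal sketch for the case of a scalar nuisance parameter ($d = 1$), using hypothetical numerical values for the information blocks, is:

```python
import math

def partial_information(i_pp, i_pl, i_ll):
    # Schur complement: i_{psi psi . lambda} = i_{psi psi} - i_{psi lambda}^2 / i_{lambda lambda},
    # for a scalar nuisance parameter (d = 1).
    return i_pp - i_pl ** 2 / i_ll

def matching_prior_unnormalized(i_pp, i_pl, i_ll):
    # Unnormalized matching prior (3): square root of the partial information,
    # to be evaluated at (psi, hat-lambda_psi).
    return math.sqrt(partial_information(i_pp, i_pl, i_ll))

# Hypothetical information blocks; when psi and lambda are orthogonal
# (i_pl = 0), the partial information reduces to i_pp.
print(partial_information(4.0, 1.0, 2.0))  # 4 - 1/2 = 3.5
print(partial_information(4.0, 0.0, 2.0))  # 4.0
```

For a $d$-dimensional nuisance parameter the same formula applies with the scalar quotient replaced by $i_{\psi\lambda}\, i_{\lambda\lambda}^{-1}\, i_{\lambda\psi}$.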

To order $O(n^{-3/2})$, for the posterior $\pi_{mp}(\psi|y)$ it can be shown that (Ventura and Racugno, 2011)

$$\int_{-\infty}^{\psi_0} \pi_{mp}(\psi|y)\, d\psi = \Phi(r^*_p(\psi_0)), \qquad (4)$$

where $\Phi(\cdot)$ is the standard normal distribution function and $r^*_p(\psi)$ is the modified signed likelihood ratio statistic $r^*_p(\psi) = r_p(\psi) + (1/r_p(\psi)) \log(q(\psi)/r_p(\psi))$ of Barndorff-Nielsen and Chamberlin (1994), with $r_p(\psi) = \mathrm{sign}(\hat\psi - \psi)\,[2(\ell_p(\hat\psi) - \ell_p(\psi))]^{1/2}$ the signed likelihood ratio statistic, $\ell_p(\psi) = \log L_p(\psi)$, and $q(\psi) = \ell'_p(\psi)\, |j_p(\hat\psi)|^{-1/2}\, M(\psi)^{-1}\, |i_{\psi\psi\cdot\lambda}(\hat\psi, \hat\lambda)|^{1/2}\, |i_{\psi\psi\cdot\lambda}(\psi, \hat\lambda_\psi)|^{-1/2}$. In $q(\psi)$, $\ell'_p(\psi) = \partial\ell_p(\psi)/\partial\psi$ and $j_p(\psi) = -\partial^2\ell_p(\psi)/\partial\psi^2$ are the profile score function and the profile observed information, respectively, and $(\hat\psi, \hat\lambda)$ is the MLE of $(\psi, \lambda)$. Moreover, $M(\psi) = (|j_{\lambda\lambda}(\psi, \hat\lambda_\psi)|^{1/2}\, |j_{\lambda\lambda}(\hat\psi, \hat\lambda)|^{1/2}) / |\ell_{\lambda;\hat\lambda}(\psi, \hat\lambda_\psi)|$, where $j_{\lambda\lambda}(\psi, \lambda)$ is the $(\lambda, \lambda)$-block of the observed Fisher information $j(\psi, \lambda)$, and $\ell_{\lambda;\hat\lambda}(\psi, \lambda) = \partial^2\ell(\psi, \lambda)/\partial\lambda\,\partial\hat\lambda^\top$, with $\ell(\psi, \lambda) = \log L(\psi, \lambda)$.

In view of (4), the HPD credible set $H(k_\alpha) = \{\psi : \pi_{mp}(\psi|y) \ge k_\alpha\}$ for $\psi$, with $k_\alpha$ a suitable given constant, is given by $H(z_{1-\alpha/2}) = \{\psi : |r^*_p(\psi)| \le z_{1-\alpha/2}\}$ to order $O(n^{-3/2})$, where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of the standard normal distribution. Therefore, $H(z_{1-\alpha/2})$ coincides with the CI for $\psi$ with approximate level $(1-\alpha)$ based on the modified signed likelihood ratio statistic $r^*_p(\psi)$. In view of this, the prior (3) is also an HPD matching prior for $\psi$ (Ventura and Racugno, 2011).
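CIs of this type are obtained by inverting a signed likelihood root. As an illustrative sketch, the following code inverts the first-order root $r_p(\psi)$ by bisection for a hypothetical one-parameter exponential model (the inversion applies verbatim to $r^*_p(\psi)$ once its extra $q(\psi)$ machinery is available):

```python
import math

def likelihood_root_ci(loglik, psi_hat, lo, hi, level_z=1.959963984540054):
    """CI {psi : |r(psi)| <= z}, with r(psi) = sign(psi_hat - psi) *
    sqrt(2 * (loglik(psi_hat) - loglik(psi))), found by bisection.
    level_z defaults to the 0.975 standard normal quantile (95% CI)."""
    def r(psi):
        return math.copysign(
            math.sqrt(max(0.0, 2.0 * (loglik(psi_hat) - loglik(psi)))),
            psi_hat - psi)
    def bisect(a, b, target):
        # r is decreasing in psi for a unimodal loglik, so home in on r(psi) = target.
        for _ in range(200):
            m = 0.5 * (a + b)
            if r(m) > target:
                a = m
            else:
                b = m
        return 0.5 * (a + b)
    # r = +z at the lower endpoint, -z at the upper endpoint.
    return bisect(lo, psi_hat, level_z), bisect(psi_hat, hi, -level_z)

# Hypothetical example: exponential rate psi with n = 10, sum(y) = 5, so
# loglik(psi) = n log(psi) - psi * sum(y) and psi_hat = n / sum(y) = 2.
ci = likelihood_root_ci(lambda p: 10.0 * math.log(p) - 5.0 * p, 2.0, 1e-6, 100.0)
print(ci)
```

The bracket `[lo, hi]` and the example model are assumptions for illustration; any concave profile loglikelihood can be passed as `loglik`.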

3. Frequency properties of inferences based on $L^*_{mp}(\psi)$

In this section we show that $L^*_{mp}(\psi)$ yields statistical procedures that are superior to those based on $L_{mp}(\psi)$ in the sense that: the EL of CIs based on $L^*_{mp}(\psi)$ and the MSE of the maximizer of $L^*_{mp}(\psi)$ are less than those based on $L_{mp}(\psi)$; in exponential families, $L^*_{mp}(\psi)$ equals Firth's procedure (Firth, 1993) for bias reduction of the MLE.

In the following, without loss of generality, as in Mukerjee and Reid (1999) and Severini (2011) we assume that $\psi$ and $\lambda$ are orthogonal parameters. Since $\psi$ is scalar, this can always be achieved by a reparameterization of the nuisance parameter. With this parameterization, the modified profile likelihood $L_{mp}(\psi)$ reduces to the Cox and Reid (1987) adjusted profile likelihood $L_{mp}(\psi) = L_p(\psi)\, |j_{\lambda\lambda}(\psi, \hat\lambda_\psi)|^{-1/2}$ and the penalized modified profile likelihood (2) is thus $L^*_{mp}(\psi) = L_{mp}(\psi)\, i_{\psi\psi}(\psi, \hat\lambda_\psi)^{1/2}$.

Expected length and mean squared error. The EL of the CIs based on $L^*_{mp}(\psi)$ is less than the EL of the CIs based on $L_{mp}(\psi)$, up to $o(n^{-1})$, if and only if (see Mukerjee and Reid (1999, Section 4.3))

$$2\,\frac{\partial}{\partial\psi}\!\left\{\frac{\pi_\psi(\psi, \lambda)}{i_{\psi\psi}(\psi, \lambda)}\right\} + \frac{\pi_\psi(\psi, \lambda)^2}{i_{\psi\psi}(\psi, \lambda)} < 0. \qquad (5)$$

For the matching prior (3) condition (5) is satisfied, since it reduces to $-\frac{3}{4}\, i_{\psi\psi}(\psi, \lambda)^{-3} (\nu_{\psi\psi\psi} + \nu_{\psi\psi,\psi})^2 < 0$, with $\nu_{\psi\psi\psi} = \mathrm{E}(\partial^3\ell(\psi, \lambda)/\partial\psi^3)$ and $\nu_{\psi\psi,\psi} = \mathrm{E}((\partial^2\ell(\psi, \lambda)/\partial\psi^2)(\partial\ell(\psi, \lambda)/\partial\psi))$, and this shows that the EL of the CIs arising from $L^*_{mp}(\psi)$ is shorter than that of the ones arising from $L_{mp}(\psi)$. As a final remark, note that condition (5) implies that

$$\frac{\partial}{\partial\psi}\!\left\{\frac{\pi_\psi(\psi, \lambda)}{i_{\psi\psi}(\psi, \lambda)}\right\} < 0, \qquad (6)$$

since $\pi_\psi(\psi, \lambda)^2 / i_{\psi\psi}(\psi, \lambda) > 0$, and thus function (6) is negative (see Severini (2011)).


Fig. 1. Plot of the profile likelihood ratio test $W_p(\psi)$ (dashed line) and of the penalized profile likelihood ratio test $W^*_{mp}(\psi)$ (solid line) for a simulated sample of size $n = 5$.

Let $\hat\psi^*_{mp}$ denote the penalized modified profile MLE, i.e. the value of $\psi$ that maximizes $L^*_{mp}(\psi)$. Moreover, let $\hat\psi_{mp}$ be the value of $\psi$ that maximizes $L_{mp}(\psi)$. There exists a connection between the MSE of the estimator given by $L^*_{mp}(\psi)$ and the EL of the associated CIs (see formula (3.12) in Mukerjee and Reid (1999)). In view of this, similarly to the EL, the MSE of $\hat\psi^*_{mp}$ is less than the MSE of $\hat\psi_{mp}$, up to $o(n^{-1})$, whenever $\partial(\pi_\psi(\psi, \lambda)/i_{\psi\psi}(\psi, \lambda))/\partial\psi$ is sufficiently negative, as explained in Severini (2011, Sections 4.2 and 4.3). Since this condition is the same as for the EL of CIs, the MSE of the maximizer of $L^*_{mp}(\psi)$ is less than the MSE of the maximizer of $L_{mp}(\psi)$.

Bias reduction in exponential families. In regular parametric problems, first-order bias reduction of the MLE can be achieved by a suitable modification of the score function (Firth, 1993). In exponential families with canonical parameterization, not necessarily orthogonal, the effect is to penalize the likelihood by the Jeffreys prior. Thus, Firth's penalized likelihood function can be expressed as $L_F(\psi, \lambda) = L(\psi, \lambda)\, |i(\psi, \lambda)|^{1/2}$.

When inferential interest is only on the scalar parameter $\psi$, Firth's likelihood inference can be based on the profile penalized likelihood function, obtained by substituting $\lambda$ with the partial MLE $\hat\lambda_\psi$, i.e. $L_F(\psi, \hat\lambda_\psi) = L(\psi, \hat\lambda_\psi)\, |i(\psi, \hat\lambda_\psi)|^{1/2}$. Then, using standard results for partitioned matrices, we have

$$L_F(\psi, \hat\lambda_\psi) = L_p(\psi)\, |i_{\psi\psi\cdot\lambda}(\psi, \hat\lambda_\psi)|^{1/2}\, |i_{\lambda\lambda}(\psi, \hat\lambda_\psi)|^{1/2} = L_{mp}(\psi)\, |i_{\psi\psi\cdot\lambda}(\psi, \hat\lambda_\psi)|^{1/2}, \qquad (7)$$

which is the penalized modified profile likelihood (2). Therefore, in exponential families, the $O(n^{-1})$ bias of the MLE $\hat\psi$ is removed by calculation of the maximizer of $L^*_{mp}(\psi)$. As a final remark, we note that in exponential families it holds that $i_{\psi\psi\cdot\lambda}(\psi, \hat\lambda_\psi) = j_p(\psi)$.
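The equivalence between Jeffreys penalization and Firth's bias reduction can be checked numerically in the simplest exponential family. The following sketch uses a binomial model with canonical (logit) parameter and no nuisance parameter; Firth (1993) gives the closed-form bias-reduced estimate $(y + 1/2)/(n + 1)$ for this case, which the Jeffreys-penalized likelihood reproduces:

```python
import math

def firth_penalized_loglik(beta, y, n):
    # Binomial loglikelihood in the canonical (logit) parameterization,
    # penalized by the Jeffreys prior |i(beta)|^{1/2} = (n p (1-p))^{1/2}.
    p = 1.0 / (1.0 + math.exp(-beta))
    return y * beta - n * math.log(1.0 + math.exp(beta)) \
        + 0.5 * math.log(n * p * (1.0 - p))

def firth_estimate(y, n):
    # Golden-section maximization over a wide bracket (the penalized
    # loglikelihood is concave in beta, so this converges to the maximizer).
    lo, hi = -20.0, 20.0
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(200):
        a = hi - g * (hi - lo)
        b = lo + g * (hi - lo)
        if firth_penalized_loglik(a, y, n) < firth_penalized_loglik(b, y, n):
            lo = a
        else:
            hi = b
    beta = 0.5 * (lo + hi)
    return 1.0 / (1.0 + math.exp(-beta))  # back to the probability scale

# Known closed form: the Firth estimate of p is (y + 1/2)/(n + 1).
print(firth_estimate(3, 10))  # close to 3.5/11 ≈ 0.3182
```

The bracket $[-20, 20]$ is an assumption that comfortably contains the maximizer for the sample sizes shown.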

4. Examples

Example 1 (Ratio of normal means). Let $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$ be independent random samples from normal distributions with means $\mu_x$ and $\mu_y$, respectively, and unit variances. Assume $\mu_x > 0$ and $\mu_y > 0$. The parameter of interest is $\psi = \mu_x/\mu_y$, and let $\lambda = (\mu_x^2 + \mu_y^2)^{1/2}$, so that $\psi$ and $\lambda$ are orthogonal (Severini, 2011). The profile loglikelihood is $\ell_p(\psi) = (n/2)(\psi\bar x + \bar y)^2/(\psi^2 + 1)$, where $\bar x$ and $\bar y$ are the sample means, and $\ell_{mp}(\psi) = \ell_p(\psi)$, since $j_{\lambda\lambda}(\psi, \lambda) = n$. As is well known, for this problem $\ell_p(\psi)$, as well as $\ell_{mp}(\psi)$, has the property that it does not approach 0 as $\psi \to \infty$ (see, e.g., Liseo (1993) and Severini (2011)) and likelihood CIs can be infinite. Simple calculations give $i_{\psi\psi}(\psi, \lambda) = \lambda^2/(\psi^2 + 1)^2$, and it is easy to show that the matching prior $\pi_{mp}(\psi) = (\psi\bar x + \bar y)/[(\bar x + \bar y)(\psi^2 + 1)^{3/2}]$ is proper. The proposed penalized profile loglikelihood is

$$\ell^*_{mp}(\psi) = \ell_p(\psi) + \frac{1}{2}\log\frac{(\psi\bar x + \bar y)^2}{(\psi^2 + 1)^3}$$

and, in terms of CIs, is clearly preferable to $\ell_p(\psi)$, since $\pi_{mp}(\psi)$ makes a strong correction to $\ell_p(\psi)$ on the tail (see Fig. 1).
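The tail behaviour is easy to see numerically. The following sketch, with hypothetical sample means, evaluates $\ell_p$ and $\ell^*_{mp}$ at increasing $\psi$ (additive constants dropped):

```python
import math

def lp(psi, xbar, ybar, n):
    # Profile loglikelihood for psi = mu_x / mu_y (unit variances).
    return 0.5 * n * (psi * xbar + ybar) ** 2 / (psi ** 2 + 1)

def lmp_star(psi, xbar, ybar, n):
    # Penalized version: lp plus the log matching prior, constant dropped.
    return lp(psi, xbar, ybar, n) \
        + math.log(psi * xbar + ybar) - 1.5 * math.log(psi ** 2 + 1)

# Hypothetical sample means for n = 5.
xbar, ybar, n = 1.2, 0.9, 5
for psi in (1.0, 10.0, 100.0, 1000.0):
    print(psi, lp(psi, xbar, ybar, n), lmp_star(psi, xbar, ybar, n))
# lp(psi) flattens out near n*xbar^2/2 as psi grows, so the profile CI can
# extend to infinity; lmp_star(psi) decreases without bound, so the penalized
# CI closes.
```

The values of `xbar`, `ybar` and `n` are illustrative assumptions, not the simulated sample behind Fig. 1.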

The EL of CIs is illustrated through simulation studies based on 10,000 Monte Carlo trials. Table 1 gives the EL for 95% CIs from $\ell_p(\psi) = \ell_{mp}(\psi)$ and $\ell^*_{mp}(\psi)$; the number in square brackets indicates the empirical percentage of cases in which the CI from $\ell_p(\psi)$ is finite. From Table 1 we observe that, even for moderate $n$, CIs based on $\ell^*_{mp}(\psi)$ have mean lengths always smaller than the ones based on $\ell_p(\psi)$. Table 2 gives the result of a simulation study which evaluates the finite-sample


Table 1
EL of 95% CIs for the ratio of normal means; ^a computed when EL < ∞.

                    n = 20         n = 30         n = 50
ψ = 1  ℓp(ψ)^a     5.181 [98%]    1.309 [99%]    0.878
       ℓ*mp(ψ)     1.362          1.065          0.807
ψ = 5  ℓp(ψ)^a     9.468 [99%]    5.138 [99%]    3.340
       ℓ*mp(ψ)     4.647          3.869          2.952

Table 2
Bias (and MSE) of the maximizers of ℓp(ψ) and ℓ*mp(ψ) for the ratio of normal means.

                 n = 5             n = 10           n = 15          n = 20
ψ = 1  ψ̂         0.225 (411.197)   0.146 (12.187)   0.091 (0.437)   0.062 (0.148)
       ψ̂*mp      0.197 (0.227)     0.108 (0.149)    0.073 (0.111)   0.052 (0.086)
ψ = 5  ψ̂         0.767 (>1000)     0.760 (34.909)   0.391 (4.462)   0.289 (2.436)
       ψ̂*mp      0.727 (2.498)     0.522 (1.705)    0.364 (1.288)   0.269 (1.087)

Table 3
EL of 95% CIs under the inverse Gaussian.

            n = 5    n = 10   n = 15   n = 20
ℓp(ψ)       3.045    1.167    0.713    0.533
ℓmp(ψ)      2.704    1.075    0.673    0.511
ℓ*mp(ψ)     2.337    1.001    0.643    0.494

Table 4
Bias (and MSE) of the maximizers of ℓp(ψ), ℓmp(ψ) and ℓ*mp(ψ), under the inverse Gaussian.

          n = 5            n = 10          n = 15          n = 20
ψ̂         1.966 (49.698)   0.548 (1.599)   0.337 (0.589)   0.239 (0.348)
ψ̂mp       1.373 (31.261)   0.393 (1.195)   0.248 (0.471)   0.177 (0.302)
ψ̂*mp      0.186 (7.378)    0.083 (0.630)   0.069 (0.307)   0.053 (0.214)

properties of the maximizers of $\ell_p(\psi) = \ell_{mp}(\psi)$ and $\ell^*_{mp}(\psi)$. The estimators are compared in terms of bias and MSE. From Table 2 it can be noted that the maximizer of $\ell^*_{mp}(\psi)$ always exhibits a smaller MSE (and even bias) than $\hat\psi$.

Example 2 (Inverse Gaussian distribution). Consider a random sample $(y_1, \ldots, y_n)$ from the inverse Gaussian distribution with density $p(y; \psi, \lambda) = [\psi/(2\pi y^3)]^{1/2} \exp(-\psi(y - \lambda)^2/(2\lambda^2 y))$, $y > 0$, $\psi, \lambda > 0$, with $\psi$ the scale parameter. The profile loglikelihood is $\ell_p(\psi) = (n/2)\log\psi - \psi t/(2\bar y^2)$, with $t = \sum (y_i - \bar y)^2/y_i$, and the modified profile loglikelihood is $\ell_{mp}(\psi) = ((n-1)/2)\log\psi - \psi t/(2\bar y^2)$. Moreover, $i_{\psi\psi}(\psi, \lambda) = n/(2\psi^2)$, and thus the proposed penalized modified profile loglikelihood is

$$\ell^*_{mp}(\psi) = \frac{n-3}{2}\log\psi - \frac{\psi t}{2\bar y^2}.$$
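Since all three loglikelihoods share the form $(a/2)\log\psi - \psi t/(2\bar y^2)$, each maximizer is available in closed form as $a\,\bar y^2/t$ with $a = n$, $n-1$, $n-3$; the shrinkage factors $(n-1)/n$ and $(n-3)/n$ are what drive the bias reduction seen in Table 4. A minimal sketch, with a hypothetical sample:

```python
def ig_estimates(y):
    # Maximizers of (a/2) log(psi) - psi * t / (2 * ybar^2): setting the
    # derivative a/(2 psi) - t/(2 ybar^2) to zero gives psi = a * ybar^2 / t.
    n = len(y)
    ybar = sum(y) / n
    t = sum((yi - ybar) ** 2 / yi for yi in y)
    psi_hat = n * ybar ** 2 / t          # profile MLE (a = n)
    psi_mp = (n - 1) * ybar ** 2 / t     # modified profile (a = n - 1)
    psi_star = (n - 3) * ybar ** 2 / t   # penalized modified profile (a = n - 3)
    return psi_hat, psi_mp, psi_star

# Hypothetical inverse Gaussian sample of size n = 5.
y = [0.8, 1.3, 0.6, 1.9, 1.1]
print(ig_estimates(y))
```

The sample values are illustrative, not drawn from the simulation behind Tables 3 and 4.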

As in the previous example, a simulation study based on 10,000 Monte Carlo trials has been performed in order to study the frequency properties of interval estimators and point estimators based on $\ell^*_{mp}(\psi)$. Table 3 gives the EL for 95% CIs from $\ell_p(\psi)$, $\ell_{mp}(\psi)$ and $\ell^*_{mp}(\psi)$, and Table 4 gives the bias and MSE of the maximizers of $\ell_p(\psi)$, $\ell_{mp}(\psi)$ and $\ell^*_{mp}(\psi)$. From Table 3 we observe that, also in this case, $\ell^*_{mp}(\psi)$ is preferable to $\ell_{mp}(\psi)$. From Table 4 it can be noted that the maximizer of $\ell^*_{mp}(\psi)$ always exhibits a smaller MSE (and bias) than $\hat\psi$ and $\hat\psi_{mp}$.

As a final remark, note that the comparisons based on the EL of CIs and the MSE of estimators depend on the parameterization used. Therefore, when considering specific examples, it is important to keep in mind that different conclusions might be reached if the parameter of interest is reparameterized.

Acknowledgements

The authors acknowledge the referee for the useful comments. This work was supported by grants from the University of Padua (Progetti di Ricerca di Ateneo 2011) and from MIUR, Italy.

References

Barndorff-Nielsen, O.E., Chamberlin, S.R., 1994. Stable and invariant adjusted directed likelihoods. Biometrika 81, 485–499.
Chang, H., Mukerjee, R., 2006. Probability matching property of adjusted likelihoods. Statist. Probab. Lett. 76, 838–842.


Cox, D.R., Reid, N., 1987. Parameter orthogonality and approximate conditional inference (with discussion). J. Roy. Statist. Soc. B 49, 1–39.
Datta, G.S., Mukerjee, R., 2004. Probability Matching Priors: Higher Order Asymptotics. Springer, Berlin.
Firth, D., 1993. Bias reduction of maximum likelihood estimates. Biometrika 80, 27–38.
Lazar, N.A., 2003. Bayesian empirical likelihood. Biometrika 90, 319–326.
Liseo, B., 1993. Elimination of nuisance parameters with reference priors. Biometrika 80, 295–304.
McCullagh, P., Tibshirani, R., 1990. A simple method for the adjustment of profile likelihoods. J. Roy. Statist. Soc. B 52, 325–344.
Mukerjee, R., Reid, N., 1999. On confidence intervals associated with the usual and adjusted likelihoods. J. Roy. Statist. Soc. B 61, 945–953.
Pace, L., Salvan, A., 2006. Adjustments of the profile likelihood from a new perspective. J. Statist. Plan. Inf. 136, 3554–3564.
Pauli, F., Racugno, W., Ventura, L., 2011. Bayesian composite marginal likelihoods. Statist. Sin. 21, 149–164.
Racugno, W., Salvan, A., Ventura, L., 2010. Bayesian analysis in regression models using pseudo-likelihoods. Comm. Stat. Th. Meth. 39, 3444–3455.
Severini, T.A., 2000. Likelihood Methods in Statistics. Oxford University Press.
Severini, T.A., 2007. Integrated likelihood functions for non-Bayesian inference. Biometrika 94, 529–542.
Severini, T.A., 2011. Frequency properties of inferences based on an integrated likelihood function. Statist. Sin. 21, 433–447.
Ventura, L., Cabras, S., Racugno, W., 2009. Prior distributions from pseudo-likelihoods in the presence of nuisance parameters. J. Amer. Stat. Assoc. 104, 768–774.
Ventura, L., Racugno, W., 2011. Recent advances on Bayesian inference for P(X < Y). Bayesian Anal. 6, 411–428.