Adjusted scores for regression models

Euloge Clovis Kenne Pagui
Department of Statistical Sciences, University of Padua

Joint work with Alessandra Salvan and Nicola Sartori

PRIN 2015 meeting, Padova


Outline

1 Background on MLE

2 Mean bias reduction

3 Median bias reduction

4 Applications

5 Conclusions


Basics

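Parametric model with $p$-dimensional parameter $\theta$ and log likelihood $\ell(\theta)$ based on a sample of size $n$.

With $\theta = (\theta_1, \ldots, \theta_p)$, derivatives of $\ell(\theta)$:
- $U_r = \partial\ell(\theta)/\partial\theta_r$, $U_{rs} = \partial^2\ell(\theta)/\partial\theta_r\partial\theta_s$, $r, s = 1, \ldots, p$;
- score $U(\theta)$ with components $U_r$;
- observed information $j(\theta)$ with entries $-U_{rs}$.

Expected values of derivatives of $\ell(\theta)$:
- expected information $i(\theta) = E_\theta\{j(\theta)\}$ with entries $i_{rs}$;
- $i^{rs}$ denotes the $(r, s)$ entry of $i(\theta)^{-1}$;
- $\nu_{r,st} = E_\theta(U_r U_{st})$ and $\nu_{r,s,t} = E_\theta(U_r U_s U_t)$.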


Basics

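$\hat\theta$ denotes the maximum likelihood estimator (MLE).

First-order properties of the MLE:
- $\hat\theta \mathrel{\dot\sim} N_p(\theta, i(\theta)^{-1})$;
- mean unbiasedness: $E_\theta(\hat\theta) = \theta + O(n^{-1})$;
- (marginal) median unbiasedness: $P_\theta(\hat\theta_r < \theta_r) = \tfrac{1}{2} + O(n^{-1/2})$.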


Mean bias correction

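In regular parametric models, the bias of the MLE has the form

$$E_\theta(\hat\theta - \theta) = b(\theta) + O(n^{-2}),$$

where $b(\theta) = -i(\theta)^{-1} A^*(\theta) = O(n^{-1})$, with

$$A^*_r = \tfrac{1}{2}\,\mathrm{tr}\left[i(\theta)^{-1}\{P_r + Q_r\}\right], \quad r = 1, \ldots, p,$$

where $P_r = E_\theta\{U(\theta) U(\theta)^\top U_r\}$ and $Q_r = E_\theta\{-j(\theta) U_r\}$.

The bias-corrected estimator $\hat\theta_{BC} = \hat\theta - b(\hat\theta)$ has bias of order $O(n^{-2})$.

Asymptotically equivalent alternatives, such as the bootstrap and the jackknife, can be obtained without $b(\theta)$.

Lack of equivariance under reparameterizations.

Available only if $\hat\theta$ is finite.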


Mean bias reduction

Firth (1993) proposes a modification of the score vector of the form

$$U^*(\theta) = U(\theta) + A^*(\theta).$$

The modification term $A^*(\theta)$ is of order $O(1)$.

The estimator $\theta^*$, solution of $U^*(\theta) = 0$, has a bias of order $O(n^{-2})$ and does not require finiteness of $\hat\theta$.

Notable examples: generalized linear models (glm), implemented in the R package brglm2.

$\theta^* \mathrel{\dot\sim} N_p(\theta, i(\theta)^{-1})$.

Still depends on parameterization.
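As an illustration, in logistic regression the adjusted score is known to take the form $\sum_i \{y_i - \pi_i + h_i(1/2 - \pi_i)\} x_i$, with $h_i$ the hat values. The base-R sketch below solves it with a simple scoring iteration. The function name `firth_logistic` and the two small data sets are hypothetical and chosen only for illustration; this is not the brglm2 implementation.

```r
# Sketch: mean bias-reducing adjusted score for logistic regression,
# solved by iterating beta <- beta + i(beta)^{-1} U*(beta).
# firth_logistic is a hypothetical helper, not part of any package.
firth_logistic <- function(X, y, maxit = 100, tol = 1e-10) {
  beta <- rep(0, ncol(X))
  for (k in seq_len(maxit)) {
    prob <- as.vector(plogis(X %*% beta))
    w <- prob * (1 - prob)
    info_inv <- solve(crossprod(X, w * X))      # inverse of X' W X
    h <- w * rowSums((X %*% info_inv) * X)      # hat values
    adj_score <- crossprod(X, y - prob + h * (0.5 - prob))
    step <- as.vector(info_inv %*% adj_score)
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  beta
}

# Completely separated toy data: the MLE of the slope is infinite
x <- c(-2, -1, 1, 2)
y <- c(0, 0, 1, 1)
beta_star <- firth_logistic(cbind(1, x), y)

# Intercept-only model with all responses equal to 1: the adjusted score
# equation has the closed-form solution logit((sum(y) + 1/2) / (n + 1))
y0 <- rep(1, 5)
beta0_star <- firth_logistic(matrix(1, nrow = 5, ncol = 1), y0)
closed_form <- qlogis((sum(y0) + 0.5) / (length(y0) + 1))

print(beta_star)
print(c(iterative = beta0_star, closed_form = closed_form))
```

In both examples the MLE is infinite, while the adjusted score equation has a finite solution.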


Median bias reduction

Kenne Pagui et al. (2017): a new modified score that
- like mean bias reduction, does not require $\hat\theta$;
- estimates all components of the parameter simultaneously;
- gives an estimator that is marginally third-order median unbiased for each component;
- maintains (some) equivariance.

Development:
- scalar parameter;
- scalar parameter with nuisance parameters;
- multidimensional parameter.


Median bias reduction: scalar parameter

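For scalar $\theta$, a median modification of the score is obtained using a Cornish-Fisher expansion:

$$\tilde U(\theta) = U(\theta) + A(\theta),$$

where $A(\theta) = \nu_{\theta,\theta,\theta}/\{6\,i(\theta)\} = O(1)$ approximates $-\mathrm{Me}_\theta\{U(\theta)\}$, minus the median of the score.

For continuous random variables and under weak regularity conditions on the score, the solution $\tilde\theta$ of $\tilde U(\theta) = 0$ is third-order median unbiased:

$$\mathrm{Pr}_\theta(\tilde\theta \le \theta) = \mathrm{Pr}_\theta\{\tilde U(\theta) \le 0\} = \tfrac{1}{2} + O(n^{-3/2}).$$

$\tilde\theta$ is equivariant.

$\tilde\theta \mathrel{\dot\sim} N(\theta, i(\theta)^{-1})$.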
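The form of $A(\theta)$ can be traced to the leading terms of the Cornish-Fisher expansion of the median. The following is only a sketch under standard regularity assumptions; the symbols $\kappa_1, \kappa_2, \kappa_3$ (first three cumulants of the score) and $u_{1/2}$ (the standard normal median) are introduced here for illustration.

```latex
\begin{align*}
\mathrm{Me}_\theta\{U(\theta)\}
  &\approx \kappa_1 + \sqrt{\kappa_2}\,
     \Bigl\{u_{1/2} + \tfrac{1}{6}\,\frac{\kappa_3}{\kappa_2^{3/2}}\,(u_{1/2}^2 - 1)\Bigr\}
   = \kappa_1 - \frac{\kappa_3}{6\,\kappa_2},
   \qquad u_{1/2} = 0, \\
\kappa_1 = 0,\quad \kappa_2 = i(\theta),\quad \kappa_3 = \nu_{\theta,\theta,\theta}
  &\;\Longrightarrow\;
   A(\theta) = \frac{\nu_{\theta,\theta,\theta}}{6\,i(\theta)}
   \approx -\mathrm{Me}_\theta\{U(\theta)\}.
\end{align*}
```

Adding $A(\theta)$ to the score therefore recentres it so that its median is approximately zero.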


Toy example: normal model

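$y_1, \ldots, y_n$ random sample from $N(\mu_0, \sigma^2)$, with $\mu_0$ known.

$\hat\sigma^2 = \sigma^{2*} = s(\mu_0)/n$, with $s(\mu_0) = \sum_{i=1}^n (y_i - \mu_0)^2$.

The median modified score is

$$\tilde U(\sigma^2) = -\frac{n - 2/3}{2\sigma^2} + \frac{s(\mu_0)}{2(\sigma^2)^2},$$

which leads to $\tilde\sigma^2 = s(\mu_0)/(n - 2/3)$.

$\tilde\sigma^2$ is equal to the exact median unbiased estimator, $s(\mu_0)/\chi^2_{n;0.5}$, plus an error of order $O(n^{-2})$.

Both $\hat\sigma^2$ and $\tilde\sigma^2$ are equivariant; hence, for instance, for $\omega = \sqrt{\sigma^2}$, we have $\hat\omega = \{\hat\sigma^2\}^{1/2}$ and $\tilde\omega = \{\tilde\sigma^2\}^{1/2}$.

On the other hand, $\omega^* = \{s(\mu_0)/(n - 1/2)\}^{1/2}$.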
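Because $s(\mu_0)/\sigma^2$ follows a chi-square distribution with $n$ degrees of freedom, the probability of underestimation of each estimator in this toy model can be computed exactly. The short R sketch below does so for a few sample sizes; the values of `n` are arbitrary choices for illustration.

```r
# Toy normal model with known mean: s(mu0) / sigma^2 ~ chi-square(n),
# so Pr(estimator <= true value) is available in closed form.
n <- c(5, 10, 50)

p_ml     <- pchisq(n, df = n)          # sigma2_hat   = s / n
p_median <- pchisq(n - 2/3, df = n)    # sigma2_tilde = s / (n - 2/3)

# omega_star = sqrt(s / (n - 1/2)) estimates omega = sigma
p_omega_star <- pchisq(n - 1/2, df = n)

# Exact median unbiased divisor versus its approximation n - 2/3
exact_divisor <- qchisq(0.5, df = n)

print(round(cbind(n, p_ml, p_median, p_omega_star,
                  exact_divisor, approx_divisor = n - 2/3), 4))
```

By equivariance, `p_median` is also the probability that $\tilde\omega$ underestimates $\omega$.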


Scalar parameter with nuisance parameters

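Let $\theta = (\psi, \lambda)$, with $\psi$ a scalar parameter of interest.

Substitute $\lambda$ with $\hat\lambda_\psi$ and get the profile score $U_P(\psi) = U_\psi(\hat\theta_\psi)$.

Using a Cornish-Fisher expansion for the median of the standardized profile score we obtain

$$\tilde U_P(\psi) = U_P(\psi) - \kappa_{1\psi}(\hat\theta_\psi) + \frac{1}{6}\,\frac{\kappa_{3\psi}(\hat\theta_\psi)}{\kappa_{2\psi}(\hat\theta_\psi)},$$

where $\kappa_{1\psi}(\cdot)$, $\kappa_{2\psi}(\cdot)$ and $\kappa_{3\psi}(\cdot)$ involve $i^{rs}$, $\nu_{r,s,t}$ and $\nu_{r,st}$, and are the leading terms of the first three cumulants of $U_P(\psi)$.

$\tilde U_P(\psi) = 0$ gives $\tilde\psi_P$, which is
- third-order median unbiased,
- invariant with respect to interest preserving reparameterizations,
- asymptotically $N(\psi, i^{\psi\psi})$.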


Multi-dimensional parameter

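First idea: jointly solve a system with "profile versions" for each parameter, with all quantities evaluated at $\theta$.

Second idea: the profile score coincides with the efficient score

$$\bar U_\psi = U_\psi - i_{\psi\lambda}\, i_{\lambda\lambda}^{-1} U_\lambda,$$

evaluated at $\hat\theta_\psi$. Indeed, since $U_\lambda(\hat\theta_\psi) = 0$,

$$U_P(\psi) = U_\psi(\hat\theta_\psi) - U_{\psi\lambda}(\hat\theta_\psi)\, U_{\lambda\lambda}(\hat\theta_\psi)^{-1} U_\lambda(\hat\theta_\psi) = \bar U_\psi(\hat\theta_\psi).$$

Idea: solve the profile versions system with efficient scores instead of profile scores (and all quantities evaluated at $\theta$).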


Multi-dimensional parameter

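The vector of efficient scores has elements $\bar U_r = \sum_{s=1}^p (i^{rs}/i^{rr})\, U_s$.

Define $\tilde\theta$ as the solution of

$$\tilde U_r = \bar U_r + M_r = 0 \quad (r = 1, \ldots, p), \qquad \text{where } M_r = -\kappa_{1r} + \frac{1}{6}\,\frac{\kappa_{3r}}{\kappa_{2r}}.$$

Properties:
- $\tilde\theta_r - \tilde\theta_{rP} = O_p(n^{-3/2})$ $(r = 1, \ldots, p)$, where $\tilde\theta_{rP}$ is the estimator obtained from the modified profile score with $\theta_r$ as the parameter of interest;
- $\tilde\theta_r$ is third-order median unbiased, i.e. $\mathrm{Pr}_\theta(\tilde\theta_r \le \theta_r) = \tfrac{1}{2} + O(n^{-3/2})$;
- $\tilde\theta$ is equivariant under joint reparameterizations that transform each component of $\theta$ separately;
- $\tilde\theta \mathrel{\dot\sim} N_p(\theta, i(\theta)^{-1})$.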


Computational aspects

The estimating equation can be reformulated in terms of an adjustment to the original score:

$$\tilde U(\theta) = U(\theta) + A(\theta), \qquad A(\theta) = A^*(\theta) - i(\theta) F(\theta).$$

$F(\theta)$ has elements

$$F_r = [i(\theta)^{-1}]_r^\top \tilde F_r \quad (r = 1, \ldots, p),$$

with $\tilde F_r$ having components

$$\tilde F_{r,t} = \mathrm{tr}\left[h_r\{(1/3) P_t + (1/2) Q_t\}\right] \quad (t = 1, \ldots, p).$$

For both mean and median bias reduction, a quasi-Fisher scoring-type algorithm has $k$th iteration

$$\theta^{(k+1)} = \theta^{(k)} + i^{-1}(\theta^{(k)})\, B(\theta^{(k)}) + i^{-1}(\theta^{(k)})\, U(\theta^{(k)}),$$

where $B(\theta)$ can be $A^*(\theta)$ or $A(\theta)$, respectively.
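The iteration can be sketched in a few lines of R. The example below applies it to the toy normal model with known mean, where the score, the expected information and the median adjustment $1/(3\sigma^2)$ are available in closed form and the mean adjustment is zero. The helper `quasi_fisher_scoring`, the simulated data and the starting value are assumptions made for illustration, not the authors' implementation.

```r
# Sketch of theta <- theta + i(theta)^{-1} {B(theta) + U(theta)}.
# quasi_fisher_scoring is a hypothetical helper written for this example.
quasi_fisher_scoring <- function(theta0, score, info, adjustment,
                                 maxit = 100, tol = 1e-12) {
  theta <- theta0
  for (k in seq_len(maxit)) {
    step <- as.vector(solve(info(theta), adjustment(theta) + score(theta)))
    theta <- theta + step
    if (max(abs(step)) < tol) break
  }
  theta
}

# Toy normal model with known mean mu0; the parameter is sigma^2
set.seed(1)
n <- 10
mu0 <- 0
y <- rnorm(n, mean = mu0, sd = 2)
s <- sum((y - mu0)^2)

score    <- function(s2) -n / (2 * s2) + s / (2 * s2^2)
info     <- function(s2) matrix(n / (2 * s2^2))
A_mean   <- function(s2) 0               # mean BR leaves s / n unchanged
A_median <- function(s2) 1 / (3 * s2)    # nu / (6 i) for this model

s2_mean   <- quasi_fisher_scoring(s / n, score, info, A_mean)
s2_median <- quasi_fisher_scoring(s / n, score, info, A_median)

print(c(mean_BR = s2_mean, median_BR = s2_median,
        closed_form_median = s / (n - 2/3)))
```

Starting from the MLE, the median version should converge in a handful of iterations to the closed-form solution $s(\mu_0)/(n - 2/3)$.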


Logistic regression: endometrial cancer grade

Study designed to evaluate the relationship between the histology of the endometrium of 79 patients and three risk factors: neovasculation (NV), pulsatility index of arteria uterina (PI) and endometrium height (EH) (Agresti, 2015, Section 5.7.1).

Maximum likelihood estimation leads to an infinite estimate of $\beta_{NV}$ (quasi-complete separation).

Both mean and median bias reduction provide a solution to the problem of separation in logistic regression.

Bias reduction in generalized linear models (Kosmidis et al., 2018).


Logistic regression: endometrial cancer grade (cont.)

Endometrial cancer grade: estimates (s.e.).

                            intercept       NV              PI               EH
$\hat\beta$ (ML)            4.305 (1.637)   +∞ (+∞)         -0.042 (0.044)   -2.903 (0.846)
$\tilde\beta$ (median BR)   3.969 (1.552)   3.869 (2.298)   -0.039 (0.042)   -2.708 (0.803)
$\beta^*$ (mean BR)         3.775 (1.489)   2.929 (1.551)   -0.035 (0.040)   -2.604 (0.776)

Both $\tilde\beta$ and $\beta^*$ are finite.

$\tilde\beta$ is intermediate between $\hat\beta$ and $\beta^*$.

$e^{\tilde\beta_{NV}}$ is third-order median unbiased for $e^{\beta_{NV}}$, while $e^{\beta^*_{NV}}$ is not a bias reduced estimate of $e^{\beta_{NV}}$.

R package: brglm2 (on CRAN).


R package: brglm2 (on CRAN)

Maximum likelihood estimates

> library("brglm2")
> data("endometrial", package = "brglm2")
> endometrial_ML <- glm(HG ~ NV + PI + EH, family = binomial, data = endometrial)
> # the large NV value reflects the infinite ML estimate (quasi-complete separation)
> endometrial_ML$coefficients
(Intercept)          NV          PI          EH
  4.3045178  18.1855558  -0.0421834  -2.9026056

Mean bias reduced estimates

> endometrial_BR <- update(endometrial_ML, method = "brglmFit", type = "AS_mean")
> endometrial_BR$coefficients
(Intercept)          NV          PI          EH
 3.77455971  2.92927335 -0.03475176 -2.60416392

Median bias reduced estimates

> endometrial_MBR <- update(endometrial_ML, method = "brglmFit", type = "AS_median")
> endometrial_MBR$coefficients
(Intercept)          NV          PI          EH
 3.96935983  3.86920663 -0.03867797 -2.70793447


Logistic regression: simulation

Simulate 10000 samples with covariate values fixed as in the original sample.

True parameter $\beta_0 = (-1.19, 2.00, -0.39, -1.79)$.

About 28% of the MLEs of $\beta_{NV}$ are infinite, while $\beta^*_{NV}$ and $\tilde\beta_{NV}$ are always finite.

Performance of maximum likelihood (ML), mean bias reduction (meanBR) and median bias reduction (medianBR) in terms of estimated
- percentage of underestimation;
- bias;
- coverage of 95% Wald-type confidence intervals.
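The three performance measures can be computed from a matrix of simulated estimates and a matrix of standard errors, with one row per replication and one column per parameter. The R sketch below shows one way to do this. The function `summarise_sim` and the tiny input matrices are hypothetical, and the sketch assumes finite estimates; how infinite ML estimates are handled in the simulation study is not specified here.

```r
# Sketch: simulation summaries for one estimation method.
# est, se: replications x parameters matrices; truth: true parameter vector.
summarise_sim <- function(est, se, truth, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  truth_mat <- matrix(truth, nrow = nrow(est), ncol = ncol(est), byrow = TRUE)
  lower <- est - z * se
  upper <- est + z * se
  data.frame(
    underestimation = 100 * colMeans(est < truth_mat),
    bias            = colMeans(est - truth_mat),
    coverage        = 100 * colMeans(lower <= truth_mat & truth_mat <= upper)
  )
}

# Tiny made-up input: 3 replications, 2 parameters
est   <- matrix(c(0.9, 1.1, 1.2,
                  2.0, 1.8, 2.5), ncol = 2)
se    <- matrix(0.1, nrow = 3, ncol = 2)
truth <- c(1, 2)

res <- summarise_sim(est, se, truth)
print(res)
```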

[Figure: three panels comparing ML, meanBR and medianBR for βint, βNV, βPI and βEH: percentage of underestimation, bias, and coverage of 95% Wald confidence intervals.]

Double index beta regression model

$Y_1, \ldots, Y_n$ independent beta random variables with density

$$f_{Y_i}(y_i; \mu_i, \phi_i) = \frac{\Gamma(\phi_i)}{\Gamma(\mu_i\phi_i)\,\Gamma((1 - \mu_i)\phi_i)}\; y_i^{\mu_i\phi_i - 1}\,(1 - y_i)^{(1 - \mu_i)\phi_i - 1},$$

$0 < y_i < 1$, $0 < \mu_i < 1$ and $\phi_i > 0$.

$E_\theta(Y_i) = \mu_i$ and $V_\theta(Y_i) = \mu_i(1 - \mu_i)/(1 + \phi_i)$, so that $\phi_i$ is a precision parameter.

Link functions connect the mean and precision with the linear predictors $g_1(\mu_i) = x_i^\top\beta$ and $g_2(\phi_i) = z_i^\top\gamma$, respectively.

$x_i = (x_{i1}, \ldots, x_{ip})^\top$ and $z_i = (z_{i1}, \ldots, z_{iq})^\top$ are vectors of covariates.

Inference about $\theta = (\beta_1, \ldots, \beta_p, \gamma_1, \ldots, \gamma_q)^\top$.
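The mean-precision parameterization corresponds to a standard beta density with shape parameters $\mu\phi$ and $(1 - \mu)\phi$. The R sketch below checks the stated mean and variance numerically. The helper name `dbeta_mp` and the values of `mu` and `phi` are arbitrary choices for illustration.

```r
# Beta density in the mean (mu) and precision (phi) parameterization,
# written via the usual shape parameters. dbeta_mp is a hypothetical helper.
dbeta_mp <- function(y, mu, phi) {
  dbeta(y, shape1 = mu * phi, shape2 = (1 - mu) * phi)
}

mu  <- 0.3
phi <- 20

# Numerical moments by integration over (0, 1)
total <- integrate(function(y) dbeta_mp(y, mu, phi), 0, 1)$value
m1    <- integrate(function(y) y * dbeta_mp(y, mu, phi), 0, 1)$value
m2    <- integrate(function(y) y^2 * dbeta_mp(y, mu, phi), 0, 1)$value

variance_numeric <- m2 - m1^2
variance_formula <- mu * (1 - mu) / (1 + phi)

print(c(total = total, mean = m1,
        variance_numeric = variance_numeric,
        variance_formula = variance_formula))
```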


Simulation: a constant precision

Consider a logit link on the mean structure,

$$\log\frac{\mu_i}{1 - \mu_i} = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}, \quad i = 1, \ldots, n,$$

where the $x_{i1}$ are $n$ realizations of a standard normal and $x_{i2} = \log u_i$, with $u_i$ generated from a uniform $U(1, 2)$.

Simulate 10000 samples, with $x_{i1}$ and $x_{i2}$ held constant throughout the replications.

Parameter values were fixed as $\beta_0 = 1.5$, $\beta_1 = 0.5$, $\beta_2 = 2$ and $\phi = 200$.
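The data-generating step of this design can be sketched in R as follows. The sample size `n`, the seed and the helper `simulate_sample` are assumptions made for illustration, since the sample size used in the study is not stated here.

```r
# Sketch of the simulation design with constant precision.
set.seed(123)
n <- 40                              # hypothetical sample size

# Covariates are generated once and held fixed across replications
x1 <- rnorm(n)
x2 <- log(runif(n, min = 1, max = 2))

beta <- c(1.5, 0.5, 2)
phi  <- 200
mu   <- plogis(beta[1] + beta[2] * x1 + beta[3] * x2)   # inverse logit link

# One replication: beta responses with mean mu and precision phi
simulate_sample <- function() {
  rbeta(n, shape1 = mu * phi, shape2 = (1 - mu) * phi)
}

y <- simulate_sample()
print(summary(y))
```

Repeated calls to `simulate_sample()` give the replications, each of which would then be fitted by ML, mean BR and median BR.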

[Figure: three panels comparing ML, meanBR and medianBR for βint, β1, β2 and φ: percentage of underestimation, bias, and coverage of 95% Wald confidence intervals.]

Reading skills data

Reading skills data from Smithson and Verkuilen (2006), also available in the R package betareg.

The analysis concerns the reading skills of nondyslexic and dyslexic Australian children.

The children are aged between eight years and five months and twelve years and three months.

The data consist of 44 children, of whom 19 are dyslexic.

The variables accuracy (the score on a reading accuracy test), iq (the score on a nonverbal intelligence quotient test) and dyslexia (binary variable indicating whether the child is dyslexic) were recorded.


R function: mbrbetareg (on GitHub)

The mbrbetareg function produces the same summary output as betareg.

> data("ReadingSkills", package = "betareg")
> rs_f <- accuracy ~ dyslexia * iq | dyslexia * iq
> rs_mbr <- mbrbetareg(rs_f, data = ReadingSkills, type = "medianBR")
> summary(rs_mbr)$coefficients
$mean
              Estimate Std. Error   z value     Pr(>|z|)
(Intercept)  0.9962230  0.1507239  6.609589 3.853879e-11
dyslexia    -0.6146817  0.1507239 -4.078197 4.538635e-05
iq           0.6999005  0.1338485  5.229048 1.703853e-07
dyslexia:iq -0.7776387  0.1338485 -5.809840 6.253246e-09

$precision
              Estimate Std. Error   z value     Pr(>|z|)
(Intercept)  2.7486473  0.2566998 10.707633 9.372729e-27
dyslexia     1.6463254  0.2566998  6.413427 1.422845e-10
iq           1.2841753  0.2573632  4.989739 6.046090e-07
dyslexia:iq -0.7399326  0.2573632 -2.875052 4.039611e-03


Reading skills data

Model: $\log\{\mu_i/(1 - \mu_i)\} = \beta_1 + \beta_2\,\mathrm{dyslexia}_i + \beta_3\,\mathrm{iq}_i + \beta_4 z_i$ and $\log\phi_i = \gamma_1 + \gamma_2\,\mathrm{dyslexia}_i + \gamma_3\,\mathrm{iq}_i + \gamma_4 z_i$, where the variable $z$ represents the interaction between the variables dyslexia and iq.

Reading skills data: estimates (s.e.).

                 β1       β2       β3       β4       γ1       γ2       γ3       γ4
$\hat\beta$      1.019   -0.638    0.690   -0.776    3.040    1.768    1.437   -0.611
                (0.145)  (0.145)  (0.127)  (0.127)  (0.258)  (0.258)  (0.257)  (0.257)
$\tilde\beta$    0.996   -0.615    0.700   -0.778    2.749    1.646    1.284   -0.740
                (0.151)  (0.151)  (0.134)  (0.134)  (0.257)  (0.257)  (0.257)  (0.257)
$\beta^*$        0.985   -0.603    0.707   -0.784    2.721    1.634    1.281   -0.759
                (0.150)  (0.150)  (0.133)  (0.133)  (0.256)  (0.256)  (0.257)  (0.257)

$\tilde\beta$ is obtained from the R function mbrbetareg (on GitHub), while $\hat\beta$ and $\beta^*$ are calculated from the R package betareg (on CRAN).

Reading skills data: simulation

[Figure: three panels comparing ML, meanBR and medianBR for β1, β2, β3, β4, γ1, γ2, γ3 and γ4: percentage of underestimation, bias, and coverage of 95% Wald confidence intervals.]

Closing

The proposed method is effective for median centering of components of the estimator.

Gives a solution to boundary estimates by means of a model-based penalization.

Wald confidence intervals have good coverage, but using the score statistic seems more promising.

Recover some global invariance (affine transformations)?


Main references

Agresti, A. (2015). Foundations of Linear and Generalized Linear Models. Wiley, New York.

Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80, 27–38.

Kenne Pagui, E. C., Salvan, A. and Sartori, N. (2017). Median bias reduction of maximum likelihood estimates. Biometrika, 104, 923–938.

Kosmidis, I., Kenne Pagui, E. C. and Sartori, N. (2018). Mean and median bias reduction in generalized linear models. Statistics and Computing (accepted), http://arxiv.org/abs/1804.04085.

Smithson, M. and Verkuilen, J. (2006). A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychological Methods, 11, 54–71.


Median modification term

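$$\kappa_{1r} = -\frac{1}{2} \sum_{s=1}^{p} \sum_{t=1}^{p} \sum_{u=1}^{p} i^{rs}\, \nu^{tu}\, (\nu_{s,tu} + \nu_{s,t,u}) / i^{rr}, \quad \text{with } \nu^{tu} = i^{tu} - i^{tr} i^{ru} / i^{rr},$$

$$\kappa_{2r} = 1 / i^{rr},$$

$$\kappa_{3r} = \sum_{s=1}^{p} \sum_{t=1}^{p} \sum_{u=1}^{p} i^{rs}\, i^{rt}\, i^{ru}\, \nu_{s,t,u} / (i^{rr})^3.$$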
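These sums translate directly into code once the expected information and the arrays of expected score products are available. The R sketch below computes $M_r = -\kappa_{1r} + \kappa_{3r}/(6\kappa_{2r})$ and checks it on the scalar toy normal model, where it should reduce to $1/(3\sigma^2)$. The function name `median_modification` and the array layout (`nu_s_tu[s, t, u]` and `nu_s_t_u[s, t, u]`) are assumptions made for illustration.

```r
# Sketch: median modification terms M_r from the expected information i_mat
# and the p x p x p arrays nu_{s,tu} and nu_{s,t,u}.
# median_modification is a hypothetical helper written for this example.
median_modification <- function(i_mat, nu_s_tu, nu_s_t_u) {
  p <- nrow(i_mat)
  i_inv <- solve(i_mat)                              # entries i^{rs}
  M <- numeric(p)
  for (r in seq_len(p)) {
    irr <- i_inv[r, r]
    nu_up <- i_inv - tcrossprod(i_inv[, r]) / irr    # entries nu^{tu}
    k1 <- 0
    k3 <- 0
    for (s in seq_len(p)) for (t in seq_len(p)) for (u in seq_len(p)) {
      k1 <- k1 + i_inv[r, s] * nu_up[t, u] *
        (nu_s_tu[s, t, u] + nu_s_t_u[s, t, u])
      k3 <- k3 + i_inv[r, s] * i_inv[r, t] * i_inv[r, u] * nu_s_t_u[s, t, u]
    }
    kappa1 <- -0.5 * k1 / irr
    kappa2 <- 1 / irr
    kappa3 <- k3 / irr^3
    M[r] <- -kappa1 + kappa3 / (6 * kappa2)
  }
  M
}

# Scalar check: normal model with known mean, parameter sigma^2
n <- 10
sigma2 <- 2
i_mat    <- matrix(n / (2 * sigma2^2))
nu_s_tu  <- array(-n / sigma2^3, dim = c(1, 1, 1))   # E(U_theta U_theta,theta)
nu_s_t_u <- array(n / sigma2^3, dim = c(1, 1, 1))    # E(U_theta^3)

M <- median_modification(i_mat, nu_s_tu, nu_s_t_u)
print(c(M = M, expected = 1 / (3 * sigma2)))
```

For $p = 1$ the term $\nu^{tu}$ vanishes, so $\kappa_{1r} = 0$ and $M_r$ reduces to $\nu_{\theta,\theta,\theta}/\{6\,i(\theta)\}$.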
