Euloge Clovis Kenne Pagui, fmi.stat.unipd.it/pdf/Kenne_prin2019.pdf

Adjusted scores for regression models

Euloge Clovis Kenne [email protected]

Department of Statistical Sciences, University of Padua

Joint work with Alessandra Salvan and Nicola Sartori

Euloge Clovis Kenne (Univ. of Padua) Adjusted scores PRIN 2015 meeting, Padova 1 / 27

Outline

1 Background on MLE

2 Mean bias reduction

3 Median bias reduction

4 Applications

5 Conclusions


Basics

Parametric model with $p$-dimensional parameter $\theta$ and log likelihood $\ell(\theta)$ based on a sample of size $n$.

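With $\theta = (\theta_1, \ldots, \theta_p)$, derivatives of $\ell(\theta)$:

. $U_r = \partial\ell(\theta)/\partial\theta_r$, $U_{rs} = \partial^2\ell(\theta)/\partial\theta_r\partial\theta_s$, $r, s = 1, \ldots, p$;

. score $U(\theta)$ with components $U_r$;

. observed information $j(\theta)$ with entries $-U_{rs}$.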

Expected values of derivatives of $\ell(\theta)$:

. expected information $i(\theta) = E_\theta\{j(\theta)\}$ with entries $i_{rs}$;

. $i^{rs}$, the $(r, s)$ entry of $i(\theta)^{-1}$;

. $\nu_{r,st} = E_\theta(U_r U_{st})$ and $\nu_{r,s,t} = E_\theta(U_r U_s U_t)$.


Basics

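$\hat\theta$ denotes the maximum likelihood estimator (MLE).

First-order properties of the MLE:

. $\hat\theta \,\dot\sim\, N_p(\theta, i(\theta)^{-1})$;

. mean unbiasedness: $E_\theta(\hat\theta) = \theta + O(n^{-1})$;

. (marginal) median unbiasedness: $\Pr_\theta(\hat\theta_r < \theta_r) = 1/2 + O(n^{-1/2})$.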
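A small numerical check of the last property, using an illustrative model that is not in the slides: for an exponential sample with rate λ, the MLE is $\hat\lambda = n/t$ with $t = \sum_i y_i$, and $\lambda t$ has a Gamma(n, 1) distribution, so $\Pr_\lambda(\hat\lambda < \lambda) = \Pr(\lambda t > n)$ is available exactly. The sketch below (the function name `prob_under_mle` is ours) shows the departure from 1/2 shrinking like $n^{-1/2}$.

```r
# Exponential model with rate lambda (illustrative choice, not from the slides).
# MLE: lambda_hat = n / t with t = sum(y); lambda * t ~ Gamma(shape = n, rate = 1),
# so Pr(lambda_hat < lambda) = Pr(lambda * t > n) does not depend on lambda.
prob_under_mle <- function(n) {
  pgamma(n, shape = n, rate = 1, lower.tail = FALSE)
}

n_values <- c(10, 40, 160, 640)
p_under <- sapply(n_values, prob_under_mle)
dev <- 0.5 - p_under                 # departure from median unbiasedness
scaled_dev <- sqrt(n_values) * dev   # roughly constant if dev = O(n^{-1/2})
print(data.frame(n = n_values, p_under = p_under, scaled_dev = scaled_dev))
```

Quadrupling $n$ roughly halves the departure, in line with the $O(n^{-1/2})$ rate.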


Mean bias correction

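In regular parametric models, the bias of the MLE has the form

$E_\theta(\hat\theta - \theta) = b(\theta) + O(n^{-2})$,

where $b(\theta) = -i(\theta)^{-1} A^*(\theta) = O(n^{-1})$, with

$A^*_r = \frac{1}{2}\,\mathrm{tr}[i(\theta)^{-1}\{P_r + Q_r\}]$, $r = 1, \ldots, p$,

where $P_r = E_\theta\{U(\theta)U(\theta)^\top U_r\}$ and $Q_r = E_\theta\{-j(\theta)U_r\}$.

The corrected estimator $\hat\theta_{BC} = \hat\theta - b(\hat\theta)$ has bias of order $O(n^{-2})$.

Asymptotically equivalent alternatives, such as the bootstrap and the jackknife, can be obtained without computing $b(\theta)$.

The correction is not equivariant under reparameterization.

It is available only if $\hat\theta$ is finite.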
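To make $A^*$ and $b$ concrete, here is a sketch for an illustrative exponential model (our choice, not from the slides). With $U = n/\lambda - t$, $j(\lambda) = i(\lambda) = n/\lambda^2$ is non-random, so $Q = 0$, while $P = E(U^3) = -2n/\lambda^3$; the formulas above then give $A^* = -1/\lambda$ and $b(\lambda) = \lambda/n$. The Monte Carlo check (seed, $n$ and λ are arbitrary) compares the mean of $\hat\lambda$ with that of $\hat\lambda_{BC} = \hat\lambda - b(\hat\lambda)$; the helper name `bias_b` is hypothetical.

```r
# Exponential model with rate lambda (illustrative choice, not from the slides).
# Returns the first-order bias b(lambda) = -i(lambda)^{-1} A*(lambda).
bias_b <- function(lambda, n) {
  info <- n / lambda^2            # expected information i(lambda)
  P <- -2 * n / lambda^3          # P = E(U^3), minus the third cumulant of t
  Q <- 0                          # Q = E(-j U) = -j E(U) = 0, as j is non-random
  A_star <- 0.5 * (P + Q) / info  # A* = tr[i^{-1}(P + Q)] / 2 with p = 1
  -A_star / info                  # b = -i^{-1} A*, equal to lambda / n here
}

set.seed(123)
n <- 10
lambda <- 2
t_sum <- rgamma(2e5, shape = n, rate = lambda)   # t = sum(y) over 2e5 samples
lambda_hat <- n / t_sum                          # MLE
lambda_bc <- lambda_hat - bias_b(lambda_hat, n)  # bias-corrected estimator
c(mean_mle = mean(lambda_hat), mean_bc = mean(lambda_bc))
```

In this model $\hat\lambda_{BC} = (n-1)/t$, which is in fact exactly unbiased because $E(1/t) = \lambda/(n-1)$.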


Mean bias reduction

Firth (1993) proposes a modification of the score vector of the form

$U^*(\theta) = U(\theta) + A^*(\theta)$.

The modification term $A^*(\theta)$ is of order $O(1)$.

The estimator $\theta^*$, the solution of $U^*(\theta) = 0$, has bias of order $O(n^{-2})$ and does not require finiteness of $\hat\theta$.

Notable example: generalized linear models (GLMs), implemented in the R package brglm2.

$\theta^* \,\dot\sim\, N_p(\theta, i(\theta)^{-1})$.

$\theta^*$ still depends on the parameterization (it is not equivariant).
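The lack of equivariance shows up already in the illustrative exponential model (not from the slides). For the rate λ, $A^*(\lambda) = -1/\lambda$ and $\lambda^* = (n-1)/t$; for the mean $\mu = 1/\lambda$, the MLE $t/n$ is already unbiased, $P + Q = 0$, so $A^*(\mu) = 0$ and $\mu^* = t/n$. The sketch below solves both adjusted score equations with `uniroot` (the helper names are hypothetical) and shows that $\lambda^* \ne 1/\mu^*$.

```r
# Firth-adjusted score equations for an exponential sample (illustrative example).
solve_score <- function(f) uniroot(f, c(1e-6, 1e6), tol = 1e-12)$root

firth_rate <- function(t_sum, n) {
  # U*(lambda) = U(lambda) + A*(lambda), with A*(lambda) = -1 / lambda
  solve_score(function(lambda) n / lambda - t_sum - 1 / lambda)
}
firth_mean <- function(t_sum, n) {
  # U*(mu) = U(mu) + A*(mu), with A*(mu) = 0 in the mean parameterization
  solve_score(function(mu) -n / mu + t_sum / mu^2)
}

set.seed(1)
n <- 10
t_sum <- sum(rexp(n, rate = 2))
lambda_star <- firth_rate(t_sum, n)   # (n - 1) / t
mu_star <- firth_mean(t_sum, n)       # t / n
c(lambda_star = lambda_star, one_over_mu_star = 1 / mu_star)
```

The two answers differ by exactly $1/t$: transforming $\mu^*$ back to the rate scale does not reproduce $\lambda^*$.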


Median bias reduction

Kenne Pagui et al. (2017): a new modified score that

. does not require $\hat\theta$, as in mean bias reduction;

. estimates all components of the parameter simultaneously;

. gives an estimator that is marginally third-order median unbiased for each component;

. maintains (some) equivariance.

Development:

. Scalar parameter;

. Scalar parameter with nuisance parameters;

. Multidimensional parameter.




Median bias reduction: scalar parameter

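For scalar θ, a median modification of the score is obtained using a Cornish-Fisher expansion:

$\tilde U(\theta) = U(\theta) + \tilde A(\theta)$,

where $\tilde A(\theta) = \nu_{\theta,\theta,\theta}/\{6\,i(\theta)\} = O(1)$ is the leading term of $-\mathrm{Me}_\theta\{U(\theta)\}$.

For continuous random variables and under weak regularity conditions on the score, the solution $\tilde\theta$ of $\tilde U(\theta) = 0$ is third-order median unbiased:

$\Pr_\theta(\tilde\theta \le \theta) = \Pr_\theta\{\tilde U(\theta) \le 0\} = 1/2 + O(n^{-3/2})$.

$\tilde\theta$ is equivariant.

$\tilde\theta \,\dot\sim\, N(\theta, i(\theta)^{-1})$.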
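A sketch of the scalar adjustment in the illustrative exponential model (not from the slides): for the rate λ, $\tilde A(\lambda) = \nu_{\lambda,\lambda,\lambda}/\{6 i(\lambda)\} = -1/(3\lambda)$, and for the mean $\mu = 1/\lambda$, $\tilde A(\mu) = 1/(3\mu)$. The helper `median_br_scalar` is hypothetical; it solves $\tilde U(\theta) = 0$ numerically from user-supplied $U$, $i$ and $\nu_{\theta,\theta,\theta} = E_\theta(U^3)$. Both parameterizations give $\tilde\lambda = 1/\tilde\mu = (n - 1/3)/t$, and the exact probability $\Pr(\tilde\lambda \le \lambda)$ is close to 1/2.

```r
# Scalar median bias reduction: solve U(theta) + nu3(theta) / {6 i(theta)} = 0,
# where nu3 = E(U^3) and i is the expected information (hypothetical helper).
median_br_scalar <- function(score, info, nu3, lower = 1e-6, upper = 1e6) {
  u_tilde <- function(theta) score(theta) + nu3(theta) / (6 * info(theta))
  uniroot(u_tilde, c(lower, upper), tol = 1e-12)$root
}

set.seed(2)
n <- 10
t_sum <- sum(rexp(n, rate = 2))

# Rate lambda: U = n / lambda - t, i = n / lambda^2, E(U^3) = -2 n / lambda^3.
lambda_tilde <- median_br_scalar(function(l) n / l - t_sum,
                                 function(l) n / l^2,
                                 function(l) -2 * n / l^3)

# Mean mu = 1 / lambda: U = -n / mu + t / mu^2, i = n / mu^2, E(U^3) = 2 n / mu^3.
mu_tilde <- median_br_scalar(function(m) -n / m + t_sum / m^2,
                             function(m) n / m^2,
                             function(m) 2 * n / m^3)

# Equivariance: both equal (n - 1/3) / t on the rate scale.
c(lambda_tilde = lambda_tilde, one_over_mu_tilde = 1 / mu_tilde)

# Exact Pr(lambda_tilde <= lambda) = Pr(lambda * t >= n - 1/3), lambda * t ~ Gamma(n, 1).
pgamma(n - 1/3, shape = n, lower.tail = FALSE)
```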




Toy example: normal model

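$y_1, \ldots, y_n$ is a random sample from $N(\mu_0, \sigma^2)$, with $\mu_0$ known.

$\hat\sigma^2 = \sigma^{2*} = s(\mu_0)/n$, with $s(\mu_0) = \sum_{i=1}^n (y_i - \mu_0)^2$.

The median modified score is

$\tilde U(\sigma^2) = -\frac{n - 2/3}{2\sigma^2} + \frac{s(\mu_0)}{2(\sigma^2)^2}$,

which leads to $\tilde\sigma^2 = s(\mu_0)/(n - 2/3)$.

$\tilde\sigma^2$ equals the exact median unbiased estimator, $s(\mu_0)/\chi^2_{n;0.5}$, up to an error of order $O(n^{-2})$.

Both $\hat\sigma^2$ and $\tilde\sigma^2$ are equivariant; hence, for instance, for $\omega = \sqrt{\sigma^2}$ we have $\hat\omega = (\hat\sigma^2)^{1/2}$ and $\tilde\omega = (\tilde\sigma^2)^{1/2}$.

On the other hand, $\omega^* = \{s(\mu_0)/(n - 1/2)\}^{1/2}$, which differs from $(\sigma^{2*})^{1/2} = \{s(\mu_0)/n\}^{1/2}$.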
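The toy example can be checked exactly, since $s(\mu_0)/\sigma^2 \sim \chi^2_n$: $\Pr(\hat\sigma^2 \le \sigma^2) = \Pr(\chi^2_n \le n)$ and $\Pr(\tilde\sigma^2 \le \sigma^2) = \Pr(\chi^2_n \le n - 2/3)$. The sketch below (sample sizes are arbitrary) also compares $\tilde\sigma^2$ with the exact median unbiased estimator through the ratio $(n - 2/3)/\chi^2_{n;0.5}$, which does not depend on the data.

```r
n_values <- c(5, 10, 20, 50)

# s(mu0) / sigma^2 ~ chi^2_n, so both probabilities below are exact.
p_mle    <- pchisq(n_values, df = n_values)          # Pr(sigma2_hat <= sigma2)
p_median <- pchisq(n_values - 2/3, df = n_values)    # Pr(sigma2_tilde <= sigma2)

# Ratio of the exact median unbiased estimator s / chi^2_{n;0.5}
# to sigma2_tilde = s / (n - 2/3); it does not depend on s.
ratio <- (n_values - 2/3) / qchisq(0.5, df = n_values)

print(data.frame(n = n_values, p_mle = p_mle, p_median = p_median, ratio = ratio))
```

By equivariance, $\tilde\omega = (\tilde\sigma^2)^{1/2}$ has the same probability of falling below ω, since $\tilde\omega \le \omega$ exactly when $\tilde\sigma^2 \le \sigma^2$.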




Scalar parameter with nuisance parameters

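Let $\theta = (\psi, \lambda)$, with $\psi$ a scalar parameter of interest and $\lambda$ a nuisance parameter.

Substitute $\lambda$ with its constrained MLE $\hat\lambda_\psi$ and get the profile score $U_P(\psi) = U_\psi(\hat\theta_\psi)$, where $\hat\theta_\psi = (\psi, \hat\lambda_\psi)$.

Using a Cornish-Fisher expansion for the median of the standardized profile score we obtain

$\tilde U_P(\psi) = U_P(\psi) - \kappa_{1\psi}(\hat\theta_\psi) + \frac{1}{6}\,\frac{\kappa_{3\psi}(\hat\theta_\psi)}{\kappa_{2\psi}(\hat\theta_\psi)}$,

where $\kappa_{1\psi}(\cdot)$, $\kappa_{2\psi}(\cdot)$ and $\kappa_{3\psi}(\cdot)$ involve $i^{rs}$, $\nu_{r,s,t}$ and $\nu_{r,st}$, and are the leading terms of the first three cumulants of $U_P(\psi)$.

$\tilde U_P(\psi) = 0$ gives $\tilde\psi_P$, which is

. third-order median unbiased,

. invariant with respect to interest-preserving reparameterizations,

. asymptotically $N(\psi, i^{\psi\psi})$.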




Multi-dimensional parameter

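First idea: jointly solve the system of "profile versions" for each parameter, with all quantities evaluated at θ.

Second idea: the profile score coincides with the efficient score

$\bar U_\psi = U_\psi - i_{\psi\lambda}\, i_{\lambda\lambda}^{-1} U_\lambda$

evaluated at $\hat\theta_\psi$; indeed, since $U_\lambda(\hat\theta_\psi) = 0$,

$U_P(\psi) = U_\psi(\hat\theta_\psi) - U_{\psi\lambda}(\hat\theta_\psi)\,U_{\lambda\lambda}(\hat\theta_\psi)^{-1}\,U_\lambda(\hat\theta_\psi) = \bar U_\psi(\hat\theta_\psi)$.

Idea: solve the system of profile versions with efficient scores instead of profile scores (and all quantities evaluated at θ).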
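A numerical check of this identity, in an illustrative gamma model not taken from the slides: ψ is the shape α and λ the rate β, which are not orthogonal ($i_{\alpha\beta} = -n/\beta$). The constrained MLE is $\hat\beta_\alpha = \alpha/\bar y$. The function names are hypothetical. The profile score and the efficient score agree at $\hat\theta_\psi$, but generally not at an arbitrary value of the nuisance parameter.

```r
# Gamma(shape = alpha, rate = beta) sample; alpha plays the role of psi.
set.seed(3)
n <- 50
y <- rgamma(n, shape = 2, rate = 1.5)
sy <- sum(y)
sly <- sum(log(y))

score <- function(alpha, beta) {
  c(alpha = n * log(beta) - n * digamma(alpha) + sly,  # U_alpha
    beta  = n * alpha / beta - sy)                     # U_beta
}
info <- function(alpha, beta) {
  # expected information; the off-diagonal entry -n / beta is non-zero
  matrix(c(n * trigamma(alpha), -n / beta,
           -n / beta, n * alpha / beta^2), nrow = 2)
}
efficient_score_alpha <- function(alpha, beta) {
  U <- score(alpha, beta)
  I <- info(alpha, beta)
  unname(U["alpha"] - I[1, 2] / I[2, 2] * U["beta"])
}
profile_score_alpha <- function(alpha) {
  beta_hat <- alpha * n / sy           # constrained MLE, solves U_beta = 0
  unname(score(alpha, beta_hat)["alpha"])
}

alpha0 <- 1.7
c(profile = profile_score_alpha(alpha0),
  efficient_at_constrained = efficient_score_alpha(alpha0, alpha0 * n / sy),
  efficient_at_beta_1 = efficient_score_alpha(alpha0, 1))
```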






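The vector of efficient scores has elements $\bar U_r = \sum_{s=1}^p (i^{rs}/i^{rr})\, U_s$.

Define $\tilde\theta$ as the solution of

$\tilde U_r = \bar U_r + M_r = 0$ $(r = 1, \ldots, p)$, where $M_r = -\kappa_{1r} + \frac{1}{6}\,\frac{\kappa_{3r}}{\kappa_{2r}}$.

Properties:

. $\tilde\theta_r - \tilde\theta_{rP} = O_p(n^{-3/2})$ $(r = 1, \ldots, p)$, where $\tilde\theta_{rP}$ denotes $\tilde\psi_P$ with $\psi = \theta_r$;

. $\tilde\theta_r$ is third-order median unbiased, i.e. $\Pr_\theta(\tilde\theta_r \le \theta_r) = 1/2 + O(n^{-3/2})$;

. $\tilde\theta$ is equivariant under joint reparameterizations that transform each component of θ separately;

. $\tilde\theta \,\dot\sim\, N_p(\theta, i(\theta)^{-1})$.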





Computational aspects

The estimating equation can be reformulated in terms of an adjustment to the original score:

$\tilde U(\theta) = U(\theta) + \tilde A(\theta)$, $\tilde A(\theta) = A^*(\theta) - i(\theta)F(\theta)$.

$F(\theta)$ has elements

$F_r = [i(\theta)^{-1}]_r^\top \tilde F_r$ $(r = 1, \ldots, p)$,

with $\tilde F_r$ having components

$\tilde F_{r,t} = \mathrm{tr}[h_r\{(1/3)P_t + (1/2)Q_t\}]$ $(t = 1, \ldots, p)$.

For both mean and median bias reduction, a quasi-Fisher scoring-type algorithm has $k$th iteration

$\theta^{(k+1)} = \theta^{(k)} + i^{-1}(\theta^{(k)})\,B(\theta^{(k)}) + i^{-1}(\theta^{(k)})\,U(\theta^{(k)})$,

where $B(\theta)$ is $A^*(\theta)$ or $\tilde A(\theta)$, respectively.
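As a sketch of the iteration for mean bias reduction in one special case, consider logistic regression: with the canonical logit link the adjustment reduces to the known form $A^*_r = \sum_i h_i(1/2 - \pi_i)x_{ir}$, where $h_i$ are the diagonal elements of $W^{1/2}X(X^\top WX)^{-1}X^\top W^{1/2}$ and $W = \mathrm{diag}\{\pi_i(1-\pi_i)\}$. The code below uses the update with $B = A^*$ on a completely separated toy sample, where the ML slope is infinite but the iteration converges to finite values. The function name and data are hypothetical, this is not the brglm2 implementation, and the median adjustment $\tilde A$ is not implemented here.

```r
# Mean bias reduction for logistic regression via the quasi-Fisher scoring update
# beta <- beta + i^{-1} A*(beta) + i^{-1} U(beta) (hypothetical helper).
firth_logit <- function(X, y, tol = 1e-10, maxit = 100) {
  beta <- rep(0, ncol(X))
  for (k in seq_len(maxit)) {
    pr <- plogis(drop(X %*% beta))               # fitted probabilities pi_i
    w <- pr * (1 - pr)                           # working weights
    info_inv <- solve(crossprod(X, w * X))       # i(beta)^{-1} = (X^T W X)^{-1}
    h <- w * rowSums((X %*% info_inv) * X)       # leverages h_i
    score <- drop(crossprod(X, y - pr))          # U(beta)
    a_star <- drop(crossprod(X, h * (0.5 - pr))) # A*(beta)
    step <- drop(info_inv %*% (score + a_star))
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  # adjusted score at the previous iterate (negligible once the step is tiny)
  list(coefficients = beta, iterations = k, adjusted_score = score + a_star)
}

# Completely separated toy data: y = 1 exactly when x > 0, so the ML slope is infinite.
x <- c(-2, -1, -0.5, 0.5, 1, 2)
y <- c(0, 0, 0, 1, 1, 1)
X <- cbind(1, x)
fit <- firth_logit(X, y)
fit$coefficients
fit$iterations
```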




Logistic regression: endometrial cancer grade

Study designed to evaluate the relationship between the histology of

the endometrium of 79 patients and three risk factors: neovasculation

(NV), pulsatility index of arteria uterina (PI) and endometrium height

(EH) (Agresti, 2015, Section 5.7.1).

Maximum likelihood estimation leads to an infinite estimate of $\beta_{NV}$ (quasi-complete separation).

Both mean and median bias reduction provide a solution to the

problem of separation in logistic regression.

Bias reduction in generalized linear models (Kosmidis et al., 2018).


Logistic regression: endometrial cancer grade (cont.)

Endometrial cancer grade: estimates (s.e.)

        intercept        NV               PI                EH
β̂       4.305 (1.637)    +∞ (+∞)          -0.042 (0.044)    -2.903 (0.846)
β̃       3.969 (1.552)    3.869 (2.298)    -0.039 (0.042)    -2.708 (0.803)
β∗      3.775 (1.489)    2.929 (1.551)    -0.035 (0.040)    -2.604 (0.776)

Both $\tilde\beta$ and $\beta^*$ are finite.

$\tilde\beta$ lies between $\hat\beta$ and $\beta^*$.

$e^{\tilde\beta_{NV}}$ is third-order median unbiased for $e^{\beta_{NV}}$, while $e^{\beta^*_{NV}}$ is not a mean bias-reduced estimate of $e^{\beta_{NV}}$.

R package: brglm2 (on CRAN).


R package: brglm2 (on CRAN)

Maximum likelihood estimates
> library("brglm2")
> data("endometrial", package = "brglm2")
> endometrial_ML <- glm(HG ~ NV + PI + EH, family = binomial, data = endometrial)
> endometrial_ML$coefficients
(Intercept)          NV          PI          EH
  4.3045178  18.1855558  -0.0421834  -2.9026056

Mean bias reduced estimates
> endometrial_BR <- update(endometrial_ML, method = "brglmFit", type = "AS_mean")
> endometrial_BR$coefficients
(Intercept)          NV          PI          EH
 3.77455971  2.92927335 -0.03475176 -2.60416392

Median bias reduced estimates
> endometrial_MBR <- update(endometrial_ML, method = "brglmFit", type = "AS_median")
> endometrial_MBR$coefficients
(Intercept)          NV          PI          EH
 3.96935983  3.86920663 -0.03867797 -2.70793447


Logistic regression: simulation

Simulate 10000 samples with covariate values fixed as in the original sample.

True parameter $\beta_0 = (-1.19, 2.00, -0.39, -1.79)$.

About 28% of the ML estimates of $\beta_{NV}$ are infinite, while $\beta^*_{NV}$ and $\tilde\beta_{NV}$ are always finite.

Performance of maximum likelihood (ML), mean bias reduction (meanBR) and median bias reduction (medianBR) in terms of estimated

. percentage of underestimation;

. bias;

. coverage of 95% Wald-type confidence intervals.
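The three performance measures can be sketched in a setting where everything is in closed form, using an illustrative exponential model rather than the logistic regression of the slides: with $A^*(\lambda) = -1/\lambda$ and $\tilde A(\lambda) = -1/(3\lambda)$, the three estimators of the rate are $\hat\lambda = n/t$, $\lambda^* = (n-1)/t$ and $\tilde\lambda = (n - 1/3)/t$. Sample size, true value, seed and number of replications are arbitrary; Wald intervals use $i(\lambda)^{-1/2} = \lambda/\sqrt{n}$ evaluated at each estimate.

```r
set.seed(4)
n <- 10
lambda <- 1
t_sum <- rgamma(1e5, shape = n, rate = lambda)   # t = sum(y) in each replication
est <- list(ML = n / t_sum,
            meanBR = (n - 1) / t_sum,
            medianBR = (n - 1/3) / t_sum)

summ <- sapply(est, function(e) {
  se <- e / sqrt(n)   # Wald standard error from i(lambda) = n / lambda^2
  c(PU = 100 * mean(e < lambda),                                  # % underestimation
    bias = mean(e - lambda),
    coverage = 100 * mean(abs(e - lambda) <= qnorm(0.975) * se))  # 95% Wald CI
})
round(summ, 3)
```

Each column summarizes one estimator: ML underestimates less than half of the time and is biased upwards, meanBR removes the bias, and medianBR brings the percentage of underestimation close to 50%.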



[Figure: percentage of underestimation, bias, and coverages of 95% Wald CIs for βint, βNV, βPI and βEH under ML, meanBR and medianBR.]


Double index beta regression model

$Y_1, \ldots, Y_n$ independent beta random variables with

$f_{Y_i}(y_i; \mu_i, \phi_i) = \frac{\Gamma(\phi_i)}{\Gamma(\mu_i\phi_i)\,\Gamma((1 - \mu_i)\phi_i)}\; y_i^{\mu_i\phi_i - 1}(1 - y_i)^{(1 - \mu_i)\phi_i - 1}$,

$0 < y_i < 1$, $0 < \mu_i < 1$ and $\phi_i > 0$.

$E_\theta(Y_i) = \mu_i$ and $V_\theta(Y_i) = \mu_i(1 - \mu_i)/(1 + \phi_i)$, so that $\phi_i$ is a precision parameter.

Link functions connect the mean and precision with the linear predictors $g_1(\mu_i) = x_i^\top\beta$ and $g_2(\phi_i) = z_i^\top\gamma$, respectively.

$x_i = (x_{i1}, \ldots, x_{ip})^\top$ and $z_i = (z_{i1}, \ldots, z_{iq})^\top$ are vectors of covariates.

Inference is about $\theta = (\beta_1, \ldots, \beta_p, \gamma_1, \ldots, \gamma_q)^\top$.
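A minimal sketch of the double index model, assuming a logit link for the mean and a log link for the precision (the link choices, simulated design and true values are ours). In R's `dbeta` parameterization the density above has `shape1` $= \mu\phi$ and `shape2` $= (1-\mu)\phi$; the code first checks the mean and variance formulas by numerical integration and then maximizes the log likelihood with `optim`. The function name `beta_loglik` is hypothetical.

```r
# Log likelihood of the double index beta regression model.
beta_loglik <- function(theta, y, X, Z) {
  p <- ncol(X)
  mu <- plogis(drop(X %*% theta[seq_len(p)]))   # g1 = logit
  phi <- exp(drop(Z %*% theta[-seq_len(p)]))    # g2 = log
  sum(dbeta(y, shape1 = mu * phi, shape2 = (1 - mu) * phi, log = TRUE))
}

# Check E(Y) = mu and V(Y) = mu (1 - mu) / (1 + phi) for one (mu, phi) pair.
m <- 0.3
ph <- 20
dens <- function(u) dbeta(u, m * ph, (1 - m) * ph)
mean_num <- integrate(function(u) u * dens(u), 0, 1)$value
var_num <- integrate(function(u) (u - m)^2 * dens(u), 0, 1)$value

# Simulated data with one covariate in both linear predictors.
set.seed(5)
n <- 2000
x <- rnorm(n)
X <- cbind(1, x)
Z <- cbind(1, x)
theta_true <- c(0.5, 1, 3, 0.5)   # (beta_1, beta_2, gamma_1, gamma_2)
mu <- plogis(drop(X %*% theta_true[1:2]))
phi <- exp(drop(Z %*% theta_true[3:4]))
y <- rbeta(n, mu * phi, (1 - mu) * phi)

fit <- optim(rep(0, 4), beta_loglik, y = y, X = X, Z = Z, method = "BFGS",
             control = list(fnscale = -1, maxit = 500))
round(fit$par, 3)
```

Setting `fnscale = -1` turns `optim` into a maximizer; the resulting ML estimates are the starting point that the mean and median adjustments then modify.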


Simulation: a constant precision

Consider a logit link on the mean structure:

$\log\frac{\mu_i}{1 - \mu_i} = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}$, $i = 1, \ldots, n$,

where the $x_{i1}$ are $n$ realizations of a standard normal and $x_{i2} = \log u_i$, with $u_i$ generated from a uniform $U(1, 2)$.

Simulate 10000 samples, with $x_{i1}$ and $x_{i2}$ held constant throughout the replications.

Parameter values were fixed as $\beta_0 = 1.5$, $\beta_1 = 0.5$, $\beta_2 = 2$ and $\phi = 200$.
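The design of this simulation can be sketched as follows. The slide does not state the sample size, so `n = 40` below is a hypothetical value, as are the seed and the helper name `simulate_y`; the covariates are drawn once and kept fixed, and only the responses are regenerated in each replication.

```r
set.seed(6)
n <- 40                                  # hypothetical: n is not given on the slide
x1 <- rnorm(n)                           # standard normal realizations
x2 <- log(runif(n, min = 1, max = 2))    # x2 = log(u), u ~ U(1, 2)
beta_true <- c(1.5, 0.5, 2)
phi <- 200
mu <- plogis(beta_true[1] + beta_true[2] * x1 + beta_true[3] * x2)  # logit link

# One replication: only y is redrawn; x1, x2 (hence mu) stay fixed.
simulate_y <- function() rbeta(n, shape1 = mu * phi, shape2 = (1 - mu) * phi)
y <- simulate_y()
summary(y)
```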



[Figure: simulation results for βint, β1, β2 and φ under ML, meanBR and medianBR; one panel shows the bias.]


Reading skills data

Reading skills data from Smithson and Verkuilen (2006), also available

in the R package betareg.

The analysis concerns the reading skills of nondyslexic and dyslexic Australian children.

The children are aged between eight years and five months and twelve years and three months.

The data consist of 44 observations, of which 19 are dyslexic children.

The variables accuracy (the score on a reading accuracy test), iq (the score on a nonverbal intelligence quotient test) and dyslexia (a binary variable indicating whether the child is dyslexic) were recorded.


R function: mbrbetareg (on GitHub)

The mbrbetareg function produces the same summary output as betareg.

> data("ReadingSkills", package = "betareg")

> rs_f <- accuracy ~ dyslexia * iq | dyslexia * iq

> rs_mbr<- mbrbetareg(rs_f, data = ReadingSkills, type = "medianBR")

> summary(rs_mbr)$coefficients

$mean

Estimate Std. Error z value Pr(>|z|)

(Intercept) 0.9962230 0.1507239 6.609589 3.853879e-11

dyslexia -0.6146817 0.1507239 -4.078197 4.538635e-05

iq 0.6999005 0.1338485 5.229048 1.703853e-07

dyslexia:iq -0.7776387 0.1338485 -5.809840 6.253246e-09

$precision

Estimate Std. Error z value Pr(>|z|)

(Intercept) 2.7486473 0.2566998 10.707633 9.372729e-27

dyslexia 1.6463254 0.2566998 6.413427 1.422845e-10

iq 1.2841753 0.2573632 4.989739 6.046090e-07

dyslexia:iq -0.7399326 0.2573632 -2.875052 4.039611e-03


Reading skills data

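Model: $\log\{\mu_i/(1 - \mu_i)\} = \beta_1 + \beta_2\,\mathrm{dyslexia}_i + \beta_3\,\mathrm{iq}_i + \beta_4 z_i$ and $\log\phi_i = \gamma_1 + \gamma_2\,\mathrm{dyslexia}_i + \gamma_3\,\mathrm{iq}_i + \gamma_4 z_i$, where the variable $z$ represents the interaction between the variables dyslexia and iq.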

Reading skills data: estimates (s.e.)

      β1       β2       β3       β4       γ1       γ2       γ3       γ4
β̂     1.019    -0.638   0.690    -0.776   3.040    1.768    1.437    -0.611
      (0.145)  (0.145)  (0.127)  (0.127)  (0.258)  (0.258)  (0.257)  (0.257)
β̃     0.996    -0.615   0.700    -0.778   2.749    1.646    1.284    -0.740
      (0.151)  (0.151)  (0.134)  (0.134)  (0.257)  (0.257)  (0.257)  (0.257)
β∗    0.985    -0.603   0.707    -0.784   2.721    1.634    1.281    -0.759
      (0.150)  (0.150)  (0.133)  (0.133)  (0.256)  (0.256)  (0.257)  (0.257)

$\tilde\beta$ is obtained from the R function mbrbetareg (on GitHub), while $\hat\beta$ and $\beta^*$ are computed with the R package betareg (on CRAN).


Reading skills data: simulation

[Figure: simulation results for β1, ..., β4 and γ1, ..., γ4 under ML, meanBR and medianBR; one panel shows the bias.]


Closing

The proposed method is effective for median centering of components

of the estimator.

It gives a solution to boundary estimates by means of a model-based penalization.

Wald confidence intervals have good coverage, but using the score statistic seems more promising.

Can some global invariance (e.g. under affine transformations) be recovered?


Main references

Agresti, A. (2015). Foundations of Linear and Generalized Linear Models. Wiley, New York.

Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80, 27–38.

Kenne Pagui, E. C., Salvan, A. and Sartori, N. (2017). Median bias reduction of maximum likelihood estimates. Biometrika, 104, 923–938.

Kosmidis, I., Kenne Pagui, E. C. and Sartori, N. (2018). Mean and median bias reduction in generalized linear models. Statistics and Computing (accepted). http://arxiv.org/abs/1804.04085.

Smithson, M. and Verkuilen, J. (2006). A better lemon squeezer? Maximum likelihood regression with beta-distributed dependent variables. Psychological Methods, 11, 54–71.


Median modification term

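$$\kappa_{1r} = -\frac{1}{2}\sum_{s=1}^{p}\sum_{t=1}^{p}\sum_{u=1}^{p} i^{rs}\,\nu^{tu}\,(\nu_{s,tu} + \nu_{s,t,u})/i^{rr}, \qquad \nu^{tu} = i^{tu} - i^{tr}i^{ru}/i^{rr},$$

$$\kappa_{2r} = 1/i^{rr},$$

$$\kappa_{3r} = \sum_{s=1}^{p}\sum_{t=1}^{p}\sum_{u=1}^{p} i^{rs}\,i^{rt}\,i^{ru}\,\nu_{s,t,u}/(i^{rr})^3.$$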
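These formulas can be transcribed directly into code. The sketch below (function names are hypothetical) takes the expected information and the arrays $\nu_{s,t,u}$ and $\nu_{s,tu}$, stored so that `nu3[s, t, u]` $= \nu_{s,t,u}$ and `nu12[s, t, u]` $= \nu_{s,tu}$, and returns the adjustments $M_r = -\kappa_{1r} + \kappa_{3r}/(6\kappa_{2r})$, together with a helper for the efficient scores $\sum_s (i^{rs}/i^{rr})U_s$. As a check, with two independent exponential samples (an illustrative model in which $U_{st}$ is non-random, so $\nu_{s,tu} = 0$) each $M_r$ reduces to the scalar adjustment $-1/(3\lambda_r)$.

```r
# Median modification terms M_r built from kappa_{1r}, kappa_{2r}, kappa_{3r}.
median_adjustment <- function(info, nu3, nu12) {
  p <- nrow(info)
  iinv <- solve(info)                                   # entries i^{rs}
  M <- numeric(p)
  for (r in seq_len(p)) {
    irr <- iinv[r, r]
    nu_up <- iinv - outer(iinv[, r], iinv[r, ]) / irr   # nu^{tu}
    k1 <- 0
    k3 <- 0
    for (s in seq_len(p)) for (t in seq_len(p)) for (u in seq_len(p)) {
      k1 <- k1 + iinv[r, s] * nu_up[t, u] * (nu12[s, t, u] + nu3[s, t, u])
      k3 <- k3 + iinv[r, s] * iinv[r, t] * iinv[r, u] * nu3[s, t, u]
    }
    k1 <- -0.5 * k1 / irr
    k2 <- 1 / irr
    k3 <- k3 / irr^3
    M[r] <- -k1 + k3 / (6 * k2)
  }
  M
}

# Efficient scores: sum_s (i^{rs} / i^{rr}) U_s.
efficient_scores <- function(U, info) {
  iinv <- solve(info)
  drop(iinv %*% U) / diag(iinv)
}

# Check: two independent exponential samples of size n with rates lambda.
n <- 10
lambda <- c(2, 0.5)
info <- diag(n / lambda^2)
nu3 <- array(0, dim = c(2, 2, 2))
nu3[1, 1, 1] <- -2 * n / lambda[1]^3
nu3[2, 2, 2] <- -2 * n / lambda[2]^3
nu12 <- array(0, dim = c(2, 2, 2))   # U_{st} is non-random, so nu_{s,tu} = 0
M <- median_adjustment(info, nu3, nu12)
M
```

With a diagonal information matrix the efficient scores coincide with the ordinary scores and $\kappa_{1r} = 0$, so only the skewness term $\kappa_{3r}/(6\kappa_{2r})$ remains.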

