Chapter 4

Modifications of Bayes Procedures
4.1 INTRODUCTION
This chapter considers different modifications of Bayes procedures and their applications in finite population sampling. Section 4.2 reviews Bayes least squares prediction, or linear Bayes prediction. Section 4.3 addresses restricted Bayes least squares prediction. The problems of constrained Bayes prediction and limited translation Bayes prediction are considered in the next section, and applications of these procedures in finite population sampling are illustrated at various stages. Section 4.5 considers the robustness of a Bayesian predictor derived under a working model with respect to a class of alternative models, as developed by Bolfarine et al (1987). Robust Bayes estimation of a finite population mean under a class of contaminated priors, as advocated by Ghosh and Kim (1993, 1997), is addressed in the last section.
4.2 LINEAR BAYES PREDICTION
Bayesian analysis requires full specification of the prior distribution of the parameters, which may often be large in number. In practice, however, one may not have full knowledge of the prior distribution but may firmly believe that the prior distribution belongs to a class of distributions with specified first and second order moments. A Bayesian procedure applicable in such circumstances, the linear Bayes procedure, was proposed by Hartigan
P. Mukhopadhyay, Topics in Survey Sampling. © Springer-Verlag New York, Inc. 2001
(1969). The procedure requires only the specification of the first two moments, not full knowledge of the prior distribution. The resulting estimator has the property that it minimises the posterior expected squared loss among all estimators that are linear in the data, and thus can be regarded as an approximation to the posterior mean. In certain situations a posterior mean is itself linear in the data (e.g. Ericson (1969 a, b), Jewell (1974), Diaconis and Ylvisaker (1979), Goel and DeGroot (1980)), so that the linear Bayes estimate is an exact Bayes estimate under the squared error loss function.
Hartigan's procedure has a similarity with the ordinary least squares procedure and as such may also be termed the 'Bayesian least squares' method.
The linear Bayes (LB) estimation theory is as follows. Suppose the data Z have likelihood function f(Z | θ), while θ has prior g(θ), θ ∈ Θ. Under the squared error loss function the Bayes estimate of θ is θ̂_B = E(θ | Z). In linear Bayes estimation we do not specify the density functions f(Z | θ) and g(θ) but only their first two moments.
DEFINITION 4.2.1 Let Ψ = (U, W_1, ..., W_q)' be random variables with finite variances and covariances defined over a common probability space. The linear expectation of U given W = (W_1, ..., W_q)' is defined as

E_L(U | W) = a_0 + Σ_{i=1}^{q} a_i W_i    (4.2.1)

where a_0, a_1, ..., a_q are suitable constants determined by minimising

E(U − a_0 − Σ_{i=1}^{q} a_i W_i)²    (4.2.2)

the expectation being taken with respect to the joint distribution of Ψ. The linear variance of U given W is defined as

V_L(U | W) = E(U − a_0 − Σ_{i=1}^{q} a_i W_i)²    (4.2.3)

where the a_j's (j = 0, 1, ..., q) are determined as above.
The idea is that the true regression of U on W_1, ..., W_q, given by E(U | W_1, ..., W_q), may be a complicated function of W. Instead, we consider a linear function

a_0 + Σ_{i=1}^{q} a_i W_i

which gives the best predictor of U in the sense that it minimises (4.2.2). If the true regression is linear, the minimisation of (4.2.2) gives the true regression; otherwise it gives the best-fit linear regression. The quantity (4.2.1) is the linear regression of U on W. The linear expectation E_L(U | W) may, therefore, be considered as an approximation to E(U | W). If Ψ has a (q + 1)-variate normal distribution, then E_L(U | W) = E(U | W) and hence V_L(U | W) = V(U | W).
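As a small numerical sketch (the joint distribution and all constants below are invented for illustration), the coefficients of the linear expectation can be computed from first and second moments alone, and they coincide with an ordinary least squares fit of U on W:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a joint distribution for Psi = (U, W1, W2): U depends
# nonlinearly on W, so the true regression E(U | W) is not linear.
n = 100_000
W = rng.normal(size=(n, 2))
U = 1.0 + 2.0 * W[:, 0] - 1.5 * W[:, 1] + 0.5 * W[:, 0] ** 2 + rng.normal(size=n)

# Linear expectation E_L(U | W) = a0 + a'W from first two moments only:
#   a  = Cov(W)^{-1} Cov(W, U),   a0 = E(U) - a'E(W)
S_WW = np.cov(W, rowvar=False)
S_WU = np.cov(np.column_stack([W, U]), rowvar=False)[:2, 2]
a = np.linalg.solve(S_WW, S_WU)
a0 = U.mean() - a @ W.mean(axis=0)

# The same coefficients minimise the empirical version of (4.2.2),
# i.e. ordinary least squares on the simulated draws recovers them.
X = np.column_stack([np.ones(n), W])
coef_ls, *_ = np.linalg.lstsq(X, U, rcond=None)

print(np.round([a0, *a], 3))   # close to [1.5, 2.0, -1.5]
print(np.round(coef_ls, 3))
```

Only E(U), E(W), Cov(W), and Cov(W, U) enter the formula, which is the point of the linear Bayes approach: no further distributional knowledge is needed.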
DEFINITION 4.2.2 The linear Bayes estimate of θ given the data Z is

θ̂_LB = E_L(θ | Z)    (4.2.4)
Suppose the linear expectations and linear variances of the distributions g(θ) and f(Z | θ) are given respectively by E_L(θ) (= E(θ)), V_L(θ) (= V(θ)), E_L(Z | θ), V_L(Z | θ). If (Z, θ) are jointly normal, then E(θ), V(θ), E(Z | θ), V(Z | θ) coincide with the corresponding linear expectations and linear variances.

In LB inference we assume that the relationships which hold among E(θ), V(θ), E(Z | θ), V(Z | θ), E(θ | Z), V(θ | Z) when (Z, θ) are jointly normally distributed also extend to the corresponding linear expectations and variances E_L(θ), V_L(θ), E_L(Z | θ), etc. The relations are: if

E_L(Z | θ) = cθ + d    (4.2.5)

then

V_L^{-1}(θ | Z) = c² V_L^{-1}(Z | θ) + V_L^{-1}(θ)    (4.2.6.1)

V_L^{-1}(θ | Z) E_L(θ | Z) = c V_L^{-1}(Z | θ)(Z − d) + V_L^{-1}(θ) E_L(θ)    (4.2.6.2)

The LB estimate of θ, E_L(θ | Z), and its linear variance V_L(θ | Z) are calculated from the relations (4.2.6.1), (4.2.6.2).
More generally, suppose we have a prior distribution for Y given X = (X_1, ..., X_n)' with linear expectation E_L(Y | X) and linear variance V_L(Y | X). A new datum X_{n+1} is obtained, and the likelihood of X_{n+1} given (Y, X) has linear expectation

E_L(X_{n+1} | Y, X) = cY + Σ_{i=1}^{n} a_i X_i + a = cY + d (say)    (4.2.7)

where d = Σ_{i=1}^{n} a_i X_i + a, and linear variance V_L(X_{n+1} | Y, X). Then the following relations hold:

V_L^{-1}(Y | X, X_{n+1}) = c² V_L^{-1}(X_{n+1} | Y, X) + V_L^{-1}(Y | X)    (4.2.8.1)

V_L^{-1}(Y | X, X_{n+1}) E_L(Y | X, X_{n+1}) = c V_L^{-1}(X_{n+1} | Y, X)(X_{n+1} − d) + V_L^{-1}(Y | X) E_L(Y | X)    (4.2.8.2)
LB estimation does not assume any loss function. Under squared error loss the procedure gives the classical Bayes estimates under normality assumptions, and approximately the classical Bayes estimates when normality is not assumed.
EXAMPLE 4.2.1

Let X_1, ..., X_n be independently and identically distributed random variables with mean μ and variance σ². Let μ have a distribution with mean μ_0 and variance σ_0². Here E_L(μ) = E(μ) = μ_0 and V_L(μ) = V(μ) = σ_0². Also E_L(X_i | μ) = E(X_i | μ) = μ, V_L(X_i | μ) = V(X_i | μ) = σ² (i = 1, ..., n). Let X̄ = Σ_{i=1}^{n} X_i/n. Then E_L(X̄ | μ) = μ (c = 1, d = 0) and V_L(X̄ | μ) = σ²/n. Here

V_L^{-1}(μ | X̄) = n/σ² + 1/σ_0²    (i)

and

V_L^{-1}(μ | X̄) E_L(μ | X̄) = n X̄/σ² + μ_0/σ_0²    (ii)

Therefore, the LB estimate of μ is

E_L(μ | X̄) = (n X̄ σ_0² + μ_0 σ²)/(n σ_0² + σ²)    (iii)

which coincides with the ordinary Bayes estimate of μ. If σ_0 → ∞, E_L(μ | X̄) = X̄ and V_L(μ | X̄) = σ²/n.

If a further observation X_{n+1} is obtained, the prior of μ has linear expectation E_L(μ | X) given in (iii) (where X = (X_1, ..., X_n)') and linear variance V_L(μ | X) given in (i); before observing X_{n+1} these were the linear posterior expectation and linear posterior variance, respectively. Also E_L(X_{n+1} | X, μ) = μ (so that c = 1, d = 0) and V_L(X_{n+1} | X, μ) = σ². Hence, by (4.2.8.1) and (4.2.8.2),

V_L^{-1}(μ | X, X_{n+1}) = (n + 1)/σ² + 1/σ_0²

V_L^{-1}(μ | X, X_{n+1}) E_L(μ | X, X_{n+1}) = (n + 1) X̄_{n+1}/σ² + μ_0/σ_0²

so that

E_L(μ | X, X_{n+1}) = ((n + 1) X̄_{n+1} σ_0² + μ_0 σ²)/((n + 1) σ_0² + σ²)

where X̄_{n+1} = Σ_{i=1}^{n+1} X_i/(n + 1). Note that normality is not assumed anywhere.
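The two-stage updating in this example is easy to check numerically; the prior moments, sample size, and data summaries below are arbitrary choices for the sketch:

```python
import numpy as np

# Moments only: mu ~ (mu0, s0sq), X_i | mu ~ (mu, ssq), i.i.d.
mu0, s0sq, ssq, n = 10.0, 4.0, 9.0, 12
xbar = 10.8                                   # observed sample mean

# (4.2.6.1)-(4.2.6.2) with c = 1, d = 0, Z = xbar, V_L(Z | mu) = ssq/n
prec = n / ssq + 1.0 / s0sq                   # (i): V_L^{-1}(mu | xbar)
mu_lb = (n * xbar / ssq + mu0 / s0sq) / prec  # (ii) divided by (i)

# Same posterior mean written as in (iii)
assert np.isclose(mu_lb, (n * xbar * s0sq + mu0 * ssq) / (n * s0sq + ssq))

# Updating with one more observation via (4.2.8.1)-(4.2.8.2)
x_new = 9.5
prec_new = 1.0 / ssq + prec
mu_new = (x_new / ssq + prec * mu_lb) / prec_new

# Agrees with re-running the one-shot formula on all n + 1 points
xbar_new = (n * xbar + x_new) / (n + 1)
assert np.isclose(
    mu_new,
    ((n + 1) * xbar_new * s0sq + mu0 * ssq) / ((n + 1) * s0sq + ssq),
)
print(round(mu_lb, 4), round(mu_new, 4))
```

The sequential update via (4.2.8.1)–(4.2.8.2) reproduces the one-shot answer based on all n + 1 observations, as the example asserts.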
Multivariate generalisation

Consider now Y, X_1, ..., X_n, all vector random variables, Y = (Y_1, ..., Y_m)', X_i = (X_{i1}, ..., X_{ip})' (i = 1, ..., n). Define the linear expectation of the prior distribution of Y given X_1, ..., X_n as E_L(Y | X) and the linear variance as V_L(Y | X). Suppose that the new data Z given X = (X_1, ..., X_n) and Y have linear expectation

E_L(Z | X, Y) = CY + D

where

D = a_0 + AX

Then

V_L(Z | X, Y) = E[(Z − CY − D)(Z − CY − D)' | X, Y]

The following relations now hold:

V_L^{-1}(Y | Z, X) = C' V_L^{-1}(Z | Y, X) C + V_L^{-1}(Y | X)    (4.2.9)

V_L^{-1}(Y | Z, X) E_L(Y | Z, X) = C' V_L^{-1}(Z | Y, X)(Z − D) + V_L^{-1}(Y | X) E_L(Y | X)    (4.2.10)
Brunk (1980) has given the following results on LB estimation. Consider θ = (θ_1, ..., θ_m)', a vector of parameters, and Z = (Z_1, ..., Z_n)', a sample of n observations. The LB estimator of θ given Z is E_L(θ | Z)    (4.2.11), and the linear dispersion matrix of θ given Z is V_L(θ | Z). The following results hold:

E_L(θ | Z) = E(θ) + Cov(θ, Z)[Cov(Z)]^{-1}(Z − E(Z))    (4.2.12)

V_L(θ | Z) = Cov(θ) − Cov(θ, Z)[Cov(Z)]^{-1} Cov(Z, θ)    (4.2.13)
4.2.1 LINEAR BAYES ESTIMATION IN FINITE POPULATION SAMPLING
In the finite population set-up the problem is to estimate a linear function

b(y) = Σ_{k=1}^{N} b_k y_k

where b = (b_1, ..., b_N)' is a known vector. Let, as before, y = (y_s', y_r')' and b = (b_s', b_r')', so that b(y) = b_s'y_s + b_r'y_r, where s is a sample and the other symbols have their obvious meanings. Smouse (1984) considered LB estimation of b(y) in sampling from a finite population.

Assume y to be a random vector having pdf ξ on R^N. Using the Bayesian principle for making inference, the posterior distribution of y given the data d = {(i, y_i); i ∈ s} is

ξ_{y|d} = ξ(y) / ∫_{Ω_d} ξ(y) dy    (4.2.14)

where Ω_d = {y : y is consistent with d}. If ξ(y) is completely specified one can find ξ_{y|d} and hence the Bayes estimator b̂_B(y) of b(y).
The LB estimate of b(y) is b'ŷ_LB, where ŷ_LB is the linear expectation of y given the data. Let

E(y) = μ = (μ_s', μ_r')'

D(y) = Σ = [Σ_s  Σ_sr; Σ_rs  Σ_r]    (4.2.15)

Applying Brunk's results (4.2.12), (4.2.13) with Z = y_s and θ = y, we have

Cov(θ, Z) = Cov(y, y_s) = [Σ_s; Σ_rs]

Therefore, from (4.2.12),

ŷ_LB = μ + (Σ_s, Σ_sr)' Σ_s^{-1} (y_s − μ_s)    (4.2.16)

i.e., the LB estimate of y_s is

ŷ_{sLB} = y_s    (4.2.17.1)

and the LB estimate of y_r is

ŷ_{rLB} = μ_r + Σ_rs Σ_s^{-1} (y_s − μ_s)    (4.2.17.2)

From (4.2.13),

V_L(y | y_s) = Σ − [Σ_s, Σ_sr]' Σ_s^{-1} [Σ_s, Σ_sr]

whose only non-null block is the lower-right one, Σ_r − Σ_rs Σ_s^{-1} Σ_sr.    (4.2.18)
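A hedged numerical sketch of (4.2.17.2) and (4.2.18); the population size, moment structure, and sample values below are all made up for the illustration:

```python
import numpy as np

# Small population of N = 5 units, first n = 3 sampled.
# Prior moments of y: mean mu, covariance Sigma (exchangeable-type).
N, n = 5, 3
mu = np.full(N, 20.0)
Sigma = 4.0 * np.eye(N) + 2.0 * np.ones((N, N))   # D(y)

y_s = np.array([22.0, 19.0, 24.0])                # sampled values

S_ss = Sigma[:n, :n]
S_rs = Sigma[n:, :n]

# (4.2.17.2): LB prediction of the unsampled units
y_r_lb = mu[n:] + S_rs @ np.linalg.solve(S_ss, y_s - mu[:n])

# (4.2.18): linear variance of the unsampled block
V_r_lb = Sigma[n:, n:] - S_rs @ np.linalg.solve(S_ss, S_rs.T)

# LB estimate of the population mean b(y) with b = (1/N, ..., 1/N)'
b_y = (y_s.sum() + y_r_lb.sum()) / N
print(np.round(y_r_lb, 3), round(b_y, 3))   # [21. 21.] 21.4
```

Because the covariance structure is exchangeable, both unsampled units are pulled the same amount towards the sample; only the first two moments of the prior enter the prediction.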
When μ and Σ are known, (4.2.17) and (4.2.18) give the LB estimator of b(y) and its variance. When μ is not known precisely, we generally assume a completely specified prior distribution for μ. For the LB approach it is enough to know the mean and covariance matrix of μ. Suppose

E(μ) = λ = (λ_s', λ_r')'

D(μ) = Ω = [Ω_s  Ω_sr; Ω_rs  Ω_r]    (4.2.19)

Then

E(y) = λ

Cov(y) = Σ + Ω

Cov(y, y_s) = [(Σ_s + Ω_s), (Σ_sr + Ω_sr)]'    (4.2.20)

The LB estimator of y and its linear variance can, therefore, be obtained by replacing μ by λ and Σ by Σ + Ω in (4.2.17) and (4.2.18).
Cocchi and Mouchart (1986) considered LB estimation in finite populations with a categorical auxiliary variable. O'Hagan (1987) considered Bayes linear estimation for randomized response models. Mukhopadhyay (1998 b) considered linear Bayes prediction of a finite population total under measurement error models. Godambe (1999) investigated the linear Bayes estimation procedure in the light of estimating functions.
4.3 RESTRICTED LINEAR BAYES PREDICTION
Rodrigues (1989) considered a different kind of Bayesian estimation of finite population parameters. His procedure also does not require full specification of the prior distribution. We consider two relevant definitions.
DEFINITION 4.3.1 A predictor θ̂ = a + t'y_s, where a is a constant and t is an n × 1 real vector, is said to be a restricted Bayes least squares predictor (RBLSP) or restricted linear Bayes predictor (RLBP) of θ if

E(θ̂ − θ) = 0    (4.3.1)

where the expectation is taken with respect to the predictive distribution of y_s. The class of predictors satisfying (4.3.1) is denoted by ℒ.
DEFINITION 4.3.2 A predictor θ̂* = a* + t*'y_s is said to be the best RBLSP of θ if θ̂* ∈ ℒ and

E(θ̂* − θ)² ≤ E(θ̂ − θ)² for every θ̂ ∈ ℒ    (4.3.2)

and for all parameters involved in the predictive distribution of y, with strict inequality holding for at least one parameter point.

In RBLSP we restrict ourselves to the linear unbiased predictors, where unbiasedness is with respect to the predictive distribution of the data y_s. The unbiasedness in (4.3.1) is a generalisation of the model-unbiasedness defined in Section 2.2, where the unbiasedness is with respect to the superpopulation model and no prior distribution is assumed for the parameters involved in the model. The concept can be extended to quadratic unbiased Bayesian estimation, which may be useful in estimating quadratic functions of the finite population values, such as the population variance or the design-variance of a predictor.

The RBLSP method was used by La Motte (1978) for estimating superpopulation parameters.
We now recall some results on least squares theory when the parameters are random variables (vide Rao, 1973, p. 234). Consider the linear model

E(y | β) = Xβ, D(y | β) = V    (4.3.3)

where β itself is a random vector with

E(β | ν) = ν, D(β | ν) = R    (4.3.4)

Our problem is to find the best linear unbiased estimator (BLUE) of P'β, where P is a vector of constants. Here,

E(y) = Xν, D(y) = V + XRX'

C(y, P'β) = E[C(y, P'β | β)] + C[E(y | β), P'β] = XRP    (4.3.5)

where C denotes model-covariance. We seek a linear function a + L'y such that

E(P'β − a − L'y) = 0    (4.3.6)

and

V(P'β − a − L'y)    (4.3.7)

is minimum among all functions satisfying (4.3.6).

Case 1. ν known. The optimum choices of L and a are

L* = (V + XRX')^{-1} XRP = V^{-1} X(R^{-1} + X'V^{-1}X)^{-1} P    (4.3.8)

a* = ν'P − ν'X'L*    (4.3.9)

and the prediction variance is

V(P'β − a* − L*'y) = P'RP − P'RX'L* = P'(R^{-1} + X'V^{-1}X)^{-1} P    (4.3.10)

Case 2. ν unknown. The optimum choices of L and a are

L* = V^{-1} X(X'V^{-1}X)^{-1} P    (4.3.11)

a* = 0    (4.3.12)

provided there exists an L such that X'L = P, and the prediction variance is

V(P'β − L*'y) = P'(X'V^{-1}X)^{-1} P    (4.3.13)
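The algebraic identities in (4.3.8)–(4.3.10) can be verified numerically; the design matrix and moment matrices below are randomly generated stand-ins, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

# Random-coefficient model: E(y|beta) = X beta, D(y|beta) = V, beta ~ (nu, R)
nobs, p = 6, 2
X = rng.normal(size=(nobs, p))
V = np.diag(rng.uniform(1.0, 2.0, nobs))
R = np.array([[2.0, 0.3], [0.3, 1.0]])
P = np.array([1.0, -1.0])
nu = rng.normal(size=p)

Vi = np.linalg.inv(V)
M = np.linalg.inv(np.linalg.inv(R) + X.T @ Vi @ X)   # (R^{-1}+X'V^{-1}X)^{-1}

# The two expressions for L* in (4.3.8) agree:
L1 = np.linalg.solve(V + X @ R @ X.T, X @ R @ P)
L2 = Vi @ X @ M @ P
assert np.allclose(L1, L2)

# With a* from (4.3.9), the predictor is unbiased: E(P'beta - a* - L*'y) = 0
a_star = nu @ P - nu @ (X.T @ L1)
assert np.isclose(P @ nu - a_star - L1 @ (X @ nu), 0.0)

# Prediction variance (4.3.10): P'RP - P'RX'L* = P'(R^{-1}+X'V^{-1}X)^{-1}P
v1 = P @ R @ P - P @ R @ X.T @ L1
v2 = P @ M @ P
assert np.isclose(v1, v2)
print(round(v2, 6))
```

The second form of (4.3.8) is usually the cheaper one when R is low-dimensional, since it inverts a p × p rather than an N × N matrix.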
Rodrigues (1989) obtained the RBLSP of a population function θ = q'y for a known vector q = (q_1, ..., q_N)' of constants under the random regression coefficients model (4.3.3), (4.3.4). This model is called M(V, R). We denote

Ω = [Ω_s  Ω_sr; Ω_rs  Ω_r] = [X_s R X_s'  X_s R X_r'; X_r R X_s'  X_r R X_r']

V* = [V_s*  V_sr*; V_rs*  V_r*] = [V_s + Ω_s  V_sr + Ω_sr; V_rs + Ω_rs  V_r + Ω_r] = V + Ω    (4.3.14)

K = [V_s  V_sr], K* = [V_s*  V_sr*]

The following lemmas can be proved by standard computations.

LEMMA 4.3.1 Under the model M(V, R),

θ̂ = a + t'y_s ∈ ℒ iff E[a + f'y_s − u'β] = 0    (4.3.15.1)

where

f' = t' − q'K'V_s^{-1}

and

u' = q'(X − K'V_s^{-1}X_s)

LEMMA 4.3.2 Under the model M(V, R), for all a and t',

E(θ̂ − θ)² = V(a + f'y_s − u'β) + q'Vq − q'K'V_s^{-1}Kq + [a + (t'X_s − q'X)ν]²    (4.3.15.2)

Lemma 4.3.2 corresponds to a result in Tam (1986) under the frequentist approach. The above lemmas reduce the problem of predicting θ = q'y to that of predicting u'β. It is clear from Lemmas 4.3.1 and 4.3.2 that the problem of finding θ̂* is equivalent to the determination of a + f'y_s such that

E(a + f'y_s − u'β) = 0

and

V(a + f'y_s − u'β)

is minimum. The problem can be solved in a straightforward manner by using Rao's results stated in (4.3.6)-(4.3.13).
Case (i). ν known. Here

f* = (V_s + X_s R X_s')^{-1} X_s R u = V_s^{-1} X_s (R^{-1} + X_s'V_s^{-1}X_s)^{-1} u

a* = ν'u − ν'X_s'f*

V(a* + f*'y_s − u'β) = u'(R^{-1} + X_s'V_s^{-1}X_s)^{-1} u

This gives

θ̂* = a* + t*'y_s = q_s'y_s + q_r'(X_r β̂ + V_rs V_s^{-1}(y_s − X_s β̂))    (4.3.16.1)

where

β̂ = C β̂_ν + (I − C)ν

β̂_ν = (X_s'V_s^{-1}X_s)^{-1} X_s'V_s^{-1} y_s

C = (R^{-1} + X_s'V_s^{-1}X_s)^{-1}(X_s'V_s^{-1}X_s)    (4.3.17.1)

and

E(θ̂* − θ)² = u'(R^{-1} + X_s'V_s^{-1}X_s)^{-1} u + q'Vq − q'K'V_s^{-1}Kq    (4.3.16.2)

Case (ii). ν unknown and fixed. Here

f* = V_s^{-1} X_s (X_s'V_s^{-1}X_s)^{-1} u, a* = 0

This gives

θ̂* = q_s'y_s + q_r'(X_r β̂_ν + V_rs V_s^{-1}(y_s − X_s β̂_ν))    (4.3.18.1)

and

E(θ̂* − θ)² = u'(X_s'V_s^{-1}X_s)^{-1} u + q'Vq − q'K'V_s^{-1}Kq    (4.3.18.2)

The predictor θ̂* given in (4.3.18.1) was obtained by Royall and Pfeffermann (1982) by using the multivariate normal distribution for y given β and a diffuse prior for β. The results (4.3.16.1), (4.3.16.2) were also obtained by Bolfarine et al (1987) and Malec and Sedransk (1985) by using normality assumptions.
EXAMPLE 4.3.2

Consider the model M(V, R) with X = (1, ..., 1)', V = σ²I, R = σ_0², q = (1/N, ..., 1/N)'. Here the best RBLSP of the population mean ȳ is, using formula (4.3.16.1),

θ̂* = A ȳ_s + (1 − A)ν, A = (σ²/N + σ_0²)/(σ²/n + σ_0²), ȳ_s = Σ_{i∈s} y_i/n

and

E(θ̂* − ȳ)² = (1 − f)(σ²/n) (Nσ_0² + σ²)/(N(σ²/n + σ_0²)), f = n/N

Goldstein (1975), with the purpose of estimating the superpopulation parameter β, considered the Bayes linear predictor of β (= E(ȳ)), which is effectively also a predictor of ȳ, with

E(ê_c − β)² = (n/σ² + 1/σ_0²)^{-1}
For further details the reader may refer to Rodrigues (1989).
4.4 CONSTRAINED BAYES PREDICTION
Suppose we have m parameters θ_1, ..., θ_m with corresponding estimates θ̂_1, ..., θ̂_m. Sometimes it is desirable to produce an ensemble of parameter estimates whose histogram resembles the histogram of the population parameters in some sense. This occurs, for example, in subgroup analysis when the
problem is not only to estimate the different components of a vector, but also to identify the parameters whose values are above or below a certain cut-off point (see Ghosh and Maiti (1999) for further examples in this area). Louis (1984) wanted to modify the Bayes estimates so as to satisfy this property. He attempted to match the first two moments of the histogram of the Bayes estimates with the corresponding moments of the histogram of the parameters in a normal theory set-up. Specifically, let θ_i ~iid N(μ, τ²), i = 1, ..., m, and X_i | θ_i ~ind N(θ_i, 1). Then the Bayes estimate of θ_i under the summed squared error loss [SSEL = Σ_{i=1}^{m} (θ̂_i − θ_i)²] function is

θ̂_i^B = μ + D(x_i − μ), i = 1, ..., m    (4.4.1)

where

D = τ²/(1 + τ²)    (4.4.2)

Letting θ̄^B = Σ_{i=1}^{m} θ̂_i^B/m and θ̄ = Σ_{i=1}^{m} θ_i/m, Louis proved that

(i) E(θ̄ | x) = θ̄^B    (4.4.3)

but

(ii) E[Σ_{i=1}^{m} (θ_i − θ̄)² | x] ≥ Σ_{i=1}^{m} (θ̂_i^B − θ̄^B)²    (4.4.4)

where x = (x_1, ..., x_m)' denotes the sample observations. Thus, for any given x, the means of the two histograms, of the estimates and of the posterior expected values of the θ_i's, coincide, while the variance of the histogram of estimates is only a fraction of the posterior expected value of the variance of the histogram of parameters. Louis pointed out that this phenomenon is due to overshrinking of the observed estimates towards the prior means (Exercise 4.4.1).
EXERCISE 4.4.1 Suppose X_i | θ_i are independent N(θ_i, 18) and the θ_i are iid N(0, 9). Then the Bayes estimate of θ is θ̂^B(x) = (x_1/3, ..., x_m/3)'. Also,

E[(m − 1)^{-1} Σ_{i=1}^{m} (θ̂_i^B(x) − θ̄^B(x))²] = 3

E{(m − 1)^{-1} E[Σ_{i=1}^{m} (θ_i − θ̄)² | x]} = 9

so that the Bayes estimates underestimate the expected value of the variance of the parameters by a factor of 2/3.
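A quick Monte Carlo check of these two numbers (the simulation sizes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Exercise set-up: theta_i ~ N(0, 9), X_i | theta_i ~ N(theta_i, 18)
m, reps = 50, 4000
theta = rng.normal(0.0, 3.0, size=(reps, m))
x = theta + rng.normal(0.0, np.sqrt(18.0), size=(reps, m))

theta_B = x / 3.0   # Bayes estimate: shrinkage factor 9/(9+18) = 1/3

var_est = theta_B.var(axis=1, ddof=1).mean()   # histogram of estimates
var_par = theta.var(axis=1, ddof=1).mean()     # histogram of parameters

print(round(var_est, 2), round(var_par, 2))    # near 3 and 9
```

The ensemble of Bayes estimates is visibly too tight: its variance settles near 3 while the parameter ensemble has variance 9, which is exactly the overshrinking the exercise quantifies.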
Ghosh (1992) proved that (4.4.4) holds in a more general set-up. Suppose that θ_1, ..., θ_m are the m parameters of interest and e_1^B(x), ..., e_m^B(x) are the corresponding Bayes estimates for any likelihood function of x and any prior of θ = (θ_1, ..., θ_m)' under any quadratic loss function. Assume that

(A) not all of θ_1 − θ̄, ..., θ_m − θ̄ have degenerate posterior distributions.

Assumption (A) is much weaker than the assumption that V(θ | x) is positive definite.

THEOREM 4.4.1 Under assumption (A),

E[Σ_{i=1}^{m} (θ_i − θ̄)² | x] > Σ_{i=1}^{m} (e_i^B − ē^B)²    (4.4.5)

where

ē^B = ē^B(x) = Σ_{i=1}^{m} e_i^B(x)/m    (4.4.6)

Proof.

E[Σ_{i=1}^{m} (θ_i − θ̄)² | x] = E[θ'(I_m − m^{-1}J_m)θ | x]

= Σ_{i=1}^{m} (E(θ_i | x) − E(θ̄ | x))² + tr[V(θ − θ̄1_m | x)]    (4.4.7)

> Σ_{i=1}^{m} (e_i^B(x) − ē^B(x))²

where J_m = 1_m 1_m', since V(θ − θ̄1_m | x) is positive semi-definite and, under (A), its trace is strictly positive.
Taking as a desirable criterion the invariance of the first two moments of the histograms of the parameters and of their estimates, we have the following definition.

DEFINITION 4.4.1 A set of estimators e^CB(x) = (e_1^CB(x), ..., e_m^CB(x))' of θ is said to be a set of constrained Bayes (CB) estimators of θ if e^CB(x) minimises

E[Σ_{i=1}^{m} (θ_i − t_i)² | x]    (4.4.8)

within the class of estimates t(x) = (t_1(x), ..., t_m(x))' of θ which satisfy

(a) E(θ̄ | x) = m^{-1} Σ_{i=1}^{m} t_i(x) = t̄(x)    (4.4.9)

(b) E[Σ_{i=1}^{m} (θ_i − θ̄)² | x] = Σ_{i=1}^{m} (t_i(x) − t̄(x))²    (4.4.10)

The Bayes estimate e^B(x) = (e_1^B(x), ..., e_m^B(x))' satisfies (4.4.9) but not (4.4.10).
Let us write

H_1(x) = tr[V(θ − θ̄1_m | x)]    (4.4.11)

H_2(x) = Σ_{i=1}^{m} (e_i^B(x) − ē^B(x))²    (4.4.12)

THEOREM 4.4.2 Let X_0 = {x : H_2(x) > 0} and let e^B(x) = (e_1^B(x), ..., e_m^B(x))' denote the Bayes estimate of θ under any quadratic loss function. Then, for any x ∈ X_0,

e_i^CB(x) = a e_i^B(x) + (1 − a) ē^B(x), i = 1, ..., m    (4.4.13)

where

a = a(x) = [1 + H_1(x)/H_2(x)]^{1/2}    (4.4.14)

Proof. We have

E[Σ_{i=1}^{m} (θ_i − t_i)² | x] = E[Σ_{i=1}^{m} (θ_i − e_i^B(x))² | x] + Σ_{i=1}^{m} (e_i^B(x) − t_i)²    (4.4.15)
Now, using (4.4.9),

Σ_{i=1}^{m} (e_i^B(x) − t_i)² = Σ_{i=1}^{m} (t_i − t̄)² − 2 Σ_{i=1}^{m} (e_i^B(x) − ē^B(x))(t_i − t̄) + Σ_{i=1}^{m} (e_i^B(x) − ē^B(x))²    (4.4.16)

Consider the pair of random variables (Z_1, Z_2) with

P[Z_1 = e_i^B(x), Z_2 = t_i] = 1/m, i = 1, ..., m.

Now V(Z_1) = H_2/m is a fixed quantity. Also, V(Z_2) = Σ_{i=1}^{m} (t_i − t̄)²/m = (H_1 + H_2)/m is a fixed quantity (because of the requirement (4.4.10)). Hence the minimum of (4.4.16) is attained when the correlation ρ(Z_1, Z_2) = 1, i.e., when Z_2 = aZ_1 + b with probability one for some constants a (> 0) and b. Thus

t_i = t_i(x) = a e_i^B(x) + b, i = 1, ..., m

Now, (4.4.9) requires b = (1 − a) ē^B(x). This gives

t_i(x) = a e_i^B(x) + (1 − a) ē^B(x)    (4.4.17)

By virtue of (4.4.6), (4.4.10), and (4.4.17),

H_1(x) + H_2(x) = Σ_{i=1}^{m} (t_i − t̄)² = a² H_2(x)

Hence, for x ∈ X_0, a = a(x) = [1 + H_1(x)/H_2(x)]^{1/2}.
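Formulas (4.4.13)–(4.4.14) translate directly into code. The sketch below uses a normal set-up like that of (4.4.1)–(4.4.2) with invented numbers; `constrained_bayes` is a hypothetical helper name, not notation from the text:

```python
import numpy as np

def constrained_bayes(e_B, H1):
    """Adjust the Bayes estimates e_B by (4.4.13)-(4.4.14): spread them
    out so their ensemble variance matches its posterior expectation.
    H1 = tr V(theta - theta_bar*1 | x); H2 is computed from e_B."""
    e_bar = e_B.mean()
    H2 = np.sum((e_B - e_bar) ** 2)
    a = np.sqrt(1.0 + H1 / H2)          # a(x) = [1 + H1/H2]^{1/2}
    return a * e_B + (1.0 - a) * e_bar

# Normal example: X_i | theta_i ~ N(theta_i, 1), theta_i ~ N(mu, tau2)
mu, tau2 = 0.0, 3.0
x = np.array([-2.0, -0.5, 0.4, 2.6])
D = tau2 / (1.0 + tau2)                 # shrinkage factor (4.4.2)
e_B = mu + D * (x - mu)
m = len(x)

# Posterior covariance of theta is D*I here, so
# H1 = tr[(I - J/m) D I (I - J/m)] = (m - 1) D
H1 = (m - 1) * D

e_CB = constrained_bayes(e_B, H1)
print(np.round(e_CB, 3))
```

By construction the CB ensemble keeps the Bayes mean but has spread H_1 + H_2, exactly as constraint (4.4.10) requires.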
EXAMPLE 4.4.2 Let X_1, ..., X_m be m independent random variables, where X_i has pdf (with respect to some σ-finite measure)

f_{φ_i}(x_i) = exp{n(φ_i x_i − ψ(φ_i))}, i = 1, ..., m    (i)

Each X_i can be viewed as the average of n iid random variables, each having a pdf belonging to a one-parameter exponential family. Assuming that ψ(·) is twice differentiable in its argument, it is desired to estimate θ_i = E_{φ_i}(X_i) = ψ'(φ_i), i = 1, ..., m. For the conjugate prior

g(φ_i) ∝ exp{ν(φ_i μ − ψ(φ_i))}    (ii)

for φ_i, the Bayes estimate of θ_i under the squared error loss function is given by

e_i^B(x) = E(θ_i | x) = (1 − B)x_i + Bμ    (iii)

where B = ν/(n + ν). Also,

V(θ_i | x) = V(ψ'(φ_i) | x_i) = E[ψ''(φ_i) | x_i]/(n + ν) = q_i (say)    (iv)

It follows that

H_1(x) = (1 − 1/m) Σ_{i=1}^{m} q_i

H_2(x) = (1 − B)² Σ_{i=1}^{m} (x_i − x̄)²

from which a can be determined.

In particular, suppose that the pdf of the X_i's belongs to the QVF (quadratic variance function) subfamily of the natural exponential family, so that the variance of a single observation is a quadratic function of its mean:

ψ''(φ_i) = ν_0 + ν_1 ψ'(φ_i) + ν_2 (ψ'(φ_i))²    (v)

where ν_0, ν_1, ν_2 are not all zero and ν_2 < n + ν. It follows from (iv) and (v) that

q_i = [ν_0 + ν_1 e_i^B(x) + ν_2 (e_i^B(x))²]/(n + ν − ν_2)

H_1(x) = (m − 1)(n + ν − ν_2)^{-1}[ν_0 + ν_1 ē^B(x) + ν_2{(ē^B(x))² + H_2(x)/m}]    (vi)

Therefore, for x ∈ X_0,

a²(x) = [1 + ν_2(n + ν − ν_2)^{-1}(1 − 1/m)] + (m − 1)(n + ν − ν_2)^{-1}[ν_0 + ν_1 ē^B(x) + ν_2(ē^B(x))²]/H_2(x)    (vii)

When the X_i's are averages of iid Bernoulli variables, ν_0 = 0, ν_1 = 1, ν_2 = −1. For the Poisson case, ν_0 = ν_2 = 0, ν_1 = 1. For the normal case, ν_1 = ν_2 = 0 and ν_0 is the variance of a single observation.
Note 4.4.1

Unlike the classical Bayes estimators, the CB estimators change if a weighted squared error loss (θ̂ − θ)'W(θ̂ − θ), where W is an m × m matrix of weights, is used instead of the Euclidean distance Σ_{i=1}^{m} (θ̂_i − θ_i)².
The following theorem indicates the Bayes risk dominance of the CB estimator over the sample mean under a conjugate prior. Consider the model M:

X | θ ~ N(θ, σ² I_m) (σ² unknown)

θ ~ N(0, τ² I_m)    (4.4.18)

Here e_i^B(x) = (1 − B)X_i with B = σ²/(σ² + τ²), so that ē^B = (1 − B)X̄, where X̄ = Σ_{i=1}^{m} X_i/m. Hence,

e^CB(x) = (1 − B)[X̄ 1_m + a(X)(X − X̄ 1_m)]    (4.4.19)

THEOREM 4.4.3 Let r(ξ, e) = E{Σ_{i=1}^{m} (e_i − θ_i)²}, where e = (e_1, ..., e_m)', denote the Bayes risk of an estimator e of θ under the model M and SSEL. Then

r(ξ, e^CB) < r(ξ, X) for m ≥ 4.

Ghosh and Maiti (1999) considered the generalisation in which θ_1, ..., θ_m are vectors of parameters.
Note 4.4.2

Efron and Morris (1971, 1972) (Exercise 2) pointed out that Bayes estimators may perform well overall but do poorly (in the frequentist sense) in estimating individual θ_i's with unusually large or small values. To overcome this problem they recommended the use of limited translation (LT) Bayes estimators of θ. They suggested a compromise which consists of restricting the amount by which the Bayes estimator θ̂_i^B differs from the ML estimator X_i to some multiple of the standard error of X_i. In the model X_i ~ N(θ_i, σ²), θ_i ~ N(0, τ²), i = 1, ..., k, the modified estimator is

θ̂_i^{LT} = min{X_i + Kσ, max(X_i − Kσ, θ̂_i^B)}    (4.4.20)

where K is a suitable constant.

The estimator (4.4.20) compromises between limiting the maximum possible risk of any component θ̂_i^{LT} for unusually large or small values of θ_i and preserving the average gain of θ̂^B (= (θ̂_1^B, ..., θ̂_k^B)'). The choice K = 1, for example, ensures that E(θ̂_i^{LT} − θ_i)² < 2σ² for all i while retaining more than 80% of the average gain of θ̂^B over X = (X_1, ..., X_k)'.

The LT Bayes estimator does not seem to have received considerable attention in survey sampling.
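A minimal sketch of the limited translation rule, written as a clamp of the Bayes estimate to within Kσ of X_i; the data values are invented, and the outlier in the last coordinate shows the effect:

```python
import numpy as np

def limited_translation(x, sigma2, tau2, K=1.0):
    """Bayes estimate under X_i ~ N(theta_i, sigma^2), theta_i ~ N(0, tau^2),
    clamped so it never moves more than K*sigma away from X_i."""
    shrink = tau2 / (tau2 + sigma2)
    theta_B = shrink * x
    band = K * np.sqrt(sigma2)
    return np.clip(theta_B, x - band, x + band)

x = np.array([0.5, -1.2, 2.0, 8.0])   # last coordinate is an outlier
est = limited_translation(x, sigma2=1.0, tau2=1.0, K=1.0)
print(est)   # ordinary Bayes would give [0.25, -0.6, 1.0, 4.0]
```

For the moderate observations the LT estimate equals the Bayes estimate, but the outlying X_4 = 8 is shrunk only as far as 7.0 rather than all the way to 4.0, limiting the component-wise risk for extreme θ_i.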
4.4.1 APPLICATIONS IN FINITE POPULATION SAMPLING
EXAMPLE 4.4.3

Suppose there are m strata in the population, the ith stratum of size N_i having values (y_{i1}, ..., y_{iN_i})' of the study variable y on its units (y_{is} = (y_{i1}, ..., y_{in_i})' on the n_i sampled units), i = 1, ..., m. The objective is to predict

γ = (γ_1, ..., γ_m)', where γ_i = Σ_{j=1}^{N_i} y_{ij}/N_i

on the basis of the y_{is} (i = 1, ..., m). Denote y_s = (y_{1s}', ..., y_{ms}')'. Consider the following model:

(a) y_{ij} | θ_i independent N(θ_i, σ²), j = 1, ..., N_i

(b) θ_i ~iid N(μ, τ²)    (4.4.21)

It follows from Ghosh and Meeden (1986) and Ghosh and Lahiri (1987 a) that

E(γ_i | y_s) = (1 − f_i B_i) ȳ_i + f_i B_i μ

V(γ_i | y_s) = f_i σ² [N_i^{-1} + f_i n_i^{-1}(1 − B_i)], i = 1, ..., m

Cov(γ_i, γ_k | y_s) = 0, i ≠ k = 1, ..., m

where

ȳ_i = Σ_{j=1}^{n_i} y_{ij}/n_i, B_i = σ²/(σ² + n_i τ²), f_i = 1 − n_i/N_i

The CB predictors γ̂^CB of γ are found by computing H_1(y_s) and H_2(y_s) and using formula (4.4.13).

Lahiri (1990) obtained constrained empirical Bayes (CEB) predictors of γ by finding estimates of μ, σ², and τ² and then substituting these in γ̂^CB. He called these 'adjusted EB predictors'. Following Hartigan (1969), Ericson (1969), Goldstein (1975), and others, he also replaced the normality assumptions by the weaker assumption of 'posterior linearity', discussed in Section 3.3.
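Putting the pieces of this example together, a sketch of the CB predictors of the stratum means under model (4.4.21); all the stratum sizes, sample means, and variance components below are invented values assumed known:

```python
import numpy as np

# Stratified model (4.4.21): y_ij | theta_i ~ N(theta_i, s2), theta_i ~ N(mu, t2)
mu, s2, t2 = 50.0, 16.0, 25.0
Ni = np.array([40, 60, 30, 80])
ni = np.array([8, 10, 6, 12])
ybar = np.array([48.0, 53.5, 44.0, 55.0])   # stratum sample means

Bi = s2 / (s2 + ni * t2)
fi = 1.0 - ni / Ni

# Posterior moments of the stratum means gamma_i
gamma_hat = (1.0 - fi * Bi) * ybar + fi * Bi * mu          # E(gamma_i | y_s)
v_i = fi * s2 * (1.0 / Ni + fi * (1.0 - Bi) / ni)          # V(gamma_i | y_s)

# H1, H2 and the CB adjustment (4.4.13)-(4.4.14); with independent
# coordinates, tr V(gamma - gammabar*1 | y_s) = (1 - 1/m) * sum(v_i)
m = len(Ni)
H1 = (1.0 - 1.0 / m) * v_i.sum()
gbar = gamma_hat.mean()
H2 = np.sum((gamma_hat - gbar) ** 2)
a = np.sqrt(1.0 + H1 / H2)
gamma_cb = a * gamma_hat + (1.0 - a) * gbar

print(np.round(gamma_hat, 3))
print(np.round(gamma_cb, 3))
```

The CB predictors keep the same average as the Bayes predictors but are spread out by the factor a > 1, which is what makes them better suited to identifying extreme strata.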
EXAMPLE 4.4.4

Ghosh (1992) used CB estimation to estimate the average wages and salaries of workers in a certain industry consisting of 114 units spread over 16 small areas. He also wanted to identify the areas with very low or very high average wages. A simple random sample was taken and the sampled units were post-stratified into these small areas. It turned out that 3 of the 16 areas had no representation in the sample.

The following mixed effects model was considered:

y_{ij} = β_0 + β_1 x_{ij} + v_i + e_{ij} √x_{ij}    (4.4.22)

where y_{ij} (x_{ij}) = average wage (gross business income) of unit j in area i, v_i = a random area effect; the v_i's and e_{ij}'s are independently distributed with v_i ~iid N(0, (λr)^{-1}), e_{ij} ~iid N(0, r^{-1}) (i = 1, ..., 16; j = 1, ..., N_i; Σ N_i = 114), and β_0, β_1 are unknown regression coefficients. The model (4.4.22) gives the conditional likelihood of the data given the parameters (β_0, β_1, λ, r). The following priors were assumed:

(i) (β_0, β_1) uniform over R²

(ii) r ~ Gamma(a_0/2, g_0/2)    (4.4.23)

(iii) λr ~ Gamma(a_1/2, g_1/2)

The author used diffuse Gamma priors on r and λr, with a_0 = g_0 = g_1 = 0 and a_1 = 0.00005 (a_1 = 0 could lead to an improper distribution). Small area models using mixed effects and hierarchical Bayes estimation have been discussed in detail in Mukhopadhyay (1998 e).

The model given in (4.4.22) and (4.4.23) is a special case of the model in Datta and Ghosh (1991). Using these results Ghosh found the hierarchical Bayes (HB) predictors γ̂_i^{HB} = E(γ_i | y_s) and V(γ_i | y_s), H_1(y_s), and H_2(y_s). He adjusted these HB predictors to find constrained HB predictors γ̂_i^{CHB}(y_s) of the small area means γ_i (i = 1, ..., 16). The three estimators, the sample averages ȳ_{is}, γ̂_i^{HB}(y), and γ̂_i^{CHB}(y), along with their associated standard errors, were compared with reference to the average squared deviation of the estimates e_i from the true means, ASD = Σ_{i=1}^{A} (e_i − M_i)²/A, the average bias, AB = Σ_{i=1}^{A} |e_i − M_i|/A, and the average relative bias, ARB = A^{-1} Σ_{i=1}^{A} |e_i − M_i|/M_i, where M_i is the true mean of small area i and A is the number of small areas for which estimates are available (A = 13 for the ȳ_{is} and 16 otherwise). It was found that on average the HB predictors γ̂_i^{HB} resulted in a 77.05% reduction in ASD and a 60% reduction in ARB compared to the sample means. The adjusted Bayes estimators γ̂_i^{CHB} had a slight edge over γ̂_i^{HB}, resulting in a 79.7% reduction in ASD and a 52.9% reduction in ARB compared to the ȳ_{is} (Ghosh and Maiti, 1999). The author observed that the CB estimators identified the areas with very high and very low wages more successfully than the other two estimators. It has been recommended that, for the dual problem of estimation and subgroup identification, one should use CB estimators in preference to the usual Bayes estimators.
4.5 BAYESIAN ROBUSTNESS UNDER A CLASS OF ALTERNATIVE MODELS
In this section we consider the robustness of a Bayesian predictor derived under a working model with respect to a class of alternative models. Consider the model M(V, R, ν):

y = Xβ + ε, ε ~ N_N(0, V)

β ~ N_p(ν, R)    (4.5.1)

The model has been discussed in detail in Theorem 3.3.1. We use here slightly different notations. Let

Δ_s = Diag(δ_k, k = 1, ..., N), δ_k = 1 (0) if k ∈ (∉) s

y_s = Δ_s y, y_r = (I − Δ_s)y

X_s = Δ_s X, X_r = (I − Δ_s)X

V_s = Δ_s V Δ_s, V_r = (I − Δ_s)V(I − Δ_s)

V_sr = V_rs' = Δ_s V(I − Δ_s)

Hence, for a non-informative prior on β we may write

R^{-1} = Φ    (4.5.2)

where Φ is the square null matrix (here of order p). Note that both y_s and y_r are N × 1 vectors. Suppose rank(X_s) = p. Let

A^{-1} = X_s' V_s^{-} X_s

D = (A^{-1} + R^{-1})^{-1} A^{-1}

D_0 = (A^{-1} + R^{-1})^{-1} R^{-1}

β̂* = A X_s' V_s^{-} y_s

where G^{-} denotes a generalised inverse of the matrix G. Note that D_0 + D is the identity matrix of order p.
THEOREM 4.5.1 In the model M(V, R, ν) the posterior distribution of y_r given y_s is multivariate normal with mean

E(y_r | y_s) = X_r β̂ + V_rs V_s^{-}(y_s − X_s β̂)    (4.5.3)

and conditional variance

D(y_r | y_s) = (V_r − V_rs V_s^{-} V_sr) + (X_r − V_rs V_s^{-} X_s) DA (X_r − V_rs V_s^{-} X_s)'    (4.5.4)

where

β̂ = E(β | y_s) = D β̂* + D_0 ν

is the Bayes estimate of β.

Proof. Follows as in Theorem 3.3.1 or from Lindley and Smith (1972).

The Bayesian estimator β̂ may be looked upon as a generalisation of the convex combination of the generalised least squares estimator β̂* and the prior mean ν. If the prior distribution of β is non-informative, R^{-1} = Φ and β̂ = β̂*.
Consider an alternative model M*(V, R*, ν*):

y = X*β* + ε, X* = (X, Z), β* = (β', δ')'

β* ~ N_{p+L}(ν* = (ν', λ')', R*)

R* = [R  Ω_0; Ω_0'  H]

where Z is an N × L matrix of values of L additional variables x_{p+1}, ..., x_{p+L}, δ is an L × 1 vector of corresponding regression coefficients, and λ, Ω_0, H have obvious interpretations. Let E*, D* denote, respectively, expectation and dispersion matrix (variance) with respect to M*. Define

X_s* = Δ_s X*, X_r* = (I − Δ_s)X*, Z_s = Δ_s Z, Z_r = (I − Δ_s)Z    (4.5.5)

Bolfarine et al (1987) considered the conditions under which the posterior mean or the posterior distribution of a linear function q'y remains the same under a class of Bayesian models B, i.e., under a class of combinations of likelihood functions and priors. They called these conditions a set of conditions for robustness. Under these conditions the Bayes predictor under squared error loss remains the same for all the models in B. We have the following definition.

DEFINITION 4.5.1 (Weak robustness; strong robustness, or simply robustness) A set of conditions R is a set of conditions for weak robustness (strong robustness or, simply, robustness) in relation to a linear function q'y for a class B of Bayesians (i.e., Bayesian models) if, under R, the posterior expectation (distribution) of y_r [of q'y] given y_s remains the same for all elements of B.

The following theorem considers a class of Bayesian models B for which the expectation of y_r given y_s equals either E(y_r | y_s) or E*(y_r | y_s). Clearly, the models M and M* are members of B. The authors found conditions under which the posterior expectation of q'y given y_s remains the same under all models in B.
THEOREM 4.5.2 (Bolfarine et al, 1987) For any class B of Bayesians whose posterior means of y_r given y_s are equal either to E(y_r | y_s) or to E*(y_r | y_s), the following set of conditions R forms a 'weak robustness set' in relation to the linear function q'y. The conditions R are:

(i) δ and β are independent

(ii) q'Z_r = q'[(X_r − V_rs V_s^{-} X_s)(A^{-1} + R^{-1})^{-1} X_s' + V_rs] V_s^{-} Z_s    (4.5.6)

For the models M and M*, R is actually a set of conditions for (strong) robustness in relation to the linear function q'y.
EXAMPLE 4.5.1

Suppose V is a positive definite diagonal matrix and define V_s^{-1} = Δ_s V^{-1} Δ_s. Suppose R^{-1} = Φ and condition (i) of R holds. Then condition (4.5.6) reduces to

q'Z_r = q'X_r A X_s' V_s^{-} Z_s

⇔ q'(I − Δ_s)Z = q'(I − Δ_s)X(X'Δ_s V^{-1}Δ_s X)^{-1} X'Δ_s V^{-1}Z    (4.5.7)

Under these conditions, q'y_r | y_s is normally distributed with mean

E*{q'y_r | y_s} = q'(I − Δ_s)X(X'Δ_s V^{-1}X)^{-1} X'Δ_s V^{-1} y_s

and variance

D*{q'y_r | y_s} = q'(I − Δ_s)[V + X(X'Δ_s V^{-1}X)^{-1}X'](I − Δ_s)q

Condition (4.5.7) coincides with the condition of Pereira and Rodrigues (1983) for the unbiasedness of Royall's BLUP T*(X, V) under the model (X*, V), where V is diagonal.
EXAMPLE 4.5.2

Consider the model of Example 3.2.2. Suppose Z = 1_N. Condition (4.5.6) reduces to

x̄_r = x̄_s + σ²/nR    (i)

where x̄_s = Σ_{i∈s} x_i/n and x̄_r = Σ_{i∉s} x_i/(N − n). This condition ensures the robustness of the Bayesian predictor of the population total

T_B = T_1 + ((N − n)/n) x̄_r (x̄_s + σ²/nR)^{-1}(n ȳ_s + σ²ν/R)    (ii)

where T_1 = Σ_{i∈s} y_i, if β and δ are independently and normally distributed.

For a non-informative prior of β, R → ∞ and T_B reduces to the ratio predictor, which remains robust under both models (the model of Example 3.2.2 and the present one) if x̄_s = x̄_r. A Bayesian, therefore, may use the following sampling rules. If one is sure that δ is not included in the model, one may use the optimal sampling design to control the variance, as suggested in Example 3.2.2. However, if there is a doubt that δ may appear in the model, one should take a balanced sample for which x̄_s = x̄ (or a sample for which condition (i) is satisfied if the parameter R is known). In both cases purposive samples are recommended.
EXAMPLE 4.5.3
Suppose X = 1_N, V = σ²I_N, ν is a finite real number, R is a positive real number, Z is the N × L matrix with i-th row (x_i, x_i², ..., x_i^L), and θ_0 = (0, ..., 0)'.

The Bayes predictor T_B of the population total T under M(V, R, ν) then follows from the general theory. Condition (4.5.6) reduces to

(1 + σ²/nR) x̄^{(j)} = x̄_s^{(j)},  j = 1, ..., L,   (4.5.8)

where x̄^{(j)} = N^{-1} Σ_{i=1}^N x_i^j and x̄_s^{(j)} = n^{-1} Σ_{i∈s} x_i^j.

If σ²/R ≈ 0, then condition (4.5.8) is close to the conditions for the balanced sampling designs of Royall and Herson (1973) and T_B ≈ N ȳ_s. If ν = 0, T_B = [1 + ((N - n)/n)(1 + σ²/nR)^{-1}] T_1, where T_1 = Σ_{i∈s} y_i, which is similar to an estimator proposed by Lindley (1962). The factor σ²/nR, which weighs the information about the current population against the prior, may therefore be called a 'shrinkage factor'.
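As a quick numerical sketch of the ν = 0 case, the shrinkage behaviour of T_B can be checked directly; the function name and all numbers below are illustrative, not from the text.

```python
def lindley_type_predictor(ys, N, sigma2_over_R):
    """T_B = [1 + ((N - n)/n) * (1 + sigma^2/(n R))^{-1}] * T_1 for nu = 0,
    where T_1 is the sample total and sigma2_over_R = sigma^2 / R."""
    n = len(ys)
    T1 = sum(ys)
    shrink = 1.0 / (1.0 + sigma2_over_R / n)  # (1 + sigma^2/nR)^{-1}
    return (1.0 + (N - n) / n * shrink) * T1

# A vague prior (sigma^2/R -> 0) expands the sample total to N * ybar_s;
# a very tight prior about nu = 0 shrinks the predictor back toward T_1.
ys = [4.0, 6.0, 5.0]
vague = lindley_type_predictor(ys, N=30, sigma2_over_R=0.0)  # 30 * 5 = 150
tight = lindley_type_predictor(ys, N=30, sigma2_over_R=1e9)  # close to T_1 = 15
```

The two limits make the role of the shrinkage factor σ²/nR concrete: it interpolates between the expansion estimator N ȳ_s and the bare sample total.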
In the next section we consider robust Bayesian estimation of finite population parameters under a class of contaminated priors.
4.6 ROBUST BAYES ESTIMATION UNDER CONTAMINATED PRIORS
Following Berger (1984), Berger and Berliner (1986), and Sivaganesan and Berger (1989), Ghosh and Kim (1993, 1997) considered robust Bayes estimation of a finite population mean γ(y) = N^{-1} Σ_{i=1}^N y_i under a class of contaminated priors. We first review the robust Bayesian viewpoint of Berger and Berliner (1986).
Let X denote an observable random vector having a pdf f(x | θ) indexed by a parameter vector θ ∈ Θ. Consider the following class of priors for θ:

Γ_Q = {π : π = (1 - ε)π_0 + εq, q ∈ Q},  ε ∈ [0, 1],   (4.6.1)

where π_0 is a particular well-specified distribution, q is another prior distribution, and Q is a subset of the class of all prior distributions of θ on Θ. Clearly, the class Γ_Q is broader than the singleton class {π_0} and thus allows for errors in the assessment of the subjective prior π_0. Such priors have been used by Blum and Rosenblatt (1967), Huber (1973), Marazzi (1985), Bickel (1984), Berger (1982, 1984), and Berger and Berliner (1986), among others. The most commonly used method of selecting a robust prior in Γ_Q is to choose the prior π which maximises the (marginal) predictive density
m(x | π) = ∫ f(x | θ) π(dθ)
         = (1 - ε) m(x | π_0) + ε m(x | q)

over Q. This is equivalent to maximising m(x | q) over Q. Assuming that the maximum of m(x | q) is uniquely attained at q = q̂, the estimated prior π̂, called the ML-II (maximum likelihood type II) prior by Good (1963), is

π̂ = (1 - ε)π_0 + ε q̂   (4.6.3)
For an arbitrary prior π = (1 - ε)π_0 + εq ∈ Γ_Q, the posterior distribution of θ is

π(dθ | x) = λ(x) π_0(dθ | x) + (1 - λ(x)) q(dθ | x)   (4.6.4)

where λ(x) ∈ [0, 1] is given by

λ(x) = (1 - ε) m(x | π_0) / m(x | π)   (4.6.5)

Further, the posterior mean δ^π and the posterior variance V^π of θ (when they exist) are given by

δ^π(x) = λ(x) δ^{π_0}(x) + (1 - λ(x)) δ^q(x)   (4.6.6)

V^π(x) = λ(x) V^{π_0}(x) + (1 - λ(x)) V^q(x) + λ(x)(1 - λ(x))(δ^{π_0}(x) - δ^q(x))²   (4.6.7)
If C is a measurable subset of Θ, then the posterior probability of C with respect to π is, analogously, the λ(x)-weighted mixture

P^π[θ ∈ C | x] = λ(x) P^{π_0}[θ ∈ C | x] + (1 - λ(x)) P^q[θ ∈ C | x]
When Q is taken to be the class of all prior distributions of θ on Θ, assuming a unique mle θ̂(x) of θ exists, the ML-II prior of θ in Γ_Q is given by

π̂(dθ) = (1 - ε)π_0(dθ) + ε ν̂_x(dθ)   (4.6.8)

where ν̂_x(·) is the degenerate prior which assigns probability one to θ = θ̂(x). The ML-II posterior of θ is then given from (4.6.4) as

π̂(· | x) = λ̂(x) π_0(· | x) + (1 - λ̂(x)) ν̂_x(·)   (4.6.9)

where

λ̂(x) = (1 - ε) m(x | π_0) / [(1 - ε) m(x | π_0) + ε f(x | θ̂(x))]   (4.6.10)

The ML-II posterior mean of θ is then

δ*(x) = λ̂(x) δ^{π_0}(x) + (1 - λ̂(x)) θ̂(x)   (4.6.11)

and the posterior variance of θ is

V(θ | π̂, x) = V*(x) = λ̂(x)[V^{π_0}(x) + (1 - λ̂(x))(δ^{π_0}(x) - θ̂(x))²]   (4.6.12)
When the data are consistent with π_0, m(x | π_0) will be reasonably large and λ̂(x) will be close to one (for small ε), so that δ* will be essentially equal to δ^{π_0}. When the data and π_0 are incompatible, m(x | π_0) will be small and λ̂(x) near zero; δ* will then be approximately equal to the mle θ̂.

An interesting class of priors Γ_S involves symmetric unimodal contamination. Here

Q = { densities of the form q(| θ - θ_0 |), q non-increasing }

Since any symmetric unimodal distribution is a mixture of symmetric uniform distributions (cf. Berger and Sellke, 1987), it suffices to restrict q to

Q' = { Uniform(θ_0 - a, θ_0 + a) densities, a ≥ 0 }   (4.6.13)

where a is to be chosen optimally. For the class Γ_S, the ML-II prior is

π̂ = (1 - ε)π_0 + ε q̂

where q̂ is uniform on (θ_0 - â, θ_0 + â), â being the value of a which maximises

m(x | a) = (2a)^{-1} ∫_{θ_0 - a}^{θ_0 + a} f(x | θ) dθ  for a > 0,  and  m(x | 0) = f(x | θ_0).
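The optimising â can be found numerically. The sketch below is an assumption-laden illustration: a normal likelihood, a crude Riemann-sum marginal, and a grid search stand in for whatever numerical method one actually prefers.

```python
import math

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def m_given_a(x, theta0, sd, a, ngrid=1000):
    """Marginal m(x | a): average of f(x | theta) over Uniform(theta0 - a,
    theta0 + a), via a midpoint Riemann sum; a = 0 gives f(x | theta0)."""
    if a == 0.0:
        return normal_pdf(x, theta0, sd)
    grid = [theta0 - a + (2 * a) * (i + 0.5) / ngrid for i in range(ngrid)]
    return sum(normal_pdf(x, t, sd) for t in grid) / ngrid

def a_hat(x, theta0, sd, a_max=20.0, steps=200):
    """Grid search for the value of a maximising m(x | a)."""
    best_a, best_m = 0.0, m_given_a(x, theta0, sd, 0.0)
    for i in range(1, steps + 1):
        a = a_max * i / steps
        m = m_given_a(x, theta0, sd, a)
        if m > best_m:
            best_a, best_m = a, m
    return best_a

# When x sits at theta0 no contamination is needed (a_hat = 0); a distant
# x pushes a_hat out far enough for the uniform component to cover it.
near = a_hat(1.0, theta0=1.0, sd=1.0)
far = a_hat(8.0, theta0=0.0, sd=1.0)
```

The behaviour mirrors the text: the data decide how wide the contaminating uniform must be before it improves the marginal likelihood.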
EXAMPLE 4.6.1
Let X = (X_1, ..., X_p)' ~ N_p(θ, σ²I_p), θ unknown, σ² known. Suppose the elicited prior π_0 for θ is N_p(μ, τ²I_p). Since the usual mle of θ is θ̂(x) = x = (x_1, ..., x_p)', (4.6.11) gives

δ*(x) = λ̂(x){μ + (τ²/(σ² + τ²))(x - μ)} + (1 - λ̂(x)) x

where

λ̂(x) = [1 + (ε/(1 - ε))(1 + τ²/σ²)^{p/2} exp{| x - μ |² / 2(τ² + σ²)}]^{-1}

and | z | denotes √(Σ z_i²). Note that λ̂(x) → 0 exponentially fast in | x - μ |², so that δ*(x) → x quite rapidly as | x - μ |² gets large.
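For p = 1 the weight λ̂(x) and the ML-II posterior mean δ*(x) are easy to compute directly; the parameter values below are illustrative only.

```python
import math

def lam_hat(x, mu, eps, sigma2, tau2):
    """lambda_hat(x) of Example 4.6.1 with p = 1: [1 + (eps/(1-eps)) *
    (1 + tau2/sigma2)^{1/2} * exp{(x - mu)^2 / 2(sigma2 + tau2)}]^{-1}."""
    ratio = (eps / (1 - eps)) * (1 + tau2 / sigma2) ** 0.5
    return 1.0 / (1.0 + ratio * math.exp((x - mu) ** 2 / (2 * (sigma2 + tau2))))

def delta_star(x, mu, eps, sigma2, tau2):
    """ML-II posterior mean: mixes the conjugate Bayes estimate with the mle x."""
    lam = lam_hat(x, mu, eps, sigma2, tau2)
    bayes = mu + tau2 / (sigma2 + tau2) * (x - mu)
    return lam * bayes + (1 - lam) * x

# Data compatible with pi_0 keep lambda_hat near 1 (conjugate answer);
# an outlying x drives lambda_hat to 0, so delta_star collapses to the mle.
close = delta_star(0.1, mu=0.0, eps=0.1, sigma2=1.0, tau2=1.0)
outlier = delta_star(10.0, mu=0.0, eps=0.1, sigma2=1.0, tau2=1.0)
```

This makes the "exponentially fast" remark tangible: at x = 10 the weight on π_0 is already below 10⁻⁶.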
Another class of priors involves unimodality-preserving contaminations. Denoting by θ_0 the mode of π_0, assumed to be unique, the class of priors is

Γ_U = {π : π = (1 - ε)π_0 + εq, q ∈ Q_U}   (4.6.14)

where Q_U is the set of all probability densities q for which π is unimodal with mode θ_0 (not necessarily unique) and π(θ_0) ≤ (1 + ε')π_0(θ_0).

However, it may be noted that the ML-II technique is not foolproof and can produce bad results, especially when Γ includes unreasonable distributions.

Sivaganesan (1988) obtained the following result for the range of the posterior mean of θ when π ∈ Γ_Q. Assume that the parameter space Θ is the real line R¹ and f(x | θ) > 0 ∀ θ ∈ R¹.
THEOREM 4.6.1 Let Γ_1 ⊂ Γ_Q be defined by

Γ_1 = {π : π = (1 - ε)π_0 + εq; q is a point mass}

Then

inf_{π ∈ Γ_Q} δ^π(x) = inf_θ R(θ)  and  sup_{π ∈ Γ_Q} δ^π(x) = sup_θ R(θ),

the extrema being attained within Γ_1, where

R(θ) = [a δ^{π_0}(x) + θ f(x | θ)] / [a + f(x | θ)]   (4.6.15)
and

a = (1 - ε) m(x | π_0)/ε.

The problem has also been considered by Sivaganesan and Berger (1989). Clearly, the smaller the range of δ^π(x), the more robust is the Bayes estimate over priors in Γ_Q.
EXAMPLE 4.6.2
Suppose x | θ ~ N(θ, σ²), σ² known, and π_0 = N(θ_0, τ²) for given θ_0, τ². Then

R(θ) = λ(θ) δ^{π_0}(x) + (1 - λ(θ)) θ,   (i)

where

λ(θ) = a / (a + f_x(θ)),

a = ((1 - ε)/ε) (2π(σ² + τ²))^{-1/2} exp[-(x - θ_0)² / 2(σ² + τ²)],

δ^{π_0}(x) = (σ² θ_0 + τ² x) / (σ² + τ²).

The range of δ^π(x) for π ∈ Γ_Q is given by (R(θ_l), R(θ_u)), where θ_l is the value of θ in R which minimises R(θ) given in (i), and θ_u the value which maximises it.
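The range can be traced numerically by scanning R(θ). The sketch below (a plain grid search; all numeric inputs are hypothetical) shows that the range contains the conjugate posterior mean and tightens as ε shrinks.

```python
import math

def posterior_mean_range(x, theta0, eps, sigma2, tau2,
                         half_width=50.0, steps=20000):
    """Range of the posterior mean over Gamma_Q via R(theta) of (4.6.15):
    R(theta) = (a * delta0 + theta * f(x|theta)) / (a + f(x|theta))."""
    s2 = sigma2 + tau2
    a = ((1 - eps) / eps) / math.sqrt(2 * math.pi * s2) \
        * math.exp(-(x - theta0) ** 2 / (2 * s2))
    delta0 = (sigma2 * theta0 + tau2 * x) / s2
    lo, hi = float("inf"), -float("inf")
    for i in range(steps + 1):
        th = x - half_width + 2 * half_width * i / steps
        f = math.exp(-(x - th) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
        r = (a * delta0 + th * f) / (a + f)
        lo, hi = min(lo, r), max(hi, r)
    return lo, hi

# Smaller contamination fraction eps -> larger a -> R(theta) pinned closer
# to the conjugate answer delta0, i.e. a tighter (more robust) range.
lo1, hi1 = posterior_mean_range(2.0, theta0=0.0, eps=0.1, sigma2=1.0, tau2=1.0)
lo2, hi2 = posterior_mean_range(2.0, theta0=0.0, eps=0.01, sigma2=1.0, tau2=1.0)
```

Here δ^{π_0}(x) = 1.0 for these inputs, and it always lies inside the computed range.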
The robustness of inference based on the posterior probability distribution of θ with respect to priors π ∈ Γ_Q can be checked from the following result, due to Huber (1973).
THEOREM 4.6.2 Let C be a measurable subset of Θ and define β_0 to be the posterior probability of C under π_0, i.e.

β_0 = P^{π_0}[θ ∈ C | X = x]

Then

inf_{π ∈ Γ} P^π[θ ∈ C | X = x] = β_0 {1 + ε sup_{θ ∈ C̄} f(x | θ) / [(1 - ε) m(x | π_0)]}^{-1}

sup_{π ∈ Γ} P^π[θ ∈ C | X = x] = [(1 - ε) m(x | π_0) β_0 + ε sup_{θ ∈ C} f(x | θ)] / [(1 - ε) m(x | π_0) + ε sup_{θ ∈ C} f(x | θ)]

where C̄ denotes the complement of C.
Thus robustness with respect to Γ will usually depend significantly on the observed x values. A lack of robustness may also be due to Γ being too large. Generally C is taken as the 100(1 - α)% credible set of θ under π̂. Berger and Berliner (1983) determined the optimal (1 - α) robust credible set, optimal in the sense of having the smallest size (Lebesgue measure) subject to the posterior probability being at least 1 - α for all π in Γ.
4.6.1 APPLICATIONS IN FINITE POPULATION SAMPLING
A sample s = (i_1, ..., i_n) is drawn from P using the sampling design p(s); let y_s = {y_i, i ∈ s}, y_r = {y_i, i ∈ r = s̄}, and ȳ_s = n^{-1} Σ_{i∈s} y_i. Consider the following superpopulation model:

y_i | θ ~ iid N(θ, σ²),  i = 1, ..., N,   (4.6.16)

where θ has the prior

θ ~ N(μ_0, τ_0²) = π_0 (say)   (4.6.17)

From Ericson (1969 a) it follows that

y_r | (s, y_s) ~ N({(1 - B_0)ȳ_s + B_0 μ_0} 1_{N-n}, σ²(I_{N-n} + (M_0 + n)^{-1} J_{N-n}))   (4.6.18)

where

M_0 = σ²/τ_0²,  B_0 = M_0/(M_0 + n).
The Bayes estimate of γ(y) = N^{-1} Σ_{i=1}^N y_i is, therefore,

δ^{μ_0,B_0}(s, y_s) = E[γ(y) | s, y_s] = ȳ_s - (1 - f) B_0 (ȳ_s - μ_0)   (4.6.19)

where f = n/N. Also, the posterior variance of γ(y) is

V(γ(y) | s, y_s) = N^{-2} (N - n) σ² [1 + (N - n)/(M_0 + n)]   (4.6.20)

The classical estimator of γ(y) is

δ^C(s, y_s) = ȳ_s,   (4.6.22)

which remains p-unbiased under srs and m-unbiased under any model which assumes that the y_i have a common mean.
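A minimal sketch of the shrinkage in (4.6.19); the sample values and the two extreme choices of M_0 below are hypothetical, chosen only to exhibit the limits.

```python
def bayes_fp_mean(ybar_s, n, N, mu0, M0):
    """Subjective Bayes estimate (4.6.19) of the finite population mean:
    ybar_s - (1 - f) * B0 * (ybar_s - mu0), f = n/N, B0 = M0/(M0 + n)."""
    f = n / N
    B0 = M0 / (M0 + n)
    return ybar_s - (1 - f) * B0 * (ybar_s - mu0)

# A diffuse prior (M0 -> 0) returns the classical estimator ybar_s; a
# degenerate prior (M0 -> infinity) gives full non-sample weight to mu0,
# pulling the estimate to f*ybar_s + (1-f)*mu0.
diffuse = bayes_fp_mean(10.0, n=20, N=100, mu0=0.0, M0=1e-9)
tight = bayes_fp_mean(10.0, n=20, N=100, mu0=0.0, M0=1e9)
```

With f = 0.2 the tight-prior limit is 0.2 × 10 + 0.8 × 0 = 2, showing that only the sampled fraction of the mean escapes the prior.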
Ghosh and Kim (1993) considered robust Bayes estimation of γ(y) under the class of priors Γ_Q. The ML-II prior in this class is given by (4.6.8) as

π̂_s(θ) = (1 - ε)π_0(θ) + ε δ_{ȳ_s}(θ)

where δ_{ȳ_s}(θ) is degenerate at θ = ȳ_s, the mle of θ.

THEOREM 4.6.3 Under the prior π̂_s(θ) (given in (4.6.8)), the posterior distribution of y_r is

π̂_s(y_r | s, y_s) = λ̂_ML(ȳ_s) N({(1 - B_0)ȳ_s + B_0 μ_0} 1_{N-n}, σ²(I_{N-n} + (M_0 + n)^{-1} J_{N-n})) + (1 - λ̂_ML(ȳ_s)) N(ȳ_s 1_{N-n}, σ² I_{N-n})   (4.6.24)

where

λ̂_ML(ȳ_s) = [1 + (ε/(1 - ε)) B_0^{-1/2} exp{n B_0 (ȳ_s - μ_0)² / 2σ²}]^{-1}   (4.6.25)
The Bayes estimator of γ(y) is

δ^{RB}(s, y_s) = ȳ_s - (1 - f) λ̂_ML(ȳ_s) B_0 (ȳ_s - μ_0)   (4.6.26)

with posterior variance

V(γ(y) | s, y_s) = N^{-2}[(N - n)σ² + (N - n)²{λ̂_ML(ȳ_s) σ²/(M_0 + n) + λ̂_ML(ȳ_s)(1 - λ̂_ML(ȳ_s)) B_0² (ȳ_s - μ_0)²}]   (4.6.27)
Proof. The conditional pdf of y_r given (s, y_s) is

π̂_s(y_r | s, y_s) = ∫ f(y_r | θ) π̂_s(θ | s, y_s) dθ

The results are then obtained by using (4.6.10)–(4.6.12).

We note that for ε very close to zero, i.e. when one is very confident about the π_0-prior, δ^{RB} is very close to δ^0 (since λ̂_ML(ȳ_s) ≈ 1), where δ^0 denotes the subjective Bayes estimator δ^{μ_0,B_0}. For ε close to one, δ^{RB} is close to δ^C.
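The two limits are easy to verify numerically; the parameter values below are hypothetical, and the extreme values of ε are used only to exhibit the limiting behaviour.

```python
import math

def lam_ml(ybar_s, n, mu0, M0, sigma2, eps):
    """ML-II weight (4.6.25): [1 + (eps/(1-eps)) * B0^{-1/2} *
    exp{n B0 (ybar_s - mu0)^2 / (2 sigma2)}]^{-1}."""
    B0 = M0 / (M0 + n)
    return 1.0 / (1.0 + (eps / (1 - eps)) * B0 ** -0.5
                  * math.exp(n * B0 * (ybar_s - mu0) ** 2 / (2 * sigma2)))

def delta_rb(ybar_s, n, N, mu0, M0, sigma2, eps):
    """Robust Bayes estimator (4.6.26) of the finite population mean."""
    f = n / N
    B0 = M0 / (M0 + n)
    return ybar_s - (1 - f) * lam_ml(ybar_s, n, mu0, M0, sigma2, eps) \
        * B0 * (ybar_s - mu0)

# eps near 0 recovers the subjective Bayes answer delta^0; eps near 1
# drives lam_ml to 0, leaving the classical estimator ybar_s.
args = dict(ybar_s=3.0, n=10, N=50, mu0=0.0, M0=5.0, sigma2=4.0)
subjective_like = delta_rb(eps=1e-12, **args)
classical_like = delta_rb(eps=1 - 1e-12, **args)
```

For these inputs δ^0 = 3 − 0.8 × (1/3) × 3 = 2.2, which is what the ε → 0 limit reproduces.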
For a given prior ξ, the posterior risk of an estimator e(s, y_s) of γ(y) is

ρ(ξ, (s, y_s), e) = E[{e(s, y_s) - γ(y)}² | s, y_s]   (4.6.28)

DEFINITION 4.6.1 An estimator e_0(s, y_s) is ψ-posterior-robust (POR) with respect to priors in a class Γ if

POR(e_0) = sup_{ξ ∈ Γ} | ρ(ξ, (s, y_s), e_0) - ρ(ξ, (s, y_s), δ^ξ) | < ψ   (4.6.29)
where δ^ξ = δ^ξ(s, y_s) is the Bayes estimator of γ(y) under the prior ξ. The quantity POR(e_0) is called the posterior robustness index of e_0. Taking

π^{μ,B} = N(μ, τ²),  B = M/(M + n),  M = σ²/τ²,   (4.6.30)

the Bayes estimator δ^{μ,B} under π^{μ,B} and its posterior variance ρ(π^{μ,B}, (s, y_s), δ^{μ,B}) are given by (4.6.19) and (4.6.20) with (μ_0, B_0) replaced by (μ, B). The following results hold:

ρ(π^{μ,B}, (s, y_s), δ^0) - ρ(π^{μ,B}, (s, y_s), δ^{μ,B}) = (1 - f)²[B_0(μ - μ_0) + (B_0 - B)(ȳ_s - μ)]²   (4.6.31)

ρ(π^{μ,B}, (s, y_s), δ^C) - ρ(π^{μ,B}, (s, y_s), δ^{μ,B}) = (1 - f)² B² (ȳ_s - μ)²   (4.6.32)

ρ(π^{μ,B}, (s, y_s), δ^{RB}) - ρ(π^{μ,B}, (s, y_s), δ^{μ,B}) = (1 - f)²[B_0 λ̂_ML(ȳ_s)(ȳ_s - μ_0) - B(ȳ_s - μ)]²   (4.6.33)
It follows from (4.6.31)–(4.6.33), therefore, that all the estimators δ^0, δ^C and δ^{RB} are POR-non-robust under the class of priors {π^{μ,B}}. This is because Γ is very large. If one confines attention to the narrower class Γ_0 = {π^{μ_0,B} = N(μ_0, τ²), τ² > 0}, it follows from (4.6.31)–(4.6.33) that

POR(δ^0) = (1 - f)² max[B_0², (1 - B_0)²] (ȳ_s - μ_0)²   (4.6.34)

POR(δ^C) = (1 - f)² (ȳ_s - μ_0)²   (4.6.35)

POR(δ^{RB}) = (1 - f)² max[B_0² λ̂²_ML(ȳ_s), (1 - B_0 λ̂_ML(ȳ_s))²] (ȳ_s - μ_0)²   (4.6.36)

Thus, given ψ and ε, the posterior robustness of the predictors depends on the closeness of ȳ_s to μ_0. Also, both the subjective Bayes predictor δ^0 and the robust Bayes predictor δ^{RB} are more posterior-robust than δ^C under Γ_0. Again, δ^{RB} is more posterior-robust than δ^0 if B_0 λ̂_ML(ȳ_s) > 1/2. Defining

r(ξ, e) = E[ρ(ξ, (s, y_s), e)]   (4.6.37)

where the expectation is taken with respect to the marginal predictive distribution of y_s, as the overall Bayes risk of e, we consider

DEFINITION 4.6.2 An estimator e_0(s, y_s) is said to be ψ-procedure-robust with respect to Γ if

PR(e_0) = sup_{ξ ∈ Γ} | r(ξ, e_0) - r(ξ, δ^ξ) | < ψ   (4.6.38)
PR(e_0) is called the procedure robustness index of e_0. Considering the class Γ_0, and denoting by π^B the N(μ_0, τ²) prior,

r(π^B, δ^0) - r(π^B, δ^B) = (1 - f)² (B_0 - B)² σ²/(nB)   (4.6.39)

r(π^B, δ^C) - r(π^B, δ^B) = (1 - f)² B σ²/n   (4.6.40)

r(π^B, δ^{RB}) - r(π^B, δ^B) = (1 - f)² E[(B_0 λ̂_ML(ȳ_s) - B)² (ȳ_s - μ_0)²]   (4.6.41)

It follows, therefore, that

PR(δ^0) = ∞,

PR(δ^C) = (1 - f)² σ²/n,

PR(δ^{RB}) = (1 - f)² sup_{0<B<1} E[(B_0 λ̂_ML(ȳ_s) - B)² (ȳ_s - μ_0)²],

while r(π^B, δ^{RB}) - r(π^B, δ^B) = O(√B) as B → 0. Thus δ^0 is not procedure-robust. For small B, δ^C is more procedure-robust than δ^{RB}. This is to be expected, since small B signifies a small variance ratio σ²/τ², which amounts to instability in the assessment of the prior for θ. In this case, the long-run performance of δ^C is expected to be better than that of δ^{RB}.
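The identity (4.6.40) can be checked by simulation. The sketch below is a hypothetical plain-Python Monte Carlo: it draws ȳ_s from its marginal distribution under π^B and compares the simulated excess risk of δ^C with the analytic value.

```python
import random

def pr_excess_classical(f, B, sigma2, n):
    """Analytic excess overall Bayes risk (4.6.40) of the classical
    estimator: (1 - f)^2 * B * sigma2 / n."""
    return (1 - f) ** 2 * B * sigma2 / n

def pr_excess_mc(f, B, sigma2, n, mu0=0.0, reps=100000, seed=1):
    """Monte Carlo estimate of E[((1-f) B (ybar_s - mu0))^2] under the
    marginal ybar_s ~ N(mu0, sigma2/n + tau2)."""
    rng = random.Random(seed)
    # B = M/(M+n), M = sigma2/tau2  =>  tau2 = sigma2 * (1 - B) / (n B)
    tau2 = sigma2 * (1 - B) / (n * B)
    sd = (sigma2 / n + tau2) ** 0.5
    acc = 0.0
    for _ in range(reps):
        ybar = rng.gauss(mu0, sd)
        acc += ((1 - f) * B * (ybar - mu0)) ** 2
    return acc / reps

analytic = pr_excess_classical(0.2, 0.5, 1.0, 10)  # = 0.032
mc = pr_excess_mc(0.2, 0.5, 1.0, 10)
```

The marginal variance σ²/n + τ² equals σ²/(nB), so the squared deviation averages to (1 − f)²Bσ²/n, exactly the analytic expression.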
We note that the subjective Bayes predictor δ^0, which is POR-robust (i.e. robust for a given sample), fails completely in long-run performance as measured by procedure robustness. The robust Bayes procedure δ^{RB} seems to achieve a balance between the frequentist and subjective Bayesian viewpoints.

Following Sivaganesan and Berger (1989), the authors also considered the range of posterior means of γ(y) over π ∈ Γ_Q, and extended the study to the symmetric class of unimodal contaminated priors to obtain the estimator δ^{SU}.
Ghosh and Kim (1997) considered robust Bayes competitors of ratio estimators. Under the superpopulation model

y_i = βx_i + e_i,  i = 1, ..., N,   (4.6.42)

with the e_i independent N(0, σ²x_i) and β having a uniform prior over (-∞, ∞), the Bayes estimate of γ(y) is given by the ratio estimator e_R = (Σ_{i∈s} y_i / Σ_{i∈s} x_i) x̄. Under model (4.6.42) and a π_1 = N(β_0, τ²) prior for β, the Bayes estimator of γ(y) is

δ^1(s, y_s) = f ȳ_s + (1 - f) x̄_r {(1 - B_1) ȳ_s/x̄_s + B_1 β_0}   (4.6.43)
where

B_1 = M_0/(M_0 + n x̄_s),  x̄_s = n^{-1} Σ_{i∈s} x_i,
and M_0 = σ²/τ² as before, with associated Bayes posterior variance

V^{π_1} = V(γ(y) | s, y_s) = σ² N^{-2}[(N - n) x̄_r + (N - n)² x̄_r²/(M_0 + n x̄_s)]   (4.6.44)
Under the class of contaminated priors Γ_Q with π_0 = π_1, the authors found the ML-II prior and the robust Bayes predictor of γ(y) under this prior as

δ^{RB(1)}(s, y_s) = f ȳ_s + (1 - f) x̄_r {(1 - λ̂_ML(y_s) B_1) ȳ_s/x̄_s + λ̂_ML(y_s) B_1 β_0}   (4.6.45)

where

λ̂_ML(y_s) = [1 + (ε/(1 - ε)) B_1^{-1/2} exp{n x̄_s B_1 (ȳ_s/x̄_s - β_0)² / 2σ²}]^{-1}   (4.6.46)

Also its posterior variance is

V(γ(y) | s, y_s) = σ² N^{-2}(N - n) x̄_r + N^{-2}(N - n)² x̄_r² {σ² λ̂_ML(y_s)/(M_0 + n x̄_s) + λ̂_ML(y_s)(1 - λ̂_ML(y_s)) B_1² (ȳ_s/x̄_s - β_0)²}   (4.6.47)
where x̄_r = Σ_{i∈r} x_i/(N - n). The authors compared δ^{RB(1)}, δ^1 and e_R in terms of posterior risks as well as the overall Bayes risk under the class of priors {N(β_0, τ²), τ² > 0}. It is found that both δ^1 and δ^{RB(1)} are superior to e_R in terms of posterior robustness. Also, δ^1 lacks procedure robustness, while e_R is quite procedure-robust. It was found that for small values of σ²/τ² (which amounts to greater instability in the assessment of the prior distribution of β relative to the superpopulation model) e_R is more procedure-robust than δ^{RB(1)}. This shows that in such circumstances it is safer to use e_R if one is seriously concerned about the long-run performance of the estimator. The authors extended the study to the symmetric unimodal contaminated class of priors to obtain the estimator δ^{SU(1)}.
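A small numerical sketch of the two subjective limits; the exact form of δ^1 used here is an assumption consistent with the structure of (4.6.43) and (4.6.45), and all data values are hypothetical.

```python
def ratio_estimator(ys, xs_s, xbar_pop):
    """Classical ratio estimator e_R = (sum(ys)/sum(xs_s)) * xbar_pop."""
    return sum(ys) / sum(xs_s) * xbar_pop

def delta1(ys, xs_s, xs_r, beta0, M0):
    """Bayes competitor: the posterior mean of beta shrinks the sample
    ratio toward beta0 with weight B1 = M0/(M0 + n*xbar_s); non-sampled
    units are predicted by beta_post * x_i."""
    n, N = len(ys), len(ys) + len(xs_r)
    xbar_s = sum(xs_s) / n
    B1 = M0 / (M0 + n * xbar_s)
    beta_post = (1 - B1) * (sum(ys) / sum(xs_s)) + B1 * beta0
    return (sum(ys) + beta_post * sum(xs_r)) / N

# A diffuse prior (M0 -> 0) collapses delta1 to the ratio estimator;
# a very tight prior about beta0 replaces the sample ratio by beta0.
ys, xs_s, xs_r = [2.0, 4.0], [1.0, 2.0], [3.0]
xbar_pop = (sum(xs_s) + sum(xs_r)) / 3
diffuse = delta1(ys, xs_s, xs_r, beta0=0.0, M0=1e-12)
tight = delta1(ys, xs_s, xs_r, beta0=0.0, M0=1e12)
```

This mirrors the result quoted from Ghosh and Kim (1997): e_R is the limiting member of the Bayes family as the prior on β becomes non-informative.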
It may be noted that in models (4.6.16) [(4.6.42)], σ² may be unknown, and one may consider normal-gamma priors for (θ, σ²) [(β, σ²)], derive ML-II priors under the contaminated class, and undertake similar studies. Clearly, such studies can be extended to multiple regression models.
Ghosh and Kim (1997) considered the 1970 populations (y) and 1960 populations (x) of 125 US cities with 1960 populations between 10⁵ and 10⁶ for an empirical study of the ratio estimator and estimators of its variance. A 20% srswor sample of cities was taken, and σ² was assumed to be known. To elicit the basic prior π_0 for β, data on the 1950 and 1960 populations were used, and ε was chosen as 0.1. It was found that δ^{RB(1)} and δ^{SU(1)} were closer to γ(y) than e_R, which was worst in terms of the posterior robustness index. The range of the posterior mean of γ(y) was found to be small for both Γ_Q and Γ_S, so that if the true prior were π_0, one could use modelling via either of the contaminations to achieve a robust Bayesian analysis.
4.7 EXERCISES
1. Let X_k | θ_k be independently distributed as N(θ_k, 1), while the θ_k have independent prior distributions N(μ, τ²), k = 1, ..., n. The posterior distribution of θ_k | x_k is, therefore, N(μ + D(x_k - μ), D), where D = τ²/(1 + τ²). The Bayes estimates of θ_k under summed squared error loss are the posterior means

θ̂_k^B = μ + D(x_k - μ),  k = 1, ..., n.

Define θ̄ = Σ_{k=1}^n θ_k/n, η̄^B = Σ_{k=1}^n θ̂_k^B/n, s² = (n - 1)^{-1} Σ_{k=1}^n (x_k - x̄)², x̄ = Σ_{k=1}^n x_k/n. Note that x̄ → μ and s² → 1 + τ² = 1/(1 - D) in probability. Hence, or otherwise, show that

(i) E(η̄^B) = E(θ̄);

(ii) Σ_{k=1}^n (θ̂_k^B - η̄^B)²/(n - 1) = D²s² → Dτ² as n → ∞;

(iii) E[Σ_{k=1}^n (θ_k - θ̄)²/(n - 1) | x] = D(1 + Ds²) → τ² as n → ∞.

Therefore, the histogram of the θ̂_k^B values is more concentrated about the prior mean μ than that of the θ_k values given x = (x_1, ..., x_n)', since 0 ≤ D ≤ 1.
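Parts (ii)–(iii) say the Bayes estimates are under-dispersed relative to the posterior spread of the θ_k. A simulation sketch of (ii), with illustrative sample size and parameter values:

```python
import random

def louis_check(n=100000, mu=1.0, tau2=4.0, seed=7):
    """With X_k | theta_k ~ N(theta_k, 1), theta_k ~ N(mu, tau2), the Bayes
    estimates mu + D*(x_k - mu), D = tau2/(1 + tau2), have sample variance
    D^2 * s^2 -> D * tau2, which understates the prior variance tau2."""
    rng = random.Random(seed)
    D = tau2 / (1 + tau2)
    xs = [rng.gauss(rng.gauss(mu, tau2 ** 0.5), 1.0) for _ in range(n)]
    est = [mu + D * (x - mu) for x in xs]
    mean_b = sum(est) / n
    var_b = sum((e - mean_b) ** 2 for e in est) / (n - 1)
    return var_b, D * tau2

var_b, limit = louis_check()
# The over-shrinkage (D*tau2 < tau2) is what motivates Louis's adjusted
# estimator introduced next.
```

With τ² = 4 one gets D = 0.8, so the Bayes estimates' spread converges to 3.2 rather than 4.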
Consider now the modified estimator

θ̂_k^L = ξ̂ + Â(x_k - ξ̂)

where

Â = [D(1 + Ds²)]^{1/2}/s,

ξ̂ = [(1 - D)μ + x̄(D - Â)]/(1 - Â).

Show that for this estimator

(iv) E(η̄^L) = E(θ̄);

(v) Σ_{k=1}^n (θ̂_k^L - η̄^L)²/(n - 1) = E[Σ_{k=1}^n (θ_k - θ̄)²/(n - 1) | x],

where η̄^L = Σ_{k=1}^n θ̂_k^L/n.
(Louis, 1984)
2. Let X_j (j = 1, ..., n) be independently distributed N(θ, σ²) and let θ have the prior ξ_0 : θ ~ N(μ, τ²), where μ, τ, σ² are known and θ is an unobservable random quantity whose value we want to predict. Show that under a loss function L(θ, a) which is an increasing function of | θ - a |, the Bayes estimate of θ is

δ_B(x) = (nτ² x̄ + σ² μ)/(nτ² + σ²)

where x = (x_1, ..., x_n)' and x̄ = Σ_{j=1}^n x_j/n.

For L(θ, a) = (θ - a)², show that the Bayes risk (expected risk) of δ_B with respect to ξ_0 is

E(δ_B(x) - θ)² = R(ξ_0, δ_B(x)) = τ²σ²/(σ² + nτ²)

where the expectation is taken with respect to the predictive distribution of X for the given prior ξ_0.

Consider an alternative prior ξ_1 : θ ~ N(μ_1, τ_1²), τ_1² < τ². Show that under this prior the expected risk of δ_B is

R(ξ_1, δ_B(x)) = [(A/(A + 1) - A_1/(A_1 + 1))²(A_1 + 1) + A_1/(A_1 + 1)] σ²/n + (μ_1 - μ)²/(A + 1)²
               = B σ²/n + (μ_1 - μ)²/(A + 1)²  (say)

where

A = nτ²/σ²,  A_1 = nτ_1²/σ².
Thus, for any fixed value of τ_1² < τ², R(ξ_1, δ_B(x)) can be made arbitrarily large by making | μ_1 - μ | arbitrarily large.
In particular, let σ² = 1, μ = 0, n = 1, so that τ² = A. Under the prior ξ_0, which we now denote as ξ_A, the Bayes estimate of θ is

δ^A(x) = Ax/(A + 1)

with Bayes risk

R(ξ_A, δ^A) = A/(A + 1).

The saving in risk from using δ^A rather than the mle δ_0(x) = x is

R(ξ_A, δ_0) - R(ξ_A, δ^A) = 1 - A/(A + 1) = 1/(A + 1).

Again, the frequentist risk of δ^A is

E_x(δ^A - θ)² = (A² + θ²)/(A + 1)²,

which is large for large values of | θ |.
Therefore, an estimator which compromises between the Bayes estimator δ^A (which, in the frequentist sense, has high risk E_x(δ^A - θ)² for large values of | θ |) and the mle δ_0(x) = x (which offers no saving in risk with respect to ξ_A but has minimax risk R_θ(δ_0) = 1 ∀ θ) is the limited translation (LT) Bayes estimator δ^{A,M}(x), defined as follows. For any A and M (> 0), let C = M(A + 1). Then

δ^{A,M}(x) = δ_0(x) + M = x + M,   for δ^A(x) > x + M,
           = δ^A(x) = Ax/(A + 1),  for | δ^A(x) - δ_0(x) | ≤ M,
           = δ_0(x) - M = x - M,   for δ^A(x) < x - M.
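A direct sketch of the rule, with hypothetical values of A and M:

```python
def delta_ltb(x, A, M):
    """Limited translation Bayes estimator delta^{A,M}: use the Bayes rule
    A*x/(A+1), but never move more than M away from the mle delta_0(x) = x."""
    bayes = A * x / (A + 1)
    if bayes > x + M:
        return x + M
    if bayes < x - M:
        return x - M
    return bayes

# For |x| <= C = M(A+1) the Bayes estimate is kept; beyond C the shift
# toward the prior mean 0 is capped at M.
small = delta_ltb(1.0, A=3.0, M=1.0)    # Bayes rule: 0.75
capped = delta_ltb(10.0, A=3.0, M=1.0)  # capped at x - M = 9.0
```

With A = 3, M = 1 we have C = 4, so x = 1 keeps the full Bayes shrinkage while x = 10 is shrunk by at most one unit.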
Show that the relative savings loss (RSL) of δ^{A,M} with respect to δ_0 is

RSL(δ^{A,M}) = [R(ξ_A, δ^{A,M}) - R(ξ_A, δ^A)] / [R(ξ_A, δ_0) - R(ξ_A, δ^A)]
             = (A + 1)[R(ξ_A, δ^{A,M}) - A/(A + 1)]
(Efron and Morris, 1971)