ELSEVIER
Journal of Statistical Planning and Inference 59 (1997) 257-277
Estimating probability of occurrence of the most likely multinomial event
Khursheed Alam, Zhuojen Feng
Department of Mathematical Sciences, Clemson University, Clemson, SC 29634-1907, USA
Received 6 May 1994
Abstract
The problem of estimating the parameters of a multinomial population arises in discrete multivariate analysis. This paper deals with the problem of estimating the probability associated with the most likely multinomial event. We consider several estimates, such as the maximum likelihood estimate and its modifications, and a Bayes estimate. Certain mathematical properties of the estimates are shown. Empirical results are given, showing the relative performance of the estimates with respect to the mean squared error as the loss function.
AMS subject classification: primary 62F10; secondary 62F07; 62C15
Keywords: Multinomial distribution; Maximum likelihood estimates; Quadratic loss; Bayes estimate; Admissibility
1. Introduction
The statistical problem of estimating the largest mean of K normal populations has
been considered by Kuo and Mukhopadhyay (1990), Mukhopadhyay et al. (1993), Saxena and Tong (1969) and Tong (1970), among others. In this paper we consider a related problem of estimating the largest component of a multinomial parameter. The multinomial distribution arises in the study of discrete multivariate analysis (Bishop et al., 1975). In this context the problem of estimating a multinomial parameter has been investigated fully, with the maximum likelihood estimator (MLE) being of particular interest. Certain optimal properties of the MLE have been shown by Steinhaus (1957), Trybula (1958) and Rutkowska (1977). Decision theoretic properties, such as admissibility, have been shown by Johnson (1971), Alam (1979) and Olkin and Sobel (1979). The Bayes method is an alternative approach. See Good (1965, 1967) for a discussion of Bayes estimates. Fienberg and Holland (1973) and Mosimann (1967) have proposed pseudo-Bayes estimates. Alam and Mitra (1986), among others, have considered empirical Bayes estimates.
0378-3758/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII: S0378-3758(96)00112-7
Let x = (x₁, …, x_k) be a sample from a multinomial distribution with the associated probability vector p = (p₁, …, p_k), where Σ_{i=1}^k p_i = 1 and Σ_{i=1}^k x_i = n. The problem of estimating p_max = max(p₁, …, p_k) arises, for example, in testing the hypothesis that the multinomial events are equally likely. This is equivalent to testing the hypothesis that p_max = 1/k. More generally, the problem arises in the measurement of the diversity of a multinomial population. There are various criteria for measuring diversity. The principle of majorization provides an intrinsic criterion for comparing the diversities of two multinomial populations. A comprehensive theory of majorization is given in the textbook by Marshall and Olkin (1979). Schur-concave functions, which are isotonic with respect to the majorization relation, provide a useful class of indices of diversity. The well-known Gini-Simpson index of diversity, given by the right-hand side of (2.2) below, is a Schur-concave function. The parametric function 1 − p_max is also a Schur-concave function of p, and is thus a measure of diversity. Therefore, the problem of estimating p_max (more generally, the ordered components of p) is of broad interest. In some problems it is natural to consider the specification of the most probable category together with the estimation of p_max. But the specification is not required for the measurement of diversity.

We supplement the foregoing discussion with the following illustration, which was
used by Patil and Taillie (1982) to explain the concept of diversity. A traveler in a tropical forest notices a particular species of tree and looks for a second specimen of the same species. How often he will come across the same species as he travels is the question. The typical (most frequently appearing) species is relatively rare in a diverse forest. That is, a smaller value of p_max, the probability associated with the appearance of the typical species, is indicative of greater diversity.
We consider several estimates of p_max. Here and throughout this paper, we use the term estimate for the estimating function (estimator) as well as for any particular value of the function. The meaning will always be clear from the context. The MLE of p is given by p̂ = (p̂₁, …, p̂_k), where p̂_i = x_i/n. Therefore, the MLE of p_max is given by p̂_max = max(p̂₁, …, p̂_k). The MLE tends to overestimate p_max, since E p̂_max ≥ max(E p̂₁, …, E p̂_k) = p_max. The overestimation is reduced by the modification
δ_λ = λ p̂_max + (1 − λ)(1/k),   (1.1)

where 0 < λ < 1. Clearly, δ_λ ≤ p̂_max. Specific values of λ of particular interest, which have been considered in the literature, are of the form λ = n/(n + c) (Bishop et al., 1975, Table 12.2-1). We shall consider the case c = √n and denote the corresponding estimate by δ*.
As δ₁ = p̂_max, δ₀ = 1/k and δ_λ is increasing in λ, there exists a value λ₀ of λ, say, for which E δ_λ₀ = p_max. However, λ₀ depends on the parameter p, being given by

λ₀ = (p_max − 1/k)/(E p̂_max − 1/k).
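As a concrete illustration of the shrinkage family (1.1), the following sketch (in Python; not part of the original paper, and the function names are ours) computes the MLE p̂_max and the estimate δ* with c = √n:

```python
import math

def mle_pmax(x):
    """MLE of p_max: the largest sample proportion max(x_i)/n."""
    n = sum(x)
    return max(x) / n

def delta_lambda(x, lam):
    """Shrinkage estimate (1.1): lam * p_max_hat + (1 - lam)/k."""
    k = len(x)
    return lam * mle_pmax(x) + (1 - lam) / k

def delta_star(x):
    """The estimate delta*: (1.1) with lam = n/(n + sqrt(n)), i.e. c = sqrt(n)."""
    n = sum(x)
    return delta_lambda(x, n / (n + math.sqrt(n)))
```

For example, for counts x = (12, 5, 3) we have n = 20, p̂_max = 0.6, and δ* shrinks this value towards 1/3.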
Rather than estimating λ₀ for substitution in (1.1), it is preferable to estimate directly the bias of p̂_max. So we obtain an estimate of the bias (BIAS) from the distribution of p̂_max and derive a bias-corrected estimate of p_max, given by

δ_b = p̂_max − BIAS.   (1.2)
Finally, we consider a Bayes estimate and its limiting form, with respect to a Dirichlet prior for the parameter p of the multinomial distribution. The Dirichlet distribution, given by the density function

D_k(p; ν₁, …, ν_k) = [Γ(ν₁ + ⋯ + ν_k)/(Γ(ν₁) ⋯ Γ(ν_k))] p₁^{ν₁−1} ⋯ p_k^{ν_k−1},   (1.3)
where ν_i > 0, i = 1, …, k, is a conjugate prior distribution for p. For symmetry we let ν_i = ν, say, for i = 1, …, k. The prior distribution is more diffused (less informative) as ν decreases. To see this, we observe that E p_i = 1/k under the given prior, and that

var(p_i) = (k − 1)/(k²(kν + 1)),   i = 1, …, k,

is decreasing in ν. However, the limiting distribution, as ν → 0, is improper. We obtain a Bayes estimate p̂_ν of p_max with respect to the given prior under the squared error loss, derive its limiting form p̂₀, as ν → 0, and show that p̂₀ is admissible.
In the following section we present certain mathematical properties of p̂_max, δ* and δ_b. The Bayes estimate p̂_ν and its limiting form p̂₀ are introduced in Section 3. In Section 4 we consider a two-stage procedure which combines a rule for selecting the most probable category with the estimation of p_max. A table is given together with a graph, showing the mean squared error (MSE) of the proposed estimates, for a comparison of their performance. The tabulated results are discussed in Section 5. In Section 6 (Appendix) we provide a recursive formula for computing the c.d.f. of p̂_max, which has been used to compute the MSE. We derive also the asymptotic distribution of p̂_max. The Appendix is specially provided, since the distribution of p̂_max arises in many other problems, such as ranking and selection.
In this paper we have dealt with the problem of estimating p_max. The problem of estimating p_min, the probability associated with the least likely multinomial event, can be handled similarly.
A related topic of interest is the problem of selecting the most probable multinomial event, or selecting a subset of the multinomial events containing the most probable event. This is a topic of ranking and selection. See Gupta and Panchapakesan (1979) for a comprehensive review of the literature in this area. See also Alam (1971) and a recent paper on the topic by Gupta and Hande (1993).
2. Estimates of p_max
First we consider p̂_max. From the following lemma we derive an upper bound on the value of n MSE(p̂_max).
Lemma 2.1. If k ≥ 2, then

(p̂_max − p_max)² ≤ Σ_{i=1}^k (p̂_i − p_i)².   (2.1)
Proof. Suppose that p_max = p_i and p̂_max = p̂_j. The above inequality is trivially satisfied when i = j. Let i ≠ j. If p_max > p̂_max then

p_max − p̂_max = p_i − p̂_j ≤ p_i − p̂_i.

On the other hand, if p_max < p̂_max then

p̂_max − p_max = p̂_j − p_i ≤ p̂_j − p_j.

The lemma follows in either of the two cases. □
From (2.1) we have that

n MSE(p̂_max) = n E(p̂_max − p_max)² ≤ n Σ_{i=1}^k E(p̂_i − p_i)² = Σ_{i=1}^k p_i(1 − p_i) = 1 − Σ_{i=1}^k p_i².   (2.2)

The last expression on the right-hand side is a measure of diversity of the multinomial population, called the Gini-Simpson index, due to Gini (1912) and Simpson (1949).
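The bound (2.2) is easy to check by simulation. The sketch below (ours, not from the paper; names and the Monte Carlo sizes are illustrative) estimates n·MSE(p̂_max) for a multinomial sample and compares it with the Gini-Simpson index:

```python
import random

def gini_simpson(p):
    """Gini-Simpson diversity index: 1 - sum_i p_i^2 (the RHS of (2.2))."""
    return 1.0 - sum(pi * pi for pi in p)

def n_mse_pmax(p, n, reps=5000, seed=1):
    """Monte Carlo estimate of n * MSE(p_max_hat) under multinomial(n, p)."""
    rng = random.Random(seed)
    k = len(p)
    pmax = max(p)
    total = 0.0
    for _ in range(reps):
        counts = [0] * k
        for _ in range(n):                 # draw one multinomial sample
            u = rng.random()
            acc = 0.0
            for i, pi in enumerate(p):
                acc += pi
                if u < acc:
                    counts[i] += 1
                    break
            else:                          # guard against rounding in acc
                counts[-1] += 1
        total += (max(counts) / n - pmax) ** 2
    return n * total / reps
```

For p = (0.5, 0.3, 0.2) and n = 30 the simulated n·MSE stays well below the index value 0.62, as (2.2) requires.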
We now consider the asymptotic distribution of p̂_max, as n → ∞, from which we derive the limiting value of n MSE(p̂_max). Suppose that r of the components of the vector p are equal to p_max and the remaining components are each less than p_max, r = 1, …, k. For r ≥ 2 let W_r denote the largest component of an r-dimensional random vector, distributed according to a multivariate normal distribution N(0, Σ), where Σ is an equi-correlation matrix, with the common correlation

ρ = −p_max/(1 − p_max).
It is shown below in Section 6 that for large n

√n (p̂_max − p_max) ~ (p_max(1 − p_max))^{1/2} W_r   (2.3)

for r ≥ 2, and

√n (p̂_max − p_max) ~ (p_max(1 − p_max))^{1/2} Z   (2.4)

for r = 1, where Z denotes a standard normal random variable and '~' means 'asymptotically distributed as'. See Remark 6.1 below.
From (2.1) we have that

n² E(p̂_max − p_max)⁴ ≤ n² E(Σ_{i=1}^k (p̂_i − p_i)²)²,

which is bounded in n, since

n² E(p̂_i − p_i)⁴ = 3p_i²(1 − p_i)² + n⁻¹ p_i(1 − p_i)(1 − 6p_i(1 − p_i)).
The boundedness of n² E(p̂_max − p_max)⁴ implies that √n |p̂_max − p_max| and n |p̂_max − p_max|² are uniformly integrable. It follows from (2.3) and (2.4) (see Serfling (1980), Lemma 1.4(A) and Theorem 1.4(A)) that

lim_{n→∞} √n E|p̂_max − p_max| = (p_max(1 − p_max))^{1/2} E|W_r|,   (2.5)

lim_{n→∞} n MSE(p̂_max) = p_max(1 − p_max) E W_r²   (2.6)

for r ≥ 2, and

lim_{n→∞} √n E|p̂_max − p_max| = (p_max(1 − p_max))^{1/2} (2/π)^{1/2},   (2.7)

lim_{n→∞} n MSE(p̂_max) = p_max(1 − p_max)   (2.8)

for r = 1. From the formulas (6.10) and (6.11) of the Appendix we have that

E W_r² = (1 − ρ) E Z_r² + ρ,

where Z_r denotes the largest order statistic in a sample of r observations from N(0, 1). To summarize, we have obtained the following result.
Theorem 2.1. The c.d.f. of p̂_max is given by (6.1) and (6.2). As n → ∞, the asymptotic distribution of p̂_max is given by (2.3) and (2.4), and the limiting values of the mean and mean squared deviations of p̂_max are given by (2.5)-(2.8).
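Theorem 2.1 makes the limiting n·MSE computable. The following sketch (ours; not the authors' code) evaluates E Z_r² by quadrature against the density of the largest of r i.i.d. standard normals, and combines it with E W_r² = (1 − ρ)E Z_r² + ρ as in (2.6):

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def e_zr2(r, lo=-10.0, hi=10.0, steps=20001):
    """E[Z_r^2], Z_r the largest of r i.i.d. N(0,1) variables, by
    trapezoidal quadrature against the density r*phi(z)*Phi(z)^(r-1)."""
    h = (hi - lo) / (steps - 1)
    s = 0.0
    for j in range(steps):
        z = lo + j * h
        w = 0.5 if j in (0, steps - 1) else 1.0
        s += w * z * z * r * phi(z) * Phi(z) ** (r - 1)
    return s * h

def limiting_n_mse(pmax, r):
    """Limiting n*MSE(p_max_hat) from (2.6)/(2.8), using
    E[W_r^2] = (1 - rho)*E[Z_r^2] + rho with rho = -pmax/(1 - pmax)."""
    if r == 1:
        return pmax * (1.0 - pmax)          # E[Z^2] = 1, cf. (2.8)
    rho = -pmax / (1.0 - pmax)
    return pmax * (1.0 - pmax) * ((1.0 - rho) * e_zr2(r) + rho)
```

A quick sanity check: for r = 2 the identity E[max(X,Y)²] + E[min(X,Y)²] = E X² + E Y² and symmetry give E Z₂² = 1 exactly, which the quadrature reproduces.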
Remark 2.1. Since p̂_max is derived heuristically from p̂, it is appropriate to mention here the optimal properties of p̂ as an estimate of p. In this regard we mention that p̂ is admissible (Alam, 1979) with respect to the squared error loss, given by

L(δ, p) = Σ_{i=1}^k (δ_i − p_i)²,   (2.9)

where δ = (δ₁, …, δ_k) denotes any estimate of p. It has been shown by Olkin and Sobel (1979) that p̂ is minimax and admissible with respect to the loss function, given by

L'(δ, p) = Σ_{i=1}^k (δ_i − p_i)²/p_i.
However, p̂ is not minimax with respect to the quadratic loss (2.9), as shown below. Consider the Dirichlet prior distribution D_k(p; ν, …, ν), given by (1.3), for the parameter p. A Bayes estimate δ of p with respect to the given prior and the loss (2.9) is given by

δ_i = (n p̂_i + ν)/(n + kν),   i = 1, …, k.   (2.10)
Let ν = √n/k. We see that the risk of the Bayes estimate is constant, given by

R(δ, p) = E L(δ, p) = n(k − 1)/(k(n + √n)²).   (2.11)
Hence, δ is minimax and admissible. On the other hand, the risk of p̂ is given by

R(p̂, p) = (1/n) Σ_{i=1}^k p_i(1 − p_i) = (1 − Σ_{i=1}^k p_i²)/n,

which is maximized for p = (1/k, …, 1/k). The maximum risk of p̂ is equal to (k − 1)/(nk) > R(δ, p). Therefore, p̂ is not minimax.
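The two risk values are easy to compare numerically. The sketch below (our naming, not from the paper) implements the constant risk (2.11) and the maximum risk of p̂:

```python
import math

def bayes_risk(n, k):
    """Constant risk (2.11) of the Bayes estimate with nu = sqrt(n)/k:
    n(k-1) / (k (n + sqrt(n))^2)."""
    return n * (k - 1) / (k * (n + math.sqrt(n)) ** 2)

def max_risk_mle(n, k):
    """Maximum risk (k-1)/(nk) of p_hat, attained at p = (1/k, ..., 1/k)."""
    return (k - 1) / (n * k)
```

For every n and k ≥ 2, bayes_risk(n, k) < max_risk_mle(n, k), while the ratio of the two tends to 1 as n grows, illustrating the remark below that the difference is small for large n.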
Even though p̂ is not minimax, the difference between the maximum risks of p̂ and δ is small for large n. In fact, for large values of n, δ has larger risk than p̂ in most parts of the parameter space, except in the vicinity of the point p = (1/k, …, 1/k). Moreover, δ does not compare favourably with p̂ from a local asymptotic point of view.
Estimate δ*. The estimate δ* is given by the largest component of the Bayes estimate δ, given by (2.10) for ν = √n/k. That is,

δ* = (n p̂_max + √n/k)/(n + √n).   (2.12)
To motivate the choice of δ* as an estimate of p_max, first we note that δ* is the largest component of a conservative estimate of p which is minimax and admissible. Second, we note that δ* is obtained from (1.1) by putting λ = n/(n + √n). That is, δ* shrinks p̂_max towards 1/k. Since λ increases with n (tending to 1 as n → ∞), the shrinkage decreases with n, as it should, since p̂_max converges in probability to p_max, as n → ∞.
Remark 2.2. We have noted above that δ* shrinks the estimate p̂_max towards the point 1/k, the least possible value of p_max. A less extreme shrinkage point may be preferred, especially in the situation where it is known a priori that p_max is large. An adaptive choice of the shrinkage point is given by the geometric mean of p̂_max and 1/k, that is, by replacing 1/k in (1.1) with (p̂_max/k)^{1/2}. However, we do not consider this estimate any further in this paper.
Estimate δ_b. Let e(p) = E(p̂_max | p). For any given value of p, the value of e(p) can be computed from the distribution of p̂_max given in the Appendix. An estimate of the bias of p̂_max is given by

BIAS = ê − p̂_max,   (2.13)

where ê is obtained from e(p) by substituting p̂ for p. Using (2.13) for BIAS in (1.2) we obtain

δ_b = p̂_max − BIAS = 2 p̂_max − ê
for a "bias corrected" estimate of p_max.

We note that BIAS is a "bootstrap" estimate of the bias of p̂_max. In the bootstrap scheme, given x = (x₁, …, x_k), we generate a large number (N) of samples of size n from a multinomial distribution with the associated value of the parameter p = p̂. This is called a bootstrap resampling plan. From each sample so generated, we compute the analog of p̂_max = p̂*_max, say, and take its average value (AV p̂*_max). Clearly, AV p̂*_max → ê, as N → ∞. Since p̂_max plays the role of p_max in the bootstrap resampling scheme, we estimate the bias of p̂_max by AV p̂*_max − p̂_max, which tends to ê − p̂_max = BIAS, as N → ∞.

The bootstrap method is a computer-directed nonparametric technique. It is used for calculating approximate bias, standard deviation and confidence intervals in almost any nonparametric problem. See Efron (1982) for a review of the bootstrap method and other resampling plans. We have mentioned here the bootstrap method for the sake of motivation. In fact, we do not use the bootstrap sampling scheme, since ê is computed directly from the formula given in the Appendix for the distribution of p̂_max.
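Although the paper computes ê exactly from the Appendix formulas, the bootstrap motivation can be sketched directly. The following is our illustrative implementation (the resampling size N and the function name are ours):

```python
import random

def bootstrap_delta_b(x, N=2000, seed=7):
    """Bootstrap version of the bias-corrected estimate
    delta_b = 2*p_max_hat - e_hat, with e_hat = e(p_hat) approximated by
    resampling N multinomial(n, p_hat) samples."""
    rng = random.Random(seed)
    n, k = sum(x), len(x)
    p_hat = [xi / n for xi in x]
    phat_max = max(p_hat)
    total = 0.0
    for _ in range(N):
        counts = [0] * k
        for _ in range(n):                 # one bootstrap sample of size n
            u = rng.random()
            acc = 0.0
            for i, pi in enumerate(p_hat):
                acc += pi
                if u < acc:
                    counts[i] += 1
                    break
            else:
                counts[-1] += 1
        total += max(counts) / n           # the analog p*_max
    e_hat = total / N                      # AV p*_max -> e_hat as N -> infinity
    bias = e_hat - phat_max                # (2.13)
    return phat_max - bias                 # (1.2): 2*phat_max - e_hat
```

Since e_hat ≥ p̂_max on average (the overestimation noted in Section 1), the returned value lies below the raw MLE p̂_max.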
3. Bayes solution
Consider the Dirichlet distribution given by (1.3). It is a generalization of the beta distribution. It can also be represented as the joint distribution of V₁, …, V_k, given by

V_i = W_i/(W₁ + ⋯ + W_k),   i = 1, …, k,   (3.1)
where W₁, …, W_k denote k independent gamma random variables distributed with ν₁, …, ν_k degrees of freedom, respectively, and a common scale parameter θ, say. This representation is given by Theorem 1 of Mosimann (1962). A gamma distribution with ν degrees of freedom and scale parameter θ is given by the density function

f(x) = x^{ν−1} e^{−x/θ}/(θ^ν Γ(ν)),   x > 0.

Since the distribution of V_i does not depend on θ, it is independent of Σ_{j=1}^k W_j, a sufficient and complete statistic for θ (Basu, 1955; Lehmann, 1959, Theorem 5.2), i = 1, …, k. Let
V = max(V₁, …, V_k).
From the given result it follows that

E max(W₁, …, W_k) = E(V · Σ_{j=1}^k W_j) = EV · E(Σ_{j=1}^k W_j);

hence

EV = E max(W₁, …, W_k)/E(W₁ + ⋯ + W_k).   (3.2)
Let the multinomial parameter p be a priori distributed according to the Dirichlet distribution (1.3). Then p_max is distributed as V, and from (3.2) we have that

E p_max = (1/ν̄) ∫₀^∞ (1 − Π_{i=1}^k G_{ν_i}(y)) dy,   (3.3)

where ν̄ = ν₁ + ⋯ + ν_k and G_ν(y) denotes the c.d.f. of a gamma distribution with ν degrees of freedom and scale parameter θ = 1. Given the sample values x = (x₁, …, x_k), p is a posteriori distributed according to a Dirichlet distribution with parameter (ν₁*, …, ν_k*), where ν_i* = ν_i + x_i, i = 1, …, k. Corresponding to (3.3), the posterior mean of p_max is given by

E(p_max | x) = (1/ν̄*) ∫₀^∞ (1 − Π_{i=1}^k G_{ν_i*}(y)) dy,   (3.4)

where ν̄* = ν̄ + n. The posterior mean of p_max given by (3.4) is a Bayes estimate of p_max with respect to the Dirichlet prior and the squared error as loss function.
We let ν₁ = ⋯ = ν_k = ν, say, for symmetry. Then the Bayes estimate of p_max is given by the posterior mean

p̂_ν = (1/(n + kν)) ∫₀^∞ (1 − Π_{i=1}^k G_{ν+x_i}(y)) dy.   (3.5)
We have mentioned in the introduction that the prior distribution is more diffused (noninformative) as ν decreases. The proposed estimate p̂₀ is the limiting value of p̂_ν as ν → 0, given by

p̂₀ = (1/n) ∫₀^∞ (1 − Π_{i=1}^k G_{x_i}(y)) dy.   (3.6)

Here we put G_{x_i}(y) = 1 for y > 0 when x_i = 0.
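Since the x_i are integers, the gamma c.d.f.s in (3.6) are Erlang c.d.f.s, and p̂₀ can be computed by elementary quadrature. The following sketch is ours (function names and the integration grid are assumptions, not the authors' code):

```python
import math

def gamma_cdf_int(a, y):
    """G_a(y): c.d.f. at y >= 0 of a gamma variable with integer shape a,
    scale 1; by convention G_0(y) = 1 (cf. the remark after (3.6)).
    Erlang form: G_a(y) = 1 - exp(-y) * sum_{j<a} y^j / j!."""
    if a == 0:
        return 1.0
    s, term = 1.0, 1.0                     # term = y^j / j!, starting at j = 0
    for j in range(1, a):
        term *= y / j
        s += term
    return 1.0 - math.exp(-y) * s

def p0_hat(x, steps=4000):
    """Limiting Bayes estimate (3.6):
    p0 = (1/n) * integral_0^inf (1 - prod_i G_{x_i}(y)) dy,
    approximated by the trapezoid rule on a truncated range."""
    n = sum(x)
    hi = n + 10.0 * math.sqrt(n) + 10.0    # integrand is ~0 beyond the gamma bulk
    h = hi / steps
    total = 0.0
    for j in range(steps + 1):
        y = j * h
        w = 0.5 if j in (0, steps) else 1.0
        prod = 1.0
        for xi in x:
            prod *= gamma_cdf_int(xi, y)
        total += w * (1.0 - prod)
    return total * h / n
```

As a check, for k = 2 and x = (1, 1) the integral can be done in closed form: (1/2)∫(2e^{−y} − e^{−2y}) dy = 3/4, so p̂₀ = 0.75 although p̂_max = 0.5.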
Admissibility of p̂₀. When k = 2, the admissibility of p̂₀ is established by a standard method, as follows. First we give a lemma.
Lemma 3.1. Let ν > 0. Then

0 ≤ (n + kν) p̂_ν − n p̂₀ ≤ kν.

Proof. Since G_ν(y) is decreasing in ν, from (3.5) and (3.6) we have that

0 ≤ (n + kν) p̂_ν − n p̂₀ = ∫₀^∞ (Π_{i=1}^k G_{x_i}(y) − Π_{i=1}^k G_{ν+x_i}(y)) dy
≤ Σ_{i=1}^k ∫₀^∞ (G_{x_i}(y) − G_{ν+x_i}(y)) dy   (3.7)
= kν.   (3.8)

The inequality (3.7) follows from the following result: Let 0 ≤ a_i, b_i ≤ 1 and a_i − b_i ≥ 0, i = 1, …, k. Then

Π_{i=1}^k a_i − Π_{i=1}^k b_i ≤ Σ_{i=1}^k (a_i − b_i).

This result is proved easily by induction. The equality (3.8) holds since each integral in (3.7) equals the difference (ν + x_i) − x_i = ν of the two gamma means. The lemma follows from (3.8). □
Theorem 3.1. The estimate p̂₀ is admissible with respect to the squared error loss.

Proof. Suppose that p̂₀ is inadmissible, being dominated by an estimator δ̃, say. Let ρ(δ) denote the Bayes risk of δ with respect to the Dirichlet prior D_k(p; ν, …, ν). From (1.3) we have that

D_k(p; ν, …, ν) = [Γ(kν)/(Γ(ν))^k] (p₁ ⋯ p_k)^{ν−1}.   (3.9)
Hence for 0 < ν < 1,

lim_{ν→0} ν^{−k+1}(ρ(p̂₀) − ρ(δ̃)) > 0.   (3.10)
Now

ρ(p̂₀) − ρ(δ̃) ≤ ρ(p̂₀) − ρ(p̂_ν)   (3.11)
= E(p̂₀ − p_max)² − E(p̂_ν − p_max)²
= E_m(p̂₀ − p̂_ν)²   (3.12)
≤ k²ν²/n²,   (3.13)
where E_m denotes the expectation with respect to the marginal distribution of x, given by

m(x) = [n! Γ(kν)/Γ(n + kν)] Π_{i=1}^k [Γ(ν + x_i)/(Γ(ν) x_i!)].   (3.14)

The inequality (3.11) follows since p̂_ν is a Bayes estimate. The equality (3.12) follows from the fact that p̂_ν is the mean of the posterior distribution of p_max, given x. The inequality (3.13) follows from Lemma 3.1.
Let k = 2. It follows from (3.13) that

lim_{ν→0} ν^{−1}(ρ(p̂₀) − ρ(δ̃)) ≤ 0,

in contradiction of (3.10). Hence p̂₀ is admissible when k = 2. □
k > 2: The method given above for establishing the admissibility of p̂₀ when k = 2 does not go through when k > 2. In this case, we proceed as follows. In a study of admissibility for statistical problems with finite sample space, Brown (1981) proposed a stepwise algorithm to construct Bayes procedures. Let N(x) (M(p)) denote the number of components of x (p) equal to 0. Let X_i (Ω_i) denote the set of points x (p) in the sample (parameter) space for which N(x) (M(p)) ≤ k − i, i = 1, …, k. Let γ(p_i) = p_i for p_i > 0 and γ(p_i) = 1 for p_i = 0, and let π_i be a prior distribution on Ω_i, given by the density

π_i(p) ∝ Π_{j=1}^k (γ(p_j))^{−1},

with support on Ω_i − Ω_{i+1}, i = 1, …, k, where Ω_{k+1} denotes the null set. We have a sequence of sample spaces X = X₁ ⊇ X₂ ⊇ ⋯ ⊇ X_k and a related sequence of priors {π_i}, satisfying the condition (2.2) of Brown. Using Brown's terminology, a procedure d in a decision problem is called totally Bayes relative to the sequence of priors π₁, …, π_k if d is Bayes relative to π_i on X_i, i = 1, …, k. It is seen that p̂₀ is a uniquely determined totally Bayes procedure for estimating p_max under the squared error loss function. It follows from Theorem 2.4 of Brown that p̂₀ is admissible.
Remark 3.1. In the discussion leading to Theorem 2.4 of Brown, it is assumed that the prior distributions are proper distributions. Here π_i is an improper distribution. But the theorem is still applicable, since the corresponding Bayes risk is bounded.
Remark 3.2. A related problem of interest is the estimation of the larger translation parameter. Let y₁ and y₂ be independent random variables, where y_i is distributed according to the density function f(x − θ_i), i = 1, 2. Consider the problem of estimating φ(θ₁, θ₂) = max(θ₁, θ₂) with the squared error as loss function. Blumenthal and Cohen (1968) have shown that under suitable conditions, one of which is that f is symmetric, φ(y₁, y₂) is not admissible in general. Moreover, φ(y₁, y₂) need not be minimax when f is not symmetric.
4. Two-stage procedure
We have mentioned above that for estimating p_max as a measure of diversity of the multinomial population, the specification of the multinomial event associated with p_max may not be consequential. However, in some other applications it may be appropriate to incorporate the selection of the most probable category in the estimation procedure. We consider a two-stage procedure, where first we select the most probable category, then we estimate the probability associated with the selected category. For estimating p_max the loss is given by the sum of two losses L₁ and L₂, where L₁ is the loss due to the selection of the most probable category and L₂ is the loss due to the estimation of the probability associated with the selected category. Specifically, we let L₁ = a (0) if an incorrect (a correct) selection has been made, and L₂ = b(δ − p_i)² when the ith category is selected, where a and b are positive constants, and δ is an estimate of p_i, being considered as an estimate of p_max.
Given the observation x, a selection rule is a vector Ψ(x) = (Ψ₁(x), …, Ψ_k(x)), where Ψ_i(x) denotes the probability of selecting the ith category, and Σ_{i=1}^k Ψ_i(x) = 1. Suppose that the multinomial parameter p is a priori distributed according to the Dirichlet distribution (1.3), where ν₁ = ⋯ = ν_k = ν, say. Since p is a posteriori distributed according to a Dirichlet distribution with parameter values (ν + x₁, …, ν + x_k), from the representation (3.1) of the Dirichlet distribution we have that the posterior expected loss (L₁) due to selection is given by

A(x) = a Σ_{i=1}^k Ψ_i(x) A_i(x),   (4.1)

where

A_i(x) = 1 − ∫₀^∞ Π_{j≠i} G_{x_j+ν}(y) dG_{x_i+ν}(y)   (4.2)

is the posterior probability that the ith category is not the most probable one.
Let x_{(i)} denote the ith ordered component of x, and let A_{(i)}(x) be obtained from (4.2) by substituting x_{(i)} for x_i and x_{(j)} for x_j. Since A_i(x) is decreasing in x_i and increasing in x_j (j ≠ i), we have that

A_{(1)}(x) ≥ A_{(2)}(x) ≥ ⋯ ≥ A_{(k)}(x).

Hence, A(x) is minimized by putting Ψ_i(x) = 1 and Ψ_j(x) = 0 for j ≠ i, when x_i = max(x₁, …, x_k). That is, the procedure which selects the category associated with the largest component of x (breaking ties by randomization) is a Bayes procedure with respect to the loss L₁. The corresponding posterior risk is equal to a A_{(k)}(x).
Next consider the loss L₂. Suppose that the ith category has been selected. The posterior expected loss is minimized by letting

δ = (ν + x_i)/(kν + n),   (4.3)

representing the posterior mean of p_i. The minimum expected loss is equal to b B_i(x), where B_i(x) denotes the variance of p_i under the posterior distribution, given by

B_i(x) = (ν + x_i)((k − 1)ν + n − x_i)/((kν + n)²(1 + kν + n))   (4.4)
≤ (4(1 + kν + n))^{−1}
= c_n, say.   (4.5)
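The posterior mean (4.3) and variance (4.4) come directly from the Dirichlet posterior. A minimal sketch (ours; the function name is an assumption):

```python
def posterior_mean_var(x, i, nu):
    """Posterior mean (4.3) and variance B_i(x) (4.4) of p_i under the
    symmetric Dirichlet(nu, ..., nu) prior, given the count vector x."""
    n, k = sum(x), len(x)
    a0 = k * nu + n                        # posterior total mass k*nu + n
    ai = nu + x[i]                         # posterior mass of cell i
    mean = ai / a0
    var = ai * (a0 - ai) / (a0 ** 2 * (1 + a0))
    return mean, var
```

Whatever the counts, the variance never exceeds the bound c_n = 1/(4(1 + kν + n)) of (4.5), since a_i(a₀ − a_i) ≤ a₀²/4.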
Let

Q_i(x) = a A_i(x) + b B_i(x)

and

Q(x) = min(Q₁(x), …, Q_k(x)).

The posterior risk with respect to the combined losses L₁ and L₂ is given by

R(x) = Σ_{i=1}^k Ψ_i(x) Q_i(x).   (4.6)
The above formula leads to the following theorem.

Theorem 4.1. The two-stage procedure which selects the ith category for which Q_i(x) = Q(x), together with the estimate of p_i being given by (4.3), is Bayes with respect to the Dirichlet prior (3.9) and the combined losses L₁ and L₂.
A similar result has been derived by Gupta and Hande (1993) for the two-stage procedure with combined losses L₁ and L₂, where the loss due to selecting the ith category is given by L₁ = p_max − p_i. The Bayes rule corresponding to this loss function turns out to be (computationally) simpler than the Bayes rule given by Theorem 4.1.
We have shown above that the procedure which selects the category associated with the largest component of x is a Bayes procedure with respect to the loss L₁. It is shown below (Lemma 4.1) that the inequality (4.7) holds for i < k and sufficiently large n, where E_m denotes the expectation with respect to the marginal distribution of x, given by (3.14). On the other hand, from (4.5) it is seen that the expected value of the loss L₂ is bounded by b c_n. Since c_n → 0 as n → ∞, we have the following result.
Theorem 4.2. The two-stage procedure which selects the ith category for which x_i = x_{(k)}, together with the estimate of p_i being given by (4.3), is Bayes with respect to the Dirichlet prior (3.9) and the combined losses L₁ and L₂, for sufficiently large n.
The estimate δ given by (4.3) tends to p̂_i as ν → 0. Let δ₀ denote the two-stage procedure which selects the ith category for which x_i = x_{(k)}, together with the estimate of p_i being given by p̂_max, corresponding to the Bayes rule of Theorem 4.2 as ν → 0. An application of the standard method used in the proof of Theorem 3.1 for admissibility shows that δ₀ is admissible. Hence:
Theorem 4.3. The two-stage procedure which selects the category associated with the largest component of x, together with the estimate of p_max being given by p̂_max, is admissible with respect to the combined losses L₁ and L₂, for sufficiently large n.
Lemma 4.1. For 1 ≤ i ≤ k − 1,

E_m(A_{(i)}(x) − A_{(k)}(x)) > 0   (4.7)

for sufficiently large n.

Proof. From (4.2) we have that

A_{(i)}(x) − A_{(k)}(x) = ∫₀^∞ Π_{j≠k} G_{x_{(j)}+ν}(y) dG_{x_{(k)}+ν}(y) − ∫₀^∞ Π_{j≠i} G_{x_{(j)}+ν}(y) dG_{x_{(i)}+ν}(y).   (4.8)

In the right-hand side of (4.8) the first integral is bounded below by

1/k,   (4.9)
whereas the second integral is bounded above by

∫₀^∞ G_{x_{(k)}+ν}(y) dG_{x_{(i)}+ν}(y) ≈ Φ((x_{(i)} − x_{(k)})/(x_{(i)} + x_{(k)} + 2ν)^{1/2})   (4.10)
≤ Φ((x_{(i)} − x_{(k)})/(n + 2ν)^{1/2})   (4.11)

as n → ∞, where Φ denotes the standard normal c.d.f. The approximation (4.10) follows from the asymptotic normality of the gamma distribution for large degrees of freedom. The inequality (4.11) follows from the fact that x_{(i)} − x_{(k)} ≤ 0.
Let λ ≥ 0. Under the marginal distribution of x, we have that for large n

P{|x_i − x_j| ≤ λ√n} ≈ E_p[Φ((λ − √n(p_i − p_j))/q) − Φ((−λ − √n(p_i − p_j))/q)]
= ∫ [Φ((λ − √n(p_i − p_j))/q) − Φ((−λ − √n(p_i − p_j))/q)] dH(p_i, p_j)
→ 0 as n → ∞,   (4.12)

where q = (p_i + p_j − (p_i − p_j)²)^{1/2}, E_p denotes the expectation under the prior distribution of p, given by (3.9), and H(p_i, p_j) denotes the marginal distribution of (p_i, p_j).
It follows from (4.12) that

E_m Φ((x_{(i)} − x_{(k)})/(n + 2ν)^{1/2}) → 0 as n → ∞.   (4.13)

The lemma follows from (4.9) and (4.13). □
5. Empirical results
We have compared empirically the performance of the proposed estimates p̂_max, δ*, δ_b and p̂₀ with respect to the mean squared error. Table 1 below shows the values of n MSE for the four estimates, under the slippage configuration of the multinomial parameter p, where p_max exceeds each of the remaining components of p by a given number ε > 0. An alternative specification of the slippage configuration is given by the ratio of p_max to each of the remaining components of p. The specification of the ratio has been used in the literature for the problem of selecting the largest component of p.
The table gives the values of n MSE for k = 2, 3, 4, n = 20, 50, and ε = 0(0.1)0.9. We observe from the table the following characteristics of the proposed estimates.

The value of MSE for δ* increases as ε varies from 0.1 to 0.9. On the other hand, for p̂_max, δ_b and p̂₀, the MSE is first increasing then decreasing. Broadly, the values of n MSE are nearly equal for n = 20 and n = 50, showing that the normalization (multiplication of MSE by n) is appropriate.
Table 1
Mean squared error of p̂_max, δ*, δ_b and p̂₀, multiplied by n
[Layout: for each of k = 2, 3, 4, the rows are ε = 0.0(0.1)0.9 and the columns give n MSE for the estimates p̂_max, δ*, δ_b and p̂₀ at sample sizes n = 20 and n = 50.]
Among the four estimates, it is seen that p̂_max and p̂₀ are comparable. The estimate δ* performs best (worst) among the four estimates for small (large) values of ε. The estimate δ_b performs worst for small values of ε, but it is comparable with p̂_max and p̂₀ for large values of ε.
The relative performance of the given estimates is shown graphically in Fig. 1 below for k = 3 and n = 50. It is seen from the figure that for ε > 0.5, p̂_max, δ_b and p̂₀ have nearly the same risk, which is decreasing in ε, but the risk of δ* is considerably larger and it is increasing in ε. We observe that δ* shrinks p̂_max towards the minimum value 1/k. The shrinkage affects the estimate negatively when ε is large. This is a reason for the poor performance of δ* for large values of ε, compared to the other estimates.

Fig. 1. Mean squared error of estimates (k = 3, n = 50).
Conclusion
The overall conclusion is that p̂_max and p̂₀ are preferable choices under the mean squared error criterion of performance. Between the two, p̂_max is preferred from the computational aspect, even though p̂₀ is shown to be admissible, whereas it is not known whether p̂_max is admissible with respect to the squared error loss.
We have noted above that p̂_max tends to overestimate p_max. It would be interesting to consider the bias of p̂_max. Since p̂_max is a Schur-convex function of x and the multinomial distribution is parameterized to preserve Schur-convexity, it follows that E p̂_max is a Schur-convex function of p (Marshall and Olkin, 1979). Hence, given p_max, E p̂_max is minimized in the slippage configuration, given by p = (p_max, ξ, …, ξ), where ξ = (1 − p_max)/(k − 1), and E p̂_max is maximized for the least diverse configuration, given by p = (η, …, η, θ, 0, …, 0), where p_max = η > θ ≥ 0. That is, given p_max, the bias of p̂_max is minimized (maximized) for the slippage (least diverse) configuration, given above.
Since the estimate δ_b is designed to correct the bias of p̂_max, it is also interesting to consider the bias of δ_b. The following table shows values of √n times the bias of p̂_max and δ_b for k = 3, n = 50 and the slippage configuration, specified by ε = 0(0.1)0.9. We have that p_max = (1 + (k − 1)ε)/k. The equi-parameter case corresponds to ε = 0. It is seen from the table that the bias of p̂_max decreases as ε increases, and is negligible for ε > 0.3. The bias of δ_b is small, except for ε = 0.
Values of √n times the bias of p̂_max and δ_b

ε =      0.0    0.1     0.2     0.3     0.4    0.5    0.6    0.7    0.8    0.9
p̂_max:  0.470  0.021   0.004   0.000   0.000  0.000  0.000  0.000  0.000  0.000
δ_b:    0.040  −0.003  −0.009  −0.004  0.001  0.000  0.000  0.000  0.000  0.000
6. Appendix

6.1. Distribution of p̂_max
Approximate methods for computing the c.d.f. of p̂_max have been given by Kozelka (1956), Mallows (1968), Yusas (1972), Proschan and Sethuraman (1975), and Glaz and Johnson (1984). Hoover (1990) has derived higher-order Bonferroni-type inequalities which are used to derive the approximate distribution. Here we give the exact distribution. A recursion formula for computing the c.d.f. of p̂_max is given as follows.
Let C_max denote the largest component of the multinomial vector x, and let

$$C_k(v; n, p_{(k)}) = P\{C_{\max} \le v\},$$

where k denotes the dimension of x, v is a positive integer and p_{(k)} = (p_1, \ldots, p_k) denotes the associated probability vector. Let [y] denote the greatest integer ≤ y. We have that for k = 2

$$C_2(v; n, p_{(2)}) = \begin{cases} 1, & v \ge n,\\ \sum_{r=n-v}^{v} \binom{n}{r} p_1^r p_2^{n-r}, & n > v \ge n/2,\\ 0, & v < n/2, \end{cases} \tag{A.1}$$

and for k ≥ 3

$$C_k(v; n, p_{(k)}) = \begin{cases} \sum_{r=\max\{0,\, n-(k-1)v\}}^{v} \binom{n}{r} p_k^r (1-p_k)^{n-r}\, C_{k-1}(v; n-r, \bar p_{(k-1)}), & v \ge n/k,\\ 0, & v < n/k, \end{cases} \tag{A.2}$$

where \bar p_{(k-1)} = (p_1/(1-p_k), \ldots, p_{k-1}/(1-p_k)).
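The recursion conditions on the count r in the last cell and recurses on the remaining k − 1 cells with renormalized probabilities. A direct transcription into code might look as follows (a sketch with my own function name; not from the paper):

```python
from math import comb

def cdf_cmax(v, n, p):
    """Exact P(C_max <= v) for a Multinomial(n, p) vector via (A.1)-(A.2):
    condition on the last cell's count r, then recurse on the remaining
    k - 1 cells with probabilities renormalized by (1 - p_k)."""
    k = len(p)
    if v >= n:
        return 1.0
    if k * v < n:                      # pigeonhole: some cell must exceed v
        return 0.0
    if k == 2:
        return sum(comb(n, r) * p[0]**r * p[1]**(n - r)
                   for r in range(n - v, v + 1))
    pk = p[-1]
    rest = [q / (1 - pk) for q in p[:-1]]
    lo = max(0, n - (k - 1) * v)
    return sum(comb(n, r) * pk**r * (1 - pk)**(n - r)
               * cdf_cmax(v, n - r, rest)
               for r in range(lo, min(v, n) + 1))

# e.g. n = 4 trials over k = 3 equally likely cells: P(C_max <= 2) = 2/3
print(cdf_cmax(2, 4, [1/3, 1/3, 1/3]))
```

The printed value can be confirmed by hand: P(C_max ≥ 3) = 3·P(X_1 ≥ 3) = 3·(8/81 + 1/81) = 1/3.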
274 K. Alam, Z. Feng / Journal of Statistical Planning and Inference 59 (1997) 257–277
A.2. Equiparameter configuration
Consider the special case when p_1 = ⋯ = p_k. In this case the recursion formula for the c.d.f. of C_max may be given as follows. Let q_{(k)} = (1/k, ..., 1/k), a k-dimensional vector. We have that

$$C_2(v; n, q_{(2)}) = \begin{cases} 1, & v \ge n,\\ 2^{-n} \sum_{r=n-v}^{v} \binom{n}{r}, & n > v \ge n/2,\\ 0, & v < n/2, \end{cases} \tag{A.3}$$

and for k ≥ 3, C_k(v; n, q_{(k)}) is obtained from (A.2) with the substitution

$$(1/k)^n \sum_{r=\max\{0,\, n-(k-1)v\}}^{v} \binom{n}{r} (k-1)^{n-r}\, C_{k-1}(v; n-r, q_{(k-1)})$$

for the summation on the right-hand side. Here q_{(k-1)} = (1/(k-1), ..., 1/(k-1)) is a (k − 1)-dimensional vector.
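The equiparameter substitution specializes the sketch above; a self-contained version (my naming) is:

```python
from math import comb

def cdf_cmax_equal(v, n, k):
    """P(C_max <= v) when p_1 = ... = p_k = 1/k, using (A.3) for k = 2
    and the equiparameter substitution in (A.2) for k >= 3."""
    if v >= n:
        return 1.0
    if k * v < n:
        return 0.0
    if k == 2:
        return sum(comb(n, r) for r in range(n - v, v + 1)) / 2**n
    lo = max(0, n - (k - 1) * v)
    return sum(comb(n, r) * (k - 1)**(n - r)
               * cdf_cmax_equal(v, n - r, k - 1)
               for r in range(lo, min(v, n) + 1)) / k**n

print(cdf_cmax_equal(2, 4, 3))   # agrees with the general recursion: 2/3
```

Keeping the binomial coefficients and powers of k − 1 as integers until the final division by k^n avoids accumulating floating-point error in the recursion.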
The formula for the probability function of C_max is given as follows. Let

$$Q_k^{(n)}(v) = P\{C_{\max} = v \mid p = q_{(k)}\}.$$

We have that for k = 2

$$Q_2^{(n)}(v) = \begin{cases} 2^{1-n} \binom{n}{v}, & n \ge v > n/2,\\ 2^{-n} \binom{n}{v}, & v = n/2,\\ 0, & v < n/2, \end{cases} \tag{A.4}$$

and for k ≥ 3

$$Q_k^{(n)}(v) = \begin{cases} (1/k)^n \left\{ \sum_{r=1}^{m} \binom{k}{r} \frac{n!}{(v!)^r (n-rv)!} (k-r)^{n-rv}\, C_{k-r}(v-1; n-rv, q_{(k-r)}) \right.\\ \qquad \left. {} + k\, \frac{n!}{(v!)^{k-1} (n-(k-1)v)!}\, I(n-(k-1)v \ge 0) \right\}, & n \ge v > n/k,\\ (1/k)^n\, n!/(v!)^k, & v = n/k,\\ 0, & v < n/k, \end{cases} \tag{A.5}$$

where m = min(k − 2, [n/v]) and I denotes the indicator function. The r-th term of the summation inside the braces represents the probability that C_max = v for exactly r (≤ k − 2) of the components of the multinomial vector x. The last term inside the braces represents the probability that C_max = v for exactly k − 1 of the components of x.
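Because (A.5) is intricate, a convenient cross-check is to obtain the same probability function by differencing the c.d.f., Q_k^(n)(v) = C_k(v; n, q_(k)) − C_k(v − 1; n, q_(k)). The sketch below (my naming; the equiparameter recursion is repeated so the block is self-contained) verifies that the resulting probabilities sum to one.

```python
from math import comb, ceil

def cdf_cmax_equal(v, n, k):
    # Equiparameter c.d.f. recursion, as in (A.3) and the substitution in (A.2).
    if v >= n:
        return 1.0
    if k * v < n:
        return 0.0
    if k == 2:
        return sum(comb(n, r) for r in range(n - v, v + 1)) / 2**n
    lo = max(0, n - (k - 1) * v)
    return sum(comb(n, r) * (k - 1)**(n - r)
               * cdf_cmax_equal(v, n - r, k - 1)
               for r in range(lo, min(v, n) + 1)) / k**n

def pmf_cmax_equal(v, n, k):
    """Q_k^(n)(v) = P(C_max = v), obtained as C_k(v) - C_k(v - 1)."""
    return cdf_cmax_equal(v, n, k) - cdf_cmax_equal(v - 1, n, k)

n, k = 10, 4
probs = {v: pmf_cmax_equal(v, n, k) for v in range(ceil(n / k), n + 1)}
print(sum(probs.values()))   # the probabilities over n/k <= v <= n sum to 1
```

The sum telescopes to C_k(n) − C_k([n/k] − 1) = 1 − 0, so any implementation error in the recursion shows up immediately.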
A.3. Slippage configuration
The slippage configuration, given by

$$p = \left(\eta, \frac{1-\eta}{k-1}, \ldots, \frac{1-\eta}{k-1}\right),$$

where η = p_max > 1/k, is of special interest. It arises, for example, in the problem of selecting the most probable category, as the least favourable configuration (Kesten and Morse, 1959). For this case we have that

$$P\{C_{\max} = v\} = \binom{n}{v} \eta^v (1-\eta)^{n-v}\, C_{k-1}(v; n-v, q_{(k-1)}) + \sum_{r=0}^{v-1} \binom{n}{r} \eta^r (1-\eta)^{n-r}\, Q_{k-1}^{(n-r)}(v) \tag{A.6}$$

for v ≥ n/k.
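The decomposition in (A.6) conditions on the count in the slipped cell: either that cell attains the maximum v (with the remaining equiprobable cells staying at or below v), or it has some count r < v and the maximum v occurs among the rest. A sketch (my function names; the equiparameter routines repeat the recursions above so the block runs on its own):

```python
from math import comb, ceil

def cdf_eq(v, n, k):
    # Equiparameter c.d.f., as in (A.3) and the substitution in (A.2).
    if v >= n:
        return 1.0
    if k * v < n:
        return 0.0
    if k == 2:
        return sum(comb(n, r) for r in range(n - v, v + 1)) / 2**n
    lo = max(0, n - (k - 1) * v)
    return sum(comb(n, r) * (k - 1)**(n - r) * cdf_eq(v, n - r, k - 1)
               for r in range(lo, min(v, n) + 1)) / k**n

def pmf_eq(v, n, k):
    return cdf_eq(v, n, k) - cdf_eq(v - 1, n, k)

def pmf_slippage(v, n, k, eta):
    """P(C_max = v) under p = (eta, (1-eta)/(k-1), ..., (1-eta)/(k-1)),
    following (A.6): either the slipped cell equals v and the rest stay
    <= v, or it equals some r < v and the maximum v occurs among the rest."""
    head = comb(n, v) * eta**v * (1 - eta)**(n - v) * cdf_eq(v, n - v, k - 1)
    tail = sum(comb(n, r) * eta**r * (1 - eta)**(n - r)
               * pmf_eq(v, n - r, k - 1)
               for r in range(0, v))
    return head + tail

n, k, eta = 6, 3, 0.5
total = sum(pmf_slippage(v, n, k, eta) for v in range(ceil(n / k), n + 1))
print(total)   # the two-term decomposition partitions the sample space: 1
```

Setting η = 1/k recovers the equiparameter case, which gives a second check: pmf_slippage must then agree with differencing the equiparameter c.d.f.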
A.4. Asymptotic distribution

We consider the asymptotic distribution of p̂_max as n → ∞. The distribution is derived from the asymptotic normality of the multinomial distribution, as follows. Suppose that r of the components of p are each equal to p_max, which are assumed, without loss of generality, to be the first r components. The remaining k − r components are each less than p_max. We have that for any real number c

$$\lim_{n \to \infty} P\{\sqrt{n}(\hat p_{\max} - p_{\max}) \le c\} = \lim_{n \to \infty} P\Big\{\max_{i=1,\ldots,r} \sqrt{n}(\hat p_i - p_{\max}) \le c\Big\}. \tag{A.7}$$

Now, (√n(p̂_1 − p_max), ..., √n(p̂_r − p_max)) is asymptotically distributed according to N(0, Σ), where Σ = (σ_ij) is given by

$$\sigma_{ii} = p_{\max}(1 - p_{\max}), \qquad \sigma_{ij} = -p_{\max}^2, \quad i \ne j,$$

i, j = 1, ..., r. From (A.7) it follows that

$$\lim_{n \to \infty} P\{\sqrt{n}(\hat p_{\max} - p_{\max}) \le c\} = P\big\{W_{(r)} \le c/\sqrt{p_{\max}(1 - p_{\max})}\big\}, \tag{A.8}$$

where W_{(r)} denotes the largest component of a random vector W = (W_1, ..., W_r) which is distributed according to a multivariate normal distribution N(0, Ω), where Ω is an equicorrelation matrix with the common correlation equal to

$$-p_{\max}/(1 - p_{\max}) = -\rho, \text{ say}.$$
Remark A.1. If p_max = 1/r then ρ = 1/(r − 1) and so 1′Ω1 = 0, where 1 = (1, ..., 1). Hence Σ_{i=1}^{r} W_i = 0 and so W_{(r)} ≥ 0 with probability 1. In this case both sides of (A.8) are nonnegative with probability 1. Note that (A.8) appears above as (2.3).
Let δ > 0 and let U be a standard normal random variable, independent of W. Then for any real number w

$$P\{W_{(r)} + \delta U \le w\} = P\{W_i + \delta U \le w, \ i = 1, \ldots, r\} = \int_{-\infty}^{\infty} \Phi^r\!\left(\frac{w - \sqrt{\delta^2 - \rho}\, z}{\sqrt{1 + \rho}}\right) d\Phi(z).$$

Putting δ = 0, we get

$$P\{W_{(r)} \le w\} = \int_{-\infty}^{\infty} \Phi^r\!\left(\frac{w - \sqrt{-\rho}\, z}{\sqrt{1 + \rho}}\right) d\Phi(z). \tag{A.9}$$

The substitution δ = 0 is valid since Φ(·) is an analytic function of its argument. Note that the right-hand side of (A.9) is real valued since the standard normal density function is symmetric about the origin. Since W_1 + √ρ U, ..., W_r + √ρ U are i.i.d. according to N(0, 1 + ρ), the mean and the variance of W_{(r)} are given by

$$E W_{(r)} = (1 + \rho)^{1/2} E Z_{(r)}, \tag{A.10}$$

$$\mathrm{Var}(W_{(r)}) = (1 + \rho)\, \mathrm{Var}\, Z_{(r)} - \rho, \tag{A.11}$$

where Z_{(r)} denotes the largest order statistic in a sample of r observations from N(0, 1). The mean and variance of order statistics from the normal distribution have been tabulated. See David (1981) for references to the tables.
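Identities (A.10) and (A.11) are straightforward to verify by simulation. The sketch below is mine (r, ρ, the seed and sample size are arbitrary choices); it uses r = 2 and ρ = 0.5, for which E Z_(2) = 1/√π and Var Z_(2) = 1 − 1/π are standard results for the maximum of two i.i.d. standard normals.

```python
import numpy as np
from math import sqrt, pi

rng = np.random.default_rng(1)
r, rho = 2, 0.5

# Equicorrelated W with unit variances and common correlation -rho.
omega = (1 + rho) * np.eye(r) - rho * np.ones((r, r))
w_max = rng.multivariate_normal(np.zeros(r), omega, size=1_000_000).max(axis=1)

ez = 1 / sqrt(pi)    # E Z_(2) for two i.i.d. N(0, 1) variables
vz = 1 - 1 / pi      # Var Z_(2)
print(w_max.mean(), sqrt(1 + rho) * ez)    # check (A.10)
print(w_max.var(), (1 + rho) * vz - rho)   # check (A.11)
```

With a million draws the simulated mean and variance of W_(r) agree with the right-hand sides of (A.10) and (A.11) to about three decimal places.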
References
Alam, K. (1971). On selecting the most probable category. Technometrics 13, 843-850.
Alam, K. (1979). Estimation of multinomial probabilities. Ann. Statist. 7, 282-283.
Alam, K. and A. Mitra (1986). An empirical Bayes estimate of multinomial probabilities. Commun. Statist. Theor. Method. 15, 3103-3127.
Basu, D. (1955). On statistics independent of a complete sufficient statistic. Sankhya 15, 377-380.
Bishop, Y.M.M., S.E. Fienberg and P.W. Holland (1975). Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, MA.
Blumenthal, S. and A. Cohen (1968). Estimation of the larger translation parameter. Ann. Math. Statist. 39, 502-516.
Brown, L.D. (1981). A complete class theorem for statistical problems with finite sample spaces. Ann. Statist. 9, 1289-1300.
David, H.A. (1981). Order Statistics, 2nd edn. Wiley, New York.
Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans. CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, Philadelphia, second printing.
Fienberg, S.E. and P.W. Holland (1973). Simultaneous estimation of multinomial cell probabilities. J. Amer. Statist. Assoc. 68, 683-691.
Gini, C. (1912). Variabilita e mutabilita. Studi Economico-Giuridici della Facolta di Giurisprudenza della Universita di Cagliari III, Parte II, p. 80.
Glaz, J. and B. Johnson (1984). Probability inequalities for multivariate distributions with dependence structures. J. Amer. Statist. Assoc. 79, 436-440.
Good, I.J. (1965). The Estimation of Probabilities. MIT Press, Cambridge, MA.
Good, I.J. (1967). A Bayesian significance test for multinomial distributions (with discussion). J. Roy. Statist. Soc. Ser. B 29, 399-431.
Gupta, S.S. and S.N. Hande (1993). Single-sample Bayes and empirical Bayes rules for ranking and estimating multinomial probabilities. J. Statist. Plann. Inference 35, 367-382.
Gupta, S.S. and S. Panchapakesan (1979). Multiple Decision Procedures: Methods for Selecting and Ranking Populations. Wiley, New York.
Hoover, D.R. (1990). Complement addition upper bounds: an improved inclusion-exclusion method. J. Statist. Plann. Inference 29, 195-202.
Johnson, B.McK. (1971). On the admissible estimators for certain fixed sample binomial problems. Ann. Math. Statist. 42, 1579-1587.
Kesten, H. and N. Morse (1959). A property of the multinomial distribution. Ann. Math. Statist. 30, 120-127.
Kozelka, R.M. (1956). Approximate upper percentage points for extreme values in multinomial sampling. Ann. Math. Statist. 27.
Kuo, L. and N. Mukhopadhyay (1990). Multi-stage point and interval estimation of the largest mean of K normal populations and the associated second-order properties. Metrika 37, 291-300.
Lehmann, E.L. (1959). Testing Statistical Hypotheses. Wiley, New York.
Mallows, C.L. (1968). An inequality involving multinomial probabilities. Biometrika 55, 422-424.
Marshall, A.W. and I. Olkin (1979). Inequalities: Theory of Majorization and Its Applications. Academic Press, New York.
Mosimann, J.E. (1962). On the compound multinomial distribution, the multivariate beta-distribution, and correlations among proportions. Biometrika 49, 65-82.
Mukhopadhyay, N., S. Chattopadhyay and S.K. Sahu (1993). Further developments in estimation of the largest mean of K normal populations. Metrika 40, 173-183.
Olkin, I. and M. Sobel (1979). Admissible and minimax estimation for the multinomial distribution and for k independent binomial distributions. Ann. Statist. 7, 284-290.
Patil, G.P. and C. Taillie (1982). Diversity as a concept and its measurement. J. Amer. Statist. Assoc. 77, 548-567.
Proschan, F. and J. Sethuraman (1975). Sample multivariate inequalities using association. Theor. Probab. Appl. 20, 193-195.
Rutkowska, M. (1977). Minimax estimation of the parameters of the multivariate hypergeometric and multinomial distributions. Zastosowania Matematyki 16, 9-21.
Saxena, K.M.L. and Y.L. Tong (1969). Interval estimation of the largest mean of K normal populations with known variances. J. Amer. Statist. Assoc. 64, 296-299.
Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.
Simpson, E.H. (1949). Measurement of diversity. Nature 163, 688.
Steinhaus, H. (1957). The problem of estimation. Ann. Math. Statist. 28, 633-648.
Tong, Y.L. (1970). Multi-stage interval estimation of the largest of K normal means. J. Roy. Statist. Soc. Ser. B 32, 272-277.
Trybula, S. (1958). Some problems of simultaneous minimax estimation. Ann. Math. Statist. 29, 245-253.
Yusas, I.S. (1972). On the distribution of maximum frequency of multinomial distributions. Theoret. Probab. Appl. 17, 711-717.