CHAPTER 8
More About Estimation
8.1 Bayesian Estimation
In this chapter we introduce further concepts related to estimation, beginning with Bayesian estimates, which are also based upon sufficient statistics if the latter exist. We shall now describe the Bayesian approach to the problem of estimation. This approach takes into account any prior knowledge of the experiment that the statistician has, and it is one application of a principle of statistical inference that may be called Bayesian statistics.

Consider a random variable X that has a distribution of probability that depends upon the symbol θ, where θ is an element of a well-defined set Ω.

Θ: a random variable that has a distribution of probability over the set Ω.
x: a possible value of the random variable X.
θ: a possible value of the random variable Θ.

The distribution of X depends upon θ, an experimental value of the random variable Θ. We shall denote the p.d.f. of Θ by h(θ), and we take h(θ) = 0 when θ is not an element of Ω. Moreover, we now denote the p.d.f. of X by f(x|θ), since we think of it as a conditional p.d.f. of X, given Θ = θ. Say X₁, X₂, ..., Xₙ is a random sample from this conditional distribution of X. Thus we can write the joint conditional p.d.f. of X₁, X₂, ..., Xₙ, given Θ = θ, as f(x₁|θ)f(x₂|θ)···f(xₙ|θ). Thus the joint p.d.f. of X₁, X₂, ..., Xₙ and Θ is

    g(x_1, x_2, \dots, x_n, \theta) = f(x_1|\theta) f(x_2|\theta) \cdots f(x_n|\theta) h(\theta).
If Θ is a random variable of the continuous type, the joint marginal p.d.f. of X₁, X₂, ..., Xₙ is given by

    g_1(x_1, x_2, \dots, x_n) = \int_{-\infty}^{\infty} g(x_1, x_2, \dots, x_n, \theta) \, d\theta.

If Θ is a random variable of the discrete type, integration would be replaced by summation. In either case the conditional p.d.f. of Θ, given X₁ = x₁, ..., Xₙ = xₙ, is

    k(\theta | x_1, x_2, \dots, x_n) = \frac{g(x_1, x_2, \dots, x_n, \theta)}{g_1(x_1, x_2, \dots, x_n)}
    = \frac{f(x_1|\theta) f(x_2|\theta) \cdots f(x_n|\theta) h(\theta)}{g_1(x_1, x_2, \dots, x_n)}.

This relationship is another form of Bayes' formula.
Example 1. Let X₁, X₂, ..., Xₙ be a random sample from a Poisson distribution with mean θ, where θ is the observed value of a random variable Θ having a gamma distribution with known parameters α and β. Thus

    h(\theta) = \frac{\theta^{\alpha - 1} e^{-\theta/\beta}}{\Gamma(\alpha) \beta^{\alpha}},

provided that 0 < θ < ∞, and is equal to zero elsewhere. Then

    g(x_1, \dots, x_n, \theta) = \frac{\theta^{\sum x_i} e^{-n\theta}}{x_1! \cdots x_n!} \cdot \frac{\theta^{\alpha - 1} e^{-\theta/\beta}}{\Gamma(\alpha) \beta^{\alpha}},

provided that xᵢ = 0, 1, 2, 3, ..., i = 1, 2, ..., n, and 0 < θ < ∞, and is equal to zero elsewhere, and

    g_1(x_1, \dots, x_n) = \int_{0}^{\infty} \frac{\theta^{\sum x_i + \alpha - 1} e^{-\theta(n + 1/\beta)}}{x_1! \cdots x_n! \, \Gamma(\alpha) \beta^{\alpha}} \, d\theta
    = \frac{\Gamma\left(\sum x_i + \alpha\right)}{x_1! \cdots x_n! \, \Gamma(\alpha) \beta^{\alpha}} \left( \frac{\beta}{n\beta + 1} \right)^{\sum x_i + \alpha}.

Finally, the conditional p.d.f. of Θ, given X₁ = x₁, ..., Xₙ = xₙ, is

    k(\theta | x_1, \dots, x_n) = \frac{g(x_1, \dots, x_n, \theta)}{g_1(x_1, \dots, x_n)}
    = \frac{\theta^{\sum x_i + \alpha - 1} e^{-\theta(n + 1/\beta)}}{\Gamma\left(\sum x_i + \alpha\right) \left[ \beta/(n\beta + 1) \right]^{\sum x_i + \alpha}},

provided that 0 < θ < ∞, and is equal to zero elsewhere. This conditional p.d.f. is one of the gamma type with parameters Σxᵢ + α and β/(nβ + 1).
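As a numerical check (not part of the text), the conjugate posterior of Example 1 can be verified by comparing the normalized product of likelihood and prior with the claimed gamma density; scipy and the particular sample and prior values below are illustrative assumptions.

```python
# Sketch: checking that the posterior in Example 1 is gamma with
# parameters sum(x_i) + alpha and beta/(n*beta + 1); scipy assumed.
import numpy as np
from scipy import integrate, stats

alpha, beta = 2.0, 3.0          # known prior parameters (illustrative)
x = np.array([4, 1, 3, 2, 5])   # an illustrative observed Poisson sample
n, s = len(x), x.sum()

# Unnormalized posterior: likelihood times prior, as a function of theta.
def unnorm_post(theta):
    like = np.exp(-n * theta) * theta ** s
    prior = theta ** (alpha - 1) * np.exp(-theta / beta)
    return like * prior

const, _ = integrate.quad(unnorm_post, 0, np.inf)

# Closed-form posterior claimed in the text.
post = stats.gamma(a=s + alpha, scale=beta / (n * beta + 1))

for theta in [1.0, 2.0, 3.5]:
    assert abs(unnorm_post(theta) / const - post.pdf(theta)) < 1e-8
print("posterior matches gamma(%g, %g)" % (s + alpha, beta / (n * beta + 1)))
```

The check passes for any sample of nonnegative integers, since the algebra of the example does not depend on the particular values.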
Bayesian statisticians frequently write that k(θ|x₁, ..., xₙ) is proportional to g(x₁, ..., xₙ, θ); that is,

    k(\theta | x_1, \dots, x_n) \propto f(x_1|\theta) \cdots f(x_n|\theta) h(\theta).

In Example 1, the Bayesian statistician would simply write

    k(\theta | x_1, \dots, x_n) \propto \theta^{\sum x_i} e^{-n\theta} \, \theta^{\alpha - 1} e^{-\theta/\beta},

or, equivalently,

    k(\theta | x_1, \dots, x_n) \propto \theta^{\sum x_i + \alpha - 1} e^{-\theta(n + 1/\beta)}, \qquad 0 < \theta < \infty,

and is equal to zero elsewhere.
In Bayesian statistics, the p.d.f. h(θ) is called the prior p.d.f. of Θ, and the conditional p.d.f. k(θ|y) is called the posterior p.d.f. of Θ. Suppose that we want a point estimate of θ; this really amounts to selecting a decision function δ, so that δ(y) is a predicted value of θ when the computed value y and k(θ|y) are known.

y: an experimental value of any random variable Y;
E(Θ|y): the mean of the conditional distribution of Θ, given Y = y;
W(θ, δ(y)): the loss function.

A Bayes' solution is a decision function δ that minimizes

    E[W(\Theta, \delta(y)) \mid Y = y] = \int_{-\infty}^{\infty} W(\theta, \delta(y)) \, k(\theta | y) \, d\theta,

if Θ is a random variable of the continuous type. Minimizing this conditional expectation for each y also minimizes the overall expected loss

    E[W(\Theta, \delta(Y))] = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} W(\theta, \delta(y)) \, k(\theta | y) \, d\theta \right] g_1(y) \, dy.
If an interval estimate of θ is desired, we can find two functions u(y) and v(y) so that the conditional probability

    \Pr[u(y) < \Theta < v(y) \mid Y = y] = \int_{u(y)}^{v(y)} k(\theta | y) \, d\theta

is large, say 0.95.
8.2 Fisher Information and the Rao-Cramer Inequality
Let X be a random variable with p.d.f. f(x; θ), θ ∈ Ω, where the parameter space Ω is an interval. We consider only special cases, sometimes called regular cases, of probability density functions, as we wish to differentiate under an integral sign. We have that

    \int_{-\infty}^{\infty} f(x; \theta) \, dx = 1

and, by taking the derivative with respect to θ,

    \int_{-\infty}^{\infty} \frac{\partial f(x; \theta)}{\partial \theta} \, dx = 0.  (1)
The latter expression can be rewritten as

    \int_{-\infty}^{\infty} \frac{\partial f(x; \theta)/\partial \theta}{f(x; \theta)} \, f(x; \theta) \, dx = 0

or, equivalently,

    \int_{-\infty}^{\infty} \frac{\partial \ln f(x; \theta)}{\partial \theta} \, f(x; \theta) \, dx = 0.

If we differentiate again, it follows that

    \int_{-\infty}^{\infty} \frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2} \, f(x; \theta) \, dx
    + \int_{-\infty}^{\infty} \frac{\partial \ln f(x; \theta)}{\partial \theta} \, \frac{\partial f(x; \theta)}{\partial \theta} \, dx = 0.  (2)

We rewrite the second term of the left-hand member of this equation as

    \int_{-\infty}^{\infty} \frac{\partial \ln f(x; \theta)}{\partial \theta} \, \frac{\partial f(x; \theta)/\partial \theta}{f(x; \theta)} \, f(x; \theta) \, dx
    = \int_{-\infty}^{\infty} \left[ \frac{\partial \ln f(x; \theta)}{\partial \theta} \right]^2 f(x; \theta) \, dx.
This is called Fisher information and is denoted by I(θ). That is,

    I(\theta) = \int_{-\infty}^{\infty} \left[ \frac{\partial \ln f(x; \theta)}{\partial \theta} \right]^2 f(x; \theta) \, dx;

but, from Equation (2), we see that I(θ) can be computed from

    I(\theta) = -\int_{-\infty}^{\infty} \frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2} \, f(x; \theta) \, dx.

Sometimes, one expression is easier to compute than the other, but often we prefer the second expression.
Example 1. Let X be binomial b(1, θ). Thus

    \ln f(x; \theta) = x \ln \theta + (1 - x) \ln(1 - \theta),

    \frac{\partial \ln f(x; \theta)}{\partial \theta} = \frac{x}{\theta} - \frac{1 - x}{1 - \theta},

and

    \frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2} = -\frac{x}{\theta^2} - \frac{1 - x}{(1 - \theta)^2}.

Clearly,

    I(\theta) = -E\left[ -\frac{X}{\theta^2} - \frac{1 - X}{(1 - \theta)^2} \right]
    = \frac{\theta}{\theta^2} + \frac{1 - \theta}{(1 - \theta)^2}
    = \frac{1}{\theta} + \frac{1}{1 - \theta} = \frac{1}{\theta(1 - \theta)},

which is larger for θ values close to zero or 1.
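Since b(1, θ) takes only the values 0 and 1, both expressions for the Fisher information reduce to two-term sums, which makes the identity easy to verify numerically; the snippet below is a pure-Python check, not part of the text.

```python
# Sketch: computing the b(1, theta) Fisher information both ways --
# E[(d ln f / d theta)^2] and -E[d^2 ln f / d theta^2] -- and comparing
# with the closed form 1/(theta(1 - theta)).
def fisher_info(theta):
    # score and second derivative of ln f(x; theta) at x = 0 and x = 1
    score = {x: x / theta - (1 - x) / (1 - theta) for x in (0, 1)}
    d2 = {x: -x / theta**2 - (1 - x) / (1 - theta)**2 for x in (0, 1)}
    p = {0: 1 - theta, 1: theta}
    i1 = sum(score[x] ** 2 * p[x] for x in (0, 1))   # first expression
    i2 = -sum(d2[x] * p[x] for x in (0, 1))          # second expression
    return i1, i2

for theta in (0.1, 0.5, 0.9):
    i1, i2 = fisher_info(theta)
    assert abs(i1 - i2) < 1e-12
    assert abs(i1 - 1 / (theta * (1 - theta))) < 1e-12
print("both expressions agree with 1/(theta(1-theta))")
```

Note that I(0.1) = I(0.9) ≈ 11.1 while I(0.5) = 4, illustrating the remark that the information is larger for θ near zero or 1.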
The likelihood function is

    L(\theta) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta),

so that

    \ln L(\theta) = \ln f(x_1; \theta) + \ln f(x_2; \theta) + \cdots + \ln f(x_n; \theta).

The Fisher information in a random sample X₁, X₂, ..., Xₙ is therefore

    I_n(\theta) = E\left\{ \left[ \frac{\partial \ln L(\theta)}{\partial \theta} \right]^2 \right\}
    = \sum_{i=1}^{n} E\left\{ \left[ \frac{\partial \ln f(X_i; \theta)}{\partial \theta} \right]^2 \right\} = n I(\theta).

The Rao-Cramér inequality: if Y is a statistic with E(Y) = k(θ), then

    \operatorname{var}(Y) \ge \frac{[k'(\theta)]^2}{n I(\theta)};

in particular, if Y is an unbiased estimator of θ, so that k(θ) = θ, then var(Y) ≥ 1/[nI(θ)].
Definition 1. Let Y be an unbiased estimator of a parameter θ in such a case of point estimation. The statistic Y is called an efficient estimator of θ if and only if the variance of Y attains the Rao-Cramér lower bound.

Definition 2. In cases in which we can differentiate with respect to a parameter under an integral or summation symbol, the ratio of the Rao-Cramér lower bound to the actual variance of any unbiased estimator of a parameter is called the efficiency of that estimator.

Example 2. Let X₁, X₂, ..., Xₙ denote a random sample from a Poisson distribution that has the mean θ > 0.
It is known that X̄ is an m.l.e. of θ; we shall show that it is also an efficient estimator of θ. We have

    \frac{\partial \ln f(x; \theta)}{\partial \theta} = \frac{\partial}{\partial \theta}\left( x \ln \theta - \theta - \ln x! \right) = \frac{x}{\theta} - 1 = \frac{x - \theta}{\theta}.

Accordingly,

    E\left\{ \left[ \frac{\partial \ln f(X; \theta)}{\partial \theta} \right]^2 \right\}
    = \frac{E[(X - \theta)^2]}{\theta^2} = \frac{\sigma^2}{\theta^2} = \frac{\theta}{\theta^2} = \frac{1}{\theta}.

The Rao-Cramér lower bound in this case is 1/[n(1/θ)] = θ/n. But θ/n is the variance of X̄. Hence X̄ is an efficient estimator of θ.
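The claim that var(X̄) attains the bound θ/n can be confirmed by simulation; numpy and the particular θ, n below are illustrative assumptions, not values from the text.

```python
# Sketch: simulation check that var(X-bar) for a Poisson(theta) sample
# attains the Rao-Cramer lower bound theta/n; numpy assumed.
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 3.0, 25, 200_000       # illustrative values

xbars = rng.poisson(theta, size=(reps, n)).mean(axis=1)
simulated_var = xbars.var()
bound = theta / n                       # Rao-Cramer bound, since I(theta) = 1/theta

print(simulated_var, bound)             # agree to within simulation error
assert abs(simulated_var - bound) / bound < 0.05
```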
8.3 Limiting Distributions of Maximum Likelihood Estimators
We can differentiate under the integral sign, so that

    Z = \frac{\partial \ln L(\theta)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial \ln f(X_i; \theta)}{\partial \theta}

has mean zero and variance nI(θ). In addition, we want to be able to find the maximum likelihood estimator θ̂ by solving

    \frac{\partial \ln L(\theta)}{\partial \theta} = 0,

where L(θ̂) = f(X₁; θ̂) ··· f(Xₙ; θ̂). Two terms of Taylor's expansion of ∂ln L(θ̂)/∂θ about θ give the approximation

    0 = \frac{\partial \ln L(\hat\theta)}{\partial \theta}
    \approx \frac{\partial \ln L(\theta)}{\partial \theta} + (\hat\theta - \theta) \frac{\partial^2 \ln L(\theta)}{\partial \theta^2}
    = Z + (\hat\theta - \theta) \frac{\partial^2 \ln L(\theta)}{\partial \theta^2}.

This equation can be rewritten as

    \sqrt{n I(\theta)}\,(\hat\theta - \theta)
    = \frac{Z / \sqrt{n I(\theta)}}{-\dfrac{\partial^2 \ln L(\theta)}{\partial \theta^2} \Big/ [n I(\theta)]}.  (1)

Since Z is the sum of the i.i.d. random variables

    \frac{\partial \ln f(X_i; \theta)}{\partial \theta}, \qquad i = 1, 2, \dots, n,

each with mean zero and variance I(θ), the numerator of the right-hand member of Equation (1) is limiting N(0, 1) by the central limit theorem.
Example. Suppose that the random sample arises from a distribution with p.d.f.

    f(x; \theta) = \theta x^{\theta - 1}, \qquad 0 < x < 1, \quad \theta \in \{\theta : 0 < \theta\},

zero elsewhere. We have

    \ln f(x; \theta) = \ln \theta + (\theta - 1) \ln x,

    \frac{\partial \ln f(x; \theta)}{\partial \theta} = \frac{1}{\theta} + \ln x,

and

    \frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2} = -\frac{1}{\theta^2}.

Since E[−∂² ln f(X; θ)/∂θ²] = 1/θ², the lower bound of the variance of every unbiased estimator of θ is θ²/n. Moreover, the maximum likelihood estimator

    \hat\theta = -\frac{n}{\sum_{i=1}^{n} \ln X_i}

has an approximate normal distribution with mean θ and variance θ²/n. Thus, in a limiting sense, θ̂ is the unbiased minimum variance estimator of θ; that is, θ̂ is asymptotically efficient.
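This limiting behavior is easy to see by simulation: drawing many samples from θx^(θ−1), computing θ̂ for each, and checking its mean and variance against θ and θ²/n. numpy and the particular θ, n are illustrative assumptions.

```python
# Sketch: simulating theta-hat = -n / sum(ln X_i) for f(x; theta) =
# theta * x^(theta - 1) on (0, 1), and checking mean ~ theta and
# variance ~ theta^2 / n; numpy assumed.
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 400, 100_000      # illustrative values

# X = U^(1/theta) for uniform U has p.d.f. theta * x^(theta - 1) on (0, 1)
u = rng.random(size=(reps, n))
x = u ** (1.0 / theta)
mle = -n / np.log(x).sum(axis=1)

print(mle.mean(), theta)                # approximately theta
print(mle.var(), theta**2 / n)          # approximately theta^2 / n
assert abs(mle.mean() - theta) < 0.05
assert abs(mle.var() - theta**2 / n) / (theta**2 / n) < 0.05
```

The small remaining bias of order θ/n is consistent with θ̂ being only asymptotically unbiased.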
8.4 Robust M-Estimation

We have found the m.l.e. of the center θ of the Cauchy distribution with p.d.f.

    f(x; \theta) = \frac{1}{\pi[1 + (x - \theta)^2]}, \qquad -\infty < x < \infty,

where −∞ < θ < ∞. The logarithm of the likelihood function of a random sample X₁, X₂, ..., Xₙ from this distribution is

    \ln L(\theta) = -n \ln \pi - \sum_{i=1}^{n} \ln\left[ 1 + (x_i - \theta)^2 \right].

To maximize, we differentiate and set the derivative equal to zero:

    \frac{d \ln L(\theta)}{d\theta} = \sum_{i=1}^{n} \frac{2(x_i - \theta)}{1 + (x_i - \theta)^2} = 0.
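One concrete way to solve this likelihood equation (not prescribed by the text) is to minimize −ln L(θ) numerically near the sample median; scipy and the simulated sample below are assumptions.

```python
# Sketch: solving sum 2(x_i - theta)/(1 + (x_i - theta)^2) = 0 for the
# Cauchy center by minimizing -ln L(theta) near the median; scipy assumed.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
theta_true = 5.0
x = theta_true + rng.standard_cauchy(101)   # illustrative sample

def neg_log_lik(theta):
    # -ln L(theta), dropping the constant n * ln(pi)
    return np.sum(np.log1p((x - theta) ** 2))

med = np.median(x)
res = minimize_scalar(neg_log_lik, bounds=(med - 2, med + 2),
                      method="bounded", options={"xatol": 1e-10})
theta_hat = res.x

# d ln L / d theta is (numerically) zero at theta-hat
score = np.sum(2 * (x - theta_hat) / (1 + (x - theta_hat) ** 2))
assert abs(score) < 1e-3
print("m.l.e. of the Cauchy center:", round(theta_hat, 3))
```

Restricting the search to a neighborhood of the median avoids the spurious local maxima that the Cauchy likelihood can have near extreme observations.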
The equation can be solved by some iterative process. We use the weight function

    w(x - \hat\theta_0) = \frac{2}{1 + (x - \hat\theta_0)^2},

where θ̂₀ is a first estimate of θ. More generally, write

    \ln L(\theta) = \sum_{i=1}^{n} \ln f(x_i - \theta) = -\sum_{i=1}^{n} \rho(x_i - \theta),

where ρ(x) = −ln f(x), so that

    \frac{d \ln L(\theta)}{d\theta} = -\sum_{i=1}^{n} \frac{f'(x_i - \theta)}{f(x_i - \theta)} = \sum_{i=1}^{n} \rho'(x_i - \theta) = \sum_{i=1}^{n} \psi(x_i - \theta),

where ψ(x) = ρ'(x). In the Cauchy case,

    \rho(x) = \ln \pi + \ln(1 + x^2)

and

    \psi(x) = \frac{2x}{1 + x^2}.

In addition, we define a weight function as

    w(x) = \frac{\psi(x)}{x},

which equals 2/(1 + x²) in the Cauchy case.
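One iterative process of the kind mentioned above (a sketch, not the text's prescribed algorithm) rewrites the likelihood equation Σ(xᵢ − θ)w(xᵢ − θ) = 0 as a weighted average and iterates it to a fixed point; numpy and the simulated sample are assumptions.

```python
# Sketch: iteratively reweighted averaging for the Cauchy center, using
# the weight w(x) = 2/(1 + x^2) from the text; numpy assumed.
import numpy as np

rng = np.random.default_rng(3)
x = 5.0 + rng.standard_cauchy(101)          # illustrative sample

theta = np.median(x)                        # first estimate theta-hat-0
for _ in range(1000):
    w = 2.0 / (1.0 + (x - theta) ** 2)      # current weights w(x_i - theta)
    new_theta = np.sum(w * x) / np.sum(w)   # weighted average of the x_i
    if abs(new_theta - theta) < 1e-13:
        break
    theta = new_theta

# the fixed point satisfies sum psi(x_i - theta) = 0
residual = np.sum(2 * (x - theta) / (1 + (x - theta) ** 2))
assert abs(residual) < 1e-6
print("iteratively reweighted estimate of the center:", round(theta, 3))
```

Observations far from the current estimate receive small weights, which is exactly what makes the resulting estimator resistant to outliers.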
Definition 1. An estimator that is fairly good (small variance, say) for a wide variety of distributions (not necessarily the best for any one of them) is called a robust estimator.

Definition 2. Estimators associated with the solution θ̂ of the equation

    \sum_{i=1}^{n} \psi(x_i - \theta) = 0

are frequently called robust M-estimators (denoted by θ̂) because they can be thought of as maximum likelihood estimators.

Huber's ψ function is

    \psi(x) = \begin{cases} -k, & x < -k, \\ x, & -k \le x \le k, \\ k, & k < x, \end{cases}

with weight w(x) = 1, |x| ≤ k, and w(x) = k/|x| provided that k < |x|.

With Huber's ψ function, another problem arises: if we double each of X₁, X₂, ..., Xₙ, estimators such as X̄ and the median also double. This is not at all true with the solution of the equation Σψ(xᵢ − θ) = 0, where the ψ function is that of Huber. One way to correct this is to solve instead

    \sum_{i=1}^{n} \psi\!\left( \frac{x_i - \theta}{d} \right) = 0,  (1)

where d is a robust estimate of the scale. A popular d to use is

    d = \frac{\operatorname{median} |x_i - \operatorname{median}(x_i)|}{0.6745}.

The scheme of selecting d also provides us with a clue for selecting k: we would like most of the items to satisfy the inequality

    \left| \frac{x_i - \theta}{d} \right| \le k,

because then

    \psi\!\left( \frac{x_i - \theta}{d} \right) = \frac{x_i - \theta}{d}.

If all the values satisfy this inequality, then Equation (1) becomes

    \sum_{i=1}^{n} \psi\!\left( \frac{x_i - \theta}{d} \right) = \sum_{i=1}^{n} \frac{x_i - \theta}{d} = 0.

This has the solution θ = x̄, which of course is most desirable with normal distributions.
To solve Equation (1), we can use Newton's method. Let θ̂₀ be a first estimate of θ, such as θ̂₀ = median(xᵢ). Approximating the left-hand member of Equation (1) by the first two terms of its Taylor's expansion about θ̂₀ and setting the result equal to zero gives

    \sum_{i=1}^{n} \psi\!\left( \frac{x_i - \hat\theta_0}{d} \right)
    + (\hat\theta_1 - \hat\theta_0) \sum_{i=1}^{n} \psi'\!\left( \frac{x_i - \hat\theta_0}{d} \right) \left( -\frac{1}{d} \right) = 0,

which has the solution

    \hat\theta_1 = \hat\theta_0 + d \, \frac{\displaystyle \sum_{i=1}^{n} \psi\!\left( \frac{x_i - \hat\theta_0}{d} \right)}{\displaystyle \sum_{i=1}^{n} \psi'\!\left( \frac{x_i - \hat\theta_0}{d} \right)}

(the one-step M-estimate of θ). If we use θ̂₁ in place of θ̂₀, we obtain θ̂₂, the two-step M-estimate of θ.
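The one-step M-estimate above can be coded directly with Huber's ψ and the robust scale d of the previous page; numpy, the choice k = 1.5, and the small data set below are illustrative assumptions, not values from the text.

```python
# Sketch: the one-step Huber M-estimate
#   theta1 = theta0 + d * sum(psi(t_i)) / sum(psi'(t_i)),  t_i = (x_i - theta0)/d,
# with theta0 = median(x_i) and d = median|x_i - median(x_i)| / 0.6745.
import numpy as np

def huber_psi(t, k):
    return np.clip(t, -k, k)                 # -k, t, or k, as in Huber's psi

def huber_psi_prime(t, k):
    return (np.abs(t) <= k).astype(float)    # 1 on [-k, k], 0 outside

def one_step_m_estimate(x, k=1.5):           # k = 1.5 is an illustrative choice
    theta0 = np.median(x)
    d = np.median(np.abs(x - np.median(x))) / 0.6745
    t = (x - theta0) / d
    return theta0 + d * huber_psi(t, k).sum() / huber_psi_prime(t, k).sum()

x = np.array([9.7, 10.1, 9.9, 10.3, 10.0, 9.8, 55.0])   # one gross outlier
theta1 = one_step_m_estimate(x)
print(round(theta1, 3))      # stays near 10, unlike the sample mean (about 16.4)
assert abs(theta1 - 10.0) < 0.5
```

The gross outlier contributes only the bounded value k to the ψ sum and drops out of the ψ′ sum entirely, so the one-step estimate barely moves from the median.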
Suppose that θ̂ is the solution of Equation (1) and that the scale parameter d is known. Two terms of Taylor's expansion of

    \sum_{i=1}^{n} \psi\!\left( \frac{X_i - \hat\theta}{d} \right)

about θ provides the approximation

    \sum_{i=1}^{n} \psi\!\left( \frac{X_i - \theta}{d} \right)
    + (\hat\theta - \theta) \sum_{i=1}^{n} \psi'\!\left( \frac{X_i - \theta}{d} \right) \left( -\frac{1}{d} \right) \approx 0.

This can be rewritten as

    \hat\theta - \theta \approx \frac{d \displaystyle \sum_{i=1}^{n} \psi\!\left( \frac{X_i - \theta}{d} \right)}{\displaystyle \sum_{i=1}^{n} \psi'\!\left( \frac{X_i - \theta}{d} \right)}.  (2)

We have considered distributions for which

    E\left[ \psi\!\left( \frac{X - \theta}{d} \right) \right] = 0.

Clearly,

    \operatorname{var}\left[ \psi\!\left( \frac{X - \theta}{d} \right) \right] = E\left[ \psi^2\!\left( \frac{X - \theta}{d} \right) \right].

Thus Equation (2) can be rewritten as

    \frac{\sqrt{n}\,(\hat\theta - \theta)}{d \sqrt{E[\psi^2((X - \theta)/d)]} \Big/ E[\psi'((X - \theta)/d)]}
    \approx \frac{\displaystyle \sum_{i=1}^{n} \psi\!\left( \frac{X_i - \theta}{d} \right) \Big/ \sqrt{n E[\psi^2((X - \theta)/d)]}}{\displaystyle \sum_{i=1}^{n} \psi'\!\left( \frac{X_i - \theta}{d} \right) \Big/ \left\{ n E[\psi'((X - \theta)/d)] \right\}}.  (3)