CHAPTER 8
More About Estimation
8.1 Bayesian Estimation
In this chapter we introduce further concepts related to estimation, beginning with Bayesian estimates, which are also based upon sufficient statistics if the latter exist. We shall now describe the Bayesian approach to the problem of estimation. This approach takes into account any prior knowledge of the experiment that the statistician has, and it is one application of a principle of statistical inference that may be called Bayesian statistics.

Consider a random variable X that has a distribution of probability that depends upon the symbol θ, where θ is an element of a well-defined set Ω.

Θ: a random variable that has a distribution of probability over the set Ω.
x: a possible value of the random variable X.
θ: a possible value of the random variable Θ.

The distribution of X depends upon θ, an experimental value of the random variable Θ. We shall denote the p.d.f. of Θ by h(θ), and we take h(θ) = 0 when θ is not an element of Ω. Moreover, we now denote the p.d.f. of X by f(x|θ), since we think of it as a conditional p.d.f. of X, given Θ = θ. Say X₁, X₂, ..., Xₙ is a random sample from this conditional distribution of X. Thus we can write the joint conditional p.d.f. of X₁, X₂, ..., Xₙ, given Θ = θ, as f(x₁|θ)f(x₂|θ)···f(xₙ|θ). Thus the joint p.d.f. of X₁, X₂, ..., Xₙ and Θ is

    g(x_1, x_2, \dots, x_n, \theta) = f(x_1|\theta) f(x_2|\theta) \cdots f(x_n|\theta) h(\theta).
If Θ is a random variable of the continuous type, the joint marginal p.d.f. of X₁, X₂, ..., Xₙ is given by

    g_1(x_1, x_2, \dots, x_n) = \int_{-\infty}^{\infty} g(x_1, x_2, \dots, x_n, \theta) \, d\theta.

If Θ is a random variable of the discrete type, integration would be replaced by summation. In either case the conditional p.d.f. of Θ, given X₁ = x₁, ..., Xₙ = xₙ, is

    k(\theta | x_1, x_2, \dots, x_n) = \frac{g(x_1, x_2, \dots, x_n, \theta)}{g_1(x_1, x_2, \dots, x_n)}
    = \frac{f(x_1|\theta) f(x_2|\theta) \cdots f(x_n|\theta) h(\theta)}{g_1(x_1, x_2, \dots, x_n)}.

This relationship is another form of Bayes' formula.
Example 1. Let X₁, X₂, ..., Xₙ be a random sample from a Poisson distribution with mean θ, where θ is the observed value of a random variable Θ having a gamma distribution with known parameters α and β. Thus

    h(\theta) = \frac{\theta^{\alpha - 1} e^{-\theta/\beta}}{\Gamma(\alpha) \beta^{\alpha}},

provided that 0 < θ < ∞, and is equal to zero elsewhere. Then

    g(x_1, \dots, x_n, \theta) = \frac{\theta^{\sum x_i} e^{-n\theta}}{x_1! \cdots x_n!} \cdot \frac{\theta^{\alpha - 1} e^{-\theta/\beta}}{\Gamma(\alpha) \beta^{\alpha}},

provided that xᵢ = 0, 1, 2, 3, ..., i = 1, 2, ..., n, and 0 < θ < ∞, and is equal to zero elsewhere, and

    g_1(x_1, \dots, x_n) = \int_{0}^{\infty} \frac{\theta^{\sum x_i + \alpha - 1} e^{-\theta(n + 1/\beta)}}{x_1! \cdots x_n! \, \Gamma(\alpha) \beta^{\alpha}} \, d\theta
    = \frac{\Gamma\left(\sum x_i + \alpha\right)}{x_1! \cdots x_n! \, \Gamma(\alpha) \beta^{\alpha}} \left( \frac{\beta}{n\beta + 1} \right)^{\sum x_i + \alpha}.

Finally, the conditional p.d.f. of Θ, given X₁ = x₁, ..., Xₙ = xₙ, is

    k(\theta | x_1, \dots, x_n) = \frac{g(x_1, \dots, x_n, \theta)}{g_1(x_1, \dots, x_n)}
    = \frac{\theta^{\sum x_i + \alpha - 1} e^{-\theta(n + 1/\beta)}}{\Gamma\left(\sum x_i + \alpha\right) \left[ \beta/(n\beta + 1) \right]^{\sum x_i + \alpha}},

provided that 0 < θ < ∞, and is equal to zero elsewhere. This conditional p.d.f. is one of the gamma type with parameters Σxᵢ + α and β/(nβ + 1).
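As a numerical check (not part of the text), the conjugate posterior of Example 1 can be verified by comparing the normalized product of likelihood and prior with the claimed gamma density; scipy and the particular sample and prior values below are illustrative assumptions.

```python
# Sketch: checking that the posterior in Example 1 is gamma with
# parameters sum(x_i) + alpha and beta/(n*beta + 1); scipy assumed.
import numpy as np
from scipy import integrate, stats

alpha, beta = 2.0, 3.0          # known prior parameters (illustrative)
x = np.array([4, 1, 3, 2, 5])   # an illustrative observed Poisson sample
n, s = len(x), x.sum()

# Unnormalized posterior: likelihood times prior, as a function of theta.
def unnorm_post(theta):
    like = np.exp(-n * theta) * theta ** s
    prior = theta ** (alpha - 1) * np.exp(-theta / beta)
    return like * prior

const, _ = integrate.quad(unnorm_post, 0, np.inf)

# Closed-form posterior claimed in the text.
post = stats.gamma(a=s + alpha, scale=beta / (n * beta + 1))

for theta in [1.0, 2.0, 3.5]:
    assert abs(unnorm_post(theta) / const - post.pdf(theta)) < 1e-8
print("posterior matches gamma(%g, %g)" % (s + alpha, beta / (n * beta + 1)))
```

The check passes for any sample of nonnegative integers, since the algebra of the example does not depend on the particular values.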
Bayesian statisticians frequently write that k(θ|x₁, ..., xₙ) is proportional to g(x₁, ..., xₙ, θ); that is,

    k(\theta | x_1, \dots, x_n) \propto f(x_1|\theta) \cdots f(x_n|\theta) h(\theta).

In Example 1, the Bayesian statistician would simply write

    k(\theta | x_1, \dots, x_n) \propto \theta^{\sum x_i} e^{-n\theta} \, \theta^{\alpha - 1} e^{-\theta/\beta},

or, equivalently,

    k(\theta | x_1, \dots, x_n) \propto \theta^{\sum x_i + \alpha - 1} e^{-\theta(n + 1/\beta)}, \qquad 0 < \theta < \infty,

and is equal to zero elsewhere.
In Bayesian statistics, the p.d.f. h(θ) is called the prior p.d.f. of Θ, and the conditional p.d.f. k(θ|y) is called the posterior p.d.f. of Θ. Suppose that we want a point estimate of θ; this really amounts to selecting a decision function δ, so that δ(y) is a predicted value of θ when the computed value y and k(θ|y) are known.

y: an experimental value of any random variable Y;
E(Θ|y): the mean of the conditional distribution of Θ, given Y = y;
W(θ, δ(y)): the loss function.

A Bayes' solution is a decision function δ that minimizes

    E[W(\Theta, \delta(y)) \mid Y = y] = \int_{-\infty}^{\infty} W(\theta, \delta(y)) \, k(\theta | y) \, d\theta,

if Θ is a random variable of the continuous type. Minimizing this conditional expectation for each y also minimizes the overall expected loss

    E[W(\Theta, \delta(Y))] = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} W(\theta, \delta(y)) \, k(\theta | y) \, d\theta \right] g_1(y) \, dy.
If an interval estimate of θ is desired, we can find two functions u(y) and v(y) so that the conditional probability

    \Pr[u(y) < \Theta < v(y) \mid Y = y] = \int_{u(y)}^{v(y)} k(\theta | y) \, d\theta

is large, say 0.95.
8.2 Fisher Information and the Rao-Cramer Inequality
Let X be a random variable with p.d.f. f(x; θ), θ ∈ Ω, where the parameter space Ω is an interval. We consider only special cases, sometimes called regular cases, of probability density functions, as we wish to differentiate under an integral sign. We have that

    \int_{-\infty}^{\infty} f(x; \theta) \, dx = 1

and, by taking the derivative with respect to θ,

    \int_{-\infty}^{\infty} \frac{\partial f(x; \theta)}{\partial \theta} \, dx = 0.  (1)
The latter expression can be rewritten as

    \int_{-\infty}^{\infty} \frac{\partial f(x; \theta)/\partial \theta}{f(x; \theta)} \, f(x; \theta) \, dx = 0

or, equivalently,

    \int_{-\infty}^{\infty} \frac{\partial \ln f(x; \theta)}{\partial \theta} \, f(x; \theta) \, dx = 0.

If we differentiate again, it follows that

    \int_{-\infty}^{\infty} \frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2} \, f(x; \theta) \, dx
    + \int_{-\infty}^{\infty} \frac{\partial \ln f(x; \theta)}{\partial \theta} \, \frac{\partial f(x; \theta)}{\partial \theta} \, dx = 0.  (2)

We rewrite the second term of the left-hand member of this equation as

    \int_{-\infty}^{\infty} \frac{\partial \ln f(x; \theta)}{\partial \theta} \, \frac{\partial f(x; \theta)/\partial \theta}{f(x; \theta)} \, f(x; \theta) \, dx
    = \int_{-\infty}^{\infty} \left[ \frac{\partial \ln f(x; \theta)}{\partial \theta} \right]^2 f(x; \theta) \, dx.
This is called Fisher information and is denoted by I(θ). That is,

    I(\theta) = \int_{-\infty}^{\infty} \left[ \frac{\partial \ln f(x; \theta)}{\partial \theta} \right]^2 f(x; \theta) \, dx;

but, from Equation (2), we see that I(θ) can be computed from

    I(\theta) = -\int_{-\infty}^{\infty} \frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2} \, f(x; \theta) \, dx.

Sometimes, one expression is easier to compute than the other, but often we prefer the second expression.
Example 1. Let X be binomial b(1, θ). Thus

    \ln f(x; \theta) = x \ln \theta + (1 - x) \ln(1 - \theta),

    \frac{\partial \ln f(x; \theta)}{\partial \theta} = \frac{x}{\theta} - \frac{1 - x}{1 - \theta},

and

    \frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2} = -\frac{x}{\theta^2} - \frac{1 - x}{(1 - \theta)^2}.

Clearly,

    I(\theta) = -E\left[ -\frac{X}{\theta^2} - \frac{1 - X}{(1 - \theta)^2} \right]
    = \frac{\theta}{\theta^2} + \frac{1 - \theta}{(1 - \theta)^2}
    = \frac{1}{\theta} + \frac{1}{1 - \theta} = \frac{1}{\theta(1 - \theta)},

which is larger for θ values close to zero or 1.
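Since b(1, θ) takes only the values 0 and 1, both expressions for the Fisher information reduce to two-term sums, which makes the identity easy to verify numerically; the snippet below is a pure-Python check, not part of the text.

```python
# Sketch: computing the b(1, theta) Fisher information both ways --
# E[(d ln f / d theta)^2] and -E[d^2 ln f / d theta^2] -- and comparing
# with the closed form 1/(theta(1 - theta)).
def fisher_info(theta):
    # score and second derivative of ln f(x; theta) at x = 0 and x = 1
    score = {x: x / theta - (1 - x) / (1 - theta) for x in (0, 1)}
    d2 = {x: -x / theta**2 - (1 - x) / (1 - theta)**2 for x in (0, 1)}
    p = {0: 1 - theta, 1: theta}
    i1 = sum(score[x] ** 2 * p[x] for x in (0, 1))   # first expression
    i2 = -sum(d2[x] * p[x] for x in (0, 1))          # second expression
    return i1, i2

for theta in (0.1, 0.5, 0.9):
    i1, i2 = fisher_info(theta)
    assert abs(i1 - i2) < 1e-12
    assert abs(i1 - 1 / (theta * (1 - theta))) < 1e-12
print("both expressions agree with 1/(theta(1-theta))")
```

Note that I(0.1) = I(0.9) ≈ 11.1 while I(0.5) = 4, illustrating the remark that the information is larger for θ near zero or 1.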
The likelihood function is

    L(\theta) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta),

so that

    \ln L(\theta) = \ln f(x_1; \theta) + \ln f(x_2; \theta) + \cdots + \ln f(x_n; \theta).

The Fisher information in a random sample X₁, X₂, ..., Xₙ is therefore

    I_n(\theta) = E\left\{ \left[ \frac{\partial \ln L(\theta)}{\partial \theta} \right]^2 \right\}
    = \sum_{i=1}^{n} E\left\{ \left[ \frac{\partial \ln f(X_i; \theta)}{\partial \theta} \right]^2 \right\} = n I(\theta).

The Rao-Cramér inequality: if Y is a statistic with E(Y) = k(θ), then

    \operatorname{var}(Y) \ge \frac{[k'(\theta)]^2}{n I(\theta)};

in particular, if Y is an unbiased estimator of θ, so that k(θ) = θ, then var(Y) ≥ 1/[nI(θ)].
Definition 1. Let Y be an unbiased estimator of a parameter θ in such a case of point estimation. The statistic Y is called an efficient estimator of θ if and only if the variance of Y attains the Rao-Cramér lower bound.

Definition 2. In cases in which we can differentiate with respect to a parameter under an integral or summation symbol, the ratio of the Rao-Cramér lower bound to the actual variance of any unbiased estimator of a parameter is called the efficiency of that estimator.

Example 2. Let X₁, X₂, ..., Xₙ denote a random sample from a Poisson distribution that has the mean θ > 0.
It is known that X̄ is an m.l.e. of θ; we shall show that it is also an efficient estimator of θ. We have

    \frac{\partial \ln f(x; \theta)}{\partial \theta} = \frac{\partial}{\partial \theta}\left( x \ln \theta - \theta - \ln x! \right) = \frac{x}{\theta} - 1 = \frac{x - \theta}{\theta}.

Accordingly,

    E\left\{ \left[ \frac{\partial \ln f(X; \theta)}{\partial \theta} \right]^2 \right\}
    = \frac{E[(X - \theta)^2]}{\theta^2} = \frac{\sigma^2}{\theta^2} = \frac{\theta}{\theta^2} = \frac{1}{\theta}.

The Rao-Cramér lower bound in this case is 1/[n(1/θ)] = θ/n. But θ/n is the variance of X̄. Hence X̄ is an efficient estimator of θ.
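The claim that var(X̄) attains the bound θ/n can be confirmed by simulation; numpy and the particular θ, n below are illustrative assumptions, not values from the text.

```python
# Sketch: simulation check that var(X-bar) for a Poisson(theta) sample
# attains the Rao-Cramer lower bound theta/n; numpy assumed.
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 3.0, 25, 200_000       # illustrative values

xbars = rng.poisson(theta, size=(reps, n)).mean(axis=1)
simulated_var = xbars.var()
bound = theta / n                       # Rao-Cramer bound, since I(theta) = 1/theta

print(simulated_var, bound)             # agree to within simulation error
assert abs(simulated_var - bound) / bound < 0.05
```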
8.3 Limiting Distributions of Maximum Likelihood Estimators
We can differentiate under the integral sign, so that

    Z = \frac{\partial \ln L(\theta)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial \ln f(X_i; \theta)}{\partial \theta}

has mean zero and variance nI(θ). In addition, we want to be able to find the maximum likelihood estimator θ̂ by solving

    \frac{\partial \ln L(\theta)}{\partial \theta} = 0,

where L(θ̂) = f(X₁; θ̂) ··· f(Xₙ; θ̂). Two terms of Taylor's expansion of ∂ln L(θ̂)/∂θ about θ give the approximation

    0 = \frac{\partial \ln L(\hat\theta)}{\partial \theta}
    \approx \frac{\partial \ln L(\theta)}{\partial \theta} + (\hat\theta - \theta) \frac{\partial^2 \ln L(\theta)}{\partial \theta^2}
    = Z + (\hat\theta - \theta) \frac{\partial^2 \ln L(\theta)}{\partial \theta^2}.

This equation can be rewritten as

    \sqrt{n I(\theta)}\,(\hat\theta - \theta)
    = \frac{Z / \sqrt{n I(\theta)}}{-\dfrac{\partial^2 \ln L(\theta)}{\partial \theta^2} \Big/ [n I(\theta)]}.  (1)

Since Z is the sum of the i.i.d. random variables

    \frac{\partial \ln f(X_i; \theta)}{\partial \theta}, \qquad i = 1, 2, \dots, n,

each with mean zero and variance I(θ), the numerator of the right-hand member of Equation (1) is limiting N(0, 1) by the central limit theorem.
Example. Suppose that the random sample arises from a distribution with p.d.f.

    f(x; \theta) = \theta x^{\theta - 1}, \qquad 0 < x < 1, \quad \theta \in \{\theta : 0 < \theta\},

zero elsewhere. We have

    \ln f(x; \theta) = \ln \theta + (\theta - 1) \ln x,

    \frac{\partial \ln f(x; \theta)}{\partial \theta} = \frac{1}{\theta} + \ln x,

and

    \frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2} = -\frac{1}{\theta^2}.

Since E[−∂² ln f(X; θ)/∂θ²] = 1/θ², the lower bound of the variance of every unbiased estimator of θ is θ²/n. Moreover, the maximum likelihood estimator

    \hat\theta = -\frac{n}{\sum_{i=1}^{n} \ln X_i}

has an approximate normal distribution with mean θ and variance θ²/n. Thus, in a limiting sense, θ̂ is the unbiased minimum variance estimator of θ; that is, θ̂ is asymptotically efficient.
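This limiting behavior is easy to see by simulation: drawing many samples from θx^(θ−1), computing θ̂ for each, and checking its mean and variance against θ and θ²/n. numpy and the particular θ, n are illustrative assumptions.

```python
# Sketch: simulating theta-hat = -n / sum(ln X_i) for f(x; theta) =
# theta * x^(theta - 1) on (0, 1), and checking mean ~ theta and
# variance ~ theta^2 / n; numpy assumed.
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 400, 100_000      # illustrative values

# X = U^(1/theta) for uniform U has p.d.f. theta * x^(theta - 1) on (0, 1)
u = rng.random(size=(reps, n))
x = u ** (1.0 / theta)
mle = -n / np.log(x).sum(axis=1)

print(mle.mean(), theta)                # approximately theta
print(mle.var(), theta**2 / n)          # approximately theta^2 / n
assert abs(mle.mean() - theta) < 0.05
assert abs(mle.var() - theta**2 / n) / (theta**2 / n) < 0.05
```

The small remaining bias of order θ/n is consistent with θ̂ being only asymptotically unbiased.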
8.4 Robust M-Estimation

We have found the m.l.e. of the center θ of the Cauchy distribution with p.d.f.

    f(x; \theta) = \frac{1}{\pi[1 + (x - \theta)^2]}, \qquad -\infty < x < \infty,

where −∞ < θ < ∞. The logarithm of the likelihood function of a random sample X₁, X₂, ..., Xₙ from this distribution is

    \ln L(\theta) = -n \ln \pi - \sum_{i=1}^{n} \ln\left[ 1 + (x_i - \theta)^2 \right].

To maximize, we differentiate and set the derivative equal to zero:

    \frac{d \ln L(\theta)}{d\theta} = \sum_{i=1}^{n} \frac{2(x_i - \theta)}{1 + (x_i - \theta)^2} = 0.
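One concrete way to solve this likelihood equation (not prescribed by the text) is to minimize −ln L(θ) numerically near the sample median; scipy and the simulated sample below are assumptions.

```python
# Sketch: solving sum 2(x_i - theta)/(1 + (x_i - theta)^2) = 0 for the
# Cauchy center by minimizing -ln L(theta) near the median; scipy assumed.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
theta_true = 5.0
x = theta_true + rng.standard_cauchy(101)   # illustrative sample

def neg_log_lik(theta):
    # -ln L(theta), dropping the constant n * ln(pi)
    return np.sum(np.log1p((x - theta) ** 2))

med = np.median(x)
res = minimize_scalar(neg_log_lik, bounds=(med - 2, med + 2),
                      method="bounded", options={"xatol": 1e-10})
theta_hat = res.x

# d ln L / d theta is (numerically) zero at theta-hat
score = np.sum(2 * (x - theta_hat) / (1 + (x - theta_hat) ** 2))
assert abs(score) < 1e-3
print("m.l.e. of the Cauchy center:", round(theta_hat, 3))
```

Restricting the search to a neighborhood of the median avoids the spurious local maxima that the Cauchy likelihood can have near extreme observations.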
The equation can be solved by some iterative process. We use the weight function

    w(x - \hat\theta_0) = \frac{2}{1 + (x - \hat\theta_0)^2},

where θ̂₀ is a first estimate of θ. More generally, write

    \ln L(\theta) = \sum_{i=1}^{n} \ln f(x_i - \theta) = -\sum_{i=1}^{n} \rho(x_i - \theta),

where ρ(x) = −ln f(x), so that

    \frac{d \ln L(\theta)}{d\theta} = -\sum_{i=1}^{n} \frac{f'(x_i - \theta)}{f(x_i - \theta)} = \sum_{i=1}^{n} \rho'(x_i - \theta) = \sum_{i=1}^{n} \psi(x_i - \theta),

where ψ(x) = ρ'(x). In the Cauchy case,

    \rho(x) = \ln \pi + \ln(1 + x^2)

and

    \psi(x) = \frac{2x}{1 + x^2}.

In addition, we define a weight function as

    w(x) = \frac{\psi(x)}{x},

which equals 2/(1 + x²) in the Cauchy case.
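One iterative process of the kind mentioned above (a sketch, not the text's prescribed algorithm) rewrites the likelihood equation Σ(xᵢ − θ)w(xᵢ − θ) = 0 as a weighted average and iterates it to a fixed point; numpy and the simulated sample are assumptions.

```python
# Sketch: iteratively reweighted averaging for the Cauchy center, using
# the weight w(x) = 2/(1 + x^2) from the text; numpy assumed.
import numpy as np

rng = np.random.default_rng(3)
x = 5.0 + rng.standard_cauchy(101)          # illustrative sample

theta = np.median(x)                        # first estimate theta-hat-0
for _ in range(1000):
    w = 2.0 / (1.0 + (x - theta) ** 2)      # current weights w(x_i - theta)
    new_theta = np.sum(w * x) / np.sum(w)   # weighted average of the x_i
    if abs(new_theta - theta) < 1e-13:
        break
    theta = new_theta

# the fixed point satisfies sum psi(x_i - theta) = 0
residual = np.sum(2 * (x - theta) / (1 + (x - theta) ** 2))
assert abs(residual) < 1e-6
print("iteratively reweighted estimate of the center:", round(theta, 3))
```

Observations far from the current estimate receive small weights, which is exactly what makes the resulting estimator resistant to outliers.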
Definition 1. An estimator that is fairly good (small variance, say) for a wide variety of distributions (not necessarily the best for any one of them) is called a robust estimator.

Definition 2. Estimators associated with the solution θ̂ of the equation

    \sum_{i=1}^{n} \psi(x_i - \theta) = 0

are frequently called robust M-estimators (denoted by θ̂) because they can be thought of as maximum likelihood estimators.

Huber's ψ function is

    \psi(x) = \begin{cases} -k, & x < -k, \\ x, & -k \le x \le k, \\ k, & k < x, \end{cases}

with weight w(x) = 1, |x| ≤ k, and w(x) = k/|x| provided that k < |x|.

With Huber's ψ function, another problem arises: if we double each of X₁, X₂, ..., Xₙ, estimators such as X̄ and the median also double. This is not at all true with the solution of the equation Σψ(xᵢ − θ) = 0, where the ψ function is that of Huber. One way to correct this is to solve instead

    \sum_{i=1}^{n} \psi\!\left( \frac{x_i - \theta}{d} \right) = 0,  (1)

where d is a robust estimate of the scale. A popular d to use is

    d = \frac{\operatorname{median} |x_i - \operatorname{median}(x_i)|}{0.6745}.

The scheme of selecting d also provides us with a clue for selecting k: we would like most of the items to satisfy the inequality

    \left| \frac{x_i - \theta}{d} \right| \le k,

because then

    \psi\!\left( \frac{x_i - \theta}{d} \right) = \frac{x_i - \theta}{d}.

If all the values satisfy this inequality, then Equation (1) becomes

    \sum_{i=1}^{n} \psi\!\left( \frac{x_i - \theta}{d} \right) = \sum_{i=1}^{n} \frac{x_i - \theta}{d} = 0.

This has the solution θ = x̄, which of course is most desirable with normal distributions.
To solve Equation (1), we can use Newton's method. Let θ̂₀ be a first estimate of θ, such as θ̂₀ = median(xᵢ). Approximating the left-hand member of Equation (1) by the first two terms of its Taylor's expansion about θ̂₀ and setting the result equal to zero gives

    \sum_{i=1}^{n} \psi\!\left( \frac{x_i - \hat\theta_0}{d} \right)
    + (\hat\theta_1 - \hat\theta_0) \sum_{i=1}^{n} \psi'\!\left( \frac{x_i - \hat\theta_0}{d} \right) \left( -\frac{1}{d} \right) = 0,

which has the solution

    \hat\theta_1 = \hat\theta_0 + d \, \frac{\displaystyle \sum_{i=1}^{n} \psi\!\left( \frac{x_i - \hat\theta_0}{d} \right)}{\displaystyle \sum_{i=1}^{n} \psi'\!\left( \frac{x_i - \hat\theta_0}{d} \right)}

(the one-step M-estimate of θ). If we use θ̂₁ in place of θ̂₀, we obtain θ̂₂, the two-step M-estimate of θ.
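The one-step M-estimate above can be coded directly with Huber's ψ and the robust scale d of the previous page; numpy, the choice k = 1.5, and the small data set below are illustrative assumptions, not values from the text.

```python
# Sketch: the one-step Huber M-estimate
#   theta1 = theta0 + d * sum(psi(t_i)) / sum(psi'(t_i)),  t_i = (x_i - theta0)/d,
# with theta0 = median(x_i) and d = median|x_i - median(x_i)| / 0.6745.
import numpy as np

def huber_psi(t, k):
    return np.clip(t, -k, k)                 # -k, t, or k, as in Huber's psi

def huber_psi_prime(t, k):
    return (np.abs(t) <= k).astype(float)    # 1 on [-k, k], 0 outside

def one_step_m_estimate(x, k=1.5):           # k = 1.5 is an illustrative choice
    theta0 = np.median(x)
    d = np.median(np.abs(x - np.median(x))) / 0.6745
    t = (x - theta0) / d
    return theta0 + d * huber_psi(t, k).sum() / huber_psi_prime(t, k).sum()

x = np.array([9.7, 10.1, 9.9, 10.3, 10.0, 9.8, 55.0])   # one gross outlier
theta1 = one_step_m_estimate(x)
print(round(theta1, 3))      # stays near 10, unlike the sample mean (about 16.4)
assert abs(theta1 - 10.0) < 0.5
```

The gross outlier contributes only the bounded value k to the ψ sum and drops out of the ψ′ sum entirely, so the one-step estimate barely moves from the median.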
Suppose that θ̂ is the solution of Equation (1) and that the scale parameter d is known. Two terms of Taylor's expansion of

    \sum_{i=1}^{n} \psi\!\left( \frac{X_i - \hat\theta}{d} \right)

about θ provides the approximation

    \sum_{i=1}^{n} \psi\!\left( \frac{X_i - \theta}{d} \right)
    + (\hat\theta - \theta) \sum_{i=1}^{n} \psi'\!\left( \frac{X_i - \theta}{d} \right) \left( -\frac{1}{d} \right) \approx 0.

This can be rewritten as

    \hat\theta - \theta \approx \frac{d \displaystyle \sum_{i=1}^{n} \psi\!\left( \frac{X_i - \theta}{d} \right)}{\displaystyle \sum_{i=1}^{n} \psi'\!\left( \frac{X_i - \theta}{d} \right)}.  (2)

We have considered distributions for which

    E\left[ \psi\!\left( \frac{X - \theta}{d} \right) \right] = 0.

Clearly,

    \operatorname{var}\left[ \psi\!\left( \frac{X - \theta}{d} \right) \right] = E\left[ \psi^2\!\left( \frac{X - \theta}{d} \right) \right].

Thus Equation (2) can be rewritten as

    \frac{\sqrt{n}\,(\hat\theta - \theta)}{d \sqrt{E[\psi^2((X - \theta)/d)]} \Big/ E[\psi'((X - \theta)/d)]}
    \approx \frac{\displaystyle \sum_{i=1}^{n} \psi\!\left( \frac{X_i - \theta}{d} \right) \Big/ \sqrt{n E[\psi^2((X - \theta)/d)]}}{\displaystyle \sum_{i=1}^{n} \psi'\!\left( \frac{X_i - \theta}{d} \right) \Big/ \left\{ n E[\psi'((X - \theta)/d)] \right\}}.  (3)