Hypothesis Testing: Large Sample
Asymptotic Theory — Part IV

James J. Heckman, University of Chicago
Econ 312. This draft, April 18, 2006.

This lecture consists of three sections:

• Three commonly used tests
• Composite hypothesis
• Application to the linear regression model

Throughout, the log likelihood for an i.i.d. sample $x_1, \dots, x_N$ is
$$\ln \mathcal{L} = \sum_{i=1}^{N} \ln f(x_i \mid \theta),$$
where the $x_i$ are i.i.d.
1 Three commonly used tests
In this section we examine three commonly used (asymptotically equivalent) tests:

1. Wald Test
2. Rao, Score, or Lagrange Multiplier (LM) Test
3. Likelihood Ratio (LR) Test

We motivate the derivation of the test statistic in each case and show the asymptotic equivalence of the three tests.
1.1 Wald Test
Consider a simple null hypothesis:
$$H_0: \theta = \theta_0, \qquad H_1: \theta \neq \theta_0.$$
A natural test uses the fact¹ that
$$\sqrt{N}\,(\hat\theta - \theta_0) \xrightarrow{d} N\!\left(0,\ I_0^{-1}\right),$$
so that
$$N\,(\hat\theta - \theta_0)'\, I_0\, (\hat\theta - \theta_0) \xrightarrow{d} \chi^2(k) \qquad (k = \text{number of parameters}),$$
where
$$I_0 = -E\!\left[\frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\right] = -E\!\left[\frac{\partial^2 \ln f(x\mid\theta)}{\partial\theta\,\partial\theta'}\right]$$
for i.i.d. data.

¹See Lecture III on asymptotic theory.

This is the Wald Test and is similar to the test used in OLS hypothesis testing. We use the uniform convergence of $\hat{I}$ to $I_0$ to obtain $\operatorname{plim} \hat{I} = I_0$, so that the Wald test statistic
$$W = N\,(\hat\theta - \theta_0)'\,\hat{I}\,(\hat\theta - \theta_0) \xrightarrow{d} \chi^2(k)$$
in large samples.

Note that the Wald test is based on the unrestricted MLE model.
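As an illustration (not from the notes), here is a minimal Python sketch of the Wald statistic for a hypothetical scalar example, $x_i \sim \text{Poisson}(\theta)$ with $H_0: \theta = \theta_0$, where the MLE is the sample mean and the per-observation information is $I(\theta) = 1/\theta$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical example: x_i ~ Poisson(theta), H0: theta = theta0.
# The MLE is the sample mean; the score is x/theta - 1, so the
# per-observation information is I(theta) = 1/theta.
theta_true, theta0, N = 1.3, 1.0, 500
x = rng.poisson(theta_true, size=N)

theta_hat = x.mean()                        # unrestricted MLE
I_hat = 1.0 / theta_hat                     # information evaluated at the MLE
W = N * (theta_hat - theta0) ** 2 * I_hat   # Wald statistic

p_value = stats.chi2.sf(W, df=1)            # compare with chi^2(1)
print(f"W = {W:.2f}, p = {p_value:.4f}")
```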
Recall that at the true parameter vector $\theta = \theta_0$,
$$E\!\left[\frac{\partial \ln \mathcal{L}(x\mid\theta)}{\partial\theta}\right] = \int \frac{\partial \ln \mathcal{L}(x\mid\theta)}{\partial\theta}\bigg|_{\theta=\theta_0} \mathcal{L}(x\mid\theta_0)\,dx = 0,$$
because
$$\int \mathcal{L}(x\mid\theta_0)\,dx = 1 \;\Rightarrow\; \int \frac{\partial \mathcal{L}(x\mid\theta)}{\partial\theta}\bigg|_{\theta=\theta_0} dx = 0,$$
but
$$\frac{\partial \mathcal{L}}{\partial\theta} = \frac{\partial \ln \mathcal{L}}{\partial\theta}\,\mathcal{L} \;\Rightarrow\; \int \frac{\partial \ln \mathcal{L}(x\mid\theta)}{\partial\theta}\bigg|_{\theta=\theta_0} \mathcal{L}(x\mid\theta_0)\,dx = 0.$$
Differentiating again,
$$\int \frac{\partial^2 \ln \mathcal{L}(x\mid\theta)}{\partial\theta\,\partial\theta'}\bigg|_{\theta=\theta_0} \mathcal{L}(x\mid\theta_0)\,dx + \int \frac{\partial \ln \mathcal{L}(x\mid\theta)}{\partial\theta}\bigg|_{\theta=\theta_0} \frac{\partial \ln \mathcal{L}(x\mid\theta)}{\partial\theta'}\bigg|_{\theta=\theta_0} \mathcal{L}(x\mid\theta_0)\,dx = 0.$$
Therefore,
$$I_0 = E\!\left[\frac{1}{N}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta=\theta_0}\right)\left(\frac{\partial \ln \mathcal{L}}{\partial\theta'}\bigg|_{\theta=\theta_0}\right)\right].$$
Cramér-Rao Lower Bound (Scalar Case)

Consider an estimator $T(x)$ of $\theta$:
$$E(T(x)) = \int T(x)\,\mathcal{L}(x;\theta)\,dx.$$
Under regularity,
$$\frac{\partial E(T(x))}{\partial\theta} = \int T(x)\,\frac{\partial \ln \mathcal{L}(x;\theta)}{\partial\theta}\,\mathcal{L}\,dx = E\!\left(T(x)\,\frac{\partial \ln \mathcal{L}}{\partial\theta}\right) = \operatorname{Cov}\!\left(T(x),\ \frac{\partial \ln \mathcal{L}}{\partial\theta}\right),$$
because
$$E\!\left(\frac{\partial \ln \mathcal{L}}{\partial\theta}\right) = 0.$$

From the Cauchy-Schwarz inequality,
$$\left(\frac{\partial E(T(x))}{\partial\theta}\right)^2 \le \operatorname{Var}(T(x))\ \underbrace{\operatorname{Var}\!\left(\frac{\partial \ln \mathcal{L}}{\partial\theta}\right)}_{=\,N I_0 \text{ for an i.i.d. sample}}.$$
For a full rank information matrix $I_0$,
$$\operatorname{Var}(T(x)) \ge \frac{\left(\dfrac{\partial E(T(x))}{\partial\theta}\right)^2}{N I_0}.$$

If $T(x)$ is unbiased,
$$\frac{\partial E(T(x))}{\partial\theta} = 1 \;\Rightarrow\; \operatorname{Var}(T(x)) \ge \frac{1}{N I_0}.$$
There is a vector version:
$$\operatorname{Var}(T(x)) \ge (N I_0)^{-1}.$$
One cannot do better than the MLE in terms of efficiency.
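The bound can be checked numerically. A minimal Monte Carlo sketch, again using the hypothetical Poisson example, where the sample mean is unbiased for $\theta$ and $I(\theta) = 1/\theta$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical check of the Cramer-Rao bound for the Poisson mean.
# The MLE (sample mean) is unbiased and I(theta) = 1/theta per observation,
# so Var(theta_hat) should be close to 1/(N*I) = theta/N.
theta, N, reps = 2.0, 200, 20_000
theta_hats = rng.poisson(theta, size=(reps, N)).mean(axis=1)

print("Monte Carlo Var(theta_hat):", theta_hats.var())
print("Cramer-Rao bound theta/N:  ", theta / N)
```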
1.2 Rao Test (also LM or Score Test)
The second test, the Rao (LM) test, is based on the restricted model. It observes that in a large enough sample $\theta_0$ (the true parameter value) should approximately be a root of the likelihood equation:
$$\frac{1}{N}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0} = \frac{1}{N}\sum_{i=1}^{N}\frac{\partial \ln f(x_i\mid\theta)}{\partial\theta}\bigg|_{\theta_0} \approx 0,$$
i.e., it imposes the null on the model for all sample sizes.

In contrast, the Wald test gets its statistic from estimates of the unrestricted model (i.e., a model where the null is not imposed on the estimates). The Likelihood Ratio test compares the restricted likelihood with the unrestricted likelihood.
At the MLE,
$$\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\hat\theta_{ML}} = 0 \quad\text{and}\quad E\!\left(\frac{\partial \ln f(x\mid\theta)}{\partial\theta}\right) = 0,$$
but also
$$\frac{1}{N}\frac{\partial \ln \mathcal{L}}{\partial\theta} \xrightarrow{P} E\!\left(\frac{\partial \ln f(x\mid\theta)}{\partial\theta}\right) = 0 \;\Rightarrow\; \frac{1}{N}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0} \xrightarrow{P} 0.$$

Now
$$\underbrace{\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0} = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\frac{\partial \ln f(x_i)}{\partial\theta}\bigg|_{\theta_0} \xrightarrow{d} N(0,\ I_0)}_{\text{by the C.L.T. (Lindeberg-Lévy for i.i.d. data)}}$$
implies that the hypothesis can be tested by testing whether the score $\dfrac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0} = 0$ at the restricted parameter values.²

²For the distributional results, refer to the earlier lectures on asymptotic theory.
Thus this test uses the statistic:
$$LM = \left(\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0}\right)' \hat{I}^{-1} \left(\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0}\right) \xrightarrow{d} \chi^2(k).$$
That $LM \xrightarrow{d} \chi^2(k)$ in large samples can be shown using $\operatorname{plim} \hat{I} = I_0$.
The Rao test is also called the Score test or the Lagrange Multiplier test, because it can be motivated from the solution to a constrained maximization of the log likelihood subject to the constraint $\theta = \theta_0$. We get:
$$\text{Lagrangian:}\quad \ln \mathcal{L} - \lambda'(\theta - \theta_0)$$
$$\text{FOC:}\quad \frac{\partial \ln \mathcal{L}}{\partial\theta} - \lambda = 0$$
$$\text{At the null:}\quad \lambda = \frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0}.$$
Thus one can test on $\lambda$ (the Lagrange multiplier) or on the score $\left(\dfrac{\partial \ln \mathcal{L}}{\partial\theta}\right)$, as in the sketch below.
Asymptotic Equivalence of the Wald and LM Tests. To establish the asymptotic relationship between the two tests, we use Taylor's theorem to write:
$$\underbrace{\frac{1}{\sqrt{N}}\sum_i \frac{\partial \ln f_i}{\partial\theta}\bigg|_{\hat\theta}}_{=0 \text{ by construction}} = \frac{1}{\sqrt{N}}\sum_i \frac{\partial \ln f_i}{\partial\theta}\bigg|_{\theta_0} + \frac{1}{N}\left(\sum_i \frac{\partial^2 \ln f_i}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)\sqrt{N}\,(\hat\theta - \theta_0),$$
where $\theta^*$ is an intermediate value lying between $\theta_0$ and $\hat\theta$.

From the above, we get the duality relationship between the score and the parameter vector:
$$\sqrt{N}\,(\hat\theta - \theta_0) = -\left(\frac{1}{N}\sum_i \frac{\partial^2 \ln f_i}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)^{-1} \frac{1}{\sqrt{N}}\sum_i \frac{\partial \ln f_i}{\partial\theta}\bigg|_{\theta_0}.$$
Noting that $\operatorname{plim} \theta^* = \theta_0$ and substituting into the Wald statistic, we get:
$$\operatorname{plim} W = \operatorname{plim}\left(\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0}\right)' I_0^{-1}\, I_0\, I_0^{-1} \left(\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0}\right) = \operatorname{plim} LM.$$
Thus, asymptotically the Wald and Rao tests are equivalent.
1.3 Likelihood Ratio Test
The third commonly used test is the Likelihood Ratio test, and it uses both the restricted and the unrestricted models. Taylor expanding the log likelihood around the point $\hat\theta$, we get:
$$\underbrace{\ln \mathcal{L}(\theta_0)}_{\text{start with the restricted value}} = \ln \mathcal{L}(\hat\theta) + \underbrace{\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\hat\theta}(\theta_0 - \hat\theta)}_{=0 \text{ by construction}} + \frac{1}{2}(\theta_0 - \hat\theta)'\left(\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)(\theta_0 - \hat\theta),$$
where $\theta^*$ is an intermediate value lying between $\theta_0$ and $\hat\theta$.

Rearranging,
$$-2\left(\ln \mathcal{L}(\theta_0) - \ln \mathcal{L}(\hat\theta)\right) = \sqrt{N}(\hat\theta - \theta_0)'\left(-\frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)\sqrt{N}(\hat\theta - \theta_0),$$
where
$$\left(-\frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right) \xrightarrow{p} I_0 \quad\text{and}\quad \sqrt{N}(\hat\theta - \theta_0) \xrightarrow{d} N\!\left(0,\ I_0^{-1}\right).$$
Hence
$$2\left[\ln \mathcal{L}(\hat\theta) - \ln \mathcal{L}(\theta_0)\right] = \sqrt{N}(\hat\theta - \theta_0)'\left(-\frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)\sqrt{N}(\hat\theta - \theta_0),$$
so that we may also write
$$2\ln\left[\frac{\mathcal{L}(\hat\theta)}{\mathcal{L}(\theta_0)}\right] \approx \sqrt{N}(\hat\theta - \theta_0)'\, I_0\, \sqrt{N}(\hat\theta - \theta_0) \xrightarrow{d} \chi^2(k),$$
since $\sqrt{N}(\hat\theta - \theta_0) \xrightarrow{d} N\!\left(0,\ I_0^{-1}\right)$.

Based on the above derivation, we get the statistic for the Likelihood Ratio test:
$$LR = -2\ln\left(\frac{\hat{\mathcal{L}}_R}{\hat{\mathcal{L}}_U}\right),$$
where $\hat{\mathcal{L}}_R$ is the restricted maximized likelihood and $\hat{\mathcal{L}}_U$ is the unrestricted maximized likelihood, and as shown above
$$LR \xrightarrow{d} \chi^2(k).$$
Note that $\hat{\mathcal{L}}_U \ge \hat{\mathcal{L}}_R$ (an unrestricted maximized function must be greater than or equal to the restricted maximized function), so that always $LR \ge 0$.
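Continuing the hypothetical Poisson example, a minimal sketch of the LR statistic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical example continued: x_i ~ Poisson(theta), H0: theta = theta0.
# LR compares the log likelihood at the restricted and unrestricted maxima.
theta_true, theta0, N = 1.3, 1.0, 500
x = rng.poisson(theta_true, size=N)
theta_hat = x.mean()

def loglik(theta):
    # Poisson log likelihood, up to a term that does not depend on theta
    return np.sum(x * np.log(theta) - theta)

LR = 2.0 * (loglik(theta_hat) - loglik(theta0))
print(f"LR = {LR:.2f}, p = {stats.chi2.sf(LR, df=1):.4f}")
```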
Asymptotic Equivalence of the LR and Wald Tests

From the derivation above, it is clear that asymptotically the LR test statistic converges to the Wald test statistic, i.e.,
$$\operatorname{plim} LR = \underbrace{\sqrt{N}(\hat\theta - \theta_0)'}_{\xrightarrow{d}\ Z'}\, I_0\, \underbrace{\sqrt{N}(\hat\theta - \theta_0)}_{\xrightarrow{d}\ Z} = \operatorname{plim} W,$$
where $Z \sim N\!\left(0,\ I_0^{-1}\right)$.

We saw earlier that the LM and Wald tests were asymptotically equivalent. Thus, together with the above asymptotic equivalence of LR and Wald, we get that all three commonly used tests are asymptotically equivalent, i.e., in large samples $W \approx LR \approx LM$.
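A rough numerical check of this three-way equivalence, using the same hypothetical Poisson example with a large sample:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical numerical check that W, LM, and LR nearly coincide in a
# large sample (Poisson example from above, H0: theta = theta0).
theta_true, theta0, N = 1.1, 1.0, 20_000
x = rng.poisson(theta_true, size=N)
theta_hat = x.mean()

W = N * (theta_hat - theta0) ** 2 / theta_hat
LM = (np.sum(x / theta0 - 1.0) / np.sqrt(N)) ** 2 * theta0
LR = 2.0 * np.sum(x * np.log(theta_hat / theta0) - (theta_hat - theta0))

print(f"W = {W:.2f}, LM = {LM:.2f}, LR = {LR:.2f}")
```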
2 Composite Hypothesis
Consider a parameter vector $\theta = (\theta_1, \theta_2)$ (where $\theta_1$ and $\theta_2$ are possibly vectors). In many situations, we may want to consider estimating only a subset of the parameters, $\theta_2$. This could be because we omit $\theta_1$ or because we hypothesize that $\theta_1 = 0$. In this context, two questions are relevant:

• When do we get an unbiased estimate of $\theta_2$ if we do the MLE estimation omitting $\theta_1$?
• How do the results obtained in the previous section apply to testing the composite hypothesis
$$H_0: \theta_1 = 0 \text{ and } \theta_2 \text{ unrestricted} \qquad\text{vs.}\qquad H_1: \theta_1 \text{ and } \theta_2 \text{ both unrestricted?}$$

We explore these two questions in the next few subsections.
2.1 Definitions
We use the following defined expressions in the rest of the section.

• Define the true parameter vector $\theta_0 \equiv (\bar\theta_1, \bar\theta_2)$;
• Define the estimated parameter vector (unconstrained) as $\hat\theta \equiv (\hat\theta_1, \hat\theta_2)$;
• Define the estimated parameter vector (constrained, i.e., under the null hypothesis setting $\theta_1 = 0$) as $\tilde\theta \equiv (0, \tilde\theta_2)$;
• In the Taylor expansions hereafter, we use $\theta_0$ and the intermediate values interchangeably (neglecting $o_p(1)$ terms);
• Define the blocks of the information matrix as follows:
$$-E\!\left[\frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\right]_{\theta_0} = I_0 = \begin{pmatrix} I_{1 1} & I_{1 2} \\ I_{2 1} & I_{2 2} \end{pmatrix}.$$
2.2 When is $\hat\theta_2$ unbiased if $\theta_1$ is omitted?

In this subsection, we show that the condition for $\hat\theta_2$ to be an unbiased estimator of $\bar\theta_2$ when the model is misspecified and estimated omitting $\theta_1$ is that the score vectors $\dfrac{\partial \ln \mathcal{L}}{\partial\theta_1}$ and $\dfrac{\partial \ln \mathcal{L}}{\partial\theta_2}$ be orthogonal to each other.
First, Taylor expanding the score vectors around $\theta_0$ and evaluating at $\hat\theta$, we get:
$$\frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)_{\hat\theta} = \frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)_{\theta_0} - I_{1 1}\sqrt{N}(\hat\theta_1 - \bar\theta_1) - I_{1 2}\sqrt{N}(\hat\theta_2 - \bar\theta_2) + o_p(1) \tag{1}$$
$$\frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)_{\hat\theta} = \frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)_{\theta_0} - I_{2 1}\sqrt{N}(\hat\theta_1 - \bar\theta_1) - I_{2 2}\sqrt{N}(\hat\theta_2 - \bar\theta_2) + o_p(1) \tag{2}$$
Since both left-hand sides are zero at the unconstrained MLE:
$$(1)\quad 0 = \frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)_{\theta_0} - I_{1 1}\sqrt{N}(\hat\theta_1 - \bar\theta_1) - I_{1 2}\sqrt{N}(\hat\theta_2 - \bar\theta_2) + o_p(1)$$
$$(2)\quad 0 = \frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)_{\theta_0} - I_{2 1}\sqrt{N}(\hat\theta_1 - \bar\theta_1) - I_{2 2}\sqrt{N}(\hat\theta_2 - \bar\theta_2) + o_p(1)$$
Collecting terms we get:
$$\sqrt{N}\begin{pmatrix} \hat\theta_1 - \bar\theta_1 \\ \hat\theta_2 - \bar\theta_2 \end{pmatrix} = [I_0]^{-1}\begin{pmatrix} \dfrac{1}{\sqrt{N}}\left(\dfrac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)_{\theta_0} \\[2ex] \dfrac{1}{\sqrt{N}}\left(\dfrac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)_{\theta_0} \end{pmatrix}.$$
Next consider $\tilde\theta_2$ (the MLE of $\theta_2$ given $\theta_1 = \bar\theta_1 = 0$). Expanding the root of the likelihood equation around $\theta_0$, we get:
$$0 = \frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)_{\tilde\theta} = \frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)_{\theta_0} - I_{2 2}\sqrt{N}(\tilde\theta_2 - \bar\theta_2) + o_p(1). \tag{3}$$
Now the first term on the right-hand side is in common with the corresponding term of (2) above. Therefore we have:
$$I_{2 2}\sqrt{N}(\tilde\theta_2 - \bar\theta_2) = I_{2 2}\sqrt{N}(\hat\theta_2 - \bar\theta_2) + I_{2 1}\sqrt{N}(\hat\theta_1 - \bar\theta_1) + o_p(1),$$
or:
$$\sqrt{N}(\tilde\theta_2 - \bar\theta_2) = \sqrt{N}(\hat\theta_2 - \bar\theta_2) + I_{2 2}^{-1}\, I_{2 1}\, \sqrt{N}(\hat\theta_1 - \bar\theta_1) + o_p(1). \tag{4}$$
Thus, we get that for $\sqrt{N}(\tilde\theta_2 - \bar\theta_2) = \sqrt{N}(\hat\theta_2 - \bar\theta_2)$ we need:
$$I_{2 1} = E\!\left[\frac{1}{N}\,\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)'\right] = 0,$$
i.e., for $\tilde\theta$ (the constrained estimator) to have the same properties as $\hat\theta$ (the unconstrained estimator) we need the score vectors $\dfrac{\partial \ln \mathcal{L}}{\partial\theta_1}$ and $\dfrac{\partial \ln \mathcal{L}}{\partial\theta_2}$ to be uncorrelated.
Similarity with an OLS result: Note that this result is analogous to a result in the OLS framework. In the OLS model we have:
$$Y = X_1\beta_1 + X_2\beta_2 + U.$$
If we run the regression omitting $X_1$, we have:
$$Y = X_2\beta_2 + \{U + X_1\beta_1\}.$$
Then we get, under standard OLS assumptions:
$$\operatorname{plim} \hat\beta_2 = \beta_2 + \operatorname{plim}\left(\frac{X_2'X_2}{N}\right)^{-1} \operatorname{plim}\left(\frac{X_2'X_1}{N}\right)\beta_1,$$
so that we get a consistent estimate $\hat\beta_2$ if $X_2$ and $X_1$ are orthogonal to each other.³

³Note $\operatorname{plim} \dfrac{X_2'X_1}{N} = E[X_{2 i}'X_{1 i}]$ by an appropriate LLN.

Note that the score vectors in MLE are analogous to the data vectors in OLS; we shall revisit this analogy again in Section 3 below. A small simulation sketch of this omitted-variable result appears below.
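A minimal simulation sketch of the omitted-variable result (the design and parameter values are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical illustration: omitting X1 biases beta2_hat unless X1 and X2
# are orthogonal. We compare correlated and uncorrelated designs.
N, beta1, beta2 = 50_000, 1.0, 2.0

for rho in (0.6, 0.0):  # correlation between X1 and X2
    x1 = rng.standard_normal(N)
    x2 = rho * x1 + np.sqrt(1 - rho ** 2) * rng.standard_normal(N)
    y = beta1 * x1 + beta2 * x2 + rng.standard_normal(N)

    b2_short = (x2 @ y) / (x2 @ x2)  # regression of y on x2 alone
    print(f"rho = {rho}: beta2_hat = {b2_short:.3f} (true beta2 = {beta2})")
```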
2.3 Hypothesis testing results for the composite hypothesis

In this subsection, we show the following results:

• Asymptotic equivalence of the LR test and the Wald test;
• Asymptotic equivalence of the Rao test (Score/LM test) and the LR test.
2.3.1 Equivalence between the Likelihood Ratio and Wald Tests

From the definition of the LR test statistic in Section 1.3, we have the analogous definition for the composite hypothesis case:
$$LR = 2\ln\frac{\mathcal{L}(\hat\theta_1, \hat\theta_2)}{\mathcal{L}(\theta_1 = 0,\ \tilde\theta_2)} = -2\ln\frac{\mathcal{L}(\theta_1 = 0,\ \tilde\theta_2)}{\mathcal{L}(\hat\theta_1, \hat\theta_2)}$$
or
$$LR = -2\left[\ln \mathcal{L}(\theta_1 = 0,\ \tilde\theta_2) - \ln \mathcal{L}(\bar\theta_1, \bar\theta_2)\right] + 2\left[\ln \mathcal{L}(\hat\theta_1, \hat\theta_2) - \ln \mathcal{L}(\bar\theta_1, \bar\theta_2)\right].$$
Then, following the same steps as in the derivation in Section 1.3, we get:
$$LR = \sqrt{N}(\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2)'\, I_0\, \sqrt{N}(\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2) - \sqrt{N}(0,\ \tilde\theta_2 - \bar\theta_2)'\, I_0\, \sqrt{N}(0,\ \tilde\theta_2 - \bar\theta_2) + o_p(1)$$
or
$$LR = \sqrt{N}(\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2)'\, I_0\, \sqrt{N}(\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2) - \sqrt{N}(\tilde\theta_2 - \bar\theta_2)'\, I_{2 2}\, \sqrt{N}(\tilde\theta_2 - \bar\theta_2) + o_p(1).$$
Substituting for $\sqrt{N}(\tilde\theta_2 - \bar\theta_2)$ from equation (4) in Section 2.2, we get:
$$LR = \sqrt{N}(\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2)'\, I_0\, \sqrt{N}(\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2)$$
$$\quad - \left[\sqrt{N}(\hat\theta_2 - \bar\theta_2) + I_{2 2}^{-1} I_{2 1}\sqrt{N}(\hat\theta_1 - \bar\theta_1)\right]' I_{2 2} \left[\sqrt{N}(\hat\theta_2 - \bar\theta_2) + I_{2 2}^{-1} I_{2 1}\sqrt{N}(\hat\theta_1 - \bar\theta_1)\right] + o_p(1).$$
Call $\Delta_1 \equiv \sqrt{N}(\hat\theta_1 - \bar\theta_1)$ and $\Delta_2 \equiv \sqrt{N}(\hat\theta_2 - \bar\theta_2)$.
Then we have:
$$LR = \Delta_1' I_{1 1} \Delta_1 + \Delta_2' I_{2 1} \Delta_1 + \Delta_1' I_{1 2} \Delta_2 + \Delta_2' I_{2 2} \Delta_2$$
$$\quad - \Delta_2' I_{2 2} \Delta_2 - \Delta_2' I_{2 1} \Delta_1 - \Delta_1' I_{1 2} \Delta_2 - \Delta_1' I_{1 2} I_{2 2}^{-1} I_{2 1} \Delta_1 + o_p(1)$$
$$= \Delta_1'\left[I_{1 1} - I_{1 2}\, I_{2 2}^{-1}\, I_{2 1}\right]\Delta_1 + o_p(1)$$
$$= \Delta_1'\,(I^{1 1})^{-1}\,\Delta_1 + o_p(1), \tag{5}$$
where $I^{1 1}$ is the upper left diagonal block of
$$I_0^{-1} = \begin{pmatrix} I_{1 1} & I_{1 2} \\ I_{2 1} & I_{2 2} \end{pmatrix}^{-1} \equiv \begin{pmatrix} I^{1 1} & I^{1 2} \\ I^{2 1} & I^{2 2} \end{pmatrix}.$$
(See the partitioned inverse result established in Section 3.)

In the composite hypothesis case, the Wald test exploits the fact that $\sqrt{N}(\hat\theta_1 - \bar\theta_1) \xrightarrow{d} N(0,\ I^{1 1})$, so that the Wald test statistic here is:
$$W = \sqrt{N}(\hat\theta_1 - \bar\theta_1)'\,(I^{1 1})^{-1}\,\sqrt{N}(\hat\theta_1 - \bar\theta_1) + o_p(1). \tag{6}$$
From (5) and (6) above, we get directly the asymptotic equivalence of the Wald and LR tests, i.e., $LR \approx W$ in large samples.
2.3.2 Equivalence between the Rao (Score/LM) test and the other tests

To derive the statistic for the Rao (Score/LM) test, first compute the derivative with respect to $\theta_1$ when the constraint $\theta_1 = 0$ is imposed:
$$\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\tilde\theta} = \frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\theta_0} - I_{1 2}\sqrt{N}\left(\tilde\theta_2 - \bar\theta_2\right) + o_p(1). \tag{7}$$
From equation (3) in Section 2.2 we have:
$$\sqrt{N}\left(\tilde\theta_2 - \bar\theta_2\right) = I_{2 2}^{-1}\,\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\bigg|_{\theta_0} + o_p(1).$$
Substituting this result into (7), we get:
$$\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\tilde\theta} = \frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\theta_0} - I_{1 2}\, I_{2 2}^{-1}\,\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\bigg|_{\theta_0} + o_p(1) = S_1 - I_{1 2}\, I_{2 2}^{-1}\, S_2 + o_p(1),$$
defining
$$S_1 \equiv \frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)_{\theta_0} \quad\text{and}\quad S_2 \equiv \frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)_{\theta_0}.$$
Then we obtain the variance of the restricted score as:
$$\operatorname{Var}\left(\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\tilde\theta}\right) = E\!\left[\left(S_1 - I_{1 2} I_{2 2}^{-1} S_2\right)\left(S_1 - I_{1 2} I_{2 2}^{-1} S_2\right)'\right]$$
$$= I_{1 1} + I_{1 2}\, I_{2 2}^{-1}\, I_{2 2}\, I_{2 2}^{-1}\, I_{2 1} - 2\, I_{1 2}\, I_{2 2}^{-1}\, I_{2 1} = I_{1 1} - I_{1 2}\, I_{2 2}^{-1}\, I_{2 1} = (I^{1 1})^{-1}.$$
We thus have the Rao test statistic:
$$LM = \left(\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\tilde\theta}\right)' I^{1 1} \left(\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\tilde\theta}\right) = \left[S_1 - I_{1 2} I_{2 2}^{-1} S_2\right]' I^{1 1} \left[S_1 - I_{1 2} I_{2 2}^{-1} S_2\right].$$
From Section 2.2 and the results for partitioned inverses (refer to Section 2.3.1), we have:
$$\sqrt{N}(\hat\theta_1 - \bar\theta_1) = I^{1 1}\,\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\theta_0} + I^{1 2}\,\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\bigg|_{\theta_0} + o_p(1)$$
$$= I^{1 1}\left[\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\theta_0} - I_{1 2}\, I_{2 2}^{-1}\,\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\bigg|_{\theta_0}\right] + o_p(1) = I^{1 1}\left[S_1 - I_{1 2} I_{2 2}^{-1} S_2\right] + o_p(1),$$
using $I^{1 2} = -I^{1 1}\, I_{1 2}\, I_{2 2}^{-1}$.

Substituting this into the expression for the Wald statistic, we get:
$$W = \sqrt{N}(\hat\theta_1 - \bar\theta_1)'\,(I^{1 1})^{-1}\,\sqrt{N}(\hat\theta_1 - \bar\theta_1) = \left[S_1 - I_{1 2} I_{2 2}^{-1} S_2\right]' I^{1 1} \left[S_1 - I_{1 2} I_{2 2}^{-1} S_2\right] + o_p(1) = LM \text{ (Rao/Score)}.$$
Thus, along with the result in Section 2.3.1, we have even for the composite hypothesis case, asymptotically:
$$W \approx LR \approx LM \text{ (Rao)}.$$
3 Application to the linear regression model

In this section, we look at some analogies between the standard MLE results derived in the earlier sections and OLS regression results. (Recall that we already saw some analogies between OLS and MLE in Section 2.2 above.)

First, we show the analogy between OLS and MLE for estimation of parameter sub-vectors. Then we derive expressions for the three common test statistics and establish certain results for them.
3.1 Estimation of parameter subsets
A key result in regression is the Theorem of the Partitioned Inverse. Let
$$A = \begin{bmatrix} A_{1 1} & A_{1 2} \\ A_{2 1} & A_{2 2} \end{bmatrix}$$
be nonsingular, assume $A_{1 1}$ is nonsingular, and set $E \equiv \left(A_{2 2} - A_{2 1} A_{1 1}^{-1} A_{1 2}\right)^{-1}$. Then
$$A^{-1} = \begin{bmatrix} A_{1 1}^{-1}\left(I + A_{1 2}\, E\, A_{2 1}\, A_{1 1}^{-1}\right) & -A_{1 1}^{-1} A_{1 2}\, E \\ -E\, A_{2 1}\, A_{1 1}^{-1} & E \end{bmatrix}.$$
Proof: Multiply it out.
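A quick numerical check of the partitioned inverse formula (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical numeric check of the partitioned inverse formula.
k1, k2 = 2, 3
A = rng.standard_normal((k1 + k2, k1 + k2))
A = A @ A.T + np.eye(k1 + k2)          # symmetric positive definite, hence nonsingular
A11, A12 = A[:k1, :k1], A[:k1, k1:]
A21, A22 = A[k1:, :k1], A[k1:, k1:]

A11inv = np.linalg.inv(A11)
E = np.linalg.inv(A22 - A21 @ A11inv @ A12)   # inverse of the Schur complement

Ainv = np.block([
    [A11inv @ (np.eye(k1) + A12 @ E @ A21 @ A11inv), -A11inv @ A12 @ E],
    [-E @ A21 @ A11inv,                               E],
])
print("max abs error:", np.abs(Ainv - np.linalg.inv(A)).max())
```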
In the OLS case, this theorem produces a useful result (Frisch-Waugh-Goldberger). We partition the matrix of independent variables as
$$X = \begin{pmatrix} X_1 & X_2 \end{pmatrix},$$
so that
$$X'X = \begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}.$$
We define:
$$M_1 \equiv I - X_1(X_1'X_1)^{-1}X_1'$$
$$E \equiv \left[X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2\right]^{-1} = (X_2'M_1X_2)^{-1}.$$
Now we have the result for the OLS regression:
$$\hat\beta = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = (X'X)^{-1}(X'Y) = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}^{-1} \begin{bmatrix} X_1'Y \\ X_2'Y \end{bmatrix}. \tag{8}$$
Then, using the results from the theorem of the partitioned inverse above and simplifying, we can write:
$$\hat\beta_1 = (X_1'X_1)^{-1}(X_1'Y) + (X_1'X_1)^{-1}X_1'X_2\, E\, X_2'X_1(X_1'X_1)^{-1}X_1'Y - (X_1'X_1)^{-1}X_1'X_2\, E\, X_2'Y$$
$$\quad = (X_1'X_1)^{-1}X_1'Y - (X_1'X_1)^{-1}(X_1'X_2)\, E\, X_2'M_1Y,$$
$$\hat\beta_2 = -E\, X_2'X_1(X_1'X_1)^{-1}X_1'Y + E\, X_2'Y = E\, X_2'\left(I - X_1(X_1'X_1)^{-1}X_1'\right)Y = E\, X_2'M_1Y. \tag{9}$$
This leads to a Double Residual Regression result for $\hat\beta_2$: regress $Y$ on $X_1$, and $X_2$ on $X_1$; form the residuals; then regress one set of residuals on the other.
Result: $\hat\beta_2$ is the regression of "cleaned out $Y$" on "cleaned out $X_2$."

This result follows directly from results (8) and (9) above. Define:
$$\text{"Cleaned out } Y\text{"} \equiv Y^* = \left[I - X_1(X_1'X_1)^{-1}X_1'\right]Y = M_1Y$$
(the residual from the regression of $Y$ on $X_1$);
$$\text{"Cleaned out } X_2\text{"} \equiv X_2^* = \left[I - X_1(X_1'X_1)^{-1}X_1'\right]X_2 = M_1X_2$$
(the residual from the regression of $X_2$ on $X_1$).

Then from (9) above we have:
$$\hat\beta_2 = E\, X_2'M_1Y = \left[X_2'M_1'M_1X_2\right]^{-1}\left(X_2'M_1'Y\right) = \left[X_2^{*\prime}X_2^*\right]^{-1}\left(X_2^{*\prime}Y\right).$$
Note that we really don't have to clean out $Y$, just $X_2$, since $M_1$ is symmetric and idempotent. Further, if $X_1$ and $X_2$ are orthogonal (uncorrelated), then $X_2^* = X_2$, and hence an unbiased/consistent estimate of $\beta_2$ can be obtained by directly regressing $Y$ on $X_2$. (Recall that we derived the same result for MLE in Section 2.2.) A numerical sketch of the double residual regression appears below.
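A minimal numerical sketch of the double residual regression result (design values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical check of the double residual (Frisch-Waugh) result:
# beta2_hat from the full regression equals the coefficient from regressing
# Y (uncleaned) on M1*X2, since M1 is symmetric and idempotent.
N, k1, k2 = 1_000, 2, 3
X1 = rng.standard_normal((N, k1))
X2 = rng.standard_normal((N, k2)) + X1 @ rng.standard_normal((k1, k2))
Y = X1 @ np.ones(k1) + X2 @ np.arange(1, k2 + 1) + rng.standard_normal(N)

X = np.hstack([X1, X2])
beta_full = np.linalg.lstsq(X, Y, rcond=None)[0]       # full regression

M1 = np.eye(N) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T  # residual maker for X1
X2s = M1 @ X2                                          # cleaned out X2
beta2_fwl = np.linalg.lstsq(X2s, Y, rcond=None)[0]     # Y need not be cleaned

print("full:", beta_full[k1:])
print("FWL :", beta2_fwl)
```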
Also observe (derived directly from $\hat Y = X_1\hat\beta_1 + X_2\hat\beta_2$) that:
$$\hat Y'\hat Y = \underbrace{Y'X_1(X_1'X_1)^{-1}X_1'Y}_{\text{Part due to } X_1} + \underbrace{Y'M_1X_2\, E\, X_2'M_1Y}_{\text{Part due to orthogonalized } X_2}.$$
Thus, unless the regressors are orthogonal, there are no unique contributions.

Proof.

1. Observe that:
$$M_1X_1 = 0,\quad M_2X_2 = 0,\quad M_1' = M_1,\quad M_2' = M_2,$$
$$Y = \hat Y + \hat U = \left(X_1\hat\beta_1 + X_2\hat\beta_2\right) + \hat U.$$
2. Observe that:
$$\hat U'M_1X_2 = \hat U'X_2 - \hat U'X_1(X_1'X_1)^{-1}X_1'X_2 = 0,$$
since $\hat U$ is orthogonal to every column of $X = (X_1\ \ X_2)$, i.e., $\hat U'X_1 = 0$ and $\hat U'X_2 = 0$.

3. Then:
$$X_2'M_1Y = X_2'M_1X_1\,\hat\beta_1 + X_2'M_1X_2\,\hat\beta_2 + X_2'M_1\hat U,$$
where $X_2'M_1X_1 = 0$ (since $M_1X_1 = 0$) and $X_2'M_1\hat U = 0$ (by step 2). Thus,
$$\hat\beta_2 = (X_2'M_1X_2)^{-1}(X_2'M_1Y).$$
4. Observe that $\hat\beta_2$ is the result of the regression
$$M_1Y = M_1X_2\,\hat\beta_2 + \text{error}_2,$$
but
$$Y = X_1\beta_1 + X_2\beta_2 + U$$
$$\Rightarrow\ M_1Y = \underbrace{M_1X_1}_{=0}\beta_1 + M_1X_2\,\beta_2 + M_1U = M_1X_2\,\beta_2 + M_1U$$
$$\Rightarrow\ \text{error}_2 = M_1U.$$
(From Davidson & MacKinnon.)
Analogously, for MLE in the neighborhood of the optimum we have:
$$\sqrt{N}\left(\hat\theta - \theta_0\right) \approx \left[\frac{1}{N}\,G'G\right]^{-1}\frac{1}{\sqrt{N}}\,G'\iota + o_p(1),$$
where $G \equiv (G_1\ \ G_2)$ stacks the observation-level scores ($G_j$ is the $N \times k_j$ matrix whose $i$-th row is $\partial \ln f(x_i\mid\theta)/\partial\theta_j'$ evaluated at $\theta_0$, so that $G'\iota = \partial \ln \mathcal{L}/\partial\theta$), and $\iota$ is an $N \times 1$ vector of ones, i.e., $\iota = (1, 1, \dots, 1)'$. In partitioned form,
$$G'G = \begin{pmatrix} G_1'G_1 & G_1'G_2 \\ G_2'G_1 & G_2'G_2 \end{pmatrix}, \qquad G'\iota = \begin{pmatrix} G_1'\iota \\ G_2'\iota \end{pmatrix} = \begin{pmatrix} \partial \ln \mathcal{L}/\partial\theta_1 \\ \partial \ln \mathcal{L}/\partial\theta_2 \end{pmatrix}.$$
Comparing the above equation to the result for the standard OLS regression (see eqn (8) above), we note that the MLE result is analogous to regressing $\iota$ (a vector of ones) on the score vectors; i.e., we have the correspondences:
$$[X_1'X_2] \leftrightarrow [G_1'G_2] = \left[\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)'\right] \quad\text{and}\quad Y \leftrightarrow \iota.$$
Further, we also get:
$$\sqrt{N}\left(\hat\theta_2 - \bar\theta_2\right) = \left[\frac{1}{N}\,G_2'\left(I - G_1(G_1'G_1)^{-1}G_1'\right)G_2\right]^{-1}\frac{1}{\sqrt{N}}\,G_2'\left(I - G_1(G_1'G_1)^{-1}G_1'\right)\iota + o_p(1),$$
which is analogous to regression result (9) above: $\hat\theta_2$ plays the role of $\hat\beta_2$, the scores with respect to $\theta_1$ play the role of the regressors $X_1$ that are "cleaned out," and $\iota$ plays the role of $Y$.
3.2 Hypothesis testing results for the linear regression model

Consider the classical linear regression model:
$$Y_i = X_i\beta + U_i, \qquad U_i \sim N(0, \sigma^2) \text{ i.i.d.},$$
i.e.,
$$f(U_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2}\frac{U_i^2}{\sigma^2}\right)$$
$$\Rightarrow\ f(Y_i \mid X_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2\sigma^2}(Y_i - X_i\beta)^2\right).$$
With i.i.d. sampling, we have the log likelihood function:
$$\ln \mathcal{L} = -\frac{N}{2}\ln 2\pi - \frac{N}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(Y_i - X_i\beta)^2.$$
3.2.1 MLE Solution

Setting
$$\frac{\partial \ln \mathcal{L}}{\partial\beta} = \frac{1}{\sigma^2}\sum_i X_i'\left(Y_i - X_i\beta\right) = 0$$
at the optimum yields:
$$\hat\beta = \left(\sum_i X_i'X_i\right)^{-1}\sum_i X_i'Y_i, \qquad \hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i - X_i\hat\beta\right)^2.$$
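A minimal sketch confirming that the Gaussian MLE formulas above reproduce OLS, with $\hat\sigma^2$ dividing by $N$ rather than $N - k$ (example values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical check: the Gaussian MLE of beta is the OLS estimator, and
# sigma2_hat is the residual sum of squares divided by N (not N - k).
N, k = 500, 3
X = rng.standard_normal((N, k))
beta_true, sigma = np.array([1.0, -0.5, 2.0]), 1.5
Y = X @ beta_true + sigma * rng.standard_normal(N)

beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ Y)   # MLE = OLS
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / N                  # MLE of sigma^2

print("beta_hat:", beta_hat.round(3), " sigma2_hat:", round(sigma2_hat, 3))
```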
The blocks of the information matrix are:
$$-\frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial\beta\,\partial\beta'} = \frac{(X'X)}{N}\,\frac{1}{\sigma^2},$$
$$-\frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial\beta\,\partial(\sigma^2)} = \frac{1}{N(\sigma^2)^2}\sum_i X_i'\left(Y_i - X_i\beta\right),$$
which is zero when evaluated at $\hat\beta$ and has expectation zero at the truth. (This holds unless the model is such that there is a functional relationship between $\sigma^2$ and $\beta$; this we exclude by assumption.) Finally,
$$-E\!\left[\frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial(\sigma^2)^2}\right] = \frac{1}{2\sigma^4}.$$
This yields the information matrix:
$$I_0 = \begin{bmatrix} \dfrac{E(X_i'X_i)}{\sigma^2} & 0 \\[1ex] 0' & \dfrac{1}{2\sigma^4} \end{bmatrix},$$
which is block diagonal. (By the previous result, we may therefore ignore the parameter estimation error between $\hat\beta$ and $\hat\sigma^2$.)
3.2.2 Expressions for the various test statistics

Let $\beta = (\beta_1, \beta_2)$ and $H_0: \beta_2 = 0$. Then the expression for the Likelihood Ratio test statistic is given by:
$$LR = 2\ln\left[\frac{\mathcal{L}(\hat\theta)}{\mathcal{L}(\tilde\theta)}\right],$$
where $\mathcal{L}(\hat\theta)$ is the unrestricted maximized likelihood function and $\mathcal{L}(\tilde\theta)$ is the restricted maximized likelihood function.
We have:
$$\ln \mathcal{L}(\tilde\theta) = -\frac{N}{2}\ln \tilde\sigma^2 - \frac{1}{2\tilde\sigma^2}\sum_{i=1}^{N}\left(Y_i - X_{1 i}\tilde\beta_1\right)^2 + \text{const} = -\frac{N}{2}\ln\left[\frac{1}{N}\sum_i\left(Y_i - X_{1 i}\tilde\beta_1\right)^2\right] + \text{const},$$
using the fact that $\tilde\sigma^2 = \dfrac{1}{N}\sum_{i=1}^{N}\left(Y_i - X_{1 i}\tilde\beta_1\right)^2$.
Similarly,
$$\ln \mathcal{L}(\hat\theta) = -\frac{N}{2}\ln\left[\frac{1}{N}\sum_i\left(Y_i - X_{1 i}\hat\beta_1 - X_{2 i}\hat\beta_2\right)^2\right] + \text{const}$$
$$\Rightarrow\ LR = 2\ln\left[\frac{\mathcal{L}(\hat\theta)}{\mathcal{L}(\tilde\theta)}\right] = N\ln\left(\frac{Y'\left[I - X_1(X_1'X_1)^{-1}X_1'\right]Y}{Y'\left[I - X(X'X)^{-1}X'\right]Y}\right). \tag{10}$$
The expression for the Wald test statistic is obtained using the partitioned inverse theorem on the information matrix above and the standard asymptotic normality result for the MLE:
$$\left(\hat\beta_2 - \bar\beta_2\right) \stackrel{a}{\sim} N\!\left(0,\ \sigma^2\left(X_2'\left[I - X_1(X_1'X_1)^{-1}X_1'\right]X_2\right)^{-1}\right)$$
$$\Rightarrow\ W = \left(\hat\beta_2 - \beta_2^0\right)'\left[\frac{X_2'\left(I - X_1(X_1'X_1)^{-1}X_1'\right)X_2}{\hat\sigma^2}\right]\left(\hat\beta_2 - \beta_2^0\right),$$
where $\beta_2^0$ is the hypothesized value of $\beta_2$ (here $\beta_2^0 = 0$).
The test statistic for the Rao (LM/Score) test is obtained by computing the derivative of the log likelihood function at the point where $\beta_2 = 0$ is imposed. We get:
$$\ln \mathcal{L}(\beta, \sigma^2) = -\frac{N}{2}\ln 2\pi - \frac{N}{2}\ln \sigma^2 - \frac{\sum_i \left(Y_i - X_{1 i}\beta_1 - X_{2 i}\beta_2\right)^2}{2\sigma^2}$$
$$\Rightarrow\ \frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\beta_2}\bigg|_{\beta_2 = 0} = \frac{1}{\sqrt{N}}\,\frac{\sum_i X_{2 i}'\left(Y_i - X_{1 i}\tilde\beta_1\right)}{\tilde\sigma^2} = \frac{X_2'\tilde U}{\sqrt{N}\,\tilde\sigma^2}.$$

As discussed in the earlier sections, the Rao test statistic can now be formed by examining the asymptotic distribution of this score vector. In the linear regression context, the Rao test can also be motivated in another, intuitive way.

Basically, the Rao test considers the restrictions to be valid if the score $\approx 0$ (i.e., the log likelihood function is maximized) when the restrictions are imposed. Looking at the right-hand side of the expression above, in the linear regression case this is equivalent to checking whether the residual from the restricted MLE,
$$\tilde U = \left(Y - X_1\tilde\beta_1\right),$$
and $X_2$ are orthogonal!
Accordingly, we have the Rao test statistic:
$$R = \left(\frac{X_2'\tilde U}{\sqrt{N}\,\tilde\sigma^2}\right)'\ \tilde\sigma^2\left[\frac{X_2'\left(I - X_1(X_1'X_1)^{-1}X_1'\right)X_2}{N}\right]^{-1}\left(\frac{X_2'\tilde U}{\sqrt{N}\,\tilde\sigma^2}\right) = \frac{\tilde U'X_2\left(X_2'M_1X_2\right)^{-1}X_2'\tilde U}{\tilde\sigma^2}.$$
Observe a feature of the Rao test: is it equivalent to regressing $Y - X_1\tilde\beta_1$ on $X_2$ and testing for statistical significance? The answer is: not in general, as shown below. Regressing $Y - X_1\tilde\beta_1$ on $X_2$ gives:
$$Y - X_1\tilde\beta_1 = X_2\gamma + \text{residual}, \qquad \hat\gamma = (X_2'X_2)^{-1}X_2'\tilde U.$$
The OLS test of $\gamma = 0$ yields the statistic:
$$\frac{\hat\gamma'(X_2'X_2)\hat\gamma}{\hat\sigma_\gamma^2} = \frac{\tilde U'X_2(X_2'X_2)^{-1}(X_2'X_2)(X_2'X_2)^{-1}X_2'\tilde U}{\hat\sigma_\gamma^2} = \frac{\tilde U'X_2(X_2'X_2)^{-1}X_2'\tilde U}{\hat\sigma_\gamma^2},$$
where $\hat\sigma_\gamma^2$ is the estimated residual variance from this regression:
$$\hat\sigma_\gamma^2 = \frac{1}{N}\left(\tilde U - X_2\hat\gamma\right)'\left(\tilde U - X_2\hat\gamma\right) = \frac{1}{N}\left[\tilde U'\tilde U - \tilde U'X_2(X_2'X_2)^{-1}X_2'\tilde U\right].$$

Whereas the Rao statistic, as derived earlier, is:
$$R = \frac{\tilde U'X_2\left(X_2'M_1X_2\right)^{-1}X_2'\tilde U}{\tilde\sigma^2}.$$
So only if $M_1X_2 = X_2$ (i.e., $X_1 \perp X_2$) are these equal; not in general.

Note that if we regress $M_1Y$ on $X_1$ and $X_2$, then testing whether $X_2$ is significant in this regression is equivalent to the Rao test. That is, if we set the model up as:
$$M_1Y = X_1 b_1 + X_2 b_2 + \text{residual}, \qquad \hat b_2 = (X_2'M_1X_2)^{-1}X_2'M_1Y,$$
then the OLS test of $b_2 = 0$ is equivalent to the Rao test (left as an exercise for the reader).
3.2.3 Relationship between the various tests

While we demonstrated in earlier sections that, asymptotically, the three tests are equivalent, in this special linear regression case we can show that $W \ge LR \ge LM$, so that the Wald test is the least conservative (most likely to reject the null) and the Rao test is the most conservative (least likely to reject the null) in small/finite samples.

Define:
$$ESS(U) \equiv \text{Sum of Squared Residuals (Unrestricted)} = \left(Y - X_1\hat\beta_1 - X_2\hat\beta_2\right)'\left(Y - X_1\hat\beta_1 - X_2\hat\beta_2\right)$$
$$ESS(R) \equiv \text{Sum of Squared Residuals (Restricted)} = \left(Y - X_1\tilde\beta_1\right)'\left(Y - X_1\tilde\beta_1\right).$$

Using the results in the earlier subsections of Section 3, and from the expressions for the Wald, LR, and LM test statistics derived above, it can be shown (left as an exercise for the reader) that:
$$W = N\left[\frac{ESS(R) - ESS(U)}{ESS(U)}\right], \qquad LR = N\ln\left[\frac{ESS(R)}{ESS(U)}\right], \qquad LM \text{ (Rao)} = N\left[\frac{ESS(R) - ESS(U)}{ESS(R)}\right].$$
1. Proof that $W \ge LR$:
Define
$$\frac{ESS(R)}{ESS(U)} = x \ge 1 \;\Rightarrow\; LR = N\ln x, \quad W = N(x - 1).$$
Now $x - 1 \ge \ln x$ for $x \ge 1$ $\Rightarrow$ $W \ge LR$.

2. Proof that $LR \ge$ Rao:
Define
$$\frac{ESS(R)}{ESS(U)} = x \ge 1 \;\Rightarrow\; \text{Rao} = N\left[1 - \frac{1}{x}\right], \quad LR = N\ln x.$$
Now $\ln x \ge 1 - \dfrac{1}{x}$ (since $x \ge 1$ always) $\Rightarrow$ $LR \ge$ Rao.
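A rough finite-sample check of the ordering $W \ge LR \ge LM$ using the ESS-based formulas above (design values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical finite-sample check of W >= LR >= LM in the linear model,
# using the ESS-based formulas above with H0: beta2 = 0 (here false).
N = 200
X1 = np.column_stack([np.ones(N), rng.standard_normal(N)])
X2 = rng.standard_normal((N, 2))
Y = X1 @ [1.0, 0.5] + X2 @ [0.3, -0.2] + rng.standard_normal(N)

def ess(Xmat):
    # sum of squared residuals from regressing Y on the columns of Xmat
    resid = Y - Xmat @ np.linalg.lstsq(Xmat, Y, rcond=None)[0]
    return resid @ resid

ESS_R, ESS_U = ess(X1), ess(np.hstack([X1, X2]))
W = N * (ESS_R - ESS_U) / ESS_U
LR = N * np.log(ESS_R / ESS_U)
LM = N * (ESS_R - ESS_U) / ESS_R
print(f"W = {W:.2f} >= LR = {LR:.2f} >= LM = {LM:.2f}")
```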