Hypothesis Testing: Large Sample
Asymptotic Theory — Part IV

James J. Heckman, University of Chicago
Econ 312. This draft, April 18, 2006.

This lecture consists of three sections:

• Three commonly used tests
• Composite hypothesis
• Application to the linear regression model

Throughout, the log likelihood for an i.i.d. sample $x_1, \dots, x_N$ is
$$\ln \mathcal{L} = \sum_{i=1}^{N} \ln f(x_i \mid \theta),$$
where the $x_i$ are i.i.d.
1 Three commonly used tests
In this section we examine three commonly used (asymptotically equivalent) tests:

1. Wald Test
2. Rao, Score, or Lagrange Multiplier (LM) Test
3. Likelihood Ratio (LR) Test

We motivate the derivation of the test statistic in each case and show the asymptotic equivalence of the three tests.
1.1 Wald Test
Consider a simple null hypothesis:
$$H_0: \theta = \theta_0, \qquad H_1: \theta \neq \theta_0.$$
A natural test uses the fact¹ that
$$\sqrt{N}\,(\hat\theta - \theta_0) \xrightarrow{d} N\!\left(0,\ I_0^{-1}\right),$$
so that
$$N\,(\hat\theta - \theta_0)'\, I_0\, (\hat\theta - \theta_0) \xrightarrow{d} \chi^2(k) \qquad (k = \text{number of parameters}),$$
where
$$I_0 = -E\!\left[\frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\right] = -E\!\left[\frac{\partial^2 \ln f(x\mid\theta)}{\partial\theta\,\partial\theta'}\right]$$
for i.i.d. data.

¹See Lecture III on asymptotic theory.

This is the Wald Test and is similar to the test used in OLS hypothesis testing. We use the uniform convergence of $\hat{I}$ to $I_0$ to obtain $\operatorname{plim} \hat{I} = I_0$, so that the Wald test statistic
$$W = N\,(\hat\theta - \theta_0)'\,\hat{I}\,(\hat\theta - \theta_0) \xrightarrow{d} \chi^2(k)$$
in large samples.

Note that the Wald test is based on the unrestricted MLE model.
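As an illustration (not from the notes), here is a minimal Python sketch of the Wald statistic for a hypothetical scalar example, $x_i \sim \text{Poisson}(\theta)$ with $H_0: \theta = \theta_0$, where the MLE is the sample mean and the per-observation information is $I(\theta) = 1/\theta$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical example: x_i ~ Poisson(theta), H0: theta = theta0.
# The MLE is the sample mean; the score is x/theta - 1, so the
# per-observation information is I(theta) = 1/theta.
theta_true, theta0, N = 1.3, 1.0, 500
x = rng.poisson(theta_true, size=N)

theta_hat = x.mean()                        # unrestricted MLE
I_hat = 1.0 / theta_hat                     # information evaluated at the MLE
W = N * (theta_hat - theta0) ** 2 * I_hat   # Wald statistic

p_value = stats.chi2.sf(W, df=1)            # compare with chi^2(1)
print(f"W = {W:.2f}, p = {p_value:.4f}")
```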
Recall that at the true parameter vector $\theta = \theta_0$,
$$E\!\left[\frac{\partial \ln \mathcal{L}(x\mid\theta)}{\partial\theta}\right] = \int \frac{\partial \ln \mathcal{L}(x\mid\theta)}{\partial\theta}\bigg|_{\theta=\theta_0} \mathcal{L}(x\mid\theta_0)\,dx = 0,$$
because
$$\int \mathcal{L}(x\mid\theta_0)\,dx = 1 \;\Rightarrow\; \int \frac{\partial \mathcal{L}(x\mid\theta)}{\partial\theta}\bigg|_{\theta=\theta_0} dx = 0,$$
but
$$\frac{\partial \mathcal{L}}{\partial\theta} = \frac{\partial \ln \mathcal{L}}{\partial\theta}\,\mathcal{L} \;\Rightarrow\; \int \frac{\partial \ln \mathcal{L}(x\mid\theta)}{\partial\theta}\bigg|_{\theta=\theta_0} \mathcal{L}(x\mid\theta_0)\,dx = 0.$$
Differentiating again,
$$\int \frac{\partial^2 \ln \mathcal{L}(x\mid\theta)}{\partial\theta\,\partial\theta'}\bigg|_{\theta=\theta_0} \mathcal{L}(x\mid\theta_0)\,dx + \int \frac{\partial \ln \mathcal{L}(x\mid\theta)}{\partial\theta}\bigg|_{\theta=\theta_0} \frac{\partial \ln \mathcal{L}(x\mid\theta)}{\partial\theta'}\bigg|_{\theta=\theta_0} \mathcal{L}(x\mid\theta_0)\,dx = 0.$$
Therefore,
$$I_0 = E\!\left[\frac{1}{N}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta=\theta_0}\right)\left(\frac{\partial \ln \mathcal{L}}{\partial\theta'}\bigg|_{\theta=\theta_0}\right)\right].$$
Cramér-Rao Lower Bound (Scalar Case)

Consider an estimator $T(x)$ of $\theta$:
$$E(T(x)) = \int T(x)\,\mathcal{L}(x;\theta)\,dx.$$
Under regularity,
$$\frac{\partial E(T(x))}{\partial\theta} = \int T(x)\,\frac{\partial \ln \mathcal{L}(x;\theta)}{\partial\theta}\,\mathcal{L}\,dx = E\!\left(T(x)\,\frac{\partial \ln \mathcal{L}}{\partial\theta}\right) = \operatorname{Cov}\!\left(T(x),\ \frac{\partial \ln \mathcal{L}}{\partial\theta}\right),$$
because
$$E\!\left(\frac{\partial \ln \mathcal{L}}{\partial\theta}\right) = 0.$$

From the Cauchy-Schwarz inequality,
$$\left(\frac{\partial E(T(x))}{\partial\theta}\right)^2 \le \operatorname{Var}(T(x))\ \underbrace{\operatorname{Var}\!\left(\frac{\partial \ln \mathcal{L}}{\partial\theta}\right)}_{=\,N I_0 \text{ for an i.i.d. sample}}.$$
For a full rank information matrix $I_0$,
$$\operatorname{Var}(T(x)) \ge \frac{\left(\dfrac{\partial E(T(x))}{\partial\theta}\right)^2}{N I_0}.$$

If $T(x)$ is unbiased,
$$\frac{\partial E(T(x))}{\partial\theta} = 1 \;\Rightarrow\; \operatorname{Var}(T(x)) \ge \frac{1}{N I_0}.$$
There is a vector version:
$$\operatorname{Var}(T(x)) \ge (N I_0)^{-1}.$$
One cannot do better than the MLE in terms of efficiency.
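The bound can be checked numerically. A minimal Monte Carlo sketch, again using the hypothetical Poisson example, where the sample mean is unbiased for $\theta$ and $I(\theta) = 1/\theta$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical check of the Cramer-Rao bound for the Poisson mean.
# The MLE (sample mean) is unbiased and I(theta) = 1/theta per observation,
# so Var(theta_hat) should be close to 1/(N*I) = theta/N.
theta, N, reps = 2.0, 200, 20_000
theta_hats = rng.poisson(theta, size=(reps, N)).mean(axis=1)

print("Monte Carlo Var(theta_hat):", theta_hats.var())
print("Cramer-Rao bound theta/N:  ", theta / N)
```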
1.2 Rao Test (also LM or Score Test)
The second test, the Rao (LM) test, is based on the restricted model. It observes that in a large enough sample $\theta_0$ (the true parameter value) should approximately be a root of the likelihood equation:
$$\frac{1}{N}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0} = \frac{1}{N}\sum_{i=1}^{N}\frac{\partial \ln f(x_i\mid\theta)}{\partial\theta}\bigg|_{\theta_0} \approx 0,$$
i.e., it imposes the null on the model for all sample sizes.

In contrast, the Wald test gets its statistic from estimates of the unrestricted model (i.e., a model where the null is not imposed on the estimates). The Likelihood Ratio test compares the restricted likelihood with the unrestricted likelihood.
At the MLE,
$$\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\hat\theta_{ML}} = 0 \quad\text{and}\quad E\!\left(\frac{\partial \ln f(x\mid\theta)}{\partial\theta}\right) = 0,$$
but also
$$\frac{1}{N}\frac{\partial \ln \mathcal{L}}{\partial\theta} \xrightarrow{P} E\!\left(\frac{\partial \ln f(x\mid\theta)}{\partial\theta}\right) = 0 \;\Rightarrow\; \frac{1}{N}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0} \xrightarrow{P} 0.$$

Now
$$\underbrace{\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0} = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\frac{\partial \ln f(x_i)}{\partial\theta}\bigg|_{\theta_0} \xrightarrow{d} N(0,\ I_0)}_{\text{by the C.L.T. (Lindeberg-Lévy for i.i.d. data)}}$$
implies that the hypothesis can be tested by testing whether the score $\dfrac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0} = 0$ at the restricted parameter values.²

²For the distributional results, refer to the earlier lectures on asymptotic theory.
Thus this test uses the statistic:
$$LM = \left(\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0}\right)' \hat{I}^{-1} \left(\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0}\right) \xrightarrow{d} \chi^2(k).$$
That $LM \xrightarrow{d} \chi^2(k)$ in large samples can be shown using $\operatorname{plim} \hat{I} = I_0$.
The Rao test is also called the Score test or the Lagrange Multiplier test, because it can be motivated from the solution to a constrained maximization of the log likelihood subject to the constraint $\theta = \theta_0$. We get:
$$\text{Lagrangian:}\quad \ln \mathcal{L} - \lambda'(\theta - \theta_0)$$
$$\text{FOC:}\quad \frac{\partial \ln \mathcal{L}}{\partial\theta} - \lambda = 0$$
$$\text{At the null:}\quad \lambda = \frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0}.$$
Thus one can test on $\lambda$ (the Lagrange multiplier) or on the score $\left(\dfrac{\partial \ln \mathcal{L}}{\partial\theta}\right)$, as in the sketch below.
Asymptotic Equivalence of the Wald and LM Tests. To establish the asymptotic relationship between the two tests, we use Taylor's theorem to write:
$$\underbrace{\frac{1}{\sqrt{N}}\sum_i \frac{\partial \ln f_i}{\partial\theta}\bigg|_{\hat\theta}}_{=0 \text{ by construction}} = \frac{1}{\sqrt{N}}\sum_i \frac{\partial \ln f_i}{\partial\theta}\bigg|_{\theta_0} + \frac{1}{N}\left(\sum_i \frac{\partial^2 \ln f_i}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)\sqrt{N}\,(\hat\theta - \theta_0),$$
where $\theta^*$ is an intermediate value lying between $\theta_0$ and $\hat\theta$.

From the above, we get the duality relationship between the score and the parameter vector:
$$\sqrt{N}\,(\hat\theta - \theta_0) = -\left(\frac{1}{N}\sum_i \frac{\partial^2 \ln f_i}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)^{-1} \frac{1}{\sqrt{N}}\sum_i \frac{\partial \ln f_i}{\partial\theta}\bigg|_{\theta_0}.$$
Noting that $\operatorname{plim} \theta^* = \theta_0$ and substituting into the Wald statistic, we get:
$$\operatorname{plim} W = \operatorname{plim}\left(\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0}\right)' I_0^{-1}\, I_0\, I_0^{-1} \left(\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0}\right) = \operatorname{plim} LM.$$
Thus, asymptotically the Wald and Rao tests are equivalent.
1.3 Likelihood Ratio Test
The third commonly used test is the Likelihood Ratio test, and it uses both the restricted and the unrestricted models. Taylor expanding the log likelihood around the point $\hat\theta$, we get:
$$\underbrace{\ln \mathcal{L}(\theta_0)}_{\text{start with the restricted value}} = \ln \mathcal{L}(\hat\theta) + \underbrace{\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\hat\theta}(\theta_0 - \hat\theta)}_{=0 \text{ by construction}} + \frac{1}{2}(\theta_0 - \hat\theta)'\left(\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)(\theta_0 - \hat\theta),$$
where $\theta^*$ is an intermediate value lying between $\theta_0$ and $\hat\theta$.

Rearranging,
$$-2\left(\ln \mathcal{L}(\theta_0) - \ln \mathcal{L}(\hat\theta)\right) = \sqrt{N}(\hat\theta - \theta_0)'\left(-\frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)\sqrt{N}(\hat\theta - \theta_0),$$
where
$$\left(-\frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right) \xrightarrow{p} I_0 \quad\text{and}\quad \sqrt{N}(\hat\theta - \theta_0) \xrightarrow{d} N\!\left(0,\ I_0^{-1}\right).$$
Hence
$$2\left[\ln \mathcal{L}(\hat\theta) - \ln \mathcal{L}(\theta_0)\right] = \sqrt{N}(\hat\theta - \theta_0)'\left(-\frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)\sqrt{N}(\hat\theta - \theta_0),$$
so that we may also write
$$2\ln\left[\frac{\mathcal{L}(\hat\theta)}{\mathcal{L}(\theta_0)}\right] \approx \sqrt{N}(\hat\theta - \theta_0)'\, I_0\, \sqrt{N}(\hat\theta - \theta_0) \xrightarrow{d} \chi^2(k),$$
since $\sqrt{N}(\hat\theta - \theta_0) \xrightarrow{d} N\!\left(0,\ I_0^{-1}\right)$.

Based on the above derivation, we get the statistic for the Likelihood Ratio test:
$$LR = -2\ln\left(\frac{\hat{\mathcal{L}}_R}{\hat{\mathcal{L}}_U}\right),$$
where $\hat{\mathcal{L}}_R$ is the restricted maximized likelihood and $\hat{\mathcal{L}}_U$ is the unrestricted maximized likelihood, and as shown above
$$LR \xrightarrow{d} \chi^2(k).$$
Note that $\hat{\mathcal{L}}_U \ge \hat{\mathcal{L}}_R$ (an unrestricted maximized function must be greater than or equal to the restricted maximized function), so that always $LR \ge 0$.
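Continuing the hypothetical Poisson example, a minimal sketch of the LR statistic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical example continued: x_i ~ Poisson(theta), H0: theta = theta0.
# LR compares the log likelihood at the restricted and unrestricted maxima.
theta_true, theta0, N = 1.3, 1.0, 500
x = rng.poisson(theta_true, size=N)
theta_hat = x.mean()

def loglik(theta):
    # Poisson log likelihood, up to a term that does not depend on theta
    return np.sum(x * np.log(theta) - theta)

LR = 2.0 * (loglik(theta_hat) - loglik(theta0))
print(f"LR = {LR:.2f}, p = {stats.chi2.sf(LR, df=1):.4f}")
```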
Asymptotic Equivalence of the LR and Wald Tests

From the derivation above, it is clear that asymptotically the LR test statistic converges to the Wald test statistic, i.e.,
$$\operatorname{plim} LR = \underbrace{\sqrt{N}(\hat\theta - \theta_0)'}_{\xrightarrow{d}\ Z'}\, I_0\, \underbrace{\sqrt{N}(\hat\theta - \theta_0)}_{\xrightarrow{d}\ Z} = \operatorname{plim} W,$$
where $Z \sim N\!\left(0,\ I_0^{-1}\right)$.

We saw earlier that the LM and Wald tests were asymptotically equivalent. Thus, together with the above asymptotic equivalence of LR and Wald, we get that all three commonly used tests are asymptotically equivalent, i.e., in large samples $W \approx LR \approx LM$.
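A rough numerical check of this three-way equivalence, using the same hypothetical Poisson example with a large sample:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical numerical check that W, LM, and LR nearly coincide in a
# large sample (Poisson example from above, H0: theta = theta0).
theta_true, theta0, N = 1.1, 1.0, 20_000
x = rng.poisson(theta_true, size=N)
theta_hat = x.mean()

W = N * (theta_hat - theta0) ** 2 / theta_hat
LM = (np.sum(x / theta0 - 1.0) / np.sqrt(N)) ** 2 * theta0
LR = 2.0 * np.sum(x * np.log(theta_hat / theta0) - (theta_hat - theta0))

print(f"W = {W:.2f}, LM = {LM:.2f}, LR = {LR:.2f}")
```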
2 Composite Hypothesis
Consider a parameter vector $\theta = (\theta_1, \theta_2)$ (where $\theta_1$ and $\theta_2$ are possibly vectors). In many situations, we may want to consider estimating only a subset of the parameters, $\theta_2$. This could be because we omit $\theta_1$ or because we hypothesize that $\theta_1 = 0$. In this context, two questions are relevant:

• When do we get an unbiased estimate of $\theta_2$ if we do the MLE estimation omitting $\theta_1$?
• How do the results obtained in the previous section apply to testing the composite hypothesis
$$H_0: \theta_1 = 0 \text{ and } \theta_2 \text{ unrestricted} \qquad\text{vs.}\qquad H_1: \theta_1 \text{ and } \theta_2 \text{ both unrestricted?}$$

We explore these two questions in the next few subsections.
2.1 Definitions
We use the following defined expressions in the rest of the section.

• Define the true parameter vector $\theta_0 \equiv (\bar\theta_1, \bar\theta_2)$;
• Define the estimated parameter vector (unconstrained) as $\hat\theta \equiv (\hat\theta_1, \hat\theta_2)$;
• Define the estimated parameter vector (constrained, i.e., under the null hypothesis setting $\theta_1 = 0$) as $\tilde\theta \equiv (0, \tilde\theta_2)$;
• In the Taylor expansions hereafter, we use $\theta_0$ and the intermediate values interchangeably (neglecting $o_p(1)$ terms);
• Define the blocks of the information matrix as follows:
$$-E\!\left[\frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\right]_{\theta_0} = I_0 = \begin{pmatrix} I_{1 1} & I_{1 2} \\ I_{2 1} & I_{2 2} \end{pmatrix}.$$
2.2 When is $\hat\theta_2$ unbiased if $\theta_1$ is omitted?

In this subsection, we show that the condition for $\hat\theta_2$ to be an unbiased estimator of $\bar\theta_2$ when the model is misspecified and estimated omitting $\theta_1$ is that the score vectors $\dfrac{\partial \ln \mathcal{L}}{\partial\theta_1}$ and $\dfrac{\partial \ln \mathcal{L}}{\partial\theta_2}$ be orthogonal to each other.
First, Taylor expanding the score vectors around $\theta_0$ and evaluating at $\hat\theta$, we get:
$$\frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)_{\hat\theta} = \frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)_{\theta_0} - I_{1 1}\sqrt{N}(\hat\theta_1 - \bar\theta_1) - I_{1 2}\sqrt{N}(\hat\theta_2 - \bar\theta_2) + o_p(1) \tag{1}$$
$$\frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)_{\hat\theta} = \frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)_{\theta_0} - I_{2 1}\sqrt{N}(\hat\theta_1 - \bar\theta_1) - I_{2 2}\sqrt{N}(\hat\theta_2 - \bar\theta_2) + o_p(1) \tag{2}$$
Since both left-hand sides are zero at the unconstrained MLE:
$$(1)\quad 0 = \frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)_{\theta_0} - I_{1 1}\sqrt{N}(\hat\theta_1 - \bar\theta_1) - I_{1 2}\sqrt{N}(\hat\theta_2 - \bar\theta_2) + o_p(1)$$
$$(2)\quad 0 = \frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)_{\theta_0} - I_{2 1}\sqrt{N}(\hat\theta_1 - \bar\theta_1) - I_{2 2}\sqrt{N}(\hat\theta_2 - \bar\theta_2) + o_p(1)$$
Collecting terms we get:
$$\sqrt{N}\begin{pmatrix} \hat\theta_1 - \bar\theta_1 \\ \hat\theta_2 - \bar\theta_2 \end{pmatrix} = [I_0]^{-1}\begin{pmatrix} \dfrac{1}{\sqrt{N}}\left(\dfrac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)_{\theta_0} \\[2ex] \dfrac{1}{\sqrt{N}}\left(\dfrac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)_{\theta_0} \end{pmatrix}.$$
Next consider $\tilde\theta_2$ (the MLE of $\theta_2$ given $\theta_1 = \bar\theta_1 = 0$). Expanding the root of the likelihood equation around $\theta_0$, we get:
$$0 = \frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)_{\tilde\theta} = \frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)_{\theta_0} - I_{2 2}\sqrt{N}(\tilde\theta_2 - \bar\theta_2) + o_p(1). \tag{3}$$
Now the first term on the right-hand side is in common with the corresponding term of (2) above. Therefore we have:
$$I_{2 2}\sqrt{N}(\tilde\theta_2 - \bar\theta_2) = I_{2 2}\sqrt{N}(\hat\theta_2 - \bar\theta_2) + I_{2 1}\sqrt{N}(\hat\theta_1 - \bar\theta_1) + o_p(1),$$
or:
$$\sqrt{N}(\tilde\theta_2 - \bar\theta_2) = \sqrt{N}(\hat\theta_2 - \bar\theta_2) + I_{2 2}^{-1}\, I_{2 1}\, \sqrt{N}(\hat\theta_1 - \bar\theta_1) + o_p(1). \tag{4}$$
Thus, we get that for $\sqrt{N}(\tilde\theta_2 - \bar\theta_2) = \sqrt{N}(\hat\theta_2 - \bar\theta_2)$ we need:
$$I_{2 1} = E\!\left[\frac{1}{N}\,\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)'\right] = 0,$$
i.e., for $\tilde\theta$ (the constrained estimator) to have the same properties as $\hat\theta$ (the unconstrained estimator) we need the score vectors $\dfrac{\partial \ln \mathcal{L}}{\partial\theta_1}$ and $\dfrac{\partial \ln \mathcal{L}}{\partial\theta_2}$ to be uncorrelated.
Similarity with an OLS result: Note that this result is analogous to a result in the OLS framework. In the OLS model we have:
$$Y = X_1\beta_1 + X_2\beta_2 + U.$$
If we run the regression omitting $X_1$, we have:
$$Y = X_2\beta_2 + \{U + X_1\beta_1\}.$$
Then we get, under standard OLS assumptions:
$$\operatorname{plim} \hat\beta_2 = \beta_2 + \operatorname{plim}\left(\frac{X_2'X_2}{N}\right)^{-1} \operatorname{plim}\left(\frac{X_2'X_1}{N}\right)\beta_1,$$
so that we get a consistent estimate $\hat\beta_2$ if $X_2$ and $X_1$ are orthogonal to each other.³

³Note $\operatorname{plim} \dfrac{X_2'X_1}{N} = E[X_{2 i}'X_{1 i}]$ by an appropriate LLN.

Note that the score vectors in MLE are analogous to the data vectors in OLS; we shall revisit this analogy again in Section 3 below. A small simulation sketch of this omitted-variable result appears below.
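A minimal simulation sketch of the omitted-variable result (the design and parameter values are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical illustration: omitting X1 biases beta2_hat unless X1 and X2
# are orthogonal. We compare correlated and uncorrelated designs.
N, beta1, beta2 = 50_000, 1.0, 2.0

for rho in (0.6, 0.0):  # correlation between X1 and X2
    x1 = rng.standard_normal(N)
    x2 = rho * x1 + np.sqrt(1 - rho ** 2) * rng.standard_normal(N)
    y = beta1 * x1 + beta2 * x2 + rng.standard_normal(N)

    b2_short = (x2 @ y) / (x2 @ x2)  # regression of y on x2 alone
    print(f"rho = {rho}: beta2_hat = {b2_short:.3f} (true beta2 = {beta2})")
```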
2.3 Hypothesis testing results for the composite hypothesis

In this subsection, we show the following results:

• Asymptotic equivalence of the LR test and the Wald test;
• Asymptotic equivalence of the Rao test (Score/LM test) and the LR test.
2.3.1 Equivalence between the Likelihood Ratio and Wald Tests

From the definition of the LR test statistic in Section 1.3, we have the analogous definition for the composite hypothesis case:
$$LR = 2\ln\frac{\mathcal{L}(\hat\theta_1, \hat\theta_2)}{\mathcal{L}(\theta_1 = 0,\ \tilde\theta_2)} = -2\ln\frac{\mathcal{L}(\theta_1 = 0,\ \tilde\theta_2)}{\mathcal{L}(\hat\theta_1, \hat\theta_2)}$$
or
$$LR = -2\left[\ln \mathcal{L}(\theta_1 = 0,\ \tilde\theta_2) - \ln \mathcal{L}(\bar\theta_1, \bar\theta_2)\right] + 2\left[\ln \mathcal{L}(\hat\theta_1, \hat\theta_2) - \ln \mathcal{L}(\bar\theta_1, \bar\theta_2)\right].$$
Then, following the same steps as in the derivation in Section 1.3, we get:
$$LR = \sqrt{N}(\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2)'\, I_0\, \sqrt{N}(\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2) - \sqrt{N}(0,\ \tilde\theta_2 - \bar\theta_2)'\, I_0\, \sqrt{N}(0,\ \tilde\theta_2 - \bar\theta_2) + o_p(1)$$
or
$$LR = \sqrt{N}(\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2)'\, I_0\, \sqrt{N}(\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2) - \sqrt{N}(\tilde\theta_2 - \bar\theta_2)'\, I_{2 2}\, \sqrt{N}(\tilde\theta_2 - \bar\theta_2) + o_p(1).$$
Substituting for $\sqrt{N}(\tilde\theta_2 - \bar\theta_2)$ from equation (4) in Section 2.2, we get:
$$LR = \sqrt{N}(\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2)'\, I_0\, \sqrt{N}(\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2)$$
$$\quad - \left[\sqrt{N}(\hat\theta_2 - \bar\theta_2) + I_{2 2}^{-1} I_{2 1}\sqrt{N}(\hat\theta_1 - \bar\theta_1)\right]' I_{2 2} \left[\sqrt{N}(\hat\theta_2 - \bar\theta_2) + I_{2 2}^{-1} I_{2 1}\sqrt{N}(\hat\theta_1 - \bar\theta_1)\right] + o_p(1).$$
Call $\Delta_1 \equiv \sqrt{N}(\hat\theta_1 - \bar\theta_1)$ and $\Delta_2 \equiv \sqrt{N}(\hat\theta_2 - \bar\theta_2)$.
Then we have:
$$LR = \Delta_1' I_{1 1} \Delta_1 + \Delta_2' I_{2 1} \Delta_1 + \Delta_1' I_{1 2} \Delta_2 + \Delta_2' I_{2 2} \Delta_2$$
$$\quad - \Delta_2' I_{2 2} \Delta_2 - \Delta_2' I_{2 1} \Delta_1 - \Delta_1' I_{1 2} \Delta_2 - \Delta_1' I_{1 2} I_{2 2}^{-1} I_{2 1} \Delta_1 + o_p(1)$$
$$= \Delta_1'\left[I_{1 1} - I_{1 2}\, I_{2 2}^{-1}\, I_{2 1}\right]\Delta_1 + o_p(1)$$
$$= \Delta_1'\,(I^{1 1})^{-1}\,\Delta_1 + o_p(1), \tag{5}$$
where $I^{1 1}$ is the upper left diagonal block of
$$I_0^{-1} = \begin{pmatrix} I_{1 1} & I_{1 2} \\ I_{2 1} & I_{2 2} \end{pmatrix}^{-1} \equiv \begin{pmatrix} I^{1 1} & I^{1 2} \\ I^{2 1} & I^{2 2} \end{pmatrix}.$$
(See the partitioned inverse result established in Section 3.)

In the composite hypothesis case, the Wald test exploits the fact that $\sqrt{N}(\hat\theta_1 - \bar\theta_1) \xrightarrow{d} N(0,\ I^{1 1})$, so that the Wald test statistic here is:
$$W = \sqrt{N}(\hat\theta_1 - \bar\theta_1)'\,(I^{1 1})^{-1}\,\sqrt{N}(\hat\theta_1 - \bar\theta_1) + o_p(1). \tag{6}$$
From (5) and (6) above, we get directly the asymptotic equivalence of the Wald and LR tests, i.e., $LR \approx W$ in large samples.
2.3.2 Equivalence between the Rao (Score/LM) test and the other tests

To derive the statistic for the Rao (Score/LM) test, first compute the derivative with respect to $\theta_1$ when the constraint $\theta_1 = 0$ is imposed:
$$\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\tilde\theta} = \frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\theta_0} - I_{1 2}\sqrt{N}\left(\tilde\theta_2 - \bar\theta_2\right) + o_p(1). \tag{7}$$
From equation (3) in Section 2.2 we have:
$$\sqrt{N}\left(\tilde\theta_2 - \bar\theta_2\right) = I_{2 2}^{-1}\,\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\bigg|_{\theta_0} + o_p(1).$$
Substituting this result into (7), we get:
$$\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\tilde\theta} = \frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\theta_0} - I_{1 2}\, I_{2 2}^{-1}\,\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\bigg|_{\theta_0} + o_p(1) = S_1 - I_{1 2}\, I_{2 2}^{-1}\, S_2 + o_p(1),$$
defining
$$S_1 \equiv \frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)_{\theta_0} \quad\text{and}\quad S_2 \equiv \frac{1}{\sqrt{N}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)_{\theta_0}.$$
Then we obtain the variance of the restricted score as:
$$\operatorname{Var}\left(\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\tilde\theta}\right) = E\!\left[\left(S_1 - I_{1 2} I_{2 2}^{-1} S_2\right)\left(S_1 - I_{1 2} I_{2 2}^{-1} S_2\right)'\right]$$
$$= I_{1 1} + I_{1 2}\, I_{2 2}^{-1}\, I_{2 2}\, I_{2 2}^{-1}\, I_{2 1} - 2\, I_{1 2}\, I_{2 2}^{-1}\, I_{2 1} = I_{1 1} - I_{1 2}\, I_{2 2}^{-1}\, I_{2 1} = (I^{1 1})^{-1}.$$
We thus have the Rao test statistic:
$$LM = \left(\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\tilde\theta}\right)' I^{1 1} \left(\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\tilde\theta}\right) = \left[S_1 - I_{1 2} I_{2 2}^{-1} S_2\right]' I^{1 1} \left[S_1 - I_{1 2} I_{2 2}^{-1} S_2\right].$$
From Section 2.2 and the results for partitioned inverses (refer to Section 2.3.1), we have:
$$\sqrt{N}(\hat\theta_1 - \bar\theta_1) = I^{1 1}\,\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\theta_0} + I^{1 2}\,\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\bigg|_{\theta_0} + o_p(1)$$
$$= I^{1 1}\left[\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\theta_0} - I_{1 2}\, I_{2 2}^{-1}\,\frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\bigg|_{\theta_0}\right] + o_p(1) = I^{1 1}\left[S_1 - I_{1 2} I_{2 2}^{-1} S_2\right] + o_p(1),$$
using $I^{1 2} = -I^{1 1}\, I_{1 2}\, I_{2 2}^{-1}$.

Substituting this into the expression for the Wald statistic, we get:
$$W = \sqrt{N}(\hat\theta_1 - \bar\theta_1)'\,(I^{1 1})^{-1}\,\sqrt{N}(\hat\theta_1 - \bar\theta_1) = \left[S_1 - I_{1 2} I_{2 2}^{-1} S_2\right]' I^{1 1} \left[S_1 - I_{1 2} I_{2 2}^{-1} S_2\right] + o_p(1) = LM \text{ (Rao/Score)}.$$
Thus, along with the result in Section 2.3.1, we have even for the composite hypothesis case, asymptotically:
$$W \approx LR \approx LM \text{ (Rao)}.$$
3 Application to the linear regression model

In this section, we look at some analogies between the standard MLE results derived in the earlier sections and OLS regression results. (Recall that we already saw some analogies between OLS and MLE in Section 2.2 above.)

First, we show the analogy between OLS and MLE for estimation of parameter sub-vectors. Then we derive expressions for the three common test statistics and establish certain results for them.
3.1 Estimation of parameter subsets
A key result in regression is the Theorem of the Partitioned Inverse. Let
$$A = \begin{bmatrix} A_{1 1} & A_{1 2} \\ A_{2 1} & A_{2 2} \end{bmatrix}$$
be nonsingular, assume $A_{1 1}$ is nonsingular, and set $E \equiv \left(A_{2 2} - A_{2 1} A_{1 1}^{-1} A_{1 2}\right)^{-1}$. Then
$$A^{-1} = \begin{bmatrix} A_{1 1}^{-1}\left(I + A_{1 2}\, E\, A_{2 1}\, A_{1 1}^{-1}\right) & -A_{1 1}^{-1} A_{1 2}\, E \\ -E\, A_{2 1}\, A_{1 1}^{-1} & E \end{bmatrix}.$$
Proof: Multiply it out.
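A quick numerical check of the partitioned inverse formula (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical numeric check of the partitioned inverse formula.
k1, k2 = 2, 3
A = rng.standard_normal((k1 + k2, k1 + k2))
A = A @ A.T + np.eye(k1 + k2)          # symmetric positive definite, hence nonsingular
A11, A12 = A[:k1, :k1], A[:k1, k1:]
A21, A22 = A[k1:, :k1], A[k1:, k1:]

A11inv = np.linalg.inv(A11)
E = np.linalg.inv(A22 - A21 @ A11inv @ A12)   # inverse of the Schur complement

Ainv = np.block([
    [A11inv @ (np.eye(k1) + A12 @ E @ A21 @ A11inv), -A11inv @ A12 @ E],
    [-E @ A21 @ A11inv,                               E],
])
print("max abs error:", np.abs(Ainv - np.linalg.inv(A)).max())
```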
In the OLS case, this theorem produces a useful result (Frisch-Waugh-Goldberger). We partition the matrix of independent variables as
$$X = \begin{pmatrix} X_1 & X_2 \end{pmatrix},$$
so that
$$X'X = \begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}.$$
We define:
$$M_1 \equiv I - X_1(X_1'X_1)^{-1}X_1'$$
$$E \equiv \left[X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2\right]^{-1} = (X_2'M_1X_2)^{-1}.$$
Now we have the result for the OLS regression:
$$\hat\beta = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = (X'X)^{-1}(X'Y) = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}^{-1} \begin{bmatrix} X_1'Y \\ X_2'Y \end{bmatrix}. \tag{8}$$
Then, using the results from the theorem of the partitioned inverse above and simplifying, we can write:
$$\hat\beta_1 = (X_1'X_1)^{-1}(X_1'Y) + (X_1'X_1)^{-1}X_1'X_2\, E\, X_2'X_1(X_1'X_1)^{-1}X_1'Y - (X_1'X_1)^{-1}X_1'X_2\, E\, X_2'Y$$
$$\quad = (X_1'X_1)^{-1}X_1'Y - (X_1'X_1)^{-1}(X_1'X_2)\, E\, X_2'M_1Y,$$
$$\hat\beta_2 = -E\, X_2'X_1(X_1'X_1)^{-1}X_1'Y + E\, X_2'Y = E\, X_2'\left(I - X_1(X_1'X_1)^{-1}X_1'\right)Y = E\, X_2'M_1Y. \tag{9}$$
This leads to a Double Residual Regression result for $\hat\beta_2$: regress $Y$ on $X_1$, and $X_2$ on $X_1$; form the residuals; then regress one set of residuals on the other.
Result: $\hat\beta_2$ is the regression of "cleaned out $Y$" on "cleaned out $X_2$."

This result follows directly from results (8) and (9) above. Define:
$$\text{"Cleaned out } Y\text{"} \equiv Y^* = \left[I - X_1(X_1'X_1)^{-1}X_1'\right]Y = M_1Y$$
(the residual from the regression of $Y$ on $X_1$);
$$\text{"Cleaned out } X_2\text{"} \equiv X_2^* = \left[I - X_1(X_1'X_1)^{-1}X_1'\right]X_2 = M_1X_2$$
(the residual from the regression of $X_2$ on $X_1$).

Then from (9) above we have:
$$\hat\beta_2 = E\, X_2'M_1Y = \left[X_2'M_1'M_1X_2\right]^{-1}\left(X_2'M_1'Y\right) = \left[X_2^{*\prime}X_2^*\right]^{-1}\left(X_2^{*\prime}Y\right).$$
Note that we really don't have to clean out $Y$, just $X_2$, since $M_1$ is symmetric and idempotent. Further, if $X_1$ and $X_2$ are orthogonal (uncorrelated), then $X_2^* = X_2$, and hence an unbiased/consistent estimate of $\beta_2$ can be obtained by directly regressing $Y$ on $X_2$. (Recall that we derived the same result for MLE in Section 2.2.) A numerical sketch of the double residual regression appears below.
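A minimal numerical sketch of the double residual regression result (design values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical check of the double residual (Frisch-Waugh) result:
# beta2_hat from the full regression equals the coefficient from regressing
# Y (uncleaned) on M1*X2, since M1 is symmetric and idempotent.
N, k1, k2 = 1_000, 2, 3
X1 = rng.standard_normal((N, k1))
X2 = rng.standard_normal((N, k2)) + X1 @ rng.standard_normal((k1, k2))
Y = X1 @ np.ones(k1) + X2 @ np.arange(1, k2 + 1) + rng.standard_normal(N)

X = np.hstack([X1, X2])
beta_full = np.linalg.lstsq(X, Y, rcond=None)[0]       # full regression

M1 = np.eye(N) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T  # residual maker for X1
X2s = M1 @ X2                                          # cleaned out X2
beta2_fwl = np.linalg.lstsq(X2s, Y, rcond=None)[0]     # Y need not be cleaned

print("full:", beta_full[k1:])
print("FWL :", beta2_fwl)
```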
Also observe (derived directly from $\hat Y = X_1\hat\beta_1 + X_2\hat\beta_2$) that:
$$\hat Y'\hat Y = \underbrace{Y'X_1(X_1'X_1)^{-1}X_1'Y}_{\text{Part due to } X_1} + \underbrace{Y'M_1X_2\, E\, X_2'M_1Y}_{\text{Part due to orthogonalized } X_2}.$$
Thus, unless the regressors are orthogonal, there are no unique contributions.

Proof.

1. Observe that:
$$M_1X_1 = 0,\quad M_2X_2 = 0,\quad M_1' = M_1,\quad M_2' = M_2,$$
$$Y = \hat Y + \hat U = \left(X_1\hat\beta_1 + X_2\hat\beta_2\right) + \hat U.$$
2. Observe that:
$$\hat U'M_1X_2 = \hat U'X_2 - \hat U'X_1(X_1'X_1)^{-1}X_1'X_2 = 0,$$
since $\hat U$ is orthogonal to every column of $X = (X_1\ \ X_2)$, i.e., $\hat U'X_1 = 0$ and $\hat U'X_2 = 0$.

3. Then:
$$X_2'M_1Y = X_2'M_1X_1\,\hat\beta_1 + X_2'M_1X_2\,\hat\beta_2 + X_2'M_1\hat U,$$
where $X_2'M_1X_1 = 0$ (since $M_1X_1 = 0$) and $X_2'M_1\hat U = 0$ (by step 2). Thus,
$$\hat\beta_2 = (X_2'M_1X_2)^{-1}(X_2'M_1Y).$$
4. Observe that $\hat\beta_2$ is the result of the regression
$$M_1Y = M_1X_2\,\hat\beta_2 + \text{error}_2,$$
but
$$Y = X_1\beta_1 + X_2\beta_2 + U$$
$$\Rightarrow\ M_1Y = \underbrace{M_1X_1}_{=0}\beta_1 + M_1X_2\,\beta_2 + M_1U = M_1X_2\,\beta_2 + M_1U$$
$$\Rightarrow\ \text{error}_2 = M_1U.$$
(From Davidson & MacKinnon.)
Analogously, for MLE in the neighborhood of the optimum we have:
$$\sqrt{N}\left(\hat\theta - \theta_0\right) \approx \left[\frac{1}{N}\,G'G\right]^{-1}\frac{1}{\sqrt{N}}\,G'\iota + o_p(1),$$
where $G \equiv (G_1\ \ G_2)$ stacks the observation-level scores ($G_j$ is the $N \times k_j$ matrix whose $i$-th row is $\partial \ln f(x_i\mid\theta)/\partial\theta_j'$ evaluated at $\theta_0$, so that $G'\iota = \partial \ln \mathcal{L}/\partial\theta$), and $\iota$ is an $N \times 1$ vector of ones, i.e., $\iota = (1, 1, \dots, 1)'$. In partitioned form,
$$G'G = \begin{pmatrix} G_1'G_1 & G_1'G_2 \\ G_2'G_1 & G_2'G_2 \end{pmatrix}, \qquad G'\iota = \begin{pmatrix} G_1'\iota \\ G_2'\iota \end{pmatrix} = \begin{pmatrix} \partial \ln \mathcal{L}/\partial\theta_1 \\ \partial \ln \mathcal{L}/\partial\theta_2 \end{pmatrix}.$$
Comparing the above equation to the result for the standard OLS regression (see eqn (8) above), we note that the MLE result is analogous to regressing $\iota$ (a vector of ones) on the score vectors; i.e., we have the correspondences:
$$[X_1'X_2] \leftrightarrow [G_1'G_2] = \left[\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)'\right] \quad\text{and}\quad Y \leftrightarrow \iota.$$
Further, we also get:
$$\sqrt{N}\left(\hat\theta_2 - \bar\theta_2\right) = \left[\frac{1}{N}\,G_2'\left(I - G_1(G_1'G_1)^{-1}G_1'\right)G_2\right]^{-1}\frac{1}{\sqrt{N}}\,G_2'\left(I - G_1(G_1'G_1)^{-1}G_1'\right)\iota + o_p(1),$$
which is analogous to regression result (9) above: $\hat\theta_2$ plays the role of $\hat\beta_2$, the scores with respect to $\theta_1$ play the role of the regressors $X_1$ that are "cleaned out," and $\iota$ plays the role of $Y$.
3.2 Hypothesis testing results for the linear regression model

Consider the classical linear regression model:
$$Y_i = X_i\beta + U_i, \qquad U_i \sim N(0, \sigma^2) \text{ i.i.d.},$$
i.e.,
$$f(U_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2}\frac{U_i^2}{\sigma^2}\right)$$
$$\Rightarrow\ f(Y_i \mid X_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2\sigma^2}(Y_i - X_i\beta)^2\right).$$
With i.i.d. sampling, we have the log likelihood function:
$$\ln \mathcal{L} = -\frac{N}{2}\ln 2\pi - \frac{N}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(Y_i - X_i\beta)^2.$$
3.2.1 MLE Solution

Setting
$$\frac{\partial \ln \mathcal{L}}{\partial\beta} = \frac{1}{\sigma^2}\sum_i X_i'\left(Y_i - X_i\beta\right) = 0$$
at the optimum yields:
$$\hat\beta = \left(\sum_i X_i'X_i\right)^{-1}\sum_i X_i'Y_i, \qquad \hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i - X_i\hat\beta\right)^2.$$
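A minimal sketch confirming that the Gaussian MLE formulas above reproduce OLS, with $\hat\sigma^2$ dividing by $N$ rather than $N - k$ (example values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical check: the Gaussian MLE of beta is the OLS estimator, and
# sigma2_hat is the residual sum of squares divided by N (not N - k).
N, k = 500, 3
X = rng.standard_normal((N, k))
beta_true, sigma = np.array([1.0, -0.5, 2.0]), 1.5
Y = X @ beta_true + sigma * rng.standard_normal(N)

beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ Y)   # MLE = OLS
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / N                  # MLE of sigma^2

print("beta_hat:", beta_hat.round(3), " sigma2_hat:", round(sigma2_hat, 3))
```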
The blocks of the information matrix are:
$$-\frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial\beta\,\partial\beta'} = \frac{(X'X)}{N}\,\frac{1}{\sigma^2},$$
$$-\frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial\beta\,\partial(\sigma^2)} = \frac{1}{N(\sigma^2)^2}\sum_i X_i'\left(Y_i - X_i\beta\right),$$
which is zero when evaluated at $\hat\beta$ and has expectation zero at the truth. (This holds unless the model is such that there is a functional relationship between $\sigma^2$ and $\beta$; this we exclude by assumption.) Finally,
$$-E\!\left[\frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial(\sigma^2)^2}\right] = \frac{1}{2\sigma^4}.$$
This yields the information matrix:
$$I_0 = \begin{bmatrix} \dfrac{E(X_i'X_i)}{\sigma^2} & 0 \\[1ex] 0' & \dfrac{1}{2\sigma^4} \end{bmatrix},$$
which is block diagonal. (By the previous result, we may therefore ignore the parameter estimation error between $\hat\beta$ and $\hat\sigma^2$.)
3.2.2 Expressions for the various test statistics

Let $\beta = (\beta_1, \beta_2)$ and $H_0: \beta_2 = 0$. Then the expression for the Likelihood Ratio test statistic is given by:
$$LR = 2\ln\left[\frac{\mathcal{L}(\hat\theta)}{\mathcal{L}(\tilde\theta)}\right],$$
where $\mathcal{L}(\hat\theta)$ is the unrestricted maximized likelihood function and $\mathcal{L}(\tilde\theta)$ is the restricted maximized likelihood function.
We have:
$$\ln \mathcal{L}(\tilde\theta) = -\frac{N}{2}\ln \tilde\sigma^2 - \frac{1}{2\tilde\sigma^2}\sum_{i=1}^{N}\left(Y_i - X_{1 i}\tilde\beta_1\right)^2 + \text{const} = -\frac{N}{2}\ln\left[\frac{1}{N}\sum_i\left(Y_i - X_{1 i}\tilde\beta_1\right)^2\right] + \text{const},$$
using the fact that $\tilde\sigma^2 = \dfrac{1}{N}\sum_{i=1}^{N}\left(Y_i - X_{1 i}\tilde\beta_1\right)^2$.
Similarly,
$$\ln \mathcal{L}(\hat\theta) = -\frac{N}{2}\ln\left[\frac{1}{N}\sum_i\left(Y_i - X_{1 i}\hat\beta_1 - X_{2 i}\hat\beta_2\right)^2\right] + \text{const}$$
$$\Rightarrow\ LR = 2\ln\left[\frac{\mathcal{L}(\hat\theta)}{\mathcal{L}(\tilde\theta)}\right] = N\ln\left(\frac{Y'\left[I - X_1(X_1'X_1)^{-1}X_1'\right]Y}{Y'\left[I - X(X'X)^{-1}X'\right]Y}\right). \tag{10}$$
The expression for the Wald test statistic is obtained using the partitioned inverse theorem on the information matrix above and the standard asymptotic normality result for the MLE:
$$\left(\hat\beta_2 - \bar\beta_2\right) \stackrel{a}{\sim} N\!\left(0,\ \sigma^2\left(X_2'\left[I - X_1(X_1'X_1)^{-1}X_1'\right]X_2\right)^{-1}\right)$$
$$\Rightarrow\ W = \left(\hat\beta_2 - \beta_2^0\right)'\left[\frac{X_2'\left(I - X_1(X_1'X_1)^{-1}X_1'\right)X_2}{\hat\sigma^2}\right]\left(\hat\beta_2 - \beta_2^0\right),$$
where $\beta_2^0$ is the hypothesized value of $\beta_2$ (here $\beta_2^0 = 0$).
The test statistic for the Rao (LM/Score) test is obtained by computing the derivative of the log likelihood function at the point where $\beta_2 = 0$ is imposed. We get:
$$\ln \mathcal{L}(\beta, \sigma^2) = -\frac{N}{2}\ln 2\pi - \frac{N}{2}\ln \sigma^2 - \frac{\sum_i \left(Y_i - X_{1 i}\beta_1 - X_{2 i}\beta_2\right)^2}{2\sigma^2}$$
$$\Rightarrow\ \frac{1}{\sqrt{N}}\frac{\partial \ln \mathcal{L}}{\partial\beta_2}\bigg|_{\beta_2 = 0} = \frac{1}{\sqrt{N}}\,\frac{\sum_i X_{2 i}'\left(Y_i - X_{1 i}\tilde\beta_1\right)}{\tilde\sigma^2} = \frac{X_2'\tilde U}{\sqrt{N}\,\tilde\sigma^2}.$$

As discussed in the earlier sections, the Rao test statistic can now be formed by examining the asymptotic distribution of this score vector. In the linear regression context, the Rao test can also be motivated in another, intuitive way.

Basically, the Rao test considers the restrictions to be valid if the score $\approx 0$ (i.e., the log likelihood function is maximized) when the restrictions are imposed. Looking at the right-hand side of the expression above, in the linear regression case this is equivalent to checking whether the residual from the restricted MLE,
$$\tilde U = \left(Y - X_1\tilde\beta_1\right),$$
and $X_2$ are orthogonal!
Accordingly, we have the Rao test statistic:
$$R = \left(\frac{X_2'\tilde U}{\sqrt{N}\,\tilde\sigma^2}\right)'\ \tilde\sigma^2\left[\frac{X_2'\left(I - X_1(X_1'X_1)^{-1}X_1'\right)X_2}{N}\right]^{-1}\left(\frac{X_2'\tilde U}{\sqrt{N}\,\tilde\sigma^2}\right) = \frac{\tilde U'X_2\left(X_2'M_1X_2\right)^{-1}X_2'\tilde U}{\tilde\sigma^2}.$$
Observe a feature of the Rao test: is it equivalent to regressing $Y - X_1\tilde\beta_1$ on $X_2$ and testing for statistical significance? The answer is: not in general, as shown below. Regressing $Y - X_1\tilde\beta_1$ on $X_2$ gives:
$$Y - X_1\tilde\beta_1 = X_2\gamma + \text{residual}, \qquad \hat\gamma = (X_2'X_2)^{-1}X_2'\tilde U.$$
The OLS test of $\gamma = 0$ yields the statistic:
$$\frac{\hat\gamma'(X_2'X_2)\hat\gamma}{\hat\sigma_\gamma^2} = \frac{\tilde U'X_2(X_2'X_2)^{-1}(X_2'X_2)(X_2'X_2)^{-1}X_2'\tilde U}{\hat\sigma_\gamma^2} = \frac{\tilde U'X_2(X_2'X_2)^{-1}X_2'\tilde U}{\hat\sigma_\gamma^2},$$
where $\hat\sigma_\gamma^2$ is the estimated residual variance from this regression:
$$\hat\sigma_\gamma^2 = \frac{1}{N}\left(\tilde U - X_2\hat\gamma\right)'\left(\tilde U - X_2\hat\gamma\right) = \frac{1}{N}\left[\tilde U'\tilde U - \tilde U'X_2(X_2'X_2)^{-1}X_2'\tilde U\right].$$

Whereas the Rao statistic, as derived earlier, is:
$$R = \frac{\tilde U'X_2\left(X_2'M_1X_2\right)^{-1}X_2'\tilde U}{\tilde\sigma^2}.$$
So only if $M_1X_2 = X_2$ (i.e., $X_1 \perp X_2$) are these equal; not in general.

Note that if we regress $M_1Y$ on $X_1$ and $X_2$, then testing whether $X_2$ is significant in this regression is equivalent to the Rao test. That is, if we set the model up as:
$$M_1Y = X_1 b_1 + X_2 b_2 + \text{residual}, \qquad \hat b_2 = (X_2'M_1X_2)^{-1}X_2'M_1Y,$$
then the OLS test of $b_2 = 0$ is equivalent to the Rao test (left as an exercise for the reader).
3.2.3 Relationship between the various tests

While we demonstrated in earlier sections that, asymptotically, the three tests are equivalent, in this special linear regression case we can show that $W \ge LR \ge LM$, so that the Wald test is the least conservative (most likely to reject the null) and the Rao test is the most conservative (least likely to reject the null) in small/finite samples.

Define:
$$ESS(U) \equiv \text{Sum of Squared Residuals (Unrestricted)} = \left(Y - X_1\hat\beta_1 - X_2\hat\beta_2\right)'\left(Y - X_1\hat\beta_1 - X_2\hat\beta_2\right)$$
$$ESS(R) \equiv \text{Sum of Squared Residuals (Restricted)} = \left(Y - X_1\tilde\beta_1\right)'\left(Y - X_1\tilde\beta_1\right).$$

Using the results in the earlier subsections of Section 3, and from the expressions for the Wald, LR, and LM test statistics derived above, it can be shown (left as an exercise for the reader) that:
$$W = N\left[\frac{ESS(R) - ESS(U)}{ESS(U)}\right], \qquad LR = N\ln\left[\frac{ESS(R)}{ESS(U)}\right], \qquad LM \text{ (Rao)} = N\left[\frac{ESS(R) - ESS(U)}{ESS(R)}\right].$$
1. Proof that $W \ge LR$:
Define
$$\frac{ESS(R)}{ESS(U)} = x \ge 1 \;\Rightarrow\; LR = N\ln x, \quad W = N(x - 1).$$
Now $x - 1 \ge \ln x$ for $x \ge 1$ $\Rightarrow$ $W \ge LR$.

2. Proof that $LR \ge$ Rao:
Define
$$\frac{ESS(R)}{ESS(U)} = x \ge 1 \;\Rightarrow\; \text{Rao} = N\left[1 - \frac{1}{x}\right], \quad LR = N\ln x.$$
Now $\ln x \ge 1 - \dfrac{1}{x}$ (since $x \ge 1$ always) $\Rightarrow$ $LR \ge$ Rao.
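A rough finite-sample check of the ordering $W \ge LR \ge LM$ using the ESS-based formulas above (design values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical finite-sample check of W >= LR >= LM in the linear model,
# using the ESS-based formulas above with H0: beta2 = 0 (here false).
N = 200
X1 = np.column_stack([np.ones(N), rng.standard_normal(N)])
X2 = rng.standard_normal((N, 2))
Y = X1 @ [1.0, 0.5] + X2 @ [0.3, -0.2] + rng.standard_normal(N)

def ess(Xmat):
    # sum of squared residuals from regressing Y on the columns of Xmat
    resid = Y - Xmat @ np.linalg.lstsq(Xmat, Y, rcond=None)[0]
    return resid @ resid

ESS_R, ESS_U = ess(X1), ess(np.hstack([X1, X2]))
W = N * (ESS_R - ESS_U) / ESS_U
LR = N * np.log(ESS_R / ESS_U)
LM = N * (ESS_R - ESS_U) / ESS_R
print(f"W = {W:.2f} >= LR = {LR:.2f} >= LM = {LM:.2f}")
```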