Upload
jazmine-rossen
View
229
Download
6
Tags:
Embed Size (px)
Citation preview
Christopher Dougherty
EC220 - Introduction to econometrics (chapter 2)Slideshow: f test of goodness of fit
Original citation:
Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 2). [Teaching Resource]
© 2012 The Author
This version available at: http://learningresources.lse.ac.uk/128/
Available in LSE Learning Resources Online: May 2012
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user credits the author and licenses their new creations under the identical terms. http://creativecommons.org/licenses/by-sa/3.0/
http://learningresources.lse.ac.uk/
222 )ˆ()( eYYYY
RSSESSTSS
F TEST OF GOODNESS OF FIT
In an earlier sequence it was demonstrated that the sum of the squares of the actual values of Y (TSS: total sum of squares) could be decomposed into the sum of the squares of the fitted values (ESS: explained sum of squares) and the sum of the squares of the residuals.
1
222 )ˆ()( eYYYY
RSSESSTSS
F TEST OF GOODNESS OF FIT
2
2
22
)(
)ˆ(
YY
YY
TSSESS
Ri
i
R2, the usual measure of goodness of fit, was then defined to be the ratio of the explained sum of squares to the total sum of squares.
222 )ˆ()( eYYYY
RSSESSTSS
F TEST OF GOODNESS OF FIT
3
2
22
)(
)ˆ(
YY
YY
TSSESS
Ri
i
uXY 21
The null hypothesis that we are going to test is that the model has no explanatory power.
222 )ˆ()( eYYYY
RSSESSTSS
F TEST OF GOODNESS OF FIT
4
2
22
)(
)ˆ(
YY
YY
TSSESS
Ri
i
uXY 21 0:,0: 2120 HH
Since X is the only explanatory variable at the moment, the null hypothesis is that Y is not determined by X. Mathematically, we have H0: 2 = 0
222 )ˆ()( eYYYY
RSSESSTSS
F TEST OF GOODNESS OF FIT
5
2
22
)(
)ˆ(
YY
YY
TSSESS
Ri
i
uXY 21 0:,0: 2120 HH
)/()1()1/(
)(
)1(
)/()1/(
),1( 2
2
knRkR
knTSSRSS
kTSSESS
knRSSkESS
knkF
Hypotheses concerning goodness of fit are tested via the F statistic, defined as shown. k is the number of parameters in the regression equation, which at present is just 2.
222 )ˆ()( eYYYY
RSSESSTSS
F TEST OF GOODNESS OF FIT
6
2
22
)(
)ˆ(
YY
YY
TSSESS
Ri
i
uXY 21 0:,0: 2120 HH
)/()1()1/(
)(
)1(
)/()1/(
),1( 2
2
knRkR
knTSSRSS
kTSSESS
knRSSkESS
knkF
n – k is, as with the t statistic, the number of degrees of freedom (number of observations less the number of parameters estimated). For simple regression analysis, it is n – 2.
222 )ˆ()( eYYYY
RSSESSTSS
F TEST OF GOODNESS OF FIT
7
2
22
)(
)ˆ(
YY
YY
TSSESS
Ri
i
uXY 21 0:,0: 2120 HH
)/()1()1/(
)(
)1(
)/()1/(
),1( 2
2
knRkR
knTSSRSS
kTSSESS
knRSSkESS
knkF
The F statistic may alternatively be written in terms of R2. First divide the numerator and denominator by TSS.
222 )ˆ()( eYYYY
RSSESSTSS
F TEST OF GOODNESS OF FIT
8
2
22
)(
)ˆ(
YY
YY
TSSESS
Ri
i
uXY 21 0:,0: 2120 HH
)/()1()1/(
)(
)1(
)/()1/(
),1( 2
2
knRkR
knTSSRSS
kTSSESS
knRSSkESS
knkF
We can now rewrite the F statistic as shown. The R2 in the numerator comes straight from the definition of R2.
211 RTSSESS
TSSESSTSS
TSSRSS
F TEST OF GOODNESS OF FIT
9
2
22
)(
)ˆ(
YY
YY
TSSESS
Ri
i
uXY 21 0:,0: 2120 HH
)/()1()1/(
)(
)1(
)/()1/(
),1( 2
2
knRkR
knTSSRSS
kTSSESS
knRSSkESS
knkF
It is easily demonstrated that RSS/TSS is equal to 1 – R2.
F TEST OF GOODNESS OF FIT
10
2
22
)(
)ˆ(
YY
YY
TSSESS
Ri
i
uXY 21 0:,0: 2120 HH
)/()1()1/(
)(
)1(
)/()1/(
),1( 2
2
knRkR
knTSSRSS
kTSSESS
knRSSkESS
knkF
222 )ˆ()( eYYYY
RSSESSTSS
F is a monotonically increasing function of R2. As R2 increases, the numerator increases and the denominator decreases, so for both of these reasons F increases.
11
F TEST OF GOODNESS OF FIT
R2
18/)1(1/
)18,1( 2
2
RR
F
F
0
20
40
60
80
100
120
140
0 0.2 0.4 0.6 0.8 1
Here is F plotted as a function of R2 for the case where there is 1 explanatory variable and 20 observations.
)/()1()1/(
),1( 2
2
knRkR
knkF
12
F TEST OF GOODNESS OF FIT
R2
18/)1(1/
)18,1( 2
2
RR
F
F
0
20
40
60
80
100
120
140
0 0.2 0.4 0.6 0.8 1
If the null hypothesis is true, F will have a random distribution.
)/()1()1/(
),1( 2
2
knRkR
knkF
13
F TEST OF GOODNESS OF FIT
There will be some critical value which it will exceed only 5 percent of the time. If we are performing a 5 percent significance test, we will reject H0 if the F statistic is greater than this critical value.
R2
18/)1(1/
)18,1( 2
2
RR
F
F
0
20
40
60
80
100
120
140
0 0.2 0.4 0.6 0.8 1
41.4)18,1(%5,crit F
)/()1()1/(
),1( 2
2
knRkR
knkF
14
F TEST OF GOODNESS OF FIT
In the case of an F test, the critical value depends on the number of explanatory variables as well as the number of degrees of freedom.
R2
18/)1(1/
)18,1( 2
2
RR
F
F
0
20
40
60
80
100
120
140
0 0.2 0.4 0.6 0.8 1
41.4)18,1(%5,crit F
)/()1()1/(
),1( 2
2
knRkR
knkF
15
F TEST OF GOODNESS OF FIT
For the present example, the critical value for a 5 percent significance test is 4.41, when R2 is 0.20.
R2
18/)1(1/
)18,1( 2
2
RR
F
F
0
20
40
60
80
100
120
140
0 0.2 0.4 0.6 0.8 1
41.4)18,1(%5,crit F
)/()1()1/(
),1( 2
2
knRkR
knkF
F TEST OF GOODNESS OF FIT
R2
18/)1(1/
)18,1( 2
2
RR
F
F
0
20
40
60
80
100
120
140
0 0.2 0.4 0.6 0.8 1
29.8)18,1(%1,crit F
If we wished to be more cautious, we could perform a 1 percent test. For this the critical value of F is 8.29, where R2 is 0.32.
16
)/()1()1/(
),1( 2
2
knRkR
knkF
17
F TEST OF GOODNESS OF FIT
If R2 is higher than 0.32, F will be higher than 8.29, and we will reject the null hypothesis at the 1 percent level.
R2
18/)1(1/
)18,1( 2
2
RR
F
F
0
20
40
60
80
100
120
140
0 0.2 0.4 0.6 0.8 1
29.8)18,1(%1,crit F
)/()1()1/(
),1( 2
2
knRkR
knkF
F TEST OF GOODNESS OF FIT
R2
18/)1(1/
)18,1( 2
2
RR
F
F
0
20
40
60
80
100
120
140
0 0.2 0.4 0.6 0.8 1
29.8)18,1(%1,crit F
)/()1()1/(
),1( 2
2
knRkR
knkF
Why do we perform the test indirectly, through F, instead of directly through R2? After all, it would be easy to compute the critical values of R2 from those for F.
18
19
F TEST OF GOODNESS OF FIT
The reason is that an F test can be used for several tests of analysis of variance. Rather than have a specialized table for each test, it is more convenient to have just one.
R2
18/)1(1/
)18,1( 2
2
RR
F
F
0
20
40
60
80
100
120
140
0 0.2 0.4 0.6 0.8 1
29.8)18,1(%1,crit F
)/()1()1/(
),1( 2
2
knRkR
knkF
F TEST OF GOODNESS OF FIT
Note that, for simple regression analysis, the null and alternative hypotheses are mathematically exactly the same as for a two-tailed t test. Could the F test come to a different conclusion from the t test?
20
2
22
)(
)ˆ(
YY
YY
TSSESS
Ri
i
uXY 21 0:,0: 2120 HH
)/()1()1/(
)(
)1(
)/()1/(
),1( 2
2
knRkR
knTSSRSS
kTSSESS
knRSSkESS
knkF
222 )ˆ()( eYYYY
RSSESSTSS
F TEST OF GOODNESS OF FIT
21
2
22
)(
)ˆ(
YY
YY
TSSESS
Ri
i
uXY 21 0:,0: 2120 HH
)/()1()1/(
)(
)1(
)/()1/(
),1( 2
2
knRkR
knTSSRSS
kTSSESS
knRSSkESS
knkF
222 )ˆ()( eYYYY
RSSESSTSS
The answer, of course, is no. We will demonstrate that, for simple regression analysis, the F statistic is the square of the t statistic.
22
2
22
22
222
2
22
22222
22121
2
2
).(.
1][][
2
ˆ
)2/(
tbes
b
XXs
bXX
sb
XXbss
XbbXbb
ne
YY
nRSSESS
F
iu
iu
iuu
i
i
i
22
F TEST OF GOODNESS OF FIT
We start by replacing ESS and RSS by their mathematical expressions.
22
2
22
22
222
2
22
22222
22121
2
2
).(.
1][][
2
ˆ
)2/(
tbes
b
XXs
bXX
sb
XXbss
XbbXbb
ne
YY
nRSSESS
F
iu
iu
iuu
i
i
i
23
F TEST OF GOODNESS OF FIT
The denominator is the expression for su2, the estimator of u
2, for the simple regression model. We expand the numerator using the expression for the fitted relationship.
24
F TEST OF GOODNESS OF FIT
The b1 terms in the numerator cancel. The rest of the numerator can be grouped as shown.
22
2
22
22
222
2
22
22222
22121
2
2
).(.
1][][
2
ˆ
)2/(
tbes
b
XXs
bXX
sb
XXbss
XbbXbb
ne
YY
nRSSESS
F
iu
iu
iuu
i
i
i
25
F TEST OF GOODNESS OF FIT
We take the b22 term out of the summation as a factor.
22
2
22
22
222
2
22
22222
22121
2
2
).(.
1][][
2
ˆ
)2/(
tbes
b
XXs
bXX
sb
XXbss
XbbXbb
ne
YY
nRSSESS
F
iu
iu
iuu
i
i
i
26
F TEST OF GOODNESS OF FIT
We move the term involving X to the denominator.
22
2
22
22
222
2
22
22222
22121
2
2
).(.
1][][
2
ˆ
)2/(
tbes
b
XXs
bXX
sb
XXbss
XbbXbb
ne
YY
nRSSESS
F
iu
iu
iuu
i
i
i
27
F TEST OF GOODNESS OF FIT
The denominator is the square of the standard error of b2.
22
2
22
22
222
2
22
22222
22121
2
2
).(.
1][][
2
ˆ
)2/(
tbes
b
XXs
bXX
sb
XXbss
XbbXbb
ne
YY
nRSSESS
F
iu
iu
iuu
i
i
i
28
F TEST OF GOODNESS OF FIT
Hence we obtain b22 divided by the square of the standard error of b2. This is the t statistic,
squared.
22
2
22
22
222
2
22
22222
22121
2
2
).(.
1][][
2
ˆ
)2/(
tbes
b
XXs
bXX
sb
XXbss
XbbXbb
ne
YY
nRSSESS
F
iu
iu
iuu
i
i
i
29
F TEST OF GOODNESS OF FIT
22
2
22
22
222
2
22
22222
22121
2
2
).(.
1][][
2
ˆ
)2/(
tbes
b
XXs
bXX
sb
XXbss
XbbXbb
ne
YY
nRSSESS
F
iu
iu
iuu
i
i
i
It can also be shown that the critical value of F, at any significance level, is equal to the square of the critical value of t. We will not attempt to prove this.
30
F TEST OF GOODNESS OF FIT
22
2
22
22
222
2
22
22222
22121
2
2
).(.
1][][
2
ˆ
)2/(
tbes
b
XXs
bXX
sb
XXbss
XbbXbb
ne
YY
nRSSESS
F
iu
iu
iuu
i
i
i
Since the F test is equivalent to a two-tailed t test in the simple regression model, there is no point in performing both tests. In fact, if justified, a one-tailed t test would be better than either because it is more powerful (lower risk of Type II error if H0 is false).
31
F TEST OF GOODNESS OF FIT
22
2
22
22
222
2
22
22222
22121
2
2
).(.
1][][
2
ˆ
)2/(
tbes
b
XXs
bXX
sb
XXbss
XbbXbb
ne
YY
nRSSESS
F
iu
iu
iuu
i
i
i
The F test will have its own role to play when we come to multiple regression analysis.
32
F TEST OF GOODNESS OF FIT
Here is the output for the regression of hourly earnings on years of schooling for the sample of 540 respondents from the National Longitudinal Survey of Youth.
. reg EARNINGS S
Source | SS df MS Number of obs = 540-------------+------------------------------ F( 1, 538) = 112.15 Model | 19321.5589 1 19321.5589 Prob > F = 0.0000 Residual | 92688.6722 538 172.283777 R-squared = 0.1725-------------+------------------------------ Adj R-squared = 0.1710 Total | 112010.231 539 207.811189 Root MSE = 13.126
------------------------------------------------------------------------------ EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- S | 2.455321 .2318512 10.59 0.000 1.999876 2.910765 _cons | -13.93347 3.219851 -4.33 0.000 -20.25849 -7.608444------------------------------------------------------------------------------
33
F TEST OF GOODNESS OF FIT
. reg EARNINGS S
Source | SS df MS Number of obs = 540-------------+------------------------------ F( 1, 538) = 112.15 Model | 19321.5589 1 19321.5589 Prob > F = 0.0000 Residual | 92688.6722 538 172.283777 R-squared = 0.1725-------------+------------------------------ Adj R-squared = 0.1710 Total | 112010.231 539 207.811189 Root MSE = 13.126
------------------------------------------------------------------------------ EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- S | 2.455321 .2318512 10.59 0.000 1.999876 2.910765 _cons | -13.93347 3.219851 -4.33 0.000 -20.25849 -7.608444------------------------------------------------------------------------------
15.11228.172
19322)2540/(92689
19322)2/()(
)2,1(
nRSS
ESSnF
We shall check that the F statistic has been calculated correctly. The explained sum of squares (described in Stata as the model sum of squares) is 19322.
34
F TEST OF GOODNESS OF FIT
. reg EARNINGS S
Source | SS df MS Number of obs = 540-------------+------------------------------ F( 1, 538) = 112.15 Model | 19321.5589 1 19321.5589 Prob > F = 0.0000 Residual | 92688.6722 538 172.283777 R-squared = 0.1725-------------+------------------------------ Adj R-squared = 0.1710 Total | 112010.231 539 207.811189 Root MSE = 13.126
------------------------------------------------------------------------------ EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- S | 2.455321 .2318512 10.59 0.000 1.999876 2.910765 _cons | -13.93347 3.219851 -4.33 0.000 -20.25849 -7.608444------------------------------------------------------------------------------
The residual sum of squares is 92689.
15.11228.172
19322)2540/(92689
19322)2/(
)2,1(
nRSS
ESSnF
35
F TEST OF GOODNESS OF FIT
. reg EARNINGS S
Source | SS df MS Number of obs = 540-------------+------------------------------ F( 1, 538) = 112.15 Model | 19321.5589 1 19321.5589 Prob > F = 0.0000 Residual | 92688.6722 538 172.283777 R-squared = 0.1725-------------+------------------------------ Adj R-squared = 0.1710 Total | 112010.231 539 207.811189 Root MSE = 13.126
------------------------------------------------------------------------------ EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- S | 2.455321 .2318512 10.59 0.000 1.999876 2.910765 _cons | -13.93347 3.219851 -4.33 0.000 -20.25849 -7.608444------------------------------------------------------------------------------
The number of degrees of freedom is 540 – 2 = 538.
15.11228.172
19322)2540/(92689
19322)2/(
)2,1(
nRSS
ESSnF
36
F TEST OF GOODNESS OF FIT
. reg EARNINGS S
Source | SS df MS Number of obs = 540-------------+------------------------------ F( 1, 538) = 112.15 Model | 19321.5589 1 19321.5589 Prob > F = 0.0000 Residual | 92688.6722 538 172.283777 R-squared = 0.1725-------------+------------------------------ Adj R-squared = 0.1710 Total | 112010.231 539 207.811189 Root MSE = 13.126
------------------------------------------------------------------------------ EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- S | 2.455321 .2318512 10.59 0.000 1.999876 2.910765 _cons | -13.93347 3.219851 -4.33 0.000 -20.25849 -7.608444------------------------------------------------------------------------------
The denominator of the expression for F is therefore 172.28. Note that this is an estimate of u
2. Its square root, denoted in Stata by Root MSE, is an estimate of the standard deviation of u.
15.11228.172
19322)2540/(92689
19322)2/(
)2,1(
nRSS
ESSnF
37
F TEST OF GOODNESS OF FIT
. reg EARNINGS S
Source | SS df MS Number of obs = 540-------------+------------------------------ F( 1, 538) = 112.15 Model | 19321.5589 1 19321.5589 Prob > F = 0.0000 Residual | 92688.6722 538 172.283777 R-squared = 0.1725-------------+------------------------------ Adj R-squared = 0.1710 Total | 112010.231 539 207.811189 Root MSE = 13.126
------------------------------------------------------------------------------ EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- S | 2.455321 .2318512 10.59 0.000 1.999876 2.910765 _cons | -13.93347 3.219851 -4.33 0.000 -20.25849 -7.608444------------------------------------------------------------------------------
Our calculation of F agrees with that in the Stata output.
15.11228.172
19322)2540/(92689
19322)2/(
)2,1(
nRSS
ESSnF
38
F TEST OF GOODNESS OF FIT
. reg EARNINGS S
Source | SS df MS Number of obs = 540-------------+------------------------------ F( 1, 538) = 112.15 Model | 19321.5589 1 19321.5589 Prob > F = 0.0000 Residual | 92688.6722 538 172.283777 R-squared = 0.1725-------------+------------------------------ Adj R-squared = 0.1710 Total | 112010.231 539 207.811189 Root MSE = 13.126
------------------------------------------------------------------------------ EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- S | 2.455321 .2318512 10.59 0.000 1.999876 2.910765 _cons | -13.93347 3.219851 -4.33 0.000 -20.25849 -7.608444------------------------------------------------------------------------------
We will also check the F statistic using theexpression for it in terms of R2. We see again that it agrees.
15.112)2540/()1725.01(
1725.0)2/()1(
)2,1( 2
2
nR
RnF
39
F TEST OF GOODNESS OF FIT
. reg EARNINGS S
Source | SS df MS Number of obs = 540-------------+------------------------------ F( 1, 538) = 112.15 Model | 19321.5589 1 19321.5589 Prob > F = 0.0000 Residual | 92688.6722 538 172.283777 R-squared = 0.1725-------------+------------------------------ Adj R-squared = 0.1710 Total | 112010.231 539 207.811189 Root MSE = 13.126
------------------------------------------------------------------------------ EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- S | 2.455321 .2318512 10.59 0.000 1.999876 2.910765 _cons | -13.93347 3.219851 -4.33 0.000 -20.25849 -7.608444------------------------------------------------------------------------------
We will also check the relationship between the F statistic and the t statistic for the slope coefficient.
40
F TEST OF GOODNESS OF FIT
. reg EARNINGS S
Source | SS df MS Number of obs = 540-------------+------------------------------ F( 1, 538) = 112.15 Model | 19321.5589 1 19321.5589 Prob > F = 0.0000 Residual | 92688.6722 538 172.283777 R-squared = 0.1725-------------+------------------------------ Adj R-squared = 0.1710 Total | 112010.231 539 207.811189 Root MSE = 13.126
------------------------------------------------------------------------------ EARNINGS | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- S | 2.455321 .2318512 10.59 0.000 1.999876 2.910765 _cons | -13.93347 3.219851 -4.33 0.000 -20.25849 -7.608444------------------------------------------------------------------------------
259.1015.112
Obviously, this is correct as well.
Copyright Christopher Dougherty 2011.
These slideshows may be downloaded by anyone, anywhere for personal use.
Subject to respect for copyright and, where appropriate, attribution, they may be
used as a resource for teaching an econometrics course. There is no need to
refer to the author.
The content of this slideshow comes from Section 2.7 of C. Dougherty,
Introduction to Econometrics, fourth edition 2011, Oxford University Press.
Additional (free) resources for both students and instructors may be
downloaded from the OUP Online Resource Centre
http://www.oup.com/uk/orc/bin/9780199567089/.
Individuals studying econometrics on their own and who feel that they might
benefit from participation in a formal course should consider the London School
of Economics summer school course
EC212 Introduction to Econometrics
http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx
or the University of London International Programmes distance learning course
20 Elements of Econometrics
www.londoninternational.ac.uk/lse.
11.07.25