safuan-halim
7/30/2019 Heteroscedasticity[1]
http://slidepdf.com/reader/full/heteroscedasticity1 1/15
Heteroscedasticity
Assumption of CLRM:
The error terms are homoscedastic: $E(\varepsilon_i^2) = \sigma^2$, $i = 1, 2, \ldots, n$,
i.e. the variance of each $\varepsilon_i$ is constant.
For example: given $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, as income (X) increases the mean value of savings (Y) increases, but the variance of savings remains constant.
Remember: heteroscedasticity is more commonly found in cross-sectional than in time series data, because cross-sectional data usually deal with members of a population at a given point in time (e.g. small, medium or large firms), so there is a scale effect in cross-sectional data.
When assumption 3 holds,
– i.e. the errors ui in the regression equation have common variance (i.e. constant or scalar variance)
then we have homoscedasticity
– or a “scalar error covariance matrix”.
When assumption 3 breaks down, we have what is known as heteroscedasticity
– or a “non-scalar error covariance matrix”.
Homoscedasticity => variance of the error term is constant for each observation.
Each one of the residuals has a sampling distribution, each of which should have the same variance: “homoscedasticity”.
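$$
\operatorname{cov}(u_1, \ldots, u_n) =
\begin{pmatrix}
\operatorname{var}(u_1) & \operatorname{cov}(u_1,u_2) & \cdots & \operatorname{cov}(u_1,u_n) \\
\operatorname{cov}(u_2,u_1) & \operatorname{var}(u_2) & \cdots & \operatorname{cov}(u_2,u_n) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{cov}(u_n,u_1) & \operatorname{cov}(u_n,u_2) & \cdots & \operatorname{var}(u_n)
\end{pmatrix}
=
\begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix}
$$
where $\sigma^2$ is a scalar.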
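Generalized Least Squares Estimation
For the general linear statistical model
$$ y = X\beta + e $$
where $E[e] = 0$.
Heteroscedasticity exists when the diagonal elements of $\Phi$ are not all identical:
$$
E[ee'] = \Phi =
\begin{pmatrix}
\sigma_1^2 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_T^2
\end{pmatrix}
$$
GLS estimator for $\beta$ (BLUE):
$$ \hat{\beta} = (X'\Phi^{-1}X)^{-1} X'\Phi^{-1} y $$
where $\Phi = \sigma^2\Omega$, with $\sigma^2$ unknown and $\Omega$ known.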
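These two estimators (one using $\Phi$, one using $\Omega$, with $\Phi = \sigma^2\Omega$) are the same because
$$ (X'\Phi^{-1}X)^{-1}X'\Phi^{-1}y = (X'\sigma^{-2}\Omega^{-1}X)^{-1}X'\sigma^{-2}\Omega^{-1}y = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y $$
If $P'P = \Omega^{-1}$, then
$$ \hat{\beta} = (X'P'PX)^{-1}X'P'Py = (X^{*\prime}X^{*})^{-1}X^{*\prime}y^{*} $$
where $X^{*} = PX$ and $y^{*} = Py$.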
The $t$-th observation of the transformed model can be written as
$$ \frac{y_t}{\sigma_t} = \frac{x_t'}{\sigma_t}\beta + \frac{e_t}{\sigma_t} $$
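The variance of the transformed disturbance $e_t^{*} = e_t/\sigma_t$ is constant:
$$ E[e_t^{*2}] = E\left[\frac{e_t^2}{\sigma_t^2}\right] = \frac{1}{\sigma_t^2}E[e_t^2] = \frac{\sigma_t^2}{\sigma_t^2} = 1 $$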
For the heteroscedastic error model, the GLS estimator is obtained by Weighted Least Squares:
(a) Divide each observation (dependent and independent variables) by the standard deviation of the error term for that observation.
(b) Apply the usual LS procedure to the transformed observations.
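Recall:
The GLS estimator is the $\beta$ that minimizes
$$ \sum_{t=1}^{T} e_t^{*2} = (y - X\beta)'\Phi^{-1}(y - X\beta) $$
which gives
$$ \hat{\beta} = \left(\sum_{t=1}^{T} \sigma_t^{-2} x_t x_t'\right)^{-1} \sum_{t=1}^{T} \sigma_t^{-2} x_t y_t $$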
Covariance matrix for $\hat{\beta}$:
$$ (X^{*\prime}X^{*})^{-1} = (X'\Phi^{-1}X)^{-1} = \left(\sum_{t=1}^{T} \sigma_t^{-2} x_t x_t'\right)^{-1} $$
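The two WLS steps (divide every observation by the error standard deviation, then apply ordinary LS) can be sketched in a few lines of NumPy. This is only an illustration: the function name `wls_by_transformation`, the toy data and the choice `sigma = x` are assumptions made for the example, not part of the notes.

```python
import numpy as np

def wls_by_transformation(X, y, sigma):
    """WLS via steps (a) and (b): scale each observation by 1/sigma_t, then run OLS.

    X must already contain a column of ones if an intercept is wanted.
    sigma holds the (assumed known) error standard deviations sigma_t.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    X_star = X / sigma[:, None]      # (a) transformed regressors
    y_star = y / sigma               # (a) transformed dependent variable
    beta_hat, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)  # (b) OLS
    cov_beta = np.linalg.inv(X_star.T @ X_star)  # (X*'X*)^{-1}
    return beta_hat, cov_beta

# Hypothetical data: the error s.d. is assumed to equal x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.5, 11.4])
sigma = x.copy()
beta_hat, cov_beta = wls_by_transformation(X, y, sigma)
```

The result should agree with the matrix formula that uses the inverse of the diagonal error covariance matrix directly; the transformation route simply avoids building that matrix.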
(a) Causes (why does the variance of $\varepsilon_i$ vary?)
(i) Omitted Variables
Suppose the “true” model of y is:
yi = a + b1 xi + b2 zi + ui
but the model we estimate fails to include z:
yi = a + b1 xi + vi
then the error term in the model estimated by SPSS (vi) will be capturing the effect of the omitted variable, and so it will be correlated with z:
vi = c zi + ui
and so the variance of vi will be non-scalar.
(ii) Non-constant coefficient
Suppose that the slope coefficient varies across i:
yi = a + bi xi + ui
Suppose that it varies randomly around some fixed value b:
bi = b + ei
then the regression actually estimated by SPSS will be:
yi = a + (b + ei) xi + ui
= a + b xi + (ei xi + ui)
where (ei xi + ui) is the error term in the SPSS regression. The error term will thus vary with x.
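The dependence on x can be written out in one line. Assume (this is not stated in the notes) that ei and ui have zero means and constant variances $\sigma_e^2$ and $\sigma_u^2$, and are independent of each other and of xi. Then

$$ \operatorname{Var}(e_i x_i + u_i \mid x_i) = x_i^2\,\sigma_e^2 + \sigma_u^2 $$

Under these assumptions the error variance grows with $x_i^2$ and is constant only when $\sigma_e^2 = 0$, i.e. when the slope does not vary.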
(iii) Non-linearities
If the true relationship is non-linear:
yi = a + b xi^2 + ui
but the regression we attempt to estimate is linear:
yi = a + b xi + vi
then the residual in this estimated regression will capture the non-linearity and its variance will be affected accordingly:
vi = f(xi^2, ui)
(iv) Aggregation
Sometimes we aggregate our data across groups:
– e.g. quarterly time series data on income = average income of a group of households in a given quarter.
If this is so, and the size of the groups used to calculate the averages varies, then
– the variation of the mean will vary: larger groups will have a smaller standard error of the mean;
– the measurement errors of each value of our variable will be correlated with the sample size of the groups used.
Since measurement errors will be captured by the regression residual, the regression residual will vary with the sample size of the underlying groups on which the data is based.
Overall:
Mis-specification error
- wrong functional form
- non-linearities
- non-constant coefficient
- incorrect data transformation
- omitted variables
- aggregation
Outliers
Improvement in data collecting techniques
Errors of behavior become smaller over time
(b) Consequences:
OLS estimators are still linear and remain unbiased.
The property of minimum variance no longer holds (OLS is not efficient).
Heteroscedasticity does, however, bias the OLS estimated standard errors for the estimated coefficients:
– which means that the t tests will not be reliable: t = b_hat / SE(b_hat).
F-tests are also no longer reliable.
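The bias in the conventional standard errors can be checked without simulation. For a fixed design and known error variances, the sketch below compares the true covariance of the OLS estimator with the expected value of the conventional estimate s^2 (X'X)^{-1}. The function name, the design `x = 1, ..., 20` and the variance pattern `sigma2 = x**2` are assumptions chosen for the illustration.

```python
import numpy as np

def ols_variance_comparison(X, sigma2):
    """Return (true covariance of OLS b, expected conventional covariance).

    X: n x k design matrix including the constant column.
    sigma2: length-n vector of the true error variances sigma_i^2.
    """
    X = np.asarray(X, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    # True covariance: (X'X)^{-1} X' diag(sigma2) X (X'X)^{-1}
    true_cov = XtX_inv @ (X.T * sigma2) @ X @ XtX_inv
    # E[s^2] = trace(M diag(sigma2)) / (n - k), with M = I - X (X'X)^{-1} X'
    M = np.eye(n) - X @ XtX_inv @ X.T
    expected_s2 = np.sum(np.diag(M) * sigma2) / (n - k)
    conventional_cov = expected_s2 * XtX_inv
    return true_cov, conventional_cov

x = np.arange(1.0, 21.0)
X = np.column_stack([np.ones_like(x), x])
true_cov, conv_cov = ols_variance_comparison(X, sigma2=x ** 2)
# With constant variances the two matrices coincide.
true_homo, conv_homo = ols_variance_comparison(X, sigma2=np.full(20, 4.0))
```

In this particular design the conventional formula understates the variance of the slope, so t ratios built from it would look larger than they should; other variance patterns can bias it in the opposite direction.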
[Figure: Unbiased and Consistent Estimator. Asymptotic distribution of the OLS estimate b_hat, plotted for n = 150, 200, 300, 500 and 1,000.]
The estimate is unbiased and consistent since, as the sample size increases, the mean of the distribution tends towards the population value of the slope coefficient.
[Figure: Biased but Consistent Estimator. Asymptotic distribution of the OLS estimate b_hat, plotted for n = 150, 200, 300, 500 and 1,000.]
The estimate is biased but consistent since, as the sample size increases, the mean of the distribution tends towards the population value of the slope coefficient.
(c) Detecting:
1. Graphical examination of residuals (informal test)
- plot the squared residuals against y
- plot the squared residuals against x
If we plot the residuals against Rooms, we can see that their variance increases with the number of rooms:
[Figure: scatter plot of the unstandardized residual (vertical axis, -200000 to 300000) against number of rooms (horizontal axis, 0 to 14).]
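Formal Tests
2. White’s General Heteroscedasticity Test
Given the model:
$$ Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i $$
White test procedure (basic idea):
(1) Estimate the model and obtain the residuals
$$ \hat{u}_i = Y_i - \hat{Y}_i = Y_i - [\hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i}] $$
(2) Estimate the auxiliary regression
$$ \hat{u}_i^2 = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 X_{2i}^2 + \alpha_5 X_{3i}^2 + \alpha_6 X_{2i}X_{3i} + v_i $$
i.e. on the X’s, the squared terms of all the X’s and their cross products.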
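(3) Obtain the R-squared from the auxiliary regression; it can then be used to compute the test statistic
$$ nR^2 \sim \chi^2_{k-1} $$
test statistic = $nR^2$
critical value = $\chi^2_{d.f.}$
k − 1 = degrees of freedom, where k is the number of coefficients in the auxiliary regression including the constant (here k = 6).
(4) H0 = no heteroscedasticity (variance constant)
H1 = heteroscedasticity
or
$H_0: \sigma_i^2 = \sigma^2$ for all i.
$H_1$: not $H_0$
If test-stat > critical value, then reject H0: heteroscedasticity.
If test-stat < critical value, then fail to reject H0: no heteroscedasticity.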
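The four steps of White’s test translate fairly directly into NumPy. The following is a minimal sketch, not a reference implementation: the function name `white_test` and the simulated data are assumptions, the regressor matrix is passed without the constant column, and the auxiliary regressors are assumed not to be collinear (a dummy regressor, whose square equals itself, would need special handling).

```python
import numpy as np
from itertools import combinations

def white_test(y, X):
    """White's test. X holds the regressors WITHOUT the constant column.

    Returns (n * R^2 of the auxiliary regression, degrees of freedom).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    const = np.ones((n, 1))
    # Step 1: OLS on the original model, keep the squared residuals.
    W = np.hstack([const, X])
    beta, *_ = np.linalg.lstsq(W, y, rcond=None)
    u2 = (y - W @ beta) ** 2
    # Step 2: auxiliary regression on levels, squares and cross products.
    cross = [X[:, i] * X[:, j] for i, j in combinations(range(p), 2)]
    A = np.hstack([const, X, X ** 2] + [c[:, None] for c in cross])
    gamma, *_ = np.linalg.lstsq(A, u2, rcond=None)
    fitted = A @ gamma
    # Step 3: R^2 of the auxiliary regression and the statistic n * R^2.
    r2 = 1.0 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return n * r2, A.shape[1] - 1

# Simulated example: the error s.d. is proportional to the first regressor.
rng = np.random.default_rng(0)
n = 500
X = rng.uniform(1.0, 10.0, size=(n, 2))
u = X[:, 0] * rng.standard_normal(n)
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + u
stat, df = white_test(y, X)
```

Step 4 is then a comparison of `stat` with the chi-square critical value for `df` degrees of freedom (11.07 at the 5% level when df = 5).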
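Or:
The correct covariance matrix for the LS estimator is
$$ \operatorname{Var}[b \mid X] = \sigma^2 [X'X]^{-1}[X'\Omega X][X'X]^{-1} $$
and the conventional estimator is $V = s^2[X'X]^{-1}$. If there is no heteroscedasticity, then V will give a consistent estimator of $\operatorname{Var}[b \mid X]$.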
3. Spearman’s rank correlation test
Spearman’s rank correlation:
$$ r_s = 1 - 6\left[\frac{\sum d_i^2}{n(n^2 - 1)}\right] $$
where
$d_i$ = difference in the ranks assigned to 2 different characteristics of the i-th phenomenon
n = number of phenomena ranked
Steps:
1) Fit the regression to the data on Y and X and obtain the residuals $\hat{u}_i$.
2) Ignoring the sign of the residuals, i.e. taking $|\hat{u}_i|$, rank $|\hat{u}_i|$ and $X_i$ (or $\hat{Y}_i$) and compute $r_s$.
3) Assuming the population rank correlation coefficient is 0 and n > 8, use
$$ t = \frac{r_s\sqrt{n-2}}{\sqrt{1 - r_s^2}}, \qquad df = n - 2 $$
If the computed t value < critical t value, fail to reject homoscedasticity.
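Steps 2 and 3 can be checked on a small hand-sized example. The sketch below assumes no ties in either ranking; the function name `spearman_hetero_test` and the five residuals are made up for the arithmetic check only (the notes require n > 8 for the t approximation to be used in practice).

```python
import numpy as np

def spearman_hetero_test(resid, x):
    """Rank correlation between |residuals| and x, and the associated t value.

    Assumes there are no ties, so ranks are simply 1..n.
    """
    resid = np.asarray(resid, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(x)
    rank_u = np.argsort(np.argsort(np.abs(resid))) + 1  # ranks of |u_i|
    rank_x = np.argsort(np.argsort(x)) + 1              # ranks of X_i
    d = rank_u - rank_x
    r_s = 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))
    t = r_s * np.sqrt(n - 2) / np.sqrt(1.0 - r_s ** 2)
    return r_s, t

# Hypothetical residuals: ranks of |u| are [2, 1, 3, 5, 4], ranks of x are [1, 2, 3, 4, 5],
# so sum(d^2) = 4 and r_s = 1 - 6*4 / (5*24) = 0.8.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
resid = np.array([-0.2, 0.1, 0.3, -0.5, 0.4])
r_s, t = spearman_hetero_test(resid, x)
```

The t value is then compared with the critical t for n − 2 degrees of freedom.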
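4. Park test (strictly exploratory method)
The Park test is a test of the hypothesis:
$H_0: \alpha_1 = 0$ [$\sigma_i^2$ is constant]
$H_1: \alpha_1 \neq 0$ [heteroscedasticity]
Two-stage procedure:
(1) Run the OLS regression $Y_i = \hat{\beta}_0 + \hat{\beta}_1 \ln X_i + \hat{u}_i$, disregarding the heteroscedasticity question.
(2) Run $\ln \hat{u}_i^2 = \alpha_0 + \alpha_1 \ln X_i + v_i$ to test which particular independent variable is causing the heteroscedasticity.
Park suggests
$$ \sigma_i^2 = \sigma^2 X_i^{\alpha_1} e^{v_i} \quad\text{or}\quad \ln\sigma_i^2 = \ln\sigma^2 + \alpha_1 \ln X_i + v_i $$
Since $\sigma_i^2$ is unknown, use $\hat{u}_i^2$ as a proxy:
$$ \ln\hat{u}_i^2 = \ln\sigma^2 + \alpha_1 \ln X_i + v_i = \alpha_0 + \alpha_1 \ln X_i + v_i $$
If $\alpha_1$ is insignificant, fail to reject the assumption of homoscedasticity.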
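The second stage of the Park test is an ordinary regression of the log squared residuals on the log of the regressor. The sketch below takes the first-stage residuals as given; the function name `park_test`, the simulated data and the first-stage specification used to produce the residuals (y on a constant and x) are assumptions for the illustration. It requires x > 0 and non-zero residuals so that the logs exist.

```python
import numpy as np

def park_test(u_hat, x):
    """Second stage of the Park test.

    Regresses ln(u_hat^2) on a constant and ln(x); returns (slope, t ratio).
    """
    u_hat = np.asarray(u_hat, dtype=float)
    x = np.asarray(x, dtype=float)
    Z = np.column_stack([np.ones_like(x), np.log(x)])
    dep = np.log(u_hat ** 2)
    coef, *_ = np.linalg.lstsq(Z, dep, rcond=None)
    resid = dep - Z @ coef
    s2 = resid @ resid / (len(x) - Z.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(Z.T @ Z)))  # conventional s.e.
    return coef[1], coef[1] / se[1]

# Simulated example: error s.d. proportional to x, so the variance goes with x^2.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1.0, 10.0, size=n)
y = 1.0 + 2.0 * x + x * rng.standard_normal(n)
W = np.column_stack([np.ones(n), x])       # assumed first-stage specification
b, *_ = np.linalg.lstsq(W, y, rcond=None)
u_hat = y - W @ b
slope, t_ratio = park_test(u_hat, x)
```

A slope that is significantly different from zero points towards heteroscedasticity related to x; as the notes stress, the method is exploratory.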
5. Glejser test
- Similar in spirit to the Park test
Step:
After obtaining the residuals from the OLS regression, regress the absolute values of the residuals on the regressor variable that is thought to be closely associated with $\sigma_i^2$.
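A sketch of this step, using a linear relation between the absolute residual and the regressor. The linear form is only one possible choice and is assumed here; the function name `glejser_test` and the simulated data are likewise hypothetical.

```python
import numpy as np

def glejser_test(u_hat, x):
    """Regress |u_hat| on a constant and x; return (slope, t ratio)."""
    u_hat = np.asarray(u_hat, dtype=float)
    x = np.asarray(x, dtype=float)
    Z = np.column_stack([np.ones_like(x), x])
    dep = np.abs(u_hat)
    coef, *_ = np.linalg.lstsq(Z, dep, rcond=None)
    resid = dep - Z @ coef
    s2 = resid @ resid / (len(x) - Z.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(Z.T @ Z)))  # conventional s.e.
    return coef[1], coef[1] / se[1]

# Simulated example: error s.d. proportional to x.
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1.0, 10.0, size=n)
y = 1.0 + 2.0 * x + x * rng.standard_normal(n)
W = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(W, y, rcond=None)
u_hat = y - W @ b
slope, t_ratio = glejser_test(u_hat, x)
```

A significant slope suggests that the spread of the residuals changes with x.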
6. Goldfeld-Quandt test
- Motivated by a criticism of the Park test: its error term $v_i$ may itself not satisfy the OLS assumptions (it may be heteroscedastic).
- Assumes the heteroscedastic variance is positively related to one of the regressors in the model, e.g. $\sigma_i^2 = \sigma^2 X_i^2$: heteroscedasticity if $\sigma_i^2$ is larger when $X_i$ is larger.
- Depends on the number of central observations to be omitted and on identifying the correct regressor variable by which to order the observations (one of the limitations of this test).
The Goldfeld-Quandt test is a test of the hypothesis:
$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_T^2 = \sigma^2$ [which is constant]
$H_1: \sigma_1^2 \le \sigma_2^2 \le \cdots \le \sigma_T^2$
Steps:
1) Rank the observations according to the values of $X_i$.
2) Omit c (specified a priori) central observations and divide the remaining (n − c) observations into 2 groups of (n − c)/2 observations each.
3) Fit separate OLS regressions to the first (n − c)/2 and the last (n − c)/2 observations and obtain the respective residual sums of squares, RSS1 (from the regression corresponding to the smaller $X_i$ values) and RSS2 (from the larger $X_i$ values), each with
$$ df = \frac{n-c}{2} - k \quad\text{or}\quad \frac{n - c - 2k}{2} $$
4) Compute
$$ \lambda = \frac{RSS_2/df}{RSS_1/df} $$
- If computed $\lambda$ (= F) < critical F, fail to reject the hypothesis of homoscedasticity.
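The four steps can be sketched for a model with one regressor and an intercept (so k = 2). The function name `goldfeld_quandt`, the choice c = 40 and the simulated data are assumptions made for the example.

```python
import numpy as np

def goldfeld_quandt(y, x, c):
    """Goldfeld-Quandt statistic for y = b0 + b1*x + u, ordering by x.

    c is the number of central observations to omit.
    Returns (ratio of the two residual variances, degrees of freedom of each).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    k = 2                              # parameters per sub-regression
    order = np.argsort(x)              # step 1: rank by x
    m = (n - c) // 2                   # step 2: size of each outer group

    def rss(idx):
        Z = np.column_stack([np.ones(len(idx)), x[idx]])
        beta, *_ = np.linalg.lstsq(Z, y[idx], rcond=None)
        r = y[idx] - Z @ beta
        return r @ r

    rss1 = rss(order[:m])              # step 3: smaller x values
    rss2 = rss(order[n - m:])          # step 3: larger x values
    df = m - k
    return (rss2 / df) / (rss1 / df), df   # step 4

# Simulated example: error s.d. proportional to x.
rng = np.random.default_rng(2)
n = 200
x = rng.uniform(1.0, 10.0, size=n)
y = 1.0 + 2.0 * x + x * rng.standard_normal(n)
stat, df = goldfeld_quandt(y, x, c=40)
```

The statistic is compared with the critical F value with (df, df) degrees of freedom.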
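7. Breusch-Pagan-Godfrey test
The limitations of the Goldfeld-Quandt test can be avoided by using the BPG test.
If it is known that a set of variables $Z_2, \ldots, Z_m$ influences the error variance, we can write:
$$ \sigma_i^2 = \alpha_1 + \alpha_2 Z_{2i} + \cdots + \alpha_m Z_{mi} $$
The Breusch-Pagan-Godfrey test is a test of the hypothesis:
$H_0: \alpha_2 = \alpha_3 = \cdots = \alpha_m = 0$ [$\sigma_i^2 = \alpha_1$, which is constant]
Steps:
1) Estimate the linear regression model by OLS and obtain the residuals $\hat{u}_i$.
2) Compute $\tilde{\sigma}^2 = \sum \hat{u}_i^2 / n$.
3) Construct the variables $p_i = \hat{u}_i^2 / \tilde{\sigma}^2$.
4) Regress $p_i = \alpha_1 + \alpha_2 Z_{2i} + \cdots + \alpha_m Z_{mi} + v_i$.
5) Obtain the ESS (explained sum of squares) from step 4 and define $\Theta = \tfrac{1}{2}(ESS)$, which under $H_0$ is asymptotically $\chi^2_{m-1}$.
If computed $\Theta$ < critical value [$\chi^2_{m-1}$], fail to reject homoscedasticity.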
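Steps 1 to 5 can be sketched as follows. The function name `bpg_test` and the simulated data are assumptions; `X` and `Z` are passed without constant columns, and in the example the variance is assumed to be driven by the same variable that appears in the mean equation.

```python
import numpy as np

def bpg_test(y, X, Z):
    """Breusch-Pagan-Godfrey statistic.

    X: regressors of the mean equation, Z: variables thought to drive the
    error variance (neither includes the constant column).
    Returns (Theta = ESS / 2, degrees of freedom = number of Z variables).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    n = len(y)
    const = np.ones((n, 1))
    # Step 1: OLS residuals of the original model.
    W = np.hstack([const, X])
    beta, *_ = np.linalg.lstsq(W, y, rcond=None)
    u2 = (y - W @ beta) ** 2
    # Steps 2 and 3: scaled squared residuals p_i.
    sigma2_tilde = u2.sum() / n
    p = u2 / sigma2_tilde
    # Step 4: regress p on a constant and the Z variables.
    A = np.hstack([const, Z])
    alpha, *_ = np.linalg.lstsq(A, p, rcond=None)
    fitted = A @ alpha
    # Step 5: explained sum of squares and Theta.
    ess = np.sum((fitted - p.mean()) ** 2)
    return ess / 2.0, Z.shape[1]

# Simulated example: error variance proportional to x^2.
rng = np.random.default_rng(3)
n = 500
x = rng.uniform(1.0, 10.0, size=(n, 1))
y = 1.0 + 2.0 * x[:, 0] + x[:, 0] * rng.standard_normal(n)
theta, df = bpg_test(y, X=x, Z=x)
```

`theta` is compared with the chi-square critical value for `df` degrees of freedom (3.84 at the 5% level when df = 1).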
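(d) Remedial Measures:
Weighted Least Squares (WLS)
The GLS estimator is
$$ \hat{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y $$
Consider $\operatorname{Var}[\varepsilon_i \mid X_i] = \sigma_i^2 = \sigma^2\omega_i$. $\Omega^{-1}$ is a diagonal matrix whose i-th diagonal element is $1/\omega_i$. The GLS estimator is obtained by regressing
$$
Py = \begin{pmatrix} y_1/\sqrt{\omega_1} \\ y_2/\sqrt{\omega_2} \\ \vdots \\ y_n/\sqrt{\omega_n} \end{pmatrix}
\quad\text{on}\quad
PX = \begin{pmatrix} X_1'/\sqrt{\omega_1} \\ X_2'/\sqrt{\omega_2} \\ \vdots \\ X_n'/\sqrt{\omega_n} \end{pmatrix}
$$
Applying OLS to the transformed model, we obtain the WLS estimator
$$ \hat{\beta} = \left[\sum_{i=1}^{n} w_i X_i X_i'\right]^{-1}\left[\sum_{i=1}^{n} w_i X_i y_i\right], \quad\text{where } w_i = \frac{1}{\omega_i}. $$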
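Simplified version
- When $\sigma_i^2$ is known
Given the model:
$$ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i $$
Assume the true error variance $\sigma_i^2$ is known; then we can transform the model (in this case, divide both sides by the error standard deviation $\sigma_i$):
$$ \frac{Y_i}{\sigma_i} = \beta_0\frac{1}{\sigma_i} + \beta_1\frac{X_i}{\sigma_i} + \frac{\varepsilon_i}{\sigma_i} $$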
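Is $v_i = \varepsilon_i/\sigma_i$ homoscedastic?
As we know,
$$ v_i = \frac{\varepsilon_i}{\sigma_i}, \qquad v_i^2 = \frac{\varepsilon_i^2}{\sigma_i^2} $$
$$ E(v_i^2) = \frac{1}{\sigma_i^2}E(\varepsilon_i^2) = \frac{\sigma_i^2}{\sigma_i^2} = 1 $$
Since $\operatorname{var}(v_i) = E(v_i^2) = 1$ for every i, $v_i$ is homoscedastic.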
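- When $\sigma_i^2$ is unknown
If the variance is proportional to $X_i$, i.e. $E(\varepsilon_i^2) = \sigma^2 X_i$, divide all variables by $\sqrt{X_i}$:
$$ \frac{Y_i}{\sqrt{X_i}} = \beta_0\frac{1}{\sqrt{X_i}} + \beta_1\frac{X_i}{\sqrt{X_i}} + \frac{\varepsilon_i}{\sqrt{X_i}} $$
Is $v_i = \varepsilon_i/\sqrt{X_i}$ homoscedastic?
As we know,
$$ v_i = \frac{\varepsilon_i}{\sqrt{X_i}}, \qquad v_i^2 = \frac{\varepsilon_i^2}{X_i} $$
$$ E(v_i^2) = \frac{E(\varepsilon_i^2)}{X_i} = \frac{\sigma^2 X_i}{X_i} = \sigma^2 $$
Since $\operatorname{var}(v_i) = E(v_i^2) = \sigma^2$, $v_i$ is homoscedastic.
If the variance is proportional to $X_i^2$, divide all variables by $X_i$.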
Re-specification of the model
- Note: the heteroscedasticity problem may be reduced by a log transformation, since it compresses the scale.
White’s Heteroscedasticity-Corrected Standard Errors
- Take heteroscedasticity into consideration without changing the values of the estimated coefficients.
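A minimal sketch of the idea: the coefficients are the usual OLS ones, and only the covariance matrix is replaced by the sandwich form with squared residuals in the middle. The version below is the basic form without any small-sample correction; the function name `ols_with_white_se` and the simulated data are assumptions for the illustration.

```python
import numpy as np

def ols_with_white_se(X, y):
    """OLS coefficients with conventional and White-corrected standard errors.

    X must include the constant column. The robust covariance is
    (X'X)^{-1} X' diag(u_hat^2) X (X'X)^{-1}, with no small-sample correction.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                      # unchanged OLS coefficients
    u_hat = y - X @ b
    s2 = u_hat @ u_hat / (n - k)
    se_conventional = np.sqrt(np.diag(s2 * XtX_inv))
    middle = (X.T * u_hat ** 2) @ X            # X' diag(u_hat^2) X
    se_white = np.sqrt(np.diag(XtX_inv @ middle @ XtX_inv))
    return b, se_conventional, se_white

# Simulated example: error s.d. proportional to x.
rng = np.random.default_rng(4)
n = 300
x = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + x * rng.standard_normal(n)
b, se_conv, se_white = ols_with_white_se(X, y)
```

The t ratios are then formed with `se_white` in place of `se_conv`; the point estimates `b` are the same in both cases.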