
Autocorrelation



Testing the hypothesis of autocorrelation via Durbin-Watson

The term autocorrelation may be defined as “correlation between members of series of observations ordered in time [as in time series data] or space [as in cross-sectional data].” Put more simply, if the error terms depend on their own lagged values, we have autocorrelation. The Durbin-Watson statistic ranges in value from 0 to 4. A value near 2 indicates no autocorrelation; a value toward 0 indicates positive autocorrelation; a value toward 4 indicates negative autocorrelation. The reader may recall that there are generally three types of data available for empirical analysis: (1) cross section, (2) time series, and (3) a combination of cross section and time series, also known as pooled data.
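
For reference, the statistic itself is computed from the OLS residuals û_t (a standard formula, stated here for completeness):

d = Σ_{t=2..n} (û_t − û_{t−1})² / Σ_{t=1..n} û_t² ≈ 2(1 − ρ̂)

where ρ̂ is the estimated first-order autocorrelation of the residuals. Thus ρ̂ = 0 gives d ≈ 2, ρ̂ = 1 gives d ≈ 0, and ρ̂ = −1 gives d ≈ 4.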

Types of autocorrelation

1. Spatial autocorrelation

In cross-section studies, data are often collected on the basis of a random sample of cross-sectional units, such as households (in a consumption function analysis) or firms (in an investment analysis), so that there is no prior reason to believe that the error term pertaining to one household or firm is correlated with the error term of another household or firm. If by chance such a correlation is observed in cross-sectional units, it is called spatial autocorrelation, that is, correlation in space rather than over time.

2. Serial correlation, temporal correlation, or autocorrelation proper (errors from adjacent time periods are correlated with one another)

The situation, however, is likely to be very different if we are dealing with time series data, for the observations in such data follow a natural ordering over time, so that successive observations are likely to exhibit intercorrelations, especially if the time interval between successive observations is short, such as a day, a week, or a month rather than a year. If you observe stock price indexes, such as the Dow Jones or S&P 500, over successive days, it is not unusual to find that these indexes move up or down for several days in succession.
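
To make this concrete, here is a minimal sketch in Python (numpy and statsmodels assumed available) that simulates a regression whose errors follow an AR(1) process and computes the Durbin-Watson statistic on the OLS residuals; all data and parameter values are synthetic choices for illustration:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(0)
    n, rho = 200, 0.7

    # AR(1) errors: u_t = rho * u_{t-1} + v_t
    v = rng.normal(size=n)
    u = np.zeros(n)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + v[t]

    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + u                      # true model with serially correlated errors

    res = sm.OLS(y, sm.add_constant(x)).fit()  # fit by OLS
    print(durbin_watson(res.resid))            # well below 2, signalling positive autocorrelation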

Causes of autocorrelation

• Misspecification of the model

In empirical analysis the researcher often starts with a plausible regression model that may not be the most “perfect” one. After the regression analysis, the researcher does the postmortem to find out whether the results accord with a priori expectations. If not, surgery is begun. For example, the researcher may plot the residuals û_i obtained from the fitted regression and may observe patterns. These residuals (which are proxies for u_i) may suggest that some variables that were originally candidates but were not included in the model for a variety of reasons should be included. This is the case of excluded-variable specification bias. Often the inclusion of such variables removes the correlation pattern observed among the residuals.
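
As a sketch of that residual postmortem, continuing from the simulation above (res is the fitted result there; matplotlib assumed available), one can plot the residuals in time order and look for runs of same-signed values:

    import matplotlib.pyplot as plt

    # res is the fitted OLS result from the AR(1) sketch above
    plt.plot(res.resid, marker="o")
    plt.axhline(0, color="grey")   # long runs on one side of zero suggest positive autocorrelation
    plt.xlabel("time")
    plt.ylabel("OLS residual")
    plt.show()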

• Data manipulation, smoothing, seasonal adjustment

In empirical analysis, the raw data are often “manipulated.” For example, in time series regressions involving quarterly data, such data are usually derived from the monthly data by simply adding three monthly observations and dividing the sum by 3. This averaging introduces smoothness into the data by dampening the fluctuations in the monthly data. Therefore, the graph plotting the quarterly data looks much smoother than that of the monthly data, and this smoothness may itself introduce a systematic pattern in the disturbances, thereby introducing autocorrelation.
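
The averaging described here is just a block mean over each group of three months; a minimal sketch with hypothetical monthly data:

    import numpy as np

    # hypothetical: 10 years of monthly observations
    monthly = np.random.default_rng(1).normal(loc=100.0, scale=5.0, size=120)
    # quarterly value = sum of three monthly observations divided by 3
    quarterly = monthly.reshape(-1, 3).mean(axis=1)
    print(quarterly.shape)   # (40,): one smoothed value per quarter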

• Nonstationarity

When we deal with nonstationary time series (suppose we have two variables Y and X), it is quite possible that both Y and X are nonstationary and therefore the error u is also nonstationary. In that case, the error term will exhibit autocorrelation.
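
A quick way to see this is the classic spurious-regression experiment: regress one random walk on another, independent one; the residuals inherit the nonstationarity and the Durbin-Watson statistic typically collapses toward 0. An illustrative sketch with synthetic data, not a formal test:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(2)
    y = np.cumsum(rng.normal(size=300))   # a random walk (nonstationary)
    x = np.cumsum(rng.normal(size=300))   # an independent random walk

    res_rw = sm.OLS(y, sm.add_constant(x)).fit()
    print(durbin_watson(res_rw.resid))    # typically far below 2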

• Lags

In a time series regression of consumption expenditure on income, it is not uncommon to find that the consumption expenditure in the current period depends, among other things, on the consumption expenditure of the previous period. That is,

Consumption_t = β1 + β2 Income_t + β3 Consumption_{t−1} + u_t    (1)


A regression such as (1) is known as autoregression because one of the explanatory variables is the lagged value of the dependent variable. The rationale for a model such as (1) is simple: consumers do not change their consumption habits readily, for psychological, technological, or institutional reasons. Now if we neglect the lagged term in (1), the resulting error term will reflect a systematic pattern due to the influence of lagged consumption on current consumption.
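
A sketch of estimating an autoregression like (1) by OLS, using entirely hypothetical consumption and income series (note that, as discussed later, the Durbin-Watson test is not appropriate for this model precisely because of the lagged dependent variable):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    income = pd.Series(100 + np.cumsum(rng.normal(1, 2, size=80)))   # hypothetical income series
    consumption = 0.8 * income + rng.normal(0, 3, size=80)           # hypothetical consumption series

    df = pd.DataFrame({"consumption": consumption, "income": income})
    df["consumption_lag1"] = df["consumption"].shift(1)              # Consumption_{t-1}
    df = df.dropna()                                                 # the first row has no lag

    X = sm.add_constant(df[["income", "consumption_lag1"]])
    res_ar = sm.OLS(df["consumption"], X).fit()
    print(res_ar.params)                                             # estimates of β1, β2, β3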

• Continued influence of shocks

• Inertia

A salient feature of most economic time series is inertia, or sluggishness. As is well known, time series such as GNP, price indexes, production, employment, and unemployment exhibit (business) cycles. Starting at the bottom of the recession, when economic recovery starts, most of these series start moving upward. In this upswing, the value of a series at one point in time is greater than its previous value. Thus there is a “momentum” built into them, and it continues until something happens (e.g., an increase in interest rates or taxes or both) to slow them down. Therefore, in regressions involving time series data, successive observations are likely to be interdependent.

Consequences of ignoring autocorrelation if it is present

In fact, the consequences of ignoring autocorrelation when it is present are similar to those of ignoring heteroscedasticity. The coefficient estimates derived using OLS are still unbiased, but they are inefficient, i.e., they are not BLUE, even at large sample sizes, so the standard error estimates could be wrong. There thus exists the possibility that wrong inferences could be made about whether a variable is or is not an important determinant of variations in y. In the case of positive serial correlation in the residuals, the OLS standard error estimates will be biased downwards relative to the true standard errors; that is, OLS will understate their true variability. This leads to an increase in the probability of a type I error, that is, a tendency to reject the null hypothesis when it is in fact correct. Furthermore, R2 is likely to be inflated relative to its “correct” value if autocorrelation is present but ignored, since residual autocorrelation leads to an underestimate of the true error variance (for positive autocorrelation).
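
To see the understated standard errors concretely, one can compare plain OLS standard errors with autocorrelation-robust (Newey-West/HAC) ones; a sketch under synthetic AR(1) assumptions, where the HAC lag length of 4 is an arbitrary illustrative choice:

    import numpy as np
    import statsmodels.api as sm

    def ar1(rho, n, rng):
        # generate a series x_t = rho * x_{t-1} + e_t with standard normal shocks
        e = rng.normal(size=n)
        out = np.zeros(n)
        for t in range(1, n):
            out[t] = rho * out[t - 1] + e[t]
        return out

    rng = np.random.default_rng(4)
    n = 200
    x = ar1(0.7, n, rng)   # autocorrelated regressor
    u = ar1(0.7, n, rng)   # positively autocorrelated errors
    y = 1.0 + 2.0 * x + u

    X = sm.add_constant(x)
    print(sm.OLS(y, X).fit().bse)                                         # plain OLS standard errors
    print(sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4}).bse)  # HAC errors, typically larger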


It is important to note the assumptions underlying the d statistic (Durbin-Watson):

1. The regression model includes an intercept term (if not, include one and rerun the model with the intercept).

2. The explanatory variables, the X’s, are nonstochastic, or fixed in repeated sampling.

3. The error term u_t is assumed to be normally distributed.

4. The regression model does not include the lagged value(s) of the dependent variable as one of the explanatory variables.

5. There are no missing observations in the data.

How to detect Autocorrelation?

There are two well-known test statistics for detecting autocorrelation.

1. Durbin-Watson test: the best known, but it does not always give an unambiguous answer and is not appropriate when there is a lagged dependent variable (LDV) in the original model.

2. Breusch-Godfrey (or LM) test: does not have a similar indeterminate range and can be used with an LDV. (To avoid some of the pitfalls of the Durbin-Watson d test of autocorrelation, the statisticians Breusch and Godfrey have developed a test of autocorrelation that is general in the sense that it allows for (1) stochastic regressors, such as the lagged values of the regressand; (2) higher-order autoregressive schemes, such as AR(1), AR(2), etc.; and (3) simple or higher-order moving averages of white noise error terms, such as εt.)
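
Both tests are available in Python’s statsmodels; a minimal sketch, assuming res is a fitted OLS results object such as the one from the first sketch above:

    from statsmodels.stats.diagnostic import acorr_breusch_godfrey
    from statsmodels.stats.stattools import durbin_watson

    # res is a fitted OLS results object, e.g. from the AR(1) sketch earlier
    print(durbin_watson(res.resid))   # Durbin-Watson statistic, between 0 and 4

    lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=2)
    print(lm_pval)                    # small p-value: evidence of autocorrelation up to lag 2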

How to conclude whether autocorrelation exists

As mentioned earlier, the DW value lies between 0 and 4. If the value comes out near 2, we say there is no evidence of autocorrelation; if the value comes out near 4, that is evidence of strong negative autocorrelation, which is also a problem. But there are cases where we get stuck and cannot draw any conclusion: if the value comes out somewhere near 2, we would like to say there is no autocorrelation, but the question is, how near? 1.2? 1.3? To avoid this headache, Savin and White (1977) provide a table of critical DW values, which we will discuss later.


So, once again, the conditions for the DW test:

(1) There must be a constant term in the regression.

(2) The regressors must be nonstochastic, as in assumption 4 of the CLRM.

(3) There must be no lags of the dependent variable in the regression.

Hypothesis testing for Autocorrelation

If we were testing for positive autocorrelation against the null hypothesis of no autocorrelation, we would use the following procedure:

If d < dL, reject the null of no autocorrelation.

If d > dU, we do not reject the null hypothesis.

If dL < d < dU, the test is inconclusive.
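
This three-way rule is easy to encode; a small hypothetical helper (the dl and du arguments would be read off the Savin-White table discussed below):

    def dw_decision(d, dl, du):
        # three-way Durbin-Watson decision rule for positive autocorrelation
        if d < dl:
            return "reject H0: evidence of positive autocorrelation"
        if d > du:
            return "do not reject H0: no evidence of positive autocorrelation"
        return "inconclusive: d lies between dL and dU"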

Positive autocorrelation means there is autocorrelation in our data, which is not favorable; d denotes the calculated Durbin-Watson value. If the calculated Durbin-Watson value is less than the lower table value (dL), then we reject our null hypothesis of no autocorrelation and choose the alternative hypothesis that there is autocorrelation. If our d value comes out greater than the upper table value (dU), we do not reject the null hypothesis of no autocorrelation. But what if our value comes out between the table’s dL and dU values? (One remedy is to increase the number of observations.) Note: if the DW value comes out at 2, there is no evidence of autocorrelation.

Note: it is important to know what to do if our DW falls into the inconclusive zone. It may be pointed out that the indecisive zone narrows as the sample size increases, which can be seen clearly from the Durbin-Watson tables. For example, with 4 regressors and 20 observations, the 5 percent lower and upper d values are 0.894 and 1.828, respectively, but these values are 1.515 and 1.739 if the sample size is 75. Furthermore, the computer program Shazam performs an exact d test, that is, it gives the p value, the exact probability of the computed d value. With modern computing facilities, it is no longer difficult to find the p value of the computed d statistic.

Decision about autocorrelation with the help of the Savin and White (1977) table

In the pages below, the table given by Savin and White is reproduced; you should study it carefully. Now, how to read the table: we read it on the basis of the number of regressors and the number of observations.

Suppose we have 5 independent variables, i.e., regressors (the number of regressors is denoted by K), our calculated Durbin-Watson value is 1.7, and we have 32 observations. How do we conclude? Go to the table below, select the column for K = 5, and go to row 32 (because we have 32 observations); there you will find dL and dU.

Here we have dL = 0.91 and dU = 1.5, while our calculated DW is 1.7.

So, on the basis of the above rules (repeated here as a reminder): if we were testing for positive autocorrelation against the null hypothesis of no autocorrelation, we would use the following procedure:

If d < dL, reject the null of no autocorrelation.

If d > dU, we do not reject the null hypothesis.

If dL < d < dU, the test is inconclusive.

Hence we can see that our calculated DW value is 1.7, which is greater than dU; that is, d > dU, so we do not reject the null hypothesis of no autocorrelation. In other words, we conclude that there is no autocorrelation. Quite simply, if our DW value comes out greater than dU, we say there is no autocorrelation in our data. (Note: I have checked these values against the Durbin-Watson Statistic: 1 Percent Significance Points of dL and dU table.)
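
Plugging this worked example into the hypothetical dw_decision helper sketched earlier (dL = 0.91 and dU = 1.5 being the values read off the table):

    print(dw_decision(d=1.7, dl=0.91, du=1.5))
    # prints: do not reject H0: no evidence of positive autocorrelation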