
Empirical Asset Pricing

Seppo Pynnönen

Department of Mathematics and Statistics, University of Vaasa, Finland

Texas A&M University, May – June, 2013

As of May 29, 2013

Part IV

Time Series

1 Time Series

Time series data

Autocorrelations

ARIMA models

GARCH

Newey-West autocorrelation and heteroscedasticity robust standard errors

Accessing Stata results



Time series data

Stata has extensive time series facilities.

To use Stata's time series facilities, the data must first be declared as time series.

tsset timevar [, options]

declares the data to be time series with time variable (time index) timevar, where timevar must be an integer-valued variable with unique values.

See help tsset

Note: Daily data typically exclude weekends and holidays, while Stata operates on calendar dates. This leaves gaps in the data, which generate missing values in time series operations.



Panel data sets can be declared by

tsset panelvar timevar [, tsoptions]

or

xtset panelvar timevar [, tsoptions]

If there is no time variable

xtset panelvar



Example: IBM data

tsset date    // declare data to be time series using time variable date

        time variable:  date, 02jan1990 to 31jan2013, but with gaps
                delta:  1 day

list date dow close L.close D.close in 1/7, clean

                                      L.           D.
            date   dow    close    close        close
  1.   02jan1990   Tue       98        .            .
  2.   03jan1990   Wed    98.87       98     .8700027
  3.   04jan1990   Thu      100    98.87     1.129997
  4.   05jan1990   Fri    99.75      100         -.25
  5.   08jan1990   Mon   100.37        .            .
  6.   09jan1990   Tue    99.37   100.37           -1
  7.   10jan1990   Wed       99    99.37    -.3700027



L. and D. generate the lag and the first difference of the series.

As seen above, the missing weekend and holiday dates generate missing values in the lagged and differenced series.

This can be circumvented by generating a dummy time variable equal to the observation number.

Example: Dummy time variable

tsset, clear     // clear the existing time-series setting (not necessary)
gen tvar = _n    // generate dummy time variable tvar
tsset tvar       // declare data to be time series using tvar

        time variable:  tvar, 1 to 5818
                delta:  1 unit

list date tvar dow close L.close D.close in 1/7, clean

                                             L.           D.
            date   tvar   dow    close    close        close
  1.   02jan1990      1   Tue       98        .            .
  2.   03jan1990      2   Wed    98.87       98     .8700027
  3.   04jan1990      3   Thu      100    98.87     1.129997
  4.   05jan1990      4   Fri    99.75      100         -.25
  5.   08jan1990      5   Mon   100.37    99.75     .6200027
  6.   09jan1990      6   Tue    99.37   100.37           -1
  7.   10jan1990      7   Wed       99    99.37    -.3700027
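The effect of the two time indices can be imitated outside Stata. The following Python sketch is only an illustration (the function name `lag` and the day numbers are assumptions made here, not Stata internals): a lag looks up the value at time index t - 1, so a calendar index with a weekend gap yields a missing value, whereas the observation number does not.

```python
def lag(times, values, k=1):
    """Return the k-period lag of values, matched on the time index.

    A lagged value exists only if time t - k is present in the index,
    which imitates how a time variable with gaps behaves.
    """
    lookup = dict(zip(times, values))
    return [lookup.get(t - k) for t in times]


# First five IBM closes from the listing above. The calendar index counts
# days from 02jan1990 (day 0), so the weekend of 06-07 Jan is a gap.
close = [98.0, 98.87, 100.0, 99.75, 100.37]
calendar_day = [0, 1, 2, 3, 6]   # 02, 03, 04, 05 and 08 Jan 1990
obs_number = [1, 2, 3, 4, 5]     # analogue of gen tvar = _n

lag_calendar = lag(calendar_day, close)
lag_obs = lag(obs_number, close)

print(lag_calendar)  # the Monday lag is None (missing)
print(lag_obs)       # the Monday lag is Friday's close
```

The price of the observation-number index is that a "one-period" lag over a weekend spans three calendar days, which is usually acceptable for daily returns.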



Time series operators

Operator   Meaning
L.         lag $x_{t-1}$
L2.        2-period lag $x_{t-2}$
...
F.         lead $x_{t+1}$
F2.        2-period lead $x_{t+2}$
...
D.         difference $x_t - x_{t-1}$
D2.        difference of difference $x_t - x_{t-1} - (x_{t-1} - x_{t-2}) = x_t - 2x_{t-1} + x_{t-2}$
...
S.         seasonal difference $x_t - x_{t-1}$
S2.        lag-2 (seasonal) difference $x_t - x_{t-2}$
...



L3. = LLL. = L2L., and L(1 3 4). means the same as applying L., L3., and L4. separately.

Note: S1. = D1. but S2. ≠ D2.
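The distinction between D2. and S2. is easy to check numerically. The Python sketch below uses hypothetical helper functions (`difference` and `seasonal_difference` are names chosen here, not Stata functions) and the first four IBM closing prices to show that the two operators agree at order one and differ at order two.

```python
def difference(x, order=1):
    """Apply the first difference `order` times (analogue of D., D2., ...)."""
    out = list(x)
    for _ in range(order):
        out = [None] + [
            None if a is None or b is None else b - a
            for a, b in zip(out[:-1], out[1:])
        ]
    return out


def seasonal_difference(x, lag=1):
    """Return x_t - x_{t-lag} (analogue of S., S2., ...); lag must be >= 1."""
    return [None] * lag + [b - a for a, b in zip(x[:-lag], x[lag:])]


close = [98.0, 98.87, 100.0, 99.75]

d2 = difference(close, order=2)           # x_t - 2 x_{t-1} + x_{t-2}
s2 = seasonal_difference(close, lag=2)    # x_t - x_{t-2}

print(d2[2], s2[2])  # about 0.26 versus 2.0 for 04jan1990
```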

Examples

list date close L.close L(2 3).close D2.close S2.close LF.close in 1/6, clean noobs

                          L.      L2.      L3.         D2.         S2.
     date    close     close    close    close       close       close    close
02jan1990       98         .        .        .           .           .       98
03jan1990    98.87        98        .        .           .           .    98.87
04jan1990      100     98.87       98        .    .2599945           2      100
05jan1990    99.75       100    98.87       98   -1.379997    .8799973    99.75
08jan1990   100.37     99.75      100    98.87    .8700027    .3700027   100.37
09jan1990    99.37    100.37    99.75      100   -1.620003   -.3799973    99.37

list date smb hml L(1/3).(smb hml) in 1/6, clean noobs

                             L.    L2.    L3.     L.    L2.    L3.
     date    smb    hml     smb    smb    smb    hml    hml    hml
02jan1990   -.68   -.04       .      .      .      .      .      .
03jan1990    .72   -.29    -.68      .      .   -.04      .      .
04jan1990    .42   -.24     .72   -.68      .   -.29   -.04      .
05jan1990    .75   -.24     .42    .72   -.68   -.24   -.29   -.04
08jan1990   -.41   -.23     .75    .42    .72   -.24   -.24   -.29
09jan1990    .86    .07    -.41    .75    .42   -.23   -.24   -.24



Autocorrelations

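The autocorrelation function of a stationary time series $y_t$ is

$$\rho_k = \frac{\gamma_k}{\gamma_0}, \tag{8}$$

where

$$\gamma_k = \mathrm{cov}[y_t, y_{t-k}] \tag{9}$$

is the kth order autocovariance of $y_t$, $k = 0, 1, \ldots$ ($\gamma_0 = \mathrm{cov}[y_t, y_t] = \mathrm{var}[y_t]$ is the variance).

The estimator of $\gamma_k$ is

$$\hat\gamma_k = \frac{1}{T}\sum_{t=k+1}^{T}(y_{t-k} - \bar y)(y_t - \bar y), \tag{10}$$

where

$$\bar y = \frac{1}{T}\sum_{t=1}^{T} y_t.$$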
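As a concrete check of (8) and (10), the following minimal Python sketch computes sample autocorrelations directly from the definitions. The function names and the five-point toy series are made up for illustration.

```python
def autocovariance(y, k):
    """Sample autocovariance of order k with divisor T, as in (10)."""
    T = len(y)
    ybar = sum(y) / T
    return sum((y[t - k] - ybar) * (y[t] - ybar) for t in range(k, T)) / T


def autocorrelation(y, k):
    """Sample autocorrelation rho_k = gamma_k / gamma_0, as in (8)."""
    return autocovariance(y, k) / autocovariance(y, 0)


y = [1.0, 2.0, 3.0, 4.0, 5.0]
rho1 = autocorrelation(y, 1)
rho2 = autocorrelation(y, 2)
print(rho1, rho2)
```

Because the divisor is T for every lag rather than T - k, the estimated autocorrelations shrink toward zero as k grows.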



ARIMA models

Stata has a large number of univariate and multivariate time series models, as well as other time series facilities such as smoothing, extrapolation, and interpolation methods.

ARMA(p, q)

arima depvar [indepvars], ar(numlist) ma(numlist)

ARIMA(p, d, q)

arima depvar, arima(#p,#d,#q)

ARIMA(p, d, q) × (P, D, Q)s multiplicative seasonal

arima depvar, arima(#p,#d,#q) sarima(#P,#D,#Q,#s)



Stata autocorrelations, partial autocorrelations, and portmanteau (Q) test

corrgram varname [if] [in] [, corrgram_options]

Graph autocorrelations with confidence intervals

ac varname [if] [in] [, ac_options]

Graph partial autocorrelations with confidence intervals

pac varname [if] [in] [, pac_options]

See help corrgram



HML return autocorrelations

corrgram hml, lags(10)

-1 0 1 -1 0 1

LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

-------------------------------------------------------------------------------

1 0.0913 0.0913 48.509 0.0000 | |

2 -0.0028 -0.0112 48.553 0.0000 | |

3 -0.0005 0.0008 48.555 0.0000 | |

4 0.0131 0.0131 49.552 0.0000 | |

5 -0.0323 -0.0351 55.64 0.0000 | |

6 -0.0281 -0.0220 60.254 0.0000 | |

7 0.0255 0.0300 64.039 0.0000 | |

8 0.0157 0.0101 65.472 0.0000 | |

9 0.0255 0.0247 69.264 0.0000 | |

10 0.0476 0.0435 82.492 0.0000 | |

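There seems to be some autocorrelation in the HML returns: the Q test rejects the null hypothesis of no autocorrelation at every lag shown, mainly because of the first-order autocorrelation of 0.09.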
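The Q column can be reproduced approximately from the AC column. The sketch below assumes that the portmanteau statistic has the Ljung-Box form $Q_k = T(T+2)\sum_{j=1}^{k}\hat\rho_j^2/(T-j)$ and that $T = 5818$ (the sample size reported for this data set); both are assumptions made for this illustration. The autocorrelations are the rounded values from the table, so the result agrees with the Q column only up to rounding.

```python
def ljung_box_q(acf, nobs):
    """Cumulative Ljung-Box Q statistics for autocorrelations at lags 1..K."""
    q_values = []
    total = 0.0
    for j, rho in enumerate(acf, start=1):
        total += rho * rho / (nobs - j)
        q_values.append(nobs * (nobs + 2) * total)
    return q_values


# Rounded AC values for hml, lags 1 to 10, copied from the corrgram table.
hml_acf = [0.0913, -0.0028, -0.0005, 0.0131, -0.0323,
           -0.0281, 0.0255, 0.0157, 0.0255, 0.0476]

q = ljung_box_q(hml_acf, nobs=5818)
print(round(q[0], 2), round(q[-1], 2))  # compare with 48.509 and 82.492
```

Under the null hypothesis of no autocorrelation, $Q_k$ is compared with a chi-squared distribution with k degrees of freedom, which is where the Prob>Q column comes from.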


ac hml, lags(30) ylabels(-0.15(0.05)0.15)

[Figure: autocorrelations of hml against lag (0 to 30), vertical axis from -0.15 to 0.15, with Bartlett's formula for MA(q) 95% confidence bands]


pac hml, lags(30) ylabels(-0.15(0.05)0.15)

[Figure: partial autocorrelations of hml against lag (0 to 30), vertical axis from -0.15 to 0.15, with 95% confidence bands, se = 1/sqrt(n)]



HML return AR(10)-model

arima hml, ar(1/10)

Sample: 1 - 5818 Number of obs = 5818

Wald chi2(10) = 276.20

Log likelihood = -5180.718 Prob > chi2 = 0.0000

------------------------------------------------------------------------------

| OPG

hml | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

hml |

_cons | 0.013 0.009 1.42 0.154 -0.005 0.031

-------------+----------------------------------------------------------------

ARMA |

ar |

L1. | 0.091 0.007 12.89 0.000 0.077 0.105

L2. | -0.011 0.007 -1.49 0.137 -0.025 0.003

L3. | -0.002 0.008 -0.22 0.826 -0.017 0.013

L4. | 0.018 0.008 2.21 0.027 0.002 0.034

L5. | -0.032 0.008 -4.19 0.000 -0.046 -0.017

L6. | -0.025 0.007 -3.39 0.001 -0.040 -0.011

L7. | 0.029 0.008 3.84 0.000 0.014 0.044

L8. | 0.008 0.007 1.12 0.264 -0.006 0.023

L9. | 0.021 0.008 2.62 0.009 0.005 0.036

L10. | 0.043 0.008 5.53 0.000 0.028 0.059

-------------+----------------------------------------------------------------

/sigma | 0.589 0.003 217.06 0.000 0.584 0.595

------------------------------------------------------------------------------



Residual autocorrelations

predict hmlres, res    // postestimation command saves residuals to hmlres

corrgram hmlres, lags(20)

-1 0 1 -1 0 1

LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]


-------------------------------------------------------------------------------

1 0.0006 0.0006 .00198 0.9645 | |

2 -0.0020 -0.0020 .02608 0.9870 | |

3 -0.0036 -0.0036 .10034 0.9918 | |

4 0.0004 0.0004 .10112 0.9988 | |

5 -0.0026 -0.0027 .14201 0.9996 | |

6 -0.0014 -0.0014 .15368 0.9999 | |

7 0.0026 0.0026 .19402 1.0000 | |

8 -0.0019 -0.0019 .21409 1.0000 | |

9 -0.0031 -0.0031 .27043 1.0000 | |

10 0.0009 0.0009 .27546 1.0000 | |

11 -0.0164 -0.0164 1.8437 0.9990 | |

12 0.0474 0.0475 14.972 0.2430 | |

13 0.0616 0.0616 37.104 0.0004 | |

14 -0.0387 -0.0389 45.86 0.0000 | |

15 0.0222 0.0229 48.743 0.0000 | |

16 0.0043 0.0046 48.85 0.0000 | |

17 0.0358 0.0359 56.338 0.0000 | |

18 0.0084 0.0092 56.754 0.0000 | |

19 0.0144 0.0143 57.964 0.0000 | |

20 0.0294 0.0297 63.001 0.0000 | |

Some autocorrelation remains at longer lags: the Q test turns significant from lag 13 onward, which may point to a long-memory model (ARFIMA?).



GARCH

Stata has several alternatives for modeling conditional heteroscedasticity (see help arch).

The general syntax is

arch depvar [indepvars] [if] [in] [weight] [, options]


[Figure: IBM daily excess returns, Return (%) against Date, Jan 1990 to Jan 2013. Source: finance.yahoo.com]



IBM return FF-model with t-TGARCH(1,1) errors

arch rete mkt_rf smb hml mon mkt_x_mon, arch(1) tarch(1) garch(1) distribution(t) vce(robust)


Distribution: t Wald chi2(5) = 4304.99

Log pseudolikelihood = -9146.636 Prob > chi2 = 0.0000

------------------------------------------------------------------------------

| Semirobust

rete | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

rete |

mkt_rf | 0.867 0.014 59.88 0.000 0.838 0.895

smb | -0.235 0.027 -8.84 0.000 -0.287 -0.183

hml | -0.412 0.031 -13.44 0.000 -0.472 -0.352

mon | 0.119 0.031 3.81 0.000 0.058 0.181

mkt_x_mon | -0.048 0.035 -1.36 0.173 -0.116 0.021

_cons | -0.016 0.013 -1.22 0.223 -0.042 0.010

-------------+----------------------------------------------------------------

ARCH |

arch L1. | 0.034 0.006 5.60 0.000 0.022 0.046

tarch L1. | -0.017 0.005 -3.48 0.000 -0.026 -0.007

garch L1. | 0.975 0.004 233.52 0.000 0.967 0.983

_cons | 0.002 0.001 1.43 0.154 -0.001 0.004

-------------+----------------------------------------------------------------

/lndfm2 | 0.619 0.112 5.54 0.000 0.400 0.838

-------------+----------------------------------------------------------------

df | 3.858 0.208 3.492 4.313

------------------------------------------------------------------------------



The results of the regression part are consistent with the earlier results.

The variance equation indicates a strong GARCH effect with asymmetry: the negative tarch coefficient -0.017 implies that negative residual returns have a more pronounced effect on the variance than positive ones (leverage).

The degrees of freedom estimate of the t-distribution is 3.9, indicating much heavier tails than those of the normal distribution.

If the model is correctly specified,

$$z_t = \frac{u_t}{\sigma_t} \sim \text{i.i.d.}(0, 1). \tag{11}$$

Using the estimated residuals $\hat u_t$ and the estimated GARCH conditional standard deviations $\hat\sigma_t$, we can generate estimated values for $z_t$ and check how well the model specification has succeeded.
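To make the asymmetry concrete, the following Python sketch runs a threshold GARCH(1,1) variance recursion and forms the standardized residuals of (11). It assumes the parameterization $\sigma_t^2 = \omega + \alpha u_{t-1}^2 + \gamma u_{t-1}^2 1(u_{t-1} > 0) + \beta\sigma_{t-1}^2$, i.e. that the threshold term switches on for positive innovations. This reading of the tarch term, the function name, the starting variance, and the short residual series are assumptions for illustration only. The coefficient values are the rounded estimates from the table above.

```python
def tgarch_variance(u, omega, alpha, gamma, beta, sigma2_0):
    """Conditional variances of a threshold GARCH(1,1) model.

    sigma2[t] = omega + alpha*u[t-1]**2
                + gamma*u[t-1]**2 * (u[t-1] > 0) + beta*sigma2[t-1]
    The first variance is the supplied starting value sigma2_0.
    """
    sigma2 = [sigma2_0]
    for prev in u[:-1]:
        shock = prev * prev
        threshold = shock if prev > 0 else 0.0
        sigma2.append(omega + alpha * shock + gamma * threshold
                      + beta * sigma2[-1])
    return sigma2


params = dict(omega=0.002, alpha=0.034, gamma=-0.017, beta=0.975, sigma2_0=1.0)

# Next-period variance after a positive and after a negative unit shock.
after_gain = tgarch_variance([1.0, 0.0], **params)[1]
after_loss = tgarch_variance([-1.0, 0.0], **params)[1]

# Standardized residuals z_t = u_t / sigma_t for a toy residual series.
u = [-1.0, 0.5]
sigma2 = tgarch_variance(u, **params)
z = [ut / s2 ** 0.5 for ut, s2 in zip(u, sigma2)]

print(after_gain, after_loss, z)
```

With a negative threshold coefficient, the variance following a loss exceeds the variance following a gain of the same size, which is the leverage effect described above.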



Residual diagnostics

predict uhat, resid // residual series

predict sigma2, variance // conditional variances

gen z = uhat / sqrt(sigma2) // standardized residuals that should mimic i.i.d(0, 1)

quietly: summ z, detail

display "Mean " r(mean) " Std " r(sd)

Mean .01268537 Std 1.0083934

display "Skewness: " r(skewness) " Kurtosis: " r(kurtosis)

Skewness: -.0905711 Kurtosis: 12.001656

gen z2 = z^2 // squared residuals (should not be autocorrelated if the model is OK)

corrgram z2, lags(20)

-1 0 1 -1 0 1

LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]


-------------------------------------------------------------------------------

1 0.0267 0.0267 4.1361 0.0420 | |

2 0.0036 0.0029 4.2113 0.1218 | |

3 0.0067 0.0065 4.4709 0.2149 | |

4 0.0174 0.0170 6.2264 0.1829 | |

5 -0.0062 -0.0071 6.4491 0.2649 | |

10 -0.0204 -0.0201 9.6639 0.4705 | |

20 0.0016 0.0020 13.768 0.8421 | |

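The standardized residuals exhibit high kurtosis (12.0) but essentially no skewness (-0.09). The variance equation seems to capture the conditional heteroscedasticity: apart from a marginal first-lag Q statistic (p = 0.042), the squared standardized residuals show no significant autocorrelation.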
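The moment statistics used above are easy to compute by hand. A minimal Python sketch, assuming the moment-based definitions skewness $= m_3/m_2^{3/2}$ and kurtosis $= m_4/m_2^2$ with $m_r = \frac{1}{n}\sum_i (z_i - \bar z)^r$, so that a normal distribution has kurtosis 3. The toy sample is made up for illustration and is not the IBM data.

```python
def central_moment(x, r):
    """r-th sample central moment with divisor n."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** r for xi in x) / n


def skewness(x):
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def kurtosis(x):
    return central_moment(x, 4) / central_moment(x, 2) ** 2


sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(skewness(sample), kurtosis(sample))
```

A symmetric sample has zero skewness, and a kurtosis far above 3, such as the 12.0 reported for z, indicates much heavier tails than the normal distribution.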


[Figure: Standardized residuals z against date, Jan 1990 to Jan 2013]


[Figure: Distribution of standardized residuals (density of z, horizontal axis from -6 to 6)]



Residual diagnostics continued

corrgram z, lags(20)

-1 0 1 -1 0 1

LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]


-------------------------------------------------------------------------------

1 0.0285 0.0285 4.7113 0.0300 | |

2 -0.0155 -0.0163 6.1109 0.0471 | |

3 0.0187 0.0196 8.138 0.0432 | |

4 0.0057 0.0044 8.3285 0.0803 | |

5 -0.0048 -0.0045 8.4623 0.1325 | |

6 -0.0005 -0.0004 8.4636 0.2061 | |

7 0.0144 0.0142 9.6743 0.2078 | |

8 0.0010 0.0003 9.6796 0.2882 | |

9 -0.0048 -0.0043 9.8118 0.3659 | |

10 0.0211 0.0210 12.4 0.2592 | |

11 0.0026 0.0010 12.438 0.3316 | |

12 -0.0028 -0.0019 12.482 0.4078 | |

13 0.0033 0.0028 12.546 0.4834 | |

14 0.0002 -0.0006 12.547 0.5625 | |

15 0.0016 0.0019 12.561 0.6361 | |

20 0.0085 0.0082 14.779 0.7889 | |

Apart from marginal Q statistics at the first three lags (p-values between 0.03 and 0.05), there seems to be no residual autocorrelation either.

Thus, in terms of these residual diagnostics, the model fit is OK.



Additional explanatory variables (covariates) can be included in the variance equation via the het(varlist) option.

For example, het(x w) together with the options arch(1) garch(1) corresponds to the variance model

$$\sigma_t^2 = \exp(\lambda_0 + \lambda_1 x_t + \lambda_2 w_t) + \alpha u_{t-1}^2 + \beta\sigma_{t-1}^2. \tag{12}$$

This implies a multiplicative heteroscedasticity specification with respect to the covariates.

Note: In the EGARCH specification with options het(x w) earch(1) egarch(1), the variance model is

$$\log(\sigma_t^2) = \lambda_0 + \lambda_1 x_t + \lambda_2 w_t + \underbrace{\alpha z_{t-1}}_{\text{leverage}} + \underbrace{\gamma\left(|z_{t-1}| - \sqrt{2/\pi}\right)}_{\text{symm effect}} + \delta\log(\sigma_{t-1}^2). \tag{13}$$
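A one-step evaluation of the EGARCH equation (13) shows how the leverage and symmetric terms combine. The Python sketch below uses illustrative coefficient values and a hypothetical function name; it evaluates the formula only and is not Stata's estimation code.

```python
import math


def egarch_log_variance(z_prev, log_sigma2_prev, lam0, alpha, gamma, delta,
                        het_terms=0.0):
    """One-step EGARCH(1,1) update of log(sigma_t^2) following (13).

    het_terms stands for lambda_1*x_t + lambda_2*w_t (zero without covariates).
    """
    leverage = alpha * z_prev
    symmetric = gamma * (abs(z_prev) - math.sqrt(2.0 / math.pi))
    return lam0 + het_terms + leverage + symmetric + delta * log_sigma2_prev


# Illustrative values only (assumed for this example).
coef = dict(lam0=0.008, alpha=-0.025, gamma=0.083, delta=0.998)

after_loss = egarch_log_variance(-2.0, 0.0, **coef)   # z_{t-1} = -2
after_gain = egarch_log_variance(2.0, 0.0, **coef)    # z_{t-1} = +2

print(after_loss - after_gain)  # equals -4*alpha, here about 0.1
```

The symmetric term is the same for both shocks, so the whole difference in log variance comes from the leverage term: with a negative $\alpha$, a negative shock raises next-period variance more than a positive shock of equal size.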



It turns out that the TGARCH (and even GARCH) specification fails to converge, while EGARCH works fine.

Monday effect in volatility (EGARCH specification)

arch rete mkt_rf smb hml mon mkt_x_mon, het(mon) earch(1) egarch(1) distribution(t) vce(robust)


Distribution: t Wald chi2(5) = 5872.20

Log pseudolikelihood = -9112.067 Prob > chi2 = 0.0000

------------------------------------------------------------------------------

| Semirobust

rete | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

rete |

mkt_rf | 0.862 0.013 68.48 0.000 0.838 0.887

smb | -0.234 0.027 -8.55 0.000 -0.288 -0.181

hml | -0.410 0.026 -15.84 0.000 -0.461 -0.359

mon | 0.107 0.031 3.48 0.001 0.047 0.167

mkt_x_mon | -0.047 0.035 -1.32 0.187 -0.116 0.023

_cons | -0.016 0.016 -0.99 0.324 -0.047 0.015

-------------+----------------------------------------------------------------



Estimation results continued

-------------+----------------------------------------------------------------

HET |

mon | 0.002 0.053 0.04 0.965 -0.101 0.106

_cons | 0.008 0.011 0.74 0.459 -0.013 0.028

-------------+----------------------------------------------------------------

ARCH |

earch |

L1. | -0.025 0.006 -4.42 0.000 -0.036 -0.014

|

earch_a |

L1. | 0.083 0.015 5.61 0.000 0.054 0.112

|

egarch |

L1. | 0.998 0.001 784.10 0.000 0.995 1.000

-------------+----------------------------------------------------------------

/lndfm2 | 0.696 0.110 6.32 0.000 0.480 0.913

-------------+----------------------------------------------------------------

df | 4.007 0.221 3.617 4.491

------------------------------------------------------------------------------

In the EGARCH model (13), $\hat\lambda_0 = 0.008$, $\hat\lambda_1 = 0.002$ (Monday effect), $\hat\alpha = -0.025$ (leverage), $\hat\gamma = 0.083$, $\hat\delta = 0.998$, and df = 4.007.

Note: /lndfm2 = log(df - 2) is related to Stata's estimation; for technical reasons Stata estimates m = log(df - 2) rather than df directly, because the likelihood is not defined for df ≤ 2.
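The df row can be recovered from /lndfm2 by inverting the transformation, df = 2 + exp(m). The Python sketch below does this for the reported point estimate and confidence limits, and approximates the standard error with the delta method, se(df) ≈ exp(m)·se(m). That Stata obtains the df row in exactly this way is an assumption here, and the inputs are the rounded values from the table, so agreement is only up to rounding.

```python
import math


def df_from_lndfm2(m):
    """Invert m = log(df - 2)."""
    return 2.0 + math.exp(m)


m_hat, m_se = 0.696, 0.110      # /lndfm2 estimate and standard error
m_low, m_high = 0.480, 0.913    # 95% confidence limits for /lndfm2

df_hat = df_from_lndfm2(m_hat)
df_se = math.exp(m_hat) * m_se  # delta-method approximation
df_low, df_high = df_from_lndfm2(m_low), df_from_lndfm2(m_high)

print(round(df_hat, 3), round(df_se, 3), round(df_low, 3), round(df_high, 3))
```

Transforming the confidence limits rather than adding and subtracting a multiple of the standard error keeps the interval above 2, which is why the reported interval for df is not symmetric around 4.007.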


The estimated coefficients can be referred to as [rete]_b[smb], [HET]_b[mon], [ARCH]_b[L1.earch].

If we had EGARCH(2, 1), then the coefficient of $z_{t-2}$ would be referred to as [ARCH]_b[L2.earch].

Note: In most cases we can refer to a coefficient more briefly, like [ARCH]L2.earch.

For example

display [ARCH]l.earch_a

.08328626

Recall that L. = L1., and with this operator lower case and upper case make no difference (although Stata is generally case sensitive).

References to individual coefficients are applied, for example, in testing imposed restrictions on the coefficients with the postestimation command test.



Newey-West autocorrelation and heteroscedasticity robust standard errors

Consider the general regression model

$$y_t = x_t'\beta + u_t, \tag{14}$$

$t = 1, \ldots, T$, where $x_t = (1, x_{1t}, x_{2t}, \ldots, x_{pt})'$ is the tth row of the $T \times (p + 1)$ data matrix $X$ of explanatory variables with ones in the first column. The OLS estimator is

$$\hat\beta = (X'X)^{-1}X'y, \tag{15}$$

which can be written as

$$\hat\beta = \beta + (X'X)^{-1}X'u. \tag{16}$$

Note that

$$X'X = \sum_{t=1}^{T} x_t x_t' \tag{17}$$

and

$$X'u = \sum_{t=1}^{T} x_t u_t. \tag{18}$$




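Given that $\mathrm{cov}[x_t, u_t] = 0$ and that the observations are "asymptotically independent", the law of large numbers (LLN) implies that

$$\frac{1}{T}X'u = \frac{1}{T}\sum_{t=1}^{T} x_t u_t \to \mathrm{cov}[x_t, u_t] = 0 \tag{19}$$

(in probability) as $T \to \infty$, which, provided that $\frac{1}{T}X'X$ converges to a nonsingular matrix, implies by (16) that

$$\hat\beta \to \beta. \tag{20}$$

I.e., $\hat\beta$ is a consistent estimator of $\beta$ under rather general conditions.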




Then, because samples in finance are typically fairly large:

a) if $u_t$ are independent and homoscedastic, i.e., i.i.d.$(0, \sigma_u^2)$, use OLS standard errors,

b) if $u_t$ are independent but heteroscedastic (i.e., $\mathrm{var}[u_t] = \sigma_t^2$ is varying), use Huber/White-type standard errors,

c) if $u_t$ are heteroscedastic and correlated within clusters and independent between clusters, use cluster-correlation corrected standard errors (option vce(cluster clustvar) in Stata),

d) if $u_t$ are autocorrelated and heteroscedastic, use Newey-West (1987, Econometrica) corrected standard errors.




The general formula for the variance-covariance matrix of the OLS estimator is

$$\mathrm{cov}[\hat\beta] = (X'X)^{-1}X'\Omega X(X'X)^{-1}, \tag{21}$$

where $\Omega = \mathrm{var}[u]$ is the $T \times T$ variance-covariance matrix of the error term. Given an estimator $\hat\Omega$ of $\Omega$, the standard errors of the components $\hat\beta_i$ of $\hat\beta$ are the square roots of the diagonal elements of (21) (with $\Omega$ replaced by the estimator).

In case a) $\Omega = \sigma_u^2 I$ and (21) reduces to $\sigma_u^2(X'X)^{-1}$ with $\sigma_u^2 = \mathrm{var}[u_t]$.

In b) the White standard errors (Stata: regress, vce(robust)) are based on

$$X'\hat\Omega X = X'\hat\Omega_0 X = \frac{T}{T - p - 1}\sum_{t} \hat u_t^2 x_t x_t'. \tag{22}$$




In c), given a clustering variable, Stata's regress, vce(cluster clustvar) computes the robust standard errors.

In d) the Newey-West formulation is

$$X'\hat\Omega X = X'\hat\Omega_0 X + c\sum_{\ell=1}^{m}\left(1 - \frac{\ell}{m + 1}\right)\sum_{t=\ell+1}^{T}\hat u_t\hat u_{t-\ell}\left(x_t x_{t-\ell}' + x_{t-\ell}x_t'\right), \tag{23}$$

where $c = T/(T - p - 1)$ and the maximum lag length $m$ must be given. For $m = 0$, (23) reduces to (22). In Stata these are obtained by

newey depvar indepvars [if] [in] [weight], lag(#) [options]

where the number of lags in lag() must be specified (see regression.do).
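The Bartlett weighting in the Newey-West formula can be illustrated in the simplest possible setting. The following Python sketch treats a regression with a single regressor and no intercept, so that all matrices in (21) to (23) collapse to scalars and $c = T/(T - 1)$. The function name and the four data points are made up for illustration, and this is not Stata's newey implementation.

```python
def newey_west_se(x, y, m):
    """Slope estimate and Newey-West standard error for y_t = b*x_t + u_t.

    Scalar version of (21)-(23): one regressor, no intercept, max lag m.
    """
    T = len(x)
    sxx = sum(xi * xi for xi in x)                     # X'X
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx     # OLS slope
    g = [xi * (yi - b * xi) for xi, yi in zip(x, y)]   # x_t times residual_t
    c = T / (T - 1)                                    # one estimated parameter
    meat = sum(gt * gt for gt in g)                    # lag-0 (White) term
    for lag in range(1, m + 1):
        weight = 1.0 - lag / (m + 1)                   # Bartlett weight
        meat += weight * sum(2.0 * g[t] * g[t - lag] for t in range(lag, T))
    variance = c * meat / sxx ** 2                     # sandwich formula
    return b, variance ** 0.5


x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0]

b, se_white = newey_west_se(x, y, m=0)   # m = 0 gives the White case
_, se_nw1 = newey_west_se(x, y, m=1)     # one lag with weight 1/2

print(b, se_white, se_nw1)
```

In this toy sample the products $x_t\hat u_t$ alternate in sign, so the lag-1 term is negative and the Newey-West standard error is smaller than the White one; with positively autocorrelated errors the correction usually goes the other way.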



Accessing Stata results

We close this section by reviewing how Stata commands save the results they report.

Commands save results in one of three places:

r-class commands, such as summarize, save results in r(); most commands are r-class.

e-class commands, such as regress, save results in e(); e-class commands are Stata's model estimation commands.

s-class commands (no good examples) save results in s(); these are rarely used (mainly by programmers).

r-class and e-class are not really different.

Results saved in r() and e() can be seen with return list and ereturn list.