Empirical Asset Pricing
Seppo Pynnonen
Department of Mathematics and Statistics, University of Vaasa, Finland
Texas A&M University, May – June, 2013
As of May 29, 2013
Part IV
Time Series
1 Time Series
Time series data
Autocorrelations
ARIMA models
GARCH
Newey-West autocorrelation and heteroscedasticity robust standard errors
Accessing Stata results
Time series data
Stata has extensive time series facilities.
To use the time series facilities, the data must first be declared to Stata as time series.
tsset timevar [, options]
declares the data as time series for Stata with time variable (time index) timevar, where timevar must be an integer-valued variable with unique values.
See help tsset
Note: Daily data typically excludes holidays, while Stata operates on calendar dates. This leaves gaps in the data, which generate missing values in time series operations.
Panel data sets can be declared by
tsset panelvar timevar [, tsoptions]
or
xtset panelvar timevar [, tsoptions]
If there is no time variable
xtset panelvar
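A minimal sketch of why the panel variable matters: once the panel is declared, L. and D. work within each panel and do not carry values across panel boundaries. The tiny data set and the names id, year, x, and dx are made up for illustration.

```stata
* Hypothetical two-unit panel, three years each (illustrative values only)
clear
input id year x
1 2001 10
1 2002 12
1 2003 15
2 2001  7
2 2002  9
2 2003  8
end

xtset id year          // declare panel variable id and time variable year
gen dx = D.x           // first difference, computed within each id
list id year x dx, sepby(id) noobs
```

The first observation of each id should have a missing dx, because there is no earlier observation within that panel.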
Example: IBM data
tsset date // declare data to be time series
// using time variable date
time variable: date, 02jan1990 to 31jan2013, but with gaps
delta: 1 day
list date dow close L.close D.close in 1/7, clean
                                   L.          D.
         date   dow    close    close       close
1.  02jan1990   Tue       98        .           .
2.  03jan1990   Wed    98.87       98    .8700027
3.  04jan1990   Thu      100    98.87    1.129997
4.  05jan1990   Fri    99.75      100        -.25
5.  08jan1990   Mon   100.37        .           .
6.  09jan1990   Tue    99.37   100.37          -1
7.  10jan1990   Wed       99    99.37   -.3700027
L. and D. generate the lagged and the differenced series.
As seen above, missing holidays generate missing values in the lagged and differenced series.
This can be circumvented by generating a dummy time variable according to the observation number.
Example: Dummy time variable
tsset, clear // clear existing time-series setting (not strictly necessary)
gen tvar = _n // generate dummy time variable tvar
tsset tvar // declare data to be time series using tvar
time variable: tvar, 1 to 5818
delta: 1 unit
list date tvar dow close L.close D.close in 1/7, clean
                                          L.          D.
         date   tvar   dow    close    close       close
1.  02jan1990      1   Tue       98        .           .
2.  03jan1990      2   Wed    98.87       98    .8700027
3.  04jan1990      3   Thu      100    98.87    1.129997
4.  05jan1990      4   Fri    99.75      100        -.25
5.  08jan1990      5   Mon   100.37    99.75    .6200027
6.  09jan1990      6   Tue    99.37   100.37          -1
7.  10jan1990      7   Wed       99    99.37   -.3700027
Time series operators
Operator   Meaning
L.         lag $x_{t-1}$
L2.        2-period lag $x_{t-2}$
...
F.         lead $x_{t+1}$
F2.        2-period lead $x_{t+2}$
...
D.         difference $x_t - x_{t-1}$
D2.        difference of difference $x_t - x_{t-1} - (x_{t-1} - x_{t-2}) = x_t - 2x_{t-1} + x_{t-2}$
...
S.         seasonal difference $x_t - x_{t-1}$
S2.        lag-2 (seasonal) difference $x_t - x_{t-2}$
...
L3. = LLL. = L2L., and L(1 3 4). means the same as applying L., L3., and L4. separately.
Note: S1. = D1. but S2. ≠ D2.
Examples
list date close L.close L(2 3).close D2.close S2.close LF.close in 1/6, clean noobs
                         L.      L2.      L3.         D2.         S2.
     date    close    close    close    close       close       close    close
02jan1990       98        .        .        .           .           .       98
03jan1990    98.87       98        .        .           .           .    98.87
04jan1990      100    98.87       98        .    .2599945           2      100
05jan1990    99.75      100    98.87       98   -1.379997    .8799973    99.75
08jan1990   100.37    99.75      100    98.87    .8700027    .3700027   100.37
09jan1990    99.37   100.37    99.75      100   -1.620003   -.3799973    99.37
list date smb hml L(1/3).(smb hml) in 1/6, clean noobs
                               L.     L2.     L3.      L.     L2.     L3.
     date     smb     hml     smb     smb     smb     hml     hml     hml
02jan1990    -.68    -.04       .       .       .       .       .       .
03jan1990     .72    -.29    -.68       .       .    -.04       .       .
04jan1990     .42    -.24     .72    -.68       .    -.29    -.04       .
05jan1990     .75    -.24     .42     .72    -.68    -.24    -.29    -.04
08jan1990    -.41    -.23     .75     .42     .72    -.24    -.24    -.29
09jan1990     .86     .07    -.41     .75     .42    -.23    -.24    -.24
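The note that S1. = D1. but S2. ≠ D2. can be checked on a small artificial series. This is only a sketch; the variable names t, x, d1, s1, d2, and s2 and the quadratic series are assumptions chosen for illustration.

```stata
* Small artificial series to compare the D. and S. operators
clear
set obs 6
gen t = _n
gen x = t^2            // 1, 4, 9, 16, 25, 36
tsset t

gen d1 = D.x           // x_t - x_{t-1}
gen s1 = S.x           // same as D.x
gen d2 = D2.x          // x_t - 2x_{t-1} + x_{t-2}
gen s2 = S2.x          // x_t - x_{t-2}
list t x d1 s1 d2 s2, clean noobs
```

For this series D2.x is constant (the second difference of a quadratic), while S2.x keeps growing with t, so the two operators clearly differ.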
Autocorrelations
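Autocorrelation function of a stationary time series $y_t$:
\[ \rho_k = \frac{\gamma_k}{\gamma_0}, \tag{8} \]
where
\[ \gamma_k = \mathrm{cov}[y_t, y_{t-k}] \tag{9} \]
is the $k$th order autocovariance of $y_t$, $k = 0, 1, \ldots$ ($\gamma_0 = \mathrm{cov}[y_t, y_t] = \mathrm{var}[y_t]$ is the variance).
Estimator of $\gamma_k$:
\[ \hat\gamma_k = \frac{1}{T}\sum_{t=k+1}^{T} (y_{t-k} - \bar y)(y_t - \bar y), \tag{10} \]
where
\[ \bar y = \frac{1}{T}\sum_{t=1}^{T} y_t. \]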
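As a rough check of (10), the first-order sample autocorrelation $\hat\gamma_1/\hat\gamma_0$ can be computed by hand and compared with what corrgram stores in r(ac1). The simulated AR(1) series and all names below (eps, y, dev, cross1, dev2, rho1, ac1) are assumptions for illustration only.

```stata
* Simulated AR(1) series (illustrative; not the course data)
clear
set seed 12345
set obs 500
gen t = _n
tsset t
gen double eps = rnormal()
gen double y = eps
replace y = 0.5*y[_n-1] + eps in 2/l     // y_t = 0.5 y_{t-1} + eps_t

quietly summarize y
scalar ybar = r(mean)
scalar T = r(N)
gen double dev = y - ybar
gen double cross1 = dev * L.dev          // (y_t - ybar)(y_{t-1} - ybar), missing for t = 1
gen double dev2 = dev^2
quietly summarize cross1
scalar gamma1 = r(sum) / T               // equation (10) with k = 1
quietly summarize dev2
scalar gamma0 = r(sum) / T               // equation (10) with k = 0
scalar rho1 = gamma1 / gamma0

quietly corrgram y, lags(1)
scalar ac1 = r(ac1)
display "manual rho1 = " rho1 "   corrgram ac1 = " ac1
```

The two numbers should agree up to rounding, since the common divisor T cancels in the ratio.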
ARIMA models
Stata has a large number of univariate and multivariate time series models and other time series facilities, including smoothing, extrapolation, and interpolation methods.
ARMA(p, q)
arima depvar [indepvars], ar(numlist) ma(numlist)
ARIMA(p, d, q)
arima depvar, arima(#p,#d,#q)
ARIMA(p, d, q) × (P, D, Q)s multiplicative seasonal
arima depvar, arima(#p,#d,#q) sarima(#P,#D,#Q,#s)
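The two ways of specifying the orders can be compared on simulated data. The sketch below assumes an AR(1) process with coefficient 0.6; the names y, ll_ar, and ll_arima are hypothetical.

```stata
* Simulated AR(1) series; all names and parameter values are illustrative
clear
set seed 2013
set obs 500
gen t = _n
tsset t
gen double y = rnormal()
replace y = 0.6*y[_n-1] + rnormal() in 2/l   // y_t = 0.6 y_{t-1} + e_t

arima y, ar(1)                 // AR(1) specified via ar()
scalar ll_ar = e(ll)
arima y, arima(1,0,0)          // the same model specified via arima(p,d,q)
scalar ll_arima = e(ll)
display "log likelihoods: " ll_ar "   " ll_arima
```

Both calls describe the same model, so the log likelihoods and coefficient estimates should coincide.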
Autocorrelations, partial autocorrelations, and the portmanteau (Q) test
corrgram varname [if] [in] [, corrgram options]
Graph autocorrelations with confidence intervals
ac varname [if] [in] [, ac options]
Graph partial autocorrelations with confidence intervals
pac varname [if] [in] [, pac options]
See help corrgram
HML return autocorrelations
corrgram hml, lags(10)
-1 0 1 -1 0 1
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]
-------------------------------------------------------------------------------
1 0.0913 0.0913 48.509 0.0000 | |
2 -0.0028 -0.0112 48.553 0.0000 | |
3 -0.0005 0.0008 48.555 0.0000 | |
4 0.0131 0.0131 49.552 0.0000 | |
5 -0.0323 -0.0351 55.64 0.0000 | |
6 -0.0281 -0.0220 60.254 0.0000 | |
7 0.0255 0.0300 64.039 0.0000 | |
8 0.0157 0.0101 65.472 0.0000 | |
9 0.0255 0.0247 69.264 0.0000 | |
10 0.0476 0.0435 82.492 0.0000 | |
There seems to be some autocorrelation in the HML returns.
ac hml, lags(30) ylabels(-0.15(0.05)0.15)
[Figure: Autocorrelations of hml against Lag (0 to 30), y-axis from −0.15 to 0.15, with Bartlett's formula for MA(q) 95% confidence bands]
pac hml, lags(30) ylabels(-0.15(0.05)0.15)
[Figure: Partial autocorrelations of hml against Lag (0 to 30), y-axis from −0.15 to 0.15, with 95% confidence bands, se = 1/sqrt(n)]
HML return AR(10)-model
arima hml, ar(1/10)
Sample: 1 - 5818 Number of obs = 5818
Wald chi2(10) = 276.20
Log likelihood = -5180.718 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| OPG
hml | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hml |
_cons | 0.013 0.009 1.42 0.154 -0.005 0.031
-------------+----------------------------------------------------------------
ARMA |
ar |
L1. | 0.091 0.007 12.89 0.000 0.077 0.105
L2. | -0.011 0.007 -1.49 0.137 -0.025 0.003
L3. | -0.002 0.008 -0.22 0.826 -0.017 0.013
L4. | 0.018 0.008 2.21 0.027 0.002 0.034
L5. | -0.032 0.008 -4.19 0.000 -0.046 -0.017
L6. | -0.025 0.007 -3.39 0.001 -0.040 -0.011
L7. | 0.029 0.008 3.84 0.000 0.014 0.044
L8. | 0.008 0.007 1.12 0.264 -0.006 0.023
L9. | 0.021 0.008 2.62 0.009 0.005 0.036
L10. | 0.043 0.008 5.53 0.000 0.028 0.059
-------------+----------------------------------------------------------------
/sigma | 0.589 0.003 217.06 0.000 0.584 0.595
------------------------------------------------------------------------------
Residual autocorrelations
predict hmlres, res // postestimation command saves residuals to hmlres
corrgram hmlres, lags(20)
-1 0 1 -1 0 1
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]
-------------------------------------------------------------------------------
1 0.0006 0.0006 .00198 0.9645 | |
2 -0.0020 -0.0020 .02608 0.9870 | |
3 -0.0036 -0.0036 .10034 0.9918 | |
4 0.0004 0.0004 .10112 0.9988 | |
5 -0.0026 -0.0027 .14201 0.9996 | |
6 -0.0014 -0.0014 .15368 0.9999 | |
7 0.0026 0.0026 .19402 1.0000 | |
8 -0.0019 -0.0019 .21409 1.0000 | |
9 -0.0031 -0.0031 .27043 1.0000 | |
10 0.0009 0.0009 .27546 1.0000 | |
11 -0.0164 -0.0164 1.8437 0.9990 | |
12 0.0474 0.0475 14.972 0.2430 | |
13 0.0616 0.0616 37.104 0.0004 | |
14 -0.0387 -0.0389 45.86 0.0000 | |
15 0.0222 0.0229 48.743 0.0000 | |
16 0.0043 0.0046 48.85 0.0000 | |
17 0.0358 0.0359 56.338 0.0000 | |
18 0.0084 0.0092 56.754 0.0000 | |
19 0.0144 0.0143 57.964 0.0000 | |
20 0.0294 0.0297 63.001 0.0000 | |
Some autocorrelation remains at longer lags (ARFIMA?).
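The portmanteau statistic can also be requested directly for one chosen lag length with wntestq. A minimal sketch, using simulated white noise as a stand-in for a residual series (the name res and the lag choice are assumptions):

```stata
* Simulated white noise standing in for a residual series (illustrative only)
clear
set seed 42
set obs 1000
gen t = _n
tsset t
gen double res = rnormal()

wntestq res, lags(20)          // portmanteau (Q) test using 20 lags
display "Q = " r(stat) "   df = " r(df) "   p = " r(p)
```

For a series that really is white noise, the p-value should usually be large; small p-values, as at lags 13 to 20 above, point to remaining autocorrelation.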
GARCH
Stata has several conditional heteroscedasticity modeling alternatives (see help arch).
The general syntax is
arch depvar [indepvars] [if] [in] [weight] [, options]
[Figure: IBM daily excess returns, Return (%) against Date, Jan 1990 to Jan 2013. Source: finance.yahoo.com]
IBM return FF-model with t-TGARCH(1,1) errors
arch rete mkt_rf smb hml mon mkt_x_mon, arch(1) tarch(1) garch(1) distribution(t) vce(robust)
Sample: 2 - 5818 Number of obs = 5817
Distribution: t Wald chi2(5) = 4304.99
Log pseudolikelihood = -9146.636 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| Semirobust
rete | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rete |
mkt_rf | 0.867 0.014 59.88 0.000 0.838 0.895
smb | -0.235 0.027 -8.84 0.000 -0.287 -0.183
hml | -0.412 0.031 -13.44 0.000 -0.472 -0.352
mon | 0.119 0.031 3.81 0.000 0.058 0.181
mkt_x_mon | -0.048 0.035 -1.36 0.173 -0.116 0.021
_cons | -0.016 0.013 -1.22 0.223 -0.042 0.010
-------------+----------------------------------------------------------------
ARCH |
arch L1. | 0.034 0.006 5.60 0.000 0.022 0.046
tarch L1. | -0.017 0.005 -3.48 0.000 -0.026 -0.007
garch L1. | 0.975 0.004 233.52 0.000 0.967 0.983
_cons | 0.002 0.001 1.43 0.154 -0.001 0.004
-------------+----------------------------------------------------------------
/lndfm2 | 0.619 0.112 5.54 0.000 0.400 0.838
-------------+----------------------------------------------------------------
df | 3.858 0.208 3.492 4.313
------------------------------------------------------------------------------
The results of the regression part are consistent with the earlier results.
The variance equation indicates a strong GARCH effect with asymmetry: the negative tarch coefficient −0.017 implies that negative residual returns have a more pronounced effect on the variance than positive ones (leverage).
The degrees of freedom estimate of the t-distribution is 3.9, indicating much heavier tails than the normal distribution.
If the model is correctly specified,
\[ z_t = \frac{u_t}{\sigma_t} \sim \text{i.i.d.}(0, 1). \tag{11} \]
Using the estimated residuals $\hat u_t$ and the estimated GARCH conditional standard deviations $\hat\sigma_t$ we can generate estimated values for $z_t$ and check how well we have succeeded in the model specification.
Residual diagnostics
predict uhat, resid // residual series
predict sigma2, variance // conditional variances
gen z = uhat / sqrt(sigma2) // standardized residuals that should mimic i.i.d(0, 1)
quietly: summ z, detail
display "Mean " r(mean) " Std " r(sd)
Mean .01268537 Std 1.0083934
display "Skewness: " r(skewness) " Kurtosis: " r(kurtosis)
Skewness: -.0905711 Kurtosis: 12.001656
gen z2 = z^2 // squared residuals (should not be autocorrelated if the model is OK)
corrgram z2, lags(20)
-1 0 1 -1 0 1
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]
-------------------------------------------------------------------------------
1 0.0267 0.0267 4.1361 0.0420 | |
2 0.0036 0.0029 4.2113 0.1218 | |
3 0.0067 0.0065 4.4709 0.2149 | |
4 0.0174 0.0170 6.2264 0.1829 | |
5 -0.0062 -0.0071 6.4491 0.2649 | |
10 -0.0204 -0.0201 9.6639 0.4705 | |
20 0.0016 0.0020 13.768 0.8421 | |
The standardized residuals exhibit high kurtosis (12.0) but no skewness. The variance equation seems to capture the conditional heteroscedasticity.
[Figure: Standardized residuals z against date, Jan 1990 to Jan 2013]
[Figure: Distribution of standardized residuals, Density against z]
Residual diagnostics continued
corrgram z, lags(20)
-1 0 1 -1 0 1
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]
-------------------------------------------------------------------------------
1 0.0285 0.0285 4.7113 0.0300 | |
2 -0.0155 -0.0163 6.1109 0.0471 | |
3 0.0187 0.0196 8.138 0.0432 | |
4 0.0057 0.0044 8.3285 0.0803 | |
5 -0.0048 -0.0045 8.4623 0.1325 | |
6 -0.0005 -0.0004 8.4636 0.2061 | |
7 0.0144 0.0142 9.6743 0.2078 | |
8 0.0010 0.0003 9.6796 0.2882 | |
9 -0.0048 -0.0043 9.8118 0.3659 | |
10 0.0211 0.0210 12.4 0.2592 | |
11 0.0026 0.0010 12.438 0.3316 | |
12 -0.0028 -0.0019 12.482 0.4078 | |
13 0.0033 0.0028 12.546 0.4834 | |
14 0.0002 -0.0006 12.547 0.5625 | |
15 0.0016 0.0019 12.561 0.6361 | |
20 0.0085 0.0082 14.779 0.7889 | |
There also seems to be no residual autocorrelation.
Thus in terms of these residual diagnostics the model fit is OK.
Additional explanatory variables (covariates) can be included in the variance equation via the het(varlist) option.
For example, het(x w) together with the arch(1) garch(1) options corresponds to the variance model
\[ \sigma_t^2 = \exp(\lambda_0 + \lambda_1 x_t + \lambda_2 w_t) + \alpha u_{t-1}^2 + \beta \sigma_{t-1}^2. \tag{12} \]
This implies a multiplicative heteroscedasticity specification with respect to the covariates.
Note: In the EGARCH specification with options het(x w) earch(1) egarch(1), the variance model is
\[ \log(\sigma_t^2) = \lambda_0 + \lambda_1 x_t + \lambda_2 w_t + \underbrace{\alpha z_{t-1}}_{\text{leverage}} + \underbrace{\gamma\left(|z_{t-1}| - \sqrt{2/\pi}\right)}_{\text{symm effect}} + \delta \log(\sigma_{t-1}^2). \tag{13} \]
It turns out that the TGARCH (and even the GARCH) specification fails to converge, while EGARCH works fine.
Monday effect in volatility (EGARCH specification)
arch rete mkt_rf smb hml mon mkt_x_mon, het(mon) earch(1) egarch(1) distribution(t) vce(robust)
Sample: 2 - 5818 Number of obs = 5817
Distribution: t Wald chi2(5) = 5872.20
Log pseudolikelihood = -9112.067 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| Semirobust
rete | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rete |
mkt_rf | 0.862 0.013 68.48 0.000 0.838 0.887
smb | -0.234 0.027 -8.55 0.000 -0.288 -0.181
hml | -0.410 0.026 -15.84 0.000 -0.461 -0.359
mon | 0.107 0.031 3.48 0.001 0.047 0.167
mkt_x_mon | -0.047 0.035 -1.32 0.187 -0.116 0.023
_cons | -0.016 0.016 -0.99 0.324 -0.047 0.015
-------------+----------------------------------------------------------------
Estimation results continued
-------------+----------------------------------------------------------------
HET |
mon | 0.002 0.053 0.04 0.965 -0.101 0.106
_cons | 0.008 0.011 0.74 0.459 -0.013 0.028
-------------+----------------------------------------------------------------
ARCH |
earch |
L1. | -0.025 0.006 -4.42 0.000 -0.036 -0.014
|
earch_a |
L1. | 0.083 0.015 5.61 0.000 0.054 0.112
|
egarch |
L1. | 0.998 0.001 784.10 0.000 0.995 1.000
-------------+----------------------------------------------------------------
/lndfm2 | 0.696 0.110 6.32 0.000 0.480 0.913
-------------+----------------------------------------------------------------
df | 4.007 0.221 3.617 4.491
------------------------------------------------------------------------------
In the EGARCH model (13), $\lambda_0 = .008$, $\lambda_1 = .002$ (Monday effect), $\alpha = -.025$ (leverage), $\gamma = .083$, $\delta = .998$, and df = 4.007.
Note: /lndfm2 = log(df − 2) is related to Stata's estimation; for a technical reason Stata estimates m = log(df − 2) rather than df directly, because the likelihood is not defined for df ≤ 2.
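The relation df = exp(m) + 2 can be checked against the two output tables. A small sketch using the rounded /lndfm2 values 0.619 (t-TGARCH) and 0.696 (EGARCH) as printed above:

```stata
* Back out df from the reported /lndfm2 estimates (rounded values from the tables)
display "t-TGARCH: df = " exp(0.619) + 2
display "EGARCH:   df = " exp(0.696) + 2
```

The results should be close to the reported df of 3.858 and 4.007; small differences in the third decimal come from using the rounded /lndfm2 values.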
The estimated coefficients can be referred to as [rete]_b[smb], [HET]_b[mon], and [ARCH]_b[L1.earch].
If we had EGARCH(2, 1), then the coefficient of $z_{t-2}$ would be referred to as [ARCH]_b[L2.earch].
Note: In most cases we can refer to a coefficient more briefly, like [ARCH]L2.earch.
For example
display [ARCH]l.earch_a
.08328626
Recall that L. = L1., and that for this operator lower case and upper case make no difference (although Stata is generally case sensitive).
References to individual coefficients are used, for example, in testing imposed restrictions on the coefficients with the postestimation command test.
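A hedged sketch of such a test on simulated data, since the IBM data set is not reproduced here. The GARCH(1,1) parameter values (0.1, 0.1, 0.8) and the names z, h, and u are assumptions for illustration; the restriction tested is that the ARCH and GARCH coefficients sum to one.

```stata
* Simulated GARCH(1,1) errors; parameter values are illustrative assumptions
clear
set seed 101
set obs 2000
gen t = _n
tsset t
gen double z = rnormal()
gen double h = 1 in 1
* h_t = 0.1 + 0.1 u_{t-1}^2 + 0.8 h_{t-1}, using u_{t-1}^2 = h_{t-1} z_{t-1}^2
replace h = 0.1 + (0.1*z[_n-1]^2 + 0.8)*h[_n-1] in 2/l
gen double u = sqrt(h)*z

arch u, arch(1) garch(1)
display [ARCH]_b[L1.arch]                  // full reference to the ARCH coefficient
display [ARCH]L1.garch                     // shorter reference to the GARCH coefficient
test [ARCH]L1.arch + [ARCH]L1.garch = 1    // Wald test of the restriction
```

test leaves its results in r(), for example r(chi2) and r(p), so they can be used in later commands.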
Newey-West autocorrelation and heteroscedasticity robust standard errors
Consider the general regression model
\[ y_t = x_t'\beta + u_t, \tag{14} \]
$t = 1, \ldots, T$, where $x_t = (1, x_1, x_2, \ldots, x_p)_t'$ is the $t$th row of the $T \times (p+1)$ data matrix $X$ of explanatory variables with ones in the first column. The OLS estimator is
\[ \hat\beta = (X'X)^{-1}X'y, \tag{15} \]
which can be written as
\[ \hat\beta = \beta + (X'X)^{-1}X'u. \tag{16} \]
Note that
\[ X'X = \sum_{t=1}^{T} x_t x_t' \tag{17} \]
and
\[ X'u = \sum_{t=1}^{T} x_t u_t. \tag{18} \]
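Given that $\mathrm{cov}[x_t, u_t] = 0$ and that the observations are "asymptotically independent", the law of large numbers (LLN) implies that
\[ \frac{1}{T}X'u = \frac{1}{T}\sum_{t=1}^{T} x_t u_t \to \mathrm{cov}[x_t, u_t] = 0 \tag{19} \]
(in probability) as $T \to \infty$, which together with (16), and provided $\frac{1}{T}X'X$ converges to a nonsingular matrix, implies that
\[ \hat\beta \to \beta. \tag{20} \]
That is, $\hat\beta$ is a consistent estimator of $\beta$ under rather general conditions.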
Then, because in finance samples are typically fairly big:
a) if $u_t$ are independent and homoscedastic, i.e., i.i.d.$(0, \sigma_u^2)$, use OLS standard errors,
b) if $u_t$ are independent but heteroscedastic (i.e., $\mathrm{var}[u_t] = \sigma_t^2$ is varying), use Huber/White-type standard errors,
c) if $u_t$ are heteroscedastic and correlated within clusters and independent between clusters, use cluster-correlation corrected standard errors (option vce(cluster clustvar) in Stata),
d) if $u_t$ are autocorrelated and heteroscedastic, use Newey-West (1987, Econometrica) corrected standard errors.
The general formula for the variance-covariance matrix of the OLS estimator is
\[ \mathrm{cov}[\hat\beta] = (X'X)^{-1} X'\Omega X (X'X)^{-1}, \tag{21} \]
where $\Omega = \mathrm{var}[u]$ is the $T \times T$ variance-covariance matrix of the error term. Given an estimator of $\Omega$, the standard errors of the components $\hat\beta_i$ of $\hat\beta$ are the square roots of the diagonal elements of (21) (with $\Omega$ replaced by the estimator).
In case a) $\Omega = \sigma_u^2 I$ and (21) reduces to $\sigma_u^2 (X'X)^{-1}$ with $\sigma_u^2 = \mathrm{var}[u_t]$.
In b) the White standard errors (Stata: regress, vce(robust)) are based on
\[ X'\hat\Omega X = X'\hat\Omega_0 X = \frac{T}{T-p-1} \sum_{t} \hat u_t^2 x_t x_t'. \tag{22} \]
In c), given a clustering variable, Stata's regress, vce(cluster clustvar) computes the robust standard errors.
In d) the Newey-West formulation is
\[ X'\hat\Omega X = X'\hat\Omega_0 X + c \sum_{\ell=1}^{m} \left(1 - \frac{\ell}{m+1}\right) \sum_{t=\ell+1}^{T} \hat u_t \hat u_{t-\ell} \left(x_t x_{t-\ell}' + x_{t-\ell} x_t'\right), \tag{23} \]
where $c = T/(T-p-1)$ and the maximum lag length $m$ must be given. For $m = 0$, (23) reduces to (22). In Stata these are obtained by
newey depvar indepvars [if] [in] [weight], lag(#) [options]
where the number of lags in lag() must be defined (see regression.do).
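The statement that (23) reduces to (22) for m = 0 can be illustrated by comparing newey with lag(0) against regress with vce(robust). The sketch below uses a simulated regression with AR(1) errors; the names x, u, y, se_white, se_nw0, and se_nw5 and the choice lag(5) are assumptions for illustration.

```stata
* Simulated regression with AR(1) errors; names and values are illustrative
clear
set seed 7
set obs 500
gen t = _n
tsset t
gen double x = rnormal()
gen double u = rnormal()
replace u = 0.5*u[_n-1] + rnormal() in 2/l
gen double y = 1 + 2*x + u

regress y x, vce(robust)       // White standard errors, equation (22)
scalar se_white = _se[x]
newey y x, lag(0)              // m = 0 in (23)
scalar se_nw0 = _se[x]
newey y x, lag(5)              // m = 5: allows autocorrelation up to lag 5
scalar se_nw5 = _se[x]
display "White: " se_white "   NW(0): " se_nw0 "   NW(5): " se_nw5
```

The first two standard errors should coincide, while the third generally differs because it also accounts for the autocovariance terms in (23).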
Accessing Stata results
We close this section by reviewing how Stata commands report results and where they save them.
Commands save results in one of three places:
r-class commands, such as summarize, save results in r(); most commands are r-class.
e-class commands, such as regress, save results in e(); e-class commands are Stata's model estimation commands.
s-class commands (no good examples) save results in s(); these are rarely used (mainly by programmers).
r-class and e-class are not really different.
Results saved in r() and e() can be seen with return list and ereturn list.
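A short sketch of how saved results are typically picked up, using the auto data set that ships with Stata (the scalar name mean_price is an assumption for illustration):

```stata
sysuse auto, clear             // example data set shipped with Stata

summarize price                // r-class command
return list                    // show everything saved in r()
scalar mean_price = r(mean)    // copy a result before another command overwrites r()

regress price mpg              // e-class command
ereturn list                   // show everything saved in e()
display "N = " e(N) "   R2 = " e(r2) "   slope = " _b[mpg]
```

Copying r() results into scalars right away is a cautious habit, because the next r-class command replaces them.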