Multivariate Time Series
Notation: I do not use boldface (or anything else) to distinguish vectors from scalars.
Tsay (and many other writers) do.
I denote a multivariate stochastic process in the form “{Xt}”, where, for any t, Xt is a vector of the same order.
We denote the individual elements with two subscripts:
“Xit” denotes the ith element of the multivariate time series at time t.
Warning: I’m not promising I’ll do this!
This is Tsay’s notation, and one of the two obvious notations.
One common introductory-graduate-level textbook on time series is by Shumway and Stoffer, and another is by Brockwell and Davis.
Shumway and Stoffer use “Xti”. (I like this.) Brockwell and Davis use “Xit”.
One further thing on notation: you do not need to use the transpose (as Tsay and many others do) on Xt = (X1t, . . . , Xkt).
It’s a column vector, but you’re not drawing a picture!
1
Marginal Model Components
In a multivariate time series {Xt : t ∈ . . . ,−1,0,1, . . .}, each
univariate series {Xit : t ∈ . . . ,−1,0,1, . . .} is called a marginal
time series.
The model for {Xit : t ∈ . . . ,−1,0,1, . . .} is called a marginal
model component of the model for {Xt : t ∈ . . . ,−1,0,1, . . .}.
This illustrates one of the disadvantages of not having a special
notation for vectors!
2
Stationary Multivariate Time Series
First of all, let’s establish that in time series, “stationary” means
“weakly stationary”.
(The stronger property is called “strictly stationary” or “strongly stationary”.)
If a time series is stationary, then its first and second moments
are time invariant. (This is not the definition.)
For a stationary time series {Xt}, we will generally denote the
first moment as the vector µ:
µ = E(Xt)
and the variance-covariance as Γ0:
Γ0 = V(Xt) = E((Xt − µ)(Xt − µ)T).
(Note on notation: “Γ”, with or without serifs, is often reserved to denote
the gamma function; nevertheless, “Γ” is also used to denote
various other things, as here.)
3
Stationary Multivariate Time Series
If a time series is stationary, then its first and second moments
are time invariant.
Tsay (page 390) states the converse of this.
That is true, if “second moments” means “second auto-moments”
at fixed lags (including the 0 lag).
I looked back to see how clear Tsay has been on that point.
He has not been clear. On page 39 (in the middle), he makes
an argument based on the claim that
the time invariance of the first and second moments and
the finiteness of the autocovariance imply stationarity.
In other places, he indicates that time invariance of the
autocorrelations is also required, but because he does not write as a
mathematician, it is sometimes not clear.
4
Stationary Multivariate Time Series
So let’s be very clear. Analogously to the autocovariance γs,t,
let’s define the cross-autocovariance matrix Γs,t:
Γs,t = E((Xs − µs)(Xt − µt)T).
(“cross-autocovariance matrix” is quite a mouthful, so we’ll just
call it the “cross-covariance matrix”.)
Now, suppose Γs,t is constant if s − t is constant.
(Notice I did not say if |s − t| is constant.)
Under that additional condition (together with the time-invariance
of the first two moments), we say that the multivariate time series
is stationary.
5
Stationary Multivariate Time Series
In the case of stationarity, we can use the notation Γh, which is
consistent with the notation Γ0 used before.
We refer to the h = 0 case as “concurrent” or “contemporaneous”.
The matrix Γ0 is the concurrent cross-covariance matrix.
We now introduce another notational form so that we can easily
refer to the elements of the matrices. We equate Γ(h) with Γh.
Now, we can denote the ijth element of Γ(h) as Γij(h).
(There is an alternative meaning of Γp. We have used it (I think)
to refer to the p×p symmetric matrix of autocovariances of orders
0 through p. It is often seen in the Yule-Walker equations.)
The meaning is made clear in the context.
6
Stationary Multivariate Time Series
Notice that stationarity of the multivariate time series implies
stationarity of the individual univariate time series.
The univariate autocovariance functions are the diagonal elements
of Γh.
We sometimes use the phrase “jointly stationary” to refer to a
stationary multivariate time series. (This excludes the case of a
multivariate time series each of whose components is stationary,
but the cross-covariances are not constant at constant lags.)
7
Cross-Correlation Matrix
The standard deviation of the ith component of a multivariate time series in the standard notation is √Γii(0).
For a k-variate time series, the matrix D = diag(√Γ11(0), . . . , √Γkk(0)) is very convenient.
All variances are assumed to be positive, so D−1 exists, and
D−1ΓhD−1
is the cross-correlation matrix of lag h. (If h = 0, of course, it is the concurrent cross-correlation matrix.)
Tsay denotes it as ρ0 or ρℓ.
I like to use uppercase rho, R0 or Rh.
Of course, either way, we may use an alternative notation for the elements, ρij(ℓ) or Rij(h).
Furthermore, note that in my notation, I may use ρij(h) in place of Rij(h).
8
Properties of the Cross-Covariance and
Cross-Correlation Matrices
Notice that Γ0 is symmetric; it’s an ordinary variance-covariance
matrix.
On the other hand, Γh is not necessarily symmetric; in fact,
ΓhT = Γ−h.
The elements of the matrix Γ(h) have directional meanings, and
these have simple interpretations.
First of all, we need to be clear about what kind of relationships
that covariance or correlation relates to.
Covariance or correlation relates to linear relationships.
For X with a distribution symmetric about 0, the correlation between X and Y = X2 is 0.
9
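This is easy to check numerically; here is a minimal sketch, assuming X is standard normal (symmetric about 0, so E(X3) = 0) with simulated data:

```python
import numpy as np

# For X symmetric about 0, Cov(X, X^2) = E(X^3) = 0, so the sample
# correlation between X and X^2 should be near 0, even though X^2
# is a deterministic function of X.  (Standard normal X is assumed
# just for this illustration.)
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
r = np.corrcoef(x, x ** 2)[0, 1]
# r is negligible; the *linear* correlation misses the dependence
```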
Properties of the Cross-Covariance and
Cross-Correlation Matrices
Consider a given i and j representing the ith and jth marginal component time series.
The direction of time is very important in characterizing the relationships between the ith and jth series.
In the following, which is consistent with the nontechnical language in time series analysis, we will use the term “depend” loosely. (We need the “linear” qualifier.)
If Γij(h) = 0 for all h > 0, then the series Xit does not depend on the past values of the series Xjt.
If Γij(h) ≠ 0 for some h > 0, then the series Xit does depend on the
past values of the series Xjt. In this case we say Xjt leads Xit, or Xjt is a leading indicator of Xit.
If Γij(h) = Γji(h) = 0 for all h > 0, then neither series depends on the past values of the other, and we say the series are uncoupled.
10
The Cross-Covariance Matrix in a Strictly
Stationary Process
Strict stationarity is a restrictive property.
Notice, first of all, that it requires more than just time-invariance
of the first two moments; it requires time-invariance of the whole
distribution.
It also requires stronger time-invariance of auto-properties.
An iid process is obviously strictly stationary, and such a process
is often our choice for examples.
The following process, however, is also strictly stationary:
. . . , X, −X, X, −X, . . .
11
The Cross-Covariance Matrix in a Strictly
Stationary Process and in a Serially
Uncorrelated Process
The cross-covariance matrix alone does not tell us whether the
process is stationary; we also need time-invariance of the first
two moments.
Given stationarity, it is not possible to tell from the cross-covariance
matrix whether or not a process is strictly stationary.
In a serially uncorrelated process, Γh is a hollow matrix (its
diagonal elements are zero) for all h ≠ 0.
12
Sample Cross-Covariance and Cross-Correlation
Matrices
Given a realization of a multivariate time series {xt : t = 1, . . . , n}, where each xt is a k-vector, the sample cross-covariance and
cross-correlation matrices are formed in the obvious ways.
We use the “hat” notation to indicate that these are sample
quantities, and also because they are often used as estimators
of the population quantities.
We also use the notation x̄ to denote ∑t xt/n, and σ̂i to denote √(∑t(xit − x̄i)2/n).
(Note the divisor. It’s not really important, of course.)
13
Sample Cross-Covariance and Cross-Correlation
Matrices
The sample cross-covariance matrix is
Γ̂h = (1/n) ∑t=h+1,...,n (xt − x̄)(xt−h − x̄)T.
(Note the divisor.)
Letting D̂ = diag(σ̂1, . . . , σ̂k), the sample cross-correlation matrix
is
R̂h = D̂−1Γ̂hD̂−1.
14
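These sample quantities can be sketched directly in numpy; the function names here are my own, and the data is simulated just for illustration:

```python
import numpy as np

def sample_cross_cov(x, h):
    """x: (n, k) array; returns the k x k lag-h sample cross-covariance,
    (1/n) sum over t = h+1,...,n of (x_t - xbar)(x_{t-h} - xbar)^T."""
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    return xc[h:].T @ xc[:n - h] / n      # divisor n, per the slide

def sample_cross_corr(x, h):
    """R-hat_h = D-hat^{-1} Gamma-hat_h D-hat^{-1}."""
    d_inv = np.diag(1.0 / np.sqrt(np.diag(sample_cross_cov(x, 0))))
    return d_inv @ sample_cross_cov(x, h) @ d_inv

rng = np.random.default_rng(0)
x = rng.standard_normal((500, 3))          # simulated trivariate series
R0 = sample_cross_corr(x, 0)               # unit diagonal
R1 = sample_cross_corr(x, 1)               # not symmetric in general
```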
Sample Cross-Correlation Matrices
There is a nice simplified notation that Tiao and Box introduced
to indicate leading and lagging indicators as measured by a
sample cross-covariance matrix.
Each cell has an indicator of “significant” positive, “significant”
negative, or “insignificant” sample correlation.
Here, “significant” is defined with respect to twice the asymptotic
standard error of a sample correlation coefficient under the
assumption that the true correlation coefficient is 0 (this is
approximately the 5% critical value).
The comparison value is 2/√n, and the indicators are “+”, “−”,
and “.”; thus, at a specific lag, a correlation matrix for three
component time series may be represented as
. + −
+ + .
. − −
15
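The Tiao-Box summary is simple to produce; a sketch (the function name and the example matrix are made up for illustration; with n = 100 the cutoff 2/√n is 0.2):

```python
import numpy as np

def indicator_matrix(R, n):
    """Mark each entry of a sample cross-correlation matrix R as
    '+', '-', or '.' using the 2/sqrt(n) comparison value."""
    cut = 2.0 / np.sqrt(n)
    return np.where(R > cut, "+", np.where(R < -cut, "-", "."))

R = np.array([[0.0,  0.4, -0.5],
              [0.3,  0.6,  0.1],
              [0.0, -0.4, -0.7]])
for row in indicator_matrix(R, 100):
    print(" ".join(row))
```

This prints the same pattern as the three-series example on the slide.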
Multivariate Portmanteau Tests
Recall the test statistic for the portmanteau test of Ljung and
Box (first, recall what the portmanteau test tests):
Q(m) = n(n + 2) ∑h=1,...,m ρ̂h2/(n − h).
The multivariate version for a k-variate time series is
Qk(m) = n2 ∑h=1,...,m (1/(n − h)) tr(Γ̂hT Γ̂0−1 Γ̂h Γ̂0−1).
Note the similarities and the differences.
Under the null hypothesis and some regularity conditions, this
has an asymptotic distribution of chi-squared with k2m degrees
of freedom.
16
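The multivariate statistic can be sketched as follows (the function name is my own; the sample cross-covariances use divisor n as before, and the data is simulated white noise, under which Qk(m) should look like a chi-squared with k2m degrees of freedom):

```python
import numpy as np

def multivariate_portmanteau(x, m):
    """Q_k(m) = n^2 sum_{h=1}^m tr(G_h^T G_0^{-1} G_h G_0^{-1}) / (n-h),
    using the sample cross-covariances G_h; returns (statistic, df)."""
    n, k = x.shape
    xc = x - x.mean(axis=0)
    G = [xc[h:].T @ xc[:n - h] / n for h in range(m + 1)]
    G0inv = np.linalg.inv(G[0])
    q = sum(np.trace(G[h].T @ G0inv @ G[h] @ G0inv) / (n - h)
            for h in range(1, m + 1))
    return n**2 * q, k**2 * m

rng = np.random.default_rng(0)
x = rng.standard_normal((400, 2))      # bivariate white noise
stat, df = multivariate_portmanteau(x, 5)   # df = k^2 m = 20
```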
VAR Models
In time series, “VAR” means “vector autoregressive”.
In finance generally, “VaR” means “value at risk.”
A VAR(1) model is
Xt = φ0 + ΦXt−1 + At,
where Xt, φ0, and Xt−1 are k-vectors, Φ is a k × k matrix, and
{At} is a sequence of serially uncorrelated k-vectors with 0 mean
and constant positive definite variance-covariance matrix Σ.
Note that the systematic term may be bigger than it looks.
There are k linear terms. The key is that they only go back in
time one step.
This form is called the reduced form of a VAR model.
17
Structural Equations of VAR Models
The relationships among the component time series arise from
the off-diagonal elements of Σ.
To exhibit the concurrent relationships among the component
time series, we do a diagonal decomposition of Σ, writing it as
Σ = LGLT, where L is a lower triangular matrix whose diagonal
entries are all 1 (which means that it is of full rank), and G is
a diagonal matrix with positive entries. (Such a decomposition
exists for any positive definite matrix.)
Note that G = L−1Σ(L−1)T.
18
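One way to compute the decomposition Σ = LGLT is from the Cholesky factor; the following is a sketch of that route (going through the Cholesky factor is my choice of method, not something required by the theory):

```python
import numpy as np

def ldl(sigma):
    """Diagonal decomposition sigma = L G L^T, with L unit lower
    triangular and G diagonal with positive entries, obtained from
    the Cholesky factor sigma = C C^T."""
    C = np.linalg.cholesky(sigma)   # lower triangular, positive diagonal
    d = np.diag(C)
    L = C / d                       # scale columns: unit diagonal
    G = np.diag(d**2)               # positive diagonal entries
    return L, G

sigma = np.array([[4.0, 2.0],
                  [2.0, 3.0]])
L, G = ldl(sigma)
# L @ G @ L.T recovers sigma, and G = L^{-1} sigma L^{-T} is diagonal
```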
Structural Equations of VAR Models
Now we transform the reduced-form equation by premultiplying
each term by L−1:
L−1Xt = L−1φ0 + L−1ΦXt−1 + L−1At
= φ∗0 + Φ∗Xt−1 + Bt.
This is one of the most fundamental transformations in statistics.
The important result is that the variance-covariance that ties
the component series together concurrently, that is, V(At), has
been replaced by V(Bt), which is diagonal.
Because of the special structure of L, we can see concurrent
linear dependencies of the kth series on the others. And by
rearranging the terms in the series, we can make any component
series the “last” one.
19
Structural Equations of VAR Models
The last row of L−1 has a 1 in the last position. Call the other
elements wk1, wk2, etc.
Then the last equation in the matrix equation
L−1Xt = φ∗0 + Φ∗Xt−1 + Bt.
can be written as
Xkt + ∑i=1,...,k−1 wkiXit = φ∗k,0 + ∑i=1,...,k φ∗k,iXi,t−1 + Bkt.
This shows the concurrent dependence of Xkt on the other series.
20
Properties of a VAR(1) Process
There are several important properties we can see easily.
Because the {At} are serially uncorrelated, Cov(At, Xt−h) = 0 for
all h > 0.
Also, we see Cov(At, Xt) = V(At) = Σ.
Also, we see that Xt depends on the jth previous X (and A)
through Φj. The process would be explosive (i.e., the variance would go
to infinity) unless Φj → 0 as j → ∞. This will be guaranteed if
all eigenvalues of Φ are less than 1 in modulus.
(Remember this?)
Also, just as in the univariate case, we have the recursion
Γh = ΦΓh−1
from which we get
Γh = ΦhΓ0.
21
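A small numpy sketch of these facts, for an assumed Φ and Σ: the eigenvalue condition, Γ0 obtained from the standard identity vec(Γ0) = (I − Φ⊗Φ)−1 vec(Σ) (a known identity, not derived on the slide), and the recursion Γh = ΦhΓ0:

```python
import numpy as np

# Assumed example coefficients (eigenvalues 0.6 and 0.3: stationary).
Phi = np.array([[0.5, 0.2],
                [0.1, 0.4]])
Sigma = np.eye(2)
k = Phi.shape[0]

# Stationarity: all eigenvalues of Phi inside the unit circle.
assert np.all(np.abs(np.linalg.eigvals(Phi)) < 1)

# Gamma_0 solves Gamma_0 = Phi Gamma_0 Phi^T + Sigma; vectorize it.
vec_g0 = np.linalg.solve(np.eye(k * k) - np.kron(Phi, Phi), Sigma.ravel())
Gamma0 = vec_g0.reshape(k, k)

# The recursion Gamma_h = Phi Gamma_{h-1} gives Gamma_h = Phi^h Gamma_0.
Gamma2 = np.linalg.matrix_power(Phi, 2) @ Gamma0
```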
VAR(p) Processes and Models
A VAR(p) model, for p > 0 is
Xt = φ0 + Φ1Xt−1 + · · · + ΦpXt−p + At,
where Xt, φ0, and Xt−i are k-vectors, Φ1, . . . , Φp are k×k matrices,
with Φp ≠ 0, and {At} is a sequence of serially uncorrelated
k-vectors with 0 mean and constant positive definite variance-
covariance matrix Σ.
We also can write this using the back-shift operator as
(I − Φ1B − · · · − ΦpBp)Xt = φ0 + At,
or
Φ(B)Xt = φ0 + At.
We also can work out the autocovariance for a VAR(p) process:
Γh = Φ1Γh−1 + · · · + ΦpΓh−p.
These are the multivariate Yule-Walker equations; they hold for h > 0.
22
The Yule-Walker Equations
Let’s just consider an AR(p) model.
We have worked out the autocovariance function of an AR(p)
model. It is
γ(h) = φ1γ(h − 1) + · · · + φpγ(h − p)
and
σ2A = γ(0)− φ1γ(1) − · · · − φpγ(p).
23
The Yule-Walker Equations
The equations involving the autocovariance function are called
the Yule-Walker equations.
There are p such equations, for h = 1, . . . , p.
For an AR(p) process that yields the two sets of equations on
the previous slide, we can write them in matrix notation as
Γpφ = γp
and
σ2A = γ(0) − φTγp
24
Yule-Walker Estimation of the AR Parameters
After we compute the sample autocovariance function for a given
set of observations, we merely solve the Yule-Walker equations
to get our estimators:
φ̂ = Γ̂−1p γ̂p
and
σ̂2A = γ̂(0) − φ̂Tγ̂p
Instead of using the sample autocovariance function, we usually
use the sample ACF:
φ̂ = R̂−1p ρ̂p.
25
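To make the estimator concrete, here is a sketch on simulated data (the AR(2) coefficients 0.6 and 0.2 are assumptions of the example, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
phi = np.array([0.6, 0.2])         # assumed true AR(2) coefficients
n = 5000
x = np.zeros(n)
for t in range(2, n):
    x[t] = phi[0] * x[t - 1] + phi[1] * x[t - 2] + rng.standard_normal()

def sample_acf(x, h):
    """Sample autocorrelation at lag h (divisor n throughout)."""
    xc = x - x.mean()
    return (xc[h:] @ xc[:len(x) - h]) / (xc @ xc)

# Yule-Walker in correlation form: phi-hat = R_p^{-1} rho-hat_p.
rho = np.array([sample_acf(x, 1), sample_acf(x, 2)])
Rp = np.array([[1.0,    rho[0]],
               [rho[0], 1.0   ]])
phi_hat = np.linalg.solve(Rp, rho)   # should be near (0.6, 0.2)
```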
Large Sample Properties of the Yule-Walker
Estimators
A result that can be shown for the Yule-Walker estimator φ̂ is
√n(φ̂ − φ) →d Np(0, σ2AΓp−1)
and
σ̂2A →p σ2A,
where →d denotes convergence in distribution and →p denotes
convergence in probability.
26
Yule-Walker Equation for a VAR(p) Model
The multivariate Yule-Walker equations,
Γh = Φ1Γh−1 + · · · + ΦpΓh−p
can also be used in estimation.
They are often expressed in the correlation form
Rh = Υ1Rh−1 + · · · + ΥpRh−p,
where Υi = D−1ΦiD, with D the diagonal matrix of standard deviations defined earlier.
27
Companion Matrix
We can sometimes get a better understanding of a k-dimensional
VAR(p) process by writing it as a kp-dimensional VAR(1).
It is
Yt = Φ∗Yt−1 + Bt,
where Yt is the kp-vector formed by stacking Xt−p+1, . . . , Xt,
Bt is formed by stacking 0, . . . , 0, At (taking φ0 = 0 for simplicity), and
Φ∗ =
0 I 0 · · · 0
0 0 I · · · 0
...
0 0 0 · · · I
Φp Φp−1 Φp−2 · · · Φ1
This is sometimes called the companion matrix.
28
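The companion matrix can be constructed mechanically; this sketch (the function name is mine) also checks stationarity of the VAR(p) through the eigenvalues of the companion matrix, which for the kp-dimensional VAR(1) form plays the role that the eigenvalues of Φ play for a VAR(1):

```python
import numpy as np

def companion(phis):
    """phis: list [Phi_1, ..., Phi_p] of k x k arrays; returns the
    kp x kp companion matrix with identity blocks on the
    superdiagonal and Phi_p ... Phi_1 in the last block row."""
    p = len(phis)
    k = phis[0].shape[0]
    C = np.zeros((k * p, k * p))
    C[:-k, k:] = np.eye(k * (p - 1))          # superdiagonal I blocks
    for i, Phi in enumerate(phis):            # bottom row: Phi_p ... Phi_1
        C[-k:, (p - 1 - i) * k:(p - i) * k] = Phi
    return C

# Assumed example coefficients for a bivariate VAR(2).
phis = [np.array([[0.5, 0.1], [0.0, 0.3]]),   # Phi_1
        np.array([[0.2, 0.0], [0.1, 0.1]])]   # Phi_2
C = companion(phis)
stationary = bool(np.all(np.abs(np.linalg.eigvals(C)) < 1))
```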