
Lecture Notes for the Course

‘Empirical Macroeconomics’

Dr. Marcel R. Savioz, Swiss National Bank

MiQE/F-Course ”Empirical Macroeconomics”

University of St Gallen


Contents

1 Topic 1: System of Stochastic Difference Equations (VAR)
1.1 Basic concepts
1.2 Stability of a VAR
1.3 Vector Moving Average
1.4 Spectral Decomposition
1.5 Schur Decomposition
1.6 Impulse Response Function
1.7 Introducing the Lag Operator L
1.8 Forecast Error Variance
1.9 Forecast Error Variance Decomposition
1.10 Structural VAR (SVAR)
1.11 The Identification Problem
1.12 Identification
1.13 Proposition (Local Identification of the SVAR)
1.14 Cholesky Decomposition
1.15 Structural VARs with Long-Run Restrictions
1.16 VAR Processes with Integrated Variables
1.17 Cointegrated Processes
1.18 Identification in Cointegrated VARs

2 Topic 2: The Kalman Filter
2.1 State Space Modelling
2.2 Kalman Filter
2.2.1 The Filter
2.2.2 Prediction
2.2.3 Smoothing
2.3 Derivation of the Kalman Filter
2.3.1 Conditional Normality
2.3.2 Linear Projection
2.4 Properties of Time-Invariant Models
2.5 Estimation, Initialisation and Diagnostic Checking of the Kalman Filter
2.5.1 Maximum Likelihood Estimation
2.5.2 Initialisation
2.5.3 Diagnostic Checking
2.6 Extended Kalman Filter
2.7 Illustrative Economic Applications
2.7.1 Example of a time-varying coefficients model
2.7.2 Example of a multivariate SUTSE model


3 Topic 3: Solving Rational Expectation Models
3.1 The Basic Method
3.2 Rational Expectations Models with Expectations of Future Variables (REFV Models)
3.3 Solving Rational Expectations (State-Space-Representation)
3.4 The Problem of Multiple Solutions
3.5 Solution to Linear Expectation Models
3.6 Conclusion

4 Topic 4: Models of Optimising Agents
4.1 Solution of the Optimal Control Problem
4.2 Solution to the Deterministic Control by Lagrange Multipliers
4.3 Solution of Stochastic Control by Lagrange Multipliers
4.4 The Combined Solution and the Minimum Expected Loss
4.5 The Steady-State Solution
4.6 The Method of Dynamic Programming by an Example
4.7 Dynamic Programming
4.8 Example to Topic 4

A Proof of Lemma from Section 2.3.1
B Proof of Lemma from Section 2.3.2
C References


1 Topic 1: System of Stochastic Difference Equations (VAR)

Introduction¹

Topic 1 will introduce the basic concepts and definitions of vector autoregression analysis (VARs),

structural VARs, vector error correction models (VECM) and structural VECs. Until some 25

years ago, large structural models dominated econometric analysis. These models were used for

forecasting, policy analysis and testing of competing models. The research activity undertaken

by the Cowles Commission in the United States (1945-1970) was entirely based on such large

scale models. They were based on theoretical considerations derived from the (then) prevailing

Keynesian paradigm.

It was not until the 1970s that these models were questioned, for several reasons. Firstly, the economic turmoil and instability linked to the collapse of the Bretton Woods system and the oil shocks led to forecasting failures in most of the major macroeconometric models. Secondly, economists cast doubt on the validity of Keynes' theories. Thirdly, the way these large-scale models were specified was criticised by Sims (1980), who emphasised two methodological weaknesses:

1. the specification of simultaneous equations systems was largely based on the aggregation of

partial equilibrium models, without any concern for the resulting omitted interrelations.

2. the dynamic structure of the model was often specified in order to provide restrictions necessary to achieve identification (or over-identification) of the structural form.

As an answer to this criticism, Sims suggested the use of models whose specification had to be

founded on the analysis of the statistical properties of the data under study. What he suggested

were vector autoregressions, saying:

"[...] it appears worthwhile to investigate the possibility of building large models in a style which does not tend to accumulate restrictions so haphazardly ... It should be feasible to estimate large-scale macromodels as unrestricted reduced forms, treating all variables as endogenous."

1.1 Basic concepts²

A simple VAR is an expression relating a set of k variables, Zt, to its previous levels. It is given

by

Zt = b+B1Zt−1 +B2Zt−2 + . . .+BqZt−q + ut (1.1)

where Zt is a k -dimensional vector of endogenous variables, b a k -dimensional vector of constants

and B1, . . . , Bq are k × k-dimensional autoregressive coefficient matrices. For the k -dimensional

vector of normally distributed ut the following properties apply:

¹ I thank Andreas Wälchli for the preparation of these lecture notes.
² The notation used in these lecture notes is mainly based on the notation in Christiano, Eichenbaum and Evans (1999), Monetary Policy Shocks: What Have We Learned and to What End?, in Handbook of Macroeconomics, Vol. 1, Part 1, Chapter 2.

Empirical Macroeconomics

Page 5: Lecture Notes for the Course ‘Empirical Macroeconomics’ · 1I thank Andreas W alchli for the preparation of these lecture notes. 2The notation used in this lecture notes are mainly

1.1 Basic concepts 2

E(ut) = 0
E(ut ut′) = Ω
E(ui uj′) = 0, if i ≠ j

Example 1.1

A simple VAR(2) for the 2 × 1 vector Zt = [xt yt]′:

$$\begin{bmatrix} x_t \\ y_t \end{bmatrix} = \begin{bmatrix} B_{1,11} & B_{1,12} \\ B_{1,21} & B_{1,22} \end{bmatrix}\begin{bmatrix} x_{t-1} \\ y_{t-1} \end{bmatrix} + \begin{bmatrix} B_{2,11} & B_{2,12} \\ B_{2,21} & B_{2,22} \end{bmatrix}\begin{bmatrix} x_{t-2} \\ y_{t-2} \end{bmatrix} + \begin{bmatrix} u_{1,t} \\ u_{2,t} \end{bmatrix}$$

A VAR(q) can always be rewritten as a VAR(1). This turns out to be very convenient for calculations. Recall the VAR represented by (1.1):

Zt = b+B1Zt−1 +B2Zt−2 + . . .+BqZt−q + ut

This can be written as

$$\begin{bmatrix} Z_t \\ Z_{t-1} \\ \vdots \\ Z_{t-q+1} \end{bmatrix} = \begin{bmatrix} b \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} B_1 & B_2 & \dots & B_{q-1} & B_q \\ I & 0 & \dots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \dots & I & 0 \end{bmatrix}\begin{bmatrix} Z_{t-1} \\ Z_{t-2} \\ \vdots \\ Z_{t-q} \end{bmatrix} + \begin{bmatrix} u_t \\ 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (1.2)$$

or in a more compact form

Zt = b+BZt−1 + ut (1.3)

The representation in (1.3) is the so-called canonical form (companion form) of the VAR. Note that the first block equation of the system (1.2) is identical to (1.1), while the other equations are just the identities Zt−i = Zt−i.

Example 1.2

Consider the AR(2)

Zt = B1Zt−1 + B2Zt−2 + ut

which is equivalent to

$$\begin{bmatrix} Z_t \\ Z_{t-1} \end{bmatrix} = \begin{bmatrix} B_1 & B_2 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} Z_{t-1} \\ Z_{t-2} \end{bmatrix} + \begin{bmatrix} u_t \\ 0 \end{bmatrix}$$
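The stacking in (1.2) is easy to automate. The following is a minimal sketch in Python/NumPy (not from the original notes; the coefficient matrices B1 and B2 are purely illustrative) that builds the companion matrix B of the canonical form from B1, . . . , Bq.

import numpy as np

def companion_matrix(B_list):
    """Stack the k x k coefficient matrices B_1,...,B_q of a VAR(q) into the
    (k*q) x (k*q) companion matrix of its VAR(1) form (1.2)."""
    k = B_list[0].shape[0]
    q = len(B_list)
    B = np.zeros((k * q, k * q))
    B[:k, :] = np.hstack(B_list)         # first block row: [B_1 B_2 ... B_q]
    B[k:, :-k] = np.eye(k * (q - 1))     # identity blocks implementing Z_{t-i} = Z_{t-i}
    return B

# Illustrative bivariate VAR(2) in the spirit of Example 1.1
B1 = np.array([[0.5, 0.1], [0.0, 0.4]])
B2 = np.array([[0.2, 0.0], [0.1, 0.1]])
B = companion_matrix([B1, B2])
print(B.shape)   # (4, 4)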


1.2 Stability of a VAR

Consider a VAR(1), or a VAR(1) representation of a VAR(q)

Zt = b+BZt−1 + ut (1.4)

To derive the mean of Zt we may first express Zt as a function of the initial value Z0 and the past

u’s by successive substitutions as shown by (1.5) for the lagged Z ’s on the right hand side of (1.4).

Zt−1 = b+BZt−2 + ut−1

...

Zt−s = b+BZt−s−1 + ut−s for s = 1, 2, . . . (1.5)

Therefore, substituting (1.5) in (1.4) gives

Zt = b+B(b+BZt−2 + ut−1) + ut

= b+Bb+B2Zt−2 +But−1 + ut

= b+Bb+B2(b+BZt−3 + ut−2) +But−1 + ut

= b+Bb+B2b+B3Zt−3 +B2ut−2 +But−1 + ut

...

Zt = b+Bb+B2b+ . . .+BKb+BK+1Zt−K−1 + ut +But−1 + . . .+BKut−K (1.6)

By taking the mathematical expectations on both sides of (1.6) we obtain the mean function

E(Zt) = b+Bb+B2b+ . . .+BKb+BK+1Zt−K−1 (1.7)

It can be shown that the series described by (1.7) converges. First, premultiply (1.7) by B.

BE(Zt) = Bb+B2b+ . . .+BK+1b+BK+2Zt−K−1

and subtract the result from (1.7)

(I − B)E(Zt) = b − BK+1b + (BK+1 − BK+2)Zt−K−1 (1.8)

Assuming that the inverse of (I − B) exists, E(Zt) converges when BK+1 → 0 as K → ∞. Rewriting (1.8) in that limit gives

E(Zt) = (I − B)−1b ≡ µ

The VAR may be written as a deviation from its mean. For this, subtract (1.7) from (1.6). Then,

use the result above to obtain

Zt − E(Zt) = ut + But−1 + . . . + BKut−K

Zt = µ + ut + But−1 + . . . + BKut−K (1.9)

The necessary condition for stability of a VAR is that all eigenvalues of the matrix B lie inside the

unit circle. The reasons for this will be explained in a following section ”Spectral Decomposition”.

First, we introduce the vector moving average representation of a VAR.


1.3 Vector Moving Average

Recalling equation (1.9)

Zt = µ + ut + But−1 + . . . + BKut−K

let K go to infinity and - given that the VAR is stable - obtain

Zt = µ+ ut +But−1 +B2ut−2 + . . . (1.10)

Pick out the first k equations from (1.10) (in order to extract the "original" variables from the

canonical form) to obtain the vector moving average representation of the VAR.

Zt = µ+ ut + Ψ1ut−1 + Ψ2ut−2 + . . . (1.11)

where Ψj denotes the upper left block of Bj consisting of the first k rows and k columns. Note

that the u's in (1.11) are the first k elements of the u's in (1.10).

1.4 Spectral Decomposition

It was stated above that the condition for a stable VAR is that all eigenvalues of the coefficient

matrix B lie inside the unit circle. This is explained in this section. The eigenvalues λ of a matrix B are defined by

det(B − λI) = 0

If the eigenvectors are linearly independent (which is guaranteed when all eigenvalues are distinct), the spectral decomposition can be applied. For such a matrix B, there exists a matrix T such that

B = TΛT−1

where

$$\Lambda = \begin{bmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n \end{bmatrix}$$

The matrix Λ is a diagonal matrix with the n eigenvalues on the main diagonal and zeros elsewhere.

The matrix T consists of the n eigenvectors associated to the eigenvalues of the matrix B.

Note that we therefore get

B2 = BB = TΛT−1 × TΛT−1 = TΛΛT−1 = TΛ2T−1

with Λ2 a diagonal matrix with the square of the eigenvalues on the principal diagonal.

This can be generalised to

Bs = TΛsT−1

Under the condition that all eigenvalues lie inside the unit circle, the matrix Λs vanishes for s → ∞, and consequently so does Bs.
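As a quick numerical check of stability, one can compute the eigenvalues of the companion matrix and verify that they all lie strictly inside the unit circle. A minimal sketch, continuing the Python/NumPy example above (B is the illustrative companion matrix built there):

eigenvalues = np.linalg.eigvals(B)            # eigenvalues of the companion matrix
is_stable = np.all(np.abs(eigenvalues) < 1)   # stable if every modulus is below one
print(eigenvalues, is_stable)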


Example 1.3

Consider the process

Zt = 0.5Zt−1 + 0.5Zt−2 + εt

which may be written as

$$\begin{bmatrix} Z_t \\ Z_{t-1} \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} Z_{t-1} \\ Z_{t-2} \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \end{bmatrix}$$

The VAR has the eigenvalues −0.5 and 1 with the corresponding eigenvectors [−1 2]′ and [1 1]′. The matrix T is composed of the eigenvectors of B, so the spectral decomposition is:

$$\begin{bmatrix} -1 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} -0.5 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} -1 & 1 \\ 2 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 0.5 & 0.5 \\ 1 & 0 \end{bmatrix}$$

As mentioned above, a necessary condition for the spectral decomposition is that the eigenvectors

are linearly independent. However, often this is not the case.

Example 1.4

Consider the process

Zt − Zt−1 = Zt−1 − Zt−2 + εt

which may be written as

$$\begin{bmatrix} Z_t \\ Z_{t-1} \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} Z_{t-1} \\ Z_{t-2} \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \end{bmatrix}$$

The VAR has the eigenvalues 1 and 1, both with the same corresponding eigenvector [1 1]′. Thus, the matrix of eigenvectors T is singular and its inverse does not exist.

1.5 Schur Decomposition

An alternative way to obtain a similar decomposition is to use the Schur decomposition. Let AH denote the conjugate transpose of A, i.e. the transpose of the complex conjugate of A. So, if

$$A = \begin{bmatrix} 2 & 3+4i \\ 1-2i & 5 \end{bmatrix}$$

then

$$A^H = \begin{bmatrix} 2 & 1+2i \\ 3-4i & 5 \end{bmatrix}$$

Empirical Macroeconomics

Page 9: Lecture Notes for the Course ‘Empirical Macroeconomics’ · 1I thank Andreas W alchli for the preparation of these lecture notes. 2The notation used in this lecture notes are mainly

1.6 Impulse Response Function 6

A matrix A is unitary (similar to orthogonal, but for complex matrices) if the following condition is satisfied:

AH = A⁻¹

Following the Schur decomposition, an n × n matrix A can be written as

A = ZTZH (1.12)

where Z is a unitary n × n matrix and T is an n × n upper triangular matrix (the Schur form) with the eigenvalues along the diagonal. Remark: the Schur decomposition also holds for real matrices A; if all eigenvalues are real, the decomposition can be taken with real matrices, so that

A = ZTZ′

with Z orthogonal, i.e. Z′ = Z⁻¹.

Example 1.4 (cont'd)

Again,

$$\begin{bmatrix} Z_t \\ Z_{t-1} \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} Z_{t-1} \\ Z_{t-2} \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \end{bmatrix}$$

The Schur decomposition states that there exist matrices Z and T, as defined above, such that

A = ZTZH

In this case, A is real with real eigenvalues and thus it holds that

A = ZTZ′

As Z also has to obey ZZ′ = I, Z′ = Z⁻¹ holds. Hence

$$T = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}, \qquad Z = \begin{bmatrix} \tfrac{\sqrt{2}}{2} & -\tfrac{\sqrt{2}}{2} \\ \tfrac{\sqrt{2}}{2} & \tfrac{\sqrt{2}}{2} \end{bmatrix}, \qquad Z' = Z^{-1} = \begin{bmatrix} \tfrac{\sqrt{2}}{2} & \tfrac{\sqrt{2}}{2} \\ -\tfrac{\sqrt{2}}{2} & \tfrac{\sqrt{2}}{2} \end{bmatrix}$$

The calculation can be made with a mathematical program, such as Matlab.
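As an alternative to Matlab, the real Schur form of this companion matrix can be obtained with Python/SciPy as in the sketch below. Note that the signs of the columns of Z returned by the routine may differ from the hand-derived matrices above; this is harmless because the decomposition is only unique up to such choices.

import numpy as np
from scipy.linalg import schur

A = np.array([[2.0, -1.0],
              [1.0,  0.0]])       # companion matrix of Example 1.4

T, Z = schur(A, output='real')    # A = Z T Z' with Z orthogonal and T (quasi-)upper triangular
print(T)                          # the eigenvalues (here 1 and 1) appear on the diagonal of T
print(np.allclose(Z @ T @ Z.T, A))  # True: the factors reproduce A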

1.6 Impulse Response Function

The vector moving average of the VAR(q) expresses the endogenous variable Zt as a linear function

of the current and past u’s.

Zt = µ+ ut + Ψ1ut−1 + Ψ2ut−2 + . . .


The matrix Ψs may be interpreted as follows:

∂Zt+s / ∂ut′ = Ψs (1.13)

Define ψi,j(s) as the row i, column j element of Ψs. The coefficient ψi,j(s) is called the impact multiplier. It quantifies the consequence of a one-unit increase in uj,t (the j-th shock at date t) for the value of zi,t+s (the i-th variable at date t+s), holding all other u's constant. ψi,j(s) as a function of s is called the impulse response function. Plotting ψi,j(s) against s visualises the behaviour of zi,t+s in response to a shock uj,t. The combined effect of changes in ut on the value of the vector Zt+s is given by

∆Zt+s = Ψs ∆ut (1.14)
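Because Ψs is the upper-left k × k block of Bs, with B the companion matrix, the impulse responses of a stable VAR can be computed by repeated multiplication. A minimal sketch, reusing the hypothetical companion_matrix helper and the illustrative B1, B2 from the earlier example:

def impulse_responses(B_list, horizon):
    """Return [Psi_0, Psi_1, ..., Psi_horizon], the upper-left k x k blocks of the powers B^s."""
    k = B_list[0].shape[0]
    B = companion_matrix(B_list)
    responses = []
    power = np.eye(B.shape[0])
    for s in range(horizon + 1):
        responses.append(power[:k, :k].copy())   # Psi_s = upper-left block of B^s
        power = power @ B
    return responses

psis = impulse_responses([B1, B2], horizon=10)
print(psis[1])   # for a VAR(2) this equals B1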

1.7 Introducing the Lag Operator L

The vector moving average representation can also be obtained using the lag operator L. The lag

operator is defined as

L(Zt) ≡ Zt−1

The lag operator follows the same analytical rules as the multiplication operator, e.g.

L(L(Zt)) = L(Zt−1) = Zt−2

which could be denoted L2. The lag operator is commutative with the multiplication operator

L(βZt) = βL(Zt)

Finally, the lag operator is distributive over the addition operator

L(Zt + Yt) = L(Zt) + L(Yt)

Apply the lag operator to equation (1.1) to obtain

Zt = b + B1L(Zt) + B2L2(Zt) + . . . + BqLq(Zt) + ut

(I − B1L − B2L2 − . . . − BqLq)Zt = b + ut (1.15)

In a more compact form, (1.15) may be written as

B(L)Zt = b+ ut

where B(L) is a k × k-matrix of polynomials in the lag operator and is defined as

B(L) = I −B1L−B2L2 − . . .−BqLq

If all eigenvalues of B(L) lie inside the unit circle, B(L)−1 exists and the VAR(q) has an MA(∞)

representation

Zt = B(L)−1b+B(L)−1ut

Zt = µ+ Ψ(L)ut (1.16)


The moving average coefficients can be calculated using the method of undetermined coefficients,

based on the following relationship

Ψ(L) = B(L)−1 (1.17)

which requires

B(L)Ψ(L) = I

I = (Ψ0 + Ψ1L + Ψ2L2 + . . .)(I − B1L − . . . − BqLq)
  = Ψ0 + (Ψ1 − Ψ0B1)L + (Ψ2 − Ψ1B1 − Ψ0B2)L2 + . . . + (Ψi − ∑_{j=1}^{i} Ψi−jBj)Li + . . . (1.18)

Comparing the coefficients on the powers of L yields

I = Ψ0
0 = Ψ1 − Ψ0B1
0 = Ψ2 − Ψ1B1 − Ψ0B2
...
0 = Ψi − ∑_{j=1}^{i} Ψi−jBj
...

where Bj = 0 for j > q. Hence, the Ψi can be computed recursively using

Ψ0 = I
Ψi = ∑_{j=1}^{i} Ψi−jBj for i = 1, 2, . . .

The mean µ of Zt can be obtained as follows

µ = Ψ(1)b = B(1)−1b = (I −B1 − . . .−Bq)−1b (1.19)
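The recursion for the Ψi is straightforward to implement. A minimal sketch, again in Python/NumPy with the illustrative B1, B2 standing in for estimated coefficient matrices; it also checks the result against the companion-matrix computation of the previous sketch:

def ma_coefficients(B_list, n_terms):
    """Compute Psi_0,...,Psi_{n_terms} from Psi_0 = I and Psi_i = sum_{j=1}^{i} Psi_{i-j} B_j."""
    k = B_list[0].shape[0]
    psis = [np.eye(k)]
    for i in range(1, n_terms + 1):
        psi_i = np.zeros((k, k))
        for j in range(1, i + 1):
            if j <= len(B_list):               # B_j = 0 for j > q
                psi_i += psis[i - j] @ B_list[j - 1]
        psis.append(psi_i)
    return psis

psis = ma_coefficients([B1, B2], n_terms=10)
print(np.allclose(psis[3], impulse_responses([B1, B2], 3)[3]))   # True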

1.8 Forecast Error Variance

The forecast error of the s-period-ahead forecast is

Zt+s − EtZt+s = ut+s + Ψ1ut+s−1 + . . . + Ψs−1ut+1 (1.20)

so the covariance matrix of the (s periods ahead) forecast errors is

Cov(Zt+s − EtZt+s) = E[(Zt+s − EtZt+s)(Zt+s − EtZt+s)′] = Ω + Ψ1ΩΨ1′ + . . . + Ψs−1ΩΨs−1′ (1.21)

For a VAR(1), Ψs = Bs, so we have

Zt+s − EtZt+s = ut+s + But+s−1 + . . . + Bs−1ut+1 (1.22)


and

Cov(Zt+s − EtZt+s) = Ω +BΩB′ + . . .+Bs−1Ω(Bs−1)′ (1.23)

Note that lim s→∞ EtZt+s = µ, that is, the forecast converges to the unconditional mean and the forecast error converges to the full VMA representation. Similarly, the forecast error variance converges to the unconditional variance.
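Equation (1.21) can be evaluated directly from the MA coefficients. A minimal sketch, continuing the Python/NumPy example (the diagonal Ω below is purely illustrative):

Omega = np.diag([1.0, 0.5])    # illustrative covariance matrix of u_t

def forecast_error_cov(B_list, Omega, s):
    """Covariance of the s-period-ahead forecast error: sum_{j=0}^{s-1} Psi_j Omega Psi_j'."""
    psis = ma_coefficients(B_list, s - 1)
    return sum(psis[j] @ Omega @ psis[j].T for j in range(s))

print(forecast_error_cov([B1, B2], Omega, s=4))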

1.9 Forecast Error Variance Decomposition

If the shocks are uncorrelated, it is often useful to calculate the fraction of Var(Zi,t+s − EtZi,t+s) that is due to each individual shock; this is the forecast error variance decomposition.

Suppose the covariance matrix of the shocks, here Ω, is a diagonal n × n matrix with the variances ωii along the diagonal. Let ψq,i be the i-th column of Ψq. We then have

ΨqΩΨq′ = ∑_{i=1}^{n} ωii ψq,i(ψq,i)′ (1.24)

Example 1.5

Illustration of the formula above:

$$\Psi_q = \begin{bmatrix} \psi_{11} & \psi_{12} \\ \psi_{21} & \psi_{22} \end{bmatrix} \quad \text{and} \quad \Omega = \begin{bmatrix} \omega_{11} & 0 \\ 0 & \omega_{22} \end{bmatrix}$$

Then

$$\Psi_q \Omega \Psi_q' = \begin{bmatrix} \omega_{11}\psi_{11}^2 + \omega_{22}\psi_{12}^2 & \omega_{11}\psi_{11}\psi_{21} + \omega_{22}\psi_{12}\psi_{22} \\ \omega_{11}\psi_{11}\psi_{21} + \omega_{22}\psi_{12}\psi_{22} & \omega_{11}\psi_{21}^2 + \omega_{22}\psi_{22}^2 \end{bmatrix}$$

which is equal to

$$\omega_{11}\begin{bmatrix} \psi_{11} \\ \psi_{21} \end{bmatrix}\begin{bmatrix} \psi_{11} \\ \psi_{21} \end{bmatrix}' + \omega_{22}\begin{bmatrix} \psi_{12} \\ \psi_{22} \end{bmatrix}\begin{bmatrix} \psi_{12} \\ \psi_{22} \end{bmatrix}' = \omega_{11}\begin{bmatrix} \psi_{11}^2 & \psi_{11}\psi_{21} \\ \psi_{11}\psi_{21} & \psi_{21}^2 \end{bmatrix} + \omega_{22}\begin{bmatrix} \psi_{12}^2 & \psi_{12}\psi_{22} \\ \psi_{12}\psi_{22} & \psi_{22}^2 \end{bmatrix}$$

Applying this to the covariance matrix (note that Ψ0 = I, so ψ0,i is the i-th unit vector) gives

Cov(Zt+s − EtZt+s) = ∑_{i=1}^{n} ωii ψ0,i(ψ0,i)′ + ∑_{i=1}^{n} ωii ψ1,i(ψ1,i)′ + . . . + ∑_{i=1}^{n} ωii ψs−1,i(ψs−1,i)′
= ∑_{i=1}^{n} ωii (ψ0,i(ψ0,i)′ + ψ1,i(ψ1,i)′ + . . . + ψs−1,i(ψs−1,i)′)

which shows how the covariance matrix of the s-period forecast errors can be decomposed into its n components.
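Numerically, the share of the forecast error variance of each variable that is attributable to each shock can be computed as in the sketch below (continuing the Python/NumPy example and assuming, as in the text, a diagonal Ω):

def fevd(B_list, Omega, s):
    """Forecast error variance decomposition at horizon s.
    Returns a (k x k) array: rows are variables, columns are shocks; each row sums to one."""
    k = B_list[0].shape[0]
    psis = ma_coefficients(B_list, s - 1)
    contrib = np.zeros((k, k))
    for i in range(k):                       # loop over shocks
        for j in range(s):                   # sum over horizons 0, ..., s-1
            col = psis[j][:, i]              # responses of all variables to shock i at horizon j
            contrib[:, i] += Omega[i, i] * col**2
    return contrib / contrib.sum(axis=1, keepdims=True)

print(fevd([B1, B2], Omega, s=4))   # each row sums to 1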

1.10 Structural VAR (SVAR)

A structural form SVAR(q) in its most general form is given by

A0Zt = a+A1Zt−1 +A2Zt−2 + · · ·+AqZt−q + Sεt (1.25)


where a is a k-dimensional vector of constants, the Ai are k × k-dimensional matrices, and εt is white noise with identity variance-covariance matrix (Eεtεt′ = I). The matrix S is a k × k-dimensional matrix that specifies which variables are directly affected by which structural shocks, and to what extent. Note that S is typically a diagonal matrix.

Under the condition that the inverse of the matrix A0 exists, the SVAR(q) may be written as

follows

Zt = A0⁻¹a + A0⁻¹A1Zt−1 + A0⁻¹A2Zt−2 + . . . + A0⁻¹AqZt−q + A0⁻¹Sεt
   = b + B1Zt−1 + B2Zt−2 + . . . + BqZt−q + ut (1.26)

Equation (1.26) is the vector autoregressive representation of the structural dynamic equation

system. Hence, a VAR can be seen as a reduced form of a general structural model.

The coefficients of the structural and the reduced form are related as follows

Bi = A0⁻¹Ai for i = 1, 2, . . . , q
b = A0⁻¹a
ut = A0⁻¹Sεt

If all eigenvalues of B(L) lie inside the unit circle, the MA(∞) is given by

Zt = B(L)⁻¹b + B(L)⁻¹A0⁻¹S S⁻¹A0 ut
   = µ + Φ(L)εt (1.27)

where εt = S⁻¹A0ut and Φ(L) = B(L)⁻¹A0⁻¹S.

The impulse response function can be obtained from the following relationship

∂Zt+s / ∂εt′ = Ψs A0⁻¹S = Φs

Define φi,j(s) as the row i, column j element of Φs. Again, the coefficient φi,j(s) is called impact

multiplier. It quantifies the consequence of a one unit increase in εj,t (the j -th element at date

t) for the value of zi,t+s (the i-th variable at date t+s), holding all other ε's constant. φi,j(s) as a function of s is called the impulse response function.

1.11 The Identification Problem³

It can easily be shown that an infinite set of different values of A0 and A1 through Aq results in observationally equivalent reduced forms. To illustrate this, premultiply the structural form (1.25) by a k × k matrix Q of full rank.

QA0Zt = Qa+QA1Zt−1 +QA2Zt−2 + . . .+QAqZt−q +QSεt (1.28)

³ Source: Favero, Applied Macroeconometrics, 2001, Chapter 6.


Continuing as outlined above results in

Zt = A0⁻¹Q⁻¹Qa + A0⁻¹Q⁻¹QA1Zt−1 + A0⁻¹Q⁻¹QA2Zt−2 + . . . + A0⁻¹Q⁻¹QAqZt−q + A0⁻¹Q⁻¹QSεt (1.29)

where matrix Q cancels out. However, this implies that the model is not identified. Without

imposing additional restrictions, the structural parameters cannot be uniquely identified from the

estimated reduced-form coefficients, since there is more than one structural model that leads to the same statistical model. The model is identified only if the imposed restrictions force Q to be the identity matrix.

VAR models are estimated to provide empirical evidence on the response of macroeconomic variables to monetary policy impulses in order to discriminate between alternative theoretical models of the economy. It then becomes crucial to identify monetary policy actions using restrictions independent of the competing models of the transmission mechanism under empirical investigation, taking into account the potential endogeneity of policy instruments.

VAR models concentrate on shocks. First, the relevant shocks are identified; then the response of the system to the shocks is described by analysing impulse responses (the propagation mechanism of the shocks), forecast error variance decompositions and historical decompositions.

1.12 Identification

We assume a structural model of the form

A0Zt = B(L)Zt−1 + Sεt (1.30)

where Z is a vector of macroeconomic (non-policy) variables and of variables controlled by the

policy-maker. Matrix A0 describes the contemporaneous relations among the variables and B(L)

is a matrix of finite-order lag polynomials. ε is a vector of structural disturbances to the non-policy

and policy variables; non-zero elements of the matrix S allow some shocks to affect directly more

than one endogenous variable in the system. The VAR of (1.30) may be represented by its reduced

form

Zt = A−10 B(L)Zt−1 + ut (1.31)

where ut is the VAR residual vector, normally independently distributed with full variance-

covariance matrix Ω. The relation between the residuals in ut and the structural disturbances

in εt is therefore:

A0ut = Sεt (1.32)

Inverting yields

ut = A0⁻¹Sεt

from which the relation between the variance-covariance matrices of u and ε can be derived:

E(ut ut′) = A0⁻¹S E(εt εt′) S′A0⁻¹′

Given that the structural disturbances have the identity matrix as covariance matrix and that the residuals have the full matrix Ω, this can be written as

Ω = A0⁻¹SS′A0⁻¹′ (1.33)


The matrix Ω has k(k + 1)/2 different elements (by definition, it is symmetric). Therefore, a

necessary condition for identification is that the maximum number of parameters contained in the

two matrices A0 and S equals k(k + 1)/2; such a condition makes the number of equations equal

to the number of unknowns in system (1.33). A sufficient condition is that no equation in (1.33) is

a linear combination of the other equations in the system.

The k(k + 1)/2 equations can be written as

vech(Ω) = vech(A0⁻¹SS′A0⁻¹′) (1.34)

where the two matrices A0 and S have k² elements each. Thus, at least 2k² − ½k(k + 1) restrictions are required to identify all 2k² elements of A0 and S, at least locally. Even if the diagonal elements of A0 are set to one, 2k² − k − ½k(k + 1) further restrictions are needed for identification. It is therefore not surprising that most applications consider special cases with A0 = I or S = I. However, the general model is a useful framework for SVAR analysis. The restrictions are typically normalisations or zero restrictions which can be written in the form of linear equations

vec(A0) = RA0 γA0 + rA0   and   vec(S) = RS γS + rS (1.35)

where RA0 and RS are suitable fixed matrices of zeros and ones, γA0 and γS are vectors of free parameters, and rA0 and rS are vectors of fixed parameters which allow one, for instance, to normalise the diagonal elements of A0. Although γS is typically zero, it is included here because it does not complicate the analysis.

Multiplying the two sets of equations in (1.35) by the orthogonal complements of RA0 and RS, denoted RA0⊥ and RS⊥, it is easy to see that they can be written alternatively in the form

CA0 vec(A0) = cA0   and   CS vec(S) = cS (1.36)

where CA0 = RA0⊥, CS = RS⊥, cA0 = RA0⊥rA0 and cS = RS⊥rS. The matrices CA0 and CS may be thought of as appropriate selection matrices. Again, in general, the restrictions will ensure only local uniqueness of A0 and S due to the nonlinear nature of the full set of equations from which the two matrices have to be solved. The following proposition states a rank condition for local identification.

Theory: vec, vech and the duplication matrix

The function vec(A) is the vectorisation of the matrix A, obtained by stacking the columns of A into a single column vector, that is

vec(A) = [a11, . . . , ak1, a12, . . . , ak2, . . . , a1k, . . . , akk]′

For example

$$A = \begin{bmatrix} a & c \\ b & d \end{bmatrix}, \qquad vec(A) = \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}$$


When A is a symmetric matrix, the vectorisation vec contains more information than is strictly necessary. The matrix could be described by its lower triangular portion, that is, the k(k + 1)/2 entries on and below the diagonal. For such matrices, the half-vectorisation vech is sometimes useful.

For example

$$A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}, \qquad vech(A) = \begin{bmatrix} a \\ b \\ d \end{bmatrix}$$

There exists a unique matrix transformation from the half-vectorisation to the vectorisation and vice versa. These matrices are called the duplication and elimination matrix, respectively.

1.13 Proposition (Local Identification of the SVAR)

Let A0 and S be non-singular (k × k) matrices. Then, for a given symmetric, positive definite (k × k) matrix Ω, the system of equations in (1.34) and (1.36) has a locally unique solution if and only if

$$\operatorname{rk}\begin{bmatrix} -2D_k^{+}(\Omega \otimes A_0^{-1}) & 2D_k^{+}(A_0^{-1}S \otimes A_0^{-1}) \\ C_{A_0} & 0 \\ 0 & C_S \end{bmatrix} = 2k^2 \qquad (1.37)$$

where Dk is the (k² × ½k(k + 1)) duplication matrix and Dk⁺ ≡ (Dk′Dk)⁻¹Dk′.

In practice, identification requires the imposition of some restrictions on the parameters of A0 and

S. For this step various ways have been used in the past.

Theory: Kronecker product ⊗

If A is an (m × n) matrix and B is a (p × q) matrix, then the Kronecker product A ⊗ B is the (mp × nq) block matrix

$$A \otimes B = \begin{bmatrix} a_{11}B & \dots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \dots & a_{mn}B \end{bmatrix}$$

There exists the following relationship between the Kronecker product and the vectorisation of matrices:

vec(AXB) = (B′ ⊗ A) vec(X)


1.14 Cholesky Decomposition

In his article, which introduced VAR methodology, Sims (1980) proposed the following identifica-

tion strategy, based on the Cholesky decomposition of matrices.

Theory: Hermitian matrix

A matrix A is Hermitian when it is equal to its own conjugate transpose, that is, when the element in the i-th row and j-th column equals the complex conjugate of the element in the j-th row and i-th column. A real matrix is Hermitian exactly when it is equal to its transpose, i.e. when it is symmetric.

A = AH, (aij) ∈ C
A = A′, (aij) ∈ R

Theory: Cholesky decomposition

The Cholesky decomposition states that a Hermitian, positive-definite matrix A can be

decomposed as

A = LL∗

where L is a lower triangular matrix with strictly positive diagonal elements, and L∗ denotes

the conjugate transpose of L. The Cholesky decomposition is unique for any given positive-

definite Hermitian matrix.

The identifying restrictions are that A0 is lower triangular with ones on the main diagonal and that S is diagonal:

$$A_0 = \begin{bmatrix} 1 & 0 & \dots & 0 \\ a_{0,21} & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{0,n1} & a_{0,n2} & \dots & 1 \end{bmatrix}, \qquad S = \begin{bmatrix} s_{11} & 0 & \dots & 0 \\ 0 & s_{22} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & s_{nn} \end{bmatrix}$$

It is obvious that this is a just-identified scheme, that is, the number of unknowns is equal to k(k + 1)/2. The identification of the structural shocks depends on the ordering of the variables, with the most endogenous variable ordered last.
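In practice the recursive scheme is implemented directly with a Cholesky factorisation of the estimated residual covariance matrix: with A0 lower triangular (unit diagonal) and S diagonal, the product A0⁻¹S is exactly the lower-triangular Cholesky factor of Ω. A minimal sketch in Python/NumPy, with a purely illustrative Ω standing in for an estimate:

import numpy as np

Omega_hat = np.array([[1.0, 0.3],
                      [0.3, 0.5]])        # illustrative estimated residual covariance matrix

P = np.linalg.cholesky(Omega_hat)         # lower triangular with positive diagonal, Omega_hat = P P'
S = np.diag(np.diag(P))                   # the diagonal of P gives S
A0 = S @ np.linalg.inv(P)                 # A0 = S P^{-1} is lower triangular with unit diagonal
print(np.allclose(np.linalg.inv(A0) @ S, P))   # True: P = A0^{-1} S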

1.15 Structural VARs with Long-Run Restrictions

Often the long-run behaviour of shocks provides restrictions that are acceptable within a wide range of theoretical models. A typical restriction, compatible with virtually all macroeconomic models, is that demand shocks have zero impact on output in the long run. For further discussion on how to use these restrictions to identify VARs, compare Blanchard and Quah (1989).

The structural model of interest is specified by setting A0 equal to the identity matrix and by imposing no restrictions on the matrix S. Then, a generic vector of variables Zt is specified by

Zt = ∑_{i=1}^{p} AiZt−i + Sεt (1.38)


from which the matrix that describes the long-run effect of the structural shocks on the variables of interest can be derived as follows:

Zt − ∑_{i=1}^{p} AiZt−i = Sεt

(I − ∑_{i=1}^{p} AiL^i)Zt = Sεt

Zt = (I − ∑_{i=1}^{p} AiL^i)⁻¹ Sεt

For the long-run restrictions, L may be set equal to one, so the long-run response is

(I − ∑_{i=1}^{p} Ai)⁻¹ Sεt = −Π⁻¹Sεt

The coefficients in Π are obtained from the reduced form; therefore, we are able to impose long-run restrictions given an estimate of the reduced form. Two points are worth noting:

1. (I − A1 − . . . − Ap) equals −Π; for this matrix to be invertible the VAR must be specified in stationary variables;

2. the long-run restrictions are restrictions on the cumulative impulse response function.

1.16 VAR Processes with Integrated Variables⁴

Consider the following VAR process

A(L)Zt = ut

where A(L) = I − A1L− . . .− ApLp and L is the lag operator. Multiplying from the left by the

adjugate A(L)adj of A(L) gives

|A(L)|Zt = A(L)adjut

where |A(L)| denotes the determinant of A(L). (For a description of the adjugate matrix see

the following box.) Thus, the VAR(p) process can be written as a process with a univariate AR operator.

That is, all components have the same AR operator. The right-hand side, A(L)adjut, is a finite

order MA process. If |A(L)| has d unit roots and otherwise all roots are outside the unit circle,

the AR operator can be written as

|A(L)|= α(L)(1− L)d = α(L)∆d

where α(L) is an invertible operator. Consequently, ∆dZt is a stable process. Hence, each component becomes stable upon differencing.

⁴ Source: Lütkepohl, New Introduction to Multiple Time Series Analysis, 2005, p. 243.


Theory: Minors and Cofactors

Given a general (n × n) matrix A = (aij), the minor of the ij-element aij is the determinant of the ((n − 1) × (n − 1)) matrix obtained by deleting the i-th row and the j-th column of A. The cofactor of aij, denoted by Aij, is the minor multiplied by (−1)^{i+j}.

Example 1.6: Minors and Cofactors

Given the (3 × 3) matrix

$$A = \begin{bmatrix} 2 & 1 & 3 \\ 0 & 2 & 1 \\ 1 & -1 & 4 \end{bmatrix}$$

the minor of the lower-right element is

$$\det\begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix} = 4$$

and the cofactor is 4 · (−1)^{3+3} = 4.

Theory: Adjugate matrix

The transpose of the (n × n) matrix of cofactors,

$$A^{adj} = \begin{bmatrix} A_{11} & \dots & A_{n1} \\ \vdots & \ddots & \vdots \\ A_{1n} & \dots & A_{nn} \end{bmatrix},$$

is the adjugate matrix of A. The following property of the adjugate matrix holds:

A⁻¹ = det(A)⁻¹ Aadj

Note: the adjugate matrix is sometimes also called the "adjoint" matrix, but this term is ambiguous. Today, "adjoint" normally refers to the conjugate transpose.

1.17 Cointegrated Processes

A process Zt is called integrated of order d, Zt ∼ I(d), if ∆dZt is stable and ∆d−1Zt is not

stable. The I(d) process Zt is called cointegrated if there is a linear combination β′Zt =

(β1 . . . βK)(Z1t . . . ZKt)′ with β ≠ 0 which is integrated of order less than d. A cointegrating

vector is not unique. Multiplying by a nonzero constant yields a further cointegrating vector.

Also, there may be various linearly independent cointegrating vectors. Before the concept of

cointegration was introduced, the closely related error correction models were discussed in the

econometrics literature. In an error correction model, the changes in a variable depend on the

deviations from some equilibrium relation. Suppose that Z1t represents the price of a commodity

in a particular market and that Z2t is the corresponding price of the same commodity in another

market. Assume furthermore that the equilibrium relation between the two variables is given by


Z1t = β1Z2t and that the changes in Z1t depend on the deviations from this equilibrium in period

t− 1.

∆Z1t = α1(Z1,t−1 − β1Z2,t−1) + u1t

Something similar may hold for Z2t

∆Z2t = α2(Z1,t−1 − β1Z2,t−1) + u2t

A more general model may in addition depend on previous changes in both variables. This can

be written in vector and matrix notation

∆Zt = αβ′Zt−1 + Γ1∆Zt−1 + ut (1.39)

where Zt = (Z1t, Z2t)′, ut = (u1t, u2t)′, α = (α1, α2)′, β′ = (1, −β1) and

$$\Gamma_1 = \begin{bmatrix} \gamma_{11,1} & \gamma_{12,1} \\ \gamma_{21,1} & \gamma_{22,1} \end{bmatrix}$$

Rearranging gives the VAR(2) representation

Zt = (I + Γ1 + αβ′)Zt−1 − Γ1Zt−2 + ut

Hence, the cointegrated variables may be generated by a VAR process. To see how cointegration

can arise more generally in K-dimensional VAR models, consider the VAR(2) process

Zt = A1Zt−1 +A2Zt−2 + ut (1.40)

with Zt = (Z1t, . . . , ZKt)′. Suppose the process is unstable with

|I −A1z −A2z2|= (1− λ1z) · . . . · (1− λnz) = 0 for z = 1

Because the λi are the reciprocals of the roots of the determinantal polynomial, one or more of

them must be equal to 1. All other roots are assumed to lie outside the unit circle, that is, all λi

which are not 1 are inside the complex unit circle. Because |I −A1 −A2|= 0, the matrix

Π = −(I −A1 −A2)

is singular. Suppose rank(Π) = r < K. Then Π can be decomposed as Π = αβ′, where α and

β are (K × r) matrices. From the discussion in the previous section, we know that each variable

becomes stationary upon differencing. Let us assume that differencing once is sufficient, subtract

Zt−1 on both sides of (1.40) and rearrange terms as

Zt − Zt−1 = −(I −A1 −A2)Zt−1 −A2Zt−1 +A2Zt−2 + ut

or

∆Zt = ΠZt−1 + Γ1∆Zt−1 + ut (1.41)

where Γ1 = −A2, or

αβ′Zt−1 = ∆Zt − Γ1∆Zt−1 − ut


Because the right-hand side involves stationary terms only, αβ′Zt−1 must also be stationary and

it remains stationary upon multiplication by (α′α)−1α′. In other words, β′Zt is stationary and,

hence, each element of β′Zt represents a cointegrating relation. Note that simply taking first

differences of all variables in (1.40) eliminates the cointegration term which may well contain

relations of great importance for a particular analysis. Moreover, in general, a VAR process with

cointegrated variables does not admit a pure VAR representation in first differences.

In the following, we will be interested in the specific case where all individual variables are I(1)

or I(0). The K-dimensional VAR(p) process

Zt = A1Zt−1 + . . .+ApZt−p + ut (1.42)

is called cointegrated of rank r if

Π = −(I −A1 − . . .−Ap)

has rank r and, thus Π can be written as a matrix product αβ′ with α and β being of dimension

(K × r) and of rank r. The matrix β is called a cointegrating or cointegration matrix or

cointegration vector, and α is sometimes called the loading matrix. If r = 0, ∆Zt has a VAR(p−1) representation and, for r = K, |I − A1 − . . . − Ap| = |−Π| ≠ 0; hence, the VAR operator has no unit roots and Zt is a stable VAR(p) process.

Rewriting (1.42) as in (1.41) it has a vector error correction model (VECM) representation

∆Zt = ΠZt−1 + Γ1∆Zt−1 + . . .+ Γp−1∆Zt−p+1 + ut

= αβ′Zt−1 + Γ1∆Zt−1 + . . .+ Γp−1∆Zt−p+1 + ut (1.43)

where

Γi = −(Ai+1 + . . .+Ap) for i = 1, . . . , p− 1

If this representation of a cointegrated process is given, it is easy to recover the corresponding

VAR representation (1.42) by noting that

A1 = Π + I + Γ1

Ai = Γi − Γi−1 i = 2, . . . , p− 1

Ap = −Γp−1
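The mapping between the VECM coefficients (Π, Γ1, . . . , Γp−1) and the level-VAR coefficients (A1, . . . , Ap) is easy to code up. A minimal sketch in Python/NumPy, assuming Π and the Γi are given (e.g. from a Johansen-type estimation); the numerical values below are purely illustrative:

import numpy as np

def vecm_to_var(Pi, Gammas):
    """Recover A_1,...,A_p of the level VAR from the VECM matrices Pi and Gamma_1,...,Gamma_{p-1}."""
    k = Pi.shape[0]
    p = len(Gammas) + 1
    if p == 1:
        return [Pi + np.eye(k)]
    A = [None] * p
    A[0] = Pi + np.eye(k) + Gammas[0]          # A_1 = Pi + I + Gamma_1
    for i in range(1, p - 1):
        A[i] = Gammas[i] - Gammas[i - 1]       # A_{i+1} = Gamma_{i+1} - Gamma_i
    A[p - 1] = -Gammas[p - 2]                  # A_p = -Gamma_{p-1}
    return A

alpha = np.array([[-0.2], [0.1]])
beta = np.array([[1.0], [-1.0]])
Gamma1 = np.array([[0.3, 0.0], [0.0, 0.2]])
A1, A2 = vecm_to_var(alpha @ beta.T, [Gamma1])   # bivariate VECM(1) as in (1.39)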

1.18 Identification in Cointegrated VARs⁵

Consider, for simplicity, only first-order dynamics and the cointegrated reduced form

∆zt = Πzt−1 + ut

where Π = αβ′. Identification of the cointegrating vectors is a problem entirely separate from

identification of the structural shocks of interest. Therefore, having solved the identification of the

⁵ Source: Favero, Applied Macroeconometrics, 2001, Chap. 6.


cointegrating relationships, we still have to deal with the problem of imposing appropriate restrictions

on the parameters of the S matrix in order to pin down the shocks εt.

∆zt = Πzt−1 + Sεt

In the context of cointegration, the identification problem can be solved in a very natural way.

Consider, for simplicity, the case of a bivariate model zt = (yt, xt)′, in which the variables are non-stationary I(1) but cointegrated with cointegrating vector (1, −1), so that the rank of the Π matrix is 1, and use the following representation of the stationary reduced form:

$$\begin{bmatrix} \Delta y_t \\ \Delta x_t \end{bmatrix} = \begin{bmatrix} \alpha_{11} \\ \alpha_{21} \end{bmatrix}\begin{bmatrix} 1 & -1 \end{bmatrix}\begin{bmatrix} y_{t-1} \\ x_{t-1} \end{bmatrix} + \begin{bmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{bmatrix}\begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix} \qquad (1.44)$$

The system (1.44) can be re-written as follows:

$$\begin{bmatrix} -1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} (1-L) & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} (y_t - x_t) \\ \Delta x_t \end{bmatrix} = \begin{bmatrix} \alpha_{11} & 0 \\ \alpha_{21} & 0 \end{bmatrix}\begin{bmatrix} (y_{t-1} - x_{t-1}) \\ \Delta x_{t-1} \end{bmatrix} + \begin{bmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{bmatrix}\begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix} \qquad (1.45)$$

The second representation has been widely used in research based on present value models. The

cointegrating properties of the system suggest the presence of two types of shocks: a permanent

one (related to the single common trend shared by the two variables) and a transitory one (related

to the cointegrating relation). It seems therefore natural to identify one shock as permanent and

the other as transitory. Given that we have a stationary system, the identification of shocks is

obtained by deriving long-run responses of the variables of interest to relevant shocks. Rearranging

(1.45) gives ([−1 1

0 1

][(1− L) 0

0 1

]−

[α11L 0

α21L 0

])[(yt − xt)

∆xt

]=

[s11 s12

s21 s22

][ε1,t

ε2,t

]

from which long-run responses are obtained by setting L = 1 and by inverting the matrix premultiplying the variables in the stationary representation of the VAR:

$$\begin{bmatrix} (y_t - x_t) \\ \Delta x_t \end{bmatrix} = \begin{bmatrix} -\alpha_{11} & 1 \\ -\alpha_{21} & 1 \end{bmatrix}^{-1}\begin{bmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{bmatrix}\begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix} \qquad (1.46)$$

$$= \frac{1}{-\alpha_{11} + \alpha_{21}}\begin{bmatrix} s_{11} - s_{21} & s_{12} - s_{22} \\ \alpha_{21}s_{11} - \alpha_{11}s_{21} & \alpha_{21}s_{12} - \alpha_{11}s_{22} \end{bmatrix}\begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix} \qquad (1.47)$$

Thus ε2,t can be identified as the transitory shock by imposing the restriction

α21s12 − α11s22 = 0

which, given knowledge of the α parameters from the cointegration analysis, provides the just-identifying restriction for the parameters in S. Note that there is one case in which this identification is equivalent to the Cholesky ordering, namely the case in which α11 = 0. This is the case in which ∆yt is weakly exogenous for the estimation of s21.


2 Topic 2: The Kalman Filter

Introduction⁶

Developed by Kalman (1960) and Kalman and Bucy (1961) for the Apollo missions, this filtering

technique is an important topic in engineering (control theory) but has also been widely used in

economics. An illustrative application is radar tracking of an airplane in mid-air: Depending on

the turning speed of the radar, flight controllers would see the current location of the airplane, for instance, every three seconds on the screen. Using these periodic measurements, Kalman filtering

estimates the values between them (filtering) and predicts the values until the next measurement

occurs such that the flight controller sees the position of the airplane continuously (e.g. updated

every 0.5 seconds). After each turn of the radar, the current position of the plane is observed and

the coefficients of the filter are updated correspondingly yielding the optimal estimator for the

next 3 seconds.

In an economic context the Kalman filter can be used to disaggregate time series by related series

(e.g. from a quarterly to a monthly periodicity), where the dates between the observations are

treated as missing values. See Section 2.7 for further examples of economic applications.

2.1 State Space Modelling⁷

Before being able to run the Kalman filter, the model needs to be specified in a special form called

state-space form. (Setting up a model in the appropriate state space form is the really tricky

part, since the Kalman filter is just a set of mechanical transformations applied to it.) The state

space form basically consists of two equations. The first one is called measurement or observation

equation and relates the observable outcome yt to a (partly) unobserved state vector αt:

yt = Ztαt + dt + εt, t = 1, . . . , T (2.1)

where yt, dt and εt are N × 1 vectors, αt is an m × 1 vector and Zt is an N × m matrix. N

is the number of time series used and m is the dimension of the state vector (e.g. two in the

univariate example 2.2). The elements of the vector εt are serially uncorrelated and E(εt) = 0

and Var(εt) = Ht. The transition or state equation - the second equation - describes how the

unobserved state is generated and how it evolves:

αt = Ttαt−1 + ct +Rtηt, t = 1, . . . , T (2.2)

where αt, αt−1 and ct are m × 1 vectors, Tt is an m × m matrix, Rt is an m × g matrix, and ηt is a g × 1 vector whose elements are again serially uncorrelated with E(ηt) = 0 and Var(ηt) = Qt. g does not need to be equal to m because not every state needs to change over time.

⁶ I thank Matthias Kurmann for the preparation of these lecture notes.
⁷ The notation used in these lecture notes is mainly based on Harvey, Andrew C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter.


The specification is completed by two further assumptions:

1. Mean and covariance matrix of the initial state vector α0 are given by

E(α0) = a0 (2.3)
Var(α0) = P0 (2.4)

2. εt and ηt are uncorrelated with each other in all time periods, and both are uncorrelated with the initial state vector. Formally:

E(εtηs′) = 0, ∀ s, t = 1, . . . , T, and (2.5)
E(εtα0′) = 0, E(ηtα0′) = 0, t = 1, . . . , T (2.6)

In general, the system matrices Zt, dt, Ht, Tt, ct, Rt and Qt are non-stochastic. They can change

but in a predetermined way such that yt can be expressed as a linear combination of present and

past εt’s and ηt’s and the initial state vector. If these matrices do not change over time (i.e. the

time subscripts can be dropped), the model is said to be time-invariant or time-homogeneous.

Example 2.1: Conversion of an ARMA model into a state space model.

Given the univariate ARMA(p,q) model

yt − µ = φ1(yt−1 − µ) + φ2(yt−2 − µ) + . . . + φr(yt−r − µ) + εt + θ1εt−1 + . . . + θr−1εt−r+1

where r = max(p, q + 1) and the coefficients are interpreted as zero for j > p, q (i.e. φj = 0 for j > p and θj = 0 for j > q), the state or transition equation is

$$\xi_{t+1} = \begin{bmatrix} \phi_1 & \phi_2 & \dots & \phi_{r-1} & \phi_r \\ 1 & 0 & \dots & 0 & 0 \\ 0 & 1 & \dots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \dots & 1 & 0 \end{bmatrix}\xi_t + \begin{bmatrix} \varepsilon_{t+1} \\ 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (2.7)$$

and the observation or measurement equation is

$$y_t = \mu + \begin{bmatrix} 1 & \theta_1 & \theta_2 & \dots & \theta_{r-1} \end{bmatrix}\xi_t \qquad (2.8)$$

(See Hamilton, p. 374-375, for further examples concerning AR, MA and ARMA processes.)

Example 2.2: State space representation of a random walk plus drift model (simple form

of a Basic Structural Model).

yt = µt + εt

µt = µt−1 + β + ηt


where µt follows a random walk with β as drift parameter. The state space form is

$$y_t = \begin{bmatrix} 1 & 0 \end{bmatrix}\alpha_t + \varepsilon_t, \qquad t = 1, \dots, T$$

$$\alpha_t = \begin{bmatrix} \mu_t \\ \beta_t \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \mu_{t-1} \\ \beta_{t-1} \end{bmatrix} + \begin{bmatrix} \eta_t \\ 0 \end{bmatrix}$$

Despite being a constant, β is treated as part of the state but, because of the structure of the transition equation (and the zero disturbance attached to it), it does not change over time.

Extension: Inclusion of seasonal dummies for a quarterly series.

$$y_t = \begin{bmatrix} 1 & 0 & 1 & 0 & 0 \end{bmatrix}\alpha_t + \varepsilon_t$$

$$\alpha_t = \begin{bmatrix} \mu_t \\ \beta_t \\ \delta_t \\ \delta_{t-1} \\ \delta_{t-2} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & -1 & -1 & -1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} \mu_{t-1} \\ \beta_{t-1} \\ \delta_{t-1} \\ \delta_{t-2} \\ \delta_{t-3} \end{bmatrix} + \begin{bmatrix} \eta_t \\ 0 \\ \omega_t \\ 0 \\ 0 \end{bmatrix}$$

Due to the dummy variable trap only three dummies are included. In this setting y only depends on the first seasonal dummy, which in turn depends negatively on its own lagged values. Notice that a variance is defined only for the first seasonal dummy.

Example 2.3: State space representation of an MA(1) process.

yt = µ + εt + θεt−1

Its state space form looks as follows:

$$y_t = \mu + \begin{bmatrix} 1 & \theta \end{bmatrix}\begin{bmatrix} \varepsilon_t \\ \varepsilon_{t-1} \end{bmatrix}$$

$$\begin{bmatrix} \varepsilon_{t+1} \\ \varepsilon_t \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} \varepsilon_t \\ \varepsilon_{t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{t+1} \\ 0 \end{bmatrix}$$

2.2 Kalman Filter

After having specified the state space form, the Kalman filter can be applied. It is a recursive

procedure computing the optimal estimator of the state vector at time t given the information

available at time t. The crucial assumption of the Kalman filter is that the disturbances and the

initial state vector are normally distributed since otherwise it would not give the conditional mean

of the state vector.⁸

⁸ See section 2.3 for the derivation of the filter.


2.2.1 The Filter

The filtering process is a rather mechanical procedure consisting of prediction equations and

updating equations. Let at−1 denote the optimal estimator of the true state αt−1 based on the

information up to yt−1. Given at−1 and Pt−1, the optimal estimator of αt is given by

at|t−1 = Ttat−1 + ct (2.9)

while the covariance matrix of the estimation error is

Pt|t−1 = TtPt−1Tt′ + RtQtRt′, t = 1, . . . , T (2.10)

where Pt−1 is defined as the squared deviation of the estimator from the true value of the state:

Pt−1 = E[(αt−1 − at−1)(αt−1 − at−1)′] (2.11)

or, writing vt+1 = Rηt+1 for the state disturbance in a time-invariant model,

E(αt+1αt+1′) = E[(Tαt + vt+1)(αt′T′ + vt+1′)]
             = T · E(αtαt′) · T′ + E(vt+1vt+1′)
             = TPT′ + R · E(ηt+1ηt+1′) · R′ (2.12)

Equations (2.9) and (2.10) are known as prediction equations and each time a new observation of

yt becomes available, the estimator at is updated:

at = at|t−1 + Pt|t−1Zt′Ft⁻¹vt (2.13)

and

Pt = Pt|t−1 − Pt|t−1Zt′Ft⁻¹ZtPt|t−1 (2.14)

where vt is the prediction error, the deviation of the new observation from its predicted value,

vt = yt − yt|t−1 = yt − Ztat|t−1 − dt (2.15)

and

Ft = ZtPt|t−1Zt′ + Ht, t = 1, . . . , T (2.16)

Equations (2.13) and (2.14) are known as updating equations and Ft is the covariance matrix of

the innovations. Given the starting values for a1|0 and P1|0, the Kalman filter recursions yield

the optimal estimator of the present and one-step ahead state vector based on the information

available up to time t. The prediction errors vt, known as the innovations, play a crucial role

in the updating of the state vector (i.e. the larger the deviation measured by vt, the greater the

correction of at), since they represent the new information in the latest observation. Alternatively,

equations (2.9) and (2.10) can be written in a form known as the Riccati equation, which implies that the recursions go from at|t−1 to at+1|t instead of from at−1 to at:

at+1|t = (Tt+1 −KtZt)at|t−1 +Ktyt + (ct+1 −Ktdt) (2.17)

and

Pt+1|t = Tt+1(Pt|t−1 − Pt|t−1Zt′Ft⁻¹ZtPt|t−1)Tt+1′ + Rt+1Qt+1Rt+1′ (2.18)


where

Kt = Tt+1Pt|t−1Zt′Ft⁻¹ (2.19)

Equation (2.19) is called the Kalman gain and will be used in different situations later on.
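For concreteness, one pass through the prediction and updating equations (2.9), (2.10) and (2.13)-(2.16) can be coded as in the minimal sketch below (Python/NumPy, time-invariant system matrices; all arguments are generic placeholders rather than a particular model from the text). Looping it over t = 1, . . . , T and storing the innovations vt and their covariances Ft also provides the ingredients of the prediction-error-decomposition likelihood discussed in Section 2.5.1.

import numpy as np

def kalman_step(a_prev, P_prev, y, Z, d, H, T, c, R, Q):
    """One Kalman filter recursion: predict the state, then update it with the new observation y."""
    # Prediction equations (2.9) and (2.10)
    a_pred = T @ a_prev + c
    P_pred = T @ P_prev @ T.T + R @ Q @ R.T
    # Innovation and its covariance, (2.15) and (2.16)
    v = y - (Z @ a_pred + d)
    F = Z @ P_pred @ Z.T + H
    # Updating equations (2.13) and (2.14)
    PZF = P_pred @ Z.T @ np.linalg.inv(F)
    a = a_pred + PZF @ v
    P = P_pred - PZF @ Z @ P_pred
    return a, P, v, F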

2.2.2 Prediction

Running the prediction and updating equations up to time T yields the final value of the state

vector aT+1|T . Substituting aT+1|T in equation (2.1) gives then the one-step ahead prediction of

y:

yT+1|T = ZT+1aT+1|T + dT+1

However, Kalman filtering also enables multi-step prediction. Substituting l times in the transition equation (2.2) yields the state vector αT+l based on the information available up to time T, meaning that the recursions skip the updating of the coefficients:

$$\alpha_{T+l} = \left[\prod_{j=1}^{l} T_{T+j}\right]\alpha_T + \sum_{j=1}^{l-1}\left[\prod_{i=j+1}^{l} T_{T+i}\right]\left[R_{T+j}\eta_{T+j} + c_{T+j}\right] + R_{T+l}\eta_{T+l} + c_{T+l}, \qquad l = 2, 3, \dots \qquad (2.20)$$

Taking conditional expectations in equation (2.20) gives the optimal estimator of αT+l:

$$E_T(\alpha_{T+l}) = a_{T+l|T} = \left[\prod_{j=1}^{l} T_{T+j}\right]a_T + \sum_{j=1}^{l-1}\left[\prod_{i=j+1}^{l} T_{T+i}\right]c_{T+j} + c_{T+l}, \qquad l = 2, 3, \dots \qquad (2.21)$$

Inserting (2.20) and (2.21) in the definition of Pt−1 in (2.11) gives PT+l|T. In the time-invariant case the appropriate expression is:

$$P_{T+l|T} = T^{l}P_T T^{l\prime} + \sum_{j=0}^{l-1} T^{j}RQR'T^{j\prime}, \qquad l = 1, 2, \dots \qquad (2.22)$$

Prediction can also be used if there are missing values in a time series.⁹ There is no updating of the estimator as long as there are gaps. As soon as a new observation becomes available in τ + l, the estimator is updated based on the information at τ − 1, and the prediction error now comprises l + 1 steps.

2.2.3 Smoothing

Up to now, the availability of new information led only to an updating of the state vector αt

(i.e. E(αt|Yt)) but the formerly estimated values have not been adjusted according to the new

information. Smoothing algorithms take account of information made available after time t (i.e.

at|T = E(αt|Yt)). For example after having estimated at|t, the information contained in the

estimation of at+1|t+1 is used by smoothing algorithms to update the estimate of αt. Since the

⁹ See section 2.7 for an example of how missing values are treated.


smoother is based on more information than the filtered estimator, its MSE will, in general, be

smaller than that of the filtered estimator¹⁰ (Pt|T ≤ Pt).

Basically, there are three different algorithms: fixed-point smoothing, fixed-lag smoothing and fixed-interval smoothing.¹¹ The choice of the smoother depends on the application since these

algorithms have different properties: Fixed-point smoothing computes smoothed estimates of the

state vector aτ |t for particular values of τ at all time periods t > τ . Thus it is an on-line smoother

running in parallel with the filtering recursions. On the other hand, fixed-interval smoothing is

an off-line technique running backwardly from t = T, . . . , 1 while producing smoothed estimates

with the estimator at|T . It is the most widely used algorithm for economic and social data.

Fixed-point smoothing

The idea behind this smoother is to estimate ατ at t = τ and to augment the Kalman filter with

this estimate for the subsequent recursions in periods t = τ, . . . , T. Here T does not have to be

fixed (e.g. on-line measurement in a production process).

The starting values are the estimates of aτ |τ−1 and Pτ |τ−1 obtained by the first τ − 1 normal

Kalman recursions (For the remaining recursions τ is fixed). For the periods t = τ, τ + 1, . . . , T

the state vector is augmented by ατ giving the following state-space model:

y_t = \begin{bmatrix} Z_t & 0 \end{bmatrix} α^{\dagger}_t + d_t + ε_t,   t = τ, τ+1, . . . , T   (2.23)

α^{\dagger}_t = \begin{bmatrix} α_t \\ α^{*}_t \end{bmatrix} = \begin{bmatrix} T_t & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} α_{t-1} \\ α^{*}_{t-1} \end{bmatrix} + \begin{bmatrix} c_t \\ 0 \end{bmatrix} + \begin{bmatrix} R_t \\ 0 \end{bmatrix} η_t   (2.24)

where a∗t denotes the estimator of ατ and, therefore, α∗t = α∗t−1 for all periods after t = τ . The

corresponding optimal estimator is

a^{\dagger}_{t+1|t} = \begin{bmatrix} a_{t+1|t} \\ a^{*}_{t+1|t} \end{bmatrix} = \left( \begin{bmatrix} T_{t+1} & 0 \\ 0 & I \end{bmatrix} - \begin{bmatrix} K_t \\ K^{*}_t \end{bmatrix} \begin{bmatrix} Z_t & 0 \end{bmatrix} \right) \begin{bmatrix} a_{t|t-1} \\ a^{*}_{t|t-1} \end{bmatrix} + \begin{bmatrix} K_t \\ K^{*}_t \end{bmatrix} y_t + \left( \begin{bmatrix} c_{t+1} \\ 0 \end{bmatrix} - \begin{bmatrix} K_t \\ K^{*}_t \end{bmatrix} d_t \right)   (2.25)

P^{\dagger}_{t+1|t} = \begin{bmatrix} P_{t+1|t} & P^{*}_{t+1|t} \\ P^{*}_{t+1|t} & P^{**}_{t+1|t} \end{bmatrix} = \begin{bmatrix} T_{t+1} & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} P_{t|t-1} & P^{*}_{t|t-1} \\ P^{*}_{t|t-1} & P^{**}_{t|t-1} \end{bmatrix} \left( \begin{bmatrix} T'_{t+1} & 0 \\ 0 & I \end{bmatrix} - \begin{bmatrix} Z'_t \\ 0 \end{bmatrix} \begin{bmatrix} K'_t & K^{*\prime}_t \end{bmatrix} \right) + \begin{bmatrix} R_{t+1} \\ 0 \end{bmatrix} Q_{t+1} \begin{bmatrix} R'_{t+1} & 0 \end{bmatrix},   t = τ, . . . , T   (2.26)

Both equations (2.25) and (2.26) have the form of the Riccati equations (2.17), (2.18) and (2.19), and their starting values are

a^{\dagger}_{τ|τ-1} = \begin{bmatrix} a_{τ|τ-1} \\ a_{τ|τ-1} \end{bmatrix}   and   P^{\dagger}_{τ|τ-1} = \begin{bmatrix} P_{τ|τ-1} & P_{τ|τ-1} \\ P_{τ|τ-1} & P_{τ|τ-1} \end{bmatrix}

However, this form allows the Kalman recursions and the smoothing recursions to be separated, so that the two recursions run independently and both the smoothed and the unsmoothed

10 In a time-invariant model the gain is greater the larger is H relative to Q.
11 Fixed-lag smoothing is, according to Harvey (1989), less important and is, therefore, not presented here.


results can be obtained. Solving (2.26) for the gain matrix yields

\begin{bmatrix} K_t \\ K^{*}_t \end{bmatrix} = \begin{bmatrix} T_{t+1} P_{t|t-1} Z'_t F^{-1}_t \\ P^{*}_{t|t-1} Z'_t F^{-1}_t \end{bmatrix}   (2.27)

Substituting (2.27) into (2.25) and (2.26) gives two separate state-space recursions, the first being (2.17) and the second being the smoothing recursion

a^{*}_{t+1|t} = a^{*}_{t|t-1} + K^{*}_t v_t,   t = τ, . . . , T   (2.28)

where vt are the innovations produced by the Kalman filter of equation (2.17). Similarly, equation

(2.26) can be decomposed into the original recursion (2.18) and the smoothing recursions

P^{*}_{t+1|t} = P^{*}_{t|t-1}\,[T_{t+1} - K_t Z_t]'   (2.29)

and

P^{**}_{t+1|t} = P^{**}_{t|t-1} - P^{**}_{t|t-1} Z'_t K^{*\prime}_t,   t = τ, . . . , T   (2.30)

where equation (2.30) gives the MSE matrix of the smoothed estimator.

Fixed-interval smoothing

As mentioned above, this smoother uses aT and PT given by the last Kalman recursion at time

T and works backwards. T is required to be fixed which is often the case in economic research.

This algorithm requires that at and Pt be stored after each Kalman recursion. The equations are

as follows:

a_{t|T} = a_t + P^{*}_t (a_{t+1|T} - T_{t+1} a_t)   (2.31)

and

P_{t|T} = P_t + P^{*}_t (P_{t+1|T} - P_{t+1|t}) P^{*\prime}_t   (2.32)

where

a_{T|T} = a_T,   P_{T|T} = P_T   and   P^{*}_t = P_t T'_{t+1} P^{-1}_{t+1|t},   t = T - 1, . . . , 1
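As an illustration, the backward recursions (2.31) and (2.32) can be written down directly. The sketch below assumes a time-invariant transition matrix T and that the filtered quantities a_t, P_t and the one-step-ahead quantities a_{t+1|t}, P_{t+1|t} were stored during the forward pass; all names are placeholders.

```python
import numpy as np

def fixed_interval_smoother(a_filt, P_filt, a_pred, P_pred, T):
    """Fixed-interval smoothing (2.31)-(2.32).
    a_filt[t], P_filt[t]: filtered a_t, P_t;  a_pred[t], P_pred[t]: a_{t+1|t}, P_{t+1|t}."""
    n = len(a_filt)
    a_sm, P_sm = [None] * n, [None] * n
    a_sm[-1], P_sm[-1] = a_filt[-1], P_filt[-1]            # a_{T|T} = a_T, P_{T|T} = P_T
    for t in range(n - 2, -1, -1):
        Pstar = P_filt[t] @ T.T @ np.linalg.inv(P_pred[t])  # P*_t = P_t T' P_{t+1|t}^{-1}
        # a_pred[t] = T a_t (+ constant), so the c-term is handled automatically
        a_sm[t] = a_filt[t] + Pstar @ (a_sm[t + 1] - a_pred[t])
        P_sm[t] = P_filt[t] + Pstar @ (P_sm[t + 1] - P_pred[t]) @ Pstar.T
    return a_sm, P_sm
```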

2.3 Derivation of the Kalman Filter

The Kalman filter can be derived from two different perspectives. The lecture focuses on the

derivation from conditional normality (section 2.3.1) and the derivation from linear projections

is covered in section 2.3.2 which is supplementary. Derivations are based on Harvey (1989) and

Hamilton (1994).

2.3.1 Conditional Normality

Under the normality assumption, the initial state vector, α0, has a multivariate normal distribution

with mean a0 and covariance matrix P0. Additionally, the disturbances ηt and εt also have

multivariate normal distributions and are distributed independently of each other and of α0. As


can be seen from the state vector at t = 1, α_1 is just a linear combination of two vectors of random variables, both with multivariate normal distributions, and a vector of constants: α_1 = T_1 α_0 + c_1 + R_1 η_1.

Hence, it is itself multivariate normal with a mean of

E(α1) = a1|0 = T1a0 + c1 (2.33)

and a covariance matrix

P_{1|0} = T_1 P_0 T'_1 + R_1 Q_1 R'_1   (2.34)

Note that the equations are based on the initial conditions but not on y at t = 0. In order to

obtain the distribution of α1 conditional on y1, write

α1 = a1|0 + (α1 − a1|0) (2.35)

and

y1 = Z1a1|0 + d1 + Z1(α1 − a1|0) + ε1 (2.36)

(both equations obviously hold in any case). From equations (2.35) and (2.36) it can be seen that

the vector [α'_1  y'_1]' has a multivariate normal distribution:

\begin{bmatrix} α_1 \\ y_1 \end{bmatrix} \sim N\left( \begin{bmatrix} a_{1|0} \\ Z_1 a_{1|0} + d_1 \end{bmatrix}, \begin{bmatrix} P_{1|0} & P_{1|0} Z'_1 \\ Z_1 P_{1|0} & Z_1 P_{1|0} Z'_1 + H_1 \end{bmatrix} \right)   (2.37)

Lemma: (See Appendix A for proof) Let the pair of vectors x and y be jointly multivariate

normal such that

\begin{bmatrix} x \\ y \end{bmatrix} \sim N\left( \begin{bmatrix} μ_x \\ μ_y \end{bmatrix}, \begin{bmatrix} Σ_{xx} & Σ_{xy} \\ Σ_{yx} & Σ_{yy} \end{bmatrix} \right)   (2.38)

Then the distribution of x conditional on y is also multivariate normal with mean

μ_{x|y} = μ_x + Σ_{xy} Σ^{-1}_{yy} (y - μ_y)   (2.39)

and covariance matrix

Σ_{xx|y} = Σ_{xx} - Σ_{xy} Σ^{-1}_{yy} Σ_{yx}   (2.40)

Applying this lemma to (2.37) gives the result that the distribution of α1, conditional on a par-

ticular value of y1, is multivariate normal with mean

a1 = a1|0 + P1|0Z′1F−11 (y1 − Z1a1|0 − d1) (2.41)

and covariance matrix

P1 = P1|0 − P1|0Z′1F−11 Z1P1|0 (2.42)

where

F1 = Z1P1|0Z′1 +H1 (2.43)
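The update implied by the lemma can be written compactly in code. The following numpy sketch applies (2.39) and (2.40) to the joint distribution (2.37) and therefore reproduces (2.41)-(2.43); the arguments are placeholders for the predicted moments and the measurement system.

```python
import numpy as np

def conditional_normal_update(a_pred, P_pred, y, Z, d, H):
    """Mean and covariance of alpha conditional on y, cf. (2.41)-(2.43)."""
    F = Z @ P_pred @ Z.T + H                        # (2.43)
    v = y - Z @ a_pred - d                          # innovation
    PZ = P_pred @ Z.T                               # Sigma_xy of the lemma
    a_upd = a_pred + PZ @ np.linalg.solve(F, v)     # (2.41)
    P_upd = P_pred - PZ @ np.linalg.solve(F, PZ.T)  # (2.42)
    return a_upd, P_upd
```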

These equations have exactly the same form as the Kalman filter. Repeating the steps used

up to now in this chapter for t = 2, . . . , T is the same as running the Kalman filter recursions


mentioned in chapter 2.2.1. However, the derivation given so far only allows at and Pt to be in-

terpreted as the mean and covariance matrix of the conditional distribution of αt. The purpose of

the following section is therefore to provide the reasons why this mean and this covariance matrix

can be interpreted as optimal estimator and covariance matrix of the estimation error respectively.

The conditional mean of y_{T+l} at time T is a minimum mean square estimator [MMSE], defined as

y_{T+l|T} = E(y_{T+l}|Y_T) = E_T(y_{T+l})   (2.44)

whose estimation error can be split into two parts12

y_{T+l} - y_{T+l|T} = [y_{T+l} - E(y_{T+l}|Y_T)] + [E(y_{T+l}|Y_T) - y_{T+l|T}]   (2.45)

The second term on the right-hand side is fixed at time T; using this in (2.45) enables squaring of the expression without having to consider the cross-product. Squaring (2.45) and taking conditional expectations yields

MSE(y_{T+l|T}) = Var(y_{T+l}|Y_T) + [y_{T+l|T} - E(y_{T+l}|Y_T)]^2   (2.46)

The conditional variance of yT+l in equation (2.46) does not depend on yT+l|T . Hence, the MMSE

of yT+l is given by the conditional mean (2.44) and it is unique. These arguments can be directly

applied to (2.41) (in all periods). When taking the expectation over all the variables in the

information set, this estimator minimises the MSE13. In a linear Gaussian model, the MSE of

the Kalman filter is given by the matrix Pt which is totally independent of the observations (i.e.

deviations from yt do not influence Pt) which in turn implies that it is the unconditional error

covariance matrix, too. Furthermore, it follows from the definition of yt|t−1 that the innovations

vt are normally and independently distributed with mean zero and covariance matrix Ft:

v_t \sim NID(0, F_t)

(recall the definition of F_t in (2.16)) with

E(v_t v'_s) = 0   for t \neq s,   t, s = 1, . . . , T

It should be highlighted that the results on the distribution of the innovations only hold exactly if

the system matrices are fixed and known.14 Generally, this does not hold if these matrices contain

unknown hyperparameters.15

12 The separation of the estimation error is valid for any predictor conditional on the information available at time T.
13 Since the state vector is, as a general rule, random, it is actually not legitimate to speak of the conditional mean estimator having a covariance matrix, but this fact is neglected.
14 See section 3.2.4 in Harvey (1989) for an adjustment of the Kalman filter if the disturbances are contemporaneously correlated.
15 Hyperparameters are the parameters to be optimised in order to maximise the loglikelihood. See section 2.5.1 for further discussion.


2.3.2 Linear Projection

Given the set of observations, the problem addressed here is the derivation of an estimator of α_t that minimises the conditional mean square error, i.e.

a_t = \arg\min\, E\big[(α_t - a_{t|t-1})(α_t - a_{t|t-1})'\,\big|\,Y_t\big]   (2.47)

The estimator satisfying this condition is the conditional mean which can be shown as follows.

Let the cost function be given by

J = E\big[(α_t - a_{t|t-1})(α_t - a_{t|t-1})'\,\big|\,Y_t\big]   (2.48)

which, written as a scalar criterion, can be expanded as

J = E[α'_t α_t|Y_t] - E[α'_t|Y_t]\,a_{t|t-1} - a'_{t|t-1}\,E[α_t|Y_t] + a'_{t|t-1} a_{t|t-1}   (2.49)

Adding and subtracting E [α′t|Yt]E [αt|Yt] yields

J = E[α'_t α_t|Y_t] - E[α'_t|Y_t]E[α_t|Y_t] + \big[a_{t|t-1} - E[α_t|Y_t]\big]'\big[a_{t|t-1} - E[α_t|Y_t]\big]   (2.50)

The first two terms on the right hand side do not depend on the estimator at|t−1. The dependency

results from the quadratic term at the end and J is obviously minimised if the quadratic term is

zero, hence

at|t−1 = E [αt|Yt] (2.51)

Corollary (See Appendix B for proof): If f(Yt) is a given function of the observations Yt, then

the estimation error is orthogonal to f(Yt), αt − at|t−1 ⊥ f(Yt), which implies

E\big[(α_t - a_{t|t-1})\,f'(Y_t)\big] = 0   (2.52)

The following Figure presents a graphical interpretation of the corollary:

The space spanned by Yt is represented by the coloured area. The true parameter vector αt is not

known. The only available information is given by the set of observations causing the estimation

vector to lie in the space generated by Yt−1. The minimum mean square error estimator is such

that the estimation error is minimised which is obviously the orthogonal projection of αt on Yt−1.


So far, no assumptions on the probability distribution have been made. Assuming αt and Yt−1 to

be jointly Gaussian implies that the MMSE is an unbiased, linear combination of the observations

that also minimises the variance. If they are not jointly Gaussian, the conditional mean is, in general, not a linear function of the observations; the linear projection, however, is still the best estimator within the class of linear estimators.

When an additional observation is available, the estimator is updated using the formula for up-

dating a linear projection:

a_{t|t} = a_{t|t-1} + \frac{E\big[(α_t - a_{t|t-1})(y_t - y_{t|t-1})'\big]}{E\big[(y_t - y_{t|t-1})(y_t - y_{t|t-1})'\big]}\,(y_t - y_{t|t-1})   (2.53)

The term in the numerator corresponds to P_{t|t-1} Z'_t, the expression in the denominator corresponds to F_t, and the last bracket is the innovation v_t. Substituting into (2.53) yields

a_t = a_{t|t-1} + P_{t|t-1} Z'_t F^{-1}_t v_t   (2.54)

Similarly, the MSE matrix is updated by

P_t = E\big[(α_t - a_{t|t-1})(α_t - a_{t|t-1})'\big] - E\big[(α_t - a_{t|t-1})(y_t - y_{t|t-1})'\big] \times E\big[(y_t - y_{t|t-1})(y_t - y_{t|t-1})'\big]^{-1} \times E\big[(y_t - y_{t|t-1})(α_t - a_{t|t-1})'\big]   (2.55)

which corresponds to equation (2.14).

2.4 Properties of Time-Invariant Models

All formulas and properties derived so far apply to both time-varying and time-invariant models

where time-varying means that the system matrices are allowed to change. In most applications,

however, the model is time-invariant except for the vectors c_t and d_t. In this section the following model is therefore covered (notice that the subscripts on the system matrices have been dropped):

yt = Zαt + dt + εt, V ar(εt) = H (2.56)

αt = Tαt−1 + ct +Rηt, V ar(ηt) = Q (2.57)

with E(εtη′s) = 0 for all s, t.

The Kalman filter applied to such a time-invariant state space model is in a steady state if the

error covariance matrix is time-invariant, that is

Pt+1|t = P (2.58)

The recursion for the error covariance matrix is therefore redundant, while the recursion for the state becomes

a_{t+1|t} = \bar{T}\,a_{t|t-1} + K y_t + (c_{t+1} - K d_t)   (2.59)

where the transition and gain matrices are defined as

\bar{T} = T - K Z   (2.60)


and

K = T PZ ′(ZPZ ′ +H)−1 (2.61)

The steady-state filter is said to be stable if the roots of the system matrix \bar{T} are less than one in absolute value (i.e. |λ_i(\bar{T})| < 1, i = 1, . . . , m). A necessary condition for the Kalman filter to have a steady-state solution is the existence of a time-invariant error covariance matrix satisfying the Riccati equation (2.18). This condition holds if the model is stable and if the initial covariance matrix P_{1|0}16 is positive semi-definite, yielding

\lim_{t \to \infty} P_{t+1|t} = P   (2.62)

with P independent of the initial covariance matrix. A final point to note is that if the covariance matrix P_{t+1|t} has a steady-state solution, then F_t also converges to a steady state given by

\lim_{t \to \infty} F_t = Σ = Z P Z' + H   (2.63)
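In practice the steady-state covariance can be found by simply iterating the filter's covariance recursion until it stops changing. The following is a hedged numpy sketch for the time-invariant model (2.56)-(2.57); the starting value RQR' and the convergence tolerance are arbitrary illustrative choices.

```python
import numpy as np

def steady_state(T, Z, R, Q, H, tol=1e-10, max_iter=100_000):
    """Iterate the Riccati recursion for P_{t+1|t} until (2.62) holds numerically."""
    P = R @ Q @ R.T                                  # any positive semi-definite start
    for _ in range(max_iter):
        F = Z @ P @ Z.T + H
        K = T @ P @ Z.T @ np.linalg.inv(F)           # candidate steady-state gain (2.61)
        P_next = T @ P @ T.T - K @ Z @ P @ T.T + R @ Q @ R.T
        if np.max(np.abs(P_next - P)) < tol:
            return P_next, K, F                      # P, K and the steady-state F of (2.63)
        P = P_next
    raise RuntimeError("covariance recursion did not converge")
```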

2.5 Estimation, Initialisation and Diagnostic Checking of the Kalman

Filter

Section 2.2 described the equations needed for the Kalman recursions; this section covers the remaining ingredients, initialisation and optimisation, needed to estimate the model. These elements would already allow one to program the estimation of such a state space model. However, as a practical hint, Kalman filtering is a rather mechanical issue and efforts should therefore be put into the setup of the state space model rather than into programming. A lot of programming with respect to the Kalman filter has already been done, for example in the software packages SsfPack 2.2b / SsfPack 3.0 (Beta) or STAMP.17 Secondly, asking researchers for their programming routines (as in all research areas) can save a lot of time, money and effort.

2.5.1 Maximum Likelihood Estimation

Maximum likelihood is usually based on the assumption that the T sets of observations y1, . . . , yT

are independently and identically distributed. In time series analysis this assumption can no

longer be maintained. Therefore, a conditional probability density function is used to write the

joint density function as

L(y; ψ) = \prod_{t=1}^{T} p(y_t|Y_{t-1})   (2.64)

where p(yt|Yt−1) denotes the distribution of yt based on the information set available at t−1, that

is Yt−1 = yt−1, yt−2, . . . , y1. Provided the disturbances and the initial state vector have proper

multivariate normal distributions, αt is normally distributed conditional on Yt−1 with mean at|t−1

16 Initial conditions are discussed in section 2.5.2.
17 See http://www.timberlake.co.uk/ or http://www.ssfpack.com/ for further information.


and a covariance matrix of Pt|t−1. The Kalman filter computes these distributions where the mean

of the conditional distribution of yt is

Et−1(yt) = yt|t−1 = Ztat|t−1 + dt (2.65)

and its covariance matrix is given by

E(v_t v'_t) = F_t = Z_t P_{t|t-1} Z'_t + H_t   (2.66)

Substituting equations (2.65) and (2.66) for the mean and the covariance matrix in the multivariate Gaussian density function then gives

p(y_t|Y_{t-1}) = \frac{1}{(2π)^{N/2}\,\big|Z_t P_{t|t-1} Z'_t + H_t\big|^{1/2}}\, \exp\Big\{ -\tfrac{1}{2}\, \underbrace{(y_t - Z_t a_{t|t-1} - d_t)'}_{v'_t}\, \big(\underbrace{Z_t P_{t|t-1} Z'_t + H_t}_{F_t}\big)^{-1}\, \underbrace{(y_t - Z_t a_{t|t-1} - d_t)}_{v_t} \Big\}   (2.67)

Taking logs and summing over the sample gives the log-likelihood corresponding to (2.64):

\log L = -\frac{NT}{2}\log 2π - \frac{1}{2}\sum_{t=1}^{T}\log|F_t| - \frac{1}{2}\sum_{t=1}^{T} v'_t F^{-1}_t v_t   (2.68)

where the additive term at the beginning is often discarded and vt is defined as in (2.15). Equation

(2.68) is known as the prediction error decomposition form of the likelihood which is then max-

imised with respect to the hyperparameters stacked in vector ψ18 using a numerical optimisation

algorithm.19
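A direct way to see how (2.68) is used is to code the filter so that it returns the log-likelihood for a given parameter vector; a numerical optimiser is then run on this function. The sketch below assumes a time-invariant model with the system matrices already built from the hyperparameters ψ; it is illustrative only.

```python
import numpy as np

def prediction_error_loglik(y, Z, d, H, T, c, R, Q, a0, P0):
    """Log-likelihood (2.68) via the prediction error decomposition.
    y has shape (n_obs, N); a0, P0 are the initial mean and covariance of alpha_0."""
    a, P = a0.copy(), P0.copy()
    N = y.shape[1]
    ll = 0.0
    for t in range(y.shape[0]):
        a = T @ a + c                             # prediction step
        P = T @ P @ T.T + R @ Q @ R.T
        v = y[t] - Z @ a - d                      # innovation v_t
        F = Z @ P @ Z.T + H                       # innovation covariance F_t
        _, logdetF = np.linalg.slogdet(F)
        ll += -0.5 * (N * np.log(2 * np.pi) + logdetF + v @ np.linalg.solve(F, v))
        PZ = P @ Z.T                              # updating step
        a = a + PZ @ np.linalg.solve(F, v)
        P = P - PZ @ np.linalg.solve(F, PZ.T)
    return ll

# maximisation over the hyperparameters psi would wrap this function,
# e.g. scipy.optimize.minimize applied to psi -> -prediction_error_loglik(...)
```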

2.5.2 Initialisation

The starting values are a crucial part of the estimation process since changing the starting values

can severely influence estimation results. Therefore, it is good practice to test sensitivity of

estimation results with respect to different starting values. In the following sections different

methods to initialise the Kalman filter are presented.

In principle, the starting values for the Kalman filter are given by the mean and the unconditional

covariance matrix of the unconditional distribution of the state vector. If the state vector is

stationary, it has a mean of

a0 = (I − T )−1c (2.69)

and a covariance matrix P which is the unique solution to the equation

P = TPT ′ +RQR′ (2.70)

18 Section 3.4.2 of Harvey (1989) describes techniques for concentrating out of the likelihood function those parameters which enter linearly into c_t and d_t, in order to improve the optimisation process.
19 e.g. the function 'fminsearch' of the Optimisation Toolbox for Matlab.


where stationarity holds if |λ_i(T)| < 1 and c_t is time-invariant. Alternatively, especially in more complex models, P may be calculated from

vec(P_0) = [I - T \otimes T]^{-1} vec(RQR')   (2.71)

(the vec(·) operator indicates that the columns of the matrix are stacked one upon the other). Since the unconditional distribution of α_1 is equal to the unconditional distribution of α_0, the Kalman filter can be initialised with a_{1|0} = a_0 and P_{1|0} = P.

When the transition equation is non-stationary, the unconditional distribution of the state vector

is not defined. Unless genuine prior information is available, therefore, the initial distribution of

α0 must be specified in terms of a diffuse or non-informative prior which can be approximated by

a0 = 0 (2.72)

P0 = κI (2.73)

where κ is a positive scalar. The diffuse prior is obtained as κ→∞ which corresponds to P−10 = 0.

It is only an approximation since it does not integrate to one. Practically speaking, κ can not

be set equal to infinity, but by setting κ equal to a very large number a good approximation is

obtained. In the univariate case, this result can even be generalised: The use of a diffuse prior is

equivalent to the construction of a proper prior from the first m sets of observations provided that

the model is observable. If only some of the elements in the state vector happen to be stationary,

P0 can be partitioned. If, without any loss in generality, the non-stationary elements are taken to

be the first d (d ≤ m), the transition matrix must be of the form

T = \begin{bmatrix} T_1 & T_2 \\ 0 & T_4 \end{bmatrix}   (2.74)

where T_1 is d × d, T_2 is d × (m − d) and T_4 is (m − d) × (m − d) with |λ_i(T_4)| < 1. Partitioning P_0 accordingly yields

P_{1|0} = \begin{bmatrix} κI & 0 \\ 0 & P \end{bmatrix}   (2.75)
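Both initialisation rules are easy to compute. The following numpy sketch implements the stationary case via the vec formula (2.71) and the approximate diffuse prior (2.72)-(2.73); the value of κ is an arbitrary large number chosen for illustration.

```python
import numpy as np

def stationary_P0(T, R, Q):
    """Unconditional state covariance from (2.71): vec(P0) = [I - T kron T]^{-1} vec(RQR')."""
    m = T.shape[0]
    rqr = (R @ Q @ R.T).reshape(m * m, order="F")     # vec(.) stacks columns
    vecP = np.linalg.solve(np.eye(m * m) - np.kron(T, T), rqr)
    return vecP.reshape((m, m), order="F")

def diffuse_P0(m, kappa=1e7):
    """Approximate diffuse prior (2.73): P0 = kappa * I with kappa 'large'."""
    return kappa * np.eye(m)
```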

2.5.3 Diagnostic Checking

After the initial values have been defined, the optimisation algorithm begins to maximise the loglikelihood function. During this procedure it is recommended to observe the evolution of the loglikelihood after each iteration20 because it can give a first hint of problems. If the value of the loglikelihood function keeps jumping around and does not converge more or less smoothly towards an equilibrium value, the search algorithm or the initial conditions might be inappropriate.

The following test procedures, presented in chapter 5 of Harvey (1989), are conducted after the

estimation process of univariate models.

20It depends on the software package whether this is possible but it is easily programmed in Matlab.


Tests on variances

For obvious reasons the variance parameters are constrained to be non-negative (e.g. the basic structural model contains the four parameters σ²_η, σ²_ς, σ²_ω and σ²_ε). This restriction can be tested by a likelihood ratio test. Basically, this test determines whether the prediction error variances differ significantly if a zero constraint is imposed on one of the variance parameters (e.g. H_0: σ²_η = 0):

LR \simeq (T - d)\,\log\!\left(\frac{SSE_0}{SSE}\right) \sim χ^2_m   (2.76)

where SSE_0 is the prediction error variance of the restricted model and d is the number of restrictions. SSE and SSE_0 can be obtained from the last term of the loglikelihood function, which is based on the standardised residuals.

Tests on residuals

The basic way of examining the behaviour of the residuals is by plotting v against time. A

standardised plot is based on the demeaned residuals divided by the estimate of the standard

deviation:

\frac{\tilde v_t - \bar{\tilde v}}{σ_*},   with   σ_* = \sqrt{\frac{1}{T - d - 1}\sum_{t=d+1}^{T}(\tilde v_t - \bar{\tilde v})^2}   (2.77)

where \tilde v_t = v_t / f_t^{1/2} and d is the number of non-stationary elements in the model.21 An alternative way is the comparison of the cumulative sum (CUSUM) with predetermined significance lines drawn above and below the zero axis. The CUSUM is defined as

CUSUM(t) = \frac{1}{σ_*}\sum_{j=d+1}^{t} \tilde v_j   (2.78)

and the significance lines are given by

CUSUM = \pm\left[a\sqrt{T - d} + 2a\,(t - d)/\sqrt{T - d}\right]   (2.79)

If CUSUM crosses the significance lines, the residuals are not normally and independently dis-

tributed which hints at a structural change.22 Notice that a determines the significance level (e.g.

a = 0.948 for a significance level of 5%).

Serial correlation and non-normality of the residuals are important indicators of misspecification. The Box-Ljung statistic testing for serial correlation is given by

Q^* = T^*(T^* + 2)\sum_{τ=1}^{P}(T^* - τ)^{-1}\, r_v^2(τ)   (2.80)

where the autocorrelation r_v(τ) is obtained from

r_v(τ) = \frac{\sum_{t=d+1+τ}^{T}(\tilde v_t - \bar{\tilde v})(\tilde v_{t-τ} - \bar{\tilde v})}{\sum_{t=d+1}^{T}(\tilde v_t - \bar{\tilde v})^2},   τ = 1, 2, . . .   (2.81)

21 The matrix F_t in a univariate model is denoted as f_t.
22 Besides being able to detect structural changes, CUSUMSQ can also detect heteroskedasticity in the residuals. See Harvey (1989) p. 257.


Third and fourth order moments of the residuals about the mean are needed to calculate the Jarque-Bera statistic23:

\sqrt{b_1} = σ_*^{-3}\sum(\tilde v_t - \bar{\tilde v})^3 / T^*   (2.82)

b_2 = σ_*^{-4}\sum(\tilde v_t - \bar{\tilde v})^4 / T^*   (2.83)

JB = (T^*/6)\,b_1 + (T^*/24)(b_2 - 3)^2 \sim χ^2_2   under H_0   (2.84)
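For a univariate model these residual diagnostics are straightforward to compute from the stored innovations v_t and their variances f_t. The sketch below is one possible implementation of (2.80)-(2.84); the lag length P and the treatment of the first d observations follow the text, while everything else (names, defaults) is illustrative.

```python
import numpy as np

def residual_diagnostics(v, f, d, P=10):
    """Box-Ljung Q* of (2.80)-(2.81) and Jarque-Bera JB of (2.82)-(2.84)."""
    vt = v[d:] / np.sqrt(f[d:])                 # standardised innovations, t = d+1,...,T
    Tstar = vt.size
    e = vt - vt.mean()
    sig2 = e @ e / (Tstar - 1)                  # cf. the estimator in (2.77)
    # Box-Ljung
    r = np.array([(e[tau:] @ e[:-tau]) / (e @ e) for tau in range(1, P + 1)])
    Q = Tstar * (Tstar + 2) * np.sum(r**2 / (Tstar - np.arange(1, P + 1)))
    # Jarque-Bera
    sqrt_b1 = (e**3).mean() / sig2**1.5
    b2 = (e**4).mean() / sig2**2
    JB = Tstar / 6 * sqrt_b1**2 + Tstar / 24 * (b2 - 3) ** 2
    return Q, JB
```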

Goodness of fit

The prediction error variance is a natural measure of the goodness of fit. It is calculated either as σ^2 = σ_*^2\,\bar f in the time-invariant case or as σ_f^2 = σ_*^2\, f_T in the time-varying case.24 These variances can then be used in the calculation of the coefficient of determination. The conventional R^2 is obtained by dividing SSE by the sum of squares of the observations, but it only makes sense for stationary time series. The following measures, however, are useful for non-stationary, univariate time series.25

The residual sum of squares for a univariate time series model may be defined as

SSE = \bar f \sum_t \tilde v_t^2 = (T - d)\,σ^2   (2.85)

R2D is obtained by replacing the observations by their first differences (i.e. making the observations

stationary):

R_D^2 = 1 - \frac{SSE}{\sum_{t=2}^{T}(\Delta y_t - \overline{\Delta y})^2}   (2.86)

If a model contains seasonal dummies, R2D can be rewritten in order to take the seasonal dummies

into account:

R_S^2 = 1 - \frac{SSE}{SSDSM}   (2.87)

where SSDSM is the sum of squares of first differences around the seasonal means. Any model

with R2S being negative can be rejected.

The really important performance test for your model is an out-of-sample forecast. A standard

procedure is estimating the model over a restricted sample and comparing the forecasts with the

actual values of the remaining sample. Accepting a model just because of its in-sample goodness

of fit properties is dangerous. The following post-sample predictive test statistics can then be used

to detect spurious good fits:

ξ(l) = \frac{\sum_{j=1}^{l} \tilde v_{T+j}^2 / l}{\sum_{t=d+1}^{T} \tilde v_t^2 / (T - d)} = \frac{\sum_{j=1}^{l} \tilde v_{T+j}^2}{l\,σ_*^2} \sim F(l,\, T - d)   (2.88)

23 Note that a large sample is required for Jarque-Bera. The Shapiro-Wilk test might be more appropriate in small samples.
24 In general, the difference between the two definitions becomes small as the sample size increases.
25 Harvey (1989) considers them as "an attempt to provide goodness-of-fit criteria which are useful for non-stationary time series data".


When the model is misspecified and a spuriously good fit has been obtained in the sample period,

the value of ξ(l) will be inflated which in turn leads to the rejection of the model.

2.6 Extended Kalman Filter

All state space models considered so far have been linear. The extended Kalman filter also allows the estimation of functionally non-linear state space models. Consider the following univariate model:

yt = zt(αt) + εt (2.89)

αt = tt(αt−1) +Rt(αt−1)ηt (2.90)

Contrary to equations (2.1) and (2.2), the elements of zt(αt) and tt(αt−1) are no longer necessarily

linear functions of the elements of the state vector and Rt(αt−1) may depend on the state vector.

Even under the assumption that ε_t and η_t are normally distributed, obtaining an optimal filter for a model of this kind is not, in general, possible. However, an approximate filter can be obtained

by linearising the model by Taylor expansion around the conditional means at|t−1 and at−1, and

then applying a modification of the Kalman filter:

z_t(α_t) \cong z_t(a_{t|t-1}) + Z_t\,(α_t - a_{t|t-1})   (2.91)

t_t(α_{t-1}) \cong t_t(a_{t-1}) + T_t\,(α_{t-1} - a_{t-1})   (2.92)

R_t(α_{t-1}) \cong R_t   (2.93)

where

Z_t = \frac{\partial z_t(α_t)}{\partial α'_t}\bigg|_{α_t = a_{t|t-1}}   (2.94)

T_t = \frac{\partial t_t(α_{t-1})}{\partial α'_{t-1}}\bigg|_{α_{t-1} = a_{t-1}}   (2.95)

R_t = R_t(a_{t-1})   (2.96)

The approximation of the original non-linear model (2.89) is obtained by substituting equations (2.91) to (2.96) in (2.9) and assuming knowledge of a_{t|t-1} and a_{t-1}. This gives

y_t \cong Z_t α_t + d_t + ε_t   (2.97)

α_t \cong T_t α_{t-1} + c_t + R_t η_t   (2.98)

where

d_t = z_t(a_{t|t-1}) - Z_t a_{t|t-1}   (2.99)

c_t = t_t(a_{t-1}) - T_t a_{t-1}   (2.100)


The Kalman filter can now be applied to equations (2.97) - (2.100) with the modification that the

state prediction equation defined by equation (2.9) becomes

at|t−1 = tt(at−1) (2.101)

while the state updating equation is

at = at|t−1 + Pt|t−1Z′tF−1t [yt − zt(at|t−1)] (2.102)

These recursions are known as the extended Kalman filter. Note that ct and dt are never actually

computed.

Example 2.4: Consider the following non-linear model:

y_t = \log α_t + ε_t
α_t = α_{t-1}^2 + η_t

Applying the steps described above yields

y_t \cong a_{t|t-1}^{-1}\, α_t + [\log a_{t|t-1} - 1] + ε_t
α_t \cong 2 a_{t-1}\, α_{t-1} - a_{t-1}^2 + η_t
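For this example the extended Kalman filter recursions collapse to scalar operations. The following is a small illustrative sketch (not taken from the text): the disturbance variances are placeholder parameters, and no attention is paid to keeping a_{t|t-1} strictly positive, which a serious implementation would have to address before taking the logarithm.

```python
import numpy as np

def ekf_step_example24(a_prev, P_prev, y, var_eps, var_eta):
    """One EKF step for y_t = log(alpha_t) + eps_t, alpha_t = alpha_{t-1}^2 + eta_t."""
    # prediction (2.101): a_{t|t-1} = t_t(a_{t-1}), linearised T_t = 2 a_{t-1}
    a_pred = a_prev ** 2
    T_t = 2.0 * a_prev
    P_pred = T_t * P_prev * T_t + var_eta
    # updating (2.102): linearised Z_t = 1 / a_{t|t-1}
    Z_t = 1.0 / a_pred
    F_t = Z_t * P_pred * Z_t + var_eps
    a_upd = a_pred + P_pred * Z_t / F_t * (y - np.log(a_pred))
    P_upd = P_pred - P_pred * Z_t / F_t * Z_t * P_pred
    return a_upd, P_upd
```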

2.7 Illustrative Economic Applications

2.7.1 Example of a time-varying coefficients model

Important applications of the Kalman filter are models with time-varying coefficients in order to

model structural changes (e.g. a change in the relationship between money supply and inflation

rates). In the following, a linear regression model whose coefficients change stochastically is presented.

Consider the model

yt = X ′tβt + εt, t = 1, . . . , T (2.103)

with β evolving as a random walk process

βt = βt−1 + ηt (2.104)

where εt ∼ N(0, σ2) and ηt ∼ N(0, σ2Q).26

Since β is non-stationary a diffuse prior has to be used for the initialisation of the Kalman filter (see

section 2.5.2). Once the initial values have been determined, the loglikelihood can be maximised

with respect to ψ containing the elements of Q and σ2. The matrix Q determines the extent to

which β will vary.
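Setting this model up as a filter requires only the random-walk transition for β_t and the regressors x_t playing the role of a time-varying Z_t. The sketch below is a minimal numpy implementation with diffuse initialisation; σ², Q and κ are treated as known placeholders, whereas in practice they would be estimated by maximising (2.68).

```python
import numpy as np

def tvp_regression_filter(y, X, sigma2, Q, kappa=1e7):
    """Kalman filter for (2.103)-(2.104): y_t = x_t' beta_t + eps_t, beta_t = beta_{t-1} + eta_t."""
    n, k = X.shape
    beta = np.zeros(k)                  # diffuse prior (2.72)-(2.73)
    P = kappa * np.eye(k)
    filtered = np.zeros((n, k))
    for t in range(n):
        P = P + sigma2 * Q              # prediction: random walk leaves beta unchanged
        x = X[t]
        f = x @ P @ x + sigma2          # innovation variance
        v = y[t] - x @ beta             # innovation
        K = P @ x / f                   # gain
        beta = beta + K * v
        P = P - np.outer(K, x @ P)
        filtered[t] = beta
    return filtered, P
```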

26Note that the matrix Q determines the extent to which β will vary.


2.7.2 Example of a multivariate SUTSE model

The following model is a rather involved state space model which has also been used in recent re-

search. It is presented here as it can be easily set up modularly. A Seemingly Unrelated Structural

Time Series model [SUTSE]27 is the multivariate generalisation of the univariate Unobserved Com-

ponents model. Similar to the seemingly unrelated regression model, it makes use of the covariance

between the time series. Interesting features of a SUTSE model are temporal disaggregation by

related series, partial updating of the state vector and estimation of a cointegration vector.

The following state space model decomposes both series into a stochastic slope, a seasonal component and a common cycle. Finally, the annual series X is temporally disaggregated to the quarterly frequency with the help of the quarterly indicator series Y. Equation (2.110) is the observation or measurement equation without matrix G; the reason for G being restricted to zero is explained later on. The state equation (2.111) has the usual form, but matrix T is now time-varying.

Stochastic slope: the parameters µ and β yield a stochastic slope which is of order I(2) because

the coefficients in front of µ and β are equal to one. These parameters exist for both series such

that the only connection between them is their covariance Γη in matrix H. In order to model a

cointegrating relationship, it is possible to impose a common slope to both series. The first two

rows of the state equation represent the two stochastic slopes: µt = µt−1 + βt−1 + ηt where the

two consecutive rows just indicate that βt = βt−1 + ςt.

y_t = \begin{bmatrix} X_t \\ Y_t \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} μ_{t,X} \\ μ_{t,Y} \\ β_{t,X} \\ β_{t,Y} \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 & 0 & Γ_{1,ε} & 0 \\ 0 & 0 & 0 & 0 & Γ_{2,ε} & Γ_{3,ε} \end{bmatrix} \begin{bmatrix} η_{t,X} \\ η_{t,Y} \\ ς_{t,X} \\ ς_{t,Y} \\ ε_{t,X} \\ ε_{t,Y} \end{bmatrix}

a_t = \begin{bmatrix} μ_{t,X} \\ μ_{t,Y} \\ β_{t,X} \\ β_{t,Y} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} μ_{t-1,X} \\ μ_{t-1,Y} \\ β_{t-1,X} \\ β_{t-1,Y} \end{bmatrix} + \begin{bmatrix} Γ_{1,η} & 0 & 0 & 0 & 0 & 0 \\ Γ_{2,η} & Γ_{3,η} & 0 & 0 & 0 & 0 \\ 0 & 0 & Γ_{1,ς} & 0 & 0 & 0 \\ 0 & 0 & Γ_{2,ς} & Γ_{3,ς} & 0 & 0 \end{bmatrix} \begin{bmatrix} η_{t,X} \\ η_{t,Y} \\ ς_{t,X} \\ ς_{t,Y} \\ ε_{t,X} \\ ε_{t,Y} \end{bmatrix}   (2.105)

27See Harvey (1989) Chapter 8.


Seasonal dummies: In a structural model, capturing seasonal effects consists of two parts: put δ − 1 times −1 in the first row and then append an identity matrix I(δ − 2) just below, where δ is the frequency of the series (e.g. δ = 4 for quarterly series). Instead of seasonal dummies one could also use trigonometric terms (i.e. sines and cosines), similar to the common cycle described below.

\begin{bmatrix} X_t \\ Y_t \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} δ_{t,X,1} \\ δ_{t,X,2} \\ δ_{t,X,3} \\ δ_{t,Y,1} \\ δ_{t,Y,2} \\ δ_{t,Y,3} \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & Γ_{1,ε} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & Γ_{2,ε} & Γ_{3,ε} \end{bmatrix} \begin{bmatrix} ω_{t,X} \\ 0 \\ 0 \\ ω_{t,Y} \\ 0 \\ 0 \\ ε_{t,X} \\ ε_{t,Y} \end{bmatrix}   (2.106)

\begin{bmatrix} δ_{t,X,1} \\ δ_{t,X,2} \\ δ_{t,X,3} \\ δ_{t,Y,1} \\ δ_{t,Y,2} \\ δ_{t,Y,3} \end{bmatrix} = \begin{bmatrix} -1 & -1 & -1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & -1 & -1 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} δ_{t-1,X,1} \\ δ_{t-1,X,2} \\ δ_{t-1,X,3} \\ δ_{t-1,Y,1} \\ δ_{t-1,Y,2} \\ δ_{t-1,Y,3} \end{bmatrix} + \begin{bmatrix} Γ_{1,ω} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ Γ_{2,ω} & 0 & 0 & Γ_{3,ω} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} ω_{t,X} \\ 0 \\ 0 \\ ω_{t,Y} \\ 0 \\ 0 \\ ε_{t,X} \\ ε_{t,Y} \end{bmatrix}

Common cycle: Similar to the stochastic slope, one could estimate a cyclical component separately

for both series. The idea of the SUTSE model, however, is that the series are affected by the same

factors (i.e. the business cycle) and, therefore, a common cycle is imposed (Notice the two ones

in the same column in the last two rows).

[Xt

Yt

]=

[1 0 1 0

0 1 0 1

]ψt,X

ψt,Y

ψ∗t,X

ψ∗t,Y

+

[0 0 0 0 Γ1,ε 0

0 0 0 0 Γ2,ε Γ3,ε

]

κt,X

κt,Y

κ∗t,X

κ∗t,Y

εt,X

εt,Y

(2.107)

ψt,X

ψt,Y

ψ∗t,X

ψ∗t,Y

=

ρ cos (λ) 0 ρ sin (λ) 0

0 ρ cos (λ) 0 ρ sin (λ)

−ρ sin (λ) 0 ρ cos (λ) 0

0 −ρ sin (λ) 0 ρ cos (λ)

ψt−1,X

ψt−1,Y

ψ∗t−1,X

ψ∗t−1,Y

+

Γ1,κ 0 0 0 0 0

Γ2,κ Γ3,κ 0 0 0 0

0 0 Γ∗1,κ 0 0 0

0 0 Γ∗2,κ Γ∗3,κ 0 0

κt,X

κt,Y

κ∗t,X

κ∗t,Y

εt,X

εt,Y


Temporal disaggregation28: For this purpose, two additional rows are appended to matrix T and matrix G is removed. G is no longer needed because the four quarterly estimates X_Q must sum exactly to the respective annual value X_A. The sum constraint is imposed by two elements. First, matrix C is introduced into matrix T. It aggregates the quarterly series in order to compare them with the actual annual values and is defined as follows:

C = diag(c_{1,t}, c_{2,t}, . . . , c_{N,t})   where   c_{i,t} = \begin{cases} 0 & t = 1,\; δ_i + 1,\; 2δ_i + 1, . . . \\ 1 & \text{otherwise} \end{cases}   (2.108)

δ_i is in our case 4 for the annual series X and 1 for the quarterly series Y. In the periods between the annual observations, the quarterly estimates of X are aggregated over four periods because, as a consequence of C being equal to one, the equation then includes the lagged values of quarterly X. Since the indicator series is already observed at the higher frequency, the second row of C is always zero, as no aggregation is needed. Temporal disaggregation could also be performed in a univariate model, but in a multivariate setting disaggregation is expected to improve because the indicator series contains information about the time periods between the observations.29

Second, the element Z in matrix Z is also time-varying and is defined as follows:

Z_t = \begin{cases} 1 & t = δ_X,\; 2δ_X, . . . \\ 0 & \text{otherwise} \end{cases}   (2.109)

It is equal to one in the periods in which the low-frequency series is observed and thus allows a

comparison between the aggregated quarterly values and the actual annual value.

Partial updating of the state vector30: Suppose that the indicator series Y is available earlier than

X. Instead of waiting for the delayed value, one can already partially update the state vector.

On the one hand, a better estimate of the indicator series is obtained and on the other hand, as a

consequence of non-zero covariances, the estimator of X is partially updated. Since the estimator

is based on more information than before, a lower forecasting error can be expected (Notice that

partial updating can not be shown in the state space representation below).

28 Temporal disaggregation refers in this case to distribution. Only the temporal disaggregation of a stock variable is called interpolation. The "interpolation" of a flow variable (e.g. GDP) is called distribution because the value of the quarterly GDP is distributed among three months.
29 Milton Friedman developed in 1962 the concept of interpolation by related series. A very popular model which makes use of this concept was developed in 1970 by Chow and Lin.
30 See Harvey (1989) Chapter 8.7.


Combining all these modules yields the following state space representation:

\begin{bmatrix} X_A \\ Y_Q \end{bmatrix} = \begin{bmatrix} 0 & \cdots & 1 & 0 \\ 0 & \cdots & 0 & 1 \end{bmatrix} \begin{bmatrix} μ_{X,t} \\ μ_{Y,t} \\ β_{X,t} \\ β_{Y,t} \\ δ_{X1,t} \\ δ_{X2,t} \\ δ_{X3,t} \\ δ_{Y1,t} \\ δ_{Y2,t} \\ δ_{Y3,t} \\ ψ_{t,X} \\ ψ_{t,Y} \\ ψ^*_{t,X} \\ ψ^*_{t,Y} \\ X_{Q,t} \\ Y_{Q,t} \end{bmatrix}   (2.110)


\begin{bmatrix} μ_{X,t} \\ μ_{Y,t} \\ β_{X,t} \\ β_{Y,t} \\ δ_{X1,t} \\ δ_{X2,t} \\ δ_{X3,t} \\ δ_{Y1,t} \\ δ_{Y2,t} \\ δ_{Y3,t} \\ ψ_{t,X} \\ ψ_{t,Y} \\ ψ^*_{t,X} \\ ψ^*_{t,Y} \\ X_{Q,t} \\ Y_{Q,t} \end{bmatrix} =
\begin{bmatrix}
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & -1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & -1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & A & 0 & B & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & A & 0 & B & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -B & 0 & A & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -B & 0 & A & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & A & 0 & B & 0 & C & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & A & 0 & B & 0 & 0
\end{bmatrix}
\begin{bmatrix} μ_{X,t-1} \\ μ_{Y,t-1} \\ β_{X,t-1} \\ β_{Y,t-1} \\ δ_{X1,t-1} \\ δ_{X2,t-1} \\ δ_{X3,t-1} \\ δ_{Y1,t-1} \\ δ_{Y2,t-1} \\ δ_{Y3,t-1} \\ ψ_{t-1,X} \\ ψ_{t-1,Y} \\ ψ^*_{t-1,X} \\ ψ^*_{t-1,Y} \\ X_{Q,t-1} \\ Y_{Q,t-1} \end{bmatrix} + \ldots

\ldots +
\begin{bmatrix}
Γ_{1,η} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
Γ_{2,η} & Γ_{3,η} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & Γ_{1,ς} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & Γ_{2,ς} & Γ_{3,ς} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & Γ_{1,ω} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & Γ_{2,ω} & 0 & 0 & Γ_{3,ω} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & Γ_{1,κ} & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & Γ_{2,κ} & Γ_{3,κ} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & Γ_{1,κ} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & Γ_{2,κ} & Γ_{3,κ} & 0 & 0 \\
Γ_{1,η} & 0 & Γ_{1,ς} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & Γ_{1,κ} & 0 & 0 & 0 & Γ_{1,ε} & 0 \\
Γ_{2,η} & Γ_{3,η} & Γ_{2,ς} & Γ_{3,ς} & 0 & 0 & 0 & 0 & 0 & 0 & Γ_{2,κ} & Γ_{3,κ} & 0 & 0 & Γ_{2,ε} & Γ_{3,ε}
\end{bmatrix}
\begin{bmatrix} η_{X,t} \\ η_{Y,t} \\ ς_{X,t} \\ ς_{Y,t} \\ ω_{X,t} \\ 0 \\ 0 \\ ω_{Y,t} \\ 0 \\ 0 \\ κ_{X,t} \\ κ_{Y,t} \\ κ^*_{X,t} \\ κ^*_{Y,t} \\ ε_{X,t} \\ ε_{Y,t} \end{bmatrix}   (2.111)

where A = ρ cos (λ) and B = ρ sin (λ).


3 Topic 3: Solving Rational Expectation Models

Introduction31

This section discusses Rational Expectation Models and shows three different ways to solve them.

The Hypothesis of Rational Expectations is an important and necessary assumption for many models used in economic theory. Mathematically, it means that the expectations which economic agents form are equal to the mathematical expectation of the variable in question. Or, as John F. Muth32

states:

” The hypothesis asserts three things: (1) Information is scarce, and the economic system

generally does not waste it. (2) The way expectations are formed depends specifically on the

structure of the relevant system describing the economy. (3) A ‘public prediction’, in the sense of

Grunberg and Modigliani (1954), will have no substantial effect on the operation of the economic

system (unless it is based on inside information). ”

The section is structured as follows. First, a rather simple example leads through the mathematical steps to obtain a solution for a macro model without and with expectations of future variables. Then a more general solution is shown in matrix representation. This part is still limited, however, as it uses only one variable that is known in the current period and one future variable whose expectation is used. The last and most complete part goes one step further: it uses a model in which past values, current values and expectations of future variables all appear.

3.1 The Basic Method33

A simple macro model may be as follows

mt = pt + yt (3.1)

pt = Et−1pt + δ(yt − y∗) (3.2)

mt = m+ εt (3.3)

where m, p, y are the logarithms of money supply, the price level, and output; y∗ is normal output,

m is the monetary target (both are assumed to be known constants). Equation (3.1) is a simple

money demand function with zero interest elasticity and unit income elasticity. Equation (3.3) is

a money supply function in which the government aims for a monetary target with an error, εt,

which is white noise. Equation (3.2) is a Phillips curve as can be seen by subtracting pt−1 from

both sides; in this case it states that the rate of inflation equals last period’s expectation of the

inflation rate plus a function of ‘excess demand’.

31I thank Andreas Walchli for the preparation of these lecture notes.32see Muth, Rational Expectations and the Theory of Price Movements in Econometrica, Vol. 29, No. 3 (1961).33Source: Minford and Peel, Advanced Macroeconomics, 2002, Chapter 2.


The model has three linear equations with three endogenous variables, two exogenous variables, m

and εt, and an expected value Et−1pt. The first step is to solve the system, assuming the expected

value as exogenous. So, substituting (3.2) and (3.3) into (3.1) gives

m+ εt = Et−1pt + δ(yt − y∗) + yt (3.4)

Now we need to find Et−1pt, to get the full solution. To do so, take the expectation of the model

at time t− 1.

Et−1mt = Et−1pt + Et−1yt (3.5)

Et−1pt = Et−1pt + δ(Et−1yt − y∗) (3.6)

Et−1mt = m (3.7)

Substituting (3.6) and (3.7) into (3.5) gives

m = Et−1pt + y∗ (3.8)

Now, (3.8) is substituted into (3.4) to obtain

y_t = y^* + \frac{1}{1 + δ}\,ε_t   (3.9)

Using (3.9) and (3.8) in (3.2) gives

p_t = m - y^* + \frac{δ}{1 + δ}\,ε_t   (3.10)

The solutions for y_t and p_t both consist of an expected part (y^* and m − y^*, respectively) and an unexpected part in ε_t. Rational expectations incorporate anything known at t − 1 with implications for p and y at time t into the expected part, so that the unexpected part is purely unpredictable.

Without going too deep, a further remark may be made at this point. The model has an interesting implication, first pointed out by Sargent and Wallace (1975): the solution for y_t does not depend on the parameters of the money supply function. If no surprises occur, output is at its normal level. To illustrate this, assume that the government/central bank attempts to

stabilise output by changing the money supply rule to

mt = m− β(yt−1 − y∗) + εt (3.11)

Even in this case, the solution for yt is still (3.9), because this money supply rule is incorporated

into people’s expectations at t − 1 and cannot cause any surprises. However, the solution for pt

changes slightly to

E_{t-1}p_t = m - β(y_{t-1} - y^*) - y^* = m - y^* - \frac{β}{1 + δ}\,ε_{t-1}   (3.12)

p_t = m - y^* - \frac{β}{1 + δ}\,ε_{t-1} + \frac{δ}{1 + δ}\,ε_t   (3.13)

This result obviously contradicts the results from models with backward-looking expectations.

According to them, the government can reduce fluctuations in output by choosing an appropriate

monetary target.


Fundamentally, the basic method involves three steps:

1. Solve the model, treating expectations as exogenous.

2. Take the expected value of this solution at the date of the expectations, and solve for the

expectations.

3. Substitute the expectations solutions into the solution in 1, and obtain the complete solution.
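These three steps can be traced mechanically with a computer algebra system. The following sympy sketch reproduces (3.9) and (3.10) for the simple model above; the symbol names are of course arbitrary.

```python
import sympy as sp

p, y, eps = sp.symbols('p y epsilon')
m_bar, y_star, delta = sp.symbols('m_bar y_star delta', positive=True)

# step 2 gives E_{t-1} p_t = m_bar - y_star, cf. (3.8)
Ep = m_bar - y_star

# steps 1 and 3: solve the model with the expectation substituted in
eq_money = sp.Eq(m_bar + eps, p + y)                  # (3.1) with (3.3)
eq_phillips = sp.Eq(p, Ep + delta * (y - y_star))     # (3.2) with E_{t-1} p_t replaced
sol = sp.solve([eq_money, eq_phillips], [p, y], dict=True)[0]

print(sp.simplify(sol[y] - (y_star + eps / (1 + delta))))                  # 0, i.e. (3.9)
print(sp.simplify(sol[p] - (m_bar - y_star + delta * eps / (1 + delta))))  # 0, i.e. (3.10)
```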

3.2 Rational Expectations Models with Expectations of Future Vari-

ables (REFV Models)

It lies in the nature of economic decisions that future variables are taken into consideration. For

these REFV models, the method discussed above must be supplemented and it can be replaced

by more convenient alternatives. Let the model be

mt = pt + yt − α(Et−1pt+1 − Et−1pt), (α > 0) (3.14)

and equations (3.2) and (3.3) from before

pt = Et−1pt + δ(yt − y∗)

mt = m+ εt

Equations (3.2) and (3.3) remain unchanged, whereas the demand for money now responds negatively to expected inflation. The basic method has to be adapted in order to solve this model. First, solve, treating expectations as exogenous:

m+ εt = Et−1pt + δ(yt − y∗) + yt − α(Et−1pt+1 − Et−1pt) (3.15)

Now, (step 2) take the expectations of the model to obtain

m− y∗ = (1 + α)Et−1pt − αEt−1pt+1 (3.16)

This is not yet the solution as the term Et−1pt+1 is not solved out. (3.16) may be shifted one

period ahead, but leaving the period of expectations at t− 1. This gives

m− y∗ = (1 + α)Et−1pt+1 − αEt−1pt+2 (3.17)

The problem has just been moved to the future. However, as proposed by Sargent, using the

method of forward iteration leads to a solution. Write (3.17) as

E_{t-1}p_{t+1} = \frac{1}{1 + α}(m - y^*) + \frac{α}{1 + α}\,E_{t-1}p_{t+2}   (3.18)

Now, substituting successively forwards for E_{t-1}p_{t+2}, E_{t-1}p_{t+3} and so on gives

E_{t-1}p_{t+1} = \frac{1}{1 + α}\sum_{i=0}^{N-1}\left(\frac{α}{1 + α}\right)^{i}(m - y^*) + \left(\frac{α}{1 + α}\right)^{N} E_{t-1}p_{t+N}   (3.19)


Let N → ∞ and assume that E_{t-1}p_{t+i} is stable. Since (α/(1 + α))^{N} → 0 as N → ∞, (3.19) becomes

E_{t-1}p_{t+1} = \frac{1}{1 + α}\sum_{i=0}^{∞}\left(\frac{α}{1 + α}\right)^{i}(m - y^*) = m - y^*   (3.20)

The same result could be obtained by using the forward operator, B−1 (B is the backward operator

which instructs to lag the variable but not the expectations date, unlike L which instructs to lag

both). (3.16) becomes

(1 + α)\left(1 - \frac{α}{1 + α}B^{-1}\right)E_{t-1}p_t = m - y^*   (3.21)

Solving gives

E_{t-1}p_t = \frac{1}{1 + α}\,\frac{m - y^*}{1 - \frac{α}{1 + α}B^{-1}} = \frac{1}{1 + α}\sum_{i=0}^{∞}\left(\frac{α}{1 + α}B^{-1}\right)^{i}(m - y^*) = m - y^*   (3.22)

This is a particular case where the exogenous variables are constant. However, Sargent’s method

can be generalised.

A discussion of stability and the terminal condition, which may be extended to the topics of

bubbles may be found in the book of Minford and Peel (2002). The book also explains alternative

methods of solving, such as the Muth Method of Undetermined Coefficients or the Lucas Method

of Undetermined Coefficients.

3.3 Solving Rational Expectations (State-Space-Representation)

A more general solution of the problem is shown here. The model in state-space form is

\begin{bmatrix} x_{1,t+1} \\ E_t x_{2,t+1} \end{bmatrix} = A \begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix} + \begin{bmatrix} ε_{t+1} \\ 0 \end{bmatrix}   (3.23)

where x_{1,t} is an n_1-vector of predetermined variables with the initial value x_{1,0} given, x_{2,t} is an n_2-vector of "forward looking" variables, and ε_t is white noise with covariance matrix Σ.

At this point, take expectations of (3.23):

E_t \begin{bmatrix} x_{1,t+1} \\ x_{2,t+1} \end{bmatrix} = A \begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix}   (3.24)

Calculate the Schur decomposition of A in (3.23) and reorder both T and Z so that the eigenvalues

with modulus smaller than one come first. If there are nθ stable (in modulus smaller than one)

eigenvalues and nδ unstable eigenvalues, T can be partitioned as

T = \begin{bmatrix} T_{θθ} & T_{θδ} \\ 0 & T_{δδ} \end{bmatrix}

Introduce the auxiliary variables

\begin{bmatrix} θ_t \\ δ_t \end{bmatrix} = Z^{H} \begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix}   (3.25)


Using the Schur decomposition, (3.24) can be written as

E_t \begin{bmatrix} x_{1,t+1} \\ x_{2,t+1} \end{bmatrix} = Z T Z^{H} \begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix}

Pre-multiplying by Z^{H} and using (3.25) gives

E_t \begin{bmatrix} θ_{t+1} \\ δ_{t+1} \end{bmatrix} = Z^{H} Z T \begin{bmatrix} θ_t \\ δ_t \end{bmatrix} = \begin{bmatrix} T_{θθ} & T_{θδ} \\ 0 & T_{δδ} \end{bmatrix} \begin{bmatrix} θ_t \\ δ_t \end{bmatrix},   since Z^{H} Z = I   (3.26)

Because Tδδ contains the unstable eigenvalues, δt diverges when t increases unless δ0 = 0. Any

stable system requires therefore that δt = 0 for all t. Thus (3.26) can be simplified to

δt = 0

Etθt+1 = Tθθθt (3.27)

Invert (3.25) and partition as

\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix} = \begin{bmatrix} Z_{kθ} & Z_{kδ} \\ Z_{λθ} & Z_{λδ} \end{bmatrix} \begin{bmatrix} θ_t \\ δ_t \end{bmatrix} = \begin{bmatrix} Z_{kθ} \\ Z_{λθ} \end{bmatrix} θ_t   (3.28)

since δ_t = 0.

The initial condition is that x_{1,0} is given. From (3.28),

x_{1,0} = Z_{kθ}\, θ_0   (3.29)

which can be solved for θ0, if Zkθ is invertible. It has n1 rows (the number of predetermined/backward

looking variables) and nθ columns (as many as stable roots). So, one necessary condition is that the

number of stable roots equals the number of predetermined variables (Proposition 1 of Blanchard

and Kahn (1980)). If that is the case and Zkθ is invertible, then

θ0 = Z−1kθ x1,0 (3.30)

so the stable solution can be calculated using (3.30) and (3.26), and then transforming back to

x1,t.

From equation (3.23) and (3.24) we know that x1,t+1 − Etx1,t+1 = εt+1. Using (3.28) the latter

may be written as

Zkθ(θt+1 − Etθt+1) = εt+1 (3.31)

Under the same conditions as mentioned above for Zkθ, this can be inverted and written as

θt+1 = Etθt+1 + Z−1kθ εt+1 (3.32)

Combining this with (3.27) results in

θt+1 = Tθθθt + Z−1kθ εt+1 (3.33)


which is, together with (3.30) and (3.28) a complete solution of the stochastic model. However,

now we need to re-introduce the dynamics of the system. Using θt = Z−1kθ x1,t from (3.28) in (3.33)

gives

x1,t+1 = ZkθTθθZ−1kθ x1,t + εt+1 (3.34)

Similarly, combining x2,t = Zλθθt with θt = Z−1kθ x1,t (both from (3.28)) gives

x2,t = ZλθZ−1kθ x1,t (3.35)

Summarise the last two equations to obtain

x1,t+1 = Mx1,t + εt+1, and (3.36)

x2,t = Cx1,t (3.37)

where the definitions of M and C are obvious, looking at (3.34) and (3.35).

Example 3.1: The Cagan model (see, for example, Blanchard and Kahn (1980)).

\ln M_t - \ln P_t = -ω\,i_t,   where ω > 0   (E3.1.1)

where M_t, P_t and i_t denote the money stock, the price level and the nominal interest rate, so that \ln M_t - \ln P_t is the (log) real money balance. The real interest rate is assumed to be constant over time, so the Fisher equation becomes

i_t = E_t(\ln P_{t+1} - \ln P_t) + cons   (E3.1.2)

Combining the two equations and rearranging gives

\ln P_t = (1 - α)\ln M_t + α\,E_t \ln P_{t+1},   with 0 < α < 1 since α = ω/(1 + ω)   (E3.1.3)

Assume that the money supply is an exogenous AR(1)

lnMt+1 = ρ lnMt + εt+1 (E3.1.4)

The model can be rewritten and represented in state-space form as in (3.23):

\begin{bmatrix} \ln M_{t+1} \\ E_t \ln P_{t+1} \end{bmatrix} = \begin{bmatrix} ρ & 0 \\ \frac{α-1}{α} & \frac{1}{α} \end{bmatrix} \begin{bmatrix} \ln M_t \\ \ln P_t \end{bmatrix} + \begin{bmatrix} ε_{t+1} \\ 0 \end{bmatrix}   (E3.1.5)

The eigenvalues of the matrix A are ρ and 1/α. Now, let us assign α = 0.5 and ρ = 0.9. The Schur decomposition is then

A = \begin{bmatrix} ρ & 0 \\ \frac{α-1}{α} & \frac{1}{α} \end{bmatrix} = \begin{bmatrix} 0.9 & 0 \\ -1 & 2 \end{bmatrix},   and

Z ≈ \begin{bmatrix} -0.740 & -0.673 \\ -0.673 & 0.740 \end{bmatrix},   T = \begin{bmatrix} 0.9 & -1 \\ 0 & 2 \end{bmatrix},   Z^{H} ≈ \begin{bmatrix} -0.740 & -0.673 \\ -0.673 & 0.740 \end{bmatrix}   (E3.1.6)


To solve the model in (E3.1.5), recall equation (3.35) and write

\ln P_t = Z_{21} Z_{11}^{-1} \ln M_t = -0.673 \cdot (-0.740)^{-1} \ln M_t ≈ 0.909 \ln M_t,   the exact answer being \tfrac{10}{11} \ln M_t

Equation (3.34) gives the solution for the AR(1) process of the money supply:

\ln M_{t+1} = Z_{11} T_{11} Z_{11}^{-1} \ln M_t + ε_{t+1} = -0.740 \cdot 0.9 \cdot (-0.740)^{-1} \ln M_t + ε_{t+1} = 0.9 \ln M_t + ε_{t+1}
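The same calculation can be reproduced numerically with an ordered Schur decomposition. The following scipy sketch solves the Cagan example; the option sort="iuc" (eigenvalues inside the unit circle ordered first) takes care of the reordering required in section 3.3. It is meant as an illustration, not as a general-purpose solver.

```python
import numpy as np
from scipy.linalg import schur

alpha, rho = 0.5, 0.9
A = np.array([[rho, 0.0],
              [(alpha - 1.0) / alpha, 1.0 / alpha]])   # matrix of (E3.1.5)

# real Schur decomposition A = Z T Z', stable eigenvalues ordered first
T, Z, n_stable = schur(A, output="real", sort="iuc")

Z_k = Z[:1, :n_stable]                  # block Z_k(theta): predetermined variable ln M
Z_l = Z[1:, :n_stable]                  # block Z_lambda(theta): forward-looking ln P
T_tt = T[:n_stable, :n_stable]

M = Z_k @ T_tt @ np.linalg.inv(Z_k)     # (3.36): ln M_{t+1} = M ln M_t + eps_{t+1}
C = Z_l @ np.linalg.inv(Z_k)            # (3.37): ln P_t = C ln M_t

print(M)    # about 0.9
print(C)    # about 0.909 (= 10/11)
```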

Example 3.2: The Cagan model with too many stable roots

Consider the Cagan model in Example 3.1 but change the price equation to

\ln P_t = \ln M_t + a\,E_t \ln P_{t+1}   (E3.2.1)

The model can be written in matrix notation as

\begin{bmatrix} \ln M_{t+1} \\ E_t \ln P_{t+1} \end{bmatrix} = \begin{bmatrix} ρ & 0 \\ -\frac{1}{a} & \frac{1}{a} \end{bmatrix} \begin{bmatrix} \ln M_t \\ \ln P_t \end{bmatrix} + \begin{bmatrix} ε_{t+1} \\ 0 \end{bmatrix}   (E3.2.2)

with the eigenvalues ρ and 1/a. For illustration, suppose that |aρ| < 1. Then, iterating on

the price equation in (E3.2.1) gives the stable fundamental solution

\ln P^{*}_t = \sum_{s=0}^{∞} a^{s}\,E_t \ln M_{t+s} = \frac{1}{1 - aρ}\,\ln M_t

However, the full set of solutions is lnPt = lnP ∗t + bt, where bt is a bubble. Try this in

(E3.2.1) to get

\frac{1}{1 - aρ}\ln M_t + b_t = \ln M_t + a\,E_t\!\left(\frac{1}{1 - aρ}\ln M_{t+1} + b_{t+1}\right) = \ln M_t + \frac{aρ}{1 - aρ}\ln M_t + a\,E_t b_{t+1}

Matching the bubble terms shows that b_t = a\,E_t b_{t+1}, that is, E_t b_{t+1} = b_t/a. When |a| < 1 the bubble is unstable and we may choose b_t = 0 to obtain an economically meaningful (i.e. stable) solution for the price level, \ln P_t = \ln P^{*}_t. However, with |a| > 1 there is an infinite number of stable bubbles, all of which imply a stable price level, and there is no reason to choose one over the other.


3.4 The Problem of Multiple Solutions34

First, consider the simple case where the model is linear and includes expectations only of current

endogenous variables but not of future endogenous variables. In this case it is very easy to eliminate

the expectation variables. Let the model be

CZt +A1Zt−1 +A2Zt−2 + C0Zt|t−1 + ΓXt = ut (3.38)

where ut is a vector of serially uncorrelated disturbances and Xt is a vector of strictly exogenous

variables, which are known at the end of period t−1. That is: Xt|t−1 = Xt. Taking the conditional

expectation of (3.38) yields

CZt|t−1 +A1Zt−1 +A2Zt−2 + C0Zt|t−1 + ΓXt = 0 (3.39)

Solve for Zt|t−1 to obtain

Zt|t−1 = −(C + C0)−1(A1Zt−1 +A2Zt−2 + ΓXt) (3.40)

Substitute (3.40) for Zt|t−1 into (3.38)

CZt +A1Zt−1 +A2Zt−2 + ΓXt − C0(C + C0)−1(A1Zt−1 +A2Zt−2 + ΓXt)

= CZt + [I − C0(C + C0)−1](A1Zt−1 +A2Zt−2 + ΓXt) = ut (3.41)

which is a model including only observed variables. Call (3.41) a solution to the model (3.38).

The whole story gets a little bit trickier when the model includes also expectations of future

endogenous variables. The above mentioned procedure will not work.

3.5 Solution to Linear Expectation Models

Let the reduced form of a system of linear simultaneous equations be written as

C−1(CZt+A1Zt−1+ . . .+ApZt−p+C0Zt|t−1+C1Zt+1|t−1+ . . .+CqZt+q|t−1+ΓXt) = C−1ut = vt

(3.42)

where Zt is a vector of G endogenous variables, Xt is a vector of K exogenous variables, ut is a

vector of normal and serially uncorrelated random disturbances, and Zt+i|t−1 is the expectation

of Zt+i conditional on the information up to the end of date t − 1. The Xt are treated as given

when the model is solved to explain Zt.

Assume that (3.42) is consistent with the following model

Zt = R0vt +R1vt−1 +R2vt−2 + . . .+K0Xt +K1Xt−1 +K2Xt−2 + . . . (3.43)

This assumption is justified because any linear model explaining Zt by Zt−1, Zt−2, . . . , Xt, Xt−1, . . . ,

and vt can be put in this form after repeated substitutions for the lagged Z’s. Taking expectation

34Source: Chow, Econometrics, 1983, Chapter 11.


of (3.42) and subtracting the result from (3.42) yields Zt − Zt|t−1 = vt. Similar operations on

(3.43) give Zt − Zt|t−1 = R0vt. This implies that R0 = I.

To find the relation between Zt+m|t−1 and Zt+m proceed as follows: Advance the time subscripts

in (3.43) by m

Zt+m = vt+m +R1vt+m−1 +R2vt+m−2 + . . .+K0Xt+m +K1Xt+m−1 +K2Xt+m−2 + . . .

and take the conditional expectation of the result given It−1

Z_{t+m|t-1} = R_{m+1}v_{t-1} + R_{m+2}v_{t-2} + . . . + K_0 X_{t+m|t-1} + K_1 X_{t+m-1|t-1} + K_2 X_{t+m-2|t-1} + . . . + K_m X_t + K_{m+1} X_{t-1} + . . .

The two results can now be subtracted from each other to obtain

Zt+m−Zt+m|t−1 = vt+m+R1vt+m−1+R2vt+m−2+ . . .+Rmvt+K0et+m+ . . .+Km−1et+1 (3.44)

where et+j = Xt+j −Xt+j|t−1. Equation (3.44) can be used to explain all expectations Zt+m|t−1

(m = 0, 1, . . . , q) in model (3.42) by the directly observed Zt+m minus a linear combination

of vt+m−k (k = 0, 1, . . . ,m) and et+m−k (k = 0, 1, . . . ,m − 1). If C−1Cq is non-singular, the

resulting model can explain the dynamic evolution of the entire vector Zt+q. This model is free

of expectation variables and can be shown to be consistent with model (3.42) by the method to

treat (3.46) below. It does employ the additional parameters R1, . . . , Rq and K0, . . . ,Kq−1 which

characterise the multiple solutions to (3.42).

However, frequently the matrix C−1Cq is singular because some elements of Zt+q|t−1 are not

included in model (3.42). The matrix Cq will have columns of zeros corresponding to these

elements. Let Zat+q|t−1 be a vector consisting of the g1 nonzero elements of Zt+q|t−1 and reorder

the elements of Zt to write

CqZt+q|t−1 = (Caq 0)Zt+q|t−1 = CaqZat+q|t−1

Consider the reduced-form equations for Zat , which are the first g1 equations of model (3.42).

Denote the first g1 rows of C−1 by C−1a and write the reduced-form equations as

C−1a (CZt +A1Zt−1 + . . .+ApZt−p + C0Zt|t−1 + C1Zt+1|t−1 + . . .

+ CaqZat+q|t−1 + ΓXt) = C−1a ut ≡ vat (3.45)

where C−1a CZt = Zat . Our strategy is to find a model free of expectation variables to replace

(3.45) for explaining Zat , a second model to explain another subvector Zbt of Zt which represents

variables appearing in Zt+q−1|t−1 but not in Zt+q|t−1, and so forth.


To explain Zat+q, we use (3.44) to replace all expectation variables in (3.45) to yield the model

C−1a (CaqZat+q + . . . + C1Zt+1 + (C0 + C)Zt + A1Zt−1 + . . . + ApZt−p + ΓXt)

= C−1a Cq(K0et+q + . . . + Kq−1et+1) + . . . + C−1a C1(K0et+1)

+ C−1a Cq(vt+q + R1vt+q−1 + . . . + Rqvt) + . . . + C−1a C1(vt+1 + R1vt) + C−1a C0vt + vat

= Nqet+q + . . . + N1et+1 + C−1a Caq vat+q + Dq−1vt+q−1 + . . . + D0vt (3.46)

where the matrices Dj (j = 0, 1, . . . , q − 1) are g1 ×G and are defined by the line above. Assume

that C−1a Caq is non-singular. If this is not the case, it is possible to rearrange the equations of

(3.46) to make it so (see Chow (1983), p. 358).

To show that (3.46) is consistent with (3.45), take expectations of (3.46) given It−1, which gives

C−1a (CaqZat+q|t−1 + . . .+C1Zt+1|t−1 + (C0 +C)Zt|t−1 +A1Zt−1 + . . .+ApZt−p + ΓXt) = 0 (3.47)

Take also expectations of (3.46) given It+q−1 and subtract the result from (3.46) to obtain

C−1a Caq (Zat+q − Zat+q|t+q−1) = C−1a Caq vat+q

which implies Zat − Zat|t−1 = vat . Replacing C−1a CZt|t−1 = Zat|t−1 in (3.47) by Zat − vat gives the

reduced form (3.45).

Model (3.46) can now be used to explain Zat . It is free of expectation variables, but employs

additional parameters. It is called a solution to the reduced-form equations for Zat because it

is a stochastic model describing how Zat evolves. To specify (3.46) completely, specification of

et+m = Xt+m −Xt+m|t+m−2 (m = 1, . . . , q) is needed. In practice, Xt is assumed to be governed

by some stochastic process independent of both ut and Zt−k (k ≥ 0) and Xt|t−2 is assumed

to be a linear function of Xt−1, . . . , Xt−s. The latter is justified when the process governing

Xt is autoregressive of order s. If Xt obeys an ARMA process, it will be assumed that the

expectation Xt|t−2 is formed by s lagged values of Xt because economic agents only possess so

much information. In any case, assume that the modelling of the Xt process has been completed

and et+1, . . . , et+q will be treated as observable variables. With this understanding, the solution

(3.46) to the reduced-form equations of Zat is obtained by (i) replacing all expectation variables

by their actual values (dropping the |t − 1 in the subscripts), and (ii) replacing the residual vat

by C−1a Caq vat+q + Dq−1vt+q−1 + . . . + D0vt, plus a linear function Nqet+q + . . . + N1et+1 of the

one-period-ahead prediction errors et+1, . . . , et+q of the Xt process.

3.6 Conclusion

The complexity of the three different underlying models may vary substantially. However, in the

mechanics of solving each problem, some similarities arise.


4 Topic 4: Models of Optimising Agents35

Introduction

The techniques and methods discussed above help to understand the impact of selected government policies. However, they cannot answer the question which policy should be adopted in order to achieve the "best" result. This is the scope of optimal control, which will be discussed now.

A government usually has the power to change or manipulate some variables in the system. Those

variables are called policy variables and are often also referred to as control variables or instruments. In most econometric models these variables are part of the exogenous variables. Thus, it

is not explained by the model how the values of those variables are determined.

Two different kinds of policy can be distinguished. The first is a specification of the time paths

of the policy variables at the beginning of a planning period. These paths are to be followed by

the policy maker without regard to future events. The second kind is a specification of the policy

variables as functions of observations yet to be made. This function is called a control rule or

a control equation. The term feedback is added to indicate that the results of current policy

will determine future policy. An example is

Xt = GZt−1 + g (4.1)

where Xt is a q× 1 vector of policy variables, G a q× p matrix of coefficients and g a q× 1 vector

of intercepts. If a feedback control rule is used, the values of the policy variables in the future will

depend on future observations.

Both kinds of policies can be incorporated into an econometric model for dynamic analysis. The

first amounts to specifying a time path for the vector Xt. Using a linear model,

Zt = BZt−1 + CXt + b + ut
= B^t Z0 + ut + But−1 + . . . + B^{t−1}u1 + b + Bb + . . . + B^{t−1}b + CXt + BCXt−1 + . . . + B^{t−1}CX1 (4.2)

A control rule, such as (4.1) can be combined with the linear model (4.2) to form

Zt = (B + CG)Zt−1 + (b+ Cg) + ut (4.3)
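To fix ideas, the closed-loop system (4.3) can be simulated with a few lines of Matlab. All matrices below are hypothetical and purely illustrative; the point is only that under the rule (4.1) the dynamics are governed by B + CG.

% Simulate Z_t = B Z_{t-1} + C X_t + b + u_t under the feedback rule X_t = G Z_{t-1} + g
B = [0.9 0.1; 0 0.8];  C = eye(2);  b = [0.1; 0.2];          % illustrative model (4.2)
G = [-0.3 0; 0 -0.2];  g = [0; 0];                           % illustrative rule (4.1)
T = 50;  Z = zeros(2, T+1);  Z(:,1) = [1; 1];                % initial condition Z_0
rng(1);                                                      % fix the shocks for reproducibility
for t = 2:T+1
    u = 0.1*randn(2,1);                                      % serially uncorrelated disturbance
    X = G*Z(:,t-1) + g;                                      % feedback control rule (4.1)
    Z(:,t) = B*Z(:,t-1) + C*X + b + u;                       % equals (B+CG)Z_{t-1} + (b+Cg) + u, cf. (4.3)
end
disp(eig(B + C*G));                                          % closed-loop eigenvalues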

The crucial question is now, which criteria should be used to judge the outcome of a certain

policy. If the means, variances and covariances are important characteristics of the time series for

dynamic analysis, perhaps they should be included in the criterion function. Such a function is

called a criterion or objective function. It is a scalar function that measures the desirability of

the variables or their characteristics. When the variables are deterministic, they can enter as arguments in the objective function. In this case the term welfare function is also used in the

context of macroeconomics. A widely used - and mathematically convenient - objective function

is a quadratic function of the economic variables generated by the stochastic model.

35Source: Chow, Analysis and Control of Economic Systems, 1986, Chapter 7.


As stated above, the quadratic form of the objective function is widely used, among other reasons

because it is mathematically quite easy to handle. However, the objective function, as well as

every economic model, does not represent reality exactly; it is just an approximation. A more complicated form of the objective function could also be used, but computational feasibility limits the complexity. Therefore, it should always be kept in mind that economic models are an approximation to reality and that the conclusions drawn from them are only as good as the model itself.

Remember the simple model used in topic 1 (1.1). We may extend this model now by some

exogenous variables.

Zt = b+B1Zt−1 +B2Zt−2 + . . .+BqZt−q + C0Xt + . . .+ CpXt−p + ut (4.4)

Bi and Ci are given constant matrices and ut is a serially uncorrelated vector with mean zero and covariance matrix Ω. The matrices Bi and Ci could also change over time, but for the sake of simplicity, we rule this out.

The system (4.4) can be written as a first-order system

Zt = b+BZt−1 + CXt + ut (4.5)

The newly defined Zt includes current and (possibly) lagged dependent variables as well as current

and (possibly) lagged control variables, whereas Xt remains the same.

The performance of the system is measured by the deviations of Zt, as defined in (4.5), from the

target vector at. The vector at has the same dimension as Zt. However, as Zt includes also lagged

variables, only the elements corresponding to the current variables are relevant. The objective is

to minimise

E0W = E0 Σ_{t=1}^{T} (Zt − at)′K(Zt − at) (4.6)

where the expectation E0 is conditional on the initial condition Z0, again in the notation of

(4.5) and K is a known, symmetric (usually diagonal), positive semidefinite matrix, with zero

elements normally corresponding to lagged (both endogenous and control) variables. Often, if the

econometric model contains a large number of variables, only a few are relevant for the welfare

function. An optimal control problem is to minimise the expected welfare loss (4.6),

given the econometric model (4.5).

Two of the favourite candidates for the inclusion in the welfare function are the changes in the

price level and the rate of unemployment. If these two variables are already part of the vector

Zt, we can simply assign a number to the corresponding elements in at. If only the price level,

but not its change is included, there are several ways to incorporate this in the loss function. One

is to convert the desired changes in the price level to a time path for the price index. This may

be a path of a constant absolute or constant percentage growth from period 0. Another way is

to introduce a new endogenous variable ∆Zi,t, using the identity ∆Zi,t = Zi,t − Zi,t−1 assuming

that Zi,t is the general price index. At this point it is important to note that controlling the

level is different from controlling its first difference. The first does not penalise period-to-period


changes whereas the second does. Controlling ∆Zi,t to keep it as close as possible to a constant will attach a high cost to an oscillating time path. It is also feasible to use both the level and the

first differences of the same variable in the welfare function.

Two possible defects of a quadratic welfare function as stated in (4.6) should be discussed.

First, a positive deviation from target is assigned the same cost as a negative deviation. This

seems particularly odd for unemployment: if the target for the unemployment rate is 3 percent,

an unemployment rate of 2.5 percent and one of 3.5 percent are treated the same. However, this may be solved easily by assigning a target that is idealistically low and unlikely to be achieved, such that the actual outcome will typically lie above the target. If so, the probability of being on the low side of the target is small and the errors introduced in assessing the negative deviations will be small.

Second, the welfare function is additive, that is, the sum of functions in different periods. In this

way it is not possible to measure the variance over time. Look at the following example:

Example 4.1: Defect of the quadratic welfare function

Look at a scalar variable yt for two periods. Let the expected loss function be

E(W) = E(y1² + y2²)

Now, let y1, y2 be random variables taking only values 0 and 1 with equal probabilities. The

two variables are statistically independent. So

E(W ) = 1

The sum y1 + y2 can take values 0, 1 and 2 with probabilities .25, .5 and .25, respectively.

If we now say y1 = y2, the sum can take values 0 and 2 with probability .5 for both. This

seems to be riskier, but has the same expected loss.

This kind of shortcoming can be solved by taking into account also the cross-product term y1y2

in the loss function to penalise positive covariance between the outcomes in the two periods.
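The numbers in Example 4.1 are easily verified by enumerating the equally likely outcomes; a minimal sketch:

% Example 4.1: same expected loss, different risk
outcomes = [0 0; 0 1; 1 0; 1 1];                 % independent case, probability 1/4 each
EW_indep = mean(sum(outcomes.^2, 2));            % E(y1^2 + y2^2) = 1
EW_dep   = mean([0^2 + 0^2, 1^2 + 1^2]);         % y1 = y2 case, probability 1/2 each; also 1
fprintf('E(W): independent %g, dependent %g\n', EW_indep, EW_dep);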

4.1 Solution of the Optimal Control Problem

In order to solve the optimal control problem, it is decomposed into two parts. The first is a

deterministic control problem that uses the deterministic model

Z̄t = BZ̄t−1 + CX̄t + bt, (Z̄0 = Z0) (4.7)

which is obtained by setting the random disturbance ut in (4.5) equal to its mean value 0. The

bars are used to indicate the endogenous and control variables in this deterministic model. The

second part is obtained by subtracting (4.7) from (4.5). This is a stochastic control problem that

uses the stochastic model

Z∗t = BZ∗t−1 + CX∗t + ut, (Z∗0 = 0) (4.8)


with Z∗t = (Zt − Z̄t). Following the decomposition of the model, the welfare function may also be partitioned accordingly:

E(W) = Σ_{t=1}^{T} (Zt − at)′K(Zt − at) + E Σ_{t=1}^{T} Z∗′t K Z∗t = W1 + E(W2) (4.9)

The following steps will be to solve first the deterministic control problem to minimise W1 with respect to X̄t, given the model (4.7). Then, a stochastic control problem is to minimise E(W2)

with respect to X∗t , given the stochastic model (4.8). When the two separate solutions are found,

the policy variables Xt are set equal to the sum of the two partial solutions.

4.2 Solution to the Deterministic Control by Lagrange Multipliers

To solve the deterministic control problem introduce the vectors λt of Lagrange multipliers and

differentiate the Lagrangian expression

L1 = ½ Σ_{t=1}^{T} (Zt − at)′Kt(Zt − at) − Σ_{t=1}^{T} λ′t(Zt − BZt−1 − CXt − bt) (4.10)

The differentiation rules are

∂(x′a)/∂x = a and ∂(½ x′Ax)/∂x = Ax (4.11)

where a is a vector of constants, A a symmetric matrix of constants and x the variable vector.

The following derivatives are obtained:

∂L1/∂Zt = Kt(Zt − at) − λt + B′λt+1 = 0, (t = 1, . . . , T; λT+1 = 0) (4.12)

∂L1/∂Xt = C′λt = 0, (t = 1, . . . , T) (4.13)

∂L1/∂λt = −Zt + BZt−1 + CXt + bt = 0, (t = 1, . . . , T) (4.14)

The dynamic situation is treated simply by defining the variables Zt, Xt and λt at different

points in time as separate variables. The dynamic nature of the problem, however, gives a special

structure to the simultaneous equations (4.12), (4.13) and (4.14) and requires special methods for

their efficient solution.

One efficient way of solving the system is to start with t = T and repeat the following three steps

backward in time for t = T − 1, . . . , 1. First, (4.12) is used to express λt as a function of Zt.

Second, the result, together with (4.13) and (4.14), is used to solve for Xt. Third, the results of

the first two steps together with (4.14) are used to express Zt and λt as linear functions of Zt−1.

Using the last linear function, express λt−1 as a linear function of Zt−1, and begin again with step

one for the next round. For t = T step one, using (4.12) gives

λT = KT ZT −KTaT +B′λT+1 = HT ZT − hT (4.15)


where, anticipating generalisation to t < T , we have set

HT = KT (4.16)

and

hT = KTaT (4.17)

By (4.13), (4.14) and (4.15) in the second step

C′λT = 0 = C′(HT ZT − hT) = C′(HTBZT−1 + HTCXT + HT bT − hT) (4.18)

which implies

XT = GT ZT−1 + gT (4.19)

where

GT = −(C′HTC)−1C′HTB (4.20)

gT = −(C′HTC)−1C′(HT bT − hT) (4.21)

The matrix C′HTC is assumed to be non-singular. If the rank r of C′HTC is smaller than the

number q of control variables, we can arbitrarily set (q− r) elements of XT in (4.18) equal to any

desired values and solve (4.18) for the remaining elements of XT . In the third step we use (4.14)

and (4.19) to solve for ZT as a function of ZT−1.

ZT = (B + CGT )ZT−1 + CgT + bT (4.22)

This result can be applied to (4.15) to express λT also as a function of ZT−1.

λT = HT (B + CGT )ZT−1 +HT (CgT + bT )− hT (4.23)

Having solved for λT in terms of ZT−1, substitute (4.23) into (4.12) to obtain an equation analogous

to (4.15) in the first step:

λT−1 = KT−1(ZT−1 − aT−1) + B′λT = HT−1ZT−1 − hT−1 (4.24)

where

HT−1 = KT−1 +B′HT (B + CGT ) (4.25)

hT−1 = KT−1aT−1 − B′HT(bT + CgT) + B′hT (4.26)

The development from (4.18) on can now be followed, with T − 1 replacing T , and so on.

To apply this solution to the deterministic control problem use the pair of equations (4.20) and

(4.25) to obtain GT , HT−1, GT−1, . . . consecutively backward in time with (4.16) as the initial

condition. Then, given HT , use the pair of equations (4.21) and (4.26) to obtain gT , hT−1, gT−1, . . .

consecutively backward in time with (4.17) as the initial condition. Having obtained Gt and gt,

we set the optimal Xt by the linear feedback control rule (4.19) on Zt−1.
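The backward recursion just described is straightforward to program. The sketch below assumes time-invariant B, C, K, a and b with hypothetical values (they are not taken from any estimated model) and runs (4.20)-(4.21) and (4.25)-(4.26) from t = T down to t = 1, storing Gt and gt.

% Backward recursion for the deterministic control problem, equations (4.16)-(4.17), (4.20)-(4.21), (4.25)-(4.26)
B = [0.9 0.1; 0 0.8];  C = [1; 0.5];  b = [0.1; 0.2];   % illustrative model (4.7)
K = diag([1 0]);  a = [2; 0];  T = 20;                  % illustrative loss weights and target
p = size(B,1);  q = size(C,2);
G = zeros(q,p,T);  g = zeros(q,T);
H = K;                        % H_T = K_T, equation (4.16)
h = K*a;                      % h_T = K_T a_T, equation (4.17)
for t = T:-1:1
    G(:,:,t) = -(C'*H*C)\(C'*H*B);              % equation (4.20)
    g(:,t)   = -(C'*H*C)\(C'*(H*b - h));        % equation (4.21)
    Hnew = K + B'*H*(B + C*G(:,:,t));           % equation (4.25)
    hnew = K*a - B'*H*(b + C*g(:,t)) + B'*h;    % equation (4.26)
    H = Hnew;  h = hnew;
end
% Optimal deterministic policy in period t: Xbar_t = G(:,:,t)*Zbar_{t-1} + g(:,t), cf. (4.19)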


Note that the solution of the optimal Xt is a linear function of Zt−1. Note also that the coefficients

Gt and gt are obtained by solving two sets of difference equations backward in time. Let there

be p dependent variables in Zt−1, including the lagged variables and control variables. Let there

be q control variables. The matrices Gt are q × p. They are obtained sequentially by (4.20) and (4.25), which can be combined to form a matrix difference equation in the p × p matrix Ht:

Ht−1 = Kt−1 + B′tHt(Bt + CtGt) = Kt−1 + B′tHtBt − B′tHtCt(C′tHtCt)−1C′tHtBt (4.27)

Equation (4.27) is known as a matrix Riccati equation and is an example of a nonlinear difference

equation. The matrices Ht are symmetric because HT is symmetric; given symmetric KT−1 and

HT , HT−1 is also symmetric by (4.27), and so on. The difference equation (4.27) is easy to solve.

Each step for one time period involves essentially matrix multiplications. The matrix C ′tHtCt to

be inverted is of order q only. In practice, the number q of control variables is probably quite small

even if the number p of dependent variables in the model is large. Similarly, the intercepts gt in

the optimal control equations are obtained by solving (4.21) and (4.26), which can be combined

to form a vector difference equation in the p× 1 vector ht

ht−1 = Kt−1at−1 − B′tHt(bt + Ctgt) + B′tht
= Kt−1at−1 − B′tHtbt + B′tHtCt(C′tHtCt)−1C′t(Htbt − ht) + B′tht
= Kt−1at−1 + (Bt + CtGt)′(ht − Htbt) (4.28)

4.3 Solution of Stochastic Control by Lagrange Multipliers

The second part of the control problem is to minimise E(W2) in (4.9), given the stochastic system

(4.8). We seek the optimal linear feedback equation for the control variable X∗t . That is, we search

for the optimal matrix Gt in the linear equation

X∗t = GtZ∗t−1 (4.29)

To solve the problem by the method of Lagrange multipliers first rewrite the objective function in

terms of the (nonstationary) covariance matrices E(Z∗t Z∗′t ) of the vectors Z∗t and utilise a set of

restrictions on these covariance matrices to form a Lagrange expression. The objective function is

written as

E(W2) = E Σ_{t=1}^{T} tr(Z∗′t KtZ∗t) = E Σ_{t=1}^{T} tr(KtZ∗tZ∗′t) = Σ_{t=1}^{T} tr Kt(E(Z∗tZ∗′t)) (4.30)

where tr, or trace, is the sum of the diagonal elements of a matrix and use is made of the property

tr(BG) = tr(GB). In the second term, tr(Z∗′t KtZ∗t) is simply the scalar Z∗′t KtZ∗t itself. To find

a set of restrictions on the covariance matrices when the control rule (4.29) is enforced substitute


(4.29) into (4.8)

Z∗t = (Bt + CtGt)Z∗t−1 + ut

= RtZ∗t−1 + ut, (Z∗0 = 0) (4.31)

where the definition of Rt is obvious. Postmultiply (4.31) by Z∗′t and take expectations:

E(Z∗tZ∗′t) = RtE(Z∗t−1Z∗′t) + E(utZ∗′t) = RtE(Z∗t−1Z∗′t) + V (4.32)

because by (4.31) E(utZ∗′t) = E(utu′t) = V. Similarly, transpose equation (4.31), premultiply the result by Z∗t−1, and take expectations:

E(Z∗t−1Z∗′t) = E(Z∗t−1Z∗′t−1)R′t + E(Z∗t−1u′t) = E(Z∗t−1Z∗′t−1)R′t, since E(Z∗t−1u′t) = 0 (4.33)

Substitution of (4.33) into (4.32) gives the desired difference equation in E(Z∗tZ∗′t) = Γ(t, 0) = Γ.t:

E(Z∗tZ∗′t) = RtE(Z∗t−1Z∗′t−1)R′t + V (4.34)

or

Γ.t = (Bt + CtGt)Γ.t−1(Bt + CtGt)′ + V

To incorporate (4.34) as a set of constraints for t = 1, . . . , T in a Lagrange expression for minimising

(4.30), a p × p matrix Ht of Lagrange multipliers is introduced for each constraint. If each element of a p × p matrix X is constrained to be zero, the sum of the p² constraints in the form Σ_{i,j} hij xij has to be added to the Lagrange expression, where xij is the (i, j) element of X and hij is the corresponding Lagrange multiplier. The sum of these constraints can be written as the trace of the product HX′. Because the i-th diagonal element of the product HX′ is Σ_j hij xij, the trace of HX′, denoted by tr(HX′), is Σ_{i,j} hij xij, a term to be included in the Lagrange expression if the matrix X is constrained to be the 0 matrix. Therefore, for our problem of minimising (4.30),

subject to the matrix constraints (4.34), write the Lagrange expression as

L2 = Σ_{t=1}^{T} tr(KtΓ.t) − Σ_{t=1}^{T} tr Ht[Γ.t − V − (Bt + CtGt)Γ.t−1(Bt + CtGt)′] (4.35)

The variables are the elements of Gt,Γ.t, and Ht.

The differentiation rule

∂ tr(BG)/∂G = ∂ tr(GB)/∂G = B′

is applied to (4.35). Consider differentiating

tr(HtCtGtΓ.t−1B′t) = tr(GtΓ.t−1B′tHtCt)

with respect to Gt, for example. Application of this rule yields

(Γ.t−1B′tHtCt)′ = C′tHtBtΓ.t−1


as the matrix of derivatives. Note that the matrix Ht of Lagrange multipliers is symmetric because

the corresponding constraints

Γ.t − V − (Bt + CtGt)Γ.t−1(Bt + CtGt)′ = 0

constitute a symmetric matrix. The derivatives of L2 are

∂L2/∂Gt = 2C′tHtBtΓ.t−1 + 2C′tHtCtGtΓ.t−1 = 0, (t = 1, . . . , T) (4.36)

∂L2/∂Γ.t = Kt − Ht + (Bt+1 + Ct+1Gt+1)′Ht+1(Bt+1 + Ct+1Gt+1) = 0, (t = 1, . . . , T − 1) (4.37)

∂L2/∂Γ.T = KT − HT = 0 (4.38)

∂L2/∂Ht = −[Γ.t − V − (Bt + CtGt)Γ.t−1(Bt + CtGt)′] = 0, (t = 1, . . . , T) (4.39)

Equations (4.36)- (4.39) provide a set of necessary conditions for the unknowns Gt,Γ.t, and Ht.

Equation (4.36) can be satisfied by choosing

Gt = −(C ′tHtCt)−1C ′tHtBt (4.40)

Equations (4.37) and (4.38) are equivalent to

Ht = Kt + (Bt+1 + Ct+1Gt+1)′Ht+1(Bt+1 + Ct+1Gt+1) (4.41)

Therefore the unknowns Gt and Ht can be found by using the initial condition HT = KT from

(4.38) and solving (4.40) and (4.41) alternately backward in time for t = T, T − 1, . . . , 1. It

is interesting to observe that the coefficients Gt in the optimal feedback control equations X∗t = GtZ∗t−1 for controlling the variances of the random deviations Z∗t are identical with the coefficients Gt in the optimal feedback control equations X̄t = GtZ̄t−1 + gt for steering the means Z̄t to the targets at obtained previously for the deterministic control problem. Equation (4.41) is

easily shown to be the same as (4.25). Having solved for the optimum Gt and Ht, we can find the

covariance matrices Γ.t by (4.39), namely

Γ.t = V + (Bt + CtGt)Γ.t−1(Bt + CtGt)′ t = 1, . . . , T (4.42)

Equation (4.42) is solved forward in time from t = 1 to t = T, using the initial condition Γ.0 = E(Z∗0Z∗′0) = 0, which is due to the definition Z̄0 = Z0 or Z∗0 = 0.
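Given a feedback matrix, the forward recursion (4.42) is equally simple. A minimal sketch with hypothetical matrices (the feedback matrix G is simply assumed here, not derived):

% Forward recursion (4.42) for the covariance matrices Gamma_t, starting from Gamma_0 = 0
B = [0.9 0.1; 0 0.8];  C = [1; 0.5];  V = 0.01*eye(2);   % illustrative system and shock covariance
G = [-0.6 -0.1];  R = B + C*G;  T = 20;                  % assumed feedback matrix; R_t as in (4.31)
Gam = zeros(2);                                          % Gamma_0 = E(Z*_0 Z*_0') = 0
for t = 1:T
    Gam = V + R*Gam*R';                                  % equation (4.42)
end
disp(Gam);   % approaches the stationary covariance when the eigenvalues of R lie inside the unit circle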

4.4 The Combined Solution and the Minimum Expected Loss

Combining the solution of the two previous sections for the deterministic and the stochastic parts

of the control problem gives the optimal control equation for Xt:

Xt = X̄t + X∗t = GtZ̄t−1 + gt + GtZ∗t−1
= GtZt−1 + gt (4.43)


The matrices Gt are computed using (4.20) and (4.25) or equivalently by using (4.40) and (4.41).

The vectors gt are computed by (4.21) and (4.26). In the computations the targets at affect

only gt but not Gt. Both Gt and gt are computed from information already known before any

observation on (Z1, . . . , ZT ) is made. The optimal policy Xt is set by (4.43), using Gt and gt and

the observation on Zt−1.

The combined solution (4.43) could have been found by minimising the expected welfare cost (4.9)

of the original problem, subject to the constraints (4.7) and (4.34), using the method of Lagrange

multipliers. The Lagrange expression for the combined problem is

L = ½ Σ_{t=1}^{T} (Zt − at)′Kt(Zt − at) + ½ Σ_{t=1}^{T} tr(KtΓ.t) − Σ_{t=1}^{T} λ′t(Zt − BZt−1 − CXt − bt) − ½ Σ_{t=1}^{T} tr Ht[Γ.t − V − (Bt + CtGt)Γ.t−1(Bt + CtGt)′] (4.44)

Differentiating L with respect to Xt, Zt, and λt yields (4.12), (4.13), and (4.14). The solution

of these equations provides a control rule for Xt. Differentiating L with respect to Gt,Γ.t and

Ht yields (4.36), (4.37), (4.38), and (4.39). Their solution provides a control rule for X∗t . The

combined solution for Xt is their sum, given in (4.43).

Using this solution, we can compute the minimum expected welfare loss. The expected loss is

conveniently divided into two parts. The deterministic part

W1 = Σ_{t=1}^{T} (Zt − at)′Kt(Zt − at)

is evaluated by using the solution of Zt of the deterministic system under optimal control. That

is

Zt = BtZt−1 + CtXt + bt

= (Bt + CtGt)Zt−1 + Ctgt + bt (4.45)

This solution can be calculated forward in time for t = 1, 2, . . . , T, given Z̄0 = Z0. The second

part of the expected loss is

E(W2) = Σ_{t=1}^{T} tr(KtΓ.t)

The matrices Γ.t required in its evaluation are obtained from (4.42). If the dynamic system were deterministic, with ut absent from (4.5), the optimal control equation would still be Xt = GtZt−1 + gt,

the same as the combined solution for the stochastic model (4.5), but the minimum welfare loss

would consist of only the deterministic part W1. The second part E(W2) of the expected loss is a

measure of the importance of the random disturbances in the welfare calculations.
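Both pieces of the minimum expected loss can thus be evaluated in a single forward pass: W1 from the deterministic path (4.45) and E(W2) from the covariance recursion (4.42). A sketch under hypothetical, time-invariant matrices and an assumed steady-state pair (G, g):

% Evaluate W1 and E(W2) under an assumed feedback pair (G, g); all values illustrative
B = [0.9 0.1; 0 0.8];  C = [1; 0.5];  b = [0.1; 0.2];  V = 0.01*eye(2);
K = diag([1 0]);  a = [2; 0];  G = [-0.6 -0.1];  g = 1.0;  T = 20;
Zbar = [1; 1];  Gam = zeros(2);  W1 = 0;  EW2 = 0;
for t = 1:T
    Zbar = (B + C*G)*Zbar + C*g + b;        % deterministic path under control, equation (4.45)
    W1   = W1 + (Zbar - a)'*K*(Zbar - a);   % deterministic part of the loss
    Gam  = V + (B + C*G)*Gam*(B + C*G)';    % covariance recursion (4.42)
    EW2  = EW2 + trace(K*Gam);              % stochastic part, sum of tr(K_t Gamma_t)
end
fprintf('W1 = %g,  E(W2) = %g\n', W1, EW2);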


If the number of target variables (the number of nonzero elements in the p × p diagonal matrix

Kt) equals the number q ≤ p of control variables, the time path Zt generated by the deterministic

system (4.45) under optimal control will meet the targets exactly and the deterministic part W1 of the minimum expected welfare loss will be 0, provided that Ct is of rank q. For a proof of

the theorem, see Chow (1986), p. 167.

4.5 The Steady-State Solution

An interesting question is under which circumstances Gt and gt in the control equation (4.43) are

stable over time. The answer is provided separately for Gt and gt. Clearly, from (4.40) and (4.41) it follows that the matrices Bt and Ct must be invariant through time. Further, assume that also

Kt does not change over time. By equations (4.40) and (4.41) the steady-state solutions for G

and H need to satisfy

G = −(C ′HC)−1C ′HB (4.46)

and

H = K + (B + CG)′H(B + CG)

= K +R′HR (4.47)

If a matrix G can be found that makes R = (B + CG) small enough for (4.47) to be satisfied for some H, then a steady-state solution for Gt exists. Consider the special case when both B and C are scalars. Then the scalar R can be made zero by choosing G = −C−1B, and H = K is the solution. In a somewhat more general case C is a non-singular p × p matrix. G can be set equal to −C−1B to make the matrix R vanish and again, H = K is the solution for (4.41). This is the case of

having the same number q of control variables as the number p of dependent variables. It cannot

occur if the instruments Xt are embedded in the vector Zt because p will then be larger than q. In

this case, let the matrix K be diagonal with rank q. It can be shown that if the number of target

variables equals the number of instruments the following holds

Kt(Bt + CtGt) = 0 (4.48)

Select H = K and have HR = 0 by (4.48) so that (4.47) can be satisfied.

The equality between the number of target variables and the number q of instruments, together

with the assumption that C is of rank q, has been found sufficient for the existence of a steady-

state solution for G. It is not a necessary condition, though. Let the number of target variables

be greater than q; G can no longer be chosen to make R′KR vanish, but a pair of matrices G and

H may still be found to satisfy (4.46) and (4.47) simultaneously. This is possible if and only if,

for a matrix G satisfying (4.46), the roots of R = B + CG are smaller than 1 in absolute value.

The proof is left to interested students; see Chow (1986).


A steady-state solution for gt can be analysed in a similar way. The assumptions are made that

Bt, Ct, bt and Kt are all invariant through time. Equation (4.28) becomes

ht−1 = Kat−1 + (B + CGt)′(ht − Htb) (4.49)

From (4.49), it can be seen that ht will reach a steady-state when at, Gt and Ht are time-invariant.

If Gt and Ht have a steady-state solution, all roots of R = (B + CG) will be smaller than 1 in

absolute value, as pointed out in the last paragraph. The infinite series

I + R′ + R′² + R′³ + . . .

will converge to (I −R′)−1 or the matrix (I −R′) will be non-singular. The steady-state form of

h from (4.49) can then be found by solving

h = Ka+R′(h−Hb) (4.50)

or

(I − R′)h = Ka − R′Hb

h = (I − R′)−1(Ka − R′Hb) = [I − (B + CG)′]−1(Ka − R′Hb)

Thus, under the assumptions of time-invariant B,C, b,K, and a and a steady-state solution for

Gt, the intercept gt in the optimal feedback control equation will also reach a steady state.
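Numerically, the steady-state pair (G, H) is usually found by iterating (4.46)-(4.47) to convergence, after which h follows from (4.50) by a single linear solve and g from the steady-state version of (4.21). A sketch with hypothetical matrices (it presupposes that the eigenvalues of R = B + CG end up inside the unit circle):

% Steady-state G, H by fixed-point iteration; then h from (4.50) and g from (4.21)
B = [0.9 0.1; 0 0.8];  C = [1; 0.5];  b = [0.1; 0.2];
K = diag([1 0]);  a = [2; 0];                    % illustrative values
H = K;
for it = 1:500
    G = -(C'*H*C)\(C'*H*B);                      % equation (4.46)
    H = K + (B + C*G)'*H*(B + C*G);              % equation (4.47)
end
R = B + C*G;
h = (eye(2) - R')\(K*a - R'*H*b);                % solve (I - R')h = Ka - R'Hb, cf. (4.50)
g = (C'*H*C)\(C'*(h - H*b));                     % steady-state version of (4.21)
disp(G);  disp(g);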

4.6 The Method of Dynamic Programming by an Example36

As an introduction to Dynamic Programming as suggested by Richard Bellman (1957)37 look

at the following example. The problem is to maximise the objective function for three periods

(t = 0, 1, 2):

Σ_{t=0}^{2} β^t r(xt, ut) (4.51)

subject to the dynamic equation

xt+1 = f(xt, ut) + εt+1 (4.52)

where β is the discount factor; r(xt, ut) is the return for period t, which may depend on the

p-component vector xt of state variables and the q-component vector ut of control variables to

be set by the policy-maker; f(xt, ut) is a p-component vector function, and εt+1 is a vector of

random shocks. First, treat the problem as a deterministic problem by assuming the shocks εt+1 to be known to the policy-maker at period t. The three-period problem is to maximise the objective

function (4.51) with respect to u0, u1 and u2 subject to the constraint (4.52). By the method of

dynamic programming, first solve the last period 2. Maximising r(x2, u2) with respect to u2 yields

36Source: Chow, Dynamic Economics - Optimization by the Lagrange Method, 1997, Chapter 2.
37Bellman, R. (1957); Dynamic Programming; Princeton: Princeton University Press.


an optimal feedback control function u2 = g2(x2). This value can then be substituted into the

return function for period 2, obtaining

V2(x2) = r(x2, g2(x2))

which depends on x2 at the beginning of period 2. The next step involves solving the problem for

periods 1 and 2. The problem is to find

V1(x1) = max_{u1} {r(x1, u1) + βV2(x2)} (4.53)

Because u2 is already given as a function of x2, the only control variable remaining is u1. At the beginning of period 1, the economic agent maximises the sum of the two terms inside the curly brackets in (4.53) with respect to u1 to obtain the optimal control function u1 = g1(x1).

Assuming that both V2 and r are differentiable set the vector of derivatives of the expression in

the curly brackets equal to zero

∂/∂u1 {r(x1, u1) + βV2(x2)} = ∂r(x1, u1)/∂u1 + β [∂f′(x1, u1)/∂u1] [∂V2(x2)/∂x2] = 0 (4.54)

Solving (4.54) for u1 yields the optimal control function u1 = g1(x1). When the optimal u1 is

substituted into the two-period objective function, the result is the value function for period 1

V1(x1) = r(x1, g1(x1)) + βV2(f(x1, g1(x1)) + ε2) (4.55)

Note that V1(x1) is the value of the objective function from period 1 onward, assuming all control

variables from period 1 onward, namely u1 and u2, to be optimal. Thus, one problem that

involves two vector variables u1 and u2 is reduced to two problems, each of which involves only

one variable. Instead of finding u1 and u2 simultaneously, first find u2, and then, having found

u2, find u1. Having found u1 and u2 and knowing V1(x1), the next step would be to solve the

three-period problem by finding u0 only. That is, find

V0(x0) = max_{u0} {r(x0, u0) + βV1(x1)}

To generalise, at each period, taking xt as given and having found all future controls ut+1, ut+2, . . . and obtained Vt+1(xt+1), solve

Vt(xt) = max_{ut} {r(xt, ut) + βVt+1(xt+1)} (4.56)

Equation (4.56) is known as the Bellman equation. By the principle of optimality, this solution

method, which uses the Bellman equation for each period and begins from the last period, gives

the optimal solution for all periods. The argument is that, whatever the initial state xt for each

period is, the solution ut = gt(xt) so obtained is optimal, because all future policies ut+1, ut+2, . . .

have been found to be optimal whatever their respective initial states xt+1, xt+2, . . . shall be.
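As a concrete illustration, consider a scalar problem with return r(x, u) = −(x² + u²), dynamics xt+1 = a·xt + b·ut and no shocks. Each value function is then quadratic, Vt(x) = −pt x², and the Bellman equation (4.56) reduces to a scalar recursion in pt. The sketch below uses hypothetical values a = 0.9, b = 1 and β = 0.95 for the three-period problem; the recursion for pt and the feedback coefficients follows from the first-order condition of (4.56) in this quadratic case.

% Three-period scalar LQ illustration of the Bellman recursion (4.56); values are hypothetical
a = 0.9;  b = 1;  beta = 0.95;        % dynamics x_{t+1} = a*x_t + b*u_t, discount factor
p = 1;                                % period 2: u_2 = 0 and V_2(x) = -x^2, i.e. p_2 = 1
gcoef = [0 0 0];                      % feedback coefficients u_t = gcoef(t+1)*x_t
for t = 1:-1:0                        % work backward through periods 1 and 0
    gcoef(t+1) = -beta*p*a*b/(1 + beta*p*b^2);   % optimal feedback given V_{t+1}(x) = -p*x^2
    p = 1 + beta*p*a^2/(1 + beta*p*b^2);         % V_t(x) = -p_t*x^2
end
x = 1;                                % initial state x_0
for t = 0:2                           % simulate the optimal path forward
    u = gcoef(t+1)*x;
    fprintf('t=%d  x=%6.3f  u=%6.3f\n', t, x, u);
    x = a*x + b*u;
end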


4.7 Dynamic Programming38

Recall the linear model of the previous section

Zt = BtZt−1 + CtXt + bt + ut (4.57)

and the quadratic loss function

W = Σ_{t=1}^{T} (Zt − at)′Kt(Zt − at) = Σ_{t=1}^{T} (Z′tKtZt − 2Z′tKtat + a′tKtat) (4.58)

The problem is to choose the optimal policy X1, . . . , XT which minimises the conditional expectation

E0W , given the initial condition Z0.

By the method of dynamic programming the problem is solved first for the last period T , given the

initial condition ZT−1. Having found the optimal policy XT for the last period, the two-period

problem is solved for the last two periods by finding the optimal XT−1, given the initial condition

ZT−2, et cetera. At the last stage the optimal X1 for the first period is found, given the initial

condition Z0.

Two features should be noted at this point. The problem is solved backward in time, step by

step. At each step, only one unknown vector is determined. Thus, the problem of T unknowns

(X1, . . . , XT) is transformed into T problems with only one unknown each. By the principle of

optimality the solution by dynamic programming is optimal because at each time t, whatever the

initial condition, all future policies are optimal.

The problem for the last period can be written as

VT = ET−1(ZT − aT )′KT (ZT − aT ) = ET−1(Z ′THTZT − 2Z ′ThT + cT ) (4.59)

where HT = KT for ease of generalisation to the multiperiod problem later on and where hT = kT ≡ KTaT and cT = a′TKTaT. Using the model (4.57) for ZT and taking expectations, minimise

VT = (BTZT−1 + CTXT + bT)′HT(BTZT−1 + CTXT + bT) − 2(BTZT−1 + CTXT + bT)′hT + ET−1(u′THTuT) + cT (4.60)

where use is made of the fact that the stochastic disturbance uT is independent of ZT−1. Differentiating (4.60) with respect to the policy vector XT yields:

∂VT/∂XT = 2C′THT(BTZT−1 + CTXT + bT) − 2C′ThT = 0 (4.61)

The solution of (4.61) gives the optimal policy for the last period T :

XT = GTZT−1 + gT (4.62)

where

GT = −(C ′THTCT )−1(C ′THTBT ) (4.63)

gT = −(C′THTCT)−1(C′THT bT − C′ThT) (4.64)

38Chow, Analysis and Control of dynamic economic systems, 1975, Chapter 8.


Equation (4.62) is the optimal feedback control equation, which shows that the optimal policy for

period T is a linear function of all the variables ZT−1 that will affect the outcome ZT through

system (4.57). It also shows the multiperiod nature of the solution. The value of XT cannot be

set in period 1, as the necessary outcomes of ZT−1 are still to come, but the optimal XT can be

written as a function of ZT−1 and is needed to determine the optimal policies for earlier periods.

To obtain the minimum expected welfare loss for the last period, conditional on the data ZT−1,

we substitute (4.62) for XT in (4.60)

VT = Z′T−1(BT + CTGT)′HT(BT + CTGT)ZT−1
+ 2Z′T−1(BT + CTGT)′(HT bT − hT)
+ (bT + CT gT)′HT(bT + CT gT) − 2(bT + CT gT)′hT
+ cT + ET−1u′THTuT (4.65)

Note that VT is a quadratic function of ZT−1.

Consider the problem for period T − 1. The two-period problem is to find the optimal policies

XT−1 and XT . However, the solution for XT is already given for the last period as a function of

ZT−1. The only remaining problem is to choose the optimal XT−1. The choice of XT−1 will affect

ZT−1, but whatever ZT−1 will be, the XT found before will always be optimal. This logic carries

over to any period. At any given time t, the problem is to find the optimal Xt, given that the optimal policies for all periods t + 1, . . . , T have already been determined. The solution will affect Zt, but given that all future policies are functions of Zt, all future Xt+1, . . . , XT will be chosen optimally. Given this principle of

optimality in dynamic programming, for period (T − 1), the following term needs to be minimised

with respect to XT−1.

VT−1 = ET−2(Z ′T−1KT−1ZT−1 − 2Z ′T−1kT−1 + a′T−1KT−1aT−1 + VT ) (4.66)

where the welfare cost includes the contribution from the current period T − 1 plus the minimum

cost VT from period T on. Substituting (4.65) for VT into (4.66) yields

VT−1 = ET−2(Z ′T−1HT−1ZT−1 − 2Z ′T−1hT−1 + cT−1) (4.67)

where

HT−1 = KT−1 + (BT + CTGT)′HT(BT + CTGT) (4.68)

hT−1 = kT−1 + (BT + CTGT)′(hT − HT bT) (4.69)

cT−1 = a′T−1KT−1aT−1 + (bT + CT gT )′HT (bT + CT gT )

− 2(bT + CT gT )′hT + cT + ET−1u′THTuT (4.70)

Note that (4.67), which has to be minimised with respect to XT−1, has the same form as (4.59).

We can thus repeat the process from (4.59) to (4.70), replacing the subscript T with T − 1. This is the complete solution for the T-period problem.


Briefly, the aim of optimal control is to find the policies Xt as a linear function of the state variables of the preceding period Zt−1, that is, Xt = GtZt−1 + gt. The matrices Gt and Ht are found by solving equations (4.63) and (4.68) alternately, backward in time from t = T, with the initial condition HT = KT. Successively, the vectors gt and ht are found by solving (4.64) and

(4.69), backward in time from t = T and with initial condition hT = kT ≡ KTaT . The solution

is identical to the solution found using Lagrange multipliers!

Of course, at the beginning of period 1, the policymaker needs to act only on X1, but as shown above, the optimal X1 depends on the matrices G1 and H1, which in turn depend on their values for later periods. Because of the multiperiod dimension of the problem, future optimal policies have to be taken into account when deciding on the current optimal policy. As can be seen, the term ct is not necessary to calculate the optimal policy. However, to determine the welfare loss Vt one needs to know ct, which can be solved backward in time, using (4.70).

In the case of time-invariant Bt, Ct, and Kt the solution for Gt and Ht may reach a steady state for t below some value. The crucial equations (4.63) and (4.68) become

G = −(C′HC)−1(C′HB) (4.71)

H = K + (B + CG)′H(B + CG) (4.72)

Define R = (B + CG) and write expression (4.72) as an infinite series, thus obtaining

H = K + R′KR + R′²KR² + . . . (4.73)

The steady state will only exist if (4.73) converges, that is, if all eigenvalues of R = (B + CG) are

smaller than one in modulus. Even if Gt and Ht do reach a steady state, gt and ht will not do so

when kt ≡ Ktat and bt are changing over time.


4.8 Example to Topic 439

The model used in the paper of Rudebusch and Svensson has the following single equations

πt+1 = απ1πt + απ2πt−1 + απ3πt−2 + απ4πt−3 + αyyt + εt+1 (1')

yt+1 = βy1yt + βy2yt−1 + βr(īt − π̄t) + ηt+1 (2')

Equation (1') is a Phillips curve where πt is the quarterly rate of inflation at an annual rate, that is 400 × (lnPt − lnPt−1), where Pt is the price level. yt is the percentage output gap between actual real GDP and potential real GDP, defined as 100 × (lnYt − lnY∗t). The second equation is an IS curve where it is the quarterly average federal funds rate in percent at an annual rate, īt is the four-quarter average federal funds rate, that is, īt = (1/4) Σ_{j=0}^{3} it−j, and similarly π̄t is the four-quarter average inflation, π̄t = (1/4) Σ_{j=0}^{3} πt−j.

In order to make the model compatible with the notation used in Chow, we may lag the whole

system by one period and obtain

πt = αc + απ1πt−1 + απ2πt−2 + απ3πt−3 + απ4πt−4 + αyyt−1 + εt (1)

yt = βc + βy1yt−1 + βy2yt−2 + βr(īt−1 − π̄t−1) + ηt (2)

The model may be represented as a VAR in the following compact form:

Zt = b + BZt−1 + CXt + εt (3)

Written out in full, equation (3) looks rather complicated, but it is not.

Zt = (πt, πt−1, πt−2, πt−3, yt, yt−1, it−1, it−2, it−3)′, Zt−1 = (πt−1, πt−2, πt−3, πt−4, yt−1, yt−2, it−2, it−3, it−4)′, Xt = it,

b = (αc, 0, 0, 0, βc, 0, 0, 0, 0)′, C = (0, 0, 0, 0, .25βr, 0, 1, 0, 0)′, and the disturbance vector is (εt, 0, 0, 0, ηt, 0, 0, 0, 0)′, with

B =
[ απ1     απ2     απ3     απ4     αy    0     0       0       0
  1       0       0       0       0     0     0       0       0
  0       1       0       0       0     0     0       0       0
  0       0       1       0       0     0     0       0       0
  −.25βr  −.25βr  −.25βr  −.25βr  βy1   βy2   −.25βr  −.25βr  −.25βr
  0       0       0       0       1     0     0       0       0
  0       0       0       0       0     0     0       0       0
  0       0       0       0       0     0     1       0       0
  0       0       0       0       0     0     0       1       0 ]    (4)

39The structure of the model has been taken from Rudebusch/Svensson, Policy Rules for Inflation Targeting,

NBER Working Paper No. 6512.


The results we obtain are not very different from those obtained by the original authors, even though we have enlarged the sample from 1967Q1:1996Q2 to 1956Q2:2006Q2. The results are shown in the EViews outputs below. To make it easier to impose restrictions on the coefficients, the two equations have been estimated separately rather than as a VAR. However, as VAR estimation amounts to equation-by-equation OLS, the results do not differ.


A short interpretation of the estimated coefficients may be given at this point. It seems pretty clear that inflation is well explained by the inflation of the preceding period. However, it is also important that the coefficient "PI(-1)" is smaller than one; otherwise, the system would be explosive. The coefficient on the output gap in the first equation is positive, too. According to the definition, a positive output gap means the economy is in a boom, which puts upward pressure on inflation. In the second equation the coefficient on the real interest rate (the difference between the nominal interest rate and inflation) is negative, which is in line with most macro models.

To compute the optimal feedback control equation

Xt = GtZt−1 + gt (5)

or rather, the values of G and g in this equation, the estimation results are written again in matrix form:

Zt = (πt, πt−1, πt−2, πt−3, yt, yt−1, it−1, it−2, it−3)′ = b + BZt−1 + C it−1, with

b = (0.21, 0, 0, 0, 0.23, 0, 0, 0, 0)′, C = (0, 0, 0, 0, −.016, 0, 1, 0, 0)′, and

B =
[ 0.17   0.28   0.36   0.06   0.08  0     0      0      0
  1      0      0      0      0     0     0      0      0
  0      1      0      0      0     0     0      0      0
  0      0      1      0      0     0     0      0      0
  0.016  0.016  0.016  0.016  1.17  −.27  −.016  −.016  −.016
  0      0      0      0      1     0     0      0      0
  0      0      0      0      0     0     0      0      0
  0      0      0      0      0     0     1      0      0
  0      0      0      0      0     0     0      1      0 ]

Before going on, let us state the loss function, which the central bank faces. We assume that:

L = E0 Σ_{t=1}^{T} (Zt − at)′Kt(Zt − at) (6)

For simplicity, we assume that the matrix Kt is time-invariant and

K is block-diagonal with diagonal elements (1, 0, 0, 0, λ, 0) in the first six positions, the 2 × 2 block [v −v; −v v] in the positions corresponding to it−1 and it−2, and 0 in the last position. (7)

The first-row, first-column element assigns a weight of one to deviations from the inflation target. Taking past deviations from the inflation target into account does not make sense economically, as these values cannot be changed any more. The same holds for the deviation from potential


output, where a weight of λ is assigned to the current gap. Because the current interest rate is

not included in the vector of state variables, the nonzero elements in K assign a weight of v to

inter-temporal changes of the lagged interest rate. A forward-looking central bank nevertheless takes this into account when deciding on its policy. All other elements of K are

zero. Further, we assume the target vector at to be constant over time with

a = [2 0 0 0 0 0 0 0 0]′ (8)

that is, the inflation target (first element) is 2 percent, and the target for the output gap (fifth element) is zero (i.e. the economy grows at potential output). Assigning an interest rate target makes no sense, because only the change in the interest rate level is part of the loss function. The role of the block of v and −v elements in (7) can be made clear in a few words. Take just this block of K and restrict Zt and at accordingly. The part corresponding to interest rates in the sum of (6) can then be written as

[it−1 − 0  it−2 − 0] [v −v; −v v] [it−1 − 0; it−2 − 0] = v·i²t−1 − 2v·it−1it−2 + v·i²t−2 = v(it−1 − it−2)²

Now, coming back to our original K as in (7), the loss function in (6) simplifies to

L = E0 Σ_{t=1}^{T} [(πt − 2)² + λy²t + v(it − it−1)²] (9)

Given the above assumptions, and the fact that the matrices B, C, and b are also invariant over time, the solution for G is a steady-state solution. The equations defining G are

G = −(C ′HC)−1C ′HB (10)

H = K + (B + CG)′H(B + CG)

= K +R′HR (11)

There is no analytical solution to this problem, so we solve it numerically with the parameters found by regression and our assumptions (λ = 1 and v = 1). The Matlab code is provided at the end of the example.

The resulting steady-state optimal feedback control equation is

Xt = GtZt−1 + gt (5)

where

G = [0.19 0.16 0.10 0.02 0.60 −0.18 0.69 −0.02 −0.01] (12a)

and

g = 0.68 (12b)

The optimal feedback control equation may now be written in expanded form

it = 0.19πt−1 + 0.16πt−2 + 0.10πt−3 + 0.02πt−4 + 0.60yt−1 − 0.18yt−2 + 0.69it−2 − 0.02it−3 − 0.01it−4 + 0.68 (5')


Be aware that this optimal feedback control equation is linked directly to the "preferences" of the policymaker, that is, to the values of λ and v in the loss function. A different loss function also yields a different feedback control equation.

The following graphics show the impulse response functions. The difference between the red and

the blue line is the value for v. Thus, the two curves are the results of two different loss functions,

which represent two different sets of preferences of the policymaker. As the difference is in the

weight for inter-temporal changes of the interest rate, the difference is biggest in the response

functions of the interest rate (it moves much less).


Data

All data is downloaded from FRED (Federal Reserve Economic Data) of the Federal Reserve Bank

of St. Louis. The following data has been used:

• Pt: Consumer Price Index for All Urban Consumers: All Items; monthly; seasonally adjusted

(SA)

• Yt: Real Gross Domestic Product, 3 Decimal; Billions of Chained 2000 USD; quarterly;

seasonally adjusted annual rate (SAAR)

• Y ∗t : Real Potential Gross Domestic Product; Billions of Chained 2000 USD; quarterly

• it: Effective Federal Funds Rate; monthly

The data sets for GDP and potential GDP are already in the required form. To calculate quarterly inflation, we only need the CPI observations for the months January, April, July and October and, for each quarter, compute the log-difference between consecutive observations. To compute π̄t, simply take the average of the current and the preceding three inflation rates. To calculate īt, first take the average of the three months of a quarter, and then the average of the current and the preceding three quarters.
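A sketch of these transformations in Matlab may be helpful. The variable names (cpi_m, ffr_m, gdp, gdp_pot) are assumptions: they stand for column vectors of the monthly CPI and federal funds rate and the quarterly GDP series, already loaded into the workspace, with the monthly series starting in January.

% Construct the quarterly series used in equations (1) and (2); variable names are illustrative
cpi_q = cpi_m(1:3:end);                          % January, April, July, October observations
pi_q  = 400*diff(log(cpi_q));                    % quarterly inflation at an annual rate
y_q   = 100*(log(gdp) - log(gdp_pot));           % percentage output gap
ffr_q = mean(reshape(ffr_m(1:3*floor(numel(ffr_m)/3)), 3, []))';   % quarterly average funds rate
pibar = filter(ones(1,4)/4, 1, pi_q);            % four-quarter average inflation (first three entries incomplete)
ibar  = filter(ones(1,4)/4, 1, ffr_q);           % four-quarter average funds rate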

Matlab-Code

function imprespfunc
%
% written by Andreas Walchli
% This program starts the process.
% First, input is asked from the user, then the Impulse Response Functions are
% drawn.
%
clear all;
tic;
prompt = {'Enter weight for \lambda for Output gap:', 'Enter weight for v for Interest Rate'};
dlg_title = 'Input for Impulse Response Functions';
num_lines = 1;
def = {'1','5'};
options.Resize = 'on';
options.WindowStyle = 'normal';
options.Interpreter = 'tex';


answer = inputdlg(prompt, dlg_title, num_lines, def, options);
lambda(1,1) = 1;
lambda(1,2) = str2double(answer{1,1});
v(1,1) = 1;
v(1,2) = str2double(answer{2,1});
% two different kinds of shocks
shock_pi = zeros(1, max(size(example(lambda(1,1)))));
shock_pi(1,1) = 1;
shock_y = zeros(1, max(size(example(lambda(1,1)))));
shock_y(1,5) = 1;
shock = zeros(max(size(example(lambda(1,1)))), 4);
shock(1,1) = 1;
shock(5,2) = 1;
shock(1,3) = 1;
shock(5,4) = 1;
[BplusCG(:,:,1), K(:,:,1), a(:,1)] = example(lambda(1,1), v(1,1));
[BplusCG(:,:,2), K(:,:,2), a(:,2)] = example(lambda(1,1), v(1,1));
[BplusCG(:,:,3), K(:,:,3), a(:,3)] = example(lambda(1,2), v(1,2));
[BplusCG(:,:,4), K(:,:,4), a(:,4)] = example(lambda(1,2), v(1,2));
for i = 1:4
    X(:,:,i) = Var1SimPs(BplusCG(:,:,i), shock(:,i), 51);   % impulse responses of the first-order system
    L(i,1) = 0;
    for t = 1:41
        L(i,1) = L(i,1) + (X(t,:,i)' - a(:,i))'*K(:,:,i)*(X(t,:,i)' - a(:,i));
    end;
    disp('Loss 1');
    disp(L(i,1));
end;
x = 0:50;
Y = zeros(1,51);
mylegend = ['\lambda = ', num2str(lambda(1,2)), ', v = ', num2str(v(1,2))];
mytitles = {


    'Response of Inflation to an Inflation shock';
    'Response of Inflation to an Output shock';
    'Response of Output to an Inflation shock';
    'Response of Output to an Output shock';
    'Response of the Interest Rate to an Inflation shock';
    'Response of the Interest Rate to an Output shock';
    };
figure(1);
col = [1 0 5 0 7 0];
for j = 1:2:6
    subplot(3,2,j);   plot(x, X(:,col(1,j),1), '-b', x, X(:,col(1,j),3), '-r', x, Y, ':k');
    title(mytitles{j,1});
    legend('\lambda = 1, v = 1', mylegend);
    subplot(3,2,j+1); plot(x, X(:,col(1,j),2), '-b', x, X(:,col(1,j),4), '-r', x, Y, ':k');
    title(mytitles{j+1,1});
    legend('\lambda = 1, v = 1', mylegend);
end;
toc;
———————————————————————————————————————————
function [newA, K, a] = example(lambda, v)
if (nargin == 0)
    lambda = 1;
    v = 1;
end;
if (nargin == 1)
    v = 1;
end;
K = zeros(9,9);
K(1,1) = 1;
K(5,5) = lambda;
K(7,7) = v;
K(7,8) = -v;
K(8,7) = -v;


K(8,8) = v;
E = zeros(9,9);
for i = 1:8
    E(i+1,i) = 1;         % E(k,:) picks element k-1 of the lagged state (shift rows of the companion form)
end;
% fill in data (coefficients estimated above)
betar = .25*(-0.063932);  % 0.25*beta_r
B = [
    .1744 .2832 .3616 .0613 .0786 0 0 0 0
    E(2,:)
    E(3,:)
    E(4,:)
    -betar -betar -betar -betar 1.1740 -.2706 betar betar betar
    E(6,:)
    zeros(1,9)
    E(8,:)
    E(9,:)
    ];
C = [0 0 0 0 betar 0 1 0 0]';
a = [2 0 0 0 0 0 0 0 0]';
b = [.2090 0 0 0 .2263 0 0 0 0]';
[G, g] = steadystate(K, B, C, b, a);
G,
g,
newA = B + C*G;
———————————————————————————————————————————
function [G, g] = steadystate(K, B, C, b, a)
%
% The function takes the time-invariant matrices/vectors K, B, C, b and a
% and finds the optimal G and g for a problem
% Z_t = B*Z_{t-1} + C*X_t + b_t + u_t
% with the optimal feedback control equation


% X_t = G*Z_{t-1} + g
%
H_T = eye(size(K));
Hdiff = 100;
while Hdiff > 1E-4
    G = -inv(C'*H_T*C)*C'*H_T*B;       % equation (10)
    R = B + C*G;
    H = K + R'*H_T*R;                  % equation (11)
    Hdiff = max( max( abs(H - H_T) ) );
    H_T = H;
end;
G = -inv(C'*H*C)*C'*H*B;
% the same is done for h and g
h0 = K*a;
hdiff = 100;
while hdiff > 1E-4
    h = K*a + (C*G + B)'*(h0 - H*b);
    hdiff = max( max( abs(h - h0) ) );
    h0 = h;
end;
h;
g = inv(C'*H*C)*C'*h;


A Proof of Lemma from Section 2.3.1

Lemma: Let the pair of vectors x and y be jointly multivariate normal such that

[x; y] ∼ N([µx; µy], [Σxx Σxy; Σyx Σyy]) (A.1)

Then the distribution of x conditional on y is also multivariate normal with mean

µx|y = µx + ΣxyΣ−1yy (y − µy) (A.2)

and covariance matrix

Σxx|y = Σxx − ΣxyΣ−1yyΣyx (A.3)

Proof: The joint density of (x′ y′) is

p(x, y) = [1 / ((2π)^{(M+N)/2} |Σ|^{1/2})] exp[−½ (x′ − µ′x, y′ − µ′y) Σ−1 (x − µx; y − µy)] (A.4)

where M and N are the length of x and y respectively. The density of x conditional on y is

therefore

p(x|y) = p(x, y)/p(y) = [1/(2π)^{M/2}] · [|Σyy|^{1/2}/|Σ|^{1/2}] · exp[−½ (x′ − µ′x, y′ − µ′y) Σ−1 (x − µx; y − µy)] / exp[−½ (y′ − µ′y) Σ−1yy (y − µy)] (A.5)

The following formula is used to rewrite p(x|y):

[I  −ΣxyΣ−1yy; 0  I] Σ [I  0; −Σ−1yyΣyx  I] = [Σxx − ΣxyΣ−1yyΣyx  0; 0  Σyy] (A.6)

First, taking determinants and recalling that the determinant of the product of two non-singular

matrices is the product of the determinants yields

|Σ| = |Σxx − ΣxyΣ−1yyΣyx| |Σyy| (A.7)

Secondly, solving (A.6) for Σ gives

Σ−1 = [I  0; −Σ−1yyΣ′xy  I] [(Σxx − ΣxyΣ−1yyΣyx)−1  0; 0  Σ−1yy] [I  −ΣxyΣ−1yy; 0  I] (A.8)

Substituting for∑−1

in (x′ − µ′x, y′ − µ′y)∑−1

(x− µx, y − µy) yields

(x′ − µ′x|y)(∑xx−

∑xy

∑−1yy

∑yx)−1(x− µx|y) + (y′ − µ′y)

∑−1yy (y − µy)

where µx|y is defined in equation (A.2). Substituting equations (A.7) and (A.8) in (A.5) gives

p(x|y) =1

(2π)M/2|∑xx−

∑xy

∑−1yy

∑yx |·exp

[−1

2(x− µx|y)′(

∑xx−

∑xy

∑−1yy

∑yx)−1(x− µx|y)

](A.9)

Thus, it could be shown that the conditional density is indeed given by the mean and covariance

of equations (A.2) and (A.3).
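As a quick numerical sanity check of the lemma, the following MATLAB sketch compares the analytical conditional moments in (A.2) and (A.3) with Monte Carlo estimates for a bivariate normal; the particular numbers chosen for mu, Sigma and y0 are purely illustrative.

% Illustrative joint normal for scalar x and y (numbers are made up)
mu    = [1; 2];                      % [mu_x; mu_y]
Sigma = [2.0 0.6; 0.6 1.0];          % [Sxx Sxy; Syx Syy]
Sxx = Sigma(1,1); Sxy = Sigma(1,2);
Syx = Sigma(2,1); Syy = Sigma(2,2);
y0 = 2.5;                            % conditioning value for y

% Analytical conditional moments from (A.2) and (A.3)
mu_cond  = mu(1) + Sxy/Syy*(y0 - mu(2));
var_cond = Sxx - Sxy/Syy*Syx;

% Monte Carlo check: draw from the joint normal and keep draws with y close to y0
n    = 1e6;
Z    = repmat(mu',n,1) + randn(n,2)*chol(Sigma);
keep = abs(Z(:,2) - y0) < 0.01;
disp([mu_cond  mean(Z(keep,1)); var_cond  var(Z(keep,1))]);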


B Proof of Lemma from Section 2.3.2

Corollary: If f(Y_t) is a given function of the observations Y_t, then the estimation error \(\tilde{\theta} = \theta - \hat{\theta}\) is orthogonal to f(Y_t), \(\tilde{\theta} \perp f(Y_t)\), which implies

\[
E\left[(\theta - \hat{\theta}) f'(Y_t)\right] = 0 \tag{B.1}
\]

Proof: Let x and y be jointly distributed random variables and g(y) a function of y. The proof uses the law of iterated expectations,

\[
E\left[x\,g(y)\right] = E\left[E(x|y)\,g(y)\right] \tag{B.2}
\]

where the outer expectation on the right-hand side is taken with respect to the random variable y. Using (B.2) in (B.1) gives

\[
E\left[\tilde{\theta} f'(Y_t)\right] = E\left[E(\tilde{\theta}|Y_t)\, f'(Y_t)\right] \tag{B.3}
\]

Because \(\hat{\theta}\) is known once Y_t is given, evaluating the inner conditional expectation yields

\[
E\left[\tilde{\theta}|Y_t\right] = E\left[\theta|Y_t\right] - \hat{\theta} \tag{B.4}
\]

In the notation used above, equation (B.4) reads

\[
E\left[(\alpha_t - a_{t|t-1})|Y_t\right] = E\left[\alpha_t|Y_t\right] - a_{t|t-1} \tag{B.5}
\]

which is zero because of equation (2.51). Therefore, with f(Y_t) = a_{t|t-1}, the estimation error is orthogonal to the estimator.
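The orthogonality condition (B.1) can also be illustrated by simulation. In the sketch below (with made-up numbers), theta and the observation Y are jointly normal, the estimator is the conditional mean of theta given Y, and the sample means of the products between the estimation error and several functions of Y should all be close to zero.

% Scalar theta and a noisy observation Y = theta + measurement error
n     = 1e6;
theta = randn(n,1);                  % theta ~ N(0,1)
Y     = theta + 0.5*randn(n,1);      % measurement noise with variance 0.25

% Under joint normality the conditional mean is the linear projection:
% E[theta|Y] = cov(theta,Y)/var(Y) * Y = (1/1.25)*Y
theta_hat = (1/1.25)*Y;
err = theta - theta_hat;             % estimation error

% Any function f(Y) should be (approximately) orthogonal to the error
f = [Y, Y.^2, sin(Y)];
disp((err'*f)/n);                    % each entry should be close to zero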


C References

Amisano, G., and C. Giannini (1997), Topics in Structural VAR Econometrics. New York: Springer Verlag.

Aoki, M. (1990), State Space Modeling of Time Series. 2nd ed. Berlin: Springer Verlag.

Bellman, R. E. (1957), Dynamic Programming. Princeton University Press.

Blanchard, O. J. (1979), Backward and Forward Solutions for Economies with Rational Expectations, American Economic Review, Vol. 69, No. 2.

Blanchard, O. J., and C. M. Kahn (1980), The Solution of Linear Difference Models under Rational Expectations, Econometrica, Vol. 48, No. 5.

Blanchard, O. J., and D. Quah (1989), The Dynamic Effects of Aggregate Demand and Supply Disturbances, American Economic Review, Vol. 79, No. 4.

Chow, G. C. (1970), Optimal Stochastic Control of Linear Economic Systems, Journal of Money, Credit and Banking, Vol. 2, No. 3.

Chow, G. C. (1972), Optimal Control of Linear Econometric Systems with Finite Horizon, International Economic Review, Vol. 13, No. 1.

Chow, G. C. (1980), Estimation of Rational Expectations Models, Journal of Economic Dynamics and Control, Vol. 2.

Chow, G. C. (1983), Econometrics. New York: McGraw-Hill.

Chow, G. C. (1986), Analysis and Control of Dynamic Economic Systems. New York: John Wiley & Sons.

Chow, G. C. (1997), Dynamic Economics: Optimization by the Lagrange Method. Oxford University Press.

Chow, G. C., and A. Lin (1971), Best Linear Unbiased Interpolation, Distribution and Extrapolation of Time Series by Related Series, The Review of Economics and Statistics, Vol. 53, No. 4, pp. 372-375.

Christiano, L. J., M. Eichenbaum, and C. L. Evans (1999), Monetary Policy Shocks: What Have We Learned and to What End?, in Handbook of Macroeconomics, Vol. 1, Part 1.

Enders, W. (2004), Applied Econometric Time Series, 2nd Edition. New York: John Wiley & Sons.

Favero, C. A. (2001), Applied Macroeconometrics. Oxford: Oxford University Press.

Friedman, M. (1962), The Interpolation of Time Series by Related Series, Journal of the American Statistical Association, Vol. 57, No. 300, pp. 729-757.

Grunberg, E., and F. Modigliani (1954), The Predictability of Social Events, Journal of Political Economy, Vol. 62, No. 6.

Hamilton, J. D. (1994), Time Series Analysis. Princeton: Princeton University Press.

Hansen, L. P., and T. J. Sargent (1980), Formulating and Estimating Dynamic Linear Rational Expectations Models, Journal of Economic Dynamics and Control, Vol. 2.

Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.


Heer, B., and A. Maussner (2005), Dynamic General Equilibrium Modelling. Berlin: Springer Verlag.

Lucas, R. E., and T. J. Sargent (editors) (1981), Rational Expectations and Econometric Practice. Minneapolis: University of Minnesota Press.

Lütkepohl, H. (2005), New Introduction to Multiple Time Series Analysis. Berlin: Springer Verlag.

Minford, P., and D. Peel (2002), Advanced Macroeconomics. Elgar Publishing Limited.

Moauro, F., and G. Savio (2005), Temporal Disaggregation Using Multivariate Structural Time Series Models, The Econometrics Journal, Vol. 8, No. 2, pp. 214-234.

Muth, J. F. (1961), Rational Expectations and the Theory of Price Movements, Econometrica, Vol. 29, No. 3.

Ribeiro, M. I. (2004), Kalman and Extended Kalman Filters: Concept, Derivation and Properties. Retrieved July 20, 2006, from http://users.isr.ist.utl.pt/~mir/pub/kalman.pdf

Rudebusch, G. D., and L. E. O. Svensson (2001), Policy Rules for Inflation Targeting, in Monetary Policy Rules, edited by J. B. Taylor. Chicago: University of Chicago Press.

Sims, C. A. (1980), Macroeconomics and Reality, Econometrica, Vol. 48, No. 1.

Söderlind, P. (1999), Solution and Estimation of RE Macromodels with Optimal Policy, European Economic Review, Vol. 43.

Svensson, L. E. O. (1997), Inflation Forecast Targeting: Implementing and Monitoring Inflation Targets, European Economic Review, Vol. 41.

Svensson, L. E. O. (1997), Optimal Inflation Targets, "Conservative" Central Banks, and Linear Inflation Contracts, American Economic Review, Vol. 87, No. 1.

Svensson, L. E. O. (1999), Inflation Targeting: Some Extensions, Scandinavian Journal of Economics, Vol. 101, No. 3.

Taylor, J. B. (1977), Conditions for Unique Solutions in Stochastic Macroeconomic Models with Rational Expectations, Econometrica, Vol. 45, No. 6.
