
PhD Program in Business Administration and Quantitative Methods

FINANCIAL ECONOMETRICS

2006-2007

ESTHER RUIZ

CHAPTER 2. UNOBSERVED COMPONENT MODELS

2.1 Description and properties

When analysing the dynamic evolution of a given variable of interest, it is often helpful to assume that it is made up of unobserved components which have a direct interpretation. There are plenty of applications of models with unobserved components in finance. Next, we describe some of them.

Fundamentals of prices

If we are analysing the evolution of the price of a financial stock in a given market, we may be interested in the underlying fundamental price, while the observed price is contaminated by market rigidities. In this case, if we assume that the fundamental price is a random walk, the observed price is given by

$$y_t = \mu_t + \varepsilon_t$$
$$\mu_t = \mu_{t-1} + \eta_t$$

where $\mu_t$ is the underlying fundamental price and $\varepsilon_t$ is a measurement error. This model can also be interpreted as a "fads" model in which different types of traders give rise to different unobserved components; see, for example, Poterba and Summers (1988).
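For concreteness, the following minimal sketch simulates this random walk plus noise model in Python; the variances $\sigma_\eta^2 = 0.49$ and $\sigma_\varepsilon^2 = 1$ are the illustrative values used for the filtering example later in this chapter, not estimates from data.

import numpy as np

rng = np.random.default_rng(0)
T = 1000
sigma_eta, sigma_eps = 0.7, 1.0   # sigma_eta^2 = 0.49, sigma_eps^2 = 1 (assumed)

mu = np.cumsum(sigma_eta * rng.standard_normal(T))   # fundamental price: random walk
y = mu + sigma_eps * rng.standard_normal(T)          # observed price = fundamental + noise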

Ex ante real interest differentials

Cavaglia (1992) analyses the dynamic behaviour of ex ante interest differentials across four countries (the United States, Germany, Switzerland and the Netherlands), using monthly observations of ex post interest differentials from 1973 to 1987. The model proposed is:

$$y_t = y_t^* + \pi_t$$
$$y_t^* = \phi(L) y_{t-1}^* + \eta_t$$


where $y_t$ is the ex post interest differential, $y_t^*$ is the ex ante real interest differential, and $\pi_t$ is the cross-country differential in inflation forecast errors, which is assumed to be independently and identically distributed. He concludes that ex ante real interest differentials are short-lived and mean-reverting to zero, supporting theoretical models of economic interdependence.

Factor models

There is a long tradition of factor models in finance. These models simplify the computation of the covariance matrix of returns in the context of mean-variance portfolio allocation. Furthermore, factors are central in two asset pricing theories: the mutual fund separation theory, of which the CAPM is a special case, and the arbitrage pricing theory (APT). In latent factor models, the observed variables depend on a few factors that are modelled as GARCH processes. Multivariate latent factor models have been used in several applications. For example, Diebold and Nerlove (1989) fitted a one-factor model to represent the dynamic evolution of the volatilities of seven dollar exchange rates. King, Sentana and Wadhwani (1994) used a factor model to assess the extent of capital market integration across sixteen national stock markets. Sentana (2004) analyses the statistical properties of alternative ways of creating actively and passively managed mimicking portfolios from a finite number of assets. He proposes the following model:

$$\begin{bmatrix} r_{1t} \\ r_{2t} \\ \vdots \\ r_{Nt} \end{bmatrix} = \begin{bmatrix} v_{1t} \\ v_{2t} \\ \vdots \\ v_{Nt} \end{bmatrix} + \begin{bmatrix} \beta_{11} & \cdots & \beta_{1k} \\ \beta_{21} & \cdots & \beta_{2k} \\ \vdots & & \vdots \\ \beta_{N1} & \cdots & \beta_{Nk} \end{bmatrix} \begin{bmatrix} f_{1t} \\ f_{2t} \\ \vdots \\ f_{kt} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \vdots \\ \varepsilon_{Nt} \end{bmatrix}$$

where $r_{it}$ is the return of a risky asset, $E[f_t f_t' | R_{t-1}] = \Lambda_t$, which is a diagonal matrix, and $E[\varepsilon_t \varepsilon_t' | R_{t-1}] = \Gamma_t$.
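This structure is what simplifies the covariance matrix of returns mentioned above. Under the additional assumptions (implicit here) that $v_t$ is predictable and that $f_t$ and $\varepsilon_t$ are conditionally uncorrelated, the conditional covariance matrix of the $N$ returns is

$$\mathrm{Var}(r_t | R_{t-1}) = B \Lambda_t B' + \Gamma_t$$

where $B$ is the $N \times k$ matrix of loadings $\beta_{ij}$, so the covariance matrix is characterized by the $Nk$ loadings and the elements of $\Lambda_t$ and $\Gamma_t$ rather than by $N(N+1)/2$ unrestricted elements.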

Term structure

Several authors consider models for the term structure of interest rates where the observable variables are zero-coupon rates and the unobserved variables are the factors that drive the curve. The law of motion of these factors depends on the dynamic structure chosen. For example, in the Vasicek model it is an Ornstein-Uhlenbeck process; see Babbs and Nowman (1999). Finally, the observed yields are given by the theoretical rates implied by a no-arbitrage condition plus a stochastic disturbance. For example, the model proposed by de Rossi (2004) is given by

$$\begin{bmatrix} y_t(\tau_1) \\ y_t(\tau_2) \\ \vdots \\ y_t(\tau_n) \end{bmatrix} = -A + B r_t + C u_t + \varepsilon_t$$

$$\begin{bmatrix} r_{t+1} \\ u_{t+1} \end{bmatrix} = \begin{bmatrix} e^{-a\Delta t} & \dfrac{1}{\tilde{b}-a}\left(e^{-a\Delta t} - e^{-\tilde{b}\Delta t}\right) \\ 0 & e^{-\tilde{b}\Delta t} \end{bmatrix} \begin{bmatrix} r_t \\ u_t \end{bmatrix} + c + \eta_t$$

where $y_t(\tau)$ is the spot interest rate at time $t$ for maturity $t+\tau$.

In a dynamic context, Dungey, Martin and Pagan (2000) analyse bond yield spreads between five countries by decomposing international interest rate spreads into national and global latent factors.

Modelling volatility

Consider that we are interested in modelling the volatility of the price. There are two main types of models proposed for this goal. The most popular are the GARCH models, where the volatility is assumed to be a non-linear function of past returns. Consider, for example, the GARCH(1,1) model given by

$$y_t = \sigma_t \varepsilon_t$$
$$\sigma_t^2 = \omega + \alpha y_{t-1}^2 + \beta \sigma_{t-1}^2$$

where $\varepsilon_t$ is an IID white noise with variance 1. The parameters have to be restricted to guarantee the positivity of the conditional variance: in particular, $\omega > 0$, $\alpha \geq 0$ and $\beta \geq 0$. The stationarity condition is $\alpha + \beta < 1$.
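The following minimal sketch simulates the GARCH(1,1) model above; the parameter values are illustrative assumptions chosen to satisfy the restrictions just stated.

import numpy as np

rng = np.random.default_rng(1)
T = 1000
omega, alpha, beta = 0.1, 0.1, 0.85       # omega > 0, alpha, beta >= 0, alpha + beta < 1

y = np.empty(T)
sigma2 = np.empty(T)
sigma2[0] = omega / (1.0 - alpha - beta)  # start at the unconditional variance
y[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
for t in range(1, T):
    sigma2[t] = omega + alpha * y[t - 1] ** 2 + beta * sigma2[t - 1]
    y[t] = np.sqrt(sigma2[t]) * rng.standard_normal()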

ARCH-type models assume that the volatility can be observed one-step ahead. However, a more realistic model for volatility can be based on modelling it as having a predictable component that depends on past information plus an unexpected noise. In this case, the volatility is a latent unobserved variable. One interpretation of the latent volatility is that it represents the arrival of new information into the market; see, for example, Clark (1973). In the simplest case, the log-volatility follows an AR(1) process. Then, we have the ARSV(1) model given by


$$y_t = \sigma_* \sigma_t \varepsilon_t$$
$$\log(\sigma_t^2) = \phi \log(\sigma_{t-1}^2) + \eta_t$$

where $\varepsilon_t$ is a strict white noise with variance 1. The noise of the volatility equation, $\eta_t$, is assumed to be a Gaussian white noise with variance $\sigma_\eta^2$, independent of the noise of the level, $\varepsilon_t$. The Gaussianity of $\eta_t$ may seem rather ad hoc. However, several empirical studies support this assumption for both exchange rates and stock returns; see Andersen, Bollerslev, Diebold and Ebens (2001) and Andersen, Bollerslev, Diebold and Labys (2001, 2003).
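A similar sketch simulates the ARSV(1) model; the values of $\sigma_*$, $\phi$ and $\sigma_\eta$ are illustrative assumptions, with $|\phi| < 1$ so that the log-volatility is stationary.

import numpy as np

rng = np.random.default_rng(2)
T = 1000
sigma_star, phi, sigma_eta = 1.0, 0.98, 0.2   # assumed values, |phi| < 1

log_sigma2 = np.empty(T)
# draw the initial log-volatility from its stationary distribution
log_sigma2[0] = rng.normal(0.0, sigma_eta / np.sqrt(1.0 - phi ** 2))
for t in range(1, T):
    log_sigma2[t] = phi * log_sigma2[t - 1] + sigma_eta * rng.standard_normal()

y = sigma_star * np.exp(0.5 * log_sigma2) * rng.standard_normal(T)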

The necessity of assumptions about the dynamics of the underlying latent variables is the main criticism against unobserved component models. However, given that the variable of interest cannot be observed, we can only estimate it by somehow restricting its behaviour. The main point is whether these assumptions are sensible and compatible with the data under analysis.

2.2 State space models

In general, a linear unobserved component model can be written as a state space model

as follows

$$y_t = Z_t \alpha_t + d_t + \varepsilon_t$$
$$\alpha_t = T_t \alpha_{t-1} + c_t + \eta_t$$

where $\alpha_t$ is the latent state at time $t$, which has $k$ components, $\varepsilon_t$ is a white noise process with variance $H_t$, and $\eta_t$ is a $k$-dimensional white noise with covariance matrix $Q_t$, uncorrelated with $\varepsilon_t$ at all leads and lags. The system matrices $H_t$, $Q_t$, $Z_t$, $d_t$, $T_t$ and $c_t$ are assumed to be predetermined in the sense that they are known at time $t-1$. When they are fixed, the model is said to be time-invariant. The first equation is known as the measurement equation, while the second is the transition equation.

Consider, for example, two of the models described above.

a) In the model for fundamental prices, $\alpha_t = \mu_t$, $H_t = \sigma_\varepsilon^2$, $Q_t = \sigma_\eta^2$, $Z_t = 1$, $d_t = 0$, $T_t = 1$ and $c_t = 0$.

b) In the ex ante real interest differentials model, $\alpha_t = (y_t^*, y_{t-1}^*, \ldots, y_{t-p+1}^*)'$, $Z_t = (1, 0, \ldots, 0)$, $\varepsilon_t = \pi_t$,

$$T_t = \begin{bmatrix} \phi_1 & \phi_2 & \cdots & \phi_p \\ 1 & 0 & \cdots & 0 \\ \vdots & \ddots & & \vdots \\ 0 & \cdots & 1 & 0 \end{bmatrix},$$

$H_t = \sigma_\varepsilon^2$ and $Q_t = \sigma_\eta^2$.

When $\varepsilon_t$ and $\eta_t$ are assumed to be Gaussian, the model is a Gaussian state space model.

Unobserved component models depend on several disturbances. Provided the model is linear, the components driven by these disturbances can be combined to give a model with a single disturbance. This is known as the reduced form. The reduced form is an ARIMA model, and the fact that it is derived from a structural form will typically imply restrictions on its parameters.

Consider, once more, the random walk plus noise model. Taking first differences, we obtain the following expression:

$$\Delta y_t = \eta_t + \Delta \varepsilon_t$$

The mean and variance of $\Delta y_t$ are given by

$$E(\Delta y_t) = E(\eta_t + \Delta\varepsilon_t) = 0$$
$$\mathrm{Var}(\Delta y_t) = \mathrm{Var}(\eta_t + \Delta\varepsilon_t) = \sigma_\eta^2 + 2\sigma_\varepsilon^2$$

The dynamic properties of $\Delta y_t$ can be analysed by looking at its autocorrelation function, given by

$$\rho(h) = \begin{cases} -\dfrac{\sigma_\varepsilon^2}{\sigma_\eta^2 + 2\sigma_\varepsilon^2}, & h = 1 \\[2ex] 0, & h \geq 2 \end{cases}$$

The constant $q = \sigma_\eta^2 / \sigma_\varepsilon^2$ is known as the signal-to-noise ratio. From the autocorrelation function above, it is easy to see that the reduced form of the random walk plus noise model is an IMA(1,1) model with negative parameter. Equating the autocorrelations of the first differences at lag one gives the following expression for the MA parameter:

$$\theta = \frac{\sqrt{q^2 + 4q} - q - 2}{2}$$
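As a quick numerical check, for the illustrative value $q = 0.49$ the implied MA parameter and the matching of the lag-one autocorrelations can be verified as follows; the value of $q$ is an assumption for illustration only.

import numpy as np

q = 0.49
theta = (np.sqrt(q ** 2 + 4 * q) - q - 2) / 2
print(theta)                                    # approx -0.503
# the MA(1) lag-one autocorrelation equals that of the first differences
print(theta / (1 + theta ** 2), -1 / (q + 2))   # both approx -0.402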


When $q = 0$, $\Delta y_t$ reduces to a non-invertible MA(1) model, i.e. $y_t$ is a white noise process. On the other hand, as $q$ increases, the autocorrelation of order one and, consequently, $\theta$ decrease in absolute value. In the limit, if $\sigma_\varepsilon^2 = 0$, $\Delta y_t$ is a white noise and $y_t$ is a random walk.

2.3 The Kalman filter: filtered estimates of the unobserved components

The Kalman filter is made up of two sets of equations. First, we have the prediction equations, which give us the one-step ahead predictions of the unobserved components:

$$a_{t/t-1} = E[\alpha_t | Y_{t-1}] = T_t a_{t-1} + c_t$$

where $a_{t-1} = E[\alpha_{t-1} | Y_{t-1}]$. The one-step ahead MSE matrices of the components are given by

$$P_{t/t-1} = E[(\alpha_t - a_{t/t-1})(\alpha_t - a_{t/t-1})' | Y_{t-1}] = T_t P_{t-1} T_t' + Q_t$$

where $P_{t-1} = E[(\alpha_{t-1} - a_{t-1})(\alpha_{t-1} - a_{t-1})' | Y_{t-1}]$ is the MSE matrix of $a_{t-1}$. Once we have these one-step ahead estimates of the state, we can also obtain the one-step ahead estimates of $y_t$ and the corresponding prediction errors and their MSEs as follows:

$$\hat{y}_{t/t-1} = E[y_t | Y_{t-1}] = Z_t a_{t/t-1} + d_t$$
$$\nu_t = y_t - \hat{y}_{t/t-1} = Z_t(\alpha_t - a_{t/t-1}) + \varepsilon_t$$
$$F_t = E(\nu_t^2) = Z_t P_{t/t-1} Z_t' + H_t$$

The one-step ahead estimates of the state, $a_{t/t-1}$, can be updated using the new information provided by the observation $y_t$. The resulting equations are known as the updating equations. These equations can be easily derived using the properties of the multivariate normal distribution. In particular, consider the distribution of $\alpha_t$ and $y_t$ conditional on past information up to and including time $t-1$. The conditional mean and variance of $\alpha_t$ and $y_t$ have been derived above. The conditional covariance between both variables can be easily derived by taking into account that $y_t$ can be written as

$$y_t = Z_t a_{t/t-1} + d_t + Z_t(\alpha_t - a_{t/t-1}) + \varepsilon_t$$

and, therefore,


$$\mathrm{Cov}(y_t, \alpha_t | Y_{t-1}) = E[(\alpha_t - a_{t/t-1})(y_t - Z_t a_{t/t-1} - d_t)' | Y_{t-1}] = E[(\alpha_t - a_{t/t-1})(\alpha_t - a_{t/t-1})'] Z_t' = P_{t/t-1} Z_t'$$

Consequently, the required conditional distribution is given by

$$\begin{bmatrix} y_t \\ \alpha_t \end{bmatrix} \Bigg| \, Y_{t-1} \sim N\left( \begin{bmatrix} Z_t a_{t/t-1} + d_t \\ a_{t/t-1} \end{bmatrix}, \begin{bmatrix} F_t & Z_t P_{t/t-1} \\ P_{t/t-1} Z_t' & P_{t/t-1} \end{bmatrix} \right).$$

From this, we can see that the updating equations are given by

$$a_t = E[\alpha_t | Y_t] = a_{t/t-1} + P_{t/t-1} Z_t' F_t^{-1} \nu_t$$
$$P_t = E[(\alpha_t - a_t)(\alpha_t - a_t)' | Y_t] = P_{t/t-1} - P_{t/t-1} Z_t' F_t^{-1} Z_t P_{t/t-1}$$

The prediction error plays a central role in updating the estimates: the more the prediction deviates from its realized value, the bigger the change made to the estimator of the state. If the model is Gaussian then, given the initial conditions $a_0$ and $P_0$, the Kalman filter delivers the conditional mean of the state, which is the minimum MSE estimator of the state, as each new observation becomes available. When the disturbances are not normally distributed, it is no longer true, in general, that the Kalman filter yields the conditional mean of the state vector. In that case, however, the estimates are still the minimum MSE linear estimates.

It is important to note that, in time-invariant models, the observations $y_t$ do not affect the MSE matrices $P_{t/t-1}$ and $P_t$. Therefore, these matrices are both conditional and unconditional MSE matrices.

Consider, for example, the random walk plus noise model with known $\sigma_\varepsilon^2$ and $\sigma_\eta^2$. To initialize the filter, we need initial values $a_0$ and $P_0$. One alternative is to use what is known as a diffuse prior distribution, which in this case is given by $m_0 = 0$ and $P_0 = \infty$, where $m_0 = E(\mu_0)$. This says that nothing is known about the initial state. Then, using the prediction equations, we obtain

$$m_{1/0} = m_0 = 0$$
$$P_{1/0} = P_0 + \sigma_\eta^2$$

We can update this estimate of the underlying level at time 1 by using the information contained in $y_1$. Then, using the updating equations of the Kalman filter, we obtain

$$m_1 = m_{1/0} + \frac{P_{1/0}}{F_1}(y_1 - m_{1/0}) = \frac{P_0 + \sigma_\eta^2}{P_0 + \sigma_\eta^2 + \sigma_\varepsilon^2}\, y_1$$

$$P_1 = P_{1/0} - \frac{P_{1/0}^2}{F_1} = (P_0 + \sigma_\eta^2)\left[1 - \frac{P_0 + \sigma_\eta^2}{P_0 + \sigma_\eta^2 + \sigma_\varepsilon^2}\right] = \frac{\sigma_\varepsilon^2 (P_0 + \sigma_\eta^2)}{P_0 + \sigma_\eta^2 + \sigma_\varepsilon^2}$$

where $F_1 = P_{1/0} + \sigma_\varepsilon^2$. With $P_0 = \infty$, these expressions reduce to $m_1 = y_1$ and $P_1 = \sigma_\varepsilon^2$.

Then, we can continue applying the prediction and updating equations recursively:

$$m_{2/1} = m_1$$
$$P_{2/1} = P_1 + \sigma_\eta^2$$

and

$$m_2 = m_{2/1} + \frac{P_{2/1}}{F_2}(y_2 - m_{2/1})$$
$$P_2 = P_{2/1} - \frac{P_{2/1}^2}{F_2} = P_{2/1}\left[1 - \frac{P_{2/1}}{P_{2/1} + \sigma_\varepsilon^2}\right]$$

where $F_2 = P_{2/1} + \sigma_\varepsilon^2$.

Note that initializing with a diffuse prior is equivalent to using the first observation as an initial value at time $t = 1$. If the state is generated by a stationary process, the initial conditions for the Kalman filter are given by its marginal mean and variance.

Summarizing, if the system matrices are known at time $t-1$, the Kalman filter yields optimal:

i) One-step ahead estimates of the unobserved components: $a_{t/t-1}$.

ii) Updated estimates of the unobserved components: $a_t$.

iii) One-step ahead prediction errors of $y_t$ and their variances: $\nu_t$ and $F_t$.
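A minimal sketch of these recursions for the random walk plus noise model ($Z_t = T_t = 1$, $d_t = c_t = 0$, $H_t = \sigma_\varepsilon^2$, $Q_t = \sigma_\eta^2$) is given below; the diffuse prior is approximated by a large but finite $P_0$, and the function name and defaults are illustrative.

import numpy as np

def kalman_local_level(y, sigma2_eps, sigma2_eta, m0=0.0, P0=1e7):
    T = len(y)
    m_pred, P_pred = np.empty(T), np.empty(T)   # m_{t/t-1} and P_{t/t-1}
    m_upd, P_upd = np.empty(T), np.empty(T)     # m_t and P_t
    nu, F = np.empty(T), np.empty(T)            # prediction errors and their variances
    m, P = m0, P0
    for t in range(T):
        # prediction equations
        m_pred[t], P_pred[t] = m, P + sigma2_eta
        # one-step ahead prediction error and its variance
        nu[t] = y[t] - m_pred[t]
        F[t] = P_pred[t] + sigma2_eps
        # updating equations
        K = P_pred[t] / F[t]
        m = m_pred[t] + K * nu[t]
        P = P_pred[t] * (1.0 - K)
        m_upd[t], P_upd[t] = m, P
    return m_pred, P_pred, m_upd, P_upd, nu, F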

Consider, for example, the following series generated by a random walk plus noise model with parameters $\sigma_\varepsilon^2 = 1$ and $\sigma_\eta^2 = 0.49$ (in red). The one-step ahead estimates of the underlying level appear in blue.

[Figure: simulated series Y (red) and one-step ahead filtered estimates of the level, YFILTERED (blue), for t = 1, ..., 1000.]


2.4 Smoothed estimation of the unobserved components

There are also other filters, known as smoothing algorithms, which give estimates of the components based on the information contained in the whole sample. The fixed-interval smoothing algorithm consists of a set of recursions which start with the final quantities $a_T$ and $P_T$ given by the Kalman filter and work backwards. The smoothed estimate of $\alpha_t$ is given by

$$a_{t/T} = E[\alpha_t | Y_T] = a_t + P_t^*(a_{t+1/T} - T_{t+1} a_t - c_{t+1})$$
$$P_{t/T} = P_t + P_t^*(P_{t+1/T} - P_{t+1/t}) P_t^{*\prime}$$
$$P_t^* = P_t T_{t+1}' P_{t+1/t}^{-1}$$

Given that the smoothed estimate of $\alpha_t$ is based on more information than the filtered estimate, its MSE, $P_{t/T}$, is, in general, smaller than that of the filtered estimator.

These smoothers are very useful because they also provide what are known as the auxiliary residuals, which are estimates of the disturbances associated with each of the different components of the model. These auxiliary residuals can be used to identify outliers that affect different components (Harvey and Koopman, 1992) or to identify whether the components of a given series are conditionally heteroscedastic (Broto and Ruiz, 2005a,b). Expressions for the auxiliary residuals have been derived by Durbin and Koopman (2001).
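The backward recursions can be sketched in the same way for the local level model, starting from the output of the kalman_local_level sketch above; with $T_t = 1$ and $c_t = 0$, the smoothing gain reduces to $P_t^* = P_t / P_{t+1/t}$.

import numpy as np

def smooth_local_level(m_pred, P_pred, m_upd, P_upd):
    T = len(m_upd)
    m_smooth, P_smooth = np.empty(T), np.empty(T)
    m_smooth[-1], P_smooth[-1] = m_upd[-1], P_upd[-1]   # start from a_T and P_T
    for t in range(T - 2, -1, -1):
        Pstar = P_upd[t] / P_pred[t + 1]                # P_t* with T_{t+1} = 1
        m_smooth[t] = m_upd[t] + Pstar * (m_smooth[t + 1] - m_pred[t + 1])
        P_smooth[t] = P_upd[t] + Pstar ** 2 * (P_smooth[t + 1] - P_pred[t + 1])
    return m_smooth, P_smooth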

The following figure represents the smoothed estimates of the underlying level together

with the one-step ahead estimates for the same series considered above.

[Figure: one-step ahead (YFILTERED) and smoothed (YSMOOTHED) estimates of the underlying level, t = 1, ..., 1000.]


2.5 Prediction

Once we reach $t = T$, we can run the prediction equations to obtain forecasts of future values and their MSEs. For a time-invariant model,

$$a_{T+k/T} = E[\alpha_{T+k} | Y_T] = T^k a_T$$

$$P_{T+k/T} = E[(\alpha_{T+k} - a_{T+k/T})(\alpha_{T+k} - a_{T+k/T})' | Y_T] = T^k P_T (T^k)' + \sum_{j=0}^{k-1} T^j Q (T^j)'$$

so that the forecasts of the observations are $\hat{y}_{T+k/T} = Z a_{T+k/T} + d$, with MSE matrices $Z P_{T+k/T} Z' + H$.

For example, in the random walk plus noise model,

$$m_{T+k/T} = m_T$$
$$P_{T+k/T} = P_T + k\sigma_\eta^2$$
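In code, these forecasts are immediate given the final filtered quantities; this sketch assumes the random walk plus noise case, where the point forecast is flat and the MSE grows linearly with the horizon.

import numpy as np

def forecast_local_level(m_T, P_T, sigma2_eta, k):
    horizons = np.arange(1, k + 1)
    # point forecasts stay at m_T; their MSEs are P_T + k * sigma_eta^2
    return np.full(k, m_T), P_T + horizons * sigma2_eta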

2.6 Estimation of the parameters

Up to now, we have assumed that the parameters of the model are known. However, in practice they are unknown and have to be estimated from the available data. If the model is conditionally Gaussian, the parameters can be estimated by Maximum Likelihood (ML). Remember that the Kalman filter provides the innovations (one-step ahead errors) and their variances.

The likelihood function can be written as follows:

$$L = \prod_{t=1}^{T} p(y_t | Y_{t-1})$$

The conditional distribution of $y_t$ can be easily derived by writing

$$y_t = Z_t a_{t/t-1} + d_t + Z_t(\alpha_t - a_{t/t-1}) + \varepsilon_t$$

Then, if $(\alpha_t - a_{t/t-1})$ and $\varepsilon_t$ are conditionally normal,

$$y_t | Y_{t-1} \sim N(Z_t a_{t/t-1} + d_t, F_t)$$

and the log-likelihood function can be written down immediately as

$$\log L = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\log|F_t| - \frac{1}{2}\sum_{t=1}^{T}\frac{\nu_t^2}{F_t}$$

This expression is known as the prediction error decomposition form of the likelihood.
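A sketch of the resulting (Q)ML estimation for the random walk plus noise model, reusing the kalman_local_level function from the earlier sketch, is given below; the log-variance parametrization, starting values and optimizer choice are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, y):
    sigma2_eps, sigma2_eta = np.exp(params)   # log-parametrization keeps variances positive
    _, _, _, _, nu, F = kalman_local_level(y, sigma2_eps, sigma2_eta)
    # minus the prediction error decomposition of the log-likelihood
    return 0.5 * np.sum(np.log(2 * np.pi * F) + nu ** 2 / F)

# y: an observed or simulated series, e.g. from the sketch in Section 2.1
# result = minimize(neg_loglik, x0=np.log([1.0, 0.5]), args=(y,), method="BFGS")
# sigma2_eps_hat, sigma2_eta_hat = np.exp(result.x)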

The parameters are estimated by maximizing the likelihood function numerically. The asymptotic properties of the ML estimator are the usual ones as long as the parameters lie in the interior of the parameter space. However, in many models of interest the parameters are variances, and it is of interest to know whether they are zero (i.e. whether we have deterministic components). In this case, the asymptotic distribution can still be related to the Normal, but it is modified so as to take account of the boundary; see Harvey (1989).

If the model is not conditionally Gaussian, then by maximizing the Gaussian log-likelihood we obtain what is known as the Quasi-Maximum Likelihood (QML) estimator. In this case, the estimator loses its efficiency. Alternatives based on the true likelihood are more efficient but, when they can be defined, are computationally more complicated. Furthermore, dropping the Normality assumption tends to affect the asymptotic distribution of all the model parameters. In this case, the asymptotic covariance matrix of $\sqrt{T}(\hat{\psi} - \psi)$ is given by $J^{-1} I J^{-1}$, where

$$J = -E\left[\frac{\partial^2 \log L}{\partial \psi \, \partial \psi'}\right] \quad \text{and} \quad I = E\left[\frac{\partial \log L}{\partial \psi}\frac{\partial \log L}{\partial \psi'}\right]$$

and the expectations are taken with respect to the true distribution; see Gourieroux (1997).

Once the parameters are estimated, the Kalman filter is run again with the parameters fixed at the estimated values to yield one-step ahead and updated estimates of the unknown states, $\hat{a}_{t/t-1}$ and $a_t$ respectively, and the smoother is run to yield estimates based on the whole sample, $\hat{a}_{t/T}$. As a by-product, we also obtain several residuals:

a) Standardized residuals: $\tilde{\nu}_t = \hat{\nu}_t / \sqrt{F_t}$. Standard tests for Normality, heteroscedasticity and serial correlation can be applied to them.

b) Auxiliary residuals:

$$\hat{\varepsilon}_{t/T} = y_t - Z_t a_{t/T} - d_t$$
$$\hat{\eta}_{t/T} = a_{t/T} - T_t a_{t-1/T} - c_t$$
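For instance, the Normality of the standardized residuals in a) can be checked with a Jarque-Bera test; this sketch assumes scipy is available, with nu and F taken from a Kalman filter run such as the one sketched earlier.

import numpy as np
from scipy import stats

def residual_diagnostics(nu, F):
    std_resid = nu / np.sqrt(F)
    jb_stat, jb_pvalue = stats.jarque_bera(std_resid)   # test for Normality
    return std_resid, jb_stat, jb_pvalue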

References

Broto, C. and E. Ruiz (2005), Unobserved component models with asymmetric conditional variances, Computational Statistics and Data Analysis, forthcoming.

Durbin, J. and S.J. Koopman (2001), Time Series Analysis by State Space Methods, Oxford: Oxford University Press.

Harvey, A.C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge: Cambridge University Press. Chapter 4.

Harvey, A.C. (1993), Time Series Models, 2nd ed., London: Harvester Wheatsheaf. Chapters 4 and 5.

Harvey, A.C. and S.J. Koopman (1992), Diagnostic checking of unobserved-components time series models, Journal of Business & Economic Statistics, 10, 377-389.

Koopman, S.J. (1993), Disturbance smoother for state space models, Biometrika, 80, 117-126.

Koopman, S.J., A.C. Harvey, J.A. Doornik and N. Shephard (2000), STAMP: Structural Time Series Analyser, Modeller and Predictor, Timberlake Consultants Press.

Wells, C. (1996), The Kalman Filter in Finance, Dordrecht: Kluwer Academic Publishers.

Exercises

1. (a) Consider an AR(1) model in which the first observation is fixed. Write down the likelihood function when the observation at time $\tau$ is missing. (b) Given a value of the AR parameter, $\phi$, show that the estimator of the missing observation obtained by smoothing is $\hat{y}_{\tau/T} = \dfrac{\phi(y_{\tau-1} + y_{\tau+1})}{1 + \phi^2}$.

2. Consider a random walk plus noise model. If $\sigma_\eta^2 = 0$, show that running the Kalman filter initialised with a diffuse prior yields an estimator of $\mu_t$ equal to the mean of the first $t$ observations. Show that the variance of this estimator is calculated by the Kalman filter to be $\sigma_\varepsilon^2 / t$.

3. Using the Kalman filter, obtain estimates of the underlying level of the IBEX35. Derive the reduced form model and check whether it is in concordance with the model fitted by analysing the correlogram of $\Delta y_t$.

4. Obtain the reduced form of the local level model given by

$$y_t = \mu_t + \varepsilon_t$$
$$\mu_t = \mu_{t-1} + \beta_{t-1} + \eta_t$$
$$\beta_t = \beta_{t-1} + \xi_t$$

where $\varepsilon_t$, $\eta_t$ and $\xi_t$ are mutually uncorrelated white noise processes with variances $\sigma_\varepsilon^2$, $\sigma_\eta^2$ and $\sigma_\xi^2$ respectively.