
  • State space models with switching and program DMM

    Alessandro Rossi and Christophe Planas, Joint Research Centre of the European Commission

    Identification and global sensitivity analysis for macroeconomic models, 22-24 April 2015, Milano

    Rossi SSMS 1 / 73

  • Motivation

    Thanks to their flexibility in handling nonlinearities, structural changes, and outliers, State Space Models with Switching (SSMS) have enjoyed some success in the econometric literature (see e.g. Giordani, Kohn, and van Dijk, JoE 2007)

    Some macroeconomic models featuring rational expectations incorporate features that make their reduced form representable in SSMS format (e.g. Farmer, Waggoner, and Zha, JEDC 2011)

    Rossi SSMS 2 / 73

  • Outline

    The class of state space models with switching (SSMS)

    Some well-known models admitting an SSMS representation

    Frequentist and Bayesian inference of SSMS

    Program DMM for the analysis of SSMS

    A test case: the Kahn and Rich approach (JME, 2007) for detecting changes in US trend productivity

    Rossi SSMS 3 / 73

  • State Space Models with Switching

    The SSMS class encompasses models that admit representation:

    $y_t = c_t z_t + H_t x_t + G_t u_t$

    $x_t = a_t + F_t x_{t-1} + R_t u_t$

    where $y_t$ is the $n_y \times 1$ vector of endogenous variables, $z_t$ is the $n_z \times 1$ vector of exogenous series, $x_t$ is the $n_x \times 1$ state vector, $u_t$ is the $n_u \times 1$ vector of shocks, and $c_t$, $H_t$, $G_t$, $a_t$, $F_t$, and $R_t$ are determined by a parameter $\theta$ and a discrete latent variable $S_t = (S_{1t}, \cdots, S_{\ell t}, \cdots)$.

    Rossi SSMS 4 / 73

  • SSMS: general assumptions

    Shocks are Gaussian: $u_t \overset{iid}{\sim} N(0, I)$

    This is not restrictive: fat tails or other features can be modelled via a proper treatment of the matrices $G_t$ and $R_t$

    Each variable $S_{\ell t}$ takes $n_{s\ell}$ values following an independent Markov process with transition probabilities $\pi_{\ell ij} \equiv \Pr(S_{\ell t} = i \,|\, S_{\ell t-1} = j)$, $i, j = 1, \cdots, n_{s\ell}$, collected into the vector $\pi_\ell$

    Notice that $S_{\ell t}$ can also be independent, as a special case of Markov, i.e. $\Pr(S_{\ell t} = i \,|\, S_{\ell t-1} = j) = \Pr(S_{\ell t} = i)$

    Rossi SSMS 5 / 73

  • SSMS: general assumptions

    The system matrices $c_t$, $H_t$, $G_t$, $a_t$, $F_t$, and $R_t$ depend only on contemporaneous values of $S_t$

    Cases where some system matrices depend on lagged values of $S_{\ell t}$ can be handled by reparameterising

    Given $S_t$, the matrices $c_t$, $H_t$, $G_t$, $a_t$, $F_t$, and $R_t$ do not depend on the transition probabilities $\pi = (\pi_1, \cdots, \pi_\ell, \cdots)$

    This condition makes inference on the unobservables and the model parameters feasible

    The eigenvalues of the transition matrix $F_t$ are less than or equal to one in modulus

    A weaker condition, as detailed in Francq and Zakoian (2001, 2002), is also possible

    Rossi SSMS 6 / 73

  • Examples of models admitting an SSMS representation

    Example 1: Time Varying Parameters autoregressive model

    Example 2: Markov switching model for US real GDP

    Example 3: Markov switching variance for the USD/GBP real exchange rate

    Example 4: Change-point volatility model for the Fama-French market factor

    Example 5: Structural Vector Autoregressions for monetary policy

    Rossi SSMS 7 / 73

  • Example 1: Time varying parameters autoregressive model

    Giordani and Kohn (2010), to model the persistence of US inflation:

    $\pi_t = \psi_t + \rho_t \pi_{t-1} + V_\pi^{1/2} a_{\pi t}$

    $\psi_t = \psi_{t-1} + V_\psi^{1/2} a_{\psi t}$

    $\rho_t = \rho_{t-1} + V_\rho^{1/2} a_{\rho t}$

    SSMS representation: $x_t = (\psi_t, \rho_t)'$, $u_t = (a_{\pi t}, a_{\psi t}, a_{\rho t})'$, $c_t = a_t = 0$, $H_t = (1, \pi_{t-1})$,

    $F_t = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $R_t = \begin{bmatrix} 0 & V_\psi^{1/2} & 0 \\ 0 & 0 & V_\rho^{1/2} \end{bmatrix}$, $G_t = \begin{bmatrix} V_\pi^{1/2} & 0 & 0 \end{bmatrix}$

    Rossi SSMS 8 / 73

  • Example 2: Markov switching growth model

    Similar to Hamilton (1989) for US GNP:

    $y_t = p_t + \psi_t$

    $p_t = p_{t-1} + m(S_t) + V_p^{1/2} a_{pt}$

    $m(S_t) = \mu[S_t + (1 - S_t)\delta]$

    $\psi_t = \phi \psi_{t-1} + V_\psi^{1/2} a_{\psi t}$

    $S_t \in \{0, 1\}$ with $\pi_{ii} = \Pr(S_t = i \,|\, S_{t-1} = i)$, $i = 0, 1$.

    SSMS representation: $x_t = (p_t, \psi_t)'$, $u_t = (a_{pt}, a_{\psi t})'$, $c_t = G_t = 0$, $H_t = (1, 1)$,

    $a_t = \begin{bmatrix} m(S_t) \\ 0 \end{bmatrix}$, $F_t = \begin{bmatrix} 1 & 0 \\ 0 & \phi \end{bmatrix}$ and $R_t = \begin{bmatrix} V_p^{1/2} & 0 \\ 0 & V_\psi^{1/2} \end{bmatrix}$

    Rossi SSMS 9 / 73
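
  • Aside: simulating the Markov switching growth model (added sketch)

    As a minimal illustration, not part of the original slides, the model above can be simulated directly. The parameter values below are illustrative assumptions, not estimates.

    ! Minimal sketch: simulate the Markov-switching growth model above.
    ! Parameter values are illustrative assumptions, not estimates.
    program ms_growth_sim
      implicit none
      integer, parameter :: n = 200
      double precision, parameter :: p00 = 0.90d0, p11 = 0.95d0
      double precision, parameter :: mu = 1.0d0, delta = 0.2d0, phi = 0.5d0
      double precision, parameter :: Vp = 0.5d0, Vpsi = 0.8d0
      double precision :: p, psi, y(n), u
      integer :: S, t
      p = 0.d0; psi = 0.d0; S = 1
      do t = 1, n
         call random_number(u)               ! draw S_t from the 2-state chain
         if (S == 0) then
            if (u > p00) S = 1
         else
            if (u > p11) S = 0
         end if
         ! p_t = p_{t-1} + m(S_t) + Vp^{1/2} a_pt, m(S_t) = mu*[S_t + (1-S_t)*delta]
         p = p + mu*(S + (1 - S)*delta) + sqrt(Vp)*randn()
         psi = phi*psi + sqrt(Vpsi)*randn()  ! psi_t = phi*psi_{t-1} + Vpsi^{1/2} a_psit
         y(t) = p + psi
      end do
      print *, 'last observation:', y(n)
    contains
      double precision function randn()      ! Box-Muller standard normal
        double precision :: u1, u2
        call random_number(u1); call random_number(u2)
        randn = sqrt(-2.d0*log(1.d0 - u1))*cos(8.d0*atan(1.d0)*u2)
      end function randn
    end program ms_growth_sim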

  • Example 3: Markov Switching variance

    Engle and Kim (1999) for the USD/GBP real exchange rate:

    $y_t = p_t + \psi_t$

    $p_t = p_{t-1} + V_p^{1/2} a_{pt}$

    $\psi_t = \phi \psi_{t-1} + V_\psi^{1/2} [S_t + (1 - S_t)\, \alpha_\psi^{1/2}]\, a_{\psi t}$

    $S_t$ is a 2-state Markov chain with $\pi_{ii} = \Pr(S_t = i \,|\, S_{t-1} = i)$ for $i = 0, 1$.

    SSMS representation: $x_t = (p_t, \psi_t)'$, $u_t = (a_{pt}, a_{\psi t})'$, $c_t = G_t = a_t = 0$, $H_t = (1, 1)$,

    $F_t = \begin{bmatrix} 1 & 0 \\ 0 & \phi \end{bmatrix}$ and $R_t = \begin{bmatrix} V_p^{1/2} & 0 \\ 0 & V_\psi^{1/2} [S_t + (1 - S_t)\, \alpha_\psi^{1/2}] \end{bmatrix}$

    Rossi SSMS 10 / 73

  • Example 4: Change-point volatility model

    Kim, Shephard and Chib (1998), to describe stochastic volatility:

    $\sigma_t = \exp(\lambda_t / 2)\, a_{\sigma t}$

    $\lambda_t = \lambda_{t-1} + K_{1t} V_\lambda^{1/2} a_{\lambda t}$

    Taking the square and log-linearizing:

    $\log(\sigma_t^2) = \lambda_t + \mu(K_{2t}) + \gamma(K_{2t})\, a_{zt}$

    $\lambda_t = \lambda_{t-1} + K_{1t} V_\lambda^{1/2} a_{\lambda t}$

    $K_{1t} \sim$ iid Bernoulli($w$); $K_{2t} \sim$ iid Multinomial($\pi$) with 10 states (Omori et al. 2007); $\mu(K_{2t}) + \gamma(K_{2t})\, a_{zt}$ is a mixture of normals that approximates $\log a_{\sigma t}^2$

    Rossi SSMS 11 / 73

  • Example 5: Structural VAR with Switching

    Primiceri (2005), to describe US monetary policy and the private sector:

    $y_t = \psi_t + \sum_{j=1}^{p} B_{jt}\, y_{t-j} + u_t$

    $y_t$ is a vector of endogenous variables

    $\psi_t$ is a vector of time-varying coefficients; the $B_{jt}$ are matrices of time-varying coefficients

    $u_t$ are heteroscedastic shocks with covariance matrix $\Omega_t$.

    Primiceri assumes $B_{jt} = B_{jt-1} + \nu_t$

    If smooth transition is imposed: $B_{jt} = B_{1jt}$ if $S_t = 1$; $B_{2jt}$ if $S_t = 2$; $\cdots$; $B_{kjt}$ if $S_t = k$, where $S_t$ is a $k$-state Markov chain.

    Rossi SSMS 12 / 73

  • Likelihood inference: the Kim’s algorithm

    SSMS typically involve three sources of randomness: a vector of parameters $(\theta, \pi)$, an unobserved continuous state $x \equiv (x_1, \cdots, x_T)$, and an $N$-state latent discrete process $S \equiv (S_1, \cdots, S_T)$.

    The likelihood $L(\theta, \pi) = f(y|\theta, \pi)$ cannot be computed exactly. In fact the augmented likelihood $f(y|\theta, \pi, S)$ is known, but marginalizing $S$ out of the augmented likelihood is not feasible even for small time series lengths, unless $x_t$ is absent

    The maximum likelihood estimate $(\theta_{ML}, \pi_{ML})$ and the smoothed quantities $E(x_t|y, \theta_{ML})$, $V(x_t|y, \theta_{ML})$, and $\Pr(S_t|y, \theta_{ML}, \pi_{ML})$, $t = 1, 2, \cdots, T$, can be computed via the approximate filter proposed by Kim (1994)

    Rossi SSMS 13 / 73

  • Bayesian inference

    Posterior distributions are obtained via MCMC sampling from

    f(θ, π,x,S | y)

    A thorough scrutiny of the posterior output involves looking at

    $\dim(\theta, \pi) + \dim(x_t) \times T + \dim(S_t) \times T$

    quantities, a possibly very large dimension

    In view of the above, it is important to use samplers that, a priori, are believed to be efficient.

    Rossi SSMS 14 / 73

  • Posterior simulation

    Plain Gibbs sampling:

    $f(\theta, \pi \,|\, x, S, y)$, $f(x \,|\, \theta, S, y)$, $f(S \,|\, \theta, \pi, x, y)$

    $f(\theta, \pi \,|\, x, S, y)$: usually simple to draw from, but model dependent

    $f(x \,|\, \theta, S, y)$: simulation smoother, e.g. Durbin and Koopman (2002)

    $f(S \,|\, \theta, \pi, x, y)$: e.g. the Kim and Nelson (1999) simulation smoother

    Issue: slowly mixing chains when $\theta$ and $x$ are strongly dependent given $S$, or $x$ and $S$ are strongly dependent given $\theta$.

    Rossi SSMS 15 / 73

  • Posterior simulation: sampling x off-line

    Assume that

    All parameters in $\theta$ are independent, and $\theta$ is independent of $\pi$, i.e.

    $p(\theta, \pi) = p(\pi) \prod_j p(\theta_j)$

    Dirichlet distributions are imposed for the transition probabilities $\pi$.

    Then we may resort to the more efficient Gibbs scheme:

    $f(\theta \,|\, S, \pi, y)$, $\Pr(S \,|\, \theta, \pi, y)$, $f(\pi \,|\, \theta, S, y)$

    Rossi SSMS 16 / 73

  • Posterior simulation: sampling x off-line

    where

    $\theta \sim f(\theta \,|\, S, \pi, y)$: one-at-a-time by the slice sampler of Neal (2003)

    $S \sim \Pr(S \,|\, \theta, \pi, y)$: one-at-a-time by Gerlach, Carter and Kohn (2000) or block-sampling by Fiorentini, Planas and Rossi (2013)

    $\pi \sim f(\pi \,|\, \theta, S, y)$: e.g. Frühwirth-Schnatter (2006)

    $x \sim f(x \,|\, \theta, S, y)$: off-line by the simulation smoother of Durbin and Koopman (2002)

    Rossi SSMS 17 / 73

  • Posterior simulation: sampling θ

    The hypotheses made imply

    $f(\theta|S, \pi, y) \propto f(y|S, \theta)\, p(\theta)$

    $f(y|S, \theta)$ is computed by the Kalman filter; diffuse initial conditions are handled as in Koopman (1997) for non-stationary state variables

    In principle $\theta$ could be sampled jointly using a Metropolis-Hastings algorithm. In practice the choice of the proposal density is difficult since the conditioning set contains $S$

    Then we resort to the Gibbs strategy:

    $f(\theta_i | \theta_1, \cdots, \theta_{i-1}, \theta_{i+1}, \cdots, \theta_m, S, y) \propto f(y|S, \theta)\, p(\theta_i)$

    Neal's (2003) slice sampler is used for each full conditional.

    Rossi SSMS 18 / 73

  • Slice sampling

    To get draws, say $(\theta_1, \cdots, \theta_G)$, from $f(\theta)/k$, $k = \int f(\theta) d\theta$:

    Introduce an auxiliary variable $\gamma$ and construct $p(\theta, \gamma)$ leaving the marginal $p(\theta)$ unchanged. In particular choose $\gamma|\theta \sim U(0, f(\theta))$; then

    $p(\theta, \gamma) = p(\gamma|\theta)\, p(\theta) = \frac{1}{f(\theta)}\, I_{[0 < \gamma < f(\theta)]} \times \frac{f(\theta)}{k} = \frac{1}{k}\, I_{[0 < \gamma < f(\theta)]}$

    so $(\theta, \gamma)$ is uniform over the region under the graph of $f$, and the marginal of $\theta$ is $f(\theta)/k$ as required.

    Rossi SSMS 19 / 73

  • Slice sampling

    Sampling from the joint p(θ, γ) is not possible but ...

    draws from p(θ) can be obtained by iterating Gibbs updates on γ|θ and θ|γ:

    sample γ given θ from a uniform pdf over the set (0, f(θ))

    sample θ given γ from a uniform pdf over S = {θ : γ < f(θ)}

    Rossi SSMS 20 / 73

  • Slice sampling in practice

    Sampling $\theta$ from a uniform over $S = \{\theta : \gamma < f(\theta)\}$ is difficult to achieve exactly. In practice:

    Position an interval $I = (L, R)$ around $\theta_0$ at random, so that it contains as much of $S$ as possible;

    Draw $\theta$ from the set $A = \{\theta : \theta \in S \cap I$ and $\Pr(I|\theta) = \Pr(I|\theta_0)\}$

    Neal (2003) proposes some strategies:

    (i) stepping out;

    (ii) doubling;

    (iii) random positioning, etc.

    Rossi SSMS 21 / 73

  • Stepping out

    Rossi SSMS 22 / 73

  • Stepping out

    Given $\theta_o$, $\gamma \sim U(0, f(\theta_o))$, and the slice $S = \{\theta : \gamma < f(\theta)\}$, a new value $\theta_n$ is obtained as follows:

    Position $I = (L, R)$ around $\theta_o$ at random: $L = \theta_o - uW$ and $R = L + W$, with $u \sim U(0,1)$

    Expand $I$, setting $L = L - W$ and $R = R + W$, until $\gamma \geq f(L)$ and $\gamma \geq f(R)$

    Shrinking: set $\theta_n = L + u(R - L)$, and

    set $L = \theta_n$ if $\theta_n < \theta_o$, $R = \theta_n$ otherwise;

    repeat until $\gamma < f(\theta_n)$, then accept $\theta_n$.

    Rossi SSMS 23 / 73
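
  • Aside: a stepping-out slice sampler in code (added sketch)

    The procedure above maps directly into code. The following is a minimal sketch, not part of the original slides: the log-density logf (a standard normal) is an illustrative stand-in for a full conditional, and the level γ is handled on the log scale to avoid underflow.

    ! Minimal sketch of Neal's (2003) slice sampler with stepping out
    ! and shrinkage, for an illustrative univariate log-density.
    program slice_demo
      implicit none
      integer, parameter :: G = 5000
      double precision, parameter :: W = 1.d0   ! scaling parameter W
      double precision :: theta, gam, L, R, u, prop
      integer :: i
      theta = 0.d0
      do i = 1, G
         ! level: gamma | theta ~ U(0, f(theta)), stored as log(gamma)
         call random_number(u)
         gam = logf(theta) + log(1.d0 - u)
         ! stepping out: position I = (L,R) around theta, then expand
         call random_number(u)
         L = theta - u*W
         R = L + W
         do while (logf(L) > gam)
            L = L - W
         end do
         do while (logf(R) > gam)
            R = R + W
         end do
         ! shrinkage: draw uniformly on (L,R), shrink towards theta on rejection
         do
            call random_number(u)
            prop = L + u*(R - L)
            if (logf(prop) > gam) exit
            if (prop < theta) then
               L = prop
            else
               R = prop
            end if
         end do
         theta = prop
      end do
      print *, 'final draw:', theta
    contains
      double precision function logf(x)        ! stand-in log target
        double precision, intent(in) :: x
        logf = -0.5d0*x*x
      end function logf
    end program slice_demo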

  • Doubling

    Rossi SSMS 24 / 73

  • Doubling

    Doubling can expand the interval faster than stepping out when $W$ turns out to be too small.

    Position $I = (L, R)$ around $\theta_o$ at random: $L = \theta_o - uW$ and $R = L + W$

    Expand $I$, setting $L = L - (R - L)$ if $u < 1/2$, and $R = R + (R - L)$ otherwise;

    Repeat while $\gamma < f(L)$ or $\gamma < f(R)$

    Shrinking: drawing $\theta_n$ is a bit more complex than in the stepping-out procedure.

    Rossi SSMS 25 / 73

  • The performance of the slice sampler

    We first use a battery of univariate pdfs to select:

    1 the magnitude of the scaling parameter $W$

    2 the procedure that best approximates the true slice: stepping out, doubling, or random positioning

    Using 1000 replications of sample size $G = 5000$, we measure:

    NSE: $Var(\frac{1}{G}\sum_{i=1}^{G} \theta_i)^{1/2}$ (small is better)

    $\rho_1$: the 1st-order autocorrelation of the chain $\theta_1, \cdots, \theta_G$ (close to 0)

    IF $= 1 + 2\sum_{j=1}^{p} \omega_j \rho_j$, with $\omega_j$ the Parzen weights (close to 1)

    the (average) number of calls to $f(\theta)$ (small)

    the number of rejections of the Cramer-von Mises (CVM) test (at the 5% level)

    Rossi SSMS 26 / 73

  • Univariate test cases - Marron and Wand (1992)

    [Figure: the twelve Marron and Wand (1992) test densities: Skewed, Strongly skewed, Kurtotic, Outlier, Bimodal, Separated bimodal, Skewed bimodal, Trimodal, Claw, Double claw, Asymmetric claw, Smooth comb.]

    Rossi SSMS 27 / 73

  • Univariate test cases - summary results

                RW-MH   Slice: stepping out               Slice: doubling
    W                   σ/2     3σ      10σ     100σ      σ/2     3σ      10σ     100σ
    NSE         3.77    2.75    1.92    1.83    1.79      2.30    2.02    1.85    1.80
    ρ1          0.71    0.23    0.15    0.12    0.11      0.24    0.18    0.14    0.12
    IF          6.38    3.97    1.51    1.38    1.34      2.24    1.68    1.42    1.35
    N eval      2       9.42    6.09    6.61    9.72      23.31   14.65   9.47    10.00
    CVM         0.59    0.22    0.13    0.11    0.11      0.20    0.15    0.12    0.12
    RE          1       2.05    0.71    0.71    1.02      3.72    1.86    1.06    1.06

    Rossi SSMS 28 / 73

  • Posterior simulation: sampling S

    Gerlach, Carter and Kohn (2000) developed the gold-standard algorithm, which draws $S$ from $f(S|\theta, y)$ in $O(T)$ operations:

    $\Pr(S_t | S_{\setminus t}, \theta, \pi, y) \propto f(y_{t+1}^{T} \,|\, y^{t}, S, \theta)\, f(y_t \,|\, y^{t-1}, S^{t}, \theta)\, \Pr(S_t \,|\, S_{\setminus t}, \pi)$

    - Lemma 4 in GCK yields $f(y_{t+1}^{T} | y^{t}, S, \theta)$ in one step after an initial set of backward iterations.

    - The kernel must be evaluated at each one of the $N$ possible states of $S_t$.

    Remark: simulation of $h$ variables $S_t, \cdots, S_{t+h-1}$ requires $h \times N$ kernel evaluations.

    Rossi SSMS 29 / 73

  • Posterior simulation: multi-move samplers

    The GCK algorithm samples $f(S_t | S_{\setminus t}, \theta, y)$

    It is a single-move algorithm, so it may be inefficient when the $S_t$ variables are strongly conditionally dependent. In such a case the mixing properties of the algorithm can be improved by simulating $S_t, \cdots, S_{t+h-1}$ jointly from $f(S_t, \cdots, S_{t+h-1} | S_1, \cdots, S_{t-1}, S_{t+h}, \cdots, S_T, \theta, y)$.

    Fiorentini, Planas and Rossi (2013) propose blocking extensions to the GCK single-move sampler:

    1 Multi-move Gibbs sampler

    2 Multi-move adaptive Metropolis-Hastings sampler

    Rossi SSMS 30 / 73

  • Multi-move Gibbs

    To present the idea we first consider a double-move sampler.

    $f(S_t, S_{t+1} | S_{\setminus(t,t+1)}, \theta, y) \propto f(y_{t+2}^{T} \,|\, y^{t+1}, S, \theta)\, f(y_{t+1} \,|\, y^{t}, S^{t+1}, \theta)\, f(y_t \,|\, y^{t-1}, S^{t}, \theta)\, f(S_t, S_{t+1} \,|\, S_{\setminus(t,t+1)}, \theta)$

    We evaluate $f(y_{t+2}^{T} \,|\, y^{t+1}, S, \theta)$ in one step (GCK's Lemma 4)

    Simulation of blocks of length 2 requires $N^2$ kernel evaluations, as opposed to $2 \times N$ for the single-move sampler.

    For blocks of length $h$ (see the paper) the relation is $N^h$ kernel evaluations as opposed to $h \times N$

    Rossi SSMS 31 / 73

  • Posterior simulation: multi-move Gibbs

    For example when N = 2 (one binary mixing variable)

    Number of kernel evaluations:

    Block-length   Multi-move   Single-move
    2              4            4
    3              8            6
    4              16           8
    5              32           10

    Block sampling via full conditionals is not feasible when N takes largevalues.

    For these cases we propose an adaptive MH strategy.

    Rossi SSMS 32 / 73

  • Posterior simulation: multi-move adaptive MH

    To be viable an MH block sampling scheme for St, · · · , St+h−1 needs aproposal density such that:

    1 acceptance rate remains appreciable when the block length increases

    2 can be updated easily

    3 acceptance probability evaluation is simple

    4 simulation is fast and easy

    We consider the mixture (similar to Giordani and Kohn, 2010):

    $\tilde{q}(s_{t,t+h-1}) = \delta q_0(s_{t,t+h-1}) + (1 - \delta)\, q(s_{t,t+h-1}), \quad 0 < \delta < 1$

    Rossi SSMS 33 / 73

  • Posterior simulation: multi-move adaptive MH

    At iteration $n+1$, the multi-move MH sampler proposes a candidate $s_t^\star, \cdots, s_{t+h-1}^\star$ with acceptance probability:

    $\alpha = \min\left\{1,\; \frac{f(s_{t,t+h-1}^\star \,|\, s_{1,t-1}^{(n+1)}, s_{t+h,T}^{(n)}, y, \theta)}{f(s_{t,t+h-1}^{(n)} \,|\, s_{1,t-1}^{(n+1)}, s_{t+h,T}^{(n)}, y, \theta)} \; \frac{\tilde{q}(s_{t,t+h-1}^{(n)} \,|\, s_{t-1}^{(n+1)}, s_{t+h}^{(n)})}{\tilde{q}(s_{t,t+h-1}^\star \,|\, s_{t-1}^{(n+1)}, s_{t+h}^{(n)})}\right\} \quad (1)$

    Whereas with Gibbs sampling ignorance of the normalizing constant forces evaluation of the kernel at all $N^h$ possible sequences, the computation of the MH acceptance probability needs only two kernel evaluations

    Adaptive MH is thus much faster than the single- and multi-move Gibbs samplers

    See the paper for full details (e.g. the formulae for the initial and ending blocks are slightly different from those for inner blocks)

    Rossi SSMS 34 / 73

  • Efficiency comparison

    We consider Inefficiency Factors (IF):

    $IF = 1 + 2 \sum_{k=1}^{M} w_k\, \rho(k)$

    but also Relative Inefficiency Factors (RIF):

    $RIF = \frac{Time_A}{Time_B} \times \frac{IF_A}{IF_B}$

    Example 1: Markov switching growth model for US real GDP.

    Example 2: Markov switching variance for USD/GBP real exchangerate.

    Example 3: Change-point volatility model for the Fama-Frenchmarket factor

    Rossi SSMS 35 / 73
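
  • Aside: computing an inefficiency factor (added sketch)

    As a minimal sketch of the IF formula above, not in the original slides: given a recorded chain, compute the autocorrelations up to a truncation lag M and combine them with Parzen weights. The uniform "chain" below is only a placeholder for an actual MCMC sample.

    ! Minimal sketch: IF = 1 + 2*sum_k w_k*rho(k) with Parzen weights.
    program ineff_factor
      implicit none
      integer, parameter :: G = 5000, M = 200  ! chain length, truncation lag
      double precision :: chain(G), mean0, c0, ck, rho, w, z, IFa
      integer :: k, t
      call random_number(chain)                ! placeholder "draws"
      mean0 = sum(chain)/G
      c0 = sum((chain - mean0)**2)/G           ! lag-0 autocovariance
      IFa = 1.d0
      do k = 1, M
         ck = 0.d0
         do t = 1, G - k
            ck = ck + (chain(t) - mean0)*(chain(t+k) - mean0)
         end do
         rho = (ck/G)/c0                       ! lag-k autocorrelation
         z = dble(k)/dble(M)                   ! Parzen lag window
         if (z <= 0.5d0) then
            w = 1.d0 - 6.d0*z*z + 6.d0*z**3
         else
            w = 2.d0*(1.d0 - z)**3
         end if
         IFa = IFa + 2.d0*w*rho
      end do
      print *, 'inefficiency factor:', IFa
    end program ineff_factor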

  • Example 1: Markov switching growth model

    $y_t = p_t + c_t$

    $p_t = p_{t-1} + m(S_t) + V_p^{1/2} a_{pt}$

    $m(S_t) = \mu [S_t + (1 - S_t)\, \delta], \quad \delta < 1$

    $\phi(L)\, c_t = V_c^{1/2} a_{ct}$

    $x_t = (p_t, c_t)'$

    The drift $m(S_t)$ takes two values, $\mu$ and $\mu\delta$

    $S_t$ is a 2-state Markov chain with dynamics $p_{ii} = \Pr(S_t = i \,|\, S_{t-1} = i)$, $i = 0, 1$.

    $\theta = (A, \tau, V_c, V_p, \mu, \delta, p_{00}, p_{11})$

    Rossi SSMS 36 / 73

  • US real GDP: Gibbs inefficiency factors

    h        1       2       3       4       6
    S44      33.87   30.41   18.38   28.02   17.75
    S45      31.75   28.57   16.82   26.32   16.21
    S110     23.07   16.70   12.84   12.49   11.98
    S113     28.61   12.45   9.56    11.23   9.19
    p00      7.79    5.68    4.81    5.23    4.74
    p11      19.80   18.68   15.93   17.24   15.81
    τ        7.29    6.13    5.18    5.65    4.95
    φ        15.39   12.92   9.14    11.70   9.23
    µ        39.47   33.76   27.75   30.81   26.75
    δ        55.60   51.63   47.78   48.77   47.61

    Notes: h is the block length; IF are computed using 1,000,000 draws; the four St variables shown are those for which GCK gives the largest IF.

    Rossi SSMS 37 / 73

  • US real GDP: samplers relative inefficiency

              Adaptive MH                                        MH
    h         1      2      3      4      6      8      10      RD     T
    S44       2.62   3.09   6.98   3.49   6.50   6.79   5.91    6.34   1.06
    S45       2.65   3.08   7.02   3.45   6.52   6.62   5.80    6.30   1.00
    S110      2.65   3.82   5.14   5.13   3.91   4.35   2.75    4.68   0.68
    S113      2.85   7.14   11.31  8.80   10.31  9.29   8.76    9.31   1.11
    p00       3.00   4.27   6.20   5.31   5.23   5.05   4.26    5.40   0.83
    p11       2.93   3.37   4.57   4.27   4.15   4.58   3.53    4.73   1.43
    τ         3.08   3.97   5.70   4.71   5.14   5.33   4.56    5.43   2.04
    φ         2.92   3.74   6.47   4.50   6.09   5.85   4.98    5.69   1.19
    µ         3.01   3.70   5.44   4.44   4.74   4.93   4.53    5.26   1.62
    δ         3.59   4.05   5.09   4.73   4.72   5.05   4.41    4.95   2.28
    Rel. time 0.22   0.22   0.20   0.20   0.20   0.20   0.22    0.20   0.18
    Acc. rate 0.99   0.97   0.94   0.93   0.89   0.86   0.82    0.88   0.06

    Rossi SSMS 38 / 73

  • Example 2: Markov switching variance

    $y_t = p_t + c_t$

    $\Delta p_t = V_p^{1/2} a_{pt}$

    $\phi(L)\, c_t = V_c^{1/2} [S_t + (1 - S_t)\, \alpha_c^{1/2}]\, a_{ct}$

    $S_t$ is a 2-state Markov chain with $p_{ii} = \Pr(S_t = i \,|\, S_{t-1} = i)$ for $i = 0, 1$

    Rossi SSMS 39 / 73

  • USD/GBP real exchange rate: Gibbs inefficiency factors

    h        1        2       3       4       6
    S167     86.89    57.73   41.08   33.80   27.04
    S168     89.43    59.07   43.87   36.07   29.73
    S169     90.05    59.91   44.39   36.54   29.76
    S170     85.56    57.02   41.36   33.52   26.58
    p00      115.65   88.89   77.43   68.56   69.95
    p10      113.32   86.74   76.34   67.63   67.88
    p01      78.18    57.62   52.19   46.05   41.69
    p11      68.92    54.06   48.55   43.49   39.05
    p02      8.92     7.86    7.34    7.01    6.81
    p12      8.57     7.29    6.99    6.64    6.79
    φ        26.94    21.93   20.79   21.24   19.88

    Notes: h is the block length; IF are computed using 1,000,000 draws; the four St variables shown are those for which GCK gives the largest IF.

    Rossi SSMS 40 / 73

  • USD/GBP real exchange rate: samplers relative ineff.

              Adaptive MH                                        MH
    h         1      2      3      4      6      8      10      RD     T
    S167      1.82   2.52   3.36   3.50   3.81   4.40   5.18    5.36   21.74
    S168      1.82   2.60   3.24   3.25   3.56   3.99   5.09    5.21   20.28
    S169      1.84   2.58   3.19   3.24   3.43   4.04   4.94    5.31   21.14
    S170      1.87   2.59   3.26   3.45   3.86   4.42   4.43    5.27   22.71
    p00       1.53   1.89   2.17   2.48   2.39   2.18   1.98    1.39   3.71
    p10       1.51   1.86   2.16   2.42   2.35   2.11   1.89    1.36   3.62
    p02       1.84   1.97   2.04   1.95   1.54   1.31   1.29    1.61   3.35
    p12       1.90   2.01   1.83   1.71   1.30   1.10   1.10    1.41   3.19
    φ         2.30   2.51   2.99   2.56   2.80   3.02   3.08    2.80   4.08
    Rel. time 0.39   0.39   0.37   0.37   0.38   0.36   0.38    0.43   0.35
    Acc. rate 0.96   0.93   0.90   0.87   0.82   0.75   0.72    0.79   0.76

    Rossi SSMS 41 / 73

  • Example 3: Change-point volatility model

    $\sigma_t = \exp(\lambda_t / 2)\, a_{\sigma t}$

    $\lambda_t = \lambda_{t-1} + K_{1t} V_\lambda^{1/2} a_{\lambda t}$

    Log-linearizing:

    $\log(\sigma_t^2) = \lambda_t + \mu(K_{2t}) + \gamma(K_{2t})\, a_{zt}$

    $\lambda_t = \lambda_{t-1} + K_{1t} V_\lambda^{1/2} a_{\lambda t}$

    $K_{1t} \sim$ iid Bernoulli($w$); $K_{2t} \sim$ iid Multinomial($\pi$) with 10 states; $\mu(K_{2t}) + \gamma(K_{2t})\, a_{zt}$ is a mixture of normals that approximates $\log a_{\sigma t}^2$

    (see Kim, Shephard and Chib 1998, Omori et al. 2007)

    Rossi SSMS 42 / 73

  • Fama-French market factor: samplers relative inefficiency

              Gibbs (IF)   Adaptive MH (RI)
    h         1            1      2      3      4      6      8      10     RD
    S25       17.48        1.65   2.60   3.11   3.59   3.64   3.24   2.79   3.97
    S78       13.27        1.72   2.17   2.22   3.60   2.17   3.21   2.87   4.35
    S79       16.46        1.59   1.89   2.02   2.69   2.19   2.60   2.38   3.80
    S83       10.76        1.45   1.52   1.66   1.72   1.88   1.66   1.60   2.40
    ω         1.84         2.10   2.02   2.04   1.95   1.86   1.58   1.60   2.07
    Rel. time 1.00         0.39   0.38   0.36   0.35   0.34   0.33   0.33   0.32
    Acc. rate              0.86   0.77   0.71   0.66   0.59   0.54   0.50   0.58

    Rossi SSMS 43 / 73

  • Block length

    It is important to anticipate whether blocking is worth implementing

    If yes, can we infer the "optimal" block length?

    Blocking is effective when the variables in the block are strongly conditionally dependent given the rest

    As a simple way to gauge conditional dependence, we use the partial autocorrelations $Corr(S_t, S_{t+k} \,|\, y, S_{t-1}, S_{t+k+1})$, $k = 1, 2, \cdots$

    Rossi SSMS 44 / 73

  • Partial autocorrelations of discrete latent variable S in the US GDP example.

    [Figure: partial autocorrelations at lags 1-10, reaching values up to about 0.6.]

    Rossi SSMS 45 / 73

  • Partial autocorrelations of latent variable S in the USD/GBP exchange rate

    [Figure: partial autocorrelations at lags 1-10, reaching values up to about 0.6.]

    Rossi SSMS 46 / 73

  • Partial autocorrelations of discrete latent variable K1 in the Fama-French market

    [Figure: partial autocorrelations at lags 1-10, all below about 0.1.]

    Rossi SSMS 47 / 73

  • Posterior simulation: transition probabilities

    The assumptions made imply:

    f(π|θ,S,y) = f(π|S) ∝ Pr(S|π) f(π)

    In the simplest case where St is a sequence of independent variables i.e.Pr(St = k|St−1 = j) = Pr(St = k) = πk for k = 1, · · · , N .

    Under the assumption π = (π1, · · · , πN ) ∼ Dirichlet(α1, · · · , αN ):

    f(π|S) ∝T∏t=1

    Pr(St) f(π) = Dirichlet(α∗1, · · · , α∗N )

    α∗k = αk +

    T∑t=1

    1(St=k)

    Rossi SSMS 48 / 73
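
  • Aside: the conjugate Dirichlet update in code (added sketch)

    A minimal sketch of the update above for the independent case, not in the original slides: the posterior hyperparameters are the prior ones plus the state counts; a draw from Dirichlet(α*) is then obtained by normalizing independent Gamma(α*_k, 1) variates. The state sequence below is a placeholder.

    ! Minimal sketch: alpha*_k = alpha_k + #{t : S_t = k}.
    program dirichlet_update
      implicit none
      integer, parameter :: n = 12, N2 = 2
      integer :: S(n) = (/1,1,2,1,2,2,2,1,1,2,2,2/)   ! placeholder states
      double precision :: alpha(N2) = (/1.d0, 1.d0/)  ! flat Dirichlet prior
      double precision :: alpha_star(N2)
      integer :: k
      do k = 1, N2
         alpha_star(k) = alpha(k) + count(S == k)
      end do
      print *, 'posterior hyperparameters:', alpha_star
    end program dirichlet_update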

  • Posterior simulation: transition probabilities

    Let $S$ be a Markov sequence with $\Pr(S_t = k | S_{t-1} = j) = \pi_{kj}$ for $k, j = 1, \cdots, N$. Let $\pi = (\pi_1, \cdots, \pi_N)$ and $\pi_j = (\pi_{1j}, \cdots, \pi_{Nj})$. Assume the $\pi_j$ are mutually independent Dirichlet pdfs. The full conditional distribution of $\pi$ is:

    $f(\pi|S) \propto \Pr(S|\pi_1, \cdots, \pi_N) \prod_{k=1}^{N} f(\pi_k)$

    $\propto \Pr(S_1|\pi_1, \cdots, \pi_N) \prod_{t=2}^{T} \Pr(S_t|S_{t-1}, \pi_1, \cdots, \pi_N) \prod_{k=1}^{N} f(\pi_k)$

    $\propto \Pr(S_1|\pi_1, \cdots, \pi_N) \prod_{k=1}^{N} \prod_{t \in I_k} \Pr(S_t|S_{t-1} = k, \pi_k)\, f(\pi_k)$

    where $I_k = \{t \geq 2 : S_{t-1} = k\}$.

    Rossi SSMS 49 / 73

  • Posterior simulation: transition probabilities

    Notice that the term

    $\prod_{k=1}^{N} \prod_{t \in I_k} \Pr(S_t|S_{t-1} = k, \pi_k)\, f(\pi_k)$

    is proportional to the product of $N$ independent Dirichlet distributions.

    To remove dependence on the initial condition $S_1$, this product is taken as the proposal in an MH step with acceptance probability given by

    $\min\{1, \Pr(S_1|\pi^\star)/\Pr(S_1|\pi^0)\}$

    where $\pi^\star$ is the candidate vector and $\pi^0$ is the previously sampled value.

    Rossi SSMS 50 / 73

  • Convergence diagnostics

    1 Visual inspection by plotting cumulated posterior means $\frac{1}{g}\sum_{j=1}^{g} \theta^{(j)}$, $g = 1, 2, \cdots, G$.

    2 Geweke convergence diagnostic: compare the mean of the first $n_1$ elements against the mean of the last $n_2$:

    $\bar{\theta}_1 = \frac{1}{n_1}\sum_{j=1}^{n_1} \theta^{(j)}; \qquad \bar{\theta}_2 = \frac{1}{n_2}\sum_{j=G-n_2+1}^{G} \theta^{(j)}$

    with $n_1 + n_2 \leq G$. As $G \to \infty$ with $\frac{n_1}{G}$ and $\frac{n_2}{G}$ fixed,

    $Z = \frac{\bar{\theta}_2 - \bar{\theta}_1}{\sqrt{V(\bar{\theta}_1) + V(\bar{\theta}_2)}} \to N(0, 1)$

    Large values of $Z$ indicate lack of convergence.

    Geweke suggests $n_1 = G/5$ and $n_2 = G/2$.

    Rossi SSMS 51 / 73
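
  • Aside: the Geweke diagnostic in code (added sketch)

    A minimal sketch of the statistic above with n1 = G/5 and n2 = G/2, not in the original slides. For brevity the variances of the segment means are computed as if the draws were iid; in practice NSEs based on spectral estimates are used.

    ! Minimal sketch: Geweke Z on a recorded chain (placeholder draws).
    program geweke_z
      implicit none
      integer, parameter :: G = 10000, n1 = G/5, n2 = G/2
      double precision :: draws(G), m1, m2, v1, v2, Z
      call random_number(draws)                 ! placeholder "chain"
      m1 = sum(draws(1:n1))/n1
      m2 = sum(draws(G-n2+1:G))/n2
      v1 = sum((draws(1:n1) - m1)**2)/(n1 - 1)/n1          ! Var(mean), iid
      v2 = sum((draws(G-n2+1:G) - m2)**2)/(n2 - 1)/n2
      Z = (m2 - m1)/sqrt(v1 + v2)
      print *, 'Geweke Z:', Z
    end program geweke_z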

  • Convergence diagnostics

    A universal method to assess convergence does not exist.

    It is useful to report:

    Geweke statistics.

    Chain autocorrelations.

    NSE, the numerical standard error of the posterior mean, using autocovariances up to a lag equal to 4% of the recorded simulations.

    RNE, the relative numerical efficiency: the ratio of the variance of the posterior mean under the iid hypothesis to the squared NSE. A value close to 1 indicates high efficiency.

    Rossi SSMS 52 / 73

  • Inference from the MCMC draws

    Any quantity of interest can be derived from the posterior samples: for instance marginal distributions and related moments, as in

    $\hat{E}[\theta_1|y] = \frac{1}{G}\sum_{j=1}^{G} \theta_1^{(j)}, \qquad \hat{f}(\tilde{\theta}_1) = \frac{1}{G}\sum_{j=1}^{G} I_{\theta_1^{(j)} \in (\tilde{\theta}_1 - \delta,\, \tilde{\theta}_1 + \delta)}$

    Hypothesis testing via the Highest Posterior Density (HPD) region with probability content $\alpha$: the smallest interval $R$ such that

    $p(\beta \in R|y) = \int_R p(\beta|y)\, d\beta = \alpha$

    $\Rightarrow$ accept $H_0 : \beta \in I$ if $I \subset R$.

    Rossi SSMS 53 / 73

  • The marginal likelihood

    Let $y$ denote the observed values and $M_k$ a model with parameters $\theta$ for which a proper prior $f(\theta|M_k)$ is defined over a support $\Theta$.

    The marginal likelihood is the density mass that the model puts on the data given the priors:

    $f(y|M_k) = \int_\Theta f(y|M_k, \theta)\, f(\theta|M_k)\, d\theta$

    Rossi SSMS 54 / 73

  • The marginal likelihood

    Useful for:

    1 Model selection: it gives the posterior probability of a model $M_k$ among $K$ models:

    $p(M_k|y) = \frac{f(y|M_k)\, p(M_k)}{\sum_\ell f(y|M_\ell)\, p(M_\ell)}$

    $\Rightarrow$ isolate the prior that best matches the data properties.

    2 Hypothesis testing: when models are made to coincide with hypotheses, for instance to discriminate between different trend specifications.

    3 Forecast combination:

    $f(y_{T+1}|y) = \sum_\ell f(y_{T+1}|M_\ell, y)\, p(M_\ell|y)$

    ... and for Bayesian model averaging of any quantity of interest.

    Rossi SSMS 55 / 73

  • The marginal likelihood

    Few closed-form solutions exist:

    Regression model ($M_1$): $y = X\beta + u$ with $u|V_u \sim N(0, V_u I_T)$ and natural conjugate prior $p(\beta, V_u) = NIG(\beta_0, P_0, s_0, \nu_0)$, which implies:

    $f(y|M_1) = t(X\beta_0,\, s_0,\, (I_T + X P_0^{-1} X')^{-1},\, \nu_0)$

    Random walk plus drift ($M_2$): $\Delta y = \mu + u$ with $p(\mu, V_u) = NIG(\mu_0, P_0, s_0, \nu_0)$:

    $f(\Delta y|M_2) = t(\mu_0 1_{T-1},\, s_0,\, (I_{T-1} + 1_{T-1} P_0^{-1} 1'_{T-1})^{-1},\, \nu_0)$

    In general, no exact solution exists ⇒ a numerical evaluation is necessary.

    Rossi SSMS 56 / 73

  • Marginal likelihood estimators

    Likelihood integration over the prior:

    $f(y) = \int_\Theta f(y|\theta)\, f(\theta)\, d\theta$

    $\hat{f}_{LI}(y) = \frac{1}{m}\sum_{i=1}^{m} f(y|\theta^{(i)}), \quad \theta^{(i)}$ iid from $f(\theta)$

    Difficulties:

    the dimension of the support;

    likelihood functions are typically highly concentrated with respect to the prior.

    $\Rightarrow$ large variance; not used.

    Rossi SSMS 57 / 73

  • Marginal likelihood estimators

    Importance sampling is one standard in applied econometrics:

    $f(y) = \int_\Theta \frac{f(y|\theta)\, f(\theta)}{q(\theta)}\, q(\theta)\, d\theta \;\Rightarrow\; \hat{f}_{IS}(y) = \frac{1}{m}\sum_{i=1}^{m} \frac{f(y|\theta^{(i)})\, f(\theta^{(i)})}{q(\theta^{(i)})}, \quad \theta^{(i)} \sim q(\theta)$

    where $q(\theta)$ is the importance function with support $S_q$.

    Choice of importance function:

    $f(\theta|y)/q(\theta) \propto 1$ yields zero variance $\Rightarrow$ choose $q(\theta) \propto f(\theta|y)$ as much as possible.

    If $q$ has light tails wrt $f(\theta|y)$, $f(\theta|y)/q(\theta)$ is too large in the tails. If $q$ is over-dispersed wrt $f(\theta|y)$, irrelevant points are drawn. Both cases imply a loss in efficiency.

    Often $\theta \sim N(\tilde{\theta}, c\Sigma(\tilde{\theta}))$ with $\tilde{\theta}$ the posterior mode and $c > 1$.

    Rossi SSMS 58 / 73

  • Marginal likelihood estimators

    Harmonic Mean by Newton & Raftery (1994, JRSS):

    $\frac{1}{f(y)} = \int \frac{f(\theta)}{f(y)}\, d\theta = \int \frac{f(\theta|y)}{f(y|\theta)}\, d\theta = E_{\theta|y}\left[\frac{1}{f(y|\theta)}\right]$

    $\Rightarrow \hat{f}_{HM}(y) = \left[\frac{1}{m}\sum_{i=1}^{m} \frac{1}{f(y|\theta^{(i)})}\right]^{-1}, \quad \theta^{(i)} \sim f(\theta|y)$

    Advantage: makes use of posterior samples instead of an importance density.

    Problem: the variance is $\infty$ due to points with (near-)zero likelihood.

    Improvement: the modified harmonic mean (MHM) estimator (Geweke, FRB 1999) attenuates the infinite-variance problem by introducing an importance function with a truncation.

    DMM delivers the MHM estimates with truncation at $0, 5\%, \cdots, 95\%$.

    Rossi SSMS 59 / 73
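
  • Aside: a numerically stable harmonic mean (added sketch)

    A minimal sketch of the estimator above on the log scale, not in the original slides: since likelihood values are tiny, the sum of 1/f(y|θ(i)) is computed with a log-sum-exp guard against overflow. The vector of log-likelihoods at posterior draws is a placeholder.

    ! Minimal sketch: log fHM(y) = -log( (1/m) * sum_i exp(-ll_i) ).
    program harmonic_mean
      implicit none
      integer, parameter :: m = 1000
      double precision :: ll(m), cmax, logHM
      call random_number(ll)
      ll = -500.d0 + 10.d0*ll                 ! placeholder log f(y|theta(i))
      cmax = maxval(-ll)                      ! log-sum-exp guard
      logHM = -(cmax + log(sum(exp(-ll - cmax))) - log(dble(m)))
      print *, 'log marginal likelihood (HM):', logHM
    end program harmonic_mean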

  • Marginal likelihood estimators

    Bridge sampling by Meng & Wong (1996, SS) is today's best. Since $f(\theta|y) = f(y|\theta) f(\theta)/f(y)$,

    $1 = \frac{\int \frac{h(\theta)}{q(\theta)}\, q(\theta)\, d\theta}{\int \frac{h(\theta)}{f(y|\theta) f(\theta)/f(y)}\, f(\theta|y)\, d\theta} \;\Rightarrow\; f(y) = \frac{\int \frac{h(\theta)}{q(\theta)}\, q(\theta)\, d\theta}{\int \frac{h(\theta)}{f(y|\theta)\, f(\theta)}\, f(\theta|y)\, d\theta}$

    The bridge function $h(\theta)$ reduces the estimation error if located between the importance function $q(\theta)$ and the posterior density. MW's optimal choice:

    $h(\theta) \propto \frac{q(\theta)\, f(\theta|y)}{m_q\, q(\theta) + m_y\, f(\theta|y)}$

    where $m_q$, $m_y$ refer to the number of draws from $q(\theta)$ and $f(\theta|y)$.

    Rossi SSMS 60 / 73

  • Marginal likelihood estimators

    The MW estimator:

    f̂MW (y) =

    1mq

    ∑θ∼q(θ)

    f(y|θ)f(θ)mqq(θ)+myf(θ|y)

    1my

    ∑θ∼f(θ|y)

    q(θ)mqq(θ)+myf(θ|y)

    involves f(θ|y) that requires a preliminary estimate of f(y). The formulaeabove can be iterated.

    DMM delivers MW estimates without iterating and with 10 iterations, allinitialized with one MHM estimates.

    Rossi SSMS 61 / 73
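
  • Aside: iterating the MW formula (added sketch)

    A minimal sketch of the iteration above, not in the original slides: starting from a preliminary estimate f0 (e.g. an MHM estimate), f(θ|y) in the sums is replaced by f(y|θ)f(θ)/f0 and the estimator is re-evaluated. All inputs are placeholders, and densities are kept on the natural scale for clarity; a log-scale version would be used in practice.

    ! Minimal sketch of the iterated Meng-Wong estimator.
    ! pq, qq: f(y|theta)f(theta) and q(theta) at draws from q;
    ! py, qy: the same quantities at posterior draws.
    program bridge_iter
      implicit none
      integer, parameter :: mq = 1000, my = 1000, niter = 10
      double precision :: pq(mq), qq(mq), py(my), qy(my), f0, num, den
      integer :: i, j, it
      call random_number(pq); call random_number(qq)  ! placeholder inputs,
      call random_number(py); call random_number(qy)
      pq = pq + 0.1d0; qq = qq + 0.1d0                ! kept strictly positive
      py = py + 0.1d0; qy = qy + 0.1d0
      f0 = 1.d0                                       ! e.g. an MHM estimate
      do it = 1, niter
         num = 0.d0
         do i = 1, mq              ! numerator: average over q-draws
            num = num + pq(i)/(mq*qq(i) + my*pq(i)/f0)
         end do
         den = 0.d0
         do j = 1, my              ! denominator: average over posterior draws
            den = den + qy(j)/(mq*qy(j) + my*py(j)/f0)
         end do
         f0 = (num/mq)/(den/my)
      end do
      print *, 'bridge sampling estimate of f(y):', f0
    end program bridge_iter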

  • Program DMM

    DMM is a stand-alone program for the analysis of dynamic mixture models (see Giordani, Kohn, and van Dijk, JoE, 2007)

    Several packages offer estimation of state space models (see e.g. Commandeur, Koopman and Ooms, JSS, Vol. 41, 2011)

    However, only S+FinMetrics by Zivot (JSS, 2006) analyzes Markov switching state space models. Estimation is performed by maximizing the approximate likelihood devised by Kim (JoE, 1994).

    Rossi SSMS 62 / 73

  • DMM: main features

    Bayesian inference: DMM delivers posterior samples of the unobserved state vector, of the discrete latent variable, of the model parameters, missing values, forecasts, and two marginal likelihood estimates (Meng and Wong, SS, 1996 and Geweke, 1999)

    Endogenous series: univariate or multivariate, stationary or non-stationary, with missing observations; they may be linked to exogenous variables.

    Coding: for computational speed and robustness, DMM is fully implemented in Fortran.

    Rossi SSMS 63 / 73

  • DMM: prior assumptions

    All parameters in θ are independent, and θ is independent of π

    Prior distributions for the elements of θ may be normal (NT), beta(BE), and inverse gamma (IG)

    Each parameter in θ is defined over a finite support

    Dirichlet distributions are imposed for the transition probabilities π`.

    Rossi SSMS 64 / 73

  • Running DMM: stand-alone version

    Kahn-Rich.nml: contains model settings, prior distributions, and data for LP, WH, and CH

    Kahn-Rich.dll: a dynamic link library that defines the system matrices of the state space representation. The dll can be coded either using a Fortran compiler, e.g. the GNU Fortran compiler for Windows, by typing:

    gfortran -shared -o Kahn-Rich.dll Kahn-Rich.for

    or writing a simple routine in MatLab.

    The program is called from the MS-DOS prompt typing:

    DMM Kahn-Rich.nml

    Rossi SSMS 65 / 73

  • The Kahn and Rich (JME, 2007) model

    G. Fiorentini, C. Planas and A. Rossi - January 2013

    lp = p1 + λ1*c + z1
    wh = p2 + λ2*c + z2
    ch = p3 + λ3*c + z3
    pj = pj(-1) + mj + ap
    mj = [S + (1-S)*δj]*μj,  δj < 1
    c  = 2*A*cos(2π/τ)*c(-1) - A^2*c(-2) + ac
    zj = φj*zj(-1) + azj,  j = 1, 2, 3

    State-space format:
    y(t) = c(t)z(t) + H(t)x(t) + G(t)u(t)
    x(t) = a(t) + F(t)x(t-1) + R(t)u(t)

    Namelist ssm contains:
    nx   = number of continuous states
    nu   = number of shocks
    d(1) = order of integration of the system
    d(2) = number of non-stationary continuous state variables
    nv   = number of discrete S variables

  • Namelist prior describes the prior pdfs of the model parameters:

    nt       = number of theta parameters
    pdftheta = prior distribution (NT = truncated normal; BE = beta; IG = inverse gamma)
    hyptheta = hyperparameters of the prior pdf (mean or hyp, sd or hyp, lower bound, upper bound)

    Note: if hyptheta(3,j) = hyptheta(4,j) the parameter is not estimated and its value is fixed at lb = ub

    τ, A, Vc      = period, amplitude, variance of the cycle   theta(1:3)
    λ1, λ2, λ3    = common cycle loading coefficients          theta(4:6)
    μ1, δ1        = drift p1                                   theta(7:8)
    μ2, δ2        = drift p2                                   theta(9:10)
    μ3, δ3        = drift p3                                   theta(11:12)
    Vp            = common trend variance                      theta(13)
    φ1, φ2, φ3    = idiosyncratic term AR(1) coefficients      theta(14:16)
    Vz1, Vz2, Vz3 = idiosyncratic term variances               theta(17:19)

    &prior
    nt = 19
    pdftheta(1)  = BE  hyptheta(1,1)  = 2.58 14.68 2 100
    pdftheta(2)  = BE  hyptheta(1,2)  = 5 5 .0001 .9999
    pdftheta(3)  = IG  hyptheta(1,3)  = .0004 6 0 1
    pdftheta(4)  = NT  hyptheta(1,4)  = 1 1 1 1
    pdftheta(5)  = NT  hyptheta(1,5)  = .3 .25 0 2
    pdftheta(6)  = NT  hyptheta(1,6)  = .6 .25 0 2
    pdftheta(7)  = NT  hyptheta(1,7)  = .008 .000001 0 .012
    pdftheta(8)  = NT  hyptheta(1,8)  = .4 .16 .001 .95
    pdftheta(9)  = NT  hyptheta(1,9)  = .008 .000001 0 .012
    pdftheta(10) = NT  hyptheta(1,10) = .4 .16 .001 .95
    pdftheta(11) = NT  hyptheta(1,11) = .008 .000001 0 .012
    pdftheta(12) = NT  hyptheta(1,12) = .4 .16 .001 .95
    pdftheta(13) = IG  hyptheta(1,13) = .0000012 6 0 1
    pdftheta(14) = NT  hyptheta(1,14) = 0 0 0 0
    pdftheta(15) = NT  hyptheta(1,15) = .8 .04 0 .98
    pdftheta(16) = NT  hyptheta(1,16) = .8 .04 0 .98
    pdftheta(17) = IG  hyptheta(1,17) = .0002 6 0 .01
    pdftheta(18) = IG  hyptheta(1,18) = .0002 6 0 .01
    pdftheta(19) = IG  hyptheta(1,19) = .0002 6 0 .01
    &end

  • Namelist mcmc contains the Markov chain Monte Carlo options:

    seed     = seed of the random number generator (0-999)
    thin     = thinning
    burnin   = burn-in period
    simulrec = number of recorded samples
    hbl      = block length for the discrete latent variable (1: GCK, >1: AMH)

    &mcmc seed=0 thin=1 burnin=100 simulrec=5000 hbl=1 &end

    Namelist dataset provides the data:

    T       = number of observations
    ny      = number of endogenous series
    nz      = number of exogenous series
    nf      = number of forecasts
    datasym = simulate the data {Y,N}
    obs     = a matrix of dimension nobs x ny if nz = 0 and (nobs+nf) x (ny+nz) if nz > 0

    Note: -99999 can be used to assign missing values to the endogenous variables

    &dataset T=244 ny=3 nz=0 nf=0
    obs = 1.00000000000000 1.00000000000000 1.00000000000000
          1.01591026190000 1.00200995470000 1.00958835440000
          . . .
          2.36156892870000 2.43176523280000 2.45613035130000
    &end

  • The user-supplied design routine (kahn-rich.for)

          SUBROUTINE DESIGN(ny,nz,nx,nu,ns,nt,theta,c,H,G,a,F,R)
    !DEC$ ATTRIBUTES DLLEXPORT, ALIAS:'design_' :: DESIGN
    C INPUT
          INTEGER ny,nz,nx,nu,ns(6),nt
          DOUBLE PRECISION theta(nt)
    C OUTPUT
          DOUBLE PRECISION c(ny,max(1,nz),ns(1)),H(ny,nx,ns(2)),
         *                 G(ny,nu,ns(3)),a(nx,ns(4)),F(nx,nx,ns(5)),
         *                 R(nx,nu,ns(6))
    C LOCALS
          INTEGER I
          DOUBLE PRECISION PI
          DATA PI/3.141592653589793D0/
    C c(t) (ny x max(1,nz) x ns1)
          c(:,:,:) = 0.D0
    C H(t) (ny x nx x ns2)
          H(:,:,:) = 0.D0
          DO I = 1,ny
             H(I,I,1)      = 1.D0        ! trend
             H(I,ny+1,1)   = theta(3+I)  ! cycle loading
             H(I,ny+2+I,1) = 1.D0        ! idiosyncratic term
          ENDDO
    C G(t) (ny x nu x ns3)
          G(:,:,:) = 0.D0
    C a(t) (nx x ns4)
          a(:,:) = 0.D0
          DO I = 1,ny
             a(I,1) = theta(7+(I-1)*2)*theta(8+(I-1)*2)  ! drift, regime 1
             a(I,2) = theta(7+(I-1)*2)                   ! drift, regime 2
          ENDDO
    C F(t) (nx x nx x ns5)
          F(:,:,:) = 0.D0
          DO I = 1,ny
             F(I,I,1) = 1.D0
          ENDDO
          F(ny+1,ny+1,1) = 2.D0*theta(2)*DCOS(2.D0*PI/theta(1))
          F(ny+1,ny+2,1) = -theta(2)**2
          F(ny+2,ny+1,1) = 1.D0
          DO I = 1,ny
             F(ny+2+I,ny+2+I,1) = theta(13+I)
          ENDDO
    C R(t) (nx x nu x ns6)
          R(:,:,:) = 0.D0
          DO I = 1,ny
             R(I,1,1) = DSQRT(theta(13))           ! trend variance
          ENDDO
          R(ny+1,2,1) = DSQRT(theta(3))            ! cycle variance
          DO I = 1,ny
             R(ny+2+I,2+I,1) = DSQRT(theta(16+I))  ! idiosyncratic variance
          ENDDO
          RETURN
          END

    To create kahn-rich.dll from kahn-rich.for, use can be made of the GNU Fortran compiler for Windows (gcc.gnu.org/wiki/GFortran). At the MS-DOS command prompt type:

    gfortran -shared -o kahn-rich.dll kahn-rich.for

  • Running DMM: dynare version (to be finalized)

    // Variables and processes declaration

    var y mu e;

    varobs y;

    varexo ee emu;

    parameters Ve, Vmu, delta, S1, S2;

    // Write the model as usual in dynare

    model;

    y = mu + e;

    e = ((S1 - 1)*sqrt(delta) + (2 - S1))*sqrt(Ve)*ee;

    mu = mu(-1) + (S2-1)*sqrt(Vmu)*emu;

    end;

    // MCMC settings

    dmm(drop=1000,seed=0,thinning=1,replic=10000,

    maxorderintegration=1,nonstationary=1,forecasts=10);

    Rossi SSMS 66 / 73

  • Running DMM: dynare version (to be finalized)

    // Specify the latent processes S1 and S2

    multinomial( numberofregimes=2, probability = [P1]);

    S1.calibration(regime=1) = 1;

    S1.calibration(regime=2) = 2;

    multinomial( numberofregimes=2, probability = [P2]);

    S2.calibration(regime=1) = 1;

    S2.calibration(regime=2) = 2;

    // Setting priors

    P1.prior(shape=dirichlet,params=[1 1; 1 1]);

    P2.prior(shape=dirichlet,params=[1 1; 1 1]);

    Ve.prior(shape=invgamma,mean=6e4,stdev=6,interval=[0,5e4]);

    Vmu.prior(shape=invgamma,mean=6e4,stdev=6,interval=[0,5e4]);

    delta.prior(shape=beta,mean=2,stdev=4,interval=[1,20]);

    Rossi SSMS 67 / 73

  • Detecting changes in US trend productivity

    Kahn and Rich (JME, 2007) use neoclassical growth theory to detect changes in US trend productivity

    Labour productivity, measured as real GDP over total hours (LP); hourly wages (WH); real consumption per hour worked (CH)

    [Figure: the LP, WH, and CH series plotted quarterly over 1948Q4-2008Q4.]

    Rossi SSMS 68 / 73

  • Detecting changes in US trend productivity

    The Kahn-Rich model (slightly revised):

    $LP_t = p_{1t} + \lambda_1 \psi_t + z_{1t}$

    $WH_t = p_{2t} + \lambda_2 \psi_t + z_{2t}$

    $CH_t = p_{3t} + \lambda_3 \psi_t + z_{3t}$

    $\psi_t = 2A\cos(2\pi/\tau)\, \psi_{t-1} - A^2 \psi_{t-2} + a_{\psi t}, \quad a_{\psi t} \sim N(0, V_\psi)$

    $p_{\ell t} = \mu_\ell(S_t) + p_{\ell t-1} + a_{pt}, \quad a_{pt} \sim N(0, V_p)$

    $\mu_\ell(S_t) = \mu_\ell[(1 - S_t) + \delta_\ell S_t]$

    $z_{\ell t} = \phi_\ell z_{\ell t-1} + a_{z\ell t}, \quad a_{z\ell t} \sim N(0, V_{z\ell}), \quad \ell = 1, 2, 3$

    where $S_t \in \{0, 1\}$, with $\Pr(S_{t+1} = i \,|\, S_t = i) = \pi_{ii}$, $i = 0, 1$

    Rossi SSMS 69 / 73

  • Detecting changes in US trend productivity: DMM results

    [Figure: posterior distributions of the cycle period τ and amplitude A; of the drifts µ1, µ2, µ3 and the δ1, δ2, δ3 for LP, WH, CH; and of the transition probabilities Pr(St = 0 | St−1 = 0) and Pr(St = 1 | St−1 = 1).]

    Rossi SSMS 70 / 73

  • Detecting changes in US trend productivity: DMM results

    Posterior probability of high productivity growth Pr(St = 1|yT )

    [Figure: Pr(St = 1 | yT) plotted quarterly over 1948Q4-2008Q4.]

    Rossi SSMS 71 / 73

  • Concluding remarks

    DMM is a program for the analysis of dynamic mixture models:

    handles multivariate series that may be non-stationary, with missing observations, and linked to some exogenous variables

    implements up-to-date techniques for sampling the discrete latent variable in O(T) operations, for exact initialization of the Kalman recursions, for drawing model parameters efficiently, and for computing the marginal likelihood

    prior distributions do not need to be conjugate

    complete freedom in model parameterization

    benefits from the computational speed advantage of low-level languages, which is particularly relevant when MCMC algorithms are employed

    Rossi SSMS 72 / 73

  • Concluding remarks

    The stand-alone version of DMM can be freely downloaded at

    http://ipsc.jrc.ec.europa.eu/fileadmin/repository/sfa/finepro/software/DMM.zip

    The dynare version of DMM will soon be available at

    http://www.dynare.org/

    Rossi SSMS 73 / 73