
  • State space models with switching and program DMM

    Alessandro Rossi and Christophe Planas, Joint Research Centre of the European Commission

    Identification and global sensitivity analysis for macroeconomic models, 22-24 April 2015, Milano

    Rossi SSMS 1 / 73

  • Motivation

    Thanks to their flexibility in handling nonlinearities, structural changes, and outliers, State Space Models with Switching (SSMS) have enjoyed some success in the econometric literature (see e.g. Giordani, Kohn, and van Dijk, JoE 2007)

    Some macroeconomic models featuring rational expectations incorporate features that make their reduced form representable in SSMS format (e.g. Farmer, Waggoner, and Zha, JEDC 2011)

    Rossi SSMS 2 / 73

  • Outline

    The class of state space models with switching (SSMS)

    Some well-known models admitting an SSMS representation

    Frequentist and Bayesian inference of SSMS

    Program DMM for the analysis of SSMS

    A test case: the Kahn and Rich approach (JME, 2007) for detecting changes in US trend productivity

    Rossi SSMS 3 / 73

  • State Space Models with Switching

    The SSMS class encompasses models that admit representation:

    $y_t = c_t z_t + H_t x_t + G_t u_t$

    $x_t = a_t + F_t x_{t-1} + R_t u_t$

    where $y_t$ is the $n_y \times 1$ vector of endogenous variables, $z_t$ is the $n_z \times 1$ vector of exogenous series, $x_t$ is the $n_x \times 1$ state vector, $u_t$ is the $n_u \times 1$ vector of shocks, and $c_t$, $H_t$, $G_t$, $a_t$, $F_t$, and $R_t$ are determined by a parameter $\theta$ and a discrete latent variable $S_t = (S_{1t}, \cdots, S_{\ell t}, \cdots)$.

    Rossi SSMS 4 / 73

  • SSMS: general assumptions

    Shocks are Gaussian: $u_t \overset{iid}{\sim} N(0, I)$

    This is not restrictive: fat tails or other features can be modelled via a proper treatment of the matrices $G_t$ and $R_t$

    Each variable $S_{\ell t}$ takes $n_{s\ell}$ values following an independent Markov process with transition probabilities $\pi_{\ell ij} \equiv \Pr(S_{\ell t} = i \,|\, S_{\ell t-1} = j)$, $i, j = 1, \cdots, n_{s\ell}$, collected into the vector $\pi_\ell$

    Notice that $S_{\ell t}$ can also be independent, as a special case of Markov, i.e. $\Pr(S_{\ell t} = i \,|\, S_{\ell t-1} = j) = \Pr(S_{\ell t} = i)$

    Rossi SSMS 5 / 73

  • SSMS: general assumptions

    The system matrices $c_t$, $H_t$, $G_t$, $a_t$, $F_t$, and $R_t$ depend only on contemporaneous values of $S_t$

    Cases where some system matrices depend on lagged values of $S_{\ell t}$ can be handled by reparameterising

    Given $S_t$, the matrices $c_t$, $H_t$, $G_t$, $a_t$, $F_t$, and $R_t$ do not depend on the transition probabilities $\pi = (\pi_1, \cdots, \pi_\ell, \cdots)$

    This condition makes inference on the unobservables and the model parameters feasible

    The eigenvalues of the transition matrix $F_t$ are less than or equal to one in modulus

    A weaker condition, as detailed in Francq and Zakoian (2001, 2002), is also possible

    Rossi SSMS 6 / 73

  • Examples of models admitting an SSMS representation

    Example 1: Time Varying Parameters autoregressive model

    Example 2: Markov switching model for US real GDP

    Example 3: Markov switching variance for the USD/GBP real exchange rate

    Example 4: Change-point volatility model for the Fama-French market factor

    Example 5: Structural Vector Autoregressions for monetary policy

    Rossi SSMS 7 / 73

  • Example 1: Time varying parameters autoregressive model

    Giordani and Kohn (2010), to model the persistence of US inflation:

    $\pi_t = \psi_t + \rho_t \pi_{t-1} + V_\pi^{1/2} a_{\pi t}$

    $\psi_t = \psi_{t-1} + V_\psi^{1/2} a_{\psi t}$

    $\rho_t = \rho_{t-1} + V_\rho^{1/2} a_{\rho t}$

    SSMS representation: $x_t = (\psi_t, \rho_t)'$, $u_t = (a_{\pi t}, a_{\psi t}, a_{\rho t})'$, $c_t = a_t = 0$, $H_t = (1, \pi_{t-1})$,

    $F_t = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $R_t = \begin{bmatrix} 0 & V_\psi^{1/2} & 0 \\ 0 & 0 & V_\rho^{1/2} \end{bmatrix}$, $G_t = \begin{bmatrix} V_\pi^{1/2} & 0 & 0 \end{bmatrix}$

    Rossi SSMS 8 / 73

  • Example 2: Markov switching growth model

    Similar to Hamilton (1989) for US GNP:

    $y_t = p_t + \psi_t$

    $p_t = p_{t-1} + m(S_t) + V_p^{1/2} a_{pt}$

    $m(S_t) = \mu[S_t + (1 - S_t)\delta]$

    $\psi_t = \phi \psi_{t-1} + V_\psi^{1/2} a_{\psi t}$

    $S_t \in \{0, 1\}$ with $\pi_{ii} = \Pr(S_t = i \,|\, S_{t-1} = i)$, $i = 0, 1$.

    SSMS representation: $x_t = (p_t, \psi_t)'$, $u_t = (a_{pt}, a_{\psi t})'$, $c_t = G_t = 0$, $H_t = (1, 1)$,

    $a_t = \begin{bmatrix} m(S_t) \\ 0 \end{bmatrix}$, $F_t = \begin{bmatrix} 1 & 0 \\ 0 & \phi \end{bmatrix}$ and $R_t = \begin{bmatrix} V_p^{1/2} & 0 \\ 0 & V_\psi^{1/2} \end{bmatrix}$

    Rossi SSMS 9 / 73
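
  • Aside: simulating the Markov switching growth model (added sketch)

    As a minimal illustration, not part of the original slides, the model above can be simulated directly. The parameter values below are illustrative assumptions, not estimates.

    ! Minimal sketch: simulate the Markov-switching growth model above.
    ! Parameter values are illustrative assumptions, not estimates.
    program ms_growth_sim
      implicit none
      integer, parameter :: n = 200
      double precision, parameter :: p00 = 0.90d0, p11 = 0.95d0
      double precision, parameter :: mu = 1.0d0, delta = 0.2d0, phi = 0.5d0
      double precision, parameter :: Vp = 0.5d0, Vpsi = 0.8d0
      double precision :: p, psi, y(n), u
      integer :: S, t
      p = 0.d0; psi = 0.d0; S = 1
      do t = 1, n
         call random_number(u)               ! draw S_t from the 2-state chain
         if (S == 0) then
            if (u > p00) S = 1
         else
            if (u > p11) S = 0
         end if
         ! p_t = p_{t-1} + m(S_t) + Vp^{1/2} a_pt, m(S_t) = mu*[S_t + (1-S_t)*delta]
         p = p + mu*(S + (1 - S)*delta) + sqrt(Vp)*randn()
         psi = phi*psi + sqrt(Vpsi)*randn()  ! psi_t = phi*psi_{t-1} + Vpsi^{1/2} a_psit
         y(t) = p + psi
      end do
      print *, 'last observation:', y(n)
    contains
      double precision function randn()      ! Box-Muller standard normal
        double precision :: u1, u2
        call random_number(u1); call random_number(u2)
        randn = sqrt(-2.d0*log(1.d0 - u1))*cos(8.d0*atan(1.d0)*u2)
      end function randn
    end program ms_growth_sim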

  • Example 3: Markov Switching variance

    Engle and Kim (1999) for the USD/GBP real exchange rate:

    $y_t = p_t + \psi_t$

    $p_t = p_{t-1} + V_p^{1/2} a_{pt}$

    $\psi_t = \phi \psi_{t-1} + V_\psi^{1/2} [S_t + (1 - S_t)\, \alpha_\psi^{1/2}]\, a_{\psi t}$

    $S_t$ is a 2-state Markov chain with $\pi_{ii} = \Pr(S_t = i \,|\, S_{t-1} = i)$ for $i = 0, 1$.

    SSMS representation: $x_t = (p_t, \psi_t)'$, $u_t = (a_{pt}, a_{\psi t})'$, $c_t = G_t = a_t = 0$, $H_t = (1, 1)$,

    $F_t = \begin{bmatrix} 1 & 0 \\ 0 & \phi \end{bmatrix}$ and $R_t = \begin{bmatrix} V_p^{1/2} & 0 \\ 0 & V_\psi^{1/2} [S_t + (1 - S_t)\, \alpha_\psi^{1/2}] \end{bmatrix}$

    Rossi SSMS 10 / 73

  • Example 4: Change-point volatility model

    Kim, Shephard and Chib (1998), to describe stochastic volatility:

    $\sigma_t = \exp(\lambda_t / 2)\, a_{\sigma t}$

    $\lambda_t = \lambda_{t-1} + K_{1t} V_\lambda^{1/2} a_{\lambda t}$

    Taking the square and log-linearizing:

    $\log(\sigma_t^2) = \lambda_t + \mu(K_{2t}) + \gamma(K_{2t})\, a_{zt}$

    $\lambda_t = \lambda_{t-1} + K_{1t} V_\lambda^{1/2} a_{\lambda t}$

    $K_{1t} \sim$ iid Bernoulli($w$); $K_{2t} \sim$ iid Multinomial($\pi$) with 10 states (Omori et al. 2007); $\mu(K_{2t}) + \gamma(K_{2t})\, a_{zt}$ is a mixture of normals that approximates $\log a_{\sigma t}^2$

    Rossi SSMS 11 / 73

  • Example 5: Structural VAR with Switching

    Primiceri (2005), to describe US monetary policy and the private sector:

    $y_t = \psi_t + \sum_{j=1}^{p} B_{jt}\, y_{t-j} + u_t$

    $y_t$ is a vector of endogenous variables

    $\psi_t$ is a vector of time-varying coefficients; the $B_{jt}$ are matrices of time-varying coefficients

    $u_t$ are heteroscedastic shocks with covariance matrix $\Omega_t$.

    Primiceri assumes $B_{jt} = B_{jt-1} + \nu_t$

    If smooth transition is imposed: $B_{jt} = B_{1jt}$ if $S_t = 1$; $B_{2jt}$ if $S_t = 2$; $\cdots$; $B_{kjt}$ if $S_t = k$, where $S_t$ is a $k$-state Markov chain.

    Rossi SSMS 12 / 73

  • Likelihood inference: the Kim’s algorithm

    SSMS typically involve three sources of randomness: a vector of parameters $(\theta, \pi)$, an unobserved continuous state $x \equiv (x_1, \cdots, x_T)$, and an $N$-state latent discrete process $S \equiv (S_1, \cdots, S_T)$.

    The likelihood $L(\theta, \pi) = f(y|\theta, \pi)$ cannot be computed exactly. In fact the augmented likelihood $f(y|\theta, \pi, S)$ is known, but marginalizing $S$ out of the augmented likelihood is not feasible even for small time series lengths, unless $x_t$ is absent

    The maximum likelihood estimate $(\theta_{ML}, \pi_{ML})$ and the smoothed quantities $E(x_t|y, \theta_{ML})$, $V(x_t|y, \theta_{ML})$, and $\Pr(S_t|y, \theta_{ML}, \pi_{ML})$, $t = 1, 2, \cdots, T$, can be computed via the approximate filter proposed by Kim (1994)

    Rossi SSMS 13 / 73

  • Bayesian inference

    Posterior distributions are obtained via MCMC sampling from

    f(θ, π,x,S | y)

    A thorough scrutiny of the posterior output involves looking at

    $\dim(\theta, \pi) + \dim(x_t) \times T + \dim(S_t) \times T$

    quantities, a possibly very large dimension

    In view of the above, it is important to use samplers that, a priori, are believed to be efficient.

    Rossi SSMS 14 / 73

  • Posterior simulation

    Plain Gibbs sampling:

    $f(\theta, \pi \,|\, x, S, y)$, $f(x \,|\, \theta, S, y)$, $f(S \,|\, \theta, \pi, x, y)$

    $f(\theta, \pi \,|\, x, S, y)$: usually simple to draw from, but model dependent

    $f(x \,|\, \theta, S, y)$: simulation smoother, e.g. Durbin and Koopman (2002)

    $f(S \,|\, \theta, \pi, x, y)$: e.g. the Kim and Nelson (1999) simulation smoother

    Issue: slowly mixing chains when $\theta$ and $x$ are strongly dependent given $S$, or $x$ and $S$ are strongly dependent given $\theta$.

    Rossi SSMS 15 / 73

  • Posterior simulation: sampling x off-line

    Assume that

    All parameters in $\theta$ are independent, and $\theta$ is independent of $\pi$, i.e.

    $p(\theta, \pi) = p(\pi) \prod_j p(\theta_j)$

    Dirichlet distributions are imposed for the transition probabilities $\pi$.

    Then we may resort to the more efficient Gibbs scheme:

    $f(\theta \,|\, S, \pi, y)$, $\Pr(S \,|\, \theta, \pi, y)$, $f(\pi \,|\, \theta, S, y)$

    Rossi SSMS 16 / 73

  • Posterior simulation: sampling x off-line

    where

    $\theta \sim f(\theta \,|\, S, \pi, y)$: one-at-a-time by the slice sampler of Neal (2003)

    $S \sim \Pr(S \,|\, \theta, \pi, y)$: one-at-a-time by Gerlach, Carter and Kohn (2000) or block-sampling by Fiorentini, Planas and Rossi (2013)

    $\pi \sim f(\pi \,|\, \theta, S, y)$: e.g. Frühwirth-Schnatter (2006)

    $x \sim f(x \,|\, \theta, S, y)$: off-line by the simulation smoother of Durbin and Koopman (2002)

    Rossi SSMS 17 / 73

  • Posterior simulation: sampling θ

    The hypotheses made imply

    $f(\theta|S, \pi, y) \propto f(y|S, \theta)\, p(\theta)$

    $f(y|S, \theta)$ is computed by the Kalman filter; diffuse initial conditions are handled as in Koopman (1997) for non-stationary state variables

    In principle $\theta$ could be sampled jointly using a Metropolis-Hastings algorithm. In practice the choice of the proposal density is difficult since the conditioning set contains $S$

    Then we resort to the Gibbs strategy:

    $f(\theta_i | \theta_1, \cdots, \theta_{i-1}, \theta_{i+1}, \cdots, \theta_m, S, y) \propto f(y|S, \theta)\, p(\theta_i)$

    Neal's (2003) slice sampler is used for each full conditional.

    Rossi SSMS 18 / 73

  • Slice sampling

    To get draws, say $(\theta_1, \cdots, \theta_G)$, from $f(\theta)/k$, $k = \int f(\theta) d\theta$:

    Introduce an auxiliary variable $\gamma$ and construct $p(\theta, \gamma)$ leaving the marginal $p(\theta)$ unchanged. In particular choose $\gamma|\theta \sim U(0, f(\theta))$; then

    $p(\theta, \gamma) = p(\gamma|\theta)\, p(\theta) = \frac{1}{f(\theta)}\, I_{[0 < \gamma < f(\theta)]} \times \frac{f(\theta)}{k} = \frac{1}{k}\, I_{[0 < \gamma < f(\theta)]}$

    so $(\theta, \gamma)$ is uniform over the region under the graph of $f$, and the marginal of $\theta$ is $f(\theta)/k$ as required.

    Rossi SSMS 19 / 73

  • Slice sampling

    Sampling from the joint p(θ, γ) is not possible but ...

    draws from p(θ) can be obtained by iterating Gibbs updates on γ|θ and θ|γ:

    sample γ given θ from a uniform pdf over the set (0, f(θ))

    sample θ given γ from a uniform pdf over S = {θ : γ < f(θ)}

    Rossi SSMS 20 / 73

  • Slice sampling in practice

    Sampling $\theta$ from a uniform over $S = \{\theta : \gamma < f(\theta)\}$ is difficult to achieve exactly. In practice:

    Position an interval $I = (L, R)$ around $\theta_0$ at random, so that it contains as much of $S$ as possible;

    Draw $\theta$ from the set $A = \{\theta : \theta \in S \cap I$ and $\Pr(I|\theta) = \Pr(I|\theta_0)\}$

    Neal (2003) proposes some strategies:

    (i) stepping out;

    (ii) doubling;

    (iii) random positioning, etc.

    Rossi SSMS 21 / 73

  • Stepping out

    Rossi SSMS 22 / 73

  • Stepping out

    Given $\theta_o$, $\gamma \sim U(0, f(\theta_o))$, and the slice $S = \{\theta : \gamma < f(\theta)\}$, a new value $\theta_n$ is obtained as follows:

    Position $I = (L, R)$ around $\theta_o$ at random: $L = \theta_o - uW$ and $R = L + W$, with $u \sim U(0,1)$

    Expand $I$, setting $L = L - W$ and $R = R + W$, until $\gamma \geq f(L)$ and $\gamma \geq f(R)$

    Shrinking: set $\theta_n = L + u(R - L)$, and

    set $L = \theta_n$ if $\theta_n < \theta_o$, $R = \theta_n$ otherwise;

    repeat until $\gamma < f(\theta_n)$, then accept $\theta_n$.

    Rossi SSMS 23 / 73
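
  • Aside: a stepping-out slice sampler in code (added sketch)

    The procedure above maps directly into code. The following is a minimal sketch, not part of the original slides: the log-density logf (a standard normal) is an illustrative stand-in for a full conditional, and the level γ is handled on the log scale to avoid underflow.

    ! Minimal sketch of Neal's (2003) slice sampler with stepping out
    ! and shrinkage, for an illustrative univariate log-density.
    program slice_demo
      implicit none
      integer, parameter :: G = 5000
      double precision, parameter :: W = 1.d0   ! scaling parameter W
      double precision :: theta, gam, L, R, u, prop
      integer :: i
      theta = 0.d0
      do i = 1, G
         ! level: gamma | theta ~ U(0, f(theta)), stored as log(gamma)
         call random_number(u)
         gam = logf(theta) + log(1.d0 - u)
         ! stepping out: position I = (L,R) around theta, then expand
         call random_number(u)
         L = theta - u*W
         R = L + W
         do while (logf(L) > gam)
            L = L - W
         end do
         do while (logf(R) > gam)
            R = R + W
         end do
         ! shrinkage: draw uniformly on (L,R), shrink towards theta on rejection
         do
            call random_number(u)
            prop = L + u*(R - L)
            if (logf(prop) > gam) exit
            if (prop < theta) then
               L = prop
            else
               R = prop
            end if
         end do
         theta = prop
      end do
      print *, 'final draw:', theta
    contains
      double precision function logf(x)        ! stand-in log target
        double precision, intent(in) :: x
        logf = -0.5d0*x*x
      end function logf
    end program slice_demo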

  • Doubling

    Rossi SSMS 24 / 73

  • Doubling

    Doubling can expand the interval faster than stepping out when $W$ turns out to be too small.

    Position $I = (L, R)$ around $\theta_o$ at random: $L = \theta_o - uW$ and $R = L + W$

    Expand $I$, setting $L = L - (R - L)$ if $u < 1/2$, and $R = R + (R - L)$ otherwise;

    Repeat while $\gamma < f(L)$ or $\gamma < f(R)$

    Shrinking: drawing $\theta_n$ is a bit more complex than in the stepping-out procedure.

    Rossi SSMS 25 / 73

  • The performance of the slice sampler

    We first use a battery of univariate pdfs to select:

    1 the magnitude of the scaling parameter $W$

    2 the procedure that best approximates the true slice: stepping out, doubling, or random positioning

    Using 1000 replications of sample size $G = 5000$, we measure:

    NSE: $Var(\frac{1}{G}\sum_{i=1}^{G} \theta_i)^{1/2}$ (small is better)

    $\rho_1$: the 1st-order autocorrelation of the chain $\theta_1, \cdots, \theta_G$ (close to 0)

    IF $= 1 + 2\sum_{j=1}^{p} \omega_j \rho_j$, with $\omega_j$ the Parzen weights (close to 1)

    the (average) number of calls to $f(\theta)$ (small)

    the number of rejections of the Cramer-von Mises (CVM) test (at the 5% level)

    Rossi SSMS 26 / 73

  • Univariate test cases - Marron and Wand (1992)

    [Figure: the twelve Marron and Wand (1992) test densities: Skewed, Strongly skewed, Kurtotic, Outlier, Bimodal, Separated bimodal, Skewed bimodal, Trimodal, Claw, Double claw, Asymmetric claw, Smooth comb.]

    Rossi SSMS 27 / 73

  • Univariate test cases - summary results

                RW-MH   Slice: stepping out               Slice: doubling
    W                   σ/2     3σ      10σ     100σ      σ/2     3σ      10σ     100σ
    NSE         3.77    2.75    1.92    1.83    1.79      2.30    2.02    1.85    1.80
    ρ1          0.71    0.23    0.15    0.12    0.11      0.24    0.18    0.14    0.12
    IF          6.38    3.97    1.51    1.38    1.34      2.24    1.68    1.42    1.35
    N eval      2       9.42    6.09    6.61    9.72      23.31   14.65   9.47    10.00
    CVM         0.59    0.22    0.13    0.11    0.11      0.20    0.15    0.12    0.12
    RE          1       2.05    0.71    0.71    1.02      3.72    1.86    1.06    1.06

    Rossi SSMS 28 / 73

  • Posterior simulation: sampling S

    Gerlach, Carter and Kohn (2000) developed the gold-standard algorithm, which draws $S$ from $f(S|\theta, y)$ in $O(T)$ operations:

    $\Pr(S_t | S_{\setminus t}, \theta, \pi, y) \propto f(y_{t+1}^{T} \,|\, y^{t}, S, \theta)\, f(y_t \,|\, y^{t-1}, S^{t}, \theta)\, \Pr(S_t \,|\, S_{\setminus t}, \pi)$

    - Lemma 4 in GCK yields $f(y_{t+1}^{T} | y^{t}, S, \theta)$ in one step after an initial set of backward iterations.

    - The kernel must be evaluated at each one of the $N$ possible states of $S_t$.

    Remark: simulation of $h$ variables $S_t, \cdots, S_{t+h-1}$ requires $h \times N$ kernel evaluations.

    Rossi SSMS 29 / 73

  • Posterior simulation: multi-move samplers

    The GCK algorithm samples $f(S_t | S_{\setminus t}, \theta, y)$

    It is a single-move algorithm, so it may be inefficient when the $S_t$ variables are strongly conditionally dependent. In such a case the mixing properties of the algorithm can be improved by simulating $S_t, \cdots, S_{t+h-1}$ jointly from $f(S_t, \cdots, S_{t+h-1} | S_1, \cdots, S_{t-1}, S_{t+h}, \cdots, S_T, \theta, y)$.

    Fiorentini, Planas and Rossi (2013) propose blocking extensions to the GCK single-move sampler:

    1 Multi-move Gibbs sampler

    2 Multi-move adaptive Metropolis-Hastings sampler

    Rossi SSMS 30 / 73

  • Multi-move Gibbs

    To present the idea we first consider a double-move sampler.

    $f(S_t, S_{t+1} | S_{\setminus(t,t+1)}, \theta, y) \propto f(y_{t+2}^{T} \,|\, y^{t+1}, S, \theta)\, f(y_{t+1} \,|\, y^{t}, S^{t+1}, \theta)\, f(y_t \,|\, y^{t-1}, S^{t}, \theta)\, f(S_t, S_{t+1} \,|\, S_{\setminus(t,t+1)}, \theta)$

    We evaluate $f(y_{t+2}^{T} \,|\, y^{t+1}, S, \theta)$ in one step (GCK's Lemma 4)

    Simulation of blocks of length 2 requires $N^2$ kernel evaluations, as opposed to $2 \times N$ for the single-move sampler.

    For blocks of length $h$ (see the paper) the relation is $N^h$ kernel evaluations as opposed to $h \times N$

    Rossi SSMS 31 / 73

  • Posterior simulation: multi-move Gibbs

    For example when N = 2 (one binary mixing variable)

    Number of kernel evaluations:

    Block-length   Multi-move   Single-move
    2              4            4
    3              8            6
    4              16           8
    5              32           10

    Block sampling via full conditionals is not feasible when N takes largevalues.

    For these cases we propose an adaptive MH strategy.

    Rossi SSMS 32 / 73

  • Posterior simulation: multi-move adaptive MH

    To be viable an MH block sampling scheme for St, · · · , St+h−1 needs aproposal density such that:

    1 acceptance rate remains appreciable when the block length increases

    2 can be updated easily

    3 acceptance probability evaluation is simple

    4 simulation is fast and easy

    We consider the mixture (similar to Giordani and Kohn, 2010):

    $\tilde{q}(s_{t,t+h-1}) = \delta q_0(s_{t,t+h-1}) + (1 - \delta)\, q(s_{t,t+h-1}), \quad 0 < \delta < 1$

    Rossi SSMS 33 / 73

  • Posterior simulation: multi-move adaptive MH

    At iteration $n+1$, the multi-move MH sampler proposes a candidate $s_t^\star, \cdots, s_{t+h-1}^\star$ with acceptance probability:

    $\alpha = \min\left\{1,\; \frac{f(s_{t,t+h-1}^\star \,|\, s_{1,t-1}^{(n+1)}, s_{t+h,T}^{(n)}, y, \theta)}{f(s_{t,t+h-1}^{(n)} \,|\, s_{1,t-1}^{(n+1)}, s_{t+h,T}^{(n)}, y, \theta)} \; \frac{\tilde{q}(s_{t,t+h-1}^{(n)} \,|\, s_{t-1}^{(n+1)}, s_{t+h}^{(n)})}{\tilde{q}(s_{t,t+h-1}^\star \,|\, s_{t-1}^{(n+1)}, s_{t+h}^{(n)})}\right\} \quad (1)$

    Whereas with Gibbs sampling ignorance of the normalizing constant forces evaluation of the kernel at all $N^h$ possible sequences, the computation of the MH acceptance probability needs only two kernel evaluations

    Adaptive MH is thus much faster than the single- and multi-move Gibbs samplers

    See the paper for full details (e.g. the formulae for the initial and ending blocks are slightly different from those for inner blocks)

    Rossi SSMS 34 / 73

  • Efficiency comparison

    We consider Inefficiency Factors (IF):

    $IF = 1 + 2 \sum_{k=1}^{M} w_k\, \rho(k)$

    but also Relative Inefficiency Factors (RIF):

    $RIF = \frac{Time_A}{Time_B} \times \frac{IF_A}{IF_B}$

    Example 1: Markov switching growth model for US real GDP.

    Example 2: Markov switching variance for USD/GBP real exchangerate.

    Example 3: Change-point volatility model for the Fama-Frenchmarket factor

    Rossi SSMS 35 / 73
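
  • Aside: computing an inefficiency factor (added sketch)

    As a minimal sketch of the IF formula above, not in the original slides: given a recorded chain, compute the autocorrelations up to a truncation lag M and combine them with Parzen weights. The uniform "chain" below is only a placeholder for an actual MCMC sample.

    ! Minimal sketch: IF = 1 + 2*sum_k w_k*rho(k) with Parzen weights.
    program ineff_factor
      implicit none
      integer, parameter :: G = 5000, M = 200  ! chain length, truncation lag
      double precision :: chain(G), mean0, c0, ck, rho, w, z, IFa
      integer :: k, t
      call random_number(chain)                ! placeholder "draws"
      mean0 = sum(chain)/G
      c0 = sum((chain - mean0)**2)/G           ! lag-0 autocovariance
      IFa = 1.d0
      do k = 1, M
         ck = 0.d0
         do t = 1, G - k
            ck = ck + (chain(t) - mean0)*(chain(t+k) - mean0)
         end do
         rho = (ck/G)/c0                       ! lag-k autocorrelation
         z = dble(k)/dble(M)                   ! Parzen lag window
         if (z <= 0.5d0) then
            w = 1.d0 - 6.d0*z*z + 6.d0*z**3
         else
            w = 2.d0*(1.d0 - z)**3
         end if
         IFa = IFa + 2.d0*w*rho
      end do
      print *, 'inefficiency factor:', IFa
    end program ineff_factor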

  • Example 1: Markov switching growth model

    $y_t = p_t + c_t$

    $p_t = p_{t-1} + m(S_t) + V_p^{1/2} a_{pt}$

    $m(S_t) = \mu [S_t + (1 - S_t)\, \delta], \quad \delta < 1$

    $\phi(L)\, c_t = V_c^{1/2} a_{ct}$

    $x_t = (p_t, c_t)'$

    The drift $m(S_t)$ takes two values, $\mu$ and $\mu\delta$

    $S_t$ is a 2-state Markov chain with dynamics $p_{ii} = \Pr(S_t = i \,|\, S_{t-1} = i)$, $i = 0, 1$.

    $\theta = (A, \tau, V_c, V_p, \mu, \delta, p_{00}, p_{11})$

    Rossi SSMS 36 / 73

  • US real GDP: Gibbs inefficiency factors

    h        1       2       3       4       6
    S44      33.87   30.41   18.38   28.02   17.75
    S45      31.75   28.57   16.82   26.32   16.21
    S110     23.07   16.70   12.84   12.49   11.98
    S113     28.61   12.45   9.56    11.23   9.19
    p00      7.79    5.68    4.81    5.23    4.74
    p11      19.80   18.68   15.93   17.24   15.81
    τ        7.29    6.13    5.18    5.65    4.95
    φ        15.39   12.92   9.14    11.70   9.23
    µ        39.47   33.76   27.75   30.81   26.75
    δ        55.60   51.63   47.78   48.77   47.61

    Notes: h is the block length; IF are computed using 1,000,000 draws; the four St variables shown are those for which GCK gives the largest IF.

    Rossi SSMS 37 / 73

  • US real GDP: samplers relative inefficiency

              Adaptive MH                                        MH
    h         1      2      3      4      6      8      10      RD     T
    S44       2.62   3.09   6.98   3.49   6.50   6.79   5.91    6.34   1.06
    S45       2.65   3.08   7.02   3.45   6.52   6.62   5.80    6.30   1.00
    S110      2.65   3.82   5.14   5.13   3.91   4.35   2.75    4.68   0.68
    S113      2.85   7.14   11.31  8.80   10.31  9.29   8.76    9.31   1.11
    p00       3.00   4.27   6.20   5.31   5.23   5.05   4.26    5.40   0.83
    p11       2.93   3.37   4.57   4.27   4.15   4.58   3.53    4.73   1.43
    τ         3.08   3.97   5.70   4.71   5.14   5.33   4.56    5.43   2.04
    φ         2.92   3.74   6.47   4.50   6.09   5.85   4.98    5.69   1.19
    µ         3.01   3.70   5.44   4.44   4.74   4.93   4.53    5.26   1.62
    δ         3.59   4.05   5.09   4.73   4.72   5.05   4.41    4.95   2.28
    Rel. time 0.22   0.22   0.20   0.20   0.20   0.20   0.22    0.20   0.18
    Acc. rate 0.99   0.97   0.94   0.93   0.89   0.86   0.82    0.88   0.06

    Rossi SSMS 38 / 73

  • Example 2: Markov switching variance

    $y_t = p_t + c_t$

    $\Delta p_t = V_p^{1/2} a_{pt}$

    $\phi(L)\, c_t = V_c^{1/2} [S_t + (1 - S_t)\, \alpha_c^{1/2}]\, a_{ct}$

    $S_t$ is a 2-state Markov chain with $p_{ii} = \Pr(S_t = i \,|\, S_{t-1} = i)$ for $i = 0, 1$

    Rossi SSMS 39 / 73

  • USD/GBP real exchange rate: Gibbs inefficiency factors

    h        1        2       3       4       6
    S167     86.89    57.73   41.08   33.80   27.04
    S168     89.43    59.07   43.87   36.07   29.73
    S169     90.05    59.91   44.39   36.54   29.76
    S170     85.56    57.02   41.36   33.52   26.58
    p00      115.65   88.89   77.43   68.56   69.95
    p10      113.32   86.74   76.34   67.63   67.88
    p01      78.18    57.62   52.19   46.05   41.69
    p11      68.92    54.06   48.55   43.49   39.05
    p02      8.92     7.86    7.34    7.01    6.81
    p12      8.57     7.29    6.99    6.64    6.79
    φ        26.94    21.93   20.79   21.24   19.88

    Notes: h is the block length; IF are computed using 1,000,000 draws; the four St variables shown are those for which GCK gives the largest IF.

    Rossi SSMS 40 / 73

  • USD/GBP real exchange rate: samplers relative ineff.

              Adaptive MH                                        MH
    h         1      2      3      4      6      8      10      RD     T
    S167      1.82   2.52   3.36   3.50   3.81   4.40   5.18    5.36   21.74
    S168      1.82   2.60   3.24   3.25   3.56   3.99   5.09    5.21   20.28
    S169      1.84   2.58   3.19   3.24   3.43   4.04   4.94    5.31   21.14
    S170      1.87   2.59   3.26   3.45   3.86   4.42   4.43    5.27   22.71
    p00       1.53   1.89   2.17   2.48   2.39   2.18   1.98    1.39   3.71
    p10       1.51   1.86   2.16   2.42   2.35   2.11   1.89    1.36   3.62
    p02       1.84   1.97   2.04   1.95   1.54   1.31   1.29    1.61   3.35
    p12       1.90   2.01   1.83   1.71   1.30   1.10   1.10    1.41   3.19
    φ         2.30   2.51   2.99   2.56   2.80   3.02   3.08    2.80   4.08
    Rel. time 0.39   0.39   0.37   0.37   0.38   0.36   0.38    0.43   0.35
    Acc. rate 0.96   0.93   0.90   0.87   0.82   0.75   0.72    0.79   0.76

    Rossi SSMS 41 / 73

  • Example 3: Change-point volatility model

    $\sigma_t = \exp(\lambda_t / 2)\, a_{\sigma t}$

    $\lambda_t = \lambda_{t-1} + K_{1t} V_\lambda^{1/2} a_{\lambda t}$

    Log-linearizing:

    $\log(\sigma_t^2) = \lambda_t + \mu(K_{2t}) + \gamma(K_{2t})\, a_{zt}$

    $\lambda_t = \lambda_{t-1} + K_{1t} V_\lambda^{1/2} a_{\lambda t}$

    $K_{1t} \sim$ iid Bernoulli($w$); $K_{2t} \sim$ iid Multinomial($\pi$) with 10 states; $\mu(K_{2t}) + \gamma(K_{2t})\, a_{zt}$ is a mixture of normals that approximates $\log a_{\sigma t}^2$

    (see Kim, Shephard and Chib 1998, Omori et al. 2007)

    Rossi SSMS 42 / 73

  • Fama-French market factor: samplers relative inefficiency

              Gibbs (IF)   Adaptive MH (RI)
    h         1            1      2      3      4      6      8      10     RD
    S25       17.48        1.65   2.60   3.11   3.59   3.64   3.24   2.79   3.97
    S78       13.27        1.72   2.17   2.22   3.60   2.17   3.21   2.87   4.35
    S79       16.46        1.59   1.89   2.02   2.69   2.19   2.60   2.38   3.80
    S83       10.76        1.45   1.52   1.66   1.72   1.88   1.66   1.60   2.40
    ω         1.84         2.10   2.02   2.04   1.95   1.86   1.58   1.60   2.07
    Rel. time 1.00         0.39   0.38   0.36   0.35   0.34   0.33   0.33   0.32
    Acc. rate              0.86   0.77   0.71   0.66   0.59   0.54   0.50   0.58

    Rossi SSMS 43 / 73

  • Block length

    It is important to anticipate whether blocking is worth implementing

    If yes, can we infer the "optimal" block length?

    Blocking is effective when the variables in the block are strongly conditionally dependent given the rest

    As a simple way to gauge conditional dependence, we use the partial autocorrelations $Corr(S_t, S_{t+k} \,|\, y, S_{t-1}, S_{t+k+1})$, $k = 1, 2, \cdots$

    Rossi SSMS 44 / 73

  • Partial autocorrelations of discrete latent variable S in the US GDP example.

    [Figure: partial autocorrelations at lags 1-10, reaching values up to about 0.6.]

    Rossi SSMS 45 / 73

  • Partial autocorrelations of latent variable S in the USD/GBP exchange rate

    [Figure: partial autocorrelations at lags 1-10, reaching values up to about 0.6.]

    Rossi SSMS 46 / 73

  • Partial autocorrelations of discrete latent variable K1 in the Fama-French market

    [Figure: partial autocorrelations at lags 1-10, all below about 0.1.]

    Rossi SSMS 47 / 73

  • Posterior simulation: transition probabilities

    The assumptions made imply:

    f(π|θ,S,y) = f(π|S) ∝ Pr(S|π) f(π)

    In the simplest case where St is a sequence of independent variables i.e.Pr(St = k|St−1 = j) = Pr(St = k) = πk for k = 1, · · · , N .

    Under the assumption π = (π1, · · · , πN ) ∼ Dirichlet(α1, · · · , αN ):

    f(π|S) ∝T∏t=1

    Pr(St) f(π) = Dirichlet(α∗1, · · · , α∗N )

    α∗k = αk +

    T∑t=1

    1(St=k)

    Rossi SSMS 48 / 73
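
  • Aside: the conjugate Dirichlet update in code (added sketch)

    A minimal sketch of the update above for the independent case, not in the original slides: the posterior hyperparameters are the prior ones plus the state counts; a draw from Dirichlet(α*) is then obtained by normalizing independent Gamma(α*_k, 1) variates. The state sequence below is a placeholder.

    ! Minimal sketch: alpha*_k = alpha_k + #{t : S_t = k}.
    program dirichlet_update
      implicit none
      integer, parameter :: n = 12, N2 = 2
      integer :: S(n) = (/1,1,2,1,2,2,2,1,1,2,2,2/)   ! placeholder states
      double precision :: alpha(N2) = (/1.d0, 1.d0/)  ! flat Dirichlet prior
      double precision :: alpha_star(N2)
      integer :: k
      do k = 1, N2
         alpha_star(k) = alpha(k) + count(S == k)
      end do
      print *, 'posterior hyperparameters:', alpha_star
    end program dirichlet_update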

  • Posterior simulation: transition probabilities

    Let $S$ be a Markov sequence with $\Pr(S_t = k | S_{t-1} = j) = \pi_{kj}$ for $k, j = 1, \cdots, N$. Let $\pi = (\pi_1, \cdots, \pi_N)$ and $\pi_j = (\pi_{1j}, \cdots, \pi_{Nj})$. Assume the $\pi_j$ are mutually independent Dirichlet pdfs. The full conditional distribution of $\pi$ is:

    $f(\pi|S) \propto \Pr(S|\pi_1, \cdots, \pi_N) \prod_{k=1}^{N} f(\pi_k)$

    $\propto \Pr(S_1|\pi_1, \cdots, \pi_N) \prod_{t=2}^{T} \Pr(S_t|S_{t-1}, \pi_1, \cdots, \pi_N) \prod_{k=1}^{N} f(\pi_k)$

    $\propto \Pr(S_1|\pi_1, \cdots, \pi_N) \prod_{k=1}^{N} \prod_{t \in I_k} \Pr(S_t|S_{t-1} = k, \pi_k)\, f(\pi_k)$

    where $I_k = \{t \geq 2 : S_{t-1} = k\}$.

    Rossi SSMS 49 / 73

  • Posterior simulation: transition probabilities

    Notice that the term

    $\prod_{k=1}^{N} \prod_{t \in I_k} \Pr(S_t|S_{t-1} = k, \pi_k)\, f(\pi_k)$

    is proportional to the product of $N$ independent Dirichlet distributions.

    To remove dependence on the initial condition $S_1$, this product is taken as the proposal in an MH step with acceptance probability given by

    $\min\{1, \Pr(S_1|\pi^\star)/\Pr(S_1|\pi^0)\}$

    where $\pi^\star$ is the candidate vector and $\pi^0$ is the previously sampled value.

    Rossi SSMS 50 / 73

  • Convergence diagnostics

    1 Visual inspection by plotting cumulated posterior means $\frac{1}{g}\sum_{j=1}^{g} \theta^{(j)}$, $g = 1, 2, \cdots, G$.

    2 Geweke convergence diagnostic: compare the mean of the first $n_1$ elements against the mean of the last $n_2$:

    $\bar{\theta}_1 = \frac{1}{n_1}\sum_{j=1}^{n_1} \theta^{(j)}; \qquad \bar{\theta}_2 = \frac{1}{n_2}\sum_{j=G-n_2+1}^{G} \theta^{(j)}$

    with $n_1 + n_2 \leq G$. As $G \to \infty$ with $\frac{n_1}{G}$ and $\frac{n_2}{G}$ fixed,

    $Z = \frac{\bar{\theta}_2 - \bar{\theta}_1}{\sqrt{V(\bar{\theta}_1) + V(\bar{\theta}_2)}} \to N(0, 1)$

    Large values of $Z$ indicate lack of convergence.

    Geweke suggests $n_1 = G/5$ and $n_2 = G/2$.

    Rossi SSMS 51 / 73
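
  • Aside: the Geweke diagnostic in code (added sketch)

    A minimal sketch of the statistic above with n1 = G/5 and n2 = G/2, not in the original slides. For brevity the variances of the segment means are computed as if the draws were iid; in practice NSEs based on spectral estimates are used.

    ! Minimal sketch: Geweke Z on a recorded chain (placeholder draws).
    program geweke_z
      implicit none
      integer, parameter :: G = 10000, n1 = G/5, n2 = G/2
      double precision :: draws(G), m1, m2, v1, v2, Z
      call random_number(draws)                 ! placeholder "chain"
      m1 = sum(draws(1:n1))/n1
      m2 = sum(draws(G-n2+1:G))/n2
      v1 = sum((draws(1:n1) - m1)**2)/(n1 - 1)/n1          ! Var(mean), iid
      v2 = sum((draws(G-n2+1:G) - m2)**2)/(n2 - 1)/n2
      Z = (m2 - m1)/sqrt(v1 + v2)
      print *, 'Geweke Z:', Z
    end program geweke_z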

  • Convergence diagnostics

    A universal method to assess convergence does not exist.

    It is useful to report:

    Geweke statistics.

    Chain autocorrelations.

    NSE, the numerical standard error of the posterior mean, using autocovariances up to a lag equal to 4% of the recorded simulations.

    RNE, the relative numerical efficiency: the ratio of the variance of the posterior mean under the iid hypothesis to the squared NSE. A value close to 1 indicates high efficiency.

    Rossi SSMS 52 / 73

  • Inference from the MCMC draws

    Any quantity of interest can be derived from the posterior samples: for instance marginal distributions and related moments, as in

    $\hat{E}[\theta_1|y] = \frac{1}{G}\sum_{j=1}^{G} \theta_1^{(j)}, \qquad \hat{f}(\tilde{\theta}_1) = \frac{1}{G}\sum_{j=1}^{G} I_{\theta_1^{(j)} \in (\tilde{\theta}_1 - \delta,\, \tilde{\theta}_1 + \delta)}$

    Hypothesis testing via the Highest Posterior Density (HPD) region with probability content $\alpha$: the smallest interval $R$ such that

    $p(\beta \in R|y) = \int_R p(\beta|y)\, d\beta = \alpha$

    $\Rightarrow$ accept $H_0 : \beta \in I$ if $I \subset R$.

    Rossi SSMS 53 / 73

  • The marginal likelihood

    Let $y$ denote the observed values and $M_k$ a model with parameters $\theta$ for which a proper prior $f(\theta|M_k)$ is defined over a support $\Theta$.

    The marginal likelihood is the density mass that the model puts on the data given the priors:

    $f(y|M_k) = \int_\Theta f(y|M_k, \theta)\, f(\theta|M_k)\, d\theta$

    Rossi SSMS 54 / 73

  • The marginal likelihood

    Useful for:

    1 Model selection: it gives the posterior probability of a model $M_k$ among $K$ models:

    $p(M_k|y) = \frac{f(y|M_k)\, p(M_k)}{\sum_\ell f(y|M_\ell)\, p(M_\ell)}$

    $\Rightarrow$ isolate the prior that best matches the data properties.

    2 Hypothesis testing: when models are made to coincide with hypotheses, for instance to discriminate between different trend specifications.

    3 Forecast combination:

    $f(y_{T+1}|y) = \sum_\ell f(y_{T+1}|M_\ell, y)\, p(M_\ell|y)$

    ... and for Bayesian model averaging of any quantity of interest.

    Rossi SSMS 55 / 73

  • The marginal likelihood

    Few closed-form solutions exist:

    Regression model ($M_1$): $y = X\beta + u$ with $u|V_u \sim N(0, V_u I_T)$ and natural conjugate prior $p(\beta, V_u) = NIG(\beta_0, P_0, s_0, \nu_0)$, which implies:

    $f(y|M_1) = t(X\beta_0,\, s_0,\, (I_T + X P_0^{-1} X')^{-1},\, \nu_0)$

    Random walk plus drift ($M_2$): $\Delta y = \mu + u$ with $p(\mu, V_u) = NIG(\mu_0, P_0, s_0, \nu_0)$:

    $f(\Delta y|M_2) = t(\mu_0 1_{T-1},\, s_0,\, (I_{T-1} + 1_{T-1} P_0^{-1} 1'_{T-1})^{-1},\, \nu_0)$

    In general, no exact solution exists ⇒ a numerical evaluation is necessary.

    Rossi SSMS 56 / 73

  • Marginal likelihood estimators

    Likelihood integration over the prior:

    $f(y) = \int_\Theta f(y|\theta)\, f(\theta)\, d\theta$

    $\hat{f}_{LI}(y) = \frac{1}{m}\sum_{i=1}^{m} f(y|\theta^{(i)}), \quad \theta^{(i)}$ iid from $f(\theta)$

    Difficulties:

    the dimension of the support;

    likelihood functions are typically highly concentrated with respect to the prior.

    $\Rightarrow$ large variance; not used.

    Rossi SSMS 57 / 73

  • Marginal likelihood estimators

    Importance sampling is one standard in applied econometrics:

    $f(y) = \int_\Theta \frac{f(y|\theta)\, f(\theta)}{q(\theta)}\, q(\theta)\, d\theta \;\Rightarrow\; \hat{f}_{IS}(y) = \frac{1}{m}\sum_{i=1}^{m} \frac{f(y|\theta^{(i)})\, f(\theta^{(i)})}{q(\theta^{(i)})}, \quad \theta^{(i)} \sim q(\theta)$

    where $q(\theta)$ is the importance function with support $S_q$.

    Choice of importance function:

    $f(\theta|y)/q(\theta) \propto 1$ yields zero variance $\Rightarrow$ choose $q(\theta) \propto f(\theta|y)$ as much as possible.

    If $q$ has light tails wrt $f(\theta|y)$, $f(\theta|y)/q(\theta)$ is too large in the tails. If $q$ is over-dispersed wrt $f(\theta|y)$, irrelevant points are drawn. Both cases imply a loss in efficiency.

    Often $\theta \sim N(\tilde{\theta}, c\Sigma(\tilde{\theta}))$ with $\tilde{\theta}$ the posterior mode and $c > 1$.

    Rossi SSMS 58 / 73

  • Marginal likelihood estimators

    Harmonic Mean by Newton & Raftery (1994, JRSS):

    $\frac{1}{f(y)} = \int \frac{f(\theta)}{f(y)}\, d\theta = \int \frac{f(\theta|y)}{f(y|\theta)}\, d\theta = E_{\theta|y}\left[\frac{1}{f(y|\theta)}\right]$

    $\Rightarrow \hat{f}_{HM}(y) = \left[\frac{1}{m}\sum_{i=1}^{m} \frac{1}{f(y|\theta^{(i)})}\right]^{-1}, \quad \theta^{(i)} \sim f(\theta|y)$

    Advantage: makes use of posterior samples instead of an importance density.

    Problem: the variance is $\infty$ due to points with (near-)zero likelihood.

    Improvement: the modified harmonic mean (MHM) estimator (Geweke, FRB 1999) attenuates the infinite-variance problem by introducing an importance function with a truncation.

    DMM delivers the MHM estimates with truncation at $0, 5\%, \cdots, 95\%$.

    Rossi SSMS 59 / 73
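
  • Aside: a numerically stable harmonic mean (added sketch)

    A minimal sketch of the estimator above on the log scale, not in the original slides: since likelihood values are tiny, the sum of 1/f(y|θ(i)) is computed with a log-sum-exp guard against overflow. The vector of log-likelihoods at posterior draws is a placeholder.

    ! Minimal sketch: log fHM(y) = -log( (1/m) * sum_i exp(-ll_i) ).
    program harmonic_mean
      implicit none
      integer, parameter :: m = 1000
      double precision :: ll(m), cmax, logHM
      call random_number(ll)
      ll = -500.d0 + 10.d0*ll                 ! placeholder log f(y|theta(i))
      cmax = maxval(-ll)                      ! log-sum-exp guard
      logHM = -(cmax + log(sum(exp(-ll - cmax))) - log(dble(m)))
      print *, 'log marginal likelihood (HM):', logHM
    end program harmonic_mean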

  • Marginal likelihood estimators

    Bridge sampling by Meng & Wong (1996, SS) is today's best. Since $f(\theta|y) = f(y|\theta) f(\theta)/f(y)$,

    $1 = \frac{\int \frac{h(\theta)}{q(\theta)}\, q(\theta)\, d\theta}{\int \frac{h(\theta)}{f(y|\theta) f(\theta)/f(y)}\, f(\theta|y)\, d\theta} \;\Rightarrow\; f(y) = \frac{\int \frac{h(\theta)}{q(\theta)}\, q(\theta)\, d\theta}{\int \frac{h(\theta)}{f(y|\theta)\, f(\theta)}\, f(\theta|y)\, d\theta}$

    The bridge function $h(\theta)$ reduces the estimation error if located between the importance function $q(\theta)$ and the posterior density. MW's optimal choice:

    $h(\theta) \propto \frac{q(\theta)\, f(\theta|y)}{m_q\, q(\theta) + m_y\, f(\theta|y)}$

    where $m_q$, $m_y$ refer to the number of draws from $q(\theta)$ and $f(\theta|y)$.

    Rossi SSMS 60 / 73

  • Marginal likelihood estimators

    The MW estimator:

    f̂MW (y) =

    1mq

    ∑θ∼q(θ)

    f(y|θ)f(θ)mqq(θ)+myf(θ|y)

    1my

    ∑θ∼f(θ|y)

    q(θ)mqq(θ)+myf(θ|y)

    involves f(θ|y) that requires a preliminary estimate of f(y). The formulaeabove can be iterated.

    DMM delivers MW estimates without iterating and with 10 iterations, allinitialized with one MHM estimates.

    Rossi SSMS 61 / 73
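
  • Aside: iterating the MW formula (added sketch)

    A minimal sketch of the iteration above, not in the original slides: starting from a preliminary estimate f0 (e.g. an MHM estimate), f(θ|y) in the sums is replaced by f(y|θ)f(θ)/f0 and the estimator is re-evaluated. All inputs are placeholders, and densities are kept on the natural scale for clarity; a log-scale version would be used in practice.

    ! Minimal sketch of the iterated Meng-Wong estimator.
    ! pq, qq: f(y|theta)f(theta) and q(theta) at draws from q;
    ! py, qy: the same quantities at posterior draws.
    program bridge_iter
      implicit none
      integer, parameter :: mq = 1000, my = 1000, niter = 10
      double precision :: pq(mq), qq(mq), py(my), qy(my), f0, num, den
      integer :: i, j, it
      call random_number(pq); call random_number(qq)  ! placeholder inputs,
      call random_number(py); call random_number(qy)
      pq = pq + 0.1d0; qq = qq + 0.1d0                ! kept strictly positive
      py = py + 0.1d0; qy = qy + 0.1d0
      f0 = 1.d0                                       ! e.g. an MHM estimate
      do it = 1, niter
         num = 0.d0
         do i = 1, mq              ! numerator: average over q-draws
            num = num + pq(i)/(mq*qq(i) + my*pq(i)/f0)
         end do
         den = 0.d0
         do j = 1, my              ! denominator: average over posterior draws
            den = den + qy(j)/(mq*qy(j) + my*py(j)/f0)
         end do
         f0 = (num/mq)/(den/my)
      end do
      print *, 'bridge sampling estimate of f(y):', f0
    end program bridge_iter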

  • Program DMM

    DMM is a stand-alone program for the analysis of dynamic mixture models (see Giordani, Kohn, and van Dijk, JoE, 2007)

    Several packages offer estimation of state space models (see e.g. Commandeur, Koopman and Ooms, JSS, Vol. 41, 2011)

    However, only S+FinMetrics by Zivot (JSS, 2006) analyzes Markov switching state space models. Estimation is performed by maximizing the approximate likelihood devised by Kim (JoE, 1994).

    Rossi SSMS 62 / 73

  • DMM: main features

    Bayesian inference: DMM delivers posterior samples of the unobserved state vector, of the discrete latent variable, of the model parameters, missing values, forecasts, and two marginal likelihood estimates (Meng and Wong, SS, 1996 and Geweke, 1999)

    Endogenous series: univariate or multivariate, stationary or non-stationary, with missing observations; they may be linked to exogenous variables.

    Coding: for computational speed and robustness, DMM is fully implemented in Fortran.

    Rossi SSMS 63 / 73

  • DMM: prior assumptions

    All parameters in θ are independent, and θ is independent of π

    Prior distributions for the elements of θ may be normal (NT), beta(BE), and inverse gamma (IG)

    Each parameter in θ is defined over a finite support

    Dirichlet distributions are imposed for the transition probabilities π`.

    Rossi SSMS 64 / 73

  • Running DMM: stand-alone version

    Kahn-Rich.nml: contains model settings, prior distributions, and data for LP, WH, and CH

    Kahn-Rich.dll: a dynamic link library that defines the system matrices of the state space representation. The dll can be coded either using a Fortran compiler, e.g. the GNU Fortran compiler for Windows, by typing:

    gfortran -shared -o Kahn-Rich.dll Kahn-Rich.for

    or writing a simple routine in MatLab.

    The program is called from the MS-DOS prompt typing:

    DMM Kahn-Rich.nml

    Rossi SSMS 65 / 73

  • The Kahn and Rich (JME, 2007) model

    G. Fiorentini, C. Planas and A. Rossi - January 2013

    lp = p1 + λ1*c + z1
    wh = p2 + λ2*c + z2
    ch = p3 + λ3*c + z3
    pj = pj(-1) + mj + ap
    mj = [S + (1-S)*δj]*μj,  δj < 1
    c  = 2*A*cos(2π/τ)*c(-1) - A^2*c(-2) + ac
    zj = φj*zj(-1) + azj,  j = 1, 2, 3

    State-space format:
    y(t) = c(t)z(t) + H(t)x(t) + G(t)u(t)
    x(t) = a(t) + F(t)x(t-1) + R(t)u(t)

    Namelist ssm contains:
    nx   = number of continuous states
    nu   = number of shocks
    d(1) = order of integration of the system
    d(2) = number of non-stationary continuous state variables
    nv   = number of discrete S variables

  • Namelist prior describes the prior pdfs of the model parameters:

    nt       = number of theta parameters
    pdftheta = prior distribution (NT = truncated normal; BE = beta; IG = inverse gamma)
    hyptheta = hyperparameters of the prior pdf (mean or hyp, sd or hyp, lower bound, upper bound)

    Note: if hyptheta(3,j) = hyptheta(4,j) the parameter is not estimated and its value is fixed at lb = ub

    τ, A, Vc      = period, amplitude, variance of the cycle   theta(1:3)
    λ1, λ2, λ3    = common cycle loading coefficients          theta(4:6)
    μ1, δ1        = drift p1                                   theta(7:8)
    μ2, δ2        = drift p2                                   theta(9:10)
    μ3, δ3        = drift p3                                   theta(11:12)
    Vp            = common trend variance                      theta(13)
    φ1, φ2, φ3    = idiosyncratic term AR(1) coefficients      theta(14:16)
    Vz1, Vz2, Vz3 = idiosyncratic term variances               theta(17:19)

    &prior
    nt = 19
    pdftheta(1)  = BE  hyptheta(1,1)  = 2.58 14.68 2 100
    pdftheta(2)  = BE  hyptheta(1,2)  = 5 5 .0001 .9999
    pdftheta(3)  = IG  hyptheta(1,3)  = .0004 6 0 1
    pdftheta(4)  = NT  hyptheta(1,4)  = 1 1 1 1
    pdftheta(5)  = NT  hyptheta(1,5)  = .3 .25 0 2
    pdftheta(6)  = NT  hyptheta(1,6)  = .6 .25 0 2
    pdftheta(7)  = NT  hyptheta(1,7)  = .008 .000001 0 .012
    pdftheta(8)  = NT  hyptheta(1,8)  = .4 .16 .001 .95
    pdftheta(9)  = NT  hyptheta(1,9)  = .008 .000001 0 .012
    pdftheta(10) = NT  hyptheta(1,10) = .4 .16 .001 .95
    pdftheta(11) = NT  hyptheta(1,11) = .008 .000001 0 .012
    pdftheta(12) = NT  hyptheta(1,12) = .4 .16 .001 .95
    pdftheta(13) = IG  hyptheta(1,13) = .0000012 6 0 1
    pdftheta(14) = NT  hyptheta(1,14) = 0 0 0 0
    pdftheta(15) = NT  hyptheta(1,15) = .8 .04 0 .98
    pdftheta(16) = NT  hyptheta(1,16) = .8 .04 0 .98
    pdftheta(17) = IG  hyptheta(1,17) = .0002 6 0 .01
    pdftheta(18) = IG  hyptheta(1,18) = .0002 6 0 .01
    pdftheta(19) = IG  hyptheta(1,19) = .0002 6 0 .01
    &end

  • Namelist mcmc contains the Markov chain Monte Carlo options:

    seed     = seed of the random number generator (0-999)
    thin     = thinning
    burnin   = burn-in period
    simulrec = number of recorded samples
    hbl      = block length for the discrete latent variable (1: GCK, >1: AMH)

    &mcmc seed=0 thin=1 burnin=100 simulrec=5000 hbl=1 &end

    Namelist dataset provides the data:

    T       = number of observations
    ny      = number of endogenous series
    nz      = number of exogenous series
    nf      = number of forecasts
    datasym = simulate the data {Y,N}
    obs     = a matrix of dimension nobs x ny if nz = 0 and (nobs+nf) x (ny+nz) if nz > 0

    Note: -99999 can be used to assign missing values to the endogenous variables

    &dataset T=244 ny=3 nz=0 nf=0
    obs = 1.00000000000000 1.00000000000000 1.00000000000000
          1.01591026190000 1.00200995470000 1.00958835440000
          . . .
          2.36156892870000 2.43176523280000 2.45613035130000
    &end

  • The user-supplied design routine (kahn-rich.for)

          SUBROUTINE DESIGN(ny,nz,nx,nu,ns,nt,theta,c,H,G,a,F,R)
    !DEC$ ATTRIBUTES DLLEXPORT, ALIAS:'design_' :: DESIGN
    C INPUT
          INTEGER ny,nz,nx,nu,ns(6),nt
          DOUBLE PRECISION theta(nt)
    C OUTPUT
          DOUBLE PRECISION c(ny,max(1,nz),ns(1)),H(ny,nx,ns(2)),
         *                 G(ny,nu,ns(3)),a(nx,ns(4)),F(nx,nx,ns(5)),
         *                 R(nx,nu,ns(6))
    C LOCALS
          INTEGER I
          DOUBLE PRECISION PI
          DATA PI/3.141592653589793D0/
    C c(t) (ny x max(1,nz) x ns1)
          c(:,:,:) = 0.D0
    C H(t) (ny x nx x ns2)
          H(:,:,:) = 0.D0
          DO I = 1,ny
             H(I,I,1)      = 1.D0        ! trend
             H(I,ny+1,1)   = theta(3+I)  ! cycle loading
             H(I,ny+2+I,1) = 1.D0        ! idiosyncratic term
          ENDDO
    C G(t) (ny x nu x ns3)
          G(:,:,:) = 0.D0
    C a(t) (nx x ns4)
          a(:,:) = 0.D0
          DO I = 1,ny
             a(I,1) = theta(7+(I-1)*2)*theta(8+(I-1)*2)  ! drift, regime 1
             a(I,2) = theta(7+(I-1)*2)                   ! drift, regime 2
          ENDDO
    C F(t) (nx x nx x ns5)
          F(:,:,:) = 0.D0
          DO I = 1,ny
             F(I,I,1) = 1.D0
          ENDDO
          F(ny+1,ny+1,1) = 2.D0*theta(2)*DCOS(2.D0*PI/theta(1))
          F(ny+1,ny+2,1) = -theta(2)**2
          F(ny+2,ny+1,1) = 1.D0
          DO I = 1,ny
             F(ny+2+I,ny+2+I,1) = theta(13+I)
          ENDDO
    C R(t) (nx x nu x ns6)
          R(:,:,:) = 0.D0
          DO I = 1,ny
             R(I,1,1) = DSQRT(theta(13))           ! trend variance
          ENDDO
          R(ny+1,2,1) = DSQRT(theta(3))            ! cycle variance
          DO I = 1,ny
             R(ny+2+I,2+I,1) = DSQRT(theta(16+I))  ! idiosyncratic variance
          ENDDO
          RETURN
          END

    To create kahn-rich.dll from kahn-rich.for, use can be made of the GNU Fortran compiler for Windows (gcc.gnu.org/wiki/GFortran). At the MS-DOS command prompt type:

    gfortran -shared -o kahn-rich.dll kahn-rich.for

  • Running DMM: dynare version (to be finalized)

    // Variables and processes declaration

    var y mu e;

    varobs y;

    varexo ee emu;

    parameters Ve, Vmu, delta, S1, S2;

    // Write the model as usual in dynare

    model;

    y = mu + e;

    e = ((S1 - 1)*sqrt(delta) + (2 - S1))*sqrt(Ve)*ee;

    mu = mu(-1) + (S2-1)*sqrt(Vmu)*emu;

    end;

    // MCMC settings

    dmm(drop=1000,seed=0,thinning=1,replic=10000,

    maxorderintegration=1,nonstationary=1,forecasts=10);

    Rossi SSMS 66 / 73

  • Running DMM: dynare version (to be finalized)

    // Specify the latent processes S1 and S2

    multinomial( numberofregimes=2, probability = [P1]);

    S1.calibration(regime=1) = 1;

    S1.calibration(regime=2) = 2;

    multinomial( numberofregimes=2, probability = [P2]);

    S2.calibration(regime=1) = 1;

    S2.calibration(regime=2) = 2;

    // Setting priors

    P1.prior(shape=dirichlet,params=[1 1; 1 1]);

    P2.prior(shape=dirichlet,params=[1 1; 1 1]);

    Ve.prior(shape=invgamma,mean=6e4,stdev=6,interval=[0,5e4]);

    Vmu.prior(shape=invgamma,mean=6e4,stdev=6,interval=[0,5e4]);

    delta.prior(shape=beta,mean=2,stdev=4,interval=[1,20]);

    Rossi SSMS 67 / 73

  • Detecting changes in US trend productivity

    Kahn and Rich (JME, 2007) use neoclassical growth theory to detect changes in US trend productivity

    Labour productivity, measured as real GDP over total hours (LP); hourly wages (WH); real consumption per hour worked (CH)

    [Figure: the LP, WH, and CH series plotted quarterly over 1948Q4-2008Q4.]

    Rossi SSMS 68 / 73

  • Detecting changes in US trend productivity

    The Kahn-Rich model (slightly revised):

    $LP_t = p_{1t} + \lambda_1 \psi_t + z_{1t}$

    $WH_t = p_{2t} + \lambda_2 \psi_t + z_{2t}$

    $CH_t = p_{3t} + \lambda_3 \psi_t + z_{3t}$

    $\psi_t = 2A\cos(2\pi/\tau)\, \psi_{t-1} - A^2 \psi_{t-2} + a_{\psi t}, \quad a_{\psi t} \sim N(0, V_\psi)$

    $p_{\ell t} = \mu_\ell(S_t) + p_{\ell t-1} + a_{pt}, \quad a_{pt} \sim N(0, V_p)$

    $\mu_\ell(S_t) = \mu_\ell[(1 - S_t) + \delta_\ell S_t]$

    $z_{\ell t} = \phi_\ell z_{\ell t-1} + a_{z\ell t}, \quad a_{z\ell t} \sim N(0, V_{z\ell}), \quad \ell = 1, 2, 3$

    where $S_t \in \{0, 1\}$, with $\Pr(S_{t+1} = i \,|\, S_t = i) = \pi_{ii}$, $i = 0, 1$

    Rossi SSMS 69 / 73

  • Detecting changes in US trend productivity: DMM results

    [Figure: posterior distributions of the cycle period τ and amplitude A; of the drifts µ1, µ2, µ3 and the δ1, δ2, δ3 for LP, WH, CH; and of the transition probabilities Pr(St = 0 | St−1 = 0) and Pr(St = 1 | St−1 = 1).]

    Rossi SSMS 70 / 73

  • Detecting changes in US trend productivity: DMM results

    Posterior probability of high productivity growth Pr(St = 1|yT )

    [Figure: Pr(St = 1 | yT) plotted quarterly over 1948Q4-2008Q4.]

    Rossi SSMS 71 / 73

  • Concluding remarks

    DMM is a program for the analysis of dynamic mixture models:

    handles multivariate series that may be non-stationary, with missing observations, and linked to some exogenous variables

    implements up-to-date techniques for sampling the discrete latent variable in O(T) operations, for exact initialization of the Kalman recursions, for drawing model parameters efficiently, and for computing the marginal likelihood

    prior distributions do not need to be conjugate

    complete freedom in model parameterization

    benefits from the computational speed advantage of low-level languages, which is particularly relevant when MCMC algorithms are employed

    Rossi SSMS 72 / 73

  • Concluding remarks

    The stand-alone version of DMM can be freely downloaded at

    http://ipsc.jrc.ec.europa.eu/fileadmin/repository/sfa/finepro/software/DMM.zip

    The dynare version of DMM will soon be available at

    http://www.dynare.org/

    Rossi SSMS 73 / 73