Lecture Notes 9: Filtering

  • State Space Models and Filtering

    Jesús Fernández-Villaverde
    University of Pennsylvania

  • State Space Form

    What is a state space representation?

    States versus observables.

    Why is it useful?

    Relation with filtering.

    Relation with optimal control.

    Linear versus nonlinear, Gaussian versus non-Gaussian.

  • State Space Representation

    Consider the following system:

    Transition equation:

    x_{t+1} = F x_t + G ω_{t+1},   ω_{t+1} ~ N(0, Q)

    Measurement equation:

    z_t = H' x_t + υ_t,   υ_t ~ N(0, R)

    where x_t are the states and z_t are the observables.

    Assume we want to write the likelihood function of z^T = {z_t}_{t=1}^T.

  • The State Space Representation is Not Unique

    Take the previous state space representation.

    Let B be a non-singular square matrix conforming with F.

    Then, if x*_t = B x_t, F* = B F B^{-1}, G* = B G, and H*' = H' B^{-1}, we can write a new, equivalent representation:

    Transition equation:

    x*_{t+1} = F* x*_t + G* ω_{t+1},   ω_{t+1} ~ N(0, Q)

    Measurement equation:

    z_t = H*' x*_t + υ_t,   υ_t ~ N(0, R)

  • Example I

    Assume the following AR(2) process:

    z_t = ρ_1 z_{t-1} + ρ_2 z_{t-2} + ε_t,   ε_t ~ N(0, σ²)

    The model is apparently not Markovian.

    Can we write this model in different state space forms?

    Yes!

  • State Space Representation I

    Transition equation:

    x_t = [ρ_1 1; ρ_2 0] x_{t-1} + [1; 0] ε_t

    where x_t = [y_t   ρ_2 y_{t-1}]'

    Measurement equation:

    z_t = [1 0] x_t

  • State Space Representation II

    Transition equation:

    x_t = [ρ_1 ρ_2; 1 0] x_{t-1} + [1; 0] ε_t

    where x_t = [y_t   y_{t-1}]'

    Measurement equation:

    z_t = [1 0] x_t

    Try B = [1 0; 0 ρ_2] on the second system to get the first system.
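    The equivalence can be checked numerically. The sketch below (Python with numpy; the values of ρ_1 and ρ_2 are purely illustrative, not taken from the notes) applies the matrix B to Representation II and recovers the matrices of Representation I.

```python
import numpy as np

# Hypothetical AR(2) coefficients, chosen only for illustration.
rho1, rho2 = 0.5, 0.3

# Representation II: state x_t = [y_t, y_{t-1}]'
F2 = np.array([[rho1, rho2],
               [1.0,  0.0]])
G2 = np.array([[1.0], [0.0]])
H2 = np.array([[1.0, 0.0]])      # row of H', so z_t = H' x_t

# Change of basis B maps Representation II into Representation I.
B = np.array([[1.0, 0.0],
              [0.0, rho2]])
Binv = np.linalg.inv(B)

F1 = B @ F2 @ Binv               # should equal [[rho1, 1], [rho2, 0]]
G1 = B @ G2                      # should equal [[1], [0]]
H1 = H2 @ Binv                   # should equal [[1, 0]]

print(F1)
print(G1)
print(H1)
```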

  • Example II

    Assume the following MA(1) process:

    z_t = ε_t + θ ε_{t-1},   ε_t ~ N(0, σ²),   and E(ε_t ε_s) = 0 for s ≠ t.

    Again, we have a more complicated structure than a simple Markovian process.

    However, it will again be straightforward to write a state space representation.

  • State Space Representation I

    Transition equation:

    x_t = [0 1; 0 0] x_{t-1} + [1; θ] ε_t

    where x_t = [y_t   θ ε_t]'

    Measurement equation:

    z_t = [1 0] x_t

  • State Space Representation II

    Transition equation:

    x_t = θ ε_{t-1}

    where x_t = [θ ε_{t-1}]'

    Measurement equation:

    z_t = x_t + ε_t

    Again, both representations are equivalent!

  • Example III

    Assume the following random walk plus drift process:

    z_t = z_{t-1} + μ + ε_t,   ε_t ~ N(0, σ²)

    This is even more interesting.

    We have a unit root.

    We have a constant parameter (the drift).

  • State Space Representation

    Transition equation:

    x_t = [1 1; 0 1] x_{t-1} + [1; 0] ε_t

    where x_t = [y_t   μ]'

    Measurement equation:

    z_t = [1 0] x_t

  • Some Conditions on the State Space Representation

    We only consider stable systems.

    A system is stable if, for any initial state x_0, the vector of states x_t converges to some unique x̄.

    A necessary and sufficient condition for the system to be stable is that:

    |λ_i(F)| < 1

    for all i, where λ_i(F) stands for an eigenvalue of F.
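    In practice the condition is easy to verify. A minimal sketch (Python with numpy; the matrix F is an arbitrary illustrative example, not from the notes):

```python
import numpy as np

# Stability check for an illustrative transition matrix F.
F = np.array([[0.5, 0.3],
              [1.0, 0.0]])

eigenvalues = np.linalg.eigvals(F)
is_stable = np.all(np.abs(eigenvalues) < 1)   # |lambda_i(F)| < 1 for all i

print(eigenvalues, is_stable)
```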

  • Introducing the Kalman Filter

    Developed by Kalman and Bucy.

    Wide application in science.

    Basic idea.

    Prediction, smoothing, and control.

    Why the name filter?


  • Some Definitions

    Let x_{t|t-1} = E[x_t | z^{t-1}] be the best linear predictor of x_t given the history of observables until t-1, i.e. z^{t-1}.

    Let z_{t|t-1} = E[z_t | z^{t-1}] = H' x_{t|t-1} be the best linear predictor of z_t given the history of observables until t-1, i.e. z^{t-1}.

    Let x_{t|t} = E[x_t | z^t] be the best linear predictor of x_t given the history of observables until t, i.e. z^t.

  • What is the Kalman Filter trying to do?

    Let us assume we have x_{t|t-1} and z_{t|t-1}.

    We observe a new z_t.

    We need to obtain x_{t|t}.

    Note that x_{t+1|t} = F x_{t|t} and z_{t+1|t} = H' x_{t+1|t}, so we can go back to the first step and wait for z_{t+1}.

    Therefore, the key question is how to obtain x_{t|t} from x_{t|t-1} and z_t.

  • A Minimization Approach to the Kalman Filter I

    Assume we use the following equation to get x_{t|t} from z_t and x_{t|t-1}:

    x_{t|t} = x_{t|t-1} + K_t (z_t − z_{t|t-1}) = x_{t|t-1} + K_t (z_t − H' x_{t|t-1})

    This formula will have some probabilistic justification (to follow).

    What is K_t?

  • A Minimization Approach to the Kalman Filter II

    K_t is called the Kalman filter gain and it measures how much we update x_{t|t-1} as a function of our error in predicting z_t.

    The question is how to find the optimal K_t.

    The Kalman filter is about how to build K_t such that we optimally update x_{t|t} from x_{t|t-1} and z_t.

    How do we find the optimal K_t?

  • Some Additional Definitions

    Let Σ_{t|t-1} ≡ E[(x_t − x_{t|t-1})(x_t − x_{t|t-1})' | z^{t-1}] be the predicting error variance-covariance matrix of x_t given the history of observables until t-1, i.e. z^{t-1}.

    Let Ω_{t|t-1} ≡ E[(z_t − z_{t|t-1})(z_t − z_{t|t-1})' | z^{t-1}] be the predicting error variance-covariance matrix of z_t given the history of observables until t-1, i.e. z^{t-1}.

    Let Σ_{t|t} ≡ E[(x_t − x_{t|t})(x_t − x_{t|t})' | z^t] be the predicting error variance-covariance matrix of x_t given the history of observables until t, i.e. z^t.

  • Finding the optimal K_t

    We want the K_t that minimizes Σ_{t|t}.

    It can be shown that, if that is the case:

    K_t = Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1}

    with the optimal update of x_{t|t} given z_t and x_{t|t-1} being:

    x_{t|t} = x_{t|t-1} + K_t (z_t − H' x_{t|t-1})

    We will provide some intuition later.

  • Example I

    Assume the following model in state space form:

    Transition equation:

    x_t = μ + ε_t,   ε_t ~ N(0, σ²_ε)

    Measurement equation:

    z_t = x_t + υ_t,   υ_t ~ N(0, σ²_υ)

    Let σ²_υ = q σ²_ε.

  • Example II

    Then, if Σ_{1|0} = σ²_ε, which means that x_1 was drawn from the ergodic distribution of x_t, we have:

    K_1 = σ²_ε / (σ²_ε (1 + q)) = 1 / (1 + q)

    Therefore, the bigger σ²_υ relative to σ²_ε (the bigger q), the lower K_1 and the less we trust z_1.
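    A one-line check of this example (plain Python; the value of σ²_ε and the grid of q are illustrative): in the scalar case K_1 = Σ_{1|0}H(H'Σ_{1|0}H + R)^{-1} collapses to 1/(1+q).

```python
# Scalar Kalman gain for the example above: Sigma_{1|0} = sigma_eps^2,
# H = 1, R = sigma_upsilon^2 = q * sigma_eps^2, so K_1 = 1 / (1 + q).
sigma_eps2 = 1.0                      # illustrative value
for q in [0.1, 1.0, 10.0]:
    Sigma_10, H, R = sigma_eps2, 1.0, q * sigma_eps2
    K1 = Sigma_10 * H / (H * Sigma_10 * H + R)
    print(q, K1)   # K_1 falls as q rises: noisier data get less weight
```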

  • The Kalman Filter Algorithm I

    Given Σ_{t|t-1}, z_t, and x_{t|t-1}, we can now set up the Kalman filter algorithm.

    Let Σ_{t|t-1}; then we compute:

    Ω_{t|t-1} ≡ E[(z_t − z_{t|t-1})(z_t − z_{t|t-1})' | z^{t-1}]
             = E[ H'(x_t − x_{t|t-1})(x_t − x_{t|t-1})'H + υ_t(x_t − x_{t|t-1})'H
                  + H'(x_t − x_{t|t-1})υ_t' + υ_t υ_t' | z^{t-1} ]
             = H' Σ_{t|t-1} H + R

  • The Kalman Filter Algorithm II

    Let Σ_{t|t-1}; then we compute:

    E[(z_t − z_{t|t-1})(x_t − x_{t|t-1})' | z^{t-1}]
        = E[ (H'(x_t − x_{t|t-1}) + υ_t)(x_t − x_{t|t-1})' | z^{t-1} ] = H' Σ_{t|t-1}

    Let Σ_{t|t-1}; then we compute:

    K_t = Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1}

    Let Σ_{t|t-1}, x_{t|t-1}, K_t, and z_t; then we compute:

    x_{t|t} = x_{t|t-1} + K_t (z_t − H' x_{t|t-1})

  • The Kalman Filter Algorithm III

    Let Σ_{t|t-1}, x_{t|t-1}, K_t, and z_t; then we compute:

    Σ_{t|t} ≡ E[(x_t − x_{t|t})(x_t − x_{t|t})' | z^t]
        = E[ (x_t − x_{t|t-1})(x_t − x_{t|t-1})'
             − (x_t − x_{t|t-1})(z_t − H'x_{t|t-1})' K_t'
             − K_t (z_t − H'x_{t|t-1})(x_t − x_{t|t-1})'
             + K_t (z_t − H'x_{t|t-1})(z_t − H'x_{t|t-1})' K_t' | z^t ]
        = Σ_{t|t-1} − K_t H' Σ_{t|t-1}

    where you have to notice that x_t − x_{t|t} = x_t − x_{t|t-1} − K_t (z_t − H' x_{t|t-1}).

  • The Kalman Filter Algorithm IV

    Let Σ_{t|t-1}, x_{t|t-1}, K_t, and z_t; then we compute:

    Σ_{t+1|t} = F Σ_{t|t} F' + G Q G'

    Let x_{t|t}; then we compute:

    1. x_{t+1|t} = F x_{t|t}

    2. z_{t+1|t} = H' x_{t+1|t}

    Therefore, from x_{t|t-1}, Σ_{t|t-1}, and z_t we compute x_{t|t} and Σ_{t|t}.

  • The Kalman Filter Algorithm V

    We also compute z_{t|t-1} and Ω_{t|t-1}.

    Why?

    To calculate the likelihood function of z^T = {z_t}_{t=1}^T (to follow).

  • The Kalman Filter Algorithm: A Review

    We start with x_{t|t-1} and Σ_{t|t-1}.

    Then, we observe z_t and:

    Ω_{t|t-1} = H' Σ_{t|t-1} H + R

    z_{t|t-1} = H' x_{t|t-1}

    K_t = Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1}

    Σ_{t|t} = Σ_{t|t-1} − K_t H' Σ_{t|t-1}

    x_{t|t} = x_{t|t-1} + K_t (z_t − H' x_{t|t-1})

    Σ_{t+1|t} = F Σ_{t|t} F' + G Q G'

    x_{t+1|t} = F x_{t|t}

    We finish with x_{t+1|t} and Σ_{t+1|t}.
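    A minimal sketch of one iteration of the recursion just reviewed, assuming numpy arrays and the notation of the notes (H is stored as an n_x × n_z matrix so that z_t = H'x_t):

```python
import numpy as np

def kalman_step(x_pred, Sigma_pred, z, F, G, H, Q, R):
    """From (x_{t|t-1}, Sigma_{t|t-1}) and a new observation z_t
    to (x_{t+1|t}, Sigma_{t+1|t}), following the review above."""
    Omega = H.T @ Sigma_pred @ H + R                 # Omega_{t|t-1}
    z_pred = H.T @ x_pred                            # z_{t|t-1}
    K = Sigma_pred @ H @ np.linalg.inv(Omega)        # Kalman gain K_t
    x_filt = x_pred + K @ (z - z_pred)               # x_{t|t}
    Sigma_filt = Sigma_pred - K @ H.T @ Sigma_pred   # Sigma_{t|t}
    x_next = F @ x_filt                              # x_{t+1|t}
    Sigma_next = F @ Sigma_filt @ F.T + G @ Q @ G.T  # Sigma_{t+1|t}
    return x_next, Sigma_next, z_pred, Omega
```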

  • Some Intuition about the optimal K_t

    Remember: K_t = Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1}

    Notice that we can rewrite K_t in the following way:

    K_t = Σ_{t|t-1} H Ω_{t|t-1}^{-1}

    If we made a big mistake forecasting x_{t|t-1} using past information (Σ_{t|t-1} large), we give a lot of weight to the new information (K_t large).

    If the new information is noisy (R large), we give a lot of weight to the old prediction (K_t small).

  • A Probabilistic Approach to the Kalman Filter

    Assume:

    Z|w = [X'|w   Y'|w]' ~ N( [μ_x; μ_y], [Σ_xx Σ_xy; Σ_yx Σ_yy] )

    then:

    X|y, w ~ N( μ_x + Σ_xy Σ_yy^{-1}(y − μ_y),   Σ_xx − Σ_xy Σ_yy^{-1} Σ_yx )

    Also x_{t|t-1} ≡ E[x_t | z^{t-1}] and:

    Σ_{t|t-1} ≡ E[(x_t − x_{t|t-1})(x_t − x_{t|t-1})' | z^{t-1}]

  • Some Derivations I

    If z_t|z^{t-1} is the random variable z_t (observable) conditional on z^{t-1}, then:

    Let z_{t|t-1} ≡ E[z_t | z^{t-1}] = E[H'x_t + υ_t | z^{t-1}] = H' x_{t|t-1}

    Let

    Ω_{t|t-1} ≡ E[(z_t − z_{t|t-1})(z_t − z_{t|t-1})' | z^{t-1}]
        = E[ H'(x_t − x_{t|t-1})(x_t − x_{t|t-1})'H + υ_t(x_t − x_{t|t-1})'H
             + H'(x_t − x_{t|t-1})υ_t' + υ_t υ_t' | z^{t-1} ]
        = H' Σ_{t|t-1} H + R

  • Some Derivations II

    Finally, let

    E[(z_t − z_{t|t-1})(x_t − x_{t|t-1})' | z^{t-1}]
        = E[ (H'(x_t − x_{t|t-1}) + υ_t)(x_t − x_{t|t-1})' | z^{t-1} ]
        = H' Σ_{t|t-1}

  • The Kalman Filter First Iteration I

    Assume we know x_{1|0} and Σ_{1|0}; then

    (x_1; z_1) | z^0 ~ N( [x_{1|0}; H'x_{1|0}], [Σ_{1|0}  Σ_{1|0}H; H'Σ_{1|0}  H'Σ_{1|0}H + R] )

    Remember that:

    X|y, w ~ N( μ_x + Σ_xy Σ_yy^{-1}(y − μ_y),   Σ_xx − Σ_xy Σ_yy^{-1} Σ_yx )

  • The Kalman Filter First Iteration II

    Then, we can write:

    x_1 | z_1, z^0 = x_1 | z^1 ~ N(x_{1|1}, Σ_{1|1})

    where

    x_{1|1} = x_{1|0} + Σ_{1|0}H(H'Σ_{1|0}H + R)^{-1}(z_1 − H'x_{1|0})

    and

    Σ_{1|1} = Σ_{1|0} − Σ_{1|0}H(H'Σ_{1|0}H + R)^{-1}H'Σ_{1|0}

  • Therefore, we have that:

    z_{1|0} = H'x_{1|0}

    Ω_{1|0} = H'Σ_{1|0}H + R

    x_{1|1} = x_{1|0} + Σ_{1|0}H(H'Σ_{1|0}H + R)^{-1}(z_1 − H'x_{1|0})

    Σ_{1|1} = Σ_{1|0} − Σ_{1|0}H(H'Σ_{1|0}H + R)^{-1}H'Σ_{1|0}

    Also, since x_2 = F x_1 + G ω_2 and z_2 = H'x_2 + υ_2:

    x_{2|1} = F x_{1|1}

    Σ_{2|1} = F Σ_{1|1} F' + G Q G'

  • The Kalman Filter t-th Iteration I

    Assume we know x_{t|t-1} and Σ_{t|t-1}; then

    (x_t; z_t) | z^{t-1} ~ N( [x_{t|t-1}; H'x_{t|t-1}], [Σ_{t|t-1}  Σ_{t|t-1}H; H'Σ_{t|t-1}  H'Σ_{t|t-1}H + R] )

    Remember that:

    X|y, w ~ N( μ_x + Σ_xy Σ_yy^{-1}(y − μ_y),   Σ_xx − Σ_xy Σ_yy^{-1} Σ_yx )

  • The Kalman Filter t-th Iteration II

    Then, we can write:

    x_t | z_t, z^{t-1} = x_t | z^t ~ N(x_{t|t}, Σ_{t|t})

    where

    x_{t|t} = x_{t|t-1} + Σ_{t|t-1}H(H'Σ_{t|t-1}H + R)^{-1}(z_t − H'x_{t|t-1})

    and

    Σ_{t|t} = Σ_{t|t-1} − Σ_{t|t-1}H(H'Σ_{t|t-1}H + R)^{-1}H'Σ_{t|t-1}

  • The Kalman Filter Algorithm

    Given x_{t|t-1}, Σ_{t|t-1}, and observation z_t:

    Ω_{t|t-1} = H'Σ_{t|t-1}H + R

    z_{t|t-1} = H'x_{t|t-1}

    Σ_{t|t} = Σ_{t|t-1} − Σ_{t|t-1}H(H'Σ_{t|t-1}H + R)^{-1}H'Σ_{t|t-1}

    x_{t|t} = x_{t|t-1} + Σ_{t|t-1}H(H'Σ_{t|t-1}H + R)^{-1}(z_t − H'x_{t|t-1})

    Σ_{t+1|t} = FΣ_{t|t}F' + GQG'

    x_{t+1|t} = Fx_{t|t}

  • Putting the Minimization and the Probabilistic Approaches Together

    From the Minimization Approach we know that:

    x_{t|t} = x_{t|t-1} + K_t (z_t − H'x_{t|t-1})

    From the Probability Approach we know that:

    x_{t|t} = x_{t|t-1} + Σ_{t|t-1}H(H'Σ_{t|t-1}H + R)^{-1}(z_t − H'x_{t|t-1})

    But since:

    K_t = Σ_{t|t-1}H(H'Σ_{t|t-1}H + R)^{-1}

    we can also write, in the probabilistic approach:

    x_{t|t} = x_{t|t-1} + Σ_{t|t-1}H(H'Σ_{t|t-1}H + R)^{-1}(z_t − H'x_{t|t-1})
            = x_{t|t-1} + K_t (z_t − H'x_{t|t-1})

    Therefore, both approaches are equivalent.

  • Writing the Likelihood Function

    We want to write the likelihood function of z^T = {z_t}_{t=1}^T:

    log ℓ(z^T | F, G, H, Q, R) = Σ_{t=1}^T log ℓ(z_t | z^{t-1}; F, G, H, Q, R)

        = − Σ_{t=1}^T [ (N/2) log 2π + (1/2) log |Ω_{t|t-1}| ] − (1/2) Σ_{t=1}^T v_t' Ω_{t|t-1}^{-1} v_t

    where

    v_t = z_t − z_{t|t-1} = z_t − H'x_{t|t-1}

    Ω_{t|t-1} = H'Σ_{t|t-1}H + R
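    A minimal sketch combining the recursion and the formula above (Python with numpy; Z is a T × N array of observations, and x0, Sigma0 stand for the initial conditions x_{1|0} and Σ_{1|0} discussed next):

```python
import numpy as np

def kalman_log_likelihood(Z, F, G, H, Q, R, x0, Sigma0):
    """Gaussian log likelihood of Z computed with the Kalman filter,
    using the convention z_t = H'x_t as in the notes."""
    x_pred, Sigma_pred = x0, Sigma0
    loglik = 0.0
    for z in Z:
        Omega = H.T @ Sigma_pred @ H + R                # Omega_{t|t-1}
        v = z - H.T @ x_pred                            # prediction error v_t
        Omega_inv = np.linalg.inv(Omega)
        _, logdet = np.linalg.slogdet(Omega)
        loglik += -0.5 * (len(v) * np.log(2 * np.pi) + logdet
                          + v @ Omega_inv @ v)
        K = Sigma_pred @ H @ Omega_inv                  # Kalman gain K_t
        x_filt = x_pred + K @ v                         # x_{t|t}
        Sigma_filt = Sigma_pred - K @ H.T @ Sigma_pred  # Sigma_{t|t}
        x_pred = F @ x_filt                             # x_{t+1|t}
        Sigma_pred = F @ Sigma_filt @ F.T + G @ Q @ G.T
    return loglik
```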

  • Initial conditions for the Kalman Filter

    An important step in the Kalman filter is to set the initial conditions.

    Initial conditions:

    1. x_{1|0}

    2. Σ_{1|0}

    Where do they come from?

  • Since we only consider stable systems, the standard approach is to set:

    x_{1|0} = x̄

    Σ_{1|0} = Σ̄

    where x̄ and Σ̄ solve:

    x̄ = F x̄

    Σ̄ = F Σ̄ F' + G Q G'

    How do we find Σ̄?

    vec(Σ̄) = [I − F ⊗ F]^{-1} vec(GQG')
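    The vec formula is straightforward to implement. A sketch (Python with numpy), assuming a strictly stable F so that the unique fixed point of x̄ = F x̄ is the zero vector:

```python
import numpy as np

def unconditional_moments(F, G, Q):
    """Solve Sigma_bar = F Sigma_bar F' + G Q G' via
    vec(Sigma_bar) = (I - kron(F, F))^{-1} vec(G Q G')."""
    n = F.shape[0]
    GQG = G @ Q @ G.T
    vec_Sigma = np.linalg.solve(np.eye(n * n) - np.kron(F, F),
                                GQG.reshape(n * n, order="F"))
    Sigma_bar = vec_Sigma.reshape(n, n, order="F")
    x_bar = np.zeros(n)        # unique fixed point of x = F x when F is stable
    return x_bar, Sigma_bar
```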

  • Initial conditions for the Kalman Filter II

    Under the following conditions:

    1. The system is stable, i.e. all eigenvalues of F are strictly less than one in absolute value.

    2. GQG' and R are p.s.d. symmetric.

    3. Σ_{1|0} is p.s.d. symmetric.

    Then Σ_{t+1|t} → Σ̄.

  • Remarks

    1. There are more general theorems than the one just described.

    2. Those theorems are based on non-stable systems.

    3. Since we are going to work with stable systems, the former theorem is enough.

    4. The last theorem gives us a way to find Σ̄ as the limit of Σ_{t+1|t} for any Σ_{1|0} we start with.

  • The Kalman Filter and DSGE models

    Basic Real Business Cycle model:

    max E_0 Σ_{t=0}^∞ β^t { θ log c_t + (1 − θ) log(1 − l_t) }

    c_t + k_{t+1} = k_t^α (e^{z_t} l_t)^{1−α} + (1 − δ) k_t

    z_t = ρ z_{t-1} + ε_t,   ε_t ~ N(0, σ²_ε)

    Parameters: γ = {α, β, ρ, θ, δ, σ_ε}

  • Equilibrium Conditions

    1/c_t = β E_t [ (1/c_{t+1}) (1 − δ + α e^{z_{t+1}} k_{t+1}^{α−1} l_{t+1}^{1−α}) ]

    (1 − θ)/(1 − l_t) = (θ/c_t) (1 − α) e^{z_t} k_t^α l_t^{−α}

    c_t + k_{t+1} = e^{z_t} k_t^α l_t^{1−α} + (1 − δ) k_t

    z_t = ρ z_{t-1} + ε_t

  • A Special Case

    We set, unrealistically but rather usefully for our point, δ = 1.

    In this case, the model has two important and useful features:

    1. First, the income and the substitution effect from a productivity shock to labor supply exactly cancel each other. Consequently, l_t is constant and equal to:

    l_t = l = θ(1 − α) / ( θ(1 − α) + (1 − θ)(1 − αβ) )

    2. Second, the policy function for capital is k_{t+1} = αβ e^{z_t} k_t^α l^{1−α}.

  • A Special Case II

    The definition of k_{t+1} implies that c_t = (1 − αβ) e^{z_t} k_t^α l^{1−α}.

    Let us check whether the Euler equation holds:

    1/c_t = β E_t [ (1/c_{t+1}) α e^{z_{t+1}} k_{t+1}^{α−1} l^{1−α} ]

    1/((1 − αβ) e^{z_t} k_t^α l^{1−α}) = β E_t [ α e^{z_{t+1}} k_{t+1}^{α−1} l^{1−α} / ((1 − αβ) e^{z_{t+1}} k_{t+1}^α l^{1−α}) ]

    1/((1 − αβ) e^{z_t} k_t^α l^{1−α}) = E_t [ αβ / ((1 − αβ) k_{t+1}) ]

    Substituting k_{t+1} = αβ e^{z_t} k_t^α l^{1−α}, the right-hand side equals 1/((1 − αβ) e^{z_t} k_t^α l^{1−α}), so the Euler equation holds.

  • Let us check whether the intratemporal condition holds:

    (1 − θ)/(1 − l) = (θ/c_t)(1 − α) e^{z_t} k_t^α l^{−α}

    (1 − θ)/(1 − l) = θ(1 − α) e^{z_t} k_t^α l^{−α} / ((1 − αβ) e^{z_t} k_t^α l^{1−α})

    (1 − θ)/(1 − l) = θ(1 − α) / ((1 − αβ) l)

    (1 − θ)(1 − αβ) l = θ(1 − α)(1 − l)

    ( θ(1 − α) + (1 − θ)(1 − αβ) ) l = θ(1 − α)

    which is exactly the constant l defined above.

    Finally, the budget constraint holds because of the definition of c_t.

  • Transition Equation

    Since this policy function is linear in logs, we have the transition equation for the model:

    [1; log k_{t+1}; z_t] = [1 0 0; log(αβ l^{1−α}) α ρ; 0 0 ρ] [1; log k_t; z_{t-1}] + [0; 1; 1] ε_t

    Note the constant.

    Alternative formulations.

  • Measurement Equation

    As observables, we assume log y_t and log i_t, subject to a linearly additive measurement error V_t = (v_{1,t}, v_{2,t})'.

    Let V_t ~ N(0, Λ), where Λ is a diagonal matrix with σ²_1 and σ²_2 as diagonal elements.

    Why measurement error? Stochastic singularity.

    Then (since with δ = 1, i_t = k_{t+1} = αβ y_t):

    [log y_t; log i_t] = [−log(αβ) 1 0; 0 1 0] [1; log k_{t+1}; z_t] + [v_{1,t}; v_{2,t}]

  • The Solution to the Model in State Space Form

    x_t = [1; log k_{t+1}; z_t],   z_t = [log y_t; log i_t]

    F = [1 0 0; log(αβ l^{1−α}) α ρ; 0 0 ρ],   G = [0; 1; 1],   Q = σ²_ε

    H' = [−log(αβ) 1 0; 0 1 0],   R = Λ
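    To make the mapping concrete, here is a hedged sketch (Python with numpy; the parameter values are purely illustrative, not the calibration used later) that builds F, G, H, Q, and R for the δ = 1 special case. These arrays can then be passed to the kalman_log_likelihood sketch given earlier to evaluate the likelihood of data on (log y_t, log i_t).

```python
import numpy as np

# Illustrative parameter values (assumptions for this sketch only).
alpha, beta, rho, theta, sigma_eps = 0.4, 0.99, 0.95, 0.357, 0.007
sigma1, sigma2 = 1e-4, 1e-4          # measurement error std. deviations

# Constant labor supply in the delta = 1 special case.
l = theta * (1 - alpha) / (theta * (1 - alpha)
                           + (1 - theta) * (1 - alpha * beta))

F = np.array([[1.0, 0.0, 0.0],
              [np.log(alpha * beta * l ** (1 - alpha)), alpha, rho],
              [0.0, 0.0, rho]])
G = np.array([[0.0], [1.0], [1.0]])
Q = np.array([[sigma_eps ** 2]])
H = np.array([[-np.log(alpha * beta), 0.0],   # columns of H, so z_t = H'x_t
              [1.0, 1.0],
              [0.0, 0.0]])
R = np.diag([sigma1 ** 2, sigma2 ** 2])
```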

  • The Solution to the Model in State Space Form III

    Now, using z^T, F, G, H, Q, and R as defined in the last slide...

    ...we can use the Riccati equations to compute the likelihood function of the model:

    log ℓ(z^T | F, G, H, Q, R)

    Cross-equation restrictions implied by the equilibrium solution.

    With the likelihood, we can do inference!

  • What Do We Do if δ ≠ 1?

    We have two options:

    First, we could linearize or log-linearize the model and apply the Kalman filter.

    Second, we could compute the likelihood function of the model using a nonlinear filter (particle filter).

    Advantages and disadvantages.

    Fernández-Villaverde, Rubio-Ramírez, and Santos (2005).

  • The Kalman Filter and Linearized DSGE Models

    We linearize (or log-linearize) around the steady state.

    We assume that we have data on log output (log y_t), log hours (log l_t), and log consumption (log c_t), subject to a linearly additive measurement error V_t = (v_{1,t}, v_{2,t}, v_{3,t})'.

    We need to write the model in state space form. Remember that:

    k̂_{t+1} = P k̂_t + Q z_t   and   l̂_t = R k̂_t + S z_t

  • Writing the Likelihood Function I

    The transition equation:

    [1; k̂_{t+1}; z_{t+1}] = [1 0 0; 0 P Q; 0 0 ρ] [1; k̂_t; z_t] + [0; 0; 1] ε_{t+1}

    The measurement equation requires some care.

  • Writing the Likelihood Function II

    Notice that ŷ_t = z_t + α k̂_t + (1 − α) l̂_t. Therefore, using l̂_t = R k̂_t + S z_t:

    ŷ_t = z_t + α k̂_t + (1 − α)(R k̂_t + S z_t) = (α + (1 − α)R) k̂_t + (1 + (1 − α)S) z_t

    Also, since ĉ_t = −α_5 l̂_t + z_t + α k̂_t, and using again l̂_t = R k̂_t + S z_t:

    ĉ_t = z_t + α k̂_t − α_5 (R k̂_t + S z_t) = (α − α_5 R) k̂_t + (1 − α_5 S) z_t

  • Writing the Likelihood Function III

    Therefore, the measurement equation is:

    [log y_t; log l_t; log c_t] = [log y   α + (1 − α)R   1 + (1 − α)S;
                                  log l   R              S;
                                  log c   α − α_5 R      1 − α_5 S] [1; k̂_t; z_t] + [v_{1,t}; v_{2,t}; v_{3,t}]

  • The Likelihood Function of a General Dynamic Equilibrium Economy

    Transition equation:

    S_t = f(S_{t-1}, W_t; γ)

    Measurement equation:

    Y_t = g(S_t, V_t; γ)

    Interpretation.

  • Some Assumptions

    1. We can partition {W_t} into two independent sequences {W_{1,t}} and {W_{2,t}}, s.t. W_t = (W_{1,t}, W_{2,t}) and dim(W_{2,t}) + dim(V_t) ≥ dim(Y_t).

    2. We can always evaluate the conditional densities p(y_t | W_1^t, y^{t-1}, S_0; γ). Lubik and Schorfheide (2003).

    3. The model assigns positive probability to the data.

  • Our Goal: Likelihood Function

    Evaluate the likelihood function of a sequence of realizations of the observable y^T at a particular parameter value γ:

    p(y^T; γ)

    We factorize it as:

    p(y^T; γ) = ∏_{t=1}^T p(y_t | y^{t-1}; γ)

              = ∏_{t=1}^T ∫∫ p(y_t | W_1^t, y^{t-1}, S_0; γ) p(W_1^t, S_0 | y^{t-1}; γ) dW_1^t dS_0

  • A Law of Large Numbers

    If { {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N }_{t=1}^T are N i.i.d. draws from {p(W_1^t, S_0 | y^{t-1}; γ)}_{t=1}^T, then:

    p(y^T; γ) ≃ ∏_{t=1}^T (1/N) Σ_{i=1}^N p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; γ)

  • ...thus

    The problem of evaluating the likelihood is equivalent to the problem of drawing from {p(W_1^t, S_0 | y^{t-1}; γ)}_{t=1}^T.

  • Introducing Particles

    {s_0^{t-1,i}, w_1^{t-1,i}}_{i=1}^N: N i.i.d. draws from p(W_1^{t-1}, S_0 | y^{t-1}; γ).

    Each (s_0^{t-1,i}, w_1^{t-1,i}) is a particle and {s_0^{t-1,i}, w_1^{t-1,i}}_{i=1}^N a swarm of particles.

    {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N: N i.i.d. draws from p(W_1^t, S_0 | y^{t-1}; γ).

    Each (s_0^{t|t-1,i}, w_1^{t|t-1,i}) is a proposed particle and {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N a swarm of proposed particles.

  • ... and Weights

    q_t^i = p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; γ) / Σ_{i=1}^N p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; γ)

  • A Proposition

    Let {s̃_0^i, w̃_1^i}_{i=1}^N be a draw with replacement from {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N with probabilities q_t^i. Then {s̃_0^i, w̃_1^i}_{i=1}^N is a draw from p(W_1^t, S_0 | y^t; γ).

  • Importance of the Proposition

    1. It shows how a draw {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N from p(W_1^t, S_0 | y^{t-1}; γ) can be used to draw {s_0^{t,i}, w_1^{t,i}}_{i=1}^N from p(W_1^t, S_0 | y^t; γ).

    2. With a draw {s_0^{t,i}, w_1^{t,i}}_{i=1}^N from p(W_1^t, S_0 | y^t; γ) we can use p(W_{1,t+1}; γ) to get a draw {s_0^{t+1|t,i}, w_1^{t+1|t,i}}_{i=1}^N and iterate the procedure.

  • Sequential Monte Carlo I: Filtering

    Step 0, Initialization: Set t = 1 and initialize p(W_1^{t-1}, S_0 | y^{t-1}; γ) = p(S_0; γ).

    Step 1, Prediction: Sample N values {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N from the density p(W_1^t, S_0 | y^{t-1}; γ) = p(W_{1,t}; γ) p(W_1^{t-1}, S_0 | y^{t-1}; γ).

    Step 2, Weighting: Assign to each draw (s_0^{t|t-1,i}, w_1^{t|t-1,i}) the weight q_t^i.

    Step 3, Sampling: Draw {s_0^{t,i}, w_1^{t,i}}_{i=1}^N with replacement from {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N with probabilities {q_t^i}_{i=1}^N. If t < T, set t = t + 1 and go to step 1. Otherwise stop.

  • Sequential Monte Carlo II: Likelihood

    Use { {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N }_{t=1}^T to compute:

    p(y^T; γ) ≃ ∏_{t=1}^T (1/N) Σ_{i=1}^N p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; γ)

  • A Trivial Application

    How do we evaluate the likelihood function p(y^T | α, β, σ) of the nonlinear, nonnormal process:

    s_t = α + β s_{t-1}/(1 + s_{t-1}) + w_t

    y_t = s_t + v_t

    where w_t ~ N(0, σ) and v_t ~ t(2), given some observables y^T = {y_t}_{t=1}^T and s_0?

  • 1. Let s_0^{0,i} = s_0 for all i.

    2. Generate N i.i.d. draws {s_0^{1|0,i}, w^{1|0,i}}_{i=1}^N from N(0, σ).

    3. Evaluate

    p(y_1 | w_1^{1|0,i}, y^0, s_0^{1|0,i}) = p_{t(2)}( y_1 − (α + β s_0^{1|0,i}/(1 + s_0^{1|0,i}) + w^{1|0,i}) )

    4. Evaluate the relative weights

    q_1^i = p_{t(2)}( y_1 − (α + β s_0^{1|0,i}/(1 + s_0^{1|0,i}) + w^{1|0,i}) ) / Σ_{i=1}^N p_{t(2)}( y_1 − (α + β s_0^{1|0,i}/(1 + s_0^{1|0,i}) + w^{1|0,i}) )

  • 5. Resample with replacement N values of {s_0^{1|0,i}, w^{1|0,i}}_{i=1}^N with relative weights q_1^i. Call those sampled values {s_0^{1,i}, w^{1,i}}_{i=1}^N.

    6. Go to step 1, and iterate 1-4 until the end of the sample.
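    A compact implementation of these steps (Python with numpy and scipy; a sketch only). It carries the current state s_t of each particle rather than the full histories, treats σ as a standard deviation, and the parameter values passed to it are up to the user:

```python
import numpy as np
from scipy import stats

def particle_filter_loglik(y, alpha, beta, sigma, s0, N=10000, seed=0):
    """Particle filter approximation to log p(y^T | alpha, beta, sigma) for
    s_t = alpha + beta * s_{t-1}/(1+s_{t-1}) + w_t,  w_t ~ N(0, sigma),
    y_t = s_t + v_t,  v_t ~ t(2)."""
    rng = np.random.default_rng(seed)
    s = np.full(N, float(s0))              # step 1: s_0^{0,i} = s_0
    loglik = 0.0
    for yt in y:
        w = rng.normal(0.0, sigma, N)      # step 2: draw the w shocks
        s_prop = alpha + beta * s / (1 + s) + w      # proposed states
        dens = stats.t.pdf(yt - s_prop, df=2)        # step 3: p_{t(2)}(y_t - ...)
        loglik += np.log(dens.mean())      # likelihood contribution for period t
        q = dens / dens.sum()              # step 4: relative weights
        s = rng.choice(s_prop, size=N, p=q)          # step 5: resample
    return loglik
```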

  • A Law of Large Numbers

    A law of large numbers delivers:

    p(y_1 | y^0, α, β, σ) ≃ (1/N) Σ_{i=1}^N p(y_1 | w_1^{1|0,i}, y^0, s_0^{1|0,i})

    and consequently:

    p(y^T | α, β, σ) ≃ ∏_{t=1}^T (1/N) Σ_{i=1}^N p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i})

  • Comparison with Alternative Schemes

    Deterministic algorithms: Extended Kalman Filter and derivations (Jazwinski, 1973), Gaussian Sum approximations (Alspach and Sorenson, 1972), grid-based filters (Bucy and Senne, 1974), Jacobian of the transform (Miranda and Rui, 1997).

    Tanizaki (1996).

    Simulation algorithms: Kitagawa (1987), Gordon, Salmond and Smith (1993), Mariano and Tanizaki (1995) and Geweke and Tanizaki (1999).

  • A Real Application: the Stochastic Neoclassical Growth Model

    Standard model.

    Isn't the model nearly linear?

    Yes, but:

    1. Better to begin with something easy.

    2. We will learn something nevertheless.

  • The Model

    Representative agent with utility function U = E_0 Σ_{t=0}^∞ β^t (c_t^θ (1 − l_t)^{1−θ})^{1−τ} / (1 − τ).

    One good produced according to y_t = e^{z_t} A k_t^α l_t^{1−α} with α ∈ (0, 1).

    Productivity evolves z_t = ρ z_{t-1} + ε_t, |ρ| < 1 and ε_t ~ N(0, σ_ε).

    Law of motion for capital k_{t+1} = i_t + (1 − δ) k_t.

    Resource constraint c_t + i_t = y_t.

  • Solve for c(·, ·) and l(·, ·) given initial conditions.

    Characterized by:

    U_c(t) = β E_t [ U_c(t+1) (1 − δ + α A e^{z_{t+1}} k_{t+1}^{α−1} l(k_{t+1}, z_{t+1})^{1−α}) ]

    (1 − θ)/θ · c(k_t, z_t)/(1 − l(k_t, z_t)) = (1 − α) e^{z_t} A k_t^α l(k_t, z_t)^{−α}

    A system of functional equations with no known analytical solution.

  • Solving the Model

    We need to use a numerical method to solve it.

    Different nonlinear approximations: value function iteration, perturbation, projection methods.

    We use a Finite Element Method. Why? Aruoba, Fernández-Villaverde and Rubio-Ramírez (2003):

    1. Speed: sparse system.

    2. Accuracy: flexible grid generation.

    3. Scalable.

  • Building the Likelihood Function

    Time series:

    1. Quarterly real output, hours worked and investment.

    2. Main series from the model and keep dimensionality low.

    Measurement error. Why?

    γ = (θ, ρ, τ, α, δ, β, σ_ε, σ_1, σ_2, σ_3)

  • State Space Representation

    k_t = f_1(S_{t-1}, W_t; γ)
        = e^{tanh^{-1}(λ_{t-1})} k_{t-1}^α l(k_{t-1}, tanh^{-1}(λ_{t-1}); γ)^{1−α}
          × [ 1 − (θ/(1 − θ)) (1 − α) (1 − l(k_{t-1}, tanh^{-1}(λ_{t-1}); γ)) / l(k_{t-1}, tanh^{-1}(λ_{t-1}); γ) ]
          + (1 − δ) k_{t-1}

    λ_t = f_2(S_{t-1}, W_t; γ) = tanh( ρ tanh^{-1}(λ_{t-1}) + ε_t )

    gdp_t = g_1(S_t, V_t; γ) = e^{tanh^{-1}(λ_t)} k_t^α l(k_t, tanh^{-1}(λ_t); γ)^{1−α} + V_{1,t}

    hours_t = g_2(S_t, V_t; γ) = l(k_t, tanh^{-1}(λ_t); γ) + V_{2,t}

    inv_t = g_3(S_t, V_t; γ)
        = e^{tanh^{-1}(λ_t)} k_t^α l(k_t, tanh^{-1}(λ_t); γ)^{1−α}
          × [ 1 − (θ/(1 − θ)) (1 − α) (1 − l(k_t, tanh^{-1}(λ_t); γ)) / l(k_t, tanh^{-1}(λ_t); γ) ]
          + V_{3,t}

    Likelihood Function

  • Since our measurement equation implies that

    p(y_t | S_t; γ) = (2π)^{−3/2} |Σ_v|^{−1/2} e^{−ω(S_t; γ)/2}

    where ω(S_t; γ) = (y_t − x(S_t; γ))' Σ_v^{−1} (y_t − x(S_t; γ)) ∀t, we have

    p(y^T; γ) = (2π)^{−3T/2} |Σ_v|^{−T/2} ∫ ∏_{t=1}^T [ ∫ e^{−ω(S_t; γ)/2} p(S_t | y^{t-1}, S_0; γ) dS_t ] p(S_0; γ) dS_0

        ≃ (2π)^{−3T/2} |Σ_v|^{−T/2} ∏_{t=1}^T (1/N) Σ_{i=1}^N e^{−ω(s_t^i; γ)/2}

  • Priors for the Parameters

    Priors for the Parameters of the Model

    Parameter   Distribution   Hyperparameters
    θ           Uniform        [0, 1]
    ρ           Uniform        [0, 1]
    τ           Uniform        [0, 100]
    α           Uniform        [0, 1]
    δ           Uniform        [0, 0.05]
    β           Uniform        [0.75, 1]
    σ_ε         Uniform        [0, 0.1]
    σ_1         Uniform        [0, 0.1]
    σ_2         Uniform        [0, 0.1]
    σ_3         Uniform        [0, 0.1]

  • Likelihood-Based Inference I: a Bayesian Perspective

    Define priors over parameters: truncated uniforms.

    Use a Random-walk Metropolis-Hastings to draw from the posterior.

    Find the Marginal Likelihood.


  • Likelihood-Based Inference II: a Maximum Likelihood Perspective

    We only need to maximize the likelihood.

    Difficulties to maximize with Newton-type schemes.

    Common problem in dynamic equilibrium economies.

    We use a simulated annealing scheme.

  • An Exercise with Artificial Data

    First simulate data with our model and use that data as sample.

    Pick true parameter values. Benchmark calibration values for the stochastic neoclassical growth model (Cooley and Prescott, 1995).

    Calibrated Parameters

    Parameter   θ       ρ      τ     α     δ
    Value       0.357   0.95   2.0   0.4   0.02

    Parameter   β       σ_ε     σ_1         σ_2      σ_3
    Value       0.99    0.007   1.58×10⁻⁴   0.0011   8.66×10⁻⁴

    Sensitivity: τ = 50 and σ_ε = 0.035.

  • Figure 5.1: Likelihood Function, Benchmark Calibration. [Figure: likelihood cuts at each parameter (ρ, τ, α, δ, σ_ε, β, θ), comparing the nonlinear and linear filters against the pseudotrue values.]

  • Figure 5.2: Posterior Distribution, Benchmark Calibration. [Figure: histograms of the posterior distribution of each parameter.]

  • Figure 5.3: Likelihood Function, Extreme Calibration. [Figure: likelihood cuts at each parameter, comparing the nonlinear and linear filters against the pseudotrue values.]

  • Figure 5.4: Posterior Distribution, Extreme Calibration. [Figure: histograms of the posterior distribution of each parameter.]

  • Figure 5.5: Convergence of Posteriors, Extreme Calibration. [Figure: parameter draws plotted against the number of draws for each parameter.]

  • Figure 5.6: Posterior Distribution, Real Data. [Figure: histograms of the posterior distribution of each parameter.]

  • Figure 6.1: Likelihood Function. [Figure: transversal cuts of the likelihood at α, β, ρ, and σ_ε, comparing the exact likelihood with particle filter approximations using 100, 1,000, and 10,000 particles.]

  • Figure 6.2: C.D.F., Benchmark Calibration. [Figure: empirical c.d.f.s for 10,000 to 60,000 particles.]

  • Figure 6.3: C.D.F., Extreme Calibration. [Figure: empirical c.d.f.s for 10,000 to 60,000 particles.]

  • Figure 6.4: C.D.F., Real Data. [Figure: empirical c.d.f.s for 10,000 to 60,000 particles.]

  • Posterior Distributions, Benchmark Calibration

    Parameter   Mean        s.d.
    θ           0.357       6.72×10⁻⁵
    ρ           0.950       3.40×10⁻⁴
    τ           2.000       6.78×10⁻⁴
    α           0.400       8.60×10⁻⁵
    δ           0.020       1.34×10⁻⁵
    β           0.989       1.54×10⁻⁵
    σ_ε         0.007       9.29×10⁻⁶
    σ_1         1.58×10⁻⁴   5.75×10⁻⁸
    σ_2         1.12×10⁻³   6.44×10⁻⁷
    σ_3         8.64×10⁻⁴   6.49×10⁻⁷

  • Maximum Likelihood Estimates, Benchmark Calibration

    Parameter   MLE         s.d.
    θ           0.357       8.19×10⁻⁶
    ρ           0.950       0.001
    τ           2.000       0.020
    α           0.400       2.02×10⁻⁶
    δ           0.002       2.07×10⁻⁵
    β           0.990       1.00×10⁻⁶
    σ_ε         0.007       0.004
    σ_1         1.58×10⁻⁴   0.007
    σ_2         1.12×10⁻³   0.007
    σ_3         8.63×10⁻⁴   0.005

  • Posterior Distributions, Extreme Calibration

    Parameter   Mean        s.d.
    θ           0.357       7.19×10⁻⁴
    ρ           0.950       1.88×10⁻⁴
    τ           50.00       7.12×10⁻³
    α           0.400       4.80×10⁻⁵
    δ           0.020       3.52×10⁻⁶
    β           0.989       8.69×10⁻⁶
    σ_ε         0.035       4.47×10⁻⁶
    σ_1         1.58×10⁻⁴   1.87×10⁻⁸
    σ_2         1.12×10⁻³   2.14×10⁻⁷
    σ_3         8.65×10⁻⁴   2.33×10⁻⁷

  • Maximum Likelihood Estimates, Extreme Calibration

    Parameter   MLE         s.d.
    θ           0.357       2.42×10⁻⁶
    ρ           0.950       6.12×10⁻³
    τ           50.000      0.022
    α           0.400       3.62×10⁻⁷
    δ           0.019       7.43×10⁻⁶
    β           0.990       1.00×10⁻⁵
    σ_ε         0.035       0.015
    σ_1         1.58×10⁻⁴   0.017
    σ_2         1.12×10⁻³   0.014
    σ_3         8.66×10⁻⁴   0.023

  • Convergence on Number of Particles

    Convergence, Real Data

    N        Mean        s.d.
    10000    1014.558    0.3296
    20000    1014.600    0.2595
    30000    1014.653    0.1829
    40000    1014.666    0.1604
    50000    1014.688    0.1465
    60000    1014.664    0.1347

  • Posterior Distributions, Real Data

    Parameter   Mean    s.d.
    θ           0.323   7.976×10⁻⁴
    ρ           0.969   0.008
    τ           1.825   0.011
    α           0.388   0.001
    δ           0.006   3.557×10⁻⁵
    β           0.997   9.221×10⁻⁵
    σ_ε         0.023   2.702×10⁻⁴
    σ_1         0.039   5.346×10⁻⁴
    σ_2         0.018   4.723×10⁻⁴
    σ_3         0.034   6.300×10⁻⁴

  • Maximum Likelihood Estimates, Real Data

    Parameter   MLE     s.d.
    θ           0.390   0.044
    ρ           0.987   0.708
    τ           1.781   1.398
    α           0.324   0.019
    δ           0.006   0.160
    β           0.997   8.67×10⁻³
    σ_ε         0.023   0.224
    σ_1         0.038   0.060
    σ_2         0.016   0.061
    σ_3         0.035   0.076

  • Log Marginal Likelihood Difference: Nonlinear − Linear

    p     Benchmark Calibration   Extreme Calibration   Real Data
    0.1   73.631                  117.608               93.65
    0.5   73.627                  117.592               93.55
    0.9   73.603                  117.564               93.55

  • Nonlinear versus Linear Moments, Real Data

                Real Data         Nonlinear (SMC filter)   Linear (Kalman filter)
                Mean    s.d.      Mean    s.d.             Mean    s.d.
    output      1.95    0.073     1.91    0.129            1.61    0.068
    hours       0.36    0.014     0.36    0.023            0.34    0.004
    inv         0.42    0.066     0.44    0.073            0.28    0.044

  • A Future Application: Good Luck or Good Policy?

    The U.S. economy has become less volatile over the last 20 years (Stock and Watson, 2002).

    Why?

    1. Good luck: Sims (1999), Bernanke and Mihov (1998a and 1998b) and Stock and Watson (2002).

    2. Good policy: Clarida, Gertler and Galí (2000), Cogley and Sargent (2001 and 2003), De Long (1997) and Romer and Romer (2002).

    3. Long run trend: Blanchard and Simon (2001).

  • How Has the Literature Addressed this Question?

    So far: mostly with reduced form models (usually VARs).

    But:

    1. Results difficult to interpret.

    2. How to run counterfactuals?

    3. Welfare analysis.

  • Why Not a Dynamic Equilibrium Model?

    New generation of equilibrium models: Christiano, Eichenbaum and Evans (2003) and Smets and Wouters (2003).

    Linear and Normal.

    But we can do it!!!

  • Environment

    Discrete time t = 0, 1, ...

    Stochastic process s_t ∈ S with history s^t = (s_0, ..., s_t) and probability μ(s^t).

  • The Final Good Producer

    Perfectly competitive final good producer that solves:

    max_{y_i(s^t)}  ( ∫ y_i(s^t)^θ di )^{1/θ} − ∫ p_i(s^t) y_i(s^t) di

    Demand function for each input of the form:

    y_i(s^t) = ( p_i(s^t) / p(s^t) )^{1/(θ−1)} y(s^t)

    with price aggregator:

    p(s^t) = ( ∫ p_i(s^t)^{θ/(θ−1)} di )^{(θ−1)/θ}

  • The Intermediate Good Producer

    Continuum of intermediate good producers, each one behaving as a monopolistic competitor.

    The producer of good i has access to the technology:

    y_i(s^t) = max{ e^{z(s^t)} k_i(s^{t-1})^α l_i(s^t)^{1−α}, 0 }

    Productivity z(s^t) = ρ z(s^{t-1}) + ε_z(s^t).

    Calvo pricing with indexing: prices can only be changed (before observing the current period shocks) with a fixed probability each period.

  • Consumers' Problem

    max E Σ_{t=0}^∞ β^t [ ε_c(s^t) (c(s^t) − d c(s^{t-1}))^{σ_c}/σ_c − ε_l(s^t) l(s^t)^{σ_l}/σ_l + ε_m(s^t) m(s^t)^{σ_m}/σ_m ]

    subject to

    p(s^t)(c(s^t) + x(s^t)) + M(s^t) + ∫_{s^{t+1}} q(s^{t+1}|s^t) B(s^{t+1}) ds^{t+1}
        = p(s^t)(w(s^t) l(s^t) + r(s^t) k(s^{t-1})) + M(s^{t-1}) + B(s^t) + Π(s^t) + T(s^t)

    B(s^{t+1}) ≥ B̲

    k(s^t) = (1 − δ) k(s^{t-1}) − φ(x(s^t)/k(s^{t-1})) k(s^{t-1}) + x(s^t)

  • Government Policy

    Monetary Policy: Taylor rule

    i(s^t) = r g(s^t) + a(s^t)(π(s^t) − g(s^t)) + b(s^t)(y(s^t) − y^g(s^t)) + ε_i(s^t)

    g(s^t) = ρ_g g(s^{t-1}) + ε_π(s^t)

    a(s^t) = ρ_a a(s^{t-1}) + ε_a(s^t)

    b(s^t) = ρ_b b(s^{t-1}) + ε_b(s^t)

    Fiscal Policy.

  • Stochastic Volatility I

    We can stack all shocks in one vector:

    ε(s^t) = ( ε_z(s^t), ε_c(s^t), ε_l(s^t), ε_m(s^t), ε_i(s^t), ε_π(s^t), ε_a(s^t), ε_b(s^t) )'

    Stochastic volatility:

    ε(s^t) = R(s^t)^{0.5} η(s^t)

    The matrix R(s^t) can be decomposed as:

    R(s^t) = G(s^t)^{-1} H(s^t) G(s^t)^{-1}'

  • Stochastic Volatility II

    H(s^t) (instantaneous shock variances) is diagonal with nonzero elements h_i(s^t) that evolve:

    log h_i(s^t) = ρ_h log h_i(s^{t-1}) + ς_i η_i(s^t)

    G(s^t) (loading matrix) is lower triangular, with unit entries in the diagonal and entries γ_ij(s^t) that evolve:

    γ_ij(s^t) = ρ_γ γ_ij(s^{t-1}) + ς_ij η_ij(s^t)

  • Where Are We Now?

    Solving the model: a problem with 45 state variables: physical capital, the aggregate price level, 7 shocks, the 8 elements of the matrix H(s^t), and the 28 elements of the matrix G(s^t).

    Perturbation.

    We are making good progress.