
Mixing Dynamic Linear Models

Kevin R. Keane

[email protected]

Computer Science & Engineering

University at Buffalo SUNY

Modeling label distributions on a lattice

• prior distribution of labels

  p(L)

• likelihood of observed data given a label

  p(D|L)

• posterior distribution of labels given data

  p(L|D) ∝ p(D|L) p(L)

• obtain other distributions of interest

  p(X|D) ∝ ∫ p(X|L) p(L|D) dL

A slight twist

Usual setup: model the distribution of labels at lattice sites.

This project: model the distribution of linear models at lattice sites.

Mixing Models

Why bother modeling distributions over models?

• Avoid the need to carefully specify model parameters in advance
  • use simple components
  • grid approximation spanning likely parameter values
  • let the data figure it out . . .
• To obtain better models . . .

What’s a better model?

A model with higher posterior density for quantities of interest. For example:

• parameters in a stock returns model
• parameters describing a plane
• . . .

What’s a better model?

[Figure: posterior density P(Θ | D) vs. value of Θ; the "better" model is the one placing higher posterior density at the quantity of interest.]

Bayesian framework

• prior distribution of models

  p(M)

• likelihood of observed data given a model

  p(D|M)

• posterior distribution of models given data

  p(M|D) ∝ p(D|M) p(M)

• obtain other distributions of interest

  p(X|D) ∝ ∫ p(X|M) p(M|D) dM

Drilling down further . . .

Linear regression models:

  Y = F′θ + ε

• Yi is the observed response
• Fi is the regression vector / independent variables
• θ is the regression parameter vector
• εi is the error of fit

FYI, the ordinary least squares estimate of θ is

  θ̂ = (FF′)⁻¹FY
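
A minimal NumPy sketch of this OLS estimate (an illustration, not from the slides; synthetic data, with F holding one regression vector per column so that θ̂ = (FF′)⁻¹FY applies directly):

    import numpy as np

    rng = np.random.default_rng(0)
    p, n = 3, 500                         # parameters, observations
    F = rng.normal(size=(p, n))           # design matrix, one column per observation
    theta = np.array([0.5, -1.0, 2.0])    # true regression parameter vector
    Y = F.T @ theta + rng.normal(size=n)  # Y = F'θ + ε

    # OLS: θ̂ = (F F')⁻¹ F Y, solved without forming the explicit inverse
    theta_hat = np.linalg.solve(F @ F.T, F @ Y)
    print(theta_hat)                      # ≈ [0.5, -1.0, 2.0]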

Examples of linear models

Return of a stock is a function of market, industry, and stock-specific return components:

  r = F′θ + ε,  F = [1, rM, rI]′,  θ = [α, βM, βI]′.

Points on a plane:

  0 = F′θ + ε,  F = [x, y, z, −1]′,  θ = [A, B, C, D]′.

Dynamic Linear Models

Ordinary least squares yields a single estimate θ̂ of the regression parameter vector θ for the entire data set.

• we may not have a finite data set, but rather an infinite data stream!
• we expect / permit θ to vary (slightly) as we traverse the lattice, θs ≈ θt. So, our models are dynamic.

Dynamic linear models (West & Harrison) are a generalized form of / able to represent:

• Kalman filters (Kalman)
• flexible least squares (Kalaba & Tesfatsion)
• linear dynamical systems (Bishop)
• . . .

Specifying a dynamic linear model {Ft, G, V, W}

• F′t is a row in the design matrix
  • the vector of independent variables affecting Yt
• G is the evolution matrix
  • captures deterministic changes to θ
  • θt ≈ Gθt−1
• V is the observational variance
  • a.k.a. Var(ε) in ordinary least squares
• W is the evolution variance matrix
  • captures random changes to θ
  • θt = Gθt−1 + ωt,  ωt ∼ N(0, W)
• G and W make the linear model dynamic

Specifying a dynamic linear model {Ft, G, V, W}

• The observation equation is

  Yt = F′t θt + νt,  νt ∼ N(0, V)

• The evolution equation is

  θt = Gθt−1 + ωt,  ωt ∼ N(0, W)

• The initial information is summarized

  (θ0 | D0) ∼ N(m0, C0)

• Information at time t

  Dt = {Yt, Dt−1}

Experiment #1

• Specify 3 DLMs:

  {F = 1, G = 1, W = 5.0000, V = 1}
  {F = 1, G = 1, W = 0.0500, V = 1}
  {F = 1, G = 1, W = 0.0005, V = 1}

• Generate 1000 observations YM:
  • Select a model, generate 200 observations
  • Select a model, generate 200 observations
  • . . .
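
A sketch of how these observations could be generated (assumptions: F = G = 1, V = 1, and one of the three W values drawn at random every 200 steps):

    import numpy as np

    rng = np.random.default_rng(1)
    Ws = [5.0, 0.05, 0.0005]        # evolution variances of the 3 DLMs
    V, T, block = 1.0, 1000, 200    # observation variance, length, segment size

    theta, Y, labels = 0.0, [], []
    for t in range(T):
        if t % block == 0:          # select a model every 200 observations
            W = rng.choice(Ws)
        theta += rng.normal(scale=np.sqrt(W))           # θt = θt−1 + ωt
        Y.append(theta + rng.normal(scale=np.sqrt(V)))  # Yt = θt + νt
        labels.append(W)            # ground-truth model per observation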

Generated observations Yt from 3 DLMs

[Figure: observed data Yt vs. time t for DLM {1, 1, 1, W}, with segments generated under W = 5, W = 0.05, and W = 0.0005.]

local consistency is a function of Wm . . .

Inference with a dynamic linear model {Ft, G, V, W}

• Posterior at t−1: (θt−1 | Dt−1) ∼ N(mt−1, Ct−1)

• Prior at t: (θt | Dt−1) ∼ N(at, Rt)

  at = Gmt−1,  Rt = GCt−1G′ + W

• One-step forecast at t: (Yt | Dt−1) ∼ N(ft, Qt)

  ft = F′t at,  Qt = F′t Rt Ft + V

• The one-step forecast density is the model likelihood:

  p(Yt | Dt−1, {Ft, G, V, W}) = p(D | M)

• Posterior at t: (θt | Dt) ∼ N(mt, Ct)

  et = Yt − ft,  At = Rt Ft Qt⁻¹
  mt = at + At et,  Ct = Rt − At Qt A′t
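
A compact sketch of one filtering step implementing these recursions for a univariate DLM (dlm_step is a hypothetical helper name, not from the slides):

    import numpy as np

    def dlm_step(m, C, y, F, G, V, W):
        """One filtering step; returns (m_t, C_t) and the log of the
        one-step forecast density, i.e. the model log likelihood of y."""
        a = G * m              # prior mean:        at = G mt−1
        R = G * C * G + W      # prior variance:    Rt = G Ct−1 G' + W
        f = F * a              # forecast mean:     ft = F' at
        Q = F * R * F + V      # forecast variance: Qt = F' Rt F + V
        e = y - f              # forecast error:    et = Yt − ft
        A = R * F / Q          # adaptive coeff.:   At = Rt Ft Qt⁻¹
        m_t = a + A * e        # posterior mean:    mt = at + At et
        C_t = R - A * Q * A    # posterior var.:    Ct = Rt − At Qt At'
        loglik = -0.5 * (np.log(2 * np.pi * Q) + e * e / Q)
        return m_t, C_t, loglik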

Experiment #1 continued / inference about θt

• Specify 3 DLMs:

  {F = 1, G = 1, W = 5.0000, V = 1}
  {F = 1, G = 1, W = 0.0500, V = 1}
  {F = 1, G = 1, W = 0.0005, V = 1}

• Generate 1000 observations YM:
  • Select a model, generate 200 observations
  • Select a model, generate 200 observations
  • . . .

• Compute posteriors (θt−1 | M = m, Dt−1)

Posterior mean mt from 3 DLMs

[Figure: posterior mean mt vs. time t for DLM {1, 1, 1, W}, with W = 5, 0.05, and 0.0005.]

(θt | Dt) ∼ N(mt, Ct)

Experiment #1 continued / inference about models

• Specify 3 DLMs:

  {F = 1, G = 1, W = 5.0000, V = 1}
  {F = 1, G = 1, W = 0.0500, V = 1}
  {F = 1, G = 1, W = 0.0005, V = 1}

• Generate 1000 observations YM:
  • Select a model, generate 200 observations
  • Select a model, generate 200 observations
  • . . .

• Compute posteriors (θt−1 | M = m, Dt−1)
• Assume prior p(M = m) = 1/3
• Compute likelihoods p(Dt | M = m)
• Compute posterior model probabilities p(m | Dt)
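
A sketch of how these posterior model probabilities could be computed, assuming the dlm_step helper above and the equal 1/3 priors (model_posteriors is a hypothetical name):

    import numpy as np

    def model_posteriors(Y, models, m0=0.0, C0=100.0):
        """Filter each DLM over Y; return p(m | Dt) at every step."""
        K = len(models)
        m = np.full(K, m0)
        C = np.full(K, C0)
        log_p = np.full(K, np.log(1.0 / K))  # log prior p(M = m) = log 1/3
        post = np.empty((len(Y), K))
        for t, y in enumerate(Y):
            for k, (F, G, V, W) in enumerate(models):
                m[k], C[k], ll = dlm_step(m[k], C[k], y, F, G, V, W)
                log_p[k] += ll               # accumulate log p(Dt | M = m)
            log_p -= log_p.max()             # rescale for numerical stability
            post[t] = np.exp(log_p) / np.exp(log_p).sum()
        return post

    # post = model_posteriors(Y, [(1, 1, 1, 5.0), (1, 1, 1, 0.05), (1, 1, 1, 0.0005)])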

Trailing interval model log likelihoods

[Figure: 10-day trailing log likelihood vs. time t for the three DLMs (W = 5, W = 0.05, W = 0.0005), y-axis from −10¹ to −10³ on a log scale.]

Posterior model probabilities and ground truth

[Figure: P(M | D) vs. time t for the three DLMs (W = 5, W = 0.05, W = 0.0005), plotted against the ground-truth model selections.]

Modeling interpretation / tricks

• Large W permits the state vector θ to vary abruptly in a stochastic manner
• Examples of when this is important:
  • stock price model: one company acquires another company
  • planar geometry model: a building’s plane intersects the ground plane
• Easy to obtain parameter estimates from a mixture model. For example, the evolution variance WM is obtained from the individual model variances, likelihoods, and posterior probabilities:

  WM = ∫M Wm p(Wm | m) p(m | D) dm = ∫M Wm p(m | D) dm
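
With the posterior probabilities from the model_posteriors sketch above, this estimate reduces to a weighted average over the discrete model grid (post and Ws as defined in the earlier sketches):

    # E[Wm | Dt]: mixture estimate of the evolution variance at time t
    W_M = float(np.dot(post[t], Ws))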

A statistical arbitrage application

• Montana, Triantafyllopoulos, and Tsagaris. Flexible least squares for temporal data mining and statistical arbitrage. Expert Systems with Applications, 36(2):2819–2830, 2009.
• The parameter δ = W/(W + V) controls adaptiveness; W = 0 (δ = 0) is equivalent to ordinary least squares.
• Published results for 11 constant parameter models
• Model the S&P 500 Index return as a function of the largest principal component return (score) of the underlying stocks:

  rs&p 500 = F′θ + ε,  F = [rpca],  θ = [βpca].

Experiment #2

• Data used is the log price return series for SPY and RSP
• SPY is the capitalization-weighted S&P 500 ETF
• RSP is the equal-weighted S&P 500 ETF; our proxy for Montana’s βpca

  rspy = F′θ + ε,  F = [rrsp],  θ = [βrsp].
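
A sketch of how this regression could be filtered with the dlm_step helper above; the price arrays and the V, W values here are assumptions for illustration, not from the slides:

    import numpy as np

    # spy_close, rsp_close: assumed arrays of matching daily closing prices
    r_spy = np.diff(np.log(spy_close))  # log price returns
    r_rsp = np.diff(np.log(rsp_close))

    m, C = 1.0, 100.0                   # vague prior on βrsp
    betas = []
    for F, y in zip(r_rsp, r_spy):      # Ft = rrsp, Yt = rspy
        m, C, _ = dlm_step(m, C, y, F, 1.0, 1e-4, 1e-6)  # G = 1; V, W assumed
        betas.append(m)                 # filtered estimate of βrsp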

[Figure: RSP and SPY price history, 2004–2010.]

[Figure: √W and √V over time, 2004–2010, log scale from 0.001 to 0.100.]

Experiment #2 continued

• Data used is the log price return series for SPY and RSP
• SPY is the capitalization-weighted S&P 500 ETF
• RSP is the equal-weighted S&P 500 ETF; our proxy for Montana’s βpca

  rspy = F′θ + ε,  F = [rrsp],  θ = [βrsp].

• Can a mixture model outperform the ex post best single DLM?

[Figure: cumulative performance, 2004–2010, of the individual DLMs {F, 1, 1, W}, the ex post best DLM {F, 1, 1, W = 221}, and the mixture model.]

[Figure: performance as a function of W (10–1000, log scale) for the DLMs {F, 1, 1, W}, compared with the mixture model; y-axis roughly 1.5–3.0.]

Future work

• Longer term (after this semester), generalize the one-dimensional DLM framework to permit application to images and video.
• For video, we would probably need to consider variation W in different directions, δθ/δx, δθ/δy, and δθ/δt (image coordinates x and y, frame t).

THANK YOU.