Mixing Dynamic Linear Models
Kevin R. Keane
Computer Science & Engineering
University at Buffalo SUNY
Modeling label distributions on a lattice
- prior distribution of labels: p(L)
- likelihood of observed data given a label: p(D|L)
- posterior distribution of labels given data: p(L|D) ∝ p(D|L) p(L)
- obtain other distributions of interest: p(X|D) ∝ ∫ p(X|L) p(L|D) dL
Mixing Dynamic Linear Models – 1/33
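As a toy illustration of this update at a single lattice site (the label set and all probability values below are hypothetical, chosen only to show the mechanics):

```python
import numpy as np

# Hypothetical label set and probabilities for a single lattice site.
labels = ["road", "building", "sky"]
prior = np.array([0.5, 0.3, 0.2])        # p(L)
likelihood = np.array([0.1, 0.7, 0.2])   # p(D|L), data observed at the site

# Bayes' rule: p(L|D) ∝ p(D|L) p(L), normalized over the label set.
posterior = likelihood * prior
posterior /= posterior.sum()
print(dict(zip(labels, posterior.round(3))))  # "building" is most probable
```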
A slight twist
Usual setup: model the distribution of labels at lattice sites.
This project: model the distribution of linear models at lattice sites.
Mixing Models
Why bother modeling distributions over models?
- To avoid the need to carefully specify model parameters in advance:
  - use simple components
  - grid approximation spanning likely parameter values
  - let the data figure it out . . .
- To obtain better models . . .
What’s a better model?
A model with higher posterior density for quantities of interest. For example:
- parameters in a stock returns model
- parameters describing a plane
- . . .
What’s a better model?

[Figure: posterior density P(Θ|D) against the value of Θ; the "better" model is the one with higher posterior density]
Bayesian framework
- prior distribution of models: p(M)
- likelihood of observed data given a model: p(D|M)
- posterior distribution of models given data: p(M|D) ∝ p(D|M) p(M)
- obtain other distributions of interest: p(X|D) ∝ ∫ p(X|M) p(M|D) dM
Drilling down further . . .
Linear regression models
Y = F ′θ + ε
- Yi is the observed response
- Fi is the regression vector / independent variables
- θ is the regression parameter vector
- εi is the error of fit

FYI, the ordinary least squares estimate of θ is θ̂ = (FF′)⁻¹FY.
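As a quick illustration with synthetic data (the dimensions, seed, and true θ below are made up), the estimate can be computed directly:

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed

# Synthetic data: p = 3 regressors, n = 100 observations.  F is stored
# p x n so that F F' and F Y line up with the formula above.
p, n = 3, 100
F = rng.normal(size=(p, n))
theta_true = np.array([1.0, -2.0, 0.5])  # made-up true parameters
Y = F.T @ theta_true + 0.1 * rng.normal(size=n)

# OLS: theta_hat = (F F')^{-1} F Y, via a linear solve rather than an
# explicit matrix inverse (numerically preferable).
theta_hat = np.linalg.solve(F @ F.T, F @ Y)
print(theta_hat.round(2))  # close to theta_true
```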
Examples of linear models
Return of a stock is a function of market, industry and stock specific return components:

r = F′θ + ε,  F = [1, rM, rI]′,  θ = [α, βM, βI]′.

Points on a plane:

0 = F′θ + ε,  F = [x, y, z, −1]′,  θ = [A, B, C, D]′.
Dynamic Linear Models
Ordinary least squares yields a single estimate θ̂ of the regression parameter vector θ for the entire data set. However:
- we may not have a finite data set, but rather an infinite data stream!
- we expect / permit θ to vary (slightly) as we traverse the lattice, θs ≈ θt. So, our models are dynamic.

Dynamic linear models (West & Harrison) are a generalized form of / are able to represent:
- Kalman filters (Kalman)
- flexible least squares (Kalaba & Tesfatsion)
- linear dynamical systems (Bishop)
- . . .
Specifying a dynamic linear model {Ft, G, V,W}
- F′t is a row in the design matrix
  - the vector of independent variables affecting Yt
- G is the evolution matrix
  - captures deterministic changes to θ
  - θt ≈ Gθt−1
- V is the observational variance
  - a.k.a. Var(ε) in ordinary least squares
- W is the evolution variance matrix
  - captures random changes to θ
  - θt = Gθt−1 + ωt, ωt ∼ N(0,W)
- G and W make the linear model dynamic
Specifying a dynamic linear model {Ft, G, V,W}
- The observation equation is Yt = F′tθt + νt, νt ∼ N(0, V)
- The evolution equation is θt = Gθt−1 + ωt, ωt ∼ N(0,W)
- The initial information is summarized as (θ0|D0) ∼ N(m0, C0)
- Information at time t: Dt = {Yt, Dt−1}
Experiment #1
- Specify 3 DLMs:
  {F = 1, G = 1, W = 5.0000, V = 1}
  {F = 1, G = 1, W = 0.0500, V = 1}
  {F = 1, G = 1, W = 0.0005, V = 1}
- Generate 1000 observations YM
  - Select a model, generate 200 observations
  - Select a model, generate 200 observations
  - . . .
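The generation procedure above can be sketched as follows (the seed and the uniform block-selection rule are illustrative; F = G = 1 and V = 1 as specified on the slide):

```python
import numpy as np

rng = np.random.default_rng(1)  # illustrative seed

# The three evolution variances from the slide; F = G = 1 and V = 1 throughout.
W_values = [5.0, 0.05, 0.0005]

T, block = 1000, 200
theta, Y, model_id = 0.0, [], []
for t in range(T):
    if t % block == 0:
        m = int(rng.integers(len(W_values)))  # select a model for this block
    # Evolution: theta_t = theta_{t-1} + w_t,  w_t ~ N(0, W_m)
    theta += rng.normal(scale=np.sqrt(W_values[m]))
    # Observation: Y_t = theta_t + nu_t,  nu_t ~ N(0, V = 1)
    Y.append(theta + rng.normal(scale=1.0))
    model_id.append(m)

Y = np.array(Y)
```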
Generated observations Yt from 3 DLMs
[Figure: observed data Yt against time t for DLM {1,1,1,W}, W ∈ {5, .05, .0005}]

local consistency is a function of Wm . . .
Inference with a dynamic linear model {Ft, G, V,W}

- Posterior at t−1: (θt−1|Dt−1) ∼ N(mt−1, Ct−1)
- Prior at t: (θt|Dt−1) ∼ N(at, Rt)
  at = Gmt−1,  Rt = GCt−1G′ + W
- One-step forecast at t: (Yt|Dt−1) ∼ N(ft, Qt)
  ft = F′tat,  Qt = F′tRtFt + V
- The one-step forecast density is the model likelihood:
  p(Yt|Dt−1, {Ft, G, V,W}) = p(D|M)
- Posterior at t: (θt|Dt) ∼ N(mt, Ct)
  et = Yt − ft,  At = RtFtQt⁻¹
  mt = at + Atet,  Ct = Rt − AtQtA′t
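The recursions above translate directly into code. A minimal univariate sketch (scalar F, G, V, W; diffuse initial prior, with all default values illustrative):

```python
import numpy as np

def dlm_filter(Y, F=1.0, G=1.0, V=1.0, W=1.0, m0=0.0, C0=1e6):
    """Univariate DLM filtering recursions (scalar F, G, V, W).

    Returns posterior means m_t, posterior variances C_t, and the
    one-step forecast log-likelihoods log p(Y_t | D_{t-1})."""
    m, C = m0, C0
    ms, Cs, loglik = [], [], []
    for y in Y:
        a = G * m                    # prior mean:      a_t = G m_{t-1}
        R = G * C * G + W            # prior variance:  R_t = G C_{t-1} G' + W
        f = F * a                    # forecast mean:   f_t = F' a_t
        Q = F * R * F + V            # forecast var:    Q_t = F' R_t F + V
        e = y - f                    # forecast error:  e_t = Y_t - f_t
        A = R * F / Q                # adaptive coeff:  A_t = R_t F_t / Q_t
        m = a + A * e                # posterior mean:  m_t = a_t + A_t e_t
        C = R - A * Q * A            # posterior var:   C_t = R_t - A_t Q_t A_t'
        ms.append(m)
        Cs.append(C)
        loglik.append(-0.5 * (np.log(2 * np.pi * Q) + e * e / Q))
    return np.array(ms), np.array(Cs), np.array(loglik)

# Tracking a constant signal: the posterior mean settles on the level.
ms, Cs, ll = dlm_filter(np.full(20, 5.0), W=0.01)
print(round(float(ms[-1]), 3))  # 5.0
```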
Experiment #1 continued / inference about θt
- Specify 3 DLMs:
  {F = 1, G = 1, W = 5.0000, V = 1}
  {F = 1, G = 1, W = 0.0500, V = 1}
  {F = 1, G = 1, W = 0.0005, V = 1}
- Generate 1000 observations YM
  - Select a model, generate 200 observations
  - Select a model, generate 200 observations
  - . . .
- Compute posteriors (θt−1|M = m, Dt−1)
Posterior mean mt from 3 DLMs
[Figure: posterior mean mt against time t for DLM {1,1,1,W}, W ∈ {5, .05, .0005}]

(θt|Dt) ∼ N(mt, Ct)
Experiment #1 continued / inference about models
- Specify 3 DLMs:
  {F = 1, G = 1, W = 5.0000, V = 1}
  {F = 1, G = 1, W = 0.0500, V = 1}
  {F = 1, G = 1, W = 0.0005, V = 1}
- Generate 1000 observations YM
  - Select a model, generate 200 observations
  - Select a model, generate 200 observations
  - . . .
- Compute posteriors (θt−1|M = m, Dt−1)
- Assume prior p(M = m) = 1/3
- Compute likelihoods (Dt|M = m)
- Compute posterior model probabilities p(m|Dt)
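One way to carry out the last two steps is to accumulate each model's one-step forecast log-likelihoods and normalize against the flat prior. A sketch under the slide's setup (the data below are synthetic, generated here from the W = 0.05 model with an illustrative seed):

```python
import numpy as np

def dlm_loglik(Y, W, V=1.0, m0=0.0, C0=1e6):
    """One-step forecast log-likelihoods for a {F=1, G=1, V, W} DLM."""
    m, C, out = m0, C0, []
    for y in Y:
        R = C + W                # prior variance
        Q = R + V                # forecast variance
        e = y - m                # forecast error
        out.append(-0.5 * (np.log(2 * np.pi * Q) + e * e / Q))
        A = R / Q
        m, C = m + A * e, R - A * Q * A
    return np.array(out)

# Synthetic data drawn from the W = 0.05 model.
rng = np.random.default_rng(2)
theta = np.cumsum(rng.normal(scale=np.sqrt(0.05), size=300))
Y = theta + rng.normal(size=300)

W_values = [5.0, 0.05, 0.0005]
log_prior = np.log(1 / 3)  # flat prior p(M = m) = 1/3

# log p(m | D_t) up to a constant: log prior + cumulative log-likelihood.
log_post = np.array([log_prior + dlm_loglik(Y, W).sum() for W in W_values])
post = np.exp(log_post - log_post.max())
post /= post.sum()  # posterior model probabilities p(m | D_t)
print(dict(zip(W_values, post.round(4))))
```

With enough data, the posterior probability concentrates on the generating model (W = 0.05 here).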
Trailing interval model log likelihoods
[Figure: trailing 10-day log likelihood against time t for W = 5, W = .05, and W = .0005]
Posterior model probabilities and ground truth
[Figure: posterior model probabilities P(M|D) against time t for W = 5, W = .05, and W = .0005]
Modeling interpretation / tricks
- Large W permits the state vector θ to vary abruptly in a stochastic manner
- Examples of when this is important:
  - stock price model: one company acquires another company
  - planar geometry model: a building's plane intersects the ground plane
- It is easy to obtain parameter estimates from a mixture model. For example, the evolution variance WM is obtained from the individual model variances, likelihoods, and posterior probabilities:

  WM = ∫M Wm p(Wm|m) p(m|D) dm = ∫M Wm p(m|D) dm
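For instance, with the three candidate W values and assumed posterior model probabilities (illustrative numbers, not computed from data), the mixture evolution variance is just the probability-weighted average:

```python
import numpy as np

# Candidate evolution variances and assumed posterior model probabilities.
W_values = np.array([5.0, 0.05, 0.0005])
p_m_given_D = np.array([0.05, 0.90, 0.05])

# Mixture evolution variance: W_M = sum_m W_m p(m | D)
W_M = float(W_values @ p_m_given_D)
print(W_M)  # 0.295025
```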
A statistical arbitrage application
- Montana, Triantafyllopoulos, and Tsagaris. Flexible least squares for temporal data mining and statistical arbitrage. Expert Systems with Applications, 36(2):2819-2830, 2009.
- The parameter δ = W/(W+V) controls adaptiveness. W = 0, δ = 0 is equivalent to ordinary least squares.
- Published results for 11 constant parameter models
- Model the S&P 500 Index return as a function of the largest principal component return (score) of the underlying stocks:

  rs&p500 = F′θ + ε,  F = [rpca],  θ = [βpca].
Experiment #2
- Data used is the log price return series for SPY and RSP
  - SPY is the capitalization weighted S&P 500 ETF
  - RSP is the equal-weighted S&P 500 ETF; our proxy for Montana's βpca

  rspy = F′θ + ε,  F = [rrsp],  θ = [βrsp].
[Figures: RSP and SPY price series, 2004-2010; estimated sqrt(W) and sqrt(V) over time on a log scale]
Experiment #2 continued
- Data used is the log price return series for SPY and RSP
  - SPY is the capitalization weighted S&P 500 ETF
  - RSP is the equal-weighted S&P 500 ETF; our proxy for Montana's βpca

  rspy = F′θ + ε,  F = [rrsp],  θ = [βrsp].

- Can a mixture model outperform the ex post best single DLM?
[Figures: cumulative performance, 2004-2010, of the best DLM {F,1,1,W=221}, the family of DLMs {F,1,1,W}, and the mixture model; a performance measure for DLMs {F,1,1,W} across W (10 to 1000, log scale), with the mixture model shown for comparison]
Future work
- Longer term (after this semester), generalize the one-dimensional DLM framework to permit application to images and video.
- For video, we would probably need to consider variation W in different directions: ∂θ/∂x, ∂θ/∂y, and ∂θ/∂t (image coordinates x and y, frame t).
THANK YOU.