Knowledge Repn. & Reasoning
Lec #24: Approximate Inference in DBNs
UIUC CS 498: Section EA
Professor: Eyal Amir
Fall Semester 2004
(Some slides by X. Boyen & D. Koller, and by S. H. Lim;
Some slides by Doucet, de Freitas, Murphy, Russell, and H. Zhou)
Dynamic Systems
• Filtering in stochastic, dynamic systems:
  – Monitoring freeway traffic (from an autonomous driver or for traffic analysis)
  – Monitoring a patient's symptoms
• Models to deal with uncertainty and/or partial observability in dynamic systems:
  – Hidden Markov Models (HMMs), Kalman filters, etc.
  – All are special cases of Dynamic Bayesian Networks (DBNs)
Previously
• Exact DBN inference
  – Filtering
  – Smoothing
  – Projection
  – Explanation
DBN Myth
• Bayesian Network: a decomposed structure to represent the full joint distribution
• Does it imply easy decomposition for the belief state?
• No!
Tractable, approximate representation
• Exact inference in DBN is intractable
• Need an approximation
  – Maintain an approximate belief state
  – E.g. assume Gaussian processes
• Today:
  – Factored belief-state approximation [Boyen & Koller ’98]
  – Particle filtering (if time permits)
Idea
• Use a decomposable representation for the belief state (assume some independence in advance)
Problem
• What about the approximation errors?
  – They might accumulate and grow unbounded…
Contraction property
• Main result:
  – If the process is mixing, then every state transition results in a contraction of the distance between the two distributions by a constant factor
  – Since approximation errors from previous steps decrease exponentially, the overall error remains bounded indefinitely
Basic framework
• Definition 1:
  – Prior belief state: $\sigma^{(t)}[s_i] = P[\,s_i^{(t)} \mid r_{h_0}^{(0)}, \ldots, r_{h_{t-1}}^{(t-1)}\,]$
  – Posterior belief state: $\hat\sigma^{(t)}[s_i] = P[\,s_i^{(t)} \mid r_{h_0}^{(0)}, \ldots, r_{h_{t-1}}^{(t-1)}, r_{h_t}^{(t)}\,]$
• Monitoring task:
  $\sigma^{(t+1)}[s_j] = \sum_{i=1}^{n} T[s_i \to s_j]\, \hat\sigma^{(t)}[s_i]$
  $\hat\sigma^{(t+1)}[s_i] = \dfrac{\sigma^{(t+1)}[s_i]\, O[s_i, r_{h_{t+1}}]}{\sum_{l=1}^{n} \sigma^{(t+1)}[s_l]\, O[s_l, r_{h_{t+1}}]}$
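This recursion is easy to state in code. Below is a minimal sketch of the monitoring step for a single discrete process, assuming a row-stochastic transition matrix T and an observation matrix O; the names and the toy numbers are illustrative, not from the lecture.

```python
import numpy as np

def monitor_step(sigma_hat, T, O, r):
    """One monitoring step: propagate through T, then condition on observation r."""
    # Prediction: sigma^(t+1)[s_j] = sum_i T[s_i -> s_j] * sigma_hat^(t)[s_i]
    sigma = T.T @ sigma_hat
    # Conditioning: sigma_hat^(t+1)[s_i] proportional to O[s_i, r] * sigma^(t+1)[s_i]
    unnorm = O[:, r] * sigma
    return unnorm / unnorm.sum()

# Tiny example: 2-state process, binary observation
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # T[i, j] = P(s_j^(t+1) | s_i^(t))
O = np.array([[0.7, 0.3],
              [0.1, 0.9]])          # O[i, r] = P(r | s_i)
belief = np.array([0.5, 0.5])       # initial posterior belief state
for r in [0, 1, 1]:                 # an arbitrary observation sequence
    belief = monitor_step(belief, T, O, r)
print(belief)
```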
Simple contraction
• Distance measure:
  – Relative entropy (KL-divergence) between the actual and the approximate belief state:
  $D[\varphi \,\|\, \psi] = E_\varphi[\ln\varphi - \ln\psi] = \sum_i \varphi[i]\, \ln\frac{\varphi[i]}{\psi[i]}$
• Contraction due to O:
  $E_{r^{(t)}}\big[\, D[\, O_r[\sigma^{(t)}] \,\|\, O_r[\hat\sigma^{(t)}]\,] \,\big] \;\le\; D[\sigma^{(t)} \,\|\, \hat\sigma^{(t)}]$
• Contraction due to T (can we do better?):
  $D[\, T[\sigma^{(t)}] \,\|\, T[\hat\sigma^{(t)}]\,] \;\le\; D[\sigma^{(t)} \,\|\, \hat\sigma^{(t)}]$
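For concreteness, the distance measure can be computed directly from the definition. A small helper, assuming discrete distributions given as numpy arrays (illustrative, not part of the slides):

```python
import numpy as np

def kl(phi, psi):
    """Relative entropy D[phi || psi] = sum_i phi[i] * ln(phi[i] / psi[i])."""
    phi, psi = np.asarray(phi, float), np.asarray(psi, float)
    mask = phi > 0                    # terms with phi[i] = 0 contribute 0
    return float(np.sum(phi[mask] * np.log(phi[mask] / psi[mask])))

print(kl([0.6, 0.4], [0.5, 0.5]))    # ~0.02
```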
Simple contraction (cont)
• Definition:
  – Minimal mixing rate:
  $\gamma_Q = \min_{i_1, i_2} \sum_{j=1}^{n} \min\big(\, Q[j \mid i_1],\; Q[j \mid i_2] \,\big)$
• Theorem 3 (the single process contraction theorem):
  – For process Q, anterior distributions φ and ψ, ulterior distributions φ′ and ψ′,
  $D[\varphi' \,\|\, \psi'] \;\le\; (1 - \gamma_Q)\, D[\varphi \,\|\, \psi]$
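A quick way to see the theorem in action is to compute γ_Q from the definition and compare the two sides of the bound numerically. The random transition model and distributions below are arbitrary examples, not from the slides:

```python
import numpy as np

def mixing_rate(Q):
    """gamma_Q = min_{i1,i2} sum_j min(Q[j|i1], Q[j|i2]); rows of Q are P(.|i)."""
    n = Q.shape[0]
    return min(np.minimum(Q[i1], Q[i2]).sum()
               for i1 in range(n) for i2 in range(n))

def kl(phi, psi):
    m = phi > 0
    return float(np.sum(phi[m] * np.log(phi[m] / psi[m])))

rng = np.random.default_rng(0)
Q = rng.dirichlet(np.ones(3), size=3)      # random 3-state transition model
phi = rng.dirichlet(np.ones(3))            # "actual" anterior distribution
psi = rng.dirichlet(np.ones(3))            # approximate anterior distribution
gamma = mixing_rate(Q)

# Theorem 3: D[phi' || psi'] <= (1 - gamma_Q) * D[phi || psi]
phi_next, psi_next = phi @ Q, psi @ Q
print(kl(phi_next, psi_next), (1 - gamma) * kl(phi, psi))
```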
Simple contraction (cont)
• Proof Intuition:
Compound processes
• Mixing rate could be very small for large processes
• The trick is to assume some independence among subprocesses and factor the DBN along these subprocesses
• Fully independent subprocesses:
  – Theorem 5:
    • For L independent subprocesses T1, …, TL, let γ_l be the mixing rate for T_l and let γ = min_l γ_l. Let φ and ψ be distributions over S_1^(t), …, S_L^(t), and assume that ψ renders the S_l^(t) marginally independent. Then:
      $D[\varphi' \,\|\, \psi'] \;\le\; (1 - \gamma)\, D[\varphi \,\|\, \psi]$
Compound processes (cont)
• Conditionally independent subprocesses
• Theorem 6 (the main theorem):
  – For L independent subprocesses T1, …, TL, assume each process depends on at most r others, and each influences at most q others. Let γ_l be the mixing rate for T_l and let γ = min_l γ_l. Let φ and ψ be distributions over S_1^(t), …, S_L^(t), and assume that ψ renders the S_l^(t) marginally independent. Then:
      $D[\varphi' \,\|\, \psi'] \;\le\; (1 - \gamma^*)\, D[\varphi \,\|\, \psi], \quad \text{where } \gamma^* = \gamma^{r} / q$
Efficient, approximate monitoring
• If each approximation incurs an error bounded by ε, then:
  – Total error ≤ ε + ε(1 − γ*) + ε(1 − γ*)² + … = ε / γ*
• => error remains bounded
• Conditioning on observations might introduce momentary errors, but the expected error will contract
Approximate DBN monitoring
• Algorithm (based on standard clique tree inference):
  1. Construct a clique tree from the 2-TBN
  2. Initialize the clique tree with conditional probabilities from the CPTs of the DBN
  3. For each time step:
     a. Create a working copy Y of the tree. Create σ^(t+1).
     b. For each subprocess l, incorporate the marginal σ^(t)[X_l^(t)] in the appropriate factor in Y.
     c. Incorporate evidence r^(t+1) in Y.
     d. Calibrate the potentials in Y.
     e. For each l, query Y for the marginal over X_l^(t+1) and store it in σ^(t+1).
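Stripped of the clique-tree machinery, the essence of the algorithm is: rebuild an approximate joint from the factored belief, do one exact propagation/conditioning step, and project back onto per-subprocess marginals. The sketch below illustrates that projection idea for two binary subprocesses; all model arrays are made-up examples, not the lecture's implementation.

```python
import numpy as np

def bk_step(marginals, T_joint, O_joint, r):
    """One Boyen-Koller-style step for two binary subprocesses.

    marginals: [p(X1), p(X2)], each a length-2 array (the factored belief).
    T_joint:   4x4 joint transition matrix, T_joint[i, j] = P(x'=j | x=i).
    O_joint:   4x|R| observation matrix, O_joint[i, r] = P(r | x=i).
    """
    # 1. Reconstruct an approximate joint belief from the factored belief
    joint = np.outer(marginals[0], marginals[1]).ravel()
    # 2. One exact step: propagate through T_joint, condition on observation r
    joint = T_joint.T @ joint
    joint = joint * O_joint[:, r]
    joint /= joint.sum()
    # 3. Project back onto the marginals of each subprocess (the approximation)
    joint = joint.reshape(2, 2)
    return [joint.sum(axis=1), joint.sum(axis=0)]

# Toy usage with random made-up models
rng = np.random.default_rng(1)
T_joint = rng.dirichlet(np.ones(4), size=4)
O_joint = rng.dirichlet(np.ones(2), size=4)
belief = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]
for r in [0, 1, 0]:
    belief = bk_step(belief, T_joint, O_joint, r)
print(belief)
```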
Conclusion of Factored DBNs
• Accuracy-efficiency tradeoff:
  – Small partition =>
    • Faster inference
    • Better contraction
    • Worse approximation
• Key to good approximation:
  – Discover weak/sparse interactions among subprocesses and factor the DBN along these lines
  – Domain knowledge helps
Agenda
• Factored inference in DBNs
• Sampling: Particle Filtering
A sneak peek at particle filtering
Introduction
• Analytical methods:
  – Kalman filter: linear-Gaussian models
  – HMM: models with finite state space
• Statistical approximation methods for non-parametric distributions and large discrete DBNs
• Different names:
  – Sequential Monte Carlo (Handschin and Mayne 1969, Akashi and Kumamoto 1975)
  – Particle filtering (Doucet et al. 1997)
  – Survival of the fittest (Kanazawa, Koller and Russell 1995)
  – Condensation in computer vision (Isard and Blake 1996)
Outline
• Importance Sampling (IS) revisited
  – Sequential IS (SIS)
  – Particle Filtering = SIS + Resampling
• Dynamic Bayesian Networks
  – A simple example: the ABC network
• Inference in DBNs:
  – Exact inference
  – Pure particle filtering
  – Rao-Blackwellised PF
• Demonstration in the ABC network
• Discussions
Importance Sampling Revisited
• Goal: evaluate the functional
  $E[\,f_k(x_{0:k}) \mid y_{1:k}\,] = \int f_k(x_{0:k})\, p(x_{0:k} \mid y_{1:k})\, dx_{0:k}$
• Importance Sampling (batch mode):
  – Sample from an importance (proposal) distribution $\pi(x_{0:k} \mid y_{1:k})$
  – Assign $w_k^{(i)} \propto p(x_{0:k}^{(i)} \mid y_{1:k}) \,/\, \pi(x_{0:k}^{(i)} \mid y_{1:k})$ as the weight of each sample
  – The posterior estimate of the functional is the weighted average $\sum_i w_k^{(i)} f_k(x_{0:k}^{(i)})$
• How to make it sequential?
• Choose the importance function to factorize as
  $\pi(x_{0:k} \mid y_{1:k}) = \pi(x_{0:k-1} \mid y_{1:k-1})\, \pi(x_k \mid x_{0:k-1}, y_{1:k})$
• We get the SIS filter
• Benefit of SIS:
  – Observations $y_k$ don't have to be given in batch
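A minimal sketch of the SIS recursion, taking the transition prior as the importance function so the weight update reduces to multiplying by the observation likelihood; the model functions sample_prior, sample_transition, and obs_lik are placeholders, not from the lecture:

```python
import numpy as np

def sis(ys, n_particles, sample_prior, sample_transition, obs_lik):
    """Sequential importance sampling, transition prior as the importance function."""
    particles = sample_prior(n_particles)             # x_0^(i)
    weights = np.full(n_particles, 1.0 / n_particles)
    for y in ys:                                       # observations consumed one at a time
        particles = sample_transition(particles)       # x_k^(i) ~ p(x_k | x_{k-1}^(i))
        weights = weights * obs_lik(y, particles)      # w_k ∝ w_{k-1} * p(y_k | x_k^(i))
        weights = weights / weights.sum()
        yield particles, weights                       # weighted particle approximation at time k
```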
Sequential Importance Sampling
Resampling
• Why we need to resample:
  – Degeneracy of SIS
    • The variance of the importance weights (with y_{0:k} a random variable) increases with each recursion step
  – Optimal importance function
    • Need to sample from $p(x_k \mid x_{k-1}, y_k)$ and evaluate $p(y_k \mid x_{k-1})$
• Resampling: eliminate small weights and concentrate on large weights
Resampling
• Measure of degeneracy: effective sample size
Resampling Step
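A common implementation (an illustration, not the lecture's code) monitors the effective sample size and resamples when it drops below a threshold such as half the number of particles:

```python
import numpy as np

def effective_sample_size(weights):
    """N_eff ≈ 1 / sum_i (w^(i))^2; equals N for uniform weights, 1 when degenerate."""
    return 1.0 / np.sum(weights ** 2)

def resample(particles, weights, rng):
    """Multinomial resampling: duplicate heavy particles, drop light ones."""
    n = len(weights)
    idx = rng.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n)    # weights reset to uniform

# Typical use inside a filtering loop (the n/2 threshold is a common heuristic):
# if effective_sample_size(weights) < n_particles / 2:
#     particles, weights = resample(particles, weights, rng)
```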
Particle filtering = SIS + Resampling
Rao-Blackwellisation for SIS
• A method to reduce the variance of the final posterior estimation
• Useful when the state can be partitioned as $x_k = (x_k^1, x_k^2)$, in which $x_k^2$ can be analytically marginalized
• Assuming $p(x_k^2 \mid x_{0:k}^1, y_{1:k})$ can be evaluated analytically given $x_{0:k}^1$, one can rewrite the posterior estimate as an expectation over $x_{0:k}^1$ only, with the $x^2$ part handled in closed form
Example: ABC network
Inference in DBN
Exact inference in ABC network
Particle filtering
Rao-Blackwellised PF
Rao-Blackwellised PF (2)
Rao-Blackwellised PF (3)
Rao-Blackwellised PF (4)
Discussions
• Structure of the network:
  – A, C dependent on B
  – y_t can also be separated into 3 independent parts
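As a rough sketch of how a Rao-Blackwellised filter could exploit this structure: sample only the B chain with particles, and conditioned on each sampled B value update the A and C beliefs exactly as small HMMs. All model arrays and names below are illustrative assumptions, not the lecture's implementation.

```python
import numpy as np

def rbpf_step(b_particles, a_beliefs, c_beliefs, weights, obs, model, rng):
    """One RBPF step for an ABC-style network: B is sampled, A and C are exact.

    b_particles: array of B states, one per particle.
    a_beliefs, c_beliefs: per-particle exact beliefs over A and C.
    model: dict of placeholder tables: T_B[b] = P(b'|b); T_A[b'], T_C[b'] =
           transitions of A, C given the sampled B; O_A, O_B, O_C = observation models.
    """
    y_a, y_b, y_c = obs
    for i in range(len(b_particles)):
        # Sample only the B subprocess with the particle filter
        b_particles[i] = rng.choice(len(model['T_B']), p=model['T_B'][b_particles[i]])
        # Conditioned on the sampled B, A and C decouple: exact HMM updates
        a = (model['T_A'][b_particles[i]].T @ a_beliefs[i]) * model['O_A'][:, y_a]
        c = (model['T_C'][b_particles[i]].T @ c_beliefs[i]) * model['O_C'][:, y_c]
        # Weight the particle by the evidence it explains
        weights[i] *= model['O_B'][b_particles[i], y_b] * a.sum() * c.sum()
        a_beliefs[i], c_beliefs[i] = a / a.sum(), c / c.sum()
    weights /= weights.sum()
    return b_particles, a_beliefs, c_beliefs, weights
```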