


WHY ARE DBNs SPARSE?
Shaunak Chatterjee and Stuart Russell, UC Berkeley

Dynamic Bayesian Networks (DBNs)

What are DBNs? DBNs are a flexible and effective tool for representing and reasoning about stochastic systems that evolve over time. Special cases include hidden Markov models (HMMs), factorial HMMs, hierarchical HMMs, discrete-time Kalman filters and several other families of discrete-time models.

The stochastic system’s state is represented by a set of variables X_t for each time t ≥ 0, and the DBN represents the joint distribution over the variables {X_0, X_1, X_2, …}. Typically, it is assumed that the system’s dynamics do not change over time, so the joint distribution is captured by a 2-TBN (2-Timeslice Bayesian Network), which is a compact graphical representation of the state prior p(X_0) and the stochastic dynamics p(X_{t+1} | X_t).

Structured Dynamics: The dynamics are represented in factored form via a collection of local conditional models p(X^i_{t+1} | ∏(X^i_{t+1})), where ∏(X^i_{t+1}) are the parent variables of X^i_{t+1} in slice t or t+1.
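To make the factored representation concrete, here is a minimal sketch (not from the poster) of a 2-TBN with two binary variables; the rain/umbrella naming and all CPT numbers are illustrative assumptions.

```python
# Minimal sketch of a 2-TBN with factored dynamics. The two binary variables
# and all CPT numbers are illustrative assumptions, not taken from the poster.
import numpy as np

prior_rain = np.array([0.5, 0.5])            # p(Rain_0)

# Local conditional models of the 2-TBN:
cpt_rain = np.array([[0.7, 0.3],             # p(Rain_{t+1} | Rain_t)
                     [0.3, 0.7]])
cpt_umbrella = np.array([[0.8, 0.2],         # p(Umbrella_{t+1} | Rain_{t+1})
                         [0.1, 0.9]])

def step(p_rain):
    """Advance the factored belief state by one timestep."""
    p_rain_next = p_rain @ cpt_rain
    p_umbrella_next = p_rain_next @ cpt_umbrella
    return p_rain_next, p_umbrella_next

p_rain = prior_rain
for _ in range(3):
    p_rain, p_umbrella = step(p_rain)
print(p_rain, p_umbrella)
```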

Inference in DBNs: Exact inference is tractable for a few special cases, namely HMMs and Kalman filter models. For general DBNs, the computational complexity of exact inference is exponential in the number of variables for a large enough time horizon (Murphy, 2002).

Approximate inference is much more popular; the Boyen-Koller (BK) algorithm and particle filtering have been widely used.
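As a hedged illustration of one of these approximate methods, the sketch below runs a basic bootstrap particle filter on a toy two-state model; the transition and emission matrices are assumptions, and this is not the Boyen-Koller algorithm or the paper's code.

```python
# Bootstrap particle filter on a toy 2-state model (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
T = np.array([[0.7, 0.3], [0.3, 0.7]])   # p(x_{t+1} | x_t)
E = np.array([[0.8, 0.2], [0.1, 0.9]])   # p(obs | x)

def particle_filter(observations, n_particles=1000):
    particles = rng.integers(0, 2, size=n_particles)          # uniform prior
    for obs in observations:
        # Propagate each particle through the transition model
        particles = np.array([rng.choice(2, p=T[x]) for x in particles])
        # Weight by the observation likelihood, then resample
        weights = E[particles, obs]
        weights = weights / weights.sum()
        particles = rng.choice(particles, size=n_particles, p=weights)
    # Filtered marginal p(x_t | obs_{1:t})
    return np.bincount(particles, minlength=2) / n_particles

print(particle_filter([1, 1, 0, 1]))
```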

Structure learning for DBNs has also been studied (Friedman et al., 1998). However, to date, DBNs are mostly constructed by hand.

Applications: DBNs have been extensively used in:
• Speech processing
• Traffic modeling
• Modeling gene expression data
• Figure tracking
and in numerous other applications.

Definitions

Timescale: The timescale of a variable is the expected number of timesteps for which it stays in its current state (for a discrete state space).

In a general DBN, let ∏(X_{t+1}) denote the parents of X_{t+1} in the 2-TBN, excluding X_t. Let p^k_{i,j} = p(X_{t+1} = j | X_t = i, ∏(X_{t+1}) = k).

T^X_{i,k} = 1 / (1 - p^k_{i,i}) is the timescale of X in state i when its parents are in state k.

l_X = min_{i,k} T^X_{i,k};  h_X = max_{i,k} T^X_{i,k}
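A small sketch of these definitions; the CPT layout cpt[k, i, j] = p(X_{t+1} = j | X_t = i, ∏(X_{t+1}) = k) is my assumption, not the paper's.

```python
# Timescale bounds l_X and h_X from a CPT laid out as
# cpt[k, i, j] = p(X_{t+1}=j | X_t=i, parents=k). The layout is an assumption.
import numpy as np

def timescale_bounds(cpt):
    stay = np.diagonal(cpt, axis1=1, axis2=2)   # p^k_{i,i} for every (k, i)
    T = 1.0 / (1.0 - stay)                      # T_{i,k} = 1 / (1 - p^k_{i,i})
    return T.min(), T.max()                     # (l_X, h_X)

# A variable that keeps its state with probability 0.99 in every configuration
slow_cpt = np.array([[[0.99, 0.01],
                      [0.01, 0.99]]])
print(timescale_bounds(slow_cpt))               # approximately (100.0, 100.0)
```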

In a DBN with two variables X and Y, if l_X >> h_Y then Y is a fast variable with respect to X. The timescale separation between X and Y is given by the ratio l_X / h_Y.

For a cluster of variables C = {X_1, …, X_n}, the timescale bounds are defined by l_C = min_{X_i ∈ C} l_{X_i} and h_C = max_{X_i ∈ C} h_{X_i}.

Larger timescale separations result in more accurate models for larger timesteps.

Stationary distribution: When l_X >> h_Y, the stationary distribution of Y given X = k is the limiting distribution of Y if X is “frozen” at k.

This is the steady-state approximation of Y and is also referred to as the equilibrium distribution.
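A minimal sketch of computing this frozen-parent stationary distribution by repeated multiplication; the transition matrix for Y given X = k is an assumed example.

```python
# Stationary distribution of a fast variable Y with its parent X frozen at k.
# The transition matrix is an illustrative example, not from the poster.
import numpy as np

def stationary_distribution(P, iters=1000):
    """Left fixed point pi = pi @ P of a row-stochastic matrix, by iteration."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P
    return pi

P_y_given_xk = np.array([[0.4, 0.6],     # p(Y_{t+1} | Y_t, X = k)
                         [0.5, 0.5]])
print(stationary_distribution(P_y_given_xk))   # roughly [0.4545, 0.5455]
```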

∂-model to ∆-model Conversion: Topology-changing rules

Marginalize out time slices t+1 and t+2

Stochastic differential equations (SDEs) to DBNs
• SDEs describe the stochastic dynamics of a system over an infinitesimally small timestep
• DBNs are approximate representations of the SDEs over a finite timestep
• Approximate, since the exact model created by integrating the SDE over a finite timestep would result in a completely connected DBN
• Most DBNs modeling real-life stochastic processes are sparse
• The sparsity of these DBNs makes them more tractable
• They are designed by humans who make implicit approximations

How large a timestep?
• A critical decision in the design of a DBN
• Small enough that the fastest-changing variable has a small probability of changing state
• This could result in gross inefficiency!
• A large timestep would be very efficient

Key Questions
1. How to choose an appropriate ∆?
2. Topology of the ∆-model: are there any generally applicable rules?
3. Characterize the approximation error

Approximation scheme: Consider the 2-variable DBN where s is a slow variable and f is a fast variable (w.r.t. s). ∂ denotes a short timestep and ∆ denotes a long timestep.

[Figure: the ∂-model, the exact ∆-model, and the approximate ∆-model]

In the approximate ∆-model:
• the s → f link is replaced by the stationary distribution of f for a fixed s
• the s → s link is replaced by ( P(s_{t+1} | s_t) )^{∆/∂}
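A hedged sketch of this construction for the two-variable case; the CPT layouts (a single matrix P_ss for the slow variable, and one fast-variable transition matrix per value of s) and all numbers are assumptions.

```python
# Building the approximate ∆-model from a ∂-model (sketch, assumed layouts).
import numpy as np

def stationary(P, iters=1000):
    """Left fixed point pi = pi @ P by repeated multiplication."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P
    return pi

def approximate_delta_model(P_ss, P_f_given_s, steps):
    """∆ = steps * ∂. Slow link: matrix power; fast link: stationary distribution."""
    P_ss_delta = np.linalg.matrix_power(P_ss, steps)   # ( P(s_{t+1}|s_t) )^{∆/∂}
    pi_f_given_s = np.array([stationary(P) for P in P_f_given_s])
    return P_ss_delta, pi_f_given_s

P_ss = np.array([[0.99, 0.01],            # slow variable: rarely switches state
                 [0.02, 0.98]])
P_f_given_s = np.array([[[0.3, 0.7], [0.6, 0.4]],   # fast dynamics when s = 0
                        [[0.8, 0.2], [0.7, 0.3]]])  # fast dynamics when s = 1
print(approximate_delta_model(P_ss, P_f_given_s, steps=60))
```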

Error Characterization: If the conditional probabilities of the ∂-model have the structure shown in the poster, then for є << 1 and times ∆/∂ up to O(1/є), the approximation scheme for the s → s link has an error of O(є). (Proof in the paper.)

The error of the limiting distribution decays exponentially.

Experiments: pH control mechanism in the human body

Rule 1: If f_1 and f_2 have no cross links in the ∂-model, then they have no cross links in the ∆-model.

Rule 2: If f_1 and f_2 have cross links in the ∂-model, then they are linked in the ∆-model.

Rule 3: If s_2 is a parent of f and f is a parent of s_1 in the ∂-model, then s_2 is a parent of s_1 in the ∆-model.
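As a sketch of how these rules might be applied mechanically, the snippet below rewrites a set of cross-slice edges; the set-of-pairs encoding and the choice to keep the original edge direction under Rule 2 are my assumptions, not the paper's.

```python
# Applying Rules 1-3 to a ∂-model's cross-slice edges (sketch; the encoding
# and direction handling are assumptions).
def delta_model_edges(edges, fast, slow):
    out = set()
    for (a, b) in edges:
        # Rules 1 & 2: cross links among fast variables survive iff they
        # existed in the ∂-model (direction kept for simplicity here).
        if a in fast and b in fast:
            out.add((a, b))
        # Rule 3: s2 -> f together with f -> s1 induces s2 -> s1.
        if a in slow and b in fast:
            for (c, d) in edges:
                if c == b and d in slow:
                    out.add((a, d))
    return out

edges = {("s2", "f"), ("f", "s1"), ("f1", "f2")}
print(delta_model_edges(edges, fast={"f", "f1", "f2"}, slow={"s1", "s2"}))
# -> {('f1', 'f2'), ('s2', 's1')}
```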

There are 4 different timescales in this DBN. Lighter shade represents slower timescale and darker shade represents faster timescale.

[Fig. 1: Avg. L2 error of the joint belief vector (accuracy of approximate models and speedup). Fig. 2: Accuracy in tracking the marginal distribution of pH.]

General Algorithm: Given the exact ∂-model, the approximation scheme can be used to create a sequence of DBNs for various values of ∆, depending on the number of timescale-separable clusters.

For the algorithm, we assume the DBN has n variables X1, X2,…, Xn.

1. For each variable X_i, determine l_{X_i} and h_{X_i}.

2. Cluster the variables into {C_1, …, C_m} (m ≤ n) such that є_i = h_{C_i} / l_{C_{i+1}} << 1, i.e. there is significant timescale separation between successive clusters (a minimal clustering sketch follows this list).

3. Repeat for i = 1, 2, …, m-1:
   3.1. ∆_i = ∆_{i-1} · O(1/є_i)
   3.2. C_i is the fast cluster and the C_j (j > i) are slow clusters. Compute the stationary distribution of C_i conditioned on each configuration of its slower parents.
   3.3. In the worst case, all the C_j become fully connected in the ∆_i-model. If there are no links from C_i to C_j in the exact model, then the C_j → C_j link is exact.
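A minimal sketch of the clustering step (step 2), assuming the per-variable bounds (l_X, h_X) have already been computed, e.g. with the timescale_bounds sketch above; the threshold eps_max is an assumption.

```python
# Clustering variables by timescale separation (sketch of step 2 only).
def cluster_by_timescale(bounds, eps_max=0.1):
    """bounds: dict name -> (l_X, h_X). Returns clusters ordered fast -> slow."""
    ordered = sorted(bounds, key=lambda v: bounds[v][1])   # fastest variables first
    clusters = [[ordered[0]]]
    for name in ordered[1:]:
        l_var = bounds[name][0]
        h_cluster = max(bounds[v][1] for v in clusters[-1])
        # eps_i = h_{C_i} / l_{C_{i+1}} << 1 signals a usable timescale gap
        if h_cluster / l_var <= eps_max:
            clusters.append([name])        # big gap: start the next (slower) cluster
        else:
            clusters[-1].append(name)      # no gap: keep in the same cluster
    return clusters

bounds = {"f1": (1.1, 1.5), "f2": (1.2, 1.4), "s": (100.0, 120.0)}
print(cluster_by_timescale(bounds))        # [['f2', 'f1'], ['s']]
```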

Widely varying timescales: An overview

Chemical Reactions: Michaelis-Menten kinetics makes the quasi-steady-state assumption that the concentration of substrate-bound enzyme changes much more slowly than that of product or substrate. Recent work separates slow and fast timescales in the chemical master equation (CME), yielding separate reduced CMEs (see Gomez-Uribe et al.).

Gene Regulatory Networks: Arkin et al. proposed an abstraction methodology using rapid-equilibrium and quasi-steady-state approximations.

Mathematics and Physics: Homogenization is used to replace rapidly oscillating coefficients.

Body temperature (BT) and thermometer temperature (TT)
• Both variables are discretized (binary) for simplicity
• Body temperature has slow dynamics compared to the thermometer
• The BT → TT link is approximated by the steady-state distribution (see the numeric sketch below)
• Bad approximation for a 1-second timestep; good approximation for a 60-second timestep
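A worked numeric version of this example; all CPT numbers are invented for illustration, since the poster does not give them.

```python
# Thermometer example: steady-state approximation of the BT -> TT link.
# All numbers are made up for illustration.
import numpy as np

# p(TT_{t+1} | TT_t, BT) per 1-second step, one matrix per body-temperature value
P_tt = {0: np.array([[0.95, 0.05], [0.15, 0.85]]),
        1: np.array([[0.85, 0.15], [0.05, 0.95]])}

def stationary(P, iters=1000):
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P
    return pi

start = np.array([1.0, 0.0])                         # thermometer starts in state 0
for bt, P in P_tt.items():
    one_sec = start @ P                              # exact marginal after 1 second
    one_min = start @ np.linalg.matrix_power(P, 60)  # exact marginal after 60 seconds
    print(bt, one_sec, one_min, stationary(P))
# After 1 second the thermometer still remembers its old reading, so the
# steady-state link is a poor approximation; after 60 seconds it matches closely.
```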