
Refresh: The Factor-Fuse-Forget Filtering Framework

Lawson L.S. Wong

Abstract— Estimating and tracking the state is a fundamental ability of any autonomous system. The complexity of state spaces in typical applications frequently necessitates approximate filtering techniques such as particle filters. However, these methods often suffer from the curse of dimensionality when attempting to maintain a high-dimensional joint posterior distribution. Based on the insight that most joint distributions are projected onto low-dimensional marginals when used, we propose to actively eschew the joint space, and perform the bulk of filtering in a factored state space, where each factor gives rise to a low-dimensional marginal distribution that is hopefully easier to track. Our framework supports fusing these factors to temporarily obtain joint posterior distributions, but forgets these outputs in favor of maintaining a tractable factored filter. In simulated domains, we demonstrate that this framework offers substantial improvements over existing factoring and particle filtering methods, in both filtering accuracy and computational efficiency.

I. INTRODUCTION

Estimating and tracking the state is a fundamental ability of any autonomous system. The paradigm of recursive Bayesian filtering is particularly appealing and widely used in probabilistic robotic systems [1]. However, except in the simplest of cases, such filters do not admit closed-form solutions, and approximations, such as particle filters and other nonparametric methods, are frequently used in practice [2]. Although these methods are very general, they quickly suffer from the curse of dimensionality when trying to filter joint distributions in high-dimensional state spaces.

We argue that in most applications, we typically only care about low-dimensional marginals; high-dimensional joint distributions are both difficult to track, and are only useful after being projected down to their marginals. Additionally, high-dimensional state spaces often arise simply because entities that are “largely independent of each other” happen to co-exist in the same system, and may have some weak indirect dependencies between each other [3], [4].

In this paper, we investigate an approach that makes use of these two insights. We propose to perform the bulk of filtering in a factored state space, where each factor gives rise to a low-dimensional marginal distribution that is hopefully easier to track. We intend that these factored filters will typically provide sufficient estimates for the domain. However, our framework also supports pooling estimates across factors to temporarily obtain a joint posterior distribution. Crucially though, the joint estimates produced by this fusion operation are forgotten and not propagated forward in time in order to maintain overall filtering tractability. Figure 1 provides a graphical model illustration of our approach.

Department of Computer Science, Brown University, Providence, RI 02912, USA [email protected]

[Figure 1: (a) Graphical model; (b) Factor-Fuse-Forget]

Fig. 1: We propose an approximate filtering scheme to perform estimation in complex joint state spaces, such as the one illustrated on the left. For arbitrary partially observable dynamical systems, joint estimation is unavoidable; however, in typical applications, the state space can be factored into weakly-interacting components. These low-dimensional components often admit efficient, or even analytical, filtering solutions. Our framework exploits this computational advantage as much as possible, by filtering only within these respective spaces (which is efficient, as indicated by the blue arrows on the right), and fusing only at the last possible moment, when a query about the joint distribution is actually needed.

We develop the factor-fuse-forget filtering framework in detail next, after providing a brief overview of related work. In Section IV, we present experiments that compare the framework against existing factoring and particle filtering methods, and illustrate this framework’s potential.

II. RELATED WORK

Estimating the state of partially observable dynamical systems has been widely studied, with the Kalman filter being one of the early seminal works [5]. These early methods were mostly analytical and hence were only applicable to a limited number of models. With the introduction of nonparametric methods, namely sequential Monte Carlo methods or particle filtering, a general estimation method became available for all models [2], [1]. However, the generality of the approach again restricted it to being practical only for simple, low-dimensional problems.

A number of works therefore attempted to modify particle filters by exploiting structure in typical dynamical systems,


[Figure 2: (a) Factored filter; (b) Filter independently; (c) Fuse on demand]

Fig. 2: The factored filter: Factor the state space, filter independently, fuse only on demand by querying the latest factored posterior distributions. Blue arrows indicate steps that are computationally efficient, whereas red indicates hard steps.

while maintaining overall generality. Rao-Blackwellized particle filtering (RBPF) [6] lowered the number of dimensions that were tracked by particles, by applying analytical estimation in the state dimensions that admitted such methods. The factored-frontier algorithm [7] aggressively factored the state space into a product of marginal distributions, then updated them consistently with the dynamic Bayesian network (DBN) structure, which is efficient if the DBN is sparse. Factored particle filtering [8] performed factoring at the level of particles, by having particles track only a subset of the state dimensions, then combining them via an operation akin to a database join. Finally, recent work by Albrecht and Ramamoorthy [9] identifies the causal structure of the state dimensions, which then allows for a method that only updates the necessary state dimensions.

Our approach makes a similar observation as the above methods, that factoring the state space is generally a good idea for typical dynamical systems. However, most of the methods above are still conservative because they require inference in the joint state space during propagation. Our proposed framework is aggressive in the sense that inference in the joint state space is only performed when needed, at query time. Like the RBPF, our approach also supports black-box filters in subsets of the state space dimensions, which are often available (e.g., robot pose tracked by a Kalman filter). Observing that queries are often answerable using only information from the marginal distribution (e.g., the location of the robot), we see that joint inference by fusion of filter posterior distributions can often be performed sparingly. The ability to perform fusion only on posteriors at arbitrary points in time is not handled by previous methods. Additionally, since we are agnostic to the black-box filters used by the state space factors, our approach is applicable beyond particle filtering. On the other hand, since our framework does not ultimately revert to particle filtering, we lose the general convergence guarantees, although we show empirically that our approach achieves superior performance with drastically fewer particles.

III. APPROACH

For the sake of clarity, we focus on the case where the state variables and their observations are partitioned into two subsets. Generalizing to more subsets is straightforward and will be briefly mentioned at the end of this section.

A. Factor

Let XT and ZT be a partition of the state variables up to time step T, and let YT and OT be observations of these variables respectively. Each bolded variable contains a sequence of variables over time steps, e.g., XT = ⟨X1, . . . , XT⟩. Observations may be intermittent, i.e., Yt and Ot may not exist for all times t.

The premise of this paper is that filtering in the joint state space is expensive, i.e., the joint posterior P(XT, ZT | YT, OT) is in general difficult to compute. Instead, we will separately maintain two filters P(XT | YT) and P(ZT | OT). To compute these filters, we additionally need factored transition and observation functions P(Xt+1 | Xt) and P(Yt | Xt), and likewise for Z and O.
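Concretely, each factored filter is just an ordinary recursive Bayesian filter over its own low-dimensional space. The following is a minimal sketch for a single discrete factor; the function name, signature, and matrix conventions are ours, not from the paper.

```python
import numpy as np

def factored_filter_step(belief, T, O_lik, y=None):
    """One predict-correct step of a recursive Bayesian filter for a
    single discrete factor.

    belief: current marginal, shape (n,)
    T:      local transition matrix, T[i, j] = P(X_{t+1}=j | X_t=i)
    O_lik:  observation model, O_lik[y, j] = P(Y_t=y | X_t=j)
    y:      observed value, or None (observations may be intermittent)
    """
    predicted = belief @ T              # propagate through P(X_{t+1} | X_t)
    if y is None:
        return predicted                # no observation this step
    posterior = O_lik[y] * predicted    # condition via P(Y_t | X_t)
    return posterior / posterior.sum()  # normalize
```

For example, with a near-identity observation model, observing y = 1 concentrates the belief on state 1, while a step with no observation simply diffuses the belief through the transition model.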

In general, these factored functions are approximations of the joint transition function P(Xt+1, Zt+1 | Xt, Zt) and the joint observation function P(Yt, Ot | Xt, Zt). Indeed, if the factored versions were in fact exact, i.e., if:

P(Xt+1, Zt+1 | Xt, Zt) = P(Xt+1 | Xt) P(Zt+1 | Zt)    (1)
P(Yt, Ot | Xt, Zt) = P(Yt | Xt) P(Ot | Zt),           (2)

then filtering the factors independently would have been an obvious design choice. Assuming this trivial case does not apply, there must be useful information from one set of states that could affect the other set’s posterior distribution. For example, the true decomposition of the above functions may involve terms such as P(Xt+1 | Xt, Zt) or P(Ot | Xt, Zt), which require information transfer across the partition. To obtain accurate posterior state distributions, fusion of the factored filters is therefore necessary.
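When (1) and (2) do hold exactly, independent factored filtering is not an approximation at all: the joint posterior remains the product of the factored posteriors. The sketch below (all model matrices randomly generated purely for illustration) verifies this numerically by running an exact joint filter on a joint model assembled from the factored pieces.

```python
import numpy as np

rng = np.random.default_rng(0)
nx, nz = 3, 4
rownorm = lambda a: a / a.sum(axis=1, keepdims=True)
colnorm = lambda a: a / a.sum(axis=0, keepdims=True)

TX, TZ = rownorm(rng.random((nx, nx))), rownorm(rng.random((nz, nz)))
OX, OZ = colnorm(rng.random((nx, nx))), colnorm(rng.random((nz, nz)))  # O[y, x] = P(y | x)

# Joint model assembled so that (1) and (2) hold exactly.
TJ = np.kron(TX, TZ)                     # P(x', z' | x, z) = P(x' | x) P(z' | z)
bJ = np.full(nx * nz, 1.0 / (nx * nz))   # exact joint filter belief
bX, bZ = np.full(nx, 1.0 / nx), np.full(nz, 1.0 / nz)
for _ in range(5):
    y, o = rng.integers(nx), rng.integers(nz)   # simulated observations
    bX = OX[y] * (bX @ TX); bX /= bX.sum()      # factored filters
    bZ = OZ[o] * (bZ @ TZ); bZ /= bZ.sum()
    lik = np.kron(OX[y], OZ[o])                 # P(y, o | x, z) = P(y | x) P(o | z)
    bJ = lik * (bJ @ TJ); bJ /= bJ.sum()        # exact joint filter

assert np.allclose(bJ, np.kron(bX, bZ))         # joint = product of marginals
```

The interesting regime for this paper is precisely when this equality fails, and the gap must be closed by fusion.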

The strategy adopted in this paper is shown in Figure 2. We chose to partition the state space into two subsets, and


[Figure 3: (a) Propagate joint state? (b) When fusing in the future, (c) double-counting will occur]

Fig. 3: If information is propagated forward at too many layers, double-counting will occur.

independently filter in each (Figure 2a). This requires us to additionally specify local transition functions (i.e., ones that do not depend on the other set’s variables). In Figure 2b, we have also drawn the information flow during inference in colored arrows; blue indicates that data assimilation and state propagation are relatively easy (typically due to conjugacy); red indicates that these steps are potentially difficult. With appropriate choices, filtering in each layer should be relatively efficient, as illustrated.

The degree of approximation and ultimate solution quality of the factored filter depends on whether the original state space partition {X, Z} was judiciously chosen.

B. Fuse

Given filters P(XT | YT) and P(ZT | OT), we derive an efficient approximation of the fused posterior marginal distribution P(XT | YT, OT) (the derivation for P(ZT | OT, YT) is similar):

P(XT | YT, OT) ∝ P(OT | XT, YT) P(XT | YT)                          (3)
               ≈ P(OT | XT) P(XT | YT)                               (4)
               = [ ∑ZT P(ZT, OT | XT) ] P(XT | YT)                   (5)
               = [ ∑ZT P(OT | ZT) P(ZT | XT) ] P(XT | YT)            (6)
               ∝ [ ∑ZT (P(ZT | OT) / P(ZT)) P(ZT | XT) ] P(XT | YT)  (7)

P(ZT) is the prior distribution for ZT (e.g., the mixing distribution of the Markov chain for large T). We denote P(ZT | XT) as the ‘link’ function from XT to ZT, which specifies the information (if any) that XT may contain about ZT in the same time step. For example, XT and ZT may be correlated, or even causally related. Note that this is different from the general transition function P(ZT+1 | ZT, XT), which connects different time steps.

The final line in the derivation above is key to making fusion efficient, and is the primary reason we maintain the filter P(ZT | OT). Consider the line before, which requires computing P(OT | ZT). This would in turn require a summation over latent variables Zt, at all times t that observation Ot was made, and is generally the same as solving the entire filtering problem up to time T. Instead of performing this repeatedly for each fusion operation, the maintenance of P(ZT | OT) essentially performs the computation once and caches the result. This then enables fusion, in Equation 7, to simply involve combining the factored posteriors, modulated by a summation over ZT and the link function.
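In the discrete case, Equation 7 reduces to a couple of matrix operations over the cached posteriors; a minimal sketch (the function name and array conventions are ours):

```python
import numpy as np

def fuse_marginal(pX, pZ, priorZ, link):
    """Approximate fused marginal P(X_T | Y_T, O_T) via Equation 7.

    pX:     cached factored posterior P(X_T | Y_T), shape (nx,)
    pZ:     cached factored posterior P(Z_T | O_T), shape (nz,)
    priorZ: prior P(Z_T), shape (nz,)
    link:   link function, link[x, z] = P(Z_T = z | X_T = x)
    """
    modulation = link @ (pZ / priorZ)   # the bracketed sum over Z_T, per x
    fused = modulation * pX
    return fused / fused.sum()          # normalize over X_T
```

A sanity check on this form: if the link is uninformative, i.e., P(ZT | XT) = P(ZT) for every XT, the modulation term sums to one and fusion returns P(XT | YT) unchanged, as expected.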

For completeness, we also present a similar efficient approximation of the joint posterior distribution, although we will not use this in the remainder of the paper:

P(XT, ZT | YT, OT)
  = P(ZT | XT, YT, OT) P(XT | YT, OT)                 (8)
  ≈ P(ZT | XT, OT) P(XT | YT, OT)                     (9)
  ∝ P(OT | ZT) P(ZT | XT) P(XT | YT, OT)              (10)
  ∝ (P(ZT | OT) / P(ZT)) P(ZT | XT) P(XT | YT, OT)    (11)

Even with the above optimized fusion operation, the normalization required to obtain the posterior marginal on XT will generally be computationally expensive. Hence fusion should be applied sparingly, only as needed. This is illustrated in Figure 2c, with fusion occurring at the second time step. The red arrows highlight the difficulty of this step.

C. Forget

After fusing, we now have two factored posteriors and (marginals of) a joint posterior distribution. There are multiple ways to proceed with filtering from this point. For example, as illustrated in Figure 3a, we can choose to propagate the state forward on all three layers. This, however, is problematic for two reasons. First, filtering on the fused state is expensive; otherwise, we could have done so from the beginning. Second, information may be double-counted in the future if more fusion steps occur. Suppose, as in Figure 3b, fusion occurs again at the next time step. Then information from the observation Y1, which was incorporated


[Figure 4: (a) Filter joint state, reset factored filters; (b) Forget joint state, resume factored filters]

Fig. 4: Two viable filtering strategies to avoid double-counting.

into the joint layer via X at t = 2, will be fused again via X at t = 3. These two paths are indicated by the curved green arrows in the right diagram of Figure 3.

The double-counting issue occurs whenever there are multiple paths through which information can travel from one node to another. Hence, to ensure sound filtering, we can keep at most one path. This suggests two possible strategies, shown in Figure 4. In Figure 4a, we choose to propagate the joint state forward. Then to remove the other arrow, we either have to never fuse again, or we have to reset the factored filters (the latter is depicted). This solution is not too desirable, since it requires an expensive joint filter, and we also lose the factored filters that may be used more frequently than the joint state. In Figure 4b, we choose to propagate the factored states forward but not the joint state. This is exactly our factor-fuse-forget strategy. These two strategies are known for “hierarchical fusion without feedback” fusion architectures, of which our setup is an instance [10].
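Putting the three steps together, the overall loop can be sketched as follows; this is a self-contained illustration in which the model matrices are random placeholders and the query schedule (fusing every fifth step) is our own choice, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
nX, nZ = 4, 3
rownorm = lambda a: a / a.sum(axis=1, keepdims=True)

TX, TZ = rownorm(rng.random((nX, nX))), rownorm(rng.random((nZ, nZ)))
OX, OZ = rownorm(rng.random((nX, nX))), rownorm(rng.random((nZ, nZ)))  # O[x, y] = P(y | x)
link = rownorm(rng.random((nX, nZ)))       # link[x, z] = P(Z = z | X = x)
priorZ = np.full(nZ, 1.0 / nZ)             # stand-in for the prior P(Z_T)

bX, bZ = np.full(nX, 1.0 / nX), np.full(nZ, 1.0 / nZ)
for t in range(1, 11):
    y, o = rng.integers(nX), rng.integers(nZ)       # simulated observations
    bX = OX[:, y] * (bX @ TX); bX /= bX.sum()       # factor: cheap local updates
    bZ = OZ[:, o] * (bZ @ TZ); bZ /= bZ.sum()
    if t % 5 == 0:                                  # fuse: only when queried
        fused = (link @ (bZ / priorZ)) * bX         # Equation 7
        fused /= fused.sum()                        # ... answer the query ...
        # forget: `fused` is discarded; only bX, bZ are propagated forward
```

Note that the fused belief is never assigned back into `bX` or `bZ`: the joint estimate answers the query and is then dropped, which is exactly the forget step.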

D. Computational Analysis

At this point, we can analyze what we have gained so far. In a standard recursive Bayesian filter of the joint state {X, Z}, at each time step we would need to propagate the belief forward using the joint transition model P(Xt, Zt | Xt−1, Zt−1), and condition the belief using Bayes’ rule via the observation function P(Yt, Ot | Xt, Zt):

P(Xt, Zt | Yt, Ot) ∝ P(Yt, Ot | Xt, Zt) P(Xt, Zt)    (12)

This latter step is computationally demanding because it typically requires enumerating/sampling over the joint state space, except in scenarios with good conjugacy properties.

In the factor-fuse-forget filtering framework, we maintain separate filters for X and Z, using local transition and observation functions. Since the individual state spaces are combinatorially smaller than the joint space, factored filtering should be significantly more efficient than the joint belief update operations above. Then, fusion is occasionally performed using Equation 7, which again requires enumerating the joint state space.

In the extreme case where fusion is performed at every time step, the additional overhead of the factored filters may cause our framework to be more computationally expensive than the standard joint filter. However, assuming that the motivating rationale holds, i.e., that fusion can be performed at a significantly lower frequency than the rate of incoming observations, our framework requires far fewer expensive enumerations of the full joint space.
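As a back-of-the-envelope illustration (a cost model of our own, counting states touched per step, not a measurement from the paper):

```python
def joint_cost(nx, nz):
    """Joint filter: enumerate all nx * nz states every step."""
    return nx * nz

def fff_cost(nx, nz, fuse_freq):
    """Factor-fuse-forget: local updates every step, plus fusion
    (which enumerates the joint space) at a fraction of steps."""
    return nx + nz + fuse_freq * nx * nz

print(joint_cost(10, 10))         # 100
print(fff_cost(10, 10, 0.1))      # 30.0  -- fusing every 10th step
print(fff_cost(10, 10, 1.0))      # 120.0 -- fusing every step exceeds the joint filter
```

Under this toy model, the crossover matches the text: fusing at every step is worse than joint filtering, while fusing rarely gives a large saving.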

E. Failure Modes

Assume that all state variables are observable, in that given sufficient observations, the true joint filter can estimate all variables consistently. If the factored states are moreover locally observable (e.g., Y provides information about all variables in X), and the local filters are consistent, then the factored states will be able to correct themselves over time, even without the need to fuse from other subsets. Suppose this trivial case does not hold. Then there exist states that are locally unobservable in one subset, but may be influenced indirectly by observations from another subset. As long as fusion occurs infinitely often, and the critical information is retained by the local filters at the time point of fusion, then the information will eventually reach all states.

In this argument sketch, the only potential point of failure is if the indirect information is lost by the time fusion occurs. Suppose Z contains such information from its observations that would affect X, and fusion is performed at time step t. Then the above failure may occur if there exists some time t′ < t such that P(Xt | Zt) ≠ P(Xt | Zt, Zt′). That is, Zt is not a sufficient statistic of its own history; there is past information in Zt′ that influences Xt that is no longer accessible from the filter for Z at time t.

F. Extensions

After completing an expensive fusion step, it seems wasteful to simply discard the estimate. Additionally, the joint state may have crucial information that can correct the factored filters; otherwise, the fusion step would be unnecessary. It would be useful to be able to project that information back into the assumed factored density and correct the marginal


[Figure 5: (a) Propagate fused information back to factored filters? (b) If propagated to multiple factors, double-counting may occur; (c) If propagated to one factor, must reset other filters]

Fig. 5: Propagating fused information back to factors may also cause double-counting.

distributions [11]. Consider such a projection scheme in our framework, as shown in Figure 5a. The projection step is shown as the blue right arrows emanating from the joint state (typically projection is not as difficult as fusion).

However, if we consider a fusion step in the future, as in Figure 5b, we see once again that information could have reached the fusion node via multiple paths. In the case shown by the arrows, observation O2 is fused at t = 3 via Z, and also via the projection from the fusion node at t = 2 into X. Once again, we must consider cutting one of the paths. If we ignore the option of disallowing fusion steps, then the only possibilities are to reset either of the factored filters. For example, in Figure 5c, if we choose to correct the filter P(XT | YT), then we must reset the other filter P(ZT | OT). In general, if we have multiple factored filters, then correcting any single one means that all others must be reset. This scheme may be desirable if there is a factor that should naturally take precedence, such as one that is causally upstream.

Finally, so far we have only derived the filtering strategy for a state space partition with two factors, X and Z. Extending the framework to additional factors is straightforward, as long as local transition and observation functions, as well as efficient local filters, can be found for each factor. The fusion step will retain the form of Equation 7, except with additional product terms for each filter, and the link function is conditioned on all other factors’ state variables as well.

IV. EXPERIMENTS

We demonstrate the filtering framework by comparing it, in simulation, against a generic non-parametric particle filter.

A. An Illustrative 2-D Domain

First, we consider the simplest possible case: a discrete 2-D domain, such that the exact recursive Bayesian filter can be tractably computed. In our simulation, each state dimension has 10 possible states. The transition matrix is uniformly randomly generated (and appropriately normalized). Each state dimension is observed independently; the observation matrix is the identity function corrupted with a 0.25 noise

Fig. 6: Proportion of trials where factor-fuse (FF) outper-forms particle filter (PF).

level (i.e., the probability of observing the correct state is 0.75, with the remaining 0.25 spread uniformly across the other 9 states). We simulated state trajectories and observations over 10 time steps, and applied three filtering schemes:

• PF: A 2-D particle filter on the joint state space.
• FF: Two separate 1-D recursive Bayesian filters, one for each state dimension. At the final time step, the two filters are fused using the framework presented in this paper.
• A 2-D recursive Bayesian filter (the exact solution).

We repeated the above simulation for 100 trials. In each, the Kullback-Leibler divergence between each compared approach (PF, FF) and the exact solution was computed. To compare the two filtering approaches, we computed the proportion of trials in which the proposed factor-fuse framework (FF) achieved lower error with respect to the exact solution. Results are shown in Figure 6.
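The error metric here is the KL divergence from the exact posterior; a minimal helper for discrete beliefs (the smoothing constant `eps` is our own guard against empty particle bins, not part of the paper's protocol):

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL(p || q) in nats for discrete distributions of equal shape."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
```

This is zero when the two beliefs coincide and positive otherwise, so lower values mean a closer match to the exact filter.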

In particular, when PF is given 10^3 particles, we see that the two methods have similar accuracies. Note that the computational cost of FF is approximately that required for 10 + 10 = 20 particles, which implies in this example that FF can be up to 50 times more efficient than PF to achieve the


same level of accuracy, and can achieve better performance with much less computation.

B. Tracking Objects and their Occupancies

This domain is inspired by a tabletop tracking and manipulation setting, where several objects on a planar surface may be moved around, their locations are detected with significant noise, and various occupancy observations are recorded as well. Typically, simply using the noisy object location detections is sufficient to roughly track the objects. However, occasionally objects may need to be accurately localized, so that they can be grasped or manipulated; hence fusing with the occupancy observations can be useful.

The domain consists of 5 objects in a 2-D box of dimensions [0, 10] × [0, 10]. The objects are circular in shape with radius 1, and their centers are used as the state. For a joint state to be valid, the entirety of each circle must be within the box (i.e., each center’s domain is [1, 9] × [1, 9]), and no circles should overlap (i.e., pairs of centers should be at least 2 units apart). Over the course of 10 time steps, objects move in a random walk, each taking a step per unit time drawn from a mean-zero unit-variance isotropic Gaussian distribution, with the constraint that the resulting joint state must be valid.

At each time, each object is observed once with isotropic Gaussian noise of standard deviation 1.0. The observations may violate the boundary and non-overlap constraints. Objects are completely identifiable. In addition to object observations, there is also an occupancy sensor placed in a grid, with cells being unit squares (i.e., 10 × 10 grid cells). In each time step, each cell measures its occupancy state 5 times. A cell is considered occupied if and only if its center is overlapped by some object (circle). The occupancy observation is correct with probability 0.7.

Note that this is a challenging filtering domain for generic existing methods; in the terminology of this paper, X is a 10-D continuous space (although it will be further factored into 5 pairs of 2-D spaces), and Z consists of 100 binary (discrete) variables.

Figure 7 shows snapshots from the 10 time steps in this domain. The left column in the figure shows the observations given as input to the filters: the solid circles are object measurements, the dashed circles are the true state, and the grid cells are colored by the proportion of occupancy observations that report “occupied” (darker means that the cell received more “occupied” measurements).

We compare against three methods: Rao-Blackwellized particle filter (RBPF) [6], factored-frontier [7], and Boyen-Koller [11]. In the RBPF, the state is the joint locations of the 5 objects, i.e., the state is 10-dimensional. Given the continuous state of the objects, the latent occupancy variables and their observations are analytically marginalized out. Starting with 10^4 particles, we iteratively sample valid transitions for each and weigh each by the likelihood of the occupancy observations at each time step. This gives a distribution over joint states; the state with the highest accumulated weight up until the query time is shown in the right column of Figure 7.

The factored-frontier and Boyen-Koller approaches are similar to each other; in both, the belief state is also factored aggressively. Both methods propagate transition steps in this factored form; however, subsequent incorporation of observations causes the joint distribution to no longer conform to this factored structure. In both cases, we will need to further sample from this joint distribution due to its high-dimensional and continuous nature (this will also be necessary for our approach, as described next). These samples can then be weighted according to the observation likelihood, and the resulting nonparametric belief is projected back into the assumed factored form before continuing.

In the factor-fuse-forget framework, we first have to supply factored filters. For objects, we aggressively factor the 5 object states, and keep track of each with a 2-D Kalman filter. Since we no longer have joint object states, the non-overlap constraint is not enforced by the factored filters. Boundary constraints are also temporarily ignored. For occupancy data, a dynamic occupancy grid was used to keep track of how likely each of the 10 × 10 cells is occupied. This is similar to a standard occupancy grid [12], except that it can have Markovian transitions between different occupancy states [13]. In the second column of Figure 7, the estimated Gaussian means of object locations are shown with solid circles. At certain times (e.g., t = 8 and 9), some object mean locations cause objects to overlap or be out of bounds.

Even prior to fusion, when we compare the log-likelihoods of the factored and particle filter columns, we see that they are often comparable: factored is clearly better at t ∈ {1, 7, 8}, similar at t = 10, and worse at t = 9. Note that the factored estimate at this stage has only involved Kalman filters; the occupancy grid has not been used yet.

For fusion, we cannot enumerate over the objects' locations and apply Equation 7, since the space of locations is continuous in this domain. Instead, we sample each object's location individually from its respective Kalman filter, then concatenate the locations into a joint state. A sample is rejected if it violates the boundary constraints. We encode non-interpenetration constraints into the occupancy grid by giving zero probability to states in which multiple objects overlap the same grid cell center. Note that this does not necessarily avoid interpenetration unless the cell width is infinitesimal. A total of 10^3 such concatenated joint states are sampled, and we assign a weight to each using the occupancy grid and the occupancy prior. The one with the highest weight is shown in the third column of Figure 7.
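The sample-concatenate-reject-weight procedure above can be sketched as follows. Here `grid_logw` and `in_bounds` stand in for the domain's occupancy-grid scoring and boundary check; both names are hypothetical, and overlap constraints are assumed to be folded into `grid_logw` (returning −∞ for overlapping states).

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(factors, grid_logw, in_bounds, n_samples=1000):
    """Fuse factored Gaussian estimates with the occupancy grid: draw each
    object's location independently, reject boundary violations, weight
    joint samples by the grid, and keep the best (a sketch; `factors` is
    a list of (mean, covariance) pairs, and the callbacks are assumed)."""
    best, best_logw = None, -np.inf
    while n_samples > 0:
        joint = np.array([rng.multivariate_normal(mu, P)
                          for mu, P in factors])  # one 2-D draw per object
        if not all(in_bounds(x) for x in joint):
            continue                              # rejection step
        n_samples -= 1
        logw = grid_logw(joint)
        if logw > best_logw:
            best, best_logw = joint, logw
    return best, best_logw
```

Because each fusion call reads only the current factored beliefs, its output can be discarded ("forgotten") without affecting subsequent filtering.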

As shown both qualitatively and by the log-likelihood scores, the resulting fused estimate is significantly better at all times than the mean/mode of the factored filter, as well as than the particle filter, even though the fusion step used 10 times fewer samples. This is not too surprising, because many particles would likely have become degenerate during filtering. Additionally, although we show the fused estimates for all time steps, recall that they do not rely on each other; each fusion step only requires the factored filters at that time. If we were only interested in the joint state at t = 10, we can choose not to fuse the previous time steps,


[Fig. 7 images omitted; per-row panel log-likelihoods (factored / fused / particle filter):
t = 1:  −345 / −321 / −352
t = 2:  −346 / −329 / −332
· · ·
t = 8:  −356 / −336 / −371
t = 9:  −364 / −328 / −355
t = 10: −369 / −333 / −371]

Fig. 7: Left column shows observed locations (solid circles) and occupancy (darker = more detections of "occupied"). Objects are color-coded by identity. Dashed circles are true object locations in all frames. Second column shows the factored filter, with estimated Gaussian location means (solid circles) and an occupancy grid (darker = more likely occupied). Third and fourth columns are the most-likely samples from fusion (of 10^3) and from a particle filter over object locations (of 10^4). The fused estimate has a significantly greater log-likelihood across all times, despite requiring fewer samples, and is hence more efficient.


Approach                Quality (Log-likelihood)    Time (s)
RBPF [6]                        −352                 162
Factored Frontier [7]           −370                 17.7
Boyen-Koller [11]               −343                 14.8
Factor-Fuse-Forget              −330                  3.41

TABLE I: Accuracy and efficiency comparison between multiple approaches on the domain described in Section IV-B. Approaches were assessed based on the quality of the estimate at the final time step (t = 10), and the amount of time taken to compute this estimate, averaged over 10 trials. Accuracy is measured by the log-likelihood; higher is better (i.e., a smaller negative number). RBPF was given 10^4 particles; the other methods used 10^3 samples when computing intermediate joint distributions (see text for details).

and the estimate at t = 10 will be the same. This is not the case for the particle filter: particles must be propagated through every time step, and hence filtering in the joint state space is much more expensive.

Quantitative accuracy and timing results comparing the four methods are provided in Table I.

V. CONCLUSION

We have introduced the factor-fuse-forget filtering framework, illustrated the general strategy, and derived the fusion steps involved. Two experiments were presented that applied the framework and showed significantly better performance compared to existing factoring-based and particle filtering methods. We are eager to apply our framework to more domains in the future, as well as to develop a better theoretical understanding of the approach.

REFERENCES

[1] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. MIT Press, 2005.

[2] A. Doucet, N. de Freitas, and N. Gordon, "An introduction to sequential Monte Carlo methods," in Sequential Monte Carlo Methods in Practice, A. Doucet, N. de Freitas, and N. Gordon, Eds. Springer-Verlag New York, 2001, pp. 3–14.

[3] A. Pfeffer, S. Das, D. Lawless, and B. Ng, "Global/local dynamics models," in International Joint Conference on Artificial Intelligence, 2007.

[4] C. Frogner and A. Pfeffer, "Discovering weakly interacting factors in a complex stochastic process," in Neural Information Processing Systems, 2007.

[5] R. Kalman, "A new approach to linear filtering and prediction problems," Transactions of the ASME–Journal of Basic Engineering, vol. 82, no. Series D, pp. 35–45, 1960.

[6] A. Doucet, N. de Freitas, K. Murphy, and S. Russell, "Rao-Blackwellised particle filtering for dynamic Bayesian networks," in Uncertainty in Artificial Intelligence, 2000.

[7] K. Murphy and Y. Weiss, "The factored frontier algorithm for approximate inference in DBNs," in Uncertainty in Artificial Intelligence, 2001.

[8] B. Ng, L. Peshkin, and A. Pfeffer, "Factored particles for scalable monitoring," in Uncertainty in Artificial Intelligence, 2002.

[9] S. Albrecht and S. Ramamoorthy, "Exploiting causality for selective belief filtering in dynamic Bayesian networks," Journal of Artificial Intelligence Research, vol. 55, no. 1, pp. 1135–1178, 2016.

[10] C.-Y. Chong, K.-C. Chang, and S. Mori, "Fundamentals of distributed estimation," in Distributed Data Fusion for Network-Centric Operations, D. Hall, C.-Y. Chong, J. Llinas, and M. Liggins II, Eds. CRC Press, 2012, pp. 95–124.

[11] X. Boyen and D. Koller, "Tractable inference for complex stochastic processes," in Uncertainty in Artificial Intelligence, 1998.

[12] H. Moravec and A. Elfes, "High resolution maps from wide angle sonar," in IEEE International Conference on Robotics and Automation, 1985.

[13] D. Meyer-Delius, M. Beinhofer, and W. Burgard, "Occupancy grid models for robot mapping in changing environments," in AAAI Conference on Artificial Intelligence, 2012.