1
Probabilistic Inference in Distributed Systems
Stanislav Funiak
Disclaimer: Statements made in this talk are the sole opinions of the presenter and do not necessarily represent the official position of the University or the presenter's advisor.
2
Monitoring in Emergency Response Systems
Firefighters enter a building.
As they run around, they place a bunch of sensors.
We want to monitor the temperature in various places:
p(Xi | z) = p(temperature at location i | temperatures observed at all sensors)
3
Monitoring in Emergency Response Systems
You ask a 10-701 graduate for help: "learn the model" → a nice model.
You ask a 10-708 graduate for help: "implement efficient inference" → efficient inference.
You put them on an Intel™ Core-Trio machine with 30GB RAM.
Simulation experiments work great.
[Figure: chain model with hidden state X1–X6 and observed temperatures Z2, Z4, Z6]
Done!
4
D-Day arrives…
You start up your machine and…
Firefighters deploy the sensors
The network goes down. It got flooded.
You call up an old-time friend at MIT.
He sends you a patch with highly optimized routing in 24 minutes.*
Oops! Part of the ceiling just came down, lost the connection again.
5
Last-minute Link Stats
Mhm, communication is lossy. Mhm, link qualities change.
* Joke warning: "24 minutes" = 1 week
Maybe having good routing was not such a bad idea…
6
What’s wrong here?
• Cannot rely on centralized infrastructure
  – too costly to gather all observations
  – need to be robust against node failures and message losses
  – may want to perform online control (nodes equipped with actuators)
• Want to perform inference directly on the network nodes

Also: autonomous teams of mobile robots
7
Distributed Inference – The Big Picture
Each node n issues a query p(Qn | z) = p(Qn | temperatures observed at all sensors), where Qn is a set of query variables, e.g., the temperature at locations 1, 2, 3.
Nodes collaborate to compute the query.
8
Probabilistic model vs. physical layer
[Figure: the probabilistic model (variables X1–X6 with observations Z2, Z4, Z6) vs. the physical layer of the sensor network (physical nodes and available communication links)]
9
Natural solution: Loopy B.P.
Suppose: network nodes = variables.
[Figure: eight networked nodes, 1–8, one per variable]
10
Natural solution: Loopy B.P.
Suppose: network nodes = variables.
Then we could run loopy B.P. directly on the network, over the variables X1–X8: p(X4) could be viewed as node 4's belief, with messages such as μ4→6, μ5→6, μ6→8 [Pfeffer, 2003, 2005].
Issues:
• may not observe the network structure (not fully resolved)
• potentially non-converging
• definitely over-confident (will revisit in experimental results), e.g., loopy B.P.: 99% hot; truth: 51% hot, 49% cold
11
Want the Following Properties
1. Global correctness: eventually, each node obtains the true distribution p(Qn | z).
2. Partial correctness: before convergence, a node can form a meaningful approximation of p(Qn | z).
3. Local correctness: without seeing other nodes' beliefs, each node can condition on its own observations.
12
Outline
[Figure: sensor network (variables X1–X6, observations Z2, Z4, Z6), communication links, routing tree, reparametrized model]

Offline: distribute the input model (BN / MRF) [Paskin & Guestrin, 2004]
1. Nodes make local observations
2. Nodes establish a routing structure
3. Nodes communicate to compute the query
13
Standard parameterization not robust
Exact model: a Bayes net over X1, …, X4, e.g. with factors p(X2 | X1) × p(X3 | X1,X2) × p(X4 | X2,X3), so that marginalizing out X2 and X3 yields p(X4 | X1).
Suppose we "lose" a CPD / potential (not communicated yet, or a node failed): we observe high temp. and ask for the probability of high temp. at X4, but with the lost CPD we are effectively assuming a uniform prior on X2, and the distribution changes dramatically.
Much better: inference in a simpler model. Now suppose someone told us p(X2 | X3) and p(X3 | X1) instead: this constructs an approximation in which X2 ⊥ X1 | X3, and it preserves the correlation between X1 and X3.
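For concreteness, the marginalization claim above written out (a standard chain-rule identity; the factorization is the one given on this slide):

$$\sum_{x_2,\,x_3} p(x_2 \mid x_1)\, p(x_3 \mid x_1, x_2)\, p(x_4 \mid x_2, x_3) \;=\; p(x_4 \mid x_1)$$

If p(x2 | x1) is lost and silently replaced by a uniform factor, the same sum no longer equals p(x4 | x1), which is why the standard parameterization degrades so badly.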
15
Review: Junction Tree representation
BN / MN → junction tree: cliques X1,X2; X2,X3,X4; X3,X4,X5; X4,X5,X6 (e.g., separator X4,X5 between cliques X3,X4,X5 and X4,X5,X6).
Properties: running intersection, family-preserving (think of it as writing the CPDs p(X6 | X4,X5), etc.).
The representation keeps the clique marginals; the separator marginals are not important (they can be computed).
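The junction tree on this slide corresponds to the standard clique-over-separator reparametrization (a textbook identity, spelled out here for concreteness):

$$p(X_{1:6}) \;=\; \frac{p(X_1,X_2)\; p(X_2,X_3,X_4)\; p(X_3,X_4,X_5)\; p(X_4,X_5,X_6)}{p(X_2)\; p(X_3,X_4)\; p(X_4,X_5)}$$

so keeping the clique marginals suffices: the separator marginals in the denominator can be computed from them.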
16
Properties used by the Algorithm
Key properties:
1. Marginalization amounts to pruning cliques: e.g., marginalizing out X1 turns the junction tree T (cliques X1,X2; X2,X3,X4; X3,X4,X5; X4,X5,X6) into T' (cliques X2,X3,X4; X3,X4,X5; X4,X5,X6, with separators X3,X4 and X4,X5).
2. Using a subset of cliques amounts to KL-projection onto all distributions that factor as T': with the clique X3,X4,X5 missing, the approximation over X2,X3,X4 and X4,X5,X6 asserts X2,3 ⊥ X5,6 | X4 (exact vs. approximate).
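Both properties written out for the running example (standard junction-tree facts; the algebra is shown only for concreteness). Pruning the leaf clique X1,X2 when marginalizing out X1:

$$\sum_{x_1} p(X_{1:6}) \;=\; \frac{p(X_2,X_3,X_4)\; p(X_3,X_4,X_5)\; p(X_4,X_5,X_6)}{p(X_3,X_4)\; p(X_4,X_5)}$$

and dropping the clique X3,X4,X5 as well gives the KL projection onto T':

$$q(X_{2:6}) \;=\; \frac{p(X_2,X_3,X_4)\; p(X_4,X_5,X_6)}{p(X_4)} \;=\; \arg\min_{q' \text{ factoring as } T'} \mathrm{KL}(p \,\|\, q')$$

which is exactly the distribution asserting X2,X3 ⊥ X5,X6 | X4.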
17
From clique marginals to distributed inference
How are these structures used for distributed inference?
The clique marginals X1,X2; X2,X3,X4; X3,X4,X5; X4,X5,X6 are assigned to network nodes (e.g., nodes 1, 3, 4, 6).
[Figure: network junction tree over the physical nodes, with stronger and weaker links; edges carry variable sets such as X2,X3,X4 and X2,X3,X4,X5]
Network junction tree [Paskin et al, 2005]:
• used for communication
• satisfies the running intersection property
• adaptive, can be optimized
18
Robust message passing algorithm
Global model: external junction tree with cliques X1,X2; X2,X3,X4; X3,X4,X5; X4,X5,X6.
Nodes communicate clique marginals along the network junction tree; e.g., node 3 obtains X2,X3,X4 exactly.
Each node locally decides which of its local cliques are sufficient for its neighbors.
19
Message passing = pruning leaf cliques
Theorem: On a path towards some network node, the cliques that are not passed form branches of an external junction tree.
Corollary: At convergence, each node obtains a subtree of the external junction tree.
[Ch 6, Paskin, 2004]
[Figure replay: external junction tree X1,X2; X2,X3,X4; X3,X4,X5; X4,X5,X6 over nodes 1, 3, 4, 6; cliques obtained by node 1 vs. pruned cliques]
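A minimal sketch of the pruning step behind this theorem, assuming cliques are represented as frozensets of variable names; the function and variable names are illustrative, not taken from the talk's implementation. The message a node sends is its set of clique marginals with unneeded leaf cliques pruned, which by Property 1 is still an exact representation of the marginal over the remaining variables.

```python
def prune_unneeded_leaves(cliques, needed_vars):
    """cliques: dict mapping frozenset(variables) -> clique marginal.
    Repeatedly prune leaf cliques whose private variables are not needed;
    by Property 1, pruning a leaf clique marginalizes out its private vars."""
    cliques = dict(cliques)
    changed = True
    while changed:
        changed = False
        for c in list(cliques):
            rest = [d for d in cliques if d != c]
            if not rest:
                break
            sep = c & frozenset().union(*rest)
            # c is a prunable leaf if its separator to the rest lies inside a
            # single other clique and none of its private variables are needed
            if any(sep <= d for d in rest) and not ((c - sep) & needed_vars):
                del cliques[c]
                changed = True
    return cliques

# On the external junction tree from the slides, with only X4,X5,X6 needed,
# the cliques {X1,X2}, {X2,X3,X4}, {X3,X4,X5} are pruned one by one,
# leaving just the clique {X4,X5,X6} (marginal tables omitted here).
jt = {frozenset({"X1", "X2"}): None,
      frozenset({"X2", "X3", "X4"}): None,
      frozenset({"X3", "X4", "X5"}): None,
      frozenset({"X4", "X5", "X6"}): None}
print(prune_unneeded_leaves(jt, needed_vars={"X4", "X5", "X6"}))
```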
20
Incorporating observations
Original model: X1–X6 with observations Z1, Z3, Z4, Z6, reparametrized as the junction tree X1,X2; X2,X3,X4; X3,X4,X5; X4,X5,X6.
Suppose all observation variables are leaves. Then we can associate each likelihood with any clique that covers its parents:
• the algorithm will pass around clique priors and clique likelihoods
• marginalization still amounts to pruning (e.g., suppose we marginalize out X1)
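One way to write the decomposition this slide describes (standard form; the clique assignment of each likelihood is just one valid choice):

$$p(X_{1:6} \mid z) \;\propto\; \underbrace{\frac{\prod_{C} p(X_C)}{\prod_{S} p(X_S)}}_{\text{clique priors}} \;\times\; \underbrace{p(z_1 \mid X_1)\, p(z_3 \mid X_3)\, p(z_4 \mid X_4)\, p(z_6 \mid X_6)}_{\text{likelihoods, each attached to a covering clique}}$$

e.g., p(z1 | X1) rides with the clique X1,X2, and p(z3 | X3) with either clique containing X3.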
21
Putting it all together
Theorem (global correctness): At convergence, each node n obtains the exact distribution over its query variables, conditioned on all observations.
Theorem (partial correctness): Before convergence, each node n obtains a KL projection over its query variables, conditioned on the collected observations E (onto the junction tree formed by the collected cliques).
22
Results: Convergence
[Plot: error vs. iteration; lower is better]
Robust message passing: converges early, close to the global optimum.
Standard sum-product algorithm: bad answers for a long time, then "snaps" in.
Model: nodes estimate the temperature as well as an additive bias.
23
Results: Robustness
[Plot: robust message passing algorithm; lower is better]
Communication partitioned at t=60, restored at t=120: converges close to the global optimum.
Node failure: insensitive to node failures.
24
How about dynamic inference? Firefighters get fancier equipment…
Place wireless cameras around an environment.
We want to determine the camera locations Ci automatically from local observations. [Funiak et al 2006]
25
Firefighters get fancier equipment…
Distributed camera localization: camera locations Ci, object trajectory M1:T.
This is a dynamic inference problem.
26
How localization works in practice…
27
Model: (Dynamic) Bayesian Network
[Figure: DBN with camera locations C1, C2 and object locations (state processes) M1, M2, M5 at t = 1, 2, 5; observations O(t), e.g., O1(1), O1(2), O1(5), O2(5)]
Transition model: relates the state at time t-1 to the state at time t.
Measurement model: relates the camera image to the state.
Filtering: compute the posterior distribution.
28
Filtering: Summary
[Diagram: prior distribution → prediction → estimation → posterior distribution → roll-up]
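The textbook recursion behind this diagram, writing x^t for the state at time t and z^{1:t} for the observations so far:

$$\text{prediction:}\quad p(x^{t+1} \mid z^{1:t}) \;=\; \sum_{x^t} p(x^{t+1} \mid x^t)\, p(x^t \mid z^{1:t})$$

$$\text{estimation:}\quad p(x^{t+1} \mid z^{1:t+1}) \;\propto\; p(z^{t+1} \mid x^{t+1})\, p(x^{t+1} \mid z^{1:t})$$

Roll-up: discard the time-t variables and, in assumed density filtering, project the belief back onto the compact family.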
29
Observations & transitions introduce dependencies
Suppose a person is observed by cameras 1 & 2 at two consecutive time steps, t and t+1.
At time t: cliques C1,Mt and C2,Mt (C3 separate).
At time t+1: clique C1,C2,Mt+1 (C3 separate); no independence assertions among C1, C2, Mt+1.
Typically, after a while, there are no independence assertions among the state variables C1, C2, …, CN, Mt+1.
30
Junction Tree Assumed Density Filtering
[Figure: the prior distribution at time t is a Markov network over A, B, C, D, E with junction tree ABC; BCD; CDE. Prediction and estimation yield the exact prior at time t+1 with larger cliques ABCD; BCDE. Roll-up / KL projection yields the approximate belief at time t+1, back on ABC; BCD; CDE.]
Periodically project to a "small" junction tree [Boyen, Koller 1998].
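The projection step for the slide's example, using the standard result that the KL projection onto a junction tree keeps the tree's clique marginals:

$$\hat p^{\,t+1} \;=\; \arg\min_{q \text{ factoring as } T} \mathrm{KL}\!\left(p^{t+1} \,\|\, q\right) \;=\; \frac{p^{t+1}(A,B,C)\; p^{t+1}(B,C,D)\; p^{t+1}(C,D,E)}{p^{t+1}(B,C)\; p^{t+1}(C,D)}$$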
31
Distributed Assumed Density Filtering
[Figure: junction tree X1,X2; X2,X3,X4; X3,X4,X5; X4,X5,X6 assigned to network nodes 1, 3, 4, 6]
At each time step, a node computes a marginal over its clique(s):
1. Initialization
2. Estimation: condition on evidence (distributed)
3. Prediction: advance to the next step (local)
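A per-node sketch of the loop these three steps describe; the helper names (initialize, sense, condition_distributed, advance_local) are placeholders for this sketch, not APIs from the talk.

```python
def node_filter_loop(node, model, num_steps, iters_per_step):
    """Distributed assumed density filtering, as seen by a single node
    that owns one or more cliques of the junction tree."""
    belief = node.initialize(model)                 # marginal over own clique(s)
    for t in range(num_steps):
        z = node.sense()                            # local observation at time t
        for _ in range(iters_per_step):             # Estimation: condition on
            belief = node.condition_distributed(belief, z)  # evidence (distributed)
        belief = node.advance_local(belief, model)  # Prediction: advance (local)
    return belief
```

With enough communication rounds per time step (iters_per_step), the theorem on the next slide says this matches the centralized B&K98 filter.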
32
Results: Convergence
[Plot: RMS error vs. iterations per time step (3, 5, 10, 15, 20), compared with the centralized solution; lower is better]
Theorem: Given sufficient communication at each time step, the distribution obtained by the algorithm is equal to that of running the B&K98 algorithm.
33
Convergence: Temperature monitoring
[Plot: error vs. iterations per time step; lower is better]
34
Comparison with Loopy B.P.
[Plot: error for loopy B.P. (window 5 and window 1) vs. the distributed filter (1 and 3 iterations per step); lower is better]
[Diagram: unrolled DBN, t=1 … t=5]
35
Partitions introduce inconsistencies
[Figure: real camera network with a network partition; camera poses and object location; the distribution computed by nodes on the left vs. by nodes on the right]
The beliefs obtained by the left and right sub-networks do not agree on the shared variables and do not represent a globally consistent distribution.
Good news: the beliefs are not too different. The main difference is how certain the beliefs are.
36
The “two Bayesians meet on a street” problem
I believe the sun is up. Man, isn’t it down?
Hard problem, in general. Need samples to decide…
37
Alignment
Idea: formulate alignment as an optimization problem.
Suppose we define the aligned distribution to match the inconsistent prior clique marginals.
Not so great for Gaussians: [Figure: belief 1 over x is uncertain, belief 2 is certain; the aligned distribution comes out uncertain] This objective tends to forget information…
38
Alignment
Suppose we use the KL divergence in the "wrong" order, again matching the aligned distribution to the inconsistent prior marginals.
Good: this tends to prefer more certain distributions q.
For Gaussians, this is a convex problem: determinant maximization [Vandenberghe et al, SIAM 1998]; linear regression, can be distributed [Guestrin, IPSN 04].
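The equations on these two slides did not survive extraction; here is a plausible reconstruction, treating the exact form as an assumption. With inconsistent prior marginals p̂_i and an aligned distribution q with clique marginals q_{C_i}:

$$\text{slide 37:}\quad \min_{q} \sum_i \mathrm{KL}\!\left(\hat p_i \,\|\, q_{C_i}\right) \qquad \text{(tends to forget information)}$$

$$\text{slide 38:}\quad \min_{q} \sum_i \mathrm{KL}\!\left(q_{C_i} \,\|\, \hat p_i\right) \qquad \text{(prefers more certain } q\text{; for Gaussians, a determinant-maximization problem)}$$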
39
Results: Partition
[Plot: error vs. number of partition components, as the communication graph is progressively partitioned; curves: omniscient best, omniscient worst, a simpler alignment, KL minimization; lower is better]
KL minimization performs as well as the best unaligned solution.
40
Conclusion
Distributed inference presents many interesting challenges:
• perform inference directly on the sensor nodes
• be robust to message losses and node failures

Static inference: message passing on a routing tree
• messages = collections of clique marginals and likelihoods
• nodes obtain the joint distribution
• convergence and partial-correctness properties

Dynamic inference: assumed density filtering
• addresses inconsistencies