
Page 1: Probabilistic Inference in Distributed Systems


Probabilistic Inference in Distributed Systems

Stanislav Funiak

Disclaimer: Statements made in this talk are the sole opinions of the presenter and do not necessarily represent the official position of the University or the presenter’s advisor.

Page 2: Probabilistic Inference in Distributed Systems


Monitoring in Emergency Response Systems

Firefighters enter a building

As they run around, place a bunch of sensors

Want to monitor the temperature in various places

Query: p(Xi | z), where Xi is the temperature at location i and z are the temperatures observed at all sensors.

Page 3: Probabilistic Inference in Distributed Systems



Monitoring in Emergency Response Systems

You ask a 10-701 graduate for help: “learn the model”

You ask a 10-708 graduate for help: “implement efficient inference”

Put them in an Intel™ Core-Trio machine with 30 GB RAM

Simulation experiments work great

[Figure: a nice chain model with hidden states X1–X6 and observed temperatures Z2, Z4, Z6, supporting efficient inference]

Done!

Page 4: Probabilistic Inference in Distributed Systems


D-Day arrives…

You start up your machine and…

Firefighters deploy the sensors

The network goes down. Got flooded.

You call up an old-time friend at MIT.

Sends you a patch in 24 minutes.

highly optimized routing

Oops! Part of the ceiling just came down; lost the connection again

Page 5: Probabilistic Inference in Distributed Systems


Last-minute Link Stats

mhm, communication is lossy
mhm, link qualities change

* Joke warning: = 1 week

Maybe having a good routing was not such a bad idea…

Page 6: Probabilistic Inference in Distributed Systems


What’s wrong here?

• Cannot rely on centralized infrastructure
– too costly to gather all observations
– need to be robust against node failures and message losses
– may want to perform online control (nodes equipped with actuators)

• Want to perform inference directly on network nodes

Also:

Autonomous teams of mobile robots

Page 7: Probabilistic Inference in Distributed Systems


Distributed Inference – The Big Picture

Each node n issues a query p(Qn | z), where Qn is some set of variables (e.g., the temperature at locations 1, 2, 3) and z are the temperatures observed at all sensors.

Nodes collaborate to compute the query

Page 8: Probabilistic Inference in Distributed Systems


Probabilistic model vs. physical layer

[Figure: the probabilistic model (hidden X1–X6 with observed Z2, Z4, Z6) vs. the physical layer of the sensor network: physical nodes joined by available communication links]

Page 9: Probabilistic Inference in Distributed Systems


Natural solution: Loopy B.P.

Suppose: Network nodes = Variables

[Figure: a network of eight nodes, one per variable]

Page 10: Probabilistic Inference in Distributed Systems

Natural solution: Loopy B.P.

Suppose: Network nodes = Variables

Then we could run loopy B.P. directly on the network, passing messages such as m4→6, m5→6, m6→8; the belief at node 4 could be viewed as p(X4). [Pfeffer, 2003, 2005]

[Figure: loopy B.P. on a 2x4 grid of variables X1–X8]

Issues:
• may not observe the network structure
• potentially non-converging (not fully resolved)
• definitely over-confident, e.g. belief: 99% hot; truth: 51% hot, 49% cold (will revisit in experimental results)
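The over-confidence can be reproduced on a toy model. A minimal sketch (not from the talk; the potentials and sizes are made up): sum-product loopy B.P. on a four-variable cycle with attractive pairwise potentials; the loopy belief at the node with evidence comes out more extreme than the exact marginal.

```python
import itertools
import numpy as np

# 4 binary variables on a cycle 0-1-2-3-0, attractive pairwise potentials.
psi = np.array([[2.0, 1.0], [1.0, 2.0]])          # psi(xi, xj)
phi = [np.array([1.0, 2.0])] + [np.ones(2)] * 3   # unary evidence only at node 0
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
nbrs = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

# Exact marginal p(x0) by brute-force enumeration.
p_exact = np.zeros(2)
for x in itertools.product([0, 1], repeat=4):
    w = np.prod([phi[i][x[i]] for i in range(4)])
    w *= np.prod([psi[x[i], x[j]] for i, j in edges])
    p_exact[x[0]] += w
p_exact /= p_exact.sum()

# Loopy belief propagation (sum-product) with normalized messages.
msgs = {(i, j): np.ones(2) for i in range(4) for j in nbrs[i]}
for _ in range(200):
    new = {}
    for (i, j) in msgs:
        prod = phi[i].copy()
        for k in nbrs[i]:
            if k != j:
                prod *= msgs[(k, i)]
        m = psi.T @ prod          # sum over xi of psi(xi, xj) * prod(xi)
        new[(i, j)] = m / m.sum()
    msgs = new

belief0 = phi[0] * msgs[(1, 0)] * msgs[(3, 0)]
belief0 /= belief0.sum()

print(p_exact[1], belief0[1])   # the loopy belief is slightly over-confident
```

On a single cycle loopy B.P. converges, but the evidence circulates and gets counted more than once, so belief0 is pushed past the exact marginal; bigger loops and stronger potentials make the gap worse.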

Page 11: Probabilistic Inference in Distributed Systems


Want the Following Properties

1. Global correctness: Eventually, each node obtains the true distribution p(Qn | z)

2. Partial correctness: Before convergence, a node can form a meaningful approximation of p(Qn | z)

3. Local correctness: Without seeing other nodes’ beliefs, each node can condition on its own observations

Page 12: Probabilistic Inference in Distributed Systems

Outline

[Figure: the input model (BN / MRF) over X1–X6 with observations Z2, Z4, Z6 is reparametrized offline and distributed to the sensor network; the communication links carry a routing tree]

Offline: distribute the reparametrized model [Paskin & Guestrin, 2004]

1. Nodes make local observations

2. Nodes establish a routing structure

3. Communicate to compute the query

Page 13: Probabilistic Inference in Distributed Systems

Standard parameterization not robust

Exact model: a Bayesian network over X1, X2, X3, X4 with CPDs p(X2 | X1), p(X3 | X1,X2), p(X4 | X2,X3); marginalizing,
Σ_{X2,X3} p(X2 | X1) × p(X3 | X1,X2) × p(X4 | X2,X3) = p(X4 | X1)

Suppose we “lose” a CPD / potential (not communicated yet, or a node failed), say the lost CPD p(X2 | X1): effectively, we are assuming a uniform prior on X2. Observe high temp.; what is the probability of high temp. at X4? The distribution changes dramatically.

Much better: inference in a simpler model. Now, suppose someone told us p(X2 | X3) and p(X3 | X1). Construct an approximation that preserves the correlation between X1 and X3 and assumes X2 ⊥ X1 | X3.

Page 14: Probabilistic Inference in Distributed Systems

Review: Junction Tree representation

[Figure: BN / MN over X1–X6 and its junction tree with cliques X1,X2; X2,X3,X4; X3,X4,X5; X4,X5,X6 and separators X2; X3,X4; X4,X5]

Properties: running intersection, family-preserving (think of it as writing the CPDs p(X6 | X4,X5), etc.)

We’ll keep the clique marginals and the separator marginals; the original potentials are not important (they can be computed).
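A quick numeric check of this representation (a sketch with made-up CPDs, not the talk’s code): for a chain, the joint is exactly the product of the clique marginals divided by the product of the separator marginals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Chain X1 - X2 - X3 with random CPDs: p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x2).
p1 = rng.random(2); p1 /= p1.sum()
p2_1 = rng.random((2, 2)); p2_1 /= p2_1.sum(axis=1, keepdims=True)
p3_2 = rng.random((2, 2)); p3_2 /= p3_2.sum(axis=1, keepdims=True)
joint = np.einsum('a,ab,bc->abc', p1, p2_1, p3_2)

# Junction tree {X1,X2} - [X2] - {X2,X3}: store clique and separator marginals.
c12 = joint.sum(axis=2)        # clique marginal p(x1, x2)
c23 = joint.sum(axis=0)        # clique marginal p(x2, x3)
s2 = joint.sum(axis=(0, 2))    # separator marginal p(x2)

# The joint is exactly (product of clique marginals) / (product of separators).
recovered = c12[:, :, None] * c23[None, :, :] / s2[None, :, None]
print(np.allclose(recovered, joint))   # True
```

This is why clique and separator marginals are all a node needs to keep: the CPDs can be recomputed from them.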

Page 15: Probabilistic Inference in Distributed Systems

Properties used by the Algorithm

Key properties:

1. Marginalization amounts to pruning cliques: pruning the leaf clique X1,X2 (separator X2) from the junction tree T with cliques X1,X2; X2,X3,X4; X3,X4,X5; X4,X5,X6 (separators X3,X4; X4,X5) yields a junction tree T’ over the remaining variables.

2. Using a subset of cliques amounts to KL-projection: if the clique X3,X4,X5 is missing, the approximation p(X2,X3,X4) × p(X4,X5,X6) / p(X4) asserts X{2,3} ⊥ X{5,6} | X4, and it is the KL-projection of the exact distribution onto all distributions that factor as T’.

[Figure: exact vs. approximate models over variables X2–X6]
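The KL-projection property can be spot-checked numerically. A sketch under simplifying assumptions (a random 4-variable joint, and the extreme case where every clique linking {A,B} to {C,D} is dropped, so the family is a plain product): keeping the product of the exact remaining marginals beats every other member of the family in KL(p || q).

```python
import numpy as np

rng = np.random.default_rng(1)

def kl(p, q):
    # KL divergence between two strictly positive discrete distributions
    return float(np.sum(p * np.log(p / q)))

# Random joint over 4 binary variables A, B, C, D.
p = rng.random((2, 2, 2, 2)); p /= p.sum()

# Drop every clique linking {A,B} to {C,D}: the family is q(a,b) * q(c,d).
# The KL projection min_q KL(p || q) is the product of the exact marginals.
q_ab = p.sum(axis=(2, 3))
q_cd = p.sum(axis=(0, 1))
q = q_ab[:, :, None, None] * q_cd[None, None, :, :]

best = kl(p, q)
# No other member of the family does better (spot-check random members).
for _ in range(200):
    r_ab = rng.random((2, 2)); r_ab /= r_ab.sum()
    r_cd = rng.random((2, 2)); r_cd /= r_cd.sum()
    r = r_ab[:, :, None, None] * r_cd[None, None, :, :]
    assert kl(p, r) >= best - 1e-12
print(best >= 0)   # True
```

The residual KL here is exactly the mutual information between (A,B) and (C,D): the dependence the reduced tree can no longer represent.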

Page 16: Probabilistic Inference in Distributed Systems

From clique marginals to distributed inference

How are these structures used for distributed inference?

Clique marginals p(X1,X2), p(X2,X3,X4), p(X3,X4,X5), p(X4,X5,X6) are assigned to network nodes (here nodes 1, 3, 4, 6).

Network junction tree: [Paskin et al, 2005]
• used for communication
• satisfies the running intersection property
• adaptive, can be optimized (preferring stronger links over weaker links)

[Figure: network junction tree over nodes 1, 3, 4, 6, with separators such as X2,X3,X4,X5]

Page 17: Probabilistic Inference in Distributed Systems

Robust message passing algorithm

Global model: external junction tree X1,X2; X2,X3,X4; X3,X4,X5; X4,X5,X6. Clique marginals are assigned to network nodes 1, 3, 4, 6.

Network junction tree: nodes communicate clique marginals along the network junction tree.

Local cliques: each node locally decides which cliques are sufficient for its neighbors; e.g., node 3 obtained the cliques X2,X3,X4 and X3,X4,X5 exactly.

Page 18: Probabilistic Inference in Distributed Systems

Message passing = pruning leaf cliques

Theorem: On a path towards some network node, the cliques that are not passed form branches of an external junction tree. [Ch 6, Paskin, 2004]

Corollary: At convergence, each node obtains a subtree of the external junction tree.

[Figure: replay on the external junction tree X1,X2; X2,X3,X4; X3,X4,X5; X4,X5,X6, showing the pruned cliques and the cliques obtained by node 1]

Page 19: Probabilistic Inference in Distributed Systems

Incorporating observations

Original model: X1–X6 with observations Z1, Z3, Z4, Z6; reparametrized as the junction tree X1,X2; X2,X3,X4; X3,X4,X5; X4,X5,X6.

Suppose all observation variables are leaves. Then we can associate each likelihood with any clique that covers its parents:
• the algorithm will pass around clique priors and clique likelihoods
• marginalization still amounts to pruning (e.g., suppose we marginalize out X1)
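A small numeric check of this step (hypothetical likelihood values, same chain as before): attaching the likelihood l(x3) = p(z | x3) to the clique {X2,X3}, which covers X3, and renormalizing reproduces the exact posterior.

```python
import numpy as np

rng = np.random.default_rng(2)

# Chain X1 - X2 - X3 with an observation Z whose only parent is X3.
p1 = rng.random(2); p1 /= p1.sum()
p2_1 = rng.random((2, 2)); p2_1 /= p2_1.sum(axis=1, keepdims=True)
p3_2 = rng.random((2, 2)); p3_2 /= p3_2.sum(axis=1, keepdims=True)
joint = np.einsum('a,ab,bc->abc', p1, p2_1, p3_2)
lik = np.array([0.9, 0.2])   # l(x3) = p(z | x3) for the observed value z

# Direct posterior p(x1, x2, x3 | z).
post = joint * lik[None, None, :]
post /= post.sum()

# Clique view: attach the likelihood to the clique {X2,X3}, which covers X3.
c12 = joint.sum(axis=2)                  # prior clique p(x1, x2)
c23 = joint.sum(axis=0) * lik[None, :]   # clique prior times clique likelihood
s2 = joint.sum(axis=(0, 2))              # separator p(x2)

recovered = c12[:, :, None] * c23[None, :, :] / s2[None, :, None]
recovered /= recovered.sum()
print(np.allclose(recovered, post))   # True
```

Because the likelihood rides along with one clique, conditioning composes with the pruning property: marginalizing out X1 is still just dropping the leaf clique {X1,X2}.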

Page 20: Probabilistic Inference in Distributed Systems

Putting it all together

Theorem (Global correctness): At convergence, each node n obtains the exact distribution over its query variables, conditioned on all observations.

Theorem (Partial correctness): Before convergence, each node n obtains a KL projection over its query variables, conditioned on the collected observations E (a projection onto the junction tree formed by the collected cliques).

Page 21: Probabilistic Inference in Distributed Systems

Results: Convergence

[Plot: error vs. iteration for the robust message passing algorithm and the standard sum-product algorithm; lower is better]

Model: nodes estimate the temperature as well as an additive bias.

Robust message passing converges early, close to the global optimum; standard sum-product gives bad answers for a long time, then “snaps” in.

Page 22: Probabilistic Inference in Distributed Systems

Results: Robustness

[Plot: error over time for the robust message passing algorithm; lower is better. Communication partitioned at t=60 and restored at t=120; node failures marked]

Converges close to the global optimum; insensitive to node failures.

Page 23: Probabilistic Inference in Distributed Systems

How about dynamic inference?

Firefighters get fancier equipment…

Place wireless cameras around an environment; want to determine the camera locations Ci automatically from local observations. [Funiak et al 2006]

Page 24: Probabilistic Inference in Distributed Systems

Firefighters get fancier equipment…

Distributed camera localization: jointly estimate the camera locations Ci and the object trajectory M1:T.

This is a dynamic inference problem.

Page 25: Probabilistic Inference in Distributed Systems


How localization works in practice…

Page 26: Probabilistic Inference in Distributed Systems

Model: (Dynamic) Bayesian Network

[Figure: DBN with camera locations C1, C2 and object locations M1, M2, M5 as state processes at t = 1, 2, 5, plus image observations O(t)]

Transition model: relates the state at time t-1 to the state at time t.

Measurement model: relates each camera image to the state.

Filtering: compute the posterior distribution over the current state given all observations so far.

Page 27: Probabilistic Inference in Distributed Systems

Filtering: Summary

• prediction: advance the prior distribution to the next time step with the transition model
• estimation: condition on the new observations to obtain the posterior distribution
• roll-up: marginalize out the previous time step
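The predict/estimate loop can be sketched as a 1-D linear-Gaussian (Kalman) filter; this is an illustrative stand-in with made-up parameters, not the talk’s camera model.

```python
import numpy as np

# 1-D linear-Gaussian filtering: x_t = a x_{t-1} + noise, z_t = x_t + noise.
a, q, r = 0.9, 0.5, 1.0        # transition coeff, process var, observation var
mu, var = 0.0, 1.0             # prior belief at t = 0

def predict(mu, var):
    # roll the belief forward through the transition model
    return a * mu, a * a * var + q

def update(mu, var, z):
    # condition the predicted belief on the new observation
    k = var / (var + r)        # Kalman gain
    return mu + k * (z - mu), (1 - k) * var

rng = np.random.default_rng(0)
x = 0.0
for t in range(50):
    x = a * x + rng.normal(scale=np.sqrt(q))   # simulate the hidden state
    z = x + rng.normal(scale=np.sqrt(r))       # simulate the observation
    mu, var = predict(mu, var)
    mu, var = update(mu, var, z)

print(var)  # the posterior variance settles to a steady state (~0.468)
```

In the Gaussian case the belief stays in a fixed-size family, so no projection is needed; the junction-tree filter below is what replaces this loop when the belief would otherwise grow.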

Page 28: Probabilistic Inference in Distributed Systems

Observations & transitions introduce dependencies

Suppose a person is observed by cameras 1 & 2 at two consecutive time steps t and t+1.

At time t: cliques {C1, Mt}, {C2, Mt}, {C3}

At time t+1: cliques {C1, C2, Mt+1}, {C3}; no independence assertions among C1, C2, Mt+1

Typically, after a while, there are no independence assertions among the state variables C1, C2, …, CN, Mt+1.

Page 29: Probabilistic Inference in Distributed Systems

Junction Tree Assumed Density Filtering

[Figure: Markov network over A, B, C, D, E with junction tree ABC; BCD; CDE]

Prediction and estimation: the prior distribution at time t (junction tree ABC; BCD; CDE) becomes the exact prior at time t+1, whose junction tree has larger cliques ABCD; BCDE.

Roll-up and KL projection: periodically project back to a “small” junction tree (ABC; BCD; CDE), giving the approximate belief at time t+1. [Boyen, Koller 1998]

Page 30: Probabilistic Inference in Distributed Systems

Distributed Assumed Density Filtering

[Figure: external junction tree X1,X2; X2,X3,X4; X3,X4,X5; X4,X5,X6 assigned to network nodes 1, 3, 4, 6]

At each time step, a node computes a marginal over its clique(s):

1. Initialization

2. Estimation: condition on evidence (distributed)

3. Prediction: advance to the next step (local)

Page 31: Probabilistic Inference in Distributed Systems

Results: Convergence

[Plot: RMS error (0 to 1) vs. iterations per time step (3, 5, 10, 15, 20), compared with the centralized solution; lower is better]

Theorem: Given sufficient communication at each time step, the distribution obtained by the algorithm is equal to that of running the B&K98 algorithm.

Page 32: Probabilistic Inference in Distributed Systems

Convergence: Temperature monitoring

[Plot: error vs. iterations per time step; lower is better]

Page 33: Probabilistic Inference in Distributed Systems

Comparison with Loopy B.P.

[Plot: error for loopy B.P. (windows 1 and 5) on the unrolled DBN (t = 1, …, 5) vs. the distributed filter (1 and 3 iterations per time step); lower is better]

Page 34: Probabilistic Inference in Distributed Systems

Partitions introduce inconsistencies

[Figure: real camera network split by a network partition; the nodes on the left and the nodes on the right each compute their own distribution over the object location and the camera poses]

The beliefs obtained by the left and the right sub-network do not agree on the shared variables, so they do not represent a globally consistent distribution.

Good news: the beliefs are not too different. The main difference: how certain the beliefs are.

Page 35: Probabilistic Inference in Distributed Systems


The “two Bayesians meet on a street” problem

I believe the sun is up. Man, isn’t it down?

Hard problem, in general. Need samples to decide…

Page 36: Probabilistic Inference in Distributed Systems

Alignment

Idea: formulate alignment as an optimization problem. Suppose we define the aligned distribution to match the inconsistent prior clique marginals.

Not so great for Gaussians: when belief 1 is uncertain and belief 2 is certain, the aligned distribution comes out wide; this objective tends to forget information.

[Figure: two Gaussian beliefs over x, one uncertain and one certain, and the wide aligned distribution]

Page 37: Probabilistic Inference in Distributed Systems

Alignment

Suppose we use the KL divergence in the “wrong” order: choose the aligned distribution q to minimize its divergence KL(q || pi) to the inconsistent prior marginals pi.

Good: this objective tends to prefer more certain distributions q.

For Gaussians, this is a convex problem: determinant maximization [Vandenberghe et al, SIAM 1998], with the mean given by linear regression, which can be distributed [Guestrin IPSN 04].
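The effect of the KL order can be checked numerically for two one-dimensional Gaussian beliefs (illustrative numbers, not the talk’s data): moment matching, min Σ KL(pi || q), lands at the average of the variances (wide), while the reverse order, min Σ KL(q || pi), lands near their harmonic mean, preferring the certain belief.

```python
import numpy as np

# Two nodes report Gaussian beliefs about the same variable after a partition:
mu1, v1 = 0.0, 100.0   # belief 1: very uncertain
mu2, v2 = 0.0, 1.0     # belief 2: certain

def kl_gauss(m_a, v_a, m_b, v_b):
    # KL( N(m_a, v_a) || N(m_b, v_b) ) for 1-D Gaussians
    return 0.5 * (np.log(v_b / v_a) + (v_a + (m_a - m_b) ** 2) / v_b - 1.0)

# Align by grid search over candidate Gaussians q = N(0, v).
vs = np.linspace(0.1, 120.0, 20000)
moment = [kl_gauss(mu1, v1, 0.0, v) + kl_gauss(mu2, v2, 0.0, v) for v in vs]
reverse = [kl_gauss(0.0, v, mu1, v1) + kl_gauss(0.0, v, mu2, v2) for v in vs]

v_moment = vs[int(np.argmin(moment))]    # min sum KL(p_i || q): mean of variances
v_reverse = vs[int(np.argmin(reverse))]  # min sum KL(q || p_i): ~harmonic mean
print(v_moment, v_reverse)   # ~50.5 vs ~1.98
```

The grid search is just for transparency; in closed form the two minimizers are (v1+v2)/2 and 2/(1/v1+1/v2), which makes the "prefers certain beliefs" claim concrete.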

Page 38: Probabilistic Inference in Distributed Systems

Results: Partition

[Plot: error vs. number of partition components as the communication graph is progressively partitioned; lower is better. Curves: KL minimization, a simpler alignment, omniscient best, omniscient worst]

KL minimization performs as well as the best unaligned solution.

Page 39: Probabilistic Inference in Distributed Systems

Conclusion

Distributed inference presents many interesting challenges:
• perform inference directly on the sensor nodes
• robust to message losses and node failures

Static inference: message passing on a routing tree
• message = collections of clique marginals and likelihoods
• obtains the joint distribution
• convergence and partial-correctness properties

Dynamic inference: assumed density filtering
• addresses inconsistencies