
Direct Message Passing for Hybrid Bayesian Networks Wei Sun, PhD Assistant Research Professor SFL, C4I Center, SEOR Dept. George Mason University, 2009




Slide 2: Outline

Inference for hybrid Bayesian networks

Message passing algorithm

Direct message passing between discrete and continuous variables

Gaussian mixture reduction

Issues

Slide 3: Hybrid Bayesian Networks

Feature types in a hybrid model:

Continuous features: speed, frequency, numeric values such as 0.36589…
Discrete features: location, category, labels such as Type 1 / Class 2

Both DISCRETE and CONTINUOUS variables are involved in a hybrid model.

Slide 4: Hybrid Bayesian Networks – Cont.

The simplest hybrid BN model – Conditional Linear Gaussian (CLG): no discrete child for a continuous parent, and linear relationships between continuous variables. The Clique Tree algorithm provides an exact solution.

General hybrid BNs allow arbitrary continuous densities and arbitrary functional relationships between continuous variables. No exact algorithm exists in general; approximate methods include discretization, simulation, conditional loopy propagation, etc.
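The CLG definition above can be sketched in code: for each state d of the discrete parent, X is Gaussian with a mean that is linear in the continuous parent. All names and numbers below (the table clg, parameters a, b, sigma2) are illustrative assumptions, not from the talk.

```python
import random

# Hypothetical CLG CPD for a continuous node X with one discrete parent D
# and one continuous parent U: given D=d and U=u, X ~ N(a[d] + b[d]*u, s2[d]).
clg = {
    # d: (intercept a_d, linear weight b_d, variance s2_d)
    0: (1.0, 2.0, 0.5),
    1: (-3.0, 0.5, 1.0),
}

def clg_mean(d, u):
    """Conditional mean of X given discrete state d and continuous parent value u."""
    a, b, _ = clg[d]
    return a + b * u

def clg_sample(d, u, rng=random):
    """Draw one sample of X from the CLG conditional distribution."""
    a, b, s2 = clg[d]
    return rng.gauss(a + b * u, s2 ** 0.5)
```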

Slide 5: Innovation

Message passing between pure discrete variables or between pure continuous variables is well defined, but exchanging messages between heterogeneous variables remains an open issue.

In this paper, we unify the message passing framework to exchange information between arbitrary variables:

- Provides exact solutions for polytree CLG, with full density estimation, whereas the Clique Tree algorithm provides only the first two moments. Both have the same complexity.
- Integrates the unscented transformation to provide approximate solutions for nonlinear, non-Gaussian models.
- Uses Gaussian mixtures (GM) to represent continuous messages. GM reduction techniques may be applied to make the algorithm scalable.

Slide 6: Why Message Passing

Local, distributed, fewer computations.

Slide 7: Message Passing in Polytree

In a polytree, any node d-separates the sub-network above it from the sub-network below it. For a typical node X, the evidence can therefore be divided into two exclusive sets, e+ (above X) and e- (below X), and processed separately.

Define the pi and lambda messages as pi(X) = P(X | e+) and lambda(X) = P(e- | X).

Then the belief of node X is BEL(X) = alpha * pi(X) * lambda(X), where alpha is a normalizing constant.

Note that a multiply-connected network may not be partitioned into two separate sub-networks by a single node.
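For a discrete node, the belief update is a pointwise product followed by normalization. A minimal sketch with hypothetical two-state vectors:

```python
def normalize(v):
    """Rescale a non-negative vector so its entries sum to 1."""
    s = sum(v)
    return [x / s for x in v]

def belief(pi, lam):
    """Pearl's update: BEL(X) = alpha * pi(X) * lambda(X), alpha normalizes."""
    return normalize([p * l for p, l in zip(pi, lam)])

# pi from evidence above X, lambda from evidence below X (illustrative numbers)
pi_x = [0.7, 0.3]
lam_x = [0.2, 0.8]
bel = belief(pi_x, lam_x)  # bel is approximately [0.368, 0.632]
```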

Slide 8: Message Passing in Polytree – Cont.

In the message passing algorithm, each node maintains a lambda value and a pi value for itself. It also sends lambda messages to its parents and pi messages to its children.

After a finite number of message passing iterations, every node obtains its correct belief.

For a polytree, MP returns exact beliefs. For networks with loops, MP is called loopy propagation, which often gives good approximations to the posterior distributions.

Slide 9: Message Passing in Hybrid Networks

For a continuous variable, messages are represented by Gaussian mixtures (GM).

Each state of a discrete parent introduces a Gaussian component into the continuous message.

The unscented transformation is used to compute continuous messages when the functional relationship defined in the CPD (Conditional Probability Distribution) is nonlinear.

As messages propagate, the size of the GM grows exponentially. An error-bounded GM reduction technique maintains the scalability of the algorithm.
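The exponential growth is easy to see in one dimension: multiplying an M-component GM by an N-component GM yields M*N components, using the standard closed form for a product of two Gaussian densities. A minimal sketch, assuming a (weight, mean, variance) tuple representation:

```python
import math

def gauss_pdf(x, m, v):
    """Density of N(m, v) evaluated at x."""
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def gm_product(gm1, gm2):
    """Product of two 1-D Gaussian mixtures, each a list of (weight, mean, var).
    The result has len(gm1)*len(gm2) components: the size multiplies every
    time messages are multiplied, which is why GM reduction is needed."""
    out = []
    for w1, m1, v1 in gm1:
        for w2, m2, v2 in gm2:
            v = 1.0 / (1.0 / v1 + 1.0 / v2)      # combined variance
            m = v * (m1 / v1 + m2 / v2)          # combined mean
            c = gauss_pdf(m1, m2, v1 + v2)       # overlap constant of the pair
            out.append((w1 * w2 * c, m, v))
    return out
```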

Slide 10: Direct Passing between Disc. & Cont.

(Diagram: a continuous node X with both discrete and continuous parents, D and U.)

Lambda message to the discrete parent: a non-negative constant per state.

Lambda message to the continuous parent: a Gaussian mixture with the discrete pi message as mixing prior, built from the inverse of the function defined in the CPD of X.

Pi value of X: a Gaussian mixture with the discrete pi message as mixing prior, built from the function specified in the CPD of X.

Messages are exchanged directly between discrete and continuous nodes. The size of the GM grows as messages propagate, so a GM reduction technique is needed to maintain scalability.

Slide 11: Complexity

(Figure: an example hybrid network with nodes Z, Y, X, W, T, U, A, and B, with density plots of the Gaussian mixture messages at several nodes.)

Exploding??

Slide 12: Scalability - Gaussian Mixture Reduction

(Figure: f1, a 2-component GM; f2, a 3-component GM; and their product f1 × f2, a 6-component GM, plotted as the true density with its true components.)

Slide 13: Gaussian Mixture Reduction – Cont.

(Figure: the 6-component GM f1 × f2, plotted as the true density with its true components, next to an approximate 3-component GM, plotted as the approximate density with its approximate components.)

Normalized integrated square error = 0.45%
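The normalized integrated squared error between two 1-D GMs has a closed form, because every cross term is a Gaussian overlap integral. A sketch, assuming normalization by the sum of the two squared norms (the slides do not state which normalization convention is used):

```python
import math

def gauss_pdf(x, m, v):
    """Density of N(m, v) evaluated at x."""
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def gm_cross(f, g):
    """Closed-form integral of f(x)*g(x) dx for 1-D GMs [(w, mean, var), ...]:
    each pair of Gaussians contributes N(m_i; m_j, v_i + v_j)."""
    return sum(wi * wj * gauss_pdf(mi, mj, vi + vj)
               for wi, mi, vi in f for wj, mj, vj in g)

def nise(f, g):
    """Normalized integrated squared error between two GMs, using the
    (||f||^2 + ||g||^2) normalization; always in [0, 1]."""
    ise = gm_cross(f, f) - 2 * gm_cross(f, g) + gm_cross(g, g)
    return ise / (gm_cross(f, f) + gm_cross(g, g))
```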

Slide 14: Example – 4-comp. GM to 20-comp. GM

(Figure: Gaussian mixture reduction with bounded error, showing a 20-component GM and its 4-component approximate GM.)

NISE < 1%
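A reduction step can be sketched as a greedy, moment-preserving pairwise merge. Note that the merge criterion below (closeness of means) is a simplification for illustration; the talk's method selects merges under an explicit error bound instead:

```python
def merge_two(c1, c2):
    """Moment-preserving merge of two weighted Gaussian components (w, mean, var):
    the result matches the total weight, mean, and variance of the pair."""
    w1, m1, v1 = c1
    w2, m2, v2 = c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + m1 ** 2) + w2 * (v2 + m2 ** 2)) / w - m ** 2
    return (w, m, v)

def reduce_gm(gm, target):
    """Greedily merge the pair of components with the closest means until only
    `target` components remain."""
    gm = list(gm)
    while len(gm) > target:
        i, j = min(((i, j) for i in range(len(gm)) for j in range(i + 1, len(gm))),
                   key=lambda p: abs(gm[p[0]][1] - gm[p[1]][1]))
        merged = merge_two(gm[i], gm[j])
        gm = [c for k, c in enumerate(gm) if k not in (i, j)] + [merged]
    return gm
```

Because each merge is moment-preserving, the overall weight and mean of the mixture are unchanged by the reduction.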

Slide 15: Scalability - Error Propagation

Approximate messages propagate, and so do their errors. Each individual approximation can be bounded; however, the total error after many propagations is very difficult to estimate.

Ongoing research: with each GM reduction bounded by a small error, we aim to show that the total approximation error remains bounded, at least empirically.

Slide 16: Numerical Experiments – Polytree CLG

Poly12CLG – a polytree BN model.

DMP vs. Clique Tree: both have the same complexity, and both provide exact solutions for polytrees. DMP provides full density estimation, while the Clique Tree provides only the first two moments for continuous variables.

(Figure: full density estimation by DMP for a continuous variable, comparing the Clique Tree result, the DMP estimate, and the individual GM components.)

Slide 17: Numerical Experiments – Polytree CLG, with GM Reduction

Poly12CLG – a polytree BN model.

(Figure: average and maximum absolute probability errors for the hidden discrete nodes V, A, L, B, H, C after combining pi only, over 100 simulation runs; curves show average error, average diff, maximum error, and maximum diff.)

GM pi value -> single Gaussian approx.

Slide 18: Numerical Experiments – Polytree CLG, with GM Reduction

Poly12CLG – a polytree BN model. GM lambda message -> single Gaussian approx.

(Figure: average and maximum absolute probability errors for the hidden discrete nodes V, A, L, B, H, C after combining lambda only, over 100 simulation runs; curves show average error, average diff, maximum error, and maximum diff.)

Slide 19: Numerical Experiments – Polytree CLG, with GM Reduction

Poly12CLG – a polytree BN model. GM pi and lambda message -> single Gaussian approx.

Slide 20: Reduce GM under Bounded Error

(Figure: average and maximum absolute errors for the hidden discrete nodes V, A, L, B, H, C after approximating both pi and lambda with fewer-component Gaussian mixtures, over 100 simulation runs; curves show average error, average diff, maximum error, and maximum diff.)

With each GM reduction bounded by an error of less than 5%, inference performance improves significantly.

Slide 21: Numerical Experiments – Network with Loops

Loop13CLG – a BN model with loops.

(Figure: average and maximum absolute probability errors for the hidden discrete nodes V, A, L, B, H, C in loopy propagation, over 100 simulation runs; curves show average error, average diff, maximum error, and maximum diff.)

Errors range from 1% to 5% due to loopy propagation.

Slide 22: Empirical Insights

Combining pi does not affect the network 'above' the node.

Combining lambda does not affect the network 'below' the node.

Approximation errors due to GM reduction diminish for discrete nodes further away from the discrete parent nodes.

Loopy propagation usually provides accurate estimates.

Slide 23: Summary & Future Research

DMP provides an alternative algorithm for efficient inference in hybrid BNs:
- Exact for polytree models
- Full density estimations
- Same complexity as Clique Tree
- Scalable, trading off accuracy against computational complexity
- Distributed algorithm, local computations only


Slide 25

(Diagram: a network with nodes A1, A2, A3, …, An and Y1, Y2, Y3, …, Yn, T, E.)

Slide 26: Pi Value of a Cont. Node with both Disc. & Cont. Parents

(Diagram: a continuous node X with a discrete parent D and a continuous parent U.)

The pi value of a continuous node is essentially a distribution transformed by the function defined in the node's CPD, with the pi messages sent from all of its parents as input distributions.

With both discrete and continuous parents, the pi value of the continuous node can be represented by a Gaussian mixture: the discrete pi message serves as the mixing prior, and the function specified in the CPD of X shapes each component.
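For the linear-Gaussian case this construction is concrete: each discrete state contributes one component, weighted by the discrete pi message. A sketch assuming a CPD of the form X | D=d, U=u ~ N(a[d] + b[d]*u, s2[d]); the parameter names are illustrative, and the talk handles nonlinear CPDs via the unscented transformation instead:

```python
def pi_value_x(pi_d, pi_u, clg):
    """Pi value of continuous X with discrete parent D and continuous parent U.
    pi_d: discrete pi message, a list of state probabilities.
    pi_u: (mean, var) of the continuous pi message from U.
    clg:  per-state parameters [(a_d, b_d, s2_d), ...].
    Returns a GM [(weight, mean, var)] with pi_d as the mixing prior."""
    mu, vu = pi_u
    gm = []
    for d, p in enumerate(pi_d):
        a, b, s2 = clg[d]
        # Linear transform of a Gaussian: mean a + b*mu, variance b^2*vu + s2.
        gm.append((p, a + b * mu, b * b * vu + s2))
    return gm
```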

Slide 27: Lambda Value of a Cont. Node

The lambda value of a continuous node is the product of all lambda messages sent from its children.

A lambda message sent to a continuous node is always a continuous message in the form of a Gaussian mixture, because only continuous children are allowed for a continuous node.

The product of Gaussian mixtures is again a Gaussian mixture, with exponentially increased size.

Slide 28: Pi Message Sending to Cont. Node from Disc. Parent

(Diagram: a continuous node X with a discrete parent D and a continuous parent U.)

The pi message sent to a continuous node X from its discrete parent is the product of the pi value of the discrete parent and all lambda messages sent to that parent from its children except X.

A lambda message sent to a discrete node from a child is always a discrete vector, and the pi value of a discrete node is always a discrete distribution.

Hence the pi message sent to a continuous node from its discrete parent is a discrete vector, representing the discrete parent's state probabilities.
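This product-and-exclude rule can be sketched directly; the node names in the example dictionary are hypothetical:

```python
def normalize(v):
    """Rescale a non-negative vector so its entries sum to 1."""
    s = sum(v)
    return [x / s for x in v]

def pi_message_to_child(pi_value_d, lambda_msgs, exclude):
    """Pi message from a discrete node D to one child `exclude`:
    the product of D's pi value with the lambda messages from all of D's
    children except that child, renormalized. Each message is a vector
    over D's states."""
    msg = list(pi_value_d)
    for child, lam in lambda_msgs.items():
        if child == exclude:
            continue
        msg = [m * l for m, l in zip(msg, lam)]
    return normalize(msg)
```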

Slide 29: Pi Message Sending to Cont. Node from Cont. Parent

(Diagram: a continuous node X with a discrete parent D and a continuous parent U.)

The pi message sent to a continuous node X from its continuous parent is the product of the pi value of the continuous parent and all lambda messages sent to that parent from its children except X.

A lambda message sent to a continuous node from a child is always a continuous message, represented by a GM, and the pi value of a continuous node is always a continuous distribution, also represented by a GM.

Hence the pi message sent to a continuous node from its continuous parent is a continuous message, represented by a GM.

Slide 30: Lambda Message Sending to Disc. Parent from Cont. Node

(Diagram: a continuous node X with a discrete parent D and a continuous parent U.)

Given each state of the discrete parent, a function is defined between the continuous node and its continuous parent.

For each state of the discrete parent, the lambda message sent from a continuous node is an integration of two continuous distributions (both represented by GMs), resulting in a non-negative constant.
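The integration has a closed form when both distributions are GMs, because the integral of a product of two Gaussian densities is itself a Gaussian density evaluated at one of the means. A sketch with an assumed (weight, mean, variance) representation:

```python
import math

def gauss_overlap(m1, v1, m2, v2):
    """Closed form for the integral of N(x; m1, v1) * N(x; m2, v2) dx,
    which equals the density N(m1; m2, v1 + v2)."""
    v = v1 + v2
    return math.exp(-(m1 - m2) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def lambda_to_discrete_parent(lam_x, pred_x_given_d):
    """Lambda message to a discrete parent D: for each state d, integrate the
    product of X's continuous lambda message with X's predicted density given
    d (both 1-D GMs of (weight, mean, var)). Yields one non-negative constant
    per state, as on the slide."""
    out = []
    for gm_d in pred_x_given_d:          # one predicted GM per state of D
        c = sum(wl * wp * gauss_overlap(ml, vl, mp, vp)
                for wl, ml, vl in lam_x for wp, mp, vp in gm_d)
        out.append(c)
    return out
```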

Slide 31: Lambda Message Sending to Cont. Parent from Cont. Node

(Diagram: a continuous node X with a discrete parent D and a continuous parent U.)

The lambda message sent from a continuous node to its continuous parent is a Gaussian mixture whose mixing prior is the pi message sent to the node from its discrete parent.

The pi message sent to a continuous node from its discrete parent is a discrete vector, which serves as that mixing prior.

Slide 32: Unscented Transformation

The unscented transformation (UT) is a deterministic sampling method.

UT approximates the first two moments of a continuous random variable transformed via an arbitrary nonlinear function.

UT is based on the principle that it is easier to approximate a probability distribution than a nonlinear function: deterministic sample points (sigma points) are chosen and propagated through the original function, where n is the dimension of X and a scaling parameter controls their spread.

UT keeps the original function unchanged, and the results are exact for linear functions.
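A one-dimensional sketch of the classic Julier-Uhlmann sigma-point scheme; the exact weights used in the talk are not shown on the slide, so the standard ones with scaling parameter kappa are assumed:

```python
import math

def unscented_moments(mean, var, f, kappa=2.0):
    """Unscented transformation in one dimension (n = 1): pick 2n+1 = 3
    deterministic sigma points, push them through f, and recover the first
    two moments of f(X). For a linear f the result is exact."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa),
               1.0 / (2 * (n + kappa)),
               1.0 / (2 * (n + kappa))]
    ys = [f(x) for x in points]
    m = sum(w * y for w, y in zip(weights, ys))
    v = sum(w * (y - m) ** 2 for w, y in zip(weights, ys))
    return m, v
```

For f(x) = 2x + 1 with input mean 3 and variance 4, this recovers mean 7 and variance 16 exactly, matching the slide's claim that UT is exact for linear functions.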

Slide 33: Why Message Passing

Local

Distributed

Fewer computations