An Algorithm for Bayesian Network Construction from Data

By: Jie Cheng, David A. Bell, Weiru Liu

University of Ulster, UK

Presented by: Jian Xu

04/10/23 Machine Learning 2

Outline

• Introduction
• Some basic concepts
• The proposed algorithm for BN construction
• Experiment results
• Discussions & comments


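What is a Bayesian Network?

Cancer BN Example

• Nodes: Metastatic Cancer (M), Serum Calcium (S), Brain Tumor (B), Coma (C), Headaches (H)
• Arcs (as implied by the tables): M → S, M → B, S → C, B → C, B → H
• P(M = +) = .20
• P(S = + | M): .80 for M = +, .20 for M = −
• P(B = + | M): .20 for M = +, .05 for M = −
• P(C = + | S, B): .80 for (+, +), .80 for (+, −), .80 for (−, +), .05 for (−, −)
• P(H = + | B): .80 for B = +, .60 for B = −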
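A minimal sketch of how the cancer example's tables define a full joint distribution: each joint probability is the product of one entry per node. The reading of each table as P(node = + | parents) and the variable names are assumptions made for illustration.

```python
from itertools import product

# P(node = + | parents), as read from the example's tables (assumed reading).
P_M = 0.20                                   # P(M = +)
P_S = {True: 0.80, False: 0.20}              # P(S = + | M)
P_B = {True: 0.20, False: 0.05}              # P(B = + | M)
P_C = {(True, True): 0.80, (True, False): 0.80,
       (False, True): 0.80, (False, False): 0.05}   # P(C = + | S, B)
P_H = {True: 0.80, False: 0.60}              # P(H = + | B)


def bernoulli(p_true, value):
    """Probability of `value` for a binary variable with P(+) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(m, s, b, c, h):
    """P(M, S, B, C, H) as the product of each node's CPD entry."""
    return (bernoulli(P_M, m) * bernoulli(P_S[m], s) * bernoulli(P_B[m], b)
            * bernoulli(P_C[(s, b)], c) * bernoulli(P_H[b], h))


states = [True, False]
# The 32 joint entries must sum to one.
total = sum(joint(*v) for v in product(states, repeat=5))
# Marginal P(C = +), obtained by summing out M, S, B and H.
p_coma = sum(joint(m, s, b, True, h) for m, s, b, h in product(states, repeat=4))
print(round(total, 6), round(p_coma, 6))
```

Five small tables (11 numbers) stand in for the 31 free parameters of an unrestricted joint over five binary variables, which is the sense in which the representation is compact.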


Bayesian Network (BN)

• A Bayesian network is a compact graphical representation of a probability distribution over a set of domain random variables X = {X1, X2, …, Xn}

• Two components
– Structure: a directed acyclic graph (DAG) over the nodes, which encodes causal relations in the domain
– CPD: each node has a conditional probability distribution associated with it


BN Learning

• Structure learning
– To identify the topology of the network
– Score-based methods
– Dependency analysis methods
• Parameter learning
– To learn the conditional probabilities for a given network topology
– MLE, Bayesian approach, etc.
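As an illustration of parameter learning by MLE for a fixed topology, the sketch below estimates one conditional probability table by relative frequencies. The function name `mle_cpt` and the five records are hypothetical and serve only to show the counting.

```python
from collections import Counter


def mle_cpt(records, child, parents):
    """Estimate P(child | parents) by relative frequencies (maximum likelihood)."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in records)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in records)
    return {(pa, value): n / parent_counts[pa] for (pa, value), n in joint.items()}


# Hypothetical complete, discrete records over two variables B and H.
records = [
    {"B": "+", "H": "+"}, {"B": "+", "H": "+"}, {"B": "+", "H": "-"},
    {"B": "-", "H": "+"}, {"B": "-", "H": "-"},
]
cpt = mle_cpt(records, child="H", parents=["B"])
for (parent_values, value), prob in sorted(cpt.items()):
    print(parent_values, value, round(prob, 3))
```

Parent configurations that never occur in the data get no entry at all, which is one reason a Bayesian estimate with prior counts is often preferred on small samples.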


BN Structure Learning

• Search & scoring methods:
– Search for the structure most likely to have generated the data
– Use a heuristic search method to construct a model and evaluate it using a scoring method, such as MDL or a Bayesian score
– May not find the best solution
– Random restarts: to avoid getting stuck in a local maximum
– Lower time complexity than dependency analysis methods in the worst case, i.e., when the underlying DAG is fully connected


BN Structure Learning (Cont’d)

• Dependency analysis methods:
– Use conditional independence (CI) tests to analyze dependency relationships among nodes
– Usually asymptotically correct when the data are DAG-faithful
– Work efficiently when the underlying network is sparse
– CI tests with large condition-sets may be unreliable unless the volume of data is enormous
– Used in the proposed algorithm


Basic Concepts

• D-separation: two nodes X and Y are d-separated given a set of nodes C if and only if there exists no adjacency path P between X and Y such that:
– every collider on P is in C or has a descendant in C
– no other node on P is in C
C is called a condition-set.
• Open path: a path between X and Y is open if every node on the path is active (a collider is active when it is in C or has a descendant in C; any other node is active when it is not in C).
• Closed path: a path is closed if any node on it is inactive.
• Collider node: a node on a path at which two arcs of the path meet head to head.
• Non-collider node: any other node on the path.
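The d-separation definition can be checked directly by enumerating adjacency paths and testing each interior node. The sketch below does this for the cancer example, with arcs M → S, M → B, S → C, B → C, B → H as implied by its tables. Exhaustive path enumeration is only practical for small graphs, and all function names are illustrative.

```python
def descendants(arcs, node):
    """Nodes reachable from `node` along the arc directions."""
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for parent, child in arcs:
            if parent == cur and child not in found:
                found.add(child)
                stack.append(child)
    return found


def adjacency_paths(arcs, x, y, path=None):
    """Yield every simple path from x to y, ignoring arc direction."""
    path = path or [x]
    if path[-1] == y:
        yield path
        return
    for a, b in sorted(arcs):
        if path[-1] in (a, b):
            nxt = b if a == path[-1] else a
            if nxt not in path:
                yield from adjacency_paths(arcs, x, y, path + [nxt])


def is_open(arcs, path, cond):
    """A path is open when every interior node on it is active given cond."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        if (prev, node) in arcs and (nxt, node) in arcs:      # collider
            if node not in cond and not (descendants(arcs, node) & cond):
                return False
        elif node in cond:                                     # non-collider
            return False
    return True


def d_separated(arcs, x, y, cond=()):
    """True when no adjacency path between x and y is open given cond."""
    cond = set(cond)
    return not any(is_open(arcs, p, cond) for p in adjacency_paths(arcs, x, y))


# Arcs (parent, child) of the cancer example.
CANCER = {("M", "S"), ("M", "B"), ("S", "C"), ("B", "C"), ("B", "H")}
print(d_separated(CANCER, "S", "B", {"M"}))        # collider C stays inactive
print(d_separated(CANCER, "S", "B", {"M", "C"}))   # conditioning on C opens S → C ← B
```

The second call shows the collider rule at work: adding C to the condition-set activates the path through C, so S and B are no longer d-separated.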


Basic Concepts (Cont’d)

• DAG-faithful: a dependency model M is DAG-faithful if there exists a DAG that represents all the conditional independence relations of the underlying distribution.

• D-map: a graph G is a dependency map (D-map) of M if every independence relationship in M is true in G. (A BN with no edges is a trivial D-map.)

• I-map: a graph G is an independency map (I-map) of M if every independence relationship in G is true in M. (A fully connected BN is a trivial I-map.)

• Minimal I-map: a graph G is a minimal I-map of M if it is an I-map of M and the removal of any arc from G yields a graph that is not an I-map of M.

• P-map: a graph G is a perfect map (P-map) of M if it is both a D-map and an I-map of M.


Mutual Information

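• The mutual information of two nodes Xi, Xj is defined as:

$I(X_i, X_j) = \sum_{x_i, x_j} P(x_i, x_j) \log \frac{P(x_i, x_j)}{P(x_i)\,P(x_j)}$

• The conditional mutual information given a condition-set C is defined as:

$I(X_i, X_j \mid C) = \sum_{x_i, x_j, c} P(x_i, x_j, c) \log \frac{P(x_i, x_j \mid c)}{P(x_i \mid c)\,P(x_j \mid c)}$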
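Both quantities can be estimated from a dataset by replacing each probability with a relative frequency. The sketch below assumes records are tuples of discrete values and uses the natural logarithm; the slides do not state the base, so any threshold comparison depends on that choice. The XOR dataset is hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(records, i, j):
    """Empirical I(Xi, Xj) from counts; columns are addressed by index."""
    n = len(records)
    n_ij = Counter((r[i], r[j]) for r in records)
    n_i = Counter(r[i] for r in records)
    n_j = Counter(r[j] for r in records)
    # P(a,b) / (P(a) P(b)) simplifies to n_ab * n / (n_a * n_b).
    return sum((c / n) * log(c * n / (n_i[a] * n_j[b]))
               for (a, b), c in n_ij.items())


def conditional_mutual_information(records, i, j, cond):
    """Empirical I(Xi, Xj | C), where `cond` lists the column indices in C."""
    n = len(records)

    def key(r):
        return tuple(r[k] for k in cond)

    n_ijc = Counter((r[i], r[j], key(r)) for r in records)
    n_ic = Counter((r[i], key(r)) for r in records)
    n_jc = Counter((r[j], key(r)) for r in records)
    n_c = Counter(key(r) for r in records)
    # P(a,b|c) / (P(a|c) P(b|c)) simplifies to n_abc * n_c / (n_ac * n_bc).
    return sum((m / n) * log(m * n_c[c] / (n_ic[(a, c)] * n_jc[(b, c)]))
               for (a, b, c), m in n_ijc.items())


# Hypothetical data: the third column is the XOR of the first two.
xor = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(mutual_information(xor, 0, 1))                    # marginally independent
print(conditional_mutual_information(xor, 0, 1, [2]))   # dependent given column 2
```

The XOR data show why the conditional form matters: the first two columns carry no information about each other alone, yet are fully dependent once the third is known.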


Assumptions

• All attributes are discrete
• No missing values in any record
• All the records are drawn independently from a single probability model
• The dataset is large enough for reliable CI tests
• The ordering of the attributes is available before the network construction


An Algorithm for BN Construction

• Drafting
– Computes the mutual information of each pair of nodes and creates a draft of the model
• Thickening
– Adds arcs when pairs of nodes cannot be d-separated, yielding an I-map of the model
• Thinning
– Examines each arc of the I-map using CI tests and removes it if the two nodes of the arc are conditionally independent


Drafting Phase

1. Initialize a graph G(V, E) where V = {all nodes}, E = { }. Initialize two empty ordered sets S, R.

2. For each pair of nodes (vi, vj), i≠j, compute I(vi, vj). Sort all pairs with I(vi, vj) ≥ ε from largest to smallest mutual information and put them into the ordered set S.

3. Get the first two pairs of nodes in S and remove them from S. Add the corresponding arcs to E. (The direction of the arcs is determined by the available node ordering.)

4. Get the first remaining pair of nodes in S and remove it from S. If there is no open path between the two nodes (they are d-separated given the empty set), add the corresponding arc to E. Otherwise add the pair of nodes to the end of the ordered set R.

5. Repeat step 4 until S is empty.


Drafting Example

• Figure (a) is the underlying BN structure

• I(B,D) ≥ I(C,E) ≥ I(B,E) ≥ I(A,B) ≥ I(B,C) ≥ I(C,D) ≥ I(D,E) ≥ I(A,D) ≥ I(A,E) ≥ I(A,C) ≥ ε

• Figure (b) is the draft graph
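A minimal sketch of the drafting steps applied to the ranking above. It assumes the node ordering A, B, C, D, E (the ordering is not stated in the text) and takes the pairs already filtered by ε and sorted, so no mutual information values are needed. The function names are illustrative, and since the figures are not reproduced here, the result cannot be checked against figure (b).

```python
def open_path_exists(arcs, x, y):
    """With an empty condition-set, a path is open exactly when it has no collider."""
    neighbours = {}
    for a, b in arcs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def search(path):
        if path[-1] == y:
            return True
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt in path:
                continue
            # path[-1] becomes an interior node; skip this step if it would be a collider.
            if len(path) >= 2 and (path[-2], path[-1]) in arcs and (nxt, path[-1]) in arcs:
                continue
            if search(path + [nxt]):
                return True
        return False

    return search([x])


def draft(ranked_pairs, order):
    """Steps 1-5: returns the draft arcs E and the deferred pairs R."""
    rank = {v: k for k, v in enumerate(order)}
    edges, deferred = set(), []
    for k, (u, v) in enumerate(ranked_pairs):
        parent, child = sorted((u, v), key=rank.__getitem__)
        # Step 3 adds the first two pairs unconditionally; step 4 checks for an open path.
        if k < 2 or not open_path_exists(edges, parent, child):
            edges.add((parent, child))
        else:
            deferred.append((u, v))
    return edges, deferred


# Pairs in decreasing order of mutual information, as listed in the example.
S = [("B", "D"), ("C", "E"), ("B", "E"), ("A", "B"), ("B", "C"),
     ("C", "D"), ("D", "E"), ("A", "D"), ("A", "E"), ("A", "C")]
E, R = draft(S, order="ABCDE")
print(sorted(E))
print(R)
```

Under these assumptions (B,C) is still added even though B and C are already connected through E, because E is a collider on that path and the path is therefore closed given the empty set.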


Thickening Phase

6. Get the first pair of nodes in R and remove it from R.

7. Find a block set that blocks every open path between these two nodes using the minimum number of nodes. Conduct a CI test; if the two nodes are still dependent given the block set, connect them by an arc.

8. Repeat from step 6 until R is empty.


Thickening Example

• Figure (b) is the draft graph

• Examine the pair (D,E): the minimum set that blocks all the open paths between D and E is {B}

• The CI test reveals that D and E are dependent given {B}, so arc (D,E) is added

• (A,C) is not added because A and C are independent given {B}


Thinning Phase

9. For each arc in E, if there are open paths between the two nodes besides this arc, remove the arc from E temporarily and call the procedure find_block_set(current graph, node1, node2). Conduct a CI test conditioned on the block set. If the two nodes are dependent, add the arc back to E; otherwise remove the arc permanently.


Thinning Example

• Figure (c) is the I-map of the underlying BN

• Arc (B,E) is removed because B and E are independent of each other given {C,D}.

• Figure (d) is the perfect map (P-map) of the underlying dependency model (a).
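Steps 6 to 9 can be sketched on the same example under three loudly stated assumptions. First, the underlying structure is taken to be A → B, B → C, B → D, C → E, D → E, because the figures are not reproduced here. Second, the statistical CI test is replaced by a d-separation oracle on that assumed structure. Third, the block set is found by brute force over subsets, not by the authors' find_block_set procedure. The starting draft and deferred pairs are the ones that drafting yields for the example ranking when the node ordering is A, B, C, D, E.

```python
from itertools import combinations


def descendants(arcs, node):
    """Nodes reachable from `node` along the arc directions."""
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for parent, child in arcs:
            if parent == cur and child not in found:
                found.add(child)
                stack.append(child)
    return found


def adjacency_paths(arcs, x, y, path=None):
    """Yield every simple path from x to y, ignoring arc direction."""
    path = path or [x]
    if path[-1] == y:
        yield path
        return
    for a, b in sorted(arcs):
        if path[-1] in (a, b):
            nxt = b if a == path[-1] else a
            if nxt not in path:
                yield from adjacency_paths(arcs, x, y, path + [nxt])


def d_separated(arcs, x, y, cond=()):
    cond = set(cond)
    for path in adjacency_paths(arcs, x, y):
        for prev, node, nxt in zip(path, path[1:], path[2:]):
            if (prev, node) in arcs and (nxt, node) in arcs:      # collider
                if node not in cond and not (descendants(arcs, node) & cond):
                    break
            elif node in cond:                                     # non-collider
                break
        else:
            return False      # no interior node blocked this path: it is open
    return True


def minimum_block_set(arcs, x, y):
    """Brute force: smallest set of other nodes that closes every path from x to y."""
    others = sorted({n for arc in arcs for n in arc} - {x, y})
    for size in range(len(others) + 1):
        for cand in combinations(others, size):
            if d_separated(arcs, x, y, cand):
                return set(cand)
    return None               # x and y are adjacent, so nothing can separate them


def thicken(edges, deferred, order, independent):
    """Steps 6-8: add an arc for each deferred pair that stays dependent."""
    rank, edges = {v: k for k, v in enumerate(order)}, set(edges)
    for u, v in deferred:
        if not independent(u, v, minimum_block_set(edges, u, v)):
            edges.add(tuple(sorted((u, v), key=rank.__getitem__)))
    return edges


def thin(edges, independent):
    """Step 9: drop each arc whose end nodes are independent given a block set."""
    edges = set(edges)
    for arc in sorted(edges):
        rest = edges - {arc}
        if d_separated(rest, *arc):        # no other open path: keep the arc
            continue
        if independent(*arc, minimum_block_set(rest, *arc)):
            edges = rest
    return edges


TRUE_GRAPH = {("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")}   # assumed
DRAFT = {("A", "B"), ("B", "C"), ("B", "D"), ("B", "E"), ("C", "E")}
DEFERRED = [("C", "D"), ("D", "E"), ("A", "D"), ("A", "E"), ("A", "C")]


def oracle(x, y, cond):
    """Stand-in for a CI test: independence read off the assumed true graph."""
    return d_separated(TRUE_GRAPH, x, y, cond)


i_map = thicken(DRAFT, DEFERRED, "ABCDE", oracle)
final = thin(i_map, oracle)
print(sorted(i_map))
print(sorted(final))
```

With the oracle in place of a data-driven test, thickening adds only (D,E) and thinning removes only (B,E) given {C,D}, in line with the two examples. On real data the outcome also depends on the reliability of each CI test.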


Finding Minimum Block Set


Complexity Analysis

• For a dataset with N attributes, at most r possible values each, and at most k parents per node:
– Phase I: $N^2$ mutual information computations, each requiring $O(r^2)$ basic operations, giving $O(N^2 r^2)$
– Phase II: at most $N^2$ CI tests, each with at most $O(r^{k+2})$ basic operations, giving $O(N^2 r^{k+2})$; worst case $O(N^2 r^N)$
– Phase III: same as Phase II.


ALARM Network Structure


Experiment setup

• ALARM BN (A Logical Alarm Reduction Mechanism): a medical diagnosis system for patient monitoring
– 37 nodes, 46 arcs
– 3 versions: same structure, different CPDs
• 10000 cases for each dataset
• Modified conditional mutual information calculation that takes the variables’ degrees of freedom into consideration to make CI tests more reliable
• ε = 0.003


Result on ALARM BN


Discussions & Comments

• About the assumptions
– All attributes are discrete
– No missing values in any record
– The dataset is large enough for reliable CI tests
– The ordering of the attributes is available before the network construction


Discussions & Comments (Cont’d)

• Threshold ε
– ε = 0.003
– How do we pick an appropriate ε?
– How does the choice of ε affect accuracy and running time?
• Modification in the experiment part
– Uses a modified conditional mutual information calculation that takes the variables’ degrees of freedom into consideration to make CI tests more reliable
– Does this modification affect the result in any way other than increasing the accuracy?

