An Algorithm for Bayesian Network Construction from Data
by: Jie Cheng, David A. Bell, Weiru Liu (University of Ulster, UK)
Presented by: Jian Xu




04/10/23 Machine Learning 2

Outline

• Introduction
• Some basic concepts
• The proposed algorithm for BN construction
• Experiment results
• Discussions & comments


What is a Bayesian Network?

Cancer BN Example (CPTs reconstructed from the slide's figure):

Structure: Metastatic Cancer (M) → Serum Calcium (S); M → Brain Tumor (B); S, B → Coma (C); B → Headaches (H)

  P(M=+) = .20
  P(S=+|M):   M=+ → .80,  M=- → .20
  P(B=+|M):   M=+ → .20,  M=- → .05
  P(C=+|S,B): ++ → .80,  +- → .80,  -+ → .80,  -- → .05
  P(H=+|B):   B=+ → .80,  B=- → .60
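These tables determine the full joint distribution through the chain rule implied by the DAG, P(M, S, B, C, H) = P(M)·P(S|M)·P(B|M)·P(C|S,B)·P(H|B). A minimal sketch of that factorization (the dict encoding and function name are mine; the numbers are the CPT values from the slide):

```python
# CPTs from the Cancer example, keyed by '+'/'-' truth values.
P_M = {'+': 0.20, '-': 0.80}
P_S = {'+': {'+': 0.80, '-': 0.20}, '-': {'+': 0.20, '-': 0.80}}   # P(S|M)
P_B = {'+': {'+': 0.20, '-': 0.80}, '-': {'+': 0.05, '-': 0.95}}   # P(B|M)
P_C = {('+', '+'): {'+': 0.80, '-': 0.20},                          # P(C|S,B)
       ('+', '-'): {'+': 0.80, '-': 0.20},
       ('-', '+'): {'+': 0.80, '-': 0.20},
       ('-', '-'): {'+': 0.05, '-': 0.95}}
P_H = {'+': {'+': 0.80, '-': 0.20}, '-': {'+': 0.60, '-': 0.40}}   # P(H|B)

def joint(m, s, b, c, h):
    """Joint probability via the chain-rule factorization of the DAG."""
    return P_M[m] * P_S[m][s] * P_B[m][b] * P_C[(s, b)][c] * P_H[b][h]
```

Summing `joint` over all 32 assignments gives 1, which is a quick sanity check that the tables were transcribed consistently.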


Bayesian Network (BN)

• A Bayesian network is a compact graphical representation of a probability distribution over a set of domain random variables X = {X1, X2, …, Xn}

• Two components
  – Structure: a directed acyclic graph (DAG) over the nodes, which captures causal relations in the domain
  – CPD: each node has a conditional probability distribution associated with it


BN Learning

• Structure learning
  – To identify the topology of the network
  – Score-based methods
  – Dependency-analysis methods
• Parameter learning
  – To learn the conditional probabilities for a given network topology
  – MLE, Bayesian approach, etc.


BN Structure Learning

• Search & scoring methods:
  – Search for the structure most likely to have generated the data
  – Use a heuristic search method to construct a model and evaluate it with a scoring metric, such as MDL, the Bayesian approach, etc.
  – May not find the best solution
  – Random restarts: to avoid getting stuck in a local maximum
  – Lower time complexity in the worst case, i.e., when the underlying DAG is fully connected


BN Learning Algorithms (Cont’d)

• Dependency analysis methods:
  – Use conditional independence (CI) tests to analyze dependency relationships among nodes
  – Usually asymptotically correct when the data is DAG-faithful
  – Work efficiently when the underlying network is sparse
  – CI tests with large condition sets may be unreliable unless the volume of data is enormous
  – This is the approach used in the proposed algorithm


Basic Concepts

• D-separation: two nodes X and Y are d-separated given C if and only if there exists no adjacency path P between X and Y such that:
  – every collider on P is in C or has a descendant in C
  – no other node on P is in C
  – (C is called a condition set)
• Collider node: a node on a path whose two adjacent path edges both point into it (→ v ←)
• Non-collider node: any other node on the path
• Active node: given C, a collider is active if it is in C or has a descendant in C; a non-collider is active if it is not in C
• Open path: a path between X and Y is open if every node on the path is active
• Closed path: a path is closed if any node on it is inactive
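The collider and open-path rules above can be turned into a reachability check over the DAG. Below is a sketch in the style of Shachter's Bayes-Ball algorithm (an equivalent formulation of d-separation, not the procedure used in the paper; all names are illustrative):

```python
from collections import deque

def d_separated(arcs, x, y, cond):
    """True if x and y are d-separated given `cond` in the DAG whose
    edges are the (parent, child) pairs in `arcs`."""
    parents, children = {}, {}
    for u, v in arcs:
        children.setdefault(u, []).append(v)
        parents.setdefault(v, []).append(u)
    cond = set(cond)
    # State: (node, from_child) -- whether the "ball" entered from a child.
    queue = deque([(x, True)])           # start as if arriving from a child
    seen = set(queue)
    while queue:
        v, from_child = queue.popleft()
        if v == y:
            return False                 # an active path reaches y
        if from_child and v not in cond:
            # Non-collider passage: continue up to parents, down to children.
            moves = [(p, True) for p in parents.get(v, [])] + \
                    [(c, False) for c in children.get(v, [])]
        elif not from_child and v in cond:
            # Observed collider: the ball bounces back up to the parents.
            moves = [(p, True) for p in parents.get(v, [])]
        elif not from_child:
            moves = [(c, False) for c in children.get(v, [])]
        else:
            moves = []                   # blocked non-collider in cond
        for s in moves:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return True
```

The bounce at an observed node also handles the "descendant in C" case: a ball passing down through an unobserved collider can reach an observed descendant and bounce back up through the collider's other parent.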


Basic Concepts (Cont’d)

• DAG-faithful: a distribution is DAG-faithful if there exists a DAG that can represent all of its conditional independence relations.
• D-map: a graph G is a dependency map (D-map) of M if every independence relationship in M is true in G (an edgeless graph is trivially a D-map).
• I-map: a graph G is an independency map (I-map) of M if every independence relationship in G is true in M (a fully connected graph is trivially an I-map).
• Minimal I-map: a graph G that is an I-map of M, but such that removing any arc from G yields a graph that is not an I-map of M.
• P-map: a graph G is a perfect map of M if it is both a D-map and an I-map of M.


Mutual Information

• The mutual information of two nodes X_i, X_j is defined as:

  I(X_i, X_j) = \sum_{x_i, x_j} P(x_i, x_j) \log \frac{P(x_i, x_j)}{P(x_i) P(x_j)}

• The conditional mutual information, given a condition set C, is defined as:

  I(X_i, X_j \mid C) = \sum_{x_i, x_j, c} P(x_i, x_j, c) \log \frac{P(x_i, x_j \mid c)}{P(x_i \mid c) P(x_j \mid c)}
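A minimal sketch of the first definition, estimating I(X_i, X_j) from paired samples (the function name and the choice of base-2 logarithm are mine; the slides do not fix a base):

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits from paired samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))   # joint counts
    px = Counter(xs)             # marginal counts for X
    py = Counter(ys)             # marginal counts for Y
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi
```

Two perfectly correlated binary variables with uniform marginals give 1 bit; independent variables give 0, which matches the use of I(·,·) ≥ ε as a dependence test in the drafting phase.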


Assumptions

• All attributes are discrete
• No missing values in any record
• All records are drawn independently from a single probability model
• The dataset is large enough for reliable CI tests
• An ordering of the attributes is available before network construction


An Algorithm for BN Construction

• Drafting
  – Compute the mutual information of each pair of nodes and create a draft of the model
• Thickening
  – Add arcs when pairs of nodes cannot be d-separated, yielding an I-map of the model
• Thinning
  – Examine each arc of the I-map with CI tests and remove it if the two nodes of the arc are conditionally independent


Drafting Phase

1. Initialize a graph G(V, E) with V = {all nodes} and E = { }; initialize two empty ordered lists S and R.

2. For each pair of nodes (vi, vj), i ≠ j, compute I(vi, vj). Sort all pairs with I(vi, vj) ≥ ε from large to small, and put them into the ordered list S.

3. Take the first two pairs of nodes from S, remove them from S, and add the corresponding arcs to E (the direction of each arc is determined by the available node ordering).

4. Take the first pair of nodes remaining in S and remove it from S. If there is no open path between the two nodes (they are d-separated given the empty set), add the corresponding arc to E; otherwise append the pair to the end of the ordered list R.

5. Repeat step 4 until S is empty.
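The steps above can be sketched as follows. The pairwise mutual-information scores are assumed precomputed (e.g. with the formula on the earlier slide), and the open-path test given the empty condition set is implemented as a search for a trail containing no collider; helper names are mine, not the paper's:

```python
from collections import deque

def open_path_exists(arcs, x, y):
    """Active path between x and y given the empty set: a trail with no
    collider (-> v <-).  BFS over (node, entered-via-arrowhead) states."""
    out_nb, in_nb = {}, {}
    for u, v in arcs:
        out_nb.setdefault(u, []).append(v)   # arrow leaves u
        in_nb.setdefault(v, []).append(u)    # arrow enters v
    queue = deque([(x, False)])
    seen = set(queue)
    while queue:
        node, entered_into = queue.popleft()
        if node == y:
            return True
        # Leaving along node -> w is always allowed (arrow points away).
        moves = [(w, True) for w in out_nb.get(node, [])]
        if not entered_into:                 # node is not a collider here
            moves += [(w, False) for w in in_nb.get(node, [])]
        for s in moves:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return False

def draft(pairs_mi, order, eps=0.003):
    """Drafting phase (steps 2-5).  pairs_mi maps (a, b) -> I(a, b);
    `order` maps node -> rank and orients the arcs."""
    S = [p for p, mi in sorted(pairs_mi.items(), key=lambda kv: -kv[1])
         if mi >= eps]
    E, R = [], []
    for k, (a, b) in enumerate(S):
        a, b = sorted((a, b), key=order.get)  # orient by node ordering
        if k < 2 or not open_path_exists(E, a, b):
            E.append((a, b))                  # steps 3-4: add the arc
        else:
            R.append((a, b))                  # defer to the thickening phase
    return E, R
```

The `k < 2` branch reproduces step 3, which adds the first two (highest-MI) pairs unconditionally.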


Drafting Example

• Figure (a) is the underlying BN structure

• I(B,D) ≥ I(C,E) ≥ I(B,E) ≥ I(A,B) ≥ I(B,C) ≥ I(C,D) ≥ I(D,E) ≥ I(A,D) ≥ I(A,E) ≥ I(A,C) ≥ ε

• Figure (b) is the draft graph


Thickening Phase

6. Take the first pair of nodes from R and remove it from R.

7. Find a block set that blocks each open path between the two nodes using a minimum number of nodes. Conduct a CI test; if the two nodes are still dependent given the block set, connect them with an arc.

8. Go to step 6 until R is empty.
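Steps 6-8 can be sketched with the block-set search and the CI test left as black-box callables, since their internals are not shown on these slides (find_block_set appears later only as a procedure name):

```python
def thicken(E, R, find_block_set, ci_dependent):
    """Thickening phase (steps 6-8).  `find_block_set(E, a, b)` and
    `ci_dependent(a, b, block_set)` are assumed callables standing in
    for the paper's procedures."""
    for a, b in R:                        # step 6: take pairs in order
        block = find_block_set(E, a, b)   # step 7: block all open paths
        if ci_dependent(a, b, block):     # still dependent given the block set
            E.append((a, b))              # connect the nodes with an arc
    return E
```

With stub callables this reproduces the slide's example behavior: a deferred pair that remains dependent given its block set gains an arc, one that becomes independent does not.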


Thickening Example

• Figure (b) is the draft graph
• Examine the pair (D, E): the minimum set that blocks all open paths between D and E is {B}
• A CI test reveals that D and E are still dependent given {B}, so arc (D, E) is added
• Arc (A, C) is not added because A and C are independent given {B}


Thinning Phase

9. For each arc in E, if there are open paths between its two nodes other than the arc itself, remove the arc from E temporarily and call find_block_set(current graph, node1, node2). Conduct a CI test conditioned on the block set. If the two nodes are still dependent, add the arc back to E; otherwise remove it permanently.
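Step 9 can be sketched the same way, with the helpers passed in as assumed callables (has_open_path tests for an open path once the arc is temporarily removed; the other two are as in the thickening phase):

```python
def thin(E, find_block_set, ci_dependent, has_open_path):
    """Thinning phase (step 9).  All three helpers are assumed callables
    standing in for procedures not detailed on these slides."""
    for arc in list(E):
        a, b = arc
        rest = [e for e in E if e != arc]     # remove the arc temporarily
        if has_open_path(rest, a, b):         # other open paths exist
            block = find_block_set(rest, a, b)
            if not ci_dependent(a, b, block):
                E.remove(arc)                 # conditionally independent: drop
    return E
```

Iterating over a copy of E while removing from E keeps the loop well-defined as arcs are deleted.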


Thinning Example

• Figure (c) is an I-map of the underlying BN
• Arc (B, E) is removed because B and E are independent of each other given {C, D}
• Figure (d) is a perfect map of the underlying dependency model (a)


Finding Minimum Block Set


Complexity Analysis

• For a dataset with N attributes, each with at most r possible values and at most k parents per node:
  – Phase I: O(N²) mutual-information computations, each requiring O(r²) basic operations: O(N²r²)
  – Phase II: at most O(N²) CI tests, each with at most O(r^(k+2)) basic operations: O(N²·r^(k+2)); worst case O(N²·r^N)
  – Phase III: same as Phase II


ALARM Network Structure


Experiment setup

• ALARM BN (A Logical Alarm Reduction Mechanism): a medical diagnosis system for patient monitoring
  – 37 nodes, 46 arcs
  – 3 versions: same structure, different CPDs
• 10,000 cases for each dataset
• Modified conditional mutual-information calculation, taking each variable’s degrees of freedom into consideration, to make CI tests more reliable
• ε = 0.003


Result on ALARM BN


Discussions & Comments

• About the assumptions
  – All attributes are discrete
  – No missing values in any record
  – The dataset is large enough for reliable CI tests
  – An ordering of the attributes is available before network construction


• Threshold ε
  – ε = 0.003
  – How do we pick an appropriate ε?
  – How does the choice of ε affect accuracy and running time?
• Modification in the experiment
  – Uses a modified conditional mutual-information calculation that takes each variable’s degrees of freedom into consideration, to make CI tests more reliable
  – Does this modification affect the result in any way other than increasing accuracy?
