An Algorithm for Bayesian Network Construction from Data
By: Jie Cheng, David A. Bell, Weiru Liu (University of Ulster, UK)
Presented by: Jian Xu
04/10/23 Machine Learning 2
Outline
• Introduction
• Some basic concepts
• The proposed algorithm for BN construction
• Experiment results
• Discussions & comments
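What is a Bayesian Network?
Cancer BN Example
[Figure: network over Metastatic Cancer (M), Serum Calcium (S), Brain Tumor (B), Coma (C), and Headaches (H), with the conditional probability tables below]

P(M) = .20

M  P(S)
+  .80
-  .20

M  P(B)
+  .20
-  .05

S  B  P(C)
+  +  .80
+  -  .80
-  +  .80
-  -  .05

B  P(H)
+  .80
-  .60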
Bayesian Network (BN)
• A Bayesian network is a compact graphical representation of a probability distribution over a set of domain random variables X = {X1, X2, …, Xn}
• Two components
  – Structure: a directed acyclic graph (DAG) over the nodes, which exploits causal relations in the domain
  – CPD: each node has an associated conditional probability distribution given its parents
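To make the "compact representation" point concrete, here is a minimal sketch that rebuilds the joint distribution of the Cancer BN example from its CPDs. It assumes that "+" means true and that the arcs are M→S, M→B, S→C, B→C, and B→H, as the table headers suggest; the helper names (`bern`, `joint`, `marginal`) are illustrative and not from the slides.

```python
from itertools import product

# CPD entries read from the Cancer BN example; each value is P(child = + | parents).
# Assumption: '+' is encoded as True, '-' as False.
P_M = 0.20
P_S = {True: 0.80, False: 0.20}                      # keyed by M
P_B = {True: 0.20, False: 0.05}                      # keyed by M
P_C = {(True, True): 0.80, (True, False): 0.80,
       (False, True): 0.80, (False, False): 0.05}    # keyed by (S, B)
P_H = {True: 0.80, False: 0.60}                      # keyed by B


def bern(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(m, s, b, c, h):
    """Chain-rule factorization: P(M) P(S|M) P(B|M) P(C|S,B) P(H|B)."""
    return (bern(P_M, m) * bern(P_S[m], s) * bern(P_B[m], b)
            * bern(P_C[(s, b)], c) * bern(P_H[b], h))


def marginal(**fixed):
    """Sum the joint over all assignments consistent with `fixed` (e.g. s=True)."""
    names = ("m", "s", "b", "c", "h")
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        if all(assignment[k] == v for k, v in fixed.items()):
            total += joint(**assignment)
    return total


print(marginal(), marginal(s=True), marginal(b=True))
```

The five CPDs hold 1 + 2 + 2 + 4 + 2 = 11 numbers, whereas an unconstrained joint over five binary variables needs 31.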
BN Learning
• Structure learning
  – To identify the topology of the network
  – Score-based methods
  – Dependency analysis methods
• Parameter learning
  – To learn the conditional probabilities for a given network topology
  – MLE, Bayesian approach, etc.
BN Structure Learning
• Search & scoring methods:
  – Search for the structure most likely to have generated the data
  – Use a heuristic search method to construct a model and evaluate it with a scoring method, such as MDL or a Bayesian score
  – May not find the best solution
  – Random restarts help avoid getting stuck in a local maximum
  – Lower time complexity than dependency analysis methods in the worst case, i.e., when the underlying DAG is fully connected
BN Learning Algorithms (Cont’d)
• Dependency analysis methods:
  – Use conditional independence (CI) tests to analyze dependency relationships among nodes
  – Usually asymptotically correct when the data is DAG-faithful
  – Work efficiently when the underlying network is sparse
  – CI tests with large condition sets may be unreliable unless the volume of data is enormous
  – Used in the proposed algorithm
Basic Concepts
• D-separation: two nodes X and Y are d-separated given a set of nodes C (the condition set) if and only if there exists no adjacency path P between X and Y such that:
  – every collider on P is in C or has a descendant in C, and
  – no other node on P is in C
• Open path: a path between X and Y is open if every node on the path is active (a collider is active when it or one of its descendants is in C; any other node is active when it is not in C)
• Closed path: a path is closed if any node on it is inactive
• Collider node: a node on a path at which two arcs of the path meet head to head
• Non-collider node: any other node on the path
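The d-separation definition can be checked mechanically. The sketch below does not enumerate paths; it uses the equivalent moralized-ancestral-graph test (keep only the ancestors of X, Y, and C, marry co-parents, drop directions, delete C, and test connectivity). The function name `d_separated` and the arc list `CANCER_ARCS` are illustrative; the arcs are the ones assumed for the Cancer BN example (M→S, M→B, S→C, B→C, B→H).

```python
from collections import deque


def d_separated(x, y, cond, arcs):
    """True if x and y are d-separated given `cond` in the DAG given by (parent, child) arcs.

    Assumes x and y are not themselves in `cond`.
    """
    cond = set(cond)
    parents = {}
    for p, c in arcs:
        parents.setdefault(c, set()).add(p)

    # 1. Keep only x, y, the condition set, and all of their ancestors.
    keep, stack = {x, y} | cond, [x, y, *cond]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in keep:
                keep.add(p)
                stack.append(p)

    # 2. Moralize: link each kept node to its parents, link co-parents, drop directions.
    nbrs = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for i, p in enumerate(ps):
            nbrs[child].add(p)
            nbrs[p].add(child)
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)

    # 3. x and y are d-separated iff every undirected path between them passes through cond.
    seen, queue = {x}, deque([x])
    while queue:
        v = queue.popleft()
        if v == y:
            return False
        for w in nbrs[v] - cond - seen:
            seen.add(w)
            queue.append(w)
    return True


# Assumed structure of the Cancer BN example.
CANCER_ARCS = [("M", "S"), ("M", "B"), ("S", "C"), ("B", "C"), ("B", "H")]

print(d_separated("S", "B", {"M"}, CANCER_ARCS))       # common cause M is blocked, collider C is inactive
print(d_separated("S", "B", {"M", "C"}, CANCER_ARCS))  # conditioning on collider C opens S -> C <- B
```

Conditioning on the collider C is what flips the second answer: S and B are d-separated given {M} but not given {M, C}.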
Basic Concepts (Cont’d)
• DAG-faithful: a distribution is DAG-faithful when there exists a DAG that can represent all the conditional independence relations of the distribution.
• D-map: a graph G is a dependency map (D-map) of a dependency model M if every independence relationship in M is true in G. (Example: a BN with no edges.)
• I-map: a graph G is an independency map (I-map) of M if every independence relationship in G is true in M. (Example: a fully connected BN.)
• Minimal I-map: a graph G is a minimal I-map of M if it is an I-map of M and the removal of any arc from G yields a graph that is not an I-map of M.
• P-map: a graph G is a perfect map (P-map) of M if it is both a D-map and an I-map of M.
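Mutual Information
• The mutual information of two nodes Xi, Xj is defined as:
  $$I(X_i, X_j) = \sum_{x_i, x_j} P(x_i, x_j) \log \frac{P(x_i, x_j)}{P(x_i)\,P(x_j)}$$
• The conditional mutual information given a condition set C is defined as:
  $$I(X_i, X_j \mid C) = \sum_{x_i, x_j, c} P(x_i, x_j, c) \log \frac{P(x_i, x_j \mid c)}{P(x_i \mid c)\,P(x_j \mid c)}$$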
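Both quantities can be estimated from discrete data by replacing probabilities with relative frequencies. The following is a minimal sketch under that assumption, using natural logarithms; the function names are illustrative, and each element of `cs` stands for the joint value of the whole condition set (for several conditioning variables, pass tuples).

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X, Y) from two equally long sequences of discrete values."""
    n = len(xs)
    n_xy, n_x, n_y = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # P(x,y) / (P(x) P(y)) = n_xy * n / (n_x * n_y)
    return sum((k / n) * math.log(k * n / (n_x[x] * n_y[y]))
               for (x, y), k in n_xy.items())


def conditional_mutual_information(xs, ys, cs):
    """Plug-in estimate of I(X, Y | C); cs[i] is the value of the condition set in record i."""
    n = len(xs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc, n_yc, n_c = Counter(zip(xs, cs)), Counter(zip(ys, cs)), Counter(cs)
    # P(x,y|c) / (P(x|c) P(y|c)) = n_xyc * n_c / (n_xc * n_yc)
    return sum((k / n) * math.log(k * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
               for (x, y, c), k in n_xyc.items())


x = [0, 0, 1, 1]
print(mutual_information(x, x))                    # identical variables: log 2
print(mutual_information(x, [0, 1, 0, 1]))         # independent in this sample: 0
print(conditional_mutual_information(x, x, x))     # nothing left once C = X is known: 0
```

In a CI test, a pair is treated as conditionally independent when the estimate falls below the threshold ε.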
Assumptions
• All attributes are discrete
• No missing values in any record
• All the records are drawn independently from a single probability model
• The dataset is large enough for reliable CI tests
• The ordering of the attributes is available before the network construction
An Algorithm for BN Construction
• Drafting
  – Computes the mutual information of each pair of nodes and creates a draft of the model
• Thickening
  – Adds arcs between pairs of nodes that cannot be d-separated, yielding an I-map of the model
• Thinning
  – Examines each arc of the I-map using CI tests and removes it if the two nodes of the arc are conditionally independent
Drafting Phase
1. Initialize a graph G(V, E) where V = {all nodes} and E = { }. Initialize two empty ordered lists S and R.
2. For each pair of nodes (vi, vj), i ≠ j, compute I(vi, vj). Sort all pairs with I(vi, vj) ≥ ε from largest to smallest, and put them into S in that order.
3. Take the first two pairs of nodes from S, removing them from S, and add the corresponding arcs to E. (The direction of each arc is determined by the available node ordering.)
4. Take the first remaining pair of nodes from S, removing it from S. If there is no open path between the two nodes (they are d-separated given the empty set), add the corresponding arc to E. Otherwise, add the pair to the end of R.
5. Repeat step 4 until S is empty.
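Steps 1 to 5 can be sketched as follows. The sketch takes the already sorted list S as input instead of computing mutual information, and it relies on one fact about DAGs: with an empty condition set, a path is open exactly when it contains no collider, so two nodes have an open path if and only if they share an ancestor (counting each node as its own ancestor). The function name `draft` and the node ordering A, B, C, D, E are assumptions; the ranking is the one from the drafting example.

```python
def draft(order, ranked_pairs):
    """Drafting phase sketch.

    order:        node ordering (earlier nodes may only be parents of later ones)
    ranked_pairs: the list S, i.e. pairs with I >= eps sorted by decreasing mutual information
    Returns (arcs, R): the draft arcs as (parent, child) and the deferred pairs.
    """
    rank = {v: i for i, v in enumerate(order)}
    parents = {v: set() for v in order}
    arcs, deferred = [], []

    def with_ancestors(v):
        """v together with all of its ancestors in the current draft."""
        seen, stack = {v}, [v]
        while stack:
            for p in parents[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    for i, (a, b) in enumerate(ranked_pairs):
        open_path = bool(with_ancestors(a) & with_ancestors(b))
        if i < 2 or not open_path:          # step 3 (first two pairs) or step 4 (no open path)
            u, v = sorted((a, b), key=rank.__getitem__)   # direct the arc by the node ordering
            parents[v].add(u)
            arcs.append((u, v))
        else:
            deferred.append((a, b))         # step 4: defer the pair to R
    return arcs, deferred


# Ranking from the drafting example; the ordering A < B < C < D < E is an assumption.
S = [("B", "D"), ("C", "E"), ("B", "E"), ("A", "B"), ("B", "C"),
     ("C", "D"), ("D", "E"), ("A", "D"), ("A", "E"), ("A", "C")]
arcs, R = draft(["A", "B", "C", "D", "E"], S)
print(arcs)  # [('B', 'D'), ('C', 'E'), ('B', 'E'), ('A', 'B'), ('B', 'C')]
print(R)     # [('C', 'D'), ('D', 'E'), ('A', 'D'), ('A', 'E'), ('A', 'C')]
```

Under these assumptions the first five pairs become arcs and the remaining five are deferred to R for the thickening phase.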
Drafting Example
• Figure (a) is the underlying BN structure
• I(B,D) ≥ I(C,E) ≥ I(B,E) ≥ I(A,B) ≥ I(B,C) ≥ I(C,D) ≥ I(D,E) ≥ I(A,D) ≥ I(A,E) ≥ I(A,C) ≥ ε
• Figure (b) is the draft graph
Thickening Phase
6. Take the first pair of nodes from R, removing it from R.
7. Find a block set that blocks every open path between these two nodes using the minimum number of nodes. Conduct a CI test; if the two nodes are still dependent given the block set, connect them by an arc.
8. Go to step 6 until R is empty.
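A rough sketch of steps 6 to 8 follows, with two simplifications that are not the authors' method. First, instead of a minimum block set it conditions on the current parents of the later node of the pair; because all arcs follow the node ordering, that set always d-separates the pair in the current graph, though it may be larger than necessary. Second, the CI test is passed in as a callback `is_dependent(x, y, block)`, a hypothetical interface. The demo oracle is hand-written: the slides state only that (D,E) is dependent and (A,C) independent, and the answers for the other pairs are assumed.

```python
def thicken(order, draft_arcs, deferred_pairs, is_dependent):
    """Thickening phase sketch: add an arc for every deferred pair that fails the CI test."""
    rank = {v: i for i, v in enumerate(order)}
    arcs = list(draft_arcs)
    for a, b in deferred_pairs:                        # step 6: take pairs from R in order
        u, v = sorted((a, b), key=rank.__getitem__)    # u precedes v in the node ordering
        # Step 7 (simplified): block with the current parents of the later node.
        block = frozenset(p for p, c in arcs if c == v)
        if is_dependent(u, v, block):
            arcs.append((u, v))                        # still dependent: connect the pair
    return arcs                                        # step 8: loop ends when R is exhausted


ORDER = ["A", "B", "C", "D", "E"]                      # assumed node ordering
DRAFT = [("B", "D"), ("C", "E"), ("B", "E"), ("A", "B"), ("B", "C")]
R = [("C", "D"), ("D", "E"), ("A", "D"), ("A", "E"), ("A", "C")]


def oracle(x, y, block):
    """Stand-in for a real CI test: only D and E remain dependent."""
    return {x, y} == {"D", "E"}


thickened = thicken(ORDER, DRAFT, R, oracle)
print(thickened[-1])   # ('D', 'E')
```

For the pair (D,E) this sketch conditions on {B, C}, the parents of E, where a minimum block set would be {B}; a larger condition set makes the CI test less reliable on real data, which is why the minimum set matters.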
Thickening Example
• Figure (b) is the draft graph
• Examine the pair (D,E): the minimum set that blocks all the open paths between D and E is {B}
• The CI test reveals that D and E are dependent given {B}, so arc (D,E) is added
• (A,C) is not added because A and C are independent given {B}
Thinning Phase
9. For each arc in E, if there are open paths between the two nodes besides this arc, remove the arc from E temporarily and call procedure find_block_set(current graph, node1, node2). Conduct a CI test conditioned on the block set. If the two nodes are dependent, add the arc back to E; otherwise remove it permanently.
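Step 9 can be sketched along the same lines. This is an illustration under stated assumptions, not the authors' procedure: "open path" is read as open given the empty set (the two nodes share an ancestor once the arc is set aside), `find_block_set` is replaced by the simpler choice of the child's remaining parents, and the CI test is a hypothetical callback `is_dependent(x, y, block)`. The demo oracle is hand-written so that only B and E test as independent, matching the thinning example.

```python
def with_ancestors(v, arcs):
    """v together with all of its ancestors in the DAG given by (parent, child) arcs."""
    seen, stack = {v}, [v]
    while stack:
        node = stack.pop()
        for p, c in arcs:
            if c == node and p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def thin(arcs, is_dependent):
    """Thinning phase sketch: drop every arc whose end nodes pass the CI test."""
    arcs = list(arcs)
    for u, v in list(arcs):
        rest = [e for e in arcs if e != (u, v)]        # remove the arc temporarily
        if not (with_ancestors(u, rest) & with_ancestors(v, rest)):
            continue                                    # no other open path: keep the arc
        # Simplified block set: the remaining parents of the child node.
        block = frozenset(p for p, c in rest if c == v)
        if not is_dependent(u, v, block):
            arcs = rest                                 # independent: remove permanently
    return arcs


# I-map after thickening in the running example (node ordering A < B < C < D < E assumed).
I_MAP = [("B", "D"), ("C", "E"), ("B", "E"), ("A", "B"), ("B", "C"), ("D", "E")]
queries = []


def oracle(x, y, block):
    """Stand-in for a real CI test; records each query it receives."""
    queries.append((x, y, block))
    return (x, y) != ("B", "E")


thinned = thin(I_MAP, oracle)
print(thinned)   # [('B', 'D'), ('C', 'E'), ('A', 'B'), ('B', 'C'), ('D', 'E')]
```

Only three arcs trigger a CI test here, and for (B,E) the condition set comes out as {C, D}, the same set used in the thinning example.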
Thinning Example
• Figure (c) is the I-map of the underlying BN
• Arc (B,E) is removed because B and E are independent of each other given {C,D}
• Figure (d) is the perfect map (P-map) of the underlying dependency model (a)
Finding Minimum Block Set
Complexity Analysis
• For a dataset with N attributes, each with at most r possible values and at most k parents:
  – Phase I: $N^2$ mutual information computations, each of which requires $O(r^2)$ basic operations, for $O(N^2 r^2)$ in total
  – Phase II: at most $N^2$ CI tests, each with at most $O(r^{k+2})$ basic operations, for $O(N^2 r^{k+2})$ in total; worst case $O(N^2 r^N)$
  – Phase III: same as Phase II
ALARM Network Structure
Experiment setup
• ALARM BN (A Logical Alarm Reduction Mechanism): a medical diagnosis system for patient monitoring
  – 37 nodes, 46 arcs
  – 3 versions: same structure, different CPDs
• 10,000 cases for each dataset
• Modified conditional mutual information calculation that takes the variables' degrees of freedom into consideration to make CI tests more reliable
• ε = 0.003
Result on ALARM BN
Discussions & Comments
• About the assumptions
  – All attributes are discrete
  – No missing values in any record
  – The dataset is large enough for reliable CI tests
  – The ordering of the attributes is available before the network construction
Discussions & Comments
• Threshold ε
  – ε = 0.003
  – How do we pick an appropriate ε?
  – How does choosing a different ε affect accuracy and running time?
• Modification in the experiment part
  – Uses a modified conditional mutual information calculation that takes the variables' degrees of freedom into consideration to make CI tests more reliable
  – Does this modification affect the result in any way other than increasing the accuracy?