Learning Bayes Nets Based on Conditional Dependencies
Oliver Schulte
Department of Philosophy and School of Computing Science
Simon Fraser University
Vancouver, Canada
[email protected]
with Wei Luo (Simon Fraser) and Russ Greiner (U of Alberta)
Outline
• Brief Intro to Bayes Nets
• Combining Dependency Information with Model Selection
• Learning from Dependency Data Only: Learning-Theoretic Analysis
Bayes Nets: Overview
• Bayes Net Structure = Directed Acyclic Graph.
• Nodes = Variables of Interest.
• Arcs = direct “influence”, “association”.
• Parameters = CP Tables = Prob of Child given Parents.
• Structure represents (in)dependencies.
• Structure + parameters represents joint probability distribution over variables.
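As an illustration of the last point, here is a minimal sketch of how structure plus CP tables yields a joint distribution; the two-node net A → B and all probability values are invented for the example, not taken from the slides.

```python
# Structure + parameters --> joint distribution:
# P(X1,...,Xn) = product over nodes X of P(X | parents(X)).
# Hypothetical two-node net A -> B with invented CPT entries.
parents = {"A": (), "B": ("A",)}
cpt = {
    "A": {(): {True: 0.3, False: 0.7}},              # P(A)
    "B": {(True,):  {True: 0.9, False: 0.1},         # P(B | A=True)
          (False,): {True: 0.2, False: 0.8}},        # P(B | A=False)
}

def joint(assignment):
    """Probability of a full assignment under the factorization."""
    p = 1.0
    for var, val in assignment.items():
        parent_vals = tuple(assignment[q] for q in parents[var])
        p *= cpt[var][parent_vals][val]
    return p

print(joint({"A": True, "B": True}))   # 0.3 * 0.9 = 0.27
```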
Examples from CIspace (UBC)
Graphs entail Dependencies
[Figure: three example graphs over nodes A, B, C and the dependencies they entail, e.g. Dep(A,B) and Dep(A,B|C) for one graph; Dep(A,B), Dep(A,B|C), Dep(B,C), Dep(B,C|A), and Dep(A,C|B) for another.]
I-maps and Probability Distributions
• Defn: Graph G is an I-map of prob dist P if: whenever Dependent(X,Y|S) holds in P, then X is d-connected to Y given S in G.
• Example: If Dependent(Father Eye Color, Mother Eye Color | Child Eye Color) in P, then Father EC is d-connected to Mother EC given Child EC in G.
• Informally, G is an I-map of P ⟺ G entails all conditional dependencies in P.
• Theorem: Fix G, P. There is a parameter setting θ for G such that (G, θ) represents P ⟺ G is an I-map of P.
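For concreteness, a minimal sketch of the d-connection test used in this definition, via the standard moralized-ancestral-graph reduction; the dict-of-parents encoding and function names are illustrative, not the authors' code.

```python
from itertools import combinations

def ancestors(dag, nodes):
    """Nodes plus all their ancestors; dag maps node -> set of parents."""
    result, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(dag[n])
    return result

def d_connected(dag, x, y, s):
    """True iff x and y are d-connected given conditioning set s."""
    keep = ancestors(dag, {x, y} | set(s))   # 1. ancestral subgraph
    adj = {n: set() for n in keep}
    for child in keep:                       # 2. moralize, drop directions
        ps = dag[child]
        for p in ps:
            adj[p].add(child); adj[child].add(p)
        for p, q in combinations(ps, 2):
            adj[p].add(q); adj[q].add(p)
    reached, stack = set(), [x]              # 3. block s, search for y
    while stack:
        n = stack.pop()
        if n in reached or n in s:
            continue
        reached.add(n)
        stack.extend(adj[n])
    return y in reached

# Collider A -> C <- B: Dep(A,B|C) holds but Dep(A,B) does not.
dag = {"A": set(), "B": set(), "C": {"A", "B"}}
print(d_connected(dag, "A", "B", set()))   # False
print(d_connected(dag, "A", "B", {"C"}))   # True
```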
Two Approaches to Learning Bayes Net Structure
• “Search and score”: select graph G as “model” with parameters to be estimated.
• “Test and cover”: find G that represents the dependencies in P.
Aim: find G that represents P with suitable parameters.
Our Hybrid Approach
[Diagram: Sample → Set of Dependencies → Final Output Graph]
The final selected graph maximizesa model selection score and covers all observed dependencies.
Definition of Hybrid Criterion
• Let d be a sample. Let S(G,d) be a score function.
[Figure: example graph over A, B, C scored on three sample cases, S = 10.5.]
Let Dep be a set of conditional dependencies extracted from sample d.
Graph G optimizes score S given Dep and sample d ⟺
1. G entails the dependencies Dep, and
2. if any other graph G’ entails Dep, then score(G,d) ≥ score(G’,d).
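A brute-force sketch of this criterion; all arguments are illustrative placeholders (`entails` would be a d-connection test such as the one above, `score` a function such as BDeu). In practice one searches rather than enumerates candidates, as in the IGES adaptation below.

```python
def optimize_given_dependencies(candidates, deps, entails, score, d):
    """Hybrid criterion sketch: among candidate graphs, keep those
    entailing every observed dependency, then return the one with the
    highest score on sample d."""
    admissible = [g for g in candidates
                  if all(entails(g, dep) for dep in deps)]
    if not admissible:
        return None   # no candidate covers all observed dependencies
    return max(admissible, key=lambda g: score(g, d))
```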
Local Search Heuristics for Constrained Search
• There is a general method for adapting any local search heuristic to accommodate observed dependencies.
• We present an adaptation of GES search, called IGES.
GES Search (Meek, Chickering)
[Diagram: Growth Phase adds edges to a graph over A, B, C (example scores 5, 7, 8.5); Shrink Phase deletes edges (example scores 9, 8).]
IGES Search
Step 1: Extract dependencies from the sample.
[Diagram: sample (Case 1, Case 2, Case 3) → Testing Procedure → Dependencies]
Step 2: Run the modified GES search:
1. Continue with Growth Phase until all dependencies are covered.
2. During Shrink Phase, delete edge only if dependencies are still covered.
[Diagram: two candidate graphs over A, B, C with scores 7 and 5, given Dep(A,B).]
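A sketch of the adapted search loop under these two rules; `add_moves`, `delete_moves`, `entails`, and `score` are assumed helpers, and GES proper operates on equivalence classes rather than single DAGs, which this simplification glosses over.

```python
def iges_search(deps, add_moves, delete_moves, entails, score, d, g):
    """IGES sketch: add_moves/delete_moves enumerate neighbor graphs
    with one edge added/deleted."""
    covered = lambda h: all(entails(h, dep) for dep in deps)

    # Growth phase: greedily add the best edge while the score improves,
    # and keep adding as long as some observed dependency is uncovered.
    while True:
        options = add_moves(g)
        if not options:
            break
        best = max(options, key=lambda h: score(h, d))
        if score(best, d) > score(g, d) or not covered(g):
            g = best
        else:
            break

    # Shrink phase: delete an edge only if the score improves AND all
    # observed dependencies remain covered.
    improved = True
    while improved:
        improved = False
        for h in delete_moves(g):
            if score(h, d) > score(g, d) and covered(h):
                g, improved = h, True
                break
    return g
```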
Asymptotic Equivalence GES = IGES
Theorem Assume that score function S is consistent and that joint probability distribution P satisfies the composition principle. Let Dep be a set of dependencies true of P. Then with P-probability 1, GES and IGES+Dep converge to the same output in the sample size limit.
• So IGES inherits the convergence properties of GES.
Extracting Dependencies
We use the χ² test (with a cell coverage condition).
Exhaustive testing of all triples Indep(X,Y|S) for cardinality(S) < k, with k chosen by the user.
More sophisticated testing strategy coming soon.
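A rough sketch of this extraction step, assuming a pandas DataFrame sample and SciPy's chi-squared test; the expected-count threshold below stands in for the paper's cell coverage condition and may differ in detail.

```python
import pandas as pd
from itertools import combinations
from scipy.stats import chi2_contingency

def extract_dependencies(data, k=2, alpha=0.05, min_cell=5):
    """Chi-squared tests of Indep(X,Y|S) for all |S| < k, recording
    Dep(X,Y|S) whenever independence is rejected in some stratum of S."""
    cols = list(data.columns)
    deps = []
    for x, y in combinations(cols, 2):
        others = [c for c in cols if c not in (x, y)]
        for size in range(k):          # all conditioning sets with |S| < k
            for s in combinations(others, size):
                strata = data.groupby(list(s)) if s else [(None, data)]
                for _, block in strata:
                    table = pd.crosstab(block[x], block[y])
                    if table.shape[0] < 2 or table.shape[1] < 2:
                        continue       # degenerate stratum, nothing to test
                    chi2, p, dof, expected = chi2_contingency(table)
                    if expected.min() < min_cell:
                        continue       # poor cell coverage: skip this test
                    if p < alpha:
                        deps.append((x, y, s))
                        break          # Dep(x,y|s) established
    return deps
```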
Simulation Setup: Methods
• The hybrid approach is a general schema.
Our Setup
• Statistical Test: χ²
• Score S: BDeu (with Tetrad default settings)
• Search Method: GES, adapted
Simulation Setup: Graphs and Data
• Random DAGs with binary variables.
• #Nodes: 4,6,8,10.
• Sample Sizes 100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600.
• 10 random samples per graph per sample size, average results.
• Graphs generated with Tetrad’s random DAG utility.
Result Graphs
Conclusion for I-map learning: The Underfitting Zone
Although not explicitly designed to cover statistically significant correlations, GES+BDeu does so fairly well, but not perfectly; IGES helps by adding the missing edges (on the order of 5 for 10-node graphs).
[Plot: divergence from the true graph vs. sample size, for standard search + score and for constrained search + score. Regions: small samples (little significance), medium samples (underfitting of correlations), large samples (convergence zone).]
Part II: Learning-Theoretic Model (COLT 2007)
• Learning Model: Learner receives increasing enumeration (list) of conditional dependency statements.
• Data repetition is possible.
• Learner outputs graph (pattern); may output ?.
[Diagram: the learner receives the data stream Dep(A,B), Dep(B,C), Dep(A,C|B), … and after each datum outputs a conjecture: a graph over A, B, C, or “?”.]
Criteria for Optimal Learning
1. Convergence: the learner must eventually settle on the true graph.
2. The learner must minimize mind changes.
3. Given 1 and 2, the learner is not dominated in convergence time.
The Optimal Learning Procedure
Theorem There is a unique optimal learner defined as follows:
1. If there is a unique graph G covering the observed dependencies with a minimum number of adjacencies, output G.
2. Otherwise output ?.
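A brute-force sketch of this learner; all arguments are illustrative placeholders, real outputs are patterns (equivalence classes), and candidate enumeration is exponential, consistent with the NP-hardness result on the next slide.

```python
def optimal_learner(deps, candidate_graphs, entails, num_adjacencies):
    """Among graphs entailing all observed dependencies, take those
    with the fewest adjacencies; emit the graph if unique, else '?'."""
    covering = [g for g in candidate_graphs
                if all(entails(g, dep) for dep in deps)]
    if not covering:
        return "?"
    fewest = min(num_adjacencies(g) for g in covering)
    minimal = [g for g in covering if num_adjacencies(g) == fewest]
    return minimal[0] if len(minimal) == 1 else "?"
```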
Computational Complexity of the Unique Optimal Learner
Theorem The following problem is NP-hard:
1. Decide if there is a unique edge-minimal map for a set of dependencies D.
2. If yes, output the graph.
Proof: Reduction from Unique Exact 3-Set Cover.
Example: universe {x1, x2, x3, x4, x5, x6, x7, x8, x9}; sets {x1,x2,x3}, {x3,x4,x5}, {x4,x5,x7}, {x2,x4,x5}, {x3,x6,x9}, {x6,x8,x9}.
Unique exact cover: {x1,x2,x3}, {x4,x5,x7}, {x6,x8,x9}.
Hybrid Method and Optimal Learner
• Score-based methods tend to underfit (with discrete variables): they place edges correctly, but too few ⟹ mind-change optimal, but not convergence-time optimal.
• Hybrid method speeds up convergence.
A New Testing Strategy
• Say that a graph G satisfies the Markov condition wrt sample d if, for all X and Y: whenever Y is a nondescendant of X and not a parent of X, we do not find Dep(X,Y|parents(X)).
• Given sample d, look for graph G that satisfies the MC wrt d with a minimum number of adjacencies.
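A sketch of this Markov-condition check, reusing the dict-of-parents encoding from earlier; `find_dep(x, y, s)` stands in for the statistical testing procedure, and the search over minimum-adjacency graphs satisfying the condition is not shown.

```python
def children_map(dag):
    """Invert a node -> parents map into node -> children."""
    kids = {n: set() for n in dag}
    for n, ps in dag.items():
        for p in ps:
            kids[p].add(n)
    return kids

def descendants(dag, x):
    """All nodes reachable from x through child edges."""
    kids = children_map(dag)
    out, stack = set(), list(kids[x])
    while stack:
        n = stack.pop()
        if n not in out:
            out.add(n)
            stack.extend(kids[n])
    return out

def satisfies_markov_condition(dag, find_dep):
    """For each X and each nondescendant Y that is not a parent of X,
    the testing procedure must NOT report Dep(X, Y | parents(X))."""
    for x in dag:
        skip = {x} | set(dag[x]) | descendants(dag, x)
        for y in dag:
            if y in skip:
                continue
            if find_dep(x, y, dag[x]):
                return False
    return True
```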
Future Work
• Use the Markov condition to develop a local search algorithm for score optimization requiring only (#Var)² tests.
• Apply the idea of Markov condition + edge minimization to continuous variable models.
Summary: Hybrid Criterion - test, search and score.
• Basic Idea: Base Bayes net learning on dependencies that can be reliably obtained even on small to medium sample sizes.
• Hybrid criterion: find graph that maximizes model selection score given the constraint of entailing statistically significant dependencies or correlations.
• Theory and simulation evidence suggest that this:
• speeds up convergence to correct graph
• addresses underfitting on small-medium samples.
Summary: Learning-Theoretic Analysis
• Learning Model: learn the graph from dependencies alone.
• Optimal Method: look for the graph that covers the observed dependencies with a minimum number of adjacencies.
• Implementing this method is NP-hard.
References
“Mind Change Optimal Learning of Bayes Net Structure”. O. Schulte, W. Luo and R. Greiner (2007). Conference on Learning Theory (COLT).
THE END