
Page 1: Learning Bayes Nets Based on Conditional Dependencies

Learning Bayes Nets Based on Conditional Dependencies

Oliver Schulte
Department of Philosophy and School of Computing Science
Simon Fraser University
Vancouver, Canada
[email protected]

with Wei Luo (Simon Fraser) and Russ Greiner (U of Alberta)

Page 2: Learning Bayes Nets Based on Conditional Dependencies


Outline

• Brief Intro to Bayes Nets
• Combining Dependency Information with Model Selection
• Learning from Dependency Data Only: Learning-Theoretic Analysis

Page 3: Learning Bayes Nets Based on Conditional Dependencies


Bayes Nets: Overview

• Bayes Net Structure = Directed Acyclic Graph.
• Nodes = variables of interest.
• Arcs = direct “influence”, “association”.
• Parameters = CP Tables = probability of child given parents.
• The structure represents (in)dependencies.
• Structure + parameters represent a joint probability distribution over the variables.
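To make the factorization concrete, here is a minimal sketch (with made-up CPT numbers of our own, not from the talk) showing how a chain structure A → B → C plus its CP tables determines the joint distribution:

```python
from itertools import product

# Hypothetical CPTs for the chain A -> B -> C over binary variables.
p_a = {0: 0.4, 1: 0.6}                      # P(A)
p_b_given_a = {0: {0: 0.9, 1: 0.1},         # P(B | A=0)
               1: {0: 0.3, 1: 0.7}}         # P(B | A=1)
p_c_given_b = {0: {0: 0.8, 1: 0.2},         # P(C | B=0)
               1: {0: 0.5, 1: 0.5}}         # P(C | B=1)

def joint(a, b, c):
    """Joint probability via the Bayes net factorization P(A)P(B|A)P(C|B)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# Sanity check: the factorized joint sums to 1 over all assignments.
assert abs(sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3)) - 1) < 1e-9
```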

Page 4: Learning Bayes Nets Based on Conditional Dependencies


Examples from CIspace (UBC)

Page 5: Learning Bayes Nets Based on Conditional Dependencies


Graphs entail Dependencies

[Figure: three example graphs over nodes A, B, C with the dependencies they entail. One graph entails Dep(A,B), Dep(A,B|C); another entails Dep(A,B), Dep(A,B|C), Dep(B,C), Dep(B,C|A), Dep(A,C|B).]

Page 6: Learning Bayes Nets Based on Conditional Dependencies


I-maps and Probability Distributions

• Defn: Graph G is an I-map of probability distribution P if: whenever Dependent(X,Y|S) holds in P, X is d-connected to Y given S in G.

• Example: If Dependent(Father Eye Color,Mother Eye Color|Child Eye Color) in P, then Father EC is d-connected to Mother EC given Child EC in G.

• Informally, G is an I-map of P ⟺ G entails all conditional dependencies in P.

• Theorem: Fix G, P. There is a parameter setting θ for G such that (G, θ) represents P ⟺ G is an I-map of P.
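For reference, here is a compact sketch of the standard reachability (“Bayes ball”) procedure for deciding d-connection; the graph encoding and function names are our own illustration, not code from the talk:

```python
from collections import deque

def ancestors_of(nodes, parents):
    """All ancestors of the given nodes, including the nodes themselves."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_connected(x, y, z, parents, children):
    """True iff x is d-connected to y given conditioning set z.
    Breadth-first search over states (node, direction of travel)."""
    z = set(z)
    anc_z = ancestors_of(z, parents)
    queue, visited = deque([(x, "up")]), set()
    while queue:
        state = queue.popleft()
        if state in visited:
            continue
        visited.add(state)
        n, direction = state
        if n == y and n not in z:
            return True
        if direction == "up" and n not in z:
            queue.extend((p, "up") for p in parents.get(n, ()))
            queue.extend((c, "down") for c in children.get(n, ()))
        elif direction == "down":
            if n not in z:          # trail continues through a non-collider
                queue.extend((c, "down") for c in children.get(n, ()))
            if n in anc_z:          # collider with (a descendant in) z: activated
                queue.extend((p, "up") for p in parents.get(n, ()))
    return False

# Collider A -> B <- C: A, C are d-separated marginally, d-connected given B.
parents = {"B": ["A", "C"]}
children = {"A": ["B"], "C": ["B"]}
assert not d_connected("A", "C", set(), parents, children)
assert d_connected("A", "C", {"B"}, parents, children)
```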

Page 7: Learning Bayes Nets Based on Conditional Dependencies

Two Approaches to Learning Bayes Net Structure

• “search and score”: select graph G as a “model” whose parameters are to be estimated

• “test and cover”: find G that represents the dependencies in P

Aim: find G that represents P with suitable parameters

Page 8: Learning Bayes Nets Based on Conditional Dependencies


Our Hybrid Approach

[Flowchart: Sample → Set of Dependencies → Final Output Graph]

The final selected graph maximizes a model selection score and covers all observed dependencies.

Page 9: Learning Bayes Nets Based on Conditional Dependencies

Definition of Hybrid Criterion

• Let d be a sample. Let S(G,d) be a score function.

[Figure: an example graph over A, B, C with a three-case sample (Case 1, Case 2, Case 3) and its score S = 10.5.]

• Let Dep be a set of conditional dependencies extracted from sample d.

Graph G optimizes score S given Dep and sample d iff:

1. G entails the dependencies Dep, and

2. for any other graph G’ that entails Dep, S(G,d) ≥ S(G’,d).
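A direct, brute-force rendering of this criterion, for illustration only (exponential in the number of candidate graphs; `entails` could be the d-connection test sketched earlier, `score` any model selection score such as BDeu):

```python
def optimizes_hybrid(graphs, dependencies, entails, score, d):
    """Return a graph maximizing score S(G, d) among candidates that entail
    every dependency in Dep (the hybrid criterion); None if no candidate
    covers all the dependencies."""
    feasible = [g for g in graphs
                if all(entails(g, dep) for dep in dependencies)]
    return max(feasible, key=lambda g: score(g, d)) if feasible else None
```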

Page 10: Learning Bayes Nets Based on Conditional Dependencies


Local Search Heuristics for Constrained Search

• There is a general method for adapting any local search heuristic to accommodate observed dependencies.

• We present an adaptation of GES search, called IGES.

Page 11: Learning Bayes Nets Based on Conditional Dependencies


GES Search (Meek, Chickering)

[Figure: GES on nodes A, B, C. Growth phase (add edges): Score = 5 → 7 → 8.5. Shrink phase (delete edges): Score = 9 → 8.]

Page 12: Learning Bayes Nets Based on Conditional Dependencies


IGES Search

Step 1: extract dependencies from the sample.

[Flowchart: Sample (Case 1, Case 2, Case 3) → Testing Procedure → Dependencies]

1. Continue the growth phase until all dependencies are covered.

2. During the shrink phase, delete an edge only if the dependencies remain covered.

[Figure: given Dep(A,B), two candidate graphs over A, B, C with Score = 7 and Score = 5; IGES retains an edge covering Dep(A,B).]
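A simplified sketch of the IGES loop. To keep it short we fix a variable ordering so that every edge set is a DAG; real GES/IGES searches over Markov equivalence classes, so this illustrates the coverage constraint rather than the actual algorithm:

```python
from itertools import combinations

def iges(variables, score, covers, dependencies):
    """IGES sketch under a fixed variable ordering.
    score(edges) -> float; covers(edges, dep) -> bool."""
    all_edges = set(combinations(variables, 2))   # edge (u, v): u before v
    edges = set()

    def covered(current):
        return all(covers(current, dep) for dep in dependencies)

    # Growth phase: greedily add the best edge. Plain GES stops when no
    # addition improves the score; IGES keeps growing until, in addition,
    # every observed dependency is covered.
    while True:
        best, best_gain = None, None
        for e in all_edges - edges:
            gain = score(edges | {e}) - score(edges)
            if best is None or gain > best_gain:
                best, best_gain = e, gain
        if best is None or (best_gain <= 0 and covered(edges)):
            break
        edges.add(best)

    # Shrink phase: delete a score-improving edge only if all observed
    # dependencies remain covered afterwards.
    improved = True
    while improved:
        improved = False
        for e in list(edges):
            smaller = edges - {e}
            if score(smaller) > score(edges) and covered(smaller):
                edges, improved = smaller, True
                break
    return edges
```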

Page 13: Learning Bayes Nets Based on Conditional Dependencies

Asymptotic Equivalence GES = IGES

Theorem Assume that score function S is consistent and that joint probability distribution P satisfies the composition principle. Let Dep be a set of dependencies true of P. Then with P-probability 1, GES and IGES+Dep converge to the same output in the sample size limit.

• So IGES inherits the convergence properties of GES.

Page 14: Learning Bayes Nets Based on Conditional Dependencies


Extracting Dependencies

• We use the χ² test (with a cell coverage condition).

• Exhaustive testing of all triples Indep(X,Y|S) for cardinality(S) < k, where k is chosen by the user.

• A more sophisticated testing strategy is coming soon.
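One common way to implement such a stratified χ² test in Python (an assumption on our part; the paper's exact procedure, including its cell coverage condition, may differ):

```python
import pandas as pd
from scipy.stats import chi2, chi2_contingency

def significant_dependence(data, x, y, s, alpha=0.05):
    """Test Dep(x, y | s) on a pandas DataFrame by summing chi-squared
    statistics and degrees of freedom over the strata defined by the
    conditioning set s, then comparing against the chi-squared tail.
    Returns True if the conditional dependence is significant."""
    groups = [data] if not s else [g for _, g in data.groupby(list(s))]
    stat, dof = 0.0, 0
    for g in groups:
        table = pd.crosstab(g[x], g[y]).values
        if table.shape[0] < 2 or table.shape[1] < 2:
            continue  # a stratum where x or y is constant carries no signal
        c2, _, d, _ = chi2_contingency(table)
        stat, dof = stat + c2, dof + d
    return dof > 0 and chi2.sf(stat, dof) < alpha
```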

Page 15: Learning Bayes Nets Based on Conditional Dependencies


Simulation Setup: Methods

• The hybrid approach is a general schema.

Our Setup

• Statistical Test: χ²

• Score S: BDeu (with Tetrad default settings)

• Search Method: GES, adapted

Page 16: Learning Bayes Nets Based on Conditional Dependencies

Simulation Setup: Graphs and Data

• Random DAGs with binary variables.

• #Nodes: 4,6,8,10.

• Sample Sizes 100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600.

• 10 random samples per graph per sample size; results are averaged.

• Graphs generated with Tetrad’s random DAG utility.
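For readers who want to reproduce the setup approximately, here is a simple stand-in for a random DAG generator (our own sketch, not Tetrad's actual utility):

```python
import random

def random_dag(n_nodes, edge_prob=0.3, seed=None):
    """Sample a random DAG: draw a random node ordering, then include each
    forward edge independently with probability edge_prob."""
    rng = random.Random(seed)
    nodes = list(range(n_nodes))
    rng.shuffle(nodes)
    return {(nodes[i], nodes[j])
            for i in range(n_nodes) for j in range(i + 1, n_nodes)
            if rng.random() < edge_prob}

# e.g. one 8-node graph per the setup: random_dag(8, seed=0)
```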

Page 17: Learning Bayes Nets Based on Conditional Dependencies

Result Graphs

Page 18: Learning Bayes Nets Based on Conditional Dependencies

Conclusion for I-map learning: The Underfitting Zone

Although not explicitly designed to cover statistically significant correlations, GES+BDeu does so fairly well, but not perfectly; IGES helps add the missing edges (on the order of 5 for 10-node graphs).

[Figure: divergence from the true graph vs. sample size, for standard search + score and constrained search + score. Small samples: little significance. Medium samples: underfitting of correlations. Large samples: convergence zone.]

Page 19: Learning Bayes Nets Based on Conditional Dependencies


Part II: Learning-Theoretic Model (COLT 2007)

• Learning Model: Learner receives increasing enumeration (list) of conditional dependency statements.

• Data repetition is possible.

• Learner outputs graph (pattern); may output ?.

[Figure: a data stream Dep(A,B), Dep(B,C), Dep(A,C|B) and the learner's corresponding conjectures: a graph over A, B, C, a revised graph, then ?.]

Page 20: Learning Bayes Nets Based on Conditional Dependencies


Criteria for Optimal Learning

1. Convergence: the learner must eventually settle on the true graph.

2. The learner must minimize mind changes.

3. Given 1 and 2, the learner is not dominated in convergence time.

Page 21: Learning Bayes Nets Based on Conditional Dependencies


The Optimal Learning Procedure

Theorem There is a unique optimal learner defined as follows:

1. If there is a unique graph G covering the observed dependencies with a minimum number of adjacencies, output G.

2. Otherwise output ?.
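Spelled out as code, the optimal learner is a brute-force search over candidate graphs (the helper `entails` is hypothetical; the NP-hardness result on the next slide indicates this cost is hard to avoid):

```python
def optimal_learner(dependencies, candidate_graphs, entails):
    """Mind-change optimal learner: output the graph covering the observed
    dependencies with a minimum number of adjacencies if it is unique,
    otherwise output '?'. Graphs are represented as edge sets."""
    covering = [g for g in candidate_graphs
                if all(entails(g, dep) for dep in dependencies)]
    if not covering:
        return "?"
    fewest = min(len(g) for g in covering)
    minimal = [g for g in covering if len(g) == fewest]
    return minimal[0] if len(minimal) == 1 else "?"
```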

Page 22: Learning Bayes Nets Based on Conditional Dependencies


Computational Complexity of the Unique Optimal Learner

Theorem The following problem is NP-hard:

1. Decide if there is a unique edge-minimal map for a set of dependencies D.

2. If yes, output the graph.

Proof: reduction from Unique Exact 3-Set Cover.

Example instance: sets {x1,x2,x3}, {x3,x4,x5}, {x4,x5,x7}, {x2,x4,x5}, {x3,x6,x9}, {x6,x8,x9} over elements x1, …, x9. The unique exact cover is {x1,x2,x3}, {x4,x5,x7}, {x6,x8,x9}.
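The uniqueness of this cover can be checked by brute force; set sizes summing to 9 together with a full union force pairwise disjointness:

```python
from itertools import combinations

sets = [{"x1", "x2", "x3"}, {"x3", "x4", "x5"}, {"x4", "x5", "x7"},
        {"x2", "x4", "x5"}, {"x3", "x6", "x9"}, {"x6", "x8", "x9"}]
universe = {f"x{i}" for i in range(1, 10)}

# An exact cover is a subfamily whose union is the universe and whose set
# sizes sum to |universe| (which forces pairwise disjointness).
covers = [combo
          for r in range(1, len(sets) + 1)
          for combo in combinations(range(len(sets)), r)
          if sum(len(sets[i]) for i in combo) == len(universe)
          and set().union(*(sets[i] for i in combo)) == universe]
print(covers)  # [(0, 2, 5)] -- the single exact cover named above
```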

Page 23: Learning Bayes Nets Based on Conditional Dependencies


Hybrid Method and Optimal Learner

• Score-based methods tend to underfit (with discrete variables): they place edges correctly, but too few of them.

• They are mind-change optimal but not convergence-time optimal.

• Hybrid method speeds up convergence.

Page 24: Learning Bayes Nets Based on Conditional Dependencies


A New Testing Strategy

• Say that a graph G satisfies the Markov condition wrt sample d if, for all X and Y: if Y is a nondescendant of X and not a parent of X, then we do not find Dep(X,Y|parents(X)).

• Given sample d, look for a graph G that satisfies the MC wrt d with a minimum number of adjacencies.
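A sketch of this strategy's core check (our own illustration): given a candidate graph and a statistical test such as the stratified χ² sketch above, verify the sample Markov condition:

```python
def satisfies_markov_condition(parents, dep_found):
    """Check the sample Markov condition: for every variable X and every
    nondescendant Y of X that is not a parent of X, the statistical test
    must NOT report Dep(X, Y | parents(X)).
    `parents` maps each node to its parent set; `dep_found(x, y, s)` wraps
    a test such as the stratified chi-squared sketch above."""
    nodes = set(parents)

    def descendants(x):
        found, stack = set(), [x]
        while stack:
            n = stack.pop()
            for child in (m for m in nodes if n in parents[m]):
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return found

    for x in nodes:
        candidates = nodes - descendants(x) - set(parents[x]) - {x}
        if any(dep_found(x, y, set(parents[x])) for y in candidates):
            return False
    return True
```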

Page 25: Learning Bayes Nets Based on Conditional Dependencies


Future Work

• Use the Markov condition to develop a local search algorithm for score optimization requiring only (#Var)² tests.

• Apply the idea of Markov condition + edge minimization to continuous variable models.

Page 26: Learning Bayes Nets Based on Conditional Dependencies


Summary: Hybrid Criterion - test, search and score.

• Basic Idea: Base Bayes net learning on dependencies that can be reliably obtained even on small to medium sample sizes.

• Hybrid criterion: find graph that maximizes model selection score given the constraint of entailing statistically significant dependencies or correlations.

• Theory and simulation evidence suggest that this:

• speeds up convergence to the correct graph

• addresses underfitting on small-to-medium samples.

Page 27: Learning Bayes Nets Based on Conditional Dependencies


Summary: Learning-Theoretic Analysis

• Learning model: learn a graph from dependencies alone.

• Optimal method: look for a graph that covers the observed dependencies with a minimum number of adjacencies.

• Implementing this method is NP-hard.

Page 28: Learning Bayes Nets Based on Conditional Dependencies


References

O. Schulte, W. Luo, and R. Greiner (2007). “Mind Change Optimal Learning of Bayes Net Structure”. Conference on Learning Theory (COLT).

THE END