Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Page 1: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Finding Optimal Bayesian Networks with Greedy Search

Max Chickering

Page 2: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Outline

• Bayesian-Network Definitions

• Learning

• Greedy Equivalence Search (GES)

• Optimality of GES

Page 3: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Bayesian Networks

$$p(X_1, \ldots, X_n \mid S, \theta) = \prod_{i=1}^{n} p(X_i \mid \mathrm{Par}(X_i), \theta_i)$$

Use B = (S, θ) to represent p(X1, …, Xn)
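
The factorization can be illustrated with a minimal sketch. The three-node structure X -> Y -> Z and all probabilities below are made-up assumptions for illustration, not values from the talk; `parents` plays the role of S and `theta` the role of the parameters.

```python
from itertools import product

# Hypothetical structure S: X -> Y -> Z (each node maps to its parents).
parents = {"X": (), "Y": ("X",), "Z": ("Y",)}

# Hypothetical parameters theta: P(node = 1 | parent values).
theta = {
    "X": {(): 0.3},
    "Y": {(0,): 0.2, (1,): 0.9},
    "Z": {(0,): 0.5, (1,): 0.1},
}

def joint(assignment):
    """p(x_1..x_n | S, theta) as the product of the local conditionals."""
    prob = 1.0
    for node, pars in parents.items():
        p_one = theta[node][tuple(assignment[p] for p in pars)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob

# Summing the joint over all binary assignments should give 1.
total = sum(
    joint(dict(zip(parents, values)))
    for values in product((0, 1), repeat=len(parents))
)
```

Each factor looks only at a node and its parents, which is what makes scores built on this factorization decomposable.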

Page 4: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Markov Conditions

[Figure: node X with its parents (Par), descendants (Desc), and non-descendants (ND)]

From factorization: I(X, ND | Par(X))

Markov Conditions + Graphoid Axioms characterize all independencies

Page 5: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Structure/Distribution Inclusion

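p is included in S if there exists θ s.t. B = (S, θ) defines p

[Figure: the distributions included in structure S over X, Y, Z, drawn as a region containing p inside the space of all distributions]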

Page 6: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Structure/Structure Inclusion T ≤ S

T is included in S if every p included in T is included in S

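[Figure: structures S and T over X, Y, Z; the distributions included in T lie inside those included in S, within the space of all distributions]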

(S is an I-map of T)

Page 7: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Structure/Structure Equivalence: T ≈ S

[Figure: structures S and T over X, Y, Z covering the same region of the space of all distributions]

Reflexive, Symmetric, Transitive

Page 8: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Equivalence

[Figure: a v-structure and the corresponding skeleton, over nodes A, B, C, D]

Theorem (Verma and Pearl, 1990): S ≈ T ⇔ S and T have the same v-structures and skeletons
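
The Verma and Pearl characterization gives a direct equivalence test. The sketch below assumes a DAG is stored as a dict mapping each node to its set of parents; the function names and the three example graphs are illustrative choices, not from the talk.

```python
from itertools import combinations

def skeleton(dag):
    """Undirected adjacencies of a DAG given as {node: set of parents}."""
    return {frozenset((p, child)) for child, pars in dag.items() for p in pars}

def v_structures(dag):
    """Triples (a, c, b) with a -> c <- b where a and b are not adjacent."""
    skel = skeleton(dag)
    found = set()
    for child, pars in dag.items():
        for a, b in combinations(sorted(pars), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def equivalent(s, t):
    """Same skeleton and same v-structures."""
    return skeleton(s) == skeleton(t) and v_structures(s) == v_structures(t)

chain = {"A": set(), "B": {"A"}, "C": {"B"}}           # A -> B -> C
reverse_chain = {"A": {"B"}, "B": {"C"}, "C": set()}   # A <- B <- C
collider = {"A": set(), "B": {"A", "C"}, "C": set()}   # A -> B <- C
```

Here `chain` and `reverse_chain` are equivalent, while `collider` shares their skeleton but adds a v-structure at B, so it falls in a different class.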

Page 9: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Learning Bayesian Networks

1. Learn the structure

2. Estimate the conditional distributions

[Figure: generative distribution p* → iid samples → observed data → learned model over X, Y, Z]

Observed data:

X Y Z
0 1 1
1 0 1
0 1 0
. . .
1 0 1

Page 10: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Learning Structure

• Scoring criterion

F(D, S)

• Search procedure

Identify one or more structures with high values for the scoring function

Page 11: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Properties of Scoring Criteria

• Consistent

• Locally Consistent

• Score Equivalent

Page 12: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Consistent Criterion

S includes p*, T does not include p* ⇒ F(S,D) > F(T,D)

Both include p*, S has fewer parameters ⇒ F(S,D) > F(T,D)

Criterion favors (in the limit) the simplest model that includes the generative distribution p*

[Figure: regions of distributions for several structures over X, Y, Z, with p* marked]

Page 13: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Locally Consistent Criterion

S and T differ by one edge:

[Figure: structures S and T over X and Y]

If I(X,Y|Par(X)) in p* then F(S,D) > F(T,D)

Otherwise F(S,D) < F(T,D)

Page 14: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Score-Equivalent Criterion

S ≈ T ⇒ F(S,D) = F(T,D)

[Figure: S and T, two structures over X and Y]

Page 15: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Bayesian Criterion (consistent, locally consistent, and score equivalent)

S^h: generative distribution p* has the same independence constraints as S.

F_Bayes(S,D) = log p(S^h | D) = k + log p(D | S^h) + log p(S^h)

log p(D | S^h): marginal likelihood (closed form with assumptions)

log p(S^h): structure prior (e.g., prefer simple)
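
The constant k in the Bayesian criterion follows from Bayes' rule. Written out, with the only assumption being that p(D) is the normalizing term, which does not depend on the structure:

```latex
\log p(S^h \mid D)
  = \log \frac{p(D \mid S^h)\, p(S^h)}{p(D)}
  = \underbrace{-\log p(D)}_{k} + \log p(D \mid S^h) + \log p(S^h)
```

Because k is the same for every structure, it can be ignored when comparing candidates during search.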

Page 16: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Search Procedure

• Set of states

• Representation for the states

• Operators to move between states

• Systematic Search Algorithm

Page 17: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Greedy Equivalence Search

• Set of states: equivalence classes of DAGs

• Representation for the states: essential graphs

• Operators to move between states: forward and backward operators

• Systematic search algorithm: two-phase greedy

Page 18: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Representation: Essential Graphs

[Figure: a DAG over A, B, C, D, E, F and its essential graph, distinguishing compelled edges from reversible edges]

Page 19: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

GES Operators

Forward Direction – single edge additions

Backward Direction – single edge deletions

Page 20: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Two-Phase Greedy Algorithm

Phase 1: Forward Equivalence Search (FES)

• Start with all-independence model

• Run Greedy using forward operators

Phase 2: Backward Equivalence Search (BES)

• Start with local max from FES

• Run Greedy using backward operators
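
The control flow of the two phases can be sketched generically. This is not GES itself: the states below are plain sets standing in for equivalence classes, and `TOY_SCORE`, `forward`, and `backward` are hypothetical stand-ins chosen only so that the forward phase overshoots and the backward phase prunes.

```python
def greedy(state, neighbors, score):
    """Hill-climb: move to the best-scoring neighbor until none improves."""
    current, current_score = state, score(state)
    while True:
        candidates = [(score(s), s) for s in neighbors(current)]
        if not candidates:
            return current
        best_score, best = max(candidates, key=lambda c: c[0])
        if best_score <= current_score:
            return current
        current, current_score = best, best_score

def two_phase_greedy(start, forward, backward, score):
    local_max = greedy(start, forward, score)    # Phase 1: forward operators
    return greedy(local_max, backward, score)    # Phase 2: backward operators

# Toy problem (assumption): states are subsets of "abc" with a made-up score.
ITEMS = "abc"
TOY_SCORE = {"": 0, "a": 1, "b": 1, "c": 2,
             "ab": 6, "ac": 3, "bc": 2.5, "abc": 4}

def score(state):
    return TOY_SCORE["".join(sorted(state))]

def forward(state):
    return [state | {i} for i in ITEMS if i not in state]

def backward(state):
    return [state - {i} for i in state]

after_forward = greedy(frozenset(), forward, score)
result = two_phase_greedy(frozenset(), forward, backward, score)
```

In this toy run the forward phase ends at the full set and the backward phase removes one element to reach a better state, mirroring how FES may overshoot and BES then simplifies.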

Page 21: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Forward Operators

• Consider all DAGs in the current state

• For each DAG, consider all single-edge additions (acyclic)

• Take the union of the resulting equivalence classes
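
The three steps above can be sketched by brute force for small graphs. This is an illustrative assumption-laden version, not the efficient operator implementation used in GES: an equivalence class is identified by its (skeleton, v-structures) signature, and its member DAGs are found by trying every orientation of the skeleton.

```python
from itertools import combinations, permutations, product

def is_acyclic(nodes, edges):
    """Repeatedly strip nodes with no incoming edge; fail if none exists."""
    remaining, edges = set(nodes), set(edges)
    while remaining:
        sources = {n for n in remaining if not any(c == n for _, c in edges)}
        if not sources:
            return False
        remaining -= sources
        edges = {(p, c) for p, c in edges if p in remaining}
    return True

def signature(edges):
    """(skeleton, v-structures) of a DAG given as a set of (parent, child)."""
    skel = frozenset(frozenset(e) for e in edges)
    parents = {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)
    vs = frozenset(
        (a, c, b)
        for c, pars in parents.items()
        for a, b in combinations(sorted(pars), 2)
        if frozenset((a, b)) not in skel
    )
    return skel, vs

def members(nodes, sig):
    """All DAGs in the class: orient the skeleton every way, keep matches."""
    pairs = [tuple(sorted(e)) for e in sig[0]]
    dags = []
    for flips in product((False, True), repeat=len(pairs)):
        edges = frozenset(
            (b, a) if flip else (a, b) for (a, b), flip in zip(pairs, flips)
        )
        if is_acyclic(nodes, edges) and signature(edges) == sig:
            dags.append(edges)
    return dags

def forward_neighbors(nodes, sig):
    """Union of classes reached by one acyclic edge addition to any member."""
    result = set()
    for dag in members(nodes, sig):
        for a, b in permutations(nodes, 2):
            if frozenset((a, b)) in sig[0]:
                continue  # a and b are already adjacent
            new = dag | {(a, b)}
            if is_acyclic(nodes, new):
                result.add(signature(new))
    return result

nodes = ("A", "B", "C")
current = signature({("A", "B")})   # class with the single adjacency A - B
neighbors = forward_neighbors(nodes, current)
```

For this three-node state the class has two member DAGs, and the eight single-edge additions collapse into four neighboring equivalence classes.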

Page 22: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Forward-Operators Example

Current State: [Figure: essential graph over A, B, C]

All DAGs: [Figure: the DAGs in the current equivalence class]

All DAGs resulting from single-edge addition: [Figure: DAGs over A, B, C]

Union of corresponding essential graphs: [Figure: essential graphs over A, B, C]

Page 23: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Forward-Operators Example

[Figure: the current state and the essential graphs over A, B, C reachable with the forward operators]

Page 24: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Backward Operators

• Consider all DAGs in the current state

• For each DAG, consider all single-edge deletions

• Take the union of the resulting equivalence classes

Page 25: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Backward-Operators Example

Current State: [Figure: essential graph over A, B, C]

All DAGs: [Figure: the DAGs in the current equivalence class]

All DAGs resulting from single-edge deletion: [Figure: DAGs over A, B, C]

Union of corresponding essential graphs: [Figure: essential graphs over A, B, C]

Page 26: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Backward-Operators Example

[Figure: the current state and the essential graphs over A, B, C reachable with the backward operators]

Page 27: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

DAG Perfect

DAG-perfect distribution p: there exists a DAG G such that I(X,Y|Z) in p ⇔ I(X,Y|Z) in G

Non-DAG-perfect distribution q: I(A,D|B,C) and I(B,C|A,D)

[Figure: graph over A, B, C, D for q, and two DAGs over the same nodes, labeled I(B,C|A,D) and I(A,D|B,C)]

Page 28: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

DAG-Perfect Consequence: Composition Axiom Holds in p*

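If $\neg I(X, \mathbf{Y} \mid \mathbf{Z})$ then $\neg I(X, Y \mid \mathbf{Z})$ for some singleton $Y \in \mathbf{Y}$

[Figure: example with node X and nodes A, B, C, D]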

Page 29: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Optimality of GES

If p* is DAG-perfect wrt some G*:

[Figure: p*, DAG-perfect wrt G* over X, Y, Z, generates n iid samples (a data table over X, Y, Z); GES maps the data to an equivalence class S; S* is the class of G*]

For large n, S = S*

Page 30: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Optimality of GES

Proof Outline

• After first phase (FES), current state includes S*

• After second phase (BES), the current state = S*

All-independence → (FES) → State includes S* → (BES) → State equals S*

Page 31: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

FES Maximum Includes S*

Assume: local max S does NOT include S*. Take any DAG G from S.

Markov conditions characterize independencies: in p*, there exists X that is not independent of its non-descendants given its parents

[Figure: DAG over A, B, C, D, E, X] ¬I(X, {A,B,C,D} | E) in p*

p* is DAG-perfect ⇒ composition axiom holds

[Figure: DAG over A, B, C, D, E, X] ¬I(X, C | E) in p*

Locally consistent: adding the C→X edge improves the score, and the resulting equivalence class is a neighbor

Page 32: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

BES Identifies S*

• Current state always includes S*:

Local consistency of the criterion

• Local Minimum is S*:

Meek’s conjecture

Page 33: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Meek’s Conjecture

For any pair of DAGs G, H such that H includes G (G ≤ H), there exists a sequence of

(1) covered edge reversals in G

(2) single-edge additions to G

such that after each change G ≤ H, and after all changes G = H
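
The talk does not spell out what a covered edge is; the sketch below assumes the standard definition (an edge X→Y is covered when Par(Y) = Par(X) ∪ {X}) and checks on two made-up graphs that reversing a covered edge keeps the DAG in the same equivalence class while reversing a non-covered one does not. DAGs are sets of (parent, child) pairs and all names are illustrative.

```python
from itertools import combinations

def parents(edges, node):
    return {p for p, c in edges if c == node}

def is_covered(edges, x, y):
    """Edge x -> y is covered when Par(y) equals Par(x) plus x itself."""
    return (x, y) in edges and parents(edges, y) == parents(edges, x) | {x}

def reverse(edges, x, y):
    """Return a new edge set with x -> y replaced by y -> x."""
    return (set(edges) - {(x, y)}) | {(y, x)}

def signature(edges):
    """(skeleton, v-structures): equal signatures mean equivalent DAGs."""
    skel = frozenset(frozenset(e) for e in edges)
    vs = set()
    for child in {c for _, c in edges}:
        for a, b in combinations(sorted(parents(edges, child)), 2):
            if frozenset((a, b)) not in skel:
                vs.add((a, child, b))
    return skel, frozenset(vs)

complete = {("A", "B"), ("A", "C"), ("B", "C")}   # A -> B, A -> C, B -> C
collider = {("A", "C"), ("B", "C")}               # A -> C <- B
```

In `complete`, B→C is covered and its reversal leaves the signature unchanged; in `collider`, A→C is not covered and reversing it destroys the v-structure.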

Page 34: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Meek’s Conjecture

[Figure: DAGs G and H over A, B, C, D with the intermediate DAGs of the sequence between them; labeled independencies: I(A,B) and I(C,B|A,D)]

Page 37: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

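Meek’s Conjecture and BES

S* ≤ S

Assume: local max S is not S*. Take any DAG H from S and any DAG G from S*.

G → H: a sequence of Add and Rev steps

H → G: the same sequence read backwards, as Del and Rev steps

[Figure: the path between S* and S, with an equivalence class marked "Neighbor of S in BES"]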

Page 38: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Discussion Points

• In practice, GES is as fast as DAG-based search

Neighborhood of essential graphs can be generated and scored very efficiently

• When DAG-perfect assumption fails, we still get optimality guarantees

As long as composition holds in generative distribution, local maximum is inclusion-minimal

Page 39: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Thanks!

My Home Page: http://research.microsoft.com/~dmax

Relevant Papers:

“Optimal Structure Identification with Greedy Search”, JMLR submission. Contains detailed proofs of Meek’s conjecture and optimality of GES.

“Finding Optimal Bayesian Networks”, UAI02 paper with Chris Meek. Contains extension of optimality results of GES when not DAG perfect.

Page 41: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Bayesian Criterion is Locally Consistent

• Bayesian score approaches BIC + constant

• BIC is decomposable:

$$\mathrm{BIC}(S, D) = \sum_{i=1}^{n} F(X_i, \mathrm{Par}(X_i))$$

• Difference in score is the same for any DAGs that differ by a Y→X edge if X has the same parents

[Figure: two structures over X and Y]

Complete network (always includes p*)
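
Decomposability can be checked numerically. The sketch below assumes binary variables, a maximum-likelihood log-likelihood with the usual (log N)/2 penalty per free parameter as the local score F, and a small made-up data set; none of these specifics come from the talk.

```python
import math
from collections import Counter

def local_bic(data, node, pars, arity=2):
    """Local score F(node, pars): log-likelihood minus the BIC penalty."""
    joint = Counter((tuple(row[p] for p in pars), row[node]) for row in data)
    marg = Counter(tuple(row[p] for p in pars) for row in data)
    loglik = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    n_params = (arity - 1) * arity ** len(pars)
    return loglik - 0.5 * math.log(len(data)) * n_params

def bic(data, structure):
    """BIC(S, D) as the sum of local scores; structure maps node -> parents."""
    return sum(local_bic(data, x, pars) for x, pars in structure.items())

# Made-up binary samples over X, Y, Z.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1),
        (0, 1, 0), (1, 0, 1), (1, 1, 1), (0, 0, 0)]
data = [dict(zip("XYZ", r)) for r in rows]

# Two pairs of DAGs; within each pair the only difference is the edge Y -> X.
s1 = {"X": (), "Y": (), "Z": ()}
t1 = {"X": ("Y",), "Y": (), "Z": ()}
s2 = {"X": (), "Y": ("Z",), "Z": ()}
t2 = {"X": ("Y",), "Y": ("Z",), "Z": ()}

gain_1 = bic(data, t1) - bic(data, s1)
gain_2 = bic(data, t2) - bic(data, s2)
```

Both gains reduce to the change in the local score of X, because every other term in the sum is untouched by the added edge.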

Page 42: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Bayesian Criterion is Consistent

Assume conditionals: (1) unconstrained multinomials, (2) linear regressions

⇒ Network structures = curved exponential models (Geiger, Heckerman, King and Meek, 2001)

⇒ Bayesian Criterion is consistent (Haughton, 1988)

Page 43: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Bayesian Criterion is Score Equivalent

S ≈ T ⇒ F(S,D) = F(T,D)

[Figure: S and T, two structures over X and Y]

S^h: no independence constraints

T^h: no independence constraints

S^h = T^h

Page 44: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Active Paths

Z-active path between X and Y (non-standard):

1. Neither X nor Y is in Z

2. Every pair of colliding edges meets at a member of Z

3. No other pair of edges meets at a member of Z

[Figure: path through X, Z, Y]

G ≤ H: if there is a Z-active path between X and Y in G, then there is a Z-active path between X and Y in H
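
The three conditions can be turned into a reachability search. This sketch reflects one reading of the definition, with the assumption that a path may revisit nodes and edges (which is how a collider's descendant in Z can activate a path); the function name and the example graphs are illustrative.

```python
from collections import deque

def z_active(edges, x, y, z):
    """True if a Z-active path (possibly revisiting nodes) joins x and y.

    edges is a set of (parent, child) pairs; z is the conditioning set.
    """
    z = set(z)
    if x in z or y in z:
        return False
    # incident[m] holds (neighbor, arrowhead_at_m, arrowhead_at_neighbor).
    incident = {}
    for p, c in edges:
        incident.setdefault(p, []).append((c, False, True))
        incident.setdefault(c, []).append((p, True, False))
    start = [(n, head_n) for n, _, head_n in incident.get(x, [])]
    queue, seen = deque(start), set(start)
    while queue:
        node, arrived_head = queue.popleft()
        if node == y:
            return True
        for nxt, head_here, head_next in incident.get(node, []):
            collider = arrived_head and head_here
            # Colliding edges must meet in Z; all other pairs must not.
            if collider != (node in z):
                continue
            state = (nxt, head_next)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False

chain = {("A", "B"), ("B", "C")}                     # A -> B -> C
collider = {("A", "B"), ("C", "B")}                  # A -> B <- C
collider_child = {("A", "B"), ("C", "B"), ("B", "D")}  # plus B -> D
```

With Z = {D} in `collider_child`, the search finds the path A→B→D←B←C, which uses the edge B→D twice.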

Page 45: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Active Paths

[Figure: a path over X, A, Z, W, B, Y, and a second example over A, B, C]

• X-Y: Out-of X and In-to Y

• X-W: Out-of both X and W

• Any sub-path between A, B ∉ Z is also active

• A – B, B – C, at least one is out-of B ⇒ Active path between A and C

Page 46: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Simple Active Paths

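If an active path between A and B contains the edge between X and Y, then there is an active path between A and B in which

(1) the edge appears exactly once, OR

(2) the edge appears exactly twice

[Figure: a path between A and B through Y, X for case (1), and a path passing through X, Y twice for case (2)]

Simplify discussion: assume (1) only; proofs for (2) are almost identical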

Page 47: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

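Typical Argument: Combining Active Paths

[Figure: graphs G, G’ and H over X, Y, Z with a path between A and B; Z is a sink node adjacent to X and Y; G ≤ H]

G’: Suppose there is an active path (AP) in G’ (X not in the conditioning set, CS) with no corresponding AP in H. Then Z is not in CS.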

Page 48: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Proof Sketch

Two DAGs G, H with G < H

Identify either:

(1) a covered edge X→Y in G that has opposite orientation in H

(2) a new edge X→Y to be added to G such that it remains included in H

Page 49: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

The Transformation

Choose any node Y that is a sink in H

Case 1a: Y is a sink in G, ∃ X ∈ Par_H(Y) with X ∉ Par_G(Y)

Case 1b: Y is a sink in G, same parents

Case 2a: ∃ X s.t. Y→X covered

Case 2b: ∃ X s.t. Y→X & W par of Y but not X

Case 2c: Every Y→X: Par(Y) ⊆ Par(X)

[Figure: the modification applied to the graph in each case, over X, Y, W]

Page 50: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Preliminaries (G ≤ H)

• The adjacencies in G are a subset of the adjacencies in H

• If X→Y←Z is a v-structure in G but not H, then X and Z are adjacent in H

• Any new active path that results from adding X→Y to G includes X→Y

Page 51: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Proof Sketch: Case 1

Y is a sink in G

Case 1a: ∃ X ∈ Par_H(Y) with X ∉ Par_G(Y)

[Figure: X and Y in H and in G; a path between A and B through Z, Y, X]

Suppose there’s some new active path between A and B not in H

1. Y is a sink in G, so it must be in CS
2. Neither X nor next node Z is in CS
3. In H: AP(A,Z), AP(X,B), Z→Y←X

Case 1b: Parents identical

Remove Y from both graphs: proof similar

Page 52: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Proof Sketch: Case 2

Y is not a sink in G

Case 2a: There is a covered edge Y→X: reverse the edge

Case 2b: There is a non-covered edge Y→X such that W is a parent of Y but not a parent of X

[Figure: G, G’ and H over W, Y, X, Z, with paths between A and B]

Suppose there’s some new active path between A and B not in H

Y must be in CS, else replace W→X by W→Y→X (not new).

If X not in CS, then in H active: A-W, X-B, W→Y←X

Page 53: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Case 2c: The Difficult Case

All non-covered edges Y→Z have Par(Y) ⊆ Par(Z)

[Figure: G and H over W1, W2, Y, Z1, Z2]

W1→Y: G no longer < H (Z2-active path between W1 and W2)

W2→Y: G < H

Page 54: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Choosing Z

[Figure: G and H, showing Y, its child Z, the descendants of Y in G, and the node D]

D is the maximal G-descendant in H

Z is any maximal child of Y such that D is a descendant of Z in G

Page 55: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Choosing Z

[Figure: G and H over W1, W2, Y, Z1, Z2]

Descendants of Y in G: Y, Z1, Z2

Maximal descendant in H: D = Z2

Maximal child of Y in G that has D = Z2 as descendant: Z2

Add W2→Y

Page 56: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Difficult Case: Proof Intuition

[Figure: G and H with W, Y, Z, D, and paths from A and from B that end at B or CS]

1. W not in CS
2. Y not in CS, else active in H
3. In G, next edges must be away from Y until B or CS reached
4. In G, neither Z nor desc in CS, else active before addition
5. From (1,2,4), AP (A,D) and (B,D) in H
6. Choice of D: directed path from D to B or CS in H

Page 58: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Optimality of GES

Definition: p is DAG-perfect wrt G: independence constraints in p are precisely those in G

Assumption: generative distribution p* is perfect wrt some G* defined over the observable variables

S* = Equivalence class containing G*

Under DAG-perfect assumption, GES results in S*

Page 59: Finding Optimal Bayesian Networks with Greedy Search Max Chickering

Important Definitions

• Bayesian Networks

• Markov Conditions

• Distribution/Structure Inclusion

• Structure/Structure Inclusion