
Kansas State University

Department of Computing and Information Sciences

Ben Perry – M.S. Thesis Defense

Benjamin B. Perry

Laboratory for Knowledge Discovery in Databases

Kansas State University

http://www.kddresearch.org

http://www.cis.ksu.edu/~bbp9857

A Genetic Algorithm for Learning Bayesian Network Adjacency Matrices from Data

Overview

• Bayesian Network
– Definitions and examples
– Inference and learning
• Genetic Algorithms
• Structure Learning Background
– Problem
– K2 algorithm
– Sparse Candidate
• Improving K2: Permutation Genetic Algorithm (GASLEAK)
– Shortcoming: greedy, sensitive to ordering
– Permutation GA
• Master’s thesis: Adjacency Matrix GA (SLAM GA)
– Rationale
• Evaluation with Known Bayesian Networks
• Summary

Bayesian Belief Networks (BBNs): Definition

• Bayesian Network
– Directed acyclic graph
– Vertices (nodes): denote events, or states of affairs (each a random variable)
– Edges (arcs, links): denote conditional dependencies, causalities
– Model of conditional dependence assertions (or CI assumptions)
• Example (“Ben’s Presentation” BBN) (sprinkler)
– X1 Sleep: Narcoleptic, Well, Bad, All-nighter
– X2 Appearance: Good, Bad
– X3 Memory: Elephant, Good, Bad, None
– X4 Ben is nervous: Extremely, Yes, No
– X5 Ben’s presentation: Good, Not so good, Failed miserably
• General Product (Chain) Rule for BBNs

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{parents}(X_i))$$

P(Well, Good, Good, No, Good) = P(W) · P(G | W) · P(G | W) · P(N | G, G) · P(G | N)
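The chain rule above can be sketched in a few lines of Python. The parent sets follow the factorization on the slide; the probability values in `cpt` are invented for illustration and are not from the presentation, and the names `parents`, `cpt`, and `joint_probability` are hypothetical.

```python
# Parent sets implied by the factorization on the slide.
parents = {
    "Sleep": [],
    "Appearance": ["Sleep"],
    "Memory": ["Sleep"],
    "Nervous": ["Appearance", "Memory"],
    "Presentation": ["Nervous"],
}

# Hypothetical CPT entries: (variable, value, parent values) -> probability.
# Only the entries needed for one assignment are listed; the numbers are made up.
cpt = {
    ("Sleep", "Well", ()): 0.5,
    ("Appearance", "Good", ("Well",)): 0.8,
    ("Memory", "Good", ("Well",)): 0.7,
    ("Nervous", "No", ("Good", "Good")): 0.6,
    ("Presentation", "Good", ("No",)): 0.9,
}


def joint_probability(assignment, parents, cpt):
    """Multiply P(X_i | parents(X_i)) over all variables in the assignment."""
    prob = 1.0
    for var, value in assignment.items():
        parent_values = tuple(assignment[p] for p in parents[var])
        prob *= cpt[(var, value, parent_values)]
    return prob


assignment = {
    "Sleep": "Well",
    "Appearance": "Good",
    "Memory": "Good",
    "Nervous": "No",
    "Presentation": "Good",
}
p = joint_probability(assignment, parents, cpt)
```

Each factor looks up one row of one conditional probability table, which is why a sparse graph needs far fewer parameters than the full joint distribution.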

Graphical Models of Probability Distributions

• Idea
– Want: model that can be used to perform inference
– Desired properties
• Correlations among variables
• Ability to represent functional, logical, stochastic relationships
• Probability of certain events
• Inference: Decision Support Problems
– Diagnosis (medical, equipment)
– Pattern recognition (image, speech)
– Prediction
• Want to Learn: Most Likely Model that Generates Observed Data
– Under certain assumptions (Causal Markovity), it has been shown that we can do it
– Given: data D (tuples or vectors containing observed values of variables)
– Return: directed graph (V, E) expressing target CPTs
– NEXT: Genetic algorithms

Genetic Algorithms

• Idea
– Emulate natural process of survival of the fittest (Example: Roaches adapt)
– Each generation has many diverse individuals
– Each individual competes for the chance to survive
– Most common approach: best individuals live to the next generation and mate
– Produce children with traits from both parents
– If parents are strong, children might be stronger
• Major components (operators)
– Fitness function
– Chromosome manipulation
– Cross-over (Not the “John Edward” type!), mutation
• From (Educated?) Guess to Gold
– Initial population typically random or not much better than random – bad scores
– Performs well with a non-deceptive search space and good genetic operators
– Ability to escape local optima with mutations
– Not guaranteed to get the best answer, but usually gets close
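A minimal sketch of the loop described above, assuming a bit-string chromosome, one-point crossover, bit-flip mutation, and "best half survives" elitism. The toy fitness (count of 1 bits) and all parameter values are assumptions chosen for illustration, not settings from the thesis.

```python
import random


def evolve(fitness, length, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Elitist GA over bit strings; returns the best individual and a fitness history."""
    rng = random.Random(seed)
    # Random initial population: typically not much better than chance.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elite = pop[: pop_size // 2]  # best half lives on unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            mom, dad = rng.sample(elite, 2)
            cut = rng.randrange(1, length)  # one-point crossover
            child = mom[:cut] + dad[cut:]
            # Bit-flip mutation gives a chance to leave a local optimum.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness), history


best, history = evolve(sum, length=16)
```

Because the elite are copied unchanged, the best fitness recorded in `history` never decreases from one generation to the next.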

Learning Structure: K2 Algorithm

• Algorithm Learn-BBN-Structure-K2 (D, Max-Parents)
    FOR i ← 1 to n DO // arbitrary ordering of variables {x1, x2, …, xn}
        WHILE (Parents[xi].Size < Max-Parents) DO // find best candidate parent
            Best ← argmax j>i P(D | xj ∈ Parents[xi]) // max Dirichlet score
            IF ((Parents[xi] + Best).Score > Parents[xi].Score) THEN Parents[xi] += Best
            ELSE BREAK // no candidate improves the score
    RETURN ({Parents[xi] | i ∈ {1, 2, …, n}})
• ALARM: A Logical Alarm Reduction Mechanism [Beinlich et al., 1989]
– BBN model for patient monitoring in surgical anesthesia
– Vertices (37): findings (e.g., esophageal intubation), intermediates, observables
– K2: found BBN different in only 1 edge from gold standard (elicited from expert)

[Figure: ALARM network graph with vertices numbered 1 to 37]
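The pseudocode above can be made concrete as follows. This is a sketch under stated assumptions, not the thesis implementation: it scores a node with the log of the Cooper-Herskovits (K2) metric, assumes each variable takes integer values 0..r-1, and draws candidate parents from the variables that precede a node in the ordering, which is the usual K2 convention (the slide's index convention may differ). The names `k2_log_score` and `k2` are hypothetical.

```python
from collections import Counter
from math import lgamma


def k2_log_score(data, child, parents, arity):
    """Log K2 score of `child` given a parent list, from complete discrete data."""
    r = arity[child]
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    totals = Counter(tuple(row[p] for p in parents) for row in data)
    score = 0.0
    # Unobserved parent configurations contribute zero, so only observed ones are visited.
    for config, n_ij in totals.items():
        score += lgamma(r) - lgamma(n_ij + r)
        for k in range(r):
            score += lgamma(joint[(config, k)] + 1)
    return score


def k2(data, order, arity, max_parents):
    """Greedy K2 search: add the best-scoring parent until no candidate helps."""
    parents = {x: [] for x in order}
    for pos, x in enumerate(order):
        best_score = k2_log_score(data, x, parents[x], arity)
        candidates = set(order[:pos])
        while len(parents[x]) < max_parents and candidates:
            score, best = max(
                (k2_log_score(data, x, parents[x] + [c], arity), c)
                for c in sorted(candidates)
            )
            if score <= best_score:
                break  # greedy stop: no single addition improves the score
            parents[x].append(best)
            candidates.remove(best)
            best_score = score
    return parents


# Toy data: variable 1 copies variable 0; variable 2 is unrelated to both.
data = [(a, a, c) for a in (0, 1) for c in (0, 1) for _ in range(2)]
learned = k2(data, order=[0, 1, 2], arity=[2, 2, 2], max_parents=2)
```

On the toy data the search links variable 1 to variable 0 and leaves variable 2 without parents. Restricting candidates to predecessors in the ordering is what keeps the result acyclic without any explicit cycle check.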

Learning Structure: K2 Downfalls

• Greedy (may fall into local maxima)
• Highly dependent upon node ordering
• Optimal node ordering must be given
• If optimal order is already known, an expert could probably create the network
• Number of possible orderings grows factorially (n!)
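To give a sense of scale for the n! orderings, the short sketch below counts them for a few network sizes. The sizes 8 and 37 correspond to the Asia and ALARM networks mentioned in this presentation; `num_orderings` is a hypothetical helper name.

```python
import math


def num_orderings(n):
    """Number of distinct variable orderings for a network of n nodes."""
    return math.factorial(n)


for n in (5, 8, 37):
    print(n, num_orderings(n))
```

Exhaustively trying every ordering is feasible for 8 variables but out of reach for 37, which motivates searching the space of orderings or structures instead.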

Learning Structure: Sparse Candidate

• General Idea:
– Inspect the k best parent candidates at a time (K2 only inspects one)
– k is typically small: roughly 5 ≤ k ≤ 15
– Cost is exponential in k
• Algorithm:
Loop until there is no improvement or the iteration limit is exceeded:
For each node, select the top k parent candidates (mutual information or m_disc) [Restrict phase]
Build a network by manipulating parents (add, remove, or reverse edges from the candidate set for each node). Only accept changes that improve the network score (Minimum Description Length) [Maximize phase]
• Must handle cycles, which is expensive
– K2 gives this to us for free
– Next: Improving K2
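The Restrict phase can be sketched with empirical mutual information as the ranking measure. This is an illustration under assumptions (complete discrete data stored as tuples, a plain top-k cut, hypothetical function names), not the code used in the thesis.

```python
from collections import Counter
from math import log


def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(data)
    count_i = Counter(row[i] for row in data)
    count_j = Counter(row[j] for row in data)
    count_ij = Counter((row[i], row[j]) for row in data)
    return sum(
        (c / n) * log(c * n / (count_i[a] * count_j[b]))
        for (a, b), c in count_ij.items()
    )


def restrict(data, num_vars, k):
    """For each variable, keep the k other variables with the highest mutual information."""
    candidates = {}
    for x in range(num_vars):
        others = [y for y in range(num_vars) if y != x]
        others.sort(key=lambda y: mutual_information(data, x, y), reverse=True)
        candidates[x] = others[:k]
    return candidates


# Toy data: variable 1 copies variable 0; variable 2 is independent of both.
data = [(a, a, c) for a in (0, 1) for c in (0, 1) for _ in range(2)]
candidate_parents = restrict(data, num_vars=3, k=1)
```

The Maximize phase then only considers edges drawn from `candidate_parents`, which is what keeps the search tractable for larger networks.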

GASLEAK: A Permutation GA for Variable Ordering

GASLEAK: Genetic Algorithm for Structure Learning from Evidence, AIS, and K2

[Figure: GASLEAK architecture. Inputs: D (training data) and I_e (evidence specification). [1] Permutation Genetic Algorithm: proposes a candidate ordering α and receives its ordering fitness f(α). [2] Representation Evaluator for Bayesian Network Structure Learning Problems: uses Dtrain for structure learning and Dval for inference. Output: optimized ordering α̂.]

Properties of the Genetic Algorithm

• Elitist
• Chromosome representation
– Integer permutation ordering
– Sample chromosome in a BBN of 5 nodes might look like: 3 1 2 0 4
• Seeding
– Random shuffle
• Operators
– Order crossover
– Swap mutation
• Fitness
– RMSE
• Job farm
– Java-based; utilizes many machines regardless of OS
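The two permutation operators named above can be sketched as follows. The crossover shown is a simplified order crossover that fills the open positions left to right (the classic variant fills starting after the second cut point); cut points and swap positions are passed in explicitly so the behavior is easy to follow. Function names are hypothetical.

```python
def order_crossover(p1, p2, lo, hi):
    """Keep p1[lo:hi] in place, then fill the remaining slots in p2's order."""
    child = [None] * len(p1)
    child[lo:hi] = p1[lo:hi]
    kept = set(p1[lo:hi])
    fill = iter(g for g in p2 if g not in kept)
    for idx in range(len(child)):
        if child[idx] is None:
            child[idx] = next(fill)
    return child


def swap_mutation(perm, i, j):
    """Return a copy of perm with positions i and j exchanged."""
    mutated = list(perm)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


parent_a = [3, 1, 2, 0, 4]  # the sample chromosome from the slide
parent_b = [4, 3, 2, 1, 0]
child = order_crossover(parent_a, parent_b, lo=1, hi=3)
mutant = swap_mutation(parent_a, 0, 4)
```

Both operators always return a valid permutation, so every chromosome remains a legal variable ordering that can be handed to K2.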

GASLEAK Results

[Figure: Frequency of Validation Set Fitness. Histogram of estimated fitness for all 8! = 40320 permutations of Asia variables; fitness bins from 0.802 to 0.996, frequency axis from 0 to 1400.]

• Not encouraging
– Bad fitness function or bad evidence b.v.
– Many graph errors

Master’s Thesis: SLAM GA

• SLAM GA – Structure Learning Adjacency Matrix Genetic Algorithm
• Initial population: tried several approaches
– Completely random Bayesian networks (Box-Muller, max parents)
– Many illegal structures; wrote fixCycles algorithm
– Random networks generated from parents pre-selected by the Restrict phase of Sparse Candidate
– Performed better than random
– Aggregate of k learned networks from K2 given random orderings (cycles eliminated) – Best approach
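The slides do not describe how fixCycles works, so the following is only one simple possibility: find a directed cycle with depth-first search, delete a randomly chosen edge on it, and repeat until the graph is acyclic. The convention `adj[u][v] == 1` meaning "u is a parent of v" and the names `find_cycle` and `fix_cycles` are assumptions made for this sketch.

```python
import random


def find_cycle(adj):
    """Return the edges (u, v) of some directed cycle, or None if the graph is acyclic."""
    n = len(adj)
    color = [0] * n  # 0 = unvisited, 1 = on the current DFS path, 2 = finished
    path = []

    def dfs(u):
        color[u] = 1
        path.append(u)
        for v in range(n):
            if adj[u][v]:
                if color[v] == 1:  # back edge closes a cycle
                    cyc = path[path.index(v):] + [v]
                    return list(zip(cyc, cyc[1:]))
                if color[v] == 0:
                    found = dfs(v)
                    if found:
                        return found
        color[u] = 2
        path.pop()
        return None

    for start in range(n):
        if color[start] == 0:
            found = dfs(start)
            if found:
                return found
    return None


def fix_cycles(adj, rng):
    """Copy adj and delete one random edge per detected cycle until none remain."""
    fixed = [row[:] for row in adj]
    while True:
        cycle = find_cycle(fixed)
        if cycle is None:
            return fixed
        u, v = rng.choice(cycle)
        fixed[u][v] = 0


cyclic = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # 0 -> 1 -> 2 -> 0
repaired = fix_cycles(cyclic, random.Random(0))
```

Removing edges at random keeps the repair unbiased but may discard good edges; a score-aware choice of which edge to drop would be a natural refinement.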

Aggregator Instantiater

[Figure: The K2 Manager takes the training data D and runs K2 with k random orderings (runs 1, 2, …, k). Each run produces a BBN, and the Aggregator combines them into one aggregate BBN.]

For small networks, k=1 is best. For larger networks, k=2 is best.

SLAM GA

• Chromosome representation
– Edge matrix – n^2 bits
– Each bit represents a parent edge to node
– 1 = parent, 0 = not parent
• Operators
– Crossover: Swap parents, fix cycles

SLAM GA: Crossover

[Figure: illustration of the SLAM GA crossover operator]

SLAM GA

• Chromosome representation
– Edge matrix – n^2 bits
– Each bit represents a parent edge to node
– 1 = parent, 0 = not parent
• Operators
– Crossover: Swap parents, fix cycles
– Mutation: Reverse, delete, or add a random number of edges, then fix cycles
• Fitness
– Total Bayesian Dirichlet equivalence score for each node
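One way to read "swap parents" is that, for selected nodes, the two chromosomes exchange that node's whole parent set, which is a column of the edge matrix when `adj[u][v] == 1` means "u is a parent of v". That reading, the matrix convention, and the function names are assumptions for this sketch. It also shows why a cycle repair step is needed afterwards.

```python
def swap_parents_crossover(a, b, nodes):
    """Exchange the parent sets (matrix columns) of `nodes` between chromosomes a and b."""
    n = len(a)
    child1 = [row[:] for row in a]
    child2 = [row[:] for row in b]
    for v in nodes:
        for u in range(n):
            child1[u][v], child2[u][v] = b[u][v], a[u][v]
    return child1, child2


def is_acyclic(adj):
    """Kahn's algorithm: the graph is a DAG if every node can be removed in topological order."""
    n = len(adj)
    indegree = [sum(adj[u][v] for u in range(n)) for v in range(n)]
    ready = [v for v in range(n) if indegree[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for v in range(n):
            if adj[u][v]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    return removed == n


net_a = [[0, 1], [0, 0]]  # 0 -> 1
net_b = [[0, 0], [1, 0]]  # 1 -> 0
child1, child2 = swap_parents_crossover(net_a, net_b, nodes=[0])
```

Both parents are valid DAGs, yet `child1` ends up with edges in both directions between the two nodes, so the offspring has to be repaired before it can be scored.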

Results - Asia

[Figure: three network diagrams. Best of first generation: 15 graph errors. Learned network: 1 graph error. Actual network.]

Results – Asia

[Figure: Best fitness per generation. x-axis: Generation (1 to 97); y-axis: Best Fitness of Generation (3300 to 3750); series: K2x1, K2x2, Rnd.]

Results - Poker

[Figure: three network diagrams. Best of first generation: 11 graph errors. Learned network: 2 graph errors. Actual network.]

Results - Poker

[Figure: Best fitness per generation. x-axis: Generations (1 to 97); y-axis: Best Fitness of Generation (0 to 2500); series: K2x1, K2x2, Rnd.]

Results - Golf

[Figure: three network diagrams. Best of first generation: 11 graph errors. Learned network: 4 graph errors. Actual network.]

Results - Golf

[Figure: Best fitness per generation. x-axis: Generation (1 to 97); y-axis: Best Fitness of Generation (0 to 3500); series: K2x1, K2x2, Rnd.]

Results – Boerlage92

[Figure: three network diagrams: Initial, Learned network, Actual.]

Results - Boerlage92

[Figure: Boerlage92. x-axis: Generation (1 to 96); y-axis: Best Fitness of Generation (0 to 1600); series: K2x1, K2x2, Rnd.]

Results - Alarm

[Figure: Best network per generation. x-axis: Generation (1 to 97); y-axis: Best Fitness of Generation (0 to 8000); series: K2x1, K2x2, Rnd.]

Final Fitness Values

          Asia        Poker      Golf       Boerlage92   Alarm
K2x1      3722.084    1999.395   3081.16    1228.621     5006.827
K2x2      3720.6069   2011.54    3220.985   1429.355     7095.658
Random    3722.249    2001.884   3214.614   1459.587     6861.285

K2 vs. SLAM GA

• K2:
– Very good if ordering is known
– Ordering is often not known
– Greedy, very dependent on ordering
• SLAM GA:
– Stochastic; can escape the local optima trap
– Can improve on bad structures learned by K2
– Takes much longer than K2

GASLEAK vs. SLAM GA

• GASLEAK:
– Gold network never recovered
– Much more computationally expensive
– K2 is run on each [new] individual each generation
– Each chromosome must be scored
– Final network has many graph errors
• SLAM GA:
– For small networks, gold standard network often recovered
– Relatively few graph errors for final network
– Less computationally intensive
– Initial population most expensive
– Each chromosome must be scored

SLAM GA: Ramifications

• Effective structure learning algorithm
– Ideal for small networks
• Improvement over GASLEAK
– SLAM GA faster in spite of same GA parameters
– SLAM GA more accurate
• Improvement over K2
• Aggregate algorithm produces better initial population
• Parent-swapping crossover technique effective
– Diversifies search space while retaining past information

SLAM GA: Future Work

• Parameter tweaking
• Better fitness function
– Several ‘bad’ structures score better than gold standard
– GA works fine
• ‘Intelligent’ mutation operator
– Add edges from pre-qualified set of candidate parents
• New instantiation methods
– Use GASLEAK
– Other structure-learning algorithms
• Scalability
– Job farm

Summary

• Bayesian Network
• Genetic Algorithms
• Learning Structure: K2, Sparse Candidate
• GASLEAK
• SLAM GA