
Estimation of Distribution Algorithms Tutorial



Probabilistic model-building genetic algorithms (PMBGAs), also known as estimation of distribution algorithms (EDAs) and iterated density-estimation algorithms (IDEAs), replace the traditional variation operators of genetic and evolutionary algorithms by (1) building a probabilistic model of promising solutions and (2) sampling the built model to generate new candidate solutions. Replacing traditional crossover and mutation by building and sampling a probabilistic model of promising solutions enables the use of machine learning techniques for automatic discovery of problem regularities and the exploitation of these regularities for effective exploration of the search space. Using machine learning in optimization enables the design of optimization techniques that can automatically adapt to the given problem. There are many successful applications of PMBGAs, for example, Ising spin glasses in 2D and 3D, graph partitioning, MAXSAT, feature subset selection, forest management, groundwater remediation design, telecommunication network design, antenna design, and scheduling. This tutorial provides a gentle introduction to PMBGAs with an overview of major research directions in the area. Strengths and weaknesses of different PMBGAs are discussed, along with suggestions to help practitioners choose the best PMBGA for their problem. The video of this tutorial presented at GECCO-2008 can be found at http://medal.cs.umsl.edu/blog/?p=293


Page 1: Estimation of Distribution Algorithms Tutorial

Martin Pelikan

Probabilistic Model-Building Genetic Algorithms a.k.a. Estimation of Distribution Algorithms a.k.a. Iterated Density Estimation Algorithms

Missouri Estimation of Distribution Algorithms Laboratory (MEDAL) Department of Mathematics and Computer Science University of Missouri - St. Louis [email protected] http://martinpelikan.net/ Copyright is held by the author/owner(s).

GECCO’12 Companion, July 7–11, 2012, Philadelphia, PA, USA. ACM 978-1-4503-1178-6/12/07.

Page 2: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 2

Foreword n  Motivation

¨  Genetic and evolutionary computation (GEC) popular. ¨  Toy problems great, but difficulties in practice. ¨  Must design new representations, operators, tune, …

n  This talk ¨  Discuss a promising direction in GEC. ¨  Combine machine learning and GEC. ¨  Create practical and powerful optimizers.

Page 3: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 3

Overview n  Introduction

¨  Black-box optimization via probabilistic modeling. n  Probabilistic Model-Building GAs

¨  Discrete representation ¨  Continuous representation ¨  Computer programs (PMBGP) ¨  Permutations

n  Conclusions

Page 4: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 4

Problem Formulation

n  Input ¨  What do potential solutions look like? ¨  How to evaluate the quality of potential solutions?

n  Output ¨  Best solution (the optimum).

n  Important ¨  No additional knowledge about the problem.

Page 5: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 5

Why View Problem as Black Box?

n  Advantages ¨  Separate problem definition from optimizer. ¨  Easy to solve new problems. ¨  Economy argument.

n  Difficulties ¨  Almost no prior problem knowledge. ¨  Problem specifics must be learned automatically. ¨  Noise, multiple objectives, interactive evaluation.

Page 6: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 6

Representations Considered Here

n  Start with ¨  Solutions are n-bit binary strings.

n  Later ¨  Real-valued vectors. ¨  Program trees. ¨  Permutations

Page 7: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 7

Typical Situation

n  Previously visited solutions + their evaluation:

n  Question: What solution to generate next?

# Solution Evaluation 1 00100 1

2 11011 4

3 01101 0

4 10111 3

Page 8: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 8

Many Answers n  Hill climber

¨  Start with a random solution. ¨  Flip bit that improves the solution most. ¨  Finish when no more improvement possible.

n  Simulated annealing ¨  Introduce Metropolis.

n  Probabilistic model-building GAs ¨  Inspiration from GAs and machine learning (ML).

Page 9: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 9

Probabilistic Model-Building GAs

01011

11000

11001

11101

11001

10101

01011

11000

Selected population

Current population

Probabilistic Model 11011

00111

01111

11001

New population

…replace crossover+mutation with learning and sampling probabilistic model

Page 10: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 10

Other Names for PMBGAs

n  Estimation of distribution algorithms (EDAs) (Mühlenbein & Paass, 1996)

n  Iterated density estimation algorithms (IDEA) (Bosman & Thierens, 2000)

Page 11: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 11

Implicit vs. Explicit Model n  GAs and PMBGAs perform similar task

¨  Generate new solutions using probability distribution based on selected solutions.

n  GAs ¨  Variation defines an implicit probability distribution of the target population given the original population and the variation operators (crossover and mutation).

n  PMBGAs ¨  An explicit probabilistic model of selected candidate solutions is built and sampled.

Page 12: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 12

What Models to Use?

n  Start with a simple example ¨  Probability vector for binary strings.

n  Later ¨  Dependency tree models (COMIT). ¨  Bayesian networks (BOA). ¨  Bayesian networks with local structures (hBOA).

Page 13: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 13

Probability Vector

n  Assume n-bit binary strings. n  Model: Probability vector p=(p1, …, pn)

¨  pi = probability of 1 in position i ¨  Learn p: Compute proportion of 1 in each position. ¨  Sample p: Sample 1 in position i with prob. pi
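A minimal Python sketch (added here for illustration, not from the slides) of learning the probability vector from a selected population and sampling new strings from it; candidate solutions are assumed to be lists of 0/1 values:

import random

def learn_prob_vector(selected):
    # Estimate p[i] as the proportion of 1s in position i of the selected population.
    n = len(selected[0])
    return [sum(s[i] for s in selected) / len(selected) for i in range(n)]

def sample_prob_vector(p, count):
    # Generate `count` new strings; bit i is set to 1 with probability p[i].
    return [[1 if random.random() < p[i] else 0 for i in range(len(p))]
            for _ in range(count)]

# Toy example (selected population of 5-bit strings):
selected = [[0, 1, 0, 1, 1], [1, 1, 0, 0, 0], [1, 1, 0, 0, 1], [1, 0, 1, 0, 1]]
p = learn_prob_vector(selected)        # -> [0.75, 0.75, 0.25, 0.25, 0.75]
offspring = sample_prob_vector(p, 8)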

Page 14: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 14

Example: Probability Vector

(Mühlenbein, Paass, 1996), (Baluja, 1994)

01011

11000

11001

10101

Selected population

11101

11001

10101

10001

New population

11001

10101

01011

11000

1.0 0.5 0.5 0.0 1.0

Probability vector

Current population

Page 15: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 15

Probability Vector PMBGAs n  PBIL (Baluja, 1995)

¨  Incremental updates to the prob. vector. n  Compact GA (Harik, Lobo, Goldberg, 1998)

¨  Also incremental updates but better analogy with populations.

n  UMDA (Mühlenbein, Paass, 1996) ¨  What we showed here.

n  DEUM (Shakya et al., 2004) n  All variants perform similarly.

Page 16: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 16

Probability Vector Dynamics

n Bits that perform better get more copies. n And are combined in new ways. n But context of each bit is ignored.

n Example problem 1: Onemax

f(X_1, X_2, …, X_n) = Σ_{i=1}^n X_i

Page 17: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 17

Probability Vector on Onemax

[Figure: probability-vector entries (y-axis, 0 to 1) vs. generation (x-axis, 0 to 50) on onemax; all entries converge to 1.]

Page 18: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 18

Probability Vector: Ideal Scale-up

n  O(n log n) evaluations until convergence ¨  (Harik, Cantú-Paz, Goldberg, & Miller, 1997) ¨  (Mühlenbein, Schlierkamp-Vosen, 1993)

n  Other algorithms ¨  Hill climber: O(n log n) (Mühlenbein, 1992) ¨  GA with uniform: approx. O(n log n) ¨  GA with one-point: slightly slower

Page 19: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 19

When Does Prob. Vector Fail?

n  Example problem 2: Concatenated traps ¨  Partition input string into disjoint groups of 5 bits. ¨  Groups contribute via trap (ones=number of ones):

¨  Concatenated trap = sum of single traps ¨  Optimum: String 111…1

trap(ones) = 5 if ones = 5, and 4 − ones otherwise

Page 20: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 20

Trap-5

[Figure: trap(u) as a function of the number of ones u; the value decreases linearly from trap(0) = 4 to trap(4) = 0 and jumps to the optimum trap(5) = 5.]

Page 21: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 21

Probability Vector on Traps

[Figure: probability-vector entries (y-axis, 0 to 1) vs. generation (x-axis, 0 to 50) on concatenated traps; the entries converge toward 0, away from the optimum.]

Page 22: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 22

Why Failure?

n  Onemax: ¨  Optimum in 111…1 ¨  1 outperforms 0 on average.

n  Traps: optimum in 11111, but n  f(0****) = 2 n  f(1****) = 1.375

n  So single bits are misleading.

Page 23: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 23

How to Fix It?

n  Consider 5-bit statistics instead of 1-bit ones. n  Then, 11111 would outperform 00000. n  Learn model

¨  Compute p(00000), p(00001), …, p(11111)

n  Sample model ¨  Sample 5 bits at a time ¨  Generate 00000 with p(00000), 00001 with p(00001), …
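A Python sketch (illustrative, assuming consecutive disjoint 5-bit groups) of learning and sampling such order-5 marginal statistics:

import random
from collections import Counter

def learn_block_marginals(selected, k=5):
    # For each disjoint k-bit group, estimate the probability of every observed
    # k-bit configuration (no smoothing; unseen configurations get probability 0).
    n = len(selected[0])
    models = []
    for start in range(0, n, k):
        counts = Counter(tuple(s[start:start + k]) for s in selected)
        models.append({cfg: c / len(selected) for cfg, c in counts.items()})
    return models

def sample_block_marginals(models, count):
    # Sample each k-bit group as a whole according to its estimated distribution.
    new = []
    for _ in range(count):
        bits = []
        for dist in models:
            cfgs, probs = zip(*dist.items())
            bits.extend(random.choices(cfgs, weights=probs)[0])
        new.append(bits)
    return new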

Page 24: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 24

Correct Model on Traps: Dynamics

[Figure: probabilities of 11111 in the trap partitions vs. generation (0 to 50); with the correct 5-bit model they converge to 1.]

Page 25: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 25

Good News: Good Stats Work Great!

n  Optimum in O(n log n) evaluations. n  Same performance as on onemax! n  Others

¨  Hill climber: O(n^5 log n) = much worse. ¨  GA with uniform: O(2^n) = intractable. ¨  GA with k-point xover: O(2^n) (w/o tight linkage).

Page 26: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 26

Challenge

n  If we could learn and use relevant context for each position ¨  Find non-misleading statistics. ¨  Use those statistics as in probability vector.

n  Then we could solve problems decomposable into statistics of order at most k with at most O(n^2) evaluations! ¨  And there are many such problems (Simon, 1968).

Page 27: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 27

What’s Next? n  COMIT

¨  Use tree models

n  Extended compact GA ¨  Cluster bits into groups.

n  Bayesian optimization algorithm (BOA) ¨  Use Bayesian networks (more general).

Page 28: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 28

Beyond single bits: COMIT

Model: a dependency tree over string positions. Example (X is the root; Y and Z are conditioned on X):
P(X=1) = 75 %
P(Y=1|X=0) = 30 %,  P(Y=1|X=1) = 25 %
P(Z=1|X=0) = 86 %,  P(Z=1|X=1) = 75 %

(Baluja, Davies, 1997)

Page 29: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 29

How to Learn a Tree Model? n  Mutual information:

n  Goal ¨  Find the tree that maximizes mutual information between connected nodes. ¨  This minimizes the Kullback-Leibler divergence.

n  Algorithm ¨  Prim’s algorithm for maximum spanning trees.

I(X_i, X_j) = Σ_{a,b} P(X_i = a, X_j = b) log [ P(X_i = a, X_j = b) / (P(X_i = a) P(X_j = b)) ]
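The mutual information above can be estimated directly from the selected population; a short Python sketch (illustration only):

import math
from collections import Counter

def mutual_information(pop, i, j):
    # Empirical I(X_i; X_j) computed from a population of equal-length strings.
    N = len(pop)
    pij = Counter((s[i], s[j]) for s in pop)
    pi = Counter(s[i] for s in pop)
    pj = Counter(s[j] for s in pop)
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / N
        mi += p_ab * math.log(p_ab / ((pi[a] / N) * (pj[b] / N)))
    return mi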

Page 30: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 30

Prim’s Algorithm

n  Start with a graph with no edges. n  Add arbitrary node to the tree. n  Iterate

¨  Hang a new node onto the current tree. ¨  Prefer adding edges with large mutual information (greedy approach).

n  Complexity: O(n^2)
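A greedy Prim-style sketch in Python (illustration; it reuses the mutual_information helper from the previous sketch) that grows a maximum-weight spanning tree over the string positions:

def build_dependency_tree(pop):
    n = len(pop[0])
    # Precompute pairwise mutual information once (assumes mutual_information above).
    mi = [[mutual_information(pop, i, j) if i != j else 0.0 for j in range(n)]
          for i in range(n)]
    in_tree = {0}            # start from an arbitrary node
    parents = {0: None}
    while len(in_tree) < n:
        # Greedily hang the outside node connected by the strongest edge.
        _, u, v = max((mi[u][v], u, v)
                      for u in in_tree for v in range(n) if v not in in_tree)
        parents[v] = u
        in_tree.add(v)
    return parents           # child -> parent map defining the tree model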

Page 31: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 31

Variants of PMBGAs with Tree Models

n  COMIT (Baluja, Davies, 1997) ¨  Tree models.

n  MIMIC (DeBonet, 1996) ¨  Chain distributions.

n  BMDA (Pelikan, Mühlenbein, 1998) ¨  Forest distribution (independent trees or tree)

Page 32: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 32

Beyond Pairwise Dependencies: ECGA

n  Extended Compact GA (ECGA) (Harik, 1999). n  Consider groups of string positions.

String positions are partitioned into groups, and each group has its own marginal probability table, e.g.:
1-bit group:  P(0) = 86 %,  P(1) = 14 %
3-bit group:  P(000) = 17 %,  P(001) = 2 %,  · · ·,  P(111) = 24 %
2-bit group:  P(00) = 16 %,  P(01) = 45 %,  P(10) = 35 %,  P(11) = 4 %

Page 33: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 33

Learning the Model in ECGA

n  Start with each bit in a separate group. n  Each iteration merges the two groups whose merge improves the model quality the most.

Page 34: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 34

How to Compute Model Quality?

n  ECGA uses minimum description length. n  Minimize number of bits to store model+data:

n  Each frequency needs (0.5 log N) bits:

n  Each solution X needs -log p(X) bits:

MDL(M, D) = D_Model + D_Data

D_Model = Σ_{g∈G} 2^{|g|−1} log N

D_Data = −N Σ_X p(X) log p(X)
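A Python sketch (illustration; base-2 logarithms are used to measure bits) of the MDL score of a candidate grouping, following the two terms above; in ECGA's greedy learning, the merge that decreases this score the most is accepted:

import math
from collections import Counter

def mdl_score(population, groups):
    # groups: list of tuples of string positions, e.g. [(0, 1, 2), (3, 4)].
    N = len(population)
    # Model bits: a group of size |g| stores 2^(|g|-1) * log2(N) bits of frequencies.
    model_bits = sum(2 ** (len(g) - 1) * math.log2(N) for g in groups)
    # Data bits: -N * sum p(X) log2 p(X), computed group by group.
    data_bits = 0.0
    for g in groups:
        counts = Counter(tuple(s[i] for i in g) for s in population)
        for c in counts.values():
            p = c / N
            data_bits += -N * p * math.log2(p)
    return model_bits + data_bits   # lower is better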

Page 35: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 35

Sampling Model in ECGA

n  Sample groups of bits at a time.

n  Based on observed probabilities/proportions.

n  But can also apply population-based crossover similar to uniform but w.r.t. model.

Page 36: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 36

Building-Block-Wise Mutation in ECGA

n  Sastry, Goldberg (2004); Lima et al. (2005) n  Basic idea

¨  Use ECGA model builder to identify decomposition ¨  Use the best solution for BB-wise mutation ¨  For each k-bit partition (building block)

n  Evaluate the remaining 2^k − 1 instantiations of this BB n  Use the best instantiation of this BB

n  Result (for order-k separable problems) ¨  BB-wise mutation is O(√k log n) times faster than ECGA! ¨  But only for separable problems (and similar ones).

Page 37: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 37

What’s Next?

n  We saw ¨  Probability vector (no edges). ¨  Tree models (some edges). ¨  Marginal product models (groups of variables).

n  Next: Bayesian networks ¨  Can represent all above and more.

Page 38: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 38

Bayesian Optimization Algorithm (BOA)

n  Pelikan, Goldberg, & Cantú-Paz (1998) n  Use a Bayesian network (BN) as a model. n  Bayesian network

¨  Acyclic directed graph. ¨  Nodes are variables (string positions). ¨  Conditional dependencies (edges). ¨  Conditional independencies (implicit).

Page 39: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 39

Example: Bayesian Network (BN)

n  Conditional dependencies. n  Conditional independencies.

Page 40: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 40

BOA

Current population

Selected population

New population

Bayesian network

Page 41: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 41

Learning BNs

n  Two things again: ¨  Scoring metric (as MDL in ECGA). ¨  Search procedure (in ECGA done by merging).

Page 42: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 42

Learning BNs: Scoring Metrics

n  Bayesian metrics ¨  Bayesian-Dirichlet with likelihood equivallence

n  Minimum description length metrics ¨  Bayesian information criterion (BIC)

BD(B) = p(B) ∏_{i=1}^n ∏_{π_i} [ Γ(m'(π_i)) / Γ(m'(π_i) + m(π_i)) ] ∏_{x_i} [ Γ(m'(x_i, π_i) + m(x_i, π_i)) / Γ(m'(x_i, π_i)) ]

BIC(B) = Σ_{i=1}^n ( −H(X_i | Π_i) · N − 2^{|Π_i|} · (log_2 N) / 2 )
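A Python sketch (illustration only) of the BIC score above for binary strings, where parents[i] lists the parent positions of variable i:

import math
from collections import Counter

def bic_score(population, parents):
    # BIC(B) = sum_i ( -H(X_i | Pa_i) * N - 2^{|Pa_i|} * log2(N) / 2 ); higher is better.
    N = len(population)
    score = 0.0
    for i, pa in parents.items():
        joint = Counter((tuple(s[j] for j in pa), s[i]) for s in population)
        marg = Counter(tuple(s[j] for j in pa) for s in population)
        cond_entropy = 0.0
        for (ctx, _), c in joint.items():
            cond_entropy += -(c / N) * math.log2(c / marg[ctx])
        score += -cond_entropy * N - (2 ** len(pa)) * math.log2(N) / 2
    return score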

Page 43: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 43

Learning BNs: Search Procedure

n  Start with an empty network (like ECGA). n  Execute the primitive operator that improves the metric the most (greedy). n  Until no more improvement is possible. n  Primitive operators

¨  Edge addition (most important). ¨  Edge removal. ¨  Edge reversal.
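A minimal greedy-search sketch in Python (illustration; only edge additions are shown, and it assumes the bic_score helper from the previous sketch):

def creates_cycle(parents, par, child):
    # Adding par -> child creates a cycle iff child is already an ancestor of par.
    stack, seen = [par], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def greedy_network_search(population, n):
    parents = {i: [] for i in range(n)}
    current = bic_score(population, parents)
    while True:
        best = None
        for child in range(n):
            for par in range(n):
                if par == child or par in parents[child] or creates_cycle(parents, par, child):
                    continue
                parents[child].append(par)             # tentatively add the edge
                gain = bic_score(population, parents) - current
                parents[child].remove(par)
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, par, child)
        if best is None:
            return parents                             # no improving edge left
        gain, par, child = best
        parents[child].append(par)
        current += gain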

Page 44: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 44

Learning BNs: Example

Page 45: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 45

BOA and Problem Decomposition

n  Conditions under which the problem decomposition can be factored into a product of marginal and conditional probabilities of small order are given in Mühlenbein, Mahnig, & Rodriguez (1999).

n  In practice, an approximate factorization that can be learned automatically is sufficient.

n  Learning makes complete theory intractable.

Page 46: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 46

BOA Theory: Population Sizing n  Initial supply (Goldberg et al., 2001)

¨  Have enough stuff to combine.

n  Decision making (Harik et al, 1997) ¨  Decide well between competing partial sols.

n  Drift (Thierens, Goldberg, Pereira, 1998) ¨  Don’t lose less salient stuff prematurely.

n  Model building (Pelikan et al., 2000, 2002) ¨  Find a good model.

Associated population-size bounds: O(n), O(n^1.05), O(n log n), O(2^k)

Page 47: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 47

BOA Theory: Num. of Generations

n  Two extreme cases, everything in the middle. n  Uniform scaling ¨  Onemax model (Muehlenbein & Schlierkamp-Voosen, 1993): O(√n) generations

n  Exponential scaling ¨  Domino convergence (Thierens, Goldberg, Pereira, 1998): O(n) generations

Page 48: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 48

Good News: Challenge Met! n  Theory

¨  Population sizing (Pelikan et al., 2000, 2002): O(n) to O(n^1.05) n  Initial supply. n  Decision making. n  Drift. n  Model building.

¨  Number of iterations (Pelikan et al., 2000, 2002): O(n^0.5) to O(n) n  Uniform scaling. n  Exponential scaling.

n  BOA solves order-k decomposable problems in O(n^1.55) to O(n^2) evaluations!

Page 49: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 49

Theory vs. Experiment (5-bit Traps)

[Figure: number of evaluations vs. problem size (100 to 250) for BOA on concatenated 5-bit traps; experimental results closely match the theoretical prediction.]

Page 50: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 50

BOA Siblings

n  Estimation of Bayesian Networks Algorithm (EBNA) (Etxeberria, Larrañaga, 1999).

n  Learning Factorized Distribution Algorithm (LFDA) (Mühlenbein, Mahnig, Rodriguez, 1999).

Page 51: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 51

Another Option: Markov Networks

n  MN-FDA, MN-EDA (Santana; 2003, 2005) n  Similar to Bayes nets but with undirected edges.

Page 52: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 52

Yet Another Option: Dependency Networks

n  Estimation of dependency networks algorithm (EDNA) ¨  Gamez, Mateo, Puerta (2007). ¨  Use dependency network as a model. ¨  Dependency network learned from pairwise interactions. ¨  Use Gibbs sampling to generate new solutions.

n  Dependency network ¨  Parents of a variable= all variables influencing this variable. ¨  Dependency network can contain cycles.

Page 53: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 53

Model Comparison

BMDA → ECGA → BOA: model expressiveness increases.

Page 54: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 54

From single level to hierarchy

n  Single-level decomposition powerful. n  But what if single-level decomposition is not enough? n  Learn from humans and nature ¨  Decompose problem over multiple levels. ¨  Use solutions from lower level as basic building blocks. ¨  Solve problem hierarchically.

Page 55: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 55

Hierarchical Decomposition

Example: Car → Engine, Braking system, Electrical system; Engine → Fuel system, Valves, Ignition system

Page 56: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 56

Three Keys to Hierarchy Success

n  Proper decomposition ¨  Must decompose problem on each level properly.

n  Chunking ¨  Must represent & manipulate large order solutions.

n  Preservation of alternative solutions ¨  Must preserve alternative partial solutions (chunks).

Page 57: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 57

Hierarchical BOA (hBOA)

n  Pelikan & Goldberg (2000, 2001) n  Proper decomposition

¨  Use Bayesian networks like BOA.

n  Chunking ¨  Use local structures in Bayesian networks.

n  Preservation of alternative solutions. ¨  Use restricted tournament replacement (RTR). ¨  Can use other niching methods.

Page 58: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 58

Local Structures in BNs

n  Look at one conditional dependency. ¨  2^k probabilities for k parents.

n  Why not use more powerful representations for conditional probabilities?

Example: X1 with parents X2, X3 — full conditional probability table:
X2X3 = 00:  P(X1=0|X2X3) = 26 %
X2X3 = 01:  P(X1=0|X2X3) = 44 %
X2X3 = 10:  P(X1=0|X2X3) = 15 %
X2X3 = 11:  P(X1=0|X2X3) = 15 %

Page 59: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 59

Local Structures in BNs

n  Look at one conditional dependency. ¨  2^k probabilities for k parents.

n  Why not use more powerful representations for conditional probabilities?

[Figure: the same conditional probabilities for X1 stored as a decision tree on X2 and X3; branches that behave identically share a leaf, so only three leaves (26 %, 44 %, 15 %) are needed instead of four table rows.]

Page 60: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 60

Restricted Tournament Replacement

n  Used in hBOA for niching. n  Insert each new candidate solution x like this:

¨  Pick random subset of original population. ¨  Find solution y most similar to x in the subset. ¨  Replace y by x if x is better than y.
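A Python sketch of this insertion step (illustration; maximization and bit-string solutions with Hamming-distance similarity are assumed):

import random

def rtr_insert(population, fitness, x, fx, window_size):
    # Compare x with the most similar member of a random window; replace it only
    # if x has better fitness (this preserves alternative solutions elsewhere).
    window = random.sample(range(len(population)), window_size)
    closest = min(window,
                  key=lambda idx: sum(a != b for a, b in zip(population[idx], x)))
    if fx > fitness[closest]:
        population[closest] = x
        fitness[closest] = fx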

Page 61: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 61

Hierarchical Traps: The Ultimate Test

n  Combine traps on more levels. n  Each level contributes to fitness. n  Groups of bits map to next level.

Page 62: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 62

hBOA on Hierarchical Traps

[Figure: number of evaluations vs. problem size (27 to 729, log-log scale) for hBOA on hierarchical traps; experimental scaling matches O(n^1.63 log n).]

Page 63: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 63

PMBGAs Are Not Just Optimizers

n  PMBGAs provide us with two things ¨  Optimum or its approximation. ¨  Sequence of probabilistic models.

n  Probabilistic models ¨  Encode populations of increasing quality. ¨  Tell us a lot about the problem at hand. ¨  Can we use this information?

Page 64: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 64

Efficiency Enhancement for PMBGAs

n  Sometimes O(n^2) is not enough ¨  High-dimensional problems (1000s of variables) ¨  Expensive evaluation (fitness) function

n  Solution ¨  Efficiency enhancement techniques

Page 65: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 65

Efficiency Enhancement Types

n  7 efficiency enhancement types for PMBGAs ¨  Parallelization ¨  Hybridization ¨  Time continuation ¨  Fitness evaluation relaxation ¨  Prior knowledge utilization ¨  Incremental and sporadic model building ¨  Learning from experience

Page 66: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 66

Multi-objective PMBGAs n  Methods for multi-objective GAs adopted

¨  Multi-objective mixture-based IDEAs (Thierens, & Bosman, 2001)

¨  Another multi-objective BOA (from SPEA2 and mBOA) (Laumanns, & Ocenasek, 2002)

¨  Multi-objective hBOA (from NSGA-II and hBOA) (Khan, Goldberg, & Pelikan, 2002) (Pelikan, Sastry, & Goldberg, 2005)

¨  Regularity Model Based Multiobjective EDA (RM-MEDA) (Zhang, Zhou, Jin, 2008)

Page 67: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 67

Promising Results with Discrete PMBGAs

n  Artificial classes of problems n  Physics n  Bioinformatics n  Computational complexity and AI n  Others

Page 68: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 68

Results: Artificial Problems

n  Decomposition ¨  Concatenated traps (Pelikan et al., 1998).

n  Hierarchical decomposition ¨  Hierarchical traps (Pelikan, Goldberg, 2001).

n  Other sources of difficulty ¨  Exponential scaling, noise (Pelikan, 2002).

Page 69: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 69

BOA on Concatenated 5-bit Traps

[Figure: number of evaluations vs. problem size (100 to 250) for BOA on concatenated 5-bit traps; experimental results closely match the theoretical prediction.]

Page 70: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 70

hBOA on Hierarchical Traps

[Figure: number of evaluations vs. problem size (27 to 729, log-log scale) for hBOA on hierarchical traps; experimental scaling matches O(n^1.63 log n).]

Page 71: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 71

Results: Physics n  Spin glasses (Pelikan et al., 2002, 2006, 2008)

(Hoens, 2005) (Santana, 2005) (Shakya et al., 2006) ¨  ±J and Gaussian couplings ¨  2D and 3D spin glass ¨  Sherrington-Kirkpatrick (SK) spin glass

n  Silicon clusters (Sastry, 2001) ¨  Gong potential (3-body)

Page 72: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 72

hBOA on Ising Spin Glasses (2D)

[Figure: number of evaluations vs. number of spins (64 to 400) for hBOA on 2D Ising spin glasses; experimental scaling matches O(n^1.51).]

Page 73: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 73

Results on 2D Spin Glasses

n  Number of evaluations is O(n^1.51). n  Overall time is O(n^3.51). n  Compare O(n^3.51) to O(n^3.5) for the best method (Galluccio & Loebl, 1999). n  Great also on Gaussians.

Page 74: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 74

hBOA on Ising Spin Glasses (3D)

[Figure: number of evaluations vs. number of spins (64 to 343) for hBOA on 3D Ising spin glasses; the experimental average matches O(n^3.63).]

Page 75: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 75

hBOA on SK Spin Glass

Page 76: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 76

Results: Computational Complexity, AI

n  MAXSAT, SAT (Pelikan, 2002) ¨  Random 3CNF from phase transition. ¨  Morphed graph coloring. ¨  Conversion from spin glass.

n  Feature subset selection (Inza et al., 2001) (Cantu-Paz, 2004)

Page 77: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 77

Results: Some Others n  Military antenna design (Santarelli et al., 2004) n  Groundwater remediation design (Arst et al., 2004) n  Forest management (Ducheyne et al., 2003) n  Nurse scheduling (Li, Aickelin, 2004) n  Telecommunication network design (Rothlauf, 2002) n  Graph partitioning (Ocenasek, Schwarz, 1999; Muehlenbein, Mahnig,

2002; Baluja, 2004) n  Portfolio management (Lipinski, 2005, 2007) n  Quantum excitation chemistry (Sastry et al., 2005) n  Maximum clique (Zhang et al., 2005) n  Cancer chemotherapy optimization (Petrovski et al., 2006) n  Minimum vertex cover (Pelikan et al., 2007) n  Protein folding (Santana et al., 2007) n  Side chain placement (Santana et al., 2007)

Page 78: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 78

Discrete PMBGAs: Summary n  No interactions

¨  Univariate models; PBIL, UMDA, cGA. n  Some pairwise interactions

¨  Tree models; COMIT, MIMIC, BMDA. n  Multivariate interactions

¨  Multivariate models: BOA, EBNA, LFDA. n  Hierarchical decomposition

¨  hBOA

Page 79: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 79

Discrete PMBGAs: Recommendations

n  Easy problems ¨  Use univariate models; PBIL, UMDA, cGA.

n  Somewhat difficult problems ¨  Use bivariate models; MIMIC, COMIT, BMDA.

n  Difficult problems ¨  Use multivariate models; BOA, EBNA, LFDA.

n  Most difficult problems ¨  Use hierarchical decomposition; hBOA.

Page 80: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 80

Real-Valued PMBGAs

n  New challenge ¨  Infinite domain for each variable. ¨  How to model?

n  2 approaches ¨  Discretize and apply discrete model/PMBGA ¨  Create model for real-valued variables

n  Estimate pdf.

Page 81: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 81

PBIL Extensions: First Step n  SHCwL: Stochastic hill climbing with learning (Rudlof, Köppen, 1996). n  Model

¨  Single-peak Gaussian for each variable. ¨  Means evolve based on parents (promising solutions). ¨  Deviations equal, decreasing over time.

n  Problems ¨  No interactions. ¨  Single Gaussians=can model only one attractor. ¨  Same deviations for each variable.
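For contrast, a minimal Python sketch (illustration only, not the exact SHCwL update rule) of the simplest real-valued model of this kind: one independent Gaussian per variable, fitted to the promising solutions and then sampled:

import random
import statistics

def fit_univariate_gaussians(selected):
    # One (mean, std) pair per variable, estimated from the selected real-valued vectors.
    return [(statistics.mean(d), statistics.pstdev(d)) for d in zip(*selected)]

def sample_univariate_gaussians(model, count):
    return [[random.gauss(mu, sigma) for mu, sigma in model] for _ in range(count)]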

Page 82: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 82

Use Different Deviations

n  Sebag, Ducoulombier (1998) n  Some variables have higher variance. n  Use a separate standard deviation for each variable.

Page 83: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 83

Use Covariance

n  Covariance allows rotation of 1-peak Gaussians. n  EGNA (Larrañaga et al., 2000) n  IDEA (Bosman, Thierens, 2000)

Page 84: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 84

How Many Peaks?

n  One Gaussian vs. kernel around each point. n  Kernel distribution similar to ES. n  IDEA (Bosman, Thierens, 2000)

Page 85: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 85


Mixtures: Between One and Many n  Mixture distributions provide a transition between one Gaussian and Gaussian kernels. n  Mixture types

¨  Over one variable. n  Gallagher, Frean, & Downs (1999).

¨  Over all variables. n  Pelikan & Goldberg (2000). n  Bosman & Thierens (2000).

¨  Over partitions of variables. n  Bosman & Thierens (2000). n  Ahn, Ramakrishna, and Goldberg (2004).

Page 86: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 86

Mixed BOA (mBOA) n  Mixed BOA (Ocenasek, Schwarz, 2002) n  Local distributions

¨  A decision tree (DT) for every variable. ¨  Internal DT nodes encode tests on other variables

n Discrete: Equal to a constant n Continuous: Less than a constant

¨  Discrete variables: DT leaves represent probabilities.

¨  Continuous variables: DT leaves contain a normal kernel distribution.

Page 87: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 87

Real-Coded BOA (rBOA)

n  Ahn, Ramakrishna, Goldberg (2003) n  Probabilistic Model

¨  Underlying structure: Bayesian network ¨  Local distributions: Mixtures of Gaussians

n  Also extended to multiobjective problems (Ahn, 2005)

Page 88: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 88

Aggregation Pheromone System (APS)

n  Tsutsui (2004) n  Inspired by aggregation pheromones n  Basic idea

¨  Good solutions emit aggregation pheromones ¨  New candidate solutions are generated based on the density of aggregation pheromones ¨  Aggregation pheromone density encodes a mixture distribution

Page 89: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 89

Adaptive Variance Scaling

n  Adaptive variance in mBOA ¨  Ocenasek et al. (2004)

n  Normal IDEAs ¨  Bosman et al. (2006, 2007) ¨  Correlation-triggered adaptive variance scaling ¨  Standard-deviation ratio (SDR) triggered variance

scaling

Page 90: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 90

Real-Valued PMBGAs: Discretization

n  Idea: Transform into discrete domain. n  Fixed models

¨  2^k equal-width bins encoded with a k-bit binary string (sketched below). ¨  Goldberg (1989). ¨  Bosman & Thierens (2000); Pelikan et al. (2003).

n  Adaptive models ¨  Equal-height histograms of 2^k bins. ¨  k-means clustering on each variable. ¨  Pelikan, Goldberg, & Tsutsui (2003); Cantu-Paz (2001).

Page 91: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 91

Real-Valued PMBGAs: Summary n  Discretization

¨  Fixed ¨  Adaptive

n  Real-valued models ¨  Single or multiple peaks? ¨  Same variance or different variance? ¨  Covariance or no covariance? ¨  Mixtures? ¨  Treat entire vectors, subsets of variables, or single variables?

Page 92: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 92

Real-Valued PMBGAs: Recommendations

n  Multimodality? ¨  Use multiple peaks.

n  Decomposability? ¨  All variables, subsets, or single variables.

n  Strong linear dependencies? ¨  Covariance.

n  Partial differentiability? ¨  Combine with gradient search.

Page 93: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 93

PMBGP (Genetic Programming) n  New challenge

¨  Structured, variable length representation. ¨  Possibly infinitely many values. ¨  Position independence (or not). ¨  Low correlation between solution quality and solution structure (Looks, 2006).

n  Approaches ¨  Use explicit probabilistic models for trees. ¨  Use models based on grammars.

Page 94: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 94

PIPE n  Probabilistic incremental program evolution (Salustowicz & Schmidhuber, 1997)

n  Store frequencies of operators/terminals in nodes of a maximum tree.

n  Sampling generates the tree from top to bottom.

Example node table: P(sin) = 0.15, P(+) = 0.35, P(−) = 0.35, P(X) = 0.15

Page 95: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 95

eCGP

n  Sastry & Goldberg (2003) n  ECGA adapted to program trees. n  Maximum tree as in PIPE. n  But nodes partitioned into groups.

Page 96: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 96

BOA for GP

n  Looks, Goertzel, & Pennachin (2004) n  Combinatory logic + BOA

¨  Trees translated into uniform structures. ¨  Labels only in leaves. ¨  BOA builds model over symbols in different nodes.

n  Complexity build-up ¨  Modeling limited to max. sized structure seen. ¨  Complexity builds up by special operator.

Page 97: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 97

MOSES

n  Looks (2006). n  Evolve demes of programs. n  Each deme represents similar structures. n  Apply PMBGA to each deme (e.g. hBOA). n  Introduce new demes/delete old ones. n  Use normal forms to reduce complexity.

Page 98: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 98

PMBGP with Grammars n  Use grammars/stochastic grammars as models. n  Grammars restrict the class of programs.

n  Some representatives ¨  Program evolution with explicit learning (Shan et al., 2003) ¨  Grammar-based EDA for GP (Bosman, de Jong, 2004) ¨  Stochastic grammar GP (Tanev, 2004) ¨  Adaptive constrained GP (Janikow, 2004)

Page 99: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 99

PMBGP: Summary

n  Interesting starting points available. n  But still a lot of work to be done. n  Much to learn from the discrete domain, but some completely new challenges. n  Research in progress

Page 100: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 100

PMBGAs for Permutations

n  New challenges ¨  Relative order ¨  Absolute order ¨  Permutation constraints

n  Two basic approaches ¨  Random-key and real-valued PMBGAs ¨  Explicit probabilistic models for permutations

Page 101: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 101

Random Keys and PMBGAs n  Bengoetxea et al. (2000); Bosman et al. (2001) n  Random keys (Bean, 1997)

¨  Candidate solution = vector of real values ¨  Ascending ordering gives a permutation

n  Can use any real-valued PMBGA (or GEA) ¨  IDEAs (Bosman, Thierens, 2002) ¨  EGNA (Larranaga et al., 2001)

n  Strengths and weaknesses ¨  Good: Can use any real-valued PMBGA. ¨  Bad: Redundancy of the encoding.
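The decoding step is tiny; a Python sketch (illustration only):

def random_key_decode(keys):
    # Ascending ordering of the keys gives the permutation: position i of the result
    # is the index of the i-th smallest key.
    return sorted(range(len(keys)), key=lambda i: keys[i])

# Example: keys produced by any real-valued PMBGA
assert random_key_decode([0.57, 0.12, 0.99, 0.40]) == [1, 3, 0, 2]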

Page 102: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 102

Direct Modeling of Permutations

n  Edge-histogram based sampling algorithm (EHBSA) (Tsutsui, Pelikan, Goldberg, 2003) ¨  Permutations of n elements ¨  Model is a matrix A = (a_ij) for i, j = 1, 2, …, n ¨  a_ij represents the probability of edge (i, j) ¨  Uses a template to reduce exploration ¨  Applicable also to scheduling
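A Python sketch of building the edge-histogram matrix (illustration only; the published EHBSA additionally adds a small bias term and samples new permutations segment by segment using a template, which is omitted here):

from collections import defaultdict

def edge_histogram(permutations):
    # counts[i][j] = how often element j directly follows element i,
    # treating each permutation as a cyclic tour.
    counts = defaultdict(lambda: defaultdict(int))
    for perm in permutations:
        for a, b in zip(perm, perm[1:] + perm[:1]):
            counts[a][b] += 1
    return counts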

Page 103: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 103

ICE: Modify Crossover from Model

n  ICE ¨  Bosman, Thierens (2001). ¨  Represent permutations with random keys. ¨  Learn multivariate model to factorize the problem. ¨  Use the learned model to modify crossover.

n  Performance ¨  Typically outperforms IDEAs and other PMBGAs

that learn and sample random keys.

Page 104: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 104

Multivariate Permutation Models n  Basic approach

¨  Use any standard multivariate discrete model. ¨  Restrict sampling to permutations in some way. ¨  Bengoetxea et al. (2000), Pelikan et al. (2007).

n  Strengths and weaknesses ¨  Use explicit multivariate models to find regularities. ¨  High-order alphabet requires big samples for good models. ¨  Sampling can introduce unwanted bias. ¨  Inefficient encoding for problems with only relative-ordering constraints, which could be encoded more simply.

Page 105: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 105

Conclusions n  Competent PMBGAs exist

¨  Scalable solution to broad classes of problems. ¨  Solution to previously intractable problems. ¨  Algorithms ready for new applications.

n  PMBGAs do more than just solve the problem ¨  They provide us with sequences of probabilistic models. ¨  The probabilistic models tell us a lot about the problem.

n  Consequences for practitioners ¨  Robust methods with few or no parameters. ¨  Capable of learning how to solve problem. ¨  But can incorporate prior knowledge as well. ¨  Can solve previously intractable problems.

Page 106: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 106

Starting Points n  World wide web n  Books and surveys

¨  Larrañaga & Lozano (eds.) (2001). Estimation of distribution algorithms: A new tool for evolutionary computation. Kluwer.

¨  Pelikan et al. (2002). A survey to optimization by building and using probabilistic models. Computational optimization and applications, 21(1), pp. 5-20.

¨  Pelikan (2005). Hierarchical BOA: Towards a New Generation of Evolutionary Algorithms. Springer.

¨  Lozano, Larrañaga, Inza, Bengoetxea (2006). Towards a New Evolutionary Computation: Advances on Estimation of Distribution Algorithms, Springer.

¨  Pelikan, Sastry, Cantu-Paz (eds.) (2006). Scalable Optimization via Probabilistic Modeling: From Algorithms to Applications, Springer.

Page 107: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 107

Online Code (1/2) n  BOA, BOA with decision graphs, dependency-tree EDA

http://medal-lab.org/

n  ECGA, xi-ary ECGA, BOA, and BOA with decision trees/graphs http://www.illigal.org/

n  mBOA http://jiri.ocenasek.com/

n  PIPE http://www.idsia.ch/~rafal/

n  Real-coded BOA http://www.evolution.re.kr/

Page 108: Estimation of Distribution Algorithms Tutorial

Martin Pelikan, Probabilistic Model-Building GAs 108

Online Code (2/2) n  Demos of APS and EHBSA

http://www.hannan-u.ac.jp/~tsutsui/research-e.html

n  RM-MEDA: A Regularity Model Based Multiobjective EDA Differential Evolution + EDA hybrid http://cswww.essex.ac.uk/staff/qzhang/mypublication.htm

n  Naive Multi-objective Mixture-based IDEA (MIDEA) Normal IDEA-Induced Chromosome Elements Exchanger (ICE) Normal Iterated Density-Estimation Evolutionary Algorithm (IDEA) http://homepages.cwi.nl/~bosman/code.html