
Spurious Dependencies and EDA Scalability


More on this work can be found in the technical report: http://medal.cs.umsl.edu/show_abstract.php?type=tr&number=2010002


Page 1: Spurious Dependencies and EDA Scalability

Spurious Dependencies and EDA Scalability

Elizabeth Radetic and Martin Pelikan
Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
University of Missouri, St. Louis, MO
http://medal.cs.umsl.edu/

[email protected]

Download MEDAL Report No. 2010002

http://medal.cs.umsl.edu/files/2010002.pdf

Elizabeth Radetic and Martin Pelikan Spurious Dependencies and EDA Scalability

Page 2: Spurious Dependencies and EDA Scalability

Motivation

Estimation of distribution algorithms (EDAs)
- Replace standard crossover and mutation by
  - building a probabilistic model of selected solutions, and
  - sampling the probabilistic model to generate new solutions.
- Can solve many problems intractable with standard EAs.

Model accuracy
- It is important that the EDA model is accurate.
- Types of inaccuracies for dependency-based models
  - Missing dependencies.
  - Spurious, unnecessary dependencies.
- Most prior work focused on missing dependencies.

This study
- Focus on effects of spurious dependencies.
  - Theoretical study for population sizing.
  - Empirical study for the number of generations.


Page 3: Spurious Dependencies and EDA Scalability

Outline

1. Model accuracy.

2. Spurious dependencies
   - Model for spurious dependencies.
   - Effects on population sizing.
   - Effects on the number of generations.

3. Experiments.

4. Conclusions and future work.


Page 4: Spurious Dependencies and EDA Scalability

Dependency-Based Probabilistic Models in EDAs

Dependency-based probabilistic models

- Encode dependencies and independencies between variables.
- Dependency structure decomposes the problem.
- Subproblems should be of bounded order.

Examples

- Marginal product models.
- Bayesian networks.


Page 5: Spurious Dependencies and EDA Scalability

Marginal Product Model

- Variables are divided into linkage groups.
- Defines problem decomposition into separable subproblems.
- Distribution of each group encoded by probability table.
- We assume binary representation of candidate solutions.

[Embedded excerpt from: Martin Pelikan, Probabilistic Model-Building GAs]

How to Learn a Tree Model?
- Mutual information:

$$I(X_i, X_j) = \sum_{a,b} P(X_i = a, X_j = b) \log \frac{P(X_i = a, X_j = b)}{P(X_i = a)\, P(X_j = b)}$$

- Goal
  - Find tree that maximizes mutual information between connected nodes.
  - Will minimize Kullback-Leibler divergence.
- Algorithm
  - Prim’s algorithm for maximum spanning trees.

Prim’s Algorithm
- Start with a graph with no edges.
- Add arbitrary node to the tree.
- Iterate
  - Hang a new node to the current tree.
  - Prefer addition of edges with large mutual information (greedy approach).
- Complexity: $O(n^2)$

Variants of PMBGAs with Tree Models
- COMIT (Baluja, Davies, 1997)
  - Tree models.
- MIMIC (DeBonet, 1996)
  - Chain distributions.
- BMDA (Pelikan, Mühlenbein, 1998)
  - Forest distribution (independent trees or tree)

Beyond Pairwise Dependencies: ECGA
- Extended Compact GA (ECGA) (Harik, 1999).
- Consider groups of string positions.

[Figure: a string and its model, with one probability table per group of string positions.]
- 1-bit group: 0: 86%, 1: 14%
- 3-bit group: 000: 17%, 001: 2%, ..., 111: 24%
- 2-bit group: 00: 16%, 01: 45%, 10: 35%, 11: 4%
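As an illustration of the marginal product model described on this slide, the following is a minimal Python sketch, not the authors' implementation. It assumes the linkage groups are already given (no model building is shown), estimates one probability table per group from a selected population, and samples each group independently. The function names `learn_mpm` and `sample_mpm` and the toy population are hypothetical.

```python
import random
from collections import Counter


def learn_mpm(population, groups):
    """Estimate one probability table per linkage group (hypothetical helper).

    population: list of equal-length bit lists (the selected solutions).
    groups: list of tuples of positions; together they partition the string.
    """
    tables = []
    for group in groups:
        # Count how often each configuration of this group occurs.
        counts = Counter(tuple(ind[i] for i in group) for ind in population)
        total = sum(counts.values())
        tables.append({cfg: c / total for cfg, c in counts.items()})
    return tables


def sample_mpm(tables, groups, n, rng):
    """Sample one new solution; each group is drawn independently of the others."""
    child = [0] * n
    for group, table in zip(groups, tables):
        configs = list(table)
        weights = [table[cfg] for cfg in configs]
        chosen = rng.choices(configs, weights=weights, k=1)[0]
        for pos, bit in zip(group, chosen):
            child[pos] = bit
    return child


# Toy data (made up for illustration): positions 0 and 1 form one group.
population = [[0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 1]]
groups = [(0, 1), (2,), (3,)]
tables = learn_mpm(population, groups)
child = sample_mpm(tables, groups, 4, random.Random(0))
```

Because only the configurations 00 and 11 occur in the first group of the toy population, every sampled child has equal bits at positions 0 and 1. A table for a group of order k has up to 2^k entries, which is why larger groups need more data to estimate.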


Page 6: Spurious Dependencies and EDA Scalability

Model Accuracy

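Types of inaccuracies
- Missing dependencies.
- Spurious, unnecessary dependencies.

Example: Trap-5

$$f_{\mathrm{trap5}}(X_1, \ldots, X_n) = \sum_{i=1}^{n/5} \mathrm{trap5}(X_{5i-4} + X_{5i-3} + X_{5i-2} + X_{5i-1} + X_{5i})$$

$$\mathrm{trap5}(u) = \begin{cases} 5 & \text{if } u = 5 \\ 4 - u & \text{otherwise} \end{cases}$$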
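The trap-5 definition above translates directly into code. The sketch below is a plain Python transcription; it assumes the string is a list of 0/1 values whose length is a multiple of 5 and that the blocks are consecutive, as in the formula.

```python
def trap5(u):
    """Trap function of order 5 applied to the number of ones u in a block."""
    return 5 if u == 5 else 4 - u


def f_trap5(x):
    """Sum of trap5 over consecutive, non-overlapping blocks of 5 bits."""
    if len(x) % 5 != 0:
        raise ValueError("string length must be a multiple of 5")
    return sum(trap5(sum(x[i:i + 5])) for i in range(0, len(x), 5))


all_ones = f_trap5([1] * 10)    # two optimal blocks
all_zeros = f_trap5([0] * 10)   # two blocks at the deceptive attractor
```

Within a block, every value of u below 5 rewards removing ones, so a model that treats the five bits as independent is led away from the optimum. This is the kind of missing dependency the slide contrasts with spurious ones.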


Page 7: Spurious Dependencies and EDA Scalability

Onemax Model of Spurious Dependencies

Onemax is the sum of bits in the binary string

$$\mathrm{onemax}(X_1, \ldots, X_n) = \sum_{i=1}^{n} X_i$$

Perfect and spurious models for onemax

- Perfect model assumes no dependence at all.
- Spurious model assumes linkage groups of order $k_{\mathrm{spurious}} > 1$.
- Parameter $k_{\mathrm{spurious}}$ controls order of spurious dependencies.
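To make the two models concrete, here is a small Python sketch of onemax and of a fixed linkage structure of a given order. The layout of the groups is an assumption made for illustration: the slide does not say which positions are grouped, so the hypothetical helper `spurious_groups` simply uses consecutive, non-overlapping groups and requires n to be divisible by the order.

```python
def onemax(x):
    """Number of ones in the binary string x."""
    return sum(x)


def spurious_groups(n, k_spurious):
    """Partition positions 0..n-1 into consecutive groups of size k_spurious.

    k_spurious = 1 gives the perfect model for onemax (all bits independent);
    k_spurious > 1 gives a model with spurious dependencies of that order.
    Consecutive grouping is an assumption of this sketch.
    """
    if k_spurious < 1 or n % k_spurious != 0:
        raise ValueError("n must be a positive multiple of k_spurious")
    return [tuple(range(start, start + k_spurious))
            for start in range(0, n, k_spurious)]


perfect = spurious_groups(6, 1)    # six groups of one bit each
spurious = spurious_groups(6, 2)   # three groups of two bits each
```

Since onemax is additively separable bit by bit, any grouping with order above 1 models dependencies that the fitness function does not contain.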


Page 8: Spurious Dependencies and EDA Scalability

Effects of Spurious Models on EDA Performance

Two main effects of spurious dependencies

- Population size.
- Number of generations.

Population sizing decomposition

- Population size requirements should increase.
- Effects depend on learning, but are sometimes substantial.

Number of generations

- Number of generations may decrease due to weaker variation.
- Effects are not expected to be substantial.


Page 9: Spurious Dependencies and EDA Scalability

EDA Population Sizing and Spurious Dependencies

Population sizing decomposition

- Initial supply
  - Initial population is random.
  - Ensure sufficient supply of partial solutions for each group.
- Decision making
  - Decision making between partial solutions is stochastic.
  - Ensure that best partial solution wins in each group.
- Model building
  - Ensure accurate enough models to find the optimum.
  - The reason for spurious dependencies, not the effect.

Focus in this work

- Initial supply.
- Decision making.


Page 10: Spurious Dependencies and EDA Scalability

Population Sizing: Initial Supply

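Initial supply for perfect model (Goldberg et al., 2001)

$$N = 2 \ln(2m)$$

Initial supply for arbitrary $k_{\mathrm{spurious}}$

$$N = 2^{k_{\mathrm{spurious}}} \left( k_{\mathrm{spurious}} \ln 2 + \ln \frac{n}{k_{\mathrm{spurious}}} \right)$$

Initial-supply population increase factor

$$\gamma_{is} = 2^{k_{\mathrm{spurious}} - 1} \, \frac{k_{\mathrm{spurious}} \ln 2 + \ln \frac{n}{k_{\mathrm{spurious}}}}{\ln 2 + \ln n}$$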
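The initial-supply expressions can be evaluated numerically. The sketch below assumes that the perfect model corresponds to order 1 with m = n groups, which is what makes the denominator of the increase factor equal to ln 2 + ln n. The function names are hypothetical.

```python
import math


def initial_supply(n, k_spurious=1):
    """Initial-supply population size for onemax with groups of order k_spurious."""
    return 2 ** k_spurious * (k_spurious * math.log(2)
                              + math.log(n / k_spurious))


def gamma_is(n, k_spurious):
    """Increase factor: population size with spurious groups over the perfect model."""
    return initial_supply(n, k_spurious) / initial_supply(n, 1)


# Increase factors for a 300-bit problem and orders 1 to 5.
factors = {k: gamma_is(300, k) for k in range(1, 6)}
```

For order 1 the factor is exactly 1. For larger orders the 2^(k-1) term dominates the slowly varying logarithmic ratio, so the factor grows roughly exponentially with the order.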


Page 11: Spurious Dependencies and EDA Scalability

Population Sizing: Decision Making

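Decision making for perfect model (Harik et al., 1997)

$$N = -\frac{1}{2} \ln\alpha \, \sqrt{\pi (n - 1)}$$

Decision making for arbitrary $k_{\mathrm{spurious}}$

$$N = -2^{k_{\mathrm{spurious}} - 2} \ln\alpha \, \sqrt{\pi (n - 1)}$$

Decision-making population increase factor (the ratio of the two expressions above)

$$\gamma_{dm} = 2^{k_{\mathrm{spurious}} - 1}$$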
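The decision-making (gambler's ruin) expressions can be checked the same way. In this sketch α is interpreted as the tolerated probability of failure, which is an assumption based on the usual statement of the gambler's ruin model. The function name `decision_making` is hypothetical. Dividing the expression for arbitrary order by the perfect-model expression leaves a factor of 2^(k_spurious - 1), independent of n and α.

```python
import math


def decision_making(n, alpha, k_spurious=1):
    """Gambler's-ruin population size for onemax with groups of order k_spurious.

    alpha is assumed to be the tolerated failure probability (0 < alpha < 1),
    so ln(alpha) is negative and the returned size is positive.
    """
    return (-(2 ** (k_spurious - 2)) * math.log(alpha)
            * math.sqrt(math.pi * (n - 1)))


n, alpha = 300, 0.05  # illustrative values, not taken from the slides
ratios = {k: decision_making(n, alpha, k) / decision_making(n, alpha, 1)
          for k in range(1, 6)}
```

With k_spurious = 1 the expression reduces to the perfect-model formula, and each additional unit of order doubles the required population size.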


Page 12: Spurious Dependencies and EDA Scalability

Number of Generations

Effects of spurious dependencies on number of generations

- Spurious dependencies weaken the mixing.
- This reduces the effects of variation.
- This should reduce the number of generations until convergence (assuming a large enough population).
- No theoretical model as of now.


Page 13: Spurious Dependencies and EDA Scalability

Description of Experiments

Operators

- Binary tournament selection without replacement.
- Three replacement types
  - Full replacement.
  - Elitist replacement (50% worst are replaced).
  - Restricted tournament replacement (niching).
- Models with various levels of spurious linkage.

Parameters

- Optimal population size obtained by bisection.
- Runs stop when a solution close enough to the optimum is reached (allow one linkage group to end up incorrect).
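The bisection step for finding a sufficient population size can be sketched as follows. This is a generic doubling-then-bisecting search, not necessarily the exact procedure used in the experiments. The callback `succeeds` is hypothetical and stands for "the EDA meets the success criterion with this population size"; the starting size and the 10% stopping tolerance are likewise assumptions of the sketch.

```python
def bisect_population_size(succeeds, start=10, tolerance=0.1):
    """Search for a small population size N for which succeeds(N) is true.

    Phase 1 doubles N until the criterion is met; phase 2 bisects between
    the largest size known to fail and the smallest size known to succeed
    until the gap is within `tolerance` of the failing size.
    """
    low, high = 0, start
    while not succeeds(high):
        low, high = high, 2 * high
    while high - low > max(1, tolerance * low):
        mid = (low + high) // 2
        if succeeds(mid):
            high = mid
        else:
            low = mid
    return high


# Deterministic stand-in for an EDA run: pretend any N >= 137 succeeds.
estimate = bisect_population_size(lambda size: size >= 137)
```

In practice the success test is stochastic, so each call would wrap several independent EDA runs, and the whole bisection would be repeated and averaged.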


Page 14: Spurious Dependencies and EDA Scalability

Population Size (Full Replacement)

[Figure 2, panels (a) "Population size" and (b) "Population size ratio": x-axis: spurious linkage group size; y-axis: population size in (a), population size ratio in (b); curves: gambler’s ruin, initial supply, experiment.]

Figure 2: Growth of the population size with respect to the group size for a problem of 300 bits. The left-hand side shows the actual population sizes compared to the theoretical model, whereas the right-hand side shows the ratio of the population sizes with spurious linkage and the population sizes with no spurious linkage.

[Figure 3, panels (a) "Full replacement", (b) "Elitist replacement", (c) "RTR": x-axis: group size (bits per group); y-axis: population size; one curve per problem size (300, 240, 180, 120, 60).]

Figure 3: Growth of the population size with respect to the spurious linkage group size.

Figure 1(a) shows the average number of spurious linkage groups (groups of size at least 2) for each problem size. The results indicate that the number of such groups increases approximately linearly with problem size. Figure 1(b) shows the average size of spurious linkage groups. For each problem size, the size of spurious linkage groups is close to two, indicating that larger linkage groups were created only rarely. Finally, figure 1(c) shows the average linkage group size when both spurious linkage groups and independent bits are taken into account. The average group size is between 1 and 2 and it increases slightly with problem size. Similar results for ECGA on onemax were reported in ref. [28]. The results presented here thus reaffirm the need for studying spurious dependencies and their effects.

4.2.2 Population sizing with spurious linkage

This section presents the results of using fixed MPM models with spurious dependencies on onemax. To confirm the theory presented in section 3, figure 2 compares the experimental results for the population size to the predictions made by the initial supply and gambler’s ruin population sizing models. As expected, the gambler’s ruin model matches the experimental results more closely than that for the initial supply. The gambler’s ruin model can therefore help to determine the impact of spurious dependencies on EDA population sizing. Since the population size is one of the key factors that affect EDA performance, this can provide guidelines for predicting the overall impact of spurious dependencies on EDA performance.

Additional results on the effects of spurious dependencies on the population size for all problem sizes and replacement methods are shown in figure 3. These results illustrate that, in each case, the population size grows approximately exponentially with the spurious linkage group size.

- Increase of population size with $k_{\mathrm{spurious}}$ is exponential.
- Theory provides a conservative bound.


Page 15: Spurious Dependencies and EDA Scalability

Population Size (All Replacement Strategies)

[Figure: three panels, "Full replacement", "Elitist replacement", and "RTR", each plotting population size against group size (bits per group) for problem sizes 300, 240, 180, 120, and 60; see Figure 3.]

- Increase of population size with $k_{\mathrm{spurious}}$ is similar in all cases.


Page 16: Spurious Dependencies and EDA Scalability

Number of Generations (All Replacement Strategies)

[Figure 4, panels (a) "Full replacement", (b) "Elitist replacement", (c) "RTR": x-axis: group size (bits per group); y-axis: number of generations (ticks from 10 to 80 in panels (a) and (b), from 10 to 1e+07 in panel (c)); one curve per problem size (300, 240, 180, 120, 60).]

Figure 4: Growth of the number of generations with respect to the spurious linkage group size.

[Figure 5, panels (a) "Population size", (b) "Number of generations", (c) "Number of evaluations": x-axis: average spurious linkage group size (1 to 2); curves: full, elitist, and RTR replacement.]

Figure 5: The population size, the number of generations, and the number of evaluations for the 300-bit onemax with varying numbers of spurious linkage groups of size 2 (the remaining groups are of size 1).

Furthermore, elitist replacement requires somewhat smaller populations than full replacement, and RTR outperforms other replacement strategies in terms of the population size. However, in each case the relative increase of population sizes with the size of spurious linkage groups is similar.

The results for ECGA presented earlier (figure 1) indicated that typically the models in ECGA combined only a fraction of the bits into spurious linkage groups, most of which were of size 2. The results of figure 5 illustrate the effects of having different proportions of such linkage groups. More specifically, figure 5 illustrates the results obtained when using a variable number of spurious linkage groups of size 2 on a 300-bit onemax (the remaining groups are of size 1). From these results, it is clear that the required population size increases with the number of spurious linkage groups. An upper bound for this scenario should be of course provided by EDAs with MPMs with the fixed order of spurious dependencies $k_{\mathrm{spurious}} = 2$.

In summary, the results presented in this subsection demonstrate that spurious dependencies tend to increase the required population size for EDAs and that the theory from section 3 provides an accurate estimation of the effects of spurious dependencies on the population size.

4.2.3 Number of generations with spurious linkage

According to the results in figures 4 and 5, the number of generations until optimum does not seem to be strongly affected by the order of spurious dependencies when full and elitist replacement methods are used. In fact, figure 4 shows that the number of generations required to find an accurate solution decreases slightly with the size of spurious linkage groups. Figure 5 shows similar results on a smaller scale.


- Full and elitist replacement
  - Number of generations slightly decreases with $k_{\mathrm{spurious}}$.
- Niching (restricted tournament replacement)
  - Number of generations dramatically increases!


Page 17: Spurious Dependencies and EDA Scalability

Spurious Linkage in Multivariate EDAs

Experiment

- Use optimal population size in ECGA.
- Observe spurious dependencies in actual models.

[Figure 1, panels (a) "Number of spurious linkage groups", (b) "Avg. size of spurious linkage groups", (c) "Average linkage group size": x-axis: problem size (number of bits); curves: RTR, elitist, and full replacement.]

Figure 1: The average number of spurious linkage groups (groups of size ≥ 2), the average size of linkage groups of size ≥ 2, and the average linkage group size (including all linkage groups) for ECGA on onemax. Three replacement strategies are considered: full replacement, elitist replacement and RTR. For each problem size and replacement strategy, the results represent an average over 100 runs (10 bisections of 10 runs each).

Problems of size n = 60 to 300 bits in increments of 60 were tested. Population sizes were determined empirically using the bisection method [27, 22] to ensure 10 successful consecutive runs. For each problem size and each test scenario, 10 independent bisections were performed, for a total of 100 independent runs. A run was considered successful when a string was found with at most one suboptimal linkage group (with the linkage groups depending on the used model). For the base cases with no spurious linkage and the experiments with ECGA, at most 1 bit was allowed to be incorrect. Full population convergence was not required and each run was terminated when one solution of the desired quality had been found. This allowed the same stopping criterion for all tested replacement methods including those with niching, which prevents full convergence.

Binary tournament selection without replacement was used to select parent populations. To incorporate the offspring into the population, three replacement strategies were tested: (1) Full replacement, where the child population completely replaces the parent population; (2) elitist replacement, where the worst 50% of the parent population was replaced with the child population; and (3) restricted tournament replacement (RTR) [10, 22], where for each offspring w solutions were randomly selected from the parent population and the one genotypically closest to the offspring was replaced if its fitness is worse. The window size in RTR was set to min(n, 0.05N) as suggested by ref. [22].
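The restricted tournament replacement step described above can be sketched in Python as follows. This is an illustrative reading of the description, not the authors' code. It assumes maximization, binary strings compared by Hamming distance as the genotypic distance, and a window sampled without repetition; the function names are hypothetical, and rounding the window size down with a minimum of 1 is an assumption.

```python
import random


def hamming(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def rtr_window(n, population_size):
    """Window size min(n, 0.05 N), rounded down and at least 1 (assumption)."""
    return max(1, min(n, int(0.05 * population_size)))


def rtr_insert(population, fitnesses, offspring, offspring_fitness, window, rng):
    """Try to insert one offspring; return True if it replaced a solution."""
    # Pick `window` distinct candidates at random from the population.
    candidates = rng.sample(range(len(population)), window)
    # Find the candidate genotypically closest to the offspring.
    closest = min(candidates, key=lambda i: hamming(population[i], offspring))
    # Replace it only if its fitness is worse than the offspring's.
    if fitnesses[closest] < offspring_fitness:
        population[closest] = offspring
        fitnesses[closest] = offspring_fitness
        return True
    return False


population = [[0, 0, 0, 0], [1, 1, 1, 0]]
fitnesses = [sum(ind) for ind in population]   # onemax fitness
replaced = rtr_insert(population, fitnesses, [1, 1, 1, 1], 4,
                      window=2, rng=random.Random(0))
```

Because an offspring only competes with solutions similar to itself, dissimilar solutions survive side by side, which is the niching effect that prevents full convergence.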

The maximum number of generations allowed was set to 5n for most runs, but this limit was increased to 10000n when RTR was used with spurious dependencies. The need for this increase was due to the effects of niching, as will be discussed in section 4.2.3.

4.2 Results

This section presents experimental results. First, the results that depict the accuracy of ECGA models on onemax are presented. The effects of spurious dependencies on the population size and the time to convergence are then shown.

4.2.1 Spurious linkage for ECGA on onemax

The results in figure 1 provide an insight into the number and size of spurious dependencies discovered by ECGA on onemax using the population size obtained with bisection.


Page 18: Spurious Dependencies and EDA Scalability

Conclusions and Future Work

Conclusions

- Population size increases exponentially with $k_{\mathrm{spurious}}$.
- Number of generations mostly unaffected.
- But for niching, the number of generations skyrockets!
- Spurious dependencies should not be ignored.

Future work

- From our model to multivariate EDAs
  - In most EDAs population sizing driven by model building.
  - Almost always the models contain spurious dependencies.
  - How do the models interact?
- Dramatic increase in the number of generations with niching
  - Explain why.
  - Propose ways to deal with it.


Page 19: Spurious Dependencies and EDA Scalability

Acknowledgments

Acknowledgments

- NSF; NSF CAREER grant ECS-0547013.
- University of Missouri; High Performance Computing Collaboratory sponsored by Information Technology Services; Research Award; Research Board.
