Initial-Population Bias in the Univariate Estimation of Distribution Algorithm
Martin Pelikan and Kumara Sastry
Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
University of Missouri, St. Louis, MO
http://medal.cs.umsl.edu/
[email protected]
MEDAL Report No. 2009001
http://medal.cs.umsl.edu/files/2009001.pdf
Martin Pelikan and Kumara Sastry Initial-Population Bias in UMDA
Motivation
Importance of bias
- Efficiency enhancements of EDAs may introduce bias.
- Examples:
  - Local search.
  - Injection of prior full or partial solutions.
  - Bias based on prior knowledge about the problem.
- Bias may have positive or negative effects.
- It is important to understand these effects.
This study
- Study the effects of biasing the initial population.
- Consider UMDA on onemax and noisy onemax.
- Theory and experiment.
Outline
1. UMDA.
2. Basic model for bias.
3. Population size.
4. Number of generations.
5. Compare to hill climber.
6. Conclusions.
7. Future work.
Probability Vector as a Model
Probability vector, p
- Store the probability of a 1 in each position.
- p = (p1, p2, ..., pn).
- pi is the probability of a 1 in position i.
Replace crossover/mutation by model building and sampling
- Learn the probability vector from the selected points.
- Sample new points according to the learned vector.
Univariate Marginal Distribution Algorithm (UMDA)
UMDA (Mühlenbein & Paaß, 1996).
1. Generate a random population of binary strings.
2. Selection (e.g., tournament selection).
3. Learn the probability vector from the selected solutions.
4. Sample the probability vector to generate new solutions.
5. Incorporate the new solutions into the original population.
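The five steps above can be sketched in Python. This is an illustrative reimplementation, not the authors' code; the population size, generation count, and best-so-far tracking are placeholder choices.

```python
import random

def umda(fitness, n, pop_size=100, max_gens=100, seed=0):
    """Minimal UMDA sketch: binary tournament selection, full replacement."""
    rng = random.Random(seed)
    # 1. Generate a random population of binary strings.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(max_gens):
        # 2. Binary tournament selection.
        selected = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)
            selected.append(a if fitness(a) >= fitness(b) else b)
        # 3. Learn the probability vector from the selected solutions.
        p = [sum(s[i] for s in selected) / pop_size for i in range(n)]
        # 4. Sample the probability vector to generate new solutions.
        pop = [[1 if rng.random() < p[i] else 0 for i in range(n)]
               for _ in range(pop_size)]
        # 5. Full replacement; keep track of the best solution seen so far.
        best = max(pop + [best], key=fitness)
    return best
```

On onemax (fitness = number of 1s), this sketch reliably drives the probability vector toward all 1s for modest problem sizes.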
Example: Probability Vector
(Mühlenbein, Paass, 1996), (Baluja, 1994)
Current population:
01011
11000
11001
10101

Selected population:
11001
10101

Probability vector (learned from the selected population):
1.0 0.5 0.5 0.0 1.0

New population (sampled from the probability vector):
11101
11001
10101
10001
Assumptions
Algorithm
- UMDA with binary tournament selection and full replacement.
- Results should generalize to other selection methods with fixed selection intensity.
Fitness
- Deterministic onemax:

  $\mathrm{onemax}(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} X_i$

- Noisy onemax:

  $\mathrm{onemax}_{noisy}(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} X_i + N(0, \sigma^2)$
- Results should generalize to other separable problems of bounded order (if a good model is used).
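Both fitness functions follow directly from the definitions above; a minimal sketch (function names are ours):

```python
import random

def onemax(x):
    """Deterministic onemax: the number of 1s in the binary string x."""
    return sum(x)

def onemax_noisy(x, sigma2, rng=random):
    """Onemax with additive zero-mean Gaussian noise of variance sigma2."""
    return sum(x) + rng.gauss(0.0, sigma2 ** 0.5)
```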
Basic Model for Bias
Basic model
- Introduce bias in the initial population.
- Increase or decrease the initial proportion pinit of optimal bits.
- Use the same bias for all string positions.
- Examples:

  pinit = 0.2   pinit = 0.5   pinit = 0.8
  00001         11110         11110
  00001         01010         01011
  01000         11101         01111
  00010         00010         11111
  10000         11011         10111

- What to expect?
  - pinit grows ⇒ UMDA performance improves.
  - pinit decreases ⇒ UMDA performance suffers.
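A biased initial population of this kind can be generated by drawing each bit as 1 with probability pinit; a minimal sketch (function name and signature are ours):

```python
import random

def biased_population(n, pop_size, p_init, rng=random):
    """Generate an initial population of binary strings of length n in which
    every bit, in every position, is 1 with probability p_init."""
    return [[1 if rng.random() < p_init else 0 for _ in range(n)]
            for _ in range(pop_size)]
```

With p_init = 0.5 this reduces to the usual unbiased random initialization.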
Theoretical Model for Deterministic Onemax
Population size
- Gambler’s ruin population-sizing model (Harik et al., 1997).
- Population-sizing bound:

  $N = -\frac{\ln \alpha}{4 p_{init}} \sqrt{\pi n}$

Number of generations

- Convergence model (Thierens & Goldberg, 1994).
- Number-of-generations bound:

  $G = \left(\frac{\pi}{2} - \arcsin(2 p_{init} - 1)\right) \sqrt{\pi n}$
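The two bounds can be evaluated numerically; a sketch assuming α is the admissible failure probability of the gambler's-ruin model (function names are ours):

```python
import math

def population_size(n, p_init, alpha=0.05):
    """Gambler's ruin bound: N = -ln(alpha) * sqrt(pi*n) / (4 * p_init)."""
    return -math.log(alpha) * math.sqrt(math.pi * n) / (4 * p_init)

def num_generations(n, p_init):
    """Convergence bound: G = (pi/2 - arcsin(2*p_init - 1)) * sqrt(pi*n)."""
    return (math.pi / 2 - math.asin(2 * p_init - 1)) * math.sqrt(math.pi * n)
```

Note that halving p_init doubles the population-size bound, while G at p_init = 0.5 reduces to (pi/2) * sqrt(pi*n).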
Deterministic Onemax: Theoretical Speedup
Speedup factors
- How many times faster does the algorithm become compared to pinit = 0.5?
- Population size:

  $\eta_N = \frac{1}{2 p_{init}}$

- Number of generations:

  $\eta_G = 1 - \frac{2 \arcsin(2 p_{init} - 1)}{\pi}$

- Number of evaluations:

  $\eta_E = \frac{1}{2 p_{init}} \left(1 - \frac{2 \arcsin(2 p_{init} - 1)}{\pi}\right)$
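The three speedup factors translate directly into code; a sketch (function names are ours):

```python
import math

def speedup_N(p_init):
    """Population-size factor: eta_N = 1 / (2 * p_init)."""
    return 1 / (2 * p_init)

def speedup_G(p_init):
    """Generations factor: eta_G = 1 - 2 * arcsin(2*p_init - 1) / pi."""
    return 1 - 2 * math.asin(2 * p_init - 1) / math.pi

def speedup_E(p_init):
    """Evaluations factor: the product of the other two factors."""
    return speedup_N(p_init) * speedup_G(p_init)
```

All three factors equal 1 at the base case p_init = 0.5, as expected.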
Experimental Setup
Basic setup
- Binary tournament selection without replacement.
- Full replacement (no elitism or niching).
- Problems of n = 100 to n = 500 tested (focus on n = 500).
- Population size set using bisection to ensure that all 10 independent runs succeed, where a run succeeds when it finds at least 95% of the optimal solution.
- Bisection repeated 10 times for each setting.
Observed statistics
- Population size.
- Number of generations.
- Number of evaluations.
Deterministic Onemax: Speedup and Slowdown
[Figure 2 panels: slowdown (left) and speedup (right) relative to the base case pinit = 0.5, plotted against pinit, with curves for the number of evaluations, the population size, and the number of generations.]
Figure 2: The factor by which the population size, the number of generations, and the number of evaluations should change with varying pinit compared to the base case with pinit = 0.5. The three factors are based on the population-sizing and time-to-convergence models. The results are shown as speedup and slowdown curves.
A problem of n = 500 is used in most presented results. In all experiments, binary tournament selection without replacement is used. The sampling ensures that the generated population of strings corresponds to the used probability vector as closely as possible in order to minimize sampling errors; this is done by first generating the expected number of 1s for each position and then randomly shuffling the bits in each position across the population of new strings (Branke, Lode, & Shapiro, 2007). In each generation, a population of the same size as the original population is generated and the old population is fully replaced by the newly generated strings. There is no elitism or niching. To alleviate the effects of stochastic noise on the initial-population bias, the initial population is generated similarly to the new solutions in each iteration: first, the expected number of 1s is put in every string position, and the bits are then shuffled randomly.
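The low-variance sampling procedure just described (expected number of 1s per position, then a random shuffle within each position) can be sketched as follows; this is our reading of the procedure, not the authors' code, and it rounds the expected count to the nearest integer:

```python
import random

def sample_population(p, pop_size, rng=random):
    """Low-variance sampling: column i receives round(p[i] * pop_size) ones,
    placed at random rows, so the sample matches the probability vector
    as closely as possible."""
    n = len(p)
    pop = [[0] * n for _ in range(pop_size)]
    for i in range(n):
        ones = round(p[i] * pop_size)
        rows = rng.sample(range(pop_size), ones)  # random rows receive a 1
        for r in rows:
            pop[r][i] = 1
    return pop
```

Unlike independent Bernoulli sampling, the marginal frequency of 1s in each column here deviates from p[i] by at most the rounding error.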
The bisection method (Sastry, 2001; Pelikan, 2005) was used to find an adequate population size for each setting, which ensured that the algorithm finds a good-enough solution in each of the 10 independent runs. In each run, UMDA was required to find a solution with at least 95% of string positions containing a 1 (at least 95% of the bits of the global optimum were correctly discovered). To provide more accurate estimates of the actual UMDA performance, the bisection was repeated 10 times for each setting, yielding 100 successful runs for each configuration.
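The bisection search for an adequate population size can be sketched as follows, assuming a hypothetical callback run_succeeds(N) that performs the 10 runs at population size N and reports whether all of them met the success criterion; the sketch also assumes success is monotone in N:

```python
def bisect_population_size(run_succeeds, n_max=2**20):
    """Find a near-minimal population size N for which run_succeeds(N) holds."""
    # Phase 1: double N until the runs succeed.
    lo, hi = 1, 2
    while not run_succeeds(hi):
        lo, hi = hi, hi * 2
        if hi > n_max:
            raise RuntimeError("no adequate population size found")
    # Phase 2: binary search between the last failure and the first success.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if run_succeeds(mid):
            hi = mid
        else:
            lo = mid
    return hi
```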
The results for UMDA are shown in Figure 3. The results indicate a close match between the theoretical model and the experimental results. An analogous match was observed for other problem sizes.
5 Mathematical Model for Noisy Onemax
Genetic algorithms are known to perform well even when the fitness cannot be determined exactly due to external noise. The noise may come from an inherently noisy problem domain or a noisy approximation of a computationally expensive fitness function. External noise can also be used to model certain types of problem difficulty. This section extends the analysis presented thus far to onemax with external Gaussian noise. This will provide us with a basic understanding of the interaction between the initial-population bias and external noise.
The section starts with a description of the noisy onemax function. Next, the section presents the population-sizing and time-to-convergence models for noisy onemax. Finally, the models are verified with experiments and the results are discussed.
- Empirical results confirm intuition.
  - Positive bias improves performance.
  - Negative bias worsens performance.
Deterministic Onemax: Experiments vs. Theory
[Figure 3 panels: (a) population size, (b) number of generations, and (c) number of evaluations, each plotted against pinit, comparing experiment and theory.]

Figure 3: Effects of initial-population bias on UMDA performance on onemax of n = 500 bits without external noise.
5.1 Noisy Onemax
Noisy fitness functions can be modeled as (Goldberg, Deb, & Clark, 1992)
fitness′ = fitness + noise, (18)
where fitness is the true fitness of the candidate solution and noise is a random variable corresponding to the external noise. External noise can be characterized by two basic quantities: bias and variance (Sastry, 2001). Unbiased noise only adds variance to the population without affecting the expected fitness of any candidate solution. While increasing the fitness variance makes decision making between competing partial solutions or building blocks more difficult, the expected outcome of the computation should remain the same, assuming that the population size is increased to average out the added variance. On the other hand, biased noise may change the expected fitness of any candidate solution. This leads to both more difficult decision making and the potential for changing the overall outcome, since the best solution under the noisy fitness may not be the best solution with no external noise.
In models of noisy fitness functions, external noise is typically distributed according to a zero-mean Gaussian distribution, which is also the case considered here. The noisy onemax is thus defined as

$\mathrm{onemax}_{noisy}(X_1, X_2, \ldots, X_n) = \mathrm{onemax}(X_1, X_2, \ldots, X_n) + N(0, \sigma_N^2),$   (19)

where $N(0, \sigma_N^2)$ is a random variable distributed according to the Gaussian distribution with mean 0 and variance $\sigma_N^2$.
Intuitively, the effects of external noise depend on how its variance compares to the overall variance of the fitness function. If the external-noise variance is very small compared to the fitness variance, the effects of external noise can be expected to be negligible; on the other hand, if the noise variance is larger than the fitness variance, the effects of external noise can be expected to be substantial. That is why the strength (variance) of external noise is often measured relative to the fitness variance:

$\sigma_N^2 = \beta \sigma_F^2,$   (20)

where $\sigma_F^2$ is the fitness variance and $\beta \geq 0$ is a constant factor. In this paper, we consider four levels of external noise: $\beta \in \{0.5, 1, 1.5, 2\}$. Note that for the considered range of $\beta$, the effects of noise on the fitness function are substantial. For the smallest $\beta = 0.5$, the variance of the external noise is half the overall fitness variance, and for the largest $\beta = 2$, the variance of the added noise is twice the fitness variance.
- Empirical results match theory.
- Theory makes conservative estimates.
- Empirical results confirm intuition.
Theoretical Model for Noisy Onemax: Population Size
Population size
- Gambler’s ruin population-sizing model (Harik et al., 1997).
- Variance of external noise given in terms of the fitness variance:

  $\sigma_{noise}^2 = \beta \, \sigma_{fitness}^2$

- The population-sizing bound becomes:

  $N = -\frac{\ln \alpha}{4 p_{init}} \sqrt{\pi n (1 + \beta)}$
Theoretical Model for Noisy Onemax: Generations
Number of generations
- Convergence model (Miller & Goldberg, 1994; Sastry, 2001; Goldberg, 2002).
- Difficult to solve analytically for arbitrary pinit.
- Effects of pinit modeled by an empirical fit.
- Number-of-generations bound:

  $G = \frac{\pi}{2} \sqrt{\pi n} \sqrt{1 + \beta} \left(1 - \frac{2 \arcsin(2 p_{init} - 1)}{\pi}\right)$
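Both noisy-case bounds can be evaluated numerically; a sketch mirroring the two formulas, with the deterministic bounds recovered at β = 0 (function names are ours):

```python
import math

def population_size_noisy(n, p_init, beta, alpha=0.05):
    """Noisy bound: N = -ln(alpha) * sqrt(pi*n*(1+beta)) / (4 * p_init)."""
    return -math.log(alpha) * math.sqrt(math.pi * n * (1 + beta)) / (4 * p_init)

def num_generations_noisy(n, p_init, beta):
    """G = (pi/2) * sqrt(pi*n) * sqrt(1+beta) * (1 - 2*arcsin(2*p_init-1)/pi)."""
    return (math.pi / 2) * math.sqrt(math.pi * n) * math.sqrt(1 + beta) * \
           (1 - 2 * math.asin(2 * p_init - 1) / math.pi)
```

Both bounds scale with sqrt(1 + beta), so the number of evaluations grows linearly in (1 + beta).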
Noisy Onemax: Theoretical Speedup
Speedup factors same as for deterministic case!
- Population size:

  $\eta_N = \frac{1}{2 p_{init}}$

- Number of generations:

  $\eta_G = 1 - \frac{2 \arcsin(2 p_{init} - 1)}{\pi}$

- Number of evaluations:

  $\eta_E = \frac{1}{2 p_{init}} \left(1 - \frac{2 \arcsin(2 p_{init} - 1)}{\pi}\right)$
Noisy Onemax: Experiments vs. Theory for β = 1
[Figure 4 panels: (a) population size, (b) number of generations, and (c) number of evaluations, each plotted against pinit, comparing experiment and theory.]

Figure 4: Effects of initial-population bias on UMDA performance on onemax of n = 500 bits, $\sigma_N^2 = 0.5 \sigma_F^2 = 0.125n$.
[Figure 5 panels: (a) population size, (b) number of generations, and (c) number of evaluations, each plotted against pinit, comparing experiment and theory.]

Figure 5: Effects of initial-population bias on UMDA performance on onemax of n = 500 bits, $\sigma_N^2 = \sigma_F^2 = 0.25n$.
Figure 8 visualizes the effects of external noise on the number of evaluations. Comparing the overall number of evaluations confirms the intuition that the number of evaluations grows with the strength of external noise. Specifically, from the models presented earlier, the number of evaluations grows linearly with β, the ratio of the variance of the external noise to the fitness variance.
6 Effects of mutation and the sampling procedure
Thus far we have examined the effects of the initial-population bias on the performance of standard UMDA. This section answers three additional questions related to this topic:
(1) How does bit-flip mutation affect performance of UMDA with initial population bias?
(2) How does the sampling procedure affect performance of UMDA with initial population bias?
(3) How is the performance of mutation-based hill climbing affected by initial bias and noise compared to UMDA?
- Empirical results match theory.
- Population sizing remains a conservative estimate.
- Note: β = 1 is a lot of noise (the noise variance equals the overall fitness variance).
Compare to Hill Climber on Deterministic Case
[Figure 12 panels: (a) number of evaluations vs. pinit for standard and noiseless sampling; (b) speedup of noiseless over standard sampling for n = 100 to n = 500.]

Figure 12: Effects of sampling on UMDA performance with external noise $\sigma_N^2 = 2\sigma_F^2$.
[Figure 13 panels: (a) number of evaluations of hill climbing on 500-bit onemax (experiment vs. theory); (b) comparison of UMDA and HC.]

Figure 13: Performance of HC on a 500-bit deterministic onemax and its comparison to UMDA. The theoretical model of Mühlenbein (Mühlenbein, 1992) is used to provide an upper bound on the time to convergence (on the left).
1s. The total number of evaluations until termination was then recorded and averaged over all 100 runs. To analyze the influence of the initial-population bias on HC performance, the initial string is generated to contain pinit·n ones.
It is well known that HC works very well on onemax and similar unimodal problems (Mühlenbein, 1992). This is confirmed by the results shown in Figure 13. HC is clearly much less sensitive to the value of pinit than UMDA. This fact is not surprising, because it was shown (Mühlenbein, 1992) that in HC the majority of computational resources is typically spent on making the last few steps toward the optimum. Furthermore, the results clearly indicate that on onemax HC significantly outperforms UMDA except for relatively large values of pinit. The smaller pinit, the greater the difference between HC and UMDA.
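The mutation-based hill climber used in the comparison can be sketched as follows; this is our reading of the setup (single random bit flip, strict improvement, initial string with round(pinit·n) ones), not the authors' code, and the stopping condition is specific to deterministic onemax:

```python
import random

def hill_climb(fitness, n, p_init=0.5, max_evals=10**6, seed=0):
    """Single-bit-flip hill climber sketch for deterministic onemax."""
    rng = random.Random(seed)
    ones = round(p_init * n)
    x = [1] * ones + [0] * (n - ones)   # initial string with ~p_init*n ones
    rng.shuffle(x)
    fx, evals = fitness(x), 1
    while fx < n and evals < max_evals:  # n is the onemax optimum
        i = rng.randrange(n)
        x[i] ^= 1                        # flip one random bit
        fnew = fitness(x)
        evals += 1
        if fnew > fx:
            fx = fnew                    # keep strict improvements
        else:
            x[i] ^= 1                    # revert the flip otherwise
    return x, evals
```

Because most flips late in the run hit bits that are already 1 and are reverted, the expected cost concentrates in the last few steps toward the optimum, consistent with the observation above.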
6.3.2 Performance of Hill Climbing on Noisy Onemax
After several initial experiments, it became clear that while on deterministic onemax HC significantly outperforms UMDA, on noisy onemax the situation changes rapidly. This result is not a big surprise, because it was shown that recombination-based search can deal with external noise much more effectively than search based on local operators (Sastry & Goldberg, 2004); nonetheless, the
- Performance of HC is great regardless of bias.
- This agrees with theory (Mühlenbein, 1992).
Compare to Hill Climber on Noisy Case
Performance of HC becomes poor with noise!
β    n   pinit  HC evaluations  UMDA evaluations
0.5  10  0.1    4,449           1,210
0.5  25  0.1    2,125,373       1,886
0.5  10  0.5    11,096          66
0.5  25  0.5    8,248,140       169
1.0  5   0.1    215             574
1.0  15  0.1    5,691,725       1,210
1.0  5   0.5    64              20
1.0  15  0.5    15,738,168      64
Conclusions
- We have a good theoretical understanding of the effects of one type of initial-population bias on the performance of UMDA on deterministic and noisy onemax.
- Effects of bias match intuition:
  - Good bias improves performance.
  - Bad bias worsens performance.
- Effects of bias are independent of noise.
- Experimental results match theory.
Future Work
- Study specific efficiency-enhancement techniques and the bias they introduce, and apply the theory developed here to estimate the final effects.
- Extend this work to other types of bias.
- Extend this work to other evolutionary algorithms, especially standard genetic algorithms with two-parent recombination and EDAs with multivariate models (e.g., BOA and ecGA).
- Eliminate the empirical fit from the model for the noisy onemax.
Acknowledgments
- NSF; NSF CAREER grant ECS-0547013.
- U.S. Air Force, AFOSR; FA9550-06-1-0096.
- University of Missouri; High Performance Computing Collaboratory sponsored by Information Technology Services; Research Award; Research Board.