Evolutionary Algorithms
Stochastic Iterative Search / Heuristic Search / Metaheuristics

Joshua Knowles
School of Computer Science
The University of Manchester
COMP60342 - Week 4 2.15, 24 April 2015
In This Lecture
• Simulated Evolution: Overview and Applications
• Evolutionary Algorithms for Optimization (in Detail)
• Tuning and Testing EAs (Basics)
• Other Stochastic Search Algorithms: Hillclimbing and Simulated Annealing
Image from http://www.truthtree.com/
Evolutionary Algorithms 2 2.15, 24 April 2015
Simulated Evolution: Overview and Applications
Evolutionary Algorithms
Evolutionary algorithm (EA) is the collective name for a number of different types of algorithmic simulation of the processes of Darwinian evolution by natural selection.
The main types of EA are:
• genetic algorithms
• evolution strategies
• evolutionary programming
• genetic programming
• learning classifier systems.
These different types originated separately but are now largely obsolete as separate categories.
The computer science discipline studying evolutionary algorithms is called Evolutionary Computation (EC).
Evolutionary algorithms can be used to: simulate aspects of evolution to help understand evolutionary dynamics and processes; provide a mechanism for the creation of artificial life-forms; and solve optimization problems.
Natural Evolution
The question that led, in the 19th century, to the theory was:
How do we explain the diversity of life?
Natural Evolution — Origins
Even before Charles Darwin and Alfred Russel Wallace, several different theories of evolution attempted to explain the origin of the variety of biota on Earth. Chambers (1844) popularised transmutation, the idea that one species could change into another.
Chambers also believed that there was an inbuilt direction to this change: from primitive to more complex (or advanced).
Lamarck (1809) had proposed that species adapt to their environment during their lifetime and then can pass on these acquired adaptations. (This mechanism was later refuted.)
So, Darwin and Wallace didn’t invent evolution, but contributed the explanation of the mechanism, ‘natural selection’, that drives it. This, together with several other related theories, explained how all species come to exist (and depart).
Natural Selection
Figure: modified from One Long Argument by Ernst Mayr (1991)
http://www.christs.cam.ac.uk/darwin200/pages/index.php?page_id=d3
Competition between individuals occurs because more offspring are created than needed to replace parents (superfecundity) and these cannot all be supported by the environment. Variation in the individuals and the inheritance of traits, plus differential survivability, then leads to evolution.
Artificial Selection: Breeding

Humans had artificially evolved “good” breeds in animals and plants long before (thousands of years before) Darwin’s theory. In fact, Darwin used the idea to strengthen parts of his arguments.

Humans achieved this through the selection of individuals (animals or plants) that they observed had preferable traits: selective breeding.

Today, artificial selection is still used. But we can also manipulate the genes via genetic engineering or genetic modification. There is potential for the latter to be faster, but it is still a science in its infancy.
Can we use similar processes to solve (optimization) problems too?
What an EA is
[Flowchart: the EA loop]
1. Initialize randomly a population of individuals (genotypes)
2. Evaluate the initial population
3. Select from the current population in a biased way, favouring fitter individuals
4. Reproduce selected individuals by sexual or asexual means
5. Evaluate the new individuals
6. Replace the parent population with the new individuals (selection may be used here again)
7. Stop? If YES, OUTPUT the final population of solutions; if NO, return to step 3.

• An EA consists of several stochastic (i.e. random) processes
• But it is not entirely random!
• Heritability of good traits and biased selection make the difference.
• These make the process one of trial and error
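The loop above can be sketched as a short program. This is an illustrative sketch only: the "onemax" fitness (count of 1-bits), truncation selection, and all parameter values are assumptions chosen for demonstration, not prescriptions from the slides.

```python
import random

def evolve(fitness, genome_len, pop_size=20, generations=50,
           mutation_rate=0.1, seed=0):
    """Minimal generational EA over binary genotypes (illustrative sketch)."""
    rng = random.Random(seed)
    # 1-2. Initialize a random population; evaluation happens via fitness()
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # 3. Biased selection: keep the fitter half (truncation selection)
        parents = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]
        # 4-5. Asexual reproduction with per-gene bit-flip mutation
        offspring = []
        while len(offspring) < pop_size:
            p = rng.choice(parents)
            child = [1 - g if rng.random() < mutation_rate else g for g in p]
            offspring.append(child)
        # 6. Generational replacement
        pop = offspring
    return max(pop, key=fitness)

best = evolve(sum, genome_len=20)  # "onemax": fitness = number of 1s
```

Even this crude scheme drives the population close to the all-ones optimum within a few dozen generations.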
Some Evolution Vocabulary I
• an organism is defined by its GENOTYPE, a collection of GENES
• organisms exist in POPULATIONS
• new organisms arise after MATING
• new genotypes arise from RECOMBINATION a.k.a. CROSSOVER (shuffling up of genes from parents’ genotypes)
• new genotypes suffer from MUTATION
• the FITNESS of an individual is both the quality of the TRAITS it possesses, and its lifetime fecundity (the number of offspring it has)
• populations exist in NICHES, sets of conditions to which the population adapts and EVOLVES
Some Evolution Vocabulary II
Equivalent terms (more or less):
Natural Evolution    Evolutionary Algorithms    Optimization
population           population                 set of solutions
organism             individual                 solution and its utility
gene                 gene                       variable
locus                locus                      variable index
allele               allele                     value a variable takes
genotype             genotype                   solution vector
phenotype            phenotype                  solution, e.g. a graph
fitness              fitness                    utility/cost/objective value

What does fitness mean in a biological entity? In most EC research, it equates to a well-defined objective function.
EA Models vs Natural Evolution
Some of the main differences are:
In EAs, fitness is objective and static; in Nature, fitness is relative and changing
EAs use a fixed population size; in Nature, populations fluctuate in size and go extinct
EA populations assume perfect mixing (panmictic); Natural populations may be separated by geographic isolation, inter-species mating is forbidden, and inbreeding is forbidden in some populations
EAs use single chromosomes; Natural systems have multiple chromosomes enabling robustness to mutational damage or environmental fluctuations
EAs stop and produce something “final”; Natural evolution has not stopped yet.
NB: The above applies to ‘standard’ EAs. Many ideas from Natural evolution have been tried in EAs.
An Example: Bin Packing Problem
INSTANCE: K bins, e.g. lorries; set of items of different weights
PROBLEM: Pack the items into the bins so that the packed weight of the bins is as close to equal as possible

item number:  1   2   3   4   5   6   7   8   9   10
weight:      17   4  61   8   2  13  22   9  18   49

REPRESENTATION: genotype of 10 genes. Each gene has K alleles, representing which bin to pack the item into, e.g. 2232213221 for K=3
INITIALIZATION: Random allele (value) assigned to each gene independently
MUTATION: choose a gene at random and set its value randomly
COST: Difference between lightest and heaviest bin. Here: cost = 25.
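The cost of a genotype can be computed directly from the slide's data; the function below is a straightforward sketch using the example's weights and K=3.

```python
WEIGHTS = [17, 4, 61, 8, 2, 13, 22, 9, 18, 49]

def cost(genotype, weights=WEIGHTS, k=3):
    """Difference between heaviest and lightest bin for a bin-assignment genotype."""
    bins = [0] * k
    for gene, w in zip(genotype, weights):
        bins[gene - 1] += w          # allele g means "pack this item into bin g"
    return max(bins) - min(bins)

print(cost([2, 2, 3, 2, 2, 1, 3, 2, 2, 1]))  # genotype 2232213221 -> 25
```

Bin 1 gets 13+49=62, bin 2 gets 17+4+8+2+9+18=58 and bin 3 gets 61+22=83, so the cost is 83−58=25, matching the slide.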
An Example: Bin Packing Problem
initial     f    gen 1       f    gen 2       f
2232213221  25   2232213121  34   2233212222  116
1123121333  39   2232113221  33   2231212221  11
3311111213  115  1233321222  54   2231213221  33
1233321222  54   2232212221  19   2232212223  97

gen 3       f    gen 4       f    gen 5       f
2331212221  5    2331312221  4    2331312221  4
2232212222  116  2311312221  125  2331312321  19
2331233221  54   2231312221  7    2331312221  4
1332212223  84   2331312321  19   2233312221  9

An illustration only. Best fitness improved quite rapidly. But did we reach an optimum? How do we know when to stop?
The Power of Random Mutation + Biased Selection
Roger Alsing’s “Evolisa”.
The fitness function has a definite target. Is this really evolution? ... It demonstrates the power of selection, even when mutations are entirely random.
The Power of Random Mutation + Biased Selection
Random mutation alone:
The Monkey Shakespeare Simulator took a simulated total of 42,162,500,000 billion billion monkey-years until one of the “monkeys” typed,
VALENTINE. Cease toIdor:eFLP0FRjWK78aXzVOwm)-;8.t
in which the first 19 letters appear in Two Gentlemen of Verona.
Compare that with Richard Dawkins’s demonstration that
“METHINKS IT IS LIKE A WEASEL”
from Hamlet could be evolved using selection + random mutation in about 40 generations.
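Dawkins's demonstration is easy to reproduce. The sketch below is a (1+λ)-style variant in which the parent survives alongside its children; the mutation rate and number of children per generation are assumed values, not Dawkins's originals.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "

def score(s):
    """Number of positions matching the target phrase."""
    return sum(a == b for a, b in zip(s, TARGET))

def weasel(rate=0.05, n_children=100, seed=1):
    """Selection + random mutation: keep the best of parent and children."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    generations = 0
    while parent != TARGET:
        # Each child copies the parent with per-letter random mutation
        children = ["".join(rng.choice(ALPHABET) if rng.random() < rate else c
                            for c in parent) for _ in range(n_children)]
        parent = max(children + [parent], key=score)  # selection step
        generations += 1
    return generations

print(weasel())
```

Selection makes the difference: the target is reached in a modest number of generations rather than astronomical numbers of monkey-years.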
Necessary Ingredients for an EA to Work
• A way to represent solutions (phenotypes) as strings of symbols (genotypes)
• A (fitness) function that maps genotypes to phenotypes and maps phenotypes to a measure of ‘fitness’
• Operators to reproduce and vary individual genotypes in such a way that inheritance of traits occurs

The EA does not need to know details of the fitness function (although it may help). The fitness function can be a ‘black box’.

[Figure: a genotype (e.g. ACCAGT) is mapped to a fitness value]
Evolution and Fitness Landscapes
Hillclimbing, the simplest EA with a single asexually reproducing individual, may work well on a simple single-peak landscape
On a multi-peaked landscape, a population of individuals should be an advantage
Evolution and Fitness Landscapes
A multi-peaked fitness landscape
Evolution and Fitness Landscapes
Individuals are distributed at random. There is initial diversity
Evolution and Fitness Landscapes
Selection and variation begins to change the fitness distribution and the distribution of alleles
Evolution and Fitness Landscapes
Further selection and variation drives diversity out of the population and drives up fitness
Evolution and Fitness Landscapes
The population supports only limited diversity — it is distributed over just two peaks
Evolution and Fitness Landscapes
Convergence to a single fitness peak may occur. Variation still operates but diversity is very limited. Further evolution from here is difficult
Searching a Multimodal Fitness Landscape
A multimodal fitness landscape has many peaks. Perhaps just one of those peaks is optimal.

[Figure: snapshots over time of populations on a multimodal fitness landscape, comparing Hillclimbing, an Ordinary GA and a Good GA. (Adapted from a slide by David Corne.)]
Evolution and Gene Frequencies
We can also view evolutionary processes in terms of what happens to the genes.
Random genes  fitness  Strong selection  Weak selection
01001010100   4        10110101110       10110101110
01010110101   6        10110101110       01001010100
01000101010   4        01010110101       01000101010
10110101110   7        10110101110       01010110101

Genes (or alleles) that appear in fit individuals increase in frequency in a population.
Holland (1975) showed that short, highly fit schemata (chunks of genetic code) increase in frequency during the run of a genetic algorithm.
Initial genetic diversity is not sustained if the population is small and if selection strongly favours fitter individuals. Continuing “progress” depends on an intermediate selection pressure.
Classification of an EA
An evolutionary algorithm may be called any of these:
• A global optimizer
• A ‘black-box’ optimizer
• A stochastic, iterative search method
• A heuristic search method
• A metaheuristic
• A nature-inspired method
Usually its optimization performance carries no formal guarantees (not even of approximating optimal solutions).
However, EAs can be applied very generally, and there is much accumulated evidence that their performance is often good if certain design principles are followed.
There are several other methods sharing some common features with EAs: simulated annealing, tabu search, particle swarm optimization.
First EAs
For the really early history see this interesting book.
Butler (1863) imagined machines evolving in an article “Darwin among the machines”. Later on:
Nils Barricelli (1953)
Box (1957)
Friedman (1959)
Bledsoe (1961)
Bremermann (1962)
all ran experiments with computers and simulated evolution independently.
Early German Work on EAs
Rechenberg, Schwefel and Bienert in the 1960s and 1970s.
Figures from a talk by Rechenberg
Evolving Jet Nozzle shapes
Interactive Evolution
Photofit methods based on EAs have been developed. The fitness function is a human-in-the-loop. This is known as interactive evolution.
One of the main issues is fatigue.
Try David Corne’s interactive evolution demo: http://www.macs.hw.ac.uk/~dwcorne/Teaching/iea.html
Interactive Evolution II
Evolving chocolate?
Cocoa bean roasting process affects aroma, flavour and flavonoid content of chocolate.
Interactive EA used to derive new roasting temperature curves.
Difficulty: the final taste of chocolate is affected by more than just the roast, so the feedback signal is weak.
EAs in Art, Design and Music
Due to their random nature, EAs can come up with ‘surprising’ patterns, designs or solutions. This has led to interest in them for supporting creative processes (architecture / design / music) or even being an autonomous creative agent.
Recent work by Kenneth Stanley evolves musical accompaniments.
Evolutionary Robotics
Developing controllers for robots is a difficult engineering task. The difficulty is even greater when we wish to obtain robots that can co-operate to perform tasks robustly.
The evolution of neural network controllers has been successful in developing robots capable of processing complex input sensor data to achieve coordinated motion and conglomeration.
Hardware Evolution
Electronic circuits and other hardware devices have been evolved by Adrian Thompson, John Koza and others.
EAs come up with different circuit design solutions than humans, who tend to use formal design principles. EA solutions can exploit secondary electronic effects not known (or ignored) by human designers. This can have both positive and negative effects.
Program Evolution (GP)
The evolution of computer programs takes different forms, but is commonly known as genetic programming or GP.
One application of genetic programming is symbolic regression. Lipson (2009) has evolved some physical laws of motion from physical observations.
Another notable success is the evolution of pseudo-random number generators which score very highly on statistical benchmark tests of apparent randomness.
Evolutionary Algorithms in Experimental Optimization
Pioneering work in Germany by Rechenberg, Schwefel and co-workers hooked “evolution strategies” up to physical experiments.
Airfoil shapes and jet nozzles were optimized using physical experiments for the fitness function.
Today, some physical / chemical / biochemical systems are still difficult to model. They may still be optimized by evolution.
Using EAs for Optimization: Health Warnings
It is very important to remember
• EAs are heuristics; their effectiveness depends upon many factors
• EAs give an approximate solution only
• EAs are stochastic: different runs might give different results
• Usually EAs offer no performance guarantee — not even a guarantee that some level of approximation will be reached
Wide Applicability, Small Development Time
Nevertheless, EAs can still operate when problems feature
• nonlinear, nonconvex, nondifferentiable and/or discontinuous cost surfaces (fitness landscapes)
• noisy or uncertain estimation of costs
• multiple nonlinear constraints

EAs only require that solutions can be represented by some symbolic coding, and that there is a way to evaluate proposed solutions.
This makes them widely applicable.
Since it is possible to develop a (basic) EA for a problem without understanding anything about the problem’s structure, development times can be very fast. Often a fitness function can be just plugged in.
Think About Alternative Solution Methods
Since EAs are widely applicable, it is tempting to use them for every problem.
In reality, it is only sensible to use EAs when an efficient alternative solution method is not known. (E.g. you wouldn’t use an EA for the minimum spanning tree problem. Why?)
If development time is an issue this may also be a reason to use an EA. Some efficient techniques are difficult and time-consuming to implement; an EA may be developed much more quickly.
If a basic problem (such as minimum spanning tree) is augmented with additional constraints, then it may become much harder. An EA may then be sensible, provided exact optimal solutions are not required.
EAs in Detail
Representation: Genotype to Phenotype Mapping
With evolutionary algorithms we are FREE to choose the representation: the way the genotype codes for the phenotype.
The phenotype (in EA-speak) is the solution to a problem. E.g. a complete timetable for the LONDON2012 games.
The genotype is just the coding for how we build it.

[Figure: a genotype (e.g. 10101110 or 2A71G) passes through a DECODER to produce a phenotype, which is assigned a fitness]

The fitness value is a function of the phenotype. But often we map directly from genotype to fitness.
The choice of representation is important for EA performance.
Choice of Representation
Example:
BINPACKING: Pack items with given sizes into the smallest number of bins possible, where bins have a certain capacity

REPRESENTATION 1: For each item, give a bin number it should go in.
REPRESENTATION 2: Genotype is a permutation of the items. Phenotype (the actual packing) is created by using a heuristic called FIRST-FIT. FIRST-FIT puts the next item in the first bin in which it will fit.

The first representation is more direct. A possible problem with it is that many infeasible solutions are represented, and many very poor ones too.
The second representation is indirect. Possible problems with indirect representations are:
• Optimal solution may not be representable!
• Heuristic decoder may be computationally expensive
• Many changes to genotype make no difference to phenotype
But it solves the problem of the direct coding. So, you make your choice...
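The indirect decoding can be sketched directly: the genotype is a permutation of item indices, and FIRST-FIT builds the packing. The item sizes and capacity below are assumed values for illustration.

```python
def first_fit(permutation, sizes, capacity):
    """Decode a permutation genotype into a packing: each item, taken in
    genotype order, goes into the first bin with enough remaining space."""
    bins = []                      # remaining free space of each open bin
    packing = []                   # bin index assigned to each item, in order
    for item in permutation:
        for i, free in enumerate(bins):
            if sizes[item] <= free:
                bins[i] -= sizes[item]
                packing.append(i)
                break
        else:                      # no existing bin fits: open a new one
            bins.append(capacity - sizes[item])
            packing.append(len(bins) - 1)
    return len(bins), packing

# Items of sizes 6, 5, 4, 3 packed in genotype order into bins of capacity 10
n_bins, packing = first_fit([0, 1, 2, 3], sizes=[6, 5, 4, 3], capacity=10)
```

Every permutation decodes to a feasible packing (no bin is overfilled), which is exactly the property the direct coding lacks.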
Other Properties of Representations
Representation determines how close different solutions (phenotypes) are to each other.
So representation choice influences fitness landscape shape.
Generally speaking, we would like small changes in genotype to give (mostly) small changes in phenotype, which will in turn give small changes in fitness, as in the left figure:

[Figures: a smooth fitness landscape (left) and a rugged one (right)]

Then we can “hill-climb” up a smooth path to the optimum.
Choosing a Population Size
Evolutionary algorithms use (usually) a constant population size |P|.
Setting it to a moderate value (10s or 100s) is usually advisable because:
[Figure: fitness over time for a too-small and a too-large population]

Too small a population has insufficient diversity and may converge prematurely.
Too large a population size may not leave enough time (generations) for evolution to get very far.
Initialization
Random (unbiased) initialization of the population of genotypes is the default.
Although it is tempting to build in something more intelligent, the risk is that seeding with one fit individual can let it take over the population and cause premature convergence.
And it can be difficult to generate many different, good solutions.
Ideally, we want diverse and good solutions in the initial population. But it is better to sacrifice quality than diversity.
Steady State vs Generational Reproduction
[Figure: P(t) parents → P’(t) offspring → P(t+1)]

Generational replacement scheme: the offspring population entirely replaces the parent population. There is no competition between offspring and parents.
A pure generational scheme has non-overlapping parent and offspring populations.
[Figure: P(t) parents → P(t+1)]

A steady state reproduction scheme. Only 1 or 2 offspring are created. These replace the weakest individuals in the parent population (if and only if they are fitter).
The steady state scheme has significant overlap between generations. We must choose which solutions to replace.
Generational Reproduction with Elitism
[Figure: P(t) parents → P’(t) offspring → P(t+1), with the elite carried over]

Generational replacement with elitism: the offspring population replaces the parent population save for the fittest k individual(s), the elite.

For overlapping populations, the fraction of the parent population replaced per generation is known as the generation gap. Elitism with k elite individuals has a generation gap of (|P| − k)/|P|. A steady state EA producing two offspring from recombination has a generation gap of 2/|P|.
Intergenerational Competition
Alternative ways of replacing individuals from previous generations:
• Random replacement (independent of fitness)
• Replace the least fit
• Hold a tournament to tend to select poorer but not necessarily the worst solutions (see tournament selection in following slides)
• Offspring replace their parents
• Offspring replace their parents only if they are fitter

These give slightly different evolutionary dynamics, and which one to choose may depend upon how mating selection is being done, and the selection pressure being applied.
Reproduction Schemes in Evolution Strategies
Evolution strategies are evolutionary algorithms of German origin.
There are two main reproduction schemes:

(µ + λ) selection. There are µ parents, and λ offspring are generated from them (by cloning, mutation and recombination). The whole µ + λ of them compete to be the µ parents of the next generation. Usually truncation selection is used: the fittest µ are taken. λ may be smaller or larger than µ. E.g. a (10+1) scheme is like a steady state EA.

(µ, λ) selection. With this “comma” selection strategy the offspring do not compete with their parents. The λ offspring compete amongst themselves and the fittest µ of them become the parents of the next generation. In this scheme λ must be greater than or equal to µ.
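The two schemes differ only in which pool the next parents are drawn from. A compact sketch (truncation selection, fitness to be maximized; the function name and signature are illustrative):

```python
def next_parents(parents, offspring, mu, fitness, scheme="plus"):
    """Truncation selection for evolution strategies.
    'plus'  -> (mu + lambda): parents and offspring compete together.
    'comma' -> (mu, lambda): only the offspring compete (needs lambda >= mu)."""
    pool = parents + offspring if scheme == "plus" else offspring
    return sorted(pool, key=fitness, reverse=True)[:mu]

# With fitness = identity: 'plus' can keep a fit parent, 'comma' discards it
plus = next_parents([3, 9], [5, 8, 1], mu=2, fitness=lambda x: x)
comma = next_parents([3, 9], [5, 8, 1], mu=2, fitness=lambda x: x,
                     scheme="comma")
```

Here `plus` keeps the parent of fitness 9, while `comma` must take the two best offspring even though both are worse than that parent; discarding parents can help escape local optima or track changing fitness functions.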
Mutation
Mutation is a very important operator in EAs. Recombination and selection alone are generally poor.
Mutation rates should nevertheless be small. A large mutation rate disrupts the heritability of good traits.

Two common schemes to do point mutation(s) on a binary string:
1. Select a random locus (gene), and change its allele value
2. Consider every locus (gene) independently and with probability pm change its allele value

With the first scheme exactly one gene’s allele value is changed per chromosome mutation event.
With the second scheme, when the mutation rate pm is set to 1/L, where L is the chromosome length, then the expected number of mutations is one too. But there is a probability of ((L−1)/L)^L ≈ 0.37 that no mutations occur. And there is a probability of C(L,M) · (1/L)^M · ((L−1)/L)^(L−M) that exactly M mutations occur. So for L = 100 and M = 3, this occurs with probability ≈ 0.061. This allows occasional longer jumps.
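The probabilities quoted here are just the binomial distribution with success probability pm, and can be checked numerically:

```python
from math import comb

def p_mutations(L, M, pm=None):
    """Probability that per-gene mutation (rate pm, default 1/L) changes
    exactly M of the L genes: binomial(L, pm) evaluated at M."""
    pm = pm if pm is not None else 1 / L
    return comb(L, M) * pm**M * (1 - pm)**(L - M)

print(round(p_mutations(100, 0), 2))  # no mutation: about 0.37
print(round(p_mutations(100, 3), 3))  # exactly three mutations: about 0.061
```

As L grows, the number of mutations per chromosome approaches a Poisson distribution with mean 1, which is why the no-mutation probability tends to 1/e ≈ 0.37.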
Mutation with Other Representations
If one is using a genotypic representation other than binary strings, then mutation must also be defined appropriately. (See the bin-packing example above for an integer mutation.)
With continuous gene values, perturbing the gene’s value by drawing from a Gaussian distribution centred on the current gene’s value is one technique:

Let x ∈ [a, b] be a real variable. Then the Gaussian mutation operator MG changes x to MG(x) := min(max(N(x, σ), a), b), where N(x, σ) is a normally distributed random variable with mean x and standard deviation σ; σ may depend on the length r := b − a of the interval and is typically set to σ/r = 1/10.
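The operator MG as defined above translates almost line-for-line into code; the default σ = (b − a)/10 follows the slide's typical setting:

```python
import random

def gaussian_mutate(x, a, b, sigma=None, rng=random):
    """Gaussian mutation MG for a real gene x in [a, b]: perturb by
    N(x, sigma) and clamp back into the interval."""
    sigma = sigma if sigma is not None else (b - a) / 10
    return min(max(rng.gauss(x, sigma), a), b)

y = gaussian_mutate(0.5, 0.0, 1.0, rng=random.Random(4))
```

The min/max clamp guarantees the mutated gene stays feasible, at the cost of piling some probability mass onto the interval endpoints.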
For permutations, still other mutations are possible...
Mutation with Other Representations
Often an EA is used to optimize a problem where the natural representation is a permutation of the numbers from 1 to N. Examples are TSP, some matching problems, some scheduling problems, and knapsack problems when an indirect coding is being used.

2-swap mutation
Choose any pair of genes at random. Swap them.
A B C D E F G H → A F C D E B G H

2-opt mutation (good for TSP)
Choose a contiguous chunk of the chromosome at random. Reverse its order.
A B C D E F G H → E D C B A F G H

shift mutation (good for scheduling problems)
Choose a contiguous chunk of the chromosome at random. Shift it to the right or left by S spaces for some random S.
A B C D E F G H → A E F B C D G H

It is easy to design your own operator for a problem. You can also use more than one mutation operator in your evolutionary algorithm.
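Two of the operators above can be sketched as follows; the key invariant is that each returns a rearrangement of the same genes, so the result is always a valid permutation.

```python
import random

def two_swap(perm, rng):
    """2-swap mutation: swap two randomly chosen positions."""
    p = list(perm)
    i, j = rng.sample(range(len(p)), 2)
    p[i], p[j] = p[j], p[i]
    return p

def two_opt(perm, rng):
    """2-opt mutation: reverse a randomly chosen contiguous chunk."""
    p = list(perm)
    i, j = sorted(rng.sample(range(len(p) + 1), 2))
    return p[:i] + p[i:j][::-1] + p[j:]

rng = random.Random(7)
tour = list("ABCDEFGH")
child = two_opt(tour, rng)
```

For TSP, reversing a chunk (2-opt) changes only the two tour edges at the chunk boundaries, which is why it tends to make smaller, more useful moves than an arbitrary rearrangement.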
Recombination
Most evolutionary algorithms use recombination in addition to mutation in order to obtain variation.
The role of recombination (or sexual reproduction in general) is disputed in Natural evolution and EC.
It seems to allow two or more positive traits that have been separately evolved to be combined. This could speed up evolution.
It can also act to repair the damage of a poor mutation (has a stabilizing effect).
Usually, recombination of two parents is used to produce 1 or 2 offspring individuals. But other models are possible.
Recombination: One Point Crossover
Early EAs used one-point crossover. Here is an example.
Parent 1 chromosome:  a b c d e f g h
Parent 2 chromosome:  A B C D E F G H
Crossover point:            ↑
Offspring 1:          a b c D E F G H
Offspring 2:          A B C d e f g h

A random point along the string (between two genes) is chosen. Genes to the left are copied from Parent 1. Those to the right are from Parent 2. (This is then reversed for the second offspring if two offspring are produced, as above.)
A problem with one-point is that certain combinations of genes in the two parents cannot be passed on to the offspring.
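One-point crossover is a pair of slice-and-join operations; the sketch below reproduces the example above with the crossover point fixed after the third gene.

```python
def one_point(p1, p2, point):
    """One-point crossover: genes left of the point come from one parent,
    the rest from the other; the two offspring are complementary."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

o1, o2 = one_point("abcdefgh", "ABCDEFGH", point=3)
# o1 == "abcDEFGH", o2 == "ABCdefgh", matching the slide's example
```

In an EA the point would be drawn at random per mating rather than fixed.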
Recombination: Uniform Crossover
Syswerda (1989) invented uniform crossover, which is commonly used today.

Parent 1 chromosome:  a b c d e f g h
Parent 2 chromosome:  A B C D E F G H
Crossover mask:       0 1 1 1 0 0 1 0
Offspring 1:          a B C D e f G h
Offspring 2:          A b c d E F g H

Uniform crossover uses a mask to dictate which parent to take each gene from. Any combination of genes from either parent can be reached.
Note that alleles common to both parents are preserved in the offspring. The remainder of the offspring inherits its genetic material from one parent or the other.
When we design recombination operators for other representations (e.g. permutations) we still try to keep these properties. It can be difficult to achieve, however.
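The mask-driven gene choice can be written as one zip over the two parents; with the mask from the example above it reproduces both offspring exactly.

```python
def uniform_crossover(p1, p2, mask):
    """Uniform crossover: mask bit 0 takes the gene from p1 for offspring 1
    (and from p2 for offspring 2); mask bit 1 does the opposite."""
    o1 = "".join(b if m else a for a, b, m in zip(p1, p2, mask))
    o2 = "".join(a if m else b for a, b, m in zip(p1, p2, mask))
    return o1, o2

o1, o2 = uniform_crossover("abcdefgh", "ABCDEFGH", [0, 1, 1, 1, 0, 0, 1, 0])
# o1 == "aBCDefGh", o2 == "AbcdEFgH", as in the slide's example
```

In practice the mask is drawn at random (each bit 0 or 1 with probability 1/2) for each mating, which is what makes every combination of parental genes reachable.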
Mating Selection Schemes: Fitness Proportionate Selection

The probability that an individual i with fitness fi is selected under one call to fitness proportionate selection is

    p(select i) = fi / Σj∈P fj

where P is the set of indices of the population. Here f is assumed to be positive, and we are maximizing fitness.
Fitness proportionate selection is also known commonly as roulette-wheel selection:
[Figure: a roulette wheel with slots of size 20, 10, 2 and 1]

The size of the slot on the roulette wheel is proportional to fitness. The wheel is spun to select an individual.
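Spinning the wheel amounts to drawing a uniform number in [0, total fitness) and walking the cumulative slot sizes; this is a straightforward sketch of that idea.

```python
import random

def roulette_select(population, fitnesses, rng=random):
    """Fitness-proportionate ('roulette wheel') selection:
    p(select i) = f_i / sum of fitnesses (all f assumed positive)."""
    total = sum(fitnesses)
    spin = rng.uniform(0, total)
    running = 0.0
    for individual, f in zip(population, fitnesses):
        running += f
        if spin <= running:
            return individual
    return population[-1]        # guard against floating-point edge cases

picked = roulette_select(["a", "b", "c", "d"], [20, 10, 2, 1],
                         rng=random.Random(0))
```

With the fitnesses {20, 10, 2, 1} shown on the wheel, the first individual is chosen about 61% of the time and the last about 3%.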
Mating Selection Schemes: Fitness Proportionate Selection

Clearly, fitness proportionate selection is sensitive to the scaling of fitness values. If we have fitnesses {1, 2, 10, 20} in a population, and we change our fitness function by adding a constant = 100 to it, then the selection probabilities change radically.

[Figure: two roulette wheels, one with slots 20, 10, 2, 1 and one with slots 120, 110, 102, 101]

Selective opportunity has been “flattened”.

This can be a good or a bad thing for this selection method. But generally, it means you might need to be more careful with designing the fitness function.

Problems with fitness proportionate selection:
Small differences in fitness (a competitive advantage) may not be rewarded much. ⇒ May prevent evolution from getting off the ground, or retard progress later on.
Large differences in fitness may be rewarded too much. ⇒ Take-over of the population occurs, where all diversity is lost suddenly.
Fitness-proportionate selection was popular in the early days of genetic algorithms research, following Holland’s book in 1975. Much less so now.
Mating Selection Schemes: Rank-Based Selection
In Rank-Based Selection, we use the same method of selecting using a roulette wheel, but raw fitness is replaced by the solution’s rank.

    p(select i) = Ri / Σj∈P Rj

where Ri is the rank of individual i in the population P, with the worst individual having a rank of 1 and the best a rank of |P|.

Drawbacks:
Although rank-based selection overcomes some difficulties with fitness-proportionate selection, it adds computational complexity because we need to sort the individuals. Also: what about tied fitness values? What happens?
Mating Selection Schemes: Rank-Based Selection
We may not like the balance of selection pressure given by the ranks. To control the pressure, the ranks can be raised to a power in the calculation of probabilities.

[Figure: selection probabilities under rank-based selection, for plain ranks and for ranks raised to the power 1/2]

    p(select i) = Ri^B / Σj∈P Rj^B

where B is the bias. A bias of zero would give equal probability to all population members.
Mating Selection Schemes: Tournament Selection
Another selection scheme that is invariant to rescaling of fitness values (i.e. depends only on the relative ranks of the individuals in the population) is Tournament Selection.

Tournament Selection( population P ) {
    best = randomly select from( P )
    for ( j = 2 to T_size )
        contestant = randomly select from( P )
        if ( f(contestant) > f(best) )
            best = contestant
    return best
}

The random selections are done with uniform probability and with replacement.
The tournament size T_size controls the bias or selection pressure. A tournament size of 1 gives no bias to fitter individuals.
Advantages: Tournament selection gives similar invariance to fitness re-scaling as rank-based selection but is simpler to implement and computationally more efficient.
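A runnable version of the pseudocode above (function and parameter names are illustrative):

```python
import random

def tournament_select(population, f, t_size=3, rng=random):
    """Tournament selection: sample t_size individuals uniformly with
    replacement and return the fittest of them."""
    best = rng.choice(population)
    for _ in range(t_size - 1):
        contestant = rng.choice(population)
        if f(contestant) > f(best):
            best = contestant
    return best

winner = tournament_select(list(range(10)), f=lambda x: x,
                           t_size=3, rng=random.Random(2))
```

Note there is no sorting and no fitness sum: each selection costs only t_size fitness comparisons, which is the efficiency advantage over rank-based selection.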
Tournament Selection: Expected Numbers of Offspring
T_size values of 2 or 3 are generally very good in practice. The value 2 is known as binary tournament selection.

Q. In a generational EA using binary tournament selection, with |P| selections for mating being made, what is the expected number of copies made of the fittest individual?

A. The fittest individual will win any tournament it appears in. We would expect it to appear in two tournaments (almost certainly two different tournaments) when |P| selections are made, since its independent probability of appearing in any one tournament is 2/|P| (two being the number of individuals contesting the tournament). So, overall, two copies of the fittest individual will be made in expectation.
(The above assumed that there is only one fittest individual.)
What is the expected number for the least fit? And the second-to-least fit?
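The expected-copies argument can be checked by simulation. This sketch (my own, with illustrative parameters) counts how often the fittest individual wins one of the |P| tournaments per generation:

```python
import random

def mean_copies_of_fittest(pop_size, t_size=2, trials=5000, seed=0):
    """Estimate E[# selections of the fittest] over |P| tournaments."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        for _ in range(pop_size):  # |P| mating selections per generation
            # Fitness equals index, so individual pop_size-1 wins any
            # tournament it appears in.
            if max(rng.randrange(pop_size) for _ in range(t_size)) == pop_size - 1:
                total += 1
    return total / trials
```

With |P| = 100 and binary tournaments, the estimate comes out close to the exact expectation |P|(1 − (1 − 1/|P|)²) ≈ 1.99, i.e. about 2.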
Stopping Criteria for Evolutionary Algorithms
Standard stopping criteria are
• A fixed number of generations
• Monitor the fitness and stop when it plateaus
• Monitor the genetic diversity and stop when it falls below some threshold
The first is common, but not really advisable for best results. It is a guess, and is likely to lead to stopping too early (not yet converged) or too late (wasting resources and doing nothing useful).

Monitoring fitness is OK, but the fitness may plateau for some time before progress again occurs. This is often the case for complex problems.

Monitoring diversity may be better. An EA usually loses most of its power to progress once diversity is lost. How should we measure this?

Combining diversity and fitness criteria is also possible.
Tuning Parameters in an Evolutionary Algorithm
EAs have many parameters. What are good values for them (generally), and how should we go about setting them for a specific problem? This is a recurring and difficult question.
Here are some broad practical guidelines:

    parameter          first choice   3 alternatives
    Population size    100            1‡, 20, 200
    Mutation rate∗     1/L            0.5/L, 2/L, 4/L
    Crossover rate     0.7            0, 0.2, 0.95
    Tournament size    3              2, 5, 10
∗Note: Per-gene mutation rates in inverse proportion to the length L of the genotype seem to work well.
‡Note: Population size=1 gives a kind of hillclimber
Also: the crossover rate is usually the least important to tune.
Comparing Performance of Stochastic Algorithms
First level: descriptive statistics. It is a good idea to plot the fitness evolution (or objective function cost) over the generations.

These are means over several runs. The error bars indicate the standard error of the mean.

(Clearly, the problem shown requires minimization of the objective function.)
Comparing Performance of Stochastic Algorithms
Second level: inferential statistics or hypothesis testing.
• Collect data from several runs for a pair of algorithms.
• Test the hypothesis that one algorithm is better than the other. Using a t test is OK, but the distributions may not be normal.

• The alternative is to use a non-parametric test. One type is the Mann-Whitney U test.

• A p-value is defined as the probability of seeing differences as extreme as those observed, given that the samples are from the same population. It does not prove that differences are real, or give the probability that they are real.
Testing multiple algorithms is more difficult. (You need to correct the statistical significance values.)
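As a sketch, the Mann-Whitney U statistic itself is easy to compute directly from its definition; the p-value would then come from tables, a normal approximation, or a statistics library:

```python
def mann_whitney_u(a, b):
    """Mann-Whitney U statistic for sample a against sample b:
    the number of pairs (x, y) with x from a and y from b where
    x > y, with ties counting one half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u
```

A handy sanity check: U_a + U_b always equals n_a · n_b, the total number of pairs.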
Constraint-Handling Schemes
Many problems are constrained. How can we handle constraints in EAs?
3 alternatives are:
1. Give a fitness penalty to solutions that violate constraints. Make this penalty large if the constraint is hard. Make the penalty larger the more the constraints are violated.

2. Do not let individuals that violate constraints survive (the death penalty approach).
3. Repair individuals that violate constraints before breeding from them.
The penalty method is generally the best and most often used.
Repairing can be very effective, but it can also be difficult in some circumstances to know how to repair an infeasible solution (this may require a long search in itself).
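As a concrete sketch of the penalty method (the 0/1 knapsack setting and the penalty weight are my own illustration, not from the notes), fitness is the solution's value minus a penalty that grows with the amount of constraint violation:

```python
def knapsack_fitness(selected, values, weights, capacity, penalty_weight=10.0):
    """Penalised fitness for a 0/1 knapsack: total value minus a penalty
    proportional to how far the load exceeds the capacity constraint."""
    value = sum(v for v, s in zip(values, selected) if s)
    load = sum(w for w, s in zip(weights, selected) if s)
    overload = max(0, load - capacity)   # 0 when the constraint holds
    return value - penalty_weight * overload
```

Raising penalty_weight makes the capacity constraint effectively hard, as the slide suggests.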
Maintaining Diversity
There are numerous ways EAs can be improved beyond the basic model. One of the most effective improvements is to actively try to maintain diversity for longer. This is generally called 'niching', as it encourages different niches to be populated.
Three ways this can be achieved are:
1. structured populations with (partial) isolation
2. preventing stagnation by restarting evolution or injecting random solutions
3. fitness sharing
Maintaining Diversity: Structured Populations
This figure shows the fitness of individuals in a diffusion model GA:
Mating occurs locally on this grid. Offspring are also placed close to their parents, e.g. by doing a short random walk from one parent and replacing the least fit individual on the walk.

Island Model GAs are another similar type: populations evolve separately on islands, with only very limited migration between islands.
Note: these EAs are also more highly parallel than standard EAs, so they are a good choice for distributed architectures
Maintaining Diversity: Restart Methods
This is an EA called the micro-GA:
Typically the population is smaller than in a standard EA, but when convergence is detected it is restarted (so it can run for a long time relative to its population size).
Maintaining Diversity: Fitness Sharing
The idea of fitness sharing is to reduce the effective fitness of individuals if they are too similar to others (if they occupy the same 'niche').
[Figure: a fitness landscape with a niche radius marked around a high-fitness peak; magenta points show the reduced, shared fitness]
The three individuals atop the high-fitness peak fall within a certain radius (called the niche radius) of each other. In a simple scheme, their fitness would be divided (shared) by 3 (thus reducing it, as indicated by the magenta points).
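The simple division-by-niche-count scheme described above might be sketched as follows (the distance function is supplied by the user and depends on the representation; names are my own):

```python
def shared_fitness(population, fitness, distance, niche_radius):
    """Divide each individual's fitness by its niche count: the number
    of individuals (including itself) within niche_radius of it."""
    shared = []
    for i, fi in enumerate(fitness):
        niche_count = sum(
            1 for j in range(len(population))
            if distance(population[i], population[j]) <= niche_radius
        )
        shared.append(fi / niche_count)
    return shared
```

Three crowded individuals on a peak each have their fitness divided by 3, so a lone individual in an emptier niche can compete with them.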
Summary for EAs
EAs are computational models of adaptation that follow many principles of the modern evolutionary synthesis: Mendel's genetics + Darwin's natural selection.

EAs have many applications, mostly in optimization. They are particularly useful when the thing to optimize is a 'black box' (i.e., complex or unknowable) function, or when many constraints make it difficult to develop or apply other methods.

EAs are stochastic and heuristic. No formal performance guarantees are usually possible. Nevertheless, much theory DOES exist. We can now give theoretically justified performance predictions for several classes of problem.

In practice, EAs usually need some careful tuning. For better performance, choose a good representation, work on the variation operators, and add diversity control.
Local Search Heuristics: Hillclimbing and Simulated Annealing
HillClimber
A hillclimber is an example of a local search heuristic.
It is like the simplest form of an EA:
Hillclimber
    population size                1
    crossover                      no
    mutation/neighbourhood move    yes
    initialization                 random
    replacement                    accept non-deteriorating moves
The big advantages of hillclimbing are its simplicity and generality. And with a good choice of neighbourhood move it can be quite effective.
The big drawback is that it gets stuck at local optima. There is no way to escape.
Possible solution: use a big neighbourhood. But big neighbourhoods are inefficient. (One advanced method is to vary the neighbourhood size ⇒ Variable Neighbourhood Search.)
Simple Hillclimber Pseudocode
First-Improvement Hillclimber()
    c = random solution
    Ec = evaluate(c)
    while not stop
        m = neighbour(c)
        Em = evaluate(m)
        if Em ≤ Ec {comment: minimization}
            c = m; Ec = Em
    endwhile
    Output c, Ec
Just start somewhere, and walk uphill (or downhill for minimization) until you can walk no more!
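A runnable version of the first-improvement hillclimber, with an illustrative bit-flip neighbourhood (the example problem is my own choice, not from the notes):

```python
import random

def hillclimb(initial, neighbour, evaluate, max_iters=2000, seed=0):
    """First-improvement hillclimber for minimisation: accept any
    neighbour whose cost is no worse than the current solution's."""
    rng = random.Random(seed)
    c, ec = initial, evaluate(initial)
    for _ in range(max_iters):
        m = neighbour(c, rng)
        em = evaluate(m)
        if em <= ec:          # accept equal-cost moves too
            c, ec = m, em
    return c, ec

# Illustrative problem: minimise the number of 1-bits in a bitstring;
# the neighbourhood move flips one randomly chosen bit.
def flip_one_bit(bits, rng):
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]
```

Here every local optimum is also the global optimum (the all-zeros string), so plain hillclimbing succeeds; on harder landscapes it gets stuck, as the slide notes.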
Hillclimber: Neighbourhoods and How to Use Them
The neighbourhood of a solution is the set of solutions that can be reached by applying a neighbourhood move (aka mutation) operator to it.
A local optimum is a solution that has no fitter neighbours.
Aside: What is the neighbourhood when applying the standard EA mutation: mutating each bit with an independent positive probability?
Hillclimbers can be first-improvement or best-improvement. The first-improvement one is more like an EA. As soon as an improving move is found, it becomes the current solution. In best-improvement, the whole neighbourhood is systematically checked first, and the best neighbour becomes the new current solution if it is not worse than the current solution.

Best-improvement has a definite stopping criterion. But it is more common to use first-improvement.

It is often found that the performance of the first-improvement hillclimber is better if a move is accepted when it is better or equal in fitness to the current solution.
Simulated Annealing: The Physics of Escaping Local Optima

Simulated annealing is a search method that derives from physics theory explaining how metals reach lower-energy (more ordered) states if they are heated up and cooled sufficiently slowly.
What has cooling a metal (annealing) got to do with optimization?
The following analogies/connections are made:

    Physics                    Optimization
    Physical state of metal    Solution
    Energy/entropy of metal    Fitness / cost
    Temperature                Acceptance probability for moving to worse solutions
A metal at a high temperature can move from a low energy state to a higher onefreely. If it is cooled, it can become “frozen” in a state. There is no energy to allow itto change.
Hillclimbing is like being frozen: no moves to worse solutions are ever accepted.
Simulated Annealing: The Physics of Escaping Local Optima
Boltzmann says that the probability that a move from a state c to a proposed state m is accepted is related to temperature as follows:

    p(accept move) = 1                        if Em ≤ Ec
                     exp( (Ec − Em)/kT )      otherwise        (1)

where k is a constant.
When a metal is cooled slowly it still makes some upward energy moves, but increasingly less frequently. So it does not get frozen in a glassy (weak) state. It ends in a low-energy state.

We can use this idea to let hillclimbing escape local optima!
Simulated Annealing: The Algorithm
SA was first proposed by Kirkpatrick et al. in 1983. It is based on earlier work by Metropolis et al. (1953).
Simulated Annealing()
    T = Tstart
    c = random solution
    Ec = evaluate(c)
    while not stop
        m = neighbour(c)
        Em = evaluate(m)
        if Em ≤ Ec or randomvar() < exp( (Ec − Em)/T )
            c = m; Ec = Em
        cool(T)
    endwhile
    Output c, Ec
This is just hillclimbing with three lines of code changed! We add a variable T and use it to accept some deteriorating moves. We must also have some way to reduce T (cool it).
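A Python sketch of the algorithm, with geometric cooling standing in for cool(T); the parameter values and the toy objective used below are illustrative, not prescribed by the notes:

```python
import math
import random

def simulated_annealing(initial, neighbour, evaluate, t_start, g=0.995,
                        max_iters=10000, seed=0):
    """Hillclimbing plus Boltzmann acceptance of worsening moves,
    with geometric cooling T <- g*T each iteration (minimisation)."""
    rng = random.Random(seed)
    c, ec = initial, evaluate(initial)
    t = t_start
    for _ in range(max_iters):
        m = neighbour(c, rng)
        em = evaluate(m)
        # Accept improvements always; worse moves with prob exp((Ec-Em)/T).
        if em <= ec or rng.random() < math.exp((ec - em) / t):
            c, ec = m, em
        t *= g   # geometric cooling
    return c, ec
```

For example, minimising x² over the integers with a ±1 neighbourhood from a starting point of 30 settles at 0 once the temperature becomes small.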
The Cooling Schedule
A cooling schedule is a specification of the initial temperature Tstart and how it will be reduced over the course of the algorithm.
Much work has gone into the theory and practice of good cooling schedules forsimulated annealing.
A good basic technique for setting a cooling schedule is:

• Set Tstart so that roughly half of worsening moves are accepted. If most worsening moves have a cost difference of W, then Tstart = −W/log(0.5).

• Every iteration, reduce T by doing T = gT, where g is a parameter with value less than 1. This is called geometric cooling.

• A generally good value for g is one where the final temperature Tend reached after MAXITERATIONS is one where even the smallest possible increase in cost is very unlikely to be accepted.

Let us say that M = MAXITERATIONS. Then g will be given by

    g = exp( (1/M) · log(Tend/Tstart) ) = (Tend/Tstart)^(1/M)
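These two rules can be written down directly (a small sketch; the function and variable names are mine):

```python
import math

def start_temperature(typical_worsening, accept_prob=0.5):
    """Tstart such that a typical worsening move of cost W is accepted
    with probability accept_prob: exp(-W/T) = p  =>  T = -W / log(p)."""
    return -typical_worsening / math.log(accept_prob)

def geometric_cooling_rate(t_start, t_end, max_iters):
    """g such that t_start * g**max_iters == t_end, i.e.
    g = exp(log(t_end / t_start) / M) = (t_end / t_start)**(1/M)."""
    return math.exp(math.log(t_end / t_start) / max_iters)
```

For instance, cooling from Tstart = 10 down to Tend = 0.01 over 1000 iterations gives g ≈ 0.993.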
More information is here: http://www.btluke.com/featur05.html
Neighbourhood Move Assumptions/Rules
It can be shown that simulated annealing converges to an optimal solution, given certain assumptions about the move operator (neighbourhood) and the cooling schedule.

The cooling schedule may have to be exponentially slow: otherwise simulated annealing could be used to solve NP-hard problems in polynomial time. But in practice, reasonably fast cooling still gives good results.

The simulated annealing process must be ergodic. This means it must be possible to reach every state from every other one. So the move operator must have this property. It must not be possible to move to a solution from which other solutions become unreachable.
If the move operator is symmetric (any move is reversible) then it is usually the casethat the whole search space remains reachable.
Summary
Hillclimbing and local search are simple but very effective methods for many problems.

Simulated annealing (SA) extends hillclimbing to be able to jump out of local optima.

The cooling schedule and the neighbourhood (mutation or move) operator are important to get right, but there are some good guiding principles.

SA is often competitive with evolutionary algorithms and other advanced stochastic search methods.
Other search methods based on local search also exist: tabu search,variable neighbourhood search, reactive search.