Genetic Algorithms Beasley, Bull and Martin, An Overview of Genetic Algorithms: Part 1, Fundamentals & Part 2, Research Topics University Computing, 1993

Genetic Algorithms

Beasley, Bull and Martin,An Overview of Genetic Algorithms:

Part 1, Fundamentals &Part 2, Research Topics

University Computing, 1993

Background of GAs

GAs are based on genetic processes of biological organisms, i.e. evolution according to principles of natural selection and survival of the fittestIn nature, individuals in a population compete with each other for resources and to attract a mateThe fittest ones survive and produce offspring, spreading their genetic properties to populationCombination of good properties may in time produce “superfit” offspringGAs were first proposed by Holland (1975)

Basic notions of GAs

GAs work with a population of individuals, each representing a solution to the problem at handEach individual is assigned a fitness scoreHighly fit individuals are selected as parents and given an opportunity to reproduce by crossover, leading to exploration of the most promising regions in the search spaceOffspring produced share features taken from their parents and may be subject to mutation

Basic notions of GAs (cont.)

Best individuals are selected (from parents and/or offspring) to form the next generationOver many generations, good features spread throughout the population, being mixed and exchanged with other good featuresThis leads to convergence to a good solutionGAs are robust and used for a variety of problemsThe main area for GAs is difficult problems for which there are no specialized techniques

A generic GA

Generate an initial population of size population_sizeCompute fitness of each individualRepeat

Repeat for population_size/2Select two parents based on fitness valuesRecombine the parents to produce two offspring by

applying crossover (with rate/probability pc)

and mutation (with probability pm)Compute fitness values of offspring

Form the population for the next generationUntil population has converged

Decisions in a GA

Chromosome representation or coding of a solutionFitness functionPopulation size, generation of initial populationParent selection for reproduction

Crossover rate/probability (pc), crossover operator

Mutation probability (pm), mutation operatorForming the population for next generationStopping (convergence) condition

Coding or representation

A potential solution to a problem may be coded or represented by a set of variablesIn GAs, each of these variables (solution components) is called a geneA string of genes, representing a complete solution, is called a chromosomeThe set of variables represented by a chromosome is called genotype, solution constructed using these variables is called phenotypeThe ideal representation scheme is binary coding

Examples of binary coding

Optimization of function f(x,y,z)

(assuming )

Phenotype: x=46, y=24, z=13

Genotype: 0 1 1 1 0 1 | 0 0 0 1 1 0 | 1 0 1 1 0 0

6312,,0 5 zyx

Examples of binary coding (cont.)

Assignment problem

Phenotype: facility f1 is assigned to location l1, f2 to l4, f3 to l2

Genotype: location

0010

1000

0001

3

2

14321

facility

Examples of binary coding (cont.)

TSP

Phenotype: the tour 1-3-2-4-1

Genotype: to city

0001

0010

1000

0100

4

3

2

14321

city from

Fitness function

Fitness function returns a single numerical fitness for a chromosomeFitness value is used in probabilistic selection of parents for reproduction; usually the higher the fitness, the higher the probability of selectionFitness function may simply be the objective function where we optimize a single criterionIt may also be a more complicated measure involving multiple criteria and penalties for infeasibility

Fitness function (cont.)

Fitness function should be smooth and regular (similar chromosomes must have close fitness)It should reflect real value of a chromosomeLet the magnitude of penalty reflect the “amount” of constraint violation, e.g. how much will it cost to convert the chromosome into a valid one?Use approximate function evaluation when Evaluating true fitness is too costly Fitness function is stochastic

Population size and generation of initial population

Increasing the population size usually increases solution quality but requires more computationCommon population sizes are 20, 30, 50, 100In any case, population size is such a small fraction of the search space that increasing it further is not justifiedInitial population can be generated Randomly Completely or partly (called seeding) by heuristic(s)

Reproduction

Reproduction involves Parent selection Recombination by using crossover and mutation

operators

Crossover is the more important of the two for rapidly exploring the search space

Mutation provides a small amount of random search and ensures that every point in search space is accessible

Parent selection

Some individuals from the population are selected to form a mating pool (multiple copies of the same individual may be allowed)Size of the mating pool depends on the crossover rate/probability and the replacement schemeTwo extremes are For 100% replacement: Size of the mating pool is the

same as the population size, pc=1.0 For steady-state replacement: Two parents are selected to

reproduce two offspring, which replace two worst parents

Parent selection (cont.)

One idea is to allocate the number of reproductive trials (the number of times an individual is copied into the mating pool) or to assign selection probability to individuals in proportion to their fitness, e.g. for maximization

ff

i i individualfor trialsvereproducti of #

minmax

min individualfor y probabilit selectionff

ffi i


The number of reproductive trials may not be integer in which case we can use a stochastic sampling method, e.g. if reproductive trials for individuals i and j are 1.8 and 1.2 then each will have one copy in the mating pool and the third will be either i or j with remainder probabilities

Selection probabilities may not add up to 1.0 in which case we can normalize.


The above would have worked with infinite population size but with finite population it may cause a few highly fit individuals to dominate the population rapidly (premature convergence)

Conversely, the population may converge after many generations, but without precisely locating the optimum due to insufficient gradient in fitness function to push the GA towards the optimum (slow finishing)

Fitness remapping

Fitness remapping is used to avoid Premature convergence, by compressing the range of

fitness values Slow finishing, by expanding the range of fitness

values

Selection pressure (ratio of maximum to average reproductive trials allocated) can be adjusted by explicit or implicit fitness remapping

Fitness remapping (cont.)

1. Fitness scaling: Bring the maximum number of reproductive trials allocated to an individual to desired level by shifting the fitness values, i.e.

2. Fitness windowing: Same as above where s is the minimum fitness observed during the last n (typically 10) generations

ff

sfsf maxmax

Fitness remapping (cont.)

3. Fitness ranking: To avoid using extreme fitness values, sort individuals according to raw fitness, then assign reproductive trials according to rank (found to be superior to scaling)

4. Tournament selection: Select a pair of individuals from the population at random (with replacement), copy the better one into the mating pool, repeat until the pool is full

5. Probabilistic tournament selection: Better individual is selected with a probability > 0.5.

Crossover operator

Typically, crossover takes two parents, cuts their chromosome strings at a randomly chosen position, swaps the head (or tail) segments to produce two offspringCrossover is not usually applied to all pairs of parents selected for matingLikelyhood of crossover being applied to a pair, pc, is usually between 0.6 and 1.0If crossover is not applied, offspring are produced by duplicating their parents (no disruption)

Crossover operator (cont.)

Alternatively, taking pc as constant crossover rate, pc x population_size/2 pairs are selected, and crossover is applied to all selected pairs

Most common crossover operators for binary representation are: 1-point crossover 2-point crossover (chromosome is viewed as a loop

rather than a string) Uniform crossover

1- and 2-point crossover

1-point Parent 1: 1 0 1 0 | 0 0 1 1 1 0crossover Parent 2: 0 0 1 1 | 0 1 0 0 1 0

Offspring 1: 1 0 1 0 | 0 1 0 0 1 0Offspring 2: 0 0 1 1 | 0 0 1 1 1 0

2-point Parent 1: 1 0 1 0 | 0 0 1 | 1 1 0crossover Parent 2: 0 0 1 1 | 0 1 0 | 0 1 0

Offspring 1: 1 0 1 0 | 0 1 0 | 1 1 0Offspring 2: 0 0 1 1 | 0 0 1 | 0 1 0

Uniform crossover

A binary crossover mask is used to determine which gene will be taken from which parent

Parent 1: 1 0 1 0 0 0 1 1 1 0Parent 2: 0 0 1 1 0 1 0 0 1 0

Crossover mask: 1 0 0 1 0 1 1 1 0 0

Offspring 1: 1 0 1 0 0 0 1 1 1 0Offspring 2: 0 0 1 1 0 1 0 0 1 0

Mutation operator

Mutation is applied to every offspring by altering each binary gene with a small probability, pm (typically 0.001)Offspring: 1 0 1 0 0 1 0 0 1 0Mutated offspring: 1 0 1 0 0 0 0 0 1 0Alternatively, the entire chromosome may be mutated at once by a higher pm, particularly when a non-binary representation and problem specific genetic operators are used

Forming population for next generation (replacement)

After two offspring are produced and mutated, they may replace their parents Unconditionally (a generation gap of 100%) If they are more fit than their parents

Alternatively, all parents and offspring may be sorted together according to their fitness, and the best population_size of them may be selected

Steady-state replacement replaces only a few parents (two worst parents by two best offspring)

Convergence

A gene is said to converge when 95% of the population share the same valueThe population is said to converge when all genes have convergedTo monitor convergence, plot population average, population best and incumbent solution throughout the generationsAs the population converges, average fitness approaches the best

Convergence (cont.)

Why GAs work: Schemata

A schema is a pattern of gene values represented by a string of characters in the alphabet {0, 1, #} where # matches anythingFor example, the chromosome 1010 contains, among others, the schemata 10##, #0#0, ##1#, 101#Order of a schema is the number of non-# symbols it contains (2, 2, 1, 3 in the example)Defining length of a schema is the distance between the outermost non-# symbols (2, 3, 1, 3)

Schema theorem

It is assumed that an individual’s high fitness is due to the good schemata it containsHolland (1975) showed that, under simplifying assumptions, the optimum way to explore the search space is to allocate reproductive trials to individuals in proportion to their fitness valuesIn this way, good schemata receive an exponentially increasing number of reproductive trials in successive generations (this is called the schema theorem)

Schema theorem (cont.)

Holland also showed that, since each individual contains many different schemata, the number of schemata effectively processed in each generation is in the order of population_size3

This property is known as implicit parallelism, and is one of the explanations for the good performance of GAs

Binary coding is thought to be ideal because these theoretical results are valid for binary coding

Building block hypothesis

Goldberg (1989) claims that power of the GA lies in it being able to find good building blocks

Building blocks are schemata of short defining length consisting of genes that work well together, and improve performance when incorporated into an individual

Short defining length is needed so that building blocks are disrupted less by random cut points in 1- or 2-point crossover

Building block hypothesis (cont.)

Hence a successful coding scheme encourages formation of building blocks by ensuring that Related genes are close together on the chromosome There is little interaction between genes

Interaction (epistasis) means that contribution of a gene to the fitness depends on values of other genes in the chromosomeThere is always some interaction between genes in multimodal fitness functions, and the above conditions are not easy to satisfy

Exploration and exploitation

Any good search algorithm must find a tradeoff between exploration and exploitation, e.g. random search does only exploration whereas traditional descent does only exploitation

Holland showed that a GA does both simultaneously in an optimal way, assuming that Population size is infinite Fitness function accurately reflects the solution’s utility Genes in a chromosome do not interact significantly

Exploration and exploitation (cont.)

The first assumption can never be satisfied in practice; GA’s “population” is only a sample and stochastic error is unavoidableGenetic drift: Even in the absence of any selection pressure (i.e. a constant fitness function), the GA will still converge if, by chance, a chromosome becomes predominant in the populationFor the GA to properly exploit, the fitness function must provide a sufficiently large slope to counteract the genetic driftMutation can be useful in avoiding genetic drift

Comparison of GAs with others

Random search does only exploration, traditional ascent (hillclimbing) does only exploitation, GA does both.Iterated hillclimbing with random restarting points allocates its trials evenly over the search space, GA allocates increasing trials to promising regionsSA and TS deal with one candidate solution at a time, GA has a population and implicit parallelismTS is usually deterministic, GA is stochasticSA does not have memory, TS does, GA?

Part 2: Crossover revisited

2-point crossover is better than 1-point because a chromosome, when consired as a loop, can contain more building blocks

Schemata of a particular order are equally likely to be disrupted by uniform crossover, irrespective of their defining length

Schemata with long defining length are more likely to be disrupted by 2-point crossover, irrespective of their order

Crossover revisited (cont.)

Schemata with short defining length are more likely to be disrupted by uniform crossover, but the same is not true for longer defining length (?)Hence, total amount of schemata disruption may be lower with uniform crossoverOrdering of genes in the chromosome is not important with uniform crossover, hence it is more robust than 2-point crossoverTheoretical and empirical results show that there is no overall winner

Mutation revisited

Mutation is traditionally regarded less important than crossover and used to provide a small amount of random search and to avoid genetic driftHowever, asexual reproduction can also result in successful evolutionNaive evolution (just selection and mutation) results in slower evolution than crossover alone, but it may find better solutions at the endIndeed, as the population converges, mutation becomes more productive than crossover

Inversion and reordering

Order of genes on a chromosome is critical for the building block hypothesis to work effectivelyPurpose of inversion/reordering is to find gene orderings that have better evolutionary potentialInversion is a special form of reordering which reverses the order of genes between two randomly selected positionsReordering does not lower epistasis; nor does it help when linear ordering of genes is not possibleReordering also expands the search space

More on epistasis (interaction)

In biology, a gene is epistatic when its presence suppresses the effect of a gene at another positionEven when individual genes are not epistatic, there will be “chains of influence” (one gene’s product affects another gene’s function)Hence, interaction among genes is unavoidableInteraction is inherent in some problems, e.g. In TSP, it is the relationship (distance) between cities

that counts, not the cities themselves Two facilities cannot be assigned to the same location

Deception

Normally, short, low-order schemata contained in global optimal solution are expected to increase in frequency throughout the evolutionIf schemata not contained in optimal solution have higher fitness, then they will increase in frequency faster, and the GA will be misledDeception is a special case of epistasisDeceptive problems may be difficult to solve, but the bias introduced in average fitness estimation after the first generation may help solve them

Tackling epistasis

Epistasis can be tackled in two ways As a coding problem As a GA theory problem

If taken as a coding problem, the solution is to find a different coding scheme and to develop appropriate genetic operators, e.g. Goldsberg’s order-schemata and PMX crossover Expansive coding which uses a larger number of

weakly interacting genes (larger search space) instead of a small number of strongly interacting genes

Tackling epistasis (cont.)

We will see such examples for ordering problems when we discuss genetic operators for TSP

When treated as a GA theory problem, a new theory (and new algorithms) may have to be developed, which takes epistasis into account

Although Holland’s convergence proof assumes low epistasis, there may be a weaker proof for domains of high epistasis

Non-binary representations

Binary representation, where each gene has a cardinality of two, is traditionally believed to give the largest number of schemata and to provide highest degree of implicit parallelismRecently, higher-cardinality representations are claimed to contain more schemata; they can perform wellInteger or real numbers can be used as high-cardinality alphabets, and meaningful problem specific genetic operators can be defined easily

Non-binary representations (cont.)

Examples of non-binary crossover operators Take arithmetic average of the two gene values Take geometric mean (square root of the product) Take the difference between the two gene values, add it

to the higher or subtract it from the lower

Examples of non-binary mutation operators Replace the current value with a random one Add or subtract a small random amount (creep) Multiply by a random amount close to one (geometric

creep)

Dynamic operator probabilities

Crossover probability, pc, and mutation probability, pm, may vary during the evolution

For better exploration, pc may decrease and pm may increase during the run according to a fixed schedule, e.g. linearly

For convergence, pm may decrease exponentially (similar to the temperature in SA)

pc and pm can be adjusted dynamically, depending on the spread of fitness values, e.g. increase pm as the spread decreases (as the population converges)

Dynamic operator probabilities (cont.)

Probability of the more successful operator can be increased, e.g. Monitor the fitness improvement due to crossover and

mutation operators over the last n reproductive trials Give more weight to the more successful operator For each reproductive trial, choose one of the operators

probabilistically according to its weight

Different crossover and mutation operators may also be weighted in a similar manner

Niche and speciation

In nature, different species evolve to fill different ecological nichesSpeciation is the process by which a single species differentiates into two or more different speciesNiches are analogous to alternative maxima of fitness values in GAsNormally, a GA cannot find these alternatives because of genetic drift and convergence (a GA does not allow speciation and the entire population end up in the same niche)

Niche and speciation (cont.)

To solve this problem, we should Maintain diversity by encouraging speciation Share the payoff associated with a niche

Preselection: Offspring replace the parents only if the offspring’s fitness is higher than that of the inferior parent (this maintains population diversity since similar individuals replace each other)Crowding: Offspring is compared with a few randomly selected individuals and replaces the most similar one (again for diversity)


Restricted mating: Individuals are allowed to mate only if they are similar (this encourages speciation); offspring of two highly fit but dissimilar parents may be unfitMultiple subpopulations: Population is divided into subpopulations, each evolving in itself, and migration is allowed at a limited rate (again for speciation)Local mating: Similar to multiple subpopulations, but without explicit boundaries


Sharing: Similar individuals that are in the same niche share the fitness payoff among them A full niche is no longer rewarding since the payoff is

shared and individual fitness values are reduced Sharing distributes individuals to peaks in fitness

function in proportion to the height of the peak Sharing is found to be superior to crowding

Sequential niches: Multiple GA runs are made, each locating a new peak (previously found peaks are cancelled out from the fitness function)

Diploidy and dominance

Diploidy: Higher lifeforms like mamals have two sets of genes; of a pair of genes, one is dominant and the other is recessive Diploidy allows two solutions to be remembered instead of one, and provides higher diversityPotentially useful gene sets can be maintained in recessive positionAn extension can be to keep the best individuals (elite solutions?) and try reintroducing them to the population if the performance falls

Problem specific knowledge

Binary coding, random initial population, and traditional crossover/mutation operators follow the biological process more closely (are generic) but do not make use of problem specific knowledgeUsing problem specific knowledge, we can Find more suitable representation schemes Generate initial population using heuristics Develop problem specific genetic operators that

guarantee feasibility, particularly in ordering problems Use local improvement as a form of mutation

Documents

Genetic Algorithms Beasley, Bull and Martin, An Overview of Genetic Algorithms: Part 1, Fundamentals & Part 2, Research Topics University Computing, 1993