Genetic Algorithms
Beasley, Bull and Martin, An Overview of Genetic Algorithms:
Part 1, Fundamentals & Part 2, Research Topics
University Computing, 1993
Background of GAs
GAs are based on genetic processes of biological organisms, i.e. evolution according to principles of natural selection and survival of the fittest.
In nature, individuals in a population compete with each other for resources and to attract a mate.
The fittest ones survive and produce offspring, spreading their genetic properties to the population.
Combination of good properties may in time produce “superfit” offspring.
GAs were first proposed by Holland (1975).
Basic notions of GAs
GAs work with a population of individuals, each representing a solution to the problem at hand.
Each individual is assigned a fitness score.
Highly fit individuals are selected as parents and given an opportunity to reproduce by crossover, leading to exploration of the most promising regions in the search space.
Offspring produced share features taken from their parents and may be subject to mutation.
Basic notions of GAs (cont.)
Best individuals are selected (from parents and/or offspring) to form the next generation.
Over many generations, good features spread throughout the population, being mixed and exchanged with other good features.
This leads to convergence to a good solution.
GAs are robust and used for a variety of problems.
The main area for GAs is difficult problems for which there are no specialized techniques.
A generic GA
Generate an initial population of size population_size
Compute fitness of each individual
Repeat
  Repeat for population_size/2
    Select two parents based on fitness values
    Recombine the parents to produce two offspring by applying crossover (with rate/probability pc) and mutation (with probability pm)
    Compute fitness values of offspring
  Form the population for the next generation
Until population has converged
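The generic loop can be sketched in Python as follows. This is only an illustrative sketch under several assumptions that the slides leave open: chromosomes are bit lists, parents are drawn with fitness-proportional weights, 1-point crossover and bit-flip mutation are used, offspring replace the whole population, and "converged" is taken to mean that all individuals are identical. The name `run_ga` and its default parameter values are hypothetical.

```python
import random

def run_ga(fitness, n_genes, population_size=20, pc=0.8, pm=0.01,
           max_generations=100, seed=0):
    """Generational GA on bit lists; returns the best individual seen.

    Assumes an even population_size and n_genes >= 2.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)]
           for _ in range(population_size)]
    best = max(pop, key=fitness)
    for _ in range(max_generations):
        # Crude convergence test (assumption): every individual is identical.
        if all(ind == pop[0] for ind in pop):
            break
        scores = [fitness(ind) for ind in pop]
        low = min(scores)
        # Shift scores so that all selection weights are strictly positive.
        weights = [s - low + 1e-9 for s in scores]
        next_pop = []
        for _ in range(population_size // 2):
            p1, p2 = rng.choices(pop, weights=weights, k=2)
            if rng.random() < pc:
                cut = rng.randint(1, n_genes - 1)  # 1-point crossover
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]  # no crossover: duplicate the parents
            for child in (c1, c2):
                for g in range(n_genes):
                    if rng.random() < pm:
                        child[g] ^= 1  # bit-flip mutation
                next_pop.append(child)
        pop = next_pop  # 100% replacement
        best = max(pop + [best], key=fitness)
    return best
```

For example, `run_ga(sum, n_genes=10)` searches for the 10-bit string with the most ones.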
Decisions in a GA
Chromosome representation or coding of a solution
Fitness function
Population size, generation of initial population
Parent selection for reproduction
Crossover rate/probability (pc), crossover operator
Mutation probability (pm), mutation operator
Forming the population for next generation
Stopping (convergence) condition
Coding or representation
A potential solution to a problem may be coded or represented by a set of variables.
In GAs, each of these variables (solution components) is called a gene.
A string of genes, representing a complete solution, is called a chromosome.
The set of variables represented by a chromosome is called the genotype; the solution constructed using these variables is called the phenotype.
Binary coding is traditionally regarded as the ideal representation scheme.
Examples of binary coding
Optimization of function f(x,y,z) (assuming $0 \le x, y, z \le 63$, so each variable takes 6 bits)
Phenotype: x=46, y=24, z=13
Genotype: 0 1 1 1 0 1 | 0 0 0 1 1 0 | 1 0 1 1 0 0 (each 6-bit group is written least significant bit first)
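The mapping between the phenotype and genotype in this example can be checked with a short sketch. The helper names `encode` and `decode` are hypothetical; the least-significant-bit-first ordering is inferred from the example values, not stated on the slide.

```python
def encode(value, n_bits=6):
    """Return the bits of value, least significant bit first."""
    return [(value >> k) & 1 for k in range(n_bits)]

def decode(bits):
    """Inverse of encode: rebuild the integer from LSB-first bits."""
    return sum(bit << k for k, bit in enumerate(bits))

# Concatenate the three 6-bit genes for x=46, y=24, z=13.
genotype = encode(46) + encode(24) + encode(13)
phenotype = [decode(genotype[i:i + 6]) for i in range(0, 18, 6)]
```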
Examples of binary coding (cont.)
Assignment problem
Phenotype: facility f1 is assigned to location l1, f2 to l4, f3 to l2
Genotype (rows are facilities, columns are locations):
location:    1 2 3 4
facility 1:  1 0 0 0
facility 2:  0 0 0 1
facility 3:  0 1 0 0
Examples of binary coding (cont.)
TSP
Phenotype: the tour 1-3-2-4-1
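Genotype (rows are "from" cities, columns are "to" cities):
to city:      1 2 3 4
from city 1:  0 0 1 0
from city 2:  0 0 0 1
from city 3:  0 1 0 0
from city 4:  1 0 0 0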
Fitness function
Fitness function returns a single numerical fitness for a chromosome.
Fitness value is used in probabilistic selection of parents for reproduction; usually the higher the fitness, the higher the probability of selection.
Fitness function may simply be the objective function where we optimize a single criterion.
It may also be a more complicated measure involving multiple criteria and penalties for infeasibility.
Fitness function (cont.)
Fitness function should be smooth and regular (similar chromosomes must have close fitness).
It should reflect real value of a chromosome.
Let the magnitude of penalty reflect the “amount” of constraint violation, e.g. how much will it cost to convert the chromosome into a valid one?
Use approximate function evaluation when
  Evaluating true fitness is too costly
  Fitness function is stochastic
Population size and generation of initial population
Increasing the population size usually increases solution quality but requires more computation.
Common population sizes are 20, 30, 50, 100.
In any case, population size is such a small fraction of the search space that increasing it further is not justified.
Initial population can be generated
  Randomly
  Completely or partly (called seeding) by heuristic(s)
Reproduction
Reproduction involves
  Parent selection
  Recombination by using crossover and mutation operators
Crossover is the more important of the two for rapidly exploring the search space
Mutation provides a small amount of random search and ensures that every point in search space is accessible
Parent selection
Some individuals from the population are selected to form a mating pool (multiple copies of the same individual may be allowed).
Size of the mating pool depends on the crossover rate/probability and the replacement scheme.
Two extremes are
  For 100% replacement: Size of the mating pool is the same as the population size, pc=1.0
  For steady-state replacement: Two parents are selected to reproduce two offspring, which replace two worst parents
Parent selection (cont.)
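One idea is to allocate the number of reproductive trials (the number of times an individual is copied into the mating pool) or to assign selection probability to individuals in proportion to their fitness, e.g. for maximization
$$\text{\# of reproductive trials for individual } i = \frac{f_i}{\bar{f}}$$
$$\text{selection probability for individual } i = \frac{f_i - f_{\min}}{f_{\max} - f_{\min}}$$
where $\bar{f}$, $f_{\min}$ and $f_{\max}$ are the average, minimum and maximum fitness in the population.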
Parent selection (cont.)
The number of reproductive trials may not be an integer, in which case we can use a stochastic sampling method, e.g. if reproductive trials for individuals i and j are 1.8 and 1.2 then each will have one copy in the mating pool and the third will be either i or j with remainder probabilities (0.8 for i and 0.2 for j)
Selection probabilities may not add up to 1.0 in which case we can normalize.
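The allocation of fractional reproductive trials can be sketched as below. This is one possible reading of the stochastic sampling step, assuming that each individual first receives the integer part of its trials and that the remaining pool slots are drawn with probabilities proportional to the fractional remainders. The function names are hypothetical.

```python
import math
import random

def reproductive_trials(fitnesses):
    """Trials for each individual: fitness divided by the population average."""
    mean = sum(fitnesses) / len(fitnesses)
    return [f / mean for f in fitnesses]

def stochastic_remainder_pool(trials, pool_size, rng):
    """Build a mating pool of individual indices from fractional trials."""
    pool = []
    for i, t in enumerate(trials):
        pool.extend([i] * math.floor(t))  # guaranteed copies
    remainders = [t - math.floor(t) for t in trials]
    # Fall back to uniform sampling if no fractional parts remain.
    weights = remainders if sum(remainders) > 0 else None
    while len(pool) < pool_size:
        pool.append(rng.choices(range(len(trials)), weights=weights)[0])
    return pool

# Fitness values 9, 6, 0 give trials of 1.8, 1.2 and 0.0.
trials = reproductive_trials([9.0, 6.0, 0.0])
pool = stochastic_remainder_pool(trials, 3, random.Random(1))
```

With these inputs the pool always holds one copy each of individuals 0 and 1; the third slot goes to individual 0 with probability 0.8 and to individual 1 with probability 0.2.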
Parent selection (cont.)
The above would have worked with infinite population size but with finite population it may cause a few highly fit individuals to dominate the population rapidly (premature convergence)
Conversely, the population may converge after many generations, but without precisely locating the optimum due to insufficient gradient in fitness function to push the GA towards the optimum (slow finishing)
Fitness remapping
Fitness remapping is used to avoid
  Premature convergence, by compressing the range of fitness values
  Slow finishing, by expanding the range of fitness values
Selection pressure (ratio of maximum to average reproductive trials allocated) can be adjusted by explicit or implicit fitness remapping
Fitness remapping (cont.)
1. Fitness scaling: Bring the maximum number of reproductive trials allocated to an individual to the desired level by shifting the fitness values by s, i.e.
$$\frac{f'_{\max}}{\bar{f}'} = \frac{f_{\max} - s}{\bar{f} - s}$$
2. Fitness windowing: Same as above where s is the minimum fitness observed during the last n (typically 10) generations
Fitness remapping (cont.)
3. Fitness ranking: To avoid using extreme fitness values, sort individuals according to raw fitness, then assign reproductive trials according to rank (found to be superior to scaling)
4. Tournament selection: Select a pair of individuals from the population at random (with replacement), copy the better one into the mating pool, repeat until the pool is full
5. Probabilistic tournament selection: Better individual is selected with a probability > 0.5.
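Tournament selection, in both its deterministic and probabilistic forms, can be sketched in a few lines. The function name `tournament_pool` and the parameter `p_better` are hypothetical; setting `p_better=1.0` gives plain tournament selection, while a value between 0.5 and 1.0 gives the probabilistic variant.

```python
import random

def tournament_pool(population, fitness, pool_size, rng, p_better=1.0):
    """Fill a mating pool by repeated two-way tournaments."""
    pool = []
    while len(pool) < pool_size:
        # Draw the pair at random, with replacement.
        a, b = rng.choice(population), rng.choice(population)
        better, worse = (a, b) if fitness(a) >= fitness(b) else (b, a)
        # Copy the better individual with probability p_better.
        pool.append(better if rng.random() < p_better else worse)
    return pool

population = [[0, 0, 1], [1, 1, 0], [1, 1, 1], [0, 0, 0]]
pool = tournament_pool(population, sum, len(population), random.Random(0))
```

Because only the comparison between two fitness values matters, this form of selection is insensitive to extreme raw fitness values, much like ranking.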
Crossover operator
Typically, crossover takes two parents, cuts their chromosome strings at a randomly chosen position, and swaps the head (or tail) segments to produce two offspring.
Crossover is not usually applied to all pairs of parents selected for mating.
Likelihood of crossover being applied to a pair, pc, is usually between 0.6 and 1.0.
If crossover is not applied, offspring are produced by duplicating their parents (no disruption).
Crossover operator (cont.)
Alternatively, taking pc as constant crossover rate, pc × population_size/2 pairs are selected, and crossover is applied to all selected pairs
Most common crossover operators for binary representation are:
  1-point crossover
  2-point crossover (chromosome is viewed as a loop rather than a string)
  Uniform crossover
1- and 2-point crossover
1-point crossover
  Parent 1:    1 0 1 0 | 0 0 1 1 1 0
  Parent 2:    0 0 1 1 | 0 1 0 0 1 0
  Offspring 1: 1 0 1 0 | 0 1 0 0 1 0
  Offspring 2: 0 0 1 1 | 0 0 1 1 1 0
2-point crossover
  Parent 1:    1 0 1 0 | 0 0 1 | 1 1 0
  Parent 2:    0 0 1 1 | 0 1 0 | 0 1 0
  Offspring 1: 1 0 1 0 | 0 1 0 | 1 1 0
  Offspring 2: 0 0 1 1 | 0 0 1 | 0 1 0
Uniform crossover
A binary crossover mask is used to determine which gene will be taken from which parent
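Parent 1:       1 0 1 0 0 0 1 1 1 0
Parent 2:       0 0 1 1 0 1 0 0 1 0
Crossover mask: 1 0 0 1 0 1 1 1 0 0
Offspring 1:    1 0 1 0 0 0 1 1 1 0
Offspring 2:    0 0 1 1 0 1 0 0 1 0
(With this particular mask each offspring is identical to one parent, because the parents already agree at every position where the mask is 0.)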
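The three binary crossover operators can be sketched as follows. Cut points and the mask are passed in explicitly so that the slide examples can be reproduced; in a GA they would be chosen at random. The function names are hypothetical, and the mask convention (1 means offspring 1 takes the gene from parent 1) is an assumption, since the slides do not state it.

```python
def bits(text):
    """Turn a string such as '1010' into a list of integer genes."""
    return [int(ch) for ch in text]

def one_point(p1, p2, cut):
    """Swap the tails after position cut."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def two_point(p1, p2, cut1, cut2):
    """Swap the middle segment between cut1 and cut2."""
    return (p1[:cut1] + p2[cut1:cut2] + p1[cut2:],
            p2[:cut1] + p1[cut1:cut2] + p2[cut2:])

def uniform(p1, p2, mask):
    """Assumed convention: mask bit 1 sends parent 1's gene to offspring 1."""
    c1 = [a if m else b for a, b, m in zip(p1, p2, mask)]
    c2 = [b if m else a for a, b, m in zip(p1, p2, mask)]
    return c1, c2

p1, p2 = bits("1010001110"), bits("0011010010")
```

Calling `one_point(p1, p2, 4)` and `two_point(p1, p2, 4, 7)` reproduces the offspring listed in the 1- and 2-point examples.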
Mutation operator
Mutation is applied to every offspring by altering each binary gene with a small probability, pm (typically 0.001).
  Offspring:         1 0 1 0 0 1 0 0 1 0
  Mutated offspring: 1 0 1 0 0 0 0 0 1 0
Alternatively, the entire chromosome may be mutated at once by a higher pm, particularly when a non-binary representation and problem specific genetic operators are used.
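Gene-wise mutation of a binary chromosome can be sketched as below; the function name `mutate` is hypothetical.

```python
import random

def mutate(chromosome, pm, rng):
    """Flip each binary gene independently with probability pm."""
    return [gene ^ 1 if rng.random() < pm else gene for gene in chromosome]

offspring = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
mutated = mutate(offspring, 0.001, random.Random(0))
```

With pm = 0.001 most offspring pass through unchanged; on average one gene in a thousand is flipped.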
Forming population for next generation (replacement)
After two offspring are produced and mutated, they may replace their parents
  Unconditionally (a generation gap of 100%)
  If they are more fit than their parents
Alternatively, all parents and offspring may be sorted together according to their fitness, and the best population_size of them may be selected
Steady-state replacement replaces only a few parents (two worst parents by two best offspring)
Convergence
A gene is said to converge when 95% of the population share the same value.
The population is said to converge when all genes have converged.
To monitor convergence, plot population average, population best and incumbent solution throughout the generations.
As the population converges, average fitness approaches the best.
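The 95% convergence criterion can be checked with a short sketch. The function names are hypothetical, and "95% of the population" is read here as "at least 95%".

```python
from collections import Counter

def gene_converged(population, position, threshold=0.95):
    """True if the most common value at this position reaches the threshold."""
    counts = Counter(ind[position] for ind in population)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(population) >= threshold

def population_converged(population, threshold=0.95):
    """True if every gene position has converged."""
    n_genes = len(population[0])
    return all(gene_converged(population, g, threshold) for g in range(n_genes))
```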
Why GAs work: Schemata
A schema is a pattern of gene values represented by a string of characters in the alphabet {0, 1, #} where # matches anything.
For example, the chromosome 1010 contains, among others, the schemata 10##, #0#0, ##1#, 101#.
Order of a schema is the number of non-# symbols it contains (2, 2, 1, 3 in the example).
Defining length of a schema is the distance between the outermost non-# symbols (2, 3, 1, 3 in the example, counting both end positions).
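The schema definitions can be made concrete with a small sketch. The function names are hypothetical. The defining length is computed here by counting both outermost fixed positions, which is the convention needed to reproduce the values in the example; other texts define it as one less.

```python
def matches(schema, chromosome):
    """True if the chromosome is an instance of the schema ('#' is a wildcard)."""
    return len(schema) == len(chromosome) and all(
        s in ("#", c) for s, c in zip(schema, chromosome))

def order(schema):
    """Number of fixed (non-#) symbols."""
    return sum(ch != "#" for ch in schema)

def defining_length(schema):
    """Span from the first to the last fixed symbol, counting both ends."""
    fixed = [i for i, ch in enumerate(schema) if ch != "#"]
    return fixed[-1] - fixed[0] + 1 if fixed else 0

schemata = ["10##", "#0#0", "##1#", "101#"]
```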
Schema theorem
It is assumed that an individual’s high fitness is due to the good schemata it contains.
Holland (1975) showed that, under simplifying assumptions, the optimum way to explore the search space is to allocate reproductive trials to individuals in proportion to their fitness values.
In this way, good schemata receive an exponentially increasing number of reproductive trials in successive generations (this is called the schema theorem).
Schema theorem (cont.)
Holland also showed that, since each individual contains many different schemata, the number of schemata effectively processed in each generation is on the order of population_size^3 (population size cubed)
This property is known as implicit parallelism, and is one of the explanations for the good performance of GAs
Binary coding is thought to be ideal because these theoretical results are valid for binary coding
Building block hypothesis
Goldberg (1989) claims that power of the GA lies in it being able to find good building blocks
Building blocks are schemata of short defining length consisting of genes that work well together, and improve performance when incorporated into an individual
Short defining length is needed so that building blocks are disrupted less by random cut points in 1- or 2-point crossover
Building block hypothesis (cont.)
Hence a successful coding scheme encourages formation of building blocks by ensuring that
  Related genes are close together on the chromosome
  There is little interaction between genes
Interaction (epistasis) means that contribution of a gene to the fitness depends on values of other genes in the chromosome.
There is always some interaction between genes in multimodal fitness functions, and the above conditions are not easy to satisfy.
Exploration and exploitation
Any good search algorithm must find a tradeoff between exploration and exploitation, e.g. random search does only exploration whereas traditional descent does only exploitation
Holland showed that a GA does both simultaneously in an optimal way, assuming that
  Population size is infinite
  Fitness function accurately reflects the solution’s utility
  Genes in a chromosome do not interact significantly
Exploration and exploitation (cont.)
The first assumption can never be satisfied in practice; GA’s “population” is only a sample and stochastic error is unavoidable.
Genetic drift: Even in the absence of any selection pressure (i.e. a constant fitness function), the GA will still converge if, by chance, a chromosome becomes predominant in the population.
For the GA to properly exploit, the fitness function must provide a sufficiently large slope to counteract the genetic drift.
Mutation can be useful in avoiding genetic drift.
Comparison of GAs with others
Random search does only exploration, traditional ascent (hillclimbing) does only exploitation, GA does both.
Iterated hillclimbing with random restarting points allocates its trials evenly over the search space, GA allocates increasing trials to promising regions.
SA (simulated annealing) and TS (tabu search) deal with one candidate solution at a time, GA has a population and implicit parallelism.
TS is usually deterministic, GA is stochastic.
SA does not have memory, TS does, GA?
Part 2: Crossover revisited
2-point crossover is better than 1-point because a chromosome, when considered as a loop, can contain more building blocks
Schemata of a particular order are equally likely to be disrupted by uniform crossover, irrespective of their defining length
Schemata with long defining length are more likely to be disrupted by 2-point crossover, irrespective of their order
Crossover revisited (cont.)
Schemata with short defining length are more likely to be disrupted by uniform crossover, but the same is not true for longer defining length (?)
Hence, total amount of schemata disruption may be lower with uniform crossover.
Ordering of genes in the chromosome is not important with uniform crossover, hence it is more robust than 2-point crossover.
Theoretical and empirical results show that there is no overall winner.
Mutation revisited
Mutation is traditionally regarded as less important than crossover and used to provide a small amount of random search and to avoid genetic drift.
However, asexual reproduction can also result in successful evolution.
Naive evolution (just selection and mutation) results in slower evolution than crossover alone, but it may find better solutions at the end.
Indeed, as the population converges, mutation becomes more productive than crossover.
Inversion and reordering
Order of genes on a chromosome is critical for the building block hypothesis to work effectively.
Purpose of inversion/reordering is to find gene orderings that have better evolutionary potential.
Inversion is a special form of reordering which reverses the order of genes between two randomly selected positions.
Reordering does not lower epistasis; nor does it help when linear ordering of genes is not possible.
Reordering also expands the search space.
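Inversion can be sketched as below, under the assumption (not stated on the slide) that each gene carries a tag naming its original locus, so that reordering changes the gene order on the chromosome without changing the solution it encodes. The names `invert` and `decode` are hypothetical, and the two positions are passed in explicitly rather than chosen at random.

```python
def invert(chromosome, i, j):
    """Reverse the order of the genes in positions i..j-1."""
    return chromosome[:i] + chromosome[i:j][::-1] + chromosome[j:]

def decode(tagged):
    """Read the gene values back in locus order, ignoring chromosome order."""
    return [value for _, value in sorted(tagged)]

# Each gene is a (locus, value) pair; locus tags are an assumed representation.
tagged = [(0, 1), (1, 0), (2, 1), (3, 1), (4, 0), (5, 0)]
reordered = invert(tagged, 1, 4)
```

After the inversion, loci 1 and 3 have swapped places on the chromosome, so a crossover cut now separates different pairs of genes while the decoded solution stays the same.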
More on epistasis (interaction)
In biology, a gene is epistatic when its presence suppresses the effect of a gene at another position.
Even when individual genes are not epistatic, there will be “chains of influence” (one gene’s product affects another gene’s function).
Hence, interaction among genes is unavoidable.
Interaction is inherent in some problems, e.g.
  In TSP, it is the relationship (distance) between cities that counts, not the cities themselves
  Two facilities cannot be assigned to the same location
Deception
Normally, short, low-order schemata contained in global optimal solution are expected to increase in frequency throughout the evolution.
If schemata not contained in optimal solution have higher fitness, then they will increase in frequency faster, and the GA will be misled.
Deception is a special case of epistasis.
Deceptive problems may be difficult to solve, but the bias introduced in average fitness estimation after the first generation may help solve them.
Tackling epistasis
Epistasis can be tackled in two ways
  As a coding problem
  As a GA theory problem
If taken as a coding problem, the solution is to find a different coding scheme and to develop appropriate genetic operators, e.g.
  Goldberg’s order-schemata and PMX crossover
  Expansive coding which uses a larger number of weakly interacting genes (larger search space) instead of a small number of strongly interacting genes
Tackling epistasis (cont.)
We will see such examples for ordering problems when we discuss genetic operators for TSP
When treated as a GA theory problem, a new theory (and new algorithms) may have to be developed, which takes epistasis into account
Although Holland’s convergence proof assumes low epistasis, there may be a weaker proof for domains of high epistasis
Non-binary representations
Binary representation, where each gene has a cardinality of two, is traditionally believed to give the largest number of schemata and to provide highest degree of implicit parallelism.
Recently, higher-cardinality representations are claimed to contain more schemata; they can perform well.
Integer or real numbers can be used as high-cardinality alphabets, and meaningful problem specific genetic operators can be defined easily.
Non-binary representations (cont.)
Examples of non-binary crossover operators
  Take arithmetic average of the two gene values
  Take geometric mean (square root of the product)
  Take the difference between the two gene values, add it to the higher or subtract it from the lower
Examples of non-binary mutation operators
  Replace the current value with a random one
  Add or subtract a small random amount (creep)
  Multiply by a random amount close to one (geometric creep)
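The real-valued operators listed above can be sketched for a single gene as follows. All function and parameter names are hypothetical; the use of uniform random draws and the ranges `step` and `spread` are assumptions, since the slides do not specify a distribution.

```python
import math
import random

def average_crossover(a, b):
    return (a + b) / 2

def geometric_crossover(a, b):
    # Assumes a * b >= 0 so the square root is defined.
    return math.sqrt(a * b)

def extension_crossover(a, b):
    """Return (higher + difference, lower - difference)."""
    diff = abs(a - b)
    return max(a, b) + diff, min(a, b) - diff

def random_replacement(low, high, rng):
    return rng.uniform(low, high)

def creep(x, step, rng):
    """Add or subtract a small random amount of at most step."""
    return x + rng.uniform(-step, step)

def geometric_creep(x, spread, rng):
    """Multiply by a random factor close to one."""
    return x * rng.uniform(1 - spread, 1 + spread)
```

For gene values 2 and 8, the average is 5, the geometric mean is 4, and the difference-based operator yields 14 or -4.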
Dynamic operator probabilities
Crossover probability, pc, and mutation probability, pm, may vary during the evolution
For better exploration, pc may decrease and pm may increase during the run according to a fixed schedule, e.g. linearly
For convergence, pm may decrease exponentially (similar to the temperature in SA)
pc and pm can be adjusted dynamically, depending on the spread of fitness values, e.g. increase pm as the spread decreases (as the population converges)
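Two of the fixed schedules mentioned above can be sketched as simple functions of the generation number. The function names, the start and end values, and the decay rate are illustrative assumptions only.

```python
import math

def linear_schedule(start, end, generation, max_generations):
    """Move linearly from start (generation 0) to end (last generation)."""
    t = generation / max_generations
    return start + (end - start) * t

def exponential_decay(start, rate, generation):
    """Decay exponentially from start, in the manner of an SA temperature."""
    return start * math.exp(-rate * generation)

# Example: pc falls from 0.9 to 0.6 over 100 generations; pm decays from 0.1.
pc_values = [linear_schedule(0.9, 0.6, g, 100) for g in range(101)]
pm_values = [exponential_decay(0.1, 0.05, g) for g in range(101)]
```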
Dynamic operator probabilities (cont.)
Probability of the more successful operator can be increased, e.g.
  Monitor the fitness improvement due to crossover and mutation operators over the last n reproductive trials
  Give more weight to the more successful operator
  For each reproductive trial, choose one of the operators probabilistically according to its weight
Different crossover and mutation operators may also be weighted in a similar manner
Niche and speciation
In nature, different species evolve to fill different ecological niches.
Speciation is the process by which a single species differentiates into two or more different species.
Niches are analogous to alternative maxima of fitness values in GAs.
Normally, a GA cannot find these alternatives because of genetic drift and convergence (a GA does not allow speciation and the entire population ends up in the same niche).
Niche and speciation (cont.)
To solve this problem, we should
  Maintain diversity by encouraging speciation
  Share the payoff associated with a niche
Preselection: Offspring replace the parents only if the offspring’s fitness is higher than that of the inferior parent (this maintains population diversity since similar individuals replace each other)
Crowding: Offspring is compared with a few randomly selected individuals and replaces the most similar one (again for diversity)
Niche and speciation (cont.)
Restricted mating: Individuals are allowed to mate only if they are similar (this encourages speciation); offspring of two highly fit but dissimilar parents may be unfit
Multiple subpopulations: Population is divided into subpopulations, each evolving in itself, and migration is allowed at a limited rate (again for speciation)
Local mating: Similar to multiple subpopulations, but without explicit boundaries
Niche and speciation (cont.)
Sharing: Similar individuals that are in the same niche share the fitness payoff among them
  A full niche is no longer rewarding since the payoff is shared and individual fitness values are reduced
  Sharing distributes individuals to peaks in fitness function in proportion to the height of the peak
  Sharing is found to be superior to crowding
Sequential niches: Multiple GA runs are made, each locating a new peak (previously found peaks are cancelled out from the fitness function)
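The sharing idea described above can be sketched as follows. The slides do not give a formula, so this sketch assumes a commonly used form: similarity is measured by Hamming distance, and each individual's fitness is divided by a niche count built from a triangular sharing function with radius `sigma`. All names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two chromosomes differ."""
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(population, fitness, sigma):
    """Divide each raw fitness by the individual's niche count."""
    shared = []
    for ind in population:
        # Neighbours closer than sigma contribute between 0 and 1 each;
        # the individual itself always contributes 1.
        niche_count = sum(max(0.0, 1.0 - hamming(ind, other) / sigma)
                          for other in population)
        shared.append(fitness(ind) / niche_count)
    return shared

# Three identical individuals crowd one niche; a fourth sits alone in another.
population = [[1, 1, 1, 1]] * 3 + [[0, 0, 0, 0]]
result = shared_fitness(population, lambda ind: 6.0, sigma=2)
```

In this example all four individuals have the same raw fitness, but the crowded niche leaves each of its members with a third of the payoff, so the lone individual becomes the most attractive parent.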
Diploidy and dominance
Diploidy: Higher lifeforms like mammals have two sets of genes; of a pair of genes, one is dominant and the other is recessive.
Diploidy allows two solutions to be remembered instead of one, and provides higher diversity.
Potentially useful gene sets can be maintained in recessive position.
An extension can be to keep the best individuals (elite solutions?) and try reintroducing them to the population if the performance falls.
Problem specific knowledge
Binary coding, random initial population, and traditional crossover/mutation operators follow the biological process more closely (are generic) but do not make use of problem specific knowledge.
Using problem specific knowledge, we can
  Find more suitable representation schemes
  Generate initial population using heuristics
  Develop problem specific genetic operators that guarantee feasibility, particularly in ordering problems
  Use local improvement as a form of mutation