
Genetic Algorithms

Data Mining and Distributed Computing Group

Facultad de Informática

DATSI, Universidad Politécnica de Madrid


• A class of probabilistic optimization algorithms
• Inspired by the biological evolution process
• Uses the concepts of “Natural Selection” and “Genetic Inheritance” (Darwin 1859)
• Originally developed by John Holland (1975)
• Particularly well suited for hard problems where little is known about the underlying search space
• Widely used in business, science and engineering

Natural Evolution

Darwin: The Origin of Species
– Search for optimal forms (environment)
– Based on:
  • assortment: recombination of genetic material
  • randomness
  • selection: survival of the fittest

Biological Concepts (Cell)

• A set of many small “factories” working together

• Center: cell nucleus

• The nucleus contains the genetic information


Biological Concepts (Chromosome)

• Chromosomes store genetic information

• Each chromosome is built of DNA

• Chromosomes in humans form pairs (23 pairs)

• The chromosome is divided into parts: genes

• Genes code for properties

• Possible values of a gene: alleles

• Position of the gene in the chromosome: locus

Biological Concepts (Genetics)

• The entire combination of genes: genotype

• A genotype is expressed as a phenotype

• Alleles can be either dominant or recessive

• Dominant alleles are always expressed in the phenotype

• Recessive alleles can survive in the population for many generations without being expressed

Biological Concepts (Reproduction)

Mitosis: copying the same genetic information to new offspring; there is no exchange of information

The normal way multicellular structures, such as organs, grow.

Biological Concepts (Reproduction)

• Meiosis: the basis of sexual reproduction

• After meiotic division 2 gametes appear in the process

• In reproduction, two gametes fuse into a zygote, which will become the new individual

• Genetic information is shared between the parents in order to create new offspring

Biological Concepts (Reproduction)

• During reproduction “errors” occur

• Due to these “errors” genetic variation exists

• Most important “errors” are:

• Recombination (cross-over)

• Mutation


Biological Concepts (Natural Selection)

• The origin of species: “Preservation of favourable variations and rejection of unfavourable variations.”

• Survival of the fittest

• Mathematically expressed as fitness: success in life

Genetic Algorithms (GAs)

John Holland (~1973)
– GAs use the principle of natural selection to solve complicated optimization problems.
– An alternative to brute-force search on a complex solution space: instead of complete exploration, a stochastic-driven search

Genetic Algorithms (GAs)

Search techniques
– Calculus-based techniques
  • Direct methods: Fibonacci, Newton
  • Indirect methods
– Guided random search techniques
  • Evolutionary algorithms
    · Evolutionary strategies
    · Genetic algorithms
      Parallel: centralized, distributed
      Sequential: steady-state, generational
  • Simulated annealing
– Enumerative techniques
  • Dynamic programming

Genetic Algorithms: Definitions

Concepts:
– Individual: a possible solution of the problem
– Gene: a solution component (a.k.a. an attribute) [eye color]
– Allele: a gene value [eye color = green]
– Population: a group of potential solutions

[Figure: example individuals, e.g. (-19 4 108 46), (0.433 -33.345 0.0013), (A x B 3)]

Genetic Algorithms: Definitions

Concepts:
– Genotype: genetic codification (sequence of genes)
– Phenotype: actual individual (with its capabilities)

For GAs, the phenotype represents the fitness value of the solution.

Assumption: there exists a relationship between genetic information and individual fitness.

Fitness Landscapes

Fitness: Function that evaluates individual capabilities.

Graphical representation: N+1 dimensions (one per gene, plus one for the fitness value)

Natural Evolution and Operators

How are new individuals created?
– Crossover (recombination): mixing existing genetic material (assortment); inheritance of useful characteristics
– Mutation: rare (very rare from a biological perspective); randomness

They simulate the evolution process

Natural Evolution and Operators





Variants:
– 2-, 3-, n-point crossover
– Uniform crossover
– Other...
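The crossover variants can be sketched as follows. This is a minimal illustration, assuming individuals are equal-length lists; the names `n_point_crossover` and `uniform_crossover` and the `swap_prob` parameter are hypothetical, not taken from the slides.

```python
import random

def n_point_crossover(parent_a, parent_b, points):
    """Exchange alternating segments of two equal-length sequences at the cut points."""
    child_a, child_b = list(parent_a), list(parent_b)
    swap = False
    prev = 0
    for cut in sorted(points) + [len(child_a)]:
        if swap:
            # Swap the segment between the previous cut and this one.
            child_a[prev:cut], child_b[prev:cut] = child_b[prev:cut], child_a[prev:cut]
        swap = not swap
        prev = cut
    return child_a, child_b

def uniform_crossover(parent_a, parent_b, swap_prob=0.5, rng=random):
    """Decide independently for every gene whether the parents exchange it."""
    child_a, child_b = list(parent_a), list(parent_b)
    for i in range(len(child_a)):
        if rng.random() < swap_prob:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b
```

With `points=[2, 4]` on parents of length 6, only genes 2 and 3 are exchanged (2-point crossover); a single cut point gives the classic one-point operator.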


Natural Evolution and Operators





Variants:
– Mutation-by-swap
– Biased mutation
– Cataclysmic mutation
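As an illustration of one variant, mutation-by-swap could look like the sketch below. It assumes the individual is a permutation stored in a list; the function name `swap_mutation` is hypothetical. Swapping two positions keeps the result a valid permutation, which a bit-flip style mutation would not.

```python
import random

def swap_mutation(individual, rng=random):
    """Return a copy of the individual with two randomly chosen positions exchanged."""
    mutant = list(individual)
    i, j = rng.sample(range(len(mutant)), 2)   # two distinct positions
    mutant[i], mutant[j] = mutant[j], mutant[i]
    return mutant
```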


Natural Evolution and Operators

Selection:
1. Evaluation of individuals (fitness)
2. Parent selection (for each crossover operation)

Selective pressure: the relationship between the capacities (fitness) of an individual and its chances of generating new individuals (participating in reproduction).

Natural Evolution and Operators

Selection (alternatives):
– Tournament selection
– Rank-based: continuous function
– Roulette wheel
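Two of these alternatives can be sketched in a few lines. This is only an illustration: the names `tournament_select` and `rank_select`, the tournament size `k`, and the choice of a linear function of the rank are assumptions, since the slides do not fix them.

```python
import random

def tournament_select(population, fitness, k=2, rng=random):
    """Draw k individuals at random and return the fittest of them."""
    contenders = rng.sample(population, k)
    return max(contenders, key=fitness)

def rank_select(population, fitness, rng=random):
    """Select with a weight that grows linearly with rank, not with raw fitness."""
    ranked = sorted(population, key=fitness)        # worst first
    weights = list(range(1, len(ranked) + 1))       # rank 1 (worst) .. n (best)
    return rng.choices(ranked, weights=weights, k=1)[0]
```

A larger `k` raises the selective pressure of the tournament; rank-based selection keeps the pressure independent of how far apart the fitness values are.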

Variants:
– Previous population selection (worst individual

Canonical Genetic Algorithm


[Flowchart: initial population → end condition? → selection (includes the chances of participation in reproduction) → selected population P’i → next population]
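The loop in the flowchart can be sketched as below. This is a minimal illustration under stated assumptions, not the reference algorithm: individuals are bit lists, selection is fitness-proportionate, crossover is one-point, the end condition is a fixed number of generations, `pop_size` is even, `length` is at least 2, and fitness values are non-negative. The function name `canonical_ga` and all parameter defaults are hypothetical.

```python
import random

def canonical_ga(fitness, length, pop_size=50, generations=100,
                 p_cross=0.85, p_mut=0.01, seed=None):
    """Generational GA over bit lists; returns the best individual seen."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):                   # end condition: fixed budget
        # Selection: the chance of reproducing is proportional to fitness.
        weights = [fitness(ind) for ind in population]
        if sum(weights) == 0:
            weights = None                         # all-zero fitness: uniform choice
        selected = rng.choices(population, weights=weights, k=pop_size)
        # One-point crossover on consecutive pairs, then bit-flip mutation.
        next_population = []
        for a, b in zip(selected[0::2], selected[1::2]):
            if rng.random() < p_cross:
                cut = rng.randint(1, length - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                next_population.append(
                    [1 - g if rng.random() < p_mut else g for g in child])
        population = next_population               # replace the whole population
        best = max(population + [best], key=fitness)
    return best
```

For example, `canonical_ga(sum, length=10, pop_size=20, generations=30)` searches for the bit list with the most ones.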


GA Variants (Replacement)

Elitism:
– The best individual(s) in Pi stay in the next population (Pi+1).
– Provides monotonic fitness increment (the best fitness never decreases)

Steady-state:
– Only one individual is created per generation.
– The new one replaces the worst individual.

Replacement Overview


[Figure: current population, P’i, next population, offspring population]

                  Canonical   Steady-state   Elitism
Offspring size    N           1              >, = or < N
Current → Next    0           N-1            << N
Replacement       All         Worst          Worst
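The three replacement schemes in the table can be sketched as follows. This is one possible reading, with hypothetical function names and a hypothetical `n_elite` parameter; in particular, the elitist version assumes there are at least N - n_elite offspring to fill the next population.

```python
def canonical_replacement(current, offspring, fitness):
    """Generational: the N offspring replace the whole current population."""
    return list(offspring)

def steady_state_replacement(current, offspring, fitness):
    """A single new individual replaces the worst of the current population."""
    (child,) = offspring                                        # exactly one offspring
    survivors = sorted(current, key=fitness, reverse=True)[:-1]  # keep the N-1 best
    return survivors + [child]

def elitist_replacement(current, offspring, fitness, n_elite=1):
    """Keep the n_elite best current individuals and fill up with the best offspring."""
    elite = sorted(current, key=fitness, reverse=True)[:n_elite]
    rest = sorted(offspring, key=fitness, reverse=True)[:len(current) - n_elite]
    return elite + rest
```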



The Metaphor

Genetic Algorithm | Nature
Optimization problem | Environment
Feasible solutions | Individuals living in that environment
Solutions quality (fitness function) | Individual’s degree of adaptation to its surrounding environment
A set of feasible solutions | A population of organisms (species)
Stochastic operators | Selection, recombination and mutation in nature’s evolutionary process
Iteratively applying a set of stochastic operators on a set of feasible solutions | Evolution of populations to suit their environment

Aspects to consider: Population Size

What’s the optimal size of the population?
– Too small: premature convergence (weak exploration)
– Too large: waste of resources

Aspects to consider: Problem representation

A key aspect: chromosome representation
– Initially, a binary string
– Nowadays, other possibilities:
  • Bit strings (0101 ... 1100)
  • Real numbers (43.2 -33.1 ... 0.0 89.2)
  • Permutations of elements (E11 E3 E7 ... E1 E15)
  • Lists of rules (R1 R2 R3 ... R22 R23)
  • Program elements (genetic programming)
  • ... any data structure ...
– Great influence on the problem resolution

GA elements to define

• Chromosome representation
• Creation of initial population
• Fitness function
• Genetic operators
• Parameters (population size, probabilities of the genetic operators, etc.)

Typical configuration for small problems:
– Population size: 50 – 100
– Children per generation: = population size
– Crossovers: > 85%
– Mutations: < 5%
– Generations: 20 – 20,000

Example (The MAXONE problem)

• Suppose we want to maximize the number of ones in a string of m binary digits

Is it a trivial problem?

• It may seem so because we know the answer in advance

• However, we can think of it as maximizing the number of correct answers, each encoded by 1, to m difficult yes/no questions

MAXONE problem

An individual is encoded (naturally) as a string of m binary digits

The fitness f of a candidate solution to the MAXONE problem is the number of ones in its genetic code

We start with a population of n random strings. Suppose that m = 10 and n = 6
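The encoding, the fitness function and the random start can be sketched as below. This is a minimal sketch, assuming individuals are Python strings of "0" and "1"; the function names `fitness` and `random_population` are hypothetical.

```python
import random

def fitness(individual):
    """MAXONE fitness: the number of ones in the bit string."""
    return individual.count("1")

def random_population(n, m, rng=random):
    """n random strings of m binary digits (one fair coin toss per bit)."""
    return ["".join(rng.choice("01") for _ in range(m)) for _ in range(n)]

population = random_population(n=6, m=10)   # 60 coin tosses in total
```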


MAXONE problem (initialization)

• We toss a fair coin 60 times and get the following initial population:

s1 = 1111010101 f (s1) = 7

s2 = 0111000101 f (s2) = 5

s3 = 1110110101 f (s3) = 7

s4 = 0100010011 f (s4) = 4

s5 = 1110111101 f (s5) = 8

s6 = 0100110000 f (s6) = 3


MAXONE problem (selection)

• Next we apply fitness proportionate selection with the roulette wheel method: each individual gets a slice of the wheel whose area is proportional to its fitness value, so individual i is chosen with probability

$$p_i = \frac{f(s_i)}{\sum_{j=1}^{n} f(s_j)}$$

• We repeat the extraction as many times as the number of individuals we need to have the same parent population size (6 in our case)
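A roulette wheel can be sketched with cumulative fitness sums. The function name `roulette_wheel_select` is hypothetical, and the population is the initial one from the example; with it, s5 (fitness 8 out of a total of 34) is chosen with probability 8/34, about 0.24, on every spin.

```python
import random
from itertools import accumulate

def roulette_wheel_select(population, fitnesses, rng=random):
    """Return one individual, chosen with probability f_i / sum(f)."""
    cumulative = list(accumulate(fitnesses))
    spin = rng.uniform(0, cumulative[-1])        # where the wheel stops
    for individual, upper in zip(population, cumulative):
        if spin < upper:
            return individual
    return population[-1]                        # guard for spin == total

population = ["1111010101", "0111000101", "1110110101",
              "0100010011", "1110111101", "0100110000"]
fitnesses = [s.count("1") for s in population]   # 7, 5, 7, 4, 8, 3
probabilities = [f / sum(fitnesses) for f in fitnesses]
# Spin the wheel once per slot of the parent population (6 times).
parents = [roulette_wheel_select(population, fitnesses) for _ in population]
```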


MAXONE problem (selection)

• Suppose that, after performing selection, we get the following population:

s1` = 1111010101 (s1)

s2` = 1110110101 (s3)

s3` = 1110111101 (s5)

s4` = 0111000101 (s2)

s5` = 0100010011 (s4)

s6` = 1110111101 (s5)


MAXONE problem (crossover)

• Next we mate strings for crossover. For each couple we decide according to crossover probability (for instance 0.6) whether to actually perform crossover or not

• Suppose that we decide to actually perform crossover only for couples (s1`, s2`) and (s5`, s6`). For each couple, we randomly extract a crossover point, for instance 2 for the first and 5 for the second


MAXONE problem (crossover)

Before crossover:

s1` = 1111010101 s2` = 1110110101

s5` = 0100010011 s6` = 1110111101

After crossover:

s1`` = 1110110101 s2`` = 1111010101

s5`` = 0100011101 s6`` = 1110110011
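The crossover step above can be reproduced with a one-line operator. This is a sketch, assuming the crossover point counts the bits kept from the first parent; the name `single_point_crossover` is hypothetical.

```python
def single_point_crossover(a, b, point):
    """Swap the tails of two equal-length strings after the first `point` bits."""
    return a[:point] + b[point:], b[:point] + a[point:]

print(single_point_crossover("1111010101", "1110110101", 2))
# ('1110110101', '1111010101')
print(single_point_crossover("0100010011", "1110111101", 5))
# ('0100011101', '1110110011')
```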


MAXONE problem (mutation)

• The final step is to apply random mutation: for each bit that we are to copy to the new population we allow a small probability of error (for instance 0.1)

• Before applying mutation:

s1`` = 1110110101 s4`` = 0111000101

s2`` = 1111010101 s5`` = 0100011101

s3`` = 1110111101 s6`` = 1110110011


MAXONE problem (mutation)

After applying mutation:

s1``` = 1110100101 f (s1``` ) = 6

s2``` = 1111110100 f (s2``` ) = 7

s3``` = 1110101111 f (s3``` ) = 8

s4``` = 0111000101 f (s4``` ) = 5

s5``` = 0100011101 f (s5``` ) = 5

s6``` = 1110110001 f (s6``` ) = 6
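Bit-flip mutation can be sketched as below; the function name `mutate` is hypothetical. The second part counts how many bits differ between the populations listed before and after mutation: 6 of the 60 bits changed, which matches the expected number for a per-bit probability of 0.1.

```python
import random

def mutate(individual, rate=0.1, rng=random):
    """Flip each bit independently with probability `rate`."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[bit] if rng.random() < rate else bit
                   for bit in individual)

before = ["1110110101", "1111010101", "1110111101",
          "0111000101", "0100011101", "1110110011"]
after = ["1110100101", "1111110100", "1110101111",
         "0111000101", "0100011101", "1110110001"]
# Count the positions where a string and its mutated version differ.
flipped = sum(x != y for s, t in zip(before, after) for x, y in zip(s, t))
print(flipped)   # 6
```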


MAXONE problem

• In one generation, the total population fitness changed from 34 to 37, thus improved by ~9%.

• At this point, we go through the same process all over again, until a stopping criterion is met
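The improvement figure can be checked directly from the two populations of the example. The helper name `total_fitness` is hypothetical.

```python
def total_fitness(population):
    """Sum of the MAXONE fitness (number of ones) over the population."""
    return sum(s.count("1") for s in population)

initial = ["1111010101", "0111000101", "1110110101",
           "0100010011", "1110111101", "0100110000"]
next_generation = ["1110100101", "1111110100", "1110101111",
                   "0111000101", "0100011101", "1110110001"]

before_total = total_fitness(initial)            # 34
after_total = total_fitness(next_generation)     # 37
improvement = (after_total - before_total) / before_total
print(f"{improvement:.1%}")                      # 8.8%
```

Note that the best individual still has fitness 8 in both generations; what improved is the population as a whole.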


Local Optimizations (Lamarck)

Fitness calculation searches for an improved solution in the neighbourhood:
– Lamarckian evolution: the inheritance of acquired characteristics.
– Methods: hill-climbing, simulated annealing
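A Lamarckian evaluation step could be sketched as follows, using hill-climbing as the local method. This is an illustration under assumptions: individuals are bit lists, the neighbourhood is "one bit flipped", and the names `hill_climb` and `lamarckian_evaluate` are hypothetical.

```python
def hill_climb(individual, fitness):
    """Greedy local search: accept any single bit flip that improves fitness."""
    best = list(individual)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            neighbour = best[:i] + [1 - best[i]] + best[i + 1:]
            if fitness(neighbour) > fitness(best):
                best, improved = neighbour, True
    return best

def lamarckian_evaluate(individual, fitness):
    """Return the locally improved genotype and its fitness.

    The improved genotype replaces the original one, so the acquired
    improvement is inherited by the offspring.
    """
    improved = hill_climb(individual, fitness)
    return improved, fitness(improved)
```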


Local Optimizations (Baldwin)

Other alternatives:
– Evolve fitness function
– But keep genetic representation

As a result:
– New fitness

GAs: Why do they work?

• Schema Theorem

• Building Blocks Hypothesis


Schema Notation

• {0,1,#} is the symbol alphabet, where # is a special wild card symbol

• A schema is a template consisting of a string composed of these three symbols

• Example: the schema [1#00#] matches the strings: [10000], [10001], [11000] and [11001]


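Schema matching is easy to express in code. The sketch below uses the alphabet {0,1,#} from the notation above; the function names `matches` and `instances` are hypothetical.

```python
from itertools import product

def matches(schema, string):
    """True if `string` fits `schema`, where '#' matches either bit."""
    return len(schema) == len(string) and all(
        s == "#" or s == c for s, c in zip(schema, string))

def instances(schema):
    """All binary strings matched by the schema."""
    slots = [("0", "1") if s == "#" else (s,) for s in schema]
    return ["".join(bits) for bits in product(*slots)]

print(instances("1#00#"))   # ['10000', '10001', '11000', '11001']
```

A schema with k wild cards matches 2^k strings, so each individual is an instance of many schemata at once.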



“A schema is a similarity template describing a subset of strings with similarities at certain positions” John Holland

Crossover tends to destroy schemata with a long defining length, and mutation tends to destroy high-order schemata.

Thus short, low-order schemata with above-average fitness have a better chance of surviving and get increased representation in the next generation.

Schema Theorem and Building Block

GAs explore the search space by short, low-order schemata which, subsequently, are used for information exchange during crossover

Objective: Building blocks (set of alleles with representative participation on fitness/solution quality).
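The quantities behind "short" and "low-order" can be computed directly. In the usual statement of the schema theorem, a schema H of length l survives one-point crossover (probability p_c) and bit mutation (probability p_m) with probability at least 1 - p_c·δ(H)/(l-1) - o(H)·p_m, where o(H) is the order and δ(H) the defining length. The sketch below assumes that formulation; the function names are hypothetical.

```python
def order(schema):
    """o(H): the number of fixed (non-'#') positions."""
    return sum(s != "#" for s in schema)

def defining_length(schema):
    """delta(H): the distance between the first and last fixed positions."""
    fixed = [i for i, s in enumerate(schema) if s != "#"]
    return fixed[-1] - fixed[0] if fixed else 0

def survival_bound(schema, p_cross, p_mut):
    """Lower bound on the probability that the schema survives both operators."""
    length = len(schema)
    return (1 - p_cross * defining_length(schema) / (length - 1)
            - order(schema) * p_mut)
```

For the schema 1#00# (order 3, defining length 3) with p_cross = 0.6 and p_mut = 0.1, the bound is 1 - 0.45 - 0.3 = 0.25; a shorter, lower-order schema such as ##00# gets a higher bound.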


Building Blocks Hypothesis

“A genetic algorithm seeks near-optimal performance through the juxtaposition of short, low-order, high-performance schemata, called the building blocks”

Some building blocks (short, low-order schemata) can mislead GA and cause its convergence to suboptimal points


Differences between GA and Conventional Search Algorithms

• GA works on a coding of the parameter set, not the parameters themselves
• GA searches from a population of points, not a single point
• GA uses only a payoff function, and no domain knowledge
• GA uses probabilistic transition rules, not deterministic ones
• GA can provide a number of potential solutions to a given problem. The final choice is left to the user.


“Genetic Algorithms are good at taking large, potentially huge search spaces and navigating them, looking for optimal combinations of things, solutions you might not otherwise find in a lifetime.”

Salvatore Mangano, Computer Design, May 1995