Andrea G. B. Tettamanzi, 2002
Evolutionary Algorithms
Andrea G. B. Tettamanzi
Lecture 1
23 April 2002
Contents of the Lectures
• Taxonomy and History;
• Evolutionary Algorithms basics;
• Theoretical Background;
• Outline of the various techniques: plain genetic algorithms, evolutionary programming, evolution strategies, genetic programming;
• Practical implementation issues;
• Evolutionary algorithms and soft computing;
• Selected applications from the biological and medical area;
• Summary and Conclusions.
Bibliography
Th. Bäck. Evolutionary Algorithms in Theory and Practice. Oxford University Press, 1996.
L. Davis. The Handbook of Genetic Algorithms. Van Nostrand & Reinhold, 1991.
D.B. Fogel. Evolutionary Computation. IEEE Press, 1995.
D.E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.
J. Koza. Genetic Programming. MIT Press, 1992.
Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer Verlag, 3rd ed., 1996.
H.-P. Schwefel. Evolution and Optimum Seeking. Wiley & Sons, 1995.
J. Holland. Adaptation in Natural and Artificial Systems. MIT Press, 1995.
Taxonomy (1)
Stochastic optimization methods:
• Monte Carlo methods
• Simulated annealing
• Taboo search
• Evolutionary algorithms:
– genetic algorithms
– evolutionary programming
– evolution strategies
– genetic programming
Taxonomy
Distinctive traits of an evolutionary algorithm:
• it operates on an appropriate encoding of the solutions;
• at every instant it considers a population of candidate solutions;
• it requires no regularity conditions (e.g. differentiability);
• it follows probabilistic transition rules.
History (1)
• I. Rechenberg, H.-P. Schwefel: TU Berlin, ‘60s
• John H. Holland: University of Michigan, Ann Arbor, ‘60s
• L. Fogel: UC S. Diego, ‘60s
• John Koza: Stanford University, ‘80s
History (2)
1859 Charles Darwin: inheritance, variation, natural selection
1957 G. E. P. Box: random mutation & selection for optimization
1958 Fraser, Bremermann: computer simulation of evolution
1964 Rechenberg, Schwefel: mutation & selection
1966 Fogel et al.: evolving automata - “evolutionary programming”
1975 Holland: crossover, mutation & selection - “reproductive plan”
1975 De Jong: parameter optimization - “genetic algorithm”
1989 Goldberg: first textbook
1991 Davis: first handbook
1993 Koza: evolving LISP programs - “genetic programming”
Evolutionary Algorithms Basics
• what an EA is (the Metaphor)
• object problem and fitness
• the Ingredients
• schemata
• implicit parallelism
• the Schema Theorem
• the building blocks hypothesis
• deception
The Metaphor
EVOLUTION ↔ PROBLEM SOLVING
Environment ↔ Problem to be solved
Individual ↔ Candidate solution
Fitness (adaptation) ↔ Quality of the solution
Object problem and Fitness
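Object problem: $\min_{s \in S} c(s)$, with cost function $c : S \to \mathbb{R}$.
Genotype and solution: a map $M$ decodes each genotype $\gamma$ into a solution $s = M(\gamma) \in S$.
Fitness: $f$ assigns a fitness value $f(\gamma)$ to each genotype $\gamma$.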
The ingredients of an evolutionary algorithm
• a population of (appropriately encoded) solutions; the encoding is the “DNA” of a solution;
• selection (survival of the fittest);
• reproduction;
• recombination;
• mutation;
• the step from generation t to generation t + 1.
The evolutionary cycle
Population → Selection → Parents → Reproduction (Recombination, Mutation) → Offspring → Replacement → Population
Pseudocode
generation = 0;
SeedPopulation(popSize); // at random or from a file
while(!TerminationCondition())
{
generation = generation + 1;
CalculateFitness(); // ... of new genotypes
Selection(); // select genotypes that will reproduce
Crossover(pcross); // mate pcross of them on average
Mutation(pmut); // mutate all the offspring with Bernoulli
// probability pmut over genes
}
A Sample Genetic Algorithm
• The MAXONE problem
• Genotypes are bit strings
• Fitness-proportionate selection
• One-point crossover
• Flip mutation (transcription error)
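The sample algorithm listed above can be sketched in Python as follows. This is a minimal illustration, not the lecturer's implementation: the function names, the default parameter values (string length, population size, `pcross`, `pmut`, generation limit) and the uniform fallback for an all-zero population are assumptions made here.

```python
import random

def fitness(genotype):
    """MAXONE fitness: the number of ones in the bit string."""
    return sum(genotype)

def roulette_select(population, rng):
    """Fitness-proportionate selection of one individual."""
    weights = [fitness(g) for g in population]
    if sum(weights) == 0:  # degenerate wheel (assumption): choose uniformly
        return rng.choice(population)
    return rng.choices(population, weights=weights, k=1)[0]

def one_point_crossover(a, b, rng):
    """Swap the tails of two parents after a random cut point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(genotype, pmut, rng):
    """Flip each bit independently with probability pmut."""
    return [1 - bit if rng.random() < pmut else bit for bit in genotype]

def run_ga(length=20, pop_size=30, pcross=0.7, pmut=0.01,
           max_generations=200, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(max_generations):
        if fitness(best) == length:  # termination condition: all ones found
            break
        parents = [roulette_select(population, rng) for _ in range(pop_size)]
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < pcross:  # mate a fraction pcross of the pairs on average
                a, b = one_point_crossover(a, b, rng)
            offspring += [mutate(a, pmut, rng), mutate(b, pmut, rng)]
        population = offspring
        best = max(population + [best], key=fitness)
    return best

best = run_ga()
print(fitness(best), "ones out of", len(best))
```

The best individual found so far is tracked outside the population, so the generational replacement itself stays non-elitist, as in the pseudocode.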
The MAXONE Problem
Problem instance: a string of l binary cells, $\gamma \in \{0, 1\}^l$.
Fitness: $f(\gamma) = \sum_{i=1}^{l} \gamma_i$
Objective: maximize the number of ones in the string.
Fitness Proportionate Selection
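Implementation: “Roulette Wheel”
Probability of being selected: $P(\gamma) = \dfrac{f(\gamma)}{\sum_{\gamma'} f(\gamma')}$
On the wheel, individual $\gamma$ occupies a sector of angle $2\pi \, f(\gamma) / \sum_{\gamma'} f(\gamma')$.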
One Point Crossover
[Figure: two parent bit strings are cut at a common crossover point; the two offspring are obtained by exchanging the segments that follow the cut.]
Mutation
1110101010 → 0110101011
Each bit is flipped with probability pmut: independent Bernoulli transcription errors.
Example: Selection
0111011011 f = 7 Cf = 7 P = 0.125
1011011101 f = 7 Cf = 14 P = 0.125
1101100010 f = 5 Cf = 19 P = 0.089
0100101100 f = 4 Cf = 23 P = 0.071
1100110011 f = 6 Cf = 29 P = 0.107
1111001000 f = 5 Cf = 34 P = 0.089
0110001010 f = 4 Cf = 38 P = 0.071
1101011011 f = 7 Cf = 45 P = 0.125
0110110000 f = 4 Cf = 49 P = 0.071
0011111101 f = 7 Cf = 56 P = 0.125
Random sequence: 43, 1, 19, 35, 15, 22, 24, 38, 44, 2
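The columns of the selection example (fitness f, cumulative fitness Cf, selection probability P) can be recomputed with a few lines of Python. The `spin` helper shows one possible way to turn a random number into a selected individual; its boundary convention (first individual whose cumulative fitness is at least the number drawn) is an assumption, since the slide does not say how boundaries are handled.

```python
from bisect import bisect_left
from itertools import accumulate

population = [
    "0111011011", "1011011101", "1101100010", "0100101100", "1100110011",
    "1111001000", "0110001010", "1101011011", "0110110000", "0011111101",
]

fitnesses = [s.count("1") for s in population]      # f: number of ones
cumulative = list(accumulate(fitnesses))            # Cf: running total of f
total = cumulative[-1]
probabilities = [f / total for f in fitnesses]      # P = f / sum of all f

def spin(r):
    """Index of the first individual whose cumulative fitness is >= r.

    The ">=" boundary rule is an assumption, not taken from the slides.
    """
    return bisect_left(cumulative, r)

for s, f, cf, p in zip(population, fitnesses, cumulative, probabilities):
    print(s, f, cf, round(p, 3))
```

Because of the boundary convention, the individuals picked by `spin` for the random sequence need not coincide exactly with the sample used in the recombination example.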
Example: Recombination & Mutation
Columns: selected parents (| marks the crossover point), after crossover, after mutation, fitness.
0111011011 0111011011 0111111011 f = 8
0111011011 0111011011 0111011011 f = 7
110|1100010 1100101100 1100101100 f = 5
010|0101100 0101100010 0101100010 f = 4
1|100110011 1100110011 1100110011 f = 6
1|100110011 1100110011 1000110011 f = 5
0110001010 0110001010 0110001010 f = 4
1101011011 1101011011 1101011011 f = 7
011000|1010 0110001011 0110001011 f = 5
110101|1011 1101011010 1101011010 f = 6
TOTAL = 57
Lecture 2
24 April 2002
Schemata
Don’t care symbol: *
Example: a schema fixes some positions (here the symbols 1, 1 and 0) and leaves all other positions as *.
• order of a schema: o(S) = number of fixed positions
• defining length: $\delta(S)$ = distance between the first and the last fixed position
• a schema S matches $2^{l - o(S)}$ strings
• a string of length l is matched by $2^l$ schemata
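The schema quantities (order, defining length, number of matched strings, number of schemata matching a string) can be checked by brute force on short strings. The sketch below is illustrative only; the schema `1*1*0` and the string `10110` are arbitrary examples chosen here.

```python
from itertools import product

def order(schema):
    """o(S): number of fixed (non-*) positions."""
    return sum(1 for c in schema if c != "*")

def defining_length(schema):
    """Distance between the first and the last fixed position."""
    fixed = [i for i, c in enumerate(schema) if c != "*"]
    return fixed[-1] - fixed[0] if fixed else 0

def matches(schema, string):
    """True if the string agrees with the schema on every fixed position."""
    return len(schema) == len(string) and all(
        c == "*" or c == b for c, b in zip(schema, string))

schema = "1*1*0"                      # hypothetical example schema
l = len(schema)

# A schema S matches 2^(l - o(S)) strings.
strings = ["".join(bits) for bits in product("01", repeat=l)]
matched = [s for s in strings if matches(schema, s)]
assert len(matched) == 2 ** (l - order(schema))

# A string of length l is matched by 2^l schemata.
schemata = ["".join(symbols) for symbols in product("01*", repeat=l)]
count = sum(matches(h, "10110") for h in schemata)
assert count == 2 ** l

print(order(schema), defining_length(schema), len(matched), count)
```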
Implicit Parallelism
In a population of n individuals of length l:
$2^l \le$ # schemata processed $\le n 2^l$
$n^3$ of which are processed usefully (Holland 1989), i.e. are not disrupted by crossover and mutation.
But see Bertoni & Dorigo (1993), “Implicit Parallelism in Genetic Algorithms”, Artificial Intelligence 61(2), pp. 307–314.
Fitness of a schema
$f(S) = \dfrac{1}{q_x(S)} \sum_{\gamma \in S} q_x(\gamma) \, f(\gamma)$
$f(\gamma)$: fitness of string $\gamma$
$q_x(\gamma)$: fraction of strings equal to $\gamma$ in population x
$q_x(S)$: fraction of strings matched by S in population x
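As a worked example of the definition, the fitness of a schema is the average fitness of the strings it matches, weighted by how often each string occurs in the population. The sketch below assumes the MAXONE fitness and a tiny hypothetical population; `schema_fitness` is a name introduced here.

```python
from collections import Counter

def matches(schema, string):
    """True if the string agrees with the schema on every fixed position."""
    return all(c == "*" or c == b for c, b in zip(schema, string))

def schema_fitness(schema, population, f):
    """f(S) = (1 / q_x(S)) * sum over strings g matched by S of q_x(g) * f(g)."""
    n = len(population)
    q = {s: c / n for s, c in Counter(population).items()}       # q_x(g)
    q_s = sum(p for s, p in q.items() if matches(schema, s))     # q_x(S)
    if q_s == 0:
        raise ValueError("schema has no representative in the population")
    return sum(p * f(s) for s, p in q.items() if matches(schema, s)) / q_s

population = ["110", "110", "100", "011"]     # hypothetical population x
maxone = lambda s: s.count("1")               # MAXONE fitness

# Strings matched by 1**: 110 (twice) and 100, so f(S) = (2 + 2 + 1) / 3.
print(schema_fitness("1**", population, maxone))
```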
The Schema Theorem
$\{X_t\}_{t=0,1,\ldots}$: populations at times t; $f(X_t)$ is the average fitness of population $X_t$.
Suppose that $\dfrac{f(S) - f(X_t)}{f(X_t)} = c$ is constant. Then
$E[q_{X_t}(S) \mid X_0] \ge q_{X_0}(S) \, (1 + c)^t \left(1 - p_{cross} \dfrac{\delta(S)}{l - 1} - o(S) \, p_{mut}\right)^t$
i.e. above-average individuals increase exponentially!
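To get a feeling for the bound, it can be evaluated numerically. The sketch below assumes the usual form of the Schema Theorem, in which the expected share of S after t generations is at least q0 (1 + c)^t (1 - pcross δ(S)/(l - 1) - o(S) pmut)^t; the parameter values are hypothetical.

```python
def survival_bound(p_cross, p_mut, delta, order, length):
    """Lower bound on the probability that a schema survives crossover and mutation."""
    return 1.0 - p_cross * delta / (length - 1) - p_mut * order

def expected_share_bound(q0, c, t, p_cross, p_mut, delta, order, length):
    """Lower bound on the expected share of schema S after t generations."""
    growth = (1.0 + c) * survival_bound(p_cross, p_mut, delta, order, length)
    return q0 * growth ** t

# Hypothetical example: S = 111******* (order 3, defining length 2, l = 10).
params = dict(p_cross=0.7, p_mut=0.01, delta=2, order=3, length=10)
for c in (0.2, 0.5):
    per_generation = (1.0 + c) * survival_bound(**params)
    print(c, round(per_generation, 4),
          round(expected_share_bound(0.05, c, 5, **params), 4))
```

With these numbers the per-generation factor exceeds 1 only for the larger advantage c, which illustrates that the bound predicts growth only when the fitness advantage outweighs the disruption caused by crossover and mutation. Since it is only a lower bound, values above 1 carry no information.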
The Schema Theorem (proof)
$E[q_{X_t}(S) \mid X_{t-1}] \ge q_{X_{t-1}}(S) \, \dfrac{f(S)}{f(X_{t-1})} \, P_{surv}[S] = q_{X_{t-1}}(S) \, (1 + c) \, P_{surv}[S]$
$P_{surv}[S] = 1 - p_{cross} \dfrac{\delta(S)}{l - 1} - p_{mut} \, o(S)$
The Building Blocks Hypothesis
“An evolutionary algorithm seeks near-optimal performance through the juxtaposition of short, low-order, high-performance schemata, the building blocks.”
Deception
i.e. when the building blocks hypothesis does not hold: for some schema $S$, $\gamma^* \in S$ but $f(S) < f(\bar{S})$.
Example:
$\gamma^*$ = 1111111111
S1 = 111*******
S2 = ********11
S = 111*****11
$\bar{S}$ = 000*****00
Remedies to deception
Prior knowledge of the objective function
Non-deceptive encoding
Inversion
Semantics of genes not positional
“Messy Genetic Algorithms”
Underspecification & overspecification
Theoretical Background
• Theory of random processes;
• Convergence in probability;
• Open question: rate of convergence.
Events
[Figure: a sample space containing the events A, B and D]
Random Variables
A random variable is a function on the sample space: $X : \Omega \to \mathbb{R}$, $\omega \mapsto X(\omega)$.
Stochastic Processes
A stochastic process is a sequence of random variables $X_1, X_2, \ldots, X_t, \ldots$, each with its own probability distribution.
Notation: $\{X_t(\omega)\}_{t=0,1,\ldots}$
EAs as Random Processes
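• a population is a sample of size n: $x \in \Gamma^{(n)}$;
• probability space: $(\Omega, \mathcal{F}, P)$, where an outcome $\omega$ plays the role of the “random numbers” and determines a trajectory;
• evolutionary process: $\{X_t(\omega)\}_{t=0,1,\ldots}$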
Markov Chains
A stochastic process $\{X_t(\omega)\}_{t=0,1,\ldots}$ is a Markov chain iff its state depends only on the previous state, i.e., for every t,
$P[X_t = x \mid X_0, X_1, \ldots, X_{t-1}] = P[X_t = x \mid X_{t-1}]$
[Figure: a three-state Markov chain on the states A, B, C, with transition probabilities 0.4, 0.6, 0.3, 0.7, 0.25 and 0.75 on its edges]
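A Markov chain is easy to simulate, because the next state is drawn using only the current state. In the sketch below the three states and the six probabilities are those of the figure, but the assignment of probabilities to edges is an assumption, since it is not recoverable from the slide text.

```python
import random

# Hypothetical assignment of the figure's probabilities to edges.
# Each row lists the possible next states and sums to 1.
TRANSITIONS = {
    "A": {"A": 0.4, "B": 0.6},
    "B": {"A": 0.3, "C": 0.7},
    "C": {"B": 0.25, "C": 0.75},
}

def step(state, rng):
    """Draw the next state; the distribution depends only on the current state."""
    targets = list(TRANSITIONS[state])
    weights = [TRANSITIONS[state][t] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]

def simulate(start, steps, seed=0):
    """Return a trajectory of steps + 1 states beginning at start."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(step(path[-1], rng))
    return path

print("".join(simulate("A", 30)))
```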
Abstract Evolutionary Algorithm
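Stochastic functions: select (acting on populations in $\Gamma^{(n)}$), cross, mutate, mate, insert.
[Figure: from $X_t$, two applications of select feed cross (combined as mate), followed by mutate and insert, which yield $X_{t+1}$]
Transition function: $X_{t+1}(\omega) = T_t(\omega) \, X_t(\omega)$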
Convergence to Optimum
Theorem: if $\{X_t(\omega)\}_{t=0,1,\ldots}$ is monotone and homogeneous, $x_0$ is given, and from every $y$ in reach($x_0$) the set $\Gamma_O^{(n)}$ is reachable, then
$\lim_{t \to \infty} P[X_t \in \Gamma_O^{(n)} \mid X_0 = x_0] = 1.$
Theorem: if select and mutate are generous, the neighborhood structure is connective, and the transition functions $T_t(\omega)$, t = 0, 1, ..., are i.i.d. and elitist, then
$\lim_{t \to \infty} P[X_t \in \Gamma_O^{(n)}] = 1.$
Lecture 3
7 May 2002
Outline of various techniques
• Plain Genetic Algorithms
• Evolutionary Programming
• Evolution Strategies
• Genetic Programming
Plain Genetic Algorithms
• Individuals are bit strings
• Mutation as transcription error
• Recombination is crossover
• Fitness proportionate selection
Evolutionary Programming
• Individuals are finite-state automata
• Used to solve prediction tasks
• State-transition table modified by uniform random mutation
• No recombination
• Fitness depends on the number of correct predictions
• Truncation selection
Evolutionary Programming: Individuals
Finite-state automaton: $(Q, q_0, A, \Sigma, \delta, \omega)$
• set of states $Q$;
• initial state $q_0$;
• set of accepting states $A$;
• alphabet of symbols $\Sigma$;
• transition function $\delta : Q \times \Sigma \to Q$;
• output mapping function $\omega : Q \times \Sigma \to \Sigma$.
[Figure: a state-transition table for the states q0, q1, q2 and the input symbols a, b, c, together with the equivalent state diagram, whose edges carry input/output labels such as a/b, b/c and c/a]
Evolutionary Programming: Fitness
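Input sequence: a b c a b c a b
The individual $\gamma$ reads the sequence one symbol at a time and outputs a prediction of the next symbol. The prediction is compared with the symbol that actually follows (b = ?):
• yes: $f(\gamma) = f(\gamma) + 1$
• no: the fitness is left unchanged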
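The prediction-based fitness can be sketched with a finite-state machine stored as a dictionary that maps (state, input symbol) to (next state, output symbol), where the output is read as the prediction of the next input. Both machines below are hypothetical examples, not the automaton drawn on the slides.

```python
def prediction_fitness(machine, sequence, start="q0"):
    """Count how often the machine's output equals the symbol that follows."""
    state, score = start, 0
    for current, following in zip(sequence, sequence[1:]):
        state, predicted = machine[(state, current)]
        if predicted == following:
            score += 1
    return score

# Hypothetical automaton that always predicts the successor in the cycle a, b, c.
CYCLIC = {
    ("q0", "a"): ("q1", "b"), ("q0", "b"): ("q0", "c"), ("q0", "c"): ("q0", "a"),
    ("q1", "a"): ("q1", "b"), ("q1", "b"): ("q2", "c"), ("q1", "c"): ("q0", "a"),
    ("q2", "a"): ("q1", "b"), ("q2", "b"): ("q0", "c"), ("q2", "c"): ("q0", "a"),
}

# Hypothetical one-state automaton that always predicts "a".
CONSTANT = {("q0", symbol): ("q0", "a") for symbol in "abc"}

sequence = "abcabcab"
print(prediction_fitness(CYCLIC, sequence), prediction_fitness(CONSTANT, sequence))
```

A sequence of eight symbols gives seven predictions to check, so the maximum attainable score here is 7.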
Evolutionary Programming: Selection
Variant of stochastic q-tournament selection:
• each individual $\gamma$ is compared with q opponents $\gamma_1, \gamma_2, \ldots, \gamma_q$;
• score($\gamma$) = #{i | f($\gamma$) > f($\gamma_i$)};
• order individuals by decreasing score;
• select the first half (truncation selection).
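A possible reading of this selection scheme in Python is given below. How the q opponents are drawn is not stated on the slide; here they are sampled uniformly with replacement from the whole population, which is an assumption, as are the function and parameter names.

```python
import random

def ep_select(population, f, q, rng):
    """Score each individual against q random opponents and keep the best half."""
    scored = []
    for individual in population:
        opponents = [rng.choice(population) for _ in range(q)]   # assumption: uniform, with replacement
        score = sum(1 for opponent in opponents if f(individual) > f(opponent))
        scored.append((score, individual))
    scored.sort(key=lambda pair: pair[0], reverse=True)          # decreasing score
    return [individual for _, individual in scored[: len(population) // 2]]

rng = random.Random(0)
population = list(range(10))            # toy individuals whose fitness is their value
survivors = ep_select(population, lambda x: x, q=5, rng=rng)
print(survivors)
```

For small q the outcome is noisy, so weak individuals occasionally survive; as q grows the scheme approaches plain truncation selection on fitness.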
Evolution Strategies
• Individuals are n-dimensional vectors of reals
• Fitness is the objective function
• Mutation distribution can be part of the genotype (standard deviations and covariances evolve with solutions)
• Multi-parent recombination
• Deterministic selection (truncation selection)
Evolution Strategies: Individuals
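An individual is a triple $a = (x, \sigma, \alpha)$:
• $x$: candidate solution;
• $\sigma$: standard deviations;
• $\alpha$: rotation angles, with $\alpha_{ij} = \dfrac{1}{2} \arctan \dfrac{2 \, \mathrm{cov}(i, j)}{\sigma_i^2 - \sigma_j^2}$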
Evolution Strategies: Mutation
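Self-adaptation:
$\sigma_i' = \sigma_i \exp(\tau' N(0, 1) + \tau N_i(0, 1))$
$\alpha_j' = \alpha_j + \beta N_j(0, 1)$
$x' = x + N(0, \sigma', \alpha')$
Hans-Paul Schwefel suggests:
$\tau \propto \left(\sqrt{2 \sqrt{n}}\right)^{-1}$, $\tau' \propto \left(\sqrt{2n}\right)^{-1}$, $\beta \approx 0.0873 = 5°$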
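The self-adaptive mutation can be sketched for the simpler uncorrelated case, in which only the standard deviations are part of the genotype and the rotation angles are left out. That simplification, the proportionality constant 1 for the two learning rates and all names are assumptions of this sketch.

```python
import math
import random

def es_mutate(x, sigma, rng):
    """Self-adaptive mutation without rotation angles (uncorrelated case).

    First the standard deviations are mutated log-normally, then the
    solution is perturbed using the NEW standard deviations.
    """
    n = len(x)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))    # per-coordinate learning rate
    tau_prime = 1.0 / math.sqrt(2.0 * n)         # global learning rate
    common = tau_prime * rng.gauss(0.0, 1.0)     # one draw shared by all coordinates
    new_sigma = [s * math.exp(common + tau * rng.gauss(0.0, 1.0)) for s in sigma]
    new_x = [xi + si * rng.gauss(0.0, 1.0) for xi, si in zip(x, new_sigma)]
    return new_x, new_sigma

rng = random.Random(0)
x, sigma = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
for _ in range(3):
    x, sigma = es_mutate(x, sigma, rng)
    print([round(v, 3) for v in x], [round(s, 3) for s in sigma])
```

Because the step sizes are multiplied by an exponential, they stay positive without any explicit clipping.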
Genetic Programming
• Program induction
• LISP (historically), math expressions, machine language, ...
• Applications:
– optimal control;
– planning;
– sequence induction;
– symbolic regression;
– modelling and forecasting;
– symbolic integration and differentiation;
– inverse problems
Genetic Programming: The Individuals
subset of LISP S-expressions
(OR (AND (NOT d0) (NOT d1)) (AND d0 d1))
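[Figure: the same expression drawn as a tree: OR at the root with two AND subtrees; the first AND has the children (NOT d0) and (NOT d1), the second has the leaves d0 and d1]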
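An S-expression individual can be represented in Python as nested tuples and evaluated recursively. The sketch below encodes the expression from the slide; the tuple encoding and the `evaluate` function are illustrative choices, not part of the original material.

```python
from itertools import product

# (OR (AND (NOT d0) (NOT d1)) (AND d0 d1)) as nested tuples.
EXPR = ("OR",
        ("AND", ("NOT", "d0"), ("NOT", "d1")),
        ("AND", "d0", "d1"))

def evaluate(expr, env):
    """Evaluate a Boolean expression tree; leaves are variable names looked up in env."""
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    values = [evaluate(arg, env) for arg in args]
    if op == "NOT":
        return not values[0]
    if op == "AND":
        return all(values)
    if op == "OR":
        return any(values)
    raise ValueError(f"unknown function: {op}")

# Truth table: the expression is true exactly when d0 and d1 are equal.
for d0, d1 in product([False, True], repeat=2):
    print(d0, d1, evaluate(EXPR, {"d0": d0, "d1": d1}))
```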
Genetic Programming: Initialization
[Figure: examples of randomly generated initial trees built from the functions OR, AND, NOT and the terminals d0, d1]
Genetic Programming: Crossover
[Figure: two parent trees over OR, AND, NOT, d0, d1 and the two offspring trees obtained by exchanging a subtree between them]
Genetic Programming: Other Operators
• Mutation: replace a terminal with a subtree
• Permutation: change the order of arguments to a function
• Editing: simplify S-expressions, e.g. (AND X X) → X
• Encapsulation: define a new function using a subtree
• Decimation: throw away most of the population
Genetic Programming: Fitness
Fitness cases: j = 1, ..., $N_e$
“Raw” fitness: $r(\gamma) = \sum_{j=1}^{N_e} |\mathrm{Output}(\gamma, j) - C(j)|$
“Standardized” fitness: $s(\gamma) \in [0, +\infty)$
“Adjusted” fitness: $a(\gamma) = \dfrac{1}{1 + s(\gamma)}$
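The three fitness measures can be illustrated on a toy symbolic regression task. In this sketch the raw fitness is the sum of absolute errors over the fitness cases, and the standardized fitness is taken to be equal to the raw fitness, which is an assumption that is natural when raw fitness is an error to be minimized. The target function and the candidate program are hypothetical.

```python
def raw_fitness(program, cases):
    """Sum over the fitness cases of |Output(program, j) - C(j)|."""
    return sum(abs(program(x) - target) for x, target in cases)

def adjusted_fitness(standardized):
    """a = 1 / (1 + s): lies in (0, 1], with 1 for a perfect program."""
    return 1.0 / (1.0 + standardized)

# Hypothetical fitness cases sampled from the target x^2 + x.
cases = [(x, x * x + x) for x in range(-2, 3)]
candidate = lambda x: x * x                   # a candidate program missing the "+ x" term

r = raw_fitness(candidate, cases)             # |-2| + |-1| + 0 + 1 + 2
s = r                                         # assumption: standardized = raw for error measures
print(r, adjusted_fitness(s))
```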
Sample Application: Myoelectric Prosthesis Control
• Control of an upper arm prosthesis
• Genetic Programming application
• Recognize thumb flexion, extension and abduction patterns
Prosthesis Control: The Context
[Figure: processing pipeline] human arm → myoelectric signals → measure (2 electrodes) → raw myo-measurements → preprocess → myo-signal features → deduce intentions → human motion → map into goal → robot motion → convert → actuator commands → robot arm
Timing annotation in the figure: 150 ms
Prosthesis Control: Terminals
Features for electrodes 1, 2:
• Mean absolute value (MAV)
• Mean absolute value slope (MAVS)
• Number of zero crossings (ZC)
• Number of slope sign changes (SC)
• Waveform length (LEN)
• Average value (AVG)
• Up slope (UP)
• Down slope (DOWN)
• MAV1/MAV2, MAV2/MAV1
• 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 0.01, -1.0
Prosthesis Control: Function Set
Addition x + y
Subtraction x - y
Multiplication x * y
Division x / y (protected for y=0)
Square root sqrt(|x|)
Sine sin x
Cosine cos x
Tangent tan x (protected for x = π/2)
Natural logarithm ln |x| (protected for x=0)
Common logarithm log |x| (protected for x=0)
Exponential exp x
Power function x ^ y
Reciprocal 1/x (protected for x=0)
Absolute value |x|
Integer or truncate int(x)
Sign sign(x)
Prosthesis Control: Fitness
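[Figure: the result axis is divided into three intervals (type 1, type 2, type 3) separated by undefined regions; 22 signals per motion]
The raw fitness $r(\gamma)$ is scaled to 100 and built from two quantities: the separation, i.e. the minimum of the pairwise distances between the abduction, extension and flexion results (abduction/extension, abduction/flexion, extension/flexion), and the spread of the results within each of the three motions.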
Myoelectric Prosthesis Control Reference
• Jaime J. Fernandez, Kristin A. Farry and John B. Cheatham. “Waveform Recognition Using Genetic Programming: The Myoelectric Signal Recognition Problem”. GP ‘96, The MIT Press, pp. 63–71.
Classifier Systems (Michigan approach)
individual: a single rule, e.g. IF X = A AND Y = B THEN Z = D
The population is a set of such IF ... THEN ... rules.
Fitness: each rule is evaluated on the examples n, comparing the rule's prediction with class(n); the reward r depends on $N_R$, the number of attributes in the antecedent part.
Lecture 4
14 May 2002
Practical Implementation Issues
• from elegant academia to not-so-elegant but robust and efficient real-world applications: evolution programs
• handling constraints
• hybridization
• parallel and distributed algorithms
Evolution Programs
Slogan: Genetic Algorithms + Data Structures = Evolution Programs
Key ideas:
• use a data structure as close as possible to the object problem
• write appropriate genetic operators
• ensure that all genotypes correspond to feasible solutions
• ensure that genetic operators preserve feasibility
Encodings: “Pie” Problems
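A genotype has four genes W, X, Y, Z, each in the range 0–255; the share of the pie assigned to a gene is its value divided by the sum of all genes.
Example: W = 128, X = 32, Y = 90, Z = 20
X = 32/270 = 11.85%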
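The decoding used in the pie example amounts to normalizing the gene values by their sum, so every genotype corresponds to a feasible split of the pie. A minimal sketch, with the function name and the error handling for an all-zero genotype as assumptions:

```python
def decode_pie(genes):
    """Map raw gene values (e.g. 0-255 each) to shares that sum to 1."""
    total = sum(genes)
    if total == 0:
        raise ValueError("at least one gene must be non-zero")
    return [g / total for g in genes]

shares = decode_pie([128, 32, 90, 20])            # W, X, Y, Z from the example
for name, share in zip("WXYZ", shares):
    print(name, f"{100 * share:.2f}%")
```

Crossover and mutation can then act freely on the raw genes, since the normalization restores feasibility after every change.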
Encodings: “Permutation” Problems
Tour: 1 - 2 - 4 - 3 - 8 - 5 - 9 - 6 - 7
Adjacency Representation: (2, 4, 8, 3, 9, 7, 1, 5, 6)
Ordinal Representation: (1, 1, 2, 1, 4, 1, 3, 1, 1)
Path Representation: (1, 2, 4, 3, 8, 5, 9, 6, 7)
Sorting Representation: (-23, -6, 2, 0, 19, 32, 85, 11, 25)
Matrix Representation: a 9 × 9 binary matrix
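The adjacency, ordinal and sorting representations can each be decoded back to the path representation, which is a convenient way to check that they describe the same tour. The decoders below follow the usual definitions of these encodings (entry i of the adjacency list is the city visited after city i; the ordinal list gives positions in a shrinking reference list; the sorting list gives one key per city); the function names are introduced here.

```python
def from_adjacency(adjacency, start=1):
    """Entry i (1-based) is the city visited immediately after city i."""
    tour, city = [], start
    for _ in adjacency:
        tour.append(city)
        city = adjacency[city - 1]
    return tour

def from_ordinal(ordinal):
    """Each entry is a 1-based position in the list of cities not yet visited."""
    remaining = list(range(1, len(ordinal) + 1))
    return [remaining.pop(k - 1) for k in ordinal]

def from_sorting(keys):
    """Cities are visited in increasing order of their keys."""
    return sorted(range(1, len(keys) + 1), key=lambda city: keys[city - 1])

path = [1, 2, 4, 3, 8, 5, 9, 6, 7]
print(from_adjacency([2, 4, 8, 3, 9, 7, 1, 5, 6]) == path)
print(from_ordinal([1, 1, 2, 1, 4, 1, 3, 1, 1]) == path)
print(from_sorting([-23, -6, 2, 0, 19, 32, 85, 11, 25]) == path)
```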
Handling Constraints
• Penalty functions: risk of spending most of the time evaluating unfeasible solutions, of sticking with the first feasible solution found, or of finding an unfeasible solution that scores better than the feasible ones
• Decoders or repair algorithms: computationally intensive, tailored to the particular application
• Appropriate data structures and specialized genetic operators: all possible genotypes encode feasible solutions
Penalty Functions
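[Figure: the search space, the feasible region S with cost function c, and the penalty P applied to solutions outside it]
The fitness of a genotype $\gamma$ that decodes to the solution $z$ combines the evaluation $\mathrm{Eval}(c(z))$ with a penalty term $P(z)$, where
$P(z) = w(t) \sum_i w_i \, \Phi_i(z)$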
Decoders / Repair Algorithms
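[Figure: recombination and mutation may produce genotypes outside the feasible region S; a decoder or repair algorithm maps them back to feasible solutions, which are then evaluated with c]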
Hybridization
1) Seed the population with solutions provided by some heuristics: heuristics → initial population
2) Use local optimization algorithms as genetic operators (Lamarckian mutation)
3) Encode the parameters of a heuristic: genotype → heuristic → candidate solution
Sample Application: Unit Commitment
• Multiobjective optimization problem: cost vs. emission
• Many linear and non-linear constraints
• Traditionally approached with dynamic programming
• Hybrid evolutionary/knowledge-based approach
• A flexible decision support system for planners
• Solution time increases linearly with the problem size
The Unit Commitment Problem
Cost:
$C_i(P_i) = a_i + b_i P_i + c_i P_i^2$
$z_\$ = \sum_{i=1}^{n} \left( C_i(P_i) + SU_i + SD_i + HS_i \right)$
Emissions:
$E_{ij}(P_i) = \alpha_{ij} + \beta_{ij} P_i + \gamma_{ij} P_i^2$
$E_i(P_i) = \sum_{j=1}^{m} E_{ij}(P_i)$
$z_E = \sum_{i=1}^{n} E_i(P_i)$
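The two objectives can be evaluated directly from the quadratic models. The sketch below assumes that the cost objective sums, over the units, the quadratic fuel cost plus the terms SU, SD and HS, and that the emission objective sums a quadratic emission curve per unit and pollutant. All coefficient values, dictionary keys and function names are hypothetical.

```python
def fuel_cost(a, b, c, power):
    """C_i(P_i) = a_i + b_i * P_i + c_i * P_i^2"""
    return a + b * power + c * power * power

def total_cost(units):
    """Cost objective: fuel cost plus the SU, SD and HS terms, summed over units."""
    return sum(fuel_cost(u["a"], u["b"], u["c"], u["P"]) + u["SU"] + u["SD"] + u["HS"]
               for u in units)

def total_emission(units):
    """Emission objective: one quadratic curve per unit and per pollutant."""
    return sum(alpha + beta * u["P"] + gamma * u["P"] ** 2
               for u in units
               for alpha, beta, gamma in u["pollutants"])

# Hypothetical two-unit example.
units = [
    {"a": 100.0, "b": 2.0, "c": 0.01, "P": 50.0, "SU": 20.0, "SD": 0.0, "HS": 0.0,
     "pollutants": [(1.0, 0.1, 0.001)]},
    {"a": 80.0, "b": 3.0, "c": 0.02, "P": 10.0, "SU": 0.0, "SD": 0.0, "HS": 5.0,
     "pollutants": [(0.5, 0.2, 0.002), (0.1, 0.0, 0.0)]},
]

print(round(total_cost(units), 2), round(total_emission(units), 2))
```

Keeping the two totals separate, rather than merging them into one weighted sum, matches the multiobjective framing of the problem.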
Predicted Load Curve
[Figure: predicted load curve, showing the Load and the Spinning Reserve; vertical axis from 0 to 45]
Unit Commitment: Constraints
• Power balance requirement
• Spinning reserve requirement
• Unit maximum and minimum output limits
• Unit minimum up and down times
• Power rate limits
• Unit initial conditions
• Unit status restrictions
• Plant crew constraints
• ...
Unit Commitment: Encoding
[Figure: the genotype is a table with one column per unit (Unit 1 to Unit 4) and one row per hour (00:00 to 09:00); each entry is a number between 0.0 and 1.0, and the table is decoded with the help of a Fuzzy Knowledge Base]
Unit Commitment: Solution
[Figure: the decoded schedule, with one column per unit (Unit 1 to Unit 4) and one row per hour (00:00 to 09:00); each cell shows the unit status: down, hot stand-by, starting, shutting down, or up]
Unit Commitment: Selection
competitive selection:
[Figure: candidate schedules plotted by cost ($) against emission; the cost values shown are $507,762 and $516,511, the emission values 213,489 £ and 60,080 £]
Unit Commitment References
• D. Srinivasan, A. Tettamanzi. “An Integrated Framework for Devising Optimum Generation Schedules”. In Proceedings of the 1995 IEEE International Conference on Evolutionary Computing (ICEC ‘95), vol. 1, pp. 1-4.
• D. Srinivasan, A. Tettamanzi. A Heuristic-Guided Evolutionary Approach to Multiobjective Generation Scheduling. IEE Proceedings Part C - Generation, Transmission, and Distribution, 143(6):553-559, November 1996.
• D. Srinivasan, A. Tettamanzi. An Evolutionary Algorithm for Evaluation of Emission Compliance Options in View of the Clean Air Act Amendments. IEEE Transactions on Power Systems, 12(1):336-341, February 1997.
Lecture 5
15 May 2002
Parallel Evolutionary Algorithms
• The standard evolutionary algorithm is stated as a sequential procedure...
• ... but evolutionary algorithms are intrinsically parallel
• Various models:
– cellular evolutionary algorithm
– fine-grained parallel evolutionary algorithm (grid)
– coarse-grained parallel evolutionary algorithm (islands)
– sequential evolutionary algorithm with parallel fitness evaluation (master - slave)
Island Model
Sample Application: Protein Folding
• Finding 3-D geometry of a protein to understand its functionality
• Very difficult: one of the “grand challenge problems”
• Standard GA approach
• Simplified protein model
Protein Folding: The Problem
• Much of a protein’s function may be derived from its conformation (3-D geometry or “tertiary” structure).
• Magnetic resonance & X-ray crystallography are currently used to view the conformation of a protein:
– expensive in terms of equipment, computation and time;
– require isolation, purification and crystallization of protein.
• Prediction of the final folded conformation of a protein chain has been shown to be NP-hard.
• Current approaches:
– molecular dynamics modelling (brute force simulation);
– statistical prediction;
– hill-climbing search techniques (simulated annealing).
Protein Folding: Simplified Model
• 90° lattice (6 degrees of freedom at each point);
• Peptides occupy intersections;
• No side chains;
• Hydrophobic or hydrophilic (no relative strengths) amino acids;
• Only hydrophobic/hydrophilic forces considered;
• Adjacency considered only in cardinal directions;
• Cross-chain hydrophobic contacts are the basis for evaluation.
Protein Folding: Representation
relative move encoding:
UP DOWN FORWARD LEFT UP RIGHT ...
preference order encoding:
UP LEFT RIGHT DOWN FORWARD
DOWN LEFT UP FORWARD RIGHT
FORWARD UP DOWN LEFT RIGHT
LEFT DOWN FORWARD UP RIGHT
...
Protein Folding: Fitness
Decode: plot the course encoded by the genotype.
Test each occupied cell:
• any collisions: -2;
• no collisions AND a hydrophobe in an adjacent cell: 1.
Notes:
• for each contact: +2;
• adjacent hydrophobes not discounted in the scoring;
• multiple collisions (>1 peptides in one cell): -2;
• hydrophobe collisions imply an additional penalty (no contacts are scored).
Protein Folding: Experiments
• Preference ordering encoding;
• Two-point crossover with a rate of 95%;
• Bit mutation with a rate of 0.1%;
• Population size: 1000 individuals;
• Crowding and incest reduction;
• Test sequences with known minimum configuration.
Protein Folding References
• S. Schulze-Kremer. “Genetic Algorithms for Protein Tertiary Structure Prediction”. PPSN 2, North-Holland 1992.
• R. Unger and J. Moult. “A Genetic Algorithm for 3D Protein Folding Simulations”. ICGA-5, 1993, pp. 581–588.
• Arnold L. Patton, W. F. Punch III and E. D. Goodman. “A Standard GA Approach to Native Protein Conformation Prediction”. ICGA 6, 1995, pp. 574–581.
Sample Application: Drug Design
Purpose: given a chemical specification (activity), design a tertiary structure complying with it.
Requirement: a quantitative structure-activity relationship model.
Example: design ligands that can bind targets specifically and selectively. Complementary peptides.
Drug Design: Implementation
individual: N L H A F G L F K A (a sequence of amino acids)
amino acid (residue):
• name
• hydropathic value
Operators:
• Hill-climbing Crossover
• Hill-climbing Mutation
• Reordering (no selection)
Selection: implicit
Drug Design: Fitness
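target a, complement b
Moving average hydropathy:
$a_k = \sum_{i=k-s}^{k+s} h_i$, $b_k = \sum_{i=k-s}^{k+s} g_i$, for $k = s, \ldots, n - s$
where $h_i$ and $g_i$ are the hydropathy of the residues of the target and of the complement, and n is the number of residues in the target.
$Q = \sqrt{\dfrac{\sum_i (a_i + b_i)^2}{n - 2s}}$ (lower Q = better complementarity)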
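The complementarity score can be sketched as follows. The sketch assumes that the windowed hydropathy profiles of target and complement are added (so that a perfect complement has the opposite profile), that Q is the root of the mean squared sum over the n - 2s full windows, and that windows are unnormalized sums; the hydropathy values and function names are hypothetical.

```python
import math

def moving_sums(values, s):
    """Sum of values[k-s .. k+s] for every position k that has a full window."""
    return [sum(values[k - s: k + s + 1]) for k in range(s, len(values) - s)]

def complementarity(target_h, complement_g, s):
    """Q = sqrt( sum_i (a_i + b_i)^2 / (n - 2s) ); lower means better complementarity."""
    a = moving_sums(target_h, s)
    b = moving_sums(complement_g, s)
    n = len(target_h)
    return math.sqrt(sum((ai + bi) ** 2 for ai, bi in zip(a, b)) / (n - 2 * s))

# Hypothetical hydropathy values for a five-residue target.
target = [1.8, -4.5, 2.5, -3.5, 4.2]
mirror = [-h for h in target]            # exactly opposite hydropathy profile

print(complementarity(target, mirror, s=1))     # a perfect complement gives Q = 0
print(complementarity(target, target, s=1))     # an identical profile is penalized
```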
Drug Design: Results
[Figure: hydropathic value (vertical axis, from -6 to 4) plotted against amino acid position (horizontal axis, from 0 to 16) for three profiles: Fassina, GA and Target. Sequence: FANSGNVYFGIIAL]
Drug Design References
• T. S. Lim. A Genetic Algorithms Approach for Drug Design. MS Dissertation, Oxford University, Computing Laboratory, 1995.
• A. L. Parrill. Evolutionary and Genetic Methods in Drug Design. Drug Discovery Today, Vol. 1, No. 12, Dec 1996, pp. 514–521.