View
229
Download
0
Tags:
Embed Size (px)
Citation preview
John R. KozaConsulting Professor (Medical Informatics)
Department of Medicine
School of Medicine
Consulting Professor
Department of Electrical Engineering
School of Engineering
Stanford University
Stanford, California 94305
http://www.smi.stanford.edu/people/koza/
DEFINITION OF THE GENETIC ALGORITHM (GA)
The genetic algorithm is a probabalistic search algorithm that iteratively transforms a set (called a population) of mathematical objects (typically fixed-length binary character strings), each with an associated fitness value, into a new population of offspring objects using the Darwinian principle of natural selection and using operations that are patterned after naturally occurring genetic operations, such as crossover (sexual recombination) and mutation.
GENETIC ALGORITHM (GA)
Generation 0 Generation 1
Individuals Fitness Offspring
011 $3 111
001 $1 010
110 $6 110
010 $2 010
HAMBURGER RESTAURANT PROBLEM
• Price1 = $ 0.50 price0 = $10.00 price
• Drink1 = Coca Cola0 = Wine
• Ambiance1 = Fast snappy service0 = Leisurely service with tuxedoed waiter
THE SEARCH SPACE
• Alphabet size K=2, Length L=3
• Size of search space: KL=2L=23=8
1 000
2 001
3 010
4 011
5 100
6 101
7 110
8 111
IMPRACTICALITY OF RANDOM OR ENUMERATIVE SEARCH
• 81-bit problems are very small for GA
• However, even if L is as small as 81, 281 ~ 1027 = number of nanoseconds since the beginning of the universe 15 billion years ago
DEFINITION OF THE GENETIC ALGORITHM (GA)
The genetic algorithm is a probabalistic search algorithm that iteratively transforms a set (called a population) of mathematical objects (typically fixed-length binary character strings), each with an associated fitness value, into a new population of offspring objects using the Darwinian principle of natural selection and using operations that are patterned after naturally occurring genetic operations, such as crossover (sexual recombination) and mutation.
PROBABILISTIC SELECTION BASED ON FITNESS
• Better individuals are preferred• Best is not always picked• Worst is not necessarily excluded• Nothing is guaranteed• Mixture of greedy exploitation and adventurous exploration• Similarities to simulated annealing (SA)
DARWINIAN FITNESS PROPORTIONATE SELECTION
Generation 0 Mating pool
1 011 3 .25 011 3
2 001 1 .08 110 6
3 110 6 .50 110 6
4 010 2 .17 010 2
Total 12 17
Worst 1 2
Average 3.00 4.5
Best 6 6
DEFINITION OF THE GENETIC ALGORITHM (GA)
The genetic algorithm is a probabalistic search algorithm that iteratively transforms a set (called a population) of mathematical objects (typically fixed-length binary character strings), each with an associated fitness value, into a new population of offspring objects using the Darwinian principle of natural selection and using operations that are patterned after naturally occurring genetic operations, such as crossover (sexual recombination) and mutation.
MUTATION OPERATION
• Parent chosen probabilistically based on fitness
• Mutation point chosen at random
• One offspring
Parent
010
Parent
--0
Offspring
011
AFTER MUTATION OPERATION
Generation 0 Mating pool Generation 1
1 011 3 .25 011 3
2 001 1 .08 110 6
3 110 6 .50 110 6
4 010 2 .17 010 2 --- 011 3
Total 12 17
Worst 1 2
Average 3.00 4.5
Best 6 6
CROSSOVER (CONTINUED)• Interstitial point picked at random
• 2 remainders
• 2 offspring produced by crossover
Remainder 1 Remainder 2
- - 1 - - 0
Offspring 1 Offspring 2
111 010
Fragment 1 Fragment 2
01- 11-
AFTER CROSSOVER OPERATION
Generation 0 Mating pool Generation 1
1 011 3 .25 011 3 2 111 7
2 001 1 .08 110 6 2 010 2
3 110 6 .50 110 6
4 010 2 .17 010 2
Total 12 17
Worst 1 2
Average 3.00 4.5
Best 6 6
AFTER REPRODUCTION OPERATION
Generation 0 Mating pool Generation 1
1 011 3 .25
2 001 1 .08
3 110 6 .50 110 6 --- 110 6
4 010 2 .17
Total 12 17
Worst 1 2
Average 3.00 4.5
Best 6 6
DEFINITION OF THE GENETIC ALGORITHM (GA)
The genetic algorithm is a probabalistic search algorithm that iteratively transforms a set (called a population) of mathematical objects (typically fixed-length binary character strings), each with an associated fitness value, into a new population of offspring objects using the Darwinian principle of natural selection and using operations that are patterned after naturally occurring genetic operations, such as crossover (sexual recombination) and mutation.
GENERATION 1
Generation 0 Mating pool Generation 1
1 011 3 .25 011 3 2 111 7
2 001 1 .08 110 6 2 010 2
3 110 6 .50 110 6 --- 110 6
4 010 2 .17 010 2 --- 011 3
Total 12 17 18
Worst 1 2 2
Average 3.00 4.5 4.5
Best 6 6 7
DEFINITION OF THE GENETIC ALGORITHM (GA)
The genetic algorithm is a probabalistic search algorithm that iteratively transforms a set (called a population) of mathematical objects (typically fixed-length binary character strings), each with an associated fitness value, into a new population of offspring objects using the Darwinian principle of natural selection and using operations that are patterned after naturally occurring genetic operations, such as crossover (sexual recombination) and mutation.
DEFINITION OF THE GENETIC ALGORITHM (GA)
The genetic algorithm is a probabalistic search algorithm that iteratively transforms a set (called a population) of mathematical objects (typically fixed-length binary character strings), each with an associated fitness value, into a new population of offspring objects using the Darwinian principle of natural selection and using operations that are patterned after naturally occurring genetic operations, such as crossover (sexual recombination) and mutation.
PROBABILISTIC STEPS
• The initial population is typically random• Probabilistic selection based on fitness
- Best is not always picked- Worst is not necessarily excluded
• Random picking of mutation and crossover points
• Often, there is probabilistic scenario as part of the fitness measure
ANTENNA DESIGN
• The problem (Altshuler and Linden 1998) is to determine the x-y-z coordinates of the 3-dimensional position of the ends (X1, Y1, Z1, X2, Y2, Z2,… , X7, Y7, Z7) of 7 straight wires so that the resulting 7-wire antenna satisfies certain performance requirements
• The first wire starts at feed point (0, 0, 0) in the middle of the ground plane
• The antenna must fit inside the 0.5 cube
ANTENNA GENOME
• 105-bit chromosome (genome)• Each x-y-z coordinate is represented by 5 bits
(4-bit granularity for data plus a sign bit)• Total chromosome is 3 7 5 = 105 bits
X1 Y1 Z1 X2 Y2 Z2 …
+0010 -1110 +0001 +0011 -1011 +0011 …
ANTENNA FITNESS
• Antenna is for ground-to-satellite communications for cars and handsets
• We desire near-uniform gain pattern 10 above the horizon
• Fitness is measured based on the antenna's radiation pattern. The radiation pattern is simulated by National Electromagnetics Code (NEC)
ANTENNA FITNESS
• Fitness is sum of the squares of the difference between the average gain and the antenna's gain
• Sum is taken for angles between -90 and +90 and all azimuth angles from 0 to 180
• The smaller the value of fitness, the better
10-MEMBER TRUSS
• Prespecified topological arrangement of the 10 members, the load, and the wall (Goldberg and Samtani 1986)
• Truss has 10 members (6 are length of 30 feet and 4 are length 30√2 = 41 feet)
• The problem is to determine the cross-sectional areas (A1, … , A10) of each of the 10 members so as to minimize weight of the material for a truss that supports the 2 loads
• The weight is based on volume (i.e., cross-sectional area length)
TRUSS GENOME
• 40-bit chromosome (genome)• 4-bit granularity for truss diameters• 0000 = smallest diameter• 1111 = largest diameter• Total chromosome is 4 10 = 40 bits
A1 A2 A3 A4 A5 A6 A7 A8 A9 A10
0010 1110 0001 0011 1011 0011 1111 0011 0011 1010
TRUSS FITNESS
• Two-part (multiobjective) fitness measure– First, fitness is computed by taking the sum,
over the 10 members, of the cross-sectional area of each member times the length of each member (30 feet or 30√2 = 41 feet).
– Second, a penalty (up to 10%) is imposed for violating the stress constraints. Stresses are computed using standard mechanical engineering techniques.
• The smaller the total fitness, the better
STATE TRANSITION TABLE
# WWW WW W X E EE EEE Rule
0 0 0 0 0 0 0 1 a0
1 0 0 0 0 0 1 0 a1
2 0 0 0 0 1 0 0 a2
3 0 0 0 0 1 1 0 a3
4 0 0 1 1 0 0 0 a4
… … … … … … … … …
127 1 1 1 1 1 1 1 a127
PROBLEM-SPECIFIC GENOMES
N M GENOME
1 1 0 1 1 1 1
1 1 1 1 0 1 1
1 0 1 1 1 0 0
1 1 1 1 1 1 1
1 1 0 1 1 1 1
1 1 1 1 0 1 1
1 1 0 1 1 1 0
GENETIC ALGORITHM USING VARIABLE-LENGTH STRINGS
• 5-WIRE ANTENNA (5 15 = 75 bits)
• 4-WIRE ANTENNA (4 15 = 60 bits)
X1 Y1 Z1 … X5 Y5 Z5
+0010 -1110 +0001 … +0010 -1110 +0001
X1 Y1 Z1 … X4 Y4 Z4
+1010 -0110 +1101 … +1010 -0110 +1001
THE CHALLENGE
"How can computers learn to solve problems without being explicitly programmed? In other words, how can computers be made to do what is needed to be done, without being told exactly how to do it?"
Attributed to Arthur Samuel (1959)
CRITERION FOR SUCCESS
"The aim [is] ... to get machines to exhibit behavior, which if done by humans, would be assumed to involve the use of intelligence.“
Arthur Samuel (1983)
REPRESENTATIONS
• Decision trees• If-then production
rules• Horn clauses• Neural nets• Bayesian networks• Frames• Propositional logic
• Binary decision diagrams
• Formal grammars • Coefficients for
polynomials• Reinforcement
learning tables• Conceptual clusters • Classifier systems
GENETIC PROGRAMMING (GP)
• GP applies the approach of the genetic algorithm to the space of possible computer programs
• Computer programs are the lingua franca for expressing the solutions to a wide variety of problems
• A wide variety of seemingly different problems from many different fields can be reformulated as a search for a computer program to solve the problem.
GP MAIN POINTS
• Genetic programming now routinely delivers high-return human-competitive machine intelligence.
• Genetic programming is an automated invention machine.
• Genetic programming has delivered a progression of qualitatively more substantial results in synchrony with five approximately order-of-magnitude increases in the expenditure of computer time.
DEFINITION OF “HIGH-RETURN”
The AI ratio (the “artificial-to-intelligence” ratio) of a problem-solving method as the ratio of that which is delivered by the automated operation of the artificial method to the amount of intelligence that is supplied by the human applying the method to a particular problem
DEFINITION OF “ROUTINE”
A problem solving method is routine if it is general and relatively little human effort is required to get the method to successfully handle new problems within a particular domain and to successfully handle new problems from a different domain.
CRITERIA FOR “HUMAN-COMPETITIVENESS”
• Previously patented, an improvement over a patented invention, or patentable today
• Publishable in its own right as a new scientific result independent of the fact that the result was mechanically created
• Holds it own in regulated competition against humans (or programs)
• 5 other similar criteria that are “arms-length” from the fields of AI, ML, GP
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
• Toy problems
• Human-competitive non-patent results
• 20th-century patented inventions
• 21st-century patented inventions
• Patentable new inventions
A COMPUTER PROGRAM IN C
int foo (int time){ int temp1, temp2; if (time > 10) temp1 = 3; else temp1 = 4; temp2 = temp1 + 1 + 2; return (temp2);}
CREATING RANDOM PROGRAMS
• Available functions F = {+, -, *, %, IFLTE}
• Available terminals T = {X, Y, Random-Constants}
• The random programs are:– Of different sizes and shapes
– Syntactically valid
– Executable
GP GENETIC OPERATIONS
• Reproduction
• Mutation
• Crossover (sexual recombination)
• Architecture-altering operations
MUTATION OPERATION
• Select 1 parent probabilistically based on fitness• Pick point from 1 to NUMBER-OF-POINTS• Delete subtree at the picked point• Grow new subtree at the mutation point in same
way as generated trees for initial random population (generation 0)
• The result is a syntactically valid executable program
• Put the offspring into the next generation of the population
CROSSOVER OPERATION
• Select 2 parents probabilistically based on fitness
• Randomly pick a number from 1 to NUMBER-OF-POINTS for 1st parent
• Independently randomly pick a number for 2nd parent
• The result is a syntactically valid executable program
• Put the offspring into the next generation of the population
• Identify the subtrees rooted at the two picked points
REPRODUCTION OPERATION
• Select parent probabilistically based on fitness
• Copy it (unchanged) into the next generation of the population
FIVE MAJOR PREPARATORY STEPS FOR
GP
• Determining the set of terminals• Determining the set of functions• Determining the fitness measure • Determining the parameters for the run• Determining the method for designating a result and
the criterion for terminating a run
SYMBOLIC REGRESSION
Independent variable X
Dependent variable Y
-1.00 1.00
-0.80 0.84
-0.60 0.76
-0.40 0.76
-0.20 0.84
0.00 1.00
0.20 1.24
0.40 1.56
0.60 1.96
0.80 2.44
1.00 3.00
PREPARATORY STEPS
Objective: Find a computer program with one input (independent variable X) whose output equals the given data
1 Terminal set: T = {X, Random-Constants}
2 Function set: F = {+, -, *, %}
3 Fitness: The sum of the absolute value of the differences between the candidate program’s output and the given data (computed over numerous values of the independent variable x from –1.0 to +1.0)
4 Parameters: Population size M = 4
5 Termination: An individual emerges whose sum of absolute errors is less than 0.1
SYMBOLIC REGRESSION x2 + x + 1
FITNESS OF THE 4 INDIVIDUALS IN GEN 0
x + 1 x2 + 1 2 x
0.67 1.00 1.70 2.67
SYMBOLIC REGRESSION x2 + x + 1
GENERATION 1
Copy of (a)
Mutant of (c) picking “2” as mutation point
First offspring of crossover of (a) and (b) picking “+” of parent (a) and left-most “x” of parent (b) as crossover points
Second offspring of crossover of (a) and (b) picking “+” of parent (a) and left-most “x” of parent (b) as crossover points
GP TABLEAU – INTERTWINED SPIRALS
Objective: Create a program to classify a given point in the x-y plane to the red or blue spiral
1 Terminal set: T = {X,Y,Random-Constants}
2 Function set: F = {+,-,*,%,IFLTE,SIN,COS}
3 Fitness: The number of correctly classified points (0 – 194)
4 Parameters: M = 10,000. G = 51
5 Termination: An individual program scores 194
TRUCK BACKER UPPER
• 4-Dimensional control problem– horizontal position, x– vertical position, y– angle between trailer and horizontal, t
– angle between trailer and cab, d
• One control variable (steering wheel turn angle)
• State transition equations map the 4 state variables into 1 output (the control variable)
• Simulation run over many initial conditions and over hundreds of time steps
2 MAIN POINTS FROM 1992 BOOK
• Virtually all problems in artificial intelligence, machine learning, adaptive systems, and automated learning can be recast as a search for a computer program.
• Genetic programming provides a way to successfully conduct the search for a computer program in the space of computer programs.
SOME RESULTS FROM 1992 BOOK
• Intertwined Spirals• Truck Backer Upper• Broom Balancer• Wall Follower• Box Mover• Artificial Ant• Differential Games• Inverse Kinematics• Central Place Foraging• Block Stacking
• Randomizer
• Cellular Automata
• Task Prioritization
• Image Compression
• Econometric Equation
• Optimization
• Boolean Function Learning
• Co-Evolution of Game-Playing Strategies
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
• Toy problems
• Human-competitive non-patent results
• 20th-century patented inventions
• 21st-century patented inventions
• Patentable new inventions
COMPUTER PROGRAMS
• Subroutines provide one way to REUSE code possibly with different instantiations of the dummy variables (formal parameters)
• Loops (and iterations) provide a 2nd way to REUSE code• Recursion provide a 3rd way to REUSE code• Memory provides a 4th way to REUSE the results of
executing code
SYMBOLIC REGRESSION
Fitness case L0 W0 H0 L1 W1 H1 Dependent variable D
1 3 4 7 2 5 3 54
2 7 10 9 10 3 1 600
3 10 9 4 8 1 6 312
4 3 9 5 1 6 4 111
5 4 3 2 7 6 1 -18
6 3 3 1 9 5 4 -171
7 5 9 9 1 7 6 363
8 1 2 9 3 9 2 -36
9 2 6 8 2 6 10 -24
10 8 1 10 7 5 1 45
AUTOMATICALLY DEFINED FUNCTION volume
(progn
(defun volume (arg0 arg1 arg2)
(values
(* arg0 (* arg1 arg2))))
(values
(- (volume L0 W0 H0)
(volume L1 W1 H1))))
AUTOMATICALLY DEFINED FUNCTIONS
• ADFs provide a way to REUSE code• Code is typically reused with different
instantiations of the dummy variables (formal parameters)
ADDITION OF V0 AND V1
Fitness case L0 W0 H0 L1 W1 H1 V0 V1 D
1 3 4 7 2 5 3 84 30 54
2 7 10 9 10 3 1 630 30 600
3 10 9 4 8 1 6 360 48 312
4 3 9 5 1 6 4 135 24 111
5 4 3 2 7 6 1 24 42 -18
6 3 3 1 9 5 4 9 180 -171
7 5 9 9 1 7 6 405 42 363
8 1 2 9 3 9 2 18 54 -36
9 2 6 8 2 6 10 96 120 -24
10 8 1 10 7 5 1 80 35 45
DIVIDE AND CONQUER
• Decompose a problem into sub-problems
• Solve the sub-problems
• Assemble the solutions of the sub-problems into a solution for the overall problem
CHANGE OF REPRESENTATION
• Identify regularities
• Change the representation
• Solve the overall problem
ADF IMPLEMENTATION
• Each overall program in population includes – a main result-producing branch (RPB) and– function-defining branch (i.e., automatically defined
function, ADF)
• In generation 0, create random programs with different ingredients for the RPB and the ADF– Terminal set for ADF typically contains dummy
arguments (formal parameters), such as ARG0, ARG1, …
– Function set of the RPB contains ADF0– ADFs are private and associated with a particular
individual program in the population
ADF MUTATION
• Select parent probabilistically on the basis of fitness
• Pick a mutation point from either RPB or an ADF
• Delete sub-tree rooted at the picked point • Grow a new sub-tree at the picked point
composed of the allowable ingredients appropriate for the picked point
• The offspring is a syntactically valid executable program
ADF CROSSOVER
• Select parent probabilistically on the basis of fitness
• Pick a crossover point from either RPB or an ADF of the FIRST patent
• The choice of crossover point in the SECOND parent is RESTRICTED to the picked RPB or to the picked ADF
• The sub-trees are swapped• The offspring are syntactically valid executable
programs
MAIN POINTS OF 1994 BOOK
• Scalability is essential for solving non-trivial problems in artificial intelligence, machine learning, adaptive systems, and automated learning
• Scalability can be achieved by reuse• Genetic programming provides a way to
automatically discover and reuse subprograms in the course of automatically creating computer programs to solve problems
COMPUTER PROGRAMS
• Subroutines provide one way to REUSE code possibly with different instantiations of the dummy variables (formal parameters)
• Loops (and iterations) provide a 2nd way to REUSE code• Recursion provide a 3rd way to REUSE code• Memory provides a 4th way to REUSE the results of
executing code
COMPUTER PROGRAMS
• Subroutines provide one way to REUSE code possibly with different instantiations of the dummy variables (formal parameters)
• Loops (and iterations) provide a 2nd way to REUSE code• Recursion provide a 3rd way to REUSE code• Memory provides a 4th way to REUSE the results of
executing code
AUTOMATICALLY DEFINED ITERATION (ADI)
• The overall program includes an iteration-performing branch (IPB) in addition to a result-producing branch (RPB) and function-defining branches (ADF)
• There are no infinite loops because the iteration is performed over a known, fixed set – protein or DNA sequence (of varying length)– time-series data– two-dimensional array of pixels
• Memory is usually involved and is used to communicate between IPB, RPB, and ADF
TRANSMEMBRANE SEGMENT IDENTIFICATION PROBLEM
Goal is to classify a given protein segment as being a transmembrane domain or
non-transmembrane area of the protein
TRANSMEMBRANE SEGMENT IDENTIFICATION PROBLEM
(progn (defun ADF0 ()(ORN (ORN (ORN (I?) (H?)) (ORN (P?) (G?))) (ORN (ORN (ORN (Y?)
(N?)) (ORN (T?) (Q?))) (ORN (A?) (H?)))))) (defun ADF1 ()(values (ORN (ORN (ORN (A?) (I?)) (ORN (L?) (W?))) (ORN (ORN
(T?) (L?)) (ORN (T?) (W?)))))) (defun ADF2 ()(values (ORN (ORN (ORN (ORN (ORN (D?) (E?)) (ORN (ORN (ORN
(D?) (E?)) (ORN (ORN (T?) (W?)) (ORN (Q?) (D?)))) (ORN (K?) (P?)))) (ORN (K?) (P?))) (ORN (T?) (W?))) (ORN (ORN (E?) (A?)) (ORN (N?) (R?))))))
(progn (loop-over-residues (SETM0 (+ (- (ADF1) (ADF2)) (SETM3 M0))))
(values (% (% M3 M0) (% (% (% (- L -0.53) (* M0 M0)) (+ (% (% M3 M0) (% (+ M0 M3) (% M1 M2))) M2)) (% M3 M0))))))
TRANSMEMBRANE SEGMENT IDENTIFICATION PROBLEM
• in-sample correlation of 0.976
• out-of-sample correlation of 0.968
• out-of-sample error rate 1.6%
AUTOMATICALLY DEFINED LOOP (ADL)
• loop initialization branch, LIB• loop condition branch, LCB• loop body branch, LBB• loop update branch, LUB
COMPUTER PROGRAMS
• Subroutines provide one way to REUSE code possibly with different instantiations of the dummy variables (formal parameters)
• Loops (and iterations) provide a 2nd way to REUSE code• Recursion provide a 3rd way to REUSE code• Memory provides a 4th way to REUSE the results of
executing code
AUTOMATICALLY DEFINED RECURSION (ADR)
• recursion condition branch, RCB• recursion body branch, RBB• recursion update branch, RUB• recursion ground branch, RGB
HUMAN-COMPETITIVE RESULTS(NOT RELATED TO PATENTS)
Transmembrane segment identification problem for proteins
Motifs for D–E–A–D box family and manganese superoxide dismutase family of proteins
Cellular automata rule for Gacs-Kurdyumov-Levin (GKL) problem
Quantum algorithm for the Deutsch-Jozsa “early promise” problem
Quantum algorithm for Grover’s database search problem
Quantum algorithm for the depth-two AND/OR query problem
Quantum algorithm for the depth-one OR query problem
Protocol for communicating information through a quantum gate
Quantum dense coding
Soccer-playing program that won its first two games in the 1997 Robo Cup competition
Soccer-playing program that ranked in the middle of field in 1998 Robo Cup competition
Antenna designed by NASA for use on spacecraft
Sallen-Key filter
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
• Toy problems
• Human-competitive non-patent results
• 20th-century patented inventions
• 21st-century patented inventions
• Patentable new inventions
16 ATTRIBUTES OF A SYSTEM FOR AUTOMATICALLY CREATING COMPUTER
PROGRAMS
• Starts with "What needs to be done"
• Tells us "How to do it"• Produces a computer
program• Automatic determination of
program size• Code reuse• Parameterized reuse• Internal storage• Iterations, loops, and
recursions
• Self-organization of hierarchies
• Automatic determination of program architecture
• Wide range of programming constructs
• Well-defined• Problem-independent• Wide applicability• Scalable• Competitive with human-
produced results
AUTOMATIC SYNTHESIS OF BOTH THE TOPOLOGY AND
SIZING OF ANALOG ELECTRICAL CIRCUITS BY MEANS OF
DEVELOPMENTAL GENETIC PROGRAMMING
AUTOMATED CIRCUIT SYNTHESIS
• The topology of a circuit includes specifying the gross number of components in the circuit, the type of each component (e.g., a capacitor), and a netlist specifying where each lead of each component is to be connected.
• Sizing involves specifying the values (typically numerical) of each of the circuit's components.
COMPONENT-CREATING FUNCTIONS
• Resistor R function
• Capacitor C function
• Inductor L function
• Diode D function
• Transistor Q function (3-leaded)
DEVELOPMENTAL GP
(LIST (C (– 0.963 (– (– -0.875 -0.113) 0.880)) (series (flip end) (series (flip end) (L -0.277 end) end) (L (– -0.640 0.749) (L -0.123 end)))) (flip (nop (L -0.657 end)))))
CAPACITOR-CREATING FUNCTION
(LIST (C (– 0.963 (– (– -0.875
-0.113) 0.880)) (series (flip
end) (series (flip end) (L –
0.277 end) end) (L (– -0.640
0.749) (L -0.123 end)))) (flip
(nop (L -0.657 end)))))
SERIES DIVISION FUNCTION
(LIST (C (– 0.963 (– (– -0.875
-0.113) 0.880)) (series (flip
end) (series (flip end) (L –
0.277 end) end) (L (– -0.640
0.749) (L -0.123 end)))) (flip
(nop (L -0.657 end)))))
EVALUATION OF FITNESS
Program Tree
+ IN OUT z0
Embryonic Circuit
Fully Designed Circuit (NetGraph)
Circuit Netlist (ascii)
Circuit Simulator (SPICE)
Circuit Behavior (Output)
Fitness
EVOLVED DARLINGTON EMITTER-FOLLOWER SECTION
U. S. patent 2,663,806
Sidney Darlington
Bell Telephone Laboratories
1953
20th-CENTURY PATENTS
Campbell ladder topology for filters
Zobel “M-derived half section” and “constant K” filter sections
Crossover filter
Negative feedback
Cauer (elliptic) topology for filters
PID and PID-D2 controllers
Darlington emitter-follower section and voltage gain stage
Sorting network for seven items using only 16 steps
60 and 96 decibel amplifiers
Analog computational circuits
Real-time analog circuit for time-optimal robot control
Electronic thermometer
Voltage reference circuit
Philbrick circuit
NAND circuit
Simultaneous synthesis of topology, sizing, placement, and routing
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
• Toy problems
• Human-competitive non-patent results
• 20th-century patented inventions
• 21st-century patented inventions
• Patentable new inventions
21st-CENTURY PATENTED INVENTIONS
Low-voltage balun circuit
Mixed analog-digital variable capacitor circuit
High-current load circuit
Voltage-current conversion circuit
Cubic function generator
Tunable integrated active filter
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
• Toy problems
• Human-competitive non-patent results
• 20th-century patented inventions
• 21st-century patented inventions
• Patentable new inventions
NOVELTY-DRIVEN EVOLUTION
• Two factors in fitness measure– Circuit’s behavior in the frequency domain
– Largest number of nodes and edges (circuit components) of a subgraph of the given circuit that is isomorphic to a subgraph of a template representing the prior art. Graph isomorphism algorithm with the cost function being based on the number of shared nodes and edges (instead of just the number of nodes).
NOVELTY-DRIVEN EVOLUTION
• For circuits not scoring the maximum number of hits (101), the fitness of a circuit is the product of the two factors.
• For circuits scoring 101 hits (100%-compliant individuals), fitness is the number of shared nodes and edges divided by 10,000.
AUTOMATIC SYNTHESIS OF CIRCUIT LAYOUT
INCLUDING THE PLACEMENT OF COMPONENTS AND ROUTING OF
WIRES
ALONG WITH THE TOPOLOGY AND SIZING
CIRCUIT LAYOUT
• Circuit placement involves the assignment of each of the circuit's components to a particular physical location on a printed circuit board or silicon wafer.
• Routing involves the assignment of a particular physical location to the wires between the leads of the circuit's components.
100%-COMPLIANT LOWPASS FILTER
BEST-OF-RUN CIRCUIT OF GENERATION 138 WITH 4 INDUCTORS AND 4 CAPACITORS AREA OF 359.4
CONTROLLER BLOCKS
• gain
• integrator
• differentiator
• adder
• subtractor
• multiplier
• differential input integrators
• inverter
• lead
• lag
• two-parameter lag
• absolute value
• limiter
• divider
• delay
• conditional operators (switches)
FUNCTION SET FOR CONTROLLER SYNTHESIS
F = {GAIN,INVERTER,LEAD,LAG,LAG2, DIFFERENTIAL_INPUT_INTEGRATOR, DIFFERENTIATOR, ADD_SIGNAL, SUB_SIGNAL,ADD_3_SIGNAL,ADF0, ADF1,ADF2,ADF3,ADF4}
CONSTRAINED SYNTACTIC STRUCTURE
• A grammar that specifies what functions and terminals are allowed as arguments to particular functions
• For example, the first argument of the GAIN function must be a value-setting subtree whereas the second can be from the general pool of functions
• Also known as strong typing
8 FITNESS CASES
• 8 elements of the fitness measure represent 2 2 2 choices:– 2 different values of the plant's internal gain,
K (1.0 and 2.0), – 2 different values of the plant's time constant
(0.5 and 1.0), – 2 different values for the height of the
reference signal (rising from 0 to 1 volts or from 0 to 1 microvolts at t = 100 milliseconds
FITNESS MEASURE
• For each of these 8 fitness cases, a transient analysis (time domain) is performed using the SPICE simulator.
• The contribution to fitness for the 8 elements is
– Integral of time-weighted absolute error (ITAE)
– e(t) is difference between plant output and reference signal.
– Multiplication by B (106 or 1) makes both reference signals equally influential.
– Additional weighting function, A, heavily penalizes non-compliant amounts of overshoot. A weights all variations up to 2% above the reference signal by 1.0, but bigger variations by 10.0.
ANTENNA SYNTHESIS USING GP
(PROGN3 (TURN-RIGHT 0.125) (LANDMARK (REPEAT 2 (PROGN2 (DRAW 1.0 HALF-MM-WIRE) (DRAW 0.5 NO-WIRE))) (TRANSLATE-RIGHT 0.125 0.75))
EVOLVED NETWORK
(IF (< LACTOSE_LEVEL 9.139 ) (IF (<REPRESSOR_LEVEL 6.270 ) (IF (> GLUCOSE_LEVEL5.491 ) 2.02 (IF (< CAP_LEVEL 0.639 ) 2.033 (IF(< CAP_LEVEL 4.858 ) (IF (> LACTOSE_LEVEL 2.511 )(IF (> CAP_LEVEL 7.807 ) 5.586 (IF (>LACTOSE_LEVEL 2.114 ) 1.978 2.137 ) ) 0.0 ) (IF(> REPRESSOR_LEVEL 4.015 ) 0.036 (IF (<GLUCOSE_LEVEL 5.128 ) 10.0 (IF (< REPRESSOR_LEVEL4.268 ) 2.022 9.122 ) ) ) ) ) ) (IF (> CAP_LEVEL0.842 ) 0.0 5.97 ) ) (IF (< CAP_LEVEL 1.769 )2.022 (IF (< GLUCOSE_LEVEL 2.382 ) (IF (>LACTOSE_LEVEL 1.256 ) (IF (> LACTOSE_LEVEL 1.933) (IF (> GLUCOSE_LEVEL 2.022 ) (IF (<GLUCOSE_LEVEL 5.183 ) 6.323 (IF (> CAP_LEVEL1.208 ) 9.713 0.842 ) ) 10.0 ) (IF (>GLUCOSE_LEVEL 6.270 ) 2.109 ) 1.965 ) ) 0.665 )
1.982 ) ) )
IN C-STYLE PSEUDO CODE
if(LACTOSE_LEVEL < 9.139) { if(REPRESSOR_LEVEL <
6.270) { LAC_mRNA_LEVEL =
2.022; } else { LAC_mRNA_LEVEL =
0.0; }}
else{ if(CAP_LEVEL < 1.769) { LAC_mRNA_LEVEL =
2.022; } else { if(GLUCOSE_LEVEL <
2.382) { LAC_mRNA_LEVEL
= 10.0; } else { LAC_mRNA_LEVEL
= 1.982; } } }
PARAMETERIZED TOPOLOGIES
• One of the most important characteristics of computer programs is that they ordinarily contain inputs (free variables) and conditional operations
PATENTABLE NEW INVENTIONS
PID tuning rules that outperform the Ziegler-Nichols and Åström-Hägglund tuning rules
General-purpose controllers outperforming Ziegler-Nichols and Åström-Hägglund rules
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
• Toy problems
• Human-competitive non-patent results
• 20th-century patented inventions
• 21st-century patented inventions
• Patentable new inventions
GP PARALLELIZATION
• Like Hormel, Get Everything Out of the Pig, Including the Oink
• Keep on Trucking
• It Takes a Licking and Keeps on Ticking
• The Whole is Greater than the Sum of the Parts
PETA-OPS
• Human brain operates at 1012 neurons operating at 103 per second = 1015 ops per second
• 1015 ops = 1 peta-op = 1 bs (brain second)
GP 1987–2002
System Dates Speed-up over first system
Human-competitive results
Problem Category
Serial LISP 1987–1994 1 (base) 0 toy problems 64 transputers 1994–1997 9 2 human-competitive
results not related to patented inventions
64 PowerPC’s 1995–2000 204 12 20th-century patented inventions
70 Alpha’s 1999–2001 1,481 2 20th-century patented inventions
1,000 Pentium II’s 2000–2002 13,900 12 21st-century patented inventions
4-week runs on 1,000 Pentium II’s
2002-2003 130,000 2 patentable new inventions
PROMISING GP APPLICATION AREAS
• Problem areas involving many variables that are interrelated in highly non-linear ways
• Inter-relationship of variables is not well understood
• Discovery of the size and shape of the solution is a major part of the problem
• "Black art" problems
PROMISING GP APPLICATION AREAS (CONTINUED)
• Areas where you simply have no idea how to program a solution, but where you know what you want
PROMISING GP APPLICATION AREAS (CONTINUED)
• Problem areas where a good approximate solution is satisfactory
design
control
bioinformatics
classification
data mining
system identification
forecasting
PROMISING GP APPLICATION AREAS (CONTINUED)
Areas where large computerized databases are accumulating and computerized techniques are needed to analyze the data
genome, protein, microarray data
satellite image data
astronomical data
petroleum databases
financial databases
medical records
marketing databases
PROMISING GP APPLICATION AREAS (CONTINUED)
Areas for which humans find it very difficult to write good programs
parallel computers cellular automata multi-agent strategies field-programmable game arrays digital signal processors swarm intelligence
CHARACTERISTICS SUGGESTING USE OF GP
(1) discovering the size and shape of the solution, (2) reusing substructures, (3) discovering the number of substructures, (4) discovering the nature of the hierarchical references among
substructures, (5) passing parameters to a substructure,(6) discovering the type of substructures (e.g., subroutines, iterations,
loops, recursions, or storage),(7) discovering the number of arguments possessed by a substructure, (8) maintaining syntactic validity and locality by means of a
developmental process, or(9) discovering a general solution in the form of a parameterized
topology containing free variables
DESIGNING A GIRAFFE
• Long neck
• Long tongue
• Vegetable-digesting enzymes in stomach
• 4 legs
• Long legs
• Brown coloration
THE DESIGN OF A GOOD GIRAFE
Neck length
Tongue length
Carnivorous? Number of legs
Leg length
Coloration
15.11 feet
14 inches No 4 9.96 feet Brown
Floating point
Floating point
Boolean Integer Floating point
Categorical
NON-LINEARITY — GIRAFE
• Taken one-by-one, some gene values found in a giraffe, such as the long neck contribute (alone) negatively to fitness– requires considerable material to construct
– requires considerable energy to maintain
– prone to injury (thereby hurting rate of survival and reproduction)
• Thus, maximizing any one variable will not lead to the global optimum solution
NON-LINEARITY (CONTINUED)
• When the variables are taken in pairs (there are 15 possible pairs), many combinations of pairs (e.g., Long neck and long tongue) are doubly detrimental
NON-LINEARITY (CONTINUED)
• But, certain combinations of traits, when taken together, are "co-adapted sets of alleles" that yield a very fit animal for eating high acacia leaves in the jungle environment, having good camouflage, having high escape velocity when faced with predators, and exploiting a niche (and avoiding competition) with other animals feeding on low-hanging vegetation
SEARCH METHODS IN GENERAL
• Initial structure(s)
• Fitness measure
• Operations for creating new structures
• Parameters
• Termination criterion and method of designating the result
SEARCH METHODS
• Blind random search does not use acquired information in deciding on the future direction of the search
• Hill combing and gradient descent use acquired information; however, they are prone to becoming trapped on local optima
• The previous point is especially true for non-trivial search spaces
REPRESENTATION
Genetic programming overtly conducts it
search for a solution to the given problem
in program space
ROLE OF POINT-TO-POINT TRANSFORMATIONS IN THE
SEARCH
Genetic programming does not conduct its
search by transforming a single point in the
search space into another single point, but
instead transforms a set of points into
another set of points
ROLE OF HILL CLIMBING IN THE SEARCH
Genetic programming does not rely
exclusively on greedy hill climbing to
conduct its search, but instead allocates a
certain number of trials, in a principled
way, to choices that are known to be
inferior
ROLE OF FORMAL LOGIC IN THE SEARCH
Genetic programming does not utilize
formal logic in it’s search strategy. Contradictory alternatives are created and actively maintained.
TURING (1948)
Turing made the connection between searches and the challenge of getting a computer to solve a problem without explicitly programming it in his 1948 essay “Intelligent Machines”
"Further research into intelligence of machinery will probably be very greatly concerned with 'searches' ... “
TURING’S 3 APPROACHES TO MACHINE INTELLIGENCE (1948)
LOGIC-BASED SEARCH
One approach that Turing identified is asearch through the space of integers representing candidate computer programs.
TURING’S 3 APPROACHES (CONTINUED)
CULTURAL SEARCH
A second approach is the "cultural search“which relies on knowledge and expertiseacquired over a period of years fromothers (akin to present-day knowledge-based systems).
TURING’S 3 APPROACHES (CONTINUED)
GENETICAL OR EVOLUTIONARY SEARCH
"There is the genetical or evolutionarysearch by which a combination of genes islooked for, the criterion being the survivalvalue.“
TURING (1950)
From Turing’s 1950 paper "ComputingMachinery and Intelligence" …
“We cannot expect to find a good child-machine at the first attempt. One must experiment with teaching one such machine and see how well it learns. One can then try another and see if it is better or worse. There is an obvious connection between this process and evolution, by the identifications”