1
Competent Program Evolution
Dissertation Defense
Moshe Looks
December 11th, 2006
2
Synopsis
Competent optimization requires adaptive decomposition
This is problematic in program spaces
Thesis: we can do it by exploiting semantics
Results: it works
3
General Optimization
Find a solution s in S that maximizes/minimizes f(s)
f : S → ℝ
To solve this faster than O(|S|), make assumptions about f
4
Near-Decomposability
Complete separability would be nice…
Near-decomposability (Simon, 1969) is more realistic
[Figure: subsystems with stronger interactions within modules, weaker interactions between them]
5
Exploiting Separability
Separability = independence assumptions
Given a prior over the solution space, represented as a probability vector:
1. Sample solutions from the model
2. Update the model toward higher-scoring points
3. Iterate
Works well when interactions are weak
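The three-step loop above can be sketched as a probability-vector EDA (PBIL-style). The OneMax score, population size, and learning rate below are illustrative assumptions, not details from the slides:

```python
import random

def pbil(n_bits=20, pop=50, lr=0.1, gens=100, seed=0):
    """Probability-vector EDA sketch: works well when bit interactions are weak."""
    rng = random.Random(seed)
    p = [0.5] * n_bits                         # prior: independent, uniform bits
    for _ in range(gens):
        # 1. Sample solutions from the model
        samples = [[1 if rng.random() < pi else 0 for pi in p]
                   for _ in range(pop)]
        # 2. Update the model toward the highest-scoring point (score = sum of bits)
        best = max(samples, key=sum)
        p = [(1 - lr) * pi + lr * bi for pi, bi in zip(p, best)]
        # 3. Iterate
    return p

probs = pbil()
```

On this separable problem the probability vector converges toward the all-ones optimum; under strong interactions the independence assumption breaks down, which motivates BOA/hBOA on the next slide.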
6
Exploiting Near-Decomposability
Bayesian optimization algorithm (BOA):
represent the problem decomposition as a Bayesian network, learned greedily via a network scoring metric
Hierarchical BOA (hBOA) uses Bayesian networks with local structure:
allows smaller model-building steps, leading to more accurate models
restricted tournament replacement promotes diversity
Solves the linkage problem
Competence: solving hard problems quickly, accurately, and reliably
7
Program Learning
Solutions encode executable programs; execution maps programs to behaviors
exec : P → B
Find a program p in P that maximizes/minimizes f(exec(p))
f : B → ℝ
To be useful, make assumptions about exec, P, and B
8
Properties of Program Spaces
Open-endedness: programs may grow without bound
Over-representation: many programs map to the same behavior
Compositional hierarchy: intrinsically organized into subprograms
Chaotic execution: similar programs may have very different behaviors
9
Properties of Program Spaces
Simplicity prior: simpler programs are more likely
Simplicity preference: smaller programs are preferable
Behavioral decomposability: f : B → ℝ is separable / nearly decomposable
White-box execution: the execution function is known and constant
10
Thesis
Program spaces are not directly decomposable
Leverage the properties of program spaces as inductive bias,
leading to competent program evolution
11
Representation-Building
Organize programs in terms of commonalities
Ignore semantically meaningless variation
Explore plausible variations
12
Representation-Building
Common regions must be aligned
Redundancy must be identified
Create knobs for plausible variations
13
Representation-Building
What about…
changing the phase
averaging two inputs instead of picking one
…
behavior (semantic) space vs. program (syntactic) space
14
Statics amp Dynamics
Representations span a limited subspace of programs
Conceptual steps in representation-building:
1. reduction to normal form (e.g., x + 0 → x)
2. neighborhood enumeration (generate knobs)
3. neighborhood reduction (get rid of some knobs)
Create demes to maintain a sample of many representations
deme: a sample of programs living in a common representation
intra-deme optimization: use hBOA
inter-deme: based on dominance relationships
15
Meta-Optimizing Semantic Evolutionary Search (MOSES)
1. Create an initial deme based on a small set of knobs (i.e., the empty program) and random sampling in knob-space
2. Select a deme and run hBOA on it
3. Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)
4. For each such program:
   1. create a new representation centered around the program
   2. create a new random sample within this representation
   3. add as a deme
5. Repeat from step 2
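The deme loop above can be sketched on a toy bitstring problem. This is only a schematic: single-bit hillclimbing stands in for hBOA, a "representation" is just the set of single-bit knobs around a center, and all names and sizes are illustrative:

```python
import random

def optimize_in_deme(center, score, samples=30):
    """Inner optimization stand-in for hBOA: try knob settings near the center."""
    best = center
    for _ in range(samples):
        cand = center[:]
        cand[random.randrange(len(cand))] ^= 1   # turn one knob
        if score(cand) > score(best):
            best = cand
    return best

def moses_lite(n_bits=16, iters=20, score=sum):
    random.seed(1)
    demes = [[0] * n_bits]                  # 1. initial deme around the empty program
    for _ in range(iters):
        center = max(demes, key=score)      # 2. select a deme
        winner = optimize_in_deme(center, score)
        if score(winner) > score(center):   # 3. deme-creation criterion
            demes.append(winner)            # 4. re-center a new deme on the winner
    return max(demes, key=score)            # 5. repeat until the budget runs out

best = moses_lite()
```

The essential structure survives the simplification: optimization happens only inside a bounded representation, and promising programs seed new representations rather than being recombined globally.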
16
Artificial Ant
Eat all food pellets within 600 steps
Existing evolutionary methods do not perform significantly better than random
Space contains many regularities
To apply MOSES:
three reduction rules for normal form, e.g., left, left, left → right
separate knobs for rotation, movement & conditionals
no neighborhood reduction needed
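Reduction to normal form can be sketched as rewriting to a fixed point. Only the rule left, left, left → right comes from the slide; the other rules and the list representation are illustrative assumptions:

```python
# Rule-based reduction toward a normal form, in the spirit of the ant domain.
RULES = [
    (("left", "left", "left"), ("right",)),   # from the slide
    (("right", "right", "right"), ("left",)), # assumed mirror rule
    (("left", "right"), ()),                  # assumed: inverse rotations cancel
    (("right", "left"), ()),
]

def reduce_program(prog):
    """Apply rewrite rules until no rule matches (a fixed point)."""
    prog = tuple(prog)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            for i in range(len(prog) - len(lhs) + 1):
                if prog[i:i + len(lhs)] == lhs:
                    prog = prog[:i] + rhs + prog[i + len(lhs):]
                    changed = True
                    break
            if changed:
                break
    return prog

reduced = reduce_program(["left", "left", "left", "move"])
```

Programs that differ only by such semantically meaningless variation collapse to the same normal form, shrinking the search space before any knobs are built.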
17
Artificial Ant
How does MOSES do it?
Searches a greatly reduced space
Exploits key dependencies: "[t]hese symmetries lead to essentially the same solutions appearing to be the opposite of each other. E.g., either a pair of Right or pair of Left terminals at a particular location may be important" – Langdon & Poli, "Why ants are hard"
hBOA modeling learns the linkage between rotation knobs
Eliminate modeling and the problem still gets solved, but with much higher variance; computational effort rises to 36,000

Technique | Computational Effort
Evolutionary Programming | 136,000
Genetic Programming | 450,000
MOSES | 23,000
18
Elegant Normal Form (Holman '90)
Hierarchical normal form for Boolean formulae
Reduction process takes time linear in formula size
99% of random 500-literal formulae are reduced by over 98%
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance?
5000 unique random formulae of arity 10 with 30 literals each (qualitatively similar results for arity 5)
Computed the set of pairwise behavioral distances (truth-table Hamming distance) and syntactic distances (tree edit distance, normalized by tree size)
The same computation on the same formulae, reduced to ENF
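The behavioral half of this measurement can be sketched as follows. The formula representation (an OR of randomly negated literals), the small arity, and the generator are simplifications of the slide's setup, and tree edit distance is omitted:

```python
import random
from itertools import product

ARITY = 3  # the slides use arity 10; kept small here for illustration

def random_formula(n_literals=6, seed=None):
    """A crude random formula: an OR of (variable, negated?) literals."""
    rng = random.Random(seed)
    return [(rng.randrange(ARITY), rng.random() < 0.5) for _ in range(n_literals)]

def evaluate(formula, bits):
    # A literal (v, neg) is true when bits[v] != neg.
    return any(bits[v] != neg for v, neg in formula)

def behavior(formula):
    """Full truth table over all 2^ARITY input assignments."""
    return tuple(evaluate(formula, bits)
                 for bits in product([False, True], repeat=ARITY))

def hamming(b1, b2):
    return sum(x != y for x, y in zip(b1, b2))

f, g = random_formula(seed=1), random_formula(seed=2)
d = hamming(behavior(f), behavior(g))
```

Repeating this over all pairs in a sample of formulae, before and after reduction to ENF, yields the scatter plots on the next slide.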
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance?
[Plots: random formulae vs. reduced to ENF]
21
Neighborhoods amp Knobs
What do neighborhoods look like, behaviorally?
1000 unique random formulae, arity 5, 100 literals each (qualitatively similar results for arity 10)
Enumerate all neighbors (edit distance < 2), compute behavioral distance from the source
Neighborhoods in MOSES are defined based on ENF: neighbors are converted to ENF and compared to the original, which is used to heuristically reduce total neighborhood size
22
Neighborhoods amp Knobs
What do neighborhoods look like, behaviorally?
[Plots: random formulae vs. reduced to ENF]
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity functions of arity k2
total arity is k1·k2
Hypothesis: parity subfunctions will exhibit tighter linkages
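The hierarchical composition can be written out directly. Below is the 2-parity-3-multiplexer from the next slide (a 3-multiplexer with one address bit and two data bits, each input computed by a 2-ary parity, total arity 3·2 = 6); the function names are illustrative:

```python
def parity(bits):
    """Odd parity of a block of bits."""
    return sum(bits) % 2 == 1

def multiplexer(bits):
    """3-multiplexer: the first bit is the address, selecting one of two data bits."""
    addr, data = bits[0], bits[1:]
    return data[addr]

def parity_multiplexer(bits, k2=2):
    """Group raw inputs into k2-ary parity blocks, then multiplex the results."""
    blocks = [parity(bits[i:i + k2]) for i in range(0, len(bits), k2)]
    return multiplexer(blocks)

result = parity_multiplexer((1, 0, 0, 1, 1, 1))
```

Bits within one parity block interact maximally (flipping any one flips the block's value), while blocks interact only through the multiplexer, which is exactly the tighter intra-block linkage the hypothesis predicts hBOA should discover.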
24
Hierarchical Parity-Multiplexer
Computational effort decreases 42% with model-building (on the 2-parity-3-multiplexer)
Parity subfunctions (adjacent pairs) have the tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go?

Problem | hBOA | Representation-Building | Program Evaluation
5-Parity | 28% | 43% | 29%
11-multiplexer | 5% | 5% | 89%
CFS | 80% | 10% | 11%
Complexity | O(N·l²·a²) | O(N·l·a) | O(N·l·c)

N is population size, O(n^1.05); l is program size; a is the arity of the space; n is representation size, O(a · program size); c is the number of test cases
28
Supervised Classification
Goals:
accuracy comparable to SVM
superior accuracy vs. GP
simpler classifiers than SVM and GP
29
Supervised Classification
How much simpler? Consider average-sized formulae learned for the 6-multiplexer.
MOSES: 21 nodes, max depth 4:
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF): 50 nodes, max depth 7:
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
30
Supervised Classification
Datasets taken from recent computational biology papers
Chronic fatigue syndrome (101 cases): based on 26 SNPs; genes either in homozygosis, in heterozygosis, or not expressed; 56 binary features
Lymphoma (77 cases) & aging brains (19 cases): based on gene expression levels (continuous); 50 most-differentiating genes selected, preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification: average test accuracy

Technique | CFS | Lymphoma | Aging Brain
SVM | 66.2% | 97.5% | 95.0%
GP | 67.3% | 77.9% | 70.0%
MOSES | 67.9% | 94.6% | 95.3%
32
Quantitative Results
Benchmark performance:
artificial ant: 6x less computational effort vs. EP, 20x less vs. GP
parity problems: 1.33x less vs. EP and 4x less vs. GP on 5-parity; found solutions to 6-parity (none found by EP or GP)
multiplexer problems: 9x less vs. GP on the 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution:
all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty:
program level: adapted from the optimization case
deme level: theory based on global properties of the space (deme-level neutrality, deceptiveness, etc.)
34
Qualitative Results
Representation-building for programs:
parameterization based on semantics
transforms program-space properties to facilitate program evolution
Probabilistic modeling over sets of program transformations:
models compactly represent problem structure
35
Competent Program Evolution
Competent: not just good performance, but explainability of good results and robustness
Vision: representations are important; program learning is unique; representations must be specialized based on semantics
MOSES: meta-optimizing semantic evolutionary search, exploiting semantics and managing demes
36
Committee
Dr. Ron Loui (WashU, chair)
Dr. Guy Genin (WashU)
Dr. Ben Goertzel (Virginia Tech / Novamente LLC)
Dr. David E. Goldberg (UIUC)
Dr. John Lockwood (WashU)
Dr. Martin Pelikan (UMSL)
Dr. Robert Pless (WashU)
Dr. William Smart (WashU)
2
Synopsis
Competent optimization requires adaptive decomposition
This is problematic in program spaces
Thesis we can do it by exploiting semantics
Results it works
3
General Optimization
Find a solution s in S Maximizeminimize f(s)
fS
To solve this faster than O(|S|) make assumptions about f
4
Near-Decomposability
Complete separability would be nicehellip
Near-decomposability (Simon 1969) is more realistic
Stronger Interactions
Weaker Interactions
5
Exploiting Separability Separability = independence assumptions
Given a prior over the solution space represented as a probability vector
1 Sample solutions from the model 2 Update model toward higher-scoring points3 Iterate
Works well when interactions are weak
6
Exploiting Near-Decomposability Bayesian optimization algorithm (BOA)
represent problem decomposition as a Bayesian Network learned greedily via a network scoring metric
Hierarchical BOA uses Bayesian networks with local structure
allows smaller model-building steps leads to more accurate models
restricted tournament replacement promotes diversity
Solves the linkage problem Competence solving hard problems quickly
accurately and reliably
7
Program Learning
Solutions encode executable programs execution maps programs to behaviors
execPB find a program p in P maximizeminimize f(exec(p))
fB
To be useful make assumptions about exec P and B
8
Properties of Program Spaces Open-endedness
Over-representation many programs map to the same behavior
Compositional hierarchy intrinsically organized into subprograms
Chaotic Execution similar programs may have very different behaviors
9
Properties of Program Spaces Simplicity prior
simpler programs are more likely
Simplicity preference smaller programs are preferable
Behavioral decomposability fB is separable nearly decomposable
White box execution execution function is known and constant
10
Thesis
Program spaces not directly decomposable
Leverage properties of program spaces as inductive bias
Leading to competent program evolution
11
Representation-Building
Organize programs in terms of commonalities Ignore semantically meaningless variation Explore plausible variations
12
Representation-Building
Common regions must be aligned Redundancy must be identified Create knobs for plausible variations
13
Representation-Building
What abouthellip changing the phase averaging two input instead of picking one hellip
behavior (semantic) space program (syntactic) space
14
Statics amp Dynamics
Representations span a limited subspace of programs
Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)
Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme
based on dominance relationships
15
Meta-Optimizing Semantic Evolutionary Search (MOSES)
1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space
2 Select a deme and run hBOA on it
3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)
4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme
5 Repeat from step 2
16
Artificial Ant
Eat all food pellets within 600 steps Existing evolutionary methods not
significantly than random Space contains many regularities
To apply MOSES three reductions rules for normal form
eg left left left rarr right separate knobs for rotation
movement amp conditionals no neighborhood reduction
needed
rarr
17
Artificial Ant
How does MOSES do it
Searches a greatly reduced space
Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions
appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo
hBOA modeling learns linkage between rotation knobs
Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000
Technique Effort
Evolutionary Programming
136000 x
Genetic Programming
450000 x
MOSES 23000
18
Elegant Normal Form (Holman rsquo90)
Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5
Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)
The same computation on the same formulae reduced to ENF
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
Random Formulae Reduced to ENF
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10
Enumerate all neighbors (edit distances lt2) compute behavioral distance from source
Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
Random formulae Reduced to ENF
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity function of arity k2
total arity is k1k2
Hypothesis parity subfunctions will exhibit tighter linkages
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
3
General Optimization
Find a solution s in S Maximizeminimize f(s)
fS
To solve this faster than O(|S|) make assumptions about f
4
Near-Decomposability
Complete separability would be nicehellip
Near-decomposability (Simon 1969) is more realistic
Stronger Interactions
Weaker Interactions
5
Exploiting Separability Separability = independence assumptions
Given a prior over the solution space represented as a probability vector
1 Sample solutions from the model 2 Update model toward higher-scoring points3 Iterate
Works well when interactions are weak
6
Exploiting Near-Decomposability Bayesian optimization algorithm (BOA)
represent problem decomposition as a Bayesian Network learned greedily via a network scoring metric
Hierarchical BOA uses Bayesian networks with local structure
allows smaller model-building steps leads to more accurate models
restricted tournament replacement promotes diversity
Solves the linkage problem Competence solving hard problems quickly
accurately and reliably
7
Program Learning
Solutions encode executable programs execution maps programs to behaviors
execPB find a program p in P maximizeminimize f(exec(p))
fB
To be useful make assumptions about exec P and B
8
Properties of Program Spaces Open-endedness
Over-representation many programs map to the same behavior
Compositional hierarchy intrinsically organized into subprograms
Chaotic Execution similar programs may have very different behaviors
9
Properties of Program Spaces Simplicity prior
simpler programs are more likely
Simplicity preference smaller programs are preferable
Behavioral decomposability fB is separable nearly decomposable
White box execution execution function is known and constant
10
Thesis
Program spaces not directly decomposable
Leverage properties of program spaces as inductive bias
Leading to competent program evolution
11
Representation-Building
Organize programs in terms of commonalities Ignore semantically meaningless variation Explore plausible variations
12
Representation-Building
Common regions must be aligned Redundancy must be identified Create knobs for plausible variations
13
Representation-Building
What abouthellip changing the phase averaging two input instead of picking one hellip
behavior (semantic) space program (syntactic) space
14
Statics amp Dynamics
Representations span a limited subspace of programs
Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)
Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme
based on dominance relationships
15
Meta-Optimizing Semantic Evolutionary Search (MOSES)
1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space
2 Select a deme and run hBOA on it
3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)
4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme
5 Repeat from step 2
16
Artificial Ant
Eat all food pellets within 600 steps Existing evolutionary methods not
significantly than random Space contains many regularities
To apply MOSES three reductions rules for normal form
eg left left left rarr right separate knobs for rotation
movement amp conditionals no neighborhood reduction
needed
rarr
17
Artificial Ant
How does MOSES do it
Searches a greatly reduced space
Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions
appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo
hBOA modeling learns linkage between rotation knobs
Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000
Technique Effort
Evolutionary Programming
136000 x
Genetic Programming
450000 x
MOSES 23000
18
Elegant Normal Form (Holman rsquo90)
Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5
Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)
The same computation on the same formulae reduced to ENF
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
Random Formulae Reduced to ENF
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10
Enumerate all neighbors (edit distances lt2) compute behavioral distance from source
Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
Random formulae Reduced to ENF
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity function of arity k2
total arity is k1k2
Hypothesis parity subfunctions will exhibit tighter linkages
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
4
Near-Decomposability
Complete separability would be nicehellip
Near-decomposability (Simon 1969) is more realistic
Stronger Interactions
Weaker Interactions
5
Exploiting Separability Separability = independence assumptions
Given a prior over the solution space represented as a probability vector
1 Sample solutions from the model 2 Update model toward higher-scoring points3 Iterate
Works well when interactions are weak
6
Exploiting Near-Decomposability Bayesian optimization algorithm (BOA)
represent problem decomposition as a Bayesian Network learned greedily via a network scoring metric
Hierarchical BOA uses Bayesian networks with local structure
allows smaller model-building steps leads to more accurate models
restricted tournament replacement promotes diversity
Solves the linkage problem Competence solving hard problems quickly
accurately and reliably
7
Program Learning
Solutions encode executable programs execution maps programs to behaviors
execPB find a program p in P maximizeminimize f(exec(p))
fB
To be useful make assumptions about exec P and B
8
Properties of Program Spaces Open-endedness
Over-representation many programs map to the same behavior
Compositional hierarchy intrinsically organized into subprograms
Chaotic Execution similar programs may have very different behaviors
9
Properties of Program Spaces Simplicity prior
simpler programs are more likely
Simplicity preference smaller programs are preferable
Behavioral decomposability fB is separable nearly decomposable
White box execution execution function is known and constant
10
Thesis
Program spaces not directly decomposable
Leverage properties of program spaces as inductive bias
Leading to competent program evolution
11
Representation-Building
Organize programs in terms of commonalities Ignore semantically meaningless variation Explore plausible variations
12
Representation-Building
Common regions must be aligned Redundancy must be identified Create knobs for plausible variations
13
Representation-Building
What abouthellip changing the phase averaging two input instead of picking one hellip
behavior (semantic) space program (syntactic) space
14
Statics amp Dynamics
Representations span a limited subspace of programs
Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)
Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme
based on dominance relationships
15
Meta-Optimizing Semantic Evolutionary Search (MOSES)
1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space
2 Select a deme and run hBOA on it
3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)
4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme
5 Repeat from step 2
16
Artificial Ant
Eat all food pellets within 600 steps Existing evolutionary methods not
significantly than random Space contains many regularities
To apply MOSES three reductions rules for normal form
eg left left left rarr right separate knobs for rotation
movement amp conditionals no neighborhood reduction
needed
rarr
17
Artificial Ant
How does MOSES do it
Searches a greatly reduced space
Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions
appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo
hBOA modeling learns linkage between rotation knobs
Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000
Technique Effort
Evolutionary Programming
136000 x
Genetic Programming
450000 x
MOSES 23000
18
Elegant Normal Form (Holman rsquo90)
Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5
Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)
The same computation on the same formulae reduced to ENF
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
Random Formulae Reduced to ENF
21
Neighborhoods & Knobs
What do neighborhoods look like, behaviorally?
1000 unique random formulae, arity 5, 100 literals each (qualitatively similar results for arity 10).
Enumerate all neighbors (edit distance < 2) and compute behavioral distance from the source.
Neighborhoods in MOSES are defined based on ENF: neighbors are converted to ENF and compared to the original; this is used to heuristically reduce total neighborhood size.
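Neighbor enumeration within edit distance < 2 can be sketched as single-literal substitutions over a flat formula. The list-of-literals representation here is an illustrative simplification of the tree case.

```python
def neighbors(literals, alphabet):
    # every formula reachable by substituting exactly one literal
    out = []
    for i, lit in enumerate(literals):
        for repl in alphabet:
            if repl != lit:
                out.append(literals[:i] + [repl] + literals[i + 1:])
    return out

n = neighbors(["x1", "x2"], alphabet=["x1", "x2", "x3"])
```

In MOSES each such neighbor would additionally be reduced to ENF and compared to the source, discarding knob settings that are behaviorally redundant.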
22
Neighborhoods & Knobs
What do neighborhoods look like, behaviorally?
[Plots: behavioral distance to neighbors, for random formulae and for formulae reduced to ENF]
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity functions of arity k2; total arity is k1·k2.
Hypothesis: parity subfunctions will exhibit tighter linkages.
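The hierarchical function can be made concrete for the 2-parity-3-multiplexer case used below: a multiplexer over k1 = 3 inputs, each computed as a parity over k2 = 2 raw inputs (total arity 6). The exact wiring here (one address bit selecting between two data bits) is an assumption for illustration.

```python
def parity(bits):
    return sum(bits) % 2

def multiplexer3(address, d0, d1):
    # 3-input multiplexer: 1 address bit selects one of 2 data bits
    return d1 if address else d0

def parity_multiplexer(raw):
    # raw: 6 bits, consumed in adjacent pairs by the parity layer
    inner = [parity(raw[i:i + 2]) for i in range(0, 6, 2)]
    return multiplexer3(*inner)
```

The "adjacent pairs" feeding each parity are exactly the variable groups hypothesized (and later shown) to carry the tightest linkages.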
24
Hierarchical Parity-Multiplexer
Computational effort decreases 42% with model-building (on the 2-parity-3-multiplexer).
Parity subfunctions (adjacent pairs) have the tightest linkages.
Hypothesis validated.
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go?
Problem          hBOA          Representation-Building   Program Evaluation
5-Parity         28%           43%                       29%
11-multiplexer    5%            5%                       89%
CFS              80%           10%                       11%
Complexity       O(N l^2 a^2)  O(N l a)                  O(N l c)
N is population size, O(n^1.05); l is program size; a is the arity of the space; n is representation size, O(a · program size); c is the number of test cases.
28
Supervised Classification
Goals: accuracies comparable to SVM; superior accuracy vs. GP; simpler classifiers vs. both SVM and GP.
29
Supervised Classification
How much simpler? Consider average-sized formulae learned for the 6-multiplexer:
MOSES: 21 nodes, max depth 4
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF): 50 nodes, max depth 7
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
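The size measures quoted above (node count and maximum depth) are easy to pin down on a nested-tuple form of a formula; the representation is illustrative.

```python
def size(node):
    # total number of nodes (operators and leaves)
    if isinstance(node, str):
        return 1
    return 1 + sum(size(child) for child in node[1:])

def depth(node):
    # length of the longest root-to-leaf path
    if isinstance(node, str):
        return 1
    return 1 + max(depth(child) for child in node[1:])

f = ("or", ("and", ("not", "x1"), "x2"), "x3")
```

For this small example, `size(f)` is 6 and `depth(f)` is 4; the same counts applied to the formulae above yield the 21-vs-50-node comparison.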
30
Supervised Classification
Datasets taken from recent computational biology papers:
Chronic fatigue syndrome (101 cases): based on 26 SNPs; genes either in homozygosis, in heterozygosis, or not expressed; 56 binary features.
Lymphoma (77 cases) & aging brains (19 cases): based on gene expression levels (continuous); the 50 most-differentiating genes selected and preprocessed into binary features based on medians.
All experiments based on 10 independent runs of 10-fold cross-validation.
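The evaluation protocol above is standard k-fold cross-validation, repeated with independent shuffles; as index bookkeeping only (the fold construction shown is an assumption, not the exact splitting used):

```python
import random

def k_fold_indices(n, k, seed):
    # shuffle case indices, then deal them into k disjoint test folds
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = k_fold_indices(n=101, k=10, seed=0)   # e.g., the 101 CFS cases
```

Each of the 10 runs uses a different seed; within a run, each fold serves once as the test set while the other nine are used for training.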
31
Quantitative Results
Classification: average test accuracy
Technique   CFS     Lymphoma   Aging Brain
SVM         66.2%   97.5%      95.0%
GP          67.3%   77.9%      70.0%
MOSES       67.9%   94.6%      95.3%
32
Quantitative Results
Benchmark performance:
artificial ant: 6x less computational effort vs. EP, 20x less vs. GP
parity problems: 1.33x less vs. EP and 4x less vs. GP on 5-parity; found solutions to 6-parity (none found by EP or GP)
multiplexer problems: 9x less vs. GP on the 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution: all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty: program-level, adapted from the optimization case; deme-level, a theory based on global properties of the space (deme-level neutrality, deceptiveness, etc.)
34
Qualitative Results
Representation-building for programs: parameterization based on semantics; transforms program-space properties to facilitate program evolution.
Probabilistic modeling over sets of program transformations: models compactly represent problem structure.
35
Competent Program Evolution
Competent: not just good performance, but explainability of good results and robustness.
Vision: representations are important; program learning is unique; representations must be specialized, based on semantics.
MOSES: meta-optimizing semantic evolutionary search, exploiting semantics and managing demes.
36
Committee
Dr. Ron Loui (WashU, chair)
Dr. Guy Genin (WashU)
Dr. Ben Goertzel (Virginia Tech / Novamente LLC)
Dr. David E. Goldberg (UIUC)
Dr. John Lockwood (WashU)
Dr. Martin Pelikan (UMSL)
Dr. Robert Pless (WashU)
Dr. William Smart (WashU)
16
Artificial Ant
Eat all food pellets within 600 steps Existing evolutionary methods not
significantly than random Space contains many regularities
To apply MOSES three reductions rules for normal form
eg left left left rarr right separate knobs for rotation
movement amp conditionals no neighborhood reduction
needed
rarr
17
Artificial Ant
How does MOSES do it
Searches a greatly reduced space
Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions
appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo
hBOA modeling learns linkage between rotation knobs
Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000
Technique Effort
Evolutionary Programming
136000 x
Genetic Programming
450000 x
MOSES 23000
18
Elegant Normal Form (Holman rsquo90)
Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5
Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)
The same computation on the same formulae reduced to ENF
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
Random Formulae Reduced to ENF
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10
Enumerate all neighbors (edit distances lt2) compute behavioral distance from source
Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
Random formulae Reduced to ENF
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity function of arity k2
total arity is k1k2
Hypothesis parity subfunctions will exhibit tighter linkages
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
6
Exploiting Near-Decomposability Bayesian optimization algorithm (BOA)
represent problem decomposition as a Bayesian Network learned greedily via a network scoring metric
Hierarchical BOA uses Bayesian networks with local structure
allows smaller model-building steps leads to more accurate models
restricted tournament replacement promotes diversity
Solves the linkage problem Competence solving hard problems quickly
accurately and reliably
7
Program Learning
Solutions encode executable programs execution maps programs to behaviors
execPB find a program p in P maximizeminimize f(exec(p))
fB
To be useful make assumptions about exec P and B
8
Properties of Program Spaces Open-endedness
Over-representation many programs map to the same behavior
Compositional hierarchy intrinsically organized into subprograms
Chaotic Execution similar programs may have very different behaviors
9
Properties of Program Spaces Simplicity prior
simpler programs are more likely
Simplicity preference smaller programs are preferable
Behavioral decomposability fB is separable nearly decomposable
White box execution execution function is known and constant
10
Thesis
Program spaces not directly decomposable
Leverage properties of program spaces as inductive bias
Leading to competent program evolution
11
Representation-Building
Organize programs in terms of commonalities Ignore semantically meaningless variation Explore plausible variations
12
Representation-Building
Common regions must be aligned Redundancy must be identified Create knobs for plausible variations
13
Representation-Building
What abouthellip changing the phase averaging two input instead of picking one hellip
behavior (semantic) space program (syntactic) space
14
Statics amp Dynamics
Representations span a limited subspace of programs
Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)
Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme
based on dominance relationships
15
Meta-Optimizing Semantic Evolutionary Search (MOSES)
1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space
2 Select a deme and run hBOA on it
3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)
4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme
5 Repeat from step 2
16
Artificial Ant
Eat all food pellets within 600 steps Existing evolutionary methods not
significantly than random Space contains many regularities
To apply MOSES three reductions rules for normal form
eg left left left rarr right separate knobs for rotation
movement amp conditionals no neighborhood reduction
needed
rarr
17
Artificial Ant
How does MOSES do it
Searches a greatly reduced space
Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions
appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo
hBOA modeling learns linkage between rotation knobs
Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000
Technique Effort
Evolutionary Programming
136000 x
Genetic Programming
450000 x
MOSES 23000
18
Elegant Normal Form (Holman rsquo90)
Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5
Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)
The same computation on the same formulae reduced to ENF
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
Random Formulae Reduced to ENF
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10
Enumerate all neighbors (edit distances lt2) compute behavioral distance from source
Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
Random formulae Reduced to ENF
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity function of arity k2
total arity is k1k2
Hypothesis parity subfunctions will exhibit tighter linkages
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
7
Program Learning
Solutions encode executable programs execution maps programs to behaviors
execPB find a program p in P maximizeminimize f(exec(p))
fB
To be useful make assumptions about exec P and B
8
Properties of Program Spaces Open-endedness
Over-representation many programs map to the same behavior
Compositional hierarchy intrinsically organized into subprograms
Chaotic Execution similar programs may have very different behaviors
9
Properties of Program Spaces Simplicity prior
simpler programs are more likely
Simplicity preference smaller programs are preferable
Behavioral decomposability fB is separable nearly decomposable
White box execution execution function is known and constant
10
Thesis
Program spaces not directly decomposable
Leverage properties of program spaces as inductive bias
Leading to competent program evolution
11
Representation-Building
Organize programs in terms of commonalities Ignore semantically meaningless variation Explore plausible variations
12
Representation-Building
Common regions must be aligned Redundancy must be identified Create knobs for plausible variations
13
Representation-Building
What abouthellip changing the phase averaging two input instead of picking one hellip
behavior (semantic) space program (syntactic) space
14
Statics amp Dynamics
Representations span a limited subspace of programs
Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)
Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme
based on dominance relationships
15
Meta-Optimizing Semantic Evolutionary Search (MOSES)
1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space
2 Select a deme and run hBOA on it
3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)
4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme
5 Repeat from step 2
16
Artificial Ant
Eat all food pellets within 600 steps Existing evolutionary methods not
significantly than random Space contains many regularities
To apply MOSES three reductions rules for normal form
eg left left left rarr right separate knobs for rotation
movement amp conditionals no neighborhood reduction
needed
rarr
17
Artificial Ant
How does MOSES do it
Searches a greatly reduced space
Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions
appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo
hBOA modeling learns linkage between rotation knobs
Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000
Technique Effort
Evolutionary Programming
136000 x
Genetic Programming
450000 x
MOSES 23000
18
Elegant Normal Form (Holman rsquo90)
Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5
Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)
The same computation on the same formulae reduced to ENF
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
Random Formulae Reduced to ENF
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10
Enumerate all neighbors (edit distances lt2) compute behavioral distance from source
Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
Random formulae Reduced to ENF
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity function of arity k2
total arity is k1k2
Hypothesis parity subfunctions will exhibit tighter linkages
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
8
Properties of Program Spaces Open-endedness
Over-representation many programs map to the same behavior
Compositional hierarchy intrinsically organized into subprograms
Chaotic Execution similar programs may have very different behaviors
9
Properties of Program Spaces Simplicity prior
simpler programs are more likely
Simplicity preference smaller programs are preferable
Behavioral decomposability fB is separable nearly decomposable
White box execution execution function is known and constant
10
Thesis
Program spaces not directly decomposable
Leverage properties of program spaces as inductive bias
Leading to competent program evolution
11
Representation-Building
Organize programs in terms of commonalities Ignore semantically meaningless variation Explore plausible variations
12
Representation-Building
Common regions must be aligned Redundancy must be identified Create knobs for plausible variations
13
Representation-Building
What abouthellip changing the phase averaging two input instead of picking one hellip
behavior (semantic) space program (syntactic) space
14
Statics amp Dynamics
Representations span a limited subspace of programs
Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)
Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme
based on dominance relationships
15
Meta-Optimizing Semantic Evolutionary Search (MOSES)
1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space
2 Select a deme and run hBOA on it
3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)
4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme
5 Repeat from step 2
16
Artificial Ant
Eat all food pellets within 600 steps Existing evolutionary methods not
significantly than random Space contains many regularities
To apply MOSES three reductions rules for normal form
eg left left left rarr right separate knobs for rotation
movement amp conditionals no neighborhood reduction
needed
rarr
17
Artificial Ant
How does MOSES do it
Searches a greatly reduced space
Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions
appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo
hBOA modeling learns linkage between rotation knobs
Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000
Technique                    Effort
Evolutionary Programming     136,000 (~6x MOSES)
Genetic Programming          450,000 (~20x MOSES)
MOSES                         23,000
18
Elegant Normal Form (Holman '90)
Hierarchical normal form for Boolean formulae
Reduction process takes time linear in formula size
99% of random 500-literal formulae were reduced in size by over 98%
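The full ENF reduction is beyond a slide, but the flavor of such local rewrites can be sketched. A toy simplifier (my illustration, not Holman's procedure) applying double-negation elimination, same-operator flattening, and duplicate removal on formula trees encoded as nested tuples:

```python
def simplify(f):
    """Toy local reduction on Boolean formula trees, in the spirit of
    normal-form reduction (not the full ENF procedure). Formulae are
    nested tuples ('and', ...), ('or', ...), ('not', x); variables are
    strings such as 'x1'."""
    if isinstance(f, str):
        return f
    op, args = f[0], [simplify(a) for a in f[1:]]
    if op == 'not' and isinstance(args[0], tuple) and args[0][0] == 'not':
        return args[0][1]                    # not(not(x)) -> x
    if op in ('and', 'or'):
        flat = []
        for a in args:                       # flatten nested same-op junctions
            flat.extend(a[1:] if isinstance(a, tuple) and a[0] == op else [a])
        uniq = []
        for a in flat:                       # drop duplicate conjuncts/disjuncts
            if a not in uniq:
                uniq.append(a)
        return uniq[0] if len(uniq) == 1 else (op,) + tuple(uniq)
    return (op,) + tuple(args)
```

Each rule strictly shrinks or preserves the tree, so repeated application terminates, which is the property behind the linear-time claim above.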
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance?
5,000 unique random formulae of arity 10 with 30 literals each (qualitatively similar results for arity 5)
Computed the set of pairwise behavioral distances (truth-table Hamming distance) and syntactic distances (tree edit distance, normalized by tree size)
Repeated the same computation on the same formulae reduced to ENF
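The behavioral measure used here is concrete enough to state in code. A sketch of truth-table Hamming distance over the same nested-tuple formula encoding used elsewhere in these notes (the encoding is my assumption, not the thesis's data structure):

```python
from itertools import product

def evaluate(f, env):
    # nested-tuple formulae: ('and', ...), ('or', ...), ('not', x),
    # with variables 'x1', 'x2', ... looked up in env
    if isinstance(f, str):
        return env[f]
    if f[0] == 'not':
        return not evaluate(f[1], env)
    vals = [evaluate(a, env) for a in f[1:]]
    return all(vals) if f[0] == 'and' else any(vals)

def behavioral_distance(f, g, arity):
    """Truth-table Hamming distance: the number of input rows
    on which the two formulae disagree."""
    names = ['x%d' % (i + 1) for i in range(arity)]
    dist = 0
    for bits in product([False, True], repeat=arity):
        env = dict(zip(names, bits))
        dist += evaluate(f, env) != evaluate(g, env)
    return dist
```

For arity 2, `and(x1, x2)` and `or(x1, x2)` disagree on exactly the two rows where one input is true, giving distance 2.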
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance?
(Scatter plots: random formulae; reduced to ENF)
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally?
1,000 unique random formulae, arity 5, 100 literals each (qualitatively similar results for arity 10)
Enumerated all neighbors (edit distance < 2) and computed behavioral distance from the source
Neighborhoods in MOSES are defined based on ENF: neighbors are converted to ENF and compared to the original, which is used to heuristically reduce total neighborhood size
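That ENF-based pruning amounts to: normalize every single-knob neighbor and discard those whose normal form collapses to the source's, or to a neighbor already kept. A generic sketch, with an arbitrary `reduce_fn` standing in for conversion to ENF:

```python
def reduced_neighborhood(setting, knobs, reduce_fn):
    """Enumerate all single-knob variants of `setting`, normalize each
    with `reduce_fn` (a stand-in for reduction to ENF), and keep only
    one representative per normal form. Hypothetical sketch of the
    heuristic, not the MOSES implementation."""
    seen = {reduce_fn(setting)}    # the source's normal form is excluded
    out = []
    for i, options in enumerate(knobs):
        for v in options:
            if v == setting[i]:
                continue
            neighbor = setting[:i] + (v,) + setting[i + 1:]
            key = reduce_fn(neighbor)
            if key not in seen:    # semantically new under the normal form
                seen.add(key)
                out.append(neighbor)
    return out
```

With a toy normal form that treats knob order as meaningless (`lambda s: tuple(sorted(s))`), the four raw neighbors of `(0, 0)` over two three-valued knobs collapse to two representatives.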
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally?
(Plots: random formulae; reduced to ENF)
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1, computed from k1 parity functions of arity k2; total arity is k1·k2
Hypothesis: parity subfunctions will exhibit tighter linkages
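The layered structure just described can be written out directly. A sketch of the target function, assuming the usual multiplexer convention that the first a of the k1 lines are address bits with a + 2^a = k1 (so a 2-parity-3-multiplexer has total arity 6):

```python
def parity_multiplexer(bits, k1, k2):
    """Hierarchical parity-multiplexer: split the k1*k2 raw bits into k1
    blocks of k2, feed each block to a parity function, and feed the k1
    parities to a multiplexer (first `a` lines = address, rest = data)."""
    assert len(bits) == k1 * k2
    parities = [sum(bits[i * k2:(i + 1) * k2]) % 2 for i in range(k1)]
    a = next(n for n in range(k1) if n + 2 ** n == k1)   # address width
    addr = int(''.join(map(str, parities[:a])), 2) if a else 0
    return parities[a + addr]
```

For the 2-parity-3-multiplexer, input `(1,0, 1,1, 0,1)` yields parities `[1, 0, 1]`; the address bit 1 selects the second data line, so the output is 1.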
24
Hierarchical Parity-Multiplexer
Computational effort decreases 42% with model-building (on the 2-parity-3-multiplexer)
Parity subfunctions (adjacent pairs) have the tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go?
Problem          hBOA           Representation-Building   Program Evaluation
5-Parity         28%            43%                       29%
11-multiplexer    5%             5%                       89%
CFS              80%            10%                       11%
Complexity       O(N·l²·a²)     O(N·l·a)                  O(N·l·c)
N is population size, O(n^1.05); l is program size; a is the arity of the space; n is representation size, O(a · program size); c is the number of test cases
28
Supervised Classification
Goals:
- accuracy comparable to SVM
- superior accuracy vs. GP
- simpler classifiers vs. SVM and GP
29
Supervised Classification
How much simpler? Consider average-sized formulae learned for the 6-multiplexer.
MOSES: 21 nodes, max depth 4
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF): 50 nodes, max depth 7
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
30
Supervised Classification
Datasets taken from recent computational biology papers:
- Chronic fatigue syndrome (101 cases): based on 26 SNPs; genes either in homozygosis, in heterozygosis, or not expressed; 56 binary features
- Lymphoma (77 cases) & aging brains (19 cases): based on (continuous) gene expression levels; the 50 most-differentiating genes selected and preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
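The evaluation protocol is simple enough to sketch. All names here are hypothetical stand-ins for the thesis's actual harness; `train` is any function that takes training examples and labels and returns a predictor:

```python
import random

def repeated_cv(train, data, labels, runs=10, folds=10, seed=0):
    """Average test accuracy over `runs` independent shuffles, each
    scored by `folds`-fold cross-validation (sketch of the protocol)."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    accs = []
    for _ in range(runs):
        rng.shuffle(idx)
        for f in range(folds):
            test = idx[f::folds]               # every folds-th index held out
            held = set(test)
            train_idx = [i for i in idx if i not in held]
            predict = train([data[i] for i in train_idx],
                            [labels[i] for i in train_idx])
            accs.append(sum(predict(data[i]) == labels[i]
                            for i in test) / len(test))
    return sum(accs) / len(accs)
```

A degenerate sanity check: a majority-class learner on single-class data scores exactly 1.0 under this protocol.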
31
Quantitative Results
Classification: average test accuracy
Technique   CFS     Lymphoma   Aging Brain
SVM         66.2%   97.5%      95.0%
GP          67.3%   77.9%      70.0%
MOSES       67.9%   94.6%      95.3%
32
Quantitative Results
Benchmark performance:
- artificial ant: 6x less computational effort vs. EP, 20x less vs. GP
- parity problems: 1.33x less vs. EP and 4x less vs. GP on 5-parity; found solutions to 6-parity (none found by EP or GP)
- multiplexer problems: 9x less vs. GP on the 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution: all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty:
- program-level: adapted from the optimization case
- deme-level: theory based on global properties of the space (deme-level neutrality, deceptiveness, etc.)
34
Qualitative Results
Representation-building for programs:
- parameterization based on semantics
- transforms program-space properties to facilitate program evolution
Probabilistic modeling over sets of program transformations:
- models compactly represent problem structure
35
Competent Program Evolution
Competent: not just good performance, but also explainability of good results and robustness
Vision: representations are important; program learning is unique; representations must be specialized based on semantics
MOSES: meta-optimizing semantic evolutionary search, exploiting semantics and managing demes
36
Committee
Dr. Ron Loui (WashU, chair)
Dr. Guy Genin (WashU)
Dr. Ben Goertzel (Virginia Tech / Novamente LLC)
Dr. David E. Goldberg (UIUC)
Dr. John Lockwood (WashU)
Dr. Martin Pelikan (UMSL)
Dr. Robert Pless (WashU)
Dr. William Smart (WashU)
9
Properties of Program Spaces Simplicity prior
simpler programs are more likely
Simplicity preference smaller programs are preferable
Behavioral decomposability fB is separable nearly decomposable
White box execution execution function is known and constant
10
Thesis
Program spaces not directly decomposable
Leverage properties of program spaces as inductive bias
Leading to competent program evolution
11
Representation-Building
Organize programs in terms of commonalities Ignore semantically meaningless variation Explore plausible variations
12
Representation-Building
Common regions must be aligned Redundancy must be identified Create knobs for plausible variations
13
Representation-Building
What abouthellip changing the phase averaging two input instead of picking one hellip
behavior (semantic) space program (syntactic) space
14
Statics amp Dynamics
Representations span a limited subspace of programs
Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)
Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme
based on dominance relationships
15
Meta-Optimizing Semantic Evolutionary Search (MOSES)
1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space
2 Select a deme and run hBOA on it
3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)
4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme
5 Repeat from step 2
16
Artificial Ant
Eat all food pellets within 600 steps Existing evolutionary methods not
significantly than random Space contains many regularities
To apply MOSES three reductions rules for normal form
eg left left left rarr right separate knobs for rotation
movement amp conditionals no neighborhood reduction
needed
rarr
17
Artificial Ant
How does MOSES do it
Searches a greatly reduced space
Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions
appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo
hBOA modeling learns linkage between rotation knobs
Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000
Technique Effort
Evolutionary Programming
136000 x
Genetic Programming
450000 x
MOSES 23000
18
Elegant Normal Form (Holman rsquo90)
Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5
Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)
The same computation on the same formulae reduced to ENF
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
Random Formulae Reduced to ENF
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10
Enumerate all neighbors (edit distances lt2) compute behavioral distance from source
Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
Random formulae Reduced to ENF
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity function of arity k2
total arity is k1k2
Hypothesis parity subfunctions will exhibit tighter linkages
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
10
Thesis
Program spaces not directly decomposable
Leverage properties of program spaces as inductive bias
Leading to competent program evolution
11
Representation-Building
Organize programs in terms of commonalities Ignore semantically meaningless variation Explore plausible variations
12
Representation-Building
Common regions must be aligned Redundancy must be identified Create knobs for plausible variations
13
Representation-Building
What abouthellip changing the phase averaging two input instead of picking one hellip
behavior (semantic) space program (syntactic) space
14
Statics amp Dynamics
Representations span a limited subspace of programs
Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)
Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme
based on dominance relationships
15
Meta-Optimizing Semantic Evolutionary Search (MOSES)
1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space
2 Select a deme and run hBOA on it
3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)
4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme
5 Repeat from step 2
16
Artificial Ant
Eat all food pellets within 600 steps Existing evolutionary methods not
significantly than random Space contains many regularities
To apply MOSES three reductions rules for normal form
eg left left left rarr right separate knobs for rotation
movement amp conditionals no neighborhood reduction
needed
rarr
17
Artificial Ant
How does MOSES do it
Searches a greatly reduced space
Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions
appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo
hBOA modeling learns linkage between rotation knobs
Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000
Technique Effort
Evolutionary Programming
136000 x
Genetic Programming
450000 x
MOSES 23000
18
Elegant Normal Form (Holman rsquo90)
Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5
Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)
The same computation on the same formulae reduced to ENF
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
Random Formulae Reduced to ENF
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10
Enumerate all neighbors (edit distances lt2) compute behavioral distance from source
Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
Random formulae Reduced to ENF
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity function of arity k2
total arity is k1k2
Hypothesis parity subfunctions will exhibit tighter linkages
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
11
Representation-Building
Organize programs in terms of commonalities Ignore semantically meaningless variation Explore plausible variations
12
Representation-Building
Common regions must be aligned Redundancy must be identified Create knobs for plausible variations
13
Representation-Building
What abouthellip changing the phase averaging two input instead of picking one hellip
behavior (semantic) space program (syntactic) space
14
Statics amp Dynamics
Representations span a limited subspace of programs
Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)
Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme
based on dominance relationships
15
Meta-Optimizing Semantic Evolutionary Search (MOSES)
1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space
2 Select a deme and run hBOA on it
3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)
4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme
5 Repeat from step 2
16
Artificial Ant
Eat all food pellets within 600 steps Existing evolutionary methods not
significantly than random Space contains many regularities
To apply MOSES three reductions rules for normal form
eg left left left rarr right separate knobs for rotation
movement amp conditionals no neighborhood reduction
needed
rarr
17
Artificial Ant
How does MOSES do it
Searches a greatly reduced space
Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions
appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo
hBOA modeling learns linkage between rotation knobs
Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000
Technique Effort
Evolutionary Programming
136000 x
Genetic Programming
450000 x
MOSES 23000
18
Elegant Normal Form (Holman rsquo90)
Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5
Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)
The same computation on the same formulae reduced to ENF
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
Random Formulae Reduced to ENF
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10
Enumerate all neighbors (edit distances lt2) compute behavioral distance from source
Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
Random formulae Reduced to ENF
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity function of arity k2
total arity is k1k2
Hypothesis parity subfunctions will exhibit tighter linkages
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
12
Representation-Building
Common regions must be aligned Redundancy must be identified Create knobs for plausible variations
13
Representation-Building
What abouthellip changing the phase averaging two input instead of picking one hellip
behavior (semantic) space program (syntactic) space
14
Statics amp Dynamics
Representations span a limited subspace of programs
Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)
Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme
based on dominance relationships
15
Meta-Optimizing Semantic Evolutionary Search (MOSES)
1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space
2 Select a deme and run hBOA on it
3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)
4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme
5 Repeat from step 2
16
Artificial Ant
Eat all food pellets within 600 steps Existing evolutionary methods not
significantly than random Space contains many regularities
To apply MOSES three reductions rules for normal form
eg left left left rarr right separate knobs for rotation
movement amp conditionals no neighborhood reduction
needed
rarr
17
Artificial Ant
How does MOSES do it
Searches a greatly reduced space
Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions
appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo
hBOA modeling learns linkage between rotation knobs
Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000
Technique Effort
Evolutionary Programming
136000 x
Genetic Programming
450000 x
MOSES 23000
18
Elegant Normal Form (Holman rsquo90)
Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5
Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)
The same computation on the same formulae reduced to ENF
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
Random Formulae Reduced to ENF
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10
Enumerate all neighbors (edit distances lt2) compute behavioral distance from source
Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size
22
Neighborhoods amp Knobs
13
Representation-Building
What about… changing the phase, averaging two inputs instead of picking one, …
behavior (semantic) space ↔ program (syntactic) space
14
Statics & Dynamics
Representations span a limited subspace of programs
Conceptual steps in representation-building:
1. Reduction to normal form (e.g., x + 0 → x)
2. Neighborhood enumeration (generate knobs)
3. Neighborhood reduction (get rid of some knobs)
Create demes to maintain a sample of many representations
deme: a sample of programs living in a common representation
intra-deme optimization: use the hBOA
inter-deme: based on dominance relationships
15
Meta-Optimizing Semantic Evolutionary Search (MOSES)
1. Create an initial deme based on a small set of knobs (i.e., the empty program) and random sampling in knob-space
2. Select a deme and run hBOA on it
3. Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)
4. For each such program:
   1. Create a new representation centered around the program
   2. Create a new random sample within this representation
   3. Add as a deme
5. Repeat from step 2
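The loop above can be sketched in a few lines. This is a hypothetical skeleton, not the thesis implementation: `score`, `make_knobs` (representation-building), `run_hboa`, and the simple better-than-best deme-creation criterion are all placeholders for the components described on the surrounding slides.

```python
import random

def moses(score, make_knobs, run_hboa, max_iters=100):
    """Skeleton of the MOSES loop (hypothetical signatures).

    score: maps a program to a fitness value
    make_knobs: builds a representation (set of knobs) around a program
    run_hboa: optimizes knob settings within one representation,
              returning the final population of programs
    """
    # 1. Initial deme: a representation around the empty program.
    demes = [{"center": None, "knobs": make_knobs(None)}]
    best = None
    for _ in range(max_iters):
        # 2. Select a deme and run hBOA on it.
        deme = random.choice(demes)
        final_pop = run_hboa(deme["knobs"], score)
        # 3./4. Promising programs spawn new demes (simplified criterion).
        for prog in final_pop:
            if best is None or score(prog) > score(best):
                best = prog
                demes.append({"center": prog, "knobs": make_knobs(prog)})
        # 5. Repeat from step 2.
    return best
```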
16
Artificial Ant
Eat all food pellets within 600 steps. Existing evolutionary methods do not perform significantly better than random; the space contains many regularities.
To apply MOSES:
three reduction rules for normal form (e.g., left, left, left → right)
separate knobs for rotation, movement & conditionals
no neighborhood reduction needed
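The turn-reduction rule cited above (three lefts become one right) can be illustrated by collapsing runs of consecutive turns modulo 4. This is my own minimal sketch of that one rule, not the full rule set used for the ant domain:

```python
def reduce_turns(actions):
    """Collapse runs of consecutive turns modulo 4:
    e.g. left, left, left -> right; four lefts cancel out."""
    out = []
    i = 0
    while i < len(actions):
        if actions[i] in ("left", "right"):
            # Net rotation of this run: +1 per left, -1 per right.
            net = 0
            while i < len(actions) and actions[i] in ("left", "right"):
                net += 1 if actions[i] == "left" else -1
                i += 1
            net %= 4
            if net == 1:
                out.append("left")
            elif net == 2:
                out.extend(["left", "left"])  # half-turn, direction irrelevant
            elif net == 3:
                out.append("right")           # three lefts = one right
            # net == 0: the turns cancel entirely
        else:
            out.append(actions[i])
            i += 1
    return out
```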
17
Artificial Ant
How does MOSES do it?
Searches a greatly reduced space.
Exploits key dependencies: "[t]hese symmetries lead to essentially the same solutions appearing to be the opposite of each other. E.g., either a pair of Right or pair of Left terminals at a particular location may be important" – Langdon & Poli, "Why ants are hard"
hBOA modeling learns linkage between rotation knobs.
Eliminate modeling and the problem still gets solved, but with much higher variance; computational effort rises to 36,000.
Technique                   Effort
Evolutionary Programming    136,000
Genetic Programming         450,000
MOSES                        23,000
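The multiples quoted later (6x less effort than EP, 20x less than GP) follow directly from this table:

```python
# Computational effort on the artificial ant, from the table above.
efforts = {"EP": 136_000, "GP": 450_000, "MOSES": 23_000}

for name in ("EP", "GP"):
    ratio = efforts[name] / efforts["MOSES"]
    print(f"MOSES uses about {ratio:.0f}x less effort than {name}")
```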
18
Elegant Normal Form (Holman rsquo90)
Hierarchical normal form for Boolean formulae. The reduction process takes time linear in formula size; 99% of random 500-literal formulae are reduced in size by over 98%.
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance?
5,000 unique random formulae of arity 10 with 30 literals each (qualitatively similar results for arity 5).
Computed the set of pairwise behavioral distances (truth-table Hamming distance) and syntactic distances (tree edit distance, normalized by tree size).
The same computation on the same formulae reduced to ENF.
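Behavioral distance as truth-table Hamming distance can be computed by evaluating both formulae on all 2^n assignments. A minimal sketch, assuming formulae are represented as nested tuples (my own encoding, not the thesis's):

```python
from itertools import product

def evaluate(formula, assignment):
    """Evaluate a Boolean formula given as nested tuples,
    e.g. ("and", ("not", "x1"), "x2"), under a dict of variable values."""
    if isinstance(formula, str):
        return assignment[formula]
    op, *args = formula
    vals = [evaluate(a, assignment) for a in args]
    if op == "and":
        return all(vals)
    if op == "or":
        return any(vals)
    if op == "not":
        return not vals[0]
    raise ValueError(f"unknown operator: {op}")

def behavioral_distance(f, g, variables):
    """Truth-table Hamming distance: count of assignments where f and g differ."""
    return sum(
        evaluate(f, dict(zip(variables, bits))) != evaluate(g, dict(zip(variables, bits)))
        for bits in product([False, True], repeat=len(variables))
    )
```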
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance?
(Scatter plots: random formulae vs. formulae reduced to ENF.)
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally?
1,000 unique random formulae, arity 5, 100 literals each (qualitatively similar results for arity 10).
Enumerate all neighbors (edit distance < 2) and compute their behavioral distance from the source.
Neighborhoods in MOSES are defined based on ENF: neighbors are converted to ENF and compared to the original, which is used to heuristically reduce total neighborhood size.
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally?
(Plots: random formulae vs. formulae reduced to ENF.)
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity functions of arity k2;
total arity is k1·k2.
Hypothesis: parity subfunctions will exhibit tighter linkages.
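For concreteness, a sketch of such a target function: a 3-input multiplexer (one address bit, two data bits) fed by three 2-ary parity subfunctions, total arity 6. The exact wiring of raw inputs to subfunctions is my assumption for illustration:

```python
def parity(bits):
    """Odd parity: True iff an odd number of inputs are set."""
    return sum(bits) % 2 == 1

def multiplexer3(a, d0, d1):
    """3-input multiplexer: one address bit selects between two data bits."""
    return d1 if a else d0

def parity_multiplexer(x):
    """Hypothetical 2-parity-3-multiplexer (total arity 6): each multiplexer
    input is the parity of a disjoint pair of raw inputs."""
    assert len(x) == 6
    inner = [parity(x[2 * i:2 * i + 2]) for i in range(3)]
    return multiplexer3(*inner)
```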
24
Hierarchical Parity-Multiplexer
Computational effort decreases 42% with model-building (on the 2-parity-3-multiplexer).
Parity subfunctions (adjacent pairs) have the tightest linkages.
Hypothesis validated.
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem          hBOA         Representation-Building   Program Evaluation
5-Parity         28%          43%                       29%
11-multiplexer    5%           5%                       89%
CFS              80%          10%                       11%
Complexity       O(N·l²·a²)   O(N·l·a)                  O(N·l·c)
N is population size, O(n^1.05); l is program size; a is the arity of the space; n is representation size, O(a·program size); c is the number of test cases.
28
Supervised Classification
Goals:
accuracies comparable to SVM
superior accuracy vs. GP
simpler classifiers vs. SVM and GP
29
Supervised Classification
How much simpler? Consider average-sized formulae learned for the 6-multiplexer.
MOSES: 21 nodes, max depth 4
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF): 50 nodes, max depth 7
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
30
Supervised Classification
Datasets taken from recent computational biology papers.
Chronic fatigue syndrome (101 cases): based on 26 SNPs; genes either in homozygosis, in heterozygosis, or not expressed; 56 binary features.
Lymphoma (77 cases) & aging brains (19 cases): based on gene expression levels (continuous); the 50 most-differentiating genes selected and preprocessed into binary features based on medians.
All experiments based on 10 independent runs of 10-fold cross-validation.
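The evaluation protocol above (10 independent runs of 10-fold cross-validation, scores averaged over all 100 train/test splits) can be sketched with a placeholder learner; `train_and_score` stands in for whichever classifier is being evaluated:

```python
import random

def repeated_cv(data, train_and_score, runs=10, folds=10, seed=0):
    """Average test score over `runs` independent shufflings, each
    evaluated by `folds`-fold cross-validation.

    train_and_score(train, test) -> test accuracy (placeholder learner).
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        idx = list(range(len(data)))
        rng.shuffle(idx)
        for k in range(folds):
            # Every folds-th shuffled index forms one held-out fold.
            test_idx = set(idx[k::folds])
            train = [data[i] for i in idx if i not in test_idx]
            test = [data[i] for i in test_idx]
            scores.append(train_and_score(train, test))
    return sum(scores) / len(scores)
```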
31
Quantitative Results
Classification: average test accuracy (%)
Technique   CFS    Lymphoma   Aging Brain
SVM         66.2   97.5       95.0
GP          67.3   77.9       70.0
MOSES       67.9   94.6       95.3
32
Quantitative Results
Benchmark performance:
artificial ant: 6x less computational effort vs. EP, 20x less vs. GP
parity problems: 13.3x less vs. EP and 4x less vs. GP on 5-parity; found solutions to 6-parity (none found by EP or GP)
multiplexer problems: 9x less vs. GP on the 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution: all requirements for competent optimization,
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty:
program-level, adapted from the optimization case
deme-level, a theory based on global properties of the space (deme-level neutrality, deceptiveness, etc.)
34
Qualitative Results
Representation-building for programs:
parameterization based on semantics
transforms program space properties to facilitate program evolution
Probabilistic modeling over sets of program transformations:
models compactly represent problem structure
35
Competent Program Evolution
Competent: not just good performance, but also explainability of good results and robustness.
Vision: representations are important; program learning is unique; representations must be specialized based on semantics.
MOSES: meta-optimizing semantic evolutionary search, exploiting semantics and managing demes.
36
Committee
Dr. Ron Loui (WashU, chair)
Dr. Guy Genin (WashU)
Dr. Ben Goertzel (Virginia Tech / Novamente LLC)
Dr. David E. Goldberg (UIUC)
Dr. John Lockwood (WashU)
Dr. Martin Pelikan (UMSL)
Dr. Robert Pless (WashU)
Dr. William Smart (WashU)
14
Statics amp Dynamics
Representations span a limited subspace of programs
Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)
Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme
based on dominance relationships
15
Meta-Optimizing Semantic Evolutionary Search (MOSES)
1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space
2 Select a deme and run hBOA on it
3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)
4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme
5 Repeat from step 2
16
Artificial Ant
Eat all food pellets within 600 steps Existing evolutionary methods not
significantly than random Space contains many regularities
To apply MOSES three reductions rules for normal form
eg left left left rarr right separate knobs for rotation
movement amp conditionals no neighborhood reduction
needed
rarr
17
Artificial Ant
How does MOSES do it
Searches a greatly reduced space
Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions
appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo
hBOA modeling learns linkage between rotation knobs
Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000
Technique Effort
Evolutionary Programming
136000 x
Genetic Programming
450000 x
MOSES 23000
18
Elegant Normal Form (Holman rsquo90)
Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5
Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)
The same computation on the same formulae reduced to ENF
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
Random Formulae Reduced to ENF
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10
Enumerate all neighbors (edit distances lt2) compute behavioral distance from source
Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
Random formulae Reduced to ENF
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity function of arity k2
total arity is k1k2
Hypothesis parity subfunctions will exhibit tighter linkages
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
15
Meta-Optimizing Semantic Evolutionary Search (MOSES)
1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space
2 Select a deme and run hBOA on it
3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)
4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme
5 Repeat from step 2
16
Artificial Ant
Eat all food pellets within 600 steps Existing evolutionary methods not
significantly than random Space contains many regularities
To apply MOSES three reductions rules for normal form
eg left left left rarr right separate knobs for rotation
movement amp conditionals no neighborhood reduction
needed
rarr
17
Artificial Ant
How does MOSES do it
Searches a greatly reduced space
Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions
appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo
hBOA modeling learns linkage between rotation knobs
Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000
Technique Effort
Evolutionary Programming
136000 x
Genetic Programming
450000 x
MOSES 23000
18
Elegant Normal Form (Holman rsquo90)
Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5
Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)
The same computation on the same formulae reduced to ENF
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
Random Formulae Reduced to ENF
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10
Enumerate all neighbors (edit distances lt2) compute behavioral distance from source
Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
Random formulae Reduced to ENF
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity function of arity k2
total arity is k1k2
Hypothesis parity subfunctions will exhibit tighter linkages
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
16
Artificial Ant
Eat all food pellets within 600 steps Existing evolutionary methods not
significantly than random Space contains many regularities
To apply MOSES three reductions rules for normal form
eg left left left rarr right separate knobs for rotation
movement amp conditionals no neighborhood reduction
needed
rarr
17
Artificial Ant
How does MOSES do it
Searches a greatly reduced space
Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions
appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo
hBOA modeling learns linkage between rotation knobs
Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000
Technique Effort
Evolutionary Programming
136000 x
Genetic Programming
450000 x
MOSES 23000
18
Elegant Normal Form (Holman rsquo90)
Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5
Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)
The same computation on the same formulae reduced to ENF
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
Random Formulae Reduced to ENF
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10
Enumerate all neighbors (edit distances lt2) compute behavioral distance from source
Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
Random formulae Reduced to ENF
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity function of arity k2
total arity is k1k2
Hypothesis parity subfunctions will exhibit tighter linkages
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
17
Artificial Ant
How does MOSES do it
Searches a greatly reduced space
Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions
appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo
hBOA modeling learns linkage between rotation knobs
Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000
Technique Effort
Evolutionary Programming
136000 x
Genetic Programming
450000 x
MOSES 23000
18
Elegant Normal Form (Holman rsquo90)
Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5
Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)
The same computation on the same formulae reduced to ENF
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
Random Formulae Reduced to ENF
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10
Enumerate all neighbors (edit distances lt2) compute behavioral distance from source
Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
Random formulae Reduced to ENF
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity function of arity k2
total arity is k1k2
Hypothesis parity subfunctions will exhibit tighter linkages
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent: not just good performance, but explainability of good results and robustness
Vision: representations are important; program learning is unique; representations must be specialized based on semantics
MOSES: meta-optimizing semantic evolutionary search, exploiting semantics and managing demes
36
Committee
Dr. Ron Loui (WashU, chair)
Dr. Guy Genin (WashU)
Dr. Ben Goertzel (Virginia Tech / Novamente LLC)
Dr. David E. Goldberg (UIUC)
Dr. John Lockwood (WashU)
Dr. Martin Pelikan (UMSL)
Dr. Robert Pless (WashU)
Dr. William Smart (WashU)
18
Elegant Normal Form (Holman '90)
Hierarchical normal form for Boolean formulae
Reduction process takes time linear in formula size
99% of random 500-literal formulae were reduced by over 98%
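Holman's full ENF reduction is not reproduced here, but a toy sketch of the kind of local rewrites it applies (flattening nested connectives, dropping duplicates, double negation, complement cancellation) shows why reduction stays cheap and local; the nested-tuple encoding such as `('and', 'x1', ('not', 'x2'))` is our own, not the thesis's:

```python
def reduce_formula(f):
    """Apply a few ENF-style local rewrites to a Boolean formula
    encoded as nested tuples ('and'/'or'/'not', children...) with
    string literals like 'x1'. NOT the full ENF reduction."""
    if isinstance(f, str):                      # literal: nothing to do
        return f
    op, args = f[0], [reduce_formula(a) for a in f[1:]]
    if op == 'not':
        a = args[0]
        if isinstance(a, tuple) and a[0] == 'not':
            return a[1]                         # not(not(x)) -> x
        return ('not', a)
    flat = []
    for a in args:                              # flatten same-op children
        if isinstance(a, tuple) and a[0] == op:
            flat.extend(a[1:])
        else:
            flat.append(a)
    uniq = []
    for a in flat:                              # drop exact duplicates
        if a not in uniq:
            uniq.append(a)
    for a in uniq:                              # complement law
        if ('not', a) in uniq:
            return '0' if op == 'and' else '1'
    if len(uniq) == 1:
        return uniq[0]
    return (op,) + tuple(uniq)
```

Each rewrite only inspects a node and its children, which is consistent with the linear-time claim above.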
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance?
5000 unique random formulae of arity 10 with 30 literals each (qualitatively similar results for arity 5)
Computed the set of pairwise behavioral distances (truth-table Hamming distance) and syntactic distances (tree edit distance, normalized by tree size)
The same computation on the same formulae, reduced to ENF
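The behavioral-distance metric above is straightforward to state in code (a minimal sketch; representing formulae as plain Python callables is our simplification):

```python
from itertools import product

def truth_table(f, n):
    """Evaluate Boolean function f on all 2**n input tuples of 0/1."""
    return [f(bits) for bits in product([0, 1], repeat=n)]

def behavioral_distance(f, g, n):
    """Truth-table Hamming distance: number of inputs where f and g differ."""
    return sum(a != b for a, b in zip(truth_table(f, n), truth_table(g, n)))
```

Syntactic distance (tree edit distance) is more involved and omitted; the experiment compares these two quantities across all pairs of sampled formulae.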
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance?
[Figure: syntactic vs behavioral distance for random formulae and for formulae reduced to ENF]
21
Neighborhoods & Knobs
What do neighborhoods look like behaviorally?
1000 unique random formulae, arity 5, 100 literals each (qualitatively similar results for arity 10)
Enumerate all neighbors (edit distances <2), compute behavioral distance from source
Neighborhoods in MOSES are defined based on ENF; neighbors are converted to ENF and compared to the original, used to heuristically reduce total neighborhood size
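Neighbor enumeration can be sketched for one simple edit type, negating a single literal (a hedged illustration only; the experiment above uses general tree edits, and the tuple encoding is ours):

```python
def neighbors(f):
    """Yield formulae one edit away from f, where the only edit
    considered is toggling a literal: x <-> not(x).
    Formulae are nested tuples like ('and', 'x1', ('not', 'x2'))."""
    if isinstance(f, str):                      # bare literal -> negate it
        yield ('not', f)
        return
    if f[0] == 'not' and isinstance(f[1], str): # negated literal -> strip
        yield f[1]
        return
    for i in range(1, len(f)):                  # recurse into each child
        for n in neighbors(f[i]):
            yield f[:i] + (n,) + f[i + 1:]
```

Computing the behavioral distance of each yielded neighbor from its source gives the kind of neighborhood profile the slide describes.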
22
Neighborhoods & Knobs
What do neighborhoods look like behaviorally?
[Figure: behavioral distances of neighbors for random formulae and for formulae reduced to ENF]
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1, computed from k1 parity functions of arity k2
total arity is k1*k2
Hypothesis: parity subfunctions will exhibit tighter linkages
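The construction can be made concrete with a short sketch (the block layout, contiguous k2-bit blocks with the address-line parities first, is our assumption about the wiring, not stated on the slide):

```python
from functools import reduce
from operator import xor

def parity_multiplexer(bits, k1, k2):
    """Hierarchical parity-multiplexer: each of the k1 multiplexer
    inputs is the parity of its own block of k2 raw inputs
    (total arity k1*k2). The address-line count a satisfies
    a + 2**a == k1 (e.g. k1=3 gives 1 address and 2 data lines)."""
    assert len(bits) == k1 * k2
    parities = [reduce(xor, bits[i * k2:(i + 1) * k2]) for i in range(k1)]
    a = next(n for n in range(k1) if n + 2 ** n == k1)
    address = 0
    for bit in parities[:a]:
        address = (address << 1) | bit
    return parities[a + address]
```

With k1=3 and k2=2 this is the 2-parity-3-multiplexer of the next slide, a 6-input function whose natural decomposition pairs adjacent inputs.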
24
Hierarchical Parity-Multiplexer
Computational effort decreases 42% with model-building (on 2-parity-3-multiplexer)
Parity subfunctions (adjacent pairs) have the tightest linkages
Hypothesis validated
25
Program Growth
5-parity: minimal program size ~53
26
Program Growth
11-multiplexer: minimal program size ~27
27
Where do the Cycles Go?

Problem          hBOA            Representation-Building   Program Evaluation
5-Parity         28%             43%                       29%
11-multiplexer   5%              5%                        89%
CFS              80%             10%                       11%
Complexity       O(N*l^2*a^2)    O(N*l*a)                  O(N*l*c)

N is population size, O(n^1.05); l is program size; a is the arity of the space; n is representation size, O(a * program size); c is the number of test cases
28
Supervised Classification
Goals:
accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
19
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5
Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)
The same computation on the same formulae reduced to ENF
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
Random Formulae Reduced to ENF
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10
Enumerate all neighbors (edit distances lt2) compute behavioral distance from source
Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
Random formulae Reduced to ENF
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity function of arity k2
total arity is k1k2
Hypothesis parity subfunctions will exhibit tighter linkages
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
20
Syntactic vs Behavioral Distance
Is there a correlation between syntactic and behavioral distance
Random Formulae Reduced to ENF
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10
Enumerate all neighbors (edit distances lt2) compute behavioral distance from source
Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
Random formulae Reduced to ENF
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity function of arity k2
total arity is k1k2
Hypothesis parity subfunctions will exhibit tighter linkages
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
21
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10
Enumerate all neighbors (edit distances lt2) compute behavioral distance from source
Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
Random formulae Reduced to ENF
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity function of arity k2
total arity is k1k2
Hypothesis parity subfunctions will exhibit tighter linkages
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
22
Neighborhoods amp Knobs
What do neighborhoods look like behaviorally
Random formulae Reduced to ENF
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity function of arity k2
total arity is k1k2
Hypothesis parity subfunctions will exhibit tighter linkages
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
23
Hierarchical Parity-Multiplexer
Study decomposition in a Boolean domain
Multiplexer function of arity k1 computed from k1 parity function of arity k2
total arity is k1k2
Hypothesis parity subfunctions will exhibit tighter linkages
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
24
Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-
building (on 2-parity-3-multiplexer)
Paritysubfunctions(adjacent pairs)have tightest linkages
Hypothesis validated
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
25
Program Growth
5-parity minimal program size ~ 53
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals accuracies comparable to SVM
superior accuracy vs GP
simpler classifiers vs SVM and GP
29
Supervised Classification
How much simpler Consider average-sized formulae learned for the 6-multiplexer
MOSES 21 nodes max depth 4
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF) 50 nodes max depth 7
30
Supervised Classification
Datasets taken from recent comp bio papers
Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features
Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians
All experiments based on 10 independent runs of 10-fold cross-validation
31
Quantitative Results
Classification average test accuracy
Technique CFS Lymphoma Aging Brain
SVM 662 975 950
GP 673 779 700
MOSES 679 946 953
32
Quantitative Results
Benchmark performance artificial ant
6x less computational effort vs EP 20x less vs GP
parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)
multiplexer problems 9x less vs GP on 11-multiplexer
33
Qualitative Results
Requirements for competent program evolution all requirements for competent optimization
+ exploit semantics
+ recombine programs only within bounded subspaces
Bipartite conception of problem difficulty program-level adapted from the optimization case
deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)
34
Qualitative Results
Representation-building for programs
parameterization based on semantics
transforms program space properties to facilitate program evolution
probabilistic modeling over sets of program transformations
models compactly represent problem structure
35
Competent Program Evolution
Competent not just good performance explainability of good results robustness
Vision representations are important program learning is unique representations must be specialized based on semantics
MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes
36
Committee
Dr Ron Loui (WashU chair)
Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech
Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)
26
Program Growth
11-multiplexer minimal program size ~ 27
27
Where do the Cycles Go
Problem hBOA Representation-Building
Program Evaluation
5-Parity 28 43 29
11-multiplex 5 5 89
CFS 80 10 11
Complexity O(Nl2a2) O(Nla) O(Nlc)
N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases
28
Supervised Classification
Goals: accuracies comparable to SVM,
superior accuracy vs. GP,
simpler classifiers than SVM and GP.
29
Supervised Classification
How much simpler? Consider average-sized formulae learned for the 6-multiplexer:
MOSES: 21 nodes, max depth 4:
or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
GP (after reduction to ENF): 50 nodes, max depth 7:
and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
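The node and depth counts can be checked mechanically with a small s-expression parser. A sketch (formula strings copied from the slide; the counts assume the convention that a negated literal such as not(x1) is a single node, and depth is the number of edges from the root to the deepest leaf):

```python
import re

def parse(s):
    """Parse a formula like and(or(x1 x2) not(x3)) into (head, children) tuples."""
    tokens = re.findall(r"[()]|[^\s()]+", s)
    pos = 0
    def expr():
        nonlocal pos
        head = tokens[pos]; pos += 1
        kids = []
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1
            while tokens[pos] != ")":
                kids.append(expr())
            pos += 1  # consume ')'
        return (head, kids)
    return expr()

def size(node):
    head, kids = node
    # a negated literal, e.g. not(x1), counts as one node
    if head == "not" and len(kids) == 1 and not kids[0][1]:
        return 1
    return 1 + sum(size(k) for k in kids)

def depth(node):
    _, kids = node
    return 0 if not kids else 1 + max(depth(k) for k in kids)

moses = ("or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) "
         "and(not(x1) x2 x5) and(x1 x2 x6))")
gp = ("and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) "
      "or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) "
      "and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) "
      "or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))")

print(size(parse(moses)), depth(parse(moses)))  # 21 4
print(size(parse(gp)), depth(parse(gp)))        # 50 7
```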
30
Supervised Classification
Datasets taken from recent computational biology papers:
Chronic fatigue syndrome (101 cases): based on 26 SNPs; each gene either in homozygosis, in heterozygosis, or not expressed; 56 binary features.
Lymphoma (77 cases) & aging brains (19 cases): based on gene expression levels (continuous); 50 most-differentiating genes selected, preprocessed into binary features based on medians.
All experiments based on 10 independent runs of 10-fold cross-validation.
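The evaluation protocol (10 independent runs of 10-fold cross-validation) can be sketched as follows; `score_fold` is a hypothetical stand-in for training and testing any of the compared classifiers on one train/test split:

```python
import random

def kfold_indices(n, k, rng):
    """Shuffle n indices and split them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n, score_fold, k=10, runs=10, seed=0):
    """Average test accuracy over `runs` independent rounds of k-fold CV."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        for test in kfold_indices(n, k, rng):
            test_set = set(test)
            train = [i for i in range(n) if i not in test_set]
            scores.append(score_fold(train, test))
    return sum(scores) / len(scores)
```

For the CFS data (n = 101), each of the 10 folds holds 10 or 11 cases, and the reported accuracy is the mean over all 100 fold evaluations.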
31
Quantitative Results
Classification: average test accuracy
Technique   CFS     Lymphoma   Aging Brain
SVM         66.2%   97.5%      95.0%
GP          67.3%   77.9%      70.0%
MOSES       67.9%   94.6%      95.3%
32
Quantitative Results
Benchmark performance:
artificial ant: 6x less computational effort vs. EP, 20x less vs. GP
parity problems: 1.33x less vs. EP and 4x less vs. GP on 5-parity; found solutions to 6-parity (none found by EP or GP)
multiplexer problems: 9x less vs. GP on the 11-multiplexer
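"Computational effort" here is, presumably, Koza's standard measure for generational runs: the minimum over generations of the number of individuals that must be processed to find a solution with probability z (conventionally 99%). A sketch, assuming cumulative success probabilities per generation have been estimated from repeated runs (treating i as the number of generations processed; some formulations use i+1 for 0-indexed generations):

```python
import math

def computational_effort(M, success_prob_by_gen, z=0.99):
    """Koza's I(M, i, z): population size M times the best trade-off between
    number of generations i and independent restarts R needed for confidence z."""
    best = math.inf
    for i, p in enumerate(success_prob_by_gen, start=1):
        if p >= 1:
            runs = 1
        elif p > 0:
            runs = math.ceil(math.log(1 - z) / math.log(1 - p))
        else:
            continue  # no successes observed by generation i
        best = min(best, M * i * runs)
    return best
```

For example, with population size 100 and a 50% cumulative success rate by generation 2, seven independent restarts give 99% confidence, for an effort of 100 × 2 × 7 = 1400 individuals.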
33
Qualitative Results
Requirements for competent program evolution: all requirements for competent optimization,
+ exploit semantics,
+ recombine programs only within bounded subspaces.
Bipartite conception of problem difficulty: program-level, adapted from the optimization case;
deme-level, a theory based on global properties of the space (deme-level neutrality, deceptiveness, etc.)
34
Qualitative Results
Representation-building for programs:
parameterization based on semantics,
transforms program-space properties to facilitate program evolution.
Probabilistic modeling over sets of program transformations:
models compactly represent problem structure.
35
Competent Program Evolution
Competent: not just good performance, but explainability of good results and robustness.
Vision: representations are important; program learning is unique; representations must be specialized based on semantics.
MOSES: meta-optimizing semantic evolutionary search, exploiting semantics and managing demes.
36
Committee
Dr. Ron Loui (WashU, chair)
Dr. Guy Genin (WashU)
Dr. Ben Goertzel (Virginia Tech / Novamente LLC)
Dr. David E. Goldberg (UIUC)
Dr. John Lockwood (WashU)
Dr. Martin Pelikan (UMSL)
Dr. Robert Pless (WashU)
Dr. William Smart (WashU)