
The Australian National University

Epigenetics in Evolutionary Algorithms and Computer Generated Artwork

COMP8755

William Maroney (u5612989)

Supervised by: Tom Gedeon and Bob McKay

May 26, 2017

Abstract

This report presents extensions to genetic algorithms that incorporate possible models of epigenetics, and demonstrates how they can be applied to an interactive evolutionary computing problem involving computer generated artwork.

This computer generated artwork problem provides a sufficiently complex, non-trivial setting to investigate whether these extensions help, hinder or do not influence the performance of the genetic algorithm.

The genetic algorithm is extended with two possible models incorporating epigenetics. The abstract models used assume that two identical phenotypes could come from different underlying genomes, but that there would be a differential cost to the individual to achieve the phenotype, which would provide different evolutionary selective pressure in even seemingly identical individuals.

Initial experimental results are inconclusive as to whether these extensions actually improve the convergence of genetic algorithms; however, they are shown to affect the selective pressures of genetic algorithms.

The suggestive results identify future work options to take these ideas forward.


Acknowledgements

Thank you to my supervisors Tom Gedeon and Bob McKay for their continual support, encouragement and direction throughout this project. Without their tireless efforts in working with me to understand issues as they arose, and in identifying resolutions, this project would not have progressed at all. In particular their willingness to review my work, a number of times while in a questionable state of completion, was invaluable.

To everyone who generously gave their time to participate in testing, I thank you.


Contents

1. Introduction

2. Background
2.1. Evolutionary Computing Background
2.2. Computer Generated Artwork
2.3. Prior and Related Works
2.4. Motivation
2.5. Major Contribution of this Work
2.6. Roadmap for this Report

3. Algorithms
3.1. Overview
3.2. Genotypes, Phenotypes and Epigenetics
3.3. Notation and Definitions
3.4. Hyper-parameters
3.5. The Genetic Algorithm
3.6. Proposed Epigenetic Models

4. Software Architecture
4.1. Software Operation
4.2. Software Synopsis

5. Experiments
5.1. Test Hypotheses
5.2. Key Performance Measures
5.3. Experimental Set-up
5.4. Results and Analysis

6. Conclusion and Future Work
6.1. Suggested Future Work

References

Appendix A. Independent Study Contract

Appendix B. Original Project Outline

Appendix C. Software Description

Appendix D. Software Usage

Appendix E. Darwindrian Representation (Jian Yin Shen)

Appendix F. Darwindrian Representation (Mathew Smith)

Appendix G. Experimental Data

1. Introduction

Evolutionary biology, as it is currently understood, models changes in heritable traits in a population of individual biological entities across generations. Whether or not a trait possessed by an individual in one generation is passed on to subsequent generations is subject to natural selection, or survival of the fittest. Simulating this core concept forms the basis for genetic algorithms.

Genetic algorithms can solve various computational problems by simulating biological evolution. This simulation guides the search of a possible set of solutions in a way that mimics natural selection. With a suitable problem and mechanism to determine the fitness of any possible solution, genetic algorithms can outperform naive brute force or random search strategies. By performing these simulations on a computer, genetic algorithms may very quickly explore possible solutions, leveraging the ever-increasing computational power of modern computers. However, there is a subset of problems where assessing the fitness of individuals is not well understood or well defined, meaning complete automation of a genetic algorithm solution is not possible. In such problems, a human must provide insight into the critical fitness evaluation step before the genetic algorithm can proceed. This is known as interactive evolutionary computing.

Examples of such problems involve the computer generation of music or artwork that is aesthetically pleasing. Determining whether any individual, or many people, will enjoy a certain piece of music or artwork is not something that has been reduced to a deterministic function. The surest way to find out is to ask people their opinion.

Requiring humans to evaluate many individual works, however, exposes the inconsistency and mortal limits of the human. A human may make a mistake, may not know and guess, may change their mind, or may get tired. The human factor will introduce noise and fragility into any interactive evolutionary computing solution.

In the context of genetic algorithms, various bodies of research have attempted to optimise how quickly fitness evaluations may be produced; however, the human element will always be a limiting factor in interactive evolutionary computing. Optimising other aspects of genetic algorithms is unlikely to overcome this limitation.

This work takes a different approach: accepting the human factor as an unavoidable and limiting constraint, how can one extract the most value from each piece of human-provided information?

The overarching goal of this work is to maximise the value extracted from each and every human engagement in the interactive evolutionary computing model. That is:

• to extract as much information as possible from individual fitness evaluations

• to converge to satisfactory solutions in as few iterations as possible

• to reduce or smooth the noise inherent in human provided fitness evaluations

This report investigates a proposition by Gedeon [13] for incorporating a model of epigenetics into genetic algorithms.


2. Background

2.1. Evolutionary Computing Background

Genetic algorithms, often attributed in their contemporary form to Bremermann [1], Fraser [2] and Holland [3], are a set of techniques that have practical applications to optimisation and search problems where:

• there may be no single “best” answer

• we may be satisfied with a “good enough” answer

• the search space is not well-defined, precluding gradient-descent methods

• the search space is prohibitively large or complex

• the quality of given candidate solutions (fitness) can be determined relatively easily

The genetic algorithm is a specific example of an evolutionary algorithm, or evolutionary computing; that is, algorithms inspired by biological evolutionary processes. In the case of genetic algorithms, the theory of natural selection is simulated to guide the exploration of the possible search space, with full details presented in section 3.5.

Evolutionary computation includes the special case of interactive evolutionary computation (IEC). IEC is defined by the additional constraint that determining the quality of candidate solutions (i.e. computing their “fitness”) requires human intervention.

IEC can be used where an individual’s aesthetic preferences are involved in determining the fitness of a solution, for example in the creation of computer generated artwork or music that is pleasing to a person or a group of people. With current knowledge, it is generally not possible to describe human aesthetic responses programmatically or deterministically. So while IEC provides a framework to intelligently produce many candidate solutions quickly, human input is generally required to assess those solutions.

Importantly, there are some optimisation/search problems where the requirement for human input is unavoidable; this inevitably introduces a significant bottleneck. The utilisation of computing power can only accelerate evolutionary algorithms so much. They all generally involve an iterative step where fitness values must be computed (or in the case of IEC, received from a human). This periodic and sequential need for human input limits the number of candidates that can be evaluated within an IEC algorithm. This is further exacerbated by human fatigue; people can only perform repetitive tasks for so long before they tire or become distracted. These factors motivate the search for better performing search/optimisation algorithms, that is, algorithms that will find “good enough” solutions as quickly as possible.


2.2. Computer Generated Artwork

The Dutch painter Piet Mondrian (1872-1944) was one of the key contributors to an artistic movement known as neoplasticism [4]. This particular category of artwork was abstract in nature, consisting of very simple geometric objects and rules. In particular:

• the use of a white canvas background

• solid black lines drawn only in horizontal and vertical directions

• rectangular regions coloured in one of three primary colours

• all lines ending on the canvas border or intersecting another line

• no right angles (excluding a line intersecting the canvas border)

• no lines parallel to and adjacent to a border

• no lines parallel to and adjacent to each other

The following image is one of Mondrian’s works and illustrates these geometric criteria.

Figure 2.1: Composition II in Red, Blue, and Yellow, 1930 [5]

These specific geometric properties are particularly amenable to computer representation and manipulation, and thus will form the motivating problem considered in this work. That is, can we produce computer-generated and aesthetically pleasing “Mondrian-like” images based on simple user feedback of candidate artworks?


2.3. Prior and Related Works

This exact problem has been considered in earlier works by Shen [6] and Smith [7]. Shen encapsulated this as an optimisation problem and applied various evolutionary algorithms¹ to solve it. Smith then built on this work, broadly maintaining the problem encapsulation while investigating whether the performance of Shen’s methods could be improved. Smith investigated a number of problem and implementation specific optimisations along with some algorithmic adaptations; the major contribution modelled user behaviour with artificial neural networks in an attempt to automatically make choices based on likely user actions. This allowed actual human input to be reserved for cases where only choices with higher degrees of uncertainty were available.

These prior works applied the conventional genetic algorithm, without a model for epigenetics, to the problem of generating aesthetically pleasing Mondrian-like artwork. The algorithmic details of Shen’s and Smith’s prior works are described in appendices E and F respectively. These algorithmic details are captured in this report to illustrate a number of problem-specific heuristics that were incorporated into those implementations of the interactive genetic algorithm. This work removes these heuristics as they introduce biases into the optimisation; removing them allows for a clearer comparison between the genetic algorithm with and without the proposed epigenetic models.

Some of these problem-specific heuristics resulted in significant deviation from the typical behaviour of genetic algorithms. For example, Shen’s work introduced constraints on valid pairs of parents, making their pairing almost deterministic. Further, each gene within a chromosome was independently subject to crossover, the details of which were fully determined by the relative fitness values of the parents undergoing crossover. Finally, no mutation operator was employed. The lack of fresh genetic material resulted in heavily biased search behaviour and an inability to escape local optima.

Outside of the changes to the genetic algorithms, specific desirable traits for the final Mondrian-like artworks were identified and sought out in Shen’s model. In particular, the inclusion of small coloured rectangles in the final images was preferred. Complementary biases were introduced to ensure that such coloured rectangles would quickly be achieved, artificially converging to pre-identified examples of “good” solutions.

Smith’s work identified both the lack of a mutation operator and this convergence towards small coloured rectangles. However, the solutions employed largely reduced the strength of these biases but did not ultimately remove them.

The work discussed in this report broadly reuses the problem definition, the encoding of candidate solutions and the Mondrian-like image creation logic. But while these design components are retained, they have been re-implemented in an unbiased framework of genetic algorithms to enable an objective comparison between the various epigenetic models and traditional genetic algorithms.

Section 3 gives a more detailed treatment of genetic algorithms: it presents their conceptual set-up and defines in detail how they can be applied to the Mondrian-like problem.

¹ The interactive genetic algorithm and the interactive bacterial evolution algorithm


2.4. Motivation

The evolutionary computing background presented in section 2.1 and the problem definition motivate the desire for a better performing IEC technique. “Better” performing in this context means an algorithm that converges to a “good enough” solution with fewer fitness evaluations, thus mitigating to some extent the human bottleneck factor.

2.5. Major Contribution of this Work

Genetic algorithms have been demonstrated to practically solve real-world optimisation problems (for example, see [8], [9], [10]). They do this by modifying their search behaviour through a possible solution space in a way inspired by the theory of natural selection. Importantly, this simulation is somewhat crude and just captures a few key concepts:

• survival of the fittest - the fitter an individual is, the greater chance it has of passing its genetic material on to subsequent generations

• random mutation - randomly, the genetic material of individuals can mutate, producing better or worse results than would otherwise be achieved

The fact that even this very high-level approximation of biological evolution results in convergence better than expected by random search is suggestive. It motivates the question: does simulating biological evolutionary processes more accurately result in even better performance?

The major contribution of this work is to investigate this question. In particular, the genetic algorithm is extended to incorporate aspects of epigenetics.

Additionally, the minimisation of heuristics specific to the Mondrian-like problem in this work aims to ensure that any findings, positive or negative, can be applied more generally to optimisation problems, be they interactive or not.

2.6. Roadmap for this Report

The rest of this report is broken up into a number of major sections. Firstly, the genetic algorithm as it applies to the Mondrian-like problem is described in detail along with two specific epigenetic extensions. Secondly, the software architecture of an experimental environment that was built to test the ideas of this work is described. Thirdly, the experiments conducted are documented along with their results. Finally, this report concludes with a high-level analysis of the results of this work and suggests future work to carry this further.


3. Algorithms

This section presents the algorithmic details of the Mondrian-like problem and how to use the genetic algorithm to solve it, with and without extensions incorporating possible models of epigenetics.

3.1. Overview

The genetic algorithm is an optimisation technique that simulates evolutionary processes to converge to optima. These processes are derived from the theory of natural selection. The genetic algorithm operates on a fixed-size generation of candidate solutions, or individuals. It also requires a real-valued fitness function, which acts on an individual and provides a measure of “how good” a solution it is. The genetic algorithm operates on the current generation of individuals by selecting subsets of individuals to act as parents that are combined to produce offspring in the next generation. The fittest individuals (i.e. those with the highest fitness value) are more likely to be selected as parents and thus contribute to subsequent generations. The specific details of this transition from one generation to another can vary by problem and implementation; this report will denote the selection of these specific details as the evolution² algorithm. The evolution algorithm can utilise any or all of the following genetic operators:

• selection produces a collection of one or more unique individuals from the current generation that will combine their genetic material to produce one or more offspring - this selection is based on the fitness of individuals; those having greater fitness will have a greater chance of selection than individuals with lesser fitness

• crossover takes a collection of selected individuals and merges their genetic material in some way, resulting in one or more offspring

• mutation takes a single individual and randomly mutates its genes

Together with the evolution algorithm, these three operators make up the four key functions that define a specific implementation of the genetic algorithm. Such an implementation can then be applied to any problem that can be phrased as an optimisation problem. To do so, we must define a fitness function and encode the set of all individuals (i.e. all candidate solutions) in such a way that it is closed under all of the genetic operators listed above.

Note: a common technique employed in genetic algorithms is elitism, or elite selection [11]. This technique will identify one (or several) of the fittest individual(s) from every generation and propagate them unchanged to the next. Where the number of elites selected is non-zero, the quality of solutions will never decrease across generations. The work discussed in this report will make use of elitism.

² While this is not a commonly employed nomenclature in the area of genetic algorithms, it is a useful construct to clearly contrast such design choices made in this work with those made in [6] and [7].


3.2. Genotypes, Phenotypes and Epigenetics

The final critical component of this set-up is to distinguish “genotypes” from “phenotypes”. This is best first illustrated with a more familiar example and then tied back to the Mondrian-like problem.

A genotype encapsulates the genetic make-up of a biological entity (e.g. a human) - our DNA. However, while our DNA constitutes our genetic material (our genotype), that genotype results in, or expresses, the actual physical human being - us; our phenotype. While most genetic operators are applied to our genotypes (i.e. crossover and mutation), it is our expressed phenotypes (actual people) that compete for survival of their genetic material across subsequent generations. It is the fitness of an individual’s phenotype that influences the selection of their corresponding genotype during evolution.

This nuance is particularly important in this work as epigenetics, broadly speaking, is about a more complex interplay between genotypes and phenotypes than is traditionally described by the theory of natural selection, but is now part of modern evolutionary biology. It is this complexity that is utilised in both possible epigenetic extensions to the genetic algorithm proposed in this work.

Briefly, epigenetics is concerned with individual traits that may be inherited across generations but cannot be explained by genetic material. It is considered “outside of genetics” - hence the name, epigenetics. The full details of the application of epigenetics to genetic algorithms and the Mondrian-like problem are described in section 3.6.

Subsequent sections will clearly identify when objects are genotypes or phenotypes, and when functions are defined to act on genotypes or phenotypes.

Returning to the case of the Mondrian-like problem, Shen proposed a suitable genotype representation in [6] that was largely adopted by [7] and will also be utilised in this work; it is described in detail in section 3.5.1. These genotypes then probabilistically express an actual Mondrian-like artwork - the phenotype.

3.3. Notation and Definitions

The following notations and definitions will be used throughout the rest of this report.

Let $U(a, b) \in \mathbb{R}$ be a random variable, uniformly selected from the closed interval $[a, b]$
Let $\mathbb{G}$ be the set of all valid genotypes (abstract algebraic objects, see 3.5.1)
Let $\mathbb{P}$ be the set of all valid phenotypes (“Mondrian-like” image, see 3.5.2)
Let $\mathbf{g} \in \mathbb{G}$ be a genotype
Let $\mathbf{p} \in \mathbb{P}$ be a phenotype
Let $\|\mathbf{g}\|$ be the number of genes in the genotype $\mathbf{g}$
Let $m^* : \mathbb{G} \to \mathbb{P}$ be the probabilistic expression mapping from genotypes to phenotypes

Vectors will be denoted in bold, e.g. $\mathbf{g}$, while scalars will not, e.g. $a$.
Vector components will be denoted and indexed by subscript, e.g. $g_1$.

Mathematically, the key functions comprising the genetic algorithm are defined as:

• evolution : $\mathbb{G}^n \to \mathbb{G}^n$ (where $n$ is the population size)

• selection : $\mathbb{G}^n \to \mathbb{G}^m$ (where $m$ is the number of parents used during crossover)

• crossover : $\mathbb{G}^m \to \mathbb{G}^l$ (where $l$ is the number of offspring produced during crossover)

• mutation : $\mathbb{G} \to \mathbb{G}$

3.4. Hyper-parameters

The following parameters will be fixed within the context of any experiment. They are almost completely common across all of the representations presented in this report.

Let $n$ be the number of individuals within a generation
Let $w, h$ be the width and height (respectively) of a phenotype in pixels
Let $n_q$ be the number of generating points (see section 3.5.1)
Let $n_l$ be the number of generating loops (see section 3.5.3)
Let $n_{elites}$ be the number of elites to select for the next generation
Let $p_{\vec{t}}$ be the probability of emitting any line out of a point of type $t$³

3.5. The Genetic Algorithm

The prior works of Shen [6] and Smith [7] employed a number of problem-specific biases and heuristics. For example, the fitness of an individual influenced every genetic operator - i.e. crossover and mutation, not just selection as is typical of genetic algorithms.

In an effort to focus this work on the proposed epigenetic models and reduce overall complexity, these heuristics and biases were fully removed and replaced with the algorithms outlined in this section.

The representation presented in this chapter is a generalisation of the representations from prior work on this problem, each fully specified in appendices E and F.

3.5.1. Genotype Representation

The genotype used in the Mondrian-like problem is a probabilistic structure that can produce many possible phenotypes. At a high level, it encodes a set of “generating points” on a canvas that will emit lines in directions that loosely follow the geometric rules of a Mondrian-like image. Associated with these generating points are probability distributions that dictate how lines are drawn from these points, how rectangles are selected for filling and the colour to use when filling those rectangles.

The following is a visual representation of an example Mondrian-like genotype. In this case there are $n_q = 3$ generating points plotted at specific locations on a two-dimensional canvas, all of which share a probability distribution $\mathbf{d}$ that describes the likelihood of emitting lines in the valid directions (north, south, east and west). The lengths of the vectors are proportional to the corresponding probabilities, with (in this example) the lowest probability associated with emitting a line west, and the highest probability associated with emitting a line south.

³ Where $t \in \{$terminal, online, right-angled, connected, etc.$\}$ - possible generating point states encountered during phenotype emission as defined by Shen in [6]

Figure 3.1: Graphical representation of q,d genes in a genotype

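The remaining genes in the genotype representation describe bounds used to identify rectangles that can be coloured ($a_{min}$, $a_{max}$, $c_{max}$) and a probability distribution over the possible colours ($\mathbf{c}$). More formally:

Let $\mathbf{g} \in \mathbb{G}$ s.t. $\mathbf{g} = (a_{min}, a_{max}, c_{max}, \mathbf{d}, \mathbf{c}, \mathbf{q})$; where
$a_{min} \in \mathbb{Z}/w\mathbb{Z} \times \mathbb{Z}/h\mathbb{Z}$ are the minimum dimensions of a rectangle that may be coloured
$a_{max} \in \mathbb{Z}/w\mathbb{Z} \times \mathbb{Z}/h\mathbb{Z}$ are the maximum dimensions of a rectangle that may be coloured
$c_{max} \in \mathbb{Z}$ is the maximum number of rectangles that may be coloured
$\mathbf{d} \in \mathbb{R}^4$ s.t. $0 \le d_i \le 1 \;\forall i \in [1, 4]$ is the probability of drawing north/south/east/west
$\mathbf{c} \in \mathbb{R}^3$ s.t. $0 \le c_i \le 1 \;\forall i \in [1, 3]$ is the probability of colouring red/blue/yellow
$\mathbf{q} \in \left(\mathbb{Z}/w\mathbb{Z} \times \mathbb{Z}/h\mathbb{Z}\right)^{n_q}$ is the set of generating points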

Note: this is the genotype encoding from section F.1 with continuous probabilities.
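The formal definition above can be mirrored by a small data structure. The following is a minimal sketch only, not the software built for this work: the names `Genotype`, `is_valid` and `random_genotype`, and the `max_coloured` bound, are hypothetical, and elements of $\mathbb{Z}/w\mathbb{Z} \times \mathbb{Z}/h\mathbb{Z}$ are assumed to be integer pixel coordinates in the ranges 0..w-1 and 0..h-1.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]  # assumed encoding of an element of Z/wZ x Z/hZ


@dataclass
class Genotype:
    a_min: Point      # minimum (width, height) of a rectangle that may be coloured
    a_max: Point      # maximum (width, height) of a rectangle that may be coloured
    c_max: int        # maximum number of rectangles that may be coloured
    d: List[float]    # four values in [0, 1]: north, south, east, west
    c: List[float]    # three values in [0, 1]: red, blue, yellow
    q: List[Point]    # the n_q generating points

    def is_valid(self, w: int, h: int, n_q: int) -> bool:
        """Check the membership constraints stated in the formal definition."""
        def on_canvas(p: Point) -> bool:
            return 0 <= p[0] < w and 0 <= p[1] < h

        return (
            on_canvas(self.a_min)
            and on_canvas(self.a_max)
            and len(self.d) == 4 and all(0.0 <= x <= 1.0 for x in self.d)
            and len(self.c) == 3 and all(0.0 <= x <= 1.0 for x in self.c)
            and len(self.q) == n_q and all(on_canvas(p) for p in self.q)
        )


def random_genotype(w: int, h: int, n_q: int, rng: random.Random,
                    max_coloured: int = 5) -> Genotype:
    """Draw every gene uniformly at random (max_coloured is an illustrative bound)."""
    def point() -> Point:
        return (rng.randrange(w), rng.randrange(h))

    return Genotype(
        a_min=point(),
        a_max=point(),
        c_max=rng.randint(0, max_coloured),
        d=[rng.random() for _ in range(4)],
        c=[rng.random() for _ in range(3)],
        q=[point() for _ in range(n_q)],
    )
```

Keeping the validity check separate from construction makes it easy to confirm that a genetic operator is closed over the set of valid genotypes, as section 3.1 requires.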

3.5.2. Phenotype Representation

The phenotype used in this work on the Mondrian-like problem is a geometric representation of a two-dimensional artwork that conforms to the criteria of section 2.2.

More specifically, the phenotype consists of a canvas, a set of horizontal and vertical lines and coloured rectangles. For example, software constructed for this work produced the following phenotype - encoded as an 8-bit RGB PNG image file:

Figure 3.2: Sample Mondrian-like phenotype

3.5.3. Genotype to Phenotype Expression Mapping

The following algorithm expresses one of the possible phenotypes from a given genotype.

Algorithm 1 Expression mapping $m^*(\mathbf{g})$ (genotype to phenotype)

Let $(a_{min}, a_{max}, c_{max}, \mathbf{d}, \mathbf{c}, \mathbf{q}) \leftarrow \mathbf{g}$    ▷ decompose $\mathbf{g}$
Let segments ← ∅    ▷ the empty set
for $i = 1, \ldots, n_l$ do    ▷ iterate “number of generating loops” times
    randomly shuffle $\mathbf{q}$
    for $q_j \in \mathbf{q}$ do    ▷ random ordering over $q_j$
        if possible to emit line from $q_j$ then
            Let $d$ ← random direction drawn using the probability distribution $\{p_{\vec{t}}\}$
            Let $e$ ← all valid end-points (i.e. points of intersection)
            Let $e_x \in e$ be an end-point selected uniformly at random
            Let $l \leftarrow (q_j \nearrow e_x)$    ▷ line from $q_j$ to $e_x$
            Let segments ← segments ∪ $\{l\}$
return $c^*(a_{min}, a_{max}, \mathbf{c}, \text{segments})$


Algorithm 2 Phenotype rectangle colouring $c^*(a_{min}, a_{max}, \mathbf{c}, \text{segments})$

for $i = 1, \ldots, c_{max}$ do    ▷ no more than $c_{max}$ coloured rectangles
    Select colour $k$ with probability $\frac{c_k}{\sum_{j=1}^{3} c_j}$
    Randomly select a rectangle $r$ from segments with area bounded by $a_{min}$, $a_{max}$
    Fill the rectangle $r$ with colour $k$
return segments with any colouring applied
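The colouring step can be illustrated with a short sketch. It makes several assumptions that are not taken from the report: the rectangles enclosed by the line segments have already been extracted as `(x, y, width, height)` tuples, the bounds $a_{min}$ and $a_{max}$ are applied per dimension, and the function name `colour_rectangles` is hypothetical.

```python
import random

COLOURS = ("red", "blue", "yellow")


def colour_rectangles(rectangles, a_min, a_max, c_max, c, rng=None):
    """Return a dict mapping each filled rectangle to its colour.

    rectangles: list of (x, y, width, height) tuples (assumed pre-extracted).
    a_min, a_max: (width, height) bounds on rectangles that may be coloured.
    c: three non-negative weights for red, blue and yellow.
    """
    rng = rng or random.Random()
    eligible = [
        r for r in rectangles
        if a_min[0] <= r[2] <= a_max[0] and a_min[1] <= r[3] <= a_max[1]
    ]
    fills = {}
    if not eligible or sum(c) <= 0:
        return fills  # nothing can be coloured
    for _ in range(c_max):
        # random.choices normalises the weights, i.e. P(colour k) = c_k / sum(c)
        colour = rng.choices(COLOURS, weights=c, k=1)[0]
        rect = rng.choice(eligible)
        fills[rect] = colour  # a repeated pick simply re-colours the rectangle
    return fills
```

Because the same rectangle may be picked more than once, the result contains at most $c_{max}$ coloured rectangles, matching the bound in the loop comment.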

3.5.4. Fitness Function

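The fitness function is defined as follows:

Let ratings ← {strongly dislike, dislike, indifferent, like, strongly like}
Let $f^{\mathbb{P}} : \mathbb{P} \to \text{ratings}$ be the phenotype fitness function
Let $f^{\mathbb{G}} : \mathbb{G} \to \mathbb{R}$ be the genotype fitness function
Define $f^{\mathbb{P}}$ as $f^{\mathbb{P}}(\mathbf{p})$ = user selected rating of $\mathbf{p}$
Define $f^{\mathbb{G}}_{\tau} \;\forall \tau$ (generations up to and including $\tau$) as

$$
f^{\mathbb{G}}_{\tau}(\mathbf{g}) =
\begin{cases}
-2 & f^{\mathbb{P}}(m^*(\mathbf{g})) = \text{strongly dislike} \\
-1 & f^{\mathbb{P}}(m^*(\mathbf{g})) = \text{dislike} \\
0 & f^{\mathbb{P}}(m^*(\mathbf{g})) = \text{indifferent} \\
1 & f^{\mathbb{P}}(m^*(\mathbf{g})) = \text{like} \\
2 & f^{\mathbb{P}}(m^*(\mathbf{g})) = \text{strongly like}
\end{cases}
$$

That is, the entire fitness value of a phenotype is simply assigned to the individual genotype that expressed it.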
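As a minimal illustration of this definition, the sketch below maps a user rating to the integer score of the piecewise function. The names `RATING_SCORES`, `genotype_fitness`, `express` and `ask_user` are hypothetical; `express` stands in for the expression mapping $m^*$ and `ask_user` for the interactive rating step.

```python
# Scores follow the piecewise definition of the genotype fitness function.
RATING_SCORES = {
    "strongly dislike": -2,
    "dislike": -1,
    "indifferent": 0,
    "like": 1,
    "strongly like": 2,
}


def genotype_fitness(genotype, express, ask_user):
    """Express one phenotype, ask the user to rate it, and score the genotype."""
    phenotype = express(genotype)   # stand-in for m*(g)
    rating = ask_user(phenotype)    # stand-in for the phenotype fitness function
    return RATING_SCORES[rating]
```

Passing the expression and rating steps in as callables keeps the scoring rule testable without a human in the loop.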

3.5.5. Genetic Operator: Evolution

Recall, the “evolution” algorithm is defined in this report to cover the iterative method used to pass from one generation of individuals to the next.

First, each individual genotype in the current generation emits a phenotype. These are then used to assign fitness values to the underlying individuals.

A common practice known as “elite selection” is employed here. Elite selection takes some of the most fit individuals and passes them on to the next generation unchanged.

The rest of the next generation is constructed in the typical method. A set of unique parents is selected (weighted by fitness), each set of parents performs crossover which results in another set of offspring. Finally, each of these offspring may then undergo some mutation - i.e. modification of their genes (helping to escape local optima).


Algorithm 3 evolution($\{\mathbf{g}_i\}_{i=1}^{n}$)

Express a $\mathbf{p}_i = m^*(\mathbf{g}_i) \;\forall i \in [1, n]$
Evaluate fitness values at generation $\tau$, $f^{\mathbb{G}}_{\tau}(\mathbf{g}_i) \;\forall i \in [1, n]$
Sort $\{\mathbf{g}_i\}_{i=1}^{n}$ by fitness values $f^{\mathbb{G}}_{\tau}(\mathbf{g}_i)$ in descending order
Let next_generation ← $\{\mathbf{g}_1, \ldots, \mathbf{g}_{n_{elites}}\}$    ▷ elite selection
while ||next_generation|| < $n$ do    ▷ complete the next generation
    Let parents ← selection($\{\mathbf{g}_i\}_{i=1}^{n}$)
    Let offspring ← crossover(parents)
    for child ∈ offspring do
        Let next_generation ← next_generation ∪ {mutation(child)}
return next_generation    ▷ truncated to length of $n$ if necessary
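The evolution step above can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions rather than the implementation used in this work: the function name `evolve` and its parameters are hypothetical, the operators are passed in as callables, and `crossover` is assumed to return at least one child per call (otherwise the loop would not terminate).

```python
def evolve(population, fitness, select, crossover, mutate, n_elites):
    """Produce the next generation, keeping the population size fixed."""
    n = len(population)
    scores = [fitness(g) for g in population]  # one evaluation per individual
    # Indices of individuals, fittest first (ties keep their original order).
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    next_generation = [population[i] for i in order[:n_elites]]  # elite selection
    while len(next_generation) < n:
        parents = select(population, scores)
        for child in crossover(parents):
            next_generation.append(mutate(child))
    return next_generation[:n]  # truncate if the last crossover overshot n
```

Evaluating every fitness value once, before selection begins, matters in the interactive setting because each evaluation costs a human rating.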

3.5.6. Genetic Operator: Selection

The selection method employed in this work is known as roulette selection, or fitness proportionate selection. Critically, each individual is given a selection probability proportional to its fitness value. Selection is performed without replacement so that the same parent cannot pass its genetic material on to the same offspring multiple times.

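Algorithm 4 selection($\{\mathbf{g}_i\}_{i=1}^{n}$) where $m = 2$

Let parents ← ∅    ▷ the empty set
Scale fitness values s.t. $f^{\mathbb{G}}_{\tau}(\mathbf{g}_i) \ge 0$    ▷ ensure non-negative values
for $j = 1, \ldots, m$ do
    Select $\mathbf{g}_j$ with probability $\frac{f^{\mathbb{G}}_{\tau}(\mathbf{g}_j)}{\sum_{k=1}^{n} f^{\mathbb{G}}_{\tau}(\mathbf{g}_k)}$; $\mathbf{g}_j, \mathbf{g}_k \notin$ parents    ▷ without replacement
    Let parents ← parents ∪ $\{\mathbf{g}_j\}$
return parents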
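A sketch of roulette selection without replacement follows. The report does not state how fitness values are scaled to be non-negative, so two choices here are assumptions: values are shifted so that the lowest fitness becomes zero, and selection falls back to a uniform draw when every remaining weight is zero. The function name `roulette_select` is hypothetical.

```python
import random


def roulette_select(population, fitnesses, m=2, rng=None):
    """Pick m distinct parents with probability proportional to scaled fitness."""
    rng = rng or random.Random()
    if m > len(population):
        raise ValueError("cannot select more parents than individuals")
    lowest = min(fitnesses)
    candidates = list(range(len(population)))
    weights = [f - lowest for f in fitnesses]  # assumed scaling: shift to >= 0
    parents = []
    for _ in range(m):
        if sum(weights) > 0:
            k = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        else:
            k = rng.randrange(len(candidates))  # all weights zero: uniform draw
        parents.append(population[candidates.pop(k)])
        weights.pop(k)  # without replacement: remove the chosen individual
    return parents
```

Under this shift the least fit individual receives zero weight, so it is only chosen once all fitter individuals have been removed; a different scaling would soften that effect.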

3.5.7. Genetic Operator: Crossover

The crossover method employed in this work is two-parent, single-point crossover. A single point in the genotypes’ chromosomes is selected uniformly at random. At this point, a single crossover is performed resulting in two new individuals. The first new individual is constructed from the genes of parent A until the crossover point, and then continued with the genes of parent B. The second new individual is similarly constructed from the genes of parent B until the crossover point, and then continued with the genes of parent A. This process is visually represented below.


Figure 3.3: Two parent, single-point crossover

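Algorithm 5 crossover($\{\mathbf{g}_i\}_{i=1}^{m}$) where $m = 2$, $l = 2$

Let $(g^1_1, \ldots, g^1_{\|\mathbf{g}\|}) \leftarrow \mathbf{g}_1$    ▷ decompose $\mathbf{g}_1$
Let $(g^2_1, \ldots, g^2_{\|\mathbf{g}\|}) \leftarrow \mathbf{g}_2$    ▷ decompose $\mathbf{g}_2$
Let $r \leftarrow U(1, \|\mathbf{g}\|)$ rounded to nearest integer    ▷ random crossover point
Let child$_1 \leftarrow (g^1_1, \ldots, g^1_r, g^2_{r+1}, \ldots, g^2_{\|\mathbf{g}\|})$    ▷ single-point crossover
Let child$_2 \leftarrow (g^2_1, \ldots, g^2_r, g^1_{r+1}, \ldots, g^1_{\|\mathbf{g}\|})$    ▷ single-point crossover
return {child$_1$, child$_2$}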
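Single-point crossover translates directly into list slicing. The sketch below assumes each genotype has been flattened into a list of genes of equal length; the function name `single_point_crossover` is hypothetical. The crossover point is drawn as a rounded uniform value to mirror Algorithm 5.

```python
import random


def single_point_crossover(parent_a, parent_b, rng=None):
    """Return two children that swap gene tails after a random crossover point."""
    rng = rng or random.Random()
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same number of genes")
    n = len(parent_a)
    r = round(rng.uniform(1, n))  # crossover point in 1..n, as in Algorithm 5
    child_1 = list(parent_a[:r]) + list(parent_b[r:])
    child_2 = list(parent_b[:r]) + list(parent_a[r:])
    return child_1, child_2
```

When the crossover point equals the number of genes, the tails are empty and the children are copies of their parents.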

3.5.8. Genetic Operator: Mutation

A simple probabilistic mutation method is employed such that the expected number of mutated genes in each individual is low and constant; typically this expected value is 1.


Algorithm 6 mutation($\mathbf{g}$)

Let $(g_1, \ldots, g_{\|\mathbf{g}\|}) \leftarrow \mathbf{g}$    ▷ decompose $\mathbf{g}$
for $i = 1, \ldots, \|\mathbf{g}\|$ do
    if $U(0, 1) < \frac{1}{\|\mathbf{g}\|}$ then    ▷ with constant mutation probability per gene
        if $g_i$ is a bounded integer in the range $[a, b]$ then
            $g_i \leftarrow$ perturb($g_i$, $a$, $b$) rounded to nearest integer
        else if $g_i$ is a probability distribution then
            Let $x$ be one of the possible events in $g_i$ selected uniformly at random
            Let $p_x \leftarrow$ perturb($p_x$, 0, 1)    ▷ mutate the probability of $x$
            Let $p_j = \frac{p_j}{\sum_k p_k} \;\forall p_j \in g_i$    ▷ renormalise
return $(g_1, \ldots, g_{\|\mathbf{g}\|})$ (with any perturbations applied)

The helper function perturb referenced in the mutation operator fits a Gaussian distribution to the current value of the variable to mutate. This distribution is sampled until a valid value is retrieved - this becomes the new value. While the expected outcome of this operation is for the input value to remain unchanged, there is random mutation introduced about this value.

Algorithm 7 perturb($x$, $lower$, $upper$)

    Let $\sigma \leftarrow \frac{upper - lower}{3}$    ▷ three standard deviations span the valid range of $x$
    Let $y \leftarrow upper + 1$
    while $y \notin [lower, upper]$ do    ▷ re-sample if necessary
        $y \leftarrow \mathcal{N}(\mu = x, \sigma^2)$    ▷ Gaussian distribution: mean $\mu$, variance $\sigma^2$
    return $y$

Note: σ is selected such that three standard deviations cover the range of valid values. So, if the mean is centred, the valid range extends 1.5σ either side of it and there is approximately an 86.6% chance that this algorithm terminates after one iteration. The worst case is that x is either lower or upper, in which case the valid range covers three standard deviations on one side of the mean only: there is a $\frac{99.7}{2}\%$ chance of terminating after one iteration, or a $1 - \frac{99.7}{2}\% \approx 50\%$ chance of not terminating. In the worst case, the expected number of iterations is approximately two.
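The worst-case figure in the note above can be checked numerically. The sketch below computes the probability that a single Gaussian draw lands inside the valid range, assuming σ = (upper − lower)/3 as in Algorithm 7; the function names are illustrative only.

```python
import math


def first_draw_acceptance(x, lower, upper):
    """P(lower <= Y <= upper) for Y ~ N(x, sigma^2), sigma = (upper - lower) / 3."""
    sigma = (upper - lower) / 3.0

    def cdf(z):
        # Standard normal cumulative distribution function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    return cdf((upper - x) / sigma) - cdf((lower - x) / sigma)


def expected_draws(x, lower, upper):
    """Mean number of draws of the rejection loop (geometric distribution)."""
    return 1.0 / first_draw_acceptance(x, lower, upper)
```

For x on a boundary, for example `first_draw_acceptance(0.0, 0.0, 10.0)`, the acceptance probability is about 0.4987, so `expected_draws` gives roughly 2.005 iterations, in line with the estimate of approximately two.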

3.5.9. Invalid Phenotypes

This combination of crossover and mutation can produce genotypes that are able to express phenotypes that violate some desirable constraints from section 2.2.

This issue will be revisited later in the report with possible remedies suggested as potential future work. This report does not resolve the limitation.

3.6. Proposed Epigenetic Models

In biology, epigenetics refers to stable heritable traits which cannot be explained by changes in DNA sequence; the term often refers to changes in gene methylation, which affect gene activity, and also includes prions. As seen in biology, epigenetic changes do not fit well with modern views of Darwin’s theory of evolution (all inheritance is via DNA), nor Lamarck’s (which would require phenotypic changes to modify the DNA). Baldwin’s theory comes close, and assumes that evolutionary pressure will favour individuals with the capability to learn during their lifetime [12].

In genetic algorithms with a single chromosome with a direct phenotype to genotype matching, there is no place for epigenetics. That is, we could not meaningfully differentiate two parts of a data structure representing algorithmic genetic information and call one “DNA” and the other “not-DNA” (and hence epigenetic). In this project, we have a chromosome with a probabilistic process to create a phenotype from a genotype, thus potentially allowing many genotypes to generate the same phenotype, and a single genotype to generate many phenotypes.

So, by analogy with Baldwin’s theory, we consider the process of search over genome-to-phenotype pairs as epigenetic, if it is used as part of the evolutionary selection. This is not exactly the same as an individual learning over a lifetime, rather it is abstractly analogous, and is potentially useful to speed up interactive evolutionary algorithms.

In a (simulated) biological system which has both short term and long term adaptation mechanisms for individuals in a population, the long term process would maintain the primary inheritable information. I propose that the function of the short term mechanism is to adapt individuals to their current environment at a fitness cost related to the “distance” between the simplest expression of their primary inheritable information, and the actual expression. Thus, short term environmental changes can be accommodated by the short term mechanism. If the short term continues long enough, some of the changes become incorporated into the primary inheritable information due to selection pressure. This prepares individuals in the population to be fitter if the environmental change continues in the “same direction” further into the future. In terms of biology, the long term process is Darwinian selection using DNA, while the short term process is similar or analogous to epigenetic changes.

In this project, there is a chromosome with a probabilistic process to create a phenotype from a genotype, thus potentially allowing many genotypes to generate the same phenotype, and a single genotype to generate many phenotypes. The search over genotype-to-phenotype pairs to assign fitness from phenotype to all of the genotypes available which could have created this phenotype is an abstract model of an epigenetic mechanism as proposed above, which has the potential to speed up interactive evolutionary algorithms or where fitness evaluation is particularly costly.

Tom Gedeon’s proposition for epigenetic models [13]

Gedeon’s proposition forms the foundation for both epigenetic extensions to the genetic algorithm investigated in this work. These models are built on top of the existing framework from section 3.5 and are constrained to a redefinition of the fitness functions.

While section 3.5 employed the symbol $f^G_\tau$ for the fitness function, the parameter $\tau$ was unused. It was defined in that way so that the fitness function could be extended to be temporal in nature in these epigenetic models without having to redefine any of the algorithms from previous sections. The main implication of this construction is that the fitness of an individual may change over time.

3.6.1. Epigenetics: Exact Phenotype Matches

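The fitness function is defined as follows:

Let $ratings \leftarrow \{\text{strongly dislike}, \text{dislike}, \text{indifferent}, \text{like}, \text{strongly like}\}$
Let $rating : \mathbb{P} \to ratings$ be the phenotype rating function
Let $f^P : \mathbb{P} \to \mathbb{Z}$ be the phenotype fitness function
Let $f^G_\tau : \mathbb{G} \to \mathbb{R}$ be the genotype fitness function at time $\tau$

Define $f^P$ as

\[
f^P(\mathbf{p}) =
\begin{cases}
-2 & rating(\mathbf{p}) = \text{strongly dislike} \\
-1 & rating(\mathbf{p}) = \text{dislike} \\
0 & rating(\mathbf{p}) = \text{indifferent} \\
1 & rating(\mathbf{p}) = \text{like} \\
2 & rating(\mathbf{p}) = \text{strongly like}
\end{cases}
\]

Define $f^G_\tau$ as, $\forall \mathbf{p} \in \mathbb{P}$ observed and rated at time $t \le \tau$

\[
f^G_\tau(g) =
\frac{\sum_{\mathbf{p}} \left( p(m^*(g) = \mathbf{p}) \times f^P(\mathbf{p}) \times \frac{1}{k \times \|m^*(g) - \mathbf{p}\| + 1} \right)}
     {\sum_{\mathbf{p}} \left( p(m^*(g) = \mathbf{p}) \times \text{MAX\_FITNESS} \times \frac{1}{k \times \|m^*(g) - \mathbf{p}\| + 1} \right)}
\]

Where $\text{MAX\_FITNESS} = \max\{f^P(\mathbf{p})\} = 2$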

This fitness function accumulates the fitness value of all previously observed phenotypes. These fitness values are weighted proportionally to the expression difficulty (or probability) and inversely proportionally to the distance between each observed phenotype and that which was actually expressed by the given genotype.

Note: the denominator is used to normalise the fitness function such that genotype to phenotype transitions with high probability do not dominate the summation.

3.6.2. Epigenetics: Inexact Phenotype Matches

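Generalising further, we may consider inexact matches, i.e. not just phenotypes that can be directly expressed by a genotype, but those “close” to phenotypes directly expressible. A measure on this “closeness” will be included. Define $f^G_\tau(g)$ as

\[
f^G_\tau(g) =
\frac{\sum_{\mathbf{p}} \left( p\left((m^*(g))^{(j)} = \mathbf{p}^{(j)}\right) \times f^P(\mathbf{p}) \times \frac{1}{k_1 \times j + 1} \times \frac{1}{k_2 \times \|m^*(g) - \mathbf{p}'\| + 1} \right)}
     {\sum_{\mathbf{p}} \left( p\left((m^*(g))^{(j)} = \mathbf{p}^{(j)}\right) \times \text{MAX\_FITNESS} \times \frac{1}{k_1 \times j + 1} \times \frac{1}{k_2 \times \|m^*(g) - \mathbf{p}'\| + 1} \right)}
\]

$\forall \mathbf{p} \in \mathbb{P}$ observed and rated at time $t \le \tau$

Where $\mathbf{p}'$ is given by $\min j \in \mathbb{N}$ s.t. $(m^*(g))^{(j)} = \mathbf{p}^{(j)}$
Where $\mathbf{p}^{(j)}$ is the phenotype $\mathbf{p}$, with line segment co-ordinates as multiples of $j$
Where $\mathbf{p}'$ is the “nearest” (possibly exact) match to $\mathbf{p}$
Where $\text{MAX\_FITNESS} = \max\{f^P(\mathbf{p})\} = 2$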

The $\mathbf{p}^{(j)}$ construction is best illustrated with an example showing various values of $j$. As $j$ increases, the granularity of valid plotting coordinates decreases. With sufficient $j$, lines begin to collapse on top of each other. The purpose of this construction is to force (with sufficiently high $j$) all phenotypes to collapse to all other phenotypes. The degree of granularity reduction required forms a distance metric between phenotypes; the greater the granularity decrease required for equivalence, the greater the distance between two phenotypes. A sufficiently high value of $j$ causes all phenotypes to degenerate to the empty canvas, thus ensuring that every pair of phenotypes has a finite distance.

[Figure panels: (a) Example $\mathbf{p}$; (b) $\mathbf{p}^{(10)}$; (c) $\mathbf{p}^{(40)}$; (d) $\mathbf{p}^{(80)}$; (e) $\mathbf{p}^{(160)}$]
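The granularity construction can be sketched in Python as below. This is a minimal illustration under stated assumptions rather than the project's implementation: a phenotype is assumed to be a set of line segments with integer pixel coordinates in 0..w−1 and 0..h−1, coordinates are snapped down to the nearest multiple of j, and segments that collapse to a point are dropped. The names `coarsen` and `match_granularity` are hypothetical.

```python
def coarsen(phenotype, j):
    """Snap every coordinate down to a multiple of j; drop collapsed segments."""
    snapped = set()
    for (x1, y1), (x2, y2) in phenotype:
        a = (x1 // j * j, y1 // j * j)
        b = (x2 // j * j, y2 // j * j)
        if a != b:  # zero-length segments vanish from the canvas
            snapped.add((min(a, b), max(a, b)))
    return frozenset(snapped)


def match_granularity(p, q, width, height):
    """Smallest j in 1..max(width, height) at which p and q coincide."""
    for j in range(1, max(width, height) + 1):
        if coarsen(p, j) == coarsen(q, j):
            return j
    raise ValueError("coordinates lie outside the canvas")
```

Under these assumptions, at j = max(w, h) every coordinate snaps to zero, every segment collapses and both phenotypes become the empty canvas, so the search always terminates with a finite value.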

3.6.3. Computing Epigenetic Fitness Functions

From a computability perspective, it would be ideal to invert the function $m^*$. That is, we would like to compute $(m^*)^{-1}$ and enumerate $\{g_i\}$ s.t. $p\left(g_i = (m^*)^{-1}(\mathbf{p})\right) > 0$ for any given phenotype $\mathbf{p}$. However, this inverse does not exist. The randomised nature of $m^*$ means it is a one-many function. It is also easy to see that multiple genotypes can produce the same phenotype. Since $m^*$ is a many-many function, it is not invertible.

Enumerating all possible genotypes that could have produced a given phenotype superficially appears intractable. However, recall that the context of this is evolutionary computing. While the set of genotypes that could have generated a given phenotype will impact how we “spread” a fitness value over the search space, what really matters is the accumulated fitness value of genotypes as they occur during selection. Thus, consider the deterministic function $m$ corresponding to $m^*$ which also acts on a multi-variate random variable representing the random choices made in the computation of $m^*$.

Let $m : \mathbb{G} \times \mathbb{X} \to \mathbb{P}$ where $\mathbb{X}$ is a discrete and finite multi-variate random variable.

Observe that while computing $f^G_\tau(g)$ $\forall g \in \mathbb{G}$ is computationally difficult, we only require the fitness values for genotypes in the current population. Thus, we may accumulate the set of all observed and rated phenotypes $\mathbf{p}_i$ at time $\tau$ and compute $f^G_\tau(g)$ “on-the-fly” by evaluating $m(g, \mathbf{x})$ $\forall \mathbf{x} \in \mathbb{X}$ and comparing each result to all $\{\mathbf{p}_i\}$.

While this algorithm involves identifying any match between $m(g, \mathbf{x})$ and $\{\mathbf{p}_i\}$ at each evaluation of $f^G_\tau$, a phenotype is simply a collection of unique line and rectangle co-ordinates. A suitable use of hash-map data structures reduces this operation to O(1).


This general algorithmic approach may be extended to the epigenetic model with inexact matches. Specifically, for each historical phenotype, and for each reachable phenotype enumerated during the $f^G_\tau$ computation, the translated phenotype $\mathbf{p}^{(j)}$ must be computed $\forall j \in [1, \max(w, h)]$. This is required to identify “inexact” matches and the smallest such translation factor $j$ (as used by the fitness function in section 3.6.2). The upper bound on $j$ comes from the fact that any further loss of resolution results in a blank canvas for all starting phenotypes. That is, if the canvas has fewer pixel locations than our granularity, we cannot plot anything. So, each observed and rated phenotype $\mathbf{p}$ now corresponds to the set $\{\mathbf{p}^{(j)}\}$ $\forall j \in [1, \max(w, h)]$.

The following pseudo-code provides algorithms to compute these fitness functions.

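Algorithm 8 Epigenetic (exact matches) fitness at generation $\tau$, $f^G_\tau(g)$

    Let $\mathbf{p}$ be the phenotype already expressed by $g$
    Let $H \leftarrow \{\mathbf{p}_i\}$ be the historical phenotypes already encountered $\forall i \in [1, n \times \tau]$
    Let $R \leftarrow \{\mathbf{q}_j \leftarrow m(g, \mathbf{x}) \mid \forall \mathbf{x} \in \mathbb{X}\}$ be all phenotypes reachable from $g$
    Let $a \leftarrow 0$, $b \leftarrow 0$    ▷ accumulators
    for $(\mathbf{q}_j, \mathbf{p}_i) \in R \times H$ do
        if $\mathbf{q}_j = \mathbf{p}_i$ then
            Let $a \leftarrow a + \frac{p(m^*(g) = \mathbf{q}_j) \times f^P(\mathbf{p}_i)}{k_1 \times \|\mathbf{p} - \mathbf{q}_j\| + 1}$
            Let $b \leftarrow b + \frac{p(m^*(g) = \mathbf{q}_j) \times \text{MAX\_FITNESS}}{k_1 \times \|\mathbf{p} - \mathbf{q}_j\| + 1}$
    return $\frac{a}{b}$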

The asymptotic complexity of this algorithm is O(n× ||X|| × τ); that is, linear in τ .
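A possible Python rendering of Algorithm 8, using the hash-map lookup mentioned earlier in place of the explicit loop over the history, is sketched below. Several details are assumptions of this sketch: phenotypes are hashable values, `reachable` maps each reachable phenotype to its expression probability, `history` maps each rated phenotype to a single rating (so repeated ratings of the same phenotype are not modelled), `distance` is left as a caller-supplied function, and a genotype with no matching history is given fitness 0.

```python
MAX_FITNESS = 2


def exact_match_fitness(expressed, reachable, history, distance, k=1.0):
    """Ratio a / b accumulated over reachable phenotypes found in the history.

    expressed: phenotype actually expressed by the genotype
    reachable: dict {phenotype q: probability that the genotype expresses q}
    history:   dict {rated phenotype: rating fitness in -2..2}
    distance:  callable giving the distance between two phenotypes
    """
    a = 0.0
    b = 0.0
    for q, prob in reachable.items():
        if q in history:  # hash lookup instead of scanning every rated phenotype
            weight = prob / (k * distance(expressed, q) + 1.0)
            a += weight * history[q]
            b += weight * MAX_FITNESS
    return a / b if b else 0.0  # assumption: no matches means neutral fitness
```

Because every rating lies between −MAX_FITNESS and MAX_FITNESS and all weights are non-negative, the returned value lies in [−1, 1].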

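Algorithm 9 Epigenetic (inexact matches) fitness at generation $\tau$, $f^G_\tau(g)$

    Let $\mathbf{p}$ be the phenotype already expressed by $g$
    Let $H \leftarrow \{\mathbf{p}_i\}$ be the historical phenotypes already encountered $\forall i \in [1, n \times \tau]$
    Let $R \leftarrow \{\mathbf{q}_j \leftarrow m(g, \mathbf{x}) \mid \forall \mathbf{x} \in \mathbb{X}\}$ be all phenotypes reachable from $g$
    Let $a \leftarrow 0$, $b \leftarrow 0$    ▷ accumulators
    for $\mathbf{p}_i \in H$ do
        for $l = 1, \ldots, \max(w, h)$ do
            for $\mathbf{q}_j \in R$ do
                if $\mathbf{q}_j^{(l)} = \mathbf{p}_i^{(l)}$ then    ▷ inexact match on multiple-of-$l$ coordinate system
                    Let $a \leftarrow a + \frac{p(m^*(g) = \mathbf{q}_j) \times f^P(\mathbf{p}_i)}{(k_1 \times l + 1) \times (k_2 \times \|\mathbf{p} - \mathbf{q}_j^{(l)}\| + 1)}$
                    Let $b \leftarrow b + \frac{p(m^*(g) = \mathbf{q}_j) \times \text{MAX\_FITNESS}}{(k_1 \times l + 1) \times (k_2 \times \|\mathbf{p} - \mathbf{q}_j^{(l)}\| + 1)}$
                    Proceed to next $\mathbf{p}_i$
    return $\frac{a}{b}$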

Accounting for inexact phenotype matches in this way results in only a constant-factor increase in complexity for a fixed canvas size; that is, computing $f^G_\tau(g)$ for inexact matches has asymptotic complexity $O(n \times \max(w, h) \times \|\mathbb{X}\| \times \tau)$, which is still linear in $\tau$.
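The loop structure of Algorithm 9 might be sketched in Python as follows. The granularity reduction and the distance are passed in as callables, since their exact definitions belong to the phenotype model; both, along with the dict-based `reachable` and `history` arguments and the neutral result for an empty history, are assumptions of this sketch.

```python
def inexact_match_fitness(expressed, reachable, history, coarsen, distance,
                          max_level, k1=1.0, k2=1.0, max_fitness=2):
    """For each rated phenotype, use the first match at the smallest granularity.

    reachable: dict {phenotype q: probability that the genotype expresses q}
    history:   dict {rated phenotype: rating fitness}
    coarsen:   callable (phenotype, level) -> phenotype on the coarser grid
    distance:  callable giving the distance between two phenotypes
    max_level: largest granularity to try, max(w, h) in the text
    """
    a = 0.0
    b = 0.0
    for rated, rating in history.items():
        matched = False
        for level in range(1, max_level + 1):
            target = coarsen(rated, level)
            for q, prob in reachable.items():
                q_coarse = coarsen(q, level)
                if q_coarse == target:
                    weight = prob / ((k1 * level + 1.0)
                                     * (k2 * distance(expressed, q_coarse) + 1.0))
                    a += weight * rating
                    b += weight * max_fitness
                    matched = True
                    break
            if matched:
                break  # proceed to the next rated phenotype
    return a / b if b else 0.0  # assumption: empty history means neutral fitness
```

The three nested loops make the extra factor of max(w, h) in the stated complexity visible directly.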


4. Software Architecture

The software constructed to support this project provided a flexible framework in which to implement various genetic algorithms to solve the Mondrian-like problem. In particular, all three “implementations” described in this paper were incorporated:

• Maroney (section 3.5)

• Shen (appendix E)

• Smith (appendix F)

This software presents a HTTP server through which a user can perform experiment(s). All data is recorded and stored on the hosting server for later analysis. The user is expected to connect to this HTTP server via any modern web browser; standards-based and portable client-side technologies are employed.

The high-level architectural design of the software is described in the image below.

[Diagram components: Darwindrian, HTTP server, web browser, worker pool. Labelled steps: 1. Start HTTP server; 2. Start web browser; 3a-3d; 4a-4f; 5a-5d.]

Figure 4.1: Architectural diagram of software artefacts developed in this project


4.1. Software Operation

The main program is named darwindrian, as was used in the works of [6] and [7]. This name is a play on words, combining Darwin (for the evolutionary computing aspects of this work) and Mondrian (for the artist that inspired the computer generated artwork model). Upon launching the darwindrian application a simple HTTP server is launched (1) and after a brief pause, the default local web browser is launched to open the associated darwindrian homepage (2).

Note: while the system hosting the HTTP server will also launch a local web browser by default, the server will accept connections from any routable device. A separate computer may be used to access and perform the experiments remotely.

The HTTP server acts largely as a mechanism through which to pass data between darwindrian and the web browser. It also maintains the state of the current genetic algorithm experiment. This software has been designed as a single-user application. To run multiple experiments concurrently, host multiple darwindrian instances on separate port numbers and connect to them individually with separate web browser sessions.

Once running, the first page presented to the user allows them to set the hyper-parameters and other settings of an experiment. Steps 3a, 3b, 3c, 3d in the architectural diagram above illustrate this exchange. A screen shot of this first page is included below.

Figure 4.2: Screen shot of experimental hyper-parameter selection


Once these hyper-parameter selections are submitted, a genetic algorithm instance is created on the server, ready for interactive evolution. The user will then be redirected (3d) to the main evolution interface, a web page with the current generation of Mondrian-like images and associated rating forms. A screen shot of this main interactive evolutionary computing page is included below.

Figure 4.3: Screen shot of interactive evolutionary feedback

Essentially all of the experimental time will be spent on this page, with the user rating a generation of individuals at a time until they have reached the pre-established number of generations or they click on the “See Results” button to exit early. Alternatively, an experiment may be halted without producing any results with the “Start Again” button.

The user will typically first request the next generation with the “See Next Generation” button (4a). The HTTP server then relays this request (4b) to the darwindrian logic where the next generation is computed with the evolution algorithm. Each individual genotype in this new generation then expresses a Mondrian-like phenotype which is rendered as an image file. These images, embedded in the main evolution web page, are then served to the user (4e, 4f). These steps (4a-4f) will iterate a number of times.

The steps 4c, 4d only apply to the epigenetic models, not the basic genetic model. As described in section 3.6, the epigenetic fitness functions require an enumeration of all phenotypes reachable from the given genotype. Due to the combinatorial complexity of the Mondrian-like phenotype expression logic, this is computationally expensive. To reduce the impact on the user experience, the genotype to phenotype enumeration has been implemented in an optimised C++ stand-alone executable utilising a dynamic programming algorithm [14] to avoid redundant calculations during the search tree expansion. Further, the enumeration for each individual genotype is independent of all others and so can be computed in parallel. A “worker pool” is created and an enumeration for each individual genotype is queued to the pool. This allows for the work to be distributed to a number of processing cores local to the computer hosting darwindrian. The web form will be pre-populated with a suggestion of allocating all but one of the local processing cores to this worker pool; however, this can be changed by the test subject. Finally, this enumeration is independent of the ultimate user rating assigned to the singly expressed Mondrian-like phenotype. Thus, the enumerations begin processing immediately while the user is presented with the current generation’s images for rating. This concurrency leverages the relative slowness of human actions to mask the computational cost of the genotype to phenotype enumerations required in the epigenetic models.
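The queue-then-collect pattern described above can be illustrated with Python's standard `concurrent.futures` module. This is only a sketch of the idea, not the darwindrian implementation: the function names are hypothetical, the enumeration itself is passed in as `enumerate_fn`, and a thread pool is used on the assumption that each task would simply wait on the external enumeration executable.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def default_worker_count():
    """All but one of the local cores, and never fewer than one."""
    return max(1, (os.cpu_count() or 2) - 1)


def start_enumerations(pool, genotypes, enumerate_fn):
    """Queue one enumeration per genotype and return the futures immediately."""
    return {index: pool.submit(enumerate_fn, genotype)
            for index, genotype in enumerate(genotypes)}


def collect_enumerations(futures):
    """Block until every queued enumeration has finished."""
    return {index: future.result() for index, future in futures.items()}
```

A caller would invoke `start_enumerations` as soon as a generation is expressed, present the images for rating, and call `collect_enumerations` only once the ratings arrive, so that the enumeration time overlaps with the time the user spends rating.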

Once the steps (4a-4f, with or without steps 4c, 4d for the epigenetic models) have iterated the requisite number of times, the experiment will conclude. At this point, the user will be re-directed to request the results view (5a). The HTTP server will again relay this request (5b) to darwindrian which will compute a series of performance measures and serialise all experimental data for any subsequent analysis. The darwindrian application then produces a high-level summary of the experiment and its results, along with a sample of a Mondrian that was produced by a genotype with the highest observed fitness value. A screen shot of this results page is included below.

Figure 4.4: Screen shot of experimental results


The graph in the middle of the results screen shot was used to illustrate a key performance measure, namely, the average phenotype fitness per generation. This graph gives a rough view of the average user rating per Mondrian-like image over the generations.

The Mondrian-like image on the right was expressed from one of the individual genotypes with the highest fitness value (there may be several). In a sense, this image represents one of the “best” solutions identified in the experiment.

The underlying data for all test subjects that produced both the performance measure graph and the sample Mondrian-like image are maintained for subsequent analysis. The aggregate results of this analysis are captured and discussed in section 5.4.

This software was designed to support one final experiment that captures the user’s subjective comparison between the final generation of an experiment under each of the three models supported (the genetic model and epigenetic models with exact and inexact phenotype matches).

It is assumed that a user will perform three experiments in sequence, one for each model. At this point, they will return to the main page (see figure 4.2) and click on the “Final Evaluation” button. This final experiment retrieves all Mondrian-like phenotypes from the last generation for each of the three models tested. These images are then randomly shuffled and presented in an arbitrary order. The user is asked to rate all images, the results of which are serialised back into the experimental result directories.

4.2. Software Synopsis

The darwindrian software has a single entry point, its synopsis is defined by:

usage: darwindrian.py [-h] [--port PORT] [--nobrowser]

optional arguments:
  -h, --help   show this help message and exit
  --port PORT  web server port (default 8080)
  --nobrowser  disable launch of local web browser

By executing darwindrian without any arguments, it will create a HTTP server listening on the default network port and locally launch a web browser to connect to that HTTP server. Both of these behaviours can be modified with command-line arguments.

To have darwindrian listen on an alternative port, simply use the --port PORT flag. To stop the launch of a local web browser, simply use the --nobrowser flag. You may wish to do this, for example, if you intend to connect to darwindrian from a network connected client instead of locally.
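A command-line interface matching this synopsis could be declared with Python's standard `argparse` module roughly as follows. This is a reconstruction consistent with the usage text above, not an excerpt from the darwindrian source; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Argument parser mirroring the documented darwindrian synopsis."""
    parser = argparse.ArgumentParser(prog="darwindrian.py")
    parser.add_argument("--port", type=int, default=8080,
                        help="web server port (default 8080)")
    parser.add_argument("--nobrowser", action="store_true",
                        help="disable launch of local web browser")
    return parser
```

For example, `build_parser().parse_args(["--port", "9000", "--nobrowser"])` yields a namespace with `port` equal to 9000 and `nobrowser` set to `True`.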


5. Experiments

5.1. Test Hypotheses

The original aim of these experiments was to test the following hypothesis.

Hypothesis 1. Epigenetic search improves the rate of convergence in genetic algorithms.

However, over the course of the work it became apparent that a number of factors would likely impede the effectiveness of empirically testing this hypothesis. In particular, the interactive evolutionary nature of the problem, the limited time and scale dedicated to user experiments and the magnitude of the search spaces involved were identified.

Thus, a weaker hypothesis thought to be feasible to empirically test was developed.

Hypothesis 2. Epigenetic search affects the rate of convergence in genetic algorithms.

5.2. Key Performance Measures

To test these two hypotheses, three key performance measures were developed. All of these performance measures were computed per generation during experiments:

1. Average phenotype fitness rating given by the test subject

2. Average number of historical phenotypes that contribute to epigenetic fitness values

3. Mean Absolute Error between genotype and phenotype fitness values

The first performance measure was used to test hypothesis 1; however, it should be noted that it is heavily susceptible to subjective and inconsistent user behaviour.

The second and third performance measures were designed to be more objective and thus to test hypothesis 2. Specifically, they were focussed on measuring changes to fitness values due to the epigenetic models, and hence the application of selective pressure. These measures do not obviously provide evidence for or against hypothesis 1.
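The first and third measures reduce to simple per-generation averages, which could be computed as in the sketch below. The function names are illustrative, and the sketch takes fitness values as given; whether genotype and phenotype fitness values are placed on a common scale before comparison is not stated here and is left to the caller.

```python
def average_rating(phenotype_fitness):
    """Measure 1: mean phenotype fitness rating within one generation."""
    values = list(phenotype_fitness)
    if not values:
        raise ValueError("a generation must contain at least one individual")
    return sum(values) / len(values)


def mean_absolute_error(genotype_fitness, phenotype_fitness):
    """Measure 3: mean |genotype fitness - phenotype fitness| in one generation."""
    genotype_values = list(genotype_fitness)
    phenotype_values = list(phenotype_fitness)
    if len(genotype_values) != len(phenotype_values) or not genotype_values:
        raise ValueError("expected two non-empty sequences of equal length")
    total = sum(abs(g - p) for g, p in zip(genotype_values, phenotype_values))
    return total / len(genotype_values)
```

A value of zero for the third measure in every generation would indicate that the epigenetic search never changed a fitness value, and hence exerted no additional selective pressure.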

5.3. Experimental Set-up

The major parameter under investigation was the “model” employed by genetic algorithms to solve the Mondrian-like problem. That is, the genetic algorithm model of evolution or one of the epigenetic models presented in 3.6. So, a reasonable baseline set of values for all other hyper-parameters was established; these values remained fixed for each test subject as they performed an experiment with each of the three models. These fixed hyper-parameter values are detailed in appendix G.1.

The three experiments were performed in a randomised order. These orders were assigned to each test subject by the report’s author to ensure a uniform coverage over all possible orderings. Those model orders assigned to each test subject are detailed in appendix G.2.
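One way to obtain near-uniform coverage of the six possible model orderings is a round-robin assignment over all permutations, sketched below. This illustrates the idea only; the actual assignment procedure and the orders used are those recorded in appendix G.2, and the model labels here are placeholders.

```python
from itertools import permutations

# Placeholder labels for the three models under test.
MODELS = ("genetic", "epigenetic-exact", "epigenetic-inexact")


def assign_orders(subjects):
    """Cycle through all orderings so that usage counts differ by at most one."""
    orders = list(permutations(MODELS))
    return {subject: orders[index % len(orders)]
            for index, subject in enumerate(subjects)}
```

Shuffling the list of subjects before calling `assign_orders` would randomise which subject receives which ordering while keeping the coverage balanced.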


Once the three main experiments were completed, a final experiment was conducted. All Mondrian-like phenotypes for the last generation of each experiment were collected. These phenotypes were then displayed in a randomised order to encourage an objective comparison between them. All phenotypes were then rated again by the test subject.

This final rating exercise forms a fourth, minor, performance measure to test hypothesis 1 in a slightly more objective fashion. The results can be found in appendix G.3.

Due to the labour-intensive nature of these experiments, test subjects were grouped based on the amount of testing they were willing and able to perform. There were three classes of test subjects:

• Performed 5 generations for each model (15 total)



All test subjects were provided with the source code and deployed it on their own systems, performing tests unsupervised. However, the darwindrian software produces timestamped log data to identify whether any test subject deviated from the expected experimental procedure. No inconsistency was identified.

5.4. Results and Analysis

The following graph shows the results of the first performance measure, averaged across all test subjects for each generation. It must be noted that this, and subsequent, graphs have been interpolated to “smooth” out any noise and highlight the underlying shape of the results. The underlying raw result data is included in appendix G.4.

Figure 5.1: Average fitness over time


Recall, not all test subjects performed the same number of generations. The figures corresponding to the first five generations come from ten test subjects. The figures corresponding to the next five generations (up to ten) come from six test subjects. The figures corresponding to all of the generations above ten come from a single test subject, due to the substantial length of time required.

It appears that all three models rate similarly for the first several generations, but then a rough trend is established. Namely, that the epigenetic model (inexact matches) out-performs the epigenetic model (exact matches), which subsequently out-performs the genetic algorithm model. This trend appears to take some time to set in, and it does not hold towards the end of the experiments.

The increased volatility across models in the later generations is clear. This could be due to inconsistency in the behaviour of the single test subject. More comprehensive testing should be performed with a greater number of test subjects to investigate further.

Conversely, the results corresponding to the first ten generations show some interesting properties. The difference between all three models is much smaller, with all having slight trends away from an average rating of 0 (a neutral result) towards a positive rating. This suggests that all three models are producing improving fitness values over time, but does not strongly suggest that any model necessarily out-performs the others.

This finding is corroborated by the “Final Evaluation” test. As the results in appendix G.3 show, six of the ten test subjects judged that either of the epigenetic results produced, on average, “better” Mondrian-like images for the final generation. This is a suggestive but inconclusive result given the sample size.

The following graph shows the average number of phenotypes that contributed to the fitness values of genotypes in each of the models.

Figure 5.2: Historical phenotypes contributing to epigenetic fitness over time


This measure attempts to test hypothesis 2, and in particular, investigates whether the genotype to phenotype enumeration changes the fitness values at all. The number of contributing phenotypes will be at least one as one reachable phenotype is always produced, the phenotype created during expression. If this measure does not exceed one, this computationally expensive approach is not worthwhile. Ideally, this measure should grow over time, thus giving the epigenetic fitness function access to increased information as evolution progresses.

From the limited experimentation performed, this performance measure remains essentially constant for the epigenetic model with exact phenotype matches. This suggests that the extra computation associated with the genotype to phenotype enumeration is unlikely to be worthwhile; generally, the only historically observed phenotype that affects the fitness value of a given genotype is the one that was actually expressed. This approximately degenerates to the standard genetic algorithm model.

This outcome is not overly surprising. The massive size of the search space in the Mondrian-like problem, and the relatively trivial number of evaluated points in that space present a probabilistic argument that we should not expect to express multiple phenotypes from differing genotypes that could have been expressed by other genotypes.

On the other hand, the epigenetic model with inexact matches utilised a number of phenotypes in its fitness function that appeared to grow linearly with the generation count. That is, almost all historically expressed and rated phenotypes will contribute to the genotype fitness values in the epigenetic model with inexact matches.

In fact, the particular construction of the distance function on Mondrian-like phenotypes suggests that this count should scale precisely linearly with the generation count. The slight underachievement on that outcome could suggest that some aspect of the evolution or implementation is prone to collisions in either genotypes or phenotypes. This aspect should be investigated further.

Figure 5.3: Mean Absolute Error between genetic and epigenetic models over time


This final graph shows the Mean Absolute Error between phenotype and genotype fitness values in each of the epigenetic models over time. The fitness of phenotypes is equivalent to the genotype fitness function in the genetic algorithm model. This performance measure tests the magnitude of the change in fitness values as a result of the epigenetic search.

Quite surprisingly, the epigenetic model with inexact matches does not obviously deviate from the epigenetic model with exact matches. Given that the epigenetic model with exact matches made use of approximately one phenotype (and is largely equivalent to the genetic algorithm model) its error sequence should be fairly close to 0. However, even though, as demonstrated in the previous graph, the epigenetic model with inexact matches incorporated information from effectively all historical phenotypes, it isn’t clear that it has had a significantly greater impact on fitness values.

Having said that, the epigenetic model with exact matches does appear to include a few large peaks in its Mean Absolute Error sequence, potentially skewing results, while the epigenetic model with inexact matches appears to be more consistently and steadily growing away from the genetic algorithm baseline.

Again, while suggestive, this is not a strongly conclusive result. Further and more thorough testing should be employed to investigate whether these are causal or random results in the behaviour of the various evolutionary models presented in this report.

In summary, while it is not clear that either of these epigenetic models has improved the convergence rate of genetic algorithms applied to the Mondrian-like problem, it is clear that they could influence fitness values and hence selective pressure.

The last two results in particular suggest a critical open question that should form the basis for future work. Why, in the epigenetic model with inexact phenotype matches, does there appear to be a dampened effect on the resultant fitness values even though most expressed phenotypes make a contribution? Possible causes for this result include:

• Undiagnosed flaws in the software artefacts.

• High complexity in the Mondrian-like problem relative to the amount of experimentation performed, both in terms of test subjects and experiment duration.

• Imbalance in the epigenetic fitness functions that emphasises or weights the contribution from the phenotype expressed too highly.

• Inconsistent fitness functions across generations. Test subjects are very likely to rate individuals in each generation relative to each other, but not consistently across generations; in other words, a dynamic fitness function is likely to have been used due to the particulars of the experimental set-up.


6. Conclusion and Future Work

While this work has produced some suggestive and encouraging results, these results were not conclusive. In particular, conclusive evidence either for or against the original test hypothesis 1 was not produced. However, this work has shown that the epigenetic models proposed by Gedeon [13] do have the potential to alter selective pressure as it is employed by evolutionary computing algorithms such as the genetic algorithm.

Unfortunately, a greater than anticipated time was necessarily devoted to understanding and removing biases from the earlier works of [6] and [7]. Regardless, while it would have been ideal to have devoted additional time to experimentation and analysis of results, the “human factor” associated with interactive evolutionary computing problems will always be a limiting factor. It is likely that for a problem as complex as the Mondrian-like problem, this aspect will always be a limitation.

In addition to the findings outlined in this report, this project has produced a fully functional framework to perform further experimentation as it has been described above. This software framework has been designed and developed to be extensible; thus future work such as that suggested below should be feasible without major redevelopment effort.

6.1. Suggested Future Work

This work has raised a number of unanswered questions that could form future bodies of work. Such bodies of work could make additional progress towards the original test hypothesis concerning the efficacy of incorporating models of epigenetics into evolutionary computing.

6.1.1. Performance Measures and Experimental Set-up

Most importantly, the seemingly contradictory results identified in section 5.4 should be investigated further. Of particular note is the fact that the epigenetic fitness functions were parameterised by one or two (depending on the model) weight factors. This design was deliberate and acknowledged the need to balance various contributing components. However, during experiments, these weights were given fixed values for all test subjects (see appendix G.1). Appropriate calibration of these weights should be investigated further, as should the assumption that they are best weighted linearly.

Additionally, the experimental set-up may have contributed. As noted in section 5.4, in the absence of strict control or instruction, it is likely that test subjects rated the individuals within each generation relative to each other; this is unlikely to have been consistent across generations. One possibility is to re-run (and increase the scale of) experiments with stricter control and instruction to minimise the chance of this phenomenon occurring. Alternative interfaces may also alleviate this issue and could be investigated.

Another solution would be to explicitly instruct test subjects to rate individuals consistently within each generation, but not to worry about consistency across generations. It would then be necessary to identify an appropriate scaling of these raw ratings before incorporating them into any performance measures and analysis. For example, the “Final Evaluation” experiment could give a relative rating between the final generation of each experiment, providing a mechanism to scale earlier generations’ ratings.

6.1.2. Generalised Epigenetic Fitness Functions

The epigenetic fitness functions defined for the purposes of this work are not entirely theoretically satisfying. In particular, there are greedy elements in the inexact matches case for the granularity parameter j. Perhaps a “better” inexact match will be found with a higher j that happens to produce a closer pixel-wise distance, resulting in an overall higher fitness contribution.

This greedy aspect could be replaced by a further maximisation over historical phenotypes, rather than minimising the parameter j in isolation.

6.1.3. Consistent Experimental Starting Point

Given the relatively small number of generations utilised in these experiments, the end results are highly susceptible to the quality of the random initial population. While the ideal solution is to run many generations, that is not feasible for interactive evolutionary computing problems. An alternative that could be investigated is to have each test subject perform an experiment on each model with the exact same initial population, thus making any model comparison more objective.
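A shared starting point could be arranged by deriving every model's initial population from the same seed. The following is a minimal sketch only: `initial_population`, `SHARED_SEED` and the point-list genotype are hypothetical stand-ins rather than part of the epiga framework, while the population size, image dimensions and number of origin points follow appendix G.1.

```python
import random

def initial_population(seed, size=10, width=480, height=480, n_points=3):
    """Build a reproducible list of hypothetical genotypes (lists of (x, y) points)."""
    rng = random.Random(seed)  # private generator, so global RNG state is untouched
    return [
        [(rng.randrange(width), rng.randrange(height)) for _ in range(n_points)]
        for _ in range(size)
    ]

SHARED_SEED = 2017  # arbitrary value, assumed to be fixed per test subject
MODELS = ("GA", "EGA (exact)", "EGA (inexact)")

# Every model starts from an identical population for this test subject.
populations = {model: initial_population(SHARED_SEED) for model in MODELS}
```

Using a private `random.Random` instance keeps the initial population reproducible without constraining the random choices made later in each experiment.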

6.1.4. Investigate Alternative Genetic Operators

The genetic operators used in this work were established at the beginning of the project and not critically revisited at any point. Perhaps there are alternative operators better suited to this particular application of genetic algorithms to the Mondrian-like problem.

For example, earlier generations in the experiments consist of largely random, and probably generally unappealing, Mondrian-like phenotypes. The use of fitness proportionate selection may not be ideal. Rank-based alternatives such as tournament selection may extract more information from this particular distribution of phenotypes.
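The difference between the two selection schemes can be sketched as follows. This is an illustration under stated assumptions, not the operators implemented in this work: the function names, the four-individual population and the fitness values are all hypothetical.

```python
import random

def fitness_proportionate(population, fitness, rng):
    """Roulette-wheel selection: the chance of selection scales with raw fitness."""
    return rng.choices(population, weights=fitness, k=1)[0]

def tournament(population, fitness, rng, size=2):
    """Tournament selection: only the ordering of the fitness values matters."""
    contenders = rng.sample(range(len(population)), size)
    winner = max(contenders, key=lambda i: fitness[i])
    return population[winner]

population = ["g1", "g2", "g3", "g4"]
fitness = [0.30, 0.30, 0.30, 0.65]      # hypothetical early-generation ratings
stretched = [f ** 4 for f in fitness]   # same ordering, very different magnitudes

# With identical random draws, tournament selection picks the same winner
# regardless of how the fitness values are scaled.
same_winner = all(
    tournament(population, fitness, random.Random(seed))
    == tournament(population, stretched, random.Random(seed))
    for seed in range(100)
)
```

Because tournament selection depends only on the ordering of ratings, it is insensitive to how far apart a test subject happens to place "poor" and "slightly less poor" individuals, whereas fitness proportionate selection is not.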

6.1.5. Improving Mondrian-like Results

Finally, focussing exclusively on the Mondrian-like problem, there is possible future work to avoid the creation of invalid Mondrian-like images during the evolutionary process. There are a number of biologically inspired concepts that should be investigated. In particular, random repair [15] and sudden death should be considered; that is, either fix a degenerate individual, or discard the offspring (and its genetic material) and start again.
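The two strategies can be contrasted with a short sketch. Everything here is an assumption made for illustration: the validity rule (generating points must be distinct and lie on the canvas) is hypothetical, and `random_repair` is a generic repair loop rather than the operator described in [15].

```python
import random

def is_valid(points, width, height):
    """Hypothetical validity rule: points are distinct and lie on the canvas."""
    on_canvas = all(0 <= x < width and 0 <= y < height for x, y in points)
    return on_canvas and len(set(points)) == len(points)

def random_repair(points, width, height, rng):
    """Replace each offending point with fresh random ones until it is acceptable."""
    repaired = []
    for x, y in points:
        while not (0 <= x < width and 0 <= y < height) or (x, y) in repaired:
            x, y = rng.randrange(width), rng.randrange(height)
        repaired.append((x, y))
    return repaired

def sudden_death(make_offspring, width, height, max_attempts=100):
    """Discard invalid offspring and breed again, up to max_attempts times."""
    for _ in range(max_attempts):
        child = make_offspring()
        if is_valid(child, width, height):
            return child
    raise RuntimeError("no valid offspring produced")
```

Repair keeps as much of the offspring's genetic material as possible, while sudden death keeps none of it; which is preferable for the Mondrian-like problem is the open question raised above.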


References

[1] Hans Bremermann, The evolution of intelligence: the nervous system as a model of its environment, 1958

[2] Alex Fraser, Simulation of genetic systems by automatic digital computers. II: Effects of linkage on rates under selection, 1957

[3] John Holland, Adaptation in Natural and Artificial Systems, 1975

[4] The Art Story, Modern Art Insight - De Stijl, http://www.theartstory.org/movement-de-stijl.htm

[5] Piet Mondrian: Composition II in Red, Blue, and Yellow, 1930, https://en.wikipedia.org/wiki/Piet_Mondrian

[6] Jian Yin Shen, Tom Gedeon, Cyber-Genetic Neo-Plasticism, 2007

[7] Mathew Smith, Using Artificial Neural Nets to Increase the Performance of Interactive Evolutionary Algorithms, 2009

[8] Jason D. Lohn, Derek S. Linden, Gregory S. Hornby, William F. Kraus, Adán Rodríguez-Arroyo, Stephen E. Seufert, Evolutionary Design of an X-Band Antenna for NASA’s Space Technology 5 Mission, 2004

[9] Gregory S. Hornby, Al Globus, Derek S. Linden, Jason D. Lohn, Automated Antenna Design with Evolutionary Algorithms, 2004

[10] Thomas Geijtenbeek, Michiel van de Panne, A. Frank van der Stappen, Flexible Muscle-Based Locomotion for Bipedal Creatures, 2013

[11] Shumeet Baluja, Rich Caruana, Removing the Genetics from the Standard Genetic Algorithm, 1995

[12] Andreas Holzinger, David Blanchard, Marcus Bloice, Katharina Holzinger, Vasile Palade, Raul Rabadan, Darwin, Lamarck, or Baldwin: Applying Evolutionary Algorithms to Machine Learning Techniques, 2014

[13] Tom Gedeon, personal communication with Bob McKay and William Maroney, 2017

[14] Richard Bellman, Dynamic Programming, 1957

[15] George G. Mitchell, Diarmuid O’Donoghue, David Barnes, Mark McCarville, GeneRepair - A Repair Operator for Genetic Algorithms, 2003


A. Independent Study Contract

Student and Supervisors

University ID: u5612989
Student: William Maroney
Project Supervisor: Bob McKay
Course Supervisor: Tom Gedeon
Course Details: COMP8755 Individual Computing Project, 12 units
Semester Undertaken: S1 2017

Project Title

Phenotype to Genotype Matching and Epigenetics in Evolutionary Algorithms

Learning Objectives

Focussed on phenotype to genotype mappings, gain:

• Experience with evolutionary algorithm design

• Experience with advanced evolutionary algorithm analysis

Project Description

We have an existing probabilistic genotype with significant data collected, which generates multiple phenotypes. A possible mechanism for evolution with epigenetics is to consider the cost function or probability of a genotype generating a particular phenotype (exact matches) or close to a particular phenotype (approximate matches).

Please see the attached document (appendix B) for a more detailed initial discussion.

Assessment Breakdown

Project Component    % of Mark    Due Date      Evaluated By
Final report         50%          26/05/2017    Sumudu Mendis
Software             40%          26/05/2017    Tom Gedeon
Presentation         10%          29/05/2017    Weifa Liang, Peter Strazdins

Meeting Dates

Weekly


B. Original Project Outline

This appendix includes the original high-level proposal to compute the phenotype to genotype mapping. However, early on in the project an alternative method that was less computationally expensive was identified. The main report body details the approach that was ultimately taken.

B.1. “Invert” the Genotype to Phenotype Mapping

Consider interactive evolutionary computation where evaluation of a fitness function requires human input. Practically speaking, the requirement for humans to evaluate the fitness of various phenotypes limits the feasible scope and scale of experiments.

Thus, given the human contribution to the fitness value of a single phenotype it would be desirable if we could infer something about the fitness value of other phenotype(s)/genotype(s).

An interactive method to generate “Mondrian-like” images⁴ will be used as inspiration for this investigation.

B.1.1. Notation and Genotype Encoding

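The following notation will be used.

Let $\mathbb{G}$ be the set of all valid genotypes (abstract point-based representation)
Let $\mathbb{P}$ be the set of all valid phenotypes (“Mondrian-like” images)
Let $f^* : \mathbb{G} \to \mathbb{P}$ be the randomised mapping from a genotype to a phenotype

Now, consider the genotype encoding.

Let $\mathbf{g} \in \mathbb{G}$ s.t. $\mathbf{g} = (\mathbf{a}, \mathbf{d}, \mathbf{c}, \mathbf{q})$; where
$\mathbf{a} \in \mathbb{Z}/w\mathbb{Z} \times \mathbb{Z}/h\mathbb{Z}$ is the area threshold below which a rectangle may be coloured
$\mathbf{d} \in (\mathbb{Z}/n_d\mathbb{Z})^4$ is the probability factor of drawing north/south/east/west
$\mathbf{c} \in (\mathbb{Z}/n_c\mathbb{Z})^3$ is the probability factor of colouring red/blue/yellow
$\mathbf{q} \in (\mathbb{Z}/w\mathbb{Z} \times \mathbb{Z}/h\mathbb{Z})^{n_q}$ is the sequence of generating points
$w, h \in \mathbb{N}$ are the width and height of a phenotype
$n_d \in \mathbb{N}$ is the number of discrete steps in drawing probabilities (10)
$n_c \in \mathbb{N}$ is the number of discrete steps in colour probabilities (9)
$n_q \in \mathbb{N}$ is the number of points used in a genotype (3)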

⁴ Cyber-Genetic Neo-Plasticism - An AI program Creating Mondrian-like Paintings by using Interactive Bacterial Evolution Algorithm - Jian Yin Shen, Tom Gedeon


Note: a “probability factor” is not actually a probability; however, after suitable filtering and scaling is performed it will become a discretised probability.

Note: the original report states that the probabilities of drawing a line out of certain types of points during the genotype-phenotype mapping are part of the genotype. This is inconsistent with the corresponding code, in which the probabilities are in fact constant. Regardless, this argument could be extended to also include such discretised probabilities.

B.2. Re-framing the Problem

Ideally, we’d like to compute $(f^*)^{-1}$; however, it does not exist. The randomised nature of $f^*$ means it is a one-to-many mapping. It is also easy to see that multiple genotypes can produce the same phenotype. Since $f^*$ is therefore a many-to-many relation, it is not invertible.

Alternatively, consider the corresponding deterministic function $f$ which also acts on a multi-variate random variable representing the random choices made in the computation of $f^*$. Let $f : \mathbb{G} \times \mathbb{X} \to \mathbb{P}$ where $\mathbb{X}$ is a discrete and finite multi-variate random variable.

Note: $\mathbb{X}$ is discrete and finite due to the specific encoding of “probability factors” which results in discretised probability distributions being drawn on to select outcomes.

Now, given a phenotype $p \in \mathbb{P}$ we’d like to compute the set of possible generating genotypes $\{\mathbf{g}_i \in \mathbb{G}\}$. Additionally, we’d like to compute the probability of deriving $p$ given each $\mathbf{g}_i$; that is, $P(f^*(\mathbf{g}_i) = p \mid \mathbf{g}_i)\ \forall \mathbf{g}_i$.

The first aspect (find $\{\mathbf{g}_i\}$) can be considered as a search problem since the domain of $f$ is finite. Obviously, for any interesting (sufficiently big) instance of this problem, an exhaustive search strategy will be intractable. However, the specifics of the genotype and phenotype encodings can be used to reduce the potential search space dramatically.

Observe that the $\mathbf{a}, \mathbf{q}$ components of $\mathbf{g}$ essentially define a valid genotype-phenotype mapping, and thus will be the focus of the search strategy. The desired probability $P(f^*(\mathbf{g}) = p \mid \mathbf{g})$ is largely defined by the $\mathbf{d}, \mathbf{c}$ components of $\mathbf{g}$; thus these will be considered subsequently to the search problem.

B.3. Search Problem - Finding Genotypes (q genes)

Conjecture 1. Each generating point in a genotype emits at least one line segment in at least one direction.

Conjecture 2. Each generating point in a genotype must lie on at least one line segment.

Conjecture 3. The number of generating points within an interactive experiment is known and remains fixed.

Conjecture 4. Each line segment must be connected to at least one generating point in the genotype.


Given a phenotype, extend all line segments to positive and negative infinity; you will now have a series of lines of the form $x = x_i\ \forall i \in [0, m)$, $y = y_j\ \forall j \in [0, l)$. Now, there will be at most $m \times l$ points of intersection.

Consider the $x = x_i$ equations. Observe that $m = n_p \Rightarrow$ the x-coordinate for precisely one generating point must lie on each vertical line. Similarly, $l = n_p \Rightarrow$ the y-coordinate for precisely one generating point must lie on each horizontal line.

Thus, $(m = n_p) \wedge (l = n_p) \Rightarrow$ we must search at most $\binom{n_p^2}{n_p}$ ordered sequences of points.

Conjecture 5. Where there are $n_p$ distinct $x = x_i$ and $y = y_j$ equations then there are at most $\binom{n_p^2}{n_p}$ valid, ordered, sets of generating points.

What if $m < n_p$? Consider the case where $(m = n_p - 1) \wedge (l = n_p)$. Here, the x-coordinates of $n_p - 1$ generating points must lie on precisely one vertical line and the y-coordinates of $n_p$ generating points must lie on precisely one horizontal line. The x-coordinate of the partially constrained generating point may lie on any line (made up of at most $(n_p - 1) \times h + n_p \times w$ locations), so long as the above conjectures are satisfied. The following facts somewhat mitigate the search space blow-out:

• Only points on actual line segments need be considered

• Conjecture 4 implies that only points on actual line segments not yet connectedto one of the np − 1 placed generating points need be considered

Together, this results in an increased search space of maximum size $\binom{n_p \times (n_p - 1)}{n_p - 1} \times w$. It is likely that any further reduction in $m$ and/or $l$ will result in a combinatorial blow-out of the size of the search space; however, this question deserves further attention.

From a casual viewing of “Mondrian-like” images available from previous works it appears that the cases $m = n_p$ and $l = n_p$ are exceedingly common. Occasionally, $m = n_p - 1$ or $l = n_p - 1$ are observed but no further degeneracy was evident. This assessment was not rigorous but is suggestive that such constraints will often be very effective in reducing the size of the search space.

Search Space Bound    m = 3                               m = 2
l = 3                 $\binom{9}{3} = 84$                 $\binom{6}{2} \times 480 = 7200$
l = 2                 $\binom{6}{2} \times 480 = 7200$    . . .
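The bounds in the table above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function name `search_space_bound` is hypothetical, and the case $m = n_p$, $l = n_p - 1$ is taken to be the mirror image of the stated case with $h$ in place of $w$, which agrees with the table because $w = h = 480$.

```python
from math import comb

def search_space_bound(m, l, n_p=3, w=480, h=480):
    """Upper bound on candidate generating-point sequences for m vertical
    and l horizontal lines; None where the text gives no bound."""
    if m == n_p and l == n_p:
        return comb(n_p * n_p, n_p)
    if m == n_p - 1 and l == n_p:
        return comb(n_p * (n_p - 1), n_p - 1) * w
    if m == n_p and l == n_p - 1:
        # Assumed symmetric to the case above, with h in place of w.
        return comb(n_p * (n_p - 1), n_p - 1) * h
    return None  # further degeneracy is left open in the text

bounds = {(m, l): search_space_bound(m, l) for m in (3, 2) for l in (3, 2)}
```

The jump from 84 to 7200 candidates for a single missing line illustrates why further degeneracy is expected to cause a combinatorial blow-out.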

B.4. Search Problem - Finding Genotypes (a genes)

The $\mathbf{a}$ component of a genotype specifies the maximum area of rectangles that may be coloured. Thus, the set of candidate genotypes found in the search above (constrained by $\mathbf{q}$) can be augmented by enumerating values of $\mathbf{a}$ that would have allowed all coloured rectangles in the given phenotype.

That is, find the maximum area of a coloured rectangle in the phenotype $p$ and produce the set of $\mathbf{a}$ that would allow said rectangle. Produce the cross-product of these $\mathbf{a}$ with the $\mathbf{q}$ found above.


B.5. Finding P (f ∗(g) = p|g) (d, c genes)

Since all probability factors are non-zero, all values of $\mathbf{d}, \mathbf{c}$ are valid. The specific values for a given genotype $\mathbf{g}$ will however influence the probability that it generates the given phenotype.

B.6. Do We Need to Enumerate Valid Genotypes?

Even with these constraints, enumerating all possible genotypes that could have produced a given phenotype appears quite costly. However, recall that the context of this is evolutionary computation. While the set of genotypes that could have generated a given phenotype will impact how we spread a fitness value, what really matters is the accumulated fitness value of genotypes as they occur during selection. This accumulated value is a function of:

• the fitness value(s) of the corresponding phenotype(s) reachable from the given genotype

• the proportion of this/these fitness value(s) belonging to the given genotype compared to that apportioned to all other genotypes

If this second factor can be computed without enumerating all possibly generating genotypes, then this problem may largely reduce to a counting problem.

How does one compute the unconditional probability $P(f^*(\mathbf{g}) = p)$?


C. Software Description

<epiga>

genotype-to-phenotype C++ optimised genotype to phenotype enumeration

main.cpp Command-line interface for enumeration

Makefile Makefile based build script

phenotype.cpp Phenotype expression logic - implementation

phenotype.hpp Phenotype expression logic - header

implementation Multiple Darwindrian implementations

__init__.py Empty - makes implementation a Python module

improved.py Darwindrian implementation - Smith

original.py Darwindrian implementation - Shen

unbiased.py Darwindrian implementation - Maroney

resources Various web resources used by web application

bootstrap.min.css Twitter’s CSS library Bootstrap

bootstrap.min.css.map Twitter’s CSS library Bootstrap

complete.html HTML template to report experimental results

error.html HTML template to render any errors

evolve_template.html HTML template for phenotype rating

favicon.ico Original Mondrian art - web server’s favicon

form-magic.js Client-side web functionality for web forms

index.html HTML template for main web page

jquery-1.7.1.min.js Open-source JavaScript library jQuery

magic-check.css Open-source CSS for HTML radio buttons

magic-check.min.css Open-source CSS for HTML radio buttons

style.css Custom web page CSS styling

thumbnail.js Mondrian thumbnail and preview functionality

... See source file listing on next page

<epiga>

genotype-to-phenotype

... See source file listing on previous page

implementation


resources


crossover.py Abstract base class for crossover algorithms

darwindrian.py Darwindrian entry point (main function)

evolution.py Abstract base class for evolution algorithms

ga.py The genetic algorithm implementation

gene.py Abstract base class for genes and concrete genes

genotype.py Shared Mondrian genotype functionality

mondrian.py Shared Mondrian phenotype functionality

mutation.py Abstract base class for mutation algorithms

README Darwindrian build and execution instructions

selection.py Abstract base class for selection algorithms

template.py Abstract base class for all extensible types

util.py Various utility functions used throughout

webui.py Darwindrian web server functionality

Note: The source tree listing is too large to fit onto one page so was split across two pages. The use of ... place-holders implies that the corresponding sub-tree will be found in the other image. The two image parts are complementary and cover the entire source tree listing.

All third-party developed components are highlighted in red in the source tree listings above. While the entries highlighted in blue were implemented by this report’s author, they are logically equivalent ports of the Darwindrian implementations from the prior works of [6] and [7] to allow for integration into the framework developed in this work. All other source code is an original work by the report’s author.

C.1. Testing

Correctness testing was difficult to achieve due to the heavily stochastic nature of the algorithms used in this work. To compensate for this fact, a number of strategies were employed to test the implementation during development and prior to experimentation.

Primarily, defensive programming was employed. A highly modular experimentation framework was designed which facilitated the definition of pre- and post-conditions for many functional components. Run-time assertions were used liberally to test that both pre- and post-conditions held before and after execution of said functions.
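The style of check described above can be sketched with a small decorator. This is an illustrative sketch only and not code from the epiga framework: `contract` and `selection_probabilities` are hypothetical names chosen for the example.

```python
import functools

def contract(pre=None, post=None):
    """Attach run-time pre- and post-condition assertions to a function."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None:
                assert pre(*args, **kwargs), f"pre-condition of {func.__name__} failed"
            result = func(*args, **kwargs)
            if post is not None:
                assert post(result), f"post-condition of {func.__name__} failed"
            return result
        return wrapper
    return decorate

@contract(
    pre=lambda fitness: len(fitness) > 0 and all(f > 0 for f in fitness),
    post=lambda probs: abs(sum(probs) - 1.0) < 1e-9,
)
def selection_probabilities(fitness):
    """Normalise positive fitness values into selection probabilities."""
    total = sum(fitness)
    return [f / total for f in fitness]
```

Note that Python strips `assert` statements when run with the `-O` flag, so checks of this kind are a development aid rather than a substitute for input validation.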

Exception handling was used thoroughly to motivate the analysis of success and failure modes in all complex functions and components. Incorporating exception reporting and stack trace analysis into the web based user interface heavily assisted and streamlined debugging - exceptions were tied to user actions and reported with full contextual information while performing functional testing. This design aspect brought any un-handled exception to the developer’s attention immediately.

Various standard software engineering practices were employed to allow for better overall quality control. Version control was utilised to assist in bug traceability, and change reversion where needed. The JetBrains PyCharm Integrated Development Environment was used, which provides version control commit hooks to continually perform static code analysis, identifying potential problems as they are created in source code.

Finally, simple optimisation problems were defined and solved with the software created. Problems with well understood solutions were used, and black-box testing was performed to ensure that expected convergence occurred.

C.2. Development and Experimental Environments

The software produced for this project was developed in a 64-bit Windows 7 environment; however, it was designed and developed to be cross-platform. In particular, only portable languages were employed (Python, C++ and standards-based web technologies). Use of non-standard libraries was minimised as much as possible.

During development, the software was tested on Windows 7, Windows 10, various Linux distributions and macOS Sierra systems. Experiments were run in a distributed fashion on computers provided by test subjects.

Primary system: Intel Xeon E3-1241v3 3.5GHz with hyper-threading, 32GB of RAM
Primary build chain: C++ (Cygwin g++ v5.4.0) and Python 3.6

C.3. Datasets

All data was generated at run-time using the various evolutionary algorithms described in this report. No static datasets were utilised.


D. Software Usage

The following is the README file attached to the software artefact which describes how to build and run the application. Specific software dependencies are also outlined.

Overview
========
The epiga project implements an experimental framework to apply various genetic
algorithms to the problem of automatically generating abstract artworks
(inspired by Piet Mondrian).

The epiga project is primarily a Python3 application with a web-based user
interface. It also includes a natively executed program to perform
computationally expensive aspects of the genetic algorithms employed.

Software Dependencies
=====================
* A C++ compiler supporting the C++11 standard such as the GNU Compiler Collection
* The GNU make system or the ability to customise C++ build steps
* Python3.4+ with the additional libraries Pillow and matplotlib

The matplotlib library is optional. Its absence will disable some graph
generation functionality, but otherwise, the epiga application will run. The
Pillow library is mandatory. It is responsible for generating the artworks.

To install these dependencies on Debian based Linux distributions (e.g. Ubuntu):
    sudo apt-get install g++ python3 python3-pil

Build Instructions
==================
1. Deploy the contents of the epiga source into a directory - denoted <EPIGA_DIRECTORY>
2. Build the stand-alone C++ application using the provided makefile
    cd <EPIGA_DIRECTORY>/genotype-to-phenotype
    make

Usage Instructions
==================
Launch the main epiga application which will spawn a local HTTP server
listening on port 8080. This will listen for network connections allowing
for alternative hosts to connect to the epiga web server instance if the
host running the server allows it.

Upon spawning the HTTP server, the default local web browser will launch (if
not already running) and open the epiga web page.

To launch epiga:
1. cd <EPIGA_DIRECTORY>
2. python3 darwindrian.py

Supported Platforms
===================
The following platforms have been tested however any system that meets the
hardware and software requirements outlined in this README should be able to
run the epiga application.
* 64-bit Windows7 running Python3.6 and Cygwin v2.8.0
* 64-bit Linux Mint/Ubuntu 14.04 running Python3.4 and gcc v4.8.4

Minimum Hardware Recommendations
================================
* Multi-core 64-bit CPU
* At least 3GB of RAM per processing core allocated to epigenetic models

Note: if you have less than 3GB of RAM per CPU core, reduce the number of
"Processing cores" allocated to any EGA based experiment accordingly.


E. Darwindrian Representation (Jian Yin Shen)

This appendix describes the genetic algorithm implementation used in [6]. This implementation, and that developed for the work discussed in the body of this report, have some common functionality. For brevity, only different or additional functionality will be described in this appendix.

E.1. Additional Hyper-parameters

The following parameters will be fixed within the context of any experiment and are specific to this representation. These hyper-parameters are in addition to those defined in section 3.4.

Let $n_d \in \mathbb{N}$ be the number of discrete steps in drawing probabilities ($n_d = 10$)
Thus, valid drawing probabilities are: $\{\frac{i}{n_d}\}\ \forall i \in [0, n_d]$

Let $n_c \in \mathbb{N}$ be the number of discrete steps in colour probabilities ($n_c = 9$)
Thus, valid colour probabilities are: $\{\frac{i}{n_c}\}\ \forall i \in [0, n_c]$

Let $n_l \in \mathbb{N}$ be the number of emitting loops used in $m^*$ ($n_l = 4$)

E.2. Genotype Representation

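Let $\mathbf{g} \in \mathbb{G}$ s.t. $\mathbf{g} = (\mathbf{a}, \mathbf{d}, \mathbf{c}, \mathbf{q})$; where
$\mathbf{a} \in \mathbb{Z}/w\mathbb{Z} \times \mathbb{Z}/h\mathbb{Z}$ is the maximum dimensions of a rectangle that may be coloured
$\mathbf{d} \in (\mathbb{Z}/n_d\mathbb{Z})^4$
$\mathbf{c} \in (\mathbb{Z}/n_c\mathbb{Z})^3$
$\mathbf{q} \in (\mathbb{Z}/w\mathbb{Z} \times \mathbb{Z}/h\mathbb{Z})^{n_q}$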
Note: a “probability factor” is not actually a probability; however, after suitable filtering and scaling is performed it will become a discretised probability.

Note: the original report states that the probabilities of drawing a line out of certain types of points during the genotype-phenotype mapping are part of the genotype. This is inconsistent with the corresponding code, in which the probabilities are in fact constant. Regardless, this encoding could be extended to also include such discretised probabilities.

E.3. Fitness Function

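The fitness function is defined as follows:
Let $\mathrm{rating}_S : \mathbb{P} \to \mathbb{Z}/2\mathbb{Z}$ be the structural rating of $p$
Let $\mathrm{rating}_C : \mathbb{P} \to \mathbb{Z}/2\mathbb{Z}$ be the colour rating of $p$
Let $f_P : \mathbb{P} \to (\mathbb{Z}/2\mathbb{Z})^2$ be the phenotype fitness function
Let $f_G : \mathbb{G} \to \mathbb{R}$ be the genotype fitness function
Define $f_P$ as $f_P(p) = (\mathrm{rating}_S(p), \mathrm{rating}_C(p))$
Define $f_G$ as

$$f_G(\mathbf{g}) = \begin{cases} 0.30 & f_P(m^*(\mathbf{g})) = (0, 0) \\ 0.65 & f_P(m^*(\mathbf{g})) = (0, 1) \\ 0.75 & f_P(m^*(\mathbf{g})) = (1, 0) \\ 1.00 & f_P(m^*(\mathbf{g})) = (1, 1) \end{cases}$$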
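The piecewise definition above amounts to a lookup keyed on the two binary ratings. A minimal sketch follows; the names `GENOTYPE_FITNESS`, `phenotype_fitness` and `genotype_fitness` are hypothetical and are not taken from the implementation in [6].

```python
# Genotype fitness values from the piecewise definition, keyed by
# (structural rating, colour rating).
GENOTYPE_FITNESS = {
    (0, 0): 0.30,
    (0, 1): 0.65,
    (1, 0): 0.75,
    (1, 1): 1.00,
}

def phenotype_fitness(likes_structure, likes_colour):
    """f_P: the pair of binary ratings supplied by the test subject."""
    return (int(bool(likes_structure)), int(bool(likes_colour)))

def genotype_fitness(likes_structure, likes_colour):
    """f_G: the scalar fitness assigned to the generating genotype."""
    return GENOTYPE_FITNESS[phenotype_fitness(likes_structure, likes_colour)]
```

Written this way it is easy to see that a liked structure alone (0.75) is valued more highly than a liked colouring alone (0.65).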

E.4. Genetic Operators/Algorithms


Express a $p_i = m^*(\mathbf{g}_i)\ \forall i \in [1, n]$
Evaluate fitness values, $f_G(\mathbf{g}_i) = f_P(p_i)\ \forall i \in [1, n]$
Sort $\{\mathbf{g}_i\}_{i=1}^{n}$ by fitness values $f_G(\mathbf{g}_i)$ in descending order
next_generation ← $\{\mathbf{g}_1\}$    ▷ single elite selection
for $i = 1, \ldots, n - 1$ do
    if $\frac{f_G(\mathbf{g}_i)}{f_G(\mathbf{g}_1)} < U(0, 1)$ then    ▷ with probability $\frac{f_G(\mathbf{g}_i)}{f_G(\mathbf{g}_1)}$
        parents ← $\{\mathbf{g}_i, \mathbf{g}_{i+1}\}$    ▷ select adjacent individuals
        offspring ← crossover(parents)    ▷ with crossover
        next_generation ← next_generation ∪ offspring    ▷ without mutation
while ||next_generation|| < n do    ▷ complete the next generation
    next_generation ← next_generation ∪ {new random individual}    ▷ not mutation
return next_generation

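Algorithm 11 $\pi_{Jian}(\mathbf{a}^1, \mathbf{a}^2, \mathbf{g}^1, \mathbf{g}^2)$ - crossover helper function

Let $p^i$ be the phenotype already expressed by $\mathbf{g}^i\ \forall i \in [1, 2]$
Let $(a^1_1, a^1_2) \leftarrow \mathbf{a}^1$
Let $(a^2_1, a^2_2) \leftarrow \mathbf{a}^2$
return

$$\begin{cases} \mathbf{a}^1 & \mathrm{rating}_S(p^1) > \mathrm{rating}_S(p^2) \\ (a^1_1, a^2_2) & \mathrm{rating}_S(p^1) = \mathrm{rating}_S(p^2) \\ \mathbf{a}^2 & \mathrm{rating}_S(p^1) < \mathrm{rating}_S(p^2) \end{cases}$$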
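The helper in algorithm 11 can be sketched in a few lines. As a simplifying assumption, this sketch takes the two structural ratings directly as arguments rather than looking them up from the expressed phenotypes, and the name `pi_jian` is hypothetical.

```python
def pi_jian(a1, a2, rating_s1, rating_s2):
    """Choose the child's area gene from two parents' (a_1, a_2) pairs.

    The parent with the better structural rating donates its whole gene;
    on a tie the child takes the first component from parent 1 and the
    second component from parent 2.
    """
    if rating_s1 > rating_s2:
        return a1
    if rating_s1 < rating_s2:
        return a2
    return (a1[0], a2[1])
```

For example, `pi_jian((100, 200), (300, 400), 1, 1)` mixes the two parents and yields `(100, 400)`.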


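Algorithm 12 crossover($\{\mathbf{g}^i\}_{i=1}^{m}$) where $m = 2$, $l = 1$

Let $p^i$ be the phenotype already expressed by $\mathbf{g}^i\ \forall i \in [1, m]$
Let $(x, y, z) \leftarrow (2, 2, 1)$
if $\mathrm{rating}_S(p^1) > \mathrm{rating}_S(p^2)$ AND $\mathrm{rating}_C(p^1) > \mathrm{rating}_C(p^2)$ then
    Let $(x, y, z) \leftarrow (4, 3, 3)$
else if $\mathrm{rating}_S(p^1) > \mathrm{rating}_S(p^2)$ AND $\mathrm{rating}_C(p^1) = \mathrm{rating}_C(p^2)$ then
    Let $(x', y) \leftarrow (3, 2)$    ▷ possible bug, meant to be x?
    Let $z \leftarrow U(1, 2)$ rounded to nearest integer
else if $\mathrm{rating}_S(p^1) > \mathrm{rating}_S(p^2)$ AND $\mathrm{rating}_C(p^1) < \mathrm{rating}_C(p^2)$ then
    Let $(x', y, z) \leftarrow (3, 2, 0)$    ▷ possible bug, meant to be x?
else if $\mathrm{rating}_S(p^1) = \mathrm{rating}_S(p^2)$ AND $\mathrm{rating}_C(p^1) > \mathrm{rating}_C(p^2)$ then
    Let $(x, z) \leftarrow (2, 2)$
    Let $y \leftarrow U(1, 2)$ rounded to nearest integer
else if $\mathrm{rating}_S(p^1)$ … $\mathrm{rating}_S(p^2)$ AND $\mathrm{rating}_C(p^1) = \mathrm{rating}_C(p^2)$ then
    Let $x \leftarrow 2$
    Let $y \leftarrow U(1, 2)$ rounded to nearest integer
    Let $z \leftarrow U(1, 2)$ rounded to nearest integer
else if $\mathrm{rating}_S(p^1)$ … $\mathrm{rating}_S(p^2)$ AND $\mathrm{rating}_C(p^1) < \mathrm{rating}_C(p^2)$ then
    Let $(x, z) \leftarrow (2, 0)$
    Let $y \leftarrow U(1, 2)$ rounded to nearest integer
Let $(\mathbf{a}^1, \mathbf{d}^1, \mathbf{c}^1, \mathbf{q}^1) \leftarrow \mathbf{g}^1$    ▷ decompose $\mathbf{g}^1$
Let $(\mathbf{a}^2, \mathbf{d}^2, \mathbf{c}^2, \mathbf{q}^2) \leftarrow \mathbf{g}^2$    ▷ decompose $\mathbf{g}^2$
Let $(d^1_1, d^1_2, d^1_3, d^1_4) \leftarrow \mathbf{d}^1$    ▷ decompose $\mathbf{d}^1$
Let $(d^2_1, d^2_2, d^2_3, d^2_4) \leftarrow \mathbf{d}^2$    ▷ decompose $\mathbf{d}^2$
Let $(c^1_1, c^1_2, c^1_3) \leftarrow \mathbf{c}^1$    ▷ decompose $\mathbf{c}^1$
Let $(c^2_1, c^2_2, c^2_3) \leftarrow \mathbf{c}^2$    ▷ decompose $\mathbf{c}^2$
Let $(\mathbf{q}^1_1, \ldots, \mathbf{q}^1_{n_q}) \leftarrow \mathbf{q}^1$    ▷ decompose $\mathbf{q}^1$
Let $(\mathbf{q}^2_1, \ldots, \mathbf{q}^2_{n_q}) \leftarrow \mathbf{q}^2$    ▷ decompose $\mathbf{q}^2$
Let $\mathbf{a} \leftarrow \pi_{Jian}(\mathbf{a}^1, \mathbf{a}^2, \mathbf{g}^1, \mathbf{g}^2)$
Let $\mathbf{d} \leftarrow (d^1_1, \ldots, d^1_x, d^2_{x+1}, \ldots, d^2_4)$
Let $\mathbf{c} \leftarrow (c^1_1, \ldots, c^1_z, c^2_{z+1}, \ldots, c^2_3)$
Let $\mathbf{q} \leftarrow (\mathbf{q}^1_1, \ldots, \mathbf{q}^1_y, \mathbf{q}^2_{y+1}, \ldots, \mathbf{q}^2_{n_q})$
Let child ← $(\mathbf{a}, \mathbf{d}, \mathbf{c}, \mathbf{q})$
return {child}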
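The final steps of algorithm 12 build each of the d, c and q genes by taking a prefix from the first parent and the remainder from the second, with the cut points x, z and y respectively. A minimal sketch of that single-point splice follows; the name `splice` and the example gene values are hypothetical.

```python
def splice(first, second, cut):
    """Take the first `cut` components from `first` and the rest from `second`."""
    return tuple(first[:cut]) + tuple(second[cut:])

d1, d2 = (1, 2, 3, 4), (5, 6, 7, 8)   # hypothetical drawing genes of two parents
c1, c2 = (1, 2, 3), (4, 5, 6)         # hypothetical colour genes of two parents

d_default = splice(d1, d2, 2)   # x = 2: half of the gene from each parent
d_parent1 = splice(d1, d2, 4)   # x = 4: gene copied entirely from parent 1
c_parent2 = splice(c1, c2, 0)   # z = 0: gene copied entirely from parent 2
```

This makes the intent of the (x, y, z) settings visible: (4, 3, 3) copies every gene from the first parent when it is rated better on both structure and colour, while z = 0 hands the colour gene entirely to the second parent.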


F. Darwindrian Representation (Mathew Smith)

This representation builds heavily on that presented in appendix E, thus only changes are noted in this description. Where any detail is omitted, please refer to appendix E.

F.1. Genotype Representation

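Let $\mathbf{g} \in \mathbb{G}$ s.t. $\mathbf{g} = (\mathbf{a}^{\min}, \mathbf{a}^{\max}, c^{\max}, \mathbf{d}, \mathbf{c}, \mathbf{q})$; where
$\mathbf{a}^{\min} \in \mathbb{Z}/w\mathbb{Z} \times \mathbb{Z}/h\mathbb{Z}$ is the minimum dimensions of a rectangle that may be coloured
$\mathbf{a}^{\max} \in \mathbb{Z}/w\mathbb{Z} \times \mathbb{Z}/h\mathbb{Z}$ is the maximum dimensions of a rectangle that may be coloured
$c^{\max} \in \mathbb{Z}$ is the maximum number of rectangles that may be coloured
$\mathbf{d} \in (\mathbb{Z}/n_d\mathbb{Z})^4$
$\mathbf{c} \in (\mathbb{Z}/n_c\mathbb{Z})^3$
$\mathbf{q} \in (\mathbb{Z}/w\mathbb{Z} \times \mathbb{Z}/h\mathbb{Z})^{n_q}$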

F.2. Fitness Function

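The fitness function is defined as follows:
Let $\mathrm{rating}_S : \mathbb{P} \to \mathbb{Z}/2\mathbb{Z}$ be the structural rating of $p$
Let $\mathrm{rating}_C : \mathbb{P} \to \mathbb{Z}/2\mathbb{Z}$ be the colour rating of $p$
Let $f_P : \mathbb{P} \to \{\text{like}, \text{indifferent}, \text{dislike}\}$ be the phenotype fitness function
Let $f_G : \mathbb{G} \to \mathbb{R}$ be the genotype fitness function
Define $\mathrm{rating}_S(p)$ and $\mathrm{rating}_C(p)$ as

$$\begin{cases} 1 & f_P(p) = \text{like} \\ 0 & \text{otherwise} \end{cases}$$

Define $f_P$ as $f_P(p)$ = user selected rating of $p$
Define $f_G$ as

$$f_G(\mathbf{g}) = \begin{cases} 0.1 & f_P(m^*(\mathbf{g})) = \text{dislike} \\ 0.5 & f_P(m^*(\mathbf{g})) = \text{indifferent} \\ 1.0 & f_P(m^*(\mathbf{g})) = \text{like} \end{cases}$$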


F.3. Genetic Operators/Algorithms


Express a $p_i = m^*(\mathbf{g}_i)\ \forall i \in [1, n]$
Evaluate fitness values, $f_G(\mathbf{g}_i) = f_P(p_i)\ \forall i \in [1, n]$
Sort $\{\mathbf{g}_i\}_{i=1}^{n}$ by fitness values $f_G(\mathbf{g}_i)$ in descending order
next_generation ← $\{\mathbf{g}_1\}$    ▷ single elite selection
for $i = 1, \ldots, n - 1$ do
    if $\frac{f_G(\mathbf{g}_i)}{f_G(\mathbf{g}_1)} < U(0, 1)$ then    ▷ with probability $\frac{f_G(\mathbf{g}_i)}{f_G(\mathbf{g}_1)}$
        parents ← $\{\mathbf{g}_i, \mathbf{g}_{i+1}\}$    ▷ select adjacent individuals
        offspring ← crossover(parents)    ▷ with crossover
        next_generation ← next_generation ∪ offspring    ▷ without mutation
while ||next_generation|| < n do    ▷ complete the next generation
    individual ← $\mathbf{g}_i$ with probability $\frac{f_G(\mathbf{g}_i)}{\sum_{j=1}^{n} f_G(\mathbf{g}_j)}$
    next_generation ← next_generation ∪ {mutation(individual)}    ▷ with mutation
return next_generation

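Algorithm 14 crossover($\{\mathbf{g}^i\}_{i=1}^{m}$) where $m = 2$, $l = 1$

. . .    ▷ This logic is unchanged from algorithm 12
Let $(\mathbf{a}^{\min}_1, \mathbf{a}^{\max}_1, c^{\max}_1, \mathbf{d}^1, \mathbf{c}^1, \mathbf{q}^1) \leftarrow \mathbf{g}^1$    ▷ decompose $\mathbf{g}^1$
Let $(\mathbf{a}^{\min}_2, \mathbf{a}^{\max}_2, c^{\max}_2, \mathbf{d}^2, \mathbf{c}^2, \mathbf{q}^2) \leftarrow \mathbf{g}^2$    ▷ decompose $\mathbf{g}^2$
. . .    ▷ This logic is unchanged from algorithm 12
Let $(\mathbf{a}^{\min}, \mathbf{a}^{\max}, c^{\max}) \leftarrow \pi_{Mathew}(\mathbf{a}^{\min}_1, \mathbf{a}^{\max}_1, \mathbf{a}^{\min}_2, \mathbf{a}^{\max}_2, c^{\max}_1, c^{\max}_2, \mathbf{g}^1, \mathbf{g}^2)$
Let $\mathbf{d} \leftarrow (d^1_1, \ldots, d^1_x, d^2_{x+1}, \ldots, d^2_4)$
Let $\mathbf{c} \leftarrow (c^1_1, \ldots, c^1_z, c^2_{z+1}, \ldots, c^2_3)$
Let $\mathbf{q} \leftarrow (\mathbf{q}^1_1, \ldots, \mathbf{q}^1_y, \mathbf{q}^2_{y+1}, \ldots, \mathbf{q}^2_{n_q})$
Let child ← $(\mathbf{a}^{\min}, \mathbf{a}^{\max}, c^{\max}, \mathbf{d}, \mathbf{c}, \mathbf{q})$
return {child}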


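Algorithm 15 $\pi_{Mathew}(\mathbf{a}^{\min}_1, \mathbf{a}^{\max}_1, \mathbf{a}^{\min}_2, \mathbf{a}^{\max}_2, c^{\max}_1, c^{\max}_2, \mathbf{g}^1, \mathbf{g}^2)$

Let $p^i$ be the phenotype already expressed by $\mathbf{g}^i\ \forall i \in [1, 2]$
Let $(a^{\min}_{1,1}, a^{\min}_{1,2}) \leftarrow \mathbf{a}^{\min}_1$
Let $(a^{\max}_{1,1}, a^{\max}_{1,2}) \leftarrow \mathbf{a}^{\max}_1$
Let $(a^{\min}_{2,1}, a^{\min}_{2,2}) \leftarrow \mathbf{a}^{\min}_2$
Let $(a^{\max}_{2,1}, a^{\max}_{2,2}) \leftarrow \mathbf{a}^{\max}_2$
if $\mathrm{rating}_S(p^1) > \mathrm{rating}_S(p^2)$ and $p^1$ has coloured regions then
    Let $w_{\min}, w_{\max}$ be the min and max widths of a coloured region in $p^1$
    Let $h_{\min}, h_{\max}$ be the min and max heights of a coloured region in $p^1$
    Let $(\mathbf{a}^{\min}, \mathbf{a}^{\max}) \leftarrow \left( \left( \frac{w_{\min} + a^{\min}_{1,1}}{2}, \frac{h_{\min} + a^{\min}_{1,2}}{2} \right), \left( \frac{w_{\max} + a^{\max}_{1,1}}{2}, \frac{h_{\max} + a^{\max}_{1,2}}{2} \right) \right)$
else if $\mathrm{rating}_S(p^1) = \mathrm{rating}_S(p^2)$ then
    Let $\mathbf{a}^{\min} \leftarrow \left( \min(a^{\min}_{1,1}, a^{\min}_{2,1}), \min(a^{\min}_{1,2}, a^{\min}_{2,2}) \right)$
    Let $\mathbf{a}^{\max} \leftarrow (a^{\max}_{1,1}, a^{\max}_{2,2})$
else if $\mathrm{rating}_S(p^1) < \mathrm{rating}_S(p^2)$ and $p^2$ has coloured regions then
    Let $w_{\min}, w_{\max}$ be the min and max widths of a coloured region in $p^2$
    Let $h_{\min}, h_{\max}$ be the min and max heights of a coloured region in $p^2$
    Let $(\mathbf{a}^{\min}, \mathbf{a}^{\max}) \leftarrow \left( \left( \frac{w_{\min} + a^{\min}_{2,1}}{2}, \frac{h_{\min} + a^{\min}_{2,2}}{2} \right), \left( \frac{w_{\max} + a^{\max}_{2,1}}{2}, \frac{h_{\max} + a^{\max}_{2,2}}{2} \right) \right)$
else if $\mathrm{rating}_S(p^1) > \mathrm{rating}_S(p^2)$ then
    Let $(\mathbf{a}^{\min}, \mathbf{a}^{\max}) \leftarrow (\mathbf{a}^{\min}_1, \mathbf{a}^{\max}_1)$
else if $\mathrm{rating}_S(p^1) < \mathrm{rating}_S(p^2)$ then
    Let $(\mathbf{a}^{\min}, \mathbf{a}^{\max}) \leftarrow (\mathbf{a}^{\min}_2, \mathbf{a}^{\max}_2)$
if $\mathrm{rating}_C(p^1) > \mathrm{rating}_C(p^2)$ then
    Let $c^{\max} \leftarrow c^{\max}_1$
    With 50% probability, let $c^{\max} \leftarrow c^{\max} \pm 1$    ▷ plus and minus are equiprobable
else if $\mathrm{rating}_C(p^1) = \mathrm{rating}_C(p^2)$ then
    Let $c^{\max} \leftarrow \frac{c^{\max}_1 + c^{\max}_2}{2}$
else if $\mathrm{rating}_C(p^1) < \mathrm{rating}_C(p^2)$ then
    Let $c^{\max} \leftarrow c^{\max}_2$
    With 50% probability, let $c^{\max} \leftarrow c^{\max} \pm 1$    ▷ plus and minus are equiprobable
return $(\mathbf{a}^{\min}, \mathbf{a}^{\max}, c^{\max})$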


Algorithm 16 mutation($g$)

Let $p$ be the phenotype already expressed by $g$
Let $m_f \leftarrow \begin{cases} 3 & f_G(g) < 0 \\ \left\lfloor \frac{1}{(f_G(g))^2} \right\rfloor & f_G(g) \geq 0 \end{cases}$
Let $(a^{\min}, a^{\max}, c^{\max}, d, c, q) \leftarrow g$    ▷ decompose $g$
for $i = 1, \ldots, m_f$ do
    $r \leftarrow U(0, 5)$ rounded to nearest integer
    if $r = 0$ then
        $c^{\max} \leftarrow c^{\max} \pm 1$    ▷ plus and minus are equiprobable
    else if $r = 1$ then
        $c \leftarrow \left( U(0, n_c - 1), U(0, n_c - 1), U(0, n_c - 1) \right)$    ▷ fresh $c$
    else if $r = 2$ then
        $d \leftarrow \left( U(0, n_d - 1), U(0, n_d - 1), U(0, n_d - 1), U(0, n_d - 1) \right)$    ▷ fresh $d$
    else if $r = 3$ then
        Complex and probabilistic function to decrease the distance between $a_1, a_2$
        Let $r \leftarrow U(1, 2)$    ▷ force control to flow into one of the next clauses
    if $r = 4$ then
        Complex and probabilistic function to increase the distance between $a_1, a_2$
    else if $r = 5$ then
        Complex and probabilistic function to increase the distance between $a_1, a_2$
Let $g' \leftarrow (a, d, c, q)$
return $g'$
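One consequence of drawing $r$ as $U(0, 5)$ rounded to the nearest integer is that the six outcomes are not equally likely. The sketch below computes the probabilities exactly, assuming that $U(0, 5)$ denotes a continuous uniform draw (the listing does not say so explicitly); the function name is hypothetical.

```python
from fractions import Fraction


def rounded_uniform_probabilities(low, high):
    """Return P(round(X) = k) for X ~ continuous Uniform(low, high), integers low < high."""
    width = Fraction(high - low)
    probabilities = {}
    for k in range(low, high + 1):
        # Values in [k - 1/2, k + 1/2] round to k; clip that interval to [low, high].
        left = max(Fraction(low), Fraction(k) - Fraction(1, 2))
        right = min(Fraction(high), Fraction(k) + Fraction(1, 2))
        probabilities[k] = (right - left) / width
    return probabilities


probs = rounded_uniform_probabilities(0, 5)
for k, p in probs.items():
    print(k, p)
```

Under this reading, $r = 0$ and $r = 5$ each occur with probability 1/10, while $r = 1, \ldots, 4$ each occur with probability 1/5, because the two end values only collect half an interval each.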

Note: $f_G(g) \in \{0.1, 0.5, 1.0\} \Rightarrow m_f = \frac{1}{(f_G(g))^2}$ (always), thus $m_f \in \{99, 4, 1\}$.
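The value 99 for $f_G(g) = 0.1$, where exact arithmetic would give 100, is consistent with evaluating the floor in double-precision floating point. That explanation is an assumption about the implementation and is not stated in the report. A minimal sketch, with a hypothetical function name:

```python
import math


def mutation_count(f_g):
    """Number of mutation rounds m_f as defined at the top of Algorithm 16."""
    if f_g < 0:
        return 3
    # Assumes f_g > 0 here; f_g = 0 would divide by zero and is not among the
    # values listed in the note.
    return math.floor(1 / (f_g * f_g))


for f_g in (0.1, 0.5, 1.0):
    print(f_g, mutation_count(f_g))
```

In IEEE 754 double precision, `0.1 * 0.1` is slightly larger than 0.01, so its reciprocal falls just below 100 and the floor yields 99; the values 0.5 and 1.0 are exactly representable and give 4 and 1.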


G. Experimental Data

G.1. Fixed Hyper-parameter Values

The following hyper-parameters were fixed for all experiments. The implementation was set to ensure that the genetic algorithms without inherent biases (see section 3.5) were used. All other hyper-parameters were carried over from the prior work of [6] and [7]. These specific values were selected to produce phenotypes that were broadly similar, or expected to be similar, to true Mondrian artworks.

Hyper-parameter                     Fixed Value
Implementation                      Unbiased Darwindrian
Population size                     10
Number of elites                    1
Image dimensions                    480x480
Number of origin points             3
Number of generating loops          3
Epigenetic fitness weight $k_1$     1
Epigenetic fitness weight $k_2$     1

G.2. Test Subjects

The actual test subjects have been anonymised; however, each is uniquely identified by an alias. Their experimental parameters are given in the following table.

Test Subject   # Generations   Method     Model Ordering
A              30              Parallel   N/A
B              10              Series     EGA (exact), EGA (inexact), GA
C              10              Series     GA, EGA (exact), EGA (inexact)
D              10              Series     EGA (inexact), EGA (exact), GA
E              10              Series     GA, EGA (inexact), EGA (exact)
F              10              Series     EGA (exact), GA, EGA (inexact)
G              5               Series     GA, EGA (exact), EGA (inexact)
H              5               Series     EGA (exact), EGA (inexact), GA
I              5               Series     GA, EGA (inexact), EGA (exact)
J              5               Series     EGA (exact), GA, EGA (inexact)

Note: while most test subjects completed the three experiments in sequential order, one (A) completed all three experiments in parallel. This was to further reduce the potential bias introduced by having all test subjects use the same order, where they might change their behaviour as they became more familiar with the darwindrian software.


G.3. Final Evaluation Results

The following table includes the average phenotype fitness values for the final generation in each experiment, for each test subject. This data is derived from the “Final Evaluation” experiment.

Test Subject   GA     EGA (exact matches)   EGA (inexact matches)
A              -0.6   0.8                   0.7
B              0.5    0.2                   0.3
C              0.0    0.4                   0.6
D              1.2    1.2                   1.0
E              -0.1   -0.6                  -0.4
F              -0.2   -0.3                  -0.4
G              -0.7   -0.8                  -1.3
H              -0.5   0.0                   -0.2
I              -0.5   -0.2                  -0.8
J              1.1    1.2                   1.5

G.4. Raw Result Data

This section includes the raw data for the phenotype rating performance measure analysed in section 5.4.

The following three tables correspond to the three models that were investigated.

• the GA model

• the EGA (exact phenotype matches) model

• the EGA (inexact phenotype matches) model

For each model, the average phenotype rating for each test subject is reported. These average ratings are then aggregated to provide an overall average phenotype rating per generation. It is this final figure that is used in section 5.4.
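The aggregation step can be illustrated with a short Python sketch. It assumes the overall average is the unweighted arithmetic mean of the individual averages available for that generation, a reading that reproduces the tabulated figures for the rows used here. The number of individual averages falls from ten to six to one as subjects with 5, 10 and 30 generations finish (see section G.2). The function and variable names are hypothetical.

```python
def overall_average(individual_averages):
    """Mean of the per-subject average ratings recorded for one generation."""
    return sum(individual_averages) / len(individual_averages)


# Individual averages copied from the GA table (section G.4.1).
ga_generation_1 = [-0.4, 1.3, 0.0, 0.2, 0.2, -0.6, 0.7, -0.9, -0.2, -0.2]
ga_generation_6 = [0.5, -0.3, 1.5, 0.5, 0.0, 1.2]

print(round(overall_average(ga_generation_1), 2))  # 0.01, as tabulated
print(round(overall_average(ga_generation_6), 2))  # 0.57, as tabulated
```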


G.4.1. Model: GA

Generation   Overall Average   Individual Averages
1            0.01              -0.4, 1.3, 0.0, 0.2, 0.2, -0.6, 0.7, -0.9, -0.2, -0.2
2            0.11              -0.6, 1.6, 0.1, 0.9, 0.6, -0.5, 0.7, -0.8, -0.5, -0.4
3            0.13              0.1, 1.2, 0.1, 0.6, 0.4, -0.1, 0.4, -0.9, -0.4, -0.1
4            0.16              -0.4, 1.3, -0.4, 1.3, 0.3, -0.4, 0.7, -0.3, -0.1, -0.4
5            0.16              0.1, 1.0, -0.4, 0.8, 0.8, -0.3, 0.6, -0.6, -0.3, -0.1
6            0.57              0.5, -0.3, 1.5, 0.5, 0.0, 1.2
7            0.45              0.4, -0.4, 1.4, 0.7, -0.1, 0.7
8            0.27              -0.5, -0.3, 1.3, 0.9, 0.2, 0.0
9            0.18              -0.3, -0.1, 0.6, 0.8, 0.1, 0.0
10           0.30              -0.3, 0.2, 1.4, 0.7, 0.2, -0.4
11           0.50              0.5
12           -0.20             -0.2
13           0.20              0.2
14           -0.60             -0.6
15           0.10              0.1
16           -0.30             -0.3
17           -0.30             -0.3
18           0.50              0.5
19           0.00              0.0
20           -0.60             -0.6
21           -0.60             -0.6
22           -0.60             -0.6
23           0.80              0.8
24           0.80              0.8
25           0.30              0.3
26           0.30              0.3
27           0.60              0.6
28           0.60              0.6
29           1.10              1.1
30           0.90              0.9


G.4.2. Model: EGA (exact matches)

Generation   Overall Average   Individual Averages
1            0.02              0.3, 0.8, -0.1, -0.3, -0.4, 0.7, 0.7, -0.2, -0.6, -0.7
2            0.02              -0.3, 0.8, -0.4, 0.2, -0.8, 1.0, 0.7, -0.2, -1.0, 0.2
3            -0.21             -0.1, 0.9, -0.3, 0.0, -0.3, 0.0, -0.2, -0.8, -0.5, -0.8
4            -0.04             -0.2, 1.4, -0.5, -0.6, -0.2, 1.1, -0.2, -0.1, -0.3, -0.8
5            0.18              0.3, 1.2, -0.5, -0.1, 0.5, 1.0, 0.3, -0.6, 0.0, -0.3
6            0.43              0.2, 0.1, 0.1, 0.3, 1.1, 0.8
7            0.25              -0.2, -0.2, 0.1, 0.4, 1.0, 0.4
8            0.23              0.1, -0.4, 0.0, 0.5, 0.9, 0.3
9            0.22              0.5, -0.2, -0.2, 0.5, 0.4, 0.3
10           0.60              0.3, -0.1, 0.6, 0.4, 1.4, 1.0
11           0.80              0.8
12           0.40              0.4
13           0.20              0.2
14           -0.10             -0.1
15           0.00              0.0
16           0.90              0.9
17           0.80              0.8
18           0.40              0.4
19           0.20              0.2
20           0.10              0.1
21           0.10              0.1
22           0.40              0.4
23           0.70              0.7
24           1.10              1.1
25           1.00              1.0
26           0.70              0.7
27           0.30              0.3
28           0.20              0.2
29           0.90              0.9
30           0.60              0.6


G.4.3. Model: EGA (inexact matches)

Generation   Overall Average   Individual Averages
1            0.05              0.0, 1.2, -0.5, 0.9, 0.1, -0.3, 0.6, -0.8, -0.3, -0.4
2            -0.02             -0.1, 0.7, -0.1, 0.9, -0.2, -0.1, 0.0, 0.1, -0.7, -0.7
3            0.05              -0.3, 1.2, -0.2, 1.0, 0.1, -0.1, 0.4, -0.2, -1.1, -0.3
4            0.18              0.4, 1.2, -0.2, 1.5, 0.4, -0.3, 0.3, -0.1, -0.7, -0.7
5            0.20              0.0, 1.2, 0.0, 0.7, 0.3, 0.2, 0.5, -0.1, -0.9, 0.1
6            0.30              0.0, 0.3, 0.8, 0.1, 0.1, 0.5
7            0.30              -0.4, -0.4, 0.2, 0.7, 1.3, 0.4
8            0.58              0.1, 0.1, 0.6, 0.6, 1.0, 1.1
9            0.40              -0.4, -0.3, 1.0, 0.4, 0.9, 0.8
10           0.23              0.3, -0.3, 0.6, 0.6, 0.3, -0.1
11           1.10              1.1
12           0.90              0.9
13           1.50              1.5
14           1.50              1.5
15           1.50              1.5
16           0.90              0.9
17           1.40              1.4
18           0.80              0.8
19           0.50              0.5
20           0.70              0.7
21           1.30              1.3
22           0.80              0.8
23           0.80              0.8
24           -0.60             -0.6
25           1.40              1.4
26           0.50              0.5
27           0.70              0.7
28           1.30              1.3
29           0.90              0.9
30           0.80              0.8
