Sequence Selection by Dynamical Symmetry Breaking … · Sequence selection by dynamical symmetry breaking in an autocatalytic binary polymer model Harold Fellermann Interdisciplinary

Sequence Selection byDynamical Symmetry Breakingin an Autocatalytic BinaryPolymer ModelHarold FellermannShinpei TanakaSteen Rasmussen

SFI WORKING PAPER: 2017-08-031

SFIWorkingPaperscontainaccountsofscienti5icworkoftheauthor(s)anddonotnecessarilyrepresenttheviewsoftheSantaFeInstitute.Weacceptpapersintendedforpublicationinpeer-reviewedjournalsorproceedingsvolumes,butnotpapersthathavealreadyappearedinprint.Exceptforpapersbyourexternalfaculty,papersmustbebasedonworkdoneatSFI,inspiredbyaninvitedvisittoorcollaborationatSFI,orfundedbyanSFIgrant.

©NOTICE:Thisworkingpaperisincludedbypermissionofthecontributingauthor(s)asameanstoensuretimelydistributionofthescholarlyandtechnicalworkonanon-commercialbasis.Copyrightandallrightsthereinaremaintainedbytheauthor(s).Itisunderstoodthatallpersonscopyingthisinformationwilladheretothetermsandconstraintsinvokedbyeachauthor'scopyright.Theseworksmayberepostedonlywiththeexplicitpermissionofthecopyrightholder.

www.santafe.edu

SANTA FE INSTITUTE

Sequence selection by dynamical symmetry breaking in an autocatalytic binarypolymer model

Harold Fellermann∗

Interdisciplinary Computing and Complex Biosystem Research Group,School of Computing Science, Newcastle University,

Claremont Tower, Newcastle upon Tyne NE1 7RU, United Kingdom

Shinpei TanakaGraduate School of Integrated Arts and Sciences, Hiroshima University,

1-7-1 Kagamiyama, Higashi-Hiroshima 739-8521, Japan

Steen RasmussenCenter for Fundamental Living Technology (FLinT) Department of Physics, Chemistry and Pharmacy,

University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark andSanta Fe Institute, 1399 Hyde Park Rd, Santa Fe NM 87501, USA

Template directed replication of nucleic acids is at the essence of all living beings and a major mile-stone for any origin of life scenario. We here present an idealized model of prebiotic sequence repli-cation, where binary polymers act as templates for their autocatalytic replication, thereby servingas each others reactants and products in an intertwined molecular ecology. Our model demonstrateshow autocatalysis alters the qualitative and quantitative system dynamics in counter-intuitive ways.Most notably, numerical simulations reveal a very strong intrinsic selection mechanism that favorsthe appearance of a few population structures with highly ordered and repetitive sequence patternswhen starting from a pool of monomers. We demonstrate both analytically and through simulationhow this “selection of the dullest” is caused by continued symmetry breaking through random fluc-tuations in the transient dynamics that are amplified by autocatalysis and eventually propagate tothe population level. The impact of these observations on related prebiotic mathematical models isdiscussed.

PACS numbers: 05.65.+b 87.23.Cc 87.23.Kg 82.40.Qt

I. INTRODUCTION

The ability of nucleic acids to serve as templates fortheir own replication via the well-known Watson-Crickpairing is at the heart of all life known today. Conse-quently, the onset of such chemical sequence informationand its replication are crucial cornerstones in most theo-ries about the origin of life [1, 2].

While the appearance of first molecular replicators isstill subject to ongoing debate [3, 4], we suspect [5, 6]that catalysis combined with replication of the catalyticmolecules is the critical molecular invention. Catalysisenables access to energy and resources while replicationboth preserves and enables combinatorial exploration ofthe catalyst. Competition over common and scarce re-sources would cause a selection pressure on the catalystthat directs the random exploration process of copy er-ror mutations. The resulting evolutionary search wouldeventually select replicator species with increasingly ad-vantageous catalytic properties [7] – thus exemplifyingDarwinian chemical evolution.

Several simple models have been developed to eluci-date the prebiotic emergence and evolution of nucleicacid replicators. Most notably, random catalytic network

∗ [email protected]

initialrandomligation equilibration

randomligation

equilibrationdegradation

+

+

+

degradation

ligation

autocatalysis

FIG. 1. In the exact autocatalytic polymer model polymersmade up of two types of monomers can degrade, ligate ran-domly or undergo autocatalytic replication (inlay). Strongautocatalysis and comparably rare random ligation promotethe emergence of characteristic population structures (mainimage).

models have been introduced to predict the emergence ofautocatalytic cycles in pools of arbitrary molecules withrandom cross-catalytic activity [8–10]. More closely re-lated to nucleic acids are catalytic polymer models, where

2

the catalyzed polymer ligation reaction is modeled asconcatenation of random strings [11–19]. Most studies ofpolymer models concern themselves with the appearanceof catalytic cycles when the catalytic activity of polymersis assigned either randomly or dependent on the sequencearound the ligation site. Common conclusion of thesestudies is that random catalytic activity among a set ofmolecules is sufficient to promote the appearance of re-flexively autocatalytic sets, i.e. groups of molecules thatare able to jointly replicate themselves through cross-catalysis, see e.g. Refs. [9, 13, 20].

In this study, we take the appearance of catalysis andreplication for granted, and raise the critical questions:do catalysis, replication, competition, cooperation, andselection indeed allow for the free exploration of thechemical sequence space or are some prebiotic evolution-ary fates more common than others? If the latter, whatis driving the evolution in such chemical networks? Ul-timately, we want to better understand how prebioticchemistry with reversible reactions can give rise to theemergence of history, i.e. can lead to a series of pathdependent events, where the outcome of a past selectionevents impacts on what becomes accessible to evolutionin the future.

In order to focus on these questions, we take a radicalstance and study a polymer model in which autocataly-sis is assumed to work perfectly for all molecular species.There are several reasons to do so. Firstly, if we believechemical evolution to have evolved more and more ef-fective self-replicators, it should be just as informative tounderstand the limiting case of perfect autocatalysis as itis to understand the limiting case of completely randomchemistry. Secondly, by demanding perfect autocatalysis,we are able to derive closed analytic expressions for allinvolved molecular species. These allow us to scrutinythe reaction system with unprecedented analytic rigor,and we are able to align our simulation results with theanalytic derivations.

We have previously reported on this model of self-replicating binary polymers where exact autocatalyticreplication introduces a selection bias toward few specificreplicator motifs and population structures [21]. Here, wepresent a full analysis of this system and several prebiot-ically motivated model variants.

In the remainder of the article, we give a detailed ex-position of our simple binary polymer model with ex-act autocatalysis. We derive analytical results for thetwo important limiting cases of purely random as well aspurely autocatalytic replication. We proceed by present-ing simulation results of the prebiotically interesting case,where autocatalysis is dominent but augmented with rel-atively rare random ligation. For this scenario we reveala strong inherent selection pressure toward repetitive se-quence motifs. We suggest two non-exclusive explana-tions for the strong observed selection pressure, one thatdominates the transient dynamics of the system, and onethat acts upon equilibrated populations. We also ana-lyze several variants of our model, including larger al-

phabet sizes, mutations, sequence complementarity, andnon-replicating food species, before discussing the gen-eral implications of our findings.

II. AN EXACT AUTOCATALYTIC BINARYPOLYMER MODEL

We study a system of self-replicating polymers madeup of two types of monomers. Each polymer can (1)decompose into two substrands by hydrolysis of any of itsbonds, (2) randomly ligate with another polymer to forma longer polymer, and (3) replicate itself autocatalyticallyby ligating two matching substrands. As an idealization,we assume that the rate constants of these reactions areconstant and in particular independent of the sequenceinformation of each polymer. As we are interested in theeffect of catalysis, we assume that random ligations arecomparatively rare.

The resulting system is a highly coupled molecularecology, where replicating species compete for commonresources and serve as each other’s reactants and degra-dation products. What will typically happen in such anecology when starting from a pool of monomers?

To formalize this reaction system we first introducesome notation: let A = {0, 1} be a binary alphabet. Wedenote the set of strings over A by A∗, and we write |k|for the length of string k ∈ A∗. For l,m ∈ A∗ we denoteby l.m the string obtained by concatenating l and m.

We can now define the above reaction system formally,using the following three reaction rules:

l.mc0−→ l +m (1)

l +mc1−→ l.m (2)

l +m+ l.mc2−→ 2 l.m. (3)

It has been observed experimentally, that non-enzymatic replicators suffer from product inhibition [22],where most potential templates are in their inactive dou-ble strand configuration, and thus cannot serve as repli-cation templates. As a consequence, experimental repli-cators do not follow the exponential growth dynamics ofsimple autocatalysis and, in turn, promote coexistenceof replicator species, rather than survival of only thefittest [23, 24]. In our study, the implicit energy flowentering reaction (3) is assumed to separate the inactiveand low energy double strand configuration of replicatorstrands and to transform them into their activated singlestrand configuration.

Since A∗ is enumerable, we can take it as basis for aninfinite dimensional state space: each state vector x de-notes a population within which every species k ∈ A∗spans one dimension and its associated coordinate xk ∈R+ specifies the concentration of the respective species.Standard dynamical systems analysis is complicated bythe infinite dimensionality of this state space. Previ-ous studies circumvented this difficulty by introducing amaximal strand length [15, 17] or focusing on very small

3

monomer pools [18, 25]. Here, we present a mathematicalformalism that is able to deal with the infinite dimension-ality for the most part analytically.

Assuming mass action kinetics, we can derive dimen-sionless reaction kinetic equations for the molar concen-trations xk of species k ∈ A as

dxkdt

=

∑k.j=ij.k=i

xi −∑i.j=k

xk

+ α

∑i.j=k

xixj −∑k.j=ij.k=i

xjxk

+ β

∑i.j=k

xixjxk −∑k.j=ij.k=i

xixjxk

≡ fk. (4)

where we have chosen a time scale t = c−10 , and intro-

duced the non-dimensional random ligation rate α =c1/c0, and the non-dimensional autocatalytic ligationrate β = c2/c0. The above summations consider ev-ery pair of species i and j which satisfy i.j = k, andk.j = i or j.k = i. Formally, the sum operators aredefined by means of the indicator functions (1i.j=k =1 if i.j = k else 0) as follows:∑

i.j=k

:=∑i,j∈A∗

1i.j=k (5)

∑j.k=ik.j=i

:=∑i,j∈A∗

(1j.k=i + 1k.j=i) (6)

Note that the second definition may involve an infinitesum and is only defined if the right-hand-side is defined.For finite size systems, which we consider in this article,the sum is always defined and allows us to write downthe reaction equation system in a compact closed formwithout introducing any arbitrary truncation.

Equation (4) allows us to perform standard station-arity and stability analysis for the two limiting cases ofpurely random ligation as well as purely autocatalyticligation.

A. Purely random and purely autocatalytic limitcases

In the absense of autocatalysis (β = 0), the systemapproaches a stationary and stable equilibrium distribu-tion, where the concentration of each replicator decaysexponentially with its length:

(x∗exp

)k

=1

αe−b|k|, (7)

which can be readily confirmed by inserting (7) into (4)for β = 0. Here, b is a constant determined by the bound-ary condition. Importantly, there is no selection for spe-cific sequence patterns.

We will now show that this situation changes radi-cally in the limit of purely autocatalytic ligation (α = 0).Firstly, since decomposition is now the only reaction thatintroduces novel species, any stationary state can onlyconsist of initially present strands or their substrands.For any strand that is populated in the stationary state,all its substrands must be populated as well. This canbe proven by contradiction: assume there exists a steadystate x∗ with x∗m > 0 but x∗k = 0 for some k.j = m. Fromequation (4) it follows:

dxkdt

∣∣∣∣x∗

=∑k.j=ij.k=i

x∗i ≥ x∗m > 0. (8)

Since fk(x∗) > 0, x∗ cannot have been a steady state.Thus, the assumption is false and species k must be pop-ulated as well.

We call any strand that is not a substrand of any otherstrand in the population a “chief” and the set of its sub-strands its “clan”. An example of such a “chief-clan”structure is shown in figure 1. We denote the set of allchiefs of x by ∂A†x and the set of all clan members byA†x. Using this nomenclature, the stationary chiefs of thepurely autocatalytic case are those substrands of initallypresent strands that survived decomposition and are notsubstrands of other surviving species.

Interestingly, under our assumption of exact replica-tion (3), stationary species concentrations no longer de-cay exponentially with their length. Instead, any properclan member is maintained at a constant concentrationgiven by the replication rate constant β:

x∗k =

{β−1/2, k ∈ A†x\∂A†x0, k 6∈ A†x.

(9)

This can again be readily confirmed by inserting (9)into (4) with α = 0.

Concentrations of chief species m ∈ ∂A†x are uncon-strained in the steady state, appart from what is deter-mined by the boundary condition. This is so because thesteady state condition for chiefs

dxmdt

∣∣∣∣x∗

= −∑i.j=m

x∗m + β∑i.j=m

x∗i x∗jx∗m

= x∗m

β ∑i.j=m

β−1 −∑i.j=m

1

= 0 (10)

is always fulfilled, independent of xm.To understand how stationary solutions respond to

fluctuations, we derive the entries of the functional ma-

4

trix:

∂fk∂xl

∣∣∣∣x∗

=∑k.j=ij.k=i

∂xi∂xl−∑i.j=k

∂xk∂xl

+ β

∑i.j=k

∂(xixjxk)

∂xl−∑k.j=ij.k=i

∂(xixjxk)

∂xl

=

∑j.l=kl.j=k

1−∑k.l=il.k=i

1− δkl∑k.j=ij.k=i

1

, (11)

where δij = 1 if i = j else 0 is the Kronecker symbol, and

chief concentrations are assumed to be β−1/2 for simplic-ity.

The equation simply expresses a detailed balance re-lation: increasing the amount of any clan species resultsin a flux to any species that uses the increased strand asprefix or suffix, while simultaneously introducing a fluxaway from all species needed as reactants for the forma-tion of those longer species. Again, increasing the level ofchief species (l = m) does not introduce any flux in thepopulation structure. Notably, the stability of a station-ary population is entirely determined by the topology ofthe reaction graph, not by the reaction rates.

To fully determine the stability of stationary solutionsrequires to solve for the eigenvalue spectrum of the func-tional matrix. The analytical treatment is complicatedby the fact that the dimension of our system is formallyinfinite. We therefore resolve to numerically probing theeigenvalue spectrum of hundreds of different trial popu-lations (see table I). We find that the vast majority ofstationary chief-clan structures have indeed no positiveeigenvalues. Moreover, we find that the number of zeroeigenvalues either equals the number of chiefs in the pop-ulation or is one greater than that. Up to two of thesezero eigenvalues result from mass conservation. The re-maining ones express the fact that chief-clan structureswith competing chiefs are connected by lines of linearlystable solutions. For example, in a population with com-peting chiefs 01 and 10 all populations that satisfy theconstant solution x0 = x1 = x∗ are linearly stable, inde-pendent of how material is distributed among the chiefs.

We remind that linear stability analysis only detectslocal stability and does not discriminate the relative sta-bility of one solution over another. Moreover, our analy-sis does not consider boundary conditions, which mightrender some chief-clan structures less likely than others,if not outright impossible.

How many potential equilibrium solutions are there?We can roughly estimate the potential stable chief-clanstructures as a function of the longest population mem-ber as follows: there are 2|m| different chiefs of length|m|. If any of these chiefs can be either absent or present

in the population, then there are 22|m| different poten-tial steady state populations. Despite the constraints of

TABLE I. Stability of clans with different number of chiefs.

Clans with one chief

|m| Psn Pss Pu

2 1.00 0.00 0.00

4 1.00 0.00 0.00

6 0.40 0.60 0.00

8 0.20 0.80 0.00

10 0.07 0.93 0.00

12 0.02 0.98 0.00

14 0.01 0.99 0.00

16 0.00 1.00 0.00

18 0.00 1.00 0.00

Clans with two chiefs

|m| Psn Pss Pu

2 1.00 0.00 0.00

4 0.72 0.28 0.00

6 0.21 0.79 0.00

8 0.04 0.95 0.00

10 0.00 0.98 0.02

12 0.00 0.97 0.03

14 0.00 0.95 0.05

Clans with four chiefs

|m| Psn Pss Pu

2 1.00 0.00 0.00

4 0.82 0.18 0.00

6 0.19 0.81 0.00

8 0.01 0.99 0.00

10 0.00 0.99 0.01

12 0.00 0.99 0.01

|m|: the length of the chief(s), Psn: the probability to findstable nodes, Pss: the probability to find stable spirals, Pu: theprobability to find unstable stationary points. The number ofsampling was 104. Only combinations of chiefs that had thesame number of “0” and “1” were sampled.

equation (9), there are super-exponentially many stablepotential equilibrium populations. Already for strandsas short as ten, this amounts to more than 10300 possi-ble stationary chief-clan structures. One should bare inmind, however, that this estimation ignores the boundarycondition imposed by the initial (and conserved) materialamount.

So far our analysis has been based on molar concentra-tions. If material is sparse, however, it is more accurateto adopt the view that molecular species are present indiscrete amounts. Under this view, the continuous equa-tions (1)–(3) still describe the expected tendencies of thedynamic process, but with the difference that species canbecome extinct and disappear entirely, rather than be-

5

ing diluted to arbitrary small (but non-vanishing) con-centrations. Under pure autocatalysis, this is especiallyrelevant for chief species, as they are not constrained topotentially high equilibrium amounts. If chief species dis-appear as the result of random fluctuations, they becomepermanently extinct.

B. Combined random and autocatalytic ligation

We now study systems where both ligation reactionsact in combination. In particular, we analyze how thesystem transitions from the exponential equilibrium so-lution to stable chief-clan structures when varying theratio of the random and autocatalytic ligation rates.

To answer this question, we sample the stochastic pro-cess defined through (1)–(3) via exact stochastic simu-lation using a Gillespie algorithm. In order to tacklethe dimensionality of the problem, the algorithm gener-ates for the current state x(t) a list of potential follow-upstates on the fly and evaluates transition rates ri accord-ing to equation (4). For any finite size population, thistransition list will also be finite.

Simulations are started from a monomer pool withx0(0) = x1(0) = 2100, β = 10−4 and α varying from10−7 to 10−3. Under sufficiently rare random ligation,we expect these settings to stabilize chief-clan structuresup to length ten, and under only random ligation, an ex-ponential distribution up to length eight. With the givenβ, in a population distributed according to (9), randomand autocatalytic ligation would be equally likely for arandom ligation rate constant α = β−1/2 = 10−2.

After t = 1000 time units, populations are comparedto both equilibrium solutions (7) and (9) using the cosinesimilarities

sexp =1

α

∑k∈A∗ xke−b|k|

|x|∣∣x∗exp

∣∣ (12)

and

sccs =

∑k∈Ax

xkβ−1/2

|∑k∈Ax

xk|∣∣∑

k∈Axβ−1/2

∣∣ (13)

where Ax = A†x\∂A†x. sexp the distance toward the corre-sponding exponential equilibrium and sccs measures thedistance to the smallest enclosing chief-clan structure,i.e. the smallest chief-clan structure that has a non-zerospecies count for any species that is present in the popu-lation. Fig. 2 shows the transition from stable chief-clanstructures into random distributions. With increasingrandom ligation rate α, observed populations seize toresemble their original chief-clan structure. Visual in-spection of the results revealed that these deviations aremainly caused by few random ligation products that arenot numerous enough to establish their clan. When fil-tering ouf such outliers, we found that the remaining corepopulations resemble more closely their corresponding

0

0.2

0.4

0.6

0.8

1

1.2

1x10-7 1x10-6 1x10-5 0.0001 0.001

ligation rate α

sexpsccs(1)sccs(2)sccs(5)

sccs(10)

FIG. 2. Cosine similarity between simulated population sam-ples and ideal exponential distributions (red) and betweenthe samples and their smallest enclosing chief-clan structure(blue), the latter for different threshold values. Species withoccupancies below the threshold are not considered for thedistance calculation.

chief-clan structure, essentially until α reaches 10−2. Atthe same time, populations approach equilibrium distri-butions, because autocatalysis becomes too slow to gen-erate chief-clan structures for the novel species createdby random ligation. For high enough α, deviation fromthe exponential equilibrium distribution is again mainlycaused by rare long strands, whose expectation numberfrom equation (7) is well below a single molecule.

These results indicate that chief-clan structures can-not develop above a critical random ligation rate lessthan β−1/2. Approaching this critical threshold, chief-can structures become succeedingly obscured by randomligation products in low copy number.

C. Rare random ligation under strongautocatalysis

With the above insights we now concern ourselveswith the prebiotically interesting parameter regime wherecomparatively rare random ligations occasionally intro-duce novel species into populations with strong auto-catalytic replication. When starting from a pool ofmonomers, random ligation will create longer species thatget amplified via autocatalytic replication and generatetheir clans via decomposition. We expect this process togenerate chief-clan structures satisfying equation (9) of asize that is just able to accommodate the overall amountof material.

Our question now becomes: does autocatalytic repli-cation with rare random ligation generate all of the over-whelmingly many potential chief-clan structures, or isthere some reason for some chief-clan structure to bemore prominent than others?

Using the results of 1000 simulation runs (simulateduntil t = 100), we perform single linkage hierarchical

6

clustering based on cosine distance

d(x, x′) = 1−∑k∈A∗ xkx

′k

|x||x′|. (14)

We find that about 75% of the simulation results concen-trate in six clusters with characteristic population struc-tures, shown in figure 3. This is a very strong selectionpressure, given that all our kinetic equations are sequenceagnostic—especially under the light of the results of theprevious discussion.

Interestingly, the selected populations have very fewchiefs composed of short repeating sequence motifs suchas 01 in cluster 3(a) (bootlace), 0011 in cluster 3(b) and(f) (pinecone1), 00001111 and 11110000 in cluster 3(c)and (d) (pinecone2). In about 2% of the simulation runs,we even observe complete demixing of the monomers intohomopolymers composed exclusively of 0’s and 1’s re-spectively (cluster 3(e), twotowers).

We also observe that some clusters exhibit an “in-verted” population structure where chiefs are more pop-ulated than other clan members (3(c), (d), and (f)). Thisis due to the boundary condition where the total mate-rial amount is too low to maintain a longer chief but toohigh for the chief concentration to be comparable to theequilibrium value. As a consequence, the highly popu-lated chief will more likely create—via random ligation—species outside its clan structure, which either die outor cause the whole population to transition into anotherstructure. Indeed, we have observed in simulation thatthese metastable structures exhibit stronger random fluc-tuations and potentially transition into completely dis-tinct chief-clan structures [21].

We emphasize that the stochastic simulation resultsare rather unexpected, if not even in apparent contra-diction to the results of the linear stability analysis de-rived above. If linear stability theory allows for almostany chief-clan structure to be (at least meta-)stable, howcome that stochastic simulation exhibits such a strongbias toward selecting very few and very regular popula-tion structures? What constitutes this “survival of thedullest”?

III. MECHANISMS OF STRUCTURESELECTION

We now propose two possible explanations for the sym-metry breakdown in this replicator ecology: On the onehand, we identify a selection bias in the transient processof creating equilibrated, non-inverted chief-clan struc-tures. On the other hand, we identify a selection bias inthe steady state dynamics of an established equilibratedpopulation. Both selection biases steer the system towardhighly repetitive sequence patterns, and we hypothesizethat both of them are responsible for the strong selectionobserved in stochastic simulation.

(a) 47.6%0

0 1 1 0

0 1 0 1 0 1

1

0 1 0 1 1 0 1 0

0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 1 0 1 0 1 0

0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0101010101 1010 10 10 10

01010101010 10101010101

(b) 16.7%0

0 0 0 11 0

0 0 11 0 0 0 1 11 1 0

1

1 1

0 0 1 11 0 0 1 1 1 0 0 0 1 1 0

0 0 1 1 01 0 0 1 11 1 0 0 1 0 1 1 0 0

0 0 1 1 0 01 0 0 1 1 01 1 0 0 1 1 0 1 1 0 0 1

0 0 1 1 0 0 1 1 0 0 1 1 0 01 1 0 0 1 1 00 1 1 0 0 1 1

0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 11 1 0 0 1 1 0 00 1 1 0 0 1 1 0

(c) 4.3%0

0 0 1 0

0 0 0 1 0 0 1 1 0

1

1 1

1 1 1

0 0 0 0 1 0 0 0 1 1 0 0 1 1 1 0 1 1 1 1

1 0 0 0 0 1 1 0 0 0 1 1 1 0 0 1 1 1 1 0

1 1 0 0 0 0 1 1 1 0 0 0 1 1 1 1 0 0

1 1 1 0 0 0 0 1 1 1 1 0 0 0

1 1 1 1 0 0 0 0

(d) 2.5%0

0 0 0 1 1 0

0 0 0 0 0 1 0 1 1

1

1 1

1 1 1

0 0 0 0 0 0 0 1 0 0 1 1 0 1 1 1 1 1 1 1

0 0 0 0 1 0 0 0 1 1 0 0 1 1 1 0 1 1 1 1

0 0 0 0 1 1 0 0 0 1 1 1 0 0 1 1 1 1

0 0 0 0 1 1 1 0 0 0 1 1 1 1

0 0 0 0 1 1 1 1

(e) 2.0%

0

0 0

0 0 0

1

1 1

1 1 1

0 0 0 0 1 1 1 1

0 0 0 0 0 1 1 1 1 1

0 0 0 0 0 0 1 1 1 1 1 1

0 0 0 0 0 0 0 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1

0000000000 1111111111

00000000000 11111111111

(f) 1.8%0

0 0 0 11 0

0 0 11 0 0 0 1 11 1 0

1

1 1

0 0 1 11 0 0 1 1 1 0 0 0 1 1 0

0 0 1 1 01 0 0 1 11 1 0 0 1 0 1 1 0 0

0 0 1 1 0 01 0 0 1 1 01 1 0 0 1 1

1 0 0 1 1 0 01 1 0 0 1 1 0

1 1 0 0 1 1 0 0

(g) 1.6%0

0 0 0 1 1 0

0 0 1 1 0 00 1 0 0 1 11 0 1 1 1 0

1

1 1

0 0 1 0 1 0 0 1 1 1 0 00 1 0 1 0 1 1 01 0 1 1

0 0 1 0 11 0 0 1 0 1 1 0 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 1 0

0 0 1 0 1 11 0 0 1 0 1 1 1 0 0 1 0 0 1 0 1 1 00 1 1 0 0 1 1 0 1 1 0 0

1 1 0 0 1 0 1 0 1 0 1 1 0 00 1 1 0 0 1 0 1 0 1 1 0 0 1

FIG. 3. The result of hierarchical clustering for 1000 pop-ulations produced by simulation; Named: (a) bootlace, (b)pinecone1, (c) pinecone2, (d) pinecone2, (e) two-towers, (f)pinecone1. Cluster (g) has no name and there are many simi-lar structures. Note that (b) and (f), and, (c) and (d) belongto the same symmetry. These seven clusters occupy 76.5% ofall the population produced. Species having less than 10% ofthe constant solution are omitted. Simulations were run withα = 10−10, β = 10−7, and x0(0) = x1(0) = 200 000.

A. Selection in locally equilibrated populations

We first discuss the equilibrium scenario. Consider anequilibrated population structure that satisfies (9) andcomprises two chiefs a and b. The clans of these chiefs

7

always overlap (because monomers take part in all clans)and the overlap forms a pool of common resources forthe two clan structures to compete over. As in the endof the previous section, we assume random ligations to besufficiently rare and use equilibrated chief-clan structuresas initial conditions.

It is illustrative to first investigate a minimal systemof four species, where two chiefs 01 and 10 compete overthe shared resources 0 and 1. We know from the abovethat mass conservation of the two monomers fixes thissystem on a two dimensional plane, which features acontinuous line of marginally stable steady states dueto condition (9). The line connects the two boundarystates, formed by populations with only one survivingchief species.

In a stochastic treatment of this system, the line of con-nected equilibria, translates into a drift term that van-ishes for equilibrium solutions. Unlike in the determinis-tic treatment, random fluctuations can now redistributematerial along the line of marginal steady states with vir-tually no deviation from equilibrium. The two boundarystates are absorbing boundaries, as the walk terminates ifone of the competing chiefs becomes extinct (and cannotbe repopulated in the absense of random ligations).

Interestingly, diffusion of the random process also be-comes position dependent: randomly converting a singlechief molecule from one species to the other requires theabundance of both chief species, one entering as reac-tant to be degraded, the other as catalyst to enhace itsown ligation. Diffusion is therefore the highest when theproduct of the two chiefs is maximized and goes towardzero when approaching the boundaries, where either ofthe two species becomes sparse.

In the appendix, we develop a one dimensional ran-dom walk, constrained to the line of connected steadystates, that approximates the two dimensional case. Forthis approximation, we are able to analytically derive aFokker-Planck equation that features zero drift and po-sition dependent diffusion:

∂

∂tP (x01, t) ≈

∂2

∂x201

[D(x01)P (x01, t)] (15)

where D(x) = x01x10−1x01+x10

.The parabolic diffusion term has a remarkable effect

on the first hitting time, i.e. the typical time of chief co-existence. Because diffusion is higher in the center of themanifold (where both chiefs are populated about evenly)the system is more prone to leave this state and movetoward states of lower diffusion. This diffusion inducedeffective drift drives the system toward regions where onechief outnumbers the other. The average time for the sys-tem to reach one of the absorbing states is therefore or-ders of magnitude shorter than for a random walk processwith comparable, but constant diffusion. Figure 13 of theappendix compares the cumulative probabilty functionsof the hitting time distributions for the original two di-mensional system, our one dimensional approximation,and a random walk process with constant diffsuion.

Perheaps contrary to expectations raised by the pre-vious linear stability results, we now see that there is anatural tendency for minority species to be driven intoextinction. This effective drift is ultimately induced byposition dependent diffusion along the line of competingchief populations.

Note that this effective drift can only be observedamong chiefs that can absorb all the material of theircompetitor: otherwise, both chiefs can survive. For ex-ample, two chiefs 010 and 101 can coexist because neithercan absorb all the material of the other. Yet, deviationsin the chief-clan structure from the constant solution cansometimes absorb a chief with different material than itscompetitor, as we discuss now.

Figure 4 shows how a short chief, 00, can be driveninto extinction as the result of competion with longerchiefs: simulations are initialized with equilibrated boot-lace structures of varying length, plus 00, and measurethe probability of survival of the shorter chief after 1000time units as a function of its initial concentration forvarying length of the longer chief. In this case, chiefs ina bootlace structure, such as 0101, cannot absorb all thematerial of 00, but the short chief can nevertheless be ab-sorbed by fluctuations of the chief-clan structure, whichin turn deviates from the exact solution 9. The simu-lation shows the clear tendency of extinction of shorterclans in competition.

While the above reveals the source of competition andextinction in our model, and indicates that simulationsshould generate populations with few chiefs, it does notexplain why the dynamics select those particular chiefsthat feature simple repetitive motifs.

An answer to this question lies in the way perturba-tions are distributed among competing clans, as shownschematically in figure 5: When species in the overlapof competing clans are perturbed through random de-composition of longer strands, detailed balance causes anopposing flow from the perturbed species toward longerstrands. This outflow can be partitioned into three com-ponents: material flowing into species belonging exclu-sively to clan A, clan B, or their overlap C:

∂fA∪B∪Cl

∂xl=∂fl∂xl

. (16)

where we have introduced the “partial” flow∂fA

l

∂xlfrom l

into the species in A:

∂fAl∂xl

∣∣∣∣x∗

= −

∑l.j∈Aj.l∈A

1 +∑l.l∈A

2

≈ − ∑l.j∈Aj.l∈A

1. (17)

Thus, the flow into A (B) scales with the number ofstrands in A (B) that have the perturbed strand as theirprefix or suffix. Since regular chiefs can form longerstrands with more repetitive sequences than irregularchiefs, we expect regular chiefs to outperform irregularones in this competition.

8

(a)

0

0.2

0.4

0.6

0.8

1

0 100 200 300

Psv

x00(0)

(b)

0

50

100

150

200

250

2 4 6 8 10 12 14

x∗ 00

|m|

FIG. 4. (a) Probability of survival, Psv, after t = 1000 for00 coexisting with a bootlace structure (figure 1) with differ-ent chief length, depending on its initial concentration x00(0).The chief length are: |m| = 4 (×), |m| = 6 (∗), |m| = 8 (�),|m| = 10 (�), and |m| = 12 (◦). The solid curves are fittederror functions. (b) The threshold value, x∗00, where Psv = 0.5against the chief length |m|. The solid curve is a fitted loga-rithmic function.

Summary In equilibrium strands with high popula-tion numbers outcompete low populated strings in com-petition for resources and this makes it difficult for ran-domly created chiefs to survive. In some cases this pro-cess can be described by a Fokker-Planck equation, whichexpresses a position dependent diffusion. Further, longclans can absorb material faster than short clans becausethey offer more reaction pathways to absorb fluctuations(due to an increased number of prefix/suffix matches).With a given amount of material, regular chiefs can formlonger clans than irregular ones.

B. Selection far from equilibrium

We now discuss the transient dynamics of a typicalprocess of structure generation that starts from a pool ofmonomers, where—as before—random ligation is smallenough for the system to locally equilibrate after eachrandom ligation event. Thus, after spontaneous forma-tion of an intermediate chief through random ligation, thenew chief will accumulate material via autocatalysis inorder to satisfy the equilibrium condition (9). The resultis an inverted population structure where chief species

C

B

A

1

01 1011

0

00

010011 001 101

1

110 100

11 19

0010 0100 1010 01010110 0011 1001

00110 10011 0010001001

001001010011100110

00100110100110

00100110

01010 10101

010101 101010

1010101 0101010

01010101 10101010

101010101 010101010

1010101010 0101010101

10101010101 01010101010

010101010101

FIG. 5. Visualization of the “partial flows” when a popu-lation in equilibrium state x∗ is perturbed by adding ∆x tospecies l = 10. In the case shown, dfl/dxl = −31 (red). Thisflow is distributed such that −dfA

l /dxl = 19 go into clan A,−dfB

l /dxl = 11 go into clan B, and −dfCl /dxl = 1 remain in

the overlap C. Green nodes show the recipients of materialwhere light green corresponds to 1, medium green to 2, anddark green to 4.

accumulate the majority of the material. Therefore, thenext ligation is more likely to happen among two chiefsthan among any other species in the population—thuscreating a strand with repeated sequence patterns [21].Irregular chiefs, on the other hand, require additionalsubstrands to be formed from now less abundant clanmembers. The process repeats until chief concentrationsbecome lower than the equilibrium solution, at whichpoint there is no surplus material to grow longer strands.

The above verbal argument can be refined into a quan-titative estimation of selection likelihoods: assuming thatpopulations grow by irreversible random ligation andsubsequent equilibration, the likelihood to obtain onepopulation from a previous one via a single ligation eventis proportional to the equilibrium abundance of the re-actants, which is largely determined by the equilibriumcondition (9). Assuming that the surplus material isevenly distributed among chief species, we can calculatethe probability to obtain any final population from theinitial condition by multiplying the conditional probabili-ties of ligations along each ligation pathway that can leadto the final population in question. However, becauseof the double exponential scaling of potential chief-clanstructures as a function of chief length, the reaction sys-tem leads to a combinatorial explosion that fastly exceeds

9

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

40000 80000 160000 320000 640000

pro

port

ion o

f cl

ans

pool size n

otherpinecone2

pinecone1two towers

bootlace

FIG. 6. Estimated likelihood to form populations of strandswith certain motifs as a function of the monomer pool sizewith balanced monomer compositions. Note the exponentialscale of the x axis. Overlaid bar charts show results of fullstochastic simulations for various system sizes. See text fordetails.

computing resources. Justified by equation (17) and theresults of figure 4, we introduce the additional assump-tion that a new chief can only form in a population if itis at least as long as any existing chief. With this ad-ditional assumption, the combinatorial problem can betackled.

We systematically perform these likelihood estimationsfor different initial monomer amounts and estimate the90% most likely ligation pathways. The resulting popula-tion structures are classified according to their sequencemotifs and compared to the results of stochastic simula-tions for selected pool sizes. The computed likelihoods—shown in figure 6—are generally in good agreement withthe simulations.

Summary During the transient equilibration movesmost material into chiefs. If there is a single chief, it willmost likely react with itself to create the next chief (sce-nario 01 + 01→ 0101). The key here are inverted popu-lation structures [21] where most material is in the chiefspecies. To make irregular sequences during the tran-sient, distinct building blocks are needed. If one chiefhas absorbed the material, there is less material for theother chief, thus reducing the propensity of forming theirregular chief (scenario 10 + 0→ 100). Again most ma-terial is in the chief. To make a particular irregular chiefduring the transient, the two substrings have to ligate incorrect order (scenario 10+0→ 100 vs. 0+10→ 010), sothe symmetry of the reaction network plays as only one oftwo possible reactions produce a non-repetitive sequence.Finally, regular chiefs require the build-up of fewer inter-mediate species, i.e. 01010101 requires only 01 and 0101,whereas 01110001 would require 01, 11, 00, 01, 0111 and0001 or a similar number of other intermediates.

10-210-1100101102103104105106

0.1 1 10 100 1000

acti

vity

A

0.40.50.60.70.80.9

11.1

0.1 1 10 100 1000

s ccs

time t

02468

10121416

0.1 1 10 100 1000

spec

ies

/ chi

efs chiefs

species

FIG. 7. Typical development of chief-clan structures (herebootlace) under rare random ligation. Parameters are chosenas in figure 2 with α = 10−7. Graphs show the number ofspecies and chiefs (top), an activity that measures how “in-verted” the population is (middle), and the distance to thesmallest enclosing chief-clan structure (bottom). See text fordetails.

C. Distinguishing the alternative selectionmechanisms

We have presented two alternative driving mechanismsthat might explain the selection of few chiefs with shortrepetitive motifs. Which of these mechanisms is respon-sible for the results presented in figures 2 and 3?

In order to discriminate the two selection mechanisms,we perform numerical simulations and measure—over thecourse of time—the number of species and chiefs, an ac-tivity measure A that measures the surplus of materialin chief species through the definition

A(x) =∑

xi∈∂A†x

( xix∗

)2 ( xix∗− 1)2

, (18)

as well as the cosine similarity sccs of the population withthe smallest enclosing chief-clan structure.

Figure 7 shows exemplary results for the formation of abootlace structure under the parameters of figure 2 withα = 10−7. The stepwise elongation of the bootlace struc-ture is reflected by a steady increase in species count, buta non-growing number of either one or two chiefs. Whilethe structure grows, the activity generally decreases, un-til it levels off around t = 30 where the structure isfully established. Each formation of new chiefs in thestructure leads to a temporary decrease in the otherwisemaximal cosine similarity to the corresponding chief-clan

10

structure. This decrease in similarity is caused by thetemporarily high population count of the previous chief,which now violates the constant solution. Similarity isregained with the autocatalytic flow of matter into thenewly formed chief, and is accompanied by a decrease inactivity.

These observations indicate that the selection of thebootlace structure occurs from the onset and is causedby the far-from-equilibrium selection mechanism that isactive during the transient period. Yet, once the struc-ture is fully developed around t = 30, the equilibriumselection mechanism stabilizes the developed populationand prevents the occurence of new chiefs. As a result,fluctuations in species and chief numbers, activity, andsimilarity are short-lived.

Visual inspection of several dozens simulation runsconfirms this general behavior independent of the natureof selected structure. The lower the random ligation rateα the more prominent the far-from-equilibrium selection.For the parameters chosen (c.f. figure 2), clear chief-clanstructures form and remain stable for α between 10−7

and 10−6. For α = 10−5, the dynamics still yield a tran-sient selection, but the selected chief-clan structures isno longer maintained in equilibrium, as random ligationsbecome too common. Finally, for α = 10−4 and above,stable chief-clan structures can no longer form (at leastwithout applying a threshold for the structure identifica-tion).

IV. MODEL VARIANTS

We now present several variations of our original reac-tion system to demonstrate that the reported selectionpressure is structurally robust and can be observed insimilar replicator systems that more closely resemble thechemistry of nucleic acid replicators.

A. Increased alphabet size

Alphabet size has a significant impact on the selectedpopulation structures. Here we show populations pro-duced from a pool of four types of monomers (A ={0, 1, 2, 3}). Although introducing new monomer typesmakes the system more complex quickly, the selection ofchief-clan structures does happen as shown in figure IV A.The six largest clusters contain about 25% of all the pop-ulations produced. They all feature permutations of 0123as motifs. They are thus a counterpart of the bootlacestructure with motif 01. As in the case of a binary al-phabet, symmetry is broken at the level of dimers andpropagates from there through the population structure.From the seventh cluster onwards [figure IV A(b)], struc-tures lose order and become more irregular.

(a) 5.3%0

0 3 2 0

0 3 1 2 0 3 1 2 0

1

1 23 1

3 1 2

23

0 3 1 22 0 3 1 1 2 0 3 3 1 2 0

0 3 1 2 02 0 3 1 2 1 2 0 3 1 3 1 2 0 3

0 3 1 2 0 32 0 3 1 2 01 2 0 3 1 2 3 1 2 0 3 1

0 3 1 2 0 3 12 0 3 1 2 0 31 2 0 3 1 2 0 3 1 2 0 3 1 2

0 3 1 2 0 3 1 2 2 0 3 1 2 0 3 11 2 0 3 1 2 0 33 1 2 0 3 1 2 0

0 3 1 2 0 3 1 2 0 2 0 3 1 2 0 3 1 21 2 0 3 1 2 0 3 13 1 2 0 3 1 2 0 3

0 312031203 2 0 312031 20 1203 1203 123120 312031

03120312031 20312031203 1203120312031203120312

(b) 4.3%0

0 2 3 0

0 2 1 3 0 2 1 3 0

1

1 32 1

2 1 3

2 3

0 2 1 33 0 2 1 1 3 0 2 2 1 3 0

0 2 1 3 03 0 2 1 3 1 3 0 2 1 2 1 3 0 2

0 2 1 3 0 23 0 2 1 3 01 3 0 2 1 3 2 1 3 0 2 1

0 2 1 3 0 2 13 0 2 1 3 0 21 3 0 2 1 3 0 2 1 3 0 2 1 3

0 2 1 3 0 2 1 3 3 0 2 1 3 0 2 11 3 0 2 1 3 0 22 1 3 0 2 1 3 0

0 2 1 3 0 2 1 3 0 3 0 2 1 3 0 2 1 31 3 0 2 1 3 0 2 12 1 3 0 2 1 3 0 2

0213 021302 3021302130 13021302132 1302130 21

02130213021 30213021302 1302130213021302130213

(c) 4.3%0

0 2 1 0

0 2 3 1 0 2 3 1 0

1

3 1

2 3 1

2

2 3

3

0 2 3 11 0 2 3 3 1 0 2 2 3 1 0

0 2 3 1 01 0 2 3 1 3 1 0 2 3 2 3 1 0 2

0 2 3 1 0 21 0 2 3 1 03 1 0 2 3 1 2 3 1 0 2 3

0 2 3 1 0 2 31 0 2 3 1 0 23 1 0 2 3 1 0 2 3 1 0 2 3 1

0 2 3 1 0 2 3 1 1 0 2 3 1 0 2 33 1 0 2 3 1 0 22 3 1 0 2 3 1 0

0 2 3 1 0 2 3 1 0 1 0 2 3 1 0 2 3 13 1 0 2 3 1 0 2 32 3 1 0 2 3 1 0 2

0 231023102 1 0 231023 10 3102 3102 312310 231023

02310231023 10231023102 3102310231023102310231

(d) 4.1%0

0 1 3 0

0 1 2 3 0 1 2 3 0

1

1 2

1 2 3

2

2 3

3

0 1 2 33 0 1 2 2 3 0 1 1 2 3 0

0 1 2 3 03 0 1 2 3 2 3 0 1 2 1 2 3 0 1

0 1 2 3 0 13 0 1 2 3 02 3 0 1 2 3 1 2 3 0 1 2

0 1 2 3 0 1 23 0 1 2 3 0 12 3 0 1 2 3 0 1 2 3 0 1 2 3

0 1 2 3 0 1 2 3 3 0 1 2 3 0 1 22 3 0 1 2 3 0 11 2 3 0 1 2 3 0

0 1 2 3 0 1 2 3 0 3 0 1 2 3 0 1 2 32 3 0 1 2 3 0 1 21 2 3 0 1 2 3 0 1

0123 012301 3012301230 23012301231 2301230 12

01230123012 30123012301 2301230123012301230123

(e) 3.5%0

0 3 1 0

0 3 2 1 0 3 2 1 0

1

2 1

3 2 1

2

3 2

3

0 3 2 11 0 3 2 2 1 0 3 3 2 1 0

0 3 2 1 01 0 3 2 1 2 1 0 3 2 3 2 1 0 3

0 3 2 1 0 31 0 3 2 1 02 1 0 3 2 1 3 2 1 0 3 2

0 3 2 1 0 3 21 0 3 2 1 0 32 1 0 3 2 1 0 3 2 1 0 3 2 1

0 3 2 1 0 3 2 1 1 0 3 2 1 0 3 22 1 0 3 2 1 0 33 2 1 0 3 2 1 0

0 3 2 1 0 3 2 1 0 1 0 3 2 1 0 3 2 12 1 0 3 2 1 0 3 23 2 1 0 3 2 1 0 3

0 321032103 1 0 321032 10 2103 2103 213210 321032

03210321032 10321032103 2103210321032103210321

(f) 3.0%0

0 1 2 0

0 1 3 2 0 1 3 2 0

1

1 3

1 3 2

2

3 2

3

0 1 3 22 0 1 3 3 2 0 1 1 3 2 0

0 1 3 2 02 0 1 3 2 3 2 0 1 3 1 3 2 0 1

0 1 3 2 0 12 0 1 3 2 03 2 0 1 3 2 1 3 2 0 1 3

0 1 3 2 0 1 32 0 1 3 2 0 13 2 0 1 3 2 0 1 3 2 0 1 3 2

0 1 3 2 0 1 3 2 2 0 1 3 2 0 1 33 2 0 1 3 2 0 11 3 2 0 1 3 2 0

0 1 3 2 0 1 3 2 0 2 0 1 3 2 0 1 3 23 2 0 1 3 2 0 1 31 3 2 0 1 3 2 0 1

0132 013201 2013201320 32013201321 3201320 13

01320132013 20132013201 3201320132013201320132

FIG. 8. The structures produced from simulations with A ={0, 1, 2, 3}. They all belong to the same motif, . . . 0123 . . .,and its permutations. All six permutations of the motif ap-pear in simulation, occupying in total 25% of all populations.After the 6th cluster, the structure becomes random. The pa-rameters used: α = 10−10, β = 10−6 with 2× 105 of 0, 1, 2,and 3. The hierarchical clustering was stopped when 30% ofpopulations are included into the largest 10 clusters. Speciesless than 30% of the constant solution are omitted.

B. Mutations

Evolution arises from variation and selection. In ourbasic model, variation is entirely due to random ligation.It is interesting to see how the reported sequence selectionin our model is affected when potential mutation opensan additional channel for variation. To this end, pointmutations are introduced as a reaction:

l′ +m′ + l.mc3|l.m|−−−−−−→ l′.m′ + l.m (19)

11

(a)

(b)

FIG. 9. The structures produced from simulations with mu-tation. Bootlace, (a), is more robust against mutation thanpinecone1, (b), which is disturbed significantly. The occu-pancy is 34% and 10% for bootlace and pinecone1, respec-tively, in 697 runs. The parameters used: α = 10−10, β =10−7, c3 = 10−11 with 2× 105 of 0 and 1.

where l′.m′ is one bit different from l.m, and c3/c2 =10−4.

Figure 9 shows the largest and second largest clustersfound in simulation. The largest cluster [figure 9(a)], oc-cupying 34% of all populations, is based on the bootlacestructure [figure 3(a)], whereas the second largest cluster[figure 9(b)], occupying 10%, is based on the pinecone1structure [figure 3(b)], judging from the tetramer andpentamer sequence motifs. In figure 9(a), despite thefairly large mutation rate, perturbations arising frompoint mutations only produce short additional chiefs,and the bootlace structure remains unperturbed in itslonger sequences. On the other hand, the distortion infigure 9(b) is significant. Actually, all the species shorterthan trimers appear. This indicates that the robustnessagainst mutations depends on the population structure,which likely connects to the different stabilities of thestructures.

C. Sequence complementarity

Sequence complementary hybridization is a character-istic feature of template directed nucleic acid replication.We introduce sequence complementarity into our modelby defining a complementarity operator · : A∗ → A∗

recursively as follows:

0 = 1 (20)

1 = 0 (21)

l.m = l.m (22)

This allows us to replace the original template directedligation reaction (3) by its sequence-complementarycounterpart:

l +m+ l.mc2−−−→ l.m+ l.m. (23)

Figure 10 shows a result of hierarchical clustering out of1000 runs. The seven clusters shown in figure 10 occupyabout 70% of all the clusters produced. This value isslightly smaller than the case in figure 3, but still signif-icant fraction, indicating a strong dynamical selection.On the other hand, the occupation of each cluster be-comes different from the case in figure 3. The bootlacestructure, for example, is still a biggest cluster, but itoccupies only about 11% in comparison with 48% shownin figure 3. Pinecone1 [figure 10(b) and (c)] and two-towers [figure 10(d)] increase their occupancy. Three newclusters [figure 10(e), (f), and (g)] appear as replacingthe bootlace’s occupancy. One also notices that all thestructures shown are symmetric about the center axis,reflecting the complementarity.

D. Existence of inert food species

Several previous studies assume the existence of foodspecies [17, 18] that cannot be produced by autocatalysisand thus have to be present in the system in the begin-ning or have to be fed continuously into the system. Inour basic model, monomers are the only food species. Wenow analyze a variant where any species up to a certainlength lfood are food species that cannot undergo auto-catalysis. As a consequence, the system does not have tosatisfy the constant solution (9).

To examine the effect of food species, we conducted astability test. First, a bootlace structure including foodspecies, with two chiefs of length |m| is constructed as-suming that all species has the same concentration givenby the constant solution. Then the system is equili-brated. Figure 11 shows a diagram indicating whetherthe bootlace structure is stable or not, depending on |m|and lfood. It can be seen that the structure is stable if thelength of the chief is long enough. The neccesary lengthstabilizing the bootlace is roughly linearly proportionalto lfood.

Figure 12 shows a distribution of species in equlibrium.Crosses show a distorted bootlace structure, where thedistribution of food species decreases exponentially. Forautocatalytic species, the constant solution is approxi-mately recovered with slightly larger values.

If we start simulations from a pool of monomers,monomers must spontaneously ligate to become longer

12

(a) 10.7% (b) 10.5%

(c) 10.3%

(d) 10.2%

(e) 9.8%

(f) 9.5%

(g) 9.2%

FIG. 10. The structures produced by sequence complemen-tary autocatalysis. The parameters used: α = 10−10, β =10−7 with 2 × 105 of 0 and 1. These seven clusters occupyabout 70% of all the populations produced.

than lfood to initiate autocatalytic reactions. It is unlikelythen, that autocatalytic species appear spontaneouslyunless the concentration of monomers is extremely high.Circles in figure 12 show the distribution when the sim-ulation is started with the same amount of monomers asabove, but without a pre-constructed bootlace structure(crosses in figure 12). Only dimers appear in this case.Therefore, it is unlikely, if not impossible, for this sys-tem to achieve spontaneously autocatalytic species andthe resulting distribution maintained by autocatalysis.

This barrier between autocatalytic and non-autotacalytic states depends strongly on the size offood species. The longer the food species, the higher thebarrier. This observation is important, for example, toa real chemistry or origin-of-life scenarios. The available

2468

1012141618

2 3 4 5 6 7 8 9 10

|m|

lfood

FIG. 11. A stability diagram of the bootlace structures withtheir chief length |m| against the maximum length of foodspecies, lfood. Crosses and circles represent where they arestable and unstable, respectively.

0.01

0.1

1

10

100

1000

2 4 6 8 10 12 14 16 18

〈x〉/103

|k|

FIG. 12. The averaged distribution of species, 〈x〉, dependingon their length, |k| when lfood = 8. Crosses represent a casestarting from a pre-built bootlace structure, whereas circlesrepresent a case starting from a pool of monomers. Solid linesrepresent exponential functions. Dashed line is the constantsolution, β−1/2.

monomer concentration determines how long the foodspecies is tolerated to achieve an autocatalyticallymaintained state.

V. DISCUSSION

We have presented a simple model of binary self-replicating polymers, in which exact autocatalysis givesrise to the formation of characteristic chief-clan struc-tures, where the sequence information of the longestreplicators completely determines the population distri-bution of the molecular ecosystem. When starting froma pool of monomers, only few, highly regular sequencepatterns are selected by the dynamics.

13

We attribute this selection for repetitive sequencesto stems in part from reaction symmetries and inpart from a transient concentration of material in thelong sequences. Due to the reaction conditions dur-ing the transient, most material is concentrated in thelong sequences, the chiefs. In the transient forma-tion of the longer sequences, mainly from the abundantchiefs,repetitive sequences both have more and shorterreaction pathways due to their symmetries. This is trueboth in the ligation of shorter sequences into longer se-quences and in the breakdown of longer sequences intoshorter sequences. These are the reasons why the produc-tion of repetitive sequences are favoured in such systems.

Notably, this dynamical selection is intrinsic to ourmodel and does not require an a priori fitness function.This is important for the study of, for example, prebioticevolution and origins of life [26–28], as well as protocellresearch [6, 29, 30], since it implies that the selectionoccurs by means of both environmental as well as inter-nal factors. This aspect was not elucidated by previousstudies where selection was either induced by means ofan external fitness function or by assigning randomly at-tributed catalysis rates.

It is commonly understood that random fluctuationsare mostly significant in systems with small molecularcopy numbers and that, in the limit of large numbers,results of stochastic simulations typically approach ex-pectation values that can be obtained from equivalentdeterministic models. The situation is radically differentin the system presented in this study: firstly, autocat-alytic replication amplifies small random fluctuations inthe transient, which ultimately determines the fate of en-tire populations on the system level; and secondly, steadystate population numbers are only determined by reac-tion constants, but not by the total material concentra-tion. Increasing the total monomer concentration will notincrease the concentration of molecular species in steadystate. Instead, the system will accommodate for the ad-ditional material by generating larger polymers, up to alength where material becomes sparse and thus remainssusceptible to copy number fluctuations.

Although our exact findings have been derived by an-alyzing a specific reaction system (exact autocatalysis),

we hypothesize that many of our general results also holdfor similar binary polymer models, for example those em-ploying sequence dependent replication rates [18], ran-dom cross-catalysis [15], or material flow.

Namely, if studied under conditions where the analy-sis of our work applies (sufficiently high molecule num-bers, dominant catalysis over random ligation, and rel-atively constant environment), any polymer chemistryis expected to approach some equilibrium distribution.The phase space profile of these equilibrium states willdictate which populations are stable and how matter isdistributed among their members. If the kinetics of thepolymer chemistry allows for multiple equilibria, selec-tion is likely to occur. Either because these are isolatedstable states and an initial population has to commit toone or the other; or because a smaller set of stable statesis selected by noise induced drift, as encountered in thechemistry studied in this article. For example, Bagleyet al. [15] report the emergence of “concentration land-scapes” in randomly catalyzing polymer networks, whichwe can now identify as equivalents to chief-clan struc-tures.

As long as selected populations introduce novel speciesthat provide resources and catalysts for higher units ofselection (i.e. longer polymers of a net-catalyzing set),the chemistry meets in principle the conditions for con-tinued selection dynamics during a transient that startsfrom monomers or short oligomers. In the model systemdiscussed in this article, continued selection is particu-larly prominent, since every newly created species is anauto-catalyst. Continued selection is expected to stallonce the chemistry does not introduce novel resources ornovel catalysts. For example, if the chemistry allowed foroverhangs around a ligation window, we expect the sym-metry breakdown to stall, once a suitable ligation sitehas emerged.

In contrast, we expect that the particular repetitivesequence motifs that we report here stem from symme-tries in the reaction graph of exact autocatalysis and arelikely not prominent in random catalytic chemistries.

The investigation of related chemistries employing se-quence dependent and randomly assigned catalysis is leftfor future research.

[1] M. Eigen, Naturwissenschaften 58, 465 (1971).[2] W. Gilbert, Nature 319, 618 (1986).[3] P. A. Monnard, in Prebiotic Evolution and Astrobiology,

edited by J. T.-F. Wong and A. Lazcano (Landes Bio-science, Austin, TX, USA, 2008).

[4] M. W. Powner, B. Gerland, and J. D. Sutherland, Nature459, 239 (2009).

[5] M. S. DeClue, P. A. Monnard, J. A. Bailey, S. E. Mau-rer, G. E. Colins, H. J. Ziock, S. Rasmussen, and J. M.Boncella, J. Am. Chem. Soc. (2009).

[6] S. Rasmussen, A. Constantinescu, and C. Svaneborg,Phil. Trans. R. Soc. B 371, 20150440 (2016).

[7] M. Eigen and R. Winkler-Oswatitsch, Naturwis-senschaften 68, 217 (1981).

[8] P. F. Stadler, W. Fontana, and J. H. Miller, Physica D:Nonlinear Phenomena 63, 378 (1993).

[9] R. Hanel, S. Kauffman, and S. Thurner, Physical re-view. E, Statistical, nonlinear, and soft matter physics72, 036117 (2005).

[10] A. Filisetti, A. Graudenzi, R. Serra, M. Villani, R. M.Fchslin, N. Packard, S. A. Kauffman, and I. Poli, Theoryin Biosciences 131, 85 (2012).

[11] S. Rasmussen, Aspects of Instabilities and Self-Organizing Processes, PhD thesis, Technical University

14

of Denmark, Lundby (1985).[12] J. D. Farmer, S. A. Kauffman, and N. H. Packard, Phys-

ica D 22, 50 (1986).[13] S. A. Kauffman, J. Theor. Biol. 119, 1 (1986).[14] S. Rasmussen, E. Mosekilde, and J. Engelbrecht, in

Structure, Coherence and Chaos in Dynamical Systems(1989) pp. 315–331.

[15] R. J. Bagley and J. D. Farmer, in Artificial Life II, editedby C. G. Langton, C. Taylor, J. D. Farmer, and S. Ras-mussen (Addison-Wesley, 1991) pp. 93–140.

[16] C. Fernando, G. v. Kiedrowski, and E. Szathmry, J. Mol.Evol. 64, 572 (2007).

[17] W. Hordijk and M. Steel, Journal of Theoretical Biology295, 132 (2012).

[18] J. Derr, M. L. Manapat, S. Rajamani, K. Leu, R. Xulvi-Brunet, I. Joseph, M. A. Nowak, and I. A. Chen, NucleicAcids Research 40, 4711 (2012).

[19] W. Hordijk, arXiv:1612.02770 [q-bio] (2016), arXiv:1612.02770.

[20] W. Hordijk and M. Steel, Origins of Life and Evolutionof Biospheres 44, 111 (2014).

[21] S. Tanaka, H. Fellermann, and S. Rasmussen, EPL (Eu-rophysics Letters) 107, 28004 (2014).

[22] G. von Kiedrowski, B. Wlotzka, J. Helbing, M. Matzen,and S. Jordan, Angewandte Chemie International Edi-tion in English 30, 423 (1991).

[23] E. Szathmry and I. Gladkih, Journal of Theoretical Bi-ology 138, 55 (1989).

[24] H. Fellermann and S. Rasmussen, Entropy13, 1882 (2011), http://www.mdpi.com/1099-4300/13/10/1882Article.

[25] M. Wu and P. G. Higgs, Journal of Molecular Evolution69, 541 (2009).

[26] G. F. Joyce, Orig. Life Evol. Biosph. 14, 613 (1984).[27] R. R. Breaker and G. F. Joyce, Proceedings of the Na-

tional Academy of Sciences 91, 6093 (1994).[28] N. Vaidya, M. L. Manapat, I. A. Chen, R. Xulvi-Brunet,

E. J. Hayden, and N. Lehman, Nature 491, 72 (2012).[29] W. Szostack, D. P. Bartel, and P. L. Luisi, Nature 409,

387 (2001).[30] H. Fellermann, in Principles of Evolution (Springer

Berlin Heidelberg, 2011) pp. 261–280.

Appendix A: Fokker Planck derivation

We study the minimal competing system with fourspecies, where two competing chiefs 01 and 10 competeover resources 0 and 1. Mass conservation maintains thedynamics on a two dimensional manifold, essentially x01

and x10 (x0 = x1 can then be infered).For brevity, we introduce the notation X = 01, Y = 10,

with concentrations x = x01, y = x10, z = x + y, and in-distrimnately refer to monomers by A with concentrationa = x0 = x1.

Motivated by the observation that there is a lineof steady state solutions (any combination of x and ythat fulfills a = β1/2) that are all of marginal stabil-ity (zero eigenvalues), we introduce a simplification thatconstrains the system to maintain equilibrium by onlydistributing material among chiefs.

To achieve this, we construct a constrained stochas-tic process, where no two degradations nor two ligations

can follow each other without an intermediate event ofthe other type. Each net degradation-ligation event thusmaintains the initial distance from the equilibrum line.

Technically, we introduce a stochastic net process withtwo Markovian degradation reactions, and two delayedtime ligation reactions whose rates β are functions of thetime t− t′ since the last degradation event. With∫ ∞

t′β(t− t′)dt = β, (A1)

we ensure that the original ligation rate is maintained inthe long term. Informally, the constrained process canbe written as

X1−→ 2A

β(t−t′)x−−−−−→ X (A2)

X1−→ 2A

β(t−t′)y−−−−−→ Y (A3)

Y1−→ 2A

β(t−t′)x−−−−−→ X (A4)

Y1−→ 2A

β(t−t′)y−−−−−→ Y. (A5)

We now let

β(t−′ t)→ βδt−t′ (A6)

approach a Dirac distribution, so that ligation followsdegradation immediately.

Degradation events occur as previously with rate con-stant 1, but only a certain fraction (e.g. y/(x + y) forequation (A3)) lead to an actual change in the systemstate. The net conversion rates are therefore the degra-dation rate (having been set to one per unit time in thedimensionless system) times the probability to lead to asystem change. The net process therefore reads:

X(z−x)/z−−−−−→ Y (A7)

Yx/z−−→ X (A8)

Note that the total propensity of this net process, (z −x)/z + x/z = z, equals the total propensity of degrada-tions in the original process.

To compare the quality of this approximation, we sam-pled 1000 instantiations of the original 2D process and ofour 1D simplification and compare the cumulative prob-ability distributions of first hitting times for walks thatstart at x(0) = z

2 and reach one of the absorbing bun-daries 0 or z. Results are shown in figure 13.

The results reveal that the 1D approximation generateshitting time distributions with the same general shapethan the original process, but overestimates typical hit-ting times – that is, the unconstrained process typicallyterminates faster than the constrained process. The ap-proximation is more accurate for late hitting times, andapproaches the original hitting time distrubtion.

Figure 13 also reveals that both the original and theconstrained process complete orders of magnitude fasterthan a random walk where X transforms into Y and viceversa with constant rate z.

15

0

0.25

0.5

0.75

1

100 1000 10000 100000 1x106 1x107 1x108

cum

ula

tive

dis

trib

uti

on f

unct

ion

time

first hitting time from center

original 2D process1D approximation

1D constant diffusion

FIG. 13. Cumulative hitting time distributions of differentrandom walks starting in the inital state x(0) = z

2= x∗

2

for z =√

107. Distribution functions have been obtainedfrom 1000 numerical samples for each of purple: the original2D process defined by equations (1) and (3), green: our 1Dapproximation of equation (A7), and blue: a random walkwith constant diffusion for reference.

We now derive the Master and Fokker Planck equations for the stochastic process (A7)—(A8). Since mass isconserved in this system, this corresponds to a one dimensional random walk, here expressed for the species X=01:

. . . −−⇀↽−− x− 1(x− 1)(z − x+ 1)/z−−−−−−−−−−−−−⇀↽−−−−−−−−−−−−−

x(z − x)/zx

x(z − x)/z−−−−−−−−−−−−−⇀↽−−−−−−−−−−−−−(x+ 1)(z − x− 1)/z

x+ 1 −−⇀↽−− . . . (A9)

The stochastic process has the associated Master equation

∂

∂tP (x, t) = −2

x(z − x)

zP (x, t) +

(x− 1)(z − x+ 1)

zP (x− 1, t) +

(x+ 1)(z − x− 1)

zP (x+ 1, t) (A10)

= −2x(z − x)

zP (x, t)

+x(z − x)

zP (x− 1, t) +

x

zP (x− 1, t)− z − x

zP (x− 1, t)− 1

zP (x− 1, t)

+x(z − x)

zP (x+ 1, t)− x

zP (x+ 1, t) +

z − xz

P (x+ 1, t)− 1

zP (x+ 1, t) (A11)

=x(z − x)

z[P (x+ 1, t)− 2P (x, t) + P (x− 1, t)]

+z − 2x

z[P (x+ 1, t)− P (x− 1, t)]− 1

zP (x+ 1, t)− 1

zP (x− 1, t) (A12)

=x(z − x)− 1

z[P (x+ 1, t)− 2P (x, t) + P (x− 1, t)]

+z − 2x

z[P (x+ 1, t)− P (x− 1, t)]− 2

zP (x, t). (A13)

Approximating

P (x+ 1, t)− P (x− 1, t) ≈ 2∂

∂xP (x, t) (A14)

P (x+ 1, t)− 2P (x, t) + P (x− 1, t) ≈ ∂2

∂x2P (x, t), (A15)

16

we derive a Fokker Planck equation with zero drift and non-constant diffusion:

∂

∂tP (x, t) ≈ x(z − x)− 1

z

∂2

∂x2P (x, t) + 2

z − 2x

z

∂

∂xP (x, t)− 2

zP (x, t) (A16)

=z − 2x

z

∂

∂xP (x, t) +

x(z − x)− 1

z

∂2

∂x2P (x, t) +

z − 2x

z

∂

∂xP (x, t)− 2

zP (x, t) (A17)

=∂

∂x

x(z − x)− 1

z

∂

∂xP (x, t) +

x(z − x)− 1

z

∂2

∂x2P (x, t) +

∂

∂x

z − 2x

zP (x, t) +

z − 2x

z

∂

∂xP (x, t) (A18)

=∂

∂x

[x(z − x)− 1

z

∂

∂xP (x, t)

]+

∂

∂x

[z − 2x

zP (x, t)

](A19)

=∂

∂x

[x(z − x)− 1

z

∂

∂xP (x, t) +

∂

∂x

x(z − x)− 1

zP (x, t)

](A20)

=∂2

∂x2

[x(z − x)− 1

zP (x, t)

](A21)

=∂2

∂x2[D(x)P (x, t)] (A22)

with non-constant diffusion D(x) = x(z−x)−1z . Note that diffusion is maximal for x = z/2, i.e. when both chiefs

are equally populated, and becomes negative for x = 0 and x = z, which reflects the fact that these are absorbingboundary conditions.

Documents

Sequence Selection by Dynamical Symmetry Breaking … · Sequence selection by dynamical symmetry breaking in an autocatalytic binary polymer model Harold Fellermann Interdisciplinary