Mapping the Fitness Landscape of Gene Expression Uncovers ......Mapping the Fitness Landscape of Gene Expression Uncovers the cause of Antagonism and Sign Epistasis between Adaptive

Mapping the Fitness Landscape of Gene ExpressionUncovers the cause of Antagonism and Sign Epistasisbetween Adaptive MutationsHsin-Hung Chou1,2¤a, Nigel F. Delaney1¤b, Jeremy A. Draghi3,4, Christopher J. Marx1,5*¤c

1 Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America, 2 Institute of Molecular Systems Biology,

ETH Zurich, Zurich, Switzerland, 3 Department of Zoology, University of British Columbia, Vancouver, Canada, 4 Department of Biology, University of Pennsylvania,

Philadelphia, Pennsylvania, United States of America, 5 Faculty of Arts and Sciences Center for Systems Biology, Harvard University, Cambridge, Massachusetts, United

States of America

Abstract

How do adapting populations navigate the tensions between the costs of gene expression and the benefits of geneproducts to optimize the levels of many genes at once? Here we combined independently-arising beneficial mutations thataltered enzyme levels in the central metabolism of Methylobacterium extorquens to uncover the fitness landscape defined bygene expression levels. We found strong antagonism and sign epistasis between these beneficial mutations. Mutations withthe largest individual benefit interacted the most antagonistically with other mutations, a trend we also uncovered throughanalyses of datasets from other model systems. However, these beneficial mutations interacted multiplicatively (i.e., noepistasis) at the level of enzyme expression. By generating a model that predicts fitness from enzyme levels we couldexplain the observed sign epistasis as a result of overshooting the optimum defined by a balance between enzyme catalysisbenefits and fitness costs. Knowledge of the phenotypic landscape also illuminated that, although the fitness peak wasphenotypically far from the ancestral state, it was not genetically distant. Single beneficial mutations jumped straighttoward the global optimum rather than being constrained to change the expression phenotypes in the correlated fashionexpected by the genetic architecture. Given that adaptation in nature often results from optimizing gene expression, theseconclusions can be widely applicable to other organisms and selective conditions. Poor interactions between individuallybeneficial alleles affecting gene expression may thus compromise the benefit of sex during adaptation and promote geneticdifferentiation.

Citation: Chou H-H, Delaney NF, Draghi JA, Marx CJ (2014) Mapping the Fitness Landscape of Gene Expression Uncovers the cause of Antagonism and SignEpistasis between Adaptive Mutations. PLoS Genet 10(2): e1004149. doi:10.1371/journal.pgen.1004149

Editor: Mark Achtman, Warwick Medical School, United Kingdom

Received September 24, 2013; Accepted December 16, 2013; Published February 27, 2014

Copyright: � 2014 Chou et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permitsunrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: HC acknowledges support by an EMBO Long-Term Fellowship (ALTF 132-2010). This work was funded by an NIH award to CJM (GM078209).The fundershad no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

¤a Current address: Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom.¤b Current address: Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts, United States of America.¤c Current address: Department of Biological Sciences, University of Idaho, Moscow, Idaho, United States of America.

Introduction

The concept of a fitness landscape unites the three levels of

evolutionary change – genotype, phenotype, and fitness – into a

mathematical picture of the potential for, and constraints upon,

adaptive evolution. By mapping genotypes to a measure of fitness,

fitness landscapes guide our understanding of how epistasis –

nonlinear interactions between the fitness effects of mutations –

shapes evolution. Strong epistasis implies that landscapes are

rugged, with many peaks, or locally optimally genotypes [1,2]. The

magnitude and form of epistasis is predicted to determine the

number of evolutionary trajectories [3,4], the rate and repeatabil-

ity of adaptation [5–7], and the benefit of sex [8]. Recent

experimental work with a wide variety of model organisms has

revealed diminishing returns as a general trend of adaptation [9–

13], with relatively few cases of synergy [11,14] or sign epistasis

[15] (i.e., the same mutation being beneficial or deleterious in

different contexts [16]). Antagonism between adaptive mutations

might imply that these populations are summiting peaks in their

fitness landscapes with just a handful of genetic changes. This

explanation might lead to further trends, such as a negative

relationship between the initial selective coefficient of a mutation

and its epistatic interactions that could prove to be a useful

predictor of a saturating process of adaptation [17]. In order to

definitely link diminishing returns to the ascent of local peaks, as

well as to understand the existence of the peaks themselves, we

must understand the phenotypes that link genotype and fitness in

the adaptive landscape. Mathematically convenient formulations

such as Fisher’s geometric model for adaptation near a single peak

[18] have been used to interpret the trend toward antagonism

[19]. This approach assumes stabilizing selection a priori. What

remains unclear is what types of physiological interactions give rise

to fitness landscapes of varying shape and form, as well the

constraints upon mutational changes to underlying phenotypes.

Models of metabolic pathways have been amongst the most

successful in translating underlying biochemical phenotypes to

PLOS Genetics | www.plosgenetics.org 1 February 2014 | Volume 10 | Issue 2 | e1004149

http://creativecommons.org/licenses/by/4.0/

fitness. The contribution of enzyme activities upon metabolic flux

has been formalized via Metabolic Control Analysis (MCA)

[20,21]. The ability of this approach to predict the fitness

consequences of changes in enzyme properties has been verified

in experimental systems that vary from Escherichia coli in lactose-

limited chemostats to the flight properties of butterflies (reviewed

in [22]). Turning to multiple enzymes, MCA theory has suggested

a general trend toward synergistic interactions between activity-

increasing mutations in a metabolic pathway [23,24]. A major

limitation, however, has been that the costs of enzyme expression

[25] have not been included in classical MCA. Whereas the

dependence of flux through a metabolic pathway saturates with

increasing levels of a given enzyme, the costs will continue to

accumulate. The balance of these two selective factors will

generate an intermediate optimum, and thus stabilizing selection.

Inclusion of expression costs to MCA has enabled predictions of

the optimum levels of a single enzyme [26], and was used to

compare the differential utility of alternate, degenerate pathways

[27]. An open question, however, is how the balance between

catalytic benefits and expression costs plays out to optimize

enzyme expression across many enzymes simultaneously.

In order to study how evolution would simultaneously optimize

expression of multiple genes, we have developed a model system of

an engineered Methylobacterium extorquens AM1 (EM) in which we

altered its central metabolism to be dependent upon a foreign

pathway (Figure 1A for details). M. extorquens grows on methanol

by oxidizing it first to formaldehyde, and then through a series of

steps to formate, which is either fully oxidized to CO2 or

incorporated into biomass [28–32]. In the EM strain we removed

the endogenous pathway for formaldehyde oxidation in wild-type

(WT) [28] and replaced it with two genes encoding a foreign

pathway that oxidizes formaldehyde via glutathione (GSH)

derivatives [29]. Eight populations dependent upon this intro-

duced metabolic pathway evolved in methanol-containing medi-

um via serial transfers for 900 generations [10,33].

Adaptation of the unfit EM strain to grow on methanol

consistently involved beneficial mutations that altered expression

of the foreign GSH pathway (Figure 1B). When the GSH pathway

was introduced, the two enzymes were cloned together on a single

mRNA transcript behind a strong native promoter present on a

medium copy plasmid (,9 cell21) [10,29,34,35]. As such, the costs

of expression outweighed the catalytic benefits, and among the

targets of adaptation we identified by resequencing strains evolved

in separate populations, we universally obtained beneficial

mutations that decreased expression of these enzymes [10,34].

These mutations reduced expression of the GSH pathway through

three classes of underlying mechanisms: Class A decreased

expression per gene copy, Class B reduced gene dosage by

lowering plasmid copy number, and Class C integrated the

introduced pathway into the host genome, which also reduced

plasmid copy number [33,34] (Figure 1B). In terms of epistasis,

mutations in multiple genes along a single adaptive trajectory –

including one mutation (here ‘A1’) reducing expression of the

GSH pathway – have been shown to exhibit a general trend of

diminishing returns that was devoid of sign epistasis [10].

However, here we are interested in uncovering the trends and

mechanisms underlying epistatic interactions between mutations

that arose in separate adapting lineages and affect expression of

the same metabolic pathway.

We combined independently-arising beneficial mutations af-

fecting gene expression of this two-enzyme metabolic pathway and

report strong antagonism and sign epistasis for fitness. These

interactions were increasingly antagonistic for larger benefit

mutations. Such strong antagonism did not stem from the effects

of mutational combinations upon enzyme levels, but rather from

the nonlinear mapping between enzyme expression and organis-

mal fitness. By developing a quantitative model that relates

expression cost and catalytic benefit to fitness, we characterized

the overall shape of this fitness landscape and revealed that some

of these single mutations can optimize multiple phenotypes

simultaneously, leading to a big jump toward the single, global

optimum.

Results

Interactions between mutations affecting expression of atwo-enzyme metabolic pathway exhibit strongantagonism and sign epistasis

To explore the pattern of epistatic interactions between

beneficial mutations affecting expression of the GSH-dependent

pathway, we combined beneficial plasmid mutations that emerged

during experimental evolution and affected distinct traits [34]. We

focused upon Class A (decreased expression per copy) and B

(reduced gene dosage) mutations because of their genetic

tractability, and the prediction that these represent orthogonal

mechanisms to achieve lower expression. We hypothesized that

mutational combinations between these classes would result in

enzyme levels that would be the product of the individual

perturbations (Figure 1C). Three class A mutations, A1–A3, and

one class B mutation, B5, occurred independently, whereas B2 and

B3 were isolated together from the same plasmid. We generated

12 plasmids that paired each Class A mutation with each one from

Class B, as well as with the B2–B3 pair, and measured their

relative fitness via competitions with a fluorescently labeled

ancestor [34] (Tables S1, S2). The observed fitness values for

the mutational combinations were substantially less than expected

based upon a simple multiplicative null model incorporating the

single mutant effects (i.e., Wij = Wi6Wj; R2 = 0.53, adj-R2 = 0.32;

Figure 2A).

The increasingly strong antagonism for higher expected fitness

values suggested a potential negative relationship between the

selective coefficients observed for each mutation and the average

epistasis that mutation exhibited with other mutations. We

observed that the individually most beneficial mutations (large s)

engendered the greatest antagonism (e,0) when combined with

Author Summary

The pace and outcome of a series of adaptive steps in anevolving lineage depends upon how well differentbeneficial mutations stack on top of each other. We foundthat independent beneficial mutations that affected geneexpression for a metabolic pathway did not work welltogether, and were often jointly deleterious. The mostbeneficial mutations interacted the most poorly withothers, which was a trend we found common in otherbiological systems. Through generating a model thataccounted for enzymatic benefits and expression costs,we uncovered that this antagonism was caused by aphenotype to fitness mapping that had an intermediatepeak. This allowed us to predict the fitness effect of doublemutants and to uncover that the single winning mutationstended to move straight to the peak in a single step. Thesefindings demonstrate the importance of considering thephenotypic changes that cause nonlinear interactionsbetween mutations upon fitness, and thus influence howpopulations evolve.

Antagonism on a Gene Expression Fitness Landscape


Figure 1. Adaptive mutations that optimized the expression of the GSH-linked pathway. A) The GSH-linked formaldehyde oxidationpathway in the EM strain. The enzymes of the GSH-linked pathway are indicated with yellow arrows. MDH, methanol dehydrogenase; spont.,spontaneous reaction of formaldehyde and GSH; FlhA, S-hydroxymethyl GSH dehydrogenase; FghA, S-formyl-GSH hydrolase; FDHs, formatedehydrogenases. B) Adaptive mutations identified on the introduced plasmid (pCM410) expressing the introduced formaldehyde pathway. Mutationsoccurring in the predicted ribosome-binding site (bold text) of fghA, or its upstream region are shown in magenta (Class A). Mutations occurring inregions that control plasmid replication are shown in blue (Class B). The fghA start codon is underlined. trfA, a gene encoding the TrfA proteinessential to plasmid replication; PmxaF, promoter of the flhA-fghA gene cassette; oriV, origin of replication recognized by the TrfA protein; oriT, originof transfer. ISMex4 and ISMex25, two insertion sequences native to M. extorquens AM1. C) Diagram of orthogonal mechanisms of Class A and Bmutations on gene expression.doi:10.1371/journal.pgen.1004149.g001



other mutations, including several examples of sign epistasis

(Figure 2B).

The observed relationship between selective coefficientand average epistasis is observed for other biologicalsystems

Several theoretical arguments suggest that the geometry of

fitness landscapes might induce correlations between the size of a

mutation and the strength and direction of epistasis. Epistasis has

been observed to be coupled to the mean fitness effect of mutations

[36,37]. A single beneficial mutation of large effect may

appreciably change both the mean fitness of subsequent mutations

and the remaining distance to the optimum, potentially skewing its

own epistatic coefficients.

Given the emerging empirical consensus and theoretical

arguments for antagonistic epistatic interactions among beneficial

mutations, we analyzed several other datasets to ask whether the

strength and form of the relationship between selective effect and

average epistatic effect held for intragenic and intergenic datasets.

For this comparison we analyzed the relationship between s and efor previous datasets from M. extorquens and E. coli where the

beneficial mutations occurred consecutively in a variety of genes

across the genome of a single adapting lineage [10,11], combina-

tions of mutations from two genes of the bacteriophage ID11 [12],

and two datasets of within-protein interactions for b-lactamase

[17,38]. These datasets also displayed signs of a correlation

between s and increasingly negative e (as noted in [37]), with the

exception of the intragenic data for b-lactamase (Figure S1).

Negative trends in the relationship between initial selective

coefficient and epistasis may seem like obvious evidence for

diminishing returns. However, recent theoretical work has shown

that in models where mutations have random effects and no

tendency to be either synergistic or antagonistic, a pattern of

diminishing returns occurs between mutations if they are selected

conditioned on being beneficial in the ancestral background [39].

In Supplementary Text S1 we show this behavior in a simple

model of evolution on fitness landscapes with no mean epistatic

tendency and show how it leads to a pattern of diminishing returns

between beneficial mutations as a form of regression to the mean.

This analysis suggests that genotype-fitness data alone, without

knowledge of the phenotypic effects of mutations or the

physiological causes for trade-offs, might be insufficient to infer

the mechanism underlying a pattern of epistasis.

Pairs of orthogonal perturbations to gene expression actindependently upon enzyme levels

What physiological factors underlie the strong antagonism

observed between mutations affecting expression of the foreign

GSH pathway? A first possibility is that mutational combinations

lead to smaller changes in protein expression than expected from

the single mutants and that such antagonistic behavior at the level

of expression phenotypes merely propagated through as observed

antagonism at the level of fitness. Because we used combinations

that largely derived from pairing mutations that reduced

expression per copy (Class A) with those that decreased plasmid

copy number (Class B), our null hypothesis was that these

mechanisms should act independently to alter expression, such

that the expression level of an A+B mutant pair would simply be

the product of these two values. Consistent with this prediction,

enzyme levels were well described by the null model of

multiplicative independence between paired perturbations

(Figure 3A, Table S1). A simple linear model of log-transformed

changes in enzyme levels as a function of the presence of the single

mutations with no interaction terms explains much of the variation

for both FlhA and FghA (adjusted-R2 = 0.85 (FlhA) and 0.86

(FghA)). This predictability can be seen in the high correlation

between observed expression phenotypes for paired perturbations

and those expected based upon the single changes.

Figure 2. Epistasis at the level of expression phenotypes and fitness relative to independence or kinetic model. A) Fitness values ofmutation combinations are consistently lower than expected by multiplicative independence. B) Mutations with larger with selective advantages (s)when tested alone in the ancestor have more negative e values in combination with other mutations.doi:10.1371/journal.pgen.1004149.g002



A fitness landscape model that incorporates benefits andcosts predicts the fitness of mutational combinations

Since mutational combinations did not introduce epistatic

interactions at the level of gene expression, we built a model of the

fitness landscape based upon enzyme levels in order to ask how its

shape would contribute to antagonism. Building upon earlier work

on single enzymes [26] (Supplementary Text S2), we generated a

model of the fitness landscape that calculates fitness as flux through

the pathway above a threshold, minus the sum of two costs Cið Þ:

W~VMax

:FlhA

FlhAzEh

{Threshold{C1:FlhA{C2

:FghA

The hyperbolic expression for catalysis has been used before [40]

to effectively describe the dependence of steady-state flux to the

levels of a single enzyme and incorporates a ‘‘Vmax’’ term for the

pathway, and an Eh half-maximal enzyme level term. We only

model FlhA concentration as beneficial to fitness even though

FghA is absolutely required for growth on methanol [34]. This is

because, over the parameter range of our perturbations, fitness

appeared to rise monotonically with decreasing levels of FghA.

This suggests FghA is a typical enzyme that has a low metabolic

‘‘control coefficient’’ [20,21] and that it only limits catalysis at

exceptionally low levels. None of our perturbations pushed FghA

levels below 2%, and for comparison b-galactosidase levels in

lactose-limited chemostats only impacted fitness significantly if

they decreased activity to #1% [41].

The threshold flux term was added to the model to capture an

unusual right-shift of the typical relationship between enzyme

concentration and fitness observed with FlhA in these data, such

that fitness approached zero even in the presence of measurable

concentrations of functioning enzyme. We have observed similar

behavior when manipulating levels of the analogous enzyme in the

endogenous, tetrahydromethanopterin-dependent pathway for

formaldehyde oxidation in WT (SM Carroll, CJM, unpublished).

As both of these enzymes occur directly downstream of

formaldehyde production, this threshold phenomenon may be

explained by toxic effects of elevated steady-state formaldehyde

concentrations at low enzyme levels. Finally, there are two cost

terms for FlhA and FghA. The cost per molecule for each enzyme

was treated as a linear function, consistent with prior work [26,42].

The six parameters of this benefit - costs model were fit using

the data from the EM ancestor, single mutants, as well as strains

with inducible promoter plasmids (27 data points; Table S3). The

inducible promoter plasmids contained a cumate-responsive

repressor to modulate the levels of flhA-fghA from ancestral levels

to lower values (Table S1). These data were critical for capturing

the steep decline of the fitness landscape at low values of FlhA.

The resulting benefit - costs model captured the curvature of the

fitness landscape (Figure 4) and, unlike the simple multiplicative

model, it was able to predict the 17 combinations of mutations that

were not used for model fitting with high precision (R2 = 0.98)

(Figure 3B). From the perspective of the ancestral genotype, in the

model fitness rises gently with decreased expression of either

enzyme, but then declines rapidly upon reaching catalytically-

limiting levels of FlhA. A similar cliff exists for low values of FghA

[34], but at enzyme levels beyond the range of our dataset and

below the detection threshold of our enzyme assay method (see

Methods).

Beneficial mutations moved directly toward the globalexpression optimum rather than in the locally steepestdirection on the fitness landscape

Our fitness landscape model that precisely maps phenotypes to

fitness allowed us to explore how much local topography may have

influenced the direction of phenotypic change during evolution by

Figure 3. Mutations exert independent effects upon enzyme expression but a mechanistic model of benefits and costs of enzymesis required to predict combined effects on fitness. A) Expression levels (in mU) are well predicted by independent, multiplicative effects of eachmutation on the FlhA and FghA enzymes. B) Using a kinetic model of catalysis and costs parameterized with data from the single mutants andinducible promoter constructs predicts the mutational combinations very well (R2 = 0.98).doi:10.1371/journal.pgen.1004149.g003



de novo mutations (Figure 5). We compared the changes in enzyme

expression caused by single beneficial mutations relative to three

factors: 1) the local gradient in the fitness landscape for the

ancestor (greater decreases of FlhA versus FghA because the

former is more costly), 2) the direct vector pointing to the global

fitness optimum and 3) equal proportional changes between the

enzymes which might be expected due to the physical constraint of

their being expression from a single transcript. All mutations

moved toward the global optimum rather than ascend in the

phenotypically steepest direction on the local fitness landscape.

Mutations B2 and B5 affected copy number but, through

mechanisms we do not currently understand, led to greater

decreases in FghA than FlhA. In contrast, B3 was directly along

the line of equivalent change in both enzymes. This mutation was

identified along with B2 as a plasmid haplotype, and this B2–B3

combination allowed this lineage to accomplish a similar

phenotypic (and fitness) change as the other mutants.

Discussion

We found that combining adaptive mutations that optimize

expression of a two-enzyme pathway exhibited strong antagonistic

interactions and sign epistasis. Fitness values of mutational

combinations were generally less than expected relative to a null

model of independent, multiplicative effects upon fitness. We

further observed a negative relationship between s and e for

individual mutations. Other datasets of intragenic epistasis

revealed similar trends between s and e; however, this overall

trend of epistasis (e.g., antagonism) does not imply a specific

connection with properties of the individual mutations, such as s.

For example, this trend will also arise as a consequence of

regression to the mean when the beneficial mutations assayed are

conditioned to be beneficial in the ancestral background and the

effect of a mutation has a component that is independently

distributed on each possible genetic background. Therefore, to

extract biological insight from the quantitative relationship

between s and e , we must interrogate the mechanisms that lead

to antagonistic epistasis.

The first possible explanation for antagonism in our data would

be non-linearities in the way mutations combined to affect enzyme

expression. However, as expected from having chosen combina-

tions that combined class A mutations with those from class B,

these orthogonal mechanistic effects resulted in independent

effects on enzyme expression that were jointly well predicted with

a simple, multiplicative model. As in this system, many ecologically

relevant genes are encoded on plasmids whose regulation and gene

dosage may both be effected by separate sets of mutations. More

broadly, mutations that influence different traits that make joint

contribution to a higher phenotype such as fitness are common. At

the level of individual genes, for example, catalytic improvement

of an enzyme often results from the joint contribution of mutations

that improve protein stability and those that enhance kinetic

parameters [43–45].

The second factor that could generate antagonism is the

curvature of the underlying fitness landscape for gene expression.

Recent theory has shown that almost any formulation of fitness

based upon multiple underlying phenotypes will generate epistasis

at the level of fitness, even when the mutations – as we observed

here – do not interact epistatically on the underlying trait

phenotypes [46]. Previous models have formulated fitness as a

function of gene expression correctly predicted the evolution of

optimal levels of gene expression [26]. Here we extended this

model framework to multiple enzymes and used it for the first time

to interpret beneficial mutational effects from phenotype to fitness,

which we characterized both individually and in combination.

Our fitness landscape model was able to predict the fitness values

of the mutation combinations with high precision (R2 = 0.98). The

asymmetry in the curvature of this fitness landscape results from

the relatively gentle effects of expression costs relative to the sharp

transition in fitness effects due to rate limitation upon catalysis

[41]. This observed selection to maintain an intermediate

optimum of enzyme levels is distinct from the selective neutrality

Figure 4. Fitness landscape of the GSH-linked pathway. A) A two-dimensional heatmap and B) three-dimensional surface showing the shapeof the fitness landscape predicted by the model. The x and y axes are FlhA and FghA levels relative to ancestor (set at 100). Though the plasmid costis included in the model, this fourth dimension was corrected for to allow a three-dimensional visualization (see Methods). Experimental data pointsindicate the ancestor (asterisk), single mutants (grey circles), mutational combinations (white squares), and inducible expression vectors (blackcircles).doi:10.1371/journal.pgen.1004149.g004



on a catalytic plateau that was predicted by classical MCA

analyses that did not incorporate expression costs [40].

Knowledge of the underlying fitness landscape allows us to

understand aspects of the epistatic interactions not evident from

fitness values alone. For example, we observed that the A3 and B5

mutations had fairly comparable individual fitness values

(1.42060.032 vs. 1.45760.033; mean and 95% CI), but the

former had three-fold more antagonistic epistasis than the latter

(average = 20.46 vs. 21.34; t-test, p = 0.025; Figure 6). The

modeled fitness landscape illuminates the underlying reason for

this difference. Both mutations rest near the peak value of enzyme

expression, but on opposite sides (B5 has 70% the level of FghA as

A3). This poises B5 such that it is much more sensitive to further

reductions in expression than A3. Thus, although these two

mutations are essentially equivalent if one only considers their

fitness values, their locations in phenotypic space change their

likelihood for antagonism and sign epistasis.

One consequence of sign epistasis between mutations affecting

the phenotypes like gene expression is a reduction of the benefit

caused by recombination bringing beneficial mutations together

into the same genome (i.e., Fisher-Muller model). This tradeoff

between benefits and costs is inherent to gene expression, and thus

results in stabilizing selection. These patterns of epistasis are likely

very common, given the apparent ubiquity of stabilizing selection

upon gene expression from microbes to primates [47–50].

With a complete fitness landscape defined by biochemical

phenotypes, we can now interpret the genetic landscape in terms

of what was accessible to individual mutations. In contrast to how

selection acts upon standing genetic variation, the de novo

mutations fixed in experimental populations ignored the pheno-

typically-local best direction of change and ‘‘jumped’’ towards the

global optimum. This highlights that the classic quantitative

genetics intuition of climbing in the direction of the steepest

selection gradient [51], which is appropriate for small populations

containing standing genetic variation, fails to capture the

phenotypic potential of a sizeable pool of de novo mutations

arising from large populations. In our case, it was not the large

magnitude of expression change that was surprising, per se, but

change in the ratio of their expression. Given that flhA and fghA are

encoded on the same transcript, it was notable that all but one of

the single mutations down-regulated FghA to a large extent while

only cutting FlhA levels in half. Whereas this system allowed

Figure 5. Adaptive mutations tended to move phenotypes toward the global optimum. The direction of phenotypic movements forstrains with single mutations (red lines; enzyme levels of ancestor set to 100) are compared to the vectors indicating of the locally steepest fitnessgradient for the ancestor (blue), complete 1:1 correlation between phenotypes (black), and the global optimum (green). The direction of phenotypicmovement for all single mutations was closer to the vector for the global optimum relative to the local gradient around the ancestor.doi:10.1371/journal.pgen.1004149.g005



individual mutations to reach near-optimality, a multi-step

trajectory was required for the directed evolution of the LacI

repressor to reverse its regulatory logic [52]. In that system, the

first round mutation simply broke the old logic to become

constitutive, which in combination with two latter mutations

allowed the ‘‘anti-LacI’’ phenotype to emerge and locate the

fitness peak they predicted from a computational model. Our

results suggest that relatively large moves in multi-phenotype space

can emerge as winners, provided the genetic architecture at least

allows rare mutations to achieve this possibility.

Finally, the near optimal expression levels of these beneficial

mutations becomes even more remarkable when considering that

this optimization did not happen in isolation, but in adapting

populations that contained many beneficial mutations simulta-

neously [33,34,53]. In varying environments, such diversity in

large microbial populations may lead to genetically complex

adaptation such as stable polymorphisms [54]. In a stable

environment, this diversity leads to ‘clonal interference’ [55], a

type of serial fixation that effectively sorts for mutations of the

greatest effect amongst what was possible. This would have

impacted the mutations that affected GSH pathway expression in

two ways. Firstly, there were many different genetic solutions to

reducing expression of these enzymes [34]. One type in particular

- the Class C mutations that resulted from integration of the

introduced plasmid into the host chromosome – occur at very high

rates and emerged to detectable levels repeatedly, up to 17 times

per population [33]. These mutations confer ,O the benefit of

the Class A and B mutations [34], however, and were only found

to rise to fixation in three of eight populations despite more than

100 observed occurrences [33]. In this regard, clonal interference

aids finding optimal solutions by allowing only the best individual

mutations to fix. Recently, however, it has been shown that

fixation probability of contending mutations is only partly

dependent upon their individual effect because they commonly

hitchhike with other beneficial mutations present [56–58]. This

leads to a second effect of clonal interference, which is competition

between lineages carrying beneficial mutations affecting distinct

phenotypic processes. Indeed, the ancestral genotype faced a

variety of phenotypic challenges besides just optimizing expression

of the GSH pathway [14,59,60]. Some of these mutations in other

loci had beneficial effects up to 36 larger than those described

here [10] and were segregating at the same time as mutations

affecting expression of the GSH pathway [33,53]. Even with so

much turmoil in the populations, the eventual winners discovered

nearly optimal solutions to this local, two-enzyme expression

optimization in order to win the battle for fixation. Population size

thus contributed to the fixation of optimal solutions by both

increasing the number of mutations occurring and escaping drift in

the first place, and by facilitating competition between multiple

potential solutions. These factors conspired to allow selection to

reward – when mutationally possible – lineages that made long-

range, lucky jumps to distant peaks on the phenotypic landscape.

Materials and Methods

Experimental evolution and growth conditionsThe EM strain was generated previously by deleting the mptG gene

of M. extorquens AM1 in the white strain WT CM502 [61] lacking

carotenoid pigments due to an unmarked mutation in crtI (encoding

phytoene desaturase) [62], followed by introduction of pCM410

[10]. Eight replicate populations seeded by the EM strain were

grown in 9.6 ml methanol (15 mM) minimal media incubated in a

30uC shaking incubator at 225 rpm. Populations were transferred to

fresh media at a 1/64 dilution rate (thus six generations per growth

cycle, Nfinal<109) and propagated for 600 generations. One liter of

minimal media consists of 100 ml of phosphate buffer (25.3 g of

K2HPO4 and 22.5 g of NaH2PO4 in 1 liter of deionized water),

100 ml of sulfate solution (5 g of (NH4)2SO4 and 0.98 g of MgSO4 in

1 liter of deionized water), 799 ml of deionized water, and 1 ml of

trace metal solution. One liter of the trace metal solution consists of

100 ml of 179.5 mM FeSO4, 800 ml of premixed metal mix

(12.738 g of EDTA disodium salt dihydrate, 4.4 g of ZnSO4?7H2O,

1.466 g of CaCl2?2H2O, 1.012 g of MnCl2?4H2O, 0.22 g of

(NH4)6Mo7O24?4H2O, 0.314 g of CuSO4?5H2O, and 0.322 g of

CoCl2?6H2O in 1 liter of deionized water, pH 5), and 100 ml of

deionized water [14].

Plasmid and strain constructionAll strains and plasmids used are indicated in Table S2. All

plasmids constructed in this study were maintained in E. coli 10-

beta strain (New England Biolabs) and were transferred to M.

extorquens via electroporation [63] or tri-parental mating with the

helper strain pRK2073 [64]. Plasmid DNA in E. coli was extracted

using the QIAprep Spin MiniPrep Kit (Qiagen). The PmxaF

expression vector pCM160 [65], its variant pCM410 in the EM

strain expressing the flhA-fghA cassette [10], and the cumate-

Figure 6. Phenotype and not just fitness value determinesepistatic interactions. Mutations (A) A3 and (B) B5 had similar fitnessvalues, but resided on opposite flanks of the optimal phenotype (shownhere for simplicity as FlhA enzyme levels). In interactions with secondarybeneficial mutations (B2 and A2, respectively), B5 fares worse due tobeing closer to the fitness cliff that occurs when FlhA catalysis becomeslimiting.doi:10.1371/journal.pgen.1004149.g006



inducible vector pHC112 expressing the flhA-fghA cassette [34]

have been described previously. In order to combine Class A and

B mutations which accumulated on separate pCM410 derivatives

during experimental evolution of the EM strain (Figure 1B, Table

S1), Class B mutations (from pCM410B2B3, pCM410B2, pCM410B3,

and pCM410B5) were moved to pCM410 derivatives bearing Class

A mutations (pCM410A1, pCM410A2, and pCM410A3) through the

procedures delineated below. Fragments containing B2–B3, B2,

and B3 mutations were first obtained by digesting pCM410B2B3,

pCM410B2, or pCM410B3 with SfiI and NheI. These were then

ligated into the plasmid backbone of pCM410A1, pCM410A2, or

pCM410A3 cut with the same enzymes. A fragment containing the

B5 mutation was obtained by digesting of pCM410B5 with SfiI and

SexAI and then ligated into the plasmid backbone pCM410A1,

pCM410A2, or pCM410A3 cut by the same enzymes. The above

procedures were also applied to introduce B2–B3, B2, B3, and B5

mutations into pHC112 in order to generate pHC112 derivatives

that vary in their plasmid copy number.

Fitness assaysFitness assays were performed by a previously described

procedure [34]. Strains were first physiologically acclimated

through one 4-day growth cycle in 9.6 ml of minimal media

supplemented with 15 mM methanol. In addition, for strains

bearing cumate-inducible promoter plasmids (pHC112 deriva-

tives), different concentrations of cumate (Table S1) were added to

growth media to modulate the expression of FlhA and FghA

enzymes. After this acclimation phase, each of these strains was

mixed with a fluorescent variant (CM1232) of the EM ancestor

[10] by a 1:1 volume ratio, diluted 1/64 into 9.6 ml of fresh

growth media, and incubated in a 30uC shaking incubator at

225 rpm. The ratios of the two populations before (R0) and after

(R1) competitive growth were quantified by a LSR II flow

cytometer (BD Biosciences) for at least 50000 cell counts per

sample. The forward scatter threshold of LSRII was adjusted to

300 to ensure unbiased detection of the test and reference strains

despite their potential differences in cell size. Fitness values (W)

relative to the reference strain were calculated by a previously

described equation assuming an average of 64-fold size expansion

of mixed populations during competitive growth [35]:

W~logR1:64

R0

� �=log

1{R1ð Þ:64

1{R0ð Þ

� �

In order to convert to absolute differences in growth rate, the EM

ancestor grows under these conditions with a growth rate of

0.0654+/20.0016 h21 [10].

Enzyme assaysThe activities of FlhA [66] and FghA [67] were assayed in three

replicates as described using cells harvested from mid-exponential

phase cultures. Cells were collected through centrifugation at

10,0006 g for 10 min, frozen at 280uC, and used for enzyme

assays within a week. Right before assays frozen cell pellets were

suspended in 50 mM Tris-HCl buffer (pH 7.5) and physically

disrupted in tubes containing Lysing Matrix B and shaken at speed

6.0 m/s on a FastPrepH-24 bead beater (MP Biomedicals) for

40 seconds. Insoluble debris in the cell lysate was removed by

centrifugation at 13,0006 g, 4uC for 15 min. The total protein

concentration of the cell lysate was quantified using the Bradford

method [68]. Kinetic analysis of FlhA and FghA activities over

10 min at 30uC was performed in 200 ml reaction mixtures using a

SpectraMax M5 Plate Reader (Molecular Devices).

Quantification of plasmid copy numbersThe copy number of pCM410 derivatives in M. extorquens was

quantified by a real-time PCR approach described previously [34].

Briefly genomic DNA of M. extorquens from mid-exponential phase

cultures was extracted by an alkaline lysis method [69]. Detection

of plasmid DNA was targeted at the kan gene using primers

HC410p18 (59-GAAAACTCACCGAGGCAGTTCCATAG-39)

and HC410p19 (59-TCAGTCGTCACTCATGGTGATTT-

CTCA-39). Detection of chromosomal DNA was targeted at the

rpsB gene (encoding the 30S ribosomal protein S2) in the

chromosome META1 using primers HCAM111 (59-TGAC-

CAACTGGAAGACCATCTCC-39) and HCAM113 (59-TTG-

GTGTCGATCACGAACAGCAG-39). Real-time PCR experi-

ments were performed in three replicates with the PerfeCTa

SYBR Green SuperMix (Quanta Biosciences) on a DNA Engine

Opticon2 (MJ Research), and the average threshold cycle (Ct) of

each PCR reaction was determined using the Opticon Monitor v.

2.02 software (MJ Research). Each real-time PCR reaction

contained 25 ng of genomic DNA extracted from various strains

and kan- or rpsB-specific primers. To establish a standard curve

(SC) of plasmid copy numbers, 1, 0.1, 0.01, and 0.001 ng of

pCM410 (equivalent to 9.096107, 9.096106, 9.096105, and

9.096104 plasmid molecules, respectively) were mixed with 25 ng

of genomic DNA (equivalent to 3.036106 genome copies) of the

plasmid-less, white WT M. extorquens (CM502) [61]. The standard

curve is a plot of DCt (i.e. Ctkan–CtrpsB) versus plasmid molecules

on a log2 scale. For each strain, by interpolating its DCt value

against the SC the absolute quantity of plasmid DNA can be

estimated using the following equation:

Plasmid copy number

genome~2

DCt{SCinterceptSCintercept 7 3:03|106

� �

Model fitting and comparisonMultiplicative models predicting fitness or gene expression were

fit and assessed as standard linear models following a log

transformation of the response variable. The model for the fitness

landscape was fit using a non-linear routine in Matlab. The raw

data as well as commented code in Matlab and R that completely

recreates the analysis and figures has been deposited at www.

datadryad.org (doi:10.5061/dryad.8hb23).

Supporting Information

Figure S1 Trends of epistasis as a function of selective

coefficient. The epistasis values on fitness were calculated for

every mutational combination a mutation with a given value of s

was found in. The datasets represent: intragenic epistasis during

single adaptive trajectories of (A) M. extorquens [10] or (B) E. coli

[11], (C) intragenic and intergenic combinations of independent

mutations from the bacteriophage ID11 [12], or (D,E) two datasets

of intragenic combinations of E. coli b-lactamase alleles [17,38].

Note that ID11 data were given as doublings per hour, and we

therefore exponentiated with base two. These transformed data, as

well as the other four data sets, were treated as multiplicative

fitnesses; these values were normalized by dividing by the ancestor

fitness, and the epistasis of mutation i in background g was

calculated as eig = Wig2Wi6Wg.

(DOCX)

Text S1 Analysis of bias in epistasis as a function of selection

coefficient.

(DOCX)



www.datadryad.org

www.datadryad.org

Text S2 Derivation of the fitness model.

(DOCX)

Table S1 Genotypes and phenotypes of pCM410, pHC112, and

their derivatives.

(DOCX)

Table S2 Bacterial strains and plasmids.

(DOCX)

Table S3 The value of fitted parameters in the fitness landscape

model.

(DOCX)

Acknowledgments

We would like to thank Michael Desai, Sergey Kryazhimskiy, Daniel

Segre, Olivier Tenaillon, members of the Marx lab and two anonymous

reviewers for useful comments on the manuscript.

Author Contributions

Conceived and designed the experiments: HHC CJM. Performed the

experiments: HHC. Analyzed the data: HHC NFD JAD CJM.

Contributed reagents/materials/analysis tools: HHC NFD JAD CJM.

Wrote the paper: HHC NFD JAD CJM.

References

1. Wright S (1932) The roles of mutation, inbreeding, crossbreeding, and selection

in evolution. Proc VIth Int Congress of Genetics 1: 356–366.

2. Phillips PC (2008) Epistasis – the essential role of gene interactions in the

structure and evolution of genetic systems. Nat Rev Genet 9: 855–867.

3. Franke J, Klozer A, de Visser JAGM, Krug J (2011) Evolutionary accessibility of

mutational pathways. PLoS Comput Biol 7: e1002134.

4. Tenaillon O, Rodrıguez-Verdugo A, Gaut RL, McDonald P, Bennett AF, et al.

(2012) The molecular diversity of adaptive convergence. Science 335: 457–461.

doi:10.1126/science.1212986.

5. Kryazhimskiy S, Tkacik G, Plotkin JB (2009) The dynamics of adaptation on

correlated fitness landscapes. Proc Natl Acad Sci USA 106: 18638–18643.

doi:10.1073/pnas.0905497106.

6. Draghi JA, Parsons TL, Plotkin JB (2011) Epistasis increases the rate of

conditionally neutral substitution in an adapting population. Genetics 187:

1139–1152. doi:10.1534/genetics.110.125997.

7. Szendro IG, Franke J, de Visser JAGM, Krug J (2013) Predictability of evolution

depends nonmonotonically on population size. Proc Natl Acad Sci USA 110:

571–576. doi:10.1073/pnas.1213613110.

8. deVisser JAGM, Park SC, Krug J (2009) Exploring the effect of sex on empirical

fitness landscapes. Am Nat 174: S15–30. doi:10.1086/599081.

9. Sanjuan R, Moya A, Elena SF (2004) The contribution of epistasis to the

architecture of fitness in an RNA virus. Proc Natl Acad Sci USA 101: 15376–

15379.

10. Chou H-H, Chiu H-C, Delaney NF, Segre D, Marx CJ (2011) Diminishing

returns epistasis among beneficial mutations decelerates adaptation. Science

332: 1190–1192. doi:10.1126/science.1203799

11. Khan AI, Dinh DM, Schneider D, Lenski RE, Cooper TF (2011) Negative

epistasis between beneficial mutations in an evolving bacterial population.

Science 332: 1193–1196.

12. Rokyta DR, Joyce P, Caudle SB, Miller C, Beisel CJ, et al. (2011) Epistasis

between beneficial mutations and the phenotype-to-fitness map for a ssDNA

virus. PLoS Genet 7: e1002075. doi:10.1371/journal.pgen.1002075.

13. Maharjan RP, Ferenci T (2013) Epistatic interactions determine the mutational

pathways and coexistence of lineages in clonal Escherichia coli populations.

Evolution 67: 2762–2768 doi:10.1111/evo.12137.

14. Chou H-H, Berthet J, Marx CJ (2009) Fast growth increases the selective

advantage of a mutation arising recurrently during evolution under metal

limitation. PLoS Genet 5: e1000652. doi:10.1371/journal.pgen.1000652.

15. Kvitek DJ, Sherlock G (2011) Reciprocal sign epistasis between frequently

experimentally evolved adaptive mutations causes a rugged fitness landscape.

PLoS Genet 7: e1002056 doi:10.1371/journal.pgen.1002056.

16. Weinreich DM, Watson RA, Chao L (2005) Perspective: Sign epistasis and

genetic constraint on evolutionary trajectories. Evolution 59: 1165–1174.

17. Schenk MF, Szendro IG, Salverda ML, Krug J, de Visser JAGM (2013) Patterns

of epistasis between beneficial mutations in an antibiotic resistance gene. Mol

Biol Evol 30: 1779–1787. doi:10.1093/molbev/mst096.

18. Fisher RA (1930) The genetical theory of natural selection. Clarendon Press,

Oxford, UK.

19. Martin G, Elena SF, Lenormand T (2007) Distributions of epistasis in microbes

fit predictions from a fitness landscape model. 39: 555–560.

20. Kascer H, Burns JA (1973) The control of flux. Symp Soc Exp Biol 27: 65–104.

21. Heinrich R, Rapoport TA (1974) A linear steady-state treatment of enzymatic

chains. General properties, control and effector strength 42: 89–95.

22. Watt WB, Dean AM (2000) Molecular-functional studies of adaptive genetic

variation in prokaryotes and eukaryotes. Annu Rev Genet 34: 593–622.

23. Szathmary E (1993) Do deleterious mutations act synergistically? Metabolic

control theory provides a partial answer. Genetics 133: 127–132.

24. Keightley PD (1996) Metabolic models of selection response. J Theor Biol 182:

311–316.

25. Stoebel DM, Dean AM, Dykhuizen DE (2008) The cost of expression of

Escherichia coli lac operon proteins in in the process, not in the products. Genetics

178: 1653–1660. doi:10.1534/genetics.107.085399.

26. Dekel E, Alon U (2005) Optimality and evolutionary tuning of the expression

level of a protein. Nature 436: 588–592. doi:10.1038/nature03842.

27. Flamholz A, Noor E, Bar-Even A, Liebermeister W, Milo R (2013) Glycolytic

strategy as a tradeoff between energy yield and protein cost. Proc Natl Acad Sci

USA 110: 10039–10044. doi:10.1073/pnas.1215283110.

28. Chistoserdova L, Vorholt JA, Thauer RK, Lidstrom ME (1998) C1 transfer

enzymes and coenzymes linking methylotrophic bacteria and methanogenic

Archaea. Science 281: 99–102.

29. Marx CJ, Chistoserdova L, Lidstrom ME (2003) Formaldehyde-detoxifying role

of the tetrahydromethanopterin-linked pathway in Methylobacterium extorquens

AM1. J Bacteriol 185: 7160–7168.

30. Marx CJ, Laukel M, Vorholt JA, Lidstrom ME (2003) Purification of the

formate-tetrahydrofolate ligase from Methylobacterium extorquens AM1 and

demonstration of its requirement for methylotrophic growth. J Bacteriol 185:7169–7175.

31. Marx CJ, Van Dien SJ, Lidstrom ME (2005) Flux analysis uncovers the key role

of functional redundancy in formaldehyde metabolism. PLoS Biol 3: e16.

32. Crowther GJ, Kosaly G, Lidstrom ME (2008) Formate as the main branch point

for methylotrophic metabolism in Methylobacterium extorquens AM1. J Bacteriol

190: 5057–5062. doi:10.1128/JB.oo228-08.

33. Lee M-C, Marx CJ (2013) Synchronous waves of failed soft sweeps in the

laboratory: remarkably rampant clonal interference of alleles at a single locus.

Genetics 193: 943–952. doi:1534/genetics.112.148502.

34. Chou H-H, Marx CJ (2012) Optimization of gene expression through divergent

mutational paths. Cell Rep 1: 133–140. doi:10.1016/j.celrep.2011.12.003.

35. Lee M-C, Chou H-H, Marx CJ (2009) Asymmetric, bimodal trade-offs duringadaptation of Methylobacterium to distinct growth substrates. Evolution 63: 2816–

2830. doi:10.1111/j.1558-5646.2009.00757.x.

36. Wilke CO, Adami C (2001) Interaction between directional epistasis and

average mutational effects. Proc Biol Sci 268: 1469–1474.

37. Gros PA, Le Nagard H, Tenaillon O (2009) The evolution of epistasis and its

links with genetic robustness, complexity and drift in a phenotypic model of

adaptation. Genetics 182: 277–293. doi:10.1534/genetics.108.099127

38. Weinreich DM, Delaney NF, Depristo MA, Hartl DL (2006) Darwinian

evolution can follow only very few mutational paths to fitter proteins. Science

312: 111–114.

39. Draghi JA, Plotkin J (2013) Selection biases the prevalence and type of epistasis

among beneficial substitutions. Evolution 67: 3120–3131. doi:10.1111/

evo.12192.

40. Hartl DL, Dykhuizen DE, Dean AM (1985) Limits of adaptation: the evolution

of selective neutrality. Genetics 111: 655–674.

41. Dykhuizen DE, Dean AM, Hartl DL (1987) Metabolic flux and fitness. Genetics115: 25–31.

42. Dong H, Nilsson L, Kurland CG (1995) Gratuitous overexpression of genes in

Escherichia coli leads to growth inhibition and ribosome destruction. J Bacteriol177: 1497–1504.

43. Lunzer M, Miller SP, Felsheim R, Dean AM (2005) The biochemical

architecture of an ancient adaptive landscape. Science 310: 499–501.

44. Giver L, Gershenson A, Freskgard P-O, Arnold FH (1998) Directed evolution ofa thermostable esterase. Proc Natl Acad Sci USA 95: 12809–12813.

doi:10.1073/pnas.95.22.12809.

45. DePristo MA, Weinreich DM, Hartl DL (2005) Missense meanderings insequence space: a biophysical view of protein evolution. Nat Rev Genet 6: 678–

687.

46. Chiu H-C, Marx CJ, Segre D (2012) Epistasis from functional dependence offitness on underlying traits. Proc Biol Sci 279: 4156–4164. doi:10.1098/

rspb.2012.1449.

47. Denver DR, Morris K, Streelman JT, Kim SK, Lynch M, et al. (2005) Thetranscriptional consequences of mutation and natural selection in Caenorhabditis

elegans. Nat Genet 37: 544–548.

48. Lemos B, Meiklejohn CD, Caceres M, Hartl DL (2005) Rates of divergence ingene expression profiles of primates, mice, and flies: stabilizing selection and

variability among functional categories. Evolution 59: 126–137.

49. Wagner A (2005) Energy constraints on the evolution of gene expression. MolBiol Evol 22: 1365–1374.

50. Bedford T, Hartl DL (2009) Optimization of gene expression by natural

selection. Proc Natl Acad Sci USA 106: 1133–1138. doi:10.1073/pnas.0812009106.



51. Lande R, Arnold SJ (1983) The measurement of selection on correlated

characters. Evolution 37: 1210–1226.52. Poelwijk FJ, de Vos MG, Tans SJ (2011) Tradeoffs and optimality in the

evolution of gene regulation. Cell 146: 462–470. doi:10.1016/j.cell.2011.06.035.

53. Chubiz LM, Lee M-C, Delaney NF, Marx CJ (2012) FREQ-Seq: a rapid, cost-effective, sequencing-based method to determine allele frequencies directly from

mixed populations. PLoS One 7: e47959. doi:10.1371/journal.pone.0047959.54. Yi X, Dean AM (2013) Bounded population sizes, fluctuating selection and the

tempo and mode of coexistence. Proc Natl Acad Sci USA 110:16945–16950.

doi:10.1073/pnas.1309830110.55. Gerrish PJ, Lenski RE (1998) The fate of competing beneficial mutations in an

asexual population. Genetica 102–103: 127–144.56. Woods RJ, Barrick JE, Cooper TF, Shrestha U, Kauth MR, et al. (2011)

Second-order selection for evolvability in a large Escherichia coli population.Science 331:1433–1436. doi:10.1126/science.1198914.

57. Lang GI, Botstein D, Desai MM (2011) Genetic variation and the fate of

beneficial mutations in asexual populations. Genetics 188:647–661.doi:10.1534/genetics.111.128942.

58. Lang GI, Rice DP, Hickman MJ, Sodergren E, Weinstock GM, et al. (2013)Pervasive genetic hitchhiking and clonal interference in forty evolving yeast

populations. Nature 500(7464):571–4 doi:10.1038/nature12344.

59. Lee M-C, Marx CJ (2012) Repeated, selection-driven genome reduction ofaccessory genes in experimental populations. PLoS Genet 8: e1002651.

doi:10.1371/journal.pgen.1002651.60. Carroll SM, Marx CJ (2013) Evolution after introduction of a novel metabolic

pathway consistently leads to restoration of wild-type physiology. PLoS Genet 9:e1003427. doi:10.1371/journal.pgen.1003427.

61. Marx CJ (2008) Development of a broad-host-range sacB-based vector for

unmarked allelic exchange. BMC Res Notes 1: 1. doi:10.1186/1756-0500-1-1.

62. Van Dien SJ, Marx CJ, O’Brien BN, Lidstrom ME (2003) Genetic

characterization of the carotenoid biosynthetic pathway in Methylobacterium

extorquens AM1 and isolation of a colorless mutant. Appl Environ Microbiol 69:

7563–7566.

63. Marx CJ, Lidstrom ME (2004) Development of an insertional expression vectorsystem for Methylobacterium extorquens AM1 and generation of null mutants lacking

mtdA and/or fch. Microbiol 150: 9–19.64. Chistoserdov AY, Chistoserdova LV, McIntire WS, Lidstrom ME (1994)

Genetic organization of the mau gene cluster in Methylobacterium extorquens AM1:

complete nucleotide sequence and generation and characteristics of mau

mutants. J Bacteriol 176: 4052–4065.

65. Marx CJ, Lidstrom ME (2001) Development of improved versatile broad-host-range vectors for use in methylotrophs and other Gram-negative bacteria.

Microbiol 147: 2065–2075.66. Ras J, Van Ophem PW, Reijnders WN, Van Spanning RJ, Duine JA, et al.

(1995) Isolation, sequencing, and mutagenesis of the gene encoding NAD- and

glutathione-dependent formaldehyde dehydrogenase (GD-FALDH) from Para-

coccus denitrificans, in which GD-FALDH is essential for methylotrophic growth.

J Bacteriol 177: 247–251.67. Harms N, Ras J, Reijnders WN, van Spanning RJ, Stouthamer AH (1996) S-

formylglutathione hydrolase of Paracoccus denitrificans is homologous to human

esterase D: a universal pathway for formaldehyde detoxification? J Bacteriol 178:6296–6299.

68. Bradford MM (1976) Rapid and sensitive method for the quantification ofmicrogram quantities of protein utilizing the principle of protein-dye binding.

Anal Biochem 72: 248–254. doi:10.1016/0003-2697(76)90527-3.69. Lee CL, Ow DS, Oh SK (2006) Quantitative real-time polymerase chain

reaction for determination of plasmid copy number in bacteria. J Microbiol

Methods 65: 258–267.



Figure S1. Trends of epistasis as a function of selective coefficient.

0.1 0.2 0.3 0.4 0.5

-0.20

-0.15

-0.10

-0.05

0.00

s

Epistasis

A

0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14

-0.10

-0.05

0.00

0.05

0.10

s

Epistasis

B

0 10 20 30 40 50

-1000

-800

-600

-400

-200

0

s

Epistasis

C

0 5 10 15

-1e+05

-5e+04

0e+00

5e+04

s

Epistasis

D

0 5 10 15 20 25

-1500

-1000

-500

0

s

Epistasis

E

Text S1. Analysis of bias in epistasis as a function of selection coefficient.

The slope of the relationship between the selective coefficient of a mutation in the ancestral background,

and its epistatic interactions with other beneficial mutations, seems intuitively sufficient to reveal a trend toward

increasing synergistic or antagonistic epistasis. If mutations with larger beneficial effects in the ancestor combine

more poorly with other mutations, this slope will be negative; such negative relationships appear consistent with

an explanation of diminishing returns, where the benefit of a mutation decreases monotonically with the fitness of

the genetic background in which it is assayed. However, recent work in a computational model of fitness

landscapes has suggested that ascertainment bias in evolution experiments can produce statistical artifacts that

may be hard to distinguish from the patterns expected with diminishing returns [39]. Here we explore a simple,

illustrative model to show the fitness landscapes without diminishing returns can show spurious negative

correlations between the benefit of mutations in the ancestor and their fitness effects in combination.

Our simplified fitness landscape model starts with L sites, each of which permits a single alternative allele.

By assumption, each of these mutations has an inherent, or mean effect, which is independent of the genetic

background, and a random, or epistatic, effect, which reflects the influence of the genetic background on that

allele’s fitness contribution. The inherent effect of a locus i, µi, is drawn independently and identically from a

Gaussian distribution with mean zero and standard deviation msd. Random effects have a mean of zero and a

standard deviation which is specific for each locus; these standard deviations, σi, are drawn independently and

identically from an exponential with mean rsd. The random effects for loci i in background g are then drawn from

a Gaussian with mean zero and standard deviation σi. Let N(a, b) stand for a Gaussian draw with mean a and

standard deviation b and E (c) for an exponential draw with mean c; then, the inherent and random effects of the

mutation at locus i are:

µi = N(0, msd) Eq. 1

σi = E(rsd) Eq. 2

If the ancestor has a fitness of one, then the fitness of the genotype with a mutation at locus i is:

wi = 1 + µi + N(0, σi) = 1 + µi + ri Eq. 3

where ri is the random effect in the ancestor. Assuming that fitness is calculated on a multiplicative scale, the

fitness of the genotype with mutations at both loci i and j is:

wij = (1 + µi + rij)(1 + µj + rji) Eq. 4

where rij is the random effect of mutation i in the background containing j, and rji is the random effect of mutation

j with i. We can then calculate epistasis between mutations at loci i and j by subtracting the expected fitness from

the of the double mutant from its actual fitness, as calculated with Eq. 4.

εij = wij - wiwj Eq. 5

Epistasis, as represented in this model, is not biased toward antagonism at higher fitnesses; we therefore

do not expect to see signatures of diminishing returns from experiments conducted on these model landscapes. If a

trend toward negative regressions of ε on s of mutations is observed for beneficial mutations in this model, then

we can conclude that the process of selection makes these regressions vulnerable to spurious indications of

diminishing returns.

To generate each set of beneficial mutations, we first assign fitness effects in the ancestor using Eq. 3.

Then, we select a group of n beneficial mutations with a simplified model of an evolutionary process. This

selection step reflects an ascertainment bias: mutations that escape genetic drift and reach fixation are likely to be

very beneficial on the background of the ancestor, and are therefore likely to exhibit a regression to the mean

when tested on other genetic backgrounds. To illustrate the effects of this potential bias, we consider two

procedures for selecting beneficial mutations. In the first, which we will call weak selection, all beneficial

mutations are equally likely to be chosen. In the second, labeled strong selection, beneficial mutations are chosen

proportionally to their selective coefficients.

In each simulated experiment, six unique beneficial mutations are chosen according to either the weak or

strong selection procedure, and the slope of the relationship between s and ε is determined by linear regression.

These experiments are replicated 10,000 times for each parameter values, and we report the mean slope and the

fraction of slopes that are negative. Because of the nature of selection in our model, only the relative magnitudes

of selective coefficients are important; we therefore only consider the ratio msd:rsd.

Mean regression slopes, and the fraction of slopes that are negative, for strong selection simulations. Dashed

lines indicate the unbiased expectations. Parameter values are msd = 0.01, rsd = 0.05 for the ratio 0.2, msd = 0.05,

rsd = 0.05 for the ratio 1, and msd = 0.05, rsd = 0.01 for the ratio 5.

The above figure summarizes the bias toward negative slopes between selective coefficients in the

ancestor and epistasis in pairs for strong selection. For all parameter values, regression slopes are more likely to

be negative. This reflects the sorting effects of selection, where fixed mutations are likely to be unusually good on

the background in which they were selected, relative to other genetic backgrounds. This bias is more severe when

the number of loci is large and the magnitude of random effects is large compared to mean effects. The results for

the weak selection model are similar with two differences: biases are slightly smaller, and the effect of L is

negligible (data not shown). These complex interactions between parameters, which depend on aspects of both the

fitness landscape itself and the population process of selecting the mutations, highlight the difficulty in attempting

to correct for these biases.

0.2 1 5

L = 20L = 100

Ratio of Mean to Epistatic Effects

Mea

n S

lope

-1.0

-0.8

-0.6

-0.4

-0.2

0.0

0.2

0.2 1 5

L = 20L = 100

Ratio of Mean to Epistatic Effects

Frac

tion

of N

egat

ive

Slo

pes

0.0

0.2

0.4

0.6

0.8

1.0

1

Supplementary Text S2. Derivation of the fitness model.

Comparison of current benefit – cost model to previous analysis in M. extorquens.

The present formulation of fitness as a function of benefits and costs bears some

similarities to a prior phenomenological model for M. extorquens [10], but there are substantial

differences. In order to analyze epistasis amongst four beneficial mutations that occurred in a

single evolving lineage of the EM strain, we had noted that enzyme expression cost was a

substantial component of adaptation and built a simple model of fitness around considering the

impact cost would have. Three of the four alleles identified in an evolved (EVO) strain appeared

to increase fitness at least partly by reducing this cost. Therefore, like models for a single enzyme

[26], we partitioned fitness into a ‘benefit’ component b0, analogous to a single conglomerate

‘enzyme activity’ that sets the rate of energy extracted from the substrate to generate biomass;

and a cost c0, encompassing a fixed amount of energy diverted to deal with over-expression of

the foreign pathway. Thus the fitness of the ancestral strain can be written, as W0 = b0-c0 = 1. We

hypothesized that new alleles could modify benefit and cost by multiplicative factors (λi and θi,

respectively), giving rise to a fitness Wi = λib0-θic0. A successive allele j, on top of the

background of mutant i, is similarly assumed to act multiplicatively on the benefit and cost

components, yielding a fitness Wij = λiλjb0-θiθjc0.

There are three primary differences between the model described above and the one

developed in the current work. First, we measured changes in the enzyme levels themselves and

using these as an intermediate level phenotype that is the basis of calculating fitness. This

contrasts with the above model that only considered the fitness values of various strains and

abnormal cellular morphology as a proxy for costs. Second, we distinguish between the two

enzymes rather than lumping them together, for breaking the original correlation in their

expression was critical for adaptation [34]. Third, we explicitly describe the pleiotropy that links

the consistent, predictable changes in benefit and costs that occur when expression of FlhA is

altered. The above model contains the potential for pleiotropy in that each mutation could have

values of λi and θi that both differ from 1, but here that shape is established based upon a

simplification of MCA [40]. It is because of these linkages to the underlying physiology that the

current model can generate accurate predictions of fitness for mutational combinations not used

in the training set (R2 = 0.98).

2

Mechanistic derivation of the fitness model from Metabolic Control Analysis

Our fitness model is a biologically inspired but ultimately empirical description of the effect of

the measured enzyme levels on fitness. However, it is also possible to directly derive this model

from an underlying mechanistic model, as explained in this supplemental text. We would like to

gratefully acknowledge that this derivation was originally provided by an anonymous reviewer of

this manuscript.

Consider a model where fitness (𝑊) was proportional to the flux through a pathway, (𝐽), with

linear penalties for the concentration of a toxic metabolic intermediate (𝑆!) as well as the

concentration of the enzymes 𝐹𝑙ℎ𝐴 and 𝐹𝑔ℎ𝐴 along this pathway, such that:

𝑊 = 𝛽!𝐽 − 𝛽!𝑆! − 𝛽!𝐹𝑙ℎ𝐴 − 𝛽!𝐹𝑔ℎ𝐴

In our conception of this model, the toxic metabolite is formaldehyde, which appears upstream of

the enzymatic reactions catalyzed by FlhA and FghA in the pathway of methanol metabolism in

this organism.

Additionally, imagine that the pathway that determines fitness is a linear pathway that contains

the toxic intermediate 𝑆! as an intermediate step. This pathway begins at a substrate 𝑆! whose

concentration is fixed, and ends at the irreversible reaction catalyzed by 𝐸!!!. Further, assume

that this pathway is governed by mass action kinetics and that the kinetic rates of the forward and

reverse reactions do not change during the evolution experiment.

3

One can show that this model can be reduced to our model for fitness. This derivation depends on

many classic results from metabolic control theory, and if these are unfamiliar to the reader they

are encouraged to consult a reference book before examining this result (or Appendix C in [20]).

First make the following definitions:

𝑘! = 𝐹𝑜𝑟𝑤𝑎𝑟𝑑 𝑟𝑎𝑡𝑒 𝑡ℎ𝑟𝑜𝑢𝑔ℎ 𝑡ℎ𝑒 𝑥𝑡ℎ 𝑟𝑒𝑎𝑐𝑡𝑖𝑜𝑛

𝑘!! = 𝑅𝑒𝑣𝑒𝑟𝑠𝑒 𝑟𝑎𝑡𝑒 𝑡ℎ𝑟𝑜𝑢𝑔ℎ 𝑡ℎ𝑒 𝑥𝑡ℎ 𝑟𝑒𝑎𝑐𝑡𝑖𝑜𝑛

𝑞! =𝑘!𝑘!!

= 𝑟𝑎𝑡𝑖𝑜 𝑜𝑓 𝑟𝑎𝑡𝑒𝑠

𝐾!,! = 𝑞!

!

!!!

𝑒! = 𝐶𝑜𝑛𝑐𝑒𝑛𝑡𝑟𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑒𝑛𝑧𝑦𝑚𝑒 𝑋

𝐸! = 𝑒!𝑘! = 𝑒𝑛𝑧𝑦𝑚𝑒 𝑡𝑖𝑚𝑒𝑠 𝑓𝑜𝑟𝑤𝑎𝑟𝑑 𝑟𝑎𝑡𝑒 𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡

Then for a linear pathway of length 𝑛 reactions, the flux through the pathway will be equal to:

𝐽 =𝑆!1

𝐾!,!𝐸!!!!!

Because the flux through the entire pathway is equal to the flux from the toxic intermediate as

well, we can set these two fluxes to be equal and solve for 𝑆!.

𝐽 =𝑆!1

𝐾!,!𝐸!!!!!

=𝑆!1

𝐾!,!𝐸!!!!!

𝑆! =𝑆!

1𝐾!,!𝐸!

!!!!

1𝐾!,!𝐸!

!!!!

4

Substituting back in to our original equation for fitness we obtain:

𝑊 = 𝛽!𝐽 − 𝛽!𝑆! − 𝛽!𝑒!"!! − 𝛽!𝑒!"!!

𝑊 = 𝛽!𝑆!1

𝐾!,!𝐸!!!!!

− 𝛽!𝑆!

1𝐾!,!𝐸!

!!!!

1𝐾!,!𝐸!

!!!!

− 𝛽!𝑒!"!! − 𝛽!𝑒!"!!

𝑊 = 𝑆!𝛽! − 𝛽!

1𝐾!,!𝐸!

!!!!

1𝐾!,!𝐸!

!!!!

− 𝛽!𝑒!"!! − 𝛽!𝑒!"!!

Note that: !!!,!

= !!,!!!,!

and that the control coefficient for an enzyme in a pathway are equal to:

𝐶! =

𝑒!𝐸!𝐾!,!

𝑒!𝐸!𝐾!,!

!!!!

As before 𝑒! is the concentration of enzyme for the 𝑥𝑡ℎ step. By rearrangement one can then

obtain:

𝑊 = 𝑆!𝛽! − 𝛽!

𝐾!,!𝐸!𝐾!,!

!!!!

1𝐾!,!𝐸!

!!!!

− 𝛽!𝑒!"!! − 𝛽!𝑒!"!!

𝑊 = 𝑆!𝛽!

1𝐾!,!𝐸!

!!!!

− 𝛽!𝐾!,!

1𝐾!,!𝐸!

1𝐾!,!𝐸!

!!!!

!

!!!

− 𝛽!𝑒!"!! − 𝛽!𝑒!"!!

5

𝑊 = 𝑆!𝛽!

1𝐾!,!𝐸!

!!!!

− 𝛽!𝐾!,!𝐶!𝑒!

!

!!!

− 𝛽!𝑒!"!! − 𝛽!𝑒!"!!

We now scale all the enzyme concentrations except FlhA and FghA to be equal to 1 (as we are

modeling their concentrations for different mutational combinations as effectively constant), and

define two constants for notational ease that are related to the control coefficients as:

𝑎 =𝐶!"!!𝑒!"!!

+𝐶!"!!𝑒!"!!

+ 𝐶!!!!;!!!"!!;!!!"!!

𝑏 =1

𝐾!,! 𝐸!!!!"!!;!!!"!!

Fitness can then be rearranged as:

𝑊 = 𝑆!𝛽!

𝑏 + 1𝐾!,!"!! 𝐸!"!!

+ 1𝐾!,!"!! 𝐸!"!!

− 𝛽!𝐾!,! 𝑎 +𝐶!"!!𝑒!"!!

+𝐶!"!!𝑒!"!!

− 𝛽!𝑒!"!!

− 𝛽!𝑒!"!!

Assume 𝑐!"!! is much smaller than 𝑐!"!! (justifiable given the observation that this enzyme

could be dropped almost to negligible concentrations relative to wild-type). This implies that: !

!!"!!!!"!!≫ !

!!"!!!!"!!, and justifies ignoring these FghA terms leading to:

𝑊 = 𝑆!𝛽!

𝑏 + 1𝐾!,!"!! 𝐸!"!!

− 𝛽!𝐾!,! 𝑎 +𝐶!"!!𝑒!"!!

− 𝛽!𝑒!"!! − 𝛽!𝑒!"!!

6

𝑊 =𝑆!𝛽!

𝑏 + 1𝐾!,!"!! 𝐸!"!!

− 𝑆!𝛽!𝐾!,!𝑎 +𝑆!𝛽!𝐾!,!𝐶!"!!

𝑒!"!!− 𝛽!𝑒!"!! − 𝛽!𝑒!"!!

𝑊 =𝑆!𝛽!

𝑏 + 1𝐾!,!"!! 𝐸!"!!

− 𝑆!𝛽!𝐾!,!𝑎 − 𝑒!"!! 𝛽! +𝑆!𝛽!𝐾!,!𝐶!"!!

𝑒!"!! ! − 𝛽!𝑒!"!!

𝑊 =𝑒!"!!𝑆!𝛽!

𝑏𝑒!"!! +

1𝑘!"!!𝐾!,!"!!𝑏

− 𝑆!𝛽!𝐾!,!𝑎 − 𝑒!"!! 𝛽! +𝑆!𝛽!𝐾!,!𝐶!"!!

𝑒!"!! ! − 𝛽!𝑒!"!!

Renaming the constants:

𝑊 =𝑒!"!!𝛾!𝑒!"!! + 𝛾!

− 𝛾! − 𝑒!"!! 𝛾! +𝛾!

𝑒!"!! ! − 𝛾!𝑒!"!!

And finally, as our model is designed to explore data where the enzyme concentration of FlhA is

decreasing relative to the wildtype (𝑒!"!! < 1.0) we can regard the term !!!!"!! ! as relatively less

important over the parameter range we are fitting (or that the penalty term in parenthesis on the

right can be approximated by a simple linear penalty), leading to our model for fitness:

𝑊 =𝑒!"!!𝛾!𝑒!"!! + 𝛾!

− 𝛾! − 𝛾!𝑒!"!! − 𝛾!𝑒!"!!

Table S1. Genotypes and phenotypes of pCM410, pHC112, and their derivatives. aThe nucleotide positions of mutations on pCM410 (Genbank Accession no. FJ389188)

are indicated in parenthesis. bThe concentrations of cumate (µM) applied to induce expression of the flhA-fghA

cassette in pHC112 and its derivatives. cPhenotypes during growth in methanol (15 mM) minimal media supplemented. Data are

reported as means and 95% confidence intervals of three independent measurements.

Plasmids were quantified as copies per genome. Enzyme activities of FlhA and FghA

were measured as the amount of substrate being catalyzed per milligram of cellular

protein per second (mM sec-1 mg-1).

Plasmid

pCM410

pHC112

aMutations bInducer cPhenotypeIndex Description Plasmid copy FlhA FghA Fitness0 ancestral plasmid 0 8.875 ± 1.091 0.669 ± 0.045 1.562 ± 0.119 1.007 ± 0.009A1 Δ(3791–3801) 0 9.081 ± 0.912 0.487 ± 0.018 0.301 ± 0.029 1.300 ± 0.027A2 flhA...ISMex4 (3771/ 3772) 0 8.866 ± 0.891 0.477 ± 0.073 0.332 ± 0.066 1.353 ± 0.022A3 T→C (3804) 0 9.451 ± 1.652 0.442 ± 0.016 0.045 ± 0.007 1.420 ± 0.032B2 Δ(9141–113) 0 6.177 ± 0.791 0.460 ± 0.010 1.076 ± 0.123 1.326 ± 0.032B3 T→G (9136) 0 7.211 ± 0.558 0.639 ± 0.007 1.351 ± 0.054 1.075 ± 0.012B2B3 Δ(9141–113), T→G (9136) 0 5.807 ± 0.664 0.427 ± 0.013 0.932 ± 0.117 1.323 ± 0.026B5 trfA::ISMex25(8302/8303) 0 2.199 ± 0.563 0.318 ± 0.011 0.093 ± 0.013 1.457 ± 0.033A1B3 Δ(3791–3801), T→G (9136) 0 8.015 ± 1.150 0.484 ± 0.037 0.315 ± 0.068 1.402 ± 0.034A1B2 Δ(3791–3801), Δ(9141–113) 0 6.156 ± 1.087 0.366 ± 0.019 0.199 ± 0.026 1.478 ± 0.069A1B2B3 Δ(3791–3801), Δ(9141–113), T→G (9136) 0 5.914 ± 0.779 0.341 ± 0.016 0.201 ± 0.040 1.492 ± 0.036A1B5 Δ(3791–3801), trfA ::ISMex25(8302/8303) 0 1.853 ± 0.341 0.108 ± 0.018 0.100 ± 0.022 0.365 ± 0.037A2B3 flhA...ISMex4 (3771/ 3772), T→G (9136) 0 8.553 ± 1.421 0.491 ± 0.040 0.311 ± 0.049 1.366 ± 0.042A2B2 flhA...ISMex4 (3771/ 3772), Δ(9141–113) 0 5.897 ± 0.606 0.423 ± 0.023 0.190 ± 0.021 1.372 ± 0.064A2B2B3 flhA...ISMex4 (3771/ 3772), Δ(9141–113), T→G (9136) 0 5.186 ± 0.418 0.392 ± 0.029 0.174 ± 0.033 1.488 ± 0.015A2B5 flhA...ISMex4 (3771/ 3772), trfA::ISMex25(8302/8303) 0 2.115 ± 0.539 0.113 ± 0.030 0.104 ± 0.029 0.448 ± 0.047A3B3 T→C (3804), T→G (9136) 0 7.880 ± 0.793 0.419 ± 0.067 0.052 ± 0.004 1.502 ± 0.024A3B2 T→C (3804), Δ(9141–113) 0 5.089 ± 0.728 0.298 ± 0.039 0.041 ± 0.008 1.483 ± 0.101A3B2B3 T→C (3804), Δ(9141–113), T→G (9136) 0 4.620 ± 1.058 0.292 ± 0.020 0.043 ± 0.002 1.428 ± 0.088A3B5 T→C (3804), trfA::ISMex25(8302/8303) 0 1.907 ± 0.252 0.171 ± 0.010 0.038 ± 0.004 1.097 ± 0.081R cumate-inducible expression plasmid 0 8.645 ± 1.402 0.131 ± 0.014 0.902 ± 0.141 0.584 ± 0.074

0.05 9.347 ± 1.359 0.143 ± 0.019 1.052 ± 0.223 0.804 ± 0.0080.1 9.002 ± 1.652 0.177 ± 0.018 1.129 ± 0.087 1.051 ± 0.0260.15 9.045 ± 1.838 0.281 ± 0.046 1.139 ± 0.200 1.141 ± 0.0160.25 9.375 ± 1.832 0.311 ± 0.026 1.190 ± 0.152 1.192 ± 0.0060.5 8.714 ± 1.007 0.356 ± 0.043 1.350 ± 0.178 1.170 ± 0.0021 9.665 ± 0.927 0.466 ± 0.074 1.474 ± 0.139 1.141 ± 0.00115 7.206 ± 1.391 0.627 ± 0.094 1.715 ± 0.172 1.065 ± 0.027

RB2 cumate-inducible expression, Δ(9141–113) 0 6.350 ± 1.380 0.112 ± 0.026 0.653 ± 0.078 0.275 ± 0.0400.05 0.625 ± 0.0430.1 6.255 ± 1.347 0.132 ± 0.021 0.752 ± 0.023 0.883 ± 0.0570.15 1.225 ± 0.0550.25 5.957 ± 1.082 0.248 ± 0.033 0.789 ± 0.026 1.224 ± 0.0270.5 1.226 ± 0.0761 1.228 ± 0.02315 6.053 ± 0.659 0.419 ± 0.105 1.372 ± 0.226 1.227 ± 0.089

RB3 cumate-inducible expression, T→G (9136) 0 7.355 ± 0.887 0.128 ± 0.028 0.836 ± 0.068 0.463 ± 0.0450.05 0.793 ± 0.0520.1 7.065 ± 0.539 0.172 ± 0.032 0.983 ± 0.088 1.040 ± 0.0110.15 1.138 ± 0.0390.25 7.101 ± 0.608 0.332 ± 0.039 1.182 ± 0.045 1.210 ± 0.0640.5 1.194 ± 0.0031 1.118 ± 0.01515 6.793 ± 0.191 0.619 ± 0.101 1.641 ± 0.158 1.085 ± 0.014

RB2B3 cumate-inducible expression, Δ(9141–113), T→G (9136) 0 4.804 ± 0.696 0.107 ± 0.010 0.548 ± 0.114 0.357 ± 0.0330.05 0.692 ± 0.0420.1 5.311 ± 0.553 0.142 ± 0.015 0.619 ± 0.165 0.856 ± 0.0070.15 1.190 ± 0.0770.25 5.551 ± 1.162 0.234 ± 0.034 0.743 ± 0.022 1.226 ± 0.0530.5 1.263 ± 0.0261 1.252 ± 0.02415 5.714 ± 0.270 0.421 ± 0.030 1.430 ± 0.276 1.264 ± 0.017

RB5 cumate-inducible expression, trfA::ISMex25(8302/8303) 0 1.786 ± 0.575 0.108 ± 0.023 0.555 ± 0.099 0.307 ± 0.0290.05 0.582 ± 0.0200.1 2.191 ± 0.612 0.122 ± 0.013 0.609 ± 0.052 0.641 ± 0.0340.15 1.142 ± 0.0390.25 1.943 ± 0.568 0.268 ± 0.057 0.753 ± 0.152 1.248 ± 0.0330.5 1.267 ± 0.0321 1.287 ± 0.04315 2.527 ± 0.859 0.434 ± 0.025 1.305 ± 0.160 1.330 ± 0.024

Table S2. Bacterial strains and plasmids.

Strain or plasmid Description Source or reference Strains CM502 Wild-type, crtI502 [61] CM624 crtI502, ΔmptG [10] CM701 ΔmptG, pCM410 [10] CM702 crtI502, ΔmptG, pCM410 [10] CM1232 crtI502, katA::(loxP-trrnB-PtacA-mCherry-tT7), ΔmptG, pCM410 [10] Plasmids pCM410 pCM160 bearing the flhA-fghA cassette downstream of PmxaF [10] pHC112 pCM410 bearing Plac-cymR and a CymR binding site (CuOcmt)

between PmxaF and the flhA-fghA cassette [34]

bpCM410Index Derivatives of pCM410 This study bpHC112Index Derivatives of pCM112 This study aKmr, kanamycin resistance bIndex, plasmid derivatives of pCM410, pCM160, pHC112 carrying mutations that occurred on pCM410

during experimental evolution of the EM strain. See Figure 1B and Table S1 for details.

Table S3. The value of fitted parameters in the fitness landscape model.

Parameter Value Vmax 175.841 Eh 0.186 Threshold 173.230 C1 0.011 C2 0.002

Documents

Mapping the Fitness Landscape of Gene Expression Uncovers ......Mapping the Fitness Landscape of Gene Expression Uncovers the cause of Antagonism and Sign Epistasis between Adaptive