40
Inferring gene regulation dynamics from static snapshots of gene expression variability Euan Joly-Smith, 1 Zitong Jerry Wang, 2 and Andreas Hilfinger 1, 3, 4 1 Department of Physics, University of Toronto, 60 St. George St., Ontario M5S 1A7, Canada 2 Division of Biology and Biological Engineering, California Institute of Technology, Pasadena CA 91125, USA 3 Department of Mathematics, University of Toronto, 40 St. George St., Toronto, Ontario M5S 2E4 4 Department of Cell & Systems Biology, University of Toronto, 25 Harbord St, Toronto, Ontario M5S 3G5 (Dated: September 2, 2021) Inferring functional relationships within complex networks from static snapshots of a subset of variables is a ubiquitous problem in science. For example, a key challenge of systems biology is to translate cellular heterogeneity data obtained from single-cell sequencing or flow-cytometry experi- ments into regulatory dynamics. We show how static population snapshots of co-variability can be exploited to rigorously infer properties of gene expression dynamics when gene expression reporters probe their upstream dynamics on separate time-scales. This can be experimentally exploited in dual-reporter experiments with fluorescent proteins of unequal maturation times, thus turning an experimental bug into an analysis feature. We derive correlation conditions that detect the presence of closed-loop feedback regulation in gene regulatory networks. Furthermore, we show how genes with cell-cycle dependent transcription rates can be identified from the variability of co-regulated fluorescent proteins. Similar correlation constraints might prove useful in other areas of science in which static correlation snapshots are used to infer causal connections between dynamically inter- acting components. I. INTRODUCTION A large body of experimental work has quantified sig- nificant non-genetic variability in living cells [1–8]. Har- nessing the information contained in this naturally occur- ring variability to infer molecular processes in cells with- out perturbation experiments is a long-standing goal of systems biology. However, measuring the spontaneous non-genetic variability of only one cellular component does not have sufficient discriminatory power to distin- guish between models of complex cellular processes with many interacting components [9]. Fortunately, progress in experimental techniques has made it possible to mea- sure multiple components simultaneously. For example, covariances between mRNA and protein levels have been used to test hypotheses about translation rates in bacte- ria [10]. Despite improvements in experimental methods, it re- mains technically challenging to measure multiple differ- ent types of molecules in the same cell. More feasible, and thus more common, are experiments that measure multiple levels of the same type of molecule, e.g., mea- suring different mRNA levels using sequencing techniques [11], or measuring abundances of proteins using fluores- cence microscopy [12]. Such experiments have motivated “dual reporter” approaches in which correlations between identical copies of reporters responding to a common up- stream signal are used to characterize sources of variabil- ity within cellular processes [12–14]. Previous work focused on splitting the total observed variability into “intrinsic” and “extrinsic” contributions under the assumption that reporters are identical in all intrinsic properties. However, actual experimental re- porters are never exactly identical. For example, com- monly used fluorescent proteins differ enormously in their maturation half-lifes ranging from minutes to hours [15]. We show that despite such asymmetries, gene expression reporters can be used to rigorously detect closed-loop control networks from correlation measurements. Fur- thermore, we show that the inherent asymmetry of re- porters can in fact be exploited to our advantage. Be- cause reporters that differ in their intrinsic dynamics re- spond to their shared upstream input on different time- scales, their variability contains information about the unobserved upstream dynamics even when we have only access to static population snapshots. For example, we show how asymmetric dual reporters can be used to dis- tinguish periodically varying deterministic driving from stochastic upstream “noise”. The utility of these results lies in interpreting exper- imental data even when only a small part of a complex regulatory process can be observed directly. Instead of trying to model all of the many direct and indirect steps of gene expression regulation, we analyze entire classes of systems in which we specify only some steps but leave all other details unspecified. This approach allows us to derive inequalities that constrain the space of behaviour that could possibly be observed across a population of genetically identical cells within these classes, regardless of the details of the unspecified parts. These inequalities are in terms of coefficients of variation (CVs) and corre- lation coefficients of reporter levels, x and y, engineered to readout components-of-interest, CV x := Var(x) x ρ xy := Cov(x, y) Var(x)Var(y) where angular brackets denote population averages. Such population statistics are experimentally accessible from static snapshots of cellular populations that have reached a time-independent distribution of cell-to-cell variability. If a measurement violates our inequalities, one of our assumptions must be false, regardless of how the unspec- ified part of the system behaves. This way mathematical arXiv:2109.00392v1 [q-bio.MN] 1 Sep 2021

arXiv:2109.00392v1 [q-bio.MN] 1 Sep 2021

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Inferring gene regulation dynamics from static snapshots of gene expression variability

Euan Joly-Smith,1 Zitong Jerry Wang,2 and Andreas Hilfinger1, 3, 4

1Department of Physics, University of Toronto, 60 St. George St., Ontario M5S 1A7, Canada2Division of Biology and Biological Engineering, California Institute of Technology, Pasadena CA 91125, USA

3Department of Mathematics, University of Toronto, 40 St. George St., Toronto, Ontario M5S 2E44Department of Cell & Systems Biology, University of Toronto, 25 Harbord St, Toronto, Ontario M5S 3G5

(Dated: September 2, 2021)

Inferring functional relationships within complex networks from static snapshots of a subset ofvariables is a ubiquitous problem in science. For example, a key challenge of systems biology is totranslate cellular heterogeneity data obtained from single-cell sequencing or flow-cytometry experi-ments into regulatory dynamics. We show how static population snapshots of co-variability can beexploited to rigorously infer properties of gene expression dynamics when gene expression reportersprobe their upstream dynamics on separate time-scales. This can be experimentally exploited indual-reporter experiments with fluorescent proteins of unequal maturation times, thus turning anexperimental bug into an analysis feature. We derive correlation conditions that detect the presenceof closed-loop feedback regulation in gene regulatory networks. Furthermore, we show how geneswith cell-cycle dependent transcription rates can be identified from the variability of co-regulatedfluorescent proteins. Similar correlation constraints might prove useful in other areas of science inwhich static correlation snapshots are used to infer causal connections between dynamically inter-acting components.

I. INTRODUCTION

A large body of experimental work has quantified sig-nificant non-genetic variability in living cells [1–8]. Har-nessing the information contained in this naturally occur-ring variability to infer molecular processes in cells with-out perturbation experiments is a long-standing goal ofsystems biology. However, measuring the spontaneousnon-genetic variability of only one cellular componentdoes not have sufficient discriminatory power to distin-guish between models of complex cellular processes withmany interacting components [9]. Fortunately, progressin experimental techniques has made it possible to mea-sure multiple components simultaneously. For example,covariances between mRNA and protein levels have beenused to test hypotheses about translation rates in bacte-ria [10].

Despite improvements in experimental methods, it re-mains technically challenging to measure multiple differ-ent types of molecules in the same cell. More feasible,and thus more common, are experiments that measuremultiple levels of the same type of molecule, e.g., mea-suring different mRNA levels using sequencing techniques[11], or measuring abundances of proteins using fluores-cence microscopy [12]. Such experiments have motivated“dual reporter” approaches in which correlations betweenidentical copies of reporters responding to a common up-stream signal are used to characterize sources of variabil-ity within cellular processes [12–14].

Previous work focused on splitting the total observedvariability into “intrinsic” and “extrinsic” contributionsunder the assumption that reporters are identical in allintrinsic properties. However, actual experimental re-porters are never exactly identical. For example, com-monly used fluorescent proteins differ enormously in theirmaturation half-lifes ranging from minutes to hours [15].We show that despite such asymmetries, gene expression

reporters can be used to rigorously detect closed-loopcontrol networks from correlation measurements. Fur-thermore, we show that the inherent asymmetry of re-porters can in fact be exploited to our advantage. Be-cause reporters that differ in their intrinsic dynamics re-spond to their shared upstream input on different time-scales, their variability contains information about theunobserved upstream dynamics even when we have onlyaccess to static population snapshots. For example, weshow how asymmetric dual reporters can be used to dis-tinguish periodically varying deterministic driving fromstochastic upstream “noise”.

The utility of these results lies in interpreting exper-imental data even when only a small part of a complexregulatory process can be observed directly. Instead oftrying to model all of the many direct and indirect stepsof gene expression regulation, we analyze entire classesof systems in which we specify only some steps but leaveall other details unspecified. This approach allows us toderive inequalities that constrain the space of behaviourthat could possibly be observed across a population ofgenetically identical cells within these classes, regardlessof the details of the unspecified parts. These inequalitiesare in terms of coefficients of variation (CVs) and corre-lation coefficients of reporter levels, x and y, engineeredto readout components-of-interest,

CVx :=

√Var(x)

〈x〉 ρxy :=Cov(x, y)√

Var(x)Var(y)

where angular brackets denote population averages. Suchpopulation statistics are experimentally accessible fromstatic snapshots of cellular populations that have reacheda time-independent distribution of cell-to-cell variability.If a measurement violates our inequalities, one of ourassumptions must be false, regardless of how the unspec-ified part of the system behaves. This way mathematical

arX

iv:2

109.

0039

2v1

[q-

bio.

MN

] 1

Sep

202

1

2

constraints can be used to deduce features of gene ex-pression.

While our results are motivated by methods in exper-imental cell biology to understand gene expression dy-namics, they apply to any “reporters” embedded in a dy-namic interaction network, subject to the specified pro-duction and elimination fluxes considered here and maythus be more broadly applicable.

II. DETECTING GENE REGULATIONFEEDBACK FROM STATIC SNAPSHOTS OF

POPULATION HETEROGENEITY

Cells employ both open-loop regulation or closed-loopfeedback to control cellular processes [16]. Here we showhow to infer the presence of closed-loop feedback for anymolecule within a network, by introducing an additionalreporter molecule into the system. After describing thekey theoretical result we detail the experimental set-upto detect feedback control in gene regulation from mRNAlevels or fluorescent protein measurements. In brief, ourresults apply to “dual reporter” genes that are engineeredto share an unspecified but identical transcription rate.This assumption defines our class of models and needs tobe experimentally ensured through appropriate geneticengineering in combination with self-consistency checks,such as indistinguishable reporter distributions, as forexample reported in [4, 12, 13, 17]. Further experimentalconsiderations are discussed in Sec. IV.

A. Mathematical correlation constraints foropen-loop “dual reporters”

Motivated by the transcriptional dynamics of co-regulated genes, we consider a generic class of systems inwhich two cellular components, X and Y, are producedwith a common (but unspecified) time-varying rate, andare degraded in a first order reaction, with average life-times τx and τy, respectively

xR(u(t))−−−−−−−→ x+ 1 y

R(u(t))−−−−−−−→ y + 1

xx/τx−−−−−−−→ x− 1 y

y/τy−−−−−−−→ y − 1 (1)

where the transcription rate can depend in any way ona cloud of unknown components u(t), which in turn candepend in arbitrary ways on the number of X and Ymolecules, denoted by x and y. While we characterize thestochastic reactions of X and Y, the dynamics of all othercellular components remain unspecified, see Fig. 1A, andthe resulting dynamics need not be Markovian or er-godic in X and Y. A related class of stochastic processeshas been previously considered to analyze mRNA-proteincorrelations in gene expression [9], whereas here we ana-lyze correlations between co-regulated transcripts.

Previous work established universal probability bal-ance relations that constrain the stationary state distri-butions of stochastic processes [18]. For the above class of

systems, these relations translate the specified reactionsof Eq. (1) into an underdetermined system of equationsfor (co)variances (see Appendix A). Though this systemof equations cannot be solved due to the unspecified partsof the dynamics, not all dual reporter correlations ρxy areaccessible for all variability ratios CVx/CVy. For exam-ple, the correlation of any system is bounded by the cor-relation of two reporters that respond deterministicallyto their upstream input (Appendix A). When the tworeporters respond to their upstream input on unequaltimescales, their correlations are constrained as illus-trated by the dashed orange lines in Fig. 1B that boundall systems for a given value of T := τy/τx. In Fig. 1 andthroughout the paper, numerical simulations of specificstochastic models and parameters establish that the in-equalities are tight, i.e., that the entire bounded regionsare accessible. For illustration, we plot a subset of sim-ulations with arbitrarily chosen sampling density, alongwith the analytically proven constraints.

The space of possible correlations is restricted muchfurther for open-loop systems in which upstream vari-ables regulate the reporter production rates but are notaffected by them, corresponding to all possible systems inFig. 1A in which X and Y do not affect the unspecifiedcloud. For all such systems, we can derive additionalconstraints by considering the hypothetical average ofan ensemble of stochastic dual reporters conditioned onthe history of their upstream influences. While theseconditional averages are typically experimentally inac-cessible, they mathematically constrain the measurable(co)variances. We find (see Appendix B) that the corre-lations of cellular components X and Y that are regulatedthrough an open-loop process must satisfy

∣∣∣CVxCVy

− T CVyCVx

∣∣∣

1− T 6 ρxy with T := τy/τx, (2)

where without loss of generality we assume that T 6 1,i.e., that Y is the faster reporter. Fig. 1B shows the above“open-loop constraint” of Eq. (2) as solid orange lines, forspecific values of T . Note that, in the symmetric limitT → 1, Eq. (2) reduces to ρxy > 0 and CVx = CVy asintuitively expected.

Any system whose measured (co)variability falls out-side the region defined by Eq. (2) must be regulatedthrough closed-loop feedback. Violations of this con-straint can thus be used to experimentally detect thepresence of feedback based on static population variabil-ity measurements without the need for perturbations.Fig. 1B shows the (co)variability of simulated systemswith feedback (grey dots) and without feedback (bluedots), illustrating that only systems with feedback canfall outside the region bounded by solid orange lines, vio-lating the “open-loop constraint” of Eq. (2). The positionof a system outside the “open-loop constraint” can beused to quantify a heuristic “feedback strength” and pro-vides additional information to distinguish positive fromnegative feedback (see Appendix B 2).

3

= Space of systems with feedback = Space of systems without feedback

= Particular system without feedback= Particular system with feedback

geneX

geneY

R(u(t))

R(u(t))mRNA

Native

Reporter

Common transcription rate R(u(t))C)

Figure 1. Feedback in gene regulation affects the space of possible mRNA covariability. A) We consider allstochastic processes in which two components, X and Y, are made with an identical, but unspecified, rate. This rate candepend in any way on a cloud of unknown components u(t), which in turn can depend in arbitrary ways on the number of Xand Y molecules. The shared production rate of X and Y, together with first-order degradation of X and Y with respectivelifetimes τx and τy, are the only specified parts within this arbitrarily large network, as defined in Eq. (1). B) Space of possiblecovariability for different values of the life-time ratio T := τy/τx. All systems must satisfy ρxy(1 + T ) 6 CVx/CVy + TCVy/CVxcorresponding to the area below the dashed orange line. Allowing for feedback (grey dots), the entire space below the dashedorange line is accessible. In the absence of feedback (blue dots), ρxy is additionally constrained by Eq. (2), corresponding tothe region between the solid orange lines. Dots are stochastic simulation data for specific models of co-regulated genes withinthe class defined in Eq. (1). Plotted are a subset of simulations with arbitrary density to demonstrate the accessibility ofthe constrained regions. Blue and grey curves are exemplary toy models (Appendix G) that illustrate the effect of decreasingtranscription rate variability (blue) or increasing feedback strength (grey). C) Experimental set-up to detect feedback in theregulation of a native gene of interest geneX. The native and reporter genes are regulated by identical promoters in the same cell.If the covariability of the transcripts X and Y of such genes falls outside the open-loop constraint of Eq. (2), we can concludethat the gene of interest geneX regulates its own transcription. To maximize the discriminatory power of the approach thereporter geneY should be engineered to be a passive read-out of transcription without significantly affecting gene expression.

B. Experimentally exploiting mRNA correlationsto detect feedback

Eq. (2) can be exploited through an experimental set-up analogous to previously engineered circuits in whichco-regulated genes reportedly satisfied the assumptions ofEq. (1) and transcripts were counted with single moleculefluorescence in-situ hybridization (smFISH) [17, 19]. Todetect feedback regulation of a gene of interest geneX, areporter geneY should be introduced whose expressionis under the control of an identical copy of the promoterof geneX, see Fig. 1C. The precise sequence of the re-porter gene is unimportant as long as its transcript Yis sufficiently different from the transcript X of the geneof interest to avoid cross-hybridization by RNA probes.This can be achieved, e.g., by making the reporter genesequence a scrambled version of the gene of interest.

Using smFISH, transcripts X and Y can be measuredsimultaneously at the single cell level [20]. Simple popu-lation snapshots of cell-to-cell variability then determinewhether experimentally observed CVx, CVy, ρxy violateEq. (2). If this open-loop constraint is violated we canconclude that geneX must directly or indirectly affect itsown production rate.

If a system falls inside the region defined by Eq. (2) wecannot say whether it is regulated through feedback or

not, see Fig. 1B. That is because systems with infinitesi-mally weak feedback are fundamentally indistinguishablefrom open-loop processes. To detect significant feedbackin geneX, it is advantageous to ensure the reporter geneYis not involved in the same feedback regulation. Thiscould, e.g., be achieved by removing the start codon fromgeneY so its mRNA is not translated, thus preventingthe protein of geneY from exerting any feedback control.Furthermore, by ensuring that the life-time of the secondreporter transcript is comparable to that of the gene ofinterest we can minimize the accessible area defined byEq. (2) and thus maximize the discriminatory power ofthe approach.

Note that only relative abundances are necessary todetermine CVx, CVy, and ρxy. The above steps can thusbe applied to data from single cell sequencing techniques.Additionally, severe violations of Eq. (2) can potentiallybe detected already from sequential rather than simul-taneous measurements of X and Y: if their ratio of CVsfalls outside the interval [T, 1] then Eq. (2) must be vi-olated regardless of the value of ρxy. When analyzinggenes with potentially unknown mRNA life-time, the ra-tio of mRNA life-times T can be inferred from pairwisecorrelation measurements between three reporter genes,as detailed in Sec. IV.

4

Figure 2. Feedback in gene regulation affects the possible covariability of fluorescent protein measurements.A) As before X and Y correspond to co-regulated mRNA, but we now explicitly include the dynamics of immature fluorescentproteins denoted as X’ and Y’, as well as mature fluorescent proteins X” and Y”. Because fluorescent proteins are typicallystable and are thus effectively diluted with a common “degradation” time set by the cell-cycle, we focus on gene expressiondynamics that is symmetric apart from the maturation step. The asymmetry between the co-regulated genes is then entirelycharacterized by the ratio of average fluorescent maturation times Tm := τmat,y/τmat,x. B) Without feedback control (blue dots),fluorescence correlations are constrained to the region between the dashed and solid orange lines, the latter corresponding tothe bound of Eq. (3). Allowing for feedback (grey dots), the entire region becomes available. Correlations in fluorescence levelscan thus be used to detect causal feedback in gene regulation from static population snapshots. Dots are selected numericalsimulations of specific fluorescent reporter systems within this class that illustrate the full accessibility of the region constrainedby the analytically proven bounds.

C. Experimentally exploiting fluorescent proteincorrelations to detect feedback

Similar constraints can be derived to interpret geneexpression data in which fluorescent fusion proteins areused to quantify gene expression [4]. In this case theexperimental read-out involves a fluorescent maturationstep in addition to transcription and translation. Whilethis maturation step is often well approximated as ex-ponential [15], its life-time differs significantly betweencommonly used fluorescent proteins. For example, thematuration half-lifes of mCerulean, mEYFP, mEGFP,and mRFP1 are 6.6 min, 9 min, 14.5 min and 21.9 minrespectively [15]. In order to detect gene regulatory feed-back from (co)variability data in such an experimentalset-up, we extend our previous class of systems to ex-plicitly account for protein translation and maturationevents, see Fig. 2A. In this parallel cascade system X′′

and Y′′ represent the level of mature fluorescent proteins.

Crucially, open-loop systems in which none of the geneproducts directly, or indirectly, affect their transcriptionrate must additionally satisfy

max

Tm

CVy′′

CVx′′− CVx′′

CVy′′

1− Tm, 0

6 ρx′′y′′ &

CVx′′

CVy′′6 1,

(3)where Tm := τmat,y/τmat,x is the ratio of maturationtimes between the two fluorescent proteins (see AppendixE). The “left boundary” of this region is mathematicallyidentical to the previous bound of Eq. (2), but now ap-

plied to the statistics of fluorescence levels X′′ and Y′′.The new right-hand bound CVx′′ 6 CVy′′ broadens theregion accessible to open-loop processes due the addi-tional intrinsic degrees of freedom of this class of sys-tems. Fig. 2B illustrates the tightness of these bounds(solid orange lines) for Tm = 0.5, where dots correspondto simulated systems with (grey) and without feedback(blue).

Moreover, analogously to the mRNA-correlations, allpossible fluorescence correlations for open-loop systemsare constrained by ρx′′y′′(1 + Tm) 6 TmCVx′′/CVy′′ +TmCVy′′/CVx′′ . Correlations of systems with feedbackcan break this bound, as shown by the grey dots inFig. 2B. This is because in the limit of infinitesimallysmall maturation times and mRNA fluctuations, the sys-tem becomes identical to Fig. 1A with T = 1, which hasunbounded correlations as shown in Fig. 1B.

Fluorescent proteins can thus be used to detectwhether a given gene regulates its own production asfollows. Considering geneZ as the native gene of inter-est, two recombinant genes, geneZ-GFP and geneZ-RFP(or other spectrally distinguishable pairs) would be engi-neered into an isogenic cell population under the controlof the same (but distinct) promoter as geneZ. The tran-scripts of geneZ-GFP and geneZ-RFP then correspondto X and Y in Fig. 2A, as they are transcribed with iden-tical rates. The level of mature fusion protein, X′′ andY′′, can be read out at the single-cell level with fluores-cence microscopy, and from the observed CVx′′ , CVy′′ ,and ρx′′y′′ we can detect violations of the open-loop con-straint Eq. (3).

5

If necessary, the discriminatory power of this approachcan be increased by introducing a third fusion proteinwith a different fluorescent maturation time to elimi-nate the unknown internal degrees of freedom, and de-termine how much variability is generated through tran-scriptional, translational, or maturation events (see Ap-pendix J 3).

In modeling this parallel cascade system, we assumedthe gene expression dynamics of the two fusion proteinsis identical apart from the fluorescent maturation step.This is motivated by the experimental set-up in which weuse the same native protein fused to two different fluores-cent proteins. The absence of further asymmetries, e.g.,caused by codon usage, translation initiation, or reportercross talk, would need to be established experimentally.Traditionally, this has been done by comparing the dis-tributions of reporter variability for reporters that areclaimed to be identical in their transcriptional and trans-lational dynamics [4, 12, 13, 17]. Note that the class ofsystems defined in Fig. 2A is in fact a subset of a muchlarger class of systems in which the key specified part isthat there is a final maturation step in a symmetric butotherwise unspecified intrinsic cascade of arbitrary steps(see Appendix D), thus allowing for a cascade of sequen-tial post translational modifications that occur before thematuration step.

So far we considered arbitrary values of T to allow forthe experimental reality that reporters are never com-pletely identical. Our next results show that T 6= 1 isnot just a nuisance but can be exploited to infer the dy-namics of the transcription rate of open-loop systems.

III. DISTINGUISHING STOCHASTIC FROMDETERMINISTIC TRANSCRIPTION RATE

VARIABILITY

A fundamental issue when interpreting cell-to-cell vari-ability is that we generally do not know whether a com-ponent’s variability is due to stochastic upstream noiseor whether a component is driven by deterministic vari-ability [21]. Next, we show how we can distinguish thetwo types of dynamics from static population snapshotsof asymmetric gene expression reporters.

A. Mathematical correlation constraints forstochastic transcriptional noise

Focusing on the class of systems defined in Eq. (1) inthe absence of feedback, we next show how snapshots ofdual reporters can be used to infer temporal propertiesof the unobserved production rates. To discuss periodicdriving in cells we consider the stationary state auto-correlation of the transcription rate

A(s) :=〈R(t+ s)R(t)〉 − 〈R(t+ s)〉〈R(t)〉

Var[R(t)].

We define a production rate as periodic if this auto-correlation A(s) becomes negative for some s. In other

words, the periodicity of the driving has to be strongenough such that the rate of production is negatively cor-related with itself some time later. Conversely, we defineupstream variability as stochastic if the auto-correlationof the unobserved transcription rate is non-negative ev-erywhere, see Fig. 3A.

Fourier analysis of the dynamics of X and Y condi-tioned on the histories of their production rates shows(Appendix C) that not all systems can exhibit fluctua-tions everywhere within the region defined by Eq. (2).Components that are stochastically driven are addition-ally constrained by

√T 6 CVx

CVy, (4)

Sequential measurements of CVx and CVy from staticsnapshots of X and Y can thus discriminate betweendeterministic and stochastic transcription rates whenCVx < CVy

√T without access to time-series data or

directly measuring the unobserved upstream dynamics.Fig. 3 illustrate this discriminatory power with simulatedsystems in which genes that are periodically driven (bluedots) can fill the entire region defined by Eq. (2) (solidyellow lines), whereas genes that are driven stochastically(red dots) are further constrained by Eq. (4) (dashedblack line).

A pair of asymmetric (T < 1), co-regulated reportersin an open-loop system can exhibit perfect correlations(ρxy = 1) in two distinct regimes, see Fig. 3B. First, whenthe upstream variability is much slower than both τx andτy, the reporters adjust rapidly to their quasi-stationarystates such that y(t) = Tx(t) at all times, and thusCVx = CVy. The second regime occurs when productionrates oscillates rapidly, such that the upstream signal en-slaves the reporters into a transient regime where theirdifferent response times simply shift their average dy-namics. This second regime is not accessible by reportersthat are driven stochastically. Instead, in the limit of in-finitely fast stochastic variability, dual reporter systemsapproach CVx/CVy →

√T corresponding to the bound

of Eq. (4). Where on the right-hand side a stochasticallydriven system falls can be used to infer the time-scale ofthe upstream fluctuations (Appendix C).

Inferring upstream dynamics from static snapshots ispossible because the reporters probe their upstream dy-namics on different time-scales. In fact, in the hypo-thetical limit of an infinite number of reporters respond-ing to an upstream signal on all time-scales, the auto-correlation of the upstream signal is entirely determinedby static measurements of its downstream variability.That is because knowing a signal’s downstream variabil-ity on all time-scales effectively determines the Laplacetransform of its auto-correlation, see Appendix C.

6

Figure 3. Periodic transcriptional variabil-ity can be distinguished from stochastic vari-ability without experimental access to tran-scription rates and without following individ-ual cells over time. A) In a noisy cellular milieu,the auto-correlation of a periodic signal is not per-fectly periodic but decays. We thus operationallydefine a signal as periodic when its periodicity isstrong enough such that its auto-correlation func-tion becomes negative at some point. Conversely, weclassify signals with non-negative auto-correlations asstochastic. B) The left-side of the “open-loop” re-gion defined by Eq. (2) is only accessible by mRNAreporters that are driven by oscillatory productionrates rather than purely stochastic upstream vari-ability. The boundary (dashed black) line is definedby Eq. (4). C) Sequential measurements of mRNAreporters X and Y, or fluorescent proteins X” andY”, can be used to discriminate between stochasticand oscillatory transcription rates if a system vio-lates the respective bounds of Eq. (4) or Eq. (5) (indi-cated by dashed black line). Selected numerical sim-ulation (dots) illustrate the achievability of the con-strained regions, and the arrowed curves indicate howspecific models (Appendix G) behave as the down-stream response becomes slower. For these systems,we find that oscillatory systems cross the black dashedline when 2π

√τxτy or 2π

√τmat,xτmat,y become slower

than the period of the driving oscillation. To detectoscillations it is thus advantageous to choose slow re-porters. D) Periodic transcription rates of a gene ofinterest geneZ can be experimentally detected eitherutilizing mRNA reporters, X and Y (left panel), or flu-orescent protein reporters, X” and Y” (right panel),driven by the promoter of geneZ. If experimental re-porters violate Eq. (4) or (5), they land to the left ofthe dashed black line (panel C), and geneZ must bedriven by a periodically varying transcription rate.

B. Experimentally exploiting mRNA correlationsto detect periodic transcription rates

The constraint of Eq. (4) can be exploited to detectperiodic transcription rates, in experiments in which thepromoter of a gene of interest is used to independentlydrive the expression of two non-native passive gene ex-pression reporters [17, 19]. The details of the two reportergenes are not important as long as their transcripts haveunequal life-times, and their products are not involvedin any cellular control. Both criteria can be satisfied,e.g., by using a random sequence to encode the first re-porter, and combining another random sequence with amRNA stabilizing motif to encode the second reporter.This motif could be a 5’ stem-loop structure in the caseof prokaryotic cells or a carbohydrate recognition domain(CRD) sequence in eukaryotes [22, 23]. Transcript levelsof the two reporter genes then correspond to our compo-nents X and Y. Fig. 3D (left panel) illustrates this mRNAreporter set-up.

Because sequential measurements of CVx and CVy suf-

fice to detect violations of Eq. (4), the two reporters Xand Y do not need to be expressed simultaneously andcan be measured independently at the single cell level,e.g., using smFISH [24]. Systems satisfying Eq. (4) canbe either driven by stochastic or periodic transcriptionrates, but any violation strictly implies the existence ofperiodic upstream variability. To detect oscillations itis advantageous to use sufficiently long-lived mRNA re-porters such that 2π

√τxτy is comparable or larger to the

period of the upstream signal. This is demonstrated bythe arrowed curves in Fig. 3C corresponding to exemplaryoscillating and stochastic systems (defined in AppendixG) for varying τy and fixed τx. This oscillating systemcrosses into the discriminatory region when 2π

√τxτy is

greater than the period of the upstream oscillation. Fur-thermore, Fig. 3C suggests an advantageous range for Tto detect oscillations: 0.25 ≤ T ≤ 0.5.

7

C. Experimentally exploiting fluorescent proteincorrelations to detect periodic transcription rates

To detect transcriptional oscillations using co-regulated fluorescent reporter proteins we consider sys-tems as defined in Fig. 2A in the absence of feedback.Just like transcript levels, we can prove (see Appendix F)that correlations between non-identical fluorescent pro-teins in the absence of periodic driving are constrainedby

√Tm 6 CVx′′

CVy′′, (5)

where X′′ and Y′′ denote the fluorescence levels of thetwo reporter protein levels with a ratio of maturationtimes Tm := τmat,y/τmat,x. In contrast, systems that areperiodically driven can fill the entire region defined byEq. (3). The constraint of Eq. (5) can be used to detectoscillations in transcription rates from measures fluores-cent protein variability analogous to the mRNA methoddiscussed above, see Fig. 3C. The corresponding exper-imental set-up simply requires two different fluorescentproteins under the same transcriptional control as ourgene of interest as illustrated in Fig. 3D (right panel).

IV. APPLYING THEORETICAL BOUNDS TOEXPERIMENTAL DATA

The variables in our constraints are experimentally ac-cessible: mRNA dual reporter life-time ratios, CVs, andcorrelations have been reported [17, 19] with measure-ments in the range 0.14 ≤ T ≤ 1, 0.3 ≤ CVx,y ≤ 3,and 0.056 ≤ ρxy ≤ 0.89. CVs from fluorescent proteinreporters tend to be smaller, with CVs typically rang-ing from 0.1 to 1 [4, 12, 13]. Next we discuss real-worldchallenges when our constraints meet experimental dataand analyze recent gene expression data quantifying pop-ulation variability of constitutively expressed fluorescentproteins in E. coli [15].

A. Unknown life-time ratios

Our constraints depend on the ratio of reporter life-times or maturation times. For many fluorescent pro-teins, maturation times can be obtained from the lit-erature [15], but mRNA life-times may not be preciselyknown or might be highly context dependent. This prob-lem can be overcome through the addition of a third re-porter: by measuring pairwise correlations between threedual reporter transcripts within the class of Eq. (1), wecan determine the ratio of life-times between any pair ofthree reporters (Appendix J 1). In particular, the ratioT := τy/τx is given by

T =ηxx − ηxwηyy − ηyw

, (6)

where w denotes the abundance of a third co-regulatedmRNA reporter (with unknown/arbitrary life-time), and

ηfg := Cov(f, g)/(〈f〉〈g〉) denote normalized covariancesobtained from population snapshots.

Similarly, for fluorescent proteins within the class ofFig. 2A, we can determine the ratio of maturation timesTm by measuring the correlations of X′′ and Y′′ withtwo other fluorescent proteins of known maturation time(see Appendix J 1). Utilizing additional reporters to de-termine unknown asymmetries between X and Y is asuccessful general strategy because the number of con-straints on a system increases faster than the number ofparameters when adding additional reporters and observ-ing their (co)variance. For example, next we specify howunknown constants of proportionality in transcription orexperimental detection can be inferred from pairwise cor-relation measurements with additional reporters.

B. Proportional transcription rates

So far we considered co-regulated genes that have iden-tical (though probabilistic) transcription rates. This ismotivated by experimental synthetic reporter systemsin which two fluorescent proteins are expressed undertwo identical copies of a known transcriptional promoter.Additionally, the above results can be extended to co-regulated genes in which the transcription rates are notidentical but proportional to each other, i.e., Ry = αRxfor some α. Similar bounds hold for this new class ofsystems (see Appendix I) but they depend on the pro-portionality factor α which is potentially unknown. Forthe class of systems defined in Eq. (1) this issue can beresolved by introducing an additional reporter to deter-mine the constant of proportionality between the tran-scription rates. If the life-time ratio T is not known apriori, then four reporters would be needed in order todetermine all unknowns. Additionally, for synthetic fluo-rescent protein reporters, the constant of proportionalityin transcription can be inferred from absolute numbersof molecules. If measurements only report relative num-bers, α can be inferred through additional reporter cor-relations or sequential single-color experiments in whichthe same fluorescent protein is expressed under the twodifferent promoters (see Appendix J 2).

C. Systematic undercounting

Experimental data in cell biology might not reflectabsolute abundances. For example, mRNA moleculesare often counted using fluorescence in situ hybridization(FISH), a method that can lead to a systematic under-counting of the copy number due to probabilistic bind-ing between mRNA molecules and the fluorescent probes.Similarly, sequencing methods rely on probabilistic am-plification steps to detect transcripts. However, as longas each type molecule is experimentally detected withthe same probability, the above bounds and accessibleregions are strictly unaffected by the detection probabil-ities (Appendix H). This can be intuitively understoodbecause undercounting corresponds to a binomial read-

8

C)

Figure 4. Utilizing bounds to analyze concentrations in growing and dividing cells. A) We want to distinguishstochastically varying transcription rates whose average remains constant throughout the cell-cycle (red line) from transcriptionrates that change periodically during the cell-cycle (exemplified by blue line). Numbers of molecules oscillate in both typesof systems but only systems with cell-cycle dependent driving oscillate in concentrations. Our constraints can identify suchperiodic gene expression from static snapshots of population variability without access to time-series data. B) Solid orangelines denote open-loop constraints proven for systems in which the reporter concentrations are independent of the cell volume,under the assumption that the cell volume grows exponentially between division events. For mRNA reporters (left panel)concentration correlations are then bounded by the same constraints derived for absolute numbers in the absence of feedback.When transcription rates are periodic, the open-loop constraint of Eq. 2 for absolute numbers does not strictly apply toconcentrations. Numerical simulations of example systems (blue dots) suggest that such systems only fall fall marginally outsidethe orange bounds. For fluorescent reporters (right panel), the feedback bounds (orange lines) for concentrations depend onthe additional parameter T cm := (1/τmat,x + ln(2)/τc)/(1/τmat,y + ln(2)/τc) where τc is the average cell-cycle time, see Eq. (7).However, for both mRNA and fluorescence levels, Eqs. (4) and (5) (dashed black lines) can be used to detect cell cycle dependentgenes from asynchronous static snapshots of dual reporter concentrations. C) Previously reported gene expression variabilitydata for constitutively expressed fluorescent proteins that exhibit first order maturation dynamics in E. coli [15]. Plotted arevariability ratios and reporter asymmetries with respect to the observed dynamics of mEYFP. As expected for constitutivelyexpressed fluorescent proteins, the experimental data is consistent with the class of gene expression models that do not exhibitfeedback and are not periodically driven (pink region) bounded by Eq.(5) and Eq.(8). With the exception of the indicatedmEGFP outlier, all data fall along the right hand boundary that is only accessible for systems whose fluorescence variabilityis dominated by a translation, maturation, or machine read-out step. The yellow corridor indicates the estimated uncertaintyin reported cell-cycle time τc = (28.5 ± 2) min and the reference mEYFP maturation half-life τmat,y ln(2) = (9.0 ± 0.7) min.Variability ratios with respect to all other reference fluorescence proteins confirm the above picture with the exception of theobserved mEGFP variability which violates the expected behaviour, see Appendix L.

out step which de-correlates measurements just like thealready accounted for stochastic reaction steps. Whilereported averages, variances, and correlations stronglydepend on detection probabilities, their accessible regionis bounded by the same inequalities described above.

If different reporters have a different probability of be-ing detected we can generalize the above constraints toaccount for that and infer the missing parameter frompairwise correlations between three reporters (see Ap-pendix J 2).

D. Concentration measurements instead ofabsolute numbers

Growing and dividing cells exhibit a natural cycle inwhich, on average, cell-division “eliminates” half themolecules of a cell while also reducing the cell volumeby a factor of two. Instead of considering absolutenumbers, many experiments thus report concentrationsof molecules which (on average) are unaffected by cell-division. In the preceding discussion, the specified re-

actions describe the birth and death of molecules, butexactly the same constraints of Eqs. (4) and (5) can beused to analyze concentration measures of growing anddividing cells. When using concentrations to determineCVs, violations of Eqs. (4) and (5) can then detect geneswhose transcription rate varies periodically during thecell-cycle, see Fig. 4. In this analysis of concentrations weallow for binomial splitting noise, division time fluctua-tions, and asymmetric divisions, and we assume that cel-lular volume grows exponentially between divisions (Ap-pendix K).

When reporter concentrations are independent of cellvolume, open-loop constraints for concentrations similarto the ones derived for absolute numbers can be analyti-cally proven (Appendix K). For co-regulated mRNA thisconstraint is exactly identical to Eq. (2), but for fluores-cent proteins the equivalent of Eq. (3) changes becauseconcentrations are subject to an additional degradationrate from dilution governed by the average cell-cycle timeτc. In the absence of feedback, CVs in fluorescence con-centrations that are volume independent are constrained

9

by

max

T cm

CVy′′CVx′′

− CVx′′CVy′′

1− T cm, 0

6 ρx′′y′′ 6

T cmTm

CVy′′CVx′′

− CVx′′CVy′′

T cmTm− 1

(7)where T cm = (1/τmat,x + ln(2)/τc)/(1/τmat,y + ln(2)/τc),and these bounds are indicated by the solid orange linesin Fig. 4B (right panel).

Systems in which the reporter concentrations are notindependent of the volume are not bound by the aboveconstraints. However, numerical simulations suggest thatCVs in concentrations violate inequalities Eqs. (2), (7)only marginally, see blue dots in Fig. 4B. An exact boundcan be derived to strictly constrain this class of systemsif we have experimental access to a third reporter (seeAppendix K 3).

E. Data from constitutively expressed fluorescentproteins fall within the expected bounds

Ultimate proof that our constraints can be used toidentify periodically expressed genes will consist of ex-perimentally observing CVs of cell-cycle dependent pro-moters that violate Eq. (4) or Eq. (5) in physiologicallyrelevant regimes. However, even in the absence of suchdirect verification, the applicability of our method can besupported through a self-consistency check, i.e., whetherdata for fluorescent proteins expressed through a con-stitutive promoter fall into the expected region of geneexpression dynamics that is stochastically driven and notsubject to feedback control. Such data exists in E. coli forfluorescent proteins with precisely determined matura-tion dynamics [15]. Here, we analyze this cell-to-cell vari-ability data for fluorescent reporters that exhibited clearfirst-order maturation dynamics by determining the CVin protein concentration after subtracting volume vari-ability of growing and dividing cells (Appendix L).

In the absence of simultaneous fluorescence measure-ments, sequential CV measurements in concentrationsare constrained by

T cm 6 CVx′′

CVy′′6√T cmTm

, (8)

which corresponds to the bounds of Eq. (7) when allowingfor any value of the correlation coefficient (unobserved forsequential variability measurements). As discussed in theprevious section, the no-oscillation constraint of Eq. (5)still applies directly even when we analyze concentrationsrather than absolute numbers.

In Fig. 4C we present experimentally determined vari-ability data [15] for fluorescent proteins under control ofthe constitutive promoter proC in E. coli with mEYFPvariability and maturation dynamics as a reference point.As expected for constitutively expressed fluorescent pro-teins, the data for these biologically “boring” systems isconsistent with the class of gene expression models that

do not exhibit feedback and are not periodically driven(pink region), bounded by Eq.(5) and Eq.(8). Error barsfor T cm in Fig. 4C are taken from [15] with error prop-agation, and error bars in CVs and their ratios are thestandard error of the mean from three independent cellcultures with error propagation respectively. Further-more, all data (with the exception of mEGFP) follow theright boundary of the allowed region. The observed vari-ability of mEGFP is confirmed as an outlier when usingthe mEGFP measurements as a reference point for whichthe observed CV-ratios violate our predicted bounds (seeAppendix L). One possible explanation of this outlierwould be that mEGFP cells grew more slowly than theexperimentally reported 28.5 min cell-division times forall strains.

Data that falls on the right hand boundary is consistentwith negligible biological noise upstream of translation orwith variability that is dominated by technical measure-ment noise whose normalized variance decreases with theinverse of the mean population signal (see Appendix L).

F. Measurement noise & technological limitations

Theoretical constraints can be experimentally ex-ploited as long as measurement uncertainties are suffi-ciently small. When considering the sampling error ofcell-to-cell variability, 95% confidence intervals for CVshave been reported to be around 5-10% of the respectiveCV value [12, 19]. Similarly, experiments have shownhigh biological reproducibility, e.g., three repeats of iden-tical flow-cytometry experiments exhibited a standarderror of the CVs of around 10% [15]. However, a sig-nificant limitation remains for current experimental high-throughput methods like flow-cytometry or single-cell se-quencing: the measurements techniques themselves in-troduce significant noise, especially when used with bac-teria [25, 26], which means that technical variability canintroduce a systematic error in estimates of biologicalvariability. In such cases, one can attempt to decon-volve true biological variability from measurement noiseusing experimental and analytical techniques, e.g., by us-ing calibration beads [25] or by using noise models [26].Alternatively, one can opt for methods of lower through-put but greater precision, for example, mRNA measure-ments by single-molecule FISH (smFISH) are well-suitedfor validation of our method given its accuracy and highsensitivity [27]. Rapid improvements in high throughputquantification tools such as sequential FISH (seqFISH)will provide exciting opportunities for future applicationsof our work [28].

V. DISCUSSION

While our results are motivated by the analysis of geneexpression dynamics, utilizing correlation constraints tocharacterize the dynamics within complex processes mayprove useful in other areas of science in which systemsinvolve many interacting components whose dynamics is

10

difficult or impossible to track completely. Our resultsshow that it is possible to rigorously analyze correlationswithin classes of incompletely specified physical modelswithout resorting to statistical inference methods. Cru-cially, our approach does not require time-resolved data,which is often unavailable for complex systems. For ex-ample, following individual cells over time is much morechallenging, and thus less common, than taking staticpopulation snapshots of cellular abundances.

Our correlation constraints provide a framework to de-tect the presence of feedback strictly from observationsof a small subset of components within a much largercellular process. This could, e.g., be utilized to pinpointmolecular components that are involved in feedback reg-ulation of a gene by observing how our proposed sig-nature of feedback is affected in knock-out experiments.Additionally, our results highlight that using unequal re-porters can reveal dynamic properties of regulation evenin the absence of temporal data. These constraints arefundamentally due to the dynamics of interactions andare not apparent in previous work that considered asym-metric dual reporters as static random variables [29]. Forexample, we show that theoretically cell-cycle dependenttranscription rates can be detected from static popula-tion measurements of asymmetric downstream productsof gene expression. Additionally, we explicitly show thatour mathematical framework can be utilized even whenexperimental techniques detect individual molecules onlyprobabilistically or when key parameters are unknown.Finally, we report an experimental “negative control” inwhich we confirm that the measured variability of con-stitutively expressed fluorescence proteins falls into theexpected region of gene expression variability for genesthat are not subject to feedback regulation or periodicdriving.

ACKNOWLEDGMENTS

We thank Raymond Fan, Brayden Kell, Seshu Iyen-gar, Timon Wittenstein, Sid Goyal, Ran Kafri, and JoshMilstein for many helpful discussions. We thank Lau-rent Potvin-Trottier and Nathan Lord for valuable feed-back on the manuscript. This work was supported bythe Natural Sciences and Engineering Research Councilof Canada and a New Researcher Award from the Uni-versity of Toronto Connaught Fund. AH gratefully ac-knowledges funding through grant NSF-1517372 while inJohan Paulsson’s group at Harvard Medical School.

APPENDIX

Here we detail the mathematical derivations and illus-trate some of the results in greater depth. Throughoutthe appendix we denote the normalized (co)variance asηfg := Cov(f, g)/(〈f〉〈g〉), where the statistical measuresare stationary population averages.

First we go over the mathematical framework in whichwe model reaction networks in cells. The state of an

intracellular biochemical network at a given moment intime is given by the integer numbers {xi} of the chemicalspecies {Xi}, which we can write as x = (x1, . . . , xn).This system is dynamic, and so this state x will undergodiscreet changes over time as reactions occur. If there arem possible reactions in the system, we can write them as

xrk(x)−−−−−−−→ x + dk k = 1, . . . ,m

where the rate rk(x) corresponds to the probability perunit time of the k-th reaction occurring, and the stepsize dk = (d1k, . . . , dnk) corresponds to the change in thechemical species numbers x from the k-th reaction (thedik are positive or negative integers). It then followsthat the system probability distribution P (x, t) evolvesaccording to the chemical master equation [30, 31]:

d

dtP (x, t) =

k

[rk(x− dk)P (x− dk, t)− rk(x)P (x, t)] .

Time evolution equations for the moments 〈xki 〉 of eachcomponent Xi follow directly from the chemical masterequation [31].

Appendix A: Fluctuation balance relations forsystems as defined in Eq. (1)

Previous work established general relations that con-strain fluctuations of components within incompletelyspecified reaction networks [18]. In particular, any twocomponents Z1 and Z2 in an arbitrarily complex networkthat reach wide-sense stationarity, must satisfy the fol-lowing flux balance relations

〈R+1 〉 = 〈R−2 〉, (A1)

as well as the following fluctuation balance relations

Cov(z1, R−2 −R+

2 ) + Cov(z2, R−1 −R+

1 )=∑

k

d1kd2k〈rk〉,

(A2)

where R+i and R−i are the net birth and death fluxes of

component Zi, and the summation is over all reactionsin the network, with rk the rate of the k-th reaction, anddik the step-size of Zi of the k-th reaction. For the classof systems in Eq. (1), Eqs. (A1) and (A2) imply

〈R〉 = 〈x〉/τx & 〈R〉 = 〈y〉/τy, (A3)

and

ηxx =1

〈x〉 + ηxR, ηyy =1

〈y〉 + ηyR,

ηxy =1

1 + TηxR +

T

1 + TηyR.

(A4)

These relations must be satisfied by all systems in theclass, regardless of the unspecified details. This allows

11

us to set general constraints that hold for the entire classof systems. For example, the above relations lead to

ηxy =1

1 + T

(ηxx −

1

〈x〉

)+

T

1 + T

(ηyy −

1

〈y〉

)

≤ 1

1 + Tηxx +

T

1 + Tηyy,

where in the 2nd step we used fact that the averagesare positive. Dividing this bound by

√ηxxηyy leads to

ρxy(1 + T ) 6 CVxCVy

+ TCVyCVx

, which is the bound indicated

by the dashed orange line in Fig. 1B. The inequality be-comes an equality when 1

〈x〉 ,1〈y〉 → 0, which corresponds

to systems where X and Y exhibit no intrinsic stochas-ticity and follow the upstream signal deterministically.

Appendix B: Constraints on open-loop systems forsystems as define in Eq. (1)

1. Derivation of the open-loop constraint Eq. (2)

We consider a hypothetical ensemble of systems fromthe class of Eq. (1) that all share the same upstream his-tory u[−∞, t]. We can then consider the average stochas-tic dual reporters conditioned on the history of their up-stream influences [32, 33], which corresponds to the aver-ages at a moment in time in this hypothetical ensemble

x(t) = E[Xt|u[−∞, t]

]& y(t) = E

[Yt|u[−∞, t]

].

These are time-dependent variables that depend on theupstream history u[−∞, t]. We can take averages 〈x〉and (co)variances ηxx over the distribution of all possiblehistories.

This conditional system is not stationary because ofthe synchronised rate R(t) = R(u(t)), and so the time-evolution of the averages will follow the following differ-ential equations [32]

d∆x

dt= ∆R(t)−∆x(t)/τx,

d∆y

dt= ∆R(t)−∆y(t)/τy,

(B1)

where ∆x = x− 〈x〉, ∆y = y − 〈y〉, and ∆R(t) = R(t)−〈R〉. These differential equations correspond to a linearresponse problem in which X and Y both respond toR(t) on different timescales. To relate the results to theactual ensemble we multiply the left differential equationby x and take the expectation over all histories u[−∞, t]to get

E

[1

2

d∆x2

dt

]= E

[∆xR(t)

]− E

[∆x2

]/τx.

Note that E[d∆x2

dt

]= d

dtE[∆x2

]= d

dtVar(x), which

equals zero at wide-sense stationarity. We thus have

Cov(x,∆R(t)) = Var(x)/τx.

Moreover, we have

Cov(x, R(t)) = E[xR(t)

]− E

[x]· E[R(t)

]

= E[E[Xt|u[−∞, t]] ·R(t)

]− 〈x〉〈R〉

= E[E[XtR(u(t))|u[−∞, t]]

]− 〈x〉〈R〉

= 〈xR〉 − 〈x〉〈R〉 = Cov(X,R).

Putting these results together and normalizing we find

ηxx = ηxR & ηyy = ηyR, (B2)

where the flux-balance relations Eq. (A3) were used andthe expression on the right follows by symmetry. Simi-larly, it has been shown [32] that ηxy = ηxy. Comparingwith the fluctuation-balance relations Eq. (A4), we find

ηxx =1

〈x〉 + ηxx, ηyy =1

〈y〉 + ηyy,

ηxy = ηxy =1

1 + Tηxx +

T

1 + Tηyy.

(B3)

These relations allow us to translate results derived fromthe deterministic dynamics to the stochastic dynamics.We now use Cauchy-Schwarz as follows

(1

1 + Tηxx +

T

1 + Tηyy

)2

= η2xy ≤ ηxxηyy.

This inequality leads to a quadratic that can be solvedto obtained the following inequality

T 2ηyy ≤ ηxx ≤ ηyy. (B4)

We need to write this inequality in terms of the measur-able (co)variances ηxx, ηyy, and ηxy. To do this, note thatthe flux-balance equations Eq. (A3) and the fluctuation-balance equations Eq. (A4) comprise a linear system of5 equations and 5 unknowns. We can thus solve for ηxRand ηyR in terms of the measurable (co)variances

ηxR =(1 + T )ηxy − Tηyy + ηxx

2,

ηyR =(1 + T )ηxy + Tηyy − ηxx

2T.

(B5)

From Eq. (B2) we find that ηxx = ηxR and ηyy = ηyR, sowe substitute Eq. (B5) into Eq. (B4), which leads to theopen-loop constraint of Eq. (2).

2. Discriminating types of feedback

The open-loop constraint of Eq. (2) constrains all sys-tems from the class of Eq. (1) in which X and Y arenot connected in some kind of feedback loop. Here wederive similar constraints on systems where only one ofthe components undergoes open-loop regulation, whilethe other can still be connected in a feedback loop. Wewill also show how co-regulated reporters can be used to

12

infer whether or not the feedback is negative, and if so,how to measure the noise suppression from this negativefeedback using only (co)variance measurements.

First, we consider systems in which there is no feedbackin one of the components, say X. We can then conditionon the history of the upstream variables u(t) and thehistory of Y — together they make a larger cloud ofcomponents that can affect X but are not affected byX. Just like the previous section, we then consider theconditional average x = E[Xt|u[−∞, t], y[−∞, t]], fromwhich we have

E[x(t)y(t)]− E[x]E[y(t)] = . . .

= E[E[Xty(t)|u[−∞, t], y[−∞, t]

]]− 〈x〉〈y〉

= E[E[XtYt|u[−∞, t], y[−∞, t]

]]− 〈x〉〈y〉

= 〈xy〉 − 〈x〉〈y〉 ⇒ ηxy = ηxy .

We can then use the Cauchy-Schwarz inequality in thefollowing way: η2

xy = η2xy ≤ ηxxηyy. This inequality

bounds systems in which there is no feedback in X.We would now like to write it in terms of measurable(co)variances. To do this, we note that the relationηxx = ηxR from Eq. (B2) still holds here as we madeno assumptions about Y in that derivation. We can thususe Eq. (B5) to write ηxx in terms of the measurable(co)variances and subsitute the results in the above in-equality. This leads to the following constraint

ρ2xy ≤

1

2

((1 + T )ρxy

CVyCVx

− T(CVyCVx

)2

+ 1

). (B6)

Systems that break this constraint must have some kindof feedback in X. Similarly, we can derive the analogousconstraint on systems with no feedback in Y :

ρ2xy ≤

1

2T

((1 + T )ρxy

CVxCVy

+ T −(CVxCVy

)2). (B7)

Systems that break this constraint must have some kindof feedback in Y . These bounds are plotted in Fig. 5A.

Next, we show how to detect negative feedback. Herewe define feedback to be negative when ηxR < 0. That is,the birthrate R acts to suppress noise in X below pois-son noise. From Eq. (B5), ηxR and ηyR can be solved forin terms of the measurable (co)variances. (Co)variancemeasurements between co-regulated mRNA can thus beused to measure ηxR and ηyR using Eq. (B5), see Fig. 5B.Moreover, we can quantify how strong this negative feed-back is through the noise suppression : ηxx/(1/〈x〉) =ηxx/(ηxx− ηxR). As the negative feedback gets stronger,the noise suppression will get smaller and quantifies thestrength of the negative feedback.

Figure 5. The space of possible covariability betweenmRNA-levels of co-regulated genes depends on thetype of feedback A) Systems that have no feedback in Xmust lie in the region bounded by the light-red solid lineswhich corresponds to the region bounded by Eq. (B6). Sys-tems that lie outside of this region must have feedback inX. Similarly, systems that have no feedback in Y must lie inthe region bounded by the light-blue solid lines which corre-sponds to the region bounded by Eq. (B7). B) Here we definefeedback to be negative when the mRNA transcription rate isnegatively correlated with the mRNA levels: ηxR < 0. Thiswould correspond to highly regulated systems that exhibitnoise suppression. Using co-regulated reporters, ηxR and ηyRcan be measured from static (co)variance measurements of Xand Y using Eq. (B5).

Appendix C: Dynamics from static transcriptvariability

The stationary solution for the variance of x(t) followsfrom the linear response problem Eq. (B1) with

ηxx =1

τxηRR

∫ ∞

0

A(s)e−s/τxds, (C1)

where A(s) is the auto-correlation of the time-varyingtranscription rate R(t) (see below for a detailed deriva-tion). For a given input R(t), ηxx depends on τx, andso measuring τxηxx as a function of τx effectively deter-mines the Laplace Transform of the auto-correlation ofR(t) from static variance measurements of downstreamreporters. Our results exploit the fact that knowingstatic downstream variability for just two values of τxconstrains the possible dynamics of R(t).

In particular, with τy < τx we have

ηxx − Tηyy =ηRRτx

∫ ∞

0

A(s)(e−s/τx − e−s/τy

)ds. (C2)

For stochastic upstream signals with A(s) ≥ 0 the inte-

13

grand of Eq. (C2) is non-negative such that

Tηyy ≤ ηxx. (C3)

Eq. (C3) gives the bound of Eq. (4) after substitutingηxx = ηxx − 1/〈x〉 and ηyy = ηyy − 1/〈y〉 together withthe flux balance given by Eq. (A3).

We can further bound the class of stochastic systemsbased on the timescale of the upstream fluctuations.If upstream fluctuations are slower than τR such thatA(s) ≥ e−s/τR , we have

ηxx − Tηyy ≥ηRRτx

∫ ∞

0

e−t/τR(e−t/τx − e−t/τy

)dt

= ηRRτR

(1

τR + τx− T

τR + τy

)

≥ ηyyτR(

1

τR + τx− T

τR + τy

),

and analogously to the above derivation of Eq. (4), itfollows that

β(1− T )ρxy 6 CVxCVy

− T CVyCVx

, (C4)

where β = (1 + T )/(

2T (1 + τxτR

)(1 + T τxτR

) + (1− T ))

.

In the limit where τR → 0 we have β → 0, and asa result Eq. (C4) converges to Eq. (4). Conversely,for slow stochastic upstream fluctuations as τR → ∞we have β → 1, and Eq. (C4) converges to the rightboundary of the open-loop constraint (see Fig. 6). Theanalytical bound of Eq. (C4) is marginally loose. A tightbound can be derived numerically as presented in theSupplemental Material [34].

Derivation of Eq. (C1): Taking the Fourier transformsof Eq. (B1) and using the Wiener-Khinchin theorem [35]which states that the spectral density of a random signalis equal to the Fourier transform of its autocorrelation,we find

F [Rx] =F [RR]

(1 + ω2)& F [Ry] =

F [RR]/T 2

(1/T 2 + ω2),

where RR is the autocovariance of R. Since RR(t = 0) =Var(R), the variances can be found by taking the inverseFourier transforms at t = 0

Var(x) =1

∫ ∞

−∞

F [RR]

(1 + ω2)dω,

Var(y) =1

∫ ∞

−∞

F [RR]

(1/T 2 + ω2)dω.

(C5)

Figure 6. The space of possible correlations betweenco-regulated reporters depends on the timescale ofthe upstream fluctuations. A) The timescale of stochas-tic fluctuations is characterized by how quickly the auto-correlation of the signal (red) decays to zero. By bound-

ing the auto-correlation from below by e−t/τR (black dashedcurve) we set a lower bound on how quickly the signal auto-correlation can decay to zero. This thus sets a lower bound onthe ”speed” of the signal fluctuations. B) Open-loop systemsin which the upstream auto-correlation is bounded below bye−t/τR are bounded further towards the right. The solid darkred lines correspond to the loose bound given by Eq. (C4),whereas the black dashed lines are given by the stricter nueri-cal bound presented in the supplement [34]. These constraintscan be used to gain information on the timescale of upstreamfluctuations with only access to static snapshots of dual re-porter transcript concentrations obtained, e.g., from single-cell sequencing methods that report snapshots of transcriptabundances rather than temporal information.

We thus have

Var(y) = . . .

=1

∫ ∞

−∞F [RR](ω)

(T 2

1 + (ωT )2

)dω

=1

∫ ∞

−∞

[∫ ∞

−∞RR(t)e−iωtdt

](T 2

1 + (ωT )2

)dω

=1

∫ ∞

−∞

[∫ ∞

−∞RR(t)cos(ωt)dt

](T 2

1 + (ωT )2

)dω

=1

∫ ∞

−∞

∫ ∞

−∞RR(t)cos(ωt)

(T 2

1 + (ωT )2

)dtdω

=1

∫ ∞

−∞RR(t)

[∫ ∞

−∞cos(ωt)

(T 2

1 + (ωT )2

)dω

]dt

=1

2

∫ ∞

−∞RR(t)Te−|t|/T dt

=

∫ ∞

0

RR(t)Te−t/T dt,

where in the third step we use the fact that RR(t) is realand symmetric, and in the last step we use the fact thatthe integrand is symmetric in t. Normalizing by the av-erages, we have Eq. (C1). Note that this expression canalso be derived by writing down the general solution of∆y(t) from the differential equation Eq. (B1), squaring

14

the solution to get ∆y(t)2, and taking the ensemble av-erage over all histories, where ergodicity is not assumed.

Setting bounds on the spectral density of upstreaminfluences using co-regulated reporters

Numerical simulations of sinusoidal driving indicatethat slow oscillations lie along the right hand side of theopen-loop region in Fig. 1B, and move towards the leftas the frequency of oscillation increases. Here we de-rive approximate conditions for how large an oscillationfrequency needs to be to break the constraint given byEq. (4) of the main text. In particular, systems in whichthe spectral density of the upstream signal |F [R]|2(ω) iszero for all ω ≥ ωu are bounded by the following inequal-ity

(TCVyCVx

− CVxCVy

)(1 + T )

(γ + T )≥ (T − γ)ρxy

where γ =

(1 + ω2

uτ2y

1 + ω2uτ

2x

).

(C6)

Systems that break the above inequality cannot satisfythe requirement that |F [R]|2(ω) = 0 for all ω ≥ ωu.Similarly, systems in which the spectral density of theupstream signal |F [R]|2(ω) is zero for all ω ≤ ωu arebounded by the following inequality

(T − γ)ρxy ≤(TCVyCVx

− CVxCVy

)(1 + T )

(γ + T )

where γ =

(1 + ω2

uτ2y

1 + ω2uτ

2x

).

(C7)

Systems that break the above inequality cannot satisfythe requirement that |F [R]|2(ω) = 0 for all ω ≤ ωu.

Eqs. (C6) and (C7) bound fast oscillations towards theleft and slow oscillations towards the right of the open-loop region in Fig. 1B. As an oscillatory signal will havea peak in its spectral density centered at the angular fre-quency of the oscillation, Eqs. (C6) and (C7) provide uswith an estimation for how fast an upstream oscillationneeds to be relative to the reporter lifetimes in order forthe system to cross the no-oscillation line and be fullydiscriminated from stochastic signals. In particular, set-ting γ = T turns (C7) into the ”no-oscillation constraint”given by Eq. (4) of the main text, and turns Eq. (C6) intothe opposite constraint which constrains systems to be inthe region only accessible by oscillations. This γ = T isachieved when ωu = 1/

√τxτy. We thus have the follow-

ing approximate requirement for how fast the upstreamsignal needs to oscillate relative to the reporter lifetimesin order to break the no-oscillation bound

1√τxτy

. ωR = 2πfR, (C8)

where fR is the frequency of the upstream oscillation.

We will now derive the bounds that were just pre-sented. We normalize Eq. (C5) by the averages

ηxx =1

2π〈x〉2∫ ∞

−∞

|F [R]|21 + ω2

ηyy =1

2π〈x〉2∫ ∞

−∞

|F [R]|21 + ω2T 2

dω,

(C9)

where without loss of generality we work in units whereτx = 1 and τy = T , and where we use the fact that〈y〉 = T 〈x〉. We thus have

ηxx −(

1 + ω2uT

2

1 + ω2u

)ηyy =

1

2π〈x〉2∫ ∞

−∞|F [R]|2

(1

1 + ω2−(

1 + ω2uT

2

1 + ω2u

)1

1 + ω2T 2

)dω.

The expression in square brackets is negative for ω > ωuand positive otherwise. Thus, if the spectral density ofR is zero for ω ≥ ωu, we have

ηxx −(

1 + ω2uT

2

1 + ω2u

)ηyy ≥ 0.

Using Eq. (B2) and Eq. (B5) to write ηxx and ηyy interms of measurable (co)variances, this inequality be-comes Eq. (C6). Similarly, the same arguments showthat systems in which the spectral density of R is zerofor all ω ≤ ωu must satisfy Eq. (C7).

Appendix D: The general class of co-regulatedfluorescent proteins

For the class of fluorescent proteins defined in Fig. 2Awe can derive similar results. First, however, we will de-fine a more general class of systems from which Fig. 2Ais a subset. This class consists of the class of systems inFig. 2A when we don’t make any assumptions about themRNA intrinsic system, see Fig. 7. Here, ux and uy aresystems of variables that model the mRNA dynamics af-ter transcription or any post-translational modificationsthat occur before the maturation step. Together withu(t) these form a larger “cloud” of components that af-

15

Figure 7. General class of dual reporter systems with co-regulated fluorescent proteins We consider a class ofsystems identical to Fig. 2A with the exception that we now include multiple unspecified steps in the intrinsic dynamics ofgene expression. Here ux and uy are identical (though independent) systems of components that are affected by the upstreamcloud of components u(t) in the same way. These two smaller clouds of components model the intrinsic steps in gene expressionand allow for a wide range of possible mRNA dynamics and post-translational modifications that occur before the maturationstep. They are left unspecified except for the fact that we assume they do not form circuits of components that can createoscillations, so that any oscillatory variability is caused by the shared environment u(t). We model the dynamics of immaturefluorescent proteins denoted as X’ and Y’, as well as mature fluorescent proteins X” and Y”. The birthrate of X’ and Y’ cannow depend on the components in ux and uy respectively, in an arbitrary way. The asymmetry between the co-regulated genesis characterized by the ratio of average fluorescent maturation times Tm := τmat,y/τmat,x.

fect the proteins X ′ and Y ′. We require that ux and uyare identical, though independent (in the sense that theyare not equal, as the shared influence of u(t) will makethem statistically dependent). Aside from this they areleft almost completely unspecified. We will make oneadditional requirement on these non-specified intrinsicsystems when we prove the open-loop constraint givenby Eq. (3) of the main text. In particular, we requirethat the fluctuations that originate from these intrinsicsystems be non-oscillatory, so that any oscillatory vari-ability is caused by variability in the shared upstreamenvironment u(t). Specifically, the birthrates of X’ andY’ will not be entirely equal due to random fluctuationsthat originate from the intrinsic systems. This intrinsicnoise can be quantified as the difference between the twotranslation rates: F (u(t),ux) − F (u(t),uy). This dif-

ference is a stochastic signal, and here we require thatthe autocorrelation of this signal be non-negative. Thisassumption holds for the class in Fig. 2A and in sys-tems where the intrinsic systems consist of an otherwiseunspecified cascade of arbitrary steps. The assumptionexcludes systems in which ux and uy form circuits ofcomponents that create oscillations.

Similarly to Appendix A, at wide-sense stationarity thefollowing flux-balance relations must hold [18]

〈x′′〉/τ ′′ = 〈x′〉/τmax,x = 〈Fx〉,〈y′′〉/τ ′′ = 〈y′〉/τmax,y = 〈Fy〉,

(D1)

where Fx := F (u(t),ux) and Fy := F (u(t),uy). In ad-dition, the following fluctuation-balance relations musthold,

ηx′x′ =1

〈x′〉 + ηx′Fx , ηy′y′ =1

〈y′〉 + ηy′Fy , ηx′y′ =1

1 + Tmηx′Fy +

Tm1 + Tm

ηy′Fx ,

ηx′′x′′ =1

〈x′′〉 + ηx′x′′ , ηy′′y′′ =1

〈y′′〉 + ηy′y′′ , ηx′′y′′ =1

2ηy′x′′ +

1

2ηx′y′′ ,

ηx′x′′ =1

1 + τ ′′τmat,x

ηx′Fx +

τ ′′

τmat,x

1 + τ ′′τmat,x

ηx′′Fx , ηy′y′′ =1

1 + τ ′′τmat,y

ηy′Fy +

τ ′′

τmat,y

1 + τ ′′τmat,y

ηy′′Fy ,

ηy′x′′ =1

1 + τ ′′τmat,y

ηx′y′ +

τ ′′

τmat,y

1 + τ ′′τmat,y

ηx′′Fy , ηx′y′′ =1

1 + τ ′′τmat,x

ηx′y′ +

τ ′′

τmat,x

1 + τ ′′τmat,x

ηy′′Fx .

(D2)

Appendix E: Constraints on open-loop systems forsystems as define in Fig. 2A

Here we prove the inequality Eq. (3). The inequalityEq. (3) has three parts, which are represented by

the three solid orange lines in Fig. 2B in the paper.We will first prove these three constraints one at atime, followed by the upper bound on correlationsillustrated by the dashed orange line in Fig. 2B. Refer

16

back to Fig. 7 for illustration of the system being studied.

Proof of the bottom bound ρx′′y′′ ≥ 0: Just like inthe previous sections, we consider the average stochas-tic dual reporter dynamics conditioned on the history oftheir upstream influences

x′′(t) = E[X ′′t |u[−∞, t]

]& y′′(t) = E

[Y ′′t |u[−∞, t]

].

When the cloud of components u(t) is not affected bythe downstream components we can write the time evolu-tion of the conditional averages using the chemical masterequation of the conditional probability space

d∆x′

dt= ∆F (t)−∆x′

d∆y′

dt= ∆F (t)−∆y′/Tm

d∆x′′

dt= ∆x′ −∆x′′/τ ′′

d∆y′′

dt= ∆y′/Tm −∆y′′/τ ′′

where without loss of generality we work in units whereτmat,x = 1 and τmat,y = Tm, and where

F (t) = E[F (u(t),ux)|u[−∞, t]

]

= E[F (u(t),uy)|u[−∞, t]

],

is the translation rate after averaging out the mRNA fluc-tuations. Note that the equality on the right hand sideis from the fact that the unspecified intrinsic systems uxand uy are identical and thus the conditional averages arethe same. Taking the Fourier transform of the above dif-ferential equations, and then using the Wiener-Khinchintheorem [35] which states that the spectral density of arandom signal is equal to the Fourier transform of itsautocorrelation, we find

F [Rx] =F [RF ]

(1/τ ′′2 + ω2)(1 + ω2),

F [Ry] =F [RF ]

(1/τ ′′2 + ω2)(1 + ω2T 2m),

where RF is the autocovariance of F (t). Since Rz(t =0) = Var(z), the variances can be found by taking theinverse Fourier transforms at t = 0

Var(x′′) =1

∫ ∞

−∞

F [RF ]

(1/τ ′′2 + ω2)(1 + ω2)dω

Var(y′′) =1

∫ ∞

−∞

F [RF ]

(1/τ ′′2 + ω2)(1 + ω2T 2m)dω.

(E1)

Moreover, we then use the Wiener-Khinchin theorem [35]which states that the Fourier transform of the cross powerspectral density of two signals is equal to the Fouriertransform of their cross-correlation to write

F [Rx′′,y′′ ] =F [RF ](1/Tm + ω2 + iω(1− 1/Tm))/Tm

(1/τ ′′2 + ω2)(1 + ω2)(1/T 2m + ω2)

,

where Rx′′,y′′ is the cross-covariance of x′′(t) and y′′(t).Taking the inverse Fourier transform at t = 0 gives us

the covariance

Cov(x′′, y′′) =

1

∫ ∞

−∞F [RF ]

(1/Tm + ω2)/Tm(1/τ ′2 + ω2)(1 + ω2)(1/T 2

m + ω2)dω,

where the term proportional to iω integrated to zero be-cause it was odd in ω whereas the rest of the integrandis even in ω through F [RF ] = |F [∆F ]|2. We thus have

(1 + Tm)Cov(x′′, y′′) =

=1

∫ ∞

−∞F [RF ]

(1 + Tm)(1/Tm + ω2)/Tm(1/τ ′2 + ω2)(1 + ω2)(1/T 2

m + ω2)dω

=1

∫ ∞

−∞F [RF ]

1

(1 + ω2)(1/τ ′2 + ω2)dω

+Tm2π

∫ ∞

−∞F [RF ]

1

(1 + ω2T 2m)(1/τ ′2 + ω2)

=Var(x′′) + TmVar(y′′).

Upon normalizing with the averages we have

ηx′′y′′ =1

1 + Tmηx′′x′′ +

Tm1 + Tm

ηy′′y′′ .

Moreover,

Cov(x′′, y′′) = . . .

= E[E[X ′′t |u[−∞, t]] · E[Y ′′t |u[−∞, t]]

]

− E[E[X ′′t |u[−∞, t]]

]· E[E[Y ′′t |u[−∞, t]]

]

= E[E[X ′′t Y

′′t |u[−∞, t]]

]− 〈x′′〉〈y′′〉

= 〈x′′y′′〉 − 〈x′′〉〈y′′〉 = Cov(x′′, y′′),

where the 3rd step comes from the fact that X ′′ andY ′′ are independent when we condition on the upstreamhistory u[−∞, t] [33]. Upon normalizing by the averageswe have ηx′′y′′ = ηx′′y′′ , and so

ηx′′y′′ =1

1 + Tmηx′′x′′ +

Tm1 + Tm

ηy′′y′′ ≥ 0. (E2)

We thus have the lower bound ρx′′y′′ ≥ 0 .

Proof of the right bound CVx′′/CVy′′ ≤ 1 : Here wewill need to consider the average stochastic dual reporterdynamics conditioned on the history of all their upstreaminfluences

¯x′′(t) := E[X ′′t |u[−∞, t],ux[−∞, t]

]

¯y′′(t) := E[Y ′′t |u[−∞, t],uy[−∞, t]

],

where now we also condition on the trajectory of themRNA dynamics. When the cloud of components u(t)does not depend on the downstream components we canwrite down the time evolution of the conditional averagesusing the chemical master equation of the conditional

17

probability space. The time evolution of the conditionalaverages ¯y′ and ¯y′′ is given by

d¯y′

dt= Fy(t)− ¯y′/Tm &

d¯y′′

dt= ¯y′/Tm − ¯y′′/τ ′′,

where now Fy(t) corresponds to a particular translationrate trajectory as we no longer average out the mRNAdynamics, and without loss of generality we let τmat,x = 1and τmat,y = Tm. In terms of the deviations from themeans these differential equations become

d∆¯y′

dt= ∆Fy(t)−∆¯y′/Tm,

d∆¯y′′

dt= ∆¯y′/Tm −∆¯y′′/τ ′′,

(E3)

Multiplying the left and right equations with ∆¯y′′ and∆¯y′ respectively, summing the results, and taking en-semble averages of the different upstream histories givesus

E

[d

dt(∆¯y′ ¯y′′)

]=

E[∆¯y′′∆Fy

]+ E

[∆¯y′2

]/Tm −

(1

Tm+

1

τ ′′

)E[∆¯y′∆¯y′′

].

At stationarity the left hand side is zero, so we have

(1

Tm+

1

τ ′′

)E[∆¯y′∆¯y′′

]= E

[∆¯y′′∆Fy

]+ E

[(¯y′)

]/Tm.

Similarly, multiplying the first equation in Eq. (E3) by∆¯y′, taking the ensemble average, and using the fact thatthe left hand side will be zero at stationarity, we have

E[∆¯y′∆¯y′

]= TmE

[∆¯y′∆Fy

].

Now mutiplying the second equation in Eq. (E3) by ∆¯y′′

and following the same steps we have

E[¯y′′ ¯y′′

]/τ ′′ = TmE

[¯y′ ¯y′′

].

Combining these expressions gives us

(1

Tm+

1

τ ′′

)Var(¯y′′) = Cov(¯y′′, Fy) + Cov(¯y′, Fy).

Now note that

Cov(¯y′′, Fy) = E[¯y′′ · Fy(t)

]− E

[¯y′′]· E[Fy]

= E[E[Y ′′t |u[−∞, t],uy[−∞, t]] · Fy(t)

]− E

[E[Yt|u[−∞, t],uy[−∞, t]]

]· E[E[F (u(t),uy(t)|u[−∞, t],uy[−∞, t]]

]

= E[E[Y ′′t · Fy(u(t),uy(t))|u[−∞, t],uy[−∞, t]]

]− 〈y′′〉〈Fy〉 = Cov(y′′, Fy),

and similarly Cov(¯y′, Fy) = Cov(y′, Fy). Thus, after nor-malizing with the averages, we have

η¯y′ ¯y′ =1

1 + τ ′′Tm

ηy′Fy +τ ′′

Tm

1 + τ ′′Tm

ηy′′Fy .

This is the expression for ηy′y′′ in the fluctuation-balanceequations Eq. (D2), and so

η¯x′′ ¯x′′ = ηx′′x′′ − 1/〈x′′〉 & η¯y′′ ¯y′′ = ηy′′y′′ − 1/〈y′′〉,(E4)

where the x′′ expression follows by symmetry. Moreover,we apply the same analysis that was done in the previousproof to write

Var(¯x′′) =1

∫ ∞

−∞

F [RFx ]

(1/τ ′′2 + ω2)(1 + ω2)dω,

Var(¯y′′) =1

∫ ∞

−∞

F [RFy ]

(1/τ ′′2 + ω2)(1 + ω2T 2m)dω,

(E5)

where RFx and RFy are the autocovariances of thetranslation rates F (u(t),ux) and F (u(t),uy). Note that

since the two intrinsic systems ux and uy are statisti-cally identical, we have RFx = RFy , and so the Var(¯y′′)expression has a larger integrande for all ω (recall thatF [RFy ] = F [∆Fy]2 and so is positive). We thus haveVar(¯x′′) < Var(¯y′′), which after normalizing gives usη¯x′′ ¯x′′ ≤ η¯y′′ ¯y′′ . From Eq. (D1) we have 〈x′′〉 = 〈y′′〉,and so by using Eq. (E4) we find the right boundCVx′′x′′ ≤ CVy′′y′′ .

Proof of the left bound: Recall from the previous twoproofs we derived the following equations

ηx′′x′′ =1

〈x′′〉 + η¯x′′ ¯x′′ ηy′′y′′ =1

〈y′′〉 + η¯y′′ ¯y′′

ηx′′y′′ = ηx′′y′′ =1

1 + Tmηx′′x′′ +

Tm1 + Tm

ηy′′y′′ .

(E6)

We now use the Cauchy-Schwarz inequality with the lastexpression

(1

1 + Tmηx′′x′′ +

Tm1 + Tm

ηy′′y′′

)2

≤ ηx′′x′′ηy′′y′′ ,

18

which leads to

T 2mηy′′y′′ ≤ ηx′′x′′ ≤ ηy′′y′′ . (E7)

Unlike the mRNA system of equations, we cannot solveEq. (E6) for ηx′′x′′ and ηy′′y′′ in terms of the measurable(co)variances because the system is underdetermined.This is due to the fact that we have not specified themRNA intrinsic system. Nevertheless, we can derive an

additional bound that will allow us to close the system ofequations to write Eq. (E7) in terms of the measurable(co)variances.

From Eqs. (E1) and (E5) we have

Var(¯x′′)−Var(x′′) =1

∫ ∞

−∞

F [RFx −RF ]

(1/τ ′′2 + ω2)(1 + ω2)dω.

Now, note that

RF (t) = E[E[∆F (u(t′),ux)|u[−∞, t′]] · E[∆F (u(t′ + t),ux)|u[−∞, t′ + t]]

]

= E[E[∆F (u(t′),ux)|u[−∞, t′]] · E[∆F (u(t′ + t),uy)|u[−∞, t′ + t]]

]

= E[E[∆F (u(t′),ux) ·∆F (u(t′ + t),uy)|u[−∞, t′ + t]]

]= Cov(Fx(t′), Fy(t′ + t)) = RFx,Fy (t),

where the second step comes from the fact that Fx and Fyare statistically equivalent and the third step comes fromthe fact that they are independent when we condition onthe upstream history u[−∞, t]. Taking the autocovari-ance of Fx − Fy gives us

RFx−Fy = 2RFx − 2RFx,Fy = 2(RFx −RF ).

Thus the requirement that we make for this class of sys-tems — that the autocorrelation of Fx − Fy be non-

negative — is equivalent to saying that RFx − RF isnon-negative. Thus, we have

Var(¯x′′)−Var(x′′) =1

∫ ∞

−∞

F [RFx −RF ]

( 1τ ′′2 + ω2)(1 + ω2)

Var(¯y′′)−Var(y′′) =1

∫ ∞

−∞

F [RFy −RF ]

( 1τ ′′2 + ω2)(1 + T 2

mω2)dω.

Since the two intrinsic systems ux and uy are identical,we have RFx = RFy . Defining f := RFx −RF , we have

(Var(¯x)−Var(x))− Tm(Var(¯y)−Var(y)) = . . .

=1

∫ ∞

−∞F [f ]

{1

(1 + ω2)(1/τ ′′2 + ω2)− Tm

(1 + (ωTm)2)(1/τ ′′2 + ω2)

}dω

=1

∫ ∞

−∞

∫ ∞

−∞f(t)cos(ωt)

{1

(1/τ ′′2 + ω2)(1 + ω2)− Tm

(1/τ ′′2 + ω2)(1 + (ωTm)2)

}dtdω

=1

∫ ∞

−∞f(t)

(∫ ∞

−∞

cos(ωt)

(1/τ ′′2 + ω2)

{1

(1 + ω2)− Tm

1 + (ωTm)2

}dω

)dt

=1

π

∫ ∞

0

f(t)

(∫ ∞

−∞

cos(ωt)

(1/τ ′′2 + ω2)

{1

(1 + ω2)− Tm

1 + (ωTm)2

}dω

)dt

=

∫ ∞

0

f(t)τ ′′2(

(e−t − τ ′′e−t/τ ′′)

(1− τ ′′2)− (Tme

−t/Tm − τ ′′e−t/τ ′′)

(T 2m − τ ′′2)

)dt,

where in the second step we used the fact that f(t) is sym-metric which lets us omit the sin(ωt) part of the Fouriertransform, and in the fourth step we use the fact that theintegrand is symmetric in t. The expression in parenthe-ses is always positive, and since f(t) is non-negative, thismeans that Tm (Var(¯y)−Var(y)) ≤ (Var(¯x)−Var(x)),which in terms of the normalized variances is Tm(η¯y′′ ¯y′′−ηy′′y′′) ≤ (η¯x′′ ¯x′′ − ηx′′x′′). Similarly, we can show using

the same method that (η¯x′′ ¯x′′−ηx′′x′′) ≤ (η¯y′′ ¯y′′−ηy′′y′′).Combining these two inequalities, we have

Tm ≤η¯x′′ ¯x′′ − ηx′′x′′

η¯y′′ ¯y′′ − ηy′′y′′≤ 1. (E8)

With this inequality we can “close” the system of equa-tions Eq. (E6) to set the limits of ηx′′x′′ and ηy′′y′′ interms of the measurable (co)variances

19

ηx′′y′′(1 + Tm)− Tm (ηy′′y′′ − ηx′′x′′)

1 + Tm≤ηx′′x′′ ≤ (1 + Tm)ηx′′y′′ + ηx′′x′′ − Tmηy′′y′′

2

ηx′′y′′(1 + Tm)− ηx′′x′′ + Tmηy′′y′′

2Tm≤ηy′′y′′ ≤

ηx′′y′′ (1 + Tm)− ηx′′x′′ + ηy′′y′′

1 + Tm.

(E9)

We can then substitute these in the open-loop constraintEq. (E7) to obtain the open-loop constraint in terms ofthe measurable (co)variances. Doing so we obtain theleft bound of Fig. 2B.

Proof of the upper correlation bound: We use thelaw of total variance to write

ηx′′x′′ = ηint,x′′ + ηx′′x′′ & ηy′′y′′ = ηint,y′′ + ηy′′y′′

where ηint,x′′ = 1〈x′′〉2 E

[Var (X ′′|u[−∞, t])

]and

ηint,y′′ = 1〈y′′〉2 E

[Var (Y ′′|u[−∞, t])

]. Thus we have

ηx′′x′′ ≥ ηx′′x′′ and ηy′′y′′ ≥ ηy′′y′′ . Substituting inEq. (E2) gives

ηx′′y′′(1 + Tm) ≤ ηx′′x′′ + Tmηy′′y′′ ,

which after dividing by√ηx′′x′′ηy′′y′′ results in the upper

correlation bound.

Appendix F: Dynamics from static fluorescentprotein variability

When there is no feedback, we can write down thetime-evolution equations for the averages of the condi-tional probability space, and from these we can deriveEq. (E5)

Var(¯x′′) =1

∫ ∞

−∞

F [RFx ]

(1/τ ′′2 + ω2)(1 + ω2)dω,

Var(¯y′′) =1

∫ ∞

−∞

F [RFy ]

(1/τ ′′2 + ω2)(1 + ω2T 2m)dω.

We then have

Var(¯y′′) =1

∫ ∞

−∞

F [RFy ]

(1/τ ′′2 + ω2)(1 + ω2T 2m)dω

=1

∫ ∞

−∞

[∫ ∞

−∞RFy (t)cos(ωt)dt

](1

(1/τ ′′2 + ω2)(1 + (ωTm)2)

)dω

=1

∫ ∞

−∞RFy (t)

(∫ ∞

−∞

cos(ωt)

(1/τ ′′2 + ω2)

(1

1 + (ωTm)2

)dω

)dt

=1

π

∫ ∞

0

RFy (t)

(∫ ∞

−∞

cos(ωt)

(1/τ ′′2 + ω2)

(1

1 + (ωTm)2

)dω

)dt

=

∫ ∞

0

RFy (t)τ ′′2(Tme

−t/Tm − τ ′′e−t/τ ′′

T 2m − τ ′′2

)dt,

where the second step comes from the fact that RFy (t)is and symmetric so we can omit the sin(ωt) part of theFourier transform, and the 4th step comes from the factthat the integrand is symmetric in t. We thus have, afternormalizing with the averages,

η¯x′′ ¯x′′ =

∫ ∞

0

ηFFAFx(t)

(τmat,xe

− tτmat,x − τ ′′e− t

τ′′

τ2mat,x − τ ′′2

)dt,

η¯y′′ ¯y′′ =

∫ ∞

0

ηFFAFy (t)

(τmat,ye

− tτmat,y − τ ′′e− t

τ′′

τ2mat,y − τ ′′2

)dt.

where AFx(t) = AFy (t) is the autocorrelation of thetranslation rate F (u(t),ux). When AFx ≥ 0, we findthat the integrands of the above integrals are always non-

negative. Upon comparing them we find that

Tmη¯y′′ ¯y′′ ≤ η¯x′′ ¯x′′ .

Using Eq. (E4), and the fact that 〈x′′〉 = 〈y′′〉 fromEq. (D1), the above inequality becomes Tmηy′′y′′ ≤ηx′′x′′ , which in terms of the CVs becomes Eq. (5) ofthe main text. Note, that the exact same analysis can beused to derive

Tmηy′′y′′ ≤ ηx′′x′′ , (F1)

which constrains the reporter (co)variances when theautocorrelation of F (t) = E

[F (u(t),ux)|u[−∞, t]

]is

stochastic. Note that F (t) is the translation rate when we

20

average out all the mRNA intrinsic fluctuations. This lat-ter constraint will be used in the section on stronger con-straints on fluorescent reporters using a third reporter.

Appendix G: Behaviour of specific example systems

To gain an intuition for where systems fall in the al-lowable region we present several example systems in themain text. For example, the arrowed curves in Fig. 1Bcorrespond to two toy models subject to an upstreamcomponent Z that undergoes Poisson fluctuations

zλ−−−−−−−→ z + 1 z

z/τz−−−−−−−→ z − 1.

The blue curve in Fig. 1B corresponds to the systemR(u(t)) = kz, with k = 10, τx = 1, τy = T , 〈z〉 = τzλ =100. As we increase the speed of the upstream fluctu-ations by decreasing τz, the dual reporter correlationsmove downwards and left towards the bound of Eq. (4).This makes intuitive sense because in that regime X andY have little time to adjust to changing Z levels and de-correlate.

In contrast, the grey curve in Fig. 1B corresponds to asystem with feedback of Y onto its own production, withR(u(t)) = kz/(1+ εy), with k = 10, τx = 1, τy = T , τz =1, and 〈z〉 = 100. For ε = 0, we have no feedback and thegrey line coincides with the previous system (blue curve)with 〈x〉 = 〈y〉/T = 1000. As we increase the strength ofthe feedback ε, the correlations of this example systemmoves outside the region constrained by Eq. (2) whenε > 0, 0.07, 0.01 for T = 1, 0.5, 0.1 respectively. Maxi-mizing the discriminatory power of the approach corre-sponds to minimizing the area in which systems withoutfeedback could lie. Choosing T = 1 or T � 1 would thusbe ideal to detect feedback in this system.

In Fig. 3C the red curve corresponds to a systemstochastically driven through a Poisson variable Z withτz = 1, 〈z〉 = 100 and R(u(t)) = kz with k = 10 sothat 〈x〉 = 1000. In contrast, the blue curve correspondsto a system driven by an oscillation with R(u(t)) =K(sin(ωt + φ) + 2), where φ is a random variable thatde-synchronizes the ensemble and K = 500 so that〈x〉 = 1000. We set the oscillation period 2π/ω = 2τxto model an oscillation set by the cell-cycle (for example,the maturation time of mEGFP is roughly half of theE. coli cell-cycle [15]). We analyzed both types of sys-tems for fixed τx = 1 while varying τy = T to see whichchoice of T maximizes the discriminatory power of theconstraint. We find that for this particular example, theoscillating system crossed the dashed black curve whenω = 1/

√τxτy.

Appendix H: The effects of stochastic undercountingon dual reporter correlations

1. Undercounting mRNA

First we analyze the effects of undercounting onmRNA-levels of co-regulated genes. We would like toknow how the derived bounds change when the reporterabundances X and Y are detected with fixed probabili-ties px and py respectively. This is done by introducingtwo new variables that correspond to the experimentalreadouts of the reporter abundances: Xr and Yr. Inparticular, Xr corresponds to the detected number of Xmolecules when each molecule is detected with probabil-ity px (and similarly for Yr and py). In terms of thesevariables, open-loop systems are constrained by the fol-lowing inequalities

−(ηxrxr −

pypxTηyryr

)≤ ηxryr

(pypx− T

),

(ηxrxr −

pypxTηyryr

)≤ ηxryr

(1− py

pxT

).

(H1)

Systems that break this inequality must be connected insome kind of feedback loop. When the detection prob-abilities are the same for both reporters px = py, theconstraint reduces to the open-loop constraint given byEq. (2) of the manuscript. However, when px 6= py, theno-feedback bound differs from Eq. (2) due to the py/pxterm. As we will see in a later section, this ratio can bemeasured using a third reporter.

Moreover, open-loop systems that are driven by astochastic upstream signal must obey the following in-equality

pypxTηyryr +

(1 + T )(

1− pypx

)

2ηxryr ≤ ηxrxr . (H2)

Open-loop systems that break this bound must be drivenby some kind of oscillation. Note that when px = py, theconstraint reduces Eq. (4) of the manuscript. However,when px 6= py, the no-oscillation bound differs fromEq. (4).

Derivation of Eq. (H1) and Eq. (H2): We considerthe following system, which is analogous to the class ofsystems in Fig. 1A in the paper with the addition of a stepwhich will model a binomial read-out of the componentsX and Y

Arbitrary mRNA dynamics :

xR(u(t))−−−−−−−→ x+ 1 x

x/τx−−−−−−−→ x− 1

yR(u(t))−−−−−−−→ y + 1 y

y/τy−−−−−−−→ y − 1

Read-out mock process :

21

xrλx(x−xr)−−−−−−−→ xr + 1 xr

βxxr−−−−−−−→ xr − 1

yrλy(y−yr)−−−−−−−→ yr + 1 yr

βyyr−−−−−−−→ yr − 1

Here Xr and Yr correspond to the experimental read-outsof X and Y respectively. For large values of λi and βi, thecomponents Xr and Yr correspond to binomial read-outsof the mRNA reporters X and Y , respectively. Xr has acounting success rate of px = λx/(λx + βx), i.e. each XmRNA molecule has a probability of px to be detectedexperimentally (similarly for Y and py). This approachagain allows for arbitrary mRNA dynamics and can besolved for relations between (co)variances in exactly thesame way as in the previous sections.

In particular, once the first and second moments ofthis system reach stationarity, general fluctuation bal-ance relations [18] lead to (co)variance relations, whichin the limit where βx � 1/τx and βy � 1/τy, but wherepx and py are kept constant (this is not an approxima-tion but is satisfied by construction of the mathematicalmock system to define Xr as a binomial cut of X), these(co)variance relations are

ηxrxr =1

〈xr〉+ ηxR, ηyryr =

1

〈yr〉+ ηyR,

ηxryr = ηxy =1

1 + TηxR +

T

1 + TηyR.

(H3)

These are identical to the (co)variances relations given byEq. (A4) which are for the variables X and Y , with theexception of the averages. Undercounting can only serveto further de-correlate the reporter read-outs, and so wewould expect for the upper bound on ρxy in Fig. 1B tohold for the read-outs. Indeed, from the above equations

ηxryr = . . .

1

1 + Tηxrxr +

T

1 + Tηyryr −

(1

〈xr〉+

T

〈yr〉

)(1

1 + T

)

≤ 1

1 + Tηxrxr +

T

1 + Tηyryr ,

which corresponds to the bound in Fig. 1B of the paper.Note that the bound holds for all detection probabilities.

Next, we will generalize the open-loop constraints tothe system with the binomial read-out step. The con-straint on open-loop systems derived for the system with-out the undercounting steps still holds for the X and Ycomponents, as these don’t depend in anyway on Xr andYr. In terms of the conditional averages x and y, thisbound is given by Eq. (B4). We would now like to writeηxx and ηyy in terms of the read-out (co)variances to

write this inequality in terms of ηxrxr , ηyryr , and ηxryr .Recall from Eq. (B2) that ηxx = ηxR and ηyy = ηyR, sowe need to solve for ηxR and ηyR in terms of the read-out (co)variances. First note that since Xr is a binomialread-out of X, we have 〈xr〉 = px〈x〉, and similarly for Y .As a result, the flux-balance relation given be Eq. (A3)becomes

〈yr〉〈xr〉

= Tpypx. (H4)

We can now solve Eq. (H3) and Eq. (H4) for ηxR and ηyRas we have 4 equations and 4 unknowns

ηxR =(ηxryr (1 + T )− Tηyryr ) pypx + ηxrxr(

1 +pypx

) ,

ηyR =ηxryr (1 + T )− ηxrxr +

pypxTηyryr

T(

1 +pypx

) .

(H5)

Note that when the detection probabilities are the samefor both reporters, these expressions become identical tothose that were derived without the undercounting stepin Eq. (B5). Recalling that ηxR = ηxx and ηyR = ηyy,we substitute Eq. (H5) into Eq. (B4) which gives us theopen-loop constraint given by Eq. (H1).

Next, we will generalize the no-oscillation bound to thesystem with the binomial read-out step. In terms of theconditional averages of X and Y the no-oscillation boundis given by Eq. (C3). We then substitute the expressionsfrom Eq. (H5) into Eq. (C3) and this bound becomesEq. (H2).

2. Undercounting fluorescent proteins

Next, we analyze the effects of stochastic undercount-ing on co-regulated fluorescent proteins. This couldmodel, for example, fluorescent proteins with chro-mophores that sometimes do not undergo maturationproperly and thus are not detected. Here we use thesame approach as the previous section by adding twoadditional variables X ′′r and Y ′′r that correspond to theexperimental read-outs of the fluorescent proteins. Inparticular, X ′′r corresponds to the detected number ofX ′′ molecules when each molecule is detected with prob-ability px, and similarly for Y ′′r and py. In terms of thesevariables, open-loop systems are constrained by the fol-lowing inequalities following constraint

22

Tm

(min

(Tm,

pypx

)ηy′′r y′′r − ηx′′

r x′′r

)≤ ηx′′

r y′′r

(min

(Tm,

pypx

)− T 2

m

),

ηx′′r x

′′r−max

(pypx, 1

)ηy′′r y′′r ≤

(1−max

(pypx, 1

))ηx′′

r y′′r,

0 ≤ ηx′′r y

′′r.

(H6)

Systems that break this inequality must be connected insome kind of feedback loop. When the detection prob-abilities are the same for both reporters px = py, theconstraint reduces to Eq. (3) of the main text.

Moreover, open-loop systems that are driven by astochastic upstream signal must obey the following in-equality

(Tm −min

(Tm,

pypx

))ηx′′

r y′′r

≤ ηx′′r x

′′r−min

(Tm,

pypx

)ηy′′r y′′r .

(H7)

Open-loop systems that break this bound must bedriven by some kind of oscillation. When px = py, theconstraint reduces to Eq. (5) of the main text.

Derivation of Eq. (H6) and Eq. (H7): We considerthe following system, which is analogous to the class ofsystems in Fig. 7 with the addition of a step which willmodel a binomial read-out of the components X ′′ and Y ′′

Arbitrary fluorescent protein dynamics:

x′F (u(t),x)−−−−−−−→ x′ + 1 (x′, x′′)

x′/τmat,x−−−−−−−→ (x′ − 1, x′′ + 1)

y′F (u(t),y)−−−−−−−→ y′ + 1 (y′, y′′)

y′/τmat,y−−−−−−−→ (y′ − 1, y′′ + 1)

x′′x′′/τ ′′

−−−−−−−→ x′′ − 1 y′′y′′/τ ′′

−−−−−−−→ y′′ − 1

Read-out mock process:

x′′rλx(x′′−x′′

r )−−−−−−−→ x′′r + 1 x′′rβxx

′′r−−−−−−−→ x′′r − 1

y′′rλy(y′′−y′′r )−−−−−−−→ y′′r + 1 y′′r

βyy′′r−−−−−−−→ y′′r − 1

Just like in the mRNA case, for large values of λi and βithe additional read-out steps model a mock process thatturn X ′′r and Y ′′r into instantaneous binomial read-outs ofthe fluorescent protein abundances X ′′ and Y ′′. Here X ′′rcorresponds to the abundance of FPs that are fluorescing,where each X ′′ has a probability px = λx/(λx + βx) ofhaving matured properly (similarly for Y ′′r ). Followingthe same steps as in the previous section, we get the

following (co)variance relations

ηx′′r y

′′r

= ηx′′y′′ ,

ηx′′r x

′′r

=1

〈x′′r 〉+ηx′′x′ , ηy′′r y′′r =

1

〈y′′r 〉+ ηy′′y′ .

(H8)

Moreover, since X ′′r is a binomial read-out of X ′′ we have〈x′′r 〉 = px〈x′′〉, and so from the above and the fluctuationbalance equations given by Eq. (A4) we have

ηx′′r x

′′r

=(1− px)

〈x′′r 〉+ ηx′′x′′ & ηy′′r y′′r =

(1− py)

〈y′′r 〉+ ηxx.

(H9)

As a result we have

ηx′′r y

′′r

= ηx′′y′′ ≤1

1 + Tmηx′′x′′ +

Tm1 + Tm

ηy′′y′′

≤ 1

1 + Tmηx′′

r x′′r

+Tm

1 + Tmηy′′r y′′r ,

where the second step is from the upper bound in Fig. 2Bgiven by ηx′′y′′(1 + Tm) 6 ηx′′x′′ + Tmηy′′y′′ . In termsof the read-out reporter correlation and CVs the aboveequations corresponds to the top bound in Fig. 2B of thepaper.

Next, we will generalize the open-loop constraints tothe system with the binomial read-out step. When thereis no feedback we can use Eq. (E6) along with Eq. (H8)and Eq. (H9) to write

ηx′′r y

′′r

=1

1 + Tmηx′′x′′ +

Tm1 + Tm

ηy′′y′′ ,

ηx′′r x

′′r

=1

〈x′′r 〉+ η¯x′′ ¯x′′ , ηy′′r y′′r =

1

〈y′′r 〉+ η¯y′′ ¯y′′ .

(H10)

Recall that in terms of the conditional averages x′′ andy′′ the open-loop constraint is given by Eq. (E7), whichin combination with the first equation on the right inEq. (H10) can be written as

Tmηy′′y′′ ≤ ηx′′r y

′′r≤ ηy′′y′′ . (H11)

We generally cannot solve for ηx′′x′′ and ηy′′y′′ in terms ofηx′′

r x′′r, ηy′′r y′′r , and ηx′′

r y′′r

like we did for the previous casein Eq. (H5) because the above system of equations is un-derdetermined. However, we can derive bounds on theseextrinsic contributions using the inequality Eq. (E8).Once the reporter averages have reached stationarity, wehave 〈x′′r 〉 =

pypx〈y′′r 〉. This balance relation, together

Eq. (E8) and Eqs. (H10) and (E7) lead to the followingbounds on the extrinsic contributions

23

max(pypx, 1)ηx′′

r y′′r

(1 + Tm)− Tm(

max(pypx, 1)ηy′′r y′′r − ηx′′

r x′′r

)

Tm + max(pypx, 1)

≤ ηx′′x′′

ηx′′x′′ ≤ ηx′′r y

′′r

+Tm

(ηx′′

r x′′r−min(Tm,

pypx

)ηy′′r y′′r

)

min(Tm,pypx

) + Tm

ηx′′r y

′′r

(1 + Tm)− ηx′′r x

′′r

+ min(Tm,

pypx

)ηy′′r y′′r

min(Tm,

pypx

)+ Tm

≤ ηy′′y′′

ηy′′y′′ ≤ηx′′

r y′′r

(1 + Tm)− ηx′′r x

′′r

+ max(pypx, 1)ηy′′r y′′r

max(pypx, 1)

+ Tm.

(H12)

Substituting the last two inequalities into Eq. (H11) leadsto the constraint given be Eq. (H6).

Next, we will generalize the ”no-oscillation” bound tothe system with the binomial read-out step. Recall that,in terms of the conditional averages, this bound is givenby Eq. (F1). Substituting the second and third inequal-ities from Eq. (H12) we obtain the inequality given byEq. (H7).

Appendix I: Genes with proportional transcriptionrates

Here we consider the more general class of systems inwhich two components are produced with arbitrary pro-duction rates that are proportional with some propor-tionality constant α. We find that the system becomesidentical to the same system in which α = 1 but wherethere is systematic undercounting with unequal detectionprobabilities (see Sec. H). We will first present the derivedbounds for the α 6= 1 case followed by the derivations.Note that the results will depend on the proportionalityconstant α, which can be measured using a third reporteras shown Appendix. J.

1. mRNA with proportional transcription rates

We consider the following class of systems which isanalogous to the class of systems in Fig. 1A in the paperwith the exception that now the transcription rates ofthe two mRNA are not equal but proportional

xR(u(t))−−−−−−−→ x+ 1 x

x/τx−−−−−−−→ x− 1

yαR(u(t))−−−−−−−→ y + 1 y

y/τy−−−−−−−→ y − 1(I1)

First we will present the analogue of the constraint onopen-loop systems given by Eq. (2) in the paper. In par-ticular, systems from the above class of systems in whichthe components X and Y do not directly, or indirectly,

affect their own transcription rate must satisfy the fol-lowing inequalities

− (ηxx − αTηyy) ≤ ηxy (α− T ) ,

(ηxx − αTηyy) ≤ ηxy (1− αT ) .(I2)

Systems that break these inequalities must be connectedin some kind of feedback loop. Note that this equationis identical to Eq. (H1) with the replacement

pypx→ α.

Next we will present the analogue of the constrainton stochastic systems given by Eq. (4) in the main text.Open-loop systems from the above class in which thetranscription rate is stochastic must satisfy the followinginequality

αTηyy +(1 + T ) (1− α)

2ηxy ≤ ηxx. (I3)

Open-loop systems that break this bound must bedriven by some kind of oscillation. Again, note that thisequation is identical to Eq. (H2) with the replacementpypx→ α.

Derivation of Eqs. (I2) and (I3): Once the aver-ages and the (co)variances of the components X and Yreach stationarity, general fluctuation balance relations[18] lead to the following (co)variance relations

ηxx =1

〈x〉 + ηxR, ηyy =1

〈y〉 + ηyR,

ηxy =1

1 + TηxR +

T

1 + TηyR.

These are identical to the fluctuation balance equationsfor the analogous class of system with α = 1, given byEq. (A4), which is intuitively explained from the factthat we are considering normalized (co)variances whichcancel out the proportionality constant. In fact, when thecomponents X and Y do not directly, or indirectly, affecttheir own transcription rate, we can again condition onthe upstream history, in which case ηyy is equal to thesame value it would take when α = 1, and so the open-loop constraint of Eq. (B4) and the no-oscillation bound

24

of Eq. (C3) hold for all α. However, the averages willnot have the same asymmetry as the α = 1 case. Inparticular, once the averages reach stationarity we obtainthe following flux balance relations [18]

〈R〉 = 〈x〉/τx & α〈R〉 = 〈y〉/τy ⇒ 〈y〉〈x〉 = αT,

(I4)

which, with the above fluctuation balance relations, isanalogous to Eq. (H3) and Eq. (H4) with the exchangepypx→ α. The following results thus follow

ηxx =ηxy(1 + T )α+ ηxx − αTηyy

1 + α,

ηyy =ηxy(1 + T )− ηxx + αTηyy

T (1 + α).

(I5)

Substituting the above expressions into Eq. (B4) givesus Eq. (I2), and substituting the above expressions intoEq. (C3) gives us Eq. (I3).

2. Fluorescent proteins with proportionaltranslation rates

We consider the following class of systems

x′F (u(t),ux)−−−−−−−→ x′ + 1 (x′, x′′)

x′/τmat,x−−−−−−−→ (x′ − 1, x′′ + 1)

y′αF (u(t),uy)−−−−−−−→ y′ + 1 (y′, y′′)

y′/τmat,y−−−−−−−→ (y′ − 1, y′′ + 1)

x′′x′′/τ ′′

−−−−−−−→ x′′ − 1 y′′y′′/τ ′′

−−−−−−−→ y′′ − 1(I6)

which is analogous to the class of systems in Fig. 7 withthe exception that now the translation rates of the twofluorescent proteins are not equal but proportional withproportionality constant α. First we will present the ana-logue of the constraint on open-loop systems given byEq. ((3)) in the paper. In particular, systems from theabove class of systems in which the downstream compo-nents ux, uy, X’, Y’, X”,Y” do not directly, or indirectly,affect their own transcription and translation rates mustsatisfy the following inequalities

min (Tm, α) ηy′′y′′ − ηx′′x′′ ≤ ηx′′y′′

(min (Tm, α)− T 2

m

Tm

)

ηx′′x′′ −max (α, 1) ηy′′y′′ ≤ (1−max (α, 1)) ηx′′y′′

0 ≤ ηx′′y′′ .(I7)

Systems that break these inequality must be connectedin some kind of feedback loop. Note that this equationis identical to Eq. (H6) with the exchange

pypx→ α.

Next we will present the analogue of the constrainton stochastic systems given by Eq. (5) in the paper. Inparticular, systems from the above class without feedback

in which the translation rate is stochastic must satisfy thefollowing inequality

(Tm −min (Tm, α)) ηx′′y′′ ≤ ηx′′x′′ −min (Tm, α) ηy′′y′′ .(I8)

Open-loop systems that break this bound must bedriven by some kind of oscillation. Again, note thatthis equation is identical to Eq. (H7) with the exchangepypx→ α.

Derivation of Eqs. (I7) and (I8): Following the sameanalysis that was done in Sec. E, we can derive the exactsame equations as Eq. (E6), (E7), and (E8), only nowthey apply to (co)variances of the class of systems withα 6= 1. These equations don’t change when α 6= 1 be-cause we are considering normalized (co)variances whichcancel out the proportionality constant. The flux balancerelation given by Eq. (D1) will be different, however, asnow one of the averages will be scaled up or down bythe factor α. In particular, at stationarity we have thefollowing flux balance relations [32]

〈x′′〉/τ ′′ = 〈x′〉/τmax,x = 〈F 〉,〈y′′〉/τ ′′ = 〈y′〉/τmax,y = α〈F 〉 ⇒ 〈y′′〉/〈x′′〉 = α.

The system of equations becomes mathematically iden-tical to the system of equations we used to derive thebounds in Sec. H 2 with the exchanges

pypx→ α. Eqs. (I7)

and (I8) thus follow from the derivations in that sectionwhere we exchange

pypx→ α.

3. Systematic undercounting of systems withproportional transcription rates

We found in the previous sections that the equationsfrom which we derive our bounds when α 6= 1 are math-ematically identical to the analogous equations whenα = 1 but where each reporter is detected with a fixedprobability such that α =

pypx

. This is because in both

these cases the fluctuation-balance relations between thereporter (co)variances remain unchanged compared tothe original class of systems, but the averages will changeaccording to Eq. (H4) and Eq. (I4). Therefor, whenα 6= 1 and there is systematic undercounting such thatpy 6= px, the only difference in our equations would bethat the averages scale differently. In fact, the equationsbecome analogous to those where there is no systematicundercounting but where the transcription rates are pro-portional with proportionality factor α = α

pypx

. All the

bounds presented in Sec. I thus hold but with the ex-change α → α. This “proportionality constant” will of-ten be unknown, and so in the next section we show howit can be measured using a third reporter.

25

Appendix J: Additional reporters

1. Measuring unknown life-time ratios

First, we will show how to measure the mRNA life-time ratio T using a third reporter. Recall, for the classof systems shown in Eq. (1) of the main text, the follow-ing fluctuation balance relations given by Eq. (A4) must

hold. We now allow for an additional reporter Z in thesystem, where Z has a life-time of τz, is produced withthe same probabilistic birthrate R(u(t)), and is allowedto feedback and affect the cloud of components u(t). Insuch a case, any pair of components in X, Y , Z will havea fluctuation balance relations like Eq. (A4). In particu-lar, we have

ηxx =1

〈x〉 + ηxR ηyy =1

〈y〉+ηyR ηzz =1

〈z〉 + ηzR

ηxy =1

1 + TyxηxR +

Tyx1 + Tyx

ηyR ηyz =1

1 + TzyηyR+

Tzy1 + Tzy

ηzR ηxz =1

1 + TzxηxR +

Tzx1 + Tzx

ηzR,

where Tij = τi/τj . Recall that the averages are relatedhere by a flux balance 〈i〉/〈j〉 = Tij , which gives us 3more equations. If all we can measure are the reporter(co)variances, we are left with 9 unknowns: the three ηiR,the three averages 〈i〉, and the three life-time ratios Tij .This is a system of 9 linear equations, so we can solve forall unknowns. In particular, we find

Tij =ηjj − ηjkηii − ηik

for i, j, k ∈ {x, y, z}. (J1)

Note that the above expression depends only on the re-porter variances and the covariances. Three seperatedual reporter experiments can thus be done to measureall the ηij and infer all the life-time ratios. If only onelife-time ratio is being measured, then only two exper-iments are needed as the above expression only involvethe covariance between two reporters and their normal-ized variances.

Next, we would like to derive a similar result for ratiosof fluorescent protein maturation times. This can be doneby using fluorescent reporters that all share the same(though distinct) promoter. Since such reporters are notinvolved in their regulation, we can use equation Eq. (E2)which holds for pairs X ′′, Y ′′ of co-regulated fluorescentreporters without feedback

ηx′′y′′ =1

1 + Tmηx′′x′′ +

Tm1 + Tm

ηy′′y′′ .

As we increase the number of reporters, the numberof such equations increases faster than the number ofunknowns, and eventually we are able to solve for all theunknowns in terms of the reporter (co)variances. Unlikethe mRNA case we cannot use the other fluctuationbalance equations of the form ηx′′x′′ = 1/〈x′′〉 + η¯x′′ ¯′′xbecause each of these equations adds an additional

unknown η¯x′′ ¯′′x to the system of equations. With threereporters we thus do not have enough equations to solvefor all the unknowns. With 4 reporters, however, we areable to solve for all maturation life-times given that wehave one of the life-time ratios, and with 5 reporters weare able to solve for all life-time ratios.

2. Measuring birthrate proportionality constantsand reporter detection probabilities

In Sec. H we have shown how systematic undercountingaffects the derived results, and in Sec. I we have shownhow proportional production rates affect the derived re-sults. We found that both cases have the same effect andcan be treated together in full generality, see Sec. I 3.That is, when both are treated together we found thatthe constraints presented in the paper change where nowthey also depend on the factor α = α

pypx

, where α is

the proportionality constant between the two productionrates and

pypx

is the ratio of detection probabilities. Here

we will show how this α can be measured using threedual reporter experiments. The derived bounds from theprevious sections can then be used with asymmetricalsystematic undercounting and with naturally occurringproportional transcription rates.

Like the previous subsection, we allow for an additionalreporter Z in the system, where the production rate ofeach reporter is αiR for i ∈ {x, y, z}. Moreover, we al-low for stochastic undercounting where each reporter isdetected with fixed probability pi for i ∈ {x, y, z}. Thereporter read-outs will have the following fluctuation bal-ance relations (see Sec. H and Sec. I)

26

ηxrxr =1

〈xr〉+ ηxR ηyryr =

1

〈yr〉+ηyR ηzrzr =

1

〈zr〉+ ηzR

ηxryr =1

1 + TyxηxR +

Tyx1 + Tyx

ηyR ηyrzr =1

1 + TzyηyR+

Tzy1 + Tzy

ηzR ηxrzr =1

1 + TzxηxR +

Tzx1 + Tzx

ηzR,

where Tij = τi/τj . As explained in Sec. I 3 these are theexact same fluctuation balance equations as Eq. (A4),but now the averages will be related by 〈ir〉/〈jr〉 = Tij

αiαj

,

where αi = αipi . When the life-time ratios Tij areknown, this gives us 9 equations and 9 unknowns (3 ηiR,3 averages 〈ir〉, and 3 αi ratios) and so we can solve forall the αi ratios. In particular, we find

αiαj

= Tji

(Tik + TjkTijTjk + TikTji

)((Tik + TjkTij)ηirir − Tik(1 + Tij)ηij − (1 + Tik)ηik + (1 + Tik)ηjk(Tjk + TikTji)ηjrjr − Tjk(1 + Tji)ηji − (1 + Tjk)ηjk + (1 + Tjk)ηik

), (J2)

for i, j, k ∈ {x, y, z}. If the life-time ratios Tij are notknown, then we have 3 additional unknowns and we havemore unknowns than equations. However, including afourth reporter gives us 4 new unknowns 4 additionalunknowns (〈u〉, αu, τu and ηuR for a fourth reporter U)and 7 additional equations analogous to the ones above,which means we can solve the system of equations forfour reporters for all the life-time ratios Tij and the pro-portionality constants αi. Note that only (co)variancemeasurements are needed, and so 4 separate experimentsinvolving two co-regulated reporters can be done to inferthe αi.

For fluorescent proteins we cannot solve for the α thisway because the system of equations will always be un-derdetermined. However, once the averages reach sta-tionarity, flux balance leads to 〈x′′〉/〈y′′〉 = α, where thetranslation rate of y′ is αF and that of x′ is F . If theexperiments report absolute numbers of molecules or con-centrations, and the averages can be measured, then αcan be measured. However, averages are often not knownbecause fluorescence measurements involve an unknownscaling factor between the numbers/concentrations andthe light intensity. To get by this, one could do twosingle reporter experiments, where the same fluorescentreporter is used for the two genes. The ratio of the twoaverages would then cancel out this unknown scaling fac-tor and would result in the actual ratio of components〈x′′1〉/〈x′′2〉 = α, from which α can be inferred.

3. Stronger constraints on fluorescent reportersusing a third reporter

The open-loop constraint for fluorescent proteins givenby Eq. (3) in the main text encompasses a larger re-gion in the ρxy and CVx/CVy plane than the equivalentconstraint for mRNA reporters given by Eq. (2). Thisis due to the fact that there are unspecified degrees offreedom in the class of fluorescent protein systems thatare unknown given the variability of two downstream re-

porters. Specifically, the fluctuation-balance relations foropen-loop mRNA reporters are given by Eq. (B3)

ηxx = ηx,int + ηxx, ηyy = ηy,int + ηyy,

ηxy =1

1 + Tηxx +

T

1 + Tηyy,

(J3)

whereas the equivalent relations for fluorescent proteinsare given by

ηx′′x′′ = ηx′′,int + ηx′′x′′ , ηy′′y′′ = ηy′′,int + ηy′′y′′ ,

ηx′′y′′ =1

1 + Tmηx′′x′′ +

Tm1 + Tm

ηy′′y′′ .

(J4)

In both systems, the open-loop constraint on the condi-tional averages is identical, see Eqs (B4) and (E7),

T 2ηxx ≤ ηxx ≤ ηxx & T 2mηx′′x′′ ≤ ηx′′x′′ ≤ ηx′′x′′ .

(J5)

The difference lies in the fact that the intrinsic systemfor the mRNA reporters is specified as there is only onestep downstream from the shared birthrate. Specifically,ηx,int = 1/〈x〉 and ηy,int = 1/〈y〉, and so ηx,int = Tηy,int.This last equation allows us to solve Eq. (J3) for ηxxand ηyy in terms of the measurable (co)variances whichallows us to take full advantage of the open-loop con-straint given by Eq. (J5). For fluorescent proteins thisis not the case because the mRNA intrinsic systems arenot specified (for example, the half-lives of the mRNAare not specified). As a result, we cannot close the sys-tem of equations to solve for ηx′′x′′ and ηy′′y′′ in terms ofmeasurable (co)variances. We can, however, bound theintrinsic noise terms using Eq. (E8), which in addition tothe open-loop constraint of Eq. (J5) leads to the broaderconstrained region shown in the paper. We can visualizethis difference by comparing the regions bounded by theopen-loop constraints in Fig. 1B and Fig. 2B of the paper.Systems that lie on the ρ = 0 line are dominated by in-trinsic variability. For the mRNA region in Fig. 1B, these

27

systems must lie at the point CVx/CVy =√T , whereas

for the FP region in Fig. 2B they can lie anywhere within[√T , 1]. If the mRNA intrinsic systems were specified,

the relation between ηx′′,int and ηy′′,int would be spec-ified and systems on the ρ = 0 line would have to belocated at some point like in the mRNA class of systems.Here we show how probing the system with a third flu-orescent reporter allows us to account for this missingdegree of freedom. In particular, we will show how withthree reporters we can measure ηx′′x′′ and ηy′′y′′ so thatwe can fully take advantage of the open-loop constraint.

We allow for a third fluorescent protein reporter Z′′

that shares the same transcription rate and an identi-cal mRNA intrinsic system as X ′′ and Y ′′. We let thematuration time of Z′′ be τmat,z, and we let Tm,ij :=τmat,i/τmat,j for i, j ∈ {x, y, z}. When each reporter does

not affect it’s own transcription or translation rates, theright-most fluctuation balance relation given by Eq. (J4)holds for each pair of reporters:

ηx′′y′′ =ηx′′x′′

1 + Tm,yx+Tm,yxηy′′y′′

1 + Tm,yx,

ηy′′z′′ =ηy′′y′′

1 + Tm,zy+Tm,zyηz′′z′′

1 + Tm,zy,

ηx′′z′′ =ηx′′x′′

1 + Tm,zx+Tm,zxηz′′z′′

1 + Tm,zx.

(J6)

With three separate dual reporter experiments, the threereporter covariances ηx′′y′′ , ηx′′z′′ and ηy′′z′′ can be mea-sured. If the maturation time ratios Tm,ij are known, wehave three equations and three unknowns, and so we canfully solve for ηx′′x′′ , ηy′′y′′ and ηz′′z′′ ,

ηi′′ i′′ =Tm,ik(1 + Tm,ij)ηij + (1 + Tm,ik)ηik − (1 + Tm,jk)ηjk

Tm,ik + Tm,jkTm,ikwhere i, j, k ∈ {x, y, z}, (J7)

and so we can fully take advantage of the open-loop con-straint given by Eq. (J5).

This is also the case for the constraint on stochasticsystems given by Eq. (5) in the paper. In terms of theconditional averages, the strongest form of this constraintis given by Eq. (F1)

Tmηy′′y′′ ≤ ηx′′x′′ . (J8)

Using the above approach we can do three separate dualreporter experiments involving 3 pairs of fluorescent re-porters to solve for ηx′′x′′ and ηy′′y′′ and fully take ad-vantage of the constraint given by Eq. (J8).

Note that ηx′′x′′ is the variability in X ′′ that orig-inates from the shared cloud of components u(t), soηx′′x′′ − ηx′′x′′ is the intrinsic noise that originates fromthe random nature of the transcription, translation, andmaturation steps. The variance can also be decomposedas ηx′′x′′ = 1

〈x′′〉 + η¯x′′ ¯x, where 1〈x′′〉 is the intrinsic noise

that originates from the translation and maturation step,and η¯x′′ ¯x is the variability originating from the intrinsicmRNA fluctuations as well as from the cloud of compo-nents u(t). If the protein abundances can be measured,along with ηx′′x′′ according to the above, then we candetermine how much variability is generated through thedifferent steps of gene expression.

Appendix K: Concentrations of growing anddividing cells

1. Derivation of Eq. (2) and Eq. (4) for mRNAconcentrations

In growing and dividing cells, the molecular abun-dances still follow the same production and degradation

reactions during the “cell-cycle” but are now affected bycell-division. In particular, considering systems as de-fined in Eq. (1), X and Y share an unspecified birthrateR(u(t), V ) which can now depend on the cell volume Vand the abundances of the components u(t). Further-more, they are assumed to undergo first order degrada-tion with life-times τx and τy respectively.

However, now the cell volume increases and undergoescell division at times {ti}, which can vary over the cellensemble. At these moments the volume is reduced by afactor V → aiV at ti, as we follow one of the daughtercells. For perfect symmetric division, for example, wehave a = 1/2. To allow for stochastic division errors incell volume, we let the ai vary over the ensemble and thedivision times through an unspecified distribution — allwe specify is that on average we have 〈ai〉 = 1/2. Atthe division times the cell content splits between the twodaughter cells, and we assume that the mRNA numbersare split according to a binomial distribution with prob-ability ai. This is summarized as

(V, x, y

) At time ti−−−−−−→(aiV, B(x, ai), B(y, ai)

)

for t1, t2, t3 . . .

We first consider systems in which the cellularcomponents X and Y are affected by, but do notaffect, the otherwise unspecified environmental vari-ables u(t) and V . In this scenario, we can ana-lyze the average stochastic dual reporter dynamics con-ditioned on the history of their upstream influences[33]: x(t) = E

[Xt|u[−∞, t], V [−∞, t]

]and y(t) =

E[Yt|u[−∞, t], V [−∞, t]

]. Since all these systems have

the same volume history, they will all undergo divisionat the same times {ti} with the same splitting factors{ai}. When the systems are not at one of these division

28

points, the time evolution is specified completely by thereactions in Eq. (1). In particular, for ti < t < ti+1, thetime evolution of the conditional averages is given by

dx

dt= R(t)− x/τx &

dy

dt=R(t)− y/τy (K1)

when ti < t < ti+1,

where R(t) = R(u(t)). At t = ti+1 we have V (ti+1) →ai+1V (ti+1), x(ti+1) → ai+1x(ti+1), and y(ti+1) →ai+1y(ti+1). This gives us the boundary conditions forthe above differential equation.

Now we consider the stochastic concentrations, definedas Xc := X/V and Yc := Y/V . Conditioned on thehistory of the upstream influences and the volume, wehave

xc(t) = E[Xt

Vt

∣∣∣u[−∞, t], V [−∞, t]]

=1

V (t)E[Xt|u[−∞, t], V [−∞, t]

]=

x

V (t),

where we can pull out the V (t) from the expectationbrackets because it is specified by the conditioning andbecomes equivalent to a constant at time t. In ti < t <ti+1, we use Eq. (K1) and the product rule to find

dxcdt

=d

dt

(x

V (t)

)= Rc(t)− xc/τx − xc

V ′(t)V (t)

, (K2)

where Rc := R/V is interpreted as the production rateof the concentration Xc. At division time ti+1 we havexc → ai+1x

ai+1V= xc. Thus xc is unchanged at the division

times and is continuous, meaning Eq. (K2) holds for all t.Eq. (K2) holds for any volume dynamics. Here we

assume that cellular volume grows exponentially, i.e.,V ′(t) = V (t) · ln(2)/τc, where τc is the average cell cycletime of a cell. When this is the case Eq. (K2) becomes

dxcdt

= Rc(t)− xc/τxc &dycdt

= Rc(t)− yc/τyc ,(K3)

where τxc :=(

1τx

+ ln(2)τc

)−1

is the “life-time” of the con-

centration Xc, and the right equation follows by symme-try with similarly defined τyc .

Eq. (K3) can be understood as a dilution approxima-tion, which is exact for the conditional ensemble average.Because it is mathematically identical to Eq. (B1) thesame bounds on the average dynamics follow. To relatethe results to the stochastic dynamics of Xc and Yc, weuse the law of total variance and Eq. (K3) to derive the(co)variance relations

ηxcxc =〈xc/V 〉〈xc〉2

+ ηxcxc , ηycyc =〈yc/V 〉〈yc〉2

+ ηycyc ,

ηxcyc =1

1 + Tcηxcxc +

Tc1 + Tc

ηycyc , (K4)

and the flux balance relation

〈yc〉 = Tc〈xc〉, (K5)

where Tc := τyc/τxc (see below for detailed derivation ofEqs. (K4) and (K5)). For systems where Rc(t) does notvary periodically as a function of the cell-cycle, Xc and Ycare independent of V (t) and Eqs. (K4) and (K5) becomemathematically identical to Eqs. (A4) and (A3) fromwhich the constraints derived for x and y are translatedto X and Y. Following the same steps as in Appendix Band C thus gives the same constraints, which concludesthe proof that bounds derived for molecular abundancesof stochastically driven systems in class Eq. (1) also applyto their cellular concentrations.

When the average dynamics of Xc and Yc is not inde-pendent of V , i.e., when Rc(t) is cell-cycle dependent, wecannot close the system of equations given by Eqs. (K4)and (K5). However, with a third reporter we are able toclose the system of equations to derive bounds similarto the ones presented in the main text for molecularabundances as discussed in Appendix K 3.

Derivation of Eqs. (K4) and (K5): Taking the en-semble average of Eq. (K3) over all possible histories, weget

E

[dxcdt

]= E[Rc(t)]− E[xc]/τxc = 〈Rc〉+ 〈xc〉/τxc .

Note that E[dxcdt

]= d

dtE[xc] = ddt 〈xc〉, so once the av-

erages reach stationarity the left hand side goes to zeroand we are left with the following flux-balance relations

〈Rc〉 = 〈xc〉/τxc & 〈Rc〉 = 〈yc〉/τyc , (K6)

from which Eq. (K5) follows. Moreover, Eq. (K3) is iden-tical to analogous equations derived for the molecularnumbers Eq. (B1), and so we can follow the same analy-sis to derive the following expression

ηxcyc =1

1 + Tcηxcxc +

Tc1 + Tc

ηycyc .

We now use the law of total variance to decomposeηxcxc and ηycyc into two terms as follows:

ηxcxc =E[Var(Xc|u[−∞, t], V [−∞, t])

]

〈xc〉2

+Var

(E[Xc|u[−∞, t], V [−∞, t]

])

〈xc〉2=ηxc,int + ηxcxc ,

where we call the first term of the expansion ηxc,int. Wethus have the following

ηxcxc = ηxc,int + ηxcxc , ηycyc = ηyc,int + ηycyc ,

ηxcyc =1

1 + Tcηxcxc +

Tc1 + Tc

ηycyc . (K7)

29

The intrinsic variance is given byE [Var(Xc|u[−∞, t], V [−∞, t])]. That is, it’s theensemble average of the conditional variance. Considerthe conditional variance

Var(Xc|u[−∞, t],V [−∞, t])

= Var

(X

V

∣∣∣u[−∞, t], V [−∞, t])

=1

V (t)2Var (X|u[−∞, t], V [−∞, t]) ,

We can thus find the conditional variance of X, divide theresult by V (t), and then take the ensemble average to getthe intrinsic variance. To find the conditional variance ofX, we consider an ensemble of N cells with the samehistories u[−∞, t] and V [−∞, t]. The variance of one ofthe cells, say X1, will give us Var(X|u[−∞, t], V [−∞, t]).To find this variance, we start by considering the to-tal number of molecules in the hypothetical ensemble

X(N)T = X1 + X2 + · · · + XN . We now use the law of

total variance conditioned on X(N)T

Var(X1) = E[Var(X1|X(N)T )] + Var(E[X1|X(N)

T ]). (K8)

Note that the expectations and variances here are overthe hypothetical ensemble of cells with the same history.The trick here is to notice that the death rate of each Xi

is xi/τx, and so the death rate of XT is xT /τx. This isequivalent to saying that each molecule has a probabilityper unit time of degrading given by 1/τx, and this prob-ability is independent of all other molecules comprisingXT or any of the Xi. Similarly, each cell in this hypo-thetical ensemble has the same synchronized birthrateR(t), so the probability of some molecule in XT to havebeen born in any of the cells is the same and indepen-dent of how many X molecules are in each cell. Lastly,the process by which a molecule is “degraded” due to celldivision is also independent of all other molecules in eachcell, as we model cell division by a binomial spit whereeach molecule has an equal and independent probabilityof remaining in the cell. All this to say, each molecule hasa probability 1/N of being in X1, and this is independentof how many molecules of XT we know are in X1. This

is a binomial distribution P (X1|X(N)T ) = B(X

(N)T , 1/N),

therefor E[X1|X(N)T ] = X

(N)T /N and Var(X1|X(N)

T ) =

E[X1|X(N)T ](1− 1/N). Eq. (K8) becomes

Var(X1) =(1− 1/N)〈X1〉+ Var

(X

(N)T

N

). (K9)

The Xi are independent and identically distributedrandom variables. They thus obey the weak law oflarge numbers, so the second term in Eq. (K9) goesto zero as N → ∞. Thus Var(X1) = 〈X1〉 =

E[X|u[−∞], V [−∞, t]], and so

Var(Xc|u[−∞, t],V [−∞, t])

=1

V (t)2E[X|u[−∞], V [−∞, t]]

= E

[X

V 2

∣∣∣u[−∞], V [−∞, t]]

= E

[Xc

V

∣∣∣u[−∞], V [−∞, t]].

We can now calculate the intrinsic variance of the realensemble which we write in terms of the normalized vari-ances

ηxc,int =

⟨Xc

V

⟩1

〈xc〉2& ηyc,int =

⟨YcV

⟩1

〈yc〉2,

(K10)

which together with Eq. (K7) leads to Eq. (K4). Whenthe reporter concentrations are independent of the cellvolume, Xc will be independent of V , and so 〈Xc/V 〉 =〈xc〉〈 1

V 〉, and similarly for y, which gives us

ηxc,int =

⟨1

V

⟩1

〈xc〉& ηyc,int =

⟨1

V

⟩1

〈yc〉. (K11)

This equation, along with Eq. (K3) and Eq. (K7), com-prise the exact same equations as Eqs. (A3), (B1), and(B3), from which the bounds for the class of systems inEq.(1) of the main text were derived. The only differ-ence is that now the (co)variances will be for the molec-ular concentrations, and the life-times τxc and τyc arefor the concentrations as they include a contribution dueto dilution from the growing volume. Note that if Rc isstochastic (non-oscillatory), then it must be independentof the cell volume V , and so the bound on stochastic sys-tems given by Eq. (4) holds in general for concentrations.

2. Derivation of Eq. (5) and Eq. (7) for volumeindependent fluorescent protein concentrations

Next, we consider the analogue of the class of systemsin Fig. 7. The fluorescent protein numbers are still de-scribed by the reactions in Fig. 7, with the addition thatthe mRNA and proteins are also each “degraded” by celldivision

(V, z)At time ti−−−−−−→(aiV,B(z, ai)),

for t1, t2, . . . , with z ∈ {x′, y′, x′′, y′′}.

When the fluorescent protein components X ′, X ′′, Y ′,Y ′′ do not affect the environmental variables u(t) andV , we can analyze the average stochastic dual reporterdynamics conditioned on the history of their upstream

30

influences

x′(t) = E[X ′t|u[−∞, t], V [−∞, t]

],

y′(t) = E[Y ′t |u[−∞, t], V [−∞, t]

],

¯x′(t) = E[X ′t|u[−∞, t], V [−∞, t],ux[−∞, t]

],

¯y′(t) = E[Y ′t |u[−∞, t], V [−∞, t],uy[−∞, t]

].

We follow the same analysis as was done in the previoussection to derive the following differential equations

dx′cdt

= Fc(t)− x′c/τx′c

dy′cdt

= Fc(t)− y′c/τy′cdx′′cdt

= x′c/τmat,x − x′′c /τ ′′cdy′′cdt

= y′c/τmat,y − y′′c /τ ′′c(K12)

d¯x′cdt

= Fx,c(t)− ¯x′c/τx′c

d¯y′cdt

= Fy,c(t)− ¯y′c/τy′c

d¯x′′cdt

= ¯x′c/τmat,x − x′′c /τ ′′cd¯y′′cdt

= ¯y′c/τmat,y − y′′c /τ ′′c(K13)

where τx′c

= ( 1τmat,x

+ ln(2)τc

)−1, τ ′′c =(

1τ ′′ + ln(2)

τc

)−1

,

Fc(t) = 1V (t)E[F (u(t),ux)|u[−∞, t], V [−∞, t]] and

Fx,c = 1V (t)F (u(t),ux(t)). We take the ensemble aver-

age of Eq. (K12) over the different histories, which oncethe first moments reach stationarity, leads to the follow-ing flux balance relations

〈x′′c 〉/τ ′′c =τx′c

τmat,x〈Fc〉 & 〈y′′c 〉/τ ′′c =

τy′cτmat,y

〈Fc〉.(K14)

Eqs. (K12) and (K13) are identical to the analogous dif-ferential equations derived in the molecular number sys-tem, with the exception that the degradation time of x′c isgiven by τx′

cinstead of τmat,x due to the added degrada-

tion that comes from dilution from the growing volume.Nevertheless, we can follow the same analysis to derivethe following expression

ηx′′c y

′′c

= ηx′′c y

′′c

=1

1 + T cmηx′′

c x′′c

+T cm

1 + T cmηy′′c y′′c , (K15)

where T cm := τy′c/τx′c. The last four equations are mathe-

matically identical to the analogous equations derived forco-regulated fluorescent proteins without feedback, fromwhich, along with Eq. (E4), we derived the bounds in thefluorescent protein system. We thus need to show thatan equation identical to Eq. (E4) holds for the reporterconcentrations. From there the constraints follow fromour previous proofs. To do this, we can use the law oftotal variance to decompose the variance of Xc by con-ditioning on the history of u, V , and ux, which givesus

Var(X ′′c )

= E [Var(X ′′c |ux[−∞, t],u[−∞, t], V [−∞, t])] + Var(¯x′′c ).

We would now like to apply the same analysis as was donefor the mRNA concentration system in order to derive anexpression for the first term on the right. We thus look atan ensemble of N cells with the same histories u[−∞, t],V [−∞, t], and ux[−∞, t], which corresponds to each cellhaving the same synchronized translation rate F (t). Insuch a case, we use the same trick as was done for themRNA system in the previous section to show

E [Var(X ′′c |ux[−∞, t],u[−∞, t], V [−∞, t])] =

⟨X ′′cV

⟩.

When the reporter concentrations are independent of thecell volume, we have

⟨XcV

⟩=⟨

1V

⟩〈x′′c 〉, and so in terms

of the normalized variances we have

ηx′′c x

′′c

=

⟨1

V

⟩1

〈x′′c 〉+ η¯x′′

c¯x′′c

ηy′′c y′′c =

⟨1

V

⟩1

〈y′′c 〉+ η¯y′′c ¯y′′c .

(K16)

This is the equation that we were seeking as it is math-ematically analogous to Eq. (E4). The factor on front ofthe average terms does not change the analysis as it isonly the ratio of these terms that came into our deriva-tions. This ratio is different than in the molecular num-bers system due to the added degradation that comesfrom dilution from the growing volume. In particular

ηx′′c x

′′c− η¯x′′

c¯x′′c

ηy′′c y′′c − η¯y′′c ¯y′′c

=T cmTm

. (K17)

We can follow the exact same steps as was done for thefluorescent protein number system to derive the analo-gous constraints in terms of the reporter concentrations.The only difference would be in the right-most open loopbound which shifts as a result of this added ”dilutiondegradation” of the immature protein concentrations.

3. Volume dependent genes — exact constraintsusing a third reporter

We were unable to formally prove the open-loop con-straint for volume dependent genes. Mathematically, theproblem lies in the fact that when the reporter concen-trations are volume dependent, the intrinsic terms ηxc,intand ηyc,int are given by Eq. (K10) and not by Eq. (K11).These introduce two new unknowns to the system ofequations, namely

⟨XcV

⟩1〈xc〉2 , and

⟨YcV

⟩1〈yc〉2 , which

makes the system of equations under-determined. Whenthe concentrations are volume independent, Eq. (K6)and Eq. (K11) imply that the ratio of intrinsic noiseterms are given by ηxc,int/ηyc,int = Tc, which allowsus to close the system of equations and solve for ηxcxcand ηycyc in terms of the measurable (co)variances whichare bounded by the sought after constraints. Note that√ηxc,int/ηyc,int sets the point along the ρxcyc line in

Fig. 4B (left panel) where the orange lines converge. For

31

volume dependent concentrations, numerical simulationsshow that

√ηxc,int/ηyc,int 6=

√Tc, but it holds to good

approximation with a divergence of up to around 6% forthe specific models that we simulated.

Though we cannot prove the open-loop constraint us-ing two reporters, we can infer the missing degree of free-dom by introducing an additional reporter into the sys-tem and derive an open-loop constraint on volume depen-dent genes. In particular, Eq. (B4) and Eq. (C3) still holdin terms of concentrations of growing and dividing cells(similarly for the fluorescent reporters), even when thereporter concentrations are cell volume dependent. It’sonly because the ηxc,int and ηyc,int terms become morecomplicated for volume dependent systems that we can-not close the system of equations and solve for ηxcxc andηycyc in terms of the measurable (co)variances. Thus, ifwe can measure ηxcxc and ηycyc then we can take full ad-vantage of the constraints. With a third reporter we areable to solve for ηxcxc and ηycyc in terms of the measur-able (co)variances as explained in Sec. J 3. In that sectionwe showed how to measure ηx′′x′′ and ηy′′y′′ using threeseparate dual reporter experiments involving three dualreporter pairs, but the exact same reasoning can be usedto measure ηxcxc and ηycyc (or the fluorescent proteinequivalents), as the same covariance equations used inthat section hold exactly for the reporter concentrations.Thus even if we are unable to formally prove the open-loop constraint for volume dependent systems with tworeporters, with a third reporter we are able to measurethe missing degree of freedom to prove the constraint.

Appendix L: Experimental data analysis

Balleza et al. quantified the maturation dynamics of50 different fluorescent proteins (FPs) in E. coli usingtime-lapse microscopy [15]. Their data show that thematuration step of roughly a third of the tested FPsare well described by first-order kinetics as assumed inour class of gene expression models. Additionally, theauthors quantified heterogeneity in fluorescence levelswhen the respective FPs were expressed under the con-stitutive promoter proC. These data were obtained fromflow-cytometry measurements of clonal E. coli MG1655(CGSC 6300) populations in M9-rich media at 37◦C.From their publicly available raw cell-to-cell heterogene-ity data [36], we selected the ten FPs that were bestmodelled by first-order maturation kinetics (see Supple-mental material [34]).

FP abundance CVs were obtained from the reportedflow-cytometry fluorescence distributions. Under the as-sumption that reporter concentrations are cell-volume in-dependent, i.e., transcription rates are cell-cycle indepen-dent, then the concentration CVs are related to the CVsin abundances and the CV in cell volume CVV [25]

CV 2concentration =

CV 2abundance

1 + CV 2V

− CV 2V

1 + CV 2V

. (L1)

We estimated CVV using separate time-lapse microscopydata that quantifies E. coli growth dynamics [37].From the publicly available time-traces of E. colilength we computed an average volume variability ofE. coli MG1655 (CGSC 6300) grown in LB media ofCVV = 0.261± 0.005. The resulting concentration vari-ability data are presented in Fig. 4C and a complete listof concentration CVs obtained this way along with thechosen FPs and their maturation times are presented inTable 1 in the Supplemental Material [34].

As shown in Fig. 8, all fluorescent protein pairs fallwithin the region expected for constitutively expressedgenes, with the exception of mEGFP. The data fall alongthe right orange bound given by the right inequality inEq. (8). According to Eq. (K14), systems that lie along

this boundary satisfy CVx′′c/CVy′′c =

√〈y′′c 〉/〈x′′c 〉 where

x′′c and y′′c denote the concentrations of the fluorescentproteins. Systems that lie along this bound thus haveCVs that scale inversely with the square root of the av-erages. According to Eq. (K16), the concentration CVsare given by

CV 2x′′c

=

⟨1

V

⟩1

〈x′′c 〉

+

∫ ∞

0

ηFFAFxc (t)

(τmat,xe

−t/τmat,x − τ ′′c e−t/τ′′c

τ2mat,x − τ ′′2c

)dt,

(L2)

where Fxc is the production rate of the immature proteinconcentrations. The first term on the right, which scalesinversely to the average, corresponds to noise that origi-nates at the translation step, the maturation step, fluc-tuations in protein degradation, and binomial splittingof proteins at cell division. The term on the right corre-sponds to noise that originates from transcription as wellas mRNA and translation rate fluctuations. Therefor,data that lie along the right orange boundary in Fig. 8corresponds to variability with negligible transcriptionnoise contributions.

In order to confirm this observation and also analyzethe mEGFP discrepancy, we plot in Fig. 8 the fluorescentprotein concentration CVs as a function of their matura-tion times. In the regime where the the second term onthe right of Eq. (L2) is negligible, and where we assumethat the fluorescent protein concentrations are degradedsolely by dilution from the growing and dividing cells, wecan use Eq. (K14) to write

CVx′′c

= A

√ln(2)

τc

(1 + ln(2)

τmat,xτc

), (L3)

where A =√⟨

1V

⟩1〈F 〉 is an unknown parameter. From

this equation we see that as the maturation time getslarger, so does the CV. This is because as the matura-tion time gets larger, the fraction of matured proteinsis reduced, which results in larger intrinsic fluctuationsthat originate from the maturation step and from cell

32

A) B) C)

D) E) F)

G) H) I)

Figure 8. Utilizing constraints on flow-cytometry data from constitutively expressed fluorescent proteins. Herewe plot all of the possible pairs from Table 1 in the Supplemental Material [34]. We plot 9 separate plots instead of combiningthe data into a single plot because for a given T cm the right bound given by Eq. (8) is only determined if we specify one of thematuration times. Thus for fixed Y reporter, the right bound is well defined for any given X reporter, with the yellow corridorindicating the estimated uncertainty in τc and τmat,y. Because all fluorescent proteins were constitutively expressed and werenot fused to cellular proteins we expect the experimental data to be consistent with the class of gene expression models that donot exhibit feedback and are not periodically driven (pink region). All variability ratios with respect to all reference fluorescenceproteins confirm the above picture with the exception for mEGFP (with ratios indicated in red) for which the data violate theexpected behaviour. With the exception of the indicated mEGFP outlier all data fall along the right hand boundary.

partitioning of matured proteins. On the other hand,we can look at the regime in which there is negligibletranslation noise (where the second term on the right ofEq. (L2) dominates). We model the autocorrelation ofthe protein concentration translation rate as a decayingexponential AFxc (t) = e−t/τF , and we assume that theprotein concentrations are degraded solely from dilution,in which case we have

CVx′′c

= B

√τF (τ ′′c τx′

c+ τF (τ ′′c + τx′

c))

(τF + τ ′′c )(τF + τx′c)(τ ′′c + τx′

c), (L4)

where τ ′′c = τc/ ln(2), τx′c

= (1/τmat,x + ln(2)/τc)−1

, andB =

√ηFF is an unknown parameter. In Fig. 9 we plot

Eq. (L3) and Eq. (L4) with A = 1.3, B = 1.2, τF = τc/10,and τc = 28.5± 2 min (as reported in [15]). We find thatthe computed CVs from the flow cytometry data sets arewell described by Eq. (L3), except for the mEGFP pointwhich displays a CV smaller than the others. The onlyparameter left in this model to vary is the division time

of the cells τc. When this parameter is changed to 35.6min, the model captures the mEGFP data point as shownby the dashed curve in Fig. 9. The discrepancy betweenmEGFP and the other 9 fluorescent proteins could thusbe explained by a slightly slower growth rate in the cellcultures used for the mEGFP data sets. The blue curve,on the other hand, corresponds to the negligible transla-tion noise model given by Eq. (L4). We find that as thematuration time gets larger, the CV gets smaller, whichis expected as fluorescent proteins with large maturationtimes have less time to adjust to varying upstream fluc-tuations and inherit less variability.

The previous analysis indicates that the variability isdominated by noise originating from the translation step,maturation step, fluctuations in protein degradation, andnoise originating from binomial splitting of protein num-bers at cell divisions, which are all described by the firstterm on the right in Eq. (L2). However, it is possible thatmeasurement noise which scales inversely with the aver-age is contributing to the measured variability. For exam-

33

ple, in [25] it is shown that flow cytometry measurementscontain a significant amount of measurement noise whenused with bacteria due to their small size. This measure-ment noise is shown to scale inversely with the averageof the fluorescence signal, meaning it could mask itself asbiological variability contributing to the first term on theright of Eq. (L2). In [25] a rigorous method is developedto measure the true CVs using flow cytometry on bacte-ria, separating measurement noise and autofluorescencecontributions using commonly used calibration beads.

0 50 100 150 200

Maturation time (min)

0.20

0.25

0.30

0.35

0.40

0.45

0.50

Conce

ntr

ati

on v

ari

abili

ty C

V

Negligible transcription noise model

Growth rate 35.6 min

Growth rate 28.5 2 min

Negligible translation noise model

Growth rate 28.5 2 min

+_

+_

Figure 9. The data is best described by a model withnegligible transcription noise. Plotted are the concentra-tion CVs from Table 1 of the Supplemental Material [34] asa function of their respective fluorescent protein maturationtimes. The pink curve corresponds to the model given byEq. (L3) with A = 1.3 and τc as reported in [15]: 28.5 ± 2min. The pink corridor indicates the result of the uncertaintyin τc. The blue curve corresponds to the model given byEq. (L4) in which translation noise is negligible, with B = 1.2,τF = 28.5/10 min, and τc = 28.5 ± 2 min. We find that thedata follows the pink curve and is thus best described by thenegligible transcription noise model. The data point withthe smallest CV corresponds to mEGFP, and the dashed redcurve corresponds to the same model as the pink curve butwith τc increased by a factor of 1.25. A possible explanationfor this relatively low CV observed for mEGFP is that the cellcultures expressing mEGFP had slightly slower growth rates.

[1] M. Thattai and A. van Oudenaarden, Intrinsic noise ingene regulatory networks, Proc. Natl. Acad. Sci. U. S. A.98, 8614 (2001).

[2] E. M. Ozbudak, M. Thattai, I. Kurtser, A. D. Grossman,and A. van Oudenaarden, Regulation of noise in the ex-pression of a single gene, Nat. Genet. 31, 69 (2002).

[3] W. J. Blake, M. Kærn, C. R. Cantor, and J. J. Collins,Noise in eukaryotic gene expression, Nature 422, 633(2003).

[4] A. Bar-Even, J. Paulsson, N. Maheshri, M. Carmi,E. O’Shea, Y. Pilpel, and N. Barkai, Noise in proteinexpression scales with natural protein abundance, Nat.Genet. 38, 636 (2006).

[5] J. R. Newman, S. Ghaemmaghami, J. Ihmels, D. K. Bres-low, M. Noble, J. L. DeRisi, and J. S. Weissman, Single-cell proteomic analysis of S. cerevisiae reveals the archi-tecture of biological noise, Nature 441, 840 (2006).

[6] A. Eldar and M. B. Elowitz, Functional roles for noise ingenetic circuits, Nature 467, 167 (2010).

[7] D. Jones and J. Elf, Bursting onto the scene? Exploringstochastic mRNA production in bacteria, Current Opin-ion in Microbiology 45, 124 (2018).

[8] M. Kærn, T. C. Elston, W. J. Blake, and J. J. Collins,Stochasticity in gene expression: from theories to pheno-types, Nature Reviews Genetics 6, 451 (2005).

[9] A. Hilfinger, T. M. Norman, and J. Paulsson, Exploit-ing natural fluctuations to identify kinetic mechanismsin sparsely characterized systems, Cell Systems 2, 251(2016).

[10] Y. Taniguchi, P. J. Choi, G.-W. Li, H. Chen, M. Babu,J. Hearn, A. Emili, and X. S. Xie, Quantifying E. coliproteome and transcriptome with single-molecule sensi-tivity in single cells., Science 329, 533 (2010).

[11] E. Lubeck, A. F. Coskun, T. Zhiyentayev, M. Ahmad,and L. Cai, Single-cell in situ RNA profiling by sequentialhybridization, Nat Meth 11, 360 (2014).

[12] M. B. Elowitz, A. J. Levine, E. D. Siggia, and P. S. Swain,Stochastic gene expression in a single cell, Science 297,1183 (2002).

[13] J. M. Raser and E. K. O’Shea, Control of stochasticityin eukaryotic gene expression, Science 304, 1811 (2004).

[14] H. Maamar, A. Raj, and D. Dubnau, Noise in gene ex-pression determines cell fate in Bacillus subtilis, Science317, 526 (2007).

[15] E. Balleza, J. Mark Kim, and P. Cluzel, Systematic char-acterization of maturation time of fluorescent proteins inliving cells, Nature Methods 15, 47 (2018).

[16] S. S. Shen-Orr, R. Milo, S. Mangan, and U. Alon, Net-work motifs in the transcriptional regulation network ofEscherichia coli , Nature genetics 31, 64 (2002).

34

[17] A. Baudrimont, V. Jaquet, S. Wallerich, S. Voegeli, andA. Becskei, Contribution of RNA degradation to intrinsicand extrinsic noise in gene expression, Cell reports 26,3752 (2019).

[18] A. Hilfinger, T. M. Norman, G. Vinnicombe, andJ. Paulsson, Constraints on fluctuations in sparselycharacterized biological systems, Phys. Rev. Lett. 116,058101 (2016).

[19] A. Raj, C. S. Peskin, D. Tranchina, D. Y. Vargas, andS. Tyagi, Stochastic mRNA synthesis in mammaliancells, PLoS Biol 4, e309 (2006).

[20] S. O. Skinner, L. A. Sepulveda, H. Xu, and I. Gold-ing, Measuring mRNA copy number in individual Es-cherichia coli cells using single-molecule fluorescent insitu hybridization, Nature protocols 8, 1100 (2013).

[21] N. Eling, M. D. Morgan, and J. C. Marioni, Challengesin measuring and understanding biological noise, NatureReviews Genetics 20, 536 (2019).

[22] S. A. Emory, P. Bouvet, and J. G. Belasco, A 5’-terminalstem-loop structure can stabilize mRNA in Escherichiacoli, Genes & development 6, 135 (1992).

[23] D. Cheneval, T. Kastelic, P. Fuerst, and C. N. Parker, Areview of methods to monitor the modulation of mRNAstability: a novel approach to drug discovery and ther-apeutic intervention, Journal of Biomolecular screening15, 609 (2010).

[24] S. M. Shaffer, M.-T. Wu, M. J. Levesque, and A. Raj,Turbo FISH: a method for rapid single molecule RNAFISH, PLoS One 8, e75120 (2013).

[25] L. Galbusera, G. Bellement-Theroue, A. Urchueguia,T. Julou, and E. van Nimwegen, Using fluorescence flowcytometry data for single-cell gene expression analysis inbacteria, PloS One 15, e0240233 (2020).

[26] D. Grun, L. Kester, and A. Van Oudenaarden, Valida-tion of noise models for single-cell transcriptomics, Na-ture methods 11, 637 (2014).

[27] A. Raj, P. Van Den Bogaard, S. A. Rifkin, A. Van Oude-naarden, and S. Tyagi, Imaging individual mRNAmolecules using multiple singly labeled probes, Naturemethods 5, 877 (2008).

[28] C.-H. L. Eng, M. Lawson, Q. Zhu, R. Dries, N. Koulena,Y. Takei, J. Yun, C. Cronin, C. Karp, G.-C. Yuan, et al.,Transcriptome-scale super-resolved imaging in tissues byRNA seqFISH+, Nature 568, 235 (2019).

[29] A. Rhee, R. Cheong, and A. Levchenko, Noise decompo-sition of intracellular biochemical signaling networks us-ing nonequivalent reporters, Proceedings of the NationalAcademy of Sciences 111, 17330 (2014).

[30] N. G. Van Kampen, Stochastic processes in physics andchemistry, Vol. 1 (Elsevier, 1992).

[31] I. Lestas, J. Paulsson, N. E. Ross, and G. Vinnicombe,Noise in gene regulatory networks, IEEE Trans. Au-tomat. Contr. 53, 189 (2008).

[32] A. Hilfinger and J. Paulsson, Separating intrinsic fromextrinsic fluctuations in dynamic biological systems,Proc. Natl. Acad. Sci. U. S. A. 108, 12167 (2011).

[33] A. Hilfinger, M. Chen, and J. Paulsson, Using tempo-ral correlations and full distributions to separate intrin-sic and extrinsic fluctuations in biological systems, Phys.Rev. Lett. 109, 248104 (2012).

[34] See supplemental material attached to the end of themanuscript.

[35] W. B. Davenport, W. L. Root, et al., An introduc-tion to the theory of random signals and noise, Vol. 159(McGraw-Hill New York, 1958).

[36] E. Balleza, Flow Cytometry Data, Harvard Dataversehttps://doi.org/10.7910/DVN/T4VSGH (2017).

[37] P. Wang, L. Robert, J. Pelletier, W. L. Dang, F. Taddei,A. Wright, and S. Jun, Robust growth of escherichia coli,Current biology 20, 1099 (2010).

Inferring gene regulation dynamics from static snapshots of gene expression variability

Supplementary Material

Euan Joly-Smith,1 Zitong Jerry Wang,2 and Andreas Hilfinger1, 3, 4

1Department of Physics, University of Toronto, 60 St. George St., Ontario M5S 1A7, Canada2California Institute of Technology, Division of Biology and Biological Engineering, Pasadena CA 91125, USA

3Department of Mathematics, University of Toronto, 40 St. George St., Toronto, Ontario M5S 2E44Department of Cell & Systems Biology, University of Toronto, 25 Harbord St, Toronto, Ontario M5S 3G5

CONTENTS

1. Inferring the timescale of upstream variability — numerical bound 1

2. Examples of non-ergodic systems 3

3. Flow cytometry data analysis 4

4. Simulations 5

References 6

1. INFERRING THE TIMESCALE OF UPSTREAM VARIABILITY — NUMERICAL BOUND

Using Eq. (C1), we have

ηxx − aηyy =

∫ ∞

0

ηRRAR(t)

(e−t/τx

τx− ae

−t/τy

τy

)dt where T ≤ a ≤ 1 .

The expression in parentheses is negative for t ≤ t∗ and positive for t ≥ t∗, where t∗ = ln(a/T )τxτy/(τx − τy). Wethus have

ηxx − aηyy =

∫ t∗

0

ηRRAR(t)

(e−t/τx

τx− ae

−t/τy

τy

)dt+

∫ ∞

t∗ηRRAR(t)

(e−t/τx

τx− ae

−t/τy

τy

)dt

≥∫ t∗

0

ηRR

(e−t/τx

τx− ae

−t/τy

τy

)dt+

∫ ∞

t∗ηRRAR(t)

(e−t/τx

τx− ae

−t/τy

τy

)dt

≥∫ t∗

0

ηRR

(e−t/τx

τx− ae

−t/τy

τy

)dt+

∫ ∞

t∗ηRRe

−t/τR(e−t/τx

τx− ae

−t/τy

τy

)dt , (1.1)

where in the second step we use the fact that the expression in parantheses is negative and AR(t) ≤ 1, and the thirdstep comes from the fact that AR(t) ≥ e−t/τR . Without loss of generality we work in units where τx = 1 and τy = T ,in which case we solve the above integral to obtain

ηxx − aηyy ≥ ηRRh(a, T, τR) (1.2)

where h(a, T, τR) =

1− a+ a

( aT

)− 11−T −

( aT

)− T1−T

+τR(aT

)− (1+τR)T

τR(1−T )

1 + τR− aτR

(aT

)− (r+T )τR(1−T )

T + τR

.

Now, we can infer that h(a, T, τR) is zero for some a∗ ∈ [T, 1]. In particular, for a = T we have

h(T, T, τR) =τR

1 + τR− TτRT + τR

=τ2R(1− T )

(1 + τR)(T + τR)> 0 .

Moreover, we note that the τR dependence of h comes from the integral on the right in Eq. (1.1), and since theexpression in parenthesis in this integral is always positive, the integral will increase as τR increases. As a result,

arX

iv:2

109.

0039

2v1

[q-

bio.

MN

] 1

Sep

202

1

2

h(a, T, τR) is a monotonically increasing function of τR and so h(1, T, τR) < h(1, T, τR →∞) = 0. Thus, we’ve shownthat h(a, T, τR) is positive for a = T and negative for a = 1, and so by the mean value theorem there must exist ana∗ ∈ [T, 1] s.t. h(a∗, T, τR) = 0. This a∗ can be found numerically for a given τR and T , and plugging this a∗ intoEq. (1.3) gives us the strictness bound on systems that satisfy e−t/τR ≤ AR(t)

ηxx − a∗ηyy ≥ 0 where h(a∗, T, τR/τx) = 0 , (1.3)

where the τx factor is added for systems with arbitrary units. We were unable to solve for an analytical solutionto a∗ given T and τR. However, through trial and error, and by looking at the functional form of the solution ofsystems with exponential auto-correlations, we proposed the ansatz that h(a∗ansatz, T, τR/τx) ≥ 0, where a∗ansatz =(0.9 + Tτx/τR)/(0.9 + τx/τR). By numerically plotting h(a∗ansatz, T, τR/τx) over different T ∈ [0, 1] and τR ∈ [0, 100](see Fig. S1), we find that h(a∗ansatz, T, τR/τx) ≥ 0 in this regime of parameters, and so through Eq. (1.3) we have theslightly weaker bound

ηxx − a∗ansatzηyy ≥ 0 where a∗ansatz =

(0.9 + Tτx

τR

0.9 + τxτR

). (1.4)

T0.00.20.40.60.81.0

0

100

0.000

0.005

0.010

50

Figure S1. Plot of h(a∗ansatz, T, τR/τx) over different T and τR/τx values, where a∗ansatz = (0.9 + T τxτR

)/(0.9 + τxτR

). Note that

h here is always positive and for most values of the domain it is approximately zero (the peak is hmax ≈ 0.02).

We recall from Eq. (B2) that ηxx = ηxR and ηyy = ηyR, and we can solve for ηxR and ηyR in terms of measurable(co)variances through Eq. (B5). Substituting these into the above inequality gives us

(a− T )ρxy ≤(a+ T

1 + T

)(CVxCVy

− T CVyCVx

)where a =

(0.9 + Tτx

τR

0.9 + τxτR

). (1.5)

Systems that break the above inequality cannot satisfy e−t/τR ≤ AR(t), and so the upstream variability must fluctuatewith timescale smaller than τR, see Fig. 6.

For fluorescent protein reporters of the class of systems in Fig. 2A of the paper, we can follow the same analysis asabove. In particular, we consider systems in which the following holds

e−t/τR ≤ AF (t) where F (t) = E[F (u(t),ux)|u[−∞, t]

].

That is, F (t) is the translation rate when we average out all the intrinsic fluctuations originating from the mRNAintrinsic system, and so AF (t) characterizes the timescale of the upstream fluctuations originating from the cloud ofcomponents u(t). Similarly to the above, we have

ηx′′x′′ − aηy′′y′′ ≥∫

g(t)<0

ηRRg(t)dt+

g(t)≥0

ηRRe−t/τug(t)dt

where g(t) =

(τmat,xe

−t/τmat,x − τ ′′e−t/τ ′′

τ2mat,x − τ ′′2

− aτmat,ye−t/τmat,y − τ ′′e−t/τ ′′

τ2mat,y − τ ′′2

),

In order to find the strongest bound on these systems, the largest Tm ≤ a∗ ≤ 1 value that would set the right handside of the above inequality to zero would need to be found. This can be done numerically, where this a∗ value woulddepend on τR, T , and τ ′′. If τ ′ is not known, then the largest a that would set the right hand side of the aboveinequality to be positive for all τ ′ can be used. In both of these cases we would end up with

aηy′′y′′ ≤ ηx′′x′′ where a = a∗ (see above paragraph) .

3

When there is no feedback, recall that Eq. (E6) and Eq. (E8) hold. Using these equations with the above inequalitywe find

(a− Tm)ρx′′y′′ ≤(a+ Tm1 + Tm

)(CVx′′

CVy′′− Tm

CVy′′

CVx′′

). (1.6)

Systems that break the above inequality must break the inequality given by e−t/τR ≤ AF (t), and so the upstreamvariability must fluctuate with timescales smaller than τR.

2. EXAMPLES OF NON-ERGODIC SYSTEMS

Non-ergodic systems are systems where time averages do not correspond to population averages. Though the cloudof components u(t) in Fig. 1A is dynamic and can vary as a function of time, it is left unspecified and we do notneed to assume ergodicity in R(u(t)). As an example, consider the case where u(t) is such that the rate R(u(t))evolves from some initial value and eventually reaches one of two values, λ1 or λ2, from which it stays constant, seeFig. S2A. Here, let’s say that the choice of states λ1 or λ2 is random with a probability of 1/2. If we follow any singlecell over time from some fixed initial state, the transcription rate will evolve to either R = λ1 or R = λ2 with equalprobability, and then stay constant. If we wait for stationarity, in which each system has left the transient state, thenwe have two sub-populations. Such a system, though not ergodic, will satisfy the constraints presented in the maintext. In particular, this exact system will lie along the right boundary of the open-loop constraint given by Eq. (2)of the main text. In previous sections we’ve conditioned on the history of the upstream variables u(t) for systemswithout feedback. For the system in Fig. S2A this corresponds to conditioning on R = λ1 or R = λ2. When we takethe average over all histories we effectively average over the two R states.

A) B)

Reached stationarityReached stationarity

Figure S2. Irreversible cell states are non-ergodic processes. A) Here are possible realizations of the transcription rateR(u(t)) in a particular system within the class of Fig. 1A in which the rate R either remains at the constant state of R = λ1

or R = λ2 after the initial transient dynamics have decayed. In each realization, the rate has a probability of 1/2 of growinginto the R = λ1 state or into the R = λ2 state. This could, for example, model a population of cells in which each cell mustirreversibly switch to one of two states. This system is not ergodic, as following an individual cell over time can only provideinformation on the sub-population that shares that cell’s state, and not the whole population. B) A similar system as A, withone of the states having an oscillating transcription rate and the other a stochastic transcription rate.

We can consider a more general non-ergodic system in which the cells move into one of two states with equalprobability, defined by either having a transcription rate R1(t) or R2(t). For example, R1 could be a stochastic signaland R2 could be an oscillation, see Fig. S2B. Here when we condition on the history we are also conditioning on thestate of the cell by specifying whether the history is an oscillation or not. When we take the average over all historieswe in effect take the average of all histories in each state and then average over each state.

It is worth noting that for such non-ergodic systems with different transcription states, if there is any feedbackbetween the downstream variables X and Y and the cloud of components u(t) in any of the different cell states, thenthat constitutes feedback in our framework and could break the open-loop constraint given by Eq. (2).

Moreover, the bound on stochastic systems given by Eq. (4) bounds systems with non-negative autocorrelation.The autocorrelation here is the autocorrelation obtained by ensemble averages, defined by

A(s) :=〈R(t+ s)R(t)〉 − 〈R(t+ s)〉〈R(t)〉

Var[R(t)],

where outer brackets are averages over the population, and the variance at the bottom is the variance over thepopulation. For a non-ergodic system of the form in Fig. S2B, this autocorrelation would be an average between theautocorrelations of the two possible transcription rates. It’s thus possible that a system will break the constraintgiven by Eq. (4) if only one of the sub-populations has an oscillating transcription rate.

4

3. FLOW CYTOMETRY DATA ANALYSIS

Figure S3. Filtering cells from debris based on scattering and fluorescence profiles. Here we plot a figureillustrating the filtering strategy that was used to filter out debris from the raw flow cytometry datasets. The side (SSC) andforward (FSC) scattering signals were used to separate the cluster of cell-like objects from debris measurements that lie outsidethe cluster. Additionally, the green (FITC) and cyan (mCFP) signals were used with the scattering signals to separate thecluster of fluorescent objects. On the bottom right is the resulting green fluorescence distribution for mEGFP, with statisticsshowing the abundance of post-sorted events (with pre-sorted at 10,000), the mean, and the coefficient of variation.

For each fluorescent protein, coefficients of variation (CVs) were obtained for three independent cell cultures ac-cording to Fig. S3, and the average was taken with an uncertainty given by the standard error. The CVs that wereobtained this way correspond to abundance CVs. In order to obtain the concentration CVs, we used Eq. (L1) asdescribed in Appendix L. Note that CVV can be estimated from first principles for a given cellular growth dynamics.For example, if we assume that the volume is growing exponentially between divisions with symmetric division times,then CVV ≈ 0.2 (similarly for a volume growing linearly). This is independent of the cell division time and can beused as a lower bound for CVV . In general, CVV will be bigger than 0.2 due to stochastic effects like asymmetricdivisions and division times. We thus estimated CVV using separate data that explicitly quantifies E. coli growthdynamics through time-lapse microscopy [1]. In particular, in [1] individual E. coli cells are followed and their lengthsare measured as a function of time for about 100 divisions, see Fig. S4. From the publicly available data we analyzedthe length time-traces of the mother cells from one experiment with E. coli MG1655 (CGSC 6300) grown in LBmedia at 37◦C. This includes over 100 length time-traces like the one in Fig. S4, thus constituting approximately104 divisions. We noticed that many time-traces exhibited large fluctuations near the end (Fig. S4), which suggestsa systematic error. Therefor, we computed the length CV from the first half of each time-trace, and then took theaverage, resulting in CVV = 0.261± 0.005.

Of the 50 fluorescent proteins that were studied in [2], varying maturation kinetics were observed, including first-order exponential kinetics as assumed in Fig. 2, second-order kinetics, and other more complex maturation kinetics.As a result, two effective maturation times are reported: τ50

mat and τ90mat, corresponding to the time it takes for 50%

or 90% of an ensemble of fluorescent proteins to become mature, respectively. The former, τ50mat, is the maturation

half-life, which for first-order maturation kinetics as assumed in Fig. 2 is related to the average maturation timethrough τ50

mat/τmat = ln(2), and to τ90mat through τ50

mat/τ90mat = ln(2)/ ln(10). In order to pick a subset of fluorescent

proteins that are the most closely modelled by a first-order maturation step, we picked the 10 fluorescent proteinsfor which the experimental ratio τ50

mat/τ90mat lies closest to ln(2)/ ln(10). However, we found that this algorithm would

capture a few fluorescent proteins whose maturation curves in [2] were noisy and could possibly also be described by

5

Part of trace used to compute CV

Figure S4. Example time-trace of growing and dividing E. coli cells. Data taken from [1] where individual E. coli cellswere followed using time-lapse microscopy and their lengths measured over time. One pixel corresponds to 0.0645µm. Many ofthese time-traces exhibited large fluctuations near the end, suggesting a possible systematic error. We therefor computed theCVs of the first half of each time-trace.

a second-order process. Therefore, we picked the 10 fluorescent proteins for which the ratio τ50mat/τ

90mat lies closest to

ln(2)/ ln(10) and that also display a clear first-order trend in the maturation kinetic time-traces reported in [2]. Thechosen fluorescent proteins with their τ50

mat maturation times and their concentration CVs are presented in Table I.

4. SIMULATIONS

Though the constraints presented in the main text are analytically proven, we performed gillespie simulations todemonstrate that the entire bounded regions are achievable. Here we summarize what systems were simulated.

To demonstrate the open-loop constraint we performed gillespie simulations of certain systems that are part of theclasses of Fig. 1A and Fig. 2A of the paper. The systems that were simulated include those driven by a sinusoidal,those driven by an oscillating step function, those driven by a poisson process z, and those driven by the square of apoisson process z2. In all these cases many simulations were performed with varying reporter lifetimes and reporterbirthrate parameters (like the sinusoidal amplitude and frequency, or the half-life of the upstream poisson process,etc.). The time (co)variances for these particular system realizations was then computed, and these correspond tothe blue dots in the paper figures. For closed-loop systems, we performed gillespie simulations of systems driven by

50

Table I. Maturation times and concentration CVs for the selected fluorescent proteins. In the top row we present the10 selected fluorescent proteins from the publicly available data that are most closely modelled by first-order maturation kinetics,along with their maturation half-life (related to the maturation time through τmat = τ50mat/ ln(2)), and the concentration CVscomputed from the flow cytometry datasets. Numbers in parentheses are the uncertainties, taken from [2] for the maturationtimes and taken as the standard error of three replicate flow cytometry data sets along with error propagation for the CVs.In the white cells we report the maturation time ratios and the ratio of CVs. Our constraints are unique for a given pair ofCVs and maturation times, thus we show only one ratio for a given fluorescent protein pair. Each panel corresponds to a setof reporter pairs where the Y reporter is held fixed and the X reporter is varied (equivalent to varying τmat,x while keepingτmay,y fixed). Each column corresponds to a panel in Fig. 8.

6

rates with feedback. In particular, to fill most of the region in the figures, rates of the form R(x) ∝ (K + xn)−1,R(y) ∝ (K + yn)−1, and R(y) ∝ (K + yn + xm)−1 were done over different n, m, and K. The region outside of theorange lines in Fig. 1B and Fig. 2B near the ρ = 1 line was only accessible by rates of the form R(x, y) ∝ U(x− ky),where U is a step function and k is a parameter that was varied.

To demonstrate the constraint on stochastic systems we performed gillespie simulations of open-loop systems thatare known to be stochastic and periodic. These include those driven by a poisson process and those driven by asinusoidal. We performed many simulations where the parameters of these specified birthrates were varied.

To simulate concentrations of growing and dividing cells, we performed gillespie simulations of molecular numbersthat undergo a binomial split at particular times separated by the same period ln(2)τc. The volume was thenmodelled as an exponentially growing volume V ∝ et/τc that is reduced by a factor of 1/2 at the same moments thatthe molecular numbers undergo a binomial split. The reporter numbers and the cell volume were kept track of, andfrom these the concentration (co)variances were computed from the time averages of x(t)/V (t) and y(t)/V (t). Theconcentration production rates Rc := R/V that we simulated include poisson processes, sinusoidals, rates proportionaland inversely proportional to the cell volume, and oscillating step functions.

[1] P. Wang, L. Robert, J. Pelletier, W. L. Dang, F. Taddei, A. Wright, and S. Jun, Robust growth of escherichia coli, Currentbiology 20, 1099 (2010).

[2] E. Balleza, J. Mark Kim, and P. Cluzel, Systematic characterization of maturation time of fluorescent proteins in livingcells, Nature Methods 15, 47 (2018).