J. theor. Biol. (2000) 203, 229–248
doi:10.1006/jtbi.2000.1073, available online at http://www.idealibrary.com

Theory for the Systemic Definition of Metabolic Pathways and their use in Interpreting Metabolic Function from a Pathway-Oriented Perspective

CHRISTOPHE H. SCHILLING*, DAVID LETSCHER†‡ AND BERNHARD Ø. PALSSON*§

*Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093-0412, U.S.A. and †Department of Mathematics, University of California, San Diego, La Jolla, CA 92093-0112, U.S.A.

(Received on 27 December 1999, Accepted in revised form on 25 February 2000)

Cellular metabolism is most often described and interpreted in terms of the biochemical reactions that make up the metabolic network. Genomics is providing near complete information regarding the genes/gene products participating in cellular metabolism for a growing number of organisms. As the true functional units of metabolic systems are their pathways, the time has arrived to define metabolic pathways in the context of whole-cell metabolism for the analysis of the structural design and capabilities of the metabolic network. In this study, we present the theoretical foundations for the identification of the unique set of systemically independent biochemical pathways, termed extreme pathways, based on system stoichiometry and limited thermodynamics. These pathways represent the edges of the steady-state flux cone derived from convex analysis, and they can be used to represent any flux distribution achievable by the metabolic network. An algorithm is presented to determine the set of extreme pathways for a system of any complexity and a classification scheme is introduced for the characterization of these pathways. The property of systemic independence is discussed along with its implications for issues related to metabolic regulation and the evolution of cellular metabolic networks. The underlying pathway structure that is determined from the set of extreme pathways now provides us with the ability to analyse, interpret, and perhaps predict metabolic function from a pathway-based perspective in addition to the traditional reaction-based perspective. The algorithm and classification scheme developed can be used to describe the pathway structure in annotated genomes to explore the capabilities of an organism.

© 2000 Academic Press

Introduction

Metabolism is broadly defined as the complex of physical and chemical processes involved in the maintenance of life. It comprises a vast repertoire of enzymatic reactions and transport processes used to convert thousands of organic compounds into the various molecules necessary to support cellular life. Metabolic objectives are achieved through a sophisticated control scheme that efficiently distributes and processes metabolic resources throughout the cell's metabolic network. Before we can understand the regulatory logic that the cell chooses to implement, we need to ask what the cell is actually attempting to regulate from a systems-based perspective. The collection of reactions and hence pathways that a metabolic network possesses determines the architecture and topology of the network. To harness the production capabilities inherent in the network, the cell must find a way to control the system and the pathways that determine these capabilities.

‡Present address: Department of Mathematics, Oklahoma State University, Stillwater, OK 74078, U.S.A.
§Author to whom correspondence should be addressed.

0022–5193/00/070229+20 $35.00/0

The obvious functional unit in metabolic networks is the actual enzyme or gene product executing a particular chemical reaction or facilitating a transport process. Control of metabolism involves regulation of these individual reactions at various levels from enzymatic activation/inhibition down to transcriptional regulation at the genetic level. By physically controlling the functional units involved in the metabolic network the cell is ultimately controlling its metabolic pathways in a switchboard-like fashion, directing the distribution and processing of metabolites throughout its extensive map of pathways. Thus, in seeking to comprehend the regulatory logic implemented by the cell to control the network it is imperative to understand how the cell is capable of meeting its metabolic objectives through the analysis of its metabolic pathways.

Decades of metabolic research and the advent of rapid genome sequencing technologies and algorithms have placed us on the brink of having a complete metabolic parts catalogue for several organisms. Once a genome has been fully sequenced and annotated, the entire metabolic map representing all the metabolic reactions taking place in the cell can be constructed, and extensive on-line databases have collected this information for a number of organisms (Kanehisa, 1997; Karp et al., 1998; Selkov et al., 1998). This information provides us with the contents of the metabolic system in a given organism. Concurrent advances in the area of cDNA microarrays (Schena et al., 1998; Brown & Botstein, 1999) and DNA chip technology (Hoheisel, 1997; Ramsay, 1988; Lipshutz et al., 1999) have provided the capability of studying the expression patterns and utilization of the metabolic genotype under various environmental conditions. Additionally, computational models of the entire metabolic network are now being developed to analyse, interpret, and predict the genotype–phenotype relationship for fully sequenced organisms (e.g. Schilling et al., 1999a).

All of these methods for the analysis of metabolic networks provide clear and insightful information regarding the activity of metabolic reaction networks from an individual reaction-based perspective. From the microarray data we can observe which genes have been up- or down-regulated under changing conditions, and from the computational approaches it is predicted which fluxes increase or decrease under simulated conditions. It is now an opportune time to provide a new perspective for the analysis of metabolic networks, namely that of a pathway-oriented interpretation. Thus, we seek to translate information on the activity of individual reactions in a metabolic network into distinct metabolic pathways. Ideally, we would like to find a set of pathways that are unique for a particular network, which correspondingly can provide a unique representation of all the possible functional modes or flux distributions that the network can achieve. With such a set of pathways defined for a given network of reactions, we could use these pathways to provide the pathway-based perspective that is needed to understand how metabolic networks are operated and controlled.

Currently, there exist two fundamentally different approaches to the definition of metabolic pathways: (1) qualitative identification based on historical groups of reactions in a database setting (Karp et al., 1999), and (2) rigorous quantitative and systemic definitions based on mathematical principles such as linear algebra and convex analysis (for a recent review see Schilling et al., 1999b).

While much information can be derived from a qualitative analysis of the pathways in a metabolic network, comprehensive approaches for the quantitative definition of metabolic pathways can be used to assess the precise metabolic capabilities and performance of cellular metabolic networks under a broad range of environmental and genetic challenges. This will be illustrated in the following companion paper (Schilling & Palsson, 2000) in which the theoretical approaches outlined below are applied to examine the production capabilities and functional characteristics of Haemophilus influenzae, containing the first fully sequenced genome of a free-living organism (Fleischmann et al., 1995).

In this paper, we present the detailed theoretical foundations of a newly developed approach for the study of metabolic pathway analysis that is based on convex analysis, which has been previously used to study pathways in both chemical and metabolic reaction networks (Clarke, 1988; Schuster et al., 1999). Through the application of convex analysis, the unique set of extreme pathways that are systemically independent can be calculated for any metabolic system and then classified. These pathways can then be used to interpret metabolic functioning and potentially metabolic control. A section devoted to a comparison between existing approaches for pathway analysis based on principles of convexity is also included at the end. Mathematical proofs and instructive examples are provided to illustrate many of the concepts discussed in an effort to make the theory intelligible to anyone with an interest in either biology or mathematics. The application of these theoretical concepts to a fully sequenced and annotated genome will follow (Schilling & Palsson, 2000).

Metabolic Systems and Descriptions

A cellular metabolic reaction network is a collection of enzymatic reactions and transport processes that serve to replenish and drain the relative amounts of certain metabolites. A system boundary can be drawn around all these types of physically occurring reactions, which constitute internal fluxes operating inside the network. The system is closed to the passage of certain metabolites while others are allowed to enter and/or exit the system based on external sources and/or sinks which are operating on the network as a whole. The existence of an external source/sink on a metabolite necessitates the introduction of an exchange flux, which serves to allow a metabolite to enter or exit the theoretical system boundary. These fluxes are not physical biochemical conversions or transport processes like those of internal fluxes, but can be thought of as representing the inputs and outputs to the system. These fluxes can also be referred to as pseudoreactions (Clarke, 1980), and could also represent diffusive exchange with a buffered external reservoir. Demands on a metabolite for further processing or incorporation into cellular biomass would also create an exchange flux on an internal cellular metabolite. (As a consequence, a metabolite located inside the cell is considered distinct from the same metabolite located in the extracellular space.) Furthermore, when considering a whole-cell metabolic network the system boundary is drawn around the entire cell and transport reactions become internal fluxes (similar to the concept of a free-body diagram in mechanical engineering for determining force balances, or shell balances in transport phenomena).

All internal fluxes are denoted by $v_i$ for $i \in [1, n_I]$, where $n_I$ is the number of internal fluxes. All exchange fluxes are denoted by $b_i$ for $i \in [1, n_E]$, where $n_E$ is the total number of exchange fluxes. Limited thermodynamic information can be used to determine whether a chemical reaction can proceed in both the forward and reverse directions or is irreversible, thus physically constraining the direction of the reaction. All internal reactions that are considered to be capable of operating in a reversible fashion are considered as two fluxes occurring in opposite directions, therefore constraining all internal fluxes to be nonnegative. This convention is used purely for mathematical purposes and does not influence the biological interpretation of metabolic function in any way. There can only be one exchange flux per metabolite, whose activity subsequently represents the net production and consumption of the metabolite by the system. Thus, $n_E$ can never exceed the number of metabolites in the system ($m$). The activity of these exchange fluxes is considered to be positive if the metabolite is exiting or being produced by the system, and negative if the metabolite is entering or being consumed by the system. For all metabolites in which a source or sink may be present the exchange flux can operate in a bidirectional manner and is therefore unconstrained.
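The convention of representing each reversible internal reaction as two non-negative fluxes can be sketched as follows. This is only a minimal illustration: the function names `split_reversible` and `split_net_flux` and the three-metabolite toy network are hypothetical and are not taken from the paper.

```python
def split_reversible(columns, reversible):
    """Return stoichiometric columns in which every reversible reaction
    is replaced by a forward column and its negated (reverse) column."""
    out = []
    for col, is_rev in zip(columns, reversible):
        out.append(list(col))                  # forward direction
        if is_rev:
            out.append([-c for c in col])      # reverse direction
    return out


def split_net_flux(net):
    """Decompose a signed net flux into (forward, reverse), both >= 0."""
    return (net, 0.0) if net >= 0 else (0.0, -net)


# Hypothetical toy network: A -> B (irreversible), B <-> C (reversible).
# Rows are metabolites A, B, C; each inner list is one reaction column.
columns = [[-1, 1, 0], [0, -1, 1]]
expanded = split_reversible(columns, [False, True])
print(expanded)              # three columns: A->B, B->C, C->B
print(split_net_flux(-2.5))  # a net reverse flux maps to the reverse column
```

Because the two columns of a split reaction are exact negatives of each other, only their difference affects the mass balances, which is why the convention leaves the biological interpretation unchanged.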

The analysis of a metabolic system should begin with a study of its structural characteristics or invariant properties, those depending neither on the state of the environment nor on the internal state of the system, but only on its structure (Reder, 1988). The stoichiometry of a biochemical reaction network is the primary invariant property that describes the architecture and topology of the network. Stoichiometry refers to the molar ratios in which substrates are converted into products in a chemical reaction (e.g. glucokinase converts one mole of glucose and one mole of ATP into one mole of glucose-6-phosphate and one mole of ADP). These ratios remain constant under changing reaction conditions, which may serve to alter the kinetic parameters and rate of reaction as a function of time.

Dynamic mass balances can be written around every metabolite in the system, taking the form of the following equation in matrix notation, where $x$ denotes the concentration vector of all the metabolites, $S$ is the stoichiometric matrix and $v$ is the flux vector describing the activity of all the internal and exchange fluxes:

$$\frac{dx}{dt} = S \cdot v. \qquad (1)$$

The stoichiometric matrix $S$ is an $m \times n$ matrix where $m$ corresponds to the number of metabolites and $n$ is the total number of fluxes taking place in the network ($n = n_I + n_E$). The $S_{ij}$ element of the stoichiometric matrix corresponds to the stoichiometric coefficient of the reactant $i$ in the reaction denoted by $j$, and $v_j$ is the flux through this metabolic reaction. Thus, for the enzyme glucokinase, $S_{\text{glucose},\,\text{glucokinase}} = -1$. The vector $v$ then refers to the relative fluxes through the reactions in the metabolic network. In these differential equations, enzyme kinetics enters the equation when $v$ is represented as a rate law, typically a function of the concentration vector and a number of kinetic parameters. Equation (1) then states that the rate of change of each metabolite concentration equals the difference between the sum of all fluxes that produce the metabolite and the sum of all fluxes that consume it.
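To make the sign convention of $S$ concrete, the glucokinase example can be written out as a one-column stoichiometric matrix and inserted into eqn (1). The sketch below is illustrative only; the helper name `mass_balance` and the chosen flux value are assumptions.

```python
# Single-reaction system used for illustration: glucokinase,
# glucose + ATP -> glucose-6-phosphate (G6P) + ADP.
metabolites = ["glucose", "ATP", "G6P", "ADP"]
reactions = ["glucokinase"]

# S[i][j] is the stoichiometric coefficient of metabolite i in reaction j:
# negative for consumed substrates, positive for products.
S = [[-1], [-1], [1], [1]]


def mass_balance(S, v):
    """Evaluate dx/dt = S . v for a flux vector v, as in eqn (1)."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


dxdt = mass_balance(S, [2.0])   # hypothetical flux of 2 units
print(dict(zip(metabolites, dxdt)))
```

With a positive flux, glucose and ATP are drained at the same rate at which glucose-6-phosphate and ADP accumulate, which is the statement of eqn (1) for this single reaction.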

The pathway structure we seek to determine should also be an invariant property of the network along with stoichiometry. Thus, it is reasonable to eliminate the time derivative from eqn (1) by imposing a steady-state condition. Under steady-state conditions the time derivative in eqn (1) is set to zero, which yields a set of linear homogeneous equations [eqn (2)] from which it is possible to calculate flux values. This system of equations is typically underdetermined for metabolic systems as the number of reactions typically exceeds the number of participating metabolites, and as a result a corresponding null space (Nul $S$) can be described (Lay, 1997):

$$0 = S \cdot v. \qquad (2)$$

The null space corresponds to the set of all solutions ($v$) for eqn (2). It has been previously shown that a set of basis vectors can be selected to describe the null space of eqn (2), where each basis vector corresponds to a steady-state biochemical pathway (Fell, 1993; Schilling & Palsson, 1998). However, to completely describe the system we need to include the constraints on internal and exchange fluxes. The constraints on internal fluxes are rather straightforward as all fluxes must be non-negative, yielding:

$$v_i \ge 0, \quad \forall i. \qquad (3)$$
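The interplay between the null space of eqn (2) and the sign constraints of eqn (3) can be sketched on a small example. The code below computes an exact null-space basis by Gauss-Jordan elimination; the function name `null_space` and the toy network (A → B, A → C, with one exchange flux per metabolite) are hypothetical choices made for illustration and are not the network of Fig. 1.

```python
from fractions import Fraction


def null_space(S):
    """Return a basis of {v : S v = 0} using exact Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in S]
    n = len(rows[0])
    pivots = []                 # pivot column of each reduced row, in order
    r = 0
    for c in range(n):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue            # no pivot here: column c is free
        rows[r], rows[p] = rows[p], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        vec = [Fraction(0)] * n
        vec[free] = Fraction(1)
        for row, pc in zip(rows, pivots):
            vec[pc] = -row[free]
        basis.append(vec)
    return basis


# Rows: metabolites A, B, C. Columns: v1 (A->B), v2 (A->C), b_A, b_B, b_C.
# Exchange fluxes are positive when the metabolite leaves the system.
S = [[-1, -1, -1, 0, 0],
     [1, 0, 0, -1, 0],
     [0, 1, 0, 0, -1]]
n_internal = 2

basis = null_space(S)
for vec in basis:
    print([int(x) for x in vec])

# A linear combination of basis vectors still solves eqn (2) but may
# violate eqn (3): here the internal flux v2 becomes negative.
combo = [a - b for a, b in zip(basis[0], basis[1])]
print(all(x >= 0 for x in combo[:n_internal]))
```

The last check illustrates why a null-space basis alone does not describe the admissible flux distributions: arbitrary linear combinations leave the region allowed by the inequality constraints.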

The constraints on the exchange fluxes depend on the status of a determined source or sink on the metabolite, or similarly on the input and output status of the metabolite. These constraints can be expressed as shown in eqn (4), where $\alpha_j$ and $\beta_j$ are either zero or negative and positive infinity, respectively, based on the direction of the exchange flux. Under the existence of a source (input) only, $\alpha_j$ is set to negative infinity and $\beta_j$ is set to zero, whereas if only a sink (output) exists on the metabolite, $\alpha_j$ is set to zero and $\beta_j$ is set to positive infinity. If both a source and a sink are present for the metabolite then the exchange flux is bidirectional, with $\alpha_j$ set to negative infinity and $\beta_j$ set to positive infinity, leaving the exchange flux unconstrained.

$$\alpha_j \le b_j \le \beta_j \qquad (4)$$
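The case distinction behind eqn (4) can be summarized in a few lines of code. This is a sketch under stated assumptions: the function names are hypothetical, and the case with neither source nor sink (bounds of zero on both sides) is included here as the natural closed-system limit.

```python
import math


def exchange_bounds(has_source, has_sink):
    """Return (alpha_j, beta_j) for an exchange flux b_j such that
    alpha_j <= b_j <= beta_j, following the sign convention in which
    entering metabolites give negative and exiting ones positive b_j."""
    alpha = -math.inf if has_source else 0.0   # a source allows b_j < 0
    beta = math.inf if has_sink else 0.0       # a sink allows b_j > 0
    return alpha, beta


def within_bounds(b, bounds):
    """Check a single exchange flux value against its (alpha, beta) pair."""
    alpha, beta = bounds
    return alpha <= b <= beta


print(exchange_bounds(True, False))    # source only
print(exchange_bounds(False, True))    # sink only
print(exchange_bounds(True, True))     # unconstrained
print(within_bounds(-3.0, exchange_bounds(False, True)))  # uptake via a sink-only flux
```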

In general, we can make the distinction between currency metabolites (i.e. cofactors such as ATP, NADH) involved in energy and redox levels and the rest of the metabolites in the network (primary metabolites). For a pathway analysis the exchange fluxes for these currency metabolites are typically unconstrained. Under conditions in which the system is to be closed for the exchange of these metabolites the corresponding exchange fluxes are constrained to zero. This distinction between metabolites will be of assistance in classifying metabolic pathways. Additionally, we typically structure the stoichiometric matrix so that the first series of columns represent the internal fluxes and the remaining columns represent the exchange fluxes, with the primary exchange fluxes followed by the currency exchange fluxes. In constructing the stoichiometric matrix in this way, the vector $v$ is composed first of entries $v_1$–$v_{n_I}$ followed by $b_1$–$b_{n_E}$. This construction has the advantage that the overall systemic balance equation for a given pathway or flux distribution is simply obtained from the values of the exchange fluxes constituting the lower portion of the flux vector (see Fig. 2 and the discussion below).

These general concepts can be illustrated by a specific example. Figure 1(a) depicts a biochemical reaction network consisting of five metabolites and four internal reactions interconverting these metabolites. Two of these reactions are reversible, creating a forward flux and a reverse flux. Together there are a total of six internal fluxes. Only four of the five metabolites have sources or sinks and are subsequently allowed to cross over the system boundary, creating four exchange fluxes that are all unconstrained.

FIG. 1. (a) Chemical reaction network consisting of the five metabolites (A–E) and four internal reactions (two reversible) creating six internal fluxes (v) along with four allowed exchange fluxes (b), all indicated by the arrows. (b) The translation of the network into its mathematical representation. The stoichiometric matrix (S) is expressed in terms of the series of linear homogeneous equations derived from the conservation of mass in eqn (2). All internal fluxes are constrained to be nonnegative as in eqn (3) and the exchange fluxes are all unconstrained and described as in eqn (4).

Figure 1(b) shows the complete mathematical translation of the system into eqns (2)–(4). It should be noted that such a translation and the determination of a genome-specific stoichiometric matrix representing an entire cellular metabolic network can be directly determined from an organism's metabolic genotype and biochemical information based on the reactions and processes associated with each of the respective gene products of the genotype (Edwards & Palsson, 1998; Schilling et al., 1999a).

The Steady-state Flux Cone and Metabolic Capabilities

Together eqns (2)–(4) describe a metabolic system under steady-state conditions as a system of linear equalities/inequalities. This description captures the constraints that are placed on the network by the stoichiometry and thermodynamics of the reactions, as well as the constraints on input and output of metabolites from the system, which typically can be experimentally determined. The presence of linear inequalities limits the use of traditional concepts of linear algebra, and necessitates the use of convex analysis (Rockafellar, 1970), which is capable of treating systems of linear inequalities. The set of solutions to any system of linear inequalities is a convex set, and for our particular system this is also true as any linear equality can be written as two inequalities. This convex solution corresponds geometrically to a convex polyhedral cone in n-dimensional space ($\mathbb{R}^n$) emanating from the origin for all metabolic systems modeled as described herein. We refer to this convex cone generally as the flux space and more specifically as the steady-state flux cone ($C$). Within this flux cone lie all of the possible steady-state solutions and hence the flux distributions under which the system can operate. Since every solution or operating mode of the system is contained within the flux space, it logically follows that the entire flux space represents the capabilities of the given metabolic network. Thus, the flux space clearly defines what a network can and cannot do; what building blocks can be manufactured; how efficient the energy extraction and conversion of carbohydrates into ATP can be for a given substrate; where are the critical links in the network; and so on. The answer to these basic questions and many others related to the structural and functional capabilities of the network are found within the flux cone.

As the answers to our questions lie within the flux cone we must then develop a way to describe and interpret any location within this space. In other words, we must now find the best way to navigate through this solution space. We are mainly interested in determining the characteristics of this space and interpreting it from an overall metabolic perspective. This objective can be achieved by either interpreting the functioning of the network from the traditional reaction-based perspective as described by the flux vector $v$ or from a pathway-oriented perspective. We now discuss the development of such a pathway-oriented perspective.

Convex Analysis and Metabolic Pathways

The study of convex polyhedral cones, which forms the underlying mathematical structure for metabolic pathway analysis, has several conceptual similarities with linear algebra. For convex cones, one studies extreme rays (or generating vectors) that correspond to edges of the cone being half-lines emanating from the origin. These extreme rays are said to generate the cone and cannot be decomposed into a non-trivial convex combination of any other vectors residing in the flux cone. For this reason they are referred to as being conically or systematically independent. This notion of a minimal generating set, which is properly referred to as forming the conical hull of the cone, roughly corresponds to the concept of a basis in linear algebra. However, this generating set is typically unique, providing a clear advantage to using convex analysis rather than just studying the underlying linear algebra coming from the subspace determined by the null space of $S$. In fact, the set of conditions for describing convex cones is nearly identical to the conditions used to define a vector space in linear algebra with the exception that all scalars must be non-negative for convex cones (Hadley, 1961).

Here in the context of metabolic systems we will use the term extreme pathways to denote the extreme rays of a polyhedral cone, as each ray corresponds to a particular pathway or active set of fluxes which satisfies the steady-state mass balance constraints and inequalities posed in eqns (2)–(4). Extreme pathways will be denoted by the vector $p_i$ and the total number of extreme pathways needed to generate the flux cone for a system will be denoted by $k$. Every point within this cone ($C$) can be written as a non-negative linear combination of the extreme pathways as

$$C = \left\{ v : v = \sum_{i=1}^{k} w_i p_i, \;\; w_i \ge 0 \;\; \forall i \right\}. \qquad (5)$$
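Equation (5) can be checked numerically on a small example. The sketch below assumes a hypothetical toy network (A → B and A → C, with unconstrained exchange fluxes on A, B and C) whose flux cone has two edges; the network, the vectors `p1` and `p2`, and the helper `combine` are illustrative assumptions rather than the system of Fig. 1.

```python
def combine(pathways, weights):
    """Return v = sum_i w_i * p_i, requiring every weight w_i >= 0 (eqn 5)."""
    if any(w < 0 for w in weights):
        raise ValueError("pathway weights must be non-negative")
    n = len(pathways[0])
    return [sum(w * p[k] for w, p in zip(weights, pathways)) for k in range(n)]


# Rows: metabolites A, B, C. Columns: v1 (A->B), v2 (A->C), b_A, b_B, b_C.
S = [[-1, -1, -1, 0, 0],
     [1, 0, 0, -1, 0],
     [0, 1, 0, 0, -1]]

# The two edges of this toy cone: A is taken up and leaves as B (p1) or C (p2).
p1 = [1, 0, -1, 1, 0]
p2 = [0, 1, -1, 0, 1]

v = combine([p1, p2], [2, 3])          # weights w = (2, 3)
residual = [sum(s * x for s, x in zip(row, v)) for row in S]
print(v)          # resulting flux distribution
print(residual)   # steady-state check S . v
```

The exchange entries of `v` read off the net conversion directly: five units of A are consumed while two units of B and three units of C are produced.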

Thus, the set of extreme pathways is analogous to a basis/coordinate system that can be used to describe a position in space. These pathways are said to conically span or generate the set of all pathways, as any pathway or distribution of fluxes can be written as a non-negative linear combination of the $p_i$'s. (Notice the similarity with the concept of a spanning or generating set in linear algebra.) The pathway vector $w$ corresponds to the coordinate vector relative to the set of extreme pathways. It provides the weight given to each pathway in a particular flux distribution ($v$). If we consider the matrix $P$ whose columns are composed of the set of extreme pathways, this matrix conceptually transforms the flux vector $v$ into the pathway vector $w$ (through $v = Pw$, which restates eqn (5)), providing a pathway-based perspective of the functioning of the network as opposed to a reaction-based perspective.

The flux cone in $\mathbb{R}^n$ can be geometrically thought of as the intersection between the null space of eqn (2) and the vector space described by the inequalities of eqns (3) and (4). The vector space described by eqn (3) is the positive orthant in $n_I$-dimensional space, while the vector space described by eqn (4) is a region of $n_E$-dimensional space. Thus, the flux space can be described as the following convex subset of $\mathbb{R}^n$:

$$C = \left( \mathbb{R}^{n_I}_{+} \times \mathbb{R}^{n_E} \right) \cap \left( \mathrm{Nul}\, S \right). \qquad (6)$$

In other words, the flux cone contains all points of the null space whose coordinates are non-negative, with the exception of the exchange fluxes that are constrained to be negative or those that are unconstrained.

To illustrate these concepts we return again to the system described in Fig. 1. Figure 2(a) graphically illustrates five of the seven extreme pathways for the reaction network described in Fig. 1, while Fig. 2(b) provides the matrix $P$ whose seven columns are the extreme pathways. The algorithm that is used to calculate the extreme pathways from eqns (2)–(4) is presented in Appendix B. Using these seven pathways all possible flux distributions can be decomposed as shown in eqn (5). Later we will discuss a pathway classification scheme.

FIG. 2. (a) The set of extreme pathways for the network described in Fig. 1. Five of the seven extreme pathways are illustrated corresponding to the edges of the flux cone of admissible steady-state flux vectors. Only Type I pathways are illustrated. The two additional pathways not depicted (Type III) correspond to the cycling of the two reversible reactions ($v_2/v_3$ and $v_4/v_5$). (b) All seven pathways are presented as columns in the pathway matrix P. The net systemic balance equations for each pathway are obtained from the value of the exchange fluxes below the dashed line. (c) An example of a flux distribution (v) is given to illustrate the test for a unique decomposition of a flux vector. The number of subset pathways of v equals the dimensions of the null space for the modified stoichiometric matrix creating a unique decomposition.

Unique Extreme Pathways

The first issue that is of concern with any set of pathways used to describe a metabolic network is the uniqueness of the set. The set of generating vectors for a polyhedral cone is certainly unique when the cone lies in the positive orthant, as is the case when all fluxes are constrained to be positive, but what happens when some of these constraints are relaxed on the exchange fluxes and the flux cone no longer lies entirely in the positive orthant? In this case, the set of extreme pathways is still unique and this is stated in the following theorem.

Theorem. A convex flux cone determined by eqns (2)–(4) has a set of systematically independent generating vectors. Furthermore, these generating vectors (extremal rays) are unique up to multiplication by a positive scalar. These generating vectors will be called extreme pathways.

The proof of this theorem is given in Appendix A. A simple way to understand and test for the uniqueness property under any set of flux conventions is to set all unidirectional fluxes (constrained to be non-negative) equal to zero and search for a non-trivial solution to eqn (2). If a solution exists then this confirms the existence of a real line in the solution space described by eqn (6), resulting in a loss of uniqueness. If we consider all fluxes to be constrained as non-negative then there will obviously be no solution other than the trivial solution when these fluxes are set to zero, indicating that the set of edges of the convex flux cone is unique. Uniqueness can be lost when both reversible internal and exchange fluxes are described as bidirectional fluxes with no constraints. As an example, if we lumped the forward and reverse fluxes $v_2$/$v_3$ and $v_4$/$v_5$ together into two unconstrained reversible fluxes, the set of extreme pathways would no longer be unique, as a non-trivial solution would exist when the unidirectional fluxes $v_1$ and $v_6$ are removed, thus confirming the presence of a real line in the solution space. To preserve uniqueness we can easily adopt the convention of decomposing the reversible internal fluxes into two unidirectional fluxes constrained to be non-negative while allowing exchange fluxes to be free of constraints under conditions where a source and sink are accounted for. If all unidirectional fluxes are set to zero there will be no solution to eqn (2) other than the trivial solution, as all internal fluxes will be set to zero and the exchange fluxes must then equal zero since only one exchange flux is present per metabolite. Therefore, the set of extreme pathways is observed to be unique under the adopted conventions.
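The zero-out test described above amounts to a rank check: keep only the columns of the stoichiometric matrix that belong to unconstrained fluxes and ask whether those columns are linearly dependent. The following is a minimal sketch of that check, not the algorithm of Appendix B. The two matrices are hypothetical two-metabolite toy networks (not the network of Fig. 1), the helper names `rank` and `has_line` are introduced here for illustration, and exchange fluxes are assumed positive when material leaves the system.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a small matrix by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def has_line(S, free_cols):
    """True if S v = 0 has a non-trivial solution that uses only the
    unconstrained fluxes, i.e. the solution space contains a line."""
    if not free_cols:
        return False
    sub = [[row[j] for j in free_cols] for row in S]
    return rank(sub) < len(free_cols)


# Hypothetical network 1: A <-> B split into two non-negative fluxes.
# Columns: forward (A -> B), reverse (B -> A), exchange of A, exchange of B.
S_split = [[-1, 1, -1, 0],
           [1, -1, 0, -1]]
# Only the two exchange fluxes (columns 2 and 3) are unconstrained.
print(has_line(S_split, [2, 3]))

# Hypothetical network 2: the same reaction lumped into one free flux.
# Columns: net flux (A <-> B), exchange of A, exchange of B; all unconstrained.
S_lumped = [[-1, -1, 0],
            [1, 0, -1]]
print(has_line(S_lumped, [0, 1, 2]))
```

In the split description the unconstrained columns are independent, so no line exists; in the lumped description they are dependent, which mirrors the loss of uniqueness discussed in the text.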

The conclusion is that under the conventions described herein a unique set of pathways can be described which provide the simplest unique view of the pathway structure of the network, offering an additional perspective from which to interpret metabolic function.

Classification of Extreme Pathways

In determining the set of extreme pathways for a system there are, in general, two distinct classes of pathways. There are pathways for which the coefficients of the exchange fluxes are all equal to zero, and there are pathways in which non-zero values exist for a set of exchange fluxes. Furthermore, if a distinction is made between the currency metabolites and the primary metabolites of the system, a third class of pathways can be delineated as those pathways for which all of the exchange fluxes for the primary metabolites equal zero while non-zero values exist for the exchange fluxes of some of the currency metabolites.

The major pathways that are of functional interest are those for which the exchange fluxes of the primary metabolites are active. These pathways are the major contributors to the decomposition of almost any steady-state flux distribution, and will be classified as Type I pathways. Type II pathways will denote those in which only the exchange fluxes for the currency metabolites are active. These pathways correspond to true futile cycles existing within the network which serve to dissipate energy or reductive power. Perhaps the best example of this class would be the well-known futile cycle that exists in glycolysis between the activity of phosphofructokinase and fructose-1,6-bisphosphatase, which dissipates energy by converting ATP into ADP and inorganic phosphate in equal molar ratios. It should be noted that the term futile might be misleading in the rare case in which an internal energy-producing or redox-producing cycle may exist.

FIG. 3. A matrix perspective for the classification scheme of extreme pathways. The matrix depicted is the pathway matrix P where each column represents an extreme pathway. [For an example see Fig. 2(b).] For Type I pathways the only requirement is for one of the primary exchange fluxes to be active. In Type II pathways only the currency exchange fluxes can be active, and in Type III pathways none of the exchange fluxes are active.

Pathways in which all of the exchange fluxes are inactive correspond to internal cycles within the network that have no net overall effect on the functional capabilities of the network. We will classify these pathways as Type III pathways. Usually, these pathways will correspond to the cycling of two fluxes resulting from the decomposition of reversible reactions into two unidirectional fluxes for the forward and reverse reactions. On occasion a cycle comprised of multiple (>2) active internal fluxes may exist, but it will again have no impact on the capabilities of the network. While these pathways may not appear to provide a path through the system, they still constitute an edge of the flux cone. When interested in providing a decomposition of a flux distribution, these fluxes will virtually never appear in the decomposition as they do not affect the overall productive capabilities. However, these pathways cannot be completely ignored, as there may be dynamic consequences that are yet to be investigated, such as their ability to dynamically regulate metabolite or metabolite pool concentrations. A matrix representation of these three different classifications of pathways is shown in Fig. 3. This classification scheme can be extended to classify any pathway or flux distribution in addition to the extreme pathways of a network.
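The three classes can be read directly off the exchange-flux entries of a pathway vector. The sketch below is one possible way to express that rule in Python. The function name `classify`, the index layout and the three example vectors are hypothetical; they are chosen only to show one pathway of each type and are not taken from Fig. 2(b).

```python
def classify(pathway, primary_exchange, currency_exchange, tol=1e-12):
    """Return 'I', 'II' or 'III' for a pathway or any flux distribution.

    primary_exchange / currency_exchange are the positions of the
    exchange fluxes of primary and currency metabolites in the vector.
    """
    primary_active = any(abs(pathway[i]) > tol for i in primary_exchange)
    currency_active = any(abs(pathway[i]) > tol for i in currency_exchange)
    if primary_active:
        return "I"    # at least one primary exchange flux is active
    if currency_active:
        return "II"   # only currency exchange fluxes are active
    return "III"      # no exchange flux is active: internal cycle


# Hypothetical layout: positions 0-4 are internal fluxes, 5-6 are
# primary exchange fluxes, 7 is a currency (e.g. ATP) exchange flux.
PRIMARY = [5, 6]
CURRENCY = [7]

through_path = [1, 0, 0, 0, 0, -1, 1, 0]     # substrate in, product out
futile_cycle = [0, 1, 1, 0, 0, 0, 0, -1]     # net effect: currency drain only
internal_cycle = [0, 0, 0, 1, 1, 0, 0, 0]    # forward/reverse pair cycling

for name, p in [("through_path", through_path),
                ("futile_cycle", futile_cycle),
                ("internal_cycle", internal_cycle)]:
    print(name, "-> Type", classify(p, PRIMARY, CURRENCY))
```

Because the rule only inspects exchange entries, the same function applies unchanged to an arbitrary flux distribution, in line with the closing remark of the paragraph above.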

For the reaction network of Fig. 1, the first five columns of the matrix P in Fig. 2(b) correspond to Type I pathways and are all illustrated in Fig. 2(a), while the last two columns of P correspond to Type III pathways, as all of the exchange fluxes are inactive. As all of the metabolites were considered primary, there does not exist any Type II pathway. With these pathways defined and classified, how can they be used to analyse and interpret systemic functions and flux distributions?

Unique Representation of Steady States

Every flux distribution, $v$, can be written as a non-negative linear combination of the extreme pathways, as shown in eqn (5). In linear algebra such a decomposition as a sum of basis vectors is unique even though the basis itself is non-unique. However, some flux distributions can be written as a sum of extreme pathways in many ways. Therefore, the decomposition of a steady-state flux vector ($v$) into the corresponding extreme pathways ($p$) is not necessarily unique. Only a basis for a solution space guarantees a unique representation of every point in the solution space. For the set of extreme pathways to form a basis, the number of pathways must equal the dimension of the null space. The dimension of the null space equals the number of free variables in the original set of linear equations forming eqn (2), that is, the number of fluxes ($n$) minus the rank of S ($r$). The relation termed the Rank Theorem gives the dimension of the null space ($d$):

$$d(S) = \dim(\mathrm{Nul}\,S) = n - r. \tag{7}$$

Thus, for a full rank matrix the dimension of the null space will be equal to the difference between the number of fluxes and metabolites ($d = n - m$), as $r$ will equal $m$. If the number of extreme pathways exceeds the dimension of the null space ($k > d$), then the pathways do not uniquely describe every point in the solution cone. In this case, one can select a subset of the extreme pathways which are linearly independent and equal in number to the dimension of the null space as a basis if desired. This corresponds to the edges of a simplex of the cone. However, this selection is again non-unique, as often there are numerous simplexes in which a solution may lie. The edges of the cone therefore can be thought of as providing a limited set of vectors from which to select for the construction of a basis if desired.

If we again consider the reaction network discussed in Fig. 1, the dimension of the null space is 5 and the number of edges of the flux cone is 7 ($d = 5$, $k = 7$). Therefore, the entire flux cone cannot be uniquely decomposed into the extreme pathways. To have the number of pathways equal to the dimension of the null space is uncommon in larger networks due to the high degree of interconnectivity amongst metabolites and reactions.

Even though in most cases the entire cone cannot be uniquely described by the set of extreme pathways, there are certain regions of the solution cone in which a solution is uniquely described by the pathways. To determine if a decomposition is unique it is necessary to first determine the number of extreme pathways that are subsets of the particular flux distribution of interest, $v$. For an extreme pathway to be a subset of a flux distribution it must not contain an active internal flux that is inactive in the flux distribution. Additionally, the extreme pathway must not have an active exchange flux that is inactive in the flux distribution and is also constrained to be either positive or negative. As an example, consider the flux distribution ($v^T = [4, 2, 0, 1, 0, 1, -4, 2, 1, 1]$) shown in Fig. 2(c). The only inactive fluxes are $v_3$ and $v_5$. Thus any pathway that has either of these internal fluxes active is not a subset pathway. This eliminates extreme pathways $p_4$, $p_5$, $p_6$, and $p_7$, leaving only the first three extreme pathways as subsets of $v$. The next step involves calculating the dimension of the null space for the modified stoichiometric matrix ($S_{\mathrm{mod}}$) wherein all columns corresponding to inactive internal and exchange fluxes in $v$ are removed. In our example, this would eliminate the columns corresponding to $v_3$ and $v_5$. The dimension of the corresponding null space is 3, which is equal to the number of subset pathways, thus ensuring a unique decomposition. If the number of extreme pathways that are subsets of $v$ is greater than the dimension of the null space, $d(S_{\mathrm{mod}})$, then the decomposition is non-unique. Otherwise the decomposition is unique.
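The two-step test above (count the subset pathways, then compare with the null-space dimension of the modified matrix) can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the network is a hypothetical hub (A -> B, C -> B, B -> D, B -> E with free exchange fluxes on A, C, D and E), not the network of Fig. 1, and the names `rank`, `subset_pathways` and `decomposition_is_unique` are introduced here. Since all exchange fluxes in this toy network are unconstrained, the additional exchange-flux condition from the text never applies and is omitted.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a small matrix by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


# Columns: v1 (A->B), v2 (C->B), v3 (B->D), v4 (B->E), bA, bC, bD, bE.
# Rows: A, B, C, D, E. Exchange fluxes are positive when leaving the system.
S = [[-1, 0, 0, 0, -1, 0, 0, 0],
     [1, 1, -1, -1, 0, 0, 0, 0],
     [0, -1, 0, 0, 0, -1, 0, 0],
     [0, 0, 1, 0, 0, 0, -1, 0],
     [0, 0, 0, 1, 0, 0, 0, -1]]
INTERNAL = range(4)

# The four extreme pathways of this toy network (A->D, A->E, C->D, C->E).
P = [[1, 0, 1, 0, -1, 0, 1, 0],
     [1, 0, 0, 1, -1, 0, 0, 1],
     [0, 1, 1, 0, 0, -1, 1, 0],
     [0, 1, 0, 1, 0, -1, 0, 1]]


def subset_pathways(v, pathways, internal):
    """Pathways with no active internal flux that is inactive in v."""
    return [p for p in pathways
            if all(p[j] == 0 or v[j] != 0 for j in internal)]


def decomposition_is_unique(v, S, pathways, internal):
    """Compare the number of subset pathways with d(S_mod) = n_mod - rank."""
    active = [j for j in range(len(v)) if v[j] != 0]
    S_mod = [[row[j] for j in active] for row in S]
    d_mod = len(active) - rank(S_mod)
    return len(subset_pathways(v, pathways, internal)) <= d_mod


v_a = [2, 1, 3, 0, -2, -1, 3, 0]   # v4 and bE inactive
v_b = [1, 1, 1, 1, -1, -1, 1, 1]   # every flux active
print(decomposition_is_unique(v_a, S, P, INTERNAL))
print(decomposition_is_unique(v_b, S, P, INTERNAL))
```

For `v_a` two subset pathways remain and the modified null space has dimension 2, so the decomposition is unique; for `v_b` all four pathways are subsets while the null space has dimension 3, so it is not.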

Systemic Independence

A set of pathways $\{p_1, \ldots, p_k\}$ is said to be conically or systemically independent if no pathway can be written as a non-trivial non-negative linear combination of the other pathways. Notice that the only difference between this definition and linear independence is that the coefficients of the linear combination cannot be negative. An important implication of this definition is that it is possible for a set of pathways to be systemically independent while simultaneously being a linearly dependent set. When investigating the functional aspects of a metabolic system, the property of systemic independence should take priority over linear independence as it is a unique property of any system and its structural capabilities.
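As a small worked illustration of this distinction (a hypothetical example, not the network of Fig. 1), consider a hub metabolite with two producing fluxes $v_1$, $v_2$ and two consuming fluxes $v_3$, $v_4$, all constrained to be non-negative. Writing only the internal coordinates $(v_1, v_2, v_3, v_4)$ and omitting the exchange fluxes for brevity, the four input-to-output routes satisfy:

```latex
\begin{align*}
p_1 &= (1,0,1,0), \quad p_2 = (1,0,0,1), \quad p_3 = (0,1,1,0), \quad p_4 = (0,1,0,1), \\
p_1 + p_4 &= p_2 + p_3 = (1,1,1,1)
  && \text{(linearly dependent)}, \\
p_1 &= a\,p_2 + b\,p_3 + c\,p_4,\ a,b,c \ge 0
  \;\Rightarrow\; b + c = 0 \text{ and } a + c = 0
  \;\Rightarrow\; a = b = c = 0
  && \text{(contradiction, since } p_1 \ne 0\text{)}.
\end{align*}
```

The two conditions in the last line come from the $v_2$ and $v_4$ entries, which are zero in $p_1$. The same argument applies to each of the other three vectors, so the set is systemically independent although it is linearly dependent.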

The set of extreme pathways for a system is not linearly independent since typically the number of pathways that form the edges of the steady-state flux cone is greater than the dimension of the null space. However, the set of extreme pathways is systemically independent by definition of being the edges of the flux cone. The term genetic independence was first introduced to describe a distinction between sets of metabolic pathways which were linearly dependent but defined as independent genotypes (Seressiotis & Bailey, 1988). The idea reflected the notion that certain sets of pathways share a characteristic of systemic independence even though they are linearly dependent. The use of the term genetically independent may potentially be misleading, as there is not necessarily a one-to-one correspondence between reactions and gene products. There are a number of examples of gene products capable of carrying out a number of related but distinct biochemical transformations.


What are the implications of pathways being systemically independent? From a control point of view, it can be envisioned that if a cell could control the activity of each one of the extreme pathways then it would be capable of reaching each point within the flux cone. In other words, it would be maximizing its metabolic capabilities. This is most likely not the case physiologically, as a distinct control switch is most likely not devoted to each pathway even in a non-obvious manner. The control scheme that the cell chooses to implement is much more complicated, as the regulation of certain pathways will be coupled and in fact there will be certain pathways that are always inactive or even unregulated due to the regulatory scheme implemented. Certain combinations of pathways may be unfeasible. The challenging task that now confronts us is to understand how the pathways of a system are regulated.

The real question we are seeking to answer is whether we can gain insight into the regulatory logic implemented by the cell by focusing on its pathway structure. Is the set of extreme pathways what the cell is truly aimed at controlling and regulating? And if so, how are they regulated? Rather than predicting a control scheme that could be used for a metabolic network, it will perhaps be most beneficial to look at the actual known regulatory scheme of certain well-understood networks and make some rational connections to the theoretical pathway structure which can be determined. It seems logical to think that these pathways are the ultimate objective of cellular regulation. Thus, while regulation is occurring at the protein and genetic levels through enzymatic activation/inhibition, transcriptional control, etc., the ultimate cellular function of these regulatory mechanisms is the control of metabolic pathways in an indirect and non-intuitive manner.

Discussion

The definition and conceptualization of biochemical pathways in the context of a whole cell has emerged as an important issue now that genomics is leading to the complete definition of genotypes in an increasing number of organisms. Here we have introduced a theoretical framework for the identification of the simplest unique set of biochemical pathways for a metabolic system, based on methods of convex analysis and the laws of material conservation. The unique extreme pathways are the systemically independent pathways of the network and can be used to analyse and interpret the functional capabilities and flux distributions of cellular metabolism from a pathway-based perspective. The application of this theoretical approach to the problem of analysing a complete cellular metabolic network derived from its metabolic genotype will be discussed in a paper to follow (Schilling & Palsson, 2000). The detailed approach discussed herein represents a continuing improvement in theoretical strategies for the study of metabolic pathways.

The theory behind most recent work on pathway analysis stems from linear algebra and more specifically convex analysis, two related branches of mathematics (for a review see Schilling et al., 1999b). The first natural approach to study metabolic networks is through the use of linear algebra to explore the null space of the series of linear homogeneous equations resulting from the conservation of mass as in eqn (2). As previously mentioned, the null space can be spanned by a set of linearly independent basis vectors which correspond to biochemical pathways that can be used to interpret the functional characteristics of the system (Fell, 1993; Schilling & Palsson, 1998). While basis vectors provide a unique representation of every solution to the system, the set of basis vectors that span the null space is non-unique, making their use much less effective. To overcome this obstacle of non-uniqueness we have turned to the mathematics associated with convex spaces.

Convex analysis was first applied to inorganic chemical systems in the detailed theory of stoichiometric network analysis (SNA) (Clarke, 1980, 1981, 1988). This theory was developed for the mathematical analysis of stability in complex reaction networks. SNA utilizes convex analysis to determine a set of "extreme currents" which serve as a framework for a coordinate transformation used to determine the stability of the network. These "extreme currents" also correspond to edges of a flux cone; however, in SNA all fluxes are constrained to be non-negative. Thus, exchange fluxes are decomposed into an independent input flux and an output flux. The effects of this can be seen in Fig. 4 for the simple reaction system representing the pyruvate/phosphoenolpyruvate/oxaloacetate cycle. When all fluxes are considered to be non-negative there are ten edges to the flux cone, seven of which are illustrated in Fig. 4(a). From a systemic point of view a number of the pathways shown are combinations of other pathways; however, mathematically this is not a true statement, for the exclusive reason that the pseudoreactions (exchange fluxes) are decomposed into forward and reverse reactions. In the approach described herein these pseudoreactions are effectively merged into a bidirectional unconstrained flux, creating a flux cone that contains only three edges/pathways, shown in Fig. 4(b).

FIG. 4. Scheme of the PEP/PYR/OAA cycle. Abbreviations: PYR, pyruvate; PEP, phosphoenolpyruvate; OAA, oxaloacetate. (a) Seven extreme currents or pathways of the system considering all fluxes to be unidirectional including exchange fluxes. Omitted are the three pathways each of which corresponds to the cycling of the input and output exchange fluxes for each of the metabolites. (b) Three extreme pathways for the system when considering exchange fluxes to be unconstrained or bidirectional. (c) Seven elementary modes for the system when considering exchange fluxes to be unconstrained or bidirectional.

Allowing exchange fluxes to become bidirectional has the general effect of reducing the dimension of the null space ($d$) in addition to reducing the number of extreme pathways or edges of the flux cone ($k$) to an even greater extent. The quantity ($k - d$) therefore decreases. This serves to reduce the degrees of freedom that can be used to select a set of pathways to form a basis. So, while in most cases we do not arrive at a unique basis, we can significantly limit the number of different bases to choose from with this approach, bringing us as close to a unique basis as possible if one is seeking to use such a basis to interpret metabolic function.

This appears to be the more reasonable approach, as it provides a much cleaner view of the pathway structure. Additionally, the pseudoreactions should not be influencing the pathway structure to such an extent. The ramifications of changing the conventions used to describe a reaction network on the subsequent operations of SNA and pathway stability are a topic of future investigation. SNA has been most notably used to establish a categorization of oscillatory reactions (Schreiber et al., 1996), the stoichiometric connectivity of chemical species essential for the oscillations (Eiswirth et al., 1991), and the quantitative determination of extreme currents active at stationary states in a model of the chloride–iodide reaction at various external constraints (Strasser et al., 1993).

FIG. 5. Illustration of the set of elementary modes corresponding to the reaction network of Fig. 1. Seven elementary modes are shown. All of the elementary modes correspond to the extreme pathways of the network with the exception of the two pathways indicated by the dashed arrows. The upper mode corresponds to the combination of extreme pathways $p_1$ and $p_3$ while the lower mode is the combination of extreme pathways $p_1$ and $p_2$. Two additional modes exist corresponding to the cycling of the two reversible reactions ($v_2$/$v_3$ and $v_4$/$v_5$).

The most recent application of convex analysis for the study of metabolic pathways has been in the development of the concepts of elementary flux modes of a system (Schuster & Hilgetag, 1994; Schuster et al., 1999). The concepts of elementary modes were most notably utilized to guide the development and engineering of an Escherichia coli strain that successfully channeled carbohydrates down the pathways for aromatic amino acid biosynthesis at theoretical yields (Liao et al., 1996). Elementary modes have been defined as the minimal set of enzymes that could operate at steady state with all irreversible reactions proceeding in the appropriate directions. Rule-based algorithms utilizing the principles of convex analysis have been previously described for determining the set of elementary modes for situations in which reversible reactions are modeled as bidirectional fluxes (Schuster et al., 1996). While the conventions for describing the metabolic system are slightly different from those discussed here, the set of elementary modes has been shown to be unique; however, in the presence of reversible reactions there are often more elementary modes than are needed to span the flux cone. At times this situation will make the decomposition of a flux vector into elementary modes non-unique.

The elementary modes of the system described in Fig. 4(a) with exchange fluxes decomposed are identical to the edges of the flux cone due to the absence of any bidirectional fluxes, and correspond exactly to the extreme currents or pathways of the network as determined using SNA. Using the conventions adopted herein for describing the network as shown in Fig. 4(b), the number of elementary modes is greater than the number of extreme pathways, indicating that a number of these elementary modes lie within the interior of the flux cone generated by the extreme pathways and are positive combinations of the extreme pathways. The elementary modes for the network are shown in Fig. 4(c). These modes all correspond to the same pattern of pathways shown in Fig. 4(a). The details of determining the elementary modes for this cycle have been previously discussed (Schuster & Hilgetag, 1994). Here the dimension of the null space is 3 and the number of extreme pathways is 3; however, the number of elementary modes is 7. Three of these modes are the same as the extreme pathways while the remaining four lie on a face of the flux cone or within the interior of the flux cone.

Figure 5 illustrates the elementary modes for the reaction network shown in Fig. 1. For both cases the dimension of the null space is 5. All of the extreme pathways are identical to the elementary modes, with the two additional elementary modes being the pathways leading from metabolites A to D and A to E. These two additional modes are both positive linear combinations of two extreme pathways and thus lie on the interior of the cone or on a face of the cone in this case. This creates a situation in which there is a redundancy in the pathway structure, often resulting in a non-unique decomposition of a steady-state flux distribution.

With respect to SNA and the study of elementary modes, the work presented here can be seen as a sort of hybrid approach that builds upon many of the concepts of these two similar approaches, looking forward to the use of pathway analysis to study metabolic function in addition to structural aspects of metabolic networks. In studying the extreme pathways we arrive as close as possible to a unique pathway structure that can be used to uniquely describe certain points within the flux cone. These pathways represent the smallest unique set of pathways that can possibly be generated to accurately interpret the functional aspects of a metabolic network. These characteristics are achieved due to the concept of systemic independence and the inability to decompose these pathways into any non-negative combination of other pathways present within the flux cone. Thus, we arrive at the underlying pathway structure and topology of metabolic networks. It should once again be noted that there is no dynamic or regulatory information accounted for in the network description and calculation performed herein, which are based primarily on the structure and topology of the network.

FIG. 6. Defining the metabolic genotype and phenotype in the context of convex analysis. A geometric depiction of the flux cone in three dimensions where the entire unbounded flux cone corresponds to the theoretical capabilities of a metabolic genotype. Each edge of the cone indicated by the white arrows corresponds to an extreme pathway. A particular solution or metabolic phenotype is indicated by the filled point that is described by a flux vector ($v$) lying on the interior of the flux cone or convex hull. This flux vector can be decomposed into extreme pathways as in eqn (5).

Conclusion

Individual metabolic reactions and fluxes within the cell are the unit of chemical function, while individual extreme pathways may be considered as the unit of systemic function and perhaps cellular function. Here we have discussed how metabolic systems may be described mathematically, and how to determine and classify the unique set of these extreme pathways that correspond to the edges of the flux cone. The underlying pathway structure that is determined from the set of extreme pathways now provides us with the ability to analyse, interpret, and predict metabolic function from a pathway-based perspective.

Figure 6 provides a geometric interpretation of the flux cone with every point described by eqn (5). Together the set of extreme pathways describes the full capabilities of the metabolic network in the simplest form possible, as these pathways are systemically independent and irreducible. From the biological point of view, the entire flux cone associated with the reactions comprising cellular metabolism corresponds to the capabilities of an organism's metabolic network and hence the capabilities of its metabolic genotype. Each one of the generating vectors corresponds to an extreme pathway that the cell could theoretically control to reach every point in the flux cone. Each particular point within this flux cone corresponds to a different flux distribution representing a particular metabolic phenotype. The actual flux vector describing that point can be thought of as a positive combination of these extreme pathways. So one may think of these pathways as theoretically being "switched" off and on to varying degrees to reach a particular metabolic phenotype. How this control is actually implemented at the enzymatic and genetic levels is the next question. To add capabilities to its metabolic network an organism must acquire enough reactions to add another extreme pathway. Thus, acquiring a gene whose gene product performs a specific reaction(s) may not add function to the cell if other supporting reactions are not acquired to create a new extreme pathway. Clearly, this has a number of interesting implications regarding the evolution of cellular metabolism throughout the kingdoms of life.

Perhaps one of the greatest scientific challenges of the next century will be to understand the principles and control schemes that underlie integrated multigenic functions and the genotype–phenotype relationship (Palsson, 1997). To meet the challenge, we must first understand the systematic objectives of cellular functions and how these objectives are achieved on the cellular level in addition to the molecular level. Then we can proceed with the investigation of the regulatory aspects of these networks and various analysis and simulation methods (McAdams & Arkin, 1998; Tomita et al., 1999). With genomics and bioinformatics now providing much of the "hardware" within the cell, we now begin the search for the "software" or the operating systems that govern coordinated cellular function.

The authors wish to acknowledge Jeremy Edwards, John Ross, and Stefan Schuster for their comments and contributions following many discussions on pathway analysis and related topics. Portions of this work are supported by grants from the National Institute of Health (GM57089) and the National Science Foundation (9873384, 9814092). Christophe Schilling is a recipient of a graduate fellowship from the Whitaker Foundation and would like to thank the Whitaker Foundation for their support through graduate fellowships in bioengineering.

REFERENCES

BROWN, P. O. & BOTSTEIN, D. (1999). Exploring the new world of the genome with DNA microarrays. Nature Genet. 21 (1 Suppl), 33–37.

CLARKE, B. L. (1980). Stability of complex reaction networks. Adv. Chem. Phys. 43, 1–215.

CLARKE, B. L. (1981). Complete set of steady states for the general stoichiometric dynamical system. J. Chem. Phys. 75, 4970–4979.

CLARKE, B. L. (1988). Stoichiometric network analysis. Cell Biophys. 12, 237–253.

EDWARDS, J. S. & PALSSON, B. O. (1998). How will bioinformatics influence metabolic engineering. Biotechnol. Bioeng. 58, 162–169.

EISWIRTH, M., FREUND, A. & ROSS, J. (1991). Mechanistic classification of chemical oscillators and the role of species. Adv. Chem. Phys. 80, 127.

FELL, D. A. (1993). The analysis of flux in substrate cycles. In: Modern Trends in Biothermokinetics (Schuster, S., Rigoulet, M., Ouhabi, R. & Mazat, J.-P., eds), pp. 97–101. New York: Plenum.

FLEISCHMANN, R. D. & ADAMS, M. D. (1995). Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–512.

HADLEY, G. (1961). Linear Algebra. Reading, MA: Addison-Wesley Publishing Co.

HOHEISEL, J. D. (1997). Oligomer-chip technology. Trends Biotechnol. 15, 465–469.

KANEHISA, M. (1997). A database for post-genome analysis. Trends Genet. 13, 375–376.

KARP, P. D., RILEY, M., FLEISCHMANN, R. D., ADAMS, M. D., WHITE, O., CLAYTON, R. A., KIRKNESS, E. F., KERLAVAGE, A. R., BULT, C. J., TOMB, J. F., DOUGHERTY, B. A. & MERRICK, J. M. (1998). EcoCyc: encyclopedia of Escherichia coli genes and metabolism. Nucl. Acids Res. 26, 50–53.

LAY, D. C. (1997). Linear Algebra and Its Applications. Reading, MA: Addison-Wesley Longman Inc.

LIAO, J. C., HOU, S.-Y. & CHO, Y.-P. (1996). Pathway analysis, engineering, and physiological considerations for redirecting central metabolism. Biotechnol. Bioeng. 52, 129–140.

LIPSHUTZ, R. J., FODOR, S. P., GINGERAS, T. R. & LOCKHART, D. J. (1999). High density synthetic oligonucleotide arrays. Nat. Genet. 21 (1 Suppl), 20–24.

MCADAMS, H. H. & ARKIN, A. (1998). Simulation of prokaryotic genetic circuits. Ann. Rev. Biophys. Biomol. Struct. 27, 199–224.

PALSSON, B. O. (1997). What lies beyond bioinformatics? Nat. Biotechnol. 15, 3–4.

RAMSAY, G. (1998). DNA chips: state-of-the-art. Nat. Biotechnol. 16, 40–44.

REDER, C. (1988). Metabolic control theory: a structural approach. J. theoret. Biol. 135, 175–201.

ROCKAFELLAR, R. T. (1970). Convex Analysis. Princeton, NJ: Princeton University Press.

SCHENA, M., HELLER, R. A., KARP, P. D., RILEY, M., PALEY, S. M., PELLEGRINI-TOOLE, A. & KRUMMENACKER, M. (1998). Microarrays: biotechnology's discovery platform for functional genomics. Trends Biotechnol. 16, 301–306.

SCHILLING, C. & PALSSON, B. (2000). Assessment of the metabolic capabilities of Haemophilus influenzae Rd through a genome-scale pathway analysis. J. theor. Biol. 203, 247–281.

SCHILLING, C. H., EDWARDS, J. S. & PALSSON, B. O. (1999a). Toward metabolic phenomics: analysis of genomic data using flux balances. Biotechnol. Prog. 15, 288–295.

SCHILLING, C. H. & PALSSON, B. O. (1998). The underlying pathway structure of biochemical reaction networks. Proc. Nat. Acad. Sci. U.S.A. 95, 4193–4198.

SCHILLING, C. H., SCHUSTER, S., PALSSON, B. O. & HEINRICH, R. (1999b). Metabolic pathway analysis: basic concepts and scientific applications in the post-genomic era. Biotechnol. Prog. 15, 296–303.


SCHREIBER, I., HUNG, Y.-F. & ROSS, J. (1996). Categorization of some oscillatory enzymatic reactions. J. Phys. Chem. 100, 8556–8566.

SCHUSTER, R. & SCHUSTER, S. (1993). Refined algorithm and computer program for calculating all non-negative fluxes admissible in steady states of biochemical reaction systems with or without some flux rates fixed. Comput. Appl. Biosci. 9, 79–85.

SCHUSTER, S., DANDEKAR, T. & FELL, D. A. (1999). Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering. Trends Biotechnol. 17, 53–60.

SCHUSTER, S. & HILGETAG, C. (1994). On elementary flux modes in biochemical reaction systems at steady state. J. Biol. Syst. 2, 165–182.

SCHUSTER, S., HILGETAG, C., WOODS, J. H. & FELL, D. A. (1996). Elementary modes of functioning in biochemical networks. In: Computation in Cellular and Molecular Biological Systems (Cuthbertson, R., Holcombe, M. & Paton, R., eds), pp. 151–165. London: World Scientific.

SELKOV, E. J., GRECHKIN, Y., MIKHAILOVA, N. & SELKOV, E. (1998). MPW: the metabolic pathways database. Nucl. Acids Res. 26, 43–45.

SERESSIOTIS, A. & BAILEY, J. E. (1988). MPS: an artificially intelligent software system for the analysis and synthesis of metabolic pathways. Biotechnol. Bioeng. 31, 587–602.

STRASSER, P., STEMWEDEL, J. D. & ROSS, J. (1993). Analysis of a mechanism of the chloride–iodide reaction. J. Phys. Chem. 97, 2851–2861.

TOMITA, M., HASHIMOTO, K., SCHENA, M., HELLER, R. A., THERIAULT, T. P., KONRAD, K., LACHENMEIER, E. & DAVIS, R. W. (1999). E-CELL: software environment for whole-cell simulation. Bioinformatics 15, 72–84.

APPENDIX A

Theorem. A convex flux cone has a set of systemically independent generating vectors. Furthermore, these generating vectors (extremal rays) are unique up to multiplication by a positive scalar. These generating vectors will be called extreme pathways.

Proof. First, we must show existence of a systemically independent generating set for a cone, and then we will prove uniqueness. To show existence we can simply refer to the algorithm outlined in Appendix B, which is justified and shown to provide a systemically independent set of generating vectors for a cone. Thus existence is readily shown. To show uniqueness, let $\{p_1, \ldots, p_k\}$ be a systemically independent generating set for a cone. Notice that if

$$p_j = c' + c'' \tag{A.1}$$

then both $c'$ and $c''$ are positive multiples of $p_j$. To see why this is true, we can write the two pathways as non-negative linear combinations of the extreme pathways:

$$c' = \sum_{i=1}^{k} \eta_i p_i \quad \text{and} \quad c'' = \sum_{i=1}^{k} \kappa_i p_i \quad \text{for } \eta_i, \kappa_i \geq 0. \tag{A.2}$$

Thus

$$p_j = c' + c'' = \sum_{i=1}^{k} (\eta_i + \kappa_i)\, p_i. \tag{A.3}$$

Since the $p_i$ are systemically independent, the only way this can happen is if all the $\eta_i$ and $\kappa_i$ are equal to zero except for $\eta_j$ and $\kappa_j$. This implies that both $c'$ and $c''$ are multiples of $p_j$. This shows that the set of extreme pathways is unique: if $\{c_1, \ldots, c_k\}$ were another set of extreme pathways, the above argument would show that each of the $c_i$ must be a positive multiple of one of the $p_i$. This completes the proof of the theorem establishing the uniqueness of the extreme pathways.

APPENDIX B

The algorithm that is implemented to determine the set of extreme pathways for a reaction network follows the principles of algorithms for finding the extremal rays/generating vectors of convex polyhedral cones.

The algorithm begins with the formulation of an initial matrix consisting of an $n \times n$ identity matrix ($I$) appended to the transpose of the stoichiometric matrix, $S^T$. Then we examine the constraints on each of the exchange fluxes as given in eqn (4). If the exchange flux is constrained to be positive, nothing is done; however, if the exchange flux is constrained to be negative, then we multiply the corresponding row of the initial matrix by $-1$. If the exchange flux is unconstrained, then we move the entire row to a temporary matrix, $T^{(E)}$. This completes the initialization of the first tableau, $T^{(0)}$. For the reaction system in Fig. 1, $T^{(0)}$ and $T^{(E)}$


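are as follows (left block: rows of the identity matrix, one column per flux; right block: rows of $S^T$, one column per metabolite):

$$
T^{(0)} = \left[\begin{array}{rrrrrrrrrr|rrrrr}
1&0&0&0&0&0&0&0&0&0&-1&1&0&0&0\\
0&1&0&0&0&0&0&0&0&0&0&-1&1&0&0\\
0&0&1&0&0&0&0&0&0&0&0&1&-1&0&0\\
0&0&0&1&0&0&0&0&0&0&0&0&-1&1&0\\
0&0&0&0&1&0&0&0&0&0&0&0&1&-1&0\\
0&0&0&0&0&1&0&0&0&0&0&0&-1&0&1
\end{array}\right], \tag{B.1}
$$

$$
T^{(E)} = \left[\begin{array}{rrrrrrrrrr|rrrrr}
0&0&0&0&0&0&1&0&0&0&-1&0&0&0&0\\
0&0&0&0&0&0&0&1&0&0&0&-1&0&0&0\\
0&0&0&0&0&0&0&0&1&0&0&0&0&-1&0\\
0&0&0&0&0&0&0&0&0&1&0&0&0&0&-1
\end{array}\right]
$$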
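The initialization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `initial_tableaux`, the constraint labels (`"internal"`, `"positive"`, `"negative"`, `"free"`) and the ordering of fluxes (v1 to v6, then b1 to b4) and metabolites (A to E) are hypothetical choices, not part of the original text.

```python
def initial_tableaux(st_rows, constraints):
    """Split the initial matrix [I | S^T] into T(0) and T(E).

    st_rows[i]     -- row i of S^T (one row per flux, one column per metabolite)
    constraints[i] -- "internal" for an internal flux, or "positive",
                      "negative" or "free" for an exchange flux
                      (labels are assumptions made for this sketch)
    """
    n = len(st_rows)
    t0, te = [], []
    for i, (row, kind) in enumerate(zip(st_rows, constraints)):
        identity_row = [1 if j == i else 0 for j in range(n)]
        full_row = identity_row + list(row)
        if kind == "free":
            te.append(full_row)                 # unconstrained exchange flux: move to T(E)
        elif kind == "negative":
            t0.append([-x for x in full_row])   # negative exchange flux: multiply row by -1
        else:
            t0.append(full_row)                 # internal or positive flux: keep as is
    return t0, te


# Rows of S^T for the example network, in the assumed order v1..v6, b1..b4.
ST = [
    [-1, 1, 0, 0, 0], [0, -1, 1, 0, 0], [0, 1, -1, 0, 0],
    [0, 0, -1, 1, 0], [0, 0, 1, -1, 0], [0, 0, -1, 0, 1],
    [-1, 0, 0, 0, 0], [0, -1, 0, 0, 0], [0, 0, 0, -1, 0], [0, 0, 0, 0, -1],
]
T0, TE = initial_tableaux(ST, ["internal"] * 6 + ["free"] * 4)
```

With all four exchange fluxes unconstrained, `T0` holds the six internal-flux rows and `TE` the four exchange-flux rows, matching the shapes of eqn (B.1).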

We will designate each element of the above matrices by $T_{ij}$. Starting with $x$ equal to 1 and $T^{(0)}$ equaling $T^{(x-1)}$, the next tableau is generated in the following manner:

1. Identify all of the metabolites that do not have an unconstrained exchange flux associated with them. The total number of such metabolites is denoted by $k$. In this example, only metabolite C does not have such an unconstrained exchange flux, so $k = 1$.

2. Begin forming the new matrix $T^{(x)}$ by copying all rows from $T^{(x-1)}$ which contain a zero in the column of $S^T$ that corresponds to the first metabolite identified in step 1, denoted by the index $c$. (This will be the third column of the transposed stoichiometric matrix.)

3. Of the remaining rows in $T^{(x-1)}$, add together all possible combinations of rows which contain values of the opposite sign in column $c$, such that the addition produces a zero in this column. Given two rows, $r_1$ and $r_2$, whose elements will be denoted by $r_{1,j}$ and $r_{2,j}$ for $j = 1, \ldots, (n+m)$, combine the rows using the following equation to generate a new row $r'$ to be added to $T^{(x)}$:

$$r' = \left(|r_{2,c}| \cdot r_1\right) + \left(|r_{1,c}| \cdot r_2\right) \tag{B.2}$$

The resulting matrix $T^{(1)}$ is as follows up to this point:

$$
T^{(1)} = \left[\begin{array}{rrrrrrrrrr|rrrrr}
1&0&0&0&0&0&0&0&0&0&-1&1&0&0&0\\
0&1&1&0&0&0&0&0&0&0&0&0&0&0&0\\
0&1&0&1&0&0&0&0&0&0&0&-1&0&1&0\\
0&1&0&0&0&1&0&0&0&0&0&-1&0&0&1\\
0&0&1&0&1&0&0&0&0&0&0&1&0&-1&0\\
0&0&0&1&1&0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&1&1&0&0&0&0&0&0&0&-1&1
\end{array}\right] \tag{B.3}
$$


4. For all of the rows added to $T^{(x)}$ in steps 2 and 3, check to make sure that no row exists that is a non-negative combination of any other set of rows in $T^{(x)}$. One method used is as follows: let $A(i)$ equal the set of column indices, $j$, for which the elements of row $i$ equal zero. Then check to determine if there exists another row ($h$) for which $A(i)$ is a subset of $A(h)$. This is expressed mathematically in eqn (B.4) (analogous to the condition of eqns (14) and (15) in Schuster & Schuster, 1993). Thus, if eqn (B.4) holds true for any distinct rows $i$ and $h$, then row $i$ must be eliminated from $T^{(x)}$:

$$A(i) \subseteq A(h), \quad i \neq h, \quad \text{where} \quad A(i) = \{\, j : T_{i,j} = 0,\ 1 \leq j \leq (n+m) \,\}. \tag{B.4}$$

5. With the formation of $T^{(x)}$ complete, repeat steps 2–4 for all of the metabolites that do not have an unconstrained exchange flux operating on the metabolite, incrementing $x$ by one up to $k$.

The final tableau will be $T^{(k)}$. [In this example there is only one such metabolite, so we do not need to iterate through steps 2–4 again. Therefore $T^{(k)}$ equals $T^{(1)}$ as in eqn (B.3).] Note that the number of rows in $T^{(k)}$ will be equal to $k$, the number of extreme pathways.

6. Next we append $T^{(E)}$ to the bottom of $T^{(k)}$ (also the same as $T^{(1)}$ in this example). This results in the following tableau:

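$$
T^{(1/E)} = \left[\begin{array}{rrrrrrrrrr|rrrrr}
1&0&0&0&0&0&0&0&0&0&-1&1&0&0&0\\
0&1&1&0&0&0&0&0&0&0&0&0&0&0&0\\
0&1&0&1&0&0&0&0&0&0&0&-1&0&1&0\\
0&1&0&0&0&1&0&0&0&0&0&-1&0&0&1\\
0&0&1&0&1&0&0&0&0&0&0&1&0&-1&0\\
0&0&0&1&1&0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&1&1&0&0&0&0&0&0&0&-1&1\\
\hline
0&0&0&0&0&0&1&0&0&0&-1&0&0&0&0\\
0&0&0&0&0&0&0&1&0&0&0&-1&0&0&0\\
0&0&0&0&0&0&0&0&1&0&0&0&0&-1&0\\
0&0&0&0&0&0&0&0&0&1&0&0&0&0&-1
\end{array}\right] \tag{B.5}
$$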

7. Starting in the $(n+1)$ column (or the first non-zero column on the right side), if $T_{i,(n+1)}$ does not equal zero, then add the corresponding non-zero row from $T^{(E)}$ to row $i$ so as to produce a zero in the $(n+1)$ column. This is done by simply multiplying the corresponding row in $T^{(E)}$ by $T_{i,(n+1)}$ and adding this row to row $i$. Repeat this procedure for each of the rows in the upper portion of the tableau so as to create zeros in the entire upper portion of the $(n+1)$ column. When finished, remove the row in $T^{(E)}$ corresponding to the exchange flux for the metabolite just balanced.


8. Follow the same procedure as in step 7 for each of the columns on the right side of the tableau containing non-zero entries. (In this example we need to perform step 7 for every column except the middle column of the right side, which corresponds to metabolite C.) The final tableau, $T^{(Final)}$, will contain the transpose of the matrix $P$ containing the extreme pathways in place of the original identity matrix. Both $T^{(Final)}$ and $P^T$ are given below:

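$$
T^{(Final)} = \left[\begin{array}{rrrrrrrrrr|rrrrr}
1&0&0&0&0&0&-1&1&0&0&0&0&0&0&0\\
0&1&1&0&0&0&0&0&0&0&0&0&0&0&0\\
0&1&0&1&0&0&0&-1&1&0&0&0&0&0&0\\
0&1&0&0&0&1&0&-1&0&1&0&0&0&0&0\\
0&0&1&0&1&0&0&1&-1&0&0&0&0&0&0\\
0&0&0&1&1&0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&1&1&0&0&-1&1&0&0&0&0&0
\end{array}\right] \tag{B.6}
$$

$$
P^{T} =
\begin{array}{rrrrrrrrrr|l}
v_1 & v_2 & v_3 & v_4 & v_5 & v_6 & b_1 & b_2 & b_3 & b_4 & \\
\hline
1&0&0&0&0&0&-1&1&0&0& \leftarrow p_1\\
0&1&1&0&0&0&0&0&0&0& \leftarrow p_7\\
0&1&0&1&0&0&0&-1&1&0& \leftarrow p_3\\
0&1&0&0&0&1&0&-1&0&1& \leftarrow p_2\\
0&0&1&0&1&0&0&1&-1&0& \leftarrow p_4\\
0&0&0&1&1&0&0&0&0&0& \leftarrow p_6\\
0&0&0&0&1&1&0&0&-1&1& \leftarrow p_5
\end{array} \tag{B.7}
$$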
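Steps 1 through 8 can be traced on the example network with the following Python sketch. It is a minimal illustration rather than a definitive implementation: the helper names (`unit`, `balance_column`, `absorb_exchange`), the column ordering (fluxes v1 to v6 and b1 to b4, then metabolites A to E) and the rule of keeping the first of two rows with identical zero sets are assumptions made here. The row multiplication in `absorb_exchange` relies on each row of $T^{(E)}$ carrying a $-1$ in its metabolite column, as in eqn (B.1).

```python
from itertools import combinations

N_FLUX, N_MET = 10, 5   # fluxes v1..v6, b1..b4; metabolites A..E (assumed ordering)


def unit(i):
    """Row i of the n x n identity matrix."""
    return [1 if j == i else 0 for j in range(N_FLUX)]


# T(0) and T(E) as in eqn (B.1): identity part followed by the S^T part.
T0 = [unit(0) + [-1, 1, 0, 0, 0], unit(1) + [0, -1, 1, 0, 0],
      unit(2) + [0, 1, -1, 0, 0], unit(3) + [0, 0, -1, 1, 0],
      unit(4) + [0, 0, 1, -1, 0], unit(5) + [0, 0, -1, 0, 1]]
TE = [unit(6) + [-1, 0, 0, 0, 0], unit(7) + [0, -1, 0, 0, 0],
      unit(8) + [0, 0, 0, -1, 0], unit(9) + [0, 0, 0, 0, -1]]


def balance_column(rows, c):
    """Steps 2-4 for the metabolite stored in column c."""
    new = [r for r in rows if r[c] == 0]                    # step 2: copy rows with a zero
    rest = [r for r in rows if r[c] != 0]
    for r1, r2 in combinations(rest, 2):                    # step 3: eqn (B.2)
        if r1[c] * r2[c] < 0:
            new.append([abs(r2[c]) * a + abs(r1[c]) * b for a, b in zip(r1, r2)])
    zeros = [{j for j, x in enumerate(r) if x == 0} for r in new]   # A(i) of eqn (B.4)

    def redundant(i):
        # Row i is dropped if A(i) is contained in A(h) for another row h.
        # Assumption: for identical zero sets only the later row is dropped.
        return any(zeros[i] < zeros[h] or (zeros[i] == zeros[h] and h < i)
                   for h in range(len(new)) if h != i)

    return [r for i, r in enumerate(new) if not redundant(i)]       # step 4


def absorb_exchange(rows, te_rows):
    """Steps 6-8: zero the right-hand side using the rows of T(E)."""
    result = []
    for row in rows:
        row = list(row)
        for te in te_rows:
            c = next(j for j in range(N_FLUX, len(te)) if te[j] != 0)
            factor = row[c]                                  # valid because te[c] == -1
            row = [a + factor * b for a, b in zip(row, te)]
        result.append(row)
    return result


# Step 1: metabolite columns without an unconstrained exchange flux.
balanced = [c for c in range(N_FLUX, N_FLUX + N_MET) if all(te[c] == 0 for te in TE)]

tableau = T0
for c in balanced:                                           # step 5: one pass per metabolite
    tableau = balance_column(tableau, c)
final = absorb_exchange(tableau, TE)
P_T = [row[:N_FLUX] for row in final]                        # left block of T(Final)

for row in P_T:
    print(row)
```

For this network the only balanced column is the one for metabolite C, and the rows of `P_T` come out in the same order as the rows of eqn (B.7).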

As a corollary to this algorithm, it is easily seen that if an unconstrained exchange flux existed for every metabolite in the system, the number of extreme pathways would simply equal the number of internal fluxes in the system, with each pathway equivalent to a single internal flux. The justification to ensure that this algorithm generates a unique set of systemically independent generating vectors for the flux cone is provided below. First we will show the existence of a conical generating set. Consider the cone

$$C = \{\, x \in \mathbb{R}^n : Ax = 0,\ x \geq 0 \,\}, \tag{B.8}$$

where $A$ is an $m \times n$ matrix. We will induct on the number of rows of $A$. If $m = 0$ then $C$ is just the positive orthant, and the set of coordinate vectors is easily shown to be a generating set for $C$.

For the induction step we have a cone $C$ with generating set $\{p_1, \ldots, p_n\}$. Let

$$C' = C \cap \{\, x : a_1 x_1 + \cdots + a_n x_n = a \cdot x = 0 \,\}. \tag{B.9}$$

After a possible rescaling of the $p_i$ and reordering, we can assume that

$$
a \cdot p_i =
\begin{cases}
1, & 1 \leq i \leq k,\\
-1, & k+1 \leq i \leq k+l,\\
0, & k+l+1 \leq i \leq n.
\end{cases} \tag{B.10}
$$

It is easily shown that the pathways

$$\{\, p_i + p_j : 1 \leq i \leq k,\ k+1 \leq j \leq k+l \,\} \cup \{\, p_i : k+l+1 \leq i \leq n \,\} \tag{B.11}$$


form a conical-generating set for the cone $C'$. Using the induction step for each row of $A$, we construct a conical-generating set. This set will most likely not be systemically independent. However, we can remove pathways from the set to get a subset that still generates the cone and is systemically independent. This procedure will produce a systemically independent generating set for the cone.
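The induction step of eqns (B.9) to (B.11) can be illustrated with a short Python sketch. The function names `add_constraint` and `generating_set` and the small matrix `A` are hypothetical and chosen only for illustration. Instead of rescaling the generators so that $a \cdot p_i = \pm 1$, the sketch weights each pair by the absolute values of the two dot products, which gives the same rays up to a positive scalar. As noted above, the resulting set is not guaranteed to be systemically independent, and no pruning is attempted here.

```python
def dot(a, p):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, p))


def add_constraint(generators, a):
    """Generators of C' = C intersected with {x : a.x = 0}, given generators of C."""
    pos = [p for p in generators if dot(a, p) > 0]
    neg = [p for p in generators if dot(a, p) < 0]
    zero = [p for p in generators if dot(a, p) == 0]
    # Pair every generator with a.p > 0 with every generator with a.q < 0,
    # weighted so that the combination satisfies a.x = 0 (cf. eqn (B.11)).
    pairs = [[-dot(a, q) * x + dot(a, p) * y for x, y in zip(p, q)]
             for p in pos for q in neg]
    return pairs + zero


def generating_set(A, n):
    """Induct over the rows of A, starting from the coordinate vectors (m = 0)."""
    gens = [[1 if j == i else 0 for j in range(n)] for i in range(n)]
    for a in A:
        gens = add_constraint(gens, a)
    return gens


# Hypothetical example: x1 = x2 and x2 = x3 with x >= 0.
A = [[1, -1, 0], [0, 1, -1]]
gens = generating_set(A, 3)
print(gens)
```

For this small example the cone collapses to a single ray, so `gens` contains one generator.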

APPENDIX C: NOTATION

$0$: zero vector

$b_j$: activity of the jth exchange flux

$C$: steady-state flux cone

$d$: dimension of the null space of the stoichiometric matrix

$k$: number of extreme pathways

$m$: number of reaction metabolites

$n$: total number of fluxes

$n_E$: number of exchange fluxes

$n_I$: number of internal fluxes

$P$: matrix of extreme pathways

$p_i$: the ith extreme pathway

$r$: rank of the stoichiometric matrix

$S$: stoichiometric matrix

$S_{ij}$: element of the ith row and the jth column of the stoichiometric matrix

$v$: flux vector

$v_j$: activity of the jth internal flux

$w$: pathway utilization vector

$w_i$: activity of the ith pathway

$x$: concentration vector