From reductionist to constructionist, but only if we integrate


Spence and Aurora - From reductionist to constructionist, but only if we integrate

The full value of sequence databases will only be evident when they are fully integrated with data from biochemical and physiological experiments, and even from human genetic studies. This integration must be made in a way that will allow simulations of outcomes to be made when individual components of the system are modified.

A successful pharmaceutical company needs to be able to identify therapeutic agents, which might be small molecules or proteins, that are both effective and safe. In today's competitive world, it also needs to do this quickly and cost-effectively. These four driving forces for business rely on as complete an understanding as can be obtained of the pathobiology of the disease (to be treated) and the biology of the (individual) patient.

For these reasons, pharmaceutical research and development must continually operate at the frontiers of biological science. Today, this frontier is exemplified by the knowledge we have of the structure and activity of the human genome at a molecular level. We are perhaps just a year away from having almost the complete human DNA sequence available to us, and we already have the complete genome sequences of many microorganisms as well as the metazoan Caenorhabditis elegans.

The power of reductionism

Bioinformaticists curate and analyse sequence databases [1], creating order and meaning, as much as our present understanding of biology allows. If one looks back over the history of biological research, it becomes apparent that these databases represent the extreme of biological reductionism. We have taken the human being apart by first identifying the medium that encodes it, the DNA, and then reducing this medium down to its individual units, the nucleotide base pairs. When considering this advanced state of biological reductionism and what it actually gives us, we must bear in mind that the reductionist nature of thought and research in biology, and in many other disciplines, has been extremely successful at facilitating an understanding of highly complex systems. To emphasize this, it is worth considering some significant examples of reductionistic successes in establishing a platform for scientific insight and the synthesis of new theories.

In 1749, Carl Linnaeus established a system that brought order to the panoply of life by devising a logical schema of classification by structural similarity, the sex organs of plants being his particular speciality. This was the binomial nomenclature describing species and genera. However, the Linnean system, which still forms the basis of our classification of life today, represents a static description of life's diversity. It certainly aided naturalists in their work but it did not explain the creative mechanisms behind the structural similarities on which it was based. Just over a century later, in 1858, Charles Darwin and Alfred Russell Wallace independently proposed a biological mechanism that explained the order so carefully described by Linnaeus. Thus, the anatomical reductionism of Linnaeus had created the framework for the synthesis of evolutionary theory, probably the most powerful explanatory theory in biology.

Dmitri Ivanovich Mendeleyev's brilliant construction of the periodic table through empirical observations was, at the same time, reductionistic (it focused on elements rather than compounds or materials) and predictive (Mendeleyev predicted the existence of gallium and germanium, and even their chemical properties). Half a century later, the work of Henry Moseley and Niels Bohr provided an explanation for the periodic table in terms of atomic theory.

With the accumulation of vast amounts of sequence data and the development of analytical tools, a great deal of understanding has already been brought to the structure and function of the human genome. For example, we are now able to make significant predictions regarding the potential function of proteins from their amino acid sequence alone (for genomes that have been completely sequenced, it is estimated that about 40-55% of genes can be assigned functions based on homology to other proteins of known function [2]). The classification of gene families based on their protein sequences is a very powerful tool and, rather like Mendeleyev's periodic table, allows us to make functional predictions. These predictions could represent the first steps on the constructionist path.

The data now available allow us to conceptualize how the environment can affect an organism, leading to changes in intercellular signaling and, in turn, to responses via transcription. We can often cite examples of connections at all of these levels. However, we cannot yet integrate the modulation of one pathway (at any of these levels) with that of another with much confidence, even though organisms are doing it all the time, in both health and disease. This is what we need to do in order to understand the processes that we hope to modify. Through the use of computational biology, using curated databases and creating biologically meaningful connections between disparate data sources, we can start to reconstruct these complex interactive processes and gain an insight into their functions and controls.

[Figure 1. Labels: Reductionist, Constructionist; Genes; Populations; Structure; Chemistry; Pharmacophore; Intracellular network (Biochemical, Regulatory); Cellular phenotype (Spatial, Temporal); Intercellular network (Cell-cell interaction, e.g. local, endocrine); Environment; Phenotype; Homeostasis (Health); Pathology; Gain or loss of function]

Fig. 1. Some basic connections that need to be made to model biological and pathological processes.

pharmainformatics 0167-7799/99/$ - see front matter © Elsevier Science Ltd. All rights reserved. PII: S0167-5699(99)01462-0

Reconstructing biological systems

So where do we start this process? If we consider the gene as the basic functional unit of an organism and we have, or soon will have, characterized all human genes, then we can clearly start to work back from all the individual genes to the 'macro' level of phenotype. However, this is not going to be simple. As A. Weissman pointed out [3], cells do not inherit merely a bag of genes but a full genetic program. These genetic programs are mediated by networks of intra- and intercellular interactions that act together to maintain the integrity and homeostasis of the organism. The networks, often with built-in redundancy and other checks, maintain homeostasis in much the same way as a spider's web maintains its stability by being interconnected (deform or break the web at one node and the remainder of the web acts to maintain integrity).

In disease, assaults from the environment (e.g. mutagens, parasites, toxins etc.) often upset this balance, leading to an altered or pathological state. A situation can be considered to be pathological once the organism can no longer deal effectively with the assault because the mechanisms that maintain integrity and homeostasis have become insufficient. Ideally, pharmaceutical agents should inhibit these destabilizing processes, restoring the homeostasis. In less-than-ideal circumstances, the symptoms, rather than causative agents, are treated by modulating the pathways impacted by the pathological process. Therefore, if the gene represents the most appropriate reductionistic view of the organism, an understanding of the healthy and pathological states represents the holistic view.

From a pharmainformatics viewpoint, the genes, the cellular phenotype, the networks, the phenotype, the pathology and the therapeutic agent are different axes or vectors. Given any entry point, such as a gene or a metabolic pathway, we can 'walk' from one axis to all the others, thereby connecting the holistic data with the reductionist data. In practice, however, if we start out with an observed pathology, how can we identify all the pathways and genes in a given cell type or tissue that contribute to the pathology? The only way to accomplish this is to integrate all the information at all levels and present appropriate views or entry points that link to all the other levels.
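One way to picture this 'walk' between axes is as a traversal of a graph whose nodes are records on different axes and whose edges are curated links between them. The following is only a minimal sketch under stated assumptions: the identifiers (`pathology:D1`, `gene:G1` and so on), the `axis:name` labeling scheme and the functions `build_graph` and `walk` are all hypothetical and do not come from any real database or from the article.

```python
from collections import deque

# Hypothetical curated links between records on different axes.
# Every identifier is a placeholder invented for illustration.
LINKS = [
    ("pathology:D1", "phenotype:PH1"),
    ("phenotype:PH1", "network:N1"),
    ("network:N1", "gene:G1"),
    ("network:N1", "gene:G2"),
    ("gene:G2", "network:N2"),
    ("network:N2", "gene:G3"),
    ("gene:G1", "agent:A1"),
]


def build_graph(links):
    """Turn a list of pairwise links into an undirected adjacency map."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def walk(graph, start, target_axis):
    """Breadth-first walk from `start`.

    Returns every reachable record on `target_axis`, mapped to the
    number of links that separate it from the entry point.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {
        node: hops
        for node, hops in distance.items()
        if node.split(":", 1)[0] == target_axis
    }


GRAPH = build_graph(LINKS)
genes_for_d1 = walk(GRAPH, "pathology:D1", "gene")
agents_for_g3 = walk(GRAPH, "gene:G3", "agent")
```

Because the links are stored without direction, the same structure serves any entry point: starting from a pathology yields candidate genes, and starting from a gene yields the agents connected to it. The hop count is a crude proxy for how direct a connection is; a real system would presumably weight links by the quality of the supporting evidence.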

The first level of integration is, logically, to assemble the genes into biochemical and regulatory pathways; that is, essentially, to assemble the cellular program. We know much about the biochemical and metabolic pathways, and quite a lot of work has been done in terms of assembling these. For examples, see M. Riley's work on Escherichia coli pathways [4], KEGG and WIT (Box 1). By contrast, we are just starting to elucidate the regulatory pathways, such as those addressed by Bhalla and colleagues [5]. In addition to these regulatory pathways, we must include the regulation of mRNA maturation, translation, the cell cycle and apoptosis, import into and export out of organelles, and protein localization. Modeling at the organ level by integrating molecular-structural and physiological data is now being carried out as well [6].

A related level of integration is to provide not only a view to this program but also appropriate views for the gene paralogs and their programs, as well as the view for orthologs, in model organisms. This is especially important because the study of model organisms can provide additional insights into general biological phenomena. Even if the biological function of a pathway is not conserved, as is the case for Ras in the signaling pathway in yeast and mammals, the biochemical function can be.

Of course, another sort of network exists in metazoans: between the cells and the organs. This network includes neurotransmitters, hormones, growth factors and other diffusible agents that act over short and long distances in the body, sensing the environment and helping to maintain homeostasis. We must also consider local cell interactions, in which one cell touches the surface of another to elicit homeostatic changes. Examples in this category include antigen-presenting-cell–B-cell interactions via the major histocompatibility proteins, as well as the cytokine network in the immune system. These intercellular networks also work to maintain homeostasis. Therefore, computers not only provide a suitable medium for the curation and organization of the genes and networks but could ultimately provide suitable mathematical models. If the models are sufficiently rich in detail, they could be used to reproduce the cell qualitatively and quantitatively.

New experimental approaches are beginning to populate new databases with additional valuable information on gene expression and protein interactions. Over the past decade, we have developed some limited insight into the signals that predict the induction of gene expression in specific cell types. Certain motifs are known to predict the genes that can be modulated by specific transcription factors. However, our understanding of the sequence requirements for gene regulation is still in its infancy. The use of high-throughput, massively parallel transcriptional profiling methods has enabled us to explore the temporal patterns of biochemical or signal-transduction-pathway modulation, and technological developments will hopefully soon increase the spatial resolution of known expression patterns. In addition, high-throughput protein-interaction studies are beginning to generate precise networks of protein communications within the cell.

We must not forget that we already have a vast library of data on biological systems, describing biochemical and physiological processes in both health and disease. This library, much of it derived by an empirical and, of necessity, reductionistic experimental approach, must be integrated with the new knowledge base. Furthermore, whole-genome sequencing is exposing a high degree of sequence polymorphism between individuals, both between and within genes, adding another level of functional variation to the complex biochemical and genetic mix. These genotype variations will lead to phenotype variations that must also be incorporated into the understanding of pathology and, ultimately, factored into the treatment of pathology by pharmacophores. There is much to integrate.


Box 1. Useful websites

E-Cell simulation software
Iyengar laboratory website

Building up from genotype to phenotype

In the future, bioinformatics (and pharmainformatics) will need to address these data-integration needs, not only statically through hot keys (such as those used to navigate the Internet) but also in a manner that allows the modeling of biological and pathological processes. Some basic connections that need to be made to achieve this goal are outlined in Fig. 1.

Genes, gene structure and gene variation connect, through expression, to regulatory and metabolic processes. The fine control of these processes is modulated by spatial and temporal expression within both cells and tissues. Individual cells communicate through direct interactions as well as remotely through hormones and other secreted products, which connect the intracellular networks of all cells to each other. The sum of these networks defines the phenotype of the organism, which, when all are working within normal limits, is at homeostasis, a state that we can call health. A shift away from this state leads to a disease phenotype, which might be represented by a new homeostatic set point or be progressive. This shift will be the result of changes in the networks that establish these set points. The changes in intracellular networks would be the result of changes in gene expression or gene sequence.

A fully integrated system would be able to connect the sequence and expression data to the homeostatic condition by taking into account relevant data on other genes associated through intra- and intercellular networks. The networks should be established in silico in order to allow the outcomes of modifications to be simulated at all levels, including gene mutations, environmental insults and drug-receptor interactions. Attempting to construct such an integrated system now, even when much of the data required is still unavailable, should help focus our data-acquisition efforts on the critical components. We believe that the value of genomics and bioinformatics in health care is enormous but will only be fully realized if we integrate and integrate and integrate.
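To make the idea of simulating modifications in silico concrete, the sketch below uses a deliberately tiny toy network: a metabolite `x` is supplied at a constant rate and cleared by an enzyme `e`, whose own level is induced by `x`. This negative feedback loop settles at a set point, and changing a parameter shifts that set point. The model, its equations, the parameter names (`supply`, `activity`, `induction`, `decay`) and the function `simulate` are assumptions made purely for illustration; they are not taken from the article or from any published model.

```python
def simulate(supply=1.0, activity=1.0, induction=1.0, decay=1.0,
             x0=1.0, e0=1.0, dt=0.01, steps=5000):
    """Integrate a toy two-component feedback loop with forward Euler.

    dx/dt = supply - activity * e * x    (metabolite made, then cleared by enzyme)
    de/dt = induction * x - decay * e    (enzyme induced by metabolite, then decays)

    Returns the (x, e) pair reached after `steps` steps of size `dt`.
    """
    x, e = x0, e0
    for _ in range(steps):
        dx = supply - activity * e * x
        de = induction * x - decay * e
        x, e = x + dt * dx, e + dt * de
    return x, e


# Baseline ("health"): with all parameters at 1.0 the set point is x = 1.
healthy_x, _ = simulate()

# Hypothetical loss-of-function mutation: enzyme activity is halved.
mutant_x, _ = simulate(activity=0.5)

# Hypothetical environmental insult: the supply of metabolite doubles.
insult_x, _ = simulate(supply=2.0)

# Hypothetical drug that doubles the activity of the mutant enzyme.
treated_x, _ = simulate(activity=0.5 * 2.0)
```

In this toy model the steady state is x = sqrt(supply × decay / (activity × induction)), so both the mutation and the insult move the set point from 1 to about 1.41, and the hypothetical drug returns the mutant to the baseline. A realistic system would need many coupled components and measured parameters, which is precisely the data-acquisition focus that the argument above calls for.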

1 Boguski, M. (1998) in Trends Guide to Bioinformatics (Brenner, S. and Lewitter, F., eds), pp. 1-3, Elsevier Science

2 Botstein, D., Chervitz, S.A. and Cherry, J.M. (1997) Science 277, 1259-1260

3 Millar, D., Millar, I., Millar, J. and Millar, M. (1996) The Cambridge Dictionary of Scientists, Cambridge University Press

4 Karp, P., Riley, M., Paley, S., Pellegrini-Toole, A. and Krummenacker, M. (1999) Nucleic Acids Res. 27, 55-58

5 Bhalla, U.S. et al. (1999) Science 283, 381-387

6 Noble, D., Levin, J. and Scott, W. (1999) Drug Discov. Today 4, 10-16