ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2007 Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 305 Signals and Noise in Complex Biological Systems JOHAN RUNG ISSN 1651-6214 ISBN 978-91-554-6888-0 urn:nbn:se:uu:diva-7862




To Duk-Kyung and My Family


List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Robinson, J.W., Rung J., Bulsara A.R., Inchiosa M.E. (2001) General measures for signal-noise separation in nonlinear dynamical systems. Phys. Rev. E, 63, 011107:1-11

II Rung J., Schlitt T., Brazma A., Freivalds K., Vilo J. (2002) Building and analysing genome-wide gene disruption networks. Bioinformatics, 18 Suppl 2:S202-10

III Ettwiller L.M., Rung J., Birney E. (2003) Discovering novel cis-regulatory motifs using functional networks. Genome Res. 13(5):883-95

IV Schlitt T., Palin K., Rung J., Dietmann S., Lappe M., Ukkonen E., Brazma A. (2003) From gene networks to gene function. Genome Res. 13(12):2568-76

V Sladek R., Rocheleau G., Rung J., Dina C., Shen L., Serre D., Boutin P., Vincent D., Belisle A., Hadjadj S., Balkau B., Heude B., Charpentier G., Hudson T.J., Montpetit A., Pshezhetsky A.V., Prentki M., Posner B.I., Balding D.J., Meyre D., Polychronakos C., Froguel P. (2007) A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature, 445(7130):881-5

Reprints were made with permission from the publishers.


Contents

1 Introduction
  1.1 Complex systems
  1.2 Genome biology
  1.3 Gene expression
  1.4 Genetic variation and disease
  1.5 Dynamical systems and noise
  1.6 Outline
2 Measurement
  2.1 Gene expression microarrays
  2.2 Genotyping
    2.2.1 Genotyping with Illumina BeadChip system
  2.3 DNA-binding proteins
  2.4 Protein interactions
  2.5 Single-molecule measurement techniques
3 Low-level signals and noise
  3.1 Control of gene expression
    3.1.1 DNA motifs as signals for protein binding
  3.2 Synthetic biology and randomness in gene expression
  3.3 Signal transduction in stochastic bistable systems
    3.3.1 Stochastic resonance
  3.4 Effects on gene expression by genetic variation
4 Pathways and biological tasks
  4.1 Pathway control
5 Large-scale gene networks
  5.1 Topology and structural properties of complex biological system
  5.2 Inferring large-scale systems from data
    5.2.1 Networks from microarray data
    5.2.2 Transcription factor binding networks
    5.2.3 Protein-protein networks
    5.2.4 Integrating networks
  5.3 Noise and modelling issues
    5.3.1 Choosing the right level of model complexity
    5.3.2 Artefacts in network modelling
6 Signalling between different levels of complexity
  6.1 Genetic variation cause disease
    6.1.1 Type 2 Diabetes Mellitus
    6.1.2 Genome-wide association studies
7 Contributions
  7.1 Paper I: General measures for signal-noise separation in nonlinear dynamical systems
    7.1.1 Methods
    7.1.2 Results
    7.1.3 Discussion
  7.2 Paper II: Building and analysing genome-wide disruption networks
    7.2.1 Background
    7.2.2 Methods
    7.2.3 Results
    7.2.4 Discussion
  7.3 Paper III: Discovering novel cis-regulatory motifs using functional networks
    7.3.1 Methods
    7.3.2 Results
    7.3.3 Discussion
  7.4 Paper IV: From gene networks to gene function
    7.4.1 Methods
    7.4.2 Results
    7.4.3 Discussion
  7.5 Paper V: A genome-wide association study identifies risk loci for type 2 diabetes
    7.5.1 Methods
    7.5.2 Results
    7.5.3 Discussion
8 Conclusion
9 Sammanfattning på svenska - Summary in Swedish
10 Acknowledgements
Bibliography


1. Introduction

The world we live in is full of complex systems. When we drive a car from one town to another, following roads and passing through other towns, we are part of the complex network that is the traffic system. Society is itself a complex network of people: some we have strong links to, such as family and friends, some we know through our own acquaintances, and some we are not even closely related to. Nature, economics and society as a whole are all complex systems. On a smaller scale, within our bodies, complex networks of interacting molecules control the processes that keep us alive and let us adapt to different environments.

But what is it in a complex network that makes it so adaptable to different environments? How has a natural system, like the genes or the proteins in our cells, evolved from smaller systems to grow into something more complex, flexible, and robust? Although the exact figure is much debated, most scientists today agree that the number of human genes is around 25,000-30,000. One of the main databases for genomic information, Ensembl, counts 26,720 genes (2 Apr, 2007), including known and novel genes, and RNA genes. But the same figure for the mouse is 26,428, for the zebrafish 28,396, and for the minuscule worm C. elegans, 20,068. This demonstrates that the number of genes alone is not a very good indicator of biological complexity, at least not in the sense we normally think of it. So if not this, how should we describe biological complexity, and how can we explain the adaptable and robust dynamical properties of biological systems in general?

In this thesis, I report on work that studies such systems at different levels of detail in order to determine how processes on the different scales of complexity are regulated, and how events happening on one level of detail have effects at a different level. The work mainly concerns the cellular level or below, but we will also see how regulation on a low level has effects on larger scales, for instance how single mutations in our DNA can affect the system on the cellular and organ level, and cause diseases that affect our whole body. And although it is outside the scope of this work, we can extend the thinking to even larger scales: how such diseases will affect us socially, and how our society will respond to widespread diseases through political and economic means. In this summary I will also give a background to the different research fields my work falls into and describe how the papers included in this thesis contribute to those fields. Essentially, this can be divided into three parts, approaching different levels of complexity within biological systems:

1. Fundamental system units: consisting of a small number of components that interact to form a unit which can be modelled as a well detailed dynamical system.

2. Signalling pathways and modules: subsystems consisting of a larger number of components, built by several connected fundamental units, with a particular cellular role.

3. Networks: large networks of molecular components on a global, genome-wide scale.

My goal is to show how biological systems can be modelled and analysed at different scales of complexity, using advanced measurement technology to give us insights into how biological processes are regulated and coordinated. At a low level, we may be able to formulate a detailed model of signals and noise regulating a single genetic switch. On the intermediate level, we can describe pathways and modules of genes with specific functionality, where we use models with less mathematical detail but instead incorporate knowledge about biological function. We can then deduce information about biological signals in such systems, such as DNA patterns with regulatory properties. On the global scale, we can model systems as large networks of components and links, and from this draw conclusions about the robustness and stability of the whole system, about modularization of functions, and about how cells adapt and evolve molecular systems to cope with fluctuating environments (Fig. 1.1).

In physics, we know that what happens on the microscopic scale obeys physical laws that are inappropriate for determining macroscopic properties of the system. But macroscopic laws are still connected to microscopic laws, and the behaviour of macroscopic systems can be derived, with the use of statistics, from the collective behaviour of events on the microscopic scale. The whole field of statistical physics builds upon this. Similarly, biological systems, viewed from a wide perspective, have properties that derive from the collective behaviour and couplings of the individual components. In physics, a model that describes the motion of every molecule in a gas at the same time may be possible to write down accurately, but will be fairly useless for describing global properties of the system such as volume and pressure. Conversely, we will not be able to describe the motion of single gas molecules by knowing the volume and pressure. The questions we ask should be addressed by a model at the right level of detail. For biological systems, even if we could formulate a model of how every gene is regulated, global properties such as robustness and modularity would be better addressed by other models. And conversely, we need the detailed models to understand regulation and dynamics on the single gene or single protein level.


Figure 1.1: When we zoom in from large scale to small scale systems, new details of components and regulatory mechanisms come into focus. From a large scale biological system, like the moose, we can zoom into a group of cells, find one cell and look at the genome-wide gene regulatory network inside, where we can study topology, robustness and gene modules. Zooming further in, we find smaller modules, groups of coordinated pathways carrying out different biological tasks. On an even smaller scale, single pathways or groups of biological components interact to carry out a single function, transduce a signal, or to build or break down a biochemical compound. On the smallest level we look at single gene regulation, in this case a double inhibitory genetic feedback loop.

To describe and understand complex biological systems more properly, we need to address these systems with models that tightly link both horizontal and vertical regulation. Regulation is horizontal in the sense that events happening on one scale of complexity will affect and regulate other events on the same scale, and vertical in the sense that what happens on one scale will have an impact on subsystems at other scales in the hierarchy of complexity. For instance, a small perturbation on a molecular level, or the alteration of a single nucleotide in the genome, can have effects on larger scales, like disrupting whole signalling pathways, causing a disease, or killing cells in organs and tissues.

Regulation of complex biological systems is a vast and very general area of research, bringing together researchers from (among others) engineering, physics, chemistry, biology and mathematics. A research field called “systems biology” has emerged, applying methods and theory from the study of signals and systems in engineering to biological systems. This thesis includes material that can be characterized as systems biology, but it should be pointed out that other methods (classic molecular biology, not least!) have been part of the studies done. There is no single approach that in itself will provide all we need to know about complex biological systems, but by combining methods from a variety of fields where best suited, we can understand these systems better.


In this introductory chapter I will now briefly review some terminology and fundamental topics that will come back throughout the more detailed chapters that follow.

1.1 Complex systems

The word “complexity” has different meanings in different fields. Throughout this thesis, I will by “complex” mean a system with many components and interactions, coupled in a non-trivial and non-uniform way, so that the system exhibits emergent properties such as pattern formation, synchronization, or self-adaptation. So what does “many” mean? It depends on the types of components and regulatory mechanisms present, and how the system is coupled together. Complex systems have different properties, components and regulatory mechanisms depending on the scale we choose to look at (Fig. 1.1). It is hypothesized that the mechanisms underlying these principles arise from intrinsic properties of the system itself, because of the sensitivity that comes with the nonlinear nature of couplings within the system [1]. Also, complexity appears at all levels from the microscopic to the macroscopic [2] in a large variety of man-made and natural systems, as will be discussed throughout this thesis. The study of complex systems was pioneered by, for instance, Ilya Prigogine, who closely connected complexity with nonlinear sciences and chaos.

1.2 Genome biology

The fundamental source of information in a cell is the genome, the complete set of genetic material, stored in deoxyribonucleic acid (DNA). DNA consists of the nucleotide bases adenine (A), cytosine (C), guanine (G) and thymine (T) linked to each other in an ordered sequence. Two strands of DNA bind to each other if there is enough complementarity, pairing adenine on one strand with thymine on the other, and cytosine with guanine. Double-stranded DNA has the famous double helix structure, as discovered by Watson and Crick [3]. In eukaryotic organisms, the genome is organized in a number of separate units called chromosomes. Human beings normally have 22 pairs of autosomal chromosomes and one pair of sex chromosomes, denoted by X and Y. One copy of each chromosome pair is inherited from the mother, and the other from the father. Autosomal chromosomes are essentially the same for males and females. Males have one copy of X and one of the much smaller Y chromosome; females have two copies of the X chromosome, although only one of the two copies is active [4]. We have natural variations in the genome from other sources, such as mutations and other errors introduced when the genome is copied at each cell division. These variations result in different people having slightly varying genomes, and that is partly responsible for the differences between us. The human genome has been completely sequenced [5, 6], as have the mouse [7] and currently (March 2007) 15 other mammals, most recently the cat, the tree shrew and the hedgehog (http://www.ensembl.org/).

Different sections of DNA in the genome have different functions. Genes are DNA regions that contain information necessary to produce other biomolecules. A gene can go through a process called transcription into ribonucleic acid, RNA, which is chemically only slightly different from DNA, but is transcribed to a single-stranded molecule and contains uracil (U) instead of thymine (T). The transcribed molecules go through various modifications, such as splicing, where the coding parts of a gene are joined and the non-coding parts are cut out. Then the RNA either gets translated into a protein, as in the famous central dogma of biology [8], or performs structural or regulatory functions itself. After translation, the protein will also undergo various modifications (like folding into its correct 3D structure) before it can perform its cellular function. Because every gene can have many coding and non-coding parts that can be combined differently during splicing, one gene can code for many different RNA molecules or proteins. Most of the tasks on a molecular level in the cell are carried out by proteins, and these are also used for building structures or holding cells together.
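As a small illustration of the base-pairing and transcription rules described above, the following Python sketch computes the complementary strand of a DNA sequence and the RNA sequence corresponding to it. The function names and the example sequence are hypothetical and chosen only for illustration; the sketch ignores strand selection, splicing and all other biological detail.

```python
# Illustrative sketch only: function names and the example sequence are
# assumptions, and real transcription is far more involved than this.

# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna: str) -> str:
    """Return the strand that pairs with `dna`, written in its own reading direction."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def transcribe(coding_strand: str) -> str:
    """Return the RNA sequence matching the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


example = "ATGGCC"  # hypothetical sequence
print(reverse_complement(example))  # GGCCAT
print(transcribe(example))          # AUGGCC
```

Taking the reverse complement twice returns the original sequence, which is a convenient sanity check on the pairing table.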

1.3 Gene expression

The process of decoding the information in the DNA to make RNA and proteins is called expression. Simply having a copy of a certain gene does not mean it is fully functional or active in the cell. Regulatory mechanisms control how much RNA is expressed by each gene, and this is not simply an on/off switch. The cell needs to be able to fine tune the levels of the RNA and proteins depending on the situation. Turning down expression when a protein is not needed lowers the metabolic cost of maintaining the full functionality of the system: frequently, this is controlled by regulatory mechanisms in the form of feedback loops that control the expression rate of a gene as a function of the level of the protein it produces. On the DNA level, these regulatory mechanisms are implemented using short sequence motifs that are targets for protein binding. Such DNA binding proteins are called transcription factors (TF) when they function by inhibiting or enhancing the transcription rate of a gene, for instance by introducing structural changes in the DNA surrounding the transcription start sites. The term “transcription factors” is sometimes used for other proteins called co-factors that are involved in transcription regulation without necessarily binding DNA.
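The feedback regulation mentioned above, where the expression rate of a gene depends on the level of its own protein, can be sketched with a one-variable rate equation. The Hill-type repression term, the parameter names and their values below are assumptions made for illustration (a common textbook form), not a model taken from the papers in this thesis.

```python
# Minimal sketch of negative autoregulation, integrated with forward Euler.
# Assumed model (not from the thesis):
#   dp/dt = beta / (1 + (p / K)**n) - gamma * p
# beta: maximal production rate, K: repression threshold,
# n: Hill coefficient, gamma: degradation/dilution rate.

def simulate(beta=10.0, gamma=1.0, K=2.0, n=2, feedback=True,
             dt=0.01, steps=5000):
    """Return the protein level reached after `steps` Euler steps from p = 0."""
    p = 0.0
    for _ in range(steps):
        # With feedback, production falls as the protein accumulates.
        production = beta / (1.0 + (p / K) ** n) if feedback else beta
        p += dt * (production - gamma * p)
    return p


print(simulate(feedback=False))  # approaches beta / gamma
print(simulate(feedback=True))   # settles at a lower level
```

With these assumed parameters the unregulated gene settles at beta/gamma, whereas the self-repressing gene settles at a clearly lower level, which is one way expression can be turned down when less protein is needed.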

We can measure gene expression at different points in the process from gene to RNA to protein. DNA microarray technology is one of the main methods developed in recent years that allows us to measure the levels of many individual RNA transcripts simultaneously in a highly parallel manner. Developments of this technique have also made it possible to study various other quantities by large scale hybridization, such as transcription factor binding to regulatory regions, or genotypes, giving us a fingerprint of the genetic variation of an individual. Recently, techniques such as molecular beacons have been developed for studying fewer types of transcripts, but at higher resolution, in real time and even in vivo. For the work done within this thesis, DNA microarrays have been the main source of data and will be explained in more detail in chapter 2.

Activation or inhibition of transcription by DNA-binding proteins is just one of the many types of regulation on the way from DNA to protein. Regulation of gene expression can also happen on the protein level, for instance by requiring that a protein is phosphorylated or otherwise modified to become active. The genes for many proteins in the cell are transcribed at levels that keep the concentration of the inactive protein constant, but to properly regulate the processes they take part in, the proteins are activated and deactivated by a set of phosphatases and kinases, sometimes even self-regulatory.

1.4 Genetic variation and disease

The genetic variants we have inherited from our parents, along with the ones we gain during our lifetime, can have effects on how biological systems are regulated. In many cases, there will be no effect at either the RNA or the protein level, but sometimes the variant will have a regulatory effect on the efficiency of transcription, or change the sequence of the resulting protein, possibly completely disrupting its folding or activity. This is how genetic variation can cause disease. Studies that link genetic regions to disease have been developed, in which a number of individuals are tested for the disease or a medical condition and their genetic variation is determined at a number of positions along the different chromosomes.

Single nucleotide polymorphisms (SNPs) are genetic variants in which a specific nucleotide basepair is different in different individuals. For example, a SNP may have two alleles: A (with T on the reverse strand) and G (with C on the reverse strand). If the SNP is located on an autosome that has two copies, individuals could be of three different genotypes: AA, AG and GG. Variants where both chromosome copies have the same allele, in this case AA and GG, are called homozygous and AG is heterozygous. In human DNA, there are over 10 million such SNPs that can possibly vary between individuals (11,811,594 reference SNPs are contained in dbSNP release 127, http://www.ncbi.nlm.nih.gov/SNP/). This is one type of genetic variation that can lead to errors in the cellular control systems, as we will explain in Chapter 6.
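The genotype example above can be written out as a short Python sketch that enumerates the unordered allele pairs for a biallelic autosomal SNP and labels each as homozygous or heterozygous. The function names are hypothetical and used only for illustration.

```python
# Illustrative sketch: enumerate genotypes for a SNP on an autosome with
# two chromosome copies. Function names are assumptions, not from the thesis.
from itertools import combinations_with_replacement


def genotypes(alleles):
    """Return all unordered pairs of alleles, e.g. ['AA', 'AG', 'GG']."""
    return ["".join(pair)
            for pair in combinations_with_replacement(sorted(alleles), 2)]


def zygosity(genotype):
    """Homozygous if both chromosome copies carry the same allele."""
    return "homozygous" if genotype[0] == genotype[1] else "heterozygous"


for g in genotypes(["A", "G"]):
    print(g, zygosity(g))
# AA homozygous
# AG heterozygous
# GG homozygous
```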


SNPs are not independent, because of a genetic mechanism where DNA sequences are shuffled within chromosomes. This is called recombination. Because of recombination, two SNPs that are located very close to each other are very likely to always be transferred together from parent to child. This results in genotypes for two close SNPs often being highly correlated. But when the distance between SNPs increases, so does the likelihood of a recombination event happening between them, and the correlation between their genotypes decreases in a population. This phenomenon is called Linkage Disequilibrium (LD). In particular, blocks of SNPs that are very close may be in almost complete LD, and will in practice always change together. Such blocks are called haplotypes [9]. A consequence of this is that we can predict the genotype of a SNP with high accuracy by measuring SNPs that are close to it, and that it may be enough to measure a small number of SNPs, so-called tag SNPs, to fully determine which haplotype variant is present for a block in the genome.
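The correlation between two SNPs described above can be quantified in several ways. The sketch below uses the squared allelic correlation r², a standard LD measure that is not named in the text and is used here as an assumption. The haplotype data are made up for illustration, with alleles coded as 0 and 1.

```python
# Minimal sketch of the r^2 measure of linkage disequilibrium between two
# biallelic SNPs, computed from phased haplotypes. The data are invented
# for illustration; the choice of r^2 is an assumption, not from the thesis.

def ld_r2(haplotypes):
    """haplotypes: list of (allele_at_snp1, allele_at_snp2), alleles coded 0/1.

    Both SNPs must be polymorphic in the sample, otherwise the
    denominator is zero.
    """
    n = len(haplotypes)
    p1 = sum(a for a, _ in haplotypes) / n    # frequency of allele 1 at SNP 1
    p2 = sum(b for _, b in haplotypes) / n    # frequency of allele 1 at SNP 2
    p12 = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p12 - p1 * p2                         # deviation from independence
    return d * d / (p1 * (1 - p1) * p2 * (1 - p2))


# Two close SNPs that are always inherited together (complete LD).
close = [(0, 0)] * 5 + [(1, 1)] * 5
# Two distant SNPs whose alleles are almost independent.
far = [(0, 0)] * 3 + [(0, 1)] * 2 + [(1, 0)] * 2 + [(1, 1)] * 3

print(ld_r2(close))  # 1.0
print(ld_r2(far))    # about 0.04
```

When r² is close to 1, as for the first pair, measuring one SNP is enough to predict the other, which is the idea behind tag SNPs.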

When the human genome was sequenced, it was done on a pool of DNA from different people, leading to a consensus sequence. But in an attempt to map the genetic variation present in humans, the International HapMap project has genotyped almost seven million SNPs in 269 people (release 21a, http://www.hapmap.org/) from four different populations around the world. The resulting data, the first phase of HapMap, was published in 2005 [10], is updated regularly, and has given us deep insight into the population structure and genetic variation around the world.

Micro- and minisatellites are types of genetic variation that consist of highly repetitive sequences scattered throughout the genome, varying from individual to individual. These have been frequently used as markers for PCR-based genotyping or DNA fingerprinting.

Another type of genetic variation that has received much attention lately is copy number variation. This happens when whole blocks of the genome get duplicated, so that a gene may occur in several copies on the same chromosome, a phenomenon more widespread than previously thought [11, 12]. It was shown very recently that autism is linked to de novo copy number variation [13].

1.5 Dynamical systems and noise

Gene regulation on the molecular level is noisy [14]. This is because all chemical reactions are inherently probabilistic. In order for a chemical reaction to proceed, it is first required that the reactants come close enough to each other to react, and then remain there for the duration of the reaction. But molecules move around in their environment, collide with other molecules such as the solvent, and also have internal energy in rotational and vibrational states. This is the background for diffusion and the Brownian motion of particles. For a gene to be transcribed into RNA, the molecular transcription machinery needs to bind to the promoter region of the gene, and stay for as long as it takes to produce a full transcript, and this binding mechanism itself is probabilistic. These sources of variation on the input will lead to varying concentrations of the product molecules.

To study biological systems at this low level, we can use quite detailed theoretical tools for modelling and analysis. Mathematical theory for dynamical, nonlinear and stochastic systems has been thoroughly developed and applied to problems in physics and engineering. In this thesis, we use such methods to study a bistable model system that flips between two different states depending on system noise and external signals. This is the same structure as that of a genetic switch, one of the fundamental building blocks of gene regulatory networks [15].
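To make the idea of a noise-driven bistable system concrete, the sketch below simulates a generic overdamped particle in a double-well potential with a weak periodic signal and Gaussian white noise, using the Euler-Maruyama scheme. The specific equation, parameter names and values are assumptions chosen as a common textbook example; they are not claimed to be the model studied in Paper I.

```python
# Illustrative sketch, assuming the generic double-well system
#   dx = (x - x**3 + A*sin(omega*t)) dt + sqrt(2*D) dW
# with stable states near x = -1 and x = +1. Not the thesis's own model.
import math
import random


def simulate_switch(noise=0.5, amplitude=0.0, omega=0.1,
                    dt=0.01, steps=20000, seed=0):
    """Return the number of transitions between the two wells."""
    rng = random.Random(seed)      # fixed seed for reproducibility
    x, state, switches = 1.0, 1, 0
    for i in range(steps):
        t = i * dt
        drift = x - x ** 3 + amplitude * math.sin(omega * t)
        # Euler-Maruyama step: deterministic drift plus scaled Gaussian noise.
        x += drift * dt + math.sqrt(2.0 * noise * dt) * rng.gauss(0.0, 1.0)
        # Count a switch only once x is well inside the opposite well,
        # so that jitter around the barrier at x = 0 is not counted.
        if state == 1 and x < -0.5:
            state, switches = -1, switches + 1
        elif state == -1 and x > 0.5:
            state, switches = 1, switches + 1
    return switches


print(simulate_switch(noise=0.0, amplitude=0.2))  # weak signal alone: no flips
print(simulate_switch(noise=0.5, amplitude=0.2))  # with noise: the state flips
```

In this sketch the weak signal alone is too small to push the system over the barrier, so transitions between the two states occur only when noise is present.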

1.6 Outline

Although the research papers included in this thesis represent in-depth studies of different important topics in this field, this summary does not attempt to cover all these parts in depth; that would take a rather thick book. Instead, I will try to describe the research field from a wide viewpoint. I believe it is important to understand that, although in-depth research on specific levels of complexity will drive the advancement of our understanding of these fields, it will be very beneficial to our understanding of biology and living systems when we can properly integrate knowledge about what happens at all levels of complexity, and how subsystems interact between these levels. To place the five papers in this thesis into context, I will review state-of-the-art science and measurement technology concerning each of the three levels of complexity described above and the horizontal and vertical regulation within and between these levels, and show how the five papers have contributed to or drawn upon previous knowledge. These review chapters will be followed by a chapter summarizing each of the five papers, with the main methods and results described and put into the context given by the previous chapters.


2. Measurement

Our knowledge of complex biological systems is driven by advancements in measurement technology. Detection systems with increased sensitivity and improved assay chemistry now let us measure the abundance of biomolecules in a cell down to single-molecule precision, in single cells, and in real time. At the other end of the spectrum of technologies, we can now gain insight into genomic systems on a massively parallel scale. DNA microarrays measure the concentrations of up to hundreds of thousands of target DNA types in parallel, by hybridization to an array with high-density immobilised DNA spots, each probing a specific target sequence [16]. Similar systems have been developed to detect genetic variation, such as genotypes for hundreds of thousands of SNPs in parallel on a single chip, or to scan for protein-protein interactions on a genome-wide level. Microarray techniques have also been developed to identify variants of alternative splicing with exon-specific arrays, and to test copy number variation with comparative genomic hybridization (CGH) arrays.

Papers II-IV in this thesis primarily make use of DNA microarray technology to reverse engineer gene regulatory networks, infer functional relationships between genes, and find regulatory mechanisms for genes and pathways. In Paper V, genome-wide genotyping technology is used to scan the genome for association between SNPs and type 2 diabetes.

In this chapter, these measurement technologies are reviewed, along with other techniques that have recently been developed and are of interest for the study of complex biological systems at different levels of complexity.

2.1 Gene expression microarrays

Gene expression microarrays measure individual transcript levels, giving us a “fingerprint” of the gene expression in a sample. By comparing levels of individual transcripts between measurements taken for samples from different biological or experimental conditions, we can analyse which genes are expressed at different levels under these conditions. This in turn tells us about the systems that are active in defining the tested state, or in responding to changes in external conditions. We can use these data to infer knowledge about regulation on all levels of detail in a complex biological system:

1. Measuring dynamics of gene regulation on the level of promoter control in fundamental units, such as gene switches;


2. Finding which pathways are up- and downregulated in different system states, and inferring function of unknown genes;

3. Inferring regulatory networks between genes on a genome-wide level.

With the vast amount of data they can generate, and the possibility to measure expression levels in a genome-wide manner, gene expression microarrays are invaluable for studying gene regulation.

Arrays are constructed either by spotting or by on-chip synthesis. Spotted arrays are constructed by using an inkjet-like technology to spot the probe DNA material onto a slide [17, 18]. The slide is most commonly made of glass or a polymer, and the probe material can be cDNA or oligonucleotides, with each spotted probe targeting a specific reporter sequence. Most custom-made chips in academia are manufactured this way, as are chips manufactured commercially by Agilent. Arrays can also be made by synthesizing the oligonucleotide probes directly on the chip. Affymetrix Inc. (http://www.affymetrix.com/) manufactures the most widespread solution, the so-called GeneChip technology, which uses a photolithography technique to synthesize the probes.

Array experiments are done by preparing RNA extracts from a sample, labelling the RNA with a fluorophore, and then hybridizing the labelled extract to an array. In one-channel experiments the extract is prepared from a single sample, whereas in two-channel experiments a test sample and a reference sample are prepared in parallel, the RNA from the test sample being labelled with either green (Cy3) or red (Cy5) dye, and the RNA from the reference sample labelled with the other dye. During hybridization, the individual RNA types bind to the spot targeting their sequence. The higher the concentration of an RNA in the sample, the more of it will hybridize to the probe on the chip, and the higher the intensity of the fluorophore (Fig. 2.1).

Both one- and two-channel microarray data need to be preprocessed before they can be properly analysed. The first step is scanning the array, in which fluorescence intensities are measured for the channels detecting at the fluorescent wavelength of each dye used. The resulting data are normally stored as a TIFF image, which in turn is analysed in three steps. The first step is grid placement, which aligns the grid of expected spots with the image, adjusts for skewed or misaligned spotting, and finds the rough location of spots. The second step is segmentation, which employs an algorithm to find the exact shape of each spot and decides which pixels are in the spot and which are outside it. The final step is spot intensity estimation, where a model for the spot is used to integrate the intensity in it, which is a function of the amount of bound target RNA.
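The last of the three image-analysis steps can be illustrated with a minimal sketch. It assumes a fixed-circle segmentation and a median-based summary with a local background ring; the function name `spot_intensity`, the `ring_width` parameter and the toy image are hypothetical and chosen only for illustration. Actual image-analysis software may use adaptive segmentation and a fitted spot model instead.

```python
from statistics import median


def spot_intensity(image, center, radius, ring_width=2):
    """Background-corrected intensity for one spot.

    image  : 2-D list of pixel intensities (rows of numbers)
    center : (row, col) of the spot centre, e.g. from grid placement
    radius : spot radius in pixels (a fixed-circle segmentation)
    """
    r0, c0 = center
    foreground, background = [], []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            dist2 = (r - r0) ** 2 + (c - c0) ** 2
            if dist2 <= radius ** 2:
                foreground.append(value)   # pixel inside the spot
            elif dist2 <= (radius + ring_width) ** 2:
                background.append(value)   # ring just outside the spot
    # The median is a robust summary; it is an assumption of this sketch.
    return median(foreground) - median(background)


# Toy 7x7 image: background level 10, a bright 3x3 spot of level 110.
image = [[10] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        image[r][c] = 110

signal = spot_intensity(image, center=(3, 3), radius=1)
```

Using the median rather than the mean makes the estimate less sensitive to a few bright pixels, such as dust or the corners of a neighbouring spot, that fall inside the background ring.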

Figure 2.1: Hybridization of target DNA to probes on an Affymetrix chip. Panel labels: RNA fragments with fluorescent tags from the sample to be tested; RNA fragment hybridizes with DNA on the GeneChip® array. Figure courtesy of Affymetrix.

The resulting data consist of one intensity per spot and channel, along with estimated background intensities. These data have to go through preprocessing steps, where the intensities are normalized. Normalization compensates for technical variation among arrays, so that observed differences are due to the experimental factors tested, not due to variation in scanner intensity, labelling efficiency, hybridization variation etc. The notion of “housekeeping genes”, genes that are always expressed under any environmental condition, was initially used to normalize arrays, but has been shown to work poorly, since even these genes are expressed at highly variable levels [19]. Other methods include normalizing arrays to keep the total intensity constant across chips, or scaling intensities to a constant mean or median intensity. Spike-in controls can also be used [20], as can methods aligning the array-wise distributions of intensities across chips, such as quantile normalization. The preferred way of normalizing Affymetrix data is by log-scale robust multi-array analysis, RMA [21], which includes a quantile normalization step. Intensities are log-transformed in most normalization techniques in order to make the data more normally distributed, and also to transform errors to be additive instead of multiplicative [22].
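The log transformation and the quantile normalization step can be sketched as follows. This is a simplified illustration and not the RMA procedure itself, which also includes background correction and probe-set summarization; the function name `quantile_normalize` and the intensity values are made up for the example.

```python
import math


def quantile_normalize(arrays):
    """Quantile-normalize equally long intensity lists (one list per array)."""
    n = len(arrays[0])
    # For each array, the probe indices ordered from lowest to highest value.
    orders = [sorted(range(n), key=lambda i: a[i]) for a in arrays]
    # Reference distribution: mean of the k-th smallest values across arrays.
    reference = [
        sum(a[order[k]] for a, order in zip(arrays, orders)) / len(arrays)
        for k in range(n)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]  # ties are broken by probe order
        normalized.append(out)
    return normalized


raw = [
    [120.0, 80.0, 1500.0, 400.0],   # array 1
    [260.0, 150.0, 3100.0, 700.0],  # array 2, brighter overall
]
log_data = [[math.log2(v) for v in a] for a in raw]
norm = quantile_normalize(log_data)
```

After this step every array has exactly the same distribution of values, while the rank order of the probes within each array is preserved.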

After normalization, intensities (or for two-colour arrays, log-ratios) can be used to recognize patterns of similar expression for a group of genes under shifting environmental conditions, using clustering or classification techniques, or to determine which genes are significantly differentially expressed between different conditions. Clustering or unsupervised classification techniques, such as hierarchical clustering, were developed early [23], and are part of almost any microarray analysis. Care has to be taken when drawing conclusions from such analysis though, since even random data will produce clusters. Classification methods have been quite successful and have for instance been used to distinguish between different tumour classes [24, 25, 26]. Recently one such method, MammaPrint, which is based on expression profiles of 70 genes, has even been approved by the Food and Drug Administration (FDA) for determining the risk of recurrence of breast cancer. This approval is quite controversial and has been criticized for being premature and insufficiently validated, particularly because of the amount of variation between different samples [27]. It is worth pointing out that microarray analysis has many steps where noise can be introduced, and there are still no successful ways of combining data across different platforms, experiments, or laboratories, even though standardization efforts are strong.

Normalization techniques and methods for clustering, classification and differential expression have been extensively developed during the last decade and are reviewed in [28, 29, 30, 31].

Popular methods for determining differential expression between two groups of arrays, measuring gene expression under different treatments or experimental conditions, include the classic t-test (or ANOVA, if we want to test for the effect of more variables and include interactions between variables), significance analysis of microarrays (SAM) [32], and local-pooled error tests [33].
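As a small illustration of the simplest of these approaches, the sketch below computes a per-gene t-statistic between two groups of arrays and ranks genes by its magnitude. It uses Welch's variant of the t-test (unequal variances allowed), which is a choice made here rather than the method of any cited paper; the gene names and log-intensities are invented. In a genome-wide setting the statistics would be converted to p-values and corrected for multiple testing, which is not shown.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for one gene (unequal variances allowed)."""
    se2 = variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    return (mean(group_a) - mean(group_b)) / math.sqrt(se2)


# Hypothetical normalized log2 intensities: (control arrays, treated arrays).
expression = {
    "geneA": ([8.1, 8.3, 7.9, 8.2], [10.0, 10.4, 9.8, 10.1]),
    "geneB": ([6.0, 6.4, 5.8, 6.1], [6.1, 5.9, 6.3, 6.0]),
}

t_stats = {gene: welch_t(a, b) for gene, (a, b) in expression.items()}
# Genes with the largest absolute t-statistic come first.
ranked = sorted(t_stats, key=lambda gene: abs(t_stats[gene]), reverse=True)
```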

2.2 Genotyping

Technologies for highly parallel genotyping have been developed recently [34, 35], allowing us to test for genetic variation across the whole genome for a large number of samples. In this section, I will describe the technology used in Paper V, the Illumina BeadChip system. Other technologies exist and are being employed in genome-wide genotyping, such as the Affymetrix Mapping Array Set [36, 37], which consists of two GeneChip arrays, together mapping up to 500,000 SNPs.

2.2.1 Genotyping with Illumina BeadChip system

When constructing microarrays, spots are normally placed on the substrate array in given locations, and two arrays always have the same probe type in the same position on the chip. Illumina (http://www.illumina.com/) has developed a BeadChip technology which, in contrast, is based on oligonucleotide probes bound to 3 µm-diameter beads. The beads are allowed to randomly find locations on the substrate, which has a pattern of etched microwells with 5 µm center-to-center spacing [38]. Using a genetic “barcode”, where each bead type carries an oligonucleotide with a unique sequence, the specific type of bead can be located after assembly and a bead location ↔ probe type map can be constructed [39]. The beads are linked to 75-mer oligonucleotides, of which 25 bases are used for the barcode, and 50 bases are cross-complementary to a region of genomic DNA flanking the SNP to be tested. Each bead type is represented approximately 30-fold on the array (Fig. 2.2). The Infinium I design has one such oligonucleotide per bead, with the 3’-end selected as one of the two allele variants for the SNP to be tested. Two beads are required, one for each allelic variant of the SNP. The Infinium II design has two probes per bead, one testing an A or T variant, the other testing a C or G variant. It can thus double the density on the chip compared to Infinium I, but cannot be used for A/T or C/G SNPs.

Figure 2.2: The Illumina Human1 100k BeadChip contains 12 sections of 288,000 bead types, testing 144,000 loci. Each section has 890,000 features, allowing an approximately 30-fold redundancy across the chip. Picture courtesy of Illumina.

The Infinium assay has been designed for the BeadChip technology [40]. First, genomic DNA is amplified to a concentration of 2-3 pM, followed by hybridization to the array. During hybridization, the strand which is complementary to the immobilized oligonucleotide will bind. In the Infinium I assay, where two beads per SNP are used, the final base of the bead that corresponds to the right allele type will match, but the other bead will have a mismatch in the same position. A polymerase extension step follows which incorporates a chain of biotin-labelled nucleotides, but the reaction can only proceed at the bead where the genomic DNA and the oligonucleotide match perfectly. If the sample is homozygous for the tested SNP, we will only have one of the two bead types bound (in theory), whereas for a heterozygote, the two bead types will be bound in roughly equal amounts. This allows us to identify the genotype (AA, AB or BB) present at the interrogated SNP location in the sample.


In the Infinium II assay, each bead interrogates both allelic variants and the extension reaction incorporates either an A or T nucleotide, labelled with red dye, or a C or G nucleotide, labelled with green dye. The readout will be either two red, two green or one green and one red, identifying the genotype of the SNP in the tested sample. The Infinium II assay can be employed on BeadChips using the Infinium I design as well.
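The principle of reading a genotype from the two signals can be sketched with a simple threshold rule on the fraction of the total intensity that comes from one channel. This is only an illustration: the function name `call_genotype`, the `het_window` bounds and the `min_signal` cut-off are hypothetical values chosen for the example, and genotype-calling software typically clusters the intensities of many samples per SNP rather than applying fixed thresholds.

```python
def call_genotype(red, green, het_window=(0.25, 0.75), min_signal=200.0):
    """Call AA, AB or BB for one SNP from two summarized channel intensities.

    red, green : intensities reporting allele A and allele B, respectively
    het_window : hypothetical bounds on the B fraction for heterozygotes
    min_signal : hypothetical minimum total intensity required for a call
    """
    total = red + green
    if total < min_signal:
        return "NC"                  # no call: too little signal
    b_fraction = green / total       # 0 -> only allele A, 1 -> only allele B
    low, high = het_window
    if b_fraction < low:
        return "AA"
    if b_fraction > high:
        return "BB"
    return "AB"


calls = [
    call_genotype(5000.0, 300.0),    # mostly red
    call_genotype(2400.0, 2600.0),   # roughly equal signals
    call_genotype(200.0, 4800.0),    # mostly green
    call_genotype(50.0, 60.0),       # weak signal in both channels
]
```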

In Paper V, the Human1 and HumanHap300 chips were used, and SNPs were selected for those two chips with two entirely different strategies. The Human1 chip tests 109,365 SNPs, using a gene-centric approach based on the Infinium I design. 71% of the SNPs on this array are located in transcripts or within 10 kb of an exon, and about half of the remaining SNPs are located in highly conserved regions. The HumanHap300 chip tests 317,503 SNPs and is built using an Infinium II design, with SNPs selected to tag haplotype blocks based on Phase I HapMap data. SNPs within 10 kb of genes or highly conserved regions were selected using an r² threshold of 0.8. Outside these regions, an r² threshold of 0.7 was used. Additionally, over 7,000 nonsynonymous SNPs were added.
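The r² measure used in the tag SNP selection is the squared correlation between the alleles at two SNPs. The sketch below computes it from phased haplotype counts using the standard definition r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), where D = p_AB − p_A p_B. The function name `ld_r2` and the haplotype counts are invented for the example; this is not the selection procedure used to design the chips.

```python
def ld_r2(hap_counts):
    """r^2 between two biallelic SNPs from phased haplotype counts.

    hap_counts maps the haplotypes 'AB', 'Ab', 'aB', 'ab' to their counts,
    where A/a are the alleles of the first SNP and B/b those of the second.
    """
    n = sum(hap_counts.values())
    p_ab = hap_counts.get("AB", 0) / n                                # haplotype A-B
    p_a = (hap_counts.get("AB", 0) + hap_counts.get("Ab", 0)) / n     # allele A
    p_b = (hap_counts.get("AB", 0) + hap_counts.get("aB", 0)) / n     # allele B
    d = p_ab - p_a * p_b                                              # coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


perfect = ld_r2({"AB": 60, "ab": 40})                          # alleles always co-occur
partial = ld_r2({"AB": 50, "Ab": 10, "aB": 10, "ab": 30})
tagged = partial >= 0.8   # would the first SNP tag the second at r^2 >= 0.8?
```

A pair in perfect linkage disequilibrium gives r² = 1, so one SNP fully predicts the other; in the second example r² is well below 0.8, and both SNPs would have to be genotyped to capture the variation.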

The density of the arrays is increasing rapidly and an Illumina BeadChip designed to test 1 million SNPs is scheduled for release in the first half of 2007. Illumina has also developed other chips based on the same technology, used for assaying other kinds of genetic variation, such as copy number variation and loss of heterozygosity.

2.3 DNA-binding proteins

Microarrays can also be used for detecting DNA sites bound by proteins, using the so-called “ChIP on chip” technique, where a chromatin immunoprecipitation is followed by a DNA chip analysis [41, 42, 43, 44] (Fig. 2.3). Proteins are crosslinked to genomic DNA in the sample, and the protein of interest, linked to the DNA of the genomic regions it is bound to, is pulled down by immunoprecipitation with specific antibodies. The crosslinks are reversed and the DNA is hybridized to a microarray where each spot contains DNA of a specific genomic region, allowing us to determine the regions the protein was bound to when the sample was taken. Initially, such arrays were constructed only from promoter regions, but with the development of higher-density technologies, arrays covering whole chromosomes have been constructed [45].

Figure 2.3: Assaying protein-DNA binding with ChIP-on-chip analysis. DNA-binding proteins are crosslinked, and after fragmenting the DNA, the proteins are immunoprecipitated and crosslinks reversed. The recovered DNA is tagged and hybridized to an array of genomic regions, so that the regions that were bound by the protein can be identified. Figure by Thomas Hentrich (Wikipedia).

2.4 Protein interactions

Measuring protein interactions in high-throughput is quite different from doing so for DNA. While DNA is measured through hybridization with a cross-complementary strand, proteins are detected with antibodies or analysed with mass spectrometry. There are several factors making this much more complicated than DNA hybridization. First, if we know the sequence of a target DNA strand, it is easy to construct a probe binding it through simple nucleotide complementarity, which will bind in a highly specific manner as long as the target sequence is long enough. Some nonspecific binding will occur only if we have few sequence mismatches, and have them in non-critical locations in the sequence. For a protein, we cannot develop a specific antibody just by knowing what the target looks like, but have to employ costly and time-consuming techniques with immunization in animals, production of new cell lines, selection and purification steps. Also, whereas DNA hybridization binding energies and cross-hybridization are fairly easy to predict, proteins can show significant and hard-to-predict non-specific binding. Because of this, high-density arrays for specific protein binding, similar to DNA microarrays, have developed more slowly than their DNA counterparts. The most significant high-throughput results in proteomics have so far been gathered using mass spectrometry techniques, but array technology is maturing [46, 47].

2.5 Single-molecule measurement techniques

Recent developments in measurement technology include techniques with a very high resolution and detail, as well as techniques that are massively parallel and high-throughput. Such high-resolution techniques enable us to study regulation of single genes and transcript levels in single cells, even where transcripts are present only in very few copies, down to single molecules. Single-molecule measurements have enormous potential for the study of regulation of complex biological systems at the molecular level [48]. One such technique to measure RNA is molecular beacons [49], which are single-stranded probes containing a fluorophore and a quencher molecule that suppresses the fluorescence when in proximity to the fluorophore. In their unbound state, they form a hairpin with a sequence complementary to the target sequence in the loop, and with the quencher and the fluorophore at opposite ends of the probe, coming close only while the hairpin is formed. When the probe is bound to the target sequence, the hairpin is opened and the distance between quencher and fluorophore becomes so large that the probe can fluoresce without being quenched. To avoid being cleaved by nucleases, molecular beacons used for imaging in living cells are often made from chemically modified DNA, such as peptide nucleic acid (PNA) [50]. An enhancement to this technique carries two different fluorophores per probe, and will fluoresce at one wavelength in its unbound hairpin state, and at a different wavelength in its bound state [51]. These techniques have an advantage compared to standard techniques in molecular biology, where the expression of a system is monitored by using fluorescent reporter proteins such as Green Fluorescent Protein (GFP) or Yellow Fluorescent Protein (YFP). These proteins normally have a long maturation time, since they have to undergo translation and folding after being transcribed together with the gene of interest. Also, after production, they diffuse into the cytoplasm quickly, which makes detection harder. A technique to improve this is to fuse the YFP with a membrane protein, so that it locates to the membrane after production and can be detected with single-molecule sensitivity [52].


3. Low-level signals and noise

Signals on the molecular level control the fundamental units of cellular function, the genes. The binding events that transduce signals at this level are stochastic by nature, because of diffusion and Brownian motion of molecules. The time two molecules are bound together, for instance during the initiation of transcription, is probabilistic due to the random distribution of energies in a population of molecules, and only events where this time is long enough for the chemical bonds to form and the whole reaction to complete will have a result. The low number of some of the regulatory proteins (present in only tens or hundreds of molecules in a cell) also contributes to the fluctuations of gene expression, since every individual production or destruction of a molecule affects a relatively large fraction of the total number available.
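To get a feeling for the size of this effect, a rough estimate can be made under the assumption (introduced here for illustration only) that the copy number $n$ of a regulatory protein is Poisson distributed with mean $N$, so that the variance equals the mean. The relative size of the fluctuations, the coefficient of variation, is then:

```latex
\[
\mathrm{CV} = \frac{\sqrt{\operatorname{Var}(n)}}{\langle n \rangle}
            = \frac{\sqrt{N}}{N} = \frac{1}{\sqrt{N}},
\qquad
N = 10 \;\Rightarrow\; \mathrm{CV} \approx 0.32,
\qquad
N = 1000 \;\Rightarrow\; \mathrm{CV} \approx 0.03 .
\]
```

Under this assumption, a protein present in tens of copies fluctuates by tens of percent, whereas fluctuations shrink to a few percent when thousands of copies are present.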

We can use fairly detailed mathematics when modelling low-level biological circuits, for example the formalism of stochastic dynamical systems. One of the most basic building blocks of gene regulatory systems is the genetic switch, consisting of two genes [15]. When one of the genes is expressed, the other one is not, and vice versa. Such bistable systems are very important in nature, and have interesting signal processing capabilities. For instance, noise has a non-trivial effect on measurements of system response, as in the stochastic resonance phenomenon, which is studied in detail in Paper I. This chapter reviews control mechanisms active at the lowest level of biological systems, where single genes and proteins affect each other, and where we model the regulatory circuit without knowing the whole system around it.

3.1 Control of gene expression

The DNA in our cells is stored as chromatin, a complex held together by proteins. DNA wraps around nucleosomes, particles of eight histone proteins. Gene expression is predominantly controlled by the interactions of DNA-binding proteins with regulatory DNA regions, such as promoter regions directly upstream of a transcription start site, or enhancer elements that can be located far from the transcribed region. Transcription of a gene requires the assembly of an RNA polymerase holoenzyme complex, and involves changes in the chromatin structure to allow the transcription complex to proceed [53].


3.1.1 DNA motifs as signals for protein binding

Transcription of each gene is regulated by one or a combination of several transcription factors (TFs), each binding a DNA motif, typically 5-15 bp long, upstream of the transcription start site. These proteins can be either activating or repressive, and recruit cofactors that change the structure or chemical properties of the chromatin or the DNA chain itself around the transcription start site. This affects the assembly and binding of the RNA polymerase holoenzyme, and thus controls the transcriptional activity of the gene [54]. We can identify TF binding sites by a variety of methods, for example the classic “footprinting” method, which is based on the idea that protein binding shields the DNA from cleavage by DNases. Another method is chromatin immunoprecipitation, where proteins are crosslinked to DNA, the DNA is fragmented, and the fragments bound by the protein of interest are pulled down by an antibody specific to that protein. An enhancement of this is the previously described “ChIP-on-chip” technique. The positioning of the chromatin proteins in the nucleosomes, along with modifications of chromatin such as DNA methylation and acetylation of histones, also greatly impacts the transcriptional activity [55]. Little has been known about genome-wide structure and modifications of promoters, but recently, the first whole-genome map of DNA methylation was published for the plant A. thaliana [56], and the nucleosome positioning in chromatin structure was mapped for 3,692 promoters in the human genome [57]. Transcriptional activity can also be affected by regulatory RNA [58, 59], or by events on promoters of nearby genes, so-called transcriptional interference [60].

We can predict TF binding computationally, for instance by looking for DNA motifs that are over-represented in promoter regions for pre-defined groups of genes. This was the method of choice in yeast, since promoter regions are easy to find and are fairly short. Gene expression profiles have been used to find groups of genes that are coexpressed, indicating that they may also be regulated by the same set of transcription factors [61]. In higher organisms, the most successful methods are based on the assumption that regulatory elements are conserved across species [62], and are grouped together in cis-regulatory modules [63]. There is experimental evidence showing that regulatory sites are indeed located in regions more conserved between species than would be expected by chance [64]. Sites and the variation of bases in TF binding motifs are represented either as two-dimensional matrices containing the frequency of each base at each position in the motif [65], or as sequence logos (as used in Paper III), where the four possible base letters are stacked on top of each other at each position, the height of each letter proportional to the information content [66, 67].
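The two motif representations mentioned above can be sketched in a few lines. The example builds a position frequency matrix from a set of aligned binding sites and computes the information content per position as 2 bits minus the Shannon entropy, which determines the stack height in a sequence logo. The binding sites are made up, the function names are illustrative, and the small-sample correction used by logo software is left out.

```python
import math


def position_frequency_matrix(sites):
    """Per-position base frequencies for aligned, equal-length binding sites."""
    length = len(sites[0])
    matrix = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        matrix.append({base: column.count(base) / len(sites) for base in "ACGT"})
    return matrix


def information_content(freqs):
    """Information in bits at one position: 2 minus the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return 2.0 - entropy


sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]   # made-up example sites
pfm = position_frequency_matrix(sites)
bits = [information_content(column) for column in pfm]
# Letter height in a logo: base frequency times the position's information.
heights = [
    {base: freq * ic for base, freq in column.items()}
    for column, ic in zip(pfm, bits)
]
```

A fully conserved position carries 2 bits, while a position where all four bases are equally frequent carries none, so the logo visually emphasizes the positions that matter most for binding.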

The control mechanisms for a genetic system can change between species, while the actual logic of the circuit is preserved through evolution. An example is mating specificity in yeasts: cells come in two types (a and α), which mate by fusing with each other. a-cells express a specific set of genes, whereas α-cells express a different set. In a-cells, the α-genes are silent, and vice versa. The two yeast species Saccharomyces cerevisiae and Candida albicans both have this type of mating specificity, but the two genetic systems have evolved so that they are controlled by entirely different molecular mechanisms. In spite of this, the logic of the circuit remains: a-cells express a-genes, but not α-genes, and vice versa [68].

3.2 Synthetic biology and randomness in gene expression

The advances in molecular biology and sensitive measurement techniques have made it possible to study the dynamics of genetic control elements directly. We know that the network of all genes working together in a cell is built up from smaller pathways and modules, which in turn are built up from smaller regulatory units of two or three genes coupled together to perform specific tasks with certain dynamics. These smaller regulatory units are built up from the genes encoding proteins, and promoter regions that contain regulatory information in the form of short DNA motifs that transcription factors can bind to. This field is now advancing rapidly, and it is even possible to use this knowledge from an engineering perspective: to design genetic control circuits that perform specific tasks and insert such circuits into living cells [69, 70].

The rapid advancement of synthetic biology has been much facilitated by more detailed measurement techniques, as discussed in Chapter 2, which have given more detailed insights into the signal transduction and various forms of noise in genetic control circuits [71, 72, 73, 74, 75, 76]. This noise can come from sources within the cell (intrinsic), such as the probabilistic nature of chemical reactions and Brownian motion of molecules, as well as the limited number of regulatory molecules and the spatial arrangement of chromatin [77]. Extrinsic noise comes from external signals, such as variability within the population of cells, the dynamics of which is in itself controlled by gene expression in the individual cells [78]. The effect of noise in a regulatory unit has been found to depend on the timescale of the fluctuations, so that fast fluctuations, such as are typical of intrinsic noise, have less effect than slower ones, such as those coming from external factors [79].

3.3 Signal transduction in stochastic bistable systems

Bistable systems, which flip between two different stable states, are common both in man-made systems and in nature. Paper I describes the analysis of transduction of signals and noise through a bistable system, a Hopfield neuron model. The same class of model can be used to describe genetic switches, and we find similar dynamical features in these. Such genetic switches are fundamental units in naturally occurring genetic systems, and the building blocks of pathways. Completely artificial bistable genetic systems can be designed from scratch [80].

For an illustration of how bistable systems work, consider a bistable physical system described by a potential of the type

$$V(x) = \frac{b}{4}x^4 - \frac{a}{2}x^2, \tag{3.1}$$

where $a$ and $b$ are positive constants. Assume a particle resides in such a system, and that $x$ represents its state. Then the force acting on it is

$$F(x) = -\frac{dV}{dx} = ax - bx^3, \tag{3.2}$$

with two stable fixpoints at the energy minima $x_s = \pm\sqrt{a/b}$ and one unstable fixpoint at $x = 0$. This potential has the shape of a double well, where the two wells are separated by a barrier of height $\Delta V = a^2/4b$. The system has two stable states, but signals or noise act as external forces on a particle and can make it jump over the barrier and towards the other energy minimum. This is a typical toggle switch, and bistable systems are very commonly found in nature. The dynamics of the switching events can be derived knowing the potential function, work that was pioneered by Hendrik Kramers in 1940.

He was studying chemical reaction kinetics, where reactant and product states are separated by an energy barrier. The reactants need to cross the energy barrier for the reaction to take place and form the products. Particles in this double-well potential are affected by the force from the potential and random forces from the solvent, damped by a linear friction. This gives the Langevin equation

$$m\frac{d^2x}{dt^2} = -\frac{dV}{dx} - m\gamma\frac{dx}{dt} + N(t), \tag{3.3}$$

where $m$ is the particle mass, $\gamma$ is the damping coefficient (related to the diffusion coefficient $D$ by $\gamma = k_BT/D$), and $N(t)$ is a Gaussian white noise term with zero mean and variance $2m\gamma k_BT$. Let the angular frequency of the potential be $\omega_b$ at the top of the barrier, with $\omega_b^2 = |V''(x_b)/m|$, and $\omega_0$ at an energy minimum, with $\omega_0^2 = |V''(x_0)/m|$. Then, in the case of overdamping, $\gamma \gg \omega_b$, the rate of barrier crossings between the minima is given by the Kramers rate [81]:

$$r_K = \frac{\omega_0\omega_b}{2\pi\gamma}\exp\left(-\frac{\Delta V}{D}\right). \tag{3.4}$$
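As a worked illustration, the Kramers rate can be evaluated for the quartic potential (3.1). The sketch below assumes unit mass, $m = 1$, and takes $D$ to be the noise intensity that appears in the exponent of (3.4); both are simplifying assumptions made here for illustration.

```latex
\begin{align*}
V''(x) &= 3bx^2 - a, \\
\omega_b^2 &= |V''(0)| = a, \qquad
\omega_0^2 = |V''(\pm\sqrt{a/b})| = 2a, \\
\Delta V &= V(0) - V(\pm\sqrt{a/b}) = \frac{a^2}{4b}, \\
r_K &= \frac{\sqrt{2}\,a}{2\pi\gamma}\exp\left(-\frac{a^2}{4bD}\right).
\end{align*}
```

The switching rate thus falls off exponentially as the barrier height $a^2/4b$ grows relative to the noise intensity.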

The “physical” view of a bistable system described with a potential function can be easily transferred to a stochastic differential equation (SDE) representation that contains a drift term and a diffusion term, of the form

$$dX_t = F(X_t)\,dt + \sigma\,dW_t, \tag{3.5}$$

where $\sigma^2$ is the variance of the noise, and $W$ is a Wiener process (Brownian motion) [82]. We use an SDE description of the system in Paper I.
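A minimal simulation sketch of such an SDE is given below, using the Euler-Maruyama scheme with the drift $F(x) = ax - bx^3$ that follows from the double-well potential (3.1). The parameter values, the step size, the function name `simulate_double_well` and the rule for counting a switch are all choices made for this illustration and are not taken from Paper I.

```python
import math
import random


def simulate_double_well(a=1.0, b=1.0, sigma=0.5, dt=0.01, steps=200_000, seed=1):
    """Euler-Maruyama integration of dX = (a*X - b*X**3) dt + sigma dW.

    Returns the final state and the number of well-to-well switches.
    """
    rng = random.Random(seed)
    x_min = math.sqrt(a / b)   # location of the stable states
    x = x_min                  # start in the right-hand well
    well = 1                   # +1: right well, -1: left well
    switches = 0
    for _ in range(steps):
        drift = a * x - b * x ** 3
        x += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Count a switch only once the particle is half-way into the other
        # well, so that brief recrossings of the barrier top are ignored.
        if well == 1 and x < -0.5 * x_min:
            well, switches = -1, switches + 1
        elif well == -1 and x > 0.5 * x_min:
            well, switches = 1, switches + 1
    return x, switches


_, quiet = simulate_double_well(sigma=0.1)   # weak noise
_, noisy = simulate_double_well(sigma=0.7)   # strong noise
```

With weak noise the particle stays in the well where it started, whereas stronger noise makes it hop between the two stable states, in line with the exponential dependence of the switching rate on the ratio between barrier height and noise intensity.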


3.3.1 Stochastic resonance

Stochastic resonance (SR) is a phenomenon that can be observed for nonlinear systems, and has raised considerable interest during the past decades [81]. It has most often been described in terms of the signal-to-noise ratio (SNR), which is defined as the energy of the signal over the energy of the background noise. In SR, the SNR at the output has a local maximum as a function of noise strength, meaning that the output SNR can locally increase as the noise strength increases, which is quite counterintuitive. It has been suggested that the addition of noise could improve detectability of weak signals in noise when using nonlinear sensors.

Consider a bistable system, described with a double-well potential such as the one above, and a particle in it. Because of fluctuations and external signals, the particle will move along the potential surface, affected by a force that we can write as

$$F(x,t) = -\frac{dV}{dx} + N_t + S_t, \tag{3.6}$$

where $-\frac{dV}{dx}$ is the force from the potential, $N_t$ a random force from noise, and $S_t$ a force from a signal. If no external signal is present, we can simply put $S_t \equiv 0$. In effect, we can view this as a new system, described by a time-dependent potential $\tilde{V}$ such that

$$-\frac{\partial \tilde{V}}{\partial x} = -\frac{dV}{dx} + S_t. \tag{3.7}$$

Now assume there is a deterministic sinusoidal signal affecting the system, $S_t = A\sin(\Omega t)$. That makes the effective potential function

$$\tilde{V}(x,t) = V(x) - Ax\sin(\Omega t), \tag{3.8}$$

a double-well system where the two wells will change their “depths” periodically (Fig. 3.1).

If the amplitude of $S_t$ is low enough, the effective potential $V(x,t)$ will still have the double-well shape, and a particle trapped in one energy minimum will stay there, unable to pass the barrier between the two wells. But if a noise $N_t$ is also present, random forces can push a particle in a higher-energy local minimum across the barrier and into the global energy minimum. The noise and any external signal will cause hopping between the two stable states, analogously to the case studied by Kramers, and when the periodicity of the signal matches the Kramers rate, we see the stochastic resonance phenomenon, which resembles a classic resonance: a tuning of external frequencies to match frequencies within a system.
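To give a feel for what “low enough” means, take as an illustrative assumption the quartic potential $V(x) = -ax^2/2 + bx^4/4$ with $a, b > 0$ (not necessarily the potential used in Paper I). At the moment of maximal tilt, $\sin(\Omega t) = 1$, the extrema of the effective potential (3.8) satisfy:

```latex
\begin{align*}
\frac{\partial V(x,t)}{\partial x} = -ax + bx^3 - A &= 0, \\
g(x) \equiv bx^3 - ax, \qquad g'(x) = 0 \;&\Rightarrow\; x_\pm = \pm\sqrt{\frac{a}{3b}}, \\
A_c = g(x_-) &= \frac{2a}{3}\sqrt{\frac{a}{3b}} = \sqrt{\frac{4a^3}{27b}} .
\end{align*}
```

The cubic equation $g(x) = A$ has three real roots (two minima and a barrier) only for $|A| < A_c$. Under this assumed potential, a signal with amplitude below $A_c$ therefore cannot move the particle over the barrier by itself, and noise is needed for hopping to occur.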

Stochastic resonance has been seen in both man-made and natural systems, such as the superconducting quantum interference device (SQUID) [83], neurons [84, 85], and many other systems [81]. It is known that the SR response of a single bistable unit can be improved by cooperative effects when coupling


Figure 3.1: The external sinusoidal signal creates an effective system potential where the two wells have different depths, changing with the period of the signal. Figure courtesy of Daniel Asraf.

them in arrays [86, 87, 88]. Synchronization is a classic emergent property of complex systems, and it is not surprising to see improved signal transduction in chains of biological control units coupled in complex networks [89].

An example of the effect noise has in genetic nonlinear control systems is stochastic focusing, which resembles stochastic resonance in that noise has a counterintuitive, “performance improving” effect. In stochastic focusing, the amplification sensitivity in nonlinear biochemical reactions increases with increasing fluctuations on the input [90, 91, 92].

3.4 Effects on gene expression by genetic variation

Genetic variation can have an effect on gene expression by several mechanisms. A SNP located in a transcription factor binding site may be the most


direct example, where a TF may bind specifically to a DNA motif, but not to a motif where one base has been exchanged for another.

A way to find SNPs located in regulatory elements is to study allelic imbalance, the relative expression levels between gene copies located on each of the two homologous chromosomes in individuals heterozygous for the tested SNP [93, 94]. Allelic imbalance can also be explained by differences in chromatin modifications between the two homologous chromosomes. Regulatory SNPs with an allele-specific effect on gene expression have been found in genes associated with, for instance, Alzheimer’s disease [95] and cancer [96].

Genetic effects have been mainly studied in mice, since these have a short generation time, are small but genetically very similar to humans, sharing ∼99% of the genes [7], and can be bred under strictly controlled environmental conditions, allowing us to separate genetic and environmental effects in studies [97]. Expression levels of each gene can also be treated as quantitative traits, “e-QTLs”, and studied in yeast or in strains of mice with defined genetic composition such as recombinant inbred or recombinant congenic strains [98]. Such strains are constructed by a series of backcrosses and inbreeding steps, and then genotyped. In these crosses, linkage analysis is used to find which chromosomal regions are associated with the expression of each gene, and this information can help us elucidate control mechanisms and gene networks [99, 100, 101, 102]. Trans-acting effects are harder to find than ones in cis for technical and statistical reasons [103], but studies done in yeast indicate that such effects are widespread and not necessarily associated with transcription factor activity [104].

Trans-interactions between regions far apart on the same chromosome, or even on different chromosomes, can be investigated further by looking at the spatial arrangement of chromosomes. It is hypothesized that in the nucleus, chromosomes occupy specific locations and can be close enough to interact, so that the expression of genes at one locus may be affected by a locus on a chromosomal region in the proximity [105]. To detect such interactions, the chromosome conformation capture (3C) method has been developed [106], which crosslinks chromosomal regions that are in close proximity to each other. A recent enhancement of that method is the circular chromosome conformation capture, 4C [107]. The technique is still under development but could prove instrumental in detecting long-range intra- and inter-chromosomal interactions.


4. Pathways and biological tasks

The DNA of a eukaryotic cell is located in the nucleus. If an outside event triggers the need for a gene (or a set of genes) to be expressed, a signal is transmitted from the cell surface into the nucleus and to the gene itself. There are many examples of such signalling pathways, and they are the signal transduction chains used when adapting the cell to shifting environmental conditions. Many signalling pathways work using receptors in the cell membrane. An external signal in the form of a molecule, like a hormone, binds to a cell surface receptor. These receptors span the cell membrane, so a binding event on the extracellular side causes effects on the intracellular side, often in the form of activating a kinase, which in turn activates another set of proteins by phosphorylation. Often, chains of such activation events finally target one or more transcription factors, which are transported into the cell nucleus and interact directly with the DNA of the genes they regulate [108].

In this thesis, Paper III concerns the regulation of pathways by transcription factors working in a synergistic manner along a whole pathway, providing regulatory action at almost every step in the chain. Paper IV combines network information to infer the function of genes using a guilt-by-association approach, showing that genes that are linked to highly similar sets of genes in two networks are also likely to interact functionally with each other.

4.1 Pathway control

In biological systems, a whole range of functions needs to be available for the organism to survive, grow, communicate, move, reproduce, and much else. Every organism needs to transform nutritional substances from food into molecules that can be used by the cells to build up other molecules. In order to carry out chains of reactions, pathways of reactions have evolved where the output of one reaction is the input of the next one. In metabolic pathways, reactions are linked by enzymes catalyzing the different steps in the chemical alteration of the metabolite. In signalling pathways, the incoming signal, such as the binding of a hormone to a cell surface receptor, is transmitted through many steps of protein interactions, transportation within the cell, release of small signalling molecules, or binding to nuclear receptors, to finally find its way into the nucleus and control the expression of a gene. In most cellular pathways, as in man-made control circuits, feedback or feed-forward loops are essential to maintain stability and proper function of


stochastic systems. One such example is the circadian clock that keeps certain functions in an organism synchronized with the 24-hour day period, and feedback in the circuit of genes controlling the clock is essential for its function [109].

Pathways can be bistable just as well as the smaller gene circuits, in the sense that when one pathway is fully working, the other one is shut off, and vice versa. An example of two interlocked bistable pathways is the pair formed by the pheromone response and hyperosmolarity pathways in yeast, which share the protein STE11. Sharing this component means that the two pathways crosstalk, since STE11 can activate the downstream target in both pathways at once, unless these are separated spatially or are inhibited by other means. But a mechanism of mutual inhibition downstream of this crosstalk ensures that only one of the two pathways is active at any given time in a given cell. In certain ranges of kinetic parameters, the system is bistable, which can be observed by stimulating the cells with both signals at the same time: only one of the two pathways will be active. Also, if the two signals are applied sequentially, it has been shown that the system is reluctant to leave the state it is already in [110].


5. Large-scale gene networks

We have discussed how genes are regulated at the molecular level, and how low-level gene circuits are constructed and coupled together into pathways that transduce signals or carry out specific tasks in the cell. For a cell to live and adapt to fluctuating environmental conditions, many different tasks have to be coordinated in response to different signals (Fig. 5.1). This requires a parallel regulation of pathways, and a level of coordination above that of single biological processes. These large-scale networks govern how cellular subsystems and genes control each other on a global scale while at the same time coordinating actions on the pathway level [111]. Dynamic and structural properties of these networks have to be adaptable enough to cope with shifting situations at a reasonable metabolic cost. System robustness and error tolerance must be balanced against the energy required to uphold such properties. Modelling of these systems from data also presents a hierarchy of detail, and it is important to choose a level of detail that can be supported by the data and that addresses the right questions [112].

Paper II in this thesis deals with systems at the large-scale level, by inferring a whole-genome gene regulatory network using a large set of systematic gene deletions in yeast, and measuring expression profiles for the perturbed strains. With this, we build a network of ∼6400 genes and analyse large-scale properties such as tolerance against directed attacks, robustness against random errors, biological characteristics of highly linked genes, and the distribution of connectivities. We find that this gene regulatory network is scale-free [113]. In this chapter, I will review theory for large-scale networks, so that the results of Paper II can be seen in the light of results from other areas of network science as well. I will also review other methods of building such genome-wide networks, and the integration of datasets.

Figure 5.1: Cellular processes are coordinated in large complex systems. From the “Biochemical Pathways” chart (Roche Applied Science).

5.1 Topology and structural properties of complex biological systems

Networks are often represented as graphs. A graph can be defined as a tuple $G = (V, E)$, where $V$ is the set of nodes (vertices) contained in the network, and $E$ is the set of edges that link nodes. An edge links two nodes, and can have a direction or be undirected. Directed edges are sometimes called arcs. We can assign labels to the components, so we can identify and distinguish them, and weights on the edges to give them a representation of relative strength. The nodes represent components of the system, and the edges represent links between these components. The (total) degree of a node is the number of edges connected to the node. For directed graphs, we can talk about the indegree, the number of edges directed to the node, and the outdegree, the number of edges leading from the node (Fig. 5.2).
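The degree definitions above translate directly into code. In this minimal sketch a directed graph is stored as a list of edges $(u, v)$; the node labels and edges are hypothetical, chosen so that node A has the in- and outdegree described for Fig. 5.2.

```python
from collections import defaultdict

def degrees(edges):
    """Return {node: (indegree, outdegree, total degree)} for directed edges (u, v)."""
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    nodes = set(indeg) | set(outdeg)
    return {n: (indeg[n], outdeg[n], indeg[n] + outdeg[n]) for n in nodes}

# Hypothetical directed graph: A has four outgoing and two incoming edges.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "A"), ("F", "A")]
print(degrees(edges)["A"])   # (2, 4, 6)
```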

The structure of networks has been studied since 1735, when Leonhard Euler solved the famous problem of the Königsberg bridges. In the Prussian town of Königsberg, a river ran through the town, and there were two islands in the middle of this river. To be able to move between the different parts of the town, seven bridges had been built. The mathematical problem was to find a way to visit all parts of the town by crossing each bridge exactly once. Euler proved that this is impossible. He represented each part of town (the two sides of the river, and the two islands) with nodes, represented the bridges connecting the parts of town with edges between these nodes, and solved the problem with what is now known as graph theory.
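Euler's argument can be checked in a few lines: in a connected multigraph, a walk that uses every edge exactly once exists only if zero or two nodes have odd degree. The land-mass labels below are placeholders introduced for this sketch; the bridge layout follows the classical description of the problem.

```python
from collections import Counter

# One tuple per bridge; "island1" is the island touched by five bridges.
BRIDGES = [
    ("north", "island1"), ("north", "island1"),
    ("south", "island1"), ("south", "island1"),
    ("north", "island2"), ("south", "island2"),
    ("island1", "island2"),
]

def odd_degree_nodes(edges):
    """Nodes touched by an odd number of edges (multi-edges allowed)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(n for n, d in degree.items() if d % 2 == 1)

def has_euler_walk(edges):
    """For a connected multigraph, a walk using every edge exactly once
    exists iff zero or two nodes have odd degree."""
    return len(odd_degree_nodes(edges)) in (0, 2)

print(odd_degree_nodes(BRIDGES))   # all four land masses have odd degree
print(has_euler_walk(BRIDGES))     # False
```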

Figure 5.2: Graph representation of a network. Node A has four edges going out from it, and two edges going in to it, so it has outdegree 4, indegree 2, and total degree 6.

The study of random networks was started in 1959 by Erdős and Rényi, who described properties of networks constructed with an equal probability of forming an edge between any two nodes [114]. For many years, such Erdős-Rényi networks were the centre of all random network studies. But more realistic models of networks appeared after Stanley Milgram’s study in 1967 [115], which led to the famous “six degrees of separation” notion, stating that any two people anywhere on Earth are separated by, on average, six friend-of-friend-of-friend connections. Milgram gave a number of letters to volunteers in the US Midwest, asking them to forward each letter to a specific person in Boston by giving it to friends or relatives, who in turn could hand it on in the same manner, recording the path the letter was taking. Only a small percentage of the letters reached their targets, but the ones that did arrived via an average of six connections. Milgram also found that many of those letters were forwarded by the same people, showing the existence of “hubs”, particularly well connected people. Following Milgram’s experiment, studies of the structure of such social networks have confirmed and solidified Milgram’s findings. In 1998, Duncan Watts and Steven Strogatz described topological properties of such networks [116], suggesting that small-world networks lie in between completely random networks, like the Erdős-Rényi ones, on the one hand, and lattices where each node is coupled to other nodes in regular patterns on the other. They used two quantitative measures to describe this: the clustering coefficient, $C$, and the average path length between any two nodes. The clustering coefficient is defined as

$$ C = \frac{1}{N} \sum_{v=1}^{N} \frac{2E_v}{k_v(k_v - 1)}, \tag{5.1} $$

where $k_v$ is the number of nearest neighbours in the network for node $v$, and $E_v$ is the number of connections between these. It can be viewed as the fraction of links present in the neighbourhood of each node over the number of possible links, averaged across the whole network of $N$ nodes. Watts


and Strogatz started from a lattice, with high clustering between nodes and high average path length, and randomly rewired connections. This resulted in a sharp drop in average path length even at a relatively low probability of rewiring, while the clustering coefficient remained high, giving the networks the typical “small-world” topology. With increasing probability of rewiring a link, the networks resembled a random network more and more, with its typical low average path length and small clustering coefficient.
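The rewiring experiment described above can be sketched as follows. The code builds a ring lattice, rewires each edge with probability $p$, and reports the clustering coefficient of Eq. (5.1) together with the average path length. The network size, the rewiring rule details (no self-loops, no duplicate edges), and the seed are assumptions made for this illustration and may differ from the procedure in [116].

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            adj[v].add((v + j) % n)
            adj[(v + j) % n].add(v)
    return adj

def rewire(adj, p, rng):
    """Redirect each original edge to a random new endpoint with probability p."""
    n = len(adj)
    new = {v: set(nb) for v, nb in adj.items()}
    for v in range(n):
        for w in sorted(adj[v]):
            if w > v and rng.random() < p:      # visit each undirected edge once
                candidates = [u for u in range(n) if u != v and u not in new[v]]
                if candidates:
                    u = rng.choice(candidates)
                    new[v].discard(w)
                    new[w].discard(v)
                    new[v].add(u)
                    new[u].add(v)
    return new

def clustering_coefficient(adj):
    """Eq. (5.1): mean over nodes of 2*E_v / (k_v*(k_v - 1))."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path length over all connected node pairs (BFS)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

rng = random.Random(1)
lattice = ring_lattice(200, 4)
for p in (0.0, 0.05, 1.0):
    g = rewire(lattice, p, rng)
    print(p, round(clustering_coefficient(g), 3), round(average_path_length(g), 2))
```

For the unrewired lattice with four neighbours per node, three of the six possible links among each node's neighbours are present, so the clustering coefficient is exactly 0.5.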

Although very interesting theoretically, the Watts-Strogatz (WS) model could not explain some other features seen in natural systems, in particular the distribution of the number of links each node is adjacent to, the degree distribution. Many natural and man-made complex systems of vastly different character exhibit similar topological properties, sharing a similar distribution of degrees, where the probability of finding a node with $k$ connections in a network often follows a scale-free power-law $P(k) \sim k^{-\gamma}$, where the exponent $\gamma$ is characteristic for each network. This is surprisingly common and widespread. In 1945, Zipf showed that the frequency of words in the English language, represented by James Joyce’s Ulysses and samples of American newspapers, as a function of their rank was governed by a power-law [117, 118]. This is now known as Zipf’s law. A small number of highly important components bind a text together, and this principle can be found in networks in general exhibiting the power-law function. Mandelbrot extended Zipf’s work within an information-theoretic framework, arguing that language is a coding of messages [119]. The same topology can be found in the internet, where a few hubs have millions of links, but where most web pages only have a small number [113]. It is also found in actor collaborations and co-authoring of scientific papers, in power grids, air transportation networks, city development, and even in the response times for letters by Charles Darwin and Albert Einstein [120, 121, 122, 123]. It is even found for words in random texts, where each symbol (including blank space) is drawn at random, forming words by blank-space separation [124]. Barabási has pioneered the studies of topology and evolution of scale-free networks, for instance by first proposing a model for the generation of such networks [113]. The field has drawn considerable attention and has been reviewed in, for instance, [120, 125, 126].
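One simple way to see how a heavy-tailed degree distribution can arise is growth with preferential attachment, in the spirit of the generative model mentioned above [113]. The sketch below is an illustrative assumption about such a process (seed graph, parameter values, and implementation details are not taken from [113]): each new node attaches $m$ edges to existing nodes, chosen with probability proportional to their current degree.

```python
import random
from collections import Counter

def preferential_attachment(n, m, seed=0):
    """Grow a network to n nodes; each new node attaches m edges to existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` is a degree-proportional draw over nodes.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

edges = preferential_attachment(n=5000, m=2)
degree = Counter(v for e in edges for v in e)
histogram = Counter(degree.values())
for k in sorted(histogram)[:8]:
    print(k, histogram[k] / len(degree))   # fraction of nodes with degree k
print("max degree:", max(degree.values()))
```

Most nodes end up with only a few links, while a handful of early nodes become hubs with degrees far above the mean of about four.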

Also in biology, scale-free networks are found everywhere. Networks built from biological experiments are also governed by a power-law degree distribution, for instance in metabolic networks [127] and protein interaction networks, where the most connected nodes are also the ones critical for cell survival [128]. Paper II of this thesis was one of the first to find that gene regulatory networks are scale-free, with a degree distribution following a power-law.

It has been extensively discussed what mechanisms are responsible for the evolution of networks with power-law degree distributions. Mandelbrot in [119] argued from an information-theoretic perspective about language, deriving his


Figure 5.3: Local neighbourhood of a scale-free gene disruption network (left), and the degree distribution (right). The frequency of nodes with $k$ connections in the network decreases roughly as a power-law in $k$. From Paper II.

extended version of Zipf’s law by minimizing the cost of coding. The idea that networks governed by a power-law degree distribution evolve as a trade-off between cost and tolerance against random errors has also been adopted by Carlson and Doyle [129, 130]. Scale-free networks are robust against random errors, since the removal of a random node will with high probability only affect very few edges. The probability of hitting a central hub is very low, since there are only a few of these. Biological systems are indeed very robust, and adapt easily to changing environments or system errors. Redundancies and pathways working in parallel are also topological features that increase robustness [131, 132]. In contrast to their robustness against random errors stands their fragility to directed attacks. If we know which nodes are central, it is easy to knock out the entire system by attacking only these, as demonstrated by the correspondence between central nodes and lethal mutants for a protein network [128]. This has led to the idea of modularity in biological networks, proposing that the separate functions are clustered in modules above the pathway level, with only a few links connecting these modules [133]. Support for this has been lent by, for instance, Ravasz, who described how modules in E. coli are connected in a hierarchical manner [134], leading to an observed self-similarity within these networks [135]. Also, Maslov and Sneppen found, by analyzing the two-dimensional degree distribution in protein networks, that network hubs rarely bind each other [136]. Girvan and Newman describe similar modularity in social networks [137]. If gene networks really are built up in a modular way (as results indicate), rather than modularity merely being a convenient way for us to think about these systems, they require a higher level of coordination between modules [138].
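The contrast between robustness to random errors and fragility to directed attacks can be illustrated numerically. The sketch below grows a hub-dominated graph by preferential attachment (an assumed stand-in for a scale-free network, not the network of Paper II), removes 10% of the nodes either at random or in order of decreasing degree, and compares the size of the largest connected component that remains.

```python
import random

def scale_free_graph(n, m, rng):
    """Preferential-attachment graph as an adjacency dict (illustrative)."""
    adj = {v: set() for v in range(n)}
    stubs = []
    for i in range(m + 1):                # seed: complete graph on m + 1 nodes
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
            stubs += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))  # degree-proportional choice
        for t in sorted(targets):
            adj[new].add(t)
            adj[t].add(new)
            stubs += [new, t]
    return adj

def largest_component_fraction(adj, removed):
    """Size of the largest connected component after deleting `removed`,
    as a fraction of the original number of nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best / len(adj)

rng = random.Random(42)
g = scale_free_graph(2000, 2, rng)
n_remove = 200                             # 10% of the nodes
random_failure = rng.sample(sorted(g), n_remove)
hubs = sorted(g, key=lambda v: len(g[v]), reverse=True)[:n_remove]
print("random errors  :", largest_component_fraction(g, random_failure))
print("directed attack:", largest_component_fraction(g, hubs))
```

Removing the hubs fragments the network far more than removing the same number of randomly chosen nodes, because the hubs carry a disproportionate share of the edges.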

Scale-freeness has a big impact on the dynamics of the complex systems around us. For instance, as in Milgram’s experiment, and as can be seen in


the many social networks growing on the internet, information traverses such a network quickly, because of the relatively short path length between any two nodes. This is particularly seen in the effect of human travel, and in how social networks easily extend because it is indeed a small world that you can travel around easily. This can be tracked, for instance, by following the transfer of bank notes [139]. Unfortunately, it also means that infectious diseases spread quickly, in particular if an outbreak zone is closely connected with a hub in the transportation network. Similarly, viruses and malware spread quickly on the internet.

A variety of biological mechanisms for system evolution have been proposed, for instance gene duplication and subsequent functional divergence [140, 141, 142]. Topological aspects of network science have been reviewed in [143], and [144, 145, 146] review the field with more emphasis on system dynamics.

5.2 Inferring large-scale systems from data

Essentially, modelling large-scale systems from data can take a data-driven or a model-driven approach. In data-driven approaches, connections between nodes are formed based on experimental measurements or statistical analysis of data such as clustering or principal component analysis [147]. In model-driven approaches, a model for the system is assumed, and the structure and parameters of the network are learned from data. Networks built from data-driven approaches can be seen simply as graphical representations of statistical properties of a dataset.

The first models of gene regulatory networks were Boolean, where the state of a gene (on or off) in one timestep was modelled as a Boolean function of the state of connected genes in the previous timestep. Stuart Kauffman pioneered this field [148, 149] and showed that such systems are stable and follow an attractor, since the number of possible states for the entire system is limited.
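The attractor argument is easy to demonstrate: with $n$ genes there are only $2^n$ states, so a deterministic synchronous update must eventually revisit a state and cycle from there on. The two-gene mutual-repression circuit below is a hypothetical example, not a network from [148, 149].

```python
def find_attractor(state, update):
    """Iterate a synchronous Boolean network from `state` until a state
    repeats; return the cycle of states (the attractor) that is reached."""
    seen = {}          # state -> time step at which it was first visited
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = update(state)
    return trajectory[seen[state]:]

def toggle_switch(state):
    """Hypothetical two-gene circuit with mutual repression:
    each gene is on in the next step iff the other gene is off now."""
    a, b = state
    return (not b, not a)

for start in [(True, False), (False, True), (False, False), (True, True)]:
    print(start, "->", find_attractor(start, toggle_switch))
```

The two states with exactly one gene on are fixed points, which is the Boolean counterpart of a bistable switch; the symmetric states alternate in a cycle of length two under synchronous updating.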

5.2.1 Networks from microarray data

DNA microarrays give us vast amounts of data directly linked to transcript levels, and it is obvious that this could be used for reverse engineering regulatory relations between genes. This field developed rapidly after data began coming from microarray experiments, quickly showing that the simple Boolean models were not realistic enough to describe real expression networks. Genes are not Boolean by nature: they are not simply on or off, but expressed in wide ranges, different and fluctuating for each gene. Other early models were deterministic and linear [150, 151].

In Paper II in this thesis, a microarray dataset from over two hundred systematic gene deletion experiments is used. Each experiment shows us which


genes change their expression level at the deletion of another gene. From this, we can deduce that all genes that are affected by the deletion of a gene are downstream of it, regulation-wise. Of course, since the system is complex and contains feedback loops, we will find loop structures also in a disruption network. Yeast is the main model organism to study this, and single deletion strains exist and have been studied for all genes (∼6,000). 18.7% of the genes have been found to be essential, that is, required for growth on rich medium [152]. Double deletion mutants are also informative, to screen for so-called synthetic lethal interactions, where two deletion mutants are viable independently, but lethal when combined [153]. Interactions between lethal genes are harder to study, since deleting them kills the cell, but this can still be done using engineered tet-regulated promoters [154]. Experiments studying additional levels of knockouts show that 74% of all metabolic genes participate in processes essential for growth [155]. Reverse engineering algorithms have also been proposed where gene networks are explored by steps of systematic perturbations that depend on observations in previous steps [156]. Experiments preceding and following system perturbations are essential in the study of gene regulatory networks, to unravel chains of cause and effect [157]. Although yeast is the main model system for systematic perturbation studies, such studies have also been done in other organisms, like the worm [158].
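The basic construction of a disruption network can be sketched as follows: every deletion experiment contributes directed edges from the deleted gene to each gene whose expression changed. The gene names and the data are hypothetical, and the step that decides which genes count as “changed” (thresholding, error models) is deliberately left out; this is not the full procedure of Paper II.

```python
def disruption_network(deletion_effects):
    """Directed edges (deleted gene -> affected gene) from deletion experiments."""
    return {(deleted, target)
            for deleted, targets in deletion_effects.items()
            for target in targets}

def downstream(edges, gene):
    """All genes reachable from `gene` by following edges (handles loops)."""
    children = {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)
    reached, stack = set(), [gene]
    while stack:
        for nxt in children.get(stack.pop(), ()):
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return reached

# Hypothetical data: which genes changed expression when each gene was deleted.
effects = {"g1": {"g2", "g3"}, "g2": {"g4"}, "g4": {"g1"}}
edges = disruption_network(effects)
print(sorted(edges))
print(sorted(downstream(edges, "g2")))   # g2 reaches itself via g2 -> g4 -> g1 -> g2
```

The loop g1, g2, g4 in this toy data shows how feedback in the underlying system surfaces as a cycle in the disruption network.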

One of the most common approaches to modelling gene expression data is as a Bayesian network. These were developed in the machine learning field of computer science to model conditional independencies in large systems of coupled events [159]. In a Bayesian network, the assumption is that the system is a directed acyclic graph (DAG), and that the state of a node is independent of the state of grandparent nodes, given the state of the parent nodes. This is used to factorize the probability density function over all nodes into a product of conditional probabilities for neighbouring nodes, which greatly reduces the number of parameters needed to fully describe the system. A great advantage of this type of model is that it is probabilistic by nature, coping well with the noise in microarray data. One disadvantage is the assumption of acyclicity (indeed, any uncertain system requires control loops to maintain stability). Also, a Bayesian network is not unique: an observed probability density on the space of all variables can be explained by many different networks. For microarrays, this approach has been championed by Friedman and colleagues [160, 161, 162]. For time-dependent data, an extension called Dynamic Bayesian Networks can be used [163].
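Written out, the factorization referred to above takes the standard form below, where $\mathrm{Pa}(X_i)$ denotes the set of parents of node $X_i$ in the DAG (a notation introduced here for illustration). The three-gene chain is a hypothetical example.

```latex
\begin{align*}
P(X_1, \dots, X_n) &= \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr), \\
\text{e.g. for } X_1 \to X_2 \to X_3:\quad
P(X_1, X_2, X_3) &= P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_2).
\end{align*}
```

For binary variables, the full joint distribution over three genes needs $2^3 - 1 = 7$ free parameters, whereas the chain needs $1 + 2 + 2 = 5$. The saving grows rapidly with the number of nodes as long as each node has only a few parents.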

5.2.2 Transcription factor binding networks

Transcription factor binding networks show direct physical interactions between a transcription factor and the genes whose promoter regions it binds. These can be found either by computational predictions, scanning the genome for the TF binding sites [164], or by “ChIP-on-chip” experiments [44]. [165]


takes the predictive approach, in combination with analyzing gene deletion mutants [166], since we expect the disruption of a transcription factor to affect the genes it regulates. This type of network is used extensively in Paper IV.

5.2.3 Protein-protein networks

Using high-throughput techniques, protein-protein interaction maps have been generated for many species, such as yeast [167, 168, 169, 170], the fruit fly [171], and the nematode C. elegans [172]. A human protein-protein interaction network, derived from different sources and partially predicted from other species, is the Human Protein Reference Database, HPRD [173]. Many high-throughput techniques yield a very high number of false positives, typically well over 50%, and the overlap between the maps is very low, below 15%, in particular when comparing interaction maps across species. Experimental design, targeting only parts of the whole space of interactions, as well as the test system, where only parts of the proteome are expressed depending on experimental conditions, may explain some of the low overlap [174, 175, 176, 177, 178]. Also, included in the false positive number are the connections found between proteins that may bind each other well, but are never expressed at the same time or at the same location within the cell.

5.2.4 Integrating networks

Combining and integrating networks has proven to be hard, because of the relatively high degree of noise in the experimental data and the false positives and negatives in the derived networks. Experiments performed under different experimental conditions test biological systems that may not be comparable, even if the same model system is used. Paper IV in this thesis is an attempt at combining different networks using a guilt-by-association approach. In any such analysis based on networks, we need to account for the type of data used to infer the connections in the different networks, and the interpretation will depend on what these connections mean in the individual networks. Often, representing a system as a network graph is misleading, since paths in the graph may not have a meaning in the actual system. A path in a protein-protein interaction network simply means that each binary connection represents a possible binding of the two linked proteins, but says nothing about how proteins further down the chain of links are related to the ones further up. By no means should such networks be interpreted as signalling pathways. Real biological signalling pathways are actually very hard to predict from data, since each step in such a pathway represents an interaction that is happening in a system in a specific state, different from the step before. Also, each step may take place at a different cellular location and depend on the timing between steps [179]. Attempts at relating gene expression and protein-protein interaction data have shown only limited success [180].


Recent studies have combined protein-protein networks with phenotype data [181] or gene expression data with drugs and diseases [182] to find how interactions on the gene or protein level reflect on diseases and the imbalance of biological systems on a larger scale.

In general, integration of data from different types of experiment is hard, because the evidence each method provides has to be given a weight in a unifying scoring scheme, and the weights are often chosen ad hoc. A Bayesian approach for unifying scores from different datasets has been attempted [183], but there are no strong arguments for any particular way of combining scores.

5.3 Noise and modelling issues

5.3.1 Choosing the right level of model complexity

As we have seen for low level gene regulation and pathways, approaches to modelling systems depend on the level of detail we choose to address. A high level of detail can only be employed when modelling systems with a small number of components, where we can use differential equations and study the nonlinear dynamics and the effects of noise based on probabilistic chemical events. For systems with a high level of complexity, we need to choose modelling approaches that capture the higher level properties we are interested in. Attempting to study the simultaneous dynamics of each component of a full cellular system with the same approaches as for a low level system is simply not feasible: the representation could be formulated as a system of hundreds of thousands of coupled nonlinear stochastic differential equations, where we would only have limited knowledge about the coupling characteristics or the parameters guiding the system. Indeed, we should not delude ourselves into thinking that we know the complete list of components, or even which regulatory mechanisms are present in a cell. After the sequencing of the human genome, the number of genes and transcripts in the cell has been discussed widely, with numbers normally in the region of 25,000-30,000 genes, but recently it has been reported that transcription is much more widespread than previously known [184]. Even on lower levels of granularity we can expect that any models are still very incomplete, both with respect to which components are involved and which regulatory mechanisms are active. Large scale network models should address questions on the global level: robustness and fragility, modularity, local communities, and large-scale aspects of molecular evolution.

When analyzing large scale data, we end up testing many hypotheses, with each connection or each node being scored with some test statistic. When testing the significance of many such test statistics at the same time, we have to account for this in order to control the rate of false positives. If we perform 100,000 tests at a significance level of p < 0.05, we expect about 5,000 tests to come up as positive simply by chance, even if no true effect is present. In order to control for this, we can score each test against an adjusted significance level. The most stringent of these approaches is the Bonferroni adjustment, in which each test is done against the nominal significance threshold divided by the number of tests done. For our example above, a test would be significant only if it passes p < 5 × 10^−7. This is very conservative and will also severely reduce the number of true positives found. Often stepwise adjustments are done instead, such as the Bonferroni-Holm, the Benjamini-Hochberg or the Benjamini-Yekutieli methods, which rank the scores and adjust the significance threshold depending on the rank of each score among all others.
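The adjustments described above can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used in the papers; the function names and the example p-values are hypothetical, and the Benjamini-Yekutieli variant is left out for brevity.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject a test only if p is below alpha divided by the number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm(pvalues, alpha=0.05):
    """Bonferroni-Holm step-down: the k-th smallest p-value (k = 0, 1, ...)
    is compared with alpha / (m - k); stop at the first failure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if pvalues[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break
    return reject


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up: find the largest rank k such that the
    k-th smallest p-value is <= k / m * alpha, and reject ranks 1..k."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


# Hypothetical p-values, for illustration only.
example = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(example), holm(example), benjamini_hochberg(example))
```

Note that the Bonferroni and Holm procedures control the probability of any false positive, whereas Benjamini-Hochberg controls the expected fraction of false positives among the rejected tests, which is why it rejects more tests in this example.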

5.3.2 Artefacts in network modelling

Couplings between variables are almost always subject to ambiguity or uncertainty. Typically, network couplings are inferred by measuring some system observables in a series of experiments, for instance by sampling time series data. Based on the collected data and a model for connectivity, a coupling may be inferred to some degree of statistical significance. What constitutes an edge is then determined by our choice of significance threshold. Another type of uncertainty arises when the couplings themselves may be unambiguous, but the definition of what constitutes a link is not. For example, the network of roads between cities is clear and unambiguous once it has been defined what a “road” and a “city” is. A national highway may count as a link, but a series of connected countryside gravel paths may not.

Even though many natural and man-made systems seem to be well-defined and certain, they have almost always passed a thresholding or sampling step. Not only the connections, but sometimes also the nodes, may be uncertain in their definition. For a co-authorship network, what is the requirement to count something as co-authored? Would scribbling notes on the same piece of paper count, or do we require something more, such as the paper being published in a peer-reviewed journal with both people listed as authors? For the road network between cities, what counts as a city? Clearly, the inferred network changes depending on how we define these things. Such thresholds, arbitrary or non-arbitrary, impose a structure on the network. The effect may not be uniform, and can change the inferred topology. For biological networks, we know that we have a wide range of coupling strengths. For instance, in protein-protein interaction networks, only components that have small enough dissociation constants will be measured in a high-throughput experiment, whereas intermediate and weak interactions may not show up. These intermediate and weaker interactions could possibly have very important impacts on system dynamics, but will never be reported in an experiment that is not designed to measure them. For instance, it has been shown that physical models for interactions based on protein surface hydrophobicity predict scale-free networks. Sampling and reduced measurability of systems will also have an impact [185]. A graphical model of nodes and edges is very simplistic and cannot capture the behaviour of the system properly, although it is still useful for visualising and mining large-scale properties, with the caveats discussed above in mind. To better represent systems with a range of coupling strengths, and still keep the visual advantages a network provides, weighted networks have been proposed [186, 187].

6. Signalling between different levels of complexity

I have now described mechanisms of signalling and regulation at different levels of detail in complex biological systems, and will in this chapter discuss how “vertical” signalling between levels can amplify noise and signals arising on a fundamental level to produce far-reaching effects on the level of the cell, the organ, or even the organism. Complex systems are known to exhibit phenomena such as self-adaptation, synchronization, robustness and self-organization. In biological systems, we can observe these phenomena from the viewpoint of a biologist, saying that they are indeed properties of all living beings. But from a physicist’s view, we know that systems that are not alive can also exhibit the same properties. This notion of “emergence” is well known and has been discussed in science and philosophy for a long time. Just as nonlinear dynamical systems can exhibit chaotic behaviour and extreme sensitivity to disturbances, fairly simple mathematical principles and the laws of physics can make systems evolve with truly amazing pattern formation and self-organisation.

Paper V in this thesis describes large scale effects that depend on errors on a low level: the increased risk of type 2 diabetes conferred by single nucleotide polymorphisms (SNPs).

6.1 Genetic variation causes disease

The robustness of complex systems that was discussed in Chapter 5 comes at a price. The system may adapt easily to fluctuating environments and exhibit a great tolerance for random errors, but show a fragility towards directed attacks on key components that coordinate pathways or regulate important functions. A single mutation can, if in the wrong position, change an amino acid in the translated protein sequence so that the effect on folding or activity renders the whole protein useless. SNPs can also affect splicing if located near splice sites, affect RNA stability, or act by affecting the binding of transcription factors. As discussed earlier, SNPs can also affect expression levels in an allele specific manner. Because of the topology of biological networks, such changes can quickly lead to effects on the pathway level, and up to the cellular or organ level, and cause disease from the disruption of higher-level functions. Small effects in one gene can also have a widespread effect on the regulation of a whole neighbouring regulatory network [188].

6.1.1 Type 2 Diabetes Mellitus

Type 2 diabetes is characterized by a decreased ability to produce insulin, and by resistance to the action of insulin. This causes, for instance, impaired glucose uptake by muscle cells and impaired glucose storage by the liver, leading to elevated levels of glucose in the blood (hyperglycemia). Insulin is one of the key metabolic regulators, and is produced in the β-cells of the islets of Langerhans in the pancreas. By binding a cell surface receptor, insulin triggers a signalling pathway (Fig. 6.1) leading to different responses in different tissues (Fig. 6.2) [189].

Figure 6.1: The insulin signalling pathway has three main nodes of regulation: the insulin receptor (IR) and the four IR substrate proteins, the phosphatidylinositol 3-kinase (PI3K) and its variant subunits, and the AKT/PKB isoforms. From Taniguchi et al. [189].

The genetics behind diabetes has been hard to pinpoint, since the disease can develop from the failure of several different biological systems. Many different genes, each only slightly increasing the risk of disease, can cause a system imbalance eventually leading to diabetes [190]. Previous studies have found chromosomal regions linked to disease, for instance in the genes PPARG [191], KCNJ11 [192], CAPN10 [193], ENPP1 [194], HNF4A [195, 196] and most notably TCF7L2 [197], a transcription factor in the WNT signalling pathway. The association of TCF7L2 with disease has been verified in many studies in different populations. The study of complex diseases, such as diabetes, will be much facilitated by genome-wide association studies, which can further help us to find genes associated with disease, as shown in Paper V.

Figure 6.2: Insulin regulates responses in different tissues via the four different insulin receptor substrate proteins, which are themselves regulated by system feedback and crosstalk with other signalling pathways. From Taniguchi et al. [189].

6.1.2 Genome-wide association studies

To test association between genetic markers and disease, different strategies have been employed. Linkage studies measure the effect of genetic loci on a trait, which can be either a binary variable (“healthy or carrying the disease”) or real valued, like the blood glucose level after fasting. Family based testing takes heritability into account to zoom in on loci whose presence in a family is well correlated with the prevalence of the disease. With recent advances in measurement technology, as discussed in Chapter 2, we can now test the genetic variation at a large number of markers in large groups of people. One approach is the case-control study, where patients with a disease are genotyped and compared to a roughly equal number of healthy people. Markers where the genotype distribution is significantly different between the case and control groups are associated with the disease, and the added risk of getting the disease conferred by carrying a specific genotype can be estimated. For biallelic SNPs, this is done by counting the three different genotypes (aa, aA and AA) in the case and control groups, and scoring the difference between observed and expected genotype frequencies [198]. Methods that test clusters of SNPs together are believed to better address association by haplotype block [199].
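As an illustration of the genotype counting described above, the following Python sketch scores a single biallelic SNP with a Pearson χ² test on the 2x3 table of genotype counts. It is a simplified example, not the scoring used in Paper V; the function name and the counts are hypothetical, and the sketch assumes every genotype is observed at least once. For two degrees of freedom the χ² tail probability is exactly exp(−χ²/2), so no statistics library is needed.

```python
import math


def genotype_chi2(case_counts, control_counts):
    """Pearson chi-square test on a 2x3 table of genotype counts (aa, aA, AA).

    Expected counts assume the genotype distribution is the same in cases
    and controls. Returns (chi2, p_value) for 2 degrees of freedom.
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 2 df.
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


# Hypothetical genotype counts (aa, aA, AA), for illustration only.
chi2, p = genotype_chi2((30, 50, 20), (50, 40, 10))
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

In a genome-wide setting the resulting p-value would still have to be compared against a threshold adjusted for the number of SNPs tested.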

It is important to put much effort into designing the study and selecting the cases and controls properly. In Paper V, we have selected cases as patients with type 2 diabetes who are not obese. This strengthens the assumption that a link between disease and a genetic marker really is a link to the disease itself, and not to obesity or any other condition that may have had an impact on the disease. A carefully selected cohort is important to maximize the power to detect associations.

Similarly, effects from relatedness between test subjects can have a great impact on study power and association scores. If the cohort contains people who are more related than you would expect by chance, they also share more alleles than you would expect. A genetic locus where a specific variant is more prevalent will have higher association scores, even if the variant is present simply because of relatedness and not necessarily because it causes disease. A test for this is to study the deviation from Hardy-Weinberg equilibrium (HWE) for each SNP. In a population, we expect the two alleles carried at each SNP to be independent, and when this assumption does not hold, it indicates problems, either in assaying that SNP or from population effects. Several methods have been developed to analyze and counter the effects of population stratification. Genomic control uses the fact that a random sample from a χ² distribution with one degree of freedom has unit mean, and estimates a variance inflation factor that is used for scaling all association test scores (for a χ² test) [200, 201]. This can be conservative and lead to a loss of power to detect, since it lowers all association scores by the same factor. Principal component analysis (PCA) can also be used, where the first component axis is normally associated with geographical bias for association scores in a population. PCA-based statistical methods to analyse genotype data, detecting and correcting for population stratification, show promising results [202, 203].
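The two checks mentioned above can be sketched as follows. This is a minimal illustration with hypothetical function names and counts. The HWE test assumes both alleles are observed; the genomic control estimate follows the mean-based description in the text, and clipping the inflation factor at 1 is an added assumption (a median-based estimate is a common, more robust alternative).

```python
import math


def hwe_chi2(n_aa, n_aA, n_AA):
    """Chi-square test for deviation from Hardy-Weinberg equilibrium.

    Under HWE the genotype frequencies are p^2, 2pq and q^2, where p is the
    frequency of allele a. Returns (chi2, p_value) for 1 degree of freedom.
    """
    n = n_aa + n_aA + n_AA
    p = (2 * n_aa + n_aA) / (2 * n)
    q = 1.0 - p
    observed = [n_aa, n_aA, n_AA]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def genomic_control(chi2_scores):
    """Scale 1-df chi-square association scores by an inflation factor.

    The factor is the mean score (expected to be 1 without stratification),
    never allowed to drop below 1 so that scores are not inflated.
    """
    inflation = max(1.0, sum(chi2_scores) / len(chi2_scores))
    return inflation, [score / inflation for score in chi2_scores]


# Hypothetical counts and scores, for illustration only.
print(hwe_chi2(25, 50, 25))   # genotype counts in perfect HWE
print(hwe_chi2(40, 20, 40))   # strong deficit of heterozygotes
print(genomic_control([2.0, 2.0, 2.0]))
```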

Genome-wide association studies are often done in stages, to maximize the power to detect while holding down the number of genotypes that have to be measured. The available samples are split across the stages, and the most significant SNPs from the first stage are tested on more samples in the second stage, and so on. Depending on the size of the marker sets and sample groups, a joint analysis for each SNP across the stages normally has the greatest statistical power [204].

Case-control or phenotype data can be used to detect epistatic interactions between genetic loci [205]. Even though such a search may appear computationally intractable, it has been shown that it is statistically feasible to look for epistatic interactions on a genome-wide scale, in spite of the enormous multiple hypothesis testing problem this presents [206].

The sex chromosomes have to be handled separately in association studies, since loci on these contribute to a phenotype differently from loci on autosomal chromosomes. Since males have one X and one Y chromosome, a locus on one of these chromosomes cannot combine in an additive or multiplicative way. Instead of the 2x3 genotype count table we normally use for a χ²-test in a case-control study, we use a 2x2 table. Also, an effect from a gene on the X chromosome may contribute differently in males compared to females. Females have two X chromosomes, but normally only one is active [207, 208]. Chromosome number has an important effect on gene regulation, which can be seen in lower eukaryotes as well [209].

A number of genome-wide association studies are underway or have been completed with varying success. Paper V is a study done for type 2 diabetes, one of the first successful reports with such a large number of SNPs tested in a population large enough to detect even alleles with moderate risk for disease. A study of similar size identified a gene linked to inflammatory bowel disease [210], and another one used many SNPs but fewer samples to find SNPs linked to Parkinson's disease [211]. Genome-wide studies with fewer SNPs tested have resulted in significant associations for type 1 diabetes [212] and breast cancer [213]. More background on genome-wide association studies can be found in [214], and current statistical methods are reviewed in [215].

7. Contributions

The five papers included in this thesis have contributed results to the knowledge of regulation of complex biological systems at different levels of detail. In this chapter, the papers are summarized and the main results presented and put into context.

7.1 Paper I: General measures for signal-noise separation in nonlinear dynamical systems

This first paper describes the separation of signal and noise through a bistable system, in this case a Hopfield neuron model. The same class of system describes fundamental bistable genetic control circuits, such as toggle switches [15].

To measure the change in separation between signal and noise going through a stochastic nonlinear dynamical system (SNDS), it is important to use statistics which capture the full probabilistic structure of both signal and noise. One of the most common measures used is the signal-to-noise ratio (SNR), the ratio between the signal energy and the energy of the noise floor, because of its relation to detection performance in the case of deterministic signals in Gaussian noise [216]. The SNR has some useful properties: it is easy to compute, it is intuitively easy to understand, and it can be used for assessing detection performance for a number of standard scenarios. However, for general classes of signals, it is not an accurate measure of detectability. This is because the output of a nonlinear system does not generally have a Gaussian structure, and the SNR cannot capture its full probabilistic structure, only the first and second order moments.

Instead, Paper I is based on an information theoretic approach laid out in [217, 218] and specifically uses φ-divergence measures [219] as a family of performance measures for signal detection in SNDS. These are separation measures between probability distributions representing different conditions, or modes, of the system, typically for the separation of signal and noise as in the detection setting. They are generally defined as

$$d_\phi(P_0, P_1) = \int_S \phi\!\left(\frac{dP_1}{dP_0}\right) dP_0, \tag{7.1}$$

where $P_1$ and $P_0$ are two probability measures on the space $S$, $\phi(x)$ is a convex and continuous function for positive $x$, and $d_\phi$ is the divergence. Different φ functions give us different divergences, such as the Kullback-Leibler distance $d_I$ for $\phi(x) = -\log(x)$, the Kolmogorov divergence $d_e$ for $\phi(x) = |(1-\alpha)x - \alpha|$, and the χ²-divergence $d_{\chi^2}$ for $\phi(x) = (x-1)^2$ [220]. The likelihood ratio Λ relates $P_0$ and $P_1$ by

$$P_1 = \int \Lambda \, dP_0 = E_{P_0}\Lambda, \qquad \Lambda = \frac{dP_1}{dP_0} \tag{7.2}$$
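For distributions on a finite set of outcomes, the integral in Eq. 7.1 reduces to a sum, which makes the three divergences easy to illustrate. The Python sketch below is only an illustration under that discrete assumption; the function names and the example distributions are hypothetical, and the Kullback-Leibler case assumes that both distributions are strictly positive.

```python
import math


def phi_divergence(p0, p1, phi):
    """Discrete form of Eq. 7.1: sum of phi(p1 / p0) * p0 over all outcomes."""
    return sum(phi(b / a) * a for a, b in zip(p0, p1) if a > 0)


def phi_kullback_leibler(x):
    """phi(x) = -log(x), giving the Kullback-Leibler distance."""
    return -math.log(x)


def phi_chi2(x):
    """phi(x) = (x - 1)^2, giving the chi-square divergence."""
    return (x - 1.0) ** 2


def phi_kolmogorov(alpha):
    """phi(x) = |(1 - alpha) x - alpha|, giving the Kolmogorov divergence."""
    return lambda x: abs((1.0 - alpha) * x - alpha)


# Hypothetical two-outcome distributions, for illustration only.
p0 = [0.5, 0.5]
p1 = [0.25, 0.75]
print(phi_divergence(p0, p1, phi_kullback_leibler))  # 0.5 * log(4/3)
print(phi_divergence(p0, p1, phi_chi2))
print(phi_divergence(p0, p1, phi_kolmogorov(0.5)))
```

With α = 0.5 the Kolmogorov divergence equals half the summed absolute difference between the two distributions, and all three divergences vanish when the distributions are identical.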

The φ-divergences can answer a wide range of questions about system performance in an information theoretic sense, and have important connections to performance bounds for general detector structures [221]. They are all intimately connected with theoretical limits of performance from different aspects. For example, the probability of detection for the optimal detector, the Cramér-Rao bound for parameter estimation, and channel capacity can all be expressed as simple monotonic functions of a φ-divergence. In contrast, the SNR is based on the spectrum, which only captures the first and second moments of the probability density function of the output. For signals going through nonlinear systems, the SNR is close to being well defined only for narrowband cases. Using these new measures, we have investigated the stochastic resonance (SR) phenomenon for a generic bistable potential and compared with previous studies [86]. Bistable potentials are commonly studied and used as examples of generic nonlinear physical systems, since that type of nonlinearity appears frequently in natural systems. In our studies, for this typical double-well system, the SR phenomenon is explained by describing the SR curve as the performance of a suboptimal detector which does not capture the complete probabilistic structure of the output. The φ-divergences, which are able to do this, do not show the SR behaviour. They also show that the performance on the input is the same as on the output, confirming that simple double-well systems are information-preserving. The φ-divergences also have the advantage of being easy to calculate, and we provide formulas for this which work for a wide range of systems, signals and types of noise.

7.1.1 Methods

Our computations are set in continuous time and are based on a model described as a system of stochastic differential equations (SDE), which in the one-dimensional Itô case takes the form

$$dX_t = f(X_t)\,dt + s_t\,dt + \sigma\,dW_t, \qquad t \in [0,T] \tag{7.3}$$

As a test model for the evaluation of φ-divergences as performance measures, we have used a ‘soft’ potential of the ‘Hopfield’ type, which has previously been shown to exhibit the SR behaviour [86]. The function $f$ in (7.3) is then given by:

$$f(x) = -ax + b\tanh(x), \qquad a, b > 0 \tag{7.4}$$

A detection problem can then be formulated as the testing of the two hypotheses

$$H_0: \; dX_t = f_0(X_t,t)\,dt + \sigma\,dW_t, \qquad f_0(X_t,t) = f(X_t) \tag{7.5}$$

$$H_1: \; dX_t = f_1(X_t,t)\,dt + \sigma\,dW_t, \qquad f_1(X_t,t) = f_0(X_t,t) + s_t \tag{7.6}$$

with $P_0$ and $P_1$ as the corresponding probability measures induced by the output signal, $X_t$, on $S$; $s_t$, $W_t$ and $f$ are as above. To calculate the likelihood ratio $dP_1/dP_0$ that is used in Eq. 7.1, we use the Wiener measure $P_W$ as reference, that is, the measure induced on $S$ by the Wiener process $W_t$. The Wiener process belongs to a class of stochastic processes called martingales, for which rigorous theoretical results have been developed using probability theory [222].

The likelihood ratio Λ can be computed with the help of Girsanov’s theorem and the so-called Cameron-Martin formula, which tell us how to transform a probability measure to a Wiener measure [82, 223, 224, 225]:

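$$\frac{dP_1}{dP_0} = \frac{dP_1}{dP_W} \Big/ \frac{dP_0}{dP_W} \tag{7.7}$$

$$= \exp\left( \int_0^T \frac{f_1}{\sigma^2}\,dX_t^1 - \frac{1}{2}\int_0^T f_1^2\,dt - \int_0^T \frac{f_0}{\sigma^2}\,dX_t^0 + \frac{1}{2}\int_0^T f_0^2\,dt \right) \tag{7.8}$$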

Eq. 7.8 can be evaluated numerically with an Euler-Maruyama scheme, by simulating the system and recording trajectories for $X_t$ under $P_0$ and $P_1$ [226], and with that we can calculate the divergences by Eq. 7.9.

$$d_\phi(P_0, P_1) = E_{P_0}\,\phi\!\left(\frac{dP_1}{dP_0}\right) \tag{7.9}$$
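The simulation approach can be sketched in Python as follows. This is a minimal illustration and not the code used in Paper I: it simulates paths under $H_0$ with an Euler-Maruyama scheme for the Hopfield drift of Eq. 7.4, accumulates the logarithm of the likelihood ratio along each path, and averages $\phi(\Lambda) = -\log \Lambda$ as in Eq. 7.9. The sketch assumes the standard form of the Girsanov log-likelihood ratio for a shared noise strength σ, in which the quadratic drift terms are also divided by σ². The parameter values (a = 1, b = 2, the constant signal, step and path counts) and all function names are hypothetical.

```python
import math
import random


def hopfield_drift(x, a=1.0, b=2.0):
    """Drift f(x) = -a x + b tanh(x) of Eq. 7.4 (a and b chosen arbitrarily)."""
    return -a * x + b * math.tanh(x)


def estimate_kl(signal, sigma, T=1.0, n_steps=50, n_paths=4000, seed=0):
    """Monte Carlo estimate of E_P0[-log(dP1/dP0)] for the hypotheses
    H0: dX = f0 dt + sigma dW and H1: dX = (f0 + s_t) dt + sigma dW."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        x = 0.0
        log_lr = 0.0
        for k in range(n_steps):
            s = signal(k * dt)
            f0 = hopfield_drift(x)
            dw = rng.gauss(0.0, math.sqrt(dt))
            dx = f0 * dt + sigma * dw  # Euler-Maruyama step under H0
            # Increment of log(dP1/dP0) along the simulated path.
            log_lr += s * (dx - f0 * dt) / sigma**2 - 0.5 * s * s * dt / sigma**2
            x += dx
        total += -log_lr  # phi(Lambda) = -log(Lambda)
    return total / n_paths


# Constant signal s_t = 0.5 at two noise strengths (illustrative values).
for sigma in (1.0, 2.0):
    print(sigma, estimate_kl(lambda t: 0.5, sigma))
```

Under these assumptions the exact value for a constant signal $s$ is $s^2 T / (2\sigma^2)$, so the estimates should lie close to 0.125 and 0.03125 and decrease as the noise strength grows.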

7.1.2 Results

Our results show the different divergences for various cases of signals (Fig. 7.1 and Paper I, Fig. 5), and show that they are all monotonically decreasing as a function of noise strength, exhibiting no stochastic resonance behaviour. But when we use a statistic that is insufficient for the likelihood ratio (LR), similarly to the SNR, as in Paper I, Fig. 2, we see a local maximum, very much like the SR effect. Sufficient statistics for the LR are variables from which the LR can be retrieved using an invertible transformation, and we see that when we study the system using a sufficient statistic, we get the same behaviour as for the φ-divergences.

Figure 7.1: The distance between signal and noise, as measured by three φ-divergences, is unchanged when passing through an invertible system. All divergences, closely related to optimal performance bounds for signal-noise separation, are monotonically decreasing functions of the noise strength σ.

7.1.3 Discussion

This paper shows that the stochastic resonance phenomenon described for the SNR as a function of noise strength depends on the suboptimal nature of the SNR statistic when applied to the output of a SNDS. The SNR is an insufficient statistic with respect to the likelihood ratio, but when we use φ-divergences, or statistics sufficient for the LR, we observe a monotonically decreasing distance between signal and noise as noise strength increases. The φ-divergences are directly related to optimal detection performance, so the conclusion is that the SR effect comes from the fact that the SNR is a suboptimal detection statistic for nonlinear systems, and the local maximum in the SR curve is located at noise levels where the information loss is the highest. An optimal detector would have monotonically decreasing performance for increasing noise, but a suboptimal detector, such as the SNR or other statistics insufficient for the likelihood ratio, would show an SR effect (but always do worse than an optimal detector). Due to the data processing inequality [227], the distance between signal and noise cannot increase by passing through a system. If the system is invertible, so that the input can be fully deduced from the output, the distance will not change; otherwise it will decrease. Still, the stochastic resonance effect can be seen in suboptimal detectors [228], or even for optimal detectors when additional noise is added to the output of the system [229] or when the system is noninvertible [217].

7.2 Paper II: Building and analysing genome-wide disruption networks

In this paper, a large microarray dataset of systematic gene knockouts for S. cerevisiae was used for inferring genome-wide disruption networks, describing how the disruption of one gene affects the expression levels of others. The results show that
• genes with many outgoing edges often have regulatory functions, whereas genes with many incoming edges have other, non-regulatory functions
• the global structure of such networks has a scale-free topology which is very hard to disrupt by deleting genes
• local neighbourhoods of a gene have a statistical overrepresentation of functionally related genes
• genes are either regulated, with many incoming edges, or regulatory, with many outgoing edges.

7.2.1 Background

In 2000, a compendium of transcriptional profiles for Saccharomyces cerevisiae was published [166], one of the first systematic mappings of gene expression in yeast strains with one gene knocked out. Microarray experiments were carried out for 274 single gene deletions, 2 cases of double gene deletions, 13 conditional knockouts using tetracycline-regulated promoters, and 11 experiments where wildtype (WT) yeast strains had been treated with different chemical compounds. Importantly, 63 control experiments (WT-vs-WT) were also performed, allowing us to estimate the variation of gene expression without any perturbations. This seminal paper demonstrated that the co-expression of two genes also indicates a functional link, by showing that expression profiles cluster together for yeast strains in which different genes in the same pathway have been knocked out.

7.2.2 Methods

We selected a subset of 248 out of the 274 single gene knockouts studied by Hughes et al., excluding the yeast strains with chromosome number anomalies, to make sure that the expression changes we observed were a result of the genetic perturbation and not of ploidy. Using the 63 control experiments, we estimated the standard deviation σi of the WT-vs-WT log-ratios for each gene i. This standard deviation was then used to normalize the log-ratios of each gene across the set of all 274 knockouts, allowing us to distinguish gene expression fluctuations induced by the knockout from natural fluctuations, which can be very different between different genes. If gene i had a normalized knockout-vs-WT log-ratio exceeding a given threshold γ when gene j had been deleted, an edge was added from the deleted gene j to the gene i. This resulted in a “disruption network” with up to 248 nodes having outgoing links, and up to 6318 nodes (the number of yeast genes tested on the microarray) with incoming links. Since the network depends on the threshold γ, we constructed different networks for values ranging from 2.0 up to 26.0, and analysed the consistency of these.
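The construction just described can be sketched in Python as follows. The sketch is an illustration only: the data layout (nested dictionaries), the function name and the toy values are hypothetical, and it assumes that both up- and down-regulation count, so the absolute value of the normalized log-ratio is compared with γ.

```python
import statistics


def build_disruption_network(knockout_logratios, control_logratios, gamma):
    """Return the set of directed edges (deleted gene j -> affected gene i).

    knockout_logratios: {deleted gene j: {gene i: knockout-vs-WT log-ratio}}
    control_logratios:  {gene i: list of WT-vs-WT log-ratios}
    An edge is added when |log-ratio| / sigma_i exceeds the threshold gamma,
    where sigma_i is the standard deviation of gene i in the controls.
    """
    sigma = {gene: statistics.stdev(values)
             for gene, values in control_logratios.items()}
    edges = set()
    for deleted, profile in knockout_logratios.items():
        for gene, logratio in profile.items():
            if abs(logratio) / sigma[gene] > gamma:
                edges.add((deleted, gene))
    return edges


# Toy data, for illustration only: gene A is quiet in the controls,
# gene B fluctuates strongly even without any perturbation.
controls = {"A": [0.1, -0.1, 0.1, -0.1], "B": [1.0, -1.0, 1.0, -1.0]}
knockouts = {"X": {"A": 0.5, "B": 0.5}}
print(build_disruption_network(knockouts, controls, gamma=2.0))
```

In the toy data the same raw log-ratio of 0.5 produces an edge to gene A but not to gene B, which is the effect of normalizing by the natural fluctuation of each gene.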

We compared our inferred networks to randomized networks and to a reference network, created using the manually curated Yeast Proteome Database, YPD [230]. This reference network was created by analyzing the ‘description’ field in the database for the 248 genes that were deleted in [166]. If this description field contained the name of another gene in the network, the two genes were linked by an undirected edge. Since there was no analysis of sentence syntax, we could not deduce the direction of any relation, and in fact, we could not even deduce if the relation was positive or negative (the absence of a relation), since an analysis where we just look for the occurrence of gene names will not distinguish between sentences like “X is regulated by Y”, “Y is regulated by X”, or “X does not affect Y in any way”. Still, after manually reading a large number of description fields, the occurrence of negative relations was deemed so rare that it would have very little negative impact on the overall use of the reference network. Our hypothesis was that if our disruption networks had any functional relevance, they would overlap significantly more with the reference network than would be expected by chance. Randomized networks were created by shuffling the edges of the measured network at each analysed threshold level γ. The shuffling was done in such a way that the number of ingoing and outgoing connections for each gene was kept constant.
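One common way to shuffle edges while keeping every in- and outdegree fixed is to repeatedly swap the targets of two randomly chosen edges. The source does not state which shuffling procedure was used, so the Python sketch below is an assumption for illustration, with hypothetical function names and a toy edge list.

```python
import random
from collections import Counter


def degrees(edges):
    """Return (outdegree, indegree) counters for a list of directed edges."""
    out_deg = Counter(source for source, _ in edges)
    in_deg = Counter(target for _, target in edges)
    return out_deg, in_deg


def shuffle_edges(edges, n_swaps, seed=0):
    """Randomize a directed network by target swaps.

    Two edges (a -> b) and (c -> d) are replaced by (a -> d) and (c -> b),
    which leaves every node's in- and outdegree unchanged. Swaps that would
    create an already existing edge are skipped.
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    done = 0
    attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        new_first, new_second = (a, d), (c, b)
        if new_first in edge_set or new_second in edge_set:
            continue
        edge_set.difference_update([(a, b), (c, d)])
        edge_set.update([new_first, new_second])
        edge_list[i], edge_list[j] = new_first, new_second
        done += 1
    return edge_list


# Toy network, for illustration only.
toy = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "z"), ("c", "x"), ("c", "z")]
shuffled = shuffle_edges(toy, n_swaps=10)
print(shuffled, degrees(shuffled) == degrees(toy))
```

The overlap with a reference network can then be compared between the measured network and many such shuffled copies.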

The topology of the network was analysed by studying the distributions of indegree, outdegree, and total degree. We also analysed the correlation between in- and outdegrees by comparing ranks. Each gene was ranked according to indegree and outdegree, and the two ranks were plotted against each other. We also studied the size and number of connected components, and how the network was held together when eliminating the most connected genes. Connected components are parts of the graph where there is a path from any gene to every other gene. If the graph is disconnected, there are two or more components, with no path from any node in one component to a node in a different component. Such components are found using an algorithm called “depth-first search”, where for each node, its connected nodes are found. This is done in a recursive manner until no more nodes are found.
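A depth-first search for connected components can be sketched in Python as follows. The sketch is illustrative: it treats edges as undirected when looking for paths (an assumption, since the source does not say how edge direction was handled), and it uses an explicit stack instead of recursion so that large networks do not exceed the recursion limit. The function name and the toy graph are hypothetical.

```python
def connected_components(nodes, edges):
    """Return the connected components of a graph as sorted lists of nodes.

    Edges are treated as undirected. Each unvisited node starts a new
    depth-first search that collects everything reachable from it.
    """
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in neighbours[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Toy graph with a hub (node 1), for illustration only.
nodes = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (1, 4)]
print(connected_components(nodes, edges))      # a single component
print(connected_components([2, 3, 4], []))     # hub removed: three singletons
```

Removing the most connected node from the toy graph breaks it into isolated nodes, which is the kind of fragmentation measured when eliminating highly connected genes.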

We studied the correspondence between network structure and function by looking at groups of genes assigned to the same cellular role in YPD, and by looking at neighbourhoods and subnetworks. Genes with the same cellular role were grouped, the median in- and outdegree of each group was calculated, and the groups were ranked accordingly. For neighbourhoods, we selected a core set of 20 genes known to be involved in pheromone response, and studied the composition of the neighbourhood, defined as the genes that were at most one edge away from any gene in the core set. To study how well gene disruption mapped to transcription factor binding, we selected the neighbourhood of the transcription factor Ste12 and studied how often we could find the consensus DNA motif for binding, TGAAACAA, in the promoter regions 600 bp upstream of the transcription start site for these genes.
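
A minimal sketch of the two operations described above is given below. It assumes that the neighbourhood subnetwork contains all edges between neighbourhood members and that the motif is searched on the given strand only. Both choices, and the function names, are illustrative assumptions.

```python
def neighbourhood(edges, core):
    """Return genes at most one edge away from the core set, and the edges among them."""
    core = set(core)
    members = set(core)
    for u, v in edges:
        if u in core or v in core:
            members.update((u, v))
    sub_edges = [(u, v) for u, v in edges if u in members and v in members]
    return members, sub_edges

def has_motif(promoter, motif="TGAAACAA"):
    """Check whether a promoter sequence contains the exact consensus motif."""
    return motif in promoter.upper()
```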

7.2.3 Results

The reference network constructed from YPD had 274 genes and 827 edges between these. For each network constructed for γ in the range 2.0, 2.1, ..., 4.0, we measured the number of overlapping edges between the measured network and the reference network, as well as the average number of overlapping edges for 1000 randomized networks. The result, shown in Fig. 1, Paper II, was that our measured network overlapped significantly more with the reference network than the randomized networks did. The overlap increased from 9 to 16% in that range, indicating that with increased stringency in how the initial measured network was built from data, the concordance with data retrieved from YPD also increased.

The distributions of indegree and outdegree in the network were shown to roughly follow a power law (Fig. 2 of Paper II), something that was also confirmed in independent studies by Wagner [231], Featherstone and Broadie [232] and Farkas et al. [233]. The genes with the highest outdegrees were genes with regulatory or very central positions, while the genes with the highest indegrees were found to be mostly involved in metabolism. As shown in Paper II, Fig. 3, it is rare for a gene to have both a high outdegree and a high indegree, meaning that hubs are either regulatory or regulated.

The study of connected components showed that even if up to 10% of the most highly connected nodes are removed, the network remains mainly one large connected component containing most of the genes. Only when the network is built from a stringent filtering of the initial microarray data, and 10% of the most connected genes are removed, will it break down into several components of roughly the same size. Those are star-shaped though, meaning that they are built around one central node that is the single link between all the other genes in that star. This seems to go against both the general idea that networks are modular, and the sensitivity to attacks on highly connected nodes that is a common feature of systems with a scale-free structure. The latter, at least, may be explained by the fact that the dataset is already filtered: the genes that are so central that their deletion would disrupt the whole system could not be in the tested set, since their deletion would be lethal and no data could be measured.

The idea that regulatory genes have higher outdegrees and that genes involved in metabolism have higher indegrees was also supported by the degree medians measured for genes grouped according to YPD cellular role (Tab. 1, Paper II). The analysis of the neighbourhood around core genes for pheromone response showed that the neighbourhood was enriched for genes involved in the same function. At γ = 4.0, the neighbourhood contains 63 genes and 115 edges (Fig. 7.2). When “leaves” (nodes with only one incoming edge) were taken out, the network consisted of 36 genes, of which 18 belong to the initial set of 20 core genes. Of the remaining 18 genes not in the core set, 8 were also annotated as being involved in pheromone response, showing that these genes are well connected to each other and form a highly clustered network region.

The analysis of promoter regions in genes linked to Ste12 also showed a functional relationship: out of 19 genes known to be induced by Ste12, 12 were connected to Ste12 in the disruption network, but only 2 out of 13 genes known not to respond to the pheromone were connected. 8 genes carried the consensus DNA binding motif of Ste12 in their promoter regions, and 7 of those were both connected in the disruption network and known to be induced by Ste12.

Figure 7.2: The gene neighbourhood of a core set of 20 pheromone genes (coloured), from the disruption network at γ = 4.0.

7.2.4 Discussion

It is important to note that because only 248 of the possible 6318 single gene knockouts were included in this study, the gene network will represent only a sample of the full network, and any conclusions drawn will be affected by how well the 248 represent the full set. It also means that the maximum indegree of any gene is 248, compared to 6318 for the maximum outdegree. Also, it is not clear how well the notion of a path works in a disruption network. An edge from gene X to gene Y means that if gene X is disrupted, the expression level of gene Y will be affected. How this effect is carried from X to Y is unknown, and may be the result of many different regulatory steps. The only thing we can deduce is that Y is somehow downstream of X in a regulatory chain. Similarly, a chain X → Y → Z does not have the meaning of a true path as we think of them in a signalling pathway. The X → Y link is measured in a system where X has been perturbed, but Y → Z comes from an experiment where Y has been perturbed, representing an entirely different system. Indeed, if we have a chain X → Y → Z without seeing a link X → Z, the disruption of X will have no effect on Z. Despite this, the paper shows that systematic disruption and genome-wide measurement of biological systems give us insights into these systems, and may give us evidence of the function of unknown genes and predictions of the centrality and regulatory importance of individual genes.

7.3 Paper III: Discovering novel cis-regulatory motifs using functional networks

In prokaryotes, genes that are linked functionally, for instance by partaking in the same pathway, often form operons that are expressed under the control of a single promoter. This paper hypothesizes that synergistic control of expression for functionally linked genes also occurs in eukaryotes. Support for that theory includes, for instance, the finding by Hughes et al., in their compendium of systematic yeast deletion mutants, that knockouts of functionally related genes have expression profiles that cluster together [166]. Also, it is known that by using clusters of coexpressed genes, we can find DNA sequences for transcription factor binding by looking for motifs that are overrepresented in the promoter regions of genes in such a group [61], and there are many examples of DNA motifs present in the genes involved in several linked steps in pathways (Fig. 7.3). In Paper III, we combine functional networks with sequence information in yeast to find sequence motifs that are enriched within a subnetwork or a pathway compared to random groups of genes, indicating that a transcription factor may bind that motif to regulate the pathway synergistically.

Figure 7.3: In the synthesis of DNA from diphospho-nucleotides, the compounds (large nodes) are modified by the genes indicated in the three presented steps. One (*) or both (**) of the DNA patterns C[AT].GTTT.GTG and ACGCG.AA.T, where [AT] means either A or T and . is the wildcard symbol, are present in many of the promoter regions of the involved genes for all steps.

7.3.1 Methods

We use the full metabolic network from the Kyoto Encyclopedia of Genes and Genomes, KEGG [234], and two interaction networks with protein complexes, from Cellzome [169] and MDS [168]. These networks are bipartite, with two types of components. In the metabolic network, each compound has a number of enzymes that have that compound either as substrate or product. We make a unipartite graph by linking genes that code for enzymes linked to the same compound. Similarly, each of the protein complexes has proteins linked to it, the ones partaking in the complex, and we make a unipartite graph by connecting proteins that partake in the same complex. Since we are looking for motifs specific to small groups of genes, it is not in our interest to study complexes or compounds that bind promiscuously. Therefore, we removed all compounds that were connected to 60 or more proteins, for instance H2O, O2 and CO2. In the resulting networks, we removed interactions between genes with highly homologous promoter regions, since these would bias the entire promoter region for enrichment, not only the TF binding motifs that we are seeking. For each compound or complex linked to three or more genes, we extracted the promoter regions, 600 bp upstream of the transcription start site, and searched for DNA patterns of at least 8 nucleotides, with two wildcard symbols allowed, using the TEIRESIAS algorithm [235]. The algorithm reports patterns that match the search criteria and have support in a group of at least 3 or 4 genes, the core group (3 in groups of fewer than 10 genes, 4 in larger groups). Reported patterns were maximal, i.e. not specific cases of other patterns. For instance, the pattern “CG.TT” (where “.” is the wildcard symbol) is a specific case of “C..TT”, so if both are found matching the criteria in a group of sequences, only “CG.TT” would be reported. For each of the patterns found, we retrieved all genes in the full functional network that carried that pattern in their promoter region, and formed a pattern network, a fully connected network consisting of these genes. The overlap of this network and the original functional network was scored with the function

$$ S = \sqrt{\sum_i \frac{1}{a_i + b_i - 1}} \tag{7.10} $$

where $i$ runs over all edges in common between the functional network and the pattern network, excluding edges between genes in the core group, and $a_i$ and $b_i$ are the total numbers of edges adjacent to the two nodes linked by each such edge. This score was compared to the similarly calculated scores for random groups of genes drawn from the full functional network, with the group size being the same as in the proposed pattern network. The significance of the measured pattern was expressed as the number of standard deviations from the mean, estimated from the random networks of the same size.
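
The score in Eq. 7.10 and its comparison to random gene groups can be sketched as follows. The sketch assumes that the edge counts $a_i$ and $b_i$ are node degrees in the functional network, which the text does not state explicitly, and the function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter

def overlap_score(functional_edges, pattern_genes, core_genes=()):
    """Eq. 7.10: score the functional edges whose two genes both carry the pattern."""
    pattern_genes, core_genes = set(pattern_genes), set(core_genes)
    edges = {frozenset(e) for e in functional_edges if e[0] != e[1]}
    # Assumption: a_i and b_i are degrees in the functional network.
    degree = Counter(g for e in edges for g in e)
    total = 0.0
    for e in edges:
        u, v = tuple(e)
        if u in pattern_genes and v in pattern_genes:
            if u in core_genes and v in core_genes:
                continue  # edges inside the core group are excluded
            total += 1.0 / (degree[u] + degree[v] - 1)
    return math.sqrt(total)

def z_score(functional_edges, pattern_genes, core_genes=(), n_random=200, seed=0):
    """Standard deviations between the observed score and random groups of equal size."""
    rng = random.Random(seed)
    genes = sorted({g for e in functional_edges for g in e})
    observed = overlap_score(functional_edges, pattern_genes, core_genes)
    size = len(set(pattern_genes))
    null = [overlap_score(functional_edges, rng.sample(genes, size))
            for _ in range(n_random)]
    sd = statistics.pstdev(null)
    return (observed - statistics.mean(null)) / sd if sd > 0 else float("nan")
```

The term $1/(a_i + b_i - 1)$ gives less weight to shared edges between highly connected genes, which are more likely to be recovered by chance.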

7.3.2 Results

The KEGG network generated 197,922 patterns with TEIRESIAS, whereas the Cellzome data generated 197,111 and the MDS data 320,405. We generated 200 random networks for each size in the range 2, 3, ..., 500 for each dataset, and scored these using Eq. 7.10. Figure 3 in Paper III shows the distribution of overlap scores for random networks of various sizes, and for real pattern networks. It is clear that overlap scores from real networks have a very different distribution, with a range of patterns scoring considerably higher than would be expected by chance. 647 motifs were selected, filtered for redundancies, and clustered based on genomic location in the promoter. We assumed that true TF binding motifs would be located at roughly the same distance from the transcription start site, so for each motif we retrieved the exact genomic locations for the genes it was present in. Two motifs were clustered together if their two sets of genomic location coordinates, with a tolerance of ±5 bp, were overlapping for at least 40% of the promoter sequences in one of the groups. An all-vs-all clustering of the 647 motifs produced 42 clusters, for which sequence logos [66] are reported in Table 1 in Paper III.

Many of the discovered motifs are known to be bound by transcription factors, such as GGTGGCAAA, which is bound by Rpn4p and regulates the transcription of genes involved in the proteasome [236], or TGACTC, which is bound by the transcription factor GCN4 that regulates genes involved in amino acid synthesis [237] (Fig. 7.4). A strong motif is AAAATTTT, which is found in many genes and is significantly linked to parts of the functional network that are involved in central transcription and translation processes. This pattern is known to bend DNA and to have a general effect on the accessibility of promoter binding for transcription factors at nearby sites, instead of binding a specific factor itself [238]. We hypothesize that this provides a general switch for protein production. These poly(dA-dT) motifs have been investigated extensively, and after a complete mapping of nucleosome positions in yeast [239], it was found that they were enriched in sequence regions not bound by nucleosomal proteins, giving better access to neighbouring transcription factor binding motifs.

We found that the discovered motifs were located at roughly the same positions relative to the transcription start site, also indicating that the motifs have a role in transcription regulation.


Figure 7.4: The transcription factor GCN4 binds a TGACTC DNA motif, as shown in the sequence logo. The network links enzymes that are linked to the same compound in KEGG and contain the TGACTC DNA motif in their promoter regions, within 600 bp upstream of the transcription start site.

7.3.3 Discussion

This work uses functional networks and sequence information to show that genes connected in pathways often carry the same specific DNA motifs in their promoter regions, and at roughly the same locations, indicating that these genes may be regulated in synergy by the same transcription factor or the same sequence-dependent transcriptional mechanism. The method does not need the support of any data other than a functional network and the sequences in which we anticipate regulatory DNA motifs to be present.

Obviously, this method depends heavily on the method used to discover motifs in groups of sequences and on the functional networks used for grouping genes. With advancements in network mapping and in the discovery and cataloguing of transcription factor binding, chromatin structure and regulatory elements, also in species other than yeast, this method could be applied to link pathway function with regulatory mechanisms much more widely than when Paper III was published.

7.4 Paper IV: From gene networks to gene function

This paper combines different types of gene networks for yeast to find functional information by analysing the overlap of gene neighbourhoods. We showed in Paper II that gene neighbourhoods in disruption networks have functional relevance, and here we expand on this idea by looking at transcription factor binding networks, both inferred from experiments, such as “ChIP on chip”, and predicted from identification of DNA motifs known to bind transcription factors.

7.4.1 Methods

Our method relies on the assumption that if two genes have gene network neighbourhoods that overlap more than expected, the two genes themselves should have a functional relationship that depends on the two types of networks compared. We used three types of gene networks:

1. Gene disruption networks, where gene X has a directed connection to gene Y if the deletion of X significantly affects the expression level of gene Y. We used the network inferred in Paper II.

2. Predicted transcription factor binding networks, where gene X is connected to gene Y if X is a transcription factor with a consensus DNA motif for binding present in the promoter of gene Y. The network used was published in [165].

3. Experimentally derived transcription factor binding networks, where gene X is connected to gene Y if X is a transcription factor that has been found to bind the promoter region of gene Y in a “ChIP on chip” experiment. Four such networks were used, based on results from [41, 42, 43, 44] that tested two, three, nine and 106 transcription factors, respectively.

We defined a source gene as a gene with outgoing edges, and a target gene as a gene with incoming edges. Every source gene in each network was compared to every source gene in the same or in other networks, by evaluating the overlap of their target gene sets with a hypergeometric test (Fig. 7.5). If gene X has edges to a group of $M_X$ genes (out of $N$ possible), gene Y targets $M_Y$ genes out of $N$, and the observed overlap of the two groups is $K$, the probability of finding at least this overlap by chance is given by

$$ p = \sum_{k=K}^{N} \frac{\binom{M_X}{k}\binom{N-M_X}{M_Y-k}}{\binom{N}{M_Y}} \tag{7.11} $$


To correct for multiple testing, the Holm method was used [240], and a link between two source genes was deemed significant if p < 0.01. Depending on which networks the two source genes came from, their connection has a different interpretation, as shown in Fig. 2, Paper IV.
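
The test in Eq. 7.11 and the Holm correction can be sketched as below. The sum is truncated at min($M_X$, $M_Y$), since all terms with larger $k$ are zero. The function names are hypothetical, and the step-down procedure shown is the standard Holm method, assumed to match the one used in Paper IV.

```python
from math import comb

def overlap_p_value(N, MX, MY, K):
    """Eq. 7.11: probability of an overlap of at least K between two target sets."""
    upper = min(MX, MY)  # terms with k above this are zero
    tail = sum(comb(MX, k) * comb(N - MX, MY - k) for k in range(K, upper + 1))
    return tail / comb(N, MY)

def holm_significant(p_values, alpha=0.01):
    """Holm step-down procedure; returns one True/False flag per test."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        if p_values[idx] * (m - rank) >= alpha:
            break  # all larger p-values are also non-significant
        significant[idx] = True
    return significant
```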

Figure 7.5: The size of the overlap between the target sets is used as a predictor of a functional connection between the source genes. If the overlap is larger than would be expected by chance, we infer that the source genes are connected by a functional link that is defined by the nature of the networks we retrieve the target sets from.

The resulting networks of linked source genes were compared with validation networks of different types, representing different functional information:

1. Protein-protein interaction networks, both from yeast two-hybrid screens [167, 170] and from mass spectrometry based methods [168, 169]. Two levels of stringency were used, one where networks were constructed simply as the union of all found connections, and one where each connection required the support of two independent data sets.

2. Manually curated protein complex networks [177].

3. Literature networks constructed by identifying co-occurring gene names in Medline abstracts for yeast. One network was constructed as the union of all co-citations supported by at least two different Medline abstracts, and one network required three different abstracts.

The predictive power of our data-derived network was calculated against these reference networks using Receiver Operating Characteristics (ROC) plots. These plot the probability of detection as a function of the probability of false alarm, or in other terms, the rate of true positives (sensitivity) as a function of the false positive rate (1 - specificity). A model with good predictive power would have a high number of true positives even for a low number of false positives, whereas a totally random predictor would have no higher rate for true than for false positives. In our case, the reference networks are treated as “true” positives, although these are obviously predicted from data themselves, and a good predictive power for our model compared to the reference networks means that the inferred connections between source gene pairs match well the gene pairs discovered in the functional networks.
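
A minimal sketch of how such a ROC curve can be computed is given below. It assumes that each tested gene pair has a p-value from the overlap test, that pairs are predicted in order of increasing p-value, and that a pair counts as a true positive if it occurs in the reference network. Ties between p-values are not treated specially, and the function names are hypothetical.

```python
def roc_points(scored_pairs, reference_pairs):
    """Return (false positive rate, true positive rate) points.

    scored_pairs maps a gene pair (tuple) to its p-value; pairs with
    smaller p-values are predicted first.
    """
    reference = {frozenset(p) for p in reference_pairs}
    ranked = sorted(scored_pairs, key=lambda p: scored_pairs[p])
    labels = [frozenset(p) in reference for p in ranked]
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for is_true in labels:
        if is_true:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg if n_neg else 0.0,
                       tp / n_pos if n_pos else 0.0))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve; 0.5 corresponds to a random predictor."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```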

7.4.2 Results

For 15,061 source gene pairs, 23,758 target set comparisons were carried out (the same source gene pair can be present in several network comparisons). 816 unique gene pairs between 159 different genes were found to have significant connections. Of these, 741 were connections between two different source genes based on target set comparisons within the same network. 143 came from different source genes in different networks, and 17 from target set comparisons between the same gene in two different networks. We found that networks derived from the same type of experiment are similar. For instance, source gene pairs connected within an experimentally derived transcription factor binding network were often connected when comparing networks of the same type but derived from different datasets. For these, 174 connections were significant out of 3414 tested, but when compared to the yeast deletion mutant networks, only 49 out of 7664 were significant at p <= 0.01, Holm corrected.

When compared to the reference networks, ROC plots (Fig. 3 in Paper IV) show that overlap between target sets can indeed be used as a predictor of functional relationships, in particular when such overlaps are calculated between genes in the same network. In protein-protein interaction reference networks, 11 connections were found between genes that occurred in “ChIP-on-chip” networks, with an overlap of 6 out of 11. For co-citation and protein complex reference networks, the best predictions were found in the deletion mutant networks, which found 23 out of 56 and 7 out of 14, respectively.

We can see that genes involved in the same biological process cluster in our predicted networks, as shown in Figure 4, Paper IV, where clusters of genes involved in pheromone response and cell cycle regulation have been highlighted.

7.4.3 Discussion

Paper IV presents a method to integrate networks derived from different data sources, and shows that doing so can increase the power to predict functional information, like protein-protein interactions, co-citations in Medline abstracts, or protein complexes. The method formalizes a “guilt-by-association” approach, and is easily extendable to comparisons between networks in general. How predicted links should be interpreted will obviously depend on which data sources we combine. A limitation of the method is that it requires target sets to be measurable in both compared datasets, since the overlap score is calculated based on the assumption that the union of the two compared target gene sets is present in both networks.

7.5 Paper V: A genome-wide association study identifies risk loci for type 2 diabetes

A case-control genome-wide association study was done to identify risk loci for type 2 diabetes. In the first stage, 392,935 SNPs were tested for association to disease in almost 700 case and 700 control subjects, using the Illumina Human1 BeadChip, testing 109,365 SNPs selected in a gene-centric manner, and the HumanHap300 chip, testing 317,503 SNPs selected based on the HapMap Phase I data. The 59 most significant SNPs were tested again in a larger population, and 8 of these were found to be significant at p < 0.05, Bonferroni corrected for multiple hypothesis testing. These eight SNPs are located in 5 different haplotype blocks. One is located in the gene TCF7L2, which has already been found to be associated with type 2 diabetes. One is in the zinc transporter SLC30A8, and another block contains the three genes HHEX, KIF11 and IDE. Two blocks on chromosome eleven are borderline significant at nominal p < 0.05, one in the gene EXT2 and one in a gene of unknown function.

7.5.1 Methods

A two-stage experimental design was used (Fig. 7.6), where in the first stage a French cohort of 690 cases and 670 control subjects was tested for association. Controls were over 45 years old, had normal fasting glucose levels, and had a body mass index (BMI) of less than 27. Cases were type 2 diabetics according to American Diabetes Association (ADA) criteria, had a first-degree relative affected by diabetes, and were non-obese (BMI < 30 kg/m²).

Initial quality control resulted in a number of samples being excluded from further analysis, for instance because of duplicated assays, incorrectly phenotyped patients, individuals of non-European ancestry, and samples for which the call rate was below 95% as given by the Illumina BeadStudio software.

Figure 7.6: In the first stage of the genome-wide association study, almost 393,000 SNPs were genotyped and passed quality control filters. The 59 top SNPs were fast-tracked for second stage validation, while a full second stage of 20,000 SNPs is underway, together with follow-up studies including fine-mapping of the most associated loci. Figure by Rob Sladek.

For both sets, association tests were carried out on a subset of SNPs meeting the following criteria:
• Located on an autosomal chromosome
• Call rate ≥ 95% per SNP
• Minor allele frequency (MAF) ≥ 0.01 for both cases and controls
• Hardy-Weinberg equilibrium p-value (pHWE) ≥ 0.001 for controls

Sex chromosomes were analysed separately.

For both datasets, a 2x3 genotype count table (Tab. 7.1) was formed for each SNP. The allele for which the frequency was higher in the cases than in the controls was denoted “A”, and the other allele “a”.

            aa     aA     AA     Sum
Cases       r_0    r_1    r_2    R
Controls    s_0    s_1    s_2    S
Count       n_0    n_1    n_2    N

Table 7.1: 2x3 genotype count table.

For each SNP i, the association between marker and disease was measured using Armitage’s trend test for additive, dominant and recessive models, as described in [198]. For clarity, SNP subscripts are dropped from the notation for the quantities in the count table and their subsequent use in the formulas below, where this is unambiguous.


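$$ X^2_{A,i} = \frac{N\,[N(r_1 + 2r_2) - R(n_1 + 2n_2)]^2}{R(N-R)\,[N(n_1 + 4n_2) - (n_1 + 2n_2)^2]} \tag{7.12} $$

$$ X^2_{D,i} = \frac{N\,[N(r_1 + r_2) - R(n_1 + n_2)]^2}{R(N-R)\,[N(n_1 + n_2) - (n_1 + n_2)^2]} \tag{7.13} $$

$$ X^2_{R,i} = \frac{N\,(N r_2 - R n_2)^2}{R(N-R)\,(N n_2 - n_2^2)} \tag{7.14} $$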

A max statistic was formed across these,

$$ X^2_{\max,i} = \max\left\{ X^2_{A,i},\, X^2_{D,i},\, X^2_{R,i} \right\}, \tag{7.15} $$

to select the strongest obtainable association for any of the three models.

To estimate empirical p-values, $N_{perm} = 10000$ permutations of the disease state vector were done, and for each model $\xi$ (including the maximum), each SNP $i$ and each permutation $j$, the corresponding statistic $X^2_{\xi,i,j}$ was calculated. Two types of empirical p-values were measured:

• A p-value indicating how likely a statistic measured for a SNP is exceeded among the permuted data for that SNP:

$$ m_{\xi,i} = \#\left\{ X^2_{\xi,i,j} : X^2_{\xi,i,j} > X^2_{\xi,i},\; j \in [1, \ldots, N_{perm}] \right\} \tag{7.16} $$

$$ p_{\xi,i} = \frac{m_{\xi,i} + 1}{N_{perm} + 1} \tag{7.17} $$

where # is the cardinality of the set.

• A p-value indicating how likely a statistic measured for a SNP is exceeded across all permutations and all SNPs:

$$ m^{gw}_{\xi,i} = \#\left\{ X^2_{\xi,i',j} : X^2_{\xi,i',j} > X^2_{\xi,i},\; j \in [1, \ldots, N_{perm}],\; i' \in [1, \ldots, N_{SNPs}] \right\} $$

$$ p^{gw}_{\xi,i} = \frac{m^{gw}_{\xi,i} + 1}{N_{perm} \cdot N_{SNPs} + 1} \tag{7.18} $$
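
As an illustration, Eqs. 7.12 to 7.17 can be sketched for a single SNP as follows. Genotypes are assumed to be coded as the number of “A” alleles (0, 1 or 2), allele labels are kept fixed across permutations, and the function names are hypothetical. The genome-wide p-value of Eq. 7.18 is not covered here.

```python
import random

def trend_statistics(cases, controls):
    """Eqs. 7.12-7.15 from genotype counts (aa, aA, AA) in cases and controls."""
    r0, r1, r2 = cases
    n0, n1, n2 = (r + s for r, s in zip(cases, controls))
    R = r0 + r1 + r2
    N = n0 + n1 + n2

    def stat(numerator, denominator):
        # Return 0 when the statistic is undefined (e.g. a monomorphic SNP).
        if denominator <= 0 or not 0 < R < N:
            return 0.0
        return N * numerator ** 2 / (R * (N - R) * denominator)

    additive = stat(N * (r1 + 2 * r2) - R * (n1 + 2 * n2),
                    N * (n1 + 4 * n2) - (n1 + 2 * n2) ** 2)
    dominant = stat(N * (r1 + r2) - R * (n1 + n2),
                    N * (n1 + n2) - (n1 + n2) ** 2)
    recessive = stat(N * r2 - R * n2, N * n2 - n2 ** 2)
    return {"A": additive, "D": dominant, "R": recessive,
            "max": max(additive, dominant, recessive)}

def genotype_counts(genotypes, status):
    """Build the two rows of the 2x3 table from per-individual data."""
    cases, controls = [0, 0, 0], [0, 0, 0]
    for g, is_case in zip(genotypes, status):
        (cases if is_case else controls)[g] += 1
    return cases, controls

def empirical_p(genotypes, status, model="max", n_perm=1000, seed=0):
    """Eqs. 7.16-7.17: permute the disease state vector and count exceedances."""
    rng = random.Random(seed)
    observed = trend_statistics(*genotype_counts(genotypes, status))[model]
    permuted = list(status)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)
        if trend_statistics(*genotype_counts(genotypes, permuted))[model] > observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Adding 1 to both numerator and denominator means that the smallest p-value obtainable is 1/(n_perm + 1), so the number of permutations sets the resolution of the estimate.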

7.5.2 Results

Based on the computed statistics and their corresponding theoretical and empirical p-values, we used qq-plots to study the distribution of measured p-values against expected ones and to determine a p-value threshold below which p-values clearly deviate from the null hypothesis. SNPs with p-values below this threshold are implicated as significantly associated with disease, and were validated in a larger cohort.

Subpopulation structure can give rise to a variance inflation for the measured test statistics and a deviation from the expected χ²(1) distribution, so that we have $X^2_\xi \sim \lambda \chi^2(1)$. Systematic genotyping errors can also contribute to this effect. This variance inflation factor λ [200] can be estimated in order to adjust the measured test statistics for subpopulation effects. Using the method described in [201], we estimated λ for each model separately by taking the ratio between the mean of the measured statistic and the mean of the expected one for the SNPs with the lowest 90% of the measured statistic:

$$ \lambda = \frac{\frac{1}{M}\sum_{i=1}^{M} X^2_{\xi,i}}{\frac{1}{M}\sum_{i=1}^{M} F^{-1}\!\left(\frac{i}{N+1}\right)}, \tag{7.19} $$

where $F$ is the cumulative distribution function of the χ² distribution with one degree of freedom, the $X^2_{\xi,i}$ have been ranked in increasing order, $N$ is the total number of SNPs, and $M$ is the number of SNPs considered for the estimate (90% of $N$) (Table 7.2).

Model        100K      300K
Additive     1.1447    1.1219
Dominant     1.0901    1.0597
Recessive    1.1085    1.1145

Table 7.2: Estimates of the variance inflation factor.
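
The estimate in Eq. 7.19 can be sketched using only the Python standard library. The quantile function of the χ² distribution with one degree of freedom is obtained from the normal quantile function, since a χ²(1) variable is the square of a standard normal variable. The function names are hypothetical, and dividing each statistic by λ is shown as one common form of the adjustment, which is an assumption about how the adjustment was applied.

```python
from statistics import NormalDist

def chi2_1df_quantile(p):
    """Inverse CDF of the chi-square distribution with one degree of freedom."""
    # P(Z^2 <= x) = 2 * Phi(sqrt(x)) - 1, so sqrt(x) = Phi^-1((1 + p) / 2).
    return NormalDist().inv_cdf((1.0 + p) / 2.0) ** 2

def inflation_factor(stats, fraction=0.9):
    """Eq. 7.19: ratio of observed to expected mean over the lowest statistics."""
    ranked = sorted(stats)
    n = len(ranked)
    m = int(fraction * n)
    if m < 1:
        raise ValueError("too few statistics to estimate lambda")
    observed = sum(ranked[:m]) / m
    expected = sum(chi2_1df_quantile(i / (n + 1)) for i in range(1, m + 1)) / m
    return observed / expected

def adjust(stats, lam):
    """Scale the test statistics by the estimated inflation factor (assumed form)."""
    return [x / lam for x in stats]
```

Restricting the estimate to the lowest 90% of the statistics keeps truly associated SNPs, which are expected among the largest values, from inflating λ.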

The adjusted qq-plot shows a distribution of p-values that aligns better with the expected values. From the histogram plots, it becomes obvious that this adjustment has a very strong effect on the p-value distribution for the different models, with a strong flattening of the curve, and many of the lower p-values getting closer to the expected ones. We have used these plots for the maximum statistic to determine the threshold at which SNPs start deviating from the null hypothesis. As expected, the p-values associated with the max statistic are more biased towards lower values than those of the other models and will not be distributed uniformly even under the null hypothesis.

To select the SNPs that would be passed to fast-track confirmation, we selected the ones that were deviating heavily from the baseline in the qq-plots comparing the p-value distribution to an expected uniform distribution (Fig. 7.7). For the 100K data, a threshold of p ≤ 1 x 10^-4 was chosen, and for the 300K set, p ≤ 5 x 10^-5 was used.

An observation is that although not many more SNPs pass the threshold for the 300K than for the 100K data, the p-values for these SNPs are considerably lower. This can indicate a better selection of SNPs on the 300K assays and fewer false positives.

59 SNPs from phase I were genotyped successfully and passed quality control criteria in a fast-track second stage, consisting of 2,617 cases and 2,894 control subjects. Using the same MAX statistic as above, and 10,000,000 permutation tests, 8 SNPs passed a significance level of 0.05, Bonferroni corrected (p < 8.8 x 10^-4) (Table 7.3).


Figure 7.7: QQ plot for the max statistic for the Human1 data. Uniform distribution (magenta, diagonal), p-value using the χ² distribution with one degree of freedom, adjusted for the λ estimate (blue, middle) and unadjusted (green, lower).

SNP           Chromosome    Position     p (MAX, permut.)    Gene
rs7903146     10            114748339    <1.0 x 10^-7        TCF7L2
rs13266634    8             118253964    5.0 x 10^-7         SLC30A8
rs1111875     10            94452862     7.4 x 10^-6         HHEX
rs7923837     10            94471897     2.2 x 10^-5         HHEX
rs7480010     11            42203294     2.9 x 10^-4         LOC387761
rs3740878     11            44214378     2.8 x 10^-4         EXT2
rs11037909    11            44212190     4.5 x 10^-4         EXT2
rs1113132     11            44209979     8.1 x 10^-4         EXT2

Table 7.3: SNPs significantly associated with disease in fast-track second stage.

7.5.3 Discussion

The eight SNPs that were significantly associated with type 2 diabetes are located in five haplotype blocks on chromosomes 8, 10 and 11. The strongest association is for rs7903146, an intronic SNP located in the TCF7L2 gene, confirming many previous studies associating this gene with type 2 diabetes. The second strongest associated SNP, rs13266634, is a nonsynonymous SNP in the zinc transporter SLC30A8. This gene is only expressed in secretory vesicles of β-cells, and transports zinc into the cells, which is essential for storing and secreting active insulin [241]. HHEX is a target of the WNT signalling pathway, like TCF7L2, and is needed for pancreatic development [242]. EXT2 is involved in hedgehog signalling, and in insulin synthesis and development of the pancreas [243, 244]. The finding of the five genes can have important implications for unravelling the mechanisms behind type 2 diabetes and may lead to new diagnostic or, eventually, therapeutic methods. This work also provides good support for genome-wide association studies, and shows that a whole genome scan can, given reasonable power to detect, find significantly associated SNPs for complex diseases.

8. Conclusion

During the last decade, the way we approach the analysis of biological systems has changed radically due to the development of high-throughput measurement techniques and the advancement of computational systems and algorithms. Entirely new research fields such as bioinformatics and systems biology have developed, and promise a revolution in the understanding of how complex biological systems work, from the most detailed level to whole-genome systems. We can join these advancements with those in engineering and physics to study the world around us with the insight that these fields all study systems governed by the same physical laws, with principles for regulation and control in man-made systems resembling those found in nature. Yet, we must not forget that the models of natural systems that we know are just that, models: they allow us, at least in some respects, to explain observations and predict outcomes under various restrictions, but they are by no means complete or exhaustive. It is easy, maybe because of the great impact on (and great contribution to) modern biology made by computer science, to be somewhat “binary” in one’s thinking and consider interactions as being either present or absent. Representing networks as graphs of nodes and edges is convenient and helps us visualise complex systems, but it is also somewhat deceptive in its simplicity. We forget about the many layers of intermediate and weak interactions that can have a large effect on the system, even if each of them is too small to detect on its own. Likewise, we are repeatedly told that this is the “post-genomic” era, meaning that the mapping of biological components is now done, and that we can now move on to finding the functions of these components. I believe we should be careful with statements like that.
Great feats, such as the mapping of the human genome, should make us more humble before the complexity and richness of biological systems and inspire us to investigate these systems with continued breadth and depth, remaining skeptical that “the map” is really showing us everything. We have seen that to study a complex system, we need modelling approaches where the detail and complexity match the questions we seek answers to. No system-wide high-throughput technique will ever replace meticulously detailed studies of single genes or proteins, because each single component has unique chemical properties and working environments. But at the same time, thousands of independent small-scale studies can never replace a high-throughput parallel measurement giving us a fingerprint or a snapshot of a complex biological system. These approaches address different questions related to different levels of complexity, but all concern parts of the same system. The signalling between levels of complexity ties it all together. A small change on the molecular scale can have a system-wide impact, such as the single mutations increasing the risk for diabetes.

The presence of noise in the data used for modelling complex biological networks raises questions about how this noise, and the data processing methods, affect the structure of the inferred network models. Preliminary studies following the work reported in this thesis indicate that the noise has a non-uniform role, shaping the topology of the network depending on data processing features such as sampling and thresholding. This is an interesting field to study, and it also raises questions about the role of noise as a driving force in the evolution of gene regulatory networks.

A key challenge is to combine the advances at both extremes of the spectrum of measurement techniques: methods to measure single molecules in single living cells, to precisely model the regulation and noise in gene circuits, and methods to mine genome-wide, or even environment-wide, data to map the state of the system under a given experimental condition. The study of how small gene circuits couple to build pathways using a variety of signals and regulatory mechanisms may give insight into how to predict pathway or even larger-scale dynamics based on the small-scale components and interactions, or even how to engineer full pathways with desired biological functions and regulatory mechanisms.

I believe that we will see great advances in measuring and understanding biological systems, but also that genome biology has many variables we have yet to discover. Every year, new studies show previously unknown types of regulation in genetic systems, and even though the human genome has been mapped, the translation of that code to actual biological components and regulatory mechanisms remains incomplete to an unknown degree. As just one example, a recent study found more than 15,000 new, unannotated human transcripts [184], and we may find more when we can better predict and measure the complete set of components in a cell. We have a lot of work ahead of us, and many fascinating new things to discover.

9. Sammanfattning på svenska - Summary in Swedish

This thesis deals with the regulation of complex biological systems at different scales. Complex systems are characterised by properties that emerge as the systems grow but that do not depend directly on system size alone, such as synchronisation, adaptivity, pattern formation and tolerance to random failures. Through biological signals that regulate processes at the different scales of cellular systems, these systems can display all the properties that complex systems do. In this thesis I describe five pieces of work that contribute to our knowledge of these mechanisms and of how they operate in biological systems.

At the most fundamental level in cells, the hereditary material is encoded in DNA molecules, whose sequence of base pairs is decoded by an intricate molecular machinery that generates RNA in a process called transcription. RNA in turn either has structural or regulatory functions, or undergoes translation, in which it is decoded into protein. A gene is not simply switched on or off: how much RNA, and subsequently protein, a gene expresses is regulated, for example, by other RNA or protein molecules (transcription factors) binding to the DNA. Fluctuations in input signals of this kind, caused for example by the probabilistic nature of all chemical reactions or of diffusion, or by mutations in the DNA, result in fluctuations in how much a gene is expressed. Depending on how genes and gene products couple to each other, these fluctuations can then have different effects at higher levels. As in most complex systems, there is a large tolerance for this type of variation, but at the same time a sensitivity. If a low-level variation, such as a mutation, affects one of these sensitive points in the regulatory system, it can have a large effect at higher levels, where signalling pathways or large systems of genes are thrown out of balance, and this can in turn knock out vital functions in cells and cause disease.

In this work I describe, among other things, how signal and noise separate when they pass through a nonlinear system that is bistable, i.e. has two stable states between which the system can switch. Genetic toggle switches, for example, have this type of structure, where one gene is expressed if the other is not, and vice versa. Separation between signal and noise is often described using the SNR, the signal-to-noise ratio, which is computed from the frequency spectrum. It has been found that signals in noise passing through such systems can exhibit so-called stochastic resonance, where the SNR can reach a local maximum for intermediate noise strengths. This has raised hopes that the effect could be used to improve detection performance for weak signals in noise by increasing the noise level. In one of the papers of this thesis I show how stochastic resonance arises because the SNR measures information in a suboptimal way, and that statistics which instead measure signal-noise separation in an optimal way do not exhibit this phenomenon in invertible systems.

Several genes that are linked together to carry out a particular biological function are often regulated together. I show this in another paper, where I use metabolic networks and protein complex networks to detect DNA motifs that occur more often than they would in similar groups of randomly selected genes connected in the same way. These motifs may have regulatory functions, and one can see how large groups of genes that are functionally linked to each other share the same motifs in their promoter regions. This indicates that the expression levels of functionally linked genes can be controlled by the same molecular mechanism.

When such signalling pathways are in turn linked together, large networks are formed, up to the genome level where all genes are connected. I have used data from gene expression arrays, “microarrays”, for a series of systematic mutations in yeast to build such networks, and then examined their structure from both a topological and a biological perspective. The networks are scale-free, i.e. the number of genes with a given number of connections falls off as a decaying power law with increasing number of connections. This topology is very common in complex networks and is found, for example, in social networks, the structure of language, traffic and power grids, and many different biological networks. One property of networks with this type of topology is precisely a large tolerance to random failures but sensitivity to targeted attacks on central nodes of the network. I use these mutation networks together with networks built by studying which genes are bound by different transcription factors, in order to tie network structure to function, such as protein binding or function described in the literature. This is done with an algorithm in which genes are linked functionally if their neighbourhoods in the original networks resemble each other.

Finally, I describe work in which I have studied how small errors at the DNA level can cause disease, in this specific case type 2 diabetes. Nearly 393,000 single mutations (SNPs) were genotyped in a sample of approximately 700 patients with type 2 diabetes and 700 healthy individuals. Genotyping means detecting which two allele variants a person carries for each SNP. The variants that are found at a much higher frequency in patients than in healthy individuals are linked to the disease, meaning that the variations they cause in the regulation or decoding of the gene in which they are located have a large biological effect on the onset of type 2 diabetes. We found 59 SNPs in the first phase of the project and genotyped them again in a new group of nearly 5,500 individuals. In this phase we found eight SNPs in five different genes that were significantly associated with type 2 diabetes. One of these genes is TCF7L2, a transcription factor in the WNT signalling pathway that has previously been linked to type 2 diabetes in many studies. Another gene is SLC30A8, which encodes a protein found only in the insulin-producing beta cells of the pancreas, and which transports zinc into these cells. Zinc is important because insulin is stored and secreted in its active form as a complex bound with zinc. Two other genes are EXT2 and HHEX, which are involved in the development and function of the pancreas and its beta cells. The fifth gene has unknown function.

The conclusion of these works, which span a fairly broad area but all deal with the regulation of complex systems at different scales, is that measurement and modelling of complex biological systems at different levels must be combined in order to understand the whole of how these systems behave and respond to external perturbations or changes in conditions. Horizontal regulation between subsystems at the same scale and vertical regulation between subsystems at different scales are coupled, and through the topology of the network of these couplings, together with nonlinear effects in the couplings themselves, the system develops properties such as scale-freeness, tolerance to random failures, sensitivity to targeted attacks and adaptivity to fluctuations in the environment.

10. Acknowledgements

This thesis is based on work that has been carried out during three very different phases of my life, three different projects, in four different countries. I have learned a lot about many things, not least about life itself and how lucky I am to have such great colleagues, family and friends. You are so many, and so much would I like to praise you all, that I would need to write an extra book! I will have to limit myself here because of lack of space, but I really want to thank all of you who I interacted with during these years, at work or outside, because you have all in one way or another had an impact on me, and together made my life rich and happy.

I am particularly grateful to Erkki Brändas, who as my advisor has let me enjoy such freedom to pursue science in a way that has been fairly unorthodox for a student. At home as well as throughout all my different travels and projects he has given me 100% support and trust in my own decisions, while still giving me the advice I needed and “leading by example” with a great attitude towards science and life in general.

The first part of the trilogy that is my life as a PhD student was carried out in Uppsala at the Department of Quantum Chemistry, where Erkki took me on as a student in collaboration with FOA (presently FOI), the Swedish Defence Research Agency. At FOA, I worked in the Nonlinear Dynamics project, resulting in, for instance, Paper I. I am very grateful to FOI for their financial and practical support, and for the eminent scientific training and guidance I received from John Robinson and Peter Krylstedt. Thanks to them, I learned to approach science rigorously and meticulously, and although I may never reach the levels they set out by example, I know what to strive for and to appreciate it when I see it. During this period, I also enjoyed several visits to SPAWAR, the Space and Naval Warfare Systems Center, San Diego, for intense research collaborations, and I am very thankful for the support and hospitality of Adi Bulsara and Mario Inchiosa.

Most of my time in Uppsala and thereafter was spent under AIM, the graduate school of Advanced Instrumentation and Measurements, which provided excellent support and a very good environment for physics studies, for which I am very thankful. Thanks also to everyone at Quantum Chemistry for the support and the great working environment, especially Björn Hessmo, who convinced me to start this PhD thing over a pint of Guinness (what else?) one good summer evening at Orvar’s, and my good friends from AIM, in particular Mattias Lantz, Marcus Dahlfors and Hans Henriksson. Special thoughts go to Björn Larsson, a genuinely good friend and brilliant co-student who left us much too early.

The second part of this period of my life was carried out in Cambridge, England, at the European Bioinformatics Institute, EBI. Thanks to Alvis Brazma for all the friendliness, generosity and scientific brilliance, and for creating a great team atmosphere to work in. I truly loved Cambridge and had a great time there thanks to co-workers, friends and collaborators: Thomas Schlitt, Helen Parkinson, Ele Holloway, Jess Mar, Aedin Culhane, Laurence Ettwiller, Kai Runte, Virginie Mittard and the whole microarray team I was lucky enough to be a part of. Many thanks to the Swedish Science Research Council, NFR, who funded me for my first year at EBI with a generous grant.

My most recent years have been spent in Montreal at the McGill University and Genome Quebec Innovation Centre. Special thanks to Rob Sladek, who has shown that the same rigour that I always enjoyed seeing in physics and mathematics can also be applied in biology, and who has given me support and freedom to work independently with interesting stuff in a great team. Also, big thanks to Tom Hudson, who gave me the opportunity to come to Montreal, learn so many new things and be a part of the Innovation Centre. Special thanks to friends and colleagues: Yoshihiko (Tiger) Nagai, Haïg Djambazian, Saravanan Sundararajan, Ghislain Rocheleau, Lishuang Shen, Alexander Mazur, David Serre, Tibor van Rooij, Elin Grundberg, Susan Rogers, Maya Tinker, Nicole Perry, Amira Djebbari, Kanmani Kandan, Karine Thivel, Sylvain Foisy, Jean-Philippe Laverdure and everyone at the Innovation Centre.

None of this could have happened without the helpfulness of administrative support, in particular Christina Rasmundson and Inger Ericson in Uppsala, Liz Ford and Martina Munzittu at EBI, and Lisa-Marie Baril and Jenny Koulis in Montreal. Thank you.

Thanks to Daniel Asraf for sharing great times in San Diego and in Sweden, as well as hard work over courses and good friendship over the years. In a pretty neat twist of fate we both ended up in Montreal at the same time, so that we could keep up the “nerditude” and the frantic scribbling of equations on napkins and beermats. Priceless. Thanks to Hampus Rystedt, Carolina Ribbing, Lennart Köhler, Karin Edoff, Robert Yantes, Susie Stephens, Göran Wallin, Johan Elvnert, Anders Hedström, Joanna Applequist and many others for being great friends over many years.

The very reason I am here is my family: Mamma, Pappa, Andreas and Martina, for love, support and encouragement, for bringing me up with the curiosity of wanting to know the unknown, the inspiration to see the good and the fun in everything, and showing me that the only limits are the ones I set up for myself.

Finally, very special thanks to Duk-Kyung for all the love and support, for sharing my life through thick and thin, and for being simply brilliant in every way.

Bibliography

[1] G. Nicolis and I. Prigogine. Exploring complexity. Freeman New York, 1989.

[2] E.J. Brändas. New perspectives in theoretical chemical physics. Advances in Quantum Chemistry, 42:383–397, 2003.

[3] J.D. Watson and F.H.C. Crick. Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature, 171(4356):737–738, 1953.

[4] E. Heard and C.M. Disteche. Dosage compensation in mammals: fine-tuning the expression of the X chromosome. Genes & Development, 20(14):1848–1867, 2006.

[5] J.C. Venter, M.D. Adams, E.W. Myers, P.W. Li, R.J. Mural, G.G. Sutton, H.O. Smith, M. Yandell, C.A. Evans, R.A. Holt, et al. The sequence of the human genome. Science, 291(5507):1304–1351, 2001.

[6] E.S. Lander, L.M. Linton, B. Birren, C. Nusbaum, M.C. Zody, J. Baldwin, K. Devon, K. Dewar, M. Doyle, W. FitzHugh, et al. Initial sequencing and analysis of the human genome. Nature, 409(6822):860–921, 2001.

[7] R.H. Waterston, K. Lindblad-Toh, E. Birney, J. Rogers, J.F. Abril, P. Agarwal, R. Agarwala, R. Ainscough, M. Alexandersson, P. An, et al. Initial sequencing and comparative analysis of the mouse genome. Nature, 420(6915):520–562, 2002.

[8] F.H. Crick. On protein synthesis. Symp Soc Exp Biol, 12:138–163, 1958.

[9] M.J. Daly, J.D. Rioux, S.F. Schaffner, T.J. Hudson, and E.S. Lander. High-resolution haplotype structure in the human genome. Nature Genetics, 29(2):229–232, 2001.

[10] D. Altshuler, L.D. Brooks, A. Chakravarti, F.S. Collins, M.J. Daly, P. Donnelly, et al. A haplotype map of the human genome. Nature, 437(7063):1299–320, 2005.

[11] J. Sebat, B. Lakshmi, J. Troge, J. Alexander, J. Young, P. Lundin, et al. Large-scale copy number polymorphism in the human genome. Science, 305(5683):525–528, 2004.

[12] R. Redon, S. Ishikawa, K.R. Fitch, L. Feuk, G.H. Perry, T.D. Andrews, H. Fiegler, M.H. Shapero, A.R. Carson, W. Chen, et al. Global variation in copy number in the human genome. Nature, 444:444–454, 2006.

[13] J. Sebat, B. Lakshmi, D. Malhotra, J. Troge, C. Lese-Martin, T. Walsh, B. Yamrom, S. Yoon, A. Krasnitz, J. Kendall, A. Leotta, D. Pai, R. Zhang, Y. Lee, J. Hicks, S.J. Spence, A.T. Lee, K. Puura, T. Lehtimäki, D. Ledbetter, P.K. Gregersen, J. Bregman, J.S. Sutcliffe, V. Jobanputra, W. Chung, D. Warburton, M. King, D. Skuse, D.H. Geschwind, T.C. Gilliam, K. Ye, and M. Wigler. Strong association of de novo copy number mutations with autism. Science, E-publ.:1138659, 2007.

[14] H.H. McAdams and A. Arkin. Stochastic mechanisms in gene expression. Proc Natl Acad Sci, 94(3):814–819, 1997.

[15] J. Hasty, D. McMillen, F. Isaacs, and J.J. Collins. Computational studies of gene regulatory networks: in numero molecular biology. Nat. Rev. Genet, 2(4):268–279, 2001.

[16] J.D. Hoheisel. Microarray technology: beyond transcript profiling and genotype analysis. Nature Reviews Genetics, 7:200–210, 2006.

[17] M. Schena, D. Shalon, R.W. Davis, and P.O. Brown. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science, 270(5235):467–470, 1995.

[18] J.L. DeRisi, V.R. Iyer, and P.O. Brown. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 278(5338):680–686, 1997.

[19] P.D. Lee, R. Sladek, C.M. Greenwood, and T.J. Hudson. Control genes and variability: absence of ubiquitous reference transcripts in diverse mammalian expression studies. Genome Research, 12(2):292–297, 2002.

[20] J. van de Peppel, P. Kemmeren, H. van Bakel, M. Radonjic, D. van Leenen, and F.C.P. Holstege. Monitoring global messenger RNA changes in externally controlled microarray experiments. EMBO Reports, 4(4):387–393, 2003.

[21] R.A. Irizarry, B.M. Bolstad, F. Collin, L.M. Cope, B. Hobbs, and T.P. Speed. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res, 31(4):e15, 2003.

[22] R. Nadon and J. Shoemaker. Statistical issues with microarrays: processing and analysis. Trends in Genetics, 18(5):265–271, 2002.

[23] P.T. Spellman, G. Sherlock, M.Q. Zhang, V.R. Iyer, K. Anders, M.B. Eisen, P.O. Brown, D. Botstein, and B. Futcher. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 9(12):3273–3297, 1998.

[24] S. Ramaswamy, K.N. Ross, E.S. Lander, and T.R. Golub. A molecular signature of metastasis in primary solid tumors. Nature Genetics, 33(1):49–54, 2003.

[25] L.J. van’t Veer, H. Dai, M.J. van de Vijver, Y.D. He, A.A. Hart, M. Mao, H.L. Peterse, K. van der Kooy, M.J. Marton, A.T. Witteveen, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415(6871):530–536, 2002.

[26] T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286(5439):531–537, 1999.

[27] K. Novak. News feature: Where the chips fall. Nature Medicine, 12:158–159, 2006.

[28] J. Quackenbush. Microarray data normalization and transformation. Nature Genetics, 32:496–501, 2002.

[29] A. Butte. The use and analysis of microarray data. Nature Reviews Drug Discovery, 1(12):951–960, 2002.

[30] H.C. Causton, J. Quackenbush, and A. Brazma. Microarray gene expression data analysis: a beginner’s guide. Blackwell Pub., 2003.

[31] D.B. Allison, X. Cui, G.P. Page, and M. Sabripour. Microarray data analysis: from disarray to consolidation and consensus. Nature Reviews Genetics, 7(1):55–65, 2006.

[32] V.G. Tusher, R. Tibshirani, and G. Chu. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci, 98(9):5116–5121, 2001.

[33] N. Jain et al. Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays. Bioinformatics, 19(15):1945–1951, 2003.

[34] A.-C. Syvänen. Toward genome-wide SNP genotyping. Nature Genetics, 37:S5–10, 2005.

[35] J.-B. Fan, M.S. Chee, and K.L. Gunderson. Highly parallel genomic assays. Nature Reviews Genetics, 7:632–644, 2006.

[36] H. Matsuzaki, S. Dong, H. Loi, X. Di, G. Liu, E. Hubbell, J. Law, T. Berntsen, M. Chadha, H. Hui, et al. Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays. Nature Methods, 1:109–111, 2004.

[37] H. Matsuzaki, H. Loi, S. Dong, Y.Y. Tsai, J. Fang, J. Law, X. Di, W.M. Liu, G. Yang, G. Liu, et al. Parallel genotyping of over 10,000 SNPs using a one-primer assay on a high-density oligonucleotide array. Genome Research, 14(3):414–25, 2004.

[38] J.B. Fan, K.L. Gunderson, M. Bibikova, J.M. Yeakley, J. Chen, E. Wickham Garcia, L.L. Lebruska, M. Laurent, R. Shen, and D. Barker. Illumina universal bead arrays. Methods Enzymology, 410:57–73, 2006.

[39] K.L. Gunderson, S. Kruglyak, M.S. Graige, F. Garcia, B.G. Kermani, C. Zhao, D. Che, T. Dickinson, E. Wickham, J. Bierle, et al. Decoding randomly ordered DNA arrays. Genome Research, 14(5):870–877, 2004.

[40] K.L. Gunderson, F.J. Steemers, G. Lee, L.G. Mendoza, and M.S. Chee. A genome-wide scalable SNP genotyping assay using microarray technology. Nature Genetics, 37:549–554, 2005.

[41] B. Ren, F. Robert, J.J. Wyrick, O. Aparicio, E.G. Jennings, I. Simon, J. Zeitlinger, J. Schreiber, N. Hannett, E. Kanin, et al. Genome-wide location and function of DNA binding proteins. Science, 290(5500):2306–2309, 2000.

[42] V.R. Iyer, C.E. Horak, C.S. Scafe, D. Botstein, M. Snyder, and P.O. Brown. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature, 409(6819):533–538, 2001.

[43] I. Simon, J. Barnett, N. Hannett, C.T. Harbison, N.J. Rinaldi, T.L. Volkert, J.J. Wyrick, J. Zeitlinger, D.K. Gifford, T.S. Jaakkola, et al. Serial regulation of transcriptional regulators in the yeast cell cycle. Cell, 106(6):697–708, 2001.

[44] T.I. Lee, N.J. Rinaldi, F. Robert, D.T. Odom, Z. Bar-Joseph, G.K. Gerber, N.M. Hannett, C.T. Harbison, C.M. Thompson, I. Simon, et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science, 298(5594):799–804, 2002.

[45] S. Cawley, S. Bekiranov, H.H. Ng, P. Kapranov, E.A. Sekinger, D. Kampa, A. Piccolboni, V. Sementchenko, J. Cheng, A.J. Williams, et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell, 116(4):499–509, 2004.

[46] T. Kislinger, B. Cox, A. Kannan, C. Chung, P. Hu, A. Ignatchenko, M.S. Scott, A.O. Gramolini, Q. Morris, M.T. Hallett, et al. Global survey of organ and organelle protein expression in mouse: combined proteomic and transcriptomic profiling. Cell, 125(1):173–186, 2006.

[47] S.F. Kingsmore. Multiplexed protein measurement: technologies and applications of protein and antibody arrays. Nature Reviews Drug Discovery, 5(4):310–321, 2006.

[48] A.M. van Oijen. Single-molecule studies of complex systems: the replisome. Mol. BioSyst., 3:117–125, 2007.

[49] S. Tyagi, F.R. Kramer, et al. Molecular beacons: Probes that fluoresce upon hybridization. Nature Biotechnology, 14(3):303–308, 1996.

[50] A.P. Silverman and E.T. Kool. Quenched probes for highly specific detection of cellular RNAs. Trends in Biotechnology, 23(5):225–230, 2005.

[51] S. Tyagi, S.A.E. Marras, and F.R. Kramer. Wavelength-shifting molecular beacons. Nature Biotechnology, 18:1191–1196, 2000.

[52] J. Yu, J. Xiao, X. Ren, K. Lao, and X.S. Xie. Probing gene expression in live cells, one protein molecule at a time. Science, 311(5767):1600–1603, 2006.

[53] T.I. Lee and R.A. Young. Transcription of eukaryotic protein-coding genes. Annu. Rev. Genet., 34:77–137, 2000.

[54] P.J. Mitchell and R. Tjian. Transcriptional regulation in mammalian cells by sequence-specific DNA binding proteins. Science, 245(4916):371–378, 1989.

[55] J.T. Kadonaga. Regulation of RNA polymerase II transcription by sequence-specific DNA binding factors. Cell, 116(2):247–257, 2004.

[56] X. Zhang, J. Yazaki, A. Sundaresan, S. Cokus, S.W.L. Chan, H. Chen, I.R. Henderson, P. Shinn, M. Pellegrini, S.E. Jacobsen, et al. Genome-wide high-resolution mapping and functional analysis of DNA methylation in Arabidopsis. Cell, 126(6):1189–1201, 2006.

[57] F. Ozsolak, J.S. Song, X.S. Liu, and D.E. Fisher. High-throughput mapping of the chromatin structure of human promoters. Nature Biotechnology, 25(2):244–248, 2007.

[58] K.V. Prasanth and D.L. Spector. Eukaryotic regulatory RNAs: an answer to the ‘genome complexity’ conundrum. Genes & Development, 21:11–42, 2007.

[59] T.M. Rana. Illuminating the silence: understanding the structure and function of small RNAs. Nature Reviews Mol. Cell Biol., 8(1):23–36, 2007.

[60] K.E. Shearwin, B.P. Callen, and J.B. Egan. Transcriptional interference - a crash course. Trends in Genetics, 21(6):339–345, 2005.

[61] J. Vilo, A. Brazma, I. Jonassen, A. Robinson, and E. Ukkonen. Mining for putative regulatory elements in the yeast genome using gene expression data. Proc Int Conf Intell Syst Mol Biol (ISMB), 8:384–394, 2000.

[62] X. Xie, J. Lu, E.J. Kulbokas, T.R. Golub, V. Mootha, K. Lindblad-Toh, E.S. Lander, and M. Kellis. Systematic discovery of regulatory motifs in human promoters and 3’UTRs by comparison of several mammals. Nature, 434:338–345, 2005.

[63] M. Blanchette, A.R. Bataille, X. Chen, C. Poitras, J. Laganière, V. Ferretti, D. Bergeron, B. Coulombe, and F. Robert. Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Research, 16:656–668, 2006.

[64] Y. Liu, X.S. Liu, L. Wei, R.B. Altman, and S. Batzoglou. Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Research, 14:451–458, 2004.

[65] G.D. Stormo. DNA binding sites: representation and discovery. Bioinformatics, 16(1):16–23, 2000.

[66] T.D. Schneider and R.M. Stephens. Sequence Logos: A New Way to Display Consensus Sequences. Nucleic Acids Res., 18:6097–6100, 1990.

[67] J. Gorodkin, L.J. Heyer, S. Brunak, and G.D. Stormo. Displaying the information contents of structural RNA alignments: the structure logos. Computer applications in the biosciences: CABIOS, 13(6):583–586, 1997.

[68] A.E. Tsong, B.B. Tuch, H. Li, and A.D. Johnson. Evolution of alternative transcriptional circuits with identical logic. Nature, 443(7110):415–420, 2006.

[69] M. Kærn, W.J. Blake, and J.J. Collins. The engineering of gene regulatory networks. Annual Review of Biomedical Engineering, 5:179–206, 2003.

[70] D.A. Drubin, J.C. Way, and P.A. Silver. Designing biological systems. Genes & Dev., 21:242–254, 2007.

[71] M. Kærn, T.C. Elston, W.J. Blake, and J.J. Collins. Stochasticity in gene expression: from theories to phenotypes. Nature Reviews Genetics, 6(6):451–464, 2005.

[72] J.M. Pedraza and A. van Oudenaarden. Noise propagation in gene networks. Science, 307(5717):1965–1969, 2005.

[73] J.M. Raser and E.K. O’Shea. Noise in gene expression: origins, consequences, and control. Science, 309(5743):2010–2013, 2005.

[74] E.M. Ozbudak, M. Thattai, I. Kurtser, A.D. Grossman, and A. van Oudenaarden. Regulation of noise in the expression of a single gene. Nature Genetics, 31(1):69–73, 2002.

[75] W.J. Blake, M. Kærn, C.R. Cantor, and J.J. Collins. Noise in eukaryotic gene expression. Nature, 422(6932):633–637, 2003.

[76] M.B. Elowitz, A.J. Levine, E.D. Siggia, and P.S. Swain. Stochastic gene expression in a single cell. Science, 297(5584):1129–1131, 2002.

[77] A. Becskei, B.B. Kaufmann, and A. van Oudenaarden. Contributions of low molecule number and chromosomal positioning to stochastic gene expression. Nature Genetics, 37(9):937–944, 2005.

[78] D. Volfson, J. Marciniak, W.J. Blake, N. Ostroff, L.S. Tsimring, and J. Hasty. Origins of extrinsic variability in eukaryotic gene expression. Nature, 439(7078):861–864, 2006.

[79] N. Rosenfeld, J.W. Young, U. Alon, P.S. Swain, and M.B. Elowitz. Gene regulation at the single-cell level. Science, 307(5717):1962–1965, 2005.

[80] J. Kim, K.S. White, and E. Winfree. Construction of an in vitro bistable circuit from synthetic transcriptional switches. Molecular Systems Biology, msb4100099:1–12, 2006.

[81] L. Gammaitoni, P. Hänggi, P. Jung, and F. Marchesoni. Stochastic resonance. Reviews of Modern Physics, 70(1):223–287, 1998.

[82] I. Karatzas and S.E. Shreve. Brownian motion and stochastic calculus. Springer, New York, 1991.

[83] A.D. Hibbs, A.L. Singsaas, E.W. Jacobs, A.R. Bulsara, J.J. Bekkedahl, and F. Moss. Stochastic resonance in a superconducting loop with a Josephson junction. Journal of Applied Physics, 77(6):2582–2590, 1995.

[84] A. Longtin, A. Bulsara, D. Pierson, and F. Moss. Bistability and the dynamics of periodically forced sensory neurons. Biological Cybernetics, 70(6):569–578, 1994.

[85] A.R. Bulsara et al. Cooperative behaviour in periodically driven noisy integrate-fire models of neuronal networks. Physical Review E, 53:3958–3969, 1996.

[86] M.E. Inchiosa and A.R. Bulsara. Nonlinear dynamic elements with noisy sinusoidal forcing: Enhancing response via nonlinear coupling. Physical Review E, 52(1):327–339, 1995.

[87] M. Löcher, D. Cigna, E.R. Hunt, G.A. Johnson, F. Marchesoni, L. Gammaitoni, M.E. Inchiosa, and A.R. Bulsara. Stochastic resonance in coupled nonlinear dynamic elements. Chaos: An Interdisciplinary Journal of Nonlinear Science, 8(3):604–615, 1998.

[88] J.F. Lindner, B.K. Meadows, W.L. Ditto, M.E. Inchiosa, and A.R. Bulsara. Array enhanced stochastic resonance and spatiotemporal synchronization. Physical Review Letters, 75(1):3–6, 1995.

[89] J.A. Acebrón, A.R. Bulsara, and W.J. Rappel. Noisy FitzHugh-Nagumo model: from single elements to globally coupled networks. Physical Review E, 69(2):26202, 2004.

[90] J. Paulsson, O.G. Berg, and M. Ehrenberg. Stochastic focusing: fluctuation-enhanced sensitivity of intracellular regulation. Proc Natl Acad Sci, 97(13):7148–53, 2000.

[91] J. Paulsson and M. Ehrenberg. Random signal fluctuations can reduce random fluctuations in regulated components of chemical regulatory networks. Physical Review Letters, 84(23):5447–5450, 2000.

[92] J. Paulsson. The stochastic nature of intracellular control circuits. 2000.

[93] H. Yan, W. Yuan, V.E. Velculescu, B. Vogelstein, and K.W. Kinzler. Allelic variation in human gene expression. Science, 297(5584):1143, 2002.

[94] T. Pastinen and T.J. Hudson. Cis-acting regulatory variation in the human genome. Science, 306(5696):647–650, 2004.

[95] Y. Li, A. Grupe, C. Rowland, P. Nowotny, J.S.K. Kauwe, S. Smemo, A. Hinrichs, K. Tacey, T.A. Toombs, S. Kwok, et al. DAPK1 variants are associated with Alzheimer’s disease and allele-specific expression. Human Molecular Genetics, 15(17):2560–2568, 2006.

[96] L. Milani, M. Gupta, M. Andersen, S. Dhar, M. Fryknäs, A. Isaksson, R. Larsson, and A.-C. Syvänen. Allelic imbalance in gene expression as a guide to cis-acting regulatory single nucleotide polymorphisms in cancer cells. Nucleic Acids Research, Advance Access(gkl1152):10, 2007.

[97] L.L. Peters, R.F. Robledo, C.J. Bult, G.A. Churchill, B.J. Paigen, and K.L. Svenson. The mouse as a model for human biology: a resource guide for complex trait analysis. Nature Reviews Genetics, 8:58–69, 2007.

[98] R.W. Doerge, B.S. Weir, and Z.B. Zeng. Statistical issues in the search for genes affecting quantitative traits in experimental populations. Statistical Science, 12(3):195–219, 1997.

[99] M.V. Rockman and L. Kruglyak. Genetics of global gene expression. Nature Reviews Genetics, 7:862–872, 2006.

[100] E.E. Schadt, S.A. Monks, T.A. Drake, A.J. Lusis, N. Che, V. Colinayo, T.G. Ruff, S.B. Milligan, J.R. Lamb, G. Cavet, et al. Genetics of gene expression surveyed in maize, mouse and man. Nature, 422(6929):297–302, 2003.

[101] E.E. Schadt, J. Lamb, X. Yang, J. Zhu, S. Edwards, D. GuhaThakurta, S.K. Sieberts, S. Monks, M. Reitman, C. Zhang, et al. An integrative genomics approach to infer causal associations between gene expression and disease. Nature Genetics, 37:710–717, 2005.

[102] M. Mehrabian, H. Allayee, J. Stockton, P.Y. Lum, T.A. Drake, L.W. Castellani, M. Suh, C. Armour, S. Edwards, J. Lamb, et al. Integrating genotypic and expression data in a segregating mouse population to identify 5-lipoxygenase as a susceptibility gene for obesity and bone traits. Nature Genetics, 37:1224–1233, 2005.

[103] T. Pastinen, B. Ge, and T.J. Hudson. Influence of human genome polymorphism on gene expression. Human Molecular Genetics, 15(1):R9–R16, 2006.

[104] G. Yvert, R.B. Brem, J. Whittle, J.M. Akey, E. Foss, E.N. Smith, R. Mackelprang, and L. Kruglyak. Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nature Genetics, 35:57–64, 2003.

[105] C.G. Spilianakis, M.D. Lalioti, T. Town, G.R. Lee, and R.A. Flavell. Interchromosomal associations between alternatively expressed loci. Nature, 435:637–645, 2005.

[106] J. Dekker, K. Rippe, M. Dekker, and N. Kleckner. Capturing chromosome conformation. Science, 295(5558):1306–1311, 2002.

[107] Z. Zhao, G. Tavoosidana, M. Sjolinder, A. Gondor, P. Mariano, S. Wang, C. Kanduri, M. Lezcano, K. Singh Sandhu, U. Singh, et al. Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Nature Genetics, 38, 2006.

[108] A.H. Brivanlou and J.E. Darnell. Signal Transduction and the Control of Gene Expression. Science, 295(5556):813–818, 2002.

[109] T.K. Sato, R.G. Yamada, H. Ukai, J.E. Baggs, L.J. Miraglia, T.J. Kobayashi, D.K. Welsh, S.A. Kay, H.R. Ueda, and J.B. Hogenesch. Feedback repression is required for mammalian circadian clock function. Nature Genetics, 38:312–319, 2006.

[110] M.N. McClean, A. Mody, J.R. Broach, and S. Ramanathan. Cross-talk and decision making in MAP kinase pathways. Nature Genetics, 39(3):409–414, 2007.

[111] Z.N. Oltvai and A.L. Barabási. Life’s complexity pyramid. Science, 298(5594):763–764, 2002.

[112] T. Schlitt and A. Brazma. Modelling gene networks at different organisational levels. FEBS Letters, 579(8):1859–1866, 2005.

[113] A.L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–12, 1999.

[114] P. Erdös and A. Rényi. On random graphs I. Publ. Math. Debrecen, 6:290–297, 1959.

[115] S. Milgram. The small world problem. Psychology Today, 2(1):60–67, 1967.

[116] D.J. Watts and S.H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393(6684):409–410, 1998.

[117] G.K. Zipf. The repetition of words, time-perspective, and semantic balance. Journal of General Psychology, 32:127–148, 1945.

[118] G.K. Zipf. Human Behaviour and the Principle of Least Effort. Addison-Wesley Press Inc, Cambridge, 1949.

[119] B. Mandelbrot. An informational theory of the statistical structure of language. Communication Theory, pages 486–502, 1953.

[120] R. Albert and A.L. Barabási. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47–97, 2002.

[121] D.H. Zanette and S.C. Manrubia. Role of intermittency in urban development: a model of large-scale city formation. Physical Review Letters, 79(3):523–526, 1997.

[122] R. Guimera, S. Mossa, A. Turtschi, and L.A.N. Amaral. The worldwide air transportation network: Anomalous centrality, community structure, and cities’ global roles. Proc Natl Acad Sci, 102(22):7794–7799, 2005.

[123] J.G. Oliveira and A.L. Barabási. Human dynamics: Darwin and Einstein correspondence patterns. Nature, 437(7063):1251, 2005.

[124] W. Li. Random texts exhibit Zipf’s-law-like word frequency distribution. Transactions on Information Theory, IEEE, 38(6):1842–1845, 1992.

[125] X.F. Wang and G. Chen. Complex networks: small-world, scale-free and beyond. Circuits and Systems Magazine, IEEE, 3(1):6–20, 2003.

[126] M.E.J. Newman. Power laws, Pareto distributions and Zipf’s law. Contemporary Physics, 46(5):323–351, 2005.

[127] H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabási. The large-scale organization of metabolic networks. Nature, 407(6804):651–654, 2000.

[128] H. Jeong, S.P. Mason, A.L. Barabási, and Z.N. Oltvai. Lethality and centrality in protein networks. Nature, 411(6833):41–42, 2001.

[129] J.M. Carlson and J. Doyle. Highly optimized tolerance: A mechanism for power laws in designed systems. Physical Review E, 60(2):1412–1427, 1999.

[130] J. Doyle and J.M. Carlson. Power laws, highly optimized tolerance, and generalized source coding. Physical Review Letters, 84(24):5656–5659, 2000.

[131] H. Kitano. Biological robustness. Nature Reviews Genetics, 5(11):826–837, 2004.

[132] J. Stelling, U. Sauer, Z. Szallasi, F.J. Doyle, and J. Doyle. Robustness of cellular functions. Cell, 118(6):675–685, 2004.

[133] L.H. Hartwell, J.J. Hopfield, S. Leibler, and A.W. Murray. From molecular to modular cell biology. Nature, 402(6761):C47–C52, 1999.

[134] E. Ravasz, A.L. Somera, D.A. Mongru, Z.N. Oltvai, and A.L. Barabási. Hierarchical organization of modularity in metabolic networks. Science, 297(5586):1551–1555, 2002.

[135] C. Song, S. Havlin, and H.A. Makse. Self-similarity of complex networks. Nature, 433(7024):392–395, 2005.

[136] S. Maslov and K. Sneppen. Specificity and stability in topology of protein networks. Science, 296(5569):910–913, 2002.

[137] M. Girvan and M.E.J. Newman. Community structure in social and biological networks. Proc Natl Acad Sci, 99(12):7821–7826, 2002.

[138] A.A. Petti and G.M. Church. A network of transcriptionally coordinated functional modules in Saccharomyces cerevisiae. Genome Research, 15(9):1298–1306, 2005.

[139] D. Brockmann, L. Hufnagel, and T. Geisel. The scaling laws of human travel. Nature, 439(7075):462–465, 2006.

[140] A. Wagner. Robustness, evolvability, and neutrality. FEBS Letters, 579:1772–1778, 2005.

[141] A. Wagner. How the global structure of protein interaction networks evolves. Proceedings: Biological Sciences, 270(1514):457–466, 2003.

[142] D.V. Foster, S.A. Kauffman, and J.E.S. Socolar. Network growth models and genetic regulatory networks. Physical Review E, 73(3):31912, 2006.

[143] A.L. Barabási and Z.N. Oltvai. Network biology: understanding the cell’s functional organization. Nature Reviews Genetics, 5(2):101–113, 2004.

[144] M.E. Csete and J.C. Doyle. Reverse engineering of biological complexity. Science, 295(5560):1664–1669, 2002.

[145] J.J. Tyson, K. Chen, and B. Novak. Network dynamics and cell physiology. Nature Reviews Molecular Cell Biology, 2(12):908–16, 2001.

[146] O. Wolkenhauer, M. Ullah, P. Wellstead, and K.H. Cho. The dynamic systems approach to control and regulation of intracellular networks. FEBS Letters, 579(8):1846–1853, 2005.

[147] K.A. Janes and M.B. Yaffe. Data-driven modelling of signal-transduction networks. Nature Reviews Molecular Cell Biology, 7:820–828, 2006.

[148] S.A. Kauffman. Metabolic stability and epigenesis in randomly constructed genetic nets. Journal of Theoretical Biology, 22(3):437–467, 1969.

[149] S.A. Kauffman. The Origins of Order. Oxford University Press, 1993.

[150] S. Liang, S. Fuhrman, R. Somogyi, et al. REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Pacific Symposium on Biocomputing, Proceedings, 3:18–29, 1998.

[151] P. D’Haeseleer, X. Wen, S. Fuhrman, and R. Somogyi. Linear modeling of mRNA expression levels during CNS development and injury. Pacific Symposium on Biocomputing, Proceedings, 4:41–52, 1999.

[152] G. Giaever, A.M. Chu, L. Ni, C. Connelly, L. Riles, S. Véronneau, S. Dow, A. Lucau-Danila, K. Anderson, B. André, et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418:387–391, 2002.

[153] A.H.Y. Tong, G. Lesage, G.D. Bader, H. Ding, H. Xu, X. Xin, J. Young, G.F. Berriz, R.L. Brost, M. Chang, et al. Global mapping of the yeast genetic interaction network. Science, 303(5659):808–813, 2004.

[154] A.P. Davierwala, J. Haynes, Z. Li, R.L. Brost, M.D. Robinson, L. Yu, S. Mnaimneh, H. Ding, H. Zhu, Y. Chen, et al. The synthetic genetic interaction spectrum of essential genes. Nature Genetics, 37(10):1147–1152, 2005.

[155] D. Deutscher, I. Meilijson, M. Kupiec, and E. Ruppin. Multiple knockout analysis of genetic robustness in the yeast metabolic network. Nature Genetics, 38:993–998, 2006.

[156] J. Tegner, M.K.S. Yeung, J. Hasty, and J.J. Collins. Reverse engineering gene networks: Integrating genetic perturbations with dynamical modeling. Proc Natl Acad Sci, 100(10):5944–5949, 2003.

[157] J. Tegnér and J. Björkegren. Perturbations to uncover gene networks. Trends Genet, 5(9):691–701, 2006.

[158] M. Tewari, P.J. Hu, J.S. Ahn, N. Ayivi-Guedehoussou, P.O. Vidalain, S. Li, S. Milstein, C.M. Armstrong, M. Boxem, M.D. Butler, et al. Systematic interactome mapping and genetic perturbation analysis of a C. elegans TGF-beta signaling network. Molecular Cell, 13(4):469–482, 2004.

[159] D. Heckerman, D. Geiger, and D.M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197–243, 1995.

[160] N. Friedman, M. Linial, I. Nachman, and D. Pe’er. Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7(3-4):601–620, 2000.

[161] E. Segal, M. Shapira, A. Regev, D. Pe’er, D. Botstein, D. Koller, and N. Friedman. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nature Genetics, 34(2):166–76, 2003.

[162] N. Friedman. Inferring cellular networks using probabilistic graphical models. Science, 303(5659):799–805, 2004.

[163] K. Murphy and S. Mian. Modelling gene expression data using dynamic Bayesian networks, 1999.

[164] Y. Pilpel, P. Sudarsanam, and G.M. Church. Identifying regulatory networks by combinatorial analysis of promoter elements. Nature Genetics, 29(2):153–159, 2001.

[165] K. Palin, E. Ukkonen, A. Brazma, and J. Vilo. Correlating gene promoters and expression in gene disruption experiments. Bioinformatics, 18(90002):S172–S180, 2002.

[166] T.R. Hughes, M.J. Marton, A.R. Jones, C.J. Roberts, R. Stoughton, C.D. Armour, H.A. Bennett, E. Coffey, H. Dai, Y.D. He, et al. Functional discovery via a compendium of expression profiles. Cell, 102(1):109–126, 2000.

[167] P. Uetz, L. Giot, G. Cagney, T.A. Mansfield, R.S. Judson, J.R. Knight, D. Lockshon, V. Narayan, M. Srinivasan, P. Pochart, et al. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature, 403:623–627, 2000.

[168] Y. Ho, A. Gruhler, A. Heilbut, G.D. Bader, L. Moore, S.L. Adams, A. Millar, P. Taylor, K. Bennett, K. Boutilier, et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature, 415(6868):180–183, 2002.

[169] A.C. Gavin, M. Bosche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J.M. Rick, A.M. Michon, C.M. Cruciat, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 415(6868):123–124, 2002.

[170] T. Ito, T. Chiba, R. Ozawa, M. Yoshida, M. Hattori, and Y. Sakaki. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci, 98(8):4277–4278, 2001.

[171] L. Giot, J.S. Bader, C. Brouwer, A. Chaudhuri, B. Kuang, Y. Li, Y.L. Hao, C.E. Ooi, B. Godwin, E. Vitols, et al. A Protein Interaction Map of Drosophila melanogaster. Science, 302(5651):1727–1736, 2003.

[172] S.M. Li, C.M. Armstrong, N. Bertin, H. Ge, S. Milstein, M. Boxem, P.O. Vidalain, J.D.J. Han, A. Chesneau, T. Hao, et al. A map of the interactome network of the metazoan C. elegans. Science, 303(5657):540–543, 2004.

[173] G.R. Mishra, M. Suresh, K. Kumaran, N. Kannabiran, S. Suresh, P. Bala, K. Shivakumar, N. Anuradha, R. Reddy, T.M. Raghavan, et al. Human protein reference database: 2006 update. Nucleic Acids Research, 34:D411–D414, 2006.

[174] T.K. Gandhi, J. Zhong, S. Mathivanan, L. Karthick, K.N. Chandrika, S.S. Mohan, S. Sharma, S. Pinkert, S. Nagaraju, B. Periaswamy, et al. Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets. Nature Genetics, 38(3):285–293, 2006.

[175] G.T. Hart, A. Ramani, and E. Marcotte. How complete are current yeast and human protein-interaction networks? Genome Biology, 7(11):120, 2006.

[176] T.R. Hazbun and S. Fields. Networking proteins in yeast. Proc Natl Acad Sci, 98(8):4277–4278, 2001.

[177] C. von Mering, R. Krause, B. Snel, M. Cornell, S.G. Oliver, S. Fields, and P. Bork. Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417:399–403, 2002.

[178] M.E. Futschik, G. Chaurasia, and H. Herzel. Comparison of human protein–protein interaction maps. Bioinformatics, 23(5):605–611, 2007.

[179] A.R. Joyce and B.Ø. Palsson. The model organism as a system: integrating ‘omics’ data sets. Nature Reviews Molecular Cell Biology, 7:198–210, 2006.

[180] R. Jansen, D. Greenbaum, and M. Gerstein. Relating whole-genome expression data with protein-protein interactions. Genome Res, 12(1):37–46, 2002.

[181] K. Lage, O.E. Karlberg, Z.M. Størling, P.Í. Ólason, A.G. Pedersen, O. Rigina, A.M. Hinsby, Z. Tümer, F. Pociot, N. Tommerup, Y. Moreau, and S. Brunak. A human phenome-interactome network of protein complexes implicated in genetic disorders. Nature Biotechnology, 25(3):309–316, 2007.

[182] J. Lamb, E.D. Crawford, D. Peck, J.W. Modell, I.C. Blat, M.J. Wrobel, J. Lerner, J.P. Brunet, A. Subramanian, K.N. Ross, et al. The Connectivity map: using gene-expression signatures to connect small molecules, genes, and disease. Science, 313(5795):1929–1935, 2006.

[183] I. Lee, S.V. Date, A.T. Adai, and E.M. Marcotte. A probabilistic functional network of yeast genes. Science, 306(5701):1555–1558, 2004.

[184] B.A. Peters, B. St. Croix, T. Sjöblom, J.M. Cummins, N. Silliman, J. Ptak, S. Saha, K.W. Kinzler, C. Hatzis, and V.E. Velculescu. Large-scale identification of novel transcripts in the human genome. Genome Research, 17:287–292, 2007.

[185] J.D. Han, D. Dupuy, N. Bertin, M.E. Cusick, and M. Vidal. Effect of sampling on topology predictions of protein-protein interaction networks. Nature Biotechnology, 23(7):839–44, 2005.

[186] M.E.J. Newman. Analysis of weighted networks. Physical Review E, 70(5):56131, 2004.

[187] A. Barrat, M. Barthelemy, R. Pastor-Satorras, and A. Vespignani. The architecture of complex weighted networks. Proceedings of the National Academy of Sciences, 101(11):3747–3752, 2004.

[188] C.G. Knight, N. Zitzmann, S. Prabhakar, R. Antrobus, R. Dwek, H. Hebestreit, and P.B. Rainey. Unraveling adaptive evolution: how a single point mutation affects the protein coregulation network. Nature Genetics, 38:1015–1022, 2006.

[189] C.M. Taniguchi, B. Emanuelli, and C.R. Kahn. Critical nodes in signalling pathways: insights into insulin action. Nature Reviews Molecular Cell Biology, 7:85–96, 2006.

[190] M.A. Permutt, J. Wasson, and N. Cox. Genetic epidemiology of diabetes. Journal of Clinical Investigation, 115(6):1431–1439, 2005.

[191] D. Altshuler, J.N. Hirschhorn, M. Klannemark, C.M. Lindgren, M.C. Vohl, J. Nemesh, C.R. Lane, S.F. Schaffner, S. Bolk, C. Brewer, et al. The common PPARγ Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nature Genetics, 26:76–80, 2000.

[192] A.L. Gloyn, M.N. Weedon, K.R. Owen, M.J. Turner, B.A. Knight, G. Hitman, M. Walker, J.C. Levy, M. Sampson, S. Halford, M.I. McCarthy, A.T. Hattersley, and T.M. Frayling. Large-scale association studies of variants in genes encoding the pancreatic beta-cell KATP channel subunits Kir6.2 (KCNJ11) and SUR1 (ABCC8) confirm that the KCNJ11 E23K variant is associated with type 2 diabetes. Diabetes, 52(2):568–72, 2003.

[193] Y. Horikawa, N. Oda, N.J. Cox, X. Li, M. Orho-Melander, M. Hara, Y. Hinokio, T.H. Lindner, H. Mashima, P.E.H. Schwarz, et al. Genetic variation in the gene encoding calpain-10 is associated with type 2 diabetes mellitus. Nature Genetics, 26:163–175, 2000.

[194] D. Meyre, N. Bouatia-Naji, A. Tounian, C. Samson, C. Lecoeur, V. Vatin, M. Ghoussaini, C. Wachter, S. Hercberg, G. Charpentier, et al. Variants of ENPP1 are associated with childhood and adult obesity and increase the risk of glucose intolerance and type 2 diabetes. Nature Genetics, 37:863–867, 2005.

[195] L.D. Love-Gregory, J. Wasson, J. Ma, C.H. Jin, B. Glaser, B.K. Suarez, and M.A. Permutt. A common polymorphism in the upstream promoter region of the hepatocyte nuclear factor-4 alpha gene on chromosome 20q is associated with type 2 diabetes and appears to contribute to the evidence for linkage in an Ashkenazi Jewish population. Diabetes, 53(4):1134–40, 2004.

[196] K. Silander, K.L. Mohlke, L.J. Scott, E.C. Peck, P. Hollstein, A.D. Skol, A.U. Jackson, P. Deloukas, S. Hunt, G. Stavrides, et al. Genetic variation near the hepatocyte nuclear factor-4 alpha gene predicts susceptibility to type 2 diabetes. Diabetes, 53(4):1141–9, 2004.

[197] S.F. Grant, G. Thorleifsson, I. Reynisdottir, R. Benediktsson, A. Manolescu, J. Sainz, A. Helgason, H. Stefansson, V. Emilsson, A. Helgadottir, et al. Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes. Nature Genetics, 38:320–323, 2006.

[198] P.D. Sasieni. From genotypes to genes: doubling the sample size. Biometrics, 53(4):1253–1261, 1997.

[199] J. Li and T. Jiang. Haplotype-based linkage disequilibrium mapping via direct data mining. Bioinformatics, 21(24):4384–4393, 2005.

[200] B. Devlin and K. Roeder. Genomic control for association studies. Biometrics, 55(4):997–1004, 1999.

[201] D.G. Clayton, N.M. Walker, D.J. Smyth, R. Pask, J.D. Cooper, L.M. Maier, L.J. Smink, A.C. Lam, N.R. Ovington, H.E. Stevens, et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nature Genetics, 37:1243–1246, 2005.

[202] A.L. Price, N.J. Patterson, R.M. Plenge, M.E. Weinblatt, N.A. Shadick, and D. Reich. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics, 38:904–909, 2006.

[203] N. Patterson, A.L. Price, and D. Reich. Population structure and eigenanalysis. PLoS Genetics, 2(12):2074–2093, 2006.

[204] A.D. Skol, L.J. Scott, G.R. Abecasis, and M. Boehnke. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nature Genetics, 38(2):209–213, 2006.

[205] H.J. Cordell. Epistasis: what it means, what it doesn’t mean, and statistical methods to detect it in humans. Human Molecular Genetics, 11(20):2463–2468, 2002.

[206] J. Marchini, P. Donnelly, and L.R. Cardon. Genome-wide strategies for detecting multiple loci that influence complex diseases. Nature Genetics, 37:413–417, 2005.

[207] M.F. Lyon. Gene action in the X-chromosome of the mouse (Mus musculus L.). Nature, 190(4773):372–373, 1961.

[208] J.C. Chow, Z. Yen, S.M. Ziesche, and C.J. Brown. Silencing of the mammalian X chromosome. Annu Rev Genomics Hum Genet, 6:69–92, 2005.

[209] T. Galitski, A.J. Saldanha, C.A. Styles, E.S. Lander, and G.R. Fink. Ploidy regulation of gene expression. Science, 285(5425):251–254, 1999.

[210] R.H. Duerr, K.D. Taylor, S.R. Brant, J.D. Rioux, M.S. Silverberg, M.J. Daly, A.H. Steinhart, C. Abraham, M. Regueiro, A. Griffiths, et al. A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science, 314(5804):1461–1463, 2006.

[211] D.M. Maraganore, M. de Andrade, T.G. Lesnick, K.J. Strain, M.J. Farrer, W.A. Rocca, P.V.K. Pant, K.A. Frazer, D.R. Cox, and D.G. Ballinger. High-resolution whole-genome association study of Parkinson disease. American Journal of Human Genetics, 77(5):685–693, 2005.

[212] D.J. Smyth, J.D. Cooper, R. Bailey, S. Field, O. Burren, L.J. Smink, C. Guja, C. Ionescu-Tirgoviste, B. Widmer, D.B. Dunger, et al. A genome-wide association study of nonsynonymous SNPs identifies a type 1 diabetes locus in the interferon-induced helicase (IFIH1) region. Nature Genetics, 38:617–619, 2006.

[213] R.L. Milne, G. Ribas, A. Gonzalez-Neira, R. Fagerholm, A. Salas, E. Gonzalez, J. Dopazo, H. Nevanlinna, M. Robledo, and J. Benitez. ERCC4 associated with breast cancer risk: A two-stage case-control study using high-throughput genotyping. Cancer Research, 66(19):9420–9427, 2006.

[214] J.N. Hirschhorn and M.J. Daly. Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics, 6(2):95–108, 2005.

[215] D.J. Balding. A tutorial on statistical methods for population association studies. Nature Reviews Genetics, 7:781–791, 2006.

[216] H.V. Poor. An introduction to signal detection and estimation. Springer Verlag, 1994.

[217] J.W.C. Robinson, D.E. Asraf, A.R. Bulsara, and M.E. Inchiosa. Information-theoretic distance measures and a generalization of stochastic resonance.Physical Review Letters, 81(14):2850–2853, 1998.

[218] J. Rung and J.W.C. Robinson. A statistical framework for the description ofstochastic resonance phenomena. In D.S Broomhead, E.A. Luchinskaya, P.V.E.McClintock, and T. Mullin, editors, STOCHAOS: Stochastic and ChaoticDynamics in the Lakes, Melville, NY, 1999. American Institute of Physics.

[219] S.M. Ali and S.D. Silvey. A general class of coeffi cients of divergence of onedistribution from another. Journal of the Royal Statistical Society. SeriesB (Methodological), 28(1):131–142, 1966.

[220] M. Basseville. Distance measures for signal processing and pattern recogni-tion. Signal Processing, 18(4):349–369, 1989.

[221] G.C. Orsak and B.P. Paris. On the relationship between measures of discrimi-nation and the performance of suboptimal detectors. Transactions on Infor-mation Theory, IEEE, 41(1):188–203, 1995.

[222] D. Williams. Probability with martingales. Cambridge University Press,1991.

[223] R.S. Liptser and AN Shiryayev. Statistics of random processes. Springer,1978.

[224] T. Kailath. A general likelihood-ratio formula for random signals in Gaussiannoise. IEEE Transactions on Information Theory, 15:350–361, 1969.

[225] T. Kailath. A further note on a general likelihood formula for random signals in Gaussian noise. IEEE Transactions on Information Theory, 16(4):393–396, 1970.

[226] P.E. Kloeden and E. Platen. Numerical solution of stochastic differential equations. Springer, 1992.

[227] T.M. Cover and J.A. Thomas. Elements of information theory. Wiley, 1991.

[228] S. Kay. Can detectability be improved by adding noise? IEEE Signal Processing Letters, 7(1):8–10, 2000.

[229] M.E. Inchiosa, J.W.C. Robinson, and A.R. Bulsara. Information-theoretic stochastic resonance in noise-floor limited systems: the case for adding noise. Physical Review Letters, 85(16):3369–3372, 2000.

[230] C. Csank, M.C. Costanzo, J. Hirschman, P. Hodges, J.E. Kranz, M. Mangan, K. O’Neill, L.S. Robertson, M.S. Skrzypek, J. Brooks, et al. Three yeast proteome databases: YPD, PombePD, and CalPD (MycoPathPD). Methods in Enzymology, 350:347–373, 2002.

[231] A. Wagner. Estimating coarse gene network structure from large-scale gene perturbation data. Genome Research, 12(2):309–315, 2002.

[232] D.E. Featherstone and K. Broadie. Wrestling with pleiotropy: genomic and topological analysis of the yeast gene expression network. Bioessays, 24(3):267–274, 2002.

[233] I. Farkas, H. Jeong, T. Vicsek, A.L. Barabási, and Z.N. Oltvai. The topology of the transcription regulatory network in the yeast, Saccharomyces cerevisiae. Physica A, 318(3-4):601–612, 2003.

[234] M. Kanehisa, S. Goto, M. Hattori, K.F. Aoki-Kinoshita, M. Itoh, S. Kawashima, T. Katayama, M. Araki, and M. Hirakawa. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Research, 34:D354–357, 2006.

[235] I. Rigoutsos and A. Floratos. Combinatorial pattern discovery in biological sequences: The TEIRESIAS algorithm. Bioinformatics, 14(1):55–67, 1998.

[236] G. Mannhaupt, R. Schnall, V. Karpov, I. Vetter, and H. Feldmann. Rpn4p acts as a transcription factor by binding to PACE, a nonamer box found upstream of 26S proteasomal and other genes in yeast. FEBS Letters, 450(1):27–34, 1999.

[237] K. Arndt and G.R. Fink. GCN4 Protein, a Positive Transcription Factor in Yeast, Binds General Control Promoters at all 5’ TGACTC 3’ Sequences. Proc Natl Acad Sci, 83(22):8516–8520, 1986.

[238] V. Iyer and K. Struhl. Poly (dA: dT), a ubiquitous promoter element that stimulates transcription via its intrinsic DNA structure. EMBO Journal, 14(11):2570–2579, 1995.

[239] G.C. Yuan, Y.J. Liu, M.F. Dion, M.D. Slack, L.F. Wu, S.J. Altschuler, and O.J. Rando. Genome-scale identification of nucleosome positions in S. cerevisiae. Science, 309(5734):626–630, 2005.

[240] S. Holm. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6:65–70, 1979.

[241] F. Chimienti, S. Devergnas, F. Pattou, F. Schuit, R. Garcia-Cuenca, B. Vandewalle, J. Kerr-Conte, L. Van Lommel, D. Grunwald, A. Favier, et al. In vivo expression and functional characterization of the zinc transporter ZnT8 in glucose-induced insulin secretion. Journal of Cell Science, 119(20):4199, 2006.

[242] R. Bort, J.P. Martinez-Barbera, R.S.P. Beddington, and K.S. Zaret. Hex homeobox gene-dependent tissue positioning is required for organogenesis of the ventral pancreas. Development, 131(4):797–806, 2004.

[243] A. Apelqvist, U. Ahlgren, and H. Edlund. Sonic hedgehog directs specialised mesoderm differentiation in the intestine and pancreas. Curr. Biol., 7(10):801–804, 1997.

[244] M.K. Thomas, N. Rastalsky, J.H. Lee, and J.F. Habener. Hedgehog signaling regulation of insulin production by pancreatic β-cells. Diabetes, 49:2039–2047, 2000.


Acta Universitatis Upsaliensis
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 305

Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science and Technology, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology”.)

Distribution: publications.uu.se
urn:nbn:se:uu:diva-7862

ACTA UNIVERSITATIS UPSALIENSIS
UPPSALA 2007