Genetics of Bacterial Genomes http://www.pasteur.fr/recherche/unites/REG/
THE CONSTRUCTOR AND THE REPLICATOR: PREREQUISITES FOR THE CONSTRUCTION OF A SYNTHETIC CELL
What are the theoretical tools most useful for understanding biological systems?
IHES, 13 November 2007
ACKNOWLEDGEMENTS
Génétique in silico: Marc Bailly-Béchet, Massimo Vergassola
Génoscope AGC: Claudine Médigue
Génétique des Génomes Bactériens (in silico): Gang Fang, Etienne Larsabal, Géraldine Pascal, Eduardo Rocha
The University of Hong Kong: Dept of Mathematics and HKU-Pasteur Research Centre
Stanislas Noria (collective name, working seminar in « Conceptual Biology »)
THE EXTINCTION OF POPPERISM: THE CRITICAL GENERATIVE METHOD
[Diagram: THEORY. Labels: common ideas, language; axioms and definitions; postulates; model (formal); demonstration; theorem (conjecture); predictions (existential, refutable); abstraction; instantiation; interpretation. Other models: clay statue, simulation, ...]
MODELS AND REALITY
Against Popper: coupling a model with reality is not straightforward (abstraction and instantiation)
Several models can account for the same reality (e.g. quantum and classical views in Nuclear Magnetic Resonance)
« Falsification » is not a direct outcome of a contradiction with an experimental prediction
Biology is, in depth, particularly abstract; however, properties of life are implemented as specific components which have idiosyncratic properties; this results in the multiplication of « anecdotes »
THREE REVOLUTIONS
1944 - 1985 MOLECULAR BIOLOGY
1985 - 2005 GENOMICS
2005 - … SYNTHETIC (SYMPLECTIC) BIOLOGY (highly multidisciplinary!)
« Symplectic » is in Greek (συν, together; πλεκτειν, to weave) the same word as « Complex » in Latin; used here to avoid the unwanted fuzzy connotations associated with « Complexity »; a connotation in Geometry will not interfere
LIFE AND COMPUTATION
SOME SIMPLE PHYSICAL CONSTRAINTS
TRANSLATION ORGANIZES THE BACTERIAL GENOME
THE PALEOME: CONSTRUCTOR AND REPLICATOR
THE CENOME: THE “PURPOSE” OF THE MACHINE
REPRODUCTION vs REPLICATION: THE ESSENTIALITY OF METABOLISM
WHAT LIFE IS
Three co-existing processes constitute life:
Metabolism and compartmentalization: a machine
Information transfer: a “program” (declarative, not prescriptive)
The cell is the atom of life
THE “GENETIC PROGRAM”
Physics: matter, energy, time
Statistical physics: Physics + « information »
Biology: Physics + information, coding, control...
Arithmetic: sequences of integers, recursion, coding…
Computation: Arithmetic + programs + machine...
The « genetic program » metaphor has practical consequences: we know how to manipulate genes and gene products, but do we have the conceptual tools to push the metaphor to its ultimate consequences?
WHAT COMPUTING IS
Two entities permit computing:
A machine able to read and write
A program on a physical support, split (in practice, but not conceptually) into two entities:
Program (providing the “goal”)
Data (providing the context)
The machine is distinct from the program
THE TURING MACHINE
The machine (read/write) is physically distinct from the program (data), written as a linear sequence of symbols
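The separation between machine and program can be made concrete with a minimal sketch of a one-tape Turing machine. The function name `run_turing_machine`, the transition-table layout and the `COMPLEMENT` program are illustrative assumptions, not part of the talk: the interpreter plays the role of the machine, while the transition table and the tape are plain data handed to it.

```python
from collections import defaultdict

def run_turing_machine(program, tape, state="start", blank="_", max_steps=1000):
    """Run a one-tape Turing machine and return the final tape content.

    program maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is -1 (left), +1 (right) or 0 (stay).
    """
    # The tape is a sparse mapping from position to symbol; unseen cells are blank.
    cells = defaultdict(lambda: blank, enumerate(tape))
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        write, move, state = program[(state, cells[head])]
        cells[head] = write
        head += move
    lo, hi = min(cells), max(cells)
    return "".join(cells[i] for i in range(lo, hi + 1)).strip(blank)

# Hypothetical program: complement each bit, halt on the first blank cell.
COMPLEMENT = {
    ("start", "0"): ("1", +1, "start"),
    ("start", "1"): ("0", +1, "start"),
    ("start", "_"): ("_", 0, "halt"),
}

print(run_turing_machine(COMPLEMENT, "0110"))
```

Swapping `COMPLEMENT` for another table changes what is computed without touching the interpreter, which is the sense in which the machine is distinct from the program.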
CELLS AND COMPUTERS
Genetics rests on the description of genomes as texts written with a four-letter alphabet: do cells behave as computers?
Horizontal gene transfer
Viruses
Genetic engineering
Direct transplantation of a naked genome into a recipient cell, with subsequent change of the recipient machine into a new one (2007)
all point to a separation between «Machine» (the cell factory) and «Data/Program»
Need: conceptual analysis of biological information (algorithmic complexity, logical depth...)
AN ALGORITHMIC VIEW OF BIOLOGICAL ACTIONS
Replication, transcription, translation: high parallelism
“Begin, Check Control Points, Repeat, End”
The action is always oriented, with a beginning and an end
The processes of time control (check points) are rarely taken into account (except for the replication/division processes), but their role is essential to allow coordination of multiple actions in parallel
Need: conceptual analysis of check points; experimental identification
IS THERE A MAP OF THE CELL IN THE CHROMOSOME?
John von Neumann, trying to understand the brain, suggested that, were a computer both to behave as a computer and to construct the machine itself, it should harbour an image of the machine somewhere
That special computer had to be split into a replicator and a constructor, which expresses the program for construction of both the replicator and the constructor
The metaphor does not appear to apply to the brain; does it apply to the cell?
A GENETIC COMPUTER?
In a computer the machine is distinct from data and program
In the cell, data and program play the same role (they are ‘declarations’, not prescriptions); they can be modified by the machine (Pol IV, Pol V...)
General reflection (Number Theory) considers the actions of the machine, but not the way it is constructed
PARADOX
If the machine has not only to behave as a Turing machine but also to make the machine, one must find a geometrical program somewhere in the machine (J. von Neumann)
Is there an image of the organism in the genome?
GENE ORDER AND CELL SHAPE
Tamames J, Gonzalez-Moreno M, Mingorance J, Valencia A, Vicente M. Bringing gene order into bacterial shape. Trends in Genetics (2001) 17: 124-126
The mur-fts cluster
PHYSICS OF REPLICATION
DNA forms a long folded thread: how do the daughter molecules separate?
Are physical constraints reflected in the sequence?
[Replication is oriented: the physics of a strand cannot be that of its complement]
A correct use of physics helps!
A TEXTBOOK VIEW OF ENTROPY
[Figure: dot diagram at time 0 and at time t. Benjamin Crowell, licensed under the Creative Commons Attribution-ShareAlike license]
S = k log Ω
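As a worked instance of S = k log Ω, the sketch below assumes N independent particles whose number of accessible positions scales with the available volume, so that Ω grows as V^N and the entropy change on expansion is ΔS = N k ln(V2/V1). The function name `entropy_change` and the particle number are illustrative choices, not values from the talk.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def entropy_change(n_particles, volume_ratio):
    """Entropy change (J/K) when the accessible volume is multiplied by volume_ratio.

    Assumes independent particles, so Omega scales as V**N and
    Delta S = N * k * ln(V2 / V1).
    """
    return n_particles * K_B * math.log(volume_ratio)

# Doubling the volume available to 1000 independent particles.
print(entropy_change(1000, 2.0))
```

The result is positive whenever the volume increases and zero when it is unchanged, which is the textbook statement the slide illustrates.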
AN INCREASE IN ENTROPY IS ENOUGH TO SEPARATE CHROMOSOMES
[Figure: time 0 and time t]
Jun S, Mulder B. Entropy-driven spatial organization of highly confined polymers: lessons for the bacterial chromosome. Proc Natl Acad Sci U S A (2006) 103: 12388-12393
Danchin A, Guerdoux-Jamet P, Moszer I, Nitschke P. Mapping the bacterial cell architecture into the chromosome. Philos Trans R Soc Lond B Biol Sci (2000) 355: 179-190
CONCEPTS AND PATCHES
The processes constituting life can be analyzed conceptually. They need, however, to be implemented with concrete objects that have idiosyncratic properties. The DNA sequence cannot be a smooth linear double helix, simply because of the chemical nature of its nucleotides; it winds, turns and bends. However, it needs to be recognized by control or structural elements. How can these divergent constraints be reconciled?
RECURSIVE MODELLING
Realistic Model 1 <=> Real sequence
Prediction 1
Realistic Model 2 <=> Real sequence
Prediction 2
Realistic Model 3 <=> Real sequence
Prediction 3
…..
CONSTRAINTS IN THE DNA SEQUENCE
Evolution optimises replication, while DNA also needs to support gene sequences
This is witnessed by:
A period 3, signature of the codon succession in genes (constrained by the genetic code rule)
A period 10-11.5 of yet unknown function...
PERIODS IN GENOMES
One observes a correlation between base pairs with period three
After deconvolution of this period there remains a somewhat fuzzy period of 10 to 11.5 base pairs
Eskesen et al. BMC Molecular Biology (2004) 5: 12
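A periodic correlation of this kind can be looked for with a very simple statistic: for each lag, the fraction of positions where the base is identical to the base one lag further on. The sketch below is an illustration under that assumption, not the method of the cited study; `match_profile` and the toy sequence are hypothetical. The removal of the period-3 signal needed to expose the 10 to 11.5 bp period is not attempted here.

```python
def match_profile(seq, max_lag=12):
    """For each lag, return the fraction of positions i with seq[i] == seq[i + lag]."""
    seq = seq.upper()
    profile = {}
    for lag in range(1, max_lag + 1):
        pairs = len(seq) - lag
        same = sum(seq[i] == seq[i + lag] for i in range(pairs))
        profile[lag] = same / pairs
    return profile

# Toy sequence with a strict period of three, standing in for a coding region.
toy = "ACG" * 20
profile = match_profile(toy)
for lag in sorted(profile):
    print(lag, round(profile[lag], 2))
```

On a real genome the peaks at multiples of three would be far weaker than in this toy case, so the profile would need to be compared with that of a shuffled sequence.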
A UNIVERSAL FEATURE OF THE GENOME TEXT: 10-11.5
[Figure: Helicobacter pylori. Panels: real, model, discrepancy (fs(G-), from -0.01 to 0.01), validation, motifs; horizontal axis 0 to 100 bp]
TYPE A FLEXIBLE MOTIFS
[Figure: motifs aligned 5' to 3' on a scale from -10 to 10 bp. Motif 1: all domains; motif 2: Proteobacteria; motif 4: Archaea. Two helix views showing the positions of the A, T, G and C bases of the motifs]
Nucleotides forming this class of motifs are fully accessible on this side of the helix and the dinucleotides are located in the small groove
Nucleotides forming this class of motifs are fully accessible on this side of the helix and the dinucleotides are located in the large groove
FLEXIBLE MOTIFS ACCOMMODATE LOCAL VARIATIONS OF THE DNA STRUCTURE
The flexibility of these motifs allows DNA to accommodate superturns and bends
Larsabal E, Danchin A. Genomes are covered with ubiquitous 11 bp periodic patterns, the "class A flexible patterns". BMC Bioinformatics (2005) 6: 206
OPEN QUESTIONS
The constraint resulting from the presence of flexible motifs is so strong that it should be visible in gene products
It may result in a non-random distribution of genes if some functions are associated with regularities in proteins (alpha helices, beta sheets, beta turns, etc.)
MULTIVARIATE ANALYSES
Multivariate analyses try to extract information by reducing as much as possible the number of descriptors of the objects of interest
Laplace-Gauss statistics
Principal Component Analysis uses the centered average and a simple distance (identity); it is the reference method
Correspondence Analysis belongs to the same family, but it uses the χ² measure as a distance (Benzécri, 1965)
Absence of normality (or log-normality)
Independent Component Analysis uses the non-Gaussian character of the values associated with descriptors; it characterizes objects belonging to common independent clusters (the « cocktail party » problem) (Hérault, 1984)
Further methods need to be developed
Correspondence Analysis (CA)
Factorial space of the proteins
Factorial space of the amino acids
Superimposition of both spaces (clouds)
clustering
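The χ² distance that Correspondence Analysis places between two rows of a contingency table can be sketched in a few lines. The example below assumes a small table of counts with proteins as rows and amino acid classes as columns; the function name `chi2_distance` and the counts are hypothetical. It computes only the distance, not the full factorial decomposition.

```python
def chi2_distance(table, i, k):
    """Chi-square distance between the profiles of rows i and k of a count table.

    Each row is converted to a profile (counts divided by the row total), and
    squared differences are weighted by the inverse of the column masses.
    """
    total = sum(sum(row) for row in table)
    n_cols = len(table[0])
    col_mass = [sum(row[j] for row in table) / total for j in range(n_cols)]
    prof_i = [x / sum(table[i]) for x in table[i]]
    prof_k = [x / sum(table[k]) for x in table[k]]
    return sum((a - b) ** 2 / c for a, b, c in zip(prof_i, prof_k, col_mass)) ** 0.5

# Hypothetical counts: three proteins described by three amino acid classes.
counts = [
    [10, 20, 30],
    [20, 40, 60],  # twice as long as the first, same composition
    [30, 10, 20],
]
print(chi2_distance(counts, 0, 1), chi2_distance(counts, 0, 2))
```

Because rows are compared through their profiles, two proteins with the same composition but different lengths sit at distance zero, which is one reason this distance suits composition data.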
UNIVERSAL BIASES IN PROTEIN AMINO ACID COMPOSITION
First axis: separates Integral Inner Membrane Proteins (IIMP) from the rest; driven by opposition between charged and large hydrophobic residues
Second axis: separates proteins by their content in aromatic amino acids; enriched in orphan proteins
Third axis: separates proteins according to an opposition driven by the G+C content of the first codon base
BIAS IN AMINO ACID DISTRIBUTION
Neighborhood: distribution of amino acids in the proteome
G. Pascal
BIASES IN HYDROPHOBICITY AND AROMATICITY OF PROTEINS
A strong bias opposing charged residues to hydrophobic residues
Proteins of the inner membrane: LacY, SecE…
Proteins of the outer membrane: OmpT, OmpL…
TEMPERATURE-DEPENDENT BIASES IN PROTEIN AMINO ACID COMPOSITION
The general trend of amino acid composition bias is to avoid some amino acids at higher temperatures (associated with aging processes)
Mesophilic bacteria belong to at least two different classes (in a 5-cluster analysis)
Biases are always dominated by the IIMP clustering
COMPARATIVE PROTEOMICS
A specific asparagine bias in psychrophiles
[Figure labels: IIMPs; 53% mesophiles; 62% thermophiles; 55% psychrophiles; motility; cell wall, outer membrane; transport (TonB), secretion; adaptation to stress; metabolism of DNA and RNA; isoaspartate]
A CHEMICAL ANECDOTE
Asparagine deamidates: a major contribution to protein aging
Main post-translational modification
Reaction still poorly understood
Spontaneous reaction (untargeted?)
Affects the protein structure (and function?)
Role in regulating protein folding
Signal for degradation of intracellular proteins
[Scheme labels: asparagine (N); intermediary: succinimide; degradation: succinimide; isoaspartate; aspartate; challenges]
SECOND UNIVERSAL BIAS
Bias driven by protein aromaticity
[Figure labels: aromatic amino acids; small amino acids (low metabolic cost); function]
WHY AROMATIC RESIDUES IN ORPHAN PROTEINS?
From orphans to « Gluons »; how are genes created?
Orphans lose their status in the course of evolution: Rocha, 2002; Pedulla, 2003
G. Pascal
LOCAL BIASES OF CODON USAGE
Correspondence Analysis shows that genes with similar biases are functionally related. How is this reflected in the chromosome?
A clustering method (Vergassola et al.) based on information theory groups the genes into homogeneous families, which appear not to be randomly spread in the chromosome. The method identifies 4 classes in E. coli and 5 in B. subtilis. Genes sharing similar codon bias tend to be close to each other on the chromosome, in coherent patches extending on average over ten times the length of transcriptional units
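To give a feel for what an information-theoretic comparison of codon usage can look like, the sketch below measures the Jensen-Shannon divergence between the codon frequency distributions of two genes. This is a stand-in chosen for illustration and is not claimed to be the clustering method of Vergassola et al.; the function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter

def codon_frequencies(gene):
    """Return the relative frequency of each codon in an in-frame coding sequence."""
    gene = gene.upper()
    usable = len(gene) - len(gene) % 3  # ignore a trailing partial codon
    counts = Counter(gene[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical usage, 1 for disjoint usage."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a)

    return 0.5 * kl(p) + 0.5 * kl(q)

# Two toy genes encoding the same alanine run with different synonymous codons.
gene_a = "GCTGCTGCTGCC"
gene_b = "GCCGCCGCCGCT"
print(jensen_shannon(codon_frequencies(gene_a), codon_frequencies(gene_b)))
```

A pairwise divergence of this kind could feed any standard clustering procedure; how the resulting classes are then mapped along the chromosome is a separate step not shown here.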
GENOMIC TRANSLATION ISLANDS
Genes with similar bias are organized into groups longer than operons, showing some translation-driven organization of the chromosome
A major part of this effect comes from the recycling of rare transfer RNA molecules. It is essential to understand that individual molecules (not concentration!) are important in the cell
M. Bailly-Béchet
TRANSLATION ISLANDS
One group is associated with high expression (blue).
The other groups are also functionally consistent: horizontally transferred genes (red), motility (yellow) and intermediary metabolism (green). M. Bailly-Béchet
M Bailly-Bechet, A Danchin, M Iqbal, M Marsili, M Vergassola. Codon usage domains over bacterial chromosomes. PLoS Computational Biology (2006) 2: April 20th
SEQUENCES AND ARCHITECTURES
The non-random distribution of genes in the genome suggests strong constraints on the 3D distribution of molecules in the cell. Escherichia coli has to accommodate in less than one cubic micrometer 20,000 ribosomes, 150,000 tRNAs, 1,000 mRNAs (each 3 times longer than the cell), and a DNA molecule 1,000 times longer than the cell, together with a huge number of proteins. Occupation of space is therefore a major question combining constraints related to the physics of diffusion and the physics of polymers. Furthermore, the « concentration » of many small molecules is meaningless (1 µM = 600 molecules in E. coli)…
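The figure of about 600 molecules for 1 µM follows from simple arithmetic, sketched below under the assumption of a cell volume of one cubic micrometer (1e-15 litre), the upper bound quoted above. The function name `molecules_in_cell` is illustrative.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

def molecules_in_cell(concentration_molar, volume_um3=1.0):
    """Number of molecules at a given molar concentration in a volume given in cubic micrometers."""
    litres = volume_um3 * 1e-15  # 1 cubic micrometer = 1e-15 litre
    return concentration_molar * litres * AVOGADRO

print(molecules_in_cell(1e-6))  # 1 µM: roughly 600 molecules
print(molecules_in_cell(1e-9))  # 1 nM: less than one molecule per cell
```

At nanomolar levels the expected count falls below one molecule per cell, which is why counting individual molecules is more meaningful than quoting a concentration.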
LOOKING FOR THE REPLICATOR AND THE CONSTRUCTOR
Are genes grouped randomly in the chromosomes?
Do we find different gene categories, in terms of the way they are organized?
At first sight, consistent with different DNA management processes in different organisms, not much is conserved, while genes transferred from other organisms are distributed throughout genomes
However, groups of genes such as operons or pathogenicity islands tend to cluster in specific places, and they code for proteins with common functions. « Persistent » genes are clustered together
Genes are preferentially located in the leading replication strand in Bacteria. There is however much variation, depending on the organism, with a considerable bias in A+T-rich Gram-positive organisms
[Figure: circular chromosome maps (Ori, Ter; 0, 90, 180, 270) showing gene density and gene density in the leading strand. Escherichia coli: 55% leading; Treponema pallidum: 65% leading; Bacillus subtilis: 75% leading; Thermoanaerobacter tengcongensis: 87% leading]
DISTRIBUTION OF HIGHLY-EXPRESSED GENES
Highly-expressed genes are clustered near the origin of replication in fast growing bacteria
[Figure: position (origin, middle, terminus; Ori, Ter) of rRNA and highly-expressed genes in C. crescentus, M. tuberculosis, E. coli and B. subtilis; axes from 0% to 70% and from 0% to 50%]
CONCLUSION
The genome organization is so rigid that the overall result of selection pressure on DNA is visible in the genome text, which differentiates the leading strand from the lagging strand
TO LEAD OR TO LAG...
Is it possible to see whether there is a difference in the nucleotide composition between the leading and the lagging strand? Does that have a consequence on the codon biases? Does that have a consequence for the protein amino acid sequence?
TO LEAD HAS A COST: BIAS VISIBLE IN PROTEINS…
GT in the leading strand, CA in the lagging strand....
[Figure: scatter plots of % V and % T (axes from 0 to 15) for the proteins of Borrelia burgdorferi and Chlamydia trachomatis]
Proteins are made of 20 amino acid types, among which valine and threonine; one observes that valine-rich proteins are on the leading strand while threonine-rich proteins are on the lagging strand! Isologous proteins preferentially replace one residue with the other when their gene changes strand
This should be taken into account in models ofevolution
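The valine and threonine percentages plotted for each protein are straightforward to compute. The sketch below assumes protein sequences in one-letter code; the function names and the two toy sequences are hypothetical and carry no strand information, which would have to come from genome annotation.

```python
def residue_percent(protein, residue):
    """Percentage of a given residue (one-letter code) in a protein sequence."""
    protein = protein.upper()
    return 100.0 * protein.count(residue.upper()) / len(protein)

def v_t_profile(proteins):
    """Return (% valine, % threonine) for each protein, ready for a scatter plot."""
    return [(residue_percent(p, "V"), residue_percent(p, "T")) for p in proteins]

# Hypothetical sequences: one valine-rich, one threonine-rich.
print(v_t_profile(["MVVAVKVL", "MTTATKTL"]))
```

Colouring each point by the strand of its gene would then show whether the two populations separate, as the slide reports for Borrelia burgdorferi and Chlamydia trachomatis.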
ESSENTIAL GENES LOCATE IN THE LEADING STRAND
[Figure: bar chart, from 0% to 100%, of leading versus lagging strand location for essential and non-essential genes, each split into highly expressed and non-highly expressed]
PHYSICAL CAUSALITY: AVOIDING COLLISIONS BETWEEN RNA AND DNA POLYMERASES
Co-oriented: DNAP deceleration, end of transcription. Consequences: 1. Slowing down of replication; 2. Loss of transcripts
Frontal: stop of RNAP & DNAP, transcription aborts. Consequences: 1. Truncated transcripts; 2. Truncated essential proteins
PERSISTENT GENES
Laboratory essential genes are located in the DNA leading strand. They are conserved in a majority of genomes. By contrast, the genes that are conserved and located in the leading strand make a particular category, which doubles the number of « essential » genes.
These genes make a universal category; 400-500 genes persist in a majority of bacterial genomes; they are not only involved in the three processes needed for life, but in maintenance and in adaptation to transient phenomena; a fraction manages the evolution of the organism.
GENE PERSISTENCE
Which functional category?
Information transfer
Compartmentalization
Intermediary metabolism
Stress, maintenance and repair
Highly non random!
Persistent genes
GENE PERSISTENCE
The contribution of gene divergence was measured, for each pair of genomes, as the correlation between the similarity of orthologous pairs and that of the 16S rRNA
(A) 38% (resp. 48%) of B. subtilis (resp. E. coli) persistent genes showed a correlation coefficient > 0.9 between the sequence similarity of the pair of orthologs and that of the 16S rRNA
Some genes (B) evolve in an erratic way. This may be due to horizontal gene transfer, local adaptations leading to a change in evolutionary pace, or simply wrong assignments of orthology. The latter is a significant problem, especially in large protein families
G Fang, EPC Rocha, A Danchin. How essential are non-essential genes? Mol Biol Evol (2005) 22: 2147-2156
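The criterion used above, a correlation coefficient above 0.9 between ortholog similarity and 16S rRNA similarity across genome pairs, can be sketched as follows. The Pearson coefficient is assumed here for illustration, and the similarity values are invented toy numbers, not data from the cited paper; `pearson` and `tracks_16s` are hypothetical names.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def tracks_16s(ortholog_similarity, rrna_similarity, threshold=0.9):
    """True when a gene's similarity across genome pairs follows that of the 16S rRNA."""
    return pearson(ortholog_similarity, rrna_similarity) > threshold

# Toy values, one entry per genome pair (illustrative only).
rrna = [0.99, 0.95, 0.90, 0.85, 0.80]
gene_regular = [0.90, 0.78, 0.66, 0.55, 0.41]  # diverges in step with the 16S rRNA
gene_erratic = [0.60, 0.85, 0.40, 0.90, 0.50]  # no consistent trend
print(tracks_16s(gene_regular, rrna), tracks_16s(gene_erratic, rrna))
```

Genes of the erratic kind would fall in class (B), for any of the reasons listed above.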
PERSISTENT GENES ARE CLUSTERED TOGETHER
Persistent genes are functionally defined. They are located in the DNA replication leading strand
The way they group along chromosomes in more than 250 bacteria (genome length > 1,500 genes) displays three clusters that reflect a scenario of the origin of life. This is why it is proposed to name this group of core genes the paleome (from παλαιος, ancient)
EXISTENCE IMPLIES CLUSTERED PERSISTENCE
Why are persistent genes clustered? A simple model shows that if, in addition to horizontal gene transfer, there is a process deleting genes in groups in genomes, then any gene contributing to fitness frequently enough over generations will tend to cluster with other genes with similar properties. This accounts for clustering of essential genes, but most probably also for clustering of antibiotic resistance genes in bacteria found in hospitals....
As a consequence, gene clustering will precede, not derive from, co-transcription or protein-protein interaction (no intelligent design!)
Note: the model needs to be refined. It may yield interesting chaotic behaviours
EXISTENCE IMPLIES CLUSTERING
PERSISTENT GENES CONNECTIVITY
Using 228 genomes with more than 1500 genes and « correct » annotations, we have identified genes that tend to remain close to one another; this « mutual attraction » constructs a remarkable network made of three layers
PERSISTENT GENES RECAPITULATE THE ORIGIN OF LIFE
G. Fang
The external network, made of genes of intermediary metabolism (nucleotides and coenzymes, lipids), is highly fragmented; the middle network is built around class I tRNA synthetases; and the inner network, almost continuous, organized around the ribosome, transcription and replication, manages information transfers
A Danchin, G Fang, S Noria. The extant core bacterial proteome is an archive of the origin of life. Proteomics (2007) 7: 875-889
METABOLISM AND REPLICATION
This scenario emphasizes the separation between metabolism and replication, the latter being a secondary invention of prebiotic systems:
Building blocks => nucleotides => tRNA => ribosome => DNA
THE COMPOSITE GENOME
Expecting two genome components, coding for the machine and for the “purpose” of the machine, we need to distinguish between the replicator/constructor and secondary functions.
Extant genomes should comprise ubiquitous functions (not genes!), which would correspond to the former (here named the paleome), and functions specific to the environment of the organism (named the cenome, as in “biocenose”, to express the fact that these genes correspond to a specific niche)
CONSERVATION OF GENE CLUSTERING
[Figure: clustering frequency against frequency in genomes (from frequent to rare), for known and unknown genes. The paleome (from the Greek παλαιος, ancient): genome core, < 2000 genes. The cenome (from the Greek κοινος, common): variable genes, already > 50000 genes. Labels: antibiotics (penicillin, vancomycin, isoniazide, rifampicin, streptomycin, tetracycline); virulence]
A RECURSIVE MACHINE
Replicator: DNA specifies proteins that replicate DNA
Constructor: DNA specifies proteins which form the machine that constructs the cell
However, DNA can only accumulate errors… the machine needs to cope with errors
SYNTHESIS: A TALE OF TWO GENOMES
Life manifests first by growth and repair of weathering: the corresponding genome has existed since the origin; it is the paleome. Exploration of the environment is an inevitable consequence of existence; it results from continuous creation and exchange of the genes which form the cenome.
To live
To escape death
To live in context
A. Danchin. Archives or Palimpsests? Bacterial Genomes Unveil a Scenario for the Origin of Life. Biological Theory (MIT Press) (2007) 2: 52-61.
THREE PARTS IN THE GENOME’S ORGANIZATION
Anabolism and Replication
Maintenance and Repair: coping with errors
Life in context (the cenome)
METABOLISM AND REPLICATION
Replication accumulates errors (Muller’s ratchet and Orgel’s error catastrophe)
Reproduction: can metabolism reproduce in an error-prone context, and improve on imperfect components?
Freeman Dyson’s « origins of life » revisited
INFORMATION AGAIN
Metabolism improvement can be conceptually tolerated as creation of information is reversible (Landauer, 1961; Bennett, 1988)
Open question: « room » is needed to accommodate innovation; how is it obtained? Experiments are needed to identify the corresponding processes
GENOME EXPLORATION
Is the structure of the paleome homogeneous?
How do we see the dialogue between the young and the old (creation of information, in practice)?
ESSENTIAL METABOLISM
Recovery from an aged state requires:
Stepwise improvement of the « quality » of biological objects
Energy-dependent selection of what needs to be destroyed (ratchet mechanism)
Persistent genes of unknown function evolve following a tree that differs from that of the anabolic pathways…
CAVEAT: PATCHES FOR ANECDOTES
Each component of the cell has idiosyncratic properties; some are incompatible with other components
The paleome codes for anabolic and maintenance features, except for a few purely catabolic steps
One example: serine catabolism (accounts for serine toxicity)
This results in the « anecdotal » appearance of biological systems