3- NON-RIBOSOMAL GENE RECONSTRUCTION
Core / auxiliary / strain-specific genes
Housekeeping genes and accordance with global reconstruction
MLSA
Alignment (amino acid / nucleotide, depending on the level of resolution)
Filtering alignments
Number of genes for a stable topology
Horizontal gene transfer
Tetranucleotide signatures
Housekeeping genes: alternative phylogenies
Core genes, with phylogenetic signal
Auxiliary genes, not present in all populations, with low phylogenetic signal
Specific genes of a single strain, without phylogenetic signal
Lan and Reeves. 2000 TRENDS Microbiol 8: 396-401
Characteristics of a molecule as a molecular clock
Universally present
only 34 orthologous universal genes (Huynen & Bork, PNAS, 1998. 95:5849-5856)
group-specific (i.e. phylum, family, genus…) genes with phylogenetic signal can be used
Functional constancy
Sufficient sequence conservation for reconstruction purposes
Sufficient sequence complexity for a good phylogenetic signal
Ludwig and Schleifer. 2005 Microbial phylogeny and evolution (Sapp) 70-98. (Oxford University Press)
Markers supporting global phylogenies
16S rRNA
23S rRNA
EF-Tu (some phyla are paraphyletic, e.g. Actinobacteria and Streptomyces)
RNA polymerase rpoB (some phyla are paraphyletic, e.g. Epsilonproteobacteria and the rest of the Proteobacteria)
Heat shock protein Hsp60 (Bacteria: GroEL, Archaea: Tf-55; some may be paraphyletic)
Aminoacyl-tRNA synthetases
Markers that do not support global phylogenies
ATPases
DNA gyrases
Hsp70
RecA
Housekeeping genes: not all give the same resolution
PHYLOGENY BASED ON NON-RIBOSOMAL GENES: MLSA
Stackebrandt et al., 2002, Int J Syst Evol Microbiol. 52:846-849
Gevers et al., 2005, Nature Rev. Microbiol. 3:733-739
MLSA (multilocus sequence analysis):
5-10 full/partial sequences of housekeeping genes
Drawbacks: primer design difficulties, biases in the selection of genes, time consuming
↓ number for stable topology
Amplify and sequence 5-10 housekeeping genes for each strain
Concatenate gene sequences
Reconstruct the phylogeny
[Diagram: genA, genB, genC, genD, genE and genF concatenated for each of Str. 1, Str. 2, Str. 3 and Str. 4]
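The concatenation step of the workflow above can be sketched as follows. This is a minimal illustration, assuming each gene has already been aligned separately; the function name `concatenate_genes` and the toy sequences are hypothetical and not taken from the cited papers.

```python
from typing import Dict, List


def concatenate_genes(alignments: Dict[str, Dict[str, str]],
                      gene_order: List[str]) -> Dict[str, str]:
    """Join per-gene aligned sequences into one concatenated row per strain."""
    if not gene_order:
        raise ValueError("gene_order must contain at least one gene")
    # Each gene alignment must have rows of equal length.
    for gene in gene_order:
        lengths = {len(seq) for seq in alignments[gene].values()}
        if len(lengths) != 1:
            raise ValueError(f"rows of {gene} differ in length")
    # Keep only strains that were sequenced for every gene.
    strains = set.intersection(*(set(alignments[g]) for g in gene_order))
    return {
        strain: "".join(alignments[gene][strain] for gene in gene_order)
        for strain in sorted(strains)
    }


# Hypothetical toy alignments (two genes, three strains).
alignments = {
    "genA": {"Str1": "ATG-CA", "Str2": "ATGGCA", "Str3": "ATGACA"},
    "genB": {"Str1": "TTGA", "Str2": "TTGA", "Str3": "CTGA"},
}
supermatrix = concatenate_genes(alignments, ["genA", "genB"])
```

The resulting dictionary holds one concatenated sequence per strain and could then be passed to any tree-building program.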
The alignments of protein-coding genes are less clear than those of rRNA
Protein sequences vs. rRNA sequences
Coding DNA carries information in triplets (codons)
The degenerate code allows silent mutations (under few evolutionary constraints)
For deep phylogenies, amino acid alignments give better resolution.
DNA phylogenies should only be done with closely related sequences
Generally shorter sequences (300-1000 residues) than rRNA
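The point about codons and silent mutations can be illustrated with a short sketch. It assumes the standard genetic code; the sequences are hypothetical examples chosen so that one nucleotide change is silent and another changes the amino acid.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence codon by codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(0, len(dna) - len(dna) % 3, 3))


reference = "ATGCTTGAA"   # Met-Leu-Glu
silent = "ATGCTCGAA"      # third codon position changed: still Leu
replacement = "ATGCCTGAA" # second codon position changed: Leu becomes Pro

same_protein = translate(reference) == translate(silent)
changed_protein = translate(reference) != translate(replacement)
```

Both variants differ from the reference by one nucleotide, but only one differs at the protein level, which is why nucleotide alignments accumulate changes faster than amino acid alignments.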
Removing hypervariable positions reduces phylogenetic noise
http://molevol.cmima.csic.es/castresana/Gblocks.html
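The idea of filtering an alignment can be sketched with a simple column filter. This is not the Gblocks algorithm, which selects conserved blocks under its own criteria; it is a simplified stand-in, and the function name and both thresholds (`max_gap_fraction`, `min_major_fraction`) are assumptions made for illustration.

```python
from collections import Counter
from typing import List


def filter_columns(alignment: List[str],
                   max_gap_fraction: float = 0.0,
                   min_major_fraction: float = 0.5) -> List[str]:
    """Drop columns with too many gaps or too little conservation."""
    n_seqs = len(alignment)
    kept = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        residues = [c for c in column if c != "-"]
        gap_fraction = (n_seqs - len(residues)) / n_seqs
        if gap_fraction > max_gap_fraction or not residues:
            continue  # gap-rich column
        # Fraction of sequences sharing the most common residue.
        major_fraction = Counter(residues).most_common(1)[0][1] / len(residues)
        if major_fraction >= min_major_fraction:
            kept.append(col)
    return ["".join(seq[c] for c in kept) for seq in alignment]


# Hypothetical protein alignment: column 4 has gaps, column 6 is hypervariable.
alignment = [
    "MKV-LAT",
    "MKVQLSG",
    "MRVQLCA",
    "MKV-LDT",
]
filtered = filter_columns(alignment)
```

With the default thresholds the gapped column and the hypervariable column are removed, leaving five columns per sequence.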
Of all 22 analyzed genes:
57 % Bacteroidetes
27 % Chlorobi
18 % Chlorobi-Bacteroidetes
[Figure: single-gene trees for A. PyrG, B. GlyA and C. GroEL (scale bar 0.1; bootstrap values at the nodes). Taxa in each tree: Bacteroides fragilis, Bacteroides thetaiotaomicron, Prevotella intermedia, Porphyromonas gingivalis, Cytophaga hutchinsonii, Chlorobium chlorochromatii, Chlorobium tepidum, Rhodopirellula baltica, Geobacter sulfurreducens, Nitrosomonas europaea, Treponema denticola, Oceanobacillus iheyensis]
One cannot rely on single-gene reconstructions, which may produce inconsistent results
Single genes may lead to different topologies
The number of genes in the concatenate influences the stability of the tree
[Plot: bootstrap support (0-100) against number of genes in the concatenate (4, 8, 12, 16)]
Random selection among the 22 genes, checking branching robustness
Bootstrap values improve as the number of genes in the analysis increases
Below 8 genes one can obtain unstable topologies
12 genes gave the threshold for reliability
For taxonomic purposes, 16S rRNA gene sequence analysis is the most parsimonious approach
Sória-Carrasco et al., 2008, System Appl Microbiol. 30:171-179
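The random selection of genes described above can be sketched as a sampling design. This only draws the gene subsets; concatenating each subset and computing bootstrap support would need an external tree-building program and is not shown. The gene names, the number of replicates and the seed are hypothetical.

```python
import random
from typing import Dict, List


def random_gene_subsets(genes: List[str], subset_size: int,
                        n_replicates: int, seed: int = 0) -> List[List[str]]:
    """Draw random gene subsets (without replacement within a subset)."""
    rng = random.Random(seed)  # fixed seed so the design is reproducible
    return [sorted(rng.sample(genes, subset_size))
            for _ in range(n_replicates)]


# Placeholder names for the 22 analyzed genes.
genes = [f"gene{i:02d}" for i in range(1, 23)]

# One set of replicates for each concatenate size shown in the plot.
design: Dict[int, List[List[str]]] = {
    size: random_gene_subsets(genes, size, n_replicates=10)
    for size in (4, 8, 12, 16)
}
```

Comparing bootstrap values across the subsets of each size would show how support changes as more genes are concatenated.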
MLSA: phylogenetic reconstructions
MULTIPLE SEQUENCE ALIGNMENTS
sometimes have better resolution than the 16S rRNA gene
16S rRNA gene can have very low resolution
Jiménez et al., 2013, System Appl Microbiol, 36: 383-391
MULTIPLE SEQUENCE ALIGNMENTS (LARGE SETS)
r-MLST (ribosomal protein concatenates)
specI (single-copy marker genes)
r-MLST (ribosomal protein concatenates)
http://pubmlst.org/rmlst/
Jolley et al., 2012, Microbiology 158:1005-15
53 ribosomal protein genes (rps genes)
specI
(http://vm-lux.embl.de/~mende/specI/)
Mende et al., Nat Methods, in revision
Based on 40 universal, single-copy marker genes
Optimized cutoffs (96.5% nucleotide identity)
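How an identity cutoff such as 96.5% can be applied is sketched below for two already aligned sequences. This is only an illustration of the threshold, not the specI procedure itself; the function names and the toy sequences are assumptions.

```python
SPECIES_CUTOFF = 96.5  # percent nucleotide identity, as quoted above


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned positions where neither row has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped positions
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


def above_cutoff(seq_a: str, seq_b: str,
                 cutoff: float = SPECIES_CUTOFF) -> bool:
    """True if the pair reaches the identity cutoff."""
    return percent_identity(seq_a, seq_b) >= cutoff


# Hypothetical 200-nt aligned sequences with 4 and 10 mismatches.
reference = "A" * 200
close_relative = "A" * 196 + "C" * 4     # 98% identical
distant_relative = "A" * 190 + "C" * 10  # 95% identical
```

Here `above_cutoff(reference, close_relative)` holds while `above_cutoff(reference, distant_relative)` does not.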
MONOPHYLY: phylogenetic reconstructions (MLSA)
Losing identity due to HGT
Kunin et al. 2005. Genome Res. 15:954-959
Phylogenetic incongruences:
HGT blurs the assignment of identities
Massive HGT in the microbial world
No tree of life is possible
TWO SCHOOLS
Phylogenetic incongruences:
Can be explained by
► gene duplication (paralogy) and deletion (hidden paralogy)
► false orthology assignment
► alignment artifacts
Orthology should be carefully checked
Soria-Carrasco & Castresana, 2008. Mol. Biol. Evol. 25: 2319-2329
Kurland. 2005. Bioessays 27:741-747
pyrE
aroA
Sometimes there is no other explanation (either true HGT or lack of information)
Sometimes there is a loss of phylogenetic signal
Tetranucleotide variation: 4^4 = 256
TETRA:
Genomes have an oligonucleotide usage (not yet understood, related to codon usage)
Similar genomes might have similar usage
ALIGNMENT-FREE PARAMETER
May be useful in deciding whether a group of strains deserves species status
Genome Signatures
G+C content ► dinucleotide ► not very informative
Codon usage ► trinucleotide ► more informative
Tetranucleotides (penta-, hexa-…) ► more information but more computing effort
Contigs can be ordered by means of their tetranucleotide similarity
Teeling et al., 2004 Environ Microbiol. 6:938-947
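The trade-off between information and computing effort follows from how fast the number of possible oligonucleotides grows with their length. A minimal sketch, with hypothetical function names, computes G+C content and the size of the k-mer space:

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / acgt


def n_possible_kmers(k: int) -> int:
    """Number of distinct oligonucleotides of length k over A, C, G, T."""
    return 4 ** k


# One value for G+C content versus a rapidly growing number of k-mer counts.
signature_sizes = {k: n_possible_kmers(k) for k in range(1, 7)}
```

G+C content summarizes a genome in a single number, whereas tetranucleotides give 256 values and hexanucleotides 4096, which is where the extra information and the extra computing effort come from.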
A high regression coefficient may indicate similar genome coding
Probably fragments of the same organism
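The comparison of two fragments by their tetranucleotide usage can be sketched as a Pearson correlation between frequency vectors. This sketch uses raw frequencies on one strand, which is a simplification; the published TETRA tool applies further statistical normalization. All function names and sequences are hypothetical.

```python
from itertools import product
from math import sqrt
from typing import List

# All 4^4 = 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_frequencies(seq: str) -> List[float]:
    """Relative frequency of every tetranucleotide (overlapping windows)."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # windows with ambiguous bases are skipped
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence too short or fully ambiguous")
    return [counts[k] / total for k in TETRAMERS]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Hypothetical fragments: two with the same composition, one different.
fragment_1 = "ACGTTGCAAGCTTAGGCATCGA" * 5
fragment_2 = "ACGTTGCAAGCTTAGGCATCGA" * 5
fragment_3 = "AAAAAAAATTTTTTTT" * 5

r_same = pearson(tetra_frequencies(fragment_1), tetra_frequencies(fragment_2))
r_diff = pearson(tetra_frequencies(fragment_1), tetra_frequencies(fragment_3))
```

A coefficient close to 1, as for the first pair, is the kind of result that would suggest fragments of the same organism; the second pair shares no tetranucleotides and correlates poorly.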