Page 1: 3- NON-RIBOSOMAL GENE RECONSTRUCTION  Core / auxiliary / strain specific genes  Housekeeping genes and accordance with global reconstruction  MLSA

3- NON-RIBOSOMAL GENE RECONSTRUCTION

Core / auxiliary / strain-specific genes

Housekeeping genes and accordance with the global reconstruction

MLSA

Alignment (amino acid / nucleotide, depending on the level of resolution)

Filtering alignments

Number of genes for a stable topology

Horizontal gene transfer

Tetranucleotide signatures

Page 2

Housekeeping genes: alternative phylogenies

Core genes, with phylogenetic signal

Auxiliary genes, not present in all populations, with low phylogenetic signal

Strain-specific genes, present in a single strain, without phylogenetic signal

Lan and Reeves. 2000. TRENDS Microbiol 8: 396-401
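The core / auxiliary / strain-specific split above can be expressed as a rule over a gene presence/absence table. The following is a minimal Python sketch, assuming a hypothetical input that maps each gene name to the set of strains carrying it; the gene and strain names are invented, and the strict "present in every strain" rule for core genes is an assumption (real datasets may need a more tolerant cutoff to absorb annotation gaps).

```python
def classify_genes(presence, strains):
    """Label genes as core, auxiliary or strain-specific (illustrative sketch).

    presence: hypothetical mapping gene -> set of strains that carry it.
    strains:  all strains in the study.
    """
    all_strains = set(strains)
    labels = {}
    for gene, carriers in presence.items():
        carriers = set(carriers) & all_strains  # ignore strains outside the study
        if carriers == all_strains:
            labels[gene] = "core"  # present in every strain
        elif len(carriers) == 1:
            labels[gene] = "strain-specific"  # present in a single strain
        elif carriers:
            labels[gene] = "auxiliary"  # present in some, but not all, strains
        else:
            labels[gene] = "absent"
    return labels


if __name__ == "__main__":
    strains = ["Str1", "Str2", "Str3", "Str4"]
    presence = {
        "geneA": {"Str1", "Str2", "Str3", "Str4"},
        "geneB": {"Str1", "Str3"},
        "geneC": {"Str2"},
    }
    for gene, label in classify_genes(presence, strains).items():
        print(gene, label)
```

Core genes are the category the slide credits with phylogenetic signal, so they are the natural pool from which housekeeping genes for MLSA are drawn.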

Page 3

Characteristics of a molecule used as a molecular clock

Universally present
► only 34 orthologous universal genes (Huynen & Bork, PNAS, 1998. 95:5849-5856)
► group-specific (i.e. phylum, family, genus…) genes with phylogenetic signal can also be used

Functional constancy

Sufficient sequence conservation for reconstruction purposes

Sufficient sequence complexity for a good phylogenetic signal

Ludwig and Schleifer. 2005. Microbial phylogeny and evolution (Sapp) 70-98. (Oxford University Press)

Markers supporting global phylogenies

16S rRNA

23S rRNA

EF-Tu (some phyla are paraphyletic, e.g. Actinobacteria and Streptomyces)

RNA polymerase rpoB (some phyla are paraphyletic, e.g. Epsilonproteobacteria and the rest of the Proteobacteria)

Heat shock Hsp60 (Bacteria: GroEL; Archaea: Tf-55; some may be paraphyletic)

Aminoacyl-tRNA synthetases

Markers that do not support global phylogenies

ATPases

DNA gyrases

Hsp70

RecA

Housekeeping genes: not all give the same resolution

Page 4

PHYLOGENY BASED ON NON-RIBOSOMAL GENES: MLSA

Stackebrandt et al., 2002, Int J Syst Evol Microbiol. 52:846-849; Gevers et al., 2005, Nature Rev. Microbiol. 3:733-739

MLSA (multilocus sequence analysis):

5-10 full/partial sequences of housekeeping genes

Drawbacks:
► primer design difficulties
► biases in the selection of genes
► time consuming
↓ these limit the number of genes, which may fall below the number needed for a stable topology

Amplify and sequence 5-10 housekeeping genes for each strain

Concatenate the gene sequences

Reconstruct the phylogeny

[Figure: concatenation of the genA, genB, genC, genD, genE and genF sequences for strains Str. 1 to Str. 4]
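The concatenation step can be pictured as building one long row per strain (a "supermatrix"). The following is a minimal Python sketch, assuming each gene has already been aligned separately; padding a strain that lacks a gene with gap characters, and recording per-gene coordinates, are choices made for this sketch rather than a prescribed MLSA protocol. The sequences are invented.

```python
def concatenate(gene_alignments, strains, gene_order):
    """Build a concatenated (supermatrix) alignment, one row per strain.

    gene_alignments: gene -> {strain: aligned sequence}; every sequence of a
    given gene must already have the same aligned length.
    Returns (rows, partitions), where partitions holds 1-based inclusive
    (gene, start, end) coordinates of each gene in the concatenate.
    """
    rows = {strain: [] for strain in strains}
    partitions = []
    offset = 0
    for gene in gene_order:
        alignment = gene_alignments[gene]
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are missing or not aligned")
        length = lengths.pop()
        for strain in strains:
            # A strain lacking this gene is padded with gaps (assumption).
            rows[strain].append(alignment.get(strain, "-" * length))
        partitions.append((gene, offset + 1, offset + length))
        offset += length
    return {strain: "".join(parts) for strain, parts in rows.items()}, partitions


if __name__ == "__main__":
    genes = {
        "genA": {"Str1": "ATG-C", "Str2": "ATGAC", "Str3": "ATGTC"},
        "genB": {"Str1": "GGT", "Str2": "GCT"},
    }
    rows, partitions = concatenate(genes, ["Str1", "Str2", "Str3"], ["genA", "genB"])
    for strain, seq in rows.items():
        print(strain, seq)
    print(partitions)
```

Keeping the partition coordinates allows a downstream program to fit a separate substitution model per gene, if it supports partitioned analyses.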

Page 5

Alignments of protein-coding genes are less clear than those of rRNA

Protein sequences vs. rRNA sequences

Coding DNA carries information in triplets (codons)

The degenerate code allows silent mutations (few evolutionary constraints)

For deep phylogenies, amino acid alignments give better resolution.

DNA phylogenies should only be done with closely related sequences

Generally shorter sequences (300-1000 residues) than rRNA
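The effect of the degenerate code can be seen by translating two coding sequences that differ only by synonymous (silent) substitutions. A minimal Python sketch, assuming the standard genetic code and clean, in-frame input; the two sequences are invented for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...), matching the amino acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate an in-frame coding sequence; '*' marks a stop codon.

    Ambiguity codes such as N are not handled in this sketch.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def mismatches(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    # Two made-up coding sequences differing only at third codon positions.
    dna1, dna2 = "CTGAAAGGT", "CTAAAGGGC"
    prot1, prot2 = translate(dna1), translate(dna2)
    print(mismatches(dna1, dna2), "nucleotide differences")
    print(mismatches(prot1, prot2), "amino acid differences", prot1, prot2)
```

Three nucleotide differences collapse to no amino acid difference. Over long evolutionary distances such silent sites can change repeatedly, which is one reason amino acid alignments are preferred for deep phylogenies and nucleotide alignments for close relatives.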

Page 6

Removing hypervariable positions reduces phylogenetic noise

http://molevol.cmima.csic.es/castresana/Gblocks.html
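Gblocks (linked above) selects conserved blocks using its own rules on conservation, gap content and block length. The sketch below is not Gblocks: it is a deliberately simplified per-column filter in Python that only illustrates the idea of discarding gap-rich and hypervariable positions. The thresholds are hypothetical defaults, not values from the slides, and the alignment is invented.

```python
from collections import Counter


def filter_columns(alignment, max_gap_fraction=0.5, min_conservation=0.5):
    """Remove gap-rich or hypervariable columns (simplified; not Gblocks).

    alignment: name -> aligned sequence (all the same length).
    A column is kept when at most max_gap_fraction of its rows are gaps and
    its most frequent residue occurs in at least min_conservation of ALL rows.
    Both thresholds are hypothetical defaults.
    """
    rows = list(alignment.values())
    n_rows = len(rows)
    kept = []
    for i in range(len(rows[0])):
        column = [row[i] for row in rows]
        if column.count("-") / n_rows > max_gap_fraction:
            continue  # too many gaps
        residues = [c for c in column if c != "-"]
        if not residues:
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / n_rows < min_conservation:
            continue  # hypervariable position
        kept.append(i)
    filtered = {name: "".join(seq[i] for i in kept) for name, seq in alignment.items()}
    return filtered, kept


if __name__ == "__main__":
    aln = {"s1": "MKV-LA", "s2": "MKAILG", "s3": "MRS-LW", "s4": "MKT--Y"}
    filtered, kept = filter_columns(aln)
    print(kept)
    for name, seq in filtered.items():
        print(name, seq)
```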

Page 7

Of all 22 analyzed genes:

57% Bacteroidetes

27% Chlorobi

18% Chlorobi-Bacteroidetes

[Figure: single-gene trees for A. PyrG, B. GlyA and C. GroEL (scale bar 0.1; bootstrap values at nodes). Taxa: Bacteroides fragilis, Bacteroides thetaiotaomicron, Prevotella intermedia, Porphyromonas gingivalis and Cytophaga hutchinsonii (Bacteroidetes); Chlorobium chlorochromatii and Chlorobium tepidum (Chlorobi); Rhodopirellula baltica, Geobacter sulfurreducens, Nitrosomonas europaea, Treponema denticola and Oceanobacillus iheyensis]

One cannot rely on single-gene reconstructions, which may produce inconsistent results

Single genes may lead to different topologies
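Percentages like those above presumably summarize the share of single-gene trees that support each grouping. A minimal Python sketch of that bookkeeping, assuming rooted trees written as nested tuples (a hypothetical format; real trees would be parsed from Newick files) over invented taxa A to E. It also computes a rooted Robinson-Foulds distance, one simple way to quantify how much two single-gene topologies disagree.

```python
def clades(tree):
    """Clades of a rooted tree written as nested tuples of leaf names.

    Each internal node contributes the frozenset of leaves below it; the
    root clade (all leaves) is dropped because every tree shares it.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))
    return found


def support(gene_trees, clade):
    """Percentage of gene trees that recover a given clade."""
    clade = frozenset(clade)
    hits = sum(clade in clades(tree) for tree in gene_trees.values())
    return 100 * hits / len(gene_trees)


def rf_distance(tree_a, tree_b):
    """Rooted Robinson-Foulds distance: clades found in only one of the trees."""
    return len(clades(tree_a) ^ clades(tree_b))


if __name__ == "__main__":
    # Hypothetical single-gene trees over five taxa A to E.
    gene_trees = {
        "gene1": ((("A", "B"), "C"), ("D", "E")),
        "gene2": ((("A", "C"), "B"), ("D", "E")),
        "gene3": ((("A", "B"), "C"), ("D", "E")),
    }
    print(support(gene_trees, {"A", "B"}))
    print(rf_distance(gene_trees["gene1"], gene_trees["gene2"]))
```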

Page 8

The number of genes in the concatenate influences the stability of the tree

[Figure: bootstrap support (0-100) versus number of genes in the concatenate (4 to 16)]

Random selection among the 22 genes, checking branching robustness

Bootstrap values improve as the number of genes in the analysis increases

Below 8 genes, one can obtain unstable topologies

12 genes gave the threshold for reliability

For taxonomic purposes, 16S rRNA gene sequence analysis is the most parsimonious approach

Soria-Carrasco et al., 2008, System Appl Microbiol. 30:171-179
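The subsampling experiment can be outlined as: draw random subsets of k genes from the 22, concatenate them, infer a tree, and measure bootstrap support by resampling alignment columns. This Python sketch covers only the two resampling steps; the gene names are hypothetical, the seeds arbitrary, and neither the tree inference nor the exact settings of the cited study are reproduced here.

```python
import random


def random_gene_subsets(genes, k, n_subsets, seed=0):
    """Draw n_subsets random sets of k distinct genes (without replacement)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(genes, k)) for _ in range(n_subsets)]


def bootstrap_replicate(alignment, rng):
    """One bootstrap pseudo-replicate: resample columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


if __name__ == "__main__":
    genes = [f"gene{i:02d}" for i in range(1, 23)]  # 22 hypothetical gene names
    for k in (4, 8, 12, 16):
        subsets = random_gene_subsets(genes, k, n_subsets=3, seed=k)
        print(k, subsets[0])
    rng = random.Random(42)
    print(bootstrap_replicate({"Str1": "ATGACGCT", "Str2": "ATG-CGGT"}, rng))
```

Each replicate keeps whole columns together, so the character states shared by the strains at a site are resampled as a unit; support for a branch is then the fraction of replicate trees that recover it.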

Page 9

MLSA: phylogenetic reconstructions

MULTIPLE SEQUENCE ALIGNMENTS

Multilocus alignments sometimes give better resolution than the 16S rRNA gene

The 16S rRNA gene can have very low resolution

Jiménez et al., 2013, System Appl Microbiol, 36: 383- 391

Page 10

MULTIPLE SEQUENCE ALIGNMENTS (LARGE SETS)

r-MLST (ribosomal protein concatenates)

SpecI (single-copy marker genes)

r-MLST (ribosomal protein concatenates)

http://pubmlst.org/rmlst/

Jolley et al., 2012, Microbiology 158:1005-15

53 ribosomal protein genes (rps, rpl and rpm genes)

SpecI

(http://vm-lux.embl.de/~mende/specI/)

Mende et al., Nat Methods, in revision

Based on 40 universal, single-copy marker genes

Optimized cutoff (96.5% nucleotide identity)

MONOPHYLY: phylogenetic reconstructions (MLSA)
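The 96.5% cutoff is a threshold on pairwise nucleotide identity. A minimal Python sketch, assuming two sequences that are already aligned (for instance, concatenated marker genes) and that gapped columns are simply ignored; the gap handling and the use of ">=" at the boundary are assumptions of this sketch, not a description of how SpecI computes identity. The sequences are invented.

```python
SPECI_CUTOFF = 96.5  # percent nucleotide identity quoted on the slide


def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100 * sum(a == b for a, b in pairs) / len(pairs)


def same_cluster(seq_a, seq_b, cutoff=SPECI_CUTOFF):
    """True when two aligned marker sequences reach the identity cutoff."""
    return percent_identity(seq_a, seq_b) >= cutoff


if __name__ == "__main__":
    a = "ATGGCTAAGGTTCTG"
    b = "ATGGCGAAGGTTCTG"
    print(round(percent_identity(a, b), 1), same_cluster(a, b))
```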

Page 11

Losing identity due to HGT

Kunin et al. 2005. Genome Res. 15:954-959

Phylogenetic incongruences:

HGT blurs the assignment of identities

Massive HGT in the microbial world

No tree of life is possible

TWO SCHOOLS

Phylogenetic incongruences:

Can be explained by

► gene duplication (paralogy) and deletion (hidden paralogy)

► false orthology assignment

► alignment artifacts

Orthology should be carefully checked

Soria-Carrasco & Castresana, 2008. Mol. Biol. Evol. 25: 2319-2329

Kurland. 2005. Bioessays 27:741-747
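One common first check for orthology is the reciprocal best hit (RBH) criterion: gene a in genome A and gene b in genome B are called putative orthologs when each is the other's best-scoring match. A minimal Python sketch, assuming similarity scores (for instance BLAST bit scores) have already been computed in both directions; the gene identifiers and scores are hypothetical. RBH is not the method of the cited papers, and on its own it cannot detect hidden paralogy: if the true ortholog was deleted in one genome, a surviving paralog can still come out as a reciprocal best hit.

```python
def best_hits(scores):
    """Best-scoring subject for each query; scores: {(query, subject): score}.

    Ties keep the first pair encountered.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


if __name__ == "__main__":
    # Hypothetical scores: a3 hits b2, but b2's best hit back is a2.
    a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 240, ("a2", "b2"): 300, ("a3", "b2"): 120}
    b_vs_a = {("b1", "a1"): 248, ("b2", "a2"): 305, ("b2", "a3"): 118}
    print(sorted(reciprocal_best_hits(a_vs_b, b_vs_a)))
```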

Page 12

[Figure panels: pyrE, aroA]

Sometimes there is no other explanation (either true HGT or a lack of information)

Sometimes there is a loss of phylogenetic signal

Page 13

Tetranucleotide variation: 4^4 = 256

TETRA:

Genomes have a characteristic oligonucleotide usage (not yet understood; related to codon usage)

Similar genomes might have similar usage

ALIGNMENT-FREE PARAMETER

May be useful in deciding whether a group of strains deserves species status

Genome signatures

G+C content, dinucleotides ► not very informative

Codon usage ► trinucleotides ► more informative

Tetranucleotides (penta-, hexa-…) ► more information but more computing effort
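The 4^4 = 256 possible words give one 256-dimensional frequency vector per sequence, which can be compared without any alignment. A minimal Python sketch that counts overlapping tetranucleotides on one strand and compares two sequences by Pearson correlation. This is a simplified stand-in: as far as we know, the TETRA approach cited on the next slide works with z-scores of observed versus expected tetranucleotide counts, which this sketch does not compute. The contig sequences are made up.

```python
import math
from itertools import product

# All 4**4 = 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_frequencies(seq):
    """Relative frequencies of overlapping tetranucleotides (one strand only)."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # skip words with N or other ambiguity codes
            counts[word] += 1
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[w] / total for w in TETRAMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


if __name__ == "__main__":
    contig1 = "ATGCGCGATATCGCGCATATGCGC" * 20  # made-up sequences
    contig2 = "ATGCGCGATTTCGCGCATATGCGA" * 20
    r = pearson(tetra_frequencies(contig1), tetra_frequencies(contig2))
    print(f"tetranucleotide correlation: {r:.3f}")
```

A high correlation is consistent with, but does not prove, a common origin of the two fragments.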

Page 14

Contigs can be ordered by means of their tetranucleotide similarity

Teeling et al., 2004 Environ Microbiol. 6:938-947

A high regression (correlation) between two fragments may indicate a similar genomic oligonucleotide usage

They are probably fragments of the same organism