Microsatellites in Small Genomes Milo Thurston CEH Oxford

prokaryote microsatellites


Page 1: prokaryote microsatellites

Microsatellites in Small Genomes

Milo Thurston, CEH Oxford

Page 2: prokaryote microsatellites

Genomes

• What is the genetic capacity of an organism?

• What does an organism do with this genetic capacity?

• How fast does this genetic capacity change over time? How?

Page 3: prokaryote microsatellites

Microsatellites

a feature found across all genomes

(eukaryotes, bacteria, plasmids, viruses, organelles)

Create a “Microbial Genomes Microsatellite Database” inside the genomemine

Page 4: prokaryote microsatellites

ATCGATGCATCGATGCATATATATATATATATATATATATATATATATATATTGCCTGGTGCCTGG (AT)9

Microsatellites are hot-spots of mutation (short direct repeats of 1-6 bp)

Page 5: prokaryote microsatellites

ATCGATGCATCGATGCATATATATATATATATATATATATATATATATTGCCTGCCTGGTGG

ATCGATGCATCGATGCATATATATATATATATATATATATATATATATATATATATATATTGCCTGGTGCCTGG

ATCGATGCATCGATGCATATATATATATATATATATATATATATATATATATTGCCTGGTGCCTGG (AT)9

Microsatellites are hot-spots of mutation (short direct repeats of 1-6 bp)

(AT)8

(AT)9

(AT)11

ATCGATGCATCGATGCATATATATATATATATATATATATATATATATATATTGCCTGGTGCCTGG
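The alleles on this slide differ only in how many copies of the AT unit they carry. A minimal sketch of how such alleles could be compared is given below; the function name `longest_run` and the flanking sequences are illustrative assumptions, not part of the original talk.

```python
import re


def longest_run(seq: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `seq`."""
    # The lookahead lets the scan start at every position, so a run that
    # begins inside an earlier, shorter match is not missed.
    pattern = re.compile(f"(?=((?:{re.escape(motif)})+))")
    return max(
        (len(m.group(1)) // len(motif) for m in pattern.finditer(seq.upper())),
        default=0,
    )


def make_allele(n_units: int) -> str:
    """Build a toy allele: hypothetical flanks around n_units copies of AT."""
    return "GATGC" + "AT" * n_units + "TTGCC"


if __name__ == "__main__":
    for n in (8, 9, 11):
        print(f"(AT){n} allele -> longest AT run: {longest_run(make_allele(n), 'AT')}")
```

Counting repeat units rather than comparing whole sequences keeps the comparison independent of the flanking DNA, which is how alleles such as (AT)8, (AT)9 and (AT)11 are usually named.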

Page 6: prokaryote microsatellites

Why study microsatellites in ‘small’ genomes?

• Present in significant numbers, but not in all genomes
• Availability of Collections of Genomes
• Mutation & Selection
• Develop New Molecular Markers
• Insights into Gene and Genome Mutability
• Detect and Study Loci under Selection
• Experimental Systems

Microsatellites

• Molecular Markers in Eukaryotes
• Triplet-Repeat Expansion Diseases
• Contingency Loci in Pathogenic Prokaryotes

Page 7: prokaryote microsatellites

ATATATATATATATATAATATATATATATATATA

ATATATATATATATATATATATATATATATATATATATAT

ATATATATATATATATAATATATATATATATATA

Evolutionary Potential of Microsatellite Loci

• Reversible

• Rapid Rates: 10^-2 to 10^-5

Haemophilus influenzae

“Phase Variation”: Many pathogenic bacteria have the ability to rapidly switch the abundances and types of molecules on their cell surface.

lic1

Page 8: prokaryote microsatellites

ON - OFF Molecular Switches

(CAAT)40

(AT)8

(translational switch)

(transcriptional switch)

Page 9: prokaryote microsatellites

ON - OFF Molecular Switches

(CAAT)39

(AT)7

(translational switch)

(transcriptional switch)
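For the translational switch, the step from (CAAT)40 to (CAAT)39 matters because the repeat unit is 4 bp long, so gaining or losing one unit moves everything downstream into a different reading frame. The sketch below only illustrates that arithmetic; the function name is hypothetical, and the slides do not say which repeat number corresponds to ON.

```python
def reading_frame_offset(unit_len: int, n_units: int) -> int:
    """Frame offset (0, 1 or 2) contributed by a tract of n_units repeat units."""
    return (unit_len * n_units) % 3


if __name__ == "__main__":
    # Tetranucleotide tract, as in (CAAT)n: each unit change shifts the frame.
    for n in (38, 39, 40, 41):
        print(f"(CAAT){n}: tract length {4 * n} bp, frame offset {reading_frame_offset(4, n)}")
```

Because 4 and 3 share no common factor, the offset cycles through all three frames and returns to its starting value only after a change of three units, which is consistent with a reversible switch. A trinucleotide tract would never change the frame, whatever its length.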

Page 10: prokaryote microsatellites

Origin and Maintenance of Loci Involved in Antigenic Variation, Ecological Tradeoffs & ‘Mutational Phenotypes’

ATGCAATCAATCAATCAATCAATCAATCAA

TCAATCAATCAATCAATCAATCAACAATCAATCAATCAATCAATCAAATTGTAGGATTTGTTAAAACTTGCTACAAGCCTGAGGAAGTATTTCATTTTCTTCATCAGCATTCCATTCCTTTTTCCTCCATTGGAGGAATGACCAATCAAAATGTTCTACTTAATATTTCTGGAGTTAAGTTTGTATTACGGATCCCTAATGCCGTAAATTTATCACTTATAAATCGAGA........

Page 11: prokaryote microsatellites

Genome Sequencing aids in the discovery of microsatellite “molecular switches” in pathogenic bacteria (“Contingency Loci”; Moxon, Rainey, Nowak & Lenski, 1994)

Haemophilus influenzae (Hood et al., 1996)
Helicobacter pylori (Tomb et al., 1997; Alm et al., 1999)
Campylobacter jejuni (Parkhill et al., 2000)
Neisseria meningitidis (Tettelin et al., 2000; Parkhill et al., 2000)

Haemophilus influenzae

Functional Group                  Rd  Gene pool  Loci
Evasins                           0   0
LPS Biosynthesis                  4   5          lic1, lic2, lic3, lgtC, lex2
Adhesins                          3   4          hmw1, hmw2, yadA, hifA/hifB
Iron Acquisition                  4   7          hgpA-C, H10635, 0661, 0712, 1565
Restriction-Modification Systems  1   2          mod, hsd

Neisseria meningitidis

Functional Group                  NmB  NmA  Gene pool  Loci
Evasins                           2    1    2          siaD, porA
LPS Biosynthesis                  2    2    4          lgtA, lgtC, lgtD, lgtG
Adhesins                          10   7    13         pilC1, pilC2, pglA, opc, opaA-G, NMB1998, yadA
Iron Acquisition                  3    2    5          hpuA, hmbR, lgpA, frpB
Restriction-Modification Systems  4    2    4          NMB0831, 1223, 1375, 1261, NMA1040, 1467

Page 12: prokaryote microsatellites

genomic survey...

Page 13: prokaryote microsatellites

Genomes (1802) with (467) and without (1329) microsatellites

(Thresholds: mononucleotide > 13 bp; di- to hexanucleotides > 6 repeat units)
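The thresholds above can be turned into a simple scan. The following is a minimal sketch, not the method used for the survey: it reads "> 13 bp" as at least 14 identical bases and "> 6 repeat units" as at least 7 copies, and the function name and return format are assumptions made for illustration.

```python
import re


def find_microsatellites(seq, mono_min_bp=14, min_units=7):
    """Return sorted (start, motif, copies) tuples for repeats passing the thresholds."""
    seq = seq.upper()
    hits = []
    for k in range(1, 7):  # motif lengths: mono- to hexanucleotide
        min_copies = mono_min_bp if k == 1 else min_units
        # One captured motif of length k, followed by at least min_copies - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves built from a shorter unit (e.g. "ATAT").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GGC" + "A" * 14 + "CGG" + "AT" * 7 + "GCC" + "CAAT" * 6 + "G"
    for start, motif, copies in find_microsatellites(demo):
        print(f"position {start}: ({motif}){copies}")
```

In the demo sequence the (CAAT)6 tract is deliberately left below threshold, so only the mononucleotide and dinucleotide repeats are reported. A genome with an empty result would fall into the "without microsatellites" group under these assumptions.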

[Figure: Num. Microsatellites plotted against Log Genome Size and G+C Content]

Page 14: prokaryote microsatellites

Genomes with the most Microsatellites

Taxa           Species                                 Genome Size in kb  No.   Freq in bp
Mitochondrion  Saccharomyces cerevisiae                86                 171   502
Virus          Molluscum contagiosum virus subtype 1   190                79    2,409
Bacteria       Xanthomonas axonopodis pv. citri str    5,176              61    84,845
Nucleomorph    Guillardia theta                        174                44    3,958
Bacteria       Xanthomonas campestris pv. campestris   5,076              43    118,051
Virus          shrimp white spot syndrome virus        305                42    7,264
Chloroplast    Marchantia polymorpha                   121                40    3,026
Nucleomorph    Guillardia theta                        181                40    4,523
Virus          Gallid herpesvirus 3                    164                40    4,107
Virus          Spodoptera exigua nucleopolyhedrovirus  136                38    3,569
Bacteria       Helicobacter pylori 26695               1,668              37    45,077
Chloroplast    Chlorella vulgaris                      151                37    4,071
Bacteria       Xylella fastidiosa                      2,679              35    76,552
Bacteria       Helicobacter pylori J99                 1,644              34    48,348

• Repeats extremely common in some genomes
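The "Freq in bp" column appears to be the genome size divided by the number of microsatellites, that is, the average number of base pairs per microsatellite. This reading is an inference from the tabulated values rather than something the slides state. A small sketch under that assumption, with a hypothetical function name:

```python
def bp_per_microsatellite(genome_size_kb: float, n_microsatellites: int) -> float:
    """Average spacing in bp: genome size (given in kb) divided by microsatellite count."""
    if n_microsatellites <= 0:
        raise ValueError("need at least one microsatellite")
    return genome_size_kb * 1000 / n_microsatellites


if __name__ == "__main__":
    # Rows taken from the table: (species, genome size in kb, No., tabulated Freq in bp)
    rows = [
        ("Saccharomyces cerevisiae (mitochondrion)", 86, 171, 502),
        ("Xanthomonas axonopodis pv. citri str", 5176, 61, 84845),
    ]
    for name, size_kb, count, tabulated in rows:
        estimate = bp_per_microsatellite(size_kb, count)
        print(f"{name}: computed {estimate:.0f} bp, tabulated {tabulated} bp")
```

The computed values match the table only approximately, presumably because the table was calculated from exact genome lengths while the sizes shown are rounded to the nearest kb. A smaller value means microsatellites are more densely packed.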

Page 15: prokaryote microsatellites

Genomes with the lowest G+C content

Taxa           Species                                 Genome Size in kb  G+C   No.   Freq
Mitochondrion  Monosiga brevicollis                    77                 14    5     15,314
Mitochondrion  Apis mellifera ligustica                16                 15    8     2,043
Mitochondrion  Saccharomyces cerevisiae                86                 17    171   502
Mitochondrion  Drosophila melanogaster                 20                 17    24    813
Virus          Amsacta moorei entomopoxvirus           232                17    11    21,127
Mitochondrion  Pichia canadensis                       28                 18    31    893
Mitochondrion  Bombyx mori                             16                 18    11    1,422
Virus          Melanoplus sanguinipes entomopoxvirus   236                18    11    21,465
Mitochondrion  Bombyx mandarina                        16                 18    10    1,593
Mitochondrion  Schizosaccharomyces japonicus           80                 19    8     10,007
Mitochondrion  Ostrinia furnacalis                     15                 19    1     14,536
Mitochondrion  Ostrinia nubilalis                      15                 19    1     14,535
Mitochondrion  Saccharomyces castellii                 26                 20    18    1,431
Mitochondrion  Tetrahymena thermophila                 48                 20    11    4,325

• Repeats extremely common in some genomes
• G+C content is a factor (mutational pressure), but some extremely G+C skewed genomes lack large numbers of microsatellites (negative selection)

Page 16: prokaryote microsatellites

Genomes with the highest G+C content

Taxa           Species                                 Genome Size in kb  G+C   No.   Freq in bp
Virus          Bovine herpesvirus 1                    135                72    25    5,412
Virus          human herpesvirus 2                     155                70    26    5,952
Virus          human herpesvirus 1                     152                68    17    8,957
Bacteria       Caulobacter vibrioides                  4,017              67    29    138,515
Bacteria       Ralstonia solanacearum                  3,716              67    28    132,729
Bacteria       Deinococcus radiodurans                 2,649              67    3     882,879
Bacteria       Halobacterium sp. NRC-1                 2,014              67    3     671,413
Virus          Tupaia herpesvirus                      196                66    16    12,241
Bacteria       Pseudomonas aeruginosa                  6,264              66    5     1,252,881
Virus          Grapevine fleck virus                   8                  66    2     3,782
Bacteria       Xanthomonas campestris pv. campestris   5,076              65    43    118,051
Bacteria       Mycobacterium tuberculosis              4,412              65    2     2,205,765
Bacteria       Mycobacterium tuberculosis CDC1551      4,404              65    1     4,403,836
Bacteria       Xanthomonas axonopodis pv. citri str    5,176              64    61    84,845
Plasmid        Rhodococcus equi                        81                 64    1     80,609
Virus          Molluscum contagiosum virus subtype 1   190                63    79    2,409

Page 17: prokaryote microsatellites

Microsatellite Footprint Length versus Genome Size

Longest repeats are
• extremely long
• hexanucleotides in Herpes viruses, vertebrate mitochondrial genomes, VNTRs in pathogenic prokaryotes, contingency loci
• artefact (plasmid dinucleotide)
• viral virulence factor (mononucleotide)
• Include long polymorphic repeats in Baculoviruses (variety of repeats)
• largely unannotated

[Figure: two panels of Footprint Length plotted against Genome Size on a log scale, 10^3 to 10^7]

Page 18: prokaryote microsatellites

Microsatellites in Bacteria

H. influenzae

Neisseria x 2

Pathogenic E. coli

M. genitalium

C. jejuni

H. pylori x 2

VNTRs

Page 19: prokaryote microsatellites

Observations

• ‘Small genomes’ have significant numbers of long microsatellites

• A variety of factors, including genome size and G+C content, contribute to presence/absence

• Taxonomic differences (numbers, motif types, biological significance)

• Next step requires extensive curation (meta-data, genetic content, homology, literature on phenotype and mutability)

Page 20: prokaryote microsatellites

genomemine/genomebank

merging evolutionary and ecological meta-data with complete genome sequences

“key”=“value” information
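One way to picture the “key”=“value” idea is as a flat record per genome that can be parsed into a dictionary and queried. The sketch below is only an illustration: the on-disk format, the field names and the function `parse_record` are assumptions, and the values are borrowed from the Haemophilus influenzae Rd report shown later in the talk.

```python
import re

# Matches one "key"="value" pair; plain or curly double quotes are both accepted.
PAIR = re.compile(r'["“]([^"”]+)["”]\s*=\s*["“]([^"”]*)["”]')


def parse_record(text: str) -> dict:
    """Parse "key"="value" pairs (one or more per line) into a dictionary."""
    return {key: value for key, value in PAIR.findall(text)}


if __name__ == "__main__":
    # Hypothetical record layout; field names are illustrative only.
    raw = '''
    "genus"="Haemophilus"
    "species"="influenzae"
    "strain"="Rd"
    "genome_size"="1830138"
    "microsatellites"="14"
    '''
    record = parse_record(raw)
    print(record["genus"], record["species"], record["strain"])
    print("microsatellites:", int(record["microsatellites"]))
```

Keeping every item as a free-form pair means new meta-data fields can be added for one genome without changing a fixed schema for all the others, at the cost of having to convert values such as counts from text when they are used.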

Page 21: prokaryote microsatellites

Motivations

• Facilitate new computational studies
• Growth in the number of genomes
• Biological Patterns
• Biases
• Evaluate Prospective Data Sets
• Hypothesis Generation
• Inform ongoing computational/empirical studies

Page 22: prokaryote microsatellites

genomemine/genomebank

• Automated retrieval of genomes (bacteria, plasmids, viruses, and organelles)

• Meta-data collected from:
– NCBI genome annotations (Genome Size, G+C, Taxonomy, Nucleic Acid type, circular/linear, Number of Coding Regions, Percent Coding, A, C, T, G, A/C/T/G skew, etc.)
– Primary genome publications (‘Why sequenced’, number of chromosomes, publication date, number of ribosomal operons, tRNA genes, pseudogenes, megaplasmids, contingency loci, etc.)
– Ecological literature (habitat, extremophile?, host, carbon source, oxygen, shape, motile?, etc.)
– Analysis of meta-data (Description of Collections) (Total MB sequenced, Sub totals of any subset or taxonomic level, alphabetical order, publication order, ranks, etc.)
– Expert input on specific taxonomic groups (naming conventions, host range, variables not shared across genomes)
– Informatic Studies (modules) (microsatellites, low complexity regions, orphans, etc.)
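Several of the sequence-derived items in this list (G+C, base counts, skew) can be computed directly from a genome sequence. The sketch below assumes the common definitions GC skew = (G - C)/(G + C) and AT skew = (A - T)/(A + T); the slides do not define "A/C/T/G skew", so both the definitions and the function name are assumptions.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return base counts, G+C percentage and GC/AT skew for a DNA sequence."""
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    total = a + c + g + t  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {
        "A": a, "C": c, "G": g, "T": t,
        "gc_percent": 100 * (g + c) / total,
        "gc_skew": (g - c) / (g + c) if g + c else 0.0,
        "at_skew": (a - t) / (a + t) if a + t else 0.0,
    }


if __name__ == "__main__":
    stats = base_composition("ATCGATGCATCGATGCATATATATATAT")
    print(f"G+C: {stats['gc_percent']:.1f}%  GC skew: {stats['gc_skew']:.2f}")
```

Stored alongside the curated fields, values like these would allow the genome-level comparisons the talk describes, such as relating microsatellite numbers to G+C content.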

Page 23: prokaryote microsatellites

[Figure: Acquisition of Orphans as genomes are sequenced. Orphans plotted against Non-Orphans, with panels for Archaea and Eubacteria]

[Figure: Proportion Low Complexity and G+C Content. Percent Low Complexity (0-14) plotted against G+C Content (0-100)]

Finished ‘genomebank’ reports

Feature                         Info  Value

Taxonomy
genus                           ?     Haemophilus
species                         ?     influenzae
strain                          ?     Rd

Core Genome Features
Genome Size                     ?     1,830,138
ORFs                            ?     1709
G+C                             ?     39%
Orphans at Time of Publication  ?     389

Ecology
Primary Habitat                 ?     Human Host/Respiratory Tract
Interaction                     ?     Commensal with ability to cause disease

Computed Features
Microsatellites                 ?     14 (Link in MGMD)
Percent Low Complexity          ?     3.8

Future: Interactive Plotting

Page 24: prokaryote microsatellites

Ecology

Field: Habitat; Primary habitat; Extremophile?; Optimal Temperature; Optimal pH; Environmental Breadth; Trophic Status; Interaction; Obligate?; Guild; Oxygen; Energy; Carbon; Growth; Doubling Time in vitro; Morphology; Shape; Gram stain; Median width; Median length; Volume; Surface to volume ratio; Motile?

Phylogeny

“A Tree Viewer” (ATV)
Zmasek C. M. and Eddy S. R. (2001) ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics 17, 383-384.
http://www.genetics.wustl.edu/eddy/atv

Annotated Phylogenies

NCBI Taxonomy
16S ribosomal DNA
(proteome comparisons)

Page 25: prokaryote microsatellites

Summary

• Collections of Genomes present new opportunities
• Should merge genomes with evolutionary and ecological meta-data to put genomic information in an ‘organismal context’
• Biological patterns/rules (biases/artefacts) emerge (microsatellites)
• genomemine/genomebank: http://www.genomics.ceh.ac.uk/GMINE/

Page 26: prokaryote microsatellites

Acknowledgements

Dawn Field

Jennifer Hughes
Adrian Tett
Andrew Spiers
Sarah Turner
Mark Bailey

Ali Cody
Chris Bayliss
Derek Hood
Richard Moxon