Causes of insertion sequences abundance in prokaryotic genomes?
A problem of size
Marie Touchon
E.P.C Rocha
Atelier de BioInformatique, Université Pierre et Marie Curie, Paris
Unité Génétique des Génomes Bactériens, Institut Pasteur, Paris
IS elements :
the simplest form of transposable elements
- 700 to 2500 bp
- coding only the information allowing their mobility
ability to generate mutations :
- by inserting within genes
- by activating genes on insertion upstream
- by generating extensive DNA rearrangements
have been found to shuttle the transfer of adaptive traits such as :
- antibiotic resistance
- virulence
- new metabolic capabilities
Their exact nature is still debated : Selfish/Advantageous?
- genomic parasites
- beneficial agents
Causes of insertion sequences abundance in prokaryotic genome ?
Reasons largely unknown and widely speculated
Hypotheses :
- IS family specificity
- Genome size
- Frequency of horizontal gene transfer
- Pathogenicity
- Type of ecological associations
- Human sedentarisation
The current availability of hundreds of genomes makes many of these hypotheses testable.
IS elements Identification :
Problem : ISs annotations are heterogeneous, inaccurate or insufficient
Solution : Reannotation of ISs using comparative study
by adopting the nomenclature defined by Chandler (1998)
- ISs have one or two consecutive ORFs encoding transposase protein
- ISs are grouped into 21 distinct families
ISs Reannotation
Input : all annotated CDS of genome x
(1) ISs CDS detection, against the ISs database (Chandler et al.)
(2) IS elements reconstitution
- examples of detected CDS series (families IS1, IS3, IS21) : IS1A-IS21A-IS21B-IS1B ; IS1A-IS3A-IS3B-IS1A ; IS1A-IS1B
(3) ISs complete or partial
Partial elements :
- ISs fragments (> 20% length difference)
- ISs with internal insertion
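Step (3) can be sketched in a few lines of Python. This is only an illustration: the `ISElement` record, the `classify` helper and the example lengths are hypothetical, and the 20% threshold is assumed to apply to the length difference relative to a reference IS of the same family.

```python
from dataclasses import dataclass

# Assumption: the 20% threshold is the relative length difference between
# a detected element and a reference IS of the same family.
MAX_LENGTH_DIFF = 0.20


@dataclass
class ISElement:
    family: str                           # e.g. "IS1", "IS3", "IS21"
    length: int                           # detected element length (bp)
    reference_length: int                 # reference IS length (bp), hypothetical
    has_internal_insertion: bool = False  # another IS inserted inside this one


def classify(element: ISElement) -> str:
    """Return "complete" or "partial" for one reconstituted IS element."""
    diff = abs(element.length - element.reference_length) / element.reference_length
    if diff > MAX_LENGTH_DIFF or element.has_internal_insertion:
        return "partial"
    return "complete"


# Hypothetical elements: an intact copy, a fragment, and an interrupted copy.
elements = [
    ISElement("IS1", 768, 768),
    ISElement("IS3", 600, 1258),
    ISElement("IS1", 768, 768, has_internal_insertion=True),
]
labels = [classify(e) for e in elements]
```

Only the first element is labelled complete here; the other two fall into the two partial categories listed on the slide.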
ISs Reannotation - Reassessment
Annotated ISs CDS vs detected ISs CDS
[Figure: overlap between annotated and detected ISs CDS, with values 8823 (89%), 2115 (22%) and 1194 (11%)]
(1) 262 genomes
[Figure: number of detected ISs CDS (Y) against number of annotated ISs CDS (X); Shigella flexneri labelled]
Y = 0.77 (0.02) X + 5.86 (1.89)
R2 = 0.81 (P< 0.0001)
R = 0.95 (P< 0.0001)
(2) 8123 ISs elements
(3) 83% are complete (may be active)
Only 20% (1994) of GenBank ISs had a consistent classification
Distribution of ISs in 262 genomes
[Figure: number of genomes against number of ISs; labelled genomes listed below]
- Shigella sonnei (γ-proteobacteria)
- Bordetella pertussis (β-proteobacteria)
- Sulfolobus solfataricus (archaebacteria)
- Bacillus halodurans (firmicute)
- Nitrobacter winogradskyi (α-proteobacteria)
The absence of ISs is not anecdotal
- 24% of genomes lack ISs
- 48% of genomes have 0-10 ISs
High variability :
- of the number of ISs / Genome
- of the number of ISs families / Genome
[Figure: number of ISs against number of ISs families]
Association with phylogenetic inertia
Rapid dynamics of gain and loss
The number of ISs evolves so fast that there is no historical correlation
The effect of IS family specificity
Incongruent phylogenetic trees
High diversity of ISs found within strains or closely related species
[Figure: phylogenetic trees; labels: Firmicute, Proteo, Proteo Entero; scale marks 100% and 90%]
The effect of IS family specificity : Examples
- Pseudomonas syringae tomato : 10 IS3, 42 IS5, 23 IS21, 40 IS66, 10 IS1111, 13 ISNCY, 1 IS91 = 139 ISs
- Pseudomonas syringae syringae : 14 IS3, 1 IS5, 1 IS66, 1 IS110, 1 IS630 = 18 ISs
- Pseudomonas syringae pv. phaseolicola : 7 IS3, 43 IS5, 7 IS21, 2 IS66, 1 IS1111, 1 ISNCY, 3 IS91, 52 IS256 = 116 ISs
This effect is unlikely to explain the variability of ISs
The effect of genome size
Wilcoxon test : p<0.0001 Spearman’s r=0.63, p<0.0001
Strong association between Genome size and IS number (and density)
The larger the genome, the more IS elements it contains
N = 64 and N = 198 genomes in the two groups compared
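For readers unfamiliar with Spearman's rank correlation, the statistic used for the genome size association can be sketched as below. This is a minimal pure-Python illustration, not the analysis behind the slide; the genome sizes and IS counts are made-up values, and p-values are not computed.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example data (not from the 262 genomes of the study).
genome_size_mb = [0.6, 1.8, 2.9, 4.6, 5.5, 6.4]
is_count = [0, 0, 12, 5, 40, 130]
rho = spearman(genome_size_mb, is_count)
```

Because the statistic only uses ranks, a few genomes with very large IS counts do not dominate the result, which suits a distribution as skewed as the one shown earlier.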
The effect of horizontal gene transfer
[Figure: lists of orthologs between strains A, B and C; strain A specific region; example genome E. coli O157:H7 Sakai]
Putative orthologs : reciprocal best hits, proteins with >90% similarity and <20% length difference.
Strain specific region : region exclusive to one strain, containing at least ten consecutive genes without an ortholog.
Prophage-Database (Nestle, Casjens, 2003)
HGT-Database (Garcia-Vallve, 2003)
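The two definitions above translate into a short sketch. The function names are hypothetical, and one detail is an assumption not stated on the slide: the length difference is taken relative to the longer of the two proteins.

```python
MIN_RUN = 10  # at least ten consecutive genes without an ortholog


def is_putative_ortholog(similarity, len_a, len_b, reciprocal_best_hit):
    """Reciprocal best hit with >90% similarity and <20% length difference.

    Assumption: length difference is measured relative to the longer protein.
    """
    length_diff = abs(len_a - len_b) / max(len_a, len_b)
    return reciprocal_best_hit and similarity > 0.90 and length_diff < 0.20


def strain_specific_regions(has_ortholog, min_run=MIN_RUN):
    """Return (start, end) index pairs, end exclusive, for runs of at least
    min_run consecutive genes lacking an ortholog. Genes are in genome order."""
    regions = []
    run_start = None
    for i, flag in enumerate(has_ortholog):
        if not flag and run_start is None:
            run_start = i                      # a run of orphan genes begins
        elif flag and run_start is not None:
            if i - run_start >= min_run:
                regions.append((run_start, i))
            run_start = None                   # the run ends at an ortholog
    if run_start is not None and len(has_ortholog) - run_start >= min_run:
        regions.append((run_start, len(has_ortholog)))
    return regions


# Hypothetical genome: 5 shared genes, 12 orphans, 3 shared, 4 orphans, 2 shared.
flags = [True] * 5 + [False] * 12 + [True] * 3 + [False] * 4 + [True] * 2
regions = strain_specific_regions(flags)
```

In this toy genome only the 12-gene run qualifies; the 4-gene run is too short to count as a strain specific region.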
The effect of horizontal gene transfer
Wilcoxon test : p<0.0001
Figure values : 5.2% ; 11.4%
t-test : p<0.001
ISs are ~4 times more concentrated in HGT regions
Genomes lacking ISs have fewer HGT
Spearman’s r=0.31, p>0.1 (NS)
HGT may be a determinant of the presence of ISs, but not of their abundance
Spearman’s r=0.84, p<0.0001
The effect of horizontal gene transfer
HGT is a necessary but not sufficient condition for the presence of ISs
The intensity of HGT is not a significant determinant of IS abundance
IS family diversity in HGT regions is almost as high as in the entire genome
The effect of pathogenicity
Yersinia pestis (plague)
Shigella flexneri, sonnei (dysentery)
Bordetella pertussis (whooping cough)
Figure values : 4.3 ; 3.6
N = 100 ; 153
Wilcoxon test : p>0.5
No association between the presence of IS and pathogenicity
The effect of the type of ecological association
IS=0 : 8% ; 17% ; 55% ; 100%
Wilcoxon test : p<0.001
Strong association between the frequency of IS and the facultative character of the ecological associations
Stepwise multiple regression (response : number of ISs)
Covariate                 Cumulative R2
Genome size               0.40
Ecological association    0.47
Frequency of HGT          0.47
Genome size is the most important variable
Kruskal-Wallis test : p>0.5 (NS)
We removed genomes lacking IS (possibly under sexual isolation)
Lifestyle is a non-significant determinant
The effect of human sedentarisation (Mira et al.,2006)
1) Genomes with many ISs are from prokaryotes associated with humans or domesticated animals and plants.
2) Large intra-genomic IS expansions are recent.
Kruskal-Wallis test : p>0.5 (NS)
[Figure categories: not ; directly ; indirectly]
No evidence that man-related prokaryotes have more ISs.
Genome size explains ~40% of the variance in IS abundance
The smaller the genome, the lower the number and also the density of ISs
- Selection could favor small genomes : optimal use of resources; the replication time (an increase in genome size caused by IS could be counter-selected)
- ISs are selected to generate genetic variation : (such selection should be stronger in larger genomes)
Genomes with fewer ISs correspond to the slowest-growing prokaryotes
Wilcoxon test : p<0.05
[Figure: density of ISs (/Mb) for fast vs slow growth]
One explanation fits the available data well
- transposition inactivates genes with high probability
- the total number of essential genes : ~300
- + 200-300 genes are nearly ubiquitous
- i.e. ~500 nearly essential genes
The abundance of IS elements in genomes could be mostly a question of space for transposition events that are not highly deleterious
- Selection against transposition in genomes with higher density of deleterious transposition targets
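The space argument can be made concrete with a little arithmetic. The sketch below takes the ~500 nearly essential genes from the slide and asks what fraction of all genes they represent in genomes of different sizes; the total gene counts are hypothetical, and treating every gene as an equally likely target is a simplifying assumption.

```python
NEARLY_ESSENTIAL_GENES = 500  # ~300 essential + ~200 nearly ubiquitous genes


def deleterious_target_fraction(total_genes, nearly_essential=NEARLY_ESSENTIAL_GENES):
    """Fraction of genes whose inactivation is assumed to be highly deleterious.

    Assumption: every gene is an equally likely transposition target.
    """
    return min(1.0, nearly_essential / total_genes)


# Hypothetical gene counts for a small, a medium and a large genome.
fractions = {total: deleterious_target_fraction(total) for total in (600, 2000, 5000)}
```

Under these assumptions a transposition into a gene is far more likely to hit a nearly essential gene in the smallest genome than in the largest, which is the direction of selection the slide proposes.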
Conclusions
High diversity of ISs found within strains or closely related species
The number of ISs evolves so fast that there is no historical correlation
HGT may be a determinant of the presence of ISs, but not of their abundance
Surprisingly, genome size alone is the best predictor of IS number and density
Selection against transposition in genomes with higher density of deleterious transposition targets
Impacts of IS abundance?
IS expansion :
- increases the rate of genome rearrangements
- increases the number of pseudogenes
[Figure: genome comparison, Bordetella bronchiseptica against Bordetella parapertussis]
[Figure: % of breakpoints coinciding with an IS, observed and expected, against number of ISs]
[Figure: O/E of R gene/intergene against number of ISs]
Acknowledgements
E.P.C Rocha
A. Danchin
Institut Pasteur
La Région Ile de France
Examples
- Nitrobacter winogradskyi : 37 IS3, 32 IS5, 27 IS630, 2 IS21, 14 IS481, 4 ISNCY = 117 ISs
- Shigella sonnei : 107 IS3, 157 IS1, 16 IS630, 33 IS4, 25 IS21, 1 IS66, 1 IS91, 18 IS110, 3 IS605, 3 IS1111, 4 ISAs1, 2 ISNCY = 372 ISs
- Pseudomonas syringae syringae : 14 IS3, 1 IS5, 1 IS630, 1 IS66, 1 IS110 = 18 ISs
Association with stability ?
Large repeats decrease genome stability (Rocha, Trends Genetics, 03)
[Figure: stability against density of repeats]
But not ISs elements ?
[Figure: stability against number of ISs]
The number of ISs evolves so fast that there is no historical correlation
Association with phylogenetic inertia ?
[Figure: lineage diagram with events +IS acquisition, +IS expansion, -IS deletion, lineage loss]
Two scenarios :
- genomic parasites
- beneficial agents
Association with lifestyle ?
Species                      Number of ISs   Lifestyle
Burkholderia pseudomallei    36              Facultative pathogen
Burkholderia mallei          152             Obligate pathogen
Escherichia coli K12         52              Commensal
Shigella flexneri            298             Obligate pathogen
Bordetella bronchiseptica    2               Facultative pathogen
Bordetella pertussis         247             Obligate pathogen
Link with lifestyle : host restriction, niche change, ...
Association with recent rearrangements ?
[Figure: genome comparison plots; genomes shown: Yersinia pseudotuberculosis, Yersinia pestis, Bordetella bronchiseptica, Bordetella parapertussis, B. pertussis, E. coli K12, S. enterica Typhimurium, S. enterica serovar Typhi, Shigella flexneri; panel labels: 99% similarity (four panels), 90% similarity (one panel)]
IS expansion promoted frequent genomic rearrangements
[Figure: % of breakpoints coinciding with an IS, observed and expected, against number of ISs]
IS expansion increases the rate of genome rearrangements
[Figure: two genomes with 32 ISs and 247 ISs; schema of genomes A and B with orthologs Or1/Or1' and Or2/Or2', showing an IS between them in an intergenic region]
[Figure labels: number of ISs in genes; number of ISs in intergenes]
Association with pseudogenes ?
IS expansion increases the number of pseudogenes
R pseudo = (Number of ISs in genes) / (Number of ISs in intergenes)
[Figure: O/E of R pseudo against number of ISs]
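A small sketch of the R pseudo ratio and its observed/expected (O/E) version follows. The slides do not say how the expected value was obtained, so the expectation used here (insertions landing uniformly at random, in proportion to genic and intergenic length) is purely an assumption, and all numbers are hypothetical.

```python
def r_pseudo(is_in_genes, is_in_intergenes):
    """Observed ratio of ISs found in genes to ISs found in intergenic regions."""
    return is_in_genes / is_in_intergenes


def expected_r_pseudo(genic_bp, intergenic_bp):
    """Expected ratio under uniform random insertion (assumption, not from the slides)."""
    return genic_bp / intergenic_bp


def observed_over_expected(is_in_genes, is_in_intergenes, genic_bp, intergenic_bp):
    """O/E of R pseudo; values below 1 mean fewer ISs in genes than expected."""
    return r_pseudo(is_in_genes, is_in_intergenes) / expected_r_pseudo(genic_bp, intergenic_bp)


# Hypothetical genome: 30 ISs in genes, 20 in intergenic regions,
# 4.0 Mb of genic sequence and 0.5 Mb of intergenic sequence.
oe = observed_over_expected(is_in_genes=30, is_in_intergenes=20,
                            genic_bp=4_000_000, intergenic_bp=500_000)
```

With these made-up numbers the observed ratio (1.5) is well below the assumed expectation (8), so O/E is below 1; a rise of O/E with IS number would match the pseudogene claim on the slide.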
High variability :
- of the number of ISs / Genome
- of the number of ISs families / Genome
- of the number of ISs copies / Family
ISs have been recently acquired (HGT)
IS expansion :
- is associated with lifestyle/niche change
- increases the rate of genome rearrangements
- increases the number of pseudogenes
Conclusions
ISs are frequent but not ubiquitous
IS numbers and families vary a lot
Lack of association between stability and the number of ISs
The presence of ISs is associated with lifestyle
beneficial agents
IS expansion increases the rate of genome rearrangements
IS expansion increases the number of pseudogenes
genomic parasites
How many IS ?
High variability :
- of the number of ISs / Genome
- of the number of ISs families / Genome
- of the number of ISs / Family
[Figure: number of genomes against number of ISs]
[Figure: number of genomes against number of ISs families]
[Figure: log(number of ISs / genome) per ISs family]
[Figure: number of ISs against number of ISs families; labelled genomes: B. pertussis, S. sonnei, S. flexneri]
Figure labels (copies : family) :
- 112-108 : IS1 ; 126-124 : IS3 ; 34-22 : IS4
- 157 : IS1 ; 106 : IS3 ; 33 : IS4 ; 25 : IS21
- 16 : IS110 ; 229 : IS481
Hypothesis I
IS induce short spikes of instability which are averaged out in a deep phylogenetic analysis
Hypothesis II
Invasions of highly replicative IS lead to deleterious instability and lineage loss