L E T T E R S
Genomic islands, such as pathogenicity islands, contribute to the evolution and diversification of microbial life1. Here we report on the Widespread Colonization Island, which encompasses the tad (tight adherence) locus for colonization of surfaces and biofilm formation by the human pathogen Actinobacillus actinomycetemcomitans. At least 12 of the 14 genes at the tad locus are required for tenacious biofilm formation and synthesis of bundled Flp pili (fibrils) that mediate adherence. The pilin subunit2, Flp1, remains inside the cell in tad-locus mutants, indicating that these genes encode a secretion system for export and assembly of fibrils. We found tad-related regions in a wide variety of Bacterial and Archaeal species3, and their sequence characteristics indicate possible horizontal transfer. To test the hypothesis of horizontal transfer, we compared the phylogeny of the tad locus to a robust organismal phylogeny using statistical tests of congruence and tree reconciliation techniques. Our analysis strongly supports a complex history of gene shuffling by recombination and multiple horizontal transfers, duplications and losses. We present evidence for a specific horizontal transfer event leading to the establishment of this region as a determinant of disease.
Fresh clinical isolates of γ-proteobacterium A. actinomycetemcomitans adhere tightly to surfaces to form biofilms that are extremely difficult to dislodge. This property is required for colonization and persistence in the oral cavity and initiation of periodontal disease (ref. 4 and H.C. Schreiner et al., manuscript submitted). Genes from the tad locus (Fig. 1) including pilin-encoding gene2,5 flp-1 and tadABCDEFG6 are required for tight adherence and the associated phenotypes of autoaggregation, rough colony morphology and fibril production. Similar tad loci have recently been implicated in human and animal diseases caused by Pasteurella multocida7 and Haemophilus ducreyi8, which, like A. actinomycetemcomitans, are Pasteurellaceae family members. tad-related loci are found in other human pathogens, including Yersinia spp., Bordetella pertussis, Pseudomonas aeruginosa, Corynebacterium diphtheriae and Mycobacterium tuberculosis, indicating a possible role in colonization and pathogenesis in these bacteria3. A very similar region in the
1Department of Microbiology, College of Physicians & Surgeons, Columbia University, 701 West 168th Street, New York, New York 10032, USA. 2Department of Oral Biology, University of Medicine and Dentistry of New Jersey; Newark, New Jersey 07103, USA. 3Molecular Biology Laboratory, American Museum of Natural History, Central Park West at 79th Street, New York, New York 10024, USA. Correspondence should be addressed to D.H.F. ([email protected]).
The Widespread Colonization Island of Actinobacillus actinomycetemcomitans
Paul J Planet1, Scott C Kachlany1, Daniel H Fine2, Rob DeSalle3 & David H Figurski1
NATURE GENETICS VOLUME 34 | NUMBER 2 | JUNE 2003 193
[Figure 1 image. Panel a, gene order of the tad locus: flp-1, flp-2, tadV, rcpC, rcpA, rcpB, tadZ, tadA, tadB, tadC, tadD, tadE, tadF, tadG (scale bar, 2 kb). Panel b, G+C percentage profiles of tad-related regions and flanking regions for Aa, Pm, Hd, Yp, Ye, Cl, Ct, Cc and Pa.]
Figure 1 tad locus. (a) Schematic of the tad locus of A. actinomycetemcomitans. (b) G+C percentages in tad-related regions of different organisms compared to those of equal-sized flanking regions and to referenced G+C percentages for the entire genome. Aa, A. actinomycetemcomitans; Pm, Pasteurella multocida; Hd, Haemophilus ducreyi; Yp, Yersinia pestis; Ye, Yersinia enterocolitica; Cl, Chlorobium limicola; Ct, Chlorobium tepidum; Cc, Caulobacter crescentus CB15; Pa, Pseudomonas aeruginosa. For each species, the middle portion represents the tad locus (defined as the contiguous area containing ORFs similar to ORFs from the A. actinomycetemcomitans tad locus). The vertical displacement of the line is proportional to the percentage of G+C over that region. Approximations of G+C percentage for the overall genome are as reported30. Dots indicate positions of putative uptake sequences11,16 (5′-aAAGTGCGgtc-3′), which were found by allowing two of the four lower-case positions in the canonical sequence to vary. A paucity of uptake sequences was also found in H. ducreyi under less stringent sequence searching conditions (data not shown).
©2003 Nature Publishing Group http://www.nature.com/naturegenetics
nonpathogenic, α-proteobacterium Caulobacter crescentus is required for production of pili9 and possibly involved in colonization of surfaces10. The presence of similar loci in other nonpathogenic species, such as the Chlorobiaceae, argues for a role in colonization of environmental niches3.
To test if other genes in the tad locus are required for adherence, we used transposon IS903φkan11 to isolate nonadherent mutants of A. actinomycetemcomitans strain CU1000N6. Screening by PCR for insertions between flp-1 and tadA identified mutations in rcpA, rcpC and tadZ. The product of rcpA is expressed only in rough, adherent strains12. The functions of the products of rcpC and tadZ are unknown.
All mutants were defective in rough colony formation, autoaggregation, adherence and production of Flp pili (Fig. 2a,b). These mutations are not polar on downstream tad genes, as they were complemented by the wild-type gene on an IncQ plasmid (Fig. 2a,b). No insertions were detected in flp-2, tadV or rcpB, but tadV encodes a putative prepilin peptidase, and its homolog in C. crescentus (cpaA) is required for pilus production9. The functions of the products of a putative pilin gene flp-2 and rcpB are unknown. Taken together, the results indicate that at least 12 of 14 genes of the tad locus are required for adherence.
Several genes in the tad locus (flp-1, flp-2, tadV, rcpA, tadA, tadB and tadC) are putative homologs of genes encoding bacterial type II secretion12,13, biogenesis of type IV pili and archaeal flagella5,9,14. Although TadA, an ATP hydrolase15, is related to ATPases of pilus assembly and type II secretion systems (for example, PulE, Klebsiella spp.), it is more closely related to those of type IV secretion systems13 (for example, VirB11, Agrobacterium tumefaciens), suggesting gene shuffling between types of secretion systems. No matches for rcpC, rcpB, tadZ, tadD, tadE, tadF or tadG occur in known secretion systems. To test if the tad locus encodes a system for secretion and assembly of fibrils, we compared the localization
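[Figure 2 image. Panels a, b and c. Panel c lanes: wild-type, flp1–, tadA–, tadC–, tadD–, tadF–, tadG–; rows: whole cells (Coomassie and western) and supernatant (western); molecular mass markers at 46, 30, 21, 14 and 6.5 kDa.]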
Figure 2 Phenotypes of tad mutants. (a) Adherence phenotype. Shown is the tadZ1194 mutant with the IncQ plasmid pJAK16 (ref. 11; left) and with the wild-type tadZ gene expressed from the tacp promoter on pJAK16 (right). The complemented mutant adheres to the bottom of the well. Similar results were obtained for mutations in rcpA and rcpC. (b) Fibril production. Electron micrographs of the tadZ1194 mutant with pJAK16 (left) and with the complementing tadZ+ pJAK16 plasmid (right). Similar results were obtained for mutations in rcpA and rcpC. Samples were negatively stained and viewed as previously described6. (c) Localization of Flp1-T7 in wild type and tad mutant strains. Protein from whole cells (top two panels) or supernatant fractions (bottom panel) was separated by SDS–PAGE. Total protein was stained with Coomassie blue (top panel) and Flp1-T7 was detected by western blotting using monoclonal antibody against T7–Tag (Novagen; bottom two panels). Molecular mass markers, in kDa, are noted on the left. Dot blots probed with monoclonal antibody against T7–TAG with supernatant fractions concentrated 6× yielded similar results (see URL).
[Figure 3 image. Matrix of pairwise ILD comparisons. Color key: Incongruent (P < 0.01); Borderline (P < 0.1); Congruent; Not applicable. Rows and columns: rcpC, rcpA, tadV, tadA, tadC+, tadB+, tadE, tadF, tadD, cpaD, tadZ, tadB–, tadC–, tadG.]
Figure 3 Pairwise ILD analysis and partitions. Shown is a matrix of pairwise ILD comparisons used to define incongruent partitions. Red squares show highly significant phylogenetic incongruence between several genes in the tad locus, which is best explained by past recombination events in the locus. A mostly white matrix would signal that there was little evidence of shuffling of genes between tad regions. We used this analysis to partition the tad locus into sets of genes with very different evolutionary histories. We grouped genes together in the same partition when they were either congruent (white) or borderline (pink) with every other gene in the partition. Because the incongruencies were not symmetric, several different partitions were possible. We chose the one that had the highest number of contiguous genes, which included cpaD, rcpC, rcpB, tadZ, tadA, tadB–, tadB+, tadC–, tadC+, tadD and tadF. tadE and tadG were also combined. All of the tentative partitions were tested against each other in succession. Partitions were combined if subsequent ILD tests suggested congruence. In this way, tadV was combined with the largest partition to form P1, but rcpA (P2) and the combination of tadE and tadG (P3) remained independent and incongruent. flp genes were excluded from this analysis because their small size limited possible statistical support and their complex history of duplication and loss made false orthology statements likely5.
[Figure 4 image. Six tanglegrams, each pairing the organismal tree (left) with a tad locus partition tree (right). Panels: P1:MP (ILD = 924; P < 0.01), P2:MP (ILD = 223; P < 0.01), P3:MP (ILD = 15; P < 0.01), P1:ML, P2:ML and P3:ML. Branch labels give the support indices defined in the legend.]
Figure 4 Phylogenies and tanglegrams. Shown are the optimal phylogenetic trees inferred from both maximum parsimony (MP) and maximum likelihood (ML) methods in six associated tree networks called tanglegrams. In each tanglegram, the organismal tree is on the left and the tree representing a congruent partition of genes in the tad locus is on the right. Red lines connect tad loci to the organisms in which they are found. The tad locus partitions are as follows: P1: rcpBC, cpaD, tadVZABCDF; P2: rcpA; P3: tadEG. The organismal phylogeny is derived from a data set composed of homologs of 44 translational genes (eftu, if-1, if-3, npt, rba, rf-1, rf-2, rfr, rpl1, rpl2, rpl3, rpl4, rpl6, rpl10, rpl11, rpl13, rpl14, rpl16, rpl17, rpl19, rpl20, rpl21, rpl22, rpl23, rpl27, rpl34, rps2, rps3, rps4, rps6, rps7, rps8, rps9, rps11, rps12, rps13, rps15, rps17, rps18, rps19, rps20, sp2, trmD and truA). P1 is rooted in the same branch that defines the bacterial tadA subfamily in the type II/IV NTPase superfamily phylogeny13 (P.H.P., R.D. and D.H. Figurski, unpublished data). P2 is rooted using the secretin family tree (P.H.P., R.D. and D.H. Figurski, unpublished data). P3 is rooted to correspond with trees from P1 and P2. Aa, Cc, Ct, Pa, Pm and Yp are the same designations used in Figure 1. Af, Acidithiobacillus ferrooxidans; At, Agrobacterium tumefaciens str. C58; Bc, Burkholderia cepacia; Bp, Bordetella pertussis; Cd, Corynebacterium diphtheriae; Ec, Escherichia coli; Ml, Mesorhizobium loti; Mt, Mycobacterium tuberculosis H37Rv; Ng, Neisseria gonorrhoeae; Nm, N. meningitidis; Pg, Porphyromonas gingivalis; Rs, Ralstonia solanacearum; Sc, Streptomyces coelicolor A3(2); Se, Salmonella enterica; Sm, Sinorhizobium meliloti; Vc, Vibrio cholerae. Numbers with species designations refer to independent loci in the same organism. Numbers on or near branches (some with arrows) indicate support indices from different methods in the following order: Bremer decay index/parsimony bootstrap percentage/Bayesian credibility value/neighbor-joining bootstrap percentage/maximum likelihood quartet-puzzling support/maximum likelihood bootstrap. Dots indicate values over 80% (for actual numbers, see URL). N indicates that the node did not exist in the optimal support tree. Nodes that are incongruent with the organismal tree are designated with a red dot. ILD scores, which represent the number of extra steps required when the organismal phylogeny topology is imposed on each tad locus partition, are shown along with P values.
Another potential confounding factor of phylogenetic inference is that recombination events may have occurred between tad regions. Recombination events can be detected reliably using incongruence length difference20,21 (ILD), which provides a relatively conservative, albeit not infallible22, statistical test for differing phylogenetic histories. Pairwise ILD comparisons for all genes in the tad region showed strong evidence for recombination (Fig. 3). Therefore, we partitioned the genes into three concatenated alignments of phylogenetically congruent genes (Fig. 3). ILD analysis showed that the partitioned data sets were statistically incongruent with each other and with the organismal data set (P < 0.01).
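The ILD statistic itself can be illustrated with a minimal Python sketch. It assumes the standard definition, ILD = L(A+B) − (L(A) + L(B)), where L is the length of the most parsimonious tree for a data set. To stay self-contained it handles only four taxa, so the three possible unrooted topologies can be enumerated exhaustively, and it omits the randomization step that yields the P value. The function names and toy alignments are hypothetical and are not taken from the analysis reported here, which used the partition homogeneity function in PAUP.

```python
def fitch_length(tree, column):
    """Fitch parsimony steps for one alignment column.

    tree: nested 2-tuples with taxon names as leaves (arbitrarily rooted).
    column: dict mapping taxon name -> character state.
    """
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):
            return {column[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common
        steps += 1  # no shared state: one change is required here
        return left | right

    visit(tree)
    return steps


def tree_length(tree, alignment):
    """Total parsimony length of an alignment (dict taxon -> sequence)."""
    taxa = list(alignment)
    n_sites = len(alignment[taxa[0]])
    return sum(
        fitch_length(tree, {t: alignment[t][i] for t in taxa})
        for i in range(n_sites)
    )


def four_taxon_trees(a, b, c, d):
    """The three unrooted topologies for four taxa."""
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def min_length(alignment):
    """Length of the most parsimonious four-taxon tree."""
    return min(tree_length(t, alignment) for t in four_taxon_trees(*sorted(alignment)))


def ild(part_a, part_b):
    """Extra steps incurred when two partitions must share a single tree."""
    combined = {t: part_a[t] + part_b[t] for t in part_a}
    return min_length(combined) - (min_length(part_a) + min_length(part_b))


# Toy partitions that favour conflicting groupings (AB|CD versus AC|BD).
part_a = {"A": "AA", "B": "AA", "C": "TT", "D": "TT"}
part_b = {"A": "GG", "B": "CC", "C": "GG", "D": "CC"}
conflict = ild(part_a, part_b)
agreement = ild(part_a, part_a)
```

A score of zero means the partitions lose nothing by sharing a tree; larger scores indicate conflicting signal of the kind attributed here to recombination.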
Phylogenies of the three partitions were inferred using parsimony, Bayesian, maximum likelihood and neighbor-joining algorithms. Trees from each partition differed markedly in topology from the organismal tree (Fig. 4), indicating very different evolutionary histories.
Tree reconciliation, a technique used for reconstructing co-evolution between parasites and hosts or genes and organisms23, searches for parsimonious historical scenarios that may have led to topological incongruence between phylogenies. Tree reconciliation distinguishes between gene duplication and horizontal transfer as explanations for incongruence. Using TreeMap2 (ref. 24), we reconciled the most parsimonious trees from each of the three partitions with the organismal tree (Fig. 5). Our reconciled scenarios show the minimum total number of non-vertical (ad hoc) inheritance events (that is, gene losses, duplications and horizontal transfers) found in our analysis. Other models, currently less parsimonious, may emerge as more plausible explanations as more
of epitope-tagged Flp1 (Flp1-T7) in wild-type and mutant strains (Fig. 2). Flp1-T7 was found in the supernatants of both wild-type CU1000N and the flp-1 mutant, as expected. In contrast, no Flp1-T7 was present in the supernatants of the tad mutants, confirming that Flp1 is a substrate for secretion by the products of the tad locus.
Genomic islands can be identified by the presence of genes or sequences that suggest past or potential movement between chromosomes1. Some tad loci bear characteristics indicative of a foreign origin. tad loci in Pasteurellaceae and Yersinia species have atypically low G+C contents, and A. actinomycetemcomitans and P. multocida tad loci lack characteristic DNA uptake sequences for natural transformation that occur in these chromosomes approximately once per kilobase11,16 (Fig. 1).
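Both sequence signatures are simple to compute. The following Python sketch shows one way to obtain the G+C percentage of a region and to scan for the uptake sequence 5′-aAAGTGCGgtc-3′ while letting at most two of the four lower-case positions vary, the rule stated in the Figure 1 legend. The function names, the forward-strand-only scan and the test strings are illustrative assumptions rather than the procedure actually used.

```python
CANONICAL = "aAAGTGCGgtc"  # upper case = fixed positions, lower case = may vary


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA string."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def find_uptake_sites(seq: str, pattern: str = CANONICAL, max_var: int = 2) -> list[int]:
    """Start positions (0-based, forward strand only) of pattern matches.

    Upper-case pattern positions must match exactly; at most `max_var`
    of the lower-case positions may differ from the canonical base.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - len(pattern) + 1):
        window = seq[start:start + len(pattern)]
        variable_mismatches = 0
        fixed_ok = True
        for base, ref in zip(window, pattern):
            if base == ref.upper():
                continue
            if ref.isupper():
                fixed_ok = False  # mismatch at a fixed position
                break
            variable_mismatches += 1
        if fixed_ok and variable_mismatches <= max_var:
            hits.append(start)
    return hits


exact = find_uptake_sites("TTTAAAGTGCGGTCTTT")   # canonical site at offset 3
two_changes = find_uptake_sites("TAAGTGCGGTA")   # first and last positions vary
three_changes = find_uptake_sites("TAAGTGCGATA") # too many variable mismatches
```

Dividing the number of hits by the region length in kilobases gives a density that can be set against the roughly one site per kilobase quoted for these chromosomes; a complete scan would also search the reverse complement.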
We define genomic islands as clusters of genes with a common evolutionary history that is distinct and divergent from the histories of the organisms in which they reside. They can be considered distinct biological entities (for example, parasites) living in their host genomes, and techniques for inferring evolutionary histories of hosts and parasites can be applied to test specific horizontal transfer, or host switching, events.
Rampant horizontal transfer may obscure underlying microbial phylogeny17. Thus, we used a concatenated data set of genes involved in mRNA translation previously shown to have little evidence of horizontal transfer18,19 to provide a reasonable representation of the organismal, or host, phylogeny. We used parsimony-based, Bayesian, maximum likelihood and distance-based (neighbor-joining) algorithms to infer the organismal phylogeny. The resulting trees were largely in agreement.
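Concatenation of per-gene alignments into one data set can be sketched as follows. This is a hypothetical helper, not the authors' pipeline: it assumes each gene alignment is a mapping from taxon label to an aligned sequence, and it pads taxa missing from a gene with gap characters, which is one common convention among several.

```python
def concatenate_alignments(alignments: list[dict[str, str]], gap: str = "-") -> dict[str, str]:
    """Join per-gene alignments end to end, one long sequence per taxon."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    combined = {taxon: "" for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Taxa lacking this gene receive a run of gaps of the same width.
            combined[taxon] += aln.get(taxon, gap * width)
    return combined


# Toy input: taxon labels follow the figure abbreviations; sequences are made up.
gene_1 = {"Aa": "MK", "Pm": "MR"}
gene_2 = {"Aa": "LLV"}
supermatrix = concatenate_alignments([gene_1, gene_2])
```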
Figure 5 Reticulogram. Shown is one of the four most parsimonious total reconciliations found using our search strategy. The reticulogram outlines a plausible scenario of the dissemination of the tad locus. Species designations are the same as in Figures 1 and 3. Gray tubes represent organismal phylogeny. Colors indicate phylogenies of individual genes as shown. H, horizontal transfer events; R, recombination and gene replacement; D, duplications. Losses are not indicated for simplicity. cpaD, tadF and rcpB are not shown for clarity, though all follow the same path as P1 partition genes. In this total reconciliation there are 20 losses, 5 horizontal transfers, 7 duplications and 4 recombinations and gene replacements (total of 36 non-vertical events). Some losses may represent missing data in the unfinished genomes. This tally is 7 steps shorter than the sum of non-vertical events from the three most parsimonious individual reconciliations from each partition given by TreeMap2 (P1:19, P2:11, P3:13; see URL). D1, D2, D3, D4, D6, D7, H1, H4, H5, R2, R3 and R4 are found in all four most parsimonious total reconciliations. H1, H5 and D2 are further supported by their appearance in all most parsimonious individual reconciliations between P1 and the organismal tree. The most parsimonious reconciliation in which H1 and H5 do not appear requires two more steps than the most parsimonious solution. Similarly, the most parsimonious reconciliation of P2 that does not include H5 requires 4 more events. D2 and D6 are found in all reconciliations produced by TreeMap2. We also reconciled trees generated by each inference method (maximum likelihood, Bayesian and neighbor-joining) separately (see URL). H1, H2, H3, H4, D2 and D4 are all routinely found in these most parsimonious reconciliations, indicating that the scenario depicted is robust to topological disagreement between trees generated by different inference methods. Confirmation of these events will require further testing and sampling. LILD scores (numbers with arrows) show the total number of extra steps required to impose specific nodes from organismal phylogeny on the partitions from the tad locus. These values offer a localized measure of support for this scenario that is supported by statistical analysis. In parentheses is the LILD score for each partition individually (P1/P2/P3); all values are statistically significant (P < 0.01). Asterisks indicate nodes that were also found in simultaneous analysis of each partition with the organismal phylogeny. N indicates that the node was not applicable to a specific partition.
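The event tally quoted in the Figure 5 legend can be checked with a few lines of Python. The counts are those given in the legend; the variable names are ours.

```python
# Events in the total reconciliation shown in Figure 5.
total_reconciliation = {
    "losses": 20,
    "horizontal transfers": 5,
    "duplications": 7,
    "recombinations and gene replacements": 4,
}

# Non-vertical events in the best individual reconciliation of each partition.
individual_reconciliations = {"P1": 19, "P2": 11, "P3": 13}

total_events = sum(total_reconciliation.values())
separate_events = sum(individual_reconciliations.values())
steps_saved = separate_events - total_events
```

The difference of 7 is the saving obtained by treating the three partitions as one linked locus instead of reconciling each partition separately.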
data become available. To gauge support for our scenarios, we used the local incongruence length difference (LILD) test25, which measures the extent and significance of incongruent phylogenetic signal at differing nodes. The LILD test yielded high scores with strong statistical support (P < 0.01) at several key nodes (Fig. 5).
The most parsimonious tree reconciliations show periods of vertical inheritance punctuated by horizontal transfers and duplications (Fig. 5). In the α-proteobacteria, some of the tad loci have a clear pattern of descent that exactly tracks the organismal phylogeny, indicating vertical inheritance. But several of these species have second or third copies of the locus originating from early duplications. These lineages are largely characterized by non-vertical events, and several descendants, including those in Sinorhizobium meliloti and Chlorobium limicola, are found on plasmids. These results raise the possibility that the driving force for many of these events was acquisition and mobilization of the region by plasmids.
One mobile lineage seems to be responsible for the transfer of the tad locus to the common ancestor of the Pasteurellaceae and Yersinia species (Fig. 5). This event is independently supported by the marked differences in G+C content in the tad loci of this clade (Fig. 1), but an immediate hypothetical donor with low G+C content is not well represented in this analysis, suggesting an even more complex historical scenario than presented here.
Our genetic and functional data show that the tad region in A. actinomycetemcomitans encodes a secretion system dedicated to the assembly of fibrils that mediate adherence and colonization. Because tad regions in other organisms probably facilitate colonization by promoting adherence to different host tissues and surfaces in the environment, we have designated this region as the Widespread Colonization Island. Our analysis indicates that Pasteurellaceae and Yersinia species acquired the Widespread Colonization Island, directly or indirectly, from ancestors or close relatives of modern day Rhizobiaceae. Thus, a set of genes that several different organisms require for pathogenesis in humans and animals was acquired from organisms that probably used these genes for other purposes. It is, therefore, important to consider the potential genetic contribution of environmental, non-pathogenic organisms in the emergence of infectious disease.
METHODS
Genetic analysis. We generated and complemented random transposon insertion mutations as previously described6. Uninduced expression from the leaky tacp promoter of the IncQ vector was sufficient to complement in all cases. To test for adherence, we grew strains in polystyrene wells overnight and washed and stained them as previously described6.
Flp1-T7 localization. We grew cells containing expression plasmid pSK163 with the flp-1-T7–TAG fusion5 in 10 ml A. actinomycetemcomitans growth medium (AAGM; 30 g Trypticase soy broth (BBL), 6 g yeast extract (BBL), 0.75% glucose and 0.4% sodium bicarbonate per liter) containing chloramphenicol (4 µg µl–1) and 1.0 mM isopropyl-β-D-thiogalactopyranoside for 5 h. We extracted whole-cell protein by removing 100 µl of cell suspension, pelleting cells by centrifugation (at 16,000g for 3 min) and resuspending the cell pellet in 20 µl of SDS loading dye. We loaded 5 µl of this mixture onto a 20% SDS–PAGE gel. To obtain Flp1-T7 protein from the supernatant, we plated cells onto AAGM solid medium containing chloramphenicol and isopropyl-β-D-thiogalactopyranoside. After incubation for 3–4 d, we scraped and resuspended cells in 1.0 ml HEPES buffer (10 mM; pH 7.4). After vortex mixing, we pelleted cells by centrifugation. We concentrated an aliquot of 500 µl of supernatant ∼10× in a Microcon YM-10 (Millipore) concentrating device. We carried out SDS–PAGE (20% acrylamide) and western-blot analysis as previously described3.
Database searches and alignment. We identified putative homologous loci and genes using the default settings of BLAST26 to search both the finished and unfinished microbial databases. We accepted loci if they had two or more ORFs that were significantly similar to any gene in the tad locus of A. actinomycetemcomitans or any other tad-like locus found. We then aligned amino-acid sequences deduced from each ORF in each region with other similar genes using the default settings of CLUSTALX 1.63 (European Molecular Biology Organization). Genes representing organismal phylogeny were based on 44 ‘core’ genes listed in the data set of Brochier et al.18 that are statistically congruent based on principal component analysis. We initially combined putative homologs of tadB and tadC in the same alignment, and phylogenetic analysis of the combined tadB and tadC family suggested that the common ancestor of tadB and tadC may have been inherited as one gene that subsequently duplicated independently in gram-positive and gram-negative organisms (data not shown). To account for this possibility, we divided the TadB and TadC alignment into four independent alignments (designated TadB+ and TadC+ for gram-positive versions; TadB– and TadC– for gram-negative versions). TadC– and TadB– also included proteins from one of the two regions present in the gram-positive organism Streptomyces coelicolor, as these proteins consistently grouped with gram-negative versions in phylogenetic analysis. Similarly, we initially placed putative homologs of tadE and tadF in the same alignment. tadF genes formed a monophyletic clade after an apparent duplication event (data not shown) and were therefore placed in a separate alignment for further analysis.
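The locus acceptance rule (two or more ORFs significantly similar to a tad gene) can be expressed as a small filter. This Python sketch assumes that BLAST hits have already been reduced to (locus, ORF, E value) triples; the E-value cutoff of 1e-5 is a placeholder of ours, since the text does not state the threshold used, and the locus and ORF names are invented.

```python
from collections import defaultdict


def accepted_loci(hits, max_evalue=1e-5, min_orfs=2):
    """Return loci with at least `min_orfs` distinct significant ORFs.

    hits: iterable of (locus_id, orf_id, evalue) tuples.
    max_evalue: hypothetical significance cutoff (an assumption).
    """
    significant_orfs = defaultdict(set)
    for locus, orf, evalue in hits:
        if evalue <= max_evalue:
            significant_orfs[locus].add(orf)
    return sorted(
        locus for locus, orfs in significant_orfs.items() if len(orfs) >= min_orfs
    )


# Invented example hits.
example_hits = [
    ("locus_1", "orf_a", 1e-30),
    ("locus_1", "orf_b", 1e-12),
    ("locus_2", "orf_c", 1e-40),  # only one significant ORF
    ("locus_2", "orf_d", 0.5),    # not significant
    ("locus_3", "orf_e", 1e-8),
    ("locus_3", "orf_e", 1e-9),   # same ORF hit twice counts once
]
kept = accepted_loci(example_hits)
```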
Phylogenetic and comparative analysis. We carried out heuristic searches for the most parsimonious trees using the ‘ratchet’ method27 implemented using PAUPRat (Sikes & Lewis) in conjunction with PAUP4.0b10 (Swofford). We carried out 200 replicates of the ratchet method, upweighting 15% of the characters in each iteration, using the tree-branch reconnection technique and saving only one tree at each step. We then did tree-branch reconnection on the resulting trees. All characters and state transformations were given equal weight. To calculate confidence in the resulting trees, we generated Bremer decay indices using the program Autodecay (Eriksson) and 100 Bootstrap replicates in PAUP with 100 iterations of random addition followed by tree-branch reconnection. We constructed neighbor-joining trees using default settings in PAUP. We constructed maximum likelihood trees and bootstrap supports with ProML in PHYLIP 3.6 (Felsenstein) using the JTT empirical model. The maximum likelihood quartet puzzling method was implemented with TREE-PUZZLE28 using WAG and JTT empirical models. We constructed Bayesian trees with the Markov-Chain-Monte-Carlo technique in MrBayes29 using the JTT empirical model. We used the partition homogeneity function in PAUP for ILD analyses. We carried out the LILD test at nodes that were topologically incongruent with the most parsimonious tree from both the organismal tree and the simultaneous analysis25. We used TreeMap2b24 to generate individual reconciliations of each of the three partitions with the organismal phylogeny. For computational tractability, we allowed no more than 6 horizontal transfer events, 22 losses and 20 duplications for any single reconciliation. Further testing, allowing for more events, did not yield more parsimonious reconciliations (data not shown). Polytomies were resolved to be congruent with other partitions. All events were weighted equally. We used the resulting optimal individual reconciliations of each partition as guides in a heuristic search for the best simultaneous total reconciliation of all three partitions on the organismal tree. To reduce the considerable mathematical complexity of a manual search for optimal total reconciliations (no automated techniques are available), any duplication or horizontal transfer event that we found in common for all of the most parsimonious individual reconciliations of each partition was constrained to exist in at least one total reconciliation. In addition, we assumed that because the genes in the tad locus are closely linked, non-vertical events predicted for more than one partition should be counted only once instead of two or three times in cost calculations. Further, all recombination and gene replacement events between tad loci should occur when the exchanging loci are present in the same hypothetical lineage. Using these criteria, we searched for total reconciliations that minimized the total number of duplications, horizontal transfers, losses and recombination and gene replacement events.
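Two of the bookkeeping rules above, the per-reconciliation event caps and the counting of shared events only once across linked partitions, can be sketched in Python. The event encoding (a kind letter plus a branch label) and all labels in the example are hypothetical; the sketch illustrates the cost calculation only, not the manual search that was actually performed.

```python
from collections import Counter

# Caps quoted in the text for any single reconciliation:
# H = horizontal transfers, L = losses, D = duplications.
LIMITS = {"H": 6, "L": 22, "D": 20}


def within_limits(events):
    """True if a set of (kind, branch) events respects every cap."""
    counts = Counter(kind for kind, _branch in events)
    return all(counts[kind] <= cap for kind, cap in LIMITS.items())


def unlinked_cost(partitions):
    """Cost if each partition's events are counted independently."""
    return sum(len(events) for events in partitions.values())


def linked_cost(partitions):
    """Cost if an event shared by several partitions is counted once."""
    return len(set().union(*partitions.values()))


# Hypothetical events: one transfer on branch n1 is shared by all partitions.
example = {
    "P1": {("H", "n1"), ("D", "n2")},
    "P2": {("H", "n1"), ("L", "n3")},
    "P3": {("H", "n1")},
}
separate = unlinked_cost(example)
together = linked_cost(example)
```

In this toy case the shared transfer is charged three times when partitions are scored separately and once when they are scored as a linked locus, which is the same effect that makes the total reconciliation shorter than the sum of the individual ones.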
URL. Additional information and data from this study are available at http://cpmcnet.columbia.edu/dept/figurski/WCI/wci.html.
ACKNOWLEDGMENTS
We thank M. Charleston for early release of TreeMap2b and helpful discussion, I.N. Sarkar for computer assistance, the members of D.H. Figurski’s, R.D.’s and D.H. Fine’s laboratories for helpful comments and the multiple genome sequencing projects from which data were collected for this analysis. This work was funded by grants from the US National Institutes of Health (to D.H. Figurski and R.D.). R.D. is also supported by the Lewis B. and Dorothy Cullman Program for Molecular Systematics at the American Museum of Natural History. S.C.K. was partially supported by a training grant from the US National Institutes of Health to Columbia University. P.J.P. is supported by the Medical Scientist Training Program of Columbia University.
COMPETING INTERESTS STATEMENT
The authors declare that they have no competing financial interests.
Received 21 February; accepted 31 March 2003
Published online 28 April 2003; doi:10.1038/ng1154
1. Hacker, J. & Kaper, J.B. Pathogenicity islands and the evolution of microbes. Annu. Rev. Microbiol. 54, 641–679 (2000).
2. Inoue, T. et al. Molecular characterization of low-molecular-weight component protein, Flp, in Actinobacillus actinomycetemcomitans fimbriae. Microbiol. Immunol. 42, 253–258 (1998).
3. Kachlany, S.C., Planet, P.J., DeSalle, R., Fine, D.H. & Figurski, D.H. Genes for tight adherence of Actinobacillus actinomycetemcomitans: from plaque to plague to pond scum. Trends Microbiol. 9, 429–437 (2001).
4. Fine, D.H. et al. Colonization and persistence of rough and smooth colony variants of Actinobacillus actinomycetemcomitans in the mouths of rats. Arch. Oral Biol. 46, 1065–1078 (2001).
5. Kachlany, S.C. et al. flp-1, the first representative of a new pilin gene subfamily, is required for non-specific adherence of Actinobacillus actinomycetemcomitans. Mol. Microbiol. 40, 542–554 (2001).
6. Kachlany, S.C. et al. Nonspecific adherence by Actinobacillus actinomycetemcomitans requires genes widespread in bacteria and archaea. J. Bacteriol. 182, 6169–6176 (2000).
7. Fuller, T.E., Kennedy, M.J. & Lowery, D.E. Identification of Pasteurella multocida virulence genes in a septicemic mouse model using signature-tagged mutagenesis. Microb. Pathog. 29, 25–38 (2000).
8. Nika, J.R. et al. Haemophilus ducreyi requires the flp gene cluster for microcolony formation in vitro. Infect. Immun. 70, 2965–2975 (2002).
9. Skerker, J.M. & Shapiro, L. Identification and cell cycle control of a novel pilus system in Caulobacter crescentus. EMBO J. 19, 3223–3234 (2000).
10. Sommer, J.M. & Newton, A. Turning off flagellum rotation requires the pleiotropic gene pleD: pleA, pleC, and pleD define two morphogenic pathways in Caulobacter crescentus. J. Bacteriol. 171, 392–401 (1989).
11. Thomson, V.J., Bhattacharjee, M.K., Fine, D.H., Derbyshire, K.M. & Figurski, D.H. Direct selection of IS903 transposon insertions by use of a broad-host-range vector: isolation of catalase-deficient mutants of Actinobacillus actinomycetemcomitans. J. Bacteriol. 181, 7298–7307 (1999).
12. Haase, E.M., Zmuda, J.L. & Scannapieco, F.A. Identification and molecular analysis of rough-colony-specific outer membrane proteins of Actinobacillus actinomycetemcomitans. Infect. Immun. 67, 2901–2908 (1999).
13. Planet, P.J., Kachlany, S.C., DeSalle, R. & Figurski, D.H. Phylogeny of genes for secretion NTPases: identification of the widespread tadA subfamily and development of a diagnostic key for gene classification. Proc. Natl. Acad. Sci. USA 98, 2503–2508 (2001).
14. Thomas, N.A., Mueller, S., Klein, A. & Jarrell, K.F. Mutants in flaI and flaJ of the archaeon Methanococcus voltae are deficient in flagellum assembly. Mol. Microbiol. 46, 879–887 (2002).
15. Bhattacharjee, M.K., Kachlany, S.C., Fine, D.H. & Figurski, D.H. Nonspecific adherence and fibril biogenesis by Actinobacillus actinomycetemcomitans: TadA protein is an ATPase. J. Bacteriol. 183, 5927–5936 (2001).
16. Wang, Y., Goodman, S.D., Redfield, R.J. & Chen, C. Natural transformation and DNA uptake signal sequences in Actinobacillus actinomycetemcomitans. J. Bacteriol. 184, 3442–3449 (2002).
17. Doolittle, W.F. Phylogenetic classification and the universal tree. Science 284, 2124–2129 (1999).
18. Brochier, C., Bapteste, E., Moreira, D. & Philippe, H. Eubacterial phylogeny based on translational apparatus proteins. Trends Genet. 18, 1–5 (2002).
19. Jain, R., Rivera, M.C. & Lake, J.A. Horizontal gene transfer among genomes: the complexity hypothesis. Proc. Natl. Acad. Sci. USA 96, 3801–3806 (1999).
20. Farris, J.S., Kallersjo, M., Kluge, A.G. & Bult, C. Constructing a significance test for incongruence. Syst. Biol. 44, 570–572 (1995).
21. Brown, E.W., Kotewicz, M.L. & Cebula, T.A. Detection of recombination among Salmonella enterica strains using the incongruence length difference test. Mol. Phylogenet. Evol. 24, 102–120 (2002).
22. Dolphin, K., Belshaw, R., Orme, C.D. & Quicke, D.L. Noise and incongruence: interpreting results of the incongruence length difference test. Mol. Phylogenet. Evol. 17, 401–406 (2000).
23. Page, R.D. & Charleston, M.A. From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem. Mol. Phylogenet. Evol. 7, 231–240 (1997).
24. Charleston, M.A. Jungles: a new solution to the host/parasite phylogeny reconciliation problem. Math. Biosci. 149, 191–223 (1998).
25. Thornton, J.W. & DeSalle, R. A new method to localize and test the significance of incongruence: detecting domain shuffling in the nuclear receptor superfamily. Syst. Biol. 49, 183–201 (2000).
26. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
27. Nixon, K.C. The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15, 407–414 (1999).
28. Schmidt, H.A., Strimmer, K., Vingron, M. & von Haeseler, A. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502–504 (2002).
29. Huelsenbeck, J.P. & Ronquist, F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755 (2001).
30. Bergey, D.H., Holt, J.G. & Krieg, N.R. Bergey’s manual of systematic bacteriology vol. 4 (Williams & Wilkins, Baltimore, 1984).