
The Widespread Colonization Island of Actinobacillus actinomycetemcomitans


LETTERS

Genomic islands, such as pathogenicity islands, contribute to the evolution and diversification of microbial life¹. Here we report on the Widespread Colonization Island, which encompasses the tad (tight adherence) locus for colonization of surfaces and biofilm formation by the human pathogen Actinobacillus actinomycetemcomitans. At least 12 of the 14 genes at the tad locus are required for tenacious biofilm formation and synthesis of bundled Flp pili (fibrils) that mediate adherence. The pilin subunit², Flp1, remains inside the cell in tad-locus mutants, indicating that these genes encode a secretion system for export and assembly of fibrils. We found tad-related regions in a wide variety of Bacterial and Archaeal species³, and their sequence characteristics indicate possible horizontal transfer. To test the hypothesis of horizontal transfer, we compared the phylogeny of the tad locus to a robust organismal phylogeny using statistical tests of congruence and tree reconciliation techniques. Our analysis strongly supports a complex history of gene shuffling by recombination and multiple horizontal transfers, duplications and losses. We present evidence for a specific horizontal transfer event leading to the establishment of this region as a determinant of disease.

Fresh clinical isolates of the γ-proteobacterium A. actinomycetemcomitans adhere tightly to surfaces to form biofilms that are extremely difficult to dislodge. This property is required for colonization and persistence in the oral cavity and initiation of periodontal disease (ref. 4 and H.C. Schreiner et al., manuscript submitted). Genes from the tad locus (Fig. 1), including the pilin-encoding gene flp-1²,⁵ and tadABCDEFG⁶, are required for tight adherence and the associated phenotypes of autoaggregation, rough colony morphology and fibril production. Similar tad loci have recently been implicated in human and animal diseases caused by Pasteurella multocida⁷ and Haemophilus ducreyi⁸, which, like A. actinomycetemcomitans, are members of the family Pasteurellaceae. tad-related loci are also found in other human pathogens, including Yersinia spp., Bordetella pertussis, Pseudomonas aeruginosa, Corynebacterium diphtheriae and Mycobacterium tuberculosis, suggesting a possible role in colonization and pathogenesis in these bacteria³. A very similar region in the nonpathogenic α-proteobacterium Caulobacter crescentus is required for production of pili⁹ and is possibly involved in colonization of surfaces¹⁰. The presence of similar loci in other nonpathogenic species, such as the Chlorobiaceae, argues for a role in colonization of environmental niches³.

¹Department of Microbiology, College of Physicians & Surgeons, Columbia University, 701 West 168th Street, New York, New York 10032, USA. ²Department of Oral Biology, University of Medicine and Dentistry of New Jersey, Newark, New Jersey 07103, USA. ³Molecular Biology Laboratory, American Museum of Natural History, Central Park West at 79th Street, New York, New York 10024, USA. Correspondence should be addressed to D.H.F. ([email protected]).

Paul J Planet¹, Scott C Kachlany¹, Daniel H Fine², Rob DeSalle³ & David H Figurski¹

NATURE GENETICS VOLUME 34 | NUMBER 2 | JUNE 2003 193

© 2003 Nature Publishing Group http://www.nature.com/naturegenetics

[Figure 1 image: (a) gene map of the A. actinomycetemcomitans tad locus, in order flp-1, flp-2, tadV, rcpC, rcpA, rcpB, tadZ, tadA, tadB, tadC, tadD, tadE, tadF, tadG (scale bar, 2 kb); (b) G+C profiles and locus lengths for Aa, Pm, Hd, Yp, Ye, Cl, Ct, Cc and Pa.]

Figure 1 tad locus. (a) Schematic of the tad locus of A. actinomycetemcomitans. (b) G+C percentages in tad-related regions of different organisms compared with those of equal-sized flanking regions and with referenced G+C percentages for the entire genome. Aa, A. actinomycetemcomitans; Pm, Pasteurella multocida; Hd, Haemophilus ducreyi; Yp, Yersinia pestis; Ye, Yersinia enterocolitica; Cl, Chlorobium limicola; Ct, Chlorobium tepidum; Cc, Caulobacter crescentus CB15; Pa, Pseudomonas aeruginosa. For each species, the middle portion represents the tad locus (defined as the contiguous area containing ORFs similar to ORFs from the A. actinomycetemcomitans tad locus). The vertical displacement of the line is proportional to the percentage of G+C over that region. Approximations of G+C percentage for the overall genome are as reported³⁰. Dots indicate positions of putative uptake sequences¹¹,¹⁶ (5′-aAAGTGCGgtc-3′), which were found by allowing two of the four lowercase positions in the canonical sequence to vary. A paucity of uptake sequences was also found in H. ducreyi under less stringent sequence-searching conditions (data not shown).
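The Fig. 1 caption describes the uptake-sequence search only in words: the canonical site is 5′-aAAGTGCGgtc-3′, and two of the four lowercase positions may vary. The sketch below is one way to implement that rule, assuming "may vary" means up to two mismatches at lowercase positions while the uppercase core must match exactly, and that both strands are scanned. The function names and the per-kilobase density helper are illustrative; the paper does not say which tool was used.

```python
"""Sketch: scan DNA for uptake sequences matching 5'-aAAGTGCGgtc-3'.

Assumed rule: uppercase core positions must match exactly; up to two of
the four lowercase positions may mismatch. Both strands are scanned.
"""
from typing import List, Tuple

CANONICAL = "aAAGTGCGgtc"
MAX_LOWERCASE_MISMATCHES = 2


def revcomp(seq: str) -> str:
    """Reverse complement that keeps upper/lower case (so lowercase marks survive)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def matches(window: str, pattern: str = CANONICAL,
            max_lower: int = MAX_LOWERCASE_MISMATCHES) -> bool:
    """True if window fits pattern: exact at uppercase, <= max_lower mismatches at lowercase."""
    if len(window) != len(pattern):
        return False
    mismatches = 0
    for base, expected in zip(window.upper(), pattern):
        if base != expected.upper():
            if expected.isupper():
                return False  # core position must match exactly
            mismatches += 1
            if mismatches > max_lower:
                return False
    return True


def find_uptake_sites(seq: str) -> List[Tuple[str, int]]:
    """Return (strand, 0-based start) for every hit on either strand."""
    seq = seq.upper()
    k = len(CANONICAL)
    reverse_pattern = revcomp(CANONICAL)  # 'gacCGCACTTt'
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if matches(window, CANONICAL):
            hits.append(("+", i))
        if matches(window, reverse_pattern):
            hits.append(("-", i))
    return hits


def sites_per_kb(seq: str) -> float:
    """Hits per kilobase, for comparison with the roughly one-per-kb genomic rate."""
    return 1000.0 * len(find_uptake_sites(seq)) / len(seq)


print(find_uptake_sites("TTTTAAAGTGCGGTCTTTT"))
```

Applied to a locus and to a background region of the same genome, `sites_per_kb` gives the comparison that Fig. 1b summarizes visually. Loosening `MAX_LOWERCASE_MISMATCHES` corresponds to the "less stringent" search mentioned for H. ducreyi.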

To test whether other genes in the tad locus are required for adherence, we used transposon IS903φkan¹¹ to isolate nonadherent mutants of A. actinomycetemcomitans strain CU1000N⁶. Screening by PCR for insertions between flp-1 and tadA identified mutations in rcpA, rcpC and tadZ. The product of rcpA is expressed only in rough, adherent strains¹². The functions of the products of rcpC and tadZ are unknown.

All mutants were defective in rough colony formation, autoaggregation, adherence and production of Flp pili (Fig. 2a,b). These mutations are not polar on downstream tad genes, as they were complemented by the wild-type gene on an IncQ plasmid (Fig. 2a,b). No insertions were detected in flp-2, tadV or rcpB, but tadV encodes a putative prepilin peptidase, and its homolog in C. crescentus (cpaA) is required for pilus production⁹. The functions of the products of the putative pilin gene flp-2 and of rcpB are unknown. Taken together, the results indicate that at least 12 of the 14 genes of the tad locus are required for adherence.
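The "at least 12 of 14" count combines three kinds of evidence: earlier mutant studies (flp-1 and tadA through tadG), the new transposon mutants (rcpA, rcpC and tadZ), and an inference from homology (tadV, whose C. crescentus homolog cpaA is needed for pili). The tally below lays this out as a plain Python structure. The evidence labels are our paraphrase of the text, not a data set from the paper.

```python
# Gene order as in Fig. 1a; evidence paraphrased from the text above.
TAD_LOCUS = ["flp-1", "flp-2", "tadV", "rcpC", "rcpA", "rcpB", "tadZ",
             "tadA", "tadB", "tadC", "tadD", "tadE", "tadF", "tadG"]

EVIDENCE = {
    "flp-1": "mutant phenotype (refs. 2,5)",
    **{gene: "mutant phenotype (ref. 6)" for gene in
       ["tadA", "tadB", "tadC", "tadD", "tadE", "tadF", "tadG"]},
    "rcpA": "transposon mutant, complemented (this work)",
    "rcpC": "transposon mutant, complemented (this work)",
    "tadZ": "transposon mutant, complemented (this work)",
    "tadV": "inferred: C. crescentus homolog cpaA needed for pili (ref. 9)",
}

# Genes with some evidence of being required, and genes with none yet.
required = [g for g in TAD_LOCUS if g in EVIDENCE]
unresolved = [g for g in TAD_LOCUS if g not in EVIDENCE]
print(f"{len(required)} of {len(TAD_LOCUS)} genes required; unresolved: {unresolved}")
```

The two unresolved genes, flp-2 and rcpB, are exactly those for which neither a mutant nor a functional homolog is cited. This is why the claim is stated as a lower bound.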

Several genes in the tad locus (flp-1, flp-2, tadV, rcpA, tadA, tadB and tadC) are putative homologs of genes encoding bacterial type II secretion¹²,¹³ and the biogenesis of type IV pili and archaeal flagella⁵,⁹,¹⁴. Although TadA, an ATP hydrolase¹⁵, is related to ATPases of pilus assembly and type II secretion systems (for example, PulE, Klebsiella spp.), it is more closely related to those of type IV secretion systems¹³ (for example, VirB11, Agrobacterium tumefaciens), suggesting gene shuffling between types of secretion systems. No matches for rcpC, rcpB, tadZ, tadD, tadE, tadF or tadG occur in known secretion systems. To test whether the tad locus encodes a system for secretion and assembly of fibrils, we compared the localization of epitope-tagged Flp1 (Flp1-T7) in wild-type and mutant strains (Fig. 2c). Flp1-T7 was found in the supernatants of both wild-type CU1000N and the flp-1 mutant, as expected. In contrast, no Flp1-T7 was present in the supernatants of the tad mutants, confirming that Flp1 is a substrate for secretion by the products of the tad locus.

[Figure 2 image: (a) adherence in wells; (b) electron micrographs; (c) Flp1-T7 in whole cells (Coomassie and western blot) and supernatants (western blot) of wild type and the flp1–, tadA–, tadC–, tadD–, tadF– and tadG– mutants.]

Figure 2 Phenotypes of tad mutants. (a) Adherence phenotype. Shown is the tadZ1194 mutant with the IncQ plasmid pJAK16 (ref. 11; left) and with the wild-type tadZ gene expressed from the tacp promoter on pJAK16 (right). The complemented mutant adheres to the bottom of the well. Similar results were obtained for mutations in rcpA and rcpC. (b) Fibril production. Electron micrographs of the tadZ1194 mutant with pJAK16 (left) and with the complementing tadZ+ pJAK16 plasmid (right). Similar results were obtained for mutations in rcpA and rcpC. Samples were negatively stained and viewed as previously described⁶. (c) Localization of Flp1-T7 in wild-type and tad mutant strains. Protein from whole cells (top two panels) or supernatant fractions (bottom panel) was separated by SDS–PAGE. Total protein was stained with Coomassie blue (top panel), and Flp1-T7 was detected by western blotting using a monoclonal antibody against T7-Tag (Novagen; bottom two panels). Molecular mass markers, in kDa, are noted on the left. Dot blots of supernatant fractions concentrated 6× and probed with monoclonal antibody against T7-Tag yielded similar results (see URL).

Genomic islands can be identified by the presence of genes or sequences that suggest past or potential movement between chromosomes¹. Some tad loci bear characteristics indicative of a foreign origin. tad loci in Pasteurellaceae and Yersinia species have atypically low G+C contents, and the A. actinomycetemcomitans and P. multocida tad loci lack the characteristic DNA uptake sequences for natural transformation that occur in these chromosomes approximately once per kilobase¹¹,¹⁶ (Fig. 1).

We define genomic islands as clusters of genes with a common evolutionary history that is distinct and divergent from the histories of the organisms in which they reside. They can be considered distinct biological entities (for example, parasites) living in their host genomes, and techniques for inferring the evolutionary histories of hosts and parasites can be applied to test specific horizontal transfer, or host-switching, events.

Rampant horizontal transfer may obscure the underlying microbial phylogeny¹⁷. Thus, to provide a reasonable representation of the organismal, or host, phylogeny, we used a concatenated data set of genes involved in mRNA translation that were previously shown to have little evidence of horizontal transfer¹⁸,¹⁹. We used parsimony-based, Bayesian, maximum likelihood and distance-based (neighbor-joining) algorithms to infer the organismal phylogeny. The resulting trees were largely in agreement.

Another potential confounding factor of phylogenetic inference is that recombination events may have occurred between tad regions. Recombination events can be detected using the incongruence length difference (ILD) test²⁰,²¹, a relatively conservative, albeit not infallible²², statistical test for differing phylogenetic histories. Pairwise ILD comparisons for all genes in the tad region showed strong evidence for recombination (Fig. 3). Therefore, we partitioned the genes into three concatenated alignments of phylogenetically congruent genes (Fig. 3). ILD analysis showed that the partitioned data sets were statistically incongruent with each other and with the organismal data set (P < 0.01).

[Figure 3 image: matrix of pairwise ILD comparisons among rcpC, rcpA, tadV, tadA, tadC+, tadB+, tadE, tadF, tadD, cpaD, tadZ, tadB–, tadC– and tadG; key: incongruent (P < 0.01), borderline (P < 0.1), congruent, not applicable.]

Figure 3 Pairwise ILD analysis and partitions. Shown is a matrix of pairwise ILD comparisons used to define incongruent partitions. Red squares show highly significant phylogenetic incongruence between several genes in the tad locus, which is best explained by past recombination events in the locus. A mostly white matrix would signal that there was little evidence of shuffling of genes between tad regions. We used this analysis to partition the tad locus into sets of genes with very different evolutionary histories. We grouped genes together in the same partition when they were either congruent (white) or borderline (pink) with every other gene in the partition. Because the incongruences were not symmetric, several different partitions were possible. We chose the one that had the highest number of contiguous genes, which included cpaD, rcpC, rcpB, tadZ, tadA, tadB–, tadB+, tadC–, tadC+, tadD and tadF. tadE and tadG were also combined. All of the tentative partitions were tested against each other in succession, and partitions were combined if subsequent ILD tests suggested congruence. In this way, tadV was combined with the largest partition to form P1, but rcpA (P2) and the combination of tadE and tadG (P3) remained independent and incongruent. flp genes were excluded from this analysis because their small size limited possible statistical support and their complex history of duplication and loss made false orthology statements likely⁵.

Phylogenies of the three partitions were inferred using parsimony, Bayesian, maximum likelihood and neighbor-joining algorithms. Trees from each partition differed markedly in topology from the organismal tree (Fig. 4), indicating very different evolutionary histories.

[Figure 4 image: six tanglegrams pairing the organismal tree (left) with each tad partition tree (right): P1:MP (ILD = 924; P < 0.01), P2:MP (ILD = 223; P < 0.01), P3:MP (ILD = 15; P < 0.01), P1:ML, P2:ML and P3:ML; branch labels give support values in the order described in the caption.]

Figure 4 Phylogenies and tanglegrams. Shown are the optimal phylogenetic trees inferred by both maximum parsimony (MP) and maximum likelihood (ML) methods in six associated tree networks called tanglegrams. In each tanglegram, the organismal tree is on the left and the tree representing a congruent partition of genes in the tad locus is on the right. Red lines connect tad loci to the organisms in which they are found. The tad locus partitions are as follows: P1, rcpBC, cpaD, tadVZABCDF; P2, rcpA; P3, tadEG. The organismal phylogeny is derived from a data set composed of homologs of 44 translational genes (eftu, if-1, if-3, npt, rba, rf-1, rf-2, rfr, rpl1, rpl2, rpl3, rpl4, rpl6, rpl10, rpl11, rpl13, rpl14, rpl16, rpl17, rpl19, rpl20, rpl21, rpl22, rpl23, rpl27, rpl34, rps2, rps3, rps4, rps6, rps7, rps8, rps9, rps11, rps12, rps13, rps15, rps17, rps18, rps19, rps20, sp2, trmD and truA). P1 is rooted in the same branch that defines the bacterial tadA subfamily in the type II/IV NTPase superfamily phylogeny¹³ (P.J.P., R.D. and D.H. Figurski, unpublished data). P2 is rooted using the secretin family tree (P.J.P., R.D. and D.H. Figurski, unpublished data). P3 is rooted to correspond with the trees from P1 and P2. Aa, Cc, Ct, Pa, Pm and Yp are the same designations used in Figure 1. Af, Acidithiobacillus ferrooxidans; At, Agrobacterium tumefaciens str. C58; Bc, Burkholderia cepacia; Bp, Bordetella pertussis; Cd, Corynebacterium diphtheriae; Ec, Escherichia coli; Ml, Mesorhizobium loti; Mt, Mycobacterium tuberculosis H37Rv; Ng, Neisseria gonorrhoeae; Nm, N. meningitidis; Pg, Porphyromonas gingivalis; Rs, Ralstonia solanacearum; Sc, Streptomyces coelicolor A3(2); Se, Salmonella enterica; Sm, Sinorhizobium meliloti; Vc, Vibrio cholerae. Numbers with species designations refer to independent loci in the same organism. Numbers on or near branches (some with arrows) indicate support indices from different methods in the following order: Bremer decay index/parsimony bootstrap percentage/Bayesian credibility value/neighbor-joining bootstrap percentage/maximum likelihood quartet-puzzling support/maximum likelihood bootstrap percentage. Dots indicate values over 80% (for actual numbers, see URL). N indicates that the node did not exist in the optimal support tree. Nodes that are incongruent with the organismal tree are designated with a red dot. ILD scores, which represent the number of extra steps required when the organismal phylogeny topology is imposed on each tad locus partition, are shown along with P values.

Figure 5 Reticulogram. Shown is one of the four most parsimonious total reconciliations found using our search strategy. The reticulogram outlines a plausible scenario of the dissemination of the tad locus. Species designations are the same as in Figures 1 and 4. Gray tubes represent organismal phylogeny. Colors indicate phylogenies of individual genes as shown. H, horizontal transfer events; R, recombination and gene replacement; D, duplications. Losses are not indicated for simplicity. cpaD, tadF and rcpB are not shown for clarity, though all follow the same path as P1 partition genes. In this total reconciliation there are 20 losses, 5 horizontal transfers, 7 duplications and 4 recombinations and gene replacements (a total of 36 non-vertical events). Some losses may represent missing data in the unfinished genomes. This tally is 7 steps shorter than the sum of non-vertical events from the three most parsimonious individual reconciliations from each partition given by TreeMap2 (P1: 19, P2: 11, P3: 13; see URL). D1, D2, D3, D4, D6, D7, H1, H4, H5, R2, R3 and R4 are found in all four most parsimonious total reconciliations. H1, H5 and D2 are further supported by their appearance in all most parsimonious individual reconciliations between P1 and the organismal tree. The most parsimonious reconciliation in which H1 and H5 do not appear requires two more steps than the most parsimonious solution. Similarly, the most parsimonious reconciliation of P2 that does not include H5 requires 4 more events. D2 and D6 are found in all reconciliations produced by TreeMap2. We also reconciled trees generated by each inference method (maximum likelihood, Bayesian and neighbor-joining) separately (see URL). H1, H2, H3, H4, D2 and D4 are all routinely found in these most parsimonious reconciliations, indicating that the scenario depicted is robust to topological disagreement between trees generated by different inference methods. Confirmation of these events will require further testing and sampling. LILD scores (numbers with arrows) show the total number of extra steps required to impose specific nodes from the organismal phylogeny on the partitions from the tad locus. These values provide a localized, statistically tested measure of support for this scenario. In parentheses is the LILD score for each partition individually (P1/P2/P3); all values are statistically significant (P < 0.01). Asterisks indicate nodes that were also found in simultaneous analysis of each partition with the organismal phylogeny. N indicates that the node was not applicable to a specific partition.

Tree reconciliation, a technique used for reconstructing co-evolution between parasites and hosts or between genes and organisms²³, searches for parsimonious historical scenarios that may have led to topological incongruence between phylogenies. Tree reconciliation distinguishes between gene duplication and horizontal transfer as explanations for incongruence. Using TreeMap2 (ref. 24), we reconciled the most parsimonious trees from each of the three partitions with the organismal tree (Fig. 5). Our reconciled scenarios show the minimum total number of non-vertical (ad hoc) inheritance events (that is, gene losses, duplications and horizontal transfers) found in our analysis. Other models, currently less parsimonious, may emerge as more plausible explanations as more data become available. To gauge support for our scenarios, we used the local incongruence length difference (LILD) test²⁵, which measures the extent and significance of incongruent phylogenetic signal at differing nodes. The LILD test yielded high scores with strong statistical support (P < 0.01) at several key nodes (Fig. 5).
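Both the ILD scores in Fig. 4 and the LILD scores in Fig. 5 are counts of extra parsimony steps: how much longer a data set becomes when a topology it does not favor is imposed on it. The sketch below illustrates that bookkeeping with Fitch's small-parsimony algorithm on fixed, fully resolved trees written as nested tuples. The trees and toy alignment are hypothetical. This is not the PAUP implementation, and it omits the tree search and the randomization test that give the ILD its P value.

```python
"""Sketch: extra parsimony steps when one topology is imposed on a data set.

Fitch small parsimony on fixed rooted binary trees (nested tuples).
All characters and state changes are weighted equally.
"""
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate state set, minimum changes) for one alignment column."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_site(left, states)
    right_set, right_steps = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1  # one change needed


def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Total Fitch steps summed over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


def extra_steps(own_tree: Tree, imposed_tree: Tree, alignment: Dict[str, str]) -> int:
    """Steps added when imposed_tree replaces the data set's own tree."""
    return tree_length(imposed_tree, alignment) - tree_length(own_tree, alignment)


# Hypothetical toy protein alignment: t1/t2 and t3/t4 share residues.
toy = {"t1": "MKLV", "t2": "MKLV", "t3": "MRIA", "t4": "MRIA"}
gene_tree = (("t1", "t2"), ("t3", "t4"))
organism_tree = (("t1", "t3"), ("t2", "t4"))
print(tree_length(gene_tree, toy), tree_length(organism_tree, toy),
      extra_steps(gene_tree, organism_tree, toy))
```

In the toy case, each of the three variable columns costs one step on the tree it supports and two steps on the conflicting tree, giving three extra steps. The ILD test proper compares such length differences against those from randomly repartitioned data.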

The most parsimonious tree reconciliations show periods of vertical inheritance punctuated by horizontal transfers and duplications (Fig. 5). In the α-proteobacteria, some of the tad loci have a clear pattern of descent that exactly tracks the organismal phylogeny, indicating vertical inheritance. But several of these species have second or third copies of the locus originating from early duplications. These lineages are largely characterized by non-vertical events, and several descendants, including those in Sinorhizobium meliloti and Chlorobium limicola, are found on plasmids. These results raise the possibility that the driving force for many of these events was acquisition and mobilization of the region by plasmids.
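The duplication and loss counts in reconciled scenarios like Fig. 5 come from mapping each gene-tree node onto the species tree. The sketch below shows the classic least-common-ancestor (LCA) mapping for duplications and losses only. This is a deliberate simplification: TreeMap2, used in the text, also infers horizontal transfers, and this code does not. It would therefore explain every incongruence with extra duplications and losses. The trees and species names are hypothetical.

```python
"""Sketch: duplication/loss reconciliation by LCA mapping (no transfers).

Trees are nested tuples; gene-tree leaves map to species-tree leaf names.
"""
from typing import Dict, Optional, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


class SpeciesTree:
    def __init__(self, root: Tree) -> None:
        self.parent: Dict[Tree, Optional[Tree]] = {}
        self.depth: Dict[Tree, int] = {}
        self._index(root, None, 0)

    def _index(self, node: Tree, parent: Optional[Tree], depth: int) -> None:
        self.parent[node] = parent
        self.depth[node] = depth
        if not isinstance(node, str):
            for child in node:
                self._index(child, node, depth + 1)

    def lca(self, a: Tree, b: Tree) -> Tree:
        """Lowest common ancestor: lift the deeper node, then climb together."""
        while self.depth[a] > self.depth[b]:
            a = self.parent[a]
        while self.depth[b] > self.depth[a]:
            b = self.parent[b]
        while a != b:
            a, b = self.parent[a], self.parent[b]
        return a


def reconcile(gene_tree: Tree, species: SpeciesTree,
              gene_to_species: Dict[str, str]) -> Tuple[int, int]:
    """Return (duplications, losses) implied by the LCA mapping."""
    counts = {"dup": 0, "loss": 0}

    def walk(node: Tree) -> Tree:
        if isinstance(node, str):
            return gene_to_species[node]
        left_map, right_map = walk(node[0]), walk(node[1])
        mapped = species.lca(left_map, right_map)
        is_dup = mapped in (left_map, right_map)  # child maps to same species node
        if is_dup:
            counts["dup"] += 1
        for child_map in (left_map, right_map):
            gap = species.depth[child_map] - species.depth[mapped]
            # Each skipped species-tree branch is a loss; at a speciation node
            # the first branch is the speciation itself, so it is not counted.
            counts["loss"] += gap if is_dup else gap - 1
        return mapped

    walk(gene_tree)
    return counts["dup"], counts["loss"]


species = SpeciesTree((("A", "B"), "C"))
mapping = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
print(reconcile((("a1", "b1"), ("a2", "c1")), species, mapping))
```

For the example gene tree, the mapping places one duplication at the root and two losses (the B copy in one paralog lineage and the C copy in the other). A gene tree that matches the species tree reconciles with no events.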

One mobile lineage seems to be responsible for the transfer of the tad locus to the common ancestor of the Pasteurellaceae and Yersinia species (Fig. 5). This event is independently supported by the marked differences in G+C content in the tad loci of this clade (Fig. 1), but an immediate hypothetical donor with low G+C content is not well represented in this analysis, suggesting an even more complex historical scenario than presented here.
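The G+C argument rests on comparing a locus with equal-sized flanking regions, as in Fig. 1b. The sketch below shows a minimal version of that comparison. It assumes G+C is computed over unambiguous bases only and that flanks are simply the windows of the same length immediately to each side. The toy sequence and coordinates are hypothetical.

```python
"""Sketch: G+C content of a locus versus equal-sized flanking windows."""
from typing import Dict


def gc_percent(seq: str) -> float:
    """G+C as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def locus_vs_flanks(genome: str, start: int, end: int) -> Dict[str, float]:
    """G+C% of genome[start:end] and of same-length windows on each side.

    Flanks are truncated at sequence ends; an empty flank raises ValueError.
    """
    size = end - start
    left = genome[max(0, start - size):start]
    right = genome[end:end + size]
    return {
        "left_flank": gc_percent(left),
        "locus": gc_percent(genome[start:end]),
        "right_flank": gc_percent(right),
    }


# Hypothetical AT-rich 10-bp "island" in a GC-rich context.
toy_genome = "GC" * 10 + "AT" * 5 + "GC" * 10
print(locus_vs_flanks(toy_genome, 20, 30))
```

A locus whose G+C differs sharply from both flanks and from the genome-wide value is a candidate for foreign origin. As the text notes, though, this signal alone cannot identify the donor.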

Our genetic and functional data show that the tad region in A. actinomycetemcomitans encodes a secretion system dedicated to the assembly of fibrils that mediate adherence and colonization. Because tad regions in other organisms probably facilitate colonization by promoting adherence to different host tissues and surfaces in the environment, we have designated this region as the Widespread Colonization Island. Our analysis indicates that Pasteurellaceae and Yersinia species acquired the Widespread Colonization Island, directly or indirectly, from ancestors or close relatives of modern-day Rhizobiaceae. Thus, a set of genes that several different organisms require for pathogenesis in humans and animals was acquired from organisms that probably used these genes for other purposes. It is, therefore, important to consider the potential genetic contribution of environmental, nonpathogenic organisms in the emergence of infectious disease.

METHODS

Genetic analysis. We generated and complemented random transposon insertion mutations as previously described⁶. Uninduced expression from the leaky tacp promoter of the IncQ vector was sufficient to complement in all cases. To test for adherence, we grew strains in polystyrene wells overnight and washed and stained them as previously described⁶.

Flp1-T7 localization. We grew cells containing expression plasmid pSK163 with the flp-1-T7-Tag fusion⁵ for 5 h in 10 ml A. actinomycetemcomitans growth medium (AAGM; 30 g Trypticase soy broth (BBL), 6 g yeast extract (BBL), 0.75% glucose and 0.4% sodium bicarbonate per liter) containing chloramphenicol (4 µg ml⁻¹) and 1.0 mM isopropyl-β-D-thiogalactopyranoside. We extracted whole-cell protein by removing 100 µl of cell suspension, pelleting cells by centrifugation (at 16,000g for 3 min) and resuspending the cell pellet in 20 µl of SDS loading dye. We loaded 5 µl of this mixture onto a 20% SDS–PAGE gel. To obtain Flp1-T7 protein from the supernatant, we plated cells onto AAGM solid medium containing chloramphenicol and isopropyl-β-D-thiogalactopyranoside. After incubation for 3–4 d, we scraped and resuspended cells in 1.0 ml HEPES buffer (10 mM; pH 7.4). After vortex mixing, we pelleted cells by centrifugation. We concentrated a 500-µl aliquot of supernatant ∼10× in a Microcon YM-10 (Millipore) concentrating device. We carried out SDS–PAGE (20% acrylamide) and western-blot analysis as previously described³.
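The protocol implies a few derived quantities worth checking at the bench: grams of each AAGM component for the 10-ml culture, how much original culture one whole-cell lane represents, and the approximate volume left after concentration. The sketch below computes them. It assumes the glucose and bicarbonate percentages are w/v (g per 100 ml). The helper names are illustrative.

```python
"""Sketch: arithmetic implied by the Flp1-T7 localization protocol.

Assumption: 0.75% glucose and 0.4% sodium bicarbonate are w/v,
i.e. 7.5 g and 4 g per liter.
"""
from typing import Dict

AAGM_G_PER_LITER: Dict[str, float] = {
    "Trypticase soy broth": 30.0,
    "yeast extract": 6.0,
    "glucose": 7.5,             # 0.75% w/v (assumed)
    "sodium bicarbonate": 4.0,  # 0.4% w/v (assumed)
}


def scale_recipe(per_liter: Dict[str, float], volume_ml: float) -> Dict[str, float]:
    """Grams of each component needed for volume_ml of medium."""
    return {name: grams * volume_ml / 1000.0 for name, grams in per_liter.items()}


def culture_equivalent_loaded(culture_ul: float = 100.0, resuspend_ul: float = 20.0,
                              load_ul: float = 5.0) -> float:
    """Microliters of original culture represented in one whole-cell lane."""
    return culture_ul * load_ul / resuspend_ul


def concentrated_volume(start_ul: float = 500.0, fold: float = 10.0) -> float:
    """Approximate retentate volume after a fold-x concentration."""
    return start_ul / fold


print(scale_recipe(AAGM_G_PER_LITER, 10))          # a 10-ml culture
print(culture_equivalent_loaded(), concentrated_volume())
```

With the stated volumes, each whole-cell lane carries the protein from about 25 µl of culture. The ∼10× concentration reduces the 500-µl supernatant aliquot to roughly 50 µl.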

Database searches and alignment. We identified putative homologous loci and genes using the default settings of BLAST26 to search both the finished and unfinished microbial databases. We accepted loci if they had two or more ORFs that were significantly similar to any gene in the tad locus of A. actinomycetemcomitans or of any other tad-like locus found. We then aligned amino-acid sequences deduced from each ORF in each region with other similar genes using the default settings of CLUSTALX 1.63 (European Molecular Biology Organization). Genes representing organismal phylogeny were based on the 44 ‘core’ genes listed in the data set of Brochier et al.18, which are statistically congruent based on principal component analysis. We initially combined putative homologs of tadB and tadC in the same alignment. Phylogenetic analysis of the combined tadB and tadC family suggested that the common ancestor of tadB and tadC may have been inherited as one gene that subsequently duplicated independently in gram-positive and gram-negative organisms (data not shown). To account for this possibility, we divided the TadB and TadC alignment into four independent alignments (designated TadB+ and TadC+ for gram-positive versions; TadB– and TadC– for gram-negative versions). TadC– and TadB– also included proteins from one of the two regions present in the gram-positive organism Streptomyces coelicolor, as these proteins consistently grouped with gram-negative versions in phylogenetic analysis. Similarly, we initially placed putative homologs of tadE and tadF in the same alignment. tadF genes formed a monophyletic clade after an apparent duplication event (data not shown) and were therefore moved to a separate alignment for further analysis.
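The locus-acceptance rule stated above (two or more ORFs significantly similar to a tad gene) can be written as a simple filter over BLAST hits. The sketch below assumes that hits have already been parsed into records. The `Hit` fields, the E-value cutoff of 1e-5 and the function name are hypothetical, because the Methods report only "significant" similarity under default BLAST settings. The sketch also omits the iterative step in which newly found tad-like loci served as further queries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLAST hit of a candidate ORF against a tad-locus protein (hypothetical record)."""
    orf_id: str      # ORF in the candidate genomic region
    locus_id: str    # candidate region the ORF belongs to
    tad_gene: str    # tad gene the ORF matched, e.g. "tadA"
    evalue: float


def accept_loci(hits, evalue_cutoff=1e-5, min_orfs=2):
    """Return IDs of loci with at least `min_orfs` distinct ORFs significantly similar to any tad gene.

    ASSUMPTION: "significant" is modelled as evalue <= evalue_cutoff.
    """
    significant_orfs = defaultdict(set)
    for hit in hits:
        if hit.evalue <= evalue_cutoff:
            # A set, so an ORF hitting several tad genes is counted once.
            significant_orfs[hit.locus_id].add(hit.orf_id)
    return sorted(locus for locus, orfs in significant_orfs.items() if len(orfs) >= min_orfs)


if __name__ == "__main__":
    example = [
        Hit("orf1", "regionA", "tadA", 1e-40),
        Hit("orf2", "regionA", "tadB", 1e-8),
        Hit("orf3", "regionB", "tadA", 1e-20),
        Hit("orf4", "regionB", "tadC", 0.3),   # not significant
    ]
    print(accept_loci(example))  # → ['regionA']
```

Counting distinct ORFs, rather than raw hits, keeps a single ORF that resembles several tad proteins from qualifying a region on its own.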

Phylogenetic and comparative analysis. We carried out heuristic searches for the most parsimonious trees using the ‘ratchet’ method27 as implemented in PAUPRat (Sikes & Lewis), in conjunction with PAUP 4.0b10 (Swofford). We carried out 200 replicates of the ratchet method, upweighting 15% of the characters in each iteration, using the tree-bisection-reconnection technique and saving only one tree at each step. We then applied tree-bisection-reconnection to the resulting trees. All characters and state transformations were given equal weight. To assess confidence in the resulting trees, we generated Bremer decay indices using the program Autodecay (Eriksson) and 100 bootstrap replicates in PAUP, each with 100 iterations of random addition followed by tree-bisection-reconnection. We constructed neighbor-joining trees using default settings in PAUP. We constructed maximum likelihood trees and bootstrap supports with ProML in PHYLIP 3.6 (Felsenstein) using the JTT empirical model. We implemented the maximum likelihood quartet puzzling method with TREE-PUZZLE28 using the WAG and JTT empirical models. We constructed Bayesian trees with the Markov chain Monte Carlo technique in MrBayes29 using the JTT empirical model. We used the partition homogeneity function in PAUP for ILD analyses. We carried out the LILD test at nodes that were topologically incongruent with the most parsimonious tree from both the organismal tree and the simultaneous analysis25. We used TreeMap2b24 to generate individual reconciliations of each of the three partitions with the organismal phylogeny. For computational tractability, we allowed no more than 6 horizontal transfer events, 22 losses and 20 duplications for any single reconciliation. Further testing that allowed more events did not yield more parsimonious reconciliations (data not shown). Polytomies were resolved to be congruent with other partitions. All events were weighted equally. We used the resulting optimal individual reconciliations of each partition as guides in a heuristic search for the best simultaneous total reconciliation of all three partitions on the organismal tree. No automated techniques are available for finding optimal total reconciliations, so the search was manual. To reduce its considerable mathematical complexity, we constrained any duplication or horizontal transfer event found in all of the most parsimonious individual reconciliations of each partition to exist in at least one total reconciliation. In addition, because the genes in the tad locus are closely linked, we assumed that non-vertical events predicted for more than one partition should be counted only once, rather than two or three times, in cost calculations. Further, we required all recombination and gene replacement events between tad loci to occur only when the exchanging loci were present in the same hypothetical lineage. Using these criteria, we searched for total reconciliations that minimized the total number of duplications, horizontal transfers, losses, and recombination and gene replacement events.
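The reconciliations above were computed with TreeMap2b, which also models horizontal transfers. The Python sketch below is a much simpler illustration of how an equal-weighted reconciliation cost is tallied. It maps a rooted binary gene tree onto a rooted binary species tree by least-common-ancestor (LCA) mapping and counts only duplications and losses, each with weight 1. It does not infer horizontal transfers or recombination, and it is not the TreeMap algorithm. The nested-tuple tree representation, with species names as leaves, and the function names are assumptions made for this example.

```python
"""Duplication-loss reconciliation by LCA mapping (illustrative; no horizontal transfer)."""


def index_tree(species_tree):
    """Record parent and depth for every node of a nested-tuple species tree."""
    parent, depth = {}, {}

    def walk(node, par, d):
        parent[node] = par
        depth[node] = d
        if isinstance(node, tuple):
            for child in node:
                walk(child, node, d + 1)

    walk(species_tree, None, 0)
    return parent, depth


def lca(a, b, parent, depth):
    """Least common ancestor of species-tree nodes a and b."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def reconcile(gene_tree, species_tree):
    """Return (duplications, losses) for a binary gene tree on a binary species tree."""
    parent, depth = index_tree(species_tree)
    counts = {"dup": 0, "loss": 0}

    def visit(g):
        if not isinstance(g, tuple):
            return g  # leaf: a gene sampled from species g
        left, right = (visit(child) for child in g)  # assumes exactly two children
        m = lca(left, right, parent, depth)
        # A child mapping to the same species node as its parent implies a duplication.
        is_dup = m in (left, right)
        if is_dup:
            counts["dup"] += 1
        for child_map in (left, right):
            # Species-tree nodes skipped on this edge are lineages where a copy was lost.
            counts["loss"] += depth[child_map] - depth[m] - (0 if is_dup else 1)
        return m

    visit(gene_tree)
    return counts["dup"], counts["loss"]


if __name__ == "__main__":
    species = (("A", "B"), "C")
    print(reconcile((("A", "B"), "C"), species))  # congruent gene tree
    print(reconcile((("A", "C"), "B"), species))  # incongruent gene tree
```

For the species tree (("A", "B"), "C"), the congruent gene tree costs nothing. The incongruent gene tree (("A", "C"), "B") costs one duplication and three losses under this duplication-loss-only model. A model that also allows horizontal transfer, such as the one used here, may explain an incongruence of this kind at lower total cost.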


URL. Additional information and data from this study are available at http://cpmcnet.columbia.edu/dept/figurski/WCI/wci.html.

ACKNOWLEDGMENTS
We thank M. Charleston for early release of TreeMap2b and helpful discussion, I.N. Sarkar for computer assistance, the members of D.H. Figurski’s, R.D.’s and D.H. Fine’s laboratories for helpful comments, and the multiple genome sequencing projects from which data were collected for this analysis. This work was funded by grants from the US National Institutes of Health (to D.H. Figurski and R.D.). R.D. is also supported by the Lewis B. and Dorothy Cullman Program for Molecular Systematics at the American Museum of Natural History. S.C.K. was partially supported by a training grant from the US National Institutes of Health to Columbia University. P.J.P. is supported by the Medical Scientist Training Program of Columbia University.

COMPETING INTERESTS STATEMENT
The authors declare that they have no competing financial interests.

Received 21 February; accepted 31 March 2003
Published online 28 April 2003; doi:10.1038/ng1154

1. Hacker, J. & Kaper, J.B. Pathogenicity islands and the evolution of microbes. Annu. Rev. Microbiol. 54, 641–679 (2000).

2. Inoue, T. et al. Molecular characterization of low-molecular-weight component protein, Flp, in Actinobacillus actinomycetemcomitans fimbriae. Microbiol. Immunol. 42, 253–258 (1998).

3. Kachlany, S.C., Planet, P.J., DeSalle, R., Fine, D.H. & Figurski, D.H. Genes for tight adherence of Actinobacillus actinomycetemcomitans: from plaque to plague to pond scum. Trends Microbiol. 9, 429–437 (2001).

4. Fine, D.H. et al. Colonization and persistence of rough and smooth colony variants of Actinobacillus actinomycetemcomitans in the mouths of rats. Arch. Oral Biol. 46, 1065–1078 (2001).

5. Kachlany, S.C. et al. flp-1, the first representative of a new pilin gene subfamily, is required for non-specific adherence of Actinobacillus actinomycetemcomitans. Mol. Microbiol. 40, 542–554 (2001).

6. Kachlany, S.C. et al. Nonspecific adherence by Actinobacillus actinomycetemcomitans requires genes widespread in bacteria and archaea. J. Bacteriol. 182, 6169–6176 (2000).

7. Fuller, T.E., Kennedy, M.J. & Lowery, D.E. Identification of Pasteurella multocida virulence genes in a septicemic mouse model using signature-tagged mutagenesis. Microb. Pathog. 29, 25–38 (2000).

8. Nika, J.R. et al. Haemophilus ducreyi requires the flp gene cluster for microcolony formation in vitro. Infect. Immun. 70, 2965–2975 (2002).

9. Skerker, J.M. & Shapiro, L. Identification and cell cycle control of a novel pilus system in Caulobacter crescentus. EMBO J. 19, 3223–3234 (2000).

10. Sommer, J.M. & Newton, A. Turning off flagellum rotation requires the pleiotropic gene pleD: pleA, pleC, and pleD define two morphogenic pathways in Caulobacter crescentus. J. Bacteriol. 171, 392–401 (1989).

11. Thomson, V.J., Bhattacharjee, M.K., Fine, D.H., Derbyshire, K.M. & Figurski, D.H. Direct selection of IS903 transposon insertions by use of a broad-host-range vector: isolation of catalase-deficient mutants of Actinobacillus actinomycetemcomitans. J. Bacteriol. 181, 7298–7307 (1999).

12. Haase, E.M., Zmuda, J.L. & Scannapieco, F.A. Identification and molecular analysis of rough-colony-specific outer membrane proteins of Actinobacillus actinomycetemcomitans. Infect. Immun. 67, 2901–2908 (1999).

13. Planet, P.J., Kachlany, S.C., DeSalle, R. & Figurski, D.H. Phylogeny of genes for secretion NTPases: identification of the widespread tadA subfamily and development of a diagnostic key for gene classification. Proc. Natl. Acad. Sci. USA 98, 2503–2508 (2001).

14. Thomas, N.A., Mueller, S., Klein, A. & Jarrell, K.F. Mutants in flaI and flaJ of the archaeon Methanococcus voltae are deficient in flagellum assembly. Mol. Microbiol. 46, 879–887 (2002).

15. Bhattacharjee, M.K., Kachlany, S.C., Fine, D.H. & Figurski, D.H. Nonspecific adherence and fibril biogenesis by Actinobacillus actinomycetemcomitans: TadA protein is an ATPase. J. Bacteriol. 183, 5927–5936 (2001).

16. Wang, Y., Goodman, S.D., Redfield, R.J. & Chen, C. Natural transformation and DNA uptake signal sequences in Actinobacillus actinomycetemcomitans. J. Bacteriol. 184, 3442–3449 (2002).

17. Doolittle, W.F. Phylogenetic classification and the universal tree. Science 284, 2124–2129 (1999).

18. Brochier, C., Bapteste, E., Moreira, D. & Philippe, H. Eubacterial phylogeny based on translational apparatus proteins. Trends Genet. 18, 1–5 (2002).

19. Jain, R., Rivera, M.C. & Lake, J.A. Horizontal gene transfer among genomes: the complexity hypothesis. Proc. Natl. Acad. Sci. USA 96, 3801–3806 (1999).

20. Farris, J.S., Kallersjo, M., Kluge, A.G. & Bult, C. Constructing a significance test for incongruence. Syst. Biol. 44, 570–572 (1995).

21. Brown, E.W., Kotewicz, M.L. & Cebula, T.A. Detection of recombination among Salmonella enterica strains using the incongruence length difference test. Mol. Phylogenet. Evol. 24, 102–120 (2002).

22. Dolphin, K., Belshaw, R., Orme, C.D. & Quicke, D.L. Noise and incongruence: interpreting results of the incongruence length difference test. Mol. Phylogenet. Evol. 17, 401–406 (2000).

23. Page, R.D. & Charleston, M.A. From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem. Mol. Phylogenet. Evol. 7, 231–240 (1997).

24. Charleston, M.A. Jungles: a new solution to the host/parasite phylogeny reconciliation problem. Math. Biosci. 149, 191–223 (1998).

25. Thornton, J.W. & DeSalle, R. A new method to localize and test the significance of incongruence: detecting domain shuffling in the nuclear receptor superfamily. Syst. Biol. 49, 183–201 (2000).

26. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

27. Nixon, K.C. The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15, 407–414 (1999).

28. Schmidt, H.A., Strimmer, K., Vingron, M. & von Haeseler, A. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502–504 (2002).

29. Huelsenbeck, J.P. & Ronquist, F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755 (2001).

30. Bergey, D.H., Holt, J.G. & Krieg, N.R. Bergey’s manual of systematic bacteriology vol. 4 (Williams & Wilkins, Baltimore, 1984).
