

JOURNAL OF BACTERIOLOGY, Dec. 2010, p. 6126–6135, Vol. 192, No. 23
0021-9193/10/$12.00 doi:10.1128/JB.01081-10
Copyright © 2010, American Society for Microbiology. All Rights Reserved.

Molecular Evolution of the Helicobacter pylori Vacuolating Toxin Gene vacA▿†

Kelly A. Gangwer,1,2 Carrie L. Shaffer,1 Sebastian Suerbaum,3 D. Borden Lacy,1,2,4 Timothy L. Cover,1,5,7* and Seth R. Bordenstein6*

Department of Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, Tennessee1; Center for Structural Biology, Vanderbilt University School of Medicine, Nashville, Tennessee2; Institute of Medical Microbiology and Hospital Epidemiology, Hannover Medical School, Hannover, Germany3; Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, Tennessee4; Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee5; Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee6; and Veterans Affairs Tennessee Valley Healthcare System, Nashville, Tennessee7

Received 10 September 2010/Accepted 15 September 2010

Helicobacter pylori is a genetically diverse organism that is adapted for colonization of the human stomach. All strains contain a gene encoding a secreted, pore-forming toxin known as VacA. Genetic variation at this locus could be under strong selection as H. pylori adapts to the host immune response, colonizes new human hosts, or inhabits different host environments. Here, we analyze the molecular evolution of VacA. Phylogenetic reconstructions indicate the subdivision of VacA sequences into three main groups with distinct geographic distributions. Divergence of the three groups is principally due to positively selected sequence changes in the p55 domain, a central region required for binding of the toxin to host cells. Divergent amino acids map to surface-exposed sites in the p55 crystal structure. Comparative phylogenetic analyses of vacA sequences and housekeeping gene sequences indicate that vacA does not share the same evolutionary history as the core genome. Further, rooting the VacA tree with outgroup sequences from the close relative Helicobacter acinonychis reveals that the ancestry of VacA is different from the African origin that typifies the core genome. Finally, sequence analyses of the virulence determinant CagA reveal three main groups strikingly similar to the three groups of VacA sequences. Taken together, these results indicate that positive selection has shaped the phylogenetic structure of VacA and CagA, and each of these virulence determinants has evolved separately from the core genome.

Helicobacter pylori is a Gram-negative bacterium that persistently colonizes the human stomach. H. pylori induces a gastric mucosal inflammatory response known as superficial gastritis and is a risk factor for the development of peptic ulcer disease, gastric adenocarcinoma, and gastric mucosa-associated lymphoid tissue (MALT) lymphoma (2, 43). H. pylori is present in about half of all humans throughout the world.

H. pylori strains from unrelated humans exhibit a high level of genetic diversity (5, 44). The population structure of H. pylori is panmictic, and the rate of recombination in H. pylori is reported to be among the highest in the Eubacteria (17, 44). Multilocus sequence analysis of housekeeping genes has revealed the presence of at least nine different H. pylori populations or subpopulations that are localized to distinct geographic regions (12, 27, 31). Analysis of these sequences suggests that H. pylori has spread throughout the world concurrently with the major events of human dispersal, and thus H. pylori is potentially a useful marker for the geographic migrations of human populations (12).

One of the important virulence determinants of H. pylori is a secreted toxin known as VacA. VacA is a pore-forming toxin that causes multiple alterations in human cells, including cell vacuolation, depolarization of membrane potential, alteration of mitochondrial membrane permeability, apoptosis, activation of mitogen-activated protein kinases, inhibition of antigen presentation, and inhibition of T-cell activation and proliferation (8, 10, 15). Secreted by an autotransporter (type Va) secretion mechanism, VacA is translated as a 140-kDa protoxin that undergoes N- and C-terminal cleavage during the secretion process to yield an N-terminal signal sequence, a mature 88-kDa secreted toxin known as p88, a small secreted peptide with no known function (termed secreted alpha peptide, or SAP) (7), and a C-terminal beta-barrel domain (41, 47) (Fig. 1A). Two domains of p88 VacA, p33 and p55, have been identified based on partial proteolysis of p88 into fragments of 33 kDa and 55 kDa, respectively (47) (Fig. 1A). The N-terminal p33 domain (residues 1 to 311) is involved in pore formation, while the p55 domain (residues 312 to 821) contains one or more cell-binding domains (14, 48). The isolated p55 domain binds to host cells less avidly than does the full-length p88 protein, and in contrast to p88, the isolated p55 domain is not internalized by cells (18, 48). These observations suggest that sequences in both the p33 and p55 domains mediate VacA interactions with the surface of cells.

* Corresponding author. Mailing address for T. L. Cover: Division of Infectious Diseases, A2200 Medical Center North, Vanderbilt University School of Medicine, Nashville, TN 37232. Phone: (615) 322-2035. Fax: (615) 343-6160. E-mail: [email protected]. Mailing address for S. R. Bordenstein: Department of Biological Sciences, Vanderbilt University, Box 351634, Station B, Nashville, TN 37235-1634. Phone: (615) 322-9087. Fax: (615) 343-6707. E-mail: s.bordenstein@vanderbilt.edu.

† Supplemental material for this article may be found at http://jb.asm.org/.

▿ Published ahead of print on 24 September 2010.

All strains of H. pylori contain a chromosomal vacA gene, but individual strains differ considerably in levels of VacA activity (3, 8). Two studies analyzed vacA sequences encoding a fragment of the p33 domain and did not detect any recognizable phylogenetic structure (star or bush-type pattern), presumably due to the presence of extensive recombination (19, 44). Other studies analyzed different regions of VacA and detected polymorphisms that allow classification of vacA alleles into distinct families (designated s1/s2, i1/i2, and m1/m2) depending on the presence of signature sequences in different regions of VacA (3, 4, 39). Geographic differences have been detected within several of these vacA regions (22, 24, 29, 37, 51, 52, 55). In general, strains containing vacA alleles classified as s1, i1, or m1 have been associated with an increased risk of ulcer disease or gastric cancer compared to strains containing vacA alleles classified as s2, i2, or m2 (3, 13, 39).

Another important H. pylori virulence factor is the secreted CagA effector protein. The cagA gene is localized within a 40-kb chromosomal region known as the cag pathogenicity island (PAI) (20). H. pylori strains expressing CagA are associated with a significantly increased risk for development of ulcer disease or gastric cancer compared to strains that lack the cagA gene (6). Upon entry into cells, CagA undergoes phosphorylation by host cell kinases and induces numerous alterations in cellular signaling, leading to the designation of CagA as a “bacterial oncoprotein” (20, 32).

H. pylori strains that produce an active VacA protein (type s1 VacA) typically express CagA, and strains that produce inactive VacA proteins (type s2 VacA) typically lack the cagA gene (3). vacA and the cag PAI localize to distant sites on the H. pylori chromosome, and, therefore, the basis for this association has been unclear. Recently, several studies have reported that there are complex relationships between the cellular effects of VacA and CagA, whereby VacA can downregulate CagA’s effects on epithelial cells, or vice versa (1, 35, 46, 56). This functional interaction between VacA and CagA may represent a mechanism that allows H. pylori to minimize damage to gastric epithelial cells or minimize mucosal inflammation, thereby allowing it to persistently colonize the stomach.

Although VacA is considered an important H. pylori virulence factor and hundreds of studies have classified H. pylori strains based on a vacA typing scheme, there has been very little effort to investigate the forces that drive vacA diversification, to analyze the evolutionary history of vacA, or to correlate vacA diversity with features of the VacA three-dimensional structure. Several important questions remain in studying the vacA gene: (i) Are the s1, i1, and m1 alleles (which are associated with an increased risk of gastroduodenal disease) more recently derived than the s2, i2, and m2 alleles? (ii) Are the geographic differences in vacA alleles driven by adaptive evolution or genetic drift? (iii) Does the evolutionary history of the vacA gene parallel the evolutionary history of the core genes used for MLST analysis, which are markers for ancient migrations of human populations?

In the current study, we present a comprehensive analysis of the molecular evolution of vacA. Our analysis of VacA diversity indicates that VacA sequences are clustered into three main groups with distinct geographic distributions. By analyzing topological differences between vacA and housekeeping gene phylogenetic trees, we demonstrate that the vacA gene does not share the same evolutionary history as the core genome of H. pylori. We report that the evolution of VacA has been shaped by positive selection, and adaptive evolution is restricted to the p55 domain. Most of the sequence divergence corresponds to surface-exposed amino acids in the three-dimensional structure of the p55 domain. Finally, we note that there are similarities between the phylogenetic structure of the VacA and CagA trees, and we discuss the roles that positive selection pressures have played in the evolution of these two virulence determinants.

MATERIALS AND METHODS

VacA reference sequences. VacA from strain 60190 (GenBank accession number Q48245) was used as the reference sequence for amino acid numbering, in which residue 1 refers to alanine-1 of the secreted 88-kDa VacA protein. This sequence is the prototype s1/m1 form of VacA, and the crystal structure of the p55 domain of VacA from this strain has been determined previously (14). VacA sequences from strain 95-54 (GenBank accession number U95971) and strain Tx30a (GenBank accession number Q48253) were used as reference sequences for s1/m2 and s2/m2 proteins, respectively (3, 36).

FIG. 1. Analysis of VacA phylogeography. (A) The vacA gene encodes a 140-kDa protoxin, which undergoes cleavage to yield a signal sequence, a secreted 88-kDa toxin, a secreted alpha-peptide (SAP), and a C-terminal β-barrel domain. The mature 88-kDa VacA toxin contains two domains, designated p33 and p55. The midregion sequence that defines type m1 and m2 forms of VacA is located within p55. A 21-amino-acid insertion is present in m2 forms but not m1 forms of VacA. (B) Neighbor-joining phylogenetic tree of 100 amino acid sequences of VacA. Three major groups (designated groups 1 to 3) are evident. The chart shows the number of strains analyzed and characteristics of VacA protein sequences in each group of the tree. Group 1 comprises type m1 sequences mainly from non-Asian strains, group 2 comprises m1 sequences from Asian strains, and group 3 comprises m2 sequences from both Asian and non-Asian strains. See Fig. S1 in the supplemental material for a ladder-type version of this tree.

Delineation of VacA domains. VacA domains analyzed in this study correspond to the following amino acid sequence numbers in VacA from H. pylori strain 60190: p33, residues 1 to 311; p55, residues 312 to 821; secreted alpha peptide, residues 822 to 954 (7); and the C-terminal beta-barrel domain, residues 955 to 1254. The signal sequence corresponds to residues preceding the p33 domain.

Selection of VacA and CagA sequences for phylogenetic analysis. One hundred deduced VacA amino acid sequences (86 full-length gene sequences and 14 that were complete within the region encoding the p55 domain) (for strain names see Fig. S1 in the supplemental material) were identified by a BLAST search of GenBank using the two prototype VacA sequences listed above. These sequences originated from H. pylori strains that were isolated from humans in many different regions of the world. To obtain additional VacA sequences of African origin, we analyzed vacA in five H. pylori strains that were isolated from patients in Africa and previously classified by multilocus sequence typing (MLST) analysis as HpAfrica2 (strains 191.9 and 501.9), HspSAfrica (cc2c), or HspWAfrica (D1a and D1b) (12, 27). The vacA locus was amplified using an Expand Long Template PCR System (Roche Applied Science) with the primers described in Table S1 in the supplemental material, and the vacA sequences were determined. The vacA sequences from strains D1a and D1b each contained a frameshift mutation and were excluded from subsequent phylogenetic analyses. Sequences were aligned with MUSCLE and edited manually in MacClade, version 4.08 (http://macclade.org/macclade.html) (28). The total length of aligned sequences was 1,354 amino acids. All insertions/deletions (indels) and hypervariable regions were removed manually from the alignments, resulting in a final alignment length of 1,135 amino acids for the unrooted analysis and 971 amino acids for the rooted analysis. For analysis of CagA sequences, we evaluated the group of 100 strains from which VacA sequences were available and identified 46 strains for which full-length CagA sequences were also available.

Criteria for classification of VacA sequences. VacA sequences were classified as m1 or m2 based on the absence or presence, respectively, of a 21-amino-acid insert within the p55 domain (between amino acids 475 and 476) (Fig. 1A) (3). We identified and excluded two m1/m2 chimeric VacA proteins (from strains ch2 and v225) in which tracts of recombination between m1 and m2 sequences were identifiable by eye (4, 14, 54). VacA sequences were classified as s1 or s2 based on the absence or presence, respectively, of a 9-amino-acid insertion in the signal sequence region (3). VacA sequences were classified as i1 or i2 based on amino acid substitutions that fall into two clusters, previously denoted as clusters B and C (39).
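The m1/m2 rule above can be illustrated with a short sketch. It assumes a pairwise alignment of a query sequence against the 60190 reference in which the reference string starts at residue 1 of the mature protein. The function names, the gap character, and the "at least 21 residues" threshold are illustrative assumptions, not the authors' procedure.

```python
def insertion_length(ref_aln: str, qry_aln: str, left_pos: int) -> int:
    """Count query residues aligned between reference residues
    left_pos and left_pos + 1 (1-based reference numbering)."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_index = 0
    inside = False
    count = 0
    for r, q in zip(ref_aln, qry_aln):
        if r != "-":
            ref_index += 1
            if ref_index == left_pos:
                inside = True          # start counting after this column
                continue
            if ref_index == left_pos + 1:
                break                  # reached the right flank
        if inside and r == "-" and q != "-":
            count += 1                 # query residue opposite a reference gap
    return count


def classify_midregion(ref_aln: str, qry_aln: str,
                       left_pos: int = 475, min_insert: int = 21) -> str:
    """Hypothetical rule: call m2 if the insert is at least min_insert long."""
    n = insertion_length(ref_aln, qry_aln, left_pos)
    return "m2" if n >= min_insert else "m1"
```

The same helper could be pointed at the signal sequence region for the 9-amino-acid s1/s2 insertion, given the appropriate flanking position in the alignment.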

Phylogenetic analyses. Unrooted phylogenetic distance trees based on VacA and CagA protein sequences were created using the neighbor-joining method with a Jukes-Cantor genetic distance model in Geneious, version 4.6.5 (A. J. Drummond, B. Ashton, M. Cheung, J. Heled, M. Kearse, R. Moir, T. Stones-Havas, T. Thierer, and A. Wilson, Biomatters, Auckland, New Zealand). Support for nodes on the neighbor-joining trees was assessed by 2,000 bootstrap replicates, and the majority rule consensus trees are shown. Maximum-likelihood (ML) trees based on DNA sequences were used for Shimodaira-Hasegawa (SH) statistical tests of topological congruence. Prior to ML analyses, a DNA substitution model for each data set was selected using jModelTest, version 0.1.1 (http://darwin.uvigo.es/software/modeltest.html) (38) with the corrected Akaike information criterion (AICc). ML heuristic searches were performed using 100 random taxon addition replicates with tree bisection and reconnection (TBR) branch swapping. ML bootstrap support was determined using 100 bootstrap replicates, each using 10 random taxon addition replicates with TBR branch swapping. Searches were performed in parallel on a Beowulf cluster using the clusterpaup program, written by A. G. McArthur, and PAUP, version 4.0b10 (45).
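As a rough illustration of the distance step, the sketch below computes an uncorrected p-distance between two aligned sequences and applies the general Jukes-Cantor correction, d = -b ln(1 - p/b) with b = (k - 1)/k for k character states. Using k = 20 for protein sequences is an assumption made here; the exact formula applied by Geneious is not stated in the text.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float, n_states: int = 20) -> float:
    """Jukes-Cantor corrected distance for an alphabet of n_states characters."""
    b = (n_states - 1) / n_states
    if p >= b:
        raise ValueError("sequences are saturated; distance is undefined")
    return -b * math.log(1 - p / b)
```

A matrix of such pairwise distances is the input that a neighbor-joining implementation would consume; passing n_states=4 gives the familiar nucleotide form of the correction.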

Shimodaira-Hasegawa test. The Shimodaira-Hasegawa test (42) is used to compare the topology of a maximum-likelihood (best) tree to that of an alternate evolutionary hypothesis for tree topology. We tested the significance of topological differences between the vacA and MLST phylogenetic trees and between the vacA and cagA trees using the SH test (42). The test compares the likelihood score (−lnL) of a given sequence alignment across its ML tree versus the −lnL of that data set across alternative topologies, which in this case are the ML phylogenies for other data sets. The differences in the −lnL values are evaluated for statistical significance using bootstrap (1,000 replicates) based on two methods, the resampling estimated log-likelihood (RELL) method and the more extensive full optimization. These two approaches yielded similar results.
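The RELL variant of the test can be sketched as follows. This is a simplified illustration, not the PAUP implementation. It assumes per-site log-likelihoods (lnL, so larger is better, the opposite sign of the −lnL score above) have already been computed for each candidate topology on the same alignment. The function name and input layout are hypothetical.

```python
import random


def sh_test_rell(site_lnl, n_boot=1000, seed=0):
    """Return an SH-style P value for each tree.

    site_lnl maps a tree name to its list of per-site log-likelihoods.
    """
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    if any(len(site_lnl[t]) != n_sites for t in names):
        raise ValueError("all trees need the same number of sites")
    rng = random.Random(seed)

    # Observed statistic: how far each tree falls below the best tree.
    total = {t: sum(site_lnl[t]) for t in names}
    best = max(total.values())
    observed = {t: best - total[t] for t in names}

    # RELL: resample sites with replacement and re-sum the fixed site lnL values.
    boot = {t: [] for t in names}
    for _ in range(n_boot):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        for t in names:
            col = site_lnl[t]
            boot[t].append(sum(col[i] for i in idx))

    # Center each tree's replicates, then compare with the observed statistic.
    means = {t: sum(boot[t]) / n_boot for t in names}
    exceed = {t: 0 for t in names}
    for b in range(n_boot):
        centered = {t: boot[t][b] - means[t] for t in names}
        top = max(centered.values())
        for t in names:
            if top - centered[t] >= observed[t]:
                exceed[t] += 1
    return {t: exceed[t] / n_boot for t in names}
```

A small P value for a topology indicates that it fits the alignment significantly worse than the best tree. Full optimization would instead re-estimate parameters on every bootstrap replicate, which is far more expensive.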

Reconstruction of a vacA pseudogene from Helicobacter acinonychis. The entire vacA pseudogene of H. acinonychis, corresponding to approximately nucleotides 443900 to 439500 in the genome sequence of strain Sheeba (9, 11), was translated in all three reading frames, and the translated fragments with homology to H. pylori VacA were then concatenated. The VacA protein encoded by the reconstructed H. acinonychis vacA pseudogene consists of 1,310 amino acids. A BLAST search indicates that the reconstructed H. acinonychis VacA sequence exhibits 64% amino acid identity to its closest match in H. pylori and retains a high level of relatedness to H. pylori VacA throughout the sequence.
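A minimal sketch of the three-frame translation step is given below. It uses the standard codon assignments and hypothetical helper names. The descending coordinates quoted above may indicate that the locus lies on the reverse strand, in which case the region would be reverse complemented first. Selecting and concatenating the fragments with homology to VacA is a separate step that is not shown.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str, frame: int = 0) -> str:
    """Translate one forward frame; unknown codons become 'X', stops '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def three_frames(dna: str) -> list:
    """Translations of frames 0, 1, and 2 on the given strand."""
    return [translate(dna, f) for f in range(3)]
```

Frameshifts in a pseudogene move the homologous protein sequence from one frame to another, which is why fragments from different frames have to be stitched together.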

Analysis of housekeeping genes. Nucleotide sequences of housekeeping genes were retrieved from the H. pylori multilocus sequence typing database (http://pubmlst.org/helicobacter). This database contains nucleotide sequence data (398 to 627 nucleotides per gene) for seven housekeeping genes (atpA, efp, mutY, ppa, trpC, ureI, and yphC) from each H. pylori strain included in the database (12). Concatenated nucleotide sequences were aligned using MUSCLE and edited manually in MacClade, version 4.08 (28). To permit rooting of a tree of concatenated housekeeping genes, we retrieved orthologous sequences from the H. acinonychis genome (11). PhyloBayes and MrBayes inference methods were used to generate the rooted housekeeping gene trees and posterior probability values.

Rooted phylogenetic analyses of VacA sequences. PhyloBayes, version 2.3 (http://megasun.bch.umontreal.ca/People/lartillot/www/index.htm), was used to reconstruct the VacA rooted trees based on various inference methods. These analyses were performed by applying several models of molecular evolution to the sequence alignments, including the site-homogeneous models of Jones-Taylor-Thornton (JTT) and Whelan and Goldman (WAG) and the category amino acid site-heterogeneous mixture model (CAT), to suppress tree artifacts associated with long-branch attraction (26). For all analyses, at least two independent runs were performed with free equilibrium frequencies inferred from the data and gamma-distributed rate variation with four discrete categories. Burn-ins up to 20% of the sampled trees were used until a maximum difference (MaxDiff) value of <0.15 was achieved to ensure chain equilibration.

Population genetic tests of selection. A sliding-window analysis of the ratio of nonsynonymous to synonymous substitutions (dN/dS) was performed using VacA sequences from strains 60190 (m1 type) and 95-54 (m2 type) with the program DnaSP (http://www.ub.edu/dnasp) (40). Sliding-window parameters included a window size of 50 bases and a step size of 10 bases. For further analyses, a total of 45 VacA sequences, corresponding to 15 VacA amino acid sequences from each VacA group, were retrieved from GenBank. These strains are shown in Fig. S1 in the supplemental material (in boldface). Additionally, a total of 32 CagA sequences, corresponding to CagA amino acid sequences from each CagA group (6 from group 1, 15 from group 2, and 11 from group 3), were retrieved. Sequences were assembled and aligned with Geneious and edited manually in MacClade, version 4.08 (28). All indels and hypervariable regions were removed manually. The standard McDonald-Kreitman test (http://mkt.uab.es/mkt/) (30) was carried out on full-length vacA, individual regions of vacA, and full-length cagA sequences with the exclusion of low-frequency variants less than or equal to 15% to reduce artifacts associated with detecting adaptive evolution. The neutrality index (NI) was calculated from the ratio of the number of polymorphisms to the number of substitutions as follows: NI = (Pn/Ps)/(Dn/Ds), where P is polymorphic within the population, D is divergence or fixed difference between populations, n is nonsynonymous, and s is synonymous.
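The two numerical pieces of this section can be sketched in a few lines: the neutrality index as defined above, and the window coordinates implied by a 50-base window with a 10-base step. The function names are illustrative, and the window helper only enumerates coordinates. Estimating dN/dS within each window, as DnaSP does, is not reproduced here.

```python
def neutrality_index(pn: int, ps: int, dn: int, ds: int) -> float:
    """NI = (Pn/Ps) / (Dn/Ds) from McDonald-Kreitman counts."""
    if ps == 0 or dn == 0 or ds == 0:
        raise ValueError("NI is undefined when Ps, Dn, or Ds is zero")
    return (pn / ps) / (dn / ds)


def sliding_windows(length: int, window: int = 50, step: int = 10) -> list:
    """Half-open (start, end) coordinates of every full window along a sequence."""
    return [(s, s + window) for s in range(0, length - window + 1, step)]
```

For example, counts of Pn = 10, Ps = 20, Dn = 30, and Ds = 15 give NI = 0.25. Values below 1 reflect an excess of fixed nonsynonymous differences, the pattern expected under positive selection, whereas values above 1 point to an excess of nonsynonymous polymorphism.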

Nucleotide sequence accession numbers. Sequences of the vacA genes determined in this study were deposited in GenBank under accession numbers HQ287752, HQ287753, and HQ287754.

RESULTS AND DISCUSSION

Phylogenetic analysis of VacA. As a first approach for studying phylogenetic features of VacA, we analyzed 100 complete or nearly complete VacA amino acid sequences that were available in GenBank. An unrooted phylogenetic analysis demonstrated that most of the sequences clustered into three distinct groups (Fig. 1B; see also Fig. S1 in the supplemental material), corresponding to non-Asian strains (predominantly from Australia, Kenya, the United States, and Europe; group 1), Asian strains (predominantly from China and Japan; group 2), and strains with a worldwide distribution (both Asian and non-Asian; group 3). Based on an analysis of indels that are diagnostic of previously described VacA families (3, 4, 39), all of the sequences in group 1 and group 2 were classified as type m1, and all of the sequences in group 3 were classified as type m2 (Fig. 1B). All of the VacA sequences in groups 1 and 2 contain a type s1 signal sequence region; three sequences within group 3 contain a type s2 signal sequence, and the remaining sequences contain type s1 signal sequences (Fig. 1B). Within group 3 there is a subgroup of four sequences (from strains CHN5147, CHN1811a, CHN5114a, and CHN3295b; designated subgroup II), all of which were from H. pylori strains isolated in Shanghai, China (24). The VacA sequence from strain Shi470, which was isolated from an Amerindian patient in the Amazon (25), was located in the tree between group 1 and group 2 sequences.

Phylogenetic analysis of VacA structural domains. To determine which of the structural domains in VacA have shaped the tree into three phylogeographic groups, we performed phylogenetic analyses on five putative structural domains (Fig. 1A): p55, p33, signal sequence region, secreted alpha-peptide (SAP), and the C-terminal β-barrel region (Fig. 2, p55 and p33; see also Fig. S2A to C, respectively, in the supplemental material for the other domains). There was marked variation in the general appearance of these trees. Of particular interest, the tree structures of the two domains comprising the secreted VacA toxin (p33 and p55) were markedly different from each other (Fig. 2). Only the tree for the p55 domain (427 aligned amino acids of 1,135 total amino acids) yielded a three-group pattern (Fig. 2A) that overlaps with the phylogeography of full-length VacA (Fig. 1B). The other regions exhibited tree structures (Fig. 2B; see also Fig. S2A to C) substantially different from those of full-length VacA or p55 trees. Therefore, the localization of full-length VacA sequences to three main groups (Fig. 1) is determined primarily by protein sequence divergence in the p55 region. In each of the trees, we noted that particular groups of sequences had distinct geographic distributions. Within the p55 tree, a group of sequences of Asian origin, classified as group 2 (m1 Asian) in Fig. 2, has been assigned a variety of different labels in previous publications, including m1b, m1T, and m3 (21, 29, 37, 53).

Phylogenetic incongruence between the trees of vacA andhousekeeping genes. Previous studies have classified H. pyloristrains into a set of population groups with distinct geographicdistributions, based on MLST analyses of seven housekeepinggenes (12, 27, 31). To investigate relationships between thephylogeny of housekeeping genes and the three-group struc-ture of the VacA phylogeny (Fig. 1B), we analyzed the nucle-otide sequences of vacA and housekeeping gene sequencesfrom 12 strains for which both sets of sequences were available.In addition, we determined the vacA nucleotide sequences oftwo strains previously classified as HpAfrica2 by MLST anal-ysis since this population group is known to exhibit a relativelyhigh level of divergence from other H. pylori population groups(12, 27, 31). Housekeeping genes from different strains do notdiffer substantially from one another at the protein level, and,therefore, this comparative analysis required the use of nucle-otide sequences rather than protein sequences. The overalltopology of the vacA tree was completely dissimilar from thatof the housekeeping gene tree (see Fig. S3 in the supplementalmaterial). To statistically evaluate the topological incongru-ence or congruence between the vacA and housekeeping genephylogenies, we compared the maximum-likelihood (ML) phy-logenies of the 14 taxa common to both data sets using theShimodaira-Hasegawa (SH) test. This analysis confirmed that

FIG. 2. Neighbor-joining phylogeny of the VacA p55 domain and VacA p33 domain. (A) Three main groups (designated groups 1 to 3) are detected within this tree. The chart shows the number of strains analyzed and characteristics of VacA sequences in each group of the tree. This tree maintains the same pattern as the VacA full-length tree shown in Fig. 1. The nomenclature for the primary VacA p55 groups (groups 1, 2, and 3) is consistent with the nomenclature of groups in the full-length VacA tree (Fig. 1). (B) Two main groups are evident, designated group Ap33 and group Bp33. The chart shows the number of strains analyzed and characteristics of VacA sequences in each group of the tree. The sequences in group Ap33 were localized in groups 1, 2, and 3 of the full-length VacA tree (Fig. 1B) and groups 1, 2, and 3 of the p55 tree (panel A), and sequences in group Bp33 were all localized in group 3 of the full-length VacA tree and group 3 of the p55 tree. Divergence between group Ap33 and group Bp33 reflects differences within the VacA intermediate region (39). The sequences in group Ap33 are characterized as type i1, with the exception of two sequences that appear to be i1-i2 hybrids, and sequences in group Bp33 are exclusively characterized as type i2.


the vacA and housekeeping tree topologies are significantly different (P < 0.0001) (Table 1; see also Fig. S3), indicating that the vacA toxin gene has a different evolutionary history from that of the core genes of H. pylori.
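The logic of an SH comparison can be illustrated with a minimal Python sketch. It assumes that per-site log-likelihoods are already available for each candidate topology, and it uses plain RELL resampling without re-optimization, so it simplifies the procedure summarized in Table 1. The function name `sh_test`, its parameters, and the toy values in the usage note are hypothetical and are not data from this study.

```python
import random

def sh_test(site_lnl, replicates=1000, seed=0):
    """Shimodaira-Hasegawa p-values from per-site log-likelihoods.

    site_lnl maps a topology name to a list of per-site lnL values
    (all lists must have the same length). Returns {name: p_value}.
    """
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])

    # Observed statistic: how far each topology falls below the best one.
    total = {t: sum(site_lnl[t]) for t in names}
    best = max(total.values())
    observed = {t: best - total[t] for t in names}

    # RELL bootstrap: resample site indices and reuse the fixed per-site lnL.
    boot = {t: [] for t in names}
    for _ in range(replicates):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        for t in names:
            boot[t].append(sum(site_lnl[t][i] for i in idx))

    # Center each topology's bootstrap distribution on zero.
    centered = {}
    for t in names:
        mean = sum(boot[t]) / replicates
        centered[t] = [x - mean for x in boot[t]]

    # p-value: share of replicates whose centered gap reaches the observed gap.
    p_values = {}
    for t in names:
        hits = 0
        for r in range(replicates):
            top = max(centered[u][r] for u in names)
            if top - centered[t][r] >= observed[t]:
                hits += 1
        p_values[t] = hits / replicates
    return p_values
```

For example, `sh_test({"own_tree": lnl_a, "alt_tree": lnl_b})` returns one p-value per topology. A small p-value for the alternative topology indicates that it fits the data set significantly worse than the best topology.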

Rooted phylogenetic analyses of VacA and housekeeping genes. To further compare the ancestry of vacA with that of the MLST core genes, we generated and compared a rooted tree of full-length VacA protein sequences with a rooted tree of concatenated housekeeping gene sequences, using the same strains used in the SH test described above. We used the corresponding nucleotide sequences from housekeeping genes from the close relative H. acinonychis to root the housekeeping gene tree (11), and we used the deduced protein sequence of a reconstructed vacA pseudogene in H. acinonychis as an outgroup to root the VacA tree (11). We could not use vacA nucleotide sequences in this analysis because of the extremely high nucleotide divergence between the outgroup sequence and the ingroup. Nonetheless, the VacA protein and vacA nucleotide trees of the ingroup recapitulate the same three-group phylogenies (data not shown).

The Bayesian root for the MLST housekeeping gene tree is confidently positioned in taxa classified as HpAfrica2, a population currently found almost exclusively in South Africa (12, 27) (Fig. 3A). We performed a second, rooted analysis using a larger MLST data set consisting of 61 sequences from representative H. pylori strains that previously had been classified into nine geographically distinct populations and subpopulations (12). The root was again positioned in taxa classified as HpAfrica2; the next most closely related taxa are also from Africa and are classified as HspSouth Africa or HspWest Africa subpopulations. Results were similar between the smaller and larger data sets, and thus there was no effect of taxon selection on the placement of the MLST root (compare Fig. 3A and Fig. S4 in the supplemental material). To further confirm the rooting position in African populations, we excluded the HpAfrica2 taxa and repeated the analysis. In this case, the root is positioned in taxa classified as HspSouth Africa subpopulation (data not shown). Taken together, the confident placement of the MLST rooting in the African taxa confirms previous reports of an ancient African origin for H. pylori in humans (12, 27).

A 971-amino-acid sequence alignment of the reconstructed VacA amino acid sequence of H. acinonychis with 14 ingroup taxa yields a Bayesian phylogeny (Fig. 3B) with the VacA root confidently positioned at the B38 taxon (an s2/m2 form of VacA from a strain classified as HpEurope based on MLST analysis). We also created a rooted tree for a larger data set using 24 ingroup taxa (see Fig. S5 in the supplemental material), corresponding to H. pylori VacA sequences that were representative of VacA groups 1 to 3 in the unrooted tree (Fig. 1). In this analysis, the VacA root is confidently positioned in the CHN3295 and CHN5147 taxa (see Fig. S5). These two strains also have m2 sequence characteristics and belong to the group 3 subgroup II of the full-length VacA tree (Fig. 1). Thus, the root of the VacA tree is unexpectedly positioned in m2 taxa of Chinese origin, rather than taxa of African origin, with the next branch consisting of m2 sequences of non-Asian origin.

This analysis does not allow us to determine with confidence

TABLE 1. Results of Shimodaira-Hasegawa test of alternative tree topologies for housekeeping and vacA genes^a

Topology          Likelihood score for data set
                  Core genes       vacA
Core gene tree    −9,773.96        −14,725.85*
vacA tree         −10,156.12*      −14,247.38

^a Data set denotes the alignment of the concatenated core genes and the vacA gene. Topology denotes the maximum-likelihood trees shown in Fig. S3 in the supplemental material. The likelihood scores (−lnL) are shown in the table and are based on comparing each data set across its own ML tree topology and the alternative topology. The lowest (best) likelihood scores are indicated in boldface for each data set. Significance of the likelihood differences from the comparisons of a common data set across different topologies was measured using a bootstrap approach with RELL sampling and full optimization for 1,000 replicates. For example, the score from the comparison of the core genes data set against the core gene topology (−9773.96) is significantly better than the score from the alternative comparison of the core genes data set against the vacA gene topology (−10156.12). *, P < 0.001.

FIG. 3. Comparative phylogenetic analyses of housekeeping gene sequences and VacA sequences. Rooted MrBayes trees of concatenated housekeeping gene sequences and VacA sequences. (A) Nucleotide sequences of seven housekeeping genes (atpA, efp, mutY, ppa, trpC, ureI, and yphC) from 14 strains of H. pylori and one outgroup, H. acinonychis, were analyzed. A classification of H. pylori strains into populations or subpopulations based on MLST analysis (12, 27) is shown in parentheses. Housekeeping genes are referred to as core genes. (B) Deduced amino acid sequences of VacA from the same 14 strains of H. pylori and one outgroup, H. acinonychis, were analyzed. The numbers represent Bayesian posterior probability values for each node.


whether the s1, i1, and m1 forms of VacA are more recently derived than the s2, i2, and m2 forms. However, these data suggest that m2 forms of VacA were present at the time when H. pylori and H. acinonychis diverged from a common ancestor. Three observations suggest that the VacA rooting position is accurate. First, the VacA rooting position in m2 Asian sequences is supported with two different inference methods, MrBayes and PhyloBayes. Second, the VacA rooting position is not altered by the choice among three different models of evolution (CAT, WAG, and JTT) or by removal or addition of other m2 Asian sequences and hypervariable regions; in particular, the probabilistic inference model, CAT, accounts for across-site heterogeneities and can handle model misspecifications associated with long-branch attraction artifacts (26). Third, the maximum amino acid identity of the VacA outgroup with Asian m2 (65.4%) or non-Asian m2 sequences (64.0%) is greater than that of the outgroup with Asian m1 (57.4%) or non-Asian m1 sequences (51.1%).
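The identity comparison in the third observation can be sketched in Python as follows. This is an illustration only: it assumes the sequences are already aligned to equal length and that gapped columns are excluded from the count, which may differ from the convention used for the percentages reported above. The function names are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

def max_identity(outgroup, group):
    """Highest percent identity between the outgroup and any member of a group."""
    return max(percent_identity(outgroup, seq) for seq in group)
```

Calling `max_identity` once per group (for example, Asian m2, non-Asian m2, Asian m1, and non-Asian m1) gives one value per group that can be compared directly.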

Positive selection in VacA. A relatively high level of divergence within the p55 VacA cell-binding domain compared to other domains may reflect relaxed constraint on that portion of the sequence or positive selection if amino acid replacements confer a selective advantage. In the latter case, we expect to observe an accumulation of nonsynonymous changes (dN) at a rate higher than that of synonymous changes (dS). Previous studies failed to detect positive selection in VacA based on analyses of dN/dS ratios (4), but such analyses are known to lack sensitivity when applied to large segments of a gene. As another approach for investigating the evolutionary pressures acting on VacA, we first analyzed vacA sequences for positive selection (dN/dS of >1) using a sliding-window analysis with full-length vacA sequences from strains 60190 (type m1 non-Asian, group 1) and 95-54 (type m2, group 3). The crystal structure of the p55 domain is available for VacA from strain 60190 (14), and VacA from strain 95-54 is known to exhibit a different cell type specificity from that of VacA from strain 60190 (36). dN/dS ratios greater than 1 were observed mainly in one portion of the vacA sequence, the p55 cell-binding domain (Fig. 4).
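The idea behind a sliding-window scan can be sketched in Python. This is a simplified illustration, not the estimator used for Fig. 4: it only tallies raw synonymous and nonsynonymous codon differences between two aligned, in-frame coding sequences, without normalizing by the number of synonymous and nonsynonymous sites. Windows are measured in codons rather than bases, and all function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_codon_pairs(seq_a, seq_b):
    """Label each aligned codon pair as 'same', 'syn', 'nonsyn', or 'skip'."""
    labels = []
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        ca = seq_a[i:i + 3].upper()
        cb = seq_b[i:i + 3].upper()
        if ca not in CODON_TABLE or cb not in CODON_TABLE:
            labels.append("skip")      # gaps or ambiguous bases
        elif ca == cb:
            labels.append("same")
        elif CODON_TABLE[ca] == CODON_TABLE[cb]:
            labels.append("syn")       # different codon, same amino acid
        else:
            labels.append("nonsyn")    # amino acid replacement
    return labels

def sliding_windows(labels, window, step):
    """Return (start_codon, n_syn, n_nonsyn) for each window along the alignment."""
    out = []
    for start in range(0, len(labels) - window + 1, step):
        chunk = labels[start:start + window]
        out.append((start, chunk.count("syn"), chunk.count("nonsyn")))
    return out
```

Windows in which nonsynonymous differences clearly outnumber synonymous ones are candidates for closer inspection. A proper dN/dS estimate additionally requires per-site normalization and a correction for multiple substitutions.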

To follow up the observation of elevated dN/dS ratios in the portion of vacA encoding the p55 cell-binding domain, we collected full-length DNA sequences of 15 vacA alleles from each of the three main groups (Fig. 1, groups 1 to 3). We used the McDonald-Kreitman test (MKT) (30) to investigate if adaptive evolution in the p55 domain is driving the divergence of the three groups. The MKT tests the neutral theory prediction that the ratio of synonymous-to-nonsynonymous polymorphism (Ps/Pn) within groups should be the same as the ratio of synonymous-to-nonsynonymous divergence (Ds/Dn) between groups. It was used previously to detect positive selection in an H. pylori sel1 homolog (33). The results indicate a significant deviation from neutrality when full-length vacA sequences and the p55 domain (Table 2) (P ≤ 0.001) are analyzed but not when the p33 domain or other regions are analyzed. Excess nonsynonymous fixation, one signature of adaptive protein evolution, causes the neutrality index (NI) in the MKT to be less than 1. For all statistically significant MKT comparisons, the NI was ≤0.53 (Table 2). These results confirm the sliding-window analysis and indicate that the divergence in the p55 cell-binding domain is due to strong positive selection. Serum antibody responses to VacA are known to be directed predominantly against the p55 domain rather than the p33 domain (16). Therefore, immune selective pressure could potentially be one of the important forces that drive positive selection within the p55 domain. In addition, it is possible that diversification represents functional adaptation of the p55 domain to interact with different receptors or targets in host cells (36).
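The neutrality index and the derived proportion of adaptive substitutions follow directly from the four MKT counts, using the definitions given in the footnotes of Table 2: NI = (Pn/Ps)/(Dn/Ds) and α = 1 − NI. The short Python sketch below computes both. The function name `mkt_indices` is hypothetical, and returning `None` when the index is undefined is an assumption about how the "NA" entries in Table 2 arise. No significance test is included.

```python
def mkt_indices(dn, ds, pn, ps):
    """Return (NI, alpha) for one McDonald-Kreitman comparison, or None if undefined.

    dn, ds: nonsynonymous and synonymous divergence between groups
    pn, ps: nonsynonymous and synonymous polymorphisms within groups
    """
    denominator = ps * dn
    if denominator == 0:
        return None  # NI cannot be computed (e.g., no nonsynonymous divergence)
    # (Pn/Ps) / (Dn/Ds), cross-multiplied so that Ds = 0 does not divide by zero.
    ni = (pn * ds) / denominator
    alpha = 1.0 - ni
    return ni, alpha

# p55 domain, group 2 versus group 3 (counts taken from Table 2).
p55_result = mkt_indices(dn=153.49, ds=75.68, pn=99, ps=92)
```

An NI below 1 (equivalently, a positive α) reflects an excess of nonsynonymous divergence between groups relative to polymorphism within groups.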

Surface exposure of divergent amino acids within the p55 domain. Comparative sequence analyses revealed three main families of p55 domain sequences (Fig. 2), and divergence within this domain is the result of positive selection. To investigate the location of divergent amino acids within the three-dimensional structure of the p55 domain, we used the sequence of the p55 domain from H. pylori strain 60190 as a reference for VacA sequences classified as group 1 (m1 non-Asian) in Fig. 1 and 2 since a crystal structure is available for the p55 domain from this strain. All of the VacA sequences from group 2 (Fig. 1, m1 Asian) were aligned to generate an m1 Asian consensus sequence, and, similarly, all of the VacA sequences from group 3 (Fig. 1, m2) were aligned to generate an m2 consensus sequence. We then compared the reference (group 1) sequence with the two consensus sequences and identified the sites of divergent amino acids within the p55 crystal structure.

FIG. 4. Sliding-window analysis of vacA from H. pylori strains 60190 and 95-54. vacA sequences from strains 60190 (non-Asian m1 type) and 95-54 (non-Asian m2 type) were aligned, and dN/dS ratios were calculated using DnaSP with a sliding window of 50 bases and a 10-bp step size. A dN/dS value of >1 indicates positive selection.

In a comparison of the reference group 1 (m1 non-Asian) VacA sequence with that of the group 2 (m1 Asian) consensus sequence, 30 sites differed, and 28 of these sites were surface exposed (Fig. 5A). A total of 109 sites differed when the reference VacA sequence was compared with the group 3 (m2) consensus sequence, and 95 were surface exposed (Fig. 5B). Interestingly, 17 of the 30 divergent amino acids identified in the first comparison (reference versus m1 Asian) were also divergent in the second comparison (reference versus m2), and 15 of these correspond to surface-exposed residues. At 10 of these 17 sites, each of the three populations contains a distinct amino acid substitution, which suggests that the observed divergence has resulted from multiple independent bouts of evolutionary changes. Divergent amino acids often appeared as contiguous sites (or clusters of amino acids) within the p55 crystal structure (Fig. 5). In particular, VacA sequences classified as m2 contained a set of divergent amino acids corresponding to a longitudinal contiguous patch within the crystal structure (Fig. 5B). A surface-exposed location of divergent amino acids is consistent with the hypothesis that these residues may be subject to antibody recognition. Moreover, alterations in amino acids found on the surface of VacA could potentially lead to alterations in the interactions of VacA with host cells.
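The consensus-and-compare step described in this section can be sketched in Python. The sketch assumes that all sequences are already aligned to the same length and takes the most frequent residue in each column as the consensus, which may differ from the consensus rule actually applied. The function names and the toy sequences in the usage note are hypothetical.

```python
from collections import Counter

def consensus(aligned_seqs):
    """Majority residue at each alignment column (ties go to the residue seen first)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned_seqs))

def divergent_sites(reference, other):
    """1-based alignment positions where two aligned sequences differ."""
    return [i for i, (r, o) in enumerate(zip(reference, other), start=1) if r != o]

def shared_divergent_sites(reference, consensus_a, consensus_b):
    """Positions where the reference differs from both consensus sequences."""
    in_a = set(divergent_sites(reference, consensus_a))
    in_b = set(divergent_sites(reference, consensus_b))
    return sorted(in_a & in_b)
```

With toy input such as `consensus(["ACDE", "ACDF", "ACGF"])`, the per-column majority is taken first. The resulting string can then be passed to `divergent_sites` together with the reference sequence, and the positions returned can be looked up in a structure to check surface exposure.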

CagA phylogeography and adaptive evolution. In H. pylori strains 26695 and J99, vacA and cagA (encoding the secreted effector protein CagA) are located ~350 kb apart in the genomes. Recently, it has been shown that VacA can downregulate CagA’s effects on epithelial cells and that CagA can protect cells against the apoptotic effects of VacA (35, 46). Furthermore, VacA can counteract the ability of CagA to activate nuclear factor of activated T cells (NFAT) in gastric epithelial cells (56). We thus hypothesized that these two genes share an evolutionary history characterized by co- or counteradaptations in response to a common selective pressure.

We identified 46 H. pylori strains for which both VacA and CagA sequences were available. Phylogenetic analysis of the full-length CagA sequences revealed three groups (Fig. 6). Clustering of CagA sequences from strains of East Asian origin in a distinct group is consistent with results of previous studies (50, 55). Notably, the overall appearance of the CagA tree is very similar to the phylogeny of VacA (Fig. 1). Group 1 in the CagA tree consists of seven sequences that are predominantly from non-Asian strains (Fig. 6). The corresponding VacA sequences from most of these strains are characterized as m1 non-Asian and are found in group 1 in the full-length VacA tree (Fig. 1). CagA group 2 consists of 25 exclusively Asian

TABLE 2. Analysis of positive selection in vacA using the McDonald-Kreitman test^a

Domain of VacA^b     Dn       Ds      Pn    Ps    P value   NI      α-Value^c

Group 2 vs group 3
  Full-length        146.06   68.03   248   286   0         0.408   0.591
  Signal sequence    0        0       17    19    NA        NA      NA
  p33                0        0       57    75    NA        NA      NA
  p55                153.49   75.68   99    92    0.001     0.53    0.469
  SAP                1        0       22    22    0.322     0       1
  β-Barrel           0        0       72    100   NA        NA      NA

Group 1 vs group 3
  Full-length        132.02   63.09   353   393   0         0.429   0.57
  Signal sequence    0        0       21    19    NA        NA      NA
  p33                3        2.01    78    104   0.446     0.501   0.498
  p55                130.93   61.1    143   140   0         0.476   0.523
  SAP                3.02     3.07    33    29    0.862     1.157   −0.157
  β-Barrel           1        1       78    101   0.856     0.774   0.225

Group 2 vs group 1
  Full-length        46.51    31.81   194   314   0         0.422   0.577
  Signal sequence    0        0       12    16    NA        NA      NA
  p33                4.01     5.08    31    75    0.348     0.523   0.476
  p55                32.64    17.62   70    110   0.001     0.343   0.656
  SAP                6.08     2.03    27    29    0.154     0.311   0.688
  β-Barrel           4.01     7.18    54    84    0.828     1.15    −0.15

^a The neutrality index (NI) was calculated from the ratio of the number of polymorphisms to the number of substitutions as follows: NI = (Pn/Ps)/(Dn/Ds), where P is polymorphic within the population, D is divergence or fixed difference between populations, n is nonsynonymous, and s is synonymous. Shaded lines indicate statistically significant results that are indicative of positive selection. NA, not applicable.

^b Group 1, m1 non-Asian; group 2, m1 Asian; group 3, m2.

^c The proportion of adaptive substitutions that ranges from −∞ to 1 and is estimated as 1 − NI.

FIG. 5. Divergent amino acids within the VacA p55 domain. The three-dimensional structure of the VacA p55 domain (amino acids 355 to 811) from H. pylori 60190 (classified as group 1 in Fig. 1 and 2) was used as a reference for mapping divergent amino acids. (A) VacA sequences classified in group 2 (m1 Asian) were aligned, and a consensus sequence was determined. Differences between the sequence of VacA from strain 60190 and the m1 Asian consensus sequence are highlighted in blue. (B) VacA sequences classified in group 3 (m2) were aligned, and a consensus sequence was determined. Differences between the sequence of VacA from strain 60190 and the m2 consensus sequence are highlighted in blue.


sequences; most of the corresponding VacA sequences are characterized as m1 Asian and are found in group 2 in the full-length VacA tree. Finally, CagA group 3 consists of 11 exclusively Asian sequences; most of the corresponding VacA sequences are characterized as m2 and are found in group 3 in the full-length VacA tree. The observed similarities in the CagA and VacA phylogenetic trees suggest that CagA and VacA might be coevolving due to similar selective pressures. In support of this hypothesis, a McDonald-Kreitman test indicates that positive selection has shaped CagA divergence of group 2 from both group 3 (P = 0.005; NI of 0.58) and group 1 (P = 0.018; NI of 0.66). Two recent publications also detected positive selection when cagA sequences were analyzed (34, 49), and a recent paper reported an association between particular cagA motifs and specific vacA types in a different group of strains (23). Thus, there are striking similarities between the topologies of VacA and CagA trees, and positive selection has shaped the phylogenetic structures of both of these virulence determinants.

Phylogenetic incongruence between the trees of vacA and cagA. We next sought to investigate more rigorously the evolutionary relationships between vacA and cagA. Generation of a rooted tree of CagA sequences is not possible because cagA is not present in H. acinonychis and is not currently known to be present in any species other than H. pylori. Therefore, we statistically tested the topological similarity between vacA and cagA phylogenies.

For this analysis, we selected 28 H. pylori strains containing VacA and CagA sequences that were representative of the three different groups. We compared the ML phylogenies based on nucleotide sequences of the 28 taxa common to both data sets using the SH test. We used vacA nucleotide sequences in this analysis instead of protein sequences because the nucleotide differences provide increased resolution for the topology comparisons, and the trees are unrooted, which obviates the need to use protein sequences that are more conserved. Nonetheless, the vacA and cagA nucleotide trees of the ingroup recapitulate the same three-group phylogenies (see Fig. S6 in the supplemental material). Despite the grouping similarities in the vacA and cagA trees, the topologies are significantly different based on the SH test (P < 0.001) (Table 3; see also Fig. S6), indicating that the vacA toxin gene has not coevolved in strict concert with cagA. This result is not unexpected. First, there are strain differences (OK111, J99, F37, and Shi470) in the two phylogenies that can account for this statistical result (see Fig. S6). Second, there are fine-scale differences in evolutionary relationships within the groups that are not congruent when the vacA and cagA trees are compared. Repeating the SH test after removal of the four major outliers again yielded a significant difference between the topologies (data not shown).

The most parsimonious explanation for the fine-scale differences between the genes, and yet the broad similarities in phylogeographic patterns and patterns of adaptive evolution, is that historical bouts of adaptation drove the parallel divergence of both cagA and vacA. However, more recent evolutionary changes at the tips of the three groups have scrambled any support for statistical concordance. These recent changes within groups could now be occurring by either drift or selection unrelated to the ancestral changes that drove the three groups’ common divergence. Thus, we hypothesize that VacA and CagA functionally interact most effectively when they are from the same group (i.e., group 1 VacA interacts most efficiently with group 1 CagA, etc.).

Conclusions. In summary, our key findings indicate, first, that VacA sequences can be classified into three distinct groups on the basis of amino acid sequences and that different VacA domains exhibit different evolutionary histories. Second, VacA has undergone strong divergence and positive selection in the p55 cell-binding domain, which is consistent with humoral immune recognition of this domain; a result may be optimized binding of VacA to different receptors or targets in

TABLE 3. Results of Shimodaira-Hasegawa test of alternative tree topologies for cagA and vacA genes^a

Topology      Likelihood score for data set
              cagA             vacA
cagA tree     −14,893.50       −18,862.81*
vacA tree     −16,922.57*      −17,537.20

^a Data set denotes the alignments of the vacA and cagA genes. Topology denotes the maximum-likelihood trees shown in Fig. S6 in the supplemental material. The likelihood scores (−lnL) are shown in the table and are based on comparing each data set across its own ML tree topology and the alternative topology. The lowest (best) likelihood scores are indicated in boldface for each data set. Significance of the likelihood differences from the comparisons of a common data set across different topologies was measured using a bootstrap approach with RELL sampling and full optimization for 1,000 replicates. For example, the score from the comparison of the cagA data set against the cagA topology (−14893.50) is significantly better than the score from the alternative comparison of the cagA data set against the vacA topology (−16922.57). *, P < 0.001.

FIG. 6. Analysis of CagA phylogeography. Neighbor-joining phylogenetic tree of 46 CagA amino acid sequences. Three major groups are evident: group 1 consists predominantly of sequences from non-Asian strains, group 2 consists of Asian sequences, and group 3 consists of Asian sequences. The chart shows the number of strains analyzed and characteristics of VacA sequences in each group of the tree. CagA sequences shown in groups 1 and 2 correspond to H. pylori strains containing type m1 VacA (groups 1 and 2 of Fig. 1), whereas CagA sequences shown in group 3 correspond to strains containing type m2 VacA (group 3 of Fig. 1). The nomenclature for the primary CagA groups (groups 1, 2, and 3) is consistent with the nomenclature of groups in the full-length VacA tree (Fig. 1).


host cells. Third, divergent amino acids map to surface-exposed sites in the p55 domain. Fourth, the phylogeographic features of VacA and CagA are surprisingly similar yet markedly different from the phylogeographic features of housekeeping genes, which reflect a global spread of H. pylori out of Africa. We speculate that there is likely a related selective pressure on both VacA and CagA. Since there is substantial physical distance between vacA and cag genes within the bacterial genome, this selection could arise by a form of pseudolinkage of functionally interacting genes that perhaps balances proinflammatory and anti-inflammatory characteristics of strains to facilitate long-term colonization of the human gastric mucosa.

ACKNOWLEDGMENTS

This work was supported by National Institutes of Health grants R01 AI39657, R01 AI068009, and P01 CA116087 (to T.L.C.) and R01 GM085163 (to S.R.B.), by German Research Foundation grant SFB 900/A1 (to S.S.), and by funding from the Department of Veterans Affairs (to T.L.C.) and the Burroughs Wellcome Fund (to D.B.L.).

REFERENCES

1. Argent, R. H., R. J. Thomas, D. P. Letley, M. G. Rittig, K. R. Hardie, and J. C. Atherton. 2008. Functional association between the Helicobacter pylori virulence factors VacA and CagA. J. Med. Microbiol. 57:145–150.

2. Atherton, J. C., and M. J. Blaser. 2009. Coadaptation of Helicobacter pylori and humans: ancient history, modern implications. J. Clin. Invest. 119:2475–2487.

3. Atherton, J. C., P. Cao, R. M. Peek, Jr., M. K. Tummuru, M. J. Blaser, and T. L. Cover. 1995. Mosaicism in vacuolating cytotoxin alleles of Helicobacter pylori. Association of specific vacA types with cytotoxin production and peptic ulceration. J. Biol. Chem. 270:17771–17777.

4. Atherton, J. C., P. M. Sharp, T. L. Cover, G. Gonzalez-Valencia, R. M. Peek, Jr., S. A. Thompson, C. J. Hawkey, and M. J. Blaser. 1999. Vacuolating cytotoxin (vacA) alleles of Helicobacter pylori comprise two geographically widespread types, m1 and m2, and have evolved through limited recombination. Curr. Microbiol. 39:211–218.

5. Blaser, M. J., and D. E. Berg. 2001. Helicobacter pylori genetic diversity and risk of human disease. J. Clin. Invest. 107:767–773.

6. Blaser, M. J., G. I. Perez-Perez, H. Kleanthous, T. L. Cover, R. M. Peek, P. H. Chyou, G. N. Stemmermann, and A. Nomura. 1995. Infection with Helicobacter pylori strains possessing cagA is associated with an increased risk of developing adenocarcinoma of the stomach. Cancer Res. 55:2111–2115.

7. Bumann, D., S. Aksu, M. Wendland, K. Janek, U. Zimny-Arndt, N. Sabarth, T. F. Meyer, and P. R. Jungblut. 2002. Proteome analysis of secreted proteins of the gastric pathogen Helicobacter pylori. Infect. Immun. 70:3396–3403.

8. Cover, T. L., and S. R. Blanke. 2005. Helicobacter pylori VacA, a paradigm for toxin multifunctionality. Nat. Rev. Microbiol. 3:320–332.

9. Dailidiene, D., G. Dailide, K. Ogura, M. Zhang, A. K. Mukhopadhyay, K. A. Eaton, G. Cattoli, J. G. Kusters, and D. E. Berg. 2004. Helicobacter acinonychis: genetic and rodent infection studies of a Helicobacter pylori-like gastric pathogen of cheetahs and other big cats. J. Bacteriol. 186:356–365.

10. de Bernard, M., A. Cappon, G. Del Giudice, R. Rappuoli, and C. Montecucco. 2004. The multiple cellular activities of the VacA cytotoxin of Helicobacter pylori. Int. J. Med. Microbiol. 293:589–597.

11. Eppinger, M., C. Baar, B. Linz, G. Raddatz, C. Lanz, H. Keller, G. Morelli, H. Gressmann, M. Achtman, and S. C. Schuster. 2006. Who ate whom? Adaptive Helicobacter genomic changes that accompanied a host jump from early humans to large felines. PLoS Genet. 2:e120.

12. Falush, D., T. Wirth, B. Linz, J. K. Pritchard, M. Stephens, M. Kidd, M. J. Blaser, D. Y. Graham, S. Vacher, G. I. Perez-Perez, Y. Yamaoka, F. Megraud, K. Otto, U. Reichard, E. Katzowitsch, X. Wang, M. Achtman, and S. Suerbaum. 2003. Traces of human migrations in Helicobacter pylori populations. Science 299:1582–1585.

13. Figueiredo, C., J. C. Machado, P. Pharoah, R. Seruca, S. Sousa, R. Carvalho, A. F. Capelinha, W. Quint, C. Caldas, L. J. van Doorn, F. Carneiro, and M. Sobrinho-Simoes. 2002. Helicobacter pylori and interleukin 1 genotyping: an opportunity to identify high-risk individuals for gastric carcinoma. J. Natl. Cancer Inst. 94:1680–1687.

14. Gangwer, K. A., D. J. Mushrush, D. L. Stauff, B. Spiller, M. S. McClain, T. L. Cover, and D. B. Lacy. 2007. Crystal structure of the Helicobacter pylori vacuolating toxin p55 domain. Proc. Natl. Acad. Sci. U. S. A. 104:16293–16298.

15. Gebert, B., W. Fischer, and R. Haas. 2004. The Helicobacter pylori vacuolating cytotoxin: from cellular vacuolation to immunosuppressive activities. Rev. Physiol. Biochem. Pharmacol. 152:205–220.

16. Ghose, C., G. I. Perez-Perez, V. J. Torres, M. Crosatti, A. Nomura, R. M. Peek, Jr., T. L. Cover, F. Francois, and M. J. Blaser. 2007. Serological assays for identification of human gastric colonization by Helicobacter pylori strains expressing VacA m1 or m2. Clin. Vaccine Immunol. 14:442–450.

17. Go, M. F., V. Kapur, D. Y. Graham, and J. M. Musser. 1996. Population genetic analysis of Helicobacter pylori by multilocus enzyme electrophoresis: extensive allelic diversity and recombinational population structure. J. Bacteriol. 178:3934–3938.

18. Gonzalez-Rivera, C., K. A. Gangwer, M. S. McClain, I. M. Eli, M. G. Chambers, M. D. Ohi, D. B. Lacy, and T. L. Cover. 2010. Reconstitution of Helicobacter pylori VacA toxin from purified components. Biochemistry 49:5743–5752.

19. Gottke, M. U., C. A. Fallone, A. N. Barkun, K. Vogt, V. Loo, M. Trautmann, J. Z. Tong, T. N. Nguyen, T. Fainsilber, H. H. Hahn, J. Korber, A. Lowe, and R. N. Beech. 2000. Genetic variability determinants of Helicobacter pylori: influence of clinical background and geographic origin of isolates. J. Infect. Dis. 181:1674–1681.

20. Hatakeyama, M. 2004. Oncogenic mechanisms of the Helicobacter pyloriCagA protein. Nat. Rev. Cancer. 4:688–694.

21. Ito, Y., T. Azuma, S. Ito, H. Miyaji, M. Hirai, Y. Yamazaki, F. Sato, T. Kato,Y. Kohli, and M. Kuriyama. 1997. Analysis and typing of the vacA gene fromcagA-positive strains of Helicobacter pylori isolated in Japan. J. Clin. Micro-biol. 35:1710–1714.

22. Ito, Y., T. Azuma, S. Ito, H. Suto, H. Miyaji, Y. Yamazaki, Y. Kohli, and M.Kuriyama. 1998. Full-length sequence analysis of the vacA gene from cyto-toxic and noncytotoxic Helicobacter pylori. J. Infect. Dis. 178:1391–1398.

23. Jang, S., K. R. Jones, C. H. Olsen, Y. M. Joo, Y.-J. Yoo, I.-S. Chung, J.-H.Cha, and D. S. Merrell. 2010. Epidemiological link between gastric diseaseand polymorphisms in VacA and CagA. J. Clin. Microbiol. 48:559–567.

24. Ji, X., F. Frati, S. Barone, C. Pagliaccia, D. Burroni, G. Xu, R. Rappuoli,J. M. Reyrat, and J. L. Telford. 2002. Evolution of functional polymorphismin the gene coding for the Helicobacter pylori cytotoxin. FEMS Microbiol.Lett. 206:253–258.

25. Kersulyte, D., A. K. Mukhopadhyay, B. Velapatino, W. Su, Z. Pan, C. Garcia,V. Hernandez, Y. Valdez, R. S. Mistry, R. H. Gilman, Y. Yuan, H. Gao, T.Alarcon, M. Lopez-Brea, G. Balakrish Nair, A. Chowdhury, S. Datta, M.Shirai, T. Nakazawa, R. Ally, I. Segal, B. C. Wong, S. K. Lam, F. O. Olfat,T. Boren, L. Engstrand, O. Torres, R. Schneider, J. E. Thomas, S. Czinn,and D. E. Berg. 2000. Differences in genotypes of Helicobacter pylori fromdifferent human populations. J. Bacteriol. 182:3210–3218.

26. Lartillot, N., H. Brinkmann, and H. Philippe. 2007. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heteroge-neous model. BMC Evol. Biol. 7(Suppl. 1):S4.

27. Linz, B., F. Balloux, Y. Moodley, A. Manica, H. Liu, P. Roumagnac, D.Falush, C. Stamer, F. Prugnolle, S. W. van der Merwe, Y. Yamaoka, D. Y.Graham, E. Perez-Trallero, T. Wadstrom, S. Suerbaum, and M. Achtman.2007. An African origin for the intimate association between humans andHelicobacter pylori. Nature 445:915–918.

28. Maddison, D. R., and W. P. Maddison. 2005. MacClade 4: analysis of phylogeny and character evolution. Sinauer Associates, Inc., Sunderland, MA.

29. Mane, S. P., M. G. Dominguez-Bello, M. J. Blaser, B. W. Sobral, R. Hontecillas, J. Skoneczka, S. K. Mohapatra, O. R. Crasta, C. Evans, T. Modise, S. Shallom, M. Shukla, C. Varon, F. Megraud, A. L. Maldonado-Contreras, K. P. Williams, and J. Bassaganya-Riera. 2010. Host-interactive genes in Amerindian Helicobacter pylori diverge from their Old World homologs and mediate inflammatory responses. J. Bacteriol. 192:3078–3092.

30. McDonald, J. H., and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.

31. Moodley, Y., and B. Linz. 2009. Helicobacter pylori sequences reflect past human migrations. Genome Dyn. 6:62–74.

32. Odenbreit, S., J. Puls, B. Sedlmaier, E. Gerland, W. Fischer, and R. Haas. 2000. Translocation of Helicobacter pylori CagA into gastric epithelial cells by type IV secretion. Science 287:1497–1500.

33. Ogura, M., J. C. Perez, P. R. Mittl, H. K. Lee, G. Dailide, S. Tan, Y. Ito, O. Secka, D. Dailidiene, K. Putty, D. E. Berg, and A. Kalia. 2007. Helicobacter pylori evolution: lineage-specific adaptations in homologs of eukaryotic Sel1-like genes. PLoS Comput. Biol. 3:e151.

34. Olbermann, P., C. Josenhans, Y. Moodley, M. Uhr, C. Stamer, M. Vauterin, S. Suerbaum, M. Achtman, and B. Linz. 2010. A global overview of the genetic and functional diversity in the Helicobacter pylori cag pathogenicity island. PLoS Genet. 6:e1001069.

35. Oldani, A., M. Cormont, V. Hofman, V. Chiozzi, O. Oregioni, A. Canonici, A. Sciullo, P. Sommi, A. Fabbri, V. Ricci, and P. Boquet. 2009. Helicobacter pylori counteracts the apoptotic action of its VacA toxin by injecting the CagA protein into gastric epithelial cells. PLoS Pathog. 5:e1000603.

36. Pagliaccia, C., M. de Bernard, P. Lupetti, X. Ji, D. Burroni, T. L. Cover, E. Papini, R. Rappuoli, J. L. Telford, and J. M. Reyrat. 1998. The m2 form of the Helicobacter pylori cytotoxin has cell type-specific vacuolating activity. Proc. Natl. Acad. Sci. U. S. A. 95:10212–10217.

37. Pan, Z. J., D. E. Berg, R. W. van der Hulst, W. W. Su, A. Raudonikiene, S. D. Xiao, J. Dankert, G. N. Tytgat, and A. van der Ende. 1998. Prevalence of vacuolating cytotoxin production and distribution of distinct vacA alleles in Helicobacter pylori from China. J. Infect. Dis. 178:220–226.

38. Posada, D. 2003. Using MODELTEST and PAUP* to select a model of nucleotide substitution. Curr. Protoc. Bioinformatics, chapter 6, unit 6.5. doi:10.1002/0471250953.bi0605s00.

39. Rhead, J. L., D. P. Letley, M. Mohammadi, N. Hussein, M. A. Mohagheghi, M. Eshagh Hosseini, and J. C. Atherton. 2007. A new Helicobacter pylori vacuolating cytotoxin determinant, the intermediate region, is associated with gastric cancer. Gastroenterology 133:926–936.

40. Rozas, J., J. C. Sanchez-DelBarrio, X. Messeguer, and R. Rozas. 2003. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19:2496–2497.

41. Schmitt, W., and R. Haas. 1994. Genetic analysis of the Helicobacter pylori vacuolating cytotoxin: structural similarities with the IgA protease type of exported protein. Mol. Microbiol. 12:307–319.

42. Shimodaira, H., and M. Hasegawa. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16:1114–1116.

43. Suerbaum, S., and P. Michetti. 2002. Helicobacter pylori infection. N. Engl. J. Med. 347:1175–1186.

44. Suerbaum, S., J. M. Smith, K. Bapumia, G. Morelli, N. H. Smith, E. Kunstmann, I. Dyrek, and M. Achtman. 1998. Free recombination within Helicobacter pylori. Proc. Natl. Acad. Sci. U. S. A. 95:12619–12624.

45. Swofford, D. L. 2002. PAUP*: phylogenetic analysis using parsimony (*and other methods), version 4. Sinauer Associates, Sunderland, MA.

46. Tegtmeyer, N., D. Zabler, D. Schmidt, R. Hartig, S. Brandt, and S. Backert. 2009. Importance of EGF receptor, HER2/Neu and Erk1/2 kinase signalling for host cell elongation and scattering induced by the Helicobacter pylori CagA protein: antagonistic effects of the vacuolating cytotoxin VacA. Cell Microbiol. 11:488–505.

47. Telford, J. L., P. Ghiara, M. Dell’Orco, M. Comanducci, D. Burroni, M. Bugnoli, M. F. Tecce, S. Censini, A. Covacci, Z. Xiang, et al. 1994. Gene structure of the Helicobacter pylori cytotoxin and evidence of its key role in gastric disease. J. Exp. Med. 179:1653–1658.

48. Torres, V. J., S. E. Ivie, M. S. McClain, and T. L. Cover. 2005. Functional properties of the p33 and p55 domains of the Helicobacter pylori vacuolating cytotoxin. J. Biol. Chem. 280:21107–21114.

49. Torres-Morquecho, A., S. Giono-Cerezo, M. Camorlinga-Ponce, C. F. Vargas-Mendoza, and J. Torres. 2010. Evolution of bacterial genes: evidences of positive Darwinian selection and fixation of base substitutions in virulence genes of Helicobacter pylori. Infect. Genet. Evol. 10:764–776.

50. Truong, B. X., V. T. Mai, H. Tanaka, T. Ly le, T. M. Thong, H. H. Hai, D. Van Long, K. Furumatsu, M. Yoshida, H. Kutsumi, and T. Azuma. 2009. Diverse characteristics of the CagA gene of Helicobacter pylori strains collected from patients from southern Vietnam with gastric cancer and peptic ulcer. J. Clin. Microbiol. 47:4021–4028.

51. Van Doorn, L. J., C. Figueiredo, F. Megraud, S. Pena, P. Midolo, D. M. Queiroz, F. Carneiro, B. Vanderborght, M. D. Pegado, R. Sanna, W. De Boer, P. M. Schneeberger, P. Correa, E. K. Ng, J. Atherton, M. J. Blaser, and W. G. Quint. 1999. Geographic distribution of vacA allelic types of Helicobacter pylori. Gastroenterology 116:823–830.

52. van Doorn, L.-J., C. Figueiredo, R. Sanna, S. Pena, P. Midolo, E. K. Ng, J. C. Atherton, M. J. Blaser, and W. G. Quint. 1998. Expanding allelic diversity of Helicobacter pylori vacA. J. Clin. Microbiol. 36:2597–2603.

53. Wang, H. J., C. H. Kuo, A. A. Yeh, P. C. Chang, and W. C. Wang. 1998. Vacuolating toxin production in clinical isolates of Helicobacter pylori with different vacA genotypes. J. Infect. Dis. 178:207–212.

54. Wang, W. C., H. J. Wang, and C. H. Kuo. 2001. Two distinctive cell binding patterns by vacuolating toxin fused with glutathione S-transferase: one high-affinity m1-specific binding and the other lower-affinity binding for variant m forms. Biochemistry 40:11887–11896.

55. Yamazaki, S., A. Yamakawa, T. Okuda, M. Ohtani, H. Suto, Y. Ito, Y. Yamazaki, Y. Keida, H. Higashi, M. Hatakeyama, and T. Azuma. 2005. Distinct diversity of vacA, cagA, and cagE genes of Helicobacter pylori associated with peptic ulcer in Japan. J. Clin. Microbiol. 43:3906–3916.

56. Yokoyama, K., H. Higashi, S. Ishikawa, Y. Fujii, S. Kondo, H. Kato, T. Azuma, A. Wada, T. Hirayama, H. Aburatani, and M. Hatakeyama. 2005. Functional antagonism between Helicobacter pylori CagA and vacuolating toxin VacA in control of the NFAT signaling pathway in gastric epithelial cells. Proc. Natl. Acad. Sci. U. S. A. 102:9661–9666.
