
EUKARYOTIC CELL, Mar. 2005, p. 504–515 Vol. 4, No. 3
1535-9778/05/$08.00+0 doi:10.1128/EC.4.3.504–515.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.

Comparative Genomic Hybridizations of Entamoeba Strains Reveal Unique Genetic Fingerprints That Correlate with Virulence

Preetam H. Shah,1 Ryan C. MacFarlane,2 Dhruva Bhattacharya,2 John C. Matese,3† Janos Demeter,3 Suzanne E. Stroup,4 and Upinder Singh1,2*

Division of Infectious Diseases, Department of Internal Medicine,1 and Department of Microbiology and Immunology,2 Stanford University School of Medicine, and Department of Genetics, Center for Clinical Sciences Research, Stanford University,3 Stanford, California, and Department of Internal Medicine, University of Virginia, Charlottesville, Virginia4

Received 26 September 2004/Accepted 21 December 2004

Variable phenotypes have been identified for Entamoeba species. Entamoeba histolytica is invasive and causes colitis and liver abscesses but only in ~10% of infected individuals; 90% remain asymptomatically colonized. Entamoeba dispar, a closely related species, is avirulent. To determine the extent of genetic diversity among Entamoeba isolates and potential genotype-phenotype correlations, we have developed an E. histolytica genomic DNA microarray and used it to genotype strains of E. histolytica and E. dispar. On the basis of the identification of divergent genetic loci, all strains had unique genetic fingerprints. Comparison of divergent genetic regions allowed us to distinguish between E. histolytica and E. dispar, identify novel genetic regions usable for strain and species typing, and identify a number of genes restricted to virulent strains. Among the four E. histolytica strains, a strain with attenuated virulence was the most divergent and phylogenetically distinct strain, raising the intriguing possibility that genetic subtypes of E. histolytica may be partially responsible for the observed variability in clinical outcomes. This microarray-based genotyping assay can readily be applied to the study of E. histolytica clinical isolates to determine genetic diversity and potential genotypic-phenotypic associations.

Entamoeba histolytica causes amebic colitis and liver abscesses. Worldwide, 50 million people have invasive E. histolytica, and 100,000 die each year, making it the second most common cause of parasitic death in humans (71). In Dhaka, Bangladesh, where diarrheal diseases are the leading cause of death in children less than 6 years old, ~50% of children have serological evidence of exposure to E. histolytica by age 5 (34, 35). Infected children suffer from significant morbidity with malnourishment and growth delays. A number of interesting epidemiologic trends have been identified for amebic disease. The data, although limited, indicate that a minority (~10%) of individuals who become infected with E. histolytica progress to clinically overt disease; others remain asymptomatically colonized (29). In individuals with invasive disease, occurrence of amebic liver abscess (ALA) is 5 to 50 times less common than diarrhea (65). Geographically variable disease predilections have been observed, and invasive disease predominantly affects men (1, 2, 11, 61).

The extent of genetic diversity among E. histolytica clinical isolates is unclear. Studies have analyzed a small number of highly repetitive and polymorphic genetic loci by techniques, such as randomly amplified polymorphic DNA (RAPD), RNA arbitrarily primed PCR, and restriction fragment length polymorphism (RFLP), to conclude that there is significant genetic diversity (7, 18, 23, 31, 33, 64, 72, 73). Contradictory data were obtained by two studies; in one study, the lectin gene was sequenced, and in the other study, the intergenic region between the superoxide dismutase gene and the actin 3 gene was analyzed (9, 30). These studies found minimal genetic diversity among clinical isolates. Although limited, these data indicate that the extensive genetic diversity identified by analysis of repetitive regions may not be indicative of a genome-wide phenomenon. A number of studies have attempted to identify an association between parasite genotype and the clinical phenotype, and although most have failed, two studies (one using RAPD analysis of amebic isolates and one using PCR and RFLP analyses of the serine-rich E. histolytica protein [SREHP]) did give some preliminary indication that it may be feasible to find a genotypic pattern predictive of a phenotypic outcome (7, 64). Host factors that contribute to variable disease manifestations have also been studied. In a study of Mexican mestizo adults and children, an increased frequency of HLA-DR3 was found to be associated with ALA (5, 6, 63). A recent study identified a potential protective association of the HLA class II allele DQB1*0601 and the heterozygous haplotype DQB1*0601/DRB1*1501 for intestinal amebiasis (26).

E. histolytica was recently reclassified into E. histolytica Schaudinn (EH) (previously described as pathogenic E. histolytica) and Entamoeba dispar Brumpt (ED) (previously described as nonpathogenic E. histolytica) (25). The two species are morphologically similar, exhibit ~98% identity at the rRNA level, and have similar host range and cell biology. However, their pathogenicity in vivo is vastly different. E. dispar colonizes humans, but only E. histolytica is able to cause invasive disease. To date, only four genes have been reported to be present in E. histolytica but absent (EhCP1, Ehapt2, and Ariel1) or significantly degenerate (EhCP5) in E. dispar (12, 67–69). However, the molecular basis of E. histolytica virulence and E. dispar nonvirulence is not yet known.

* Corresponding author. Mailing address: Department of Medicine, Division of Infectious Diseases, S-141 Grant Building, 300 Pasteur Dr., Stanford, CA 94305. Phone: (650) 723-4045. Fax: (650) 724-3892. E-mail: [email protected].

† Present address: Lewis-Sigler Institute for Integrative Genomics, Princeton, NJ 08544.

The parasite and host variables that contribute to the epidemiologic trends described above are not clear. Undoubtedly, a complex interplay among host genetics, immunity, enteric flora, nutrition, and parasite genetics contributes to disease. Whether there are subtypes of EH that have higher or lower virulence potential or predilection for infection of certain organ systems is not known. The World Health Organization has prioritized efforts to determine whether functional subgroups of EH exist (71). Using an E. histolytica 11,328-clone genomic DNA microarray, we have genotyped four E. histolytica laboratory strains and two E. dispar laboratory strains. This first genome-wide analysis of Entamoeba strains reveals that genotypic fingerprints can be used to distinguish E. histolytica from E. dispar, identify genes restricted to virulent strains, and find potential genotypic-phenotypic associations.

MATERIALS AND METHODS

Parasite culture, DNA extraction, and Entamoeba strain verification. Four EH strains and two ED strains were utilized in our study. The EH strains were as follows: EH (HM-1:IMSS), originally isolated in 1967 from a patient with amebic colitis in Mexico; EH (200:NIH), isolated from a patient with colitis in India in 1949; EH (HK-9), isolated from a patient with amebic dysentery in Korea; and EH (Rahman), isolated from an asymptomatic cyst passer in England in 1972. The ED strains were ED (SAW760), which was isolated from an adult human male in England in 1979, and ED (SAW1734), which was isolated from Ethiopian Jews in Israel in the 1980s (http://www.atcc.org/). All E. histolytica strains were grown under axenic conditions in Trypticase-yeast extract-iron-serum medium (TYI-S-33) with 10 to 15% adult bovine serum (Sigma), penicillin (100 U/ml), streptomycin (100 μg/ml) (Gibco BRL), and 1× Diamond’s vitamins (Biosource International, Camarillo, Calif.) in 15-ml glass culture tubes at 37°C (60). E. dispar isolates were grown under xenic conditions (media supplemented with 200 μg of erythromycin per ml) (Sigma Aldrich, St. Louis, Mo.) as previously described (36). Cultures of mid-logarithmic or late-logarithmic-phase trophozoites were chilled on ice for 10 min and centrifuged at 430 × g at 4°C for 5 min, and genomic DNA was isolated using either a phenol-chloroform method or a Wizard Genomic DNA kit according to the manufacturer’s directions (Promega Corporation, Madison, Wis.) (58). The EH and ED strains were verified by PCR analysis of the rRNA episome and strain-specific genes and by PCR and RFLP analyses of the SREHP using previously described methods (18, 62). The SREHP PCR products were digested with AluI, and results for all PCR products were compared to banding patterns in previously published studies (18). Primer sequences are available for review (see Table 1 in the supplemental material at http://microarray-pubs.stanford.edu/entamoeba_diversity/index.shtml).

E. histolytica and E. dispar genome sequencing. The Institute for Genomic Research (TIGR) (http://www.tigr.org/tdb/e2k1/eha1/) and Sanger Center (http://www.sanger.ac.uk/Projects/E_histolytica/) have sequenced E. histolytica (HM-1:IMSS) and E. dispar (SAW760) (47a). As of March 2004, there was 12X genome coverage of EH and ~2X genome coverage of ED.

Microarray construction and quality control. We utilized 11,328 sequenced clones from the genomic library of EH (HM-1:IMSS) prepared by TIGR to generate our DNA microarrays. Approximately 62% of these clones were used in the final sequence assembly. Microarray construction was similar to previously described methods (19). Briefly, we PCR amplified the genomic insert directly from 5 μl of bacterial cultures using M13F and M13R primers that are part of the multiple cloning sites of the cloning vector. A total of 35 cycles of PCR was performed, with 1 cycle consisting of 1 min at 94°C, 1 min at 50°C, and 2 min at 72°C. PCR was performed with 2 mM MgCl2 in a Tetrad Thermo cycler (MJ Research, Waltham, Mass.). The products were ethanol precipitated, resuspended in water, dried, resuspended in 3× SSC buffer (1× SSC is 0.15 M NaCl plus 0.015 M sodium citrate), and printed on polylysine-coated glass slides (19). To check the accuracy of printing, quality controls were performed. Probes from the rRNA episome, EhActin, EhPFK, and EhATPase were used to check clone orientation and to detect contamination. One hundred nanograms of each amplified PCR product was labeled with Cy5-dUTP and hybridized on the arrays (see “Microarray hybridization” for details). Primer sequences are available for review (see Table 1 in the supplemental material at http://microarray-pubs.stanford.edu/entamoeba_diversity/index.shtml).

Clone annotation. The assembly, locus information, and final TIGR/Sanger Center gene annotation were kindly provided by Brendan Loftus (annotated on 12 February 2004). Clones from potentially contaminated plates and clones from the rRNA episome and repeat elements (Ehapt2 and EhRLE4) were identified. The final information was compiled and stored in the Stanford Microarray Database (SMD) (http://smd.stanford.edu/) for data retrieval and analysis. We also identified a subset of clones which were fully sequenced, used in the final assembly, contained one open reading frame (ORF) according to the TIGR annotation (≥200-bp overlap at ≥98% identity), and were not from contaminated plates. These 2,802 clones, which contain 2,112 unique genes, were used for analyses restricted to clones with a single ORF.

Microarray hybridization. Genomic DNA (4 to 8 μg) from the EH and ED strains was resuspended in 30.5 μl of Tris-EDTA (pH 7.6), 3 μl of random nonamers (4 μg/μl) was added to the solution, and the solution was boiled for 1 min and cooled for 5 min. The DNA was mixed with 10× deoxynucleoside triphosphates (0.25 mM concentrations of dATP, dCTP, and dGTP and 0.09 mM concentration of dTTP), 1.5 μl of Cy5-dUTP, and 2 μl of Klenow fragment and incubated at 37°C for 4 h. A reference sample (350 ng of a 140-bp PCR product which labeled all spots on the array) was labeled similarly with Cy3-dUTP. The labeled products were mixed, purified with a Microcon 30 filter (Amicon/Millipore, Billerica, Mass.), and hybridized on arrays for 16 to 18 h at 65°C. The arrays were washed at room temperature sequentially for 2 min each with three solutions: 2× SSC and 0.1% sodium dodecyl sulfate, 1× SSC, and 0.2× SSC (19). The arrays were scanned using a GenePix 4000B microarray scanner (Axon Instruments, Union City, Calif.). We performed a minimum of two replicate experiments per strain, except for EH (HM-1:IMSS) and ED (SAW760), for which five replicate arrays were generated. Clones with significant cross-hybridization with bacterial genomic DNA were identified by labeling and hybridizing bacterial genomic DNA on the arrays as described above.

Microarray data analysis. Data were analyzed using ScanAlyze software (http://rana.lbl.gov/EisenSoftware.htm) to determine the fluorescence signal intensity of each dye at each spot. Channel 2 measured the Cy5 signal (Cy5-labeled Entamoeba genomic DNA), and channel 1 measured the Cy3 signal (Cy3-labeled reference DNA).

(i) Data normalization. For all arrays hybridized with Entamoeba genomic DNA, the channel 2 signal intensity was adjusted such that the mean channel 2/channel 1 intensity, or R/G ratio, was 1.0 for the entire array (19). For arrays hybridized with control DNA, nonnormalized R/G signal was used; the cutoff for positive hybridization was the mean R/G value plus 3 standard deviations.
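The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function names and the sample intensities are hypothetical, the "mean R/G ratio" is assumed to be the arithmetic mean of per-spot ratios, and the standard deviation is assumed to be the sample standard deviation.

```python
from statistics import mean, stdev

def normalize_channel2(ch2, ch1):
    """Scale channel 2 (Cy5) so that the mean per-spot R/G ratio equals 1.0."""
    ratios = [r / g for r, g in zip(ch2, ch1)]
    scale = 1.0 / mean(ratios)
    return [r * scale for r in ch2]

def positive_cutoff(control_ratios):
    """Cutoff for control arrays: mean R/G plus 3 standard deviations."""
    return mean(control_ratios) + 3 * stdev(control_ratios)

# Hypothetical raw intensities for four spots (not data from the study).
ch1 = [1000.0, 2000.0, 500.0, 800.0]   # Cy3 reference DNA
ch2 = [400.0, 3000.0, 100.0, 1200.0]   # Cy5 Entamoeba genomic DNA

normalized = normalize_channel2(ch2, ch1)
normalized_ratios = [r / g for r, g in zip(normalized, ch1)]
print(mean(normalized_ratios))
```

Scaling only channel 2 leaves the reference channel untouched, so the channel 1 intensity filter can still be applied to raw values.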

(ii) Filters used. Spots were filtered out on the basis of two criteria: poor-quality spots were discarded if they were flagged or had a channel 1 net mean intensity of <250, and spots with high noise/signal ratios were discarded if they had extremely different values in replicate arrays (average deviation/average of >0.4). Additionally, clones from contaminated plates and clones that cross-hybridized with bacterial genomic DNA were not analyzed.
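The two spot filters can be expressed as simple predicates. The sketch below is illustrative only: the function and parameter names are hypothetical, and "average deviation" is assumed to mean the mean absolute deviation of the replicate values from their average.

```python
from statistics import mean

def passes_quality(flagged, ch1_net_mean, min_intensity=250):
    """Keep a spot only if it is unflagged and its channel 1 net mean is at least 250."""
    return (not flagged) and ch1_net_mean >= min_intensity

def is_reproducible(replicate_ratios, max_relative_deviation=0.4):
    """Keep a spot only if (average deviation / average) across replicates is at most 0.4."""
    average = mean(replicate_ratios)
    if average <= 0:
        return False  # a ratio cannot be evaluated without a positive average
    average_deviation = mean([abs(x - average) for x in replicate_ratios])
    return average_deviation / average <= max_relative_deviation

# Hypothetical replicate R/G values for two spots.
print(is_reproducible([1.0, 1.1, 0.9]))
print(is_reproducible([0.2, 1.8]))
```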

Comparative genomic hybridizations (CGH). For each EH and ED strain, the hybridization signal to the clones on the array was compared to the hybridization signal for the reference strain, EH (HM-1:IMSS), and the relative hybridization was analyzed by genomotyping analysis by the method of Charlie Kim et al. (GACK) (40). This algorithm generates a graded output and assigns a range of values from −0.50 to +0.50 in 0.05 increments. For a GACK value of +0.50, there is a 100% likelihood that a gene is present; for a GACK value of −0.50, there is a 100% likelihood that a gene is absent, divergent, or present at reduced genomic abundance. The GACK output was put into four categories (on the basis of Southern blot hybridizations and sequence analysis): A (absent or highly divergent; GACK values of −0.50 and −0.45), B (significantly divergent; GACK values of −0.45 to −0.15), C (moderately divergent; GACK values of −0.10 to +0.40), and D (highly conserved; GACK values of +0.45 and +0.50).
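The mapping from a graded GACK value to a divergence category can be written as a small lookup. The sketch below is an illustration, not part of GACK itself; the function name is hypothetical. Because the ranges listed for categories A and B both include −0.45, the sketch assumes that −0.45 belongs to category A. Values are converted to integer hundredths so that floating-point rounding cannot shift a value across a boundary.

```python
def divergence_category(gack_value):
    """Map a graded GACK value (-0.50 to +0.50, 0.05 steps) to category A, B, C, or D."""
    hundredths = round(gack_value * 100)
    if not -50 <= hundredths <= 50:
        raise ValueError("GACK values range from -0.50 to +0.50")
    if hundredths <= -45:
        return "A"  # absent or highly divergent (assumption: -0.45 is assigned to A)
    if hundredths <= -15:
        return "B"  # significantly divergent
    if hundredths <= 40:
        return "C"  # moderately divergent
    return "D"      # highly conserved

for value in (-0.50, -0.40, 0.00, 0.45):
    print(value, divergence_category(value))
```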

Estimation of copy number. The average signal intensity for low-copy-number genes (1 or 2 loci), multicopy genes (6 to 10 loci), or high-copy-number repeat elements was calculated by analyzing a minimum of 10 clones in each category. Genes were selected on the basis of previously published literature and genome annotation. All clones analyzed contained a single ORF. To estimate the signal for specific regions of the rRNA episome, we chose clones that represented (i) a portion of the upper intergenic spacer (22,000 to 24,000 bp), (ii) a part of the transcribed unit (15,000 to 19,000 bp), and (iii) a part of the lower intergenic spacer (12,000 to 14,000 bp) (58). For regions with ≥2-kb sequence, clones with 100% match were selected. For regions with <2-kb sequence, clones with the highest E value were chosen, although these clones would always contain genomic sequences other than the sequence of interest. For each of the categories, the R/G signal for each EH and ED strain was retrieved, spot quality filters were applied, and the mean signal intensity ± standard error was calculated. Lists of the clones used in these analyses are available for review (see Table 4 in the supplemental material at http://microarray-pubs.stanford.edu/entamoeba_diversity/index.shtml).

Estimation of Cy5-labeled probe size. To determine the average probe length of the Cy5-labeled genomic DNA, labeled genomic DNA was electrophoresed in an agarose gel. Cy5 and ethidium bromide were visualized with the Molecular Dynamics Typhoon 8600 multimode imaging system (excitation wavelengths of 633 and 493 nm, respectively, and emission filters of 670 and 620 nm, respectively) (Amersham Biosciences, Piscataway, N.J.).

Estimation of sequence identity for clones in divergence categories. To determine the reliability of the divergence categories, all clones from categories A, B, and C and 200 random clones from category D of the ED (SAW760) analysis were compared to the ED (SAW760) sequence database. BLASTN analysis was performed, and the match length and percent identity of the first hit were retrieved.

Statistical analysis. A Student’s t test (two-tailed distribution and two-sample equal variance) was utilized. The values in comparisons of sample pairs were considered significantly different if the P value was <0.05.
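For readers unfamiliar with the equal-variance form of the test, the statistic can be computed as sketched below. The function name and sample values are hypothetical and serve only as an illustration. The sketch returns the t statistic and degrees of freedom; converting these to a two-tailed P value requires the t distribution, which is not implemented here.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Two-sample Student's t statistic assuming equal variances."""
    n_a, n_b = len(a), len(b)
    degrees_of_freedom = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_variance = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / degrees_of_freedom
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error, degrees_of_freedom

# Hypothetical signal intensities for two groups (not data from the study).
t_value, df = pooled_t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t_value, 3), df)
```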

Southern blotting and PCR and sequence analysis. Southern blotting was performed by the standard protocol (58). Genomic DNA (10 to 15 μg) was digested with EcoRI or HindIII, electrophoretically separated on 1% agarose gels, blotted onto Hybond N+ nylon membranes, UV cross-linked, and hybridized with radiolabeled probes. The probes were PCR amplified from the appropriate clone, and the probe sequence was verified, labeled with [α-32P]dATP with the Random Primed DNA labeling kit (Roche, Mannheim, Germany), and hybridized with Express-Hyb (Clontech, Palo Alto, Calif.). Blots were subjected to autoradiography, scanned, and prepared for publication using Adobe Photoshop (version 6.01; Adobe Systems, San Jose, Calif.). Blots were stripped using H2O–0.5% sodium dodecyl sulfate and reused for subsequent hybridizations. The probes used were: category A (ENTI070, ENTIL41, and ENTOU30), category B (ENTPF64, ENTON45, ENTNW18, and ENTNQ01), and category D (ENTOI63, ENTOE77, ENTOJ81, and ENTPJ36). Four loci (423.m00019, 271.m00049, 353.m00047, and 8.m00351) were PCR amplified when possible from all EH strains and ED (SAW760). PCR products were directly sequenced and analyzed using CLUSTALW. Primer sequences are available for review (see Table 1 in the supplemental material at http://microarray-pubs.stanford.edu/entamoeba_diversity/index.shtml).

Cluster and dendrogram analysis. The GACK divergence output was used for clustering. Clones identified as being highly conserved in all strains were removed from the analysis to facilitate use of the clustering and visualization programs. The remaining clones (~2,389) were clustered using the program XCLUSTER (http://smd.stanford.edu/download/), using hierarchical clustering with an uncentered Pearson correlation metric. The clustering output was viewed using Java Treeview version 1.0.3 (Alok Saldanha, http://sourceforge.net/projects/jtreeview). To generate dendrograms based on genotypic differences, the clones were analyzed using the Mega 2.2 program (MEGA2: Molecular Evolutionary Genetics Analysis Software, Arizona State University) (http://www.megasoftware.net/) using an unweighted pair group method with arithmetic mean (UPGMA) distance matrix algorithm and bootstrap values to estimate confidence intervals.
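The uncentered Pearson correlation differs from the standard coefficient in that the means are not subtracted, so it measures similarity relative to zero rather than relative to each profile's own average. The sketch below shows the usual definition of this metric; it is not taken from the XCLUSTER source, and the function name and sample profiles are hypothetical.

```python
from math import sqrt

def uncentered_correlation(x, y):
    """Uncentered Pearson correlation: sum(x*y) / sqrt(sum(x^2) * sum(y^2))."""
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)

# Hypothetical GACK divergence profiles for two clones across five strains.
clone_1 = [-0.50, -0.45, 0.10, -0.50, -0.50]
clone_2 = [-0.45, -0.50, 0.05, -0.50, -0.45]
print(round(uncentered_correlation(clone_1, clone_2), 3))
```

With GACK values, zero is a meaningful midpoint between "present" and "absent or divergent", which is one reason an uncentered metric can be a natural choice for this kind of data.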

Nucleotide sequence accession numbers. Nucleotide sequence data for the 423.m00019, 271.m00049, and 353.m00047 loci have been deposited in GenBank. The locus names and accession numbers in GenBank follow: Rahman_423.m00019, AY857547; NIH_353.m00047, AY857548; Rahman_353.m00047, AY857549; HK9_353.m00047, AY857550; and SAW_271.m00049, AY857551.

RESULTS

An 11,328-clone E. histolytica genomic DNA microarray was generated. To perform CGH, we generated an E. histolytica genomic DNA microarray. As of February 2004, the genome annotation was in 888 assemblies with 9,938 predicted genes (Brendan Loftus and Neil Hall [TIGR], personal communication). Due to the compact nature of the Entamoeba genome, clones may contain more than one ORF. The number of clones with one annotated ORF was 2,802 (2,112 unique genes).

Microarray signal intensity estimates genomic abundance. In order to determine whether the hybridization signal corresponded to genomic abundance, we analyzed the signal from clones with no genomic insert, low-copy-number genetic loci (1 to 2 loci), high-copy-number genetic loci (6 to 10 loci), and highly repetitive genomic elements (see Fig. 1 in the supplemental material at http://microarray-pubs.stanford.edu/entamoeba_diversity/index.shtml). The average probe length of Cy5-labeled genomic DNA was 500 to 1,500 bp, and all clones had comparable AT content, allowing this comparison (data not shown). For clones with no genomic insert, the hybridization signal intensity was extremely low (R/G, 0.009 ± 0.002), indicating that Cy5-labeled E. histolytica genomic DNA did not cross-hybridize to the reference DNA. Signal intensities for 1-copy (R/G, 0.19 ± 0.01), 2-copy (R/G, 0.31 ± 0.05), and 6- to 10-copy-number genes (R/G, 1.65 ± 0.37) were proportional to the copy number. Clones with repetitive loci (Ehapt2 and EhRLE4) had high signal intensities (R/G, 1.03 ± 0.18 and 1.75 ± 0.16, respectively), although not proportional to the copy number, likely due to signal dampening from the adjacent nonrepetitive genomic regions. Clones from nontranscribed regions of the rRNA episome had about half the signal (R/G, 1.54 ± 0.17) of clones from transcribed regions (R/G, 3.95 ± 0.36) (each rRNA episome contains two identical transcribed repeats) (58). We determined the mean signal intensity for all clones on the array that reliably contained one ORF. Of the genes in EH (HM-1:IMSS), 13.8% had a mean signal intensity R/G of >1.0, indicating that these genetic loci were in high genomic abundance or were members of highly similar gene families.
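The proportionality between signal and copy number can be checked with simple arithmetic on the mean R/G values reported above. In the sketch below, the only value not taken from the text is the use of 8 copies as the midpoint of the 6- to 10-copy class, which is an assumption made for illustration.

```python
# Mean R/G signals reported in the text, paired with a copy number.
reported = {
    "1-copy": (0.19, 1),
    "2-copy": (0.31, 2),
    "6- to 10-copy": (1.65, 8),  # assumption: midpoint of the 6-to-10 range
}

# Signal per copy should be roughly constant if signal scales with copy number.
per_copy = {name: signal / copies for name, (signal, copies) in reported.items()}
for name, value in per_copy.items():
    print(f"{name}: {value:.3f} R/G per copy")

# rRNA episome: nontranscribed versus transcribed regions.
episome_ratio = 1.54 / 3.95
print(f"nontranscribed/transcribed: {episome_ratio:.2f}")
```

Under this assumption the per-copy signal stays within a narrow band (about 0.16 to 0.21) across the three classes.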

When we compared E. histolytica and E. dispar hybridizations, we identified genetic loci known to be absent in ED compared to EH (CP1, Ariel1, and Ehapt2) as having significantly lower signal intensities in ED (SAW760) compared to EH (Fig. 1) (12, 67, 68). For highly abundant genomic loci (such as the cysteine proteinase genes and Ehapt2), although the signal for ED is markedly lower than that in EH, it is not completely absent, since each clone contains additional genomic regions that will contribute to hybridization signal. The clones with CP5 also contained another potential ORF and therefore were not assessed. Genes highly conserved in the two species on the basis of homologues identified in the ED database, dipeptidylpeptidase 2 (DPP2), P glycoprotein 6 (PGP6), and actin-binding protein 2 (ABP2), had similar signal levels in EH and ED.

FIG. 1. CGH accurately identifies genetic loci that are absent in ED (SAW760) compared to EH (HM-1:IMSS). The mean signal intensities for genes conserved in both EH and ED, namely, the dipeptidylpeptidase 2 (DPP2), P glycoprotein 6 (PGP6), and actin-binding protein (ABP2) genes, are comparable. In contrast, the signal intensities for genes previously known to be absent (CP1, Ariel1, and Ehapt2) in ED compared to EH were lower. The mean signal intensity ± standard error (S.E.) (error bars) is shown on the y axis, and the genetic loci are shown on the x axis.

Analysis of the rRNA episome revealed significant differences between EH and ED (see Fig. 2 in the supplemental material at http://microarray-pubs.stanford.edu/entamoeba_diversity/index.shtml). In EH (HM-1:IMSS), this 24.5-kb extrachromosomal DNA is present in 200 copies per genome and consists of two 5.9-kb identical inverted repeat regions that are transcribed and two nontranscribed regions of 9.2 kb (upper intergenic sequence) and 3.5 kb (lower intergenic sequence), respectively (58). Our analysis revealed that the upper and lower nontranscribed intergenic sequences had significantly higher signal in EH than ED. We performed sequence comparison of these regions with the ED (SAW760) genome sequence. The nontranscribed intergenic sequences of EH had little homology to comparable regions in ED (at most 91% identity over 64 to 141 bp). Thus, the low signal intensity of the rRNA nontranscribed region in ED accurately reflected the significant sequence dissimilarity of these two regions. Signal from the transcribed region of the rRNA episome was significantly lower in EH than in both ED strains (sequences of EH and ED exhibit 98.6% identity), and Southern blot analysis confirmed that the copy number of the rRNA episome was lower in EH than ED (see Fig. 2B in the supplemental material at http://microarray-pubs.stanford.edu/entamoeba_diversity/index.shtml).

CGH of Entamoeba strains and algorithms for identifying divergence. We genotyped four E. histolytica strains and two E. dispar strains (see Materials and Methods for details). There was high reproducibility among arrays hybridized with independently isolated genomic DNA. The mean correlation (± standard deviation) was 0.93 ± 0.03. The R/G signals obtained from hybridizations with each strain were compared to the R/G signals obtained for hybridization with EH (HM-1:IMSS) genomic DNA. For example, (R/G_HK-9)/(R/G_HM-1:IMSS) would indicate the relative hybridization for a given clone between EH (HK-9) and EH (HM-1:IMSS). Clones for which the experimental strain genomic DNA hybridized as well as that of the EH (HM-1:IMSS) strain would represent genetic loci that were highly conserved. Clones that showed poor or no hybridization from the experimental strain genomic DNA would represent loci that were absent, highly divergent so as not to cross-hybridize, or in significantly lower copy number in the experimental strain compared to EH (HM-1:IMSS). Data were assigned into functional divergence categories: A (absent or highly divergent), B (significantly divergent), C (moderately divergent), and D (highly conserved) (40).
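The ratio-of-ratios comparison described above can be illustrated with a minimal Python sketch. The function names and, in particular, the numeric cutoffs are assumptions made for illustration only; the actual category boundaries come from the algorithm cited as reference 40 and are not reproduced here.

```python
import math

# Hypothetical cutoffs on the log2 relative hybridization, chosen only to
# illustrate the idea; they are NOT the thresholds used in the study.
ASSUMED_CUTOFFS = [(-3.0, "A"), (-2.0, "B"), (-1.0, "C")]


def relative_hybridization(rg_test, rg_reference):
    """Ratio of a clone's R/G signal in the test strain to its R/G signal
    in the reference strain, e.g. (R/G_HK-9)/(R/G_HM-1:IMSS)."""
    return rg_test / rg_reference


def divergence_category(rg_test, rg_reference):
    """Map a clone to category A, B, C, or D using the assumed cutoffs."""
    log_ratio = math.log2(relative_hybridization(rg_test, rg_reference))
    for cutoff, category in ASSUMED_CUTOFFS:
        if log_ratio <= cutoff:
            return category
    return "D"  # hybridizes about as well as the reference: highly conserved


# Toy R/G values for four imaginary clones (test strain, reference strain).
for rg_test, rg_ref in [(1.0, 1.0), (0.4, 1.0), (0.2, 1.0), (0.1, 1.0)]:
    print(rg_test, rg_ref, divergence_category(rg_test, rg_ref))
```

Working on a log scale is a design choice of this sketch: it makes a twofold loss and a twofold gain of signal symmetric around zero.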

Southern blot analyses were performed with EH (HM-1:IMSS) and ED (SAW760) genomic DNA and yielded the expected results (Fig. 2 [some data not shown]). For clones in the divergence categories A and B, the probes did not hybridize to genomic DNA from ED (SAW760). We used genome sequence data from ED (SAW760) to further analyze the validity of the divergence algorithm and performed BLASTN analysis on all clones in categories A, B, and C and 200 random clones in category D. For categories A, B, C, and D, the percentages of sequences that had no significant BLASTN match

FIG. 2. Southern blot analysis confirms microarray data for clones identified as absent or highly divergent in ED (SAW760) compared to EH (HM-1:IMSS). Genomic DNA from EH (HM-1:IMSS) and ED (SAW760) was digested with HindIII or EcoRI and hybridized with the clones listed. All blots were probed with a control probe with known equal genomic abundance as a loading control. The blots were exposed to film, and the images were scanned and prepared for publication using the Adobe Photoshop 6 software program. The divergence category for each clone is listed below the blots.


(E value threshold of 1E−4) were 24.6, 13.3, 9.8, and 7.5%, respectively (Fig. 3A). The 7.5% of sequences in category D without a BLAST match most likely represent incomplete genome sequence coverage of ED (approximately 2X at present). The higher proportion of such sequences in the highly and moderately divergent categories indicated that those genomic loci had no matches in the ED database on the basis of divergence (rather than incomplete coverage alone). For those sequences with significant matches, the BLASTN results were analyzed (Fig. 3B and C). The average identity to the ED sequence database of clones from categories A, B, C, and D was 89% ± 0.4%, 90% ± 0.3%, 92% ± 0.2%, and 93% ± 0.3%, respectively. The mean match length for clones in categories A, B, C, and D was 191 ± 11, 242 ± 11, 273 ± 7, and 368 ± 16 bp, respectively. Therefore, in the highest divergence category (category A), although the query EH sequences were approximately 800 bp, the average match in the ED database was 191 bp with 89% identity. Thus, both the Southern blot hybridizations and large-scale sequence analyses indicated that the divergence categories were quantitatively valid. Although these analyses were performed for ED (SAW760), the categories should be applicable to other Entamoeba strains.

Identification of unique genetic fingerprints for Entamoeba strains. Using the divergence categories outlined above, we analyzed each EH and ED strain compared to the reference strain EH (HM-1:IMSS) (Table 1). The total number of clones analyzed for these strains ranged from 7,669 to 8,677. The two ED strains (SAW1734 and SAW760) exhibited the greatest divergence from EH (HM-1:IMSS) with 2.2 and 3.1% of clones in category A, 2.5 and 3.6% of clones in category B, and 8.6 and 10% of clones in category C, respectively. EH strains (200:NIH, HK-9, and Rahman) exhibited less divergence from EH (HM-1:IMSS) with 0.4, 0.2, and 0.1% of clones in category A, 1.1, 0.9, and 1.5% of clones in category B, and 5.2, 4.4, and 5.9% of clones in category C, respectively. The majority of clones were in category D (highly conserved). The numbers of

FIG. 3. Sequence comparison of clones in the divergence categories reveals a sequential decrease in the number of clones with no significant BLASTN data (A), an increase in percent identity (B), and an increase in mean match length (C) in clones from categories A, B, C, and D. All clones from divergence categories A, B, and C and 200 random clones from category D were compared to the available genomic sequences for ED (SAW760). The mean data (± standard error [S.E.]) are shown. All values were statistically significantly different from the adjacent value (Student's t test; P < 0.05) in panels B and C.

TABLE 1. Assessment of genetic divergence in EH and ED laboratory strains (a)

No. of clones (%)
Divergence category   EH (200:NIH)   EH (HK-9)    EH (Rahman)   ED (SAW1734)   ED (SAW760)
A                     35 (0.4)       17 (0.2)     6 (0.1)       176 (2.2)      234 (3.1)
B                     90 (1.1)       79 (0.9)     127 (1.5)     198 (2.5)      279 (3.6)
C                     443 (5.2)      376 (4.4)    512 (5.9)     674 (8.6)      768 (10)
D                     8,000 (93)     8,070 (94)   8,032 (93)    6,783 (87)     6,388 (83)
Total                 8,568          8,542        8,677         7,831          7,669

No. of unique loci (%)
Divergence category   EH (200:NIH)   EH (HK-9)    EH (Rahman)   ED (SAW1734)   ED (SAW760)
A                     8 (0.5)        3 (0.2)      0 (0)         13 (0.8)       22 (1.3)
B                     13 (0.7)       14 (0.8)     5 (0.3)       22 (1.3)       45 (2.7)
C                     90 (5.0)       64 (3.6)     40 (2.2)      112 (6.8)      129 (7.9)
D                     1,674 (94)     1,708 (96)   1,772 (98)    1,512 (91)     1,444 (88)
Total                 1,785          1,789        1,817         1,659          1,640

(a) CGH for EH (200:NIH, HK-9, and Rahman) and ED (SAW760 and SAW1734) strains compared to the EH (HM-1:IMSS) strain. Data are shown in the four divergence categories: A (absent or highly divergent), B (significantly divergent), C (moderately divergent), and D (highly conserved). For each strain, the number of clones and unique genetic loci in each category are shown. The overall extent of divergence in categories A, B, and C was statistically significantly higher in the two ED strains than in the EH strains (P < 0.05).
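As a cross-check on Table 1, the short Python sketch below sums the unique loci in categories A, B, and C for each strain. The dictionary layout and function name are illustrative choices; the counts are transcribed from the table. The resulting totals match the numbers of divergent loci quoted for each strain later in the Results.

```python
# Unique loci per divergence category, transcribed from Table 1.
UNIQUE_LOCI = {
    "EH (200:NIH)": {"A": 8, "B": 13, "C": 90, "D": 1674},
    "EH (HK-9)": {"A": 3, "B": 14, "C": 64, "D": 1708},
    "EH (Rahman)": {"A": 0, "B": 5, "C": 40, "D": 1772},
    "ED (SAW1734)": {"A": 13, "B": 22, "C": 112, "D": 1512},
    "ED (SAW760)": {"A": 22, "B": 45, "C": 129, "D": 1444},
}


def divergent_loci(counts):
    """Loci in any divergent category (A, B, or C)."""
    return counts["A"] + counts["B"] + counts["C"]


def total_loci(counts):
    """All loci analyzed for a strain (categories A through D)."""
    return sum(counts.values())


for strain, counts in UNIQUE_LOCI.items():
    print(f"{strain}: {divergent_loci(counts)} of {total_loci(counts)} loci divergent")
```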


clones in category D ranged from 93 to 94% for EH strains and 83 to 87% for ED strains, and the difference between the ED and EH strains was statistically significant (P value of 0.009). In order to determine whether the expected genomic regions contributed to the divergence patterns described above, we looked at the number of clones in category A that were from the nontranscribed region of the rRNA episome or the Ehapt2 or EhRLE4 locus, since these regions are highly divergent or missing in ED (Fig. 1) (59, 67). In the ED strains, 35 to 43.8% of the clones in category A were from the rRNA episome, and 3.9 to 2.3% contained either the Ehapt2 or EhRLE4 repeat region.

In order to obtain an overview of the comparative hybridization data, the divergence data from all usable clones were clustered with the XCLUSTER program using a Pearson correlation, uncentered metric algorithm, and hierarchical clustering (Fig. 4). Clones that were highly conserved in all strains were removed to facilitate the clustering program. Overall, 2,372 clones (27% of clones analyzed) were divergent in at least one strain of Entamoeba. Each Entamoeba strain had a unique genetic fingerprint compared to the EH (HM-1:IMSS) reference strain. Previous studies of EH clinical isolates relying almost exclusively on PCR-based analyses of a few highly polymorphic loci have shown that significant genetic diversity exists among EH strains (32, 33, 72, 73). The microarray data were generated from the analysis of more than 7,600 clones and suggest that genetic diversity exists on a genome-wide scale, including divergence in nonrepetitive genomic regions.
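For readers unfamiliar with the similarity metric named above, the following Python sketch shows the uncentered Pearson correlation as it is usually defined (equivalent to the cosine of the angle between two profiles), alongside the ordinary centered form for contrast. It is not XCLUSTER's code; the function names and the toy profiles are assumptions for illustration.

```python
import math


def uncentered_correlation(x, y):
    """Pearson correlation computed without subtracting the means.

    Returns 0.0 when either profile is all zeros, which is one simple way
    to handle a flat profile (such as a clone conserved in every strain).
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


def centered_correlation(x, y):
    """Ordinary Pearson correlation: mean-center first, then correlate."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return uncentered_correlation(
        [a - mean_x for a in x], [b - mean_y for b in y]
    )


# Two toy profiles with the same shape but a constant offset.
profile_1 = [1.0, 2.0, 3.0]
profile_2 = [4.0, 5.0, 6.0]
print(centered_correlation(profile_1, profile_2))
print(uncentered_correlation(profile_1, profile_2))
```

The uncentered form treats a constant offset between two profiles as a real difference, whereas the centered form ignores it; this matters when the magnitude of signal loss, not just its pattern across strains, is informative.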

A total of 1,656 clones were divergent in any one ED strain; of these clones, 673 were divergent in both ED strains. In this latter set, we identified a set of 56 clones that were divergent in all ED strains, conserved in all EH strains, and in high genomic abundance in EH. These clones would represent good candidates for PCR-based diagnostic tests to distinguish EH from

FIG. 4. CGH of EH and ED strains reveals distinct patterns of divergence from the EH (HM-1:IMSS) strain. Genomic abundance clusters were obtained using Pearson correlation, uncentered metric algorithm, and hierarchical linkage clustering. Data for the EH and ED strains are shown in columns, and data for the different clones are shown in rows. The level of divergence is shown as follows: yellow for highly conserved (category D), blue for absent or highly divergent (category A), and grey for missing data. The genomic abundance for each clone is shown in the leftmost column for EH (HM-1:IMSS) and in the rightmost column for ED (SAW760) and is represented by a green to red scale: green for low genomic abundance (R/G ≤ 0.25) and red for high genomic abundance (R/G ≥ 4.0). Sections are labeled to the right of the gel. Sections 1 and 1A were divergent in one EH strain. Sections 2, 2A, 2B, and 2C were divergent in EH and ED strains. Sections 3 and 3A were divergent in both ED strains but conserved in all EH strains.


ED (Table 2). Some of the tests used to distinguish EH and ED at this time are based on sequence divergence of the nontranscribed regions of the rRNA episome (17, 22). Our analysis showed that 26 of the 56 clones that could distinguish EH from ED were from the rRNA episome, but we identified 30 clones that could serve as the basis for novel species-specific tests. A number of the diagnostic approaches used to distinguish between EH and ED at this time have demonstrated regional biases in sensitivity and specificity (11, 55). Diagnostic tests to identify ED should be based on genetic regions that were uniformly divergent in all ED strains compared to all EH strains (Fig. 4, sections 3 and 3A). Genetic regions that are divergent in both ED strains but are also divergent in other EH strains (Table 2) (Fig. 4, sections 2B and 2C) would not be suitable for diagnostic tests. However, regions uniquely divergent in one EH or ED strain could be used for strain-typing assays (Table 2) (Fig. 4, sections 1 and 1A) (see Tables 3 and 7 in the supplemental material at http://microarray-pubs.stanford.edu/entamoeba_diversity/index.shtml).

The three EH strains (Rahman, HK-9, and 200:NIH) were also clearly distinct from each other and from EH (HM-1:IMSS). Overall, 1,124 clones were divergent in any one EH strain. Clones that have sequence divergence in any EH strain would not be ideal for EH-specific tests. The genes they contain may not be ideal vaccine targets (if also divergent in any clinical EH isolates) and are likely not essential for trophozoite growth in tissue culture (Table 2). Of the three EH strains analyzed, EH (HK-9) was the least divergent (5.5% of clones divergent compared to the reference strain). The other two EH strains (200:NIH and Rahman) had 6.6 and 7.5% of divergent clones, respectively.

The genotype of EH (Rahman), an attenuated-virulence strain, is distinct from the genotypes of the virulent EH strains. To generate genotypic dendrograms, the divergence categories were analyzed using the UPGMA algorithm (Fig. 5). The analysis revealed that we could successfully differentiate the genotypes of E. histolytica and E. dispar. Additionally, strain EH (Rahman), which is less virulent as assessed by decreased monolayer destruction, disease in animal models, and cytotoxicity and is the only EH strain isolated from an asymptomatic individual, was the most divergent of the EH strains and clustered separately from the virulent EH strains with significant bootstrap values (13, 27, 48–51). The separation of the EH (Rahman) strain from the other EH strains could also reflect the geographic origins of the strains and be unrelated to parasite phenotypes.
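The UPGMA step can be sketched in a few lines of Python. This is a generic illustration, not the MEGA implementation used for Fig. 5: the strain labels, the eight-clone category strings, and the choice of a simple mismatch fraction as the distance are all hypothetical, and no bootstrapping is performed.

```python
from itertools import combinations

# Hypothetical divergence categories for eight imaginary clones per strain.
PROFILES = {
    "EH-1": "DDDDDDDD",
    "EH-2": "DDDDDDDC",
    "EH-3": "DDCDDBDC",
    "ED-1": "ABCDADBC",
    "ED-2": "ABCDADAC",
}


def mismatch_distance(p, q):
    """Fraction of clones assigned to different categories (an assumption)."""
    return sum(a != b for a, b in zip(p, q)) / len(p)


def upgma(profiles):
    """Return the UPGMA tree as nested tuples of strain labels."""
    sizes = {name: 1 for name in profiles}  # cluster -> number of leaves
    dist = {
        frozenset((a, b)): mismatch_distance(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: dist[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Distance to the merged cluster is the size-weighted average.
        for other in sizes:
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


def leaves(tree):
    """Flatten a nested-tuple tree into a list of strain labels."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


tree = upgma(PROFILES)
print(tree)
```

In this toy input the two ED-like profiles pair first with each other and join the EH-like group only at the root, which is the kind of species-level split the dendrogram is meant to expose.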

Data from many more strains (especially clinical isolates) must be obtained before we can assess whether the genotype of avirulent EH strains is distinct from that of virulent EH strains. One previous study using RAPD also identified EH (Rahman) as distinct from virulent EH strains (64). However, unlike our study, that study was not able to identify the genetic regions that differentiated EH (Rahman) from the other EH strains. The strain that aligned closest with EH (HM-1:IMSS) was EH (HK-9), which upon original isolation was invasive but is now considered to possess attenuated virulence (4, 15). The loss of virulence in EH (HK-9) may be an epigenetic phenomenon, may be related to gene expression or due to genomic changes, or may have occurred during parasite isolation with selection of a less virulent subtype. For the purposes of this discussion, the assignment of virulence is based on the phenotype at the time of parasite isolation. All strains have been treated similarly (isolated from patients approximately 30 years ago; both ED strains are xenic; and all four EH strains are axenic); thus, the overall selection pressures within the ED and EH strains should be comparable.

Identification of E. histolytica genomic assemblies with unusually high divergence rates. In order to assess whether any regions of the EH genome were more prone to be divergent, we analyzed the extent of divergence in assemblies. The 12X genome sequence coverage of EH (HM-1:IMSS) at this time has been organized into 888 assemblies (assembly size ranges from 297 to 2 kb; assemblies contain 1 to 151 ORFs); of these assemblies, 554 are on the array (Neil Hall and Brendan Loftus [TIGR], personal communication). The size of the assembly correlated with the number of predicted ORFs (R = 0.9766), and the number of predicted ORFs correlated with the number of clones on the microarray (R = 0.9063) (data not shown).

We analyzed all the clones on the microarray for which we had a definitive assembly and ORF assignment, a total of 4,555 clones. Assemblies were categorized by the number of Entamoeba strains that were divergent in that assembly. The largest group of assemblies (39%) had no divergence in any Entamoeba strain, 16% had divergence in one strain, 19% had divergence in two strains, 12% had divergence in three strains, 9% had divergence in four strains, and 5.4% had divergence in all five Entamoeba strains analyzed (see Fig. 3 in the supplemental material at http://microarray-pubs.stanford.edu/entamoeba_diversity/index.shtml). The assemblies in all five categories were characterized and were highly similar for the following: (i) mean ORF length (387 to 403 bp); (ii) percentage of ORFs that are hypothetical proteins (55 to 62%); (iii) average genomic abundance as assessed by signal intensity (R/G, 0.50 to 0.57); and (iv) the percentage of the assembly that is composed of simple (2.1 to 3.0%), low-complexity (9.7 to 12.6%), SINE (short interspersed nuclear element) (1 to 2.2%), or LINE (long interspersed nuclear element) (8.7 to 10.9%) repeats.

Therefore, the identification of 30 assemblies in which all five amebic strains were divergent but whose structural features and composition were apparently not different from the rest of the genome may indicate that these genomic regions are under different and higher divergence pressures than the remainder of the genome. Although we were able to identify a number of assemblies that apparently have high rates of divergence, we did not identify large portions of individual assemblies that were divergent in a given Entamoeba strain (data not shown). Thus, unlike bacterial genomes, large islands of genomic DNA do not appear to have been affected by the pressures that cause genetic divergence (39, 57). Since we do not have full genome coverage on the microarray, it is possible that there are larger regions of divergence that we did not identify due to lack of clones in a given genomic region. Additionally, single-nucleotide polymorphisms cannot be detected by our approach.
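The figure of 30 assemblies can be related to the percentages reported above with a small calculation. The sketch assumes that the percentages are taken over the 554 assemblies represented on the array, which the text does not state explicitly; the variable and function names are illustrative.

```python
ASSEMBLIES_ON_ARRAY = 554  # assumed denominator for the reported percentages

# Percentage of assemblies by number of strains showing divergence.
PERCENT_BY_DIVERGENT_STRAINS = {0: 39, 1: 16, 2: 19, 3: 12, 4: 9, 5: 5.4}


def estimated_counts(total, percents):
    """Convert reported percentages into approximate assembly counts."""
    return {k: round(total * p / 100) for k, p in percents.items()}


counts = estimated_counts(ASSEMBLIES_ON_ARRAY, PERCENT_BY_DIVERGENT_STRAINS)
for n_strains, n_assemblies in counts.items():
    print(f"divergent in {n_strains} strain(s): about {n_assemblies} assemblies")
```

Because the published percentages are rounded, they sum to slightly more than 100, so the estimated counts are approximate.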

Identification of genetic loci associated with virulence. In order to identify genes associated with virulence, we analyzed the divergence data for genes divergent in ED compared to EH (Table 1). Overall, 1,640 to 1,817 loci were analyzed; the majority were highly conserved (94 to 98% in EH and 88 to


TABLE 2. Overview of the trends in genetic divergence in Entamoeba strains

Pattern 1
  Pattern of divergence: uniquely divergent in one strain (EH strain comparisons); uniquely divergent in one strain (ED strain comparisons)
  No. of clones: EH, 699; ED, 494
  No. of genes: EH, 84; ED, 161
  Representative genes: EH, 271.m00049 (guanine nucleotide releasing factor); EH, 302.m00042 (glutathione conjugate transporter); ED, 544.m00027 (hypothetical protein); ED, 251.m00074 (syntaxin 5)
  Potential use: novel genetic loci for strain typing

Pattern 2
  Pattern of divergence: divergent in any EH strain; divergent in any ED strain
  No. of clones: EH, 1,124; ED, 1,656
  No. of genes: EH, 151; ED, 253
  Representative genes: EH and ED, 303.m00071 (methylase); ED, 467.m00030 (grainin 2); ED, 197.m00073 (ABC family transporter)
  Potential use: not ideal for EH and ED diagnostic tests

Pattern 3
  Pattern of divergence: (a) divergent in both ED strains; (b) conserved in all EH strains; (c) in high genomic abundance
  No. of clones: (a) 673; (b) 74; (c) 56
  No. of genes: (a) 92; (b) 80; (c) 35
  Representative genes: (c) 56.m00158 (hypothetical protein); (c) 4.m00653 (GTP-binding protein)
  Potential use: (a, b) EH and ED diagnostic tests; (c) suited for DNA amplification technology

Pattern 4
  Pattern of divergence: not divergent in any strain
  No. of genes: 1,534
  Representative genes: 50.m00199 (Sec61); 180.m00107 (ribosomal protein); 8.m00351 (actin)
  Potential use: (a) essential genes; (b) important for host range

Pattern 5
  Pattern of divergence: high sequence divergence in any EH strain; high sequence divergence in any ED strain
  No. of genes: EH, 9; ED, 27
  Representative genes: EH, 303.m00071 (methylase); EH, 105.m00135 (Pats1); EH, 505.m00016 (KIAA1423 protein); EH, 423.m00019 (hypothetical protein); EH, 353.m00047 (hypothetical protein); ED, 4.m00653 (GTP-binding protein)
  Potential use: (a) not essential for trophozoite growth in tissue culture; (b) not ideal vaccine or drug targets

Pattern 6
  Pattern of divergence: conserved in all EH strains; divergent in both ED strains
  No. of genes: 80
  Representative genes: 565.m00023 (AIG-1 gene); 95.m00150 (putative protein kinase); 105.m00128 (peroxiredoxin 1-related); 395.m00027 (Ras family member)
  Potential use: deserve further investigation (any roles in virulence?)

Pattern 7
  Pattern of divergence: divergent in all Entamoeba strains with decreased virulence
  No. of genes: 9
  Representative genes: seven hypothetical proteins; 505.m00018 (Rab geranylgeranyltransferase); 34.m00240 (dihydrouridine synthase domain)
  Potential use: deserve further investigation (any roles in virulence?)


91% in ED), with 1,534 loci that were not divergent in any strain. The two ED strains (SAW1734 and SAW760) had the greatest number of divergent loci: 147 and 196, respectively. The EH strains had lower divergence: 111, 81, and 45 genes in 200:NIH, HK-9, and Rahman, respectively.

As previously mentioned, to date, only four genes have been identified as being absent (CP1, Ariel1, and Ehapt2) or highly divergent (CP5) in ED compared to EH (12, 67–69). We have now identified 80 novel genes that were divergent in both ED strains but conserved in all EH strains. The majority of these genes encode hypothetical proteins; however, we identified an Arabidopsis thaliana avrRpt2-induced gene (AIG1) (565.m00023) (a plant gene involved in resistance to bacteria), a putative protein kinase (95.m00150), a peroxiredoxin 1-related gene (105.m00128), and a Ras family gene (395.m00027). We also identified nine genes that were divergent in the two ED strains and in the attenuated-virulence strain EH (Rahman) but conserved in virulent EH strains. Seven of the genes encoded hypothetical proteins; one gene coded for Rab geranylgeranyltransferase beta subunit (505.m00018), and one gene had a dihydrouridine synthase domain (34.m00240) (Table 2).

In order to determine whether, despite their sequence dissimilarity, these genes encode functionally similar proteins in ED, we performed TBLASTX against the ED (SAW760) genome database on the 67 loci from the highly and significantly divergent categories and 30 random loci from the highly conserved category. The overall homology for the predicted proteins was much lower in the two divergent categories than in the highly conserved set (see Table 8 in the supplemental material at http://microarray-pubs.stanford.edu/entamoeba_diversity/index.shtml). Some genes from the highly or significantly divergent categories did have highly similar orthologues; these were likely identified as divergent on the basis of decreased copy number.

The overall paucity of genes restricted to virulent Entamoeba strains indicates that the presence or absence of a given gene (or sets of genes) may not fully explain the vastly different phenotypes of E. histolytica and E. dispar (Table 1). Expression differences in highly conserved genes, as known for the amebapore genes, undoubtedly play large roles in amebic pathogenic potential (46, 52). Since we did not have full genome coverage on our microarray, there are likely other genes restricted to EH that we did not identify. However, our array was composed of a random subset of genes, and a whole-genome analysis will likely reveal similar levels of divergence in ED. To validate that the genes we identified as divergent were functionally different, we checked the expression profiles of these genes. Genes divergent in ED (SAW760) had significantly lower expression levels than those in EH (HM-1:IMSS) (Ryan MacFarlane, personal communication).

In order to determine whether genes in category A were highly divergent or absent, we analyzed the 22 genes that were in this category for ED (SAW760). For each gene, we looked for homologues in the ED (SAW760) genome sequence database. Using these data, we identified 45% of genetic loci that appear to have diverged as a consequence of genetic drift, 23% that were divergent on the basis of copy number differences, and 32% that appear to be absent in the ED genome (although the limited 2X genome coverage makes it difficult to definitively state that a given gene is missing). On the basis of the CGH data and the limited sequence analysis, it appears that the majority of differences between ED (SAW760) and EH (HM-1:IMSS) resulted from genetic drift, rather than gene loss; however, a comprehensive genome analysis of ED will be necessary to make definitive conclusions.

We identified a number of loci that were divergent between EH strains, including two hypothetical proteins (423.m00019 and 353.m00047) and a Ras guanine nucleotide releasing factor (271.m00049) (see Tables 7 and 9 in the supplemental material at http://microarray-pubs.stanford.edu/entamoeba_diversity/index.shtml). In order to confirm the CGH data, we PCR amplified and sequenced these loci (Table 3). For all loci categorized as highly divergent, PCR gave negative results (40). The average sequence identities for loci in categories B, C, and D were 78, 84, and 99%, respectively. Some variability was seen in the sequence identities for categories B and C. Since the array data were based on clones that contained ORFs and adjacent genomic sequences, the divergence we measured by CGH for 423.m00019 may have been in noncoding regions. Intergenic differences can, however, reflect functional differences (i.e., divergent promoters may affect expression levels). Thus, as a screening tool to identify divergent genetic loci, array-based CGH was valid for EH-EH comparisons.

DISCUSSION

Four EH strains and two ED strains were genotyped on the basis of CGH of more than 7,600 random genomic clones. On the basis of the identification of divergent genetic loci, each strain had a unique genetic fingerprint, and the degree of divergence correlated with virulence. The unique pattern for each strain indicates that there is significant diversity among laboratory strains of EH. Whether such diversity exists among clinical isolates is not clear. Epidemiologic studies have demonstrated

FIG. 5. Genotypic dendrogram based on divergence data for four strains of E. histolytica [EH (HM-1:IMSS), EH (HK-9), EH (200:NIH), and EH (Rahman)] and two strains of E. dispar [ED (SAW760) and ED (SAW1734)] with phylogenetic and molecular evolutionary analyses conducted using MEGA version 2.2. The virulence phenotype (V, virulent; AV, avirulent) and geographic origins of the strains are shown to the right of the dendrogram. The UPGMA algorithm was used, and bootstrap values were calculated.


extensive genetic diversity among clinical EH isolates using PCR-based analyses of highly repetitive and polymorphic loci (32, 33, 72, 73). Although these regions easily incorporate polymorphisms due to DNA slippage during replication, the extent of divergence may not be truly representative of divergence across the genome (42, 56). Our data confirmed that a great deal of the divergence in Entamoeba strains occurred at highly abundant genomic regions, but we also identified significant portions of the genome that were not in high abundance but that were divergent. Thus, the CGH data more accurately represented the overall genomic diversity among Entamoeba strains.

In addition to giving a genome-wide overview of the extent of genetic diversity, CGH allowed identification of genetic regions that were highly prone to divergence. We identified assemblies in which every EH and ED strain studied had some divergence, indicating that these genomic regions may be hot spots for genetic variability. No obvious structural component of these regions that would predispose them to diverge could be identified, although their organization in scaffolds is not known. Genomic architecture can predispose to divergence, as spontaneous subtelomeric deletions of Plasmodium chromosomes have been observed and genomic regions rich in tRNA genes or with transposon recognition sequences are more prone to integration (3, 21, 41, 44, 53). Irrespective of structural issues, genomic regions can be under different evolutionary pressures (introns and pseudogenes are rapidly evolving) or functional pressures (highly antigenic genes may be under immunogenic pressures), and these pressures may affect the overall divergence in a given genomic context (45, 47).

A number of factors likely contribute to this observed genome plasticity in Entamoeba. Retrotransposons, which are abundant in EH, play a significant role in genome evolution by their ability to move to new chromosomal locations (10, 59). Significant numbers of E. histolytica genes are in multiple copies or are members of highly homologous gene families. Duplicate genes have higher rates of intron gain or loss in comparison to orthologues and are under accelerated evolutionary pressures either through decreased functional constraints or via positive selection (8, 16, 28, 38, 74). E. histolytica contains a number of episomally maintained DNA circles, which in other systems contribute to genome plasticity due to intramolecular homologous recombination between direct tandem repeats (20, 24). Whether genetic plasticity is enhanced in clinical scenarios to allow more-virulent strains to survive and disseminate or has resulted in a large subset of avirulent EH strains (which may account for the overall low rates of invasive E. histolytica infection) is not clear.

The genomic differences between E. histolytica and E. dispar were the most extensive. We identified a number of genes conserved in all EH strains and highly divergent in both ED strains, including a putative protein kinase, a peroxiredoxin 1-related gene, and a Ras family member. Peroxiredoxins are antioxidant enzymes that protect against reactive oxygen species and reactive nitrogen species (14, 37). Peroxiredoxin-null Saccharomyces cerevisiae cells are hypersensitive to oxidative stress and genomically unstable (70). Small GTP-binding proteins regulate many diverse processes in eukaryotic cells, including signal transduction, cell proliferation, cytoskeletal organization, and intracellular membrane trafficking. In Entamoeba, small GTP-binding proteins play roles in polarity, motility, phagocytosis, and perhaps encystation; ED may be affected in a number of cellular functions if functionally divergent in some of these genes (43, 54, 66). Nine genes were identified that were divergent in both ED and EH (Rahman) but conserved in all virulent EH strains. Most of these were hypothetical genes, but two had homologues in other systems: a Rab geranylgeranyltransferase and a gene with a dihydrouridine synthase domain. Functional studies will be necessary to elucidate the roles of these genes in amebic pathogenesis. The relatively low number of divergent loci in avirulent versus virulent Entamoeba is surprising, considering the vast biological differences between the two species. This implies that gain or loss of a gene may not be sufficient to confer pathogenicity. Thus, comparative genomics alone cannot unravel the complexities of virulence in these parasites. Expression levels, coordination, and timing, along with the genomic context, may all contribute to virulence and need further investigation.

TABLE 3. PCR and sequence comparisons of genetic loci divergent among EH strains (a)

Loci (TIGR annotation; length compared): 423.m00019 (hypothetical protein; 970 bp), 271.m00049 (Ras guanine nucleotide releasing factor; 1,000 bp), 353.m00047 (hypothetical protein; 1,000 bp), and 8.m00351 (actin, putative; 1,131 bp). Each cell gives the divergence category (Div cat) (b), followed by the % sequence identity or PCR result.

Strain or parameter      423.m00019        271.m00049     353.m00047                     8.m00351
EH (HM-1:IMSS)           D; 100            D; 100         D; 100                         D; 100
EH (200:NIH)             D; 100            A; PCR (−)     C; 83                          D; 100
EH (HK-9)                A; PCR (−) (c)    D; 100         B; 78 (and 53-bp insertion)    D; 100
EH (Rahman)              C; 94             D; 100         C; 74 (and 37-bp insertion)    D; 100
ED (SAW760)              A; PCR (−)        D; 93          NA (d); PCR (−)                D; 97
ED (SAW760) BLASTN (e)
  % Identity             79                93             72                             97
  Length (bp)            460               827            644                            1,131

(a) The genetic locus, TIGR annotation, divergence category, sequence identity, PCR, and sequence data for four genetic loci are listed.
(b) Div cat, divergence category.
(c) Loci for which PCRs did not work are listed as PCR (−).
(d) NA, CGH data not available.
(e) The BLASTN data against the ED (SAW760) genome database are included.
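The average sequence identities quoted for categories B, C, and D can be recomputed from Table 3 with the Python sketch below. The data layout and names are illustrative, and pooling every sequenced strain-locus pair within a category is an assumption about how the averages were taken; it is simply one approach that reproduces the reported values.

```python
from collections import defaultdict

# (divergence category, % identity) per strain and locus, transcribed from
# Table 3. None marks a negative PCR, for which no identity is available.
TABLE_3 = {
    "423.m00019": [("D", 100), ("D", 100), ("A", None), ("C", 94), ("A", None)],
    "271.m00049": [("D", 100), ("A", None), ("D", 100), ("D", 100), ("D", 93)],
    "353.m00047": [("D", 100), ("C", 83), ("B", 78), ("C", 74), ("NA", None)],
    "8.m00351": [("D", 100), ("D", 100), ("D", 100), ("D", 100), ("D", 97)],
}


def mean_identity_by_category(table):
    """Average % identity over all sequenced entries in each category."""
    values = defaultdict(list)
    for entries in table.values():
        for category, identity in entries:
            if identity is not None:
                values[category].append(identity)
    return {cat: sum(v) / len(v) for cat, v in values.items()}


mean_identity = mean_identity_by_category(TABLE_3)
for category in ("B", "C", "D"):
    print(category, round(mean_identity[category]))
```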


Using CGH, we identified that the genomic fingerprint of an EH strain with attenuated virulence was phylogenetically distinct from virulent EH strains; it is an intriguing observation, because it implies that genotypes may be associated with phenotypes. If a similar association were identified in clinical isolates, it would have enormous clinical implications, since only those individuals with invasive subtypes of EH would need treatment. At this time, studies to determine the extent of genetic diversity among clinical isolates and potential genotype-phenotype correlations are under way. Our work has laid the foundation for pursuing genome-wide population studies of E. histolytica and understanding the impact of parasite genetics on human disease.

ACKNOWLEDGMENTS

This work was supported in part by an Applied Genomics in Infectious Diseases training grant (5 T32 AI07502) and a Stanford University Dean fellowship for P.H.S., a Cellular and Molecular Biology Training Program grant (NIH 5 T32 GM007276) for R.C.M., a Stanford University Dean fellowship for D.B., and a Burroughs Wellcome collaboration grant and a grant from the NIAID (AI-053724) to U.S.

We gratefully acknowledge the help of Brendan Loftus, Neil Hall, and Iain Anderson (TIGR and Sanger Center) for access to EH clones and sequence and genome data. We also thank David Mirelman and Dan Eichinger for parasite strains, Sandeep Jaggi for help with data analysis, Charlie Kim for input on the GACK algorithm, Kevin Visconti for microarray printing, and all members of the lab for helpful suggestions and discussions.

REFERENCES

1. Abd-Alla, M. D., and J. I. Ravdin. 2002. Diagnosis of amoebic colitis by antigen capture ELISA in patients presenting with acute diarrhoea in Cairo, Egypt. Trop. Med. Int. Health 7:365–370.

2. Acuna-Soto, R., J. H. Maguire, and D. F. Wirth. 2000. Gender distribution in asymptomatic and invasive amebiasis. Am. J. Gastroenterol. 95:1277–1283.

3. Alano, P., C. Birago, L. Picci, M. Ponzi, P. Sallicandro, R. Scotti, and F. Silvestrini. 1999. Genome plasticity and sexual differentiation in Plasmodium. Parassitologia 41:149–151.

4. Arbo, A., M. Hoefsloot, A. Ramirez, and J. Ignacio Santos. 1990. Entamoeba histolytica inhibits the respiratory burst of polymorphonuclear leukocytes. Arch. Investig. Med. 21(Suppl. 1):57–61.

5. Arellano, J., J. Granados, E. Perez, C. Felix, and R. R. Kretschmer. 1991. Increased frequency of HLA-DR3 and complotype SC01 in Mexican mestizo patients with amoebic abscess of the liver. Parasite Immunol. 13:23–29.

6. Arellano, J., A. Isibasi, R. Miranda, F. Higuera, J. Granados, and R. R. Kretschmer. 1987. HLA antigens associated to amoebic abscess of the liver in Mexican mestizos. Parasite Immunol. 9:757–760.

7. Ayeh-Kumi, P. F., I. M. Ali, L. A. Lockhart, C. A. Gilchrist, W. A. Petri, Jr., and R. Haque. 2001. Entamoeba histolytica: genetic diversity of clinical isolates from Bangladesh as demonstrated by polymorphisms in the serine-rich gene. Exp. Parasitol. 99:80–88.

8. Bailey, J. A., R. Baertsch, W. J. Kent, D. Haussler, and E. E. Eichler. 2004. Hotspots of mammalian chromosomal evolution. Genome Biol. 5:R23. [Online.] http://genomebiology.com/2004/5/4/R23.

9. Beck, D. L., M. Tanyuksel, A. J. Mackey, R. Haque, N. Trapaidze, W. R. Pearson, B. Loftus, and W. A. Petri. 2002. Entamoeba histolytica: sequence conservation of the Gal/GalNAc lectin from clinical isolates. Exp. Parasitol. 101:157–163.

10. Bhattacharya, S., A. Bakre, and A. Bhattacharya. 2002. Mobile genetic elements in protozoan parasites. J. Genet. 81:73–86.

11. Blessmann, J., I. K. Ali, P. A. Nu, B. T. Dinh, T. Q. Viet, A. L. Van, C. G. Clark, and E. Tannich. 2003. Longitudinal study of intestinal Entamoeba histolytica infections in asymptomatic adult carriers. J. Clin. Microbiol. 41:4745–4750.

12. Bruchhaus, I., T. Jacobs, M. Leippe, and E. Tannich. 1996. Entamoeba histolytica and Entamoeba dispar: differences in numbers and expression of cysteine proteinase genes. Mol. Microbiol. 22:255–263.

13. Bujanover, S., U. Katz, R. Bracha, and D. Mirelman. 2003. A virulence attenuated amoebapore-less mutant of Entamoeba histolytica and its interaction with host cells. Int. J. Parasitol. 33:1655–1663.

14. Butterfield, L. H., A. Merino, S. H. Golub, and H. Shau. 1999. From cytoprotection to tumor suppression: the multifactorial role of peroxiredoxins. Antioxid. Redox. Signal. 1:385–402.

15. Calderon, J., and R. Tovar. 1986. Loss of susceptibility to complement lysis in Entamoeba histolytica HM1 by treatment with human serum. Immunology 58:467–471.

16. Castillo-Davis, C. I., T. B. Bedford, and D. L. Hartl. 2004. Accelerated rates of intron gain/loss and protein evolution in duplicate genes in human and mouse malaria parasites. Mol. Biol. Evol. 21:1422–1427.

17. Cazares, F., R. Manning-Cela, and I. Meza. 1994. Heterogeneity of the ribosomal DNA episome in strains and species of Entamoeba. Mol. Microbiol. 12:607–612.

18. Clark, C. G., and L. S. Diamond. 1993. Entamoeba histolytica: a method for isolate identification. Exp. Parasitol. 77:450–455.

19. Cleary, M. D., U. Singh, I. J. Blader, J. L. Brewer, and J. C. Boothroyd. 2002. Toxoplasma gondii asexual development: identification of developmentally regulated genes and distinct patterns of gene expression. Eukaryot. Cell 1:329–340.

20. Cohen, S., K. Yacobi, and D. Segal. 2003. Extrachromosomal circular DNA of tandemly repeated genomic sequences in Drosophila. Genome Res. 13:1133–1145.

21. Corcoran, L. M., J. K. Thompson, D. Walliker, and D. J. Kemp. 1988. Homologous recombination within subtelomeric repeat sequences generates chromosome size polymorphisms in P. falciparum. Cell 53:807–813.

22. Cruz-Reyes, J. A., W. M. Spice, T. Rehman, E. Gisborne, and J. P. Ackers. 1992. Ribosomal DNA sequences in the differentiation of pathogenic and non-pathogenic isolates of Entamoeba histolytica. Parasitology 104(Part 2):239–246.

23. de la Vega, H., C. A. Specht, C. E. Semino, P. W. Robbins, D. Eichinger, D. Caplivski, S. Ghosh, and J. Samuelson. 1997. Cloning and expression of chitinases of Entamoebae. Mol. Biochem. Parasitol. 85:139–147.

24. Dhar, S. K., N. R. Choudhury, A. Bhattacharaya, and S. Bhattacharya. 1995. A multitude of circular DNAs exist in the nucleus of Entamoeba histolytica. Mol. Biochem. Parasitol. 70:203–206.

25. Diamond, L. S., and C. G. Clark. 1993. A redescription of Entamoeba histolytica Schaudinn, 1903 (Emended Walker, 1911) separating it from Entamoeba dispar Brumpt, 1925. J. Eukaryot. Microbiol. 40:340–344.

26. Duggal, P., R. Haque, S. Roy, D. Mondal, R. B. Sack, B. M. Farr, T. H. Beaty, and W. A. Petri, Jr. 2004. Influence of human leukocyte antigen class II alleles on susceptibility to Entamoeba histolytica infection in Bangladeshi children. J. Infect. Dis. 189:520–526.

27. Dvorak, J. A., S. Kobayashi, T. Nozaki, T. Takeuchi, and C. Matsubara. 2003. Induction of permeability changes and death of vertebrate cells is modulated by the virulence of Entamoeba spp. isolates. Parasitol. Int. 52:169–173.

28. Emes, R. D., L. Goodstadt, E. E. Winter, and C. P. Ponting. 2003. Comparison of the genomes of human and mouse lays the foundation of genome zoology. Hum. Mol. Genet. 12:701–709.

29. Gathiram, V., and T. F. Jackson. 1985. Frequency distribution of Entamoeba histolytica zymodemes in a rural South African population. Lancet 1:719–721.

30. Ghosh, S., M. Frisardi, L. Ramirez-Avila, S. Descoteaux, K. Sturm-Ramirez, O. A. Newton-Sanchez, J. I. Santos-Preciado, C. Ganguly, A. Lohia, S. Reed, and J. Samuelson. 2000. Molecular epidemiology of Entamoeba spp.: evidence of a bottleneck (demographic sweep) and transcontinental spread of diploid parasites. J. Clin. Microbiol. 38:3815–3821.

31. Gomes, M. A., M. N. Melo, A. M. Macedo, C. Furst, and E. F. Silva. 2000. RAPD in the analysis of isolates of Entamoeba histolytica. Acta Trop. 75:71–77.

32. Haghighi, A., S. Kobayashi, T. Takeuchi, G. Masuda, and T. Nozaki. 2002. Remarkable genetic polymorphism among Entamoeba histolytica isolates from a limited geographic area. J. Clin. Microbiol. 40:4081–4090.

33. Haghighi, A., S. Kobayashi, T. Takeuchi, N. Thammapalerd, and T. Nozaki. 2003. Geographic diversity among genotypes of Entamoeba histolytica field isolates. J. Clin. Microbiol. 41:3748–3756.

34. Haque, R., I. M. Ali, and W. A. Petri, Jr. 1999. Prevalence and immune response to Entamoeba histolytica infection in preschool children in Bangladesh. Am. J. Trop. Med. Hyg. 60:1031–1034.

35. Haque, R., P. Duggal, I. M. Ali, M. B. Hossain, D. Mondal, R. B. Sack, B. M. Farr, T. H. Beaty, and W. A. Petri, Jr. 2002. Innate and acquired resistance to amebiasis in Bangladeshi children. J. Infect. Dis. 186:547–552.

36. Hellberg, A., R. Nickel, H. Lotter, E. Tannich, and I. Bruchhaus. 2001. Overexpression of cysteine proteinase 2 in Entamoeba histolytica or Entamoeba dispar increases amoeba-induced monolayer destruction in vitro but does not augment amoebic liver abscess formation in gerbils. Cell. Microbiol. 3:13–20.

37. Hofmann, B., H. J. Hecht, and L. Flohe. 2002. Peroxiredoxins. Biol. Chem. 383:347–364.

38. Katju, V., and M. Lynch. 2003. The structure and early evolution of recently arisen gene duplicates in the Caenorhabditis elegans genome. Genetics 165:1793–1803.

39. Kato-Maeda, M., J. T. Rhee, T. R. Gingeras, H. Salamon, J. Drenkow, N. Smittipat, and P. M. Small. 2001. Comparing genomes within the species Mycobacterium tuberculosis. Genome Res. 11:547–554.

40. Kim, C. C., E. A. Joyce, K. Chan, and S. Falkow. 2002. Improved analytical methods for microarray-based genome-composition analysis. Genome Biol. 3:RESEARCH0065. [Online.] http://genomebiology.com/2002/3/11/research/0065.

41. Kim, J. M., S. Vanguri, J. D. Boeke, A. Gabriel, and D. F. Voytas. 1998. Transposable elements and genome organization: a comprehensive survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome sequence. Genome Res. 8:464–478.

42. Kunkel, T. A. 1986. Frameshift mutagenesis by eucaryotic DNA polymerases in vitro. J. Biol. Chem. 261:13581–13587.

43. Labruyere, E., C. Zimmer, V. Galy, J. C. Olivo-Marin, and N. Guillen. 2003. EhPAK, a member of the p21-activated kinase family, is involved in the control of Entamoeba histolytica migration and phagocytosis. J. Cell Sci. 116:61–71.

44. Lecompte, O., R. Ripp, V. Puzos-Barbe, S. Duprat, R. Heilig, J. Dietrich, J. C. Thierry, and O. Poch. 2001. Genome evolution at the genus level: comparison of three complete genomes of hyperthermophilic archaea. Genome Res. 11:981–993.

45. Lehmann, T., C. R. Blackston, S. F. Parmley, J. S. Remington, and J. P. Dubey. 2000. Strain typing of Toxoplasma gondii: comparison of antigen-coding and housekeeping genes. J. Parasitol. 86:960–971.

46. Leippe, M., E. Bahr, E. Tannich, and R. D. Horstmann. 1993. Comparison of pore-forming peptides from pathogenic and nonpathogenic Entamoeba histolytica. Mol. Biochem. Parasitol. 59:101–109.

47. Li, W. H. 1997. Molecular evolution. Sinauer, Sunderland, Mass.

47a. Loftus, B., et al. The genome of the protist parasite Entamoeba histolytica. Nature, in press.

48. Mattern, C. F., and D. B. Keister. 1977. Experimental amebiasis. II. Hepatic amebiasis in the newborn hamster. Am. J. Trop. Med. Hyg. 26:402–411.

49. Mattern, C. F., D. B. Keister, and L. S. Diamond. 1979. Experimental amebiasis. IV. Amebal viruses and the virulence of Entamoeba histolytica. Am. J. Trop. Med. Hyg. 28:653–657.

50. McGowan, K., C. F. Deneke, G. M. Thorne, and S. L. Gorbach. 1982. Entamoeba histolytica cytotoxin: purification, characterization, strain virulence, and protease activity. J. Infect. Dis. 146:616–625.

51. Moody-Haupt, S., J. H. Patterson, D. Mirelman, and M. J. McConville. 2000. The major surface antigens of Entamoeba histolytica trophozoites are GPI-anchored proteophosphoglycans. J. Mol. Biol. 297:409–420.

52. Nickel, R., C. Ott, T. Dandekar, and M. Leippe. 1999. Pore-forming peptides of Entamoeba dispar. Similarity and divergence to amoebapores in structure, expression and activity. Eur. J. Biochem. 265:1002–1007.

53. Patarapotikul, J., and G. Langsley. 1988. Chromosome size polymorphism in Plasmodium falciparum can involve deletions of the subtelomeric pPFrep20 sequence. Nucleic Acids Res. 16:4331–4340.

54. Paveto, C., H. N. Torres, M. M. Flawia, M. Garcia-Espitia, A. Ortega, and E. Orozco. 1999. Entamoeba histolytica: signaling through G proteins. Exp. Parasitol. 91:170–175.

55. Ravdin, J. I., M. D. Abd-Alla, S. L. Welles, S. Reddy, and T. F. Jackson. 2003. Intestinal antilectin immunoglobulin A antibody response and immunity to Entamoeba dispar infection following cure of amebic liver abscess. Infect. Immun. 71:6899–6905.

56. Rich, S. M., R. R. Hudson, and F. J. Ayala. 1997. Plasmodium falciparum antigenic diversity: evidence of clonal population structure. Proc. Natl. Acad. Sci. USA 94:13040–13045.

57. Salama, N., K. Guillemin, T. K. McDaniel, G. Sherlock, L. Tompkins, and S. Falkow. 2000. A whole-genome microarray reveals genetic diversity among Helicobacter pylori strains. Proc. Natl. Acad. Sci. USA 97:14668–14673.

58. Sehgal, D., V. Mittal, S. Ramachandran, S. K. Dhar, A. Bhattacharya, and S. Bhattacharya. 1994. Nucleotide sequence organisation and analysis of the nuclear ribosomal DNA circle of the protozoan parasite Entamoeba histolytica. Mol. Biochem. Parasitol. 67:205–214.

59. Sharma, R., A. Bagchi, A. Bhattacharya, and S. Bhattacharya. 2001. Characterization of a retrotransposon-like element from Entamoeba histolytica. Mol. Biochem. Parasitol. 116:45–53.

60. Singh, U., and J. B. Rogers. 1998. The novel core promoter element GAAC in the hgl5 gene of Entamoeba histolytica is able to direct a transcription start site independent of TATA or initiator regions. J. Biol. Chem. 273:21663–21668.

61. Stauffer, W., and J. I. Ravdin. 2003. Entamoeba histolytica: an update. Curr. Opin. Infect. Dis. 16:479–485.

62. Tannich, E., and G. D. Burchard. 1991. Differentiation of pathogenic from nonpathogenic Entamoeba histolytica by restriction fragment analysis of a single gene amplified in vitro. J. Clin. Microbiol. 29:250–255.

63. Valdez, E., M. del Carmen Martinez, A. Gomez, R. Cedillo, J. Arellano, M. E. Perez, F. Ramos, P. Moran, E. Gonzalez, O. Valenzuela, E. I. Melendro, M. Ramiro, R. Kretschmer, O. Munoz, and C. Ximenez. 1999. HLA characterization in adult asymptomatic cyst passers of Entamoeba histolytica/E. dispar. Parasitol. Res. 85:833–836.

64. Valle, P. R., M. B. Souza, E. M. Pires, E. F. Silva, and M. A. Gomes. 2000. Arbitrarily primed PCR fingerprinting of RNA and DNA in Entamoeba histolytica. Rev. Inst. Med. Trop. Sao Paulo 42:249–253.

65. Walsh, J. A. 1986. Problems in recognition and diagnosis of amebiasis: estimation of the global magnitude of morbidity and mortality. Rev. Infect. Dis. 8:228–238.

66. Welter, B. H., R. C. Laughlin, and L. A. Temesvari. 2002. Characterization of a Rab7-like GTPase, EhRab7: a marker for the early stages of endocytosis in Entamoeba histolytica. Mol. Biochem. Parasitol. 121:254–264.

67. Willhoeft, U., H. Buss, and E. Tannich. 2002. The abundant polyadenylated transcript 2 DNA sequence of the pathogenic protozoan parasite Entamoeba histolytica represents a nonautonomous non-long-terminal-repeat retrotransposon-like element which is absent in the closely related nonpathogenic species Entamoeba dispar. Infect. Immun. 70:6798–6804.

68. Willhoeft, U., H. Buss, and E. Tannich. 1999. DNA sequences corresponding to the ariel gene family of Entamoeba histolytica are not present in E. dispar. Parasitol. Res. 85:787–789.

69. Willhoeft, U., L. Hamann, and E. Tannich. 1999. A DNA sequence corresponding to the gene encoding cysteine proteinase 5 in Entamoeba histolytica is present and positionally conserved but highly degenerated in Entamoeba dispar. Infect. Immun. 67:5925–5929.

70. Wong, C. M., K. L. Siu, and D. Y. Jin. 2004. Peroxiredoxin-null yeast cells are hypersensitive to oxidative stress and are genomically unstable. J. Biol. Chem. 279:23207–23213.

71. World Health Organization. 1997. Amoebiasis. Wkly. Epidemiol. Rec. 72:97–100.

72. Zaki, M., and C. G. Clark. 2001. Isolation and characterization of polymorphic DNA from Entamoeba histolytica. J. Clin. Microbiol. 39:897–905.

73. Zaki, M., P. Meelu, W. Sun, and C. G. Clark. 2002. Simultaneous differentiation and typing of Entamoeba histolytica and Entamoeba dispar. J. Clin. Microbiol. 40:1271–1276.

74. Zhang, Z., and H. Kishino. 2004. Genomic background predicts the fate of duplicated genes: evidence from the yeast genome. Genetics 166:1995–1999.
