

Copyright © 2010 by the Genetics Society of America
DOI: 10.1534/genetics.110.116616

Whole-Genome Profiling of Mutagenesis in Caenorhabditis elegans

Stephane Flibotte,*,† Mark L. Edgley,‡ Iasha Chaudhry,‡ Jon Taylor,‡ Sarah E. Neil,‡ Aleksandra Rogula,‡ Rick Zapf,‡ Martin Hirst,† Yaron Butterfield,† Steven J. Jones,† Marco A. Marra,† Robert J. Barstead§ and Donald G. Moerman*,‡,1

*Department of Zoology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada, †Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 4S6, Canada, ‡Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada and §Department of Molecular and Cell Biology, Oklahoma Medical Research Foundation, Oklahoma City, Oklahoma 73104

Manuscript received March 12, 2010
Accepted for publication April 28, 2010

ABSTRACT

Deep sequencing offers an unprecedented view of an organism’s genome. We describe the spectrum of mutations induced by three commonly used mutagens: ethyl methanesulfonate (EMS), N-ethyl-N-nitrosourea (ENU), and ultraviolet trimethylpsoralen (UV/TMP) in the nematode Caenorhabditis elegans. Our analysis confirms the strong GC to AT transition bias of EMS. We found that ENU mainly produces A to T and T to A transversions, but also all possible transitions. We found no bias for any specific transition or transversion in the spectrum of UV/TMP-induced mutations. In 10 mutagenized strains we identified 2723 variants, of which 508 are expected to alter or disrupt gene function, including 21 nonsense mutations and 10 mutations predicted to affect mRNA splicing. This translates to an average of 50 informative mutations per strain. We also present evidence of genetic drift among laboratory wild-type strains derived from the Bristol N2 strain. We make several suggestions for best practice using massively parallel short read sequencing to ensure mutation detection.

MUTAGENESIS and the screening for mutants have long been a key tool of the practicing geneticist. The early work of T. H. Morgan and his colleagues relied on recovery of spontaneous mutations, which was limiting for the study of inheritance due to their infrequent occurrence (Morgan et al. 1922; also see Sturtevant 1965). The discovery by H. J. Muller and others that X rays cause mutations ushered in the era of inducing mutations (Muller 1927). There is a long history of studies on mutagen specificity, both in prokaryotes and in eukaryotes, and today many mutagens are utilized in a variety of model organisms. In this article we use whole-genome deep sequencing in the model organism Caenorhabditis elegans to explore the types and frequencies of mutations induced by various mutagens and to document the feasibility of global identification of mutations.

The mutagenic properties of ethyl methanesulfonate (EMS) were first demonstrated using the T4 viral system (Loveless 1959). Soon after, Lewis and Bacher (1968) demonstrated how to administer EMS to Drosophila melanogaster to generate mutations, and later Sydney Brenner did the same for the nematode C. elegans (Brenner 1974). The now classic article by Coulondre and Miller (1977) demonstrated the types of nucleotide substitutions generated by EMS and confirmed earlier observations (Bautz and Freese 1960) concerning the strong bias for GC to AT transitions. Today, EMS is still the most powerful and popular mutagen used by researchers studying D. melanogaster and C. elegans. Purely on the basis of genetic inference, when used at a concentration of 50 mM, EMS is calculated to induce ~20 function-affecting variant alleles in C. elegans strains derived using this mutagen (Greenwald and Horvitz 1982; Anderson 1995).

The chemical N-ethyl-N-nitrosourea (ENU) has been used as a mutagen since the 1970s but came to prominence when it was demonstrated to be the most effective chemical mutagen in mice (Russell et al. 1979). Today it is still the chemical mutagen of choice for this organism (Anderson 2000; Acevedo-Arozena et al. 2008). ENU has also been used for C. elegans mutagenesis (De Stasio et al. 1997). Although it appears to have different biases with regard to gene targets and base changes relative to EMS, the background mutational load after ENU mutagenesis has not been fully characterized (De Stasio and Dorman 2001).

The chemical 4,5′,8-trimethylpsoralen is a crosslinking agent that is activated by near ultraviolet light. Studies in Escherichia coli have shown that it causes both single-base changes and deletions (Piette et al. 1985; Sladek et al. 1989). C. elegans researchers became interested in the potential of ultraviolet trimethylpsoralen (UV/TMP) to generate deletions in worms after the first deletions in this organism were isolated using this mutagen (Yandell et al. 1994). UV/TMP is now a major reagent in the arsenal of the C. elegans knockout consortium laboratories (Barstead and Moerman 2006). As a tool for generating deletions in eukaryotes it is quite useful but, outside of studies on prokaryotes, little else is known about the spectrum of mutagenic effects caused by UV/TMP.

Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.110.116616/DC1.

1 Corresponding author: Department of Zoology, University of British Columbia, 6270 University Blvd., Vancouver, BC V6T 1Z4, Canada. E-mail: [email protected]

Genetics 185: 431–441 (June 2010)

Massively parallel short read sequencing technologies offer unprecedented opportunities to study the complete genetic complement of an individual organism (Hillier et al. 2008). For genetic model systems the impact of this technology extends to the identification and correlation of induced mutations with selected phenotypes (Sarin et al. 2008). Several of the technological and bioinformatic issues that arise with next generation sequencing have already been addressed for the nematode C. elegans (Hillier et al. 2008; Sarin et al. 2008; Shen et al. 2008; Rose et al. 2010). Still, it is not clear how deeply one must sequence to confidently identify a relevant variant allele in a target mutant strain. Also of importance are questions concerning mutagen choice and dosage as they relate to the rate of induction of new mutations and background mutational load. We have undertaken the following study on mutagenesis and mutation detection to establish the parameters necessary to exploit next generation sequencing technologies for C. elegans genetics. For the first time we offer a whole-genome direct measure of mutation spectrum and background load for EMS, ENU, and UV/TMP. Readers interested in whole-genome sequencing of EMS mutagenized strains in C. elegans should also see the accompanying article in this issue by Sarin et al. (2010). In our study we also measured the single-nucleotide variation among currently used wild-type strains. In addition, we measured sequence read depth of all sequence and coding sequence and from this we make a recommendation of average genome coverage to ensure the correct identification of the causative mutation. We also examined the issue of false positive and false negative calls and make recommendations to eliminate most false positives without losing bona fide mutations.

MATERIALS AND METHODS

Strains used in this study: Homozygous unc-22 strains derived from mutagenesis were VC1923, VC1924, RB5000, and RB5002 (EMS); VC2362 and VC2366 (UV/TMP); and VC2451, VC2452, and RB5001 (ENU). DM1017 [unc-52(e3003e998) II; sup-38(ra5) IV] was generated by EMS mutagenesis of CB998 (Gilchrist and Moerman 1992). VC2010 is the Moerman Gene Knockout Lab subculture of N2 obtained from the Caenorhabditis Genetics Center in 2002 and is the parent strain for all unc-22 strains described here. A second N2 subculture, from the Schedl lab at Washington University (St. Louis, MO), designated WU N2, was also used in this work, but we used existing whole-genome sequence data (Hillier et al. 2008) instead of generating new data. The N2 reference DNA sequence used in this study was largely derived from cosmids constructed from a wild-type strain, although a portion of the reference was derived from yeast artificial chromosomes (YACs), which were produced from the strain CB1301, an endonuclease minus mutant derived from a wild-type strain by EMS mutagenesis (R. Waterston, personal communication).

Nematode culture and mutagenesis: Nematodes generally were grown as previously described (Brenner 1974). Populations for mutagenesis were prepared by washing VC2010 (N2) animals off starved plates with 10-ml aliquots of M9 buffer containing 0.01% Triton X-100, pelleting in a benchtop centrifuge in 15-ml centrifuge tubes, and replating on 150-mm NGM agar plates seeded with E. coli strain OP50 or χ1666. After 2 days at 20°, gravid adults were harvested by washing with sterile distilled water and the population was synchronized by alkaline hypochlorite treatment (Sulston and Hodgkin 1988), and the egg pellet was replated on fresh 150-mm seeded plates. When the populations were predominantly at the L4 stage, they were collected for mutagenesis by washing in M9/Triton X-100.

Mutagenesis with EMS was performed at 50 mM (for VC and DM strains) or 60 mM (for RB strains) according to standard protocols (Sulston and Hodgkin 1988). Mutagenesis by UV/TMP was performed at 2 µg/ml TMP according to our laboratory standard protocol. Briefly, 100 mg TMP (Sigma, St. Louis; T6137) was dissolved by vigorous shaking in 40 ml acetone (Gengyo-Ando and Mitani 2000) in a sterile 50-ml polypropylene centrifuge tube to make a solution of 2.5 mg/ml. A nematode population washed from a growth plate was transferred to a sterile 15-ml polypropylene centrifuge tube and the volume adjusted to 4 ml. TMP solution was added to the worm suspension to the desired concentration. The tubes were incubated in darkness on a benchtop shaker at 150 rpm for 1 hr. After incubation the worms were washed free of mutagen and bacteria in five changes of M9/Triton X-100 and diluted to 12.5 ml in M9/Triton. Twelve 1-ml aliquots were placed in individual wells of a sterile untreated 12-well tissue culture plate (Biopacific, E222804601F), and the plate was irradiated with 360 nm UV at 340 µW/cm² for 90 sec in a custom-built irradiation cabinet. Mutagenesis by ENU was performed at 0.5 mM according to De Stasio and Dorman (2001) for the VC strains and at 1.0 mM for the RB strains. It should be noted that batches of mutagen do vary in potency. In our experience TMP is particularly susceptible to batch variability, which makes it very difficult to arrive at a standard recommended concentration.

For each mutagen, 180 P0 animals were plated at three per 60-mm seeded agar plate. The F1 progeny were screened for heterozygous unc-22 twitchers in 1% nicotine (Moerman and Baillie 1979); from each such animal, a homozygous F2 unc-22 twitcher was selected for further processing. Each homozygous line was propagated clonally from F2 through F7 to drive other carried mutations to homozygosity, and a homozygous F7 animal was selected from each to establish a stock. Between two and four such F7 lines were selected for each mutagen. The VC and DM lines were subjected to comparative genomic hybridization (CGH) analysis using our whole-genome chip designs (Roche NimbleGen design name “2006-11-16_CE2_WG_CGH_E” for the EMS mutants and “081002_CE_UBC_RZ_CGH” for the TMP and ENU mutants) and to whole-genome sequencing on the Illumina platform at the Michael Smith Genome Sciences Centre (Vancouver, BC, Canada). The RB lines were not examined with CGH but were sequenced using the Illumina platform at the Oklahoma Medical Research Foundation (OMRF) (Oklahoma City, OK).

Whole-genome shotgun sequencing: Worm strains for sequencing were grown on 150-mm petri plates containing rich NGM medium (standard recipe but 8× peptone) with agarose [Invitrogen (Carlsbad, CA) UltraPure, catalog no. 16500-500] substituted for agar, seeded with E. coli χ1666. Populations were grown to starvation, harvested by washing with sterile M9/Triton X-100, and pelleted by centrifugation in sterile 15-ml centrifuge tubes, and the supernatant was removed by aspiration. The buffer was removed using five changes of sterile distilled water, centrifugation, and aspiration; after the final wash, water was removed, leaving a concentrated worm pellet in a minimum volume, typically between 300 and 500 µl. Worm concentrates were frozen at −80°.

Two methods were used for DNA preparation. DNA for strains VC2010 and VC1924 was prepared by standard phenol/chloroform extraction with RNAse treatment, precipitation, and resuspension in TE (pH 8.0). DNA for all other strains was prepared using the PureGene Genomic DNA Tissue Kit [QIAGEN (Valencia, CA) no. 158622], following a supplementary QIAGEN protocol for nematodes. DNA concentrations were determined using a Thermo Fisher Nanodrop spectrophotometer, and quality was checked by electrophoresis on 1% agarose gels.

Prepared DNAs were submitted to the Michael Smith Genome Sciences Centre (VC and DM strains) or the OMRF sequencing center (RB strains) in amounts ranging from 5 to 10 µg. The WU N2 strain, which was derived from the Bristol N2 strain, was sequenced by Hillier et al. (2008) with unpaired reads of 32 nucleotides in length while the VC2010 and VC1924 strains were sequenced with 36-mer unpaired reads. The other strains were sequenced with a more recent protocol using paired reads, with a read length of 37 for the RB strains (RB5000, RB5001, and RB5002) and 50 for all the remaining strains.

Sequence alignment, variant determination, and search for deletions: The sequence reads were aligned to the WormBase (www.wormbase.org) reference genome version WS190, using the Maq software suite version 0.7.1 (Li et al. 2008) with default parameters. The variant calling procedure also followed a standard Maq pipeline with default filters and parameters except that the maximum number of mismatches was set to three and the minimum mapping quality was set to 10 for a read to be considered in the consensus calling. Identical read pairs were removed before variant determination and only the unambiguous homozygous variants were kept. Variants with ambiguous consensus base (i.e., base other than A, C, T, or G) were eliminated. Furthermore, the minimum Phred-like consensus quality was set to 30 instead of the default 20, which provided a better compromise between sensitivity and specificity for the current application and data sets. The variant candidates were then labeled according to their type and location (within intron or exon, synonymous or nonsynonymous, etc.) with a custom Perl script, using gene information available in WS190.
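The filtering rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Call` record and its field names are hypothetical and do not reproduce Maq's actual output format, and the minimum depth of three reads is taken from the Maq default mentioned in RESULTS.

```python
from dataclasses import dataclass

UNAMBIGUOUS = {"A", "C", "G", "T"}

@dataclass
class Call:
    """Hypothetical consensus call at one position (not Maq's file format)."""
    chrom: str
    pos: int
    ref: str
    consensus: str   # IUPAC code; anything outside A/C/G/T is ambiguous
    quality: int     # Phred-like consensus quality
    depth: int       # number of reads covering the position

def keep_variant(call, min_quality=30, min_depth=3):
    """Return True for an unambiguous homozygous difference from the reference."""
    if call.consensus not in UNAMBIGUOUS:
        return False            # ambiguous (possibly heterozygous) consensus base
    if call.consensus == call.ref:
        return False            # not a variant
    return call.quality >= min_quality and call.depth >= min_depth

calls = [
    Call("I", 100, "G", "A", 45, 12),   # kept
    Call("I", 200, "C", "Y", 60, 20),   # ambiguous consensus, dropped
    Call("II", 300, "A", "T", 22, 9),   # below quality 30, dropped
    Call("X", 400, "T", "C", 38, 2),    # too few reads, dropped
]
kept = [c for c in calls if keep_variant(c)]
print([(c.chrom, c.pos) for c in kept])
```

Setting `min_quality=20` in this sketch would correspond to the default threshold that the text compares against.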

Searches for homozygous deletions >10 bases followed a two-step approach. First the regions with no coverage in the mutant strain under consideration but with some coverage in the N2 strain VC2010 were flagged for further analysis with the second step. The second step compared the distribution of apparent distance between the aligned read pairs in the regions of interest to the average distribution for the whole chromosome, using a Kolmogorov–Smirnov (KS) test. Those two steps were implemented with Perl and R scripts and the most promising deletion candidates were further evaluated by visual inspection. The most promising candidates were the regions with a small P-value in the KS test from the second step, larger size, and good agreement between the two size measurements, i.e., (1) the size of the region with lack of coverage from the first step and (2) the increase in apparent insert size from the second step.
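The second step can be illustrated with a small two-sample KS test written in plain Python. This is a sketch, not the Perl and R scripts used in the study: the function names are hypothetical, the insert sizes are invented toy values, and the P-value uses the standard asymptotic Kolmogorov series, which is only approximate for small samples.

```python
import math
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the gap can only change at observed values
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

def ks_pvalue(d, n_a, n_b, terms=100):
    """Asymptotic two-sided P-value for a two-sample KS statistic."""
    lam = math.sqrt(n_a * n_b / (n_a + n_b)) * d
    if lam < 0.2:
        return 1.0  # the series is numerically unstable here and P is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * total))

# Toy data: apparent insert sizes for the whole chromosome and for read
# pairs spanning a candidate region, where a deletion of about 120 bases
# makes the pairs look farther apart on the reference.
chromosome_inserts = [180 + (i % 41) for i in range(400)]
region_inserts = [300 + (i % 41) for i in range(30)]

d = ks_statistic(chromosome_inserts, region_inserts)
p = ks_pvalue(d, len(chromosome_inserts), len(region_inserts))
shift = (sum(region_inserts) / len(region_inserts)
         - sum(chromosome_inserts) / len(chromosome_inserts))
print(d, p, round(shift))
```

The shift in mean apparent insert size gives the second size measurement, which can then be compared with the length of the uncovered region from the first step.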

Comparative genomic hybridization: The VC and DM mutant strains were processed with whole-genome comparative genomic hybridization following the procedure described in detail in Maydan et al. (2007), using the N2 wild-type VC2010 strain for reference DNA. Roche NimbleGen manufactured the microarrays but the experiments and subsequent data processing were performed in the laboratory of D. G. Moerman.

RESULTS

Variation in wild-type strains: We began our study by analyzing our laboratory wild-type strain, VC2010, and comparing it to the sequences of two other wild-type strains, the C. elegans reference sequence and the recently reported sequence of the WU N2 strain (Hillier et al. 2008). All three sequenced strains are subcultures of the original N2 strain established by Brenner (1974), but were maintained separately for an unknown number of generations before sequencing. Thus, an uncertain amount of drift may have occurred. For this project we used the massively parallel short sequencing technology from Illumina (either a GA1 or a GA2 instrument). In addition, we also reanalyzed sequencing reads from the WU N2 strain from Washington University (Hillier et al. 2008). Sequencing reads were aligned using the Maq software suite (Li et al. 2008) to the wild-type Bristol N2 reference genome WS190 assembled from Sanger sequencing reads (C. elegans Sequencing Consortium 1998). Sequence processing details can be found in materials and methods. Variant candidates (single-nucleotide transitions and transversions) were called with the Maq suite primarily using default parameters and filters.

After obtaining an average sequence depth of coverage of 17-fold (Figure 1), we found 871 differences between the derived sequence of VC2010 and the reference sequence. Similar to the results reported by Hillier et al. (2008), in our reanalysis of their data (32-fold coverage), we detected 844 substitutions compared to the wild-type reference genome. Of these, 634 of the differences are shared between the two strains (Figure 2). These could represent accumulated mutations depending on the relationship of these two strains to the strains used for the reference sequence; more likely they represent errors in the reference sequence. Assuming that they all correspond to sequencing errors and that no others exist, we can estimate the sequencing error rate of the reference genome at a little less than 1 in 100,000 bases, which is in good agreement with previous estimates (C. elegans Sequencing Consortium 1998). Of the variants unique to each strain, almost all are in noncoding sequences or in silent bases of codons (synonymous changes). Some of these variants may represent sequencing errors, whereas others may represent differences accumulated due to drift during maintenance of the two strains. We return to this question after consideration of the mutagen-treated strains.
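The error-rate estimate above is simple arithmetic, sketched here for concreteness. The genome size of roughly 100 Mb is an assumption added for this illustration; the text does not state the value used.

```python
# Shared differences between both N2 subcultures and the reference (from the text).
shared_variants = 634

# Assumption for this sketch: a C. elegans genome of roughly 100 million bases.
genome_size = 100_000_000

error_rate = shared_variants / genome_size
bases_per_error = genome_size / shared_variants
print(f"{error_rate:.2e} errors per base, about 1 in {bases_per_error:,.0f} bases")
```

Under that assumption the rate comes out at roughly 1 error in 158,000 bases, which is consistent with "a little less than 1 in 100,000".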

Variant detection in EMS-, ENU-, and TMP/UV-treated strains: We next sequenced at varying depths of coverage (Figure 1) 10 strains derived from VC2010 after treatment with three commonly used mutagens using standard protocols. Five strains were produced with EMS, two with UV/TMP, and three with ENU (see materials and methods). As can be seen in Figure 1, in the current study the average genomic coverage for each strain ranged from 12× to 33× with a median value of 19×. Variant candidates (single-nucleotide transitions and transversions) were called with the Maq suite primarily using default parameters and filters. Since the default setting of this software requires a minimum of three reads to detect a variant, we evaluated the fraction of the genome covered to at least this depth (Figure 3). Even for the strains with the lowest overall coverage, >90% of the genomic bases were covered to a depth of at least three reads. Because coverage can be influenced by GC content and exons are GC rich compared to the rest of the genome, and also because we are primarily interested in variants in exons, we determined the fraction of exon bases with at least three reads and the fraction of exons with all bases covered by at least three reads (Figures 3 and 4). The exon coverage closely paralleled the overall genome coverage, but Figure 4 does illustrate that an overall coverage >20× is necessary to detect all the variants in more than 80% of the exons.
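The two coverage measures used here, the fraction of bases with at least three reads and the fraction of exons in which every base reaches that depth, can be sketched as follows. The function names, the per-base depth list, and the exon coordinates are hypothetical toy values, not data from the study.

```python
def fraction_bases_covered(depths, min_depth=3):
    """Fraction of positions whose read depth is at least min_depth."""
    return sum(d >= min_depth for d in depths) / len(depths)

def fraction_exons_fully_covered(depths, exons, min_depth=3):
    """Fraction of exons in which every base reaches min_depth.

    exons is a list of (start, end) pairs, 0-based and half-open.
    """
    full = sum(all(d >= min_depth for d in depths[start:end])
               for start, end in exons)
    return full / len(exons)

# Toy per-base depths for a 10-base "genome" with three annotated exons.
depths = [5, 4, 3, 2, 0, 6, 7, 8, 1, 3]
exons = [(0, 3), (2, 5), (5, 8)]

print(fraction_bases_covered(depths))               # 7 of 10 bases
print(fraction_exons_fully_covered(depths, exons))  # 2 of 3 exons
```

A single poorly covered base is enough to make an exon count as incomplete, which is why the per-exon measure falls faster than the per-base measure as coverage drops.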

The genomes of each of these mutagenized strains have a large number of mutations compared to the reference. Many of these differences are shared with the VC2010 parent strain or with other mutagenized strains (Figure 5). Because mutations are rare in each of these strains, the variants shared between strains could correspond to additional errors in the reference genome (see above), errors produced by some sort of bias in the current sequencing protocol, or genetic drift between VC2010 and the strain(s) used to create the reference genome (see below). Before analyzing the variants unique to each strain in more detail, we wished to evaluate the sensitivity and specificity of the variant calls.

Figure 1.—Average genome coverage for the various strains studied in this work. The average coverage is calculated by dividing the sum of the aligned bases by the genome size. The boxes are color coded according to the mutagens used to create the strains and the color key is provided in the inset.

Figure 2.—Number of variants called in two N2 wild-type strains (VC2010, Vancouver; WU N2, St. Louis) when the sequence of each is compared to the Cambridge wild-type reference genome WS190 and to each other. The blue bars correspond to variants common to VC2010 and WU N2 but different from the reference genome. The yellow plus red bars represent variants unique to either VC2010 or WU N2. The red bars are associated with nonsynonymous variants. The yellow bars are associated with synonymous variants.

Sensitivity and specificity analysis: We used two approaches to evaluate sensitivity. If we assume that the variants present in both wild-type N2 strains (VC2010 and WU N2) represent true differences from the reference genome, we can use these as a “true” positive set to estimate the ability to detect these in each of the mutant strains. Sensitivity ranged from 79 to 96% for individual mutant strains, with a median of 90% and a strong correlation to the average genome coverage (correlation coefficient r = 0.865; see Table 1). When focusing such analysis only on variants present in exons, the median sensitivity increases to ~95%.
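Both quantities in this analysis reduce to short calculations. The sketch below shows average coverage as defined in the Figure 1 caption and sensitivity measured against a "true" positive set. The function names are hypothetical, and the read count, genome size, and variant positions are invented for illustration.

```python
def average_coverage(total_aligned_bases, genome_size):
    """Sum of aligned bases divided by genome size (Figure 1 definition)."""
    return total_aligned_bases / genome_size

def sensitivity(called, truth):
    """Fraction of the 'true' positive set recovered among the called variants."""
    return len(called & truth) / len(truth)

# Illustration only: 40 million aligned 50-base reads on a genome assumed
# to be about 100 Mb give 20-fold average coverage.
print(average_coverage(40_000_000 * 50, 100_000_000))

# Toy variant sets keyed by (chromosome, position).
truth = {("I", 100), ("I", 250), ("II", 42), ("X", 7)}
called = {("I", 100), ("II", 42), ("X", 7), ("V", 99)}
print(sensitivity(called, truth))
```

Calls outside the truth set, such as the one on chromosome V here, do not enter the sensitivity calculation; in a mutant strain they may simply be induced mutations.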

The above sensitivity calculations are based on a biased data set; they derive only from regions of the genome where variants have already been detected in this study. We therefore estimated the sensitivity in a second way. We created 10,000 in silico single-nucleotide “mutations” at random locations in the reference genome and calculated our ability to detect these with real reads from our paired-end 50-mer data sets. At a depth of 20×, the overall sensitivity to detect the in silico mutations was 89% and reached 95% when focusing on exons.
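One way to set up such an in silico experiment is sketched below. The text does not give implementation details, so everything here is an assumption: the function names are hypothetical, the sequence is a random toy string, and positions and replacement bases are drawn uniformly.

```python
import random

BASES = "ACGT"

def make_in_silico_mutations(reference, n, seed=0):
    """Pick n distinct positions and a different base for each of them."""
    rng = random.Random(seed)
    candidates = [i for i, base in enumerate(reference) if base in BASES]
    mutations = {}
    for pos in rng.sample(candidates, n):
        ref = reference[pos]
        alt = rng.choice([b for b in BASES if b != ref])
        mutations[pos] = (ref, alt)
    return mutations

def apply_mutations(reference, mutations):
    """Return a copy of the reference carrying the simulated substitutions."""
    seq = list(reference)
    for pos, (_, alt) in mutations.items():
        seq[pos] = alt
    return "".join(seq)

toy_rng = random.Random(1)
toy_reference = "".join(toy_rng.choice(BASES) for _ in range(5000))
mutations = make_in_silico_mutations(toy_reference, n=100)
mutated = apply_mutations(toy_reference, mutations)
differences = sum(a != b for a, b in zip(toy_reference, mutated))
print(len(mutations), differences)
```

Real reads aligned against the altered sequence should then show each simulated site as a variant, and the fraction recovered by the calling pipeline gives a sensitivity estimate that does not depend on where variants were previously found.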

Figure 3.—Percentage of bases covered at a depth of at least 3× for each strain. The red bars correspond to the coverage for all the bases in the genome and the yellow bars to the bases within exons. The blue bars represent the percentage of exons for which all the bases are covered to a depth of at least 3×.

Figure 4.—Percentage of exons totally covered to a depth of at least 3× as a function of average genome-wide coverage. Each data point corresponds to a different strain studied in this work.

Detection specificity is the proportion of the variants called that are true differences from the reference genome. This requires comprehensive knowledge of the true positives in a data set. Specificity here can be estimated by analyzing the variants called for VC2010 provided two assumptions are made: (1) variants called in VC2010 but none of the derived strains are all false positives and (2) none of the remaining VC2010 variants (those called in multiple strains) are false positives. Under these two assumptions, the specificity for calling variants in the current VC2010 data set is 90%. Changing the minimum Phred-like consensus quality from the default of 20 to 30 improved the specificity from 77 to 90%, without significant loss in sensitivity (Table 1). A threshold consensus quality of 40 increases the specificity to 99%, but at a cost in sensitivity (Table 1). To evaluate specificity directly, we tested 40 predicted variants in the strain VC1924 by PCR of the affected regions and subsequent Sanger sequencing and confirmed 39 of them. The remaining one was not a single-base substitution but a single-base insertion also present in VC2010. On the basis of this sensitivity/specificity analysis the variants labeled as unique in Figure 5 are likely to represent the bulk of induced variants present in each strain with only a few false positives. Therefore we focused our further analysis on those candidates.
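The specificity estimate under the two stated assumptions amounts to a set comparison, sketched here with invented variant identifiers. The function name and the toy data are hypothetical.

```python
def specificity_estimate(parent_calls, derived_calls_by_strain):
    """Estimate specificity for the parent strain under the two assumptions.

    (1) A parent call seen in no derived strain is a false positive.
    (2) A parent call seen in at least one derived strain is a true positive.
    """
    seen_in_derived = set().union(*derived_calls_by_strain.values())
    false_positives = parent_calls - seen_in_derived
    return 1 - len(false_positives) / len(parent_calls)

# Toy example: 10 calls in the parent, 9 of them also seen in a derived strain.
parent_calls = set(range(1, 11))
derived_calls_by_strain = {
    "strain_A": {1, 2, 3, 4, 5},
    "strain_B": {4, 5, 6, 7, 8, 9, 20},  # 20 is a variant unique to strain_B
}
print(specificity_estimate(parent_calls, derived_calls_by_strain))
```

Because the derived strains were sequenced at varying depths, a genuine parent variant can be missed in all of them, so assumption (1) tends to make this estimate conservative.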

The mutation spectrum of EMS, ENU, and TMP/UV: On the basis of the unique variants (Figure 5) EMS produced more variants (median of 323 per strain) than ENU (median of 226) or UV/TMP (median of 78) at the concentrations used. As expected, the relative frequency of the possible transitions and transversions differed for each mutagen. UV/TMP produced the least amount of bias between the various mutation types; EMS primarily produced G/C to A/T transitions; and ENU produced A/T to T/A transversions, G/C to A/T transitions, and A/T to G/C transitions (Figure 6).
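The labels used above, such as "G/C to A/T", group each substitution with its equivalent on the opposite strand, which leaves six classes. A minimal sketch of that bookkeeping follows; the function names and the toy substitution list are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}

def substitution_class(ref, alt):
    """Collapse a substitution and its opposite-strand equivalent into one label."""
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError("expected two different bases from A, C, G, T")
    if ref in ("C", "T"):
        # Describe the change from the strand on which the reference base is a purine.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}/{COMPLEMENT[ref]} to {alt}/{COMPLEMENT[alt]}"

def is_transition(ref, alt):
    """Purine to purine or pyrimidine to pyrimidine; otherwise a transversion."""
    return (ref in PURINES) == (alt in PURINES)

def spectrum(substitutions):
    """Count substitutions per strand-collapsed class."""
    return Counter(substitution_class(ref, alt) for ref, alt in substitutions)

toy_calls = [("G", "A"), ("C", "T"), ("A", "T"), ("T", "A"), ("T", "C")]
for label, count in sorted(spectrum(toy_calls).items()):
    print(label, count)
```

With this convention a C to T change and a G to A change fall into the same G/C to A/T class, which is how the EMS bias is reported.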

These mutations appear to be distributed differently across gene features for the different mutagens (Figure 7). The proportion of variants within exons was the smallest for the UV/TMP strains and the largest for the EMS strains. For those mutations falling in exons, we found 477 missense and 21 nonsense mutations in the 10 mutant strains derived from the VC2010 wild-type strain. In addition, we found a total of 10 variants affecting splice sites (see Table 2). On average we identified 50 informative mutations per strain. The complete list of variants called in the mutant strains is available for download (see supporting information, File S1).

Deletion mutations in the mutagen-treated strains: Because Maq as used here does not detect insertion/deletion differences, we searched independently for large homozygous deletions in the sequence data sets produced with the paired end sequencing protocol (see Materials and Methods) and complemented the searches with CGH experiments. With the microarray designs used in the current study, CGH has previously been capable of detecting deletions up to megabases in size and mutations as small as single-base changes (Maydan et al. 2007, 2009). Only two deletion candidates were found in our mutagenized strains and these were confirmed using Sanger sequencing. The first deletion removes 122 bp of an exon of the unc-22 gene in the UV/TMP strain VC2366, coincidentally very close to a T to G point mutation in the same exon (Figure 8). The other deletion, ~200 bases in size, affected an intron in the gene arx-1 (Y71F9AL.16) in the ENU strain VC2452.

Figure 5.—Number of variants identified in the wild-type N2 strain VC2010 and the 10 mutated strains studied in this work. The boxes are color coded according to the mutagens used to create the strains and the color key is provided at the top left. The hatched bars correspond to the variants present in >1 of those 11 strains while the solid bars correspond to the unique variants.

TABLE 1

Variant detection sensitivity as a function of the threshold in Phred-like quality score

                      Sensitivity (%)   Sensitivity (%)   Sensitivity (%)
Strain    Coverage    quality >20       quality >30       quality >40
VC2366    11.9×       78.6              78.5              64.5
VC2362    12.7×       83.8              84.2              76.7
VC1923    12.8×       81.7              81.9              73.7
RB5000    14.0×       88.7              89.3              87.6
DM1017    16.0×       87.5              87.5              82.9
VC2451    20.3×       90.8              91.2              87.4
VC2452    23.0×       90.4              91.2              87.6
VC1924    23.4×       95.1              95.7              93.2
RB5002    29.1×       94.7              95.1              95.1
RB5001    30.1×       93.4              93.9              94.2

The variant detection sensitivity was evaluated for individual strains using three different thresholds in Phred-like quality score from Maq. Variants common to VC2010 and WU N2 were used for the evaluation, and the list comprised 655 variants at quality >20, 634 variants at quality >30, and 533 variants at quality >40.
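The evaluation summarized in Table 1 can be sketched as follows. The sketch assumes that sensitivity is the percentage of a reference list of variants (here, those common to VC2010 and WU N2 at the given threshold) that is also called in the strain above the same threshold. The function name, the variant keys, and the example values are hypothetical and are not taken from the Maq output.

```python
def sensitivity(reference_variants, called_variants, min_quality):
    """Percentage of reference variants recovered above a quality threshold.

    reference_variants: set of (chromosome, coordinate, mutant_base) keys.
    called_variants: dict mapping the same keys to a Phred-like quality score.
    """
    if not reference_variants:
        raise ValueError("reference variant list is empty")
    found = sum(
        1 for key in reference_variants
        if called_variants.get(key, 0) > min_quality
    )
    return 100.0 * found / len(reference_variants)


# Hypothetical reference list and calls for one strain (illustration only).
reference = {("I", 100, "T"), ("II", 200, "A"), ("V", 300, "G"), ("X", 400, "C")}
calls = {("I", 100, "T"): 45, ("II", 200, "A"): 25, ("V", 300, "G"): 31}

by_threshold = {q: sensitivity(reference, calls, q) for q in (20, 30, 40)}
```

In the study the reference list itself shrinks as the threshold rises (655, 634, and 533 variants), so a faithful reproduction would pass a different reference set for each threshold.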


Genetic drift in wild-type strains: The availability of whole-genome sequences for three separate “N2-like” wild-type strains, one from Cambridge, England, one from St Louis, and one from Vancouver, British Columbia, provides an opportunity to examine genetic drift. Since the sensitivity is not perfect, we cannot expect all variants to be called in all the data sets. To estimate the amount of genetic drift between VC2010 and WU N2 we concentrated on the 886 variants called in at least six (i.e., more than half) of the VC2010-derived strains (Figures 2 and 5). Assuming a sensitivity of 90% and no genetic drift, we calculate that we should detect 797 of those variants in the WU N2 sequence. However, we called only 662 of those variants. The difference between these two numbers, 135, is the estimate of the number of variants present in VC2010 but absent in WU N2. Further, there were 85 variants in the WU N2 that were not found in VC2010 or its derivatives. Since both the sensitivity and the specificity are estimated to be ~90%, we can use that number of 85 variants directly as an estimate of the variants really present in WU N2 but not in VC2010. Taken together, 135 + 85 gives us a reasonable estimate of the genetic drift between VC2010 and WU N2.
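The arithmetic behind this estimate can be written out explicitly. The sketch below only restates the numbers given in the text; the function and argument names are ours, and rounding the expected count to the nearest integer is an assumption.

```python
def drift_estimate(shared_variants, sensitivity, observed_in_other, unique_to_other):
    """Return (expected, missing, total_drift) for a pair of wild-type strains.

    expected: variants we should detect in the other strain if there were no drift
    missing:  variants present in the first strain but apparently absent in the other
    """
    expected = round(shared_variants * sensitivity)
    missing = expected - observed_in_other
    return expected, missing, missing + unique_to_other


# Values quoted in the text for VC2010 versus WU N2.
expected, missing, total = drift_estimate(
    shared_variants=886, sensitivity=0.90, observed_in_other=662, unique_to_other=85
)
```

This gives 797 expected variants, 135 variants attributed to VC2010 only, and a combined estimate of 220 variants separating the two strains.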

DISCUSSION

In this study we determined the spectrum of mutations across the entire C. elegans genome induced by three well-known mutagens: EMS, ENU, and UV/TMP. In total we detected 2723 variants and we placed these into two broad categories. In the first group, comprising 2215 variants, are those that do not affect any encoded protein. These variants map to intergenic regions or introns, or lead to synonymous changes. In the second group, comprising 508 variants, are those producing nonsynonymous changes (477 events), nonsense mutations (21 events), and mutations affecting splice sites (10 events). As we endeavored to make each animal homozygous across the entire genome prior to sequencing, our study probably detects only about half of the genetic events after mutagenesis. Even so, it is clear that each mutagenized animal carries on average a genetic load of hundreds of variants of which as many as 50 may have deleterious effects. This translates to about eight mutations on every chromosome in the worm. There are two implications of this observation. First, it emphasizes the need to outcross a mutant animal with a particular phenotype several times to ensure that all, or at least most, background mutations are eliminated. Second, for those wishing to identify the mutation responsible for a given phenotype, simply sequencing the mutant genome is not enough. Some genetic mapping should be done concurrently or one will be confounded by all the choices in candidate variants. A number of powerful new SNP, bulk segregant, or CGH mapping methods are

Figure 6.—Relative frequency of the various transitions and transversions identified in the 10 mutated strains studied in this work. The data for individual strains were combined according to the mutagen used. Only the variants appearing in a single strain were used. The color key identifying the type of mutation is provided in the inset, and the first letter represents the reference nucleotide while the second letter represents the mutated nucleotide.

Figure 7.—Relative variant frequency affecting various gene features in the 10 mutated strains studied in this work. The data for individual strains were combined according to the mutagen used. Only the variants appearing in a single strain are represented. The proportions of nonsynonymous variants are represented in blue (dark for nonsense and light for missense mutations), the synonymous variants within exons are shown in green, the variants appearing in introns are yellow, and red bars are used for the remaining variants.


available where traits can be mapped to a 5-map-unit interval, or even a smaller interval, after a single cross (Michelmore et al. 1991; Jakubowski and Kornfeld 1999; Wicks et al. 2001; Swan et al. 2002; Davis et al. 2005; Flibotte et al. 2009). In combination with deep sequencing these methods will provide an unprecedented resource for new forward genetics discoveries.

For EMS and ENU the spectrum of mutation events we observe is similar to what has been reported by others (Coulondre and Miller 1977; De Stasio et al. 1997). EMS produces primarily G/C to A/T transitions and ENU, while exhibiting some bias for these transitions, also produced many other transitions and transversions (Figure 6). UV/TMP demonstrated the least mutation bias, producing all manner of transitions and transversions (Figure 6). This was unexpected, as previous investigators reported a bias using this mutagen for TA to GC transversions (Piette et al. 1985). It should be noted, however, that this earlier study was performed on a single locus and sampled only a small number of mutations. The GC content within exons (43%) is much higher than the overall GC content (35%) of the C. elegans genome. It is therefore expected that the strong EMS bias to produce G to A and C to T transitions would translate into an increase in the proportion of variants affecting exons relative to UV/TMP, as can be seen in Figure 7. Following mutagenesis procedures and doses commonly used by the C. elegans research community (see Materials and Methods), we find that EMS is the most powerful mutagen, causing 1.5- to 2-fold more base substitutions than ENU and at least 3-fold more mutations than UV/TMP. Of striking note, none of the 31 nonsense or splice site variants identified were isolated after UV/TMP mutagenesis (Table 2): all were identified after either EMS or ENU mutagenesis. As each mutagen is distinct in behavior, choice of mutagen will be determined by what type of mutation a researcher desires. If the object is to obtain null alleles, EMS or possibly ENU is an excellent choice. On the other hand, if one is seeking allelic variability, including a variety of missense alleles, ENU may be a wiser choice because it will generate a broader spectrum of changes at high frequency.

From our study and those of others (Hillier et al. 2008; Denver et al. 2009; Vergara et al. 2009) it is clear that not all wild-type N2 subcultures are identical. Denver et al. (2009) estimated that the spontaneous

TABLE 2

Variants associated with nonsense mutations and splicing defects in the mutant strains sequenced in the current study

Strain  Allele  Chromosome  Coordinate (WS190)  Reference base  Mutant base  Quality  Mutation  Gene
DM1017  gk2895  III         10004481            G               T            81       Splicing  C05B5.11
RB5001  ok5304  III         7783172             C               T            57       Splicing  C08C3.3
RB5001  ok5478  X           15475897            A               C            135      Nonsense  C46E1.3
VC1923  gk1398  II          3721382             G               A            48       Splicing  C46E10.3
RB5002  ok5778  V           10149631            G               A            83       Splicing  C51E3.1
VC1924  gk963   I           8430347             C               T            111      Splicing  F02E9.9
RB5002  ok5863  X           1312996             C               T            95       Nonsense  H42K12.3
DM1017  gk2864  II          14421096            T               A            54       Nonsense  K04B12.1
VC2452  gk2668  V           16180639            G               A            111      Nonsense  K05D4.2
DM1017  gk2856  II          14115235            G               A            57       Nonsense  K09E4.4
RB5000  ok5083  III         2036666             C               T            90       Splicing  M01E10.2
VC2451  gk2294  I           4765803             G               T            39       Nonsense  M04F3.2
RB5002  ok5810  V           13368760            G               A            126      Nonsense  F32H5.6
RB5001  ok5385  V           1846700             A               T            105      Nonsense  T10B5.7
RB5002  ok5804  V           12945577            G               A            105      Nonsense  T16G1.8
VC1924  gk964   II          8930558             G               A            74       Splicing  T21B10.1
RB5002  ok5573  II          3779342             C               T            108      Splicing  T24E12.11
RB5002  ok5516  I           5111435             G               A            84       Nonsense  Y110A7A.15
VC1924  gk2021  V           18843877            C               T            81       Nonsense  Y17D7A.3
RB5000  ok5041  II          3107233             A               T            57       Nonsense  Y25C1A.3
VC1923  gk1442  II          12590593            G               A            36       Nonsense  Y38E10A.6
DM1017  gk2714  I           93554               C               T            126      Nonsense  Y48G1C.1
RB5002  ok5820  V           15195283            G               A            153      Nonsense  Y75B12B.6
DM1017  e998    II          14657282            C               T            57       Nonsense  ZC101.2
RB5001  ok5212  I           10356594            T               C            126      Splicing  ZC434.9
VC1924  gk952   III         9536643             G               T            57       Nonsense  ZK1098.3
VC2451  gk2406  IV          11978396            G               A            102      Nonsense  ZK617.1
VC1924  gk965   IV          11982077            G               A            90       Nonsense  ZK617.1
VC2452  gk2609  IV          11992038            G               A            135      Nonsense  ZK617.1
VC2452  gk2548  II          10497733            T               G            108      Splicing  F59B10.1
DM1017  gk2764  I           4371101             G               A            36       Nonsense  ZK973.9


forward mutation rate for C. elegans is 3 × 10⁻⁹ per site per generation. This translates to about one mutation every three generations. We identified hundreds of variants unique to our N2 strain or the WU N2 strain. Using a very conservative metric, we find that ~135 variants are unique to VC2010 and another 85 are unique to the WU wild-type strain. On the basis of the above calculation of the occurrence of spontaneous mutations and the time separating these wild-type strains, these variants are most likely the result of genetic drift that has occurred over the hundreds of generations these two strains have been separated, both from each other and from the reference Bristol N2 wild-type strain. This observation leads us to recommend that investigators sequence the parental strains used in any forward genetic screen to distinguish induced variation from variation due to drift. Granted, this could get expensive if all one’s mutations are in such different parental backgrounds that one has to sequence a different parent strain for every mutant. An alternative would be to sequence a second allele of the unknown gene and subtract the common background variants (for details on this approach see Sarin et al. 2010). Of course the limitation here is that a second allele will not always be available.
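The conversion from a per-site rate to a per-genome expectation is a one-line calculation, sketched here with the 100-Mb genome size quoted elsewhere in this article. The last function is a rough back-of-envelope illustration only: it assumes that new mutations accumulate in a lineage at the per-genome mutation rate and ignores selection, heterozygosity, and imperfect detection sensitivity. All names are ours.

```python
import math

MUTATION_RATE = 3e-9     # per site per generation (Denver et al. 2009, as quoted)
GENOME_SIZE_BP = 100e6   # approximate size of the C. elegans genome


def mutations_per_generation(rate=MUTATION_RATE, genome=GENOME_SIZE_BP):
    """Expected number of new mutations per genome per generation."""
    return rate * genome


def generations_per_mutation(rate=MUTATION_RATE, genome=GENOME_SIZE_BP):
    """Average number of generations between successive new mutations."""
    return 1.0 / mutations_per_generation(rate, genome)


def generations_to_accumulate(n_variants, rate=MUTATION_RATE, genome=GENOME_SIZE_BP):
    """Rough number of generations needed to accumulate n_variants (see caveats)."""
    return n_variants / mutations_per_generation(rate, genome)
```

Under these assumptions the rate corresponds to about 0.3 mutations per generation, or one mutation every three to four generations, and the ~135 variants attributed to VC2010 would correspond to roughly 450 generations, in line with the "hundreds of generations" mentioned in the text.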

In this deep sequencing study of the 100-Mb C. elegans genome we have tried to determine the most appropriate parameters to maximize sensitivity and specificity of variant detection without incurring undue cost. We explored base and exon coverage after varying sequencing depths (Figures 1, 3, and 4 and Table 1). We incorporated the idea of not just measuring genome coverage, but also measuring total exon coverage as this proved to be a better and more sensitive indicator of how well all exons of genes are represented in the sequence. Here we define the total exon coverage as the percentage of all exons for which all the bases are covered to a depth of at least three reads. Coverage much below 15× generally resulted in sensitivity too low to ensure detecting a mutation. As a case in point, we were unable to find the unc-22 mutation in strains VC1923 and VC2362 even though all VC strains reported here carry mutations in the unc-22 gene (see Materials and Methods for selection procedure after mutagenesis). For strains VC1923 and VC2362 the genome coverage was ~13-fold and at this level exons in genes were not adequately represented (Figures 3 and 4). In all strains with deeper coverage all unc-22 exons were covered and we could identify the mutation. On the basis of our experience with various sequencing depths we recommend aiming for 25× genome coverage to determine a homozygous mutation, as this would give close to 100% coverage of all exons to a depth of three reads. At this sequencing depth the variant detection sensitivity will be ~95% (Table 1).
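The two coverage measures discussed above can be computed from a per-base depth profile as sketched below. The sketch follows the definition of total exon coverage given in the text; the data layout (a list of depths for one sequence and exons as zero-based, half-open intervals) and the function names are assumptions made for illustration, and the numbers are invented.

```python
def percent_bases_covered(depth, min_depth=3):
    """Percentage of positions with at least min_depth reads."""
    return 100.0 * sum(d >= min_depth for d in depth) / len(depth)


def total_exon_coverage(depth, exons, min_depth=3):
    """Percentage of exons in which every base has at least min_depth reads."""
    fully_covered = sum(
        1 for start, end in exons
        if all(d >= min_depth for d in depth[start:end])
    )
    return 100.0 * fully_covered / len(exons)


# Invented depth profile for a 10-base sequence and four exons on it.
depth = [5, 4, 3, 2, 6, 7, 8, 0, 9, 9]
exons = [(0, 3), (2, 5), (4, 7), (8, 10)]

base_pct = percent_bases_covered(depth)      # 8 of 10 positions reach 3 reads
exon_pct = total_exon_coverage(depth, exons)  # exon (2, 5) contains a depth-2 base
```

A single poorly covered base is enough to disqualify an exon, which is why the exon-level measure responds more strongly to shallow sequencing than the base-level measure.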

Both read depth and Phred-like quality cutoffs have an impact on sensitivity and specificity. The default setting for the Phred-like consensus score in the Maq program is 20. By shifting this to 30 one gains improved specificity with only a marginal loss of sensitivity, but if one shifts the setting any higher, then the loss in sensitivity is significant (see Table 1). We examined a limited set of 40 variant calls in the strain VC1924 by Sanger sequencing and only one “false positive” was identified, a single-base insertion. Calling this type of feature accurately is a known weakness of the Maq aligner (Krawitz et al. 2010) especially when working with shorter 35-base unpaired reads like those used to analyze VC1924. For most of our samples we used a sequencing protocol with paired-end reads. New analysis algorithms and software associated with massively parallel sequencing technologies become available on a regular basis. Doing an extensive review of such analysis techniques is clearly beyond the scope of the current work. However, the nature of the false positive variant called in VC1924 suggests that one could benefit from using an algorithm capable of producing gapped alignments. To test this hypothesis, we reanalyzed all the sequencing data sets

Figure 8.—Deletion found on chromosome IV of the strain VC2366. The top panel shows the coverage in the region of interest. The middle panel shows the apparent distance between the alignments of read pairs to the reference genome. The bottom panel shows the fluorescence ratio in log2 scale in a comparative genomic hybridization experiment (Maydan et al. 2007).


used in the current work with the BWA/Samtools combination (Li and Durbin 2009; Li et al. 2009). The new analysis program correctly identified the single-base insertion relative to the reference genome in all 12 data sets, including in both wild-type strains (our unpublished results).

One of the goals of this study was to establish some guidelines for future genome analysis projects, as it is clear that next generation sequencing is an important addition to the tools used to correlate phenotype and genotype. The parameters we suggest above will become easier to attain as sequencing costs continue to drop and as new programs for analysis become available. Data quality is important for any deep sequencing project, even more so when the end product is a persistent genetic reagent like a C. elegans strain. One must ensure that the reported sequence truly represents the archived strain. Discrepancies that arise due to poor sequence or poor strain maintenance may be fatal to future genome-scaled projects that seek to correlate phenotypes with gene networks. Poor sequence can result from bad reactions, poor data analysis, or inconsistent/incomplete data annotation. We hope that the suggestions we make here will help others to avoid at least some of these possible pitfalls.

Even with sequence perfectly representing the genome, if the strains are not managed properly, the reported sequence will not represent the archived strain. For example, if the investigator does not freeze the strain for several months after the DNA is prepared for deep sequencing, then the archived strain will almost certainly accumulate variation via new mutation and drift as described above for our N2 subculture, VC2010. More significantly, if the investigator is not careful to drive the strain to homozygosity before the sequencing sample is prepared, the archived strain will lose variation through simple genetic segregation as it is propagated prior to freezing. In this study we self-crossed animals for 7 generations, but it would be even better to do 10 (a policy that we ourselves have adopted).

Massively parallel sequencing has the potential to completely change the genetic research landscape. For the first time an investigator can examine the full genotype of an organism. This will facilitate the study of phenotypes that are difficult to score, because they require specialized assays, are incompletely penetrant, or are variably expressed. Further, with a sufficiently large collection of mutated strains, where each gene is represented by multiple alleles, one could identify directly the genes responsible for a given screening phenotype simply by identifying the common gene targets among the relevant strains. Research on multicellular organisms has long lagged behind that on single-cell organisms like the yeast Saccharomyces cerevisiae in the study of genetic networks (reviewed in Dixon et al. 2009). Whole-genome sequencing should help close the gap and offers potentially new ways to study gene networks.

We thank Bob Waterston for advice and many helpful comments and Mark Johnston for editorial comments. We also thank Oliver Hobert for communication of data in advance of publication. We thank the British Columbia Cancer Association Genome Sciences Centre Functional Genomics Group for expert technical assistance in library construction and sequencing. This research was supported by grants from Genome Canada and Genome British Columbia, the Michael Smith Research Foundation, and the Canadian Institute for Health Research to D.G.M. and by National Institutes of Health National Human Genome Research Institute grant P41HG003652 to R.B. M.A.M. and S.J.J. are Scholars of the Michael Smith Foundation for Health Research. D.G.M. is a Fellow of the Canadian Institute for Advanced Research.

LITERATURE CITED

Acevedo-Arozena, A., S. Wells, P. Potter, M. Kelly, R. D. Cox et al., 2008 ENU mutagenesis, a way forward to understand gene function. Annu. Rev. Genomics Hum. Genet. 9: 49–69.

Anderson, K. V., 2000 Finding the genes that direct mammalian development: ENU mutagenesis in the mouse. Trends Genet. 16: 99–102.

Anderson, P., 1995 Mutagenesis. Methods Cell Biol. 48: 31–58.

Barstead, R. J., and D. G. Moerman, 2006 C. elegans deletion mutant screening. Methods Mol. Biol. 351: 51–58.

Bautz, E., and E. Freese, 1960 On the mutagenic effects of alkylating agents. Proc. Natl. Acad. Sci. USA 46: 1585–1594.

Brenner, S., 1974 The genetics of Caenorhabditis elegans. Genetics 77: 71–94.

C. elegans Sequencing Consortium, 1998 Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282: 2012–2018.

Coulondre, C., and J. H. Miller, 1977 Genetic studies of the lac repressor. III. Additional correlation of mutational sites with specific amino acid residues. J. Mol. Biol. 117: 525–567.

Davis, M. W., M. Hammarlund, T. Harrach, P. Hullett, S. Olsen et al., 2005 Rapid single nucleotide polymorphism mapping in C. elegans. BMC Genomics 6: 118.

Denver, D. R., P. C. Dolan, L. J. Wilhelm, W. Sung, J. I. Lucas-Lledo et al., 2009 A genome-wide view of Caenorhabditis elegans base-substitution mutation processes. Proc. Natl. Acad. Sci. USA 106: 16310–16314.

De Stasio, E. A., and S. Dorman, 2001 Optimization of ENU mutagenesis of Caenorhabditis elegans. Mutat. Res. 495: 81–88.

De Stasio, E., C. Lephoto, L. Azuma, C. Holst, D. Stanislaus et al., 1997 Characterization of revertants of unc-93(e1500) in Caenorhabditis elegans induced by N-ethyl-N-nitrosourea. Genetics 147: 597–608.

Dixon, S. J., M. Costanzo, A. Baryshnikova, B. Andrews and C. Boone, 2009 Systematic mapping of genetic interaction networks. Annu. Rev. Genet. 43: 601–625.

Flibotte, S., M. L. Edgley, J. Maydan, J. Taylor, R. Zapf et al., 2009 Rapid high resolution single nucleotide polymorphism-comparative genome hybridization mapping in Caenorhabditis elegans. Genetics 181: 33–37.

Gengyo-Ando, K., and S. Mitani, 2000 Characterization of mutations induced by ethyl methanesulfonate, UV, and trimethylpsoralen in the nematode Caenorhabditis elegans. Biochem. Biophys. Res. Commun. 269: 64–69.

Gilchrist, E. J., and D. G. Moerman, 1992 Mutations in the sup-38 gene of Caenorhabditis elegans suppress muscle-attachment defects in unc-52 mutants. Genetics 132: 431–442.

Greenwald, I. S., and H. R. Horvitz, 1982 Dominant suppressors of a muscle mutant define an essential gene of Caenorhabditis elegans. Genetics 101: 211–225.

Hillier, L. W., G. T. Marth, A. R. Quinlan, D. Dooling, G. Fewell et al., 2008 Whole-genome sequencing and variant discovery in C. elegans. Nat. Methods 5: 183–188.

Jakubowski, J., and K. Kornfeld, 1999 A local, high-density, single-nucleotide polymorphism map used to clone Caenorhabditis elegans cdf-1. Genetics 153: 743–752.

Krawitz, P., C. Rodelsperger, M. Jager, L. Jostins, S. Bauer et al., 2010 Microindel detection in short-read sequence data. Bioinformatics 26: 722–729.


Lewis, E. B., and F. Bacher, 1968 Methods of feeding ethyl methane sulfonate (EMS) to Drosophila. Dros. Inf. Serv. 43: 193.

Li, H., and R. Durbin, 2009 Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.

Li, H., J. Ruan and R. Durbin, 2008 Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18: 1851–1858.

Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., 2009 The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.

Loveless, A., 1959 The influence of radiomimetic substances on deoxyribonucleic acid synthesis and function studied in Escherichia coli/phage systems. III. Proc. R. Soc. Lond. B Biol. Sci. 150: 497–508.

Maydan, J. S., S. Flibotte, M. L. Edgley, J. Lau, R. R. Selzer et al., 2007 Efficient high-resolution deletion discovery in Caenorhabditis elegans by array comparative genomic hybridization. Genome Res. 17: 337–347.

Maydan, J. S., H. M. Okada, S. Flibotte, M. L. Edgley and D. G. Moerman, 2009 De novo identification of single nucleotide mutations in Caenorhabditis elegans using array comparative genomic hybridization. Genetics 181: 1673–1677.

Michelmore, R. W., I. Paran and R. V. Kesseli, 1991 Identification of markers linked to disease-resistance genes by bulked segregant analysis: a rapid method to detect markers in specific genomic regions by using segregating populations. Proc. Natl. Acad. Sci. USA 88: 9828–9832.

Moerman, D. G., and D. L. Baillie, 1979 Genetic organization in Caenorhabditis elegans: fine-structure analysis of the unc-22 gene. Genetics 91: 95–103.

Morgan, T. H., A. H. Sturtevant, H. J. Muller and C. B. Bridges, 1922 The Mechanism of Mendelian Heredity. H. Holt & Co., New York.

Muller, H. J., 1927 Artificial transmutation of the gene. Science 66: 84–87.

Piette, J., D. Decuyper-Debergh and H. Gamper, 1985 Mutagenesis of the lac promoter region in M13 mp10 phage DNA by 4′-hydroxymethyl-4,5′,8-trimethylpsoralen. Proc. Natl. Acad. Sci. USA 82: 7355–7359.

Rose, A. M., N. J. O’Neil, M. Bilenky, Y. S. Butterfield, N. Malhis et al., 2010 Genomic sequence of a mutant strain of Caenorhabditis elegans with an altered recombination pattern. BMC Genomics 11: 131.

Russell, W. L., E. M. Kelly, P. R. Hunsicker, J. W. Bangham, S. C. Maddux et al., 1979 Specific-locus test shows ethylnitrosourea to be the most potent mutagen in the mouse. Proc. Natl. Acad. Sci. USA 76: 5818–5819.

Sarin, S., S. Prabhu, M. M. O’Meara, I. Pe’er and O. Hobert, 2008 Caenorhabditis elegans mutant allele identification by whole-genome sequencing. Nat. Methods 5: 865–867.

Sarin, S., V. Bertrand, H. Bigelow, A. Boyanov, M. Doitsidou et al., 2010 Analysis of multiple ethyl methanesulfonate-mutagenized Caenorhabditis elegans strains by whole-genome sequencing. Genetics 185: 417–430.

Shen, Y., S. Sarin, Y. Liu, O. Hobert and I. Pe’er, 2008 Comparing platforms for C. elegans mutant identification using high-throughput whole-genome sequencing. PLoS ONE 3: e4012.

Sladek, F. M., A. Melian and P. Howard-Flanders, 1989 Incision by UvrABC excinuclease is a step in the path to mutagenesis by psoralen crosslinks in Escherichia coli. Proc. Natl. Acad. Sci. USA 86: 3982–3986.

Sturtevant, A. H., 1965 A History of Genetics. Harper & Row, New York.

Sulston, J., and J. Hodgkin, 1988 Methods, pp. 587–606 in The Nematode Caenorhabditis elegans, edited by W. B. Wood. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Swan, K. A., D. E. Curtis, K. B. McKusick, A. V. Voinov, F. A. Mapa et al., 2002 High-throughput gene mapping in Caenorhabditis elegans. Genome Res. 12: 1100–1105.

Vergara, I. A., A. K. Mah, J. C. Huang, M. Tarailo-Graovac, R. C. Johnsen et al., 2009 Polymorphic segmental duplication in the nematode Caenorhabditis elegans. BMC Genomics 10: 329.

Wicks, S. R., R. T. Yeh, W. R. Gish, R. H. Waterston and R. H. Plasterk, 2001 Rapid gene mapping in Caenorhabditis elegans using a high density polymorphism map. Nat. Genet. 28: 160–164.

Yandell, M. D., L. G. Edgar and W. B. Wood, 1994 Trimethylpsoralen induces small deletion mutations in Caenorhabditis elegans. Proc. Natl. Acad. Sci. USA 91: 1381–1385.

Communicating editor: D. I. Greenstein


Supporting Information http://www.genetics.org/cgi/content/full/genetics.110.116616/DC1



FILE S1

LIST OF VARIANTS CALLED IN THE MUTAGENIZED STRAINS

File S1 is available for download as a text file at http://www.genetics.org/cgi/content/full/genetics.110.116616/DC1. The downloadable text file is in a comma-separated format. The labels in the header are self-explanatory except that the Quality in the sixth column refers to the Phred-like quality score from the Maq software. A Gene Feature entry with a * after the word intron in the eighth column indicates that the mutation is associated with a splicing defect. Only the variants appearing in a single strain are listed and the Coordinate entries in the third column refer to the reference genome available at Wormbase version WS190.
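A reader wishing to work with File S1 programmatically might start from a sketch such as the following. Only the positions of the Coordinate (third), Quality (sixth), and Gene Feature (eighth) columns are taken from the description above; the header labels, the remaining columns, and the sample rows are placeholders invented for illustration, and the exact spelling of the intron marker in the real file may differ.

```python
import csv
import io

# Zero-based positions of the third, sixth, and eighth columns.
COORDINATE, QUALITY, GENE_FEATURE = 2, 5, 7

# Hypothetical excerpt: header labels and rows are placeholders, not File S1 data.
SAMPLE = """col1,col2,Coordinate,col4,col5,Quality,col7,Gene Feature
A,a1,12345,G,A,81,x,intron*
A,a2,67890,C,T,57,y,exon
B,a3,13579,A,T,25,z,intron
"""


def read_variants(handle, min_quality=0.0):
    """Parse a comma-separated variant list, keeping rows at or above min_quality."""
    reader = csv.reader(handle)
    next(reader)  # skip the header line
    variants = []
    for row in reader:
        quality = float(row[QUALITY])
        if quality < min_quality:
            continue
        feature = row[GENE_FEATURE].strip()
        variants.append({
            "coordinate": int(row[COORDINATE]),
            "quality": quality,
            "feature": feature,
            # A '*' after the word intron marks a variant associated with a splicing defect.
            "splicing": "intron" in feature and "*" in feature,
        })
    return variants


variants = read_variants(io.StringIO(SAMPLE), min_quality=30)
```

For the real file, the `io.StringIO(SAMPLE)` argument would be replaced by an open file handle to the downloaded text file.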