
There's no place like WormBase: an indispensable resource for Caenorhabditis elegans researchers


Biol. Cell (2005) 97, 867–872 (Printed in Great Britain) doi:10.1042/BC20040155

Scientiae forum: My Favourite Sites

Kevin O’Connell¹

Laboratory of Biochemistry and Genetics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, U.S.A.

The nematode Caenorhabditis elegans is used extensively by scientists to study a wide variety of biological processes and is one of the most thoroughly characterized animals. Over the years, the community of C. elegans researchers has generated a wealth of information on the genetics, development, behaviour, and cellular and molecular biology of the worm. This body of data has grown even larger with the recent application of high-throughput screening methodology to study gene function, expression and interactions. WormBase (http://www.wormbase.org) is the primary online source of biological data on C. elegans and related nematodes. Equipped with an assortment of powerful search tools, WormBase allows users to quickly extract a variety of information, including data on individual genes, DNA sequence, cell lineage and literature citations. As the database is well maintained and the functionalities constantly modified in response to evolving researcher needs, WormBase has become a vital component of the laboratories studying the worm and a model for other biological databases.

Caenorhabditis elegans is a small free-living soil nematode that has gained widespread popularity among cellular, developmental and neurobiologists as a model organism. Its main attributes are its simple anatomy (fewer than 1000 somatic cells), short life cycle (3 days), ease of cultivation, small and completely sequenced genome, and the presence of powerful forward and reverse genetic methodologies. Since Sidney Brenner first published on the genetics of C. elegans in 1974, the rate at which biological data on the worm are generated has rapidly accelerated. Buried within this voluminous body of data are key bits of information that can help researchers solve a variety of biological problems. To efficiently extract the necessary information, researchers need a reliable and up-to-date database, equipped with powerful search tools. WormBase (http://www.wormbase.org) fulfills this need as the central data repository for C. elegans and related nematodes, and does an outstanding job of presenting an unwieldy amount of data in a clear and accessible manner. WormBase is produced by a consortium of biologists and computer scientists headed by Paul Sternberg of Caltech, Richard Durbin of The Wellcome Trust Sanger Institute, Lincoln Stein of Cold Spring Harbor Laboratory and John Spieth of the Washington University Genome Sequencing Center. The main site is located at Cold Spring Harbor Laboratory (http://www.wormbase.org) with mirror sites located in Greece (http://worm.imbb.forth.gr) and in Pasadena (http://caltech.wormbase.org). While each investigator has their own preferred manner of utilizing WormBase, I will focus on those aspects of the database that I feel are most commonly used. For additional information on WormBase, the interested reader should consult the WormBase User’s Guide or the help pages embedded in the database.

¹ email [email protected]

Key words: Caenorhabditis elegans, database, genomics, Internet, WormBase.

Abbreviations used: RNAi, RNA interference; SNP, single nucleotide polymorphism; YAC, yeast artificial chromosome.

One of the attributes that makes WormBase such a powerful resource is the manner in which disparate, but related, forms of data are organized, presented and linked. To appreciate the enormity of this task, consider the fact that there are approximately 19 000 genes in C. elegans (and an equal number in each related nematode included in the database). For each of these genes there might exist data describing its genomic sequence and molecular structure, cDNA sequence, protein sequence, closest relatives in other species, expression pattern, genetic and physical position, function, interacting genes or proteins, available reagents (such as cDNAs, alleles and antibodies) and literature citations.

www.biolcell.org | Volume 97 (11) | Pages 867–872

In a clear, concise and effective manner, WormBase presents the relevant data on each gene in the form of a Gene Summary page (Figure 1). To access the summary page for a given gene, one can use the search function on the main page. Once you arrive at a Gene Summary page, you can peruse the available information, which is neatly sorted into classes. The main data classes are: Identification, Location, Function, Gene Ontology Classification, Alleles, Similarities, Reagents and Bibliography. However, within a particular class, there often exists a variety of data types that substantiate that facet of a gene’s biology. Thus some classes are further organized into subclasses. For instance, under the class heading Function, the following subclasses are listed: Mutant Phenotype, RNAi Phenotype, Anatomic Expression Pattern, Affymetrix Microarray Expression Data, SMD Microarray Expression Data, Microarray Topology Map and Protein Domains. In each subclass, there are hypertext links to the corresponding data if available. In addition, each Gene Summary page is linked to other pages containing additional information about the gene. For molecularly defined genes, the Gene Summary page is linked to a Sequence Report page that stores information that is more specific to a gene’s genomic sequence and context. And for those genes for which mutant alleles have been identified, there is a Locus Summary page, with links to mapping data, allele descriptions and relevant literature. Thus the structure of WormBase is such that pertinent information on any gene can be easily and quickly retrieved.

WormBase, however, is much more than a collection of records on each gene. It also contains a large number of basic and advanced search tools. The Simple Search tool on the main page allows users to search the entire database using key words. The user can make the search broad (by selecting any object type that matches the key word) or narrow by searching for a match to a specific object type (author, cell, sequence, etc.). One can also perform a detailed search that expands to include the full text of all abstracts from papers and meetings. Thus the Simple Search serves as the main portal to information contained in the database.
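The broad, narrow and detailed search modes can be pictured with a small in-memory sketch. Everything here is hypothetical: the `Record` fields, the toy entries and the `simple_search` function are illustrative assumptions and do not reflect WormBase's actual schema or search implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    object_type: str  # e.g. "gene", "author", "cell", "paper"
    name: str
    text: str         # free-text description or abstract


# Hypothetical toy records; not real WormBase entries.
RECORDS = [
    Record("gene", "gene-1", "required for embryonic cell division"),
    Record("cell", "cell-A", "embryonic blastomere"),
    Record("author", "Author A", ""),
    Record("paper", "paper-1", "abstract mentioning gene-1 and cell division"),
]


def simple_search(keyword, object_type=None, include_text=False):
    """Return the names of records that match `keyword`.

    object_type=None is the broad search over every object type;
    passing a type narrows it. include_text=True also scans the free
    text, loosely mirroring the detailed search over abstracts.
    """
    needle = keyword.lower()
    hits = []
    for rec in RECORDS:
        if object_type is not None and rec.object_type != object_type:
            continue
        haystack = rec.name.lower()
        if include_text:
            haystack += " " + rec.text.lower()
        if needle in haystack:
            hits.append(rec.name)
    return hits
```

Calling `simple_search("gene-1")` matches on names only, whereas `simple_search("gene-1", include_text=True)` also returns the paper whose abstract mentions the gene.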

One of the most useful and powerful search tools is the Genome Browser. This tool allows access to a graphical display of a user-specified region of the genome (Figure 2). The region may be identified based on a gene name, a cDNA, a polymorphism or even a genomic co-ordinate. The output shows the position of this region along the chromosome (given in Mbp of DNA) and relative to a selection of well-spaced genetic markers. The user may ‘zoom in’ on or out of the region of interest or pan right or left along the chromosome. Below this graphic, certain features of the specified genomic region are shown by default, such as named genes, the intron/exon structure of predicted and confirmed genes, microRNAs, operons and the complete set of overlapping YACs (yeast artificial chromosomes) and cosmids used to map and sequence the genome. What makes the Genome Browser so robust is that the user can call up additional features. These include alignments of ESTs (expressed sequence tags), SNPs (single nucleotide polymorphisms), knockout alleles, RNAi (RNA interference) experiments and much more. A researcher who has mapped a new mutation to a particular region can thumb through the RNAi data for an experiment that produces a phenotypic match to their mutant, or they can search for the presence of SNPs that might aid in additional physical mapping. The Genome Browser is thus a potent adjunct to cloning and one that researchers now heavily lean on.

The discovery of the potent gene-silencing technique RNAi in C. elegans and its application to large-scale studies of gene function has led to an explosion of genomic data in recent years. WormBase has done an outstanding job of quickly incorporating published RNAi results into the database. On each Gene Summary page, the results of these screens are described, albeit in general terms, such as wild-type (lack of an obvious phenotype), sterile, embryonic lethal or lethal, slow growth, abnormal morphology, etc. However, some of the results are linked to images or even movies of RNAi-treated embryos undergoing development. By allowing access to such raw data, WormBase provides a phenotypic description not attainable by words alone and allows users to interpret the experimental results for themselves. At present, most RNAi results are not linked to such raw data, and there does not seem to be an effort to curate similar data for mutants. However, the WormBase Consortium is a relentless group who work tirelessly to improve the database, and I fully expect that in the near future many genes will be linked to images or movies that document defects resulting from mutation or gene silencing.

Figure 1 Example of a WormBase Gene Summary page

Figure 2 WormBase Genome Browser tool

© Portland Press 2005 | www.biolcell.org

To take full advantage of the large RNAi data sets, researchers need a means of efficiently sifting through these data for specific information. WormBase has addressed this issue through the development of a few clever search tools. For investigators in search of genes with a specific RNAi phenotype, it would be too cumbersome to review the data from each Gene Summary on a page-by-page basis. Using the RNAi Search function, the entire collection of RNAi experiments can be searched for those that produce one or a combination of the general phenotypic categories. Depending on the terms used to search, the list of positive hits can be quite large and therefore difficult to manage. For instance, a search for RNAi experiments yielding an embryonic lethal phenotype returns 3553 hits. Thus, by trial and error, the user might have to modify the search terms to get a more specific result. However, the output can also be saved as a plain text file and searched by other means. The Interval Search is another useful tool and a great aid to cloning. This search produces a list of all genetically and molecularly defined genes within a given interval of the genome. Each gene is listed with available data on RNAi and mutant phenotypes, with links to more detailed descriptions. As with the Genome Browser, an investigator who has mapped a mutation to a well-defined interval can search these lists for genes whose mutant or RNAi phenotype matches that of their mutant.

Figure 3 WormBase Pedigree Browser tool
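The logic of combining an Interval Search with a phenotype match can be sketched as follows. This is an illustration only: the `GeneRecord` fields, gene names, co-ordinates and phenotype strings are invented for the example, and the sketch makes no claim about how WormBase stores or queries these data.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    name: str
    chromosome: str
    start: int  # bp, inclusive
    end: int    # bp, inclusive
    phenotypes: set = field(default_factory=set)


# Hypothetical genes and annotations, for illustration only.
GENES = [
    GeneRecord("gene-a", "III", 1_000, 5_000, {"embryonic lethal"}),
    GeneRecord("gene-b", "III", 8_000, 12_000, {"sterile", "slow growth"}),
    GeneRecord("gene-c", "III", 40_000, 45_000, {"embryonic lethal"}),
    GeneRecord("gene-d", "X", 2_000, 6_000, {"embryonic lethal"}),
]


def genes_in_interval(genes, chromosome, start, end):
    """Genes on `chromosome` that overlap the closed interval [start, end]."""
    return [
        g for g in genes
        if g.chromosome == chromosome and g.start <= end and g.end >= start
    ]


def with_phenotypes(genes, required):
    """Keep genes whose recorded phenotypes include every term in `required`."""
    required = set(required)
    return [g for g in genes if required <= g.phenotypes]


# Candidates for a mutation mapped to the first 20 kb of chromosome III
# that causes embryonic lethality.
candidates = with_phenotypes(
    genes_in_interval(GENES, "III", 0, 20_000), {"embryonic lethal"}
)
candidate_names = [g.name for g in candidates]
```

Filtering by interval first and phenotype second keeps the candidate list short, which is the practical benefit the Interval Search offers over a genome-wide phenotype search.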

WormBase also has an assortment of other search functions to aid in gene cloning. These include: a Clones Search to identify cosmids, fosmids or YACs that could be used for transformation-rescue experiments, a Markers Search to look for SNPs and visible genetic markers for mapping, versions of BLAST and BLAT to search for homologous DNA and protein sequences and more. WormBase thus makes available a comprehensive set of tools that can be used to extract the specific molecular and genetic information one needs from the available genomic data.

Researchers turn to WormBase, however, for much more than genomic data. One of the well-cited attributes of C. elegans as a model system is its invariant cell lineage. That is, exactly the same pattern of cell division occurs in each individual to produce the 959 somatic cell nuclei present in each adult hermaphrodite. This has allowed researchers to map the complete lineage from zygote to adult. The Pedigree Browser allows researchers to quickly retrieve lineage information for a specific cell or group of cells (Figure 3). This information can prove crucial in understanding certain mutant phenotypes. For instance, in the case of a researcher studying a mutant with a complex phenotype in which several anatomical structures are affected, the Pedigree Browser can be used to determine if a defect in a single cell lineage might be to blame. WormBase has also developed an Expression Pattern Search to retrieve data on the temporal and spatial expression of mRNAs and proteins. Users can search the available gene expression data by cell, cell group or developmental stage for useful tissue- or cell-type specific markers or to identify potential components of a particular developmental pathway. Finally, WormBase also archives literature pertaining to C. elegans and related nematodes, and provides a powerful Textpresso-based search engine that can retrieve information from abstracts from meetings and even the full text of papers.
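The question of whether several affected cells trace back to a single lineage amounts to finding their most recent common ancestor. The sketch below assumes the standard embryonic lineage nomenclature, in which the founder cells (AB, MS, E, C, D, P4) descend from the zygote P0 and each daughter is named by appending a letter such as a, p, l or r to its parent's name. It handles only names of that form, does not cover post-embryonic blast cells, and is not a description of how the Pedigree Browser is implemented.

```python
# Early embryonic founder cells and their parents (standard lineage names).
FOUNDER_PARENT = {
    "AB": "P0", "P1": "P0",
    "EMS": "P1", "P2": "P1",
    "MS": "EMS", "E": "EMS",
    "C": "P2", "P3": "P2",
    "D": "P3", "P4": "P3",
}
# Letters appended to a parent's name to label its daughters.
DIVISION_LETTERS = set("aplrdv")


def parent(cell):
    """Return the parent of an embryonic cell name, or None for the zygote P0."""
    if cell == "P0":
        return None
    if cell in FOUNDER_PARENT:
        return FOUNDER_PARENT[cell]
    if len(cell) > 1 and cell[-1] in DIVISION_LETTERS:
        return cell[:-1]
    raise ValueError(f"unrecognised lineage name: {cell}")


def ancestors(cell):
    """List the cell itself followed by all of its ancestors back to P0."""
    chain = [cell]
    while (p := parent(chain[-1])) is not None:
        chain.append(p)
    return chain


def common_ancestor(cells):
    """Most recent cell from which every cell in `cells` descends.

    A cell counts as its own ancestor, so a single input returns itself.
    """
    chains = [ancestors(c) for c in cells]
    shared = set(chains[0]).intersection(*map(set, chains[1:]))
    # Chains run from the cell towards P0, so the first shared entry
    # is the most recent common ancestor.
    return next(c for c in chains[0] if c in shared)
```

For example, `common_ancestor(["MSap", "Eal"])` walks both chains back to EMS, suggesting that a defect affecting descendants of both cells could originate at or before the EMS division.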

What makes WormBase most valuable to researchers is that the information is kept up-to-date and the user interface is constantly evolving to meet researchers’ needs. Each version of the database has a lifespan of only 2 to 3 weeks (approximately equal to the lifespan of a wild-type worm) and comes replete with new information. At the time of this writing, version WS134 was released and contained the results of a new systematic RNAi screen. In addition, new functionalities are added to the database when needed. For instance, to take advantage of the newly released genomic sequences of related nematodes, WormBase has recently been outfitted with a Synteny Alignment tool. This tool was developed to take advantage of the similar ordering of genes (or synteny) between these related nematodes, and is used to compare regions of nucleotide similarity between the genomes of C. elegans and its close relatives. Among other things, this allows researchers to identify important gene regulatory elements whose positions and sequences are conserved.
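The idea of conserved gene order can be illustrated with a toy search for collinear blocks. This is a deliberately simplified sketch, not the method behind the Synteny Alignment tool, which the text does not describe: it works on gene order rather than nucleotide alignments, detects only runs in the same orientation, and uses invented gene names and a hypothetical `orthologs` mapping.

```python
def collinear_blocks(order_a, order_b, orthologs, min_len=2):
    """Find runs of adjacent genes in species A whose orthologs are
    also adjacent, in the same order, in species B.

    order_a, order_b: gene names in chromosomal order for each species.
    orthologs: dict mapping a gene in A to its ortholog in B.
    Returns a list of blocks, each a list of (gene_a, gene_b) pairs.
    """
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current = [], []
    prev_pos = None
    for gene in order_a:
        partner = orthologs.get(gene)
        pos = pos_b.get(partner)  # None if no ortholog or not on this list
        if pos is not None and prev_pos is not None and pos == prev_pos + 1:
            current.append((gene, partner))  # the run continues
        else:
            if len(current) >= min_len:
                blocks.append(current)
            current = [(gene, partner)] if pos is not None else []
        prev_pos = pos
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


# Hypothetical gene orders: a1-a3 keep their order in species B,
# while a4 and a5 are swapped.
order_a = ["a1", "a2", "a3", "a4", "a5"]
order_b = ["b9", "b1", "b2", "b3", "b5", "b4"]
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4", "a5": "b5"}
blocks = collinear_blocks(order_a, order_b, orthologs)
```

Within such a block, non-coding sequence that is also conserved between the two genomes is a natural place to look for the regulatory elements mentioned above.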

WormBase is thus constantly under development, a process driven by a strong and fruitful collaboration between the team of WormBase developers and the community of C. elegans researchers who generate the data, use the database and provide feedback. As with the worm itself, WormBase is continuing to evolve, and when circumstances dictate WormBase will adapt, adding new data and tools to serve the needs of future generations of investigators.

Acknowledgement

I thank Andy Golden for critical comments on this manuscript.

Received 10 December 2004; accepted 25 January 2005

Published on the Internet 24 October 2005, doi:10.1042/BC20040155
