
324 Int. J. Data Mining and Bioinformatics, Vol. 6, No. 3, 2012

Copyright © 2012 Inderscience Enterprises Ltd.

Select Your SNPs (SYSNPs): a web tool for automatic and massive selection of SNPs

Belén Lorente-Galdos Institut de Biologia Evolutiva (UPF-CSIC), Barcelona Biomedical Research Park (PRBB), Spain

and

CIBER en Epidemiología y Salud Pública (CIBERESP), Spain E-mail: [email protected]

Ignacio Medina Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain E-mail: [email protected]

Carlos Morcillo-Suarez, Txema Heredia, Ángel Carreño-Torres, Ricardo Sangrós and Josep Alegre Institut de Biologia Evolutiva (UPF-CSIC), Barcelona Biomedical Research Park (PRBB), Spain

and

Spanish National Institute for Bioinformatics (INB), Spain E-mail: [email protected] E-mail: [email protected] E-mail: [email protected] E-mail: [email protected] E-mail: [email protected]

Guillermo Pita Bioinformatics Unit (UBio), Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain E-mail: [email protected]

Gemma Vellalta Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain


and

Institut Municipal d’Investigació Mèdica (IMIM), Barcelona, Spain E-mail: [email protected]

Nuria Malats Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), Madrid, Spain E-mail: [email protected]

David G. Pisano Spanish National Institute for Bioinformatics (INB), Spain

and

Bioinformatics Unit (UBio), Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain E-mail: [email protected]

Joaquín Dopazo Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain

and

Spanish National Institute for Bioinformatics (INB), Spain

and

CIBER en Enfermedades Raras (CIBERER), Spain E-mail: [email protected]

Arcadi Navarro* Institut de Biologia Evolutiva (UPF-CSIC), Barcelona Biomedical Research Park (PRBB), Spain

and

CIBER en Epidemiología y Salud Pública (CIBERESP), Spain

and

Spanish National Institute for Bioinformatics (INB), Spain

and


Institució Catalana de Recerca i Estudis Avançats (ICREA), Universitat Pompeu Fabra, Barcelona, Spain E-mail: [email protected] *Corresponding author

Abstract: Association studies are the approach of choice for discovering the genomic basis of complex traits. To carry out such analyses, researchers frequently need to (1) select optimally informative sets of Single Nucleotide Polymorphisms (SNPs) in candidate regions and (2) annotate the results of associations found by means of genome-wide SNP arrays. These are complex tasks, since many criteria have to be considered, including the SNPs’ functional properties, technological information and haplotype frequencies in given populations. SYSNPs implements algorithms that allow for efficient and simultaneous consideration of all the relevant criteria to obtain sets of SNPs that properly cover arbitrarily large lists of genes or genomic regions. In addition, SYSNPs allows for comprehensive functional annotation of SNPs linked to any given marker SNP. SYSNPs dramatically reduces the effort needed for SNP selection from days of searching various databases to a few minutes using a simple browser.

Keywords: SNPs; single nucleotide polymorphisms; tag-SNPs; tagger; linkage disequilibrium; association; complex disease; PUPASuite; genome-wide association study; SNP-arrays.

Reference to this paper should be made as follows: Lorente-Galdos, B., Medina, I., Morcillo-Suarez, C., Heredia, T., Carreño-Torres, A., Sangrós, R., Alegre, J., Pita, G., Vellalta, G., Malats, N., Pisano, D.G., Dopazo, J. and Navarro, A. (2012) ‘Select your SNPs (SYSNPs): a web tool for automatic and massive selection of SNPs’, Int. J. Data Mining and Bioinformatics, Vol. 6, No. 3, pp.324–334.

Biographical notes: Belén Lorente-Galdos holds a BSc in Mathematics from the Universitat de Barcelona (UB) and a Master's in Bioinformatics for Health Science from the Universitat Pompeu Fabra (UPF), where she is now a PhD student. She has previously worked for a software engineering company (AIA, Aplicaciones en Informática Avanzada, S.A.).

Ignacio Medina graduated in Computer Science from the Universidad Politécnica de Valencia (UPV) in 2005 and obtained an MSc in Molecular and Cellular Biology and Genetics from the Universitat de Valencia (UV) in 2008. He now works as a Bioinformatician in the Spanish National Institute of Bioinformatics (INB) and as a Bioinformatician and Project Manager in the Centro de Investigación Príncipe Felipe (CIPF).

Carlos Morcillo-Suarez worked for IBM until 2004, when he became a bioinformatics technician at the Universitat Pompeu Fabra (UPF). He is now a PhD student specialising in the development of new computational and statistical approaches for the management and analysis of genomic polymorphism data. He is currently employed by the Institute of Evolutionary Biology (CSIC-UPF) and the Spanish National Institute of Bioinformatics (INB).


Txema Heredia graduated from the Universitat Politécnica de Catalunya (UPC) in 2010. He is currently a bioinformatics technician at the Institute of Evolutionary Biology (CSIC-UPF) and the Spanish National Institute of Bioinformatics (INB).

Ángel Carreño-Torres graduated from the Universitat Politécnica de Catalunya (UPC) in 2007. He is currently a bioinformatics technician at the Institute of Evolutionary Biology (CSIC-UPF) and the Spanish National Institute of Bioinformatics (INB).

Ricardo Sangrós graduated from the Universitat Politécnica de Catalunya (UPC) in 2005. Until 2008, he was a bioinformatics technician at the Spanish National Institute of Bioinformatics (INB). He now works at Competitive Strategy Insurance (CSI).

Josep Alegre graduated from the Universitat Politécnica de Catalunya (UPC) in 2004. Until 2008, he was a bioinformatics technician at the Spanish National Institute of Bioinformatics (INB).

Guillermo Pita studied at the Universidad Politécnica de Madrid (UPM), where he obtained his degree in agronomic engineering. He obtained his Master's in structural biology and biocomputing in the laboratory of Angel R. Ortiz at the Severo Ochoa Molecular Biology Center. In 2006 he joined the laboratory of Dr. Anna Gonzalez as a bioinformatician. He currently focuses on next-generation sequencing and SNP data analysis.

Gemma Vellalta graduated in mathematics from the Universitat de Barcelona (UB, 2004) and is currently studying for her MD degree at the Universitat Autònoma de Barcelona (UAB). During 2006-2007, she completed a research stay in Dr. Malats’ Group at the Institut Municipal d’Investigació Mèdica (IMIM) in Barcelona, where she was trained in epidemiology, and obtained an MPH from the Universitat Autònoma de Barcelona (UAB, 2007).

Nuria Malats holds an MD, an MPH and a PhD from the Universitat Autònoma de Barcelona (UAB). She did her PhD on the molecular epidemiology of pancreatic cancer at the Institut Municipal d’Investigació Mèdica (IMIM), Barcelona. She was a visiting scientist (1996-1998) at the International Agency for Research on Cancer (IARC-WHO), Lyon, where she focused on genetic epidemiology. She returned to IMIM-CREAL in 1998. Since 2007, she has been leader of the Genetic and Molecular Epidemiology Group at the Spanish National Cancer Research Centre (CNIO). Dr. Malats leads and participates in competitively funded national and international projects on both pancreatic and bladder cancer.

David G. Pisano graduated in Biology from the Universidad de Oviedo. In 1998 he was recruited by Thermo Fisher Scientific and appointed as Services Delivery Manager for Spain, Portugal and Latin America. In 2001 he joined the molecular diagnostics company Genómica (Zeltia), Madrid, as Head Bioinformatician, where he worked at the Functional Genomics Unit focusing on the development of cancer biomarker discovery tools using DNA array technology. He moved to the Centro Nacional de Biotecnología (CNB-CSIC), Madrid, in 2004 as Technical Manager for the Spanish National Bioinformatics Institute (INB). He joined the CNIO as Head of the Bioinformatics Unit in 2006.


Joaquín Dopazo has led the Bioinformatics and Genomics Department, CIPF, Valencia, Spain, since March 2005. He holds a BSc in Chemistry and a PhD in Biology. He has previously worked for different companies (TDI S.A., GlaxoWellcome) and research centres (INIA, CNB, CNIO). He is associate editor of Bioinformatics, BMC Research Notes and PLoS One and is a member of the editorial board of eight more journals in the area of bioinformatics and genomics. He is the author of more than 160 papers in peer-reviewed international journals. He has supervised the development of one of the largest and most cited resources for microarray data analysis (http://www.babelomics.org).

Arcadi Navarro graduated from the Universitat Autònoma de Barcelona (UAB), where he finished a PhD in 1992. After leaving the academic world for a few years, he went back to basic research in 1999 as a postdoctoral fellow at the University of Edinburgh. In 2006 he was appointed ICREA Research Professor at the Universitat Pompeu Fabra (UPF), where he currently leads a research group in Evolutionary Genomics. In addition, he is the director of the Population Genomics Node of the Spanish National Institute for Bioinformatics (INB) and the Vice-director of the Institute for Evolutionary Biology (IBE).

1 Introduction

Common phenotypes are usually caused by the interaction of multiple genetic and environmental factors. Very common human diseases, such as cancer, diabetes, cardiovascular or psychiatric diseases, are examples of such complex traits. Association studies have become the most successful approach in the discovery of genetic variants having small effects on complex diseases (Donnelly, 2008; McCarthy et al., 2008). These studies are based on large numbers of unrelated individuals and search for associations between genetic variants and the target phenotype, mostly by comparing a group of affected individuals (cases) with another group of unaffected individuals (controls). The ideal approach would be to fully sequence the genomes of many individuals, but this is still too expensive and, thus, selecting and genotyping sets of markers is a more cost-effective strategy. Currently, the most frequent strategies are candidate-gene approaches and Genome-Wide Association Studies (GWASs). The former consists of selecting a number of genetic variants located in genes or regions that are deemed biologically relevant for the disease or trait under study; the latter relies on pre-designed arrays containing hundreds of thousands, or even about a million, SNPs distributed all over the genome.

SNPs are the markers of choice for studies of association between genotypes and phenotypes because of their distribution along the genome, binary nature and stability. However, owing to the large number of SNPs available in the human genome (see dbSNP (Sherry et al., 2001)) and the dispersal across several different databases of the information that is relevant to prioritise some SNPs over others, selecting SNPs still demands a great deal of time and manual work. The information to take into account in the SNP selection includes functional properties (such as whether they affect amino acids or transcription factor binding sites), population properties (such as the Minor Allele Frequency (MAF) in the population under study) and technological scores from the genotyping platforms (SNPs that can be efficiently genotyped using a given technology may present high failure rates in others). In addition, the existing correlations between the variants present in loci that are in physical proximity along a chromosome are key for association studies. These correlations between genetic variants are known as Linkage Disequilibrium (LD), which implies that, even if the causal variant is not selected for genotyping, if a nearby variant is selected and both are correlated, then the causal allele can be detected in the case-control study. The SNPs chosen as representatives of each group of correlated SNPs are called tag-SNPs and there are several well-known algorithms to select them (Carlson et al., 2004; de Bakker et al., 2005). In essence, each tag-SNP makes it unnecessary to genotype any SNP that it ‘captures’ (i.e., any SNP that presents a high enough correlation with the tag-SNP) because they would provide redundant information. Therefore, association studies rely on the selection of good sets of tag-SNPs, which even without being functional themselves provide information about nearby functional variants that may be contributing to a given trait or disease.
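To make the notion of one SNP 'capturing' another concrete, the following is a minimal Python sketch, not part of SYSNPs. It assumes unphased genotypes coded as 0/1/2 allele counts and uses the squared Pearson correlation between genotype vectors as a simple stand-in for haplotype-based r². The function names and the 0.8 threshold are illustrative assumptions.

```python
def genotype_r2(a, b):
    """Squared Pearson correlation between two SNPs coded as 0/1/2 allele counts.

    This is a genotype-level approximation of LD, assumed here for illustration;
    it is not the haplotype-based statistic used by any particular tool.
    """
    if len(a) != len(b) or not a:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A monomorphic SNP carries no information: treat it as uncorrelated.
        return 0.0
    return cov * cov / (var_a * var_b)


def captures(tag, other, threshold=0.8):
    """True if the tag-SNP is correlated strongly enough to stand in for `other`.

    The default threshold of 0.8 is an assumed, commonly used value.
    """
    return genotype_r2(tag, other) >= threshold
```

Under this sketch, a SNP that is perfectly correlated (or perfectly anti-correlated) with a tag-SNP across samples is captured, so genotyping it would add no information.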

We have developed a web-tool, SYSNPs (standing for Select Your SNPs, www.sysnps.org) that allows the user to compile information about all the SNPs in any set of human genes or genome regions of interest and to select some of them by different criteria. After the criteria are established, SYSNPs provides a set of tag-SNPs that fulfil them, as well as all the information about the SNPs that are tagged by the selected tag-SNPs. Therefore, SYSNPs can be used at least with two main objectives. First, in the pre-genotyping phase of a project, SYSNPs helps in selecting optimal sets of tag-SNPs. Among other information, SYSNPs produces a file with a final list of tag-SNPs that is ready to be sent to the genotyping platform. Second, in the post-genotyping part of an association study, once SNPs associated with a trait have been detected, SYSNPs can help in annotating what are the known SNPs that are being tagged by the associated tag-SNPs and that, thus, may be the causal variants underlying the trait. SYSNPs provides a fully annotated list of the SNPs that are tagged by any set of SNPs that are considered of interest, together with a full list of their functional characteristics.

The main advantage of SYSNPs comes from the fact that it saves a large amount of time that would otherwise be devoted to semi-manual work. SYSNPs has already been used in several genotyping projects.

2 Implementation and results

SYSNPs is implemented as a web application that can be accessed from any standard web browser. It has been implemented using PHP, Ajax and SOAP. The database server is a MySQL community server. It is currently installed on a LAMP web server (Linux, Apache, MySQL, PHP) running the SuSE Linux Enterprise Server operating system. The use of SYSNPs is intuitive and an extensive help section is provided. In summary, users can proceed in three steps:

• Input and Retrieval of coordinates. The coordinates of the regions of interest in which tag-SNPs will be selected are either directly entered by the user or computed by the web services in SYSNPs from information on genes or Gene Ontology (GO) terms that the user has provided.


• Listing of SNPs. SYSNPs provides a list of known SNPs located within the regions of interest, together with their position and detailed functional information. More specific functional information can be obtained through a direct interface with PupaSuite (Conde et al., 2006) if required by the user.

• SNP selection. The user can decide on a series of criteria according to which tag-SNPs will be selected. These include minimum MAF for tag-SNPs and the functional classes to which tag-SNPs must belong. Moreover, technological information (so far from the Illumina system) can be taken into account. In addition, the user can force SNPs into or out of the final list of tag-SNPs. Finally, the user selects the HapMap 2 population in which tag-SNP calculation will be performed.
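The selection criteria in the last step can be pictured as a filter over candidate SNPs. The sketch below is an illustration in Python rather than the SYSNPs implementation: the `Snp` record, the function names and the rule that a forced exclusion overrides a forced inclusion are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    # Hypothetical record; field names are illustrative only.
    rs_id: str
    maf: float
    functional_class: str


def minor_allele_frequency(genotypes):
    """MAF from 0/1/2 allele counts; None marks a missing genotype call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def eligible_tags(snps, min_maf, allowed_classes,
                  forced_in=frozenset(), forced_out=frozenset()):
    """Return the rs IDs that may be chosen as tag-SNPs.

    Assumed precedence: forced exclusion wins, then forced inclusion,
    then the MAF and functional-class criteria.
    """
    eligible = []
    for snp in snps:
        if snp.rs_id in forced_out:
            continue
        passes = snp.maf >= min_maf and snp.functional_class in allowed_classes
        if snp.rs_id in forced_in or passes:
            eligible.append(snp.rs_id)
    return eligible
```

SNPs that fail the criteria are only barred from acting as tag-SNPs in this sketch; they can still be captured by an eligible tag-SNP.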

After these steps are completed, the system runs the Tagger algorithm (de Bakker et al., 2005) to obtain the sets of tag-SNPs and of captured SNPs. Extensive documentation about how to operate the application is provided on the SYSNPs webpage. The basic input, workflow and output are detailed here.

Input

The most basic input that SYSNPs needs to get started is a list of genome regions of interest, within which tag-SNPs need to be selected or within which SNPs that are tagged by a list of target SNPs need to be annotated. This list of genome regions can be entered by the user by typing it directly, by cutting and pasting it into the appropriate box or by uploading the appropriate files. These files can contain three different classes of information:

1 a list of pairs of coordinates defining genomic regions;

2 a list of genes for which coordinates are automatically obtained by the system (both gene names and ENSEMBL ids are allowed) and

3 a list of GO terms (Ashburner et al., 2000).

In the latter case, all the genes annotated with a given GO term are automatically included by the system in a given query. The different kinds of evidence that support GO annotations are classified as evidence codes that can be selected by the user. Thus, from a list of GO terms and considering the evidence codes selected by the user, their annotated genes will be analysed by SYSNPs as if the user had entered a list of genes.
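As an illustration of how the three classes of input could be told apart, here is a small Python sketch. It is not taken from SYSNPs: the `chr:start-end` region syntax is an assumption (the paper does not specify the file format), while the `ENSG` plus eleven digits and `GO:` plus seven digits patterns follow the public identifier conventions of Ensembl and the Gene Ontology.

```python
import re

# Assumed region syntax, e.g. "chr19:100-200" or "19:100-200".
REGION = re.compile(r"^(?:chr)?(\d{1,2}|X|Y|MT):(\d+)-(\d+)$")
ENSEMBL_GENE = re.compile(r"^ENSG\d{11}$")
GO_ID = re.compile(r"^GO:\d{7}$")


def classify_entry(entry):
    """Classify one input line as a region, GO term, Ensembl gene id or gene name."""
    text = entry.strip()
    if not text:
        raise ValueError("empty input entry")
    match = REGION.match(text)
    if match:
        start, end = int(match.group(2)), int(match.group(3))
        if start > end:
            raise ValueError(f"region start exceeds end: {text}")
        return ("region", (match.group(1), start, end))
    if GO_ID.match(text):
        return ("go_term", text)
    if ENSEMBL_GENE.match(text):
        return ("ensembl_gene", text)
    # Anything else is assumed to be a gene name to be resolved later.
    return ("gene_name", text)
```

For example, `classify_entry("GO:0051098")` yields a GO term entry and `classify_entry("OSCAR")` a gene name; in a real system, GO terms would then be expanded to genes and genes to coordinates, as described above.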

Workflow

SYSNPs has been developed as a web application tool combining a set of web services in its workflow. The different steps of the workflow are detailed in Figure 1. The data that these web services manage and process are retrieved from a local database, which contains the relevant information from the last releases of HapMap (The-International-HapMap-Consortium, 2005) and Ensembl (Hubbard et al., 2009) and from the PupaSuite server (Conde et al., 2006). In addition, we have created our own database to save all the data generated in each selection process with a unique process_id identifier. The result of all user queries is stored in that database for three weeks after the process has finished and can be retrieved by the user at any time during that period.


Figure 1 SYSNPs workflow

Different web services have been implemented in SYSNPs. They are as follows:

• Coord2InfoSNPs. It provides a list of all the SNPs contained within a set of pairs of coordinates, together with their chromosomal positions and the functional classes of the genes in which SNPs map. Regions without SNPs are listed in a different file.

• Genes2InfoSNPs. From a list of genes and the upstream and downstream regions specified by the user, the web service provides a list of all the SNPs of each gene from our local Ensembl database, together with their chromosomal positions and functional classes. Ambiguously defined genes are also provided in a different list.

• GO2genes. Given a list of GO terms or GO ids and the evidence codes that the user wants to consider, the web service provides the list of genes annotated to the entered GO terms through the selected evidence codes.

• PupaSuite2SYSNPs. From a list of SNPs, the web service provides functional information on both non-coding and coding genic regions obtained from PupaSuite. This information includes, for example, whether an SNP is part of a triplex structure or a transcription factor binding site or whether an SNP is affecting the structure of the protein.

• SNPs2GenotypesHapMap. From a list of SNPs and a given HapMap population, it retrieves the genotypes of the SNPs from that population and the relevant information about samples. Those SNPs that do not have genotypes are listed in a separate file.

• Genotypes2tags. From the selected SNPs, genotypes and samples, this web service retrieves the tag-SNPs running the Tagger algorithm (de Bakker et al., 2005). The user can modify the parameter settings of the Tagger algorithm. For example, an LD threshold can be selected or the maximum number of tag-SNPs that will be obtained can be modified. These two are particularly important parameters, since they allow adaptation to any budgetary requirements.
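The role of the LD threshold and of the maximum number of tag-SNPs can be illustrated with a simplified greedy pairwise selection. This Python sketch is written in the spirit of pairwise tagging approaches but is not the Tagger program and not the Genotypes2tags service; the function name, the `r2` lookup keyed by `frozenset` pairs and the default threshold of 0.8 are assumptions.

```python
def greedy_tags(snps, r2, threshold=0.8, max_tags=None,
                forced_in=(), excluded=()):
    """Pick tag-SNPs greedily until every SNP is captured or a limit is hit.

    snps:      list of SNP ids, in a fixed order (ties go to the earliest).
    r2:        dict mapping frozenset({a, b}) to the pairwise r2 of a and b.
    forced_in: SNPs that must be tags; they count towards max_tags.
    excluded:  SNPs that may be captured but may not act as tags.
    Returns (tags, groups, uncaptured).
    """
    def captured_by(tag, pool):
        return {s for s in pool
                if s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold}

    uncaptured = set(snps)
    tags, groups = [], {}

    def choose(tag):
        groups[tag] = captured_by(tag, uncaptured)
        tags.append(tag)
        uncaptured.difference_update(groups[tag])

    for tag in forced_in:
        choose(tag)

    candidates = [s for s in snps if s not in excluded and s not in forced_in]
    while uncaptured and (max_tags is None or len(tags) < max_tags):
        best = max(candidates,
                   key=lambda s: len(captured_by(s, uncaptured)),
                   default=None)
        if best is None or not captured_by(best, uncaptured):
            break  # no eligible SNP captures anything that is still uncaptured
        choose(best)
    return tags, groups, uncaptured
```

Lowering `threshold` or `max_tags` reduces the number of SNPs to genotype at the cost of coverage, which is how these two parameters relate to a genotyping budget. An excluded SNP can still appear in a tagging group when it is in high LD with a chosen tag-SNP.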

Output

Depending on the input type (GO terms, genes or regions), different tables summarising the required information (such as the number of SNPs and tag-SNPs) appear in the browser (Figure 2). In addition, all the available information is included for each individual SNP, stating also whether it is a tag-SNP, a captured SNP or a non-captured SNP. The reasons for inclusion or exclusion as tag-SNP are also summarised (Figure 3). The output is downloadable as a zipped file. One of the files included in the zip is just the final list of tag-SNPs, so that it can be sent directly to the desired genotyping platform.

Figure 2 Output examples: (a) GO term GO:0051098 was used as input. Other options were: 2 kb gene flanking regions included, intronic SNPs excluded if further than 1000kb, SNPs with MAF<0.001 excluded; (b) The OSCAR gene was used as input. Other options were: gene flanking regions 1 kb upstream and 1.35 kb downstream included, exclusion of non-intronic and intergenic SNPs; SNP rs1657535 forced as tag-SNP, technological scores included; (c) A list of regions was used as input, excluding all intergenic, upstream and downstream SNPs (see online version for colours)

Figure 3 Example of output for SNP information resulting from running SYSNPs for the OSCAR gene. All available information is included for each SNP. Information is provided about the status of each SNP as a tag-SNP, a captured SNP or a non-captured SNP. Each tagging group is assigned a number and the tag-SNPs of test group n are labelled as n**. SNPs lacking a number have not been captured by any tag-SNP. Note that rs663569 was excluded as tag-SNP but it is in high LD with tag-SNP rs1657535 (see online version for colours)


3 Discussion

SYSNPs is being successfully used as a service in studies carried out by the CeGen (Spanish National Genotyping institution; for a list of projects, see the tab ‘Genotyping Projects’ in www.cegen.org). For example, in the context of a standard project, scientists had previously selected a list of 102 candidate genes, with the purpose of obtaining a final list of approximately 750 SNPs for genotyping on a GoldenGate Illumina platform. Some SNPs previously genotyped in another study and others from a literature search were forced as tag-SNPs. The manual selection took more than three weeks of full-time work by an experienced person. Running the program, we obtained 1071 tag-SNPs for the complete list in a matter of minutes. Because this exceeded the number of SNPs available for genotyping, we reduced the set by repeating the selection for seven of the genes, this time excluding the SNPs found in intronic regions that are more than 5 kb from the edge of an exon. The final result was a set of 753 tag-SNPs, as desired.
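The exclusion rule used to trim the list (dropping intronic SNPs lying more than 5 kb from the nearest exon edge) can be sketched as follows. This is an illustrative Python example rather than the SYSNPs code; the tuple layout, the `"intronic"` class label and the function names are assumptions.

```python
def distance_to_nearest_exon(position, exons):
    """Distance in bp from a position to the closest exon edge (0 if inside an exon).

    exons: list of (start, end) pairs with inclusive coordinates.
    """
    if not exons:
        raise ValueError("at least one exon is required")
    best = None
    for start, end in exons:
        if start <= position <= end:
            return 0
        distance = start - position if position < start else position - end
        best = distance if best is None else min(best, distance)
    return best


def drop_deep_intronic(snps, exons, max_distance=5000):
    """Keep every SNP except intronic ones further than max_distance from an exon.

    snps: list of (rs_id, position, functional_class) tuples (hypothetical layout).
    """
    return [
        snp for snp in snps
        if not (snp[2] == "intronic"
                and distance_to_nearest_exon(snp[1], exons) > max_distance)
    ]
```

Tightening `max_distance` removes more candidate SNPs before tagging, which is one way to bring the number of tag-SNPs under a platform limit, as in the example above.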

Furthermore, SYSNPs has been used in other kinds of studies, in which Whole Genome Scans have been carried out detecting significant associations with the phenotypes of interest. The associated SNPs were forced as tag-SNPs in SYSNPs while being as flexible as possible with the rest of selecting criteria. In that way, detailed information about other SNPs in their surrounding regions was obtained minimising the probabilities of missing any correlated SNP that could be the real contributor to the trait.

SYSNPs has been intensively tested to ensure that it produces the same results as a manual search would, only much faster. In several trial uses, a manual tag-SNP selection was performed and its results were compared with those obtained with SYSNPs. When imposing the same conditions in the two processes, results were always identical. This kind of comparison is carried out every time the databases SYSNPs uses are updated to ensure consistency and functionality.

In addition, SYSNPs distinguishes itself from other SNP selection applications, such as Snagger (Edlund et al., 2008) or htSNPer (Ding et al., 2005), in several aspects. For example, relative to the most widely used application, Snagger (Edlund et al., 2008), SYSNPs provides comprehensive functional options and allows for massive SNP selection simultaneously in many regions of the genome. In addition, it allows the annotation of SNPs tagged by a given set of tag-SNPs. Relative to htSNPer (Ding et al., 2005), SYSNPs is regularly updated and maintained.

4 Conclusion

SYSNPs is a user-friendly web-based tool that can help users in both the design of targeted association studies and the analysis of the results of GWASs carried out by means of SNP arrays.

Acknowledgements

This work was partially supported by Red Temática de Investigación Cooperativa en Cáncer, Fundació La Marató de TV3, CIBER en Epidemiología y Salud Pública (Ciberesp), CIBER en Enfermedades Raras (Ciberer) and the National Institute for Bioinformatics (www.inab.org), a Platform of Genoma España.


References

Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M. and Sherlock, G. (2000) ‘Gene ontology: tool for the unification of biology. The gene ontology consortium’, Nat. Genet., Vol. 25, No. 1, pp.25–29.

Carlson, C.S., Eberle, M.A., Rieder, M.J., Yi, Q., Kruglyak, L. and Nickerson, D.A. (2004) ‘Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium’, Am. J. Hum. Genet., Vol. 74, No. 1, pp.106–120.

Conde, L., Vaquerizas, J.M., Dopazo, H., Arbiza, L., Reumers, J., Rousseau, F., Schymkowitz, J. and Dopazo, J. (2006) ‘PupaSuite: finding functional single nucleotide polymorphisms for large-scale genotyping purposes’, Nucleic Acids Res., Vol. 34 (Web Server issue), pp.W621–W625.

de Bakker, P.I., Yelensky, R., Pe’er, I., Gabriel, S.B., Daly, M.J. and Altshuler, D. (2005) ‘Efficiency and power in genetic association studies’, Nat. Genet., Vol. 37, No. 11, pp.1217–1223.

Ding, K., Zhang, J., Zhou, K., Shen, Y. and Zhang, X. (2005) ‘htSNPer1.0: software for haplotype block partition and htSNPs selection’, BMC Bioinformatics, Vol. 6, p.38.

Donnelly, P. (2008) ‘Progress and challenges in genome-wide association studies in humans’, Nature, Vol. 456, No. 7223, pp.728–731.

Edlund, C.K., Lee, W.H., Li, D., Van Den Berg, D.J. and Conti, D.V. (2008) ‘Snagger: a user-friendly program for incorporating additional information for tagSNP selection’, BMC Bioinformatics, Vol. 9, p.174.

Hubbard, T.J., Aken, B.L., Ayling, S., Ballester, B., Beal, K., Bragin, E., Brent, S., Chen, Y., Clapham, P., Clarke, L., Coates, G., Fairley, S., Fitzgerald, S., Fernandez-Banet, J., Gordon, L., Graf, S., Haider, S., Hammond, M., Holland, R., Howe, K., Jenkinson, A., Johnson, N., Kahari, A., Keefe, D., Keenan, S., Kinsella, R., Kokocinski, F., Kulesha, E., Lawson, D., Longden, I., Megy, K., Meidl, P., Overduin, B., Parker, A., Pritchard, B., Rios, D., Schuster, M., Slater, G., Smedley, D., Spooner, W., Spudich, G., Trevanion, S., Vilella, A., Vogel, J., White, S., Wilder, S., Zadissa, A., Birney, E., Cunningham, F., Curwen, V., Durbin, R., Fernandez-Suarez, X.M., Herrero, J., Kasprzyk, A., Proctor, G., Smith, J., Searle, S. and Flicek, P. (2009) ‘Ensembl 2009’, Nucleic Acids Res., Vol. 37 (Database issue), pp.D690–D697.

McCarthy, M.I., Abecasis, G.R., Cardon, L.R., Goldstein, D.B., Little, J., Ioannidis, J.P. and Hirschhorn, J.N. (2008) ‘Genome-wide association studies for complex traits: consensus, uncertainty and challenges’, Nat. Rev. Genet., Vol. 9, No. 5, pp.356–369.

Sherry, S.T., Ward, M.H., Kholodov, M., Baker, J., Phan, L., Smigielski, E.M. and Sirotkin, K. (2001) ‘dbSNP: the NCBI database of genetic variation’, Nucleic Acids Res., Vol. 29, No. 1, pp.308–311.

The-International-HapMap-Consortium (2005) ‘A haplotype map of the human genome’, Nature, Vol. 437, pp.1299–1320.