SNP Resources: Finding SNPs Databases and Data Extraction

SNP Resources: Finding SNPs, Databases and Data Extraction. Mark J. Rieder, PhD, and Robert J. Livingston, PhD. NIEHS Variation Workshop, January 30-31, 2005.

SNP Resources: Finding SNPs
Databases and Data Extraction

Mark J. Rieder, PhD
Robert J. Livingston, PhD

NIEHS Variation Workshop
January 30-31, 2005

Genotype - Phenotype Studies

Typical Approach:

“I have candidate gene/region and samples ready to study. Tell me what SNPs to genotype.”

Other questions:
• How do I know I have *all* the SNPs?
• What is the validation/quality of the SNPs that are known?
• Are these SNPs informative in my population/sample?

What do I need to know for selecting the “best” SNPs?
How do I pick the “best” SNPs?

What information do I need to characterize a SNP for genotyping?

Minimal SNP information for genotyping/characterization

• What is the SNP? Flanking sequence and alleles. FASTA format:
>snp_name
ACCGAGTAGCCAG[A/G]ACTGGGATAGAAC

• dbSNP reference SNP # (rs #)

• Where is the SNP mapped? Exon, promoter, UTR, etc. Picture of the gene with SNPs mapped to the gene structure.

• How was it discovered? Method
• What assurances do you have that it is real? Validated how?
• What population – African, European, etc?
• What is the allele frequency of each SNP? Common (>10%), rare
• Are other SNPs associated - redundant? Genotyping data!
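
The bracketed FASTA convention in the first item above (flanking sequence with the alleles written as [A/G]) is easy to split programmatically. The following is a minimal sketch, not a tool from the workshop: the names `SnpRecord` and `parse_snp_fasta` are hypothetical, and it assumes one bracketed variant per record with alleles separated by "/".

```python
import re
from typing import List, NamedTuple, Tuple


class SnpRecord(NamedTuple):
    """One SNP with its flanking sequence (hypothetical container)."""
    name: str
    left_flank: str
    alleles: Tuple[str, ...]
    right_flank: str


# Matches e.g. ACCGAGTAGCCAG[A/G]ACTGGGATAGAAC ("-" allows deletion alleles).
SNP_PATTERN = re.compile(
    r"^([ACGTN]*)\[([ACGTN-]+(?:/[ACGTN-]+)+)\]([ACGTN]*)$", re.IGNORECASE
)


def parse_snp_fasta(text: str) -> List[SnpRecord]:
    """Parse FASTA text whose sequences contain one [allele/allele] site."""
    records = []
    for chunk in text.split(">")[1:]:
        lines = chunk.strip().splitlines()
        if not lines:
            continue
        name = lines[0].strip()
        # Sequences may be wrapped over several lines; join them first.
        sequence = "".join(part.strip() for part in lines[1:])
        match = SNP_PATTERN.match(sequence)
        if match is None:
            raise ValueError(f"No bracketed SNP found in record {name!r}")
        left, alleles, right = match.groups()
        records.append(
            SnpRecord(
                name,
                left.upper(),
                tuple(a.upper() for a in alleles.split("/")),
                right.upper(),
            )
        )
    return records


if __name__ == "__main__":
    example = ">snp_name\nACCGAGTAGCCAG[A/G]ACTGGGATAGAAC\n"
    for record in parse_snp_fasta(example):
        print(record)
```

Keeping the two flanks separate from the alleles makes it straightforward to hand the record to assay-design software, which typically wants each flank as its own field.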

Finding SNPs: Databases and Extraction

How do I find and download SNP data for analysis/genotyping?

1. Entrez Gene
- dbSNP
- Entrez SNP

2. HapMap Genome Browser

3. NIEHS Environmental Genome Project (EGP)
Candidate gene website

4. NIEHS web applications and other tools
GeneSNPs, PolyDoms, TraFac, PolyPhen, ECR Browser, GVS

NCBI - Database Resource

www.ncbi.nlm.nih.gov

NOS2A

Finding SNPs: Where do I start?
http://www.ncbi.nlm.nih.gov/gquery

NCBI - Entrez Gene

Finding SNPs: Entrez Gene

dbSNP Geneview

HapMap Verified

Finding SNPs: dbSNP validation (by 2hit-2allele)

Finding SNPs: dbSNP database

Entrez SNP - dbSNP genotype retrieval

Finding SNPs - Gene Genotype Report

Minimal SNP information for genotyping/characterization

dbSNP - data is there

Entrez Gene Entry - Entrez SNP

Entrez SNP - direct dbSNP querying

Entrez SNP - Parseable Multi-SNP reports

Entrez SNP - Search Limiting Capabilities

NOS2A

Entrez SNP - Search Limits

Entrez SNP - Query Term Capabilities

Entrez SNP - Search Terms Fields

More advanced queries:

2[CHR] AND "coding nonsynon"[FUNC]

2[CHR] AND "coding nonsynon"[FUNC] AND "EGP_SNPS"[HANDLE]

Note: Wildcard (*) characters and the AND, OR, and NOT operators can also be used.
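
Field-tagged queries like the ones above can also be assembled in a script. The sketch below only builds the query string and a request URL for NCBI's E-utilities `esearch` endpoint with `db=snp`; it does not contact NCBI. The helper names are hypothetical, and the field tags (CHR, FUNC, HANDLE) are taken from these 2005 slides, so whether the current Entrez SNP index still accepts them is an assumption to verify.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field_term(value: str, field: str) -> str:
    """Format one term as value[FIELD]; quote anything not purely alphanumeric."""
    if not value.isalnum():
        value = f'"{value}"'
    return f"{value}[{field}]"


def build_query(terms, operator: str = "AND") -> str:
    """Join (value, field) pairs with a boolean operator (AND, OR, NOT)."""
    return f" {operator} ".join(field_term(value, field) for value, field in terms)


def esearch_url(query: str, retmax: int = 20) -> str:
    """Build (but do not send) an esearch request URL for the SNP database."""
    params = {"db": "snp", "term": query, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Field tags as shown on the slides; they may have changed since 2005.
    query = build_query(
        [("2", "CHR"), ("coding nonsynon", "FUNC"), ("EGP_SNPS", "HANDLE")]
    )
    print(query)
    print(esearch_url(query))
```

Building the URL separately from sending it makes it easy to paste the same query into the Entrez web form and check the hit count before retrieving results in bulk.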

Entrez SNP - Advanced Queries

Minimal SNP information for genotyping/characterization

Entrez SNP - better!

Finding SNPs - Entrez SNP Summary

1. dbSNP is useful for investigating detailed information on a small number of SNPs, and it's good for a picture of the gene.
2. Entrez SNP is a direct, fast database for querying SNP data.
3. Data from Entrez SNP can be retrieved in batches for many SNPs.
4. Entrez SNP data can be “limited” to specific subsets of SNPs and formatted in plain text for easy parsing and manipulation.
5. More detailed queries can be formed using specific “field tags” for retrieving SNP data.

www.hapmap.org

Finding SNPs: HapMap Browser

Finding SNPs: HapMap Genotypes

Finding SNPs: HapMap Browser

1. HapMap data sets are useful because individual genotype data can be used to determine optimal genotyping strategies (tagSNPs) or perform population genetic analyses (linkage disequilibrium).
2. Data are specific to what those projects produced (not all of dbSNP). HapMap data is available in dbSNP.
3. HapMap data (Phase II) can be accessed pre-release, prior to dbSNP.
4. Easier visualization of data and direct access to SNP data, individual genotypes, and LD analysis.
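
To make the linkage disequilibrium point concrete, here is a minimal sketch of the standard r² statistic between two biallelic SNPs. It assumes phased haplotypes are already available, which is a simplification because raw genotype downloads are unphased; the function name `r_squared` is hypothetical.

```python
from typing import Sequence, Tuple


def r_squared(haplotypes: Sequence[Tuple[str, str]]) -> float:
    """Return r^2 between two biallelic SNPs.

    Each item is (allele at SNP 1, allele at SNP 2) for one phased chromosome.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("No haplotypes supplied")
    # Use the alleles on the first chromosome as the reference alleles.
    ref_a, ref_b = haplotypes[0][0], haplotypes[0][1]
    p_a = sum(1 for h in haplotypes if h[0] == ref_a) / n
    p_b = sum(1 for h in haplotypes if h[1] == ref_b) / n
    p_ab = sum(1 for h in haplotypes if h[0] == ref_a and h[1] == ref_b) / n
    # D is the deviation of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denominator


if __name__ == "__main__":
    perfectly_linked = [("A", "C"), ("A", "C"), ("G", "T"), ("G", "T")]
    independent = [("A", "C"), ("A", "T"), ("G", "C"), ("G", "T")]
    print(r_squared(perfectly_linked))
    print(r_squared(independent))
```

An r² near 1 means one SNP predicts the other almost completely, so genotyping both is redundant; that is the basis for choosing tagSNPs.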

Finding SNPs: NIEHS SNPs Candidate Genes

egp.gs.washington.edu

African American
African YRI
European CEU
Hispanic
Asian CHB JPT

SNP_pos <tab> Ind_ID <tab> allele1 <tab> allele2
Repeat for all individuals
Repeat for next SNP
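
A file in the four-column layout above can be summarized into allele frequencies with a few lines of code. This is a minimal sketch under stated assumptions: the function names are hypothetical, the file is assumed to have no header row, and missing-genotype codes are not handled because the slides do not say how they are encoded. The 10% cutoff for "common" comes from the checklist slide.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def allele_frequencies(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Tally alleles per SNP from 'SNP_pos<TAB>Ind_ID<TAB>allele1<TAB>allele2' rows."""
    counts = defaultdict(Counter)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        snp_pos, _ind_id, allele1, allele2 = line.split("\t")
        # Each individual contributes two chromosomes per SNP.
        counts[snp_pos][allele1] += 1
        counts[snp_pos][allele2] += 1
    frequencies = {}
    for snp_pos, counter in counts.items():
        total = sum(counter.values())
        frequencies[snp_pos] = {a: c / total for a, c in counter.items()}
    return frequencies


def minor_allele_frequency(freq: Dict[str, float]) -> float:
    """Second-largest allele frequency; 0.0 if only one allele was observed."""
    if len(freq) < 2:
        return 0.0
    return sorted(freq.values())[-2]


def is_common(freq: Dict[str, float], cutoff: float = 0.10) -> bool:
    """'Common' as on the checklist slide: minor allele frequency above 10%."""
    return minor_allele_frequency(freq) > cutoff


if __name__ == "__main__":
    rows = [
        "100\tind1\tA\tG",
        "100\tind2\tA\tA",
        "200\tind1\tC\tC",
        "200\tind2\tC\tC",
    ]
    for snp, freq in allele_frequencies(rows).items():
        print(snp, freq, "common" if is_common(freq) else "rare or monomorphic")
```

Running the tally separately for each population sample answers the earlier question of whether a SNP is informative in the population being studied.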

PolyPhen - Polymorphism Phenotyping
Structural protein characteristics and evolutionary comparison

SIFT = Sorting Intolerant From Tolerant
Evolutionary comparison of non-synonymous SNPs

GeneSNPs

Graphic view of SNPs in context of gene elements
All NIEHS genes presented
- organized by pathway/function
SNPs from dbSNP
- organized by submitter handle
Sequence context of SNPs presented in Color Fasta format
Link-outs to EntrezSNP pages
Summary “Genome SNPs” view for one-stop SNP shopping

http://www.genome.utah.edu/genesnps/

GeneSNPs: One stop shopping

Polydoms

A web-based application that maps synonymous and non-synonymous SNPs onto known functional protein domains

• SNPs are from dbSNP and GeneSNPs
• Domain structures from NCBI's Conserved Domain Database
• Functional predictions based on SIFT and PolyPhen
• 3 dimensional mapping of SNPs on protein structure using Chime viewer

http://polydoms.cchmc.org/polydoms/

Polydoms

Mapping of nsSNPs onto protein structure

ARG <-> 5 HIS
ARG <-> 107 HIS

TraFac: Transcription Factor Binding Site Comparison

A tool for validating cis regulatory elements conserved between human and mouse

• Aligns human and mouse sequences using BLASTZ
• Consensus transcription factor binding sequences from Transfac database

http://trafac.cchmc.org/trafac

All TFBS in common
TFBS in parallel

ECR Browser: Evolutionary Conserved Regions

Aligns sequences to Mouse, Rat, Dog, Opossum, Chicken, Fugu and Drosophila
Gene annotations from UCSC Genome Browser
Easy retrieval of ECR sequences and alignments
Pre-computed transcription factor binding sites

http://ecrbrowser.dcode.org

Human-mouse alignment
Fasta sequences

Transcription Factor Binding Sites from Transfac

PolyPhen: Polymorphism Phenotyping - prediction of functional effect of human nsSNPs

Physical and comparative analyses used to make predictions
Uses SwissProt annotations to identify known domains
Calculates a substitution probability from BLAST alignments of homologous and orthologous sequences
Ranks substitutions on scale of predicted functional effects from “benign” to “probably damaging”

http://tux.embl-heidelberg.de/ramensky/

GVS: Genome Variation Server

Provides rapid analysis of 4.3 million genotyped SNPs from dbSNP and the HapMap
Mapped to human genome build 35 (hg17)
Displays genotype data in text and image formats
Displays tagSNPs or clusters of informative SNPs in text and image formats
Displays linkage disequilibrium (LD) in text and image formats

http://gvs.gs.washington.edu/GVS/

NOS2A
EGP Yoruban population
UTR and Coding SNPs
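
The idea of tagSNPs, or clusters of informative SNPs, can be illustrated with a simple greedy binning over pairwise r² values. This is a sketch of one common approach and not necessarily what GVS implements; the function name, the 0.8 threshold, and the example r² values are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def greedy_tag_bins(
    snps: Sequence[str],
    r2: Dict[Tuple[str, str], float],
    threshold: float = 0.8,
) -> List[Tuple[str, List[str]]]:
    """Group SNPs into bins, each represented by one tagSNP.

    r2 maps an unordered SNP pair to its r^2; missing pairs count as 0.
    """

    def linked(a: str, b: str) -> bool:
        return r2.get((a, b), r2.get((b, a), 0.0)) >= threshold

    remaining = list(snps)
    bins = []
    while remaining:
        # Pick the SNP that captures the most other remaining SNPs.
        tag = max(
            remaining,
            key=lambda s: sum(1 for o in remaining if o != s and linked(s, o)),
        )
        members = [tag] + [o for o in remaining if o != tag and linked(tag, o)]
        bins.append((tag, members))
        remaining = [s for s in remaining if s not in members]
    return bins


if __name__ == "__main__":
    # Made-up r^2 values for four hypothetical SNPs.
    example_r2 = {
        ("s1", "s2"): 0.95,
        ("s1", "s3"): 0.90,
        ("s2", "s3"): 0.50,
        ("s3", "s4"): 0.10,
    }
    for tag, members in greedy_tag_bins(["s1", "s2", "s3", "s4"], example_r2):
        print(tag, members)
```

Genotyping only the tag from each bin recovers most of the information in the full SNP set, which is why the bins matter for assay budgets.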

Finding SNPs: Databases and Extraction

One stop shopping
- NIEHS SNPs and GeneSNPs

Prediction of functional variations
- Polydoms and PolyPhen

Identification of transcription factor binding sites in Evolutionary Conserved Regions
- TraFac and the ECR browser

Visualization and analysis of LD and TagSNPs
- GVS
