Page 1: Outline to SNP bioinformatics lecture

Outline to SNP bioinformatics lecture

• Brief introduction

• SNPs in cell biology

• SNP discovery

• SNP assessment

• SNP databases

• SNPs in genome browsers

Page 2: Outline to SNP bioinformatics lecture

Single Nucleotide Polymorphisms

• Must be present in at least 1% of the population

• Account for most (about 90%) of the sequence variation between two genomes

• Two humans differ by about 0.1% of their sequence

• About 1 SNP per 300 bp in the human genome

– Lower in coding regions

• 10 million in the human genome
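The figures on this slide can be checked against each other with a little arithmetic. The sketch below assumes a haploid human genome of roughly 3 billion bp (a figure not stated on the slide) and reads the 1% rule as a minor allele frequency threshold; the function names are illustrative only.

```python
# Back-of-the-envelope check of the slide's figures.
GENOME_SIZE_BP = 3_000_000_000  # assumption: ~3 Gb haploid human genome
KNOWN_SNPS = 10_000_000         # from the slide: ~10 million SNPs


def snp_spacing(genome_size_bp: int, n_snps: int) -> float:
    """Average number of base pairs per catalogued SNP."""
    return genome_size_bp / n_snps


def is_snp(minor_allele_count: int, total_alleles: int,
           threshold: float = 0.01) -> bool:
    """Apply the 1% rule: the minor allele must reach the threshold frequency."""
    return minor_allele_count / total_alleles >= threshold


print(snp_spacing(GENOME_SIZE_BP, KNOWN_SNPS))  # 300.0, i.e. 1 SNP per 300 bp
print(GENOME_SIZE_BP // 1000)                   # 3000000 differences at 0.1%
print(is_snp(3, 200))                           # True  (1.5% of alleles)
print(is_snp(1, 200))                           # False (0.5% of alleles)
```

Under these assumptions, 10 million SNPs in 3 billion bp gives the 1/300 bp density quoted above, while the 0.1% figure describes how many of those positions differ between any two individuals.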

Page 3: Outline to SNP bioinformatics lecture

Categories of SNPs

• Missense/Non-synonymous
– Changes an amino acid
– About half of the SNPs in coding sequence
– Can alter the function and/or structure of the protein
– Cause of most monogenic diseases
  • Hemochromatosis (HFE)
  • Cystic fibrosis (CFTR)
  • Hemophilia (F8)

• Nonsense
– Introduces a stop codon
– Same consequences as non-synonymous

Page 4: Outline to SNP bioinformatics lecture

Categories of SNPs

• Synonymous
– Does not alter the amino acid sequence
– May alter splicing

• Non-coding
– Can be located in promoter or regulatory regions
– Can impact the expression of the gene

• All SNPs can be used as markers
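The coding-sequence categories (synonymous, missense, nonsense) can be told apart by translating the codon before and after the change. The sketch below uses the standard genetic code; the function name `classify_snp` and the example codons are illustrative, not taken from the lecture.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snp(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-nucleotide change inside a codon.

    Note: a change that removes a stop codon is reported as 'missense'
    here, because the slides do not define a separate category for it.
    """
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three nucleotides long")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("a SNP changes exactly one nucleotide")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_snp("CTT", "CTC"))  # synonymous (Leu -> Leu)
print(classify_snp("GAG", "GTG"))  # missense   (Glu -> Val)
print(classify_snp("TGG", "TGA"))  # nonsense   (Trp -> stop)
```

Non-coding SNPs fall outside this scheme, since they do not lie in a codon at all.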

Page 5: Outline to SNP bioinformatics lecture

Uses for the cell biologist

• Association studies
– Use SNPs as markers to find regions associated with a phenotype

• Causative SNPs
– Altered protein
– Altered expression

• Regions of altered conservation between strains/species/individuals

• Evolutionary analyses

• Etc…
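As an illustration of how a SNP can serve as a marker in an association study, the sketch below runs a simple allelic chi-square test on a 2x2 table of allele counts in cases and controls. This is one common textbook approach, not necessarily the one used in the lecture, and the counts are invented for the example.

```python
import math


def allelic_chi_square(case_a: int, case_b: int,
                       control_a: int, control_b: int):
    """Chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Rows are cases/controls, columns are the two alleles of one SNP.
    Returns (chi_square, p_value).
    """
    n = case_a + case_b + control_a + control_b
    denominator = ((case_a + case_b) * (control_a + control_b)
                   * (case_a + control_a) * (case_b + control_b))
    if denominator == 0:
        raise ValueError("every row and column needs at least one count")
    chi2 = n * (case_a * control_b - case_b * control_a) ** 2 / denominator
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele A is seen 60 times in cases, 40 in controls.
chi2, p = allelic_chi_square(60, 40, 40, 60)
print(chi2)          # 8.0
print(round(p, 4))   # 0.0047
```

A small p-value suggests the marker, or something inherited together with it, is associated with the phenotype. In a genome-wide setting the threshold would need correcting for the number of SNPs tested.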

Page 6: Outline to SNP bioinformatics lecture

SNP discovery

• SNPs are usually discovered by sequencing

• Discovery is based on separating sequencing errors from 'real' differences and assessing the frequency in the sequenced population

• Separation of paralogous sequences

• Validation, genotyping
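The idea of separating sequencing errors from real differences can be sketched with a deliberately naive filter on one column of aligned reads. This is not the algorithm of any particular tool; the thresholds and the function names are assumptions chosen for illustration.

```python
from collections import Counter


def phred_to_error_prob(quality: int) -> float:
    """Convert a phred quality score into a base-call error probability."""
    return 10 ** (-quality / 10)


def call_candidate_snp(column, min_quality=20, min_reads=2, min_fraction=0.2):
    """Return (major, minor) alleles if the column looks polymorphic, else None.

    column: list of (base, phred_quality) pairs, one per read covering
    the position. All thresholds are illustrative assumptions.
    """
    # Discard low-quality base calls, which are likely sequencing errors.
    counts = Counter(base for base, quality in column if quality >= min_quality)
    if len(counts) < 2:
        return None
    (major, _), (minor, minor_n) = counts.most_common(2)
    total = sum(counts.values())
    # Require the second allele in several reads and at a reasonable frequency.
    if minor_n >= min_reads and minor_n / total >= min_fraction:
        return major, minor
    return None


good = [("A", 40)] * 6 + [("G", 35)] * 4 + [("T", 8)]
noisy = [("A", 40)] * 9 + [("C", 10)]
print(call_candidate_snp(good))   # ('A', 'G')
print(call_candidate_snp(noisy))  # None: the lone C is a low-quality call
```

Real discovery tools model error probabilities more carefully and also have to deal with paralogous sequences, which this sketch ignores entirely.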

Page 7: Outline to SNP bioinformatics lecture

SNP discovery resources

• PolyBayes
– SNP discovery in redundant sequences

• PolyPhred
– SNP discovery based on phred/phrap/consed

• novoSNP
– Graphical identification of SNPs

Page 8: Outline to SNP bioinformatics lecture

Example: PolyPhred

• Detects heterozygotes from chromatograms

• Runs together with phred/phrap/consed

• Command line

Page 9: Outline to SNP bioinformatics lecture

SNP assessment

• Assess SNPs for functional effects
– Non-synonymous SNPs

• Conservation across species

• Amino acid properties

• Protein structure

• Transmembrane regions, signal peptides etc.

Page 10: Outline to SNP bioinformatics lecture

SNP assessment resources

• SIFT

• PolyPhen

• PMut

• SNPs3D

• PANTHER PSEC

• TopoSNP

• MAPP

• Etc.

Page 11: Outline to SNP bioinformatics lecture

Example: SIFT

• Sorting Intolerant From Tolerant

• Builds an alignment of similar sequences

• Calculates a score based on the amino acids in the alignment

• Takes the environment into account

• Takes the properties of the amino acids into account

• Does not use structure
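The alignment-based idea can be illustrated with a toy conservation score: how often the substituted amino acid is seen at that column of the alignment, relative to the most common residue there. This is a much-simplified stand-in and not SIFT's actual scoring, which also accounts for amino acid properties. The alignment, function names and scaling are all hypothetical.

```python
from collections import Counter


def column_frequencies(alignment, position):
    """Relative frequency of each residue at one alignment column (gaps ignored)."""
    residues = [seq[position] for seq in alignment if seq[position] != "-"]
    counts = Counter(residues)
    return {aa: n / len(residues) for aa, n in counts.items()}


def tolerance_score(alignment, position, substituted_aa):
    """Toy score in [0, 1]: frequency of the substituted residue at this column,
    scaled by the frequency of the most common residue there.
    Low values suggest the substitution is rarely or never tolerated."""
    freqs = column_frequencies(alignment, position)
    if not freqs:
        raise ValueError("column contains only gaps")
    return freqs.get(substituted_aa, 0.0) / max(freqs.values())


# Hypothetical alignment of four similar protein sequences.
alignment = ["MKVLA", "MKILA", "MRVLA", "MKVLG"]
print(tolerance_score(alignment, 0, "M"))  # 1.0: M is the only residue seen
print(tolerance_score(alignment, 1, "R"))  # about 0.33: R occurs once, K three times
print(tolerance_score(alignment, 1, "D"))  # 0.0: D is never observed here
```

A substitution to a residue never seen at a well-conserved position would be flagged as likely intolerant under this scheme.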

Page 13: Outline to SNP bioinformatics lecture

SNP databases

• Maps of SNPs in human, mouse, etc

• Haplotype maps

• Functional SNPs

• Disease databases

Page 14: Outline to SNP bioinformatics lecture

SNP databases

• dbSNP

• F-SNP

• HGVbase

• PolyDoms

• OMIM

• Etc…

Page 15: Outline to SNP bioinformatics lecture

Example: dbSNP

• 50 million submissions

• 18 million clusters

• 7 million in genes

• 44 organisms

• 91 million SNPs submitted

Page 16: Outline to SNP bioinformatics lecture

dbSNP

• Search for SNPs, location, etc

• Submitted information: method, flanking sequence, alleles, population, sample size, validation, etc.

• Computed information for SNPs at the same location, including functional analysis, population diversity, etc.

Page 18: Outline to SNP bioinformatics lecture

SNPs in genome browsers

• Ensembl

• UCSC

Page 19: Outline to SNP bioinformatics lecture

Example: UCSC

Page 23: Outline to SNP bioinformatics lecture

HapMap

• Aim: a haplotype map of the human genome describing common patterns of sequence variation

• A haplotype map is based on the observation that alleles of SNPs located close together tend to be inherited together

• HapMap will identify which SNPs are informative for mapping, reducing the number of SNPs to genotype by an order of magnitude

• Populations from Asia, Europe and Africa

• 2nd generation map with over 3.1 million SNPs
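The statement that alleles of nearby SNPs are inherited together is usually quantified as linkage disequilibrium. The sketch below computes the common r² measure for two SNPs from a list of phased haplotypes. The data are invented, and the function name and input layout are assumptions made for illustration.

```python
def r_squared(haplotypes, allele_1, allele_2):
    """Linkage disequilibrium (r^2) between two biallelic SNPs.

    haplotypes: list of 2-character strings, one per phased chromosome,
                giving the allele at SNP 1 followed by the allele at SNP 2.
    allele_1, allele_2: the alleles whose co-occurrence is measured.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == allele_1 for h in haplotypes) / n
    p_b = sum(h[1] == allele_2 for h in haplotypes) / n
    p_ab = sum(h[0] == allele_1 and h[1] == allele_2 for h in haplotypes) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    d = p_ab - p_a * p_b  # deviation from independent inheritance
    return d * d / denominator


linked = ["AG", "AG", "AG", "AG", "CT", "CT", "CT", "CT"]
unlinked = ["AG", "AT", "CG", "CT"]
print(r_squared(linked, "A", "G"))    # 1.0: one SNP fully predicts the other
print(r_squared(unlinked, "A", "G"))  # 0.0: alleles are inherited independently
```

When r² between two SNPs is close to 1, genotyping one of them is enough to infer the other, which is how a haplotype map reduces the number of SNPs that need to be genotyped.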

Page 24: Outline to SNP bioinformatics lecture

Ng PC, Henikoff S. Predicting the effects of amino acid substitutions on protein function. Annu Rev Genomics Hum Genet. 2006;7:61-80. Review.

Bhatti P, Church DM, Rutter JL, Struewing JP, Sigurdson AJ. Candidate single nucleotide polymorphism selection using publicly available tools: a guide for epidemiologists. Am J Epidemiol. 2006 Oct 15;164(8):794-804. Epub 2006 Aug 21.

Clifford RJ, Edmonson MN, Nguyen C, Scherpbier T, Hu Y, Buetow KH. Bioinformatics tools for single nucleotide polymorphism discovery and analysis. Ann N Y Acad Sci. 2004 May;1020:101-9. Review.

The International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851-861.

