Presented by: Andrew McMurry Boston University Bioinformatics Children’s Hospital Informatics Program Harvard Medical School Center for BioMedical Informatics This Presentation Available at: http://pixelshelf.com/~justandy/f-snp.ppt

Presented by:

Andrew McMurry

Boston University Bioinformatics

Children’s Hospital Informatics Program

Harvard Medical School Center for BioMedical Informatics

This Presentation Available at: http://pixelshelf.com/~justandy/f-snp.ppt

Outline

Incidental Findings and Disconnected Patient Cohorts

Disease Association Studies Using SNPs

How SNPs cause disease

Computationally predict the effect of SNPs within introns, exons, and regulatory regions

The Future Is Now: SNPs, Personalized Medicine, and Translational Research

Incidental Findings and Disconnected Patient Cohorts

IF the central dogma of biology is: “From DNA -> RNA -> Protein”

THEN where is the patient data for association studies?

Very little patient data spanning DNA/RNA/protein/phenotype across a single cohort

Need to obtain “robust” sample sizes to avoid incidental findings due to multiple testing [1]

[1] Isaac Kohane, Daniel Masys, and Russ Altman. "The Incidentalome: A Threat to Genomic Medicine" JAMA 296(2): 212-215. July 12, 2006.
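
To make the multiple-testing concern concrete, here is a minimal sketch of the arithmetic. It assumes independent tests and a conventional per-test significance level of 0.05; neither figure comes from the talk, and the function names are illustrative. The 500,000-test example borrows the SNP chip size quoted on a later slide.

```python
def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Expected number of chance 'hits' when every null hypothesis is true."""
    return num_tests * alpha


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / num_tests


if __name__ == "__main__":
    num_snps = 500_000  # assumption: one test per SNP on a 500k chip
    print("expected false positives:", expected_false_positives(num_snps))
    print("family-wise error rate:  ", familywise_error_rate(num_snps))
    print("Bonferroni threshold:    ", bonferroni_threshold(num_snps))
```

Under these assumptions, 500,000 tests at 0.05 would be expected to yield roughly 25,000 false positives by chance alone, and a Bonferroni correction would demand a per-test threshold of about 1e-7. This is one reason small cohorts cannot support genome-wide claims.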

Disease Association Studies Using SNPs

DNA sequencing technologies are still very expensive
• Stunningly few patients
• Minimal sequence coverage

This could change in time with Solexa/454

Even with Solexa/454, piecing together the results is a massive task (the maximum read length is often shorter than a single repeated gene)

Rate-limiting step: adoption rate of DNA sequencing

Use what is available in abundance: SNP chips
• Abundance of SNP chips in public repositories, covering many diseases
• Whole-genome coverage: 500k SNPs for $250

Disease Association Studies Using SNPs

DNA to RNA to Protein

Associating DNA & RNA
• GEO alone holds well over 100k gene expression arrays
• What if we could correlate the effect of SNPs on gene expression?

Associating DNA & Gene Product (protein)
• Countless public protein databases
• What if we could correlate the effect of SNPs on protein coding?

Association studies involving multiple genomic measurements
• What are the existing studies and models (HMMs/Bayes nets) that could be strengthened with evidence from SNP chips?

How SNPs cause disease

Intron
• Likely no effect

Protein Coding
• Synonymous: same amino acid
• Missense (non-synonymous): different amino acid
• Nonsense: premature STOP

Splicing Regulation
• Incorrect final mRNA transcript

Transcriptional Regulation
• Differential gene expression

Post-Translational
• Protein phosphorylation
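
The protein-coding cases can be told apart mechanically by translating the reference and variant codons. The sketch below is a minimal illustration using the standard genetic code. The function name and the extra "stop-loss" label are assumptions added for completeness and are not part of the talk.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon change by its effect on the encoded residue."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid
    if alt_aa == "*":
        return "nonsense"     # premature STOP
    if ref_aa == "*":
        return "stop-loss"    # assumption: extra label, not on the slide
    return "missense"         # different amino acid


if __name__ == "__main__":
    for ref, alt in [("GAA", "GAG"), ("GAA", "GTA"), ("TGG", "TGA")]:
        print(ref, "->", alt, ":", classify_coding_snp(ref, alt))
```

This only covers the coding-sequence consequence. Whether a missense change is actually damaging is what the prediction tools discussed later try to estimate.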

So how do we measure all these effects of SNPs?

F-SNP : integrated approach

1. Classify SNP site using dbSNP
• Intron
• Coding Region
• Splice Site
• TF Binding Site
• Post-Translational Site

2. Evaluate using the specialized algorithms/databases
• Coding Region (missense/nonsense mutations)
• Splice Site (intronic/exonic sites)
• TF Binding Site (promoter/repressor/etc.)
• Post-Translational Site (Phospho/Tyrosine/O-glycosylation)

3. “Majority Vote” across algorithms
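
The three steps above can be sketched as a small voting routine. This is an illustration only: it assumes each tool emits a simple label per SNP and that ties are reported as undetermined. The category keys, tool names, and tie rule are hypothetical and are not taken from the F-SNP implementation.

```python
from collections import Counter
from typing import Dict, Iterable


def majority_vote(calls: Iterable[str]) -> str:
    """Return the most common call, or 'undetermined' on a tie or no calls."""
    ranked = Counter(calls).most_common()
    if not ranked:
        return "undetermined"
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "undetermined"
    return ranked[0][0]


def assess_snp(predictions: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Map each functional category to the majority call of its tools.

    `predictions` maps category -> {tool name: call}.
    """
    return {
        category: majority_vote(tool_calls.values())
        for category, tool_calls in predictions.items()
    }


if __name__ == "__main__":
    # Hypothetical tool outputs for a single SNP.
    example = {
        "protein_coding": {
            "tool_a": "functional",
            "tool_b": "non-functional",
            "tool_c": "functional",
        },
        "splicing_regulation": {
            "tool_d": "functional",
            "tool_e": "non-functional",
        },
    }
    print(assess_snp(example))
```

Reporting ties explicitly, rather than picking a winner, keeps the sketch from overstating agreement between tools.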

F-SNP decision procedure for functional SNPs

F-SNP: User Interfaces & Data Download

Public Web Site

Federated Query = entire database cannot be downloaded

Currently:
• No SOAP (web service) support
• No RSS support
• No source code available

However: the paper gives explicit instructions on how to reproduce the algorithm and construct the database using dbSNP, OMIM, etc.

“Large N Study” using F-SNP

Functional Category          # of Assessed SNPs   # of Functional SNPs
Protein Coding                          154,140                 66,899
Splicing Regulation                      73,051                  8,075
Transcriptional Regulation              453,710                 78,296
Post Translation                         64,736                  4,477
Total                                   559,322                115,356
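
As a reading aid, the fraction of assessed SNPs called functional in each category can be computed directly from the counts above. The sketch below only reuses the numbers on this slide; the dictionary and function names are illustrative.

```python
# (assessed, functional) counts copied from the slide.
COUNTS = {
    "Protein Coding": (154_140, 66_899),
    "Splicing Regulation": (73_051, 8_075),
    "Transcriptional Regulation": (453_710, 78_296),
    "Post Translation": (64_736, 4_477),
}
TOTAL = (559_322, 115_356)


def functional_fraction(assessed: int, functional: int) -> float:
    """Share of assessed SNPs that were predicted to be functional."""
    return functional / assessed


if __name__ == "__main__":
    for category, (assessed, functional) in COUNTS.items():
        share = functional_fraction(assessed, functional)
        print(f"{category:<28}{share:6.1%}")
    print(f"{'Total':<28}{functional_fraction(*TOTAL):6.1%}")
    summed = sum(assessed for assessed, _ in COUNTS.values())
    print("sum of per-category assessed counts:", summed)
```

Roughly 43% of assessed protein-coding SNPs are called functional, against about 21% overall. The per-category assessed counts add up to more than the stated total, which suggests (as an inference, not a statement from the slide) that some SNPs are counted in more than one category.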

Evaluate Individual SNP (rs28897699)

SNP summary and Functional Predictions

SNP Primary Information (rs28897699)
• Locus
• Alleles
• Ancestral Allele
• Validation (if any)
• Region
• Link to References

F-SNP: Functional Predictions

F-SNP Prediction Detail: PolyPhen = benign effect on protein coding

F-SNP Prediction Detail: SNPs3D = deleterious to protein coding

NCBI Gene Information
• Product: breast cancer 1, early onset
• Other names: BRCA1, BRCAI, BRCC1, IRIS, PSCP, RNF53

NCBI Entrez Gene Summary: This gene encodes a nuclear phosphoprotein that plays a role in maintaining genomic stability and acts as a tumor suppressor. (…) Mutations in this gene are responsible for approximately 40% of inherited breast cancers and more than 80% of inherited breast and ovarian cancers. Alternative splicing plays a role in modulating the subcellular localization and physiological function of this gene. Many alternatively spliced transcript variants have been described for this gene but only some have had their full-length natures identified. (…)

F-SNP functional prediction

On Protein Coding: 2 votes benign, 1 deleterious, 1 nonsynonymous

On Splicing Regulation: predicted functional impact (by majority vote)

Gene level view of BRCA1

Query by gene name = “BRCA1”

Returns list of SNPs in BRCA1

Returns list of Cancers associated with BRCA1

Gene level view of BRCA1

Our SNP has functional impact

Our SNP has neighboring functional SNPs

Disease Level View : Breast Cancer

Disease Level View : Breast Cancer

Show all disease genes associated with breast cancer

Denote whether SNPs are present in those genes (including 5 kb upstream/downstream)
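
The flanking-window check can be sketched as a simple interval test. This is a minimal illustration that assumes the "5k" on the slide means 5,000 base pairs and that SNP and gene coordinates share one reference assembly. The `Gene` class, the function name, and the example coordinates are hypothetical.

```python
from dataclasses import dataclass

FLANK_BP = 5_000  # assumption: "5k" is read as 5,000 base pairs


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # inclusive start coordinate
    end: int    # inclusive end coordinate


def snp_in_gene_window(snp_chrom: str, snp_pos: int, gene: Gene,
                       flank: int = FLANK_BP) -> bool:
    """True if the SNP lies in the gene or within `flank` bp on either side."""
    if snp_chrom != gene.chrom:
        return False
    return gene.start - flank <= snp_pos <= gene.end + flank


if __name__ == "__main__":
    demo_gene = Gene("GENE_X", "chr1", 100_000, 120_000)  # hypothetical locus
    for pos in (94_000, 96_000, 110_000, 126_000):
        print(pos, snp_in_gene_window("chr1", pos, demo_gene))
```

A symmetric window is the simplest reading of the slide. A strand-aware version would be needed to distinguish upstream from downstream hits.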

Recap of Disease Level View

The Future Is Now: SNPs, Personalized Medicine, and Translational Research

SNP profiling becoming part of routine care [2]

• Increase # of clinically annotated SNP chips
• Increase # of disease association studies using SNPs

Increase in NIH focus on “translational research” that bridges routine care delivery with research efforts

Genome Wide Association Studies (GWAS) that actually get funded

[2] Kohane IS, Mandl KD, Taylor PL, Holm IA, Nigrin DJ, Kunkel LM. "Medicine. Reestablishing the researcher-patient compact." Science. 2007 Nov 16;318(5853):1068.

F-SNP Summary

Incidental Findings and Disconnected Patient Cohorts
• Central dogma of biology is DNA -> RNA -> Protein, yet we lack a cohort that spans all measurements
• Using a limited sample size will inevitably lead to incidental outcomes

Disease Association Studies Using SNPs
• Don’t wait for DNA sequencing to become widespread
• SNPs are becoming an abundant resource and are not going to disappear

How SNPs cause disease
• Protein Coding
• Splicing Regulation
• Transcriptional Regulation
• Post-Translation

Computationally predict the effect of SNPs within introns, exons, and regulatory regions
• Multitude of existing SNP analysis tools and resources
• F-SNP provides a single web-based resource to mine SNP disease associations
• Query and analysis by SNP, gene, or disease

The role of SNPs in Personalized Medicine and Translational Research