Tools in Human Genomics
Brett Bowman
March 3rd, 2013
Summary
• Brief Bio
• Review of Genotyping Technologies
– SNP-chips (23andMe)
– Exome Sequencing (In Clinical Use)
– Whole Genome Sequencing
• Review of SNP Analysis tools
– SNP Databases
– Report Tools
– OMIM
My (Sort of) Karyotype
GENOTYPING TECHNOLOGIES
23andMe – How They Do It
23andMe – How it Works
• Attach un-labeled sequence probes to array surface
• Extract and Amplify sample DNA
• Fragment
• Wash over and bind to array probes
• Extend probe 1 bp with polymerase and labeled dNTPs
• Photograph!
23andMe – Processed Output
• rsid == refSNP id == dbSNP id
• Two letter genotype representing both alleles
• NOT phased data
• No quality information
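A minimal sketch of reading this kind of output, assuming a tab-separated text file with `#` comment lines and four columns (rsid, chromosome, position, genotype). The column layout, the function names, and the sample rows (including the rsids) are illustrative assumptions, not real data. Because the calls are unphased, the two letters of a genotype carry no order: "AG" and "GA" mean the same thing.

```python
import io
from typing import Iterator, NamedTuple


class Call(NamedTuple):
    rsid: str        # dbSNP / refSNP identifier
    chromosome: str
    position: int
    genotype: str    # unphased: letter order carries no meaning


def parse_genotypes(handle) -> Iterator[Call]:
    """Yield one Call per data row, skipping blank lines and '#' comments."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split("\t")
        yield Call(rsid, chrom, int(pos), genotype)


def is_heterozygous(call: Call) -> bool:
    """True when two alleles were called and they differ."""
    return len(call.genotype) == 2 and call.genotype[0] != call.genotype[1]


# Hypothetical rows for illustration only; these rsids are made up.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\t1\t2000\tAG\n"
    "rs0000003\tX\t3000\tC\n"
)

calls = list(parse_genotypes(io.StringIO(SAMPLE)))
for call in calls:
    print(call.rsid, call.genotype, is_heterozygous(call))
```

Note that nothing in a row says how confident the call is, which is the "no quality information" point above.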
SNP-Chips Limitations
• Requires a priori knowledge of SNPs of interest
• Requires individual probes be designed and manufactured for each SNP
• SNP-Chips limited by size in the number of probes they can contain
• Cannot determine phase
• Cannot determine copy number
• Small Error Rate * Large Number = High Error Count
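The last point is simple arithmetic, sketched here with hypothetical numbers: the chip size and per-SNP error rate below are assumptions chosen for illustration, not measured figures for any real array.

```python
def expected_miscalls(n_snps: int, error_rate: float) -> float:
    """Expected number of wrong genotype calls: error rate times SNP count."""
    return n_snps * error_rate


# Assumed values for illustration only.
n_snps = 1_000_000   # hypothetical number of SNPs on the chip
error_rate = 0.001   # hypothetical per-SNP error rate (0.1%)

miscalls = expected_miscalls(n_snps, error_rate)
print(f"Expected miscalls: about {miscalls:.0f}")
```

Even a per-call accuracy of 99.9% leaves on the order of a thousand wrong calls under these assumptions, and the output does not say which ones they are.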
Exome Sequencing – How it Works
• Prepare labeled sequence probes
• Extract, Shear, and clean-up DNA
• Mix probes with DNA
• Wash away un-bound DNA
• Digest probes
• Sequence!
Exome Sequencing – Raw Output
PHRED Quality Scores
Encoded Score (E) = chr(Q + 33)
Numerical Score (Q) = ord(E) - 33
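The two conversions above can be sketched directly in Python. The function names are illustrative. The error-probability relation P = 10^(-Q/10) is the standard PHRED definition and is not stated on the slide itself.

```python
from typing import List


def encode_phred(q: int) -> str:
    """Encoded Score (E) = chr(Q + 33)."""
    return chr(q + 33)


def decode_phred(e: str) -> int:
    """Numerical Score (Q) = ord(E) - 33."""
    return ord(e) - 33


def error_probability(q: int) -> float:
    """Standard PHRED relation: P(base call is wrong) = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_quality_string(quality: str) -> List[int]:
    """Decode a whole quality string, one score per base."""
    return [decode_phred(ch) for ch in quality]


scores = decode_quality_string("!5I")
print(scores)
print([error_probability(q) for q in scores])
```

For example, `I` decodes to Q = 40, which corresponds to roughly one expected error in 10,000 base calls.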
Exome Sequencing – Processed Output
Exome Sequencing Limitations
• Requires a priori knowledge of Genes of interest
• Requires individual probes be designed and manufactured for each exon/gene
• Hard to infer copy number
• Very limited ability to phase data
• Hard to make sense of novel data
• Contains very little regulatory data
• Complicated, unstandardized, computationally intensive analysis processes
SNP-Chip vs Exome
SNP-Chip
• Cheaper (~$100)
• Lower Accuracy
• Requires precise knowledge a priori
• At best gets 10-20% of known variants
• No phasing data
• No structural data
• Simple analysis tools
Exome
• Expensive (~$1000)
• Higher Accuracy
• Requires general knowledge a priori
• At best gets 80-90% of known variants
• Some phasing data
• Some structural data
• Complex analysis tools
Full Genome Sequencing
How many human genomes have been completely sequenced end-to-end?
0
Full Genome Sequencing - Challenges
• Sequence Repeats
• Secondary Structure
• Particularly
– Telomeric Region
– Centromeric Region
• Methylation
• Regulation
• Interpretation
ANALYSIS TOOLS
SNPedia
SNPedia - Promethease
openSNP
OMIM
23andMe API
Genomes Unzipped
Questions?