Lecture II: Genomic Methods
Dennis P. Wall, PhD; Frederick G. Barr, MD, PhD; Deborah G.B. Leonard, MD, PhD
TRiG Curriculum: Lecture 2, March 2012
Slide 2
Why Pathologists? We have access, we know testing
Personalized Risk Prediction, Medication Dosing, Diagnosis/Prognosis
Physician sends sample to Pathology (blood/tissue)
Pathologists: access to patient's genome
Just another laboratory test
Slide 3
The path to genomic medicine
Sample Collection
Testing: Sequencing, Gene chips
Analysis
Pathologists: access to patient's genome
Slide 4
What we will cover today:
Types of genetic alterations
Current and future molecular testing methods
  Cytogenetics, in situ hybridization, PCR
  Gene chips: Genotyping, Expression profiling, Copy number variation
  Next generation sequencing (NGS): Whole genome, Transcriptome
Slide 5
DNA alterations: the small stuff
Point mutation: CCTGAGGAG -> CCTGTGGAG. Example: hemoglobin, beta (sickle cell disease)
Repeat alteration: TTCCAG(CAG)5CAGCAA -> TTCCAG(CAG)60CAGCAA. Example: huntingtin (Huntington disease)
Deletion/Insertion: GAATTAAGAGAAGCAGAAGCA. Example: epidermal growth factor receptor (lung cancer)
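The point mutation and repeat expansion on this slide can be checked with a few lines of Python. This is a minimal sketch using the sequences shown above; the function names are hypothetical, and the repeat counter reports the longest uninterrupted run of the unit, which here also includes the flanking CAG copies written outside the parentheses.

```python
import re

def point_differences(ref, alt):
    """Return (index, ref_base, alt_base) for each mismatch between equal-length sequences."""
    if len(ref) != len(alt):
        raise ValueError("sequences must have the same length")
    return [(i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]

def expand_repeat(prefix, unit, count, suffix):
    """Write out a sequence given in the slide's shorthand, e.g. TTCCAG(CAG)5CAGCAA."""
    return prefix + unit * count + suffix

def longest_repeat_run(seq, unit):
    """Length, in repeat units, of the longest uninterrupted run of `unit` in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)

# Hemoglobin beta example: a single A -> T change at 0-based index 4.
print(point_differences("CCTGAGGAG", "CCTGTGGAG"))

# Huntingtin example: short versus expanded CAG tract.
short_allele = expand_repeat("TTCCAG", "CAG", 5, "CAGCAA")
long_allele = expand_repeat("TTCCAG", "CAG", 60, "CAGCAA")
print(longest_repeat_run(short_allele, "CAG"), longest_repeat_run(long_allele, "CAG"))
```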
Slide 6
DNA alterations: the bigger stuff
Translocation. Example: t(11;22)(q24;q12), Ewing sarcoma (figure: chromosomes 11 and 22, der(11) and der(22))
Amplification. Example: 17q21.1 (ERBB2), breast cancer
Deletion/Insertion. Example: 22q11.2 region, DiGeorge syndrome
Slide 7
Previous strategies to detect DNA alterations
Cytogenetics: large indels, amplification, translocations
In situ hybridization: large indels, amplification, translocations
Figures: EGFR amplification in glioblastoma; t(6;15) in woman with repeated abortions
Image sources: http://moon.ouhsc.edu, http://www.indianmedguru.com
Slide 8
Previous strategies to detect DNA alterations
PCR-based approaches: mutations, small indels, repeat alterations, large indels, amplification, translocations
Figure: Factor V Leiden mutation (Alsmadi OA, et al. BMC Genomics. 2003; 4:21)
Slide 9
What we will cover today:
Types of genetic alterations
Current and future molecular testing methods
  Cytogenetics, in situ hybridization, PCR
  Gene chips: Genotyping, Expression profiling, Copy number variation
  Next generation sequencing (NGS): Whole genome, Transcriptome
Slide 10
DNA microarray - the basics
Purpose: multiple simultaneous measurements by hybridization of labeled probe
DNA elements may be: oligonucleotides, cDNAs, large insert genomic clones
Microarray is generated by: printing, synthesis
Slide 12
Organization of a DNA microarray (adapted from Affymetrix)
Figure label: 1.28 cm
Slide 13
Hybridization of a labeled probe to the microarray (adapted from Affymetrix)
Slide 14
Detection of hybridization on microarray (adapted from Affymetrix)
Figure label: Light from laser
Slide 15
Hybridization intensities on DNA microarray following laser scanning
Slide 16
Overview of SNP array technology
LaFramboise T. Nucleic Acids Res. 2009; 37:4181
Slide 17
Microarray Applications
DNA analysis
  Polymorphism/mutation detection, e.g. disease susceptibility testing, drug efficacy/sensitivity testing
  Copy number detection (comparative genomic hybridization), e.g. constitutional or cancer karyotyping
  Bacterial DNA, e.g. identification and speciation
RNA analysis
  Expression profiling, e.g. breast cancer prognosis, cancer of unknown primary origin
Slide 18
Genome-wide association study of lung cancer: microarray with 317,139 SNPs
Cases/controls from different populations
Hung RJ, et al. Nature. 2008; 452:633
Slide 19
Genotype calling
Hybridization intensities translated into genotypes
Large SNP numbers require an automated procedure
Recent algorithms: clustering/pooling strategies
  Raw hybridization intensities normalized
  Information combined across different samples at each SNP
  Assign genotypes to entire clusters
  For each sample, estimate probability of each of three genotype calls at each SNP
  Genotype assigned based on defined threshold of probability
  Missing genotypes dependent on algorithm & threshold used
Teo YY. Curr Opin Lipidol. 2008; 19:133
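The last steps above (a probability for each of three genotypes, then a call only if a threshold is met) can be sketched as follows. This is an illustration only: the function names and the 0.95 threshold are assumptions, not values from the slide or from any particular calling algorithm.

```python
def call_genotype(probabilities, threshold=0.95):
    """Return the most probable genotype, or None (a missing call) if it falls below threshold.

    probabilities: dict mapping genotype label (e.g. "AA", "AB", "BB") to its
    estimated probability for one sample at one SNP.
    threshold: assumed cut-off; real pipelines choose their own value.
    """
    best = max(probabilities, key=probabilities.get)
    return best if probabilities[best] >= threshold else None

def call_rate(calls):
    """Fraction of non-missing genotype calls."""
    return sum(call is not None for call in calls) / len(calls)

samples = [
    {"AA": 0.98, "AB": 0.015, "BB": 0.005},  # falls clearly inside one cluster
    {"AA": 0.60, "AB": 0.39, "BB": 0.01},    # ambiguous, left as missing
]
calls = [call_genotype(p) for p in samples]
print(calls, call_rate(calls))
```

Raising the threshold lowers the error rate among called genotypes but increases the number of missing genotypes, which is the dependence on algorithm and threshold noted above.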
Slide 20
Genotyping - Limitations & quality control
Accuracy of algorithm
  Depends on number of samples in each cluster
  Prone to errors for small number of samples or SNPs with rare alleles
High rates of missing genotypes:
  Array problems: plating/synthesis issue
  Poor quality DNA: degradation
  Hybridization failure
  Differential performance between SNPs
Excess heterozygosity: sample contamination?
Just another laboratory test
Slide 21
Breast cancer subgroups
Analyzed 8,101 genes on chip microarrays
Reference = pooled cell lines
Perou CM, et al. Nature. 2000; 406:747
Slide 22
Original two probe strategy for expression profiling on cDNA arrays
Duggan DJ, et al. Nature Genetics. 1999; 21:10
Slide 23
Expression profiling: challenges and limitations
Biological
  Dynamic & complex nature of gene expression
  Heterogeneous nature of tissue samples
  Variation in RNA quality
Technological
  Reproducibility across microarray platforms
  Selection of probes: dependence on binding efficiency
  Controlling for technical variability
Statistical/bioinformatic
  Adequate experimental design
  Normalization to remove variability among chips
  Multiple testing correction
  Validation of results
Just another laboratory test
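Multiple testing correction is listed above without naming a method. As one possible illustration, the sketch below compares two common corrections, Bonferroni and Benjamini-Hochberg, on a handful of made-up p-values. The choice of methods, the p-values, and alpha = 0.05 are all assumptions for the example.

```python
def bonferroni(pvalues, alpha=0.05):
    """Flag p-values that pass the Bonferroni threshold alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def benjamini_hochberg(pvalues, alpha=0.05):
    """Flag p-values rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    # Find the largest rank k whose sorted p-value is at most (k / m) * alpha.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

# Hypothetical per-gene p-values from a differential expression test.
pvalues = [0.001, 0.012, 0.025, 0.041, 0.6]
print(bonferroni(pvalues))
print(benjamini_hochberg(pvalues))
```

With thousands of genes on a chip, an uncorrected 0.05 cut-off would be expected to flag many genes by chance alone, which is why some correction of this kind is needed.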
Slide 24
Copy number variation: Comparative genomic hybridization
CGH: metaphase chromosomes
Array-CGH: arrayed DNAs
Tumor DNA and reference DNA: hybridization
Readout: deletion, gain
http://www.advalytix.com/advalytix/hybridization_330.htm
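The comparison of tumor and reference signal is commonly summarized as a log2 ratio per probe. The sketch below shows that idea; the function names, the intensity values, and the +/-0.3 thresholds are assumptions for illustration, not values from the slide.

```python
import math

def log2_ratio(tumor_intensity, reference_intensity):
    """log2 of tumor signal over reference signal for one probe or clone."""
    return math.log2(tumor_intensity / reference_intensity)

def classify(ratio, gain_threshold=0.3, loss_threshold=-0.3):
    """Label a log2 ratio; the thresholds are assumed, not standard values."""
    if ratio >= gain_threshold:
        return "gain"
    if ratio <= loss_threshold:
        return "deletion"
    return "no change"

# Hypothetical (tumor, reference) intensity pairs.
for tumor, reference in [(1500, 1000), (1000, 1000), (500, 1000)]:
    ratio = log2_ratio(tumor, reference)
    print(tumor, reference, round(ratio, 2), classify(ratio))
```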
Slide 25
Constitutional genomic imbalances detected by copy number arrays
10.9 Mb deletion at 7q11
7.2 Mb duplication on 11q
Miller DT, et al. Am J Hum Genet. 2010; 86:749
Slide 26
Copy number - Limitations & quality control
Artifacts may be caused by:
GC content
  Wavy patterns correlate with GC content
  Algorithms developed to remove waviness
DNA sample quantity and quality
  Can impact level of signal noise and false positive rate
  Whole genome amplification associated with signal noise
Sample composition
  In cancer studies, normal cells dilute cancer aberrations
  Tumor heterogeneity will also affect copy number
Just another laboratory test
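The dilution effect of normal cells can be made concrete with a small calculation. This sketch assumes a simple two-population mixture (tumor cells at a given purity, normal cells with two copies) and a hypothetical function name; it ignores tumor heterogeneity and measurement noise.

```python
import math

def expected_log2_ratio(tumor_copies, purity, normal_copies=2):
    """Expected log2 copy ratio when tumor cells make up `purity` of the sample.

    Assumes normal cells carry `normal_copies` copies of the region.
    A homozygous deletion at 100% purity gives log2(0), which raises ValueError.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mixed_copies = purity * tumor_copies + (1 - purity) * normal_copies
    return math.log2(mixed_copies / normal_copies)

# A single-copy deletion looks progressively weaker as tumor purity drops.
for purity in (1.0, 0.5, 0.2):
    print(purity, round(expected_log2_ratio(1, purity), 3))
```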
Slide 27
What we will cover today:
Types of genetic alterations
Current and future genetic test methods
  Cytogenetics, in situ hybridization, PCR
  Gene chips: Genotyping, Expression profiling, Copy number variation
  Next generation sequencing (NGS): Whole genome, Transcriptome
Slide 28
Cancer Treatment: NGS in AML
Welch JS, et al. JAMA. 2011; 305:1577
Slide 29
Case History
39 year old female with APML by morphology
Cytogenetics and RT-PCR unable to detect PML-RARA fusion
Clinical question: treat with ATRA versus allogeneic stem cell transplant
Slide 30
Methods/Results
Paired-end NGS sequencing
Result: cytogenetically cryptic event, novel fusion protein
Took 7 weeks
Slide 31
A 77-kilobase segment from Chr. 15 was inserted en bloc into the second intron of the gene RARA on Chr. 17.
Slide 32
Workflow
Raw Data Analysis: image processing and base calling
Whole Genome Mapping: alignment to reference genome
Variant Calling: detection of genetic variation (SNPs, indels, SV)
Annotation: linking variants to biological information
Slide 33
Overview of Paired End Sequencing
Figure labels: short insert DNA, random shearing, adapters ligated, annealed to surface, sequenced, synthesized
Sequencing is done with labeled NTPs and is massively parallel
Slide 34
Short read output format
Read ID
Sequence
Quality line
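The three elements named above (read ID, sequence, quality line) match the widely used FASTQ layout. The sketch below parses such records, assuming four lines per record and Phred+33 quality encoding; the slide does not state the format or encoding, and the `Read` type and `parse_fastq` name are hypothetical.

```python
from typing import Iterable, List, NamedTuple

class Read(NamedTuple):
    read_id: str
    sequence: str
    qualities: List[int]

def parse_fastq(lines: Iterable[str], offset: int = 33) -> List[Read]:
    """Parse four-line FASTQ records: @id, sequence, '+' separator, quality string."""
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("FASTQ records must have four lines each")
    reads = []
    for i in range(0, len(rows), 4):
        header, sequence, separator, quality = rows[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality line differ in length")
        # Each quality character encodes one score: ASCII code minus the offset.
        reads.append(Read(header[1:], sequence, [ord(c) - offset for c in quality]))
    return reads

example = ["@read1", "GATTACA", "+", "IIIIII5"]
print(parse_fastq(example))
```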
Slide 35
Quality control is critical
Just another laboratory test
Slide 36
Measuring Accuracy
Phred is a program that assigns a quality score to each base in a sequence. These scores can then be used to trim bad data from the ends, and to determine how good an overlap actually is.
Phred scores are logarithmically related to the probability of an error: a score of 10 means a 10% error probability, 20 means a 1% chance, 30 means a 0.1% chance, etc. A score of 20 is generally considered the minimum acceptable score.
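The logarithmic relationship described above is Q = -10 * log10(P), where P is the error probability. The sketch below converts in both directions and trims low-quality bases from the ends of a read. The function names are hypothetical, and the trimming rule (drop end bases below the cut-off) is a simplified assumption rather than the method of any specific tool.

```python
import math

def error_probability(quality):
    """Error probability for a Phred score: P = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)

def phred_score(probability):
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(probability)

def trim_low_quality_ends(sequence, qualities, min_quality=20):
    """Remove bases below min_quality from both ends; interior bases are kept."""
    start, end = 0, len(qualities)
    while start < end and qualities[start] < min_quality:
        start += 1
    while end > start and qualities[end - 1] < min_quality:
        end -= 1
    return sequence[start:end], qualities[start:end]

for q in (10, 20, 30):
    print(q, error_probability(q))

print(trim_low_quality_ends("ACGTAC", [5, 30, 35, 40, 12, 8]))
```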
Slide 37
Workflow
Raw Data Analysis: image processing and base calling
Whole Genome Mapping: alignment to reference genome
Variant Calling: detection of genetic variation (SNPs, indels, SV)
Annotation: linking variants to biological information
Slide 38
Alignment/Mapping
Reference: CCATAGGCTATATGCGCCCTATCGGCAATTTGCGGTATAC
Figure: short reads (e.g. GCGCCCTA, CTATCGGAAA, TTTGCGGT, CGGTATAC) tiled along the reference
Read depth is critical for accurate reconstruction
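To illustrate read depth, the sketch below places a few reads from the slide on the reference by brute-force search for the position with the fewest mismatches, then counts how many reads cover each base. This is a toy approach under stated assumptions (ungapped placement, ties going to the leftmost position); real aligners use indexed methods and the function names are hypothetical.

```python
REFERENCE = "CCATAGGCTATATGCGCCCTATCGGCAATTTGCGGTATAC"
READS = ["GCGCCCTA", "CTATCGGAAA", "TTTGCGGT", "CGGTATAC"]

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def best_position(reference, read):
    """Return (start, mismatches) for the ungapped placement with fewest mismatches."""
    starts = range(len(reference) - len(read) + 1)
    start = min(starts, key=lambda s: hamming(reference[s:s + len(read)], read))
    return start, hamming(reference[start:start + len(read)], read)

def read_depth(reference, reads):
    """Per-base count of reads covering each reference position."""
    depth = [0] * len(reference)
    for read in reads:
        start, _ = best_position(reference, read)
        for position in range(start, start + len(read)):
            depth[position] += 1
    return depth

for read in READS:
    print(read, best_position(REFERENCE, read))
print(read_depth(REFERENCE, READS))
```

A base covered by a single read cannot distinguish a sequencing error from a true variant, which is why depth matters for accurate reconstruction.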
Slide 39
Alignment approaches
Aligner: description

Illumina platform
  ELAND: vendor-provided aligner for Illumina data
  Bowtie: ultrafast, memory-efficient short-read aligner for Illumina data
  Novoalign: a sensitive aligner for Illumina data that uses the Needleman-Wunsch algorithm
  SOAP: short oligo analysis package for alignment of Illumina data
  MrFAST: a mapper that allows alignments to multiple locations for CNV detection
SOLiD platform
  Corona-lite: vendor-provided aligner for SOLiD data
  SHRiMP: efficient Smith-Waterman mapper with colorspace correction
454 platform
  Newbler: vendor-provided aligner and assembler for 454 data
  SSAHA2: SAM-friendly sequence search and alignment by hashing program
  BWA-SW: SAM-friendly Smith-Waterman implementation of BWA for long reads
Multi-platform
  BFAST: BLAT-like fast aligner for Illumina and SOLiD data
  BWA: Burrows-Wheeler aligner for Illumina, SOLiD, and 454 data
  Maq: a widely used mapping tool for Illumina and SOLiD; now deprecated by BWA
Koboldt DC, et al. Brief Bioinform. 2010; 11(5):484-98
Slide 40
Short read alignment
Given a reference and a set of reads, report at least one good local alignment for each read if one exists
Approximate answer to question: where in genome did read originate?
What is good? For now, we concentrate on:
  Fewer mismatches = better
    TGATCATA / GATCAA is better than TGATCATA / GAGAAT
  Failing to align a low-quality base is better than failing to align a high-quality base
    TGATATTA / GATcaT is better than TGATCATA / GTACAT
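One way to express both criteria in a single number is to penalize each mismatch by the quality of the read base involved, so that a mismatch at a low-quality base costs less than one at a high-quality base. The sketch below is an assumed scoring scheme for illustration, not the scoring of any particular aligner; the sequences and quality values are made up.

```python
def mismatch_penalty(reference_segment, read, qualities):
    """Sum of base qualities at mismatching positions (lower is better)."""
    if not (len(reference_segment) == len(read) == len(qualities)):
        raise ValueError("inputs must have the same length")
    return sum(q for r, b, q in zip(reference_segment, read, qualities) if r != b)

reference_segment = "GATCAT"

# Same number of mismatches (one), but at bases of different quality.
low_quality_mismatch = mismatch_penalty(reference_segment, "GATCAA", [30, 30, 30, 30, 30, 5])
high_quality_mismatch = mismatch_penalty(reference_segment, "GTTCAT", [30, 30, 30, 30, 30, 5])
print(low_quality_mismatch, high_quality_mismatch)

# More mismatches cost more when qualities are equal.
print(mismatch_penalty(reference_segment, "GAGAAT", [30] * 6))
```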
Slide 41
Post alignment: what do you get?
Alignment of reads including read pairs
SAM file: read pair, CIGAR field
Simplified pileup output
Li H, et al. Bioinformatics. 2009; 25:2078
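The CIGAR field of a SAM record encodes how a read lines up with the reference as a series of length/operation pairs (for example M for aligned bases, I for insertion, D for deletion). The sketch below splits a CIGAR string and computes how many reference bases the alignment spans; the function names and the example string are hypothetical.

```python
import re

CIGAR_OPERATION = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a CIGAR string such as '8M2I4M1D3M' into (length, operation) pairs."""
    operations = [(int(n), op) for n, op in CIGAR_OPERATION.findall(cigar)]
    # Rebuild the string to make sure nothing was skipped by the pattern.
    if "".join(f"{n}{op}" for n, op in operations) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return operations

def reference_span(cigar):
    """Number of reference bases covered: operations M, D, N, = and X consume reference."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")

print(parse_cigar("8M2I4M1D3M"))
print(reference_span("8M2I4M1D3M"))
```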
Slide 42
Workflow
Raw Data Analysis: image processing and base calling
Whole Genome Mapping: alignment to reference genome
Variant Calling: detection of genetic variation (SNPs, indels, insertions)
Annotation: linking variants to biological information
Slide 46
Workflow
Raw Data Analysis: image processing and base calling
Whole Genome Mapping: alignment to reference genome
Variant Calling: detection of genetic variation (SNPs, indels, insertions)
Annotation: linking variants to biological information
Slide 47
Where to go to annotate genomic data, determine clinical relevance?
Online Mendelian Inheritance in Man (http://www.ncbi.nlm.nih.gov/omim)
International HapMap project (http://hapmap.ncbi.nlm.nih.gov)
Human genome mutation database (http://www.hgvs.org/dblist/glsdb.html)
PharmGKB (http://www.pharmgkb.org)
Scientific literature
Slide 48
Case-control study design = variable results
Need for Clinical Grade Database
  Ease of use
  Continually updated
  Clinically relevant SNPs/variations
Ng PC, et al. Nature. 2009; 461:724
Slide 49
Cancer Treatment: NGS of Tumor
Jones SJM, et al. Genome Biol. 2010; 11:R82
Slide 50
Case History
78 year old male
Poorly differentiated papillary adenocarcinoma of tongue
Metastatic to lymph nodes
Failed chemotherapy
Decision to use next-generation sequencing methods
Slide 51
Workflow
Raw Data Analysis: image processing and base calling
Whole Genome Mapping: alignment to reference genome
Variant Calling: detection of genetic variation (SNPs, indels, SV)
Annotation: linking variants to biological information
Slide 52
Methods and Results
Analysis: whole genome, transcriptome
Findings: upregulation of RET oncogene, downregulation of PTEN
Slide 53
Transcriptome and Whole-exome
Transcriptome
  Convert RNA to cDNA
  Perform sequencing
  Only expressed genes
  Can get expression levels
Whole-exome
  Use selection procedure to enrich exons
  No intron data
  Results depend on selection procedure
Martin JA, Wang Z. Nat Rev Genet. 2011; 12:671
Slide 54
A few words about samples
Can use formalin-fixed paraffin-embedded tissue for whole-exome or transcriptome sequencing
Need frozen tissue for whole-genome sequencing
  Better quality DNA
Small quantity of DNA needed
  For whole-exome sequencing, the amount from a few slides
Slide 55
Summary
Gene chips: SNPs, expression profiling, copy number variation
Major steps in NGS: base calling, alignment, variant calling, annotation
Technology will change, but it is just another test
  Accuracy
  Precision
  Need to validate findings with traditional methods
Roychowdhury S, et al. Sci Transl Med. 2011; 3:111ra121