Statistical Methods for Next Generation
Sequencinghttp://www.biostat.jhsph.edu/~khansen/enar2012.html
Zhijin WuBrown University
Kasper Hansen, Rafael A IrizarryJohns Hopkins University
\
Outline
• Introduction to NGS
• SNP calling and genotyping
• RNA-sequencing
• Hands-on exercise
2
Introduction to Next Generation
SequencingRafael A. Irizarry
http://rafalab.org
Many slides courtesy of:Héctor Corrada Bravo and Ben Langmead
D. melanogaster, Science, 2000 H. sapiens, Nature, 2000 M. musculus, Nature, 2002and Science, 2000
Back then: millions of clones (thousand bps) in 9 months for billions of dollars
Today: billion of short reads (35-100 bps) in a week for thousands of dollars
Remember this?
4
Claim: Assemble a genome in weeks for less than $100,000
Start with DNA (millions of copies)
5
Break it
6
Put in sequencer
7
Sequence first 35-400 bps: call them “reads”
GTTGAGGCTTGCGTTTTTGGTACGCTGGACTTTGTGTACTCGTCGCTGCGTTGAGGCTTGCGTTTTTGGTATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTTGCGTTTATGGTACGCTGGACTTTGTAGGATACTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGGGCGTTGAGGCTTGCGTTTATGGTACGCTGGATTTTCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT GTTTATGGTACGCTGGACTTTGTAGGATACCCTCGTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTA TGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAC TATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTTTGCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGTTGAGGCTTGCGTTTATGGTACGCTGGGCTTTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC
8
Platforms
Illumina/Solexa
• Eight lanes
• ~160M short reads (~50-70 bp) per lane
TECHNOLOGY SPOTLIGHT: ILLUMINA® SEQUENCING
INTRODUCTION
CLUSTER GENERATION
SEQUENCING-BY-SYNTHESIS
ANALYSIS PIPELINE
DATA COLLECTION, PROCESSING, AND ANALYSIS
Illumina Sequencing TechnologyThe Genome Analyzer generates several billion bases of high-quality sequence per run at less than 1% of the cost of capillary-based methods. An expansive scale of research unimaginable with other technology platforms is now possible.
FIGURE 1: ILLUMINA GENOME ANALYZER FLOW CELL
Several samples can be loaded onto the eight-lane flow cell for simul-taneous analysis on the Illumina Genome Analyzer.
10
Source: Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010
Source: Whiteford et al. Swift: primary data analysis for the Illumina Solexa sequencing platform. Bioinformatics. 2009
Source: Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010
namesequencequality scores
x 100s of millions
Not just Assembly
• Resequencing
• SNP discovery and genotyping
• Variant discovery and quantification
• TF binding sites: ChIP-Seq
• Gene expression: RNA-Seq
• Measuring methylation
\
Not just Assembly
13
\
1000 Genomes Project
Genotyping
14
\
Human Epigenome Project
15
Methylation
What to do with all these sequences?
GTTGAGGCTTGCGTTTTTGGTACGCTGGACTTTGTGTACTCGTCGCTGCGTTGAGGCTTGCGTTTTTGGTATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTTGCGTTTATGGTACGCTGGACTTTGTAGGATACTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGGGCGTTGAGGCTTGCGTTTATGGTACGCTGGATTTTCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT GTTTATGGTACGCTGGACTTTGTAGGATACCCTCGTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTA TGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAC TATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTTTGCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGTTGAGGCTTGCGTTTATGGTACGCTGGGCTTTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC
16
Most apps: Start by matching to reference
GTTGAGGCTTGCGTTTTTGGTACGCTGGACTTTGT GTACTCGTCGCTGCGTTGAGGCTTGCGTTTTTGGT ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC CTTGCGTTTATGGTACGCTGGACTTTGTAGGATAC TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC GCGTTTATGGTACGCTGGACTTTGTAGGATACCCT GAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGG GCGTTGAGGCTTGCGTTTATGGTACGCTGGATTTT CGTTTATGGTACGCTGGACTTTGTAGGATACCCTC ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT GTTTATGGTACGCTGGACTTTGTAGGATACCCTCG TCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTA TGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTA GCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAC TATGGTACGCTGGACTTTGTAGGATACCCTCGCTT TCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTTTG CGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCT GTTGAGGCTTGCGTTTATGGTACGCTGGGCTTTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTC
17
Variant detection
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATT
GTCGCAGTANCTGTCT||||||||| ||||||GTCGCAGTATCTGTCT
GGATCTGCGATATACC|||||| |||||||||GGATCT-CGATATACC
AATCTGATCTTATTTT||||||||||||||||AATCTGATCTTATTTT
ATATATATATATATAT||||||||||||||||ATATATATATATATAT
TCTCTCCCANNAGAGC||||||||| |||||TCTCTCCCAGGAGAGC
Align Aggregate
Reference
Call: HET A, Gp-value: 0.0023
GTCGCAGTATCTGTCT GTCGCAGTATCTGTNN TGTCGCAGTATCTGTC TATGTCGCAGTATCTG TATATCGCAGTATCTT TATATCGCAGTATCTG NATATCGCAGTATNTG CCCTATATCGCAGTAT ACACCCTATGTCGCA ACACCCTATCTCGCA ACACCCTATGTCGCA GA-CACCCTATGTCGC CCGGA-CACCCTATAT CCGGA-CACCCTATATGCCGGA-CACCCTATG
Statistics
“Coverage”
“Pileup” or “Coverage plot”
“Depth of coverage” = 14
RNA-seq differential expression
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATT
GTCGCAGTATCTGTCT GTCGCAGTATCTGTCT GTCGCAGTATCTGTCT GTCGCAGTATCTGTCT GTCGCAGTATCTGTCT TGTCGCAGTATCTGTC TATGTCGCAGTATCTG TATATCGCAGTATCTG TATATCGCAGTATCTG TATATCGCAGTATCTG CCCTATATCGCAGTAT AGCACCCTATGTCGCA AGCACCCTATATCGCA AGCACCCTATGTCGCA GAGCACCCTATGTCGC CCGGAGCACCCTATAT CCGGAGCACCCTATATGCCGGAGCACCCTATG
GTCGCAGTANCTGTCT||||||||| ||||||GTCGCAGTATCTGTCT
GGATCTGCGATATACC|||||| |||||||||GGATCT-CGATATACC
AATCTGATCTTATTTT||||||||||||||||AATCTGATCTTATTTT
ATATATATATATATAT||||||||||||||||ATATATATATATATAT
TCTCTCCCANNAGAGC||||||||| |||||TCTCTCCCAGGAGAGC
Align Aggregate
Statistics
Gene 1differentially expressed?: YES
p-value: 0.0012
TGTCGCAGTATCTGTC AGCACCCTATGTCGCAGCCGGAGCACCCTATGGTCGCAGTANCTGTCT
||||||||| ||||||GTCGCAGTATCTGTCT
GGATCTGCGATATACC|||||| |||||||||GGATCT-CGATATACC
AATCTGATCTTATTTT||||||||||||||||AATCTGATCTTATTTT
ATATATATATATATAT||||||||||||||||ATATATATATATATAT
TCTCTCCCANNAGAGC||||||||| |||||TCTCTCCCAGGAGAGC
Align Aggregate
Gene 1
Sample A
Sample B
GATTCCTGCCTCATCC
ChIP-seq
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATT
GTCGCAGTATCTGTCT GTCGCAGTATCTGTCT TGTCGCAGTATCTGTC TATGTCGCAGTATCTG TATATCGCAGTATCTG TATATCGCAGTATCTG TATATCGCAGTATCTG CCCTATATCGCAGTAT CCCTATATCGCAGTAT CCCTATATCGCAGTAT CCCTATATCGCAGTAT CCCTATATCGCAGTAT CCCTATATCGCAGTAT CCCTATATCGCAGTAT AGCACCCTATGTCGCA AGCACCCTATATCGCA AGCACCCTATGTCGCA GAGCACCCTATGTCGC CCGGAGCACCCTATAT CCGGAGCACCCTATATGCCGGAGCACCCTATG
GTCGCAGTANCTGTCT||||||||| ||||||GTCGCAGTATCTGTCT
GGATCTGCGATATACC|||||| |||||||||GGATCT-CGATATACC
AATCTGATCTTATTTT||||||||||||||||AATCTGATCTTATTTT
ATATATATATATATAT||||||||||||||||ATATATATATATATAT
TCTCTCCCANNAGAGC||||||||| |||||TCTCTCCCAGGAGAGC
Align
Reference
Binding occurs herep-value: 0.0023
Aggregate
Statistics
TATGCACGCGATAGCAGATAGCATTGCGAGAC
Matching Revisted
GTTGAGGCTTGCGTTTTTGGTACGCTGGACTTTGT GTACTCGTCGCTGCGTTGAGGCTTGCGTTTTTGGT ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC CTTGCGTTTATGGTACGCTGGACTTTGTAGGATAC TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC GCGTTTATGGTACGCTGGACTTTGTAGGATACCCT GAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGG GCGTTGAGGCTTGCGTTTATGGTACGCTGGATTTT CGTTTATGGTACGCTGGACTTTGTAGGATACCCTC ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT GTTTATGGTACGCTGGACTTTGTAGGATACCCTCG TCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTA TGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTA GCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAC TATGGTACGCTGGACTTTGTAGGATACCCTCGCTT TCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTTTG CGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCT GTTGAGGCTTGCGTTTATGGTACGCTGGGCTTTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTC
21
Matching 10,000,000 32 bps reads
• BLAST takes more than 6 months
• BLAT takes 2 months
• MAQ takes 1 day and half
• Bowtie takes 17 minutes
22
Matching
GTTGAGGCTTGCGTTTTTGGTACGCTGGACTTTGT GTACTCGTCGCTGCGTTGAGGCTTGCGTTTTTGGT ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC CTTGCGTTTATGGTACGCTGGACTTTGTAGGATAC TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC GCGTTTATGGTACGCTGGACTTTGTAGGATACCCT GAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGG GCGTTGAGGCTTGCGTTTATGGTACGCTGGATTTT CGTTTATGGTACGCTGGACTTTGTAGGATACCCTC ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT GTTTATGGTACGCTGGACTTTGTAGGATACCCTCG TCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTA TGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTA GCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAC TATGGTACGCTGGACTTTGTAGGATACCCTCGCTT TCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTTTG CGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCT GTTGAGGCTTGCGTTTATGGTACGCTGGGCTTTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTC
23
Mapping
CTCAAACTCCTGACCTTTGGTGATCCACCCGCCTNGGCCTTC
Take a read:
And a reference sequence:>MT dna:chromosome chromosome:GRCh37:MT:1:16569:1GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACTTACTAAAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAATGTCTGCACAGCCACTTTCCACACAGACATCATAACAAAAAATTTCCACCAAACCCCCCCTCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGCCAAACCCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAATTTTATCTTTTGGCGGTATGCACTTTTAACAGTCACCCCCCAACTAACACATTATTTTCCCCTCCCACTCCCATACTACTAATCTCATCAATACAACCCCCGCCCATCCTACCCAGCACACACACACCGCTGCTAACCCCATACCCCGAACCAACCAAACCCCAAAGACACCCCCCACAGTTTATGTAGCTTACCTCCTCAAAGCAATACACTGACCCGCTCAAACTCCTGGATTTTGGATCCACCCAGCGCCTTGGCCTAAACTAGCCTTTCTATTAGCTCTTAGTAAGATTACACATGCAAGCATCCCCGTTCCAGTGAGTTCACCCTCTAAATCACCACGATCAAAAGGAACAAGCATCAAGCACGCAGCAATGCAGCTCAAAACGCTTAGCCTAGCCACACCCCCACGGGAAACAGCAGTGATTAACCTTTAGCAATAAACGAAAGTTTAACTAAGCTATACTAACCCCAGGGTTGGTCAATTTCGTGCCAGCCACCGCGGTCACACGATTAACCCAAGTCAATAGAAGCCGGCGTAAAGAGTGTTTTAGATCACCCCCTCCCCAATAAAGCTAAAACTCACCTGAGTTGTAAAAAACTCCAGTTGACACAAAATAGACTACGAAAGTGGCTTTAACATATCTGAACACACAATAGCTAAGACCCAAACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTCAACAGTTAAATCAACAAAACTGCTCGCCAGAACACTACGAGCCACAGCTTAAAACTCAAAGGACCTGGCGGTGCTTCATATCCCTCTAGAGGAGCCTGTTCTGTAATCGATAAACCCCGATCAACCTCACCACCTCTTGCTCAGCCTATATACCGCCATCTTCAGCAAACCCTGATGAAGGCTACAAAGTAAGCGCAAGTACCCACGTAAAGACGTTAGGTCAAGGTGTAGCCCATGAGGTGGCAAGAAATGGGCTACATTTTCTACCCCAGAAAACTACGATAGCCCTTATGAAACTTAAGGGTCGAAGGTGGATTTAGCAGTAAACTAAGAGTAGAGTGCTTAGTTGAACAGGGCCCTGAAGCGCGTACACACCGCCCGTCACCCTCCTCAAGTATACTTCAAAGGACATTTAACTAAAACCCCTACGCATTTATATAGAGGAGACAAGTCGTAACCTCAAACTCCTGCCTTTGGTGATCCACCCGCCTTGGCCTACCTGCATAATGAAGAAGCACCCAACTTACACTTAGGAGATTTCAACTTAACTTGACCGCTCTGAGCTAAACCTAGCCCCAAACCCACTCCACCTTACTACCAGACAACCTTAGCCAAACCATTTACCCAAATAAAGTATAGGCGATAGAAATTGAAACCTGGCGCAATAGATATAGTACCGCAAGGGAAAGATGAAAAATTATAACCAAGCATAATATAGCAAGGACTAACCCCTATACCTTCTGCATAATGAATTAACTAGAAATAACTTTGCAAGGAGAGCCAAAGCTAAGACCCCCGAAACCAGACGAGCTACCTAAGAACAGCTAAAAGAGCACACCCGTCTATGTAGCAAAATAGTGGGAAGATTTATAGGTAGAGGCGACAAACCTACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCAACTTTAAATTTGCCCACAGAACCCTCTAAATCCCCTTGTAAATTTAACTGTTAGTCCAAAGAGGAACAGCTCTTTGGACACTAGGAAAAAACCTTGTAGAGAGAGTAAAAAATTTAACACCCATAGTAGGCCTAAAAGCAGCCACCAATTAAGAAAGCGTTCAAGCTCAACACCCACTACCTAAAAAATCCCAAACATATAACTGAACTCCTCACACCCAATTGGACCAATCTATCACCCTATAGAAGAACTAATGTTAGTATAAGTAACATGAAAACATTCTCCTCCGCATAAGCCTGCGTCAGATTAAAACACTGAACTGACAATTAACAGCCCAATATCTACAATCAACCAACAAGTCATTATTACCCTCACTGTCAACCCAACACAGGCATGCTCATAAGGAAAGGTTAAAAAAAGTAAAAGGAACTCGGCAAATCTTACCCCGCCTGTTTACCAAAAACATCACCTCTAGCATCACCAGTATTAGAGGCACCGCCTGCCCAGTGACACATGTTTAACGGCCGCGGTACCCTAACCGTGCAAAGGTAGCATAATCACTTGTTCCTTAAATAGGGACCTGTATGAATGGCTCCACGAGGGTTCAGCTGTCTCTTACTTTTAACCAGTGAAATTGACCTGCCCGTGAAGAGGCGGGCATAACACAGCAAGACGAGAAGACCCTATGGAGCTTTAATTTATTAATGCAAACAGTACCTAACAAACCCACAGGTCCTAAACTACCAAACCTGCATTAAAAATTTCGGTTGGGGCGACCTCGGAGCAGAACCCAACCTCCGAGCAGTACATGCTAAGACTTCACCAGTCAAAGCGAACTACTATACTCAATTGATCCAATAACTTGACCAACGGAACAAGTTACCCTAGGGATAACAGCGCAATCCTATTCTAGAGTCCATATCAACAATAGGGTTTACGACCTCGATGTTGGATCAGGACATCCCGATGGTGCAGCCGCTATTAAAGGTTCGTTTGTTCAACGATTAAAGTCCTACGTGATCTGAGTTCAGACCGGAGTAATCCAGGTCGGTTTCTATCTACNTTCAAATTCCTCCCTGTACGAAAGGACAAGAGAAATAAGGCCTACTTCACAAAGCGCCTTCCCCCGTAAATGATATCATCTCAACTTAGTATTATACCCACACCCACCCAAGAACAGGGTTTGTTAAGATGGC
How do we determine the read’s point of origin with respect to the reference?
CTCAAAGACCTGACCTTTGGTGATCCACCC-----GCCTNGGCCTTC|||||| |||| |||| ||||||||| |||| |||||CTCAAACTCCTGGATTTTG--GATCCACCCAGCTGGCCTTGGCCTAA
Hypothesis 1:
Hypothesis 2:
CTCAAACTCCTGACCTTTGGTGATCCACCCGCCTNGGCCTTC|||||||||||| ||||||||||||||||||||| ||||| |CTCAAACTCCTG-CCTTTGGTGATCCACCCGCCTTGGCCTAC
Answer: sequence similarity
Read
Reference
Read
Reference
Say hypothesis 2 is correct. Why are there still mismatches and gaps?
Which hypothesis is better?
More on variants and base-calling
SNPs
GTTGAGGCTTGCGTTTTTGGTACGCTGGACTTTGT GTACTCGTCGCTGCGTTGAGGCTTGCGTTTTTGGT ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC CTTGCGTTTATGGTACGCTGGACTTTGTAGGATAC TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC GCGTTTATGGTACGCTGGACTTTGTAGGATACCCT GAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGG GCGTTGAGGCTTGCGTTTATGGTACGCTGGATTTT CGTTTATGGTACGCTGGACTTTGTAGGATACCCTC ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT GTTTATGGTACGCTGGACTTTGTAGGATACCCTCG TCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTA TGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTA GCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAC TATGGTACGCTGGACTTTGTAGGATACCCTCGCTT TCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTTTG CGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCT GTTGAGGCTTGCGTTTATGGTACGCTGGGCTTTTT TTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTC
SNPs
TCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTA TCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTTTG GTACTCGTCGCTGCGTTGAGGCTTGCGTTTTTGGT TGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTA GCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTAC CGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCT GCGTTGAGGCTTGCGTTTATGGTACGCTGGATTTT GTTGAGGCTTGCGTTTTTGGTACGCTGGACTTTGT GTTGAGGCTTGCGTTTATGGTACGCTGGGCTTTTT GAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGG CTTGCGTTTATGGTACGCTGGACTTTGTAGGATAC TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC TTGCGTTTATGGTACGCTGGACTTTGTAGGATACC GCGTTTATGGTACGCTGGACTTTGTAGGATACCCT CGTTTATGGTACGCTGGACTTTGTAGGATACCCTC GTTTATGGTACGCTGGACTTTGTAGGATACCCTCG TATGGTACGCTGGACTTTGTAGGATACCCTCGCTT ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT ATGGTACGCTGGACTTTGTAGGATACCCTCGCTTT CTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTC
All Reads
Sequencing cycle
Nuc
leot
ide
com
posi
tion
0.0
0.2
0.4
0.6
0.8
1.0
0 10 20 30
A
T
1000 Genomes Data
050
0010
000
1500
0 Sample NA19238, UCSC Loci, n=42691
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
050
010
0015
0020
00
Sample NA19238, Non−UCSC Loci, n=4986
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
010
0020
0030
0040
0050
0060
0070
00
Sample NA19238, Hapmap Loci, n=19067
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
020
0040
0060
0080
0010
000 Sample NA19238, Non−Hapmap Loci, n=28610
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
020
0040
0060
0080
0010
000
1200
0
Sample NA19238, 1kgenomes Loci, n=36333
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
010
0020
0030
0040
00
Sample NA19238, Non−1kgenomes Loci, n=11344
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
050
0010
000
1500
0
Sample NA19238, UCSC Loci, n=48864
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
010
0020
0030
0040
0050
0060
0070
00
Sample NA19238, Non−UCSC Loci, n=15005
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
020
0040
0060
00
Sample NA19238, Hapmap Loci, n=20069
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
050
0010
000
1500
0
Sample NA19238, Non−Hapmap Loci, n=43800
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
020
0040
0060
0080
0010
000
1200
0
Sample NA19238, 1kgenomes Loci, n=38306
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
020
0040
0060
0080
0010
000
Sample NA19238, Non−1kgenomes Loci, n=25563
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
All data
Filtered: snpq>=20,
nreads<=360
SNPs in dbSNP Novel SNPs
Cycle
Here we aggregate reads and
record cycle at which variant appears
1000 Genomes Data
050
0010
000
1500
0 Sample NA19238, UCSC Loci, n=42691
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
050
010
0015
0020
00
Sample NA19238, Non−UCSC Loci, n=4986
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
010
0020
0030
0040
0050
0060
0070
00
Sample NA19238, Hapmap Loci, n=19067
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
020
0040
0060
0080
0010
000 Sample NA19238, Non−Hapmap Loci, n=28610
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
020
0040
0060
0080
0010
000
1200
0
Sample NA19238, 1kgenomes Loci, n=36333
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
010
0020
0030
0040
00
Sample NA19238, Non−1kgenomes Loci, n=11344
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
050
0010
000
1500
0
Sample NA19238, UCSC Loci, n=48864
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
010
0020
0030
0040
0050
0060
0070
00
Sample NA19238, Non−UCSC Loci, n=15005
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
020
0040
0060
00
Sample NA19238, Hapmap Loci, n=20069
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
050
0010
000
1500
0
Sample NA19238, Non−Hapmap Loci, n=43800
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
020
0040
0060
0080
0010
000
1200
0
Sample NA19238, 1kgenomes Loci, n=38306
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
020
0040
0060
0080
0010
000
Sample NA19238, Non−1kgenomes Loci, n=25563
Sequencing Cycle
Num
ber o
f Cal
ls
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
All data
Filtered: snpq>=20,
nreads<=360
SNPs in dbSNP Novel SNPs
Cycle
What is causing this?
Source: Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010
Source: Whiteford et al. Swift: primary data analysis for the Illumina Solexa sequencingplatform. Bioinformatics. 2009
Source: Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010
namesequencequality scores
x 100s of millions
(slide courtesy of Ben Langmead)
Before Reads There were Intensities
We Want to See This
Color coded by call made: A, C, G, T
But See This
Color coded by call made: A, C, G, T
Four channel fluorescence intensity, cycle 1
A
C
G
T
Gets Worse for higher cycles
Color coded by call made: A, C, G, T
Error Rate and Reported Quality
0 20 40 60
0.0
0.1
0.2
0.3
0.4
Sequencing Cycle
Estim
ated
Erro
r Pro
babi
litie
sA>CT>CA>GT>AC>TC>AG>AT>GG>CG>TA>TC>G
0 20 40 60
0.00
00.
005
0.01
00.
015
Sequencing Cycle
Mis
mat
ch P
ropo
rtion
s
A>CT>CA>GT>AC>TC>AG>AT>GG>CG>TA>TC>G
Remember This?
Sequencing cycle
Nuc
leot
ide
com
posi
tion
0.0
0.2
0.4
0.6
0.8
1.0
0 10 20 30
A
T
Bias Explained
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●●
●
●
●
●
●
●
● ●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●●
●
●
●
●
●
● ●●
●
●
●●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●●
●
●
●
●
●
●●
●●
● ●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
● ●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
● ●
●
●●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●
● ●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
● ●
●
●
●●●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●●
●
●
●
● ●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●●●
●
●●
●
●
●
● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
● ●
● ●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
● ●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●
● ●
●
●
●
●
●●
●
●●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
● ●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●●
●
●
●
●
● ●
● ●
●●
●
●
●
●
●●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
● ●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
● ●
●●
● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
● ●●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
● ●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
● ●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
● ●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0 2000 4000 6000 8000
05000
10000
0.5(A+T)
A−T
●
●
cycle << 20cycle ≥≥ 20
Base Calling
1) Rougemont et al. Probabilistic base calling of Solexa sequencingdata. BMC Bioinformatics (2008)
2) Erlich et al. Alta-Cyclic: a self-optimizing base caller fornext-generation sequencing. Nat Methods (2008) 3) Kao et al. BayesCall: A model-based base-calling algorithm forhigh-throughput short-read sequencing. Genome Res (2009)
4) Corrada Bravo and Irizarry. Model-Based Quality Assessment and Base-Callingfor Second-Generation Sequencing Data. Biometrics (2009)
5) Cokus et al. Shotgun bisulphite sequencing of the Arabidopsisgenome reveals DNA methylation patterning. Nature (2009)
Intensity Model
cycle
12.5
13.0
13.5
14.0
Intensity Model
log intensity read i, cycle j, channel c
indicators of nucleotide identity, read i, pos. j
∆ijc =
�1 if c is the nucleotide in read i position j
0 otherwise
uijc = ∆ijc(µcjα + xTj αi + �α
ijc) +
(1−∆ijc)(µcjβ + xTj βi + �β
ijc)
Intensity Model
log intensity read i, cycle j, channel c
read-specific linear models
uijc = ∆ijc(µcjα + xTj αi + �α
ijc) +
(1−∆ijc)(µcjβ + xTj βi + �β
ijc)
Intensity Model
log intensity read i, cycle j, channel c
measurement error
uijc = ∆ijc(µcjα + xTj αi + �α
ijc) +
(1−∆ijc)(µcjβ + xTj βi + �β
ijc)
�αijc ∼ N(0, σ2
αi) �βijc ∼ N(0, σ2
βi)
Read & Cycle Effects
cycle
h(intensity)
11121314
0 10 20 30 0 10 20 30 0 10 20 30
11121314
11121314
0 10 20 30 0 10 20 30
11121314
Base IdentityProbability Profiles
1 3 5 7 9 11 14 17 20 23 26 29 32 35
Position
00.20.40.60.81
Probability
Before And After
Sequencing cycle
Nuc
leot
ide
com
posi
tion
0.0
0.2
0.4
0.6
0.8
1.0
0 10 20 30
A
T
Sequencing cycle
Nuc
leot
ide
com
posi
tion
0.0
0.2
0.4
0.6
0.8
1.0
0 10 20 30
A
T
Solexa (Default) Srfim (Statistical Approach)
The End