helen-flynn
GBS VS. RAD-SEQ: THE ULTIMATE THROWDOWN! (OF ACRONYMS)
GBS: Genotyping-by-Sequencing
RAD-Seq: Restriction-site associated DNA sequencing
THE CONCEPT
• Reduce the genome
• Pool samples
• Sequence the combined pool
• Assign sequences to individuals
• Call variants between individuals
THE CONCEPT (CONT)
[Figure: individuals Ind_1 through Ind_n]
Reduce the genome and increase the probability of overlap
HOW IT WORKS
[Figure: each individual (Ind1 through IndN) is paired with its own tag (Tag1 through TagN)]
Tags (AKA Barcodes, MID Barcodes, etc.)
Tag1 = CAGATA
Tag2 = GAAGTG
Tag3 = TAGCGGAT
Tag4 = GGATA
Tag5 = CACCA
TagN = …
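The "assign sequences to individuals" step can be sketched as a prefix match on the tag. This is a minimal illustration, not the method of any particular pipeline: it assumes each read begins with its tag, that matching is exact, and that the tag-to-individual pairing follows the order on the slide. The function name `demultiplex` and the example reads are hypothetical.

```python
# Tag sequences from the slide; the pairing with Ind1..Ind5 is an assumption.
BARCODES = {
    "Ind1": "CAGATA",
    "Ind2": "GAAGTG",
    "Ind3": "TAGCGGAT",
    "Ind4": "GGATA",
    "Ind5": "CACCA",
}


def demultiplex(reads, barcodes=BARCODES):
    """Assign each read to the individual whose tag is a prefix of the read.

    Returns (assigned, unassigned). `assigned` maps individual -> list of
    reads with the tag trimmed off; `unassigned` holds reads with no match.
    """
    # Try longer tags first so a short tag cannot shadow a longer one.
    ordered = sorted(barcodes.items(), key=lambda kv: len(kv[1]), reverse=True)
    assigned = {name: [] for name in barcodes}
    unassigned = []
    for read in reads:
        for name, tag in ordered:
            if read.startswith(tag):
                assigned[name].append(read[len(tag):])
                break
        else:
            # No tag matched exactly (for example, an error inside the tag).
            unassigned.append(read)
    return assigned, unassigned


if __name__ == "__main__":
    # Hypothetical reads, made up for illustration only.
    reads = ["CAGATACTGCAGTT", "GGATAAAAC", "TTTTTTTT"]
    by_individual, leftovers = demultiplex(reads)
    print(by_individual)
    print(leftovers)
```

Exact matching discards any read with an error in its tag, which is why tag design and error tolerance matter in practice.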
HOW IT WORKS (THE ONE ENZYME METHOD)
[Figure: individuals Ind1 through IndN, each with its tag (Tag1 through TagN)]
HOW IT WORKS: POOLING
[Figure: tagged samples from Ind1 through IndN (Tag1 through TagN) combined into one pool]
Size Selection (optional if using two enzymes)
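Genome reduction and size selection can be mimicked in silico. The sketch below is an illustration under stated assumptions: the slides do not name an enzyme, so the PstI recognition site (CTGCAG, cut after the A) is used purely as an example. Only one strand is modelled, and the function names, test sequence, and size window are hypothetical.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is how many bases into the site the cut falls
    (5 for CTGCA^G). Only the top strand is modelled.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    # Drop empty fragments (possible when a cut falls at position 0).
    return [f for f in fragments if f]


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length lies within [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


if __name__ == "__main__":
    # Toy sequence and size window, chosen only to make the output readable.
    toy_genome = "AAACTGCAGTTTTTTCTGCAGCC"
    fragments = digest(toy_genome, site="CTGCAG", cut_offset=5)
    print(fragments)
    print(size_select(fragments, min_len=5, max_len=10))
```

Narrowing the size window, or choosing an enzyme with a rarer site, leaves fewer fragments to sequence, which is the sense in which the genome is "reduced".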
WHY POOL SAMPLES?
On the Illumina HiSeq 2000:
• 8 lanes of sequencing, each capable of giving 374 million reads.
• You can’t partition a lane.
• Sequencing is expensive ($1500 - $3000 per lane).
• You don’t need/want 374 million reads per individual.
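The arithmetic behind pooling can be made concrete with the figures on the slide. This is a rough sketch: it assumes reads are split evenly between samples and counts only the lane cost, ignoring library preparation. The pooling levels (48, 96, 384) and the function name `per_sample` are illustrative choices, not values from the slides.

```python
READS_PER_LANE = 374_000_000    # from the slide
LANE_COST_RANGE = (1500, 3000)  # USD per lane, from the slide


def per_sample(n_samples, reads_per_lane=READS_PER_LANE,
               cost_range=LANE_COST_RANGE):
    """Return (reads, low_cost, high_cost) per sample for one pooled lane.

    Assumes an even split of reads across samples, which real pools
    only approximate.
    """
    reads = reads_per_lane // n_samples
    low, high = (cost / n_samples for cost in cost_range)
    return reads, low, high


if __name__ == "__main__":
    for n in (1, 48, 96, 384):
        reads, low, high = per_sample(n)
        print(f"{n:>4} samples: ~{reads:,} reads each, "
              f"${low:,.2f} to ${high:,.2f} per sample")
```

With 96 samples in a lane, each sample still gets close to 3.9 million reads while the lane cost per sample drops to roughly $16 to $31.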
A WORD ABOUT TAGS
• Hamming vs. Edit Distance
• Sequence errors may result from things other than sequencing.
• n-1 errors are the most common error encountered during oligo synthesis.
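The difference between the two distances matters because an n-1 synthesis error deletes a base rather than substituting one. The sketch below uses standard textbook definitions of both distances. The tag CAGATA is taken from the earlier slide, while the observed sequence is a made-up example of that tag with one base missing and the next read base shifted in.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    tag = "CAGATA"
    # Hypothetical observation: the first A was lost in synthesis, so the
    # first six bases of the read are CGATA plus one base of the insert.
    observed = "CGATAC"
    print("Hamming:", hamming(tag, observed))
    print("Edit:   ", edit_distance(tag, observed))
```

A single deletion shifts every downstream base, so it looks like many mismatches under Hamming distance but stays small under edit distance. Tags chosen only for Hamming distance may therefore collide after an indel.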
ANALYSIS: IT’S ABOUT TIME… AND MONEY… AND TIME
Key Considerations:
• Time
• Computing power available
• Amount of sequence data (back to time)
• Availability of a reference genome
KEY CONSIDERATIONS
• Study goals
• Availability of a reference genome
• Expected degree of polymorphism
• Choice of restriction enzyme
• DNA sample preparation
• Adaptor design
• PCR amplification
• Sequencing
• Pooling individuals
• Analysis
ANALYSIS: IT’S ABOUT TIME… AND MONEY… AND TIME
A Few Options:
• Stacks
– For use with bi-parental mapping populations
– Takes a lot of time
– Looks at entire reads
– Reference genome optional
– Designed to work nicely with MySQL
– More memory intensive
• UNEAK
– For use with species without a reference genome
– Uses only 64 bp of each read
– MUCH faster than Stacks
– Less memory intensive
• TASSEL
– For use with species with a reference genome
– Uses only 64 bp of each read
– MUCH faster than Stacks
– Less memory intensive
• Custom scripts
– Completely flexible (hence the ‘custom’)
– Requires significant knowledge about programming (or knowing someone who does and is willing to help)
THE GOOD
• No ascertainment bias
• Random distribution throughout the genome
• May be useful for species without a reference genome
• Useful with genomic selection
• May provide a large number of SNPs
• Relatively low per sample cost
THE GOOD (CONT)
GBS is extremely flexible:
• Number of individuals per lane/flowcell
• Choice of enzymes
– Cut sites
– Methylation sensitivity
• Size of fragments selected
THE BAD
• Poor reproducibility between runs
• For species without a reference genome, missing data *cannot* be inferred
• Often dealing with large amounts of missing data
• Difficult to filter out false SNPs in non-mapping populations, unless you have a
reference genome and even then…
• In my opinion: this would be nigh impossible to use with association studies in
species without a reference genome UNLESS you sequence to very high
coverage to virtually eliminate missing data (alternatively, you could drastically
reduce the genome by your choice of enzymes – but this may be bad if your
expected degree of polymorphism is low)
TASSEL-GBS
• www.maizegenetics.net/index.php?option=com_content&task=view&id=89&Itemid=119
• GBS_Document
– www.maizegenetics.net/tassel/docs/TasselPipelineGBS.pdf