8/14/2019 Leveraging massively parallel pyrosequencing for functional and evolutionary genomics in ferns.
http://slidepdf.com/reader/full/leveraging-massively-parallel-pyrosequencing-for-functional-and-evolutionary 1/28
Leveraging massively parallel pyrosequencing for functional and evolutionary genomics in ferns.
Joshua Der and Paul Wolf Dept. Biology and Center for Integrated Biosystems,
Utah State University
Genomics in non-model organisms
• Traditional genome sequencing methods are cost prohibitive, especially for large genomes
• Expressed Sequence Tags (ESTs) are a genome-wide proxy for functional components of the genome
• New-generation sequencing technologies: low cost per base, massively high throughput
• Roche 454 pyrosequencing: long read lengths, enabling de novo genome assembly in non-model organisms
Roche 454 Pyrosequencing
• DNA fragmentation (nebulization)
• Fragment size selection and adapter ligation
• Single-stranded DNA library isolation
• Bind library molecule to sequencing bead (1 molecule/bead)
• Clonal amplification in emulsion PCR
• Bead deposition in PicoTiterPlate
• Pyrosequencing and image processing
Fern life cycle
Alternation of generations
[Figure: life cycle diagram. Labels: meiosis, haploid spores (n), sperm (n), egg (n), syngamy, zygote (2n)]
Fern genetics
• Recessive alleles are not masked in haploid gametophytes
• Gametophytes and sporophytes can be vegetatively propagated
• Controlled crosses and double haploid lines (selfed individuals are homozygous at all loci)
• Gametic-phase segregation and recombination can be directly observed from gametophytes
• Genome function can be examined in haploid and diploid phases independently
Challenges in fern genetics
• Large genome sizes (avg. 10 Gb; humans = 3.2 Gb; Arabidopsis = 0.157 Gb)
• Large chromosome numbers (avg. n=57; n=700 in Ophioglossum)
• History of polyploidy and hybridization
• Linkage map and ESTs for Ceratopteris
• No fern genomes have been sequenced or funded
Bracken fern, Pteridium aquilinum
• Worldwide distribution
• Economically important
• Highly adaptable and phenotypically plastic
• Well-established culture techniques
• Model for fern gametophyte development and pheromonal sex determination
• Ancient polyploid with diploid gene expression (isozymes)
• Genome size: 1C = 9.8 Gb
Research Questions: Genome
• Are we able to determine complete chloroplast and mitochondrial genome sequences?
• How much of the genome is composed of repetitive sequences (transposable elements, SSRs)?
• What proportion of the genome is protein coding?
Research Questions: Transcriptome
• What genes and ontology groups are expressed in the reproductive haploid phase in ferns?
• Are any of these genes homologous with reproductive genes in bryophytes or flowering plants?
• Is there a signature of past polyploidy in the transcriptome?
• Do genes expressed in the haploid phase experience purifying selection?
The data: Genome
Total genomic sequences derived from DNA extracted in CTAB and purified on a CsCl gradient
• Combination of 454 Standard FLX and Titanium
• 711,178 reads
• 216.19 Mb total sequence
[Figure: Genomic 454 read lengths. x-axis: read length (maximum = 1363); y-axis: number of reads]
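As a rough check on these totals, the mean read length and the expected depth of coverage can be worked out directly. This is a minimal sketch, assuming the 1C genome size of 9.8 Gb quoted for Pteridium aquilinum; the function names are illustrative and not part of any tool used in the talk.

```python
def mean_read_length(total_bases: float, n_reads: int) -> float:
    """Average read length in bp."""
    return total_bases / n_reads


def depth_of_coverage(total_bases: float, genome_size: float) -> float:
    """Expected average depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


# Read totals from the slide; genome size (1C = 9.8 Gb) is the value
# quoted for bracken and is used here as an assumption.
TOTAL_BASES = 216.19e6
N_READS = 711_178
GENOME_SIZE = 9.8e9

mean_len = mean_read_length(TOTAL_BASES, N_READS)
coverage = depth_of_coverage(TOTAL_BASES, GENOME_SIZE)
print(f"mean read length: {mean_len:.0f} bp")
print(f"coverage: {coverage:.3f}x")
```

Under these assumptions the data amount to roughly 0.02x coverage, that is, a sparse sample of the genome rather than complete coverage.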
Genome: assembly
# reads assembled: 294,497
# singletons: 1,500
# contigs: 76,866
Average contig size: 476.91 bp
Largest contig size: 52,181 bp
Total consensus: 37.37 Mb
[Figure: Histogram of genome assembly (MIRA). x-axis: sequence length (largest contig = 52181 bp); y-axis: number of sequences. Total contigs + singletons = 78366; mean length = 476.91 bp; total bases = 37.37 Mb]
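Summary figures like these can be derived from a list of contig and singleton lengths. The sketch below is one hypothetical way to do it and is not MIRA's own reporting code. N50 does not appear on the slide; it is included only as a common companion statistic, and the input lengths are toy values.

```python
def assembly_stats(lengths):
    """Return count, total bases, mean, largest and N50 for sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    # N50: length of the sequence at which the running total, taken from
    # the longest sequence downwards, first reaches half of all bases.
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "count": len(ordered),
        "total": total,
        "mean": total / len(ordered),
        "largest": ordered[0],
        "n50": n50,
    }


# Toy lengths for illustration only, not the bracken assembly.
stats = assembly_stats([100, 200, 300, 400, 500])
print(stats)
```

For comparison, the slide's own totals are consistent with each other: 37.37 Mb over 78,366 sequences gives a mean of about 477 bp.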
Genome: chloroplast genome
Ribosomal RNAs 4
Transfer RNAs 29
Photosystem I 5
Photosystem II 15
Cytochrome 6
ATP synthase 6
Rubisco 1
Chlorophyll biosynthesis 3
NADH dehydrogenase 11
Ribosomal proteins 22
RNA polymerase 4
Miscellaneous proteins 5
Hypothetical proteins 4
Pseudogenes 2
Genome: gene content
• Gene finder: GlimmerHMM (trained on Arabidopsis)
• 7.27 Mb are putative exons (19.46%)
[Figure: Histogram of exon size. x-axis: exon length; y-axis: frequency]
[Figure legend: Exon, Noncoding]
Genome: microsatellites
Repeat motif length    #
dinucleotide           5564
trinucleotide          470
tetranucleotide        15
pentanucleotide        78
hexanucleotide         1
Total:                 6128
[Figure: Histogram of microsatellite repeats. x-axis: number of repeats; y-axis: frequency]
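To illustrate what a microsatellite scan looks for, here is a small regular-expression sketch that reports perfect tandem repeats with motifs of 2 to 6 bp. It is not SSRIT's algorithm. The minimum repeat count of 5 and the function names are assumptions made for the example.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    for k in range(1, n):
        if n % k == 0 and motif[:k] * (n // k) == motif:
            return False
    return True


def find_ssrs(seq, min_repeats=5, motif_sizes=range(2, 7)):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k in motif_sizes:
        # A k-mer followed by at least (min_repeats - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Toy sequence: an (AT)8 run and a (GAA)6 run separated by unique bases.
toy = "GGC" + "AT" * 8 + "CCGT" + "GAA" * 6 + "TC"
ssrs = find_ssrs(toy)
print(ssrs)
```

Filtering out non-primitive motifs keeps a dinucleotide run such as (AT)n from also being counted as a tetranucleotide or hexanucleotide repeat.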
The data: Transcriptome
Full-length enriched, normalized cDNA sequences derived from mature gametophyte total RNA
Reads were vector screened and quality trimmed:
• 681,722 reads
• 254.00 Mb total sequence
[Figure: Histogram of cleaned reads. x-axis: cleaned read length (maximum = 624); y-axis: number of sequences]
Transcriptome: assembly
                        MIRA (1º)    CAP3 (2º)
# 2º contigs:           0            5,905
# 1º contigs:           50,020       32,801
# singletons:           638          183
# unigenes:             50,658       38,889
mean unigene size:      637.7 bp     685.8 bp
largest unigene size:   4,489 bp     4,897 bp
total consensus:        32.30 Mb     26.67 Mb
[Figure: Histogram of transcriptome unigenes (CAP3). x-axis: unigene length (largest transcript = 4897 bp); y-axis: number of sequences. Total unigenes = 38889; mean length = 685.76 bp; total bases = 26.67 Mb]
Transcriptome: BLAST
• 38,889 unigenes blasted against NCBI nr database (blastx)
• E-value cutoff: 1.0E-10
• 17,788 unigenes had no match in the database
[Figure: Sequence similarity distribution. x-axis: #positives/alignment-length; y-axis: hits]
[Figure: E-value distribution. x-axis: E-value (1e-X); y-axis: hits]
[Figure legend: No BLAST result, No BLAST hit, Positive BLAST hit]
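The E-value filter can be sketched as a short script that keeps the best hit per unigene. It assumes tab-separated BLAST output with the 12 standard columns, in which the E-value is the 11th field. The slides do not say which output format was used, so that layout, the function name and the example rows are all hypothetical.

```python
def best_hits(lines, evalue_cutoff=1e-10):
    """Map each query ID to its lowest-E-value hit at or below the cutoff."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])  # 11th column in the assumed layout
        if evalue > evalue_cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Made-up rows for illustration; the IDs are not real unigenes or proteins.
example = [
    "unigene1\tprotA\t85.0\t120\t18\t0\t1\t360\t5\t124\t1e-50\t200",
    "unigene1\tprotB\t60.0\t100\t40\t0\t1\t300\t5\t104\t1e-12\t80",
    "unigene2\tprotC\t40.0\t50\t30\t0\t1\t150\t5\t54\t1e-3\t35",
]
hits = best_hits(example)
print(hits)
```

On the slide's numbers, 38,889 − 17,788 = 21,101 unigenes (about 54%) had at least one match at this cutoff.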
Transcriptome: BLAST
Distribution of full length transcripts
[Figure: HSP/HIT coverage distribution. x-axis: HSP/HIT coverage in %; y-axis: hits]
Transcriptome: BLAST
[Figure: Top-Hit species distribution. x-axis: BLAST hits. Species as listed on the chart:]
Physcomitrella patens
Vitis vinifera
Picea sitchensis
Ricinus communis
Populus trichocarpa
Arabidopsis thaliana
Oryza sativa
Sorghum bicolor
Glycine max
Zea mays
Gossypium hirsutum
Medicago truncatula
unknown
Adiantum capillus-veneris
Ceratopteris richardii
Nicotiana tabacum
Marchantia polymorpha
Solanum tuberosum
Chlamydomonas reinhardtii
Alsophila spinulosa
Ginkgo biloba
Micromonas sp.
Pteris vittata
Elaeis guineensis
Pinus taeda
Solanum lycopersicum
Micromonas pusilla
Triticum aestivum
Gossypium barbadense
others
Transcriptome: GO annotation
Biological Process
[Figure: Direct GO Count. x-axis: #Seqs; y-axis: GO term. Terms as labelled on the chart (some truncated):]
P:cellular process
P:metabolic process
P:transport
P:biosynthetic process
P:protein modificati...
P:protein metabol...
P:cellular compone...
P:transcription
P:translation
P:response to stress
P:nucleobas...
P:generation ...
P:carbohydra...
P:catabolic process
P:cellular amino ac...
P:signal transduction
P:lipid metabol...
P:photosynthesis
P:biological_process
P:response to abiot...
Transcriptome: GO annotation
Cellular Component
[Figure: Direct GO Count. x-axis: #Seqs; y-axis: GO term. Terms as labelled on the chart (some truncated):]
C:plastid
C:membrane
C:mitochondrion
C:cytoplasm
C:nucleus
C:plasma membrane
C:intracellular
C:thylakoid
C:ribosome
C:cytosol
C:extracellular region
C:endoplasm...
C:cell wall
C:vacuole
C:cell
C:cytoskeleton
C:Golgi apparatus
C:nucleolus
C:peroxisome
C:cellular_component
Transcriptome: GO annotation
Molecular Function
[Figure: Direct GO Count. x-axis: #Seqs; y-axis: GO term. Terms as labelled on the chart (some truncated):]
F:binding
F:catalytic activity
F:nucleotide binding
F:hydrolase activity
F:protein binding
F:transferase activity
F:kinase activity
F:transporter activity
F:DNA binding
F:structural molecu...
F:nucleic acid binding
F:RNA binding
F:molecular_function
F:transcription fact...
F:signal transduc...
F:receptor activity
F:transcripti...
F:translation fact...
F:nuclease activity
F:enzyme regulat...
Transcriptome: paleopolyploidy
• Evaluate the distribution of synonymous substitution rates (Ks) for duplicate gene pairs
• Ks is a proxy for time
• A constant birth-death rate of duplicate genes results in an exponential decrease in frequency over time
• Mixture model analysis to separate significant components of the distribution
• Significant low peak at Ks = 1.20
[Figure: Histogram of Ks values. x-axis: Ks value (0.0 to 2.0); y-axis: frequency]
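The idea behind the mixture analysis can be illustrated with a toy expectation-maximization (EM) fit that separates an exponential background (steady birth and death of duplicates) from a single normal peak (a burst of duplication). The slides do not state which mixture components or software were used, so the two-component form, the starting values and the simulated Ks values below are all assumptions for illustration.

```python
import math
import random


def fit_exp_plus_normal(values, mu=1.0, sigma=0.3, weight=0.5, n_iter=200):
    """EM fit of (1 - weight) * Exponential(rate) + weight * Normal(mu, sigma)."""
    rate = len(values) / sum(values)  # initial rate from the overall mean
    for _ in range(n_iter):
        # E-step: probability that each value belongs to the normal peak.
        resp = []
        for x in values:
            normal = weight * math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (
                sigma * math.sqrt(2 * math.pi)
            )
            background = (1 - weight) * rate * math.exp(-rate * x)
            resp.append(normal / (normal + background))
        # M-step: re-estimate every parameter from the weighted data.
        r_sum = sum(resp)
        weight = r_sum / len(values)
        mu = sum(r * x for r, x in zip(resp, values)) / r_sum
        var = sum(r * (x - mu) ** 2 for r, x in zip(resp, values)) / r_sum
        sigma = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed peak
        rate = (len(values) - r_sum) / sum(
            (1 - r) * x for r, x in zip(resp, values)
        )
    return {"weight": weight, "mu": mu, "sigma": sigma, "rate": rate}


# Simulated Ks values, not the bracken data: an exponential background
# plus a duplication peak placed at Ks = 1.2 for the sake of the example.
random.seed(1)
background = [random.expovariate(2.5) for _ in range(1500)]
peak = [random.gauss(1.2, 0.15) for _ in range(500)]
fit = fit_exp_plus_normal(background + peak)
print({name: round(value, 2) for name, value in fit.items()})
```

A peak that stands out from the exponential background is what would be read as a signature of an ancient large-scale duplication; a real analysis would also need to test how many components the data support.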
Future work
• Try to extract complete mitochondrial sequence
• Train gene finders with transcriptome data
• Sequence sporophyte transcriptome
• Screen population samples for microsatellite variation
• Population genetics: gene flow, demographics, population structure
• Test for selective sweeps to ID candidate genes for adaptation, reproductive isolation, and speciation
• Develop linkage maps from controlled crosses
• RNA-seq to measure expression levels
Acknowledgements
• Utah State University
  • Aaron Duffy - gametophyte culture advice
  • Mike Pfrender - lab space and equipment for RNA extraction
  • Center for Integrated Biosystems - CIBR students grant for transcriptome sequencing and complimentary 454 Titanium genome sequencing
  • VP for Research - additional funds for transcriptome sequencing
• University of British Columbia
  • Mike Barker - 454 advice and transcriptome assembly
  • Katrina Dlugosch - transcriptome sequence cleaning scripts
• Pennsylvania State University
  • Claude dePamphilis - transcriptome analysis
  • Norman Wickett - transcriptome analysis
• Sara Elgin, HHMI Genomics Education Partnership, Washington Univ. - funding and sequencing 454 FLX standard genomic data
• Jeff Boore, Genome Solutions - genome sequencing funding
• Keithanne Mockaitis, Center for Genomics and Bioinformatics, Indiana University - transcriptome library preparation and sequencing
Data processing: Genome
1. De novo sequence assembly: MIRA
2. BLASTn search against fern chloroplast genomes to identify chloroplast contigs. In silico finishing.
3. Identify putative exons: GlimmerHMM
4. Identify microsatellites: SSRIT
5. Identify transposable elements: REPCLASS
6. Summaries and statistical analyses
Data processing: Transcriptome
1. Adapter and primer trimming: SeqClean and Snowhite.pl (custom script by Katrina Dlugosch)
2. De novo sequence assembly: MIRA
3. Secondary assembly: CAP3
4. BLASTx against nr to find homologous proteins
5. Functional annotation (GO) based on homology transfer from BLAST hits: blast2go
6. Functional description of gametophyte transcriptome
7. Ks analysis of duplicate genes and past polyploidy
8. Summaries and statistical analyses