Leveraging massively parallel pyrosequencing for functional and evolutionary genomics in ferns. Joshua Der and Paul Wolf, Dept. Biology and Center for Integrated Biosystems, Utah State University


8/14/2019 Leveraging massively parallel pyrosequencing for functional and evolutionary genomics in ferns.

http://slidepdf.com/reader/full/leveraging-massively-parallel-pyrosequencing-for-functional-and-evolutionary 1/28

Leveraging massively parallel pyrosequencing for functional and evolutionary genomics in ferns.

Joshua Der and Paul Wolf
Dept. Biology and Center for Integrated Biosystems, Utah State University


Genomics in non-model organisms

• Traditional genome sequencing methods are cost-prohibitive, especially for large genomes
• Expressed Sequence Tags (ESTs) are a genome-wide proxy for functional components of the genome
• New-generation sequencing technologies: low cost per base, massively high throughput
• Roche 454 pyrosequencing: long read lengths, enabling de novo genome assembly in non-model organisms


Roche 454 Pyrosequencing

• DNA fragmentation (nebulization)
• Fragment size selection and adapter ligation
• Single-stranded DNA library isolation
• Bind library molecule to sequencing bead (1 molecule/bead)
• Clonal amplification in emulsion PCR
• Bead deposition in PicoTiterPlate
• Pyrosequencing and image processing


Fern life cycle: alternation of generations

[Figure: life-cycle diagram. Labels: meiosis, haploid spores (n), sperm (n), egg (n), syngamy, zygote (2n)]


Fern genetics

• Recessive alleles are not masked in haploid gametophytes
• Gametophytes and sporophytes can be vegetatively propagated
• Controlled crosses and double haploid lines (selfed individuals are homozygous at all loci)
• Gametic-phase segregation and recombination can be directly observed from gametophytes
• Genome function can be examined in haploid and diploid phases independently


Challenges in fern genetics

• Large genome sizes (avg. 10 Gb; humans = 3.2 Gb; Arabidopsis = 0.157 Gb)
• Large chromosome numbers (avg. n=57; n=700 in Ophioglossum)
• History of polyploidy and hybridization
• Linkage map and ESTs for Ceratopteris
• No fern genomes have been sequenced or funded


Bracken fern, Pteridium aquilinum

• Worldwide distribution
• Economically important
• Highly adaptable and phenotypically plastic
• Well-established culture techniques
• Model for fern gametophyte development and pheromonal sex determination
• Ancient polyploid with diploid gene expression (isozymes)
• Genome size: 1C = 9.8 Gb


Research Questions: Genome

• Are we able to determine complete chloroplast and mitochondrial genome sequences?
• How much of the genome is composed of repetitive sequences (transposable elements, SSRs)?
• What proportion of the genome is protein coding?


Research Questions: Transcriptome

• What genes and ontology groups are expressed in the reproductive haploid phase in ferns?
• Are any of these genes homologous with reproductive genes in bryophytes or flowering plants?
• Is there a signature of past polyploidy in the transcriptome?
• Do genes expressed in the haploid phase experience purifying selection?


The data: Genome

Total genomic sequences derived from DNA extracted in CTAB and purified on a CsCl gradient

• Combination of 454 Standard FLX and Titanium
• 711,178 reads
• 216.19 Mb total sequence

[Figure: Genomic 454 read lengths. x-axis: read length (maximum = 1363); y-axis: number of reads]
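As a quick sanity check on these totals, the mean read length and the approximate depth of coverage follow from simple division. The sketch below assumes 1 Mb = 10^6 bases and uses the bracken genome size (1C = 9.8 Gb) quoted earlier in the talk; the function and constant names are illustrative only and do not come from the slides.

```python
def mean_read_length(total_bases: float, n_reads: int) -> float:
    """Average read length in bases."""
    return total_bases / n_reads


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: bases sequenced per base of genome."""
    return total_bases / genome_size


GENOMIC_BASES = 216.19e6  # 216.19 Mb total sequence (from the slide)
GENOMIC_READS = 711_178   # number of genomic reads (from the slide)
GENOME_SIZE = 9.8e9       # 1C = 9.8 Gb for Pteridium aquilinum (from the talk)

if __name__ == "__main__":
    length = mean_read_length(GENOMIC_BASES, GENOMIC_READS)
    depth = fold_coverage(GENOMIC_BASES, GENOME_SIZE)
    print(f"mean read length: {length:.0f} bp")
    print(f"approximate coverage: {depth:.3f}x")
```

Under these assumptions the reads average about 304 bp and sample roughly 2% of the genome, which is consistent with a low-coverage survey rather than a complete genome assembly.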


Genome: assembly

# reads assembled: 294,497
# singletons: 1,500
# contigs: 76,866
Average contig size: 476.91 bp
Largest contig size: 52,181 bp
Total consensus: 37.37 Mb

[Figure: Histogram of genome assembly (MIRA). x-axis: sequence length (largest contig = 52,181 bp); y-axis: number of sequences. Total contigs + singletons = 78,366; mean length = 476.91 bp; total bases = 37.37 Mb]
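The summary figures are internally consistent: 76,866 contigs plus 1,500 singletons give 78,366 sequences, and 78,366 sequences at a mean of 476.91 bp give about 37.37 Mb. A minimal sketch of how such statistics could be computed from a list of sequence lengths is shown below. The function name is hypothetical, and N50 is not reported on the slide; it is included only as a commonly used companion statistic.

```python
from typing import Dict, List


def assembly_stats(lengths: List[int]) -> Dict[str, float]:
    """Summarize an assembly from the lengths of its sequences."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    # N50: the length L such that sequences of length >= L
    # together contain at least half of all assembled bases.
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "count": len(ordered),
        "total_bases": total,
        "mean_length": total / len(ordered),
        "largest": ordered[0],
        "n50": n50,
    }


if __name__ == "__main__":
    toy = [100, 200, 300, 400, 1000]  # made-up lengths, not data from the talk
    print(assembly_stats(toy))
```

In practice the lengths would be read from the assembler's FASTA output; that step is omitted here to keep the sketch self-contained.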


Genome: chloroplast genome

Gene category               Count
Ribosomal RNAs              4
Transfer RNAs               29
Photosystem I               5
Photosystem II              15
Cytochrome                  6
ATP synthase                6
Rubisco                     1
Chlorophyll biosynthesis    3
NADH dehydrogenase          11
Ribosomal proteins          22
RNA polymerase              4
Miscellaneous proteins      5
Hypothetical proteins       4
Pseudogenes                 2


Genome: gene content

• Gene finder: GlimmerHMM (trained on Arabidopsis)
• 7.27 Mb are putative exons (19.46%)

[Figure: Histogram of exon size. x-axis: exon length; y-axis: frequency]
[Figure legend: Exon, Noncoding]


Genome: microsatellites

Repeat motif length    #
dinucleotide           5564
trinucleotide          470
tetranucleotide        15
pentanucleotide        78
hexanucleotide         1
Total:                 6128

[Figure: Histogram of microsatellite repeats. x-axis: number of repeats (0 to 100); y-axis: frequency]
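To illustrate how a tally like this can be produced, the sketch below scans a sequence for perfect tandem repeats of 2-6 bp motifs using a regular expression with a backreference. This is not the tool used in the talk, whose search rules and thresholds are not given on the slides; the function names and the minimum of five repeat units are assumptions made for this example.

```python
import re
from collections import Counter
from typing import Iterator, Tuple

MOTIF_NAMES = {2: "dinucleotide", 3: "trinucleotide", 4: "tetranucleotide",
               5: "pentanucleotide", 6: "hexanucleotide"}


def find_ssrs(seq: str, min_repeats: int = 5) -> Iterator[Tuple[int, str, int]]:
    """Yield (start, motif, repeat_count) for perfect tandem repeats of 2-6 bp motifs."""
    seq = seq.upper()
    # Group 1 is the motif (shortest candidate tried first); the backreference
    # must then match at least min_repeats - 1 further copies.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    for match in pattern.finditer(seq):
        motif = match.group(1)
        # Skip single-base runs such as "AAAA...", which are not 2-6 bp motifs.
        if len(set(motif)) == 1:
            continue
        yield match.start(), motif, len(match.group(0)) // len(motif)


def count_by_motif_length(seq: str, min_repeats: int = 5) -> Counter:
    """Tally repeats by motif class, in the style of the table above."""
    return Counter(MOTIF_NAMES[len(motif)] for _, motif, _ in find_ssrs(seq, min_repeats))


if __name__ == "__main__":
    toy = "GGTT" + "AC" * 8 + "TTGCA" + "ATG" * 5 + "CC"  # made-up sequence
    for start, motif, n in find_ssrs(toy):
        print(start, motif, n)
    print(count_by_motif_length(toy))
```

Changing `min_repeats` changes the counts substantially, so totals from different tools are comparable only when their thresholds match.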


The data: Transcriptome

Full-length enriched, normalized cDNA sequences derived from mature gametophyte total RNA

Reads were vector screened and quality trimmed:

• 681,722 reads
• 254.00 Mb total sequence

[Figure: Histogram of cleaned reads. x-axis: cleaned read length (maximum = 624); y-axis: number of sequences]


Transcriptome: assembly

                         MIRA (1º)    CAP3 (2º)
# 2º contigs:            0            5,905
# 1º contigs:            50,020       32,801
# singletons:            638          183
# unigenes:              50,658       38,889
mean unigene size:       637.7 bp     685.8 bp
largest unigene size:    4,489 bp     4,897 bp
total consensus:         32.30 Mb     26.67 Mb

[Figure: Histogram of transcriptome unigenes (CAP3). x-axis: unigene length (largest transcript = 4,897 bp); y-axis: number of sequences. Total unigenes = 38,889; mean length = 685.76 bp; total bases = 26.67 Mb]


Transcriptome: BLAST

• 38,889 unigenes blasted against NCBI nr database (blastx)
• E-value cutoff: 1.0E-10
• 17,788 unigenes had no match in the database

[Figure: Sequence similarity distribution. x-axis: #positives/alignment-length (0 to 100); y-axis: hits]
[Figure: E-value distribution. x-axis: E-value (1e-X); y-axis: hits]
[Figure legend: No BLAST result; No BLAST hit; Positive BLAST hit]
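By subtraction, 38,889 - 17,788 = 21,101 unigenes had at least one match at this cutoff. The sketch below shows one way to apply such a cutoff and keep the best hit per query. It assumes BLAST+ tabular output (`-outfmt 6`) with the default twelve columns, which may not be the format used in the talk; the function names and the toy records are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

EVALUE_CUTOFF = 1e-10  # cutoff stated on the slide


def best_hits(lines: Iterable[str],
              cutoff: float = EVALUE_CUTOFF) -> Dict[str, Tuple[str, float]]:
    """Return {query: (subject, evalue)} for the lowest-E-value hit passing the cutoff.

    Assumed column order (BLAST+ -outfmt 6 default):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def unmatched(all_queries: Iterable[str],
              hits: Dict[str, Tuple[str, float]]) -> Set[str]:
    """Queries with no hit at or below the cutoff."""
    return set(all_queries) - set(hits)


if __name__ == "__main__":
    rows = [  # made-up records, not data from the talk
        "u1\tp1\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-50\t200",
        "u1\tp2\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-60\t220",
        "u2\tp3\t40.0\t50\t30\t0\t1\t150\t1\t50\t1e-5\t45",
    ]
    hits = best_hits(rows)
    print(hits)
    print(sorted(unmatched(["u1", "u2", "u3"], hits)))
```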


Transcriptome: BLAST

Distribution of full length transcripts

[Figure: HSP/HIT coverage distribution. x-axis: HSP/HIT coverage in % (0 to 100); y-axis: hits]


Transcriptome: BLAST

[Figure: Top-hit species distribution. x-axis: BLAST hits (0 to 5,000); one bar per species, listed below]

Physcomitrella patens

Vitis vinifera

Picea sitchensis

Ricinus communis

Populus trichocarpa

Arabidopsis thaliana

Oryza sativa

Sorghum bicolor

Glycine max

Zea mays

Gossypium hirsutum

Medicago truncatula

unknown

Adiantum capillus-veneris

Ceratopteris richardii

Nicotiana tabacum

Marchantia polymorpha

Solanum tuberosum

Chlamydomonas reinhardtii

Alsophila spinulosa

Ginkgo biloba

Micromonas sp.

Pteris vittata

Elaeis guineensis

Pinus taeda

Solanum lycopersicum

Micromonas pusilla

Triticum aestivum

Gossypium barbadense

others


Transcriptome: GO annotation

Biological Process

[Figure: Direct GO count. x-axis: #Seqs (0 to 2,500); one bar per GO term, listed below]

P:cellular process

P:metabolic process

P:transport

P:biosynthetic process

P:protein modificati...

P:protein metabol...

P:cellular compone...

P:transcription

P:translation

P:response to stress

P:nucleobas...

P:generation ...

P:carbohydra...

P:catabolic process

P:cellular amino ac...

P:signal transduction

P:lipid metabol...

P:photosynthesis

P:biological_process

P:response to abiot...


Transcriptome: GO annotation

Cellular Component

[Figure: Direct GO count. x-axis: #Seqs (0 to 3,500); one bar per GO term, listed below]

C:plastid

C:membrane

C:mitochondrion

C:cytoplasm

C:nucleus

C:plasma membrane

C:intracellular

C:thylakoid

C:ribosome

C:cytosol

C:extracellular region

C:endoplasm...

C:cell wall

C:vacuole

C:cell

C:cytoskeleton

C:Golgi apparatus

C:nucleolus

C:peroxisome

C:cellular_component



Transcriptome: GO annotation

Molecular Function

[Figure: Direct GO count. x-axis: #Seqs (0 to 3,500); one bar per GO term, listed below]

F:binding

F:catalytic activity

F:nucleotide binding

F:hydrolase activity

F:protein binding

F:transferase activity

F:kinase activity

F:transporter activity

F:DNA binding

F:structural molecu...

F:nucleic acid binding

F:RNA binding

F:molecular_function

F:transcription fact...

F:signal transduc...

F:receptor activity

F:transcripti...

F:translation fact...

F:nuclease activity

F:enzyme regulat...



Transcriptome: paleopolyploidy

• Evaluate the distribution of synonymous substitution rates (Ks) for duplicate gene pairs
• Ks is a proxy for time
• A constant birth-death rate of duplicate genes results in an exponential decrease in frequency over time
• Mixture model analysis to separate significant components of the distribution
• Significant low peak at Ks = 1.20

[Figure: Histogram of Ks values. x-axis: Ks value (0.0 to 2.0); y-axis: frequency]
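The mixture-model step can be sketched as follows. The slides do not say which mixture family or software was used, so this example substitutes a plain two-component Gaussian mixture fitted by expectation-maximization (EM). The function names and the synthetic input are illustrative assumptions, not the authors' method or data.

```python
import math
import random
from typing import List, Tuple


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values: List[float],
                      n_iter: int = 100) -> List[Tuple[float, float, float]]:
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns [(weight, mean, sd), (weight, mean, sd)] sorted by mean.
    """
    data = sorted(values)
    n = len(data)
    if n < 4:
        raise ValueError("need at least four values")
    # Start the two means at the lower and upper quartiles.
    means = [data[n // 4], data[(3 * n) // 4]]
    spread = (data[-1] - data[0]) / 4.0 or 1.0
    sds = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens)
            if total == 0.0:
                resp.append([0.5, 0.5])  # numerical underflow: split evenly
            else:
                resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and standard deviation per component.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor keeps a component from collapsing
    return sorted(zip(weights, means, sds), key=lambda comp: comp[1])


if __name__ == "__main__":
    rng = random.Random(0)  # synthetic values, not data from the talk
    sample = [rng.gauss(0.3, 0.1) for _ in range(300)]
    sample += [rng.gauss(1.2, 0.1) for _ in range(300)]
    for weight, mean, sd in fit_two_gaussians(sample):
        print(f"weight={weight:.2f} mean={mean:.2f} sd={sd:.2f}")
```

A real Ks analysis would also need to model the exponentially decaying background of young duplicates and choose the number of components with a criterion such as BIC; both are left out of this sketch.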


Future work

• Try to extract complete mitochondrial sequence
• Train gene finders with transcriptome data
• Sequence sporophyte transcriptome
• Screen population samples for microsatellite variation
• Population genetics: gene flow, demographics, population structure
• Test for selective sweeps to ID candidate genes for adaptation, reproductive isolation, and speciation
• Develop linkage maps from controlled crosses
• RNA-seq to measure expression levels


Acknowledgements

• Utah State University
  • Aaron Duffy - gametophyte culture advice
  • Mike Pfrender - lab space and equipment for RNA extraction
  • Center for Integrated Biosystems - CIBR students grant for transcriptome sequencing and complimentary 454 Titanium genome sequencing
  • VP for Research - additional funds for transcriptome sequencing
• University of British Columbia
  • Mike Barker - 454 advice and transcriptome assembly
  • Katrina Dlugosch - transcriptome sequence cleaning scripts
• Pennsylvania State University
  • Claude dePamphilis - transcriptome analysis
  • Norman Wickett - transcriptome analysis
• Sara Elgin, HHMI Genomics Education Partnership, Washington Univ. - funding and sequencing 454 FLX standard genomic data
• Jeff Boore, Genome Solutions - genome sequencing funding
• Keithanne Mockaitis, Center for Genomics and Bioinformatics, Indiana University - transcriptome library preparation and sequencing


Data processing: Genome

1. de novo sequence assembly: MIRA
2. BLASTn search against fern chloroplast genomes to identify chloroplast contigs. In silico finishing.
3. Identify putative exons: GlimmerHMM
4. Identify microsatellites: SSRIT
5. Identify transposable elements: REPCLASS
6. Summaries and statistical analyses


Data processing: Transcriptome

1. Adapter and primer trimming: SeqClean and Snowhite.pl (custom script by Katrina Dlugosch)
2. de novo sequence assembly: MIRA
3. Secondary assembly: CAP3
4. BLASTx against nr to find homologous proteins
5. Functional annotation (GO) based on homology transfer from BLAST hits: blast2go
6. Functional description of gametophyte transcriptome
7. Ks analysis of duplicate genes and past polyploidy
8. Summaries and statistical analyses