Towards a Reference Genome for Switchgrass (Panicum virgatum)
Jeremy Schmutz, Jarrod Chapman, Jerry Jenkins, Jane Grimwood, Kerrie Barry, Gerald A. Tuskan, Daniel S. Rokhsar & many others


Page 1: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Towards a Reference Genome for Switchgrass (Panicum virgatum)

Jeremy Schmutz, Jarrod Chapman, Jerry Jenkins, Jane Grimwood, Kerrie Barry, Gerald A. Tuskan, Daniel S. Rokhsar & many others

Page 2: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

DOE Joint Genome Institute

Mission: Serving as a genomic user facility in support of DOE mission science

• Funded by Biological and Environmental Research (BER)

• Walnut Creek, CA

• ~270 employees

• HiSeq (9), MiSeq (6), PacBio (2), 454 (1)

• Includes partner laboratories such as HudsonAlpha, funded for specific goals

Bioenergy, Carbon cycling, Biogeochemistry

Plants, Fungi, Microbes, Metagenomes

Page 3: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

JGI & BRCs

• Development of next-generation bioenergy crops
• Discovery and design of enzymes and microbes with novel biomass-degrading capabilities
• Development of transformational microbe-mediated strategies for biofuel production

Page 4: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

JGI Plant Program

Flagship Plant Genomes
Flagship Comparative Genomics
Resequencing & Population Diversity
Transcriptomics & sequence-based functional assays for Flagship Plants
Community Organization
Plant Customization
QTLs and Genotype/Phenotype Links

Page 5: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

JGI Plant Flagship Genomes

1. Provide complete genomic resources and genomes of direct DOE mission importance
2. Support efforts for cellulosic biofuel development and feedstock customization
3. Foster communities to develop research programs around DOE plants
4. Build a solid foundation for diversity and functional studies

Page 6: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

JGI Plant Flagships

Cellulosic Feedstocks: Sorghum, Poplar, Miscanthus, Switchgrass
Oil Seeds: Soybean
Models to understand plant biology of biofuel traits: Chlamy, Physco, Brachy, Foxtail, Panicgrass

Page 7: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Introduction to switchgrass

Plus:
• High cellulosic yields, marginal land, low input plant
• Existing agronomy knowledge and breeding from planting as a forage crop
• Perennial crop which can be annually harvested after establishment
• Widespread native species in North America, resistant to American pests
• Presumably large adaptive variation across the growing regions

Minus:
• Widespread native species in North America, very difficult to contain large scale plantings of improved varieties
• Obligate outcrossing polyploid species

Page 8: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Switchgrass is difficult

Obligate outcrossing tetraploid!

• Difficult to inbreed
• 4 copies of genes (or maybe 2, or 1 or more)
• Variation within and between subgenomes
• Genome is only a reference for one plant, AP13, not for all switchgrass or even all Panicum virgatum individuals

A1 A2 B1 B2

A C A A

A A C C

A T C G

C C


Page 9: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Genomic view of polymorphism


Page 10: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Switchgrass Reference Project

• Goal: Produce a reference genome of AP13 that can be used for everything from marker-assisted breeding and QTL identification to direct functional work on understanding cell wall biosynthesis
• Project has spanned several phases:
1. Resource development
2. Initial whole genome shotgun sequencing (v0.0)
3. Localization and assembly on chromosomes (v1.0)
4. Ongoing improvement through direct sequencing of localized regions (v2.0)

Page 11: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Project origins

• Started as a BRC project to produce a reference genome (initiated by JBEI in Nov. 2008)
• Cultivar selected by the group: Alamo AP13
• DNA was isolated by Jeff Bennetzen’s group @ UGA and Rod Wing’s group @ AU for BAC libraries and sequencing
• Began sequencing in early 2010, while work on developing resources was still in progress

Page 12: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Switchgrass marker paper

November 2011, The Plant Genome

Page 13: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

BAC libraries & BES

• Generated BAC libraries with 2 cut sites, 330k BES and some clone-based sequencing; average insert sizes 110 kb and 144 kb.
• Clones available from CUGI: www.genome.clemson.edu

With: Pam Ronald’s Group @ JBEI

PLoS ONE, April 2012, Sharma et al.

Page 14: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

EST & transcripts

• 510,000 Sanger ESTs from 9 tissues with C. Tobias

• 169,079 Sanger ESTs from cell wall, 11.5 million 454 ESTs from VS16 and AP13 with BESC/Noble

Table 1. Switchgrass 454-cDNA libraries and 454-ESTs

NCBI SRA accession #  JGI lib code  Tissues                  Plant growth stage/conditions  # of good ESTs  Mean length (bp)
Summer VS16 454 data
SRX026147             CFBB          Whole shoot              Leaf development               259,106         201
SRX026148             CFBC          Whole root               Leaf development               205,466         222
SRX026149             CFBF          Whole shoot              Stem elongation                194,426         194
SRX026150             CFBG          Whole root               Stem elongation                174,053         190
SRX026151-2           CFCZ          Whole shoot              Reproductive stage             219,230         189
SRX026153-4           CFFA          Whole root               Reproductive stage             234,107         205
SRX026155-6           CFFB          Panicles including seeds Reproductive stage             220,933         212
Sub-total                                                                                   1,507,321       202
Alamo AP13 454 data
SRX057824             CCXN          Whole shoot              Stem elongation                733,173         202
SRX057825             CCXO          Whole root               Stem elongation                667,612         206
SRX057830             CFXX          Whole shoot              Leaf development               1,236,020       419
SRX057831             CFXY          Whole root               Leaf development               1,214,630       375
SRX057828             CFXW          Whole shoot              Stem elongation                1,357,290       223
SRX057829             CGGO          Whole root               Stem elongation                1,040,192       404
SRX057827             CGFF          Whole shoot              Reproductive stage             547,278         320
SRX057826             CGFC          Whole root               Reproductive stage             998,691         388
SRX057834             CGTX          Panicles including seeds Reproductive stage             1,096,949       384
SRX057833             CGFI          Whole shoot              Stem elongation 2 w/drought    362,346         213
SRX057832             CGGU          Whole root               Stem elongation 2 w/drought    918,585         337
Sub-total                                                                                   10,172,766      316
Total                                                                                       11,680,087

Development of an integrated transcript sequence database and a gene expression atlas for gene discovery and analysis in switchgrass (Panicum virgatum L.). Zhang et al. 2013, Plant Journal.

http://switchgrassgenomics.noble.org

Page 15: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Foxtail millet

• Foxtail millet sequenced with 8x Sanger sequence
• Demonstrate using it as a comparative basis to reconstruct switchgrass chromosomes

Nature Biotech, May 13, 2012

Page 16: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Onward to V0.0


Page 17: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Tetraploid Switchgrass

Began with 8.3x 454 linear sequencing and added 6.5x 454 XLR+ longer sequencing (14.5x total coverage)

78% longer read length
54% longer HQ length
2x the yield per run

Page 18: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

V0.0 Release PAG 2012

• Linear 454 > 200 bp
• Sampled both the “A” and “B” genomes in the tetraploid
• Assembled using Newbler V2.6
• Results:
– Contig N50 of 3.8 kb
– 1.466 Gb of total sequence assembled
– 80 contigs > 50 kb
– 1,663 contigs > 25 kb
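The contig N50 quoted above can be reproduced from a list of contig lengths. The following is a minimal sketch of the usual N50 definition, not the Newbler implementation; the toy lengths are made up for illustration and are not switchgrass data.

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together hold at least half of the assembled bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def count_longer_than(lengths, threshold):
    """Count contigs strictly longer than threshold (e.g. 'contigs > 50 kb')."""
    return sum(1 for length in lengths if length > threshold)


# Hypothetical toy assembly (lengths in bp), for illustration only.
toy_contigs = [10_000, 8_000, 6_000, 4_000, 2_000]
print("N50:", n50(toy_contigs))
print("contigs > 5 kb:", count_longer_than(toy_contigs, 5_000))
```

With 30,000 bp in the toy set, the two longest contigs already cover more than half, so the N50 is the length of the second one.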

Page 19: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Annotation?

• Annotation was based solely on Sanger ESTs and homology (foxtail millet, rice, Brachypodium, and sorghum)
• 65,878 total loci containing protein-coding transcripts
• 4,193 total alternatively spliced transcripts

For primary transcripts:
• Average number of exons: 4.1
• Median exon length: 160
• Median intron length: 126
• Complete genes: 47,302
• Incomplete genes with start codon: 5,862
• Incomplete genes with stop codon: 10,459

Page 20: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Is this a genome?

How do we organize these fractured 410k pieces into a reference genome and put them into the correct subgenome?

The Map! The Map! The Map!

Page 21: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Genetic map AP13 x VS16

Switchgrass mapping population planted out at Noble

Page 22: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Mapping strategy

Cross: VS16 x AP13

250 offspring (F1) of VS16 x AP13, sequenced in pools of 10-12 (depth <~1X)

Find short sequences that are:
(1) Polymorphic in one parent
(2) Found in only one subgenome
(3) Not found in the other parent

These are simple markers to track by resequencing F1 offspring. Directly observe recombination in the polymorphic parent.

Select markers like:
AAAAAAAATCTCGTATGCATGGAGTACTAAATGAAGTCTATTTGCAAAAC A 15 T 12
AAAAAAAATCTCTCCAGGGCAAAAATAAAAAAATGAAAAAGAAAAAAAAA A 13 C 14
AAAAAAAATCTTCGTGAGGAATTTTCTGTGCACTTTAAGTCTTCAATAAC A 12 G 14

113,325 AP13-derived markers and 236,622 VS16-derived markers

Mapping population: Malay Saha
Map development: Jarrod Chapman
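The example marker lines above look like a sequence context followed by two allele/count pairs. A minimal sketch of parsing such lines and keeping markers whose two alleles are roughly balanced follows. Reading the numbers as read counts, the `min_count` and `min_ratio` thresholds, and the balance test itself are assumptions for illustration, not the selection criteria actually used for the map.

```python
from typing import NamedTuple


class Marker(NamedTuple):
    context: str   # sequence context printed on the slide
    allele_a: str
    count_a: int   # assumed to be a read count for allele_a
    allele_b: str
    count_b: int   # assumed to be a read count for allele_b


def parse_marker(line):
    """Parse '<sequence> <allele> <count> <allele> <count>'."""
    context, allele_a, count_a, allele_b, count_b = line.split()
    return Marker(context, allele_a, int(count_a), allele_b, int(count_b))


def is_balanced(marker, min_count=10, min_ratio=0.5):
    """Hypothetical filter: both alleles well supported and of similar depth."""
    low, high = sorted((marker.count_a, marker.count_b))
    return high > 0 and low >= min_count and low / high >= min_ratio


# The three example lines shown on the slide.
SLIDE_EXAMPLES = [
    "AAAAAAAATCTCGTATGCATGGAGTACTAAATGAAGTCTATTTGCAAAAC A 15 T 12",
    "AAAAAAAATCTCTCCAGGGCAAAAATAAAAAAATGAAAAAGAAAAAAAAA A 13 C 14",
    "AAAAAAAATCTTCGTGAGGAATTTTCTGTGCACTTTAAGTCTTCAATAAC A 12 G 14",
]

selected = [m for m in map(parse_marker, SLIDE_EXAMPLES) if is_balanced(m)]
for m in selected:
    print(m.allele_a, m.count_a, m.allele_b, m.count_b)
```

Under these assumed thresholds all three slide examples pass, since each allele pair has counts between 12 and 15.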

Page 23: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Initial VS16 map examples

First round of the map is based on 106 offspring and VS16-specific subgenome differences, and covers ~87K markers

Page 24: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

New map examples

1. Second mapping round uses all 250 offspring and AP13 subgenome differences
2. Added additional markers from WGS assembly
3. 130k typed subgenome-specific markers + 418k markers that are linked to these

Page 25: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Marker distributions

5 cM bin widths

Page 26: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Organizing assembly with map

• Original Newbler Assembly:
– 14.5x read coverage
– 556,117 contigs (1,466.3 Mb)
• V0.0 Release at PAG Jan 2012:
– 410,030 contigs (1,358.1 Mb)
– 73,010 contigs (426 Mb) mapped and annotated to 21,624 FTM genes

Page 27: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Assembly process

1. Bin contigs into linkage groups (189,942 binned)
2. Remove subgenome duplicates (35,683 removed)
3. Collapse subgenome contigs (36,467 collapsed)
4. Starting with the remaining 117,792 contigs (189,942 - 35,683 - 36,467), scaffold each linkage group using 5.5x, 2x250, 800 bp MiSeq pairs
5. Scaffold using 18x, 2x100, 4 kb & 5 kb LFPE
6. Scaffold using 6x, 2x100, 9 kb LFPE
7. Eliminate redundant ends on adjacent scaffolded contigs
8. Position scaffolds on genetic map
9. Order scaffolds using P. hallii synteny

Scaffolding performed using ABySS.

THIS SYSTEM IS NOT STATIC AND IS EASILY EXTENDED TO INCLUDE ADDITIONAL DATA, CLONE SEQUENCE, LONG READ DATA, …

Page 28: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Panicum hallii (panic grass)

Panicum hallii

• Native southwest grass
• Closely related (~4 MYR) to tetraploid switchgrass
• Drought tolerance model
• 660 Mb, mostly inbred
• w/Tom Juenger at UT-Austin

Page 29: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

P. hallii synteny

• 31mers used to identify shared content
• P. virgatum scaffolds binned on P. hallii
• Orientation of P. virgatum scaffolds relative to P. hallii determined using BLAT

Figure: 65 P. virgatum scaffolds ordered on P. hallii super_61 (panels: Before, After)
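The binning step above can be sketched as assigning each P. virgatum scaffold to the P. hallii scaffold with which it shares the most k-mers. This is a minimal illustration under stated assumptions: canonical (strand-independent) k-mers, a "most shared k-mers wins" rule, and in-memory sets are choices made here, not details given in the talk. The sequences and the name `super_2` are made up; the toy run uses k=5 so the short strings produce k-mers, where the talk used 31-mers.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq, k):
    """Set of k-mers, each stored as the smaller of itself and its
    reverse complement so that strand does not matter."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        kmers.add(min(kmer, revcomp(kmer)))
    return kmers


def bin_scaffold(scaffold, references, k=31):
    """Return (best_reference_name, shared_kmer_count), or (None, 0)
    when no reference shares any k-mer with the scaffold."""
    query = canonical_kmers(scaffold, k)
    shared = {name: len(query & canonical_kmers(ref, k))
              for name, ref in references.items()}
    best = max(shared, key=shared.get, default=None)
    if best is None or shared[best] == 0:
        return None, 0
    return best, shared[best]


# Hypothetical toy sequences, for illustration only.
toy_refs = {
    "super_1": "ACGTTGCAAGGCTTAACCGGTATC",
    "super_2": "TTTTGGGGCCCCAAAATTTTGGGG",
}
toy_scaffold = "GCAAGGCTTAACC"
print(bin_scaffold(toy_scaffold, toy_refs, k=5))
print(bin_scaffold(revcomp(toy_scaffold), toy_refs, k=5))
```

Because k-mers are canonicalized, a scaffold and its reverse complement land in the same bin; orientation is then a separate step, which the slide attributes to BLAT.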

Page 30: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Adjacent subgenome duplicates

CHR 01: 2,291 corrections, 459 bp average (100 bp to 4 kb), 1.05 Mb removed from 44.7 Mb

Figure: original scaffold vs. corrected scaffold (1.4 kb eliminated)

Page 31: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Map Integration

• 548,932 AP13-specific subgenome markers used to position scaffolds and syntenic blocks
• 56,088 map joins (with 10 kb Ns) were made for 18 (2x9) linkage groups
• Map positions contain sizable blocks of contigs (10-20) that align to the same map position and cannot be ordered or oriented; these are placed within the context of other scaffolds

Page 32: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Map vs. chromosomes


Page 33: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

All chromosomes


Page 34: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Clone alignments against scaffolds

Figure panels: 158 kb clone on syntenic block; 107 kb clone on syntenic block

Page 35: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Chromosome Assignments

• Asked switchgrass “power” users for recommendation on naming

• Assigned using shared 21mer content with S. italica

• Designation of "a" assigned to the P. virgatum LG that contained more shared 21mers with S. italica


Page 36: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Panicum virgatum V1.0 release

• Read coverage: 14.5x
• Release size: 1.22 Gb
• Contig L50: 5.7 kb
• 636.1 Mb of sequence in chromosomes
• 593.4 Mb in bits off chromosomes, includes duplicate sequences

Page 37: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Annotation – resources

• Latest JGI pipeline for integrating RNA-seq data and available EST data
• Included: original Sanger ESTs, 454 ESTs, minimal FLcDNAs, 370 million pairs from GLBRC cultivars, 710 million pairs of RNA-seq data from germinating seed, stem-node, stem-internode, blade, immature flower
• Homology: rice, Brachypodium, foxtail millet, sorghum, maize, Arabidopsis, soybean, poplar and Swiss-Prot

Page 38: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Annotation


Page 39: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Annotation results

• Primary transcripts (loci): 98,007
• Alternative transcripts: 27,432
• Total transcripts: 125,439
• For primary transcripts:
– Average number of exons: 3.9
– Median exon length: 183
– Median intron length: 133

Length  EST support  Peptide homology
100%    57,584       7,311
95%     62,327       33,251
90%     63,848       41,653
75%     66,121       58,319
50%     68,391       77,960

PLEASE DO NOT GET ATTACHED TO GENE NAMES!

Page 40: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Phytozome advertisement

http://www.phytozome.net

Expression data for plants

Diversity data for plants

Page 41: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Paralogous gene analysis

• Genes “A”, “B”, and remaining contigs aligned using BLASTP
• Alignments screened:
– >80% identity and >80% coverage
– Length of query and target amino acid sequences had to be within 20% of one another
• There are a total of:
– 29,357 “A” genes
– 27,522 “B” genes
– 41,128 genes in remaining contigs
– 98,007 total genes
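The alignment screen above can be written as a small filter. This is a sketch only: the `Hit` record and its field names are hypothetical, and "within 20% of one another" is interpreted here as the length difference being at most 20% of the longer sequence, which is one plausible reading rather than the stated definition.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One protein alignment (hypothetical record, not a BLASTP output format)."""
    query: str
    target: str
    identity: float    # percent identity
    coverage: float    # percent of the sequence covered by the alignment
    query_len: int     # amino acids
    target_len: int    # amino acids


def passes_screen(hit, min_identity=80.0, min_coverage=80.0, max_len_diff=0.20):
    """Apply the slide's screen: >80% identity, >80% coverage, and
    query/target lengths within 20% of one another (assumed definition)."""
    longer = max(hit.query_len, hit.target_len)
    shorter = min(hit.query_len, hit.target_len)
    similar_length = (longer - shorter) <= max_len_diff * longer
    return (hit.identity > min_identity
            and hit.coverage > min_coverage
            and similar_length)


# Hypothetical hits, for illustration only.
hits = [
    Hit("geneA1", "geneB1", identity=92.0, coverage=95.0, query_len=300, target_len=280),
    Hit("geneA2", "geneB2", identity=79.0, coverage=99.0, query_len=300, target_len=300),
    Hit("geneA3", "geneB3", identity=95.0, coverage=90.0, query_len=300, target_len=200),
]
kept = [h for h in hits if passes_screen(h)]
print([(h.query, h.target) for h in kept])
```

Only the first toy hit survives: the second fails on identity and the third on the length difference (100 of 300 residues).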

Page 42: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

SNP rates in chromosomes

                              AP13          VS16
Heterozygous SNPs             1,449,600     581,106
Homozygous SNPs               10,406        1,482,882
Heterozygous INDELs           864           924
Homozygous INDELs             1,920         9,391
Assembly Length               1,103.8 Mb    1,103.8 Mb
Callable Bases                466.0 Mb      241.6 Mb
Heterozygous Rate (Callable)  3.111 per Kb  2.405 per Kb
Homozygous Rate (Callable)    0.022 per Kb  6.137 per Kb
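The per-kb rates in the table are consistent with dividing SNP counts (INDELs excluded) by callable bases. A short worked check, using only the numbers shown on the slide:

```python
def rate_per_kb(variant_count, callable_mb):
    """Variants per kilobase of callable sequence (callable_mb in Mb)."""
    callable_kb = callable_mb * 1000
    return variant_count / callable_kb


# Values copied from the table: (het SNPs, hom SNPs, callable Mb).
table = {
    "AP13": (1_449_600, 10_406, 466.0),
    "VS16": (581_106, 1_482_882, 241.6),
}

rates = {}
for sample, (het_snps, hom_snps, callable_mb) in table.items():
    rates[sample] = (rate_per_kb(het_snps, callable_mb),
                     rate_per_kb(hom_snps, callable_mb))
    print(f"{sample}: het {rates[sample][0]:.3f}/kb, hom {rates[sample][1]:.3f}/kb")
```

Because the callable-base figures are themselves rounded to 0.1 Mb, the recomputed values can differ from the slide in the last printed digit.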

Page 43: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Improving the genome


Page 44: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Towards switchgrass 2.0

1. Build new AP13 and VS16 maps from recent sequence data (116 genotypes + 250 originals) to help with subgenome localization
2. Upgrade mate pair data for AP13 to new, longer, better, stronger mate pairs
3. Continue directed clone-based sequencing of important switchgrass regions
4. Version 2.0 of the genome with ~300-400 Mb of locally sequenced contigs, integrated into chromosomes

PLEASE DO NOT GET ATTACHED TO ORDER!

Page 45: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Improvement project

Short scaffolds: selected properly projecting clones.

Long scaffolds: tiling path covering cell wall genes.

Figure legend: Not Selected, Selected, Cell Wall EST, Redundant Clone

             Long Scaffs  Short Scaffs  Total
Chromosomes  335          3,237         3,572
Remaining    6            1,182         1,188
Total        341          4,419         4,760

Page 46: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Improvement project

• 96-well clone-based pool with individual indexes
• Sequenced as 2x250 on HiSeq 2500, assembled, with minimal manual finishing
• Add 96 paired, sized libraries run as ¼ MiSeq run as needed

Page 47: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Switchgrass V2.0

Inputs: New Genetic Map, WGS Contigs, Clone Contigs, New HiSeq Frags, New LMP Pairs

1. Bin contigs and clones into linkage groups
2. Remove subgenome duplicates from clones and contigs
3. Scaffold each linkage group using 2x250, 800 bp pairs
4. Scaffold using LMP pairs
5. Eliminate redundant ends on adjacent scaffolded contigs
6. Position scaffolds on genetic map
7. Order scaffolds using P. hallii synteny

Page 48: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Current JGI switchgrass projects

1. Community diversity project for up to 50 genotypes (12 sampled to date), Laura Bartley
2. eQTL study of 90 genotypes (2 samples per) for biomass/cell wall trait variants, with Laura Bartley and Malay Saha
3. BRC switchgrass projects, QTL mapping, engineered mutants

Purpose / Class                                   Targets                                                                                        Genotypes  Contributors
Support Genetics / Mapping Parents                Biparental mapping parents, NAM parents                                                        22         Saha, Brummer, Tobias, Wu, Bonos
Diversity                                         Each phylogenetic group from Lui et al. 2012, including octaploids; Mexican and NE accessions  9          Casler, Auer, Juenger
Genome Structure Determination / Genomic variants  Dihaploid, selfed, intermediate genome size                                                    10         Wu, Tobias, Brummer
Baseline Data / Interesting phenotypes            Transformed genotype, Other                                                                    1          Wang
Current total                                                                                                                                    42

Page 49: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Panicum hallii projects

1. Produce draft assembly V1.0 for Panicum hallii
2. Diploid Panicum eQTLs for segregating drought/biomass traits in HAL x FIL cross
3. Diversity sampling

Figure: Hall’s Panicgrass diversity, west Texas to east Texas, Juenger Lab (panels: FIL2, HAL, HAL x FIL2)

Page 50: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Please apply these resources!


Page 51: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Acknowledgments

DOE JGI

Jane Grimwood (HA)

Jerry Jenkins (HA)

Jarrod Chapman

Shengqiang Shu

Dan Rokhsar

Kerrie Barry

BESC

Gerald Tuskan, ORNL

Katrien Devos, UGA

Yi-Ching Lee, Noble

Malay Saha, Noble

Michael Udvardi, Noble

Jiyi Zhang, Noble

JBEI

Pam Ronald, UC-Davis

Manoj Sharma, UC-Davis

Rita Sharma, UC-Davis

Others

Laura Bartley, OU

Christian Tobias, SGEC

Chris Saski, CUGI (BAC libs)

Tom Juenger, UT-Austin

Funding Sources

DOE DE-AC02-05CH11231

ARRA UC Berkeley


Page 52: Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy

Questions for discussion

• How often should we update the switchgrass genome and annotation?

• What else can the JGI do that would be immediately useful for the switchgrass community?

• JGI CSP2015 deadline for LOI will be March 2014
• Comprehensive community proposals are greatly preferred!