
Applied Bioinformatics for Plant Genome Characterisation ...ksiconnect.icrisat.org/wp-content/uploads/2014/02/Dave-Edwards.pdf · Applied Bioinformatics for Plant Genome Characterisation

Applied Bioinformatics for Plant Genome Characterisation using Next-Generation Sequence Data

David Edwards

University of Queensland, Australia

[email protected]


Outline

• Sequencing wheat chromosome arms

• Wheat evolution

• Chickpea chromosomal genomics

• Skim GBS based genome assembly

• Skim GBS based trait association

• Assessing gene presence/absence variation

• Extreme non-model species


Chromosome sequencing

• Isolate chromosome arms using flow cytometry

• Generate NGS libraries and paired-end (PE) Illumina data

• Assemble de novo
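De novo assembly of the sorted chromosome-arm reads can be illustrated with a toy de Bruijn graph. This is a minimal sketch, not the assembler used in this work (the slides do not name one): it assumes error-free reads from one strand and ignores reverse complements, paired-end information and repeats, all of which real assemblers must handle. The function name `assemble_contigs` and the example reads are hypothetical.

```python
from collections import defaultdict


def assemble_contigs(reads, k):
    """Toy de Bruijn assembly: nodes are (k-1)-mers, edges are distinct k-mers."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    successors, predecessors = defaultdict(set), defaultdict(set)
    for kmer in kmers:
        successors[kmer[:-1]].add(kmer[1:])
        predecessors[kmer[1:]].add(kmer[:-1])
    nodes = sorted(set(successors) | set(predecessors))

    def unbranched(node):
        # One way in and one way out: the path can be extended through this node
        return len(predecessors[node]) == 1 and len(successors[node]) == 1

    contigs = []
    for start in nodes:
        if unbranched(start):
            continue  # contigs start only at branch points or graph ends
        for nxt in sorted(successors[start]):
            path = [start, nxt]
            while unbranched(path[-1]):
                path.append(next(iter(successors[path[-1]])))
            # Spell the path: first node plus the last base of every following node
            contigs.append(path[0] + "".join(node[-1] for node in path[1:]))
    return contigs


reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
print(assemble_contigs(reads, k=4))  # ['ATGGCGTGCAA']
```

Unbranched stretches of the graph are merged into contigs (unitigs); branching nodes, typically caused by repeats or sequencing errors, end a contig. Cycles made only of unbranched nodes are not reported by this sketch.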

Wheat genome

17 billion bases

Image source: http://www.jic.ac.uk/staff/graham-moore/wheat_meiosis.htm

Mapping reads to reference genomes
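Mapping reads to a reference usually starts by looking up short exact matches (seeds) in an index and then verifying the full alignment. The sketch below uses a hash-table k-mer index and the pigeonhole principle: if a read yields more than `max_mismatches` non-overlapping seeds, at least one seed must match the reference exactly. It allows substitutions only (no indels) on the forward strand; production aligners such as BWA or Bowtie 2 use compressed (FM) indexes and gapped alignment instead. All names and sequences here are hypothetical.

```python
from collections import defaultdict


def build_index(reference, k):
    """Hash every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Return (start, mismatches) for every ungapped placement within max_mismatches."""
    candidates = set()
    for offset in range(0, len(read) - k + 1, k):  # non-overlapping seeds
        for pos in index.get(read[offset:offset + k], []):
            candidates.add(pos - offset)  # implied start of the whole read
    hits = []
    for start in sorted(candidates):
        if start < 0 or start + len(read) > len(reference):
            continue  # read would run off the end of the reference
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


reference = "GATTACAGGCTTACCGATAGC"
index = build_index(reference, k=4)
print(map_read("GGCTAACC", reference, index, k=4))  # [(7, 1)]: one substitution
```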

Sequencing wheat chromosome arms

[Figure: wheat (Ta) chromosome arm 7DS compared with Brachypodium distachyon (Bd) chromosomes 1 and 3]

www.wheatgenome.info

Berkman, et al., Plant Biotechnology Journal (2011)


Wheat genome evolution

[Figure: AA + BB → AABB (50,000 years ago); AABB + DD → AABBDD (10,000 years ago); the AABB and DD lineages are also shown separately]

A little history

Source: http://www.nap.edu/openbook.php?record_id=12692&page=94

Wheat genome evolution

• When two genomes come together, genes are lost because two copies may not be required, or may even be harmful

• Can we see differential gene loss between the three wheat genomes?

Figure 1 Wheat genome evolution

The number of conserved genes within the syntenic builds for chromosomes 7A, 7B and 7D
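Differential gene loss between homeologues can be summarised by recording, for each gene, which of the 7A, 7B and 7D syntenic builds it was found in. A minimal sketch, assuming each build has been reduced to a set of gene identifiers; the identifiers below are hypothetical and are not the data behind Figure 1. Absence from a draft assembly can also reflect an assembly gap rather than true gene loss.

```python
def presence_patterns(builds):
    """builds: dict build name -> set of conserved gene IDs.
    Returns dict: tuple of builds containing a gene -> number of such genes."""
    names = sorted(builds)
    all_genes = set().union(*builds.values())
    counts = {}
    for gene in all_genes:
        pattern = tuple(name for name in names if gene in builds[name])
        counts[pattern] = counts.get(pattern, 0) + 1
    return counts


def missing_per_build(builds):
    """Genes found in at least one build but absent from this one."""
    all_genes = set().union(*builds.values())
    return {name: len(all_genes - genes) for name, genes in builds.items()}


# Hypothetical gene IDs, for illustration only
builds = {
    "7A": {"g1", "g2", "g3", "g4", "g7"},
    "7B": {"g1", "g2", "g5", "g7"},
    "7D": {"g1", "g3", "g5", "g6", "g7"},
}
print(presence_patterns(builds)[("7A", "7B", "7D")])  # genes retained in all three: 2
print(missing_per_build(builds))
```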

Wheat genome evolution

• Are there differences in the types of genes lost?

• Conservation of highly networked genes under neutral selection

• Strong selection pressure breaks networks and leads to loss of networked genes
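One way to ask whether lost genes were more highly networked is to compare node degree (number of interaction partners) between retained and lost genes. The sketch assumes an undirected network given as an edge list without duplicate edges; the edges and gene names are hypothetical, and the slides do not describe how the 7A, 7B and 7D networks were built.

```python
from collections import Counter
from statistics import mean


def node_degrees(edges):
    """Number of interaction partners per gene in an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def mean_degree(edges, genes):
    degree = node_degrees(edges)
    return mean(degree[g] for g in genes)  # genes absent from the network count as degree 0


# Hypothetical network and gene sets
edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3"), ("g5", "g6")]
retained = {"g1", "g2", "g3"}
lost = {"g4", "g5", "g6"}
print(mean_degree(edges, retained), mean_degree(edges, lost))
```

A higher mean degree among retained genes would fit the idea that networked genes are conserved under neutral selection; a formal comparison would need a statistical test over many genes.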

7A gene network

7B gene network

7D gene network

Wheat genome evolution

[Figure: AA + BB → AABB (50,000 years ago); AABB + DD → AABBDD (10,000 years ago); labels: neutral selection, strong selection]

SGSautoSNP
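SGSautoSNP calls SNPs between varieties from reads mapped to a reference. The sketch below is a simplified illustration, not the tool's implementation: it assumes the bases at one reference position are already grouped by variety, counts a variety's allele only when at least `min_reads` reads support it (a hypothetical threshold), and rejects any site where a single variety shows two alleles, on the assumption that in an inbred hexaploid such sites more likely reflect homeologous reads or errors than varietal differences.

```python
from collections import Counter


def intervarietal_snp(pileup, min_reads=2):
    """pileup: dict variety -> bases observed at one reference position.
    Returns {variety: allele} if the called varieties differ, else None."""
    calls = {}
    for variety, bases in pileup.items():
        counts = Counter(b for b in bases if b in "ACGT")
        if len(counts) > 1:
            return None  # two alleles within one variety: likely homeologue or error (assumed rule)
        if not counts:
            continue  # no coverage for this variety
        allele, depth = next(iter(counts.items()))
        if depth >= min_reads:
            calls[variety] = allele
    if len(set(calls.values())) < 2:
        return None  # no difference between the varieties that could be called
    return calls


site = {"Alsen": "AAA", "Chara": "GG", "Yitpi": "A"}  # hypothetical pileup
print(intervarietal_snp(site))  # Yitpi has one read only, so it is left uncalled
```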

Australian resequencing

4 million SNPs

| Chromosome | SNPs | SNPs/Mb |
|---|---:|---:|
| 7A | 1,486,040 | 4,077 |
| 7B | 1,860,295 | 4,737 |
| 7D | 671,976 | 1,939 |
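The SNP counts and densities in this table can be cross-checked with a little arithmetic: dividing SNPs by SNPs/Mb gives the length (in Mb) each density was calculated over. The back-calculated lengths are approximate because the densities are rounded, and the slides do not say whether they refer to assembly or chromosome length.

```python
# SNP counts and densities from the slide table
snp_counts = {"7A": 1_486_040, "7B": 1_860_295, "7D": 671_976}
snps_per_mb = {"7A": 4077, "7B": 4737, "7D": 1939}

total_snps = sum(snp_counts.values())  # 4,018,311
# Back-calculated length (Mb) behind each density (approximate: densities are rounded)
implied_mb = {chrom: snp_counts[chrom] / snps_per_mb[chrom] for chrom in snp_counts}
# SNP density of each chromosome relative to 7D
relative_to_7d = {chrom: snps_per_mb[chrom] / snps_per_mb["7D"] for chrom in snps_per_mb}

print(f"Total SNPs: {total_snps:,}")
for chrom in snp_counts:
    print(f"{chrom}: ~{implied_mb[chrom]:.0f} Mb, {relative_to_7d[chrom]:.2f}x the 7D density")
```

The three counts sum to about 4.02 million, matching the slide title, and 7A and 7B carry roughly 2.1 and 2.4 times as many SNPs per Mb as 7D.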

Wheat genome evolution

[Figure: AA + BB → AABB (50,000 years ago); AABB + DD → AABBDD (10,000 years ago); the AABB and DD lineages are also shown separately; labels: genetic exchange, no genetic exchange]


SNP matrix

| | AC Barrie | Alsen | Baxter | Chara | Drysdale | Excalibur | Gladius | H45 | Kukri | Pastor | RAC875 | VolcaniDDI | Westonia | Wyalkatchem | Xiaoyan 54 | Yitpi |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| AC Barrie | 0 |
| Alsen | 194,725 | 0 |
| Baxter | 328,294 | 246,218 | 0 |
| Chara | 592,193 | 438,075 | 146,171 | 0 |
| Drysdale | 429,530 | 319,401 | 392,632 | 730,606 | 0 |
| Excalibur | 346,557 | 273,217 | 324,087 | 567,179 | 367,279 | 0 |
| Gladius | 529,898 | 327,659 | 472,457 | 906,611 | 616,253 | 491,885 | 0 |
| H45 | 385,753 | 265,113 | 339,227 | 627,589 | 298,414 | 280,576 | 519,690 | 0 |
| Kukri | 245,356 | 208,666 | 290,506 | 541,524 | 428,134 | 318,029 | 480,575 | 345,358 | 0 |
| Pastor | 302,731 | 289,053 | 340,269 | 603,323 | 336,029 | 284,559 | 552,119 | 309,025 | 302,231 | 0 |
| RAC875 | 412,818 | 257,630 | 390,967 | 722,089 | 429,038 | 368,152 | 158,973 | 386,145 | 418,037 | 375,137 | 0 |
| VolcaniDDI | 508,175 | 413,676 | 412,553 | 808,658 | 696,467 | 600,478 | 813,067 | 633,916 | 498,017 | 586,694 | 643,205 | 0 |
| Westonia | 354,599 | 276,490 | 310,192 | 623,591 | 500,461 | 362,800 | 557,464 | 405,842 | 346,683 | 349,542 | 403,411 | 678,631 | 0 |
| Wyalkatchem | 525,289 | 341,043 | 433,228 | 800,300 | 560,759 | 327,888 | 386,213 | 449,614 | 436,777 | 442,941 | 235,924 | 800,137 | 505,345 | 0 |
| Xiaoyan 54 | 458,214 | 332,986 | 368,604 | 761,864 | 540,264 | 324,881 | 696,677 | 377,053 | 401,191 | 413,462 | 522,021 | 897,807 | 622,449 | 569,223 | 0 |
| Yitpi | 544,440 | 328,216 | 468,743 | 968,088 | 690,017 | 548,694 | 233,539 | 587,310 | 530,687 | 580,060 | 287,648 | 951,537 | 654,967 | 444,084 | 844,785 | 0 |

Phylogenetic tree
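The slides do not state how the tree was built from the SNP matrix. As an illustration, UPGMA (average-linkage clustering) turns a pairwise distance matrix into a rooted tree. Here the raw SNP counts are used directly as distances, which is an assumption; a real analysis might normalise by the number of sites covered in each pair, or use neighbour-joining. The example uses four varieties from the matrix.

```python
def upgma(names, matrix):
    """names: list of n labels; matrix: n x n symmetric distances.
    Returns the tree as nested tuples."""
    trees = {i: name for i, name in enumerate(names)}
    sizes = {i: 1 for i in trees}
    dist = {(i, j): matrix[i][j] for i in trees for j in trees if i < j}
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        # Size-weighted average distance from the merged cluster to every other cluster
        new = {}
        for k in trees:
            if k in (i, j):
                continue
            dik = dist[(min(i, k), max(i, k))]
            djk = dist[(min(j, k), max(j, k))]
            new[k] = (dik * sizes[i] + djk * sizes[j]) / (sizes[i] + sizes[j])
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        for k, d in new.items():
            dist[(k, next_id)] = d  # k < next_id, so keys stay ordered
        next_id += 1
    return next(iter(trees.values()))


names = ["AC Barrie", "Alsen", "Baxter", "Chara"]
matrix = [  # pairwise SNP counts from the matrix above
    [0, 194725, 328294, 592193],
    [194725, 0, 246218, 438075],
    [328294, 246218, 0, 146171],
    [592193, 438075, 146171, 0],
]
print(upgma(names, matrix))  # (('Baxter', 'Chara'), ('AC Barrie', 'Alsen'))
```

UPGMA assumes a roughly constant rate of change across lineages, which is one reason neighbour-joining is often preferred for real data.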


GBrowse http://wheatgenome.info/


Chickpea kabuli reference


Kabuli reference

Kabuli reference

[Figure labels: Desi, Kabuli]


Chickpea desi vs kabuli


Desi reference

Ruperao et al. Plant Biotechnology Journal (in press)

[Figure labels: Desi, Kabuli, Desi WGS]


Skim GBS based genome validation

• Skim GBS SNP calling

• Make metaSNPs

• Merge contigs

• Genetic map

• Compare all blocks against all

• Apply clustering
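The all-against-all comparison and clustering steps can be sketched as follows, assuming each haplotype block has been reduced to one genotype string across the mapping population (one character per individual, `-` for missing). How metaSNPs were actually defined is not described on the slides. The discordance between two blocks (fraction of individuals with different calls) is used as a rough proxy for recombination fraction, and blocks within a hypothetical threshold `max_r` are joined by single-linkage clustering (union-find).

```python
def discordance(g1, g2, missing="-"):
    """Fraction of individuals with different calls, ignoring missing data."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a != missing and b != missing]
    if not pairs:
        return 1.0  # no shared information: treat as unlinked
    return sum(a != b for a, b in pairs) / len(pairs)


def linkage_groups(blocks, max_r=0.2):
    """Single-linkage clustering of haplotype blocks into linkage groups."""
    names = list(blocks)
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):  # compare all blocks against all
        for b in names[i + 1:]:
            if discordance(blocks[a], blocks[b]) <= max_r:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


blocks = {  # hypothetical block genotypes across 8 individuals
    "c1": "AABBABAB",
    "c2": "AABBAB-A",
    "c3": "ABABBBAA",
    "c4": "ABABBBAB",
}
print(linkage_groups(blocks))  # [['c1', 'c2'], ['c3', 'c4']]
```

Ordering blocks within each linkage group would be a separate step, for example by minimising total discordance between neighbouring blocks.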

Skim GBS

• Identify SNPs by sequencing the parents and running SGSautoSNP

• Skim-sequence the segregating population at low coverage

• Map the reads to the reference genome

• Call the genotype wherever reads cover a previously defined SNP

• Impute and clean the calls to define haplotype blocks

Genotype calling

Call the genotype at previously predicted SNPs

[Figure: example predicted SNPs C/A and T/C]


Haplotype blocks

TN1 A G G T C C A G G A T A A T

TN2 A G G T C C A G G A T A A T

TN3 T C C A G G C G G A T A A T

TN4 A G G T C C A G G A T A A T

TN5 T C C A G G C T C G C G G C

TN6 A G G T C C A G G A T A A T

TN7 T C C A G G C T C G C G G C

T A G G T C C A G G A T A A T

N T C C A G G C T C G C G G C
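Because SNP positions and parental alleles are already known from SGSautoSNP, genotyping a skim-sequenced individual reduces to checking which parental allele(s) its reads show at each predicted SNP. A minimal sketch, assuming the rows labelled T and N above are the two parental haplotypes; the function name and the `H` (both alleles) and `-` (no call) codes are illustrative choices.

```python
def call_genotypes(parent_t, parent_n, observed):
    """parent_t / parent_n: alleles of the two parents at each predicted SNP.
    observed: for each SNP, the set of alleles seen in one individual's reads.
    Returns one character per SNP: T, N, H (both alleles) or - (no call)."""
    calls = []
    for t_allele, n_allele, seen in zip(parent_t, parent_n, observed):
        has_t, has_n = t_allele in seen, n_allele in seen
        if has_t and has_n:
            calls.append("H")
        elif has_t:
            calls.append("T")
        elif has_n:
            calls.append("N")
        else:
            calls.append("-")  # no read covers the SNP, or only a non-parental allele was seen
    return "".join(calls)


parent_t = "AGGTCCAGGATAAT"  # row T
parent_n = "TCCAGGCTCGCGGC"  # row N
tn3 = [{base} for base in "TCCAGGCGGATAAT"]  # row TN3, one read allele per SNP
print(call_genotypes(parent_t, parent_n, tn3))  # NNNNNNNTTTTTTT
```

TN3 switches from the N to the T haplotype between the 7th and 8th SNP, a recombination breakpoint, while TN1, TN2, TN4 and TN6 match T throughout and TN5 and TN7 match N.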


Pre-imputation


After imputation and cleaning
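Skim sequencing leaves many SNPs without a call, and single-read calls are error-prone. A simple cleaning and imputation scheme, assumed here rather than taken from the slides: overwrite an isolated call that disagrees with identical called neighbours on both sides, then fill missing calls lying between two called SNPs with the same genotype.

```python
def clean(calls, missing="-"):
    """Overwrite a call that disagrees with identical called neighbours on both sides."""
    out = list(calls)
    called = [i for i, c in enumerate(calls) if c != missing]
    for left, mid, right in zip(called, called[1:], called[2:]):
        if calls[left] == calls[right] != calls[mid]:
            out[mid] = calls[left]
    return "".join(out)


def impute(calls, missing="-"):
    """Fill missing calls between two called SNPs that carry the same genotype."""
    out = list(calls)
    called = [i for i, c in enumerate(calls) if c != missing]
    for left, right in zip(called, called[1:]):
        if calls[left] == calls[right]:
            for i in range(left + 1, right):
                out[i] = calls[left]
    return "".join(out)


raw = "TN-T--T-NN-N"  # hypothetical skim calls for one individual
print(impute(clean(raw)))  # TTTTTTT-NNNN: the gap at the breakpoint stays uncalled
```

The cleaning rule will also erase a genuine double crossover between adjacent called SNPs; real pipelines usually weigh read depth and expected recombination before discarding a call.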


Clustering


Clustering

Linkage group (LG) 1 after ordering

Trait association

Disease resistance in canola

Drought tolerance in chickpea


Gene loss

Cabbage

Brussels sprout

Cauliflower

Kale

Kohlrabi

Wild B. oleracea

Brassica pan-genome

• List all Brassica genes

• Essential (conserved)

• Optional (presence/absence variation, PAV)

• Associate PAVs with traits

• Relate the abundance of optional genes to fitness
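Splitting a pan-genome into essential and optional genes only needs a presence/absence matrix. A minimal sketch, assuming presence/absence has already been called for every gene in every line (for example from read coverage); the gene names and presence values are hypothetical, and the line names are taken from the B. oleracea morphotypes above. Counting optional genes per line is one possible input for relating their abundance to fitness.

```python
def classify_pan_genome(pav):
    """pav: dict gene -> dict line -> True/False (present/absent).
    Returns (essential, optional) gene lists."""
    essential, optional = [], []
    for gene, presence in pav.items():
        (essential if all(presence.values()) else optional).append(gene)
    return sorted(essential), sorted(optional)


def optional_gene_counts(pav, optional):
    """Number of optional genes carried by each line."""
    lines = sorted({line for presence in pav.values() for line in presence})
    return {line: sum(pav[gene][line] for gene in optional) for line in lines}


pav = {  # hypothetical presence/absence calls
    "geneA": {"cabbage": True, "cauliflower": True, "kale": True},
    "geneB": {"cabbage": True, "cauliflower": False, "kale": True},
    "geneC": {"cabbage": False, "cauliflower": False, "kale": True},
}
essential, optional = classify_pan_genome(pav)
print(essential, optional)  # ['geneA'] ['geneB', 'geneC']
print(optional_gene_counts(pav, optional))  # {'cabbage': 1, 'cauliflower': 0, 'kale': 2}
```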

Seagrass

Manatee grazing on seagrass (picture by David Peart).

Manacheese?


Seagrass

GO terms for genes lost in seagrass

| GO ID | Term |
|---|---|
| GO:0018871 | 1-aminocyclopropane-1-carboxylate metabolic process |
| GO:0042218 | 1-aminocyclopropane-1-carboxylate biosynthetic process |
| GO:0009692 | ethylene metabolic process |
| GO:0009693 | ethylene biosynthetic process |
| GO:0043449 | cellular alkene metabolic process |
| GO:0043450 | alkene biosynthetic process |
| GO:1900673 | olefin metabolic process |
| GO:1900674 | olefin biosynthetic process |
| GO:0048447 | sepal morphogenesis |
| GO:0048451 | petal formation |
| GO:0048453 | sepal formation |
| GO:0048442 | sepal development |
| GO:0048464 | flower calyx development |
| GO:0048446 | petal morphogenesis |
| GO:0010044 | response to aluminum ion |
| GO:0071281 | cellular response to iron ion |
| GO:0010039 | response to iron ion |
| GO:0010105 | negative regulation of ethylene mediated signalling pathway |
| GO:0070298 | negative regulation of phosphorelay signal transduction system |
| GO:0048441 | petal development |
| GO:0048465 | corolla development |
| GO:0071248 | cellular response to metal ion |
| GO:0009963 | positive regulation of flavonoid biosynthetic process |
| GO:0010104 | regulation of ethylene mediated signalling pathway |
| GO:0070297 | regulation of phosphorelay signal transduction system |
| GO:1900378 | positive regulation of secondary metabolite biosynthetic process |
| GO:0071241 | cellular response to inorganic substance |
| GO:0009956 | radial pattern formation |
| GO:0010375 | stomatal complex patterning |
| GO:0048729 | tissue morphogenesis |
| GO:2000038 | regulation of stomatal complex development |


Conclusions

• Build high quality genome assemblies

• Identify variation between genomes

• Associate genome variation with agronomic traits

• Apply diverse genomic knowledge to improve crops

Acknowledgements

Philipp Bayer

Kenneth Chan

Pradeep Ruperao

Michal Lorenc

Agnieszka Golicz

Kaitao Lai

Paul Visendi

Paula Martinez

Jenny Lee

Juan Montenegro

Paul Berkman

Jiri Stiller

Sahana Manoli

Jacqueline Batley

Alice Hayward

Emma Campbell

Jessica Dalton-Morgan

Satomi Hayashi

Reece Tollenaere

Hana Šimková

Marie Kubaláková

Jaroslav Doležel

Tim Sutton

Deepa Jaganathan

Rajeev Varshney

(and colleagues)

Martin Schliep

Rudy Dolferus

Peter Ralph

Contact:

[email protected]

Acknowledgements

Kaitao Lai

Philipp Bayer

Kenneth Chan

Michal Lorenc

Agnieszka Golicz

Paul Visendi

Pradeep Ruperao

Paul Berkman

Jiri Stiller

Sahana Manoli

Jacqueline Batley

Alice Hayward

Emma Campbell

Jessica Dalton-Morgan

Satomi Hayashi

Hana Šimková

Marie Kubaláková

Jaroslav Doležel

Contact:

[email protected]

Advisory Board: Jeff Bennetzen, Jose Crossa, Robert Henry, Rodomiro Ortiz, Andrew Paterson, Kadambot Siddique, Mark Sorrells, Mark Tester, Michael Udvardi