Plant Omics Resources 4 June, 2010 Jongsun Park Current Status of Plant Genome Projects with Next Generation Sequencing Technologies Fungal Bioinformatics Laboratory, Seoul National University

Plant Omics Resources - Amborella · 2010-06-05 · Professor in plant pathology in Pennsylvania State University Bongsoo Park, PhD student Wonho Song, Undergraduate student Kyohoon


Page 1

Plant Omics Resources

4 June, 2010

Jongsun Park

Current Status of Plant Genome Projects with Next Generation Sequencing Technologies

Fungal Bioinformatics Laboratory, Seoul National University

Page 2

Next Generation Sequencing (NGS) Technologies and de novo Assembly Problems

Page 3

A Huge Amount of Sequence Data in NCBI

- NCBI, the major sequence repository, shows rapid growth in the number of deposited sequences.

http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html

99,116,431,942 bp

Page 4

Next Generation Sequencing (NGS) Technologies

- Next (or current) generation sequencing technologies have accelerated genome sequencing projects and have broadened the range of applications of genome sequences.

Solexa; Illumina

SOLiD; ABI

GS-Titanium; Roche 454

SMRT; Pacific Biosciences

Helicos; Helicos BioSciences

Page 5

NGS: 454 Technology

Page 6

NGS: Solexa Technology

Page 7

NGS: SOLiD Technology (1)

Page 8

NGS: SOLiD Technology (2)

Page 9

NGS: SOLiD Technology (3)

Page 10

Capacity of Next Generation Sequencers

ABI 3730; ABI: 96 x 1,000 bp = 96,000 bp = about 100 Kb
GS-Titanium; Roche 454: 950,000 x 450 bp = 427,500,000 bp = 427.5 Mb
Solexa GA2; Illumina: 30,000,000 x 7 x (101 x 2) bp = 42,420,000,000 bp = 42.4 Gb
SOLiD 4; ABI: 940,000,000 x 75 bp (50+25) = 70,500,000,000 bp = 70.5 Gb
HiSeq2000; Illumina: 30,000,000 x 7 x (101 x 2) x 4 bp = 169,680,000,000 bp = 169.7 Gb
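The per-run capacities on this slide are simple products, so they can be rechecked with a few lines of Python. The sketch below is only an illustration: the factors are copied from the slide, while the dictionary layout and the helper names `bases_per_run` and `human` are assumptions made for this example. The slide does not say what each factor means; reading them as reads, lanes, and paired-end read length is a guess.

```python
from math import prod

# Multiplicative factors copied from the slide (interpretation of each
# factor, e.g. reads x lanes x paired-end length, is an assumption).
RUNS = {
    "ABI 3730; ABI": [96, 1_000],
    "GS-Titanium; Roche 454": [950_000, 450],
    "Solexa GA2; Illumina": [30_000_000, 7, 101 * 2],
    "SOLiD 4; ABI": [940_000_000, 50 + 25],
    "HiSeq2000; Illumina": [30_000_000, 7, 101 * 2, 4],
}


def bases_per_run(factors):
    """Total bases per run: the product of all factors."""
    return prod(factors)


def human(bp):
    """Format a base count with one decimal in Gb, Mb, or Kb."""
    for unit, scale in (("Gb", 10**9), ("Mb", 10**6), ("Kb", 10**3)):
        if bp >= scale:
            return f"{bp / scale:.1f} {unit}"
    return f"{bp} bp"


for name, factors in RUNS.items():
    total = bases_per_run(factors)
    print(f"{name}: {total:,} bp = {human(total)}")
```

Keeping the factors separate rather than hard-coding the totals makes it easy to change one assumption, such as the read length, and see the effect on the run capacity.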

Page 11

Pros and Cons of NGS Technologies

Pros

- Large number of reads per run
- Low sequencing cost
- Diverse applications, not only for genomics but also for transcriptomics and small RNAs

Cons

- Short read length, e.g. 36 bp to 101 bp
- Different types of sequencing quality
- Difficulty in handling large result files
- De novo assembly is almost impossible

Page 12

Whole Genome Shotgun Sequencing Strategy

- An assembly process is essential for a genome project because each sequence read is shorter than 1 kb.

- For Sanger sequences, assembly was conducted with several popular programs, such as phrap and PCAP3.

Figure labels: Genome Assembly, Scaffolding, Assembled genome

Page 13

Genome Assembly Process

- We can perform genome assembly manually!

23 sequences should be compared with each other!

23C2 = 23 x 22 / 2 = 253 comparisons!

Page 14

Example of Genome Assembly: Vitis vinifera

Pair-wise comparison of 6,200,000 reads

6,200,000C2 = 19,219,996,900,000 comparisons
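The comparison counts on these slides all follow from the same formula, nC2 = n(n - 1)/2. As a small sketch, the function below (the name `pairwise_comparisons` is chosen for this example) recomputes the count for the 23-sequence toy case, the 6,200,000 Vitis vinifera reads, and the 563,466,202 short reads quoted on a later slide.

```python
from math import comb


def pairwise_comparisons(n_reads: int) -> int:
    """Number of unordered read pairs: nC2 = n * (n - 1) / 2."""
    return comb(n_reads, 2)


# Read counts taken from the slides.
for n in (23, 6_200_000, 563_466_202):
    print(f"{n:>13,} reads -> {pairwise_comparisons(n):,} comparisons")
```

Because the count grows quadratically, a roughly 90-fold increase in reads (6.2 million to 563 million) raises the number of comparisons by more than 8,000-fold, which is why all-against-all overlap detection does not scale to short-read data.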

Page 15

Genome Assemblers

http://www.phrap.org/

Page 16

Short-read Sequence Assembly (1)

- Short-read sequences generated by NGS machines cause several problems for well-established genome assemblers.

- Too many reads require nearly infinite computational power.

- Reads that are too short cannot provide reliable overlaps for building long contigs.

- To reduce the computational power required, a new algorithm was needed.

- Short reads require a different strategy for building reliable contig sequences.

- Handling so many sequences also causes several technical problems.

Page 17

Short-read Sequence Assembly (2)

563,466,202C2 = 158,747,080,116,419,301 comparisons?!

563,466,202

Page 18

De Bruijn Graph Algorithm

- This algorithm is used to find overlapping short-read sequences quickly by combining k-mer sequences.

K = 3

GCAAAACACTT…

GCA

CAA

AAA

AAA

AAC
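The k-mer decomposition shown on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular assembler: the function names `kmers` and `de_bruijn_graph` are made up for this example, and only the visible part of the slide's sequence (before the "…") is used as input.

```python
from collections import defaultdict


def kmers(seq, k):
    """All overlapping substrings of length k, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn_graph(reads, k):
    """Each k-mer becomes an edge from its (k-1)-mer prefix to its suffix."""
    graph = defaultdict(list)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Visible part of the example sequence on the slide (truncated there).
read = "GCAAAACACTT"
print(kmers(read, 3))
for prefix, suffixes in de_bruijn_graph([read], 3).items():
    print(prefix, "->", suffixes)
```

Reads that share a k-mer meet at the same edge of the graph, so overlaps are found by hashing k-mers instead of comparing every pair of reads.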

Page 19

Genome Assemblers For Short Read Sequences

Page 20

Examples of Plant Genome de novo Assembly

5,937,915,739 bp

# of contigs     950
Total length     976,089 bp
Maximum length   12,606 bp
Average length   1,027.46 bp
N50 length       3,061 bp

Species              # of contigs   Total length   Maximum length   Average length   N50 length
Lithocarpus hancei   462,868        112,614,098    1,748            243.30           237
Ficus altissima      355,052        87,502,701     1,090            246.45           239
Ficus altissima      132,590        33,293,636     1,688            251.10           245
Ficus altissima      247,376        61,369,608     1,334            248.08           241
Ficus tinctoria      337,777        87,427,716     1,274            258.83           248
Ficus tinctoria      476,937        116,554,688    1,578            244.38           3061
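The assembly statistics in these tables (number of contigs, total, maximum and average length, N50) can be derived from a list of contig lengths. The sketch below uses the common definition of N50: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the assembly. The function names and the toy contig lengths are assumptions for illustration only and are not data from the slide.

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one contig")


def assembly_stats(lengths):
    """Summary values matching the rows of the assembly tables."""
    return {
        "contigs": len(lengths),
        "total": sum(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        "n50": n50(lengths),
    }


# Hypothetical toy contig lengths, not taken from the slide.
toy_contigs = [2, 3, 4, 5, 6]
print(assembly_stats(toy_contigs))
```

By this definition N50 can never exceed the maximum contig length, which gives a quick sanity check when reading assembly tables.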

Page 21

Giant Panda Genome Project

Page 22

Current Status of Plant Genome Projects

Page 23

8 Unpublished Higher Plant Genomes

Species name                 Method     Size (Mb)   # of contigs   # of transcripts
Arabidopsis lyrata           WGS        206.67      695            32,670
Medicago truncatula          BAC, WGS   278.69      9              38,334
Selaginella moellendorffii   WGS        212.76      768            22,285
Lycopersicon esculentum      WGS, BAC   794.60      7,409          49,389
Solanum phureja              WGS        702.58      57,681         110,512
Ricinus communis             WGS        362.47      28,518         38,613
Mimulus guttatus             WGS        416.66      11,243         47,442
Manihot esculenta            WGS        321.73      2,216          27,501

Page 24

13 Published Higher Plant Genomes from 10 Species

Species name                       Journal                             Method          Size (Mb)   # of contigs   # of transcripts
Arabidopsis thaliana               Nature, 2000                        BAC +α          119.19      5              32,615
Oryza sativa japonica              Science, 2002; Nature, 2005         BAC +α          372.08      12             66,710
Oryza sativa indica                Science, 2002; PLoS Biology, 2005   WGS             426.32      10,267         49,710
Oryza sativa japonica (Syngenta)   PLoS Biology, 2005                  WGS             391.14      7,777          45,824
Populus trichocarpa                Science, 2006                       WGS             485.51      22,012         45,555
Vitis vinifera                     Nature, 2007                        WGS, Complete   497.51      35             30,434
Carica papaya                      Nature, 2008                        WGS             369.69      17,677         28,589
Lotus japonicus                    DNA Research, 2008                  WGS             323.24      110,945        26,700
Sorghum bicolor                    Nature, 2009                        WGS             738.54      3,304          36,338
Zea mays                           Science, 2009                       BAC, WGS        2,061.02    11             53,764
Cucumis sativus                    Nature Genetics, 2009               WGS             243.57      47,488         26,682
Glycine max                        Nature, 2010                        WGS             996.90      4,262          62,199
Brachypodium distachyon            Nature, 2010                        WGS             273.27      197            32,255

All pictures are from Wikipedia.

Page 25

7 Unicellular Plant Genomes

Species name                       Journal             Method   Size (Mb)   # of contigs   # of transcripts
Chlamydomonas reinhardtii          Science, 2007       WGS      112.31      88             16,709
Micromonas pusilla CCMP1545        Science, 2009       WGS      22.04       27             10,547
Micromonas sp. RCC299              Science, 2009       WGS      20.99       17             10,108
Ostreococcus lucimarinus CCE9901   PNAS, 2007          WGS      13.2        21             7,488
Ostreococcus sp. RCC809            Not published yet   WGS      13.41       22             7,492
Ostreococcus tauri                 PNAS, 2007          WGS      12.58       118            7,725
Coccomyxa sp. C169                 Not published yet   WGS      48.95       45             9,629

Page 26

Distribution of Genome Sizes of 28 Plants

Figure: bar chart of the 28 genome sizes in ascending order (y-axis: Mb, 0 to 2,500), with the unicellular plants labeled. Bar values: 13, 13, 13, 21, 22, 49, 112, 119, 207, 213, 244, 273, 279, 322, 323, 362, 370, 372, 391, 417, 426, 486, 498, 703, 739, 795, 997, 2061.

Page 27

Distribution of Number of Transcripts of 21 Plants

Figure: bar chart of the number of transcripts per genome in ascending order (y-axis: # of transcripts, 0 to 120,000), with the unicellular plants labeled. Bar values: 7,488; 7,492; 7,725; 9,629; 10,108; 10,547; 16,709; 22,285; 26,682; 26,700; 27,501; 28,589; 30,434; 32,255; 32,615; 32,670; 36,338; 38,334; 38,613; 45,555; 45,824; 47,442; 49,389; 49,710; 53,764; 62,199; 66,710; 110,512.

Page 28

Relationship between Genome Size and Transcripts

Figure: number of transcripts per Mb of genome (y-axis: # of transcripts / genome size (Mb), 0 to 700), with the unicellular plants labeled. Genome sizes (Mb) marked on the chart: 13, 13, 13, 49, 21, 22, 112, 119, 372.
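The ratio plotted on this slide, number of transcripts divided by genome size in Mb, can be recomputed from the genome tables on the earlier slides. The sketch below does this for four genomes picked from those tables. The selection, the dictionary layout, and the function name `transcripts_per_mb` are assumptions made for this example.

```python
# (genome size in Mb, number of transcripts), values from the genome tables.
GENOMES = {
    "Arabidopsis thaliana": (119.19, 32_615),
    "Zea mays": (2_061.02, 53_764),
    "Chlamydomonas reinhardtii": (112.31, 16_709),
    "Ostreococcus tauri": (12.58, 7_725),
}


def transcripts_per_mb(size_mb, n_transcripts):
    """Transcript density: transcripts per megabase of assembled genome."""
    return n_transcripts / size_mb


# Print the genomes from the densest to the sparsest.
ranked = sorted(
    GENOMES.items(),
    key=lambda item: transcripts_per_mb(*item[1]),
    reverse=True,
)
for name, (size_mb, n_transcripts) in ranked:
    density = transcripts_per_mb(size_mb, n_transcripts)
    print(f"{name}: {density:.1f} transcripts/Mb")
```

In this small sample the smallest genome has by far the highest transcript density and the largest genome the lowest, in line with the upper range of the chart's axis.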

Page 29

Comparison with Genomes in Other Kingdoms

Group          # of species   Average genome size (Mb)
Streptophyta   21             505
Chlorophyta    8              35
Chordata       48             2556
Arthropoda     31             302
Oomycetes      6              103
Fungi          244            23

Page 30

Cucumber Genome Sequences

Page 31

Plant Omics Resources (POR)

Page 32

Plant Genome Resources: Phytozome

http://www.phytozome.org/

Page 33

Plant Genome Resources: PlantGDB

http://www.plantgdb.org/

Page 34

National Center for Biotechnology Information

http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html

Page 35

Pros and Cons of Current Repositories

- Each plant genome repository has its own strategy for managing and presenting plant genome information.

- All three repositories contain additional information, such as ESTs, genetic maps, and mutant libraries.

- All three databases provide bioinformatics tools for further analyses, such as BLAST search.

- Visualization tools, such as genome browsers, are also provided.

- However, the range of plant genomes and their versions differ slightly among the repositories.

- Additionally, few bioinformatics tools are offered beyond BLAST search.

Page 36

Comparative Fungal Genomics Platform

Park et al. (2008) CFGP: Comparative Fungal Genomics Platform. Nucleic Acids Research, 36, D562-D571.

Page 37

CFGP: Three-layered Structure

- Standardized Fungal Genome Data Warehouse
- Middleware
- Data-driven User Interface

11 DB servers, 47 PC nodes, 6 Web servers, 8 other servers

Chart labels: Bacteria, Eukaryotes, Fungi, Arthropoda, Metazoa, Viridiplantae (values: 9, 22, 252, 32, 6029)

Page 38

CFGP: Middleware – Fungal Matrix

- High-performance computing power is required for annotating large numbers of sequences.

11 DB servers + 47 PC nodes + 6 Web servers + 8 other servers = 72 computers

Page 39

CFGP: Data-driven User Interface

Figure labels: Favorite Frame, Application Frame

Page 40

CFGP: Favorite: Bioinformatics Workbench

- 3 BLAST search related tools: BLAST, BLAST2, BLASTMatrix
- 1 Functional domain searching tool: InterPro Scan
- 6 Phylogenetic analysis tools: ClustalW, DNAML, PROML, DNAPARS, PROTPARS, PHYML
- 5 Secretory prediction tools: SignalP 3.0, SigCleave, SigPred, RPSP, SecretomeP
- 3 Subcellular localization prediction tools: PSort2, ChloroP, TargetP
- 4 Post-translational modification prediction tools: NetCGlyc, NetNGlyc, NetOGlyc, NetPhos
- 4 Other tools: MEME, tRNAScan-SE, mFold, TMHMM2

Page 41

CFGP: Iterative Analyses with the Favorite

These sequences can be stored in the Favorite again for further analyses.

Page 42

CFGP: BLASTMatrix

Figure labels: Fungal Kingdom, Insect, Plants, Human

Page 43

CFGP: Integration of SNU Genome Browser

Page 44

CFGP: Platform for Diverse Bioinformatics Systems

CFGP standardized genome data warehouse

http://atmt.snu.ac.kr/

http://tdna.snu.ac.kr/

http://www.phytophthoradb.org/

http://genomebrowser.snu.ac.kr/

Page 45

Plant Omics Resources (POR)

- Standardized Plant Genomes Data Warehouse: 28 plant and 7 unicellular plant genomes
- 26 Bioinformatics Tools
- Standardized Plant EST Database: 231 plant EST datasets
- Plant Genome Assembly Database: 9 plant genomes from WGS

Page 46

Summary and Take Home Messages

- Next generation sequencing technologies have promoted many genome sequencing projects by producing huge amounts of sequence at low cost.

- De novo assembly of short-read sequences required a new algorithm (the de Bruijn graph) for generating contig sequences.

- Currently at least 28 plant and 7 unicellular plant genomes are published or available.

- A new plant genome repository that provides an integrated environment for bioinformatics and comparative genomics is needed.

Page 47

Acknowledgements

Yong-Hwan Lee, Professor in Fungal Plant Pathology Lab. and Fungal Bioinformatics Lab.

Jaeyoung Choi, MS-PhD student

Donghan Kim, MS student

Kyeong-chae Cheong, MS student

Kyongyong Jung, MS-PhD student

Fungal Bioinformatics Lab.

Dr. Kang’s Lab. at Pennsylvania State University

Seogchan Kang, Professor in plant pathology at Pennsylvania State University

Bongsoo Park, PhD student

Wonho Song, Undergraduate student

Kyohoon Ahn, Undergraduate student

Seungmin Lee, Undergraduate student

Jaejin Park, MS-PhD student

Seryun Kim, Master

Sunghyung Kong, MS-PhD student

Doil Choi, Professor in Plant Genomics Laboratory.

Tae-Jin Yang, Professor in Industrial and medical crop genomics and biotechnology.

Seungill Kim, MS-PhD student

Junki Lee, MS-PhD student

Page 48

Thank you for your attention!