The banana (Musa acuminata) genome sequence · 2016. 3. 7. · N50 size (number) Size (cumulated) Contigs 24,425 43,1 kb 390 Mb ... Monocot : 6 888 879 Flowers Bracts Old leaves Cigar

The banana (Musa acuminata) genome sequence

Patrick Wincker

Angélique D’Hont (Cirad)

Taxonomy

Banana is a Giant Herb

- Monocotyledon
- Commelinids lineage
- Zingiberales order
- Musaceae family

Monocot orders (adapted from Chase, 2004):

- Commelinids:
  - POALES: rice, sorghum, Brachypodium, maize
  - ZINGIBERALES: banana
  - ARECALES: oil palm
- ASPARAGALES
- LILIALES
- PANDANALES
- DIOSCOREALES
- PETROSAVIALES
- ALISMATALES
- ACORALES

Radiation of wild Musa / Domestication

Wild species: Musa balbisiana and Musa acuminata. M. acuminata comprises subspecies with distinct chromosome structures: truncata, errans, zebrina, malaccensis, burmanica/siamea, banksii, microcarpa.

Domestication involved:

- hybridization between species and subspecies, made possible by human migration
- selection by early farmers of diploid and triploid, seedless, parthenocarpic hybrids

Central for food security in many (sub)tropical countries

Hundreds of varieties exist

Production (million tonnes):

| Region | Plantain (AAB) | Others (AAB) | Mutika (AAA) | Cavendish (AAA) | Total | % |
|---|---|---|---|---|---|---|
| Central/South America & Caribbean | 7.7 | 1.5 | | 24.5 | 33.7 | |
| West Africa | 9.0 | 1.1 | | 3.2 | 13.3 | |
| East Africa | 1.3 | | 13.4 | 2.6 | 17.3 | |
| North Africa & Middle East | | | | 1.7 | 1.7 | |
| India & Sri Lanka | 0.6 | 2.2 | | 9.4 | 12.2 | |
| Asia & Oceania | 0.2 | 8.4 | | 16.5 | 25.1 | |
| Total world | 18.8 | 13.2 | 13.4 | 57.9 (>50%) | 103.3 | 100% |
| Local consumption | 18.8 | 12.2 | 13.4 | 45.2 | 89.6 | 87% |
| Exportation | | | | 13.8 | 13.8 | 13% |
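As a quick consistency check, the column totals and shares can be recomputed from the regional rows. This Python sketch only re-adds the figures transcribed from the production table (blank cells taken as 0); it is arithmetic on the slide's numbers, not a model of banana production.

```python
# Million tonnes per region, transcribed from the production table.
# Columns: Plantain (AAB), Others (AAB), Mutika (AAA), Cavendish (AAA); blanks = 0.
production = {
    "Central/South America & Caribbean": (7.7, 1.5, 0.0, 24.5),
    "West Africa": (9.0, 1.1, 0.0, 3.2),
    "East Africa": (1.3, 0.0, 13.4, 2.6),
    "North Africa & Middle East": (0.0, 0.0, 0.0, 1.7),
    "India & Sri Lanka": (0.6, 2.2, 0.0, 9.4),
    "Asia & Oceania": (0.2, 8.4, 0.0, 16.5),
}

# Row totals (one per region) and column totals (one per genome group).
row_totals = {region: round(sum(values), 1) for region, values in production.items()}
group_totals = [round(sum(column), 1) for column in zip(*production.values())]
world_total = round(sum(group_totals), 1)

cavendish_share = group_totals[3] / world_total
local_share = 89.6 / world_total  # "Local consumption" total from the table

print(row_totals)
print(group_totals, world_total)
print(f"Cavendish: {cavendish_share:.0%} of world production; local consumption: {local_share:.0%}")
```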

Cavendish: somaclones derived from a single zygote, yet > 50% of world production

Monoculture of Cavendish somaclones -> highly vulnerable -> devastating diseases -> extensive use of pesticides

Breeding is very complex. Cultivated bananas show:
- a high level of sterility, producing seedless fruits
- structural heterozygosity
- mainly triploidy
- vegetative propagation

--> Urgent need for improved cultivars

-> Production of a reference whole genome sequence of banana

Pahang doubled haploid (DH-Pahang)

- Pahang: wild diploid (2n=22); species M. acuminata; subspecies malaccensis
- Doubled haploid obtained from anther culture (Bakry et al, Fruits 2008)
- Genome size: 520 Mb

-> Haploids available for only one M. acuminata genotype: Pahang

Principles of Musa genome assembly

27,495,411 reads (400 bp) -> assembled into 24,425 contigs

+ paired-end sequences (10 kb, 100 kb): 2,139,909 reads -> assembled into 7,513 scaffolds

+ anchoring on the genetic map -> 11 pseudo-molecules

Musa assembly

| Technology | Reads | Coverage |
|---|---|---|
| 454 single | 27,495,411 | 16.9X |
| Sanger | 2,139,909 | 3.9X |
| Total | | 20.8X |

Assembly using Newbler, then consensus correction using 50X Illumina GAIIx data

Sequence assembly = 472 Mb = 91% of the DH-Pahang genome
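Sequencing depth is the total number of sequenced bases divided by the genome size. The sketch below assumes the depths in the table were computed against the 520 Mb DH-Pahang genome size and inverts the formula to show the mean read lengths the reported depths imply; those implied lengths are an inference from the slide's numbers, not reported values.

```python
GENOME_BP = 520e6  # DH-Pahang genome size from the slides (assumed denominator)

def depth(n_reads: int, mean_read_len: float, genome_bp: float) -> float:
    """Sequencing depth (X) = total sequenced bases / genome size."""
    return n_reads * mean_read_len / genome_bp

def implied_mean_len(n_reads: int, reported_depth: float, genome_bp: float) -> float:
    """Mean read length implied by a reported depth (the depth formula inverted)."""
    return reported_depth * genome_bp / n_reads

runs = {"454 single": (27_495_411, 16.9), "Sanger": (2_139_909, 3.9)}
for name, (n_reads, reported) in runs.items():
    print(f"{name}: ~{implied_mean_len(n_reads, reported, GENOME_BP):.0f} bp per read")

print(f"Combined depth: {sum(d for _, d in runs.values()):.1f}X")
print(f"454 reads at a nominal 400 bp: {depth(27_495_411, 400, GENOME_BP):.1f}X")
print(f"Assembly span: {472e6 / GENOME_BP:.0%} of the genome")
```

At the nominal 400 bp the 454 reads alone would give about 21X, so the reported 16.9X suggests a shorter effective read length (for example after trimming); the slides do not say which.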

Musa assembly (DH-Pahang genome size: 520 Mb)

| | Number | N50 size (number) | Size (cumulated) |
|---|---|---|---|
| Contigs | 24,425 | 43.1 kb | 390 Mb |
| Scaffolds | 7,513 | 1.3 Mb (65) | 472 Mb |
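N50 is the length L such that sequences of length ≥ L together contain at least half of the assembled bases. The number in brackets (65 for the scaffolds) is read here as the number of sequences needed to reach that half, often called L50. A minimal sketch on hypothetical lengths:

```python
def n50(lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50) for a list of sequence lengths.

    Sequences are sorted longest first and their lengths accumulated;
    N50 is the length at which the running total first reaches half of
    the total, and L50 is how many sequences were needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("empty input")

# Hypothetical scaffold lengths (kb), not Musa data
print(n50([500, 400, 300, 200, 100]))
```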

Development of the mapping population

Progeny derived from Musa acuminata ‘Pahang’ self-fertilisation (Pahang fruit -> embryo rescue from seeds -> Pahang progeny):

• creation: self-fertilisation (C. Jenny)
• embryo rescue (R. Habas, S. Joseph, F. Bakry): 4 hands, 820 seeds, 618 embryos
• growth (R. Habas, F. Carreel): 441 hybrids
• ploidy analyses through flow cytometry (S. Joseph, F. Bakry, M. Rodier, F. Carreel): all diploid except two tetraploids (4x)
• DNA extraction (M. Souquet, C. Cardi, F. Carreel): 441 hybrids

180 individuals were used to build the genetic map for aligning the sequence assembly

UMR AGAP/APMV, UMR AGAP/SEG, UMR BGPI/BECj

Development and genotyping of molecular markers

SSR (F.C. Baurens, M. Souquet, C. Cardi, R. Rivallan, A.M. Risterucci, F. Carreel)
- Total tested: 2,454 markers
  - from previous studies: 16% (386)
  - developed within the project from BES (BAC-end sequences) + scaffolds (F.C. Baurens): 84% (2,068)
- 48% (1,185) polymorphic
- 589 SSR used for the anchoring (max 3 per scaffold)

DArT (A. Kilian, F.C. Baurens, F. Carreel)
- Total tested: an array of 15,360 markers
  - from previous studies: 50%
  - new probes, enriched in Pahang: 50%
- 1,008 polymorphic, of which 534 unique
- 63 DArT used for the anchoring (for scaffolds with < 3 SSR)

DArT (Aus), UMR AGAP/GPTR Génotypage and UMR AGAP/SEG
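The anchoring rule (at most 3 SSRs per scaffold, DArT markers only for scaffolds with fewer than 3 SSRs) can be sketched as follows. The input format is hypothetical, and so is the assumption that DArT markers top a scaffold up to 3 markers in total; the slide only states the two constraints.

```python
from collections import defaultdict

MAX_PER_SCAFFOLD = 3  # "max 3 by scaffold" from the slide

def pick_anchor_markers(markers):
    """Select anchoring markers per scaffold.

    markers: iterable of (marker_id, kind, scaffold_id), kind 'SSR' or 'DArT'
    (hypothetical format). SSRs are taken first, up to 3 per scaffold; DArT
    markers are used only when a scaffold has fewer than 3 SSRs, and here
    (an assumption) only to fill the remaining slots.
    """
    by_scaffold = defaultdict(lambda: {"SSR": [], "DArT": []})
    for marker_id, kind, scaffold in markers:
        by_scaffold[scaffold][kind].append(marker_id)

    chosen = {}
    for scaffold, groups in by_scaffold.items():
        picked = groups["SSR"][:MAX_PER_SCAFFOLD]
        if len(picked) < MAX_PER_SCAFFOLD:
            picked += groups["DArT"][: MAX_PER_SCAFFOLD - len(picked)]
        chosen[scaffold] = picked
    return chosen

example = [
    ("s1", "SSR", "A"), ("s2", "SSR", "A"), ("s3", "SSR", "A"), ("s4", "SSR", "A"),
    ("d1", "DArT", "A"),
    ("s5", "SSR", "B"), ("d2", "DArT", "B"), ("d3", "DArT", "B"), ("d4", "DArT", "B"),
]
print(pick_anchor_markers(example))
```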

Anchoring of the 11 chromosomes

[Figure: genetic map (scale: 1 cM) aligned with the physical map of scaffolds (scale: 10 Mb) for Chr1 to Chr11]

• 647 markers used for the anchoring
• 258 scaffolds anchored, including 98.0% of the scaffolds larger than 1 Mb

| | Assembly | DH genome | Genes |
|---|---|---|---|
| Total | 472 Mb | 523 Mb | 36,000 |
| Anchored | 70% (332 Mb) | 64% | 92% |
| Oriented | 47% (221 Mb) | 42% | 84% |

Gene annotation process

Evidence combined into gene models:
- cDNAs: Musa new cDNA, 829,587 reads; Musa RNA-Seq, 143,682,857 reads (tissues: flowers, bracts, old leaves, cigar leaves, fruits)
- ESTs: Musa, 91,041 (GMGC); Monocot, 6,888,879
- Known peptides: plant peptides (UniProtKB)
- ab initio predictions: Geneid, SNAP, FGENESH (training set of 400 manually annotated genes)
- Repeats (transposable elements, ...), used for masking: Musa TEs (from 64 BACs and 454 sequences), plant TEs (Repbase), ab initio (RepeatScout)

Reconciliation (GAZE) -> gene models
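Repeat masking hides TE-derived sequence from the gene predictors so that transposon open reading frames are not called as genes. The slide does not say whether masking was hard (bases replaced by N) or soft (bases lower-cased); this hypothetical `soft_mask` helper shows the soft variant, with repeats given as 0-based, half-open intervals.

```python
def soft_mask(seq: str, repeats: list[tuple[int, int]]) -> str:
    """Lower-case bases covered by repeat intervals (0-based, half-open).

    Soft masking keeps the sequence intact, so predictors can down-weight
    repeats while still reading through them if gene evidence supports it.
    Intervals running past either end of the sequence are clipped.
    """
    chars = list(seq.upper())
    for start, end in repeats:
        for i in range(max(0, start), min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)

print(soft_mask("ACGTACGTACGT", [(2, 5), (9, 20)]))
```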

Musa acuminata predicted proteome

- Number of protein coding genes: 36,542
- Median number of exons per gene: 4
- Median size of CDS: 861 bp
- Median size of intron: 147 bp
- Average GC content of CDS: 50.2%
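These statistics are medians over gene models. A sketch of how they can be derived from exon coordinates, assuming a hypothetical input of sorted, 1-based inclusive CDS exon intervals per gene (a real pipeline would parse them from GFF3):

```python
from statistics import median

def gene_stats(genes: dict[str, list[tuple[int, int]]]) -> tuple[float, float, float]:
    """Median exon count, median CDS length and median intron length.

    genes: gene_id -> sorted list of (start, end) CDS exons, 1-based inclusive
    (hypothetical format). Introns are the gaps between consecutive exons.
    """
    exon_counts, cds_lengths, intron_lengths = [], [], []
    for exons in genes.values():
        exon_counts.append(len(exons))
        cds_lengths.append(sum(end - start + 1 for start, end in exons))
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
            intron_lengths.append(next_start - prev_end - 1)
    return median(exon_counts), median(cds_lengths), median(intron_lengths)

def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    return 100 * sum(base in "GCgc" for base in seq) / len(seq)

toy = {
    "g1": [(1, 100), (201, 300)],
    "g2": [(1, 50), (101, 150), (301, 400)],
    "g3": [(1, 300)],
}
print(gene_stats(toy), gc_percent("GGCCAATT"))
```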

Genome composition (36,542 predicted protein coding genes):
- Protein coding genes: 30% (CDS 9%, introns 20%, UTR 1%)
- Intergenic sequences: 70%

Transposable elements (TE) annotation

STEP 1: Construction of a Pahang TE reference library
- LTR retrotransposons (Copia, Gypsy) (E. Hribova, IEB, Czech Republic + O. Garsmeur)
- Non-LTR retrotransposons (LINEs) (P. Heslop-Harrison, Univ Leicester, UK)
- DNA transposons (subclass 1) (T. Wicker, Univ Zurich, Switzerland)

STEP 2: Screening of the Musa assembly with the REPET package (TEannot pipeline)

Transposable element distribution
- 44% of the genome assembly annotated as transposable elements
- Only 1.3% DNA transposons, mainly hAT; no CACTA or Mariner elements, although these have so far been found in high copy numbers in all angiosperms
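Percentages such as the 44% TE content count each base once, so overlapping annotations must be merged before summing. A minimal sketch with hypothetical half-open intervals; the per-class percentages need not add up to the all-TE figure when annotations of different classes overlap.

```python
def covered_bp(intervals):
    """Bases covered by possibly overlapping half-open (start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

def te_percent(te_by_class, assembly_len):
    """Percent of the assembly covered per TE class and by all TEs together."""
    per_class = {cls: 100 * covered_bp(ivs) / assembly_len for cls, ivs in te_by_class.items()}
    all_ivs = [iv for ivs in te_by_class.values() for iv in ivs]
    return per_class, 100 * covered_bp(all_ivs) / assembly_len

# Hypothetical annotations on a 1,000 bp sequence
per_class, all_te = te_percent({"Copia": [(0, 100), (50, 150)], "hAT": [(140, 160)]}, 1000)
print(per_class, all_te)
```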

Genome landscape

[Figure: genome landscape along the chromosomes (0 to 40 Mb), with the 45S and 5S rDNA loci]

-> Sharp transition between gene-rich and TE-rich regions
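Landscapes like this are usually drawn from the fraction of each fixed-size window covered by a feature type (exons, introns, TEs). The helper below sketches that windowing, assuming non-overlapping half-open intervals within the chromosome (merge overlapping ones first); a sharp gene-rich/TE-rich transition then appears as an abrupt jump between adjacent windows.

```python
def window_density(features, chrom_len, window):
    """Fraction of each window covered by features.

    features: non-overlapping half-open (start, end) intervals inside
    [0, chrom_len). The last window may be shorter than `window`.
    """
    n_windows = -(-chrom_len // window)  # ceiling division
    covered = [0] * n_windows
    for start, end in features:
        pos = start
        while pos < end:
            # Split each feature at window boundaries.
            w = pos // window
            w_end = min(end, (w + 1) * window)
            covered[w] += w_end - pos
            pos = w_end
    sizes = [min(window, chrom_len - i * window) for i in range(n_windows)]
    return [c / s for c, s in zip(covered, sizes)]

# Hypothetical features on a 300 bp "chromosome" with 100 bp windows
print(window_density([(0, 50), (150, 250)], 300, 100))
```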

Annotation of repeats in the non-assembled part of the genome

| Element | Type | Size (bp) | Sequence nb | Copy nb | Mb |
|---|---|---|---|---|---|
| its 26S | | 3,806 | 173,294 | 979 | 3.7 |
| its 18S | | 4,179 | 263,846 | 1,358 | 5.7 |
| its 5S | | 521 | 56,534 | 2,334 | 1.2 |
| Daterra | LTR-retro Copia | 7,324 | 336,947 | 989 | 7.2 |
| Nanica | LINE | 5,291 | 25,270 | 103 | 0.5 |
| Maca | LTR-retro Copia | 1,842 | 6,771 | 79 | 0.1 |
| Caturra | LTR-retro Gypsy | 5,405 | 66,916 | 266 | 1.4 |
| Total | | | 1,832,094 | | |
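In this table the Mb column equals copy number times element size, which the first part of the sketch checks. The slide does not say how copy numbers were estimated from the read counts; `copy_number` shows one common approach (read bases matching the element divided by element length times sequencing depth) and is an assumption, not the method used here.

```python
def copy_number(matched_bases: float, element_len: int, depth: float) -> float:
    """Assumed estimator: copies ~ matched read bases / (element length x depth)."""
    return matched_bases / (element_len * depth)

# Element: (size in bp, copy nb, Mb), transcribed from the table above
table = {
    "26S": (3806, 979, 3.7),
    "18S": (4179, 1358, 5.7),
    "5S": (521, 2334, 1.2),
    "Daterra": (7324, 989, 7.2),
    "Nanica": (5291, 103, 0.5),
    "Maca": (1842, 79, 0.1),
    "Caturra": (5405, 266, 1.4),
}

for name, (size, copies, mb) in table.items():
    # Recompute the Mb column as copies x size, rounded like the table.
    print(name, round(size * copies / 1e6, 1), "vs table", mb)
```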

Location by in situ hybridization (FISH) on chromosomes 1 to 11:
- Nanica: centromeric
- other elements: peri-centromeric, or peri-centromeric and all over the chromosomes

Typical short tandem centromeric repeats were not found in Musa.

FISH: Maguy Rodier + Dheema Burthia (AREU, Mauritius); O. Panaud (Univ Perpignan, France) + J. Barbosa (Univ Rio Grande, Brazil)

Viral sequences related to Banana streak virus (BSV) integrated in the Pahang genome:

- highly reorganized and fragmented
- spread all over the genome
- belong to a badnavirus phylogenetic group different from the endogenous BSV species (eBSV) found in M. balbisiana
- most of them form a new subgroup/species

[Figure: badnavirus phylogeny showing the eBSV from M. balbisiana, the viral sequences integrated in the M. acuminata Pahang genome, and the new species]

These integrated sequences seem defective and unable to give rise to free infectious viral particles.

Whole genome duplications (WGD)

[Figure: distribution of Ks values for paralogous gene pairs (x-axis: Ks; y-axis: gene pairs)]

Complex pattern of small duplicated chromosome segments, with substantial variation of Ks between clusters, suggesting several WGD events
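A WGD leaves a cohort of paralogs with similar Ks, which shows up as a peak in the Ks distribution. The naive sketch below bins Ks values and reports local maxima; the bin width, Ks cutoff and toy data are hypothetical, and published analyses typically use more robust approaches such as mixture-model fitting.

```python
def ks_histogram(ks_values, bin_width=0.1, max_ks=3.0):
    """Count Ks values per bin on [0, max_ks); values outside are ignored."""
    n_bins = round(max_ks / bin_width)
    counts = [0] * n_bins
    for ks in ks_values:
        if 0 <= ks < max_ks:
            counts[min(int(ks / bin_width), n_bins - 1)] += 1
    return counts

def ks_peaks(counts, bin_width=0.1):
    """Local maxima as (bin centre, count); a plateau reports its left bin."""
    peaks = []
    for i in range(1, len(counts) - 1):
        if counts[i] > counts[i - 1] and counts[i] >= counts[i + 1]:
            peaks.append((round((i + 0.5) * bin_width, 2), counts[i]))
    return peaks

# Toy data with a young cohort near Ks 0.35 and an older one near Ks 1.15
toy_ks = [0.25] + [0.35] * 5 + [0.45] * 2 + [1.05] + [1.15] * 4 + [1.25]
print(ks_peaks(ks_histogram(toy_ks)))
```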

Paralogous relationships between the eleven Musa chromosomes

[Figure: dot plot of paralogous gene pairs, Musa chromosomes 1 to 11 against themselves]

CoGe / SynMap; http://synteny.cnr.berkeley.edu/CoGe/SynMap.pl

Whole genome duplications (WGD)

[Figure: the same Musa dot plot with the duplicated segments highlighted]

- Each duplicated segment mainly involves 4 syntenic regions
- At least 2 WGDs in the Musa lineage, α and β (4 syntenic copies = 2², consistent with two successive genome doublings)

Whole genome duplications in Musa

Paralogous clusters with the lowest Ks values were assembled into 12 ancestral blocks, which represent the Musa ancestral genome before the α and β WGDs.

[Figure: the 12 ancestral blocks (Block 1 to Block 12) painted along chromosomes Chr1 to Chr11 (0 to 40 Mb)]
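Grouping paralogous segments into ancestral blocks amounts to finding connected components of segments linked by low-Ks paralogy. This union-find sketch uses hypothetical segment names and Ks values; the Ks threshold keeps the recent α/β relationships and leaves older, higher-Ks links unjoined. With two successive WGDs, each ancestral block is expected to be present in up to 4 copies.

```python
def ancestral_blocks(segments, paralog_links, ks_max):
    """Group segments into blocks: connected components of low-Ks links.

    segments: iterable of segment ids (hypothetical names).
    paralog_links: (segment_a, segment_b, ks) tuples.
    Links with ks > ks_max (older duplications) are ignored.
    """
    parent = {s: s for s in segments}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, ks in paralog_links:
        if ks <= ks_max:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = {}
    for s in parent:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())

segs = ["1a", "2a", "5b", "8c", "3a", "9b"]
links = [("1a", "2a", 0.4), ("2a", "5b", 0.5), ("5b", "8c", 0.45),
         ("3a", "9b", 0.5), ("1a", "3a", 1.2)]  # last link: older, higher Ks
print(ancestral_blocks(segs, links, ks_max=0.7))
```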

Whole genome duplications in Musa

Additional paralogous relationships between the 12 Musa ancestral blocks, with higher Ks values, suggest another, older duplication.

[Figure: Ks distribution (x-axis: Ks, 0 to 3; y-axis: percentage of gene pairs) for gene pairs between a segment of Block 1 and a segment of Block 3; the high-Ks peak corresponds to the older WGD]

- Two WGDs: α and β
- Older WGD: γ

3 WGD events identified in Musa: 2 close events (α and β) + 1 older event (γ)

Timing of WGD events relative to speciation events

[Figure: time-scaled tree (Mya; Jurassic, Cretaceous, Tertiary) of Oryza sativa, Brachypodium distachyon and Sorghum bicolor (Poales) and Zingiber officinale and Musa acuminata (Zingiberales), with the Rho and Sigma WGDs on the Poales lineage and α, β and γ on the Musa lineage; a question mark flags the uncertain timing]

Two WGD events (Rho, Sigma) identified in the Poales lineage:

- the timing of the Rho WGD is well established, around 70 Mya
- the timing of Sigma (Tang et al 2010) relative to the divergence of Poales and Zingiberales was less clear

Comparative genome analysis to examine whether the Musa γ WGD corresponds to the Sigma WGD of the Poales

Comparison of Musa and rice contemporaneous genomes

[Figure: dot plot of syntenic gene clusters, Musa chromosomes 1 to 11 against Oryza sativa chromosomes 1 to 12]

Syntenic clusters of genes between Musa and rice: synteny conservation is scarce, a pattern compatible with several independent WGD events in the rice and Musa lineages, followed by gene loss (fractionation) and chromosomal rearrangements
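Syntenic clusters are runs of homologous gene pairs that stay close together in both genomes. This greedy single-linkage sketch, on hypothetical anchors indexed by gene rank along each chromosome, only illustrates the idea; tools such as CoGe SynMap rely on more elaborate chaining algorithms (DAGchainer).

```python
from collections import defaultdict

def syntenic_clusters(anchors, max_gap=10, min_anchors=3):
    """Greedy clustering of homologous gene pairs into syntenic runs.

    anchors: (chrom_a, idx_a, chrom_b, idx_b) tuples, idx = gene rank along
    the chromosome (hypothetical format). Consecutive anchors (ordered along
    genome A) join a cluster when they are within max_gap genes in both
    genomes; clusters with fewer than min_anchors pairs are discarded.
    """
    by_pair = defaultdict(list)
    for chrom_a, idx_a, chrom_b, idx_b in anchors:
        by_pair[(chrom_a, chrom_b)].append((idx_a, idx_b))

    clusters = []
    for pair, points in by_pair.items():
        points.sort()
        current = [points[0]]
        for idx_a, idx_b in points[1:]:
            prev_a, prev_b = current[-1]
            # abs() accepts both collinear and inverted orientations.
            if idx_a - prev_a <= max_gap and abs(idx_b - prev_b) <= max_gap:
                current.append((idx_a, idx_b))
            else:
                if len(current) >= min_anchors:
                    clusters.append((pair, current))
                current = [(idx_a, idx_b)]
        if len(current) >= min_anchors:
            clusters.append((pair, current))
    return clusters

toy = [("M1", 1, "Os3", 10), ("M1", 3, "Os3", 12), ("M1", 6, "Os3", 15),
       ("M1", 50, "Os3", 200), ("M2", 5, "Os1", 7)]
print(syntenic_clusters(toy))
```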

Comparison of Musa and rice ancestral genomes

[Figure: after the Zingiberales/Poales divergence, the Musa lineage undergoes the Gamma, Beta and Alpha WGDs (banana ancestral blocks) and the Poales lineage the Sigma and Rho WGDs (Sigma and Rho blocks); example panels compare Musa Block 8 and Block 2 with the Poales blocks Rho5, Rho2 and Sigma6]

Ancestral blocks approximate the genome composition before the WGDs and thus compensate for post-WGD gene losses, increasing the sensitivity of the analysis

Gamma and Sigma WGDs occurred after the Poales/Zingiberales divergence

Rho and Sigma blocks from Tang et al (2011)


Phylogenomic analysis of 93 single copy genes

[Figure: time-scaled phylogeny (Jurassic, Cretaceous, Tertiary) of Oryza, Brachypodium and Sorghum (Poales), Zingiber and Musa (Zingiberales), Phoenix (Arecales), Asparagus and Acorus (other monocots), and Vitis, Medicago, Populus, Carica and Arabidopsis (eudicots)]

Suggests that Zingiberales are more closely related to Arecales than to Poales


The three Musa WGDs occurred after the divergence of Zingiberales from Arecales and Poales

[Figure: the same phylogeny with the WGDs mapped: σ and ρ on the Poales lineage; γ, β and α on the Musa lineage]

Phylogenomic analyses performed on 3,553 gene families

Over-retention of Musa transcription factors after WGD

-> Amplification of several banana transcription factor gene families

[Figure: gene loss after WGD]

Distribution of gene “families” among banana, three Poaceae, date palm and Arabidopsis

Suggests a high level of divergence and diversification within the Poaceae lineage

Enriched in genes coding for transcription factors, defense-related proteins, and cell wall metabolism and secondary metabolism enzymes

Collaboration: Mathieu Rouard (Bioversity)
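Counting how many gene families fall into each combination of species gives the cells of a family-distribution diagram, and single-species cells give lineage-specific families. A sketch assuming a hypothetical mapping from family IDs to the species that have members in each family:

```python
from collections import Counter

def family_patterns(families):
    """Number of families per exact species combination (one diagram cell each).

    families: dict family_id -> set of species with at least one member
    (hypothetical format).
    """
    return Counter(frozenset(species) for species in families.values())

def species_specific(families, species):
    """Families whose members all come from a single species."""
    return sorted(fam for fam, members in families.items() if members == {species})

toy_families = {
    "f1": {"banana", "rice"},
    "f2": {"banana"},
    "f3": {"banana"},
    "f4": {"rice", "sorghum"},
}
print(family_patterns(toy_families))
print(species_specific(toy_families, "banana"))
```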

Conclusion

The reference whole genome sequence of banana:

- a crucial stepping-stone for the genetic improvement of this under-researched vital crop
- an essential bridge for studies of gene and genome evolution within the monocotyledons and between monocotyledons and dicotyledons

http://banana-genome.cirad.fr/

http://southgreen.cirad.fr/

A new template for banana genetics

✔ An (almost) unlimited source of DNA markers located on the chromosomes
- SSR: from several hundred to now several thousand located on the chromosomes
- SNP: a template for SNP discovery and mapping for genetic and diversity studies

✔ A template to characterize chromosome structural variations (insertion, deletion, inversion, duplication, translocation), with subsequent prospects for germplasm improvement

✔ Access for the first time to the entire set of Musa genes (36,542)
- template for transcriptomic analysis
- candidate gene strategies based on physiological studies and insights from other species; a direct route to cross-cutting evidence
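SSR markers can be mined directly from the assembly by scanning for short motifs repeated in tandem. This regex sketch finds 1 to 6 bp motifs repeated at least a given number of times; the threshold is a hypothetical choice, and dedicated tools (e.g. MISA) apply motif-specific minimum repeat numbers.

```python
import re

def find_ssrs(seq: str, min_copies: int = 5):
    """Return (start, motif, copies) for tandem repeats of 1-6 bp motifs.

    The lazy quantifier tries the shortest motif first, so ACACAC... is
    reported as motif 'AC' rather than 'ACAC'; the backreference requires
    at least min_copies - 1 further copies after the first.
    """
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

print(find_ssrs("GGACACACACACTT"))
print(find_ssrs("AAAAAAAAC"))
```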

Nabila Yahiaoui Franc-Christophe Baurens Françoise Carreel Olivier Garsmeur Stéphanie Bocs Gaetan Droc Céline Cardi Marlène Souquet Cyril Jourda Juliette Lengelle Marguerite Rodier Didier Mbéguié Matthieu Chabannes Rémy Habas Ronan Rivallan Philippe Francois Claire Poiron Christophe Jenny Frédéric Bakry Steeve Joseph Anne Dievart Julie Leclercq Xavier Argout Ange-Marie Risterucci Manuel Ruiz Jean Christophe Glaszmann

Patrick Wincker France Denoeud Jean-Marc Aury Benjamin Noel Corinne Da Silva Kamel Jabbari Julie Poulain Karine Labadie Adriana Alberti Maria Bernard Margot Correa Olivier Jaillon Jean Weissenbach

Mathieu Rouard (Bioversity, France) Valentin Guignon (Gene families analysis)

Thomas Wicker (Univ Zurich, Switzerland) Eva Hribova (IEB, Czech Republic) Jaroslav Dolezel (IEB, Czech Republic) Pat Heslop-Harrison (Univ Leicester, UK) Olivier Panaud (Univ Perpignan, France) José Barbosa (Univ Rio Grande, Brazil) Dheema Burthia (AREU, Mauritius) Mouna Jeridi (IRA, Tunisia) (Transposable element analysis +FISH)

Andrzej Kilian (DArT, Canberra, Australia) (DArT development and genotyping)

Nicolas Roux (Bioversity, France) Gert Kema (PRI, Wageningen) (provided MAMB BAC-end sequences)

Dutch Groene Woudt Foundation

Diane Burgess Mike Freeling (Univ Berkeley, USA) (CNS analysis)

Michael McKain Saravanaraj Ayyampalayam Jim Leebens-Mack (Univ Georgia, USA) (Phylogenomic analysis)

Eric Lyons (Univ Arizona, USA) (CoGe tools)

Francis Quetier

Spencer Brown (ISV, Gif sur Yvette) (genome size)
