Latest Advances in Coffea Genomics
Alexandre de Kochko
An introduction to the Coffea genus
The Coffea genus belongs to the Rubiaceae family
Fourth-largest angiosperm family: 650 genera and 13,000 species
Other known genera in the family:
Gardenia, Cinchona (quinine), Rubia (madder), Ixora
The Coffea genus was recently enlarged by the addition of the former genus Psilanthus. This "new" enlarged genus contains 124 described species originating from Africa, Madagascar and other Indian Ocean islands, Asia and Australia.
From Davis et al. Botanical Journal of the Linnean Society, 2011, 167: 357–377.
The Coffea genus is very diverse: it includes the former subgenus Coffea, the Baracoffea alliance, and the former genus Psilanthus, which was itself divided into two subgenera.
The subgenus Coffea is divided into three botanical sections:
Eucoffea, found in West and Central Africa
Mozambicoffea, found in East Africa
Mascarocoffea, found in Madagascar and some Indian Ocean islands
[Figure: map of the geographic ranges of the Eucoffea, Mozambicoffea and Mascarocoffea sections]
The Baracoffea alliance is exclusively encountered in western Madagascar.
The ex-Psilanthus species are more widely distributed: they originate from Africa, Madagascar, Asia and Australia.
C. arabica is the sole tetraploid (2n=4x=44) of the genus and one of the rare self-fertile species. All the others are diploid (2n=2x=22) and almost all are allogamous.
C. arabica is an allotetraploid resulting from a spontaneous hybridization between C. canephora and a wild East African species, C. eugenioides. This is a recent event (< 0.6 Mya).
C. canephora (♂) × C. eugenioides (♀) → C. arabica
The Coffea genus shows large phenotypic diversity:
C. macrocarpa (Mas)
C. pterocarpa (Mad)
C. liberica (WA)
C. brevipes (W/CA)
C. congensis (W/CA)
C. eugenioides (EA)
C. millotii (Mad)
C. racemosa (EA)
C. kapakata (W/S)
C. pseudozanguebariae (EA)
C. arabica (EA)
C. liberica var. Koto (W/CA)
Economic importance of coffee
Out of the 124 species, only two are widely cultivated: C. arabica (Arabica) and C. canephora (Robusta), accounting for 65-70% and 35-30% respectively. Coffee is the second most exported trade product of Southern countries (after oil). 400 billion cups of coffee are drunk every year; 12,000 each second.
Grown all over the world in intertropical regions
[Figure: world map of producing countries; legend: Robusta, Arabica, both]
Status of Coffea genomics
Molecular markers
Molecular markers are used for:
Identifying the genetic diversity of populations / species
Establishing the genetic structure of populations / species
Identifying species / individuals (fingerprinting, barcoding)
Establishing genetic maps
The most widely used markers nowadays are:
SSR: Simple Sequence Repeats = microsatellites
SNP: Single Nucleotide Polymorphism
Both have a known sequence, are numerous in any genome, and are co-dominant.
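As an illustration of what an SSR looks like at the sequence level, the sketch below scans a DNA string for perfect tandem repeats with a regular expression. It is not the marker-discovery pipeline behind the published Coffea markers: the function name `find_ssrs`, the thresholds (motifs of 2 to 6 nt, at least 4 copies) and the toy sequence are all assumptions made for the example.

```python
import re

def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    Hypothetical helper for illustration only; thresholds are arbitrary.
    """
    # A short motif (lazy match, shortest first) followed by at least
    # min_repeats - 1 further exact copies of itself.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

# Toy sequence (invented): a (GA)5 microsatellite flanked by unique bases.
print(find_ssrs("TTGAGAGAGAGACC"))  # [(2, 'GA', 5)]
```

Because the motif is matched shortest-first, a run of a single base would be reported as a dinucleotide repeat; a real pipeline would usually filter such cases and design primers in the flanking sequence.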
A large set of molecular markers, both SSRs and SNPs, has been established.
These markers are compiled in two public databases: MoccaDB and SGN.
Plechakova et al. BMC Plant Biology, 2009; 9: 123.* Mueller et al. Pl. Physiol. 2005, 138: 1310-1317
Genetic maps
A saturated C. canephora genetic map. SSRs, SNPs and BACs were used to construct this map.
The present international map contains ≈3000 markers, mainly SNPs.
No saturated C. arabica genetic map is available yet.
From: de Kochko et al. Advances in Botanical Research; 2010, 53: 23-63.*
Genome size
Coffea genome sizes vary by up to about twofold between species:
From: Cross et al. Can. J. Bot. 73: 14-20; Noirot et al. Ann. Bot. 92: 709-714*; Razafinarivo et al. TGG in press (December 2012 issue)*
Chromosome organization
Chromosome structural organization differs across the genus.
Schematic representation of chromosomes in different Coffea species: 5S rDNA is shown in green and 18S rDNA in red. West and Central African species, like the Malagasy ones, have one satellite chromosome, while East African species have two.
From: Hamon et al. Chr. Res. 2009, 17: 291-304*
Genome size and structure
Genome size and chromosome organization diverge along geographical lines.
Genome size values shown on the figure, by species code:
DEW (1.41)
LIB (1.41)
HUM (1.76)
EUG (1.36)
HET (1.74)
CAN (1.45)
PSE (1.13)
RAC (1.03)
MIL (1.32)
TET (1.07)
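The spread of these values can be checked with a few lines of Python. This is only a sketch over the ten figures printed on the slide; the slide does not state the unit, so the values are treated as plain numbers and only their ratio is computed.

```python
# Genome size values as printed on the slide, keyed by species code.
# The unit is not given on the slide, so only relative sizes are used.
genome_size = {
    "DEW": 1.41, "LIB": 1.41, "HUM": 1.76, "EUG": 1.36, "HET": 1.74,
    "CAN": 1.45, "PSE": 1.13, "RAC": 1.03, "MIL": 1.32, "TET": 1.07,
}

smallest = min(genome_size, key=genome_size.get)
largest = max(genome_size, key=genome_size.get)
ratio = genome_size[largest] / genome_size[smallest]

print(f"smallest: {smallest}, largest: {largest}, ratio: {ratio:.2f}")
```

For this subset the largest value is about 1.7 times the smallest, in line with the roughly twofold range described for the genus.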
EST and RNA-Seq
Publicly available ESTs: 254,474 Sanger ESTs in total
Mainly originating from the two cultivated species:
174,275 ESTs for C. arabica, from different organs and tissues and from rust-infected leaves
69,066 ESTs for C. canephora, also from different organs and tissues
10,838 ESTs for C. racemosa, a drought-tolerant wild East African species
295 from other sources and hybrid plants, including only 18 from C. eugenioides, a putative parent of C. arabica
Non-public and NGS cDNA sequences are far more numerous; for example, the C. canephora sequencing consortium project produced 130 × 10^6 Illumina reads.
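The per-species EST counts can be checked against the stated total, and converted to shares, with a short script. The counts are those given above; the dictionary keys and variable names are just labels chosen for this sketch.

```python
# Public Sanger EST counts as listed on the slide.
est_counts = {
    "C. arabica": 174_275,
    "C. canephora": 69_066,
    "C. racemosa": 10_838,
    "other sources and hybrids": 295,  # includes the 18 C. eugenioides ESTs
}

total = sum(est_counts.values())
shares = {name: 100 * n / total for name, n in est_counts.items()}

print(total)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

The four categories add up to the 254,474 ESTs quoted as the total, which is why the 18 C. eugenioides sequences are counted inside the 295 rather than on top of them. The two cultivated species together account for more than 95% of the public ESTs.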
BAC libraries
For C. canephora:
One BAC library from genotype 126, an improved cultivar. DNA digested with HindIII.
Two libraries from genotype HD200-94, a doubled haploid used for genome sequencing. DNA digested with HindIII and BstYI.
Leroy et al. 2005; TAG. 111: 1032-1041; de Kochko et al. 2010; Adv. Bot. Res. 53: 23-63*
For C. arabica:
One library from IAPAR59, an improved variety. DNA digested with HindIII.
One library from the Mokka variety. DNA digested with HindIII.
Noir et al. 2004; Theor. Appl. Genet. 109: 225-230; Jones et al. 2005; 21st ASIC conference
BAC libraries have been built exclusively for the two cultivated species.
Transposable elements
General structure of Class II elements (DNA transposons)
ITR = Inverted Terminal Repeat
ITR | Transposase | ITR
5' CAGC... ...GCTG 3'
3' GTCG... ...CGAC 5'
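The defining property of the ITRs in the diagram is that one end of the element is the reverse complement of the other. The sketch below checks this for the four bases shown on the slide and for a toy element; the helper names and the toy sequence are assumptions for the example, not part of any Coffea annotation tool.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def has_inverted_terminal_repeats(element, itr_len):
    """True if the last itr_len bases are the reverse complement of the first."""
    element = element.upper()
    return element[-itr_len:] == reverse_complement(element[:itr_len])

# The 5' end shown on the slide (CAGC) predicts the 3' end (GCTG).
print(reverse_complement("CAGC"))  # GCTG

# Toy element (invented internal sequence) carrying those 4-nt ITRs.
print(has_inverted_terminal_repeats("CAGC" + "ATTTGGA" + "GCTG", 4))  # True
```

Real ITRs are longer and often imperfect, so a practical search would tolerate mismatches rather than require exact equality.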
[Figure labels: MITE; autonomous copy; ORF; trans]
Class I transposable elements: Retrotransposons
Structure of an LTR retrotransposon
LTR 5' | UTR | gag | pol | UTR | LTR 3'
gag = capsid protein (group antigen)
pol = polyprotein containing all the functions needed for replication of the element (polymerase)
UTR = untranslated region
The other Class I elements: LINEs and SINEs (retroposons or non-LTR retroelements)
[Figure: structure of LINEs (gag, pol) and SINEs]
Coffea Transposable elements
The identification and use of transposable elements in Coffea began only recently.
Identification of TE cassettes in ESTs and unigenes. Lopes et al. 2008, Mol. Genet. Geno. 279: 385-401
Identification of a MITE inserted in an intron and its use for diversity studies. Guyot et al. 2009, BMC Pl. Biol. 9:22*; Dubreuil-Tranchant et al. Int. J. Evol. Biol. 2011 ID 358412*
Identification and use of full-length LTR retrotransposons for diversity studies. Hamon et al. 2011, Mol. Genet. Geno. 285: 447-460*
Identification of full-length transposable elements in BAC clones. Cenci et al. 2012, Pl. Mol. Biol. 78: 135-145
Identification of LTR retrotransposons in BAC ends and NGS reads. Dubreuil-Tranchant et al. 2012, 2nd ICTE*; Dias et al. 2013, 21st PAG*
How to use transposable elements for diversity studies
Multi-locus approaches for analyzing transposon insertions
S-SAP: Sequence-Specific Amplified Polymorphism
REMAP: REtrotransposon-Microsatellite Amplified Polymorphism
RBIP: Retrotransposon-Based Insertional Polymorphism
[Figure labels: LTR retrotransposon; microsatellite repeats; restriction site]
Using a MITE for a polymorphism survey
Polymorphism of Alex-1 at the g3 locus within C. canephora:
From: Guyot et al. 2009 BMC Pl. Biol. 9:22*
From: Dubreuil-Tranchant et al. Int. J. Evol. Biol. 2011 ID 358412*
First full-length LTR retrotransposons identified in Coffea
Divo: 4396 bp, LTR pair identity 94.5%
Nana: 5749 bp, LTR pair identity 90.5%
Hamon et al. 2011, Mol. Genet. Geno. 285: 447-460*
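An "LTR pair identity" figure compares the 5' and 3' LTRs of one element, which are identical at insertion and diverge afterwards. The sketch below computes a percent identity from two pre-aligned sequences. The slide does not say how the published values were calculated, so the gap handling here (a gap opposite a base counts as a mismatch) and the toy sequences are assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite
    a base counts as a mismatch (an assumption made for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared

# Toy 10-nt "LTRs" (invented) differing at one position.
print(percent_identity("TGTTAGGACA", "TGTTCGGACA"))  # 90.0
```

A lower identity between the two LTRs of an element, as for Nana compared with Divo, generally points to an older insertion.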
Diversity of insertion patterns:
resolves Coffea species lineages
reveals differentiation within LIB and CAN
Hamon et al. 2011, Mol. Genet. Geno. 285: 447-460*
Synteny studies
Synteny studies: at the micro level
Both studies show good conservation of synteny, regardless of the divergence time between species.
From: Guyot et al. 2009 BMC Pl. Biol. 9:22*; Guyot et al. 2012 BMC Genomics 13:103*
Synteny studies: at the macro level
Using a set of 867 COSII markers, macrosynteny was detected between coffee, tomato and grapevine.
Coffee and tomato share 318 orthologous markers and 27 conserved syntenic segments (CSSs); coffee and grapevine share 299 syntenic markers and 29 CSSs.
[Figure: macrosyntenic relationships between each of the 11 coffee linkage groups and the 19 grape linkage groups, based on mapped coffee COSII loci]
[Figure: macrosyntenic relationships between each of the 11 coffee linkage groups and the 12 tomato linkage groups, based on mapped coffee COSII loci]
From: Guyot et al. 2012 BMC Genomics 13:103*
Synteny studies: Conclusion
Significant conservation is found between distantly related species from the Asterid and Rosid clades, at both the macrostructure and microstructure levels of the genome.
Time alone does not explain the observed divergences.
Synteny analyses between supposedly distant species are very useful for isolating agronomically important genes.
From: Guyot et al. 2009 BMC Pl. Biol. 9:22*; Guyot et al. 2012 BMC Genomics 13:103*
Phylogenetic assumptions
Phylogenetic analyses of Coffea
From: Davis et al. Bot. J. Linnean Soc. 2011, 167: 357–377.
Combined plastid–ITS Bayesian majority-rule consensus phylogenetic tree
Combined plastid–ITS maximum likelihood phylogenetic tree
Whatever the method of analysis, these results do not allow any conclusion on Coffea evolution, because the relationships between the clades are not resolved hierarchically.
[Figure: tree clade labels: AW, Ex-PSI, AC, AE, MAS, MAD]
20 COS loci partially sequenced (exons + intron) in 72 Coffea species
1st divergence: ex-Psilanthus
2nd divergence: three non-hierarchized clades: Baracoffea / Mascarocoffea / Africa
[Figure: tree clade labels: Psilanthus; Baracoffea; Madagascar; Mascarene; Madagascar and Comoros; East Africa; East Africa; West and Central Africa. Group labels: Psi, Bar, Coffea]
A hypothesis on Coffea origin and evolution:
[Figure: a Psi-Coffea common ancestor giving rise to the Coffea and Psi lineages]
Genome sequencing
Plant material:
The sequenced genotype belongs to the species C. canephora, chosen because it is diploid, unlike the allotetraploid C. arabica.
The sequenced plant is a doubled haploid (mixoploid) produced by IRD from a haploid embryo and maintained in tropical greenhouses in Montpellier (France).
Sequencing strategy:
Two steps:
1. Produce a first assembly with 454 reads, single and mate-paired (8 and 20 kb span), and Sanger sequencing of BAC ends.
2. Correct the assembly with Illumina sequencing, single and paired-end reads.
Assembly results:
13,345 scaffolds; largest scaffold 9.Mb; N50 = 1.2 Mb; N80 = 65 kb
Coverage reached: 28.8X 454 + 69.7X Illumina + 0.3X Sanger; total = 98.8X
Total length assembled: 568.6 Mb (80% of the estimated 710 Mb genome size)
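For readers unfamiliar with the N50 and N80 statistics, the sketch below shows one common way to compute them, together with a check of the coverage total and the assembled fraction quoted above. The scaffold lengths in `toy_lengths` are invented for the example; the real scaffold length distribution is not given on the slide, and the function name `nx` is a choice made here.

```python
def nx(lengths, x=50):
    """Length L such that scaffolds of length >= L hold x% of the assembly."""
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("no scaffold lengths given")

# Invented scaffold lengths (arbitrary units), total = 25.
toy_lengths = [9, 5, 4, 3, 2, 1, 1]
print(nx(toy_lengths, 50))  # 5
print(nx(toy_lengths, 80))  # 3

# Figures from the slide: per-technology coverage and assembled length.
total_coverage = round(28.8 + 69.7 + 0.3, 1)
assembled_percent = 100 * 568.6 / 710
print(total_coverage, round(assembled_percent, 1))
```

An N80 much smaller than the N50, as reported here (65 kb against 1.2 Mb), indicates that the last part of the assembly is spread over many short scaffolds.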
[Figure: assembly schematic. Reads of different origin are merged into consensus contigs; paired-end and mate-pair reads link contigs into scaffolds, with gaps corresponding to the span of the pair or mate fragments.]
Automatic annotation results:
Number of genes: 25,574
Number of genes without intron: 5,004
Gene size in nt (mean : median): 3684.33 : 2788
Exons per gene (mean : median): 5.10 : 4
CDS size in nt (mean : median): 1205.55 : 1002
Coding coverage: 30,830,841 nt (5.4%)
Number of introns: 104,944
Intron size in nt (mean : median): 483.20 : 208
Contigs with at least one gene (% of bases): 16.6% (82.3%)
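The annotation figures are internally consistent, which can be verified with a little arithmetic. The sketch below assumes one gene model per gene (so each gene has one more exon than introns) and that the 5.4% coding coverage is expressed relative to the 568.6 Mb assembly; both are assumptions of this example, not statements from the slide.

```python
# Figures taken from the annotation summary.
genes = 25_574
intronless_genes = 5_004
introns = 104_944
coding_nt = 30_830_841
assembled_nt = 568.6e6  # assembly length, assumed denominator for coverage

# Assuming one model per gene: exons = introns + number of genes.
mean_exons_per_gene = (introns + genes) / genes
mean_cds_size = coding_nt / genes
coding_percent = 100 * coding_nt / assembled_nt
intronless_percent = 100 * intronless_genes / genes

print(round(mean_exons_per_gene, 2))  # 5.1
print(round(mean_cds_size, 2))        # 1205.55
print(round(coding_percent, 1))       # 5.4
print(round(intronless_percent, 1))
```

The derived mean exon count, mean CDS size and coding percentage match the reported 5.10, 1205.55 nt and 5.4%. About one gene in five has no intron.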
Further steps:
Anchor the physical map (assembly) to the international genetic map (≈3000 SNP markers)
Manually annotate some genes from pathways particular to Coffea (caffeine…)
Compara7ve genomics
Many other possible analyses
Publish!
Shulaev V. et al (2011) Nature Genet. 43: 109–116
Coffea canephora
Evodyn team members
Perla HAMON, Romain GUYOT, Christine DUBREUIL-TRANCHANT, Valérie PONCET, Serge HAMON
Students, trainees and visitors, among them: N. Razafinarivo, P.O. Duroy, C. Duret, A. Guellim, M. de la Mare, S. Akaffou, P. Mafra Almeida da Costa, C. Gomez, Elaine Dias …
Collaborators:
Dominique CROUZILLAT, Michel RIGOREAU, Emmanuel COUTURON, Claudia CARARETO, Spencer BROWN, Michael BOURGE, Vincent LEFORT, Olivier GASCUEL, Olivier CORITON, Sonja SILJAK-YAKOVLEV, Odile ROBIN, Saranya SRISUWAN, Aaron DAVIS, Philippe BARRE, and many more
Acknowledgements
The International Coffee sequencing consortium:
Victor A. ALBERT (USA)
Alan C. ANDRADE (BRE)
Xavier ARGOUT (FR)
Benoit BERTRAND (FR)
Alexandre de KOCHKO (FR)
Giovanni GIULIANO (ITA)
Giorgio GRAZIOSI (ITA)
Robert HENRY (AUS)
JAYARAMA (IND)
Philippe LASHERMES (FR)
Ray MING (USA)
Chifumi NAGAI (USA)
Steve ROUNSLEY (USA)
David SANKOFF (CAN)
Patrick WINCKER (FR)
Thank you for your attention