SEQUON – Sequencing the Onion Genome
Richard Finkers1, Wilbert van Workum2, Martijn van Kaauwen1, Henk Huits3, Annemieke P. Jungerius4, Ben Vosman1, & Olga E. Scholten1
Background
Onion (Allium cepa L.) is one of the most important vegetable crops worldwide. In terms of global production value, onion ranks second after tomato. In terms of genetics and genomics, however, knowledge of the onion genome is scarce compared with other crop plants (e.g. tomato). This is partly due to the huge size of the onion genome (16 Gb).
Our initial assembly contains 10.6 Gb, which is a greater proportion of the genome than our initial hypothesis predicted. This suggests that the onion genome contains a large number of ancient repeats that have diverged enough to be assembled. Based on these results, we hypothesize that long-read sequencing (e.g. Oxford Nanopore or PacBio) will greatly improve the assembly.
Objectives
Perspective
Results & Discussion
Figure 1. Growth rate of assembly length. The y-axis is the size of the assembly and the x-axis is the number of contigs in the assembly (from the largest one to the smallest one).
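The curve described in the Figure 1 caption can be reproduced from a list of contig lengths: sort the contigs from largest to smallest and take the running total. The sketch below is a minimal illustration using toy lengths, not the SEQUON contigs; the function name `cumulative_lengths` is hypothetical.

```python
from itertools import accumulate


def cumulative_lengths(lengths):
    """Running assembly size when contigs are added from largest to smallest."""
    ordered = sorted(lengths, reverse=True)
    return list(accumulate(ordered))


# Toy contig lengths in bp (illustrative only).
toy_contigs = [2, 10, 4]
curve = cumulative_lengths(toy_contigs)
# curve[i] is the y-value (assembly size) after the (i + 1) largest contigs.
print(curve)
```

Plotting the index (number of contigs) on the x-axis against these values on the y-axis gives a curve of the kind shown in Figure 1.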
Dr. Martha Mutschler, Cornell University, USA, is acknowledged for kindly supplying the DH line for the project. Dr. Aleksey Zimin, University of Maryland, USA, is acknowledged for his feedback on using the MaSuRCA assembly pipeline. This project is a public-private partnership, funded by Bejo Zaden, ServiceXS, Topsector Starting Materials, and the Dutch Ministry of Economic Affairs.
Acknowledgements
• De novo assemble the onion nuclear genome.
• Provide plant breeders with a toolbox to utilize the onion genome sequence.
Assembly
The completeness of the gene space in the assembly was estimated by comparing it to a standard conserved set of core eukaryotic genes (CEGs) using CEGMA [3]. The assembly was found to contain complete (full-length) copies of 118 (48%) of the 248 CEGs, with 203 (82%) being at least partially present.
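The percentages above follow directly from the CEG counts. As a quick check, the sketch below recomputes them from the reported numbers (118 complete and 203 at least partial, out of 248); the helper name `completeness` is illustrative and not part of CEGMA.

```python
def completeness(found, total=248):
    """Percentage of core eukaryotic genes found, rounded to a whole percent."""
    return round(100 * found / total)


complete_pct = completeness(118)  # CEGs with complete (full-length) copies
partial_pct = completeness(203)   # CEGs at least partially present
print(complete_pct, partial_pct)
```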
Discussion
1. Hyde, P., Earle, E., & Mutschler, M. (2012). Doubled haploid onion (Allium cepa L.) lines and their impact on hybrid performance. HortScience, 47(12), 1690–5.
2. Zimin, A. V., Marçais, G., Puiu, D., Roberts, M., Salzberg, S. L., & Yorke, J. A. (2013). The MaSuRCA genome assembler. Bioinformatics, 29(21), 2669–77.
3. Parra, G., Bradnam, K., & Korf, I. (2007). CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics, 23(9), 1061–7.
• The large number of contigs suggests that repeats are distributed all over the genome.
• CEGMA validation indicates that the majority of the (core) genes are present in the initial assembly.
Wageningen UR Plant Breeding P.O. Box 386, 6700 AB Wageningen, The Netherlands Contact: [email protected] & [email protected] T + 31 (0)317 48 41 65 www.wageningenur.nl/plantbreeding and www.oniongenome.net
Genetwister Technologies B.V. P.O. Box 193, 6709 PA Wageningen www.genetwister.nl
ServiceXS Plesmanlaan 1D, 2333 BZ Leiden www.servicexs.com
Bejo Zaden B.V. P.O. Box 50, 1749 ZH Warmerhuizen www.bejo.com
Experimental procedure
DNA was isolated from DH line DHCU066619 [1]. Four TruSeq sequencing libraries (230, 350, 500 and 700 bp insert size) were made and sequenced in a single Illumina® HiSeq 2500 run (16 lanes). After QC, reads were assembled using the MaSuRCA 2.3.1 assembly pipeline [2].
A total of 1.03 Tb of raw sequencing data was obtained. K-mer analysis estimated a genome size of 15 Gb and a single-copy sequence fraction of 6.3 Gb (37x coverage). The v1.0 assembly resulted in 10.8 Gb of sequence in 6.2 M contigs (minimum contig length > 500 bp; Figure 1) with a contig N50 of 2,776 bp.
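The contig N50 reported above is the length L such that contigs of length L or longer together contain at least half of the assembled bases. The sketch below shows one common way to compute it, using toy contig lengths rather than the SEQUON data; the function name `n50` is illustrative and is not a MaSuRCA function.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Toy contig lengths in bp (illustrative only).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

An assembly with many short contigs, such as the 6.2 M contigs described here, tends to have a low N50 even when the total assembled length is large.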
References
Conclusions
The onion genome sequence will aid breeders by providing them with information on the location of genes and markers. It will also indicate where onion orthologues of important genes are located. Such information will speed up breeding and enable us to meet the challenge to produce food for an increasing world population while using less land and inputs such as water, nutrients and pesticides.