SEQUON – Sequencing the Onion Genome

Richard Finkers¹, Wilbert van Workum², Martijn van Kaauwen¹, Henk Huits³, Annemieke P. Jungerius⁴, Ben Vosman¹ & Olga E. Scholten¹

Background

Onion (Allium cepa L.) is one of the most important vegetable crops worldwide. In terms of global production value, onion ranks second after tomato. Genetic and genomic knowledge of onion, however, is scarce compared to other crop plants (e.g. tomato). This is partly due to the huge size of the onion genome (16 Gb).

Our initial assembly contains 10.6 Gb, which is a greater proportion of the genome than we expected from our initial hypothesis. This suggests that the onion genome contains a large number of ancient repeats that have diverged enough to be assembled. Based on these results, we hypothesize that long-read sequencing (e.g. Oxford Nanopore or PacBio) will greatly improve the assembly.

Objectives

•  De novo assemble the onion nuclear genome.

•  Provide plant breeders with a toolbox to utilize the onion genome sequence.

Perspective

Results & Discussion

Figure 1. Growth rate of assembly length. The y-axis is the size of the assembly and the x-axis is the number of contigs in the assembly (from the largest one to the smallest one).

Acknowledgements

Dr. Martha Mutschler, Cornell University, USA, is acknowledged for kindly supplying the DH line for the project. Dr. Aleksey Zimin, University of Maryland, USA, is acknowledged for his feedback on using the MaSuRCA assembly pipeline. This project is a public-private partnership, funded by Bejo Zaden, ServiceXS, Topsector Starting Materials, and the Dutch Ministry of Economic Affairs.

Assembly

The completeness of the gene space of the assembly was estimated by comparing it to a standard conserved set of core eukaryotic genes (CEGs) using CEGMA³. The assembly was found to contain complete (full-length) copies of 118 (48%) of the 248 CEGs, with 203 (82%) being at least partially present.

Discussion

•  The large number of contigs suggests that repeats are distributed all over the genome.

•  CEGMA validation indicates that the majority of the (core) genes are present in the initial assembly.

References

1 Hyde, P., Earle, E., & Mutschler, M. (2012). Doubled haploid onion (Allium cepa L.) lines and their impact on hybrid performance. HortScience, 47(12), 1690–5.

2 Zimin, A. V., Marçais, G., Puiu, D., Roberts, M., Salzberg, S. L., & Yorke, J. A. (2013). The MaSuRCA genome assembler. Bioinformatics, 29(21), 2669–77.

3 Parra, G., Bradnam, K., & Korf, I. (2007). CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics, 23(9), 1061–7.

Wageningen UR Plant Breeding
P.O. Box 386, 6700 AB Wageningen, The Netherlands
Contact: [email protected] & [email protected]
T +31 (0)317 48 41 65
www.wageningenur.nl/plantbreeding and www.oniongenome.net

Genetwister Technologies B.V.
P.O. Box 193, 6709 PA Wageningen
www.genetwister.nl

ServiceXS
Plesmanlaan 1D, 2333 BZ Leiden
www.servicexs.com

Bejo Zaden B.V.
P.O. Box 50, 1749 ZH Warmerhuizen
www.bejo.com

Experimental procedure

DNA was isolated from DH line DHCU066619¹. Four TruSeq sequencing libraries (230, 350, 500 and 700 bp insert size) were made and sequenced in a single Illumina® HiSeq 2500 run (16 lanes). After QC, reads were assembled using the MaSuRCA 2.3.1 assembly pipeline².

A total of 1.03 Tb of raw sequencing data was obtained. K-mer analysis estimated a genome size of 15 Gb and a single-copy sequence fraction of 6.3 Gb (37× coverage). The v1.0 assembly comprised 10.8 Gb of sequence in 6.2 million contigs (minimum contig length > 500 bp; Figure 1), with a contig N50 of 2,776 bp.
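For readers unfamiliar with the contig N50 statistic and the cumulative curve of Figure 1, the following is a minimal sketch of how both can be derived from a list of contig lengths. It is not the code used in this project: the function names and the toy contig lengths are assumptions chosen purely for illustration, and no SEQUON data are reproduced here.

```python
from itertools import accumulate


def cumulative_lengths(contig_lengths):
    """Running assembly size with contigs ordered from largest to smallest.

    Plotting this list against the contig rank gives a curve of the kind
    described in the Figure 1 caption.
    """
    return list(accumulate(sorted(contig_lengths, reverse=True)))


def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    largest contig downwards, first reaches half of the assembly size."""
    ordered = sorted(contig_lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one contig length")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical toy assembly (lengths in bp), for illustration only.
toy = [500, 700, 900, 1500, 2776, 4000]
toy_n50 = n50(toy)
toy_curve = cumulative_lengths(toy)
```

With many short contigs, as in a highly fragmented assembly, the N50 stays low even when the total assembled length is large, because half of the sequence is only reached after a long run of small contigs.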

Conclusions

The onion genome sequence will aid breeders by providing them with information on the location of genes and markers. It will also indicate where onion orthologues of important genes are located. Such information will speed up breeding and help us meet the challenge of producing food for a growing world population while using less land and fewer inputs such as water, nutrients and pesticides.