iEvoBio Hertweck abstract 2012

An ontology for transposable elements and other repetitive sequences in the age of genomics

Kate L. Hertweck, National Evolutionary Synthesis Center

As sequencing costs decrease, researchers are incorporating large-scale genomic sequencing into their research. The resulting data inundate the scientific community, providing ample opportunity for comparative genomic studies. A crucial step in most genome sequencing projects is to mask repetitive sequences. This approach improves the efficiency of gene assembly, but discards an informative, diverse part of the genome. The repetitive portion of a genome comprises all sequences present in very high copy number, such as transposable elements. Transposable elements were previously thought to be “junk” DNA, but a growing body of evidence suggests that they play vital roles in genomic evolution, affecting everything from chromosome structure and gene regulation to the derivation of new genes (Biemont, 2010). Substantial work has described the classification of transposable elements (Wicker et al., 2007), although our current knowledge of such sequences is based largely on relatively few model systems.

A majority of publicly available repeat libraries are built from long-read Sanger sequences or from highly curated, deep-coverage genome sequencing. Available approaches to repetitive element assembly from next-generation sequencing data rely on assumptions about the genome's repeat content and about the sequencing data, including availability of a reference genome, depth of sequencing, and length of reads. The results from these algorithms provide invaluable information about transposable elements, especially in organisms with very large genomes. However, results from the various repeat assembly methods are useful to other researchers only when accompanied by extensive metadata. An appropriate ontology for repetitive elements assembled from next-generation sequencing data should include characteristics of the sequencing method (platform, read length, number of reads) as well as details of the assembly (ab initio vs de novo, stringency thresholds) and annotation methods (library used, search parameters).
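
The metadata categories listed above could be captured in a simple structured record. The following Python sketch is illustrative only and is not part of the proposed ontology: all class names, field names, and the values in the example record are hypothetical placeholders chosen to mirror the categories named in the text (sequencing method, assembly details, annotation methods).

```python
from dataclasses import dataclass, field, asdict
from enum import Enum
from typing import Dict


class AssemblyStrategy(Enum):
    """The two assembly approaches named in the abstract."""
    AB_INITIO = "ab initio"
    DE_NOVO = "de novo"


@dataclass(frozen=True)
class SequencingMethod:
    platform: str          # sequencing platform (free text in this sketch)
    read_length: int       # length of reads, in base pairs
    number_of_reads: int   # total number of reads used


@dataclass(frozen=True)
class AssemblyDetails:
    strategy: AssemblyStrategy
    # Hypothetical layout: threshold name -> numeric value
    stringency_thresholds: Dict[str, float] = field(default_factory=dict)


@dataclass(frozen=True)
class AnnotationMethod:
    library_used: str      # repeat library used for annotation
    # Hypothetical layout: parameter name -> value as text
    search_parameters: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class RepeatAssemblyRecord:
    """Metadata accompanying one set of assembled repetitive elements."""
    sequencing: SequencingMethod
    assembly: AssemblyDetails
    annotation: AnnotationMethod

    def to_dict(self) -> dict:
        """Return a plain nested dict, e.g. for export to JSON."""
        record = asdict(self)
        # Replace the enum member with its human-readable label.
        record["assembly"]["strategy"] = self.assembly.strategy.value
        return record


# Placeholder values for illustration only; they describe no real data set.
example_record = RepeatAssemblyRecord(
    sequencing=SequencingMethod(
        platform="example-platform", read_length=100, number_of_reads=1_000_000
    ),
    assembly=AssemblyDetails(
        strategy=AssemblyStrategy.DE_NOVO,
        stringency_thresholds={"min_identity": 0.9},
    ),
    annotation=AnnotationMethod(
        library_used="example-library",
        search_parameters={"mode": "default"},
    ),
)
```

Calling `example_record.to_dict()` yields a nested dictionary that can be serialized and shared alongside the assembled repeats, so that other researchers can see how the sequences were generated, assembled, and annotated.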

BIEMONT, C. 2010. A Brief History of the Status of Transposable Elements: From Junk DNA to Major Players in Evolution. Genetics 186: 1085-1093.

WICKER, T., F. SABOT, A. HUA-VAN, J. L. BENNETZEN, P. CAPY, B. CHALHOUB, A. FLAVELL, et al. 2007. A unified classification system for eukaryotic transposable elements. Nature Reviews Genetics 8: 973-982.