

Mol Genet Genomics, DOI 10.1007/s00438-015-1049-z

ORIGINAL PAPER

Sequences enhancing cassava mosaic disease symptoms occur in the cassava genome and are associated with South African cassava mosaic virus infection

A. T. Maredza1 · F. Allie1 · G. Plata2 · M. E. C. Rey1

Received: 6 January 2015 / Accepted: 10 April 2015 © Springer-Verlag Berlin Heidelberg 2015

Abstract Cassava is an important food security crop in Sub-Saharan Africa. Two episomal begomovirus-associated sequences, named Sequences Enhancing Geminivirus Symptoms (SEGS1 and SEGS2), were identified in field cassava affected by the devastating cassava mosaic disease (CMD). The sequences reportedly exacerbated CMD symptoms in the tolerant cassava landrace TME3, and the model plants Arabidopsis thaliana and Nicotiana benthamiana, when biolistically co-inoculated with African cassava mosaic virus-Cameroon (ACMV-CM) or East African cassava mosaic virus-UG2 (EACMV-UG2). Following the identification of small SEGS fragments in the cassava EST database, the intention of this study was to confirm their presence in the genome, and investigate a possible role for these sequences in CMD. We report that multiple copies of varying lengths of both SEGS1 and SEGS2 are widely distributed in the sequenced cassava genome and are present in several other cassava accessions screened by PCR. The endogenous SEGS1 and SEGS2 are in close proximity or overlapping with cassava genes, suggesting a possible role in regulation of specific biological processes. We confirm the expression of SEGS in planta using EST data and RT-PCR. The sequence features of endogenous SEGS (iSEGS) are unique but resemble non-autonomous transposable elements (TEs) such as MITEs and helitrons. Furthermore, many SEGS-associated genes, some involved in virus–host interactions, are differentially expressed in susceptible (T200) and tolerant (TME3) cassava landraces infected by South African cassava mosaic virus (SACMV). Abundant SEGS-derived small RNAs were also present in mock-inoculated and SACMV-infected T200 and TME3 leaves. Given the known role of TEs and associated genes in gene regulation and plant immune responses, our observations are consistent with a role of these DNA elements in the host’s regulatory response to geminiviruses.

Keywords Cassava mosaic disease · Sequences Enhancing Geminivirus Symptoms · Satellites · Begomovirus · Transposable elements

Communicated by K. Gruden.

Small RNA-Seq data reported are available in the European Nucleotide Archive under the accession number PRJEB8495.

Electronic supplementary material The online version of this article (doi:10.1007/s00438-015-1049-z) contains supplementary material, which is available to authorized users.

* M. E. C. Rey [email protected]

1 School of Molecular and Cell Biology, University of the Witwatersrand, Johannesburg, Wits 2050, South Africa

2 Department of Systems Biology, Columbia University in the City of New York, 1130 St Nicholas Avenue, New York, NY, USA

Introduction

Geminiviruses are single-stranded circular (ssc) plant-infecting DNA viruses (Brown et al. 2011). Cassava mosaic disease (CMD), caused by begomoviruses belonging to the Geminiviridae family, is endemic in regions of sub-Saharan Africa where the crop is cultivated (Patil and Fauquet 2009). An unusually severe CMD epidemic that swept across East Africa, reported between 1995 and 2005 (Gibson et al. 1996; Legg et al. 2006; Ntawuruhunga et al. 2007), resulted in unprecedented cassava crop losses. Unexpectedly, cassava fields with previously tolerant germplasm such as TME3, an IITA-bred landrace (Fregene et al. 2001), were also severely affected. Epidemiological studies in Uganda and Cameroon identified recombinants between cassava begomoviruses (CBVs), namely African cassava mosaic virus (ACMV), East African cassava mosaic virus (EACMV) and East African cassava mosaic Cameroon virus (EACMCV) (Zhou et al. 1997; Fondong et al. 2000; Pita et al. 2001). Ndunguru (2006) later identified novel episomal ssc DNA sequences associated with begomoviruses in cassava field samples exhibiting the unusually severe mosaic symptoms. Following the discovery of resistance breaking in TME3 and detection of these ssc DNA sequences in cassava EST datasets, questions remained as to the nature of these DNA sequences and what role they might play in CMD symptom modulation. In view of breeding and genetic modification programs to develop geminivirus resistance, the results from this study could shed light not only on molecular virus–host interactions but also on disease control approaches.

The ssc DNA molecules, initially named begomovirus-associated sat DNA-II and sat DNA-III (GenBank accession numbers AY836366.1 and AY836367.1), were unrelated and did not share any long stretches of nucleotide sequence similarity. Sat DNA-II and DNA-III were initially classified as satellite-like molecules because they were amplified using universal primers for alpha- and betasatellites (Mansoor et al. 1999; Briddon et al. 2002). Satellites are sub-viral nucleic acids that depend on co-infection with a helper virus for replication (Mayo et al. 2005) and affect disease severity and transmission (reviewed in Nawaz-ul-Rehman and Fauquet 2009; Zhou 2013). Sat DNA-II and DNA-III have been renamed episomal Sequences Enhancing Geminivirus Symptoms (eSEGS1 and eSEGS2, respectively), as further characterisation revealed their non-conformity to classical satellites, and have been described in detail (Ndunguru 2006; Ndunguru et al. 2008). Intriguingly, SEGS homologues were subsequently identified in genomes of healthy cassava, and are termed integrated or endogenous SEGS (iSEGS) to distinguish them from the episomal forms (eSEGS). A BLAST search against a database of 10,577 ESTs from MTAI16 (Sakurai et al. 2007) also identified several transcripts with >80 % sequence identity to fragments of eSEGS1 and eSEGS2 (unpublished). While some plant DNA virus sequences have been found integrated in their host genomes and play a role in genome expansion and gene expression (Harper et al. 2002), the integrated SEGS (iSEGS) are not homologous to any known plant virus, but iSEGS2 contains a 26-bp (including 6 bp of the DNA1-F primer) match to a sequence of alphasatellite origin (Zhou 2013).

The published cassava genome sequences (http://www.phytozome.net) (Goodstein et al. 2012; Prochnik et al. 2012) allow a detailed investigation of the diversity of iSEGS within the genome. Analysis of genome positions of SEGS may improve our understanding of their potential role in CMD aetiology. In this study, eSEGS1 (AY836366.1; begomovirus-associated sat DNA-II) and eSEGS2 (AY836367.1; begomovirus-associated sat DNA-III) were used to query the cassava genome. The frequencies of iSEGS insertions, the sizes of integrated elements, the patterns of integration and their chromosomal (scaffold) locations were investigated. Furthermore, the expression of the larger iSEGS and associated cassava genes was analysed. Differential expression of iSEGS-associated genes in South African cassava mosaic virus (SACMV) (Berrie et al. 2001)-infected cassava was also evaluated in comparison with mock-inoculated plants. The iSEGS vary in length, are found in multiple loci across the cassava genome, and share some features with several classes of transposable elements (TEs). Small RNA-Seq data revealed the presence of iSEGS-derived small RNAs targeting SEGS1 and SEGS2 in mock-inoculated and in SACMV-infected cassava, with higher frequencies in SACMV-infected T200. Our results suggest a possible regulatory role of iSEGS through a small RNA-mediated mechanism. In addition, we demonstrate the altered expression of several iSEGS-associated genes in response to SACMV, providing support for a functional relationship between the iSEGS and CMD phenotype modification and gene regulation.

Materials and methods

Plant material

Leaf samples for nucleic acid extraction were obtained from virus-free in vitro-propagated cassava plantlets (Table 1) grown in plant agar containing MS media (Murashige and Skoog 1962). All plant material was checked for healthy or infected status by PCR, using SACMV core coat primers (Allie et al. 2014).

Extraction of nucleic acids

Total nucleic acid (TNA) was extracted from ground leaf samples using extraction buffer [0.1 M Tris–Cl pH 8.0, 1.4 M NaCl, 20 mM EDTA, 2 % cetyltrimethyl ammonium bromide (CTAB), 1 % β-mercaptoethanol] followed by chloroform:isoamyl alcohol (24:1) extraction. TNA was precipitated using isopropanol and resuspended in TE (10 mM Tris–Cl, 1 mM EDTA) containing 1 µg/µl RNase A. Purified DNA was quantified using the NanoDrop spectrophotometer (Thermo Scientific) by measuring absorbance at 260 nm and analysed on 1 % agarose gels by electrophoresis.

RNA was isolated using the Tri-Reagent (Sigma, USA) or Purezol (Bio-Rad, USA) following the manufacturer’s instructions. Contaminating DNA was removed by DNase I treatment according to manufacturer’s instructions (Thermo Scientific, USA). Complementary DNA was synthesised from 1 µg total RNA using the RevertAid H Minus Reverse Transcriptase Kit (Thermo Scientific, USA) according to the manufacturer’s instructions and utilising random hexamers.

PCR screening for integrated sequences

PCRs contained 1× Green buffer, 0.25 µM of each primer, 0.25 µM of dNTPs, 0.1 U of DreamTaq polymerase (Thermo Scientific, USA) and 100 ng of template DNA in 20 µl total volume. Primers SatII F/R (5′-gccgcaccactggatctc-3′ and 5′-cagcagccagtcaggaagtt-3′) or SatIII F/R (5′-acctgacggcagaaggaat-3′ and 5′-aggcctcgttactaaaagtgc-3′) were designed using the episomal begomovirus-associated sequences to amplify 895 and 306 bp fragments for iSEGS1 and iSEGS2, respectively. Annealing temperatures of 56–58 °C were used depending on primers. PCR products were analysed by electrophoresis on 1 % agarose gels, stained with ethidium bromide and viewed using the ChemiDoc™ MP Imaging System (Bio-Rad, USA). The 1 kb DNA ladder plus (Thermo Scientific, USA) was used as molecular weight marker unless specified.

Cloning and sequencing of PCR fragments

Integrated SEGS for sequencing were amplified using the Phusion HotStart PCR Kit (Thermo Scientific, USA) according to the manufacturer’s instructions. Amplicons in gel fragments were purified using the Macherey-Nagel Nucleospin® Gel and PCR Clean-up Kit (MN, Germany) according to the manufacturer’s instructions. Purified DNA was cloned into pJet1.2 Blunt (CloneJet PCR Cloning Kit) according to the manufacturer’s instructions (Thermo Scientific, USA). The plasmid clones were sequenced using the Inqaba Biotech (Pretoria, South Africa) sequencing service. Several clones were sequenced for each cassava germplasm, and if sequence differences were observed, more clones were sequenced to verify the variations.

Analysis of sequences of iSEGS cloned from healthy cassava

Quality checks post-sequencing (base calling, trimming and reassembly) were performed using CLC Main Workbench (v6.9 clcbio.com). Multiple sequence alignments were performed on the online MAFFT alignment server (mafft.cbrc.jp/alignment/server) using default parameters combined with manual adjustments in BioEdit Sequence Alignment Editor (Hall 1999). Large gaps (>10 bases) and read ends were deleted to improve alignment. The best substitution model was determined using the Bayesian information criterion, corrected (BICc), and the Akaike information criterion, corrected (AICc). Tree inference was performed in MEGA version 6.06 (updated v. 6140226; Tamura et al. 2013). The episomal SEGS, accession numbers AY836366.1 and AY836367.1, were used as reference sequences as well as the mint 1019-nt sat-II-like amplicon (EU862815).

Searching for integrated SEGS in the cassava genome

BLASTn (v 2.2.21) (Altschul et al. 1997) was used to align the SEGS1 (AY836366.1) and SEGS2 (AY836367.1) against version 4 of the cassava genome available from Phytozome v9.1 (http://www.phytozome.net). HSPs with E values higher than 0.05 were ignored. The reward for a nucleotide match parameter (−r) was assigned a value of 1. All other parameters were left at their default values.
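The E-value cut-off described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: it assumes the BLAST hits were exported in the common 12-column tab-separated layout (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E value, bit score), which the paper does not state. The names `Hsp`, `parse_tabular` and `keep_significant`, and the example rows, are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Hsp(NamedTuple):
    """One high-scoring pair from a tabular BLAST report (assumed 12-column layout)."""
    query: str
    subject: str
    pident: float   # percent identity
    length: int     # alignment length in bp
    sstart: int     # start on the subject (scaffold)
    send: int       # end on the subject (scaffold)
    evalue: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hsp]:
    """Parse tab-separated BLAST rows, skipping blank lines and comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        yield Hsp(f[0], f[1], float(f[2]), int(f[3]),
                  int(f[8]), int(f[9]), float(f[10]))


def keep_significant(hsps: Iterable[Hsp], max_evalue: float = 0.05) -> List[Hsp]:
    """Ignore HSPs whose E value is higher than the cut-off (0.05 in the text)."""
    return [h for h in hsps if h.evalue <= max_evalue]


# Hypothetical rows for illustration; scaffold names and values are made up.
example = [
    "AY836366.1\tscaffoldA\t99.2\t997\t5\t1\t1\t997\t100\t1096\t0.0\t1800",
    "AY836366.1\tscaffoldB\t95.5\t22\t1\t0\t10\t31\t500\t521\t0.8\t30.1",
]
hits = keep_significant(parse_tabular(example))
print([h.subject for h in hits])
```

Only the first hypothetical row survives the filter, because the second has an E value above 0.05.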

Genome searches for genes associated with iSEGS

We studied in detail HSPs meeting all conditions below: at least 70 % sequence identity to the query sequences, E values ≤10^−20 and sequence lengths of at least 200 bp. Exceptions were made when HSPs were in close proximity with other insertions fulfilling the above criteria. Selected sequences were analysed by tabulating the frequencies, size ranges, homology statistics, chromosomal locations of iSEGS and genes in close proximity to the iSEGS insertions. The genome contexts of the iSEGS insertions were analysed using the Gbrowse feature in Phytozome by sliding the window 25 kilo base pairs (kbp) upstream and downstream of the iSEGS insertion sites. Proximal and distal genes were classified as ‘upstream of’, ‘overlapping with’ or ‘downstream of’ the iSEGS insertion sites. The cassava_IDs were mapped to homologous A. thaliana locus IDs to use the more comprehensive TAIR resources for gene ontology (GO) annotation (http://www.arabidopsis.org/tools/bulk/go/index.jsp).
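The selection criteria and the positional classification above could be expressed programmatically as follows. This is a minimal sketch under stated assumptions, not the procedure used in the study (which relied on manual inspection in Gbrowse): coordinates are taken to be 1-based, inclusive positions on the forward strand of a scaffold, and "upstream" is taken to mean lower coordinates. The function names and example coordinates are hypothetical.

```python
from typing import Optional


def passes_strict_criteria(pident: float, length: int, evalue: float) -> bool:
    """All three conditions from the text: >=70 % identity, E <= 1e-20, >=200 bp."""
    return pident >= 70.0 and evalue <= 1e-20 and length >= 200


def classify_gene(gene_start: int, gene_end: int,
                  ins_start: int, ins_end: int,
                  window: int = 25_000) -> Optional[str]:
    """Classify a gene relative to an insertion, or return None if it lies
    farther than `window` bp away (assumption: upstream = lower coordinate)."""
    if gene_end < ins_start:
        return "upstream of" if ins_start - gene_end <= window else None
    if gene_start > ins_end:
        return "downstream of" if gene_start - ins_end <= window else None
    return "overlapping with"


# Hypothetical insertion at 50,000-50,800 bp on a scaffold.
print(passes_strict_criteria(84.1, 756, 1e-150))
print(classify_gene(30_000, 32_000, 50_000, 50_800))
print(classify_gene(50_500, 53_000, 50_000, 50_800))
print(classify_gene(90_000, 95_000, 50_000, 50_800))
```

The exceptions mentioned in the text (HSPs retained because they lie close to other qualifying insertions) are not modelled here and would need a separate rule.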

Searching transposable elements homologous to SEGS

To characterise the nature of the repetitive SEGS insertions and to determine if other genomes harboured similar sequences, the Censor tool in the repetitive sequences database RepBase (http://www.girinst.org/censor/) was used to search all available reference collections for sequences homologous to the eSEGS (AY836366.1 and AY836367.1). In addition, the terminal sequences of iSEGS were manually inspected for inverted repeats and target site duplications.
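The inspection for terminal inverted repeats was done manually in the study. As an illustration of what such a check looks for, the sketch below reports the length of the longest perfect terminal inverted repeat, that is, the longest 5′ prefix that equals the reverse complement of the 3′ suffix. It is a simplification: real TIRs often tolerate mismatches, which this sketch does not. The function names and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def perfect_tir_length(seq: str, max_len: int = 30) -> int:
    """Length of the longest 5' prefix that matches the reverse complement
    of the 3' suffix exactly (0 if the terminal bases do not pair)."""
    seq = seq.upper()
    best = 0
    for k in range(1, min(max_len, len(seq) // 2) + 1):
        if seq[:k] == reverse_complement(seq[-k:]):
            best = k
        else:
            break  # a mismatch at position k rules out every longer repeat
    return best


# Toy element: a 6-nt repeat (ACGGTC ... GACCGT) flanking a T-rich core.
toy = "ACGGTC" + "TTTTT" + reverse_complement("ACGGTC")
print(perfect_tir_length(toy))
```

An element flanked by such repeats can in principle be amplified with a single primer that anneals to both ends, which is the rationale for the single-primer PCR described later in the methods.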

Expression of SEGS‑associated genes during SACMV infection

The generation of the expression data used to identify SEGS-associated genes during SACMV infection is described in Allie et al. (2014). In brief, an RNA-seq study monitoring the gene expression changes in a SACMV-infected susceptible (T200) and a tolerant (TME3) cassava landrace was conducted. RNA was extracted at each of the three time points (12, 32 and 67 dpi) from SACMV and mock-inoculated leaf tissue. A total of twelve cDNA libraries were generated using the SOLiD Total RNA-Seq Kit (Applied Biosystems). These libraries were barcoded, multiplexed and sequenced on a single flow cell using 50 bp forward and 35 bp reverse paired-end sequencing chemistry on the ABI SOLiD V4 system. The reads generated for each T200 and TME3 library were mapped to the cassava genome available on Phytozome (http://www.phytozome.net/cassava_Manihot esculenta 147 version 4.1). These NGS data can be accessed from the NCBI Sequence Read Archive using BioProject accession PRJNA255198.

Small RNA data from mock‑inoculated and SACMV‑infected cassava

Total RNA extraction, using a modified high molecular weight polyethylene glycol (HMW-PEG) protocol, was carried out on mock-inoculated and SACMV-infected leaf tissue samples collected from T200 and TME3 at 12, 32 and 67 dpi. For each sample, 1 g pooled leaf tissue (from two experiments; 6 leaves per experiment) was homogenised in liquid nitrogen and added to 5 ml preheated (65 °C) GHCL buffer (6.5 M guanidium hydrochloride, 100 mM Tris–HCl pH 8.0, 0.1 M sodium acetate pH 5.5, 0.1 M β-mercaptoethanol) and 0.1 g HMW-PEG (Mr: 20,000, Sigma). The mixture was then pelleted by centrifugation (10,000×g) for 10 min at 4 °C. The supernatant was treated with 0.1 ml 1 M sodium citrate (pH 4.0), 0.2 ml 2 M NaCl and 5 ml phenol:chloroform:isoamyl alcohol (PCI) (25:24:1). The mixture was then vortexed vigorously and again pelleted by centrifugation (10,000×g) for 10 min at 4 °C. The supernatant was removed and RNA was precipitated by adding 5 ml isopropanol (propan-2-ol). The mixture was thoroughly mixed, incubated at −20 °C for 60 min and pelleted by centrifugation (10,000×g) for 25 min at 4 °C. RNA pellets were washed with 5 ml ice-cold 75 % molecular grade ethanol and dried at 37 °C for 5 min. The pellet was resuspended in 100 μl preheated (55 °C) RNase-free water and 1 μl RNase inhibitor (Fermentas). Small RNAs were isolated using the mirVana™ miRNA isolation kit (Ambion Inc.), following the manufacturer’s protocol. Next generation sequencing (NGS) was done using the Illumina HiSeq2000 platform at LGC Genomics in Berlin, Germany.

Raw reads generated from the Illumina HiSeq2000 system for the 12 small RNA libraries were cleaned of sequence adapters, low-quality tags and short sequences (<15 nt long) using the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). Quality analysis per cycle was performed for each library. To eliminate all other small non-coding RNAs, high-quality trimmed sequences were mapped onto rRNA, tRNA and snoRNA sequences from Rfam (version 12.0). The sequences that mapped completely and had an E value <0.06 were removed from the libraries. Small RNAs (18–26 nt) from the Illumina data were aligned to the two eSEGS (AY836366.1 and AY836367.1), using the Map to Reference Tool in CLC Main Genomics Workbench (v6.9 clcbio.com). The number of SEGS reads (normalised against total NGS sRNA reads) was quantified in T200 and TME3, modelled through a Poisson distribution, and lower and upper limits of confidence intervals calculated using the Neyman procedure (Nakamura et al. 2010). The small RNA-Seq data have been submitted to the European Nucleotide Archive and the accession number is PRJEB8495.
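The normalisation and interval estimate described above can be illustrated with a short Python sketch. It is not the authors' implementation, and the exact procedure of Nakamura et al. (2010) is not reproduced here. As a stand-in, the sketch computes the standard central ("exact") Poisson confidence interval by numerically inverting the Poisson distribution function, and expresses counts as reads per million total small RNA reads; the per-million scaling, all function names and the example counts are assumptions made for illustration.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), with terms evaluated in log space."""
    if k < 0:
        return 0.0
    if lam <= 0.0:
        return 1.0
    total = sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
                for i in range(k + 1))
    return min(total, 1.0)


def _bisect_decreasing(func, lo: float, hi: float, iters: int = 200) -> float:
    """Root of a decreasing function on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if func(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def poisson_ci(k: int, alpha: float = 0.05):
    """Central two-sided (1 - alpha) interval for the Poisson mean given count k."""
    hi = k + 10.0 * math.sqrt(k + 1.0) + 10.0  # generous upper search bound
    lower = 0.0 if k == 0 else _bisect_decreasing(
        lambda lam: alpha / 2 - (1.0 - poisson_cdf(k - 1, lam)), 0.0, hi)
    upper = _bisect_decreasing(
        lambda lam: poisson_cdf(k, lam) - alpha / 2, 0.0, hi)
    return lower, upper


def reads_per_million(count: float, total_reads: int) -> float:
    """Normalise a read count against the library size (assumed scaling)."""
    return count / total_reads * 1_000_000


def in_size_range(read: str, min_len: int = 18, max_len: int = 26) -> bool:
    """Keep only small RNAs of 18-26 nt, as in the text."""
    return min_len <= len(read) <= max_len


# Hypothetical library: 10 SEGS-mapped reads out of 2,000,000 small RNA reads.
count, total = 10, 2_000_000
low, high = poisson_ci(count)
print(reads_per_million(count, total),
      reads_per_million(low, total), reads_per_million(high, total))
```

The summation in `poisson_cdf` is adequate for modest counts; for libraries with very large mapped-read counts a statistics library would be a more practical choice.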

Amplification of iSEGS using TIR sequences

Universal primers for begomovirus-associated satellites were mapped on the episomal SEGS sequences. To investigate if the SEGS were flanked by terminal inverted repeats (TIRs), the forward primers Beta01 (5′-ggtaccactacgctacgcagcagcc-3′) and DNA-1 (5′-tggggatcctaggatataaataacacgtc-3′) were used separately for single-primer PCR, yielding amplicons from TIRs. PCR was performed on healthy cassava as described previously, using an annealing temperature of 50 °C, and amplicons were cloned into pJET1.2 Blunt as described earlier and sequenced (Inqaba Biotech, South Africa). Secondary structures within eSEGS were predicted using the Mfold (Zuker 2003) tool within CLC Main Workbench (version 6.9) after converting DNA to RNA sequences. All settings were left at default values.

Primers for analysing iSEGS expression

Two sets of primers for analysing iSEGS expression were designed to amplify within the near full-length iSEGS: SatII F (5ʹ-gccgcaccactggatctc-3ʹ) and AY66_520 (5ʹ-caaagctcgagctccaaaggtc-3ʹ) for SEGS1, and AY67_588 (5ʹ-gtgaattgattgagagttg-3ʹ) and AY67_1166 (5ʹ-cacgtctatcatttgcttctc-3ʹ) for SEGS2. Other primer pairs were designed to span iSEGS and adjoining gene sequences: Prefoldin (5ʹ-ggattttgccgagatcatta-3ʹ) and AY66_520 for the prefoldin 6 subunit (cassava4.1_019094) associated with SEGS1, and Helicase (5ʹ-gcagcttccctttcttgtttttg-3ʹ) and SatIII R (5ʹ-acctgacggcagaaggaat-3ʹ) for the helicase (cassava4.1_025672) gene associated with SEGS2. These primers were used to amplify from cDNA of selected healthy cassava germplasm. The primers were tested in silico against M. esculenta sequences downloaded from the Phytozome database (ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Mesculenta/) using the Primer and Probe Design function located in the Molecular Biology Folder in the CLC toolbox in CLC Main Workbench (v6.9), and expected product sizes were recorded.

Analysis of iSEGS in cassava ESTs

The expression of iSEGS was analysed by BLASTn searches in the cassava expressed sequence tag (EST) database (http://ncbi.nlm.nih.gov) containing 88 062 sequences as of 2 April 2014. The BLAST outputs were manually curated and corresponding gene identities verified by searching protein databases using BLASTp.

Additionally, the nucleotide sequences of SEGS1 and SEGS2 were used in a BLASTn search, with default parameters and an E value threshold of 0.05, against the non-redundant reads in the small RNA study of Pérez-Quintero et al. (2012) (GEO Accessions: GSM726146 and GSM726147) and the unique reads from the study by Ballen-Taborda et al. (2013).

Results

iSEGS fragments are present in the genomes of several cassava accessions

Analysis of the putatively episomal SEGS showed terminal inverted repeats (TIR) identical to the universal primers, DNA1-F (Mansoor et al. 1999) and Beta01 (Briddon et al. 2002), used to amplify alphasatellites and betasatellites, respectively (Fig. 1a, b). This suggests that partially corresponding palindromic sequences were used to amplify eSEGS2 and eSEGS1. Surprisingly, the Beta01 primer alone was able to amplify multiple fragments from genomic DNA of four healthy cassava accessions (Fig. 1c), demonstrating the presence of several similar TIRs in different loci within the cassava genome. The sequenced amplicons included a near full-length clone of 98.9 % sequence identity to eSEGS1 (Online Resource 1), but also showed several genome sequences unrelated to SEGS (data not shown). DNA1 F alone failed to amplify from genomic DNA of cassava.

Fig. 1 Primer mapping to eSEGS (a, b) and PCR analysis of iSEGS (c–e) in healthy cassava. Mapping of universal primers for begomovirus-associated satellites on the episomal SEGS sequences: a SEGS1 and b SEGS2, c amplification of SEGS1 from healthy cassava using a single primer (Beta01) demonstrating the presence of corresponding TIRs in the cassava genome. Lanes 1–4 Cassava cultivars TME1, TME3, T200 and cv.60444, respectively; d and e PCR amplification of integrated SEGS from genomes of healthy cassava accessions (as labelled at the top for both panels). Specific primers for SEGS1 and SEGS2 were designed to produce ~895 and ~306 bp products, respectively. M Molecular weight marker, O’GeneRuler 1 kb plus DNA ladder (Thermo Scientific), Plasmid 1 and Plasmid 2 denote positive controls (eSEGS1 or eSEGS2-containing plasmids, respectively) and NTC denotes no template controls

Internal primers for SEGS1 and SEGS2 were used to amplify ~895 and ~306 bp fragments (Fig. 1d, e), respectively, from genomic DNA of seventeen healthy cassava accessions (Table 1). Amplicons from cultivars TME1 and TME3 (CMD tolerant), and cv.60444, T200 and TME117 (susceptible to CMD), were sequenced. Multiple pairwise sequence alignments showed a maximum of 12 and 7 % nucleotide (nt) sequence divergence among iSEGS1 (Fig. 2a) and iSEGS2 (Fig. 2b) clones, respectively. The observed nucleotide differences were mostly due to insertions or deletions within the sequenced amplicons relative to the episomal SEGS. When large gaps were disregarded, blocks of high sequence conservation were observed, irrespective of the germplasm. Phylogenetic analyses using maximum likelihood (ML) showed no distinct clustering of clones according to source germplasm (Fig. 2a, b). Several clones from the same accession were found in separate clusters within the phylogenetic trees. Thus, a heterogeneous population of amplicons was produced from multiple priming sites within the genome and nucleotide variations arose from these multiple iSEGS loci.

Table 1 Cassava accession characteristics

| Cultivar | Resistance/other characteristics | Integrated satellite DNA-II-like sequences | Integrated satellite DNA-III-like sequences | Cultivar source |
| --- | --- | --- | --- | --- |
| cv. 60444 | ✗ | ✓ | ✓ | CIAT |
| TME1 | ✓ | ✓ | ✓ | IITA |
| TME3 | ✓ | ✓ | ✓ | IITA |
| T200 | ✗ | ✓ | ✓ | South Africa |
| TME117 | ✗ | ✓ | ✓ | IITA |
| TME7 | ✓ | ✓ | ✓ | IITA |
| I30001 | ✓ | ✓ | ✓ | IITA |
| I30572 | ± | ✓ | ✓ | IITA |
| I60506 | ✗ | ✓ | ✓ | IITA |
| AR9-18 | ✓ | ✓ | ✓ | CIAT |
| AR9-44 | ✓ | ✓ | ✓ | CIAT |
| CR43-13 | ✓ | ✓ | ✓ | CIAT |
| SM707-17 | Low temp | ✓ | ✓ | CIAT |
| SM1433-4 | High yield | ✓ | ✓ | CIAT |
| CM523-7 | High starch | ✓ | ✓ | CIAT/FAO |
| BRA1183 | Low temp | ✓ | ✓ | FAO |
| MTAI-16 | High starch | ✓ | ✓ | FAO |

✗ susceptible, ✓ tolerant, ± moderate tolerance

Fig. 2 Phylogenetic analysis of PCR-amplified iSEGS from genomes of healthy cassava. a Maximum likelihood (ML) tree for iSEGS1 based on the Tamura 3-parameter substitution model (Tamura 1992). The best substitution model had BICc = 3554.19 and AICc = 3348.48, and the highest log likelihood of the tree is −1652.6442. The percentage (>50 %) of trees in which the associated taxa clustered together is shown next to the branches. b Molecular phylogenetic analysis of iSEGS2 by ML based on the Hasegawa–Kishino–Yano model (Hasegawa et al. 1985). The best substitution model had BICc = 1524.34 and AICc = 1223.85, and the highest log likelihood of the tree is −566.57. All trees are drawn to scale, with branch lengths measured as the number of substitutions per site. The trees were automatically rooted by saving the generated ML tree in standard Newick format as previously described (Louis et al. 2014) and all analyses were performed in MEGA 6.06 (updated v. 6140226) software (Tamura et al. 2013)
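To make the notion of pairwise sequence divergence concrete, the sketch below computes a simple proportion of differing sites (p-distance) between two aligned sequences while skipping gap columns, in the spirit of disregarding gaps when comparing conserved blocks. The paper does not state how its divergence percentages were calculated, so this is only one plausible definition; the function name and the toy alignment are hypothetical.

```python
def p_distance(aligned_a: str, aligned_b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring every column in which either sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap column: excluded from the comparison
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


# Toy alignment: one substitution over five ungapped columns.
print(p_distance("ACGT-A", "ACCTTA"))
```

Multiplying the result by 100 gives a percentage comparable in form to the divergence values quoted above.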

Analysis of iSEGS using the cassava genome database

A BLASTn search (Altschul et al. 1990) against the Mani-hot esculenta genome assembly (v4.1, http://www.phy-tozome.net) identified 73 SEGS1 and 197 SEGS2 homo-logues using an expect (E) value <0.05 (Online Resources 2 and 3, respectively). The high-scoring pairs (HSPs) var-ied in length from near full-length sequences to homol-ogy stretches of 22 bp and 26 bp for iSEGS1 and iSEGS2, respectively (Online Resource 2 and 3). Near full-length insertions matched 997 bp on scaffold 12498 for iSEGS1 and 756 bp on scaffold 12725 for iSEGS2, represent-ing 99.2 and 84.1 % nt identities to the query sequences, respectively. Analysis of the near full-length iSEGS1 and 2 revealed that they both included the GC-rich regions found in eSEGS. iSEGS1 also retained seven putative eSEGS1 ORFs (Ndunguru et al. 2008), while iSEGS2 retained 105 bp of the putative ORF 2 from eSEGS2. Putative iSEGS1 and iSEGS2 ORFs showed no significant simi-larity to protein sequences in the public databases. Many iSEGS were in intergenic genomic regions and only 6 of the 73 iSEGS1 and 6 of the 197 iSEGS2 fragments were not associated with genes (within a 25 kb nucleotide win-dow). Partial iSEGS1 homologues overlapping with anno-tated genes were fewer (6 out of 73) than iSEGS2 overlap-ping genes (90 out of 197). Up to seven integrated SEGS per scaffold in the reconstructed cassava genome were observed. The highest insertion frequencies were 0.74 and 0.54 per kbp for fragments with sequence identity to iSEGS1 and iSEGS2, respectively (Fig. 3a, b). Most of the scaffolds, including 12498 and 12725 which each con-tained a single near full-length copy, showed less than 0.01 insertions per kb. However, insufficient information on the cassava genome precluded the characterisation of iSEGS distribution patterns by chromosomal locations.
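The insertion frequencies quoted above are densities: the number of insertions on a scaffold divided by the scaffold length in kbp. A minimal sketch of that arithmetic follows; the scaffold names, lengths and hit list are invented for illustration and are not values from the study, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def insertions_per_kbp(hit_scaffolds: Iterable[str],
                       scaffold_lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Count insertions per scaffold and divide by scaffold length in kbp."""
    counts = Counter(hit_scaffolds)
    density = {}
    for scaffold, n in counts.items():
        length_bp = scaffold_lengths_bp[scaffold]
        if length_bp <= 0:
            raise ValueError(f"invalid length for {scaffold}")
        density[scaffold] = n / (length_bp / 1000)
    return density


# Hypothetical input: one scaffold name per detected insertion.
hits = ["scaffoldA"] * 7 + ["scaffoldB"]
lengths = {"scaffoldA": 10_000, "scaffoldB": 250_000}
print(insertions_per_kbp(hits, lengths))
```

With these made-up numbers, seven insertions on a 10 kbp scaffold give 0.7 insertions per kbp, while a single insertion on a 250 kbp scaffold gives 0.004, which illustrates how short scaffolds can dominate the upper end of such a distribution.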

The BLASTn results were analysed further by manually scoring the search statistics against contiguous sequence stretches 200 bp or longer. HSPs showing nucleotide identities ≥70 % and E values <e−20 were manually inspected. Details of identified iSEGS, genome contexts and gene identities (IDs) of iSEGS-associated genes are summarised in Online Resources 4 and 5.
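The thresholds above were applied by manual inspection. Purely as an illustration, an equivalent filter could be sketched in Python as follows, assuming the hits are available in the standard 12-column BLAST+ tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and reading "<e−20" as 1e-20. The function name and the example rows are hypothetical.

```python
def filter_hsps(lines, min_length=200, min_identity=70.0, max_evalue=1e-20):
    """Keep HSPs that are long enough, similar enough and significant enough.

    Each line is assumed to be one tab-separated row in the 12-column
    BLAST+ tabular layout; comment lines and blank lines are skipped.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        identity = float(fields[2])   # pident, percent identity
        length = int(fields[3])       # alignment length in bp
        evalue = float(fields[10])    # expect value
        if length >= min_length and identity >= min_identity and evalue < max_evalue:
            kept.append((fields[0], fields[1], identity, length, evalue))
    return kept


# Hypothetical rows: only the first passes all three thresholds.
rows = [
    "\t".join(["SEGS1", "scaffold_X", "99.2", "997", "8", "0", "1", "997", "10", "1006", "0.0", "1800"]),
    "\t".join(["SEGS1", "scaffold_Y", "95.0", "22", "1", "0", "1", "22", "50", "71", "0.01", "40"]),
    "\t".join(["SEGS2", "scaffold_Z", "65.0", "300", "105", "0", "1", "300", "5", "304", "1e-30", "200"]),
]
selected = filter_hsps(rows)
```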

Arrangement and structure of iSEGS insertions is reminiscent of transposable elements

iSEGS were found more often in tandem than in opposite sequence strands, and the sizes and sequence stretches of the repeat elements were variable. However, nucleotide sequences within the first half (5′) of the iSEGS were more highly represented, while sequences matching the end (3′) were less frequent (Fig. 3c, d). To investigate possible transposition mechanisms of iSEGS, the Censor tool in the RepBase repetitive sequences database (http://www.girinst.org/censor/) was used to search for sequences sharing homology with the episomal SEGS. Interestingly, both SEGS had at least 70 % nucleotide identity to segments of LTR (Copia, Gypsy) and non-LTR (CR1) retrotransposons and a minor region of homology to a DNA element (P-type). Three TE-like sequences belonging to the Gypsy, Copia and DNA/P classes showed homology to iSEGS1, while two TE-like segments with partial homology to SEGS2 were characterised as Gypsy and CR1 (Table 2 and illustrated in Fig. 3e, f). These regions of significant homology to known transposons are unlikely to encode any functional protein, given the relatively short lengths (40–300 bp) of sequence homology relative to the large sizes of the TEs in question (3–8 kb) and the lack of genes coding for transposition enzymes.

TIRs are a common feature of DNA transposons. TIRs corresponding to Beta01 and DNA 1F primers were mapped to the episomal SEGS (Fig. 1a, b). To find the corresponding TIRs on iSEGS, a comparison was made of sequences flanking the near full-length iSEGS1 and iSEGS2 on scaffolds 12498 and 12725, respectively (Online Resource 6). The analysis showed that the longest fragment of iSEGS1 has TIRs similar to the Beta01 primer. Mapping of DNA 1F primer sequences was less defined because the longest iSEGS2 is 5′-truncated. The presence of TIRs on the longest iSEGS1 and its apparent lack of coding potential suggest similarity to the Miniature Inverted-Repeat Transposable Elements (MITEs) class of non-autonomous DNA transposons. Additionally, secondary structure (Mfold) analysis of eSEGS1 and 2 showed a propensity for the formation of intramolecular double-stranded loop structures (Online Resource 6).
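What a terminal inverted repeat means at the sequence level can be shown with a short Python sketch: the 5′ end of the element reads the same as the reverse complement of its 3′ end. This is an illustration of the concept only, not the comparison carried out on the scaffold sequences. The function names and the toy sequence are hypothetical, and only perfect, ungapped matches are counted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(seq, max_len=50):
    """Length of the perfect inverted repeat shared by the two termini.

    The 5' end is compared base by base with the reverse complement of
    the 3' end; counting stops at the first mismatch, at max_len, or at
    the midpoint of the sequence.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    limit = min(max_len, len(seq) // 2)
    n = 0
    while n < limit and seq[n] == rc[n]:
        n += 1
    return n


# Toy element: GGCAT at the 5' end pairs with ATGCC at the 3' end.
toy_tir = terminal_inverted_repeat_length("GGCATTTTTTATGCC")
```

Real TIRs are often imperfect, so a practical search would tolerate mismatches and short gaps; that refinement is omitted here.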

GO annotations of iSEGS‑associated genes

Mobile genetic elements can affect expression patterns of genes around insertion sites (Peaston et al. 2004; Bejerano et al. 2006). To analyse the genome locations of iSEGS from a gene function perspective, cassava-annotated genes in the vicinity of iSEGS (25 kb upstream and downstream of insertion sites) were assigned gene ontology (GO) terms in the “Cellular Component”, “Molecular Function” and “Biological Process” categories (see “Materials and methods”). The chromosomal locations of iSEGS-associated genes grouped as “upstream of iSEGS”, “downstream of iSEGS” or “overlapping with iSEGS” are recorded in Online Resource 7. A substantial fraction of iSEGS-associated genes were linked to the nuclear cellular component (Fig. 4); these include genes such as helicases, integrases, ligases, and DNA and histone methylases. Another cellular component category represented among genes overlapping with iSEGS1 was the chloroplast. Ten molecular functions, namely nucleotide binding, nucleic acid binding, protein binding, DNA or RNA binding, receptor binding activity, transporter activity, hydrolase activity, kinase activity, transferase activity,

Fig. 3 Distribution of SEGS insertions in the cassava genome. a, b Frequency distribution of iSEGS insertions normalised by scaffold lengths and presented as number of insertions per kilo base pair. The numbers on the x-axis are scaffold names. c, d Frequency of homologous nucleotides in the cassava genome along the SEGS1 or SEGS2 fragments according to the BLASTn alignment. Blue line, the GC-rich regions; red line, the location of putative open reading frames (ORFs). e, f Maps of SEGS molecules showing putative ORFs, the transposable element-like motifs, and annealing positions of several primers used in the study. Blue, regions of high GC content; red, the longest putative ORFs

Table 2 Transposable element-like motifs on iSEGS

Sequence name | Position on sequence (bp) | Element name | Element size (bp) | Class | References
AY836366/SEGS1 | 86–235 | Copia-113 | 6042 | LTR | Bao and Jurka (2013), Collén et al. (2013)
AY836366/SEGS1 | 320–381 | Gypsy4-I | 7680 | LTR | Jurka and Kohany (2009)
AY836366/SEGS1 | 437–475 | P-7_HM | 3170 | DNA/P | Jurka (2008b)
AY836367/SEGS2 | 151–444 | Gypsy-7 | 8813 | LTR | Drosophila 12 Genomes Consortium et al. (2007), Jurka and Kojima (2012)
AY836367/SEGS2 | 632–741 | CR1-17 | 4457 | Non-LTR | Bao and Jurka (2008)


and transcription factor activity, were characterised in the Biological Process categories (Fig. 4). Also present were several classes of metabolic genes including esterases, glycosidases, helicases, nucleases, proteases, and protein phosphatases. Genes with roles in diverse cellular, developmental or regulatory processes such as amino acid biosynthesis, and protein modification, degradation or binding, and protein transport (chaperones, kinases, transporters, and transferases) were also found close to iSEGS. Table 3 shows several examples of iSEGS-associated genes with possible roles in plant development and stress response that demonstrate many of the above GO functions.
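The grouping of genes as upstream of, downstream of, or overlapping with an iSEGS within a 25 kb window can be sketched in Python as below. This is a simplified illustration, not the procedure used to build Online Resource 7: the function name is hypothetical, coordinates are taken as 1-based inclusive positions on the same scaffold, and "upstream" and "downstream" are defined on forward-strand coordinates without regard to gene orientation.

```python
def classify_gene(gene_start, gene_end, ins_start, ins_end, window=25_000):
    """Classify a gene relative to an insertion on the same scaffold.

    Returns "overlapping", "upstream", "downstream", or None when the
    gene lies further than `window` bp from the insertion.
    """
    if gene_start <= ins_end and gene_end >= ins_start:
        return "overlapping"
    if gene_end < ins_start and ins_start - gene_end <= window:
        return "upstream"
    if gene_start > ins_end and gene_start - ins_end <= window:
        return "downstream"
    return None


# Hypothetical coordinates for an insertion at 10,000-11,000 bp.
labels = [
    classify_gene(1_000, 2_000, 10_000, 11_000),    # 8 kb before the insertion
    classify_gene(10_500, 12_000, 10_000, 11_000),  # shares bases with the insertion
    classify_gene(12_000, 13_000, 10_000, 11_000),  # 1 kb after the insertion
    classify_gene(40_000, 41_000, 10_000, 11_000),  # 29 kb away, outside the window
]
```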

Cassava ESTs harbour sequences homologous to iSEGS

BLASTn analysis of 88 062 cassava expressed sequence tags (ESTs) established the presence of iSEGS in the pool of transcripts and confirmed their overlap with gene-coding regions (Online Resource 8). Both eSEGS contain motifs that resemble TATA boxes and polyadenylation signals (Ndunguru et al. 2008), strongly suggesting that they are transcribed, but no transcripts in the cassava EST database that map to the full copy sequences related to iSEGS1 or iSEGS2 were detected. Analysis of the EST-associated iSEGS showed that the 5′ sequences (the first 500 nt of SEGS) were more highly represented than the 3′ sequences. EST-associated iSEGS

Fig. 4 Gene ontology (GO) annotations for genes in close proximity with iSEGS1 and iSEGS2


Table 3 Differentially expressed annotated iSEGS1 and 2 associated genes (>twofold change) in SACMV-infected cassava

Sequence | Functional annotation | T200 log2 ratio, 12 dpi | T200 log2 ratio, 32 dpi | T200 log2 ratio, 67 dpi | TME3 log2 ratio, 12 dpi | TME3 log2 ratio, 32 dpi | TME3 log2 ratio, 67 dpi | Biological function | Phytozome accession
AY836366 | Pentatricopeptide repeat (PPR) superfamily protein | 0.75 | −3.13 | NA | 0.58 | 0.79 | −1.06 | RNA-binding proteins, modulate RNA processing, localisation, stability and translation | Cassava4.1_033826m
AY836366 | NAC domain-containing protein 57 | −2.08 | −1.00 | 2.39 | −0.48 | 0.67 | 0.73 | TFs mediating plant developmental processes & stress responses; nuclear localisation, DNA-binding | Cassava4.1_025277m
AY836366 | Prefoldin 6 | 0.41 | −3.41 | NA | 0.46 | −0.88 | 0.85 | Co-chaperone with chaperonin, binds cytoplasmic actin & tubulin monomers during cytoskeleton assembly; implicated in nuclear gene regulation | Cassava4.1_031465m
AY836366 | ADP-glucose pyrophosphorylase small subunit 2 | 0.12 | 0.62 | 2.11 | 0.31 | 0.55 | −0.66 | Carbohydrate metabolism | Cassava4.1_031078m
AY836366 | Tesmin/TSO1-like CXC 2 | −0.25 | 0.59 | 2.73 | 0.84 | −0.72 | 0.16 | Male and female flower fertility | Cassava4.1_026451m
AY836366 | Tesmin/TSO1-like CXC domain-containing protein | 1.51 | 0.60 | −1.22 | 0.56 | 2.56 | −0.66 | Male and female flower fertility | Cassava4.1_017789m
AY836366 | Ubiquitin-conjugating enzyme 34 | 0.25 | 1.06 | NA | 0.78 | −0.43 | −0.61 | Ubiquitin-mediated degradation of cell cycle G1 regulators; initiation of DNA replication | Cassava4.1_025388m
AY836366 | Transducin/WD40 repeat-like superfamily protein | 0.75 | 1.90 | 1.17 | 0.01 | 0.20 | −0.07 | Signal transduction, regulation of transcription, cell cycle control, apoptosis | Cassava4.1_011123m
AY836366 | Transducin family protein/WD-40 repeat family protein | −0.49 | −0.51 | −1.20 | 0.22 | 0.00 | 0.39 | Signal transduction, regulation of transcription, cell cycle control, apoptosis | Cassava4.1_003604m
AY836366 | DNAJ heat shock N-terminal domain-containing protein | 1.27 | −1.80 | NA | −0.07 | 0.12 | −0.77 | Hsp40, mediates protein translation, translocation, degradation, folding and unfolding by stimulating ATPase activity of Hsp70 | Cassava4.1_034290m
AY836367 | Helicase domain-containing protein | 1.12 | −0.90 | 2.14 | 0.04 | −0.25 | 0.46 | Involved in unwinding of nucleic acids | Cassava4.1_025672m
AY836367 | Tetraspanin2 | 1.22 | 1.84 | 2.08 | −0.51 | −0.41 | 0.66 | Transmembrane molecular facilitator proteins mediating stability of signalling complexes | Cassava4.1_013901m
AY836367 | Ubiquitin system component Cue protein | −0.49 | −1.19 | NA | 1.05 | −1.46 | 2.82 | Binding ubiquitin-conjugating enzymes; participates in signal transduction pathways | Cassava4.1_018336m
AY836367 | ATP-dependent RNA helicase, putative | −1.21 | −4.00 | −2.19 | −0.90 | 1.19 | −0.30 | Protein & nucleic acid binding; translation initiation and regulation - binding of ribosomes to mRNA; hydrolase activity | Cassava4.1_023764m
AY836367 | Xyloglucan endotransglycosylase 6 | 1.92 | −0.90 | NA | −0.31 | −2.36 | −0.27 | Hydrolase important during seed germination, fruit ripening, and rapid wall expansion | Cassava4.1_013520m
AY836367 | Pentatricopeptide repeat (PPR) superfamily protein | −0.35 | −0.15 | −2.03 | 0.28 | 0.03 | −0.07 | RNA-binding; posttranscriptional functions e.g. RNA editing, RNA splicing, RNA cleavage, translation; required for many developmental processes | Cassava4.1_007265m
AY836367 | F-box family protein | 0.92 | −1.32 | 2.14 | −1.10 | −0.14 | NA | Components of ubiquitin-ligase complexes; protein–protein interaction; binds substrates for ubiquitin-mediated proteolysis | Cassava4.1_029497m
AY836367 | Early nodulin-like protein 17 | 0.08 | 0.10 | 2.42 | −0.25 | 0.22 | 0.56 | Membrane-anchored; electron transfer activity; implicated in infection and nodule development | Cassava4.1_017364m
AY836367 | UDP-glucosyl transferase 76E11 | 2.92 | −0.58 | NA | −0.16 | −1.14 | 1.32 | Detoxification of endogenous and exogenous substrates by transfer of glycosyl residues | Cassava4.1_027442m
AY836367 | Tetratricopeptide repeat (TPR)-like superfamily protein | 1.31 | −0.07 | NA | 0.50 | 0.12 | 2.32 | A structural motif that mediates protein–protein interaction and assembly of multi-protein complexes | Cassava4.1_030232m
AY836367 | GDSL-like Lipase/Acylhydrolase superfamily protein | 0.19 | −3.64 | −4.07 | −0.65 | −3.31 | 0.73 | Lipase activity; hydrolase activity; lipid metabolism | Cassava4.1_029883m
AY836367 | Plastid division2 | 0.92 | 0.47 | 2.37 | −1.16 | −0.46 | 0.47 | Chloroplast differentiation | Cassava4.1_029977m
AY836367 | P-loop containing nucleoside triphosphate hydrolases superfamily protein | 2.36 | −0.67 | 1.57 | 0.57 | 0.41 | 0.50 | Nucleotide-binding protein fold present in many ATPase or GTPase kinase and helicase classes | Cassava4.1_001994m
AY836367 | Thioesterase superfamily protein | 0.92 | −1.90 | 2.95 | 0.81 | −0.46 | −0.62 | Thio-ester hydrolases | Cassava4.1_024762m
AY836367 | Mitogen-activated protein kinase kinase kinase 15 | NA | −0.32 | NA | −1.54 | 3.40 | 0.92 | Direct cellular responses to stimuli: regulate cellular functions such as proliferation, gene expression, differentiation, mitosis, cell survival and apoptosis | Cassava4.1_026704m
AY836367 | ARM-repeat superfamily protein | −1.28 | −5.21 | −4.92 | −1.81 | 0.71 | −0.69 | Repetitive amino acid sequence; diverse cellular locations, diverse binding partners, diverse functions; intracellular signalling, cytoskeletal regulation | Cassava4.1_007562m
AY836367 | Integrase-type DNA-binding superfamily protein | NA | −1.13 | NA | 2.75 | 0.86 | 0.47 | Sequence-specific DNA binding, transcription factor activity; plant growth and development, disease response | Cassava4.1_025138m
AY836367 | RNA helicase family protein | −0.37 | −0.04 | −0.99 | 0.11 | 0.03 | 0.29 | Protein & nucleic acid binding; translation initiation and regulation - binding of ribosomes to mRNA; hydrolase activity | Cassava4.1_002333m
AY836367 | RNA-directed DNA methylation 4 | 0.42 | −1.69 | NA | −0.12 | −0.54 | 0.54 | Transcriptional repression of transposons and genes, implicated biotic & abiotic stress responses, developmental stage regulation | Cassava4.1_010120m
AY836367 | S-adenosyl-l-methionine-dependent methyltransferases superfamily protein | −1.18 | 0.13 | NA | 0.35 | −0.61 | −0.17 | Methylation of proteins, lipids & nucleic acids | Cassava4.1_012626m
AY836367 | O-methyltransferase 1 | −0.27 | 1.72 | 1.81 | 0.08 | 0.46 | 0.57 | Amino acid, flavonol, sterol & lignin biosynthetic processes, response to infection & wounding | Cassava4.1_013376m
AY836367 | RNA methyltransferase family protein | −0.81 | 0.47 | NA | 0.70 | 1.12 | −0.20 | RNA binding, modification by methyltransferase activity from S-adenosyl methionine to RNA | Cassava4.1_024524m
AY836367 | Histone-lysine N-methyltransferase ASHH3 (a) | −0.82 | −0.10 | NA | 0.51 | 0.02 | 0.59 | Lysine degradation, chromatin modification | Cassava4.1_009844m
AY836367 | NB-ARC domain-containing disease resistance protein | 0.77 | −0.47 | NA | −0.15 | −0.18 | 0.65 | ATPase-dependent nucleotide binding domain; interacts with R proteins to mediate disease resistance | Cassava4.1_000657m
AY836367 | Transducin/WD40 repeat-like superfamily protein (a) | −0.15 | −0.61 | NA | 0.47 | −0.10 | 0.71 | Signal transduction, regulation of transcription, cell cycle control, apoptosis | Cassava4.1_011010m
AY836367 | Transducin/WD40 repeat-like superfamily protein (a) | 0.01 | 0.19 | NA | −0.14 | −0.75 | 0.50 | Signal transduction, regulation of transcription, cell cycle control, apoptosis | Cassava4.1_002733m
AY836367 | WRKY DNA-binding protein 3 | −0.73 | −0.15 | NA | 0.07 | 0.39 | −0.23 | Sequence-specific DNA-binding transcription factor, transcriptional reprogramming in plant defence responses & hormone signalling | Cassava4.1_005267m
AY836367 | ACC oxidase 1 | −0.08 | −0.64 | NA | −1.65 | −0.85 | −1.14 | Ethylene-forming oxidoreductase | Cassava4.1_012494m
AY836367 | Protein kinase superfamily protein | −0.03 | 0.05 | NA | −0.32 | −0.19 | −0.86 | Posttranslational modification by phosphorylation changing enzyme activity, cellular location, or association with other proteins; critical in signal transduction cascades | Cassava4.1_003640m
AY836367 | Ubiquitin-like superfamily protein | 0.51 | −1.32 | NA | −0.01 | −0.99 | −1.49 | Protein trafficking, autophagy, proteasome-mediated proteolysis; regulation of cellular functions | Cassava4.1_033633m
AY836367 | Senescence-associated E3 ubiquitin ligase 1 | 0.77 | −0.61 | NA | 0.54 | 0.37 | −0.68 | Ubiquitin-dependent degradation; prevents premature senescence; participates in abscisic acid biosynthesis | Cassava4.1_001725m
AY836367 | Rab5-interacting family protein (a) | 1.02 | −0.10 | NA | 0.40 | 0.15 | 0.80 | A membrane-anchored ras-related GTPase, regulates vesiculo-tubular transport | Cassava4.1_029752m
AY836367 | SKP1/ASK-interacting protein 5 | −0.80 | −0.96 | NA | 0.59 | 0.64 | 0.21 | Mediates proteosomal binding with ubiquitin ligase complex; proteosomal degradation of targets; involved in suppression of gene silencing; interact with specific F-box proteins | Cassava4.1_029434m
AY836367 | Heat-shock protein DnaJ with tetratricopeptide repeat | 0.32 | −0.02 | −0.93 | −0.49 | 0.49 | 0.42 | HSP chaperone cofactor interaction via TTP repeat domain; involved in stress response | Cassava4.1_025931m
AY836367 | Cyclin family | 0.66 | −0.15 | NA | 0.95 | −0.91 | 0.73 | Control cell cycle progression and differentiation via cyclin-dependent kinase (Cdk) enzymes | Cassava4.1_028232m

At least one of the values is −1 ≥ log2 ratio ≥ 1, representing a ± twofold change in gene expression
(a) Gene sequences overlapping with SEGS


were mostly in +/+ orientation relative to the episomal SEGS query sequences. Unrelated gene sequences contained similar stretches of iSEGS, including the GC-rich sequences, suggesting a conserved function for these sequence stretches or a common biological role for the sequence motifs. The iSEGS-associated cassava ESTs had assigned functions such as signal recognition, sulphite oxidase activity, RNA helicase and RNA/nucleotide binding, prefoldin, kinesin, dihydropicolinate reductase, and quinolinate phosphoribosyltransferase (Online Resources 9, 10), largely in agreement with previously mentioned GO categories (Fig. 4; Online Resource 7).

Expression analysis of iSEGS

Reverse transcriptase (RT)-PCR analyses confirmed that iSEGS are expressed. Amplicons were of the expected sizes when the primer pairs annealed within the iSEGS (Fig. 5a, b). When primers were designed to span iSEGS1 and an adjacent prefoldin 6 subunit gene (cassava4.1_019094m), or iSEGS2 and an adjacent helicase gene (cassava4.1_025672m), the expected product sizes from cDNA could not be reliably ascertained. Non-specific amplicons were produced from the iSEGS1 and prefoldin primers (Fig. 5c). However, several prefoldin subunit sequences were among the SEGS-containing ESTs identified (Online Resource 9), and transcripts from this gene were differentially expressed in SACMV-infected cassava (Online Resource 11). Helicase and iSEGS2 primers produced ~700 bp amplicons in TME1, TME3 and cv.60444 (Fig. 5d).

Expression of iSEGS‑associated genes during SACMV infection

To investigate the possible involvement of iSEGS in CMD, transcriptome data from SACMV-infected cassava at 12, 32 and 67 days post-inoculation (dpi) (Allie et al. 2014) were analysed. The expression of iSEGS-associated genes in two landraces, susceptible T200 and tolerant TME3, showed differential expression (DE) during the course of infection (Online Resources 11 and 12 for iSEGS1 and iSEGS2, respectively). In general, gene expression in the tolerant TME3 was largely unperturbed compared to the changes observed in the CMD-susceptible T200. The number of DE genes associated with iSEGS2 was higher, concomitant with the larger number of iSEGS2 homologues within the genome. Some of the iSEGS-associated DE genes were transcripts coding for prefoldin subunits, helicase-domain

Fig. 5 Expression analysis of iSEGS in different cassava germplasm. a, b Primers were designed to amplify within the largest integrated fragments of iSEGS1 and iSEGS2, respectively. c, d Primers spanning the iSEGS1 insertion and the neighbouring prefoldin 6 subunit gene, or the iSEGS2 insertion and the neighbouring helicase gene, respectively. Lanes 1–4 cDNA from cassava accessions TME1, TME3, T200 and cv.60444, respectively; Marker, molecular weight marker, O’GeneRuler 1 kb plus DNA ladder (Thermo Scientific); Plasmid 1 and Plasmid 2 refer to plasmids containing dimers of eSEGS1 or eSEGS2, respectively, and NTC refers to no template controls. PCR was performed on ~250 ng gDNA or cDNA


proteins, ATP-dependent RNA helicase, RNA-binding family proteins, GDSL-like lipase/acylhydrolase superfamily proteins, ARM-repeat superfamily protein, and an integrase-type DNA-binding protein (Table 3 and Online Resources 11, 12). In general, genes involved in nucleic acid binding and synthesis were down-regulated in virus-infected plants relative to mock-inoculated controls. DNA- and RNA-dependent polymerases, helicases, DNA- and RNA-binding proteins, and ribosomal RNA processing proteins, amongst others, showed substantial changes in expression levels in T200 that were not matched in the TME3 expression data. Notably, however, of the total number of DE genes, the percentage of iSEGS2-associated genes was higher in tolerant TME3 (53, 31 and 50 % at 12, 32 and 67 dpi, respectively) than in T200 (11, 5 and 1 % at 12, 32 and 67 dpi, respectively) (Fig. 6). The percentage of iSEGS1-associated DE genes was low (1–2 %) for both landraces. Of particular interest in relation to the role of iSEGS in CMD was that many iSEGS-associated genes were involved in biotic stress responses, such as heat-shock proteins, ubiquitinylation-associated proteins, enzymes for the prevention of early senescence (mediated by ethylene and/or ubiquitin), R gene-associated NB-ARC domains, enzymes for chromatin and histone modification, RNA/DNA-modifying enzymes and enzymes for posttranslational modification of proteins. Also of interest was the identification of hydrolase activity as one of the named classes of iSEGS-associated enzymes. Hydrolases are implicated in hypersensitive reaction-induced necrosis, leading to disease resistance by restricting spread of the invading pathogens (Guo et al. 1998). Other interesting iSEGS-associated genes implicated in host response to pathogens are transducin/WD40 repeat-like and protein kinase superfamily proteins (SEGS1; scaffold 04116) and NB-ARC domain-containing disease R protein (SEGS2; scaffold 12711), as these are also involved in resistance to plant pathogens (Cai et al. 2008).
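The two quantities used in this comparison, a twofold-change call from log2 ratios and the share of DE genes that are iSEGS-associated, can be illustrated with a short Python sketch. It is a hedged illustration rather than the analysis pipeline of Allie et al. (2014). The function names and the gene identifiers g1 to g9 are hypothetical, missing values (NA) are represented as None, and the only values taken from the paper are the Prefoldin 6 log2 ratios listed in Table 3.

```python
def is_differentially_expressed(log2_ratios, threshold=1.0):
    """True if any time point shows at least a twofold change (|log2 ratio| >= 1)."""
    return any(v is not None and abs(v) >= threshold for v in log2_ratios)


def percent_associated(de_genes, associated_genes):
    """Percentage of DE genes that also belong to the iSEGS-associated set."""
    de = set(de_genes)
    if not de:
        return 0.0
    return 100.0 * len(de & set(associated_genes)) / len(de)


# Prefoldin 6 log2 ratios at 12, 32 and 67 dpi as listed in Table 3.
prefoldin_t200 = is_differentially_expressed([0.41, -3.41, None])
prefoldin_tme3 = is_differentially_expressed([0.46, -0.88, 0.85])

# Hypothetical gene sets: two of the four DE genes are iSEGS-associated.
share = percent_associated({"g1", "g2", "g3", "g4"}, {"g2", "g4", "g9"})
```

Under this threshold Prefoldin 6 counts as DE in T200 but not in TME3, in line with the largely unperturbed expression described for the tolerant landrace.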

Small RNAs targeting SEGS in mock‑inoculated and SACMV‑infected cassava

To identify sRNAs targeting SEGS, six small RNA-enriched libraries were each generated from mock-inoculated and SACMV-infected cassava leaves that were collected from susceptible T200 and tolerant TME3, at 12, 32 and 67 dpi, and sequenced using the Illumina HiSeq2000 system. The small RNA (sRNA) sequencing raw reads and trimmed read counts, after removing low-quality sequences, adapters, and small sequences (<15 nt), for the twelve libraries are shown in Online Resource 13. In addition, homologous normalised total and unique sRNAs (18–26 nt) from mock and infected leaf tissues targeting SEGS were quantified (Online Resource 13). RNA-Seq data from mock-inoculated and SACMV-infected cassava showed that the relative abundances of unique SEGS1 and 2-derived sRNAs were higher in SACMV-infected susceptible T200 compared to tolerant TME3 at 67 dpi (Fig. 7a, c). Interestingly, unique and total SEGS1 and 2 sRNA counts were generally higher in frequency in SACMV-infected T200 compared with mock-inoculated leaves (Fig. 7a, c), whereas no difference, or the opposite pattern, was observed for TME3. Total and unique 18–26 nt sRNA SEGS-mapped reads were significantly more abundant for SEGS2 compared to SEGS1 in both susceptible and tolerant landraces (Fig. 7b, d). We also note that the differences in sRNA abundances were more apparent in the unique (Fig. 7a, c) than the total counts (Fig. 7b, d). The distribution of perfectly matched sRNAs showed apparent enrichment of the 24-nt population in all four groups (SEGS1 or SEGS2 for T200 and TME3) (data not shown).
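The distinction between total and unique sRNA counts per size class can be made concrete with a minimal Python sketch. It assumes that the reads have already been trimmed and mapped to SEGS and are supplied as plain sequence strings. The function name and the toy reads are hypothetical, and library-size normalisation is left out.

```python
from collections import Counter


def size_distribution(reads, min_len=18, max_len=26):
    """Count total and unique reads per length within the 18-26 nt window.

    Total counts include every occurrence of a read; unique counts
    include each distinct sequence once.
    """
    total = Counter()
    unique = Counter()
    seen = set()
    for read in reads:
        n = len(read)
        if not (min_len <= n <= max_len):
            continue
        total[n] += 1
        if read not in seen:
            seen.add(read)
            unique[n] += 1
    return total, unique


# Toy input: one 24-nt sequence sequenced twice, another 24-nt sequence once,
# one 21-nt read, and one 10-nt read that falls outside the size window.
toy_reads = ["A" * 24, "A" * 24, "C" * 24, "G" * 21, "T" * 10]
total_counts, unique_counts = size_distribution(toy_reads)
```

Because a highly abundant sequence contributes many times to the total count but only once to the unique count, the two measures can diverge, which is why both are reported in Fig. 7.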

Complementary to the above analysis, comparison of iSEGS to published small RNA datasets from cassava varieties MBRA685 and TAI16 (Pérez-Quintero et al. 2012; Ballen-Taborda et al. 2013) identified 15 small RNAs (sizes 20–50 bp) with sequence identities between 90 and 100 % to different regions of SEGS1 (8 sequences) and SEGS2 (7 sequences).

Discussion

Since the discovery of two episomal single-stranded DNA sequences, termed Sequences Enhancing Geminivirus Symptoms (eSEGS1 and eSEGS2), from field-infected cassava exhibiting severe CMD symptoms (Ndunguru 2006; Ndunguru et al. 2008), the nature and function of these sequence elements have been elusive. Our study provides bioinformatics and experimental evidence that SEGS are integrated as multiple fragmented copies in the cassava genome, and resemble non-autonomous transposable elements whose regulation is linked to alterations in CMD

Fig. 6 The percentage (out of the total number of DE genes) of SEGS-associated DE genes in SACMV-infected T200 and TME3 for each time point (12, 32 and 67 dpi)


aetiology. Although the replication of eSEGS molecules has not been conclusively demonstrated, it is clear that in the presence of ACMV-CM or EACMCV, inoculation with eSEGS clones induces enhanced severe symptoms in cassava and N. benthamiana (Ndunguru et al. 2008). Of particular importance, CMD tolerance in the well-known West African landrace TME3 appeared to be compromised when eSEGS were co-bombarded with the begomovirus infectious clones. Interestingly, susceptible Arabidopsis ecotype plants co-inoculated with ACMV-CM and episomal SEGS also showed enhanced CMD symptoms compared to inoculations with the cassava begomovirus alone (de Leòn et al. 2013). In addition, begomovirus DNA accumulated to higher levels in a susceptible Arabidopsis ecotype, but a resistant Arabidopsis ecotype showed no symptoms in the presence or absence of the eSEGS. Since no iSEGS occur in the Arabidopsis genome, we suggest that the effect of exogenous SEGS on CMD symptoms may be due to endogenization of the eSEGS into the Arabidopsis genome, and/or regulation by eSEGS via the RNA silencing pathways. One of the questions which emerged from the discovery of SEGS fragments in the cassava EST database was whether these novel sequences were present in cassava germplasm from different sources or even in other plant species. Several analysed germplasm (South American cultivars and African landraces) in our collection contained multiple iSEGS1 and iSEGS2 copies of varying sizes and sequence conservation (including the GC-rich region). We have also uncovered iSEGS widely distributed in a range of germplasm from field samples collected in Rwanda, Burundi and Tanzania (Ndomba 2013; Mollel 2014), suggesting that the presence of iSEGS is a highly conserved feature of cassava genomes. Phylogenetic analysis showed no particular clustering of iSEGS according to germplasm of origin, suggesting that for the evolution of these sequences, cross-breeding of cultivars may have been more important than divergence through duplication and point mutations. As the divergence between PCR amplicons was often greater within than across accessions, SEGS integrations may be ancestral, may have been gained repeatedly, or are not actively duplicating.

Integrated SEGS may also be widely represented in other plant species where they play a role in modulation of disease. eSEGS1 from cassava has been described, as a 1019-nt amplicon (EU862815), in a study on leaf deformity in mint caused by a geminivirus–satellite complex (Borah et al. 2011). This molecule, named Mentha leaf deformity-associated DNA-II, showed 98 % nucleotide similarity to eSEGS1 (DNA-II, AY836366.1). It was proposed that the amplicon contributed to the severe deformity observed during geminivirus-betasatellite infection. In another study, de Leòn et al. (2013) reported that episomal SEGS function with a heterologous geminivirus, Cabbage leaf curl virus (CaLCuV), in a resistant Arabidopsis ecotype. Plants accumulated CaLCuV when co-inoculated with eSEGS clones, but no symptoms or viral nucleic acids were detected when inoculations contained only the virus or the eSEGS. Significantly, this supports our hypothesis that exogenous

Fig. 7 Size distribution of unique small RNAs mapping to SEGS. Small RNA sizes and abundance in mock-inoculated and SACMV-infected cassava representing values for perfect matches mapping to genomic segments of eSEGS1 or eSEGS2 are shown for T200 or TME3


SEGS are functioning through interaction with the genome, and not with the begomoviruses. While their molecular action needs further investigation, SEGS somehow modulate symptom phenotype by direct or indirect interference with plant resistance mechanisms or transcriptome changes induced by begomoviruses, leading to alteration of gene expression and enhanced symptoms.

iSEGS found in the cassava genome share some features with the MITE class of TEs. As with SEGS, (1) MITEs lack open reading frames (ORFs) and where ORFs are present, they code for short peptides of unknown functions, and (2) they are found in close proximity to gene coding sequences, and are often co-transcribed with the plant genes (Lu et al. 2012). However, there are notable differences between SEGS and most classes of MITEs characterised to date: (1) MITEs are generally smaller (<600 bp) than the near full-length iSEGS (>1 kb), (2) the copy numbers of iSEGS are low (hundreds) compared to up to tens of thousands of copies reported for some MITEs (Oki et al. 2008; Paterson et al. 2009), and (3) terminal inverted repeats (TIRs) and target site duplications (TSDs), which are characteristic features of MITEs, could not be identified for most iSEGS. TIRs were only observed for the largest copy of iSEGS1, for which the putative TSDs are short and are not directly adjacent to the TIRs (Online Resource 6). Furthermore, using MITE-Hunter (Han and Wessler 2010), none of the possible 229 MITE families identified in the cassava genome corresponded to iSEGS.
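As an illustration of the TIR criterion discussed above, the following minimal Python sketch measures the length of a perfect terminal inverted repeat by comparing the 5′ end of a sequence with the reverse complement of its 3′ end. It is not the procedure used in this study (MITE-Hunter was used for MITE discovery); the function name `tir_length`, the `max_len` cut-off and the toy sequence are assumptions made for illustration only, and genuine TIRs often tolerate mismatches that this exact-match check would miss.

```python
# Hypothetical helper: exact-match check for terminal inverted repeats (TIRs).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def tir_length(seq: str, max_len: int = 50) -> int:
    """Return the length of the perfect TIR at the ends of seq (0 if none).

    Position i from the 5' end must equal the complement of position i
    from the 3' end; the scan stops at the first mismatch or ambiguous base.
    """
    seq = seq.upper()
    limit = min(max_len, len(seq) // 2)  # the two arms must not overlap
    length = 0
    for i in range(limit):
        if seq[i] != COMPLEMENT.get(seq[-1 - i]):
            break
        length += 1
    return length


# Toy element: 8-nt arms (GGCCAGTC ... GACTGGCC) around an unrelated core.
toy_element = "GGCCAGTC" + "AAAAACCCCC" + "GACTGGCC"
print(tir_length(toy_element))
```

A sequence without complementary ends, such as a homopolymer, returns 0 under this check.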

Due to their proximity to genes, MITEs are implicated in regulation of gene expression. RNA from transcribed MITEs can regulate gene expression through RNA-directed gene methylation of adjacent genes (Buchmann et al. 2009), and/or through Post Transcriptional Gene Silencing (PTGS) of homologous mRNAs. Both mechanisms involve small RNAs originating either from the miRNA or small interfering RNA (siRNA) pathways. Differential abundances of SEGS-derived sRNAs between T200 and TME3, as well as between SACMV-infected and mock-inoculated cassava were detected, and further support a role for SEGS in CMD modulation. Furthermore, SEGS sRNAs were more frequent in susceptible T200 compared to TME3, suggesting that SACMV causes an increase in SEGS expression, or that, since the differences are more apparent in the unique than the total counts, these sRNAs could be derived via RNA silencing dicer-mediated cleavage of longer transcripts. The 24-nt class was most abundant and enriched at 32 dpi in both landraces. The 24-nt group, representing heterochromatic or repeat-associated siRNAs, is known to guide DNA methylation and histone modification of repetitive sequences (Xie et al. 2004) rather than mediating mRNA cleavage. Although dual coding siRNAs and miRNAs from TEs have been reported (Piriyapongsa and Jordan 2008), the absence of the 21 nt size

group, predominantly miRNA, corroborates the failure to identify precursor miRNA signatures within the SEGS. In agreement with our findings, we uncovered small RNAs with homology to iSEGS1 and iSEGS2 in RNA-seq data by Pérez-Quintero et al. (2012) and Ballen-Taborda et al. (2013), from cassava under biotic and abiotic stress, respectively, although no miRNA precursors could be identified. These results suggest a possible transcriptional gene silencing role for the iSEGS rather than PTGS, although PTGS cannot be ruled out.
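The size-class comparison described above can be sketched as follows, assuming that reads have already been adapter-trimmed and filtered to perfect matches against SEGS. The function name `size_distribution`, the 18–26 nt window and the toy reads are illustrative assumptions, not the pipeline used in this study. Total counts tally every read, whereas unique counts tally each distinct sequence once.

```python
from collections import Counter
from typing import Iterable, Tuple


def size_distribution(
    reads: Iterable[str], min_len: int = 18, max_len: int = 26
) -> Tuple[Counter, Counter]:
    """Return (total, unique) read counts per small RNA length.

    total[n]  = number of reads of length n (redundant counts)
    unique[n] = number of distinct sequences of length n
    Reads outside the assumed min_len..max_len window are ignored.
    """
    kept = [r.upper() for r in reads if min_len <= len(r) <= max_len]
    total = Counter(len(r) for r in kept)
    unique = Counter(len(r) for r in set(kept))
    return total, unique


# Toy input: three 24-nt reads (two identical), one 21-nt read, one 30-nt read.
toy_reads = ["A" * 24, "A" * 24, "C" * 24, "G" * 21, "T" * 30]
total, unique = size_distribution(toy_reads)
for size in sorted(total):
    print(size, total[size], unique[size])
```

Comparing the two tallies per size class indicates whether an apparent enrichment reflects many distinct sequences or a few highly abundant ones.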

Reports on the characterisation of TEs in the cassava genome are few. Gbadegesin et al. (2008) identified diverse En/Spm-like elements (Meens) and LTR-retrotransposons. However, such annotations are not yet fully represented in the draft genome on Phytozome, impeding assignment of iSEGS elements to definitive classes of repetitive elements. In the absence of TIRs, except in the near full-length iSEGS1, the involvement of a trans-acting transposase (a classical feature of non-autonomous DNA TEs) cannot be supported, and an alternative dispersal mechanism utilising rolling circle amplification (RCA) of SEGS, followed by detachment of amplicons of varied sizes and subsequent integration in other genome locations is proposed in this study. The presence of inverted-repeat sequences conducive to intra-strand recombination is supported by complex cruciform structures obtained in Mfold analyses of iSEGS (Online Resource 12). The GC-rich regions of SEGS and stable stem/loop structures may lead to the formation of proposed extra-chromosomal circles, leading to RCA and reintegration of copies into other genome locations by homologous or illegitimate recombination; however, the recombination potential and thermodynamic stability of the cruciform and hairpin structures from iSEGS require further characterisation using predetermined stretches of 50–200 bp. Furthermore, the extensive intra-strand base pairing observed strongly supports our hypothesis that iSEGS are able to regulate gene expression at the nucleotide or transcript level. DNA and RNA secondary structures are known to interact with proteins, affecting cellular functions such as recombination, replication and transcription (Hatfield and Benham 2002). Notably, the fragmented nature of the iSEGS, the observed core sequence conservation between PCR amplicons, and the proposed RCA-mediated dispersal are characteristics of helitrons, a recently described class of transposons (Kapitonov and Jurka 2007; Thomas et al. 2010).

The functional significance of integrated SEGS was explored in light of the proposed role of episomal SEGS in exacerbation of CMD symptoms. Our results suggest several layers of regulatory potential. First, the location of iSEGS in intergenic regions, especially close to or overlapping with 5′ or 3′ UTRs, may affect the expression of neighbouring genes due to proximity effects or alteration

of gene sequences (Jurka 2008a; Lisch 2009). Repetitive sequences, such as transposons and satellite DNA elements, are known agents of gene regulation by insertion mutations. Second, iSEGS-derived transcripts and transcription of some neighbouring genes in cassava were demonstrated by RT-PCR, corroborating the presence of iSEGS-like sequences in 5′ regions of several transcripts in EST libraries. The patterns of integration in genome locations, and ESTs, suggest regulation of associated genes by iSEGS because many unrelated genes contain similar iSEGS stretches. Third, iSEGS-derived small RNAs were identified in RNA-Seq data from mock-inoculated and SACMV-infected cassava leaves.

Analysis of selected iSEGS-associated genes that were differentially expressed during SACMV infection also showed a preponderance of GO functions associated with regulation of processes at nucleic acid and protein levels. Genes with nuclear localised biological functions are often related to regulatory networks or cell cycle reprogramming and developmental changes. Prefoldin acts on DNA-binding proteins in the nucleus, thus playing a role in transcriptional regulation, and interestingly, many prefoldin-interacting partners are iSEGS-associated and were differentially expressed during SACMV infection. It is noteworthy that SACMV replicates in the nucleus (Hanley-Bowdoin et al. 2013) and geminivirus DNA is often methylated by host innate immune response mechanisms (Raja et al. 2008). The overarching properties of iSEGS-associated genes in cassava were regulation of cellular functions and reprogramming global cellular processes. Differential expression analysis during SACMV infection of two landraces, T200 and TME3, illustrates how gene expression and silencing mechanisms of disease response-associated genes may be different in disparate genomic backgrounds. Overall, SACMV-susceptible T200 showed greater perturbation in gene expression compared to tolerant TME3 (Allie et al. 2014). Integrated SEGS-associated genes were differentially expressed in SACMV-infected compared to mock-inoculated leaves. Differential expression was also observed during the progression of infection, demonstrating that genes neighbouring iSEGS are responsive to the begomovirus infection. Notably, amongst the most significant differentially expressed iSEGS-associated proximal genes (Table 3), several proteins have also been reported in other geminivirus infections. A NAC domain-containing protein (Cassava 4.1_025277m) was highly enriched in tolerant TME3 at the stage of recovery (67 dpi). NAC transcription factors are major transcriptional regulators in plants (Nuruzzaman et al. 2013), and numerous members of this multigene family play roles in regulation of biotic stress responses, including geminivirus infection (Selth et al. 2005). Up-regulation of a helicase-containing protein

(Cassava 4.1_025672m), in T200 at 67 dpi, may facilitate SACMV rolling circle replication (RCR), as these proteins are known to be involved in cell cycling and RCR. Sahu et al. (2010) also found up-regulation of a DNA2-NAM7 helicase family protein in a Tomato leaf curl New Delhi virus (ToLCNDV)-tolerant tomato cultivar. While DE analysis can identify genes that are potentially regulated by iSEGS, pointing to a causal relationship in disease modulation, the role of geminivirus infection in the dynamics of these elements remains unclear. It is not known how geminivirus infection activates or represses the transcription of iSEGS. In addition, geminiviruses are known to suppress host defences at both transcriptional and posttranscriptional levels (Buchmann et al. 2009). How such virus-mediated silencing mechanisms affect the transcription of iSEGS needs further investigation.

We conclude from in silico and NGS studies that the iSEGS resemble non-autonomous transposable elements, but show unique features that make their classification challenging. A relationship between iSEGS, gene expression and CMD infection was established in this study. We conclude that iSEGS are modulators of genome function, possibly via transcriptional or posttranscriptional silencing of host genes, irrespective of the geminivirus-associated disease, because exogenous episomal SEGS exacerbate symptoms with heterologous geminiviruses (Borah et al. 2011; de Leòn et al. 2013) in plant species with (cassava) or without (A. thaliana and N. benthamiana) iSEGS. In either case, this is important for future studies since iSEGS influence geminivirus-responsive host genes and subsequent disease aetiology. We suggest that an RCA mechanism, as proposed for helitrons, may explain the dispersal of iSEGS within the cassava genome. Elucidating the functions of iSEGS-associated differentially expressed genes in CMD would prove invaluable, and the scope of this study needs to be extended to investigate these relationships further. Additionally, dynamic methylation changes within transposons can regulate proximal genes in response to virus infection (Dowen et al. 2012). Pathogen-induced transcriptome networks and DNA methylomes are linked to gene expression regulation, and it would not be unreasonable to hypothesise that SACMV-induced symptom modulation may be attributable to methylation changes within transposable-like iSEGS (as evidenced by abundant 24-nt sRNAs in T200 and TME3) that may regulate neighbouring or proximal genes. A study of the DNA methylome, in relation to iSEGS and cassava geminiviruses, would prove worthwhile.

Acknowledgments This project was supported by Grants from the National Research Foundation Competitive Grant and the International Center for Genetic Engineering and Biotechnology, Trieste. ATM was supported by Claude Leon Foundation and NRF-South

Africa. We would like to thank Dr. Louis Bengyella for assistance in the phylogenetic analysis.

Conflict of interest The authors declare no conflict of interest.

References

Allie F, Pierce EJ, Okoniewski MJ, Rey C (2014) Transcriptional analysis of South African cassava mosaic virus-infected susceptible and tolerant landraces of cassava highlights differences in resistance, basal defense and cell wall associated genes during infection. BMC Genom 15:1006

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410

Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402

Ballen-Taborda C, Plata G, Ayling S, Guez-Zapata F, Lopez-Lavalle B, Luis A, Duitama J, Tohme J (2013) Identification of Cassava MicroRNAs under Abiotic Stress. Int J Genomics. doi:10.1155/2013/857986

Bao W, Jurka J (2008) CR1 families from Hydra magnipapillata. Repbase Reports 8:1845

Bao W, Jurka J (2013) LTR retrotransposons from the red seaweed. Repbase Reports 13:2407

Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D (2006) A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441:87–90

Berrie LC, Rybicki EP, Rey ME (2001) Complete nucleotide sequence and host range of South African cassava mosaic virus: further evidence for recombination amongst begomoviruses. J Gen Virol 82:53–58

Borah BK, Cheema GS, Gill CK, Dasgupta I (2011) A Geminivirus–Satellite Complex is associated with leaf deformity of Mentha (Mint) plants in Punjab. Indian J Virol 21:103–109

Briddon RW, Bull SE, Mansoor S, Amin I, Markham PG (2002) Universal primers for the PCR-mediated amplification of DNA beta: a molecule associated with some monopartite begomoviruses. Mol Biotechnol 20:315–318

Brown J, Fauquet C, Briddon R, Zerbini M, Navas-Castillo J (2011) Family Geminiviridae, 1st edn. Elsevier-Academic, Amsterdam, pp 351–373

Buchmann RC, Asad S, Wolf JN, Mohannath G, Bisaro DM (2009) Geminivirus AL2 and L2 proteins suppress transcriptional gene silencing and cause genome-wide reductions in cytosine methylation. J Virol 83:5005–5013

Cai M, Qiu D, Yuan T, Ding X, Li H, Duan L, Xu C, Li X, Wang S (2008) Identification of novel pathogen-responsive cis-elements and their binding proteins in the promoter of OsWRKY13, a gene regulating rice disease resistance. Plant Cell Environ 31:86–96

Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W, Iyer VN, Pollard DA, Sackton TB, Larracuente AM, Singh ND, Abad JP, Abt DN, Adryan B, Aguade M, Akashi H, Anderson WW, Drosophila 12 Genomes Consortium et al (2007) Evolution of genes and genomes on the Drosophila phylogeny. Nature 450:203–218

Collén J, Porcel B, Carré W, Ball SG, Chaparro C, Tonon T, Barbeyron T, Michel G, Noel B, Valentin K, Elias M, Artiguenave F, Arun A, Aury J-M, Barbosa-Neto JF, Bothwell JH, Bouget F-Y, Brillet L, Cabello-Hurtado F, Capella-Gutiérrez S et al (2013) Genome structure and metabolic features in the red seaweed Chondrus crispus shed light on evolution of the Archaeplastida. Proc Natl Acad Sci 110:5247–5252

De Leòn L, Dallas L, Ascencio-Ibáñez J, Sseruwagi P, Robertson D, Ndunguru J, Hanley-Bowdoin L (2013) Two CMD-associated DNA sequences enhance geminivirus symptoms and break resistance in cassava and Arabidopsis. 7th Int. Geminivirus Symp. 5th Int. ssDNA Comp. Virol. Work. Hangzhou, China, p 86

Dowen RH, Pelizzola M, Schmitz RJ, Lister R, Dowen JM, Nery JR, Dixon JE, Ecker JR (2012) From the Cover: PNAS Plus: widespread dynamic DNA methylation in response to biotic stress. Proc Natl Acad Sci 109:E2183–E2191

Fondong VN, Pita JS, Rey MEC, de Kochko A, Beachy RN, Fauquet CM (2000) Evidence of synergism between African cassava mosaic virus and a new double-recombinant geminivirus infecting cassava in Cameroon. J Gen Virol 81:287–297

Fregene M, Okogbenin E, Mba C, Angel F, Suarez MC, Janneth G, Chavarriaga P, Roca W, Bonierbale M, Tohme J (2001) Genome mapping in cassava improvement: challenges, achievements and opportunities. Euphytica 120:159–165

Gbadegesin MA, Wills MA, Beeching JR (2008) Diversity of LTR-retrotransposons and Enhancer/Suppressor Mutator-like transposons in cassava (Manihot esculenta Crantz). Mol Genet Genomics 280:305–317

Gibson RW, Legg JP, Otim-Nape GW (1996) Unusually severe symptoms are a characteristic of the current epidemic of mosaic virus disease of cassava in Uganda. Ann Appl Biol 128:479–490

Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N, Rokhsar DS (2012) Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res 40:1178–1186

Guo A, Durner J, Klessig DF (1998) Characterization of a tobacco epoxide hydrolase gene induced during the resistance response to TMV. Plant J 15:647–656

Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. pp 95–98

Han Y, Wessler SR (2010) MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res 38:e199

Hanley-Bowdoin L, Bejarano ER, Robertson D, Mansoor S (2013) Geminiviruses: masters at redirecting and reprogramming plant processes. Nat Rev Microbiol 11:777–788

Harper G, Hull R, Lockhart B, Olszewski N (2002) Viral sequences integrated into plant genomes. Ann Rev Phytopathol 40:119–136

Hasegawa M, Kishino H, Yano T (1985) Dating of the human–ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160–174

Hatfield GW, Benham CJ (2002) DNA topology-mediated control of global gene expression in Escherichia coli. Ann Rev Genet 36:175–203

Jurka J (2008a) Conserved eukaryotic transposable elements and the evolution of gene regulation. Cell Mol Life Sci 65:201–204

Jurka J (2008b) P-type DNA transposon families from Hydra magnipapillata. Repbase Reports 8:353

Jurka J, Kohany O (2009) LTR retrotransposons from fruit fly. Repbase Reports 9:1046

Jurka J, Kojima K (2012) LTR retrotransposons from fruit fly. Repbase Reports 12:1512

Kapitonov VV, Jurka J (2007) Helitrons on a roll: eukaryotic rolling-circle transposons. Trends Genet 23:521–529

Legg JP, Owor B, Sseruwagi P, Ndunguru J (2006) Cassava Mosaic Virus Disease in East and Central Africa: epidemiology and Management of A Regional Pandemic. Adv Virus Res 67:355–418

Lisch D (2009) Epigenetic regulation of transposable elements in plants. Annu Rev Plant Biol 60:43–66

Louis B, Waikhom SD, Roy P, Bhardwaj PK, Sharma CK, Singh MW, Talukdar NC (2014) Host-range dynamics of Cochliobolus lunatus: From a biocontrol agent to a severe environmental threat. Biomed Res Int. doi:10.1155/2014/378372

Lu C, Chen J, Zhang Y, Hu Q, Su W, Kuang H (2012) Miniature inverted-repeat transposable elements (MITEs) have been accumulated through amplification bursts and play important roles in gene expression and species diversity in Oryza sativa. Mol Biol Evol 29:1005–1017

Mansoor S, Khan SH, Bashir A, Saeed M, Zafar Y, Malik KA, Briddon R, Stanley J, Markham PG (1999) Identification of a Novel Circular Single-Stranded DNA associated with Cotton Leaf Curl Disease in Pakistan. Virology 259:190–199

Mayo M, Leibowitz M, Palukaitis P, Scholthof KBG, Simon AE, Stanley J, Taliansky M (2005) Satellites. Elsevier/Academic Press, London, pp 1163–1169

Mollel HG (2014) Interaction and impact of cassava mosaic begomoviruses and their associated satellites. M.Sc. Thesis. University of the Witwatersrand

Murashige T, Skoog F (1962) A revised medium for rapid growth and bio assays with tobacco tissue cultures. Physiol Plant 15:473–497

Nakamura K, Particle Data Group et al (2010) Review of particle physics. J Phys G Nucl Part Phys 37:075021

Nawaz-ul-Rehman MS, Fauquet CM (2009) Evolution of geminivi-ruses and their satellites. FEBS Lett 583:1825–1832

Ndomba OA (2013) Influence of satellite DNA molecules on severity of cassava begomoviruses and the breakdown of resistance to cassava mosaic disease in Tanzania. Ph.D. Thesis. University of the Witwatersrand

Ndunguru J (2006) Molecular characterization of cassava mosaic geminiviruses in Tanzania. Ph.D. Thesis. University of Pretoria

Ndunguru J, Fofana B, Legg J, Challepan P, Taylor N, Aveling T, Thompson G, Fauquet CM (2008) Two novel satellite DNAs associated with bipartite cassava mosaic begomoviruses enhancing symptoms and capable of breaking high virus resistance in a cassava landrace. Ghent University, Ghent, p 141

Ntawuruhunga P, Legg J, Okidi J, Okao-Okuja G, Tadu G, Remington T (2007) Southern Sudan, Equatoria Region, Cassava Baseline Survey Technical Report, vol 64. IITA, Ibadan

Nuruzzaman M, Sharoni AM, Kikuchi S (2013) Roles of NAC transcription factors in the regulation of biotic and abiotic stress responses in plants. Front Microbiol 4:248

Oki N, Yano K, Okumoto Y, Tsukiyama T, Teraishi M, Tanisaka T (2008) A genome-wide view of miniature inverted-repeat transposable elements (MITEs) in rice, Oryza sativa ssp. japonica. Genes Genet Syst 83:321–329

Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, Schmutz J, Spannagl M, Tang H, Wang X, Wicker T, Bharti AK, Chapman J, Feltus FA, Gowik U, Grigoriev IV et al (2009) The Sorghum bicolor genome and the diversification of grasses. Nature 457:551–556

Patil BL, Fauquet CM (2009) Cassava mosaic geminiviruses: actual knowledge and perspectives. Mol Plant Pathol 10:685–701

Peaston AE, Evsikov AV, Graber JH, de Vries WN, Holbrook AE, Solter D, Knowles BB (2004) Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev Cell 7:597–606

Pérez-Quintero ÁL, Quintero A, Urrego O, Vanegas P, López C (2012) Bioinformatic identification of cassava miRNAs differentially expressed in response to infection by Xanthomonas axonopodis pv. manihotis. BMC Plant Biol 12:1–11

Piriyapongsa J, Jordan IK (2008) Dual coding of siRNAs and miRNAs by plant transposable elements. RNA 14:814–821

Pita JS, Fondong VN, Sangaré A, Otim-Nape GW, Ogwal S, Fauquet CM (2001) Recombination, pseudorecombination and synergism of geminiviruses are determinant keys to the epidemic of severe cassava mosaic disease in Uganda. J Gen Virol 82:655–665

Prochnik S, Marri PR, Desany B, Rabinowicz PD, Kodira C, Mohiuddin M, Rodriguez F, Fauquet C, Tohme J, Harkins T, Rokhsar DS, Rounsley S (2012) The cassava genome: current progress, future directions. Trop Plant Biol 5:88–94

Raja P, Sanville BC, Buchmann RC, Bisaro DM (2008) Viral genome methylation as an epigenetic defense against geminiviruses. J Virol 82:8997–9007

Sahu PP, Rai NK, Chakraborty S, Singh M, Chandrappa PH, Ramesh B, Chattopadhyay D, Prasad M (2010) Tomato cultivar tolerant to tomato leaf curl New Delhi virus infection induces virus-specific short interfering RNA accumulation and defence-associated host gene expression. Mol Plant Pathol 11:531–544

Sakurai T, Plata G, Rodríguez-Zapata F, Seki M, Salcedo A, Toyoda A, Ishiwata A, Tohme J, Sakaki Y, Shinozaki K, Ishitani M (2007) Sequencing analysis of 20,000 full-length cDNA clones from cassava reveals lineage specific expansions in gene families related to stress response. BMC Plant Biol 7:66. doi:10.1186/1471-2229-7-66

Selth LA, Dogra SC, Rasheed MS, Healy H, Randles JW, Rezaian MA (2005) A NAC domain protein interacts with tomato leaf curl virus replication accessory protein and enhances viral replication. Plant Cell 17:311–325

Tamura K (1992) Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G+C content biases. Mol Biol Evol 9:678–687

Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 30:2725–2729

Tewari R, Bailes E, Bunting KA, Coates JC (2010) Armadillo-repeat protein functions: questions for little creatures. Trends Cell Biol 20:470–481

Thomas J, Schaack S, Pritham EJ (2010) Pervasive horizontal transfer of rolling-circle transposons among animals. Genome Biol Evol 2:656–664

Xie Z, Johansen LK, Gustafson AM, Kasschau KD, Lellis AD, Zilberman D, Jacobsen SE, Carrington JC (2004) Genetic and functional diversification of small RNA pathways in plants. PLoS Biol 2:E104

Zhou X (2013) Advances in understanding begomovirus satellites. Ann Rev Phytopathol 51:357–381

Zhou X, Liu Y, Calvert L, Munoz C, Otim-Nape GW, Robinson DJ, Harrison BD (1997) Evidence that DNA-A of a geminivirus associated with severe cassava mosaic disease in Uganda has arisen by interspecific recombination. J Gen Virol 78:2101–2111

Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31:3406–3415