Characterization of the Rosa roxburghii Tratt transcriptome and analysis of MYB genes

Xiaolong Huang 1,2¶, Huiqing Yan 1*¶, Lisheng Zhai 1,2, Zhengting Yang 1,2, Yin Yi 2*

1 Key Laboratory of Plant Physiology and Development Regulation, School of Life Science, Guizhou Normal University, Guiyang, China
2 Karst Key Laboratory of Biodiversity Conservation in Southwest China, National Forestry Administration, Guiyang, China

* Corresponding author
E-mail: [email protected] and [email protected]

¶ X Huang and H Yan contributed equally to this work.

bioRxiv preprint, doi: https://doi.org/10.1101/392720; this version posted August 15, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.



Abstract

Rosa roxburghii Tratt belongs to the Rosaceae family; its fruit is flavorful, economically important, and highly nutritious, providing health benefits. MYB proteins play key roles in R. roxburghii fruit development and quality. However, the available genomic and transcriptomic information is extremely limited. Here, a normalized cDNA library was constructed from five tissues (stem, leaf, flower, young fruit, and mature fruit), with three replicates each, and sequenced on the Illumina HiSeq 2500 platform. A total of 470.66 million clean reads were obtained and assembled de novo. In total, 63,727 unigenes, with an average GC content of 42.08%, were determined, and 59,358 were annotated. In addition, 9,354 unigenes were assigned to Gene Ontology categories, and 20,202 unigenes were assigned to 25 Eukaryotic Ortholog Groups categories. Additionally, 19,507 unigenes were classified into 140 pathways of the Kyoto Encyclopedia of Genes and Genomes database. Using the transcriptome, 18 candidate MYB genes that were significantly expressed in mature fruit, compared with other tissues, were obtained. Among them, 10 R2R3 MYB and 1 R1 MYB were identified. The expression levels of 12 MYB genes randomly selected for qRT-PCR analysis were consistent with the RNA-seq results. A total of 37,545 microsatellites were detected, with an average EST–SSR frequency of 0.59 (37,545/63,727). These transcriptome data will be valuable for identifying genes of interest and studying their expression and evolution.

Key words: Rosa roxburghii Tratt, transcriptome, MYB transcription factor, EST–SSR


Introduction

Rosa roxburghii Tratt, a fruit crop and deciduous horticultural shrub, belongs to the Rosaceae family and is mainly distributed in southwestern China. It contains several nutritional and functional components, including polysaccharides, flavonoids, triterpenes, and superoxide dismutase [1, 2]. Current pharmacological research indicates that it could be used to treat atherosclerosis and to retard senescence. In addition, its fruits have radioprotective, anti-tumor, antimutagenic, and genoprotective activities [3-5]. It is used in traditional Chinese medicine and can also be processed into fruit juice, preserves, and fruit wine [6]. MYB proteins play key roles in fruit development and quality [7, 8]. However, genes of the R. roxburghii MYB superfamily have not been comprehensively identified and evaluated. Consequently, we identified and characterized R. roxburghii MYB genes.

Among the different transcription factor families, MYBs form the largest and most functionally diverse superfamily, and they are involved in regulating cell activities and plant development. The MYB domain, located at the N-terminus, is composed of adjacent tandem repeats [8]. Each repeat comprises 50–53 amino acid residues and contains helices that form a helix-turn-helix structure, which interacts with the major groove of specific DNA sequences [9]. MYB superfamily members can be classified into several subfamilies, including R1 MYB, R2R3 MYB, R1R2R3 MYB, 4R MYB, and atypical MYB-like proteins [10]. Many MYB superfamily proteins and their functions have been determined in different species. The R2R3 MYB type, with its conserved domains, is predominant in plants [11]. MYBs are involved in regulating plant growth, development, and stress resistance, including the anthocyanin biosynthetic pathway, trichome initiation and development, flavonoid or phenylpropanoid metabolism, secondary wall biosynthesis, sugar signaling, and responses to abiotic or biotic stress [12, 13].

Genomic information for R. roxburghii is not presently available in an online database. High-throughput Illumina sequencing can therefore be used to identify new transcripts, measure gene expression levels, and build an accurate transcriptome profile [14]. It has become a powerful methodology for species that lack a reference genome. Assembled unigenes annotated against different databases can be used to evaluate genetic characteristics and metabolic pathways. Recently, fruits of R. roxburghii at three developmental stages were collected and analyzed using Illumina sequencing technology. That study generated 106,590 unigenes by de novo assembly. Using a BLAST algorithm-based search, 9,301 and 2,393 unigenes were categorized into Gene Ontology (GO) and Clusters of Orthologous Groups (COG) categories, respectively. Additionally, 7,480 unigenes were assigned to 124 pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. Among the unigenes, 9,131 potential EST–SSR (simple sequence repeat) loci were identified [15].

To better understand the expression profiles of different tissues in R. roxburghii, leaf, stem, flower, young fruit, and mature fruit were collected. In this study, cDNA libraries were constructed from 15 samples (five tissues with three replicates each) and sequenced on the Illumina platform to obtain transcriptome information. MYBs that were significantly expressed in mature fruit were identified. For verification, randomly selected MYB genes were analyzed by qRT-PCR. EST–SSRs were also identified to support genetic diversity analyses of R. roxburghii. The obtained information provides a valuable resource for functional gene analyses, in particular of MYB genes.

Materials and methods

Biological materials

Wild Rosa roxburghii plants were grown at Guizhou Normal University, Guizhou, China. Five tissues were collected: leaf, stem, flower, young fruit (YF, 50 days after flowering), and mature fruit (MF, 120 days after flowering). Each tissue was sampled in three independent replicates. The materials were immediately frozen in liquid nitrogen, mechanically ground into a fine powder, and stored at −80 °C for further experiments.

RNA extraction and sequencing

A total of 1 g of frozen tissue from leaf, stem, flower, young fruit, and mature fruit was weighed. RNA was isolated using the Trizol method (Takara, Japan), following the manufacturer’s guidelines. RNA quality was determined with a NanoDrop (Wilmington, USA) and an Agilent 2100 (Santa Clara, USA). Fifteen libraries were created, three for each tissue or growth stage. mRNA was selected and randomly fragmented. Using these fragments as templates, cDNA was synthesized and purified. Adapters were ligated to distinguish the different sequencing samples. Strand-specific cDNA libraries were constructed and sequenced on an Illumina HiSeq 2500 (Illumina, San Diego, USA).

De novo assembly and functional annotation

Clean reads were obtained from the raw data by filtering out adaptor-only reads, reads with more than 5% unknown (N) bases, and low-quality reads (reads in which more than 50% of bases had a Q-value ≤ 10). The Trinity program was used for de novo assembly. To determine the functions of the unigenes, BLASTx alignment with an E-value cutoff of 10^-5 was performed against several databases, including KOG (Eukaryotic Ortholog Groups, http://www.ncbi.nlm.nih.gov/KOG/) [16], Nr (NCBI non-redundant protein database, http://www.ncbi.nlm.nih.gov/), KEGG (Kyoto Encyclopedia of Genes and Genomes, http://www.genome.jp/kegg/) [17], Gene Ontology (GO, http://www.geneontology.org/), and Swiss-Prot (http://www.expasy.ch/sprot) [18, 19]. The best-aligning results were used to determine the sequence direction of the unigenes. If the results from different databases conflicted, they were prioritized in the following order: Nr, Swiss-Prot, KEGG, COG, and GO [20]. When a transcript did not align to any of the databases, ESTScan (http://myhits.isb-sib.ch/cgi-bin/estscan) was used to determine its sequence direction.
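
The three read-filtering criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name, the adaptor sequence in the usage note, and the treatment of a read as "adaptor-only" when nothing remains after removing the adaptor are all assumptions made for illustration. Only the 5% N threshold and the 50% / Q ≤ 10 threshold come from the text.

```python
from typing import Iterable


def is_clean_read(seq: str, quals: Iterable[int], adaptor: str,
                  max_n_frac: float = 0.05, low_q: int = 10,
                  max_low_q_frac: float = 0.50) -> bool:
    """Return True if a read passes the three filters described in the text."""
    quals = list(quals)
    if not seq or len(seq) != len(quals):
        return False
    # 1. Adaptor-only read (assumed definition: nothing is left once
    #    every copy of the adaptor sequence is removed).
    if seq.replace(adaptor, "") == "":
        return False
    # 2. More than 5% unknown (N) bases.
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False
    # 3. More than 50% of bases with a quality value <= 10.
    n_low = sum(1 for q in quals if q <= low_q)
    if n_low / len(quals) > max_low_q_frac:
        return False
    return True
```

For example, `is_clean_read("ACGTNACGTA", [30] * 10, "AGATCGG")` is rejected because one of ten bases (10%) is N; the adaptor string here is a hypothetical placeholder.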

Nucleic acid sequences from strawberry (Fragaria × ananassa), apple (Malus × domestica), and cherry (Prunus avium) were aligned using BLAST against the UniProt database, as was done for Rosa roxburghii, so that the four species could be compared using exactly the same search parameters and database type.

Identification and conserved motif analysis of the Rosa roxburghii MYB superfamily

All genes related to MYBs were selected based on Nr annotation, clustered, and described. The expression levels of all MYB genes in the various tissues were displayed. MYB genes significantly expressed in mature fruit were selected using thresholds of absolute log2 FC (fold change) ≥ 1 and FDR (false discovery rate) ≤ 0.01. Conserved motifs shared by the corresponding MYB proteins were analyzed with the Multiple Em for Motif Elicitation online tool (MEME v5.0.1, http://meme.nbcr.net/meme/cgi-bin/meme.cgi) by uploading the amino acid sequences of the MYB superfamily members. MYB proteins were classified by their number of repeats (R): 1R-MYB proteins have one R; R2R3-MYB proteins have two Rs; R1R2R3-MYB proteins have three Rs; and 4R-MYB proteins have four Rs. The others were assigned to atypical MYB families [21].
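
The selection threshold and the repeat-based classification described above can be expressed as a short Python sketch. The function names are hypothetical, and treating any repeat count other than one to four as "atypical" is an assumption made for illustration; the thresholds and class names are those given in the text.

```python
def is_significant(log2_fc: float, fdr: float,
                   min_abs_log2_fc: float = 1.0,
                   max_fdr: float = 0.01) -> bool:
    """Apply the |log2 FC| >= 1 and FDR <= 0.01 thresholds from the text."""
    return abs(log2_fc) >= min_abs_log2_fc and fdr <= max_fdr


def classify_myb(n_repeats: int) -> str:
    """Map the number of MYB repeats (R) to the subfamily named in the text."""
    classes = {1: "1R-MYB", 2: "R2R3-MYB", 3: "R1R2R3-MYB", 4: "4R-MYB"}
    # Assumption: anything outside 1-4 repeats is reported as atypical.
    return classes.get(n_repeats, "atypical MYB")
```

A gene with log2 FC = 2.3 and FDR = 0.001 passes the filter, whereas one with log2 FC = 0.8 does not, whatever its FDR.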

Quantitative real-time PCR (qRT-PCR) assays

Twelve cDNAs encoding MYB transcription factors, all of which have potential roles in regulating plant development, were selected for qRT-PCR validation. Primers were designed with Primer Premier 6 (S1 Table). Total RNA was isolated from leaf, stem, flower, young fruit (YF), and mature fruit (MF) using Trizol, followed by purification with an RNA purification kit (Takara, Japan). Real-time RT-PCR was performed on a Roche LightCycler 480 using SYBR Green I, with β-actin as the endogenous control. Amplification consisted of 95 °C for 2 min, followed by 40 cycles of 95 °C for 15 s, annealing at 58 °C for 30 s, and 72 °C for 30 s. Expression levels relative to the control were estimated by calculating ΔΔCt and then analyzed using the 2^-ΔΔCt method.
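
For readers unfamiliar with the 2^-ΔΔCt calculation, a minimal Python sketch is given below. The function and argument names are hypothetical, the Ct values in the usage note are invented for illustration, and the choice of calibrator sample is an assumption (the text states only that β-actin was the endogenous control).

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_calibrator: float,
                        ct_ref_calibrator: float) -> float:
    """Relative expression by the 2^-ddCt method.

    ct_ref_* are Ct values of the endogenous control (beta-actin in the text);
    the calibrator is the sample against which expression is compared.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)
```

With illustrative Ct values of 22 (target) and 18 (β-actin) in the sample and 24 and 18 in the calibrator, ΔΔCt = (22 − 18) − (24 − 18) = −2, giving a fourfold higher relative expression.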

Microsatellite detection

The program MISA (http://pgrc.ipk-gatersleben.de/misa/) was used to detect microsatellite repeat motifs in each unigene, in order to understand the distribution of microsatellites (also known as SSRs) and to develop new markers from the transcriptome of Rosa roxburghii. The numbers of core repeat motifs for mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide repeats were counted.
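
To make the idea of perfect-repeat detection concrete, the following Python sketch scans a sequence for mono- to hexa-nucleotide SSRs with regular expressions. It is not MISA and does not reproduce its output: the minimum repeat thresholds are an assumption (the text does not state the settings used), compound SSRs are not handled, and motifs are not merged with their reverse complements.

```python
import re

# Minimum repeat count per motif length. These values are an assumption
# for illustration; the text does not state the thresholds used.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter motif."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq: str):
    """Return (motif, repeat_count, start) for each perfect SSR, sorted by start."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One motif of length k followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. "AA" repeats, which are mononucleotide runs.
            if is_primitive(motif):
                hits.append((motif, len(m.group(0)) // k, m.start()))
    return sorted(hits, key=lambda h: h[2])
```

Applied to a unigene sequence, `find_ssrs` returns zero-based start positions; counting the hits by motif length gives the kind of per-class tally reported for SSRs in this study.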

Results

Illumina sequencing and sequence assembly

To identify more genes, RNA-seq of five tissues (leaf, stem, flower, young fruit, and mature fruit) was performed on an Illumina HiSeq 2500 (Fig 1A–F). After trimming and quality filtering of the raw data, each sample was represented by an average of 31.38 million reads. A total of 470.66 million reads were obtained for all five tissues with three replicates. Total reads per biological condition are shown in S2 Table. For each sample, 86.21% of the reads could be mapped uniquely. A total of 212,534 transcripts were obtained by assembling the clean reads with the Trinity program, with an average GC content of 42.08%, an average length of 1,077 bp, and an N50 length of 2,085 bp. Transcripts were further clustered and assembled into 63,727 unigenes. The unigenes were all longer than 200 bp, with average and N50 lengths of 995 bp and 1,895 bp, respectively (Table 1). Of the 63,727 unigenes, 78.03% (49,727) were longer than 600 bp and 56.07% (35,732) were longer than 1 kb (Fig 2). In addition, most unigenes (60,901; 95.57%) were shorter than 5,200 bp (S3 Table and Table 1).
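
The N50 statistic reported above is the length of the shortest sequence in the smallest set of longest sequences that together contain at least half of the total assembled bases. A minimal Python sketch of this standard definition follows; the function name and the toy lengths in the usage note are illustrative only and are not data from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half the
        # total assembled bases is the N50.
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one sequence length")
```

For the toy lengths 2 to 10 bp (total 54 bp), the three longest sequences (10 + 9 + 8 = 27 bp) reach half of the total, so the N50 is 8 bp.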

Functional annotation of unigenes

To obtain functional information, the 63,727 unigenes were compared against the GO, KEGG, Nr, Swiss-Prot, and Eukaryotic Orthologous Groups (KOG) databases using a BLASTX-based algorithm, and 59,358 (93.14%) were annotated (S4 Table). Swiss-Prot and Nr contained the most homologous unigenes, at 55,118 and 55,151, respectively. In total, 3,284 unigenes were annotated in all databases (Fig 3A). Comparative approaches are effective for finding differences and similarities among species. Nucleic acid sequences from Rosaceae species, including strawberry, apple, and cherry, were aligned using a BLAST algorithm-based search of the UniProt database. We found that strawberry was the closest model species, followed by cherry and then apple, with 30,577 (47.98%), 20,105 (31.55%), and 17,677 (27.74%) unigenes similar to those of R. roxburghii, respectively (Fig 3B).
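
The percentages in this paragraph are simple proportions of the 63,727 unigenes. The short Python check below recomputes them from the counts given in the text; the variable and function names are illustrative.

```python
# Counts reported in the text (S4 Table and Fig 3B).
TOTAL_UNIGENES = 63_727
counts = {
    "annotated": 59_358,
    "strawberry": 30_577,
    "cherry": 20_105,
    "apple": 17_677,
}


def percent_of_unigenes(count: int, total: int = TOTAL_UNIGENES) -> float:
    """Express a unigene count as a percentage of all unigenes."""
    return round(100 * count / total, 2)


percentages = {label: percent_of_unigenes(n) for label, n in counts.items()}
for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```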

GO terms fall into three categories: biological process, cellular component, and molecular function (S5 Table). The “biological process” category consisted of 20 functional groups, of which metabolic process (56.56%) and cellular process (54.02%) were the major groups. For the “cellular component” category, 16 groups were predicted; cell, cell part, and organelle were the three major groups. For “molecular function”, binding (49.02%) and catalytic activity (46.01%) were the dominant groups, followed by structural molecule activity (14.89%) (Fig 4).

A total of 20,202 unigenes were identified using the KOG database (S6 Fig) and assigned to 25 functional categories. General function prediction (46.37%) was the largest group, followed by signal transduction mechanisms (24.26%); posttranslational modification, protein turnover, and chaperones (23.49%); and translation, ribosomal structure, and biogenesis (19.80%). The proportions of unigenes assigned to transcription (11.81%), carbohydrate transport and metabolism (11.70%), energy production and conversion (11.67%), and intracellular trafficking, secretion, and vesicular transport (11.29%) were almost the same. In addition, lipid transport and metabolism represented 10.29%. However, 1,939 unigenes still had unknown functions. The KOG classifications revealed potential biological functions and provided insight into the chemical reactions involved in the molecular processes of R. roxburghii.

A total of 19,507 annotated unigenes were analyzed to determine the biological pathways represented in R. roxburghii. These unigenes matched 140 KEGG pathways, as summarized in S7 Table. Translation, within genetic information processing (3,005), was the dominant pathway, followed by carbohydrate metabolism (1,988); folding, sorting and degradation (1,375); energy metabolism (1,285); transport and catabolism (1,184); amino acid metabolism (1,161); and lipid metabolism (901) (Fig 4). KEGG pathway information can provide new insights into R. roxburghii biology and contribute to predicting the higher-level complexity of cellular processes and organismal behavior.

Genes encoding MYB transcription factors in five different tissues

Among the functional database annotations, 163 MYB proteins were detected. Descriptions of the MYBs are listed in S8 Fig and S9 Table. MYBs in R. roxburghii were similar to the 61 members in Fragaria vesca. MYBs regulate secondary metabolism and gene expression and are involved in environmental stress responses. The expression levels of 159 putative MYB genes in five tissues are shown after clustering. Various MYBs were expressed in different tissues. The black boxes mark MYBs that were significantly expressed in mature fruit.

To investigate the homologous domains and repeats of the MYB proteins that were significantly expressed in mature fruit compared with the other four tissues, the online MEME tool was used to search for conserved motifs shared by these proteins, using their amino acid sequences as input. In total, 18 candidate MYB genes were analyzed using the transcriptome. Among them, 10 R2R3 MYB and 1 R1 MYB were identified, while the remaining 7 MYB genes belonged to atypical MYB families (Fig 5). Sequences of the different conserved motifs are shown in S10 Fig.

Verification of RNA-seq results by qRT-PCR

Real-time RT-PCR was conducted to validate the expression of MYB genes identified by RNA-seq in five different tissues. Twelve genes related to MYB transcription factors were randomly selected, with β-actin as the internal control. The expression trends of the 12 genes were largely consistent with the expression patterns identified by RNA-seq (Fig 6). These results support the fidelity and accuracy of the RNA-seq analysis in the present study.

Microsatellite analysis and SSR distribution

Using MISA software, 63,727 unigenes with a total length of 109,644,660 bp were screened for microsatellites. A total of 37,545 potential EST–SSRs were identified. The average density of SSRs was one per 2,920 bp (109,644,660/37,545), and the average frequency of EST–SSRs was 0.59 per unigene (37,545/63,727). In total, 20,321 unigenes contained a single SSR (54.12%), and 8,757 contained more than one SSR (23.32%; S11 Table). In total, 5,275 (14.05%) SSRs were present in compound formation. SSR types in the transcriptome, from mono- to hexa-nucleotide, were abundant.
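
The density and frequency figures above follow directly from the three totals reported in this section. A brief Python check of the arithmetic is shown below; the variable names are illustrative.

```python
# Totals reported in the text.
n_ssrs = 37_545
n_unigenes = 63_727
total_length_bp = 109_644_660

# Average spacing: screened bases per detected SSR.
bp_per_ssr = total_length_bp / n_ssrs

# Average frequency: detected SSRs per screened unigene.
ssrs_per_unigene = n_ssrs / n_unigenes

print(f"one SSR per {bp_per_ssr:.0f} bp")
print(f"{ssrs_per_unigene:.2f} SSRs per unigene")
```

Note that the density of one SSR per 2,920 bp is obtained by dividing the total length by the number of SSRs, not the reverse.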

Among the 37,545 identified SSRs, mononucleotide repeats were the most abundant (19,589, 52.17%), followed by dinucleotide repeats (11,504, 30.64%) (Table 2). The most abundant motif was A or T (18,855, 50.22%), followed by AG or CT (8,055, 21.45%) (Table 2). Among tri-, tetra-, and penta-nucleotide SSRs, the most abundant types were AAG/CTT (2,189, 5.83%), AAAT/ATTT (141, 0.38%), and AAAAT/ATTTT (19, 0.05%), respectively. The hexa-nucleotide motifs AAGGAG/CCTTCT and ACCTCC/AGGTGG were the most abundant of their class and were equally frequent (7, 0.02%) (Table 2). The repeat numbers of the SSRs ranged from 5 to 121. SSRs repeated more than 15 times formed the largest class, at 19.01% (7,136), while those repeated 10 times accounted for 17.10% (6,422) (Table 3). Excluding mononucleotides, the repeat numbers of most SSRs ranged from 5 to 12 (9,612, 75.9%), with only a small percentage repeated more than 15 times (1,177, 9.3%).

Discussion

Illumina mRNA sequencing technology is considered an effective approach for assessing transcriptional expression in different tissues. However, studies of the R. roxburghii transcriptome are limited. A previous study obtained 53 million reads using fruit at three developmental stages [15]. Genomic information for R. roxburghii has also been obtained using next-generation sequencing technologies. In total, 30.29 Gb of sequence data was generated by HiSeq 2500 sequencing [5], covering ~60.60% of the R. roxburghii genome. However, whole-genome sequencing was not performed, and the focus was only on genes related to the biosynthesis of ascorbic acid. The limited genomic information available for R. roxburghii has constrained genetic studies. To obtain sequences of as many expressed genes as possible, 15 samples from five tissues were used for library construction, and 470,657,040 clean reads were obtained, more than previously reported. In addition, 9,354 unigenes were assigned to GO categories, compared with 9,301 unigenes in the previous study. Moreover, 19,507 unigenes were assigned to 140 KEGG pathways, compared with the previous 7,480 unigenes assigned to 124 pathways. The greater amount of data increased the coverage depth and accuracy, and a large number of genes involved in different metabolic pathways can now be identified [22].

Because 15 samples from five tissues were used to construct the cDNA libraries, a large portion of the whole transcriptome was available for annotation. In fact, 93.14% of the R. roxburghii unigenes were annotated through BLAST algorithm-based searches against five databases. A similarity analysis in the Nr database revealed that the unigenes were more homologous to cDNA sequences of strawberry than to those of the two other Rosaceae species, apple and cherry. This indicated that R. roxburghii has a closer evolutionary relationship with strawberry. The genomic resources of strawberry are available online (http://bioinformatics.towson.edu/strawberry/). Strawberry has a short life cycle (3.5 to 4 months) and a facile transformation system, while R. roxburghii is a deciduous horticultural shrub with a long life cycle. Nevertheless, it is feasible to study R. roxburghii gene functions using strawberry as a model system, as was previously done for fruit crops in Rosaceae, including apple, peach, and cherry [23].

Based on MYB expression patterns, 18 MYB genes were significantly expressed in mature fruit compared with the other tissues. In total, 10 R2R3 MYBs were identified, making this the dominant MYB type. The MYBs presented similar patterns and conserved motifs, suggesting that their conserved features play similar roles in group-specific functions. As part of a MYB-bHLH-WD40 complex, they are involved in plant secondary metabolism and in responses to abiotic and biotic stresses [24, 25]. Several R2R3 MYBs have been analyzed for their possible involvement in anthocyanin biosynthesis and accumulation, and in fructan exohydrolase regulation, under different stress and hormone treatments [26, 27]. MYBs have been implicated in sugar signaling, fruit-skin coloration, and phenylpropanoid metabolism [28-30]. Most R2R3 MYBs have been implicated in fruit development and should be studied to improve fruit quality and stress resistance. The results of this study will aid functional studies of MYB genes of interest involved in R. roxburghii fruit development.

The accuracy of RNA-seq was validated using qRT-PCR. Both assays were limited to the transcriptional level. Although 15 samples from five tissues were used to construct the cDNA libraries, the whole R. roxburghii transcriptome was still not completely covered, because transcripts not expressed in these tissues may have been missed.


    288 A total of 37,545 microsatellites with different repeat types were detected in the

    289 63,727 unigenes, an average of 0.59 SSRs per unigene. The SSR

    290 locus density was 1:2,920 bp, compared with 1:4.00 kb in a previous study. The

    291 criteria and parameters used for SSR detection, as well as the diversity of genomic structures

    292 and compositions, can influence SSR density [31]. The difference found here might

    293 also reflect genome size [32] and suggests that we obtained more abundant data. The proportion of

    294 mononucleotide SSRs attributable to sequencing errors and assembly mistakes was relatively

    295 low [33]. Apart from mononucleotides, the most common SSR type in the transcriptome was dinucleotide repeats,

    296 which is consistent with previously reported results. Here, AG/CT was

    297 the most common dinucleotide motif (Table 2). In conclusion, the 37,545 detected EST–SSRs are

    298 associated with functional genes and provide a valuable resource for genetic and genomic

    299 analyses. These EST–SSRs will be useful for genetic linkage map construction,

    300 comparative mapping and marker-assisted selection (MAS) breeding.
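
Because SSR density depends on the detection criteria, it can help to make those criteria explicit. The sketch below is a simplified regular-expression scan for perfect SSRs, not the tool used in this study. The minimum repeat numbers (10 for mononucleotides, 6 for dinucleotides, 5 for tri- to hexanucleotides) are an assumption inferred from the smallest non-zero repeat classes in Table 3, and the function names are hypothetical.

```python
import re

# Minimum number of repeats per motif length (assumed; inferred from Table 3).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    for size in range(1, len(motif)):
        if len(motif) % size == 0 and motif[:size] * (len(motif) // size) == motif:
            return False
    return True


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):
                repeats = len(match.group(0)) // unit_len
                hits.append((match.start(), motif, repeats))
    return sorted(hits)


# Toy unigene: a dinucleotide SSR flanked by non-repetitive sequence.
print(find_ssrs("TTT" + "AG" * 6 + "CCC"))
```

Raising or lowering the thresholds in `MIN_REPEATS` changes the number of loci reported for the same sequences, which is one reason why SSR densities from different studies are not directly comparable.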

    301 Taken together, deep RNA-seq analysis was conducted on five tissues, and a total

    302 of 469.5 million reads were generated. In total, 63,727 unigenes were obtained by

    303 Trinity assembly, of which more than 90% (59,358) were successfully annotated. The data

    304 provided broad coverage of the R. roxburghii transcriptome and allowed us to

    305 identify the MYBs significantly expressed in mature fruit. Future studies are necessary to

    306 elucidate the roles of these MYBs in fruit quality and development. Finally, EST–SSRs were

    307 identified, and the distribution of SSR repeat numbers in the transcriptome was

    308 analyzed. This study provides a valuable resource for bioengineering and biological

    309 studies of R. roxburghii.

    310 Conclusions

    311 In the present study, deep RNA-seq analysis was conducted on five tissues, and a total of

    312 469.5 million reads were generated. In total, 63,727 unigenes were obtained by Trinity

    313 assembly, of which more than 90% (59,358) were successfully annotated. The data provided

    314 broad coverage of the R. roxburghii transcriptome and allowed us to identify the

    315 MYBs significantly expressed in mature fruit. Future studies are necessary to elucidate the

    316 roles of these MYBs in fruit quality and development. Finally, EST–SSRs were identified, and

    317 the distribution of SSR repeat numbers in the transcriptome was analyzed. This study

    318 provides a valuable resource for bioengineering and biological studies of R. roxburghii.

    319

    320

    321 Acknowledgements

    322 This work was supported by grants from the National Natural Science Foundation of

    323 China (Grant Nos. 31660554, 31600214 and 31660046) and the Science Foundation of Guizhou

    324 Province (Qiankehe J zi (2015) 2117).

    325

    326 References

    327 1. Chen G, Kan J. Characterization of a novel polysaccharide isolated from Rosa

    328 roxburghii Tratt fruit and assessment of its antioxidant in vitro and in vivo. Int J Biol

    329 Macromol. 2018;107: 166-174. doi: 10.1016/j.ijbiomac.2017.08.160. PubMed PMID:

    330 28866014.

    331 2. Xu P, Liu X, Xiong X, Zhang W, Cai X, Qiu P, et al. Flavonoids of Rosa roxburghii

    332 Tratt Exhibit Anti-Apoptosis Properties by Regulating PARP-1/AIF. J Cell Biochem.

    333 2017;118(11): 3943-3952. doi: 10.1002/jcb.26049. PubMed PMID: 28398610.

    334 3. Xu P, Zhang WB, Cai XH, Lu DD, He XY, Qiu PY, et al. Flavonoids of Rosa

    335 roxburghii Tratt act as radioprotectors. Asian Pac J Cancer Prev. 2014;15(19): 8171-8175.

    336 PubMed PMID: 25339001.

    337 4. Xu SJ, Zhang F, Wang LJ, Hao MH, Yang XJ, Li NN, et al. Flavonoids of Rosa

    338 roxburghii Tratt offers protection against radiation induced apoptosis and inflammation in

    339 mouse thymus. Apoptosis. 2018. doi: 10.1007/s10495-018-1466-7. PubMed PMID:

    340 29995207.

    341 5. Chen G, Kan J. Ultrasound-assisted extraction, characterization, and antioxidant

    342 activity in vitro and in vivo of polysaccharides from Chestnut rose (Rosa roxburghii tratt)

    343 fruit. J Food Sci Technol. 2018;55(3): 1083-1092. doi: 10.1007/s13197-017-3023-8.

    344 PubMed PMID: 29487451; PubMed Central PMCID: PMC5821667.

    345 6. Liu MH, Zhang Q, Zhang YH, Lu XY, Fu WM, He JY. Chemical Analysis of Dietary

    346 Constituents in Rosa roxburghii and Rosa sterilis Fruits. Molecules. 2016;21(9): 1204-

    347 1224. doi: 10.3390/molecules21091204. PubMed PMID: 27618004.

    348 7. Li G, Chen D, Tang X, Liu Y. Heterologous expression of kiwifruit (Actinidia

    349 chinensis) GOLDEN2-LIKE homolog elevates chloroplast level and nutritional quality in

    350 tomato (Solanum lycopersicum). Planta. 2018;247(6): 1351-1362. doi: 10.1007/s00425-

    351 018-2853-6. PubMed PMID: 29520458.

    352 8. Zhang C, Ma R, Xu J, Yan J, Guo L, Song J, et al. Genome-wide identification and

    353 classification of MYB superfamily genes in peach. PLoS One. 2018;13(6): e0199192-

    354 e0199217. doi: 10.1371/journal.pone.0199192. PubMed PMID: 29927971; PubMed

    355 Central PMCID: PMC6013194.

    356 9. Zimmermann IM, Heim MA, Weisshaar B, Uhrig JF. Comprehensive identification of

    357 Arabidopsis thaliana MYB transcription factors interacting with R/B-like BHLH proteins.

    358 Plant Journal. 2004;40(1): 22-34. doi: 10.1111/j.1365-313X.2004.02183.x. PubMed PMID:

    359 15361138.

    360 10. Mmadi MA, Dossa K, Wang L, Zhou R, Wang Y, Cisse N, et al. Functional

    361 Characterization of the Versatile MYB Gene Family Uncovered Their Important Roles in

    362 Plant Development and Responses to Drought and Waterlogging in Sesame. Genes (Basel).

    363 2017;8(12): 362-379. doi: 10.3390/genes8120362. PubMed PMID: 29231869; PubMed

    364 Central PMCID: PMC5748680.

    365 11. Hajiebrahimi A, Owji H, Hemmati S. Genome-wide identification, functional

    366 prediction, and evolutionary analysis of the R2R3-MYB superfamily in Brassica napus.

    367 Genome. 2017;60(10): 797-814. doi: 10.1139/gen-2017-0059. PubMed PMID: 28732175.

    368 12. Stracke R, Ishihara H, Huep G, Barsch A, Mehrtens F, Niehaus K, et al. Differential

    369 regulation of closely related R2R3-MYB transcription factors controls flavonol

    370 accumulation in different parts of the Arabidopsis thaliana seedling. Plant J. 2007;50(4):

    371 660-677. doi: 10.1111/j.1365-313X.2007.03078.x. PubMed PMID: 17419845; PubMed

    372 Central PMCID: PMC1976380.

    373 13. Feng K, Xu ZS, Que F, Liu JX, Wang F, Xiong AS. An R2R3-MYB transcription

    374 factor, OjMYB1, functions in anthocyanin biosynthesis in Oenanthe javanica. Planta.

    375 2018;247(2): 301-315. doi: 10.1007/s00425-017-2783-8. PubMed PMID: 28965159.

    376 14. Zhu Y, Chen L, Zhang C, Hao P, Jing X, Li X. Global transcriptome analysis reveals

    377 extensive gene remodeling, alternative splicing and differential transcription profiles in

    378 non-seed vascular plant Selaginella moellendorffii. BMC Genomics. 2017;18(Suppl

    379 1): 1042-1056. doi: 10.1186/s12864-016-3266-1. PubMed PMID: 28198676; PubMed

    380 Central PMCID: PMC5310277.

    381 15. Yan X, Zhang X, Lu M, He Y, An H. De novo sequencing analysis of the Rosa

    382 roxburghii fruit transcriptome reveals putative ascorbate biosynthetic genes and EST-SSR

    383 markers. Gene. 2015;561(1): 54-62. doi: 10.1016/j.gene.2015.02.054. PubMed PMID:

    384 25701597.

    385 16. Koonin EV, Fedorova ND, Jackson JD, Jacobs AR, Krylov DM, Makarova KS, et al.

    386 A comprehensive evolutionary classification of proteins encoded in complete eukaryotic

    387 genomes. Genome Biol. 2004;5(2): R7-R34. doi: 10.1186/gb-2004-5-2-r7. PubMed PMID:

    388 14759257; PubMed Central PMCID: PMC395751.

    389 17. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. The KEGG resource for

    390 deciphering the genome. Nucleic Acids Res. 2004;32: D277-280. doi: 10.1093/nar/gkh063.

    391 PubMed PMID: 14681412; PubMed Central PMCID: PMC308797.

    392 18. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene

    393 ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet.

    394 2000;25(1): 25-29. doi: 10.1038/75556. PubMed PMID: 10802651; PubMed Central

    395 PMCID: PMC3037419.

    396 19. UniProt Consortium T. UniProt: the universal protein knowledgebase. Nucleic Acids

    397 Res. 2018;46(5): 2699. doi: 10.1093/nar/gky092. PubMed PMID: 29425356; PubMed

    398 Central PMCID: PMC5861450.

    399 20. Zhang J, Schmidt CJ, Lamont SJ. Transcriptome analysis reveals potential

    400 mechanisms underlying differential heart development in fast- and slow-growing broilers

    401 under heat stress. BMC Genomics. 2017;18(1): 295-309. doi: 10.1186/s12864-017-3675-

    402 9. PubMed PMID: 28407751; PubMed Central PMCID: PMC5390434.

    403 21. Li X, Xue C, Li J, Qiao X, Li L, Yu L, et al. Genome-Wide Identification, Evolution

    404 and Functional Divergence of MYB Transcription Factors in Chinese White Pear (Pyrus

    405 bretschneideri). Plant Cell Physiol. 2016;57(4): 824-847. doi: 10.1093/pcp/pcw029.

    406 PubMed PMID: 26872835.

    407 22. Zhu C, Pan Z, Wang H, Chang G, Wu N, Ding H. De novo assembly, characterization

    408 and annotation for the transcriptome of Sarcocheilichthys sinensis. PLoS One. 2017;12(2):

    409 e0171966-e0171980. doi: 10.1371/journal.pone.0171966. PubMed PMID: 28196101;

    410 PubMed Central PMCID: PMC5308828.

    411 23. Wang XL, Zhong Y, Cheng ZM, Xiong JS. Divergence of the bZIP Gene Family in

    412 Strawberry, Peach, and Apple Suggests Multiple Modes of Gene Evolution after

    413 Duplication. Int J Genomics. 2015;2015: 536943-536953. doi: 10.1155/2015/536943.

    414 PubMed PMID: 26770968; PubMed Central PMCID: PMC4685131.

    415 24. Wei Q, Luo Q, Wang R, Zhang F, He Y, Zhang Y, et al. A Wheat R2R3-type MYB

    416 Transcription Factor TaODORANT1 Positively Regulates Drought and Salt Stress

    417 Responses in Transgenic Tobacco Plants. Front Plant Sci. 2017;8: 1374-1388. doi:

    418 10.3389/fpls.2017.01374. PubMed PMID: 28848578; PubMed Central PMCID:

    419 PMC5550715.

    420 25. Brendolise C, Espley RV, Lin-Wang K, Laing W, Peng Y, McGhie T, et al. Multiple

    421 Copies of a Simple MYB-Binding Site Confers Trans-regulation by Specific Flavonoid-

    422 Related R2R3 MYBs in Diverse Species. Front Plant Sci. 2017;8: 1864-1868. doi:

    423 10.3389/fpls.2017.01864. PubMed PMID: 29163590; PubMed Central PMCID:

    424 PMC5671642.

    425 26. Li W, Ding Z, Ruan M, Yu X, Peng M, Liu Y. Kiwifruit R2R3-MYB transcription

    426 factors and contribution of the novel AcMYB75 to red kiwifruit anthocyanin biosynthesis.

    427 Sci Rep. 2017;7(1): 16861-16874. doi: 10.1038/s41598-017-16905-1. PubMed PMID:

    428 29203778; PubMed Central PMCID: PMC5715094.

    429 27. Wei H, Zhao H, Su T, Bausewein A, Greiner S, Harms K, et al. Chicory R2R3-MYB

    430 transcription factors CiMYB5 and CiMYB3 regulate fructan 1-exohydrolase expression in

    431 response to abiotic stress and hormonal cues. J Exp Bot. 2017;68(15): 4323-4338. doi:

    432 10.1093/jxb/erx210. PubMed PMID: 28922763.

    433 28. Chen YS, Chao YC, Tseng TW, Huang CK, Lo PC, Lu CA. Two MYB-related

    434 transcription factors play opposite roles in sugar signaling in Arabidopsis. Plant Mol Biol.

    435 2017;93(3): 299-311. doi: 10.1007/s11103-016-0562-8. PubMed PMID: 27866313.

    436 29. Tuan PA, Bai S, Yaegaki H, Tamura T, Hihara S, Moriguchi T, et al. The crucial role

    437 of PpMYB10.1 in anthocyanin accumulation in peach and relationships between its allelic

    438 type and skin color phenotype. BMC Plant Biol. 2015;15: 280-293. doi: 10.1186/s12870-

    439 015-0664-5. PubMed PMID: 26582106; PubMed Central PMCID: PMC4652394.

    440 30. Medina-Puche L, Cumplido-Laso G, Amil-Ruiz F, Hoffmann T, Ring L, Rodriguez-

    441 Franco A, et al. MYB10 plays a major role in the regulation of flavonoid/phenylpropanoid

    442 metabolism during ripening of Fragaria x ananassa fruits. J Exp Bot. 2014;65(2): 401-

    443 417. doi: 10.1093/jxb/ert377. PubMed PMID: 24277278.

    444 31. Zheng X, Pan C, Diao Y, You Y, Yang C, Hu Z. Development of microsatellite

    445 markers by transcriptome sequencing in two species of Amorphophallus (Araceae). BMC

    446 Genomics. 2013;14: 490-500. doi: 10.1186/1471-2164-14-490. PubMed PMID: 23870214;

    447 PubMed Central PMCID: PMC3737116.

    448 32. Varshney RK, Graner A, Sorrells ME. Genic microsatellite markers in plants: features

    449 and applications. Trends Biotechnol. 2005;23(1): 48-55. doi:

    450 10.1016/j.tibtech.2004.11.005. PubMed PMID: 15629858.

    451 33. Tong C, Zhang C, Zhang R, Zhao K. Transcriptome profiling analysis of naked carp

    452 (Gymnocypris przewalskii) provides insights into the immune-related genes in highland

    453 fish. Fish Shellfish Immunol. 2015;46(2): 366-377. doi: 10.1016/j.fsi.2015.06.025.

    454 PubMed PMID: 26117731.

    456 Supporting information

    457 S1 Table. Primers for real-time PCR.

    458 S2 Table. Summary of read mapping in leaf, flower, stem, young fruit and mature

    459 fruit with three replicates.

    460 S3 Table. Size distribution of unigenes for Rosa roxburghii Tratt.

    461 S4 Table. The number of unigenes annotated according to the NCBI non-redundant

    462 (Nr), Swiss-Prot, KOG, Gene Ontology (GO) and KEGG databases in Rosa

    463 roxburghii Tratt.

    464 S5 Table. The number of unigenes annotated with GO and classified into three

    465 categories.

    466 S6 Fig. Histogram of Eukaryotic Ortholog Groups (KOG) classification of assembled

    467 unigenes.

    468 S7 Table. The number of unigenes involved in KEGG pathways.

    469 S8 Fig. Heat map of the expression levels of MYB transcription factors in

    470 five different tissues. The black box marks MYBs that were significantly

    471 expressed in mature fruit compared with the other four tissues.

    472 S9 Table. Analysis of genes related to MYB transcription factors in five different

    473 tissues of Rosa roxburghii Tratt.

    474 S10 Fig. Different colors represent conserved motifs.

    475 S11 Table. Summary of the different types of EST-SSRs identified from the transcriptome.

    476

    477 Figures and tables

    478 Fig 1. Rosa roxburghii plant and the different tissues collected in this study. (A) The whole

    479 plant, Bar = 50 cm; (B) Leaf and stem, Bar = 0.5 cm; (C) Flower bud, Bar = 1 mm;

    480 (D) Flower, Bar = 1 cm; (E) Young fruit (50 days after flowering, DAF), Bar = 1

    481 cm; (F) Mature fruit (120 DAF), Bar = 1 cm.

    482 Fig 2. Length distribution of the assembled unigenes for Rosa roxburghii Tratt.

    483 Fig 3. Overview of functional annotation for unigenes of Rosa roxburghii. (A) Venn

    484 diagram of the number of unigenes annotated by BLASTx against five different

    485 databases. The numbers in the circles represent the number of unigenes

    486 annotated by single or multiple databases; (B) Homology to other Rosaceae

    487 species, including strawberry (Fragaria × ananassa), apple (Malus domestica)

    488 and cherry (Prunus avium).

    489 Fig 4. (A) Histogram of Gene Ontology (GO) classification of assembled unigenes. Blue

    490 indicates biological process, green indicates cellular component, and red

    491 indicates molecular function. (B) Pathway assignment based on KEGG

    492 analysis. The bottom x-axis indicates the number of unigenes in a specific

    493 category. The left y-axis indicates the clustered function groups, and the right y-axis

    494 indicates the specific category of genes in the main category.

    495 Fig 5. Motif distribution of MYB proteins that were significantly expressed in mature fruits.

    496 Various colors represent different conserved motifs.

    497 Fig 6. Comparison of the relative expression levels of twelve randomly selected MYB

    498 genes by RNA-Seq and qRT-PCR.

    499 Table 1. Summary of the Trinity assembly for Rosa roxburghii.

    500 Table 2. Summary of EST-SSRs identified from the transcriptome of Rosa roxburghii.

    501 Table 3. Summary of different repeat numbers for SSRs isolated from the transcriptome of

    502 Rosa roxburghii Tratt.

    503

    504 Table 1. Summary of the Trinity assembly for Rosa roxburghii Tratt.

    Statistic            Transcript      Unigene
    Total number            212,534       63,727
    Total length (bp)   305,558,508   85,677,327
    N50 length (bp)           2,085        1,895
    Median length (bp)        1,077          995
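
For readers less familiar with the assembly statistics in Table 1, the sketch below shows how N50 and median length are commonly computed from a list of sequence lengths. It is an illustrative implementation with a toy input and hypothetical names, not the script used to produce Table 1; only the two totals reused at the end are taken from the table.

```python
from statistics import median


def n50(lengths):
    """Length L such that sequences of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy set of unigene lengths in bp (hypothetical values for illustration).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print("N50:", n50(toy_lengths))
print("Median:", median(toy_lengths))

# Mean unigene length implied by the totals reported in Table 1.
UNIGENE_TOTAL_LENGTH = 85_677_327
UNIGENE_COUNT = 63_727
mean_unigene_length = UNIGENE_TOTAL_LENGTH / UNIGENE_COUNT
print(f"Mean unigene length: {mean_unigene_length:.0f} bp")
```

N50 is weighted toward long sequences, which is why it is larger than the median for both transcripts and unigenes in Table 1.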

    513 Table 2. Summary of EST-SSRs identified from the transcriptome of Rosa roxburghii

    514 Tratt.

    SSR type         Number  Percentage  Dominant motif  Number of dominant motif  Percentage of dominant motif in all EST-SSRs
    Mononucleotide   19,589  52.17%      A/T             18,855                    50.22%
    Dinucleotide     11,504  30.64%      AG/CT           8,055                     21.45%
    Trinucleotide    5,879   15.66%      AAG/CTT         2,189                     5.83%
    Tetranucleotide  373     0.99%       AAAT/ATTT       141                       0.38%
    Pentanucleotide  93      0.25%       AAGAG/CTCTT     19                        0.05%
    Hexanucleotide   107     0.28%       AAGGAG/CCTTCT   7                         0.02%
                                         ACCTCC/AGGTGG   7                         0.02%
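
The percentages in Table 2 and the per-unigene average quoted in the text follow directly from the counts. A short check is sketched below; the counts are copied from Table 2 and the unigene number from Table 1, while the variable names are illustrative.

```python
# Counts per SSR type as listed in Table 2.
ssr_counts = {
    "Mononucleotide": 19589,
    "Dinucleotide": 11504,
    "Trinucleotide": 5879,
    "Tetranucleotide": 373,
    "Pentanucleotide": 93,
    "Hexanucleotide": 107,
}
UNIGENE_COUNT = 63727  # number of unigenes reported in Table 1

total_ssrs = sum(ssr_counts.values())
# Share of each SSR type among all EST-SSRs, in percent.
share_percent = {k: round(100 * v / total_ssrs, 2) for k, v in ssr_counts.items()}
# Average number of SSRs per unigene.
ssrs_per_unigene = round(total_ssrs / UNIGENE_COUNT, 2)

for ssr_type, pct in share_percent.items():
    print(f"{ssr_type:<16}{ssr_counts[ssr_type]:>7,}{pct:>8.2f}%")
print(f"Total SSRs: {total_ssrs:,}; SSRs per unigene: {ssrs_per_unigene}")
```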

    515

    516

    517 Table 3. Summary of different repeat numbers for SSRs isolated from the transcriptome of Rosa roxburghii Tratt. Column headings give the number of repeats.

    518

    SSR type               5       6       7       8       9      10      11      12      13      14      15     >15   Total
    Mononucleotide         0       0       0       0       0    5448    2864    2019    1380    1165     905    5808   19589
    Dinucleotide           0    2441    1765    1368    1158     883     985     858     216     259     263    1308   11504
    Trinucleotide       2965    1385     727     461      91      86      69      40      20       8       7      20    5879
    Tetranucleotide      242      95      15       7       4       5       0       3       2       0       0       0     373
    Pentanucleotide       67      17       8       1       0       0       0       0       0       0       0       0      93
    Hexanucleotide        74      22       7       1       1       0       1       1       0       0       0       0     107
    Total               3348    3960    2522    1838    1254    6422    3919    2921    1618    1432    1175    7136   37545
    Percentage         8.92%  10.55%   6.72%   4.90%   3.34%  17.10%  10.44%   7.78%   4.31%   3.81%   3.13%  19.01%    100%

    519
