MURDOCH RESEARCH REPOSITORY
This is the author’s final version of the work, as accepted for publication following peer review but without the publisher’s layout or pagination.
The definitive version is available at http://dx.doi.org/10.1016/j.ijpara.2011.11.010
Nicol, P., Gill, Reetinder, Fosu-Nyarko, J. and Jones, M.G.K. (2012) De novo analysis and functional classification of the
transcriptome of the root lesion nematode, Pratylenchus thornei, after 454 GS FLX sequencing. International Journal for
Parasitology, 42 (3). pp. 225-237.
http://researchrepository.murdoch.edu.au/7701/
Copyright: © 2012 Australian Society for Parasitology Inc.
It is posted here for your personal use. No further distribution is permitted.
Accepted Manuscript
de novo analysis and functional classification of the transcriptome of the root
lesion nematode, Pratylenchus thornei, after 454 GS FLX sequencing
Paul Nicol, Reetinder Gill, John Fosu-Nyarko, Michael G.K. Jones
PII: S0020-7519(12)00005-7
DOI: 10.1016/j.ijpara.2011.11.010
Reference: PARA 3353
To appear in: International Journal for Parasitology
Received Date: 12 September 2011
Revised Date: 18 November 2011
Accepted Date: 21 November 2011
Please cite this article as: Nicol, P., Gill, R., Fosu-Nyarko, J., Jones, M.G.K., de novo analysis and functional
classification of the transcriptome of the root lesion nematode, Pratylenchus thornei, after 454 GS FLX sequencing,
International Journal for Parasitology (2012), doi: 10.1016/j.ijpara.2011.11.010
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers
we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and
review of the resulting proof before it is published in its final form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
de novo analysis and functional classification of the transcriptome of the root lesion nematode,
Pratylenchus thornei, after 454 GS FLX sequencing
Paul Nicol1, Reetinder Gill1, John Fosu-Nyarko1, Michael G. K. Jones1,*

Plant Biotechnology Research Group, WA State Agricultural Biotechnology Centre, School of
Biological Sciences and Biotechnology, Murdoch University, Perth, WA 6150, Australia

*Corresponding author.
Tel.: +61-8-9360-2424; fax: +61-8-9360-2502

E-mail address: [email protected]

1 All authors contributed equally to the work.

Note: Nucleotide sequence data reported in this paper are available in the GenBank™ database
under the accession numbers JO845319-JO845338 and JL859810-JL866456.

Note: Supplementary data associated with this article.

ABSTRACT
The migratory endoparasitic root lesion nematode Pratylenchus thornei is a major pest of the
cereals wheat and barley. In what we believe to be the first global transcriptome analysis for P.
thornei, using Roche GS FLX sequencing, 787,275 reads were assembled into 34,312 contigs using
two assembly programs, to yield 6,989 contigs common to both. These contigs were annotated,
resulting in functional assignments for 3,048. Specific transcripts studied in more detail included
carbohydrate active enzymes potentially involved in cell wall degradation, neuropeptides, putative
plant nematode parasitism genes, and transcripts that could be secreted by the nematode.
Transcripts for cell wall degrading enzymes were similar to bacterial genes, suggesting that they
were acquired by horizontal gene transfer. Contigs matching 14 parasitism genes found in
sedentary endoparasitic nematodes were identified. These genes are thought to function in
suppression of host defenses and in feeding site development, but their function in P. thornei may
differ. Comparison of the common contigs from P. thornei with other nematodes showed that
2,039 were common to sequences of the Heteroderidae, 1,947 to the Meloidogynidae, 1,218 to
Radopholus similis, 1,209 matched expressed sequence tags (ESTs) of Pratylenchus penetrans and
Pratylenchus vulnus, and 2,940 to contigs of Pratylenchus coffeae. There were 2,014 contigs
common to Caenorhabditis elegans, with 15.9% being common to all three groups. Twelve
percent of contigs with matches to the Heteroderidae and the Meloidogynidae had no homology
to any C. elegans protein. Fifty-seven percent of the contigs did not match known sequences and
some could be unique to P. thornei. These data provide substantial new information on the
transcriptome of P. thornei and on those genes common to migratory and sedentary endoparasitic
nematodes, and provide additional understanding of genes required for different forms of
parasitism. The data can also be used to identify potential genes to study host interactions and for
crop protection.

Keywords: Root lesion nematode, Pratylenchus thornei, 454 GS FLX sequencing, Transcriptome,
Annotation, Functional classification, Parasitism genes

1. Introduction
Plant parasitic nematodes cause annual losses in crop production estimated at US$125
billion (Chitwood, 2003). Root lesion nematodes (RLNs), genus Pratylenchus, are economically one
of the most important groups of plant parasites and are widely distributed with approximately 68
nominated species. From their wide host range and migratory nature, it has been suggested that
Pratylenchus spp. may be less specialised plant parasites, possibly representing an evolutionary
intermediate between free-living forms and the highly specialised sedentary endoparasites such as
the root-knot and cyst nematodes (Mitreva et al., 2004). Nevertheless, RLNs can cause significant
crop losses of 7 to 15%, although these often go unrecognised (Vanstone et al., 2007). For example,
field trials in Western Australia have identified seven species, including Pratylenchus thornei, that
cause yield losses of 5 to 10% in wheat and 5 to 20% in barley, principally from infestation of
P. thornei and Pratylenchus neglectus (Vanstone et al., 2007).
RLNs are classified as migratory intercellular root endoparasites, that is, they live mainly in
plant roots but are able to move and feed from different host plant cells. They are polyphagous
and can infect a wide range of host plants including most of the important crop species. Their life
cycle takes 3-8 weeks depending on species and conditions (Castillo and Vovlas, 2007). After
embryogenesis and development of the first stage juvenile (J1), the nematode moults to the second
stage juvenile (J2) which then hatches from the egg. Except for the egg and J1 stages, all juvenile
and adult stages of RLNs are vermiform and motile, and can infect host plants. The J2 nematode
grows and moults three more times to become a mature male or female. Adults and juveniles can
leave and enter roots; adult females lay eggs singly or in small groups inside roots. Males are
common in some species and rare in others, although reproduction is usually by parthenogenesis.
Longer term survival under adverse conditions can occur at the egg stage or through suspended
metabolic activity. Infection causes root growth reduction, accompanied by the formation of
lesions, necrotic areas, browning and cell death, and root-rotting from secondary attack by fungi
or bacteria. When an RLN feeds, it both secretes materials from the single dorsal and two
sub-ventral pharyngeal gland cells and ingests cell cytoplasm using its hollow mouth stylet, damaging
host cells in the process. The secretions from the pharyngeal glands that are injected into cells via
the mouth stylet while feeding and migrating probably contain a mixture of enzymes, including cell
wall-degrading enzymes. The feeding behaviour involves phases of cell probing, stylet
penetration, salivation and ingestion. Relative to their economic importance, very little research
has been undertaken to study the molecular basis of RLN parasitism.
To date, molecular characterisation of Pratylenchus genes has been limited (e.g.
transcriptome analysis of Pratylenchus coffeae, Haegeman et al., 2011), partly because their
migratory behaviour makes them more difficult to study than sedentary endoparasites. Genome
sequencing and annotation of the free-living nematodes Caenorhabditis elegans (Consortium,
1998), Caenorhabditis briggsae (Stein et al., 2003) and Pristionchus pacificus (Dieterich et al.,
2008), and of the parasitic root-knot nematodes Meloidogyne hapla (Opperman et al., 2008) and
Meloidogyne incognita (Abad et al., 2008), provide reference genomes for comparison with those
of RLNs. ‘Next generation’ sequencing technologies have revolutionised transcriptomics, as
high throughput expression data for transcriptomes can be generated at single-base resolution
(Morozova et al., 2009). The ‘next generation’ sequencing technology offered by Roche 454 GS
FLX was used to analyse the transcriptome of a mixed stage population of P. thornei. In this report
we present a functional classification of 6,989 contigs based on assignments to publicly available
datasets.
2. Material and methods
2.1. Source materials and RNA extraction
A population of P. thornei, derived from a single female isolated from an infected wheat
plant in Western Australia, was maintained by serial subculture on aseptic carrot discs in sterile
containers and kept in the dark at 25 °C (Jones et al., 2009). To harvest the
nematodes, the carrot discs were cut into 5 mm thick pieces and extracted using a mist apparatus
to collect active nematodes. Nematodes freshly extracted from carrot pieces were used for RNA
extraction. Eggs, juvenile stages (J2-J4) and adults were mixed together at the ratio of 1:2:3.
Total RNA was extracted from these nematodes using Trizol (Invitrogen Life Technologies,
Carlsbad, USA) and then cleaned using an RNeasy Mini kit column (Qiagen Inc, California, USA),
following the manufacturer’s instructions. The extracted RNA was assessed for quality and
quantified using an Agilent 2100 Bioanalyzer (Agilent Technologies, Mississauga, Canada), giving
an RNA integrity number (RIN) of 7.0. A cDNA library was prepared from 2,592 ng of total RNA
using the Ovation RNA-Seq system (NuGEN Technologies, Inc., CA, USA), which uses both oligo-dT
and random primers, and checked for removal of rRNA using an Agilent 2100 Bioanalyzer.
Sequencing was carried out using a Roche 454 GS FLX DNA platform at the Institute of
Immunology and Infectious Diseases, Murdoch University, Perth, Australia.

2.2. Sequence data analysis and assembly
One full picotitre plate was used to generate 454 GS FLX datasets, and the results from each
half, Lane 1 (L1) and Lane 2 (L2), were analysed separately as shown in Fig. 1. Following removal of
the sequencing linkers, the reads from L1 and L2 were assembled separately with the CAP3 (Huang
et al., 1999) and MIRA (Chevreux et al., 2004) assembly programs to generate contigs for each of
the two data sets. Two assembly programs were used in order to minimise assembly errors
(Kumar and Blaxter, 2010). In the next step, CAP3 was used to reassemble the four contig sets
into one. Finally, only those contigs common to each of the initial four assemblies were selected
for further analysis. The selected contigs were at least 90% similar over a minimum alignment
length of 40 bases. Although this assembly procedure reduced the total number of contigs, these
were expected to be of high quality and to better represent the original data. After assembly and
before annotating the transcripts, CD-HIT-EST (Li and Godzik, 2006) was used to determine
whether there was any redundancy in the final data set. The contigs were submitted to the National
Center for Biotechnology Information (NCBI) under the accession numbers JO845319-JO845338
and JL859810-JL866456.

2.3. Transcript annotation
Protein identity, protein domains and Gene Ontology (GO) terms were assigned to
assembled contigs homologous to previously identified genes/proteins using the BLAST+ suite of
programs and InterProScan (Ashburner et al., 2000; Camacho et al., 2009; Hunter et al., 2009).
The BLAST programs were run in threaded mode to enable many alignments to be done in a short
time. BLAST searches were carried out consistently at a BLAST expected value (E) of E=1/6,989
(1.4e-04) unless otherwise stated. BLASTX was used to annotate the contigs with data available in
the nucleotide and amino acid datasets (non-redundant (nr) and nucleotide (nt)) curated by the
NCBI. Datasets containing 12,346,081 nucleotide and 14,185,994 amino acid sequences were
searched for homology with the P. thornei contigs. Additional BLAST analyses were carried out
using other databases including the Whole Genome Shotgun, High Throughput Genomic Sequences,
Genome Survey Sequence and Expressed Sequence Tags (ESTs) of mouse, human and others, all from NCBI.
The Biopython (Cock et al., 2009) libraries were used for parsing the XML output file from the
BLAST analysis. The results of annotation were stored in a SQLite database that contained
annotation and alignment information. The database allowed efficient mining of data through the
Python SQLite wrapper. SQL queries were performed against the database to extract items of
interest.
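As an illustration of the thresholding and storage steps described above, the following is a minimal sketch using only the Python standard library. The table name, column names and hit rows are hypothetical and are not taken from the authors' database, whose schema is not described; only the cutoff E=1/6,989 comes from the text.

```python
import sqlite3

N_CONTIGS = 6989
E_CUTOFF = 1 / N_CONTIGS  # about 1.4e-04, the threshold stated in the text

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (contig TEXT, subject TEXT, evalue REAL, identity REAL)"
)
rows = [
    ("contig_0001", "subject_A", 1e-30, 78.5),
    ("contig_0002", "subject_B", 5e-03, 41.0),  # fails the cutoff
    ("contig_0003", "subject_C", 1e-05, 55.2),
]
conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)


def significant_hits(connection, cutoff=E_CUTOFF):
    """Return (contig, subject, evalue) rows at or below the E-value cutoff."""
    query = (
        "SELECT contig, subject, evalue FROM hits "
        "WHERE evalue <= ? ORDER BY evalue"
    )
    return connection.execute(query, (cutoff,)).fetchall()


kept = significant_hits(conn)
print(f"E cutoff = {E_CUTOFF:.1e}")
print([contig for contig, _, _ in kept])
```

Keeping the cutoff as a query parameter rather than hard-coding it makes it straightforward to rerun the same query for analyses that state a different E-value.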

2.4. Functional classification of transcripts
Pratylenchus thornei contigs were assigned to metabolic pathways using the Kyoto
Encyclopedia of Genes and Genomes (KEGG2, Kanehisa et al., 2010) database
(http://www.genome.jp/KEGG/) as an alternative approach to annotate transcripts and to
determine their biochemical functions. KEGG enzyme commission (EC) and orthology (KO)
pathways were used as the basis of this assignment. BLAST queries of the contigs against KEGG
databases (nucleotide and amino acid) were carried out to obtain information on biochemical
pathways. The results were parsed to obtain KEGG identifiers, EC numbers and KO numbers which
were then assigned to particular KEGG metabolic pathways. Python was used to extract KEGG
identifiers in every pathway for comparison with the annotated contigs. InterProScan was also used
to match P. thornei contigs to characterised protein domains in the InterPro database. Mapping of
InterPro domains allowed placement of contigs into the GO hierarchy. The functional
classification focused on three main aspects: genes potentially involved in plant parasitism
including cell wall-degrading enzymes, neuropeptides and candidate sequences for nematode
control via RNA interference (RNAi). Carbohydrate active enzymes (CAZymes) encoded by P.
thornei were identified using the Carbohydrate Active Enzymes (CAZy) database
(http://www.cazy.org/). The contigs were also compared with all ESTs downloaded from the NCBI
database using local TBLASTX. In addition, BLASTX queries against the C. elegans WormBase
datasets (http://www.wormbase.org, release WS215, 2011) were carried out to further annotate
the transcriptome and to assign RNAi phenotypes to the contigs (Stein et al., 2001). The presence
of signal peptides in any of the contigs was predicted using SignalP 4.0 (Petersen et al., 2011). A
hidden Markov model was used to predict transmembrane helices (TMH) in the contigs
(Sonnhammer et al., 1998). Because the project focuses on the parasitic behaviour of P. thornei,
the whole reassembled set of 34,312 contigs was used in all analyses involving published parasitism
genes, CAZymes and prediction of new parasitism genes using signal peptides and TMH.

2.5. Analysis of unannotated transcripts
Transcripts with no known protein homologues in any database were further analysed to
assign possible functions to them. For those contigs, a database (http://sirna.sbc.su.se/) search
for contigs matching small interfering RNAs (siRNAs) was performed using a local BLASTN from
NCBI (Chalk et al., 2005). The BLAST word size parameter was set to 15 and only those matches
with a percentage identity greater than 80% were retained. The presence of micro RNAs (miRNAs)
was also determined by searching a miRNA database (http://www.mirbase.org/) with BLASTN
using the accurate E-value condition (Griffiths-Jones et al., 2006). A further search was conducted to
match unannotated contigs to non-coding RNAs (ncRNAs) using the FANTOM 3 database
(http://research.imb.uq.edu.au/rnadb). This database contains ncRNAs previously identified in
the functional annotation of mouse. BLASTN with word size set to 15 and score 50 was used to
determine matches. The number of significant matches for contigs in both the annotated and
unannotated data sets was determined. In addition, the number and distribution of open reading
frames (ORFs) of both the annotated and unannotated contigs were compared to determine
whether the latter set of contigs were protein coding sequences.
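The ORF comparison described above could be approached along the following lines. This is a sketch under stated assumptions, not the authors' procedure: the text does not define an ORF, so the code assumes an ATG start, the three standard stop codons, a six-frame scan, and a hypothetical minimum length of 30 codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orf_lengths(seq, min_codons=30):
    """Lengths in codons (stop excluded) of ATG-to-stop ORFs in six frames.

    min_codons is an assumed cutoff chosen for illustration.
    """
    lengths = []
    forward = seq.upper()
    for strand in (forward, reverse_complement(forward)):
        for frame in range(3):
            start = None
            for pos in range(frame, len(strand) - 2, 3):
                codon = strand[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (pos - start) // 3
                    if n_codons >= min_codons:
                        lengths.append(n_codons)
                    start = None  # look for the next ORF in this frame
    return lengths


# Hypothetical contig: a start codon, 40 alanine codons and a stop codon.
example_contig = "ATG" + "GCT" * 40 + "TAA"
print(orf_lengths(example_contig))
```

Running the same function over the annotated and unannotated contig sets would give two length distributions that can then be compared.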
3. Results
3.1. 454 GS FLX sequencing and assembly
Total RNA extracted from mixed stages of P. thornei was converted to cDNA, rRNA was
removed and samples were sequenced on two full picotitre plates using the Roche 454 GS FLX platform.
After filtering the two data sets (L1 and L2), 390,417 reads with an N50 of 268 bases were obtained in
L1 and 396,858 reads with an N50 of 264 bases in L2 (Table 1). The reads were divided into groups
based on their length (read length groups). The distribution of read lengths (in intervals of 20 bases)
was bimodal, with peaks at 70 and 230 bases (Supplementary Fig. S1). The first of these may be due to
the presence of characteristically short transcripts. The reduction in transcripts of less than 60
bases resulted from filtering out short reads. Both data sets have a similar distribution, with L2
showing slightly shorter lengths. Fig. 2 shows a parallel base distribution for both datasets L1 and
L2, expressed as the distribution of read lengths as a percentage of the total number of bases
sequenced. Transcripts of length close to the maximum 280 bases constituted the bulk of the
sequenced bases. Also shown is the reduction in numbers of transcripts in the range from 350 to
570 bases.
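For readers unfamiliar with the N50 statistic quoted for the reads and assemblies, a minimal sketch of the usual definition follows: the length L such that sequences of length L or longer together contain at least half of all bases. The example lengths are hypothetical and do not come from the L1 or L2 data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths supplied")


# Hypothetical read lengths in bases, for illustration only.
example_lengths = [280, 270, 260, 230, 70, 70, 60]
print(n50(example_lengths))  # 260
```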
A comparison of each of the four initial assemblies was done to assess the quality of the
assemblies. The two datasets L1 and L2 were each assembled with CAP3 and MIRA assemblers
separately, resulting in 32,897 and 43,569 contigs, respectively. Fig. 3 shows the number and
distribution of contigs of a given length for each of the initial assemblies and results of the
reassembled data using CAP3. The distribution of the contigs in the size ranges is similar for all of the
assemblies, indicating no loss of quality in the process. Assemblies for L1 and L2 show
the most similarity, whereas those from different assemblers show the least. The MIRA
assemblies for reads L1 and L2 differed in N50 by only four bases; similarly, the N50 for reads L1
and L2 differed by only six bases for the CAP3 assemblies. The largest difference in N50, of 22 bases,
occurred between the MIRA L1 and CAP3 L2 assemblies. A final assembly of these four sets was
carried out using CAP3, resulting in a reassembled dataset of 34,312 contigs with an N50 of 707
bases. This represents an increase of approximately 200 bases from each of the sub-assemblies,
as shown in Table 1. The final step in the assembly process involved selecting those contigs from
the combined assembly that had components in each of the four initial assemblies. This
requirement was satisfied by a contig in the final assembly if one or more contigs from each of the
four sub-assemblies was used in assembling that contig. Subsequent filtering substantially
reduced the number of contigs in the final set to 6,989 (Table 1).
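The selection rule just described (a final contig is kept only if at least one contig from each of the four sub-assemblies contributed to it) can be expressed as a set-containment test. The sketch below assumes that the contributing sub-assemblies of each final contig have already been extracted from the assembler output; how that was done is not stated in the text, and the labels and contig IDs are hypothetical.

```python
SOURCES = ("CAP3_L1", "CAP3_L2", "MIRA_L1", "MIRA_L2")  # hypothetical labels


def contigs_supported_by_all(components, required_sources=SOURCES):
    """Keep final contigs built from at least one contig of every sub-assembly.

    components maps a final contig ID to the set of sub-assembly labels
    whose contigs were merged into it.
    """
    required = set(required_sources)
    return sorted(
        contig_id
        for contig_id, sources in components.items()
        if required <= set(sources)
    )


# Hypothetical membership table, for illustration only.
example = {
    "final_1": {"CAP3_L1", "CAP3_L2", "MIRA_L1", "MIRA_L2"},
    "final_2": {"CAP3_L1", "MIRA_L1"},
    "final_3": {"CAP3_L1", "CAP3_L2", "MIRA_L2"},
}
print(contigs_supported_by_all(example))  # ['final_1']
```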
Approximately 50% of the contigs in the individual assemblies were in the size range of
100-500 bases, but in the combined assembly approximately 50% of the contigs were in the size range
of 500-1,000 bases. In addition, approximately 75% of the contigs were at least 500 bases long in
the reassembled and common contig sets compared with 50% for the initial assemblies
(Supplementary Table S1). After assembly, CD-HIT-EST was used to determine whether or not
there was redundancy in the final data set before annotation. There was no significant reduction
in the number of transcripts, with more than 99% of the contigs being single copies. Also, the
reassembled and common contig sets contained contigs substantially longer than those found in
the individual assemblies (Fig. 3). This indicates that the final reassembly of the contigs using
CAP3 removed redundancy in the data, thereby removing the need to cluster the contigs.

3.2. Functional classification based on GO assignments
To categorise transcripts by putative function, the GO classification scheme was used. The
results of this analysis are presented in Fig. 4, which shows the percentages of transcripts
classified in terms of their biological processes, molecular functions and cellular components. The
y-axis indicates the percentage of contigs assigned a particular GO term on the x-axis, with 1,368
contigs assigned GO terms for multiple molecular functions, 831 mapping to multiple biological
processes and 670 contigs to cellular components. A complete list of mappings is available in
Supplementary Tables S2-S4. Metabolism mappings (under Biological Process, Supplementary Table S2)
were the most highly represented GO category, with 76% of the contigs encoding metabolism genes.
The next most highly represented category (55%) was binding/carrier proteins (in Molecular Function,
Supplementary Table S3), with 184 contigs matching ATP binding and over 100 contigs matching
DNA and ion binding (including calcium, iron, metal and magnesium ion binding) genes.
Pratylenchus thornei ‘extracellular’ mappings (23 contigs) (under Cellular Component,
Supplementary Table S4) were mainly associated with transport of serum albumin, which is
primarily a carrier protein for steroids, fatty acids and thyroid hormones, with hormone activity
related to insulin/relaxin, and with chitin binding (Fig. 4).
Because nematodes are sterol-auxotrophs, they rely on dietary sources of sterol for growth
and development (Svaboda and Chitwood, 1992). Nematodes can modify ingested sterols; for
example, they may synthesise 4α-methylated sterols and steroid hormones, using the latter to
regulate larval development and moulting (Entchev and Kurzchalia, 2005), and may also synthesise
cholesterol. Four contigs involved in steroid metabolic processes were found, for example oxysterol
binding proteins, which bind oxygenated derivatives of cholesterol.

3.3. Enzymes and orthologues involved in metabolic pathways
Pratylenchus thornei contigs were assigned to metabolic pathways using the KEGG
database as an alternative approach to determine biochemical functions. EC and KO pathways
were used as the basis of this assignment. EC numbers were assigned to 250 contigs of which 165
mapped to KO (see Supplementary Tables S5 and S6). All of the 11 major metabolic EC pathways
were represented in the contigs (Table 2). Pathways well represented included: purine
metabolism (12 enzymes represented), citrate cycle (nine), alanine/aspartate/glutamate
metabolism (seven), glycolysis/gluconeogenesis (six), butanoate metabolism (six), nitrogen
metabolism (six), pyrimidine metabolism (six) and lysine degradation (six).
In general, nematodes store fatty acids and use the glyoxylate cycle to generate
carbohydrates from beta-oxidation of such fatty acids, making them unique among animals
(Barrett and Wright, 1998). The glyoxylate pathway relies on two enzymes, malate synthase and
isocitrate lyase, to bypass two decarboxylation steps. Only two contigs of the mixed stage
population of P. thornei mapped to two glyoxylate pathway enzymes, malate synthase (EC:
2.3.3.9) and aconitate hydratase (EC: 4.2.1.3), which are shared with the citric acid cycle.
Isocitrate lyase was not found in this analysis.
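A check of this kind, testing which key enzymes of a pathway are absent from the set of EC numbers assigned to contigs, can be sketched as below. The EC number for isocitrate lyase (4.1.3.1) is the standard Enzyme Commission number and is not given in the text; the function and variable names are illustrative, and the observed set contains only the two EC numbers reported above.

```python
# Key glyoxylate-bypass enzymes; 4.1.3.1 is supplied here, not by the text.
GLYOXYLATE_KEY_ENZYMES = {
    "2.3.3.9": "malate synthase",
    "4.1.3.1": "isocitrate lyase",
}


def missing_enzymes(observed_ecs, required=GLYOXYLATE_KEY_ENZYMES):
    """Return the required enzymes whose EC numbers were not observed."""
    return {ec: name for ec, name in required.items() if ec not in observed_ecs}


# EC numbers reported for the glyoxylate pathway in the paragraph above.
observed = {"2.3.3.9", "4.2.1.3"}
print(missing_enzymes(observed))  # {'4.1.3.1': 'isocitrate lyase'}
```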
In relation to KEGG assignments of transcripts to metabolic pathways (Table 2), we noted
apparent anomalies such as transcripts assigned to photosynthesis and carbon fixation. On closer
examination these transcripts are subunits α and β of an H+ ATPase common to electron transport
pathways in photosynthesis and oxidative phosphorylation, and the transcript assigned to carbon
fixation is triose phosphate isomerase, an enzyme of glycolysis and gluconeogenesis. Other
apparently unexpected results were observed for activities common to a number of pathways,
including the assignment of a contig to hexose 6-phosphate transferase, which is involved in the
pathway for streptomycin biosynthesis and is also a key enzyme in glycolysis. Full details of the EC
numbers of all the transcripts are provided in Supplementary Tables S5 and S6.
In addition to an overall analysis of transcripts obtained from the P. thornei transcriptome,
a more detailed analysis of putative nematode parasitism genes, including secreted peptides and
CAZymes, neuropeptides and sequences orthologous to C. elegans genes analysed for RNAi, is
presented.

3.4. Pratylenchus thornei putative parasitism genes
Searches were undertaken to identify contigs in the P. thornei transcriptome that may
have homologies or functional equivalents to published parasitism genes or gene products of
sedentary and migratory endoparasitic plant nematodes. Twelve gene products known to be
involved in parasitism by root knot and cyst nematodes were identified at E-values ranging from
10E-6 for Mi-msp-1, a venom allergen-like protein, to 10E-131 for 14-3-3(b) gene products, both of
M. incognita (Table 3). The number of contigs matching each gene product ranged from one (for a
Heterodera schachtii annexin (Hs4F01), Mi-msp-1, Heterodera glycines Vap-1 and a Globodera
rostochiensis peroxiredoxin (Gr-TpX)) to 31 for transthyretin-like proteins and precursors closely
related to those of H. glycines and Radopholus similis. Other gene products with high amino acid
identities to those of M. incognita were calreticulin (Mi-crt-1), cysteine proteases (Mi-cpl) and the
glutathione-S-transferase, Mi-gsts-1. Pratylenchus thornei gene products identical to parasitism
genes of cyst nematodes included Gp-far-1, a fatty acid retinoid binding protein, and a RAN-BP-like
protein (Gp-rbp-1), both of Globodera pallida, and two products from G. rostochiensis: a
RAN-BP-like protein, Gr-A18, and glutathione peroxidase. Twenty-nine contigs in the P. thornei
transcriptome were identified as transthyretin-like proteins of the migratory plant-parasitic
nematode, R. similis, specifically homologous to Rs-ttl 1-4. These proteins are members of an
extensive nematode-specific family of proteins highly expressed in the migratory nematode R.
similis. Two additional contigs homologous to a transthyretin-like protein precursor were closely
related to sequences of H. glycines. These proteins, as well as Mi-msp-1, are known to be
predominantly expressed in parasitic stages of the respective nematodes.
Chorismate mutase, cm-1, a gene whose product is known to be involved in the parasitism
of both root knot and cyst nematodes, was not identified in the P. thornei transcriptome at the
accurate E-value. However, five contigs matched cm-1 of G. rostochiensis (Gr-cm-1) and seven
matched cm-1 of H. schachtii (Hs-cm-1), with amino acid identities ranging from 31.6–42.3% and
28.6–47.6%, respectively. In addition, four contigs were only 47.6% identical to an isoform of the
Gr-cm-1 gene, Gr-cm-1-IRII (GenBank accession no. ABR19890.1), an alternative splicing product.
BLASTX searches also resulted in sequences matching cm-1 of Erysipelotrichaceae bacterium
(GenBank accession no. ZP06644729) and a chorismate binding enzyme of the bacterium
Burkholderia mallei with 30% identity (GenBank accession no. YP103883.1).

3.5. Secreted peptides and CAZymes
CAZymes are primarily involved in carbohydrate metabolism and can be grouped into
families of structurally-related catalytic and carbohydrate-binding modules (or functional domains)
that degrade, modify or create glycosidic bonds. These families are glycoside hydrolases (GHs),
glycosyltransferases (GTs), polysaccharide lyases (PLs), carbohydrate esterases (CEs) and
carbohydrate binding modules (CBMs) (www.cazy.org). Some CAZymes are known to be secreted
by sedentary as well as migratory plant parasitic nematodes and have been predicted to be
involved in nematode migration in the root and feeding from host cells (Bakhetia et al., 2007). In
all, 160 contigs matched to all five families of CAZymes; 51 contigs to 19 subfamilies of GHs, 46
contigs to 24 subfamilies of GTs, one contig to the PL3 subfamily, eight contigs to three CE
subfamilies and 32 contigs matching to eight CBM subfamilies (Table 4). These gene products are
most homologous to bacterial and fungal homologues with a few similar to those of C. elegans,
insects, fishes and protozoa, except for the carbohydrate esterases, where all four are of
non-pathogenic origins (Table 4). Contigs identified to encode peptides of the following CAZyme
subfamilies were closely related to those identified for three plants: CBM43 (Zea mays), GH17 and
GT31 (Oryza sativa), and GH36, GH47, GT47, GT48, GT50 and CBM50 (Vitis vinifera) (not included
in Table 4).
Endoglucanases in plant nematodes mostly belong to the GH5 subfamily, which has been
well studied in the sedentary nematode genera Heterodera, Globodera and Meloidogyne. One contig
of the P. thornei transcriptome was identified as a peptide of the GH5 subfamily, homologous to
beta-1,4-endoglucanase of M. hapla. However, GH45, previously identified in Bursaphelenchus
xylophilus, was absent from the GH families of CAZymes identified in the transcriptome. Of
interest are three contigs that show homology to xylanase (GH10) of Neocallimastix patriciarum
and one to a GH12 endoglucanase closely related to that of Sinorhizobium meliloti, all of
non-nematode organisms. Previously xylanase and a GH12 endoglucanase were identified in R.
similis and Xiphinema index (Jones et al., 2005). Two contigs were identified as homologous to
pectate lyases of M. incognita and M. hapla, a group of enzymes also identified in several sedentary
phytoparasitic as well as non-pathogenic and fungivorous nematodes (Doyle and Lambert, 2002;
Kikuchi et al., 2006; Karim et al., 2009).
17
Further analysis to predict secreted proteins with possible roles in parasitism of P. thornei
was conducted using a combination of three approaches. First, we looked for those contigs with
predicted signal peptides. We then removed contigs with transmembrane helices (TMH) from this set
using the hidden Markov model. Lastly, contigs with carbohydrate active domains were retained.
The final set could be proteins that are secreted by parasitic stages of P. thornei, directed to
host cells, and have a direct role in the nematode’s feeding. This analysis revealed that 1,416 of
the contigs had no TMH whereas 970 had signal peptides. Of these, 194 had signal peptides but no
TMH. There were 15 CAZymes with signal peptides but no TMH (Table 4). Seven of these were
identical to GH subfamily enzymes including the GH5 protein of M. hapla and Rotylenchulus
reniformis, a GH45 enzyme identical to that of Brugia malayi and a GH47 protein of the trematode
Schistosoma mansoni. Other contigs of P. thornei that satisfied the above criteria included
proteins from four GT subfamilies (GT2, GT4, GT55 and GT66) mostly identical to those of bacteria
and fungi, a CE11 protein (Ostreococcus tauri), and three with carbohydrate binding modules
(CBM14, CBM33 and CBM48). Examples of the eight contigs that matched proteins known to be
secreted by other organisms included a glycosyl transferase from Caulobacter segnis and a putative
trehalose synthase (Cryptococcus neoformans). Glycogen synthase (Steinernema feltiae), the
C. elegans protein C08H9.6, and two hypothetical proteins from plant sources (rice and grapes)
were also identified. One contig matched the Muc4B gene in Drosophila melanogaster, found in
secretions of the fly, and loosely related to the salivary mucin MG1 gene in Canis familiaris
(E-value 2E-03).
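The three-step filter described above can be sketched in Python as follows. This is only an illustration, not the pipeline used in the study: the record type `ContigAnnotation`, its field names and the example contig IDs are all hypothetical, and the per-contig predictions (signal peptide, TMH count, CAZy family hits) are assumed to have been produced beforehand by external tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContigAnnotation:
    """Per-contig predictions (hypothetical record; fields are assumptions)."""
    contig_id: str
    has_signal_peptide: bool   # output of a signal peptide predictor
    tmh_count: int             # number of predicted transmembrane helices
    cazy_families: tuple       # e.g. ("GH5",); empty if no CAZy domain


def predict_secreted_cazymes(annotations):
    """Apply the three filters in order and return the surviving contig IDs."""
    # Step 1: keep contigs with a predicted signal peptide.
    with_signal = [a for a in annotations if a.has_signal_peptide]
    # Step 2: remove contigs with any predicted transmembrane helix.
    no_tmh = [a for a in with_signal if a.tmh_count == 0]
    # Step 3: retain contigs carrying a carbohydrate-active domain.
    return [a.contig_id for a in no_tmh if a.cazy_families]


# Made-up example records, one failing each filter and one passing all three.
example = [
    ContigAnnotation("contig_A", True, 0, ("GH5",)),
    ContigAnnotation("contig_B", True, 2, ("GT2",)),
    ContigAnnotation("contig_C", False, 0, ("GH45",)),
    ContigAnnotation("contig_D", True, 0, ()),
]
print(predict_secreted_cazymes(example))
```

Applying the filters sequentially, rather than as one combined condition, makes it easy to report the size of each intermediate set, as is done in the text for contigs with signal peptides and contigs without TMH.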

3.6. Neuropeptides
Neuropeptides have diverse roles in the function and development of the nervous system
(Li et al., 1999). They function not only as neuromodulators, causing long-term changes in the
responding cell through G-protein-coupled receptors, but also as primary transmitters in
invertebrate nervous systems (Li et al., 1999; Li, 2005). To date 109 neuropeptide genes have
been identified in C. elegans by bioinformatic and peptidomic approaches (Li et al., 1999; Nathoo
et al., 2001; Pierce et al., 2001; McVeigh et al., 2005). Based on their conserved motifs, these are
divided into three classes: the FMRFamide-like peptide (flp) gene family, ins genes that encode
insulin-like peptides and peptides derived from neuropeptide-like protein (nlp) genes which have
no sequence similarity to the other two classes (Husson et al., 2007).
Searches for P. thornei contigs homologous to the wide variety of bioactive neuropeptides
of C. elegans revealed four matches to nlp genes (nlp 15, nlp 24, nlp 26 and nlp 37). Three contigs
homologous to two ins genes, ins-17 and ins-18 (EMBL gene names Ceinsulin-2 and Ceinsulin-1,
respectively) were identified in the P. thornei transcriptome. In C. elegans, these may be involved
in reproductive growth (Malone and Thomas, 1994; Li et al., 2003), in determining life span
(Kenyon et al., 1993) and in limiting body size (McCulloch and Gems, 2003). Flp 17 was the only
FMRFamide-like peptide identified in the transcriptome, matching one contig. It is worth
mentioning that at E-values greater than 10E-3, 209 contigs matched 40 of the 47 known nlps of
C. elegans, 31 matched five flp gene families and one contig also matched ins-2.

3.7. The P. thornei transcriptome and other nematode families
The annotated P. thornei contigs were compared with ESTs of plant parasitic nematode
families: Meloidogynidae (73,343 ESTs, NCBI), Heteroderidae (48,145 ESTs, NCBI), three groups of
Pratylenchidae, and with C. elegans proteins (Wormbase release 225 with 25,030 proteins) using
BLASTX and tBLASTX (Fig. 5). In all, 1,947 and 2,039 of the contigs matched the
Meloidogynidae and Heteroderidae families, respectively, whereas 2,014 of the contigs matched
C. elegans proteins. A comparative analysis with available sequences of the Pratylenchidae
revealed that 18.5% matched the combined data of Pratylenchus penetrans and Pratylenchus vulnus,
and 19.7% of sequences of R. similis have significant matches to the P. thornei common contigs
(Fig. 5). Of the 2,940 P. thornei common contigs which have significant similarity to 5,504
P. coffeae contigs, only 584 (8.4%) were unannotated. Approximately 18% of the contigs matched
sequences of all of the groups compared. Sequences of some contigs were exclusively
homologous to some groups: 3.7% to ESTs of Meloidogynidae, 4.6% to ESTs of Heteroderidae and
5.4% to C. elegans proteins, of which the 20 most highly conserved gene products are provided in
Table 5. These include gene products of structural components (actin, paramyosin), enzymes
(ubiquitin activating, protein kinase, glutamic acid decarboxylase), proteins with binding
properties (DNA binding-topoisomerase, calcium binding), genes of chromosomal loci important
for development such as structural maintenance of chromosomes (smc-3) and
embryogenesis/moulting (gex-2), and Argonaute-Like genes (alg-1) (Table 5). Twelve percent of
the contigs that matched the Meloidogynidae and Heteroderidae families had no match to any
C. elegans protein. Approximately 57% of the contigs did not match any sequence of the
sedentary plant parasites or the free-living C. elegans. They also did not match any other
sequence in gene and protein sequence databases.
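The shared, group-exclusive and unmatched fractions reported above are set operations on per-group hit lists. The sketch below assumes that the BLASTX/tBLASTX results have already been reduced to one set of matching contig IDs per comparison group; the function names and the toy contig IDs are hypothetical and do not reproduce the study's data.

```python
def partition_matches(all_contigs, hits_by_group):
    """Split contigs into shared-by-all, exclusive-per-group and unmatched sets.

    all_contigs   : iterable of every contig ID
    hits_by_group : dict mapping a group name to the set of contig IDs
                    with a significant match in that group
    """
    universe = set(all_contigs)
    groups = list(hits_by_group)

    # Contigs with a match in every group.
    shared = set(universe)
    for name in groups:
        shared &= hits_by_group[name]

    # Contigs matching one group and none of the others.
    exclusive = {}
    for name in groups:
        others = set().union(*(hits_by_group[o] for o in groups if o != name))
        exclusive[name] = hits_by_group[name] - others

    # Contigs with no match in any group.
    unmatched = universe - set().union(*hits_by_group.values())
    return shared, exclusive, unmatched


def percent(subset, total):
    """Size of subset as a percentage of total, rounded to one decimal."""
    return round(100 * len(subset) / len(total), 1)


# Toy data: ten contigs and three comparison groups.
contigs = [f"c{i}" for i in range(1, 11)]
hits = {
    "Meloidogynidae": {"c1", "c2", "c3"},
    "Heteroderidae": {"c1", "c2", "c4"},
    "C. elegans": {"c1", "c5"},
}
shared, exclusive, unmatched = partition_matches(contigs, hits)
print(percent(shared, contigs), percent(unmatched, contigs))
```

Expressing every fraction against the same denominator (all common contigs) keeps the percentages comparable across groups.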

3.8. RNAi phenotype matches to C. elegans genes
The majority of C. elegans genes have been surveyed for knock-out phenotypes using RNAi,
where matching mRNA is degraded by the introduction of sequence-specific double-stranded RNA
(dsRNA) (Fire et al., 1998). A comparative analysis was done with the P. thornei contigs and
C. elegans genes to look for orthologues with high identity to those of C. elegans with RNAi
phenotypes. The 2,014 P. thornei contigs homologous to C. elegans matched 1,484 genes, of which
1,276 have documented RNAi phenotypes. These genes are expressed in all stages of the
nematode’s development and are involved in cellular and molecular functions that directly affect
biological processes such as germ cell development, embryogenesis, larval development,
moulting, positive and negative regulation of growth, digestion, reproduction, excretion and
defense response. Most of the RNAi phenotypes indicate that a knock-out of these genes will
markedly affect nematode development, and for many of these genes RNAi phenotypes include
arrested development at the embryonic, larval and adult stages. RNAi of approximately 35% of
these genes results in some form of lethality at the embryonic, larval or adult stages in C. elegans.

3.9. Unannotated contigs
A significant number of the contigs (approximately 57%) had no homology in the NCBI
databases after searches at E-values less than 10E-04. This may be because these sequences are
specific to P. thornei or to Pratylenchus spp. in general. They may also be non-protein-coding
sequences, or may have had coding sequences that were too short to be matched by BLASTX.
Further studies were performed to analyse these sequences and their relevance or possible
functions in the transcriptome. ORFfinder was used to predict the number of non-protein-coding
sequences in the unannotated contigs and the distribution was compared with that of the
annotated dataset, where an ORF was defined as a sequence with both start and stop codons using
the standard code. The results indicate that 43% had ORFs. A similar percentage (43.2%) of the
unannotated contigs exhibited a similar pattern in ORF size distribution to that of annotated
contigs, indicating that they could be functional sequences. For both groups, the frequency of
the contigs was highest when the longest ORF was 160-180 nt long (Fig. 6).
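The ORF definition used above (a start codon followed in frame by a stop codon of the standard code) can be illustrated with a short Python sketch. This is not ORFfinder itself: the six-frame scan, the restriction to ATG as the start codon, and the 20-nt bin width (inferred from the 160-180 nt class) are assumptions made for illustration.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}          # standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_length(seq):
    """Length (nt) of the longest ATG...stop ORF in any of the six frames.

    The stop codon is counted in the length; returns 0 if no ORF is found.
    """
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first in-frame start
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None                   # look for the next ORF
    return best


def bin_label(length, width=20):
    """Label of the fixed-width size class that contains `length`."""
    low = (length // width) * width
    return f"{low}-{low + width}"


def orf_size_distribution(sequences, width=20):
    """Count contigs per longest-ORF size class, skipping contigs with no ORF."""
    lengths = (longest_orf_length(s) for s in sequences)
    return Counter(bin_label(n, width) for n in lengths if n > 0)
```

Running `orf_size_distribution` separately on the annotated and unannotated contigs would give two histograms that can be compared class by class, which is the kind of comparison summarised in Fig. 6.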
For functional predictions, the non-coding sequences were searched for matches to
regulatory RNAs, such as miRNAs and siRNAs, from a pooled set of regulatory RNAs (Backofen
et al., 2010). Six non-coding sequences were found to be miRNAs whereas 99 matched
previously published siRNAs. A further search in the contigs for sequences matching non-coding
functional RNAs of the mouse genome indicates that the ratio of contigs with significant similarity
to the ncRNAs is similar for the annotated and unannotated sets: 18.9 and 23.4, respectively.
Considering that a large percentage of the unannotated contigs with ORFs do not have any
association with ncRNAs, they could be genuine P. thornei coding sequences with no homology to
previously identified genes.
4. Discussion
This study describes the first transcriptome of mixed stages of the migratory root lesion
nematode, P. thornei, sequenced with Roche 454 GS FLX technology. Following assembly using
MIRA and CAP3 assemblers to obtain 34,312 contigs with an N50 of 707 bases, 6,989 contigs
common to both assemblers were selected for further study. Using different annotation
procedures, a total of 43% of the contigs were annotated. These were matched to known protein
products, among which there were representative enzymes involved in all major metabolic
pathways essential for growth and development. Approximately 57% of the contigs did not match
sequences of any of the three groups or of any known organism and could be specific to P. thornei.
A comparison (BLASTX and tBLASTX) of the contigs with peptides of the free-living nematode
C. elegans and ESTs of Meloidogynidae and Heteroderidae reveals that 12% were common to all
three groups. These may represent genes involved in nematode metabolism, structure and
development. Of the 12% of the contigs which matched the plant parasitic groups but were not
present in C. elegans, 3.7% and 4.6% of the contigs only matched sequences of Meloidogynidae
and Heteroderidae, respectively, with 4.7% common to both. The 12% of contigs common only to
plant parasites will include genes specifically involved in plant parasitism.
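For readers unfamiliar with the N50 statistic quoted for the assembly above, a minimal Python sketch of the usual definition follows: the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function name and the example lengths are illustrative only and are not taken from the P. thornei assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty).

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the summed
    length of all contigs.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


# Illustrative lengths only.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 weights long contigs heavily, it is usually reported alongside the contig count, as in the text, rather than on its own.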
Our analysis revealed that the P. thornei transcriptome encodes messages similar to those
known to be involved in parasitism by sedentary endoparasitic nematodes. These include
orthologues to 14-3-3 and 14-3-3b proteins, calreticulin (Mi-crt-1), cysteine proteases (Mi-cpl-1),
Glutathione-S-transferase (Mi-gsts-1) and the Venom allergen-like protein (Mi-mps-1), all closely
related to those of M. incognita, and Annexin (Hs4F01), Transthyretin-like protein and Ubiquitin
extension protein (Hs-ubi-1) of H. glycines and H. schachtii, as well as five gene products
homologous to those of Globodera spp. The proposed function of parasitism genes in sedentary
endoparasitic nematodes includes suppression of host defense e.g. chorismate mutase (Lambert et
al., 1999; Bekal et al., 2003; Jones et al., 2003; Abad et al., 2008; Lu et al., 2008; Vanholme et al.,
2009) and induction and maintenance of host feeding cells e.g. SPRYSEC RANBP-like (Sacco et al.,
2009). The presence of genes found in both sedentary and migratory endoparasites is consistent
with a function in suppression of host defense. However, because root lesion nematodes do not
induce specific feeding cells, genes found in the RLNs that were previously assumed to be involved
in feeding cell induction may have a different function in the migratory endoparasites, or the role
proposed for them in sedentary endoparasites may need to be revised.
Thirty-one contigs were orthologous to transthyretin-like proteins and precursors of
H. glycines and R. similis. Although no conclusive data support the involvement of these proteins
in parasitism by plant parasitic nematodes, these nematode-specific families of proteins are
abundantly expressed both in juveniles and adults, the motile and parasitic stages of R. similis, but
not in embryos (Jacob et al., 2007). Coupled with the presence of a putative signal peptide for
secretion, their mature proteins could have extracellular functions and may well constitute part of
the nematode’s secretome, which includes proteins for modifying plant cell wall polysaccharides
to aid migration and feeding. In the recently published transcriptome of P. coffeae (Haegeman et
al., 2011), one contig matched chorismate mutase with a ‘bit score of only 54.3’. This gene is
present in sedentary endoparasites where it may influence differentiation of root cells by altering
the synthesis of chorismate-derived products, resulting in a down-regulation of plant defense
against invading nematodes (Lambert et al., 1999; Bekal et al., 2003; Jones et al., 2003; Abad et al.,
2008; Lu et al., 2008; Vanholme et al., 2009). Secreted chorismate mutases have also been
identified in plants, fungi and bacteria where they might serve as general tools for host
manipulation (Romero et al., 1995). Recently Djamei et al. (2011) reported that a chorismate
mutase, Cmu1, secreted by Ustilago maydis, a biotrophic fungal pathogen, acts as a virulence
factor in its interaction with host cells. They showed that during the host-pathogen interaction,
Cmu1 is taken up by plant cells, moves between cells and changes their metabolic status through
metabolic priming. However, with only 46.7% identity between the 12 P. thornei contigs identified
in this study and the chorismate mutases of G. rostochiensis (Gr-cm-1) and H. schachtii (Hs-cm-1),
it is still not clear whether or how migratory plant parasitic nematodes of the Pratylenchus spp.
make use of this gene in their interaction with plant hosts.
Feeding habits and life styles of both sedentary endoparasitic and migratory ectoparasitic
plant nematodes indicate that secretions from their gland cells include CAZymes, which can act
directly on cell wall polysaccharides of host plants (Davis et al., 2004; Vanholme et al., 2004; Baum
et al., 2007; Mitchum et al., 2007). CAZymes identified in the P. thornei transcriptome belonged to
all five families, including contigs matching 18 glycosyl hydrolase subfamilies, of which most appear
to be of bacterial and fungal origin, supporting the theory that most of these have probably been
acquired from bacteria by horizontal gene transfer (Davis et al., 2000; Jasmer et al., 2003; Baldwin
et al., 2004; Ledger et al., 2006). Cellulases, in the CAZy database (http://www.cazy.org/), can be
found in 13 GH families: GH5, 6, 7, 8, 9, 10, 12, 26, 44, 45, 51, 61 and 74, most of which are
typically produced by plant pathogens and plant parasites (Lynd et al., 2002). Of the GH family of
enzymes identified in the P. thornei transcriptome, four (GH5, 9, 10 and 12) can be classified as
cellulases. These included a GH5 endoglucanase (M. hapla), a xylanase (GH10) identical to that of
Neocallimastix patriciarum and a GH12 endoglucanase closely related to that of Sinorhizobium
meliloti, orthologues of which have been shown to be produced by R. similis, M. incognita,
P. coffeae and Xiphinema index (Jones et al., 2005; Mitreva-Dautova et al., 2006; Haegeman et al.,
2009, 2011). Less surprisingly, the P. thornei glycosyl hydrolases orthologous to those of
C. elegans are not cellulases. They are α-glucocerebrosidase (GH30), β-glucosidase (GH31),
α-amylase (GH13) and β-hexosaminidase (GH20), which are not known to have cell wall-degrading
properties. The only polysaccharide lyase in the transcriptome significantly matched PL3 of
M. incognita and Meloidogyne javanica, an enzyme which has been well characterised in cyst and
root knot nematodes (Bird and Koltai, 2000; Popeijus, H., 2002. The identification of cell-wall
degrading enzymes in Globodera rostochiensis. PhD thesis. Wageningen University and Research
Centre, The Netherlands), but was not known to be part of the arsenal of cell wall degrading
enzymes employed by nematodes of the family Pratylenchidae. The many other CAZymes identified
in the P. thornei transcriptome (Table 4) may have role(s) in nematode migration and
feeding, but this remains to be determined.
To predict new or putative secreted proteins encoded by P. thornei we identified contigs
with signal peptides and carbohydrate enzymatic activity or binding modules but with no
transmembrane helices/domain. The most interesting contigs with these characteristics, which
had no matches to any C. elegans protein, could be involved in P. thornei-host interaction. These
included a beta-1,4-endoglucanase that had significant identity to those of the sedentary
endoparasitic M. hapla and the sedentary semi-endoparasitic R. reniformis.
Beta-1,4-endoglucanases are cellulases generally known to be employed by sedentary endoparasitic
nematodes in degrading cellulose by hydrolysis. In addition, some contigs matched three other
putative secreted proteins: these were a 1,6-glucosidase, a pullulanase-type peptide with high
identity to that of Chloroflexus aggregans, a beta-mannase (Roseburia intestinalis), and a glycoside
hydrolase (GH45, B. malayi). These would need further investigation to confirm their possible
involvement in P. thornei parasitism. In this study, 57% of the contigs analysed did not match
proteins or hypothetical proteins in any database. Comparing these sequences with contigs of
the recently published P. coffeae transcriptome (where 75% were not annotated) did not change
the number of unannotated P. thornei contigs. Further analysis of these unannotated contigs
indicates that more than half did not have ORFs and are probably ncRNAs. However, two-thirds of
them exhibited the same distribution of ORF sizes as the annotated sequences. Furthermore, only
3.5% of these sequences matched publicly available miRNAs and siRNAs, suggesting that a
majority of these unidentified sequences could be genes unique to P. thornei. Further work is
needed to characterise them and to identify possible new functional sequences. Recent
transcriptome analyses have revealed extensive expression of ncRNAs (Mercer et al., 2009;
Backofen et al., 2010; Langenberger et al., 2010). A study by Kapranov et al. (2007) revealed that
transcription of non-coding sequences can be up to four times higher than that of coding
sequences. Although previously thought to be transcriptional ‘noise’ (Struhl, 2007), ncRNAs are
now known to have a broad range of functional roles in subcellular structural organization and
development (Amaral and Mattick, 2008). For example, because some nematode genes are
polycistronic, ncRNAs could have significant roles in regulation of expression via cis/trans splicing.
Comprehensive analysis of the transcriptome revealed high identity of some proteins to
those of the fully annotated C. elegans for which RNAi phenotypes indicate disruptions to
morphology and development at all stages of development. This analysis, together with reports of
successful in planta RNAi against cyst and root knot nematodes (Steeves et al., 2006; Yadav et al.,
2006), suggests that this approach should also be successful in conferring transgenic resistance to
root lesion nematodes.
In conclusion, the transcriptome for P. thornei has been sequenced and 6,989 unique
contigs common to two contig assemblers were annotated. Of these, 43% were identified and
provide new information for this species. The unannotated sequences include functional
sequences specific to this genus and require further characterisation. The availability of such data
will help to differentiate genes specific to sedentary and migratory endoparasites, and to
characterise those genes involved in migration and feeding. This information will also be valuable
for further development of diagnostic tests to identify specific RLN species and to identify new
gene targets for their control, for example via RNAi.
Acknowledgements
This work was supported by the Australian India Strategic Research Fund (BF030027),
Murdoch University, Australia (PhD studentship to Paul Nicol), iVEC, Australia through the use of
advanced computing resources located at Murdoch University and NemGenix Pty Ltd, Australia.
References
Abad, P., Gouzy, J., Aury, J.-M., Castagnone-Sereno, P., Danchin, E.G.J., Deleury, E.,
Perfus-Barbeoch, L., Anthouard, V., Artiguenave, F., Blok, V.C., Caillaud, M.-C., Coutinho, P.M.,
Dasilva, C., De Luca, F., Deau, F., Esquibet, M., Flutre, T., Goldstone, J.V., Hamamouch, N.,
Hewezi, T., Jaillon, O., Jubin, C., Leonetti, P., Magliano, M., Maier, T.R., Markov, G.V.,
McVeigh, P., Pesole, G., Poulain, J., Robinson-Rechavi, M., Sallet, E., Segurens, B.,
Steinbach, D., Tytgat, T., Ugarte, E., van Ghelder, C., Veronico, P., Baum, T.J., Blaxter, M.,
Bleve-Zacheo, T., Davis, E.L., Ewbank, J.J., Favery, B., Grenier, E., Henrissat, B., Jones, J.T.,
Laudet, V., Maule, A.G., Quesneville, H., Rosso, M.-N., Schiex, T., Smant, G., Weissenbach,
J., Wincker, P., 2008. Genome sequence of the metazoan plant-parasitic nematode
Meloidogyne incognita. Nat. Biotech. 26, 909-915.
Amaral, P.P., Mattick, J.S., 2008. Noncoding RNA in development. Mamm. Genome 19, 454-492.
Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K.,
Dwight, S.S., Eppig, J.T., Harris, A.M., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S.,
Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G., 2000. Gene
Ontology: tool for the unification of biology. Nat. Genet. 25, 25-29.
Backofen, R., Chitsaz, H., Hofacker, I., Sahinalp, S., Stadler, P., 2010. Computational studies of
non-coding RNAs. Pac. Symp. Biocomput. 15, 54-56.
Bakhetia, M., Urwin, P.E., Atkinson, H.J., 2007. qPCR analysis and RNAi define pharyngeal gland
cell-expressed genes of Heterodera glycines required for initial interactions with the host.
Mol. Plant-Microbe Interact. 20, 306-312.
Baldwin, J.G., Nadler, S.A., Adams, B.J., 2004. Evolution of plant parasitism among nematodes.
Ann. Rev. Phytopathol. 42, 83-105.
Barrett, J., Wright, D.J., 1998. Intermediary metabolism. In: Perry, R.N., Wright, D.J. (Ed.), The
physiology and biochemistry of free-living and plant-parasitic nematodes. CABI Publishing,
Oxford, pp 331-353.
Baum, T.J., Hussey, R.S., Davis, E.L., 2007. Root-knot and cyst nematode parasitism genes: the
molecular basis of plant parasitism. In: Setlow, J.K. (Ed.), Genetic Engineering, Landscape
Series. Springer, New York USA, 28, pp. 17-43.
Bekal, S., Niblack, T.L., Lambert, K.N., 2003. A chorismate mutase from the soybean cyst nematode
Heterodera glycines shows polymorphisms that correlate with virulence. Mol.
Plant-Microbe Interact. 16, 439-446.
Bird, D.M.K., Koltai, H., 2000. Plant parasitic nematodes: habitats, hormones, and
horizontally-acquired genes. J. Plant Growth Regul. 19, 183-194.
Blanchard, A., Esquibet, M., Fouville, D., Grenier, E., 2005. Ranbpm homologue genes
characterised in the cyst nematodes Globodera pallida and Globodera 'mexicana'. Physiol.
Mol. Plant Pathol. 67, 15-22.
Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., Madden, T.L., 2009.
BLAST+: architecture and applications. BMC Bioinf. 10, 421.
Castillo, P. and Vovlas, N., 2008. Pratylenchus (Nematoda: Pratylenchidae): Diagnosis, Biology,
Pathogenicity and Management. Nematology Monographs and Perspectives, Vol. 6. Hunt,
D.J., Perry, R.N., series eds. Brill, Leiden, The Netherlands.
Chalk, A.M., Warfvinge, R., Georgii-Hemming, P., Sonnhammer, E.L.L., 2005. The siRNA database.
Nucl. Acids Res. 33, D131-134.
Chevreux, B., Pfisterer, T., Drescher, B., Driesel, A.J., Muller, W.E., Wetter, T., Suhai, S., 2004. Using
the miraEST assembler for reliable and automated mRNA transcript assembly and SNP
detection in sequenced ESTs. Genome Res. 14, 1147-1159.
Chitwood, D.J., 2003. Research on plant-parasitic nematode biology conducted by the United
States Department of Agriculture - Agricultural Research Service. Pest Manag. Sci. 59,
748-753.
Cock, P.J., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A., Friedberg, I., Hamelryck, T.,
Kauff, F., Wilczynski, B., de Hoon, M.J., 2009. Biopython: freely available Python tools for
computational molecular biology and bioinformatics. Bioinformatics 25, 1422-1423.
Davis, E.L., Hussey, R.S., Baum, T.J., Bakker, J., Schots, A., Rosso, M.N., Abad, P., 2000. Nematode
parasitism genes. Annu. Rev. Phytopathol. 38, 365-396.
Davis, E.L., Hussey, R.S., Baum, T.J., 2004. Getting to the roots of parasitism by nematodes. Trends
Parasitol. 20, 134-141.
Dieterich, C., Clifton, S.W., Schuster, L.N., Chinwalla, A., Delehaunty, K., Dinkelacker, I., Fulton, L.,
Fulton, R., Godfrey, J., Minx, P., 2008. The Pristionchus pacificus genome provides a unique
perspective on nematode lifestyle and parasitism. Nat. Genet. 40, 1193-1198.
Ding, X., Shields, J., Allen, R., Hussey, R., 2000. Molecular cloning and characterisation of a venom
allergen AG5-like cDNA from Meloidogyne incognita. Int. J. Parasitol. 30, 77-81.
Djamei, A., Schipper, K., Rabe, F., Ghosh, A., Vincon, V., Kahnt, J., Osorio, S., Tohge, T., Fernie, A.R.,
Feussner, I., Feussner, K., Meinicke, P., Stierhof, Y.D., Schwarz, H., Macek, B., Mann, M.,
Kahmann, R., 2011. Metabolic priming by a secreted fungal effector. Nature 478, 395–398.
Doyle, E.A., Lambert, K.N., 2003. Meloidogyne javanica chorismate mutase 1 alters plant cell
development. Mol. Plant-Microbe Interact. 16, 123-131.
Entchev, E.V., Kurzchalia, T., 2005. Requirement of sterols in the life cycle of the nematode
Caenorhabditis elegans. Semin. Cell Dev. Biol. 16, 175-182.
Fire, A., Xu, S., Montgomery, M.K., Kostas, S.A., Driver, S.E., Mello, C.C., 1998. Potent and
specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature
391, 806-811.
Gao, B., Allen, R., Davis, E.L., Baum, T.J., Hussey, R.S., 2004. Molecular characterisation and
developmental expression of a cellulose-binding protein gene in the soybean cyst
nematode Heterodera glycines. Int. J. Parasitol. 34, 1377-1383.
Gao, B., Allen, R., Maier, T., Davis, E.L., Baum, T.J., Hussey, R.S., 2001. Molecular characterisation
and expression of two venom allergen-like protein genes in Heterodera glycines. Int. J.
Parasitol. 31, 1617-1625.
Gao, B., Allen, R., Maier, T., Davis, E., Baum, T., Hussey, R., 2002. Identification of a New β-1,
4-endoglucanase Gene Expressed in the Esophageal Subventral Gland Cells of Heterodera
glycines. J. Nematol. 34, 12.
Gao, B., Allen, R., Maier, T., Davis, E.L., Baum, T.J., Hussey, R.S., 2003. The Parasitome of the
Phytonematode Heterodera glycines. Mol. Plant-Microbe Interact. 16, 720-726.
Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A., Enright, A.J., 2006. miRBase:
microRNA sequences, targets and gene nomenclature. Nucl. Acids Res. 34, 140-144.
Haegeman, A., Joseph, S., Gheysen, G., 2011. Analysis of the transcriptome of the root lesion
nematode Pratylenchus coffeae generated by 454 sequencing technology. Mol. Biochem.
Parasitol. 178, 7-14.
Haegeman, A., Vanholme, B., Gheysen, G., 2009. Characterization of a putative endoxylanase in
the migratory plant-parasitic nematode Radopholus similis. Mol. Plant Pathol. 10,
389-401.
Huang, X., Madan, A., 1999. CAP3: A DNA sequence assembly program. Genome Res. 9, 868-887.
Hunter, S., Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Binns, D., Bork, P., Das, U.,
Daugherty, L., Duquenne, L., Finn, R.D., Gough, J., Haft, D., Hulo, N., Kahn, D., Kelly, E.,
Laugraud, A., Letunic, I., Lonsdale, D., Lopez, R., Madera, M., Maslen, J., McAnulla, C.,
McDowall, J., Mistry, J., Mitchell, A., Mulder, N., Natale, D., Orengo, C., Quinn, A.F.,
Selengut, J.D., Sigrist, C.J.A., Thimma, M., Thomas, P.D., Valentin, F., Wilson, D., Wu, C.H.,
Yeats, C., 2009. InterPro: the integrative protein signature database. Nucl. Acids Res. 37,
D211-215.
Husson, S.J., Mertens, I., Janssen, T., Lindemans, M., Schoofs, L., 2007. Neuropeptidergic signaling
in the nematode Caenorhabditis elegans. Prog. Neurobiol. 82, 33-55.
Jacob, J., Vanholme, B., Haegeman, A., Gheysen, G., 2007. Four transthyretin-like genes of the
migratory plant-parasitic nematode Radopholus similis: members of an extensive
nematode-specific family. Gene 402, 9-19.
Jasmer, D.P., Goverse, A., Smant, G., 2003. Parasitic nematode interactions with mammals and
plants. Ann. Rev. Phytopathol. 41, 245-270.
Jaubert, S., Laffaire, J.B., Ledger, T.N., Escoubas, P., Amri, E.Z., Abad, P., Rosso, M.N., 2004.
Comparative analysis of two 14-3-3 homologues and their expression pattern in the
root-knot nematode Meloidogyne incognita. Int. J. Parasitol. 34, 873-880.
Jaubert, S., Laffaire, J.B., Piotte, C., Abad, P., Rosso, M.N., 2002. Direct identification of stylet
secreted proteins from root-knot nematodes by a proteomic approach. Mol. Biochem.
Parasitol. 121, 205-211.
Jaubert, S., Milac, A.L., Petrescu, A.J., de Almeida-Engler, J., Abad, P., Rosso, M.N., 2005. In planta
secretion of a calreticulin by migratory and sedentary stages of root-knot nematode. Mol.
Plant-Microbe Interact. 18, 1277-1284.
Jones, J.T., Furlanetto, C., Bakker, E., Banks, B., Blok, V., Chen, Q., Phillips, M., Prior, A., 2003.
Characterization of a chorismate mutase from the potato cyst nematode Globodera
pallida. Mol. Plant Pathol. 4, 43-50.
Jones, J.T., Furlanetto, C., Kikuchi, T., 2005. Horizontal gene transfer from bacteria and fungi as a
driving force in the evolution of plant parasitism in nematodes. Nematology 7, 641-646.
Jones, J.T., Reavy, B., Smant, G., Prior, A.E., 2004. Glutathione peroxidases of the potato cyst
nematode Globodera rostochiensis. Gene 324, 47-54.
Jones, M.G.K., Rana, J., Fosu-Nyarko, J., Zhang, J.J., Perera, M., Chamberlain, D., 2009. Towards
developing transgenic resistance to nematodes in wheat. In: Riley, I.T., Nicol, J.M.,
Dababat, A.A. (Eds.), Cereal cyst nematodes: status, research and outlook. Proceedings of the
First Workshop of the International Cereal Cyst Nematode Initiative, Oct 2009, Antalya, Turkey.
CIMMYT, pp. 191-194.
Kikuchi, T., Shibuya, H., Aikawa, T., Jones, J.T., 2006. Cloning and
characterization of pectate lyases expressed in the esophageal gland of the pine wood
nematode Bursaphelenchus xylophilus. Mol. Plant-Microbe Interact. 19, 280-287.
Kanehisa, M., Goto, S., Furumichi, M., Tanabe, M., Hirakawa, M., 2010. KEGG for representation
and analysis of molecular networks involving diseases and drugs. Nucl. Acids Res. 38,
D355-360.
Kapranov, P., Cheng, J., Dike, S., Nix, D.A., Duttagupta, R., Willingham, A.T., Stadler, P.F., Hertel, J.,
Hackermüller, J., Hofacker, I.L., 2007. RNA maps reveal new RNA classes and a possible
function for pervasive transcription. Science 316, 1484.
Karim, N., Jones, J., Okada, H., Kikuchi, T., 2009. Analysis of expressed sequence tags and
identification of genes encoding cell-wall-degrading enzymes from the fungivorous
nematode Aphelenchus avenae. BMC Genomics 10, 525.
Kenyon, C., Chang, J., Gensch, E., Rudner, A., Ruvkun, G., Tabtiang, R., 1993. A C. elegans mutant
that lives twice as long as wild type. Nature 366, 461-464.
Kumar, S., Blaxter, M., 2010. Comparing de novo assemblers for 454 transcriptome data. BMC
Genomics 11, 571.
Lambert, K.N., Allen, K.D., Sussex, I.M., 1999. Cloning and characterization of an
esophageal-gland-specific chorismate mutase from the phytoparasitic nematode Meloidogyne
javanica. Mol. Plant-Microbe Interact. 12, 328-336.
Langenberger, D., Bermudez-Santana, C., Stadler, P., Hoffmann, S., 2010. Identification and
classification of small RNAs in transcriptome sequence data. Pac. Symp. Biocomput. 15,
80–87.
Ledger, T.N., Jaubert, S., Bosselut, N., Abad, P., Rosso, M.N., 2006. Characterization of a new
[beta]-1, 4-endoglucanase gene from the root-knot nematode Meloidogyne incognita and
evolutionary scheme for phytonematode family 5 glycosyl hydrolases. Gene 382, 121-128.
Li, C., 2005. The ever-expanding neuropeptide gene families in the nematode Caenorhabditis
elegans. Parasitology 131, 109-127.
Li, W., Godzik, A., 2006. Cd-hit: a fast program for clustering and comparing large sets of
protein or nucleotide sequences. Bioinformatics 22, 1658-1659.
Li, C., Nelson, L.S., Kim, K., Nathoo, A., Hart, A.C., 1999. Neuropeptide gene families in the 724
nematode Caenorhabditis elegans. Ann. NY Acad. Sci. 897, 239-252. 725
Li, W., Kennedy, S.G., Ruvkun, G., 2003. daf-28 encodes a C. elegans insulin superfamily member 726
that is regulated by environmental cues and acts in the DAF-2 signaling pathway. Genes 727
Dev. 17, 844-858. 728
Lu, S.W., Tian, D., Borchardt-Wier, H.B., Wang, X., 2008. Alternative splicing: A novel mechanism of
regulation identified in the chorismate mutase gene of the potato cyst nematode
Globodera rostochiensis. Mol. Biochem. Parasitol. 162, 1-15.
Lynd, L.R., Weimer, P.J., van Zyl, W.H., Pretorius, I.S., 2002. Microbial Cellulose Utilization:
Fundamentals and Biotechnology. Microbiol. Mol. Biol. Rev. 66, 506-577.
Malone, E.A., Thomas, J.H., 1994. A screen for nonconditional dauer-constitutive mutations in
Caenorhabditis elegans. Genetics 136, 879-886.
McCulloch, D., Gems, D., 2003. Body size, insulin/IGF signaling and aging in the nematode
Caenorhabditis elegans. Exp. Gerontol. 38, 129-136.
McVeigh, P., Leech, S., Mair, J.R., Marks, N.J., Geary, T.J., Maule, A.G., 2005. Analysis of
FMRFamide-like peptide (FLP) diversity in phylum Nematoda. Int. J. Parasitol. 35, 1043-1060.
Mercer, T.R., Dinger, M.E., Mattick, J.S., 2009. Long non-coding RNAs: insights into functions. Nat.
Rev. Genet. 10, 155-159.
Mitchum, M., Hussey, R., Davis, E., Baum, T., Punja, Z., Boer, S., Sanfaçon, H., 2007. Application of
biotechnology to understand pathogenesis in nematode plant pathogens. In: Punja, Z.K.,
de Boer, S.H., Sanfaçon, H. (Eds.), Biotechnology and Plant Disease Management. CABI
International, Wallingford, UK, pp. 58-86.
Mitreva, M., Elling, A., Dante, M., Kloek, A., Kalyanaraman, A., Aluru, S., Clifton, S., Bird, D.M.K.,
Baum, T., McCarter, J., 2004. A survey of SL1-spliced transcripts from the root-lesion
nematode Pratylenchus penetrans. Mol. Genet. Genomics 272, 138-148.
Mitreva-Dautova, M., Roze, E., Overmars, H., de Graaff, L., Schots, A., Helder, J., Goverse, A.,
Bakker, J., Smant, G., 2006. A symbiont-independent endo-1,4-β-xylanase from the
plant-parasitic nematode Meloidogyne incognita. Mol. Plant-Microbe Interact. 19, 521-529.
Morozova, O., Hirst, M., Marra, M.A., 2009. Applications of new sequencing technologies for
transcriptome analysis. Annu. Rev. Genomics Hum. Genet. 10, 135-151.
Nathoo, A.N., Moeller, R.A., Westlund, B.A., Hart, A.C., 2001. Identification of neuropeptide-like
protein gene families in Caenorhabditis elegans and other species. Proc. Natl. Acad. Sci.
USA 98, 14000-14005.
Opperman, C.H., Bird, D.M., Williamson, V.M., Rokhsar, D.S., Burke, M., Cohn, J., Cromer, J.,
Diener, S., Gajan, J., Graham, S., Houfek, T.D., Liu, Q., Mitros, T., Schaff, J., Schaffer, R.,
Scholl, E., Sosinski, B.R., Thomas, V.P., Windham, E., 2008. Sequence and genetic map of
Meloidogyne hapla: A compact nematode genome for plant parasitism. Proc. Natl. Acad.
Sci. USA 105, 14802-14807.
Petersen, T.N., Brunak, S., von Heijne, G., Nielsen, H., 2011. SignalP 4.0: discriminating signal
peptides from transmembrane regions. Nature Meth. 8, 785-786.
Pierce, S.B., Costa, M., Wisotzkey, R., Devadhar, S., Homburger, S.A., Buchman, A.R., Ferguson,
K.C., Heller, J., Platt, D.M., Pasquinelli, A.A., Liu, L.X., Doberstein, S.K., Ruvkun, G., 2001.
Regulation of DAF-2 receptor signaling by human insulin and ins-1, a member of the
unusually large and diverse C. elegans insulin gene family. Genes Dev. 15, 672-686.
Prior, A., Jones, J.T., Blok, V.C., Beauchamp, J., McDermott, L., Cooper, A., Kennedy, M.W., 2001. A
surface-associated retinol- and fatty acid-binding protein (Gp-FAR-1) from the potato cyst
nematode Globodera pallida: lipid binding activities, structural analysis and expression
pattern. Biochem. J. 356, 387.
Romero, R., Roberts, M., Phillipson, J., 1995. Chorismate mutase in microorganisms and plants.
Phytochemistry 40, 1015-1025.
Sacco, M.A., Koropacka, K., Grenier, E., Jaubert, M.J., Blanchard, A., Goverse, A., Smant, G.,
Moffett, P., 2009. The cyst nematode SPRYSEC protein RBP-1 elicits Gpa2- and
RanGAP2-dependent plant cell death. PLoS Pathog. 5, e1000564.
Shingles, J., Lilley, C., Atkinson, H., Urwin, P., 2007. Meloidogyne incognita: molecular and
biochemical characterisation of a cathepsin L cysteine proteinase and the effect on
parasitism following RNAi. Exp. Parasitol. 115, 114-120.
Sonnhammer, E., Von Heijne, G., Krogh, A., 1998. A hidden Markov model for predicting
transmembrane helices in protein sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 6, 175-182.
Steeves, R.M., Todd, T.C., Essig, J.S., Trick, H.N., 2006. Transgenic soybeans expressing siRNAs
specific to a major sperm protein gene suppress Heterodera glycines reproduction. Funct.
Plant Biol. 33, 991-999.
Stein, L.D., Bao, Z., Blasiar, D., Blumenthal, T., Brent, M.R., Chen, N., Chinwalla, A., Clarke, L., Clee,
C., Coghlan, A., Coulson, A., D'Eustachio, P., Fitch, D.H.A., Fulton, L.A., Fulton, R.E.,
Griffiths-Jones, S., Harris, T.W., Hillier, L.W., Kamath, R., Kuwabara, P.E., Mardis, E.R., Marra, M.A.,
Miner, T.L., Minx, P., Mullikin, J.C., Plumb, R.W., Rogers, J., Schein, J.E., Sohrmann, M.,
Spieth, J., Stajich, J.E., Wei, C., Willey, D., Wilson, R.K., Durbin, R., Waterston, R.H., 2003.
The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics.
PLoS Biol. 1(2), e45.
Struhl, K., 2007. Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nat.
Struct. Mol. Biol. 14, 103-105.
Svoboda, J.A., Chitwood, D.J., 1992. Inhibition of sterol metabolism in insects and nematodes,
regulation of isopentenoid metabolism. In: Nes, W.D., Parish, E.J., Trzaskos, J.M. (Eds.),
Regulation of Isopentenoid Metabolism. ACS Symposium Series Vol. 497, American
Chemical Society Press, Washington, pp. 205-218.
The C. elegans Sequencing Consortium, 1998. Genome sequence of the nematode C. elegans: a
platform for investigating biology. Science 282, 2012-2018.
Tytgat, T., Vanholme, B., De Meutter, J., Claeys, M., Couvreur, M., Vanhoutte, I., Gheysen, G., Van
Criekinge, W., Borgonie, G., Coomans, A., 2004. A new class of ubiquitin extension proteins
secreted by the dorsal pharyngeal gland in plant parasitic cyst nematodes. Mol.
Plant-Microbe Interact. 17, 846-852.
Vanholme, B., Kast, P., Haegeman, A., Jacob, J., Grunewald, W.I.M., Gheysen, G., 2009. Structural
and functional investigation of a secreted chorismate mutase from the plant-parasitic
nematode Heterodera schachtii in the context of related enzymes from diverse origins.
Mol. Plant Pathol. 10, 189-200.
Vanholme, B., De Meutter, J., Tytgat, T., Van Montagu, M., Coomans, A., Gheysen, G., 2004.
Secretions of plant-parasitic nematodes: a molecular update. Gene 332, 13-27.
Vanstone, V., 2007. Root Lesion and Burrowing Nematodes in Western Australian cropping
systems. Bulletin 4698, Department of Agriculture and Food Western Australia, Perth,
Australia.
Yadav, B.C., Veluthambi, K., Subramaniam, K., 2006. Host-generated double stranded RNA
induces RNAi in plant-parasitic nematodes and protects the host from infection. Mol.
Biochem. Parasitol. 148, 219-222.
Legends to Figures
Fig. 1. Overview of the steps used to assemble and annotate the Pratylenchus thornei
transcriptome.
Fig. 2. The percentage each read group contributed to the total sequenced length.
Fig. 3. The distribution of two sets of Pratylenchus thornei contigs in different size ranges,
assembled with CAP3 and MIRA separately, compared with the combined data generated with
CAP3.
Fig. 4. Functional classification of the Pratylenchus thornei contigs using Gene Ontology (GO)
terms.
Fig. 5. Comparison of Pratylenchus thornei contigs with expressed sequence tags (ESTs) of
Pratylenchidae, Heteroderidae, Meloidogynidae and Caenorhabditis elegans. Numbers in
parentheses indicate total numbers of contigs.
Fig. 6. Frequency of open reading frames for annotated and unannotated contigs.

Supplementary Figure legend

Supplementary Fig. S1. Distribution of reads corresponding to different size ranges of the Pratylenchus
thornei transcriptome.
Table 1. Number of reads of Pratylenchus thornei transcriptome and contigs after assembly with
CAP3 and MIRA.
Dataset              Number of reads/contigs  Total length (bases)  Average length (bases)  N50                Size range (bases)
Number of Reads
  Lane 1 (L1)        390,417                  78,549,690            201.2                   268 at 39,274,947  40-656
  Lane 2 (L2)        396,858                  77,704,413            195.8                   264 at 38,852,391  40-667
Assembly of Contigs
  CAP3 L1            32,897                   14,467,885            439.8                   495                42-3376
  MIRA L1            43,241                   19,441,646            449.6                   513                41-4602
  CAP3 L2            33,150                   14,348,384            432.8                   491                42-3739
  MIRA L2            43,569                   19,205,118            440.8                   507                41-3928
  Combined           34,312                   20,849,165            607.7                   707                45-6602
  Final              6,989                    4,461,906             638.4                   740                51-4101
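The N50 and average-length rows of Table 1 can be derived from sequence lengths alone. The following minimal Python sketch shows both calculations. It assumes the commonly used definition of N50 (sort the lengths from longest to shortest and report the length at which the running total first reaches half of all bases); the table itself does not state which convention was applied. The function names and the toy contig lengths are illustrative assumptions and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed convention: sort longest-first and return the length at which
    the cumulative sum first reaches half of the total number of bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_length(total_bases, n_sequences):
    """Average sequence length: total bases divided by sequence count."""
    return total_bases / n_sequences


# Hypothetical contig lengths, used only to illustrate the calculation.
toy_contigs = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_contigs)  # 500 + 400 = 900 >= 750, so the N50 is 400

# Lane 1 totals as reported in Table 1: 78,549,690 bases in 390,417 reads.
lane1_mean = round(mean_length(78_549_690, 390_417), 1)  # 201.2
```

Applied to the Lane 1 totals, the average-length calculation reproduces the 201.2 bases reported in the table; the N50 values cannot be recomputed here because the individual read and contig lengths are not listed.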
Table 2. Assignment of KEGG Enzyme Commission (EC) numbers to Pratylenchus thornei contigs.
KEGG categories represented [a]                     Contigs  Enzymes [b]
1 Carbohydrate metabolism
  1.1 Glycolysis / Gluconeogenesis                  8        6
  1.2 Citrate cycle (TCA cycle)                     11       9
  1.3 Pentose phosphate pathway                     4        2
  1.4 Pentose and glucuronate interconversions      3        2
  1.5 Fructose and mannose metabolism               3        3
  1.6 Galactose metabolism                          1        1
  1.7 Ascorbate and aldarate metabolism             4        2
  1.8 Starch and sucrose metabolism                 12       5
  1.9 Amino and nucleotide sugar metabolism         4        3
  1.10 Pyruvate metabolism                          6        5
  1.11 Glyoxylate/dicarboxylate metabolism          2        2
  1.12 Propanoate metabolism                        6        5
  1.13 Butanoate metabolism                         9        6
  1.14 C5-Branched dibasic acid metabolism          1        1
  1.15 Inositol phosphate metabolism                6        4
2 Energy metabolism
  2.1 Oxidative phosphorylation                     28       4
  2.2 Photosynthesis                                12       1
  2.3 Carbon fixation                               1        1
  2.4 Reductive carboxylate cycle                   6        5
  2.5 Methane metabolism                            14       3
  2.6 Nitrogen metabolism                           11       6
  2.7 Sulphur metabolism                            2        1
3 Lipid metabolism
  3.1 Fatty acid biosynthesis                       1        1
  3.2 Fatty acid elongation in mitochondria         1        1
  3.3 Fatty acid metabolism                         6        4
  3.4 Synthesis/degradation of ketone bodies        1        1
  3.5 Steroid hormone biosynthesis                  2        1
  3.6 Glycerolipid metabolism                       3        2
  3.7 Glycerophospholipid metabolism                2        2
  3.8 Sphingolipid metabolism                       3        1
  3.9 alpha-Linolenic acid metabolism               1        1
  3.10 Biosynthesis of unsaturated fatty acids      2        2
4 Nucleotide metabolism
  4.1 Purine metabolism                             31       12
  4.2 Pyrimidine metabolism                         16       6
5 Amino acid metabolism
  5.1 Alanine/aspartate/glutamate metabolism        14       7
  5.2 Glycine/serine/threonine metabolism           1        1
  5.4 Valine/leucine/isoleucine degradation         6        4
  5.5 Valine, leucine and isoleucine biosynthesis   3        3
  5.7 Lysine degradation                            15       6
  5.8 Arginine and proline metabolism               8        5
  5.9 Histidine metabolism                          2        1
  5.10 Tyrosine metabolism                          1        1
  5.11 Phenylalanine metabolism                     1        1
  5.12 Tryptophan metabolism                        5        4
6 Metabolism of other amino acids
  6.1 beta-Alanine metabolism                       7        3
  6.2 Taurine/hypotaurine metabolism                4        1
  6.3 Phosphonate and phosphinate metabolism        1        1
  6.4 Selenoamino acid metabolism                   3        2
  6.6 D-Glutamine and D-glutamate metabolism        2        1
  6.9 Glutathione metabolism                        6        4
7 Glycan Biosynthesis and metabolism
  7.1 N-Glycan biosynthesis - 11                    3        1
  7.2 Lipo-polysaccharide biosynthesis              1        1
8 Metabolism of cofactors and vitamins
  8.1 Folate biosynthesis                           1        1
  8.2 Retinol metabolism                            2        1
  8.3 Porphyrin metabolism                          3        2
9 Metabolism of terpenoids and polyketides
  9.1 Terpenoid backbone biosynthesis               2        2
  9.2 Limonene and pinene degradation               3        2
10 Biosynthesis of other secondary metabolites
  10.1 Phenylpropanoid biosynthesis                 1        1
  10.2 Streptomycin biosynthesis                    1        1
11 Xenobiotics biodegradation/metabolism
  11.1 Caprolactam degradation                      2        2
  11.2 3-Chloroacrylic acid degradation             2        1
  11.3 1,2-Dichloroethane degradation               2        1
  11.4 Benzoate degradation via CoA ligation        3        3
  11.5 Trinitrotoluene degradation                  1        1
  11.6 Metabolism by cytochrome P450                2        1
  11.7 Drug metabolism - cytochrome P450            2        1
  11.8 Drug metabolism - other enzymes              2        1
[a] See KEGG database (http://www.genome.jp/kegg/) for categories.
[b] The number of KEGG enzymes that match P. thornei contigs.
Table 3. Pratylenchus thornei orthologues to published nematode parasitism genes.
Accession No       Species                  Gene product and symbol                      No of contigs  Best E-value  Best % identity  Alignment length (bp)  % Coverage  Reference
AAL40719/AAR85527  Meloidogyne incognita    14-3-3/14-3-3b product                       13             5E-131        96               261                    99          Jaubert et al., 2002, 2004
AAN32888           Heterodera glycines      Annexin 4C10                                 2              3E-26         64               50                     84          Gao et al., 2003
AAN32888           Heterodera schachtii     Annexin - Hs4F01                             1              5E-17         38               129                    58          Patel et al., 2010
AAL40720           M. incognita             Calreticulin - Mi-crt-1                      8              3E-67         85               189                    94          Jaubert et al., 2005
CAD89795           M. incognita             Cysteine protease - Mi-cpl-1                 13             8E-64         84               82                     93          Shingles et al., 2007
CAA70477           Globodera pallida        SEC-2/Fatty acid retinoid binding, Gp-far-1  7              3E-60         78               111                    87          Prior et al., 2001
ABN64198           M. incognita             Glutathione-S-transferase - Mi-gsts-1        2              5E-37         62               108                    76          Dubreuil et al., 2007
CAB48391           Globodera rostochiensis  Peroxiredoxin - Gr-TpX                       1              7E-41         82               91                     91          Robertson et al., 2000
CAC21848           G. rostochiensis         RAN-BP-like - Gr-A18                         2              2E-07         41               27                     56          Vanholme et al., 2004
AAV34698           G. pallida               RAN-BP-like - Gp-rbp-1                       4              2E-07         48               27                     63          Blanchard et al., 2005
CAD38523           G. rostochiensis         Glutathione peroxidase                       4              2E-110        81               228                    91          Jones et al., 2004
CAM84510           Radopholus similis       Transthyretin-like protein                   29             1E-42         80               55                     87          Jacob et al., 2007
                                            (Rs-ttl-1, Rs-ttl-2, Rs-ttl-3, Rs-ttl-4)
CB374976           H. glycines              Transthyretin-like protein precursor         2              1E-52         64               120                    80          Unpublished
AAP30081           H. schachtii             Ubiquitin extension protein - Hs-ubi-1       10             4E-37         96               75                     98          Tytgat et al., 2004
AAK60209/AAK55116  H. glycines              Venom allergen-like protein - Vap-1          1              3E-19         61               68                     79          Gao et al., 2001
AAD01511           M. incognita             Venom allergen-like protein - Mi-mps-1       1              E-06          39               28                     61          Ding et al., 2000
Table 4. Carbohydrate active enzymes identified in Pratylenchus thornei, including candidates with signal peptides for secretion.
CAZy family   P. thornei contigs  E-value range         Species
Glycoside Hydrolases
  GH4         1                   E-04                  uncultured marine bacterium
  GH5 [a]     2                   E-05 - E-06           Meloidogyne hapla, Rotylenchulus reniformis
  GH9         1                   E-04                  Herpetosiphon aurantiacus
  GH10        3                   E-04 - 5.77E-05       Neocallimastix patriciarum
  GH12        1                   6.28E-06              Sinorhizobium meliloti
  GH13 [a]    7                   E-04 - 1.59E-18       Drosophila melanogaster, Chloroflexus aggregans, Caenorhabditis elegans
  GH18 [a]    11                  4.45E-05 - 1.87E-37   Leptosphaeria maculans
  GH19        2                   E-04                  Saussurea involucrata
  GH20        3                   E-04 - 7.5E-27        Spodoptera frugiperda, C. elegans, Catenibacterium mitsuokai
  GH23        3                   E-04 - 5.77E-05       Staphylococcus aureus, Clostridium ljungdahlii
  GH26 [a]    1                   2.5E-05               Roseburia intestinalis
  GH30-31     2                   1.38E-06 - 9.77E-12   C. elegans
  GH33        5                   E-04 - 4.56E-14       Trypanosoma cruzi
  GH38        2                   9.96E-10 - 2.92E-11   Magnaporthe grisea
  GH43 [a]    1                   2.6E-05               M. grisea
  GH45 [a]    1                   E-04                  Brugia malayi
  GH47 [a]    3                   8.58E-06 - 8.93E-07   Schistosoma mansoni, M. grisea
  GH63        2                   2.78E-16 - 3.64E-17   Danio rerio
Glycosyl Transferases
  GT2 [a]     1                   E-06                  Rhodobacter capsulatus
  GT3         1                   8.44E-06              Steinernema feltiae
  GT4 [a]     8                   E-04 - 2.70E-37       Methylobacterium spp., Caulobacter segnis, Rhizomucor pusillus, Trichodesmium erythraeum, Nostoc punctiforme, Rhodobacter capsulatus, C. elegans
  GT7         5                   2.76E-06 - 1.43E-39   S. mansoni, C. elegans
  GT27        2                   4.70E-21 - 1.21E-47   C. elegans
  GT34        1                   E-04                  Scheffersomyces stipitis
  GT35        2                   9.7E-63 - 1.63E-79    C. elegans
  GT39        3                   E-04 - 2.42E-13       Neurospora crassa
  GT41        1                   6.40E-05              Sideroxydans lithotrophicus
  GT49        1                   4.55E-10              Dictyostelium discoideum
  GT51        10                  E-04 - 8.82E-06       Bacillus spp., Eubacterium spp., Roseburia intestinalis
  GT55 [a]    4                   E-04 - 6.03E-08       Magnaporthe grisea
  GT57        1                   E-04                  Leptosphaeria maculans
  GT64        1                   1.26E-26              Danio rerio
  GT66 [a]    2                   1.8E-20 - 2.32E-63    C. elegans, Neurospora crassa
  GT67        1                   7.45E-07              Leishmania braziliensis
  GT91        2                   E-04                  Candida spp.
Carbohydrate Esterases
  CE4         1                   E-04                  Tribolium castaneum
  CE10        2                   4.21E-13 - 8.63E-21   Mus musculus, Homo sapiens
  CE11 [a]    5                   E-04 - 1.40E-17       Ostreococcus tauri
Polysaccharide Lyases
  PL3         2                   5E-04 - 1.2E-37       Meloidogyne incognita, M. javanica
Carbohydrate Binding Modules
  CBM2        5                   6.38E-05 - 3.49E-08   Amycolatopsis mediterranei
  CBM6        1                   2.21E-05              Salinispora tropica
  CBM13       5                   E-04 - 4.97E-08       Frankia alni
  CBM14 [a]   5                   E-04 - 1.98E-13       C. elegans, Drosophila melanogaster
  CBM20       2                   9.00E-05 - 7.14E-07   Micromonas sp.
  CBM33 [a]   10                  9.60E-5 - 9.28E-31    Spodoptera litura
  CBM48 [a]   3                   2.14E-07 - 1.71E-08   Chloroflexus aggregans, C. elegans
  CBM51       1                   6.16E-06              Streptomyces griseus
CAZy, Carbohydrate active enzymes.
[a] Indicates the presence of a signal peptide without a trans-membrane helix.
Table 5. The 20 most conserved genes between the Pratylenchus thornei transcriptome and
Caenorhabditis elegans.
GenBank Accession No  GenBank Protein ID  Gene                                                   E-value
E41472                Y47D3A.26           smc-3, structural maintenance of chromosomes           1.94E-175
CE43278               W07E11.1            glutamate synthase                                     3.98E-162
CE05449               C47E12.5a           uba-1, Ubiquitin activating enzyme                     4.73E-161
CE09197               F07A5.7a            unc-15, paramyosin                                     1.56E-149
CE21713               Y39E4B.1            abce-1, ABC transporter, class E                       4.50E-134
CE06519               T27F2.1             skp-1, transcriptional cofactor                        4.72E-134
CE30306               T25G3.3             Required for viability, growth, and fertility          1.57E-131
CE43299               F48F7.1b            alg-1, affects developmental timing                    1.64E-127
CE13150               T04C12.5            act-2, actin 2                                         2.07E-126
CE17477               C27B7.4             rad-26, radiation sensitive abnormal                   2.93E-126
CE06184               K12D12.1            top-2, topoisomerase                                   1.49E-125
CE02626               F10G7.2             tsn-1, component of the RNA-induced silencing complex  2.53E-124
CE31351               F56A11.1            gex-2, tissue morphogenesis and cell migrations        4.76E-121
CE32946               W02B3.2             grk-2, serine/threonine protein kinase                 7.31E-119
CE42002               F28B3.7a            him-1, structural maintenance of chromosomes           9.93E-116
CE12368               M03F4.7a            calu-1, calcium binding protein                        2.12E-113
CE39951               M01G5.5             snf-6, sodium:neurotransmitter symporter               3.68E-113
CE40348               C02F4.2c            tax-6, calcineurin A                                   1.02E-111
CE23997               T21E12.4            dhc-1, cytoplasmic dynein heavy chain                  2.19E-111
CE20226               Y37D8A.23a          unc-25, glutamic acid decarboxylase (GAD)              5.50E-111
Fig. 1
Transcriptome Assembly
Roche 454 GS FLX: Dataset Lane 1, 390,417; Dataset Lane 2, 396,858
Step 1, Contig Assembly (Lane 1 + Lane 2): CAP3, 32,897; MIRA, 43,569
Step 2, Reassembly: CAP3, 34,312
Step 3, Selection: Common Contigs, 6,989
Annotations:
  Parasitism genes, signal peptides and CAZy (34,312 contigs used)
  NCBI BLAST searches, e.g. C. elegans, plant parasitic nematodes
  Neuropeptides
  C. elegans RNAi genes
  Gene Ontology (GO) terms and KEGG (EC)
Fig. 2
Read length Dataset L1 Dataset L2
0-20 0.00 0.00
20-40 0.34 0.40
40-60 27.90 30.67
60-80 43.09 47.70
80-100 49.05 54.03
100-120 51.29 56.24
120-140 51.38 55.67
140-160 53.55 55.88
160-180 55.07 57.74
180-200 60.98 62.07
200-220 70.77 71.85
220-240 82.75 83.29
240-260 96.66 93.91
260-280 100.00 100.00
280-300 99.85 95.26
300-320 91.91 88.86
320-340 82.06 80.71
340-360 70.71 68.94
360-380 59.82 59.52
380-400 50.10 49.33
400-420 41.43 39.93
420-440 32.43 32.26
440-460 27.21 26.74
460-480 22.41 21.83
480-500 19.53 19.32
500-520 13.78 13.62
520-540 7.22 7.00
540-560 2.83 2.52
560-580 0.75 0.61
580-600 0.30 0.18
600-620 0.06 0.03
620-640 0.03 0.00
640-660 0.02 0.01
660-680 0.00 0.01
680-700 0.00 0.00
[Fig. 2 plot: x-axis, Read Length (bp); y-axis, Percentage of bases; series, Dataset L1 and Dataset L2.]
Fig. 3
Size Range (bp) CAP Dataset L1 MIRA Dataset L1 CAP Dataset L2 MIRA Dataset L2 Reassembled Data
0-50 20 30 22 37 71
51-99 680 1,156 744 1,296 2,690
100-150 1,204 1,790 1,430 2,002 3,952
151-199 1,576 2,163 1,701 2,404 4,511
200-250 2,377 2,907 2,459 3,089 5,768
251-299 3,430 4,141 3,520 4,179 7,702
300-350 3,936 4,768 3,874 4,737 8,667
351-399 3,670 4,600 3,676 4,686 8,522
400-450 3,339 4,215 3,250 4,073 7,706
451-499 2,754 3,663 2,819 3,614 6,820
500-550 2,202 2,940 2,165 2,866 5,678
551-599 1,617 2,280 1,609 2,275 4,395
600-650 1,317 1,709 1,296 1,778 3,430
651-699 976 1,405 959 1,308 2,634
700-750 751 1,122 756 1,019 2,156
751-799 609 818 562 828 1,652
800-850 509 655 461 635 1,441
851-899 348 490 331 481 1,115
900-950 315 388 276 360 895
951-999 249 331 221 333 729
1000-1050 184 265 177 251 590
[Fig. 3 plot: x-axis, Contig Size Range (bp); y-axis, Number of Contigs; series, CAP Dataset L1, MIRA Dataset L1, CAP Dataset L2, MIRA Dataset L2 and Reassembled Data.]
Fig. 4
GO Terms Percentage
Other enzyme activity 15.2
ATP binding 13.5
DNA binding 8.0
Ion binding 8.0
Transcription 6.6
Ligase activity 6.1
Other binding 6.1
Transporter 6.0
Structural 4.2
Nucleotide binding 3.9
Protein binding 3.8
RNA binding 3.4
Kinase activity 3.3
Nucleic acid binding 3.2
Unspecified molecular function 2.6
Electron transport 2.3
Signal transducer 1.4
Motor 0.6
Hormone 0.4
Enzyme regulator 0.3
Growth 0.1
Nucleus 27.0
Membrane 27.0
Cytoplasm 23.0
Unspecified intracellular 18.0
Vesicle coat 2.0
Extracellular 3.0
Protein metabolism & modifications 19.2
Transport 19.0
Nucleic acid metabolism 13.5
Biosynthesis 5.6
Cell communication 3.9
Catabolism 2.6
Carbohydrate metabolism 2.4
Lipid metabolism 1.3
Chitin metabolism 0.4
Viral reproduction 0.4
Superoxide metabolic process 0.2
Cell cycle 0.2
[Fig. 4 plot: x-axis, GO Categories; y-axis, Percentage; panels labelled Molecular Function and Cellular Component.]
Fig. 5
[Diagram comparing Pratylenchus thornei contigs (6,989) with ESTs of: Meloidogynidae (73,343); Heteroderidae (47,338); Radopholus similis (7,382); Caenorhabditis elegans (25,030); Pratylenchus penetrans and Pratylenchus vulnus (7,728); Pratylenchus coffeae (25,987).]
[Counts shown in the diagram, in extraction order: 2,039; 1,218; 1,209; 1,947; 2,014; 719,316; 45,299; 6,510; 6164; 23,016; 5,504; 20,483.]
Fig. 6
Size (bp) Unannotated Annotated
0-20 8 4
20-40 42 17
40-60 90 63
60-80 142 56
80-100 143 82
100-120 184 102
120-140 183 122
140-160 189 115
160-180 218 126
180-200 199 122
200-220 175 73
220-240 171 87
240-260 129 113
260-280 101 73
280-300 127 85
300-320 110 64
320-340 86 42
340-360 98 67
360-380 75 40
380-400 56 51
400-420 56 26
420-440 45 23
440-460 37 19
460-480 38 27
480-500 24 14
500-520 29 13
520-540 14 9
540-560 23 9
560-580 12 16
580-600 13 6
600-620 18 7
620-640 16 8
640-660 5 4
660-680 9 6
680-700 6 1
700-720 6 3
720-740 6 3
740-760 4 1
760-780 6 5
780-800 8 1
800-820 0 3
820-840 2 1
840-860 2 2
860-880 1 1
880-900 0 1
900-920 0 1
920-940 3 0
940-960 1 1
960-980 1 1
980-1000 2 0
[Fig. 6 plot: x-axis, Open Reading Frame Size (bp); y-axis, Number of Open Reading Frames; series, Unannotated and Annotated.]
Graphical Abstract
Analysis of P. thornei transcriptome
Roche 454 GS FLX: Reads L1, 390,417; Reads L2, 396,858
Contig Assembly: CAP3, 32,897; MIRA, 43,569
Reassembly: CAP3, 34,312 contigs
Selection: Common Contigs, 6,989
Annotations (contigs):
  Homologues to plant parasitic nematodes
  Plant parasitism genes
  CAZymes, Neuropeptides
  C. elegans RNAi genes
Highlights
First known transcriptome of the nematode Pratylenchus thornei, generated by Roche 454 GS FLX sequencing
787,275 reads assembled into 34,312 contigs using two assemblers, yielding 6,989 common contigs
Forty-two percent of the contigs were annotated using public databases
One hundred contigs matched parasitism genes of sedentary/migratory endoparasites
The data provide new information on genes involved in P. thornei-host interaction