Chapter 4
Molecular adaptation of the chloroplast matK gene in
Nymphaea tetragona, a critically rare and endangered plant
of India
4.1 Introduction
Nymphaea tetragona is one of the naturally occurring Nymphaea species found
in India (Mitra, 1990). Its distribution in India is restricted to a single location at
Nongkrem, East Khasi Hills District, Meghalaya (25°28′N, 91°52′E). It is
considered to be rare and endangered (Tandon, 2009; Tandon et al., 2010). Because
of its present status, recovery programmes have been initiated for this plant taxon
(Ganeshaiah, 2005; Tandon et al., 2010).
Besides N. tetragona, there are an additional nine species (both wild and
cultivated) of the genus Nymphaea occurring in India (Mitra, 1990). However,
traditional classification of the genus in India has received unconvincing treatments
with some names inaccurately used (Cook, 1996). To vindicate Cook's remarks,
molecular taxonomic revision of four Indian representatives of the genus Nymphaea
viz. N. nouchali, N. pubescens, N. rubra and N. tetragona based on ITS, trnK intron
and matK gene was conducted (discussed in chapter 5, Dkhar et al., 2010). Molecular
evidence suggested probable misidentification of one specimen of N. nouchali and N.
tetragona. Furthermore, the genetic closeness among the Indian, Chinese and
Russian material of N. tetragona was evaluated based on the ITS region. The
results indicated a close relationship between the Indian and the Chinese
representatives, which is further supported by the morphological similarities
shared between them (eFloras, 2008).
An interesting observation made from chapter 2 was the relatively higher
number of nonsynonymous substitutions as compared to synonymous substitutions
detected in the matK gene of N. tetragona. The higher rate of nonsynonymous
substitutions is usually inferred as an indication of selective pressures acting on
protein-coding sequences. The chloroplast matK gene, coding for the maturase
enzyme, has been reported to evolve under positive selection in some lineages of
land plants (Hao et al., 2010). According to Hao et al. (2010), several regions (N-
terminal region, RT domain in the middle, and domain X at the C-terminus) of matK
have experienced molecular adaptation, which fine-tunes maturase performance.
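For reference, the ratio behind this inference can be written compactly. This is the standard definition of ω used in codon substitution models (the same dN/dS reported in the tables of this chapter), not a formula specific to this study:

```latex
\omega = \frac{d_N}{d_S},
\qquad
\begin{cases}
\omega < 1 & \text{purifying (negative) selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive (diversifying) selection}
\end{cases}
```

Here d_N and d_S are the numbers of nonsynonymous and synonymous substitutions per nonsynonymous and per synonymous site, respectively. Because the two counts are normalized by different numbers of sites, a raw excess of nonsynonymous changes is suggestive but does not by itself establish ω > 1.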
In this chapter, to gain insight into whether an adaptive evolutionary process
is linked to the colonization of new habitats by N. tetragona, selective pressure on
matK was tested using phylogenetic maximum likelihood analyses of codon-based
site and branch-site substitution models. This estimation was extended to members
of the order Nymphaeales, which were not included in the study of Hao et al. (2010)
and which represent a lineage at the base of the angiosperm tree.
4.2 Materials and Methods
4.2.1 Amplification of the matK gene in N. tetragona
New primers (NytrnKJD525-F 5′ TCGGGTTGCAAAAATAAAGG 3′ and
NytrnKJD2551-R 5′ AAACCGTGCTTGCATTTTTC 3′) were designed to amplify
the matK gene in six additional samples of N. tetragona, besides NytN1 (DNA
sample number corresponding to the first individual/sample). The same primers,
together with NymatKJD1853-F described in chapter 2, were used for sequencing.
4.2.2 Taxon sampling and nucleotide sequence data
Nucleotide sequence data of the trnK intron and the matK and rbcL genes of N.
nouchali, N. pubescens, N. rubra and N. tetragona were used. In addition, nucleotide
sequence data of Barclaya longifolia, Euryale ferox, Nuphar advena, Ondinea
purpurea, Victoria cruziana, Amborella trichopoda and Austrobaileya scandens,
identified as among the most basal angiosperms, were retrieved from GenBank.
4.2.3 Protein blast
The amino acid sequences encoded by the chloroplast matK and rbcL genes of N.
tetragona were subjected to protein BLAST (blastp), available at www.ncbi.nlm.nih.gov.
4.2.4 Molecular evolutionary analysis
To conduct a phylogenetic maximum likelihood analysis of selection in
chloroplast genes, a phylogenetic tree was reconstructed from the chloroplast
trnK intron and the matK and rbcL genes using the maximum likelihood method
implemented in Phylip (Felsenstein, 2004).
Molecular adaptation tests on the chloroplast matK codon sites were
performed using maximum likelihood models and programs included in PAML ver.
4.3 (Yang, 2007). Six nonsynonymous sites detected in the matK gene of N.
tetragona are indicated in Fig. 33. Five site-specific codon substitution models were
fitted: null models for testing positive selection (M1A, M7 and M8A) and models
allowing for positive selection (M2A and M8). Furthermore, to test whether positive
selection is linked to the colonization by N. tetragona, the branch-site models
(Model A with ω2 fixed at 1 or estimated) implemented in PAML ver. 4.3 were
used. The likelihood ratio test (LRT) was used to compare these alternative models.
Fig. 33. Nonsynonymous substitutions detected in the matK gene of N. tetragona. The
sequencing chromatogram of the matK gene of N. tetragona is shown along with the
corresponding nucleotide sequences of N. tetragona, other Nymphaea species, and
related genera. Six nonsynonymous substitution sites were identified.
The amino acid changes at these sites among the species included are
indicated. A brief explanation of the significance of this observation is given
within the box.
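The site and branch-site analyses described in this section are run in PAML through codeml control files. The sketch below generates a minimal subset of control-file options for each run. The option names (`model`, `NSsites`, `fix_omega`, `omega`, and so on) are standard codeml settings, but the file names, the `CodonFreq` choice, and the starting ω values are assumptions for illustration; the chapter does not report them, and a real analysis would set further options.

```python
# Sketch only: file names, CodonFreq and starting omega values are assumptions,
# not settings reported in the chapter.

def codeml_ctl(seqfile, treefile, outfile, model, nssites, fix_omega, omega):
    """Return the text of a minimal codeml control file for one analysis."""
    options = {
        "seqfile": seqfile,      # codon alignment (hypothetical file name)
        "treefile": treefile,    # ML tree; foreground branch marked with #1 for branch-site runs
        "outfile": outfile,
        "runmode": 0,            # evaluate the user-supplied tree
        "seqtype": 1,            # codon sequences
        "CodonFreq": 2,          # F3x4 codon frequencies (assumed)
        "model": model,          # 0 = one omega regime for all branches, 2 = foreground/background
        "NSsites": nssites,      # site model(s) to fit
        "fix_omega": fix_omega,  # 1 = hold omega at the value below, 0 = estimate it
        "omega": omega,          # fixed value or starting value
    }
    return "\n".join(f"{key:>10} = {value}" for key, value in options.items()) + "\n"


# One entry per codeml run needed for the comparisons named in the text.
RUNS = {
    "site_models": dict(model=0, nssites="0 1 2 7 8", fix_omega=0, omega=0.5),  # M0, M1A, M2A, M7, M8
    "M8A": dict(model=0, nssites="8", fix_omega=1, omega=1),                    # null for M8
    "branch_site_null": dict(model=2, nssites="2", fix_omega=1, omega=1),       # Model A, omega2 = 1 fixed
    "branch_site_alt": dict(model=2, nssites="2", fix_omega=0, omega=1.5),      # Model A, omega2 estimated
}


def build_all(seqfile="matK_codons.phy", treefile="ml_tree.nwk"):
    """Build control-file text for every run; returns {run name: file text}."""
    return {
        name: codeml_ctl(seqfile, treefile, f"{name}.out", **opts)
        for name, opts in RUNS.items()
    }


if __name__ == "__main__":
    for name, text in build_all().items():
        print(f"# {name}.ctl\n{text}")
```

Each returned string could be written to its own `.ctl` file and passed to codeml; repeating the branch-site pair with a different branch labelled as foreground in the tree file would cover the other lineages.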
4.3 Results
4.3.1 Amplification and sequencing of the matK gene
Amplification of the matK gene in six additional samples of N. tetragona
yielded a single amplified product of approximately 1900 bp (Fig. 34).
Sequencing revealed identical sequences in all individuals/samples of N.
tetragona, suggesting that a single haplotype of the matK gene is maintained
throughout the population. Fig. 35 shows the single population at
Umjapung Village, Nongkrem, East Khasi Hills District, Meghalaya, India, where N.
tetragona is found.
4.3.2 Protein blast (blastp) of the Rubisco and maturase K proteins
A protein BLAST search with the Rubisco (rbcL-encoded) amino acid sequence of
N. tetragona matched the large subunit of ribulose-1,5-bisphosphate
carboxylase/oxygenase of N. alba (Accession no. CAF28601). No
partitioning of the protein was indicated. For maturase K (the matK-encoded
protein), however, blastp identified two regions: the N-terminal region, whose
function is not known, and domain X, which is involved in splicing. The intervening
region between the N-terminal region and domain X is called the RT domain (Fig. 36).
Fig. 34. PCR-amplified product of the matK gene in six additional individuals
(NytN2–NytN7) of N. tetragona. The amplified product is approximately 1900 bp in
size. A ladder ranging from 500 to 5000 bp was used as the size standard.
Fig. 35. Location at Umjapung Village, Nongkrem, East Khasi Hills District, where
N. tetragona is found. This is the only population of N. tetragona in India. Note the
cultivation surrounding the pond.
Fig. 36. Protein blast (blastp) of the Rubisco and maturase K of N. tetragona. No
partitioning of the Rubisco protein is indicated. For maturase K, blastp showed two
regions identified as the N-terminal region and Domain X. The intervening region
between the N-terminal and Domain X is called RT Domain.
4.3.3 Phylogenetic tree reconstruction
Maximum likelihood analysis of the combined datasets yielded a
phylogenetic tree with log likelihood value of –10921.95 (Fig. 37).
4.3.4 Positive selection in matK gene
For detecting positive selection using codon substitution models, three
likelihood ratio tests were performed, comparing M1A (nearly neutral) with M2A
(positive selection), M7 (beta) with M8 (beta and ωs ≥ 1), and M8A (beta and ωs = 1)
with M8 (beta and ωs ≥ 1). Because positive selection operates more strongly at some
sites than at others (Aguileta et al., 2009), the distribution of sites influenced by
selection was evaluated separately for the three regions (N-terminal region, RT
domain, and domain X) of the matK gene (Table 19). For the N-terminal region, model
M2A indicated that 86.8% of the sites are under purifying selection (ω = 0.248) and
13.2% of the sites are under positive selection (ω = 1.778). In model M8, 13.0% of
the sites are under positive selection (ω = 1.791). For the RT domain, model M8
estimated an ω value of 46.75 for 0.001% of codon sites (Table 19). Codon
substitution models did not identify any sites under positive selection for domain X.
Two LRTs, M7 vs. M8 and M8A vs. M8, were significant at the 0.01 and 0.05
significance levels, respectively. Comparing M8 against the null
model M8A is more robust than M1A vs. M2A or M7 vs. M8 (Swanson et al., 2003).
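The likelihood ratio tests reported above can be reproduced from the log-likelihoods in Table 19. The sketch below does this for the N-terminal region. It assumes the statistic 2ΔlnL is compared with a chi-square distribution with 2 degrees of freedom for M1A vs. M2A and M7 vs. M8, and 1 degree of freedom for M8A vs. M8. These degrees of freedom follow common practice and are an assumption here, since the chapter does not state them. The function names are illustrative.

```python
import math

# Log-likelihoods for the N-terminal region, copied from Table 19.
LNL = {"M1A": -3148.00, "M2A": -3146.07, "M7": -3150.77, "M8": -3146.10, "M8A": -3148.07}

# (null model, alternative model, assumed degrees of freedom)
TESTS = [("M1A", "M2A", 2), ("M7", "M8", 2), ("M8A", "M8", 1)]


def chi2_sf(stat, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if stat <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def lrt(lnl_null, lnl_alt, df):
    """Return (test statistic 2*(lnL_alt - lnL_null), p-value)."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


def run_tests():
    """Run all three comparisons; returns {"null vs alt": (statistic, p-value)}."""
    return {f"{null} vs {alt}": lrt(LNL[null], LNL[alt], df) for null, alt, df in TESTS}


if __name__ == "__main__":
    for name, (stat, p) in run_tests().items():
        print(f"{name}: 2*dlnL = {stat:.2f}, p = {p:.4f}")
```

With the Table 19 values, the statistics are about 3.86, 9.34 and 3.94, giving p-values of roughly 0.145, 0.009 and 0.047. Under the stated assumptions this agrees with the reported significance of M7 vs. M8 at the 0.01 level and M8A vs. M8 at the 0.05 level.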
Branch-site model A with ω2 fixed at 1 was compared with branch-site model A with
ω2 estimated. Each branch in the phylogenetic tree was specified a priori as the
foreground branch in turn. For the N-terminal region, model A (ω2 estimated) with the
N. tetragona branch as foreground indicated that 2.09% of sites are under selective
pressure, with ω2 = 23.099 (Table 20). Bayes Empirical Bayes (BEB) analysis showed
three positive sites with probability values of 0.607, 0.527, and 0.582, respectively.
Branch-site models for the remaining regions of the matK gene had similar
log-likelihood values.

Fig. 37. Maximum likelihood tree of the combined chloroplast DNA dataset (3859 bp) using Phylip. Three positive sites, with their
probabilities for the foreground branch (N. tetragona) estimated using the branch-site models of PAML, are indicated. Amino acid
replacements at these sites are also indicated at the branch (G: Glycine, E: Glutamic acid, Q: Glutamine, H: Histidine, L: Leucine,
F: Phenylalanine). Numbers at nodes indicate bootstrap values. Prob, probability.

Table 19. Estimation of log-likelihood (LnL) values and parameters for each region of the matK gene using site-specific codon
substitution models implemented in PAML.

Model                    Region             LnL        Parameters¹
M0: one ω                N-terminal region  –3163.49   ω = 0.38797
                         RT domain          –229.30    ω = 0.32206
                         Domain X           –1252.89   ω = 0.32610
M1A: nearly neutral      N-terminal region  –3148.00   p0 = 0.74229, ω0 = 0.18594; p1 = 0.25771, ω1 = 1.00000
                         RT domain          –229.30    p0 = 0.99999, ω0 = 0.32206; p1 = 0.00001, ω1 = 1.00000
                         Domain X           –1250.61   p0 = 0.67267, ω0 = 0.11716; p1 = 0.32733, ω1 = 1.00000
M2A: positive selection  N-terminal region  –3146.07   p0 = 0.86780, ω0 = 0.24814; p1 = 0.00000, ω1 = 1.00000; p2 = 0.13220, ω2 = 1.77808
                         RT domain          –229.30    p0 = 1.00000, ω0 = 0.32206; p1 = 0.00000, ω1 = 1.00000; p2 = 0.00000, ω2 = 1.00000
                         Domain X           –1250.61   p0 = 0.67267, ω0 = 0.11716; p1 = 0.31251, ω1 = 1.00000; p2 = 0.01481, ω2 = 1.00000
M7: beta                 N-terminal region  –3150.77   p = 0.54661, q = 0.79966
                         RT domain          –229.31    p = 47.39745, q = 99.00000
                         Domain X           –1249.89   p = 0.38014, q = 0.66141
M8: beta & ωs ≥ 1        N-terminal region  –3146.10   p0 = 0.87015; p = 33.15470, q = 99.00000; p1 = 0.12985, ω1 = 1.79169
                         RT domain          –229.31    p0 = 0.99999; p = 47.39742, q = 99.00000; p1 = 0.00001, ω1 = 46.75087
                         Domain X           –1249.89   p0 = 0.99999; p = 0.38013, q = 0.66140; p1 = 0.00001, ω1 = 1.00000
M8A: beta & ωs = 1       N-terminal region  –3148.07   p0 = 0.74402; p = 22.99702, q = 99.00000; p1 = 0.25598, ω1 = 1.00000
                         RT domain          –229.31    p0 = 0.99999; p = 47.39723, q = 99.00000; p1 = 0.00001, ω1 = 1.00000
                         Domain X           –1249.89   p0 = 0.99999; p = 0.38014, q = 0.66142; p1 = 0.00001, ω1 = 1.00000

¹ p, proportion of codon sites with ω; ω, dN/dS. For models M7, M8 and M8A, ω was estimated from a beta distribution B(p, q).

Table 20. Estimation of log-likelihood (LnL) values and parameters for each region of the matK gene using branch-site models
implemented in PAML, selecting N. tetragona as the foreground branch.

Model                     Region             LnL       Parameters¹
Model A (ω2 = 1 fixed)    N-terminal region  –3148.0   p0 = 0.73633, ω0 = 0.18596; p1 = 0.25529, ω1 = 1.00000; p2+p3 = 0.00839, ω2 = 1.00000
                          RT domain          –229.3    p0 = 1.00000, ω0 = 0.32206; p1 = 0.00000, ω1 = 1.00000; p2+p3 = 0.00000, ω2 = 1.00000
                          Domain X           –1250.6   p0 = 0.67267, ω0 = 0.11716; p1 = 0.32733, ω1 = 1.00000; p2+p3 = 0.00000, ω2 = 1.00000
Model A (ω2 estimated)    N-terminal region  –3147.8   p0 = 0.73321, ω0 = 0.18697; p1 = 0.24589, ω1 = 1.00000; p2+p3 = 0.0209, ω2 = 23.09944
                          RT domain          –229.3    p0 = 1.00000, ω0 = 0.32206; p1 = 0.00000, ω1 = 1.00000; p2+p3 = 0.00000, ω2 = 1.17397
                          Domain X           –1250.6   p0 = 0.67267, ω0 = 0.11716; p1 = 0.32733, ω1 = 1.00000; p2+p3 = 0.00000, ω2 = 1.00000

¹ p, proportion of codon sites with ω; ω, dN/dS.
4.3.5 Positive selection in rbcL gene
For the rbcL gene, model M2A indicated that 94.6% of the sites are under
purifying selection (ω = 0.017) and 5.3% of the sites are under positive selection (ω
= 1.400). In model M8, 5.3% of the sites are under positive selection (ω = 1.401).
The results indicated a relatively lower selective pressure on the rbcL gene than
on matK. Identical log-likelihood values were obtained under the branch-site models
when N. tetragona was selected as the foreground lineage.
4.4 Discussion
4.4.1 Selective pressure during colonization of N. tetragona
N. tetragona is widely distributed globally, being found throughout China (Ma et al.,
2008; Hu et al., 2009), Finland (Nurminen, 2003), Japan (Takamura et al., 2003),
and Russia (Volkova et al., 2010), but in India it is restricted to a single
location. It is interesting to note that plants of N. tetragona found in India
are closely related to plants from China (discussed in chapter 5, Dkhar et al.,
2010). Evidence from molecular evolutionary analysis therefore suggests that the
plant taxon migrated from China, and that the migratory process might have
brought about some changes at both the DNA and protein sequence levels. This
adaptive response would have conferred a slight advantage on the plant for survival
in the new habitat. Such adaptive changes at the DNA and protein sequence levels of
the rbcL gene have been reported in Schiedea (Kapralov and Filatov, 2006) and
Potamogeton (Iida et al., 2009).
Out of six nonsynonymous substitutions detected in the matK gene of N.
tetragona, three showed selection pressure, with probability values of 0.607, 0.527,
and 0.582, respectively. The three selected sites are located in the N-terminal region
of the matK gene, whose function is not known. Molecular evolutionary analysis did not
identify any sites under positive selection in domain X, which is relatively more
conserved and functionally important, as it determines the maturase activity of matK
proteins (Hausner et al., 2006). Similarly, the RT domain showed a much lower
percentage of sites (0.001%; Table 19) under positive selection than the N-terminal
region (13.2%) of the matK gene. These observations suggest that selective pressures
are more likely to act on regions that are functionally less important. Hao et al.
(2010) likewise observed that more positively selected sites were detected in the
N-terminal region than in the RT domain and domain X of the matK gene.
4.4.2 Implications for the conservation of N. tetragona
Sustainable utilization of plant genetic resources for food and agriculture has
been increasingly discussed at both national and international forums. Besides
exploitation, conservation of plant genetic resources has become an integral part of
these discussions. Conservation aims at maintaining the diversity of living
organisms, their habitat and the interrelationship between organisms and their
environment. For achieving such goals, appropriate conservation strategies have to
be adopted. Determining the genetic makeup of a particular plant species is of critical
importance when planning a suitable conservation strategy. The present chapter
provided evidence for molecular adaptation of the matK gene in N. tetragona, a
critically rare and endangered plant of India. Such adaptive changes at the molecular
level might have been associated with the adaptation of N. tetragona to varying
ecological conditions. The threat to the existence of this plant taxon could therefore
be attributed primarily to anthropogenic factors, brought about through large-scale
cultivation in the vicinity of the location (Fig. 35). Conservation of N. tetragona
can be targeted at controlling such activities or at the probable translocation of the
plant to another site.