Chapter 1
1
Chapter 1: Introduction and Review of Literature
Introduction to proteases
Proteases were once believed to be nonspecific degradative enzymes, but are now
recognized to have well-defined physiological roles with tightly controlled spatial and
temporal regulation of activity. Proteases act as regulators of biological processes during
the life span of all organisms (Neurath, 1984). They control activation, synthesis and
turnover of proteins as they carry out an unlimited number of hydrolytic reactions both intra-
and extracellularly (Neurath and Walsh, 1976; Neurath, 1989). The term 'protease' was
earlier used for enzymes that degrade proteins by hydrolysis of peptide bonds. In the late
1920s, it was recognized that there are two very different types of these proteolytic
enzymes (Grassmann and Dyckerhoff, 1928). Some enzymes act best on intact proteins,
whereas others show a preference for small peptides as substrates. The term
‘proteinase’ was therefore adopted for proteases that show specificity for intact proteins and cleave
internal peptide bonds (= endoproteinases), and ‘peptidase’ was used synonymously with
exopeptidase, i.e. to refer to a peptide (bond) hydrolase that acts specifically at the N- or
C-terminus of a polypeptide (Grassmann and Dyckerhoff, 1928; Barrett, 1980;
McDonald, 1985; Barrett and McDonald, 1985, 1986).
The general nomenclature of cleavage site positions of the protein substrate by
proteinases was formulated by Schechter and Berger (1967). According to this scheme,
the binding region of the active site may be divided into subsites (S), or specificity sites,
each accommodating one amino acid residue or blocking group of the substrate. The
subsites on each side of the catalytic site are designated S1 and S1´. The remaining
subsites are numbered outward from this point in both directions as S1, S2, etc., and S1´, S2´,
etc. The positions of the amino acids in the substrate that interact with the protease are
numbered to correspond with the subsites they occupy on the active site and are
designated P1 and P1´ (figure 1.1). Cleavage is catalyzed between P1 and P1´.
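The numbering scheme described above can be illustrated with a minimal Python sketch. The function name, the one-letter peptide string and the 0-based index convention are assumptions made for illustration only; they are not part of the Schechter and Berger (1967) nomenclature itself.

```python
def schechter_berger_labels(peptide, scissile_index):
    """Label each residue of `peptide` relative to the scissile bond.

    `scissile_index` is the 0-based position of the P1 residue, so the bond
    cleaved lies between peptide[scissile_index] and peptide[scissile_index + 1].
    """
    if not 0 <= scissile_index < len(peptide) - 1:
        raise ValueError("the scissile bond must lie between two residues")
    labels = []
    for i, residue in enumerate(peptide):
        if i <= scissile_index:
            # Non-primed side: P1 is next to the bond, numbers rise toward the N-terminus.
            labels.append((f"P{scissile_index - i + 1}", residue))
        else:
            # Primed side: P1' is next to the bond, numbers rise toward the C-terminus.
            labels.append((f"P{i - scissile_index}'", residue))
    return labels


# Hypothetical peptide Ala-Ala-Pro-Phe-Gly-Ser cleaved after Phe.
print(schechter_berger_labels("AAPFGS", 3))
```

Each substrate position Pn is understood to occupy the matching enzyme subsite Sn, so the same labels with "P" replaced by "S" name the subsites.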
Classification of Proteases
In 1960, a scheme for classifying proteases on the basis of recognized catalytic
mechanisms was proposed (Hartley, 1960). Proteases were divided into four classes
according to their catalytic mechanisms: serine proteinases, thiol proteinases, acid
proteinases and metal proteinases (figure 1.2). Later, some amendments to this scheme
were proposed (Barrett, 1980). The International Union of Biochemistry and Molecular
Biology (IUBMB) (formerly the International Union of Biochemistry, IUB) recommends
that proteases be divided into four mechanistic classes depending on the amino acid
residue present at their active sites, the optimum pH range, amino acid sequence similarity
and similarity in response to inhibitors. These include serine proteinases, cysteine
proteinases, aspartic proteinases and metallo-proteinases. In addition, Rawlings and Barrett
(1993) proposed an evolutionary scheme for proteases based on amino acid sequence data
from 600 of these enzymes. According to this scheme, there are more than 60 evolutionary
lines of proteases with separate origins (Rawlings and Barrett, 1993; Terra and Ferreira,
1994). In this classification, proteases are classified into families and clans. The term family
is used to describe a group of enzymes in which each member shows evolutionary
relationship to at least one other, either throughout the whole sequence or at least in the
part of the sequence responsible for catalytic activity; whereas a clan comprises a group
of families for which there are indications of evolutionary relationship, despite the lack of
statistically significant similarities in sequence (Rawlings and Barrett, 1993). Recent
updates to the classification of proteases are available in the MEROPS Peptidase Database
(http://merops.sanger.ac.uk) (Rawlings and Barrett, 1993; Rawlings et al., 2010, 2012).
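The definition of a family given above (each member related to at least one other member) amounts to single-linkage grouping. A minimal Python sketch of that idea follows; the function name and the input pairs are hypothetical and stand in for statistically significant sequence relationships. They are not MEROPS data, and this is not the procedure used by Rawlings and Barrett (1993).

```python
def group_into_families(enzymes, significant_pairs):
    """Group enzymes so that every member is linked to at least one other member."""
    parent = {e: e for e in enzymes}

    def find(x):
        # Follow parent links to the representative, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in significant_pairs:
        parent[find(a)] = find(b)  # merge the two groups

    families = {}
    for e in enzymes:
        families.setdefault(find(e), set()).add(e)
    return sorted(sorted(members) for members in families.values())


# Illustrative input only: trypsin and elastase are not paired directly,
# but both are linked to chymotrypsin and so fall into one group.
enzymes = ["trypsin", "chymotrypsin", "elastase", "subtilisin", "proteinase K"]
pairs = [("trypsin", "chymotrypsin"), ("chymotrypsin", "elastase"),
         ("subtilisin", "proteinase K")]
print(group_into_families(enzymes, pairs))
```

Clans cannot be derived this way, because by definition they join families that lack statistically significant sequence similarity and rely on other evidence of relationship.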
Serine proteinases (EC 3.4.21)
Serine proteinases are by far the best characterized proteolytic enzymes (Neurath,
1999). They are of wide occurrence and constitute the largest group of proteinases. Together
with their inhibitors, they regulate a great variety of physiological events. The catalytic
mechanism depends upon the hydroxyl group of a serine residue acting as a nucleophile
that attacks the peptide bond. Serine proteinases are characterized by enzyme inactivation in
the presence of diisopropylfluorophosphate (DFP) or phenylmethanesulfonylfluoride
(PMSF). This mechanistic class is distinguished by the presence of the Asp102-His57-
Ser195 “catalytic triad”, first identified in α-chymotrypsin (numbering after the sequence
of bovine chymotrypsinogen) (Blow, 1997). Various clans of serine proteinases
display differences in terms of overall fold and the order of catalytic residues in the
primary sequences; but members of clans PA (chymotrypsin like) (Lesk and Fordham,
1996), SB (subtilisin like) (Siezen and Leunissen, 1997) and SC (α/β hydrolase fold)
(Ollis et al., 1992) maintain a strictly conserved active site geometry of the catalytic Ser-
His-Asp residues (Krem and Cera, 2001). Serine proteinases with novel catalytic triads
and dyads have also been identified, including Ser-His-Glu, Ser-Lys/His, His-Ser-His,
and N-terminal Ser residues (Dodson and Wlodawer, 1998; Hedstrom, 2002).
The mammalian digestive serine proteinases
Digestive serine proteases (also known as the S1 family of serine proteinases)
(www.merops.sanger.ac.uk) include trypsin, chymotrypsin and elastases (Rawlings and
Barrett, 1993). Trypsin, chymotrypsin and elastase have the same active site residues but
differ in their substrate specificity. They also differ in the amino acids that confer this
specificity (figure 1.3). Trypsins and chymotrypsins share three disulphide bridges known
as the His loop (42-58), Met loop (168-182) and Ser loop (191-220) (Hartley, 1964).
Chymotrypsin (EC 3.4.21.1) is a prominent member of the group of enzymes belonging
to the digestive serine proteinase family. The substrate binding site is occupied by Ser189
in chymotrypsins, which cleave peptide bonds at the carboxy-termini of the amino acids
tryptophan (Trp), phenylalanine (Phe), methionine (Met) and leucine (Leu) (Boyer,
1971). Bovine chymotrypsin (BC) was sequenced early (Hartley, 1964), and is well
characterized. The primary specificity of this enzyme for aromatic and leucine residues
results from the architecture of its S1 subsite (following Schechter and Berger, 1967),
which comprises a pocket close to the catalytic Ser195. Secondary specificity determinants
are Gly 216 and Gly 226 that line the pocket and residue Ser 189 that lies at the bottom of
the specificity pocket (figure 1.3) (Dorovska et al., 1972).
Trypsin (EC 3.4.21.4) catalyzes the hydrolysis of protein chains on the carboxyl
side of basic L-amino acids, arginine (Arg) or lysine (Lys). The substrate binding site in
trypsin is occupied by Asp189, Gly 216 and Gly 226 (figure 1.3). Asp189 gives negative
charge to the substrate binding pocket and is perfectly placed to form an ion-pair with the
positive charge of Lys or Arg residues present in the scissile bond of a substrate or
inhibitor (Evnin et al., 1990; Perona et al., 1994). Bovine trypsin (BT) was among the
first proteolytic enzymes to be isolated in pure form and in sufficient quantity for detailed
chemical and enzymological studies (Northrop et al., 1948). Mammalian trypsin shows a
2- to 10-fold preference for Arg substrates over Lys substrates (Craik et al.,
1985; Barrett et al., 2004). The enzyme is specifically inhibited by Nα-Tosyl-L-lysine
chloromethyl ketone hydrochloride (TLCK), which acts on histidine (Shaw et al., 1965).
The three-dimensional structure and mode of action of trypsin were studied in detail
by crystallography more than thirty years ago (Stroud et al., 1971, 1974;
Kraut, 1977; Huber and Bode, 1978).
Elastase (EC 3.4.4.7) is an endopeptidase that can digest elastin, the elastic
fibrous protein of connective tissue (Boyer, 1971). The first report of pancreatic elastase
activity dates back to 1878, when Walchli found that ox pancreas digested ligamentum
nuchae elastin, and Kuhne reported that impure trypsin preparations dissolved elastin
(Boyer, 1971). Elastase is present in all mammals investigated and has also been reported
in pancreatic extracts of several vertebrates and invertebrates (Cohen et al., 1981;
Yoshinaka et al., 1986). The first elastase gene to be cloned was from pig (Shotton and
Hartley, 1970). Like other serine proteinases, elastase contains the highly conserved
catalytic triad of His57-Asp102-Ser195 (Vered et al., 1986). In the elastase specificity
pocket, Gly 216 and Gly 226 are replaced by Val 216 and Thr 226, which block the
entrance of large bulky amino acids into the substrate binding cavity (figure 1.3) (Shotton
and Hartley, 1970).
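The primary (P1) specificities described above for trypsin and chymotrypsin can be illustrated with a small in silico digestion. This is a deliberately simplified sketch: the function and dictionary names are hypothetical, only the P1 residues listed in the text are used, and all effects of neighbouring subsites are ignored. Elastase is left out because the text does not list its preferred P1 residues.

```python
# P1 residues (one-letter codes) as listed in the text above.
P1_SPECIFICITY = {
    "trypsin": set("KR"),         # Lys, Arg
    "chymotrypsin": set("WFML"),  # Trp, Phe, Met, Leu
}


def digest(sequence, enzyme):
    """Split `sequence` on the carboxyl side of every residue matching the P1 set."""
    p1_residues = P1_SPECIFICITY[enzyme]
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        # No bond follows the C-terminal residue, so it can never be a cleavage site.
        if residue in p1_residues and i < len(sequence) - 1:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


# Hypothetical test peptide.
print(digest("GAKTFRSLW", "trypsin"))       # ['GAK', 'TFR', 'SLW']
print(digest("GAKTFRSLW", "chymotrypsin"))  # ['GAKTF', 'RSL', 'W']
```

The same peptide yields different fragments with each enzyme even though both share the same catalytic triad, which is the point made in the text about specificity residing in the S1 pocket.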
Mechanism of action of mammalian serine proteases
The classical mechanism of proteolysis by these enzymes involves two acyl
transfer reactions, in which the ‘leaving group’ of the scissile bond in the substrate is
attacked first by the oxygen of the enzyme’s active site serine, and secondly by a water
molecule (Bender et al., 1964; Fastrez and Fersht, 1973). The catalytic triad spans the
active site cleft, with Ser195 on one side and Asp102 and His57 on the other, where
His57 acts as a general acid and base; Asp102 functions to orient His57 while Ser195
forms a covalent bond with the peptide which is to be cleaved (Blow, 1997; Hedstrom,
2002). In the acylation half of the reaction, Ser195 attacks the carbonyl group of the
peptide substrate, assisted by His57 acting as a general base, to yield a tetrahedral
intermediate (figure 1.4). The resulting His57-H+ is stabilized by the hydrogen bond to
Asp102. The oxyanion of the tetrahedral intermediate is stabilized by interaction with the
main chain NHs of the oxyanion hole. The tetrahedral intermediate collapses with
expulsion of the leaving group, assisted by His57-H+ acting as a general acid, to yield the
acylenzyme intermediate (Hedstrom, 2002). The deacylation half of the reaction
essentially repeats the above sequence: water attacks the acylenzyme, assisted by His57,
yielding a second tetrahedral intermediate. This intermediate collapses, expelling Ser195
and the carboxylic acid product.
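The kinetic consequence of having separate acylation and deacylation steps can be sketched numerically. The example below assumes the minimal textbook three-step scheme E + S ⇌ ES → acylenzyme + P1 → E + P2, with a dissociation constant Ks for the ES complex and first-order rate constants k2 (acylation) and k3 (deacylation). These symbols and the numerical values are illustrative assumptions and do not appear in the text above.

```python
def acyl_enzyme_parameters(k2, k3, Ks):
    """Steady-state kcat and Km for the minimal acyl-enzyme scheme.

    k2: acylation rate constant (s^-1)
    k3: deacylation rate constant (s^-1)
    Ks: dissociation constant of the non-covalent ES complex (M)
    """
    kcat = k2 * k3 / (k2 + k3)
    Km = Ks * k3 / (k2 + k3)
    return kcat, Km


# Hypothetical values: fast acylation followed by slow deacylation.
kcat, Km = acyl_enzyme_parameters(k2=100.0, k3=1.0, Ks=1e-3)
print(f"kcat = {kcat:.3f} s^-1, Km = {Km:.2e} M, kcat/Km = {kcat / Km:.2e} M^-1 s^-1")
```

Under these assumptions kcat approaches k3 when deacylation is the slow step, while the ratio kcat/Km reduces to k2/Ks and therefore reports only on binding and acylation.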
Protease Inhibitors (PIs): mechanisms of inhibition
Protease inhibitors (PIs) are natural antagonists of proteases and control the action of
proteolytic enzymes. PIs can be classified into 48 families on the basis of similarities
detectable at the level of amino acid sequence (Rawlings et al., 2004). On the basis of
three-dimensional structures, 31 of the families are assigned to 26 clans
(www.merops.sanger.ac.uk). The activity of PIs is due to their capacity to form stable
complexes with the enzymes. Various mechanisms have been proposed for inhibition of
proteases by PIs (figure 1.5) (Bode and Huber, 2000), which include: (1) Blocking the
enzyme's active site in a substrate-like manner; for example, the canonical serine
proteinase inhibitors interact with the target enzymes in a substrate-like manner (figure
1.5, A) (Bode and Huber, 2000). Table 1.1 summarizes some of the major features of
different types of serine proteinase inhibitors.
Table 1.1: The major features of serine proteinase inhibitors

| Inhibitor type | Representative | Mode of interaction with target proteinase | Major features | Inhibitor size |
|---|---|---|---|---|
| Canonical inhibitor | BPTI, OMTKY3, eglin c | Substrate-like standard mechanism, direct blockage of the active site; the enzyme-inhibitor complex is rapidly formed and usually dissociates slowly into free enzyme and virgin or modified inhibitor | Largest group of PIs; presence of exposed binding loop of a characteristic “canonical” conformation | 3–21 kDa per domain |
| Non-canonical inhibitor | Hirudin, TAP, ornithodorin | Interact through N-terminal segments with the enzyme active site | Less abundant PIs; only occur in blood-sucking organisms; inhibit proteases involved in clot formation (thrombin or factor Xa) | 6–8 kDa per domain |
| Serpins | α-1-antitrypsin, antithrombin | Irreversible, unique suicide substrate mechanism; disruption of protease active site; huge conformational changes in binding loop of inhibitor | Wide occurrence in kingdoms Metazoa, Plantae and some viruses; significantly large proteins (~400 residues) | 45–55 kDa |

(BPTI: Bovine Pancreatic Trypsin Inhibitor, OMTKY3: turkey ovomucoid third domain, TAP: Tick anticoagulant
peptide. Source: Otlewski et al., 2005)
Other mechanisms of inhibition include: (2) Docking adjacent to the active/substrate
binding site, for example the interaction of human stefin B with papain (figure 1.5, B)
(Stubbs et al., 1990). In this case, the inhibitor primarily binds to the S1´ to S4´ sub-
region of papain through two hairpin loops and to the non-primed subsites S3 to S1
through its N-terminal segment, turning away from the active site of papain at P1 and
thereby preventing proteolytic processing (Stubbs et al., 1990). (3) Allosterically impairing the
proteolytic activity/substrate interaction of the enzyme via binding to quite distantly
located enzyme exo-sites. For example, mammalian serine proteinase zymogens with a
partially formed S1 specificity pocket represent allosterically inhibited proteinases (figure 1.5, D) (Kerr
et al., 1976; Fehlhammer et al., 1977; Cohen et al., 1981; Blevins and Tulinsky, 1985;
Wang et al., 1985; Bode and Huber, 2000).
Quantitative Assays used for the characterization of mammalian proteases
A number of techniques are currently employed to characterize the proteases
present in proteolytic mixtures. Depending on the type of substrate, these assays
can be divided into two main categories:
(1) Assays with “natural” protein substrates: these make use of a protein
substrate containing several or many scissile sites for a mixture of proteases. These
protein substrates are usually “natural” or native proteins purified from a source
organism. Protease assays utilizing casein as a protein substrate were first described by
Kunitz (1947). These assays are essentially based on the precipitation of undigested
protein substrate by trichloroacetic acid (TCA), followed by separation of the
TCA-soluble peptides and their detection by absorbance at 280 nm. Substrates such as casein, BSA and hemoglobin
are generally used for these assays. Refinements of the TCA assay include chromogenic
and fluorogenic labeling of substrate proteins. The intensity of color/fluorescence in the
TCA filtrate of digested labeled substrate is a function of the total proteolytic activity of
the enzyme solution. Examples include: (a) Chromogenic protein substrates (Azo-
proteins) viz. azo-casein (Charney and Tomarelli, 1947) and azo-albumin (Lambert et al.,
1978; Phillips et al., 1984); (b) FITC (Fluorescein isothiocyanate) labeled protein
substrates such as FITC-hemoglobin and FITC-casein (Lumen and Tappel, 1970;
Robbins and Summaria, 1976; Twining, 1984) and (c) BODIPY (4,4-difluoro-4-bora-
3a,4a-diaza-s-indacene) dye-labeled protein substrates, which show green fluorescence
emission similar to fluorescein (BODIPY FL casein and BODIPY FL BSA) or red
fluorescence emission upon conjugation with Texas red dye (BODIPY TR-X casein)
(Haugland and Kang, 1988; Jones et al., 1997). Most protein substrates can be labeled with
these dyes. Assays with labeled protein substrates are sensitive and can be automated for real-time
monitoring of proteolysis; those using fluorescently labeled protein
substrates are more sensitive than those using the corresponding chromophore-labeled proteins.
(2) Assays with “synthetic” peptide substrates: these substrates are constructed around
the chemistry of amino acids and peptides. In this case substituents are added to either the
α-NH2 or α-COOH group on the peptide to produce a substrate susceptible to proteolytic
cleavage and to impart a chromogenic or fluorogenic nature to the substrate (Table 1.2).
Based on the position of the blocking group, synthetic substrates can be classified into
three categories, namely endopeptidase (=proteinase), aminopeptidase and
carboxypeptidase substrates. Proteinase substrates have no free amino- or carboxy-
termini, while aminopeptidase substrates have a free amino terminus and
carboxypeptidase substrates have a free carboxy-terminus. The method of detection can
be spectrophotometric (Martin et al., 1959a, b; Erlanger et al., 1961; Bargetzi et al., 1963;
Walsh, 1970; Walsh and Wilcox, 1970; Farmer and Hageman, 1975; DelMar et al., 1979)
or fluorimetric (Zimmerman et al., 1976, 1977). Sensitive synthetic fluorogenic substrates
for trypsin, chymotrypsin and elastase have been developed. These 7-amido-4-
methylcoumarin (NHMec) substrates have high fluorescence yields, and are very
sensitive and useful in assaying dilute enzyme preparations at low substrate
concentrations.
Table 1.2: Common protease substrate nomenclature

| NH2 substituents: Abbreviation | NH2 substituents: Group | COOH substituents: Abbreviation | COOH substituents: Group |
|---|---|---|---|
| Bz | Benzoyl | NHNap | 2-Naphthylamide |
| Z | Benzyloxycarbonyl | NHNan | 4-Nitroanilide |
| Ac | Acetyl | NHMec | 7-Amido-4-methylcoumarin |
| Suc | Succinyl | OMe | Methyl ester |
| Abz | o-Aminobenzoyl | | |
| Fa | Furylacryloyl | | |

(Source: Beynon and Bond, 2001)
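The classification of synthetic substrates by the position of their blocking groups can be expressed as a short Python sketch that uses the abbreviations of Table 1.2. The hyphen-separated substrate notation and the function name are assumptions made for illustration, and the rule is only the one stated in the text (which termini are free). It says nothing about whether a given enzyme actually hydrolyzes the substrate.

```python
# Blocking-group abbreviations taken from Table 1.2.
N_TERMINAL_BLOCKS = {"Bz", "Z", "Ac", "Suc", "Abz", "Fa"}
C_TERMINAL_BLOCKS = {"NHNap", "NHNan", "NHMec", "OMe"}


def classify_substrate(name):
    """Classify a hyphen-separated substrate name by which termini are blocked."""
    parts = name.split("-")
    n_blocked = parts[0] in N_TERMINAL_BLOCKS
    c_blocked = parts[-1] in C_TERMINAL_BLOCKS
    if n_blocked and c_blocked:
        return "endopeptidase substrate"      # no free terminus
    if c_blocked:
        return "aminopeptidase substrate"     # free amino terminus
    if n_blocked:
        return "carboxypeptidase substrate"   # free carboxy terminus
    return "unblocked peptide"


for substrate in ("Suc-Ala-Ala-Pro-Phe-NHNan", "Leu-NHNan", "Z-Gly-Phe"):
    print(substrate, "->", classify_substrate(substrate))
```

A name whose first and last tokens are both ordinary residues is reported as an unblocked peptide, since the text does not assign such peptides to any of the three categories.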
Zymograms used for the detection of mammalian proteases
Gel-based assays have been employed for qualitative characterization of
proteases. These assays can be categorized into: those where enzyme activity is detected
in situ after electrophoresis; and those where enzyme activity is detected by replica
blotting of the electrophoretic gel on to a substrate-containing gel. The basis of these
assays is the incorporation of a substrate (usually a protein such as gelatin, although p-nitroanilide
substrates have also been used) in a gel matrix. To retain enzymatic activity, gel
electrophoresis is performed at low temperature under non-denaturing or mildly
denaturing conditions, in which samples are not boiled prior to loading but the gels and
running buffer contain SDS. Protease-containing zones appear as cleared areas
of activity where the protein substrate has been digested. For these gel-based assays,
electrophoresis is commonly performed in polyacrylamide gels, although both starch and
agarose gels have also been employed (Heussen and Dowdle, 1980; Sarath et al., 1989;
Garcia-Carreno et al., 1993; Lantz and Cibrowski, 1994; Leber and Balkwill, 1997). The
assays involving replica blotting of gel-fractionated enzymes rely on the ability of
electrophoretically separated enzymes to hydrolyze substrates incorporated into a second
gel or on strips of absorbent material such as cellulose acetate. In another approach, the
fractionated enzymes are transferred to nitrocellulose or PVDF sheets and then assayed
for activity (Ohlsson et al., 1986; Lantz and Cibrowski, 1994). In spite of their general
applicability, these gel-based assay methods have various limitations, viz. loss of activity
after SDS treatment and reduction in the migration rate of proteins due to incorporation of
the protein substrate in the matrix (Hummel et al., 1996; Michaud, 1998).
Digestive serine proteinases in lepidopteran insects
Lepidopteran midgut physiology is characterized by proteolytic activity dominated
by serine proteinases (Berenbaum, 1980; Broadway and Duffey, 1986a; Christeller et al.,
1992; Purcell et al., 1992). Invertebrate and insect trypsins are similar to mammalian
trypsins in backbone scaffold and structural domains, but display some remarkable
differences from the latter. Trypsins isolated from lepidopteran larvae have higher pH
optima that correspond to the higher pH of their midguts. Insect trypsins are not activated
or stabilized by calcium ions (Johnston et al., 1991), are unstable at acidic pH (Miller et
al., 1974; Sakal et al., 1989) and differ from mammalian trypsins in responses to natural
protein inhibitors and in their specificities toward protein substrates (Applebaum, 1985;
Purcell et al., 1992; Terra and Ferreira, 1994).
Invertebrate and insect chymotrypsins also share the same backbone scaffold and
bilobal β-barrel structure as mammalian chymotrypsins (Botos et al., 2000), but differ
considerably from the latter in many aspects. They display high pH optima reflecting the
alkaline midgut pH values (Terra and Ferreira, 1994) and differ from their mammalian
counterparts in their instability at acidic pH and strong inhibition by soybean trypsin
inhibitor, STI (Terra et al., 1996; Mazumdar-Leighton and Broadway, 2001a). There are
earlier reports of reduced or even complete absence of chymotrypsin activity among
lepidopteran larvae, based on the use of mammalian chymotrypsin substrates like BTpNa
(N-benzoyl-L-tyrosine-p-nitroanilide) and BTEE (N-Benzoyl-L-Tyrosine Ethyl Ester)
with one amino-acid binding site (Ahmad et al., 1980; Johnston et al., 1991; Christeller et
al., 1992; McManus and Burgess, 1995). However, it was shown clearly with
chromogenic substrates with an extended binding site such as SAAPFpNa (N-Succinyl-
Ala-Ala-Pro-Phe-p-nitroanilide), SAAPLpNa (N-Succinyl-Ala-Ala-Pro-Leu-p-
nitroanilide) and SAAPFAMC (N-Succinyl-Ala-Ala-Pro-Phe-AMC) that these substrates
are more efficiently hydrolyzed by the chymotrypsins in insects, and particularly in
Lepidoptera (Lee and Anstee, 1995; Johnston et al., 1995; Mazumdar-Leighton and
Broadway, 2001a). Elastase-like activity has been reported in a wide range of
lepidopteran species based on the hydrolysis of synthetic substrates SAAApNa
(Houseman et al., 1991; Johnston et al., 1995) and SAAPLpNa (Christeller et al., 1992,
1994). However, the nature of this detected activity is unclear. In Spodoptera littoralis,
the proteolytic activities responsible for hydrolysis of both SAAApNa and SAAPLpNa
were attributed to chymotrypsin (Lee and Anstee, 1995), whereas in Heliothis virescens,
SAAPLpNa-digesting activities were attributed to chymotrypsin-like
enzymes and SAAApNa-digesting activities to elastase-like enzymes
(Johnston et al., 1995). The purified chymotrypsin from Manduca sexta has been reported
to hydrolyse SAAPLpNa, but not SAAApNa (Peterson et al., 1994). On the basis of
SAAApNa hydrolysis, some elastase activity has also been reported in
Sesamia nonagrioides (Ortego et al., 1996). More work is required to check these
overlapping specificities of lepidopteran chymotrypsin and elastase-like enzymes.
Digestive serine proteinases in lepidopteran insects from India
The insect order Lepidoptera (Class: Insecta) includes several species of moths
and butterflies of economic importance. Several lepidopteran pests have been reported to
cause significant yield losses to crop plants in India. Indian agriculture is currently
suffering an annual loss of about Rs. 8,63,884 million due to insect pests (Dhaliwal et
al., 2010; Singh and Gandhi, 2012). An enormous amount of work has been done to control
these pests, including attempts to develop transgenic plants incorporating proteinase
inhibitors and insecticidal genes from Bacillus thuringiensis (Bennett et al., 1993;
Shao et al., 1998; Bentur et al., 1999, 2000; Selvapandian et al., 2001; Reddy et al.,
2002). In contrast, some beneficial Lepidoptera, like the non-mulberry silkmoths endemic
to the Indian subcontinent, are a source of revenue. For example, the endemic saturniid
silkmoths, Antheraea mylitta and A. assamensis are reared for their “tasar” and “muga”
silks (Hazarika et al., 1998). The following section gives a brief introduction to the
different lepidopteran insects used in this work.
Helicoverpa armigera (Lepidoptera: Noctuidae), the “cotton bollworm”, is a
devastating polyphagous pest reported to feed on over 180 plant species from 45 different
plant families in India, including cotton, chickpea, pigeonpea, tobacco, tomato, corn and
okra (Manjunath et al., 1989). Proteolytic gut activities in H. armigera midgut have been
shown to be largely due to serine proteinases with high pH optima using casein, Rubisco
and synthetic substrates (Christeller et al., 1992; Johnston et al., 1991, 1995; Patankar et
al., 2001). The gut activities are reported to be predominantly trypsin-like (Johnston et
al., 1991; Mazumdar-Leighton et al., 2000). Chymotrypsins have been reported to show
up-regulation following ingestion of an inhibitor and are implicated in the development
of resistance in H. armigera to ingested delta-endotoxins from Bacillus thuringiensis
(Bown et al., 1997; Gatehouse et al., 1997; Wu et al., 1997; Shao et al., 1998; Mazumdar-
Leighton and Broadway, 2001a). Carboxypeptidase and elastase-like activities have also
been reported from larval midguts of H. armigera (Bown et al., 1997, 1998; Wu et al.,
1997).
Pieris brassicae (Linnaeus) (Lepidoptera: Pieridae), the “large white butterfly”, is
a specialist polyphagous pest of crucifers in India (Ma, 1972; Sachan and Gangwar,
1980; Ali and Rizvi, 2007; Blatt et al., 2008). Apart from Cruciferae, it feeds on a
number of different host plants from different families including Capparaceae,
Papilionaceae, Resedaceae and Tropaeolaceae (Feltwell, 1982). Trypsin and
chymotrypsin like activities have recently been reported, using synthetic chromogenic
substrates, from fourth instar P. brassicae larvae fed on cabbage leaves (Zibaee, 2012).
The midgut serine proteinases of P. brassicae were found to be involved in the
physiological adaptation of this pest to different host plants (Kumar, R., 2010 PhD
thesis).
Antheraea assamensis Helfer (Lepidoptera: Saturniidae), the “muga silk worm”,
is an economically important semi-domesticated insect reared for its golden hued cocoon
silk in north eastern parts of India and Indo-Burma (Choudhury, 1981; Ahmed et al.,
1998; Hazarika et al., 1998). It is mostly cultured around the Brahmaputra valley and
adjoining hills bordering Assam, Nagaland, Meghalaya and Arunachal Pradesh. A.
assamensis larvae can feed on 28 species of plants (Choudhury, 1981; Bindroo et al.,
2006) of family Lauraceae, but commercial cultivation is done mainly on the leaves of
Persea bombycina (Som) and Litsea monopetala (Sualu). Differential levels of gut
trypsins and chymotrypsins have been detected in larvae feeding on these two host plants,
using synthetic fluorogenic substrates (Saikia et al., 2010). Figures 1.6 and 1.7 show
these insects and their host plants.
Protease Inhibitors in plants: the natural antagonists of insect proteases
Protease inhibitors (PIs) are ubiquitous in plants and are major constituents of
seeds and storage organs (5–15% of total storage protein) (Jongsma and Bolter, 1997).
Various lines of evidence suggest that the major function of proteinase inhibitors is to
combat the proteinases of pests and pathogens, constituting an essential part of the plant’s
natural defense system (Green and Ryan, 1972; Hill and Hastie, 1987; Ryan, 1990;
Jongsma and Bolter, 1997; Schuler et al., 1998; Tiffin and Gaut, 2001; Mello and Silva-
Filho, 2002; Mello et al., 2003; Christeller, 2005; Christeller and Laing, 2005; Fan and
Wu, 2005). Serine proteinase inhibitors have been reported from a variety of plant
sources and are among the most widely studied class of PIs (Ryan, 1978, 1979, 1990;
Vodkin, 1981; Kim et al., 1985; Broadway et al., 1986a; Wolfson and Murdock, 1987;
Christeller and Shaw, 1989; Michaud et al., 1993; McManus and Burgess, 1995;
McManus et al., 1999; Mello et al., 2003; Haq and Khan, 2003; Christeller and Laing,
2005; Fan and Wu, 2005). These proteinase inhibitors have been reported to inhibit the
growth and development of a wide range of herbivores (Broadway and Duffey, 1986a, b;
Broadway, 1995; Oppert et al., 1993; Giri et al., 2003), by disrupting proteolysis in the
insect midgut. It was demonstrated that the expression of the cowpea trypsin inhibitor
(CPTI) as a transgene in tobacco significantly reduced the damage caused by Heliothis
virescens (Hilder et al., 1987).
The expression of plant PIs (PPIs) is well regulated, and may be constitutive and/or
induced by herbivory (Green and Ryan, 1972; Nelson et al., 1983; Ryan, 1990; Pearce et
al., 1993; Jongsma et al., 1994). Expression of PPIs is regulated by the jasmonic acid
(JA) signaling pathway, along with that of several other jasmonate-induced proteins (JIPs)
involved in plant defense (Weidhase et al., 1987; Farmer and Ryan, 1990, 1992;
Felton et al., 1994; Constabel et al., 1995; McConn et al., 1997; Baldwin,
1998; Turner et al., 2002; Howe, 2005; Chen et al., 2005; Lison et al., 2006; Chen et al.,
2007; Browse and Howe, 2008; Howe and Jander, 2008). In contrast to the induction of
JIPs involved in plant defense, the synthesis of several housekeeping proteins (especially
those involved in photosynthetic carbon assimilation) is repressed or down-regulated in
response to JA treatment. Table 1.3 summarizes some of the JA-responsive genes
reported to show altered expression (induction/repression) in response to insect
herbivory.
Table 1.3: Gene expression modulated in response to JA treatment/insect herbivory

| Plant gene product | Plant species | Response to herbivory/JA treatment | Suggested role | References |
|---|---|---|---|---|
| Proteinase inhibitors | Potato, tomato, alfalfa | Induced | Plant defense, inhibits digestive proteases | Farmer and Ryan, 1990; Johnson and Ryan, 1990; Farmer et al., 1992; Hildmann et al., 1992 |
| Polyphenol oxidase | Tomato | Induced | Plant defense, reduce the nutritional quality of ingested plant proteins | Felton et al., 1992; Constabel et al., 1995 |
| Threonine deaminase | Tomato | Induced | Plant defense, depletes the essential amino acid threonine in insect herbivore gut | Chen et al., 2005; Kang et al., 2006; Gonzales-Vigil et al., 2011 |
| Arginase | Tomato | Induced | Plant defense, depletes the essential amino acid arginine in insect herbivore gut | Chen et al., 2005 |
| Leucine aminopeptidase A | Tomato | Induced | Possibly works synergistically with arginase | Pautot et al., 1993; Gu et al., 1996; Chen et al., 2005 |
| Rubisco small subunit (RBCS) | Barley | Repressed | Plant defense, limit the nutritional value, reutilize the amino acids for JIP synthesis | Reinbothe et al., 1992, 1993a, 1994; Hermsmeier et al., 2001 |
| Rubisco large subunit (RBCL) | Barley | Repressed | Plant defense, limit the nutritional value, reutilize the amino acids for JIP synthesis | Reinbothe et al., 1992, 1993b |
| Light harvesting chlorophyll a/b binding protein | Barley | Repressed | Plant defense, limit the nutritional value, reutilize the amino acids for JIP synthesis | Reinbothe et al., 1993a, b |
Mode of action of plant proteinase inhibitors
Structural studies on several free inhibitors and enzyme-inhibitor complexes
(Ruhlmann et al., 1973; Huber et al., 1974; Sweet et al., 1974) showed that all inhibitors
have a hyper-exposed loop surrounding the P1 residue. In complexes with their cognate
enzymes, the reactive site loop of serine proteinase inhibitors associates with the catalytic
residues of the target enzyme(s) in a manner similar to that of bound substrates. This substrate-
like mechanism was termed the “standard mechanism of inhibition” (Laskowski and
Kato, 1980). In the 1970s, the mechanism of inhibition of porcine pancreatic trypsin (PPT)
by a plant proteinase inhibitor, soybean trypsin inhibitor (STI), was proposed, and later
the residues involved in this interaction were determined (figure 1.8) (Blow et al., 1974;
Baillargeon et al., 1980; Song and Suh, 1998). Since then, a number of protease-PI
structures have been solved, for example bovine trypsin complexed with Cucurbita
maxima trypsin inhibitor, CMTI-I (Bode et al., 1989), subtilisin BPN´ complexed with
Streptomyces subtilisin inhibitor (Takeuchi et al., 1991) and porcine trypsin complexed
with Momordica charantia trypsin inhibitor, MCTI-A (Huang et al., 1993).
Plant proteinase inhibitors and lepidopteran gut proteinases
Since serine proteinase inhibitors act in a substrate-like manner, proteolysis of food
is blocked in their presence. Chronic ingestion of proteinase inhibitors results in
pernicious hyper-production of proteolytic enzymes, which limits the essential
amino acids available for protein synthesis and retards growth and development (Broadway
and Duffey, 1986b; Johnston et al., 1993; Jongsma and Bolter, 1997). Insect larvae
respond to ingestion of PIs by altering their gut proteases quantitatively and/or
qualitatively. These responses include synthesis of “insensitive” proteinases, which
the PI is unable to bind and inhibit (Broadway, 1995; Jongsma et al., 1995; Bown et al.,
1997; Gatehouse et al., 1997; Mazumdar-Leighton and Broadway, 2001a, 2001b), or
synthesis of proteinase isoforms able to bind and degrade the PI (Giri et
al., 1998; Telang et al., 2005). This forms an interesting paradigm of plant-insect herbivore
interactions and, in some sense, an extension of the co-evolution theory proposed by
Ehrlich and Raven (1964) for herbivores and their host plants (Jongsma and Bolter,
1997; Mello and Silva-Filho, 2002). Some of the responses reported from lepidopteran
larvae upon feeding on inhibitor-containing natural or synthetic diets are listed in
table 1.4.
Table 1.4: Responses by some lepidopteran larvae upon feeding on plant PIs
Insect | PI type | Diet | Response | Reference
Heliothis zea | STI, PI-2 | Artificial diet | Reduction in growth and development, increase in trypsin activity | Broadway and Duffey, 1986a
Spodoptera exigua | STI, PI-2 | Artificial diet | Reduction in growth and development, increase in trypsin activity | Broadway and Duffey, 1986a
Heliothis virescens | CpTI | Transgenic tobacco | Reduction in growth and development and mortality | Hilder et al., 1987
Manduca sexta | PI-I, PI-II | Transgenic tobacco | Reduction in growth and development with PI-II | Johnson et al., 1989
Helicoverpa armigera | STI, SBBI | Artificial diet | Reduction in growth and development | Johnston et al., 1993
Spodoptera litura | STI | Artificial diet | Reduction in growth and development, increase in trypsin activity | McManus and Burgess, 1995
Spodoptera exigua | PI-2 | Transgenic tobacco | Increased tryptic activity insensitive to PI-2 | Jongsma et al., 1995
Helicoverpa zea | STI | Artificial diet | Increased proteolytic activity insensitive to STI | Broadway, 1997
Agrotis ipsilon | STI | Artificial diet | Increased proteolytic activity insensitive to STI | Broadway, 1997
Trichoplusia ni | STI | Artificial diet | Increased proteolytic activity insensitive to STI | Broadway, 1997
Helicoverpa armigera | STI | Artificial diet | Induction of inhibitor insensitive chymotrypsins | Bown et al., 1997
Helicoverpa armigera | Aprotinin, PI-I, PI-II, STI | Artificial diet | Upregulation of chymotrypsins and downregulation of trypsins | Gatehouse et al., 1997
Helicoverpa armigera | Cicer arietinum seed PIs | Chickpea seeds | Induction of proteinase isoforms capable of degrading the PIs | Giri et al., 1998
Helicoverpa zea | STI | Artificial diet | Upregulation of insensitive trypsins and chymotrypsins | Mazumdar-Leighton and Broadway, 2001a, b
Agrotis ipsilon | STI | Artificial diet | Upregulation of insensitive trypsins and chymotrypsins | Mazumdar-Leighton and Broadway, 2001a, b
Heliothis virescens | Nicotiana tabacum leaf PIs | Tobacco leaves | Synthesis of trypsins that form oligomers | Brito et al., 2001
Helicoverpa zea | STI | Artificial diet | Synthesis of insensitive trypsin | Volpicella et al., 2003
(*PI-2: potato proteinase inhibitor II; CpTI: cowpea trypsin inhibitor; PI-I & II: tomato proteinase
inhibitors; STI: soybean (Kunitz) trypsin inhibitor; SBBI: soybean Bowman-Birk inhibitor)
Assays used for the characterization of insect gut proteases
Assays described in previous sections for the mammalian digestive proteases are
also used for the characterization of lepidopteran midgut proteases. Total proteolytic
activities are characterized using protein substrates in the absence and presence of proteinase
inhibitors of chemical or plant origin, viz. Pefabloc (4-(2-aminoethyl)-benzenesulfonyl
fluoride hydrochloride) for serine proteinases, EDTA (ethylenediaminetetraacetic acid) for
metalloproteinases, E-64 (L-trans-epoxysuccinyl-L-leucylamido(4-guanidino)butane) for
cysteine proteinases and antipain, which inhibits both serine and cysteine proteases. The
diagnostic serine proteinase inhibitors used routinely include TLCK, which
indicates histidine at the active site of trypsin (Shaw, 1967); TPCK, which indicates
histidine at the active site of chymotrypsin (Schoellman and Shaw, 1963); and STI (soybean
Kunitz trypsin inhibitor) and SBBI (soybean Bowman-Birk inhibitor), naturally
occurring plant proteinase inhibitors known to inhibit both trypsin-like and chymotrypsin-like
activities of lepidopteran insects. Protein substrates used routinely include casein,
haemoglobin, BSA and gelatin, their azo dye-conjugated derivatives (Pritchett et al.,
1981; Broadway and Duffey, 1986b; Hamed and Attias, 1987; Broadway, 1989; Purcell
et al., 1992; Jongsma et al., 1995; Ortego et al., 1996; Giri et al., 1998; Patankar et al.,
2001; Parde et al., 2010; Zibaee, 2012) and fluorescent conjugates such as BODIPY-casein
(Li et al., 2004; George et al., 2008). Radiolabeled (14C) Rubisco, along with casein, was used for the
characterization of midgut protease activities in 12 phytophagous lepidopteran larvae
(Christeller et al., 1992). The rates of hydrolysis of native Rubisco were found to be 2- to
15-fold lower than those of the milk phosphoprotein casein (Christeller et al., 1992).
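The inhibitor profiling described above comes down to comparing activity measured with and without each diagnostic inhibitor. The sketch below is a minimal illustration, assuming hypothetical activity readings and an arbitrary 50% cut-off; the function names, the cut-off and the numbers are illustrative assumptions and are not taken from the cited studies.

```python
# Protease classes indicated by the diagnostic inhibitors named in the text.
INHIBITOR_CLASS = {
    "Pefabloc": "serine proteinase",
    "EDTA": "metalloproteinase",
    "E-64": "cysteine proteinase",
    "TLCK": "trypsin-like serine proteinase",
    "TPCK": "chymotrypsin-like serine proteinase",
}


def percent_inhibition(control_activity, inhibited_activity):
    """Return inhibition (%) relative to the uninhibited control."""
    if control_activity <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * (1.0 - inhibited_activity / control_activity)


def sensitive_classes(control_activity, activity_with_inhibitor, threshold=50.0):
    """List protease classes whose inhibitor lowers activity by >= threshold %.

    The 50% default is an arbitrary, hypothetical cut-off.
    """
    hits = []
    for inhibitor, activity in activity_with_inhibitor.items():
        if percent_inhibition(control_activity, activity) >= threshold:
            hits.append(INHIBITOR_CLASS[inhibitor])
    return hits


# Hypothetical readings in arbitrary units, not experimental data.
readings = {"Pefabloc": 0.10, "EDTA": 0.95, "E-64": 0.90, "TLCK": 0.30}
print(sensitive_classes(1.0, readings))
```

With these made-up readings only the Pefabloc and TLCK entries exceed the cut-off, which would point to a predominantly trypsin-like serine proteinase activity.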
Serine proteinase activities have been characterized using ester substrates such as
N-α-Benzoyl-L-Arginine Ethyl Ester (BAEE), N-α-Benzoyl-L-Tyrosine Ethyl Ester (BTEE)
and p-toluenesulfonyl-L-arginine methyl ester (TAME) (Broadway and Duffey, 1986b;
Broadway and Duffey, 1988; Broadway, 1989; Christeller et al., 1989; Lenz et al., 1991).
Amide substrates have also been employed extensively, including
N-α-Benzoyl-DL-Arginine-p-nitroanilide (BApNA), N-α-Benzoyl-L-Tyrosine-p-nitroanilide (BTpNA),
N-Succinyl-Ala-Ala-Pro-Phe-p-nitroanilide (SAAPFpNA) and
N-Succinyl-Ala-Ala-Pro-Leu-p-nitroanilide (SAAPLpNA) (Christeller et al., 1989, 1992; Lenz et al., 1991;
Johnston et al., 1993; Christeller et al., 1994; Ferreira et al., 1994; Peterson et al., 1994,
1995; Broadway, 1995; Lee and Anstee, 1995; Valaitis, 1995; Broadway, 1997; Wu et
al., 1997; Johnson and Felton, 2000; Volpicella et al., 2000, 2003; Herrero et al., 2005;
Zavala et al., 2008; Zibaee, 2012).
More recently, specific fluorogenic substrates such as
N-α-Benzoyl-Arg-7-amino-4-methylcoumarin (Bz-RMCA) and
N-Succinyl-Ala-Ala-Pro-Phe-7-amino-4-methylcoumarin (SAAPFAMC) have also been employed (Mazumdar-Leighton and
Broadway, 2001a, b; Brito et al., 2001; Lopes et al., 2004, 2006; Chougule et al., 2008;
Sato et al., 2008; Lopes et al., 2009; Saikia et al., 2010). Gel-based analysis of
midgut proteases is similar to that described for the mammalian or bacterial
systems (Michaud, 1998). The protein substrates usually used are casein or gelatin
(Forcada et al., 1996; Brito et al., 2001; Zeng et al., 2002; Hegedus, 2003; Li et al., 2004;
Karumbaiah et al., 2007; Budatha et al., 2008; Chougule et al., 2008; Erlandson et al.,
2010; Saikia et al., 2010), but the use of p-nitroanilide substrates has also been reported
(Oppert et al., 1996; Oppert et al., 2000; Zhu et al., 2000; Vinokurov et al., 2005; Oppert,
2006). In these gel-based assays, the types of activity are characterized using specific
proteinase inhibitors.
Rubisco: the holoenzyme
Rubisco, the most abundant plant protein, is the major protein consumed by
phytophagous insects, as it constitutes up to 50% of the soluble leaf protein of C3 plants
(Ellis, 1979; Spreitzer and Salvucci, 2002) and 20–30% of total leaf nitrogen (Evans and
Seemann, 1989; Kumar et al., 2002; Makino, 2003). Rubisco is thus a major source of
essential amino acids for phytophagous insects. Figure 1.9 shows the amino acid
composition of Nicotiana tabacum Rubisco large and small subunits (UniProtKB
Accession No. P00876 and P69246), calculated using the ProtParam tool available at
http://web.expasy.org/protparam/ (Gasteiger et al., 2005). The essential amino acids (Arg,
His, Ile, Leu, Lys, Met, Phe, Thr, Trp, Val) constitute 49% of the total amino acid
content of N. tabacum RBCL and 44% of that of N. tabacum RBCS.
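The essential amino acid share quoted above can be recomputed from any protein sequence. The following is a minimal sketch, assuming the percentage is taken by residue count; the function name and the toy peptide are hypothetical, and the peptide is not one of the Rubisco sequences.

```python
# One-letter codes for Arg, His, Ile, Leu, Lys, Met, Phe, Thr, Trp, Val.
ESSENTIAL = set("RHILKMFTWV")


def essential_fraction(sequence):
    """Return the percentage of residues that are essential amino acids."""
    residues = [aa for aa in sequence.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("empty sequence")
    essential = sum(1 for aa in residues if aa in ESSENTIAL)
    return 100.0 * essential / len(residues)


# Hypothetical toy peptide used only to exercise the function.
toy_peptide = "MKTAYLGWER"
print(round(essential_fraction(toy_peptide), 1))
```

Applying the same function to the sequences retrieved under the accession numbers given above should give values close to those quoted, provided the same counting convention is used.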
Rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase; EC 4.1.1.39) is one
of the largest enzymes in nature (~560 kDa). This enzyme catalyzes the initial step
in the fixation of inorganic carbon by photosynthesizing organisms. The process starts
with the condensation of one molecule of CO2 with a five-carbon sugar,
ribulose-1,5-bisphosphate (RuBP), resulting in the formation of two molecules of
3-phosphoglycerate (3-PGA). This carboxylation reaction, catalyzed by Rubisco, is the entry point for the
incorporation of atmospheric CO2 into sugars during photosynthesis. The Rubisco
holoenzyme in green plants, cyanobacteria and non-green algae is a hexadecameric
structure (L8S8, Form I) comprising eight identical chloroplast-encoded large subunit
polypeptides (~55 kDa) and eight nuclear-encoded small subunit polypeptides (~15 kDa)
(Rutner and Lane, 1967; Nishimura et al., 1973; Baker et al., 1977a, b).
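As a quick consistency check, the approximate subunit masses quoted above add up to the approximate holoenzyme mass. The helper below is only an illustrative sketch; its name and default arguments are assumptions chosen to mirror the rounded values in the text.

```python
def holoenzyme_mass_kda(n_large=8, large_kda=55.0, n_small=8, small_kda=15.0):
    """Approximate mass (kDa) of a Rubisco complex from its subunit masses."""
    return n_large * large_kda + n_small * small_kda


# L8S8 with the rounded subunit masses quoted in the text: 8 x 55 + 8 x 15.
print(holoenzyme_mass_kda())
```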
Rubisco large subunit (RBCL)
The first rbcL gene was cloned and sequenced in the late 1970s from Zea mays (Coen et
al., 1977; McIntosh et al., 1980). Since then, rbcL sequences have been added to the NCBI
database at an exponential rate, mainly for use in plant systematics and
phylogeny (Clegg, 1993; Chase et al., 1993; Hasebe et al., 1995; Kellogg and Juliano,
1997). The length of the encoded RBCL has been reported to vary slightly among higher
plants because of whole-codon insertions/deletions/substitutions at the extreme 3' end of
rbcL genes (typically from about nucleotide position 1407 to 1431, numbering according
to maize rbcL) (Clegg, 1993; Kellogg and Juliano, 1997; Spreitzer and Salvucci, 2002).
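To relate the nucleotide positions above to positions in the protein, a nucleotide coordinate can be converted to a codon number. This is a minimal sketch, assuming 1-based numbering that starts at the first nucleotide of the start codon; the function name is hypothetical.

```python
def codon_number(nucleotide_position):
    """Map a 1-based coding-sequence nucleotide position to its 1-based codon."""
    if nucleotide_position < 1:
        raise ValueError("positions are 1-based")
    return (nucleotide_position - 1) // 3 + 1


# The variable 3' region quoted in the text, expressed as codon numbers.
print(codon_number(1407), codon_number(1431))
```

Under that numbering assumption the variable region corresponds to roughly the last ten codons, which is consistent with the length variation being confined to the extreme C-terminus of RBCL.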
The L2 dimer is the basic functional unit of plant Rubisco molecules (Schneider et al.,
1990). The active site of Rubisco is formed by the juxtaposition of the N-terminal domain
of one large subunit with the C-terminal domain (α/β barrel domain) of the second large
subunit of an L2 dimer. The two large subunits dimerize “head to tail”, with the
N-terminal domain of one covering the top (C-terminal side) of the barrel of the other. Each
dimer thus has two active sites, each involving a total of 20 residues from the barrel of
one subunit and the N-terminal domain of the other (Lorimer, 1981; Hartman et al., 1984;
Igarashi et al., 1985; Kellogg and Juliano, 1997). Two well-defined domains can be
identified in a Rubisco large subunit: a smaller N-terminal domain (residues 1-150) and a
larger C-terminal or α/β barrel domain (residues 151-475). Figure 1.10 shows the
structures of RBCL, RBCS and the holoenzyme from Nicotiana tabacum (PDB ID
1EJ7, Duff et al., 2000). Comparison of the amino acid sequence of the large subunit of
spinach Rubisco (Zurawski et al., 1981) with that of a form II Rubisco (a
dimer of two large subunits only) from Rhodospirillum rubrum (Hartman et al., 1984;
Nargang et al., 1984) showed only 28% similarity, but a striking similarity was seen in
the overall three-dimensional structure of the L2 dimers of these two very different
Rubisco molecules (Schneider et al., 1990; Knight et al., 1990), suggesting conservation
of the overall structure of large subunits in different forms.
Rubisco degradation in plants
Rubisco constitutes a significant portion of the total leaf nitrogen in plants, and
once its enzymatic functions are fulfilled, proteolytic degradation of the holoenzyme
occurs in senescing leaves, leading to remobilization of the reserved nitrogen (Peterson
and Huffaker, 1975; Friedrich and Huffaker, 1980; Feller, 1990; Fischer and Feller, 1994;
Crafts-Brandner et al., 1996, 1998; Feller et al., 2008). Rubisco synthesis takes place at
very high rates during leaf expansion, and its net degradation starts shortly after full leaf
expansion, when Rubisco becomes a major nitrogen source for other plant parts such as
young leaves and developing fruits (Mae, 2004). Rubisco catabolism and relocation
of the amino acids are controlled by many factors apart from natural senescence (Feller
and Fischer, 1994). In vivo Rubisco catabolism has been reported to occur in response to
several environmental and endogenous factors (Nooden et al., 1997; Demirevska-Kepova
and Feller, 2004; Marin-Navarro and Moreno, 2006; Feller et al., 2008); for example,
nitrogen limitation (Crafts-Brandner et al., 1996, 1998), dark-induced senescence (Feller
and Fischer, 1994) and harsh climatic conditions such as heat, drought and waterlogging
(Herrmann and Feller, 1998; Demirevska-Kepova and Feller, 2004) have been reported to
induce rapid Rubisco degradation in plants. Rubisco degradation has been reported to be
accelerated in maize after fruit removal (Crafts-Brandner and Poneleit, 1987) and
to vary among species and genotypes (Crafts-Brandner and
Poneleit, 1987; Nakano et al., 1995). Insect herbivory has also been reported to induce
senescence-like symptoms such as chlorophyll loss and Rubisco degradation, possibly
leading to reutilization of the amino acids for synthesis of plant defense proteins (see table
1.3).
The proteolytic digestion of Rubisco in plants during senescence and stress
conditions is not well understood. Catalytically active Rubisco has been reported to be
protected against proteolytic degradation (Mulligan et al., 1988; Houtz and Mulligan,
1991; Khan et al., 1999). Oxidative modification of cysteine residues has been reported
to enhance the proteolytic susceptibility of Rubisco (Penarrubia and Moreno, 1990;
Mehta et al., 1992; Desimone et al., 1996; Ishida et al., 1997, 1999; Roulin and Feller,
1998; Moreno et al., 2008). Oxidative stress has been reported to stimulate intermolecular
crosslinking of Rubisco subunits by disulfide linkages within the holoenzyme, inhibition
of enzyme activity and rapid translocation to the chloroplast membrane, leading to
degradation (Mehta et al., 1992). Endogenous proteolytic activities belonging to different
classes and active at different pH values have been reported to be involved in proteolytic
degradation of Rubisco subunits (Wittenbach, 1978; Wittenbach, 1979; Miller and
Huffaker, 1982; Shurtz-Swirski and Gepstein, 1985; Paech and Dybing, 1986; Casano et
al., 1989; Yoshida and Minamikawa, 1996). Several reports suggest that Rubisco can be
degraded in intact chloroplasts (Ragster and Chrispeels, 1981; Mitsuhashi et al., 1992;
Desimone et al., 1996, 1998; Roulin and Feller, 1997, 1998; Ishida et al., 1998; Zhang et
al., 2007; Feller et al., 2008). A Rubisco-degrading metalloproteinase has been reported
from pea chloroplasts (Bushnell et al., 1993). Another prevailing hypothesis is that
vacuolar enzymes are involved in the proteolytic digestion of Rubisco (Lin and
Wittenbach, 1981; Miller and Huffaker, 1981, 1982; Thayer and Huffaker, 1984; Bhalla
and Dalling, 1986; Zhang et al., 2006). Whether Rubisco proteolysis in chloroplasts
and in senescence-associated vacuoles (SAVs) acts cooperatively or represents alternative
pathways is not yet understood (Martinez et al., 2008; Feller et al., 2008).
Expression of recombinant proteins in Escherichia coli
E. coli is by far the most extensively used prokaryotic expression system owing to
its ability to grow on inexpensive carbon sources, rapid biomass accumulation,
amenability to high cell-density fermentation, and the availability of extensive genetic
information and a large number of cloning vectors and mutant host strains (Baneyx, 1999;
Baneyx and Mujacic, 2004). The T7 promoter-based pET expression system
(commercialized by Novagen) is the most preferred system for expression of recombinant
proteins (Studier et al., 1990; Studier, 1991; Dubendorff and Studier, 1991). This system
offers several advantages, such as precise control over target protein expression, hybrid
promoters, multiple cloning sites for the incorporation of different fusion partners and
protease cleavage sites, and a large number of genetic backgrounds modified for
various expression purposes (Sorensen and Mortensen, 2005). Figure 1.11 shows the map
of a pET vector. For the pET expression system, BL21 and its derivatives are the most
widely used E. coli hosts. A number of host strains have been developed to
overcome the problems of codon bias and improper folding of heterologous proteins
expressed in E. coli. For example, Origami™ and Origami B strains promote formation of
properly folded disulfide-containing recombinant proteins (Prinz et al., 1997). To
circumvent the problem of codon bias, Rosetta™ and RosettaBlue™ strains have been
developed, which supply tRNAs for six codons rarely used in E. coli (Kane, 1995;
Kurland and Gallant, 1996). Another strategy applied for high-throughput recombinant
protein expression and purification involves the use of affinity tags. Affinity tags such as
maltose binding protein (MBP) and glutathione S-transferase (GST) have been shown to
have a beneficial effect on the solubility of recombinant proteins expressed in E. coli (Kapust
and Waugh, 1999; Smith, 2000). However, these tags can interfere with proper folding,
function and crystallization of heterologous proteins (Braun and LaBaer, 2003). Smaller
tags such as the His6-tag are a popular choice because their small size carries a lower risk of
steric hindrance and because they bind strongly but reversibly to metal chelate adsorbents (Hochuli et
al., 1988; Crowe et al., 1994; Braun and LaBaer, 2003; Knecht et al., 2009). The His6-tag is
particularly useful for heterologous proteins typically expressed in inclusion bodies,
because it functions even under denaturing conditions, so the tagged protein can be purified
by affinity chromatography in the presence of denaturants (Bornhorst and Falke, 2000;
Braun and LaBaer, 2003).
Inclusion bodies (IBs) and recombinant proteins
Overproduction of heterologous proteins that require post-translational
modifications and disulfide bond formation for correct folding and functional activity
often leads to misfolding and segregation of the recombinant protein into insoluble
aggregates called inclusion bodies (Baneyx, 1999). The structure and mechanism of
formation of inclusion bodies in E. coli are not well understood (Villaverde and Carrio,
2003). Significant features of protein aggregates in inclusion bodies are the existence of
native-like secondary structure in the expressed protein and resistance to proteolytic
degradation (Przybycien et al., 1994; Oberg et al., 1994; Umetsu et al., 2004; Singh and
Panda, 2005). Inclusion body formation during recombinant expression is generally undesirable,
but it offers several advantages, viz., (a) expression of a very high level
of target protein, which typically accounts for 80-95% of the inclusion body material, (b)
easy isolation of the inclusion bodies from cells owing to differences in their size and
density compared with cellular contaminants, and (c) resistance to proteolytic attack
by cellular proteases (Thatcher and Hitchcock, 1994; Baneyx and Mujacic, 2004; Singh
and Panda, 2005). The recovery of expressed protein from inclusion bodies essentially
involves isolation and washing of inclusion bodies from E. coli cells, solubilization, and
refolding of the target protein (Rudolph et al., 1997; Clark, 1998; Lilie et al., 1998; Clark,
2001; Vallejo and Rinas, 2004). Efficient refolding of the target protein from inclusion
bodies requires considerable optimization. Figure 1.12 shows a schematic representation of
the different steps that can be followed for recovery of refolded proteins from
inclusion bodies (Clark, 2001; Vallejo and Rinas, 2004). There are many reports of
recovery and refolding of active recombinant proteins from E. coli inclusion bodies
(Yesilirmak and Sayers, 2009). Examples include refolding of human gamma interferon
from inclusion bodies using L-arginine (Arora and Khanna, 1996; Babu et al., 2000);
recovery of Arabidopsis thaumatin-like protein (ATLP3) (Hu and Reddy, 1997); refolding of
Solanum nigrum osmotin-like protein (SnOLP) using a reduced:oxidized glutathione redox
buffer (Campos et al., 2008); and solubilization of soybean RHG1-LRR domain protein from
inclusion bodies with urea buffer, followed by refolding through removal of the urea in the presence of
arginine and reduced:oxidized glutathione buffer (Afzal and Lightfoot, 2007).
Recombinant expression of Rubisco in E. coli
The genes encoding Rubisco large subunits from maize and wheat were the
first plant genes reported to be expressed in E. coli (Gatenby et al., 1981; Gatenby
and Castleton, 1982). The full-length maize RBCL was found exclusively in the insoluble
fraction of E. coli cultures (Gatenby, 1984). Several groups reported recombinant
expression of active cyanobacterial Rubisco when both subunits were expressed in E.
coli (Christeller et al., 1985; Gatenby et al., 1985; Gurevitz et al., 1985; Tabita and Small,
1985). In attempts to obtain active recombinant Rubisco from higher plants, maize
RBCL and wheat RBCS were co-expressed in E. coli, but inactive RBCL-RBCS
aggregates were found predominantly in the insoluble fraction, suggesting the
involvement of other factors (Gatenby et al., 1987). An active hybrid Rubisco was
expressed recombinantly in E. coli using cyanobacterial RBCL and wheat RBCS (van der
Vies et al., 1986). It was demonstrated that assembly of active prokaryotic
Rubiscos from Anacystis nidulans and R. rubrum in recombinant E. coli cultures
requires the endogenous E. coli heat shock proteins GroEL and GroES
(Goloubinoff et al., 1989). Similarly, Rubisco from different bacterial sources
(purple bacteria and cyanobacteria) was expressed recombinantly in E. coli cells to study
the role of the DnaK chaperone system in its proper folding (Checa and Viale, 1997). It has
now been established that in vivo protein folding and biological assembly of some
macromolecules, such as Rubisco, may require a complex cellular machinery of molecular
chaperones (Ellis and Hemmingsen, 1989; Gething and Sambrook, 1992; Hartl, 1996;
Frydman, 2001; Hartl and Hayer-Hartl, 2002).
MALDI-TOF peptide mass fingerprinting
Mass spectrometry (MS) based on soft ionization techniques, viz. MALDI
(matrix-assisted laser desorption/ionization) (Tanaka et al., 1988; Karas and Hillenkamp,
1988) and ESI (electrospray ionization) (Yamashita and Fenn, 1984a, b), is a rapid and
sensitive tool for the analysis of proteins and peptide mixtures. The basic principle of
MS is to generate ions from either inorganic or organic compounds
by any suitable method, to separate these ions by their mass-to-charge ratio (m/z) and to
detect them qualitatively and quantitatively by their respective m/z and abundance.
MALDI-TOF-MS has become a widely applied, powerful analytical tool in various fields
of biological sciences (Schluter et al., 2003). MALDI-TOF is well suited to biological samples
because it is fast, simple, sensitive (low fmol range), accurate (low ppm range),
compatible with phosphate and Tris buffers, tolerant of low levels of contaminants such
as salts and surfactants, and can be automated (Thiede et al., 2005; Pan et al., 2007). In
most cases, sequence-specific peptide fragmentation or peptide mass fingerprinting
(PMF), also called peptide mass mapping, is sufficient to identify the protein(s) in a
sample (Henzel et al., 1993; Aebersold and Goodlett, 2001; Thiede et al., 2005). In PMF,
the protein or protein mixture is digested with a sequence-specific enzyme. Trypsin is
usually the favored enzyme for PMF because it is relatively cheap, highly effective, and
generates peptides with an average size of about 8–10 amino acids, ideally suited for
analysis by MS (Thiede et al., 2005). Before transfer into the ionisation source, the
peptide or protein samples are mixed with a matrix (an organic compound) and the
mixtures are left to crystallize in small droplets on a target. The energy required for
ionization is provided by a laser. The matrix crystal absorbs energy and
converts it to heat, leading to a localized “explosion” of the sample/matrix (the plume jet).
Proton transfer in the resulting plasma yields the ions ([M+H]+); being in the gas phase,
these ions are accelerated by an electric field into the analyzer and separated according to
their m/z values. MALDI mainly produces singly charged ions, so the m/z ratio will in most
cases have the same value as the mass of the peptide plus one proton ([M+H]+). The
observed m/z values are matched against theoretically calculated m/z values of peptides
derived from sequences in the database, and a score reflecting the quality of the match is assigned.
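The matching step described above can be sketched in a few lines. This is a simplified illustration, not the scoring scheme of any particular search engine: it assumes the common trypsin rule (cleavage after Lys or Arg unless the next residue is Pro), no missed cleavages, unmodified peptides and monoisotopic masses. The function names, the 50 ppm window, the toy sequence and the observed peak are all hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added on peptide bond hydrolysis
PROTON = 1.00728   # charge carrier for [M+H]+


def tryptic_peptides(sequence):
    """Cleave after K or R unless followed by P; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def match_peaks(observed_mz, sequence, tolerance_ppm=50.0):
    """Pair observed m/z values with theoretical peptides within a ppm window."""
    matches = []
    for peptide in tryptic_peptides(sequence):
        theoretical = mh_plus(peptide)
        for mz in observed_mz:
            if abs(mz - theoretical) / theoretical * 1e6 <= tolerance_ppm:
                matches.append((peptide, round(theoretical, 4), mz))
    return matches


# Hypothetical toy sequence and a single made-up observed peak.
print(tryptic_peptides("AGKRPLKMW"))
print(match_peaks([275.1720], "AGKRPLKMW"))
```

Real search engines additionally allow for missed cleavages and modifications and convert the number of matched peptides into a probability-based score.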
Post-translational or chemical modifications of proteins and peptides can complicate
MALDI-TOF PMF analyses, because these modifications alter the observed masses.
Such modifications may be natural or may arise inadvertently during sample handling (Karty et
al., 2002; McCarthy et al., 2003; Froelich and Reid, 2008). However, software tools
such as FindMod (http://web.expasy.org/findmod/; Wilkins et al., 1999;
Gasteiger et al., 2005) can examine PMF data for mass differences between
experimental and theoretical masses and predict the nature and site of modification(s)
from the mass difference. Table 1.5 lists various sources of tools for MS-based
protein identification. For customized applications, for example proteomic studies of
Lepidoptera, specific databases such as ButterflyBase (http://www.butterflybase.org) can be
used.
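The idea of inferring a modification from a mass difference can be illustrated as follows. This is a minimal sketch of the principle only and does not reproduce FindMod's own rules; the short list of mass shifts, the tolerance and the function name are assumptions made for the example, and the masses passed to it are made up.

```python
# Monoisotopic mass shifts (Da) of a few common modifications.
KNOWN_SHIFTS = {
    "oxidation (e.g. Met)": 15.99491,
    "acetylation": 42.01057,
    "carbamidomethylation (Cys, iodoacetamide)": 57.02146,
    "phosphorylation": 79.96633,
}


def suggest_modifications(observed_mass, theoretical_mass, tolerance_da=0.02):
    """Return modifications whose mass shift explains the observed difference."""
    delta = observed_mass - theoretical_mass
    return [name for name, shift in KNOWN_SHIFTS.items()
            if abs(delta - shift) <= tolerance_da]


# Hypothetical peptide observed 16.00 Da heavier than its theoretical mass.
print(suggest_modifications(1016.00, 1000.00))
```

A shift alone does not locate the modification; candidate sites still have to be restricted to residues that can carry it, which is why such tools also consider the peptide sequence.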
Table 1.5: Sources for MS-Based Protein Identification Tools
Sponsor (Application) | Uniform Resource Locator (URL)
Eidgenossische Technische Hochschule (MassSearch) | http://cbrg.inf.ethz.ch
European Molecular Biology Laboratory (PeptideSearch) | http://www.mann.emblheidelberg.de
Swiss Institute of Bioinformatics (ExPASy) | http://www.expasy.ch/tools
Matrix Science (Mascot) | http://www.matrixscience.com
Rockefeller University (PepFrag, ProFound) | http://prowl.rockefeller.edu
Human Genome Research Center (MOWSE) | http://www.seqnet.dl.ac.uk
University of California (MS-Tag, MS-Fit, MS-Seq) | http://prospector.ucsf.edu
Institute for Systems Biology (COMET) | http://www.systemsbiology.org
University of Washington (SEQUEST) | http://thompson.mbt.washington.edu/sequest
(Source: Aebersold and Goodlett, 2001)
Tandem mass spectrometry (MS/MS) and de novo sequencing
PMF is usually complemented by tandem mass spectrometry (MS/MS) for
unambiguous identification of proteins in a complex mixture and for confirmation of
results obtained by PMF. MS/MS provides sequence-specific fragmentation patterns of
individual peptides that can be used for database searches. A high-throughput de novo
sequencing approach may be critical for detection of amino acid polymorphisms,
characterization of post-translational modifications (PTMs) and identification of proteins
not represented in sequence databases (Mann and Wilm, 1994; Taylor and Johnson, 2001;
Standing, 2003; Pan et al., 2010). Partial sequences obtained by MS/MS, called “peptide
sequence tags”, together with the molecular weight form a unique signature of the precursor
peptide (Mann and Wilm, 1994) and can be used reliably to resolve ambiguities
arising from amino acid substitutions and unexpected PTMs (Mann and Wilm, 1994;
Suckau et al., 2003; Pan et al., 2010). Fragment ions called “post source decay” (PSD)
ions, which are produced during flight in the field-free drift region of a MALDI-TOF
mass spectrometer, have been used for peptide sequencing (Spengler et al., 1992a, b; Suckau and
Cornett, 1998). The PSD spectra of peptides contain a wealth of sequence-specific a and b ions
(containing the N-terminus of the fragmented peptide), y ions (containing the C-terminus of
the fragmented peptide) and i ions (internal ions) (Biemann, 1988; Papayannopoulos,
1995). These ion series ladders are used to read the sequence of the peptide. In addition to
protein identification, MALDI-TOF MS/MS has been used routinely to detect
post-translational modifications of proteins (Huberty et al., 1993; Patterson and Katta,
1994; Crimmins et al., 1995; Mann and Talbo, 1996; Wilkins et al., 1999).
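The ion series ladders mentioned above can be generated for any peptide. The sketch below computes only singly charged b and y ions, using monoisotopic masses and assuming an unmodified peptide; a ions, internal ions and neutral losses are left out, and the function names and example peptide are hypothetical. Successive differences within either series equal residue masses, which is what allows the sequence to be read.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def b_ions(peptide):
    """Singly charged b ions b1..b(n-1): N-terminal fragments plus a proton."""
    masses, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total + PROTON)
    return masses


def y_ions(peptide):
    """Singly charged y ions y1..y(n-1): C-terminal fragments plus water and a proton."""
    masses, total = [], 0.0
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        masses.append(total + WATER + PROTON)
    return masses


# Hypothetical example peptide, used only to show the two ladders.
example = "PEPTIDE"
print([round(m, 4) for m in b_ions(example)])
print([round(m, 4) for m in y_ions(example)])
```

A useful check on any such ladder is that each b ion and its complementary y ion sum to the neutral peptide mass plus two protons.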
Uses of MALDI-TOF and associated techniques in insect-plant interaction studies
Mass spectrometry has been used extensively to study the components
involved in insect-plant interactions and also to study whole insect gut or plant
proteomes. For example, a “shotgun” MS-based proteomics approach was employed
for identifying JIPs accumulating in Manduca sexta larval midguts upon feeding on
transgenic tomato plants that were either insensitive to JA treatment or overproducing
plant defense proteins (Chen et al., 2005). In another report, the interaction of
Capsicum annuum PIs (CanPIs) with H. armigera gut proteases was studied using MS in
combination with the intensity fading assay technique (Mishra et al., 2010). Similarly,
comparative protein profiling of midguts of H. armigera fed on host-based (chickpea) and
non-host-based (Cassia tora) diets was done using LC-MS (Dawkar et al., 2011).
Recently, LC-ESI MS/MS has also been used to study Spodoptera exigua
caterpillar-specific post-translational modification of Arabidopsis thaliana soluble leaf
proteins (Thivierge et al., 2010). Other examples of proteome studies include
characterization of the midgut lumen proteome of Helicoverpa armigera using 2D gel
electrophoresis and de novo MS/MS sequencing (Pauchet et al., 2008) and protein
profiling of sixth-instar Spodoptera litura larval midguts using HPLC-ESI-MS/MS
shotgun analysis (Liu et al., 2010). In addition, two-dimensional gelatin zymograms
coupled with MALDI-TOF-MS have been employed for the analysis of pineapple and
green kiwi fruit proteinases (Larocca et al., 2010a, b). Similarly, cellulose- and xylan-degrading
enzymes were identified from larval guts of the Asian longhorned beetle,
Anoplophora glabripennis (Geib et al., 2010).
Scope of the thesis
Protein substrates commonly used for measuring total protease activity in insect
gut samples include casein, BSA and hemoglobin. These animal proteins are readily
available but are not physiological substrates for herbivores. Phytophagous lepidopteran
larvae consume enormous quantities of green tissues containing the photosynthetic
enzyme Rubisco. Rubisco was recognized early as an ideal candidate protein substrate
for measuring larval gut protease activities. However, its digestibility has always been
poor. Possible obstacles to successful application of this protein substrate include
structural resilience to proteolysis as well as the labor and cost of purifying Rubisco
from host plants. Hence, this study investigates the potential and applicability of the
recombinant large subunit of Rubisco (RBCL) as a protein substrate for midgut proteases
of three species of phytophagous Lepidoptera whose larvae are known to feed on
different plants. Helicoverpa armigera (the bollworm) is a devastating polyphagous pest
in India; Pieris brassicae (the large white butterfly) is a serious, recurring pest of
cultivated crucifers; and Antheraea assamensis (the muga silkworm) is a beneficial,
semi-domesticated insect reared on the leaves of Persea bombycina (Som) and Litsea
monopetala (Sualu). Larval gut proteases of these insects include serine proteinases with
high pH optima. The gut physiology of these insects is complex. Multiple gut serine
proteinases that are expressed differentially in response to dietary factors have been reported from
these insects. Most of these studies used synthetic substrates and/or animal protein
substrates, viz. casein. Digestion of a common host plant protein substrate by gut
proteases of these phytophagous larvae feeding on different foods has not been
investigated. Hence, this dissertation research was carried out with the following
objectives:
(1) Isolation of rbcL genes from Persea bombycina and Litsea
monopetala (family Lauraceae; host plants of the saturniid silkworm Antheraea
assamensis), Nicotiana tabacum (family Solanaceae; host plant of the noctuid
pest Helicoverpa armigera) and Brassica oleracea var. botrytis (family
Cruciferae; host plant of the pierid pest Pieris brassicae).
(2) Expression and purification of the large subunit of Rubisco as tagged,
recombinant proteins in E. coli.
(3) Evaluation of cognate (from the host plant) versus heterologous
(from a non-host plant) recombinant RBCL as substrates for gut proteases
from insects fed on different foods.
(4) Evaluation of casein, recombinant RBCL, and Rubisco as
substrates for mammalian serine proteinases, and gut proteases from insects
fed on different foods, in the presence of protease inhibitors (PIs) and the reducing
agent dithiothreitol (DTT).
(5) Substrate zymography using copolymerized recombinant RBCL and
Rubisco to visualize gut proteases from insects fed on different diets.
(6) MALDI-TOF peptide mass fingerprinting and MS/MS of
Rubiscolytic products from in-gel digestion by midgut proteases and trypsin.
This dissertation is organized into five chapters. Chapter one contains a brief introduction
to the subject and a review of pertinent literature, while Chapter two describes the materials and
methods employed. Chapter three contains results and discussion on recombinant RBCL
as a protease substrate for mammalian serine proteinases, larval H. armigera gut
proteases, larval P. brassicae gut proteases and gut proteases of larval A. assamensis fed on different
species of host plants. Chapter four describes MALDI-TOF analyses of recombinant
RBCL and Rubisco digested by various serine proteinases and insect gut proteases.
Chapter five contains a summary and conclusions of this dissertation. Cited references
are provided thereafter.