Chapter 1: Introduction and Review of Literature

Introduction to proteases

Proteases were once believed to be nonspecific degradative enzymes, but are now

recognized to have well-defined physiological roles with tightly controlled spatial and

temporal regulation of activity. Proteases act as regulators of biological processes during

the life span of all organisms (Neurath, 1984). They control activation, synthesis and

turnover of proteins as they carry out an unlimited number of hydrolytic reactions both intra-

and extracellularly (Neurath and Walsh, 1976; Neurath, 1989). The term 'protease' was

earlier used for enzymes that degrade proteins by hydrolysis of peptide bonds. In the late

1920s, it was recognized that there are two very different types of these proteolytic

enzymes (Grassmann and Dyckerhoff, 1928). Some enzymes act best on intact proteins,

whereas others show a preference for small peptides as substrates. So the term

‘proteinase’ was adopted for proteases that show specificity for intact proteins and cleave

internal peptide bonds (=endoproteinases), and ‘peptidase’ was used synonymously with

exopeptidase, i.e. to refer to a peptide (bond) hydrolase that acts specifically at the N- or

C-terminus of a polypeptide (Grassmann and Dyckerhoff, 1928; Barrett, 1980;

McDonald, 1985; Barrett and McDonald, 1985, 1986).

The general nomenclature for cleavage site positions in the protein substrates of

proteinases was formulated by Schechter and Berger (1967). According to this scheme,

the binding region of the active site may be divided into subsites (S), or specificity sites,

each accommodating one amino acid residue or blocking group of the substrate. The

subsites on each side of the catalytic site are designated S1 and S1´. The remaining

subsites are numbered outward from this point in both directions as S2, S3, etc., and S2´, S3´,

etc. The positions of the amino acids in the substrate that interact with the protease are

numbered to correspond with the subsites they occupy on the active site and are

designated P1 and P1´ (figure 1.1). Cleavage is catalyzed between P1 and P1´.
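
The numbering convention described above can be illustrated with a short sketch. The function name, the example peptide and the choice of a 0-based index for the P1 residue are assumptions made here for illustration only; they are not part of the Schechter and Berger scheme itself.

```python
def label_positions(sequence, p1_index):
    """Label each residue of a substrate with its Schechter-Berger position.

    sequence  -- one-letter amino acid string (hypothetical example input)
    p1_index  -- 0-based index of the P1 residue; the scissile bond lies
                 between this residue (P1) and the next one (P1')
    """
    if not 0 <= p1_index < len(sequence) - 1:
        raise ValueError("P1 must be followed by at least one residue (P1')")
    labels = []
    for i, residue in enumerate(sequence):
        if i <= p1_index:
            # Non-primed side: P1 is at the scissile bond, numbers grow toward the N-terminus
            labels.append((f"P{p1_index - i + 1}", residue))
        else:
            # Primed side: P1' follows the scissile bond, numbers grow toward the C-terminus
            labels.append((f"P{i - p1_index}'", residue))
    return labels


# Hypothetical hexapeptide cleaved after Phe (index 3)
example = label_positions("AAPFGS", 3)
for position, residue in example:
    print(position, residue)
```

Each substrate position Pn is taken to occupy the enzyme subsite Sn with the same number, so the same labels serve for both.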

Classification of Proteases

In 1960, a scheme for classifying proteases on the basis of recognized catalytic

mechanisms was proposed (Hartley, 1960). Proteases were divided into four classes

according to their catalytic mechanisms: serine proteinases, thiol proteinases, acid

proteinases and metal proteinases (figure 1.2). Later on, some amendments in this scheme

were proposed (Barrett, 1980). The International Union of Biochemistry and Molecular

Biology (IUBMB) (formerly the International Union of Biochemistry, IUB) recommends

that proteases be divided into four mechanistic classes depending on the amino acid

residue present at their active sites, the optimum pH range, amino acid sequence similarity

and similarity in response to inhibitors. These include serine proteinases, cysteine

proteinases, aspartic proteinases and metallo-proteinases. In addition, Rawlings and Barrett

(1993) proposed an evolutionary scheme for proteases based on amino acid sequence data

from 600 of these enzymes. According to this scheme, there are more than 60 evolutionary

lines of proteases with separate origins (Rawlings and Barrett, 1993; Terra and Ferreira,

1994). In this classification, proteases are classified into families and clans. The term family

is used to describe a group of enzymes in which each member shows evolutionary

relationship to at least one other, either throughout the whole sequence or at least in the

part of the sequence responsible for catalytic activity; whereas a clan comprises a group

of families for which there are indications of evolutionary relationship, despite the lack of

statistically significant similarities in sequence (Rawlings and Barrett, 1993). Recent

advances in the classification of proteases are available in the MEROPS Peptidase Database

(http://merops.sanger.ac.uk) (Rawlings and Barrett, 1993; Rawlings et al., 2010, 2012).

Serine proteinases (EC 3.4.21)

Serine proteinases are by far the best characterized proteolytic enzymes (Neurath,

1999). They are of wide occurrence and constitute the largest group of proteinases. Together

with their inhibitors, they regulate a great variety of physiological events. The catalytic

mechanism depends upon the hydroxyl group of a serine residue acting as a nucleophile

that attacks the peptide bond. Serine proteinases are characterized by enzyme inactivation in

the presence of diisopropylfluorophosphate (DFP) or phenylmethanesulfonylfluoride

(PMSF). This mechanistic class is distinguished by the presence of the Asp102-His57-

Ser195 “catalytic triad”, first identified in α-chymotrypsin (numbering after the sequence

of bovine chymotrypsinogen) (Blow, 1997). The various clans of serine proteinases

display differences in terms of overall fold and the order of catalytic residues in the

primary sequences; but members of clans PA (chymotrypsin like) (Lesk and Fordham,

1996), SB (subtilisin like) (Siezen and Leunissen, 1997) and SC (α/β hydrolase fold)

(Ollis et al., 1992) maintain a strictly conserved active site geometry of the catalytic Ser-

His-Asp residues (Krem and Cera, 2001). Serine proteinases with novel catalytic triads

and dyads have also been identified, including Ser-His-Glu, Ser-Lys/His, His-Ser-His,

and N-terminal Ser residues (Dodson and Wlodawer, 1998; Hedstrom, 2002).

The mammalian digestive serine proteinases

Digestive serine proteases (also known as the S1 family of serine proteinases)

(www.merops.sanger.ac.uk) include trypsin, chymotrypsin and elastases (Rawlings and

Barrett, 1993). Trypsin, chymotrypsin and elastase have the same active site residues but

differ in their substrate specificity. They also differ in the amino acids that confer this

specificity (figure 1.3). Trypsins and chymotrypsins share three disulphide bridges known

as the His loop (42-58), Met loop (168-182) and Ser loop (191-220) (Hartley, 1964).

Chymotrypsin (EC 3.4.21.1) is a prominent member of the group of enzymes belonging

to the digestive serine proteinase family. The substrate binding site is occupied by Ser189

in chymotrypsins, which cleave peptide bonds at the carboxy-termini of the amino acids

tryptophan (Trp), phenylalanine (Phe), methionine (Met) and leucine (Leu) (Boyer,

1971). Bovine chymotrypsin (BC) was sequenced early (Hartley, 1964), and is well

characterized. The primary specificity of this enzyme for aromatic and leucine residues

results from the architecture of its S1 subsite (following Schechter and Berger, 1967),

comprising a pocket close to the catalytic Ser 195. Secondary specificity determinants

are Gly 216 and Gly 226 that line the pocket and residue Ser 189 that lies at the bottom of

the specificity pocket (figure 1.3) (Dorovska et al., 1972).

Trypsin (EC 3.4.21.4) catalyzes the hydrolysis of protein chains on the carboxyl

side of basic L-amino acids, arginine (Arg) or lysine (Lys). The substrate binding site in

trypsin is occupied by Asp189, Gly 216 and Gly 226 (figure 1.3). Asp189 gives negative

charge to the substrate binding pocket and is perfectly placed to form an ion-pair with the

positive charge of Lys or Arg residues present in the scissile bond of a substrate or

inhibitor (Evnin et al., 1990; Perona et al., 1994). Bovine trypsin (BT) was among the

first proteolytic enzymes to be isolated in pure form and in sufficient quantity for detailed

chemical and enzymological studies (Northrop et al., 1948). Mammalian trypsin shows a

2- to 10-fold preference for Arg substrates over Lys substrates (Craik et al.,

1985; Barrett et al., 2004). The enzyme is specifically inhibited by Nα-Tosyl-L-lysine

chloromethyl ketone hydrochloride (TLCK) which acts on histidine (Shaw et al., 1965).

The three-dimensional structure and mode of action of trypsin were studied in detail

more than thirty years ago with the help of crystallography (Stroud et al., 1971, 1974;

Kraut, 1977; Huber and Bode, 1978).

Elastase (E.C 3.4.4.7) is an endopeptidase that can digest elastin, the elastic

fibrous protein of connective tissue (Boyer, 1971). The first report of pancreatic elastase

activity dates back to 1878 when Walchli found that ox pancreas digested ligamentum

nuchae elastin, and Kuhne reported that impure trypsin preparations dissolved elastin

(Boyer, 1971). Elastase is present in all mammals investigated and has also been reported

in pancreatic extracts of several vertebrates and invertebrates (Cohen et al., 1981;

Yoshinaka et al., 1986). The first elastase gene to be cloned was from pig (Shotton and

Hartley, 1970). Like other serine proteinases, elastase contains the highly conserved

catalytic triad of His57-Asp102-Ser195 (Vered et al., 1986). In the elastase specificity

pocket, Gly 216 and Gly 226 are replaced by Val 216 and Thr 226 which block the

entrance of large bulky amino acids into the substrate binding cavity (figure 1.3) (Shotton

and Hartley, 1970).
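
The primary (P1) specificities described above can be sketched as a simple lookup. This is only an illustration under stated assumptions: the trypsin and chymotrypsin residue sets follow the text (with Tyr added as an aromatic residue), whereas the elastase set of small residues is an assumption made here, since the text states only that bulky side chains are excluded. Secondary specificity and all other sequence context are ignored.

```python
# P1 residues accepted by each enzyme (one-letter codes).
# The elastase entry is an illustrative assumption, not taken from the text.
P1_PREFERENCES = {
    "trypsin": set("KR"),         # basic residues pair with Asp189
    "chymotrypsin": set("FWYLM"), # aromatic residues, Leu and Met
    "elastase": set("AGSV"),      # assumed: small side chains only
}


def predicted_cleavage_sites(sequence, enzyme):
    """Return 1-based positions of P1 residues after which cleavage is predicted."""
    preferred = P1_PREFERENCES[enzyme]
    # The C-terminal residue has no P1' partner, so it is never a P1 site.
    return [i + 1 for i, aa in enumerate(sequence[:-1]) if aa in preferred]


peptide = "GAKFLRWS"  # hypothetical test peptide
trypsin_sites = predicted_cleavage_sites(peptide, "trypsin")
chymotrypsin_sites = predicted_cleavage_sites(peptide, "chymotrypsin")
elastase_sites = predicted_cleavage_sites(peptide, "elastase")
print(trypsin_sites, chymotrypsin_sites, elastase_sites)
```

A lookup of this kind captures only the S1 pocket preference; real cleavage also depends on the residues occupying the neighbouring subsites.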

Mechanism of action of mammalian serine proteases

The classical mechanism of proteolysis by these enzymes involves two acyl

transfer reactions, in which the carbonyl carbon of the scissile bond in the substrate is

attacked first by the hydroxyl oxygen of the enzyme’s active site serine, and secondly by a water

molecule (Bender et al., 1964; Fastrez and Fersht, 1973). The catalytic triad spans the

active site cleft, with Ser195 on one side and Asp102 and His57 on the other, where

His57 acts as a general acid and base; Asp102 functions to orient His57 while Ser195

forms a covalent bond with the peptide which is to be cleaved (Blow, 1997; Hedstrom,

2002). In the acylation half of the reaction, Ser195 attacks the carbonyl group of the

peptide substrate, assisted by His57 acting as a general base, to yield a tetrahedral

intermediate (figure 1.4). The resulting His57-H+ is stabilized by the hydrogen bond to

Asp102. The oxyanion of the tetrahedral intermediate is stabilized by interaction with the

main chain NHs of the oxyanion hole. The tetrahedral intermediate collapses with

expulsion of the leaving group, assisted by His57-H+ acting as a general acid, to yield the

acylenzyme intermediate (Hedstrom, 2002). The deacylation half of the reaction

essentially repeats the above sequence: water attacks the acylenzyme, assisted by His57

yielding a second tetrahedral intermediate. This intermediate collapses, expelling Ser195

and the carboxylic acid product.
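
The two half-reactions described above are commonly summarized by a three-step kinetic scheme. The formulation below is the standard textbook one rather than anything specific to this chapter, and the symbols $K_s$ (dissociation constant of the enzyme-substrate complex), $k_2$ (acylation rate constant) and $k_3$ (deacylation rate constant) are introduced here for illustration.

```latex
\[
\mathrm{E} + \mathrm{S}
\;\overset{K_s}{\rightleftharpoons}\;
\mathrm{E \cdot S}
\;\xrightarrow{\;k_2\;}\;
\mathrm{E\text{-}acyl} + \mathrm{P_1}
\;\xrightarrow{\;k_3,\ \mathrm{H_2O}\;}\;
\mathrm{E} + \mathrm{P_2}
\]

\[
k_{\mathrm{cat}} = \frac{k_2\,k_3}{k_2 + k_3},
\qquad
K_m = K_s\,\frac{k_3}{k_2 + k_3}
\]
```

Here $\mathrm{P_1}$ is the leaving group released on collapse of the first tetrahedral intermediate and $\mathrm{P_2}$ is the carboxylic acid product released on deacylation.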

Protease Inhibitors (PIs): mechanisms of inhibition

Protease inhibitors (PIs) are natural antagonists of proteases, and control the action of

proteolytic enzymes. PIs can be classified into 48 families on the basis of similarities

detectable at the level of amino acid sequence (Rawlings et al., 2004). On the basis of

three-dimensional structures, 31 of the families are assigned to 26 clans

(www.merops.sanger.ac.uk). The activity of PIs is due to their capacity to form stable

complexes with the enzymes. Various mechanisms have been proposed for inhibition of

proteases by PIs (figure 1.5) (Bode and Huber, 2000), which include: (1) Blocking the

enzyme's active site in a substrate-like manner, for example the canonical serine

proteinase inhibitors interact with the target enzymes in a substrate-like manner (figure

1.5, A) (Bode and Huber, 2000). Table 1.1 summarizes some of the major features of

different types of serine proteinase inhibitors.

Table 1.1: The major features of serine proteinase inhibitors

| Inhibitor type | Representative | Mode of interaction with target proteinase | Major features | Inhibitor size |
|---|---|---|---|---|
| Canonical inhibitor | BPTI, OMTKY3, eglin c | Substrate-like standard mechanism, direct blockage of the active site; the enzyme-inhibitor complex is rapidly formed and usually dissociates slowly into free enzyme and virgin or modified inhibitor | Largest group of PIs, presence of exposed binding loop of a characteristic “canonical” conformation | 3–21 kDa per domain |
| Non-canonical inhibitor | Hirudin, TAP, ornithodorin | Interact through N-terminal segments to the enzyme active site | Less abundant PIs, only occur in blood sucking organisms, inhibits protease involved in clot formation (thrombin or factor Xa) | 6–8 kDa per domain |
| Serpins | α-1-antitrypsin, antithrombin | Irreversible unique suicide substrate mechanism, disruption of protease active site, huge conformational changes in binding loop of inhibitor | Wide occurrence in kingdoms Metazoa, Plantae and some viruses, significantly large proteins (~400 residues) | 45–55 kDa |

(BPTI: Bovine Pancreatic Trypsin Inhibitor, OMTKY3: turkey ovomucoid third domain, TAP: Tick anticoagulant

peptide. Source: Otlewski et al., 2005)

Other mechanisms of inhibition include: (2) Docking adjacent to the active/substrate

binding site, for example the interaction of human stefin B with papain (figure 1.5, B)

(Stubbs et al., 1990). In this case, the inhibitor primarily binds to the S1´ to S4´ subregion

of papain through two hairpin loops and to the non-primed subsites S3 to S1

through its N-terminal segment, which turns away from the active site of papain at P1,

preventing proteolytic processing (Stubbs et al., 1990). (3) Allosterically impairing the

proteolytic activity/substrate interaction of the enzyme via binding to quite distantly

located enzyme exo-sites. For example, mammalian serine proteinase zymogens, with a

partially formed S1 specificity pocket, represent allosterically inhibited proteinases (Kerr

et al., 1976; Fehlhammer et al., 1977; Cohen et al., 1981; Blevins and Tulinsky, 1985;

Wang et al., 1985; Bode and Huber, 2000) (figure 1.5, D).

Quantitative Assays used for the characterization of mammalian proteases

A number of techniques are currently employed to characterize the proteases

present in proteolytic mixtures. Depending on the type of substrate protein, these assays

can be divided into two main categories:

(1) Assays with “natural” protein substrates: these make use of a protein

substrate containing several or many scissile sites for a mixture of proteases. These

protein substrates are usually “natural” or native proteins purified from a source

organism. Protease assays utilizing casein, a protein substrate, were first described by

Kunitz (1947). These assays are essentially based on the precipitation of undigested

protein substrate by trichloroacetic acid (TCA), followed by separation and detection of

TCA-soluble peptides by absorbance at 280 nm. Substrates such as casein, BSA and hemoglobin

are generally used for these assays. Refinements of the TCA assay include chromogenic

and fluorogenic labeling of substrate proteins. The intensity of color/ fluorescence in

TCA filtrate of digested labeled substrate is a function of the total proteolytic activity of

the enzyme solution. Examples include: (a) Chromogenic protein substrates (Azo-

proteins) viz. azo-casein (Charney and Tomarelli, 1947) and azo-albumin (Lambert et al.,

1978; Phillips et al., 1984); (b) FITC (Fluorescein isothiocyanate) labeled protein

substrates such as FITC-hemoglobin and FITC-casein (Lumen and Tappel, 1970;

Robbins and Summaria, 1976; Twining, 1984) and (c) BODIPY (4,4-difluoro-4-bora-

3a,4a-diaza-s-indacene) dye-labeled protein substrates, which show green fluorescence

emission similar to fluorescein (BODIPY FL casein and BODIPY FL BSA) or red

fluorescence emission upon conjugation with Texas red dye (BODIPY TR-X casein)

(Haugland and Kang, 1988; Jones et al., 1997). Most protein substrates can be labeled by

these dyes. Assays with labeled protein substrates are sensitive and can be automated for real-time

monitoring of proteolysis. Protease assays using fluorescently labeled protein

substrates are more sensitive than those using the corresponding chromophore-labeled proteins.

(2) Assays with “synthetic” peptide substrates: these substrates are constructed around

the chemistry of amino acids and peptides. In this case substituents are added to either the

α-NH2 or α-COOH group on the peptide to produce a substrate susceptible to proteolytic

cleavage and to impart a chromogenic or fluorogenic nature to the substrate (Table 1.2).

Based on the position of the blocking group, synthetic substrates can be classified into

three categories, namely endopeptidase (=proteinase), aminopeptidase and

carboxypeptidase substrates. Proteinase substrates have no free amino- or carboxy-

termini, while aminopeptidase substrates have a free amino terminus and

carboxypeptidase substrates have a free carboxy-terminus. The method of detection can

be spectrophotometric (Martin et al., 1959a, b; Erlanger et al., 1961; Bargetzi et al., 1963;

Walsh, 1970; Walsh and Wilcox, 1970; Farmer and Hageman, 1975; DelMar et al., 1979)

or fluorimetric (Zimmerman et al., 1976, 1977). Sensitive synthetic fluorogenic substrates

for trypsin, chymotrypsin and elastase have been developed. These 7-amido-4-

methylcoumarin (NHMec) substrates have high fluorescence yields, and are very

sensitive and useful in assaying dilute enzyme preparations at low substrate

concentrations.
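
For chromogenic 4-nitroanilide substrates, spectrophotometric detection reduces to a Beer-Lambert calculation on the linear part of the progress curve. The sketch below is illustrative only: the function name and default values are hypothetical, and the molar absorption coefficient of 8800 M^-1 cm^-1 for released p-nitroaniline near 410 nm is an assumed, commonly quoted value that should be checked for the wavelength and buffer actually used.

```python
def pna_release_rate(delta_a_per_min, epsilon=8800.0, path_cm=1.0, assay_volume_ml=1.0):
    """Convert an absorbance slope into nmol of p-nitroaniline released per minute.

    delta_a_per_min  -- slope of absorbance versus time (absorbance units per min)
    epsilon          -- molar absorption coefficient in M^-1 cm^-1 (assumed value)
    path_cm          -- optical path length in cm
    assay_volume_ml  -- total reaction volume in ml
    """
    # Beer-Lambert law: A = epsilon * c * l, so c = A / (epsilon * l)
    molar_per_min = delta_a_per_min / (epsilon * path_cm)      # mol per litre per min
    moles_per_min = molar_per_min * (assay_volume_ml / 1000.0)  # mol per min in the cuvette
    return moles_per_min * 1e9                                  # nmol per min


# Hypothetical reading: absorbance rises by 0.088 per minute in a 1 ml, 1 cm cuvette
rate = pna_release_rate(0.088)
print(f"{rate:.2f} nmol/min")
```

Dividing this rate by the amount of protein in the assay gives a specific activity, provided the slope is taken from the initial, linear phase of the reaction.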

Table 1.2: Common protease substrate nomenclature

| NH2 substituent (abbreviation) | Group | COOH substituent (abbreviation) | Group |
|---|---|---|---|
| Bz | Benzoyl | NHNap | 2-Naphthylamide |
| Z | Benzyloxycarbonyl | NHNan | 4-Nitroanilide |
| Ac | Acetyl | NHMec | 7-Amido-4-methylcoumarin |
| Suc | Succinyl | OMe | Methyl ester |
| Abz | o-Aminobenzoyl | | |
| Fa | Furylacryloyl | | |

(Source: Beynon and Bond, 2001)
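
The classification of synthetic substrates by the position of the blocking group can be expressed as a small rule, using the abbreviations of Table 1.2. This is a sketch under assumptions: the hyphen-separated naming format and the example substrate names are chosen here for illustration, and real substrate names may not follow this pattern.

```python
# Abbreviations from Table 1.2
N_BLOCKING_GROUPS = {"Bz", "Z", "Ac", "Suc", "Abz", "Fa"}
C_BLOCKING_GROUPS = {"NHNap", "NHNan", "NHMec", "OMe"}


def classify_substrate(name):
    """Classify a hyphen-separated substrate name by which termini are blocked."""
    parts = name.split("-")
    n_blocked = parts[0] in N_BLOCKING_GROUPS
    c_blocked = parts[-1] in C_BLOCKING_GROUPS
    if n_blocked and c_blocked:
        return "endopeptidase"      # no free amino or carboxy terminus
    if c_blocked:
        return "aminopeptidase"     # free amino terminus
    if n_blocked:
        return "carboxypeptidase"   # free carboxy terminus
    return "unblocked peptide"


# Hypothetical example names written with the Table 1.2 abbreviations
results = {
    name: classify_substrate(name)
    for name in ("Suc-Ala-Ala-Pro-Phe-NHNan", "Leu-NHNan", "Z-Gly-Phe")
}
print(results)
```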

Zymograms used for the detection of mammalian proteases

Gel-based assays have been employed for qualitative characterization of

proteases. These assays can be categorized into: those where enzyme activity is detected

in situ after electrophoresis; and those where enzyme activity is detected by replica

blotting of the electrophoretic gel on to a substrate-containing gel. The basis of these

assays is the incorporation of a substrate (usually a protein viz. gelatin, but p-nitroanilide

substrates have also been used) in a gel matrix. To retain the enzymatic activity, gel

electrophoresis is performed at low temperatures under non-denaturing or mildly

denaturing conditions, where samples are not boiled prior to loading but the gels and

running buffer contain SDS. Protease-containing zones appear as cleared areas

of activity where the protein substrate is digested. For these gel based assays,

electrophoresis is commonly performed in polyacrylamide gels, although both starch and

agarose gels have also been employed (Heussen and Dowdle, 1980; Sarath et al., 1989;

Garcia-Carreno et al., 1993; Lantz and Cibrowski, 1994; Leber and Balkwill, 1997). The

assays involving replica blotting of gel-fractionated enzymes rely on the ability of

electrophoretically separated enzymes to hydrolyze substrates incorporated into a second

gel or on strips of absorbent material such as cellulose acetate. In another approach, these

fractionated enzymes are transferred to nitrocellulose or PVDF sheets, and then assayed

for activity (Ohlsson et al., 1986; Lantz and Cibrowski, 1994). In spite of their general

applicability, these gel-based assay methods have various limitations, viz. loss of activity

after SDS treatment and a reduced migration rate of proteins due to incorporation of

substrate proteins in the gel matrix (Hummel et al., 1996; Michaud, 1998).

Digestive serine proteinases in lepidopteran insects

Lepidopteran midgut physiology is characterized by proteolytic activity dominated

by serine proteinases (Berenbaum, 1980; Broadway and Duffey, 1986a; Christeller et al.,

1992; Purcell et al., 1992). Invertebrate and insect trypsins are similar to mammalian

trypsins in backbone scaffold and structural domains, but display some remarkable

differences from the latter. Trypsins isolated from lepidopteran larvae have higher pH

optima that correspond to the higher pH of their midguts. Insect trypsins are not activated

or stabilized by calcium ions (Johnston et al., 1991), are unstable at acidic pH (Miller et

al., 1974; Sakal et al., 1989) and differ from mammalian trypsins in responses to natural

protein inhibitors and in their specificities toward protein substrates (Applebaum, 1985;

Purcell et al., 1992; Terra and Ferreira, 1994).

Invertebrate and insect chymotrypsins also share the same backbone scaffold and

bilobal β-barrel structure as mammalian chymotrypsins (Botos et al., 2000), but differ

considerably from the latter in many aspects. They display high pH optima reflecting the

alkaline midgut pH values (Terra and Ferreira, 1994) and differ from their mammalian

counterparts in their instability at acidic pH and strong inhibition by soybean trypsin

inhibitor, STI (Terra et al., 1996; Mazumdar-Leighton and Broadway, 2001a). There are

earlier reports of reduced or even complete absence of chymotrypsin activity among

lepidopteran larvae, based on the use of mammalian chymotrypsin substrates like BTpNa

(N-benzoyl-L-tyrosine-p-nitroanilide) and BTEE (N-Benzoyl-L-Tyrosine Ethyl Ester)

with one amino-acid binding site (Ahmad et al., 1980; Johnston et al., 1991; Christeller et

al., 1992; McManus and Burgess, 1995). However, it was clearly shown that substrates

with an extended binding site, such as the chromogenic SAAPFpNa (N-Succinyl-

Ala-Ala-Pro-Phe-p-nitroanilide) and SAAPLpNa (N-Succinyl-Ala-Ala-Pro-Leu-p-

nitroanilide) and the fluorogenic SAAPFAMC (N-Succinyl-Ala-Ala-Pro-Phe-AMC),

are hydrolyzed more efficiently by the chymotrypsins of insects, particularly

Lepidoptera (Lee and Anstee, 1995; Johnston et al., 1995; Mazumdar-Leighton and

Broadway, 2001a). Elastase-like activity has been reported in a wide range of

lepidopteran species based on the hydrolysis of the synthetic substrates SAAApNa

(N-Succinyl-Ala-Ala-Ala-p-nitroanilide) (Houseman et al., 1991; Johnston et al., 1995) and SAAPLpNa (Christeller et al., 1992,

1994). However, the nature of this detected activity is unclear. In Spodoptera littoralis,

the proteolytic activities responsible for hydrolysis of both SAAApNa and SAAPLpNa

were attributed to chymotrypsin (Lee and Anstee, 1995). In Heliothis virescens, by contrast,

SAAPLpNa-digesting activities were attributed to chymotrypsin-like

enzymes and SAAApNa-digesting activities to elastase-like enzymes

(Johnston et al., 1995). The purified chymotrypsin from Manduca sexta has been reported

to hydrolyse SAAPLpNa, but not SAAApNa (Peterson et al., 1994). On the basis of

SAAApNa hydrolysis, some amount of elastase activity has also been reported in

Sesamia nonagrioides (Ortego et al., 1996). More work is required to check these

overlapping specificities of lepidopteran chymotrypsin and elastase-like enzymes.

Digestive serine proteinases in lepidopteran insects from India

The insect order Lepidoptera (Class: Insecta) includes several species of moths

and butterflies of economic importance. Several lepidopteran pests have been reported to

cause significant yield losses to crop plants in India. Indian agriculture is currently

suffering an annual loss of about Rs. 8,63,884 million due to insect pests (Dhaliwal et

al., 2010; Singh and Gandhi, 2012). An enormous amount of work has been done to control

these pests, with attempts to develop transgenic plants expressing proteinase

inhibitors and insecticidal genes from Bacillus thuringiensis (Bennett et al., 1993;

Shao et al., 1998; Bentur et al., 1999, 2000; Selvapandian et al., 2001; Reddy et al.,

2002). In contrast, some beneficial Lepidoptera, like the non-mulberry silkmoths endemic

to the Indian subcontinent, are a source of revenue. For example, the endemic saturniid

silkmoths, Antheraea mylitta and A. assamensis are reared for their “tasar” and “muga”

silks (Hazarika et al., 1998). The following section gives a brief introduction to the

different lepidopteran insects used in this work.

Helicoverpa armigera (Lepidoptera: Noctuidae), the “cotton bollworm”, is a

devastating polyphagous pest reported to feed on over 180 plant species from 45 different

plant families in India, including cotton, chickpea, pigeonpea, tobacco, tomato, corn and

okra (Manjunath et al., 1989). Proteolytic gut activities in H. armigera midgut have been

shown to be largely due to serine proteinases with high pH optima using casein, Rubisco

and synthetic substrates (Christeller et al., 1992; Johnston et al., 1991, 1995; Patankar et

al., 2001). The gut activities are reported to be predominantly trypsin-like (Johnston et

al., 1991; Mazumdar-Leighton et al., 2000). Chymotrypsins have been reported to show

up-regulation following ingestion of an inhibitor and are implicated in the development

of resistance in H. armigera to ingested delta-endotoxins from Bacillus thuringiensis

(Bown et al., 1997; Gatehouse et al., 1997; Wu et al., 1997; Shao et al., 1998; Mazumdar-

Leighton and Broadway, 2001a). Carboxypeptidase and elastase-like activities have also

been reported from larval midguts of H. armigera (Bown et al., 1997, 1998; Wu et al.,

1997).

Pieris brassicae (Linnaeus) (Lepidoptera: Pieridae), the “large white butterfly”, is

a specialist polyphagous pest of crucifers in India (Ma, 1972; Sachan and Gangwar,

1980; Ali and Rizvi, 2007; Blatt et al., 2008). Apart from Cruciferae, it feeds on a

number of host plants from other families, including Capparaceae,

Papilionaceae, Resedaceae and Tropaeolaceae (Feltwell, 1982). Trypsin- and

chymotrypsin-like activities have recently been reported, using synthetic chromogenic

substrates, from fourth instar P. brassicae larvae fed on cabbage leaves (Zibaee, 2012).

The midgut serine proteinases of P. brassicae were found to be involved in the

physiological adaptation of this pest to different host plants (Kumar, R., 2010 PhD

thesis).

Antheraea assamensis Helfer (Lepidoptera: Saturniidae), the “muga silk worm”,

is an economically important semi-domesticated insect reared for its golden hued cocoon

silk in north eastern parts of India and Indo-Burma (Choudhury, 1981; Ahmed et al.,

1998; Hazarika et al., 1998). It is mostly cultured around the Brahmaputra valley and

adjoining hills bordering Assam, Nagaland, Meghalaya and Arunachal Pradesh. A.

assamensis larvae can feed on 28 species of plants (Choudhury, 1981; Bindroo et al.,

2006) of family Lauraceae, but commercial cultivation is done mainly on the leaves of

Persea bombycina (Som) and Litsea monopetala (Sualu). Differential levels of gut

trypsins and chymotrypsins have been detected in larvae feeding on these two host plants,

with synthetic fluorogenic substrates (Saikia et al., 2010). Figures 1.6 and 1.7 show

these insects and their host plants.

Protease Inhibitors in plants: the natural antagonists of insect proteases

Protease inhibitors (PIs) are ubiquitous in plants and are major constituents of

seeds and storage organs (5–15% of total storage protein) (Jongsma and Bolter, 1997).

Various lines of evidence suggest that the major function of proteinase inhibitors is to

combat the proteinases of pests and pathogens, constituting an essential part of the plant’s

natural defense system (Green and Ryan, 1972; Hill and Hastie, 1987; Ryan, 1990;

Jongsma and Bolter, 1997; Schuler et al., 1998; Tiffin and Gaut, 2001; Mello and Silva-

Filho, 2002; Mello et al., 2003; Christeller, 2005; Christeller and Laing, 2005; Fan and

Wu, 2005). Serine proteinase inhibitors have been reported from a variety of plant

sources and are among the most widely studied class of PIs (Ryan, 1978, 1979, 1990;

Vodkin, 1981; Kim et al., 1985; Broadway et al., 1986a; Wolfson and Murdock, 1987;

Christeller and Shaw, 1989; Michaud et al., 1993; McManus and Burgess, 1995;

McManus et al., 1999; Mello et al., 2003; Haq and Khan, 2003; Christeller and Laing,

2005; Fan and Wu, 2005). These proteinase inhibitors have been reported to inhibit the

growth and development of a wide range of herbivores (Broadway and Duffey, 1986a, b;

Broadway, 1995; Oppert et al., 1993; Giri et al., 2003), by disrupting proteolysis in the

insect midgut. It was demonstrated that the expression of the cowpea trypsin inhibitor

(CPTI) as a transgene in tobacco significantly reduced the damage caused by Heliothis

virescens (Hilder et al., 1987).

The expression of plant PIs is well regulated, and may be constitutive and/or

induced by herbivory (Green and Ryan, 1972; Nelson et al., 1983; Ryan, 1990; Pearce et

al., 1993; Jongsma et al., 1994). Expression of plant PIs (PPIs), along with that of

several other jasmonate-induced proteins (JIPs) involved in plant defense, is regulated by

the jasmonic acid (JA) signaling pathway (Weidhase et al., 1987; Farmer and Ryan, 1990; Farmer and

Ryan, 1992; Felton et al., 1994; Constabel et al., 1995; McConn et al., 1997; Baldwin,

1998; Turner et al., 2002; Howe, 2005; Chen et al., 2005; Lison et al., 2006; Chen et al.,

2007; Browse and Howe, 2008; Howe and Jander, 2008). In contrast to the induction of

JIPs involved in plant defense, the synthesis of several housekeeping proteins

(especially those involved in photosynthetic carbon assimilation) is repressed or down-regulated in

response to JA treatment. Table 1.3 summarizes some of the JA-responsive genes

reported to show altered expression (induction/ repression) in response to insect

herbivory.

Table 1.3: Gene expression modulated in response to JA treatment/insect herbivory

| Plant gene product | Plant species | Response to herbivory/JA treatment | Suggested role | References |
|---|---|---|---|---|
| Proteinase inhibitors | Potato, tomato, alfalfa | Induced | Plant defense, inhibits digestive proteases | Farmer and Ryan, 1990; Johnson and Ryan, 1990; Farmer et al., 1992; Hildmann et al., 1992 |
| Polyphenol oxidase | Tomato | Induced | Plant defense, reduces the nutritional quality of ingested plant proteins | Felton et al., 1992; Constabel et al., 1995 |
| Threonine deaminase | Tomato | Induced | Plant defense, depletes the essential amino acid threonine in insect herbivore gut | Chen et al., 2005; Kang et al., 2006; Gonzales-Vigil et al., 2011 |
| Arginase | Tomato | Induced | Plant defense, depletes the essential amino acid arginine in insect herbivore gut | Chen et al., 2005 |
| Leucine aminopeptidase A | Tomato | Induced | Possibly works synergistically with arginase | Pautot et al., 1993; Gu et al., 1996; Chen et al., 2005 |
| Rubisco small subunit (RBCS) | Barley | Repressed | Plant defense, limits the nutritional value, reutilizes the amino acids for JIP synthesis | Reinbothe et al., 1992, 1993a, 1994; Hermsmeier et al., 2001 |
| Rubisco large subunit (RBCL) | Barley | Repressed | Plant defense, limits the nutritional value, reutilizes the amino acids for JIP synthesis | Reinbothe et al., 1992, 1993b |
| Light harvesting chlorophyll a/b binding protein | Barley | Repressed | Plant defense, limits the nutritional value, reutilizes the amino acids for JIP synthesis | Reinbothe et al., 1993a, b |

Mode of action of plant proteinase inhibitors

Structural studies on several free inhibitors and enzyme-inhibitor complexes

(Ruhlmann et al., 1973; Huber et al., 1974; Sweet et al., 1974) showed that all inhibitors

have a hyper-exposed loop surrounding the P1 residue. In complexes with their cognate

enzymes, the reactive site loop of serine proteinase inhibitors associates with the catalytic

residues of the target enzyme(s) in a manner similar to that of bound substrates. This substrate-like

mechanism was termed the “standard mechanism of inhibition” (Laskowski and

Kato, 1980). In the 1970s, the mechanism of inhibition of porcine pancreatic trypsin (PPT)

by a plant proteinase inhibitor, soybean trypsin inhibitor (STI), was proposed, and later

the residues involved in this interaction were determined (figure 1.8) (Blow et al., 1974;

Baillargeon et al., 1980; Song and Suh, 1998). Since then, a number of protease-PI

structures have been solved. For example, bovine trypsin complexed with Cucurbita

maxima trypsin inhibitor, CMTI-I (Bode et al., 1989), subtilisin BPN´ complexed with

Streptomyces subtilisin inhibitor (Takeuchi et al., 1991) and porcine trypsin complexed

with Momordica charantia trypsin inhibitor, MCTI-A (Huang et al., 1993).

Plant proteinase inhibitors and lepidopteran gut proteinases

Since serine proteinase inhibitors act in a substrate-like manner, proteolysis of food

is blocked in their presence. The chronic ingestion of proteinase inhibitors results in

pernicious hyper-production of proteolytic enzymes, leading to limitation of essential

amino acids for protein synthesis and retardation of growth and development (Broadway

and Duffey, 1986b; Johnston et al., 1993; Jongsma and Bolter, 1997). Insect larvae

respond quantitatively and/or qualitatively by altering their gut proteases in response

to ingestion of PIs. These responses include synthesis of “insensitive” proteinases, which

the PI is unable to bind and inhibit (Broadway, 1995; Jongsma et al., 1995; Bown et al.,

1997; Gatehouse et al., 1997; Mazumdar-Leighton and Broadway, 2001a, 2001b), or

synthesis of proteinase isoforms able to bind and degrade the PI (Giri et

al., 1998; Telang et al., 2005). This forms an interesting paradigm of plant-insect herbivore

interactions, and in some sense an extension of the co-evolution theory proposed by

Ehrlich and Raven (1964) between herbivores and their host plants (Jongsma and Bolter,

1997; Mello and Silva-Filho, 2002). Some of the responses reported from lepidopteran

larvae upon feeding on inhibitor-containing natural or synthetic diets are listed in

Table 1.4.

Table 1.4: Responses by some lepidopteran larvae upon feeding on plant PIs

Insect | PI type | Diet | Response | Reference
Heliothis zea | STI, PI-2 | Artificial diet | Reduction in growth and development, increase in trypsin activity | Broadway and Duffey, 1986a
Spodoptera exigua | STI, PI-2 | Artificial diet | Reduction in growth and development, increase in trypsin activity | Broadway and Duffey, 1986a
Heliothis virescens | CpTI | Transgenic tobacco | Reduction in growth and development and mortality | Hilder et al., 1987
Manduca sexta | PI-I, PI-II | Transgenic tobacco | Reduction in growth and development with PI-II | Johnson et al., 1989
Helicoverpa armigera | STI, SBBI | Artificial diet | Reduction in growth and development | Johnston et al., 1993
Spodoptera litura | STI | Artificial diet | Reduction in growth and development, increase in trypsin activity | McManus and Burgess, 1995
Spodoptera exigua | PI-2 | Transgenic tobacco | Increased tryptic activity insensitive to PI-2 | Jongsma et al., 1995
Helicoverpa zea | STI | Artificial diet | Increased proteolytic activity insensitive to STI | Broadway, 1997
Agrotis ipsilon | STI | Artificial diet | Increased proteolytic activity insensitive to STI | Broadway, 1997
Trichoplusia ni | STI | Artificial diet | Increased proteolytic activity insensitive to STI | Broadway, 1997
Helicoverpa armigera | STI | Artificial diet | Induction of inhibitor-insensitive chymotrypsins | Bown et al., 1997
Helicoverpa armigera | Aprotinin, PI-I, PI-II, STI | Artificial diet | Upregulation of chymotrypsins and downregulation of trypsins | Gatehouse et al., 1997
Helicoverpa armigera | Cicer arietinum seed PIs | Chickpea seeds | Induction of proteinase isoforms capable of degrading the PIs | Giri et al., 1998
Helicoverpa zea | STI | Artificial diet | Upregulation of insensitive trypsins and chymotrypsins | Mazumdar-Leighton and Broadway, 2001a, b
Agrotis ipsilon | STI | Artificial diet | Upregulation of insensitive trypsins and chymotrypsins | Mazumdar-Leighton and Broadway, 2001a, b
Heliothis virescens | Nicotiana tabacum leaf PIs | Tobacco leaves | Synthesis of trypsins that form oligomers | Brito et al., 2001
Helicoverpa zea | STI | Artificial diet | Synthesis of insensitive trypsin | Volpicella et al., 2003

(*PI-2: potato proteinase inhibitor II; CpTI: cowpea trypsin inhibitor; PI-I & II: tomato proteinase inhibitors; STI: soybean (Kunitz) trypsin inhibitor)

Assays used for the characterization of insect gut proteases

Assays described in previous sections for the mammalian digestive proteases are

also used for the characterization of lepidopteran midgut proteases. The total proteolytic

activities are characterized using protein substrates in the absence and presence of proteinase

inhibitors of chemical or plant origin, viz. Pefabloc (4-(2-aminoethyl)-benzenesulfonyl

fluoride hydrochloride) for serine proteinases, EDTA (ethylenediaminetetraacetic acid) for

metalloproteinases, E-64 (L-trans-epoxysuccinyl-L-leucylamido(4-guanidino)butane) for

cysteine proteinases, and antipain, which inhibits both serine and cysteine proteinases. The

diagnostic serine proteinase inhibitors in routine use include TLCK, which

indicates histidine at the active site of trypsin (Shaw, 1967); TPCK, which indicates

histidine at the active site of chymotrypsin (Schoellman and Shaw, 1963); and STI (soybean

Kunitz trypsin inhibitor) and SBBI (soybean Bowman-Birk inhibitor), the naturally

occurring plant proteinase inhibitors known to inhibit both trypsin- and chymotrypsin-like

activities of lepidopteran insects. Protein substrates used routinely include casein,

haemoglobin, BSA and gelatin, and their azo dye-conjugated derivatives (Pritchett et al.,

1981; Broadway and Duffey, 1986b; Hamed and Attias, 1987; Broadway, 1989; Purcell

et al., 1992; Jongsma et al., 1995; Ortego et al., 1996; Giri et al., 1998; Patankar et al.,

2001; Parde et al., 2010; Zibaee, 2012), as well as fluorescent conjugates such as BODIPY-casein (Li et al., 2004; George et

al., 2008). Radio-labeled (14C) Rubisco, along with casein, was used for the

characterization of midgut protease activities in 12 phytophagous lepidopteran larvae

(Christeller et al., 1992). The rates of hydrolysis of native Rubisco were found to be

2- to 15-fold lower than those of the milk phosphoprotein casein (Christeller et al., 1992).
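The inhibitor-based characterization described above amounts to comparing the activity measured with and without each diagnostic inhibitor. The following is a minimal Python sketch of that comparison, assuming all activities are expressed in the same arbitrary units. The function names and the example numbers are hypothetical and are not taken from any of the cited studies; the class assigned to Pefabloc (serine proteinases) is general knowledge rather than something stated in the text.

```python
# Diagnostic inhibitors named in the text and the protease class each one probes.
INHIBITOR_TARGETS = {
    "Pefabloc": "serine",            # assumption: class not stated in the text
    "EDTA": "metallo",
    "E-64": "cysteine",
    "TLCK": "trypsin-like serine",
    "TPCK": "chymotrypsin-like serine",
}


def percent_inhibition(control_activity, inhibited_activity):
    """Percentage of the control activity that is lost when the inhibitor is present."""
    if control_activity <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * (control_activity - inhibited_activity) / control_activity


def inhibition_profile(control_activity, inhibited_activities):
    """Map each inhibitor name to (protease class probed, % inhibition)."""
    return {
        name: (
            INHIBITOR_TARGETS.get(name, "unknown"),
            percent_inhibition(control_activity, activity),
        )
        for name, activity in inhibited_activities.items()
    }


if __name__ == "__main__":
    # Hypothetical gut extract: strongly inhibited by TLCK, unaffected by EDTA.
    print(inhibition_profile(10.0, {"TLCK": 4.0, "EDTA": 10.0}))
```

A profile of this kind only indicates which protease classes contribute to the total activity; it does not by itself identify individual enzymes.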

Serine proteinase activities have been characterized using ester substrates like N-

α-Benzoyl-L-Arginine Ethyl Ester (BAEE), N-α-Benzoyl-L-Tyrosine Ethyl Ester (BTEE)

and p-toluenesulfonyl-L-arginine methyl ester (TAME) (Broadway and Duffey, 1986 b;

Broadway and Duffey, 1988; Broadway, 1989; Christeller et al., 1989; Lenz et al., 1991).

Amide substrates have also been extensively employed, including N-α-Benzoyl-

DL-Arginine-p-nitroanilide (BApNA), N-α-Benzoyl-L-Tyrosine-p-nitroanilide (BTpNA),

N-Succinyl-Ala-Ala-Pro-Phe-p-nitroanilide (SAAPFpNa) and N-Succinyl-Ala-Ala-Pro-

Leu-p-nitroanilide (SAAPLpNa) (Christeller et al., 1989, 1992; Lenz et al., 1991;

Johnston et al., 1993; Christeller et al., 1994; Ferreira et al., 1994; Peterson et al., 1994,

1995; Broadway, 1995; Lee and Anstee, 1995; Valaitis, 1995; Broadway, 1997; Wu et

al., 1997; Johnson and Felton, 2000; Volpicella et al., 2000, 2003; Herrero et al., 2005;

Zavala et al., 2008; Zibaee, 2012).
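With p-nitroanilide substrates, activity is usually read as the rate of increase in absorbance caused by released p-nitroaniline, which can be converted to a molar rate with the Beer-Lambert law. The sketch below illustrates that conversion. The extinction coefficient of 8800 M^-1 cm^-1 (near 410 nm) is an assumed, commonly quoted value and is not given in the text; the function name, parameters and example numbers are hypothetical, and the coefficient should be replaced by the value appropriate to the actual assay conditions.

```python
# Assumed molar extinction coefficient of p-nitroaniline near 410 nm (M^-1 cm^-1).
PNA_EXTINCTION_M_CM = 8800.0


def pna_release_rate_umol_per_min(delta_a_per_min, reaction_volume_ml,
                                  path_length_cm=1.0,
                                  extinction=PNA_EXTINCTION_M_CM):
    """Convert an absorbance slope (A/min) into micromoles of pNA released per minute.

    Beer-Lambert: A = extinction * concentration * path length, so the
    concentration change per minute is delta_A / (extinction * path length).
    """
    molar_rate = delta_a_per_min / (extinction * path_length_cm)   # mol/L/min
    moles_per_min = molar_rate * (reaction_volume_ml / 1000.0)     # mol/min
    return moles_per_min * 1e6                                     # umol/min


if __name__ == "__main__":
    # Hypothetical reading: 0.088 A/min in a 1 ml reaction, 1 cm cuvette.
    print(pna_release_rate_umol_per_min(0.088, 1.0))
```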

More recently, specific fluorogenic substrates like N-α-Benzoyl-Arg-7-amino-4-

methylcoumarin (Bz-RMCA) and N-Succinyl-Ala-Ala-Pro-Phe-7-amino-4-

methylcoumarin (SAAPFAMC) have also been employed (Mazumdar-Leighton and

Broadway, 2001 a, b; Brito et al., 2001; Lopes et al., 2004, 2006; Chougule et al., 2008;

Sato et al., 2008; Lopes et al., 2009; Saikia et al., 2010). Gel-based analysis of

midgut proteases is also similar to that described for the mammalian or bacterial

systems (Michaud, 1998). The protein substrates commonly used include casein or gelatin

(Forcada et al., 1996; Brito et al., 2001; Zeng et al., 2002; Hegedus, 2003; Li et al., 2004;

Karumbaiah et al., 2007; Budatha et al., 2008; Chougule et al., 2008; Erlandson et al.,

2010; Saikia et al., 2010), but use of p-nitroanilide substrates has also been reported

(Oppert et al., 1996; Oppert et al., 2000; Zhu et al., 2000; Vinokurov et al., 2005; Oppert,

2006). In these gel-based assays, the types of activity are characterized using specific

proteinase inhibitors.

Rubisco: the holoenzyme

Rubisco, the most abundant plant protein, is the major protein consumed by

phytophagous insects as it constitutes up to 50% of the soluble leaf proteins of C3 plants

(Ellis, 1979; Spreitzer and Salvucci, 2002) and 20–30% of total leaf nitrogen (Evans and

Seemann, 1989; Makino, 2003; Kumar et al., 2002). Rubisco is a major source of

essential amino acids for the phytophagous insects. Figure 1.9 shows the amino acid

composition of Nicotiana tabacum Rubisco large and small subunits (UniProtKB

Accession Nos. P00876 and P69246), calculated using the ProtParam tool available at

http://web.expasy.org/protparam/ (Gasteiger et al., 2005). The essential amino acids (Arg,

His, Ile, Leu, Lys, Met, Phe, Thr, Trp, Val) constitute 49% of the total amino acid

content of N. tabacum RBCL and 44% of N. tabacum RBCS.
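The essential amino acid share quoted above can be recomputed from any protein sequence by simple residue counting. The sketch below is an illustration only, assuming the percentage is taken by residue number (as in a ProtParam composition table) rather than by mass. The function name and the toy sequence are hypothetical; the real RBCL and RBCS sequences would have to be retrieved from UniProtKB.

```python
# One-letter codes for the ten essential amino acids listed in the text:
# Arg, His, Ile, Leu, Lys, Met, Phe, Thr, Trp, Val.
ESSENTIAL = set("RHILKMFTWV")


def essential_percent(sequence):
    """Percentage of residues in `sequence` that are essential amino acids."""
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    essential_count = sum(1 for residue in seq if residue in ESSENTIAL)
    return 100.0 * essential_count / len(seq)


if __name__ == "__main__":
    # Toy peptide, not a Rubisco sequence: 6 of its 10 residues are essential.
    print(essential_percent("MKTAYIAKQR"))
```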

Rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase; EC 4.1.1.39) is one

of the largest enzymes in nature (~560 kDa). This enzyme is involved in the initial steps

of fixation of inorganic carbon by photosynthesizing organisms. This process starts

with the condensation of one molecule of CO2 with a five-carbon sugar, ribulose-1,5-

bisphosphate (RuBP), resulting in the formation of two molecules of 3-phosphoglycerate (3-

PGA). This carboxylation reaction is catalyzed by Rubisco and is the entry point for the

incorporation of atmospheric CO2 into sugars during photosynthesis. The Rubisco

holoenzyme in green plants, cyanobacteria and non-green algae is a hexadecameric

structure (L8S8, Form I) that includes eight identical chloroplast-encoded large subunit

polypeptides (~55 kDa) and eight nuclear-encoded small subunit polypeptides (~15 kDa)

(Rutner and Lane, 1967; Nishimura et al., 1973; Baker et al., 1977a, b).
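The approximate holoenzyme mass follows directly from the subunit stoichiometry given above. As a quick check, under the assumption that the rounded subunit masses of ~55 kDa and ~15 kDa are used (the function name is hypothetical):

```python
def holoenzyme_mass_kda(n_large, n_small, large_kda=55.0, small_kda=15.0):
    """Approximate mass of a Rubisco assembly from its subunit counts."""
    return n_large * large_kda + n_small * small_kda


if __name__ == "__main__":
    # Form I, L8S8: 8 x 55 + 8 x 15 = 440 + 120 kDa.
    print(holoenzyme_mass_kda(8, 8))
```

With these rounded values the L8S8 assembly comes to 560 kDa, in agreement with the figure quoted in the text.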

Rubisco large subunit (RBCL)

The first rbcL gene was cloned and sequenced in the late 1970s from Zea mays (Coen et

al., 1977; McIntosh et al., 1980). Since then, rbcL sequences have been added to the NCBI

database at an exponential rate, mainly for use in plant systematics and

phylogeny (Clegg, 1993; Chase et al., 1993; Hasebe et al., 1995; Kellogg and Juliano,

1997). The lengths of encoded RBCLs have been reported to vary slightly among higher

plants, because of whole codon insertions/deletions/substitutions in the extreme 3' end of

rbcL genes (typically from about nucleotide position 1407 to 1431, numbering according

to maize RBCL) (Clegg, 1993; Kellogg and Juliano, 1997; Spreitzer and Salvucci, 2002).

The L2 dimer is the basic functional unit of plant Rubisco molecules (Schneider et al.,

1990). The active site of Rubisco is formed by the juxtaposition of the N-terminal domain

of one large subunit with the C-terminal domain (α/β barrel domain) of the second large

subunit of an L2 dimer. The two large subunits dimerize “head to tail”, with the

N-terminal domain of one covering the top (C-terminal side) of the barrel of the other. Each

dimer thus has two active sites, each involving a total of 20 residues from the barrel of

one and the N-terminal domain of the other (Lorimer, 1981; Hartman et al., 1984;

Igarashi et al., 1985; Kellogg and Juliano, 1997). Two well defined domains can be

identified in a Rubisco large subunit: a smaller N-terminal domain (residues 1-150) and a

larger C-terminal or α/β barrel domain (residues 151-475). Figure 1.10 shows the

structures of RBCL, RBCS and the holoenzyme from Nicotiana tabacum (PDB ID

1EJ7, Duff et al., 2000). Comparison of the amino acid sequence of the large subunit of

spinach Rubisco (Zurawski et al., 1981) with the large subunit of a form II Rubisco (a

dimer of two large subunits only) from Rhodospirillum rubrum (Hartman et al., 1984;

Nargang et al., 1984) showed only 28% similarity; but a striking similarity was seen in

the overall three dimensional structure of the L2 dimers of these two very different

Rubisco molecules (Schneider et al., 1990; Knight et al., 1990), suggesting conservation

of overall structure of large subunits in different forms.

Rubisco degradation in plants

Rubisco constitutes a significant portion of the total leaf nitrogen in plants, and

once its enzymatic functions are fulfilled, proteolytic degradation of the holoenzyme

occurs in senescing leaves leading to remobilization of the reserved nitrogen (Peterson

and Huffaker, 1975; Friedrich and Huffaker, 1980; Fischer and Feller, 1994; Feller, 1990;

Crafts-Brandner et al., 1996, 1998; Feller et al., 2008). Rubisco synthesis takes place at

very high rates during leaf expansion and its net degradation starts shortly after full leaf

expansion, when Rubisco becomes a major nitrogen source for other plant parts like

young leaves and developing fruits (Mae, 2004). Rubisco catabolism and re-localization

of the amino acids is controlled by many factors apart from natural senescence (Feller

and Fischer, 1994). In vivo Rubisco catabolism has been reported to occur in response to

several environmental and endogenous factors (Nooden et al., 1997; Demirevska-Kepova

and Feller, 2004; Marin-Navarro and Moreno, 2006; Feller et al., 2008); for example,

nitrogen limitation (Crafts-Brandner et al., 1996, 1998), dark-induced senescence (Feller

and Fischer, 1994) and harsh climatic conditions like heat, drought and waterlogging

(Herrmann and Feller, 1998; Demirevska-Kepova and Feller, 2004) have been reported to

induce rapid Rubisco degradation in plants. Rubisco degradation has been reported to be

accelerated in maize after fruit removal (Crafts-Brandner and Poneleit, 1987). This

process has been reported to vary among species and genotypes (Crafts-Brandner and

Poneleit, 1987; Nakano et al., 1995). Insect herbivory has also been reported to induce

senescence-like symptoms such as chlorophyll loss and Rubisco degradation, possibly

leading to reutilization of the amino acids to synthesize plant defense proteins (see Table

1.3).

The proteolytic digestion of Rubisco in plants during senescence and stress

conditions is not well understood. Catalytically active Rubisco has been reported to be

protected against proteolytic degradation (Mulligan et al., 1988; Houtz and Mulligan,

1991; Khan et al., 1999). Oxidative modification of cysteine residues has been reported

to enhance the proteolytic susceptibility of Rubisco (Penarrubia and Moreno, 1990;

Mehta et al., 1992; Desimone et al., 1996; Ishida et al., 1997, 1999; Roulin and Feller,

1998; Moreno et al., 2008). Oxidative stress has been reported to stimulate intermolecular

crosslinking of Rubisco subunits by disulfide linkages within the holoenzyme, inhibition

of enzyme activity and rapid translocation to the chloroplast membrane, leading to

degradation (Mehta et al., 1992). Endogenous proteolytic activities belonging to different

classes and active at different pH have been reported to be involved in proteolytic

degradation of Rubisco subunits (Wittenbach, 1978; Wittenbach, 1979; Miller and

Huffaker, 1982; Shurtz-Swirski and Gepstein, 1985; Paech and Dybing, 1986; Casano et

al., 1989; Yoshida and Minamikawa, 1996). Several reports suggest that Rubisco can be

degraded in intact chloroplasts (Ragster and Chrispeels, 1981; Mitsuhashi et al., 1992;

Desimone et al., 1996, 1998; Roulin and Feller, 1997, 1998; Ishida et al., 1998; Zhang et

al., 2007; Feller et al., 2008). A Rubisco degrading metalloproteinase has been reported

from pea chloroplasts (Bushnell et al., 1993). Another prevailing hypothesis is that

vacuolar enzymes are involved in the proteolytic digestion of Rubisco (Lin and

Wittenbach, 1981; Miller and Huffaker, 1981, 1982; Thayer and Huffaker, 1984; Bhalla

and Dalling, 1986; Zhang et al., 2006). Whether Rubisco proteolysis in chloroplasts

and in senescence-associated vacuoles (SAVs) is cooperative or represents alternative

pathways is not yet understood (Martinez et al., 2008; Feller et al., 2008).

Expression of recombinant proteins in Escherichia coli

E. coli is by far the most extensively used prokaryotic expression system due to

its ability to grow on inexpensive carbon sources, rapid biomass accumulation,

amenability to high cell-density fermentations, availability of extensive genetic

information and a large number of cloning vectors and mutant host strains (Baneyx, 1999;

Baneyx and Mujacic, 2004). The T7 promoter based pET expression system

(commercialized by Novagen) is the most preferred system for expression of recombinant

proteins (Studier et al., 1990; Studier, 1991; Dubendorff and Studier, 1991). This system

offers several advantages, such as precise control over target protein expression, hybrid

promoters, multiple cloning sites for the incorporation of different fusion partners and

protease cleavage sites, along with a high number of genetic backgrounds modified for

various expression purposes (Sorensen and Mortensen, 2005). Figure 1.11 shows the map

of a pET vector. For the pET expression system, BL21 and its derivatives are the most

widely used E. coli hosts. A number of BL21 derivatives have been developed to

overcome the problems of codon bias and improper folding of heterologous proteins

expressed in E. coli. For example, Origami™ and Origami B strains allow formation of

properly folded disulfide-containing recombinant proteins (Prinz et al., 1997). To

circumvent the problem of codon bias, Rosetta™ and RosettaBlue™ strains have been

developed, which supply the tRNAs for six codons used rarely in E. coli (Kane, 1995;

Kurland and Gallant, 1996). Another strategy applied for high throughput recombinant

protein expression and purification involves use of affinity tags. Affinity tags like

maltose binding protein (MBP) and glutathione S-transferase (GST) have been shown to

have a beneficial effect on solubility of recombinant proteins expressed in E. coli (Kapust

and Waugh, 1999; Smith, 2000). However, these tags can interfere with proper folding,

function and crystallization of heterologous proteins (Braun and LaBaer, 2003). Smaller

tags such as the His6-tag are a popular choice because their small size poses less risk of

steric hindrance and they bind strongly but reversibly to metal chelate adsorbents (Hochuli et

al., 1988; Crowe et al., 1994; Braun and LaBaer, 2003; Knecht et al., 2009). His6-tag is

particularly useful for heterologous proteins typically expressed in inclusion bodies,

because it functions even in denaturing conditions and the tagged protein can be purified

using affinity chromatography in the presence of denaturants (Bornhorst and Falke, 2000;

Braun and LaBaer, 2003).
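Whether a codon-bias-corrected host is worth considering can be judged by counting rare codons in the coding sequence to be expressed. The sketch below is a minimal illustration. The codon set used (AGG, AGA, AUA, CUA, CCC and GGA, written here with T in place of U) is an assumption based on the set usually quoted for Rosetta-type strains; the text itself does not name the six codons. The function name and the example sequence are hypothetical.

```python
from collections import Counter

# Assumed set of codons that are rare in E. coli (DNA alphabet).
RARE_CODONS = {"AGG", "AGA", "ATA", "CTA", "CCC", "GGA"}


def rare_codon_report(cds):
    """Return (per-codon counts of rare codons, fraction of all codons that are rare)."""
    seq = "".join(cds.split()).upper().replace("U", "T")
    if not seq or len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    counts = Counter(codon for codon in codons if codon in RARE_CODONS)
    return counts, sum(counts.values()) / len(codons)


if __name__ == "__main__":
    # Toy open reading frame: ATG AGG AGA CTA CCC GGA TAA
    counts, fraction = rare_codon_report("ATGAGGAGACTACCCGGATAA")
    print(dict(counts), round(fraction, 3))
```

A high fraction, or clusters of consecutive rare codons, would argue for a tRNA-supplemented host or for codon optimisation of the insert.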

Inclusion bodies (IBs) and recombinant proteins

Overproduction of heterologous proteins that require post-translational

modifications and disulfide bond formation for correct folding and functional activity

often leads to misfolding and segregation of the recombinant protein into insoluble

aggregates called inclusion bodies (Baneyx, 1999). The structure and mechanism of

formation of inclusion bodies in E. coli are not well understood (Villaverde and Carrio,

2003). Significant features of protein aggregates in inclusion bodies are the existence of

native-like secondary structure of the expressed protein and resistance to proteolytic

degradation (Przybycien et al., 1994; Oberg et al., 1994; Umetsu et al., 2004; Singh and

Panda, 2005). Inclusion body formation during recombinant expression is generally undesirable,

but it offers several advantages, viz. (a) expression of a very high level

of target protein, which typically accounts for 80-95% of the inclusion body material; (b)

easy isolation of the inclusion bodies from cells due to differences in their size and

density as compared with cellular contaminants, and (c) resistance to proteolytic attack

by cellular proteases (Thatcher and Hitchcock, 1994; Baneyx and Mujacic, 2004; Singh

and Panda, 2005). The recovery of expressed protein from inclusion bodies essentially

involves isolation and washing of inclusion bodies from E. coli cells, solubilization, and

refolding of the target protein (Clark, 1998; Lilie et al., 1998; Rudolph et al., 1997; Clark,

2001; Vallejo and Rinas, 2004). Efficient refolding of the target protein from inclusion

bodies requires considerable optimization. Figure 1.12 shows a schematic representation of

the different steps that can be followed for recovery of refolded proteins from the

inclusion bodies (Clark, 2001; Vallejo and Rinas, 2004). There are many reports of the

recovery and refolding of active recombinant proteins from E. coli inclusion bodies

(Yesilirmak and Sayers, 2009). Examples include refolding of human gamma interferon

from inclusion bodies using L-arginine (Arora and Khanna, 1996; Babu et al., 2000);

Arabidopsis thaumatin-like protein (ATLP3) (Hu and Reddy, 1997); refolding of

Solanum nigrum osmotin-like protein (SnOLP) using reduced:oxidized glutathione redox

buffer (Campos et al., 2008); solubilization of soybean RHG1-LRR domain protein from

inclusion bodies with urea buffer and refolding by removing the urea in presence of

arginine and reduced:oxidized glutathione buffer (Afzal and Lightfoot, 2007).

Recombinant expression of Rubisco in E. coli

The genes encoding Rubisco large subunits from maize and wheat were the

first plant genes reported to be expressed in an E. coli system (Gatenby et al., 1981; Gatenby

and Castleton, 1982). The full length maize RBCL was exclusively found in the insoluble

fraction of E. coli cultures (Gatenby, 1984). Several groups reported the recombinant

expression of active cyanobacterial Rubisco when both the subunits were expressed in E.

coli (Christeller et al., 1985; Gatenby et al., 1985; Gurevitz et al., 1985; Tabita and Small,

1985). In attempts to obtain active recombinant Rubisco from higher plants, the maize

RBCL and wheat RBCS were co-expressed in E. coli, but inactive RBCL-RBCS

aggregates were found predominantly in the insoluble fraction, suggesting the

involvement of other factors (Gatenby et al., 1987). An active hybrid Rubisco was

recombinantly expressed in E. coli using cyanobacterial RBCL and wheat RBCS (van der

Vies et al., 1986). It was demonstrated that the assembly of active prokaryotic

Rubiscos from Anacystis nidulans and R. rubrum in recombinant E. coli cultures requires the

endogenous E. coli GroEL and GroES heat shock proteins

(Goloubinoff et al., 1989). Similarly, Rubisco from different bacterial sources

(purple bacteria and cyanobacteria) was recombinantly expressed in E. coli cells to study

the role of DnaK chaperone system in its proper folding (Checa and Viale, 1997). It has

now been established that in vivo protein folding and biological assembly of some

macromolecules such as Rubisco may require a complex cellular machinery of molecular

chaperones (Ellis and Hemmingsen, 1989; Gething and Sambrook, 1992; Hartl, 1996;

Frydman, 2001; Hartl and Hayer-Hartl, 2002).

MALDI-TOF peptide mass fingerprinting

Mass spectrometry (MS) based on soft ionization techniques viz., MALDI

(Matrix-assisted laser desorption/ionization) (Tanaka et al., 1988; Karas and Hillenkamp,

1988) and ESI (Electrospray ionization) (Yamashita and Fenn, 1984a, b), is a rapid and

sensitive tool for the analysis of proteins and peptide mixtures. The basic principle of

mass spectrometry (MS) is to generate ions from either inorganic or organic compounds

by any suitable method, to separate these ions by their mass-to-charge ratio (m/z) and to

detect them qualitatively and quantitatively by their respective m/z and abundance.

MALDI-TOF-MS has become a widely applied, powerful analytical tool in various fields

of biological sciences (Schluter et al., 2003). MALDI-TOF is ideal for biological samples

because it is fast, simple, sensitive (low fmol range), accurate (low ppm range),

compatible with phosphate and Tris buffers, tolerant to low levels of contaminants such

as salts and surfactants and can be automated (Thiede et al., 2005; Pan et al., 2007). In

most cases, sequence-specific peptide fragmentation or peptide mass fingerprinting

(PMF), also called peptide mass mapping, is sufficient to identify the protein(s) in any

sample (Henzel et al., 1993; Aebersold and Goodlett, 2001; Thiede et al., 2005). In PMF,

the protein/protein mixture is digested with a sequence specific enzyme. Trypsin is

usually the favored enzyme for PMF because it is relatively cheap, highly effective, and

generates peptides with an average size of about 8–10 amino acids, ideally suited for

analysis by MS (Thiede et al., 2005). Before transfer into the ionisation source, the

peptide or protein samples are mixed with a matrix (an organic component) and the

mixtures are placed to crystallize in small droplets on a target. The energy required for

ionization of the sample is provided by a laser. The matrix crystal absorbs energy and

converts it to heat leading to localized “explosion” of the sample/matrix (the plume jet).

Proton transfer in the resulting plasma yields the ions ([M+H]+) and, being in the gas phase,

these ions are accelerated by an electric field into the analyzer and separated according to

their m/z values. MALDI mainly produces singly charged ions, so the m/z value will in most

cases equal the mass of the peptide plus one proton ([M+H]+). The

obtained m/z values are matched against theoretically calculated m/z data of peptides

available in the database and a score depending on the correlation is given.
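The matching step of PMF can be illustrated with an in silico tryptic digest followed by comparison of observed peaks with calculated [M+H]+ values. The sketch below makes several assumptions that are not in the text: cleavage C-terminal to Lys/Arg except before Pro, no missed cleavages, standard monoisotopic residue masses, a 50 ppm tolerance, and a naive "fraction of peptides matched" figure in place of a real scoring scheme such as those used by database search engines. All function names and the toy sequence are hypothetical.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(MONO[residue] for residue in peptide) + WATER + PROTON


def match_peaks(observed_mz, protein_sequence, tol_ppm=50.0):
    """Return (list of (observed m/z, peptide) matches, fraction of peptides matched)."""
    theoretical = {pep: mh_plus(pep) for pep in tryptic_digest(protein_sequence)}
    matches = [
        (mz, pep)
        for mz in observed_mz
        for pep, calc in theoretical.items()
        if abs(mz - calc) / calc * 1e6 <= tol_ppm
    ]
    matched_peptides = {pep for _, pep in matches}
    return matches, len(matched_peptides) / len(theoretical)


if __name__ == "__main__":
    print(tryptic_digest("AAKGGRPLLKM"))
    print(match_peaks([289.187, 500.0], "AAKGGRPLLKM"))
```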

Post translational/chemical modifications of proteins/peptides can complicate

MALDI-TOF PMF analyses, because these modifications affect the observed masses.

These PTMs may be natural or may arise inadvertently during sample handling (Karty et

al., 2002; McCarthy et al., 2003; Froelich and Reid, 2008). However, software tools such as

FindMod (http://web.expasy.org/findmod/; Wilkins et al., 1999;

Gasteiger et al., 2005) can examine PMF data for mass differences between

experimental and theoretical masses, and predict the nature and site of modification(s)

based on the mass difference. Table 1.5 lists the various sources used for MS based

protein identification. For customized applications, for example proteomic studies of

Lepidoptera, specific databases like ButterflyBase (http://www.butterflybase.org) can be

used.
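The idea of inferring a modification from the difference between an experimental and a theoretical peptide mass can be sketched as a simple lookup. This is not the FindMod algorithm, only an illustration of the principle. The short table of monoisotopic mass shifts holds standard values for a few common modifications; the function name, the tolerance and the example masses are hypothetical.

```python
# Monoisotopic mass shifts (Da) of a few common modifications.
COMMON_MODIFICATIONS = {
    "deamidation": 0.98402,
    "methylation": 14.01565,
    "oxidation": 15.99491,
    "acetylation": 42.01057,
    "carbamidomethylation": 57.02146,
    "phosphorylation": 79.96633,
}


def suggest_modifications(observed_mass, theoretical_mass, tol_da=0.02):
    """List modifications whose mass shift explains observed minus theoretical mass."""
    delta = observed_mass - theoretical_mass
    return sorted(
        name
        for name, shift in COMMON_MODIFICATIONS.items()
        if abs(delta - shift) <= tol_da
    )


if __name__ == "__main__":
    # Hypothetical peptide observed about 16 Da heavier than calculated.
    print(suggest_modifications(1016.495, 1000.500))
```

A mass difference alone cannot locate the modified residue; that requires knowledge of which residues in the peptide can carry the modification, or MS/MS data.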

Table 1.5: Sources for MS-Based Protein Identification Tools

Sponsor (Application) | Uniform Resource Locator (URL)
Eidgenossische Technische Hochschule (MassSearch) | http://cbrg.inf.ethz.ch
European Molecular Biology Laboratory (PeptideSearch) | http://www.mann.emblheidelberg.de
Swiss Institute of Bioinformatics (ExPASy) | http://www.expasy.ch/tools
Matrix Science (Mascot) | http://www.matrixscience.com
Rockefeller University (PepFrag, ProFound) | http://prowl.rockefeller.edu
Human Genome Research Center (MOWSE) | http://www.seqnet.dl.ac.uk
University of California (MS-Tag, MS-Fit, MS-Seq) | http://prospector.ucsf.edu
Institute for Systems Biology (COMET) | http://www.systemsbiology.org
University of Washington (SEQUEST) | http://thompson.mbt.washington.edu/sequest

(Source: Aebersold and Goodlett, 2001)

Tandem mass spectrometry (MS/MS) and de novo sequencing

PMF is usually complemented by tandem mass spectrometry (MS/MS) for

unambiguous identification of proteins in a complex mixture and for confirmation of

results obtained by PMF. MS/MS provides sequence specific fragmentation patterns of

individual peptides that can be used for database searches. A high throughput de novo

sequencing approach may be critical for detection of amino acid polymorphisms,

characterization of post translational modifications (PTMs) and identification of proteins

not represented in sequence databases (Mann and Wilm, 1994; Taylor and Johnson, 2001;

Standing, 2003; Pan et al., 2010). Partial sequences obtained in MS/MS, called “peptide

sequence tags”, together with the molecular weight, form the unique signature of precursor

peptides (Mann and Wilm, 1994) and can be used reliably to resolve ambiguities

observed due to amino acid substitutions and unexpected PTMs (Mann and Wilm, 1994;

Suckau et al., 2003; Pan et al., 2010). Fragment ions called “post-source decay” or PSD

ions, which are produced during the flight in the field-free drift region of a MALDI/TOF-

MS have been used for peptide sequencing (Spengler et al., 1992a, b; Suckau and

Cornett, 1998). The PSD spectra of peptides contain a wealth of sequence-specific a, b

(generated from the N-terminus of the fragmented peptide), y (generated from the C-terminus of

the fragmented peptide) and i ions (internal ions) (Biemann, 1988; Papayannopoulos,

1995). These ion series ladders are used to read the sequence of the peptide. In addition to

identification, the post-translational modifications (PTMs) of proteins have been

routinely detected by MALDI-TOF MS/MS (Huberty et al., 1993; Patterson and Katta,

1994; Crimmins et al., 1995; Mann and Talbo, 1996; Wilkins et al., 1999).
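How a b or y ion ladder is read can be illustrated by calculating the singly charged b and y series of a peptide and then recovering residues from the mass differences between neighbouring ions. The sketch below assumes standard monoisotopic residue masses and singly protonated fragments, and it ignores a ions, internal ions and neutral losses. The function names, the tolerance and the example peptides are hypothetical.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def by_ions(peptide):
    """Return (b ions, y ions) as lists of singly charged m/z values.

    b_i contains the first i residues; y_i contains the last i residues
    plus one water. Both series run from i = 1 to len(peptide) - 1.
    """
    masses = [MONO[residue] for residue in peptide]
    b_series, y_series = [], []
    running = 0.0
    for mass in masses[:-1]:
        running += mass
        b_series.append(running + PROTON)
    running = 0.0
    for mass in reversed(masses[1:]):
        running += mass
        y_series.append(running + WATER + PROTON)
    return b_series, y_series


def residues_from_ladder(ions, tol_da=0.01):
    """Translate differences between consecutive ions of one series into residues."""
    calls = []
    for lower, upper in zip(ions, ions[1:]):
        delta = upper - lower
        hits = sorted(aa for aa, mass in MONO.items() if abs(mass - delta) <= tol_da)
        calls.append("/".join(hits) if hits else "?")
    return calls


if __name__ == "__main__":
    b, y = by_ions("GLK")
    print(b, y)
    print(residues_from_ladder(b))  # Leu and Ile are isobaric and cannot be told apart
```

The b series is read from the N-terminus and the y series from the C-terminus, so the two ladders give the same sequence in opposite directions and can be used to cross-check each other.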

Uses of MALDI-TOF and associated techniques in insect-plant interaction studies

Mass spectrometry has been used extensively to study the components

involved in insect-plant interactions and also to study whole insect gut/plant

proteomes. For example, a whole “shotgun” MS-based proteomics approach was employed

for identifying JIPs accumulating in Manduca sexta larval midguts upon feeding on

transgenic tomato plants which were either insensitive to JA treatment or overproducing

the plant defense proteins (Chen et al., 2005). In another report, the interaction of

Capsicum annuum PIs (CanPIs) with H. armigera gut proteases was studied using MS in

combination with intensity fading assay technique (Mishra et al., 2010). Similarly,

comparative protein profiling of H. armigera midguts fed on host based (chickpea) and

non-host based (Cassia tora) diets was done using LC-MS (Dawkar et al., 2011).

Recently, LC-ESI MS/MS has also been used for the study of Spodoptera exigua

caterpillar-specific posttranslational modification of Arabidopsis thaliana soluble leaf

proteins (Thivierge et al., 2010). Other examples of proteome studies include

characterization of the midgut lumen proteome of Helicoverpa armigera using 2D gel

electrophoresis and de novo MS/MS sequencing (Pauchet et al., 2008) and protein

profiling of sixth instar Spodoptera litura larval midguts using HPLC-ESI-MS/MS

shotgun analysis (Liu et al., 2010). In addition, two-dimensional gelatin zymograms

coupled with MALDI/TOF-MS have been employed for the analysis of pineapple and

green kiwi fruit proteinases (Larocca et al., 2010a, b). Similarly, cellulose and xylan

degrading enzymes were identified from larval guts of Asian longhorned beetle,

Anoplophora glabripennis (Geib et al., 2010).

Scope of the thesis

Protein substrates used commonly for measuring total protease activity in insect

gut samples include casein, BSA and hemoglobin. These animal proteins are readily

available but are not physiological substrates for herbivores. Phytophagous lepidopteran

larvae consume enormous quantities of green tissues containing the photosynthetic

enzyme Rubisco. Rubisco was recognized early as an ideal candidate protein substrate

for measuring larval gut protease activities. However, its digestibility has always been

poor. Possible obstacles to the successful application of this protein substrate include

its structural resilience to proteolysis as well as the labor and cost of purifying Rubisco

from host plants. Hence, this study investigates the potential and applicability of the

recombinant large subunit of Rubisco (RBCL) as a protein substrate for midgut proteases

of three species of phytophagous Lepidoptera whose larvae are known to feed on

different plants. Helicoverpa armigera (the bollworm) is a devastating polyphagous pest

in India; Pieris brassicae (the large white butterfly) is a serious, recurring pest of

cultivated crucifers, while Antheraea assamensis (the muga silkworm) is a beneficial,

semi-domesticated insect reared on the leaves of Persea bombycina (Som) and Litsea

monopetala (Sualu). Larval gut proteases of these insects include serine proteinases with

high pH optima. The gut physiology of these insects is complex. Multiple gut serine

proteinases that are differentially expressed in response to dietary factors have been reported from

these insects. Most of these studies used synthetic substrates and/or animal protein

substrates viz. casein. Digestion of a common host plant protein substrate by gut

proteases of these phytophagous larvae feeding on different foods has not been

investigated. Hence, this dissertation research work was carried out with the following

objectives:

(1) Isolation of rbcL genes from Persea bombycina and Litsea

monopetala (family Lauraceae; host plants of the saturniid silkworm Antheraea

assamensis), Nicotiana tabacum (family Solanaceae; host plant of the noctuid

pest Helicoverpa armigera) and Brassica oleracea var. botrytis (family

Cruciferae; host plant of the pierid pest Pieris brassicae).

(2) Expression and purification of the large subunit of Rubisco as tagged,

recombinant proteins in E. coli.

(3) Evaluation of cognate (from the host plant) versus heterologous

(from a non-host plant) recombinant RBC-LS as substrates for gut proteases

from insects fed on different foods.

(4) Evaluation of casein, recombinant RBC-LS, and Rubisco as

substrates for mammalian serine proteinases, and gut proteases from insects

fed on different foods in the presence of protease inhibitors (PIs) and the reducing

agent dithiothreitol (DTT).

(5) Substrate zymography using copolymerized reco-RBCL and

Rubisco to visualize gut proteases from insects fed on different diets.

(6) MALDI-TOF peptide mass fingerprinting and MS/MS of

Rubiscolytic products from in-gel digestion by midgut proteases and trypsin.

This dissertation is organized into five chapters. Chapter one contains a brief introduction

to the subject and a review of pertinent literature, while Chapter two describes the materials and

methods employed. Chapter three contains results and discussion on recombinant RBCL

as a protease substrate for mammalian serine proteinases, larval H. armigera gut

proteases, larval P. brassicae gut proteases and gut proteases of larval A. assamensis fed on different

species of host plants. Chapter four describes MALDI-TOF analyses of recombinant

RBCL and Rubisco digested by various serine proteinases and insect gut proteases.

Chapter five contains a summary and conclusions of this dissertation. Cited references

are provided thereafter.
