Chapter 2
Protein Sorting and Transport
We are our proteins.
Russell F. Doolittle
The cell is the basic unit of life, and proteins are the biological workhorses
of the cell. For a protein to perform its function correctly, it must be located
in its intended organelle. This chapter briefly discusses the biology of
protein localization, i.e. protein sorting and translocation. Background
knowledge of the cell, its organelles and amino acids is essential for
understanding protein sorting, and the first part of the chapter presents this
background information. The chapter then proceeds to a general discussion
of proteins and how they are synthesized in the cell. This is followed by a
detailed description of protein sorting and translocation, which covers
the major sites of protein localization and how the organelles recognize
their native proteins. The chapter also mentions the wet-lab techniques
used to identify protein localization and ends by highlighting the need for
computational prediction of protein subcellular localization.
2.1 Background Biology
Knowledge of the cell, the different types of cell and amino acids is
indispensable for understanding protein sorting and translocation. Proteins
localize to various organelles or locations in the cell, or move out of the
cell as secretory proteins. The organelles include the nucleus, chloroplast,
mitochondrion, endoplasmic reticulum (ER), Golgi apparatus and peroxisome.
The next subsection describes the cell, the different types of cell and the
organelles in the cell.
2.1.1 Cell and its Organelles
According to cell theory, one of the basic principles of biology, the cell is the
fundamental unit of structure, function and organization in living organisms.
The hereditary information is contained within the cell in the form of
deoxyribonucleic acid (DNA), and this information is passed from cell to cell
during cell division. A typical human cell is about 10 µm in size, and the
human body has around 100 trillion cells.
Different Types of Cell
Life on earth can be classified into prokaryotes and eukaryotes according to
differences in cell structure. Prokaryotes are unicellular organisms such as
bacteria, whereas eukaryotes are often multicellular organisms such as plants
and animals. A prokaryotic cell is simpler than a eukaryotic cell, and the main
difference is the lack of a well-defined nucleus in prokaryotes. Eukaryotic
cells are so called because of the presence of a true nucleus, whose boundary
is defined by the nuclear membrane. In prokaryotes, the genetic material, DNA,
is concentrated in a region called the nucleoid, which is not enclosed by a
membrane. Eukaryotic DNA is linear and is complexed with proteins called
histones, whereas the DNA of prokaryotes is usually circular. The DNA content
of prokaryotes is only around 1 × 10^2 to 5 × 10^6 base pairs. Eukaryotes have
much more DNA, with the number of base pairs ranging from 1.5 × 10^7 to
5 × 10^9. The cytoplasm of eukaryotic cells contains many large and complex
organelles. An organelle has its own boundary of lipid membrane, which
separates it from the rest of the cell and thereby allows it to perform a
specialized function. Prokaryotes lack membrane-bound organelles such as the
Golgi apparatus, lysosome, peroxisome, mitochondrion and chloroplast. The
presence of membrane-bound organelles makes the eukaryotic cell more complex.
The membrane-bound structure of the organelles enhances the efficiency of their
functions by confining them within well-defined boundaries, thus limiting the
span of communication and movement to the organelle itself. The eukaryotic cell
is much bigger, typically 10-100 micrometers in diameter, compared to the
prokaryotic cell, which is typically 1 micrometer in diameter. The ribosomes of
prokaryotic cells are smaller than those of eukaryotic cells. A cytoskeleton of
the eukaryotic type, which gives structure to the cell, is not found in
prokaryotes. In prokaryotes, cell division happens in simple steps by binary
fission. In eukaryotes, cell division is of two types, mitosis and meiosis,
which are complex multi-stage processes [23].
Within the eukaryotes, cell structure differs between plant and animal cells.
A plant cell has a cell wall made of cellulose that is intricately cross-linked
with fibers of other carbohydrate molecules. This structure allows each cell to
withstand the increased internal pressure from osmosis when the plant absorbs
water. Animal cells do not have rigid cell walls, which allows them to take up
a variety of shapes. Chloroplasts, the site of photosynthesis, are present in
plant cells but absent in animal cells. In the chloroplast, carbon dioxide is
converted into sugar as part of photosynthesis. This is the reverse of energy
production in mitochondria, where sugar is broken down to carbon dioxide to
release energy. The vacuoles in plant cells are large compared to those in
animal cells. Plant cells communicate through pores in their cell walls
(plasmodesmata), which connect neighbouring cells and pass information.
Communication between animal cells uses an analogous system of gap
junctions [6].
Organelles in the Cell
Organelles are membrane-bound subunits of the cell that perform specific
functions. A typical animal cell is depicted in Figure 2.1. The most important
organelle in a eukaryotic cell is the nucleus. It is the storehouse of
hereditary information, the DNA. The nucleus is surrounded by a double
membrane, and communication with the cytosol takes place through nuclear pores
in this membrane. The DNA present in the cell is the same for all cells in
Organelles: 1. Nucleus 2. Nucleolus 3. Ribosome 4. Vesicle 5. Rough endoplasmic reticulum 6. Golgi apparatus 7. Cytoskeleton 8. Smooth endoplasmic reticulum 9. Mitochondria 10. Vacuole 11. Cytoplasm 12. Lysosome 13. Centrioles within centrosome. Source: Wikipedia
Figure 2.1: Typical animal (eukaryotic) cell, showing subcellular components.
the body of an organism. The genes in the DNA of each cell are expressed
only according to the requirements of that cell: depending on the specific
cell type, some genes may be turned on or off. At the time of cell division,
the DNA condenses into chromosomes. The nucleolus, inside the nucleus, builds
ribosomes, which move out of the nucleus to the cytoplasm. Ribosomes are the
site of protein synthesis. The mRNA, which is copied from the DNA sequence
of the gene, leaves the nucleus through the nuclear pores and binds to
the ribosome. At the ribosome, the mRNA is translated codon by codon with the
help of tRNA, which brings in the amino acid corresponding to each codon.
Peptide bonds are formed between these amino acids, building up the protein as
a linear chain. Depending on the translocation pathway of the protein being
synthesized, ribosomes may attach themselves to the ER. The ER is of two types,
rough ER and smooth ER. Ribosomes attach to the rough ER, which is involved in
protein translocation and sorting. The ER is a network of membranes extending
throughout the cytoplasm of the eukaryotic cell. It consists of tubular
membranes and flattened sacs, or cisternae, which appear to be interconnected.
The internal space enclosed by the ER is called the lumen. The ER is continuous
with the outer membrane of the nuclear envelope. The smooth ER is involved in
the synthesis of lipids and steroids. The Golgi apparatus is a stack of
flattened vesicles and is closely associated with the ER in protein sorting.
Vesicles that bud off the ER are received by the Golgi complex. Their contents
are further processed in the Golgi and packaged for further translocation
in vesicles that bud off the Golgi complex.
Lysosomes store hydrolases, enzymes capable of digesting molecules
such as proteins, carbohydrates and fats. Lysosomes are common in animal
cells but rare in plant cells. The peroxisome, which is present in both plant
and animal cells, resembles the lysosome in size but differs in internal
structure. The peroxisome protects the cell from the toxic hydrogen peroxide
that the cell itself produces. Vacuoles are membrane-bound organelles used for
temporary storage and transportation of molecules. In plant cells, the central
vacuole maintains turgor pressure. Mitochondria and chloroplasts have a
double-membrane boundary and their own DNA. The mitochondrion is the powerhouse
of the cell, generating ATP molecules. Muscle cells contain a large number of
mitochondria, as they have a high demand for energy. The chloroplast
is found only in plant cells and is the site of photosynthesis. The cytoskeleton
is the cellular skeleton that provides a dynamic structure to the cell. It
has an important role in maintaining cell shape and in enabling cellular
motion and intracellular transport.
In all the organelles discussed above, the biological functions are performed
by proteins. The next section gives details about the amino acids,
which are the building blocks of proteins.
2.1.2 Amino Acids
All proteins are polymers of alpha-amino acids. Alpha-amino acids have the
general formula H2NCHRCOOH, where R is an organic substituent. The
carbon atom next to the carboxyl group is called the alpha carbon. In
alpha-amino acids, both the amino and the carboxyl groups are attached to the
alpha carbon. The various alpha-amino acids differ in the side chain (R
group) attached to their alpha carbon.
The physicochemical properties of an amino acid are defined by its side
chain. These properties influence its interactions with other amino acids,
both within a single protein and between proteins, which in turn determine
the biological activity of the protein. An example of a physicochemical
property is hydrophobicity, the tendency of a molecule to avoid contact with
water. The hydrophobicity of an amino acid is determined by the polarity of
its side chain. Hydrophobic side chains cannot form hydrogen bonds with water
and tend to be buried within the hydrophobic core of the protein
or within the lipid portion of a membrane. The distribution of hydrophilic
and hydrophobic amino acids plays an important role in determining the tertiary
and quaternary structure of the protein. The amino acids that are encoded
by the standard genetic code and used for protein synthesis are called
proteinogenic amino acids or standard amino acids. The proteinogenic
amino acids are alanine, cysteine, aspartic acid, glutamic acid, phenylalanine,
glycine, histidine, isoleucine, lysine, leucine, methionine, asparagine, proline,
glutamine, arginine, serine, threonine, valine, tryptophan and tyrosine.
Alanine is very abundant and versatile. It is not particularly hydrophobic and
is non-polar. Since it is neutral, it can be located in both hydrophilic and
hydrophobic regions of a protein. The alanine side chain is inert and is
thus rarely directly involved in protein function. Cysteine is usually classified
as a hydrophobic amino acid. Within extracellular proteins, it is frequently
involved in disulphide bonds. Aspartic acid and glutamic acid are negatively
charged, polar amino acids. Being charged and polar, they prefer to be on
the surface of proteins, exposed to an aqueous environment. When
buried within the protein, they are frequently involved in salt bridges, in
which they pair with a positively charged amino acid to form stabilising
hydrogen bonds that can be important for protein stability. Phenylalanine is an
aromatic, hydrophobic amino acid and prefers to be buried in protein
hydrophobic cores. Its aromatic side chain allows phenylalanine to take part
in stacking interactions with other aromatic side chains. The phenylalanine
side chain is fairly non-reactive and is thus rarely directly involved in
protein function. Glycine has only a hydrogen atom as its side chain and
therefore has high conformational flexibility. It can occupy positions in
protein structures, such as tight turns, that are not accessible to other
amino acids. Histidine is a polar amino acid and is the most common amino acid
in protein active sites. Isoleucine is an aliphatic, hydrophobic amino acid.
The isoleucine side chain is very non-reactive and is thus rarely directly
involved in protein function. Lysine is a positively charged, polar amino acid
and is involved in salt bridges. Leucine is an aliphatic, hydrophobic amino
acid and prefers to be buried in protein hydrophobic cores. It is more common
in alpha helices than in beta strands. As methionine is a hydrophobic and
aliphatic amino acid, it prefers to be buried in protein hydrophobic cores.
Asparagine is a polar amino acid and generally prefers to be on the surface of
proteins. It is frequently present in protein active or binding sites. Proline
plays important roles in molecular recognition, particularly in intracellular
signaling. Glutamine is a polar amino acid and generally prefers to be on the
surface of proteins, exposed to an aqueous environment. Glutamines are
frequently found in protein active or binding sites.
The polar side chain is good for interactions with other polar or charged
atoms. Arginine is a positively charged, polar amino acid and is involved in
salt bridges. Serine is a polar amino acid. It can reside either within the
interior of a protein or on the protein surface. Its small size makes it a good
candidate for turns on the protein surface, where it is possible for the serine
side-chain hydroxyl oxygen to form a hydrogen bond with the protein backbone.
Serines are quite common in protein functional centres. The hydroxyl
group is fairly reactive, being able to form hydrogen bonds with a variety of
polar substrates. Threonine is a slightly polar amino acid. It can
reside either within the interior of a protein or on the protein surface and is
frequently found in protein functional centres. Its hydroxyl group is also
fairly reactive, being able to form hydrogen bonds with a variety of polar
substrates. Valine is an aliphatic, hydrophobic amino acid and is often buried
in protein hydrophobic cores. Tryptophan is an aromatic, hydrophobic amino
acid that prefers to be buried in protein hydrophobic cores. Being
aromatic, it is involved in stacking interactions with other aromatic side
chains. The proteinogenic amino acids, with their one-letter and three-letter
codes, are listed in the appendix.
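The residue descriptions above can be turned into a small lookup table. The following Python sketch is only an illustration: the coarse class names and the function class_composition are hypothetical, and the placement of glycine, proline and tyrosine (which the text does not classify explicitly) is an assumption.

```python
from collections import Counter

# Coarse residue classes following the descriptions in the text.
# Assumption: G and P are grouped with A as "nonpolar" and Y with the
# aromatics, because the text does not classify them explicitly.
RESIDUE_CLASS = {
    "D": "negative", "E": "negative",
    "K": "positive", "R": "positive",
    "H": "polar", "N": "polar", "Q": "polar", "S": "polar", "T": "polar",
    "A": "nonpolar", "G": "nonpolar", "P": "nonpolar",
    "C": "hydrophobic", "M": "hydrophobic",
    "I": "hydrophobic", "L": "hydrophobic", "V": "hydrophobic",
    "F": "aromatic", "W": "aromatic", "Y": "aromatic",
}


def class_composition(sequence):
    """Return the fraction of residues that fall in each coarse class.

    `sequence` is a string of one-letter amino acid codes.
    """
    sequence = sequence.upper()
    unknown = set(sequence) - set(RESIDUE_CLASS)
    if unknown:
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    counts = Counter(RESIDUE_CLASS[residue] for residue in sequence)
    return {cls: n / len(sequence) for cls, n in counts.items()}
```

For example, class_composition("DEKR") reports half negative and half positive residues. Amino acid composition of this kind is one of the simplest sequence-derived features that a computational localization predictor could use.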
Among these twenty amino acids, a subset is called the essential amino acids
because they cannot be synthesized by the human body and must be taken in
with food. In humans, the essential amino acids are lysine, isoleucine,
phenylalanine, leucine, methionine, tryptophan, threonine, valine, arginine
and histidine. The remaining standard amino acids are nonessential in the
sense that the body can synthesize them as needed. There are also a large
number of non-standard or non-proteinogenic amino acids, which are either not
found in proteins or not encoded by the standard genetic code.
The next section is a general description of proteins which are the vital
molecules in the cell.
2.2 Proteins
Proteins are the most abundant macromolecules in the cell. They are the
workhorses, carrying out vital biological functions. They play critical
roles in growth, in giving structure to the cell and in the maintenance of
tissues. Proteins have a wide range of functions, for example as enzymes,
hormones, antibodies, structural proteins, storage proteins and transport
proteins. Enzymes facilitate biochemical reactions and are vital to
metabolism. Hormones such as insulin, oxytocin and somatotropin are messenger
proteins, giving signals to coordinate various activities. Antibodies are
proteins that defend the body from antigens. Structural proteins such as
keratin, collagen, actin and elastin are fibrous and stringy, and help to
provide structure, stiffness and rigidity to otherwise fluid biological
components. Storage proteins such as ovalbumin and casein store amino acids.
Transport proteins such as hemoglobin and cytochromes are carrier proteins
that move molecules from one place to another around the body.
Proteins are made of amino acids, connected by peptide bonds between the
carboxyl and amino groups of adjacent amino acid residues.
One end of the amino acid chain has a free amino group and is called the amino
terminal or N-terminal. The other end, with a free carboxyl group, is called
the carboxyl terminal or C-terminal. The amino acid sequence is the order
in which amino acid residues appear in the protein, and it is written
starting from the N-terminal and ending at the C-terminal.
The linear order of the amino acids in a protein or peptide constitutes the
primary structure of the protein. A protein cannot perform its intended
function at the level of primary structure. It folds to form secondary,
tertiary and quaternary structure. Secondary structure is formed by hydrogen
bonds between the amino acids in the polypeptide. Secondary structures
are regularly repeating local structures, and multiple secondary structures
can be present in a single protein. Alpha helices and beta sheets are examples
of secondary structure. Tertiary structure is the three-dimensional structure
into which the polypeptide chain folds, either naturally or with the assistance
of chaperones. The function of a protein depends on its tertiary structure.
When a protein is denatured, its tertiary structure is disrupted and the
protein loses its activity. The tertiary structure is the spatial arrangement
of secondary structures interacting through hydrophobicity, salt bridges,
hydrogen bonds, disulfide bonds and post-translational modifications. In
quaternary structure, separate peptide chains, known as subunits, join together
to form a complex.
The sequence of amino acids in a protein is determined by the three-letter
codons in the messenger RNA (mRNA) from which the protein was translated.
The sequence of codons in the mRNA is, in turn, determined by the
sequence of the DNA from which the mRNA was transcribed. The
coding portions of DNA are known as genes. Thus, the instructions that define a
protein are written in the genes, which reside in the nucleus.
The next section discusses how proteins are synthesised in the cell.
2.3 Protein Biosynthesis
Protein synthesis is the process by which cells build proteins. It is a
multi-step process of transcribing the genetic information in the gene to mRNA
and translating that information, with the help of tRNA, into protein at
the ribosome. Protein biosynthesis differs between prokaryotes and eukaryotes.
The nucleus stores the genetic information, which is the set of instructions
for generating proteins. The genetic information is written in deoxyribonucleic
acid (DNA), a long molecule made up of nucleotides. There are four
nucleotides, distinguished by their bases: adenine, thymine, cytosine and
guanine. The long DNA molecule is packed into chromosomes, which are the
carriers of the hereditary information. Certain parts of the DNA contain
biologically meaningful instructions to form biomolecules. These parts are
known as genes. When a protein is needed in the cell, the gene that encodes
that particular protein has to be expressed. In gene expression, the double
helix of the DNA opens up and the gene is copied to mRNA. This is possible
because of the complementarity of the nucleotides. The mRNA undergoes several
processing steps, such as the splicing out of introns, the intervening
non-coding regions of the gene. The mRNA then travels out of the nucleus to the
ribosomes, which are in the cytoplasm. Proteins are made at the ribosome with
the help of tRNAs. Ribosomes are made of a small and a large subunit, which
together enclose the mRNA. The first step in translation is initiation, the
binding of the ribosome to the mRNA. The information in the mRNA is decoded
into amino acids by the rules of the trinucleotide genetic code. A table of
the trinucleotide codons and their corresponding amino acids is given in the
appendix. In the next step, called elongation, each triplet codon is recognized
by a tRNA with a matching anticodon, and the corresponding amino acid is added
to the growing polypeptide chain. When a stop codon is reached, translation
terminates and the polypeptide chain is released.
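The flow from gene to polypeptide described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a model of the cellular machinery: the function names transcribe and translate are hypothetical, splicing and other mRNA processing are ignored, the template strand is assumed to be given in the 3' to 5' direction, and translation is assumed to start at the first AUG.

```python
# Standard genetic code, built from the conventional compact layout in which
# the first, second and third codon positions each cycle through U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Base pairing between a DNA template base and the RNA base copied from it.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5):
    """Copy a DNA template strand (written 3'->5') into mRNA (written 5'->3')."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in template_3to5.upper())


def translate(mrna):
    """Translate mRNA from the first AUG until a stop codon ('*') is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    residues = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the polypeptide is complete
            break
        residues.append(amino_acid)
    return "".join(residues)
```

With the toy template "TACAAACCGATT", transcribe gives the mRNA "AUGUUUGGCUAA", and translate reads it as Met-Phe-Gly followed by a stop codon, i.e. the sequence "MFG" written from N-terminal to C-terminal.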
The next section gives a detailed description of how a protein synthesized
at the ribosome reaches its target location.
2.4 Protein Sorting and Transport
For a cell to function properly, each of its numerous proteins must be localized
to the correct organelle, such as the chloroplast, mitochondrion or lysosome.
Hormone receptor proteins must be delivered to the plasma membrane for the
cell to recognize hormones, and specific ion-channel and transporter proteins
are needed in the membrane for the cell to import or export the corresponding
ions and small molecules. Enzymes such as RNA and DNA polymerases must
be targeted to the nucleus for gene expression and DNA replication.
Proteolytic enzymes and catalase must go to lysosomes and peroxisomes,
respectively, to function properly. Protein hormones must be directed to the
cell surface and secreted. The process of directing each newly made protein to
its particular destination is critical to the organization and functioning of
eukaryotic cells and is referred to as protein targeting or protein
sorting [24].
2.4.1 Protein Sorting
Except for a small number of proteins encoded in the genomes of mitochondria
and chloroplasts, most of the proteins in a cell are encoded by nuclear DNA
and synthesized on ribosomes in the cytosol. To function properly,
these proteins must be distributed to their correct destinations in the cell. In
1999, Günter Blobel was awarded the Nobel Prize in Physiology or Medicine for
the discovery that "proteins have intrinsic signals that govern their transport
and localization in the cell." The sorting signals are present in the primary
amino acid sequence, mostly at the N-terminal. For further sorting
within the organelle, additional targeting information may be located in a
secondary targeting sequence, placed either adjacent to the original targeting
sequence or in other regions of the protein.
Proteins are translocated to their target location either cotranslationally
or posttranslationally. In cotranslational translocation, translocation
starts while the protein is still being synthesized on the ribosome. Proteins
targeted to the ER, Golgi apparatus, plasma membrane, lysosome, vacuole and
extracellular space use the SRP-dependent pathway and are translocated
cotranslationally. The N-terminal signal sequence of these proteins is
recognized by a signal recognition particle (SRP) while the protein is being
translated on a free ribosome, and synthesis pauses. The ribosome-protein
complex is then transferred to an SRP receptor on the ER. There, the nascent
protein is inserted into the translocon, which spans the ER membrane.
Transfer of the ribosome-mRNA complex from the SRP to the translocon
opens the gate of the translocon and allows translation to resume. In
secretory proteins, the signal sequence is cleaved from the polypeptide by
signal peptidase once it has been translocated into the ER. Within
the ER, chaperones help the protein to fold correctly. From the ER, proteins
are transported in vesicles to the Golgi apparatus, where they are further
processed and sorted for transport to endosomes, lysosomes or the plasma
membrane, or for secretion from the cell. Proteins that reside in the ER carry
ER retention signals that keep them in the ER.
Most of the proteins targeted to the mitochondrion, chloroplast, nucleus and
peroxisome are translocated posttranslationally. In contrast to
cotranslationally translocated proteins, these proteins are translated entirely
on free ribosomes in the cytosol. Once translation is complete, they are
released into the cytosol. These proteins, which enter the non-secretory
pathway, are sorted to their destination on the basis of their targeting
signals [25]. Mitochondrial and chloroplast targeting signals are cleaved off
once the protein has reached its destination. The targeting sequence of
mitochondrial proteins, the mitochondrial transfer peptide (mTP), has 3-5
nonconsecutive Arg or Lys residues, often with Ser and Thr, at the N-terminal
of the polypeptide chain. Glu and Asp residues are generally absent from this
region. In the chloroplast transit peptide (cTP), no common sequence motifs are
found, but the N-terminal is generally rich in Ser, Thr and small hydrophobic
amino acid residues and poor in Glu and Asp residues. For peroxisomal
proteins, the sorting signal is generally found at the extreme C-terminal,
usually as Ser-Lys-Leu, and is not cleaved off after the protein reaches its
destination. Proteins destined for the nucleus have a distributed sorting
signal, which is not cleaved off after sorting. The nuclear localization signal
usually consists of one cluster of 5 basic amino acids or two smaller clusters
of basic residues separated by around 10 amino acids.
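The signal features listed above can be expressed as simple sequence checks. The Python sketch below is a toy heuristic and not a validated predictor: the function targeting_hints is hypothetical, the 20-residue N-terminal window is an assumption, and the regular expression for the two-cluster nuclear signal (two basic residues, 9 to 11 arbitrary residues, two basic residues) is only one possible reading of "around 10 amino acids". The chloroplast transit peptide is left out because the text gives no motif for it.

```python
import re

# One run of five basic residues (Lys or Arg).
SINGLE_CLUSTER_NLS = re.compile(r"[KR]{5}")
# Assumption: "two smaller clusters separated by around 10 amino acids" is
# modelled as two basic residues, a 9-11 residue spacer, two basic residues.
TWO_CLUSTER_NLS = re.compile(r"[KR]{2}.{9,11}[KR]{2}")


def targeting_hints(sequence, n_window=20):
    """Return a list of sorting-signal features found in a protein sequence.

    `n_window` (assumed value) is how many N-terminal residues are inspected
    for an mTP-like composition.
    """
    seq = sequence.upper()
    hints = []

    n_terminal = seq[:n_window]
    basic = sum(n_terminal.count(residue) for residue in "RK")
    acidic = sum(n_terminal.count(residue) for residue in "DE")
    if basic >= 3 and acidic == 0:
        hints.append("mTP-like N-terminus")

    if seq.endswith("SKL"):
        hints.append("peroxisomal C-terminal SKL")

    if SINGLE_CLUSTER_NLS.search(seq) or TWO_CLUSTER_NLS.search(seq):
        hints.append("NLS-like basic cluster")

    return hints
```

For instance, targeting_hints("MDEPKKKRKVGG") flags only the basic cluster KKKRK. Real targeting signals are far more variable than these patterns, which is part of the motivation for the statistical and machine-learning predictors of subcellular localization.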
In the next section the major protein localization sites are discussed.
2.4.2 Major Locations
Proteins are sorted to their locations with the help of an address signal
present in their primary structure. Each organelle has a mechanism for
identifying its own proteins. In this section, important protein localization
sites, such as the nucleus, mitochondrion, chloroplast and peroxisome, and
secretory proteins are explained.
Endoplasmic Reticulum
The endoplasmic reticulum is the first branching point in protein sorting.
Figure 2.2 shows the nucleus, ER and Golgi apparatus in a eukaryotic cell. Most
of the proteins targeted for secretion or to the Golgi apparatus, plasma
membrane, vacuole or lysosome are translated on ribosomes bound to the
endoplasmic reticulum and enter the ER cotranslationally. Only a few proteins
enter the ER posttranslationally. Translation starts on free ribosomes in the
cytosol. Synthesis continues until the sorting signal, which is present at the
N-terminal, emerges. This sorting signal is recognized by the signal
recognition particle. The SRP binds to the sorting signal and translation
pauses. The complex of SRP, ribosome, polypeptide chain and
mRNA moves to the ER, and the polypeptide chain enters the ER through
the translocon. The translocon is a protein complex containing various
components used for protein translocation: the SRP receptor binds the SRP; the
ribosome receptor binds the ribosome and holds it in the correct position; the
pore protein forms the channel through which the growing polypeptide enters
the ER lumen; and the signal peptidase cuts off the signal once it has entered
the ER. After the SRP and ribosome are bound by the SRP receptor and ribosome
receptor respectively, GTP binds to the complex of SRP and SRP receptor and
translation resumes. This causes the transfer of the signal sequence into the
channel of the pore protein. The GTP is then hydrolysed and the SRP is
released. While the sorting signal remains bound to the pore protein, the
polypeptide grows into a loop and translocates into the ER lumen. When
polypeptide synthesis is finished, the signal peptidase cleaves off the sorting
signal, releasing the polypeptide into the ER lumen. After this, the ribosome
detaches from the ER and dissociates into its subunits, and the mRNA is
released. Inside the ER, the polypeptide chains are folded into their native
forms, usually with the help of molecular chaperones, which control the quality
of protein folding [23].
Integral membrane proteins of the plasma membrane or of the membranes of
the ER, Golgi apparatus and lysosome are first inserted into the membrane
of the ER. These proteins do not enter the lumen cotranslationally but are
anchored in the ER membrane by membrane-spanning α-helices that stop the
transfer of the growing polypeptide chain across the membrane.
Proteins travel along the secretory pathway in transport vesicles, which
bud from the membrane of one organelle and then fuse with the membrane
of another. Proteins are exported from the ER in vesicles that bud from
the transitional ER and carry their cargo through the ER-Golgi intermediate
compartment to the Golgi apparatus. Proteins that reside in the ER
have a retention signal at their C-terminal that brings them back to the
ER even if they are exported from it. Two such retention signals are
1. Nucleus 2. Nuclear pore 3. Rough endoplasmic reticulum (RER) 4. Smooth endoplasmic reticulum 5. Ribosome on the rough ER 6. Proteins that are transported 7. Transport vesicle 8. Golgi apparatus 9. Cis face of the Golgi apparatus 10. Trans face of the Golgi apparatus 11. Cisternae of the Golgi apparatus. Source: Wikipedia
Figure 2.2: Nucleus, ER and Golgi Apparatus in eukaryote cell
KDEL (Lys-Asp-Glu-Leu) and KKXX (two lysine residues followed by any
two amino acids), present at the C-terminal of the sequence. If the signal is
removed from an ER protein, the protein is transported to the Golgi and then
moves out of the cell. The ER retention signals do not prevent ER proteins from
being packaged and exported from the ER. Instead, these signals retrieve the
ER proteins from the Golgi apparatus or the ER-Golgi intermediate compartment
and return them to the ER through a recycling pathway. Specific recycling
receptors bind to the retention signals and bring the proteins back to the ER.
There are many retention signals other than KDEL and KKXX, but they are not
well characterized.
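Because both retention signals sit at the very end of the chain, checking for them only requires looking at the last four residues. The following Python fragment is a minimal sketch of that check; the function name er_retention_signal is hypothetical, and a match only indicates that the motif is present, not that the protein is actually retained.

```python
def er_retention_signal(sequence):
    """Return "KDEL", "KKXX" or None based on the last four residues."""
    tail = sequence.upper()[-4:]
    if len(tail) < 4:
        return None  # too short to carry either four-residue signal
    if tail == "KDEL":
        return "KDEL"
    if tail.startswith("KK"):
        return "KKXX"  # two lysines followed by any two residues
    return None
```

For example, er_retention_signal("MAAAKKTN") returns "KKXX", while a sequence ending in "AAAA" returns None.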
Golgi Apparatus
The Golgi apparatus is composed of flattened membrane-enclosed sacs called
cisternae and associated vesicles. It is a main center for protein sorting: it
receives proteins from the ER, processes them further and sorts them to their
target location, that is, lysosomes, endosomes, the plasma membrane or the
extracellular space. Proteins from the ER enter at the cis face of the
Golgi, which is convex in shape and oriented towards the nucleus. They are
transported through the Golgi and exit from its concave trans face.
Proteins that function within the Golgi have to be retained rather than
exported. All proteins known to be retained in the Golgi complex are associated
with the Golgi membrane, and their retention signals are present in the
transmembrane domain. This prevents these proteins from being packaged in the
transport vesicles that leave the trans-Golgi network.
Membranes
Most eukaryotic membrane proteins are inserted into the ER membrane using the
same translocon complex that is used for protein secretion. They are
threaded through the membrane by translocation until the process is interrupted
by a stop-transfer sequence, also called a membrane anchor sequence. These
membrane proteins are understood to follow the same targeting model as
secretory proteins. In contrast to secretory proteins, however, the first
transmembrane domain acts as the signal sequence and targets them to the ER
membrane. This results in the translocation of the amino terminus of the
protein into the ER lumen.
Transmembrane proteins span the entire membrane. Their transmembrane
regions are either α-helices or β-barrels. α-helical proteins are the
major category of membrane proteins and are often found in the inner
membranes of bacterial cells, the plasma membrane of eukaryotes and in the
outer membranes. β-barrel proteins are found in the outer membranes of
Gram-negative bacteria, the cell wall of Gram-positive bacteria, and the
outer membranes of mitochondria and chloroplasts. No common localization
signal has been observed for membrane proteins. Helical transmembrane
proteins are usually identified from the distribution of hydrophobic amino
acids: the transmembrane regions are significantly more hydrophobic than
an average stretch of sequence. The length of a transmembrane region
varies with the angle between the helix and the membrane and with the kind
of membrane the protein resides in; it is usually 14 to 36 residues long.
Cell membrane proteins are usually identified by the skewed distribution
of charges between inner and outer loops [26].
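The idea of locating helical transmembrane regions from the distribution of hydrophobic residues can be illustrated with a sliding-window hydropathy scan. The sketch below is only an illustration, not a method described in this chapter: it assumes the Kyte-Doolittle hydropathy scale, a window of 19 residues and a cut-off of 1.6, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
# (the choice of this scale is an assumption made for the illustration).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows whose mean hydropathy exceeds the threshold.

    Returns (start, end) pairs, 0-based with end exclusive.
    """
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score > threshold:
            if segments and i <= segments[-1][1]:
                # Window overlaps or touches the previous hit: extend it.
                segments[-1] = (segments[-1][0], i + window)
            else:
                segments.append((i, i + window))
    return segments


# Toy sequence: a poly-leucine stretch flanked by charged residues.
toy = "DDKKEE" * 3 + "L" * 22 + "DDKKEE" * 3
print(candidate_tm_segments(toy))
```

Residues outside the 20 standard amino acids raise a KeyError in this sketch. Real predictors also use the charge distribution of the loops, which is not modelled here.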
Extracellular Proteins
Extracellular proteins, or secreted proteins, are fundamental to
intercellular communication in multicellular organisms. The extracellular
accessibility of these proteins makes them ideal targets for protein
therapeutics. Virtually all protein-based therapeutic drugs on the market
target these secreted and cell-surface proteins. Secreted proteins and a
majority of cell-surface proteins possess an N-terminal address signal
known as the signal peptide [27]. The signal peptide (SP) is about 20-25
residues long. The enzyme signal peptidase (SPase) cleaves off the signal
peptide during the export process. Small and apolar residues such as
alanine are found at positions -1 and -3 relative to the cleavage site.
The N-terminal region of the signal peptide is usually positively charged.
The central region is hydrophobic, and leucine is the most common amino
acid in this region. The cleavage site region is usually populated with
small residues [26, 28, 29]. Secretion happens through different pathways,
the most important of which are the SRP-dependent (Signal Recognition
Particle) pathway [30,31] and the SRP-independent pathway. In the
SRP-dependent pathway, the nascent polypeptide chain is recognized by SRP,
translation is paused and the translation complex is brought to the SRP
receptor. There the polypeptide chain is translocated through the Sec
machinery as translation resumes. The SRP-independent pathway (known as
the Sec-dependent pathway in prokaryotes) involves post-translational
translocation and employs many proteins and the hydrolysis of ATP for
identification of the signal peptide and translocation. In prokaryotes,
the delta-pH or TAT (twin-arginine translocation) pathway is also used for
secretion [32,33]. It needs no ATP but requires a pH gradient across the
membrane. Proteins transported via this route contain a twin-arginine
motif in the N-terminal part of the signal peptide, and the signal peptide
is longer than others [26].
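The three signal peptide properties listed above (positively charged N-terminal region, hydrophobic central region, small residues at -3 and -1) can be turned into simple numeric features. The following is a minimal sketch under stated assumptions: the residue sets, the region boundaries (first 5 residues as N-terminal region, last 3 as cleavage region) and the function name are all hypothetical choices for illustration, and the cleavage position must be supplied by the caller.

```python
# Assumed residue groupings, chosen for illustration only.
SMALL = set("AGSCT")
HYDROPHOBIC = set("AVLIMFW")


def signal_peptide_features(seq, sp_len, n_len=5):
    """Describe a putative signal peptide of length sp_len.

    Cleavage is assumed to occur after residue sp_len, so the last residue
    of the peptide is position -1 and the third from last is position -3.
    """
    if sp_len <= n_len + 3 or sp_len > len(seq):
        raise ValueError("sp_len must exceed n_len + 3 and fit in the sequence")
    sp = seq[:sp_len].upper()
    n_region = sp[:n_len]
    h_region = sp[n_len:-3]  # central stretch before the cleavage region
    positive = sum(n_region.count(aa) for aa in "KR")
    negative = sum(n_region.count(aa) for aa in "DE")
    return {
        "n_region_charge": positive - negative,
        "h_region_hydrophobic_fraction":
            sum(aa in HYDROPHOBIC for aa in h_region) / len(h_region),
        "minus3_minus1_small": sp[-3] in SMALL and sp[-1] in SMALL,
    }


# Toy example: a 21-residue N-terminal peptide followed by mature protein.
print(signal_peptide_features("MKKTAIAIAVALAGFATVAQAAPKDN", sp_len=21))
```

Such features only describe a candidate peptide; deciding whether a sequence really carries a signal peptide requires a trained predictor.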
Nucleus
The nucleus is known as the control centre of the cell and is the largest
organelle in an animal cell. It is the storage place of the genetic
material, DNA. A eukaryotic nucleus and its subnuclear locations are shown
in Figure 2.3.
Figure 2.3: The nucleus of eukaryotic cell (labels: nuclear envelope with
outer membrane and inner membrane, chromatin as heterochromatin and
euchromatin, ribosomes, nuclear pore). Source: Wikipedia
Proteins are transported into the nucleus post-translationally and in a
folded state. Most of the
nuclear proteins are imported into the nucleus with the help of carrier
proteins (e.g. importins). These carrier proteins form a complex with the
proteins that are to be imported, and this complex is translocated through
the nuclear pore. Inside the nucleus, the complex dissociates and the
importin is shuttled back to the cytoplasm and reused [26]. The address
signal for the nucleus is known as the nuclear localization signal (NLS)
and is a short stretch of amino acids. Deleting the NLS from a nuclear
protein disrupts nuclear import, and adding an NLS to a non-nuclear protein
facilitates nuclear import. These properties have been widely used to
unravel NLS motifs experimentally [34-36]. Nuclear localization signals
can be present anywhere in the protein sequence. Since NLSs do not have
any particular consensus sequence, it is difficult to differentiate an NLS
from a non-NLS region [26]. An NLS is usually rich in positively charged
residues, since some of these residues bind to carrier proteins such as
importins [37]. Mutating these positively charged amino acids disrupts
nuclear import. However, there are also glycine-rich NLS motifs with few
positive charges, besides the monopartite and bipartite motifs [36, 38].
The monopartite motif consists of four basic residues and one
helix-breaking residue, and the bipartite motif consists of two clusters of
basic residues with a spacer of 9-12 amino acids in between [39,40]. These
patterns, however, are not unique to nuclear proteins and may well be
observed in many other proteins [25,36,41-43]. Other observed NLSs include
the 38-amino-acid M9 sequence and the repeated G-R motif [44]. These
signals are in general significantly less frequent than the monopartite
and bipartite NLSs. There are also signals for nuclear protein export and
retention [26].
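The monopartite and bipartite patterns described above can be searched for with simple pattern matching. The sketch below is illustrative only and rests on assumptions not stated in the text: basic residues are taken to be K and R, helix-breaking residues to be P and G, the monopartite motif is read as a window of five residues, and each basic cluster of the bipartite motif is taken to be two residues long. As the text notes, hits from such patterns are not specific to nuclear proteins.

```python
import re

BASIC = "KR"           # assumed set of basic residues
HELIX_BREAKERS = "PG"  # assumed set of helix-breaking residues

# Two basic residues, a spacer of 9-12 residues, two basic residues.
# The lookahead lets overlapping candidates be reported.
BIPARTITE = re.compile(r"(?=[KR]{2}.{9,12}[KR]{2})")


def find_monopartite(seq):
    """Start indices of 5-residue windows holding four basic residues
    and one helix-breaking residue."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 4):
        window = seq[i:i + 5]
        n_basic = sum(aa in BASIC for aa in window)
        n_break = sum(aa in HELIX_BREAKERS for aa in window)
        if n_basic == 4 and n_break == 1:
            hits.append(i)
    return hits


def find_bipartite(seq):
    """Start indices of candidate bipartite motifs."""
    return [m.start() for m in BIPARTITE.finditer(seq.upper())]


print(find_monopartite("MAPKKKRKVED"))          # toy sequence
print(find_bipartite("MAKRPAATKKAGQAKKKKLD"))   # toy sequence
```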
Mitochondrion
Mitochondria are known as the powerhouses of the cell, as they generate
most of the cell's supply of adenosine triphosphate (ATP) in the process of
cellular respiration by breaking down carbohydrates and fatty acids. A
typical mitochondrion is shown in Figure 2.4. Mitochondria consist of a
smooth outer membrane and an inner membrane separated by an intermembrane
space. The inner membrane forms numerous folds known as cristae. The space
inside the inner membrane is called the mitochondrial matrix and contains
the genetic material of the mitochondrion. The matrix and the inner
membrane represent the major working compartments of the mitochondrion. As
sugar is burned for fuel, a mitochondrion shunts various chemicals back and
forth across the inner membrane. Even though the mitochondrion has a genome
of its own, it does not code for the proteins necessary for DNA
replication, transcription and translation. All these proteins, the
proteins required for oxidative phosphorylation and the proteins that act
as enzymes have to be expressed from nuclear DNA and imported into the
mitochondrion. The double-membrane structure of the mitochondrion makes
protein import a difficult task. Proteins destined for the matrix have to
cross two membranes. Proteins destined for other locations have to be
re-sorted by a secondary targeting signal once they reach the
mitochondrion. The sorting signal of the mitochondrion is known as the
mitochondrial targeting peptide (mTP) and is on average 35 amino acids
long. The mTP binds to receptors on the surface of the mitochondrion. These
receptors are part of the TOM (Translocase of the Outer Membrane) complex
that directs translocation across the outer membrane. The individual
receptors on the TOM complex are TOM20, TOM22 and TOM5.
Figure 2.4: Typical mitochondrion (labels: inner membrane, outer membrane,
deoxyribonucleic acid (DNA)). Source: Wikipedia
From these receptors, proteins are transferred to the TOM40 pore protein
and translocated
across the outer membrane. The protein is transported via the GIP (general
import pore) complex through the outer mitochondrial membrane in an
ATP-requiring process. The proteins are then transferred to a second
protein complex in the inner membrane, the TIM (Translocase of the Inner
Membrane) complex, for translocation into the matrix. This translocation
requires an electrochemical hydrogen ion gradient across the inner
membrane [26]. After entering the mitochondrial matrix, the mTP is cleaved
off by the mitochondrial processing peptidase (MPP, also called matrix
processing peptidase) [45,46]. Some mitochondrial matrix proteins are then
cleaved again by the mitochondrial intermediate peptidase (MIP), which
removes an additional eight or nine residues from the N-terminus [47,48].
For some proteins, a second, adjacent targeting signal that resembles the
signal peptide for secretion is exposed after MPP cleavage. These proteins
are re-exported from the matrix to the intermembrane space (IMS), or
inserted into the inner membrane, in a process very similar to bacterial
protein secretion. Alternatively, the translocation over either of the
membranes is halted by a stop-transfer signal, which is specifically
recognised by a TOM or TIM component [26,49,50], and the protein is
subsequently inserted into the outer or inner membrane, respectively.
Figure 2.5: Typical chloroplast (labels: 1. outer membrane; 2.
intermembrane space; 3. inner membrane (1+2+3: envelope); 4. stroma; 5.
thylakoid lumen (inside of thylakoid); 6. thylakoid membrane; 7. granum
(stack of thylakoids); 8. thylakoid (lamella); 9. starch; 10. ribosome; 11.
plastidial DNA; 12. plastoglobule (drop of lipids)). Source: Wikipedia
The inner membrane metabolite carrier proteins of mitochondria contain
internal localization signals [51]. In mitochondrial targeting peptides
(mTPs), Arg, Ala and Ser are over-represented, while negatively charged
amino acid residues (Asp and Glu) are rare [51,52]. Other than this, there
are no obvious features that distinguish the mTP from other N-terminal
sequences. The degree of sequence conservation around the cleavage site is
also poor. Many mTPs have an arginine in position -2 or -3 relative to the
MPP cleavage site [53,54]. It is reported that the mTP forms an
amphipathic alpha-helix when bound to the receptor protein but adopts an
extended structure when processed by the MPP [55-58].
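The compositional bias of mTPs described above can be quantified directly from the N-terminal part of a sequence. The sketch below is a hedged illustration: the function name and the default window of 35 residues (taken from the average mTP length quoted earlier) are assumptions, and the cleavage position, when used, must be supplied by the caller rather than predicted.

```python
def mtp_features(seq, n_term=35, cleavage=None):
    """Composition features of a putative mitochondrial targeting peptide.

    n_term   : number of N-terminal residues examined (assumed default).
    cleavage : optional number of residues removed by MPP; if given, the
               residues at positions -3 and -2 are checked for arginine.
    """
    region = seq[:n_term].upper()
    features = {
        "frac_arg_ala_ser":
            sum(region.count(aa) for aa in "RAS") / len(region),
        "acidic_count": region.count("D") + region.count("E"),
    }
    if cleavage is not None:
        if cleavage < 3:
            raise ValueError("cleavage must be at least 3")
        # Index cleavage-1 is position -1, so this slice covers -3 and -2.
        features["arg_at_minus2_or_minus3"] = (
            "R" in seq[cleavage - 3:cleavage - 1].upper()
        )
    return features


# Toy sequence with an Arg/Ala/Ser-rich N-terminus.
print(mtp_features("MASRLLSRAARSALRAFSTEDDK", n_term=18, cleavage=17))
```

A high Arg/Ala/Ser fraction combined with few acidic residues is only weak evidence on its own, which is consistent with the poor sequence conservation noted in the text.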
Chloroplast
The chloroplast is a double-membrane-bound organelle present in
photosynthetic plants and algae. Figure 2.5 shows a typical chloroplast. In
addition to the inner and outer membranes of the envelope, chloroplasts
have a third internal membrane system, called the thylakoid membrane. The
thylakoid membrane forms a network of flattened discs called thylakoids,
which are frequently arranged in stacks called grana (singular: granum).
Because of this three-membrane structure, the internal organization of
chloroplasts is more complex than that of mitochondria. In particular, the
three membranes divide chloroplasts into three distinct internal
compartments: the intermembrane space between the two membranes of the
chloroplast envelope; the stroma, which lies inside the envelope but
outside the thylakoid membrane; and the thylakoid lumen [6]. The stroma is
the site of the dark reactions, more properly called the Calvin cycle.
Even though the chloroplast has a small genome of its own in the stroma,
the majority of chloroplast proteins are encoded in the nuclear genome and
post-translationally imported into the organelle.
Protein import into chloroplasts generally resembles mitochondrial
protein import. Proteins are targeted for import into chloroplasts by
N-terminal sequences of 30 to 100 amino acids, called chloroplast transit
peptides (cTPs), which direct protein translocation across the two
membranes of the chloroplast envelope and are then removed by proteolytic
cleavage. The transit peptides are recognized by the translocation complex
of the chloroplast outer membrane (the Toc complex), and proteins are
transported through this complex across the membrane. They are then
transferred to the translocation complex of the inner membrane (the Tic
complex) and transported across the inner membrane to the stroma. As in
mitochondria, the translocation requires energy in the form of ATP. In
contrast to mTPs, transit peptides are not positively charged, and the
translocation of polypeptide chains into chloroplasts does not require an
electric potential across the membrane [6].
Inside the chloroplast, the cTP is cleaved off by the stromal processing
peptidase (SPP). cTPs are rich in hydroxylated residues, especially
serine, and have a low content of acidic residues [51]. The cTPs of
different proteins vary from 20 to 120 residues in length. At the
N-terminus of the cTP, there is a conserved alanine next to the initial
methionine. A semiconserved motif, V-R-A-(:)-A-A-V, around the SPP cleavage
site (denoted by ":") has also been recognized [52]. The signal is not very
strong, and there are several proteins that are targeted to both
mitochondria and chloroplasts using identical sorting signals
[26,51,59,60].
Proteins destined for the lumen of the intra-chloroplastic thylakoid
compartment normally have a bipartite targeting sequence composed of an
N-terminal stroma-targeting cTP followed by a thylakoid lumen transfer
peptide (LTP) [61,62]. There are two different pathways from the
chloroplast stroma into the thylakoid lumen: the Sec-dependent pathway and
the delta-pH or twin-arginine translocation (TAT) pathway [63]. The
signals for the two pathways are very similar, the only significant
difference being that the TAT pathway proteins contain a twin-arginine (RR)
motif in the LTP (KR and RK may also be accepted). The -3, -1 motif found
at the SP cleavage site in secreted proteins is also present in LTPs, where
it is more strongly conserved [26,33,64].
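Since the twin-arginine motif is the main feature separating the two lumen-targeting pathways, a first check on a candidate LTP is simply to look for it. The sketch below is a minimal illustration with a hypothetical function name; it reports where RR occurs and, optionally, the KR and RK variants mentioned above. It does not decide which pathway a protein actually uses.

```python
def find_twin_arginine(ltp, allow_variants=False):
    """Return (index, motif) pairs for twin-arginine motifs in a peptide.

    With allow_variants=True the KR and RK dipeptides are reported too.
    Indices are 0-based positions of the first residue of the motif.
    """
    motifs = ("RR", "KR", "RK") if allow_variants else ("RR",)
    ltp = ltp.upper()
    return [(i, ltp[i:i + 2])
            for i in range(len(ltp) - 1)
            if ltp[i:i + 2] in motifs]


print(find_twin_arginine("ASRRALLG"))                       # strict RR only
print(find_twin_arginine("ASKRALLG", allow_variants=True))  # KR accepted
```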
Many proteins are needed in both mitochondria and chloroplasts. In
general, the targeting peptide of such a protein is intermediate in
character between the two specific ones. The targeting peptides of these
proteins have a high content of basic and hydrophobic amino acids and a low
content of negatively charged amino acids. They have a lower content of
alanine and a higher content of leucine and phenylalanine. The
dual-targeted proteins have a more hydrophobic targeting peptide than both
mitochondrial and chloroplastic ones [26].
Peroxisome
The peroxisome is a single-membrane-bounded organelle. There are two
known types of Peroxisome Targeting Signal (PTS), one in the C-terminal
region (PTS1) and another in the N-terminal region (PTS2). Of the two
signals, PTS1 is predominant, and its consensus sequence is
-(S/A/C)-(K/R/H)-(L/A). The most common PTS1 is serine-lysine-leucine
(SKL). The soluble Pex5 receptor recognizes PTS1-containing proteins. The
Pex5-PTS1 protein complex is then docked to the translocation machinery on
the surface of the peroxisome. PTS2 is a bipartite signal with the
consensus sequence [R/K]-[L/V/I]-x-x-x-x-x-[H/Q]-[L/A], usually located in
the N-terminal region [26]. The next section outlines how the location of
proteins is identified experimentally in the wetlabs.
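The two peroxisomal consensus sequences given above translate directly into regular expressions. The sketch below is a hedged illustration: the function names are hypothetical, PTS1 is required to sit at the very C-terminus, and the choice to search only the first 40 residues for PTS2 is an assumption standing in for "usually located in the N-terminal region".

```python
import re

# PTS1 consensus (S/A/C)-(K/R/H)-(L/A), anchored at the C-terminus.
PTS1 = re.compile(r"[SAC][KRH][LA]$")
# PTS2 consensus (R/K)-(L/V/I)-x5-(H/Q)-(L/A).
PTS2 = re.compile(r"[RK][LVI].{5}[HQ][LA]")


def has_pts1(seq):
    """True if the last three residues match the PTS1 consensus."""
    return PTS1.search(seq.upper()) is not None


def find_pts2(seq, n_term=40):
    """Start index of the first PTS2 match in the N-terminal region,
    or None if there is no match. n_term is an assumed search window."""
    match = PTS2.search(seq.upper()[:n_term])
    return match.start() if match else None


print(has_pts1("MAAAGGSKL"))            # toy sequence ending in SKL
print(find_pts2("MQRLQVVLGHLAAAAAAA"))  # toy sequence with a PTS2-like motif
```

Matching the consensus is not sufficient evidence of peroxisomal localization, so such hits are better treated as features for a predictor than as a final answer.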
2.5 Wetlab Techniques
A wide range of experimental methods are used in the wetlabs to identify
protein subcellular localization. Immunofluorescence and immunoelectron
microscopy, PhoA protein fusions, fluorescent-protein tagging, and
Western/SDS-PAGE analysis of subcellular fractions are used for this
purpose. Even though the output of these methods is highly accurate, they
have several limitations: they are costly and time-consuming, and only a
few proteins can be tested at a time. One of the laboratory techniques for
identifying subcellular localization is transposon-mediated random epitope
tagging and plasmid-based expression of epitope-tagged proteins, followed
by immunofluorescence. This method has been used for a comprehensive global
analysis of protein localization in the budding yeast, Saccharomyces
cerevisiae. The disadvantage is that these techniques can introduce
errors in localization by interfering with localization signals through
random insertion of tags, or by saturating binding sites through
overexpression of proteins. In addition, immunofluorescence adds a
cumbersome and costly step to the analysis that may introduce non-specific
staining [17,65]. A study of subcellular localization in yeast [66]
employed oligonucleotide-directed homologous recombination to insert GFP
preceding the stop codon of open reading frames (ORFs) and generate yeast
strains with proteins tagged at the carboxy terminus. The proteins were
expressed under the control of their endogenous promoters and presumably at
relatively normal levels. On the other hand, the carboxy-terminal tagging
interfered in some cases with protein localization signals, such as those
for palmitoylation and farnesylation, that direct proteins to the plasma
membrane.
Techniques such as two-dimensional gel electrophoresis and mass
spectrometry have been frequently used to analyze localization for a
variety of bacterial genomes, including those of pathogenic organisms. A
major disadvantage of subproteome analysis is that the fractionation of a
complex structure like the cell into several subcellular compartments is
not a trivial task, because of contamination from other cellular
compartments and the multiple localization of several proteins.
Computational methods for subcellular localization prediction solve these
problems to a great extent.
2.6 Need for Computational Prediction
Although the subcellular localization of a protein can be determined by
conducting various biochemical experiments, these have many practical
limitations and are costly and time-consuming. High-throughput genomic
techniques in the past decade have resulted in rapid accumulation of
genomic and proteomic data in the biological databases. For example, in
1986 the total number of sequence entries in Swiss-Prot was only 3,939
[19,20], while the number had increased to 514,789 sequence entries as of
Swiss-Prot release 57.14 of 9-Feb-10. This explosive growth of the
biological databases demands the development of automated methods with high
accuracy to reliably annotate the subcellular attributes of
uncharacterized proteins.
For proteins that are only predicted from the sequenced genome and not
extracted as biological molecules, the only available data is the predicted
amino acid sequence. In such cases, the features of the proteins, including
subcellular localization, can be predicted using computational methods.
Such annotation brings out the importance of the particular protein.
Proteins that are isolated and sequenced often lack the N-terminal signal,
as the import machinery of the compartments cleaves off the address signal
from the protein. Even this loss of information will not affect prediction,
because most of the tools use a wide range of biological information
derived from the sequence for making predictions.
Most of the computational prediction tools are available on the Internet.
They are publicly accessible and free of cost. Since a wide range of tools
is available, the biologist can make predictions using different methods to
increase the reliability of the prediction. Even organism-specific tools
are available, providing greater prediction accuracy. Performing the
prediction prior to experimental confirmation saves valuable resources.
Most of the tools accept multiple amino acid sequences as input and allow
high-throughput prediction. The computational methods for localization
prediction are reviewed in the next chapter.
2.7 Conclusion
Protein sorting and translocation is a complex task involving multiple
decisions at multiple stages. Various proteins are involved in the
translocation process. As described, no hard and fast rules can be derived
for any location. The address signals do not share common features in many
cases. These difficulties can be addressed by computational prediction
techniques. For making a computational prediction, biological features
which have a qualitative impact on the biological process have to be
observed and quantified. The biology discussed in this chapter serves this
purpose.
This chapter discussed the biology of subcellular localization. The
background biology of the cell, organelles and amino acids was explained,
followed by details on proteins and their biosynthesis. How proteins are
sorted and translocated to various locations was also explained. The
wetlab techniques for identifying protein location and their limitations
were outlined. The need for prediction methods was discussed towards the
end of this chapter. The next chapter discusses how to computationally
predict the location of a protein. It also provides a detailed review of
existing computational methods and tools for subcellular localization
prediction.