Generating peptide probes against cancer-related peptide recognition domains using phage display
by
Yogesh Hooda
A thesis submitted in conformity with the requirements for the degree of Master of Science
Graduate Department of Molecular Genetics University of Toronto
© Copyright by Yogesh Hooda 2012
Abstract
Peptide recognition domains (PRDs) bind to short linear motifs on their biological partners and
are found in several cellular pathways including those found to be critical in tumorigenesis. In
this study, I aimed to generate peptide probes against PRDs present on proteins involved in
ovarian cancer. Using bioinformatics, I identified 66 potential PRDs present on these proteins. I
then used peptide phage display to successfully generate peptides against 27 of the 66 domains.
To validate my results, I performed an extensive literature review and structural analysis. For
several cases, the phage-display derived binding preferences are similar to previously reported
studies. However, for a subset of domains, I identified non-canonical binding preferences that
have not been reported previously in the literature. The binding preferences obtained in this study
can be used to design intracellular probes for studying the role of these PRDs in biological
pathways important in ovarian cancer.
Acknowledgments
It is hard to imagine that it has already been two years since I started my graduate studies.
Working at the Sidhu and the Kim labs has been a wonderful experience and I would like to take
this opportunity to thank all the people who helped me through this part of my life.
First and foremost, I would like to thank my supervisors Dev Sidhu and Philip Kim who
gave me the opportunity to work in their labs and guided me throughout my stay here. They both
have been an immense source of inspiration. I would also like to thank my committee members,
Frank Sicheri and Tim Hughes, for their constructive criticism and suggestions.
During my stay, I came across an awesome set of people at both the Sidhu and the Kim
labs. I would especially like to thank Joan for all his discussions and guidance during the latter part
of my project. In the Sidhu lab, I would like to give special thanks to Maruti, Andreas, Megan,
Haiming, Gang and Linda for their kind help and support. I would also like to thank Mark,
Recep, Simon, Roland, Clare, Kurt and Ylva in the Kim lab.
I also would like to thank all my friends here in Toronto, around the world and back
home in India for sharing with me their adventures or misadventures and listening to mine. Their
friendships made Toronto a great city to stay in. I would especially like to thank Senjuti for her
incredible love and encouragement. Her companionship has kept me going through all the ups
and downs of my project.
Lastly, I am grateful to my family for their constant love and support. They have always
been a tremendous source of strength and inspiration for me.
Table of Contents
Acknowledgments
Table of Contents
List of Tables
List of Figures
List of Appendices
1 Introduction
1.1 Overview
1.2 Peptide recognition domains
1.2.1 Properties of domain-peptide interactions
1.2.2 Role in biological pathways
1.3 Peptide-recognition domains as therapeutic targets
1.3.1 Bcl-2
1.4 Studying peptide recognition domains using peptide probes
1.4.1 Understanding structure and binding properties
1.4.2 Elucidating biological role
1.4.3 Validating drug targets
1.4.4 Drug Discovery
1.5 Goal of the project
2 Identification of peptide recognition domains essential in ovarian cancer
2.1 Introduction
2.1.1 Whole Genome RNAi screen
2.1.2 Computational methods to identify peptide recognition domains
2.2 Methods
2.2.1 Identification of peptide recognition domains
2.2.2 Manual filtering and literature review of potential domains from PepX
2.3 Results and Discussion
2.3.1 Analysis of 1695 genes obtained from whole genome RNAi screens
2.3.2 Literature review of domain list obtained from the computational pipeline
2.4 Summary
3 Identification of peptide binders using phage display
3.1 Introduction
3.1.1 Displaying peptide on phage particles
3.1.2 Site-directed mutagenesis and phage library design
3.1.3 Selection strategy
3.1.4 Selection of tight-binding peptides and identification of binding specificities
3.2 Methods
3.2.1 Strains
3.2.1 Protein expression and purification
3.2.2 Library construction and design
3.2.3 Phage Display selections
3.2.4 Calculation of enrichment ratio and pool ELISA
3.2.5 Clonal ELISA and sequencing of peptides
3.2.6 Structural modeling of phage-display results
3.3 Results and Discussion
3.3.1 Selection of peptide binders using phage display
3.3.2 Validation of tight binder using clonal ELISA
3.3.3 Identification of binding preferences and literature validation
3.3.4 Cellular signaling
3.3.4.1 SH3
3.3.4.2 PDZ
3.3.4.3 G-alpha
3.3.4.4 14-3-3
3.3.4.5 Penta-EF hand
a Calpain small regulatory subunit
b Programmed Cell Death Protein 6
3.3.5 Cytoskeleton regulation
3.3.5.1 Dynein light chain
3.3.5.2 CAP/Gly
3.3.5.3 Alpha-vinculin head domain
3.3.6 Intracellular transport
3.3.6.1 Importin beta
3.3.6.2 UBA
3.3.6.3 Bro1
3.3.6.4 Clathrin heavy chain
3.3.7 Genome Regulation
3.3.7.1 PCNA
3.3.7.2 OB-fold
3.3.7.3 Ligand binding domain of nuclear receptors
3.3.7.4 WD40 domains
3.3.7.5 TRF homology domain
3.3.8 Miscellaneous
3.3.8.1 SWIB/MDM2
3.3.8.2 HORMA domain
3.3.8.3 eIF4E
3.3.8.4 Ubiquitin
3.4 Summary
4 Conclusions
4.1 Summary of work
4.2 Future experiments
4.3 Potential avenues for research
4.4 Application of phage-derived peptides
4.5 Final remarks
5 References
List of Tables
Table 1 List of all PRDs that have been investigated as targets for cancer therapies
Table 2 Summary of results obtained from DOMINO and PepX
Table 3 List of 66 domains selected for phage display experiments
Table 4 Summary of phage display results for 66 domains
List of Figures
Figure 1 Representative structures of PRDs present in the human genome
Figure 2 Peptide and small-molecule inhibitors of Bcl-2
Figure 3 Combinatorial methods for determining binding preferences of PRDs
Figure 4 Generating intracellular Dvl2-PDZ inhibitors using phage display
Figure 5 Fluorescence polarization assays for discovery of small-molecule inhibitors
Figure 6 Whole genome RNAi screen for identifying essential genes in ovarian cancer
Figure 7 Computational strategy for identifying potential peptide binding domains
Figure 8 Schematic diagram of M13 bacteriophage
Figure 9 Oligonucleotide-directed mutagenesis with an ssDNA template
Figure 10 Phage display selection for PRDs
Figure 11 Strategy for validating phage display results
Figure 12 Overview of phage display results
Figure 13 Structural and literature analysis of SH3 domains
Figure 14 Structural and literature analysis of PDZ domains
Figure 15 Structural and literature analysis of Gα subunits
Figure 16 Structural and literature analysis of 14-3-3
Figure 17 Structural and literature analysis of Penta-EF hand of CAPNS1
Figure 18 Structural and literature analysis of Penta-EF hand of PDCD6
Figure 19 Structural and literature analysis of Dynein light chains
Figure 20 Structural and literature analysis of CAP/Gly domain of p150glued
Figure 21 Structural and literature analysis of Alpha-catenin/vinculin head domain
Figure 22 Structural and literature analysis of Importin beta
Figure 23 Structural and literature analysis of NXF1-UBA domain
Figure 24 Structural and literature analysis of Alix-Bro1 domain
Figure 25 Structural and literature analysis of Clathrin terminal domain
Figure 26 Structural and literature analysis of PCNA
Figure 27 Structural and literature analysis of RPA 70N OB-fold domain
Figure 28 Structural and literature analysis of NR1H4 ligand binding domain
Figure 29 Structural and literature analysis of WDR5
Figure 30 Structural and literature analysis of TRFH domain of TERF1
Figure 31 Structural and literature analysis of SWIB/MDM2
Figure 32 Structural and literature analysis of HORMA domain
Figure 33 Structural and literature analysis of eIF4E
Figure 34 Structural and literature analysis of ubiquitin
List of Appendices
Appendix A List of ovarian cancer lines
Appendix B Protein sequences of 66 domains
Appendix C Vector sequences
1. INTRODUCTION
1.1 Overview
Protein-protein interactions form the molecular basis of key regulatory and signalling
pathways inside cells [1]. They help in assembly of macromolecular complexes and formation of
modular interaction networks that regulate key biological processes such as cell cycle, signal
transduction and embryogenesis. Protein-protein interactions can be roughly categorized into two
types: i) domain-domain interactions, in which two domains bind to each other, and ii) domain-peptide
interactions, in which a domain binds to an unfolded linear motif on its partner [1]. Domain-peptide
interactions are mediated by peptide recognition domains (PRDs), which bind to short
linear motifs that often lie in disordered regions of their interaction partners [2].
PRDs are ubiquitous: they assemble transient regulatory
networks, recognize post-translational marks, regulate signalling molecules and provide specificity
to enzymatic complexes. Given the important role of domain-peptide interactions in key cellular
processes, these interactions are frequently targeted by toxins or altered by somatic mutations found in
diseases including cancer. In cancer, amplified and exogenous domain-peptide interactions often
lead to rewiring of cellular networks, thereby promoting tumour growth, invasion and metastasis
[3]. A number of such interactions, e.g. p53/MDM2, IAP/caspase and Bcl-2/BH3, have been
targeted using small molecules and peptide-based drugs [4]. The PRDs
mediating these interactions form an emerging class of cancer drug targets.
PRDs have been extensively studied using peptide-based probes. These probes can be
derived from known natural binding partners or generated using combinatorial methods such as
phage display and peptide microarrays [5]. Peptide-based probes have been extensively used to
elucidate the biochemical and structural properties of interactions mediated by PRDs. These
peptide probes have also been used to design intracellular reagents to target interactions
mediated by PRDs and to better understand cellular pathways [5]. Such probes may also be used
to identify PRDs that may serve as potential cancer drug targets [5].
Peptide-based probes against PRDs have also led to the development of small-molecule
therapeutics (e.g. ABT-737 against Bcl-2 and Nutlins against MDM2) against various cancers
[4]. However, the number of PRDs whose role in cancer-related pathways is well understood is
limited. This is largely due to the lack of high-affinity, specific probes to study these
domains. To address this issue, I propose to use phage display to systematically generate
peptide probes against different families of PRDs. The main focus of this study
is to develop peptide probes against the PRDs present on proteins involved in ovarian cancer that
were identified by our collaborators. Peptide probes developed here may serve as valuable tools
to understand the role of PRDs in ovarian cancer-related biological pathways.
Figure 1: Representative structures of PRDs present in the human genome. Peptide recognition domains are structurally diverse and use different binding surfaces to bind to peptides. Domains as defined by CATH are shown in grey; peptide ligands are shown in green.
In the following sections, I will discuss progress made in the study of peptide recognition
domains. First, I will discuss structural properties of interactions mediated by peptide recognition
domains and describe some of their biological roles. Second, I will highlight examples of peptide
recognition domains that have been identified as drug targets for specific types of cancer. Third,
I will present studies that have used peptide probes against PRDs and demonstrate their utility
as intracellular probes. Finally, I will elaborate on the specific aims of the current study.
1.2 Peptide recognition domains
As discussed above, PRDs bind to specific linear motifs on their interaction partners. Since the
discovery of the first PRDs, a large number of such domains have been identified in the human
proteome. This progress can be attributed to the development of high-throughput experimental
methods that allow the identification of a large number of protein-protein interactions [6]. Such
studies have established that a significant proportion of protein-protein interactions within a cell
are domain-peptide interactions mediated by dedicated peptide recognition domains [7].
Analysis of structures of peptide recognition domains in complex with their natural or synthetic
partners has led to the elucidation of their mode of function [8]. Further experimental and
computational studies have highlighted the roles played by PRDs inside cells.
1.2.1 Properties of domain-peptide interactions: Peptide recognition domains are found in
structurally diverse protein families (Figure 1) that are catalogued in databases such as DOMINO
[9], ADAN [10] and PepX [11]. Domain-peptide interactions are often mediated by a groove-like
binding interface present on peptide recognition domains. The binding interface of domain-peptide
interactions is ~500-1000 Å², which is smaller than that of domain-domain interactions
[12]. Domain-peptide interactions are often transient and exhibit binding affinities in the
low-micromolar to nanomolar range. Structurally, the binding interface on the domain is often the
largest pocket on the surface of the PRD [12]. The binding surface is more hydrophobic than the
overall surface of the protein but less hydrophobic than the protein core. A small subset of the
residues present on the binding surface contributes most of the binding energy. These residues,
known as “hotspot residues”, are essential for binding, and changing any of them can
severely affect the domain-peptide interaction [13]. The interaction between the PRD and the
peptide may cause conformational changes in either the PRD or its interaction partner [12].
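The affinity range quoted above can be related to a binding free energy through the standard relation ΔG° = RT ln(Kd / 1 M). The Python sketch below is illustrative only and is not part of the methods of this study; the function name and the default temperature of 298.15 K are assumptions made for the example.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) for a dissociation constant in mol/L.

    Uses dG = R*T*ln(Kd / 1 M); the 298.15 K default is an assumption.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


# Hypothetical affinities spanning the range quoted in the text.
for label, kd in [("1 uM", 1e-6), ("1 nM", 1e-9)]:
    print(f"Kd = {label}: dG = {binding_free_energy(kd):.1f} kcal/mol")
```

Under these assumptions, a 1 µM interaction corresponds to roughly -8.2 kcal/mol and a 1 nM interaction to roughly -12.3 kcal/mol, so the thousand-fold affinity range amounts to a difference of about 4 kcal/mol.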
Many peptide recognition domains, such as the G-alpha subunits, also possess enzymatic
function. The binding of peptide partners to the switch II/alpha III groove on the G-alpha subunit
increases the GTPase activity of the G-alpha subunit [14]. Furthermore, in the case of the ligand
binding domains of nuclear receptors, peptide binding is often dependent on the binding of a
small molecule or hormone to the ligand-binding pocket [15]. The binding of the ligand produces a
conformational change that allows the peptide to bind to the hydrophobic pocket. In these
ways, PRDs often couple peptide binding to enzymatic or ligand-binding functions present
on the same domain.
The binding preferences of PRDs are highly diverse. While PRDs such as the SH3, WW
and EVH1 domains bind to motifs rich in proline residues, other domains such as the PDZ and CAP-Gly
domains specifically recognize hydrophobic C-terminal residues of peptides [5]. The motifs
bound by PRDs are often located in disordered regions of the interacting proteins. These
peptide motifs undergo a disorder-to-order transition upon binding [12]. For example, the
binding of co-activators to the ligand binding domains of nuclear receptors induces a helical
conformation in the co-activator [15]. This conformational change in the co-activator
molecule favors the assembly of an active transcriptional complex [15].
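To make the idea of a linear binding motif concrete, the following Python sketch scans a peptide sequence for two deliberately simplified patterns: a proline-rich PxxP core and a hydrophobic C-terminal residue. Both patterns, the choice of hydrophobic residues, the function names and the example sequences are assumptions made for illustration; they are not the binding preferences determined in this study, and real domain specificities are considerably more detailed.

```python
import re

# Simplified, illustrative patterns (assumptions, not results from this study).
# The lookahead lets overlapping PxxP occurrences be reported separately.
PXXP = re.compile(r"(?=(P..P))")
# One hydrophobic residue at the very end of the sequence (the C-terminus).
HYDROPHOBIC_C_TERM = re.compile(r"[AVILMFWY]$")


def find_pxxp(sequence):
    """Return (0-based start index, matched text) for every PxxP occurrence."""
    return [(m.start(), m.group(1)) for m in PXXP.finditer(sequence)]


def has_hydrophobic_c_terminus(sequence):
    """True if the last residue is in the hydrophobic set defined above."""
    return HYDROPHOBIC_C_TERM.search(sequence) is not None


# Hypothetical peptide sequences, chosen only to exercise the patterns.
print(find_pxxp("AAPPLPPRAA"))
print(has_hydrophobic_c_terminus("KETSV"))
```

A regular expression is used here only because it is the simplest way to express a fixed motif; position-specific preferences of the kind obtained from phage display are usually better represented as position weight matrices.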
A class of PRDs specifically recognizes post-translational modifications such as
phosphorylation (SH2, 14-3-3, FHA), acetylation (bromodomains) and methylation
(chromodomains) [16]. Such domains act as readers of post-translational modifications and link
these modifications to downstream cellular pathways. For example, the SH2 domains of scaffold
proteins such as Grb2 and Vav link phosphorylation of receptor tyrosine kinases to activation of
intracellular signalling proteins (Ras, Raf and Erk) [1].
1.2.2 Role in biological pathways: Proteins that regulate key cellular processes, such as signal
transduction, cell cycle, protein trafficking, cytoskeleton organization and gene expression, are
composed of catalytic and interaction domains [1]. Catalytic domains such as kinases, GTPases and
proteases catalyze specific molecular reactions (e.g. phosphorylation and peptide bond cleavage)
that help in the propagation of cellular signals. However, these domains often have limited inherent
specificity, i.e. they can bind to a large set of binding partners. Interaction domains regulate the
specificity of catalytic domains either directly, by recruiting substrates of catalytic domains, or
indirectly, by controlling their spatio-temporal localization [17]. As previously mentioned, a large
number of the interaction domains are PRDs that bind to specific peptide motifs present on their
interacting partners. Thus, PRDs recruit and confine signaling proteins to an appropriate
sub-cellular location and determine the specificity with which enzymes interact with their targets,
analogous to the association of protein kinases with their substrates. There are several evolutionary
and mechanistic advantages provided by PRDs to cellular networks. Firstly, domain-peptide
interactions often evolve faster than domain-domain interactions, allowing cellular pathways to
be rewired with minimal changes [17]. Secondly, PRDs that act as scaffolds increase the speed
of signal transduction by increasing the local concentration of enzymes and substrates [17].
Thirdly and most importantly, PRDs provide specificity to the information flow in intracellular
networks [17]. This allows cells to accurately process the diverse range of signals they receive
and produce the appropriate biochemical responses.
A key function of PRDs is to identify specific post-translational modifications (PTMs).
Protein function and localization are often regulated by a vast and dynamic array of PTMs. By
recognizing specific PTMs, PRDs link PTMs to cellular organization, thereby sensing “the state
of the proteome” [16]. PRDs are also involved in cellular protein trafficking. Specific peptide
tags on a protein determine where it is transported. PRDs such as importin-beta and
clathrin recognize specific peptide tags and transport cellular proteins to their desired
sub-cellular location [1].
1.3 Peptide-recognition domains as therapeutic targets
Given their central role in biological pathways, peptide recognition domains are often targeted by
pathogenic proteins and somatic mutations observed in various diseases including cancer [3].
Hence, PRDs are an emerging class of therapeutic targets. Small-molecule and peptide-based
drugs have been developed against a handful of PRD families. These drugs are currently in
various stages of pre-clinical and clinical drug development. In this section, I will describe the
work done on an important family of PRDs that has been extensively studied as a cancer drug
target: B-cell lymphoma-2 or Bcl-2. I will discuss the functions of this family of domains inside
cells and how these functions are often mis-regulated in cancer. I will also briefly discuss the
various techniques that were used to develop potential therapeutic agents against these domains.
Figure 2: Peptide and small-molecule inhibitors of Bcl-2. A) The structure of a 16-amino acid peptide derived from Bad in complex with Bcl-xl. B) The interaction surface of Bcl-xl and the Bad peptide. The interaction is mediated by a hydrophobic pocket on Bcl-xl. C) The structure of the small molecule ABT-737 in complex with Bcl-xl. D) The interaction surface of Bcl-xl and ABT-737. ABT-737 binds to the same hydrophobic pocket on Bcl-xl and competes with its natural interaction with Bak and Bax.
1.3.1 Bcl-2: The B-cell lymphoma-2 (Bcl-2) family of proteins are important regulators of
mitochondrial outer membrane permeabilization (MOMP), an important step in the apoptotic
pathway inside cells [18]. They regulate the release of cytochrome-c from the mitochondria
and the activation of caspases, the proteases responsible for the breakdown of key cellular
components during apoptosis. Bcl-2 family proteins adopt an alpha-helical structure containing
conserved regions called Bcl-2 homology domains (BH domains). This protein family can be divided
according to their positive or negative effect on apoptosis. While family members such as Bax
and Bak initiate apoptosis, members such as Bcl-2, Bcl-xl and Mcl-1 inhibit apoptosis. Under normal
conditions, the interplay of these proteins regulates the apoptotic pathway. However, upon the
induction of stress conditions or DNA damage, pro-apoptotic members of the Bcl-2 family are
activated. The pro-apoptotic Bcl-2 family members form pores in the outer membrane of the
mitochondria, allowing cytochrome-c and other proteins to initiate apoptosis [18].
In various cancers, somatic mutations cause over-expression of anti-apoptotic members
of the Bcl-2 family, leading to abrogation of apoptosis [19]. The anti-apoptotic members of the Bcl-2
family interact with the pro-apoptotic members and inhibit their ability to form pores in the
mitochondrial outer membrane. This interaction is mediated by a linear alpha-helical peptide
(BH3) on pro-apoptotic members binding to the hydrophobic pocket on the anti-apoptotic
members (Figure 2). This interaction is critical for the abrogation of apoptosis, and inhibition of
this interaction leads to activation of apoptosis [20]. Synthetic peptides that mimic the BH3
peptides were shown to successfully induce apoptosis in different cancer cell lines and mouse
models [21]. Later, small molecules identified using structure-activity relationship (SAR)
analysis were found to be efficacious in promoting apoptosis, confirming the observations
made with the synthetic peptides (Figure 2). These small molecules bound to anti-apoptotic
members of the Bcl-2 family with nanomolar affinity and showed good pharmacokinetic
properties [22]. Small-molecule inhibitors of Bcl-2 are currently in various stages of clinical or
pre-clinical investigation.
Several key observations can be derived from the Bcl-2
example. Firstly, PRDs that are involved in critical cellular processes (such as apoptosis in the case
of Bcl-2) are often misregulated in a wide spectrum of cancers. Secondly, somatic mutations are
often sufficient to amplify the cellular levels of PRDs, thereby modulating the cellular processes
they are involved in. This also provides an opportunity for drug development, because in theory
these perturbations can be reversed by specifically blocking the interactions mediated by these
PRDs. Thirdly, small molecules developed against Bcl-2 bind with an affinity comparable to that
of the native partner protein or peptide by binding to a small subset of residues on the interaction
surface. These residues often, but not always, correspond to the “hotspot residues”. The Bcl-2
example highlights the possibility of identifying small compounds that can inhibit interactions
mediated by PRDs with desirable affinity and specificity.
A number of PRDs have been identified as drug targets (summarized in Table 1). These
domains share the characteristics described above, i.e. amplification in cancer, involvement in
a key cellular pathway and presence of hotspot residues. These characteristics have made PRDs
good targets for anti-cancer drug development.
Drug target | Interaction partner | Role | Remarks
MDM2 | p53 | Negative regulation of p53 protein | Mdm2 down-regulates tumor suppressor protein p53 in cancer; targeted using small molecules and peptides
IAP | Caspase | Inhibition of caspase | IAPs negatively regulate caspases; targeted by peptides and peptidomimetics
Dvl2 PDZ | Fzd-7 | Involved in Wnt signalling | Dvl2 PDZ domain binds to internal peptide; targeted using peptides and small molecules
N-Cadherin | N-cadherin, E-cadherin | Cell adhesion | N-cadherin binds to HAV sequence at EC1 domain of different cadherins
Plk1-PBD | CDC25C, Chk2, PDBIP1 | G2/M checkpoint regulation | Polo-box domain of Plk1 binds to phospho-peptides; targeted using small molecules and peptidomimetics
ICN-1/CSL | MAML1 | Transcription factor, involved in Notch signalling | MAML-1 binds to hydrophobic groove on ICN1-CSL; targeted by peptidomimetic (stapled peptide)
eIF4E | eIF4G, 4E-BP1 | Translation initiation factor | eIF4E binds to 16-mer segment within eIF4G and 4E-BP1; targeted using peptides and small molecules
Menin | MLL | Histone modification | Menin-MLL fusion leads to over-expression of Hox genes; targeted using small molecules
Table 1: List of PRDs that are currently being investigated as targets for cancer therapies
1.4 Studying peptide recognition domains using peptide probes
Peptides against PRDs can be derived from natural partners or generated by combinatorial methods
such as phage display and SPOT microarrays. These peptides have been used as valuable tools for
studying the biological roles of PRDs.
1.4.1 Understanding structure and binding properties: To obtain a detailed understanding of
interactions between a PRD and its biological partner, it is important to characterize the
structural and molecular aspects of the interaction in depth. Peptides derived from interacting
partners can be used for studying these biophysical binding properties.
Further insights can be obtained by using combinatorial methods such as SPOT
microarrays and phage display. SPOT microarrays are generated by synthesizing peptides on a
cellulose membrane [23]. On a single membrane, many different peptides can be synthesized so that
all the amino acids are sampled at each position of the peptide. The domain is incubated with the
microarray, and fluorometric or colorimetric methods are used to measure the binding of the domain at
each spot. By analyzing the intensity of each spot on the microarray, we can obtain the
binding preference of a given domain, which is often visually represented as a position weight
matrix (PWM) or sequence logo (Figure 3). The height of an amino acid in the sequence logo is
indicative of its relative frequency at that position. One of the first applications of this method
was to study the SH3 domains [24]. A key advantage of SPOT/peptide microarrays is the ability
to study PRDs that bind to modified peptides, such as phosphorylated or acetylated peptides
[25,26].
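To make the idea of a position-based binding preference concrete, the following minimal Python sketch computes the relative frequency of each amino acid at each position of a set of aligned peptides. It is an illustration only: the function name and the example peptides are hypothetical and are not data from this study, and published PWMs are often further normalized against background frequencies.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def position_frequency_matrix(peptides):
    """Return one dict per position mapping each amino acid to its relative frequency."""
    if not peptides:
        raise ValueError("at least one peptide is required")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned and of equal length")
    matrix = []
    for pos in range(length):
        # Count how often each residue occurs at this position across all peptides.
        counts = Counter(p[pos] for p in peptides)
        total = sum(counts.values())
        matrix.append({aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS})
    return matrix

# Hypothetical aligned peptides (illustrative only, not results from this study).
example_peptides = ["PPLPPR", "PALPPK", "PPLPAR", "PPVPPR"]
pfm = position_frequency_matrix(example_peptides)
```

Each entry of `pfm` corresponds to one column of a sequence logo; a residue with a frequency close to 1 at a position would appear as a tall letter.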
Phage display is a powerful technique that can be used to obtain binding preferences of
PRDs. In phage display, peptides are fused to the coat protein of a filamentous bacteriophage such
that the peptides are displayed on the surface of the bacteriophage [27]. Using site-directed
mutagenesis, large libraries of ~10^10 phages can be generated in which each phage displays a unique
peptide. These libraries can then be panned against immobilized PRDs to capture phages that
bind specifically to the domain of interest. The peptides displayed by these tightly bound phages
can be identified by sequencing the DNA of the phage (Figure 3). Phage display has several advantages
over other approaches, including cost-effectiveness and the ability to re-use
libraries to probe a large set of PRDs. Previous studies in the Sidhu lab have used peptide
phage display to understand binding preferences of well-studied domains such as the PDZ and
the SH3 domains [28,29].
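To put the library size of ~10^10 in context, the short Python sketch below compares it with the number of possible peptides of a given length (20^n for the 20 natural amino acids). The coverage estimate assumes that library members are drawn uniformly at random from all possible sequences; this is a simplifying assumption made here for illustration, since real libraries are affected by codon bias and cloning efficiency.

```python
import math

def theoretical_diversity(length, alphabet_size=20):
    """Number of distinct peptides of the given length."""
    return alphabet_size ** length

def expected_coverage(library_size, length, alphabet_size=20):
    """Expected fraction of all possible peptides present in the library,
    assuming uniform random sampling (an idealization)."""
    diversity = theoretical_diversity(length, alphabet_size)
    return 1.0 - math.exp(-library_size / diversity)

# Compare a 10^10-member library against peptides of increasing length.
for n in (6, 7, 8, 10):
    print(n, theoretical_diversity(n), round(expected_coverage(1e10, n), 3))
```

Under this idealized model, a library of this size samples short random peptides almost exhaustively, whereas for longer peptides it covers only a small fraction of sequence space.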
Both phage display and peptide microarrays are extremely effective in understanding the
binding preferences of domains and can be complemented with biophysical methods such as
isothermal titration calorimetry (ITC), surface plasmon resonance (SPR) and fluorescence polarization to
obtain binding affinities of the peptide recognition domains. Computational methods (machine
learning/structural methods) have also been developed to predict the binding preferences of
PRDs [28,29].
Figure 3: Combinatorial methods for determining binding preferences of peptide recognition domains. Combinatorial methods such as phage display and peptide microarrays have been extensively used to determine the binding preferences of a diverse set of domains. These preferences are often represented as position weight matrices (PWMs) that are based on the occurrence frequency of a given amino acid.
1.4.2 Elucidating the biological role: Peptides have been very useful in elucidating
the biological roles of PRDs. Peptide motifs obtained from combinatorial screens can be used to
search the proteome for potential binding partners. These partners can then be confirmed
using yeast two-hybrid and/or pull-down assays [30]. A number of such peptide motifs are
available in databases such as ELM [7].
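As a rough sketch of how a peptide motif can be used to search protein sequences, the Python example below scans a small dictionary of sequences with a regular expression and reports every match, including overlapping ones. The toy sequences, the function name and the simple `P..P` pattern (a proline-rich core of the kind associated with SH3 ligands) are chosen here for illustration and are not taken from this study.

```python
import re

def find_motif_matches(proteins, motif_regex):
    """Return (protein name, 1-based start position, matched segment) for every match.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile(f"(?=({motif_regex}))")
    hits = []
    for name, sequence in proteins.items():
        for match in pattern.finditer(sequence):
            hits.append((name, match.start() + 1, match.group(1)))
    return hits

# Hypothetical sequences (illustrative only).
toy_proteome = {"protA": "MKTPPLPARSE", "protB": "MGSSHHHHHH"}
hits = find_motif_matches(toy_proteome, "P..P")
```

In practice, candidate partners found this way are only predictions and would still need experimental confirmation, as described above.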
Peptides that bind specifically and with high affinity can be used as intracellular probes
against PRDs. Linear peptides are often unstable and cannot cross the cellular membrane.
However, recent developments in molecular biology and peptide chemistry have significantly
increased the stability and cellular permeability of peptides. Chemical modifications can greatly
increase the stability and affinity of peptides that bind to a target domain [31,32]. Fluorescent
labels and probes can be attached to the peptides to track their localization inside cells and model
organisms [33]. Further, peptide probes can be readily fused to cell-penetrating peptides (CPPs) to
increase their cellular permeability in various mammalian cell lines. Other entities (such as an NLS
for nuclear localization) can be attached to the peptides to deliver them to specific cellular
organelles [34]. Finally, transduction methods can be used to express peptides inside mammalian
cell lines. These methods include lentiviral expression systems that allow effective delivery of
peptides to different cell lines and model organisms [34]. The key advantage of lentiviral
expression vectors is that the DNA encoding the peptide is integrated into the genome, which
allows stable expression of peptides in dividing and non-dividing cell lines [34].
One of the central advantages of using peptides as probes for biology is their ability to
modulate protein function in several ways. Peptides bind to epitopes on the proteins that are
often distinct from the enzymatic pocket [35]. This allows peptides to modulate domain function
as either antagonists or agonists.
1.4.3 Validating drug targets: One of the central motivations of modern biology is to identify
therapeutic targets for diseases. Numerous methods are available to perturb activity of a
particular gene, e.g. gene knockouts, RNAi and small molecule drugs. Drugs act at the protein-
level and perturb the natural biological function of a given protein. Drugs whose perturbations
result in resolution of pathogenic phenotype are ideal candidates for therapy. Drugs are often
small organic molecules that can be identified using structure-based approaches or high-
throughput screens. However, development of high specificity and affinity small molecules often
requires large monetary and time investment. These costs make the development of small
molecule drugs against all known PRDs prohibitive. By prioritizing PRDs that play a role
in the onset of a given disease, we can greatly increase the efficiency of drug discovery. To this
end, peptides may act as probes for identification of drug targets for diseases. As described
previously, peptides can be generated against a large number of PRDs and introduced into
mammalian cells.
Peptides modulate their targets by various mechanisms and can produce distinct phenotypes.
In some disease models, peptide modulators may lead to alleviation of the disease. This approach has been
previously used to identify various domains such as MDM2 and Bcl-2 as drug targets for cancer
[4]. Previously in the Sidhu lab, phage display was used to generate high-affinity and high-specificity
peptides against the PDZ domain of Dishevelled-2 (Dvl-2) (Figure 4) [36]. The interaction
between Dvl-2 and the Frizzled-7 (Fzd-7) receptor is mediated by the PDZ domain of Dvl-2 and an internal
peptide on Fzd-7. This interaction is critical for the activation of Wnt signalling, a critical
step for tumorigenesis in different cancers, and deletion of the Dvl-2 PDZ domain or the Fzd-7 peptide motif
leads to the abrogation of Wnt signalling [36]. Reasoning that inhibition of the Dvl-2 PDZ domain may disrupt
Wnt signalling, Zhang et al. introduced phage-derived peptides into cells and observed that the
peptides specifically targeted the PDZ domain of Dvl-2 inside cells and down-regulated β-catenin
signalling stimulated by Wnt [36]. Thus, by using peptide probes against the Dvl-2 PDZ domain,
Zhang et al. were able to demonstrate that targeting this domain may be a viable means of
attenuating the growth of cancer cells that are dependent on Wnt-mediated signalling pathways,
and established the Dvl-2 PDZ domain as a valid drug target for cancer. Similar studies may be used to
identify potential drug targets for diseases including cancer.
Figure 4: Generating intracellular Dvl2-PDZ inhibitors using phage display. (A) Phage display was done against Dvl2 PDZ using an internal peptide library. The phage-derived binding preference was then used to design a peptide inhibitor, pep-N3. (B) The structure of pep-N3 in complex with Dvl-2 PDZ confirms the binding mode of the peptide. (C) For intracellular uptake, pep-N3 was fused to antennapedia and introduced into Wnt3a-responsive human embryonic kidney (HEK) 293S cell lines. Real-time cellular uptake of pep-N3 is observed using time-lapse microscopy. (D) Normalized TOPglow reporter activity, measured in Wnt3a-stimulated HEK293S cells after 18 h of treatment with pen-N3, shows inhibition of Wnt/TCF-dependent signalling. Pen-N3 does not inhibit the TCF response signal in the control APC mutant HCT-15 colon cell line. Western blots show that pen-N3 inhibits Wnt signalling by inhibiting the accumulation of beta-catenin in HEK293S cells treated with Wnt3a (right side panel). (Figures from Zhang et al 2008)
1.4.4 Drug discovery: Recent studies have suggested that peptide probes may themselves
serve as a starting point for drug discovery against peptide recognition domains. In their most direct
application, peptides themselves may serve as modulators of peptide recognition domains [37,
38]. Modifications such as stapling or cyclization may be performed to improve their
pharmacokinetic properties [31, 32]. Another popular method of drug discovery is to develop
peptidomimetics. Peptidomimetics are organic molecules that mimic peptides. Peptidomimetics
can be generated by replacing natural amino acids with amino-acid derivatives that make the
peptide molecule less susceptible to degradation and increase its stability [39]. Finally, peptide
probes may be used to design fluorescent detection assays that can then be used to
screen large libraries of compounds (Figure 5). Often these screens involve identification of
compounds that can displace the natural peptide from the binding site [40].
Figure 5: Fluorescence polarization assays for the discovery of small-molecule inhibitors of domain-peptide interactions. The chief method for identification of small-molecule compounds against domain-peptide interactions is to use a fluorescence polarization assay. A natural or synthetic peptide binder is fluorescently tagged and incubated with the target domain. A library of small-molecule drugs is screened to identify molecules that compete with the binding of the fluorescent peptide to the target domain. This allows rapid screening of large small-molecule libraries.
1.5 Goal of the project
Motivated by recent developments, the long-term goal of this project is to identify PRDs that
may act as novel cancer targets. To do this, we have focussed on shortlisted protein targets
in ovarian cancer provided by our collaborators, Dr. Rob Rottapel and Dr. Jason Moffat.
Ovarian cancer is the second most common gynaecological cancer and currently has
only one approved therapy. The 5-year survival rate for this cancer is only 47%, highlighting the
need to develop targeted therapies against ovarian cancer. To assist in the development of novel
therapies, our collaborators used whole-genome RNAi screens to knock down ~16,000 human genes in
15 different ovarian cancer cell lines [41]. Using this screen, they identified 1695 genes whose
knockdown severely affected proliferation of ovarian cancer cells. Based on the current
literature, we hypothesized that PRDs present on these ovarian cancer essential genes play an
essential role in tumorigenesis and may serve as drug targets for further investigation.
The study has two key aims:
1) Identify peptide recognition domains present on these 1695 gene targets in ovarian cancer
using computational methods; and
2) Generate peptide binders against these domains using peptide phage display.
The peptide binders generated here can then be used to design intracellular probes to
specifically modulate interactions mediated by these PRDs and study the effect of these
perturbations on cellular pathways in specific ovarian cancer cell lines. Such peptides can also be
used to identify PRDs that may serve as drug targets for ovarian cancer. Finally, peptide
inhibitors can be used to design assays to identify small molecules that target interactions
mediated by these PRDs.
2 Identification of peptide recognition domains essential in ovarian cancer
2.1 Introduction
The first goal of the project was to identify potential peptide recognition domains present on a
shortlisted group of proteins involved in ovarian cancer. The shortlisted candidates were based
on whole genome RNAi screens performed by our collaborators Dr. Jason Moffat and Dr. Rob
Rottapel and represent genes that are essential for cancer growth. The domains present on these
proteins were matched to known PRDs present in existing databases (PepX and DOMINO) in
order to identify potential PRDs.
Figure 6: Whole genome RNAi screen for identifying essential genes in ovarian cancer. A library of ~80,000 lentivirus-encoded shRNAs is used to selectively knock down 16,000 human genes in different cancer cell lines. Each shRNA is identified using a unique barcode. The genomic DNA is harvested at multiple time points. Genomic DNA from all the time points is hybridized on a microarray chip to study the specific growth rate of each unique cell type. shRNAs that knock down genes essential for cancer proliferation significantly affect the growth rate and can be detected by microarray analysis. Using this approach, our collaborators generated a list of 1695 human genes that affect the growth of 15 different ovarian cancer cell lines.
2.1.1 Whole Genome RNAi screen: RNAi is a powerful technique to knock down specific
genes and study their effect on biological pathways. RNAi studies have illuminated the roles of
various genes and helped to obtain a better understanding of their functions. Developments in
cell biology and molecular genetics techniques have made it possible to perform genome-wide
RNAi screens, where in a single experiment a large number of the genes in the human
genome can be targeted. These screens are performed using a library of short hairpin RNAs
(shRNAs) targeting many human genes, where each shRNA is encoded inside a lentiviral
expression vector. Lentiviral expression vectors allow specific shRNAs to be incorporated into
cells. The library of shRNAs is incubated with cancer cell lines to allow incorporation of a unique
shRNA inside a given cell in the population. Upon infection, the cells are allowed to proliferate
for 3-4 weeks, after which shRNAs that have been selectively depleted or enriched are identified
using microarrays, deep sequencing or high-content screening. Such pooled screens can be used
to define genes necessary for cancer cell proliferation/survival in cell culture [42].
In this study, we focused on screens done on ovarian cancer by our collaborators Dr.
Jason Moffat and Dr. Rob Rottapel (Figure 6) [41]. Using a library of 78,432 shRNAs, Marcotte
et al. targeted 16,056 genes in 15 different ovarian cancer cell lines. The cancer cell lines used in
their analysis are listed in Appendix A. To select genes that are essential for ovarian cancer,
Marcotte et al. followed the dropout rate of each shRNA. These dropout rates were derived by
calculating the slope between the measured microarray expression intensity at each time point
relative to the initial time point. These dropout rates were used to define the GARP (Gene
Activity Ranking Profile) score for each gene. Genes with a negative GARP score represent genes
that are critical for cancer proliferation. Using a cut-off to select highly essential genes, Marcotte
et al. identified 1695 genes that were essential across all ovarian cancer cell lines.
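The dropout calculation described above can be sketched in a few lines of Python. This is a simplified illustration and not the published procedure: the slope is estimated here by an ordinary least-squares fit of the change in log2 intensity relative to the first time point, and the gene-level score is assumed, for illustration only, to be the mean of the most negative shRNA slopes for that gene. The function names, the `n_lowest` parameter and the example values are hypothetical.

```python
def dropout_slope(time_points, log2_intensities):
    """Least-squares slope of log2 intensity change relative to the first time point."""
    baseline = log2_intensities[0]
    deltas = [value - baseline for value in log2_intensities]
    n = len(time_points)
    mean_t = sum(time_points) / n
    mean_d = sum(deltas) / n
    numerator = sum((t - mean_t) * (d - mean_d) for t, d in zip(time_points, deltas))
    denominator = sum((t - mean_t) ** 2 for t in time_points)
    return numerator / denominator

def gene_score(shrna_slopes, n_lowest=2):
    """Assumed gene-level summary: mean of the n_lowest most negative shRNA slopes."""
    lowest = sorted(shrna_slopes)[:n_lowest]
    return sum(lowest) / len(lowest)

# Hypothetical measurements for three shRNAs targeting one gene.
times = [0, 1, 2]
slopes = [
    dropout_slope(times, [10.0, 9.0, 8.0]),    # strongly depleted shRNA
    dropout_slope(times, [10.0, 10.0, 10.0]),  # unchanged shRNA
    dropout_slope(times, [10.0, 9.5, 9.0]),    # mildly depleted shRNA
]
score = gene_score(slopes)
```

A more negative score indicates that shRNAs against the gene are lost from the population more quickly, which is the behaviour expected for genes required for proliferation.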
In this study, I used these 1695 genes as an input to my computational pipeline. There are
specific reasons for focussing on genes obtained from whole genome RNAi screens: 1) The
whole genome RNAi screens provide an unbiased list of genes that are important for ovarian
cancer growth and hence allow us to focus on a much reduced set of genes, and 2) Given that
knockdown of these genes hampers cancer growth, the screen provides evidence that peptide
inhibitors of peptide recognition domains present on these genes may also negatively affect
cancer growth, which can be rapidly tested by delivering peptides inside ovarian cancer cell lines.
Peptides that successfully recapitulate the results obtained from whole genome RNAi screens can
serve as templates for development of cancer therapeutics.
2.1.2 Computational methods to identify peptide-recognition domains: Recent
developments in high-throughput experimental methods for identifying protein-protein
interactions have led to rapid identification of protein interaction partners. Experimentally known
domain-peptide pairs are documented in databases such as DOMINO [9] (a database of known
domain-peptide interactions), PepX [11] (a database of domain-peptide interactions for which
co-crystal structures are available) and ADAN [10] (a database of selected domain-peptide
interactions with known motifs). Other sources include ELM [7], which is a database of peptide-like
motifs but also includes information about domains that bind to such peptide motifs.
Computational methods have also been developed to identify novel PRDs [43]. These
approaches use sequence or structural similarity to known peptide binding domains present in
the databases mentioned above as a metric to identify novel PRDs. In this study, we focussed on
peptide recognition domains present on protein targets provided by our collaborators. To develop
a computational method for the identification of PRDs, we focused on domains that share high
sequence similarity to known PRDs present in the PepX and DOMINO databases (Figure 7).
Figure 7: Computational strategy for identifying potential peptide binding domains. Using databases such as DOMINO and PepX, I obtained a high-confidence list of known peptide recognition domains. Using this list, I searched for domains present on the target gene list that share high sequence similarity to known peptide recognition domains. The final domain list was then optimized by including information regarding the domain boundaries and expression conditions. For phage display, I selected domains which can be readily expressed in a bacterial system and have crystal structures in complex with a known peptide. Using this approach, I was able to identify 66 domains from the list of 1695 genes provided by our collaborators.
2.2 Methods
Identification of peptide recognition domains: The proteins encoded by each of the 1695
genes were obtained using UniProt annotations. Using BLAST, all the full-length proteins were
searched against domains present in PepX and DOMINO. Sequences with greater than 70%
sequence identity were retained, while the other sequences were discarded. The sequence cut-off
was chosen based on previous studies that show that accurate structural models (<2 Å rmsd) can
be built at this level of sequence identity. The
domain boundaries were then annotated based on the closest domain structure available
in the Protein Data Bank (PDB). Figure 7 shows the entire computational pipeline used for this
method.
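The identity filter in this step can be illustrated with a short Python sketch. It assumes that the BLAST results are available in the standard tab-separated tabular format (as produced by BLAST+ with `-outfmt 6`), in which the first three columns are the query identifier, the subject identifier and the percent identity. How the results were actually parsed in this study is not stated, so the function name and the example rows below are hypothetical.

```python
import csv
import io

def filter_hits_by_identity(blast_tabular, min_identity=70.0):
    """Keep (query, subject, percent identity) for hits above the identity cut-off."""
    kept = []
    reader = csv.reader(io.StringIO(blast_tabular), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject, identity = row[0], row[1], float(row[2])
        if identity > min_identity:
            kept.append((query, subject, identity))
    return kept

# Hypothetical tabular BLAST output (illustrative values only).
example = (
    "protA\tdomain1\t98.8\t87\t1\t0\t224\t310\t1\t87\t1e-50\t180\n"
    "protB\tdomain2\t45.0\t80\t44\t2\t10\t89\t1\t80\t1e-05\t40\n"
)
kept_hits = filter_hits_by_identity(example)
```

The query start and end columns of the retained hits (columns 7 and 8 in this format) could then serve as a first approximation of the domain boundaries before refinement against the PDB structure.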
2.2.1 Manual filtering and literature review of potential domains from PepX: Based on the
results obtained from the computational pipeline, only domains present in PepX were selected for
further investigation. Each domain in PepX has a crystal structure bound to a peptide ligand
present in the database. This gave us confidence that such a domain: 1) binds to peptides; and 2)
can be expressed in bacterial cells. Further, the structures of the domain-peptide complexes can be used
to validate the results obtained from phage display. The domains that were obtained from PepX
were manually analyzed to remove false positives. These included domains that do not make
direct contacts with the peptide or those that share the interaction surface with another domain.
Further, domains that could not be expressed in bacterial cells for crystallization were also
removed from the list. Finally, a literature review was done to analyze the domains obtained from
our computational pipeline.
Database used | Description | No. of domains obtained
PepX | Database of domains with known peptide recognition for which structures are available | 86
DOMINO | Database of domains known to bind to peptides | 390
PDB | Database of all known structures | 885
Pfam | Total no. of domains in our dataset | 5567
Table 2: Summary of results obtained from DOMINO and PepX.
2.3 Results and Discussion
2.3.1 Analysis of 1695 genes obtained from whole genome RNAi screens: 390 domains were
obtained from DOMINO and 86 from PepX (Table 2). As reasoned above, the 86 domains from
PepX were selected for further study. Upon manual analysis and filtration, 66 domains were
selected as targets for phage display.
2.3.2 Literature review of domain list obtained from the computational pipeline: The list
of 66 domains represents a good initial set for the study. To get a better understanding of the
kind of domains present in this list, a thorough literature review was performed and structural
information for each of these domains was annotated (Table 3). Structurally, these 66 domains
represent 42 domain families. These include well-characterized peptide binding domains such as
the SH3 and the PDZ domains, which have been extensively studied for their peptide binding
potential by structural, biochemical and combinatorial studies. I decided to keep these domains
in our list as they can be used as positive controls for future experiments.
Domain name | Protein name | Domain boundary | Related PDB structure | Comment
PDZ (2.30.42.10)
1 | Disc-large homolog 1 #1 | 224-310 | 2I0L (98.8) | First PDZ domain of DLG1; plays a role in planar cell polarity
2 | Disc-large homolog 1 #2 | 319-405 | 1TP5 (82.9) | Second PDZ domain of DLG1; plays a role in planar cell polarity
3 | Disc-large homolog 2 #2 | 193-279 | 2I0L (84) | Second PDZ domain of DLG2; plays a role in planar cell polarity
4 | Disc-large homolog 2 #3 | 421-501 | 1TP5 (83.1) | Third PDZ domain of DLG2; plays a role in planar cell polarity
5 | Disc-large homolog 4 #2 | 160-246 | 2I0L (89.2) | Second PDZ domain of DLG4; plays a role in planar cell polarity
6 | Disc-large homolog 4 #3 | 313-393 | 1TP5 (97.1) | Third PDZ domain of DLG4; plays a role in planar cell polarity
SH3 (2.30.30.40)
7 | Growth-factor receptor bound protein 2 | 1-58 | 2VWF (93.3) | N-terminal SH3 of Grb2, adaptor protein in RTK signalling
8 | Grb2-related protein 2 | 271-330 | 2W10 (93.3) | C-terminal SH3 of GRAP2, adaptor protein in RTK signalling
9 | Phospholipase gamma | 791-851 | 1YWO (93.4) | Involved in RTK signalling
10 | Sorbin and SH3 containing protein 2 | 938-999 | 2O9V (70) | Second SH3 domain of SORBS2, interacts with Abl kinase
Protein Kinase (3.30.200.20+1.10.510.10)
11 | Mitogen-activated kinase 3 | 42-330 | 2FYS (87.8) | Protein kinase signal cascade
12 | PKA | 44-298 | 2VO7 (99.4) | cAMP signalling
13 | PKB | 44-298 | 2VO7 (92.9) | cAMP signalling
14 | Serine/threonine kinase 2 | 81-684 | 1WBP (84.6) | Regulates p53, cell cycle
15 | Aurora kinase B | 2-344 | 2BFY (80.2) | Involved in chromosomal segregation
G-alpha subunit (3.40.50.300+1.10.400.10)
16 | G-alpha (i) 1 | 2-354 | 1Y3A (100) | Involved in G-protein signalling
17 | G-alpha (i) 3 | 2-354 | 1Y3A (93.5) | Involved in G-protein signalling
18 | G-alpha (o) 1 | 2-354 | 1Y3A (72.1) | Involved in G-protein signalling
Ligand binding domain of nuclear receptor (1.10.565.10)
19 | Bile acid receptor | 256-474 | 3BEJ (100) | Binds to bile acid, hormone receptor signalling
20 | Retinoic acid receptor-gamma | 261-463 | 3E94 (86.2) | Binds to retinoic acid, hormone receptor signalling
21 | Glucocorticoid receptor | 528-777 | 1M2Z (99.6) | Binds to cortisol, hormone receptor signalling
22 | Pregnane X receptor | 204-434 | 1NRL (100) | Orphan nuclear receptor, hormone receptor signalling
Dynein light chain (3.30.740.10)
23 | Dynein light chain 1 | 1-89 | 1CMI (100) | Part of dynein motor complex
24 | Dynein light chain 2 | 1-89 | 3E2B (96.6) | Part of dynein motor complex
RNA recognition module (3.30.70.330)
25 | Splicing factor U2AF1 | 65-147 | 1JMT (99) | mRNA splicing
26 | Splicing factor 45 | 306-385 | 2PEH (100) | mRNA splicing
Profilin (3.30.450.30)
27 | Profilin 1 | 1-140 | 2PAV (100) | Regulates cytoskeleton
28 | Profilin 2 | 1-140 | 2V8C (99.3) | Regulates cytoskeleton
Penta-EF hand (1.10.238.10)
29 | Programmed cell death receptor 6 | 23-191 | 2ZNE (100) | Intracellular Ca2+ signalling
30 | Calpain small regulatory subunit 1 | 1-268 | 1NX0 (97.1) | Regulates Ca2+ dependent calpain protease complex
Actin (3.30.420.40+3.90.640.10)
31 | Actin-gamma 1 | 1-375 | 3CHW (95.2) | Highly conserved in eukaryotes; plays a role in cytoskeleton
32 | Actin-gamma 2 | 3-374 | 2V52 (98.1) | Highly conserved in eukaryotes; plays a role in cytoskeleton
Beta-propeller (2.130.10.10)
33 | Clathrin heavy chain 1 | 1-363 | 1UTC (100) | Involved in endocytosis
34 | WDR5 | 20-334 | 3EMH (100) | Involved in histone modifications
PH/PTB domain (2.30.29.30)
35 | Dynamin 2 | 2-301 | 2AKA (87) | Microtubule-associated protein
36 | Disabled homolog 2 | 45-196 | (97.5) | Involved in endocytosis
P-loop containing nucleotide triphosphate hydrolase (3.40.50.300)
37 | RAC3 | 1-189 | 2QME (95.4) | Intracellular G-protein signalling
38 | RAD51 | 97-339 | 1N0W (100) | DNA damage response
Trypsin-like serine protease (2.40.10.10)
39 | Tissue-type plasminogen activator | 311-562 | 1RTF (100) | Extracellular protease
40 | Acrosin | 43-343 | 1FIW (71.1) | Extracellular protease
14-3-3 (1.20.190.20)
41 | 14-3-3 eta | 2-246 | 2O02 (75.5) | Adaptor protein in signalling pathways
AP50 domain (2.60.40.1170)
42 | AP-2 subunit mu | 122-435 | 1I31 (100) | Involved in endocytosis
Bcl-2 (1.10.437.10)
43 | Bcl-2 like protein 1 | 1-233 | 3FDL (100) | Regulates apoptosis
Bro1 (1.25.40.280)
44 | Alix | 3-392 | 3C3R (100) | Intracellular protein transport
CAP/Gly (2.30.30.190)
45 | Dynactin subunit 1 | 26-97 | 2HQH (100) | Microtubule-associated protein
Caspase-like (3.40.50.1460)
46 | Caspase 2 | 155-452 | 1PYO (100) | Intracellular protease, apoptosis
DNAse1-like (3.60.10.10)
47 | DNAse 1 | 23-282 | 2D1K (78.8) | DNAse involved in apoptosis
eIF4E (3.30.760.10)
48 | eIF4E | 2-217 | 2V8Y (100) | Translation initiation factor
FERM (1.20.80.10)
49 | Band 4.1-like protein 3 | 110-391 | 3BIN (100) | Negative growth regulator
Ig domain (2.60.40.10)
50 | Immunoglobulin lambda-like polypeptide 1 | 38-213 | 1W72 (84.1) | B-cell surface receptor
Importin-beta (1.25.10.10)
51 | Importin beta-1 | 1-876 | 1QGR (99.8) | Nuclear import
Mad2A (3.30.900.10)
52 | MAD2-like protein 1 | 2-205 | 2QYF (99.5) | Anaphase cell cycle checkpoint
MHC II (3.10.320.10)
53 | DRB1 beta | 30-227 | 1T5X (100) | Antigen recognition
PCNA (3.70.10.10)
54 | Proliferating cell nuclear antigen | 1-261 | 2ZVM (100) | DNA replication
OB-fold domain (2.40.50.140)
55 | Replication factor A 70 | 2-121 | 2B3G (100) | Formation of replication fork
Serpin (3.30.497.10+2.30.39.10)
56 | Plasma serine protease inhibitor | 20-406 | 1LQ8 (99.7) | Inhibitor of serine protease
SH3-type barrels (3.40.50.300/2.30.30.40)
57 | Voltage-dependent L-type calcium channel subunit beta | 65-411 | 1T3L (76.6) | Ca2+ channel, G-protein signalling
SWIB/MDM2 (1.10.245.10)
58 | MDM4 | 26-106 | 3DAB (100) | Regulator of p53, apoptosis
TAP-UBA (1.10.8.10)
59 | Nuclear export factor 1 | 565-619 | (100) | Nuclear export of mRNA
Winged helix repressor DNA binding domain (1.10.10.10)
60 | Transcription factor IIF | 449-517 | 1J2X (100) | General transcription
TRFH (1.25.40.201)
61 | Telomeric repeat-binding factor 1 | 58-268 | 3BQO (100) | Regulates telomeric length
Factor Xa inhibitor (4.10.410.10)
62 | Amyloid-like protein 2 | 306-364 | 1CA0 (71.2) | Regulation of homeostasis
Ubiquitin-like (3.10.20.90)
63 | Ubiquitin-60S ribosomal protein L40 | 1-76 | 2D3G (100) | Post-translational modification, regulates protein function
Vinculin (1.20.1490.10)
64 | Vinculin | 1-259 | 1YDI (100) | Actin-filament binding protein
XRCC4 (2.170.210.10+1.20.5.370)
65 | XRCC4 | 1-213 | 1IK9 (98.1) | Double-stranded DNA break repair
Tyrosine phosphatase (3.90.190.10)
66 | Tyrosine-protein phosphatase non-receptor type 22 | 1-310 | 3BRH (99.4) | Regulator of tyrosine kinase SRC family of kinases
Table 3: List of the 66 domains selected for the phage display experiments. The table shows the different protein families and their structural classification code (CATH) found by our computational method. The table also shows the boundary of the domain (defined by PDB structure) and the reference PDB structure. The sequence similarity between the reference structure and domain is shown in brackets.
Apart from these well-characterized domains, I also obtained 52 domains for which this study
represents the first combinatorial study to identify their binding preferences. These 52 domains
are part of 39 unique protein families. If we are able to obtain peptides against these domains
using phage display, we can in principle extend phage display to other members of the family.
These 52 domains are involved in diverse cellular pathways including gene expression
(nuclear receptors), endocytosis (clathrin, AP2), cytoskeleton modelling (actin, vinculin,
dynactin), receptor tyrosine kinase signalling (Grap2, Grb2, PLCG1, 14-3-3) and apoptosis (Mdm4,
Bcl-xl), to name a few.
The 66 domains identified by our computational pipeline also include some known cancer
targets. These include Bcl-xl, which has been extensively targeted for its anti-apoptotic role and
was discussed previously in detail [18]. We also identified eIF4E, a translation initiation factor that is
responsible for binding to mRNA caps and loading the mRNAs onto ribosomes. In different cancers,
eIF4E is over-expressed, which leads to expression of mRNA with unstable 5’ UTR [40].
The presence of previously known cancer targets in our dataset provides us with confidence that
using our computational pipeline, we have been able to identify PRDs that are important in
cancer.
2.4 Summary
In this chapter, I have discussed the computational pipeline we used to identify potential peptide
recognition domains on shortlisted protein candidates involved in ovarian cancer. I used a
sequence homology-based approach to identify domains present on each protein that are similar
to known PRDs in the PepX database. Using this approach, I identified 66 domains that will
serve as targets for my phage display experiments.
For this study, I used essential genes obtained from whole genome RNAi screens as a
surrogate for genes involved in ovarian cancer. A number of these genes are involved in key
regulatory pathways that are conserved between normal cells and cancer cell lines and hence
may not represent viable cancer targets. To overcome these limitations, recent studies have
integrated data from other functional genomics screens such as mRNA expression data, copy-
number variations and exome sequencing to accurately predict proteins important for
carcinogenesis [43]. While integration of data from multiple sources may help in generating a
more refined list of cancer related proteins, such an analysis is beyond the scope of the current
study.
For identifying PRDs, I selected two databases, PepX and DOMINO. Both these
databases provided me with a large number of potential peptide recognition domains. For further
analysis, I focused on domains obtained from PepX. The domains obtained from PepX were then
manually filtered to remove false-positive hits. This provided me with a shortlist of 66
domains from distinct structural folds and biological pathways. Some
of them have previously been studied in the context of cancer, and in some cases the domains
themselves have been established as drug targets.
It is important to interpret the results obtained from the computational pipeline in the context of future
experiments. Using a simple analysis, I was able to obtain a diverse set of potential PRDs that
can be used as targets for phage display. The conclusions drawn from this study can be extended
to other studies of a similar kind.
3 Identification of peptide binders using phage display
3.1 Introduction
After selecting the potential targets for phage display, my next aim was to generate peptides
against each of these domain targets. To do this, I used phage display technology to screen large
peptide libraries. As described previously, phage display is a directed evolution approach in
which peptides are displayed on the surface of the filamentous bacteriophage M13 using
specialized vectors known as phagemids. Site-directed mutagenesis can then be used to generate
large peptide libraries in which each phage member displays a unique peptide on its surface. Phage
display has been used extensively to generate peptides with high affinity and specificity against
different protein targets. In the Sidhu lab, peptide phage display has been used to identify the
binding preferences of a large number of human and yeast SH3 and PDZ domains [28, 29]. In
the case of Dvl2-PDZ domains, phage-derived peptides were used as intracellular inhibitors of
Fzd7-Dvl2 interactions, thereby knocking down Wnt signalling, an important signalling pathway
[36]. In this section I will describe the results obtained from the phage display screens.
3.1.1 Displaying peptides on phage particles: In the Sidhu lab, we use M13, a single-stranded
DNA virus from the Inoviridae family, for expression of peptides. M13 viruses infect
gram-negative bacteria such as E. coli. There are several advantages to using M13 for phage
display experiments. First, it follows a non-lytic life cycle, which makes it easier to grow and
propagate in the lab. Second, its DNA is present in single-stranded form, which makes it possible
to genetically display proteins on the surface of M13 using site-directed mutagenesis. The coat of
M13 is made up of five proteins, as shown in Figure 8. Two of these proteins, p8 and p3, have
been used previously for displaying proteins. The p3 protein is present in five copies on the phage and is
required for infection. Various proteins such as antibodies and fibronectin have been successfully
displayed on the surface as p3 fusions without affecting infection of bacteria. The other protein
regularly used for phage display is p8, the major coat protein, which is present all over
the surface. Small peptides (<10 amino acids in length) can be fused to p8 without affecting the
assembly and secretion of the phage particle. In this study, we use the p8 coat protein because it allows
multiple copies of a peptide to be present on each phage particle, which permits selection of
lower-affinity peptides [44].
To display peptides on the M13 phage surface, specialized vectors called phagemids
are required. The phagemid contains a single copy of the gene encoding the p8 phage protein under
the control of an IPTG-inducible Ptac promoter, an antibiotic-resistance cassette, and both
single-stranded and double-stranded origins of replication. The peptide is fused to the N-terminal
end of the p8 coat protein such that it is expressed along with p8 inside the bacterial host [27]. Once the phagemid is
introduced into the cell, it is replicated into multiple ssDNA copies inside the bacterial host. The
infected cells can be selected using the resistance marker. To initiate formation of new virus
particles, the cells are super-infected with a modified M13 phage that acts as a “helper”. The helper
phage leads to production of single-stranded phagemid DNA that can be effectively packaged
into virion particles. The packaging process also incorporates the mutant coat protein produced by the
phagemid. The abundance of mutant coat protein is dictated by the IPTG concentration in the
culture media, which may help in optimizing the number of peptides displayed by the bacteriophage
[27].
Figure 8: Schematic diagram of M13 bacteriophage. M13 filamentous phage is made up of 5 proteins. P8, the major coat protein is the most abundant coat protein that forms the cylinder around the phage ssDNA. The distal end of M13 assembles first and contains approximately three to four copies of p7 and p9. The proximal end is formed by five copies each of p6 and p3. The p3 coat protein is required for infection of the bacterial host. P8 and p3 coat proteins are used extensively for phage display.
3.1.2 Site-directed mutagenesis and phage library design: Once a protein is successfully
displayed on the M13 phage coat, mutations can be introduced into its encoding DNA in order to
generate vast numbers of variants. The ease of manipulating M13 ssDNA makes this phage an
ideal system for the synthetic construction of libraries of up to 10^11 unique clones.
Changes to the phagemid DNA are performed in a series of reactions known as Kunkel
mutagenesis. In brief, E. coli cells deficient in deoxyuridine triphosphatase (dUTPase; dut) and
uracil-DNA glycosylase (ung) are used to synthesise a uracil-rich version of the ssDNA phagemid
(dU-ssDNA) that serves as the template for the mutagenesis reaction. Synthetic oligonucleotides
that introduce mutations into the region of interest anneal to the dU-ssDNA template and serve as
primers for synthesis of the complementary strand. This reaction is completed in the absence of
uridine to form covalently-closed circular double-stranded DNA (CCC-dsDNA) with an original
uracil-rich DNA strand and a mutagenic DNA strand (Figure 9). Transformation of the
CCC-dsDNA into a dut+/ung+ bacterial host results in the degradation of the uracil-rich strand and
retention of the mutagenic strand. The CCC-dsDNA is then electroporated into a bacterial host
infected with M13 helper phage to synthesize the phage library. Kunkel site-directed
mutagenesis is ideal for phage display applications because it allows for complete control over
library construction, from the design of the mutagenic oligonucleotides themselves to the
annealing and synthesis conditions [44].
Figure 9: Oligonucleotide-directed mutagenesis with an ssDNA template. (A) A synthetic oligonucleotide (red arrow) is annealed to the template (dU-ssDNA). The oligonucleotide contains a region with the desired mutations flanked by perfectly complementary sequences. (B) Covalently-closed circular dsDNA (CCC-dsDNA) is enzymatically synthesized by T7 DNA polymerase and T4 DNA ligase. (C) CCC-dsDNA is introduced into an E. coli host using electroporation.
Different peptide libraries can be generated using different sets of mutagenic
oligonucleotides. The library used for this project was obtained from Dr. Gang Chen, a
post-doctoral fellow in the Sidhu lab. The peptides are 16 amino acids long and each position
can accommodate any of 19 amino acids (all except cysteine). Cysteine is excluded because
it may lead to cyclization and disruption of the linear structure of peptides. The oligonucleotides
used in designing the library were obtained from TriLink Biotechnologies. These
oligonucleotides were synthesized three nucleotides at a time rather than one nucleotide at a time, as is done
by other vendors. This allows exactly one codon per amino acid, removing the codon bias that is generally
observed in oligonucleotides that use NNK codons for randomization (where 32 codons code
for 20 amino acids).
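The codon bias mentioned above can be illustrated by enumerating the NNK codons directly. The following is a minimal sketch, not part of the library design itself; it assumes the standard genetic code and the usual definition of NNK (N = A, C, G or T; K = G or T), and the function names are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_codons():
    """Return every codon matching NNK (N = any base, K = G or T)."""
    return ["".join(c) for c in product(BASES, BASES, "GT")]


def codon_usage(codons):
    """Count how many of the given codons encode each residue (or stop)."""
    return Counter(CODON_TABLE[c] for c in codons)


usage = codon_usage(nnk_codons())
print(len(nnk_codons()), "NNK codons")
for residue, count in sorted(usage.items()):
    print(residue, count)
```

In this enumeration some residues are encoded by three NNK codons and others by one, and one codon (TAG) is a stop codon. Trinucleotide synthesis avoids both effects by supplying one chosen codon per allowed amino acid.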
3.1.3 Selection strategy: The peptide library constructed can be used to screen for peptide
binders against a target protein. After incubation of the library with the immobilized target, non-
specific phage particles are removed through a series of washes. The remaining bound clones are
eluted and amplified in a bacterial host, allowing for further rounds of screening to enrich for
clones expressing proteins with the desired traits. Figure 10 shows the entire phage display
selection pipeline used in this study. The success of the selection depends on both the quality of
the phage display library and the quality of the protein targets [44].
Figure 10: Phage display selection for peptide recognition domains. The peptide library was incubated with immobilized antigen. The non-binders were washed away while positive binders remained attached to the plate. The bound phage were eluted and amplified in a bacterial host. The process was repeated five times to obtain an enriched set of binders. The phage pools from Round 5 were introduced into a bacterial host and plated on LB plates. 96 colonies were picked for each domain and grown overnight to obtain phage clones. Each of the 96 clones was tested for binding in phage ELISA. The clones that showed a high enrichment ratio were sequenced. The DNA sequences obtained were processed and translated to obtain the peptide sequences. The peptides were aligned manually or using multiple sequence alignment tools to obtain peptide logos.
From the library standpoint, quality can be affected by library construction or display
levels. Inefficient completion of the site-directed mutagenesis reaction may result in a large
proportion of phage particles that display the wild-type p8 coat protein. Further, if the number of
peptide copies on each phage particle is low, it may lead to weak binding. In both cases, such
libraries would offer a reduced chance of identifying peptides against their targets. The diversity
of the peptide library obtained from Dr. Gang Chen had previously been tested using phage
titrations. The IPTG concentration required for adequate display was also known and well
documented. However, the library was amplified for use in this study, so phage titrations were
repeated to estimate the diversity and size of the peptide library before it was used in
further experiments.
From the immobilized target side, the quality and stability of the target are both important
factors in the success of a selection. For example, the presence of contaminants in impure protein
samples or denaturation of the samples themselves can result in the enrichment of unwanted
phage clones. Furthermore, the use of constructs that are unstable may result in a heterogeneous
and inconsistent interface that differs between rounds and is not amenable to enrichment of
binders against the intended target conformation. Consequently, SDS-PAGE and
spectrophotometry were used to test the purity and quantity of the protein target.
In light of these considerations, it is important not only to monitor the behaviour of the
phage population throughout the selection but also to optimize the selection conditions and
reagents where required for a successful outcome. One important consideration in
designing the selection strategy for this study was the presence of a GST tag on each of the
domains. To remove any peptides that bind to the GST tag, the peptide library was pre-incubated in
a well containing only GST. Selections were also done in the presence of high GST concentrations to
further remove any weak GST binders.
3.1.4 Selection of tight-binding peptides and identification of binding specificities: The
progress of the selection screen is monitored using an enzyme-linked immunosorbent assay
(ELISA) and phage titrations. In a phage titration, the phage obtained at the end of each round of
selection is used to infect an exponentially growing bacterial culture. Upon infection, the bacterial
culture is serially diluted and plated on a plate containing the selectable marker to select the
cells that were successfully infected by the virus. The number of viruses (colony forming units or
cfu) obtained after each round of selection is calculated by counting the colonies obtained in the
serial dilutions. The enrichment ratio is defined as the ratio of the number of colony forming units
(cfu/ml) obtained from the target well to that obtained from the negative control well (BSA). In a successful phage
display experiment, the enrichment ratio increases after each round of selection.
In ELISA, the phage population obtained at each round is incubated with the immobilized
target and with control proteins (GST and BSA) in parallel. Unbound phage particles are removed
from the wells through a series of washes and the remaining phage are then probed with
anti-M13 antibodies conjugated to horseradish peroxidase. Addition of the substrate results in the
formation of a blue pigment, and the reaction is stopped with phosphoric acid to allow for a
spectrophotometric reading at 450 nm. The enrichment ratio is determined by comparing the
signal intensity of the target well with that of the negative control well (BSA). As with phage
titrations, in a successful phage display experiment the enrichment ratio should increase after
each round of selection.
ELISA can also be used to determine the strength of binding of individual phage clones
obtained after all rounds of selection are complete. Depending on the stringency of the
experiment, tight binders can be defined as clones with a target-to-control signal ratio of five or greater.
3.2 Methods
3.2.1 Strains: E. coli strain XL1 Blue was used for expression of GST-fusion proteins. Peptide
phage display libraries were re-amplified in the T1-resistant E. coli strain SR320, which was
generated by mating strain XL1 Blue to strain MC1061. All phage amplifications during
selection experiments were done in XL1 Blue.
3.2.2 Protein expression and purification: The DNA encoding the 66 shortlisted domains
was chemically synthesized (Genscript) and cloned into an IPTG-inducible expression vector
(pGEX) with a Ptac promoter and N-terminal 6xHis and glutathione-S-transferase (GST) tags,
available in the Sidhu lab (pHH0103, Appendix C). The protein sequences of the 66
domains are given in Appendix B. The plasmids containing these domains were transformed
into chemically competent XL1 Blue cells. Single colonies were propagated in 2YT + 100 ug/ml
carbenicillin and stored as glycerol stocks (10% glycerol v/v) at -80°C.
For protein expression, 5 ml starter cultures were inoculated from glycerol stocks and
grown overnight at 37°C, 200 rpm. The following day, 2-L baffled flasks containing 500 ml of
2YT + 100 ug/ml carbenicillin were inoculated with the starter culture. The cells were grown to
logarithmic phase (OD600=0.6) at 37°C, 200 rpm and induced with 0.4 mM
isopropyl-β-D-thiogalactopyranoside (IPTG) for protein expression. The cells were grown for 16 hrs at 16°C,
200 rpm. The cells were harvested by centrifugation (17,600 x g) at 4°C for 20 min and frozen at
-20°C.
Frozen cell pellets were re-suspended in 12.5 ml of 1x phosphate-buffered saline (PBS)
with 1 mM EDTA, 1 mM DTT, 0.5% Triton X-100 (v/v) and protease inhibitors (1 tablet
per 50 ml of buffer, Roche). Sonication (three 2-min cycles of 5 sec “ON”, 5 sec “OFF”,
amplitude 25%) was used for cell lysis. The cell debris was removed by centrifugation (26,700 x
g) at 4°C for 20 min. The cell lysate obtained was then incubated with equilibrated
glutathione-sepharose 4B resin (GE Healthcare) at 4°C for 2 hrs. The resin and cell lysate
mixture was then applied to a gravity-flow column. The column was washed with buffers (first
with 3 ml PBS, second with 3 ml PBS + 150 mM NaCl and finally with 3 ml PBS). The column
was then blocked and the resin was incubated with 1 ml elution buffer (100 mM glutathione in 50
mM Tris-Cl, pH 8, 1 mM PMSF, 1 mM EDTA) for 20 minutes. The eluate was collected and kept
for further analysis.
SDS-PAGE and spectrophotometry were used to check the purity and size of the protein and to estimate
its concentration. The proteins obtained were immediately aliquoted into smaller
volumes, frozen in liquid nitrogen and stored at -80°C. To troubleshoot the protein
purification pipeline, samples were collected after overnight incubation, after lysis and from the
column flow-through, and tested using SDS-PAGE.
3.2.3 Library construction and design: The peptide library used for the selections was 16
amino acids in length, and each of the 16 positions can harbour any of 19 amino acids (Cys is not
included). The library has a theoretical diversity of 2.88x10^20. The primary library had a
diversity of 4x10^10 and a titer of 10^12 cfu/ml. The phagemid used to design the library is listed in
Appendix C (pR4STOP).
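The theoretical diversity quoted above follows from 19 possible residues at each of 16 positions. The short sketch below reproduces that figure and compares it with the primary library diversity given in the text; the variable names are illustrative only.

```python
ALPHABET_SIZE = 19          # 20 amino acids minus cysteine
PEPTIDE_LENGTH = 16         # randomized positions per peptide
OBSERVED_DIVERSITY = 4e10   # primary library diversity quoted in the text

# Number of distinct 16-mers that can be written with 19 residues.
theoretical = ALPHABET_SIZE ** PEPTIDE_LENGTH

# Fraction of the theoretical sequence space present in the physical library.
sampled_fraction = OBSERVED_DIVERSITY / theoretical

print(f"theoretical diversity: {theoretical:.2e}")
print(f"fraction sampled:      {sampled_fraction:.1e}")
```

The physical library therefore samples only a very small fraction of the possible 16-mers, which is typical for fully randomized peptide libraries of this length.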
The library was re-amplified by infecting actively growing SR320 at OD600=0.8
containing 5x10^12 cells in a 250 ml culture flask. The library was added to the culture such that
the ratio of phage to cells was 1:1. The culture was incubated for 30 minutes at 37°C, 200 rpm.
Helper phage (M13KO7) was then added to the culture (such that the ratio of helper phage to
bacteria was 10:1) to initiate the packaging of viral particles. After an hour of
incubation at 37°C, 200 rpm, the culture was added to 5 L of 2YT media and grown for 19 hrs at
37°C, 200 rpm. The cells were pelleted by centrifugation (17,600 x g) at 4°C for 20 min. The
supernatant containing the bacteriophage particles was incubated with 20% v/v PEG/NaCl (20%
PEG-8000 (w/v), 2.5 M NaCl) at 4°C for 20 min. The supernatant was then centrifuged (26,700 x
g) at 4°C for 20 min to obtain the white phage pellet. The remaining supernatant was removed
by pipetting. The phage pellets were re-centrifuged (26,700 x g) at 4°C for 2 min to concentrate
the pellet and then re-suspended in 20 ml PBT (1xPBS, 0.05% Tween 20 (v/v) and 0.5% BSA
(w/v)). The final library was stored at -80°C with 10% glycerol (v/v). Phage titrations were
performed to estimate the purity and titer of the phage library.
3.2.4 Phage display selections: Phage display was done using the previously established
protocol described by Tonikian et al. [45].
First round: The target proteins were immobilized on a microtiter plate (Nunc MaxiSorp
96-well plate) by incubating the proteins overnight at 4°C. For each protein, five wells were used
with three wells for the protein and two for the negative control (PBS). Each well was incubated
with 100 ul of 10 ug/ml purified protein. The overnight coated wells were blocked with 200 ul of
PBT buffer (1xPBS, 0.05% Tween 20 (v/v) and 0.5% BSA (w/v)). The blocked wells were
washed three times with PT buffer (1xPBS, 0.05% Tween 20 (v/v)).
The phage library was re-suspended to a final concentration of 5x10^12 phages/ml in PBT
buffer (1xPBS, 0.05% Tween 20 (v/v), 0.5% BSA (w/v)), added to each well and incubated for
2 hrs at room temperature. The unbound phages were removed and the wells were washed eight
times with PT buffer (1xPBS, 0.05% Tween 20 (v/v)). Bound phages were then eluted by
incubating with 0.1 N HCl for 5 minutes at room temperature. The eluted phages were then
neutralized using Tris-Cl, pH 11.
The eluted and neutralized phages were incubated in 10 volumes of actively growing
XL1 Blue cells (OD600=0.6) at 37°C, 200 rpm for 30 minutes. Helper phage was then added to a
final concentration of 10^10 phages per ml to initiate the formation of viral particles. The cells
were grown at 37°C, 200 rpm for 60 minutes. Kanamycin at 50 ug/ml was used to select for cells
that had been super-infected with helper phage and the culture was grown overnight at 37°C,
200 rpm.
Rounds 2, 3, 4 and 5: The target protein was immobilized on a microtiter plate (Nunc MaxiSorp
96-well plate) by incubating the protein overnight at 4°C. For each protein, five wells were used
with three wells for the protein and two for the negative control (BSA). Each well was incubated
with 100 ul of 10 ug/ml purified protein. The overnight coated wells were blocked with 200 ul of
PBT (1xPBS, 0.05% Tween 20 (v/v) and 0.5% BSA (w/v)). The blocked wells were washed
three times with PT buffer (1xPBS, 0.05% Tween 20 (v/v)).
Phages obtained from the previous round of selection were collected from overnight cultures.
The cells were pelleted by centrifugation (26,700 x g) at 4°C for 20 min. The virus particles
present in the supernatant were incubated with 20% v/v PEG/NaCl (20% PEG-8000 (w/v), 2.5
M NaCl) at 4°C for 20 min. The supernatant was then centrifuged (26,700 x g) at 4°C for 20 min
to obtain the white phage pellet. The remaining supernatant was removed by pipetting. The
phage pellets were re-centrifuged (26,700 x g) at 4°C for 2 min to concentrate the pellet and then
re-suspended in 1 ml PBT (1xPBS, 0.05% Tween 20 (v/v) and 0.5% BSA (w/v)). 100 ul of
phages was added to the respective wells and incubated for 2 hrs at room temperature. The
unbound phages were removed and the wells were washed eight times with PT buffer (1xPBS, 0.05%
Tween 20 (v/v)). Bound phages were then eluted by incubating with 0.1 N HCl for 5 minutes at
room temperature. The eluted phages were then neutralized using Tris-Cl, pH 11.
The eluted and neutralized phages were then incubated in 10 volumes of actively growing
XL1 Blue cells (OD600=0.6) at 37°C, 200 rpm for 30 minutes. Helper phages were then added to a
final concentration of 10^10 phages per ml to initiate the formation of viral particles. The cells
were grown at 37°C, 200 rpm for 60 minutes. Kanamycin at 50 ug/ml was used to select for cells
that had been super-infected with helper phage and the culture was grown overnight at 37°C,
200 rpm.
In rounds 1 and 2, pre-selection was done for 60 minutes at room temperature on
wells coated with 10 ug/ml GST to remove phage clones that bind to the GST tag (present in each
purified protein). In rounds 3, 4 and 5, a 10-20 fold excess of GST was added to each well
coated with target domain during incubation of the library to further remove clones that
preferentially bind to the GST tag.
3.2.5 Calculation of enrichment ratio: Phage titrations (to determine the number of phages obtained
from the protein well and the control well and to calculate the enrichment ratio) were performed
at each round of selection. Briefly, 50 ul of phage obtained from each round of selection was added
to 450 ul of XL1 Blue cells (OD600 = 0.6) and incubated for 30 minutes at 37°C, 200 rpm. The cells
were then serially diluted (10-fold dilution series) in 2YT. The dilutions were spotted on
an LB agar plate with 100 ug/ml carbenicillin. The plates were incubated overnight at 37°C.
The same procedure was followed for the phages obtained from the control well (GST/BSA). The next day,
colonies were counted on the protein and control plates to calculate the number of phages
present after a round of selection. The enrichment ratio was calculated as the ratio of colonies from the
protein well to colonies from the control well. The enrichment ratio was calculated for each round of
selection.
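The titration arithmetic described above can be sketched as follows. This is an illustration under stated assumptions rather than the exact calculation used in the study: the volume spotted per dilution is not given in the text and is treated here as a parameter, and the colony counts in the example are hypothetical.

```python
def titer_cfu_per_ml(colonies, dilution_exponent, spotted_volume_ul,
                     infection_dilution=10):
    """Estimate the phage titer (cfu/ml) from colonies counted at one dilution.

    colonies:           colonies counted in the spot
    dilution_exponent:  number of 10-fold serial dilution steps before spotting
    spotted_volume_ul:  volume spotted on the plate (an assumption; not stated
                        in the text)
    infection_dilution: 50 ul of phage added to 450 ul of cells is a 10-fold
                        dilution of the phage stock
    """
    if spotted_volume_ul <= 0:
        raise ValueError("spotted volume must be positive")
    colonies_per_ml = colonies * 1000.0 / spotted_volume_ul
    return colonies_per_ml * (10 ** dilution_exponent) * infection_dilution


def enrichment_ratio(target_titer, control_titer):
    """Ratio of phage recovered from the target well to the control well."""
    if control_titer <= 0:
        raise ValueError("control titer must be positive")
    return target_titer / control_titer


# Hypothetical counts: 12 colonies at the 10^-4 dilution for the target well,
# 6 colonies at the 10^-2 dilution for the control well, 5 ul spotted each.
target = titer_cfu_per_ml(12, 4, 5)
control = titer_cfu_per_ml(6, 2, 5)
print(f"target:  {target:.2e} cfu/ml")
print(f"control: {control:.2e} cfu/ml")
print(f"enrichment ratio: {enrichment_ratio(target, control):.0f}")
```

Because the same spotted volume and infection dilution apply to both wells, they cancel in the ratio; only the colony counts and dilution steps affect the enrichment ratio.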
3.2.6 Clonal ELISA and sequencing of peptides: 50 ul of eluted phages from rounds 3, 4 and
5 was added to 450 ul of XL1 Blue cells (OD600 = 0.6) and incubated for 30 minutes at 37°C, 200
rpm. The cells were then serially diluted (10-fold dilution series) in 2YT. The dilutions
were spotted on an LB agar plate with 100 ug/ml carbenicillin. The plates were incubated at 37°C,
200 rpm overnight. For each protein, 96 colonies were picked and grown overnight in 450 ul of
2YT containing 100 ug/ml carbenicillin and 10^10 phages/ml M13KO7
at 37°C, 200 rpm in a 96-well block. The overnight cultures were
centrifuged at 3400 x g for 15 minutes the next day.
Phage clones were tested for binding to protein, GST and BSA in an ELISA. For
each protein, 96 clones were tested in a single microtiter plate (384-well MaxiSorp plate, Nunc).
The 384-well plate was divided into 96 sections of four wells each. In each section, two wells
were coated with 30 ul of 10 ug/ml protein, one well with 10 ug/ml GST, and one well was left
empty, overnight at 4°C. The plate was then blocked with 50 ul of PBT buffer (1xPBS, 0.05%
Tween 20 (v/v) and 0.5% BSA (w/v)) for two hrs at room temperature. 30 ul of phage
supernatant was added to all four wells in each section and incubated for 60 minutes
at room temperature. Wells were washed four times with PT buffer (1xPBS, 0.05% Tween 20
(v/v)). HRP-conjugated anti-M13 antibody was diluted 1:5000 in PBT buffer (1xPBS, 0.05%
Tween 20 (v/v), 0.5% BSA (w/v)) and 30 ul was added to each well. The antibody was incubated
for 45 minutes at room temperature and then discarded. The wells were washed eight times with
PT buffer (1xPBS, 0.05% Tween 20 (v/v)). Colorimetric HRP substrate reagents (TMB
substrate, Pierce) were mixed in equal volumes and 25 ul was added to each well and incubated
at room temperature for 5-10 minutes with gentle shaking. The reaction was stopped by adding
30 ul of 1 M H3PO4. Absorbance at 450 nm was measured for each well using an ELISA plate
reader.
The enrichment ratio was calculated by comparing the signal intensity in the protein wells with that in the
GST and BSA wells. Clones with an enrichment ratio greater than five and a GST background signal
of 0.1 or less were selected as true binders. For each binder, the DNA
sequence encoding the peptide displayed by that phage clone was amplified using PCR and
identified by DNA sequencing. The DNA sequences
obtained were processed and translated to obtain the peptide sequences. The peptides against each
domain were aligned manually or using multiple sequence alignment tools (Geneious) to obtain
peptide logos. To date, I have obtained phage clones that bind specifically to 27 of the 44
domains (61% of the purified domains, 41% of all the domains).
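The hit-calling rule described above (enrichment ratio greater than five and GST background of 0.1 or less) can be sketched as below. The text does not specify how the two control wells are combined, so this sketch assumes the ratio is taken against the larger of the GST and BSA-blocked well signals; the class name, field names and absorbance readings are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Tuple


@dataclass
class CloneReading:
    """A450 readings for one clone (one four-well section of the plate)."""
    clone_id: str
    target_a450: Tuple[float, float]  # duplicate target-coated wells
    gst_a450: float                   # GST-coated well
    blank_a450: float                 # uncoated, BSA-blocked well


def enrichment_ratio(reading, floor=1e-6):
    """Mean target signal over the larger control signal (assumed definition)."""
    background = max(reading.gst_a450, reading.blank_a450, floor)
    return mean(reading.target_a450) / background


def is_binder(reading, min_ratio=5.0, max_gst=0.1):
    """Apply the two thresholds quoted in the text."""
    return enrichment_ratio(reading) > min_ratio and reading.gst_a450 <= max_gst


# Hypothetical readings for three clones.
readings = [
    CloneReading("A1", (1.20, 1.00), 0.08, 0.06),  # strong, low GST background
    CloneReading("A2", (0.90, 1.10), 0.40, 0.05),  # binds GST as well
    CloneReading("A3", (0.30, 0.30), 0.07, 0.07),  # weak signal
]
hits = [r.clone_id for r in readings if is_binder(r)]
print(hits)
```

Only clones that pass both thresholds would be carried forward to PCR amplification and sequencing.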
3.2.7 Structural modeling of phage-display obtained results: All structural models were
obtained with Modeller. Modeller was installed on a Linux machine and run from the command
line. Discovery Studio Visualizer was used to analyze the results obtained from Modeller.
Energy minimization of Modeller structures was performed with the Molecular Dynamics (MD)
plug-in available in the licensed version of Discovery Studio.
3.3 Results
3.3.1 Selection of peptide binders using phage display: Each of the 66 domains was expressed and purified
using the GST purification protocol described in the methods section. Purified proteins were
obtained for 44 out of 66 domains (67%). Table 5 contains all the results obtained from protein
purification. For most domains, I was able to obtain sufficient protein for performing phage
display. SDS-PAGE gels were run to check the size and purity of the proteins, and
also to diagnose the entire expression and purification process. The 22 domains that could
not be purified showed high expression of protein. However, in all such cases the expressed
protein was insoluble and went into the cell debris upon lysis. Further optimization may be required
to obtain these proteins in soluble form. In this study, we continued with the 44
domains and used them as targets for phage display.
The original 16-aa peptide library had a diversity of 4x10^10 unique peptides and a
phage titer of 5x10^12 cfu/ml. Upon re-amplification, a phage titer of 2.5x10^11 cfu/ml was
obtained post-infection providing a 10-fold coverage of the original library diversity. Upon
amplification, a library titer of 2x10^13 cfu/ml was obtained.
The selections were performed using a protocol modified from the one previously
described by Tonikian et al. [45]. A subset of phage clones present in the library may bind more tightly
to the GST tag than to the target domain. These may lead to spurious or false-positive results. To
remove such phage clones, pre-selection was done on GST-coated wells. Further negative
selection was performed by adding 10-fold excess GST to the target domain-coated well during
selection. For most proteins, strong enrichment ratios were obtained after negative selection.
This suggests that the negative selections were effective in removing strong GST-binding phage
clones from our library. Table 5 shows the enrichment ratio obtained after each round for all 38
targets against which phage selections were done. For 27 targets, I obtained enrichment in
selections.
3.3.2 Validation of tight and specific binders using clonal ELISA: For each protein with a
significant pool ELISA signal, I picked 96 clones for clonal ELISA. The ELISAs were done
in a 384-well plate. For each phage clone, four wells were used: two wells were coated with
the target domain, while the other two wells were used for the negative controls GST and BSA. The
clones that gave an enrichment ratio greater than five were selected. The DNA sequence encoding the
peptide of each selected clone was amplified by PCR and sent for sequencing.
3.3.3 Identification of binding preferences and literature validation: Peptide binders were
obtained for 27 domains. The Geneious toolkit was used to align all peptide sequences obtained for
each of these domains. No gaps were allowed in the alignment. Alignments obtained from
Geneious were improved manually. For 22 of these 27 domains, sufficient numbers of peptides
were available to generate a position weight matrix (PWM) that represents the binding
preferences of these domains.
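To make the idea of a matrix built from a gapless alignment concrete, the sketch below computes per-position residue frequencies, the simplest form of such a matrix. It is an illustration only and not the Geneious procedure used in this study; the function name is illustrative and the example peptides are hypothetical, not sequences obtained from the selections.

```python
from collections import Counter

# 19 residues: the 20 standard amino acids minus cysteine, as in the library.
ALPHABET = "ADEFGHIKLMNPQRSTVWY"


def position_frequency_matrix(peptides, pseudocount=0.0):
    """Return one {residue: frequency} dict per column of a gapless alignment."""
    if len({len(p) for p in peptides}) != 1:
        raise ValueError("a gapless alignment needs equal-length peptides")
    unknown = set("".join(peptides)) - set(ALPHABET)
    if unknown:
        raise ValueError(f"residues outside the library alphabet: {sorted(unknown)}")
    total = len(peptides) + pseudocount * len(ALPHABET)
    matrix = []
    for column in zip(*peptides):
        counts = Counter(column)
        matrix.append(
            {aa: (counts.get(aa, 0) + pseudocount) / total for aa in ALPHABET}
        )
    return matrix


# Hypothetical aligned peptides (not data from this study).
aligned = ["PPLPRK", "APLPAR", "PPVPPR"]
pfm = position_frequency_matrix(aligned)
for i, column in enumerate(pfm, start=1):
    best = max(column, key=column.get)
    print(f"position {i}: {best} ({column[best]:.2f})")
```

A sequence logo is a graphical rendering of the same per-position frequencies, usually scaled by information content.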
The 27 domains for which phage display peptides were obtained belong to 20 different
domain families and exhibit distinct binding preferences. The divergence in peptide binding
preferences highlights the power of phage display in generating specific peptide binders.
Name / No. | Protein name | Protein expression | Protein yield (mg/ml) | Pool ELISA | Clonal ELISA | Sequence logo | Comment
PDZ (2.30.42.10)
1 | Disc-large homolog 1 PDZ 1 | Yes | 2.82 | 1.5 | - | - | Non-specific binders
2 | Disc-large homolog 1 PDZ 2 | Yes | 4.2 | 5 | - | - | Non-specific binders
3 | Disc-large homolog 2 PDZ 2 | Yes | 3.74 | 23 | 19(18) | [sequence logo] | -
4 | Disc-large homolog 2 PDZ 3 | Yes | - | - | - | - | No protein in lysate
5 | Disc-large homolog 4 PDZ 2 | Yes | 1.9 | 12 | 19(5) | [sequence logo] | -
6 | Disc-large homolog 4 PDZ 3 | Yes | - | - | - | - | No protein in lysate
SH3 (2.30.30.40)
7 | Growth-factor receptor bound protein 2 | Yes | 0.9 | 56 | 82(30) | [sequence logo] | -
8 | Grb2-related protein 2 | Yes | 1.44 | 444 | 82(77) | [sequence logo] | -
9 | Phospholipase gamma | Yes | 1.21 | 205 | 17(14) | [sequence logo] | -
10 | Sorbin and SH3 containing protein 2 | Yes | 1.32 | 277 | 82(54) | [sequence logo] | -
Protein kinase (3.30.200.20+1.10.510.10)
11 | Mitogen-activated kinase 3 | Yes | - | - | - | - | No protein in lysate
12 | PKA | Yes | - | - | - | - | No protein in lysate
13 | PKB | Yes | - | - | - | - | No protein in lysate
14 | Serine/threonine kinase 2 | Yes | - | - | - | - | No protein in lysate
15 | Aurora kinase B | Yes | 0.30 | 3 | - | - | Non-specific binders
G-alpha subunit (3.40.50.300+1.10.400.10)
16 | G-alpha (i) 1 | Yes | 1.33 | 14.3 | 43(12) | [sequence logo] |
17 | G-alpha (i) 3 | Yes | 0.85 | 18.7 | - | - | Non-specific binders
18 | G-alpha (o) 1 | Yes | 1.12 | 25 | - | - | Non-specific binders
Ligand binding domain of nuclear receptor (1.10.565.10)
19 | Bile acid receptor | Yes | 0.60 | 13.3 | 22(12) | [sequence logo] | -
20 | Retinoic acid receptor-gamma | Yes | - | - | - | - | No protein in lysate
21 | Glucocorticoid receptor | Yes | - | - | - | - | No protein in lysate
22 | Pregnane X receptor | Yes | - | - | - | - | No protein in lysate
Dynein light chain (3.30.740.10)
23 | Dynein light chain 1 | Yes | 1.18 | 233 | 69(51) | [sequence logo] | -
24 | Dynein light chain 2 | Yes | 1.21 | 63 | 25(25) | [sequence logo] | -
RNA recognition module (3.30.70.330)
25 | Splicing factor U2AF1 | Yes | 1.28 | 0.5 | - | - | No enrichment
26 | Splicing factor 45 | Yes | - | - | - | - | No protein in lysate
Profilin (3.30.450.30)
27 | Profilin 1 | Yes | 4.51 | 15 | - | - | Non-specific binders
28 | Profilin 2 | Yes | 1.13 | 12 | - | - | Non-specific binders
Penta-EF hand (1.10.238.10)
29 | Programmed cell death protein 6 | Yes | 0.55 | 100 | 55(55) | [sequence logo] | -
30 | Calpain small regulatory subunit 1 | Yes | 1.91 | 63 | 85(12) | [sequence logo] | -
Actin (3.30.420.40+3.90.640.10)
31 | Actin-gamma 1 | Yes | - | - | - | - | No protein in lysate
32 | Actin-gamma 2 | Yes | - | - | - | - | No protein in lysate
Beta-propeller (2.130.10.10)
33 | Clathrin heavy chain 1 | Yes | 1.63 | 250 | 47(22) | [sequence logo] | -
34 | WDR5 | Yes | 1.24 | 80 | | [sequence logo] | -
PH/PTB domain (2.30.29.30)
35 | Dynamin 2 | Yes | 0.44 | - | - | - | Phage display not done
36 | Disabled homolog 2 | Yes | - | - | - | - | No protein in lysate
P-loop containing nucleotide triphosphate hydrolase (3.40.50.300)
37 | RAC3 | Yes | 0.84 | 5 | - | - | Non-specific binders
38 | RAD51 | Yes | 0.20 | 15 | - | - | Non-specific binders
Trypsin-like serine protease (2.40.10.10)
39 | Tissue-type plasminogen activator | Yes | 0.55 | - | - | - | Phage display not done
40 | Acrosin | Yes | 0.61 | - | - | - | Phage display not done
14-3-3 (1.20.190.20)
41 | 14-3-3 eta | Yes | 1.80 | 33.3 | 56(9) | [sequence logo] | -
AP50 domain (2.60.40.1170)
42 | AP-2 subunit mu | Yes | - | - | - | - | No protein in lysate
Bcl-2 (1.10.437.10)
43 | Bcl-2 like protein 1 | Yes | - | - | - | - | No protein in lysate
Bro1 (1.25.40.280)
44 | Alix | Yes | 0.80 | 22.2 | 15(7) | [sequence logo] | -
CAP/Gly (2.30.30.190)
45 | Dynactin subunit 1 | Yes | 2.22 | 18 | 2(1) | GQDEWVPWQLWSWQESI | No sequence logo
Caspase-like (3.40.50.1460)
46 | Caspase 2 | Yes | - | - | - | - | No protein in lysate
DNAse1-like (3.60.10.10)
47 | DNAse 1 | Yes | - | - | - | - | No protein in lysate
eIF4E (3.30.760.10)
48 | eIF4E | Yes | 0.44 | 30 | 6(6) | FLYYYGLSHNWFGDQT, LVPWWWRVEQTMDPVI, SVWWFGQTPYVLWEAS, RVMIWWWLTQGIPFSF, NLYYNNMYWQWYEWLN, PWSWFTYREQLETENV | No sequence logo
FERM (1.20.80.10)
49 | Band 4.1-like protein 3 | Yes | - | - | - | - | No protein in lysate
Ig domain (2.60.40.10)
50 | Immunoglobulin lambda-like polypeptide 1 | Yes | - | - | - | - | No protein in lysate
Importin-beta (1.25.10.10)
51 | Importin beta-1 | Yes | 0.36 | 33 | 10(10) | [sequence logo] | -
Mad2A (3.30.900.10)
52 | MAD2-like protein 1 | Yes | 0.49 | 75 | 16(13) | [sequence logo] | -
MHC II (3.10.320.10)
53 | DRB1 beta | Yes | - | - | - | - | No protein in lysate
PCNA (3.70.10.10)
54 | Proliferating cell nuclear antigen | Yes | 0.34 | 40 | 2(1) | GARQTLITDWLMVSSD | No sequence logo
OB-fold domain (2.40.50.140)
55 | Replication factor A 70 | Yes | 4.41 | 94 | 66(11) | [sequence logo] | -
Serpin (3.30.497.10+2.30.39.10)
56 | Plasma serine protease inhibitor | Yes | - | - | - | - | No protein in lysate
SH3-type barrels (3.40.50.300/2.30.30.40)
57 | Voltage-dependent L-type calcium channel subunit beta | Yes | - | - | - | - | No protein in lysate
SWIB/MDM2 (1.10.245.10)
58 | MDM4 | Yes | 3.40 | 150 | 26(24) | [sequence logo] | -
TAP-UBA (1.10.8.10)
59 | Nuclear export factor 1 | Yes | 2.57 | 45 | 24(9) | [sequence logo] | -
Winged helix repressor DNA binding domain (1.10.10.10)
60 | Transcription factor IIF | Yes | 1.99 | 7.5 | - | - | Non-specific binders
TRFH (1.25.40.201)
61 | Telomeric repeat-binding factor 1 | Yes | 1.42 | 15 | 8(4) | LGHTTAEMIDYMELQW, SFPLEFTTDYMYNLMA, MLFDDEAMYNWQWHLM, EHSFLFEDWMWEGKDH | No sequence logo
Factor Xa inhibitor (4.10.410.10)
62 | Amyloid-like protein 2 | Yes | 0.79 | - | - | - | Phage display not done
Ubiquitin-like (3.10.20.90)
63 | Ubiquitin-60S ribosomal protein L40 | Yes | 3.90 | 22 | 5(4) | EHMWDAQMWEWSWWDL, EMWVFTPAEWFQIYLN, MTVVEWWTDAQIAEWM, DLHYDWSLEYWTSLLQ | No sequence logo
Vinculin (1.20.1490.10)
64 | Vinculin | Yes | 1.64 | 65 | 33(26) | [sequence logo] | -
XRCC4 (2.170.210.10+1.20.5.370)
65 | XRCC4 | Yes | 0.88 | - | - | - | Phage display not done
Tyrosine phosphatase (3.90.190.10)
66 | Tyrosine-protein phosphatase non-receptor type 22 | Yes | 0.28 | - | - | - | Phage display not done
Table 4: Summary of phage display results for 66 domains. SDS-PAGE gels were run to determine protein expression in the whole-cell lysate. OD at 280 nm was used to determine the protein yield (shown in mg/ml). The enrichment ratio for pool ELISA is the maximum ratio of colony forming units per ml obtained from the protein-coated and empty plates for a given domain. The clonal ELISA column shows the number of sequences with an enrichment ratio > 5 and a background signal < 0.1; the number of unique sequences obtained from sequencing is given in brackets. The sequence logo obtained from phage display is also included. For domains for which the number of sequences was insufficient to generate a sequence logo, the peptide sequences are listed instead (key residues are highlighted in bold). Rows for proteins that were not purified are shown in blue, rows for which phage display was not done in orange, rows for which no peptide sequences were obtained in green, and rows for which a sequence logo could not be obtained in purple.
To rationalize the binding preferences obtained from phage display, an extensive
structural analysis was done using the available structures of the protein domains in complex
with their known peptide ligands (Figure 11). Below, I present an in-depth analysis of all the
domains for which phage display results were obtained. These domains belong to different
cellular pathways, and hence I have divided the 27 domains into five sections based on their
biological function (Figure 12). This grouping helps in understanding the potential uses of the
peptide probes generated by phage display and the various cellular processes that can be
targeted using these peptides.
Figure 11: Strategy for validating phage display results. To interpret the results from phage display, an extensive literature review and structural analysis was done. For each domain, the peptide sequences were aligned using the alignment tool in Geneious with a high gap penalty. The sequences were then visualized as a position weight matrix using WebLogo. Alignments that showed no consensus were improved manually. Structural analyses of existing structures of the domains in complex with peptides were used to further improve the alignments. This analysis was used to generate a model for the binding of phage-derived peptides to the target domain.
3.3.4 Cellular signalling: Based on the literature analysis, ten of the 27 domains were
present on proteins involved in signalling networks, including kinase and G-protein networks.
These included previously well-studied domains such as the SH3, PDZ and 14-3-3 domains.
Figure 12: Overview of phage display results. Based on the literature review, the 27 domains were divided into four distinct biological functions: cellular signalling, cytoskeleton regulation, intracellular transport and genome regulation. Domains that did not fit into any of the four categories are shown as miscellaneous.
3.3.4.1 SH3: Src Homology 3 (SH3) protein interaction domains participate in a diverse set of
signalling pathways by binding to linear motifs [5]. These domains preferentially bind to
proline-rich motifs (PxxP) with dissociation constants (Kd) ranging from 1 to 200 µM. The
selectivity of SH3 domains has been studied in detail, and consensus motifs have been
predicted using yeast two-hybrid, phage display, alanine scanning and structure determination [5].
The SH3 domain is a 60 amino acid domain with a beta-barrel fold, which consists of 5 or 6
β-strands arranged as two tightly packed anti-parallel β-sheets (Figure 13) [24]. The interaction
surface (between the RT and N-src loops) is relatively flat and hydrophobic, with three shallow
grooves defined by conserved aromatic residues. The peptide adopts an extended, left-handed
conformation (polyproline-2 or PPII helix). Sequences lacking the PxxP motif are also known to
bind to SH3 domains. The GRAP2 SH3 domain in our study is an example of a domain that prefers
the RxxK motif [46]. Crystal structures have confirmed that the RxxK motif binds to a different
region of the SH3 domain (Figure 13).
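The two motif classes mentioned above can be located in a peptide sequence with simple pattern matching. The Python sketch below is illustrative only: the function name and the example peptides are made up, and "x" is taken to mean any residue. A lookahead is used so that overlapping occurrences (common in proline-rich stretches) are all reported.

```python
import re

# Zero-width lookaheads so that overlapping motif occurrences are all found.
MOTIFS = {
    "PxxP": re.compile(r"(?=(P..P))"),
    "RxxK": re.compile(r"(?=(R..K))"),
}


def find_motifs(peptide):
    """Map each motif name to a list of (0-based start, matched segment)."""
    hits = {}
    for name, pattern in MOTIFS.items():
        hits[name] = [(m.start(), m.group(1)) for m in pattern.finditer(peptide)]
    return hits


# Hypothetical peptides, not sequences from this study.
proline_rich = find_motifs("APPLPPRNRPRL")
rxxk_type = find_motifs("AIRSTKGL")
```

A peptide with hits for only one of the two patterns would be a candidate for the corresponding binding surface; a match is of course only a first filter and says nothing about affinity.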
Figure 13: Structural and literature analysis of SH3 domains. (A) SH3 domains are known to bind to peptides using two distinct binding surfaces: binding surface 1, which binds to PxxP motifs, and binding surface 2, which binds to RxxK motifs. Some SH3 domains, such as the C-terminal SH3 domain of Grb2 (PDB ID: 2VWF; shown in the figure), bind to their interaction partners using both binding surfaces. (B) The binding preferences from phage display in comparison to previous results. In each panel, the logo on the top shows the binding preferences obtained in our study. The sequence logos at the bottom of three panels (GRAP2, PLCG1 and SORBS2) were obtained from a large-scale phage display screen performed by Dr. Haiming Huang in the Sidhu lab (unpublished results). The phage display logo for the N-terminal SH3 domain of Grb2 was obtained from the study by Sparks et al. [47]. The binding preferences obtained in this study match those from previous phage display experiments.
In this study, four SH3 domains were targeted: the N-terminal SH3 domain of Grb2, the
C-terminal SH3 domain of GRAP2, the SH3 domain from PLCG1 and the second SH3 domain
from SORBS2. For each of the SH3 domains, high levels of enrichment were obtained in pool
ELISA. Furthermore, for all four SH3 domains, we obtained a large number of unique sequences
that showed a high enrichment ratio in clonal ELISA. The binding preferences of these SH3
domains have previously been elucidated using phage display by the Sidhu lab and other groups.
The SH3 domains were therefore selected to serve as positive controls for our experimental
pipeline and to validate our screening method and peptide library. As expected, the phage
display results match previously generated binding preferences (Figure 13). The positive results
obtained for the SH3 domains indicate that the diversity and display levels of the peptide
library are sufficient for elucidating the binding preferences of PRDs.
3.3.4.2 PDZ: PDZ domains are peptide-binding domains that bind to hydrophobic C-terminal
motifs of proteins. They regulate multiple cellular processes, acting as scaffolds involved in
protein-protein interactions. PDZ domains are ~90 aa in length and have a conserved fold
consisting of 5-6 β-strands and 2-3 α-helices [28]. These domains have a single binding site in
a groove between the α2 and β2 structural elements, with a highly conserved carboxylate-binding
loop ([R/K]xxxGΦGΦ motif, where x is any amino acid residue and Φ is a hydrophobic residue)
located before the β2 strand, and typically recognize the extreme C-termini of their target
proteins (Figure 14). PDZ domains have well-defined binding preferences, and previous work by
Tonikian et al. [28] identified the C-terminal binding preferences of 72 human PDZ domains. A
subset of PDZ domains, such as the syntrophin and Par6 PDZ domains, are also known to bind to
internal motifs (Figure 14) [48]. In the Sidhu lab, an internal peptide phage library has been
used previously to identify the internal peptide binding mode of Dvl2-PDZ [36].
In the current study, six PDZ domains were selected: the first and second PDZ domains
of DLG1, the second and third PDZ domains of DLG2 and the second and third PDZ domains of
DLG4. Of these six PDZ domains, four were successfully purified. All four domains were used
as targets for phage display, and two of them yielded phage clones with high enrichment
ratios. The peptide sequences obtained from the phage selections were aligned to obtain the
results shown in Figure 14. The sequence logos obtained for these PDZ domains were similar to
the internal binding motif observed for the Par6-PDZ domain, suggesting that the two PDZ
domains may also bind to internal ligands. These observations have to be validated using further
experiments.
Figure 14: Structural and literature analysis of PDZ domains. (A) PDZ domains are known to bind to C-terminal peptides, where the free COOH group binds to the carboxylate-binding pocket. In selected PDZ domains, peptide binding can occur via a beta-hairpin motif (syntrophin PDZ) or a Par6 internal binding motif, where a negatively charged residue at site +1 compensates for the COOH group [48]. (B) The sequence logos obtained for the second PDZ domains of DLG2 and DLG4. The binding preference is similar to the canonical PDZ internal binding motif observed for the Par6-PDZ domain. (C) The structure of the Par6-PDZ domain in complex with a peptide from Pals1 (PDB ID: 1RZX). The DLG2-PDZ2 motif shows conservation of Glu and Thr at positions -2 and -3 (Glu and Met in yellow); Ile/Leu at position 0 (Val in yellow); Asp at position +1 (Asp in yellow) and Pro at position +3 (Pro in yellow) of the Pals1 internal ligand. The DLG4-PDZ2 domain shows a similar but much weaker pattern (owing to the smaller number of unique sequences obtained). These results predict that phage-derived peptides bind to DLG2-PDZ2 and DLG4-PDZ2 at the peptide-binding pocket in an internal binding mode.
3.3.4.3 G-alpha subunit of hetero-trimeric G-proteins: Guanine nucleotide-binding proteins
are an important family of cell-signalling molecules that regulate key cellular pathways [14]. The
alpha subunit of G-proteins binds to G protein-coupled receptors (GPCRs) and acts as a GTPase.
Upon binding of a ligand to the GPCR, GDP is exchanged for GTP in the Gα subunit. This
active Gα dissociates from the inactive G-protein complex and acts on its downstream effectors
via its GTPase activity. Structurally, Gα consists of two domains: a GTPase domain and an
alpha-helical domain. The GTPase domain is similar in structure to p21ras and other members of
the GTPase super-family of proteins and contains five helices surrounding a six-stranded
beta-sheet, with five strands running parallel and one strand running anti-parallel to the
others. The second of the five helices is a 3(10) helix rather than an alpha helix. The
alpha-helical domain is unique to the Gα subunits and has a long central helix surrounded by
five shorter helices. The alpha-helical domain is joined to the GTPase domain by two extended
strands, linker 1 (residues 54-58) and linker 2 (residues 173-179). Between these two linking
segments lies a deep cleft within which the nucleotide (GTP or GDP) is tightly bound. Phage
display and mRNA display have
been used to obtain peptide antagonists/agonists of the GTPase activity of Gα [14, 54]. Both
methods generated peptides that bind to the hydrophobic pocket between the α3 helix and the
switch II helix. The switch II/α3 binding pocket is also the binding site of the RGS14
GoLoco motif, an important regulator of Gα activity.
In this study, I targeted three G-alpha domains: Gαi1, Gαi3 and Gαo1. All the G-alpha
domains were successfully expressed and purified in sufficient quality and quantity to perform
phage display experiments. However, upon selection, peptides were obtained against only one
G-alpha domain, Gαi1, which has been targeted in previous studies. This may be due to the
absence of GTP/GDP during selection, which may be required for stabilizing the structures of
Gαi3 and Gαo1. For Gαi1, 12 peptides were obtained, which were aligned to generate the
consensus motif “ΦWexeWV” (where Φ is a hydrophobic residue and e is a negatively charged
residue). This consensus motif is distinct from the peptides obtained in previous combinatorial
library studies (Figure 15). Structural analysis of Gαi1 in complex with the KB-752 peptide
suggests that the phage-derived peptides may bind to the same site as KB-752, albeit in a
different mode (Figure 15).
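The shorthand used for the Gαi1 consensus can be made explicit by translating it into a regular expression. The sketch below is a hypothetical illustration: the residues counted as hydrophobic (Φ) are an assumption, since the text does not list them, "e" is taken to be Asp or Glu, "x" is taken to be any residue, and the function name and test peptides are made up.

```python
import re

HYDROPHOBIC = "AVILMFWY"  # assumed membership of the Φ class
ACIDIC = "DE"             # negatively charged residues ("e")

# Motif symbols that stand for residue classes; other letters are literal residues.
TOKEN_CLASSES = {"Φ": f"[{HYDROPHOBIC}]", "e": f"[{ACIDIC}]", "x": "."}


def motif_to_regex(motif):
    """Translate motif shorthand such as 'ΦWexeWV' into a compiled regex."""
    return re.compile("".join(TOKEN_CLASSES.get(ch, re.escape(ch)) for ch in motif))


pattern = motif_to_regex("ΦWexeWV")

# Hypothetical peptides: the first carries the motif, the second has Lys at an "e" position.
hit = pattern.search("SRLWDAEWVQ")
miss = pattern.search("SRLWKAEWVQ")
```

Scanning the selected peptides with such a pattern gives a quick count of how many carry the full consensus and how many match it only partially.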
Figure 15: Structure and literature analysis of Gα subunits. (A) The structure of Gαi1 in complex with GDP (red) and the KB-752 peptide (PDB ID: 1Y3A). (B) The binding logo obtained for Gαi1. The residues that are conserved in the sequence correspond to Trp10, Trp14 and Phe15. (C) The interaction surface of KB-752 and Gαi1. The residues on KB-752 that are important for the interaction are Trp, Phe, Asp and Leu (shown in yellow). A similar pattern is observed in the phage display binding motif, albeit with a spacing of two residues between Trp and the negative charge compared to one in KB-752, and a preference for Trp and Phe at positions 14 and 15 instead of Phe and Leu as found in KB-752.
G-protein signalling is one of the major signalling pathways used by cells and has been
implicated in a number of disorders. This is highlighted by the fact that 25% of marketed
pharmaceuticals target GPCRs [55]. Gαs oncogenes have been shown to increase carcinogenicity
and metastasis, and recent identification of Gαs-hyperactivating mutations in kidney cancer
indicates that the subunit could be a therapeutic target in developed tumours [55]. Peptide probes
developed here may be used to modulate the activity of Gα domains.
3.3.4.4 14-3-3: The 14-3-3 family of proteins plays a key role as scaffolding proteins in a number
of cellular signalling pathways [56]. Seven family members have been reported in humans and are
expressed in all human tissues, except for 14-3-3 sigma, which is specific to epithelial cells.
Structurally, 14-3-3 proteins contain a single domain that forms an alpha-alpha super-helical
structure harbouring a conserved amphipathic groove that forms the binding pocket. 14-3-3
proteins are generally known to bind to phosphorylated serine residues on their binding partners
with sub-micromolar affinity [57]. Non-phosphorylated ligands have also been identified.
Exoenzyme S (ExoS), a toxin produced by Pseudomonas aeruginosa, displays high affinity
towards 14-3-3 zeta/delta and binds to the same amphipathic groove that is responsible for
binding to phosphorylated peptides [58]. Phage display has also been used to identify cyclic
peptides that bind to 14-3-3 zeta/delta. The peptide with high affinity contained a “WLDLE”
motif that was essential for binding [59].
Figure 16: Structural and literature analysis of 14-3-3. (A) The structure of 14-3-3 zeta/delta in complex with the ExoS peptide (PDB ID: 2O02). (B) The binding logo obtained for 14-3-3 eta. (C) The interaction surface of 14-3-3 zeta/delta and the ExoS peptide. The sequence logo shows conservation of residues Glu(8)-Trp(9)-Leu(10)-Asp(11)-Leu(12)-Ala(13). These correspond to the Asp-Ala-Leu-Asp-Leu-Ala residues present in the ExoS peptide (shown in yellow).
Of the seven 14-3-3 isoforms, 14-3-3 eta was found in the list obtained from whole-genome
RNAi screens. The peptides I obtained from phage display are similar to the ExoS peptide and
to peptides from previous phage display experiments (Figure 16). To further understand the
binding preference from phage display, I used the structure of 14-3-3 zeta/delta with ExoS [58].
The amino acids forming the interaction surface are identical in 14-3-3 zeta/delta and 14-3-3 eta,
and hence both domains should have similar (if not identical) binding preferences. The ExoS
peptide and the cyclic phage-derived peptide have been shown to bind to 14-3-3 eta. Hence, we
predict that the peptides against 14-3-3 eta will bind to the same interaction surface in a mode
similar to that of the ExoS peptide (Figure 16).
3.3.4.5 Penta-EF hand domains: Penta-EF hand (PEF) domains are a family of Ca2+-binding
domains that are composed of five EF-hand motifs [83]. The EF-hand is a helix-loop-helix
structure characterized by a conserved 12-residue inter-helical sequence that co-ordinates a
Ca2+ ion. EF-hand motifs are present in a multitude of proteins, usually in multiple copies.
PEF domains contain eight alpha helices, which form the five EF-hands as follows: α1-α2 (EF1),
α3-α4 (EF2), α4-α5 (EF3), α6-α7 (EF4) and α7-α8 (EF5). Based on sequence similarity, PEF
domains can be divided into two groups: Group I PEF domains (PDCD6 and peflin) and Group II
PEF domains (calpain sub-family members, sorcin and grancalcin). In this study, I targeted two
PEF domains: Calpain small regulatory subunit 1 and Programmed cell death protein 6. PEF
domains have not previously been studied using phage display, and hence this study is the first
to elucidate penta-EF hand binding preferences.
a. Calpain small regulatory subunit 1: Calpains are intracellular Ca2+ dependent cysteine
proteases that play key roles in cells and have been implicated in a number of cellular processes
such as signal transduction, apoptosis, and cytoskeleton remodelling [84]. The calpain proteolytic
system consists of a small subunit, which acts as a Ca2+ dependent adaptor, a large subunit that
contains the catalytic site and an endogenous calpain-specific inhibitor, calpastatin. Calpastatin is
ubiquitously expressed and blocks the protease activity of calpain by binding to three sites on the
calpain protease: the active site and domain V on the large regulatory subunit and penta-EF hand
domain on the small regulatory subunit [84].
In this study, I focussed on the small regulatory subunit of calpain. The small subunit is
required for proper functioning of the calpain large subunit and acts as a chaperone to stabilize
the calpain protease system. The calpain small regulatory subunit harbours a hydrophobic binding
surface that binds to the calpastatin-derived peptide DAIDALSSDFT. The binding
preference obtained from phage display, with the motif DLxxWLxxDM, is similar to that of the
calpastatin-derived peptide (Figure 17). Hence, these peptides should competitively inhibit the
interaction of the calpain small regulatory subunit and calpastatin.
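The similarity between the calpastatin-derived peptide and the phage-derived motif can be made concrete with a position-by-position comparison. The Python sketch below slides the motif along the peptide without gaps and labels each position. It is an illustration rather than the alignment procedure used in this study: the set of residues treated as hydrophobic is an assumption, and the function names are hypothetical.

```python
HYDROPHOBIC = set("AVILMFWY")  # assumed hydrophobic class, including aromatics


def position_match(residue, motif_char):
    """Classify how a peptide residue relates to one motif position."""
    if motif_char == "x":
        return "any"
    if residue == motif_char:
        return "identical"
    if residue in HYDROPHOBIC and motif_char in HYDROPHOBIC:
        return "hydrophobic"
    return "mismatch"


def best_ungapped_offset(peptide, motif):
    """Return (offset, score, labels) for the best ungapped placement of motif."""
    if len(peptide) < len(motif):
        raise ValueError("peptide must be at least as long as the motif")
    best = None
    for offset in range(len(peptide) - len(motif) + 1):
        labels = [position_match(peptide[offset + i], m) for i, m in enumerate(motif)]
        # Score counts identical residues and hydrophobic-for-hydrophobic swaps.
        score = sum(label in ("identical", "hydrophobic") for label in labels)
        if best is None or score > best[1]:
            best = (offset, score, labels)
    return best


offset, score, labels = best_ungapped_offset("DAIDALSSDFT", "DLxxWLxxDM")
```

Under these assumptions the best placement starts at the first residue of the calpastatin peptide: the two Asp residues and the Leu are identical, and the three remaining defined positions pair a hydrophobic residue in calpastatin (Ala, Ala, Phe) with a hydrophobic or aromatic residue in the motif (Leu, Trp, Met).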
Calpains have previously been implicated as drug targets in a number of disorders,
including cancer and neurodegenerative diseases. Considerable effort has gone into designing
small-molecule and peptide-based drugs that target calpain. Almost all of these drugs target the
active site of calpain to inhibit its activity. However, given the similarity between the active
site of calpain and those of other cysteine proteases, most of these compounds are
non-specific. Calpastatin is the endogenous and most specific inhibitor of calpain, but it is not
stable inside cells and hence cannot be used for therapeutic applications. Thus, there is an
urgent need to develop high-affinity and high-specificity inhibitors of calpain. The peptides
identified here by phage display may serve as templates for developing peptide inhibitors
against the calpain protease system.
Figure 17: Structure and literature analysis of the penta-EF hand domain of CAPNS1. (A) The structure of the CAPNS1 penta-EF hand domain in complex with the calpastatin-derived peptide (PDB ID: 1NX1). (B) The binding motif for the CAPNS1 penta-EF hand domain obtained from phage display, aligned with the peptide derived from calpastatin. (C) The interaction surface between the CAPNS1 penta-EF hand domain and the calpastatin-derived peptide. The calpastatin peptide forms an alpha-helical structure that binds to the binding pocket on the penta-EF hand domain. The key interactions are made by two Asp, two Ala, one Leu and one Phe residues on the peptide (highlighted in yellow). The nature and position of these residues are conserved in the binding motif. Hydrophobic residues are replaced by aromatic residues in the binding motif, indicating that the hydrophobic surface is flexible and can accommodate bulkier residues.
b. Programmed cell death protein 6: Programmed cell death protein 6 or PDCD6 (Alg-2)
functions as a Ca2+-dependent adaptor protein in the ESCRT and ER-to-Golgi transport systems.
Alg-2 interacting proteins commonly contain Pro-rich regions, and Alg-2 recognizes at least two
distinct Pro-containing motifs: PPYP(x)nYP (Alix, PLSCR3) and PxPGF (Sec31A, ABM-2)
[85]. The binding of PPYP(x)nYP peptide occurs at a groove that contains two peptide-binding
hydrophobic pockets [86]. The structural basis for the binding of Alg-2 to PxPGF peptides
remains to be established. Mutational and competitive binding analyses have shown that the
PxPGF peptides bind to a binding pocket(s) that is different from that of PPYP(x)nYP peptides.
From phage display, I obtained 55 unique peptides against PDCD6. Interestingly, all these
peptides contained a prominent GWxxWV motif (Figure 18). This motif is distinct from the
PPYP(x)nYP motif, which binds to a structurally defined binding surface. However, this motif
does partially overlap with the PxPGF motif, suggesting that these peptides may bind to the same
surface. The binding surface of this motif is not known structurally, and phage-derived peptides
can be used to identify it. It has been shown that knock-down of Alg-2 leads to growth
defects in cancer cell lines via cell cycle arrest at the G2/M checkpoint [87]. However, the
molecular mechanism by which Alg-2 is involved in cancer-related pathways is still unclear.
Peptide probes developed in this study can be used to investigate the biological role of Alg-2 in
specific cancer cell lines.
Figure 18: Structure and literature review of the penta-EF hand domain of PDCD6. (A) The structure of the PDCD6 penta-EF hand domain in complex with the Alix-derived peptide (PDB ID: 2ZNE). (B) The binding motif for the PDCD6 penta-EF hand domain obtained from phage display. (C) The interaction surface between the PDCD6 penta-EF hand domain and the Alix-derived peptide, showing that pocket 1 and pocket 2 bind to the two PYP motifs on the Alix peptide (highlighted in yellow). The phage-derived motif is distinct from the Pro-rich motif found in Alix and hence is predicted to bind to a distinct binding site.
3.3.5 Cytoskeleton regulation: Four domains in our study were found to be present on proteins
involved in regulation of cytoskeleton structure. These include two domains of the dynein light
chain family, one domain from the CAP/Gly domain family and one domain from the
alpha-catenin/vinculin head domain family.
3.3.5.1 Dynein light chain: Dynein light chains (DLC) are domains found on mono-domain
proteins: the light chains of the cytoplasmic motor protein dynein (DYL1 and DYL2). Structurally, DLC
domains contain three beta-sheets and three alpha helices in a two-layer alpha-beta core structure.
DLCs are peptide recognition domains that bind to a large range of proteins, such as Pak1 kinase,
Bim (pro-apoptotic) and many viral proteins [49]. Previous studies have found two motifs that
bind to DYL1, GIQVD and KxTQT, both of which bind to the binding groove in an anti-parallel
beta-sheet conformation (Figure 19). Recent studies have also used phage display to determine
the binding preference of the DLC from DYL1 (Figure 13) [50].
The peptide profile obtained via phage display is similar to the motifs obtained from
natural peptides and previous phage display studies. However, some features of the motif
obtained by phage display appear to differ; for example, there is a surprising preference for
M/W (Φ) instead of K/R at position -3.
DYL proteins act as essential hub proteins that are involved in a range of cellular
processes, including cytoskeleton regulation, intracellular transport, autophagy and apoptosis.
Over-expression of DYL1 has been shown in a number of tumour types [49]. However, the
mechanism by which DYL1 and DYL2 are involved in carcinogenesis is poorly understood.
Peptide-based inhibitors developed in this study may help in critically evaluating the role of
DYL1 in various cancer-related pathways.
Figure 19: Structural and literature analysis of dynein light chains. (A) The structure of dynein light chain 1 in complex with the peptide derived from Swallow (PDB ID: 3E2B). (B) The sequence logos obtained from phage display for dynein light chain 1 (DYL1) and dynein light chain 2 (DYL2). DYL1 and DYL2 share high sequence similarity and hence are predicted to have similar binding preferences, which is observed in the phage display results. A few differences exist between the two logos, including a stronger preference for Gly and Ala at position 4 and for Asp and Glu at position 9 in DYL1 compared to DYL2. (C) Binding interaction of dynein light chain 1 and the Swallow peptide. The key interactions are mediated by the Lys(-3)-Ala(-2)-Thr(-1)-Gln(0)-Thr(1)-Asp(2) residues present on the Swallow peptide. The central Thr(-1)-Gln(0)-Thr(1) residues are conserved in the sequence logo obtained for DYL1. A few differences are also observed: Met and Trp are possible at Lys(-3); Ser and Thr are possible at Thr(2); and Asp and Glu are possible at Asp(5) on the Swallow peptide.
3.3.5.2 CAP/Gly domain: CAP/Gly domains are Gly-rich domains found in a number of
cytoskeleton-associated proteins (CAP) that bind to C-terminal peptides with the motif
EEY/F-COOH [98]. CAP/Gly domains are extensively involved in cellular processes including
chromosome segregation, establishment and maintenance of cell polarity, intracellular organelle
and vesicle transport, cell migration, intracellular signalling and tumorigenesis. CAP-Gly
domains are found in single or multiple copies and are primarily involved in protein interactions
and the formation of protein networks. In this study, I targeted the CAP-Gly domain of the large
subunit of dynactin, p150glued [99]. The dynactin complex is required for targeting dynein to its
cargo and for dynein motor processivity. CAP-Gly domains are characterized by a highly
conserved motif with glycine and hydrophobic residues. Structurally, CAP-Gly domains form a
globular-protein fold with a highly twisted, five-stranded antiparallel β-sheet flanked by a small
β-hairpin. A unique cluster of conserved aromatic residues forms a solvent exposed hydrophobic
cavity bordered by the highly conserved GKNDG motif. This hydrophobic cavity of CAP-Gly
domains serves as a binding site for the C-terminal Glu-Glu-Tyr/Phe (EEY/F)-COOH sequence
motifs (Figure 20). The C-terminal binding preference could not be tested in the study as the C-
terminal peptide library was not available.
From phage display, I obtained a single peptide that bound to the CAP/Gly domain of
DCTN1, suggesting an internal binding mode for this CAP/Gly domain. Based on the sequence
alignment of the internal peptide with the C-terminal peptide, I observed similarity between the
N-terminal ends of the two peptides. To gain further insight into peptide binding, I modelled the
phage-derived peptide on the CAP/Gly domain of dynactin using Modeller. The structural model
obtained was analyzed by visual inspection. Based on the results, I predict that the binding
surface of CAP/Gly can accommodate an internal peptide (Figure 20). Briefly, the Asp and Glu
residues are conserved between the natural peptide and the internal peptide. The Thr residue on
the natural peptide is replaced by a Trp residue, which interacts with a hydrophobic pocket
present on the CAP/Gly domain. The C-terminal Phe residue is compensated for by the Val and Pro
residues, which also provide structural flexibility to the internal peptide. The backbone CO
group between Val and Pro in the internal peptide forms hydrogen bonds with positively charged
residues on the CAP/Gly domain, mimicking the terminal COO- group of the natural peptide.
Finally, the movement of the highly mobile β4/β5 loop allows the Trp residue on the internal
peptide to interact with a hydrophobic pocket. This pocket is covered by Asn(69) in the presence
of the natural peptide. The
movement of the β4/β5 loop also allows the internal peptide to move out of the peptide binding
pocket.
Figure 20: Structure and literature analysis of the CAP-Gly domain of p150glued. (A) The structure of the p150glued CAP/Gly domain in complex with CLIP-170 zinc-knuckle 2 (PDB ID: 3E2U). (B) The binding surface of the CAP/Gly-peptide complex. (C) The sequence alignment of the natural C-terminal and phage-derived internal peptide partners of the p150glued CAP/Gly domain. (D) The modelled structure of the p150glued CAP/Gly domain in complex with the phage-derived peptide. The structure was generated using Modeller by mutating the CLIP-170 peptide to Trp-Val-Pro-Trp-Gln. (E) The binding surface of the CAP/Gly-internal peptide complex, showing the potentially novel internal binding mode.
Biophysical assays including isothermal titration calorimetry (ITC) are required to confirm the
binding between the phage-derived peptide and the DCTN1 CAP/Gly domain. Structure
determination of the complex between the phage-derived peptide and the CAP/Gly domain is
required to confirm the structural model of the internal binding mode. If confirmed, the
phage-derived peptides may help in identifying novel protein partners and biological roles of
CAP/Gly domains.
3.3.5.3 Alpha-catenin/vinculin head domain: The alpha-catenin/vinculin head domain is
found in proteins involved in cytoskeletal organization, such as vinculin and alpha-catenin.
Structurally, the alpha-catenin/vinculin head domain comprises seven amphipathic helices
arranged as two four-helical bundles [66]. In this study, we focused on the vinculin head domain.
The vinculin head domain mediates interactions between vinculin and its
interaction partners: alpha-actinin, talin and the Shigella toxin IpaA. The interaction between the
vinculin head domain and talin has been studied extensively and has been shown to occur via a
“helix addition” mechanism, in which a 26-amino-acid amphipathic peptide from talin inserts
into the first four helices of the vinculin head domain to form a compact five-helix structure. The
interaction exhibits high affinity, and the peptides from alpha-actinin and IpaA bind in a similar
mode [66]. A number of studies have aimed at identifying the binding preference of the vinculin head
domain. One of the first studies used phage display to identify peptides against the vinculin head
domain. Adey et al. identified five peptides that specifically bound to the talin-binding region on the
vinculin head domain with high affinity [67]. However, these peptides did not yield a
consensus motif for vinculin binding. Gingras et al. provided a consensus motif
(LXXAAXXVAXXVXXLIXXA) for vinculin binding based on the complex structure
of talin bound to vinculin and SPOT microarray analysis [68].
In this study, I obtained 26 unique peptides against the vinculin head domain. The
sequences produce a low-resolution alignment (shown in Figure 21) and do not align well to the
vinculin binding motif provided by previous combinatorial studies. This can be attributed to the
shorter length of these peptides. At 16 amino acids in length, the peptides obtained in this study
represent the shortest peptide sequences known to bind to the vinculin head domain and
hence may serve as inhibitors of vinculin-talin binding.
Figure 21: Structural and literature analysis of Alpha-catenin/vinculin head domain. (A) The structure of vinculin head domain in complex with the talin VBS1 (PDB ID: 1SYQ). (B) The binding profile for vinculin head domain obtained from phage display. (C) The interaction surface of VBS1 and vinculin head domain. VBS1 binds to the vinculin head domain via an amphipathic helix, with hydrophobic residues facing the vinculin core and charged residues facing the solution. No clear similarity was observed between the phage-derived peptides and known natural peptide binders of vinculin head domain.
3.3.6 Intracellular Transport: Intracellular transport is often mediated by peptide tags that
guide the spatial localization of a protein. Often, specialized PRDs are involved in the
recognition of these peptide-based tags. In this study, I obtained peptides against four domains that are involved
in intracellular transport.
3.3.6.1 Importin beta: Importin beta is a member of the family of nuclear transport receptors
that are responsible for importing large macromolecules into the nucleus [69]. Structurally,
importin beta contains a single domain with a superhelical structure containing 12 helical repeats
known as HEAT repeats connected via flexible linkers. Each repeat contains two alpha helices,
A and B, connected by a flexible linker. The A helix faces the outer, convex surface of importin
beta while the B helix lies on the inner, concave surface. Importin beta is known to bind to
NLS and importin alpha via its C-terminal section, RAN-GTP via its N-terminal domain, and
other proteins, including members of the nuclear pore complex (FG nucleoporins), via its central
region. The interaction between importin beta and the FG nucleoporins is mediated by an FxFG
motif present on this class of nucleoporins.
Figure 22: Structural and literature analysis of Importin beta. (A) The structure of importin-beta in complex with the FxFG peptide (PDB ID: 1F59). (B) Binding motif of importin-beta obtained from phage display. (C) The interaction surface between importin beta and the FxFG peptide. The key residues in this interaction are highlighted in yellow. The three key residues: two Phe and Gly are conserved in the binding motif obtained from phage display.
From phage display, I obtained 10 unique peptides against importin beta. All of these
contain the FxFG motif (except one that contains Y instead of F at position 14) that is known to
bind to the central region of importin beta (Figure 22). The structural analysis of importin beta in
complex with the FxFG peptide from nucleoporins agrees well with the binding motif
obtained from phage display (Figure 22).
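The motif check described above can be illustrated with a short regular-expression scan. This is a minimal sketch, not the analysis pipeline used in this study: the peptide sequences are invented for illustration, and the relaxed pattern assumes that the Tyr substitution occurs at the first aromatic position of the motif, which the text does not specify.

```python
import re

# Hypothetical 16-residue peptides, invented for illustration only;
# they are NOT sequences recovered in this study.
peptides = [
    "SDWAGELNPSAFSFGT",
    "TGEDVLRSWDPTYTFG",
    "MKLAVDERTSGNQPAH",
]

strict = re.compile(r"F.FG")       # canonical FxFG
relaxed = re.compile(r"[FY].FG")   # assumption: Tyr tolerated at the first position

def classify(seq):
    """Label a peptide by the most specific motif it contains."""
    if strict.search(seq):
        return "FxFG"
    if relaxed.search(seq):
        return "YxFG"
    return "none"

for p in peptides:
    print(p, classify(p))
```

Checking the strict pattern first ensures that a peptide is reported as a variant only when it lacks the canonical motif.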
Different groups have previously developed specific modulators of nuclear import [70].
The peptides generated in this study may serve as inhibitors of the interaction of importin beta
and FG nucleoporin and hence inhibit importin beta-mediated transport across the nuclear pore.
Such peptide inhibitors may act as useful tools for studying the role of importin beta in cells.
3.3.6.2 UBA: The ubiquitin-associated (UBA) domain is an approximately 40-amino-acid domain that
was first recognized in proteins associated with ubiquitin but is also found in proteins involved in
nucleotide excision repair and nuclear transport [75]. UBA domains form three-helix bundles
with a hydrophobic core that stabilizes the protein, and possess a conserved surface patch of
hydrophobic amino acids that interacts with hydrophobic regions of ubiquitin and other target
proteins. In this study, I targeted the UBA domain of nuclear export factor 1 (NXF1). NXF1 is a
member of the family of proteins involved in mRNA export from the nucleus. The NXF1-UBA
domain is located at the C-terminal end of NXF1 and has been shown to be sufficient for
nucleo-cytoplasmic shuttling and localization to the nuclear pore complexes (NPCs) in vivo.
NXF1 is also essential for the export of many viral RNAs bearing the constitutive transport
element (CTE) [76].
Figure 23: Structure and literature analysis of NXF1-UBA domain. (A) Structure of NXF1-UBA with the FxFG peptide (PDB ID: 1OAI). (B) The binding logo for NXF1-UBA obtained from phage display. (C) Binding surface of NXF1-UBA with the FxFG peptide. The phage display logo shows high conservation of Phe (pos 12) and Trp (pos 13). While the conservation of two aromatic residues is similar to the two Phe residues (highlighted in yellow) in the FxFG peptide, the lack of conservation of Gly residues (before and after the central hydrophobic residues) suggests that the binding mode of the phage-derived peptides may differ from that of the FxFG motif. Other differences include a preference for Trp (13) instead of Phe, and a hydrophobic residue (Phe/Leu/Ile) at position 9 instead of the Asp residue in the FxFG motif. Based on analysis of the crystal structure, I predict that the Trp residue can fit into the hydrophobic binding pocket.
From phage display, I obtained a preference of FxW for NXF1-UBA (Figure 23). To
obtain a deeper understanding of the binding mode of phage-derived peptides to NXF1-UBA, I
analyzed the known structure of the FxFG peptide (from nucleoporins) bound to NXF1-UBA (Figure 23)
[77]. The sequence pattern of the phage-derived binding preference is similar to that of the FxFG
peptide. While the hydrophobic residues are conserved between the phage-derived motif and the
FxFG peptide, the results also suggest that the hydrophobic surface of NXF1-UBA may have more
flexibility than previously reported. This indicates that NXF1-UBA domains may bind to other
proteins that contain a hydrophobic motif, which can be predicted using the binding motif
obtained from this study. To my knowledge, this is the first study to report the binding
preference of NXF1-UBA.
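Binding preferences of this kind are read from per-position residue conservation in aligned peptides. The sketch below shows one common way to quantify conservation, using Shannon information content per alignment column. It is an illustration only: the aligned sequences are invented, the thesis does not state how its logos were computed, and no small-sample correction is applied.

```python
from collections import Counter
import math

# Hypothetical, pre-aligned peptides of equal length (illustrative only).
aligned = ["AFSWD", "GFTWE", "SFAWD", "TLSWK"]

def column_profile(seqs):
    """Return, per column, (most common residue, its frequency, information in bits)."""
    n = len(seqs)
    profile = []
    for column in zip(*seqs):
        counts = Counter(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info = math.log2(20) - entropy  # 20 amino acids, no small-sample correction
        residue, count = counts.most_common(1)[0]
        profile.append((residue, count / n, info))
    return profile

for position, (residue, freq, info) in enumerate(column_profile(aligned), start=1):
    print(position, residue, f"{freq:.2f}", f"{info:.2f}")
```

Columns with high information content correspond to the tall stacks in a sequence logo, such as a fully conserved aromatic position.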
The peptides obtained in this study can be used to design peptide-based or small-molecule-based
inhibitors of NXF1-mediated nuclear export. Existing evidence
suggests that blocking this interaction should be sufficient to inhibit NXF1-mediated transport.
Inhibitors against NXF1 may be used to further probe the mechanisms of NXF1-mediated
nuclear transport.
3.3.6.3 Bro1: The Bro1 domain is found in different eukaryotic proteins such as Alix
(PDCD6IP), Brox and HD-PTP [78]. Structurally, the Bro1 domain has a banana-shaped
structure that is organized around a core of tetratricopeptide helical hairpins. In this study, I
targeted the Alix Bro1 domain. Alix plays an important role in intracellular transport as an adaptor
protein that recruits CHMP4/ESCRT-III complexes (via its Bro1 domain) to function at distinct
biological membranes. Other functions include lysobisphosphatidic acid (LBPA) binding,
endophilin binding, receptor trafficking, endosome distribution, cell motility/adhesion, apoptosis,
actin and microtubule binding and regulation of JNK signalling. Alix has also been implicated in
the release of several other classes of enveloped viruses, including hepatitis B virus, dengue
virus, yellow fever virus, HCV, SIV, RSV, human parainfluenza virus, and Sendai virus.
The interaction between CHMP4 and the Alix-Bro1 domain is mediated by an
amphipathic helix present in CHMP4 binding to helices 5-7 of the Bro1 domain [78]. A second
protein interaction site has been reported within the first half of the Bro1 domain, which interacts
with the p6-adjacent nucleocapsid (NC) domain of Gag, an HIV protein. While the exact
interaction surface between the Alix-Bro1 domain and the NC domain has not been identified,
residue substitutions in NC or within the first 200 residues of the Alix-Bro1 domain compromised
HIV-1 release, emphasizing the critical role of the NC-Bro1 domain interaction in this process [79].
In this study, I obtained 15 peptides (7 unique peptides) against the Alix-Bro1 domain.
Upon sequence alignment, a “Mxx[L/M]xx[W/L]” motif was observed that resembles the
amphipathic helix derived from CHMP4C (Figure 24). Comparison with the available structure of the
Alix-Bro1 domain in complex with the CHMP4C peptide shows that the binding preferences
obtained from phage display resemble the binding observed for the CHMP4C peptide. Hence, I
predict that the phage-derived peptides will block the interaction between the Alix-Bro1 domain and
CHMP4 proteins, thereby blocking the recruitment of the ESCRT-III complex. These peptides
may therefore help in discerning the role of the Alix-Bro1 domain in a range of cellular pathways.
Figure 24: Structure and literature analysis of Alix-Bro1 domain. (A) The structure of Alix-Bro1 domain with the CHMP4C peptide (PDB ID: 3C3R). (B) The binding motif of the Alix-Bro1 domain. (C) The binding surface of the Bro1 domain and the CHMP4C peptide, showing the amphipathic CHMP4C peptide with its hydrophobic surface towards the Bro1 domain. The pattern obtained from phage display shows two components: a negatively charged patch at the N-terminal end and a hydrophobic helical component with a triad of hydrophobic residues: Met (pos 10), Leu/Met (pos 13) and Trp/Phe (pos 16). The conserved hydrophobic triad in phage display corresponds to Ile, Leu and Trp (highlighted in yellow) in the CHMP4C peptide.
3.3.6.4 Clathrin heavy chain: Clathrin forms the outer coat of vesicles involved in cellular
transport between different membrane locations [93]. Structurally, clathrin contains the N-terminal
adaptor-binding domain (the clathrin terminal domain, CTD) and the alpha-helical repeats that form the large part of the clathrin
heavy chain. The CTD is a 7-bladed beta propeller that binds to compartment-specific adaptor
proteins such as beta-arrestin and the adaptor protein complexes (AP-1 & AP-2). The interaction
between the CTD and adaptor proteins is mediated by peptide-like motifs. The first linear motif
obtained was the ‘clathrin-box’ consensus LΦxΦ[D/E] that was confirmed to bind between the
first two blades of the beta-propeller [94]. In recent years, other peptide motif variants have been
shown to bind to the CTD, such as the W-box motif (PWxxW), which binds to the top of the CTD
[95], and the [L/I][L/I]GxL motif, which binds between blade 4 and blade 5 of the CTD [96]
(Figure 25). A fourth binding site has also been predicted between blade 6 and blade 7 using
multiple sequence alignments; however, the binding preferences of this site have not yet been
determined [97].
In this study, 47 peptides (22 unique peptides) were obtained against the clathrin terminal
domain. Based on the sequence alignments, the sequences could
be split into three groups: Set 1, sequences with a DΦxWΦ motif that resembles the clathrin-box
peptide, albeit in the reverse order; Set 2, sequences with a DxxDW motif that does not match any
known clathrin-binding motif; and Set 3, sequences with no consensus among themselves or with Set 1
and Set 2 (Figure 25). It is surprising that no sequences were obtained that resembled the
peptide motifs reported in the literature. Previous studies have suggested that the beta-propeller
structure of the CTD changes conformation upon binding to different peptides and small-molecule
ligands. This flexibility may allow peptides with distinct sequences to bind to the same binding
surface and may explain the diversity in peptide sequences that bind to the CTD.
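The grouping into three sets can be expressed as a simple rule-based classifier. The following is a minimal sketch under stated assumptions: the residue set used for Φ is an assumption (the text does not define it), the example peptides are invented, and a peptide matching both motifs is assigned to Set 1 by an arbitrary priority rule. It is not the alignment procedure used in this study.

```python
import re

PHI = "[LIVMF]"  # assumed residue set for Φ; the text does not define it

SET1 = re.compile(f"D{PHI}.W{PHI}")  # DΦxWΦ
SET2 = re.compile(r"D..DW")          # DxxDW

def assign_set(seq):
    """Assign a peptide to Set 1, Set 2 or Set 3 (no recognizable motif)."""
    if SET1.search(seq):
        return "Set 1"
    if SET2.search(seq):
        return "Set 2"
    return "Set 3"

# Hypothetical peptides, invented for illustration only.
examples = ["GSEDLTWLNA", "PADGSDWKTE", "QNTSGRPKAH"]
for seq in examples:
    print(seq, assign_set(seq))
```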
Figure 25: The structure and literature analysis of the Clathrin terminal domain. (A) The structure of Clathrin terminal domain showing the four peptide binding sites: Site 1 with preference for LΦxΦ[D/E] peptides (PDB ID: 1C9I); Site 2 with preference for PWxxW peptides (PDB ID: 1UTC); Site 3 with preference for [L/I][L/I]GxL peptides (PDB ID: 3GD1) and Site 4 with an unknown binding preference. (B) The binding motifs obtained from phage display for the Clathrin terminal domain showing the two binding motifs.
A recent study by von Kleist et al. identified a novel family of small molecules that bind
to the clathrin terminal domain [98]. Structural analysis showed that these molecules
specifically bind to the “clathrin-box” interaction surface and block clathrin-mediated
endocytosis. This observation is important because it suggests that peptides derived from phage display
may also act as specific inhibitors of clathrin-mediated endocytosis. Phage-derived peptides often
show higher affinity and specificity towards the target protein than small-molecule inhibitors
and hence may serve as probes for elucidating the role of the clathrin terminal domain in
clathrin-mediated endocytosis. Further, the different sets of peptides obtained from phage display can be
combined to generate bivalent peptides to develop high-affinity inhibitors of clathrin-mediated
endocytosis.
3.3.7 Genome Regulation: A host of proteins interact with different parts of the genome. These
include transcription factors that activate specific genes, histone modification
enzymes that introduce post-translational modifications on histone tails, and the RNA polymerase complex
that initiates transcription of genes. In this study, I successfully generated peptides
against five PRDs that are involved in the regulation of the genome.
3.3.7.1 PCNA: Proliferating cell nuclear antigen (PCNA) is a single-domain protein that acts as
a co-factor for DNA polymerase δ in eukaryotic cells [62]. Functional PCNA is a homotrimer
forming a ring structure, in which three monomers are joined together in an anti-parallel head to
tail interaction.
Numerous protein partners interact with PCNA, including DNA polymerase δ and
DNA polymerase ε for DNA replication; DNMT1, HDAC1 and p300, involved in chromatin
assembly and gene regulation; the DNA mismatch repair proteins Msh3/Msh6 for DNA repair;
p21(CIP1/WAF1) for cell cycle control; and ESCO1/2 for sister-chromatid cohesion. Most of the
interactions are mediated by a PCNA-interacting peptide or PIP-box (QXXhXXaa, where h is a
hydrophobic residue and a is an aromatic residue) that binds to a specific interaction surface on PCNA
(Figure 26) [62]. Other interaction motifs also exist, such as the non-canonical PIP box and
other non-PIP-box binding motifs ([KR]-[FYW]-[LIVA]-[LIVA]-[KR]) [63]. Bacterial display
has been performed against PCNA to identify a number of canonical and non-canonical PCNA-binding
peptides, including two more major classes: YxxxY/TxxxxW and KA-box peptides [64].
All these peptides bind to the same binding surface as the PIP box, albeit in different binding modes, to
recruit PCNA in diverse cellular pathways [65].
In this study, I identified two phage clones against PCNA, both of which showed a high
enrichment ratio compared to GST (Figure 26). Both clones carry the same
peptide sequence, shown in Figure 26. The sequence matches the PCNA-interacting motif
(PIP-box), suggesting that the clonal ELISA results are accurate. The structure of PCNA in
complex with the PIP-box motif from FEN1 confirms that the key residues required for the
PCNA-PIP interaction are found in the phage-derived peptide (Figure 26). More sequences may be required
to generate a more accurate binding motif for PCNA.
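Motif notation such as QXXhXXaa can be translated mechanically into a regular expression for scanning candidate sequences. The sketch below is illustrative only: the residue sets chosen for the "hydrophobic" and "aromatic" classes are assumptions (the text names the classes but does not list their members), the helper `motif_to_regex` is hypothetical, and the test peptide is invented rather than taken from this study.

```python
import re

# Assumed residue classes; the members of each class are NOT given in the text.
CLASSES = {
    "x": ".",        # any residue
    "X": ".",        # any residue (upper-case variant of the notation)
    "h": "[LIVM]",   # hydrophobic (assumption)
    "a": "[FWY]",    # aromatic (assumption)
}

def motif_to_regex(motif):
    """Translate a motif such as 'QXXhXXaa' into a compiled regular expression."""
    parts = [CLASSES.get(symbol, re.escape(symbol)) for symbol in motif]
    return re.compile("".join(parts))

pip_box = motif_to_regex("QXXhXXaa")

# Hypothetical peptide, invented for illustration only.
match = pip_box.search("GSQKTLADFYRE")
print(pip_box.pattern, match.group() if match else None)
```

Keeping the class definitions in one table makes it easy to test how sensitive a match is to the assumed class membership.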
Figure 26: Structural and literature analysis of PCNA. (A) The structure of PCNA in complex with the FEN1 peptide. (B) The single peptide sequence obtained from phage display showing binding to PCNA. (C) The interaction surface of PCNA and the FEN1-derived peptide. The key interactions are mediated by Gln, Leu and two consecutive Phe residues forming the canonical PIP box motif Qxxhxxaa, where h is a hydrophobic and a is an aromatic amino acid. These residues are conserved in the phage-derived peptide.
3.3.7.2 OB-fold: Oligonucleotide/oligosaccharide-binding (OB-fold) domains are ssDNA-binding
domains found in several proteins including replication protein A 1 (RPA1), the
primary eukaryotic ssDNA-binding protein [71]. Structurally, OB-fold domains form a five-stranded
beta-barrel with one end of the barrel capped by an alpha-helix. ssDNA binding is
mediated by one face of the beta-barrel and is conserved amongst all OB-fold members. The N-terminal
OB-fold domain of the RPA 70 kDa subunit (RPA70N; included in this study) is a unique example of
an OB-fold that is known to be a protein interaction module [72]. It plays a key role in central
cellular processes such as DNA replication, damage response and repair by interacting with
proteins such as phospho-RPA2, p53, RAD9, BID, MRE11, NBS1, Rad17, RAD52, BRCA2 and
ATRIP. The interaction between RPA70N and its interaction partners is mediated by the binding
surface involved in ssDNA binding in other OB-fold domains.
From phage display, I obtained 66 peptide sequences (11 unique sequences) for the
RPA70N domain. These contain a consensus motif shown in Figure 27. The structure of
RPA70N with phospho-mimic peptide from p53 indicates that the peptides bind to the canonical
peptide binding surface.
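Counts such as "66 peptide sequences (11 unique sequences)" come from collapsing sequenced clones into distinct peptides. A minimal sketch of that bookkeeping is shown below; the clone sequences are invented for illustration and the helper `summarize` is hypothetical, not part of the workflow described in this thesis.

```python
from collections import Counter

# Hypothetical sequenced clones (illustrative only; not data from this study).
clones = ["DWLMSY", "DWLMSY", "EDLMAW", "DWLMSY", "EDLMAW", "TDEMKY"]

def summarize(clones):
    """Return (total clones, number of unique peptides, peptides ranked by clone count)."""
    counts = Counter(clones)
    return len(clones), len(counts), counts.most_common()

total, unique, ranked = summarize(clones)
print(f"{total} peptides ({unique} unique)")
for sequence, n in ranked:
    print(sequence, n)
```

The ranked list also shows which peptides dominate the selection output, which is useful when deciding whether more clones need to be sequenced.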
RPA has already been shown to be a valid target for cancer therapy. Small-molecule
inhibitors that target the central OB-folds of RPA70, RPA70A and RPA70B, have been shown to
induce cytotoxicity and increase the efficacy of genotoxic chemotherapeutics [73]. Peptide-based
inhibitors of RPA70N should inhibit the binding of RPA to multiple checkpoint proteins (at
least ATRIP, RAD9, MRE11, and p53) and hence significantly impair the replication stress
response. Cancer cells are more dependent on the replication stress response than normal cells to
complete replication and retain viability. Thus, inhibitors of RPA70N may amplify the levels of DNA
damage in cancer cells caused by a wide variety of genotoxic agents [74]. Apart from potentially
acting as anti-cancer agents, peptide inhibitors of RPA70N may also be used to further
understand the role of RPA in the cell cycle and DNA repair and may serve as valuable tools for
studying RPA biology.
Figure 27: Structural and literature analysis of the RPA70N OB-fold domain. (A) Structure of RPA70N in complex with the p53-derived peptide. (B) Binding logo of RPA70N obtained from phage display. (C) The interaction surface between RPA70N and the p53 peptide. The key interactions are mediated by Asp, Leu and Met residues (highlighted in yellow). These residues are conserved in the binding motif obtained from phage display with an additional possibility of Glu instead of Leu at position 12 of the motif. The phage-derived binding motif also contains an aromatic residue (Trp, Tyr) at position 17 which is not present in the p53 peptide.
3.3.7.3 Ligand binding domain of nuclear receptors: Nuclear receptors (NR) are a group of
transcription factors that regulate the expression of target genes in response to binding of small
molecules such as steroids, hormones and metabolites [80]. They are an integral part of several
key cellular signalling pathways regulating processes such as homeostasis and proliferation. Nuclear receptors
contain two structured domains: a DNA-binding domain (DBD) that specifically recognizes
hormone response elements (HREs) on the DNA and a ligand-binding domain (LBD) that binds to a
small-molecule ligand and interacts with co-regulators. Structurally, the LBD forms a globular
structure composed of a three-layered α-helical sandwich that contains 12 alpha helices. The C-terminal
helix 12 is highly mobile, and is stabilized upon ligand binding in a position that
completes the recognition surface for binding of co-regulators. Upon binding to the LBD,
co-regulators may recruit RNA polymerase to initiate transcription of downstream genes (in which case they are
called co-activators) or the HDAC complex to silence gene expression (co-repressors)
[15]. Most of the co-activators and co-repressors identified to date interact primarily with nuclear
receptor activation function 2 (AF2), which is located on the LBD itself. These include p160
steroid receptor co-activator (SRC) family members, p300 and related integrator proteins, the TRAP mediator
complex, and various other co-activators. The region responsible for binding in
co-activators (called the NR box) comprises a short alpha-helical LxxLL motif. Although this motif is necessary to
mediate the binding of these proteins to liganded NRs, amino acids flanking the core motif
dictate the specificity of the interaction. The three conserved leucines align on the face of the α-helix
that packs against the hydrophobic channel of the LBD surface. The binding of co-repressors
occurs at the same pocket as that of co-activators but in a different mode. In the unliganded state, the
C-terminal helix 12 arranges itself along helix 3 to form the peptide-binding pocket. The NR
box responsible for binding in co-repressors such as NCoR and SMRT comprises the Lxx[I/H]IxxxL
motif. In the co-repressor motif, the hydrophobic residues pack against the
hydrophobic channel of the LBD surface [81].
Out of the four NR LBDs present in the initial dataset, only the LBD of the bile acid receptor
(NR1H4) could be successfully expressed and purified. The bile acid receptor binds to its natural
ligand, bile acid, and is involved in the metabolism of bile acid in liver tissue [82]. The phage
selections were performed in the absence of the bile acid ligand, and Figure 28 shows the peptides
obtained from the phage display experiments. To rationalize the consensus motif obtained from
phage display, I analyzed the structure of a nuclear receptor in complex with a co-repressor
peptide. Since no structure of the bile acid receptor bound to its co-repressor (SMRT or NCoR) is
available, the structure of the closest nuclear receptor, PPARα, in complex with the SMRT peptide
was used for analysis [81]. Based on the sequence alignment of the SMRT and phage-display peptides,
the key hydrophobic residues responsible for binding of the SMRT peptide are
replaced by bulkier hydrophobic and aromatic residues in the phage-derived peptides. This has
not been observed previously and may represent a novel mode of peptide binding to the LBD of
NR1H4. Further biochemical and structural analyses are required to confirm the binding of these
peptides. Nonetheless, these might represent novel binding partners for the bile acid receptor.
Figure 28: Structure and literature analysis of the NR1H4 ligand binding domain. (A) The structure of PPARα LBD in complex with the SMRT peptide (PDB ID: 1KKQ). (B) The binding motif for the NR1H4-LBD obtained from phage display. (C) The interaction surface between PPARα and the SMRT peptide. The key interactions between PPARα and the SMRT peptide are mediated by two Leu and Ile residues (highlighted in yellow) that stack against the hydrophobic surface of PPARα. Based on sequence alignment to SMRT peptide, the hydrophobic and aromatic residues conserved in the phage display results correspond to the key residues in the interaction.
Nuclear receptors are amongst the most important classes of drug targets for various
diseases including breast and prostate cancer. Often, small-molecule drugs bind to the
ligand-binding pocket and deactivate the NR-LBD. Other peptide or small-molecule inhibitors affect
the binding of NR-LBDs to their co-regulators. NR-LBD/co-regulator binding is critical for the
downstream activity of nuclear receptors, and inhibition of this interaction disrupts
nuclear receptor action [15]. The peptides identified in this study bind to the bile acid receptor in its
unliganded form. Once confirmed, these peptides may be used to modulate the activity of this important
class of drug targets.
3.3.7.4 WD40 domains: WD40 domains consist of sequence repeats of 44-60 residues, each of which
forms a four-stranded anti-parallel beta sheet; these sheets come together to form a beta-propeller fold
[88, 89]. The most common of these domains are seven-bladed beta-propellers that contain seven
WD40 repeats. WD40 domains are among the most common domain types across eukaryotic
proteomes and act as scaffolds in several key cellular pathways. In this study, I targeted the
WD40-repeat containing protein 5 (WDR5). WDR5 is an adaptor protein that forms part of the
Set1 family of methyltransferase complexes. It binds to unmodified histone H3 and members of the Set1
family of methyltransferases via the top of the beta-propeller [90]. Residues from a small region
located close to the MLL catalytic site (called the Win motif) bind in a 3₁₀-helical
conformation within the central depression of the beta-propeller, and the residues that follow
extend away from this cleft along the surface of WDR5.
From phage display, I obtained a clear consensus motif “R[T/W]xxW” with a strong
preference for Arg at the central position of the peptide. The structure of WDR5 in complex with
the MLL4-derived peptide supports the conclusion that the phage-derived
peptides bind to the interaction surface on top of the beta propeller (Figure 29) [91].
WDR5 is an important target for regulating histone modifications, specifically histone
methylation. The interaction between WDR5 and histone H3 is critical for this event, and its inhibition
may block the methylation of histones by the Set1 family of methyltransferases
[92]. Hence, the peptides obtained in this study may be used to inhibit the function of the Set1 family
of methyltransferases and to serve as intracellular probes for studying the biology of this important class of
methyltransferases.
Figure 29: The structure and literature analysis of WDR5. (A) The structure of WDR5 with the MLL4 peptide (PDB ID: 3UVM). (B) The binding motif of WDR5 domain obtained from phage display. (C) The binding surface of WDR5 and the MLL4 peptide with the key residues involved in WDR5 interaction highlighted in yellow. These residues are conserved in the phage-derived binding motif.
3.3.7.5 TRF homology domain: The TRF homology (TRFH) domain is found in proteins such as TERF1 and TERF2, which
are part of the shelterin complex [106]. TERF1 and TERF2 are homologous proteins that
associate with the full length of the double-stranded portion of the telomere. The centrally
located TRFH domain of TERF1 is a known protein interaction domain that mediates the
recruitment of telomere-binding proteins such as tankyrase and TIN2. Structurally, the TRFH
domain consists of nine α-helices forming an elongated helix bundle. TERF1 (also known as TRF1) recognizes TIN2 and
PinX1 using a conserved interaction surface on its TRFH domain. The N
terminus of the TIN2 peptide adopts an extended conformation stabilized by an extensive
intermolecular hydrogen-bonding network, with key interactions made by Leu, Phe and Pro
residues. The binding motif proposed on the basis of structural studies is F/YxLxP (Figure 30)
[107].
In this study, I obtained a limited number of peptides against the TRFH domain of TERF1
(Figure 30). These sequences did not show any strong similarity to known natural
peptide partners. More sequences are required to identify the binding preference of the domain,
and the peptides obtained here should be investigated further to define the binding
preferences of the TRFH domain in detail.
Figure 30: Structural and literature review of the TRFH domain of TERF1. (A) The structure of the TERF1-TRFH domain with the TIN2 peptide (PDB ID: 3BQO). (B) The peptide sequences obtained from phage display against the TERF1-TRFH domain. (C) The binding surface of TERF1-TRFH domain and the TIN2 peptide. The key interactions are mediated by Leu and Phe residues on TIN2 (highlighted in yellow). The number of sequences obtained from phage display is limited and hence it is difficult to obtain a clear consensus motif for the TERF1-TRFH domain. However, the peptides obtained in this study do not contain the F/YxLxP motif described by Chen Y et al. 2008.
3.3.8 Miscellaneous: Apart from the aforementioned examples, I was also able to generate
peptides against a handful of domain families that are involved in different cellular pathways.
These include the SWIB/MDM2 domain family involved in apoptosis, eIF4E involved in
translation initiation, the HORMA domain involved in cell cycle regulation and ubiquitin, which is
involved in the ubiquitin-proteasomal degradation system.
3.3.8.1 SWIB/MDM2: The SWIB/MDM2 family of domains is found in the Mdm2 family of
oncoproteins, which are known regulators of p53, and in the SWI/SNF family of ATP-dependent
chromatin-remodelling proteins [51]. In MDM2 proteins, the SWIB/MDM2 domain binds to the
transactivation domain of p53, allowing the degradation of p53. Structurally, the SWIB/MDM2
domain contains six beta sheets and four alpha helices. The binding surface is a hydrophobic
cleft formed by two alpha helices, where the alpha-helical transactivation domain of p53 binds.
The key interaction between SWIB/MDM2 and p53 is mediated by a triad of residues on
p53 (Phe, Trp and Leu) that insert into the hydrophobic pocket on MDM2 [52].
In this study, we identified peptides against the SWIB/MDM2 domain of MDM4 (also called
MDMX), a member of the MDM2 family of proteins that is essential for regulating
p53. A high enrichment ratio was obtained for the SWIB/MDM2 domain during phage selections. The
binding preference obtained showed a clear “FxxxWxxL” motif, which correlates with previous
structural and biochemical analyses. This suggests that the peptides generated in this
study bind to the same interaction surface that binds to the p53 peptide (Figure 31).
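The i, i+4, i+7 spacing of the FxxxWxxL motif is what places the three hydrophobic residues on one face of an alpha helix. The short calculation below illustrates this with a helical-wheel projection. It is a sketch that assumes an ideal alpha helix with 3.6 residues per turn (100° per residue), a textbook value rather than a measurement from this study; the helper `helical_angles` is hypothetical.

```python
def helical_angles(motif, degrees_per_residue=100.0):
    """Angular position on a helical wheel of each non-wildcard motif residue."""
    return {
        (i, aa): (i * degrees_per_residue) % 360.0
        for i, aa in enumerate(motif)
        if aa != "x"
    }

angles = helical_angles("FxxxWxxL")
for (index, residue), angle in angles.items():
    print(residue, index, angle)
# F sits at 0 degrees, W at 40 (4 * 100 = 400), L at 340 (7 * 100 = 700)
```

Under this assumption all three residues fall within a 60° arc of the wheel, consistent with a single hydrophobic face inserting into the cleft.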
Figure 31: Structure and literature analysis of MDM4. (A) The structure of MDM4 in complex with the peptide from p53 (PDB ID: 3DAB). (B) The binding motif obtained for MDM4. (C) The interaction surface of MDM4 with p53 peptide. The key interactions between the two domains are mediated by triad of residues: Phe, Trp and Leu on the binding surface (shown in yellow). These residues are conserved in the consensus motif. Interestingly, the binding motif suggests that Leu at the third triad position can also accommodate a Met residue.
Disruption of the p53 tumor suppressor pathway due to mutations in the p53 gene is
found in approximately 50% of all cancers. Several genome-wide functional genomics studies
have revealed an increased MDM4 copy number in 65% of human retinoblastomas [53]. Ectopic
expression of MDM4 in mouse Rb-null p107-null retinal progenitor cells leads to a reduction in
p53-mediated apoptosis and a clonal expansion of tumor cells. Conversely, colony assays
have shown that knocking down MDMX blocks proliferation of MCF-7 cells unless p53 levels
are simultaneously decreased. Nutlin-3, a dual inhibitor of MDM2 and MDM4, reduces the
MDM2/4-p53 interaction and efficiently kills retinoblastoma cells [53]. Other small-molecule
and peptide-based MDM4 inhibitors have been identified and tested for anti-cancer activities.
The peptides generated in this study are predicted to bind to the same p53 binding site on MDM4
and hence may have a similar effect on the p53-mediated apoptosis pathway.
3.3.8.2 HORMA domain: The HORMA family of domains is found in several eukaryotic
proteins such as Mad2, a protein that is involved in the mitotic checkpoint [60]. The core of the
HORMA fold contains three α-helices sandwiched between a six-stranded β-sheet and an
irregular β-hairpin. In the Mad2 protein, the domain exists in two conformationally distinct
forms: open form (O-Mad2) and closed form (C-Mad2). Both recombinant and endogenous
Mad2 are predominantly folded as O-Mad2 while C-Mad2 forms upon binding to its interaction
partners – Mad1, p31comet and Cdc20.
Figure 32: Structure and literature analysis of HORMA domain. (A) The structure of the HORMA domain of Mad2A in complex with the phage-derived Mad2 binding peptide (MBP1; PDB ID: 1KLQ). (B) The binding logos obtained from previous phage display (bottom; Luo et al. 2002 [61]) and from this study (top). The binding motifs are similar to each other with similar conserved positions. (C) The interaction surface of MBP1 and Mad2A. The key residues involved in the interaction (highlighted in yellow) are Trp(2), Tyr(3), Pro(7), Pro(8), Gln(9) and Arg(10). In our binding logo, we see conservation of Gly before the highly conserved hydrophobic region and Lys/Arg after the poly-proline region. This can be attributed to the longer library used in this study, which may have produced a longer binding motif.
Phage display has been used previously to generate peptides against Mad2 [61]. The binding
preference obtained is shown in Figure 32. The core of the motif consists of two hydrophobic
residues, a basic residue, and a third hydrophobic residue (Figure 32). This core motif is
generally followed by a proline-rich sequence. Two known interaction partners of Mad2A, Cdc20 and
Mad1, contain a consensus similar to that found in the phage display results.
In this study, I obtained a binding preference similar to that observed previously, suggesting
that the peptides obtained here will bind to the same binding site as described earlier.
3.3.8.3 eIF4E: The eukaryotic translation initiation factor eIF4E is a protein that exists as part
of the translation pre-initiation complex (eIF4F) involved in directing ribosomes to the cap
structure of mRNAs [100]. eIF4E has a curved eight-stranded antiparallel β-sheet with three
helices forming the convex face and three smaller helices inserted in connecting loops, and it binds
directly to the mRNA cap. The m7G of the mRNA cap binds to a stack of tryptophan residues on
the concave face. eIF4E is recruited to the eIF4F complex by its interaction partner eIF4G, which
binds to a conserved binding site. The eIF4E peptide binding site is located in a region
encompassing one edge of the β-sheet, the adjacent helix α2 and several regions of non-regular
secondary structure on the convex surface of eIF4E. The peptide from eIF4G forms a helical
structure and its consensus motif is YxxxxLΦ [101]. eIF4E also binds to the RING domains of the Z
protein found in arenavirus and of the tumor suppressor protein PML. A recent study has reported the
structural and biochemical characterization of this interaction, which is mediated by a site that is
distinct from the one that binds the known YxxxxLΦ peptide motif [102].
Figure 33: The structure and literature analysis of eIF4E. (A) The structure of eIF4E with the eIF4G peptide (PDB ID: 3UVM). (B) The sequences of peptides obtained from phage display against eIF4E. (C) The binding surface of eIF4E and the eIF4G peptide.
In this study, I obtained a limited set of peptides for eIF4E (Figure 33). A handful of
these peptides contained sequences similar to the YxxxxLΦ motif. However, a subset of phage-
derived eIF4E binders do not show the YxxxxLΦ motif (Figure 33). This observation is critical
as these peptides may bind to eIF4E at a site different from the canonical peptide binding site.
However, more sequences are required to confirm the binding of the phage-derived peptides to
eIF4E.
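The split between canonical and non-canonical binders described above can be sketched as a simple partition of the peptide list. The Python snippet below is a hypothetical illustration, not the procedure used in this study. The set of residues counted as hydrophobic (Φ) is an assumption made for the example, and the peptide sequences are invented placeholders rather than the sequences in Figure 33.

```python
import re

# Assumption: residues treated as hydrophobic (Φ) for this illustration.
HYDROPHOBIC = "AVILMFWY"

# YxxxxLΦ: Tyr, any four residues, Leu, then one hydrophobic residue.
CANONICAL = re.compile(rf"Y.{{4}}L[{HYDROPHOBIC}]")

def split_by_motif(peptides):
    """Partition peptides into those containing YxxxxLΦ and the rest."""
    canonical, other = [], []
    for peptide in peptides:
        (canonical if CANONICAL.search(peptide) else other).append(peptide)
    return canonical, other

if __name__ == "__main__":
    # Invented 16-residue peptides, used only to exercise the function.
    peptides = ["GSYDRKFLMDTAWQPE", "WGHTWPRDSWNEQWAA", "AYAAAALDGGGGGGGG"]
    canonical, other = split_by_motif(peptides)
    print("canonical:", canonical)
    print("other:", other)
```

Peptides that land in the second group are only candidates for an alternative binding site; as noted above, more sequences and direct binding data would be needed to support that interpretation.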
eIF4E is an important target for cancer therapy. eIF4E is up-regulated in a number of
malignancies, and over-expression of eIF4E leads to tumorigenesis in mouse models. Inhibition of
eIF4E in various cancer models leads to apoptosis and a reduction in the level of the oncogenic
protein Ras. Herbert et al. first reported that peptides derived from the natural eIF4E binder 4E-BP1
bind to eIF4E and induce apoptosis in MRC-5 cells [103]. Peptides
derived from eIF4G have been used to identify a small molecule (4EGI-1) that binds to eIF4E
and inhibits its recruitment to the eIF4F complex [40]. When introduced inside cells, these
molecules lead to the inhibition of growth of multiple cancer cell lines. Phage-derived peptides
may join the growing list of inhibitors of eIF4E.
3.3.8.4 Ubiquitin: Ubiquitin is a small protein that can be attached to a range of proteins to affect
their cellular fate. Ubiquitination is an important post-translational modification that directs protein
recycling. A number of proteins recognize the ubiquitin tag on other proteins. One of the most
common modules to bind to ubiquitin is the ubiquitin-interacting motif (UIM) [103]. The UIM is
found in a number of proteins involved in endocytosis and vacuolar protein sorting including
Hrs, Vps27p, Stam1, and Eps15. The UIM consists of an amphipathic α-helical structure with
hydrophobic core sequences composed of alternating large and small residues (Leu-Ala-Leu-
Ala-Leu) that are flanked on both sides by patches of acidic residues. Sequence analyses of
known UIMs have been used to define a more general 15-residue UIM motif: eeexΦxxAΦx[e/Φ]Saxe,
where x is a helix-favoring residue, a is a bulky hydrophobic or polar residue with
considerable aliphatic content, e is a negatively charged residue and Φ is a hydrophobic residue
[104].
Structurally, ubiquitin contains three and one-half turns of α-helix, a short 3₁₀-helix and a
five-stranded β-sheet. The five-stranded β-sheet of ubiquitin constitutes the principal interaction
surface for the Vps27 UIM helix. The helix binds in an antiparallel orientation relative to the
C-terminal β-strand, and interacts with β-strands 4 and 5 and the loops between β-strands 1 and 2 and
β-strands 4 and 5. The UIM forms a left-handed, antiparallel helix with the hydrophobic face of the
amphipathic helix facing the ubiquitin molecule [103].
In this study, I obtained five peptides (four unique peptides) against ubiquitin (Figure 34). No
consensus motif was obtained as the number of sequences was low. I also observed an
abundance of tryptophan residues at different positions. The clonal ELISA results confirm that
the peptides obtained here are specific for GST-tagged ubiquitin compared to GST alone. More
sequences may be required to obtain a clear binding preference. Biochemical analyses are also
required to confirm the binding of the phage-derived peptides. Nevertheless, if confirmed, these
peptides may serve as potential reagents to recognize ubiquitin. In recent years, there have been
advances in using UIM chains to develop intracellular reagents to detect specific ubiquitin chains
on proteins [105]. The peptides obtained here, if effective, may potentially help in developing
better intracellular reagents.
Figure 34: Structure and literature analysis of ubiquitin. (A) The structure of ubiquitin with the Vps27p UIM (PDB ID: 1Q0W). (B) The peptides obtained from phage display against ubiquitin. (C) The binding surface of ubiquitin and Vps27p shows the key residues required for interaction with ubiquitin. The ubiquitin-interacting motif can be divided into two regions: a central amphipathic helix with conserved Ala and Ser residues (highlighted in yellow) and an N-terminal negatively charged helix (highlighted in yellow). The phage-derived peptides do not fall into the canonical UIM and may represent a novel mode of binding to ubiquitin.
3.4 Summary
The second aim of this study was to generate peptide binders against the shortlisted proteins
using phage display. To this end, I selected all 66 domains that were identified by my
computational analysis. These domains were cloned into a pGEX expression vector for
expression and purification. Of the 66 domains, 44 were successfully purified using an established
GST-purification protocol. SDS-PAGE gels were used to identify the step at which
purification failed for the remaining 22 domains. Interestingly, all 22 domains
were found to be insoluble and hence could not be obtained in the cell lysate. Different growth
and lysis conditions may be required to solubilise these domains and increase protein yield.
For each of the purified domains, I used a random 16-amino-acid peptide library to screen
for peptide binders. The selection procedure was optimized by incorporating pre-selection in
GST wells and in-solution GST negative selection. These modifications helped in removal of
non-specific peptide binders and significantly reduced the background noise in phage selections.
From my phage display screen, I was able to obtain specific peptides against 27 of the 44
domains. These domains belonged to different structural families and exhibited divergent
binding preferences. This highlights the power of phage display to identify distinct binding
preferences using a single random library. Further, I was able to generate position weight
matrices (PWMs) for 22 of the 27 domains.
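For readers unfamiliar with position weight matrices, the following minimal Python sketch shows one common way to build a PWM from a set of aligned peptides: count residues per position, add a pseudocount, and normalise to frequencies. It is not the software used in this study; the function names, the pseudocount value and the three example peptides are all assumptions made for illustration, and the peptides are assumed to contain only the 20 standard residues.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pwm(aligned_peptides, pseudocount=1.0):
    """Return one {residue: frequency} dict per alignment position.

    Assumption: peptides are already aligned and of equal length.
    """
    length = len(aligned_peptides[0])
    if any(len(p) != length for p in aligned_peptides):
        raise ValueError("peptides must be aligned to the same length")
    total = len(aligned_peptides) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in aligned_peptides)
        pwm.append({aa: (counts[aa] + pseudocount) / total
                    for aa in AMINO_ACIDS})
    return pwm

def consensus(pwm):
    """Most frequent residue at each position (ties broken arbitrarily)."""
    return "".join(max(col, key=col.get) for col in pwm)

if __name__ == "__main__":
    # Invented aligned peptides, used only to exercise the functions.
    peptides = ["FSDLWKLL", "FAEYWNLL", "FTDLWRML"]
    pwm = build_pwm(peptides)
    print(consensus(pwm))
```

The pseudocount keeps residues that were never observed at a position from receiving a frequency of exactly zero, which matters when only a few dozen clones are sequenced per domain.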
Extensive structural analysis and literature survey were performed to examine the
peptides obtained from phage display. For 20 of the 27 domains, I obtained peptides that
resemble natural peptide partners for these domains. This provides me with confidence that these
peptides may block the binding of endogenous partner of these domains, thereby perturbing their
cellular function. For these cases, I have predicted the phenotypic effects of peptide-mediated
inhibition and listed the potential uses of these peptides. For the remaining seven domains, the
peptides obtained did not resemble the known natural peptide partner. In such cases, based on
current evidence, I have predicted the potential binding surface and binding mode. For the
CAP/Gly domain, I generated a structural model for binding of the phage-derived peptide. The
structural model is currently being tested by Dr. Yufeng Tong, a principal investigator at the
Structural Genomics Consortium (SGC), Toronto. For domains where I was not able to make
structure-based models, I have suggested potential binding modes. These domains include the
PDCD6 penta-EF hand domain and the clathrin terminal domain.
For seven domains, the peptide sequences obtained were insufficient to generate a
high-confidence binding motif. These include eIF4E, the TRFH domain of TERF1 and ubiquitin. Further
experiments are required to predict the binding mode of these peptides. Nonetheless, these
results do suggest a non-canonical peptide binding mode for the known peptide binding domains.
Once verified, these binding modes may uncover novel protein partners and biological roles for
these domains.
4 CONCLUSIONS
4.1 Summary of work: Peptide recognition domains (PRDs) play key roles in cellular
pathways regulating homeostasis and cellular signalling. Such domains are frequently
misregulated in diseases including cancer. Considerable progress has been made in developing
specific small-molecule and peptide-based reagents against a limited number of PRD families. The
central aim of this study was to use phage display to generate peptide probes against a diverse
set of cancer-related PRDs.
In Chapter 2, I covered my work in identifying cancer-relevant peptide recognition
domains. To this end, I focused on a list of proteins related to ovarian cancer. These candidate
genes were identified by our collaborators using whole genome RNAi screens on 15 different
ovarian cancer cell lines. I developed a computational methodology to identify target domains
present on these candidate genes that share high sequence similarity to known PRDs. A set of
known PRDs was obtained from online databases such as PepX and DOMINO. The list of
potential PRDs identified from my computational pipeline was manually curated and analyzed.
Based on my analysis, I selected 66 domains as targets for further work.
In Chapter 3, I described the phage display pipeline used to identify peptides against each
of the 66 target domains. First, using the standard GST purification method, I successfully
purified 44 of the 66 domains. Second, I used a random 16-amino-acid peptide library to obtain peptides
against 27 of the 44 purified domains. Third, I validated the phage-derived peptides using an
extensive structural analysis and literature review. Based on this analysis, I was able to
accurately predict the peptide binding mode for a large proportion of the 27 domains. For the
domains where accurate models could not be generated, I have listed future experiments required
to fully elucidate the mechanism of binding. For each of the 27 domains, I have also included
potential applications of phage-derived peptides.
Based on the results obtained thus far, I have been able to successfully generate binding
preferences for 22 known PRDs that belong to 15 different protein families. For 11 of these
protein families, this represents the first phage display study done to elucidate their binding
preferences.
4.2 Future experiments: The current study has yielded several promising results including
novel peptide binding modes for known PRDs. However, further experiments are required to
validate the results obtained from phage display.
One of the first follow-up experiments required is to perform deep sequencing on all the
phage pools obtained from phage display. For each of the domains on which selections were
done, 96 phages were picked for DNA sequencing and generating peptide motifs. While for a
large set of domains, the sequences obtained were sufficient to accurately map the binding
preference, for others such as ubiquitin, TRFH domain of TERF1 and eIF4E, we could not
accurately predict the binding preferences. To this end, deep sequencing may help in providing a
large set of peptide sequences for each of these domains. Previous studies in the Sidhu lab have
successfully used deep sequencing to map the binding preferences of a large set of synthetic PDZ
domains [108]. Our group was also the first to report multiple peptide binding preferences
exhibited by a subset of known PDZ domains based on results obtained from deep sequencing
[109]. Such an analysis can be extended to all domains that were screened in this study.
In this study, I have identified peptides with potentially novel binding motifs. One of the
most promising results was obtained for the CAP/Gly domain of DCTN1. CAP/Gly domains have
been studied in the context of binding to C-terminal peptides. I was able to obtain an internal
peptide against this domain. Further experimental evidence is required to elucidate the
mechanism of binding of this peptide. First, I need to perform biochemical assays such as
isothermal titration calorimetry (ITC) to confirm the interaction between the domain and the peptide. This
would also provide an accurate measure of the binding affinity between the internal peptide and the
CAP/Gly domain. Once the peptide binding is confirmed, I also require a structure of the domain
in complex with the phage-derived peptide to validate my structural model. We are currently
collaborating with Dr. Yufeng Tong at the Structural Genomics Consortium, Toronto, to obtain a
crystal structure of the CAP/Gly domain in complex with the phage-derived peptide.
Work done for CAP/Gly can be extended to other domains such as the penta-EF hand domain of
PDCD6 and the clathrin terminal domain, for which we were not able to generate
structural models to explain the results obtained from phage display experiments.
4.3 Potential avenues for research: The current study provides several potential avenues for
further investigation. One of the first steps is to extend the current study to the family members
of domains for which binding preferences were successfully obtained. In the Sidhu lab, we have
generated a high-resolution binding preference map of PDZ, SH3 and WW domains. As
previously mentioned, we were able to generate binding preferences for 15 different protein
families, for 12 of which this represents the first phage display study done. We can now
potentially extend our study to include all members of these 12 protein families. These families
include the ligand binding domains of nuclear receptors, 14-3-3, Gα subunits and WD40-repeat
containing proteins. These protein families play important roles in key cellular pathways and
some have been established as important drug targets (such as ligand binding domain of nuclear
receptors, 14-3-3) in cancer. The phage-derived binding preferences can be used to obtain high
specificity and high affinity peptide probes against these families.
Phage display can also be used to probe other cancer-related targets. In this study, we
focussed on candidate genes identified by our collaborator using whole genome RNAi screens in
ovarian cancer. Other functional genomics screens, including exome sequencing and cDNA
hybridization microarrays, have been used to predict potential cancer-relevant genes. In principle,
the current study can be extended to candidates obtained by such screens. The approach can also be
applied to other diseases, including other cancer types. In the Sidhu lab, we have developed a
high-throughput phage display methodology to screen 96 targets in a single experiment [110]. This
methodology can be readily used to target members from a diverse set of protein families.
4.4 Applications of phage-derived peptides: The peptides generated here may serve as
valuable tools for the scientific community. Peptide-based probes have been routinely used to
structurally characterize the interaction mediated by PRDs and identify novel interaction
partners. Peptides derived here may be used to design intracellular probes for studying specific
biological pathways. Finally, the phage-derived peptides may assist in identification of small-
molecule drugs against specific domains.
4.5 Final Remarks: I have presented here a systematic application of a phage display pipeline
to rapidly identify peptides against a diverse set of domains. To my knowledge, this is the first
successful application of phage display against such a diverse class of domains.
5 REFERENCES
[1] Pawson, T., Nash, P., Assembly of Cell Regulatory Systems Through Protein Interaction Domains. Science 2003, 300, 445–452.
[2] Kim, P.M., Sboner, A., Xia, Y., Gerstein, M., The role of disorder in interaction networks: a structural analysis. Mol Syst Biol n.d., 4, 179–179.
[3] Pawson, T., Warner, N., Oncogenic re-wiring of cellular signaling pathways. Oncogene 0000, 26, 1268–1275.
[4] Wells, J.A., McClendon, C.L., Reaching for high-hanging fruit in drug discovery at protein–protein interfaces. Nature 2007, 450, 1001–1009.
[5] Teyra, J., Sidhu, S.S., Kim, P.M., Elucidation of the binding preferences of peptide recognition modules: SH3 and PDZ domains. FEBS Letters n.d.
[6] Rual, J.-F., Venkatesan, K., Hao, T., Hirozane-Kishikawa, T., et al., Towards a proteome-scale map of the human protein–protein interaction network. Nature 2005, 437, 1173–1178.
[7] Puntervoll, P., Linding, R., Gemünd, C., Chabanis-Davidson, S., et al., ELM server: A new resource for investigating short functional sites in modular eukaryotic proteins. Nucleic Acids Res. 2003, 31, 3625–3630.
[8] Jones, S., Thornton, J.M., Principles of protein-protein interactions. Proceedings of the National Academy of Sciences 1996, 93, 13–20.
[9] Ceol, A., Chatr-aryamontri, A., Santonico, E., Sacco, R., et al., DOMINO: a database of domain–peptide interactions. Nucleic Acids Res 2007, 35, D557–D560.
[10] Encinar, J.A., Fernandez-Ballester, G., Sánchez, I.E., Hurtado-Gomez, E., et al., ADAN: a database for prediction of protein–protein interaction of modular domains mediated by linear motifs. Bioinformatics 2009, 25, 2418–2424.
[11] Vanhee, P., Reumers, J., Stricher, F., Baeten, L., et al., PepX: a structural database of non-redundant protein–peptide complexes. Nucleic Acids Res 2010, 38, D545–D551.
[12] London, N., Movshovitz-Attias, D., Schueler-Furman, O., The Structural Basis of Peptide-Protein Binding Strategies. Structure 2010, 18, 188–199.
[13] Clackson, T., Wells, J., A hot spot of binding energy in a hormone-receptor interface. Science 1995, 267, 383–386.
[14] Johnston, C.A., Willard, F.S., Jezyk, M.R., Fredericks, Z., et al., Structure of Galpha(i1) bound to a GDP-selective peptide provides insight into guanine nucleotide exchange. Structure 2005, 13, 1069–1080.
[15] McKenna, N.J., Lanz, R.B., O’Malley, B.W., Nuclear Receptor Coregulators: Cellular and Molecular Biology. Endocrine Reviews 1999, 20, 321–344.
[16] Seet, B.T., Dikic, I., Zhou, M.-M., Pawson, T., Reading protein modifications with interaction domains. Nature Reviews Molecular Cell Biology 2006, 7, 473–483.
[17] Good, M.C., Zalatan, J.G., Lim, W.A., Scaffold Proteins: Hubs for Controlling the Flow of Cellular Information. Science 2011, 332, 680–686.
[18] Youle, R.J., Strasser, A., The BCL-2 protein family: opposing activities that mediate cell death. Nature Reviews Molecular Cell Biology 2008, 9, 47–59.
[19] Tanaka, S., Louie, D.C., Kant, J.A., Reed, J.C., Frequent incidence of somatic mutations in translocated BCL2 oncogenes of non-Hodgkin’s lymphomas. Blood 1992, 79, 229–237.
[20] Chittenden, T., Harrington, E.A., O’Connor, R., Remington, C., et al., Induction of apoptosis by the Bcl-2 homologue Bak. Nature 1995, 374, 733–736.
[21] Wang, J.-L., Zhang, Z.-J., Choksi, S., Shan, S., et al., Cell Permeable Bcl-2 Binding Peptides: A Chemical Approach to Apoptosis Induction in Tumor Cells. Cancer Res 2000, 60, 1498–1502.
[22] Oltersdorf, T., Elmore, S.W., Shoemaker, A.R., Armstrong, R.C., et al., An inhibitor of Bcl-2 family proteins induces regression of solid tumours. Nature 2005, 435, 677.
[23] Frank, R., The SPOT-synthesis technique: Synthetic peptide arrays on membrane supports—principles and applications. Journal of Immunological Methods 2002, 267, 13–26.
[24] Yu, H., Chen, J.K., Feng, S., Dalgarno, D.C., et al., Structural basis for the binding of proline-rich peptides to SH3 domains. Cell 1994, 76, 933–945.
[25] Filippakopoulos, P., Picaud, S., Mangos, M., Keates, T., et al., Histone Recognition and Large-Scale Structural Analysis of the Human Bromodomain Family. Cell 2012, 149, 214–231.
[26] Mok, J., Kim, P.M., Lam, H.Y.K., Piccirillo, S., et al., Deciphering Protein Kinase Specificity through Large-Scale Analysis of Yeast Phosphorylation Site Motifs. Sci Signal n.d., 3, ra12–ra12.
[27] Sidhu, S.S., Lowman, H.B., Cunningham, B.C., Wells, J.A., Phage display for selection of novel binding peptides. Meth. Enzymol. 2000, 328, 333–363.
[28] Tonikian, R., Zhang, Y., Sazinsky, S.L., Currell, B., et al., A specificity map for the PDZ domain family. PLoS Biol. 2008, 6, e239.
[29] Tonikian, R., Xin, X., Toret, C.P., Gfeller, D., et al., Bayesian modeling of the yeast SH3 domain interactome predicts spatiotemporal dynamics of endocytosis proteins. PLoS Biol. 2009, 7, e1000218.
[30] Tong, A.H.Y., Drees, B., Nardelli, G., Bader, G.D., et al., A Combined Experimental and Computational Strategy to Define Protein Interaction Networks for Peptide Recognition Modules. Science 2002, 295, 321–324.
[31] Heinis, C., Rutherford, T., Freund, S., Winter, G., Phage-encoded combinatorial chemical libraries based on bicyclic peptides. Nature Chemical Biology 2009, 5, 502–507.
[32] Bernal, F., Wade, M., Godes, M., Davis, T.N., et al., A stapled p53 helix overcomes HDMX-mediated suppression of p53. Cancer Cell 2010, 18, 411–422.
[33] Abedi, M.R., Caponigro, G., Kamb, A., Green fluorescent protein as a scaffold for intracellular presentation of peptides. Nucl. Acids Res. 1998, 26, 623–630.
[34] van de Wijngaart, D.J., Dubbink, H.J., Molier, M., de Vos, C., et al., Inhibition of androgen receptor functions by gelsolin FxxFF peptide delivered by transfection, cell-penetrating peptides, and lentiviral infection. Prostate 2011, 71, 241–253.
[35] Stanger, K., Steffek, M., Zhou, L., Pozniak, C.D., et al., Allosteric peptides bind a caspase zymogen and mediate caspase tetramerization. Nature Chemical Biology 2012, 8, 655–660.
[36] Zhang, Y., Appleton, B.A., Wiesmann, C., Lau, T., et al., Inhibition of Wnt signaling by Dishevelled PDZ peptides. Nat. Chem. Biol. 2009, 5, 217–219.
[37] Cook, D.J., Teves, L., Tymianski, M., Treatment of stroke with a PSD-95 inhibitor in the gyrencephalic primate brain. Nature 2012, 483, 213–217.
[38] Bach, A., Clausen, B.H., Møller, M., Vestergaard, B., et al., A high-affinity, dimeric inhibitor of PSD-95 bivalently interacts with PDZ1-2 and protects against ischemic brain damage. PNAS 2012, 109, 3317–3322.
[39] Li, L., Thomas, R.M., Suzuki, H., De Brabander, J.K., et al., A Small Molecule Smac Mimic Potentiates TRAIL- and TNFα-Mediated Cell Death. Science 2004, 305, 1471–1474.
[40] Moerke, N.J., Aktas, H., Chen, H., Cantel, S., et al., Small-Molecule Inhibition of the Interaction between the Translation Initiation Factors eIF4E and eIF4G. Cell 2007, 128, 257–267.
[41] Marcotte, R., Brown, K.R., Suarez, F., Sayad, A., et al., Essential Gene Profiles in Breast, Pancreatic, and Ovarian Cancer Cells. Cancer Discovery 2012, 2, 172–189.
[42] Luo, B., Cheung, H.W., Subramanian, A., Sharifnia, T., et al., Highly parallel identification of essential genes in cancer cells. Proc Natl Acad Sci U S A 2008, 105, 20380–20385.
[43] Hooda, Y., Kim, P.M., Computational structural analysis of protein interactions and networks. PROTEOMICS 2012, 12, 1697–1705.
[44] Sidhu, S.S., Phage Display In Biotechnology And Drug Discovery, CRC Press, 2005.
[45] Tonikian, R., Zhang, Y., Boone, C., Sidhu, S.S., Identifying specificity profiles for peptide recognition modules from phage-displayed peptide libraries. Nat. Protocols 2007, 2, 1368–1386.
[46] Liu, Q., Berry, D., Nash, P., Pawson, T., et al., Structural Basis for Specific Binding of the Gads SH3 Domain to an RxxK Motif-Containing SLP-76 Peptide: A Novel Mode of Peptide Recognition. Molecular Cell 2003, 11, 471–481.
[47] Sparks, A.B., Rider, J.E., Hoffman, N.G., Fowlkes, D.M., et al., Distinct ligand preferences of Src homology 3 domains from Src, Yes, Abl, Cortactin, p53bp2, PLCgamma, Crk, and Grb2. PNAS 1996, 93, 1540–1544.
[48] Penkert, R.R., DiVittorio, H.M., Prehoda, K.E., Internal Recognition Through PDZ Domain Plasticity in the Par-6 - Pals1 Complex. Nat Struct Mol Biol 2004, 11, 1122–1127.
[49] Rapali, P., Szenes, Á., Radnai, L., Bakos, A., et al., DYNLL/LC8: a light chain subunit of the dynein motor complex and beyond. FEBS Journal 2011, 278, 2980–2996.
[50] Rapali, P., Radnai, L., Süveges, D., Harmat, V., et al., Directed Evolution Reveals the Binding Motif Preference of the LC8/DYNLL Hub Protein and Predicts Large Numbers of Novel Binders in the Human Proteome. PLoS ONE 2011, 6, e18818.
[51] Bennett-Lovsey, R., Hart, S.E., Shirai, H., Mizuguchi, K., The SWIB and the MDM2 domains are homologous and share a common fold. Bioinformatics 2002, 18, 626–630.
[52] Pazgier, M., Liu, M., Zou, G., Yuan, W., et al., Structural Basis for High-Affinity Peptide Inhibition of P53 Interactions with MDM2 and MDMX. PNAS 2009, 106, 4665–4670.
[53] Hu, B., Gilkes, D.M., Chen, J., Efficient P53 Activation and Apoptosis by Simultaneous Disruption of Binding to MDM2 and MDMX. Cancer Res 2007, 67, 8810–8817.
[54] Ja, W.W., Adhikari, A., Austin, R.J., Sprang, S.R., Roberts, R.W., A peptide core motif for binding to heterotrimeric G protein alpha subunits. J. Biol. Chem. 2005, 280, 32057–32060.
[55] Prévost, G.P., Lonchampt, M.O., Holbeck, S., Attoub, S., et al., Anticancer Activity of BIM-46174, a New Inhibitor of the Heterotrimeric Gα/Gβγ Protein Complex. Cancer Res 2006, 66, 9227–9234.
[56] Hermeking, H., The 14-3-3 cancer connection. Nat Rev Cancer 2003, 3, 931–943.
[57] Muslin, A.J., Tanner, J.W., Allen, P.M., Shaw, A.S., Interaction of 14-3-3 with Signaling Proteins Is Mediated by the Recognition of Phosphoserine. Cell 1996, 84, 889–897.
[58] Masters, S.C., Pederson, K.J., Zhang, L., Barbieri, J.T., Fu, H., Interaction of 14-3-3 with a Nonphosphorylated Protein Ligand, Exoenzyme S of Pseudomonas aeruginosa. Biochemistry 1999, 38, 5216–5221.
[59] Wang, B., Yang, H., Liu, Y.-C., Jelinek, T., et al., Isolation of High-Affinity Peptide Antagonists of 14-3-3 Proteins by Phage Display. Biochemistry 1999, 38, 12499–12504.
[60] Mapelli, M., Massimiliano, L., Santaguida, S., Musacchio, A., The Mad2 Conformational Dimer: Structure and Implications for the Spindle Assembly Checkpoint. Cell 2007, 131, 730–743.
[61] Luo, X., Tang, Z., Rizo, J., Yu, H., The Mad2 Spindle Checkpoint Protein Undergoes Similar Major Conformational Changes Upon Binding to Either Mad1 or Cdc20. Molecular Cell 2002, 9, 59–71.
[62] Gulbis, J.M., Kelman, Z., Hurwitz, J., O’Donnell, M., Kuriyan, J., Structure of the C-terminal region of p21(WAF1/CIP1) complexed with human PCNA. Cell 1996, 87, 297–306.
[63] Meslet-Cladiére, L., Norais, C., Kuhn, J., Briffotaux, J., et al., A Novel Proteomic Approach Identifies New Interaction Partners for Proliferating Cell Nuclear Antigen. Journal of Molecular Biology 2007, 372, 1137–1148.
[64] Xu, H., Zhang, P., Liu, L., Lee, M.Y.W.T., A Novel PCNA-Binding Motif Identified by the Panning of a Random Peptide Display Library. Biochemistry 2001, 40, 4512–4520.
[65] Hishiki, A., Hashimoto, H., Hanafusa, T., Kamei, K., et al., Structural Basis for Novel Interactions between Human Translesion Synthesis Polymerases and Proliferating Cell Nuclear Antigen. J. Biol. Chem. 2009, 284, 10552–10560.
[66] Izard, T., Evans, G., Borgon, R.A., Rush, C.L., et al., Vinculin activation by talin through helical bundle conversion. Nature 2003, 427, 171–175.
[67] Adey, N.B., Kay, B.K., Isolation of peptides from phage-displayed random peptide libraries that interact with the talin-binding domain of vinculin. Biochem J 1997, 324, 523–528.
[68] Gingras, A.R., Ziegler, W.H., Frank, R., Barsukov, I.L., et al., Mapping and Consensus Sequence Identification for Multiple Vinculin Binding Sites within the Talin Rod. J. Biol. Chem. 2005, 280, 37217–37224.
[69] Bayliss, R., Littlewood, T., Stewart, M., Structural basis for the interaction between FxFG nucleoporin repeats and importin-beta in nuclear trafficking. Cell 2000, 102, 99–108.
[70] Ambrus, G., Whitby, L.R., Singer, E.L., Trott, O., et al., Small molecule peptidomimetic inhibitors of importin α/β mediated nuclear transport. Bioorg. Med. Chem. 2010, 18, 7611–7620.
[71] Arcus, V., OB-fold domains: a snapshot of the evolution of sequence, structure and function. Current Opinion in Structural Biology 2002, 12, 794–801.
[72] Bochkareva, E., Kaustov, L., Ayed, A., Yi, G.-S., et al., Single-stranded DNA mimicry in the p53 transactivation domain interaction with replication protein A. PNAS 2005, 102, 15412–15417.
[73] Anciano Granadillo, V.J., Earley, J.N., Shuck, S.C., Georgiadis, M.M., et al., Targeting the OB-Folds of Replication Protein A with Small Molecules. Journal of Nucleic Acids 2010, 2010, 1–11.
[74] Glanzer, J.G., Liu, S., Oakley, G.G., Small molecule inhibitor of the RPA70 N-terminal protein interaction domain discovered using in silico and in vitro methods. Bioorganic & Medicinal Chemistry 2011, 19, 2589–2595.
[75] Suyama, M., Doerks, T., Braun, I.C., Sattler, M., et al., Prediction of structural domains of TAP reveals details of its interaction with p15 and nucleoporins. EMBO reports 2000, 1, 53–58.
[76] Zolotukhin, A.S., Michalowski, D., Smulevitch, S., Felber, B.K., Retroviral constitutive transport element evolved from cellular TAP(NXF1)-binding sequences. J. Virol. 2001, 75, 5567–5575.
[77] Grant, R.P., Neuhaus, D., Stewart, M., Structural basis for the interaction between the Tap/NXF1 UBA domain and FG nucleoporins at 1 Å resolution. J. Mol. Biol. 2003, 326, 849–858.
[78] McCullough, J., Fisher, R.D., Whitby, F.G., Sundquist, W.I., Hill, C.P., ALIX-CHMP4 interactions in the human ESCRT pathway. Proc Natl Acad Sci U S A 2008, 105, 7687–7691.
[79] Sette, P., Mu, R., Dussupt, V., Jiang, J., et al., The Phe105 loop of Alix Bro1 domain plays a key role in HIV-1 release. Structure 2011, 19, 1485–1495.
[80] Aranda, A., Pascual, A., Nuclear Hormone Receptors and Gene Expression. Physiol Rev 2001, 81, 1269–1304.
[81] Xu, H.E., Stanley, T.B., Montana, V.G., Lambert, M.H., et al., Structural basis for antagonist-mediated recruitment of nuclear co-repressors by PPARα. Nature 2002, 415, 813–817.
[82] Makishima, M., Okamoto, A.Y., Repa, J.J., Tu, H., et al., Identification of a nuclear receptor for bile acids. Science 1999, 284, 1362–1365.
[83] Maki, M., Kitaura, Y., Satoh, H., Ohkouchi, S., Shibata, H., Structures, functions and molecular evolution of the penta-EF-hand Ca2+-binding proteins. Biochimica et Biophysica Acta (BBA) - Proteins & Proteomics 2002, 1600, 51–60.
[84] Todd, B., Moore, D., Deivanayagam, C.C., Lin, G., et al., A Structural Model for the Inhibition of Calpain by Calpastatin: Crystal Structures of the Native Domain VI of Calpain and its Complexes with Calpastatin Peptide and a Small Molecule Inhibitor. Journal of Molecular Biology 2003, 328, 131–146.
[85] Shibata, H., Suzuki, H., Kakiuchi, T., Inuzuka, T., et al., Identification of Alix-Type and Non-Alix-Type ALG-2-Binding Sites in Human Phospholipid Scramblase 3: Differential Binding to an Alternatively Spliced Isoform and Amino Acid-Substituted Mutants. J. Biol. Chem. 2008, 283, 9623–9632.
[86] Suzuki, H., Kawasaki, M., Inuzuka, T., Okumura, M., et al., Structural basis for Ca2+ -dependent formation of ALG-2/Alix peptide complex: Ca2+/EF3-driven arginine switch mechanism. Structure 2008, 16, 1562–1573.
[87] Høj, B.R., la Cour, J.M., Mollerup, J., Berchtold, M.W., ALG-2 knockdown in HeLa cells results in G2/M cell cycle phase accumulation and cell death. Biochemical and Biophysical Research Communications 2009, 378, 145–148.
[88] Stirnimann, C.U., Petsalaki, E., Russell, R.B., Müller, C.W., WD40 proteins propel cellular networks. Trends Biochem. Sci. 2010, 35, 565–574.
[89] Xu, C., Min, J., Structure and function of WD40 domain proteins. Protein Cell 2011, 2, 202–214.
[90] Patel, A., Dharmarajan, V., Cosgrove, M.S., Structure of WDR5 bound to mixed lineage leukemia protein-1 peptide. J. Biol. Chem. 2008, 283, 32158–32161.
[91] Zhang, P., Lee, H., Brunzelle, J.S., Couture, J.-F., The plasticity of WDR5 peptide-binding cleft enables the binding of the SET1 family of histone methyltransferases. Nucleic Acids Res 2012, 40, 4237–4246.
[92] Karatas, H., Townsend, E.C., Bernard, D., Dou, Y., Wang, S., Analysis of the Binding of Mixed Lineage Leukemia 1 (MLL1) and Histone 3 Peptides to WD Repeat Domain 5 (WDR5) for the Design of Inhibitors of the MLL1−WDR5 Interaction. J. Med. Chem. 2010, 53, 5179–5185.
[93] Lemmon, S.K., Traub, L.M., Getting in Touch with the Clathrin Terminal Domain. Traffic 2012, 13, 511–519.
[94] Haar, E. ter, Harrison, S.C., Kirchhausen, T., Peptide-in-groove interactions link target proteins to the β-propeller of clathrin. PNAS 2000, 97, 1096–1100.
[95] Miele, A.E., Watson, P.J., Evans, P.R., Traub, L.M., Owen, D.J., Two distinct interaction motifs in amphiphysin bind two independent sites on the clathrin terminal domain beta-propeller. Nat. Struct. Mol. Biol. 2004, 11, 242–248.
[96] Kang, D.S., Kern, R.C., Puthenveedu, M.A., Zastrow, M. von, et al., Structure of an Arrestin2-Clathrin Complex Reveals a Novel Clathrin Binding Domain That Modulates Receptor Trafficking. J. Biol. Chem. 2009, 284, 29860–29872.
[97] Willox, A.K., Royle, S.J., Functional Analysis of Interaction Sites on the N-Terminal Domain of Clathrin Heavy Chain. Traffic 2012, 13, 70–81.
[98] Weisbrich, A., Honnappa, S., Jaussi, R., Okhrimenko, O., et al., Structure-function relationship of CAP-Gly domains. Nat. Struct. Mol. Biol. 2007, 14, 959–967.
[99] Steinmetz, M.O., Akhmanova, A., Capturing protein tails by CAP-Gly domains. Trends Biochem. Sci. 2008, 33, 535–545.
[100] Matsuo, H., Li, H., McGuire, A.M., Fletcher, C.M., et al., Structure of translation factor eIF4E bound to m7GDP and interaction with 4E-binding protein. Nature Structural & Molecular Biology 1997, 4, 717–724.
[101] Marcotrigiano, J., Gingras, A.-C., Sonenberg, N., Burley, S.K., Cap-Dependent Translation Initiation in Eukaryotes Is Regulated by a Molecular Mimic of eIF4G. Molecular Cell 1999, 3, 707–716.
[102] Volpon, L., Osborne, M.J., Capul, A.A., Torre, J.C. de la, Borden, K.L.B., Structural characterization of the Z RING-eIF4E complex reveals a distinct mode of control for eIF4E. PNAS 2010, 107, 5441–5446.
[103] Swanson, K.A., Kang, R.S., Stamenova, S.D., Hicke, L., Radhakrishnan, I., Solution structure of Vps27 UIM–ubiquitin complex important for endosomal sorting and receptor downregulation. EMBO J 2003, 22, 4597–4606.
[104] Hofmann, K., Falquet, L., A ubiquitin-interacting motif conserved in components of the proteasomal and lysosomal protein degradation systems. Trends in Biochemical Sciences 2001, 26, 347–350.
[105] Sims, J.J., Scavone, F., Cooper, E.M., Kane, L.A., et al., Polyubiquitin-sensor proteins reveal localization and linkage-type dependence of cellular ubiquitin signaling. Nature Methods 2012, 9, 303–309.
[106] Fairall, L., Chapman, L., Moss, H., de Lange, T., Rhodes, D., Structure of the TRFH dimerization domain of the human telomeric proteins TRF1 and TRF2. Mol. Cell 2001, 8, 351–361.
[107] Chen, Y., Yang, Y., Overbeek, M. van, Donigian, J.R., et al., A Shared Docking Motif in TRF1 and TRF2 Used for Differential Recruitment of Telomeric Proteins. Science 2008, 319, 1092–1096.
[108] Ernst, A., Gfeller, D., Kan, Z., Seshagiri, S., et al., Coevolution of PDZ domain-ligand interactions analyzed by high-throughput phage display and deep sequencing. Mol Biosyst 2010, 6, 1782–1790.
[109] Gfeller, D., Butty, F., Wierzbicka, M., Verschueren, E., et al., The multiple-specificity landscape of modular peptide recognition domains. Mol Syst Biol 2011, 7.
[110] Huang, H., Sidhu, S.S., Studying binding specificities of peptide recognition modules by high-throughput phage display selections. Methods Mol. Biol. 2011, 781, 87–97.
APPENDIX
Appendix A: List of ovarian cancer cell lines

Cancer cell line      Cancer type   Species        Development stage   Morphology
609050M               Ovarian       Homo sapiens   Adult               Epithelial
A2780                 Ovarian       Homo sapiens   Adult               Epithelial
A2780_CIS             Ovarian       Homo sapiens   Adult               Epithelial
MM_OVCAR432_Bast_1    Ovarian       Homo sapiens   Adult               Epithelial
OV-1946               Ovarian       Homo sapiens   Adult               Epithelial
OV-90                 Ovarian       Homo sapiens   Adult               Epithelial
OVCA1369_TR           Ovarian       Homo sapiens   Adult               Epithelial
OVCA433_Bast          Ovarian       Homo sapiens   Adult               Epithelial
OVCA5                 Ovarian       Homo sapiens   Adult               Epithelial
OVCA8                 Ovarian       Homo sapiens   Adult               Epithelial
OVCAR-3               Ovarian       Homo sapiens   Adult               Epithelial
SK-OV-3               Ovarian       Homo sapiens   Adult               Epithelial
TOV-1946              Ovarian       Homo sapiens   Adult               Epithelial
TOV-2223G             Ovarian       Homo sapiens   Adult               Epithelial
TOV-3133G             Ovarian       Homo sapiens   Adult               Epithelial

* Information obtained from the COLT-Cancer database at the Moffat lab at Terrence Donnelly CCBR, University of Toronto.
Appendix B: Protein sequences of 66 domains
1433F GDREQLLQRARLAEQAERYDDMASAMKAVTELNEPLSNEDRNLLSVAYKNVVGARRSSWRVISSIEQKTMADGNEKKLEKVKAYREKIEKELETVCNDVLSLLDKFLIKNCNDFQYESKVFYLKMKGDYYRYLAEVASGEKKNSVVEASEAAYKEAFEISKEQMQPTHPIRLGLALNFSVFYYEIQNAPEQACLLAKQAFDDAIAELDTLNEDSYKDSTLIMQLLRDNLTLWTSDQQDEEAGEGN ACTG2MEEEIAALVIDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQSKRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMTQIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDLAGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSYELPDGQVITIGNERFRCPEALFQPSFLGMESCGIHETTFNSIMKCDVDIRKDLYANTVLSGGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQEYDESGPSIVHRKCF ACTH EEETTALVCDNGSGLCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQSKRGILTLKYPIEHGIITNWDDMEKIWHHSFYNELRVAPEEHPTLLTEAPLNPKANREKMTQIMFETFNVPAMYVAIQAVLSLYASGRTTGIVLDSGDGVTHNVPIYEGYALPHAIMRLDLAGRDLTDYLMKILTERGYSFVTTAEREIVRDIKEKLCYVALDFENEMATAASSSSLEKSYELPDGQVITIGNERFRCPETLFQPSFIGMESAGIHETTYNSIMKCDIDIRKDLYANNVLSGGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKPEYDEAGPSIVHRKCF IGLL1 LLRPTAASQSRALGPGAPGGSSRSSLRSRWGRFLLQRGSWTGPRCWPRGFQSKHNSVTHVFGSGTQLTVLSQPKATPSVTLFPPSSEELQANKATLVCLMNDFYPGILTVTWKADGTPITQGVEMTTPSKQSNNKYAASSYLSLTPEQWRSRRSYSCQVMHEGSTVEKTVAPAECS AP2M1 KYRRNELFLDVLESVNLLMSPQGQVLSAHVSGRVVMKSYLSGMPECKFGMNDKIVIEKQGKGTADETSKSGKQSIAIDDCTFHQCVRLSKFDSERSISFIPPDGEFELMRYRTTKDIILPFRVIPLVREVGRTKLEVKVVIKSNFKPSLLAQKIEVRIPTPLNTSGVQVICMKGKAKYKASENAIVWKIKRMAGMKESQISAEIELLPTNDKKKWARPPISMNFEVPFAPSGLKVRYLKVFEPKLNYSDHDVIKWVRYIGRSGIYETRC B2CL1 MSQSNRELVVDFLSYKLSQKGYSWSQFSDVEENRTEAPEGTESEMETPSAINGNPSWHLADSPAVNGATGHSSSLDAREVIPMAAVKQALREAGDEFELRYRRAFSDLTSQLHITPGTAYQSFEQVVNELFRDGVNWGRIVAFFSFGGALCVESVDKEMQVLVSRIAAWMATYLNDHLEPWIQENGGWDTFVELYGNNAAAESRKGQERFNRWFLTGMTVAGVVLLGSLFSRK PDC6I 
TFISVQLKKTSEVDLAKPLVKFIQQTYPSGGEEQAQYCRAAEELSKLRRAAVGRPLDKHEGALETLLRYYDQICSIEPKFPFSENQICLTFTWKDAFDKGSLFGGSVKLALASLGYEKSCVLFNCAALASQIAAEQNLDNDEGLKIAAKHYQFASGAFLHIKETVLSALSREPTVDISPDTVGTLSLIMLAQAQEVFFLKATRDKMKDAIIAKLANQAADYFGDAFKQCQYKDTLPKEVFPVLAAKHCIMQANAEYHQSILAKQQKKFGEEIARLQHAAELIKTVASRYDEYVNVKDFSDKINRALAAAKKDNDFIYHDRVPDLKDLDPIGKATLVKSTPVNVPISQKFTDLFEKMVPVSVQQSLAAYNQRKADLVNRSIAQMREATTLA CPNS1 MFLVNSFLKGGGGGGGGGGGLGGGLGNVLGGLISGAGGGGGGGGGGGGGGGGGGGGTAMRILGGVISAISEAAAQYNPEPPPPRTHYSNIEANESEEVRQFRRLFAQLAGDDMEVSATELMNILNKVVTRHPDLKTDGFGIDTCRSMVAVMDSDTTGKLGFEEFKYLWNNIKRWQAIYKQFDTDRSGTICSSELPGAFEAAGFHLNEHLYNMIIRRYSDESGNMDFDNFISCLVRLDAMFRAFKSLDKDGTGQIQVNIQEWLQLTMYS DCTN1 PLRVGSRVEVIGKGHRGTVAYVGATLFATGKWVGVILDEAKGKNDGTVQGRKYFTCDEGHGIFVRQSQIQVF CASP2 RLSTDTVEHSLDNKDGPVCLQVKPCTPEFYQTHFQLAYRLQSRPRGLALVLSNVHFTGEKELEFRSGGDVDHSTLVTLFKLLGYDVHVLCDQTAQEMQEKLQNFAQLPAHRVTDSCIVALLSHGVEGAIYGVDGKLLQLQEVFQLFDNANCPSLQNKPKMFFIQACRGDETDRGVDQQDGKNHAGSPGCEESDAGKEKLPKMRLPTRSDMICGYACLKGTAAMRNTKRGSWYIEALAQVFSERACDMHVADMLVKVNALIKDREGYAPGTEFHRCKEMSEYCSTLCRHLYLFPGHPPT CLH1 MAQILPIRFQEHLQLQNLGINPANIGFSTLTMESDKFICIREKVGEQAQVVIIDMNDPSNPIRRPISADSAIMNPASKVIALKAGKTLQIFNIEMKSKMKAHTMTDDVTFWKWISLNTVALVTDNAVYHWSMEGESQPVKMFDRHSSLAGCQIINYRTDAKQKWLLLTGISAQQNRVVGAMQLYSVDRKVSQPIEGHAASFAQFKMEGNAEESTLFCFAVRGQAGGKLHIIEVGTPPTGNQPFPKKAVDVFFPPEAQN
DFPVAMQISEKHDVVFLITKYGYIHLYDLETGTCIYMNRISGETIFVTAPHEATAGIIGVNRKGQVLSVCVEEENIIPYITNVLQNPDLALRMAVRNNLAGAEEL DLG1-1 EITLERGNSGLGFSIAGGTDNPHIGDDSSIFITKIITGGAAAQDGRLRVNDCILRVNEVDVRDVTHSKAVEALKEAGSIVRLYVKRR DLG1-2 EIKLIKGPKGLGFSIAGGVGNQHIPGDNSIYVTKIIEGGAAHKDGKLQIGDKLLAVNNVCLEEVTHEEAVTALKNTSDFVYLKVAKP DLG2-2 EIKLFKGPKGLGFSIAGGVGNQHIPGDNSIYVTKIIDGGAAQKDGRLQVGDRLLMVNNYSLEEVTHEEAVAILKNTSEVVYLKVGKP DLG2-3 KVVLHKGSTGLGFNIVGGEDGEGIFVSFILAGGPADLSGELQRGDQILSVNGIDLRGASHEQAAAALKGAGQTVTIIAQYQ DLG4-2 EIKLIKGPKGLGFSIAGGVGNQHIPGDNSIYVTKIIEGGAAHKDGRLQIGDKILAVNSVGLEDVMHEDAVAALKNTYDVVYLKVAKP DLG4-3 RIVIHRGSTGLGFNIVGGEDGEGIFISFILAGGPADLSGELRKGDQILSVNGVDLRNASHEQAAIALKNAGQTVTIIAQYK DNAS1 LKIAAFNIQTFGETKMSNATLVSYIVQILSRYDIALVQEVRDSHLTAVGKLLDNLNQDAPDTYHYVVSEPLGRNSYKERYLFVYRPDQVSAVDSYYYDDGCEPCGNDTFNREPAIVRFFSRFTEVREFAIVPLHAAPGDAVAEIDALYDVYLDVQEKWGLEDVMLMGDFNAGCSYVRPSQWSSIRLWTSPTFQWLIPDSADTTATPTHCAYDRIVVAGMLLRGAVVPDSALPFNFQAAYGLSDQLAQAISDHYPVEVMLK DYL2 MSDRKAVIKNADMSEDMQQDAVDCATQAMEKYNIEKDIAAYIKKEFDKKYNPTWHCIVGRNFGSYVTHETKHFIYFYLGQVAILLFKSG DYL1 MCDRKAVIKNADMSEEMQQDSVECATQALEKYNIEKDIAAHIKKEFDKKYNPTWHCIVGRNFGSYVTHETKHFIYFYLGQVAILLFKSG PROF2 MAGWQSYVDNLMCDGCCQEAAIVGYCDAKYVWAATAGGVFQSITPIEIDMIVGKDREGFFTNGLTLGAKKCSVIRDSLYVDGDCTMDIRTKSQGGEPTYNVAVGRAGRVLVFVMGKEGVHGGGLNKKAYSMAKYLRDSGF DYN2 MEELIPLVNKLQDAFSSIGQSCHLDLPQIAVVGGQSAGKSSVLENFVGRDFLPRGSGIVTRRPLILQLIFSKTEHAEFLHCKSKKFTDFDEVRQEIEAETDRVTGTNKGISPVPINLRVYSPHVLNLTLIDLPGITKVPVGDQPPDIEYQIKDMILQFISRESSLILAVTPANMDLANSDALKLAKEVDPQGLRTIGVITKLDLMDEGTDARDVLENKLLPLRRGYIGVVNRSQKDIEGKKDIRAALAAERKFFLSHPAYRHMADRMGTPHLQKTLNQQLTNHIRESLPALRSKLQSQL PDCD6 PDQSFLWNVFQRVDKDRSGVISDTELQQALSNGTWTPFNPVTVRSIISMFDRENKAGVNFSEFTGVWKYITDWQNVFRTYDRDNSGMIDKNELKQALSGFGYRLSDQFHDILIRKFDRQGRGQIAFDDFIQGCIVLQRLTDIFRRYDTDQDGWIQVSYEQYLSMVFSIV IF4E ATVEPETTPTPNPPTTEEEKTESNQEVANPEHYIKHPLQNRWALWFFKNDKSKTWQANLRLISKFDTVEDFWALYNHIQLSSNLMPGCDYSLFKDGIEPMWEDEKNKRGGRWLITLNKQQRRSDLDRFWLETLLCLIGESFDDYSDDVCGAVVNVRAKGDKIAIWTTECENREAVTHIGRVYKERLGLPPKIVIGYQSHADTATKSGSTTKNRFVV E41L3 
MQCKVILLDGSEYTCDVEKRSRGQVLFDKVCEHLNLLEKDYFGLTYRDAENQKNWLDPAKEIKKQVRSGAWHFSFNVKFYPPDPAQLSEDITRYYLCLQLRDDIVSGRLPCSFVTLALLGSYTVQSELGDYDPDECGSDYISEFRFAPNHTKELEDKVIELHKSHRGMTPAEAEMHFLENAKKLSMYGVDLHHAKDSEGVEIMLGVCASGLLIYRDRLRINRFAWPKVLKISYKRNNFYIKIRPGEFEQFESTIGFKLPNHRAAKRLWKVCVEHHTFFRLLL
GNAI1 GCTLSAEDKAAVERSKMIDRNLREDGEKAAREVKLLLLGAGESGKSTIVKQMKIIHEAGYSEEECKQYKAVVYSNTIQSIIAIIRAMGRLKIDFGDSARADDARQLFVLAGAAEEGFMTAELAGVIKRLWKDSGVQACFNRSREYQLNDSAAYYLNDLDRIAQPNYIPTQQDVLRTRVKTTGIVETHFTFKDLHFKMFDVGGQRSERKKWIHCFEGVTAIIFCVALSDYDLVLAEDEEMNRMHESMKLFDSICNNKWFTDTSIILFLNKKDLFEEKIKKSPLTICYPEYAGSNTYEEAAAYIQCQFEDLNKRKDTKEIYTHFTCATDTKNVQFVFDAVTDVIIKNNLKDCGLF GNAO GCTLSAEERAALERSKAIEKNLKEDGISAAKDVKLLLLGAGESGKSTIVKQMKIIHEDGFSGEDVKQYKPVVYSNTIQSLAAIVRAMDTLGIEYGDKERKADAKMVCDVVSRMEDTEPFSAELLSAMMRLWGDSGIQECFNRSREYQLNDSAKYYLDSLDRIGAADYQPTEQDILRTRVKTTGIVETHFTFKNLHFRLFDVGGQRSERKKWIHCFEDVTAIIFCVALSGYDQVLHEDETTNRMHESLMLFDSICNNKFFIDTSIILFLNKKDLFGEKIKKSPLTICFPEYTGPNTYEDAAAYIQAQFESKNRSPNKEIYCHMTCATDTNNIQVVFDAVTDIIIANNLRGCGLY GNAI3 GCTLSAEDKAAVERSKMIDRNLREDGEKAAKEVKLLLLGAGESGKSTIVKQMKIIHEDGYSEDECKQYKVVVYSNTIQSIIAIIRAMGRLKIDFGEAARADDARQLFVLAGSAEEGVMTPELAGVIKRLWRDGGVQACFSRSREYQLNDSASYYLNDLDRISQSNYIPTQQDVLRTRVKTTGIVETHFTFKDLYFKMFDVGGQRSERKKWIHCFEGVTAIIFCVALSDYDLVLAEDEEMNRMHESMKLFDSICNNKWFTETSIILFLNKKDLFEEKIKRSPLTICYPEYTGSNTYEEAAAYIQCQFEDLNRRKDTKEIYTHFTCATDTKNVQFVFDAVTDVIIKNNLKECGLY GRB2 MEAIAKYDFKATADDELSFKRGDILKVLNEECDQNWYKAELNGKDGFIPKNYIEMKPH GRAP2 GRVRWARALYDFEALEDDELGFHSGEVVEVLDSSNPSWWTGRLHNKLGLFPANYVAPMTR IMB1 
MELITILEKTVSPDRLELEAAQKFLERAAVENLPTFLVELSRVLANPGNSQVARVAAGLQIKNSLTSKDPDIKAQYQQRWLAIDANARREVKNYVLQTLGTETYRPSSASQCVAGIACAEIPVNQWPELIPQLVANVTNPNSTEHMKESTLEAIGYICQDIDPEQLQDKSNEILTAIIQGMRKEEPSNNVKLAATNALLNSLEFTKANFDKESERHFIMQVVCEATQCPDTRVRVAALQNLVKIMSLYYQYMETYMGPALFAITIEAMKSDIDEVALQGIEFWSNVCDEEMDLAIEASEAAEQGRPPEHTSKFYAKGALQYLVPILTQTLTKQDENDDDDDWNPCKAAGVCLMLLATCCEDDIVPHVLPFIKEHIKNPDWRYRDAAVMAFGCILEGPEPSQLKPLVIQAMPTLIELMKDPSVVVRDTAAWTVGRICELLPEAAINDVYLAPLLQCLIEGLSAEPRVASNVCWAFSSLAEAAYEAADVADDQEEPATYCLSSSFELIVQKLLETTDRPDGHQNNLRSSAYESLMEIVKNSAKDCYPAVQKTTLVIMERLQQVLQMESHIQSTSDRIQFNDLQSLLCATLQNVLRKVQHQDALQISDVVMASLLRMFQSTAGSGGVQEDALMAVSTLVEVLGGEFLKYMEAFKPFLGIGLKNYAEYQVCLAAVGLVGDLCRALQSNIIPFCDEVMQLLLENLGNENVHRSVKPQILSVFGDIALAIGGEFKKYLEVVLNTLQQASQAQVDKSDYDMVDYLNELRESCLEAYTGIVQGLKGDQENVHPDVMLVQPRVEFILSFIDHIAGDEDHTDGVVACAAGLIGDLCTAFGKDVLKLVEARPMIHELLTEGRRSKTNKAKTLATWATKELRKLKNQA NR1H4 KTELTPDQQTLLHFIMDSYNKQRMPQEITNKILKEEFSAEENFLILTEMATNHVQVLVEFTKKLPGFQTLDHEDQIALLKGSAVEAMFLRSAEIFNKKLPSGHSDLLEERIRNSGISDEYITPMFSFYKSIGELKMTQEEYALLTAIVILSPDRQYIKDREAVEKLQEPLLDVLQKLCKIHQPENPQHFACLLGRLTELRTFNHHHAEMLMSWRVNDHK NR1I2 DLCSLKVSLQLRGEDGSVWNYKPPADSGGKEIFSLLPHMADMSTYMFKGIISFAKVISYFRDLPIEDQISLLKGAAFELCQLRFNTVFNAETGTWECGRLSYCLEDTAGGFQQLLLEPMLKFHYMLKKLQLHEEEYVLMQAISLFSPDRPGVLQHRVVDQLQEQFAITLKSYIECNRPQPAHRFLFLKIMAMLTELRSINAQHTQRLLRIQDIHPFATPLMQELFGITGS GCR LTPTLVSLLEVIEPEVLYAGYDSSVPDSTWRIMTTLNMLGGRQVIAAVKWAKAIPGFRNLHLDDQMTLLQYSWMFLMAFALGWRSYRQSSANLLCFAPDLIINEQRMTLPCMYDQCKHMLYVSSELHRLQVSYEEYLCMKTLLLLSSVPKDGLKSQELFDEIRMTYIKELGKAIVKREGNSSQNWQRFYQLTKLLDSMHEVVENLLNYCFQTFLDKTMSIEFPEMLAEIITNQIPKYSNGNIKKLLFHQK RXRG STNDPVTNICHAADKQLFTLVEWAKRIPHFSDLTLEDQVILLRAGWNELLIASFSHRSVSVQDGILLATGLHVHRSSAHSAGVGSIFDRVLTELVSKMKDMQMDKSELGCLRAIVLFNPDAKGLSNPSEVETLREKVYATLEAYTKQKYPEQPGRFAKLLLRLPALRSIGLKCLEHLFFFKLIGDTPIDTFLMEMLETPLQIT MD2L1
ALQLSREQGITLRGSAEIVAEFFSFGINSILYQRGIYPSETFTRVQKYGLTLLVTTDLELIKYLNNVVEQLKDWLYKCSVQKLVVVISNIESGEVLERWQFDIECDKTAKDDSAPREKSQKAIQDEIRSVIRQITATVTFLPLLEVSCSFDLLIYTDKDLVVPEKWEESGPQFITNSEEVRLRSFTTTIHKVNSMVAYKIPVND 2B11 GDTRPRFLWQLKFECHFFNGTERVRLLERCIYNQEESVRFDSDVGEYRAVTELGRPDAEYWNSQKDLLEQRRAAVDTYCRHNYGVGESFTVQRRVEPKVTVYPSKTQPLQHHNLLVCSVSGFYPGSIEVRWFRNGQEEKAGVVSTGLIQNGDWTFQTLVMLETVPRSGEVYTCQVEHPSVTSPLTVEWRARSESAQSK PCNA MFEARLVQGSILKKVLEALKDLINEACWDISSSGVNLQSMDSSHVSLVQLTLRSEGFDTYRCDRNLAMGVNLTSMSKILKCAGNEDIITLRAEDNADTLALVFEAPNQEKVSDYEMKLMDLDVEQLGIPEQEYSCVVKMPSGEFARICRDLSHIGDAVVISCAKDGVKFSASGELGNGNIKLSQTSNVDKEEEAVTIEMNEPVQLTFALRYLNFFTKATPLSSTVTLSMSADVPLVVEYKIADMGHLKYYLAPKIEDEEGS MK03 YTQLQYIGEGAYGMVSSAYDHVRKTRVAIKKISPFEHQTYCQRTLREIQILLRFRHENVIGIRDILRASTLEAMRDVYIVQDLMETDLYKLLKSQQLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLINTTCDLKICDFGLARIADPEHDHTGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINMKARNYLQSLPSKTKVAWAKLFPKSDSKALDLLDRMLTFNPNKRITVEEALAHPYL KAPCB FERKKTLGTGSFGRVMLVKHKATEQYYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVRLEYAFKDNSNLYMVMEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDHQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVSDIKTHKWF SRPK2 YHVIRKLGWGHFSTVWLCWDMQGKRFVAMKVVKSAQHYTETALDEIKLLKCVRESDPSDPNKDMVVQLIDDFKISGMNGIHVCMVFEVLGHHLLKWIIKSNYQGLPVRCVKSIIRQVLQGLDYLHSKCKIIHTDIKPENILMCVDDAYVRRMAAEATEWQKAGAPPPSGSAVSTAPQQKPIGKISKNKKKKLKKKQKRQAELLEKRLQEIEELEREAERKIIEENITSAAPSNDQDGEYCPEVKLKTTGLEEAAEAETAKDNGEAEDQEEKEDAEKENIEKDEDDVDQELANIDPTWIESPKTNGHIENGPFSLEQQLDDEDDDEEDCPNPEEYNLDEPNAESDYTYSSSYEQFNGELPNGRHKIPESQFPEFSTSLFSGSLEPVACGSVLSEGSPLTEQEESSPSHDRSRTVSASSTGDLPKAKTRAADLLVNPLDPRNADKIRVKIADLGNACWVHKHFTEDIQTRQYRSIEVLIGAGYSTPADIWSTACMAFELATGDYLFEPHSGEDYSRDEDHIAHIIELLGSIPRHFALSGKYSREFFNRRGELRHITKLKPWSLFDVLVEKYGWPHEDAAQFTDFLIPMLEMVPEKRASAGECLRHP AURKB 
AQKENSYPWPYGRQTAPSGLSTLPQRVLRKEPVTPSALVLMSRSNVQPTAAPGQKVMENSSGTPDILTRHFTIDDFEIGRPLGKGKFGNVYLAREKKSHFIVALKVLFKSQIEKEGVEHQLRREIEIQAHLHHPNILRLYNYFYDRRRIYLILEYAPRGELYKELQKSCTFDEQRTATIMEELADALMYCHGKKVIHRDIKPENLLLGLKGELKIADFGWSVHAPSLRRKTMCGTLDYLPPEMIEGRMHNEKVDLWCIGVLCYELLVGNPPFESASHNETYRRIVKVDLKFPASVPMGAQDLISKLLRHNPSERLPLAQVSAHPWVRANSRRVLPPSALQSVA KAPCA FERIKTLGTGSFGRVMLVKHKETGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWF PROF1 AGWNAYIDNLMADGTCQDAAIVGYKDSPSVWAAVPGKTFVNITPAEVGVLVGKDRSSFYVNGLTLGGQKCSVIRDSLLQDGEFSMDLRTKSTGGAPTFNVTVTKTDKTLVLLMGKEGVHGGLINKKCYEMASHLRRSQY DAB2 GDGVKYKAKLIGIDDVPDARGDKMSQDSMMKLKGMAAAGRSQGQHKQRIWVNISLSGIKIIDEKTGVIEHEHPVNKISFIARDVTDNRAFGYVCGGEGQHQFFAIKTGQQAEPLVVDLKDLFQVIYNVKKKEEEKKKIEEASKAVENGSEAL RAC1 MQAIKCVVVGDGAVGKTCLLISYTTNAFPGEYIPTVFDNYSANVMVDGKPVNLGLWDTAGQEDYDRLRPLSYPQTDVFLICFSLVSPASFENVRAKWYPEVRHHCPNTPIILVGTKLDLRDDKDTIEKLKEKKLTPITYPQGLAMAKEIGAVKYLECSALTQRGLKTVFDEAIRAVLCPPPVKKRKRKC RAD51
SEIIQITTGSKELDKLLQGGIETGSITEMFGEFRTGKTQICHTLAVTCQLPIDRGGGEGKAMYIDTEGTFRPERLLAVAERYGLSGSDVLDNVAYARAFNTDHQTQLLYQASAMMVESRYALLIVDSATALYRTDYSGRGELSARQMHLARFLRMLLRLADEFGVAVVITNQVVAQVDGAAMFAADPKKPIGGNIIAHASTTRLYLRKGRGETRICKIYDSPCLPEAEAMFAINADGVGDAKD RFA1 VGQLSEGAIAAIMQKGDTNIKPILQVINIRPITTGNSPPRYRLLMSDGLNTLSSFMLATQLNPLVEEEQLSSNCVCQIHRFIVNTLKDGRRVVILMELEVLKSAEAVGVKIGNPVPYNEG U2AF1 LRCAVSDVEMQEHYDEFFEEVFTEMEEKYGEVEEMNVCDNLGDHLVGNVYVKFRREEDAEKAVIDLNNRWFNGQPIHAELSPV IPSP HRHHPREMKKRVEDLHVGATVAPSSRRDFTFDLYRALASAAPSQSIFFSPVSISMSLAMLSLGAGSSTKMQILEGLGLNLQKSSEKELHRGFQQLLQELNQPRDGFQLSLGNALFTDLVVDLQDTFVSAMKTLYLADTFPTNFRDSAGAMKQINDYVAKQTKGKIVDLLKNLDSNAVVIMVNYIFFKAKWETSFNHKGTQEQDFYVTSETVVRVPMMSREDQYHYLLDRNLSCRVVGVPYQGNATALFILPSEGKMQQVENGLSEKTLRKWLKMFKKRQLELYLPKFSIEGSYQLEKVLPSLGISNVFTSHADLSGISNHSNIQVSEMVHKAVVEVDESGTRAAAATGTIFTFRSARLNSQRLVFNRPFLMFIVDNNILFLGKVNRP PLCG1 TFKCAVKALFDYKAQREDELTFIKSAIIQNVEKQEGGWWRGDYGGKKQLWFPSNYVEEMVN CACB1 RPSDSDVSLEEDREALRKEAERQALAQLEKAKTKPVAFAVRTNVGYNPSPGDEVPVQGVAITFEPKDFLHIKEKYNNDWWIGRLVKEGCEVGFIPSPVKLDSLRLLQEQKLRQNRLGSSKSGDNSSSSLGDVVTGTRRPTPPASAKQKQKSTEHVPPYDVVPSMRPIILVGPSLKGYEVTDMMQKALFDFLKHRFDGRISITRVTADISLAKRSVLNNPSKHIIIERSNTRSSLAEVQSEIERIFELARTLQLVALDADTINHPAQLSKTSLAPIIVYIKITSPKVLQRLIKSRGKSQSKHLNVQIAASEKLAQCPPEMFDIILDENQLEDACEHLAEYLEAYWKA MDM4 QVRPKLPLLKILHAAGAQGEMFTVKEVMHYLGQYIMVKQLYDQQEQHMVYCGGDLLGELLGRQSFSVKDPSPLYDMLRKNL NXF1 PEQQEMLQAFSTQSGMNLEWSQKCLQDNNWDYTRSAQAFTHLKAKGEIPEVAFMK T2FA SGDVQVTEDAVRRYLTRKPMTTKDLLKKFQTKKTGLSSEQTVNVLAQILKRLNPERKMINDKMHFSLKE TPA IKGGLFADIASHPWQAAIFAKHRRSPGERFLCGGILISSCWILSAAHCFQERFPPHHLTVILGRTYRVVPGEEEQKFEVEKYIVHKEFDDDTYDNDIALLQLKSDSSRCAQESSVVRTVCLPPADLQLPDWTECELSGYGKHEALSPFYSERLKEAHVRLYPSSRCTSQHLLNRTVTDNMLCAGDTRSGGPQANLHDACQGDSGGPLVCLNDGRMTLVGIISWGLGCGQKDVPGVYTKVTNYLDWIRDNMRP TERF1 EEEEEDAGLVAEAEAVAAGWMLDFLCLSLCRAFRDGRSEDFRRTRNSAEAIIHGLSSLTACQLRTIYICQFLTRIAAGKTLDAQFENDERITPLESALMIWGSIEKEHDKLHEEIQNLIKIQAIAVCMENGNFKEAEEVFERIFGDPNSHMPFKSKLLMIISQKDTFHSFFQHFSYNHMMEKIKSYVNYVLSEKSSTFLMKAAAKVVESKR APLP2 
VKAVCSQEAMTGPCRAVMPRWYFDLSKGKCVRFIYGGCGGNRNNFESEDYCMAVCKAMI ACRO IVGGKAAQHGAWPWMVSLQIFTYNSHRYHTCGGSLLNSRWVLTAAHCFVGKNNVHDWRLVFGAKEITYGNNKPVKAPLQERYVEKIIIHEKYNSATEGNDIALVEITPPISCGRFIGPGCLPHFKAGLPRGSQSCWVAGWGYIEEKAPRPSSILMEARVDLIDLDLCNSTQWYNGRVQPTNVCAGYPVGKIDTCQGDSGGPLMCKDSKESAYVVVGITSWGVGCARAKRPGIYTATWPYLNWIASKIGSNALRMIQSATPPPPTTRPPPIRPPFSHPISAHLPWYFQPPPRPLPPRPPAAQ RL40 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG SORBS GEIGEAIAKYNFNADTNVELSLRKGDRVILLKRVDQNWYEGKIPGTNRQGIFPVSYVEVVKK SPF45
VVLLRNMVGAGEVDEDLEVETKEECEKYGKVGKCVIFEIPGAPDDEAVRIFLEFERVESAIKAVVDLNGRYFGGRVVKAC VINC PVFHTRTIESILEPVAQQISHLVIMHEEGEVDGKAIPDLTAPVAAVQAAVSNLVRVGKETVQTTEDQILKRDMPPAFIKVENACTKLVQAAQMLQSDPYSVPARDYLIDGSRGILSGTSDLLLTFDEAEVRKIIRVCKGILEYLTVAEVVETMEDLVTYTKNLGPGMTKMAKMIDERQQELTHQEHRVMLVNSMNTVKELLPVLISAMKIFVTTKNSKNQGIEEALKNRNFTVEKMSAEINEIIRVLQLTSWDEDAWA WDR5 SSSATQSKPTPVKPNYALKFTLAGHTKAVSSVKFSPNGEWLASSSADKLIKIWGAYDGKFEKTISGHKLGISDVAWSSDSNLLVSASDDKTLKIWDVSSGKCLKTLKGHSNYVFCCNFNPQSNLIVSGSFDESVRIWDVKTGKCLKTLPAHSDPVSAVHFNRDGSLIVSSSYDGLCRIWDTASGQCLKTLIDDDNPPVSFVKFSPNGKYILAATLDNTLKLWDYSKGKCLKTYTGHKNEKYCIFANFSVTGGKWIVSGSEDNLVYIWNLQTKEIVQKLQGHTDVVISTACHPTENIIASAALENDKTIKLWKSDC XRCC4 MERKISRIHLVSEPSITHFLQVSWEKTLESGFVITLTDGHSAWTGTVSESEISQEADDMAMEKGKYVGELRKALLSGAGPADVYTFNFSKESCYFFFEKNLKDVSFRLGSFNLEKVENPAEVIRELICYCLDTIAENQAKNEHLQKENERLLRDWNDVQGRFEKCVSAKEALETDLYKRFILVLNEKKTKIRSLHNKLLNAAQEREKDIKQEG PTN22 MDQREILQKFLDEAQSKKITKEEFANEFLKLKRQSTKYKADKTYPTTVAEKPKNIKKNRYKILPYDYSRVELSLITSDEDSSYINANFIKGVYGPKAYIATQGPLSTTLLDFWRMIWEYSVLIIVMACMEYEMGKKKCERYWAEPGEMQLEFGPFSVSCEAEKRKSDYIIRTLKVKFNSETRTIYQFHYKNWPDHDVPSSIDPILELIWDVRCYQEDDSVPICIHCSAGCGRTGVICAIDYTWMLLKDGIIPENFSVFSLIREMRTQRPSLVQTQEQYELVYNAVLELFKRQMDVIRDKHSGTESQAKH
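The listing above alternates a short protein identifier with its domain sequence. For reuse in other tools, such records are commonly stored as FASTA. The following is a minimal sketch, assuming the identifiers and sequences have already been separated into pairs; the function name `to_fasta`, the 60-character line width, and the restriction to the 20 standard amino acids are illustrative assumptions and not part of the original work. The two example records are copied from the GRB2 and PLCG1 entries above.

```python
# Hypothetical helper: format (identifier, sequence) pairs as FASTA text.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: standard residues only


def to_fasta(records, width=60):
    """Return FASTA text for an iterable of (identifier, sequence) pairs."""
    lines = []
    for name, seq in records:
        # Remove whitespace introduced by line or page breaks in the listing.
        seq = "".join(seq.split()).upper()
        unexpected = set(seq) - STANDARD_AA
        if unexpected:
            raise ValueError(f"{name}: unexpected residues {sorted(unexpected)}")
        lines.append(f">{name}")
        # Wrap the sequence into fixed-width lines.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


records = [
    ("GRB2", "MEAIAKYDFKATADDELSFKRGDILKVLNEECDQNWYKAELNGKDGFIPKNYIEMKPH"),
    ("PLCG1", "TFKCAVKALFDYKAQREDELTFIKSAIIQNVEKQEGGWWRGDYGGKKQLWFPSNYVEEMVN"),
]
fasta = to_fasta(records)
print(fasta)
```

The residue check is a cheap way to catch an identifier that was accidentally fused to a sequence during copying, since digits and hyphens in an identifier are not valid residues.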
Appendix C: Vector sequences The vector sequences are given below: pR4STOP (bp: 6203): The vector for peptide phage display Length: 6203 Legend: Ptac Signal peptide 4 Stop codons P3 P8 GAATTCCCGACACCATCGAATGGTGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGTCAATTCAGGGTGGTGAATGTGAAACCAGTAACGTTATACGATGTCGCAGAGTATGCCGGTGTCTCTTATCAGACCGTTTCCCGCGTGGTGAACCAGGCCAGCCACGTTTCTGCGAAAACGCGGGAAAAAGTGGAAGCGGCGATGGCGGAGCTGAATTACATTCCCAACCGCGTGGCACAACAACTGGCGGGCAAACAGTCGTTGCTGATTGGCGTTGCCACCTCCAGTCTGGCCCTGCACGCGCCGTCGCAAATTGTCGCGGCGATTAAATCTCGCGCCGATCAACTGGGTGCCAGCGTGGTGGTGTCGATGGTAGAACGAAGCGGCGTCGAAGCCTGTAAAGCGGCGGTGCACAATCTTCTCGCGCAACGCGTCAGTGGGCTGATCATTAACTATCCGCTGGATGACCAGGATGCCATTGCTGTGGAAGCTGCCTGCACTAATGTTCCGGCGTTATTTCTTGATGTCTCTGACCAGACACCCATCAACAGTATTATTTTCTCCCATGAAGACGGTACGCGACTGGGCGTGGAGCATCTGGTCGCATTGGGTCACCAGCAAATCGCGCTGTTAGCGGGCCCATTAAGTTCTGTCTCGGCGCGTCTGCGTCTGGCTGGCTGGCATAAATATCTCACTCGCAATCAAATTCAGCCGATAGCGGAACGGGAAGGCGACTGGAGTGCCATGTCCGGTTTTCAACAAACCATGCAAATGCTGAATGAGGGCATCGTTCCCACTGCGATGCTGGTTGCCAACGATCAGATGGCGCTGGGCGCAATGCGCGCCATTACCGAGTCCGGGCTGCGCGTTGGTGCGGATATCTCGGTAGTGGGATACGACGATACCGAAGACAGCTCATGTTATATCCCGCCGTTAACCACCATCAAACAGGATTTTCGCCTGCTGGGGCAAACCAGCGTGGACCGCTTGCTGCAACTCTCTCAGGGCCAGGCGGTGAAGGGCAATCAGCTGTTGCCCGTCTCACTGGTGAAAAGAAAAACCACCCTGGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCATTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACAATTCTCATGTTTGACAGCTTATCATCGACTGCACGGTGCACCAATGCTTCTGGCGTCAGGCAGCCATCGGAAGCTGTGGTATGGCTGTGCAGGTCGTAAATCACTGCATAATTCGTGTCGCTCAAGGCGCACTCCCGTTCTGGATAATGTTTTTTGCGCCGACATCATAACGGTTCTGGCAAATATTCTGAAATGAGCTGTTGACAATTAATCATCGGCTCGTATAATGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCCAGTCCGTTTAGGTGTTTTCACGAGCACTTCACCAACAAGGACCATAGATTATGAAAAAGAATATCGCATTTCTTCTTGCATCTATGTTCGTTTTTTCTATTGCTACAAATGCCTATGCAGCCTCTTCATCTGGCTAATAATGATGAGGTGGAGGATCCGGAGGAGGCGCCGAGGGTGACGATCCCGCAAAAGCGGCCTTTAACTCCCTGCAAGCCTCAGCGACCGAATATATCGGTTATGCGTGGGCGATGGTTGTTGTCATTGTCGGCGCAACTATCGGTATCAAGCTGTTTAAGAAATTCACCTCGAAAGCAAGCTGATAAACCGATACA
ATTAAAGGCTCCTTTTGGAGCCTTTTTTTTTGGAGATTTTCAACGTGAAAAAATTATTATTCGCAATTCCTTTAGTTGTTCCTTTCTATTCTCACTCCGCTGAAACTGTTGAAAGTTGTTTAGCAAAACCCCATACAGAAAATTCATTTACTAACGTCTGGAAAGACGACAAAACTTTAGATCGTTACGCTAACTATGAGGGTTGTCTGTGGAATGCTACAGGCGTTGTAGTTTGTACTGGTGACGAAACTCAGTGTCTAGCTAGAGTGGCGGTGGCTCTGGTTCCGGTGATTTTGATTATGAAAAGATGGCAAACGCTAATAAGGGGGCTATGACCGAAAATGCCGATGAAAACGCGCTACAGTCTGACGCTAAAGGCAAACTTGATTCTGTCGCTACTGATTACGGTGCTGCTATCGATGGTTTCATTGGTGACGTTTCCGGCCTTGCTAATGGTAATGGTGCTACTGGTGATTTTGCTGGCTCTAATTCCCAAATGGCTCAAGTCGGTGACGGTGATAATTCACCTTTAATGAATAATTTCCGTCAATATTTACCTTCCCTCCCTCAATCGGTTGAATGTCGCCCTTTTGTCTTTAGCGCTGGTAAACCATATGAATTTTCTATTGATTGTGACAAAATAAACTTATTCCGTGGTGTCTTTGCGTTTCTTTTATATGTTGCCACCTTTATGTATGTATTTTCTACGTTTGCTAACATACTGCGTAATAAGGAGTCTTAATCATGCCAGTTCTTTTGGCTAGCGCCGCCCTATACCTTGTCTGCCTCCCCGCGTTGCGTCGCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGGATTCACCACTCCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGGCAGAACATATCCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCTGGCCACGGGTGCGCATGATCGTGCTCCTGTCGTTGAGGACCCGGCTAGGCTGGCGGGGTTGCCTTACTGGTTAGCAGAATGAATCACCGATACGCGAGCGAACGTGAAGCGACTGCTGCTGCAAAACGTCTGCGACCTGAGCAACAACATGAATGGTCTTCGGTTTCCGTGTTTCGTAAAGTCTGGAAACGCGGAAGTCAGCGCCCTGCACCATTATGTTCCGGATCTGCATCGCAGGATGCTGCTGGCTACCCTGTGGAACACCTACATCTGTATTAACGAAGCGCTGGCATTGACCCTGAGTGATTTTTCTCTGGTCCCGCCGCATCCATACCGCCAGTTGTTTACCCTCACAACGTTCCAGTAACCGGGCATGTTCATCATCAGTAACCCGTATCGTGAGCATCCTCTCTCGTTTCATCGGTATCATTACCCCCATGAACAGAAATTCCCCCTTACACGGAGGCATCAAGTGACCAAACAGGAAAAAACCGCCCTTAACATGGCCCGCTTTATCAGAAGCCAGACATTAACGCTTCTGGAGAAACTCAACGAGCTGGACGCGGATGAACAGGCAGACATCTGTGAATCGCTTCACGACCACGCTGATGAGCTTTACCGCAGGATCCGGAAATTGTAAACGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGACTCCAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCTATGGCCCACTACGTGAACCATCACCCTAATCAAGTTTTTTGGGGTCGAGGTGCCGTAAAGCACTAAATCGGAACCCTAAAGGGAGCCCCCGATTTAGAGCTTGACGGGGAAAGCCGGCGAACGTGGCGAGAAAGGAAGGGAAGAAAGCGAAAGG
AGCGGGCGCTAGGGCGCTGGCAAGTGTAGCGGTCACGCTGCGCGTAACCACCACACCCGCCGCGCTTAATGCGCCGCTACAGGGCGCGTCCGGATCCTGCCTCGCGCGTTTCGGTGATGACGGTGAAAACCTCTGACACATGCAGCTCCCGGAGACGGTCACAGCTTGTCTGTAAGCGGATGCCGGGAGCAGACAAGCCCGTCAGGGCGCGTCAGCGGGTGTTGGCGGGTGTCGGGGCGCAGCCATGACCCAGTCACGTAGCGATAGCGGAGTGTATACTGGCTTAACTATGCGGCATCAGAGCAGATTGTAC
TGAGAGTGCACCATATGCGGTGTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCATCAGGCGCTCTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTGCAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAACACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCT
GACGTCTAAGAAACCATTATTATCATGACATTAACCTATAAAAATAGGCGTATCACGAGGCCCTTTCGTCTTCAA
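The legend for pR4STOP lists "4 Stop codons" as one of the annotated features. Because the highlighting that marked each feature is not available in plain text, a short scan can be used to locate such a feature in the sequence. The following is a minimal sketch, assuming the feature is a run of four stop codons placed back to back; the function name `find_stop_runs` is hypothetical, and the excerpt is a short stretch copied from the pR4STOP sequence above rather than the full vector.

```python
# Hypothetical helper: locate runs of consecutive stop codons in a DNA string.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_stop_runs(seq, n=4):
    """Return 0-based start positions where n stop codons occur back to back."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 3 * n + 1):
        # Read n consecutive codons starting at position i.
        codons = [seq[i + 3 * k: i + 3 * k + 3] for k in range(n)]
        if all(codon in STOP_CODONS for codon in codons):
            hits.append(i)
    return hits


# Excerpt copied from the pR4STOP sequence (not the full vector).
excerpt = "GCAGCCTCTTCATCTGGCTAATAATGATGAGGTGGAGGATCCGGAGGAGGC"
for start in find_stop_runs(excerpt):
    print(start, excerpt[start:start + 12])
```

Scanning every offset rather than a single reading frame keeps the sketch independent of where the excerpt begins.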
Primers for Sequencing: YH005 (M13 forward) TGT AAA ACG ACG GCC AGT CGA GCA CTT CAC CAA CAA YH006 (M13 reverse) CAG GAA ACA GCT ATG ACC GAC AAC AAC CAT CGC CCA pHH0103: The vector for protein expression Length: 6734 Legend: GST tag His Tag Thrombin cleavage site Stop codon PTac promoter Protein insertion site GAATTCCCGACACCATCGAATGGTGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGTCAATTCAGGGTGGTGAATGTGAAACCAGTAACGTTATACGATGTCGCAGAGTATGCCGGTGTCTCTTATCAGACCGTTTCCCGCGTGGTGAACCAGGCCAGCCACGTTTCTGCGAAAACGCGGGAAAAAGTGGAAGCGGCGATGGCGGAGCTGAATTACATTCCCAACCGCGTGGCACAACAACTGGCGGGCAAACAGTCGTTGCTGATTGGCGTTGCCACCTCCAGTCTGGCCCTGCACGCGCCGTCGCAAATTGTCGCGGCGATTAAATCTCGCGCCGATCAACTGGGTGCCAGCGTGGTGGTGTCGATGGTAGAACGAAGCGGCGTCGAAGCCTGTAAAGCGGCGGTGCACAATCTTCTCGCGCAACGCGTCAGTGGGCTGATCATTAACTATCCGCTGGATGACCAGGATGCCATTGCTGTGGAAGCTGCCTGCACTAATGTTCCGGCGTTATTTCTTGATGTCTCTGACCAGACACCCATCAACAGTATTATTTTCTCCCATGAAGACGGTACGCGACTGGGCGTGGAGCATCTGGTCGCATTGGGTCACCAGCAAATCGCGCTGTTAGCGGGCCCATTAAGTTCTGTCTCGGCGCGTCTGCGTCTGGCTGGCTGGCATAAATATCTCACTCGCAATCAAATTCAGCCGATAGCGGAACGGGAAGGCGACTGGAGTGCCATGTCCGGTTTTCAACAAACCATGCAAATGCTGAATGAGGGCATCGTTCCCACTGCGATGCTGGTTGCCAACGATCAGATGGCGCTGGGCGCAATGCGCGCCATTACCGAGTCCGGGCTGCGCGTTGGTGCGGATATCTCGGTAGTGGGATACGACGATACCGAAGACAGCTCATGTTATATCCCGCCGTTAACCACCATCAAACAGGATTTTCGCCTGCTGGGGCAAACCAGCGTGGACCGCTTGCTGCAACTCTCTCAGGGCCAGGCGGTGAAGGGCAATCAGCTGTTGCCCGTCTCACTGGTGAAAAGAAAAACCACCCTGGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCATTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACAATTCTCATGTTTGACAGCTTATCATCGACTGCACGGTGCACCAATGCTTCTGGCGTCAGGCAGCCATCGGAAGCTGTGGTATGGCTGTGCAGGTCGTAAATCACTGCATAATTCGTGTCGCTCAAGGCGCACTCCCGTTCTGGATAATGTTTTTTGCGCCGACATCATAACGGTTCTGGCAAATATTCTGAAATGAGCTGTTGACAATTAATCATCGGCTCGTATAATGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCCAGTCCGTTTAGGTGTTTTCACGAGCACTTCACCAACAAGGACCATAGATTATGAAAATCGAAGAACACCATCACCATCACCATTCCAGCGGTAAGCTTATGTCCCCTATACTAGGTTATTGGAAAATTAAGGGCCTTGTGCAACCCACTCGACTTCTTTTGGAATATCTTGAAGAAAAATATGAAGAGCATTTGTATGAGCGCGAT
GAAGGTGATAAATGGCGAAACAAAAAGTTTG
AATTGGGTTTGGAGTTTCCCAATCTTCCTTATTATATTGATGGTGATGTTAAATTAACACAGTCTATGGCCATCATACGTTATATAGCTGACAAGCACAACATGTTGGGTGGTTGTCCAAAAGAGCGTGCAGAGATTTCAATGCTTGAAGGAGCGGTTTTGGATATTAGATACGGTGTTTCGAGAATTGCATATAGTAAAGACTTTGAAACTCTCAAAGTTGATTTTCTTAGCAAGCTACCTGAAATGCTGAAAATGTTCGAAGATCGTTTATGTCATAAAACATATTTAAATGGTGATCATGTAACCCATCCTGACTTCATGTTGTATGACGCTCTTGATGTTGTTTTATACATGGACCCAATGTGCCTGGATGCGTTCCCAAAATTAGTTTGTTTTAAAAAACGTATTGAAGCTATCCCACAAATTGATAAGTACTTGAAATCCAGCAAGTATATAGCATGGCCTTTGCAGGGCTGGCAAGCCACGTTTGGTGGTGGCGACCATCCTCCAAAATCGGATCTAGAAGTTCTGTTCCAGGGGCCCCTGTCCAGCGGTCTGGTTCCGCGTGGTTCCGGTACCGCGGCCCAGCCGGCCTTTTTTGCGGCCGCATAATAAACCGATACAATTAAAGGCTCCTTTTGGAGCCTTTTTTTTTGGAGATTTTCAACGTGAAAAAATTATTATTCGCAATTCCTTTAGTTGTTCCTTTCTATTCTCACTCCGCTGAAACTGTTGAAAGTTGTTTAGCAAAACCCCATACAGAAAATTCATTTACTAACGTCTGGAAAGACGACAAAACTTTAGATCGTTACGCTAACTATGAGGGTTGTCTGTGGAATGCTACAGGCGTTGTAGTTTGTACTGGTGACGAAACTCAGTGTCTAGCTAGAGTGGCGGTGGCTCTGGTTCCGGTGATTTTGATTATGAAAAGATGGCAAACGCTAATAAGGGGGCTATGACCGAAAATGCCGATGAAAACGCGCTACAGTCTGACGCTAAAGGCAAACTTGATTCTGTCGCTACTGATTACGGTGCTGCTATCGATGGTTTCATTGGTGACGTTTCCGGCCTTGCTAATGGTAATGGTGCTACTGGTGATTTTGCTGGCTCTAATTCCCAAATGGCTCAAGTCGGTGACGGTGATAATTCACCTTTAATGAATAATTTCCGTCAATATTTACCTTCCCTCCCTCAATCGGTTGAATGTCGCCCTTTTGTCTTTAGCGCTGGTAAACCATATGAATTTTCTATTGATTGTGACAAAATAAACTTATTCCGTGGTGTCTTTGCGTTTCTTTTATATGTTGCCACCTTTATGTATGTATTTTCTACGTTTGCTAACATACTGCGTAATAAGGAGTCTTAATCATGCCAGTTCTTTTGGCTAGCGCCGCCCTATACCTTGTCTGCCTCCCCGCGTTGCGTCGCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGGATTCACCACTCCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGGCAGAACATATCCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCTGGCCACGGGTGCGCATGATCGTGCTCCTGTCGTTGAGGACCCGGCTAGGCTGGCGGGGTTGCCTTACTGGTTAGCAGAATGAATCACCGATACGCGAGCGAACGTGAAGCGACTGCTGCTGCAAAACGTCTGCGACCTGAGCAACAACATGAATGGTCTTCGGTTTCCGTGTTTCGTAAAGTCTGGAAACGCGGAAGTCAGCGCCCTGCACCATTATGTTCCGGATCTGCATCGCAGGATGCTGCTGGCTACCCTGTGGAACACCTACATCTGTATTAACGAAGCGCTGGCATTGACCCTGAGTGATTTTTCTCTGGTCCCGCCGCATCCATACCGCCAGTTGTTTACCCTCACAACGTTCCAGTAACCGGG
CATGTTCATCATCAGTAACCCGTATCGTGAGCATCCTCTCTCGTTTCATCGGTATCATTACCCCCATGAACAGAAATTCCCCCTTACACGGAGGCATCAAGTGACCAAACAGGAAAAAACCGCCCTTAACATGGCCCGCTTTATCAGAAGCCAGACATTAACGCTTCTGGAGAAACTCAACGAGCTGGACGCGGATGAACAGGCAGACATCTGTGAATCGCTTCACGACCACGCTGATGAGCTTTACCGCAGGATCCGGAAATTGTAAACGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGACTCCAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCTATGGCCCACTACGTGAACCATCACCCTAATCAAGTTTTTTGGGGTCGAGGTGCCGTAAAGCACTAAATCGGAACCCTAAAGGGAGCCCCCGATTTAGAGCTTGACGGGGAAAGCCGGCGAACGTGGCGAGAAAGGAAGGGAAGAAAGCGAAAGGAGCGGGCGCTAGGGCGCTGGCAAGTGTAGCGGTCACGCTGCGCGTAACCACCACACCCGCCGCGCTTAATGCGCCGCTACAGGGCGCGTCCGGATCCTGCCTCGCGCGTTTCGGTGATGACGGTGAAAACCTCTGACACATGCAGCTCCCGGAGACGGTCACAGCTTGTCTGTAAGCGGATGCCGGGAGCAGACAAGCCCGTCAGGGCGCGTCAGCGGGTGTTGGCGGGTGTCGGGGCGCAGCCATGACCCAGTCACGTAGCGATAGCGGAGTGTATACTGGCTTAACTATGCGGCATCAGAGCAGATTGTACTGAGAGTGCACCATATGCGGTGTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCATCAGGCGCTCTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGT
TCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTGCAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAACACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGACGTCTAAGAAACCATTATTATCATGACATTAACCTATAAAAATAGGCGTATCACGAGGCCCTTTCGTCTTCAA