
Generating peptide probes against cancer-related peptide recognition domains using phage display

by

Yogesh Hooda

A thesis submitted in conformity with the requirements for the degree of Master of Science

Graduate Department of Molecular Genetics University of Toronto

© Copyright by Yogesh Hooda 2012


Generating peptide probes against cancer-related peptide recognition domains using phage display

Yogesh Hooda

Master of Science

Graduate Department of Molecular Genetics

University of Toronto

2012

Abstract

Peptide recognition domains (PRDs) bind to short linear motifs on their biological partners and are found in several cellular pathways, including those critical to tumorigenesis. In this study, I aimed to generate peptide probes against PRDs present on proteins involved in ovarian cancer. Using bioinformatics, I identified 66 potential PRDs present on these proteins. I then used peptide phage display to successfully generate peptides against 27 of the 66 domains. To validate my results, I performed an extensive literature review and structural analysis. In several cases, the phage display-derived binding preferences are similar to those reported in previous studies. However, for a subset of domains, I identified non-canonical binding preferences that have not previously been reported in the literature. The binding preferences obtained in this study can be used to design intracellular probes for studying the role of these PRDs in biological pathways important in ovarian cancer.


Acknowledgments

It is hard to imagine that it has already been two years since I started my graduate studies. Working at the Sidhu and the Kim labs has been a wonderful experience and I would like to take this opportunity to thank all the people who helped me through this part of my life.

First and foremost, I would like to thank my supervisors, Dev Sidhu and Philip Kim, who gave me the opportunity to work in their labs and guided me throughout my stay here. They both have been an immense source of inspiration. I would also like to thank my committee members, Frank Sicheri and Tim Hughes, for their constructive criticism and suggestions.

During my stay, I came across an awesome set of people at both the Sidhu and the Kim labs. I would especially like to thank Joan for all his discussions and guidance during the latter part of my project. In the Sidhu lab, I would like to give special thanks to Maruti, Andreas, Megan, Haiming, Gang and Linda for their kind help and support. I would also like to thank Mark, Recep, Simon, Roland, Clare, Kurt and Ylva in the Kim lab.

I would also like to thank all my friends here in Toronto, around the world and back home in India for sharing with me their adventures or misadventures and listening to mine. Their friendships made Toronto a great city to stay in. I would especially like to thank Senjuti for her incredible love and encouragement. Her companionship has kept me going through all the ups and downs of my project.

Lastly, I am grateful to my family for their constant love and support. They have always been a tremendous source of strength and inspiration for me.


Table of Contents

Acknowledgments

Table of Contents

List of Tables

List of Figures

List of Appendices

1 Introduction

1.1 Overview

1.2 Peptide recognition domains

1.2.1 Properties of domain-peptide interactions

1.2.2 Role in biological pathways

1.3 Peptide-recognition domains as therapeutic targets

1.3.1 Bcl-2

1.4 Studying peptide recognition domains using peptide probes

1.4.1 Understanding structure and binding properties

1.4.2 Elucidating biological role

1.4.3 Validating drug targets

1.4.4 Drug Discovery

1.5 Goal of the project

2 Identification of peptide recognition domains essential in ovarian cancer

2.1 Introduction

2.1.1 Whole Genome RNAi screen

2.1.2 Computational methods to identify peptide recognition domains

2.2 Methods

2.2.1 Identification of peptide recognition domains

2.2.2 Manual filtering and literature review of potential domains from PepX

2.3 Results and Discussion

2.3.1 Analysis of 1695 genes obtained from whole genome RNAi screens

2.3.2 Literature review of domain list obtained from the computational pipeline

2.4 Summary

3 Identification of peptide binders using phage display

3.1 Introduction

3.1.1 Displaying peptide on phage particles

3.1.2 Site-directed mutagenesis and phage library design

3.1.3 Selection strategy

3.1.4 Selection of tight-binding peptides and identification of binding specificities

3.2 Methods

3.2.1 Strains

3.2.1 Protein expression and purification

3.2.2 Library construction and design

3.2.3 Phage Display selections

3.2.4 Calculation of enrichment ratio and pool ELISA

3.2.5 Clonal ELISA and sequencing of peptides

3.2.6 Structural modeling of phage-display results

3.3 Results and Discussion

3.3.1 Selection of peptide binders using phage display

3.3.2 Validation of tight binder using clonal ELISA

3.3.3 Identification of binding preferences and literature validation

3.3.4 Cellular signaling

3.3.4.1 SH3

3.3.4.2 PDZ

3.3.4.3 G-alpha

3.3.4.4 14-3-3

3.3.4.5 Penta-EF hand

a Calpain small regulatory subunit

b Programmed Cell Death Protein 6

3.3.5 Cytoskeleton regulation

3.3.5.1 Dynein light chain

3.3.5.2 CAP/Gly

3.3.5.3 Alpha-vinculin head domain

3.3.6 Intracellular transport

3.3.6.1 Importin beta

3.3.6.2 UBA

3.3.6.3 Bro1

3.3.6.4 Clathrin heavy chain

3.3.7 Genome Regulation

3.3.7.1 PCNA

3.3.7.2 OB-fold

3.3.7.3 Ligand binding domain of nuclear receptors

3.3.7.4 WD40 domains

3.3.7.5 TRF homology domain

3.3.8 Miscellaneous

3.3.8.1 SWIB/MDM2

3.3.8.2 HORMA domain

3.3.8.3 eIF4E

3.3.8.4 Ubiquitin

3.4 Summary

4 Conclusions

4.1 Summary of work

4.2 Future experiments

4.3 Potential avenues for research

4.4 Application of phage-derived peptides

4.5 Final remarks

5 References


List of Tables

Table 1 List of all PRDs that have been investigated as targets for cancer therapies

Table 2 Summary of results obtained from DOMINO and PepX

Table 3 List of 66 domains selected for phage display experiments

Table 4 Summary of phage display results for 66 domains


List of Figures

Figure 1 Representative structures of PRDs present in the human genome

Figure 2 Peptide and small-molecule inhibitors of Bcl-2

Figure 3 Combinatorial methods for determining binding preferences of PRDs

Figure 4 Generating intracellular Dvl2-PDZ inhibitors using phage display

Figure 5 Fluorescence polarization assays for discovery of small-molecule inhibitors

Figure 6 Whole genome RNAi screen for identifying essential genes in ovarian cancer

Figure 7 Computational strategy for identifying potential peptide binding domains

Figure 8 Schematic diagram of M13 bacteriophage

Figure 9 Oligonucleotide-directed mutagenesis with an ssDNA template

Figure 10 Phage display selection for PRDs

Figure 11 Strategy for validating phage display results

Figure 12 Overview of phage display results

Figure 13 Structural and literature analysis of SH3 domains

Figure 14 Structural and literature analysis of PDZ domains

Figure 15 Structural and literature analysis of Gα subunits

Figure 16 Structural and literature analysis of 14-3-3

Figure 17 Structural and literature analysis of Penta-EF hand of CAPNS1

Figure 18 Structural and literature analysis of Penta-EF hand of PDCD6

Figure 19 Structural and literature analysis of Dynein light chains

Figure 20 Structural and literature analysis of CAP/Gly domain of p150glued

Figure 21 Structural and literature analysis of Alpha-catenin/vinculin head domain

Figure 22 Structural and literature analysis of Importin beta

Figure 23 Structural and literature analysis of NXF1-UBA domain

Figure 24 Structural and literature analysis of Alix-Bro1 domain

Figure 25 Structural and literature analysis of Clathrin terminal domain

Figure 26 Structural and literature analysis of PCNA

Figure 27 Structural and literature analysis of RPA 70N OB-fold domain

Figure 28 Structural and literature analysis of NR1H4 ligand binding domain

Figure 29 Structural and literature analysis of WDR5

Figure 30 Structural and literature analysis of TRFH domain of TERF1

Figure 31 Structural and literature analysis of SWIB/MDM2

Figure 32 Structural and literature analysis of HORMA domain

Figure 33 Structural and literature analysis of eIF4E

Figure 34 Structural and literature analysis of ubiquitin


List of Appendices

Appendix A List of ovarian cancer lines

Appendix B Protein sequences of 66 domains

Appendix C Vector sequences


1. INTRODUCTION


1.1 Overview

Protein-protein interactions form the molecular basis of key regulatory and signalling pathways inside cells [1]. They help in the assembly of macromolecular complexes and the formation of modular interaction networks that regulate key biological processes such as the cell cycle, signal transduction and embryogenesis. Protein-protein interactions can be roughly categorized into two types: i) domain-domain interactions, in which two domains bind to each other, and ii) domain-peptide interactions, in which a domain binds to an unfolded linear motif on its partner [1]. Domain-peptide interactions are mediated by peptide recognition domains (PRDs), which bind to short linear motifs that often lie in disordered regions of their interaction partners [2].

Peptide recognition domains (PRDs) are ubiquitous: they assemble transient regulatory networks, recognize post-translational marks, regulate signalling molecules and provide specificity to enzymatic complexes. Given the important role of domain-peptide interactions in key cellular processes, these interactions are frequently targeted by toxins or altered by somatic mutations found in diseases including cancer. In cancer, amplified and exogenous domain-peptide interactions often lead to rewiring of cellular networks, thereby promoting tumour growth, invasion and metastasis [3]. A number of such interactions, e.g. p53/MDM2, IAP/caspase and Bcl-2/BH3, have been targeted using small molecules and peptide-based drugs [4]. The PRDs mediating these interactions form an emerging class of cancer drug targets.

PRDs have been extensively studied using peptide-based probes. These probes can be derived from known natural binding partners or generated using combinatorial methods such as phage display and peptide microarrays [5]. Peptide-based probes have been widely used to elucidate the biochemical and structural properties of interactions mediated by PRDs. They have also been used to design intracellular reagents that target PRD-mediated interactions and so help to dissect cellular pathways [5]. Such probes may also be used to identify PRDs that could serve as cancer drug targets [5].

Peptide-based probes against PRDs have also led to the development of small-molecule therapeutics against various cancers (e.g. ABT-737 against Bcl-2 and the Nutlins against MDM2) [4]. However, the number of PRDs whose role in cancer-related pathways is well understood is limited. This is largely due to the lack of high-affinity, specific probes with which to study these domains. To address this issue, I propose to use phage display to systematically generate peptide probes against different families of PRDs. The main focus of this study is to develop peptide probes against the PRDs present on proteins involved in ovarian cancer that were identified by our collaborators. The peptide probes developed here may serve as valuable tools for understanding the role of PRDs in ovarian cancer-related biological pathways.

Figure 1: Representative structures of PRDs present in the human genome. Peptide recognition domains are structurally diverse and use different binding surfaces to bind peptides. Domains as defined by CATH are shown in grey; peptide ligands are shown in green.

In the following sections, I will discuss progress made in the study of peptide recognition domains. First, I will discuss the structural properties of interactions mediated by PRDs and describe some of their biological roles. Second, I will highlight examples of PRDs that have been identified as drug targets for specific types of cancer. Third, I will present studies that have used peptide probes against PRDs and that demonstrate their utility as intracellular probes. Finally, I will elaborate on the specific aims of the current study.

1.2 Peptide recognition domains

As discussed above, PRDs bind to specific linear motifs on their interaction partners. Since the discovery of the first PRDs, a large number of such domains have been identified in the human proteome. This progress can be attributed to the development of high-throughput experimental methods that allow the identification of large numbers of protein-protein interactions [6]. Such studies have established that a significant proportion of protein-protein interactions within a cell are domain-peptide interactions mediated by dedicated peptide recognition domains [7]. Analysis of the structures of PRDs in complex with their natural or synthetic partners has led to the elucidation of their mode of function [8]. Further experimental and computational studies have highlighted the roles played by PRDs inside cells.

1.2.1 Properties of domain-peptide interactions: Peptide recognition domains are found in structurally diverse protein families (Figure 1) that are catalogued in databases such as DOMINO [9], ADAN [10] and PepX [11]. Domain-peptide interactions are often mediated by a groove-like binding interface on the PRD. The binding interface of domain-peptide interactions is ~500-1000 Å², which is smaller than that of domain-domain interactions [12]. Domain-peptide interactions are often transient and exhibit binding affinities in the low-micromolar to nanomolar range. Structurally, the binding interface is often the largest pocket on the surface of the PRD [12]. The binding surface is more hydrophobic than the overall surface of the protein but less hydrophobic than the protein core. A small subset of the residues on the binding surface contributes most of the binding energy. These residues, known as “hotspot residues”, are essential for binding, and mutation of any of them can severely affect the domain-peptide interaction [13]. The interaction between the PRD and the peptide may cause conformational changes in either the PRD or its interaction partner [12].
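As a rough guide to what the affinity range quoted above means energetically, a dissociation constant can be converted into a standard binding free energy with the textbook relation ΔG° = RT ln(Kd), where Kd is expressed relative to a 1 M standard state. The sketch below is illustrative only and is not part of the analysis in this thesis; the function name, the temperature of 298.15 K and the two example Kd values are assumptions.

```python
import math

# Gas constant in kcal/(mol*K).
GAS_CONSTANT_KCAL = 1.987204e-3


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Return the standard binding free energy in kcal/mol.

    Uses dG = RT * ln(Kd / 1 M). The default temperature (25 degrees C)
    is an assumption made for this illustration.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


# Hypothetical example values spanning the range mentioned in the text.
for label, kd in [("1 uM", 1e-6), ("1 nM", 1e-9)]:
    print(f"Kd = {label}: dG = {binding_free_energy(kd):.1f} kcal/mol")
```

Under these assumptions, a 1 µM interaction corresponds to roughly -8 kcal/mol and a 1 nM interaction to roughly -12 kcal/mol, so the whole range spans only about 4 kcal/mol.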

Many peptide recognition domains, such as the G-alpha subunits, also possess enzymatic function. The binding of peptide partners to the switch II/alpha III groove on the G-alpha subunit increases the GTPase activity of the subunit [14]. Furthermore, in the case of the ligand-binding domains of nuclear receptors, peptide binding is often dependent on the binding of a small molecule or hormone to the ligand-binding pocket [15]. Ligand binding produces a conformational change that allows the peptide to bind to the hydrophobic pocket. In these ways, PRDs often couple peptide binding to enzymatic or ligand-binding functions present on the same domain.

The binding preferences of PRDs are highly diverse. While PRDs such as the SH3, WW and EVH1 domains bind to motifs rich in proline residues, other domains such as the PDZ and CAP-Gly domains specifically recognize hydrophobic C-terminal residues of peptides [5]. The motifs recognized by PRDs are often located in disordered regions of the interacting proteins. These peptide motifs undergo a disorder-to-order transition upon binding [12]. For example, the binding of co-activators to the ligand-binding domains of nuclear receptors induces a helical conformation in the co-activator [15]. This conformational change in the co-activator molecule favors the assembly of an active transcriptional complex [15].
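The two kinds of binding preference described above can be sketched as simple sequence checks. This is an illustration only: PxxP is the commonly cited core motif for SH3 ligands, while the set of residues treated as hydrophobic, the function names and the example peptides are assumptions made here and are not taken from this study.

```python
import re

# Core proline-rich pattern often quoted for SH3 ligands (P, any two
# residues, P). The lookahead lets overlapping occurrences be reported.
PXXP_PATTERN = re.compile(r"(?=(P..P))")

# Residues treated as hydrophobic for this sketch (an assumption).
HYDROPHOBIC_RESIDUES = set("AVLIMFWY")


def find_pxxp_motifs(sequence):
    """Return (start_index, matched_text) for every PxxP occurrence."""
    return [(m.start(), m.group(1)) for m in PXXP_PATTERN.finditer(sequence)]


def has_hydrophobic_c_terminus(sequence):
    """Return True if the last residue of the peptide is hydrophobic."""
    return bool(sequence) and sequence[-1] in HYDROPHOBIC_RESIDUES


# Hypothetical peptides, used only to exercise the two checks.
proline_rich_peptide = "AAPPLPPRAA"
c_terminal_peptide = "ETSV"

print(find_pxxp_motifs(proline_rich_peptide))
print(has_hydrophobic_c_terminus(c_terminal_peptide))
```

Real binding preferences are position-specific and are usually described by profiles built from many selected peptides, so a single regular expression like this is only a first filter.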

A class of PRDs specifically recognizes post-translational modifications such as phosphorylation (SH2, 14-3-3, FHA), acetylation (bromodomains) and methylation (chromodomains) [16]. Such domains act as readers of post-translational modifications and link these modifications to downstream cellular pathways. For example, the SH2 domains of scaffold proteins such as Grb2 and Vav link phosphorylation of receptor tyrosine kinases to activation of intracellular signalling proteins (Ras, Raf and Erk) [1].

1.2.2 Role in biological pathways: Proteins that regulate key cellular processes, such as signal transduction, the cell cycle, protein trafficking, cytoskeleton organization and gene expression, are composed of catalytic and interaction domains [1]. Catalytic domains such as kinases, GTPases and proteases catalyze specific molecular reactions (e.g. phosphorylation and peptide bond cleavage) that help to propagate cellular signals. However, these domains often have limited inherent specificity, i.e. they can act on a large set of binding partners. Interaction domains regulate the specificity of catalytic domains either directly, by recruiting their substrates, or indirectly, by controlling their spatio-temporal localization [17]. As previously mentioned, a large number of interaction domains are PRDs that bind to specific peptide motifs on their interacting partners. Thus, PRDs recruit and confine signaling proteins to an appropriate sub-cellular location and determine the specificity with which enzymes interact with their targets, analogous to the association of protein kinases with their substrates. PRDs provide several evolutionary and mechanistic advantages to cellular networks. Firstly, domain-peptide interactions often evolve faster than domain-domain interactions, allowing cellular pathways to be rewired with minimal changes [17]. Secondly, PRDs that act as scaffolds increase the speed of signal transduction by increasing the local concentration of enzymes and substrates [17]. Thirdly, and most importantly, PRDs provide specificity to the information flow in intracellular networks [17]. This allows cells to accurately process the diverse range of signals they receive and produce the appropriate biochemical responses.

A key function of PRDs is to recognize specific post-translational modifications (PTMs). Protein function and localization are often regulated by a vast and dynamic array of PTMs. By recognizing specific PTMs, PRDs link them to cellular organization, thereby sensing “the state of the proteome” [16]. PRDs are also involved in cellular protein trafficking. Specific peptide tags on a protein determine where it is transported. PRDs such as importin-beta and clathrin recognize these tags and transport cellular proteins to their intended sub-cellular location [1].

1.3 Peptide-recognition domains as therapeutic targets

Given their central role in biological pathways, peptide recognition domains are often targeted by pathogenic proteins and altered by somatic mutations observed in various diseases, including cancer [3]. Hence, PRDs are an emerging class of therapeutic targets. Small-molecule and peptide-based drugs have been developed against a handful of PRD families. These drugs are currently in various stages of pre-clinical and clinical development. In this section, I will describe the work done on an important family of PRDs that has been extensively studied as a cancer drug target: B-cell lymphoma-2, or Bcl-2. I will discuss the functions of this family of domains inside cells and how these functions are often mis-regulated in cancer. I will also briefly discuss the various techniques that were used to develop potential therapeutic agents against these domains.

Figure 2: Peptide and small-molecule inhibitors of Bcl-2. A) The structure of a 16-amino acid peptide derived from Bad in complex with Bcl-xL. B) The interaction surface of Bcl-xL and the Bad peptide. The interaction is mediated by a hydrophobic pocket on Bcl-xL. C) The structure of the small molecule ABT-737 in complex with Bcl-xL. D) The interaction surface of Bcl-xL and ABT-737. ABT-737 binds to the same hydrophobic pocket on Bcl-xL and competes with its natural interactions with Bak and Bax.

1.3.1 Bcl-2: The B-cell lymphoma-2 (Bcl-2) family of proteins are important regulators of mitochondrial outer membrane permeabilization (MOMP), an important step in the apoptotic pathway [18]. They regulate the release of cytochrome c from the mitochondria and the activation of caspases, the proteases responsible for the breakdown of key cellular components during apoptosis. Bcl-2 family proteins adopt an alpha-helical structure containing conserved regions called Bcl-2 homology domains (BH domains). The family can be divided according to whether its members have a positive or negative effect on apoptosis. While family members such as Bax and Bak initiate apoptosis, members such as Bcl-2, Bcl-xL and Mcl-1 inhibit it. Under normal conditions, the interplay of these proteins regulates the apoptotic pathway. However, upon stress or DNA damage, pro-apoptotic members of the Bcl-2 family are activated. The pro-apoptotic Bcl-2 family members form pores in the outer membrane of the mitochondria, releasing cytochrome c and other proteins that initiate apoptosis [18].

In various cancers, somatic mutations cause over-expression of anti-apoptotic members

of Bcl-2 family leading to abrogation of apoptosis [19]. The anti-apoptotic member of Bcl-2

interact with the pro-apoptotic members of Bcl-2 and inhibit their ability to form pores in the

mitochondrial outer membrane. This interaction is mediated by a linear alpha-helical peptide

(BH3) on pro-apoptotic members binding to the hydrophobic pocket on the anti-apoptotic

members(Figure 2). This interaction is critical for the abrogation of apoptosis and inhibition of

this interaction leads to activation of apoptosis [20]. Synthetic peptides that mimic the BH3

peptides were shown to successfully induce apoptosis in different cancer cell lines and mouse

models [21]. Later, small molecules identified using structure-activity relationship (SAR)

analysis were found to be efficacious in promoting apoptosis, confirming the observations

made with the synthetic peptides (Figure 2). These small molecules bound to anti-apoptotic

members of the Bcl-2 family with nanomolar affinity and showed good pharmacokinetic

properties [22]. Small molecule inhibitors of Bcl-2 are currently in various stages of clinical or

pre-clinical investigation.

Several key observations can be derived from the study of the aforementioned Bcl-2

example. Firstly, PRDs that are involved in critical cellular processes (such as apoptosis in the case

of Bcl-2) are often mis-regulated in a wide spectrum of cancers. Secondly, somatic mutations are

often sufficient to amplify the cellular levels of PRDs thereby modulating the cellular processes

they are involved in. This also provides an opportunity for drug development, because in theory

these perturbations can be reversed by specifically blocking the interactions mediated by these

PRDs. Thirdly, small molecules developed against Bcl-2 bind with an affinity comparable to that

of the native partner protein or peptide by binding to a small subset of residues on the interaction

surface. These residues often, but not always, correspond to the “hotspot residues”. The Bcl-2

example highlights the possibility of identifying small compounds that can inhibit interactions

mediated by PRDs with desirable affinity and specificity.

A number of PRDs have been identified as drug targets (reviewed in Table 1). These

domains share the characteristics described above, i.e. amplification in cancer, involvement in

key cellular pathways and presence of hotspot residues. These characteristics have made PRDs

good targets for anti-cancer drug development.

| Drug target | Interaction partner | Role | Remarks |
|---|---|---|---|
| MDM2 | p53 | Negative regulation of p53 protein | Mdm2 down-regulates tumor suppressor protein p53 in cancer; targeted using small molecules and peptides |
| IAP | Caspase | Inhibition of caspase | IAPs negatively regulate caspases; targeted by peptides and peptidomimetics |
| Dvl2 PDZ | Fzd-7 | Involved in Wnt signalling | Dvl2 PDZ domain binds to an internal peptide; targeted using peptides and small molecules |
| N-Cadherin | N-cadherin, E-cadherin | Cell adhesion | N-cadherin binds to the HAV sequence at the EC1 domain of different cadherins |
| Plk1-PBD | CDC25C, Chk2, PDBIP1 | G2/M checkpoint regulation | Polo-box domain of Plk1 binds to phospho-peptides; targeted using small molecules and peptidomimetics |
| ICN-1/CSL | MAML1 | Transcription factor, involved in Notch signalling | MAML-1 binds to a hydrophobic groove on ICN1-CSL; targeted by a peptidomimetic (stapled peptide) |
| eIF4E | eIF4G, 4E-BP1 | Translation initiation factor | eIF4E binds to a 16-mer segment within eIF4G and 4E-BP1; targeted using peptides and small molecules |
| Menin | MLL | Histone modification | Menin-MLL fusion leads to over-expression of Hox genes; targeted using small molecules |

Table 1: List of PRDs that are currently being investigated as targets for cancer therapies

1.4 Studying peptide recognition domains using peptide probes

Peptides against PRDs can be derived from natural partners or generated by combinatorial methods

such as phage display and SPOT microarray. These peptides have been used as valuable tools for

studying the biological roles of PRDs.

1.4.1 Understanding structure and binding properties: To obtain a detailed understanding of

interactions between a PRD and its biological partner, it is important to characterize the

structural and molecular aspects of the interaction in depth. Peptides derived from interacting

partners can be used for studying these biophysical binding properties.

Further insights can be obtained by using combinatorial methods such as SPOT

microarray and phage display. SPOT microarrays are generated by synthesizing peptides on a

cellulose membrane [23]. On a single membrane, different peptides can be obtained which can

sample all the amino acids at each position of the peptide. The domain is incubated with the

microarray and fluorometric/colorimetric methods can be used to study the binding of domains at

each spot. By analyzing the intensity of each spot on the microarray, we can obtain the

binding preference of a given domain, which is often visually represented as a position weight

matrix (PWM) or sequence logo (Figure 3). The height of an amino acid in the sequence logo is

indicative of its relative frequency at that position. One of the first applications of this method

was to study the SH3 domains [24]. A key advantage of SPOT/peptide microarray is the ability

to study PRDs that bind to modified peptides, such as phosphorylated or acetylated peptides

[25,26].
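The position-wise frequencies that underlie a PWM or sequence logo can be sketched as follows. This is a minimal illustration in Python, assuming the binding peptides are already aligned and of equal length; the function name `position_frequencies` and the example peptides are hypothetical and are not data from this study.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def position_frequencies(peptides):
    """Return one dict per position mapping amino acid -> relative frequency.

    Assumes all peptides are aligned and have the same length.
    """
    if not peptides:
        raise ValueError("need at least one peptide")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must all have the same length")
    matrix = []
    for pos in range(length):
        # Count how often each residue occurs at this position.
        counts = Counter(p[pos] for p in peptides)
        total = sum(counts.values())
        matrix.append({aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS})
    return matrix

# Hypothetical aligned peptides (illustrative only).
example = ["PPLPPR", "PALPPK", "PPLPAR", "APLPPR"]
pwm = position_frequencies(example)
```

In this toy set, proline dominates the first position and leucine is invariant at the third, which is the kind of pattern a sequence logo makes visible at a glance.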

Phage display is a powerful technique that can be used to obtain binding preferences of

PRDs. In phage display, peptides are fused to the coat protein of filamentous bacteriophage such

that the peptides are displayed on the surface of the bacteriophage [27]. Using site-directed

mutagenesis, large libraries of 10^10 phages can be generated, in which each phage displays a unique

peptide. These libraries can then be panned against immobilized PRDs to capture phages that

bind specifically to the domain of interest. The peptides displayed by these tightly bound phages

can be identified by sequencing the DNA of the phage (Figure 3). There are several advantages

of phage display over other approaches. These include cost effectiveness and ability to re-use

libraries to probe against a large set of PRDs. Previous studies in the Sidhu lab have used peptide

phage display to understand binding preferences of well-studied domains such as the PDZ and

the SH3 domains [28,29].
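To put the library size in context, the following sketch compares a library of 10^10 members with the number of possible peptides of a given length. It assumes the 20 natural amino acids and that every library member is unique, which real libraries only approximate; the function names are illustrative.

```python
LIBRARY_SIZE = 10**10  # library size quoted in the text
ALPHABET = 20          # natural amino acids

def sequence_space(length, alphabet=ALPHABET):
    """Number of distinct peptides of a given length."""
    return alphabet ** length

def max_fraction_sampled(length, library_size=LIBRARY_SIZE):
    """Upper bound on the fraction of sequence space a library can contain.

    Assumes every library member is unique (an idealization).
    """
    return min(1.0, library_size / sequence_space(length))

for n in (6, 7, 8, 10, 12):
    print(f"{n}-mer: {sequence_space(n):.2e} sequences, "
          f"at most {max_fraction_sampled(n):.2%} sampled")
```

Under these assumptions, a library of this size could in principle contain every 7-mer (20^7 is about 1.3 x 10^9 sequences) but only a fraction of the possible longer peptides.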

Both phage display and peptide microarrays are extremely effective in understanding the

binding preferences of domains and can be complemented with biophysical methods such as

isothermal titration calorimetry (ITC), surface plasmon resonance (SPR) and fluorescence polarization to

obtain binding affinities of the peptide recognition domains. Computational methods (machine

learning/structural methods) have also been developed to predict the binding preferences of

PRDs [28,29].

Figure 3: Combinatorial methods for determining binding preferences of peptide recognition domains. Combinatorial methods such as phage display and peptide microarrays have been extensively used to determine the binding preferences of a diverse set of domains. These preferences are often represented as position weight matrices (PWMs) that are based on the occurrence frequency of a given amino acid.

1.4.2 Elucidating the biological role: Peptides have been very useful in elucidating

the biological roles of PRDs. Peptide motifs obtained from combinatorial screens can be used to

screen the proteome to identify potential binding partners. These partners can then be confirmed

using yeast two-hybrid and/or pull-down assays [30]. A number of such peptide motifs are

available in databases such as ELM [7].
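A motif-based proteome scan of the kind described above can be sketched with a regular expression. The example below is a minimal illustration in Python; the motif `P..P.[RK]`, the protein identifiers and the sequences are hypothetical placeholders, and a real scan would read sequences from a proteome file and apply further filters before experimental confirmation.

```python
import re

def scan_proteome(proteome, motif):
    """Yield (protein_id, start, matched_peptide) for every motif occurrence.

    `proteome` maps protein identifiers to amino-acid sequences and `motif`
    is a regular expression. A lookahead is used so that overlapping
    matches are also reported. Start positions are 1-based.
    """
    pattern = re.compile(f"(?=({motif}))")
    for protein_id, sequence in proteome.items():
        for match in pattern.finditer(sequence):
            yield protein_id, match.start() + 1, match.group(1)

# Hypothetical sequences (illustrative only).
toy_proteome = {
    "protA": "MSTPPLPPRKQ",
    "protB": "MAAAGGGSSS",
}
hits = list(scan_proteome(toy_proteome, r"P..P.[RK]"))
```

Each hit is only a candidate binding site; as noted in the text, such candidates still need to be confirmed experimentally.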

Peptides that bind specifically and with high-affinity can be used as intracellular probes

against PRDs. Linear peptides are often unstable and cannot cross the cellular membrane.

However, recent developments in molecular biology and peptide chemistry have significantly

increased the stability and cellular permeability of peptides. Chemical modifications can greatly

increase the stability and affinity of peptides that bind to a target domain [31,32]. Fluorescent

labels and probes can be attached to the peptides to track their localization inside cells and model

organisms [33]. Further, peptide probes can be readily fused to cell-penetrating peptides (CPP) to

increase their cellular permeability in various mammalian cell lines. Other entities (such as NLS

for nuclear localization) can be attached to the peptides to deliver the peptides in specific cellular

organelles [34]. Finally, transduction methods can be used to express peptides inside mammalian

cell lines. These methods include lenti-viral expression systems that allow effective delivery of

peptides to different cell lines and model organisms [34]. The key advantage of lenti-viral

expression vectors is that the DNA encoding the peptide is incorporated into the genome, which

allows stable expression of peptides in dividing and non-dividing cell lines [34].

One of the central advantages of using peptides as probes for biology is their ability to

modulate protein function in various aspects. Peptides bind to epitopes on the proteins that are

often distinct from the enzymatic pocket [35]. This allows peptides to modulate domain function

as either antagonists or agonists.

1.4.3 Validating drug targets: One of the central motivations of modern biology is to identify

therapeutic targets for diseases. Numerous methods are available to perturb activity of a

particular gene, e.g. gene knockouts, RNAi and small molecule drugs. Drugs act at the protein-

level and perturb the natural biological function of a given protein. Drugs whose perturbations

result in resolution of pathogenic phenotype are ideal candidates for therapy. Drugs are often

small organic molecules that can be identified using structure-based approaches or high-

throughput screens. However, development of high specificity and affinity small molecules often

requires large monetary and time investment. These costs make the development of small

molecule drugs against all known PRDs prohibitive. By prioritizing PRDs that play a role

in the onset of a given disease, we can greatly increase the efficiency of drug discovery. To this

end, peptides may act as probes for identification of drug targets for diseases. As described

previously, peptides can be generated against a large number of PRDs and introduced into

mammalian cells.

Peptides modulate their targets by various methods and can produce distinct phenotypes.

In some disease models, peptide modulators may lead to alleviation of the disease. This has been

previously used to identify various domains such as MDM2 and Bcl-2 as drug targets for cancer

[4]. Previously in the Sidhu lab, phage display was used to generate high affinity and specificity

peptides against the PDZ domain of Dishevelled-2 (Dvl-2) (Figure 4) [36]. The interaction

between Dvl-2 and the Frizzled-7 (Fzd-7) receptor is mediated by the PDZ domain of Dvl-2 and an internal

peptide on Fzd-7. This interaction is critical for the activation of Wnt signalling, a key

step for tumorigenesis in different cancers; and deletion of Dvl-2 PDZ or Fzd-7 peptide motif

leads to the abrogation of Wnt-signalling [36]. Reasoning that inhibition of PDZ-Dvl may disrupt

Wnt signalling, Zhang et al introduced phage-derived peptides into cells and observed that the

peptides specifically targeted PDZ domain of Dvl2 inside cells and down-regulated β-catenin

signalling stimulated by Wnt [36]. Thus, by using peptide probes against Dvl2-PDZ,

Zhang et al were able to demonstrate that targeting PDZ-Dvl2 may be a viable means for

attenuating the growth of cancer cells that are dependent on Wnt-mediated signalling pathways

and established Dvl2-PDZ as a valid drug target for cancer. Similar studies may be used to

identify potential drug targets for diseases including cancer.

Figure 4: Generating intracellular Dvl2-PDZ inhibitors using phage display. (A) Phage display was performed against Dvl2 PDZ using an internal peptide library. The phage-derived binding preference was then used to design the peptide inhibitor pep-N3. (B) The structure of pep-N3 in complex with Dvl-2 PDZ confirms the binding mode of the peptide. (C) For intracellular uptake, pep-N3 was fused to antennapedia and introduced into Wnt3a-responsive human embryonic kidney (HEK) 293S cell lines. Real-time cellular uptake of pep-N3 is observed using time-lapse microscopy. (D) Normalized TOPglow reporter activity measured in Wnt3a-stimulated HEK293S cells after 18 h of treatment with pen-N3 shows inhibition of Wnt/TCF-dependent signalling. Pen-N3 does not inhibit the TCF response signal in the control APC mutant HCT-15 colon cell line. Western blots show that pen-N3 inhibits Wnt signalling by inhibiting the accumulation of beta-catenin in HEK293S cells treated with Wnt3a (right side panel). (Figures from Zhang et al 2008)

1.4.4 Drug discovery: Recent studies have suggested that peptide probes may themselves

serve as a starting point for drug discovery against peptide recognition domains. In their direct

application, peptides themselves may serve as modulators of peptide recognition domains [37,

38]. Modifications such as stapling or cyclization may be performed to improve the

pharmacokinetic properties [31, 32]. Another popular method of drug discovery is to develop

peptidomimetics. Peptidomimetics are organic molecules that mimic peptides. Peptidomimetics

can be generated by replacing natural amino acids with amino-acid derivatives that make the

peptide molecule less susceptible to degradation and increase its stability [39]. Finally, peptide

probes may themselves be used to design fluorescent detection assays that can then be used to

screen large libraries of compounds (Figure 5). Often these screens include identification of

compounds that can displace the natural peptide from the binding site [40].

Figure 5: Fluorescence polarization assays for the discovery of small-molecule inhibitors of domain-peptide interactions. The chief method for identifying small-molecule compounds against domain-peptide interactions is a fluorescence polarization assay. A natural or synthetic peptide binder is fluorescently tagged and incubated with the target domain. A library of small molecules is screened to identify molecules that compete with the binding of the fluorescent peptide to the target domain. This allows rapid screening of large small-molecule libraries.

1.5 Goal of the project

Motivated by recent developments, the long-term goal of this project is to identify PRDs that

may act as novel cancer targets. To do this, we have focussed on shortlisted protein targets

against ovarian cancer provided by our collaborators, Dr. Rob Rottapel and Dr. Jason Moffat.

Ovarian cancer is the second most common gynaecological cancer in women and currently has

only one approved therapy. The 5-year survival rate for this cancer is only 47% highlighting the

need to develop targeted therapies against ovarian cancer. To assist in the development of novel

therapies, our collaborators used whole genome RNAi screens to knock down ~16,000 human genes in

15 different ovarian cancer cell lines [41]. Using this screen, they identified 1695 genes whose

knockdown severely affected proliferation of ovarian cancer cells. Based on the current

literature, we hypothesized that PRDs present on these ovarian cancer essential genes play an

essential role in tumorigenesis and may serve as drug targets for further investigation.

The study has two key aims:

1) Identify peptide recognition domains present on these 1695 gene targets in ovarian cancer

using computational methods; and

2) Generate peptide binders against these domains using peptide phage display.

The peptide binders generated here can then be used to design intracellular probes to

specifically modulate interactions mediated by these PRDs and study the effect of these

perturbations on cellular pathways in specific ovarian cancer cell-lines. Such peptides can also be

used to identify PRDs that may serve as drug targets for ovarian cancer. Finally, peptide

inhibitors can be used to design assays to identify small-molecules that target interactions

mediated by these PRDs.

2 Identification of peptide recognition domains essential in ovarian cancer

2.1 Introduction

The first goal of the project was to identify potential peptide recognition domains present on a

shortlisted group of proteins involved in ovarian cancer. The shortlisted candidates were based

on whole genome RNAi screens performed by our collaborators Dr. Jason Moffat and Dr. Rob

Rottapel and represent genes that are essential for cancer growth. The domains present on these

proteins were matched to known PRDs present in existing databases (PepX and DOMINO) in

order to identify potential PRDs.

Figure 6: Whole genome RNAi screen for identifying essential genes in ovarian cancer. A library of ~80,000 lenti-virus encoded shRNAs is used to selectively knock down 16,000 human genes in different cancer cell lines. Each shRNA is identified using a single barcode. The genomic DNA is harvested at multiple time points. Genomic DNA from all the time points is hybridized on a microarray chip to study the specific growth rate of each unique cell type. shRNAs that knock down genes essential for cancer proliferation significantly affect the growth rate and can be detected by microarray analysis. Using this approach, our collaborators generated a list of 1695 human genes that affect the growth of 15 different ovarian cancer cell lines.

2.1.1 Whole Genome RNAi screen: RNAi is a powerful technique to knockdown specific

genes and study their effect on biological pathways. RNAi studies have illuminated roles of

various genes and helped to obtain a better understanding of their functions. Developments in

cell biology and molecular genetics techniques have made it possible to perform genome-wide

RNAi screens, where in a single experiment a large number of the genes in the human

genome can be targeted. These screens are performed using a library of short hairpin RNA

(shRNA) targeting many human genes where each shRNA is encoded inside a lenti-viral

expression vector. Lenti-viral expression vectors allow specific shRNAs to be incorporated into

cells. The library of shRNA is incubated with cancer cell lines to allow incorporation of a unique

shRNA inside a given cell in the population. Upon infection, the cells are allowed to proliferate

for 3-4 weeks, after which shRNAs that have been selectively depleted or enriched are identified

using microarrays, deep sequencing or high-content screening. Such pooled screens can be used

to define genes necessary for cancer cell proliferation/survival in cell culture [42].

In this study, we focused on screens done on ovarian cancer by our collaborators Dr.

Jason Moffat and Dr. Rob Rottapel (Figure 6) [41]. Using a library of 78,432 shRNAs, Marcotte

et al. targeted 16,056 genes in 15 different ovarian cancer cell lines. The cancer cell lines used in

their analysis are listed in Appendix A. To select genes that are essential for ovarian cancer,

Marcotte et al. followed the dropout rate of each shRNA. These dropout rates were derived by

calculating the slope between the measured microarray expression intensity at each time point

relative to the initial time point. These dropout rates were used to define the GARP (Gene

Activity Ranking Profile) score for each gene. Genes with a negative GARP score represent genes

that are critical for cancer proliferation. Using a cut-off to select highly essential genes, Marcotte

et al. identified 1695 genes that were essential across all ovarian cancer cell lines.
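The dropout-rate calculation described above can be sketched as follows. This is a minimal illustration in Python, not the published implementation: it takes the dropout rate of an shRNA as the mean slope of its intensity at each later time point relative to the initial time point, and it assumes (as a labeled assumption) that a gene-level score is obtained by averaging the two most negative shRNA rates. The time points and log2 intensities are hypothetical.

```python
def dropout_rate(times, intensities):
    """Mean slope of each later time point relative to the initial one."""
    t0, x0 = times[0], intensities[0]
    slopes = [(x - x0) / (t - t0)
              for t, x in zip(times[1:], intensities[1:])]
    return sum(slopes) / len(slopes)

def gene_score(shrna_rates, n_lowest=2):
    """Average the n most negative shRNA dropout rates.

    The choice of n_lowest=2 is an assumption made for this sketch.
    """
    lowest = sorted(shrna_rates)[:n_lowest]
    return sum(lowest) / len(lowest)

# Hypothetical log2 intensities for three shRNAs targeting one gene,
# measured at days 0, 7 and 14 (illustrative only).
times = [0, 7, 14]
hairpins = {
    "sh1": [10.0, 8.6, 7.2],
    "sh2": [10.0, 9.3, 8.6],
    "sh3": [10.0, 10.0, 10.0],
}
rates = {name: dropout_rate(times, x) for name, x in hairpins.items()}
score = gene_score(list(rates.values()))
```

A strongly negative score indicates that hairpins targeting the gene are lost from the population over time, which is the signature of an essential gene in a pooled screen.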

In this study, I used these 1695 genes as an input to my computational pipeline. There are

specific reasons for focussing on genes obtained from whole genome RNAi screens: 1) The

whole genome RNAi screens provide an unbiased list of genes that are important for ovarian

cancer growth and hence allow us to focus on a much reduced set of genes, and 2) Given that

knockdown of these genes hampers cancer growth, the screen provides evidence that peptide

inhibitors of peptide recognition domains present on these genes may also negatively affect

cancer growth, which can be rapidly tested by delivering peptides inside ovarian cancer cell lines.

Peptides that successfully recapitulate the results obtained from whole genome RNAi screens can

serve as templates for development of cancer therapeutics.

2.1.2 Computational methods to identify peptide-recognition domains: Recent

developments in high throughput experimental methods for identifying protein-protein

interactions have led to rapid identification of protein interaction partners. Experimentally known

domain-peptide pairs are documented in databases such as DOMINO [9] (a database of known

domain-peptide interactions), PEPX [11] (a database of domain-peptide interactions where the

co-crystal structures are available) and ADAN [10] (database of selected domain-peptide

interactions with known motifs). Other sources include ELM [7] which is a database of peptide-

like motifs but also includes information about domains that bind to such peptide motifs.

Computational methods have also been developed to identify novel PRDs [43]. These

approaches use sequence or structural similarity to known peptide binding domains present in

databases mentioned above as a metric to identify novel PRDs. In this study, we focussed on

peptide recognition domains present on protein targets provided by our collaborators. To develop

a computational method for the identification of PRDs, we focused on domains that share high

sequence similarity to known PRDs present in the PepX and DOMINO databases (Figure 7).

Figure 7: Computational strategy for identifying potential peptide binding domains. Using databases such as DOMINO and PepX, I obtained a high-confidence list of known peptide recognition domains. Using this list, I searched for domains present on the target gene list that share high sequence similarity to known peptide recognition domains. The final domain list was then optimized by including information regarding the domain boundaries and expression conditions. For phage display, I selected domains which can be readily expressed in a bacterial system and have crystal structures in complex with a known peptide. Using this approach, I was able to identify 66 domains from the list of 1695 genes provided by our collaborators.

2.2 Methods

Identification of peptide recognition domains: The proteins encoded by each of the 1695

genes were obtained using Uniprot annotations. Using BLAST, all the full-length proteins were

searched against domains present in PepX and DOMINO. The sequences with greater than 70%

sequence identity were retained while the other sequences were discarded. The sequence cut-off

was chosen based on previous studies that show that accurate structural models (<2 Å rmsd) can be built at this level of sequence identity. The

domain boundaries were then annotated based on the closest domain structure available

in the Protein Data Bank (PDB). Figure 7 shows the entire computational pipeline used for this

method.
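The identity filter described above can be sketched as a small post-processing step on tabular BLAST output. This is a hedged illustration in Python, not the script used in this study: it assumes the hits were written in the default 12-column tabular format of BLAST+ (`-outfmt 6`), and the identifiers and values in the example are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def filter_hits(handle, min_identity=70.0):
    """Keep hits whose percent identity is greater than `min_identity`.

    Returns (query, subject, percent identity, query start, query end).
    """
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    kept = []
    for row in reader:
        identity = float(row["pident"])
        if identity > min_identity:
            kept.append((row["qseqid"], row["sseqid"], identity,
                         int(row["qstart"]), int(row["qend"])))
    return kept

# Hypothetical hits; identifiers and numbers are illustrative only.
example = io.StringIO(
    "protA\tpepx_dom1\t98.8\t87\t1\t0\t224\t310\t1\t87\t1e-50\t180\n"
    "protB\tpepx_dom2\t45.0\t60\t33\t0\t10\t69\t1\t60\t1e-05\t40\n"
)
hits = filter_hits(example)
```

The query start and end coordinates retained for each hit give a first estimate of the domain boundary, which can then be refined against the closest PDB structure as described in the text.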

2.2.1 Manual filtering and literature review of potential domains from PEPX: Based on the

results obtained from computational pipeline, only domains present in PepX were selected for

further investigation. Each domain in PepX has a crystal structure bound to a peptide ligand

present in the database. This gave us confidence that such a domain: 1) binds to peptides; and 2)

can be expressed in bacterial cells. Further, the structures of the domain-peptide complexes can be used

to validate the results obtained from phage display. The domains that were obtained from PepX

were manually analyzed to remove false positives. These included domains that do not make

direct contacts with the peptide or those that share the interaction surface with another domain.

Further, domains that could not be expressed in bacterial cells for crystallization were also

removed from the list. Finally, literature review was done to analyze the domains obtained from

our computational pipeline.

| Database used | Description | No. of domains obtained |
|---|---|---|
| PepX | Database of domains with known peptide recognition for which structures are available | 86 |
| DOMINO | Database of domains known to bind to peptides | 390 |
| PDB | Database of all known structures | 885 |
| Pfam | Total no. of domains in our dataset | 5567 |

Table 2: Summary of results obtained from DOMINO and PepX.

2.3 Results and Discussion

2.3.1 Analysis of 1695 genes obtained from whole genome RNAi screens: 390 domains were

obtained from DOMINO and 86 from PepX (Table 2). As reasoned above, the 86 domains from

PepX were selected for further study. Upon manual analysis and filtration, 66 domains were

selected as targets for phage display.

2.3.2 Literature review of domain list obtained from the computational pipeline: The list

of 66 domains represents a good initial set for the study. To get a better understanding of the

kind of domains present in this list, a thorough literature review was performed and structural

information for each of these domains was annotated (Table 3). Structurally, these 66 domains

represent 42 domain families. These include well-characterized peptide binding domains such as

the SH3 and the PDZ domains, which have been extensively studied for their peptide binding

potential by structural, biochemical and combinatorial studies. I decided to keep these domains

in our list as they can be used as positive controls for future experiments.

| Domain name | No. | Protein name | Domain boundary | Related PDB structure | Comment |
|---|---|---|---|---|---|
| PDZ (2.30.42.10) | 1 | Disc-large homolog 1 #1 | 224-310 | 2I0L (98.8) | First PDZ domain of DLG1; plays a role in planar cell polarity |
| | 2 | Disc-large homolog 1 #2 | 319-405 | 1TP5 (82.9) | Second PDZ domain of DLG1; plays a role in planar cell polarity |
| | 3 | Disc-large homolog 2 #2 | 193-279 | 2I0L (84) | Second PDZ domain of DLG2; plays a role in planar cell polarity |
| | 4 | Disc-large homolog 2 #3 | 421-501 | 1TP5 (83.1) | Third PDZ domain of DLG2; plays a role in planar cell polarity |
| | 5 | Disc-large homolog 4 #2 | 160-246 | 2I0L (89.2) | Second PDZ domain of DLG4; plays a role in planar cell polarity |
| | 6 | Disc-large homolog 4 #3 | 313-393 | 1TP5 (97.1) | Third PDZ domain of DLG4; plays a role in planar cell polarity |
| SH3 (2.30.30.40) | 7 | Growth-factor receptor bound protein 2 | 1-58 | 2VWF (93.3) | N-terminal SH3 of Grb2, adaptor protein in RTK signalling |
| | 8 | Grb2-related protein 2 | 271-330 | 2W10 (93.3) | C-terminal SH3 of GRAP2, adaptor protein in RTK signalling |
| | 9 | Phospholipase gamma | 791-851 | 1YWO (93.4) | Involved in RTK signalling |
| | 10 | Sorbin and SH3 containing protein 2 | 938-999 | 2O9V (70) | Second SH3 domain of SORBS2, interacts with Abl kinase |
| Protein Kinase (3.30.200.20+1.10.510.10) | 11 | Mitogen-activated kinase 3 | 42-330 | 2FYS (87.8) | Protein kinase signal cascade |
| | 12 | PKA | 44-298 | 2VO7 (99.4) | cAMP signalling |
| | 13 | PKB | 44-298 | 2VO7 (92.9) | cAMP signalling |
| | 14 | Serine/threonine kinase 2 | 81-684 | 1WBP (84.6) | Regulates p53, cell cycle |
| | 15 | Aurora kinase B | 2-344 | 2BFY (80.2) | Involved in chromosomal segregation |
| G-alpha subunit (3.40.50.300+1.10.400.10) | 16 | G-alpha (i) 1 | 2-354 | 1Y3A (100) | Involved in G-protein signalling |
| | 17 | G-alpha (i) 3 | 2-354 | 1Y3A (93.5) | Involved in G-protein signalling |
| | 18 | G-alpha (o) 1 | 2-354 | 1Y3A (72.1) | Involved in G-protein signalling |
| Ligand binding domain of nuclear receptor (1.10.565.10) | 19 | Bile acid Receptor | 256-474 | 3BEJ (100) | Binds to bile acid, hormone receptor signalling |
| | 20 | Retinoic acid receptor-gamma | 261-463 | 3E94 (86.2) | Binds to retinoic acid, hormone receptor signalling |
| | 21 | Glucocorticoid receptor | 528-777 | 1M2Z (99.6) | Binds to cortisol, hormone receptor signalling |
| | 22 | Pregnane X Receptor | 204-434 | 1NRL (100) | Orphan nuclear receptor, hormone receptor signalling |
| Dynein light chain (3.30.740.10) | 23 | Dynein light chain 1 | 1-89 | 1CMI (100) | Part of dynein motor complex |
| | 24 | Dynein light chain 2 | 1-89 | 3E2B (96.6) | Part of dynein motor complex |
| RNA recognition module (3.30.70.330) | 25 | Splicing factor U2AF1 | 65-147 | 1JMT (99) | mRNA splicing |
| | 26 | Splicing factor 45 | 306-385 | 2PEH (100) | mRNA splicing |
| Profilin (3.30.450.30) | 27 | Profilin 1 | 1-140 | 2PAV (100) | Regulates cytoskeleton |
| | 28 | Profilin 2 | 1-140 | 2V8C (99.3) | Regulates cytoskeleton |
| Penta-EF hand (1.10.238.10) | 29 | Programmed cell death receptor 6 | 23-191 | 2ZNE (100) | Intracellular Ca2+ signalling |
| | 30 | Calpain small regulatory subunit 1 | 1-268 | 1NX0 (97.1) | Regulates Ca2+ dependent calpain protease complex |
| Actin (3.30.420.40+3.90.640.10) | 31 | Actin-gamma 1 | 1-375 | 3CHW (95.2) | Highly conserved in eukaryotes; plays a role in cytoskeleton |
| | 32 | Actin-gamma 2 | 3-374 | 2V52 (98.1) | Highly conserved in eukaryotes; plays a role in cytoskeleton |
| Beta-propeller (2.130.10.10) | 33 | Clathrin heavy chain 1 | 1-363 | 1UTC (100) | Involved in endocytosis |
| | 34 | WDR5 | 20-334 | 3EMH (100) | Involved in histone modifications |
| PH/PTB domain (2.30.29.30) | 35 | Dynamin 2 | 2-301 | 2AKA (87) | Microtubule-associated protein |
| | 36 | Disabled homolog 2 | 45-196 | (97.5) | Involved in endocytosis |
| P-loop containing nucleotide triphosphate hydrolase (3.40.50.300) | 37 | RAC3 | 1-189 | 2QME (95.4) | Intracellular G-protein signalling |
| | 38 | RAD51 | 97-339 | 1N0W (100) | DNA damage response |
| Trypsin-like serine protease (2.40.10.10) | 39 | Tissue-type plasminogen activator | 311-562 | 1RTF (100) | Extracellular protease |
| | 40 | Acrosin | 43-343 | 1FIW (71.1) | Extracellular protease |
| 14-3-3 (1.20.190.20) | 41 | 14-3-3 eta | 2-246 | 2O02 (75.5) | Adaptor protein in signalling pathways |
| AP50 domain (2.60.40.1170) | 42 | AP-2 subunit mu | 122-435 | 1I31 (100) | Involved in endocytosis |
| Bcl-2 (1.10.437.10) | 43 | Bcl-2 like protein 1 | 1-233 | 3FDL (100) | Regulates apoptosis |
| Bro1 (1.25.40.280) | 44 | Alix | 3-392 | 3C3R (100) | Intracellular protein transport |
| CAP/Gly (2.30.30.190) | 45 | Dynactin subunit 1 | 26-97 | 2HQH (100) | Microtubule-associated protein |
| Caspase-like (3.40.50.1460) | 46 | Caspase 2 | 155-452 | 1PYO (100) | Intracellular protease, apoptosis |
| DNAse1-like (3.60.10.10) | 47 | DNAse 1 | 23-282 | 2D1K (78.8) | DNAse involved in apoptosis |
| eIF4E (3.30.760.10) | 48 | eIF4E | 2-217 | 2V8Y (100) | Translation initiation factor |
| FERM (1.20.80.10) | 49 | Band 4.1-like protein 3 | 110-391 | 3BIN (100) | Negative growth regulator |
| Ig domain (2.60.40.10) | 50 | Immunoglobulin lambda-like polypeptide 1 | 38-213 | 1W72 (84.1) | B-cell surface receptor |
| Importin-beta (1.25.10.10) | 51 | Importin beta-1 | 1-876 | 1QGR (99.8) | Nuclear import |
| Mad2A (3.30.900.10) | 52 | MAD2-like protein 1 | 2-205 | 2QYF (99.5) | Anaphase cell cycle checkpoint |
| MHC II (3.10.320.10) | 53 | DRB1 beta | 30-227 | 1T5X (100) | Antigen recognition |
| PCNA (3.70.10.10) | 54 | Proliferating cell nuclear antigen | 1-261 | 2ZVM (100) | DNA replication |
| OB-fold domain (2.40.50.140) | 55 | Replication factor A 70 | 2-121 | 2B3G (100) | Formation of replication fork |
| Serpin (3.30.497.10+2.30.39.10) | 56 | Plasma serine protease inhibitor | 20-406 | 1LQ8 (99.7) | Inhibitor of serine protease |
| SH3-type barrels (3.40.50.300/2.30.30.40) | 57 | Voltage-dependent L-type calcium channel subunit beta | 65-411 | 1T3L (76.6) | Ca2+ channel, G-protein signalling |
| SWIB/MDM2 (1.10.245.10) | 58 | MDM4 | 26-106 | 3DAB (100) | Regulator of p53, apoptosis |
| TAP-UBA (1.10.8.10) | 59 | Nuclear export factor 1 | 565-619 | (100) | Nuclear export of mRNA |
| Winged helix repressor DNA binding domain (1.10.10.10) | 60 | Transcription factor IIF | 449-517 | 1J2X (100) | General transcription |
| TRFH (1.25.40.201) | 61 | Telomeric repeat-binding factor 1 | 58-268 | 3BQO (100) | Regulates telomeric length |
| Factor Xa inhibitor (4.10.410.10) | 62 | Amyloid-like protein 2 | 306-364 | 1CA0 (71.2) | Regulation of homeostasis |
| Ubiquitin-like (3.10.20.90) | 63 | Ubiquitin-60S ribosomal protein L40 | 1-76 | 2D3G (100) | Post-translation modification, regulates protein function |
| Vinculin (1.20.1490.10) | 64 | Vinculin | 1-259 | 1YDI (100) | Actin-filament binding protein |
| XRCC4 (2.170.210.10+1.20.5.370) | 65 | XRCC4 | 1-213 | 1IK9 (98.1) | Double stranded DNA break repair |
| Tyrosine phosphatase (3.90.190.10) | 66 | Tyrosine-protein phosphatase non-receptor type 22 | 1-310 | 3BRH (99.4) | Regulator of tyrosine kinase SRC family of kinases |

Table 3: List of the 66 domains selected for the phage display experiments. The table shows the different protein families and their structural classification code (CATH) found by our computational method. The table also shows the boundary of the domain (defined by PDB structure) and the reference PDB structure. The sequence similarity between the reference structure and domain is shown in brackets.

Apart from these well-characterized domains, I also obtained 52 domains for which this study

represents the first combinatorial study to identify their binding preferences. These 52 domains

are part of 39 unique protein families. If we are able to obtain peptides against these domains

using phage display, we can in principle extend phage display to other members of the family.

These domain families are involved in diverse cellular pathways including gene expression

(nuclear receptors), endocytosis (clathrin, AP2), cytoskeleton modelling (actin, vinculin,

dynactin), receptor tyrosine kinase signalling (Grap2, Grb2, PLCG1, 14-3-3) and apoptosis (Mdm4,

Bcl-xl), to name a few.

The 66 domains identified by our computational pipeline also include some known cancer

targets. These include Bcl-xl, which has been extensively targeted for its anti-apoptotic role and

was discussed previously in detail [18]. We also identified eIF4E, a translation initiation factor that is

responsible for binding to mRNA caps and loading them on to ribosomes. In different cancers,

eIF4E is over-expressed which leads to expression of mRNA with unstable 5’ UTR [40].

The presence of previously known cancer targets in our data set gives us confidence that,

using our computational pipeline, we have been able to identify PRDs that are important in

cancer.

2.4 Summary

In this chapter, I have discussed the computational pipeline we used to identify potential peptide

recognition domains on shortlisted protein candidates involved in ovarian cancer. I utilized a

sequence homology-based approach to identify domains present on each protein that are similar

to known PRDs in the database PepX. Using this approach, I identified 66 domains that will

serve as targets for my phage display experiments.

For this study, I used essential genes obtained from whole genome RNAi screens as a

surrogate for genes involved in ovarian cancer. A number of these genes are involved in key

regulatory pathways that are conserved between normal cells and cancer cell lines and hence


may not represent viable cancer targets. To overcome these limitations, recent studies have

integrated data from other functional genomics screens such as mRNA expression data, copy-

number variations and exome sequencing to accurately predict proteins important for

carcinogenesis [43]. While integration of data from multiple sources may help in generating a

more refined list of cancer related proteins, such an analysis is beyond the scope of the current

study.

For identifying PRDs, I selected two databases, PepX and DOMINO. Both these

databases provided me with a large number of potential peptide recognition domains. For further

analysis, I focused on domains obtained from PepX. The domains obtained from PepX were then

manually filtered to remove false positive hits. This provided me with a shortlist of 66

domains. This list included domains from distinct structural folds and biological pathways. Some

of them have previously been studied in the context of cancer; in some cases these domains have

themselves been established as drug targets.

It is important to interpret the results obtained from the computational pipeline in the context of future

experiments. Using a simple analysis, I was able to obtain a diverse set of potential PRDs that

can be used as targets for phage display. The conclusions made from this study can be extended

to other studies of similar origin.


3 Identification of peptide binders using phage display


3.1 Introduction

After selecting the potential targets for phage display, my next aim was to generate peptides

against each of these domain targets. To do this, I used phage display technology to screen large

peptide libraries. As described previously, phage display is a directed evolution approach in

which peptides can be displayed on the surface of filamentous bacteriophage, M13, using

specialized vectors known as phagemids. Site-directed mutagenesis can then be used to generate

large peptide libraries where each phage member displays a unique peptide on its surface. Phage

display has been used extensively to generate high-affinity, high-specificity peptides against

different protein targets. In the Sidhu lab, peptide phage display has been used to identify

binding preferences of a large number of human and yeast SH3 and PDZ domains [28, 29]. In

the case of Dvl2-PDZ domains, phage-derived peptides were used as intracellular inhibitors of

Fzd7-Dvl2 interactions, thereby knocking down Wnt signalling, an important signalling pathway

[36]. In this section I will describe the results obtained from the phage display screens.

3.1.1 Displaying peptides on phage particles: In the Sidhu lab, we use M13, a single-stranded

DNA virus from the Inoviridae family, for expression of peptides. M13 viruses infect

gram-negative bacteria such as E. coli. There are several advantages to using M13 for phage

display experiments. First, it follows a non-lytic life cycle which makes it easier to grow and

propagate in the lab. Second, its DNA is present in single-stranded form which makes it possible

to genetically display proteins on the surface of M13 using site-directed mutagenesis. The coat of

M13 is made up of five proteins as shown in Figure 8. Two of these proteins, p8 and p3, have

been used previously for displaying proteins. The p3 protein is present in 5 copies on the phage and is

required for infection. Various proteins such as antibodies and fibronectin have been successfully

displayed on the surface using the p3 fusion without affecting infection of bacteria. The other protein

that is regularly used for phage display is p8, the major coat protein, which is present all over

the surface. Small peptides (<10 amino acids in length) can be fused to p8 without affecting the

assembly and secretion of the phage particle. In this study, we use the p8 coat protein as it allows

multiple copies of peptides to be present on the phage particle leading to selection of lower-

affinity peptides [44].

To display peptides onto the M13 phage surface, specialized vectors called phagemids

are required. The phagemid contains a single copy of the p8 phage protein under the control of an

IPTG-inducible Ptac promoter, an antibiotic-resistance cassette, and both single-stranded and double-stranded

origins of replication. The peptide is fused to the N-terminal end of the p8 coat protein

such that it is expressed along with p8 inside the bacterial host [27]. Once the phagemid is

introduced into the cell, it is replicated into multiple ssDNA copies inside the bacterial host. The

infected cells can be selected using the resistance marker. To initiate formation of new virus

particles, the cells are super-infected with modified M13 phage that acts as a “helper”. Helper

phage leads to production of single-stranded phagemid DNA that can be effectively packaged

into virion particles. The packaging unit also introduces the mutant coat protein produced by the

phagemid. The abundance of mutant coat proteins is dictated by the IPTG concentration in the

culture media and may help in optimizing the number of peptides displayed by the bacteriophage

[27].

Figure 8: Schematic diagram of M13 bacteriophage. M13 filamentous phage is made up of 5 proteins. P8, the major coat protein, is the most abundant coat protein and forms the cylinder around the phage ssDNA. The distal end of M13 assembles first and contains approximately three to four copies of p7 and p9. The proximal end is formed by five copies each of p6 and p3. The p3 coat protein is required for infection of the bacterial host. P8 and p3 coat proteins are used extensively for phage display.

3.1.2 Site-directed mutagenesis and phage library design: Once a protein is successfully

displayed on the M13 phage coat, mutations can be introduced into its encoding DNA in order to

generate vast numbers of variants. The ease of manipulating M13 ssDNA makes this phage an

ideal system for the synthetic construction of libraries of up to 10^11 unique clones.

Changes to the phagemid DNA are performed in a series of reactions known as Kunkel

mutagenesis. In brief, E. coli cells deficient in dUTPase (dut) and uracil-DNA

glycosylase (ung) are used to synthesise a uracil-rich version of the ssDNA phagemid

(dU-ssDNA) that serves as the template for the mutagenesis reaction. Synthetic oligonucleotides

that introduce mutations to the region of interest anneal to the dU-ssDNA template and serve as


primers for synthesis of the complementary strand. This reaction is completed in the absence of

uridine to form covalently-closed circular double-stranded DNA (CCC-dsDNA) with an original

uracil-rich DNA strand and a mutagenic DNA strand (Figure 9). Transformation of the CCC-

dsDNA into a dut+/ung+ bacterial host results in the degradation of the uracil-rich strand and

retention of the mutagenic strand. The CCC-dsDNA is then electroporated into a bacterial host

infected with M13 helper phage to synthesize the phage library. Kunkel site-directed

mutagenesis is ideal for phage display applications because it allows for complete control over

library construction, starting from the design of the mutagenic oligonucleotides themselves to the

annealing and synthesis conditions [44].

Figure 9: Oligonucleotide-directed mutagenesis with an ssDNA template. (A) A synthetic oligonucleotide (red arrow) is annealed to the template (dU-ssDNA). The oligonucleotide contains a region with the desired mutations flanked by perfectly complementary sequences. (B) Covalently-closed circular dsDNA (CCC-dsDNA) is enzymatically synthesized by T7 DNA polymerase and T4 DNA ligase. (C) CCC-dsDNA is introduced into an E. coli host using electroporation.

Different peptide libraries can be generated using different sets of mutagenic

oligonucleotides. The library used for this project was obtained from Dr. Gang Chen, a post-

doctoral fellow in the Sidhu lab. The length of the peptides is 16 amino acids and each position

can accommodate any of the 19 amino acids (excluding cysteine). Cysteine is excluded because

it may lead to cyclization and disruption of the linear structure of peptides. The oligonucleotides

used in designing the library were obtained from TriLink Biotechnologies. These

oligonucleotides were synthesized three nucleotides at a time instead of a single nucleotide at a time, as used

by other vendors. This allows one codon per amino acid, removing the codon bias that is generally

observed in oligonucleotides that use NNK codons for randomization (where 32 codons code

for 20 amino acids).
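The codon-bias argument can be checked with a short enumeration. The sketch below is illustrative only and is not part of the thesis workflow; it assumes the standard genetic code and the usual definition of NNK (N = any base, K = G or T). The names `CODON_TABLE` and `nnk_summary` are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_summary():
    """Count how many NNK codons (N = A/C/G/T, K = G/T) encode each residue."""
    codons = [a + b + c for a in BASES for b in BASES for c in "GT"]
    counts = Counter(CODON_TABLE[codon] for codon in codons)
    return codons, counts


codons, counts = nnk_summary()
print(len(codons), "NNK codons")
# Most frequent residues first; '*' marks a stop codon.
for aa, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(aa, n)
```

Under NNK randomization, Leu, Ser and Arg are each encoded by three codons whereas residues such as Met and Trp are encoded by one, and a single stop codon (TAG) remains in the pool. Trimer-based synthesis with one codon per residue avoids both effects.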

3.1.3 Selection strategy: The peptide library constructed can be used to screen for peptide

binders against a target protein. After incubation of the library with the immobilized target, non-

specific phage particles are removed through a series of washes. The remaining bound clones are

eluted and amplified in a bacterial host, allowing for further rounds of screening to enrich for

clones expressing proteins with the desired traits. Figure 10 shows the entire phage display

selection pipeline used in this study. The success of the selection depends on both the quality of

the phage display library and the quality of the protein targets [44].

Figure 10: Phage display selection for peptide recognition domains. The peptide library was incubated with immobilized antigen. The non-binders were washed away while positive binders remained attached to the plate. The bound phage were eluted and amplified in a bacterial host. The process was repeated five times to obtain an enriched set of binders. The phage pools from Round 5 were introduced into a bacterial host and plated on LB plates. 96 colonies were picked for each domain and grown overnight to obtain phage clones. Each of the 96 clones was tested for binding in phage ELISA. The clones that showed a high enrichment ratio were sequenced. The DNA sequences obtained were processed and translated to obtain the peptide sequences. The peptides were aligned manually or using multiple sequence alignment tools to obtain peptide logos.


From the library standpoint, quality can be affected by library construction or display

levels. Inefficient completion of the site-directed mutagenesis reaction may result in a large

proportion of phage particles that display the wild-type p8 coat protein. Further, if the number of

peptide copies on each phage particle is low, it may lead to weak binding. In both cases, such

libraries would offer a reduced chance of identifying peptides against their targets. The diversity

of the peptide library obtained from Dr. Gang Chen had been previously tested using phage

titrations. The IPTG concentration required for adequate display was also known and

well-documented. The library, however, was re-amplified for use in this study. Phage titrations were

repeated to estimate the diversity and size of the peptide library before it was used for

further experiments.

From the immobilized target side, the quality and stability of the target are both important

factors in the success of a selection. For example, the presence of contaminants in impure protein

samples or denaturation of the samples themselves can result in the enrichment of unwanted

phage clones. Furthermore, the use of constructs that are unstable may result in a heterogeneous

and inconsistent interface that differs between rounds and is not amenable to enrichment of

binders against the intended target conformation. Consequently, SDS-PAGE and

spectrophotometry were used to test the purity and quantity of the protein target.

In light of these considerations, it is important not only to monitor the behaviour of the

phage population throughout the selection but also to optimize the selection conditions and

reagents where required. One important consideration that was used to

design the selection strategy for this study was the presence of a GST tag on each of the

domains. To remove any peptides that bind to the GST tag, the peptide library was pre-incubated in

a well containing only GST. Selections were done in the presence of high GST concentrations to

further remove any weak GST binders.

3.1.4 Selection of tight-binding peptides and identification of binding specificities: The

progress of the selection screen is determined through an Enzyme-Linked Immunosorbent Assay

(ELISA) and phage titrations. In phage titration, the phage obtained at the end of each round of

selection is used to infect an exponentially growing bacterial culture. Upon infection, the bacterial

culture is serially diluted and plated on a plate containing the selectable marker for selecting the

cells that were successfully infected by the virus. The number of viruses (colony forming units or


cfu) obtained after each round of selection is calculated by counting the colonies obtained in the

serial dilutions. Enrichment ratio is defined as the ratio of the number of colony forming units

(cfu/ml) obtained from the target well and the negative control well (BSA). In a successful phage

display experiment, the enrichment ratio increases after each round of selection.

In ELISA, the phage population obtained at each round is incubated with the immobilized

target and the control proteins (GST and BSA) in parallel. Unbound phage particles are removed

from the wells through a series of washes and the remaining phage are then probed with anti-

M13 antibodies conjugated to horseradish peroxidase. Addition of the substrate results in the

synthesis of a blue pigment and the reaction is stopped with phosphoric acid to allow for a

spectrophotometric reading at 450 nm. The enrichment ratio is determined by comparing the

signal intensity of the target well relative to the negative control well (BSA). As with phage

titrations, in a successful phage display experiment the enrichment ratio should increase after

each round of selection.

ELISA can also be used to determine the strength of binding of individual phage clones

obtained after all the rounds of selection are done. Depending on the stringency of the

experiment, tight binders can be defined as clones with a target-to-control ratio of five or greater.

3.2 Methods

3.2.1 Strains: E. coli strain XL1 Blue was used for expression of GST-fusion proteins. Peptide phage display libraries were re-amplified in the T1-resistant E. coli strain SR320, which was generated by mating the strain XL1 Blue to the strain MC1061. All phage amplifications during selection experiments were done in XL1 Blue.

3.2.2 Protein expression and purification: The DNA encoding the 66 shortlisted domains was chemically synthesized (Genscript) and cloned into an IPTG-inducible expression vector (pGEX) with a Ptac promoter and N-terminal 6xHis and glutathione S-transferase (GST) tags, available in the Sidhu lab (pHH0103, Appendix C). The protein sequences for each of the 66 domains are attached in Appendix B. The plasmids containing these domains were transformed into chemically competent XL1 Blue. Single colonies were propagated in 2YT + 100 ug/ml carbenicillin and stored as glycerol stocks (10% glycerol v/v) at -80 °C.

For protein expression, 5 ml starter cultures were inoculated from glycerol stocks and grown overnight at 37 °C, 200 rpm. The following day, 2-L baffled flasks containing 500 ml of 2YT + 100 ug/ml carbenicillin were inoculated with the starter culture. The cells were grown to logarithmic phase (OD600 = 0.6) at 37 °C, 200 rpm and induced with 0.4 mM isopropyl-β-D-thiogalactopyranoside (IPTG) for protein expression. The cells were grown for 16 hrs at 16 °C, 200 rpm. The cells were harvested by centrifugation (17,600 x g) at 4 °C for 20 min and frozen at -20 °C.

Frozen cell pellets were re-suspended in 12.5 ml of 1x phosphate-buffered saline (PBS) with 1 mM EDTA, 1 mM DTT, 0.5% Triton X-100 (v/v) and protease inhibitors (1 tablet per 50 ml of buffer, Roche). Sonication (three 2-min cycles of 5 sec “ON”, 5 sec “OFF”, amplitude 25%) was used for cell lysis. The cell debris was removed by centrifugation (26,700 x g) at 4 °C for 20 min. The cell lysate obtained was then incubated with equilibrated glutathione-sepharose 4B resin (GE Healthcare) at 4 °C for 2 hrs. The resin and cell lysate mixture was then applied to a gravity-flow column. The column was washed with buffers (first with 3 ml PBS, second with 3 ml PBS + 150 mM NaCl and finally with 3 ml PBS). The column outlet was then blocked and the resin was incubated with 1 ml elution buffer (100 mM glutathione in 50 mM Tris-Cl, pH 8, 1 mM PMSF, 1 mM EDTA) for 20 minutes. The eluate was collected and kept for further analysis.

SDS-PAGE and spectrophotometry were used to validate the purity and size of the protein and to estimate its concentration. The proteins obtained were immediately aliquoted into smaller volumes, frozen in liquid nitrogen and stored at -80 °C. For troubleshooting the protein purification pipeline, samples were collected after overnight incubation, upon lysis and from the flow-through of the column, and tested using SDS-PAGE.

3.2.3 Library construction and design: The peptide library used for the selections was 16 amino acids in length, where each of the 16 positions can harbour any of 19 amino acids (Cys is not included). The library has a theoretical diversity of 2.88x10^20. The primary library had a diversity of 4x10^10 and a titer of 10^12 cfu/ml. The phagemid used to design the library is listed in Appendix C (pR4STOP).
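The theoretical diversity quoted above follows directly from the library design (19 residues at each of 16 positions). The short calculation below reproduces it and, as an illustration, compares it with the experimentally obtained diversity stated in the text; the variable names are hypothetical.

```python
ALPHABET_SIZE = 19         # 20 amino acids minus cysteine
PEPTIDE_LENGTH = 16        # randomized positions per peptide
LIBRARY_DIVERSITY = 4e10   # diversity of the primary library, as quoted in the text

# Number of distinct 16-mers that the design can encode.
theoretical = ALPHABET_SIZE ** PEPTIDE_LENGTH
# Upper bound on the fraction of sequence space the physical library can cover.
sampled_fraction = LIBRARY_DIVERSITY / theoretical

print(f"theoretical diversity: {theoretical:.2e}")
print(f"fraction of sequence space sampled: {sampled_fraction:.1e}")
```

The physical library therefore samples only a very small fraction of the possible 16-mers, which is typical for long randomized peptides and is why enrichment over several rounds is needed to recover binders.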

The library was re-amplified by infecting actively growing SR320 (OD600 = 0.8), containing 5x10^12 cells, in a 250 ml culture flask. The library was added to the culture such that the ratio of phage to cells was 1:1. The culture was incubated for 30 minutes at 37 °C, 200 rpm. Helper phage (M13KO7) was then added to the culture (such that the ratio of helper phage to bacteria was 10:1) to initiate the packaging of viral particles. After an hour of incubation at 37 °C, 200 rpm, the culture was added to 5 L of 2YT media and grown for 19 hrs at 37 °C, 200 rpm. The cells were pelleted by centrifugation (17,600 x g) at 4 °C for 20 min. The supernatant containing the bacteriophage particles was incubated with 20% v/v PEG/NaCl (20% PEG-8000 (w/v), 2.5 M NaCl) at 4 °C for 20 min. The supernatant was then centrifuged (26,700 x g) at 4 °C for 20 min to obtain the white phage pellet. The remaining supernatant was removed by pipetting. The phage pellets were re-centrifuged (26,700 x g) at 4 °C for 2 min to concentrate the pellet and then re-suspended in 20 ml PBT (1xPBS, 0.05% Tween 20 (v/v) and 0.5% BSA (w/v)). The final library was stored at -80 °C with 10% glycerol (v/v). Phage titrations were performed to estimate the purity and titer of the phage library.

3.2.4 Phage display selections: Phage display was done using the previously established

protocol described by Tonikian et al. [45].

First round: The target proteins were immobilized on a microtiter plate (NUNC Maxisorp 96-well plate) by incubating the proteins overnight at 4 °C. For each protein, five wells were used, with three wells for the protein and two for the negative control (PBS). Each well was incubated with 100 ul of 10 ug/ml purified protein. The overnight-coated wells were blocked with 200 ul of PBT buffer (1xPBS, 0.05% Tween 20 (v/v) and 0.5% BSA (w/v)). The blocked wells were washed three times with PT buffer (1xPBS, 0.05% Tween 20 (v/v)).

The phage library was re-suspended to a final concentration of 5x10^12 phages/ml in PBT buffer (1xPBS, 0.05% Tween 20 (v/v), 0.5% BSA (w/v)), added to each well and incubated for 2 hrs at room temperature. The unbound phages were removed and the wells were washed eight times with PT buffer (1xPBS, 0.05% Tween 20 (v/v)). Bound phages were then eluted by incubating with 0.1 N HCl for 5 minutes at room temperature. The eluted phages were then neutralized using Tris-Cl, pH 11.

The eluted and neutralized phages were incubated in 10 volumes of actively growing XL1 Blue cells (OD600 = 0.6) at 37 °C, 200 rpm for 30 minutes. Helper phage was then added to a final concentration of 10^10 phages per ml to initiate the formation of viral particles. The cells were grown at 37 °C, 200 rpm for 60 minutes. Kanamycin at 50 ug/ml was used to select for cells that had been super-infected with helper phage, and the culture was grown overnight at 37 °C, 200 rpm.

Rounds 2, 3, 4 and 5: The target protein was immobilized on a microtiter plate (NUNC Maxisorp 96-well plate) by incubating the protein overnight at 4 °C. For each protein, five wells were used, with three wells for the protein and two for the negative control (BSA). Each well was incubated with 100 ul of 10 ug/ml purified protein. The overnight-coated wells were blocked with 200 ul of PBT (1xPBS, 0.05% Tween 20 (v/v) and 0.5% BSA (w/v)). The blocked wells were washed three times with PT buffer (1xPBS, 0.05% Tween 20 (v/v)).

Phages obtained from the previous round of selection were collected from overnight cultures. The cells were pelleted by centrifugation (26,700 x g) at 4 °C for 20 min. The virus particles present in the supernatant were incubated with 20% v/v PEG/NaCl (20% PEG-8000 (w/v), 2.5 M NaCl) at 4 °C for 20 min. The supernatant was then centrifuged (26,700 x g) at 4 °C for 20 min to obtain the white phage pellet. The remaining supernatant was removed by pipetting. The phage pellets were re-centrifuged (26,700 x g) at 4 °C for 2 min to concentrate the pellet and then re-suspended in 1 ml PBT (1xPBS, 0.05% Tween 20 (v/v) and 0.5% BSA (w/v)). 100 ul of phages was added to the respective wells and incubated for 2 hrs at room temperature. The unbound phages were removed and the wells were washed eight times with PT buffer (1xPBS, 0.05% Tween 20 (v/v)). Bound phages were then eluted by incubating with 0.1 N HCl for 5 minutes at room temperature. The eluted phages were then neutralized using Tris-Cl, pH 11.

The eluted and neutralized phages were then incubated in 10 volumes of actively growing XL1 Blue cells (OD600 = 0.6) at 37 °C, 200 rpm for 30 minutes. Helper phage was then added to a final concentration of 10^10 phages per ml to initiate the formation of viral particles. The cells were grown at 37 °C, 200 rpm for 60 minutes. Kanamycin at 50 ug/ml was used to select for cells that had been super-infected with helper phage, and the culture was grown overnight at 37 °C, 200 rpm.

In rounds 1 and 2, pre-selection was done for 60 minutes at room temperature on wells coated with 10 ug/ml GST to remove phage clones that bind to the GST tag (present in each purification). In rounds 3, 4 and 5, a 10- to 20-fold excess of GST was added to each well coated with the target domain during incubation of the library to further remove clones that preferentially bind to the GST tag.

3.2.5 Calculation of enrichment ratio: Phage titrations (to determine the number of phages obtained from the protein well and the control well and to calculate the enrichment ratio) were performed at each round of selection. Briefly, 50 ul of phage obtained from each day of selections was added to 450 ul of XL1 Blue (OD600 = 0.6) and incubated for 30 minutes at 37 °C, 200 rpm. The cells were then serially diluted (10-fold dilution series) in 2YT. The various dilutions were spotted on an LB agar plate with 100 ug/ml carbenicillin. The plates were incubated overnight at 37 °C. The same procedure was performed for the phages obtained from the control well (GST/BSA). The next day, colonies were counted on the protein and the control plates to calculate the number of phages present after a round of selection. The enrichment ratio was calculated as the ratio of colonies from the protein well to colonies from the control well. The enrichment ratio was calculated for each round of selection.
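The titration arithmetic described above can be sketched as follows. This is a minimal illustration and not the analysis script used in the study: the function names, the spotted volume (`plated_volume_ml`) and the colony counts are all hypothetical, since the text does not state the volume spotted per dilution.

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Titer of the undiluted sample from a colony count at one dilution.

    dilution_factor is the total fold-dilution of the counted spot,
    e.g. 1e4 for the fourth step of a 10-fold series.
    """
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor / plated_volume_ml


def enrichment_ratio(target_cfu_per_ml, control_cfu_per_ml):
    """Ratio of phage recovered from the target well to the control well."""
    if control_cfu_per_ml <= 0:
        raise ValueError("control titer must be positive to form a ratio")
    return target_cfu_per_ml / control_cfu_per_ml


# Made-up colony counts, for illustration only.
target = cfu_per_ml(colonies=42, dilution_factor=1e5, plated_volume_ml=0.01)
control = cfu_per_ml(colonies=21, dilution_factor=1e3, plated_volume_ml=0.01)
print(enrichment_ratio(target, control))
```

Note that adding 50 ul of phage to 450 ul of cells is itself a 10-fold dilution, so it should be included in `dilution_factor` if absolute titers are needed. It cancels out in the enrichment ratio as long as target and control samples are handled identically.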

3.2.6 Clonal ELISA and sequencing of peptides: 50 ul of eluted phages from rounds 3, 4 and 5 was added to 450 ul of XL1 Blue (OD600 = 0.6) and incubated for 30 minutes at 37 °C, 200 rpm. The cells were then serially diluted (10-fold dilution series) in 2YT. The various dilutions were spotted on an LB agar plate with 100 ug/ml carbenicillin. The plates were incubated at 37 °C overnight. For each protein, 96 colonies were picked and grown overnight at 37 °C, 200 rpm in a 96-well block, in 450 ul of 2YT containing 100 ug/ml carbenicillin and 10^10 phages/ml M13KO7. The overnight cultures were centrifuged at 3400 x g for 15 minutes the next day.

Phage clones were tested for binding to protein, GST and BSA in an ELISA. For each protein, 96 clones were tested in a single microtiter plate (384-well Maxisorp plate, Nunc). The 384-well plate was divided into 96 sections with four wells each. In each section, two wells were coated with 30 ul of 10 ug/ml of protein, one well with 10 ug/ml GST and one well was left empty overnight at 4 °C. The plate was then blocked with 50 ul of PBT buffer (1xPBS, 0.05% Tween 20 (v/v) and 0.5% BSA (w/v)) for two hrs at room temperature. 30 ul of phage supernatant was added to all four wells in each section and incubated for 60 minutes at room temperature. Wells were washed four times with PT buffer (1xPBS, 0.05% Tween 20 (v/v)). Anti-M13 HRP-conjugated antibody was diluted 1:5000 in PBT buffer (1xPBS, 0.05% Tween 20 (v/v), 0.5% BSA (w/v)) and 30 ul was added to each well. The antibody was incubated for 45 minutes at room temperature and then discarded. The wells were washed eight times with PT buffer (1xPBS, 0.05% Tween 20 (v/v)). Colorimetric HRP substrate reagents (TMB substrate, Pierce) were mixed in equal volumes and 25 ul was added to each well and incubated at room temperature for 5-10 minutes with gentle shaking. The reaction was stopped by adding 30 ul of 1 M H3PO4. Absorbance at 450 nm was measured for each well using an ELISA plate reader.
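The division of a 384-well plate into 96 four-well sections can be illustrated with a small mapping function. The text does not specify how the four wells of a section are arranged, so the 2 x 2 quadrant layout, the clone numbering and the name `section_wells` below are assumptions made for this sketch.

```python
import string

# Two target-coated wells, one GST-coated well, one uncoated (BSA-blocked) well.
ROLES = ("target", "target", "GST", "uncoated")


def section_wells(clone_index):
    """Return {well_name: role} for one clone on a 384-well plate (16 rows x 24 columns).

    Assumes clones are numbered 0-95 row by row, as on a 96-well block (8 x 12),
    and that each section is a 2 x 2 block of neighbouring wells.
    """
    if not 0 <= clone_index < 96:
        raise ValueError("clone_index must be in 0..95")
    row96, col96 = divmod(clone_index, 12)
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
    wells = {}
    for (d_row, d_col), role in zip(offsets, ROLES):
        row_letter = string.ascii_uppercase[2 * row96 + d_row]
        column = 2 * col96 + d_col + 1
        wells[f"{row_letter}{column}"] = role
    return wells


print(section_wells(0))
print(section_wells(95))
```

With this layout each clone from position (row, column) of the 96-well block lands in the corresponding quadrant of the 384-well plate, which matches how multichannel pipettes are commonly used, but any consistent four-well assignment would serve the same purpose.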

The enrichment ratio was calculated by comparing the signal intensity in the protein wells with that in the GST and BSA wells. Clones with an enrichment ratio greater than five and a GST background signal of 0.1 or less were selected as true binders. For these binders, the DNA sequence encoding the peptide displayed by the phage clone was amplified using PCR and identified by DNA sequencing. The DNA sequences obtained were processed and translated to obtain the peptide sequences. The peptides against each domain were aligned manually or using multiple sequence alignment tools (Geneious) to obtain peptide logos. To date, I have obtained phage clones that bind specifically to 27 of the 44 domains (61% of the purified domains, 41% of all the domains).
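The two selection thresholds described above (enrichment ratio greater than five, GST background of 0.1 or less) can be expressed as a simple filter. The sketch below is illustrative: the function name, the use of the mean of the replicate target wells, and the choice of the larger control reading as the denominator are assumptions, and the absorbance values are made up.

```python
def is_true_binder(target_a450, gst_a450, bsa_a450,
                   min_ratio=5.0, max_gst_signal=0.1):
    """Apply the two thresholds quoted in the text to one clone.

    target_a450 is a list of A450 readings from the replicate target wells.
    Using their mean, and the larger of the two control readings as the
    denominator, is an assumption of this sketch.
    """
    target = sum(target_a450) / len(target_a450)
    background = max(gst_a450, bsa_a450)
    if background <= 0:
        raise ValueError("control readings must be positive to form a ratio")
    return target / background > min_ratio and gst_a450 <= max_gst_signal


# Made-up readings: (target replicates, GST well, BSA well).
print(is_true_binder([1.2, 1.0], 0.08, 0.06))  # strong target signal, low GST signal
print(is_true_binder([0.9, 0.8], 0.30, 0.05))  # binds GST as well
```

Taking the larger of the two control signals is the more conservative choice, because a clone must then exceed both backgrounds to be called a binder.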

3.2.7 Structural modeling of phage-display obtained results: All structural models were

obtained with Modeller. Modeller was installed on a Linux machine and run from the command

line. Discovery Studio Visualizer was used to analyze the results obtained from Modeller.

Energy minimization of Modeller structures was performed with the Molecular Dynamics (MD)

plug-in available in the licensed version of Discovery Studio.

3.3 Results

3.3.1 Selection of peptide binders using phage display: Purification of each of the 66 domains was attempted using the GST purification protocol described in the methods section. Purified proteins were obtained for 44 out of 66 domains (67%). Table 5 contains all the results obtained from protein purification. For most domains, I was able to obtain sufficient protein for performing phage display. SDS-PAGE gels were run to check the size and purity of the proteins, and to diagnose the entire expression and purification process. The 22 domains that could not be purified showed high expression of protein. However, in all such cases, the expressed protein was insoluble and went into the cell debris upon lysis. Further optimization may be required to obtain these proteins in soluble form. In this study, we therefore continued with the 44 domains and used them as targets for phage display.


The original 16-aa peptide library had a diversity of 4x10^10 unique peptides and a

phage titer of 5x10^12 cfu/ml. Upon re-amplification, a phage titer of 2.5x10^11 cfu/ml was

obtained post-infection, providing a 10-fold coverage of the original library diversity. Upon

amplification, a library titer of 2x10^13 cfu/ml was obtained.

The selections were performed using a protocol modified from the one previously described by Tonikian et al. [45]. A subset of phage clones present in the library may bind more tightly to the GST tag than to the target domain. These may lead to spurious or false positive results. To remove such phage clones, pre-selection was done on GST-coated wells. Further negative selection was performed by adding 10-fold excess GST to the target domain-coated well during selection. For most proteins, strong enrichment ratios were obtained upon negative selection. This suggests that the negative selections were effective in removing strong GST-binding phage clones from our library. Table 5 shows the enrichment ratio obtained after each round for all 38 targets against which phage selections were done. For 27 targets, I obtained enrichment in selections.

3.3.2 Validation of tight and specific binders using clonal ELISA: For each protein with a significant pool ELISA signal, I picked 96 clones for clonal ELISA. The ELISAs were done in a 384-well plate. For each phage clone, four wells were used: two wells were coated with the target domain while the other two wells were used for the negative controls, GST and BSA. The clones that gave an enrichment ratio greater than five were selected. The DNA sequence encoding the peptide for each of the selected clones was amplified by PCR and sent for sequencing.

3.3.3 Identification of binding preferences and literature validation: Peptide binders were

obtained for 27 domains. The Geneious toolkit was used to align all peptide sequences obtained for

each of the selected domains. No gaps were allowed in the alignment. Alignments obtained from

Geneious were improved manually. For 22 of these 27 domains, sufficient numbers of peptides

were available to generate a position weight matrix (PWM) that represents the binding

preferences of these domains.
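The step from an ungapped alignment to a matrix of binding preferences can be sketched as below. This is a simplified frequency matrix without pseudocounts or background correction, not the Geneious or WebLogo procedure used in this study. The function names, the 0.6 consensus threshold and the four example peptides are all hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequency_matrix(peptides):
    """Per-position residue frequencies for equal-length, ungapped peptides."""
    if len({len(p) for p in peptides}) != 1:
        raise ValueError("peptides must be aligned to a single common length")
    n = len(peptides)
    matrix = []
    for column in zip(*peptides):
        counts = Counter(column)
        matrix.append({aa: counts.get(aa, 0) / n for aa in AMINO_ACIDS})
    return matrix


def consensus(matrix, threshold=0.6):
    """Most frequent residue per position, or 'x' if below the threshold."""
    letters = []
    for column in matrix:
        residue, freq = max(column.items(), key=lambda item: item[1])
        letters.append(residue if freq >= threshold else "x")
    return "".join(letters)


# Hypothetical aligned peptides, invented for illustration only.
aligned = ["GWDEWV", "GWSAWV", "AWDKWV", "GWETWI"]
pfm = position_frequency_matrix(aligned)
print(consensus(pfm))
```

Positions where no residue dominates are reported as "x", matching the motif notation used in this chapter.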

The 27 domains for which phage display peptides were obtained belong to 20 different

domain families and exhibit distinct binding preferences. The divergence in peptide binding

preferences highlights the power of phage display in generating specific peptide binders.


No. | Protein name | Protein expression | Protein yield (mg/ml) | Pool ELISA | Clonal ELISA | Sequence logo | Comment

PDZ (2.30.42.10)
1 | Disc-large homolog 1 PDZ 1 | Yes | 2.82 | 1.5 | - | - | Non-specific binders
2 | Disc-large homolog 1 PDZ 2 | Yes | 4.2 | 5 | - | - | Non-specific binders
3 | Disc-large homolog 2 PDZ 2 | Yes | 3.74 | 23 | 19(18) | (logo image) | -
4 | Disc-large homolog 2 PDZ 3 | Yes | - | - | - | - | No protein in lysate
5 | Disc-large homolog 4 PDZ 2 | Yes | 1.9 | 12 | 19(5) | (logo image) | -
6 | Disc-large homolog 4 PDZ 3 | Yes | - | - | - | - | No protein in lysate

SH3 (2.30.30.40)
7 | Growth-factor receptor bound protein 2 | Yes | 0.9 | 56 | 82(30) | (logo image) | -
8 | Grb2-related protein 2 | Yes | 1.44 | 444 | 82(77) | (logo image) | -
9 | Phospholipase gamma | Yes | 1.21 | 205 | 17(14) | (logo image) | -
10 | Sorbin and SH3 containing protein 2 | Yes | 1.32 | 277 | 82(54) | (logo image) | -

Protein kinase (3.30.200.20+1.10.510.10)
11 | Mitogen-activated kinase 3 | Yes | - | - | - | - | No protein in lysate
12 | PKA | Yes | - | - | - | - | No protein in lysate
13 | PKB | Yes | - | - | - | - | No protein in lysate
14 | Serine/threonine kinase 2 | Yes | - | - | - | - | No protein in lysate
15 | Aurora kinase B | Yes | 0.30 | 3 | - | - | Non-specific binders

G-alpha subunit (3.40.50.300+1.10.400.10)
16 | G-alpha (i) 1 | Yes | 1.33 | 14.3 | 43(12) | (logo image) |
17 | G-alpha (i) 3 | Yes | 0.85 | 18.7 | - | - | Non-specific binders
18 | G-alpha (o) 1 | Yes | 1.12 | 25 | - | - | Non-specific binders

Ligand binding domain of nuclear receptor (1.10.565.10)
19 | Bile acid receptor | Yes | 0.60 | 13.3 | 22(12) | (logo image) | -
20 | Retinoic acid receptor-gamma | Yes | - | - | - | - | No protein in lysate
21 | Glucocorticoid receptor | Yes | - | - | - | - | No protein in lysate
22 | Pregnane X receptor | Yes | - | - | - | - | No protein in lysate

Dynein light chain (3.30.740.10)
23 | Dynein light chain 1 | Yes | 1.18 | 233 | 69(51) | (logo image) | -
24 | Dynein light chain 2 | Yes | 1.21 | 63 | 25(25) | (logo image) | -

RNA recognition module (3.30.70.330)
25 | Splicing factor U2AF1 | Yes | 1.28 | 0.5 | - | - | No enrichment
26 | Splicing factor 45 | Yes | - | - | - | - | No protein in lysate

Profilin (3.30.450.30)
27 | Profilin 1 | Yes | 4.51 | 15 | - | - | Non-specific binders
28 | Profilin 2 | Yes | 1.13 | 12 | - | - | Non-specific binders

Penta-EF hand (1.10.238.10)
29 | Programmed cell death protein 6 | Yes | 0.55 | 100 | 55(55) | (logo image) | -
30 | Calpain small regulatory subunit 1 | Yes | 1.91 | 63 | 85(12) | (logo image) | -

Actin (3.30.420.40+3.90.640.10)
31 | Actin-gamma 1 | Yes | - | - | - | - | No protein in lysate
32 | Actin-gamma 2 | Yes | - | - | - | - | No protein in lysate

Beta-propeller (2.130.10.10)
33 | Clathrin heavy chain 1 | Yes | 1.63 | 250 | 47(22) | (logo image) | -
34 | WDR5 | Yes | 1.24 | 80 | | (logo image) | -

PH/PTB domain (2.30.29.30)
35 | Dynamin 2 | Yes | 0.44 | - | - | - | Phage display not done
36 | Disabled homolog 2 | Yes | - | - | - | - | No protein in lysate

P-loop containing nucleotide triphosphate hydrolase (3.40.50.300)
37 | RAC3 | Yes | 0.84 | 5 | - | - | Non-specific binders
38 | RAD51 | Yes | 0.20 | 15 | - | - | Non-specific binders

Trypsin-like serine protease (2.40.10.10)
39 | Tissue-type plasminogen activator | Yes | 0.55 | - | - | - | Phage display not done
40 | Acrosin | Yes | 0.61 | - | - | - | Phage display not done

14-3-3 (1.20.190.20)
41 | 14-3-3 eta | Yes | 1.80 | 33.3 | 56(9) | (logo image) | -

AP50 domain (2.60.40.1170)
42 | AP-2 subunit mu | Yes | - | - | - | - | No protein in lysate

Bcl-2 (1.10.437.10)
43 | Bcl-2 like protein 1 | Yes | - | - | - | - | No protein in lysate

Bro1 (1.25.40.280)
44 | Alix | Yes | 0.80 | 22.2 | 15(7) | (logo image) | -

CAP/Gly (2.30.30.190)
45 | Dynactin subunit 1 | Yes | 2.22 | 18 | 2(1) | GQDEWVPWQLWSWQESI | No sequence logo

Caspase-like (3.40.50.1460)
46 | Caspase 2 | Yes | - | - | - | - | No protein in lysate

DNAse1-like (3.60.10.10)
47 | DNAse 1 | Yes | - | - | - | - | No protein in lysate

eIF4E (3.30.760.10)
48 | eIF4E | Yes | 0.44 | 30 | 6(6) | FLYYYGLSHNWFGDQT; LVPWWWRVEQTMDPVI; SVWWFGQTPYVLWEAS; RVMIWWWLTQGIPFSF; NLYYNNMYWQWYEWLN; PWSWFTYREQLETENV | No sequence logo

FERM (1.20.80.10)
49 | Band 4.1-like protein 3 | Yes | - | - | - | - | No protein in lysate

Ig domain (2.60.40.10)
50 | Immunoglobulin lambda-like polypeptide 1 | Yes | - | - | - | - | No protein in lysate

Importin-beta (1.25.10.10)
51 | Importin beta-1 | Yes | 0.36 | 33 | 10(10) | (logo image) | -

Mad2A (3.30.900.10)
52 | MAD2-like protein 1 | Yes | 0.49 | 75 | 16(13) | (logo image) | -

MHC II (3.10.320.10)
53 | DRB1 beta | Yes | - | - | - | - | No protein in lysate

PCNA (3.70.10.10)
54 | Proliferating cell nuclear antigen | Yes | 0.34 | 40 | 2(1) | GARQTLITDWLMVSSD | No sequence logo

OB-fold domain (2.40.50.140)
55 | Replication factor A 70 | Yes | 4.41 | 94 | 66(11) | (logo image) | -

Serpin (3.30.497.10+2.30.39.10)
56 | Plasma serine protease inhibitor | Yes | - | - | - | - | No protein in lysate

SH3-type barrels (3.40.50.300/2.30.30.40)
57 | Voltage-dependent L-type calcium channel subunit beta | Yes | - | - | - | - | No protein in lysate

SWIB/MDM2 (1.10.245.10)
58 | MDM4 | Yes | 3.40 | 150 | 26(24) | (logo image) | -

TAP-UBA (1.10.8.10)
59 | Nuclear export factor 1 | Yes | 2.57 | 45 | 24(9) | (logo image) | -

Winged helix repressor DNA binding domain (1.10.10.10)
60 | Transcription factor IIF | Yes | 1.99 | 7.5 | - | - | Non-specific binders

TRFH (1.25.40.201)
61 | Telomeric repeat-binding factor 1 | Yes | 1.42 | 15 | 8(4) | LGHTTAEMIDYMELQW; SFPLEFTTDYMYNLMA; MLFDDEAMYNWQWHLM; EHSFLFEDWMWEGKDH | No sequence logo

Factor Xa inhibitor (4.10.410.10)
62 | Amyloid-like protein 2 | Yes | 0.79 | - | - | - | Phage display not done

Ubiquitin-like (3.10.20.90)
63 | Ubiquitin-60S ribosomal protein L40 | Yes | 3.90 | 22 | 5(4) | EHMWDAQMWEWSWWDL; EMWVFTPAEWFQIYLN; MTVVEWWTDAQIAEWM; DLHYDWSLEYWTSLLQ | No sequence logo

Vinculin (1.20.1490.10)
64 | Vinculin | Yes | 1.64 | 65 | 33(26) | (logo image) | -

XRCC4 (2.170.210.10+1.20.5.370)
65 | XRCC4 | Yes | 0.88 | - | - | - | Phage display not done

Tyrosine phosphatase (3.90.190.10)
66 | Tyrosine-protein phosphatase non-receptor type 22 | Yes | 0.28 | - | - | - | Phage display not done

Table 4: Summary of phage display results for 66 domains. SDS-PAGE gels were run to determine protein expression in the whole-cell lysate. OD at 280 nm was used to determine the protein yield (shown in mg/ml). The enrichment ratio for pool ELISA is the maximum ratio of colony forming units per ml obtained from the protein-coated and empty plates for a given domain. The clonal ELISA column shows the number of sequences with an enrichment ratio > 5 and a background signal < 0.1; the number of unique sequences obtained from sequencing is given in brackets. The sequence logo obtained from phage display is also included. For domains for which the number of sequences was insufficient to generate a sequence logo, the peptide sequences are listed instead (key residues are highlighted in bold). Rows are colour-coded: proteins that were not purified are shown in blue, proteins for which phage display was not done in orange, proteins for which no peptide sequences were obtained in green, and proteins for which a sequence logo could not be obtained in purple.


To rationalize the binding preferences obtained from phage display, an extensive structural analysis was done using the available structures of the protein domains in complex with their known peptide ligands (Figure 11). Below, I present an in-depth analysis of all the domains for which phage display results were obtained. These domains belong to different cellular pathways, and hence I have divided the 27 domains into five sections based on their biological function (Figure 12). This helps in understanding the potential uses of the peptide probes generated by phage display and the various cellular processes that can be targeted using these peptides.

Figure 11: Strategy for validating phage display results. To interpret the results from phage display, an extensive literature review and structural analysis was done. For each domain, the peptide sequences were aligned using the alignment tool in Geneious with a high gap penalty. The sequences were then visualized as a position weight matrix using WebLogo. Alignments that showed no consensus were improved manually. Structural analyses of existing structures of domains in complex with peptides were used to further improve the alignments. This analysis was used to generate a model for the binding of phage-derived peptides to the target domain.

3.3.4 Cellular signalling: Based on the literature analysis, ten of the 27 domains were present on proteins involved in signalling networks, including kinase and G-protein networks. These included previously well-studied domains such as the SH3, PDZ and 14-3-3 domains.


Figure 12: Overview of phage display results. Based on the literature review, the 27 domains were divided into four distinct biological functions: cellular signalling, cytoskeleton regulation, intracellular transport and genome regulation. Domains that did not fit into any of the four categories are shown as miscellaneous.


3.3.4.1 SH3: Src Homology 3 (SH3) protein interaction domains participate in a diverse set of signalling pathways by binding to linear motifs [5]. These domains preferentially bind to proline-rich motifs (PxxP) with affinities ranging from Kd = 1 to 200 µM. The selectivity of SH3 domains has been studied in detail, and consensus motifs have been predicted using yeast two-hybrid, phage display, alanine scanning and structure determination [5].

The SH3 domain is a 60 amino acid domain with a beta-barrel fold, which consists of 5 or 6 β-strands arranged as two tightly packed anti-parallel β-sheets (Figure 13) [24]. The interaction surface (between the RT and N-src loops) is relatively flat and hydrophobic, with three shallow grooves defined by conserved aromatic residues. The peptide adopts an extended, left-handed conformation (polyproline-2 or PPII helix). Sequences lacking the PxxP motif are also known to bind to SH3 domains. Grap2-SH3 in our study is an example of a domain that prefers the RxxK motif [46]. Crystal structures have confirmed that the RxxK motif binds to a different binding region on the SH3 domain (Figure 13).
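Scanning a peptide for the two SH3 motif classes mentioned above can be sketched with regular expressions. The motif definitions (PxxP and RxxK, with x as any residue) come from the text; the function name and the example peptide are hypothetical. A lookahead is used so that overlapping occurrences are all reported.

```python
import re

# x stands for any residue; the lookahead lets overlapping matches be found.
MOTIFS = {
    "PxxP": re.compile(r"(?=(P..P))"),
    "RxxK": re.compile(r"(?=(R..K))"),
}


def find_motifs(peptide):
    """Return (motif name, 0-based start, matched residues) sorted by start."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(peptide):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical peptide, invented for illustration only.
for hit in find_motifs("SPPTPPRAAKLE"):
    print(hit)
```

A peptide that carries both motif classes, as in this invented example, would be a candidate for contacting both binding surfaces described in Figure 13.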

Figure 13: Structural and literature analysis of SH3 domains. (A) SH3 domains are known to bind peptides using two distinct binding surfaces: binding surface 1 binds PxxP motifs and binding surface 2 binds RxxK motifs. Some SH3 domains (such as the C-terminal SH3 domain of Grb2 (PDB ID: 2VWF), shown in the figure) bind their interaction partners using both binding surfaces. (B) The binding preferences obtained from phage display in comparison to previous results. In each panel, the logo on the top shows the binding preferences obtained in our study. The sequence logos at the bottom of three panels (GRAP2, PLCG1 and SORBS2) were obtained from a large-scale phage display screen performed by Dr. Haiming Huang in the Sidhu lab (results unpublished). The phage display logo for the N-terminal SH3 domain of Grb2 was obtained from the study by Sparks et al. [47]. The binding preferences obtained in this study match those from previous phage display experiments.

In this study, four SH3 domains were targeted: the N-terminal SH3 domain of Grb2, the C-terminal SH3 domain of GRAP2, the SH3 domain from PLCG1 and the second SH3 domain from SORBS2. For each of the SH3 domains, high levels of enrichment were obtained in pool ELISA. Further, for all four SH3 domains, we obtained a large number of unique sequences that showed high enrichment ratios in clonal ELISA. The binding preferences of these SH3 domains have previously been elucidated using phage display by the Sidhu lab and other groups. The SH3 domains were therefore selected to serve as positive controls for our experimental pipeline and to validate our screening method and peptide library. As expected, the phage display results match previously generated binding preferences (Figure 13). The positive results obtained for the SH3 domains indicate that the diversity and display levels of the peptide library are sufficient for elucidating the binding preferences of PRDs.

3.3.4.2 PDZ: PDZ domains are peptide-binding domains that bind to hydrophobic C-terminal motifs of proteins. They regulate multiple cellular processes, acting as scaffolds involved in protein-protein interactions. PDZ domains are ~90 aa in length and have a conserved fold consisting of 5-6 β-strands and 2-3 α-helices [28]. These domains have a single binding site in a groove between the α2 and β2 structural elements, with a highly conserved carboxylate-binding loop ([R/K]xxxGΦGΦ motif, where x is any amino acid residue and Φ is a hydrophobic residue) located before the β2 strand that typically recognizes the extreme C-termini of target proteins (Figure 14). PDZ domains have well-defined binding preferences, and previous work by Tonikian et al. [28] identified the C-terminal binding preferences of 72 human PDZ domains. A subset of PDZ domains, such as the syntrophin and Par6 PDZ domains, are also known to bind internal motifs (Figure 14) [48]. In the Sidhu lab, an internal peptide phage library has previously been used to identify the internal peptide binding mode of Dvl2-PDZ [36].

In the current study, six PDZ domains were selected: the first and second PDZ domains of DLG1, the second and third PDZ domains of DLG2 and the second and third PDZ domains of DLG4. Of these six PDZ domains, four were successfully purified. All four domains were used as targets for phage display, and two of them yielded phage clones with high enrichment ratios. The peptide sequences obtained from the phage selections were aligned to obtain the results shown in Figure 14. The sequence logos obtained for these PDZ domains were similar to the internal binding mode observed for the Par6-PDZ domain, suggesting that the two PDZ domains may also bind to internal ligands. These observations have to be validated by further experiments.

Figure 14: Structural and literature analysis of PDZ domains. (A) PDZ domains are known to bind to C-terminal peptides, where the free COOH group binds to the carboxylate-binding pocket. In selected PDZ domains, peptide binding can occur via a beta-hairpin motif (syntrophin PDZ) or a Par6 internal binding motif, where a negatively charged residue at site +1 compensates for the COOH group [48]. (B) The sequence logos obtained for the second PDZ domains of DLG2 and DLG4. The binding preference is similar to the canonical PDZ internal binding motif observed for the Par6-PDZ domain. (C) The structure of the Par6-PDZ domain in complex with a peptide obtained from Pals1 (PDB ID: 1RZX). The DLG2-PDZ2 motif shows conservation for Glu and Thr at positions -2 and -3 (Glu and Met in yellow); Ile/Leu at position 0 (Val in yellow); Asp at position +1 (Asp in yellow) and Pro at position +3 (Pro in yellow) of the Pals1 internal ligand. The DLG4-PDZ2 domain shows a similar but much weaker pattern (owing to the smaller number of unique sequences obtained). These results predict that phage-derived peptides bind to DLG2-PDZ2 and DLG4-PDZ2 at the peptide-binding pocket in an internal binding mode.

3.3.4.3 G-alpha subunit of hetero-trimeric G-proteins: Guanine nucleotide-binding proteins are an important family of cell-signalling molecules that regulate key cellular pathways [14]. The alpha subunit of G-proteins binds to G protein-coupled receptors (GPCRs) and acts as a GTPase. Upon binding of ligand to a GPCR, an exchange of GDP for GTP occurs in the Gα subunit. This active Gα dissociates from the inactive G-protein complex and acts on its downstream effectors via its GTPase activity. Structurally, Gα consists of two domains: a GTPase domain and an alpha-helical domain. The GTPase domain is similar in structure to p21ras and other members of the GTPase super-family of proteins and contains five helices surrounding a six-stranded beta-sheet, with five strands running parallel and one strand running anti-parallel to the others. The second of the five helices is a 3(10) helix rather than an alpha helix. The alpha-helical domain is unique to the Gα subunits and has a long central helix surrounded by five shorter helices. The alpha-helical domain is joined to the GTPase domain by two extended strands, linker 1 (res 54-58) and linker 2 (res 173-179). Between these two linking segments lies a deep cleft within which the nucleotide (GTP or GDP) is tightly bound. Phage display and mRNA display have been used to obtain peptide antagonists/agonists of the GTPase activity of Gα [14, 54]. Both methods generated peptides that bind to the hydrophobic pocket between the α3 helix and the switch II helix. The switch II/α3 binding pocket is also the binding site of the RGS14 GoLoco motif, an important regulator of Gα activity.

In this study, I targeted three G-alpha domains: Gαi1, Gαi3 and Gαo1. All the G-alpha domains were successfully expressed and purified in sufficient quality and quantity to perform phage display experiments. However, upon selection, peptides were obtained against only one G-alpha domain, Gαi1, which has been targeted in previous studies. This may be due to the absence of GTP/GDP during selection, which may be required for stabilizing the structures of Gαi3 and Gαo1. For Gαi1, 12 peptides were obtained, which were aligned to generate a consensus motif: “ΦWexeWV” (where Φ is a hydrophobic residue and e is a negatively charged residue). The consensus motif is distinct from the peptides obtained in previous combinatorial library studies (Figure 15). Structural analysis of Gαi1 with the KB-752 peptide suggests that phage-derived peptides may bind to the same site as KB-752, albeit in a different mode (Figure 15).

Figure 15: Structure and literature analysis of Gα subunits. (A) The structure of Gαi1 in complex with GDP (red) and the KB-752 peptide (PDB ID: 1Y3A). (B) The binding logo obtained for Gαi1. The residues that are conserved in the sequence correspond to Trp10, Trp14 and Phe15. (C) The interaction surface of KB-752 and Gαi1. The residues on KB-752 that are important for the interaction are Trp, Phe, Asp and Leu (shown in yellow). A similar pattern is observed in the phage display binding motif, albeit with a spacing of two residues between Trp and the negative charge compared to one in KB-752, and a preference for Trp and Phe at positions 14 and 15 instead of Phe and Leu as found in KB-752.

G-protein signalling is one of the major signalling pathways used by cells and has been implicated in a number of disorders. This is highlighted by the fact that 25% of marketed pharmaceuticals target GPCRs [55]. Gαs oncogenes have been shown to increase carcinogenicity and metastasis, and the recent identification of Gαs-hyperactivating mutations in kidney cancer indicates that the subunit could be a therapeutic target in developed tumours [55]. Peptide probes developed here may be used to modulate the activity of Gα domains.

3.3.4.4 14-3-3: The 14-3-3 family of proteins plays a key role as scaffolding proteins in a number of cellular signalling pathways [56]. Seven family members have been reported in humans and are expressed in all human tissues, except for 14-3-3 sigma, which is specific to epithelial cells. Structurally, 14-3-3 proteins contain a single domain that forms an alpha-alpha super-helical structure harbouring a conserved amphipathic groove that forms the binding pocket. 14-3-3 proteins are generally known to bind to phosphorylated serine residues on their binding partners with sub-micromolar affinity [57]. Non-phosphorylated peptide ligands have also been identified. Exoenzyme S (ExoS), a toxin produced by Pseudomonas aeruginosa, displays high affinity towards 14-3-3 zeta/delta and binds to the same amphipathic groove that is responsible for binding phosphorylated peptides [58]. Phage display has also been used to identify cyclic peptides that bind 14-3-3 zeta/delta. The peptide with high affinity contained a “WLDLE” motif that was essential for binding [59].

Figure 16: Structural and literature analysis of 14-3-3. (A) The structure of 14-3-3 zeta/delta in complex with the ExoS peptide (PDB ID: 2O02). (B) The binding logo obtained for 14-3-3 eta. (C) The interaction surface of 14-3-3 zeta/delta and the ExoS peptide. The sequence logo shows conservation for residues Glu(8)-Trp(9)-Leu(10)-Asp(11)-Leu(12)-Ala(13). These correspond to the Asp-Ala-Leu-Asp-Leu-Ala residues present in the ExoS peptide (shown in yellow).

Of the seven 14-3-3 isoforms, 14-3-3 eta was found in the list obtained from whole-genome RNAi screens. The peptides I obtained from phage display are similar to the ExoS peptide and to peptides obtained in previous phage display experiments (Figure 16). To further understand the binding preference from phage display, I used the structure of 14-3-3 zeta/delta with ExoS [58]. The amino acids forming the interaction surface are identical in 14-3-3 zeta/delta and 14-3-3 eta, and hence both domains should have similar (if not identical) binding preferences. The ExoS peptide and the cyclic phage-derived peptide have been shown to bind to 14-3-3 eta. Hence, we predict that the peptides against 14-3-3 eta will bind to the same interaction surface in a mode that is similar to the binding of the ExoS peptide (Figure 16).

3.3.4.5 Penta-EF hand domains: Penta-EF hand (PEF) domains are a family of Ca2+-binding domains that are composed of five EF-hand motifs [83]. The EF-hand is a helix-loop-helix structure characterized by a conserved 12-residue inter-helical sequence that co-ordinates a Ca2+ ion. EF-hand motifs are present in a multitude of proteins, usually in multiple copies. Penta-EF hand domains comprise eight alpha helices, which form the five EF-hands: α1-α2 (EF1), α3-α4 (EF2), α4-α5 (EF3), α6-α7 (EF4) and α7-α8 (EF5). Based on sequence similarity, penta-EF hand domains can be divided into two groups: Group I PEF domains (PDCD6 and peflin) and Group II PEF domains (calpain sub-family members, sorcin and grancalcin). In this study, I targeted two PEF domains: Calpain small regulatory subunit 1 and Programmed cell death protein 6. PEF domains have not previously been studied using phage display, and hence this study is the first to elucidate penta-EF hand binding preferences.

a. Calpain small regulatory subunit 1: Calpains are intracellular Ca2+ dependent cysteine

proteases that play key roles in cells and have been implicated in a number of cellular processes

such as signal transduction, apoptosis, and cytoskeleton modelling [84]. The calpain proteolytic

system consists of a small subunit, which acts as a Ca2+ dependent adaptor, a large subunit that

contains the catalytic site and an endogenous calpain-specific inhibitor, calpastatin. Calpastatin is

ubiquitously expressed and blocks the protease activity of calpain by binding to three sites on the

calpain protease: the active site and domain V on the large regulatory subunit and penta-EF hand

domain on the small regulatory subunit [84].

In this study, I focussed on the small regulatory subunit of calpain. The small subunit is required for proper functioning of the calpain large subunit and acts as a chaperone to stabilize the calpain protease system. The calpain small regulatory subunit harbours a hydrophobic binding surface that binds to a peptide derived from calpastatin: DAIDALSSDFT. The binding preference obtained from phage display, with the motif DLxxWLxxDM, is similar to that of the calpastatin-derived peptide (Figure 17). Hence, these peptides should competitively inhibit the interaction of the calpain small regulatory subunit and calpastatin.
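The similarity between the phage-derived motif and the calpastatin peptide can be made concrete with a simple ungapped comparison. The two sequences are taken from the text. The residue classes, the scoring scheme (2 for identity, 1 for the same class, wildcard positions ignored) and the function names are assumptions chosen for illustration, not the analysis performed in this study.

```python
# Coarse physicochemical classes (an assumption made for this sketch).
CLASSES = {
    "hydrophobic": set("AVILMFWY"),
    "acidic": set("DE"),
    "basic": set("KRH"),
    "polar": set("STNQ"),
}


def residue_class(residue):
    """Class name for a residue; G, P and C each form their own class."""
    for name, members in CLASSES.items():
        if residue in members:
            return name
    return residue


def score_window(motif, window):
    """Score a motif against an equal-length window, skipping 'x' positions."""
    score = 0
    for m, r in zip(motif, window):
        if m == "x":
            continue
        if m == r:
            score += 2
        elif residue_class(m) == residue_class(r):
            score += 1
    return score


def best_offset(motif, peptide):
    """Return (score, offset) of the best ungapped placement of the motif."""
    if len(peptide) < len(motif):
        raise ValueError("peptide is shorter than motif")
    candidates = [(score_window(motif, peptide[i:i + len(motif)]), i)
                  for i in range(len(peptide) - len(motif) + 1)]
    return max(candidates, key=lambda pair: pair[0])


print(best_offset("DLxxWLxxDM", "DAIDALSSDFT"))
```

Under these assumptions the best placement is at offset 0, where the two Asp residues and the central Leu are identical and the remaining defined positions are hydrophobic in both sequences.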

Calpains have previously been implicated as drug targets in a number of disorders, including cancer and neurodegenerative diseases. Considerable effort has gone into designing small-molecule and peptide-based drugs that target calpain. Almost all of these drugs target the active site of calpain to inhibit its activity. However, given the similarity between the active sites of calpain and other cysteine proteases, most of these compounds are non-specific. Calpastatin is the endogenous and most specific inhibitor of calpain, but it is not stable inside cells and hence cannot be used for therapeutic applications. Thus, there is an urgent need to develop high-affinity, high-specificity inhibitors of calpain. The peptides identified here by phage display may serve as templates for developing peptide inhibitors against the calpain protease system.

Figure 17: Structure and literature analysis of the penta-EF hand of CAPNS1. (A) The structure of the CAPNS1 penta-EF hand domain in complex with a calpastatin-derived peptide (PDB ID: 1NX1). (B) The binding motif for the CAPNS1 penta-EF hand domain obtained from phage display, aligned with the peptide derived from calpastatin. (C) The interaction surface between the CAPNS1 penta-EF hand and the calpastatin-derived peptide. The calpastatin peptide forms an alpha-helical structure that binds to the binding pocket on the penta-EF hand. The key interactions are made by two Asp, two Ala, one Leu and one Phe residue on the peptide (highlighted in yellow). The nature and position of these residues are conserved in the binding motif. Hydrophobic residues are replaced by aromatic residues in the binding motif, indicating that the hydrophobic surface is flexible and can accommodate bulkier residues.

b. Programmed cell death protein 6: Programmed cell death protein 6 or PDCD6 (Alg-2) functions as a Ca2+-dependent adaptor protein in the ESCRT and ER-to-Golgi transport systems. Alg-2-interacting proteins commonly contain Pro-rich regions, and Alg-2 recognizes at least two distinct Pro-containing motifs: PPYP(x)nYP (Alix, PLSCR3) and PxPGF (Sec31A, ABM-2) [85]. The binding of the PPYP(x)nYP peptide occurs at a groove that contains two peptide-binding hydrophobic pockets [86]. The structural basis for the binding of Alg-2 to PxPGF peptides remains to be established. Mutational and competitive binding analyses have shown that the PxPGF peptides bind to a pocket (or pockets) different from that of the PPYP(x)nYP peptides.

From phage display, I obtained 55 unique peptides against PDCD6. Interestingly, all these peptides contained a prominent GWxxWV motif (Figure 18). This motif is distinct from the PPYP(x)nYP motif, which binds to a structurally defined binding surface. However, this motif does partially overlap with the PxPGF motif, suggesting that these peptides may bind to the same surface. The binding surface for this motif is not known structurally, and phage-derived peptides can be used to identify it. It has been shown that knock-down of Alg-2 leads to growth defects in cancer cell lines via cell cycle arrest at the G2/M checkpoint [87]. However, the molecular mechanism by which Alg-2 is involved in cancer-related pathways is still unclear. Peptide probes developed in this study can be used to investigate the biological role of Alg-2 in specific cancer cell lines.

Figure 18: Structure and literature review of the penta-EF hand of PDCD6. (A) The structure of the PDCD6 penta-EF hand domain in complex with an Alix-derived peptide (PDB ID: 2ZNE). (B) The binding motif for the PDCD6 penta-EF hand domain obtained from phage display. (C) The interaction surface between the PDCD6 penta-EF hand domain and the Alix-derived peptide, showing that pocket 1 and pocket 2 bind to the two PYP motifs on the Alix peptide (highlighted in yellow). The phage-derived motif is distinct from the Pro-rich motif found in Alix and hence is predicted to bind to a distinct binding site.

3.3.5 Cytoskeleton regulation: Four domains in our study were found to be present on proteins involved in regulation of cytoskeleton structure. These include two domains of the dynein light chain family, one domain from the CAP/Gly domain family and one domain from the alpha-catenin/vinculin head domain family.

3.3.5.1 Dynein light chain: Dynein light chains (DLC) are domains found on mono-domain proteins: the light chains of the cytoplasmic motor protein dynein (DYL1 and DYL2). Structurally, DLC domains contain three beta-sheets and three alpha helices in a two-layer alpha-beta core structure. DLCs are peptide recognition domains that bind to a large range of proteins such as Pak1 kinase, Bim (pro-apoptotic) and many viral proteins [49]. Previous studies have found two motifs that bind to DYL1, GIQVD and KxTQT, both of which bind in an anti-parallel beta-sheet conformation to the binding groove (Figure 19). Recent studies have also used phage display to determine the binding preference of the DLC from DYL1 (Figure 13) [50].

The peptide profile obtained via phage display looks similar to the motifs obtained from natural peptides and previous phage display studies. However, some features of the motif obtained by phage display differ; for example, there is a surprising preference for M/W (Φ) instead of K/R at position -3.

DYL proteins act as essential hub proteins that are involved in a range of cellular pathways, such as cytoskeleton-related processes (including intracellular transport), autophagy and apoptosis. Over-expression of DYL1 has been shown in a number of tumour types [49]. However, the mechanism by which DYL1 and DYL2 are involved in carcinogenesis is poorly understood. Peptide-based inhibitors developed in this study may help in critically evaluating the role of DYL1 in various cancer-related pathways.

Figure 19: Structural and literature analysis of dynein light chains. (A) The structure of Dynein light chain 1 in complex with the peptide derived from Swallow (PDB ID: 3E2B). (B) The sequence logos obtained from phage display for Dynein light chain 1 (DYL1) and Dynein light chain 2 (DYL2). DYL1 and DYL2 share high sequence similarity and hence are predicted to have similar binding preferences, which is observed in the phage display results. A few differences exist between the two logos, including a stronger preference for Gly and Ala at position 4 and for Asp and Glu at position 9 in DYL1 compared to DYL2. (C) Binding interaction of Dynein light chain 1 and the Swallow peptide. The key interactions are mediated by the Lys(-3)-Ala(-2)-Thr(-1)-Gln(0)-Thr(1)-Asp(2) residues present on the Swallow peptide. The central Thr(-1)-Gln(0)-Thr(1) residues are conserved in the sequence logo obtained for DYL1. A few differences are also observed: Met and Trp are possible at Lys(-3); Ser and Thr are possible at Thr(2); and Asp and Glu are possible at Asp(5) on the Swallow peptide.


3.3.5.2 CAP/Gly domain: CAP/Gly domains are Gly-rich domains found in a number of cytoskeleton-associated proteins (CAP) that bind to C-terminal peptides with the motif EEY/F-COOH [98]. CAP/Gly domains are extensively involved in cellular processes including chromosome segregation, establishment and maintenance of cell polarity, intracellular organelle and vesicle transport, cell migration, intracellular signalling and tumorigenesis. CAP-Gly domains are found in single or multiple copies and are primarily involved in protein interactions and the formation of protein networks. In this study, I targeted the CAP-Gly domain of the large subunit of dynactin, p150glued [99]. The dynactin complex is required for targeting dynein to its cargo and for dynein motor processivity. CAP-Gly domains are characterized by a highly conserved motif with glycine and hydrophobic residues. Structurally, CAP-Gly domains form a globular protein fold with a highly twisted, five-stranded antiparallel β-sheet flanked by a small β-hairpin. A unique cluster of conserved aromatic residues forms a solvent-exposed hydrophobic cavity bordered by the highly conserved GKNDG motif. This hydrophobic cavity of CAP-Gly domains serves as a binding site for C-terminal Glu-Glu-Tyr/Phe (EEY/F)-COOH sequence motifs (Figure 20). The C-terminal binding preference could not be tested in this study, as the C-terminal peptide library was not available.

From phage display, I obtained a single peptide that bound to the CAP/Gly domain of

DCTN1 suggesting an internal binding mode for this CAP/Gly domain. Based on the sequence

alignment of internal peptide with C-terminal peptide, I observed similarity between N-terminal

end of the two peptides. To gain further insight into peptide binding, I modelled the phage-

derived peptide on the CAP/Gly domain of dynactin using Modeller. The structural model

obtained was analyzed by visual inspection. Based on the results, I predict that the binding

surface of CAP/Gly can accommodate an internal peptide (Figure 20). Briefly, the Asp and Glu

residues are conserved between the natural peptide and internal peptide. The Thr residue on

natural peptide is replaced by a Trp residue which interacts with a hydrophobic pocket present on

CAP/Gly domain. The C-terminal Phe residue is compensated for by the Val and Pro residues, which

also provide structural flexibility to the internal peptide. The backbone CO group between

Val and Pro in the internal peptide forms hydrogen bonds with positively charged residues on the

CAP/Gly domain, mimicking the terminal COO- group of the natural peptide. Finally, the movement of the

highly mobile β4/β5 loop allows the Trp residue on the internal peptide to interact with a

hydrophobic pocket. This pocket is covered by Asn (69) in the presence of the natural peptide. The


movement of the β4/β5 loop also allows the internal peptide to move out of the peptide binding

pocket.

Figure 20: Structure and literature analysis of the CAP-Gly domain of p150glued. (A) The structure of the p150glued CAP/Gly domain in complex with CLIP-170 zinc-knuckle 2 (PDB ID: 3E2U). (B) The binding surface of the CAP/Gly-peptide complex. (C) The sequence alignment of the natural C-terminal and phage-derived internal peptide partners of the p150glued CAP/Gly domain. (D) The modelled structure of the p150glued CAP/Gly domain in complex with the phage-derived peptide. The structure was generated using Modeller by mutating the CLIP-170 peptide to Trp-Val-Pro-Trp-Gln. (E) The binding surface of the CAP/Gly-internal peptide complex showing the potentially novel internal binding mode.

Biophysical assays including isothermal titration calorimetry (ITC) are required to confirm the

binding between the phage-derived peptide and DCTN1 CAP/Gly domain. Structure

determination of phage-derived peptide and CAP/Gly domain is required to confirm the

structural model of the internal binding mode. If confirmed, the phage-derived peptides may help

in identifying novel protein partners and biological roles of CAP/Gly domains.

3.3.5.3 Alpha-catenin/vinculin head domain: The alpha-catenin/vinculin head domain is

found in proteins involved in cytoskeletal organization, such as vinculin and alpha-catenin.

Structurally, the alpha-catenin/vinculin head domain comprises seven amphipathic helices


arranged as two four-helical bundles [66]. In this study, we focused on the vinculin head domain.

The vinculin head domain is involved in mediating interactions between vinculin and its

interaction partners: alpha-actinin, talin and the Shigella toxin IpaA. The interaction between the

vinculin head domain and talin has been studied extensively and has been shown to occur via a

“helix addition” mechanism, in which a 26-amino-acid amphipathic peptide from talin inserts

into the first four helices of the vinculin head domain to form a compact five-helix structure. The

interaction exhibits high affinity, and the peptides from alpha-actinin and IpaA bind in a similar

mode [66]. A number of studies have aimed at identifying the binding preference of the vinculin head

domain. One of the first studies used phage display to identify peptides against the vinculin head

domain. Adey et al. identified five peptides that specifically bound to the talin-binding region on

the vinculin head domain with high affinity [67]. However, these peptides did not yield a

binding motif for vinculin binding. Gingras et al. provided a consensus motif

(LXXAAXXVAXXVXXLIXXA) for vinculin binding based on the complex structure

of talin bound to vinculin and SPOT microarray analysis [68].

In this study, I obtained 26 unique peptides against the vinculin head domain. The

sequences produce a low-resolution alignment (shown in Figure 21) and do not align well to the

vinculin binding motif provided by previous combinatorial studies. This can be attributed to the

shorter length of these peptides. At 16 amino acids in length, the peptides obtained in this study

represent the shortest peptide sequences that are known to bind to the vinculin head domain and

hence may serve as inhibitors of vinculin-talin binding.

Figure 21: Structural and literature analysis of Alpha-catenin/vinculin head domain. (A) The structure of vinculin head domain in complex with the talin VBS1 (PDB ID: 1SYQ). (B) The binding profile for vinculin head domain obtained from phage display. (C) The interaction surface of VBS1 and vinculin head domain. VBS1 binds to vinculin head domain via amphipathic helix with hydrophobic residues facing vinculin core and charged residues facing the solution. No clear similarity was observed between the phage-derived peptides and known natural peptide binders of vinculin head domain.


3.3.6 Intracellular Transport: Intracellular transport is often mediated by peptide tags that

guide the spatial localization of a protein. Often, specialized PRDs are involved in the

recognition of these peptide-based tags. In this study, I obtained peptides against four domains that are involved

in intracellular transport.

3.3.6.1 Importin beta: Importin beta is a member of the family of nuclear transport receptors

that are responsible for importing large macromolecules inside the nucleus [69]. Structurally,

importin beta contains a single domain with a superhelical structure containing 12 helical repeats

known as HEAT repeats connected via flexible linkers. These repeats contain two alpha helices,

A and B, connected by a flexible linker. The A helix faces the outer, convex surface of importin

beta while the B helix is present on the inner, concave surface. Importin beta is known to bind to

NLS and importin alpha via its C-terminal section, RAN-GTP via its N-terminal domain and

other proteins, including members of the nuclear pore complex (FG nucleoporins), via its central

region. The interaction between importin beta and the FG nucleoporins is mediated by an FxFG

motif present on this class of nucleoporins.

Figure 22: Structural and literature analysis of Importin beta. (A) The structure of importin-beta in complex with the FxFG peptide (PDB ID: 1F59). (B) Binding motif of importin-beta obtained from phage display. (C) The interaction surface between importin beta and the FxFG peptide. The key residues in this interaction are highlighted in yellow. The three key residues, two Phe and one Gly, are conserved in the binding motif obtained from phage display.

From phage display, I obtained 10 unique peptides against importin beta. All of these

contain the FxFG motif (except one that contains Y instead of F at position 14) that is known to

bind to the central region of importin beta (Figure 22). The structural analysis of importin beta in


complex with the FxFG peptide from nucleoporins agrees well with the binding motif

obtained from phage display (Figure 22).
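
The check that a peptide carries the FxFG motif can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name find_fxfg and the example strings are hypothetical, and the strings are not peptides obtained in this study. The motif is read literally as Phe, any residue, Phe, Gly.

```python
import re

def find_fxfg(peptide):
    """Return the 1-based start positions of every FxFG occurrence.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    return [m.start() + 1 for m in re.finditer(r"(?=F.FG)", peptide.upper())]

# Hypothetical sequences, for illustration only.
for seq in ["GSAFSFGAKDE", "AAYSFGAA", "FAFGFG"]:
    print(seq, find_fxfg(seq))
```

A variant such as the Tyr-for-Phe substitution mentioned above would need a broader pattern, for example `[FY].FG`, which is again an assumption about how the motif should be relaxed.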

Different groups have previously developed specific modulators of nuclear import [70].

The peptides generated in this study may serve as inhibitors of the interaction of importin beta

and FG nucleoporin and hence inhibit importin beta-mediated transport across the nuclear pore.

Such peptide inhibitors may act as useful tools for studying the role of importin beta in cells.

3.3.6.2 UBA: The ubiquitin-associated (UBA) domain is an approx. 40 amino acid domain that

was first recognized in proteins associated with ubiquitin but is also found in proteins involved in

nucleotide excision-repair and nuclear transport [75]. UBA domains form three-helix bundles

with a hydrophobic core that stabilizes the protein and possess a conserved surface patch of

hydrophobic amino acids that interacts with hydrophobic regions of ubiquitin and other target

proteins. In this study, I targeted the UBA domain of nuclear export factor 1 (NXF1). NXF1 is a

member of the family of proteins involved in mRNA export from the nucleus. NXF1-UBA

domain is present on the C-terminal end of NXF1 and has been shown to be sufficient for

nucleo-cytoplasmic shuttling and localization to the nuclear pore complexes (NPCs) in vivo.

NXF1 is also essential for the export of many viral RNAs bearing the constitutive transport

element (CTE) [76].

Figure 23: Structure and literature analysis of NXF1-UBA domain. (A) Structure of NXF1-UBA with the FxFG peptide (PDB ID: 1OAI). (B) The binding logo for NXF1-UBA obtained from phage display. (C) Binding surface of NXF1-UBA with the FxFG peptide. The phage display logo shows high conservation of Phe (pos 12) and Trp (pos 13). While the conservation of two aromatic residues is similar to the two Phe residues (highlighted in yellow) in the FxFG peptide, the lack of conservation of Gly residues (before and after the central hydrophobic residues) suggests that the binding mode of the phage-derived peptides may differ from that of the FxFG motif. Other differences include a preference for Trp (13) instead of Phe and a hydrophobic residue (Phe/Leu/Ile) at position 9 instead of the Asp residue in the FxFG motif. Based on structural analysis of the crystal structure, I predict that the Trp residue can fit into the hydrophobic binding pocket.


From phage display, I obtained a preference of FxW for NXF1-UBA (Figure 23). To

obtain a deeper understanding of the binding mode of phage-derived peptides to NXF1-UBA, I

analyzed the known structure of the FxFG peptide (from nucleoporins) bound to NXF1-UBA (Figure 23)

[77]. The sequence pattern of the phage-derived binding preference is similar to that of the FxFG

peptide. While the hydrophobic residues are conserved between the phage-derived motif and the

FxFG peptide, the results also suggest that the hydrophobic surface of TAP/UBA may have more

flexibility than previously reported. This indicates that NXF1/UBA domains may bind to other

proteins that contain a hydrophobic motif which can be predicted using the binding motif

obtained from this study. To my knowledge, this is the first study to report the binding

preference of NXF1-UBA.

The peptides obtained in this study can be used to design peptide-based or small-

molecule based inhibitors of NXF1 mediated nuclear export. There is enough evidence

suggesting that blocking this interaction should be sufficient to inhibit NXF1-mediated transport.

Inhibitors against NXF1 may be used to further probe the mechanisms of NXF1-mediated

nuclear transport.

3.3.6.3 Bro1: The Bro1 domain is found in different eukaryotic proteins such as Alix

(PDCD6IP), Brox and HD-PTP [78]. Structurally, the Bro1 domain has a banana-shaped

structure that is organized around a core of tetratricopeptide helical hairpins. In this study, I

targeted Alix Bro1 domain. Alix plays an important role in intracellular transport as an adaptor

protein that recruits CHMP4/ESCRT-III complexes (via its Bro1 domain) to function at distinct

biological membranes. Other functions include lysobisphosphatidic acid (LBPA) binding,

endophilin binding, receptor trafficking, endosome distribution, cell motility/adhesion, apoptosis,

actin and microtubule binding and regulation of JNK signalling. Alix has also been implicated in

the release of several other classes of enveloped viruses, including hepatitis B virus, dengue

virus, yellow fever, HCV, SIV, RSV, human para-influenza virus, and Sendai virus.

The interaction between CHMP4 and the Alix-Bro1 domain is mediated by an

amphipathic helix present in CHMP4 binding to helices 5-7 of the Bro1 domain [78]. A second

protein interaction site has been reported within the first half of the Bro1 domain, which interacts

with the p6-adjacent nucleocapsid (NC) domain of Gag, an HIV protein. While the exact

interaction surface between the Alix-Bro1 domain and the NC domain has not been identified,


residue substitutions in NC or within the first 200 residues of the Alix-Bro1 domain compromised

HIV-1 release emphasizing the critical role of NC-Bro1 domain interaction in this process [79].

In this study, I obtained 15 peptides (7 unique peptides) against the Alix-Bro1 domain.

Upon sequence alignment, a “Mxx[L/M]xx[W/L]” motif emerged that resembles the

amphipathic helix derived from CHMP4C (Figure 24). Comparison with the available structure of the

Alix-Bro1 domain in complex with the CHMP4C peptide shows that the binding preferences

obtained from phage display resemble the binding observed for the CHMP4C peptide. Hence, I

predict that the phage-derived peptides will block the interaction between Alix Bro1 domain and

CHMP4 proteins thereby blocking the recruitment of ESCRT III complex. Hence, these peptides

may play a role in discerning the role of Alix-Bro1 domain in a range of cellular pathways.
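
How a motif of this kind can be read off a set of aligned peptides is sketched below. The sketch is not the procedure used in this study: the function names, the 60% threshold and the five aligned strings are hypothetical and chosen only to illustrate the idea of counting residues per alignment column and reporting a residue where it dominates and "x" elsewhere.

```python
from collections import Counter

def column_frequencies(aligned):
    """Count residues in each column of equal-length, pre-aligned peptides."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned peptides must have the same length")
    return [Counter(column) for column in zip(*aligned)]

def consensus(aligned, threshold=0.6):
    """Report the dominant residue per column, or 'x' below the threshold."""
    total = len(aligned)
    letters = []
    for counts in column_frequencies(aligned):
        residue, count = counts.most_common(1)[0]
        letters.append(residue if count / total >= threshold else "x")
    return "".join(letters)

# Hypothetical aligned peptides, for illustration only.
aligned = ["MADLEEW", "MSKLTAW", "MQRMPGL", "MEELKSW", "MTTLDDW"]
print(consensus(aligned))
```

Positions that tolerate two residues, such as [L/M], are collapsed to the more frequent one in this simple sketch; a sequence logo retains that information.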

Figure 24: Structure and literature analysis of Alix-Bro1 domain. (A) The structure of Alix-Bro1 domain with the CHMP4C peptide (PDB ID: 3C3R). (B) The binding motif of the Alix-Bro1 domain. (C) The binding surface of the Bro1 domain and the CHMP4C peptide, showing the amphipathic CHMP4C peptide with its hydrophobic surface towards the Bro1 domain. The pattern obtained from phage display shows two components: a negatively charged patch at the N-terminal end and a hydrophobic helical component with a triad of hydrophobic residues: Met (pos 10), Leu/Met (pos 13) and Trp/Phe (pos 16). The conserved hydrophobic triad in phage display corresponds to Ile, Leu and Trp (highlighted in yellow) in the CHMP4C peptide.

3.3.6.4 Clathrin heavy chain: Clathrin forms the outer coat of vesicles involved in cellular

transport between different membrane locations [93]. Structurally, clathrin contains the N-

terminal adaptor domain (CTD) and the alpha-helical repeats that form the large part of the clathrin

heavy chain. CTD is a 7-bladed beta propeller that binds to compartment-specific adaptor

proteins such as beta-arrestin and the adaptor protein complexes (AP-1 & AP-2). The interaction

between the CTD and adaptor proteins is mediated by peptide-like motifs. The first linear motif

obtained was the ‘clathrin-box’ consensus LΦxΦ[D/E] that was confirmed to bind between the


first two blades of the beta-propeller [94]. In recent years, other peptide motif variants have been

shown to bind to the CTD, such as: the W-box motif (PWxxW) which binds to the top of CTD

[95] and the [L/I][L/I]GxL motif, which binds between blade 4 and blade 5 of the CTD [96]

(Figure 25). A fourth binding site has also been predicted between blade 6 and blade 7 using

multiple sequence alignments; however, the binding preferences of this site have not yet been

determined [97].

In this study, 47 peptides (22 unique peptides) were obtained against clathrin terminal

domain. Based on the sequence alignments, it was observed that structurally, the sequences could

be split into 3 groups: Set 1: sequences with DΦxWΦ motif that resembled the clathrin-box

peptide, albeit in the reverse order; Set 2: sequences with a DxxDW motif that does not match any

known clathrin-binding motif; and Set 3: sequences with no consensus among themselves or with Set 1

and Set 2 (Figure 25). It is surprising that no sequences were obtained that resembled the

peptide-motifs reported in literature. Previous studies have suggested that the beta-propeller

structure of CTD changes conformation upon binding to different peptides and small molecule

ligands. This flexibility may allow peptides with distinct sequence to bind to the same binding

surface and explain the diversity in peptide sequences that bind to the CTD.
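
The grouping of peptides by motif described above can be expressed as a small pattern-matching sketch. It is an illustration only: Φ is assumed here to mean a bulky hydrophobic residue (L, I, V, M, F, W or Y), the labels and the function name classify are hypothetical, and the example strings are not sequences from this study.

```python
import re

PHI = "[LIVMFWY]"  # assumption: Φ read as a bulky hydrophobic residue

MOTIFS = {
    "clathrin-box": re.compile(f"L{PHI}.{PHI}[DE]"),  # LΦxΦ[D/E]
    "W-box": re.compile(r"PW..W"),                    # PWxxW
    "LLGxL-like": re.compile(r"[LI][LI]G.L"),         # [L/I][L/I]GxL
    "Set 1": re.compile(f"D{PHI}.W{PHI}"),            # DΦxWΦ
    "Set 2": re.compile(r"D..DW"),                    # DxxDW
}

def classify(peptide):
    """Return the labels of all motifs found anywhere in the peptide."""
    seq = peptide.upper()
    return [label for label, pattern in MOTIFS.items() if pattern.search(seq)]

# Hypothetical sequences, for illustration only.
for seq in ["AALIDLDAA", "GGDLSWLGG", "AADAADWAA", "GGGG"]:
    print(seq, classify(seq) or ["Set 3 (no motif)"])
```

A peptide that matches none of the patterns falls into the third group; a peptide may also match more than one pattern, in which case all labels are returned.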

Figure 25: The structure and literature analysis of the Clathrin terminal domain. (A) The structure of Clathrin terminal domain showing the four peptide binding sites: Site 1 with preference for LΦxΦ[D/E] peptides (PDB ID: 1C9I); Site 2 with preference for PWxxW peptides (PDB ID: 1UTC); Site 3 with preference for [L/I][L/I]GxL peptides (PDB ID: 3GD1) and Site 4 with an unknown binding preference. (B) The binding motifs obtained from phage display for the Clathrin terminal domain showing the two binding motifs.


A recent study by von Kleist et al identified a novel family of small molecules that bind

to the clathrin terminal domain [98]. On structural analysis, it was observed that these molecules

specifically bind to the “clathrin-box” interaction surface and blocked clathrin-mediated

endocytosis. This observation is extremely important as peptides derived from phage display

may act as specific inhibitors of clathrin-mediated endocytosis. Phage-derived peptides often

show higher affinity and specificity towards target protein compared to small molecule inhibitors

and hence may serve as probes for elucidating the role of clathrin-terminal domain in clathrin-

mediated endocytosis. Further, the different set of peptides obtained from phage display can be

combined to generate bivalent peptides to develop high affinity inhibitors of clathrin-mediated

endocytosis.

3.3.7 Genome Regulation: A host of proteins interact with different parts of the genome. These

include transcription factors that regulate the activity of specific genes, histone modification

enzymes that introduce post-translational modifications on histone tails, and the RNA polymerase complex

that initiates transcription of genes. In this study, I was able to generate peptides

against five PRDs that are involved in the regulation of the genome.

3.3.7.1 PCNA: Proliferating cell nuclear antigen (PCNA) is a single-domain protein that acts as

a co-factor for DNA polymerase δ in eukaryotic cells [62]. Functional PCNA is a homotrimer

forming a ring structure, in which three monomers are joined together in an anti-parallel head to

tail interaction.

Numerous protein partners interact with PCNA, including the DNA polymerase δ and the

DNA polymerase ε for DNA replication, DNMT1, HDAC1, and p300 involved in chromatin

assembly and gene regulation, DNA mismatch repair protein Msh3/Msh6 for DNA repair,

p21(CIP1/WAF1) for cell cycle control, and ESCO1/2 for sister-chromatid cohesion. Most of the

interactions are mediated by a PCNA interaction peptide or PIP-box (QXXhXXaa; where h –

hydrophobic residue, a – aromatic residue) that binds to a specific interaction surface in PCNA

(Figure 26)[62]. Other interaction motifs are also present such as the non-canonical PIP box and

other non-PIP box binding motif ([KR]-[FYW]-[LIVA]-[LIVA]-[KR]) [63]. Bacterial display

has been performed against PCNA to identify a number of canonical and non-canonical PCNA

binding peptides including two more major classes: YxxxY/TxxxxW and KA-box peptides [64].


All these peptides bind to the same binding surface as the PIP box, albeit in different binding modes, to

recruit PCNA in diverse cellular pathways [65].
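
The two motif definitions given above can be written as regular expressions, as in the following sketch. The residue classes are assumptions, since the text only states "hydrophobic" and "aromatic": h is taken as L, I, V or M and a as F, W or Y. The function name and example strings are hypothetical and are not peptides from this study.

```python
import re

# Assumed residue classes (not specified in the text):
#   h (hydrophobic) = L, I, V, M;  a (aromatic) = F, W, Y
PIP_BOX = re.compile(r"Q..[LIVM]..[FWY][FWY]")      # QXXhXXaa
# Non-PIP-box motif exactly as written in the text.
NON_PIP = re.compile(r"[KR][FYW][LIVA][LIVA][KR]")  # [KR]-[FYW]-[LIVA]-[LIVA]-[KR]

def pcna_motifs(peptide):
    """Report which of the two PCNA-binding motifs occur in the peptide."""
    seq = peptide.upper()
    return {
        "PIP-box": bool(PIP_BOX.search(seq)),
        "non-PIP": bool(NON_PIP.search(seq)),
    }

# Hypothetical sequences, for illustration only.
for seq in ["GSQTKLSDFYAA", "AARFLVKAA", "GGGG"]:
    print(seq, pcna_motifs(seq))
```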

In this study, I identified two phage clones against PCNA, both of which showed a high

enrichment ratio compared to GST (Figure 26). Both these peptides correspond to the same

sequence shown in Figure 26. The sequence matches accurately to the PCNA-interacting motif

(PIP-box) suggesting that the clonal ELISA results are accurate. The structure of PCNA in

complex with PIP-box motif from FEN1 confirms that the key residues required for the PCNA-

PIP interaction are found in phage-derived peptides (Figure 26). More sequences may be required

to generate a more accurate binding motif for PCNA.

Figure 26: Structural and literature analysis of PCNA. (A) The structure of PCNA in complex with the FEN1 peptide. (B) The single peptide sequence obtained from phage display showing binding to PCNA. (C) The interaction surface of PCNA and the FEN1-derived peptide. The key interactions are mediated by Gln, Leu and two consecutive Phe residues forming the canonical PIP box motif: Qxxhxxaa where h – hydrophobic and a – aromatic amino acids. These residues are conserved in the phage-derived peptide.

3.3.7.2 OB-fold: Oligonucleotide-oligosaccharide binding (OB fold) domains are ssDNA

binding domains found in several proteins including the replication protein A 1 (RPA1), the

primary eukaryotic ssDNA binding protein [71]. Structurally, OB fold domains form a five-

stranded beta-barrel with one end of the barrel capped by an alpha-helix. The ssDNA binding is

mediated by one face of the beta-barrel and is conserved amongst all OB-fold members. The N-

terminal OB-fold domain of the RPA 70 kDa subunit (RPA70N, targeted in this study) is a unique example of

an OB-fold that is known to be a protein interaction module [72]. It plays a key role in central

cellular processes such as DNA replication, damage response and repair by interacting with

proteins such as phospho-RPA2, p53, RAD9, BID, MRE11, NBS1, Rad17, RAD52, BRCA2 and


ATRIP. The interaction between RPA70N and its interaction partners is mediated by the binding

surface involved in ssDNA binding in other OB-fold domains.

From phage display, I obtained 66 peptide sequences (11 unique sequences) for the

RPA70N domain. These contain a consensus motif shown in Figure 27. The structure of

RPA70N with phospho-mimic peptide from p53 indicates that the peptides bind to the canonical

peptide binding surface.

RPA has already been proven to be a valid target for cancer therapy. Small molecule

inhibitors that target the central OB-folds of RPA70, RPA70A and RPA70B, have been shown to

induce cytotoxicity and increase the efficacy of genotoxic chemotherapeutics [73]. Peptide-based

inhibitors of RPA70N should inhibit the binding of RPA to multiple checkpoint proteins (at

least ATRIP, RAD9, MRE11, and p53) and hence significantly impair the replication stress

response. Cancer cells are more dependent on replication stress response than normal cells to

complete replication and retain viability. Thus inhibitors of RPA70N may amplify levels of DNA

damage in cancer cells caused by a wide variety of genotoxic agents [74]. Apart from potentially

acting as anti-cancer agents, peptide inhibitors to RPA70N may also be used to further

understand the role of RPA in cell cycle and DNA repair and may serve as valuable tools for

studying RPA biology.

Figure 27: Structural and literature analysis of the RPA70N OB-fold domain. (A) Structure of RPA70N in complex with the p53-derived peptide. (B) Binding logo of RPA70N obtained from phage display. (C) The interaction surface between RPA70N and the p53 peptide. The key interactions are mediated by Asp, Leu and Met residues (highlighted in yellow). These residues are conserved in the binding motif obtained from phage display with an additional possibility of Glu instead of Leu at position 12 of the motif. The phage-derived binding motif also contains an aromatic residue (Trp, Tyr) at position 17 which is not present in the p53 peptide.

3.3.7.3 Ligand binding domain of nuclear receptors: Nuclear receptors (NR) are a group of

transcription factors that regulate the expression of target genes in response to binding of small


molecules such as steroids, hormones and metabolites [80]. They are an integral part of several

key cellular signalling pathways regulating homeostasis, proliferation etc. Nuclear receptors

contain two structured domains – a DNA binding domain (DBD) that specifically recognizes

hormone response elements (HRE) on the DNA and a ligand binding domain (LBD) that binds to

small-molecule ligand and interacts with co-regulators. Structurally, the LBD forms a globular

structure composed of a three-layered α-helical sandwich that contains 12 alpha helices. The C-

terminal helix 12 is highly mobile, and is stabilized upon ligand binding into a position that

completes the recognition surface for binding of co-regulators. Upon binding to LBD, co-

regulators may recruit RNA-polymerase to initiate transcription of downstream genes (also

called co-activators) or HDAC complex to silence gene expression (also called co-repressors)

[15]. Most of the co-activators and co-repressors identified to date interact primarily with nuclear

receptor activation function 2 (AF2), which is located on the LBD itself. These include p160

steroid co-activator family members (SRC), p300 and related integrator proteins, TRAP mediator

complex, and various other co-activators. The region (called NR Box) responsible for binding in

co-activators consists of a short alpha-helical LxxLL motif. Although this motif is necessary to

mediate the binding of these proteins to liganded NR's, amino acids flanking the core motif

dictate specificity of interaction. The three conserved leucines align on the face of the α-helix

that packs against the hydrophobic channel of the LBD surface. The binding of the co-repressors

occurs at the same pocket as the co-activators but in a different mode. In its unliganded state, the

C-terminal helix 12 arranges itself along helix 3 to form the peptide binding pocket. The NR-

box responsible for binding to co-repressors such as NCoR and SMRT comprises the

Lxx[I/H]IxxxL motif. In the co-repressor motif, the hydrophobic residues pack against the

hydrophobic channel of the LBD surface [81].
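
Why the conserved residues of these motifs lie on one face of the helix can be checked with a short calculation. The sketch assumes an idealised α-helix with 3.6 residues per turn, i.e. a rotation of 100° per residue; the function names are hypothetical, and the motif positions are simply counted from the first Leu (position 0).

```python
def helical_angles(positions, degrees_per_residue=100.0):
    """Angular position of each residue around the axis of an ideal alpha-helix."""
    return [(p * degrees_per_residue) % 360.0 for p in positions]

def angular_spread(angles):
    """Smallest arc (in degrees) that contains all of the given angles."""
    if len(angles) < 2:
        return 0.0
    ordered = sorted(angles)
    # Gaps between neighbouring angles, including the wrap-around gap.
    gaps = [(ordered[(i + 1) % len(ordered)] - ordered[i]) % 360.0
            for i in range(len(ordered))]
    return 360.0 - max(gaps)

# LxxLL: leucines at positions 0, 3 and 4.
print(angular_spread(helical_angles([0, 3, 4])))
# Lxx[I/H]IxxxL: hydrophobic positions 0, 3, 4 and 8.
print(angular_spread(helical_angles([0, 3, 4, 8])))
```

Under this idealisation the three leucines of LxxLL fall within a 100° arc and the four hydrophobic positions of the co-repressor motif within a 140° arc, which is consistent with a single hydrophobic face packing against the LBD surface.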

Out of the four NR LBDs present in the initial dataset, only the LBD of bile acid receptor

(NR1H4) could be successfully expressed and purified. Bile acid receptor binds to its natural

ligand bile acid and is involved in metabolism of bile acid in liver tissue [82]. The phage

selections were done in the absence of bile acid ligand and Figure 28 shows the peptides

obtained from phage display experiments. To rationalize the consensus motif obtained from

phage display, I analyzed the structure of the nuclear receptor in complex with the co-repressor

peptide. Since no structure of bile acid receptor bound to its co-repressor (SMRT or NCoR) is

available, the structure of the closest nuclear receptor, PPARα in complex with SMRT peptide


was used for analysis [81]. Based on sequence alignment of SMRT and phage-display peptides,

it is observed that the key hydrophobic residues responsible for binding of the SMRT peptide are

exchanged by bulkier hydrophobic and aromatic residues in the phage-derived peptides. This has

not been observed previously and may represent a novel mode for peptide binding to LBD of

NR1H4. Further biochemical and structural analyses are required to confirm the binding of these

peptides. Nonetheless these might represent novel binding partners for the bile acid receptor.

Figure 28: Structure and literature analysis of the NR1H4 ligand binding domain. (A) The structure of PPARα LBD in complex with the SMRT peptide (PDB ID: 1KKQ). (B) The binding motif for the NR1H4-LBD obtained from phage display. (C) The interaction surface between PPARα and the SMRT peptide. The key interactions between PPARα and the SMRT peptide are mediated by two Leu and Ile residues (highlighted in yellow) that stack against the hydrophobic surface of PPARα. Based on sequence alignment to the SMRT peptide, the hydrophobic and aromatic residues conserved in the phage display results correspond to the key residues in the interaction.

Nuclear receptors are amongst the most important classes of drug targets for various

diseases including breast and prostate cancer. Often, the small molecule drugs bind to ligand

binding pocket and de-activate the NR-LBD. Other peptide- or small-molecule inhibitors affect

binding of NR-LBD to their co-regulators. NR-LBD/co-regulator binding is critical for

downstream activity of nuclear receptors and inhibition of this interaction results in disruption of

nuclear receptor action [15]. The peptides identified in this study bind to the bile acid receptor in its

unliganded form. Once confirmed, these peptides may modulate the activity of this important

class of drug targets.

3.3.7.4 WD40 domains: WD40 domains consist of sequence repeats of 44-60 residues that

have a four-stranded anti-parallel beta sheet which come together to form a beta-propeller fold


[88, 89]. The most common of these domains are seven-bladed beta-propellers that contain seven

WD40 repeats. WD40 domains are among the most common domain types across eukaryotic

proteomes and act as scaffolds for several key cellular pathways. In this study, I targeted the

WD40-repeat containing protein 5 (WDR5). WDR5 is an adaptor protein that forms part of the

Set1 family of methyltransferases. It binds to unmodified histone H3 and members of the Set1

family of methyltransferases via the top of the beta-propeller [90]. Residues from a small region

located close to the MLL catalytic site (called the Win motif) bind in a 3₁₀-helical

conformation within the central depression of the beta-propeller, and the residues that follow

extend away from this cleft along the surface of WDR5.

From phage display, I obtained a clear consensus motif “R[T/W]xxW” with a strong

preference for Arg at the central position of the residue. The structure of WDR5 in complex with

the MLL4-derived peptide gives me confidence that the phage-derived

peptides bind to the interaction surface on top of the beta propeller (Figure 29) [91].

WDR5 is an important target for regulating histone modifications, specifically histone

methylation. The interaction of WDR5 and histone H3 is critical for this event and inhibition of

this may potentially block the methylation of histones by the Set1 family of methyltransferases

[92]. Hence the peptides obtained in this study may be used to inhibit the function of the Set1 family

of methyltransferases and serve as intracellular probes for studying the biology of this important class of

methyltransferases.

Figure 29: The structure and literature analysis of WDR5. (A) The structure of WDR5 with the MLL4 peptide (PDB ID: 3UVM). (B) The binding motif of WDR5 domain obtained from phage display. (C) The binding surface of WDR5 and the MLL4 peptide with the key residues involved in WDR5 interaction highlighted in yellow. These residues are conserved in the phage-derived binding motif.


3.3.7.5 TRF homology domain: The TRFH domain is found in proteins such as TERF1 and TERF2, which

are part of the shelterin complex [106]. TERF1 and TERF2 are homologous proteins that

associate with the full length of the double-stranded portion of the telomere. The centrally

located TERF1 TRF homology domain is a known protein interaction domain that mediates the

recruitment of telomere binding proteins such as tankyrase and TIN2. Structurally, the TRFH

domain consists of nine α-helices forming an elongated helix bundle. TRF1 recognizes TIN2 and

PinX1 using a conserved interaction surface on its TRF homology (TRFH) domain. The N

terminus of TIN2 peptide adopts an extended conformation stabilized by an extensive

intermolecular hydrogen-bonding network with key interactions made by Leu, Phe and Pro

residues. The binding motif that is proposed based on structural studies is: F/YxLxP (Figure 30)

[107].

In this study, I obtained a limited number of peptides against TRFH domain of TERF1

(Figure 30). These sequences did not show any strong binding preference to known natural

peptides. More sequences are required to identify the binding preference of these peptides,

and the peptides obtained here should be investigated further to define in detail the binding

preferences of the TRFH domain.

Figure 30: Structural and literature review of the TRFH domain of TERF1. (A) The structure of the TERF1-TRFH domain with the TIN2 peptide (PDB ID: 3BQO). (B) The peptide sequences obtained from phage display against the TERF1-TRFH domain. (C) The binding surface of TERF1-TRFH domain and the TIN2 peptide. The key interactions are mediated by Leu and Phe residues on TIN2 (highlighted in yellow). The number of sequences obtained from phage display is limited and hence it is difficult to obtain a clear consensus motif for the TERF1-TRFH domain. However, the peptides obtained in this study do not contain the F/YxLxP motif described by Chen Y et al. 2008.

3.3.8 Miscellaneous: Apart from the aforementioned examples, I was also able to generate

peptide against a handful of domain families that are involved in different cellular pathways.

72

38 45

These include the SWIB/MDM2 domain family involved in apoptosis, eIF4E involved in

translation initiation, HORMA domain involved in cell cycle regulation and ubiquitin that is

involved in ubquitin-proteasomal degradation system.

3.3.8.1 SWIB/MDM2: The SWIB/MDM2 family of domains is found in the Mdm2 family of oncoproteins, which are known regulators of p53, and in the SWI/SNF family of ATP-dependent chromatin-remodelling proteins [51]. In MDM2 proteins, the SWIB/MDM2 domain binds to the transactivation domain of p53, allowing the degradation of p53. Structurally, the SWIB/MDM2 domain contains six beta sheets and four alpha helices. The binding surface is a hydrophobic cleft formed by two alpha helices, where the alpha-helical transactivation domain of p53 binds. The key interaction between SWIB/MDM2 and p53 is mediated by a triad of p53 residues, Phe, Trp and Leu, that insert into the hydrophobic pocket on MDM2 [52].

In this study, we identified peptides against the SWIB/MDM2 domain of MDM4 (also known as MDMX), a member of the MDM2 family of proteins that is essential for regulating p53. A high enrichment ratio was obtained for the SWIB/MDM2 domain during phage selections. The binding preference obtained showed a clear “FxxxWxxL” motif, which agrees with previous structural and biochemical analyses. This strongly suggests that the peptides generated in this study bind to the same interaction surface as the p53 peptide (Figure 31).

Figure 31: Structure and literature analysis of MDM4. (A) The structure of MDM4 in complex with the peptide from p53 (PDB ID: 3DAB). (B) The binding motif obtained for MDM4. (C) The interaction surface of MDM4 with the p53 peptide. The key interactions are mediated by a triad of residues on the binding surface: Phe, Trp and Leu (shown in yellow). These residues are conserved in the consensus motif. Interestingly, the binding motif suggests that the third position of the triad can accommodate a Met residue in place of Leu.

Disruption of the p53 tumor suppressor pathway due to mutations in the p53 gene is found in approximately 50% of all cancers. Several genome-wide functional genomics studies have revealed an increased MDM4 copy number in 65% of human retinoblastomas [53]. Ectopic expression of MDM4 in mouse Rb-null p107-null retinal progenitor cells leads to a reduction in p53-mediated apoptosis and a clonal expansion of tumor cells. Conversely, colony assays have shown that knocking down MDMX blocks proliferation of MCF-7 cells unless p53 levels are simultaneously decreased. Nutlin-3, a dual inhibitor of MDM2 and MDM4, reduces the MDM2/4-p53 interaction and efficiently kills retinoblastoma cells [53]. Other small-molecule and peptide-based MDM4 inhibitors have been identified and tested for anti-cancer activity. The peptides generated in this study are predicted to bind to the same p53-binding site on MDM4 and hence may have a similar effect on the p53-mediated apoptosis pathway.

3.3.8.2 HORMA domain: The HORMA family of domains is found in several eukaryotic proteins such as Mad2, a protein involved in the mitotic checkpoint [60]. The core of the HORMA fold contains three α-helices sandwiched between a six-stranded β-sheet and an irregular β-hairpin. In the Mad2 protein, the domain exists in two conformationally distinct forms: an open form (O-Mad2) and a closed form (C-Mad2). Both recombinant and endogenous Mad2 are predominantly folded as O-Mad2, while C-Mad2 forms upon binding to its interaction partners Mad1, p31comet and Cdc20.

Figure 32: Structure and literature analysis of the HORMA domain. (A) The structure of the HORMA domain of Mad2A in complex with the phage-derived Mad2-binding peptide (MBP1; PDB ID: 1KLQ). (B) The binding logos obtained from a previous phage display study (bottom; Luo et al. 2002 [61]) and from this study (top). The two binding motifs are similar, with conservation at similar positions. (C) The interaction surface of MBP1 and Mad2A. The key residues involved in the interaction (highlighted in yellow) are Trp(2), Tyr(3), Pro(7), Pro(8), Gln(9) and Arg(10). In our binding logo, we see conservation of Gly before the highly conserved hydrophobic region and of Lys/Arg after the poly-proline region. This may be attributed to the longer library used in this study, which may have produced a longer binding motif.

Phage display has been used previously to generate peptides against Mad2 [61]. The binding preference obtained is shown in Figure 32. The core of the motif consists of two hydrophobic residues, a basic residue, and a third hydrophobic residue (Figure 32). This core motif is generally followed by a proline-rich sequence. The known interaction partners of Mad2A, Cdc20 and Mad1, contain a consensus similar to that found in the phage display results.

In this study, I obtained a binding preference similar to that observed previously, suggesting that the peptides obtained here bind to the same binding site as described earlier.

3.3.8.3 eIF4E: The eukaryotic translation initiation factor eIF4E is part of the translation pre-initiation complex (eIF4F), which directs ribosomes to the cap structure of mRNAs [100]. eIF4E has a curved eight-stranded antiparallel β-sheet, with three helices forming the convex face and three smaller helices inserted in connecting loops, and binds directly to the mRNA cap. The m7G of the mRNA cap binds to a stack of tryptophan residues on the concave face. eIF4E is recruited to the eIF4F complex by its interaction partner eIF4G, which binds to a conserved site. This peptide-binding site is located in a region encompassing one edge of the β-sheet, the adjacent helix α2 and several regions of non-regular secondary structure on the convex surface of eIF4E. The peptide from eIF4G forms a helical structure, and its consensus motif is YxxxxLΦ [101]. eIF4E also binds to the RING domains of the arenavirus Z protein and the tumor suppressor protein PML. A recent study characterized this interaction structurally and biochemically and showed that it is mediated by a site distinct from the one that binds the YxxxxLΦ peptide motif [102].

Figure 33: The structure and literature analysis of eIF4E. (A) The structure of eIF4E with the eIF4G peptide (PDB ID: 3UVM). (B) The sequences of peptides obtained from phage display against eIF4E. (C) The binding surface of eIF4E and the eIF4G peptide.

In this study, I obtained a limited set of peptides for eIF4E (Figure 33). A handful of these peptides contained sequences similar to the YxxxxLΦ motif. However, a subset of the phage-derived eIF4E binders does not contain the YxxxxLΦ motif (Figure 33). This observation is important, as these peptides may bind to eIF4E at a site different from the canonical peptide-binding site. However, more sequences are required to confirm the binding of the phage-derived peptides to eIF4E.

eIF4E is an important target for cancer therapy. It is up-regulated in a number of malignancies, and over-expression of eIF4E leads to tumorigenesis in mouse models. Inhibition of eIF4E in various cancer models leads to apoptosis and a reduction in the level of the oncogenic protein Ras. Herbert et al. first reported that peptides derived from its natural binder 4E-BP1 bind to eIF4E and induce apoptosis in MRC-5 cells [103]. Peptides derived from eIF4G have been used to identify a small molecule (4EGI-1) that binds to eIF4E and inhibits its recruitment to the eIF4F complex [40]. When introduced into cells, these molecules inhibit the growth of multiple cancer cell lines. Phage-derived peptides may join the growing list of inhibitors of eIF4E.

3.3.8.4 Ubiquitin: Ubiquitin is a small protein that can be attached to a range of proteins to affect their cellular fate. Ubiquitination is an important post-translational modification that directs protein recycling. A number of proteins recognize the ubiquitin tag on other proteins. One of the most common modules that bind to ubiquitin is the ubiquitin-interacting motif (UIM) [103]. The UIM is found in a number of proteins involved in endocytosis and vacuolar protein sorting, including Hrs, Vps27p, Stam1 and Eps15. The UIM consists of an amphipathic α-helix with a hydrophobic core sequence composed of alternating large and small residues (Leu-Ala-Leu-Ala-Leu) that is flanked on both sides by patches of acidic residues. Sequence analyses of known UIMs have been used to define a more general 15-residue UIM motif, eeexΦxxAΦx[e/Φ]Saxe, where x is a helix-favoring residue, a is a bulky hydrophobic or polar residue with considerable aliphatic content, e is a negatively charged residue and Φ is a hydrophobic residue [104].
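To make the symbolic UIM motif concrete, it can be translated into a regular expression. The sketch below is illustrative only. The helper `compile_motif` is hypothetical, and of the residue classes only "e" (Asp/Glu) is unambiguous. The sets chosen for Φ and a are assumptions, and the helix-favoring class x is relaxed to "any residue".

```python
import re

# Residue classes for the motif symbols. Only "e" is unambiguous; the sets for
# "Φ" and "a" are illustrative assumptions, and "x" is relaxed to any residue.
CLASSES = {
    "e": "DE",           # negatively charged
    "Φ": "AVILMFWY",     # hydrophobic (assumed set)
    "a": "LIVMFYWKRQE",  # bulky hydrophobic or long aliphatic polar (assumed set)
    "x": None,           # helix-favoring; treated here as a wildcard
}

def compile_motif(motif):
    """Translate a symbolic motif such as 'eeexΦxxAΦx[e/Φ]Saxe' into a regex."""
    parts = []
    i = 0
    while i < len(motif):
        if motif[i] == "[":
            # Bracketed alternatives, e.g. [e/Φ], occupy a single position.
            j = motif.index("]", i)
            symbols = motif[i + 1:j].split("/")
            i = j + 1
        else:
            symbols = [motif[i]]
            i += 1
        residues, wildcard = set(), False
        for sym in symbols:
            if sym not in CLASSES:
                residues.add(sym)            # literal residue such as A or S
            elif CLASSES[sym] is None:
                wildcard = True
            else:
                residues.update(CLASSES[sym])
        parts.append("." if wildcard else "[" + "".join(sorted(residues)) + "]")
    return re.compile("".join(parts))

UIM = compile_motif("eeexΦxxAΦx[e/Φ]Saxe")

# Hypothetical test sequence built to satisfy the motif; not a natural UIM.
print(bool(UIM.fullmatch("DEEKLQLALAESLRE")))
```

Scanning the phage-derived peptides with `UIM.search` would then show directly whether any of them fall within the canonical UIM definition under these assumed residue classes.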

Structurally, ubiquitin contains three and one-half turns of α-helix, a short 3₁₀-helix and a five-stranded β-sheet. The five-stranded β-sheet of ubiquitin constitutes the principal interaction surface for the Vps27 UIM helix. The helix binds in an antiparallel orientation relative to the C-terminal β-strand, and interacts with β-strands 4 and 5 and with the loops between β-strands 1 and 2 and between β-strands 4 and 5. The UIM forms a left-handed, antiparallel helix with the hydrophobic face of the amphipathic helix facing the ubiquitin molecule [103].


In this study, I obtained five peptides (four unique peptides) against ubiquitin (Figure 34). No consensus motif was obtained because the number of sequences was low, although I observed an abundance of tryptophan residues at different positions. The clonal ELISA results confirm that the peptides obtained here are specific for GST-tagged ubiquitin compared to GST alone. More sequences may be required to obtain a clear binding preference, and biochemical analyses are required to confirm the binding of the phage-derived peptides. Nevertheless, if confirmed, these peptides may serve as reagents to recognize ubiquitin. In recent years, there have been advances in using UIM chains to develop intracellular reagents that detect specific ubiquitin chains on proteins [105]. The peptides obtained here, if effective, may help in developing better intracellular reagents.

Figure 34: Structure and literature analysis of ubiquitin. (A) The structure of ubiquitin with the Vps27p UIM (PDB ID: 1Q0W). (B) The peptides obtained from phage display against ubiquitin. (C) The binding surface of ubiquitin and Vps27p, showing the key residues required for interaction with ubiquitin. The ubiquitin-interacting motif can be divided into two regions: a central amphipathic helix with conserved Ala and Ser residues (highlighted in yellow) and an N-terminal negatively charged helix (highlighted in yellow). The phage-derived peptides do not match the canonical UIM and may represent a novel mode of binding to ubiquitin.

3.4 Summary

The second aim of this study was to generate peptide binders against the shortlisted proteins using phage display. To this end, I selected all 66 domains that were identified by my computational analysis. These domains were cloned into a pGEX expression vector, expressed and purified. Of the 66 domains, 44 were successfully purified using an established GST-purification protocol. SDS-PAGE gels were used to identify the step at which purification failed for the remaining 22 domains. Interestingly, all 22 domains were found to be insoluble and hence could not be recovered in the cell lysate. Different growth and lysis conditions may be required to solubilise these domains and increase protein yield.

For each of the purified domains, I used a random library of 16-amino-acid peptides to screen

for peptide binders. The selection procedure was optimized by incorporating pre-selection in

GST wells and in-solution GST negative selection. These modifications helped in removal of

non-specific peptide binders and significantly reduced the background noise in phage selections.

From my phage display screen, I was able to obtain specific peptides against 27 of the 44

domains. These domains belonged to different structural families and exhibited divergent

binding preferences. This highlights the power of phage display to identify distinct binding

preferences using a single random library. Further, I was able to generate position weight

matrices (PWM) for 22 of the 27 domains.
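For readers unfamiliar with position weight matrices, the following minimal sketch shows one common way to build a PWM from aligned peptides. It is not necessarily the procedure used in this study: it assumes that the peptides are already aligned to equal length and that an additive pseudocount is used, and the function name and example peptides are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def position_weight_matrix(peptides, pseudocount=1.0):
    """Return one {residue: frequency} dict per aligned position.

    Assumes equal-length, pre-aligned peptides; the additive pseudocount
    keeps unobserved residues from receiving a frequency of exactly zero.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to the same length")
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        pwm.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return pwm

# Hypothetical aligned peptides, NOT sequences obtained in this study.
aligned = ["FAAL", "FCCL", "FDDM"]
pwm = position_weight_matrix(aligned)

# Report the most frequent residue at each position.
for pos, column in enumerate(pwm, start=1):
    best = max(column, key=column.get)
    print(pos, best, round(column[best], 3))
```

Each column sums to one, so the matrix can be rendered directly as a sequence logo or used to score candidate peptides.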

Extensive structural analysis and a literature survey were performed to examine the peptides obtained from phage display. For 20 of the 27 domains, I obtained peptides that resemble the natural peptide partners of these domains. This gives me confidence that these peptides may block the binding of the endogenous partners of these domains, thereby perturbing their cellular function. For these cases, I have predicted the phenotypic effects of peptide-mediated inhibition and listed the potential uses of these peptides. For the remaining seven domains, the peptides obtained did not resemble the known natural peptide partners. In such cases, based on current evidence, I have predicted the potential binding surface and binding mode. For the CAP/Gly domain, I generated a structural model for binding of the phage-derived peptide. The structural model is currently being tested by Dr. Yufeng Tong, a principal investigator at the Structural Genomics Consortium (SGC), Toronto. For domains where I was not able to build structure-based models, I have suggested potential binding modes. These domains include the PDCD6 penta-EF hand domain and the clathrin terminal domain.

For seven domains, the peptide sequences obtained were insufficient to generate a high-confidence binding motif. These include eIF4E, the TRFH domain of TERF1 and ubiquitin. Further experiments are required to predict the binding mode of these peptides. Nonetheless, these results do suggest non-canonical peptide binding modes for these known peptide-binding domains. Once verified, these binding modes may uncover novel protein partners and biological roles for these domains.


4 CONCLUSIONS


4.1 Summary of work: Peptide recognition domains (PRD) play key roles in cellular pathways regulating homeostasis and cellular signalling. Such domains are frequently misregulated in diseases including cancer. Considerable progress has been made in developing specific small-molecule and peptide-based reagents against a limited number of PRD families. The central aim of this study was to use phage display to generate peptide probes against a diverse set of cancer-related PRDs.

In Chapter 2, I covered my work in identifying cancer-relevant peptide recognition

domains. To this end, I focused on a list of proteins related to ovarian cancer. These candidate

genes were identified by our collaborators using whole genome RNAi screens on 15 different

ovarian cancer cell lines. I developed a computational methodology to identify target domains

present on these candidate genes that share high sequence similarity to known PRDs. A set of

known PRDs was obtained from online databases such as PepX and DOMINO. The list of

potential PRDs identified from my computational pipeline was manually curated and analyzed.

Based on my analysis, I selected 66 domains as targets for further work.

In Chapter 3, I described the phage display pipeline used to identify peptides against each

of the 66 target domains. First, using the standard GST purification method, I successfully

purified 44 of the 66 domains. Second, I used a 16 amino acid random library to obtain peptides

against 27 of the 44 purified domains. Third, I validated the phage derived peptides using an

extensive structural analysis and literature review. Based on this analysis, I was able to

accurately predict the peptide binding mode for a large proportion of the 27 domains. For the

domains where accurate models could not be generated, I have listed future experiments required

to fully elucidate the mechanism of binding. For each of the 27 domains, I have also included

potential applications of phage-derived peptides.
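The attrition through the pipeline can be summarized as stage-wise yields. The short sketch below only restates the counts reported in this thesis (66 targets, 44 purified, 27 with specific peptides, 22 with a PWM); the function name and the stage labels are illustrative.

```python
# Stage counts as reported in Chapter 3.
STAGES = [
    ("target domains selected", 66),
    ("domains purified", 44),
    ("domains with specific peptides", 27),
    ("domains with a PWM", 22),
]

def stage_yields(stages):
    """Return (name, count, % of previous stage, % of first stage) per stage."""
    first = stages[0][1]
    previous = first
    rows = []
    for name, count in stages:
        rows.append((name, count,
                     round(100 * count / previous, 1),
                     round(100 * count / first, 1)))
        previous = count
    return rows

for name, count, step_pct, overall_pct in stage_yields(STAGES):
    print(f"{name:32s} {count:3d}  step {step_pct:5.1f}%  overall {overall_pct:5.1f}%")
```

On these numbers, about two thirds of the targets could be purified, and roughly 41% of all 66 targets yielded specific peptides.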

Based on the results obtained thus far, I have been able to successfully generate binding

preferences for 22 known PRDs that belong to 15 different protein families. For 11 of these

protein families, this represents the first phage display study done to elucidate their binding

preferences.

4.2 Future experiments: The current study has yielded several promising results including

novel peptide binding modes for known PRDs. However, further experiments are required to

validate the results obtained from phage display.


One of the first follow-up experiments required is to perform deep sequencing on all the phage pools obtained from phage display. For each of the domains on which selections were done, 96 phage clones were picked for DNA sequencing and for generating peptide motifs. For a large set of domains, the sequences obtained were sufficient to accurately map the binding preference; for others, such as ubiquitin, the TRFH domain of TERF1 and eIF4E, the binding preferences could not be accurately predicted. Deep sequencing may help by providing a large set of peptide sequences for each of these domains. Previous studies in the Sidhu lab have successfully used deep sequencing to map the binding preferences of a large set of synthetic PDZ domains [108]. Our group was also the first to report multiple peptide binding preferences exhibited by a subset of known PDZ domains based on results obtained from deep sequencing [109]. Such an analysis can be extended to all domains that were screened in this study.
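As an illustration of how a deep-sequenced phage pool might be reduced to a ranked peptide list, the sketch below collapses identical reads and discards rare sequences. It is a simplified assumption of such an analysis rather than an established pipeline: the function name, the `min_count` threshold and the example reads are hypothetical, and the reads are assumed to be already translated into peptide sequences.

```python
from collections import Counter

def rank_peptides(reads, min_count=2):
    """Collapse identical peptide reads and rank them by abundance.

    Returns (peptide, count, fraction of all reads) tuples, most abundant
    first. Peptides seen fewer than `min_count` times are dropped as a
    crude guard against sequencing errors (threshold is an assumption).
    """
    cleaned = (read.strip().upper() for read in reads)
    counts = Counter(read for read in cleaned if read)
    total = sum(counts.values())
    return [(peptide, n, n / total)
            for peptide, n in counts.most_common()
            if n >= min_count]

# Hypothetical translated reads, NOT data from this study.
reads = ["FSDLWKLL", "fsdlwkll", "FSDLWKLL ", "AYTLWRML", "AYTLWRML", "GGGGGGGG"]

for peptide, n, fraction in rank_peptides(reads):
    print(peptide, n, round(fraction, 2))
```

The ranked unique peptides could then be aligned and used to build a PWM for domains where 96 clones were too few to define a motif.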

In this study, I have identified peptides with potentially novel binding motifs. One of the most promising results was obtained for the CAP/Gly domain of DCTN1. CAP/Gly domains have been studied in the context of binding to C-terminal peptides, whereas I was able to obtain an internal peptide against this domain. Further experimental evidence is required to elucidate the mechanism of binding of this peptide. First, I need to perform biochemical assays such as isothermal titration calorimetry (ITC) to confirm the interaction between the domain and the peptide. This would also provide an accurate measure of the binding affinity between the internal peptide and the CAP/Gly domain. Once the peptide binding is confirmed, I also require a structure of the domain in complex with the phage-derived peptide to validate my structural model. We are currently collaborating with Dr. Yufeng Tong at the Structural Genomics Consortium, Toronto, to obtain a crystal structure of the CAP/Gly domain in complex with the phage-derived peptide.

Work done for CAP/Gly can be extended to other domains such as the penta-EF hand domain of PDCD6 and the clathrin terminal domain, for which we were not able to generate structural models to explain the results obtained from phage display experiments.

4.3 Potential avenues for research: The current study provides several potential avenues for further investigation. One of the first steps is to extend the current study to the family members of domains for which binding preferences were successfully obtained. In the Sidhu lab, we have generated high-resolution binding preference maps of PDZ, SH3 and WW domains. As previously mentioned, we were able to generate binding preferences for 15 different protein families, for 12 of which this represents the first phage display study done. We can now potentially extend our study to include all members of these 12 protein families. These families include the ligand-binding domains of nuclear receptors, 14-3-3, Gα subunits and WD40-repeat-containing proteins. These protein families play important roles in key cellular pathways, and some (such as the ligand-binding domains of nuclear receptors and 14-3-3) have been established as important drug targets in cancer. The phage-derived binding preferences can be used to obtain high-specificity and high-affinity peptide probes against these families.

Phage display can also be used to probe other cancer-related targets. In this study, we focussed on candidate genes identified by our collaborator using whole-genome RNAi screens in ovarian cancer. Other functional genomics approaches, including exome sequencing and cDNA hybridization microarrays, have been used to predict potential cancer-relevant genes. In principle, the current study can be extended to candidates obtained by such screens. The approach can also be applied to other diseases, including other cancer types. In the Sidhu lab, we have developed a high-throughput phage display methodology to screen 96 targets in a single experiment [110]. This methodology can be readily used to target members from a diverse set of protein families.

4.4 Applications of phage-derived peptides: The peptides generated here may serve as

valuable tools for the scientific community. Peptide-based probes have been routinely used to

structurally characterize the interaction mediated by PRDs and identify novel interaction

partners. Peptides derived here may be used to design intracellular probes for studying specific

biological pathways. Finally, the phage-derived peptides may assist in identification of small-

molecule drugs against specific domains.

4.5 Final Remarks: I have presented here a systematic application of a phage display pipeline

to rapidly identify peptides against a diverse set of domains. To my knowledge, this is the first

successful application of phage display against such a diverse class of domains.


5 REFERENCES


[1] Pawson, T., Nash, P., Assembly of Cell Regulatory Systems Through Protein Interaction Domains. Science 2003, 300, 445–452.

[2] Kim, P.M., Sboner, A., Xia, Y., Gerstein, M., The role of disorder in interaction networks: a structural analysis. Mol Syst Biol n.d., 4, 179–179.

[3] Pawson, T., Warner, N., Oncogenic re-wiring of cellular signaling pathways. Oncogene 2007, 26, 1268–1275.

[4] Wells, J.A., McClendon, C.L., Reaching for high-hanging fruit in drug discovery at protein–protein interfaces. Nature 2007, 450, 1001–1009.

[5] Teyra, J., Sidhu, S.S., Kim, P.M., Elucidation of the binding preferences of peptide recognition modules: SH3 and PDZ domains. FEBS Letters n.d.

[6] Rual, J.-F., Venkatesan, K., Hao, T., Hirozane-Kishikawa, T., et al., Towards a proteome-scale map of the human protein–protein interaction network. Nature 2005, 437, 1173–1178.

[7] Puntervoll, P., Linding, R., Gemünd, C., Chabanis-Davidson, S., et al., ELM server: A new resource for investigating short functional sites in modular eukaryotic proteins. Nucleic Acids Res. 2003, 31, 3625–3630.

[8] Jones, S., Thornton, J.M., Principles of protein-protein interactions. Proceedings of the National Academy of Sciences 1996, 93, 13–20.

[9] Ceol, A., Chatr-aryamontri, A., Santonico, E., Sacco, R., et al., DOMINO: a database of domain–peptide interactions. Nucleic Acids Res 2007, 35, D557–D560.

[10] Encinar, J.A., Fernandez-Ballester, G., Sánchez, I.E., Hurtado-Gomez, E., et al., ADAN: a database for prediction of protein–protein interaction of modular domains mediated by linear motifs. Bioinformatics 2009, 25, 2418–2424.

[11] Vanhee, P., Reumers, J., Stricher, F., Baeten, L., et al., PepX: a structural database of non-redundant protein–peptide complexes. Nucleic Acids Res 2010, 38, D545–D551.

[12] London, N., Movshovitz-Attias, D., Schueler-Furman, O., The Structural Basis of Peptide-Protein Binding Strategies. Structure 2010, 18, 188–199.

[13] Clackson, T., Wells, J., A hot spot of binding energy in a hormone-receptor interface. Science 1995, 267, 383–386.

[14] Johnston, C.A., Willard, F.S., Jezyk, M.R., Fredericks, Z., et al., Structure of Galpha(i1) bound to a GDP-selective peptide provides insight into guanine nucleotide exchange. Structure 2005, 13, 1069–1080.

[15] McKenna, N.J., Lanz, R.B., O’Malley, B.W., Nuclear Receptor Coregulators: Cellular and Molecular Biology. Endocrine Reviews 1999, 20, 321–344.

[16] Seet, B.T., Dikic, I., Zhou, M.-M., Pawson, T., Reading protein modifications with interaction domains. Nature Reviews Molecular Cell Biology 2006, 7, 473–483.

[17] Good, M.C., Zalatan, J.G., Lim, W.A., Scaffold Proteins: Hubs for Controlling the Flow of Cellular Information. Science 2011, 332, 680–686.

[18] Youle, R.J., Strasser, A., The BCL-2 protein family: opposing activities that mediate cell death. Nature Reviews Molecular Cell Biology 2008, 9, 47–59.

[19] Tanaka, S., Louie, D.C., Kant, J.A., Reed, J.C., Frequent incidence of somatic mutations in translocated BCL2 oncogenes of non-Hodgkin’s lymphomas. Blood 1992, 79, 229–237.

[20] Chittenden, T., Harrington, E.A., O’Connor, R., Remington, C., et al., Induction of apoptosis by the Bcl-2 homologue Bak. Nature 1995, 374, 733–736.

[21] Wang, J.-L., Zhang, Z.-J., Choksi, S., Shan, S., et al., Cell Permeable Bcl-2 Binding Peptides: A Chemical Approach to Apoptosis Induction in Tumor Cells. Cancer Res 2000, 60, 1498–1502.

[22] Oltersdorf, T., Elmore, S.W., Shoemaker, A.R., Armstrong, R.C., et al., An inhibitor of Bcl-2 family proteins induces regression of solid tumours. Nature 2005, 435, 677.

[23] Frank, R., The SPOT-synthesis technique: Synthetic peptide arrays on membrane supports—principles and applications. Journal of Immunological Methods 2002, 267, 13–26.


[24] Yu, H., Chen, J.K., Feng, S., Dalgarno, D.C., et al., Structural basis for the binding of proline-rich peptides to SH3 domains. Cell 1994, 76, 933–945.

[25] Filippakopoulos, P., Picaud, S., Mangos, M., Keates, T., et al., Histone Recognition and Large-Scale Structural Analysis of the Human Bromodomain Family. Cell 2012, 149, 214–231.

[26] Mok, J., Kim, P.M., Lam, H.Y.K., Piccirillo, S., et al., Deciphering Protein Kinase Specificity through Large-Scale Analysis of Yeast Phosphorylation Site Motifs. Sci Signal n.d., 3, ra12–ra12.

[27] Sidhu, S.S., Lowman, H.B., Cunningham, B.C., Wells, J.A., Phage display for selection of novel binding peptides. Meth. Enzymol. 2000, 328, 333–363.

[28] Tonikian, R., Zhang, Y., Sazinsky, S.L., Currell, B., et al., A specificity map for the PDZ domain family. PLoS Biol. 2008, 6, e239.

[29] Tonikian, R., Xin, X., Toret, C.P., Gfeller, D., et al., Bayesian modeling of the yeast SH3 domain interactome predicts spatiotemporal dynamics of endocytosis proteins. PLoS Biol. 2009, 7, e1000218.

[30] Tong, A.H.Y., Drees, B., Nardelli, G., Bader, G.D., et al., A Combined Experimental and Computational Strategy to Define Protein Interaction Networks for Peptide Recognition Modules. Science 2002, 295, 321–324.

[31] Heinis, C., Rutherford, T., Freund, S., Winter, G., Phage-encoded combinatorial chemical libraries based on bicyclic peptides. Nature Chemical Biology 2009, 5, 502–507.

[32] Bernal, F., Wade, M., Godes, M., Davis, T.N., et al., A stapled p53 helix overcomes HDMX-mediated suppression of p53. Cancer Cell 2010, 18, 411–422.

[33] Abedi, M.R., Caponigro, G., Kamb, A., Green fluorescent protein as a scaffold for intracellular presentation of peptides. Nucl. Acids Res. 1998, 26, 623–630.

[34] van de Wijngaart, D.J., Dubbink, H.J., Molier, M., de Vos, C., et al., Inhibition of androgen receptor functions by gelsolin FxxFF peptide delivered by transfection, cell-penetrating peptides, and lentiviral infection. Prostate 2011, 71, 241–253.

[35] Stanger, K., Steffek, M., Zhou, L., Pozniak, C.D., et al., Allosteric peptides bind a caspase zymogen and mediate caspase tetramerization. Nature Chemical Biology 2012, 8, 655–660.

[36] Zhang, Y., Appleton, B.A., Wiesmann, C., Lau, T., et al., Inhibition of Wnt signaling by Dishevelled PDZ peptides. Nat. Chem. Biol. 2009, 5, 217–219.

[37] Cook, D.J., Teves, L., Tymianski, M., Treatment of stroke with a PSD-95 inhibitor in the gyrencephalic primate brain. Nature 2012, 483, 213–217.

[38] Bach, A., Clausen, B.H., Møller, M., Vestergaard, B., et al., A high-affinity, dimeric inhibitor of PSD-95 bivalently interacts with PDZ1-2 and protects against ischemic brain damage. PNAS 2012, 109, 3317–3322.

[39] Li, L., Thomas, R.M., Suzuki, H., De Brabander, J.K., et al., A Small Molecule Smac Mimic Potentiates TRAIL- and TNFα-Mediated Cell Death. Science 2004, 305, 1471–1474.

[40] Moerke, N.J., Aktas, H., Chen, H., Cantel, S., et al., Small-Molecule Inhibition of the Interaction between the Translation Initiation Factors eIF4E and eIF4G. Cell 2007, 128, 257–267.

[41] Marcotte, R., Brown, K.R., Suarez, F., Sayad, A., et al., Essential Gene Profiles in Breast, Pancreatic, and Ovarian Cancer Cells. Cancer Discovery 2012, 2, 172–189.

[42] Luo, B., Cheung, H.W., Subramanian, A., Sharifnia, T., et al., Highly parallel identification of essential genes in cancer cells. Proc Natl Acad Sci U S A 2008, 105, 20380–20385.

[43] Hooda, Y., Kim, P.M., Computational structural analysis of protein interactions and networks. PROTEOMICS 2012, 12, 1697–1705.

[44] Sidhu, S.S., Phage Display In Biotechnology And Drug Discovery, CRC Press, 2005.

[45] Tonikian, R., Zhang, Y., Boone, C., Sidhu, S.S., Identifying specificity profiles for peptide recognition modules from phage-displayed peptide libraries. Nat. Protocols 2007, 2, 1368–1386.

[46] Liu, Q., Berry, D., Nash, P., Pawson, T., et al., Structural Basis for Specific Binding of the Gads SH3 Domain to an RxxK Motif-Containing SLP-76 Peptide: A Novel Mode of Peptide Recognition. Molecular Cell 2003, 11, 471–481.


[47] Sparks, A.B., Rider, J.E., Hoffman, N.G., Fowlkes, D.M., et al., Distinct ligand preferences of Src homology 3 domains from Src, Yes, Abl, Cortactin, p53bp2, PLCgamma, Crk, and Grb2. PNAS 1996, 93, 1540–1544.

[48] Penkert, R.R., DiVittorio, H.M., Prehoda, K.E., Internal Recognition Through PDZ Domain Plasticity in the Par-6 - Pals1 Complex. Nat Struct Mol Biol 2004, 11, 1122–1127.

[49] Rapali, P., Szenes, Á., Radnai, L., Bakos, A., et al., DYNLL/LC8: a light chain subunit of the dynein motor complex and beyond. FEBS Journal 2011, 278, 2980–2996.

[50] Rapali, P., Radnai, L., Süveges, D., Harmat, V., et al., Directed Evolution Reveals the Binding Motif Preference of the LC8/DYNLL Hub Protein and Predicts Large Numbers of Novel Binders in the Human Proteome. PLoS ONE 2011, 6, e18818.

[51] Bennett-Lovsey, R., Hart, S.E., Shirai, H., Mizuguchi, K., The SWIB and the MDM2 domains are homologous and share a common fold. Bioinformatics 2002, 18, 626–630.

[52] Pazgier, M., Liu, M., Zou, G., Yuan, W., et al., Structural Basis for High-Affinity Peptide Inhibition of P53 Interactions with MDM2 and MDMX. PNAS 2009, 106, 4665–4670.

[53] Hu, B., Gilkes, D.M., Chen, J., Efficient P53 Activation and Apoptosis by Simultaneous Disruption of Binding to MDM2 and MDMX. Cancer Res 2007, 67, 8810–8817.

[54] Ja, W.W., Adhikari, A., Austin, R.J., Sprang, S.R., Roberts, R.W., A peptide core motif for binding to heterotrimeric G protein alpha subunits. J. Biol. Chem. 2005, 280, 32057–32060.

[55] Prévost, G.P., Lonchampt, M.O., Holbeck, S., Attoub, S., et al., Anticancer Activity of BIM-46174, a New Inhibitor of the Heterotrimeric Gα/Gβγ Protein Complex. Cancer Res 2006, 66, 9227–9234.

[56] Hermeking, H., The 14-3-3 cancer connection. Nat Rev Cancer 2003, 3, 931–943.

[57] Muslin, A.J., Tanner, J.W., Allen, P.M., Shaw, A.S., Interaction of 14-3-3 with Signaling Proteins Is Mediated by the Recognition of Phosphoserine. Cell 1996, 84, 889–897.

[58] Masters, S.C., Pederson, K.J., Zhang, L., Barbieri, J.T., Fu, H., Interaction of 14-3-3 with a Nonphosphorylated Protein Ligand, Exoenzyme S of Pseudomonas aeruginosa. Biochemistry 1999, 38, 5216–5221.

[59] Wang, B., Yang, H., Liu, Y.-C., Jelinek, T., et al., Isolation of High-Affinity Peptide Antagonists of 14-3-3 Proteins by Phage Display. Biochemistry 1999, 38, 12499–12504.

[60] Mapelli, M., Massimiliano, L., Santaguida, S., Musacchio, A., The Mad2 Conformational Dimer: Structure and Implications for the Spindle Assembly Checkpoint. Cell 2007, 131, 730–743.

[61] Luo, X., Tang, Z., Rizo, J., Yu, H., The Mad2 Spindle Checkpoint Protein Undergoes Similar Major Conformational Changes Upon Binding to Either Mad1 or Cdc20. Molecular Cell 2002, 9, 59–71.

[62] Gulbis, J.M., Kelman, Z., Hurwitz, J., O’Donnell, M., Kuriyan, J., Structure of the C-terminal region of p21(WAF1/CIP1) complexed with human PCNA. Cell 1996, 87, 297–306.

[63] Meslet-Cladiére, L., Norais, C., Kuhn, J., Briffotaux, J., et al., A Novel Proteomic Approach Identifies New Interaction Partners for Proliferating Cell Nuclear Antigen. Journal of Molecular Biology 2007, 372, 1137–1148.

[64] Xu, H., Zhang, P., Liu, L., Lee, M.Y.W.T., A Novel PCNA-Binding Motif Identified by the Panning of a Random Peptide Display Library. Biochemistry 2001, 40, 4512–4520.

[65] Hishiki, A., Hashimoto, H., Hanafusa, T., Kamei, K., et al., Structural Basis for Novel Interactions between Human Translesion Synthesis Polymerases and Proliferating Cell Nuclear Antigen. J. Biol. Chem. 2009, 284, 10552–10560.

[66] Izard, T., Evans, G., Borgon, R.A., Rush, C.L., et al., Vinculin activation by talin through helical bundle conversion. Nature 2003, 427, 171–175.

[67] Adey, N.B., Kay, B.K., Isolation of peptides from phage-displayed random peptide libraries that interact with the talin-binding domain of vinculin. Biochem J 1997, 324, 523–528.

[68] Gingras, A.R., Ziegler, W.H., Frank, R., Barsukov, I.L., et al., Mapping and Consensus Sequence Identification for Multiple Vinculin Binding Sites within the Talin Rod. J. Biol. Chem. 2005, 280, 37217–37224.

[69] Bayliss, R., Littlewood, T., Stewart, M., Structural basis for the interaction between FxFG nucleoporin repeats and importin-beta in nuclear trafficking. Cell 2000, 102, 99–108.

[70] Ambrus, G., Whitby, L.R., Singer, E.L., Trott, O., et al., Small molecule peptidomimetic inhibitors of importin α/β mediated nuclear transport. Bioorg. Med. Chem. 2010, 18, 7611–7620.

[71] Arcus, V., OB-fold domains: a snapshot of the evolution of sequence, structure and function. Current Opinion in Structural Biology 2002, 12, 794–801.

[72] Bochkareva, E., Kaustov, L., Ayed, A., Yi, G.-S., et al., Single-stranded DNA mimicry in the p53 transactivation domain interaction with replication protein A. PNAS 2005, 102, 15412–15417.

[73] Anciano Granadillo, V.J., Earley, J.N., Shuck, S.C., Georgiadis, M.M., et al., Targeting the OB-Folds of Replication Protein A with Small Molecules. Journal of Nucleic Acids 2010, 2010, 1–11.

[74] Glanzer, J.G., Liu, S., Oakley, G.G., Small molecule inhibitor of the RPA70 N-terminal protein interaction domain discovered using in silico and in vitro methods. Bioorganic & Medicinal Chemistry 2011, 19, 2589–2595.

[75] Suyama, M., Doerks, T., Braun, I.C., Sattler, M., et al., Prediction of structural domains of TAP reveals details of its interaction with p15 and nucleoporins. EMBO reports 2000, 1, 53–58.

[76] Zolotukhin, A.S., Michalowski, D., Smulevitch, S., Felber, B.K., Retroviral constitutive transport element evolved from cellular TAP(NXF1)-binding sequences. J. Virol. 2001, 75, 5567–5575.

[77] Grant, R.P., Neuhaus, D., Stewart, M., Structural basis for the interaction between the Tap/NXF1 UBA domain and FG nucleoporins at 1 Å resolution. J. Mol. Biol. 2003, 326, 849–858.

[78] McCullough, J., Fisher, R.D., Whitby, F.G., Sundquist, W.I., Hill, C.P., ALIX-CHMP4 interactions in the human ESCRT pathway. Proc Natl Acad Sci U S A 2008, 105, 7687–7691.

[79] Sette, P., Mu, R., Dussupt, V., Jiang, J., et al., The Phe105 loop of Alix Bro1 domain plays a key role in HIV-1 release. Structure 2011, 19, 1485–1495.

[80] Aranda, A., Pascual, A., Nuclear Hormone Receptors and Gene Expression. Physiol Rev 2001, 81, 1269–1304.

[81] Xu, H.E., Stanley, T.B., Montana, V.G., Lambert, M.H., et al., Structural basis for antagonist-mediated recruitment of nuclear co-repressors by PPARα. Nature 2002, 415, 813–817.

[82] Makishima, M., Okamoto, A.Y., Repa, J.J., Tu, H., et al., Identification of a nuclear receptor for bile acids. Science 1999, 284, 1362–1365.

[83] Maki, M., Kitaura, Y., Satoh, H., Ohkouchi, S., Shibata, H., Structures, functions and molecular evolution of the penta-EF-hand Ca2+-binding proteins. Biochimica et Biophysica Acta (BBA) - Proteins & Proteomics 2002, 1600, 51–60.

[84] Todd, B., Moore, D., Deivanayagam, C.C., Lin, G., et al., A Structural Model for the Inhibition of Calpain by Calpastatin: Crystal Structures of the Native Domain VI of Calpain and its Complexes with Calpastatin Peptide and a Small Molecule Inhibitor. Journal of Molecular Biology 2003, 328, 131–146.

[85] Shibata, H., Suzuki, H., Kakiuchi, T., Inuzuka, T., et al., Identification of Alix-Type and Non-Alix-Type ALG-2-Binding Sites in Human Phospholipid Scramblase 3: Differential Binding to an Alternatively Spliced Isoform and Amino Acid-Substituted Mutants. J. Biol. Chem. 2008, 283, 9623–9632.

[86] Suzuki, H., Kawasaki, M., Inuzuka, T., Okumura, M., et al., Structural basis for Ca2+-dependent formation of ALG-2/Alix peptide complex: Ca2+/EF3-driven arginine switch mechanism. Structure 2008, 16, 1562–1573.

[87] Høj, B.R., la Cour, J.M., Mollerup, J., Berchtold, M.W., ALG-2 knockdown in HeLa cells results in G2/M cell cycle phase accumulation and cell death. Biochemical and Biophysical Research Communications 2009, 378, 145–148.

[88] Stirnimann, C.U., Petsalaki, E., Russell, R.B., Müller, C.W., WD40 proteins propel cellular networks. Trends Biochem. Sci. 2010, 35, 565–574.

[89] Xu, C., Min, J., Structure and function of WD40 domain proteins. Protein Cell 2011, 2, 202–214.

[90] Patel, A., Dharmarajan, V., Cosgrove, M.S., Structure of WDR5 bound to mixed lineage leukemia protein-1 peptide. J. Biol. Chem. 2008, 283, 32158–32161.

[91] Zhang, P., Lee, H., Brunzelle, J.S., Couture, J.-F., The plasticity of WDR5 peptide-binding cleft enables the binding of the SET1 family of histone methyltransferases. Nucleic Acids Res 2012, 40, 4237–4246.

[92] Karatas, H., Townsend, E.C., Bernard, D., Dou, Y., Wang, S., Analysis of the Binding of Mixed Lineage Leukemia 1 (MLL1) and Histone 3 Peptides to WD Repeat Domain 5 (WDR5) for the Design of Inhibitors of the MLL1−WDR5 Interaction. J. Med. Chem. 2010, 53, 5179–5185.

[93] Lemmon, S.K., Traub, L.M., Getting in Touch with the Clathrin Terminal Domain. Traffic 2012, 13, 511–519.

[94] Haar, E. ter, Harrison, S.C., Kirchhausen, T., Peptide-in-groove interactions link target proteins to the β-propeller of clathrin. PNAS 2000, 97, 1096–1100.

[95] Miele, A.E., Watson, P.J., Evans, P.R., Traub, L.M., Owen, D.J., Two distinct interaction motifs in amphiphysin bind two independent sites on the clathrin terminal domain beta-propeller. Nat. Struct. Mol. Biol. 2004, 11, 242–248.

[96] Kang, D.S., Kern, R.C., Puthenveedu, M.A., Zastrow, M. von, et al., Structure of an Arrestin2-Clathrin Complex Reveals a Novel Clathrin Binding Domain That Modulates Receptor Trafficking. J. Biol. Chem. 2009, 284, 29860–29872.

[97] Willox, A.K., Royle, S.J., Functional Analysis of Interaction Sites on the N-Terminal Domain of Clathrin Heavy Chain. Traffic 2012, 13, 70–81.

[98] Weisbrich, A., Honnappa, S., Jaussi, R., Okhrimenko, O., et al., Structure-function relationship of CAP-Gly domains. Nat. Struct. Mol. Biol. 2007, 14, 959–967.

[99] Steinmetz, M.O., Akhmanova, A., Capturing protein tails by CAP-Gly domains. Trends Biochem. Sci. 2008, 33, 535–545.

[100] Matsuo, H., Li, H., McGuire, A.M., Fletcher, C.M., et al., Structure of translation factor eIF4E bound to m7GDP and interaction with 4E-binding protein. Nature Structural & Molecular Biology 1997, 4, 717–724.

[101] Marcotrigiano, J., Gingras, A.-C., Sonenberg, N., Burley, S.K., Cap-Dependent Translation Initiation in Eukaryotes Is Regulated by a Molecular Mimic of eIF4G. Molecular Cell 1999, 3, 707–716.

[102] Volpon, L., Osborne, M.J., Capul, A.A., Torre, J.C. de la, Borden, K.L.B., Structural characterization of the Z RING-eIF4E complex reveals a distinct mode of control for eIF4E. PNAS 2010, 107, 5441–5446.

[103] Swanson, K.A., Kang, R.S., Stamenova, S.D., Hicke, L., Radhakrishnan, I., Solution structure of Vps27 UIM–ubiquitin complex important for endosomal sorting and receptor downregulation. EMBO J 2003, 22, 4597–4606.

[104] Hofmann, K., Falquet, L., A ubiquitin-interacting motif conserved in components of the proteasomal and lysosomal protein degradation systems. Trends in Biochemical Sciences 2001, 26, 347–350.

[105] Sims, J.J., Scavone, F., Cooper, E.M., Kane, L.A., et al., Polyubiquitin-sensor proteins reveal localization and linkage-type dependence of cellular ubiquitin signaling. Nature Methods 2012, 9, 303–309.

[106] Fairall, L., Chapman, L., Moss, H., de Lange, T., Rhodes, D., Structure of the TRFH dimerization domain of the human telomeric proteins TRF1 and TRF2. Mol. Cell 2001, 8, 351–361.

[107] Chen, Y., Yang, Y., Overbeek, M. van, Donigian, J.R., et al., A Shared Docking Motif in TRF1 and TRF2 Used for Differential Recruitment of Telomeric Proteins. Science 2008, 319, 1092–1096.

[108] Ernst, A., Gfeller, D., Kan, Z., Seshagiri, S., et al., Coevolution of PDZ domain-ligand interactions analyzed by high-throughput phage display and deep sequencing. Mol Biosyst 2010, 6, 1782–1790.

[109] Gfeller, D., Butty, F., Wierzbicka, M., Verschueren, E., et al., The multiple-specificity landscape of modular peptide recognition domains. Mol Syst Biol 2011, 7.

[110] Huang, H., Sidhu, S.S., Studying binding specificities of peptide recognition modules by high-throughput phage display selections. Methods Mol. Biol. 2011, 781, 87–97.

APPENDIX

Appendix A: List of ovarian cancer cell lines

Cancer cell line      Cancer type   Species        Development stage   Morphology
609050M               Ovarian       Homo sapiens   Adult               Epithelial
A2780                 Ovarian       Homo sapiens   Adult               Epithelial
A2780_CIS             Ovarian       Homo sapiens   Adult               Epithelial
MM_OVCAR432_Bast_1    Ovarian       Homo sapiens   Adult               Epithelial
OV-1946               Ovarian       Homo sapiens   Adult               Epithelial
OV-90                 Ovarian       Homo sapiens   Adult               Epithelial
OVCA1369_TR           Ovarian       Homo sapiens   Adult               Epithelial
OVCA433_Bast          Ovarian       Homo sapiens   Adult               Epithelial
OVCA5                 Ovarian       Homo sapiens   Adult               Epithelial
OVCA8                 Ovarian       Homo sapiens   Adult               Epithelial
OVCAR-3               Ovarian       Homo sapiens   Adult               Epithelial
SK-OV-3               Ovarian       Homo sapiens   Adult               Epithelial
TOV-1946              Ovarian       Homo sapiens   Adult               Epithelial
TOV-2223G             Ovarian       Homo sapiens   Adult               Epithelial
TOV-3133G             Ovarian       Homo sapiens   Adult               Epithelial

* Information obtained from the COLT-Cancer database at the Moffat lab, Terrence Donnelly CCBR, University of Toronto.

Appendix B: Protein sequences of 66 domains

1433F GDREQLLQRARLAEQAERYDDMASAMKAVTELNEPLSNEDRNLLSVAYKNVVGARRSSWRVISSIEQKTMADGNEKKLEKVKAYREKIEKELETVCNDVLSLLDKFLIKNCNDFQYESKVFYLKMKGDYYRYLAEVASGEKKNSVVEASEAAYKEAFEISKEQMQPTHPIRLGLALNFSVFYYEIQNAPEQACLLAKQAFDDAIAELDTLNEDSYKDSTLIMQLLRDNLTLWTSDQQDEEAGEGN ACTG2MEEEIAALVIDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQSKRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMTQIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDLAGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSYELPDGQVITIGNERFRCPEALFQPSFLGMESCGIHETTFNSIMKCDVDIRKDLYANTVLSGGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQEYDESGPSIVHRKCF ACTH EEETTALVCDNGSGLCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQSKRGILTLKYPIEHGIITNWDDMEKIWHHSFYNELRVAPEEHPTLLTEAPLNPKANREKMTQIMFETFNVPAMYVAIQAVLSLYASGRTTGIVLDSGDGVTHNVPIYEGYALPHAIMRLDLAGRDLTDYLMKILTERGYSFVTTAEREIVRDIKEKLCYVALDFENEMATAASSSSLEKSYELPDGQVITIGNERFRCPETLFQPSFIGMESAGIHETTYNSIMKCDIDIRKDLYANNVLSGGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKPEYDEAGPSIVHRKCF IGLL1 LLRPTAASQSRALGPGAPGGSSRSSLRSRWGRFLLQRGSWTGPRCWPRGFQSKHNSVTHVFGSGTQLTVLSQPKATPSVTLFPPSSEELQANKATLVCLMNDFYPGILTVTWKADGTPITQGVEMTTPSKQSNNKYAASSYLSLTPEQWRSRRSYSCQVMHEGSTVEKTVAPAECS AP2M1 KYRRNELFLDVLESVNLLMSPQGQVLSAHVSGRVVMKSYLSGMPECKFGMNDKIVIEKQGKGTADETSKSGKQSIAIDDCTFHQCVRLSKFDSERSISFIPPDGEFELMRYRTTKDIILPFRVIPLVREVGRTKLEVKVVIKSNFKPSLLAQKIEVRIPTPLNTSGVQVICMKGKAKYKASENAIVWKIKRMAGMKESQISAEIELLPTNDKKKWARPPISMNFEVPFAPSGLKVRYLKVFEPKLNYSDHDVIKWVRYIGRSGIYETRC B2CL1 MSQSNRELVVDFLSYKLSQKGYSWSQFSDVEENRTEAPEGTESEMETPSAINGNPSWHLADSPAVNGATGHSSSLDAREVIPMAAVKQALREAGDEFELRYRRAFSDLTSQLHITPGTAYQSFEQVVNELFRDGVNWGRIVAFFSFGGALCVESVDKEMQVLVSRIAAWMATYLNDHLEPWIQENGGWDTFVELYGNNAAAESRKGQERFNRWFLTGMTVAGVVLLGSLFSRK PDC6I 
TFISVQLKKTSEVDLAKPLVKFIQQTYPSGGEEQAQYCRAAEELSKLRRAAVGRPLDKHEGALETLLRYYDQICSIEPKFPFSENQICLTFTWKDAFDKGSLFGGSVKLALASLGYEKSCVLFNCAALASQIAAEQNLDNDEGLKIAAKHYQFASGAFLHIKETVLSALSREPTVDISPDTVGTLSLIMLAQAQEVFFLKATRDKMKDAIIAKLANQAADYFGDAFKQCQYKDTLPKEVFPVLAAKHCIMQANAEYHQSILAKQQKKFGEEIARLQHAAELIKTVASRYDEYVNVKDFSDKINRALAAAKKDNDFIYHDRVPDLKDLDPIGKATLVKSTPVNVPISQKFTDLFEKMVPVSVQQSLAAYNQRKADLVNRSIAQMREATTLA CPNS1 MFLVNSFLKGGGGGGGGGGGLGGGLGNVLGGLISGAGGGGGGGGGGGGGGGGGGGGTAMRILGGVISAISEAAAQYNPEPPPPRTHYSNIEANESEEVRQFRRLFAQLAGDDMEVSATELMNILNKVVTRHPDLKTDGFGIDTCRSMVAVMDSDTTGKLGFEEFKYLWNNIKRWQAIYKQFDTDRSGTICSSELPGAFEAAGFHLNEHLYNMIIRRYSDESGNMDFDNFISCLVRLDAMFRAFKSLDKDGTGQIQVNIQEWLQLTMYS DCTN1 PLRVGSRVEVIGKGHRGTVAYVGATLFATGKWVGVILDEAKGKNDGTVQGRKYFTCDEGHGIFVRQSQIQVF CASP2 RLSTDTVEHSLDNKDGPVCLQVKPCTPEFYQTHFQLAYRLQSRPRGLALVLSNVHFTGEKELEFRSGGDVDHSTLVTLFKLLGYDVHVLCDQTAQEMQEKLQNFAQLPAHRVTDSCIVALLSHGVEGAIYGVDGKLLQLQEVFQLFDNANCPSLQNKPKMFFIQACRGDETDRGVDQQDGKNHAGSPGCEESDAGKEKLPKMRLPTRSDMICGYACLKGTAAMRNTKRGSWYIEALAQVFSERACDMHVADMLVKVNALIKDREGYAPGTEFHRCKEMSEYCSTLCRHLYLFPGHPPT CLH1 MAQILPIRFQEHLQLQNLGINPANIGFSTLTMESDKFICIREKVGEQAQVVIIDMNDPSNPIRRPISADSAIMNPASKVIALKAGKTLQIFNIEMKSKMKAHTMTDDVTFWKWISLNTVALVTDNAVYHWSMEGESQPVKMFDRHSSLAGCQIINYRTDAKQKWLLLTGISAQQNRVVGAMQLYSVDRKVSQPIEGHAASFAQFKMEGNAEESTLFCFAVRGQAGGKLHIIEVGTPPTGNQPFPKKAVDVFFPPEAQN

DFPVAMQISEKHDVVFLITKYGYIHLYDLETGTCIYMNRISGETIFVTAPHEATAGIIGVNRKGQVLSVCVEEENIIPYITNVLQNPDLALRMAVRNNLAGAEEL DLG1-1 EITLERGNSGLGFSIAGGTDNPHIGDDSSIFITKIITGGAAAQDGRLRVNDCILRVNEVDVRDVTHSKAVEALKEAGSIVRLYVKRR DLG1-2 EIKLIKGPKGLGFSIAGGVGNQHIPGDNSIYVTKIIEGGAAHKDGKLQIGDKLLAVNNVCLEEVTHEEAVTALKNTSDFVYLKVAKP DLG2-2 EIKLFKGPKGLGFSIAGGVGNQHIPGDNSIYVTKIIDGGAAQKDGRLQVGDRLLMVNNYSLEEVTHEEAVAILKNTSEVVYLKVGKP DLG2-3 KVVLHKGSTGLGFNIVGGEDGEGIFVSFILAGGPADLSGELQRGDQILSVNGIDLRGASHEQAAAALKGAGQTVTIIAQYQ DLG4-2 EIKLIKGPKGLGFSIAGGVGNQHIPGDNSIYVTKIIEGGAAHKDGRLQIGDKILAVNSVGLEDVMHEDAVAALKNTYDVVYLKVAKP DLG4-3 RIVIHRGSTGLGFNIVGGEDGEGIFISFILAGGPADLSGELRKGDQILSVNGVDLRNASHEQAAIALKNAGQTVTIIAQYK DNAS1 LKIAAFNIQTFGETKMSNATLVSYIVQILSRYDIALVQEVRDSHLTAVGKLLDNLNQDAPDTYHYVVSEPLGRNSYKERYLFVYRPDQVSAVDSYYYDDGCEPCGNDTFNREPAIVRFFSRFTEVREFAIVPLHAAPGDAVAEIDALYDVYLDVQEKWGLEDVMLMGDFNAGCSYVRPSQWSSIRLWTSPTFQWLIPDSADTTATPTHCAYDRIVVAGMLLRGAVVPDSALPFNFQAAYGLSDQLAQAISDHYPVEVMLK DYL2 MSDRKAVIKNADMSEDMQQDAVDCATQAMEKYNIEKDIAAYIKKEFDKKYNPTWHCIVGRNFGSYVTHETKHFIYFYLGQVAILLFKSG DYL1 MCDRKAVIKNADMSEEMQQDSVECATQALEKYNIEKDIAAHIKKEFDKKYNPTWHCIVGRNFGSYVTHETKHFIYFYLGQVAILLFKSG PROF2 MAGWQSYVDNLMCDGCCQEAAIVGYCDAKYVWAATAGGVFQSITPIEIDMIVGKDREGFFTNGLTLGAKKCSVIRDSLYVDGDCTMDIRTKSQGGEPTYNVAVGRAGRVLVFVMGKEGVHGGGLNKKAYSMAKYLRDSGF DYN2 MEELIPLVNKLQDAFSSIGQSCHLDLPQIAVVGGQSAGKSSVLENFVGRDFLPRGSGIVTRRPLILQLIFSKTEHAEFLHCKSKKFTDFDEVRQEIEAETDRVTGTNKGISPVPINLRVYSPHVLNLTLIDLPGITKVPVGDQPPDIEYQIKDMILQFISRESSLILAVTPANMDLANSDALKLAKEVDPQGLRTIGVITKLDLMDEGTDARDVLENKLLPLRRGYIGVVNRSQKDIEGKKDIRAALAAERKFFLSHPAYRHMADRMGTPHLQKTLNQQLTNHIRESLPALRSKLQSQL PDCD6 PDQSFLWNVFQRVDKDRSGVISDTELQQALSNGTWTPFNPVTVRSIISMFDRENKAGVNFSEFTGVWKYITDWQNVFRTYDRDNSGMIDKNELKQALSGFGYRLSDQFHDILIRKFDRQGRGQIAFDDFIQGCIVLQRLTDIFRRYDTDQDGWIQVSYEQYLSMVFSIV IF4E ATVEPETTPTPNPPTTEEEKTESNQEVANPEHYIKHPLQNRWALWFFKNDKSKTWQANLRLISKFDTVEDFWALYNHIQLSSNLMPGCDYSLFKDGIEPMWEDEKNKRGGRWLITLNKQQRRSDLDRFWLETLLCLIGESFDDYSDDVCGAVVNVRAKGDKIAIWTTECENREAVTHIGRVYKERLGLPPKIVIGYQSHADTATKSGSTTKNRFVV E41L3 
MQCKVILLDGSEYTCDVEKRSRGQVLFDKVCEHLNLLEKDYFGLTYRDAENQKNWLDPAKEIKKQVRSGAWHFSFNVKFYPPDPAQLSEDITRYYLCLQLRDDIVSGRLPCSFVTLALLGSYTVQSELGDYDPDECGSDYISEFRFAPNHTKELEDKVIELHKSHRGMTPAEAEMHFLENAKKLSMYGVDLHHAKDSEGVEIMLGVCASGLLIYRDRLRINRFAWPKVLKISYKRNNFYIKIRPGEFEQFESTIGFKLPNHRAAKRLWKVCVEHHTFFRLLL

GNAI1 GCTLSAEDKAAVERSKMIDRNLREDGEKAAREVKLLLLGAGESGKSTIVKQMKIIHEAGYSEEECKQYKAVVYSNTIQSIIAIIRAMGRLKIDFGDSARADDARQLFVLAGAAEEGFMTAELAGVIKRLWKDSGVQACFNRSREYQLNDSAAYYLNDLDRIAQPNYIPTQQDVLRTRVKTTGIVETHFTFKDLHFKMFDVGGQRSERKKWIHCFEGVTAIIFCVALSDYDLVLAEDEEMNRMHESMKLFDSICNNKWFTDTSIILFLNKKDLFEEKIKKSPLTICYPEYAGSNTYEEAAAYIQCQFEDLNKRKDTKEIYTHFTCATDTKNVQFVFDAVTDVIIKNNLKDCGLF GNAO GCTLSAEERAALERSKAIEKNLKEDGISAAKDVKLLLLGAGESGKSTIVKQMKIIHEDGFSGEDVKQYKPVVYSNTIQSLAAIVRAMDTLGIEYGDKERKADAKMVCDVVSRMEDTEPFSAELLSAMMRLWGDSGIQECFNRSREYQLNDSAKYYLDSLDRIGAADYQPTEQDILRTRVKTTGIVETHFTFKNLHFRLFDVGGQRSERKKWIHCFEDVTAIIFCVALSGYDQVLHEDETTNRMHESLMLFDSICNNKFFIDTSIILFLNKKDLFGEKIKKSPLTICFPEYTGPNTYEDAAAYIQAQFESKNRSPNKEIYCHMTCATDTNNIQVVFDAVTDIIIANNLRGCGLY GNAI3 GCTLSAEDKAAVERSKMIDRNLREDGEKAAKEVKLLLLGAGESGKSTIVKQMKIIHEDGYSEDECKQYKVVVYSNTIQSIIAIIRAMGRLKIDFGEAARADDARQLFVLAGSAEEGVMTPELAGVIKRLWRDGGVQACFSRSREYQLNDSASYYLNDLDRISQSNYIPTQQDVLRTRVKTTGIVETHFTFKDLYFKMFDVGGQRSERKKWIHCFEGVTAIIFCVALSDYDLVLAEDEEMNRMHESMKLFDSICNNKWFTETSIILFLNKKDLFEEKIKRSPLTICYPEYTGSNTYEEAAAYIQCQFEDLNRRKDTKEIYTHFTCATDTKNVQFVFDAVTDVIIKNNLKECGLY GRB2 MEAIAKYDFKATADDELSFKRGDILKVLNEECDQNWYKAELNGKDGFIPKNYIEMKPH GRAP2 GRVRWARALYDFEALEDDELGFHSGEVVEVLDSSNPSWWTGRLHNKLGLFPANYVAPMTR IMB1 
MELITILEKTVSPDRLELEAAQKFLERAAVENLPTFLVELSRVLANPGNSQVARVAAGLQIKNSLTSKDPDIKAQYQQRWLAIDANARREVKNYVLQTLGTETYRPSSASQCVAGIACAEIPVNQWPELIPQLVANVTNPNSTEHMKESTLEAIGYICQDIDPEQLQDKSNEILTAIIQGMRKEEPSNNVKLAATNALLNSLEFTKANFDKESERHFIMQVVCEATQCPDTRVRVAALQNLVKIMSLYYQYMETYMGPALFAITIEAMKSDIDEVALQGIEFWSNVCDEEMDLAIEASEAAEQGRPPEHTSKFYAKGALQYLVPILTQTLTKQDENDDDDDWNPCKAAGVCLMLLATCCEDDIVPHVLPFIKEHIKNPDWRYRDAAVMAFGCILEGPEPSQLKPLVIQAMPTLIELMKDPSVVVRDTAAWTVGRICELLPEAAINDVYLAPLLQCLIEGLSAEPRVASNVCWAFSSLAEAAYEAADVADDQEEPATYCLSSSFELIVQKLLETTDRPDGHQNNLRSSAYESLMEIVKNSAKDCYPAVQKTTLVIMERLQQVLQMESHIQSTSDRIQFNDLQSLLCATLQNVLRKVQHQDALQISDVVMASLLRMFQSTAGSGGVQEDALMAVSTLVEVLGGEFLKYMEAFKPFLGIGLKNYAEYQVCLAAVGLVGDLCRALQSNIIPFCDEVMQLLLENLGNENVHRSVKPQILSVFGDIALAIGGEFKKYLEVVLNTLQQASQAQVDKSDYDMVDYLNELRESCLEAYTGIVQGLKGDQENVHPDVMLVQPRVEFILSFIDHIAGDEDHTDGVVACAAGLIGDLCTAFGKDVLKLVEARPMIHELLTEGRRSKTNKAKTLATWATKELRKLKNQA NR1H4 KTELTPDQQTLLHFIMDSYNKQRMPQEITNKILKEEFSAEENFLILTEMATNHVQVLVEFTKKLPGFQTLDHEDQIALLKGSAVEAMFLRSAEIFNKKLPSGHSDLLEERIRNSGISDEYITPMFSFYKSIGELKMTQEEYALLTAIVILSPDRQYIKDREAVEKLQEPLLDVLQKLCKIHQPENPQHFACLLGRLTELRTFNHHHAEMLMSWRVNDHK NR1I2 DLCSLKVSLQLRGEDGSVWNYKPPADSGGKEIFSLLPHMADMSTYMFKGIISFAKVISYFRDLPIEDQISLLKGAAFELCQLRFNTVFNAETGTWECGRLSYCLEDTAGGFQQLLLEPMLKFHYMLKKLQLHEEEYVLMQAISLFSPDRPGVLQHRVVDQLQEQFAITLKSYIECNRPQPAHRFLFLKIMAMLTELRSINAQHTQRLLRIQDIHPFATPLMQELFGITGS GCR LTPTLVSLLEVIEPEVLYAGYDSSVPDSTWRIMTTLNMLGGRQVIAAVKWAKAIPGFRNLHLDDQMTLLQYSWMFLMAFALGWRSYRQSSANLLCFAPDLIINEQRMTLPCMYDQCKHMLYVSSELHRLQVSYEEYLCMKTLLLLSSVPKDGLKSQELFDEIRMTYIKELGKAIVKREGNSSQNWQRFYQLTKLLDSMHEVVENLLNYCFQTFLDKTMSIEFPEMLAEIITNQIPKYSNGNIKKLLFHQK RXRG STNDPVTNICHAADKQLFTLVEWAKRIPHFSDLTLEDQVILLRAGWNELLIASFSHRSVSVQDGILLATGLHVHRSSAHSAGVGSIFDRVLTELVSKMKDMQMDKSELGCLRAIVLFNPDAKGLSNPSEVETLREKVYATLEAYTKQKYPEQPGRFAKLLLRLPALRSIGLKCLEHLFFFKLIGDTPIDTFLMEMLETPLQIT MD2L1

ALQLSREQGITLRGSAEIVAEFFSFGINSILYQRGIYPSETFTRVQKYGLTLLVTTDLELIKYLNNVVEQLKDWLYKCSVQKLVVVISNIESGEVLERWQFDIECDKTAKDDSAPREKSQKAIQDEIRSVIRQITATVTFLPLLEVSCSFDLLIYTDKDLVVPEKWEESGPQFITNSEEVRLRSFTTTIHKVNSMVAYKIPVND 2B11 GDTRPRFLWQLKFECHFFNGTERVRLLERCIYNQEESVRFDSDVGEYRAVTELGRPDAEYWNSQKDLLEQRRAAVDTYCRHNYGVGESFTVQRRVEPKVTVYPSKTQPLQHHNLLVCSVSGFYPGSIEVRWFRNGQEEKAGVVSTGLIQNGDWTFQTLVMLETVPRSGEVYTCQVEHPSVTSPLTVEWRARSESAQSK PCNA MFEARLVQGSILKKVLEALKDLINEACWDISSSGVNLQSMDSSHVSLVQLTLRSEGFDTYRCDRNLAMGVNLTSMSKILKCAGNEDIITLRAEDNADTLALVFEAPNQEKVSDYEMKLMDLDVEQLGIPEQEYSCVVKMPSGEFARICRDLSHIGDAVVISCAKDGVKFSASGELGNGNIKLSQTSNVDKEEEAVTIEMNEPVQLTFALRYLNFFTKATPLSSTVTLSMSADVPLVVEYKIADMGHLKYYLAPKIEDEEGS MK03 YTQLQYIGEGAYGMVSSAYDHVRKTRVAIKKISPFEHQTYCQRTLREIQILLRFRHENVIGIRDILRASTLEAMRDVYIVQDLMETDLYKLLKSQQLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLINTTCDLKICDFGLARIADPEHDHTGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINMKARNYLQSLPSKTKVAWAKLFPKSDSKALDLLDRMLTFNPNKRITVEEALAHPYL KAPCB FERKKTLGTGSFGRVMLVKHKATEQYYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVRLEYAFKDNSNLYMVMEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDHQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVSDIKTHKWF SRPK2 YHVIRKLGWGHFSTVWLCWDMQGKRFVAMKVVKSAQHYTETALDEIKLLKCVRESDPSDPNKDMVVQLIDDFKISGMNGIHVCMVFEVLGHHLLKWIIKSNYQGLPVRCVKSIIRQVLQGLDYLHSKCKIIHTDIKPENILMCVDDAYVRRMAAEATEWQKAGAPPPSGSAVSTAPQQKPIGKISKNKKKKLKKKQKRQAELLEKRLQEIEELEREAERKIIEENITSAAPSNDQDGEYCPEVKLKTTGLEEAAEAETAKDNGEAEDQEEKEDAEKENIEKDEDDVDQELANIDPTWIESPKTNGHIENGPFSLEQQLDDEDDDEEDCPNPEEYNLDEPNAESDYTYSSSYEQFNGELPNGRHKIPESQFPEFSTSLFSGSLEPVACGSVLSEGSPLTEQEESSPSHDRSRTVSASSTGDLPKAKTRAADLLVNPLDPRNADKIRVKIADLGNACWVHKHFTEDIQTRQYRSIEVLIGAGYSTPADIWSTACMAFELATGDYLFEPHSGEDYSRDEDHIAHIIELLGSIPRHFALSGKYSREFFNRRGELRHITKLKPWSLFDVLVEKYGWPHEDAAQFTDFLIPMLEMVPEKRASAGECLRHP AURKB 
AQKENSYPWPYGRQTAPSGLSTLPQRVLRKEPVTPSALVLMSRSNVQPTAAPGQKVMENSSGTPDILTRHFTIDDFEIGRPLGKGKFGNVYLAREKKSHFIVALKVLFKSQIEKEGVEHQLRREIEIQAHLHHPNILRLYNYFYDRRRIYLILEYAPRGELYKELQKSCTFDEQRTATIMEELADALMYCHGKKVIHRDIKPENLLLGLKGELKIADFGWSVHAPSLRRKTMCGTLDYLPPEMIEGRMHNEKVDLWCIGVLCYELLVGNPPFESASHNETYRRIVKVDLKFPASVPMGAQDLISKLLRHNPSERLPLAQVSAHPWVRANSRRVLPPSALQSVA KAPCA FERIKTLGTGSFGRVMLVKHKETGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWF PROF1 AGWNAYIDNLMADGTCQDAAIVGYKDSPSVWAAVPGKTFVNITPAEVGVLVGKDRSSFYVNGLTLGGQKCSVIRDSLLQDGEFSMDLRTKSTGGAPTFNVTVTKTDKTLVLLMGKEGVHGGLINKKCYEMASHLRRSQY DAB2 GDGVKYKAKLIGIDDVPDARGDKMSQDSMMKLKGMAAAGRSQGQHKQRIWVNISLSGIKIIDEKTGVIEHEHPVNKISFIARDVTDNRAFGYVCGGEGQHQFFAIKTGQQAEPLVVDLKDLFQVIYNVKKKEEEKKKIEEASKAVENGSEAL RAC1 MQAIKCVVVGDGAVGKTCLLISYTTNAFPGEYIPTVFDNYSANVMVDGKPVNLGLWDTAGQEDYDRLRPLSYPQTDVFLICFSLVSPASFENVRAKWYPEVRHHCPNTPIILVGTKLDLRDDKDTIEKLKEKKLTPITYPQGLAMAKEIGAVKYLECSALTQRGLKTVFDEAIRAVLCPPPVKKRKRKC RAD51

SEIIQITTGSKELDKLLQGGIETGSITEMFGEFRTGKTQICHTLAVTCQLPIDRGGGEGKAMYIDTEGTFRPERLLAVAERYGLSGSDVLDNVAYARAFNTDHQTQLLYQASAMMVESRYALLIVDSATALYRTDYSGRGELSARQMHLARFLRMLLRLADEFGVAVVITNQVVAQVDGAAMFAADPKKPIGGNIIAHASTTRLYLRKGRGETRICKIYDSPCLPEAEAMFAINADGVGDAKD RFA1 VGQLSEGAIAAIMQKGDTNIKPILQVINIRPITTGNSPPRYRLLMSDGLNTLSSFMLATQLNPLVEEEQLSSNCVCQIHRFIVNTLKDGRRVVILMELEVLKSAEAVGVKIGNPVPYNEG U2AF1 LRCAVSDVEMQEHYDEFFEEVFTEMEEKYGEVEEMNVCDNLGDHLVGNVYVKFRREEDAEKAVIDLNNRWFNGQPIHAELSPV IPSP HRHHPREMKKRVEDLHVGATVAPSSRRDFTFDLYRALASAAPSQSIFFSPVSISMSLAMLSLGAGSSTKMQILEGLGLNLQKSSEKELHRGFQQLLQELNQPRDGFQLSLGNALFTDLVVDLQDTFVSAMKTLYLADTFPTNFRDSAGAMKQINDYVAKQTKGKIVDLLKNLDSNAVVIMVNYIFFKAKWETSFNHKGTQEQDFYVTSETVVRVPMMSREDQYHYLLDRNLSCRVVGVPYQGNATALFILPSEGKMQQVENGLSEKTLRKWLKMFKKRQLELYLPKFSIEGSYQLEKVLPSLGISNVFTSHADLSGISNHSNIQVSEMVHKAVVEVDESGTRAAAATGTIFTFRSARLNSQRLVFNRPFLMFIVDNNILFLGKVNRP PLCG1 TFKCAVKALFDYKAQREDELTFIKSAIIQNVEKQEGGWWRGDYGGKKQLWFPSNYVEEMVN CACB1 RPSDSDVSLEEDREALRKEAERQALAQLEKAKTKPVAFAVRTNVGYNPSPGDEVPVQGVAITFEPKDFLHIKEKYNNDWWIGRLVKEGCEVGFIPSPVKLDSLRLLQEQKLRQNRLGSSKSGDNSSSSLGDVVTGTRRPTPPASAKQKQKSTEHVPPYDVVPSMRPIILVGPSLKGYEVTDMMQKALFDFLKHRFDGRISITRVTADISLAKRSVLNNPSKHIIIERSNTRSSLAEVQSEIERIFELARTLQLVALDADTINHPAQLSKTSLAPIIVYIKITSPKVLQRLIKSRGKSQSKHLNVQIAASEKLAQCPPEMFDIILDENQLEDACEHLAEYLEAYWKA MDM4 QVRPKLPLLKILHAAGAQGEMFTVKEVMHYLGQYIMVKQLYDQQEQHMVYCGGDLLGELLGRQSFSVKDPSPLYDMLRKNL NXF1 PEQQEMLQAFSTQSGMNLEWSQKCLQDNNWDYTRSAQAFTHLKAKGEIPEVAFMK T2FA SGDVQVTEDAVRRYLTRKPMTTKDLLKKFQTKKTGLSSEQTVNVLAQILKRLNPERKMINDKMHFSLKE TPA IKGGLFADIASHPWQAAIFAKHRRSPGERFLCGGILISSCWILSAAHCFQERFPPHHLTVILGRTYRVVPGEEEQKFEVEKYIVHKEFDDDTYDNDIALLQLKSDSSRCAQESSVVRTVCLPPADLQLPDWTECELSGYGKHEALSPFYSERLKEAHVRLYPSSRCTSQHLLNRTVTDNMLCAGDTRSGGPQANLHDACQGDSGGPLVCLNDGRMTLVGIISWGLGCGQKDVPGVYTKVTNYLDWIRDNMRP TERF1 EEEEEDAGLVAEAEAVAAGWMLDFLCLSLCRAFRDGRSEDFRRTRNSAEAIIHGLSSLTACQLRTIYICQFLTRIAAGKTLDAQFENDERITPLESALMIWGSIEKEHDKLHEEIQNLIKIQAIAVCMENGNFKEAEEVFERIFGDPNSHMPFKSKLLMIISQKDTFHSFFQHFSYNHMMEKIKSYVNYVLSEKSSTFLMKAAAKVVESKR APLP2 
VKAVCSQEAMTGPCRAVMPRWYFDLSKGKCVRFIYGGCGGNRNNFESEDYCMAVCKAMI ACRO IVGGKAAQHGAWPWMVSLQIFTYNSHRYHTCGGSLLNSRWVLTAAHCFVGKNNVHDWRLVFGAKEITYGNNKPVKAPLQERYVEKIIIHEKYNSATEGNDIALVEITPPISCGRFIGPGCLPHFKAGLPRGSQSCWVAGWGYIEEKAPRPSSILMEARVDLIDLDLCNSTQWYNGRVQPTNVCAGYPVGKIDTCQGDSGGPLMCKDSKESAYVVVGITSWGVGCARAKRPGIYTATWPYLNWIASKIGSNALRMIQSATPPPPTTRPPPIRPPFSHPISAHLPWYFQPPPRPLPPRPPAAQ RL40 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG SORBS GEIGEAIAKYNFNADTNVELSLRKGDRVILLKRVDQNWYEGKIPGTNRQGIFPVSYVEVVKK SPF45

VVLLRNMVGAGEVDEDLEVETKEECEKYGKVGKCVIFEIPGAPDDEAVRIFLEFERVESAIKAVVDLNGRYFGGRVVKAC VINC PVFHTRTIESILEPVAQQISHLVIMHEEGEVDGKAIPDLTAPVAAVQAAVSNLVRVGKETVQTTEDQILKRDMPPAFIKVENACTKLVQAAQMLQSDPYSVPARDYLIDGSRGILSGTSDLLLTFDEAEVRKIIRVCKGILEYLTVAEVVETMEDLVTYTKNLGPGMTKMAKMIDERQQELTHQEHRVMLVNSMNTVKELLPVLISAMKIFVTTKNSKNQGIEEALKNRNFTVEKMSAEINEIIRVLQLTSWDEDAWA WDR5 SSSATQSKPTPVKPNYALKFTLAGHTKAVSSVKFSPNGEWLASSSADKLIKIWGAYDGKFEKTISGHKLGISDVAWSSDSNLLVSASDDKTLKIWDVSSGKCLKTLKGHSNYVFCCNFNPQSNLIVSGSFDESVRIWDVKTGKCLKTLPAHSDPVSAVHFNRDGSLIVSSSYDGLCRIWDTASGQCLKTLIDDDNPPVSFVKFSPNGKYILAATLDNTLKLWDYSKGKCLKTYTGHKNEKYCIFANFSVTGGKWIVSGSEDNLVYIWNLQTKEIVQKLQGHTDVVISTACHPTENIIASAALENDKTIKLWKSDC XRCC4 MERKISRIHLVSEPSITHFLQVSWEKTLESGFVITLTDGHSAWTGTVSESEISQEADDMAMEKGKYVGELRKALLSGAGPADVYTFNFSKESCYFFFEKNLKDVSFRLGSFNLEKVENPAEVIRELICYCLDTIAENQAKNEHLQKENERLLRDWNDVQGRFEKCVSAKEALETDLYKRFILVLNEKKTKIRSLHNKLLNAAQEREKDIKQEG PTN22 MDQREILQKFLDEAQSKKITKEEFANEFLKLKRQSTKYKADKTYPTTVAEKPKNIKKNRYKILPYDYSRVELSLITSDEDSSYINANFIKGVYGPKAYIATQGPLSTTLLDFWRMIWEYSVLIIVMACMEYEMGKKKCERYWAEPGEMQLEFGPFSVSCEAEKRKSDYIIRTLKVKFNSETRTIYQFHYKNWPDHDVPSSIDPILELIWDVRCYQEDDSVPICIHCSAGCGRTGVICAIDYTWMLLKDGIIPENFSVFSLIREMRTQRPSLVQTQEQYELVYNAVLELFKRQMDVIRDKHSGTESQAKH

96

38 45

Appendix C: Vector sequences The vector sequences are given below: pR4STOP (bp: 6203): The vector for peptide phage display Length: 6203 Legend: Ptac Signal peptide 4 Stop codons P3 P8 GAATTCCCGACACCATCGAATGGTGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGTCAATTCAGGGTGGTGAATGTGAAACCAGTAACGTTATACGATGTCGCAGAGTATGCCGGTGTCTCTTATCAGACCGTTTCCCGCGTGGTGAACCAGGCCAGCCACGTTTCTGCGAAAACGCGGGAAAAAGTGGAAGCGGCGATGGCGGAGCTGAATTACATTCCCAACCGCGTGGCACAACAACTGGCGGGCAAACAGTCGTTGCTGATTGGCGTTGCCACCTCCAGTCTGGCCCTGCACGCGCCGTCGCAAATTGTCGCGGCGATTAAATCTCGCGCCGATCAACTGGGTGCCAGCGTGGTGGTGTCGATGGTAGAACGAAGCGGCGTCGAAGCCTGTAAAGCGGCGGTGCACAATCTTCTCGCGCAACGCGTCAGTGGGCTGATCATTAACTATCCGCTGGATGACCAGGATGCCATTGCTGTGGAAGCTGCCTGCACTAATGTTCCGGCGTTATTTCTTGATGTCTCTGACCAGACACCCATCAACAGTATTATTTTCTCCCATGAAGACGGTACGCGACTGGGCGTGGAGCATCTGGTCGCATTGGGTCACCAGCAAATCGCGCTGTTAGCGGGCCCATTAAGTTCTGTCTCGGCGCGTCTGCGTCTGGCTGGCTGGCATAAATATCTCACTCGCAATCAAATTCAGCCGATAGCGGAACGGGAAGGCGACTGGAGTGCCATGTCCGGTTTTCAACAAACCATGCAAATGCTGAATGAGGGCATCGTTCCCACTGCGATGCTGGTTGCCAACGATCAGATGGCGCTGGGCGCAATGCGCGCCATTACCGAGTCCGGGCTGCGCGTTGGTGCGGATATCTCGGTAGTGGGATACGACGATACCGAAGACAGCTCATGTTATATCCCGCCGTTAACCACCATCAAACAGGATTTTCGCCTGCTGGGGCAAACCAGCGTGGACCGCTTGCTGCAACTCTCTCAGGGCCAGGCGGTGAAGGGCAATCAGCTGTTGCCCGTCTCACTGGTGAAAAGAAAAACCACCCTGGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCATTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACAATTCTCATGTTTGACAGCTTATCATCGACTGCACGGTGCACCAATGCTTCTGGCGTCAGGCAGCCATCGGAAGCTGTGGTATGGCTGTGCAGGTCGTAAATCACTGCATAATTCGTGTCGCTCAAGGCGCACTCCCGTTCTGGATAATGTTTTTTGCGCCGACATCATAACGGTTCTGGCAAATATTCTGAAATGAGCTGTTGACAATTAATCATCGGCTCGTATAATGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCCAGTCCGTTTAGGTGTTTTCACGAGCACTTCACCAACAAGGACCATAGATTATGAAAAAGAATATCGCATTTCTTCTTGCATCTATGTTCGTTTTTTCTATTGCTACAAATGCCTATGCAGCCTCTTCATCTGGCTAATAATGATGAGGTGGAGGATCCGGAGGAGGCGCCGAGGGTGACGATCCCGCAAAAGCGGCCTTTAACTCCCTGCAAGCCTCAGCGACCGAATATATCGGTTATGCGTGGGCGATGGTTGTTGTCATTGTCGGCGCAACTATCGGTATCAAGCTGTTTAAGAAATTCACCTCGAAAGCAAGCTGATAAACCGATACA
ATTAAAGGCTCCTTTTGGAGCCTTTTTTTTTGGAGATTTTCAACGTGAAAAAATTATTATTCGCAATTCCTTTAGTTGTTCCTTTCTATTCTCACTCCGCTGAAACTGTTGAAAGTTGTTTAGCAAAACCCCATACAGAAAATTCATTTACTAACGTCTGGAAAGACGACAAAACTTTAGATCGTTACGCTAACTATGAGGGTTGTCTGTGGAATGCTACAGGCGTTGTAGTTTGTACTGGTGACGAAACTCAGTGTCTAGCTAGAGTGGCGGTGGCTCTGGTTCCGGTGATTTTGATTATGAAAAGATGGCAAACGCTAATAAGGGGGCTATGACCGAAAATGCCGATGAAAACGCGCTACAGTCTGACGCTAAAGGCAAACTTGATTCTGTCGCTACTGATTACGGTGCTGCTATCGATGGTTTCATTGGTGACGTTTCCGGCCTTGCTAATGGTAATGGTGCTACTGGTGATTTTGCTGGCTCTAATTCCCAAATGGCTCAAGTCGGTGACGGTGATAATTCACCTTTAATGAATAATTTCCGTCAATATTTACCTTCCCTCCCTCAATCGGTTGAATGTCGCCCTTTTGTCTTTAGCGCTGGTAAACCATATGAATTTTCTATTGATTGTGACAAAATAAACTTATTCCGTGGTGTCTTTGCGTTTCTTTTATATGTTGCCACCTTTATGTATGTATTTTCTACGTTTGCTAACATACTGCGTAATAAGGAGTCTTAATCATGCCAGTTCTTTTGGCTAGCGCCGCCCTATACCTTGTCTGCCTCCCCGCGTTGCGTCGCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGGATTCACCACTCCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGGCAGAACATATCCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCTGGCCACGGGTGCGCATGATCGTGCTCCTGTCGTTGAGGACCCGGCTAGGCTGGCGGGGTTGCCTTACTGGTTAGCAGAATGAATCACCGATACGCGAGCGAACGTGAAGCGACTGCTGCTGCAAAACGTCTGCGACCTGAGCAACAACATGAATGGTCTTCGGTTTCCGTGTTTCGTAAAGTCTGGAAACGCGGAAGTCAGCGCCCTGCACCATTATGTTCCGGATCTGCATCGCAGGATGCTGCTGGCTACCCTGTGGAACACCTACATCTGTATTAACGAAGCGCTGGCATTGACCCTGAGTGATTTTTCTCTGGTCCCGCCGCATCCATACCGCCAGTTGTTTACCCTCACAACGTTCCAGTAACCGGGCATGTTCATCATCAGTAACCCGTATCGTGAGCATCCTCTCTCGTTTCATCGGTATCATTACCCCCATGAACAGAAATTCCCCCTTACACGGAGGCATCAAGTGACCAAACAGGAAAAAACCGCCCTTAACATGGCCCGCTTTATCAGAAGCCAGACATTAACGCTTCTGGAGAAACTCAACGAGCTGGACGCGGATGAACAGGCAGACATCTGTGAATCGCTTCACGACCACGCTGATGAGCTTTACCGCAGGATCCGGAAATTGTAAACGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGACTCCAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCTATGGCCCACTACGTGAACCATCACCCTAATCAAGTTTTTTGGGGTCGAGGTGCCGTAAAGCACTAAATCGGAACCCTAAAGGGAGCCCCCGATTTAGAGCTTGACGGGGAAAGCCGGCGAACGTGGCGAGAAAGGAAGGGAAGAAAGCGAAAGG
AGCGGGCGCTAGGGCGCTGGCAAGTGTAGCGGTCACGCTGCGCGTAACCACCACACCCGCCGCGCTTAATGCGCCGCTACAGGGCGCGTCCGGATCCTGCCTCGCGCGTTTCGGTGATGACGGTGAAAACCTCTGACACATGCAGCTCCCGGAGACGGTCACAGCTTGTCTGTAAGCGGATGCCGGGAGCAGACAAGCCCGTCAGGGCGCGTCAGCGGGTGTTGGCGGGTGTCGGGGCGCAGCCATGACCCAGTCACGTAGCGATAGCGGAGTGTATACTGGCTTAACTATGCGGCATCAGAGCAGATTGTAC

TGAGAGTGCACCATATGCGGTGTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCATCAGGCGCTCTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTGCAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAACACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCT
GACGTCTAAGAAACCATTATTATCATGACATTAACCTATAAAAATAGGCGTATCACGAGGCCCTTTCGTCTTCAA

Primers for Sequencing:
YH005 (M13 forward) TGT AAA ACG ACG GCC AGT CGA GCA CTT CAC CAA CAA
YH006 (M13 reverse) CAG GAA ACA GCT ATG ACC GAC AAC AAC CAT CGC CCA

pHH0103: The vector for protein expression
Length: 6734
Legend: GST tag; His tag; Thrombin cleavage site; Stop codon; PTac promoter; Protein insertion site

GAATTCCCGACACCATCGAATGGTGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGTCAATTCAGGGTGGTGAATGTGAAACCAGTAACGTTATACGATGTCGCAGAGTATGCCGGTGTCTCTTATCAGACCGTTTCCCGCGTGGTGAACCAGGCCAGCCACGTTTCTGCGAAAACGCGGGAAAAAGTGGAAGCGGCGATGGCGGAGCTGAATTACATTCCCAACCGCGTGGCACAACAACTGGCGGGCAAACAGTCGTTGCTGATTGGCGTTGCCACCTCCAGTCTGGCCCTGCACGCGCCGTCGCAAATTGTCGCGGCGATTAAATCTCGCGCCGATCAACTGGGTGCCAGCGTGGTGGTGTCGATGGTAGAACGAAGCGGCGTCGAAGCCTGTAAAGCGGCGGTGCACAATCTTCTCGCGCAACGCGTCAGTGGGCTGATCATTAACTATCCGCTGGATGACCAGGATGCCATTGCTGTGGAAGCTGCCTGCACTAATGTTCCGGCGTTATTTCTTGATGTCTCTGACCAGACACCCATCAACAGTATTATTTTCTCCCATGAAGACGGTACGCGACTGGGCGTGGAGCATCTGGTCGCATTGGGTCACCAGCAAATCGCGCTGTTAGCGGGCCCATTAAGTTCTGTCTCGGCGCGTCTGCGTCTGGCTGGCTGGCATAAATATCTCACTCGCAATCAAATTCAGCCGATAGCGGAACGGGAAGGCGACTGGAGTGCCATGTCCGGTTTTCAACAAACCATGCAAATGCTGAATGAGGGCATCGTTCCCACTGCGATGCTGGTTGCCAACGATCAGATGGCGCTGGGCGCAATGCGCGCCATTACCGAGTCCGGGCTGCGCGTTGGTGCGGATATCTCGGTAGTGGGATACGACGATACCGAAGACAGCTCATGTTATATCCCGCCGTTAACCACCATCAAACAGGATTTTCGCCTGCTGGGGCAAACCAGCGTGGACCGCTTGCTGCAACTCTCTCAGGGCCAGGCGGTGAAGGGCAATCAGCTGTTGCCCGTCTCACTGGTGAAAAGAAAAACCACCCTGGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCATTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACAATTCTCATGTTTGACAGCTTATCATCGACTGCACGGTGCACCAATGCTTCTGGCGTCAGGCAGCCATCGGAAGCTGTGGTATGGCTGTGCAGGTCGTAAATCACTGCATAATTCGTGTCGCTCAAGGCGCACTCCCGTTCTGGATAATGTTTTTTGCGCCGACATCATAACGGTTCTGGCAAATATTCTGAAATGAGCTGTTGACAATTAATCATCGGCTCGTATAATGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCCAGTCCGTTTAGGTGTTTTCACGAGCACTTCACCAACAAGGACCATAGATTATGAAAATCGAAGAACACCATCACCATCACCATTCCAGCGGTAAGCTTATGTCCCCTATACTAGGTTATTGGAAAATTAAGGGCCTTGTGCAACCCACTCGACTTCTTTTGGAATATCTTGAAGAAAAATATGAAGAGCATTTGTATGAGCGCGAT
GAAGGTGATAAATGGCGAAACAAAAAGTTTG


AATTGGGTTTGGAGTTTCCCAATCTTCCTTATTATATTGATGGTGATGTTAAATTAACACAGTCTATGGCCATCATACGTTATATAGCTGACAAGCACAACATGTTGGGTGGTTGTCCAAAAGAGCGTGCAGAGATTTCAATGCTTGAAGGAGCGGTTTTGGATATTAGATACGGTGTTTCGAGAATTGCATATAGTAAAGACTTTGAAACTCTCAAAGTTGATTTTCTTAGCAAGCTACCTGAAATGCTGAAAATGTTCGAAGATCGTTTATGTCATAAAACATATTTAAATGGTGATCATGTAACCCATCCTGACTTCATGTTGTATGACGCTCTTGATGTTGTTTTATACATGGACCCAATGTGCCTGGATGCGTTCCCAAAATTAGTTTGTTTTAAAAAACGTATTGAAGCTATCCCACAAATTGATAAGTACTTGAAATCCAGCAAGTATATAGCATGGCCTTTGCAGGGCTGGCAAGCCACGTTTGGTGGTGGCGACCATCCTCCAAAATCGGATCTAGAAGTTCTGTTCCAGGGGCCCCTGTCCAGCGGTCTGGTTCCGCGTGGTTCCGGTACCGCGGCCCAGCCGGCCTTTTTTGCGGCCGCATAATAAACCGATACAATTAAAGGCTCCTTTTGGAGCCTTTTTTTTTGGAGATTTTCAACGTGAAAAAATTATTATTCGCAATTCCTTTAGTTGTTCCTTTCTATTCTCACTCCGCTGAAACTGTTGAAAGTTGTTTAGCAAAACCCCATACAGAAAATTCATTTACTAACGTCTGGAAAGACGACAAAACTTTAGATCGTTACGCTAACTATGAGGGTTGTCTGTGGAATGCTACAGGCGTTGTAGTTTGTACTGGTGACGAAACTCAGTGTCTAGCTAGAGTGGCGGTGGCTCTGGTTCCGGTGATTTTGATTATGAAAAGATGGCAAACGCTAATAAGGGGGCTATGACCGAAAATGCCGATGAAAACGCGCTACAGTCTGACGCTAAAGGCAAACTTGATTCTGTCGCTACTGATTACGGTGCTGCTATCGATGGTTTCATTGGTGACGTTTCCGGCCTTGCTAATGGTAATGGTGCTACTGGTGATTTTGCTGGCTCTAATTCCCAAATGGCTCAAGTCGGTGACGGTGATAATTCACCTTTAATGAATAATTTCCGTCAATATTTACCTTCCCTCCCTCAATCGGTTGAATGTCGCCCTTTTGTCTTTAGCGCTGGTAAACCATATGAATTTTCTATTGATTGTGACAAAATAAACTTATTCCGTGGTGTCTTTGCGTTTCTTTTATATGTTGCCACCTTTATGTATGTATTTTCTACGTTTGCTAACATACTGCGTAATAAGGAGTCTTAATCATGCCAGTTCTTTTGGCTAGCGCCGCCCTATACCTTGTCTGCCTCCCCGCGTTGCGTCGCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGGATTCACCACTCCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGGCAGAACATATCCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCTGGCCACGGGTGCGCATGATCGTGCTCCTGTCGTTGAGGACCCGGCTAGGCTGGCGGGGTTGCCTTACTGGTTAGCAGAATGAATCACCGATACGCGAGCGAACGTGAAGCGACTGCTGCTGCAAAACGTCTGCGACCTGAGCAACAACATGAATGGTCTTCGGTTTCCGTGTTTCGTAAAGTCTGGAAACGCGGAAGTCAGCGCCCTGCACCATTATGTTCCGGATCTGCATCGCAGGATGCTGCTGGCTACCCTGTGGAACACCTACATCTGTATTAACGAAGCGCTGGCATTGACCCTGAGTGATTTTTCTCTGGTCCCGCCGCATCCATACCGCCAGTTGTTTACCCTCACAACGTTCCAGTAACCGGG
CATGTTCATCATCAGTAACCCGTATCGTGAGCATCCTCTCTCGTTTCATCGGTATCATTACCCCCATGAACAGAAATTCCCCCTTACACGGAGGCATCAAGTGACCAAACAGGAAAAAACCGCCCTTAACATGGCCCGCTTTATCAGAAGCCAGACATTAACGCTTCTGGAGAAACTCAACGAGCTGGACGCGGATGAACAGGCAGACATCTGTGAATCGCTTCACGACCACGCTGATGAGCTTTACCGCAGGATCCGGAAATTGTAAACGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGACTCCAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCTATGGCCCACTACGTGAACCATCACCCTAATCAAGTTTTTTGGGGTCGAGGTGCCGTAAAGCACTAAATCGGAACCCTAAAGGGAGCCCCCGATTTAGAGCTTGACGGGGAAAGCCGGCGAACGTGGCGAGAAAGGAAGGGAAGAAAGCGAAAGGAGCGGGCGCTAGGGCGCTGGCAAGTGTAGCGGTCACGCTGCGCGTAACCACCACACCCGCCGCGCTTAATGCGCCGCTACAGGGCGCGTCCGGATCCTGCCTCGCGCGTTTCGGTGATGACGGTGAAAACCTCTGACACATGCAGCTCCCGGAGACGGTCACAGCTTGTCTGTAAGCGGATGCCGGGAGCAGACAAGCCCGTCAGGGCGCGTCAGCGGGTGTTGGCGGGTGTCGGGGCGCAGCCATGACCCAGTCACGTAGCGATAGCGGAGTGTATACTGGCTTAACTATGCGGCATCAGAGCAGATTGTACTGAGAGTGCACCATATGCGGTGTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCATCAGGCGCTCTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGT
TCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTGCAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAACACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGACGTCTAAGAAACCATTATTATCATGACATTAACCTATAAAAATAGGCGTATCACGAGGCCCTTTCGTCTTCAA
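
The sequencing primers listed above can be checked against a vector sequence with a simple exact-match search. The following is a minimal sketch in Python, not a method described in this thesis. It assumes that the first 18 nt of each primer is the M13 tag named in its label and that the remaining 18 nt anneal to the vector. The function names `reverse_complement` and `find_sites` are hypothetical, and only perfect matches are reported (no mismatches or ambiguity codes are handled). The test string is a short excerpt of the pHH0103 sequence upstream of the His tag.

```python
# Illustrative sketch only: helper names are hypothetical, not from the thesis.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(template: str, primer: str) -> dict:
    """Return 0-based start positions of exact primer matches.

    'forward' lists positions where the primer sequence itself occurs in the
    template as written; 'reverse' lists positions where the primer's reverse
    complement occurs (i.e. the primer anneals to the written strand).
    Whitespace in either input is ignored.
    """
    template = "".join(template.split()).upper()
    primer = "".join(primer.split()).upper()

    def all_hits(query: str) -> list:
        hits = []
        start = template.find(query)
        while start != -1:
            hits.append(start)
            start = template.find(query, start + 1)
        return hits

    return {
        "forward": all_hits(primer),
        "reverse": all_hits(reverse_complement(primer)),
    }


# Short excerpt of the pHH0103 sequence (region upstream of the His tag).
excerpt = "CAGTCCGTTTAGGTGTTTTCACGAGCACTTCACCAACAAGGACCATAGATTATG"

# Assumed vector-specific 3' portion of YH005 (M13 tag removed).
yh005_3prime = "CGA GCA CTT CAC CAA CAA"

sites = find_sites(excerpt, yh005_3prime)
print(sites)
```

To search a complete vector, the full sequence would be pasted in place of `excerpt`; line breaks inside the pasted sequence are stripped by `find_sites`.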