Proteomic approaches in drug discovery


Drug Discovery Today: Technologies Vol. 3, No. 4 2006

Editors-in-Chief

Kelvin Lam – Pfizer, Inc., USA

Henk Timmerman – Vrije Universiteit, The Netherlands

Techniques for rational design

Timothy D. Veenstra

Laboratory of Proteomics and Analytical Technologies, SAIC-Frederick Inc., NCI-Frederick, P.O. Box B, Frederick, MD 21702, USA

Finding a new drug against a chosen target usually involves high-throughput screening, in which large libraries of chemicals are tested to determine their ability to modify the target. Before a target can be chosen, however, it must first be discovered. The omics era has brought unprecedented abilities to screen cells at the gene, transcript, protein, and metabolite levels in search of novel drug targets. Of these four big classes of biomolecules, proteins remain the principal target of drug discovery. Recent developments in proteomic technologies have brought with them the ability to comparatively screen large numbers of proteins within clinically distinct samples. This capability has enabled non-biased studies whose goal is to discover proteins that may act as suitable diagnostic biomarkers or therapeutic drug targets. Although proteomics technology has brought with it much hope, many challenges remain in leveraging the experimental data into the discovery of novel drug targets.

E-mail address: T.D. Veenstra ([email protected])

1740-6749/$ © 2006 Elsevier Ltd. All rights reserved. DOI: 10.1016/j.ddtec.2006.10.001

Section Editor: Hugo Kubinyi – University of Heidelberg, Heidelberg, Germany

Introduction

Drug discovery can be defined as a research process that identifies and develops a molecule that produces a desired effect in a living organism. Although the human cell is made up of a large number of genes, transcripts, proteins, and metabolites, a drug is most often designed to act upon a protein [1]. On the surface the process may seem straightforward: find a deranged protein that is causing an adverse effect, and then use a molecule to block that effect. There are, however, challenges, both technical and physiological, that make drug discovery a daunting task.

The first challenge is to find the protein target. Although this article will not discuss this issue at length, the initial need, before any instrumental analysis can be implemented, is the selection of suitable samples to be used in the discovery of the target. Fundamentally, the sample set should include materials acquired from patients who are affected by a specific disorder and materials acquired from healthy, matched controls. Human samples will be necessary at some point if a drug is to be approved for human use. Drug discovery can often begin, however, with samples that are much easier to obtain and manipulate, such as cell cultures or a suitable animal model. The efficacy of a drug in a non-human system is often a poor predictor of its efficacy in humans, but factors such as husbandry and genetic background can be controlled in non-human models.

Once a suitable model has been found, the next step is to identify the deranged protein(s) responsible for the adverse condition being studied. It is at this step that technological developments in surveying the protein content of cells, tissues, and organisms have changed the design of drug discovery studies. In the past (and to a large extent currently), protein science was dominated by hypothesis-driven studies in which one protein or a small number of proteins are studied to determine whether they play a role in a particular cell phenotype. Today's technologies allow discovery-driven studies, in which the aim is to gather as much information as possible on as many proteins as possible to determine which proteins contribute to the observed phenotype. As will be discussed later, the ability to gather more information at the protein level would seem to simplify the problem and enable the identification of large numbers of novel drug targets. It has also, however, raised a whole new set of questions that need to be considered and new hurdles that need to be cleared.

Proteomic technologies for the discovery of drug targets

The mention of proteomics most often evokes images of two-dimensional gels and mass spectrometers. If a nonbiased approach is to be taken, the attributes of mass spectrometry (MS) make it arguably the most powerful technology for the discovery of protein drug targets. Both two-dimensional gels and MS play a major role in proteomics, but they are not the only technologies available, or necessary, for the discovery of drug targets (Fig. 1). Successful discovery of drug targets relies on a variety of techniques, such as appropriate sample preparation, fractionation, protein measurement, and bioinformatics. Much of the credit for our current ability to characterize proteomes is a direct result of the development of more powerful mass spectrometers, but the contributions of sample preparation and protein fractionation should not be overlooked. After the clinical sample set has been acquired, the design of the sample preparation steps is probably the most critical factor in determining success or failure. These steps need to be designed according to the level of information available about the possible drug target. For example, if there is evidence, empirical or otherwise, that the target is a membrane receptor, ultracentrifugation should be incorporated into the sample preparation steps to isolate membranes from the samples (if possible). In the case of serum and plasma, it is wise to remove high-abundance proteins, such as albumin and immunoglobulins, because they can interfere with downstream analyses [2]. Unfortunately, in too many cases very little is known about the potential drug target. In this case, the aim is to characterize as many proteins within the sample as possible.

Figure 1. A partial view of various proteomic technologies important in drug discovery.

Figure 2. High-throughput peptide identification using liquid chromatography (LC) coupled on-line with tandem mass spectrometry (MS). The mass spectrometer takes an MS scan and measures the intensity of various peptide ions observed temporally during a separation of a complex peptide mixture (a and b). The most abundant peptide ion (c) is isolated and subjected to collision-induced dissociation (d). The resulting tandem MS spectrum is analyzed by the appropriate software to identify the peptide sequence that would most probably give rise to this fragmentation pattern (e). This peptide sequence is then correlated back to its protein of origin. Modern mass spectrometers conduct steps (b) through (d) in a rapid cyclical fashion, enabling hundreds of peptides within complex mixtures to be identified per hour.

The next decision point is choosing the type of separation best suited to the samples of interest. Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) has been widely used to compare proteomes extracted from comparative samples [3]. In 2D-PAGE, proteins are separated on the basis of their isoelectric point (pI) and molecular mass. After the proteins are stained, spots that are more or less intense in the comparative samples are excised from the gel and identified. Two-dimensional PAGE thus enables the relative abundances of proteins from different samples to be compared within the gel on the basis of protein staining intensity. Mass spectrometry is the tool of choice for protein identification because of its throughput, its sensitivity, and its ability to identify proteins from sequence-related information [4].
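The Python sketch below illustrates the spot-intensity comparison described above. It rests on several assumptions. Spots are taken to have already been detected and matched between gels by image-analysis software. Each spot volume is normalized to the total stained volume of its own gel, which is only one of several normalization schemes in use. The spot identifiers, volumes, and 2-fold cutoff are hypothetical.

```python
from typing import Dict, List, Tuple


def normalize_volumes(spots: Dict[str, float]) -> Dict[str, float]:
    """Express each spot volume as a fraction of the total stained volume on its gel."""
    total = sum(spots.values())
    return {spot_id: volume / total for spot_id, volume in spots.items()}


def differential_spots(gel_a: Dict[str, float], gel_b: Dict[str, float],
                       min_fold: float = 2.0) -> List[Tuple[str, float]]:
    """Return (spot_id, B/A ratio) for matched spots changed by at least min_fold in either direction."""
    norm_a = normalize_volumes(gel_a)
    norm_b = normalize_volumes(gel_b)
    hits = []
    for spot_id in sorted(norm_a.keys() & norm_b.keys()):  # only spots matched on both gels
        ratio = norm_b[spot_id] / norm_a[spot_id]
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            hits.append((spot_id, ratio))
    return hits


# Hypothetical spot volumes (arbitrary units) for a control gel and a disease gel.
control = {"spot_01": 400.0, "spot_02": 300.0, "spot_03": 300.0}
disease = {"spot_01": 400.0, "spot_02": 900.0, "spot_03": 100.0}
print(differential_spots(control, disease))
```

Normalizing before comparing guards against differences in total protein load or staining between gels. Without it, spot_01 would appear unchanged even though its share of the stained protein has fallen.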

Another approach commonly used when the goal is to characterize large numbers of proteins is to bypass 2D-PAGE and analyze the samples directly by MS [5]. The term "MS-based proteomics" is something of a misnomer, because in most studies it is peptides, rather than proteins, that are characterized. In these bottom-up studies, the entire proteome is digested into tryptic peptides, which are introduced into the mass spectrometer for identification. Digesting potentially thousands of proteins yields tens to hundreds of thousands of peptides, so this mixture must be fractionated before MS analysis. The most commonly used prefractionation technique is strong cation exchange (SCX) chromatography followed by reversed-phase liquid chromatography. This combination can be performed either online, using a biphasic column, or offline, with fractions collected from the SCX column. The reversed-phase separation is always performed on-line so that peptides elute directly from this column into the mass spectrometer.
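The bottom-up workflow starts from an in silico view of what trypsin produces. The Python sketch below applies the common simplified cleavage rule: cut after lysine (K) or arginine (R) unless the next residue is proline (P). It ignores missed cleavages and modifications. It then computes each peptide's monoisotopic mass and precursor m/z from standard residue masses. The example protein sequence is hypothetical.

```python
from typing import List

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free N- and C-termini
PROTON = 1.007276  # mass of a proton, added once per positive charge


def tryptic_digest(sequence: str) -> List[str]:
    """Cleave C-terminal to K or R, except before P; no missed cleavages."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide that does not end in K/R
        peptides.append(sequence[start:])
    return peptides


def monoisotopic_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def precursor_mz(peptide: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion that appears in the MS scan."""
    return (monoisotopic_mass(peptide) + charge * PROTON) / charge


protein = "AAKPGGRLLKDDRPEPTIDE"  # hypothetical sequence
for pep in tryptic_digest(protein):
    print(pep, round(monoisotopic_mass(pep), 4), round(precursor_mz(pep, 2), 4))
```

The K-P and R-P junctions in the example are left uncut, which is why "AAKPGGR" and "DDRPEPTIDE" each survive as single peptides.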

The ability of the mass spectrometer to identify proteins rapidly is arguably the attribute that makes this instrumentation the driving force in proteomics today. How exactly does a mass spectrometer identify peptides? As shown in Fig. 2, peptides are constantly eluted from a reversed-phase column into the mass spectrometer (Fig. 2a). During this separation, the instrument records the mass-to-charge (m/z) ratios of the peptides eluting at a specific time point (Fig. 2b). The instrument then selects and isolates the most intense ion observed in that scan (Fig. 2c) and fragments it to create a series of sequence ladders (Fig. 2d), a process referred to as tandem MS. After this fragmentation event, the instrument isolates and fragments the next most abundant peptide ion. It repeats this sequential ion selection and fragmentation for anywhere from the 3 to 10 most abundant peptide ions, depending on the operator setting. Today's mass spectrometers can collect approximately 7000 tandem mass spectra in a single hour. All of these spectra are then analyzed using the appropriate software and a protein or genome database to identify the peptides that gave rise to the individual spectra (Fig. 2e). In a typical analysis, 10–20% of the spectra will give a hit, allowing between 700 and 1400 peptides to be identified confidently and then correlated to their proteins of origin.
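The Python sketch below makes the data-dependent selection step and the identification arithmetic concrete. It is a simplification. The survey scan is represented as hypothetical (m/z, intensity) pairs, and precursors are ranked by intensity alone. Refinements used on real instruments, such as dynamic exclusion and charge-state screening, are ignored. The final lines reproduce the estimate in the text: 10–20% of roughly 7000 tandem spectra per hour gives 700–1400 identifications.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def select_precursors(survey_scan: List[Peak], top_n: int) -> List[float]:
    """Return the m/z values of the top_n most intense ions, in the order they would be fragmented."""
    ranked = sorted(survey_scan, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _intensity in ranked[:top_n]]


def expected_identifications(spectra_per_hour: int, hit_rate: float) -> int:
    """Number of tandem spectra per hour expected to yield a confident peptide match."""
    return round(spectra_per_hour * hit_rate)


# Hypothetical survey scan taken at one time point of the reversed-phase gradient.
survey = [(400.69, 2.0e6), (523.28, 8.5e6), (610.31, 1.2e6), (745.40, 5.1e6), (812.44, 3.3e6)]
print(select_precursors(survey, top_n=3))  # top_n is the operator setting (3 to 10 in the text)

# Throughput figures quoted in the text.
for rate in (0.10, 0.20):
    print(f"{rate:.0%} hit rate -> {expected_identifications(7000, rate)} identifications per hour")
```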

Detection of changes in protein abundance

There are many attributes that can make a protein a potential drug target. Protein phosphorylation, which controls many aspects of cell physiology, is an important target for drug design. For example, Gleevec works by targeting BCR-Abl, a constitutively active tyrosine kinase, thereby shutting off the uncontrolled cell growth associated with the mutated gene product. G-protein-coupled receptors have historically been the most important group of drug targets. Not surprisingly, protein kinases are now the second most important class of drug target [6]. Other protein modifications, such as prenylation [7], methylation [8], and sulfation [9], have also been explored as potential drug targets. MS-based proteomics can detect such modifications, but outside of phosphorylation the science is not mature enough to identify such changes on the scale, and with the reliability, needed for drug target discovery. The major focus in MS-based proteomics is therefore to identify changes in the relative abundances of proteins between comparative sample sets.

Figure 3. Quantitative methods for comparative proteomics. (a) Stable isotope labeling and (b) subtractive proteomics.

As mentioned above, 2D-PAGE provides a direct measurement of changes in relative protein abundance through protein staining intensity. In non-gel-based approaches, other means must be used to identify the proteins that are differentially abundant between sample cohorts. There are essentially two main strategies for measuring a protein's relative abundance in different proteome samples: differential stable isotope labeling [10] and subtractive proteomics (Fig. 3) [11]. Differential stable isotope labeling is used in many different ways in quantitative proteomics, but all of these methods share the same basic premise. Amino acids within one proteome are labeled with a light isotope of a common element (e.g. 12C, 14N), and the other proteome is labeled with the matching heavy isotope (e.g. 13C, 15N). This labeling can be done either chemically (e.g. with isotope-coded affinity tags or iTRAQ) or metabolically (e.g. by culturing cells in medium enriched in a particular heavy stable isotope). The sample processing steps differ subtly depending on the labeling approach used. In either case, however, the differentially labeled proteome samples are combined and digested into tryptic peptides. The peptide mixture is then analyzed by multidimensional chromatography coupled directly on-line with data-dependent tandem MS, as shown in Fig. 2. The relative abundance of each peptide in the different samples is measured in the MS scan (or, for isobaric tags such as iTRAQ, from reporter ions in the MS/MS spectrum), and MS/MS is used for identification. The result is a list of the relative abundances of proteins among the samples being compared. The hope is that a protein showing an abundance difference between two (or among more) sets of samples will be an intriguing drug target candidate that can be advanced to further validation and, eventually, clinical development.
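For the two-channel (light/heavy) case, the quantitative readout reduces to a ratio of paired peak intensities. The Python sketch below assumes that peptide-level light and heavy intensities have already been extracted from the MS scans and assigned to proteins; the protein names and values are hypothetical. Each protein is summarized by the median of its peptide log2 (heavy/light) ratios, which is a common choice but not the only one. The sketch also computes the mass offset separating a light/heavy peptide pair, using, as an illustrative assumption, a label that contributes six 13C atoms. Isobaric reporter-ion quantitation (iTRAQ) is not covered.

```python
import math
from statistics import median
from typing import Dict, List, Tuple

C13_SHIFT = 1.0033548  # mass difference 13C - 12C (Da)
N15_SHIFT = 0.9970349  # mass difference 15N - 14N (Da)


def heavy_offset(n_c13: int = 0, n_n15: int = 0) -> float:
    """Mass spacing (Da) between the light and heavy forms of a labeled peptide."""
    return n_c13 * C13_SHIFT + n_n15 * N15_SHIFT


def protein_log2_ratios(peptides: Dict[str, List[Tuple[float, float]]]) -> Dict[str, float]:
    """peptides: {protein: [(light_intensity, heavy_intensity), ...]}.
    Returns the median log2(heavy/light) ratio per protein, skipping pairs with a missing channel."""
    result = {}
    for protein, pairs in peptides.items():
        log_ratios = [math.log2(heavy / light) for light, heavy in pairs if light > 0 and heavy > 0]
        if log_ratios:
            result[protein] = median(log_ratios)
    return result


# Hypothetical light/heavy peak intensities for peptides of two proteins.
measurements = {
    "protein_A": [(1.0e6, 4.0e6), (2.0e5, 8.0e5), (5.0e5, 2.0e6)],
    "protein_B": [(3.0e6, 3.0e6), (1.0e6, 0.9e6)],
}
ratios = protein_log2_ratios(measurements)
print(ratios)  # protein_A is 4-fold higher in the heavy-labeled sample (log2 = 2)
print(round(heavy_offset(n_c13=6), 4))
```

Working in log2 space makes increases and decreases symmetric, and the median limits the influence of a single poorly measured peptide.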

Although stable isotope labeling methods enable the quantitation of thousands of proteins in complex clinical samples, they are low throughput, requiring days to compare even a couple of samples. They are generally limited to the comparison of no more than four samples, and metabolic stable isotope labeling methods cannot be applied to human samples. Stable isotope labeling methods have made a major impact on the analysis of cellular and tissue proteomes, but they have not been widely used in the study of biofluids. The reasons for this are not readily obvious. One possibility is that the domination of serum and plasma by a few highly abundant proteins impairs the chemical modification of lower-abundance proteins.

Subtractive proteomic approaches have recently been developed to simplify the analysis of clinically important samples and to increase its throughput [11]. These methods rely on neither gels nor stable isotopes. Instead, they quantitate proteins on the basis of the number of peptides identified for each species (Fig. 3b). In this method, proteomes are extracted from a series of biological samples and digested into tryptic peptides. Each peptide mixture is then analyzed separately by multidimensional chromatography coupled directly on-line with a mass spectrometer operating in data-dependent tandem MS mode (Fig. 2). The relative abundance of each protein across a set of samples is determined by the number of peptides identified for that protein.
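The core bookkeeping of this approach is counting, for every protein, how many distinct peptides were identified in each sample. The Python sketch below assumes that database-search results are available as (sample, protein, peptide) triples; the protein names and peptide sequences are hypothetical. It counts unique peptide sequences, as described here. Some related label-free methods count spectra instead.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def unique_peptide_counts(identifications: Iterable[Tuple[str, str, str]]) -> Dict[str, Dict[str, int]]:
    """Count distinct peptide sequences per protein in each sample.

    identifications: (sample, protein, peptide) triples from database searching;
    repeated identifications of the same peptide in one sample are counted once.
    """
    seen: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for sample, protein, peptide in identifications:
        seen[protein][sample].add(peptide)
    return {
        protein: {sample: len(peptides) for sample, peptides in per_sample.items()}
        for protein, per_sample in seen.items()
    }


# Hypothetical search results for two samples.
ids = [
    ("control", "protein_X", "AGLK"),
    ("control", "protein_X", "AGLK"),  # same peptide identified twice, counted once
    ("control", "protein_X", "VTEFR"),
    ("disease", "protein_X", "AGLK"),
    ("disease", "protein_X", "VTEFR"),
    ("disease", "protein_X", "DDLSK"),
    ("disease", "protein_Y", "NQWLR"),
]
print(unique_peptide_counts(ids))
```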

This quantitative method is based on the observation that the number of unique peptides identified for a protein is related to its abundance in the mixture. For example, albumin, which is present at approximately 60–80 mg/mL, is consistently detected by large numbers of peptides (i.e. >20) in the MS analysis of serum. Lower-abundance proteins such as cytokines, by contrast, are generally identified by one or two peptides [2]. This difference reflects the concentration of albumin (approximately 60 mg/mL) compared with that of cytokines (in the ng/mL range). The subtractive approach is attractive for screening changes in protein abundance across many samples for two reasons: its inherent simplicity, and the fact that an unlimited number of samples can be intercompared. Stable isotope labeling methods, by contrast, have in practice been limited to two-way (e.g. ICAT) or four-way (e.g. iTRAQ) comparisons. Like most techniques, however, the subtractive approach has disadvantages. It is relatively low throughput: each sample would take a minimum of one day of data acquisition, even if the whole process were automated. Its quantitation is imprecise compared with stable isotope labeling methods, so changes of less than threefold cannot be determined with high confidence. Low-abundance proteins, although detectable, may not provide enough unique peptide identifications to be quantitated using this method.
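The two limitations just described, imprecision below about threefold and too few peptides for low-abundance proteins, translate naturally into filters on per-protein peptide counts. The Python sketch below shows one hypothetical way to apply them. The minimum of three peptides and the pseudocount of 1 are illustrative assumptions, not values from the original studies. The pseudocount keeps a protein seen in only one sample from causing a division by zero.

```python
from typing import Dict, List, Tuple


def flag_abundance_changes(counts: Dict[str, Dict[str, int]], sample_a: str, sample_b: str,
                           min_fold: float = 3.0, min_peptides: int = 3) -> List[Tuple[str, float]]:
    """Return (protein, fold change b/a) for proteins that pass both filters."""
    flagged = []
    for protein, per_sample in sorted(counts.items()):
        a = per_sample.get(sample_a, 0)
        b = per_sample.get(sample_b, 0)
        if max(a, b) < min_peptides:
            continue  # too few peptides in either sample to quantitate by counting
        fold = (b + 1) / (a + 1)  # pseudocount of 1 avoids division by zero
        if fold >= min_fold or fold <= 1 / min_fold:
            flagged.append((protein, fold))
    return flagged


# Hypothetical unique-peptide counts per protein.
counts = {
    "protein_A": {"control": 2, "disease": 11},
    "protein_B": {"control": 25, "disease": 24},
    "protein_C": {"disease": 2},
    "protein_D": {"control": 9, "disease": 2},
}
print(flag_abundance_changes(counts, "control", "disease"))
```

The pseudocount also shrinks large ratios toward 1 (protein_A's raw 5.5-fold becomes 4.0), making the filter more conservative for proteins with few peptides.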

Challenges in drug target discovery

MS-based methods can routinely detect hundreds of differences between biological samples, but this ability is both a blessing and a curse. The blessing is the ability to detect so many differences. The curse is trying to determine which differences are most important and most likely to survive downstream pre-clinical validation. Many differences, such as those in inflammatory or acute-phase response proteins, can obviously be ruled out as potential drug targets, but determining the best candidates is still a difficult chore. One method now routinely used is to compare changes in the proteome with those observed in an mRNA array. Unfortunately, numerous studies have now shown that the correlation between the amount of a protein and the abundance of its transcript is poor. For example, our laboratory compared changes in the abundances of proteins and their transcripts during osteoblast differentiation and found an abysmal correlation of 0.09 [12]. There are many potential reasons for this lack of correlation, ranging from post-transcriptional processing events to temporal differences between mRNA and protein expression. We therefore re-compared the data by binning proteins and their transcripts into functional pathways and comparing the correlation between these groups. As shown in Table 1, a series of functional pathways, including cell cycle regulation and apoptosis induction, showed significant correlation. This comparison allows potential drug targets to be localized within specific functional pathways, whose individual proteins can then be examined in hypothesis-driven studies.
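The Python sketch below illustrates the contrast between individual-level and pathway-level correlation. It assumes that each protein/transcript pair is described by a protein change and an mRNA change (for example, log2 fold changes) and is assigned to one functional pathway. It computes the Pearson correlation over all pairs and then within each pathway. This is one possible reading of the binning procedure, not necessarily the exact method of [12]. The values are hypothetical, chosen so that the pooled correlation is near zero while each pathway correlates perfectly, and p-values are not computed.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pathway_correlations(records: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """records: (pathway, protein_change, mrna_change). Returns Pearson r within each pathway."""
    grouped: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for pathway, protein_change, mrna_change in records:
        grouped[pathway][0].append(protein_change)
        grouped[pathway][1].append(mrna_change)
    return {pathway: pearson(prot, mrna) for pathway, (prot, mrna) in grouped.items()}


# Hypothetical protein and mRNA changes for six pairs in two pathways.
data = [
    ("pathway_1", 1.0, 7.0), ("pathway_1", 2.0, 8.0), ("pathway_1", 3.0, 9.0),
    ("pathway_2", 4.0, 6.0), ("pathway_2", 5.0, 7.0), ("pathway_2", 6.0, 8.0),
]
overall = pearson([p for _, p, _ in data], [m for _, _, m in data])
print(f"all pairs: r = {overall:.2f}")
for name, r in pathway_correlations(data).items():
    print(f"{name}: r = {r:.2f}")
```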

Table 1. Pearson correlation values comparing overall functional pathways of proteins and their transcripts during osteoblast differentiation

BioCarta pathway                    Pearson correlation (protein vs. mRNA)    P-value
GO pathway                          0.501                                     0.047
Cell Cycle                          0.829                                     0.048
Integrin-mediated signaling         0.763                                     0.046
G-protein-coupled receptor          0.963                                     0.046
Induction of apoptosis              0.963                                     0.050
Mitosis                             0.825                                     0.050
Rho protein signal transduction     0.831                                     0.049

Although poor correlation was observed at the individual protein/transcript level, good correlation was observed when the overall abundance changes within functional pathways were compared [12].

Let us assume that global screening has yielded potential drug targets. It is at this point that many of the other technologies highlighted in Fig. 1, such as structural proteomics and binding measurements, become relevant. The standard approach of high-throughput screening of combinatorial compound libraries against the proposed target will obviously play a critical role. It is also advantageous, however, to have a purified version of the drug target available to determine its binding characteristics. The throughput of protein structure determination has increased tremendously in the last few years, as automated methods have been developed for finding optimal conditions for expressing recombinant proteins in different cell types [13]. Automation has also improved the ability to purify expressed proteins. More powerful X-ray beams and higher-field nuclear magnetic resonance spectrometers, along with better software and faster hardware, have increased the rate at which protein structures can be solved [14,15]. Knowledge of a drug target's structure can be used to determine whether it is homologous to any other class of protein. This homology mapping can aid in either the selection or the design of an appropriate drug to inhibit the protein's activity.

Figure 4. Proteomic technologies in the discovery of a biomarker and possible drug target for interstitial cystitis (IC). A series of chromatography steps was performed, and fractions were carried forward to the next step based on their activity in a cell-based assay. The antiproliferative factor (APF) was identified by tandem mass spectrometry (MS) of a simplified fraction that still retained the desired activity. A biotinylated version of APF was synthesized and coupled to an avidin column to serve as bait to isolate its receptor. The receptor was identified, and validated, as CKAP4 by MS and Western blotting.

Application of proteomics to the discovery of an anti-interstitial cystitis drug target

Although few drug targets have been identified in the academic proteomics world, there have been successes. In our own laboratory, we have been working over the past few years on interstitial cystitis (IC), a chronic and painful bladder disorder characterized by thinning of the bladder epithelial lining. Our initial interest in IC was the discovery of a diagnostic biomarker. It had been shown that urine from these patients contained a factor, named antiproliferative factor (APF), that inhibited bladder epithelial cell growth in vitro. Using a series of separation methods and testing each fraction for growth inhibition, we isolated an active molecule. Tandem MS identified it as a sialoglycopeptide made up of a three-moiety sugar group bound to a nine-residue hydrophobic peptide, as shown in Fig. 4 [16]. On the basis of the structure of APF, we hypothesized that it exerted its effects on the bladder epithelial lining by binding to a membrane receptor. To find this receptor, a biotinylated form of APF was synthesized and coupled to an avidin column. A membrane preparation from explanted bladder epithelial cells from IC patients was solubilized and passed over the column. The column was equilibrated, and bound material was eluted using solutions of increasing salt concentration. Each of these fractions was then analyzed by SDS-PAGE. Two faint bands were detected on a silver-stained gel of the eluate collected at the highest salt concentration. These two bands were identified as CKAP4, a single-pass membrane receptor, and vimentin [17]. Reducing CKAP4 expression in bladder epithelial cells by siRNA diminished the growth-inhibitory effects of APF on these cells. Incubating epithelial cells with an anti-CKAP4 antibody also prevented the growth-inhibitory effects of APF. These results suggest that CKAP4 may be a druggable target for treating patients with IC.

Although this project demonstrates the use of proteomics technology for finding a possible drug target, careful analysis shows that many technologies beyond MS were critical to the discovery. For instance, a significant amount of chromatography was needed to simplify the final mixture enough for APF to be recognized, and a cell-based assay was critical for screening for the desired activity. Sample preparation, in the form of subcellular fractionation to obtain a membrane preparation, was instrumental in identifying CKAP4 as a receptor for APF and a potential druggable target. Finally, functional studies blocking CKAP4 activity in the presence of APF were critical to establishing a link between APF and CKAP4. MS will continue to play a key role, but including other technological assays will improve the chances of finding clinically valid protein drug targets in the future.

Conclusion

The scientific community is able to survey proteins like never before. The two most pressing needs for this type of technology are to find more effective biomarkers for disease detection and to discover proteins to which therapeutic drugs can be targeted. One sentiment often expressed in the MS community is that, if we had more sensitive instruments, we could do better at identifying biomarkers or drug targets. Frankly, I disagree with this thinking. We can now identify orders of magnitude more proteins than we could just ten years ago, and in a fraction of the time. Unfortunately, this capability has produced too many studies that rely too heavily on MS for the discovery of drug targets. One hurdle that must be overcome is finding ways to complement high-throughput MS data with other types of studies. Such studies would narrow the possible targets found in a global screen down to those most likely to pass future clinical trials.

Acknowledgements

This project has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under Contract NO1-CO-12400. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the United States Government.

References

1 Hofstadler, S.A. and Sannes-Lowery, K.A. (2006) Application of ESI-MS in drug discovery: interrogation of noncovalent complexes. Nat. Rev. Drug Discov. 5, 585–595

2 Conrads, T.P. et al. (2006) Sampling and analytical strategies for biomarker discovery using mass spectrometry. Biotechniques 40, 799–805

3 Pietrogrande, M.C. et al. (2006) Decoding 2D-PAGE complex maps: relevance to proteomics. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. 833, 51–62

4 Domon, B. and Aebersold, R.A. (2006) Mass spectrometry and protein analysis. Science 312, 212–217

5 Liu, H. et al. (2002) Multidimensional separations for protein/peptide analysis in the post-genomic era. Biotechniques 32, 898–902

6 Cohen, P. (2002) Protein kinases – the major drug targets of the 21st century? Nat. Rev. Drug Discov. 1, 309–315

7 Glenn, J.S. (2006) Prenylation of HDAg and antiviral drug development. Curr. Top. Microbiol. Immunol. 307, 133–149

8 Abbosh, P.H. et al. (2006) Dominant-negative histone H3 lysine 27 mutant derepresses silenced tumor suppressor genes and reverses the drug resistant phenotype in cancer cells. Cancer Res. 66, 5582–5591

9 Farzan, M. et al. (1999) Tyrosine sulfation of the amino terminus of CCR5 facilitates HIV-1 entry. Cell 96, 667–676

10 Aggarwal, K. et al. (2006) Shotgun proteomics using the iTRAQ isobaric tags. Brief. Funct. Genomic. Proteomic. 5, 112–120

11 Oh, P. et al. (2004) Subtractive proteomic mapping of the endothelial surface in lung and solid tumour for tissue-specific therapy. Nature 429, 629–635

12 Conrads, K.A. et al. (2005) A combined proteome and microarray investigation of inorganic phosphate-induced pre-osteoblast cells. Mol. Cell. Proteomics 4, 1284–1296

13 Vinarov, D.A. and Markley, J.L. (2005) High-throughput automated platform for nuclear magnetic resonance-based structure proteomics. Expert Rev. Proteomics 2, 49–55

14 Scapin, G. (2006) Structural biology and drug discovery. Curr. Pharm. Des. 12, 2087–2097

15 Tugarinov, V. et al. (2004) Nuclear magnetic resonance spectroscopy of high-molecular-weight proteins. Annu. Rev. Biochem. 73, 107–146

16 Keay, S. et al. (2004) An antiproliferative factor from interstitial cystitis patients is a frizzled 8 protein-related sialoglycopeptide. Proc. Natl. Acad. Sci. U S A 101, 11803–11808

17 Conrads, T.P. et al. CKAP4/p63 is a receptor for the frizzled-8 protein-related antiproliferative factor from interstitial cystitis patients. J. Biol. Chem. (in press)
