Chapter 4: Chemogenomics
Contents of chapter 4
Summary of chapter 4
Introduction
  Chemogenomic experiments
    Heterozygous deletions
    Homozygous deletions
    Overexpression
    Beyond deletion and overexpression libraries
  Detection methods
    Non-competitive arrays
    Competitive mutant pools
  Data interpretation and analysis
    Clustering
    Matrix operations
  Applications
Chemogenomic networks
  Results and discussion
    Interpretation of a chemogenomic network
    Properties of the chemogenomic network
  Materials and methods
    Data acquisition
    Construction of the chemogenomic network
    Computation of network parameters and overlap with other networks
Linking the chemical and biological properties of small molecules
  Results and discussion
    What differentiates bioactive small molecules from others?
    What differentiates small molecules that hit S. cerevisiae from those that hit Sc. pombe?
    Conclusions
  Materials and Methods
References
Summary of chapter 4
A robust knowledge of the interactions between small molecules and specific
proteins aids the development of new biotechnological tools, the identification of drug
targets, and the gain of specific biological insights. Such knowledge can be obtained
through chemogenomic screens. In these screens, each small molecule from a
chemical library is applied to each cell type from a library of cells, and the resulting
phenotypes are recorded. Chemogenomic screens have recently become very
common and will continue to generate large amounts of data. Methods to
investigate these data are therefore becoming important. In this chapter, I develop computational
methods for the analysis of chemogenomic data. For this, I focus on the fungus
Saccharomyces cerevisiae, and, to a lesser extent, Schizosaccharomyces pombe.
Parts of this chapter appear in the following articles:
Wuster, A. and M. Madan Babu (2008). “Chemogenomics and biotechnology.”
Trends Biotechnol 26(5): 252-8.
Wuster, A. (2008). “Networking with drugs.” Mol BioSyst 4(1): 14-7.
The section on linking the chemical and biological properties of SMs would not have
been possible without the help of Laura Schuresko-Kapitzky and Nevan Krogan of
the University of California, San Francisco, who generated much of the data this
section is based on.
Introduction
The effects of small molecules (SMs) on cells were central to the research of Paul
Ehrlich (1854-1915; figure 4-1). For much of his career he strove to identify 'magic
bullets', SMs that would allow him to target specific tissues or microbes whilst sparing
others (Ehrlich 1911). In order to find a cure for syphilis, he systematically screened a
library of hundreds of SMs for their effect against Treponema pallidum, the causative
agent of the disease. After testing 605 different compounds, he eventually identified
arsphenamine, which he later marketed as Salvarsan 606 (Gensini, Conti et al. 2007).
With this strategy, he initiated a whole new approach to drug discovery which has
persisted until today (Lipinski and Hopkins 2004). The aim of Paul Ehrlich's chemical
screen was to find an SM that was active against a single pathogen. The data
structure created by his screen was one-dimensional and can be encoded by a single
vector of the 606 compounds tested. Nowadays, chemical screens are more complex
and wider in scope. In this chapter, I mostly work with the data obtained through two-
dimensional chemogenomic screens. As in Paul Ehrlich's screen, the first dimension
in such screens is a chemical library. The second dimension is a library of different
cell types. For the data I work with in this chapter, the cell types are well-defined
mutants in a library of S. cerevisiae deletion strains, where in each strain a different
gene has been deleted. Alternatively, the cell types could be defined in other ways,
such as in a library of cancer cell lines or of meiotic recombinants (Perlstein, Deeds
et al. 2007; Perlstein, Ruderfer et al. 2007). The resulting data structure is a two-
dimensional matrix where each data point has two coordinates and one specific
associated value. The chemical coordinate specifies the SM that was applied, whilst
the genetic coordinate specifies the cell type. The value of each data point is a
measurement of the phenotype of interest, such as viability, growth rate, or cell size
and shape. The results of a large number of chemogenomic screens with different
designs have been published (table 4-B). They all have in common the data structure
described above, but the experimental designs and aims, such as the identification of
cellular targets of SMs, or the characterisation of cellular pathways, vary between
them.
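The data structure described above can be sketched in a few lines of code. This is only a minimal illustration: the strain names, SM names, and growth values are invented, and a nested list stands in for whatever matrix representation a real analysis would use.

```python
# Hypothetical toy data: all names and values below are invented for illustration.
strains = ["delta_gene_1", "delta_gene_2", "delta_gene_3"]  # genetic coordinate
small_molecules = ["SM_1", "SM_2"]                          # chemical coordinate

# Rows follow the genetic coordinate, columns the chemical coordinate.
# Each value is a measured phenotype, here a relative growth rate.
growth = [
    [1.00, 0.95],
    [0.40, 0.98],
    [0.97, 0.35],
]

def phenotype(strain, sm):
    """Look up the value stored at one (genetic, chemical) coordinate pair."""
    return growth[strains.index(strain)][small_molecules.index(sm)]

def chemical_profile(sm):
    """Return one column: the effect of a single SM on every strain."""
    col = small_molecules.index(sm)
    return {strain: row[col] for strain, row in zip(strains, growth)}

def genetic_profile(strain):
    """Return one row: the response of a single strain to every SM."""
    row = growth[strains.index(strain)]
    return dict(zip(small_molecules, row))
```

A one-dimensional screen such as Ehrlich's corresponds to a single column of this matrix, that is, to one call of `chemical_profile` with one cell type fixed.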
In this chapter, I describe the different ways in which chemogenomic data can be
created, and then introduce the methods that I have developed to extract information
from this data. For this purpose, I have divided the results of this chapter into two
parts. The first part, on chemogenomic networks, takes a gene-centric view, whilst
the second part, on linking the chemical and biological properties of SMs, takes an
SM-centric view. I have only tested the methods in this chapter on S. cerevisiae, and,
to a lesser extent, on Sc. pombe. Nevertheless, they can potentially also be applied
to other systems, such as humans.
Figure 4-1. Paul Ehrlich (1854-1915) on a former D-Mark banknote. The molecular structure to the left of the portrait symbolises arsphenamine. The six-membered ring in the centre consists of arsenic atoms. It was not shown until 16 years after the issue of the banknote that arsphenamine is actually a mixture of compounds with three- and five-membered arsenic rings (Lloyd, Morgan et al. 2005). Source of the image: http://tinyurl.com/l3r9np
Chemogenomic experiments
An in-depth understanding of the subtly different ways in which chemogenomic
experiments are designed is necessary in order to develop the appropriate methods
to extract meaningful information from them. Although for all methods of carrying out
chemogenomic screens the resulting data structure is similar, its interpretation
depends on the design of the experiment. For yeast, at least three different
types of mutant libraries can be generated: heterozygous deletions,
homozygous deletions, and overexpression libraries (table 4-1).
Screen design (columns): A. Homozygous deletants; B. Heterozygous deletants; C. Overexpression
Detection (rows): non-competitive arrays; competitive barcoded pools

Non-competitive arrays
  (Dudley, Janse et al. 2005): 21 conditions; 4’710 mutants; aim: determine gene function
  (Baetz, McHardy et al. 2004): 1 SM; > 5000 mutants; aim: determine mode of action for drugs
  (Chang, Bellaoui et al. 2002): 1 SM; 5’000 mutants; aim: determine genes with certain function
  (Luesch, Wu et al. 2005): 1’000 SMs; 7’296 mutants; aim: identify protein targets for SMs

Competitive barcoded pools
  (Hillenmeyer, Fung et al. 2008): 178 SMs and conditions; 5’000 mutants; aim: determine gene function
  (Parsons, Lopez et al. 2006): 82 SMs; 4’111 mutants; aim: determine mode of action for drugs
  (Giaever, Flaherty et al. 2004): 10 SMs; 6’000 mutants; aim: determine functional interactions of SMs
  (Hillenmeyer, Fung et al. 2008): 354 SMs and conditions; 6’000 mutants; aim: determine gene function
  (Butcher, Bhullar et al. 2006): 2 SMs; 3’929 mutants; aim: identify protein targets for SMs
Table 4-1. This table shows examples of chemogenomic screens carried out in the recent past. The columns correspond to three different types of mutant libraries (screen design). Expression of a gene product can be prevented entirely by deleting the gene on both chromosomes (homozygous deletion; empty red arrows), or expression can be reduced by deleting the gene on only one chromosome (heterozygous deletion; filled and empty red arrows), or expression can be increased by introducing extra copies of the gene into the cell (overexpression; multiple red arrows). The rows of the table correspond to two different detection methods that can be used to observe the phenotype of the mutants when treated with SMs. The SM can be added to a well plate or agar plate on which each mutant occupies a separate well or spot. The effect is then observed directly (non-competitive arrays). Alternatively, the barcoded mutants can all be placed into a flask containing an SM. The effect on the growth of each individual mutant is then observed by measuring the abundance of the different mutants using microarrays whose probes are complementary to the barcodes of the mutants. The intensities of specific probes are then thought to be proportional to the abundance of the mutant strains in the pool (competitive barcoded pools).
In the following sections I discuss how each of these library types can be used to
generate chemogenomic data. For each mutant library, the measured phenotype
refers to changes in growth rate as a proxy for viability or fitness. The measured
phenotypes can be caused by a variety of molecular interactions between proteins
and SMs, whose interpretation depends on the choice of experimental setup.
Heterozygous deletions
This library type consists of diploid yeast cells that have two copies of each
chromosome. Almost every single one of the 6’500 yeast genes can be deleted
independently, as long as the deletion is only on one of the two chromosomes of a
chromosome pair (heterozygous) (Deutschbauer, Jaramillo et al. 2005). In such a
library, even cells with a deletion in an essential gene will normally survive as they
still have another copy of the gene on the second chromosome. Data from a
heterozygous deletion library and a homozygous deletion library, which is described
in the next section, need to be interpreted in different ways (Baetz, McHardy et al.
2004; Giaever, Flaherty et al. 2004; Parsons, Lopez et al. 2006). The rationale
behind screening against a heterozygous deletion library is that a cell that has only
one copy of a particular gene produces less of the gene product. Such a cell is
rendered more sensitive to an SM that is capable of disrupting the function of this
gene product than a wildtype cell. This effect is also referred to as haploinsufficiency,
and the method has also been described as haploinsufficiency profiling (Giaever,
Flaherty et al. 2004). Data points obtained from such a screen that show decreased
viability might therefore directly point to an interaction between a chemical compound
and a gene product.
Homozygous deletions
This type of library consists of yeast cells in which each non-essential gene is deleted
from both chromosomes of a diploid chromosome pair. For obvious reasons, in such
a homozygous deletion library only non-essential genes can be deleted as the
deletion of any of the approximately 1000 essential yeast genes will by definition be
lethal (Deutschbauer, Jaramillo et al. 2005). The interpretation of such a
chemogenomics screen is more complex for a homozygous deletion library than for a
heterozygous deletion library. Clearly, an SM cannot act on a gene product after the
respective gene has been knocked out. Therefore, if a homozygous deletion strain is
sensitive to a particular SM, this can be interpreted in different ways. One possibility
is that the deleted gene may be different from the gene targeted by the SM but has a
similar function. In the absence of SMs, disruption of one of the two genes is
expected to have little effect as the second gene provides a back-up function.
However, the disruption of the function of both genes, the one by the SM and the
other by deletion, might result in decreased viability. This has been verified in a study
which compared the effect of knocking out one gene and targeting the other with an
SM with the effect of knocking out both genes, a method which is also referred to as
a synthetic genetic array (SGA) (Parsons, Brost et al. 2004). This study considered
the ergosterol biosynthesis gene ERG11, a target of the SM fluconazole, as an example. There
are 27 genes which cause synthetic lethality when mutated together with ERG11. Of
these 27 mutants, 13 are also lethal when ERG11 is intact but
fluconazole is added. Comparing the lethality profile of other SMs to the synthetic
lethal profiles of their targets, it was found that SGA profiles generally tend to have a
significant overlap with the chemogenomic profiles (Parsons, Brost et al. 2004). This
shows that the effect of deleting ERG11 is similar to the effect of disrupting its
function with fluconazole. In an approach which is complementary to the
homozygous deletion library described above, it would be possible to screen for
deletions which confer resistance to a compound whose addition would be
deleterious to the wildtype strain. An example of this would be if a compound which is
not normally toxic to the cell becomes toxic due to the action of a cellular enzyme
modifying it. In this case, deletion of the genes encoding the modifying enzyme would
confer resistance on the cell. For example, azidothymidine becomes active against
DNA polymerases and reverse transcriptases only after it has undergone several
rounds of phosphorylation (Brenner 2004). Mutating the genes responsible for this
phosphorylation could potentially decrease the sensitivity of the cell towards
azidothymidine. In this way, genes that are responsible for the side effects of
certain drugs could be identified.
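The comparison between a synthetic lethal profile and a chemogenomic profile, as in the ERG11 and fluconazole example above, amounts to a set overlap. The sketch below reproduces only the counts quoted in the text (27 synthetic lethal partners, 13 of them shared); the gene identifiers are placeholders, and the two extra SM-specific hits are invented to show that the two profiles need not be nested.

```python
def profile_overlap(sga_hits, chemical_hits):
    """Return the shared genes and the fraction of SGA hits recovered by the SM."""
    sga_hits, chemical_hits = set(sga_hits), set(chemical_hits)
    shared = sga_hits & chemical_hits
    return shared, len(shared) / len(sga_hits)

# Placeholder identifiers g1..g27 stand in for the 27 synthetic lethal partners of ERG11.
sga_hits = {f"g{i}" for i in range(1, 28)}

# Assumption for illustration: g1..g13 are the 13 shared mutants, and x1, x2 are
# invented strains that are sensitive to the SM only.
fluconazole_hits = {f"g{i}" for i in range(1, 14)} | {"x1", "x2"}

shared, fraction = profile_overlap(sga_hits, fluconazole_hits)
print(len(shared), round(fraction, 2))  # prints: 13 0.48
```

With the numbers from the text, slightly under half of the synthetic lethal partners are recovered by the chemical perturbation.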
Overexpression
A third approach investigates the effect of SMs under conditions of increased gene
dosage, which can be achieved by increasing gene copy number (Butcher, Bhullar et
al. 2006; Luesch 2006). One such study (Luesch, Wu et al. 2005) used an
overexpression library to screen for genes which confer increased resistance to SMs
when multiple copies of the gene, and therefore presumably higher levels of the gene
product, are present. In this way it was possible to show, amongst other things, that
the breast cancer therapeutic tamoxifen disrupts calcium homeostasis (Luesch, Wu
et al. 2005).
Beyond deletion and overexpression libraries
Alternative approaches for generating libraries of different cell types include the
exploration of the relationship of essential genes with SMs via titratable promoter
alleles (Mnaimneh, Davierwala et al. 2004), and Decreased Abundance by mRNA
Perturbation (DAmP). DAmP enables fine-tuning of protein concentration by
disrupting the 3′ untranslated region of a gene by insertion of an antibiotic resistance
marker, which greatly destabilises the corresponding mRNA (Muhlrad and Parker
1999). DAmP leads to proteins that are synthesised at substantially lower levels than
in the wildtype but under their natural transcriptional regulation. DAmP can therefore
be used to gain insights into the deleterious and alleviating effects SMs have at
differential protein levels (Schuldiner, Collins et al. 2005).
Detection methods
Chemogenomic screens can be carried out in two fundamentally different designs:
Non-competitive arrays and competitive mutant pools (table 4-1).
Non-competitive arrays
In this design, a set of arrays is constructed in which each of the deletion or
overexpression strains is kept separate from the others, so that they can be observed
individually. This can be achieved either by placing each strain into a separate well
of a multi-well plate, or by spotting the strains onto an agar surface. After each compound from
the chemical library is added, changes in growth can be measured independently for
each strain and each SM. Since each mutant strain is physically isolated, strains do
not interact or compete for resources in this design.
Competitive mutant pools
In this design, each mutant carries a unique DNA sequence, also referred to as a
barcode. All barcoded mutant strains are initially grown together in the same flask. This
mutant pool is then divided into multiple flasks and each compound from the
chemical library is added to a different flask. In each flask, the different strains
compete for resources, and strains that are fitter in the presence of the SM increase in
relative abundance. For analysis, the genetic material from each flask is
hybridised to microarrays, which contain probes that are complementary to the
barcodes of the individual deletion strains. The observed differential intensities of the
probes correspond to the differential concentration and therefore the differential
viability of the respective strains in the presence of the SM. Alternatively, the
differential concentration of the strains could be established using high-throughput
sequencing. However, in this screen design, the intrinsically different growth rates of
mutants compared to wild type cells need to be taken into account when interpreting
the obtained data. It is therefore important that the data are normalised by the growth
rate of each mutant without any SM treatment.
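One simple way to express this normalisation is as a log ratio of each strain's abundance with and without the SM. The sketch below is an assumption about how such a correction could look, not the procedure used in any of the cited screens; the intensity values, the strain names, and the pseudocount are invented.

```python
import math

def normalised_fitness(treated, control, pseudocount=1.0):
    """Log2 ratio of treated to untreated barcode intensity for each strain.

    Dividing by the untreated control removes each mutant's intrinsic growth
    difference, so that negative values indicate SM-specific sensitivity.
    The pseudocount (an assumption) avoids division by zero for absent strains.
    """
    return {
        strain: math.log2(
            (treated[strain] + pseudocount) / (control[strain] + pseudocount)
        )
        for strain in control
    }

# Invented intensities: strain_b grows slowly even without SM treatment.
control = {"strain_a": 999.0, "strain_b": 249.0, "strain_c": 999.0}
treated = {"strain_a": 999.0, "strain_b": 249.0, "strain_c": 249.0}

scores = normalised_fitness(treated, control)
```

In this toy example `strain_b` and `strain_c` have the same low intensity in the treated pool, but only `strain_c` receives a negative score, because the low abundance of `strain_b` is already present in the untreated control.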
It has been observed that the results of the array-based design and the pool-based
design overlap but differ (Giaever, Shoemaker et al. 1999). A possible explanation for
this is interactions between strains in mutant pools. For example, certain deletion
strains could accumulate intermediates in a metabolic pathway from which an
enzyme has been deleted, and these intermediates, when released, might interfere
with the viability of other strains in the same flask. Alternatively, yeast strains grown
in flasks might behave differently from yeast strains grown in arrays due to the
selection pressures of inter-strain competition for resources, which is absent
from the array-based detection method.
Data interpretation and analysis
Understanding and interpreting chemogenomic data is not a trivial task, as the observed
phenotypic effects can have many causes and can also be indirect. Here I discuss
the most common methods to interpret chemogenomic data, before introducing my
own methods later in this chapter.
Clustering
Application of clustering algorithms is a standard way of analysing multidimensional
data. In chemogenomic screens, data clustering can be applied to both the chemical
and the genetic dimension. Clustering in the chemical dimension results in groups
of SMs that have similar effects on the different cell types (table 4-2). It has been
shown that clusters in the chemical dimension often contain SMs which are
chemically similar (Giaever, Flaherty et al. 2004). Therefore, the action of untested
SMs could be predicted based on their chemical similarity to SMs tested in previous screens.
Correspondingly, clustering in the genetic dimension groups cell types that are affected by
the SMs in a similar way. Such a cell type cluster can then be expected to be
enriched for mutated genes (deletion or overexpression) that have a certain function
in common. In order to identify such clusters, a number of supervised or
unsupervised algorithms that are also applied to microarray data have been
described. They include hierarchical clustering, k-means, and self-organising maps
(Madan Babu 2004). Alternatively, biclustering, an approach that analyses both
dimensions at the same time, can be employed (Cheng and Church 2000; Madeira
and Oliveira 2004). The biclusters obtained in this way contain a subset of mutated
genes which interact similarly with a subset of SMs, and vice versa. The aim of
biclustering is to sort both dimensions simultaneously in such a way that functionally
significant blocks can be identified amongst the overall pattern (figure 4-2C). Another
important advantage of biclustering is that it can identify mutated genes or SMs that
are associated with more than one cluster. This is a significant improvement over
simpler clustering procedures since many genes may have more than one function.
Biclustering approaches have successfully been applied to chemogenomic data
(Dudley, Janse et al. 2005; Parsons, Lopez et al. 2006). Non-negative and probabilistic
sparse matrix factorisation (Greene, Cagney et al. 2008) are alternative algorithms to
biclustering, and it has been shown that in some cases they are more sensitive than
hierarchical clustering. For example, the SMs verrucarin and neomycin sulphate are
clustered together by matrix factorisation but not by hierarchical clustering (Parsons,
Lopez et al. 2006). A detailed review of the various clustering methods and how they
are implemented can be found elsewhere (D'Haeseleer 2005; Xu and Wunsch
2005).
Panels of figure 4-2: A. Data structure (rows ∆ gene 1 to ∆ gene n, columns small molecule 1 to small molecule m); B. One-dimensional clustering; C. Biclustering.
Figure 4-2. The chemogenomic data structure and clustering methods. (A) A hypothetical data structure obtained from a chemogenomic screen. One dimension (columns) refers to the SMs added, whilst the other dimension (rows) refers to cell type, in this case defined deletion (∆) mutants. The values, here represented as white, grey and black boxes, refer to the growth or viability of the specific mutant when treated with a specific SM. (B) The columns of the matrix (SMs) have been clustered according to the similarity of their effect on different cell types. (C) Rows and columns of the matrix have been biclustered (see text). The resulting biclusters are highlighted in green.
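One-dimensional clustering of the chemical dimension can be illustrated with a deliberately simple procedure: SMs are linked whenever the Pearson correlation of their profiles reaches a threshold, and each connected group forms a cluster. This is equivalent to cutting a single-linkage hierarchical clustering at that threshold; it is a stand-in chosen for brevity, not the algorithm used in the cited studies. The profiles, the names, and the threshold of 0.8 are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant input assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster_profiles(profiles, threshold=0.8):
    """Group profiles linked by correlation >= threshold (single linkage)."""
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent pointers up to the representative of the group.
        while parent[name] != name:
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(b)] = find(a)  # merge the two groups

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())

# Invented growth values of four deletion strains under three SMs.
profiles = {
    "SM_1": [1.0, 0.2, 0.9, 0.1],
    "SM_2": [0.9, 0.3, 1.0, 0.2],
    "SM_3": [0.1, 1.0, 0.2, 0.9],
}
clusters = cluster_profiles(profiles)
```

Here `SM_1` and `SM_2` affect the same strains and end up in one cluster, whilst `SM_3` has the opposite profile and remains on its own. Passing rows instead of columns would cluster the genetic dimension in the same way.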
Matrix operations
Although clustering is a helpful tool for the interpretation of chemogenomic data, it is
not the only available method. Weinstein and coworkers (Scherf, Ross et al. 2000)
have taken an interesting approach by integrating two different two-dimensional
matrices obtained from a chemogenomics screen and this strategy could also prove
valuable for the interpretation of other datasets. Their first matrix had SMs as its first
dimension, cell lines as its second dimension, and the associated values referred to
differential growth. In a second matrix, genes were represented in the first dimension,
cell lines in the second dimension, and the associated values referred to transcript
levels as measured by microarrays. The common dimension in both matrices was the
cell lines. The authors then correlated, across cell lines, the expression of each gene
with the response to each SM and thereby obtained a third two-dimensional matrix,
which has SMs as one dimension and genes as the other dimension. From this
matrix, the authors could identify the genes that are expressed at a differential level
in cell lines that have differential sensitivity to a specific SM. In this way, they could
show how the transcript levels of particular genes relate to sensitivity to specific SMs.
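The integration of two matrices over a shared dimension, as described above, can be sketched as follows. The use of the Pearson correlation coefficient is an assumption made for this illustration, and all names and values are invented; both input matrices list the cell lines in the same order.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (non-constant input assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlate_over_cell_lines(sm_response, gene_expression):
    """Build an SM-by-gene matrix of correlations computed across cell lines."""
    return {
        sm: {gene: pearson(resp, expr) for gene, expr in gene_expression.items()}
        for sm, resp in sm_response.items()
    }

# First matrix: SMs by cell lines (differential growth, invented values).
sm_response = {"SM_1": [0.1, 0.4, 0.7, 1.0]}

# Second matrix: genes by cell lines (transcript levels, invented values).
gene_expression = {
    "gene_up": [1.0, 2.0, 3.0, 4.0],
    "gene_down": [8.0, 6.0, 4.0, 2.0],
}

# Third matrix: SMs by genes; the shared cell line dimension has been collapsed.
sm_gene = correlate_over_cell_lines(sm_response, gene_expression)
```

In this toy example `gene_up` is perfectly correlated and `gene_down` perfectly anti-correlated with the response to `SM_1`, so both would be flagged as genes whose transcript level relates to sensitivity.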
Matrices can in many cases also be visualised and interpreted as networks. SM-gene
networks have been used to visualise and analyse chemogenomic-like datasets
(Haggarty, Koeller et al. 2003; Sharom, Bellows et al. 2004; Lamb, Crawford et al.
2006; Ma'ayan, Jenkins et al. 2007; Yildirim, Goh et al. 2007) (figure 4-3). One
conclusion drawn from the analysis of such drug-target networks is
that there has been a trend towards more rational drug design over the last decade
(Yildirim, Goh et al. 2007; Wuster 2008). Later in this chapter I introduce the concept
of chemogenomic networks, which make use of the idea that matrices can be
visualised and interpreted as networks.
Panels of figure 4-3: target protein network (left); SM-target network (centre); SM network (right). Node types: small molecule (SM) and target protein.
Figure 4-3. A bipartite network of drugs and their target proteins (drug-target network; centre panel) contains the information to derive both the target protein network and the drug network (Yildirim, Goh et al. 2007). In the target protein network (left panel) two proteins are linked if they are targeted by the same SM. In the drug network (right panel) two drugs are linked if they target the same protein. The giant component of each network is highlighted in light green.
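The derivation of the two one-mode networks from the bipartite drug-target network follows directly from the linking rules in the caption. The sketch below uses invented SM and protein names and represents the bipartite network as a mapping from each SM to its set of targets.

```python
from itertools import combinations

def project_bipartite(sm_targets):
    """Derive the target protein network and the SM network from SM-target pairs."""
    protein_edges, sm_edges = set(), set()

    # Two proteins are linked if at least one SM targets both of them.
    for targets in sm_targets.values():
        for p, q in combinations(sorted(targets), 2):
            protein_edges.add((p, q))

    # Two SMs are linked if they share at least one target protein.
    for a, b in combinations(sorted(sm_targets), 2):
        if sm_targets[a] & sm_targets[b]:
            sm_edges.add((a, b))

    return protein_edges, sm_edges

# Invented bipartite network: each SM maps to the set of proteins it targets.
sm_targets = {
    "SM_1": {"P1", "P2"},
    "SM_2": {"P2", "P3"},
    "SM_3": {"P4"},
}
protein_edges, sm_edges = project_bipartite(sm_targets)
```

In this example P1, P2 and P3 form one connected component of the protein network, whilst P4 and `SM_3` remain isolated in their respective projections.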
Applications
The immediate purpose of a chemogenomic screen is to characterise the effect a set
of SMs has at the level of genes or proteins. From a biotechnological point of view,
such chemogenomics data can allow for the identification of proteins as novel drug
targets (Bredel and Jacoby 2004). In a screen of SMs against a library of
heterozygous yeast deletions, the gene products that interact with each of the SMs
can be identified. For example, in a chemogenomic screen of mammalian cell lines
STAT3 could be identified as a potential target for apratoxin A, an SM that is
cytotoxic against tumour cells (Luesch, Chanda et al. 2006). Many yeast genes have
orthologues in humans (Hohmann 2005), which raises the possibility that interactions
with SMs might also be conserved and that orthologues could be potential drug
targets or might aid in gaining an understanding of the side effects of drugs at the
molecular level (Luesch 2006). Nevertheless, it is not clear how well the responses to
SMs are conserved between distantly related organisms. I address this question later
in this chapter. Rapamycin is an example of an SM for which it is known that the
molecular mechanism of action is conserved between S. cerevisiae and humans
(Butcher, Bhullar et al. 2006). This is of particular interest as the majority of drugs are
SMs (Lipinski and Hopkins 2004).
SMs that are found to be active against specific proteins could also serve as
alternatives to genetic methods. In reverse genetics, in order to disrupt the function of
a protein, the DNA sequence that encodes the protein is mutated. Although this
approach has generated a large variety of useful data, it has some drawbacks. First
of all, mutagenesis can be difficult to implement in higher animals such as mammals.
Furthermore, mutagenesis strategies do not exhibit a large degree of spatial and
temporal flexibility, since it is difficult to mutate a gene for only a transient period of
time or in a limited and clearly defined set of cells. An additional problem is that in
multifunctional proteins it can be difficult to disrupt only one protein function whilst
leaving others intact. Chemical genetics has the potential to overcome these
problems (Stockwell 2000; Stockwell 2004; Spring 2005; Florian, Hümmer et al.
2007). The ultimate goal of chemical genetics is to identify SMs that are able to
modulate the function of as many proteins in a cell as possible, by simulating either a
gain-of-function or a loss-of-function mutation. It has been estimated that between
8 and 14 percent of proteins in eukaryotic genomes are 'druggable' in this way
(Hopkins and Groom 2002). Chemical genetics can be more easily employed in
mammalian cells than mutagenesis strategies and in a manner that is specific to a
set of clearly defined cells, or during a clearly defined time window by adding and
removing the compounds used in the screen. Furthermore, in a multifunctional
protein potentially only one specific function could be modulated without affecting the
others.
The potential of chemical genetics for drug development is well recognised. Recent
applications include the targeting of cancer cell lines that carry mutations in cancer
pathways with specific SMs (Dolma, Lessnick et al. 2003; Rognan 2007) or the use
of SMs to direct differentiation of stem cells (Huangfu, Maehr et al. 2008; Borowiak,
Maehr et al. 2009; Chen, Borowiak et al. 2009). However, other areas of
biotechnology, including green biotechnology, might also benefit from chemical
genetics. Crop plants could be improved with the use of SMs that target specific
proteins, for example those involved in drought resistance pathways (Raikhel and
Pirrung 2005). Since such SMs can be applied in a manner similar to pesticides, the
introduction of genetically modified crops, which is often associated with environmental,
legal, or technical difficulties, could be avoided.
Besides these practical applications, chemogenomics data can also offer
fundamental biological insights. Grouping genes according to their chemogenomic
profiles can lead to the identification of clusters that are enriched for functional
categories as defined by the Gene Ontology (GO) database (Ashburner, Ball et al.
2000). If a gene of previously unknown function is found in such a cluster, it is likely
that it has a similar function to the other genes in the same cluster. Chemogenomics
could therefore contribute to the determination of the function of genes whose
function was previously unknown (Dudley, Janse et al. 2005). Moreover, it might
even be possible to define novel functional classes of genes. More specifically, if an
SM causes decreased viability when a set of genes is deleted, this set of genes can
be interpreted as being necessary for conferring resistance to the SM. Therefore,
these genes might have a previously unknown function in common. For example,
deletion of members of the protein complexes SAGA, Swi/Snf and Ino80 each
caused sensitivity to cadmium, cycloheximide, hydroxyurea, and glycerol (Dudley,
Janse et al. 2005). Therefore, it has been assumed that these complexes share a
function, which is to regulate genes that are required in the presence of these SMs.
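Whether a cluster of genes is enriched for a functional category is commonly assessed with a hypergeometric test, which asks how likely it is to see at least the observed number of annotated genes in a cluster of that size by chance. The text does not state which test was used in the cited work, so the choice of test is an assumption, and the numbers are invented.

```python
from math import comb

def enrichment_p_value(population, annotated, cluster_size, annotated_in_cluster):
    """P(X >= annotated_in_cluster) under the hypergeometric distribution.

    population           -- number of genes in the screen
    annotated            -- genes in the screen that carry the annotation
    cluster_size         -- number of genes in the cluster
    annotated_in_cluster -- annotated genes observed in the cluster
    """
    upper = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(annotated_in_cluster, upper + 1)
    )
    return tail / total

# Invented numbers: 10 genes in total, 4 carry the annotation,
# and a cluster of 3 genes contains 3 annotated genes.
p = enrichment_p_value(population=10, annotated=4, cluster_size=3,
                       annotated_in_cluster=3)
```

For this toy case the p-value is 4/120, roughly 0.033. When many categories and clusters are tested, a correction for multiple testing would also be needed.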
Chemogenomic networks
Understanding and interpreting the data obtained from a chemogenomic screen is
not trivial, as often the observed effects are not direct. However, in most cases it is
true that if a defined mutant is less viable in the presence of an SM than in its
absence, then the gene that has been mutated has some importance for resistance
to the SM.
This makes possible a novel method for interpreting chemogenomic data that
involves the construction of chemogenomic networks (figure 4-4). Nodes in
chemogenomic networks are genes. Two genes are linked by an edge if they have
one or more SMs in common that cause decreased viability when either of the two
genes is deleted. Because deletion of either of two genes can decrease viability in the
presence of more than one SM, edges in a chemogenomic network have a
weight, which is an integer greater than or equal to one. For example, if the edge
connecting two genes has a weight of four, this means that there are four different
SMs that cause a decrease in viability if either of the genes is deleted.
Although it is often assumed that drugs that have the highest selectivity for the
protein they bind are the most effective, this may not be the case (Hopkins, Mason et
al. 2006; Hopkins 2008; Nobeli, Favia et al. 2009). Rather, many of the most
effective drugs bind multiple proteins and not a single target. Chemogenomic
networks may provide a way of rationally identifying clusters of gene products that
are targeted by the same SMs. If such a cluster is implicated in a certain disease,
SMs that target all or most of the genes of the cluster will be strong candidates for
further investigation, although they may not have been identified as being candidates
when only considering one SM-protein interaction at a time.
[Figure 4-4: three panels (1 to 3) showing the chemogenomic data matrix (strain library A to F against small molecules 1 to 3) and the stepwise construction of the chemogenomic network between strains A to F]
Figure 4-4. Like any network, chemogenomic networks consist of nodes and edges that connect the nodes. In chemogenomic networks, the nodes are genes or strains whilst the edges are relationships between these genes. Two genes are connected by an edge if they are sensitive to the same small molecule (SM). An edge therefore represents one or more SMs that cause a decrease in viability if either of the two genes it connects is deleted. The edges in a chemogenomic network are weighted and undirected. This figure shows how a chemogenomic network is constructed. Starting with the chemogenomic data structure (panel 1; decreases in viability are symbolised by dark green), each SM is considered in turn. If the SM causes decreased viability in two or more strains, the edge weight between each pair of these strains is increased by one.
Results and discussion
The properties of a chemogenomic network are dependent on the dataset that is
used to generate it. In a chemogenomic dataset that uses twenty different SMs, the
maximum possible weight of an edge is twenty. Furthermore, the network
constructed from such a dataset can be expected to have a lower graph density (the
proportion of all possible edges that is actually observed) than a network constructed
from a dataset which uses one thousand different SMs. For comparing
chemogenomic networks, it is therefore important to keep these issues in mind and
to counteract them, for example by filtering the network by only including edges
above a threshold weight.
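The effect of a weight threshold on graph density can be illustrated with a minimal Python sketch. The toy network, the function names and the choice to count only connected nodes when computing density are assumptions made for illustration; they are not the scripts used for this chapter.

```python
def filter_edges(edges, min_weight):
    """Keep only edges whose weight is at least min_weight."""
    return {pair: w for pair, w in edges.items() if w >= min_weight}


def density_from_counts(edge_count, node_count):
    """Proportion of all possible undirected edges that is observed."""
    if node_count < 2:
        return 0.0
    return edge_count / (node_count * (node_count - 1) / 2)


def graph_density(edges):
    """Density computed over the nodes that have at least one edge."""
    nodes = {gene for pair in edges for gene in pair}
    return density_from_counts(len(edges), len(nodes))


# Hypothetical toy network: keys are gene pairs, values are edge weights.
toy_edges = {
    frozenset(("A", "B")): 3,
    frozenset(("A", "C")): 1,
    frozenset(("B", "C")): 2,
    frozenset(("C", "D")): 1,
    frozenset(("D", "E")): 1,
}

unfiltered_density = graph_density(toy_edges)        # 5 of 10 possible edges
strong_edges = filter_edges(toy_edges, min_weight=2)  # keeps A-B and B-C
filtered_density = graph_density(strong_edges)        # 2 of 3 possible edges
```

With this definition, the edge and connected node counts reported for the unfiltered network in table 4-2 (2’084’597 edges, 3’189 nodes) give `density_from_counts(2084597, 3189)`, which is approximately 0.41.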
Interpretation of a chemogenomic network
Two genes are linked by an edge in a chemogenomic network if both are needed for
resistance to the same SM (see Materials and Methods). This suggests that the two
genes have complementary functions. Figure 4-5 shows a chemogenomic network
that has been generated using previously published data (Parsons, Lopez et al.
2006). In the figure, clusters of genes with a grey background are highlighted
because they share common functions: ARN1 and ARN2 both are transporters that
are specific to siderophore-iron chelates; RIP1 is cytochrome-c reductase whilst
COX9 is a cytochrome-c oxidase; and DUG2, APE3, and PCA1 all are peptidases or
hydrolases (http://www.yeastgenome.org/; last accessed 20 July 2009). The function
of open reading frame YBR284W is unknown, but due to its association with three
hydrolases in the chemogenomic network it is plausible to assume that it is one as
well. It is also interesting to note that all four genes of this cluster are encoded within
15 open reading frames of each other on chromosome II. This demonstrates that
edges in chemogenomic networks can be used to identify functional similarities
between genes.
[Figure 4-5: network diagram; nodes are labelled with yeast gene names, including ARN1, ARN2, RIP1, COX9, DUG2, APE3, PCA1 and YBR284W]
Figure 4-5. Chemogenomic network filtered for edges with a weight of 20 or more. That is to say, two genes are only connected if they are important for resistance to the same 20 or more SMs. Only the largest connected component of the network is shown. The clusters of genes with a grey background are highlighted because it is known that they also share common functions.
Properties of the chemogenomic network
To ascertain the overlap between two networks, such as the chemogenomic network
and the protein-protein interaction network, I first created a list of nodes which appear
in both networks. Next, I went through both networks and retained only those edges
which connected two nodes which were both on the list. The overlap between the two
networks was then measured by counting the number of edges they had in common.
To assess the significance of this result, I generated a null distribution of overlapping
edge counts by permuting the node labels of the graphs. The Z score of the
observed overlapping edge count relative to this distribution is given below (table 4-2).
Minimum edge weight     ≥ 1         ≥ 2         ≥ 5
Edge count              2’084’597   817’683     149’374
Connected node count    3’189       2’497       1’391
Graph density           0.41        0.26        0.16
Minimum degree          44          2           1
Mean degree             1’307.367   654.93      214.77
Maximum degree          2’870       2’169       1’046
Minimum weight          1           2           5
Mean weight             1.916       3.336       6.977
Maximum weight          29          29          29
Clustering coefficient  0.67        0.58        0.54
Protein interaction edge overlap
  absolute              4’870       2’579       819
  Z score               5.43        8.89        13.30
Transcriptional interaction edge overlap
  absolute              123’155     47’087      7’615
  Z score               1.58        -0.30       -1.46
Table 4-2. Parameters of the chemogenomic network at different minimum edge weights. The edge weight refers to how many SMs both nodes connected by the edge interact with. Whilst it is clear that there is a larger than expected overlap between the chemogenomic network and the protein interaction network, no such effect can be seen for the transcription interaction network, in which two genes are connected if they are regulated by the same transcription factor (target gene co-regulatory network).
In many cases a decrease in growth rate or fitness in a certain combination of SM
and heterozygous deletant will not be indicative of a direct and physical interaction
between the SM and the gene whose dosage has been reduced. The gene may
instead encode a protein that compensates for the direct effect of the SM (Venancio,
Balaji and Aravind 2009; Mol BioSyst (in print)). The extent to which such “buffer
genes” contribute to the interactions observed in chemogenomic data is not known.
As a widespread occurrence of this compensation effect could severely skew the
interpretation of chemogenomic data, more research in this area is needed.
In the chemogenomic networks described above, genes that respond to the same
SMs are connected. Alternatively, one can envisage a network that has SMs as
nodes. These nodes are connected if they interact with the same gene (figure 4-3). In
such a way, it may be possible to identify clusters of SMs with similar biological
activities. In the next section (Linking the chemical and biological properties of small
molecules), I approach this problem from a different angle by asking if SMs that have
similar chemical properties also have similar biological properties.
Materials and methods
Data acquisition
The chemogenomic data from which the chemogenomic network used in this
analysis was constructed was published by Parsons et al. (Parsons, Lopez et al.
2006). The dataset describes the sensitivity of a haploid S. cerevisiae mutant library
to 82 SMs.
The protein-protein interaction network was obtained from the BioGRID database
(Stark, Breitkreutz et al. 2006), which combines information on protein-protein
interaction from various sources and types of experiments. The transcription
regulatory network was based on ChIP-chip data (Lee, Rinaldi et al. 2002; Harbison,
Gordon et al. 2004; Balaji, Madan Babu et al. 2006), which identifies direct in vivo
DNA binding events for transcription factors. According to this network, 144
transcription factors regulate 4’347 target genes via 12’230 regulatory interactions. In
order to make the transcription regulatory network comparable to the chemogenomic
network, it was processed in such a way that target genes that are regulated by the
same transcription factor are connected by edges. The resulting network is, strictly
speaking, not a transcription regulatory network, but a target gene co-regulatory
network.
Construction of the chemogenomic network
The chemogenomic network is constructed by collapsing a bipartite network with two
types of nodes into a simple network with only one type of node. In the initial bipartite
network, the two types of nodes are SMs and genes or deletion strains. In the
resulting simple network, genes are the only type of node. In this network, two genes
are connected by an edge if they interacted with one or more of the same SMs in the
initial network. The edge weight corresponds to the number of those
common SMs. The chemogenomic network can be constructed by considering one
SM at a time, and connecting all genes that interact with it to each other (figure 4-4).
This means that the chemogenomic network will be densely connected if SMs tend to
interact with many genes in the initial bipartite graph. Therefore, it can be useful to
filter edges that have a low weight (that is to say, that symbolise interaction of genes
with only a few common SMs).
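The construction described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used in this work: the input format (a mapping from each SM to the genes whose deletion strains are sensitive to it), the function name and the toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_chemogenomic_network(sensitive_genes_by_sm):
    """Collapse a bipartite SM-gene network into a weighted gene-gene network.

    Each SM is considered in turn; every pair of genes that interacts with it
    has its edge weight increased by one.
    """
    weights = defaultdict(int)
    for genes in sensitive_genes_by_sm.values():
        # Sorting gives each undirected edge a single canonical key.
        for gene_1, gene_2 in combinations(sorted(set(genes)), 2):
            weights[(gene_1, gene_2)] += 1
    return dict(weights)


# Hypothetical toy data: three SMs and four deletion strains.
toy_screen = {
    "SM1": ["A", "B", "C"],
    "SM2": ["A", "B"],
    "SM3": ["C", "D"],
}

network = build_chemogenomic_network(toy_screen)
```

In this toy example genes A and B share two SMs and therefore receive an edge of weight two, whereas all other connected pairs share one SM. An SM with n interacting genes contributes n(n-1)/2 edges, which is why the network becomes dense when individual SMs interact with many genes.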
Computation of network parameters and overlap with other networks
Network parameters were computed using the statistical language R and the
package igraph (http://cneurocvs.rmki.kfki.hu/igraph/). Overlap with the protein-
protein interaction network and the transcription regulatory network was assessed
using in-house Perl scripts. For this, all nodes that do not appear in both networks to
be compared were deleted. Then, the number of common edges between the
networks was counted. The significance of the overlap between networks was
assessed by randomising the networks by permuting the node labels and repeating
the analysis 1’000 times.
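The overlap procedure can be approximated as follows. The original analysis used in-house Perl scripts that are not shown here, so this Python sketch rests on assumptions: node labels are strings, only the labels of one of the two networks are permuted, the Z score uses the population standard deviation of the null distribution, and the function names and toy networks are hypothetical.

```python
import random
import statistics


def shared_node_edges(edges, shared_nodes):
    """Keep only edges whose two end points both occur in both networks."""
    return {frozenset(e) for e in edges
            if set(e) <= shared_nodes and len(set(e)) == 2}


def overlap_z_score(edges_a, edges_b, n_permutations=1000, seed=0):
    """Count common edges and compare the count with a label-permutation null."""
    nodes_a = {node for edge in edges_a for node in edge}
    nodes_b = {node for edge in edges_b for node in edge}
    shared = nodes_a & nodes_b
    a = shared_node_edges(edges_a, shared)
    b = shared_node_edges(edges_b, shared)
    observed = len(a & b)

    rng = random.Random(seed)
    labels = sorted(shared)
    null_counts = []
    for _ in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(labels, shuffled))
        # Permuting labels keeps the topology of network a but breaks
        # any correspondence with network b.
        permuted = {frozenset(relabel[node] for node in edge) for edge in a}
        null_counts.append(len(permuted & b))

    mean = statistics.mean(null_counts)
    sd = statistics.pstdev(null_counts)
    z = (observed - mean) / sd if sd > 0 else float("nan")
    return observed, z


# Hypothetical toy networks that share a path of eight genes.
path = [(f"g{i}", f"g{i + 1}") for i in range(1, 8)]
network_a = path + [("g1", "only_in_a")]
network_b = path + [("g8", "only_in_b")]
observed, z = overlap_z_score(network_a, network_b)
```

Edges that involve `only_in_a` or `only_in_b` are discarded before counting, because those nodes do not appear in both networks.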
Linking the chemical and biological properties of small molecules
Not all sections of chemical space are equally likely to contain SMs that have a
certain biological activity. This is the basic assumption of quantitative structure-
activity relationships (QSAR), which describe how chemical structure is associated
with biological activity. The central tenet of QSAR is that similar SMs bind similar
proteins (Klabunde 2007). This insight enables us to define areas of chemical space
that are occupied by SMs with specific affinity for certain protein families.
By focussing on those areas of chemical space that contain SMs with a biological
activity of interest, it is possible to make the process of drug lead identification more
rapid and efficient. Lipinski's rule of five was derived by analysing the chemical
properties of approved drugs (Lipinski, Lombardo et al. 2001). This “rule” states that
most drugs that have been approved by the US Food and Drug Administration
(FDA) have fewer than five hydrogen-bond donors, fewer than ten hydrogen-bond
acceptors, a log P value (a measure of hydrophobicity) of less than five, and a
molecular weight of less than 500 Da. Therefore, SMs with known structures can be
pre-screened in silico prior to a chemogenomic experiment to select those that are
most likely to have certain biological activities.
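Such an in silico pre-screen can be sketched as a count of rule violations per SM. The sketch below is illustrative only: the descriptor values are hypothetical, the class and function names are made up, and the handling of values that lie exactly on a threshold follows the commonly cited form of the rule (a violation is counted only when the threshold is exceeded), which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    h_bond_donors: int
    h_bond_acceptors: int
    log_p: float
    molecular_weight: float  # in Da


def count_lipinski_violations(d):
    """Number of the four rule-of-five criteria that the SM violates."""
    return sum([
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
        d.log_p > 5,
        d.molecular_weight > 500,
    ])


# Hypothetical descriptor values for two SMs.
small_polar = Descriptors(h_bond_donors=2, h_bond_acceptors=5,
                          log_p=3.1, molecular_weight=350.0)
large_greasy = Descriptors(h_bond_donors=6, h_bond_acceptors=12,
                           log_p=5.6, molecular_weight=720.0)

violations = [count_lipinski_violations(d) for d in (small_polar, large_greasy)]
```

A library could then be filtered by keeping SMs with, for example, at most one violation; the appropriate cut-off depends on the screen.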
However, Lipinski’s rule is limited in its usefulness to chemogenomic screens carried
out in yeasts such as S. cerevisiae and Schizosaccharomyces pombe as it has been
developed to describe the druglikeness of approved drugs in humans. It is not clear if
the same limitations to molecular properties also apply to those SMs that show
bioactivity in yeasts. Here, I address this question by first computing the molecular
properties of SMs, including the predicted octanol-water partition coefficient (log P),
the polar surface area, the number of atoms, the molecular mass and volume, the
number of hydrogen bond donors and acceptors, the number of rotatable bonds, and the
number of violations of Lipinski’s rule. I then compare those properties between SMs
that are bioactive in S. cerevisiae and those that are not. Furthermore, I ask whether
there are properties that distinguish SMs that are active in S. cerevisiae from those
that are active in Sc. pombe. In the following, I use the term bioactivity to mean the
ability of a compound to inhibit the growth of a yeast colony.
In order to determine whether SMs that are bioactive in S. cerevisiae tend to have
different molecular properties from those SMs that are not bioactive, I analysed two
distinct datasets. The first dataset (dataset A) has not previously been published and
describes the IC50 values of 2’961 SMs in both S. cerevisiae and Sc. pombe. The
second dataset (dataset B) was generated in a high-throughput NCI
yeast anticancer drug screen and separates 87’264 SMs into those that have an
effect on the growth of S. cerevisiae and those that do not (see Materials and
Methods).
Results and discussion
In order to assess whether any molecular properties are significantly distinct for the
SMs that are bioactive in S. cerevisiae and Sc. pombe compared to those SMs that
are not, I analysed two distinct datasets, A and B (see Materials and Methods).
What differentiates bioactive small molecules from others?
For dataset A, there is a clear difference between active and non-active SMs for a
number of chemical descriptors. The strongest predictor of biological activity is log P,
a measure of hydrophobicity (table 4-3), whose mean was more than 80% larger in active
SMs than in inactive ones (p < 5.54 · 10^-12).
Bioactive SMs on average tend to be more hydrophobic, have a lower polar surface
area, have a higher molecular weight, a lower number of hydrogen bond acceptors, a
lower number of hydrogen bond donors, and a higher volume than the other SMs
screened. Furthermore, a bigger proportion of bioactive compounds, compared to
compounds that are not bioactive, violate one or more of Lipinski's rules.
Property     Explanation                                       Mean active  Mean inactive  Wilcoxon test p-value
logP         predicted octanol-water partition coefficient     3.59         1.98           5.5 · 10^-12
             log P (hydrophobicity)
PSA          topological molecular polar surface area          55.32        80.97          1.7 · 10^-9
             (important to predict drug transport properties)
natoms       number of atoms                                   22.99        22.63          0.10
MW           molecular weight                                  352.32       326.90         1.5 · 10^-4
nON          number of hydrogen bond acceptors                 4.06         5.47           3.1 · 10^-8
nOHNH        number of hydrogen bond donors                    1.15         2.14           6.3 · 10^-9
nviolations  how many of Lipinski's rules are violated         0.37         0.33           0.08
nrotb        number of rotatable bonds (measure of molecular   4.54         3.92           0.40
             flexibility)
volume       molecular volume in Å³                            299.00       284.20         4.6 · 10^-3
Table 4-3. Active SMs are defined as those that have an IC50 value of less than 100 μM in both species. Inactive SMs are those that have a higher IC50 value in either or both species. The Wilcoxon test p-value indicates how likely it is that the descriptor is the same for SMs that hit S. cerevisiae and Sc. pombe, and SMs that hit neither.
These rules state that the majority of approved drugs have less than five hydrogen
bond donors, less than ten hydrogen bond acceptors, a log P value of less than five,
and a molecular weight of less than 500 Da (Lipinski, Lombardo et al. 2001). In all
cases, the majority of SMs that I defined as being bioactive against the two fungi
follow these rules. However, in the case of log P and molecular weight, the bioactive
SMs are closer to Lipinski's threshold values than the drugs that are not bioactive.
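The descriptor comparisons in table 4-3 rely on Wilcoxon tests between active and inactive SMs. As an illustration of how such a test works, the following Python sketch implements a two-sided Wilcoxon rank-sum test using the normal approximation with a correction for ties and without a continuity correction. These settings are assumptions, as are the function names and the log P values, which are hypothetical; the sketch is not expected to reproduce the p-values in the table exactly.

```python
import math


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Return the U statistic of x and a two-sided approximate p-value."""
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(pooled)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: groups of equal values reduce the variance of U.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical log P values for active and inactive SMs.
active_log_p = [3.1, 3.5, 4.0, 4.2, 3.8, 3.9, 4.4, 3.6]
inactive_log_p = [1.2, 2.0, 1.8, 2.5, 2.2, 1.5, 2.8, 1.9]

u_stat, p_value = wilcoxon_rank_sum(active_log_p, inactive_log_p)
```

Because the test only uses ranks, it makes no assumption about the shape of the descriptor distributions, which is useful for skewed properties such as molecular weight.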
In order to confirm the chemical differences between SMs that are bioactive and
those that are not, I performed principal component analysis using the nine
descriptors mentioned above as initial dimensions. After dimensionality reduction, the
difference between the two types of molecules could be confirmed. It has previously
been shown that there are differences between SMs from combinatorial synthesis,
and drugs as well as natural products (Feher and Schmidt 2003). My analysis shows
that there are also significant differences within classes of SMs according to their
bioactivity.
In order to confirm these results, I repeated the analysis using a previously published
dataset that was larger than the one used above. In the initial stage of an anticancer
drug screen, 87’264 SMs were tested against six different strains of S. cerevisiae. Of
these, 12’068 had a strong effect on yeast growth and 75’196 did not. The results of
this screen have been made available online (http://dtp.nci.nih.gov/yacds/index.html;
last accessed 20 July 2009) and constitute dataset B. As before, I determined the
molecular properties of the tested drugs to see which ones are predictive of activity.
[Figure 4-6: scatter plot titled "PCA of molecular properties"; x-axis PC1, y-axis PC2]
Figure 4-6: Plot of the first two principal components (PCs) of the molecular properties of the 2’961 SMs of dataset A. Active SMs (IC50 < 100 μM) are in red, inactive SMs in grey. The first principal component (PC1) explains 92% of the variance in the data, PC2 explains 6%.
The same descriptors are over- and underrepresented among the active SMs of this
dataset as among the active SMs of the 2’961 SMs in dataset A.
Property     Explanation                                       Mean active  Mean inactive  Wilcoxon test p-value
logP         predicted octanol-water partition coefficient     3.41         2.44           < 2.2 · 10^-16
             log P (hydrophobicity)
PSA          polar surface area (important to predict drug     61.07        66.16          < 2.2 · 10^-16
             transport properties)
natoms       number of atoms                                   23.40        20.54          < 2.2 · 10^-16
MW           molecular weight                                  351.91       297.96         < 2.2 · 10^-16
nON          number of hydrogen bond acceptors                 4.42         4.70           < 2.2 · 10^-16
nOHNH        number of hydrogen bond donors                    1.22         1.38           < 2.2 · 10^-16
nviolations  how many of Lipinski's rules are violated         0.39         0.25           < 2.2 · 10^-16
nrotb        number of rotatable bonds (measure of molecular   4.90         4.39           < 2.2 · 10^-16
             flexibility)
volume       molecular volume in Å³                            300.91       262.36         < 2.2 · 10^-16
Table 4-4. The Wilcoxon test p-value indicates how likely it is that the descriptor is the same for SMs that hit S. cerevisiae and those that do not. This table is similar to table 4-3, but is based on a different dataset (87’264 SMs from an NCI cancer drug screen).
What differentiates small molecules that hit S. cerevisiae from those that hit Sc. pombe?
Having established that there are chemical properties that allow distinguishing
between SMs that are bioactive and those that are not, it is of interest to establish
whether these properties vary between organisms. I therefore compared the
chemical properties of SMs that are bioactive in S. cerevisiae to the properties of
SMs that are bioactive in Sc. pombe. There are several important differences
between the two species that could explain differences in sensitivity to
certain SMs, including differences in the length of cell cycle phases, or the
presence of RNAi machinery in Sc. pombe but not in S. cerevisiae.
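[Figure 4-7: Venn diagram of SMs with an IC50 value of less than 100 μM. S. cerevisiae only: 59; Sc. pombe only: 115; both species: 54; neither: 2’733]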
Figure 4-7: Of 2’961 NCI SMs, 54 have an IC50 value of less than 100 μM in both species, which is far more than would be expected if bioactivity in the two species were independent (table 4-5). This indicates that an SM that is bioactive in one species is more likely to be bioactive in the other species as well.
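One way to judge whether the overlap between the two species is larger than expected by chance is a one-sided hypergeometric test. The test that produced the p-values reported in this section is not stated, so the choice of test, the function names and the sketch as a whole are assumptions. The counts are those for the 100 μM cut-off: 54 SMs hit both species, 59 hit S. cerevisiae only, 115 hit Sc. pombe only and 2’733 hit neither.

```python
from math import comb


def expected_overlap(both, only_a, only_b, neither):
    """Overlap expected if hits in the two species were independent."""
    total = both + only_a + only_b + neither
    return (both + only_a) * (both + only_b) / total


def overlap_p_value(both, only_a, only_b, neither):
    """Probability of an overlap at least as large as the observed one."""
    total = both + only_a + only_b + neither
    hits_a = both + only_a
    hits_b = both + only_b
    # Sum the hypergeometric probabilities for overlaps >= observed.
    tail = sum(
        comb(hits_a, k) * comb(total - hits_a, hits_b - k)
        for k in range(both, min(hits_a, hits_b) + 1)
    )
    return tail / comb(total, hits_b)


expected = expected_overlap(54, 59, 115, 2733)
p_overlap = overlap_p_value(54, 59, 115, 2733)
```

Under independence, only about six or seven SMs would be expected to hit both species at this cut-off, compared with the 54 observed.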
                                  IC50 < 1’000 μM  IC50 < 100 μM    IC50 < 10 μM     IC50 < 1 μM
hits S. cerevisiae only           84               59               20               2
hits Sc. pombe only               55               115              48               4
hits S. cerevisiae and Sc. pombe  132              54               48               29
hits neither                      2’690            2’733            2’845            2’926
p-value                           < 2.2 · 10^-16   < 2.2 · 10^-16   < 2.2 · 10^-16   1.89 · 10^-7
Table 4-5: As figure 4-7, but at multiple different IC50 cut-offs. The strong overlap between SMs that hit Sc. pombe and S. cerevisiae does not depend on the IC50 cut-off used.
It could be that some SMs are bioactive in one organism but not the other because
they have different molecular properties. As before, I obtained the molecular
structures of each of the SMs that have different bioactivity in the two species and
computed nine properties for each of them in order to assess whether one of these
properties was significantly distinct for the SMs that hit one organism but not the
other. No significant difference was found for any of the properties, which is
consistent with bioactivity in S. cerevisiae being a good predictor of bioactivity in
Sc. pombe and vice versa.
Property     Explanation                                       Mean S. cerevisiae  Mean Sc. pombe  Wilcoxon test p-value
logP         predicted octanol-water partition coefficient     2.83                3.13            0.46
             log P (hydrophobicity)
PSA          polar surface area (important to predict drug     66.01               61.46           0.71
             transport properties)
natoms       number of atoms                                   22.75               23.04           0.85
MW           molecular weight                                  334.30              336.76          0.97
nON          number of hydrogen bond acceptors                 4.52                4.21            0.66
nOHNH        number of hydrogen bond donors                    1.63                1.74            0.91
nviolations  how many of Lipinski's rules are violated         0.27                0.43            0.09
nrotb        number of rotatable bonds (measure of molecular   3.62                4.28            0.99
             flexibility)
volume       molecular volume in Å³                            288.12              296.88          0.56
Table 4-6. Properties of drugs that only hit S. cerevisiae versus those that only hit Sc. pombe (IC50 < 1’000 μM). The Wilcoxon test p-value indicates how likely it is that the descriptor is the same for drugs that hit S. cerevisiae and drugs that hit Sc. pombe.
Conclusions
Some of the above observations on the differences between biologically active and
inactive SMs could be due to the way the chemical libraries that were tested were
generated. Removing chiral centres, introducing additional flexibility into a
molecule, and decreasing its size often lead to lower activity (Feher and Schmidt
2003). As chemical libraries are often created by doing exactly those things, this may
explain why, for example, the molecular weight is larger for SMs that are active
compared to those that are inactive. Filtering chemical libraries for SMs with
properties that are enriched in the bioactive set could therefore increase the
efficiency of such screens.
Materials and Methods
Two datasets measuring the bioactivity of SMs in yeast were used.
The first dataset (dataset A) was created by Laura Schuresko-Kapitzky of the
University of California, San Francisco and Scott Lokey of the University of California,
Santa Cruz (unpublished). It describes the IC50 values of 2’961 NCI compounds in
both wild-type S. cerevisiae and wild-type Sc. pombe. The data was obtained by
high-throughput yeast halo assays (Gassner, Tamble et al. 2007).
The second dataset (dataset B) describes the bioactivity of 87’264 NCI compounds
on S. cerevisiae strains. In the initial stage of the anticancer drug screen that
produced this dataset, 87’264 drugs were tested against six different strains of S.
cerevisiae. Of these, 12’068 had a strong effect on yeast growth and 75’196 did not
(http://dtp.nci.nih.gov/yacds/index.html).
I downloaded the molecular structures of each of the SMs
(ftp://ftp.ncbi.nih.gov/pubchem/) and computed nine properties for each of them in
order to assess whether one of these properties was significantly distinct for the SMs
that affect growth. SM properties were computed using Molinspiration's mib tool
(http://www.molinspiration.com/). A local version of the mib tool was kindly provided
by John J. Irwin of the University of California, San Francisco.
All data was analysed using a combination of in-house Perl scripts and the statistical
programming language R.
References
Ashburner, M., C. A. Ball, et al. (2000). "Gene ontology: tool for the unification of biology. The Gene Ontology Consortium." Nat Genet 25(1): 25-9.
Baetz, K., L. McHardy, et al. (2004). "Yeast genome-wide drug-induced haploinsufficiency screen to determine drug mode of action." Proc Natl Acad Sci U S A 101(13): 4525-30.
Balaji, S., M. Madan Babu, et al. (2006). "Comprehensive analysis of combinatorial regulation using the transcriptional regulatory network of yeast." J Mol Biol 360(1): 213-27.
Borowiak, M., R. Maehr, et al. (2009). "Small molecules efficiently direct endodermal differentiation of mouse and human embryonic stem cells." Cell Stem Cell 4(4): 348-58.
Bredel, M. and E. Jacoby (2004). "Chemogenomics: an emerging strategy for rapid target and drug discovery." Nat Rev Genet 5(4): 262-75.
Brenner, C. (2004). "Chemical genomics in yeast." Genome Biol 5(9): 240.
Butcher, R. A., B. S. Bhullar, et al. (2006). "Microarray-based method for monitoring yeast overexpression strains reveals small-molecule targets in TOR pathway." Nat Chem Biol 2(2): 103-9.
Chang, M., M. Bellaoui, et al. (2002). "A genome-wide screen for methyl methanesulfonate-sensitive mutants reveals genes required for S phase progression in the presence of DNA damage." Proc Natl Acad Sci U S A 99(26): 16934-9.
Chen, S., M. Borowiak, et al. (2009). "A small molecule that directs differentiation of human ESCs into the pancreatic lineage." Nat Chem Biol 5(4): 258-65.
Cheng, Y. and G. M. Church (2000). "Biclustering of expression data." Proc Int Conf Intell Syst Mol Biol 8: 93-103.
D'Haeseleer, P. (2005). "How does gene expression clustering work?" Nat Biotechnol 23(12): 1499-501.
Deutschbauer, A. M., D. F. Jaramillo, et al. (2005). "Mechanisms of haploinsufficiency revealed by genome-wide profiling in yeast." Genetics 169(4): 1915-25.
Dolma, S., S. L. Lessnick, et al. (2003). "Identification of genotype-selective antitumor agents using synthetic lethal chemical screening in engineered human tumor cells." Cancer Cell 3(3): 285-96.
Dudley, A. M., D. M. Janse, et al. (2005). "A global view of pleiotropy and phenotypically derived gene function in yeast." Mol Syst Biol 1: 2005 0001.
Ehrlich, P. (1911). "Aus Theorie und Praxis der Chemotherapie." Folia Serologica VII(7): 697-714.
Feher, M. and J. M. Schmidt (2003). "Property distributions: differences between drugs, natural products, and molecules from combinatorial chemistry." J Chem Inf Comput Sci 43(1): 218-27.
Florian, S., S. Hümmer, et al. (2007). "Chemical genetics: reshaping biology through chemistry." HFSP Journal 1(2): 104-114.
Gassner, N. C., C. M. Tamble, et al. (2007). "Accelerating the discovery of biologically active small molecules using a high-throughput yeast halo assay." J Nat Prod 70(3): 383-90.
Gensini, G. F., A. A. Conti, et al. (2007). "The contributions of Paul Ehrlich to infectious disease." J Infect 54(3): 221-4.
Giaever, G., P. Flaherty, et al. (2004). "Chemogenomic profiling: identifying the functional interactions of small molecules in yeast." Proc Natl Acad Sci U S A 101(3): 793-8.
Giaever, G., D. D. Shoemaker, et al. (1999). "Genomic profiling of drug sensitivities via induced haploinsufficiency." Nat Genet 21(3): 278-83.
Greene, D., G. Cagney, et al. (2008). "Ensemble non-negative matrix factorization methods for clustering protein-protein interactions." Bioinformatics 24(15): 1722-8.
Haggarty, S. J., K. M. Koeller, et al. (2003). "Multidimensional chemical genetic analysis of diversity-oriented synthesis-derived deacetylase inhibitors using cell-based assays." Chem Biol 10(5): 383-96.
Harbison, C. T., D. B. Gordon, et al. (2004). "Transcriptional regulatory code of a eukaryotic genome." Nature 431(7004): 99-104.
Hillenmeyer, M. E., E. Fung, et al. (2008). "The chemical genomic portrait of yeast: uncovering a phenotype for all genes." Science 320(5874): 362-5.
Hohmann, S. (2005). "The Yeast Systems Biology Network: mating communities." Curr Opin Biotechnol 16(3): 356-60.
Hopkins, A. L. (2008). "Network pharmacology: the next paradigm in drug discovery." Nat Chem Biol 4(11): 682-90.
Hopkins, A. L. and C. R. Groom (2002). "The druggable genome." Nat Rev Drug Discov 1(9): 727-30.
Hopkins, A. L., J. S. Mason, et al. (2006). "Can we rationally design promiscuous drugs?" Curr Opin Struct Biol 16(1): 127-36.
Huangfu, D., R. Maehr, et al. (2008). "Induction of pluripotent stem cells by defined factors is greatly improved by small-molecule compounds." Nat Biotechnol 26(7): 795-7.
Klabunde, T. (2007). "Chemogenomic approaches to drug discovery: similar receptors bind similar ligands." Br J Pharmacol 152(1): 5-7.
Lamb, J., E. D. Crawford, et al. (2006). "The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease." Science 313(5795): 1929-35.
Lee, T. I., N. J. Rinaldi, et al. (2002). "Transcriptional regulatory networks in Saccharomyces cerevisiae." Science 298(5594): 799-804.
Lipinski, C. and A. Hopkins (2004). "Navigating chemical space for biology and medicine." Nature 432(7019): 855-61.
Lipinski, C. A., F. Lombardo, et al. (2001). "Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings." Adv Drug Deliv Rev 46(1-3): 3-26.
Lloyd, N. C., H. W. Morgan, et al. (2005). "The composition of Ehrlich's salvarsan: resolution of a century-old debate." Angew Chem Int Ed Engl 44(6): 941-4.
Luesch, H. (2006). "Towards high-throughput characterization of small molecule mechanisms of action." Mol Biosyst 2(12): 609-20.
Luesch, H., S. K. Chanda, et al. (2006). "A functional genomics approach to the mode of action of apratoxin A." Nat Chem Biol 2(3): 158-67.
Luesch, H., T. Y. Wu, et al. (2005). "A genome-wide overexpression screen in yeast for small-molecule target identification." Chem Biol 12(1): 55-63.
Ma'ayan, A., S. L. Jenkins, et al. (2007). "Network analysis of FDA approved drugs and their targets." Mt Sinai J Med 74(1): 27-32.
Madan Babu, M. (2004). Introduction to microarray data analysis. Computational Genomics: Theory and Application. R. P. Grant, Horizon Press (UK): 225-249.
Madeira, S. C. and A. L. Oliveira (2004). "Biclustering algorithms for biological data analysis: a survey." IEEE/ACM Trans Comput Biol Bioinform 1(1): 24-45.
Mnaimneh, S., A. P. Davierwala, et al. (2004). "Exploration of essential gene functions via titratable promoter alleles." Cell 118(1): 31-44.
Muhlrad, D. and R. Parker (1999). "Aberrant mRNAs with extended 3' UTRs are substrates for rapid degradation by mRNA surveillance." Rna 5(10): 1299-307.
Nobeli, I., A. D. Favia, et al. (2009). "Protein promiscuity and its implications for biotechnology." Nat Biotechnol 27(2): 157-67.
Parsons, A. B., R. L. Brost, et al. (2004). "Integration of chemical-genetic and genetic interaction data links bioactive compounds to cellular target pathways." Nat Biotechnol 22(1): 62-9.
Parsons, A. B., A. Lopez, et al. (2006). "Exploring the mode-of-action of bioactive compounds by chemical-genetic profiling in yeast." Cell 126(3): 611-25.
Perlstein, E. O., E. J. Deeds, et al. (2007). "Quantifying fitness distributions and phenotypic relationships in recombinant yeast populations." Proc Natl Acad Sci U S A 104(25): 10553-8.
Perlstein, E. O., D. M. Ruderfer, et al. (2007). "Genetic basis of individual differences in the response to small-molecule drugs in yeast." Nat Genet 39(4): 496-502.
Raikhel, N. and M. Pirrung (2005). "Adding precision tools to the plant biologists' toolbox with chemical genomics." Plant Physiol 138(2): 563-4.
Rognan, D. (2007). "Chemogenomic approaches to rational drug design." Br J Pharmacol 152(1): 38-52.
Scherf, U., D. T. Ross, et al. (2000). "A gene expression database for the molecular pharmacology of cancer." Nat Genet 24(3): 236-44.
Schuldiner, M., S. R. Collins, et al. (2005). "Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile." Cell 123(3): 507-19.
Sharom, J. R., D. S. Bellows, et al. (2004). "From large networks to small molecules." Curr Opin Chem Biol 8(1): 81-90.
Spring, D. R. (2005). "Chemical genetics to chemical genomics: small molecules offer big insights." Chem Soc Rev 34(6): 472-82.
Stark, C., B. J. Breitkreutz, et al. (2006). "BioGRID: a general repository for interaction datasets." Nucleic Acids Res 34(Database issue): D535-9.
Stockwell, B. R. (2000). "Frontiers in chemical genetics." Trends Biotechnol 18(11): 449-55.
Stockwell, B. R. (2004). "Exploring biology with small organic molecules." Nature 432(7019): 846-54.
Wuster, A. (2008). "Networking with Drugs." Mol BioSyst 4: 14-17.
Xu, R. and D. Wunsch, 2nd (2005). "Survey of clustering algorithms." IEEE Trans Neural Netw 16(3): 645-78.
Yildirim, M. A., K. I. Goh, et al. (2007). "Drug-target network." Nat Biotechnol 25(10): 1119-1126.