Upload
dothuan
View
215
Download
0
Embed Size (px)
Citation preview
1
Supplementary Material on “Drug-Target Network” Muhammed A. Yıldırım, Kwang-Il Goh, Michael E. Cusick, Albert-László Barabási & Marc Vidal
Contents
I. Chemical Similarity between Drugs Targeting the Same Protein
II. Construction of the Drug Target Network
III. Component Size Distributions of DN and TPN
IV. Topological features of DN and TPN
V. Essentiality of Drug Targets
VI. Drug Targets in the Human Protein-Protein Interaction Network
VII. Topological Features of Drug Targets in the Human PPI Network
VIII. Human Disease Network
IX. Properties of Drug Targets in the Human Disease Network
X. Expression Profiles of Drug Targets and Disease Genes
XI. Supplementary References
2
I. Chemical Similarity between Drugs Targeting the Same Protein
Large numbers of drugs target common proteins, e.g. the Histamine H1 Receptor (HRH1)
(targeted by 51 drugs), the Muscarinic 1 Cholinergic Receptor (CHRM1) (targeted by 48 drugs),
the α1A Adrenergic Receptor (ADRA1A) (targeted by 42 drugs), and the Dopamine Receptor D2
(DRD2) (targeted by 40 drugs). All of these drugs are chemically different, but they might show
chemical similarities due to a common ancestry. The DrugBank website provides a chemical
search tool that can be used to find drugs with similar structures to a submitted query chemical.
We developed a web robot to utilize this tool, and we built networks for the drugs targeting the
same protein. An exact match gives a score of 20, so we set a cut-off of 15 to draw an edge
between two proteins (Supplementary Fig. 1a). Proteins targeted by many drugs tend not to be
connected to any other drug in the chemical similarity network showing that their chemical
structure is unique. However, there are also drugs connected to each other exhibiting a clique-
like nature. For instance, drugs targeting HRH1 have 5 such cliques composed of 36 drugs
(Supplementary Fig. 1b), drugs targeting ADRA1A have 2 cliques composed of 14 drugs
(Supplementary Fig. 1c), drugs targeting DRD2 have 3 cliques composed of 22 drugs
(Supplementary Fig. 1d), and drugs targeting CHRM1 have 2 cliques composed of 21 drugs
(Supplementary Fig. 1e).
II. Construction of the Drug Target Network
DrugBank combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with
comprehensive drug target (i.e. sequence, structure, and pathway) information. We downloaded
the drug and drug target information from DrugBank Database as of March 29 2006. We used
the SwissProt ID to discriminate only drugs targeting human proteins. There are 890 FDA
approved drugs and 808 experimental drugs complying with these criteria. The FDA-approved
drugs target 394 human proteins, and experimental drugs target 731 human proteins. The total
number of drug target proteins reported in Drugbank is 1,011.
By using the drug – target associations we generated the bipartite graph shown in Fig. 2.
We used Pajek (http://vlado.fmf.uni-lj.si/pub/networks/pajek/) to visualize the network. In this
map, circular nodes represent the drugs and the rectangular boxes the drug target proteins. The
area of the nodes is proportional to the degree in the network. The coloring for the drug nodes
was done following the Anatomical Therapeutic Chemical (ATC) classification code provided
3
by DrugBank. ATC codes are controlled by the World Health Organization Centre for Drug
Statistics Methodology. Drugs are classified into groups at five different levels, and we used the
first level (the main category) to color the nodes. For drugs with more than one classification, we
applied majority voting among different codes. Target proteins were colored according to their
cellular component profiles, by filtering the Gene Ontology information to map the components
to Membrane, Cytoplasm, Organelles, Nucleus, Secreted and Not Available (including other
cellular components and unknown proteins). We considered proteins residing in organellar
membranes as membrane proteins. The length of the edges in this network was varied to make
the graph layout optimally viewable.
Next, we generated the drug network (DN) and the target protein network (TPN)
projections of the DTN. In the DN, nodes represent drugs and the connections are made when
two drugs share at least one target protein (Supplementary Fig. 2). In the protein centric TPN,
protein nodes are connected if they are both targeted by at least one drug simultaneously (Fig. 3).
The edge thickness between two proteins in the TPN is proportional to the number of drugs
targeting these proteins together. The color scheme for the DN and TPN is the same as the DTN.
Layouts of all networks were generated by a simple force-directed algorithm, followed by a local
manual rearrangement for visual clarity, while leaving the overall layout of the network
unperturbed.
III. Component Size Distributions of DN and TPN
The DN and TPN exhibit different component size distributions compared to 10,000 control
randomized networks obtain by randomizing the drug – target protein associations while keeping
both the number of proteins that a drug targets and the number of drugs that a protein is targeted
fixed (Supplementary Fig. 3a,b). The giant component of the DN contains 53% of all drugs,
whereas the giant component of the TPN contains 31% of all drug target proteins. According to
random graph theory, as the density of edges in a graph increases a giant component forms
whose size scales extensively and generally the proportion of nodes in the giant component is
around 80-90%1. Our randomized networks are no exception to this generalization; however, our
observed network giant component sizes are much smaller than the random expectation. A
similar pattern is observed after the inclusion of experimental drugs and their targets
(Supplementary Fig. 3c,d). The pattern is reversed for experimental drugs for which the giant
4
component size is larger than the random case (Supplementary Fig. 3e,f). Generally, the sizes
of the second and third largest components are larger than the random control in all graphs.
For a given drug category, the number of drugs in the giant component and the number of
distinct components that the drugs in this category are given in the following table:
Drug Category Total Number of Drugs
Number of Drugs in the Giant Component
Number of Components that the category is present
Anti-Infectives 16 1 4 Antineoplastics 95 9 46 Antiparasitic 7 0 2 Blood 39 1 13 Cardiovascular 151 93 23 Dermatological 25 1 8 Genito-Urinary 52 39 6 Hormones 20 0 9 Metabolism 91 40 26 Musculoskeletal 54 13 10 Nervous System 211 194 7 Respiratory 65 57 4 Sensory Organs 36 22 4 Various 21 6 15
Tyrosine kinase inhibitors generally function through targeting many proteins at once.
Especially after the success of Imatinib, many kinase inhibitors were developed and are currently
in the approval pipeline. These drugs might be responsible for the polypharmacology and more
random associations observed in the experimental drugs. To test this effect, we excluded all the
tyrosine kinase inhibitors, i.e. a drug that has at least one tyrosine kinase target, from the giant
component analysis. The giant component sizes for the drugs and the targets were 616 and 596
respectively. After 104 randomizations of the drug and target associations, the expected giant
component sizes for the experimental drugs and their targets were 599 ± 12 and 551 ± 10. The
effect of polypharmacology results in a larger than expected giant component size. Upon
excluding the tyrosine kinase inhibitors, the giant component size of the drug network decreased
to 529 which is smaller than the expected giant component size of the randomized networks (549
± 12). However, the observed giant component size of the target network (472) remains larger
5
than the expected giant component size (424 ± 9). Hence, we cannot conclude that the tyrosine
kinase inhibitors are the major contributors to the increased polypharmacology.
IV. Topological features of DN and TPN
A scale-free degree distribution, where degree is the number of connections per node, implies a
preferential attachment model of network growth, particularly when new network nodes are
added one at a time, as in the growth of the drug target network. Both the DN and TPN exhibit
an apparent scale-free like degree (k) distribution (Supplementary Fig. 4a-d), with most drugs
(proteins) linked to only a few other drugs (proteins), while a few drugs (proteins) represent hubs
that are connected to a large number of distinct drugs (proteins). The exponents of the observed
scale-free distributions are small compared to other biological networks. A small exponent
indicates a strong preferential attachment model for network growth2. These results imply that
most new connections are made through a handful of existing nodes, especially through highly
targeted proteins. Upon inclusion of experimental drugs the TPN degree distribution still shows
an apparent scale-free distribution, although the exponent value decreases from 1.59 ± 0.17 to
1.05 ± 0.10. This indicates that experimental drugs introduce more random associations between
the targeted proteins. There is no significant change in the exponent of the DN after the addition
of experimental drugs.
Both the DN and TPN display modular structure. A module in the DN consists of group
of related drugs that target more than one common target, while a module in the TPN
corresponds to one or more drugs targeting all or most proteins in the module. In the TPN
modules appear as complete cliques, where all proteins in the clique are fully connected to the
other members of the clique (Fig. 3). The average clustering coefficient measures the modularity
or cliquishness of a network3, 4. To test for the clustering coefficient, we generated randomized
versions of the DN and the TPN by randomizing the connections between drugs and their targets
in the bipartite DTN while keeping the degree distributions constant. Then we projected
randomized DN and TPN from this randomized DTN. In the DN 509 out of 810 drugs have
clustering coefficient 1, and the average clustering coefficient (0.839 ± 0.010) is many times
larger than the average of clustering coefficients (0.100 ± 0.004) of 104 randomized networks
(Supplementary Fig. 4e). Similarly, the average clustering coefficient of the TPN (0.617 ±
0.025) is an order of magnitude larger than for randomized networks (0.045 ± 0.006). Higher
6
values of clustering coefficient mostly comes from drugs targeting three or more proteins or
proteins targeted by three or more drugs at once. Upon inclusion of experimental drugs the
average clustering coefficient of the TPN increased from 0.617 ± 0.025 to 0.783 ± 0.011
(Supplementary Fig. 4e), which is again higher than the average clustering coefficient of
randomized networks (0.121 ± 0.004), hinting at two trends: (i) there are many experimental
drugs targeting more than two new target proteins and (ii) many experimental drugs introduce
new connections between old targets.
V. Essentiality of Drug Targets
To predict the essentiality of a human gene, we used the phenotype information of the
corresponding mouse orthologs. A human gene was defined as “essential” if a knock-out of its
mouse ortholog confers lethality. We obtained the human-mouse orthology and mouse
phenotype data from Mouse Genome Informatics5 on January 3, 2006. We considered the classes
of embryonic/prenatal lethality and postnatal lethality as lethal phenotypes, and the rest of
phenotypes as non-lethal ones. There were 1,267 mouse-lethal human orthologs, of which 77 are
approved drug targets and of which 149 are targets of all drug targets (including both approved
and experimental drugs). The fraction of essential proteins for approved targets, and all targets
are shown.
Approved Targets All Targets All Proteins
Essential 77 (19%) 149 (15%) 1,267
Non-Essential 145 (37%) 287 (28%) 1,811
Unknown 172 (44%) 575 (57%)
Cancer drugs might selectively act to terminate cancer cells by targeting essential gene
products. To test this, we looked at the proportion of essential proteins among the targets of
approved oncology drugs. Out of 61 protein targets of oncology drugs 17 were essential (28%),
which is only slightly higher than the ratio of the essential proteins in targets of the all approved
drugs (19%).
7
VI. Drug Targets in the Human Protein-Protein Interaction Network
We looked at the distribution of drug targets in the human protein-protein interaction (PPI)
map8,9. By starting from a drug target or any protein in the network (for the randomized control)
we looked at the fraction of drug targets for each distance while applying the breadth-first search
algorithm. The distance is defined as smallest number of edges between pairs of nodes. Rather
than being random, we saw an enhancement in distances 1 and 2, the fraction of drug targets
being larger at distance 2 than fraction of drug targets at distance 1 (Supplementary Fig. 5a,b).
If there were a naïve clustering of the drug targets, we would expect to see a monotonic decrease
of fractions with the distance, as observed for the drug targets in the disease gene network (Fig.
5c). Hence the distance 2 enhancement is an inherent feature of the drug targets in the PPI.
This apparent increase in distances 1 and 2 also relates to the families of ‘druggable’
proteins. Having an increase in distance 2 means that the drugs share a protein interactor rather
than interacting directly between themselves. For instance, G protein coupled receptors (GPCR)
show a much sharper peak around distance 2, which can be explained by the fact that GPCRs
interact with G proteins and rarely interact with each other (Supplementary Fig. 5c). But if we
look at kinases, the peak around distance 1 is higher but the peak around distance 2 is still
prominent (Supplementary Fig. 5d). This shows that some kinases interact with each other, but
others share a protein interactor rather than directly interacting with each other. Another
interesting feature of kinases is the decrease after distance 3 for both the kinase group and the
random control. This result indicates that kinases are central in the network and they usually can
be reached from any node within 3 interactions.
We also measured the number of shared neighbors for two proteins if the two proteins are
targeted simultaneously by a drug. We saw that for both targets of approved drugs and targets of
all drugs there is a significant increase for the shared neighbors, as suggested by distance 2
enhancement in the fraction measure. The number of shared neighbors for drug targets is given
in the table below:
Approved Targets All Targets
Observed 91 330
Expected 11 ± 3 96 ± 11
P-Value 10-22443 10-14812
8
VII. Topological Features of Drug Targets in the Human PPI Network
Aside from the degree of a node in the Human PPI (Fig. 4c), one can also measure average
Betweenness (number of shortest paths passing through the node), average Closeness (inverse of
the average length of the shortest paths calculated for all pairs starting from a particular node)
and average Clustering Coefficient (see Methods) of the nodes. We used Pajek to calculate all
these topological features for the human PPI network.
The degree of a node in the human PPI network correlates well with essentiality10. In
order to discriminate the effect of essential proteins that would inflate the average degree of the
drug targets, we divided each protein group into essential and non-essential sets. All essential
components of a protein group have significantly higher average degree than the corresponding
non-essential part (Supplementary Fig. 5e). However, drug targets have higher average degree
compared to the network average if we compare essential and non-essential parts separately.
Hence, our conclusion about the degree of drug targets would not be influenced by the essential
proteins in the group. This degree difference might be due to investigational bias towards known
drug target proteins, thus yielding more interactors for them, especially in the literature curated
dataset (PPI information obtained from methods like Y2H is unbiased8; however, Y2H is known
to show fewer interactions for membrane proteins, and drug targets are frequently membrane
proteins).
Using the degree information, we can also heuristically calculate the expected ratio of the
drug targets, or ‘druggable genome’. First, we binned the data according to their degree, i.e. we
assigned certain range of degrees to a single bin. Initially each bin was the degree of the protein
itself. However, it is well known that protein interaction networks are scale-free, so logarithmic
binning was the second choice for different logarithmic bases. The ratio of the druggable
genome was estimated by:
( )
∑∑
=
ii
iii
n
nbinratio
*Pr
where ibin is the i th bin, ( )ibinPr is the ratio of number of target proteins belonging to the i th
bin to the number of all target proteins, and in is the number of proteins, regardless of being a
target or not, in the i th bin. Linear binning gives a fraction of 10%. By increasing the base of the
logarithm, this ratio increased from 11% for the base 1.25 to 20% for the base 2. Larger
9
logarithms bin large chunks of data together, so we believe that 10-20% is a reasonable interval,
which corresponds to 3,000 to 6,000 proteins. This estimate agrees well with the earlier
estimates of disease-modifying genes which was around 3,000 – 10,000 genes11.
Betweenness quantifies the expected information flow from a node. Betweenness of the
drug targets is larger than the randomized case (Supplementary Fig. 5f). This shows that drugs
preferentially target proteins that are more central to the cell in terms of decision flow. As drugs
change the phenotypic state of the cell, it is not surprising to see that drug targets have higher
average betweenness. Closeness also measures centrality and the behavior of different protein
classes is similar to their behavior for the betweenness measure (Supplementary Fig. 5g). The
average clustering coefficient profile shows a different behavior than the other two measures.
The average clustering coefficient of the disease related proteins is higher than the other protein
classes (Supplementary Fig. 5h). Targets of all drugs (both approved and experimental) has
significantly lower average clustering coefficient. This might arise because most drug targets are
preferentially distance 2 away for this class of proteins (Supplementary Fig. 5b) and this is
indicative of fewer inter-neighbor connections.
We also looked at the topological features of two protein families that are prominent
‘druggable’ targets: GPCRs and protein kinases. We obtained the list of GPCRs from the
Molecular Class-Specific Information System (MCSIS) project website (http://www.gpcr.org)
and list of kinases from Gene Ontology (GO) database. We found 285 GPCR proteins (excluding
the taste receptors) and 464 protein kinases. Interestingly GPCRs have very low degree
compared to the other proteins, whereas kinases show large connectivity (Supplementary Fig.
5e). This trend persists for the betweenness (Supplementary Fig. 5f).
VIII. Human Disease Network
The human disease network (HDN) is reported elsewhere6. We briefly describe the dataset and
procedure to build this network. The data was obtained from the Morbid Map (MM) of the
Online Mendelian Inheritance in Man (OMIM)7, which has the most up-to-date disorder-gene
associations. The MM was downloaded as of 21 December 2005. Each entry of the MM contains
information about the name of the disorder, associated gene symbols, OMIM ID, and the
chromosomal location. The “strong” associations, i.e. at least one mutation in the gene is proven
to be causative to the disorder, have the “(3)” label, and there were 2,929 disorder terms with this
10
tag. We merged these disorders into 1,284 “distinct” disorders first automatically and then by
manual validation. A broader classification into 22 disorder classes was done manually,
including 155 disorders assigned to the “multiple” class for disorders with multiple clinical
features, and 31 disorders assigned to the “unknown” class. The map in Fig. 5a was obtained by
combining disease nodes (circles) to gene nodes (rectangles). Coloring of the nodes represents
the disease classes. In the map, genes that are associated with only a single disease are omitted
for clarity and aesthetics.
IX. Properties of Drug Targets in the Human Disease Network
We looked at the average number of genes associated with each disease and the average number
of disorders associated with each gene. When one of the genes associated with a disorder is also
a drug target (for both approved and experimental targets) the average number of genes
associated with that disease is higher than the random case (Supplementary Fig. 6a).
For the genes that are also approved drug targets the average number of associated disorders was
higher than random, whereas this average was slightly lower for experimental drug targets
(Supplementary Fig. 6b). These results indicate that experimental drug targets are more
peripheral in the disease gene network. From these observations we would expect the average
degree of target of approved drugs to be highest among all gene classes (Fig. 5b), but we see an
opposite trend. This is largely due to approved drug targets being peripheral at the disease gene
network as well. For instance cancer genes that are targeted by approved drugs are peripheral at
the cancer disease cluster where most of the higher degree genes reside in the center (Fig. 5b).
For different categories of genes, the ratio of the diseases associated with targets of approved
drugs change drastically. However, the average degree of targets of drugs approved after 1996
increases from 3.58 ± 0.47 to 4.17 ± 0.52. Note that the degree in the human disease gene
network is correlated with the number of diseases the gene is involved with, and the number of
other genes associated with the same diseases. This increase could indicate investigational bias
towards diseases with few or no effective indicated drugs, hence producing larger numbers of
associated disease genes.
11
X. Expression Profiles of Drug Targets and Disease Genes
We used microarray data available for 36 normal human tissues12. There is expression
information for 293 approved drug targets, 808 of all (approved plus experimental) drug targets.
First we looked at the co-expression correlations between the proteins that are targeted by the
same drug (Supplementary Fig. 7a). We see a significantly more co-expression for the targets
of approved drugs (Kolmogorov – Smirnov test, P < 10-9).
To quantify how well at least one of the targets of the drugs is co-expressed with
corresponding disease-causing genes, we took the maximum of the Pearson Correlation
Coefficients (PCC) obtained from the drug-disease pairs, where at least one drug target protein
and one corresponding disease gene product were in the expression dataset. The distribution has
a second peak around 1 (Supplementary Fig. 7b), corresponding to a drug targeting exactly the
disease-causing gene product. Other than this peak, drug target – disease gene pairs show less
correlation than random. For the random set two groups of genes with the same number of
disease genes and drug targets were selected randomly for each drug – disease pair. The
neurological, respiratory, psychiatric and endocrine disease classes show more co-expression; in
contrast ophthamological, gastrointestinal and immunological disease classes show less
(Supplementary Fig. 7c).
12
XI. Supplementary References
1. Newman, M. Scientific collaboration networks. I. Network construction and fundamental results. Phys. Rev. E 64, 016131 (2001).
2. Albert, R. & Barabasi, A.L. Statistical mechanics of complex networks. Reviews of Modern Physics 74, 47-97 (2002).
3. Ravasz, E., Somera, A., Mongru, M., Oltvai, Z. & Barabási, A.-L. Hierarchical organization of modularity in metabolic networks. Science 297, 1551-1555 (2002).
4. Watts, D. & Strogatz, S. Collective dynamics of 'small-world' networks. Nature 393, 440-442 (1998).
5. Eppig, J.T. et al. The Mouse Genome Database (MGD): from genes to mice--a community resource for mouse biology. Nucleic Acids Res. 33, D471-475 (2005).
6. Goh, K.I. et al. The human disease network. Proc. Natl. Acad. Sci. USA 104, 8685-8690 (2007).
7. Hamosh, A., Scott, A.F., Amberger, J.S., Bocchini, C.A. & McKusick, V.A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 33, D514-517 (2005).
8. Rual, J.-F. et al. Toward a proteome-scale map of the human protein-protein interaction network. Nature 437, 1173-1178 (2005).
9. Stelzl, U. et al. A human protein-protein interaction network: A resource for annotating the proteome. Cell 122, 957-968 (2005).
10. Jeong, H., Mason, S.P., Barabasi, A.L. & Oltvai, Z.N. Lethality and centrality in protein networks. Nature 411, 41-42 (2001).
11. Hopkins, A.L. & Groom, C.R. The druggable genome. Nat. Rev. Drug Discov. 1, 727-730 (2002).
12. Ge, X. et al. Interpreting expression profiles of cancers by genome-wide survey of breadth of expression in normal tissues. Genomics 86, 127-141 (2005).
Anisotropine Methylbromide
Atropine
Benzquinamide
Benztropine
Bethanechol
Biperiden
Buclizine
Carbachol
Carbinoxamine
Cevimeline
Clidinium
Cryptenamine
Cyclizine
Cyclopentolate
Cycrimine
Desipramine
Dicyclomine
Diphenidol
Doxylamine
Ethopropazine
Flavoxate
Glycopyrrolate
Homatropine Methylbromide
Hyoscyamine IpratropiumMethantheline
Methscopolamine
Metixene
Metoclopramide
Olanzapine
Oxybutynin
Oxyphencyclimine
Oxyphenonium
Pilocarpine
Pirenzepine
Procyclidine
Promazine
Promethazine
Propantheline
Propiomazine
Scopolamine
Solifenacin succinate SuccinylcholineTolterodine
Tridihexethyl
TriflupromazineTrihexyphenidyl
Trospium
Acetophenazine
Amantadine
Apomorphine
Aripiprazole
Bromocriptine
Buspirone
Cabergoline
Chlorpromazine
Chlorprothixene
Cinnarizine
Clozapine
DomperidoneDroperidol
Flupenthixol
Fluphenazine
Haloperidol
Levodopa
Loxapine
Mesoridazine
Metoclopramide
Minaprine
OlanzapinePergolide
Perphenazine
Pimozide Pramipexole
Prochlorperazine Promazine
Promethazine
Propiomazine
Quetiapine
Remoxipride
Risperidone
Ropinirole
Sulpiride
Thiethylperazine
ThioridazineTrifluoperazine
Triflupromazine
Ziprasidone
Alfuzosin
Amiodarone
Amphetamine
Benzphetamine
Betanidine Carvedilol Dapiprazole Doxazosin Epinastine
Epinephrine
Ergotamine
Flupenthixol
Guanadrel Sulfate
GuanethidineLabetalol
Maprotiline
Metaraminol
MethoxamineMidodrine
Nefazodone
Nicergoline
Nilutamide
Norgestrel Oxymetazoline
Perphenazine
Phenoxybenzamine
Phenylephrine Phenylpropanolamine
Prazosin
PromazinePromethazine Propiomazine
Pseudoephedrine
Risperidone
TamsulosinTerazosin
Thiethylperazine
Thioridazine
Tolazoline
Nefazodone
Trifluoperazine
Ziprasidone
Astemizole
Azatadine
Azelastine
Benzquinamide
Bromodiphenhydramine
Brompheniramine
Buclizine
Carbinoxamine
Cetirizine
Chlorpheniramine
Cinnarizine
Clemastine
Clozapine
Cyclizine
Desipramine
Desloratadine
Dexbrompheniramine
Dimenhydrinate
Diphenhydramine
Diphenylpyraline
Doxepin
Doxylamine
Emedastine
Epinastine
Fexofenadine Histamine Phosphate
Hydroxyzine
Ketotifen Fumarate
Levocabastine
Loratadine
Maprotiline
Meclizine
Mequitazine
Methdilazine
Nedocromil
Olanzapine
Olopatadine
Orphenadrine
Pemirolast
Prochlorperazine
PromazinePromethazine
Propiomazine
Risperidone
Terfenadine
Trazodone
Trimeprazine
Tripelennamine
Triprolidine
Ziprasidone
a) b)
c)
e)
Similarity Score = 20
Similarity Score = 18
Chlorpheniramine ChloroquineSimilarity Score = 16
Chlorpheniramine CarbinoxamineSimilarity Score = 19
Chlorpheniramine TropicamideSimilarity Score = 17
Chlorpheniramine OrphenadrineSimilarity Score = 15
Chlorpheniramine
Dexbrompheniramine
d)
Chlorpheniramine
Chlorpheniramine
Drugs Targeting ADRA1A
Drugs Targeting HRH1
Drugs Targeting CHRM1
Drugs Targeting DRD2
Supplementary Figure 1. Chemical similarity of drugs targeting same protein. a) Drugs with different chemical similarity scores for the input drug Chlorpheniramine. b-e) Chemical similarity networks for drugs targeting the same protein. Two drugs are connected if the similarity score is at least 15. A blue edge corresponds to similarity score of 15 or 16, orange edge corresponds to similarity score of 17 or 18, a red edge is similarity score of 19 or 20. Shaded regions are regions of high similarity showing a clique like nature. b) Chemical similarity network for drugs targeting HRH1. c) Chemical similarity network for drugs targeting ADRA1A. d) Chemical similarity network for drugs targeting DRD2. e) Chemical similarity network for drugs targeting CHRM1.
METABOLISMBLOODCARDIOVASCULARDERMATOLOGICALGENITO-URINARYHORMONESANTI-INFECTIVESANTINEOPLASTICMUSCULOSKELETALNERVOUS SYSTEMANTIPARASITICRESPIRATORYSENSORY ORGANSVARIOUS
Supplementary Figure 2. The Drug Network (DN). In the DN each node corresponds to a distinct chemical entity, and each node is colored based on the anatomical therapeutic chemical class it belongs to. The names of the 14 drug classes are shown on the right. The size of each node is proportional to the number of proteins targeted by the corresponding drug. The link thickness is proportional to the number of proteins shared by the drugs it connects. The nodes are colored according to their Anatomical Therapeutic Chemical Classification.
0 10 20 30 1259 14310.0001
0.001
0.01
0.1
1
10
100RandomAll Drugs
RandomExperimental Targets
0 10 20 725 782
RandomAll Targets
0 10 20 30 476 7880.0001
0.001
0.01
0.1
1
10
100RandomApp Drugs (DN)
0 5 10 122 302
RandomApproved Targets (TPN)
0 5 10 601 6400.0001
0.001
0.01
0.1
1
10
100RandomExperimental Drugs
0 5 10 550 596
Ave
rage
Num
ber
of C
ompo
nent
sA
vera
ge N
umbe
r of
Com
pone
nts
Ave
rage
Num
ber
of C
ompo
nent
sa) b)
c) d)
e) f)
Component Size Component Size
Supplementary Figure 3. Component Size Distributions of SN and TPN. a) for the drug network (DN), b) for the target protein network (TPN), c) for the DN after addition of the experimental drugs, d) for the TPN after addition of targets of the experimental drugs, e) for the DN consisting only of the experimental drugs, and f) for the TPN consisting only of targets of the experimental drugs.
1 10 100
1
10
100
1 10 50
1
10
100
1 10 100
1
10
100
1 10 100
1
10
100
Drug Network (DN) Target Protein Network (TPN)
DN + Experimental Drugs TPN + Targets of Experimental Drugs
15.022.1 ±−∝ k 17.059.1 ±−∝ k
16.030.1 ±−∝ k 1.005.1 ±−∝ k
a) b)
c) d)
TPN+EXPDN+EXP
TPN
DN
DNTPN DN+EXP
TPN+EXP
0
0.2
0.4
0.6
0.8
Ave
rage
Clu
ster
ing
Coe
ffic
ient Observed Random
e)
Supplementary Figure 4. Degree Distributions and Clustering Coefficients of DN and TPN. a) of the drug network (DN), b) of the target protein network (TPN), c) of the DN after addition of the experimental drugs, d) of the TPN after addition of targets of the experimental drugs. e) Clustering coefficient of the networks compared to the random case.
0.18
0.19
0.2
0.21
0.22
0.23
0.24
0.25
0.26
AllProteins
TargetProteins
(TP)
TP + EXP Disease Kinases GPCRs
Ave
rage
Clo
sene
ss
All Non Essential Essential
a) b)
0
0.05
0.1
0.15
0.2
1 2 3 4 5Distance
Frac
tion
Approved TargetsAll
0.05
0.1
0.15
0.2
0.25
1 2 3 4 5Distance
Frac
tion
All TargetsAll
f)
0
6
12
18
24
AllProteins
TargetProteins
(TP)
TP + EXP Disease Kinases GPCRs
Ave
rage
Deg
ree
All Non Essential Essential
0
0.02
0.04
0.06
0.08
0.1
0.12
AllProteins
TargetProteins
(TP)
TP + EXP Disease Kinases GPCRsAve
rage
Clu
ster
ing
Coe
ffic
ient All Non Essential Essential
0
0.001
0.002
0.003
AllProteins
TargetProteins
(TP)
TP + EXP Disease Kinases GPCRs
Ave
rage
Bet
wee
nnes
sAll Non Essential Essential
d)c)
h)g)
e)
0
0.1
0.2
0.3
0.4
1 2 3 4 5Distance
Frac
tion
GPCRsAll
0
0.05
0.1
0.15
0.2
1 2 3 4 5Distance
Frac
tion
KinasesAll
Supplementary Figure 5 . Drug Targets and Some Druggable Families in the Human Protein – Protein Interaction Network. a) Fraction of targets of approved drugs while applying a breadth-first search starting from either a target protein or a random protein in human PPI network. b) Doing the same for GPCRs. c) Doing the same for kinases. d) Doing the same for targets of all drugs (approved and experimental). e) Average degree of several classes of proteins in the human PPI network. f) Average betweenness of several classes of proteins in the human PPI network. g) Average closeness of several classes of proteins in the human PPI network. h) Average clustering coefficient of several classes of proteins in the human PPI network.
0
1
2
3
4
5
Disorders (1286) Disease with AppTargets (239)
Disease with ExpTargets (234)
Ave
rage
Num
ber
of A
ssoc
iate
d G
enes
1
1.5
2
Disease genes (1777) Approved Targets(166)
Experimental Targets(210)A
vera
ge N
umbe
r of
Ass
ocia
ted
Dis
orde
rs
a) b)
c)
0
50
100
150
200
250
Endo
crin
e
Hem
atol
ogic
al
Card
iova
scul
ar
Psyc
hiat
ric
Unc
lass
ified
Conn
ectiv
e_tis
sue_
diso
rder
Rena
l
Imm
unol
ogic
al
Nut
ritio
nal
Met
abol
ic
Bone
Ear,N
ose,
Thro
at
Neu
rolo
gica
l
Dev
elop
men
tal
Oph
tham
olog
ical
Resp
irato
ry
Canc
er
Mus
cula
r
Skel
etal
mul
tiple
Gas
troin
testi
nal
Der
mat
olog
ical
Cou
nt
With Drug TargetsNo Drug Targets
Disease Categories
Supplementary Figure 6. Disease and Gene Statistics in Human Disease Network. a) Average number of genes for all diseases in the data set compared to diseases which are associated with genes that are also approved (App) targets or experimental (Exp) targets. Numbers in the parenthesis show how many drugs are in each category. b) Average number of disorders associated with each disease compared to the average number of diseases for the genes that are also approved drug targets or experimental drug targets. Numbers in the parenthesis show how many genes are in each category. c) Count of disorders with genes that are also approved drug targets for different disease categories. Disease categories are ranked according to the ratio of number of disorders associated with approved drug targets to all disorders in that category (except the Nutritional disease class where there are only four disorders).
0
0.05
0.1
0.15
0.2
0.25
-0.5 -0.3 -0.1 0.1 0.3 0.5 0.7 0.9
Average Max PCC
Freq
uenc
y
Drug Targets vs Disease GenesRandom
0
0.05
0.1
0.15
0.2
-0.9 -0.7 -0.5 -0.3 -0.1 0.1 0.3 0.5 0.7 0.9
Expression PCC
Freq
uenc
yApproved Target ProteinsAll Target ProteinsAll Proteins
b)
c)
a)
0
0.1
0.2
0.3
0.4
0.5
0.6
Neu
rolo
gica
l(74)
Res
pira
tory
(36)
Unc
lass
ified
(7)
Psyc
hiat
ric(4
4)
Endo
crin
e(50
)
Met
abol
ic(2
7)
Nut
ritio
nal(7
)
Car
diov
ascu
lar(1
42)
Can
cer(1
10)
Hem
atol
ogic
al(4
4)
Ren
al(6
)
Bon
e(26
)
Der
mat
olog
ical
(4)
Con
nect
ive_
tissu
e_di
sord
er(5
4)
mul
tiple
(7)
Mus
cula
r(5)
Oph
tham
olog
ical
(36)
Gas
troin
testi
nal(1
1)
Imm
unol
ogic
al(4
2)
Disease Categories
Ave
rage
Max
PC
C
Supplementary Figure 7. Expression Profile Correlations. a) Frequency of expression correlations for proteins that are targeted by approved (or approved plus experimental) drugs simultaneously compared with all calculated correlations. b) Frequency of average maximum PCC between drug targets and corresponding disease genes. c) Ranked average max PCC for various disease categories. Numbers in the parenthesis show how many such drug-disease pairs were calculated in each disease category.