SwissTargetPrediction: a web server for target prediction of … · NucleicAcidsResearch,2014,Vol.42,WebServerissue W33 smallmoleculeinteractionsinthelastfewyearshasenabled researcherstodevelopligand-basedapproachesfortarget

W32ndashW38 Nucleic Acids Research 2014 Vol 42 Web Server issue Published online 03 May 2014doi 101093nargku293

SwissTargetPrediction a web server for targetprediction of bioactive small moleculesDavid Gfeller1 Aurelien Grosdidier1 Matthias Wirth1 Antoine Daina1 Olivier Michielin123

and Vincent Zoete1

1Swiss Institute of Bioinformatics (SIB) Quartier Sorge Batiment Genopode CH-1015 Lausanne Switzerland2Ludwig Institute for Cancer Research Centre Hospitalier Universitaire Vaudois Lausanne Switzerland and3Oncology Department Centre Hospitalier Universitaire Vaudois Lausanne Switzerland

Received January 29 2014 Revised March 24 2014 Accepted March 30 2014

ABSTRACT

Bioactive small molecules such as drugs or metabo-lites bind to proteins or other macro-molecular tar-gets to modulate their activity which in turn resultsin the observed phenotypic effects For this reasonmapping the targets of bioactive small molecules isa key step toward unraveling the molecular mecha-nisms underlying their bioactivity and predicting po-tential side effects or cross-reactivity Recently largedatasets of proteinndashsmall molecule interactions havebecome available providing a unique source of in-formation for the development of knowledge-basedapproaches to computationally identify new targetsfor uncharacterized molecules or secondary targetsfor known molecules Here we introduce SwissTar-getPrediction a web server to accurately predict thetargets of bioactive molecules based on a combina-tion of 2D and 3D similarity measures with knownligands Predictions can be carried out in five differ-ent organisms and mapping predictions by homol-ogy within and between different species is enabledfor close paralogs and orthologs SwissTargetPre-diction is accessible free of charge and without loginrequirement at httpwwwswisstargetpredictionch

INTRODUCTION

Molecular insight into the mode of action of bioactive smallmolecules is key to understanding observed phenotypespredicting potential side effects or cross-reactivity and op-timizing existing compounds (1ndash3) In particular mappingtheir targets is a crucial step toward providing a rational un-derstanding of small moleculersquos bioactivity For these rea-sons high-throughput reverse screening of chemical com-pounds against arrays of protein targets has become anintegral part of drug discovery pipelines (4) As a result

for many proteins such as specific kinases or phosphataseshundreds of small molecule ligands have been identifiedSuch large screening initiatives have also provided uniqueinsights into the specificity and pharmacology of proteinfamilies (15) Recently these data have been collected inseveral public databases like ChEMBL (6) or PubChem(7) storing information on bioactivities or ZINC (8) con-taining information on commercially available compoundsThese can be mined automatically to retrieve specific infor-mation for a large number of molecules

However molecular targets still remain unknown in sev-eral cases For instance phenotypic assays indicate whethera molecule is active or not without necessarily providingdirect information on its actual molecular targets (9ndash11)Moreover for most molecules experiments have been per-formed with a limited set of targets such as kinases orG protein-coupled receptors and possible off-target effectshave been rarely tested for Finally new molecules being de-veloped for specific purposes may have several targets thatare typically not known in advance For instance a recentstudy on a set of 802 drugs and interaction data assembledfrom seven different databases has shown that known drugshave on average six molecular targets on which they exhibitactivity (12) Identifying these secondary targets is crucialFirst it can indicate possible adverse side effects that mightarise when using the molecule thereby decreasing the attri-tion rate in clinical trials due to toxicity (1314) Second itprovides ways of repositioning (or repurposing) moleculesfor new applications This has become a central theme inpharmaceutical research in view of the difficulty to launchnew chemical entities In particular it is increasingly beingrecognized that several compounds traditionally used forone given application may actually show potent activity inother therapeutic settings (21516)

Computational predictions play an important role in nar-rowing down the set of potential targets and suggestingsecondary targets for known molecules (1315) In partic-ular the large amount of information collected on proteinndash

To whom correspondence should be addressed Tel +41 21 692 4053 Fax +41 21 692 4065 Email OlivierMichielinisb-sibchCorrespondence may also be addressed to Vincent Zoete Email VincentZoeteisb-sibch

Ccopy The Author(s) 2014 Published by Oxford University Press on behalf of Nucleic Acids ResearchThis is an Open Access article distributed under the terms of the Creative Commons Attribution License (httpcreativecommonsorglicensesby30) whichpermits unrestricted reuse distribution and reproduction in any medium provided the original work is properly cited

Nucleic Acids Research 2014 Vol 42 Web Server issue W33

small molecule interactions in the last few years has enabledresearchers to develop ligand-based approaches for targetprediction (117ndash20) With SwissTargetPrediction our goalis to provide a user-friendly web interface for a knowledge-based algorithm recently developed in our group (18)to predict the targets of bioactive small molecules Com-pared to other existing approaches SwissTargetPredictionhas several distinctive features First it enables combiningboth 2D and 3D similarity measures with known ligandsSecond it provides results in five different species Third itallows users to map predictions between and within organ-isms based on target homology

THE SWISSTARGETPREDICTION METHOD ANDDATASET

SwissTargetPrediction is based on the observation that sim-ilar bioactive molecules are more likely to share similartargets (121) Therefore the targets of a molecule can bepredicted by identifying proteins with known ligands thatare highly similar to the query molecule In this ligand-based strategy a major challenge is to accurately identifyand quantify similarity between the query molecule and theknown ligands Early approaches have focused on deter-mining chemical similarity by using molecular fingerprints(22) (sometimes called 2D similarity) While compounds ex-hibiting a high similarity under these measures clearly havean increased likelihood for interactions with similar tar-gets the biophysics of molecular recognition suggests thatsimilarity in ligand shape or electrostatic potential distri-bution could also lead to a similar effect (23) Therefore3D structural similarity measures have been developed toassess similarity between molecules (24ndash29) Recently wehave shown that combining 2D and 3D similarity measuressignificantly increases the target prediction accuracy espe-cially if the query molecule is new and does not belong toan already well-studied chemical series (18) In SwissTarget-Prediction both 2D similarity and 3D similarity values arecomputed against a set of known ligands For 2D similar-ity we use FP2 fingerprints to describe molecules as imple-mented in OpenBabel version 220 The similarity betweentwo molecules is quantified with the Tanimoto coefficient(which corresponds to the number of shared fingerprintpatterns divided by the total number of fingerprint pat-terns describing the two molecules) For 3D similarity wefirst generate 20 different conformations of each molecule(see Supplementary Materials) From these different con-formations 20 Electroshape vectors which consist of 18-dimensional real vectors (27) are computed The Manhat-

tan distance(

d =18sum

s=1|xs minus ys |

)is used to compare vectors

(x and y) describing two different molecules The final 3Dsimilarity value between molecules i and j is computed as1

(1 + 1

18 di j) where dij is the smallest Manhattan distance

among the 20times20 distances calculated over all possible con-formations of each molecule (see also Supplementary Ma-terials) The final score of a target corresponds to a combi-nation of similarity measures based on a logistic regressionof the similarity values with the most similar ligands usingboth 2D and 3D similarity measures (see SupplementaryMaterials and (18)) Coefficients of the logistic regression

for each molecule size are listed in Supplementary TableS1 Target scores range therefore between 0 and 1 with thelargest possible value being reached if the query moleculeis a known ligand of the target These scores are used torank predicted targets A probability has been derived fromthis score to assess the likelihood of the predictions to becorrect These probability values correspond to the aver-age precision (ie number of true-positives divided by thetotal number of predicted targets at different thresholds)obtained in a leave-one-out cross-validation study over ourtraining set (see Supplementary Materials) As it is based oncross-validation they may suffer from internal biases in ourtraining data (eg presence of large congeneric series of sim-ilar molecules) and if a new query molecule without relatedmolecules in our database is tested they may slightly over-estimate the prediction accuracy For this reason we stressthat these probabilities are primarily used to rank targetspredicted to bind to a given small molecule In particularthey should not be used to compare predictions obtainedwith different molecules

The set of proteinndashligand interactions was retrieved fromthe ChEMBL database version 16 (6) using stringent cri-teria to remove ambiguous cases First only interactionsinvolving single proteins or protein complexes as well asligands with less than 80 heavy atoms were consideredSecond selected interactions had to be annotated as di-rect binding (lsquoassay typersquo = lsquoBrsquo) with an activity (Ki KdIC50 or EC50) lower than 10 M in all assays Interactionswere retrieved in five organisms (human mouse rat cowand horse) In total our dataset consists of 280 381 smallmolecules interacting with 2686 targets with the majorityof targets (66) found in human (see Table 1)

THE SWISSTARGETPREDICTION WEB INTERFACE

SwissTargetPrediction provides an intuitive interface to pre-dict small molecule protein targets (see also Supplemen-tary Figure S1) Query molecules can be inputted either asSMILES or drawn in 2D using the javascript-based molec-ular editor of ChemAxon (httpwwwchemaxoncom) TheSMILES input field and the 2D interface are automaticallysynchronized The organism in which predictions should bemade can be selected The current version of SwissTarget-Prediction allows users to choose between five organismshuman mouse rat cow and horse the default being human(see Supplementary Figure S1) Once a molecule has beenprovided either by SMILES or by drawing and an organ-ism has been chosen the lsquoSubmitrsquo button becomes clickableand calculations can start The SMILES is first checked toensure that it corresponds to a valid chemical structure Iftrue the similarity (both 2D and 3D) between the querymolecule and all ligands in our database is computed andthe score of each target is derived from the combined 2Dand 3D similarity values with the most similar ligands (seeSupplementary Materials)

The result page lists the predicted targets with their com-mon name together with links to GeneCards (30) (for hu-man proteins) UniProt (31) and ChEMBL (6) databaseswhen available (see Figure 1) Targets are ranked accord-ing to their score with respect to the query molecule Thetarget classes are displayed in the last column These classes

W34 Nucleic Acids Research 2014 Vol 42 Web Server issue

Table 1 Number of targets in each organism

Organisms Number of targetsNumber of targets including homology-basedpredictions

Homo sapiens 1768 2547Mus musculus 342 2345Rattus norvegicus 469 2657Bos taurus 104 2272Equus caballus 3 2367Total 2686 12 188

The first column shows the number of targets with experimental data The second column shows the number of targets when including homology-basedpredictions

Figure 1 Prediction result page This page shows the list of predicted tar-gets for the query molecule (here chlorotrianisene) Targets are ranked ac-cording to their scores Links to GeneCards (under lsquoCommon namersquo col-umn) UniProt and ChEMBL (when available) are provided Green barsindicate the estimated probability of a protein to be a true target given itsscore The sixth column ( sim cmpds 3D2D) shows the number of lig-ands of the predicted target or its homologs that display similarity withthe query molecule based on either 2D or 3D similarity measures Thesenumbers are linked to pages containing information about these ligandsFor instance the number circled in red provides a link to the list of ligandsof ESR1 or its homologous proteins that display similarity with the querymolecule (see Figure 2A) The pie chart shows the distribution of targetclasses Predictions based on homology are indicated with lsquo(by homology)rsquo(see the green box)

were retrieved from the ChEMBL target annotation and ingeneral correspond to the l1 level in the target classification(6) Exceptions include enzymes and transcription factorsfor which more detailed classification based on l2 or l3 lev-els is sometimes shown if they occur frequently in the targetlist (eg Tyr kinase see Figure 1) The pie chart on the topright of the page shows a summary of the different target

classes present among the predicted targets All results canbe downloaded as text (txt or csv) images (jpg) printablereport (pdf) copied to the clipboard or sent to an email ad-dress by clicking on the links following the lsquoRetrieve datarsquofield The probability derived from the target scores (seeSupplementary Materials) is displayed in the fifth columnas a horizontal bar (see Figure 1)

In the example of Figure 1 the predicted tar-gets of chlorotrianisene (CHEMBL1200761) includeProstaglandin GH synthase 1 (COX-1) and estrogenreceptor (ESR1) Chlorotrianisene is a known inhibitorof COX-1 (13) although the interaction is not presentin ChEMBL Moreover while no direct binding betweenchlorotrianisene and estrogen receptor is reported inChEMBL functional assay results in this database indi-cate that chlorotrianisene is active on estrogen receptor(32) These results show that in this case several of thepredictions are true-positives

To enable users to visually explore the ligands of the pre-dicted targets all ligands with a similarity (either 2D or 3D)larger than a minimal threshold value can be examined byfollowing the links provided in the sixth column Figure 2Ashows an example of the results obtained by following thelink in the red circle of Figure 1 Ligands are listed accord-ing to their similarity with the query molecule A thresholdfor 3D similarity values has been set to 075 and the onefor 2D similarity values to 045 Below these thresholds lig-ands show very low similarity with the query molecule andare not listed A link to the ChEMBL entries is providedfor the ligands and the similarity with the query molecule isindicated We note that manually exploring the ligands sim-ilar to the query molecule is strongly recommended to as-sess how reasonable the predictions are and to see what kindof ligands display the strongest similarity with the querymolecule

Finally help pages with interactive screenshots of thewebsite are available an FAQ page is provided to guideusers and some of the raw data used in the predictions canbe retrieved via the download page

HOMOLOGY-BASED PREDICTIONS

Proteins originating from a common ancestor in generaldisplay a high degree of sequence and structure similarityFrom a computational point of view this similarity has beenwidely used in protein structure and function prediction forinstance (3334) Recently it has been shown that the bind-ing of small molecules is also often conserved between ho-


Figure 2 (A) List of ligands of ESR1 or its homologous proteins display-ing 3D similarity with a query molecule (here chlorotrianisene) This pageis obtained by following the link in the red circle in Figure 1 Moleculesare ordered based on their 3D similarity with the query molecule (B) Listof ligands of ESR2 or its homologous proteins displaying similarity witha query molecule (here chlorotrianisene) obtained by following the linkin the green circle in Figure 1 If a molecule is a ligand of a homologousprotein of the predicted target the actual target as well as its organism isindicated (see the green box) When the most similar molecule is a ligandof a homologous protein the prediction is labeled as lsquoby homologyrsquo in theresult page (Figure 1) A link to the ChEMBL entry is provided for eachcompound

mologs (35ndash37) In particular orthologous proteins in closespecies such as human and rat often share most of their lig-ands (36) The same holds for paralogs although the de-gree of similarity between ligands of paralogous proteins isslightly lower than between orthologous proteins (36)

In SwissTargetPrediction we provide the possibility tomap predictions based on protein homology both withinand between organisms Orthologs and paralogs were re-trieved from Ensembl Compara (38) Treefam (39) andorthoDB (40) using the union of all three datasetsHomology-based predictions were carried out as followsthe query molecule is compared to all molecules that bind totargets that have homology with a protein in the selected or-ganism Predictions are then carried out as if the ligands ofthese proteins were actual ligands of their homologs in theselected organism If the ligand most similar to the querymolecule is only observed to bind to a homologous proteinpredictions are listed as lsquoby homologyrsquo on the SwissTar-getPrediction result page (see Figure 1 green box) More-

over in the list of ligands similar to the query moleculethose binding only to homologous targets are also desig-nated with lsquoBy Homologyrsquo and the actual target is indicated(Figure 2B green box) For instance in Figure 1 chloro-trianisene (CHEMBL1200761) is predicted to bind ESR2mainly because it shows similarity with ligands of ESR1(see Figure 2A) The predicted target ESR2 is therefore an-notated with lsquoby homologyrsquo (green box Figure 1) Figure2B shows the list of most similar ligands obtained by fol-lowing the link in the green circle of Figure 1 As the mostsimilar molecule is a ligand of ESR1 it is labeled with lsquoByhomologyrsquo and both the actual target and the organism aredisplayed We note that for organisms with less data (eghorse cow) many predictions might be based on homologywith targets in other species

Including homology-based predictions allowed us to ex-pand the list of predicted targets from 2686 to over 12 188in all five organisms studied here (see Table 1) As some ofthese proteins do not have reported bioactivity data directlyassociated with them they may not be in the ChEMBLdatabase This is the reason why for instance KCNH6 andKCNH7 do not have ChEMBL IDs in Figure 1 Homol-ogy relationships between all targets can be downloaded athttpwwwswisstargetpredictionchdownloadphp

VALIDATION DATASET

Extensive cross-validation of the SwissTargetPredictionalgorithm has been published previously (18) To comple-ment these data we also tested our method against a newset of molecules that are not present in the training set Inparticular we used molecules from version 17 of ChEMBL(6) that were not present in version 16 (ie not present inthe training set) We further required that each moleculebe involved in at least one positive (lt2 M) and onenegative (gt50 M) interaction This resulted in a set of 213molecules with 346 positive and 278 negative interactionsTo obtain a more balanced dataset that better reflects themuch larger number of non-interacting proteinndashligandpairs we included additional negative interactions by link-ing the molecules in our test set to randomly chosen targetspresent in ChEMBL (version 16) so as to have five timesmore negative than positive interactions for each moleculeThe full benchmark dataset can be downloaded on our web-site (httpwwwswisstargetpredictionchdownloadphp)We then ran the SwissTargetPrediction algorithm asimplemented on the website to assess how accurate thepredictions are This resulted in an average AUC value of087 on this external test set of both positive and negativeinteractions We also assessed how often the known targetsfall into the top predicted ones in the SwissTargetPredic-tion general output (see Figure 1) For 70 of the ligandsat least one of the known targets is found among thefirst 15 top predicted ones and for 31 of the ligands inour test set the best predicted target is a true-positiveFor instance molecule CHEMBL2325087 (SMILESNC(=S)N1N=C(CC1c1ccc2ccccc2c1)c1ccc(Cl)c(Cl)c1)binds to EGFR and ERBB2 with sub-micromolar activity(41) and these two targets are accurately predicted bySwissTargetPrediction (see Supplementary Figure S2)Although we cannot exclude that some molecules in our


test set were actually developed based on their similaritywith known ligands our results strongly indicate thatSwissTargetPrediction provides reliable predictions thatcan be used in follow-up experiments

DISCUSSION

SwissTargetPrediction has been primarily developed foridentifying targets of molecules known to be bioactive Nev-ertheless users can upload any small molecule real or vir-tual even without prior knowledge of its potential effectsIn this case the predicted targets may be relevant especiallyif the similarity with known ligands is high The predictionsmay also provide hints on how a compound or a scaffoldmight be chemically modified in order to increase its activ-ity on a given target by comparing with known ligands thatshare some similarity (see also (42)) However we point outthat prediction accuracy is expected to be significantly lowerfor molecules with unknown bioactivity This can be un-derstood by noting that SwissTargetPrediction will alwayssuggest some target based on the assumption that if themolecule is active it will likely bind to some protein Formolecules with unknown bioactivity this assumption is notvalid per se and the molecule may not bind to any protein inwhich case all predicted targets are false-positives In partic-ular inactive compounds can sometimes exhibit good sim-ilarity with active molecules if they have been obtained bymodifying an active compound at some key position thatwas crucial for its interactions This is a known limitationof ligand-based approaches when applied to any kind ofcompounds and therefore target predictions should be in-terpreted with care in the absence of indication of bioactiv-ity

Homology-based mapping of target predictions is in-creasingly being recognized as a powerful approach totranslate results obtained in model organisms to human(353643) In this work we have considered homology re-lationships between and within five vertebrate species forwhich most homologous proteins display a very high se-quence identity and similar functions Therefore we didnot filter out any homology relationship For more dis-tant organisms (eg worm or yeast) greater care shouldbe taken for instance by allowing only mapping betweenorthologous proteins that have conserved binding sites orhigh overall sequence identity Another possible issue withhomology-based mapping arises with molecules that arespecifically designed to target some members of a proteinfamily and not others Our algorithm as most other ligand-based methods will likely fail to detect these subtle differ-ences For instance in Supplementary Figure S2 moleculeCHEMBL2325087 is also predicted to bind to ERBB3 withequal probability although the experimental activity (51M) is much lower than for EGFR and ERBB2(41) To ad-dress such issues one possibility is to use other orthogonalcomputational approaches such as structure-based analy-ses or molecular docking (4445) to refine the predictionsby considering small changes in protein binding sites thatcould confer specificity to some targets

In SwissTargetPrediction we use a probability derivedfrom our cross-validation analysis to rank the targets andestimate the accuracy of the predictions Other approaches

have been proposed to assess the confidence of predictionsFor instance in Keiser et al (1) an E-value is computedfrom the 2D similarity with the set of ligands of a targetThis E-value is derived from the statistics of similarity val-ues with all ligands (above a certain threshold) while in ourcase only the most similar ligand according to each simi-larity measure is considered Our probabilities can be inter-preted in terms of precision (ie number of true-positives di-vided by the number of predicted targets) while E-values in-dicate how likely it would be to find a molecule with a givenaverage similarity to the set of ligands of a target In prac-tice the most similar ligands are those contributing most tothe E-value so the two approaches are not necessarily fun-damentally different Also predictions with very low prob-ability in our approach correspond to low similarity valuesand therefore would result in high E-values Importantlywe point out that by combining different kinds of chemicalsimilarity measures our approach can explore more diverseregions of the chemical space (18)

CONCLUSION AND OUTLOOK

SwissTargetPrediction is part of an important initiative ofthe Swiss Institute of Bioinformatics to provide online toolsfor computer-aided drug design many of which are alreadyavailable (424446ndash48) In future developments SwissTar-getPrediction will be further integrated with these tools forinstance by predicting potential binding modes with Swiss-Dock (44) Moreover as large screening campaigns are in-creasingly being carried out in different organisms both inindustry and academia (4950) SwissTargetPrediction willbe regularly updated and new organisms added to it Thiswill enable users to efficiently harness the wealth of publiclyavailable data to accurately predict new targets for bioactivesmall molecules in diverse species

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online includ-ing references [1ndash6]

ACKNOWLEDGMENT

We are thankful to Tomislav Ilicic for insightful commentsabout the web interface

FUNDING

Swiss Institute of Bioinformatics Source of open accessfunding Swiss Institute of Bioinformatics (core funding)Conflict of interest statement None declared

REFERENCES1 KeiserMJ RothBL ArmbrusterBN ErnsbergerP IrwinJJ

and ShoichetBK (2007) Relating protein pharmacology by ligandchemistry Nat Biotechnol 25 197ndash206

2 OpreaTI BaumanJE BologaCG BurandaT ChigaevAEdwardsBS JarvikJW GreshamHD HaynesMK HjelleBet al (2011) Drug Repurposing from an Academic Perspective DrugDiscov Today Therapeutic Strategies 8 61ndash69

3 JorgensenWL (2009) Efficient drug lead discovery and optimizationAcc Chem Res 42 724ndash733


4 ZieglerS PriesV HedbergC and WaldmannH (2013) Targetidentification for small bioactive molecules finding the needle in thehaystack Angew Chem Int Ed Engl 52 2744ndash2792

5 KaramanMW HerrgardS TreiberDK GallantPAtteridgeCE CampbellBT ChanKW CiceriP DavisMIEdeenPT et al (2008) A quantitative analysis of kinase inhibitorselectivity Nat Biotechnol 26 127ndash132

6 BentoAP GaultonA HerseyA BellisLJ ChambersJDaviesM KrugerFA LightY MakL McGlincheyS et al(2014) The ChEMBL bioactivity database an update Nucleic AcidsRes 42 D1083ndashD1090

7 BoltonE WangY ThiessenPA and BryantSH (2008) AnnualReports in Computational Chemistry Vol 4 American ChemicalSociety Washington DC

8 IrwinJJ SterlingT MysingerMM BolstadES andColemanRG (2012) ZINC a free tool to discover chemistry forbiology J Chem Inf Model 52 1757ndash1768

9 ClemonsPA (2004) Complex phenotypic assays in high-throughputscreening Curr Opin Chem Biol 8 334ndash338

10 IngleseJ JohnsonRL SimeonovA XiaM ZhengWAustinCP and AuldDS (2007) High-throughput screening assaysfor the identification of chemical probes Nat Chem Biol 3466ndash479

11 SmithAM AmmarR NislowC and GiaeverG (2010) A surveyof yeast genomic assays for drug and target discovery PharmacolTher 127 156ndash164

12 MestresJ Gregori-PuigjaneE ValverdeS and SoleRV (2009) Thetopology of drug-target interaction networks implicit dependence ondrug properties and target families Mol Biosyst 5 1051ndash1057

13 LounkineE KeiserMJ WhitebreadS MikhailovD HamonJJenkinsJL LavanP WeberE DoakAK CoteS et al (2012)Large-scale prediction and testing of drug activity on side-effecttargets Nature 486 361ndash367

14 KolaI and LandisJ (2004) Can the pharmaceutical industry reduceattrition rates Nat Rev Drug Discov 3 711ndash715

15 KeiserMJ SetolaV IrwinJJ LaggnerC AbbasAIHufeisenSJ JensenNH KuijerMB MatosRC TranTB et al(2009) Predicting new molecular targets for known drugs Nature462 175ndash181

16 IssaNT KrugerJ ByersSW and DakshanamurthyS (2013) Drugrepurposing a reality from computers to the clinic Expert Rev ClinPharmacol 6 95ndash97

17 DunkelM GuntherS AhmedJ WittigB and PreissnerR (2008)SuperPred drug classification and target prediction Nucleic AcidsRes 36 W55ndashW59

18 GfellerD MichielinO and ZoeteV (2013) Shaping the interactionlandscape of bioactive molecules Bioinformatics 29 3073ndash3079

19 GongJ CaiC LiuX KuX JiangH GaoD and LiH (2013)ChemMapper a versatile web server for exploring pharmacology andchemical structure association based on molecular 3D similaritymethod Bioinformatics 29 1827ndash1829

20 WangL MaC WipfP LiuH SuW and XieXQ (2013)TargetHunter an in silico target identification tool for predictingtherapeutic potential of small organic molecules based onchemogenomic database AAPS J 15 395ndash406

21 CampillosM KuhnM GavinAC JensenLJ and BorkP (2008)Drug target identification using side-effect similarity Science 321263ndash266

22 WillettP (2011) Similarity searching using 2D structural fingerprintsMethods Mol Biol 672 133ndash158

23 WirthM and SauerWHB (2011) Bioactive molecules perfectlyshaped for their target Mol Inform 30 677ndash688

24 BallesterPJ and RichardsWG (2007) Ultrafast shape recognition tosearch compound databases for similar molecular shapes J ComputChem 28 1711ndash1723

25 SastryGM DixonSL and ShermanW (2011) Rapid shape-basedligand alignment and virtual screening method based onatomfeature-pair similarities and volume overlap scoring J ChemInf Model 51 2455ndash2466

26 LiuX JiangH and LiH (2011) SHAFTS a hybrid approach for3D molecular similarity calculation 1 Method and assessment ofvirtual screening J Chem Inf Model 51 2372ndash2385

27 ArmstrongMS FinnPW MorrisGM and RichardsWG (2011)Improving the accuracy of ultrafast ligand-based screening

incorporating lipophilicity into ElectroShape as an extra dimensionJ Comput Aided Mol Des 25 785ndash790

28 Perez-NuenoVI VenkatramanV MavridisL and RitchieDW(2012) Detecting drug promiscuity using Gaussian ensemblescreening J Chem Inf Model 52 1948ndash1961

29 ArmstrongMS MorrisGM FinnPW SharmaR MorettiLCooperRI and RichardsWG (2010) ElectroShape fast molecularsimilarity calculations incorporating shape chirality andelectrostatics J Comput Aided Mol Des 24 789ndash801

30 SafranM DalahI AlexanderJ RosenN Iny SteinTShmoishM NativN BahirI DonigerT KrugH et al(2010) GeneCards Version 3 the human gene integrator Database(Oxford) doi 101093databasebaq020

31 UniProtC (2013) Update on activities at the Universal ProteinResource (UniProt) in 2013 Nucleic Acids Res 41 D43ndashD47

32 KupferD and BulgerWH (1990) Inactivation of the uterineestrogen receptor binding of estradiol during P-450 catalyzedmetabolism of chlorotrianisene (TACE) Speculation that TACEantiestrogenic activity involves covalent binding to the estrogenreceptor FEBS Lett 261 59ndash62

33 KieferF ArnoldK KunzliM BordoliL and SchwedeT (2009)The SWISS-MODEL repository and associated resources NucleicAcids Res 37 D387ndashD392

34 LoewensteinY RaimondoD RedfernOC WatsonJFrishmanD LinialM OrengoC ThorntonJ and TramontanoA(2009) Protein function annotation by homology-based inferenceGenome Biol 10 207

35 KlabundeT (2007) Chemogenomic approaches to drug discoverysimilar receptors bind similar ligands Br J Pharmacol 152 5ndash7

36 KrugerFA and OveringtonJP (2012) Global analysis of smallmolecule binding to related protein targets PLoS Comput Biol 8e1002333

37 ParicharakS KlenkaT AugustinM PatelUA and BenderA(2013) Are phylogenetic trees suitable for chemogenomics analyses ofbioactivity data sets the importance of shared active compounds andchoosing a suitable data embedding method as exemplified onKinases J Cheminform 5 49

38 VilellaAJ SeverinJ Ureta-VidalA HengL DurbinR andBirneyE (2009) EnsemblCompara GeneTrees completeduplication-aware phylogenetic trees in vertebrates Genome Res 19327ndash335

39 SchreiberF PatricioM MuffatoM PignatelliM and BatemanA(2014) TreeFam v9 a new website more species andorthology-on-the-fly Nucleic Acids Res 42 D922ndashD925

40 WaterhouseRM TegenfeldtF LiJ ZdobnovEM andKriventsevaEV (2013) OrthoDB a hierarchical catalog of animalfungal and bacterial orthologs Nucleic Acids Res 41 D358ndashD365

41 YangW HuY YangYS ZhangF ZhangYB WangXLTangJF ZhongWQ and ZhuHL (2013) Design modificationand 3D QSAR studies of novel naphthalin-containing pyrazolinederivatives withwithout thiourea skeleton as anticancer agentsBioorg Med Chem 21 1050ndash1063

42 WirthM ZoeteV MichielinO and SauerWH (2013)SwissBioisostere a database of molecular replacements for liganddesign Nucleic Acids Res 41 D1137ndashD1143

43 JacobL and VertJP (2008) Protein-ligand interaction prediction animproved chemogenomics approach Bioinformatics 24 2149ndash2156

44 GrosdidierA ZoeteV and MichielinO (2011) SwissDock aprotein-small molecule docking web service based on EADock DSSNucleic Acids Res 39 W270ndashW277

45 MorrisGM HueyR LindstromW SannerMF BelewRKGoodsellDS and OlsonAJ (2009) AutoDock4 andAutoDockTools4 Automated docking with selective receptorflexibility J Comput Chem 30 2785ndash2791

46 ZoeteV CuendetMA GrosdidierA and MichielinO (2011)SwissParam a fast force field generation tool for small organicmolecules J Comput Chem 32 2359ndash2368

47 GfellerD MichielinO and ZoeteV (2013) SwissSidechain amolecular and structural database of non-natural sidechains NucleicAcids Res 41 D327ndashD332

48 GfellerD MichielinO and ZoeteV (2012) Expanding molecularmodeling and design tools to non-natural sidechains J ComputChem 55 1525ndash1535


49 WallaceIM UrbanusML LucianiGM BurnsAR HanMKWangH AroraK HeislerLE ProctorM St OngeRP et al(2011) Compound prioritization methods increase rates of chemicalprobe discovery in model organisms Chem Biol 18 1273ndash1283

50 FrearsonJA and CollieIT (2009) HTS and hit finding inacademiandashndashfrom chemical genomics to drug discovery Drug DiscovToday 14 1150ndash1158


small molecule interactions in the last few years has enabledresearchers to develop ligand-based approaches for targetprediction (117ndash20) With SwissTargetPrediction our goalis to provide a user-friendly web interface for a knowledge-based algorithm recently developed in our group (18)to predict the targets of bioactive small molecules Com-pared to other existing approaches SwissTargetPredictionhas several distinctive features First it enables combiningboth 2D and 3D similarity measures with known ligandsSecond it provides results in five different species Third itallows users to map predictions between and within organ-isms based on target homology

THE SWISSTARGETPREDICTION METHOD ANDDATASET

SwissTargetPrediction is based on the observation that sim-ilar bioactive molecules are more likely to share similartargets (121) Therefore the targets of a molecule can bepredicted by identifying proteins with known ligands thatare highly similar to the query molecule In this ligand-based strategy a major challenge is to accurately identifyand quantify similarity between the query molecule and theknown ligands Early approaches have focused on deter-mining chemical similarity by using molecular fingerprints(22) (sometimes called 2D similarity) While compounds ex-hibiting a high similarity under these measures clearly havean increased likelihood for interactions with similar tar-gets the biophysics of molecular recognition suggests thatsimilarity in ligand shape or electrostatic potential distri-bution could also lead to a similar effect (23) Therefore3D structural similarity measures have been developed toassess similarity between molecules (24ndash29) Recently wehave shown that combining 2D and 3D similarity measuressignificantly increases the target prediction accuracy espe-cially if the query molecule is new and does not belong toan already well-studied chemical series (18) In SwissTarget-Prediction both 2D similarity and 3D similarity values arecomputed against a set of known ligands For 2D similar-ity we use FP2 fingerprints to describe molecules as imple-mented in OpenBabel version 220 The similarity betweentwo molecules is quantified with the Tanimoto coefficient(which corresponds to the number of shared fingerprintpatterns divided by the total number of fingerprint pat-terns describing the two molecules) For 3D similarity wefirst generate 20 different conformations of each molecule(see Supplementary Materials) From these different con-formations 20 Electroshape vectors which consist of 18-dimensional real vectors (27) are computed The Manhat-

tan distance(

d =18sum

s=1|xs minus ys |

)is used to compare vectors

(x and y) describing two different molecules The final 3Dsimilarity value between molecules i and j is computed as1

(1 + 1

18 di j) where dij is the smallest Manhattan distance

among the 20times20 distances calculated over all possible con-formations of each molecule (see also Supplementary Ma-terials) The final score of a target corresponds to a combi-nation of similarity measures based on a logistic regressionof the similarity values with the most similar ligands usingboth 2D and 3D similarity measures (see SupplementaryMaterials and (18)) Coefficients of the logistic regression

for each molecule size are listed in Supplementary TableS1 Target scores range therefore between 0 and 1 with thelargest possible value being reached if the query moleculeis a known ligand of the target These scores are used torank predicted targets A probability has been derived fromthis score to assess the likelihood of the predictions to becorrect These probability values correspond to the aver-age precision (ie number of true-positives divided by thetotal number of predicted targets at different thresholds)obtained in a leave-one-out cross-validation study over ourtraining set (see Supplementary Materials) As it is based oncross-validation they may suffer from internal biases in ourtraining data (eg presence of large congeneric series of sim-ilar molecules) and if a new query molecule without relatedmolecules in our database is tested they may slightly over-estimate the prediction accuracy For this reason we stressthat these probabilities are primarily used to rank targetspredicted to bind to a given small molecule In particularthey should not be used to compare predictions obtainedwith different molecules

The set of proteinndashligand interactions was retrieved fromthe ChEMBL database version 16 (6) using stringent cri-teria to remove ambiguous cases First only interactionsinvolving single proteins or protein complexes as well asligands with less than 80 heavy atoms were consideredSecond selected interactions had to be annotated as di-rect binding (lsquoassay typersquo = lsquoBrsquo) with an activity (Ki KdIC50 or EC50) lower than 10 M in all assays Interactionswere retrieved in five organisms (human mouse rat cowand horse) In total our dataset consists of 280 381 smallmolecules interacting with 2686 targets with the majorityof targets (66) found in human (see Table 1)

THE SWISSTARGETPREDICTION WEB INTERFACE

SwissTargetPrediction provides an intuitive interface to pre-dict small molecule protein targets (see also Supplemen-tary Figure S1) Query molecules can be inputted either asSMILES or drawn in 2D using the javascript-based molec-ular editor of ChemAxon (httpwwwchemaxoncom) TheSMILES input field and the 2D interface are automaticallysynchronized The organism in which predictions should bemade can be selected The current version of SwissTarget-Prediction allows users to choose between five organismshuman mouse rat cow and horse the default being human(see Supplementary Figure S1) Once a molecule has beenprovided either by SMILES or by drawing and an organ-ism has been chosen the lsquoSubmitrsquo button becomes clickableand calculations can start The SMILES is first checked toensure that it corresponds to a valid chemical structure Iftrue the similarity (both 2D and 3D) between the querymolecule and all ligands in our database is computed andthe score of each target is derived from the combined 2Dand 3D similarity values with the most similar ligands (seeSupplementary Materials)

The result page lists the predicted targets with their com-mon name together with links to GeneCards (30) (for hu-man proteins) UniProt (31) and ChEMBL (6) databaseswhen available (see Figure 1) Targets are ranked accord-ing to their score with respect to the query molecule Thetarget classes are displayed in the last column These classes




















VALIDATION DATASET




DISCUSSION







SUPPLEMENTARY DATA


ACKNOWLEDGMENT


FUNDING











































































VALIDATION DATASET




DISCUSSION







SUPPLEMENTARY DATA


ACKNOWLEDGMENT


FUNDING






























































VALIDATION DATASET




DISCUSSION







SUPPLEMENTARY DATA


ACKNOWLEDGMENT


FUNDING


























































DISCUSSION







SUPPLEMENTARY DATA


ACKNOWLEDGMENT


FUNDING













































































































Documents

SwissTargetPrediction: a web server for target prediction of … · NucleicAcidsResearch,2014,Vol.42,WebServerissue W33 smallmoleculeinteractionsinthelastfewyearshasenabled researcherstodevelopligand-basedapproachesfortarget