Supplementary Notes
Probes & Drugs portal: interactive approach to Open Data exploration in chemical biology
Ctibor Skuta, Martin Popr, Tomas Muller, Jindrich Jindrich, Michal Kahle, David Sedlak, Daniel Svozil and Petr Bartunek
Supplementary Note 1 Bioactive compound tools and software applications
Supplementary Note 2 Probes & Drugs portal database
2.1 Compounds
2.1.1 Compounds processing
2.1.2 Compound standardization
2.1.3 Compounds external data
2.2 Compound sets
2.3 Data license and accessibility
Supplementary Note 3 Web interface
3.1 Help
Supplementary Note 4 Filtering system
Supplementary Note 5 Visualizations
5.1 Venn diagram
5.2 Chemical space
5.3 Cluster heatmap
5.4 Summary visualizations (biological, physicochemical, scaffolds)
5.4.1 Biological summary
5.4.2 Physicochemical properties distribution
5.4.3 Scaffold summary
Supplementary Note 6 Chemical intelligence
6.1 Structural alerts
Supplementary Note 7 Ontologies
Supplementary Note 8 Custom sets
Supplementary Table 1 Programming tools
Supplementary Table 2 Compound sets
Supplementary Table 3 External sources
References
Nature Methods doi:10.1038/nmeth.4365
BIOACTIVE COMPOUND TOOLS AND SOFTWARE APPLICATIONS Supplementary Note 1
Chemical probes are indispensable tools in modern biology. These compounds are commonly used to study gene function, validate molecular targets or dissect complex processes within cells and organisms [1-6]. Although chemical probes should be well-described, potent, selective tools, it has been demonstrated that not all of them possess these qualities. For some, their biological characteristics were disproved or their suitability questioned [7-12]. Still, many such compounds are incorporated into commercial screening libraries and are being used by the community [8, 11, 13, 14]. However, this situation doesn’t stem primarily from misleading information on vendor websites, but mainly from the lack of tools that simplify the selection of a suitable tool. Data, not only about chemical probes but about bioactive compound tools in general, are scattered over various data sources [11, 15-31] and research papers [10, 32-35], which makes the search for the right chemical tool a very complex and time-consuming task.
One of the resources that emerged to bring order to the field of chemical probes is the community-driven portal ChemicalProbes.org [8]. ChemicalProbes.org is a high-quality resource containing a validated set of compounds that can be employed as probes on specific targets. Basic compound data are enriched with ratings by experts in the field of chemical biology along with comments on the probes’ usage and, in some cases, possible pitfalls. The portal also contains a list of obsolete (historical) compounds that are no longer recommended to be used as probes (6.1), either because there are currently higher-quality alternatives or because their former biological properties were disproved. Although ChemicalProbes.org is a great resource, it contains “only” 158 probes (April 2017), which is consistent with the community-driven approach and the quality of the data, but which can also turn out to be insufficient in many cases.
To alleviate these problems, we developed the Probes & Drugs (P&D) portal. The P&D portal is a unique tool which compiles data from publicly available databases of biologically active molecules, such as ChEMBL [18], Guide to PHARMACOLOGY [22], DrugBank [20], DrugCentral [21], ChEBI [17], BindingDB [15], ZINC [27] or ChemBank [36]. However, these databases support only very basic queries, data analysis and visualization tools. The added value of P&D is that it delivers a strong combination of detailed property and functional annotation with advanced query, filtering and visualization features. Although several web applications provide an interface to chemical and/or biological data extracted from external resources, such as Open PHACTS Explorer [37], ChemBioNavigator [38], the NPC browser [39], the CDD Vault [40] and Pharos [23], the P&D portal allows one to instantly perform multi-conditional queries and answer complex questions, unlike any of the other resources.
Open PHACTS Explorer [37] and ChemBioNavigator [38] both provide an interface to the Open PHACTS API [41], which brings together several publicly available pharmacological resources. While the Open PHACTS Explorer offers a rather elementary tabular view with basic information about a compound’s structure, physicochemical properties and pharmacology, ChemBioNavigator enables one to fetch multiple data sets and compare them using simple scatter plots. However, the functionality of both tools is degraded by the dated data sets provided through the Open PHACTS API (as of February 2017, the most recent data update occurred on 31st March 2015 according to the Open PHACTS web pages [42]). In addition, since the update of the Open PHACTS API from ver. 1.4 to 2.0 (4th February 2016), the ChemBioNavigator has stopped working and users are redirected to the Open PHACTS Explorer instead. The NPC browser [39] is a desktop application that simplifies the analysis of the NCGC Pharmaceutical Collection [39]. While it contains some basic selection tools, it has not been updated, together with the NCGC Pharmaceutical Collection, since 2012. The CDD Vault (Collaborative Drug Discovery Vault) [40] is a commercial web application (with a possibility to create a free account with limited functionality) that enables an analysis of compounds collected from various commercial and non-commercial resources. A registered user can create collections from these sets, work with them and use very basic visualization functions (currently in a beta state). The recently published portal Pharos [23] is a web interface to the Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) program [43]. It is a target-centric tool that enables a simple exploration of chemical and biological data and provides tools for disease- and ligand-based browsing (currently, both as a work in progress). The interface is integrated with summarizing visualizations and a filtering system which, on the basis of data annotations and a number of implemented ontologies, enables one to unite multiple searches into a working data set.
PROBES & DRUGS PORTAL DATABASE Supplementary Note 2
The Probes & Drugs (P&D) portal is a ligand-centric web application which aims to cover the chemical space of commonly used bioactive compound tools and enable its exploration from various points of view through an intuitive yet very powerful filtering system (Supplementary Note 4). This system, enhanced by Boolean logic in combination with integrated visualizations (Supplementary Note 5), ontologies (Supplementary Note 7), chemical intelligence (Supplementary Note 6) and the possibility to create user-defined sets (so-called custom sets) (Supplementary Note 8), makes it a unique discovery platform in the field of bioactive compound tools.
2.1 COMPOUNDS
The P&D portal was created with the idea of reflecting the current state of bioactive compound space. Therefore, its compound base was assembled primarily out of established, non-commercial as well as commercial, bioactive compound sources with high attention given to compounds labelled as probes or drugs (2.2). With the current size (as of May 2017) of 29,898 bioactives, the portal doesn’t aim to be an ultimate source for compounds that have ever shown a bioactive potential, but rather a tool for working with the most frequently used compound tools in biological experiments.
2.1.1 Compounds processing
Compounds, together with all associated data attributes (e.g., name, target, CAS number etc.), are uploaded to the P&D portal from .sdf (Structure Data File) or .csv (Comma Separated Values) files. While only some data attributes are displayed in the compounds view, all attributes are shown in the compound detail view in the Source data tab. Compound structures are parsed and canonicalized using RDKit [44], the primary cheminformatics framework used throughout the portal for all chemistry-related functions. If a structure is unparsable by RDKit, the OpenBabel cheminformatics framework [45] is used which, in some cases, is able to parse the structure and re-generate it in a form parsable by RDKit. If neither framework is able to parse a certain structure, we try to find its proper form manually. When a compound is parsed, its standardized form (2.1.2) is generated and external data (e.g., bioactivities, external IDs, tags etc.) are associated with it.
Uploaded structures are not manually curated. However, if a structure is not available or parsable, we try to resolve it by a search in the PubChem database [46] or by matching its IDs or supplied names. Sometimes, a compound set is available only in the form of web pages (e.g., NURSA ligand set, ChemicalProbes.org set, etc.). In this case, data are harvested automatically, if possible, or manually, if necessary.
2.1.2 Compound standardization
Compound standardization is a process in which various chemical forms of a single compound are unified (Figure 1). Compound standardization on the P&D portal comprises several steps: the removal of stereochemistry, salt/solvate components and isotope labels, the neutralization/standardization of charges and the breaking of non-existent (erroneously depicted) covalent bonds where ionic bonds occur.
As various stereoisomers can show different potencies or target profiles, the original stereoisomers with their corresponding data are always available to the user. However, sometimes it is difficult to assess the proper stereochemistry from a provider’s or vendor’s data, and therefore we also offer the option to investigate compounds with the stereochemistry removed. An illustrative example is provided by Docetaxel, which exists in six different forms in the P&D portal. However, upon closer examination it becomes clear that the stereochemistry was not assigned consistently between different vendors (a common case is that one vendor assigns a specific stereochemistry to a given bond while another vendor reports the same bond without stereochemistry) and that these forms very likely represent the same molecule. Upon standardization, Docetaxel is shown as only one compound in the Standardized compounds view (with all metadata merged), while its six original representations remain accessible so that users can later choose which stereoisomers are best suited for their specific application.
On the P&D portal, an improved version of the standardiser Python package [47] is used for compound standardization. Unlike the original version, it is able to deal with multi-component systems containing more than one organic compound (i.e., mainly mixtures, but also including polymers in the form of their building blocks). When a mixture undergoes standardization, it is split into individual parts (molecules), each molecule is standardized separately and identical components are removed. If at the end of the process there is still more than one molecule, they are put back together in one system and tagged as multi-component.
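The split, standardize, deduplicate and recombine procedure described above can be sketched as follows. This is a minimal illustration and not the portal's implementation: the function names are hypothetical, `standardize_molecule` is only a placeholder for the real per-molecule standardization, and components are compared as plain strings, which assumes that they are already written as canonical SMILES.

```python
from typing import Tuple


def standardize_molecule(smiles: str) -> str:
    """Hypothetical stand-in for single-molecule standardization.

    A real implementation would strip stereochemistry, isotopes and
    charges with a cheminformatics toolkit; here we only trim whitespace.
    """
    return smiles.strip()


def standardize_mixture(smiles: str) -> Tuple[str, bool]:
    """Return (standardized SMILES, is_multi_component)."""
    # In SMILES, '.' separates disconnected components of one record.
    parts = [standardize_molecule(p) for p in smiles.split(".") if p.strip()]
    # Identical components are removed; sorting keeps the output deterministic.
    unique_parts = sorted(set(parts))
    # More than one remaining molecule means the record is multi-component.
    return ".".join(unique_parts), len(unique_parts) > 1
```

Under these assumptions, a record that lists the same molecule twice collapses to a single component, while a genuine two-component mixture is returned as one system flagged as multi-component.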
Inorganic compounds or compounds containing transition or basic metal/s are not fully standardized and their standardized form is equal to their non-isomeric form (the original compound with stereochemistry removed).
Figure 1 Example of P&D standardization. Three different representations of one compound (PFI-2) matched together by the standardization process. PFI-2 is a first-in-class, potent, selective, and cell-permeable inhibitor of the methyltransferase activity of human SETD7 [48].
2.1.3 Compounds external data
Each compound is enriched by additional metadata, such as external identifiers, bioactivity data, associated pathways, or target and pathway ontology classifications. These data are collected from various external sources (Supplementary Table 3) by matching the compound’s InChIKey [49] in its isomeric (if available), non-isomeric and standardized forms. Subsequently, the connectivity part of the InChIKey (the first part, encoding the three main layers of InChI) is also used. The reason is to ensure that originally multi-component compounds (e.g., salts or compounds with solvents) are, after standardization (which includes the removal of stereochemistry), matched to their isomeric forms in external sources (Figure 2). In the original compounds view, only data matched to the original compound’s isomeric InChIKey are shown; in the standardized compounds view, data matched by all other forms of the InChIKey are added.
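The two-stage matching can be illustrated with a short sketch. It relies only on the fact that an InChIKey consists of hyphen-separated blocks, the first of which is the connectivity part. The function names, the record layout and the "full"/"connectivity" labels are assumptions made for this example, and the keys used to exercise it would be placeholders, not real compounds.

```python
from typing import Dict, Iterable, List, Tuple


def connectivity_block(inchikey: str) -> str:
    """Return the first hyphen-separated block of an InChIKey."""
    return inchikey.split("-")[0]


def match_external(
    compound_keys: Iterable[str], external: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Match external records by full InChIKey, then by connectivity block.

    compound_keys holds the compound's isomeric, non-isomeric and
    standardized InChIKeys; external maps InChIKeys to records.
    Returns (match_level, record) pairs.
    """
    keys = list(compound_keys)
    matched_keys = {k for k in keys if k in external}
    matches = [("full", external[k]) for k in keys if k in external]
    # Second pass: records that share only the connectivity block.
    blocks = {connectivity_block(k) for k in keys}
    for ext_key, record in external.items():
        if ext_key not in matched_keys and connectivity_block(ext_key) in blocks:
            matches.append(("connectivity", record))
    return matches
```

Keeping the match level next to each record makes it possible to show only full isomeric matches in one view and to add the remaining matches in another, in the spirit of the original and standardized compounds views.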
Figure 2 Different forms of a single compound along with their InChIKeys. To match the target compound (bottom), only the connectivity part of the compound’s standardized form (right) can be used. The other InChIKeys, isomeric (left) and non-isomeric (top), are completely different from the target compound’s InChIKey.
2.1.3.1 Targets and bioactivities
Bioactivity data are currently extracted for Human, Rat and Mouse, for single protein and protein complex targets, from the ChEMBL [18] and Guide to PHARMACOLOGY [22] databases. From ChEMBL, only records labelled with a confidence score >= 7, with a pchembl value (i.e., the activity type is equal to pIC50, pEC50, pKi, pKd, pKB, pAC50, pA2 or Potency, all in -log(concentration[mol/L]) units) and the value relation '=' are extracted. When more values for one ligand-target complex are available, their average value is calculated. If available, this activity value is also annotated with a mechanism of action (MOA) and a primary target flag (i.e., the ligand is binding directly to the target).
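The three extraction criteria and the averaging step can be written down compactly. The sketch below is only an illustration under stated assumptions: the dictionary keys (`confidence_score`, `pchembl_value`, `relation`, `ligand`, `target`) are hypothetical field names chosen for this example, and `p_value` simply shows how a molar concentration maps onto the -log(concentration[mol/L]) scale, so that 100 nM corresponds to a value of about 7.

```python
import math
from collections import defaultdict
from statistics import mean


def p_value(concentration_molar: float) -> float:
    """Convert a molar concentration to -log10 units (100 nM -> about 7)."""
    return -math.log10(concentration_molar)


def aggregate_activities(records):
    """Apply the three criteria and average per ligand-target pair."""
    grouped = defaultdict(list)
    for rec in records:
        if (
            rec["confidence_score"] >= 7
            and rec["pchembl_value"] is not None
            and rec["relation"] == "="
        ):
            grouped[(rec["ligand"], rec["target"])].append(rec["pchembl_value"])
    # Several values for one ligand-target pair are reduced to their mean.
    return {pair: mean(values) for pair, values in grouped.items()}
```

Records that fail any one of the three conditions are dropped before averaging, so they do not influence the value reported for the ligand-target pair.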
Guide to PHARMACOLOGY contains either an activity range for a ligand-target complex (minimum and maximum) or the median of activity values. There are also cases when the activity value of a ligand-target complex is not known; in those cases, the N/A value is listed among the values on the portal. Again, if available, the activity value is also annotated with a mechanism of action (MOA) and a primary target flag.
On the portal, common targets from different sources for all organisms are unified into one target with all original data. First, the targets for individual organisms are matched to each other according to their UniProt IDs [26] and these are then merged on the basis of their name.
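One way to picture this two-step unification is sketched below. The record fields (`uniprot_id`, `name`) and the rule that names are compared case-insensitively after trimming are assumptions made for the example; the source does not specify how names are normalized.

```python
from collections import defaultdict


def unify_targets(records):
    """Group target records by UniProt ID, then merge groups by name."""
    # Step 1: records from different sources that share a UniProt ID
    # describe the same protein in one organism.
    by_uniprot = defaultdict(list)
    for rec in records:
        by_uniprot[rec["uniprot_id"]].append(rec)

    # Step 2: per-organism groups with the same name become one target.
    unified = defaultdict(list)
    for group in by_uniprot.values():
        # Assumption: names are compared case-insensitively after trimming.
        name = group[0]["name"].strip().lower()
        unified[name].extend(group)
    return dict(unified)
```

All original records stay attached to the unified target, which mirrors the statement that the merged target keeps all original data.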
2.2 COMPOUND SETS
P&D portal compound sets were selected based on their popularity and availability. Upon user request, new compound sets can be easily incorporated within the P&D portal. Currently available compound sets are summarized in Supplementary Table 2. Compound sets which can vary in size (i.e., non-fixed compound sets harvested from live web portals or databases) are updated on a monthly basis.
Each compound set is characterized by six summarizing numbers, three for compounds in their original form and the same three for their standardized form: the total number of compounds, the number of unique compounds in the context of the P&D portal (i.e., compounds contained only in one particular set) and the number of duplicates (i.e., identical compounds contained in the set). For standardized compound sets, the total and unique numbers are naturally equal to or lower than for their original form; the number of duplicates is equal or greater. These statistics can be accessed in the Compound Sets view (Figure 3), where each number serves as a hyperlink to filter the particular compound set in the Compounds view (Figure).
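The three numbers can be made concrete with a small sketch. It reflects one possible reading of the definitions above, stated here as an assumption: "total" counts distinct compounds in the set, "duplicates" counts the repeated entries beyond the first occurrence, and "unique" counts distinct compounds that appear in no other set. The function name and the input layout (set name mapped to a list of compound identifiers) are hypothetical.

```python
def set_statistics(sets, name):
    """Summarize one compound set against all other sets on the portal."""
    entries = sets[name]
    distinct = set(entries)
    # Collect every compound that occurs in any other set.
    in_other_sets = set()
    for other_name, other_entries in sets.items():
        if other_name != name:
            in_other_sets.update(other_entries)
    return {
        "total": len(distinct),
        "unique": len(distinct - in_other_sets),
        "duplicates": len(entries) - len(distinct),
    }
```

Under this reading, replacing the original identifiers with standardized ones can only merge entries, so the total cannot grow and the duplicate count cannot shrink, which agrees with the relation described in the text.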
Figure 3 Compound sets view. Compound sets are divided into 5 categories: probe, drug, non-commercial, commercial and unassigned. For each set, several quantities are displayed: 1) the number of all original compounds (2.1.1) 2) the number of all unique original compounds 3) the number of duplicate compounds 4), 5), 6) the number of all/unique/duplicate compounds in the standardized compound set (2.1.2) 7) a link to the source page of the compound set 8) the compound set description 9) the date of the last update
2.3 DATA LICENSE AND ACCESSIBILITY
The data on the portal are available under the Creative Commons 4.0 license [50]. Currently, all compounds can be downloaded through the Export compounds function (Supplementary Note 3, Figure 1). In the future, all data will be available in the form of a database dump, and later through a web API.
WEB INTERFACE Supplementary Note 3
The P&D portal is centered around the main working space (Figure 1) through which a user can access the compounds and ask specific questions about them, i.e., use the filtering system (Supplementary Note 4), visualizations (Supplementary Note 5) and ontologies (Supplementary Note 7).
Figure 1 Compounds view, the main working space of the portal. 1) The switch between the original and standardized compounds view 2) Functional tabs (from left): the search (implicit) tab; compound and custom sets tab; compound properties and structural alerts tab; target ontologies tab; pathway ontologies tab 3) Visualization buttons (from left): Venn diagrams; chemical space; cluster heatmap; compound set summaries 4) Text and structure search 5) The number of currently selected compounds 6) Action buttons (from left): add all current compounds to a custom set; highlight all filters among visible compound data; refresh current view; switch between detailed and simple compound view; download current compound set; help button 7) Current compound set view.
At the beginning, a user has all P&D compounds at their disposal. Each compound is represented by its structure with a primary name, basic physicochemical properties (Lipinski’s Rule of Five) [51] and other information, such as the number of compounds with an identical scaffold on the portal or the number of matched structural alerts (6.1), represented by colored icons accompanied, in some cases, by a number (quantity) (Figure 2). While only the compound structure representation is shown in the simple compounds view (Figure 2, Figure 3), most of the associated data are present in the detailed (default) compound view.
Figure 2 Default representation of a compound. 1) drug icon 2) probe icon 3) available to buy icon 4) show SMILES button (visible only on a mouse cursor hover) 5) add to custom set button (visible only on a mouse cursor hover) 6) compound depiction 7) find similar compounds button (visible only on a mouse cursor hover) 8) edit structure in the chemical editor button (visible only on a mouse cursor hover) 9) compound’s primary name (selected from all associated names) 10) the Lipinski’s Rule of Five compound properties (a red rectangle indicates that the maximum threshold (in brackets) of a given property was exceeded): molecular weight (500); the number of hydrogen bond donors (5); the number of hydrogen bond acceptors (10); the number of rotatable bonds (10); calculated octanol/water partition coefficient (5) 11) the number of compounds sharing the same standardized (parent) compound 12) the number of compounds with the same scaffold in the context of all compounds on the portal 13) the number of compound sets the compound belongs to 14) the number of associated structural (PAINS [12]) alerts.
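The threshold check behind the red rectangles in item 10 can be sketched in a few lines. The limits are copied from the caption above; the property names used as dictionary keys and the function name are hypothetical, and the sketch assumes that a property is flagged only when its value is strictly greater than the limit.

```python
# Maximum thresholds as listed in the caption of Figure 2.
THRESHOLDS = {
    "molecular_weight": 500,
    "h_bond_donors": 5,
    "h_bond_acceptors": 10,
    "rotatable_bonds": 10,
    "clogp": 5,
}


def exceeded_properties(properties):
    """Return the names of properties whose value exceeds its threshold."""
    return [
        name
        for name, limit in THRESHOLDS.items()
        if properties[name] > limit
    ]
```

A compound for which the returned list is empty would show no red rectangles in this reading.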
The colored icons above a compound’s structure depict three compound tags: a drug, a probe and an availability tag. These tags are assigned according to a compound’s membership in a particular compound set or according to its label in an external database.
A compound is tagged as a drug when it belongs to one of the drug sets (as of April 2017: DrugBank, NIH Approved oncology drugs, DrugCentral or ChEMBL Approved Drugs) or when it is classified as a drug in ChEMBL or Guide to PHARMACOLOGY. The color coding of a drug icon depends on the drug type. A drug can be assigned to one or more of the following types: approved, investigational, experimental, illicit, withdrawn, nutraceutical or vet_approved. The following drug icon color coding is used:
1. When a drug is tagged as approved, the icon is green.
2. When a drug is tagged as withdrawn or illicit, the icon is red.
3. When a drug is tagged as investigational, the icon is yellow-green.
4. When a drug is tagged by some of the other values (experimental, nutraceutical, vet_approved), the icon is orange.
5. When it is unknown whether a compound is a drug, the icon is gray.
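The color coding listed above can be expressed as a simple lookup. Because a drug can carry more than one type and the text does not say which rule takes precedence, the sketch assumes that the rules are evaluated in the listed order and that the first match wins; this ordering, like the function name, is an assumption of the example.

```python
def drug_icon_color(drug_types):
    """Map a collection of drug types to an icon color.

    Assumption: when several types apply, the first matching rule wins.
    """
    types = set(drug_types or ())
    if "approved" in types:
        return "green"
    if types & {"withdrawn", "illicit"}:
        return "red"
    if "investigational" in types:
        return "yellow-green"
    if types & {"experimental", "nutraceutical", "vet_approved"}:
        return "orange"
    # No known drug type: it is unknown whether the compound is a drug.
    return "gray"
```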
A compound is tagged as a probe when this information is specified by the data vendor and it is not tagged as obsolete (6.1) at the same time. Every compound from the probe sets ChemicalProbes.org, SGC probes, MLP probes and Nature Chemical Biology probes is tagged as a probe; compounds tagged as probes in the MLSMR Probes+ and Informer set 2.0 sets are also tagged as probes in the portal.
1. When a compound is tagged as a probe, the icon is green.
2. Otherwise, the icon is gray.
A compound is tagged as available, i.e., possible to buy, when it belongs to one of the commercial bioactive sets (Supplementary Table 2) or when it is present in the MolPort [52] or mcule [53] in-stock compound lists.
Figure 3 Data associated with a compound in the Compounds view. 1) compound name/s 2) the list of Custom sets the compound belongs to 3) the list of Compound sets (2.2) the compound belongs to 4) compound tags 5) source attributes (any text attributes extracted from the compound source data file/s) 6) Chemical Abstracts Registration Number/s (if available) 7) compound external IDs with hyperlinks to other databases 8) compound’s P&D IDs 9) compound’s targets with associated pathways 10) pathways in which the compound plays an active role (not necessarily as a ligand of a target)
All of a compound’s data, together with their sources, can be accessed in the single compound detail view (Figure 4). In addition to the data shown in the Compounds view, a compound’s structure in different formats (SMILES, InChI, InChIKey, MOL/SDF), longer descriptive texts and all source data associated with the compound in its source file/s are accessible from the single compound detail view.
Figure 4 The single compound detail view.
3.1 HELP
The information about the portal and its usage is available in two main forms: FAQ (Frequently Asked Questions) and an interactive guided tour.
The FAQ section can be accessed under the Help section and will be updated according to the most frequent user questions and remarks.
The interactive guided tour was designed to describe most of the aspects concerning the usage of the portal, and the origin/processing of the data. The tour can be accessed from the Help section and also directly from the Compounds view through the green question mark icon in the top-left corner (Figure 5). Accompanying the tour, there are also three simple interactive examples that should provide a user with a basic notion of how to work with the data on the portal.
Figure 5 The Compounds view with an open Help menu (the top-left corner).
FILTERING SYSTEM Supplementary Note 4
An intuitive yet powerful filtering system enables a user to ask about various properties of small bioactive compounds. A user can construct not only simple queries (Figure 3), but also complicated, multi-conditional questions (Figure 4).
The cornerstone of the system is a single filter (Figure 1) which can be of various types (Figure 2). Generally, a filter represents a subset of the whole compound set that is applied on the basis of associated logical Boolean operators (AND = intersection, NOT = difference, OR = union). The result of a used filter depends on the selected operator and the compound set on the input of the filter. In the case of the first selected filter, the input set is naturally equal to the whole compound set (i.e., for the intersection, the result is equal to the compounds represented by the filter; for difference, to the whole compound set without the compounds represented by the filter; and for union, to the whole compound set, because any set united with its subset is equal to the former).
Figure 1 Similarity filter, one of the possible filter types. Most filter parts are identical for all filter types (points 1 to 6 and 12), but some of them are available only for a particular filter type (here 7 to 11). 1) A number of compounds at a filter input 2) Boolean operations (AND = intersection, NOT = difference, OR = union) applied to the compound set on the input of a filter (1) and the compound set represented by a filter (7) 3) Filter type 4) Number of compounds represented by a filter 5) Disable/enable filter button 6) Remove filter button 7) Structure to which a similarity is calculated 8) Draggable slider for similarity threshold adjustment (interactively connected with text inputs 9 and 11) 9) The text input of the bottom similarity threshold 10) Arrows enabling the ordering of the current compound set according to filter values 11) The text input of the top similarity threshold 12) A number of compounds at a filter output
One filter wouldn’t be enough to create multi-conditional questions. Thus, any number of filters can be chained together into one logical expression. In this case, each filter is applied to the compound set resulting from the application of all of its predecessors (i.e., the first filter is applied to the whole compound set, the second to the result of the first, etc.). If any of the logical operators for any filter in the expression is changed, the expression is instantly evaluated and the result, including all of its sub-results, is recalculated. Using the mouse cursor, the filters can also be dragged and freely interchanged; using the disable button, any filter can be temporarily disabled.
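The left-to-right evaluation of a filter chain maps directly onto set operations. The following sketch is an illustration only: the tuple layout `(operator, members, enabled)` and the function name are assumptions made for the example, and disabled filters are modelled by simply passing the input set through unchanged.

```python
def apply_filters(all_compounds, filters):
    """Evaluate a filter chain left to right.

    Each filter is (operator, members, enabled). Returns the final set
    and the list of intermediate sub-results, one per filter.
    """
    current = set(all_compounds)  # input of the first filter: the whole set
    sub_results = []
    for operator, members, enabled in filters:
        if enabled:
            if operator == "AND":
                current = current & members  # intersection
            elif operator == "NOT":
                current = current - members  # difference
            elif operator == "OR":
                current = current | members  # union
            else:
                raise ValueError(f"unknown operator: {operator}")
        # A disabled filter leaves its input unchanged.
        sub_results.append(set(current))
    return current, sub_results
```

Because every step depends only on the output of its predecessor, changing one operator or reordering the filters requires re-running the loop from that position onward, which is why all sub-results are recalculated together.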
Generally, any kind of information on the P&D portal can be used as a filter (text attributes, sub-structures, targets, target classes, etc.) and most of them can be added by two approaches: through a search field with an autocomplete function or using a search icon that reveals itself when the mouse cursor hovers above a filter.
Figure 2 Various types of filters. These filters represent compounds: 1) contained in a compound set (there are also specialized compound set filters for compounds that are contained only in a particular set and a filter for duplicates in a particular set) 2) tagged as approved drugs 3) associated with the name Gleevec 4) with a defined PubChem ID 5) with a particular external ID 6) with a compound property in a given range 7) that match a structural alert 8) with a biological activity on a given target in a given range 9) with the highest biological activity on a given target class in a given range 10) with a biological activity on a given pathway in a given range 11) with a given Mechanism-of-Action (MOA) 12) with a given MOA effect (positive, negative, other) 13) with a particular scaffold 14) that are similar to a given structure in a given similarity range 15) that contain a given substructure 16) that are identical to a given structure
Figure 3 A compound set created by the application of 2 compound set filters. If the search tab is selected, the left navigation panel (1) lists all applied filters. In this example, the intersection of the DrugCentral compound set (filter 2) and the DrugBank compound set (filter 3) is filtered.
Figure 4 A compound set created by the application of 5 different filters. If the search tab is selected, the left navigation panel (1) lists all applied filters. In this example, a compound set with the following characteristics is filtered: Glucocorticoid receptor ligands with at least 100 nM potency (filter 2) or compounds that are 30% or more similar to a given structure (Dexamethasone) (filter 3), labelled as approved drugs (filter 4), with cLogP lower than 5 (filter 6) and that belong to the NIH Clinical Collections compound set (filter 5).
VISUALIZATIONS Supplementary Note 5
5.1 VENN DIAGRAM
Venn diagrams are used to depict the intersections of two or more data sets (Figure 1, Figure 2). On the P&D portal, Venn diagrams of up to 5 sets can be visualized. Venn diagram visualization is done by our in-house developed JavaScript library.
Figure 1 Venn diagram of three drug sets in their standardized form: DrugBank, DrugCentral and ChEMBL Approved Drugs.
Figure 2 The integration of the Venn diagram on the P&D portal.
5.2 CHEMICAL SPACE
Chemical space is a multidimensional space of all possible, energetically stable chemical compounds. Chemical space can be visualized as a 2D/3D scatter plot in which distances between data points (i.e., chemical moieties) correspond to their structural or physicochemical similarities. To calculate such similarities, chemical compounds must be represented by their descriptors and similarity metrics must be defined. In the P&D portal, two types of descriptors can be used: physicochemical properties and Morgan fingerprints [54]. While each pair of pre-calculated physicochemical properties can be plotted on the X and Y axes (Figure 3), Morgan fingerprints are projected into a new set of coordinates (PC1 and PC2) using Principal Component Analysis (PCA) (Figure 4, Figure 6). The color and size of individual data points in the chemical space visualization can also reflect their physicochemical properties. In addition, to gain an idea about the representativeness of a given compound set, it can be compared to 5,000 compounds that contain the 5,000 most frequent scaffolds present in the ChEMBL database (Figure 5). Chemical space visualization is done by our in-house developed JavaScript library.
Figure 3 The visualization of the chemical space of probes (green) and drugs (red). Chemical space is defined by molecular weight (X axis) and ClogP (Y axis). Only compounds with a molecular weight between 150 and 1000 Da and with logP between -15 and 15 are shown.
Figure 4 The visualization of the chemical space of probes (green) and drugs (red). Chemical space is defined by the first two principal components (PC1 and PC2) resulting from the PCA of Morgan fingerprints.
Figure 5 The PCA of all P&D compounds (pink) and 5,000 ChEMBL compounds (gray). Both sets are similarly diverse; only the left part of the graph shows P&D compounds without any scaffold.
Figure 6 The integration of the chemical space visualization on the P&D portal. The visualization of the chemical space of Glucocorticoid receptor (red) and Estrogen receptor (green) ligands. The size of each point corresponds to a ligand’s molecular weight.
5.3 CLUSTER HEATMAP
A cluster heatmap is a graphical data representation consisting of the combination of a dendrogram and a heatmap (Figure 7). A dendrogram is a tree-like structure depicting the arrangement of clusters yielded by hierarchical clustering. A heatmap is a 2D matrix with color-coded values.
Within the P&D portal, data for clustering can be represented either by their physicochemical properties (Figure 7, Figure 8) or as 512-bit Morgan fingerprints [54] with a radius of 2. For physicochemical property coding, a heatmap shows individual property values (Figure 7). However, if Morgan fingerprints are used, individual bit values are not depicted in a heatmap. Instead, only metadata columns, describing either binary class membership (i.e., a compound belongs/does not belong to a given class) or quantifying a compound/target relationship (i.e., compound affinity value), are shown (Figure 9). Cluster heatmaps are visualized using the Interactive Cluster Heatmap library (InCHlib) [55].
Figure 7 The cluster heatmap of compounds clustered according to their physicochemical properties. The last two columns contain data about a compound’s membership (1 for true and 0 for false) in selected filters, or, as in this case, values associated with the selected filters (e.g., bioactivities, structure similarities).
Figure8TheintegrationofclusterheatmapvisualizationontheP&Dportal.ThevisualizedclusterheatmapcontainsligandsofEstrogenreceptorsalphaandbeta.Theligandsareclusteredaccordingtobasicphysicochemicalproperties(columnsingreencolorscale)withtheirbioactivitiesasmetadata(columnsinblue-redcolorscale).
Figure 9 The cluster heatmap of compounds clustered based on their Morgan fingerprints [54]. Only metadata columns, which display compound affinities greater than or equal to 7 (-log(concentration [mol/L])) on 6 steroid receptors (from left: Glucocorticoid, Estrogen alpha, Estrogen beta, Mineralocorticoid, Progesterone and Androgen receptor), are shown.
5.4 SUMMARY VISUALIZATIONS (BIOLOGICAL, PHYSICOCHEMICAL, SCAFFOLDS)
The currently selected compound set can be summarized by three different types of visualizations: a biological summary, a scaffold summary and a physicochemical properties distribution.
5.4.1 Biological summary
A biological summary consists of two pie charts showing the proportion of individual target and pathway classes represented within a compound set (Figure 10). The following target and pathway classes, pre-selected from the integrated ontologies (Supplementary Note 7), can be used:
1. Target classes
   a. Selected target classes (Epigenetic regulator, Cytochrome P450, Kinases, Ion channels, Catalytic receptors, Nuclear hormone receptors, G protein-coupled receptors, Peptidases and proteinases, Transporters)
   b. ChEMBL target ontology main nodes [56]
   c. Guide to PHARMACOLOGY target ontology main nodes [57]
2. Pathway classes
   a. Selected pathway classes (Immune system, Signal transduction, Gene expression, Metabolism of proteins, Neuronal system, Hemostasis, Cell cycle, Cellular response to stress, Developmental biology)
   b. Reactome pathway ontology main nodes [25]
Figure 10 Biological summary of all P&D compounds with at least 100 nM potency (i.e., value 7 in -log(concentration [mol/L]) units) on any target using the pre-selected sets of target and pathway classes.
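The correspondence between 100 nM and the value 7 follows from the -log10 transformation of a molar concentration. A minimal sketch, with hypothetical function names and a small tolerance added here to guard against floating-point rounding at the threshold:

```python
import math


def to_p_value(concentration_molar):
    """Convert a molar concentration to -log10 units (e.g. pIC50 or pKd)."""
    if concentration_molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(concentration_molar)


def is_at_least_as_potent(p_value, threshold=7.0, tol=1e-9):
    """True if the -log10 value reaches the threshold (7 corresponds to 100 nM)."""
    return p_value >= threshold - tol


p_100nm = to_p_value(100e-9)  # 100 nM expressed in mol/L
p_1um = to_p_value(1e-6)      # 1 uM expressed in mol/L
print(round(p_100nm, 6), is_at_least_as_potent(p_100nm))
print(round(p_1um, 6), is_at_least_as_potent(p_1um))
```

Higher values mean higher potency, so a filter for "at least 100 nM potency" keeps values of 7 and above.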
5.4.2 Physicochemical properties distribution
The distributions of the physicochemical properties of the current compound set can be easily compared with their distributions within the whole P&D database (Figure 11).
Figure 11 The comparison of the distributions of selected physicochemical properties between the nuclear hormone receptor ligands with at least 100 nM potency (i.e., value 7 in -log(concentration [mol/L]) units) (in red) and the whole P&D compound set (in gray).
5.4.3 Scaffold summary
The concept of a molecular scaffold is commonly used in medicinal chemistry. Scaffolds represent common core structures of a given compound set. The basic version of the Bemis-Murcko scaffold is used on the portal [58]. This type of scaffold is created by preserving all rings with their interconnecting chains (so-called linkers) while removing all other side chains. The analysis of scaffold frequency (up to 100 of the most frequent scaffolds) can be performed for any selected subset within the P&D portal (Figure 12).
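The frequency analysis itself is a simple counting step once each compound has a pre-calculated scaffold. A minimal sketch, assuming scaffolds are stored as SMILES strings keyed by a compound identifier; the identifiers and the data layout are hypothetical, and scaffold generation is left to a cheminformatics toolkit:

```python
from collections import Counter


def scaffold_summary(compound_scaffolds, top_n=100):
    """Return up to top_n (scaffold, count) pairs, most frequent first.

    compound_scaffolds maps a compound ID to its pre-calculated scaffold SMILES;
    compounds without a scaffold (None or empty string) are skipped.
    """
    counts = Counter(s for s in compound_scaffolds.values() if s)
    return counts.most_common(top_n)


STERANE = "C1CCC2C(C1)CCC1C2CCC2CCCC12"  # four fused rings, steroid-like core
BENZENE = "c1ccccc1"

# Hypothetical compound IDs and scaffold assignments.
subset = {
    "cpd-1": STERANE,
    "cpd-2": STERANE,
    "cpd-3": BENZENE,
    "cpd-4": STERANE,
    "cpd-5": None,  # acyclic compound, no scaffold
}
top = scaffold_summary(subset)
print(top)
```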
Figure 12 The scaffold summary of nuclear hormone receptor ligands with at least 100 nM potency. A sterane-like scaffold is typical for this receptor family.
CHEMICAL INTELLIGENCE Supplementary Note 6
From the structure point of view, four types of structure filters can be used on the P&D portal: identity, similarity, substructure and scaffold.
Figure 1 The integration of the chemical structure editor (Ketcher by EPAM Life Sciences [59]) on the P&D portal.
The identity search is performed by converting a query structure to its InChIKey [49] and comparing it to the InChIKeys of all compounds on the portal. Each compound is characterized by three types of InChIKey (not necessarily distinct): isomeric, non-isomeric and standardized.
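The identity lookup described above can be sketched as an index from InChIKey to compound IDs. This is an illustrative sketch only: the compound IDs, the field names and the placeholder key strings are hypothetical, and generating the InChIKey of the query structure is assumed to be done beforehand by a cheminformatics toolkit.

```python
def build_inchikey_index(compounds):
    """Map every InChIKey (isomeric, non-isomeric, standardized) to compound IDs."""
    index = {}
    for compound_id, keys in compounds.items():
        # The three keys of one compound are not necessarily distinct.
        for key in set(keys.values()):
            index.setdefault(key, set()).add(compound_id)
    return index


def identity_search(query_inchikey, index):
    """Return the sorted IDs of compounds having the query InChIKey in any form."""
    return sorted(index.get(query_inchikey, set()))


# Placeholder keys, not real InChIKeys.
compounds = {
    "cpd-1": {"isomeric": "KEY-A-ISO", "non_isomeric": "KEY-A", "standardized": "KEY-A"},
    "cpd-2": {"isomeric": "KEY-A-ISO2", "non_isomeric": "KEY-A", "standardized": "KEY-A"},
    "cpd-3": {"isomeric": "KEY-B", "non_isomeric": "KEY-B", "standardized": "KEY-B"},
}
index = build_inchikey_index(compounds)
print(identity_search("KEY-A", index))      # two stereoisomers share the non-isomeric key
print(identity_search("KEY-A-ISO", index))  # the isomeric key identifies one compound
```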
Similarity between two structures is calculated using Morgan fingerprints and the Tanimoto coefficient. Morgan fingerprints are circular fingerprints closely resembling Extended Connectivity Fingerprints (ECFP) [54], which are among the most popular fingerprints for compound description in cheminformatics [35,60,61]. In circular fingerprints, a structure is encoded by means of structural fragments that are defined as atom neighborhoods up to a given radius (e.g., ECFP4 are calculated with a radius of 2 atoms, ECFP6 with a radius of 3 atoms). The Tanimoto coefficient is a commonly used similarity metric for binary data and is defined as follows:
$$\mathrm{Tanimoto} = \frac{N_{A\&B}}{N_A + N_B - N_{A\&B}},$$
where $N_A$ is the number of ON bits in the first fingerprint, $N_B$ the number of ON bits in the second fingerprint and $N_{A\&B}$ the number of ON bits common to both fingerprints. The value of the Tanimoto coefficient lies in the interval between 0 and 1 (1 means that both fingerprints are identical, 0 means that there are no common ON bits).
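The definition above translates directly into code. A minimal sketch, assuming each fingerprint is represented as a set of ON-bit indices (the bit indices below are made up); on the portal itself, similarity queries run inside the RDKit database cartridge, so this only illustrates the formula.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as collections of ON-bit indices."""
    a, b = set(fp_a), set(fp_b)
    n_common = len(a & b)                  # ON bits shared by both fingerprints
    denominator = len(a) + len(b) - n_common
    # Two fingerprints without any ON bit: returning 0.0 is a convention of this sketch.
    return n_common / denominator if denominator else 0.0


fp1 = {1, 5, 9, 42}
fp2 = {5, 9, 77}
print(tanimoto(fp1, fp2))  # 2 common bits, 4 + 3 - 2 = 5 in the denominator
```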
Substructure searches are performed with the non-isomeric form of a query structure. Substructure and similarity queries are both performed within the RDKit database cartridge [62]. To enable structure editing on the portal, the chemical structure editor Ketcher by EPAM Life Sciences [59] was employed (Figure 1).
Scaffold searches are based upon a pre-calculated scaffold structure of each compound and can be performed through the scaffold icon found under a compound representation (Supplementary Note 3, Figure 2). Be aware that scaffold and substructure filters with the same query structure do not have to return the same compound set, since the query structure can be merely a substructure of a compound's larger scaffold; such a compound matches the substructure filter but not the scaffold filter.
6.1 STRUCTURAL ALERTS
A structural alert is a specific tag that warns a user about the possibly problematic behavior of a compound in the context of biological screening. It is associated either with a compound's biological properties (e.g., a non-selective or not sufficiently potent compound) or with its structural features that may cause unwanted effects within an assay (e.g., a non-specific reaction with a protein). Currently, three different types of structural alerts are integrated into the portal: Pan Assay Interference filters (PAINs) [12], aggregators and obsolete compounds.
PAINs filters are a set of potentially problematic substructures that might cause the promiscuity or interference of a compound within an assay [12]. Since PAINs filters are already a standard part of the RDKit framework, RDKit is responsible for their matching within the P&D portal.
Aggregators are compounds that may form colloidal aggregates and interfere with an assay non-specifically. Aggregators within the P&D compound set are currently matched to the set of known aggregators from the Aggregator Advisor software [11].
Compounds tagged with the obsolete structural alert are compounds that were once used as tools in biological screening, but for which higher-quality alternatives (e.g., more potent or selective) now exist or whose biological properties have been disproved (e.g., a lack of selectivity for the original target of interest). Obsolete compounds on the portal are currently tagged according to the list of historic compounds from the ChemicalProbes.org portal [8,63].
The number of matched structural alerts for a compound is depicted by an icon in the bottom right corner of each compound's structure (Supplementary Note 3, Figure 2). Specific structural alerts can be found in the Structural alerts tab in the detail view of a single compound (Supplementary Note 3, Figure 4).
ONTOLOGIES Supplementary Note 7
To filter compounds on the basis of target and pathway classes, target ontologies from ChEMBL [18] and Guide to PHARMACOLOGY [22], and the pathway ontology from Reactome [25] were integrated into the portal. They can be accessed in the Target and Pathway tabs, where they can be employed to browse through compounds in a tree-like manner (from the most general to more specific classes) (Figure 1). Any ontology node can also be used as a filter in the search (main) tab. Since the same targets from different sources are matched to each other through their UniProt IDs [26] (2.1.3.1), similar ontology nodes from different ontologies represent similar (not identical) compound sets.
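Using an ontology node as a filter amounts to collecting all targets below that node and then selecting the compounds annotated with any of them. The sketch below assumes a nested-dictionary tree and a compound-to-UniProt-ID mapping; both structures, the node names and the compound IDs are hypothetical, while the UniProt accessions are the human entries for the named receptors.

```python
def targets_under(node):
    """Collect the UniProt IDs attached to a node and to all of its descendants."""
    ids = set(node.get("targets", ()))
    for child in node.get("children", ()):
        ids |= targets_under(child)
    return ids


def compounds_for_node(node, compound_targets):
    """Return sorted IDs of compounds with at least one target under the node."""
    wanted = targets_under(node)
    return sorted(c for c, t in compound_targets.items() if wanted & set(t))


# Hypothetical fragment of a target ontology.
estrogen = {"name": "Estrogen receptors", "targets": ["P03372", "Q92731"]}  # ESR1, ESR2
glucocorticoid = {"name": "Glucocorticoid receptor", "targets": ["P04150"]}  # NR3C1
nuclear = {"name": "Nuclear hormone receptors", "children": [estrogen, glucocorticoid]}

# Hypothetical compound-to-target annotations.
compound_targets = {
    "cpd-1": ["P03372"],
    "cpd-2": ["P04150", "Q92731"],
    "cpd-3": ["P10275"],  # androgen receptor, not part of this tree fragment
}
print(compounds_for_node(estrogen, compound_targets))
print(compounds_for_node(nuclear, compound_targets))
```

Because the matching key is the UniProt ID, nodes from different ontologies that cover overlapping target lists yield overlapping, but not necessarily identical, compound sets.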
Figure 1 Ontologies integration on the P&D portal. Target and pathway ontologies can be used from the Target (1, currently selected) or Pathway (2) tabs to browse through compounds in a tree-like manner. The currently selected class in the ontology (3) is highlighted (green) and targets/pathways associated with the class are marked with the green label.
CUSTOM SETS Supplementary Note 8
Custom sets are arbitrary, user-defined compound sets intended to store advanced queries, with the possibility to manually add/remove single compounds. Currently, they can be assembled only out of compounds found on the portal; user upload of compounds to the portal is not supported.
Since custom sets are bound to a user account, they can be created only by logged-in users. Currently, a user can create up to 5 custom sets. To create a custom set, it must first be initialized in the Custom Sets view (Figure 1). Once a custom set is created, single or multiple (batch) compounds can be added from the Compounds view (Supplementary Note 3, Figure 1). Single compounds are added by clicking on an arrow in the top right corner when hovering over a compound's image (Supplementary Note 3, Figure 2); multiple compounds are added by clicking on the larger arrow on the right side of the second navigation tab (Figure ). Compounds can be removed from a custom set only in the particular custom set view (accessible from the Custom sets tab) using a cross icon. Again, both single and multiple compounds can be removed.
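The rules stated above (at most 5 sets per user, only compounds already on the portal, single or batch add and remove) can be summarized in a small model. This is a hypothetical sketch of the constraints, not the portal's implementation; the class, method and ID names are invented for illustration.

```python
MAX_CUSTOM_SETS = 5  # per-user limit stated in the text


class CustomSetStore:
    """Custom sets of one logged-in user (illustrative model only)."""

    def __init__(self, portal_compound_ids):
        self._portal = set(portal_compound_ids)
        self._sets = {}

    def create(self, name):
        if name in self._sets:
            raise ValueError(f"custom set {name!r} already exists")
        if len(self._sets) >= MAX_CUSTOM_SETS:
            raise ValueError("custom set limit reached")
        self._sets[name] = set()

    def add(self, name, compound_ids):
        """Add one or many compounds; all must already exist on the portal."""
        ids = set(compound_ids)
        unknown = ids - self._portal
        if unknown:
            raise ValueError(f"not on the portal: {sorted(unknown)}")
        self._sets[name] |= ids

    def remove(self, name, compound_ids):
        """Remove one or many compounds from a set."""
        self._sets[name] -= set(compound_ids)

    def members(self, name):
        return sorted(self._sets[name])


store = CustomSetStore(["cpd-1", "cpd-2", "cpd-3"])
store.create("steroid ligands")
store.add("steroid ligands", ["cpd-1", "cpd-2", "cpd-3"])  # batch add
store.remove("steroid ligands", ["cpd-2"])                 # single removal
print(store.members("steroid ligands"))
```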
Figure 1 Custom Sets view.
PROGRAMMING TOOLS Supplementary Table 1
The P&D portal uses many free open-source programming tools. The server side of the portal is based primarily on the Python programming language [64] with the Django web framework [65] and the PostgreSQL database [66]. The following table lists some of the most important tools and programming libraries used, with a short description:
Name | Category | Description | URL
chembl_webresource_client | cheminformatics | This package implements a Python client for accessing ChEMBL web services. It provides a convenient interface to access data, cache results and optionally execute requests in an asynchronous manner. | https://github.com/chembl/chembl_webresource_client
ChemSpiPy | cheminformatics | ChemSpiPy provides a way to interact with ChemSpider in Python. It allows chemical searches, chemical file downloads, depiction and retrieval of chemical properties. | http://chemspipy.readthedocs.io/en/latest/
Ketcher | cheminformatics | Ketcher is a web-based chemical structure editor. | http://lifescience.opensource.epam.com/ketcher/
Open Babel [45] | cheminformatics | Open Babel is a chemical toolbox designed to speak the many languages of chemical data. It's an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas. | http://openbabel.org/
PubChemPy | cheminformatics | PubChemPy provides a way to interact with PubChem in Python. It allows chemical searches by name, substructure and similarity, chemical standardization, conversion between chemical file formats, depiction and retrieval of chemical properties. | http://pubchempy.readthedocs.io/en/latest/
RDKit [62] | cheminformatics | RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python. | http://www.rdkit.org/
Standardiser | cheminformatics | This is a tool designed to provide a simple way of standardising molecules as a prelude to e.g. molecular modelling exercises. | https://github.com/flatkinson/standardiser
Highcharts | visualizations | Highcharts is a charting library written in pure JavaScript, offering an easy way of adding interactive charts to your website or web application. | http://www.highcharts.com/
ChemSpace.js | visualizations | ChemSpace.js is an open source interactive JavaScript library which provides an easy way to display and analyze compound sets in the form of 2D space within a web page. | http://openscreen.cz/software/chemspace/home/
InCHlib [55] | visualizations | InCHlib (Interactive Cluster Heatmap library) is an open source interactive JavaScript library which provides an easy way to display and analyze hierarchically clustered data and cluster heatmaps. | http://openscreen.cz/software/inchlib/home/
Konva.js | visualizations | 2D HTML5 canvas framework for desktop and mobile applications. | http://konvajs.github.io/
BitSet.js | other | BitSet.js is an infinite Bit-Array implementation in JavaScript. | https://github.com/infusion/BitSet.js
Bootstrap Tour | other | JavaScript library to build product tours with Bootstrap Popovers. | http://bootstraptour.com/
clipboard.js | other | Modern copy-to-clipboard JavaScript library. | https://clipboardjs.com/
Font Awesome | other | Font Awesome gives you scalable vector icons that can instantly be customized - size, color, drop shadow, and anything that can be done with the power of CSS. | http://fontawesome.io/
jQuery | other | jQuery is a fast, small, and feature-rich JavaScript library. It makes things like HTML document traversal and manipulation, event handling, animation, and Ajax much simpler with an easy-to-use API that works across a multitude of browsers. | https://jquery.com/
jQuery UI | other | jQuery UI is a curated set of user interface interactions, effects, widgets, and themes built on top of the jQuery JavaScript Library. | https://jqueryui.com/
Platform.js | other | A platform detection library that works on nearly all JavaScript platforms. | https://github.com/bestiejs/platform.js/
tipsy | other | tipsy is a simple jQuery plugin for generating Facebook-style tooltips. | http://onehackoranother.com/projects/jquery/tipsy/
Underscore.js | other | Underscore is a JavaScript library that provides a whole mess of useful functional programming helpers without extending any built-in objects. | http://underscorejs.org/
COMPOUND SETS Supplementary Table 2
The overview of compound sets currently included in the P&D portal:
Name | Description | Category | Compounds | Source
ChemicalProbes.org | This resource is a community driven wiki-like site that recommends appropriate chemical probes for biological targets, provides guidance on their use, and documents their limitations. They also provide advice on the use of controls, both chemically distinct probes for the same target and negative control compounds, where available. | probe | 160 | ChemicalProbes.org [8]
MLP Probes | The Molecular Libraries Program (MLP), a component of the NIH Common Fund, offers public sector biomedical researchers access to the large-scale screening capacity necessary to identify small molecules that can be optimized as chemical probes to study the functions of genes, cells, and biochemical pathways. This will lead to new ways to explore the functions of genes and signaling pathways in health and disease. | probe | 374 | NIH/PubChem [67]
Nature Chemical Biology Probes | Nature Chemical Biology provides freely available summaries of the relevant chemical, in vitro, cellular and in vivo information for newly reported or newly characterized chemical probes reported in the journal. | probe | 58 | Nature Chemical Biology [68]
SGC Probes | SGC Chemical Probes are small, drug-like molecules which meet these criteria: in vitro IC50 or Kd < 100 nM, > 30-fold selectivity over proteins in the same family, significant on-target cellular activity at 1 µM. | probe | 49 | Structural Genomics Consortium [69]
DrugBank | The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains 8,206 drug entries including 1,991 FDA-approved small molecule drugs, 207 FDA-approved biotech (protein/peptide) drugs, 93 nutraceuticals and over 6,000 experimental drugs. | drug | 7,110 | DrugBank [20]
DrugCentral | DrugCentral provides information on active ingredients chemical entities, pharmaceutical products, drug mode of action, indications, and pharmacologic action. They monitor FDA, EMA, and PMDA for new drug approval on a regular basis to ensure currency of the resource. Limited information on discontinued and drugs approved outside US is also available, however regulatory approval information can't be verified. | drug | 3,936 | DrugCentral [21]
ChEMBL Approved Drugs | A collection of compounds tagged as approved drugs in the ChEMBL database. Only compounds with an available structure are uploaded (omitting compounds containing metal, and peptides). | drug | 3,365 | ChEMBL [18]
NIH Approved Oncology Drugs VI | The current set (AOD VII) consists of 129 agents and is intended to enable cancer research, drug discovery and combination drug studies. | drug | 118 | NIH/NCI
GSK Published Kinase Inhibitor Set | The Published Kinase Inhibitor Set (PKIS) is a collection of 376 compounds that have been made available by GSK for screening by external groups; all compounds have been published in the scientific literature. The hope is to generate probe molecules for the majority of the kinome that is as yet untargeted. | non-commercial | 366 | GSK [70]
Guide to Pharmacology | One of the main aims is to provide a searchable database with quantitative information on drug targets and the prescription medicines and experimental drugs that act on them. In future versions they plan to add resources for education and training in pharmacological principles and techniques along with research guidelines and overviews of key topics. They hope that the IUPHAR/BPS Guide to PHARMACOLOGY will be useful for researchers and students in pharmacology and drug discovery and provide the general public with accurate information on the basic science underlying drug action. | non-commercial | 6,513 | Guide to Pharmacology [22]
Informer Set 2.0 | The Broad Institute generated an 'Informer Set' of 481 small-molecule probes and drugs that selectively target distinct nodes in cell circuitry and that collectively modulate a broad array of cell processes. They quantitatively measured the sensitivity of 860 deeply characterized cancer-cell lines to Informer Set compounds, and have undertaken analyses connecting sensitivity to cancer features, including mutations, gene expression, copy-number variation, and lineage. | non-commercial | 481 | Broad Institute [31]
Kinase Inhibitors (best-in-class) | Selective small-molecule inhibitors of protein kinases can serve as powerful tools to elucidate biological function. Efforts to develop potential drug candidates have yielded a wealth of kinase inhibitors. However, selecting the optimal kinase inhibitor for a particular application can be challenging. While the optimal inhibitor will be application specific, we have attempted to summarize some of the best reported inhibitors for various kinases. | non-commercial | 96 | Cell [71,72]
LINCS compound set | The Database contains all publicly available HMS LINCS datasets and information for each dataset about experimental reagents (small molecule perturbagens. | non-commercial | 498 | Library of Integrated Network-based Cellular Signatures [73]
MLSMR Probes+ | This plated set of 1,133 compounds contains 216 probe molecules. The set was enhanced with 917 structure activity relationship (SAR) compounds - additional compounds synthesized during the probe projects or chemically similar compounds selected from the MLSMR. The SAR and similarity compounds are expected to be useful leads in finding modifiers of proteins in gene families related to each probe target. Not all compounds were included in the final arrays due to availability or poor storage stability. | non-commercial | 1,132 | NIH Small Molecule Repository [67]
NIH Clinical Collections (NCC) | The NIH Clinical Collection is a plated array of 719 small molecules that have a history of use in human clinical trials. The collection was assembled by the National Institutes of Health (NIH) through the Molecular Libraries Roadmap Initiative as part of its mission to enable the use of compound screens in biomedical research. Similar collections of FDA approved drugs have proven to be rich sources of undiscovered bioactivity and therapeutic potential. The clinically tested compounds in the NCC are highly drug-like with known safety profiles. These compounds can provide excellent starting points for medicinal chemistry optimization and may even be appropriate for direct human use in new disease areas. | non-commercial | 718 | NIH/NCI
NIH Mechanistic Set III | The Mechanistic Set III, which consists of 813 compounds, was derived from the 37,836 open compounds that have been tested in the NCI human tumor 60 cell line screen. In contrast to the original diversity set of 1,990 compounds, which was chosen on the basis of structural diversity, this mechanistic diversity set was chosen to represent a broad range of growth inhibition patterns in the 60 cell line screen, based on the GI50 activity of the compounds. | non-commercial | 808 | NIH/NCI
NPC Screening Collection | The NCGC Pharmaceutical Collection (NPC) is a comprehensive, publicly accessible collection of approved and investigational drugs for high-throughput screening that provides a valuable resource for both validating new models of disease and better understanding the molecular basis of disease pathology and intervention. | non-commercial | 3,257 | NIH/NCATS [39]
NURSA ligand set | The mission of NURSA is to accrue, develop, and communicate information that advances our understanding of the roles of nuclear receptors (NRs) and coregulators in human physiology and disease. The NURSA website has been developed over the past decade into a comprehensive source of information about NRs and their co-regulators, ligands, and downstream transcriptional targets. | non-commercial | 299 | NURSA - Nuclear Receptor Signaling Atlas [29]
Wellcome Trust Cancer Drugs | These compounds include cytotoxic chemotherapeutics as well as targeted therapeutics from commercial sources, academic collaborators, and from the biotech and pharmaceutical industries. | non-commercial | 239 | Wellcome Trust [74]
Axon Medchem Screening Library | Axon Ligands are a unique collection of biological molecules, as world-wide recognized research tools and drug standards in different application fields such as neurological disorders, cardiovascular disease, pain and inflammation, and cancer. Featured ligands with our expertise including CNS reagents, ion channel modulators, signal transduction regulators (such as kinase inhibitors) and much more. | commercial | 1,392 | Axon Medchem
Cayman Chemical Bioactives | Sixty staff chemists synthesize, purify, and characterize the small molecules and biochemicals you need to take your research further, including drug-like heterocycles, complex biolipids and fatty acids, inhibitors, activators, and modulators. | commercial | 8,262 | Cayman Chemical
LOPAC library | Collection of 1,280 Pharmacologically-Active Sigma Compounds. Includes the latest, drug-like molecules in the fields of Cell Signaling & Neuroscience. | commercial | 1,278 | Sigma
MedChemExpress Bioactive Compound Library | A unique collection of 3,236 small molecule compounds for drug screening, drug target identification, and other pharmaceutical-related applications. | commercial | 3,232 | MedChemExpress
Prestwick Chemical Library 1.0 | A unique collection of 1,120 small molecules, 100% approved drugs (FDA, EMA and other agencies) selected by a team of medicinal chemists and pharmacists for their high chemical and pharmacological diversity as well as for their known bioavailability and safety in humans. Designed to increase the potential of getting "high-quality" hits, our chemical screening library is a valuable tool to accelerate lead discovery. | commercial | 1,112 | Prestwick
Prestwick Chemical Library 2.0 | A unique collection of 1,280 small molecules, 100% approved drugs (FDA, EMA and other agencies) selected by a team of medicinal chemists and pharmacists for their high chemical and pharmacological diversity as well as for their known bioavailability and safety in humans. Designed to increase the potential of getting "high-quality" hits, our chemical screening library is a valuable tool to accelerate lead discovery. | commercial | 1,279 | Prestwick
Selleckchem Bioactive Compound Library | A unique collection of 2,661 bioactive chemical compounds for high throughput screening (HTS) and high content screening (HCS). | commercial | 2,624 | Selleckchem
The Spectrum Collection | The Spectrum Collection presents 2,560 compounds and includes all of the compounds in the US and International Drug Collections, together with our Natural Product and Discover libraries. This unique resource provides biologically active and structurally diverse compounds that create the optimum opportunity for discovery in new and established bioassays in HTS or low capacity target specific assays. | commercial | 2,555 | MicroSource
Tocriscreen Plus | A library of 1,280 biologically active compounds from the Tocris catalog. Covers a wide range of pharmacological targets and research areas. | commercial | 1,279 | Tocris
Tocriscreen Total | A collection of 1,120 biologically active compounds supplied as pre-dissolved DMSO solutions (250 µl 10 mM solution per compound). | commercial | 1,119 | Tocris
Other bioactive compounds | Other bioactive compounds harvested from different non-specific sources. Only data from external databases are available for these compounds. | other | 60 | -
EXTERNAL SOURCES Supplementary Table 3
The overview of external sources used for the compound description on the portal:
Name | Harvested data | Comment | License | URL
ChEMBL database [18] | ChEMBL ID, Compound preferred name, Compound trade names, Max phase attribute, Approved drug attribute, Target bioactivity data, Target ontology | Activity data are currently extracted only for human, rat and mouse targets where: a direct protein is assigned, a pChEMBL value is available, and a confidence score is greater than or equal to 7. When more than one value for a ligand-target complex is available, the average of these values is calculated. | Creative Commons Attribution-ShareAlike 3.0 Unported License | https://www.ebi.ac.uk/chembl
Guide to PHARMACOLOGY [22] | Whole compound set, GtoPdb ID, Compound preferred name, Approved drug attribute, Target bioactivity data, Primary targets, Target ontology | All activity data are extracted, even the ones where a bioactivity value is not known. | Creative Commons Attribution-ShareAlike 3.0 Unported License | http://www.guidetopharmacology.org
Reactome [25] | Pathways, Pathway ontology | Reactome pathways are matched according to target UniProt IDs. | Creative Commons Attribution 4.0 International License | http://www.reactome.org/
UniProt [26] | Target names, Gene names | Data are matched according to target UniProt IDs. | Creative Commons Attribution-NoDerivs 3.0 Unported | http://www.uniprot.org/
UniChem [75] | External IDs | UniChem service is used to harvest external IDs from all available external sources. For harvesting, the chembl_webresource_client Python package is employed. | Creative Commons Zero (CC-0) license | https://www.ebi.ac.uk/unichem/
ChemSpider [76] | Names, ChemSpider IDs | Compounds are matched using the ChemSpiPy Python package. | Creative Commons Attribution-ShareAlike 3.0 United States License | http://www.chemspider.com/
SMPDB [77] | Ligand pathway subjects, Ligand pathways | - | SMPDB is offered to the public as a freely available resource. | http://smpdb.ca/
PubChem [46] | Compound structures | PubChem is the main source for the manual extraction of compound structures. Generally, when a compound misses its structure (or the structure is wrong), it is found according to a name provided by a supplier/provider. In many cases, compounds are also identified by their PubChem CIDs/SIDs. | Public data | https://pubchem.ncbi.nlm.nih.gov/
MolPort | Availability (in stock compounds) | From MolPort, the information about the availability of compounds is used. All external IDs to MolPort are harvested through the UniChem service. | Terms of use | https://www.molport.com/shop/index
Mcule | Availability (in stock compounds) | From Mcule, the information about the availability of compounds is used. All external IDs to Mcule are harvested through the UniChem service. | Terms of use | https://mcule.com/
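The extraction rule given for ChEMBL activity data in the table above (human, rat and mouse targets only; direct protein assigned; pChEMBL value available; confidence score of at least 7; averaging of multiple values per ligand-target pair) can be sketched as a filter followed by a group-wise mean. The record layout and field names below are assumptions made for illustration and do not mirror the ChEMBL schema or the portal's code.

```python
from collections import defaultdict
from statistics import mean

ALLOWED_ORGANISMS = {"Homo sapiens", "Rattus norvegicus", "Mus musculus"}


def aggregate_activities(records, min_confidence=7):
    """Filter activity records and average pChEMBL values per (compound, target)."""
    grouped = defaultdict(list)
    for r in records:
        if r["organism"] not in ALLOWED_ORGANISMS:
            continue
        if not r["direct_protein"]:
            continue
        if r["pchembl_value"] is None:
            continue
        if r["confidence_score"] < min_confidence:
            continue
        grouped[(r["compound"], r["target"])].append(r["pchembl_value"])
    return {pair: mean(values) for pair, values in grouped.items()}


# Hypothetical records; field names are illustrative.
records = [
    {"compound": "cpd-1", "target": "P03372", "organism": "Homo sapiens",
     "direct_protein": True, "pchembl_value": 7.0, "confidence_score": 9},
    {"compound": "cpd-1", "target": "P03372", "organism": "Homo sapiens",
     "direct_protein": True, "pchembl_value": 8.0, "confidence_score": 8},
    {"compound": "cpd-1", "target": "P03372", "organism": "Homo sapiens",
     "direct_protein": True, "pchembl_value": 5.0, "confidence_score": 4},   # low confidence
    {"compound": "cpd-2", "target": "P04150", "organism": "Bos taurus",
     "direct_protein": True, "pchembl_value": 6.5, "confidence_score": 9},   # other organism
    {"compound": "cpd-3", "target": "P04150", "organism": "Mus musculus",
     "direct_protein": True, "pchembl_value": None, "confidence_score": 9},  # no pChEMBL value
]
activities = aggregate_activities(records)
print(activities)
```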
REFERENCES
1. Frye SV: The art of the chemical probe. Nat Chem Biol 2010, 6(3):159-161.
2. Bunnage ME, Chekler EL, Jones LH: Target validation using chemical probes. Nat Chem Biol 2013, 9(4):195-199.
3. Schreiber SL, Kotz JD, Li M, Aube J, Austin CP, Reed JC, Rosen H, White EL, Sklar LA, Lindsley CW et al: Advancing Biological Understanding and Therapeutics Discovery with Small-Molecule Probes. Cell 2015, 161(6):1252-1265.
4. Garbaccio RM, Parmee ER: The Impact of Chemical Probes in Drug Discovery: A Pharmaceutical Industry Perspective. Cell Chem Biol 2016, 23(1):10-17.
5. Frye SV: Unlocking the potential of chemical probes for methyl-lysine reader proteins. Future Med Chem 2015, 7(14):1831-1833.
6. van Hattum H, Waldmann H: Chemical biology tools for regulating RAS signaling complexity in space and time. Chem Biol 2014, 21(9):1185-1195.
7. Workman P, Collins I: Probing the probes: fitness factors for small molecule tools. Chem Biol 2010, 17(6):561-577.
8. Arrowsmith CH, Audia JE, Austin C, Baell J, Bennett J, Blagg J, Bountra C, Brennan PE, Brown PJ, Bunnage ME et al: The promise and peril of chemical probes. Nat Chem Biol 2015, 11(8):536-541.
9. Oprea TI, Bologa CG, Boyer S, Curpan RF, Glen RC, Hopkins AL, Lipinski CA, Marshall GR, Martin YC, Ostopovici-Halip L et al: A crowdsourcing evaluation of the NIH chemical probes. Nat Chem Biol 2009, 5(7):441-447.
10. Wang Y, Cornett A, King FJ, Mao Y, Nigsch F, Paris CG, McAllister G, Jenkins JL: Evidence-Based and Quantitative Prioritization of Tool Compounds in Phenotypic Drug Discovery. Cell Chem Biol 2016, 23(7):862-874.
11. Irwin JJ, Duan D, Torosyan H, Doak AK, Ziebart KT, Sterling T, Tumanian G, Shoichet BK: An Aggregation Advisor for Ligand Discovery. J Med Chem 2015, 58(17):7076-7087.
12. Baell JB, Holloway GA: New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J Med Chem 2010, 53(7):2719-2740.
13. Lipinski CA, Litterman NK, Southan C, Williams AJ, Clark AM, Ekins S: Parallel worlds of public and commercial bioactive chemistry data. J Med Chem 2015, 58(5):2068-2076.
14. Baell J, Walters MA: Chemistry: Chemical con artists foil drug discovery. Nature 2014, 513(7519):481-483.
15. Gilson MK, Liu T, Baitaluk M, Nicola G, Hwang L, Chong J: BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res 2016, 44(D1):D1045-1053.
16. Howe EA, de Souza A, Lahr DL, Chatwin S, Montgomery P, Alexander BR, Nguyen DT, Cruz Y, Stonich DA, Walzer G et al: BioAssay Research Database (BARD): chemical biology and probe-development enabled by structured metadata and result types. Nucleic Acids Res 2015, 43(Database issue):D1163-1170.
17. Hastings J, de Matos P, Dekker A, Ennis M, Harsha B, Kale N, Muthukrishnan V, Owen G, Turner S, Williams M et al: The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res 2013, 41(Database issue):D456-463.
18. Bento AP, Gaulton A, Hersey A, Bellis LJ, Chambers J, Davies M, Kruger FA, Light Y, Mak L, McGlinchey S et al: The ChEMBL bioactivity database: an update. Nucleic Acids Res 2014, 42(Database issue):D1083-1090.
19. Roider HG, Pavlova N, Kirov I, Slavov S, Slavov T, Uzunov Z, Weiss B: Drug2Gene: an exhaustive resource to explore effectively the drug-target relation network. BMC Bioinformatics 2014, 15:68.
20. Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J: DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 2006, 34(Database issue):D668-672.
21. Ursu O, Holmes J, Knockel J, Bologa CG, Yang JJ, Mathias SL, Nelson SJ, Oprea TI: DrugCentral: online drug compendium. Nucleic Acids Res 2016.
22. Southan C, Sharman JL, Benson HE, Faccenda E, Pawson AJ, Alexander SP, Buneman OP, Davenport AP, McGrath JC, Peters JA et al: The IUPHAR/BPS Guide to PHARMACOLOGY in 2016: towards curated quantitative interactions between 1300 protein targets and 6000 ligands. Nucleic Acids Res 2016, 44(D1):D1054-1068.
23. Nguyen DT, Mathias S, Bologa C, Brunak S, Fernandez N, Gaulton A, Hersey A, Holmes J, Jensen LJ, Karlsson A et al: Pharos: Collating protein information to shed light on the druggable genome. Nucleic Acids Res 2016.
24. Wang Y, Bryant SH, Cheng T, Wang J, Gindulyte A, Shoemaker BA, Thiessen PA, He S, Zhang J: PubChem BioAssay: 2017 update. Nucleic Acids Res 2017, 45(D1):D955-D963.
25. Croft D, Mundo AF, Haw R, Milacic M, Weiser J, Wu G, Caudy M, Garapati P, Gillespie M, Kamdar MR et al: The Reactome pathway knowledgebase. Nucleic Acids Res 2014, 42(Database issue):D472-477.
26. UniProt C: UniProt: a hub for protein information. Nucleic Acids Res 2015, 43(Database issue):D204-212.
27. Sterling T, Irwin JJ: ZINC 15 - Ligand Discovery for Everyone. J Chem Inf Model 2015, 55(11):2324-2337.
28. Basu A, Bodycombe NE, Cheah JH, Price EV, Liu K, Schaefer GI, Ebright RY, Stewart ML, Ito D, Wang S et al: An interactive resource to identify cancer genetic and lineage dependencies targeted by small molecules. Cell 2013, 154(5):1151-1161.
29. Lanz RB, Jericevic Z, Zuercher WJ, Watkins C, Steffen DL, Margolis R, McKenna NJ: Nuclear Receptor Signaling Atlas (www.nursa.org): hyperlinking the nuclear receptor signaling community. Nucleic Acids Res 2006, 34(Database issue):D221-226.
30. Gaulton A, Overington JP: Role of open chemical data in aiding drug discovery and design. Future Med Chem 2010, 2(6):903-907.
31. Seashore-Ludlow B, Rees MG, Cheah JH, Cokol M, Price EV, Coletti ME, Jones V, Bodycombe NE, Soule CK, Gould J et al: Harnessing Connectivity in a Large-Scale Small-Molecule Sensitivity Dataset. Cancer Discov 2015, 5(11):1210-1223.
32. Elkins JM, Fedele V, Szklarz M, Abdul Azeez KR, Salah E, Mikolajczyk J, Romanov S, Sepetov N, Huang XP, Roth BL et al: Comprehensive characterization of the Published Kinase Inhibitor Set. Nat Biotechnol 2016, 34(1):95-103.
33. Santos R, Ursu O, Gaulton A, Bento AP, Donadi RS, Bologa CG, Karlsson A, Al-Lazikani B, Hersey A, Oprea TI et al: A comprehensive map of molecular drug targets. Nat Rev Drug Discov 2017, 16(1):19-34.
34. Rees MG, Seashore-Ludlow B, Cheah JH, Adams DJ, Price EV, Gill S, Javaid S, Coletti ME, Jones VL, Bodycombe NE et al: Correlating chemical sensitivity and basal gene expression reveals mechanism of action. Nat Chem Biol 2016, 12(2):109-116.
35. Lounkine E, Keiser MJ, Whitebread S, Mikhailov D, Hamon J, Jenkins JL, Lavan P, Weber E, Doak AK, Cote S et al: Large-scale prediction and testing of drug activity on side-effect targets. Nature 2012, 486(7403):361-+.
36. Seiler KP, George GA, Happ MP, Bodycombe NE, Carrinski HA, Norton S, Brudz S, Sullivan JP, Muhlich J, Serrano M et al: ChemBank: a small-molecule screening and cheminformatics resource database. Nucleic Acids Res 2008, 36(Database issue):D351-359.
37. Open PHACTS Explorer [https://explorer.openphacts.org/]
38. Stierand K, Harder T, Marek T, Hilbig M, Lemmen C, Rarey M: The Internet as Scientific Knowledge Base: Navigating the Chem-Bio Space. Mol Inform 2012, 31(8):543-546.
39. Huang R, Southall N, Wang Y, Yasgar A, Shinn P, Jadhav A, Nguyen DT, Austin CP: The NCGC pharmaceutical collection: a comprehensive resource of clinically approved drugs enabling repurposing and chemical genomics. Sci Transl Med 2011, 3(80):80ps16.
40. Hohman M, Gregory K, Chibale K, Smith PJ, Ekins S, Bunin B: Novel web-based tools combining chemistry informatics, biology and social networks for drug discovery. Drug Discov Today 2009, 14(5-6):261-270.
41. Williams AJ, Harland L, Groth P, Pettifer S, Chichester C, Willighagen EL, Evelo CT, Blomberg N, Ecker G, Goble C et al: Open PHACTS: semantic interoperability for drug discovery. Drug Discov Today 2012, 17(21-22):1188-1198.
42. Open PHACTS for Researchers - The Data [https://www.openphacts.org/2/sci/data.html]
43. Illuminating the Druggable Genome (IDG) [https://commonfund.nih.gov/idg/index]
44. Truchon JF, Bayly CI: Evaluating virtual screening methods: good and bad metrics for the "early recognition" problem. J Chem Inf Model 2007, 47(2):488-508.
45. O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR: Open Babel: An open chemical toolbox. J Cheminform 2011, 3:33.
46. Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han LY, He JE, He SQ, Shoemaker BA et al: PubChem Substance and Compound databases. Nucleic Acids Research 2016, 44(D1):D1202-D1213.
47. standardiser [https://github.com/flatkinson/standardiser]
48. Barsyte-Lovejoy D, Li FL, Oudhoff MJ, Tatlock JH, Dong AP, Zeng H, Wu H, Freeman SA, Schapira M, Senisterra GA et al: (R)-PFI-2 is a potent and selective inhibitor of SETD7 methyltransferase activity in cells. P Natl Acad Sci USA 2014, 111(35):12853-12858.
49. Heller S, McNaught A, Stein S, Tchekhovskoi D, Pletnev I: InChI - the worldwide chemical structure identifier standard. J Cheminformatics 2013, 5.
50. Creative Commons BY-SA 4.0 [https://creativecommons.org/licenses/by-sa/4.0/]
51. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ: Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 2001, 46(1-3):3-26.
52. MolPort [https://www.molport.com/shop/index]
53. mcule [https://mcule.com/]
54. Rogers D, Hahn M: Extended-connectivity fingerprints. J Chem Inf Model 2010, 50(5):742-754.
55. Skuta C, Bartunek P, Svozil D: InCHlib - interactive cluster heatmap for web applications. J Cheminform 2014, 6(1):44.
56. ChEMBL Target Tree [https://www.ebi.ac.uk/chembl/target/browser]
57. Guide to PHARMACOLOGY Target Tree [http://www.guidetopharmacology.org/targets.jsp]
58. Bemis GW, Murcko MA: The properties of known drugs. 1. Molecular frameworks. J Med Chem 1996, 39(15):2887-2893.
59. Heikamp K, Bajorath J: Large-scale similarity search profiling of ChEMBL compound data sets. J Chem Inf Model 2011, 51(8):1831-1839.
60. Clemons PA, Bodycombe NE, Carrinski HA, Wilson JA, Shamji AF, Wagner BK, Koehler AN, Schreiber SL: Small molecules of different origins have distinct distributions of structural complexity that correlate with protein-binding profiles. Proc Natl Acad Sci USA 2010, 107(44):18787-18792.
61. Maggiora G, Vogt M, Stumpfe D, Bajorath J: Molecular similarity in medicinal chemistry. J Med Chem 2014, 57(8):3186-3204.
62. Willett P: Similarity-based virtual screening using 2D fingerprints. Drug Discov Today 2006, 11(23-24):1046-1053.
63. Historic compounds (ChemicalProbes.org) [http://www.chemicalprobes.org/historic_compounds]
64. Python Language Reference, version 2.7. [http://www.python.org]
65. Django (Version 1.10) [https://djangoproject.com]
66. PostgreSQL (version 9.5) [https://www.postgresql.org/]
67. In: Probe Reports from the NIH Molecular Libraries Program. Bethesda (MD); 2010.
68. Nature Chemical Biology Probes [http://www.nature.com/nchembio/chemical_probes.html]
69. SGC Chemical Probes [http://www.thesgc.org/chemical-probes]
70. Published Kinase Inhibitor Set [https://www.ebi.ac.uk/chembldb/extra/PKIS/]
71. Wang J, Gray NS: SnapShot: Kinase Inhibitors II. Mol Cell 2015, 58(4):710e711.
72. Wang J, Gray NS: SnapShot: Kinase Inhibitors I. Mol Cell 2015, 58(4):708e701.
73. HMS LINCS database - Small molecules [http://lincs.hms.harvard.edu/db/sm/]
74. Yang W, Lightfoot H, Bignell G, Behan F, Cokelear T, Haber D, Engelman J, Stratton M, Benes C, McDermott U et al: Genomics of Drug Sensitivity in Cancer (GDSC): A resource for biomarker discovery in cancer cells. Eur J Cancer 2016, 68:S82-S82.
75. Chambers J, Davies M, Gaulton A, Papadatos G, Hersey A, Overington JP: UniChem: extension of InChI-based compound mapping to salt, connectivity and stereochemistry layers. J Cheminformatics 2014, 6.
76. Pence HE, Williams A: ChemSpider: An Online Chemical Information Resource. J Chem Educ 2010, 87(11):1123-1124.
77. Jewison T, Su YL, Disfany FM, Liang YJ, Knox C, Maciejewski A, Poelzer J, Huynh J, Zhou Y, Arndt D et al: SMPDB 2.0: Big Improvements to the Small Molecule Pathway Database. Nucleic Acids Research 2014, 42(D1):D478-D484.
Nature Methods doi:10.1038/nmeth.4365