
Supplementary Notes

Probes & Drugs portal: interactive approach to Open Data exploration in chemical biology

Ctibor Skuta, Martin Popr, Tomas Muller, Jindrich Jindrich, Michal Kahle, David Sedlak, Daniel Svozil and Petr Bartunek

Supplementary Note 1 Bioactive compound tools and software applications
Supplementary Note 2 Probes & Drugs portal database
2.1 Compounds
2.1.1 Compounds processing
2.1.2 Compound standardization
2.1.3 Compounds external data
2.2 Compound sets
2.3 Data license and accessibility
Supplementary Note 3 Web interface
3.1 Help
Supplementary Note 4 Filtering system
Supplementary Note 5 Visualizations
5.1 Venn diagram
5.2 Chemical space
5.3 Cluster heatmap
5.4 Summary visualizations (biological, physicochemical, scaffolds)
5.4.1 Biological summary
5.4.2 Physicochemical properties distribution
5.4.3 Scaffold summary
Supplementary Note 6 Chemical intelligence
6.1 Structural alerts
Supplementary Note 7 Ontologies
Supplementary Note 8 Custom sets
Supplementary Table 1 Programming tools
Supplementary Table 2 Compound sets
Supplementary Table 3 External sources
References

Nature Methods doi:10.1038/nmeth.4365


Supplementary Note 1 BIOACTIVE COMPOUND TOOLS AND SOFTWARE APPLICATIONS

Chemical probes are indispensable tools in modern biology. These compounds are commonly used to study a gene function, validate molecular targets or dissect complex processes within cells and organisms [1-6]. Although chemical probes should be well-described, potent, selective tools, it has been demonstrated that not all of them possess these qualities. For some, their biological characteristics were disproved or their suitability questioned [7-12]. Still, many such compounds are incorporated into commercial screening libraries and are being used by the community [8, 11, 13, 14]. However, this situation doesn't stem primarily from the misleading information on the vendor websites, but mainly from the lack of tools that simplify suitable tool selection. Data, not only about chemical probes, but generally about all bioactive compound tools, are scattered over various data sources [11, 15-31] and research papers [10, 32-35], which makes the search for the right chemical tool a very complex and time-consuming task.

One of the resources that emerged to bring order to the field of chemical probes is a community-driven portal Chemical Probes.org [8]. Chemical Probes.org is a high-quality resource containing a validated set of compounds that can be employed as probes on specific targets. Basic compound data are enriched with ratings by experts in the field of chemical biology along with comments on the probes' usage and, in some cases, possible pitfalls. The portal also contains a list of obsolete (historical) compounds that are no longer recommended to be used as probes (6.1), either because there are currently higher-quality alternatives or because their former biological properties were disproved. Although Chemical Probes.org is a great resource, it contains "only" 158 probes (April 2017), which corresponds with the community-driven approach and the quality of the data, but which can also turn out to be insufficient in many cases.

To alleviate these problems, we developed the Probes & Drugs (P&D) portal. The P&D portal is a unique tool which compiles data from publicly available databases of biologically active molecules, such as ChEMBL [18], Guide to PHARMACOLOGY [22], DrugBank [20], DrugCentral [21], ChEBI [17], BindingDB [15], ZINC [27] or ChemBank [36]. However, these databases support only very basic queries, data analysis and visualization tools. The added value of P&D is that it delivers a strong combination of a detailed property and functional annotation with advanced query, filtering and visualization features. Though there exist several web applications that provide an interface to chemical and/or biological data extracted from external resources, such as Open PHACTS Explorer [37], ChemBioNavigator [38], the NPC browser [39], the CDD Vault [40] and Pharos [23], the P&D portal allows one to instantly perform multi-conditional queries and answer complex questions, unlike any of the other resources.

Open PHACTS Explorer [37] and ChemBioNavigator [38] both provide an interface to the Open PHACTS API [41] which brings together several publicly available pharmacological resources. While the Open PHACTS Explorer offers a rather elementary tabular view with basic information about compound structure, physicochemical properties, and its pharmacology, ChemBioNavigator enables one to fetch multiple data sets and compare them using simple scatter plots. However, the functionality of both tools is degraded by dated data sets provided through the Open PHACTS API (as of February 2017, the most recent data update occurred on 31st March 2015 according to the Open PHACTS webpages [42]). In addition, since the update of the Open PHACTS API from ver. 1.4 to 2.0 (4th February 2016), the ChemBioNavigator stopped working and users are redirected to the Open PHACTS Explorer instead. The NPC browser [39] is a desktop application that simplifies the analysis of the NCGC Pharmaceutical Collection [39]. While it contains some basic selection tools, it has not been updated, together with the NCGC Pharmaceutical Collection, since 2012. The CDD Vault (Collaborative Drug Discovery Vault) [40] is a commercial web application (with a possibility to create a free account with a limited functionality) that enables an analysis of compounds collected from various commercial and non-commercial resources. A registered user can create collections from these sets, work with them and use very basic visualization functions (currently in a beta state). The recently published portal Pharos [23] is a web interface to the Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) program [43]. It is a target-centric tool that enables a simple exploration of chemical and biological data and provides tools for disease- and ligand-based browsing (currently, both as a work in progress). The interface is integrated with summarizing visualizations and a filtering system which, on the basis of data annotations and a number of implemented ontologies, enables one to unite multiple searches into a working data set.

Supplementary Note 2 PROBES & DRUGS PORTAL DATABASE

The Probes & Drugs (P&D) portal is a ligand-centric web application which aims to cover the chemical space of commonly used bioactive compound tools and enable its exploration from various points of view through an intuitive and yet very powerful filtering system (Supplementary Note 4). This system, enhanced by Boolean logic in combination with integrated visualizations (Supplementary Note 5), ontologies (Supplementary Note 7), chemical intelligence (Supplementary Note 6) and the possibility to create user-defined sets (so-called custom sets) (Supplementary Note 8), makes it a unique discovery platform in the field of bioactive compound tools.

2.1 COMPOUNDS

The P&D portal was created with an idea to reflect the current state of bioactive compound space. Therefore, its compound base was assembled primarily out of established, non-commercial as well as commercial, bioactive compound sources with a high attention given to compounds labelled as probes or drugs (2.2). With the current size (as of May 2017) of 29,898 bioactives, the portal doesn't aim to be an ultimate source for compounds that have ever shown a bioactive potential, but rather a tool for working with the most frequently used compound tools in biological experiments.

2.1.1 Compounds processing

Compounds, together with all associated data attributes (e.g., name, target, CAS number etc.), are uploaded to the P&D portal from .sdf (Structure Data File) or .csv (Comma Separated Values) files. While only some data attributes are displayed in the compounds view, all attributes are shown in a compound detail view in the Source data tab. Compound structures are parsed and canonicalized using RDKit [44], a primary cheminformatics framework used throughout the portal for all chemical related functions. If a structure is unparsable by the RDKit, the OpenBabel cheminformatics framework [45] is used which, in some cases, is able to parse the structure and re-generate it in a form parsable by the RDKit. If neither framework is able to parse a certain structure, we try to find its proper form manually. When a compound is parsed, its standardized form (2.1.2) is generated and external data (e.g., bioactivities, external IDs, tags etc.) are associated with it.


Uploaded structures are not manually curated. However, if a structure is not available or parsable, we try to resolve it by a search in the PubChem database [46] or by matching its IDs or supplied names. Sometimes, a compound set is available only in the form of web pages (e.g., NURSA ligand set, Chemical Probes.org set, etc.). In this case, data are harvested automatically, if possible, or manually, if necessary.

2.1.2 Compound standardization

Compound standardization is a process in which various chemical forms of a single compound are unified (Figure 1). Compound standardization on the P&D portal comprises several steps: the removal of stereochemistry, salt/solvate components and isotope labels, the neutralization/standardization of charges and breaking of non-existent (erroneously depicted) covalent bonds where ionic bonds occur.

As various stereoisomers can show different potencies or target profiles, the original stereoisomers with their corresponding data are always available to the user. However, sometimes it is difficult to assess a proper stereochemistry from provider's or vendor's data, and therefore we also offer the option to investigate compounds with the stereochemistry removed. An illustrative example is provided by Docetaxel, which exists in six different forms in the P&D portal. However, upon closer examination it becomes clear that the stereochemistry was not assigned consistently between different vendors (a common case is that one vendor assigns a specific stereochemistry to a given bond while another vendor reports the same bond without a stereochemistry) and that these forms very likely represent the same molecule. Upon standardization, Docetaxel is shown as only one compound in the Standardized compounds view (with all metadata merged), while its six original representations remain accessible for users to later choose which stereoisomers are best suited for their specific application.

On the P&D portal, an improved version of the standardiser Python package [47] is used for compound standardization. Unlike the original version, it is able to deal with multi-component systems containing more than one organic compound (i.e., mainly mixtures, but also including polymers in a form of their building blocks). When a mixture undergoes standardization, it is split into individual parts (molecules), each molecule is standardized separately and identical components are removed. If at the end of the process there is still more than one molecule, they are put back together in one system and tagged as multi-component.
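The mixture handling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the portal's code: the input is assumed to be a dot-separated SMILES string, and `standardize_molecule` is a hypothetical placeholder for the per-molecule standardization that the portal delegates to the improved standardiser package. Identical components are detected by plain string comparison, which is only meaningful if the standardization step returns canonical strings.

```python
from typing import Callable, List, Tuple


def standardize_molecule(smiles: str) -> str:
    """Hypothetical placeholder for the per-molecule standardization step.

    A real implementation would strip salts, isotopes and stereochemistry
    and neutralize charges; here the fragment is only trimmed.
    """
    return smiles.strip()


def standardize_mixture(
    smiles: str,
    standardize: Callable[[str], str] = standardize_molecule,
) -> Tuple[str, bool]:
    """Split a dot-separated SMILES, standardize parts, drop identical ones.

    Returns the recombined SMILES and a flag that is True when more than
    one distinct molecule remains (the "multi-component" tag).
    """
    parts = [p for p in smiles.split(".") if p]
    unique_parts: List[str] = []
    for part in parts:
        std = standardize(part)
        if std not in unique_parts:  # remove identical components
            unique_parts.append(std)
    return ".".join(unique_parts), len(unique_parts) > 1


if __name__ == "__main__":
    print(standardize_mixture("CCO.CCO"))       # two identical components
    print(standardize_mixture("CCO.c1ccccc1"))  # two different components
```

Passing the standardization step in as a callable keeps the splitting and de-duplication logic independent of whichever cheminformatics toolkit performs the actual structure clean-up.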

Inorganic compounds or compounds containing transition or basic metal/s are not fully standardized and their standardized form is equal to their non-isomeric form (original compound with removed stereochemistry).


Figure 1 Example of P&D standardization. Three different representations of one compound (PFI-2) matched together by standardization process. PFI-2 is a first-in-class, potent, selective, and cell-permeable inhibitor of the methyltransferase activity of human SETD7. [48]

2.1.3 Compounds external data

Each compound is enriched by additional metadata, such as external identifiers, bioactivity data, associated pathways, or target and pathway ontologies classification. These data are collected from various external sources (Supplementary Table 3) by matching compound's InChIKey [49], isomeric (if available), non-isomeric and standardized. Subsequently, also the connectivity part of InChIKey (first part encoding the three main layers of InChI) is used. The reason is to ensure that originally multi-component compounds (e.g., salts or compounds with solvents) are after standardization (which includes the removal of stereochemistry) matched to their isomeric forms in external sources (Figure 2). In the original compounds view, only data matched to the original compound isomeric InChIKey are shown; in the standardized compounds view, data matched by all other forms of InChIKey are added.
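The matching order described above (full InChIKeys first, then the connectivity part) can be illustrated with a short Python sketch. The function names, the dictionary-based external source and the InChIKeys in the example are hypothetical placeholders chosen for illustration; they are not real compound keys and not the portal's implementation.

```python
from typing import Dict, List, Optional, Tuple


def connectivity_block(inchikey: str) -> str:
    """Return the first (connectivity) block of an InChIKey."""
    return inchikey.split("-")[0]


def match_external(
    isomeric: Optional[str],
    non_isomeric: str,
    standardized: str,
    external: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Collect external records, trying full keys first, then connectivity.

    `external` maps full InChIKeys to a record label (an assumed structure).
    Returns pairs of (match level, record label).
    """
    hits: List[Tuple[str, str]] = []
    seen = set()
    full_keys = (
        ("isomeric", isomeric),
        ("non-isomeric", non_isomeric),
        ("standardized", standardized),
    )
    for level, key in full_keys:
        if key and key in external and key not in seen:
            hits.append((level, external[key]))
            seen.add(key)
    # Fall back to the connectivity part of the standardized form.
    block = connectivity_block(standardized)
    for key, record in external.items():
        if key not in seen and connectivity_block(key) == block:
            hits.append(("connectivity", record))
            seen.add(key)
    return hits


if __name__ == "__main__":
    # Placeholder keys: a salt (C...) whose standardized parent (A...)
    # appears in the external source only in its isomeric form.
    external = {"AAAAAAAAAAAAAA-BBBBBBBBSA-N": "external record"}
    print(match_external(
        isomeric="CCCCCCCCCCCCCC-DDDDDDDDSA-N",
        non_isomeric="CCCCCCCCCCCCCC-UHFFFAOYSA-N",
        standardized="AAAAAAAAAAAAAA-UHFFFAOYSA-N",
        external=external,
    ))
```

In the example, none of the three full keys is present in the external source, so the record is found only through the connectivity block of the standardized form, which mirrors the situation depicted in Figure 2.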


Figure 2 Different forms of a single compound along with their InChIKeys. To match the target compound (bottom) only the connectivity part of the compound's standardized form (right) can be used. Other isomeric (left) and non-isomeric (top) InChIKeys are completely different from the target compound's InChIKey.

2.1.3.1 Targets and bioactivities

Bioactivity data are currently extracted for Human, Rat and Mouse, for single protein and protein complex targets, from ChEMBL [18] and Guide to PHARMACOLOGY [22] databases. From ChEMBL, only records labelled with a confidence score >= 7, with pchembl value (i.e., the activity type is equal to pIC50, pEC50, pKi, pKd, pKB, pAC50, pA2 or Potency, all with $-\log(\mathrm{concentration\ [M]})$ units) and value relation '=' are extracted. When more values for one ligand-target complex are available, their average value is calculated. If available, this activity value is also annotated with a mechanism of action (MOA) and a primary target flag (i.e., the ligand is binding directly to the target).
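The ChEMBL selection criteria and the averaging step can be sketched as follows. The record layout and field names (`ligand`, `target`, `confidence_score`, `pchembl_value`, `relation`) are assumptions made for this illustration; they are not meant to reproduce the ChEMBL schema or the portal's extraction code.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple, TypedDict


class ActivityRecord(TypedDict):
    """Assumed minimal shape of one bioactivity record."""
    ligand: str
    target: str
    confidence_score: int
    pchembl_value: Optional[float]
    relation: str


def aggregate_activities(
    records: Iterable[ActivityRecord],
) -> Dict[Tuple[str, str], float]:
    """Keep records passing the three criteria and average per pair."""
    grouped: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for rec in records:
        if rec["confidence_score"] < 7:
            continue  # confidence score must be >= 7
        if rec["pchembl_value"] is None:
            continue  # a pchembl value must be present
        if rec["relation"] != "=":
            continue  # only exact values, no '<' or '>'
        grouped[(rec["ligand"], rec["target"])].append(rec["pchembl_value"])
    return {pair: mean(values) for pair, values in grouped.items()}


if __name__ == "__main__":
    demo: List[ActivityRecord] = [
        {"ligand": "L1", "target": "T1", "confidence_score": 9,
         "pchembl_value": 6.0, "relation": "="},
        {"ligand": "L1", "target": "T1", "confidence_score": 8,
         "pchembl_value": 7.0, "relation": "="},
        {"ligand": "L1", "target": "T1", "confidence_score": 5,
         "pchembl_value": 9.0, "relation": "="},   # low confidence
        {"ligand": "L2", "target": "T1", "confidence_score": 9,
         "pchembl_value": 5.5, "relation": ">"},   # inexact relation
    ]
    print(aggregate_activities(demo))
```

Only the first two demo records pass all three criteria, so the single ligand-target pair that remains is reported with the mean of their values.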

Guide to PHARMACOLOGY contains either an activity range for a ligand-target complex (minimum and maximum) or the median of activity values. There are also cases when the activity value of a ligand-target complex is not known; therefore, the N/A value is listed among the values on the portal. Again, if available, this activity value is also annotated with a mechanism of action (MOA) and a primary target flag.

On the portal, common targets from different sources for all organisms are unified into one target with all original data. First, the targets for individual organisms are matched to each other according to their UniProt IDs [26] and these are then merged on the basis of their name.


2.2 COMPOUND SETS

P&D portal compound sets were selected based on their popularity and availability. Upon user request, new compound sets can be easily incorporated within the P&D portal. Currently available compound sets are summarized in Supplementary Table 2. Compound sets which can vary in size (i.e., non-fixed compound sets harvested from live web portals or databases) are updated on a monthly basis.

Each compound set is characterized by six summarizing numbers, three for compounds in their original form and the same three for their standardized form respectively: a total number of compounds, number of unique compounds in the context of the P&D portal (i.e., compounds contained only in one particular set) and number of duplicates (i.e., identical compounds contained in the set). For standardized compound sets, the total and unique numbers are naturally equal or lower than for their original form; the number of duplicates is equal or greater. These statistics can be accessed in the Compound Sets view (Figure 3), where each number serves as a hyperlink to filter particular compound set in the Compounds view (Figure).
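One possible reading of the three summarizing numbers is sketched below. The exact counting rules are not spelled out in the text, so the definitions used here are assumptions: "total" counts distinct compound keys in a set, "unique" counts keys that occur in no other set, and "duplicates" counts the surplus entries that repeat a key already present in the same set. The function and set names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class SetStats(NamedTuple):
    total: int       # distinct compounds in the set (assumed definition)
    unique: int      # compounds found in this set only
    duplicates: int  # repeated entries within the set


def compound_set_stats(sets: Dict[str, List[str]]) -> Dict[str, SetStats]:
    """Compute the three summarizing numbers for every compound set.

    `sets` maps a set name to its list of compound keys; the keys could be
    original or standardized identifiers, giving the two triples of numbers.
    """
    distinct = {name: set(members) for name, members in sets.items()}
    # Number of sets in which each compound key occurs.
    occurrences = Counter(
        key for members in distinct.values() for key in members
    )
    stats: Dict[str, SetStats] = {}
    for name, members in sets.items():
        total = len(distinct[name])
        unique = sum(1 for key in distinct[name] if occurrences[key] == 1)
        duplicates = len(members) - total
        stats[name] = SetStats(total, unique, duplicates)
    return stats


if __name__ == "__main__":
    demo = {"A": ["c1", "c2", "c2", "c3"], "B": ["c3", "c4"]}
    for name, values in compound_set_stats(demo).items():
        print(name, values)
```

Under these assumed definitions, running the same function on standardized keys can only merge compounds, which is consistent with the statement that totals and unique counts do not increase while duplicate counts do not decrease.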

Figure 3 Compound sets view. Compound sets are divided into 5 categories: probe, drug, non-commercial, commercial and unassigned. For each set, several quantities are displayed: 1) a number of all original compounds (2.1.1) 2) a number of all unique original compounds 3) a number of duplicate compounds 4), 5), 6) a number of all/unique/duplicate compounds in a standardized compound set (2.1.2) 7) a link to the source page of a compound set 8) compound set description 9) the date of last update

2.3 DATA LICENSE AND ACCESSIBILITY

The data on the portal are available under the Creative Commons 4.0 license [50]. Currently, all compounds can be downloaded through the Export compounds function (Supplementary Note 3, Figure 1). In the future, all data will be available in the form of a database dump, and later through a web API.


Supplementary Note 3 WEB INTERFACE

The P&D portal is centered around the main working space (Figure 1) through which a user can access the compounds and ask specific questions about them, i.e., use the filtering system (Supplementary Note 4), visualizations (Supplementary Note 5) and ontologies (Supplementary Note 7).

Figure 1 Compounds view, the main working space of the portal. 1) The switch between original and standardized compounds view 2) Functional tabs (from left): the search (implicit) tab; compound and custom sets tab; compound properties and structural alerts tab; target ontologies tab; pathway ontologies tab 3) visualization buttons (from left): Venn diagrams; chemical space; cluster heatmap; compound set summaries 4) text and structure search 5) the number of currently selected compounds 6) action buttons (from left): add all current compounds to a custom set; highlight all filters among visible compound data; refresh current view; switch between detailed and simple compound view; download current compound set; help button 7) current compound set view.

At the beginning, a user has all P&D compounds at their disposal. Each compound is represented by its structure with a primary name, basic physicochemical properties (the Lipinski's Rule of Five) [51] and other information, such as a number of compounds with an identical scaffold on the portal or a number of matched structural alerts (6.1), represented by colored icons accompanied by a number (quantity) in some cases (Figure 2). While only compound structure representation is shown in a simple compounds view (Figure 2, Figure 3), most of the associated data are present in a detailed (default) compound view.


Figure 2 Default representation of a compound. 1) drug icon 2) probe icon 3) available to buy icon 4) show SMILES button (visible only on a mouse cursor hover) 5) add to custom set button (visible only on a mouse cursor hover) 6) compound depiction 7) find similar compounds button (visible only on a mouse cursor hover) 8) edit structure in the chemical editor button (visible only on a mouse cursor hover) 9) compound's primary name (selected from all associated names) 10) the Lipinski's Rule of Five compound properties (a red rectangle indicates that the maximum threshold (in brackets) of a given property was exceeded): molecular weight (500); the number of hydrogen bond donors (5); the number of hydrogen bond acceptors (10); the number of rotatable bonds (10); calculated octanol/water partition coefficient (5) 11) a number of compounds sharing the same standardized (parent) compound 12) the number of compounds with the same scaffold in the context of all compounds on the portal 13) the number of compound sets the compound belongs to 14) the number of associated structural (PAINS [12]) alerts.
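The threshold check behind the red rectangles in item 10 can be expressed as a small Python sketch. The threshold values are those listed in the caption; the property names, the function and the example values are hypothetical and serve only as an illustration.

```python
from typing import Dict, List

# Maximum thresholds as listed in the figure caption above.
THRESHOLDS: Dict[str, float] = {
    "molecular_weight": 500,
    "h_bond_donors": 5,
    "h_bond_acceptors": 10,
    "rotatable_bonds": 10,
    "logp": 5,
}


def exceeded_properties(props: Dict[str, float]) -> List[str]:
    """Return the names of properties above their maximum threshold.

    Properties missing from `props` are skipped rather than flagged.
    """
    return [
        name
        for name, limit in THRESHOLDS.items()
        if name in props and props[name] > limit
    ]


if __name__ == "__main__":
    # Made-up property values for a large, polar compound.
    example = {
        "molecular_weight": 807.9,
        "h_bond_donors": 5,
        "h_bond_acceptors": 14,
        "rotatable_bonds": 8,
        "logp": 2.4,
    }
    print(exceeded_properties(example))
```

A value equal to its threshold is not flagged, since the caption speaks of the maximum being exceeded.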

The colored icons above a compound's structure depict three compound tags: a drug tag, a probe tag and an availability tag. These tags are assigned according to a compound's membership in a particular compound set or according to its label in the external database.

A compound is tagged as a drug when it belongs to one of the drug sets (as of April 2017 to DrugBank, NIH Approved oncology drugs, DrugCentral or ChEMBL Approved Drugs) or when it is classified as a drug in ChEMBL or Guide to PHARMACOLOGY. The color coding of a drug icon depends on a drug type. A drug can be assigned to one or more of the following types: approved, investigational, experimental, illicit, withdrawn, nutraceutical or vet_approved drug. The following drug icon color coding is used:

1. When a drug is tagged as approved, the icon is green.

2. When a drug is tagged as withdrawn or illicit, the icon is red.

3. When a drug is tagged as investigational, the icon is yellow-green.

4. When a drug is tagged by some of the other values (experimental, nutraceutical, vet_approved), the icon is orange.

5. When it is unknown whether a compound is a drug, the icon is gray.
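The color coding above can be written as a simple lookup. Because a drug may carry several types at once and the text does not say which one wins, the sketch below assumes that the rules are checked in the order in which they are listed; this precedence, like the function name, is an assumption made for illustration.

```python
from typing import Iterable, Optional


def drug_icon_color(drug_types: Optional[Iterable[str]]) -> str:
    """Map a compound's drug types to an icon color.

    Rules are evaluated in the listed order (assumed precedence).
    """
    types = set(drug_types or ())
    if not types:
        return "gray"          # unknown whether the compound is a drug
    if "approved" in types:
        return "green"
    if types & {"withdrawn", "illicit"}:
        return "red"
    if "investigational" in types:
        return "yellow-green"
    return "orange"            # experimental, nutraceutical, vet_approved


if __name__ == "__main__":
    for example in (["approved"], ["withdrawn", "experimental"],
                    ["investigational"], ["nutraceutical"], None):
        print(example, "->", drug_icon_color(example))
```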

A compound is tagged as a probe when this information is specified by the data vendor and it is not tagged as obsolete (6.1) at the same time. Every compound from probe sets Chemical Probes.org, SGC probes, MLP probes and Nature Chemical Biology probes is tagged as a probe; also compounds tagged as probes in MLSMR Probes+ and Informer set 2.0 sets are tagged as a probe in the portal.

1. When a compound is tagged as a probe, the icon is green.

2. Otherwise, the icon is gray.

A compound is tagged as available, i.e., possible to buy, when it belongs to one of the commercial bioactive sets (Supplementary Table 2) or when it is present in the MolPort [52] or mcule [53] in-stock compound lists.

Figure 3 Data associated with a compound in the Compounds view. 1) compound name/s 2) the list of Custom sets the compound belongs to 3) the list of Compound sets (2.2) the compound belongs to 4) compound tags 5) source attributes (any text attributes extracted from the compound source data file/s) 6) Chemical Abstracts Registration Number/s (if available) 7) compound external IDs with hyperlinks to other databases 8) compound's P&D IDs 9) compound's targets with associated pathways 10) pathways in which the compound plays an active role (not necessarily as a ligand of a target)


All of a compound's data, together with their sources, can be accessed in the single compound detail view (Figure 4). In addition to the data shown in the Compounds view, a compound's structure in different formats (SMILES, InChI, InChIKey, MOL/SDF), longer descriptive texts and also all source data associated with a compound in its source file/s are accessible from the single compound detail view.

Figure 4 The single compound detail view.

3.1 HELP

Information about the portal and its usage is available in two main forms: FAQ (Frequently Asked Questions) and an interactive guided tour.

The FAQ section can be accessed under the Help section and will be updated according to the most frequent user questions and remarks.

The interactive guided tour was designed to describe most of the aspects concerning the usage of the portal, and the origin/processing of the data. The tour can be accessed from the Help section and also directly from the Compounds view through the green question mark icon in the top-left corner (Figure 5). Accompanying the tour, there are also three simple interactive examples that should provide a user with a basic notion of how to work with the data on the portal.


Figure 5 The Compounds view with an open Help menu (the top-left corner).


Supplementary Note 4 FILTERING SYSTEM

An intuitive, yet powerful filtering system enables a user to ask about various properties of small bioactive compounds. A user can construct not only simple queries (Figure 3), but also complicated, multi-conditional questions (Figure 4).

The cornerstone of the system is a single filter (Figure 1) which can be of various types (Figure 2). Generally, a filter represents a subset of the whole compound set that is applied on the basis of associated logical Boolean operators (AND = intersection, NOT = difference, OR = union). The result of a used filter depends on the selected operator and compound set on the input of the filter. In case of the first selected filter, the input set is naturally equal to the whole compound set (i.e., for the intersection, the result is equal to the compounds represented by the filter, for difference, to the whole compound set without the compounds represented by the filter, and for union, to the whole compound set, because any set united with its subset is equal to the former).

Figure 1 Similarity filter, one of the possible filter types. Most filter parts are identical for all filter types (points 1 to 6 and 12), but some of them are available only for a particular filter type (here 7 to 11). 1) A number of compounds at a filter input 2) Boolean operations (AND = intersection, NOT = difference, OR = union) applied to compound set on the input of a filter (1) and compound set represented by a filter (7) 3) Filter type 4) Number of compounds represented by a filter 5) Disable/enable filter button 6) Remove filter button 7) Structure to which a similarity is calculated 8) Draggable slider for similarity thresholds adjustment (interactively connected with text inputs 9 and 11) 9) The text input of bottom similarity threshold 10) Arrows enable the ordering of current compound set according to filter values 11) The text input of top similarity threshold 12) A number of compounds at a filter output


A single filter is not enough to express multi-conditional questions. Thus, any number of filters can be chained together into one logical expression. In this case, each filter is applied to the compound set resulting from the application of all of its predecessors (i.e., the first filter is applied to the whole compound set, the second to the result of the first, etc.). If the logical operator of any filter in the expression is changed, the expression is instantly re-evaluated and the result, including all of its sub-results, is recalculated. Using the mouse cursor, the filters can also be dragged and freely reordered; using the disable button, any filter can be temporarily disabled.

Generally, any kind of information on the P&D portal can be used as a filter (text attributes, substructures, targets, target classes, etc.) and most of them can be added by two approaches: through a search field with an autocomplete function or using a search icon that reveals itself when a mouse cursor hovers above a filter.
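The chained evaluation described above can be sketched with ordinary set operations. This is only a minimal illustration under stated assumptions, not the portal's implementation: the function name `apply_filter_chain`, the `(operator, members, enabled)` tuple layout and the integer compound IDs are hypothetical.

```python
def apply_filter_chain(all_compounds, filters):
    """Evaluate a chain of filters from left to right.

    Each filter is a hypothetical (operator, members, enabled) tuple, where
    `members` is the set of compounds represented by the filter. The first
    filter receives the whole compound set as its input; every later filter
    receives the output of its predecessor.
    """
    current = set(all_compounds)
    sub_results = []  # size of the compound set at each filter's output
    for operator, members, enabled in filters:
        if enabled:  # a disabled filter passes its input through unchanged
            if operator == "AND":
                current = current & members   # intersection
            elif operator == "NOT":
                current = current - members   # difference
            elif operator == "OR":
                current = current | members   # union
            else:
                raise ValueError(f"unknown operator: {operator!r}")
        sub_results.append(len(current))
    return current, sub_results


whole_set = set(range(1, 11))
chain = [
    ("AND", {1, 2, 3}, True),
    ("OR", {3, 4, 5}, True),
    ("AND", {2, 4, 6, 8, 10}, True),
    ("NOT", {4}, True),
]
result, sizes = apply_filter_chain(whole_set, chain)
```

Changing an operator, reordering the tuples or flipping an `enabled` flag and calling the function again corresponds to the instant re-evaluation of the whole expression and its sub-results.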


Figure 2 Various types of filters. These filters represent compounds: 1) contained in a compound set (there are also specialized compound set filters for compounds that are contained only in a particular set and a filter for duplicates in a particular set) 2) tagged as approved drugs 3) associated with the name Gleevec 4) with a defined PubChem ID 5) with a particular external ID 6) with a compound property in a given range 7) that match a structural alert 8) with a biological activity on a given target in a given range 9) with the highest biological activity on a given target class in a given range 10) with a biological activity on a given pathway in a given range 11) with a given Mechanism-of-Action (MOA) 12) with a given MOA effect (positive, negative, other) 13) with a particular scaffold 14) that are similar to a given structure in a given similarity range 15) that contain a given substructure 16) that are identical to a given structure


Figure 3 A compound set created by the application of 2 compound set filters. If the search tab is selected, the left navigation panel (1) lists all applied filters. In this example, the intersection of the DrugCentral compound set (filter 2) and the DrugBank compound set (filter 3) is filtered.


Figure 4 A compound set created by the application of 5 different filters. If the search tab is selected, the left navigation panel (1) lists all applied filters. In this example, a compound set with the following characteristics is filtered: Glucocorticoid receptor ligands with at least 100 nM potency (filter 2) or compounds that are 30% or more similar to a given structure (Dexamethasone) (filter 3), labelled as approved drugs (filter 4), with cLogP lower than 5 (filter 6), and that belong to the NIH Clinical Collections compound set (filter 5).


Supplementary Note 5: VISUALIZATIONS

5.1 VENN DIAGRAM

Venn diagrams are used to depict the intersections of two or more datasets (Figure 1, Figure 2). On the P&D portal, Venn diagrams of up to 5 sets can be visualized. Venn diagram visualization is done by our in-house developed JavaScript library.

Figure 1 Venn diagram of three drug sets in their standardized form: DrugBank, DrugCentral and ChEMBL Approved Drugs.

Figure 2 The integration of the Venn diagram on the P&D portal.
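The numbers shown inside the regions of a Venn diagram are the sizes of mutually exclusive intersections. The following sketch shows one way such counts could be derived; the function name `venn_region_sizes` and the toy identifiers are hypothetical, and the portal's own JavaScript library is not reproduced here.

```python
def venn_region_sizes(named_sets, max_sets=5):
    """Count the compounds falling into each exclusive Venn region.

    A region is identified by the frozenset of set names that contain a
    compound (and no other set does). The limit of 5 sets mirrors the
    limit stated for the portal.
    """
    if len(named_sets) > max_sets:
        raise ValueError(f"at most {max_sets} sets are supported")
    regions = {}
    for compound in set().union(*named_sets.values()):
        key = frozenset(
            name for name, members in named_sets.items() if compound in members
        )
        regions[key] = regions.get(key, 0) + 1
    return regions


toy_sets = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 5}}
regions = venn_region_sizes(toy_sets)
```

Because every compound is assigned to exactly one region, the region sizes always sum to the size of the union of all input sets.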


5.2 CHEMICAL SPACE

Chemical space is a multidimensional space of all possible, energetically stable chemical compounds. Chemical space can be visualized as a 2D/3D scatter plot in which distances between data points (i.e., chemical moieties) correspond to their structural or physicochemical similarities. To calculate such similarities, chemical compounds must be represented by their descriptors and similarity metrics must be defined. In the P&D portal, two types of descriptors can be used: physicochemical properties and Morgan fingerprints [54]. While each pair of pre-calculated physicochemical properties can be plotted on the X and Y axis (Figure 3), Morgan fingerprints are projected into a new set of coordinates (PC1 and PC2) using Principal Component Analysis (PCA) (Figure 4, Figure 6). Color and size of individual data points in the chemical space visualization can also reflect their physicochemical properties. In addition, to gain an idea about the representativeness of a given compound set, it can be compared to 5000 compounds that contain the 5000 most frequent scaffolds present in the ChEMBL database (Figure 5). Chemical space visualization is done by our in-house developed JavaScript library.

Figure 3 The visualization of the chemical space of probes (green) and drugs (red). Chemical space is defined by molecular weight (X axis) and ClogP (Y axis). Only compounds with a molecular weight between 150 and 1000 Da and with logP between -15 and 15 are shown.
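To make the projection step concrete, the sketch below computes PC1/PC2 coordinates for a handful of toy bit vectors using plain Python and power iteration. It is an illustration only: the source does not state how the portal computes its PCA, the function name `pca_2d` is hypothetical, and the 8-bit vectors merely stand in for real Morgan fingerprints.

```python
import random


def pca_2d(rows, iterations=200, seed=0):
    """Project equal-length numeric rows onto their first two principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iterations):
            # Keep v orthogonal to the components that were already found.
            for c in components:
                dot = sum(a * b for a, b in zip(v, c))
                v = [a - dot * b for a, b in zip(v, c)]
            # Multiply by X^T X without building the covariance matrix.
            scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
            v = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(d)]
            norm = sum(a * a for a in v) ** 0.5
            if norm == 0.0:
                break  # no variance left in this direction
            v = [a / norm for a in v]
        components.append(v)
    return [
        [sum(a * b for a, b in zip(row, c)) for c in components] for row in centered
    ]


toy_fingerprints = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
]
coords = pca_2d(toy_fingerprints)  # one [PC1, PC2] pair per fingerprint
```

In practice a numerical library would normally be used for this step; the pure-Python version is chosen here only to keep the example self-contained.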


Figure 4 The visualization of the chemical space of probes (green) and drugs (red). Chemical space is defined by the first two principal components (PC1 and PC2) resulting from the PCA of Morgan fingerprints.

Figure 5 The PCA of all P&D compounds (pink) and 5,000 ChEMBL compounds (gray). Both sets are similarly diverse; only the left part of the graph shows P&D compounds without any scaffold.


Figure 6 The integration of the chemical space visualization on the P&D portal. The visualization of the chemical space of Glucocorticoid receptor (red) and Estrogen receptor (green) ligands. The size of each point corresponds to the ligand's molecular weight.

5.3 CLUSTER HEATMAP

A cluster heatmap is a graphical data representation consisting of the combination of a dendrogram and a heatmap (Figure 7). A dendrogram is a tree-like structure depicting the arrangement of clusters yielded by hierarchical clustering. A heatmap is a 2D matrix with color-coded values.

Within the P&D portal, data for clustering can be represented either by their physicochemical properties (Figure 7, Figure 8) or as 512-bit Morgan fingerprints [54] with a radius of 2. For physicochemical property coding, a heatmap shows individual property values (Figure 7). However, if Morgan fingerprints are used, individual bit values are not depicted in a heatmap. Instead, only metadata columns, describing either binary class membership (i.e., a compound belongs/does not belong to a given class) or quantifying a compound/target relationship (i.e., compound affinity value), are shown (Figure 9). Cluster heatmaps are visualized using the Interactive Cluster Heatmap library (InCHlib) [55].


Figure 7 The cluster heatmap of compounds clustered according to their physicochemical properties. The last two columns contain data about compound membership (1 for true and 0 for false) in selected filters or, as in this case, values associated with the selected filters (e.g., bioactivities, structure similarities).

Figure 8 The integration of cluster heatmap visualization on the P&D portal. The visualized cluster heatmap contains ligands of Estrogen receptors alpha and beta. The ligands are clustered according to basic physicochemical properties (columns in green color scale) with their bioactivities as metadata (columns in blue-red color scale).


Figure 9 The cluster heatmap of compounds clustered based on their Morgan fingerprints [54]. Only metadata columns that display compound affinities greater than or equal to 7 (-log(concentration [mol/L])) on 6 steroid receptors (from left: Glucocorticoid, Estrogen alpha, Estrogen beta, Mineralocorticoid, Progesterone and Androgen receptor) are shown.

5.4 SUMMARY VISUALIZATIONS (BIOLOGICAL, PHYSICOCHEMICAL, SCAFFOLDS)

The currently selected compound set can be summarized by three different types of visualizations: a biological summary, a scaffold summary and a physicochemical properties distribution.

5.4.1 Biological summary

A biological summary consists of two pie charts showing the proportion of individual target and pathway classes represented within a compound set (Figure 10). The following target and pathway classes pre-selected from integrated ontologies (Supplementary Note 7) can be used:

1. Target classes
   a. Selected target classes (Epigenetic regulator, Cytochrome P450, Kinases, Ion channels, Catalytic receptors, Nuclear hormone receptors, G protein-coupled receptors, Peptidases and proteinases, Transporters)
   b. ChEMBL target ontology main nodes [56]
   c. Guide to PHARMACOLOGY target ontology main nodes [57]

2. Pathway classes
   a. Selected pathway classes (Immune system, Signal transduction, Gene expression, Metabolism of proteins, Neuronal system, Hemostasis, Cell cycle, Cellular response to stress, Developmental biology)
   b. Reactome pathway ontology main nodes [25]


Figure 10 Biological summary of all P&D compounds with at least 100 nM potency (i.e., value 7 in -log(concentration [mol/L]) units) on any target using the pre-selected sets of target and pathway classes.
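The correspondence between "100 nM" and "value 7" used in these captions follows directly from the negative decadic logarithm of the molar concentration. A short worked sketch, in which the helper names `p_activity` and `is_potent` are hypothetical:

```python
import math


def p_activity(concentration_nm):
    """Convert a concentration in nM to -log10(concentration in mol/L)."""
    return -math.log10(concentration_nm * 1e-9)


def is_potent(concentration_nm, min_p_activity=7.0, tolerance=1e-9):
    """True if the potency is at least `min_p_activity` (7 corresponds to 100 nM)."""
    # The small tolerance guards against floating-point rounding at the boundary.
    return p_activity(concentration_nm) >= min_p_activity - tolerance
```

A lower concentration means a higher value on this scale, so 10 nM (value 8) passes a threshold of 7, whereas 1000 nM (value 6) does not.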

5.4.2 Physicochemical properties distribution

The distributions of the physicochemical properties of the current compound set can be easily compared with their distributions within the whole P&D database (Figure 11).

Figure 11 The comparison of selected physicochemical properties distribution between the nuclear hormone receptor ligands with at least 100 nM potency (i.e., value 7 in -log(concentration [mol/L]) units) (in red) and the whole P&D compound set (in gray).


5.4.3 Scaffold summary

The concept of a molecular scaffold is commonly used in medicinal chemistry. Scaffolds represent common core structures of a given compound set. The basic version of the Bemis-Murcko scaffold is used on the portal [58]. This type of scaffold is created by preserving all rings with their interconnecting chains (so-called linkers) while removing all other side chains. The analysis of scaffold frequency (up to 100 of the most frequent scaffolds) can be performed for any selected subset within the P&D portal (Figure 12).
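Once a scaffold has been pre-calculated for each compound, the frequency analysis itself reduces to counting. The sketch below assumes a hypothetical mapping from compound identifiers to scaffold SMILES strings; the function name `scaffold_summary`, the identifiers and the choice to skip compounds without a scaffold are assumptions made for illustration.

```python
from collections import Counter


def scaffold_summary(compound_scaffolds, top_n=100):
    """Return up to `top_n` (scaffold, count) pairs, most frequent first.

    `compound_scaffolds` maps a compound ID to its pre-calculated scaffold
    (here a SMILES string). Compounds without any ring have no scaffold and
    are skipped in this sketch.
    """
    counts = Counter(s for s in compound_scaffolds.values() if s)
    return counts.most_common(top_n)


toy_scaffolds = {
    "cmpd-1": "c1ccccc1",
    "cmpd-2": "c1ccccc1",
    "cmpd-3": "c1ccc2ccccc2c1",
    "cmpd-4": "",  # acyclic compound, no scaffold
    "cmpd-5": "c1ccccc1",
    "cmpd-6": "c1ccc2ccccc2c1",
    "cmpd-7": "C1CCNCC1",
}
summary = scaffold_summary(toy_scaffolds)
```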

Figure 12 The scaffold summary of nuclear hormone receptor ligands with at least 100 nM potency. A sterane-like scaffold is typical for this receptor family.


Supplementary Note 6: CHEMICAL INTELLIGENCE

From the structural point of view, four types of structure filters can be used on the P&D portal: identity, similarity, substructure and scaffold.

Figure 1 The integration of the chemical structure editor (Ketcher by EPAM Life Sciences [59]) on the P&D portal.

The identity search is performed by converting a query structure to its InChIKey [49] and comparing it to the InChIKeys of all compounds on the portal. Each compound is characterized by three types of InChIKey (not necessarily distinct): isomeric, non-isomeric and standardized.

Similarity between two structures is calculated using Morgan fingerprints and the Tanimoto coefficient. Morgan fingerprints are circular fingerprints closely resembling Extended Connectivity Fingerprints (ECFP) [54], which are among the most popular fingerprints for compound description in cheminformatics [35,60,61]. In circular fingerprints, a structure is encoded by means of structural fragments that are defined as atom neighborhoods up to a given radius (e.g., ECFP4 is calculated with a radius of 2 bonds, ECFP6 with a radius of 3 bonds). The Tanimoto coefficient is a commonly used similarity metric for binary data and is defined as follows:

$$\mathrm{Tanimoto} = \frac{N_{A\&B}}{N_A + N_B - N_{A\&B}},$$

where $N_A$ is the number of ON bits in the first fingerprint, $N_B$ the number of ON bits in the second fingerprint and $N_{A\&B}$ the number of common ON bits in both fingerprints. The value of the Tanimoto coefficient lies in the interval between 0 and 1 (1 means that both fingerprints are identical, 0 means that there are no common ON bits).
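As a worked illustration of the definition above, the coefficient can be computed from the sets of ON-bit positions of two fingerprints. This is a minimal sketch: the function names `tanimoto` and `in_similarity_range` are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen for this example rather than something stated in the source.

```python
def tanimoto(on_bits_first, on_bits_second):
    """Tanimoto coefficient of two fingerprints given as ON-bit positions."""
    first, second = set(on_bits_first), set(on_bits_second)
    common = len(first & second)                      # bits ON in both
    denominator = len(first) + len(second) - common   # bits ON in either
    if denominator == 0:
        return 0.0  # assumption: two empty fingerprints count as dissimilar
    return common / denominator


def in_similarity_range(on_bits_query, on_bits_compound, lower=0.3, upper=1.0):
    """Check whether a compound falls inside a similarity filter's range."""
    return lower <= tanimoto(on_bits_query, on_bits_compound) <= upper
```

For example, fingerprints with ON bits {1, 2, 3} and {2, 3, 4} share two bits out of four distinct ones, giving a coefficient of 0.5.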

Substructure searches are performed with the non-isomeric form of a query structure. Substructure and similarity queries are both performed within the RDKit database cartridge [62]. To enable structure editing on the portal, the chemical structure editor Ketcher by EPAM Life Sciences [59] was employed (Figure 1).

Scaffold searches are based upon a pre-calculated scaffold structure of each compound and can be performed through the scaffold icon found under a compound representation (Supplementary Note 3, Figure 2). Be aware that scaffold and substructure filters with the same query structure do not have to return the same compound set, since in a substructure search the query structure may be only a part of a larger scaffold.

6.1 STRUCTURAL ALERTS

A structural alert is a specific tag that warns a user about possibly problematic behavior of a compound in the context of biological screening. It is associated either with a compound's biological properties (e.g., a non-selective or not sufficiently potent compound) or with its structural features that may cause unwanted effects within an assay (e.g., non-specific reaction with a protein). Currently, three different types of structural alerts are integrated into the portal: Pan Assay Interference filters (PAINS) [12], aggregators and obsolete compounds.

PAINS filters are a set of potentially problematic substructures that might cause promiscuity or interference of a compound within an assay [12]. Because PAINS filters are already a standard part of the RDKit framework, RDKit is used to match them on the P&D portal.

Aggregators are compounds that may form colloidal aggregates and interfere with an assay non-specifically. Aggregators within the P&D compound set are currently matched to the set of known aggregators from the Aggregator Advisor software [11].

Compounds tagged with the obsolete structural alert are compounds that were once used as tools in biological screening but now have higher-quality alternatives (e.g., more potent or selective) or have had their biological properties disproved (e.g., non-selectivity for an original target of interest). Obsolete compounds on the portal are currently tagged according to the list of historic compounds from the ChemicalProbes.org portal [8,63].

The number of structural alerts matched for a compound is shown by an icon in the bottom right corner of each compound's structure (Supplementary Note 3, Figure 2). Specific structural alerts can be found in the Structural alerts tab in the detail view of a single compound (Supplementary Note 3, Figure 4).


Supplementary Note 7: ONTOLOGIES

To filter compounds on the basis of target and pathway classes, target ontologies from ChEMBL [18] and Guide to PHARMACOLOGY [22], and the pathway ontology from Reactome [25], were integrated into the portal. They can be accessed in the Target and Pathway tabs, where they can be employed to browse through compounds in a tree-like manner (from the most general to more specific classes) (Figure 1). Any ontology node can also be used as a filter in the search (main) tab. Since the same targets from different sources are matched to each other through their UniProt IDs [26] (2.1.3.1), similar ontology nodes from different ontologies represent similar (though not identical) compound sets.

Figure 1 Ontologies integration on the P&D portal. Target and pathway ontologies can be used from the Target (1, currently selected) or Pathway (2) tabs to browse through compounds in a tree-like manner. The currently selected class in the ontology (3) is highlighted (green) and targets/pathways associated with the class are marked with the green label.


Supplementary Note 8: CUSTOM SETS

Custom sets are arbitrary, user-defined compound sets intended to store advanced queries, with the possibility to manually add/remove single compounds. Currently, they can be assembled only out of compounds found on the portal; a user upload of compounds to the portal is not supported.

Since custom sets are bound to a user account, they can be created only by logged-in users. Currently, a user can create up to 5 custom sets. To create a custom set, it must first be initialized in the Custom Sets view (Figure 1). Once a custom set is created, single or multiple (batch) compounds can be added from the Compounds view (Supplementary Note 3, Figure 1). Single compounds are added by clicking on an arrow in the top right corner when hovering over a compound's image (Supplementary Note 3, Figure 2); multiple compounds are added by clicking on the larger arrow on the right side of the second navigation tab (Figure ). Compounds can be removed from a custom set only in a particular custom set view (accessible from the Custom sets tab) using a cross icon. Again, both single and multiple compounds can be removed.

Figure 1 Custom Sets view.


Supplementary Table 1: PROGRAMMING TOOLS

The P&D portal uses many free open-source programming tools. The server side of the portal is based primarily on the Python programming language [64] with the Django web framework [65] and the PostgreSQL database [66]. The following table lists some of the most important tools and programming libraries used, each with a short description:

| Name | Category | Description | URL |
|---|---|---|---|
| chembl_webresource_client | cheminformatics | This package implements a Python client for accessing ChEMBL web services. It provides a convenient interface to access data, cache results and optionally execute requests asynchronously. | https://github.com/chembl/chembl_webresource_client |
| ChemSpiPy | cheminformatics | ChemSpiPy provides a way to interact with ChemSpider in Python. It allows chemical searches, chemical file downloads, depiction and retrieval of chemical properties. | http://chemspipy.readthedocs.io/en/latest/ |
| Ketcher | cheminformatics | Ketcher is a web-based chemical structure editor. | http://lifescience.opensource.epam.com/ketcher/ |
| OpenBabel [45] | cheminformatics | Open Babel is a chemical toolbox designed to speak the many languages of chemical data. It's an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas. | http://openbabel.org/ |
| PubChemPy | cheminformatics | PubChemPy provides a way to interact with PubChem in Python. It allows chemical searches by name, substructure and similarity, chemical standardization, conversion between chemical file formats, depiction and retrieval of chemical properties. | http://pubchempy.readthedocs.io/en/latest/ |
| RDKit [62] | cheminformatics | RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python. | http://www.rdkit.org/ |
| Standardiser | cheminformatics | This is a tool designed to provide a simple way of standardising molecules as a prelude to e.g. molecular modelling exercises. | https://github.com/flatkinson/standardiser |
| Highcharts | visualizations | Highcharts is a charting library written in pure JavaScript, offering an easy way of adding interactive charts to your web site or web application. | http://www.highcharts.com/ |
| ChemSpace.js | visualizations | ChemSpace.js is an open source interactive JavaScript library which provides an easy way to display and analyze compound sets in the form of 2D space within a web page. | http://openscreen.cz/software/chemspace/home/ |
| InCHlib [55] | visualizations | InCHlib (Interactive Cluster Heatmap library) is an open source interactive JavaScript library which provides an easy way to display and analyze hierarchically clustered data and cluster heatmaps. | http://openscreen.cz/software/inchlib/home/ |
| Konva.js | visualizations | 2D HTML5 canvas framework for desktop and mobile applications. | http://konvajs.github.io/ |
| BitSet.js | other | BitSet.js is an infinite Bit-Array implementation in JavaScript. | https://github.com/infusion/BitSet.js |
| Bootstrap Tour | other | JavaScript library to build product tours with Bootstrap Popovers. | http://bootstraptour.com/ |
| clipboard.js | other | Modern copy-to-clipboard JavaScript library. | https://clipboardjs.com/ |
| Font Awesome | other | Font Awesome gives you scalable vector icons that can instantly be customized - size, color, drop shadow, and anything that can be done with the power of CSS. | http://fontawesome.io/ |
| jQuery | other | jQuery is a fast, small, and feature-rich JavaScript library. It makes things like HTML document traversal and manipulation, event handling, animation, and Ajax much simpler with an easy-to-use API that works across a multitude of browsers. | https://jquery.com/ |
| jQuery UI | other | jQuery UI is a curated set of user interface interactions, effects, widgets, and themes built on top of the jQuery JavaScript Library. | https://jqueryui.com/ |
| Platform.js | other | A platform detection library that works on nearly all JavaScript platforms. | https://github.com/bestiejs/platform.js/ |
| tipsy | other | tipsy is a simple jQuery plugin for generating Facebook-style tooltips. | http://onehackoranother.com/projects/jquery/tipsy/ |
| Underscore.js | other | Underscore is a JavaScript library that provides a whole mess of useful functional programming helpers without extending any built-in objects. | http://underscorejs.org/ |


Supplementary Table 2: COMPOUND SETS

The overview of compound sets currently included in the P&D portal:

Name Description Category Compounds Source

ChemicalProbes.org

Thisresourceisacommunitydrivenwiki-likesitethatrecommendsappropriatechemicalprobesforbiologicaltargets,providesguidanceontheiruse,anddocumentstheirlimitations.Theyalsoprovideadviceontheuseofcontrols,bothchemicallydistinctprobesforthesametargetandnegativecontrolcompounds,whereavailable.

probe 160 ChemicalProbes.org[8]

MLPProbes

TheMolecularLibrariesProgram(MLP),acomponentoftheNIHCommonFund,offerspublicsectorbiomedicalresearchersaccesstothelarge-scalescreeningcapacitynecessarytoidentifysmallmoleculesthatcanbeoptimizedaschemicalprobestostudythefunctionsofgenes,cells,andbiochemicalpathways.Thiswillleadtonewwaystoexplorethefunctionsofgenesandsignalingpathwaysinhealthanddisease.

probe 374 NIH/PubChem[67]

NatureChemicalBiologyProbes

NatureChemicalBiologyprovidesfreelyavailablesummariesoftherelevantchemical,invitro,cellularandinvivoinformationfornewlyreportedornewlycharacterizedchemicalprobesreportedinthejournal.

probe 58NatureChemicalBiology[68]

SGCProbesSGCChemicalProbesaresmall,drug-likemoleculeswhichmeetthesecriteria:invitroIC50orKd<100nM,>30-foldselectivityoverproteinsinthesamefamily,significanton-targetcellularactivityat1mM.

probe 49StructuralGenomicsConsortium[69]

Nature Methods doi:10.1038/nmeth.4365

Page 34: Supplementary Notes - images.nature.com file1 Supplementary Notes Probes & Drugs portal: interactive approach to Open Data exploration in chemical biology Ctibor Skuta, Martin Popr,

34

DrugBank

TheDrugBankdatabaseisauniquebioinformaticsandcheminformaticsresourcethatcombinesdetaileddrug(i.e.chemical,pharmacologicalandpharmaceutical)datawithcomprehensivedrugtarget(i.e.sequence,structure,andpathway)information.Thedatabasecontains8,206drugentriesincluding1,991FDA-approvedsmallmoleculedrugs,207FDA-approvedbiotech(protein/peptide)drugs,93nutraceuticalsandover6,000experimentaldrugs.

drug 7,110 DrugBank[20]

DrugCentral

DrugCentralprovidesinformationonactiveingredientschemicalentities,pharmaceuticalproducts,drugmodeofaction,indications,andpharmacologicaction.TheymonitorFDA,EMA,andPMDAfornewdrugapprovalonaregularbasistoensurecurrencyoftheresource.LimitedinformationondiscontinuedanddrugsapprovedoutsideUSisalsoavailable,howeverregulatoryapprovalinformationcan'tbeverified.

drug 3,936 DrugCentral[21]

ChEMBLApprovedDrugs

AcollectionofcompoundstaggedasapproveddrugsinChEMBLdatabase.Onlycompoundswithavailablestructureareuploaded(omittingcompoundscontainingmetal,andpeptides).

drug 3,365 ChEMBL[18]

NIHApprovedOncologyDrugsVI

Thecurrentset(AODVII)consistsof129agentsandisintendedtoenablecancerresearch,drugdiscoveryandcombinationdrugstudies.

drug 118 NIH/NCI

GSKPublishedKinaseInhibitorSet

ThePublishedKinaseInhibitorSet(PKIS)isacollectionof376compoundsthathavebeenmadeavailablebyGSKforscreeningbyexternalgroups;allcompoundshavebeenpublishedinthescientificliterature.Thehopeistogenerateprobemoleculesforthemajorityofthekinomethatisasyetuntargeted.

non-commercial 366 GSK[70]

Nature Methods doi:10.1038/nmeth.4365

Page 35: Supplementary Notes - images.nature.com file1 Supplementary Notes Probes & Drugs portal: interactive approach to Open Data exploration in chemical biology Ctibor Skuta, Martin Popr,

35

GuidetoPharmacology

Oneofthemainaimsistoprovideasearchabledatabasewithquantitativeinformationondrugtargetsandtheprescriptionmedicinesandexperimentaldrugsthatactonthem.Infutureversionstheyplantoaddresourcesforeducationandtraininginpharmacologicalprinciplesandtechniquesalongwithresearchguidelinesandoverviewsofkeytopics.TheyhopethattheIUPHAR/BPSGuidetoPHARMACOLOGYwillbeusefulforresearchersandstudentsinpharmacologyanddrugdiscoveryandprovidethegeneralpublicwithaccurateinformationonthebasicscienceunderlyingdrugaction.

non-commercial 6,513GuidetoPharmacology[22]

InformerSet2.0

TheBroadInstitutegeneratedan'InformerSet'of481small-moleculeprobesanddrugsthatselectivelytargetdistinctnodesincellcircuitryandthatcollectivelymodulateabroadarrayofcellprocesses.Theyquantitativelymeasuredthesensitivityof860deeplycharacterizedcancer-celllinestoInformerSetcompounds,andhaveundertakenanalysesconnectingsensitivitytocancerfeatures,includingmutations,geneexpression,copy-numbervariation,andlineage.

non-commercial 481 BroadInstitute[31]

KinaseInhibitors(best-in-class)

Selectivesmall-moleculeinhibitorsofproteinkinasescanserveaspowerfultoolstoelucidatebiologicalfunction.Effortstodeveloppotentialdrugcandidateshaveyieldedawealthofkinaseinhibitors.However,selectingtheoptimalkinaseinhibitorforaparticularapplicationcanbechallenging.Whiletheoptimalinhibitorwillbeapplicationspecific,wehaveattemptedtosummarizesomeofthebestreportedinhibitorsforvariouskinases.

non-commercial 96 Cell[71,72]

LINCScompoundsetTheDatabasecontainsallpubliclyavailableHMSLINCSdatasetsandinformationforeachdatasetaboutexperimentalreagents(smallmoleculeperturbagens.

non-commercial 498LibraryofIntegratedNetwork-basedCellularSignatures[73]

Nature Methods doi:10.1038/nmeth.4365

Page 36: Supplementary Notes - images.nature.com file1 Supplementary Notes Probes & Drugs portal: interactive approach to Open Data exploration in chemical biology Ctibor Skuta, Martin Popr,

36

MLSMRProbes+

Thisplatedsetof1,133compoundscontains216probemolecules.Thesetwasenhancedwith917structureactivityrelationship(SAR)compounds-additionalcompoundssynthesizedduringtheprobeprojectsorchemicallysimilarcompoundsselectedfromtheMLSMR.TheSARandsimilaritycompoundsareexpectedtobeusefulleadsinfindingmodifiersofproteinsingenefamiliesrelatedtoeachprobetarget.Notallcompoundswereincludedinthefinalarraysduetoavailabilityorpoorstoragestability.-

non-commercial 1,132NIHSmallMoleculeRepository[67]

NIHClinicalCollections(NCC)

TheNIHClinicalCollectionisaplatedarrayof719smallmoleculesthathaveahistoryofuseinhumanclinicaltrials.ThecollectionwasassembledbytheNationalInstitutesofHealth(NIH)throughtheMolecularLibrariesRoadmapInitiativeaspartofitsmissiontoenabletheuseofcompoundscreensinbiomedicalresearch.SimilarcollectionsofFDAapproveddrugshaveproventoberichsourcesofundiscoveredbioactivityandtherapeuticpotential.TheclinicallytestedcompoundsintheNCCarehighlydrug-likewithknownsafetyprofiles.Thesecompoundscanprovideexcellentstartingpointsformedicinalchemistryoptimizationandmayevenbeappropriatefordirecthumanuseinnewdiseaseareas.

non-commercial 718 NIH/NCI

NIHMechanisticSetIII

TheMechanisticSetIII,whichconsistsof813compounds,wasderivedfromthe37,836opencompoundsthathavebeentestedintheNCIhumantumor60celllinescreen.Incontrasttotheoriginaldiversitysetof1,990compounds,whichwaschosenonthebasisofstructuraldiversity,thismechanisticdiversitysetwaschosentorepresentabroadrangeofgrowthinhibitionpatternsinthe60celllinescreen,basedontheGI50activityofthecompounds.

non-commercial 808 NIH/NCI

NPCScreeningCollection

TheNCGCPharmaceuticalCollection(NPC)isacomprehensive,publically-accessiblecollectionofapprovedandinvestigationaldrugsforhigh-throughputscreeningthatprovidesavaluableresourceforbothvalidatingnewmodelsofdiseaseandbetterunderstandingthemolecularbasisofdiseasepathologyandintervention.

non-commercial 3,257 NIH/NCATS[39]

Nature Methods doi:10.1038/nmeth.4365

Page 37: Supplementary Notes - images.nature.com file1 Supplementary Notes Probes & Drugs portal: interactive approach to Open Data exploration in chemical biology Ctibor Skuta, Martin Popr,

37

NURSAligandset

ThemissionofNURSAistoaccrue,develop,andcommunicateinformationthatadvancesourunderstandingoftherolesofnuclearreceptors(NRs)andcoregulatorsinhumanphysiologyanddisease.TheNURSAwebsitehasbeendevelopedoverthepastdecadeintoacomprehensivesourceofinformationaboutNRsandtheirco-regulators,ligands,anddownstreamtranscriptionaltargets.

non-commercial 299NURSA-NuclearReceptorSignalingAtlas[29]

WelcomeTrustCancerDrugs

Thesecompoundsincludecytotoxicchemotherapeuticsaswellastargetedtherapeuticsfromcommercialsources,academiccollaborators,andfromthebiotechandpharmaceuticalindustries.

non-commercial 239 WelcomeTrust[74]

AxonMedchemScreeningLibrary

AxonLigandsareauniquecollectionofbiologicalmolecules,asworld-widerecognizedresearchtoolsanddrugstandardsindifferentapplicationfieldssuchasneurologicaldisorders,cardiovasculardisease,painandinflammation,andcancer.FeaturedligandswithourexpertiseincludingCNSreagents,ionchannelmodulators,signaltransductionregulators(suchaskinaseinhibitors)andmuchmore?.

commercial 1,392 AxonMedchem

Cayman Chemical Bioactives

Sixty staff chemists synthesize, purify, and characterize the small molecules and biochemicals you need to take your research further, including drug-like heterocycles, complex biolipids and fatty acids, inhibitors, activators, and modulators.

commercial 8,262 Cayman Chemical

LOPAC library

Collection of 1,280 Pharmacologically-Active Sigma Compounds. Includes the latest, drug-like molecules in the fields of Cell Signaling & Neuroscience.

commercial 1,278 Sigma

MedChemExpress Bioactive Compound Library

A unique collection of 3,236 small molecule compounds for drug screening, drug target identification, and other pharmaceutical-related applications.

commercial 3,232 MedChemExpress


Prestwick Chemical Library 1.0

A unique collection of 1,120 small molecules, 100% approved drugs (FDA, EMA and other agencies) selected by a team of medicinal chemists and pharmacists for their high chemical and pharmacological diversity as well as for their known bioavailability and safety in humans. Designed to increase the potential of getting "high-quality" hits, our chemical screening library is a valuable tool to accelerate lead discovery.

commercial 1,112 Prestwick

Prestwick Chemical Library 2.0

A unique collection of 1,280 small molecules, 100% approved drugs (FDA, EMA and other agencies) selected by a team of medicinal chemists and pharmacists for their high chemical and pharmacological diversity as well as for their known bioavailability and safety in humans. Designed to increase the potential of getting "high-quality" hits, our chemical screening library is a valuable tool to accelerate lead discovery.

commercial 1,279 Prestwick

Selleckchem Bioactive Compound Library

A unique collection of 2,661 bioactive chemical compounds for high throughput screening (HTS) and high content screening (HCS).

commercial 2,624 Selleckchem

The Spectrum Collection

The Spectrum Collection presents 2,560 compounds and includes all of the compounds in the US and International Drug Collections, together with our Natural Product and Discover libraries. This unique resource provides biologically active and structurally diverse compounds that create the optimum opportunity for discovery in new and established bioassays in HTS or low capacity target specific assays.

commercial 2,555 MicroSource

Tocriscreen Plus

A library of 1,280 biologically active compounds from the Tocris catalog. Covers a wide range of pharmacological targets and research areas.

commercial 1,279 Tocris

Tocriscreen Total

A collection of 1,120 biologically active compounds supplied as pre-dissolved DMSO solutions (250 µl 10 mM solution per compound).

commercial 1,119 Tocris

Other bioactive compounds

Other bioactive compounds harvested from different non-specific sources. Only data from external databases are available for these compounds.

other 60 -


EXTERNAL SOURCES Supplementary Table 3

The overview of external sources used for the compound description on the portal:

Name  Harvested data  Comment  License  URL

ChEMBL database [18]
Harvested data: ChEMBL ID, Compound preferred name, Compound trade names, Max phase attribute, Approved drug attribute, Target bioactivity data, Target ontology
Comment: Activity data are currently extracted only for human, rat and mouse targets where: a direct protein is assigned, pChEMBL value is available, and a confidence score is greater than or equal to 7. When more than one value for a ligand-target complex is available, the average of these values is calculated.
License: Creative Commons Attribution-ShareAlike 3.0 Unported License.
URL: https://www.ebi.ac.uk/chembl

Guide to PHARMACOLOGY [22]
Harvested data: Whole compound set, GtoPdb ID, Compound preferred name, Approved drug attribute, Target bioactivity data, Primary targets, Target ontology
Comment: All activity data are extracted, even the ones where a bioactivity value is not known.
License: Creative Commons Attribution-ShareAlike 3.0 Unported license.
URL: http://www.guidetopharmacology.org

Reactome [25]
Harvested data: Pathways, Pathway ontology
Comment: Reactome pathways are matched according to target UniProt IDs.
License: Creative Commons Attribution 4.0 International License
URL: http://www.reactome.org/

UniProt [26]
Harvested data: Target names, Gene names
Comment: Data are matched according to target UniProt IDs.
License: Creative Commons Attribution-NoDerivs 3.0 Unported
URL: http://www.uniprot.org/

UniChem [75]
Harvested data: External IDs
Comment: UniChem service is used to harvest external IDs from all available external sources. For harvesting, the chembl_webresource_client Python package is employed.
License: Creative Commons Zero (CC-0) license
URL: https://www.ebi.ac.uk/unichem/


ChemSpider [76]
Harvested data: Names, ChemSpider IDs
Comment: Compounds are matched using the ChemSpiPy Python package.
License: Creative Commons Attribution-ShareAlike 3.0 United States License
URL: http://www.chemspider.com/

SMPDB [77]
Harvested data: Ligand pathway subjects, Ligand pathways
Comment: -
License: SMPDB is offered to the public as a freely available resource.
URL: http://smpdb.ca/

PubChem [46]
Harvested data: Compound structures
Comment: PubChem is the main source for the manual extraction of compound structures. Generally, when a compound misses its structure (or the structure is wrong), it is found according to a name provided by a supplier/provider. In many cases, compounds are also identified by their PubChem CIDs/SIDs.
License: Public data
URL: https://pubchem.ncbi.nlm.nih.gov/

MolPort
Harvested data: Availability (in stock compounds)
Comment: From MolPort, the information about the availability of compounds is used. All external IDs to MolPort are harvested through the UniChem service.
License: Terms of use
URL: https://www.molport.com/shop/index

Mcule
Harvested data: Availability (in stock compounds)
Comment: From Mcule, the information about the availability of compounds is used. All external IDs to Mcule are harvested through the UniChem service.
License: Terms of use
URL: https://mcule.com/
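
The ChEMBL extraction rule stated in Supplementary Table 3 (human, rat and mouse targets only, a direct protein assigned, a pChEMBL value available, confidence score of at least 7, and averaging when several values exist for one ligand-target pair) can be illustrated with a minimal Python sketch. This is not the portal's actual code: the record layout and the field names (`compound_id`, `target_id`, `organism`, `direct_protein`, `pchembl`, `confidence`) are hypothetical, and the arithmetic mean of pChEMBL values is assumed as the "average".

```python
from collections import defaultdict
from statistics import mean

# Organisms kept by the rule (human, rat, mouse); the exact labels are an assumption.
ALLOWED_ORGANISMS = {"Homo sapiens", "Rattus norvegicus", "Mus musculus"}
MIN_CONFIDENCE = 7  # threshold quoted in Supplementary Table 3


def aggregate_activities(records):
    """Filter activity records and average pChEMBL per (compound, target) pair.

    Each record is a dict with hypothetical keys: compound_id, target_id,
    organism, direct_protein (bool), pchembl (float or None), confidence (int).
    """
    grouped = defaultdict(list)
    for rec in records:
        if rec.get("organism") not in ALLOWED_ORGANISMS:
            continue  # not a human, rat or mouse target
        if not rec.get("direct_protein"):
            continue  # no direct protein assigned
        if rec.get("pchembl") is None:
            continue  # pChEMBL value not available
        if rec.get("confidence", 0) < MIN_CONFIDENCE:
            continue  # confidence score below the threshold
        grouped[(rec["compound_id"], rec["target_id"])].append(rec["pchembl"])
    # Several values for one ligand-target pair collapse to their arithmetic mean.
    return {pair: mean(values) for pair, values in grouped.items()}
```

Called on a list of such records, the function returns a dictionary keyed by (compound, target) pairs, so a compound measured twice against the same target contributes a single averaged value.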


REFERENCES

1. Frye SV: The art of the chemical probe. Nat Chem Biol 2010, 6(3):159-161.
2. Bunnage ME, Chekler EL, Jones LH: Target validation using chemical probes. Nat Chem Biol 2013, 9(4):195-199.
3. Schreiber SL, Kotz JD, Li M, Aube J, Austin CP, Reed JC, Rosen H, White EL, Sklar LA, Lindsley CW et al: Advancing Biological Understanding and Therapeutics Discovery with Small-Molecule Probes. Cell 2015, 161(6):1252-1265.
4. Garbaccio RM, Parmee ER: The Impact of Chemical Probes in Drug Discovery: A Pharmaceutical Industry Perspective. Cell Chem Biol 2016, 23(1):10-17.
5. Frye SV: Unlocking the potential of chemical probes for methyl-lysine reader proteins. Future Med Chem 2015, 7(14):1831-1833.
6. van Hattum H, Waldmann H: Chemical biology tools for regulating RAS signaling complexity in space and time. Chem Biol 2014, 21(9):1185-1195.
7. Workman P, Collins I: Probing the probes: fitness factors for small molecule tools. Chem Biol 2010, 17(6):561-577.
8. Arrowsmith CH, Audia JE, Austin C, Baell J, Bennett J, Blagg J, Bountra C, Brennan PE, Brown PJ, Bunnage ME et al: The promise and peril of chemical probes. Nat Chem Biol 2015, 11(8):536-541.
9. Oprea TI, Bologa CG, Boyer S, Curpan RF, Glen RC, Hopkins AL, Lipinski CA, Marshall GR, Martin YC, Ostopovici-Halip L et al: A crowdsourcing evaluation of the NIH chemical probes. Nat Chem Biol 2009, 5(7):441-447.
10. Wang Y, Cornett A, King FJ, Mao Y, Nigsch F, Paris CG, McAllister G, Jenkins JL: Evidence-Based and Quantitative Prioritization of Tool Compounds in Phenotypic Drug Discovery. Cell Chem Biol 2016, 23(7):862-874.
11. Irwin JJ, Duan D, Torosyan H, Doak AK, Ziebart KT, Sterling T, Tumanian G, Shoichet BK: An Aggregation Advisor for Ligand Discovery. J Med Chem 2015, 58(17):7076-7087.
12. Baell JB, Holloway GA: New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J Med Chem 2010, 53(7):2719-2740.
13. Lipinski CA, Litterman NK, Southan C, Williams AJ, Clark AM, Ekins S: Parallel worlds of public and commercial bioactive chemistry data. J Med Chem 2015, 58(5):2068-2076.
14. Baell J, Walters MA: Chemistry: Chemical con artists foil drug discovery. Nature 2014, 513(7519):481-483.
15. Gilson MK, Liu T, Baitaluk M, Nicola G, Hwang L, Chong J: BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res 2016, 44(D1):D1045-1053.
16. Howe EA, de Souza A, Lahr DL, Chatwin S, Montgomery P, Alexander BR, Nguyen DT, Cruz Y, Stonich DA, Walzer G et al: BioAssay Research Database (BARD): chemical biology and probe-development enabled by structured metadata and result types. Nucleic Acids Res 2015, 43(Database issue):D1163-1170.
17. Hastings J, de Matos P, Dekker A, Ennis M, Harsha B, Kale N, Muthukrishnan V, Owen G, Turner S, Williams M et al: The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res 2013, 41(Database issue):D456-463.
18. Bento AP, Gaulton A, Hersey A, Bellis LJ, Chambers J, Davies M, Kruger FA, Light Y, Mak L, McGlinchey S et al: The ChEMBL bioactivity database: an update. Nucleic Acids Res 2014, 42(Database issue):D1083-1090.


19. Roider HG, Pavlova N, Kirov I, Slavov S, Slavov T, Uzunov Z, Weiss B: Drug2Gene: an exhaustive resource to explore effectively the drug-target relation network. BMC Bioinformatics 2014, 15:68.
20. Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J: DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 2006, 34(Database issue):D668-672.
21. Ursu O, Holmes J, Knockel J, Bologa CG, Yang JJ, Mathias SL, Nelson SJ, Oprea TI: DrugCentral: online drug compendium. Nucleic Acids Res 2016.
22. Southan C, Sharman JL, Benson HE, Faccenda E, Pawson AJ, Alexander SP, Buneman OP, Davenport AP, McGrath JC, Peters JA et al: The IUPHAR/BPS Guide to PHARMACOLOGY in 2016: towards curated quantitative interactions between 1300 protein targets and 6000 ligands. Nucleic Acids Res 2016, 44(D1):D1054-1068.
23. Nguyen DT, Mathias S, Bologa C, Brunak S, Fernandez N, Gaulton A, Hersey A, Holmes J, Jensen LJ, Karlsson A et al: Pharos: Collating protein information to shed light on the druggable genome. Nucleic Acids Res 2016.
24. Wang Y, Bryant SH, Cheng T, Wang J, Gindulyte A, Shoemaker BA, Thiessen PA, He S, Zhang J: PubChem BioAssay: 2017 update. Nucleic Acids Res 2017, 45(D1):D955-D963.
25. Croft D, Mundo AF, Haw R, Milacic M, Weiser J, Wu G, Caudy M, Garapati P, Gillespie M, Kamdar MR et al: The Reactome pathway knowledgebase. Nucleic Acids Res 2014, 42(Database issue):D472-477.
26. UniProt C: UniProt: a hub for protein information. Nucleic Acids Res 2015, 43(Database issue):D204-212.
27. Sterling T, Irwin JJ: ZINC 15 - Ligand Discovery for Everyone. J Chem Inf Model 2015, 55(11):2324-2337.
28. Basu A, Bodycombe NE, Cheah JH, Price EV, Liu K, Schaefer GI, Ebright RY, Stewart ML, Ito D, Wang S et al: An interactive resource to identify cancer genetic and lineage dependencies targeted by small molecules. Cell 2013, 154(5):1151-1161.
29. Lanz RB, Jericevic Z, Zuercher WJ, Watkins C, Steffen DL, Margolis R, McKenna NJ: Nuclear Receptor Signaling Atlas (www.nursa.org): hyperlinking the nuclear receptor signaling community. Nucleic Acids Res 2006, 34(Database issue):D221-226.
30. Gaulton A, Overington JP: Role of open chemical data in aiding drug discovery and design. Future Med Chem 2010, 2(6):903-907.
31. Seashore-Ludlow B, Rees MG, Cheah JH, Cokol M, Price EV, Coletti ME, Jones V, Bodycombe NE, Soule CK, Gould J et al: Harnessing Connectivity in a Large-Scale Small-Molecule Sensitivity Dataset. Cancer Discov 2015, 5(11):1210-1223.
32. Elkins JM, Fedele V, Szklarz M, Abdul Azeez KR, Salah E, Mikolajczyk J, Romanov S, Sepetov N, Huang XP, Roth BL et al: Comprehensive characterization of the Published Kinase Inhibitor Set. Nat Biotechnol 2016, 34(1):95-103.
33. Santos R, Ursu O, Gaulton A, Bento AP, Donadi RS, Bologa CG, Karlsson A, Al-Lazikani B, Hersey A, Oprea TI et al: A comprehensive map of molecular drug targets. Nat Rev Drug Discov 2017, 16(1):19-34.
34. Rees MG, Seashore-Ludlow B, Cheah JH, Adams DJ, Price EV, Gill S, Javaid S, Coletti ME, Jones VL, Bodycombe NE et al: Correlating chemical sensitivity and basal gene expression reveals mechanism of action. Nat Chem Biol 2016, 12(2):109-116.
35. Lounkine E, Keiser MJ, Whitebread S, Mikhailov D, Hamon J, Jenkins JL, Lavan P, Weber E, Doak AK, Cote S et al: Large-scale prediction and testing of drug activity on side-effect targets. Nature 2012, 486(7403):361-+.
36. Seiler KP, George GA, Happ MP, Bodycombe NE, Carrinski HA, Norton S, Brudz S, Sullivan JP, Muhlich J, Serrano M et al: ChemBank: a small-molecule screening and cheminformatics resource database. Nucleic Acids Res 2008, 36(Database issue):D351-359.
37. Open PHACTS Explorer [https://explorer.openphacts.org/]


38. Stierand K, Harder T, Marek T, Hilbig M, Lemmen C, Rarey M: The Internet as Scientific Knowledge Base: Navigating the Chem-Bio Space. Mol Inform 2012, 31(8):543-546.
39. Huang R, Southall N, Wang Y, Yasgar A, Shinn P, Jadhav A, Nguyen DT, Austin CP: The NCGC pharmaceutical collection: a comprehensive resource of clinically approved drugs enabling repurposing and chemical genomics. Sci Transl Med 2011, 3(80):80ps16.
40. Hohman M, Gregory K, Chibale K, Smith PJ, Ekins S, Bunin B: Novel web-based tools combining chemistry informatics, biology and social networks for drug discovery. Drug Discov Today 2009, 14(5-6):261-270.
41. Williams AJ, Harland L, Groth P, Pettifer S, Chichester C, Willighagen EL, Evelo CT, Blomberg N, Ecker G, Goble C et al: Open PHACTS: semantic interoperability for drug discovery. Drug Discov Today 2012, 17(21-22):1188-1198.
42. Open PHACTS for Researchers - The Data [https://www.openphacts.org/2/sci/data.html]
43. Illuminating the Druggable Genome (IDG) [https://commonfund.nih.gov/idg/index]
44. Truchon JF, Bayly CI: Evaluating virtual screening methods: good and bad metrics for the "early recognition" problem. J Chem Inf Model 2007, 47(2):488-508.
45. O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR: Open Babel: An open chemical toolbox. J Cheminform 2011, 3:33.
46. Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han LY, He JE, He SQ, Shoemaker BA et al: PubChem Substance and Compound databases. Nucleic Acids Res 2016, 44(D1):D1202-D1213.
47. standardiser [https://github.com/flatkinson/standardiser]
48. Barsyte-Lovejoy D, Li FL, Oudhoff MJ, Tatlock JH, Dong AP, Zeng H, Wu H, Freeman SA, Schapira M, Senisterra GA et al: (R)-PFI-2 is a potent and selective inhibitor of SETD7 methyltransferase activity in cells. Proc Natl Acad Sci USA 2014, 111(35):12853-12858.
49. Heller S, McNaught A, Stein S, Tchekhovskoi D, Pletnev I: InChI - the worldwide chemical structure identifier standard. J Cheminform 2013, 5.
50. Creative Commons BY-SA 4.0 [https://creativecommons.org/licenses/by-sa/4.0/]
51. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ: Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 2001, 46(1-3):3-26.
52. MolPort [https://www.molport.com/shop/index]
53. mcule [https://mcule.com/]
54. Rogers D, Hahn M: Extended-connectivity fingerprints. J Chem Inf Model 2010, 50(5):742-754.
55. Skuta C, Bartunek P, Svozil D: InCHlib - interactive cluster heatmap for web applications. J Cheminform 2014, 6(1):44.
56. ChEMBL Target Tree [https://www.ebi.ac.uk/chembl/target/browser]
57. Guide to PHARMACOLOGY Target Tree [http://www.guidetopharmacology.org/targets.jsp]
58. Bemis GW, Murcko MA: The properties of known drugs. 1. Molecular frameworks. J Med Chem 1996, 39(15):2887-2893.
59. Heikamp K, Bajorath J: Large-scale similarity search profiling of ChEMBL compound data sets. J Chem Inf Model 2011, 51(8):1831-1839.
60. Clemons PA, Bodycombe NE, Carrinski HA, Wilson JA, Shamji AF, Wagner BK, Koehler AN, Schreiber SL: Small molecules of different origins have distinct distributions of structural complexity that correlate with protein-binding profiles. Proc Natl Acad Sci USA 2010, 107(44):18787-18792.
61. Maggiora G, Vogt M, Stumpfe D, Bajorath J: Molecular similarity in medicinal chemistry. J Med Chem 2014, 57(8):3186-3204.
62. Willett P: Similarity-based virtual screening using 2D fingerprints. Drug Discov Today 2006, 11(23-24):1046-1053.
63. Historic compounds (ChemicalProbes.org) [http://www.chemicalprobes.org/historic_compounds]


64. Python Language Reference, version 2.7. [http://www.python.org]
65. Django (Version 1.10) [https://djangoproject.com]
66. PostgreSQL (version 9.5) [https://www.postgresql.org/]
67. In: Probe Reports from the NIH Molecular Libraries Program. Bethesda (MD); 2010.
68. Nature Chemical Biology Probes [http://www.nature.com/nchembio/chemical_probes.html]
69. SGC Chemical Probes [http://www.thesgc.org/chemical-probes]
70. Published Kinase Inhibitor Set [https://www.ebi.ac.uk/chembldb/extra/PKIS/]
71. Wang J, Gray NS: SnapShot: Kinase Inhibitors II. Mol Cell 2015, 58(4):710e711.
72. Wang J, Gray NS: SnapShot: Kinase Inhibitors I. Mol Cell 2015, 58(4):708e701.
73. HMS LINCS database - Small molecules [http://lincs.hms.harvard.edu/db/sm/]
74. Yang W, Lightfoot H, Bignell G, Behan F, Cokelear T, Haber D, Engelman J, Stratton M, Benes C, McDermott U et al: Genomics of Drug Sensitivity in Cancer (GDSC): A resource for biomarker discovery in cancer cells. Eur J Cancer 2016, 68:S82-S82.
75. Chambers J, Davies M, Gaulton A, Papadatos G, Hersey A, Overington JP: UniChem: extension of InChI-based compound mapping to salt, connectivity and stereochemistry layers. J Cheminform 2014, 6.
76. Pence HE, Williams A: ChemSpider: An Online Chemical Information Resource. J Chem Educ 2010, 87(11):1123-1124.
77. Jewison T, Su YL, Disfany FM, Liang YJ, Knox C, Maciejewski A, Poelzer J, Huynh J, Zhou Y, Arndt D et al: SMPDB 2.0: Big Improvements to the Small Molecule Pathway Database. Nucleic Acids Res 2014, 42(D1):D478-D484.
