The CDK, Bioclipse, and RDF Egon Willighagen <http://chem-bla-ics.blogspot.com/> Bioclipse & Proteochemometric Group (Prof. Wikberg) Department of Pharmaceutical Biosciences Uppsala University 2010-04-01

OpenTox Virtual Seminar: Bioclipse - Life Science Application and Ontology Development in Cheminformatics and Bioinformatics


OpenTox Virtual Seminar presentation of 2010-04-01.




Problem

Building Blocks

Solution

Application

Conclusion

Who am I?

http://www.citeulike.org/user/egonw/tag/papers

http://chem-bla-ics.blogspot.com

http://egonw.github.com

waveto: [email protected]

2010-04-01 Bioclipse & Proteochemometric Group - 2 - Egon Willighagen | chem-bla-ics.blogspot.com


The Problem...

Solanum lycopersicum...

We model our world, but ...

Life is not uni- or bivariate
Knowledge is not either
But we think of it as such
Information Loss!


Names...

benzene

3-[4-[3-(1-methyl-7-oxo-3-propyl-4H-pyrazolo[4,3-d]pyrimidin-5-yl)-4-propoxyphenyl]sulfonylpiperazin-1-yl]propanoic acid

InChI=1S/C25H34N6O6S/c1-4-6-19-22-23(29(3)28-19)25(34)27-24(26-22)18-16-17(7-8-20(18)37-15-5-2)38(35,36)31-13-11-30(12-14-31)10-9-21(32)33/h7-8,16H,4-6,9-15H2,1-3H3,(H,32,33)(H,26,27,34)
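Unlike a trivial or systematic name, the InChI identifier on this slide is layered and machine-readable. As a minimal sketch (not part of the original talk), the formula layer can be turned into element counts with the standard library alone. The function name `inchi_formula_counts` is hypothetical, and the parser assumes a single-component formula: it does not handle "." separators or leading multipliers.

```python
import re
from typing import Dict

# InChI string copied from the slide.
INCHI = (
    "InChI=1S/C25H34N6O6S/c1-4-6-19-22-23(29(3)28-19)25(34)27-24(26-22)"
    "18-16-17(7-8-20(18)37-15-5-2)38(35,36)31-13-11-30(12-14-31)10-9-21"
    "(32)33/h7-8,16H,4-6,9-15H2,1-3H3,(H,32,33)(H,26,27,34)"
)


def inchi_formula_counts(inchi: str) -> Dict[str, int]:
    """Return element counts from the formula layer of an InChI.

    Assumption: the formula layer is the second '/'-separated field and
    describes a single component (no '.' and no leading multiplier).
    """
    layers = inchi.split("/")
    if len(layers) < 2 or not layers[0].startswith("InChI="):
        raise ValueError("not an InChI string")
    formula = layers[1]
    counts: Dict[str, int] = {}
    # An element symbol is one capital letter plus an optional lowercase
    # letter; a missing number means a count of one.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


if __name__ == "__main__":
    print(inchi_formula_counts(INCHI))
```

The same counts cannot be read off the systematic name without a full name parser, which is one way to see why the choice of representation matters.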


... Molecular reality...

1 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000


... and Numbers


Knowledge Representation: Information Loss


Data Analysis


Proteochemometrics


Main Theme

How do we navigate dimensionality space?
How do we include prior knowledge?
While minimizing information loss?
With optimal knowledge extraction?
And maximizing interpretability?
Without ending up in random correlation?


The Setting...

1998: Organic chemistry... beautiful science!
But ... why, how, what, ...

P.J.J.A. Buijnsters et al., Eur. J. Org. Chem., 2002, 1397–1406


Knowledge Representation...

What are the organic normal conditions?


The Problem: Reproducibility...

Where reproducibility is severely hampered:

  recalculate basic atom and bond properties
  access to QSAR/QSPR data
  well-defined algorithms
  publications destroy information


Solutions...

Openness
  license that allows modification and redistribution
  hiding behind public domain is not helpful

Semantic Web
  be explicit in what you mean
  both in facts and in algorithms


Reproducibility needs ODOSOS

Open Data
  No Intellectual Monopoly

Open Source
  algorithms are complex
  implementations even more
  strong interaction with representation

Open Standards
  Semantic Web
  formats
  unique identifiers

http://en.wikipedia.org/wiki/Glyn_Moody


Jmol

Started in 1997 by Dan Gezelter (Notre Dame)
Leaders: Bradley Smith, me, Miguel Howard, Bob Hanson

E.L. Willighagen, M. Howard, Nature Precedings, 2005
http://www.jmol.org/


The Chemistry Development Kit

A Family of Projects
  CDK-Taverna (chemoinformatics workflows)
  JChemPaint (semantic 2D editor)
  ChemoJava (GPL-ed extension)

Goals
  library of cheminformatics algorithms
  educational

Usage
  CDK 2003: 75+ times cited in literature
  Bioclipse, KNIME, Jumbo (CML), AMBIT, ...

C. Steinbeck et al., J. Chem. Inf. Comput. Sci., 2003
C. Steinbeck et al., Curr. Pharm. Design, 2006


CDK: an Open Project

Features
  open mailing list and bug tracker
  open source repository
  release soon, release often

Offer Review
  senior developers review patches


Bioclipse

O. Spjuth et al., BMC Bioinformatics 2007, 8:59


Integration

Services
  databases: PubChem
  web services
  Google Spreadsheets
  MyExperiment.org: Bioclipse Scripting Language
  Twitter, ...
  journals, ...

Techniques
  SOAP, REST, XMPP, ...
  Resource Description Framework
  dedicated APIs


Resource Description Framework

Facts as Triples
  subject
  predicate (relation)
  object

Examples
  wp:Benzene chem:hasSMILES "c1ccccc1"
  wp:Benzene owl:sameAs chemspider:123
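The two example facts can be written down as plain (subject, predicate, object) tuples. The sketch below expands the prefixed names to full URIs and prints them as N-Triples lines. Only the `owl` namespace is the real W3C one; the URIs for `wp`, `chem` and `chemspider` are placeholders assumed for illustration, since the slide does not define them. The helpers `expand` and `to_ntriples` are likewise hypothetical.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Prefix map. The owl namespace is the W3C one; the other three are
# ASSUMED placeholder namespaces, not taken from the slides.
PREFIXES: Dict[str, str] = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "wp": "http://en.wikipedia.org/wiki/",
    "chem": "http://example.org/chem#",
    "chemspider": "http://example.org/chemspider/",
}

# The two facts from the slide.
EXAMPLES: List[Triple] = [
    Triple("wp:Benzene", "chem:hasSMILES", '"c1ccccc1"'),
    Triple("wp:Benzene", "owl:sameAs", "chemspider:123"),
]


def expand(term: str) -> str:
    """Expand a prefixed name to a full URI; leave quoted literals alone."""
    if term.startswith('"'):
        return term
    prefix, _, local = term.partition(":")
    return PREFIXES[prefix] + local


def to_ntriples(triple: Triple) -> str:
    """Serialize one triple as an N-Triples line (URIs in angle brackets)."""
    parts = []
    for term in triple:
        value = expand(term)
        parts.append(value if value.startswith('"') else "<" + value + ">")
    return " ".join(parts) + " ."


if __name__ == "__main__":
    for fact in EXAMPLES:
        print(to_ntriples(fact))
```

Being explicit about the predicate is the point: `chem:hasSMILES` and `owl:sameAs` say what kind of relation holds, which a bare table column does not.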


OpenMolecules RDF: dereferenceable URI

http://rdf.openmolecules.net/


OpenMolecules RDF: linked data

http://rdf.openmolecules.net/


Bioclipse-RDF

local RDF storage
read/write RDF/XML, N3
run SPARQL queries (local and remote)
extract RDF from XHTML/RDFa

Thanx to Jena and Pellet.
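To give a feel for what running a query over a local RDF store involves, here is a small in-memory sketch of the core of a SPARQL basic graph pattern: terms starting with "?" are variables, and a solution is a set of bindings that satisfies every pattern. This is not the Bioclipse scripting API and not Jena. The functions `match` and `query` are hypothetical stand-ins written for illustration, and the data are the two benzene facts from the RDF slide.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Toy local store holding the two example facts from the RDF slide.
TRIPLES: List[Triple] = [
    ("wp:Benzene", "chem:hasSMILES", '"c1ccccc1"'),
    ("wp:Benzene", "owl:sameAs", "chemspider:123"),
]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Try to match one pattern against one triple.

    Returns the extended binding, or None if the triple does not fit.
    """
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against an earlier binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(triples: Sequence[Triple], patterns: Sequence[Triple]) -> List[Binding]:
    """Return every binding that satisfies all patterns (a conjunction)."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


if __name__ == "__main__":
    # Roughly: SELECT * WHERE { ?mol owl:sameAs ?same . ?mol chem:hasSMILES ?smiles }
    for row in query(TRIPLES, [("?mol", "owl:sameAs", "?same"),
                               ("?mol", "chem:hasSMILES", "?smiles")]):
        print(row)
```

A real SPARQL engine adds filters, optional patterns, indexing and remote endpoints on top of this join. The nested loops here are only meant to show the matching idea.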


Names 2 Graphs 2 Numbers...


ChEMBL / QSAR


RDF graph visualization


OWL for Descriptors

Used for model and data.


MyExperiment: Bioclipse Scripting Language


What does this bring us?

Platform to integrate the RDF with the computation world
Bioclipse as single point of access
Scripting, sharing of scripts with MyExperiment.org
Bridge the nominal with the numerical world
