Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Egon Willighagen <http://chem-bla-ics.blogspot.com/>, Bioclipse & Proteochemometric Group (Prof. Wikberg), Department of Pharmaceutical Biosciences, Uppsala University, 2009-08-31

My presentation at the "Open Drug Discovery and Open Notebook Science" session at the GDCh-Wissenschaftsforum Chemie 2009 in Frankfurt.

Page 1: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source and Open Standards

Egon Willighagen <http://chem-bla-ics.blogspot.com/>

Bioclipse & Proteochemometric Group (Prof. Wikberg)
Department of Pharmaceutical Biosciences
Uppsala University

2009-08-31

Page 2: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Problem

Solution

Results

Discussions

Conclusion

The Setting...

1998: Organic chemistry... beautiful science!
But... why, how, what, ...

P.J.J.A. Buijnsters et al., Eur. J. Org. Chem., 2002, 1397–1406

2009-08-31 Bioclipse & Proteochemometric Group - 2 - Egon Willighagen | chem-bla-ics.blogspot.com

Page 3: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Reliable Knowledge: Trust

How to build Trust
• track record

Page 4: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Knowledge: Trust

How to build Trust
• track record
• transparency: citation

Page 5: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Knowledge: Trust

How to build Trust
• track record
• transparency: citation
• reproducibility: details

Page 6: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Knowledge: Trust

How to build Trust
• track record
• transparency: citation
• reproducibility: details

Open {Data|Standards|Source|...}

Page 7: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Knowledge Representation...

What are the organic normal conditions?

Page 8: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

The Problem: Reproducibility...

Where reproducibility is severely hampered:
• recalculate basic atom and bond properties
• access to QSAR/QSPR data
• well-defined algorithms
• publications destroy information

Page 9: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Solutions...

Openness
• license that allows modification and redistribution
• hiding behind public domain is not helpful

Semantic Web
• be explicit in what you mean
• both in facts and in algorithms

Page 10: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Reproducibility needs ODOSOS

Open Data
• No Intellectual Monopoly

Open Source
• algorithms are complex
• implementations even more
• strong interaction with representation

Open Standards
• Semantic Web
• formats
• unique identifiers

http://en.wikipedia.org/wiki/Glyn_Moody

Page 11: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Jmol

• Started in 1997 by Dan Gezelter (Notre Dame)
• Leaders: Bradley Smith, me, Miguel Howard, Bob Hanson

E.L. Willighagen, M. Howard, Nature Precedings, 2005
http://www.jmol.org/

Page 12: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

The Chemistry Development Kit

A Family of Projects
• CDK-Taverna (chemoinformatics workflows)
• JChemPaint (semantic 2D editor)
• ChemoJava (GPL-ed extension)

Goals
• library of cheminformatics algorithms
• educational

Usage
• CDK 2003: 75+ times cited in literature
• Bioclipse, KNIME, Jumbo (CML), AMBIT, ...

C. Steinbeck et al., J. Chem. Inf. Comput. Sci., 2003
C. Steinbeck et al., Curr. Pharm. Design, 2006

Page 13: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

CDK: an Open Project

Features
• open mailing list and bug tracker
• open source repository
• release soon, release often

Offer Review
• senior developers review patches

Page 14: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Bioclipse

O. Spjuth et al., BMC Bioinformatics 2007, 8:59

Page 15: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Integration

Services
• databases: PubChem
• web services
• Google Spreadsheets
• MyExperiment.org: Bioclipse Scripting Language
• Twitter, ...
• journals, ...

Techniques
• SOAP, REST, XMPP, ...
• Resource Description Framework
• dedicated APIs

Page 16: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

MyExperiment: Bioclipse Scripting Language

Page 17: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

XMPP

XMPP
• Jabber protocol
• Alternative to HTTP

Features
• Asynchronous
• XML-based: improved semantics

J. Wagener et al., BMC Bioinformatics, 2009, in production

Page 18: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Resource Description Framework

Facts as Triples
• subject
• predicate (relation)
• object

Examples
• wp:Benzene chem:hasSMILES "c1ccccc1"
• wp:Benzene owl:sameAs chemspider:123
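To make the triple idea concrete, here is a minimal Python sketch that holds the two example facts from this slide as (subject, predicate, object) records and looks them up by pattern. It is an illustration only, not how Bioclipse or any RDF library does it: the `Triple` type and the `match` helper are hypothetical names, and the prefixed names (`wp:`, `chem:`, `chemspider:`) are kept as opaque strings because the slide does not give their namespace URIs.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF-style fact: subject, predicate (relation), object."""
    subject: str
    predicate: str
    obj: str


# The two facts from the slide, kept as prefixed names (no URIs assumed).
TRIPLES = [
    Triple("wp:Benzene", "chem:hasSMILES", '"c1ccccc1"'),
    Triple("wp:Benzene", "owl:sameAs", "chemspider:123"),
]


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Ask for the SMILES literal recorded for benzene.
for fact in match(TRIPLES, subject="wp:Benzene", predicate="chem:hasSMILES"):
    print(fact.obj)
```

Because every fact has the same three-part shape, a single generic lookup covers both the property statement and the identity link.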

Page 19: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

OpenMolecules RDF

http://rdf.openmolecules.net/

Page 20: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Blue Obelisk

R. Guha et al., J. Chem. Inf. Model., 2006

Page 21: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Which License?

Choice
• GPL v2 or v3, LGPL v2 or v3, Apache, BSD, MIT, ...
• FDL, CC0, PDDL
• Important: redistribution, modification

Bad Practice
• not explicitly stating your intentions
• Public Domain

Page 22: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Mixing Data?

License Incompatibility
• Ask about the copyright holder's intention!

Use Open Standard Interfaces
• Resource Description Framework
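As a rough illustration of why RDF helps when mixing data, the Python sketch below merges two small triple sets and then answers a question that neither set can answer alone. The assumptions are loud ones: the split into `source_a` and `source_b` is hypothetical, the facts are simply the two examples from the RDF slide, and `smiles_for` is an invented helper that follows `owl:sameAs` one step only rather than computing a full closure.

```python
# Two hypothetical sources, each contributing one of the example facts
# from the RDF slide; the split into sources is assumed for illustration.
source_a = {("wp:Benzene", "chem:hasSMILES", '"c1ccccc1"')}
source_b = {("wp:Benzene", "owl:sameAs", "chemspider:123")}

# Merging RDF graphs that contain no blank nodes is a set union of triples.
merged = source_a | source_b


def smiles_for(identifier, graph):
    """Collect SMILES for an identifier, following owl:sameAs one step
    in either direction (not a full transitive closure)."""
    aliases = {identifier}
    for s, p, o in graph:
        if p == "owl:sameAs" and identifier in (s, o):
            aliases.update((s, o))
    return sorted(o for s, p, o in graph
                  if p == "chem:hasSMILES" and s in aliases)


# Only the merged graph links the ChemSpider identifier to a SMILES string.
print(smiles_for("chemspider:123", source_a))
print(smiles_for("chemspider:123", merged))
```

The merge itself needs no format conversion, which is the technical half of the slide's point; whether the two sources may legally be combined still depends on their licenses.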

Page 23: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

Conclusions

No Intellectual Monopoly Achieved
• Jmol, CDK, JChemPaint, Bioclipse
• A huge success!

Open Data in chemistry is still way behind
• Open Access trap
• Public Domain trap

Semantics is showing up
• in RDF
• in Publishing

Page 24: Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

The Details

http://www.citeulike.org/user/egonw/tag/papers

http://chem-bla-ics.blogspot.com

mailto:[email protected]
