Implementing chemistry platform for OpenPHACTS: Lessons learned Colin Batchelor, Alexey Pshenichnov, Jon Steele, Valery Tkachenko Royal Society of Chemistry ACS Spring 2016 San Diego, CA March 17th 2016

Page 1: Implementing chemistry platform for OpenPHACTS

Implementing chemistry platform for OpenPHACTS: Lessons learned

Colin Batchelor, Alexey Pshenichnov, Jon Steele, Valery Tkachenko

Royal Society of Chemistry

ACS Spring 2016, San Diego, CA, March 17th 2016

Page 2: Implementing chemistry platform for OpenPHACTS

Open PHACTS Mission: Integrate Multiple Research Biomedical Data Resources Into A Single Open & Sustainable Access Point

Page 3: Implementing chemistry platform for OpenPHACTS

[email protected] @Open_PHACTS

Open PHACTS Practical Semantics

Acknowledgements

• GlaxoSmithKline – Coordinator
• Universität Wien – Managing entity
• Technical University of Denmark
• University of Hamburg, Center for Bioinformatics
• BioSolveIT GmbH
• Consorci Mar Parc de Salut de Barcelona
• Leiden University Medical Centre
• Royal Society of Chemistry
• Vrije Universiteit Amsterdam
• Novartis
• Merck Serono
• H. Lundbeck A/S
• Eli Lilly
• Netherlands Bioinformatics Centre
• Swiss Institute of Bioinformatics
• ConnectedDiscovery
• EMBL-European Bioinformatics Institute
• Janssen
• Esteve
• Almirall
• OpenLink
• SciBite
• The Open PHACTS Foundation
• Spanish National Cancer Research Centre
• University of Manchester
• Maastricht University
• Aqnowledge
• University of Santiago de Compostela
• Rheinische Friedrich-Wilhelms-Universität Bonn
• AstraZeneca
• Pfizer

Page 4: Implementing chemistry platform for OpenPHACTS

Why is it so hard to….

Competitors?

What’s the structure?

Are they in our file?

What’s similar?

What’s the target?

Pharmacology data?

Known Pathways?

Working On Now?

Connections to disease?

Expressed in right cell type?

IP?

Page 5: Implementing chemistry platform for OpenPHACTS

How do R&D companies use public data?

[Diagram labels: Literature, PubChem, Genbank, Patents, Databases, Downloads, Data Analysis, Data Integration, Firewalled Databases]

Page 6: Implementing chemistry platform for OpenPHACTS

@gray_alasdair Big Data Integration

Page 7: Implementing chemistry platform for OpenPHACTS

Patent annotations in Open PHACTS

• Huge amount of knowledge hidden in patent corpus
• Most of which will never be published elsewhere
• Substantial lag between patent and scientific literature
• SureChEMBL system already extracts chemical entities from full-text patent documents
• Text (title, abstract, description, claims), images, molfiles
• Complemented with gene and disease entity annotations
• Using the Termite text-mining tool by SciBite
• Relevance scoring to reduce noise
• Tested for recall
• Patent, compound, gene, disease info available via API

Page 8: Implementing chemistry platform for OpenPHACTS

Open PHACTS Expanding EcoSystem

Further Apps

Data Warrior

Page 9: Implementing chemistry platform for OpenPHACTS

• VM install of Open PHACTS – Docker image is now available

• Updating to version 2.0 of Open PHACTS

• Allows you to customise and load your own data into the environment

Want to load your data into Open PHACTS?

Want to run Open PHACTS within your environment?

Page 10: Implementing chemistry platform for OpenPHACTS
Page 11: Implementing chemistry platform for OpenPHACTS

Usage

>500 million queries

Page 12: Implementing chemistry platform for OpenPHACTS

All Users by Sector Type

Page 13: Implementing chemistry platform for OpenPHACTS

Challenge of migrating between versions of the API

Upgrading

Page 14: Implementing chemistry platform for OpenPHACTS
Page 15: Implementing chemistry platform for OpenPHACTS

Explorer Explorer2 ChemBioNavigator Target Dossier Pharmatrek Helium

MOE Collector Cytophacts Utopia Garfield SciBite

KNIME Mol. Data Sheets PipelinePilot scinav.it Taverna

Page 16: Implementing chemistry platform for OpenPHACTS

openphactsfoundation.org/apps.html

Explorer.openphacts.org

Page 18: Implementing chemistry platform for OpenPHACTS

[Architecture diagram]

Data sources: Public Content, Commercial, Public Ontologies, User Annotations

Data source labels: RDF, Nanopub, Db, VoID

Core Platform: Data Cache (Virtuoso Triple Store), Semantic Workflow Engine, Linked Data API (RDF/XML, TTL, JSON), Domain Specific Services, Identity Resolution Service, Chemistry Registration, Normalisation & Q/C, Identifier Management Service, Indexing

Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”

Apps

Page 19: Implementing chemistry platform for OpenPHACTS

We integrate, standardize and host the chemical compound collection underpinning Open PHACTS.

We have developed a structure validation and normalization platform (CVSP) to ensure chemical structures are normalized to rules derived from the FDA structure normalization guidelines and modified based on input from members of EFPIA.

http://cvsp.chemspider.com/

The Royal Society of Chemistry’s role in Open PHACTS

Page 20: Implementing chemistry platform for OpenPHACTS

Freely available (requires logging in) chemical validation system for:

• Structure validation: warning on query atoms, pseudoatoms, nonsensical or unclear stereo

• Standardization workflows.

CVSP and the Open Pharmacological Space Chemical Registration System (OPS CRS)
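
The structure validation step can be illustrated with a minimal sketch that reads the atom block of a V2000 molfile and warns on query atoms and pseudoatoms. This is not CVSP's implementation: the symbol sets `QUERY_ATOMS` and `PSEUDOATOMS`, the function names and the example record are all assumptions made for illustration, and stereo checks are left out.

```python
# Illustrative symbol sets: assumptions for this sketch, not CVSP's rule set.
QUERY_ATOMS = {"A", "Q", "*", "L"}
PSEUDOATOMS = {"R", "R#", "X"}


def atom_symbols(molfile: str) -> list[str]:
    """Return the atom symbols from the atom block of a V2000 molfile."""
    lines = molfile.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line: atom count in columns 1-3
    # Atom lines follow the counts line; the symbol sits in columns 32-34.
    return [line[31:34].strip() for line in lines[4:4 + n_atoms]]


def validate(molfile: str) -> list[str]:
    """Return one warning per query atom or pseudoatom found."""
    warnings = []
    for index, symbol in enumerate(atom_symbols(molfile), start=1):
        if symbol in QUERY_ATOMS:
            warnings.append(f"atom {index}: query atom '{symbol}'")
        elif symbol in PSEUDOATOMS:
            warnings.append(f"atom {index}: pseudoatom '{symbol}'")
    return warnings


def atom_line(symbol: str) -> str:
    """Build a V2000 atom line at the origin (helper for the example only)."""
    return (f"{0:10.4f}{0:10.4f}{0:10.4f} {symbol:<3}"
            " 0  0  0  0  0  0  0  0  0  0  0  0")


# A made-up three-atom record: carbon, a query atom and an R-group placeholder.
EXAMPLE = "\n".join([
    "example", "  sketch", "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    atom_line("C"), atom_line("A"), atom_line("R#"),
    "  1  2  1  0", "  2  3  1  0",
    "M  END",
])

print(validate(EXAMPLE))
```

Reading the symbol by column position rather than by splitting on whitespace follows the fixed-width layout of the molfile atom block, so wide coordinate values cannot shift the field.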

Page 21: Implementing chemistry platform for OpenPHACTS

Chemical data sources

Data source                       Number of records in source
DrugBank                          6828
PDB ligands                       18681
MeSH (extracted by text mining)   24381
ChEBI                             40503
HMDB                              41494
ChEMBL 20                         1456020
SureChEMBL 1.0                    14228299

Page 22: Implementing chemistry platform for OpenPHACTS

We generate RDF that:

1. Describes synonyms and identifiers
2. Provides linksets between our data sources and the OPS identifiers
3. Describes molecule–molecule relations of interest to the pharma industry
4. Delivers calculated physicochemical properties of compounds
5. Lists the validation and standardization issues found by CVSP.

Royal Society of Chemistry data provided to Open PHACTS

Page 23: Implementing chemistry platform for OpenPHACTS

• Use standard ontologies where possible (CHEMINF for cheminformatics properties, QUDT for units, OBO ontologies elsewhere)

• Use an event-based pattern for cheminformatics outputs. This enables us to add arbitrary provenance information.

Principles
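
One possible reading of the event-based pattern is sketched below: a calculated value is modelled as the output of a calculation event, and provenance is attached to the event rather than to the value. Every IRI here lives in a hypothetical `http://example.org/` namespace, the property and unit names stand in for CHEMINF and QUDT terms that the slides do not spell out, and the number is a placeholder rather than a real result.

```python
from dataclasses import dataclass, field

EX = "http://example.org/"  # hypothetical namespace, not a real ontology


@dataclass
class CalculationEvent:
    """The event that produced a value; provenance attaches here."""
    iri: str
    provenance: dict = field(default_factory=dict)


def property_triples(compound, prop, value, unit, event):
    """Return triples linking a compound to a value via its producing event."""
    output = event.iri + "/output"
    triples = [
        (compound, EX + "hasAttribute", output),
        (output, EX + "type", prop),
        (output, EX + "value", str(value)),
        (output, EX + "unit", unit),
        (output, EX + "outputOf", event.iri),
    ]
    # Provenance hangs off the event, so new keys need no change to the model.
    triples += [(event.iri, EX + key, val)
                for key, val in sorted(event.provenance.items())]
    return triples


event = CalculationEvent(
    iri=EX + "event/1",
    provenance={"software": "hypothetical-calculator", "date": "2016-03-17"},
)
triples = property_triples(
    compound=EX + "compound/1",
    prop=EX + "property/polar-surface-area",
    value=42.0,  # placeholder number, not a real result
    unit=EX + "unit/square-angstrom",
    event=event,
)
for triple in triples:
    print(triple)
```

Because the provenance dictionary is open-ended, adding a further key such as an operator or a parameter set only adds triples on the event node.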

Page 24: Implementing chemistry platform for OpenPHACTS

Use the CHEMINF ontology: https://github.com/semanticchemistry

Validated ChemSpider synonyms, Unvalidated ChemSpider synonyms, Validated database identifiers, Unvalidated database identifiers, InChI, InChIKey, SMILES, preferred ChemSpider name

1. Synonyms and identifiers

Page 25: Implementing chemistry platform for OpenPHACTS

Metadata describing the RDF:

• Can be used to build a directory of the RDF available

• Find what’s there without having to download all of it first

• Describes how Datasets are linked by the Linksets using SKOS.

Recommendations here: http://www.openphacts.org/specs/2013/WD-datadesc-20130912/

2. Linksets: Vocabulary of Interlinked Datasets
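
As a rough sketch of what such metadata can look like, the function below writes a VoID linkset description in Turtle. The `void:` and `skos:` terms are part of those vocabularies; the dataset and linkset IRIs and the triple count are hypothetical placeholders, and the Open PHACTS dataset description guidelines ask for more fields than are shown here.

```python
def linkset_turtle(linkset, source, target, predicate, triples):
    """Return a Turtle description of a VoID linkset between two datasets."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{linkset}> a void:Linkset ;",
        f"    void:subjectsTarget <{source}> ;",
        f"    void:objectsTarget <{target}> ;",
        f"    void:linkPredicate {predicate} ;",
        f"    void:triples {triples} .",
    ])


# All IRIs and the count below are hypothetical placeholders.
description = linkset_turtle(
    linkset="http://example.org/linkset/ops-to-source",
    source="http://example.org/dataset/ops",
    target="http://example.org/dataset/source",
    predicate="skos:exactMatch",
    triples=1000,
)
print(description)
```

A consumer can read this small description to learn which datasets are linked, and by which SKOS predicate, without downloading the linkset itself.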

Page 26: Implementing chemistry platform for OpenPHACTS

We relate molecules to “parent” forms, variously, those which are:

• uncharged
• not isotopically-specified
• not stereochemically-specified
• the preferred tautomer
• the largest fragment
• the “superparent” (all of the above)

3. Molecule–molecule relations in CHEMINF
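
To make one of these parent forms concrete, here is a deliberately crude sketch of picking the largest fragment from a SMILES string by counting heavy atoms. It is an assumption-laden illustration, not the OPS CRS method: it uses a regular expression rather than a chemistry toolkit, ignores implicit hydrogens, and resolves ties by taking the first fragment.

```python
import re

# Bracket atoms first, then two-letter halogens, then the organic subset.
ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOSPFI]|[bcnosp]")


def atom_count(fragment: str) -> int:
    """Count the heavy atoms written explicitly in a SMILES fragment."""
    return len(ATOM.findall(fragment))


def largest_fragment(smiles: str) -> str:
    """Return the dot-separated fragment with the most heavy atoms."""
    return max(smiles.split("."), key=atom_count)


# Sodium acetate: the acetate anion is kept, the counter-ion is dropped.
print(largest_fragment("CC(=O)[O-].[Na+]"))
```

Note that the result is still charged; producing the uncharged parent is a separate step that this sketch does not attempt.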

Page 27: Implementing chemistry platform for OpenPHACTS

log P, log D (at pH 5.5 and 7.4), bioconcentration factor, KOC (at pH 5.5 and 7.4), index of refraction, polar surface area, molar refractivity, molar volume, polarizability, surface tension, density at STP, flash point at 1 atm, enthalpy of vaporization at STP, vapour pressure at STP.

4. Calculated physicochemical properties

Page 28: Implementing chemistry platform for OpenPHACTS

5. Issues from validation and standardization

We use the CHEMINF ontology again.

We distinguish between information, warnings and errors. Only serious failures to process, such as a structure having an invalid atom, count as errors.
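
The three levels can be sketched as an ordered enumeration in which only an error blocks processing. The names `Severity`, `worst_severity` and `can_process`, and the example messages, are assumptions for illustration; the slides do not describe how CVSP represents issues internally.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Issue levels, ordered so that comparisons reflect seriousness."""
    INFORMATION = 1
    WARNING = 2
    ERROR = 3


def worst_severity(issues):
    """Return the most serious level among (severity, message) pairs."""
    return max((severity for severity, _ in issues), default=None)


def can_process(issues):
    """A record is processable unless at least one issue is an error."""
    return all(severity < Severity.ERROR for severity, _ in issues)


# Hypothetical issue lists for two records.
record_a = [
    (Severity.INFORMATION, "structure was standardized"),
    (Severity.WARNING, "unclear stereo"),
]
record_b = record_a + [(Severity.ERROR, "invalid atom")]

print(can_process(record_a), worst_severity(record_a).name)
print(can_process(record_b), worst_severity(record_b).name)
```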

Page 29: Implementing chemistry platform for OpenPHACTS

This is the world we live in

Page 30: Implementing chemistry platform for OpenPHACTS

Data quality issue and CVSP

– Robochemistry

– Proliferation of errors in public and private databases

• ChemSpider
• PubChem
• DrugBank
• KEGG
• ChEBI/ChEMBL

– Automated quality control system

Page 31: Implementing chemistry platform for OpenPHACTS

Chemistry Validation and Standardization Platform

Page 32: Implementing chemistry platform for OpenPHACTS

Chemistry Validation and Standardization Platform

Page 33: Implementing chemistry platform for OpenPHACTS

Thank you

Email: [email protected]

Slides: http://www.slideshare.net/valerytkachenko16