Open PHACTS experience of sustainability MIOSS 2016 [email protected] Openphacts.org

Open PHACTS MIOSS may 2016


Open PHACTS – experience of sustainability

MIOSS [email protected]

Openphacts.org

Open PHACTS Mission:

Integrate Multiple Research Biomedical Data Resources Into A Single Open & Free Access Point

…and make it sustainable in the long term

WHAT WE THOUGHT

… or how we thought the world was

[Figure labels: Literature, PubChem, Genbank, Patents, Databases, Downloads; Data Analysis, Data Integration, Firewalled Databases]

How do pharma companies use public data?

[Figure labels: Pfizer, AZ, Roche, … n; example identifiers P12047, X31045, GB:29384]

Andy Law's Third Law: “The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study”

http://bioinformatics.roslin.ac.uk/lawslaws/

WHAT WE DID

ChEMBL

DrugBank

Gene Ontology

Wikipathways

UniProt

ChemSpider

UMLS

ConceptWiki

ChEBI

TrialTrove

GVKBio

GeneGo

TR Integrity

“Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM”

“What is the selectivity profile of known p38 inhibitors?”

“Let me compare MW, logP and PSA for known oxidoreductase inhibitors”

DisGeNet

neXtProt

ChEMBL Target Class

ENZYME

FDA adverse events

SureChEMBL

@gray_alasdair Big Data Integration

[Figure: Open PHACTS platform architecture. Labels recovered from the diagram:]

Apps

Linked Data API (RDF/XML, TTL, JSON)

Domain Specific Services

Core Platform

Semantic Workflow Engine

Identity Resolution Service

Identifier Management Service

Chemistry Registration, Normalisation & Q/C

Indexing

Data Cache (Virtuoso Triple Store)

Data sources: Public Content, Commercial, Public Ontologies, User Annotations (accompanied by VoID, Db and Nanopub labels)

Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”

Mappings (Raw): 25,087,328
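The idea of resolving many identifiers to one entity from pairwise mappings can be sketched in a few lines. This is a minimal illustration only, not the Open PHACTS Identity Resolution Service: it assumes every mapping is symmetric and transitive (a real service may restrict this), and the function names and sample identifiers are hypothetical.

```python
from collections import defaultdict


def build_equivalence_classes(mappings):
    """Group identifiers that are linked by pairwise mappings (union-find).

    Assumption: every mapping means "same entity" and is transitive.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in mappings:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for identifier in list(parent):
        groups[find(identifier)].add(identifier)
    return list(groups.values())


def equivalents(identifier, mappings):
    """Return every identifier mapped (directly or indirectly) to `identifier`."""
    for group in build_equivalence_classes(mappings):
        if identifier in group:
            return group
    return {identifier}  # unknown identifiers only match themselves


if __name__ == "__main__":
    # Made-up identifiers from three hypothetical sources
    raw_mappings = [
        ("sourceA:001", "sourceB:X9"),
        ("sourceB:X9", "sourceC:77"),
        ("sourceA:002", "sourceB:Y3"),
    ]
    print(sorted(equivalents("sourceC:77", raw_mappings)))
```

With mappings treated this way, a query using any one identifier can be expanded to all of its equivalents before the data cache is searched.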

“CUTTING THE GORDIAN KNOT”

What are the problems with licensing we had to address?

– To make the data and software generated by the project usable and reusable

– Multiplicity of unclear or non-standard licenses on original data sources

• ‘Public’ can mean use but not redistribute, use in commercial environment,

• Legal position on use and reuse extremely unclear

• Different issues than just linking to data

– What is the legal status of integrated collections of the above, and of derived knowledge?

– Appropriate software license selection

– Legal clarity for EFPIA and end users

– Approaches for commercial data integration, EFPIA in-house data

AIM: to enable maximum possible dissemination and usability of the integrated data and architecture generated by the project - with approaches that will be applicable in other data integration projects

Licensing Challenges

Dataset              Downloaded     Version       Licence          Triples

Bio Assay Ontology   -              -             CC-By            10,360

CALOHA               8 Apr 2015     2014-01-22    CC-By-ND         14,552

ChEBI                4 Mar 2015     125           CC-By-SA         1,012,056

ChEMBL               18 Feb 2015    20.0          CC-By-SA         445,732,880

ConceptWiki          12 Dec 2013    -             CC-By-SA         4,331,760

DisGeNET             31 Mar 2015    2.1.0         ODbL             15,011,136

Disease Ontology     -              2015-05-21    CC-By            188,062

DrugBank             19 Feb 2015    4.1           Non-commercial   4,028,767

ENZYME               -              2015_11       CC-By-ND         61,467

FDA Adverse Events   9 Jul 2012     -             CC0              13,557,070

Example Data Licenses
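To see how much of the integrated data sits under each licence, the table's figures can be grouped with a short script. This is a sketch only: the dataset names, licences and triple counts are copied from the table, while the function names are hypothetical and nothing here says what any licence permits.

```python
from collections import defaultdict

# (dataset, licence, triples) as listed in the table
DATASETS = [
    ("Bio Assay Ontology", "CC-By", 10_360),
    ("CALOHA", "CC-By-ND", 14_552),
    ("ChEBI", "CC-By-SA", 1_012_056),
    ("ChEMBL", "CC-By-SA", 445_732_880),
    ("ConceptWiki", "CC-By-SA", 4_331_760),
    ("DisGeNET", "ODbL", 15_011_136),
    ("Disease Ontology", "CC-By", 188_062),
    ("DrugBank", "Non-commercial", 4_028_767),
    ("ENZYME", "CC-By-ND", 61_467),
    ("FDA Adverse Events", "CC0", 13_557_070),
]


def triples_by_licence(datasets):
    """Sum the triple counts of all datasets sharing a licence."""
    totals = defaultdict(int)
    for _name, licence, triples in datasets:
        totals[licence] += triples
    return dict(totals)


def datasets_with_licence(datasets, licences):
    """List dataset names whose licence is in the given set, in table order."""
    return [name for name, licence, _triples in datasets if licence in licences]


if __name__ == "__main__":
    for licence, total in sorted(triples_by_licence(DATASETS).items()):
        print(f"{licence:15s} {total:>13,d}")
```

Grouping by licence string is a deliberately simple choice; deciding whether a given combination of licences allows redistribution of the integrated collection still needs legal review.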

Handling private data securely in the cloud

WHERE DID THAT GET US?

DELIVERY UPDATE

Regular data updates as the core data refreshes

API updates aligned to new business questions and changes

Workstreams to add further new data – see later

New release May 2016 2.1

– SureChEMBL and Pathways update

Further updates planned for summer 2016

Usage

>500 million queries

Open PHACTS Evolution - Platform

[Figure: platform configurations combining Public Data, Private Data and Commercial Data, with VM deployments]

• Security audited hosted platform

• Platform sustainability

Open PHACTS Expanding EcoSystem

Further Apps

Explorer

Workshops

Researchathons

Further planned

WHAT IT FEELS LIKE NOW

… or how it really is

Sustaining Impact

“Software is free like puppies are free - they both need money for maintenance”

…and more resource for future development

Kick-Starting Sustainability

[Figure labels: Collaboration, Grants, Industry, Open PHACTS, API Users, Apps, API]

Open PHACTS Foundation Routes to Access

Access Route | Open API services | Unlimited API services | Unlimited API, RDF and Link sets | Open PHACTS Virtual Machine

Full OPF Member ✓ ✓ ✓ ✓

Licensor/Reseller* ✓ ✓ ✓

Licensor (Own Use) ✓ ✓ ✓

High volume API Licensor ✓ ✓

Open Access API Consumer ✓

Open Data Non-commercial ✓ **

*3rd parties must have own agreement with OPF

** talk to us for collaborative proposals – non commercial use

Open PHACTS Foundation engaged in the following projects ……

Come and collaborate

New projects

Improve our code and services

Open Innovation projects

Webinars

New ideas for data services and workflows

[email protected] @Open_PHACTS

Open PHACTS Practical Semantics

Acknowledgements

GlaxoSmithKline – Coordinator

Universität Wien – Managing entity

Technical University of Denmark

University of Hamburg, Center for Bioinformatics

BioSolveIT GmbH

Consorci Mar Parc de Salut de Barcelona

Leiden University Medical Centre

Royal Society of Chemistry

Vrije Universiteit Amsterdam

Novartis

Merck Serono

H. Lundbeck A/S

Eli Lilly

Netherlands Bioinformatics Centre

Swiss Institute of Bioinformatics

ConnectedDiscovery

EMBL-European Bioinformatics Institute

Janssen

Esteve

Almirall

OpenLink

Scibite

The Open PHACTS Foundation

Spanish National Cancer Research Centre

University of Manchester

Maastricht University

Aqnowledge

University of Santiago de Compostela

Rheinische Friedrich-Wilhelms-Universität Bonn

AstraZeneca

Pfizer

Questions/Discussions

Moving from Project to Foundation – What are the expectations?

How best to show the value of platform/API?

What would you expect from sustainability?

What can we do differently?

How to best get contributors involved?

Thanks