Data Integration in a Big Data Context Open PHACTS Case Study Alasdair J G Gray [email protected] alasdairjggray.co.uk @gray_alasdair

Page 1: Data Integration in a Big Data Context: An Open PHACTS Case Study

Data Integration in a Big Data Context

Open PHACTS Case Study

Alasdair J G Gray [email protected] alasdairjggray.co.uk @gray_alasdair

Page 2: Data Integration in a Big Data Context: An Open PHACTS Case Study

Big Data

@gray_alasdair Big Data Integration

Volume Velocity

Variety Veracity

Image: http://i.kinja-img.com/gawker-media/image/upload/lvzm0afp8kik5dctxiya.jpg

Page 3: Data Integration in a Big Data Context: An Open PHACTS Case Study

Open PHACTS Use Case

“Let me compare MW, logP and PSA for launched inhibitors of human & mouse oxidoreductases”

• Chemical Properties (ChemSpider)
• Launched drugs (DrugBank)
• Human => Mouse (HomoloGene)
• Protein Families (Enzyme)
• Bioactivity Data (ChEMBL)
• … other info (UniProt/Entrez etc.)

Page 4: Data Integration in a Big Data Context: An Open PHACTS Case Study

Open PHACTS Mission: Integrate Multiple Research Biomedical Data Resources Into A Single Open & Free Access Point

Page 5: Data Integration in a Big Data Context: An Open PHACTS Case Study

Literature, PubChem, GenBank, Patents, Databases, Downloads

Data Integration, Data Analysis, Firewalled Databases

Repeat @ each company

A single, shared solution.

Funded under
• IMI: 2011-14
• ENSO: 2014-16

Pre-competitive Data

Page 6: Data Integration in a Big Data Context: An Open PHACTS Case Study

http://dx.doi.org/10.1016/j.websem.2014.03.003

• Cloud-Based “Production” Level System.

• Secure & Private

• Guided By Business Questions

• Uses Semantic Web Technology

• Provides a RESTful API

http://dx.doi.org/10.1016/j.drudis.2013.05.008

Discovery Platform
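As a rough illustration of what calling a RESTful domain API from a web app can look like, the sketch below only builds a request URL. The base URL, the method name `compound` and the query parameter names (`uri`, `app_id`, `app_key`, `_format`) are placeholders assumed for this example, not taken from the slides.

```python
from urllib.parse import urlencode

# Hypothetical base URL: a placeholder, not the real platform endpoint.
BASE_URL = "https://api.example.org/1.5"


def build_request_url(method, concept_uri, app_id, app_key, fmt="json"):
    """Build the URL for one API method call (illustrative names only)."""
    query = urlencode({
        "uri": concept_uri,   # the thing being asked about, named by URI
        "app_id": app_id,     # assumed per-application credentials
        "app_key": app_key,
        "_format": fmt,       # assumed response format switch
    })
    return f"{BASE_URL}/{method}?{query}"


url = build_request_url(
    "compound", "http://example.org/compound/123", "my-id", "my-key"
)
print(url)
```

Keeping the concept URI as an ordinary query parameter means any HTTP client can issue the call, which fits the deck's emphasis on standard web technologies.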

Page 7: Data Integration in a Big Data Context: An Open PHACTS Case Study

Scientific Results

http://ceur-ws.org/Vol-1114/Demo_Dunlop.pdf

http://dx.doi.org/10.1016/j.drudis.2014.11.006

http://dx.doi.org/10.1002/minf.v31.8

http://dx.doi.org/10.1371/journal.pone.0115460

Page 8: Data Integration in a Big Data Context: An Open PHACTS Case Study

OPS Discovery Platform

Drug Discovery Platform

Apps

Domain API

Interactive responses

Production quality integration platform

Method Calls

Standard Web Technologies

Page 9: Data Integration in a Big Data Context: An Open PHACTS Case Study

App Ecosystem

An “App Store”?

Explorer, Explorer2, ChemBioNavigator, Target Dossier, Pharmatrek, Helium, MOE, Collector, Cytophacts, Utopia, Garfield, SciBite, KNIME, Mol. Data Sheets, PipelinePilot, scinav.it, Taverna

https://www.openphacts.org/2/sci/apps.html

Page 10: Data Integration in a Big Data Context: An Open PHACTS Case Study

ChemBioNavigator

http://chembionavigator.com

Page 11: Data Integration in a Big Data Context: An Open PHACTS Case Study

Page 12: Data Integration in a Big Data Context: An Open PHACTS Case Study

Page 13: Data Integration in a Big Data Context: An Open PHACTS Case Study

API Hits

[Chart: API hits per month, Jan 2013 to June 2015. x-axis: Month; y-axis: No of Hits, 0 to 60,000,000. Markers: public launch of 1.2 API, then 1.3 API, 1.4 API and 1.5 API.]

Page 14: Data Integration in a Big Data Context: An Open PHACTS Case Study

OPS Discovery Platform

[Architecture diagram]

Apps

Linked Data API (RDF/XML, TTL, JSON)

Domain Specific Services

Core Platform: Semantic Workflow Engine, Identity Resolution Service, Identifier Management Service, Chemistry Registration Normalisation & Q/C, Indexing, Data Cache (Virtuoso Triple Store)

Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”

Data sources: Public Content, Commercial, Public Ontologies, User Annotations

Source labels: RDF, Nanopub, Db, VoID

Page 15: Data Integration in a Big Data Context: An Open PHACTS Case Study

Open PHACTS Data

Page 16: Data Integration in a Big Data Context: An Open PHACTS Case Study

Data Licensing (Or Lack Of!)

John Wilbanks consulted for us

A framework built around STANDARD well-understood Creative Commons licences – and how they interoperate

Deal with the problems by:
• Interoperable licences
• Appropriate terms
• Declare expectations to users and data publishers

One size won't fit all requirements

Page 17: Data Integration in a Big Data Context: An Open PHACTS Case Study

API: Complex Interactions

Disease

Tissue

Target

Compound

Pathway

Page 18: Data Integration in a Big Data Context: An Open PHACTS Case Study

STANDARD_TYPE     UNIT_COUNT
----------------  ----------
AC50                       7
Activity                 421
EC50                      39
IC50                      46
ID50                      42
Ki                        23
Log IC50                   4
Log Ki                     7
Potency                   11
log IC50                   0

STANDARD_TYPE  STANDARD_UNITS  COUNT(*)
-------------  --------------  --------
IC50           nM                829448
IC50           ug.mL-1            41000
IC50                              38521
IC50           ug/ml               2038
IC50           ug ml-1              509
IC50           mg kg-1              295
IC50           molar ratio          178
IC50           ug                   117
IC50           %                    113
IC50           uM well-1             52

~100 units, >5000 types

Implemented using the Quantities, Units, Dimensions and Types Ontology (http://www.qudt.org/)

Quantitative Data Challenges
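To make the unit problem concrete, here is a minimal sketch of normalising IC50 values to nM. It is a plain-Python stand-in written for this example and does not use QUDT. The alias table covers only a few of the spellings listed above, the bare `uM` entry and the `mol_weight` parameter (in g/mol) are assumptions, and units with no sensible conversion are left unconverted.

```python
# Spellings of the same unit, mapped to one canonical label.
# Only a small, illustrative subset of the ~100 units is covered.
UNIT_ALIASES = {
    "nM": "nM",
    "uM": "uM",
    "ug.mL-1": "ug/mL",
    "ug/ml": "ug/mL",
    "ug ml-1": "ug/mL",
}

# Multipliers from canonical molar units to nM.
MOLAR_TO_NM = {"nM": 1.0, "uM": 1_000.0}


def to_nanomolar(value, unit, mol_weight=None):
    """Return value in nM, or None when the unit cannot be converted."""
    canonical = UNIT_ALIASES.get(unit.strip())
    if canonical in MOLAR_TO_NM:
        return value * MOLAR_TO_NM[canonical]
    if canonical == "ug/mL" and mol_weight:
        # 1 ug/mL = 1e-3 g/L; dividing by g/mol gives mol/L, then scale to nM.
        return value * 1e6 / mol_weight
    return None  # e.g. "%", "molar ratio", "mg kg-1", or a missing unit


print(to_nanomolar(2.0, "uM"))
print(to_nanomolar(1.0, "ug/ml", mol_weight=500.0))
print(to_nanomolar(5.0, "%"))
```

Returning `None` rather than guessing keeps unconvertible records visible for later quality checks instead of silently mixing them into an analysis.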

Page 19: Data Integration in a Big Data Context: An Open PHACTS Case Study

Quality Assurance

Page 20: Data Integration in a Big Data Context: An Open PHACTS Case Study

P12047, X31045, P12047

GB:29384, RS_2353

Identity Mapping

Andy Law's Third Law: “The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study” http://bioinformatics.roslin.ac.uk/lawslaws/
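One common way to reason about identity mapping is to treat each pairwise mapping as a link and group identifiers that are connected. The sketch below does this with a small union-find. It is an illustration only, not a description of the platform's services, and the links between the identifiers shown on this slide are invented for the example.

```python
from collections import defaultdict


def group_equivalent(mappings):
    """Group identifiers connected by pairwise mappings (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in mappings:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for identifier in parent:
        groups[find(identifier)].add(identifier)
    return list(groups.values())


# Hypothetical links: which identifier maps to which is assumed here.
links = [
    ("P12047", "X31045"),
    ("X31045", "GB:29384"),
    ("RS_2353", "GB:29384"),
]
groups = group_equivalent(links)
print(groups)
```

Grouping by transitive closure treats every link as equally trustworthy, which is exactly the assumption the following slides question.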

Page 21: Data Integration in a Big Data Context: An Open PHACTS Case Study

Gleevec®: Imatinib Mesylate

DrugBank, ChemSpider, PubChem

Imatinib, Mesylate, Imatinib Mesylate

YLMAHDNUQAMNNX-UHFFFAOYSA-N

Page 22: Data Integration in a Big Data Context: An Open PHACTS Case Study

Gleevec®: Imatinib Mesylate

Are these records the same? It depends upon your task!

Page 23: Data Integration in a Big Data Context: An Open PHACTS Case Study

skos:exactMatch(InChI)

Strict Relaxed

Analysing Browsing

Structure Lens

I need to perform an analysis, give me details of the active compound in Gleevec.

Page 24: Data Integration in a Big Data Context: An Open PHACTS Case Study

skos:closeMatch(Drug Name)

skos:closeMatch(Drug Name)

skos:exactMatch(InChI)

Strict Relaxed

Analysing Browsing

Name Lens

Which targets are known to interact with Gleevec?
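The two lenses can be read as choosing which mapping predicates are followed when collecting equivalent records. The sketch below is one possible reading, not the platform's implementation: the record names and the assignment of links to sources are hypothetical, and only the predicate names come from the slides.

```python
# Which mapping predicates each lens is willing to follow.
LENSES = {
    "structure": {"skos:exactMatch"},                   # strict, for analysis
    "name": {"skos:exactMatch", "skos:closeMatch"},     # relaxed, for browsing
}

# Hypothetical mappings between records (subject, predicate, object).
MAPPINGS = [
    ("drugbank:imatinib", "skos:exactMatch", "chemspider:imatinib"),
    ("drugbank:imatinib", "skos:closeMatch", "pubchem:imatinib-mesylate"),
    ("chemspider:imatinib", "skos:closeMatch", "pubchem:imatinib-mesylate"),
]


def equivalents(start, lens):
    """Return all records reachable from start via predicates the lens allows."""
    allowed = LENSES[lens]
    seen, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for subj, pred, obj in MAPPINGS:
            if pred in allowed and node in (subj, obj):
                other = obj if node == subj else subj
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
    return seen


print(sorted(equivalents("drugbank:imatinib", "structure")))
print(sorted(equivalents("drugbank:imatinib", "name")))
```

Under the structure lens only the exact match is followed, so the salt form stays separate. Under the name lens all three records are treated as the same thing, which suits a browsing question such as the one above.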

Page 25: Data Integration in a Big Data Context: An Open PHACTS Case Study

Data Provenance

Page 26: Data Integration in a Big Data Context: An Open PHACTS Case Study

Data Provenance

Page 27: Data Integration in a Big Data Context: An Open PHACTS Case Study

dev.openphacts.org

Page 28: Data Integration in a Big Data Context: An Open PHACTS Case Study
Page 29: Data Integration in a Big Data Context: An Open PHACTS Case Study

Open PHACTS Approach

1. Know your audience
• Web developers

2. Understand your use cases
• Prioritised business questions

3. Identify access pathways
• Identify data
• Identify connections
• Implement API

Page 30: Data Integration in a Big Data Context: An Open PHACTS Case Study

Questions

Alasdair J G Gray [email protected] alasdairjggray.co.uk @gray_alasdair

Open PHACTS [email protected] @open_phacts