Page 1: Dataset Descriptions in Open PHACTS and HCLS

Dataset Descriptions in Open PHACTS and W3C HCLS IG

Alasdair J G Gray, Heriot-Watt University

www.alasdairjggray.co.uk

NDEx Call, April 2014

Page 2: Dataset Descriptions in Open PHACTS and HCLS

[Figure: Core Platform diagram. Components: Apps; Linked Data API (RDF/XML, TTL, JSON); Domain Specific Services; Semantic Workflow Engine; Identity Resolution Service; Identifier Management Service; Chemistry Registration (Normalisation & Q/C); Indexing; Data Cache (Virtuoso Triple Store). Sources: Public Content, Commercial, Public Ontologies and User Annotations, supplied as RDF, Nanopub and Db with VoID descriptions. Example identifiers: P12374, EC2.43.4, CS4532, "Adenosine receptor 2a".]

Page 3: Dataset Descriptions in Open PHACTS and HCLS

[Figure: Core Platform diagram (Data Cache (Triple Store), Semantic Workflow Engine, Linked Data API (RDF/XML, TTL, JSON), Domain Specific Services, Identity Resolution Service, Identifier Management Service) with the sources ChEMBL, ChEMBL-RDF, Chem2Bio2RDF and SD, labelled with versions v13, v12, v2 or v8. Example identifiers: P12374, EC2.43.4, CS4532, "Adenosine receptor 2a". Caption: ChEMBL, January 2012.]

Page 4: Dataset Descriptions in Open PHACTS and HCLS
Page 5: Dataset Descriptions in Open PHACTS and HCLS

ChemSpider

• Data aggregator: over 400 sources
  – What data does it contain?
  – What version of ?? did they load?
  – When are new versions loaded?

• OPS data covers
  – ChEBI
  – ChEMBL
  – DrugBank

2 April 2014 OPS Dataset Descriptions – A. J. G. Gray 5

Page 6: Dataset Descriptions in Open PHACTS and HCLS

Metadata Challenges

• Datasets available
  – In many versions over time
  – In different formats
  – From many mirrors/registries

• Datasets build on each other
• Files do not carry metadata
• Registries
  – Can be out-of-date
  – Can contain conflicting information


Users require data provenance!

Page 7: Dataset Descriptions in Open PHACTS and HCLS


Page 8: Dataset Descriptions in Open PHACTS and HCLS


Page 9: Dataset Descriptions in Open PHACTS and HCLS

Description Model


Page 10: Dataset Descriptions in Open PHACTS and HCLS

Realisation of Dataset Descriptions

• Needs to be incorporated into the data publishing pipeline

• Hard for publishers to provide conformant descriptions
  – Datasets are complex
  – Evolve over time
  – Seen as yet another burden
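Part of the burden can be taken off publishers by computing the statistical parts of a description inside the publishing pipeline instead of writing them by hand. The Python sketch below assumes the data dump is available as N-Triples with one statement per line and uses a simplified line pattern rather than a conformant parser; the function names and example IRIs are hypothetical. It counts the VoID statistics `void:triples`, `void:distinctSubjects`, `void:properties` and `void:distinctObjects` and renders them as Turtle.

```python
import re

# Simplified N-Triples line pattern: subject, predicate, object, final dot.
# Assumes one statement per line; this is not a conformant N-Triples parser.
TRIPLE = re.compile(r'^\s*(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$')


def void_statistics(lines):
    """Compute a few VoID statistics from an iterable of N-Triples lines."""
    triples = set()  # an RDF graph is a set, so duplicate lines count once
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        match = TRIPLE.match(stripped)
        if match is None:
            raise ValueError(f"not an N-Triples statement: {line!r}")
        triples.add(match.groups())
    return {
        "void:triples": len(triples),
        "void:distinctSubjects": len({s for s, _, _ in triples}),
        "void:properties": len({p for _, p, _ in triples}),
        "void:distinctObjects": len({o for _, _, o in triples}),
    }


def to_turtle(dataset_iri, stats):
    """Render the statistics as a Turtle snippet for the dataset description."""
    body = " ;\n".join(f"    {prop} {value}" for prop, value in stats.items())
    return (
        "@prefix void: <http://rdfs.org/ns/void#> .\n\n"
        f"<{dataset_iri}> a void:Dataset ;\n{body} .\n"
    )


if __name__ == "__main__":
    # Hypothetical example data, not taken from any real dataset.
    dump = [
        '<http://example.org/c1> <http://www.w3.org/2000/01/rdf-schema#label> "aspirin" .',
        '<http://example.org/c1> <http://example.org/weight> "180.16" .',
        '<http://example.org/c2> <http://www.w3.org/2000/01/rdf-schema#label> "caffeine" .',
    ]
    print(to_turtle("http://example.org/dataset/v1", void_statistics(dump)))
```

Because the statistics are recomputed from each new dump, they stay in step with the data as it evolves; descriptive properties such as title and licence would still have to come from the publisher.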


Page 11: Dataset Descriptions in Open PHACTS and HCLS

VoID Editor


Page 12: Dataset Descriptions in Open PHACTS and HCLS

Validator
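The slide does not say which checks the Validator performs. As a minimal sketch of the idea, assuming a description is held as a mapping from property to a list of values, a validator can compare it against a table of requirement levels. The property list, the MUST/SHOULD levels and the single-valued set below are hypothetical placeholders, not the Open PHACTS specification.

```python
# Hypothetical requirement levels, for illustration only: the actual
# Open PHACTS dataset description guidelines define their own property list.
REQUIREMENTS = {
    "dcterms:title": "MUST",
    "dcterms:description": "MUST",
    "dcterms:license": "MUST",
    "pav:version": "MUST",
    "dcterms:issued": "SHOULD",
    "pav:previousVersion": "SHOULD",
}

# Hypothetical: properties that may carry at most one value.
SINGLE_VALUED = {"dcterms:title", "pav:version"}


def validate(description):
    """Check a description (property -> list of values) against REQUIREMENTS.

    Returns (errors, warnings): missing MUST properties and repeated
    single-valued properties are errors; missing SHOULD properties are warnings.
    """
    errors, warnings = [], []
    for prop, level in REQUIREMENTS.items():
        if not description.get(prop):
            target = errors if level == "MUST" else warnings
            target.append(f"missing {level} property {prop}")
    for prop in sorted(SINGLE_VALUED):
        if len(description.get(prop, [])) > 1:
            errors.append(f"{prop} has more than one value")
    return errors, warnings


if __name__ == "__main__":
    # Hypothetical description of one dataset version.
    example = {
        "dcterms:title": ["ChEMBL"],
        "pav:version": ["13"],
        "dcterms:license": ["http://example.org/licence"],
    }
    errors, warnings = validate(example)
    print("errors:", errors)
    print("warnings:", warnings)
```

Separating errors from warnings mirrors the usual MUST/SHOULD distinction, so a publisher can fix blocking problems first and treat the rest as advice.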


Page 13: Dataset Descriptions in Open PHACTS and HCLS

W3C HCLS Group

Page 14: Dataset Descriptions in Open PHACTS and HCLS

HCLS Community Profile Model
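The slide shows the model only as a diagram. Assuming the three-level split used by the HCLS dataset description profile (a summary description, one description per version, and one per distribution), a sketch of the triples for one dataset version might look as follows. The IRIs are hypothetical, literals are not escaped, and `dct:format` is given as a plain media-type string for simplicity. The `summary_of` helper walks the links from a distribution back to its summary, the kind of provenance trail users need.

```python
# Standard namespace IRIs for the prefixes used below.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "dctypes": "http://purl.org/dc/dcmitype/",
    "pav": "http://purl.org/pav/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "void": "http://rdfs.org/ns/void#",
}


def hcls_levels(base, title, version, download_url, media_type):
    """Return (subject, predicate, object) triples describing a summary,
    one version and one distribution of a dataset.

    The IRIs minted from `base` are a hypothetical naming scheme; literals
    are not escaped, so titles must not contain double quotes.
    """
    summary = f"<{base}>"
    ver = f"<{base}/{version}>"
    dist = f"<{base}/{version}/rdf>"
    return [
        # Summary level: version-independent facts about the dataset.
        (summary, "rdf:type", "dctypes:Dataset"),
        (summary, "dct:title", f'"{title}"'),
        # Version level: one release, linked back to its summary.
        (ver, "rdf:type", "dctypes:Dataset"),
        (ver, "dct:isVersionOf", summary),
        (ver, "pav:version", f'"{version}"'),
        (ver, "dcat:distribution", dist),
        # Distribution level: one downloadable serialisation of the release.
        (dist, "rdf:type", "void:Dataset"),
        (dist, "rdf:type", "dcat:Distribution"),
        (dist, "dct:format", f'"{media_type}"'),
        (dist, "dcat:downloadURL", f"<{download_url}>"),
    ]


def summary_of(triples, distribution):
    """Follow distribution -> version -> summary; None if no link is found."""
    for version, pred, obj in triples:
        if pred == "dcat:distribution" and obj == distribution:
            for subj, p, summary in triples:
                if subj == version and p == "dct:isVersionOf":
                    return summary
    return None


def to_turtle(triples):
    """Serialise the triples as Turtle, one statement per line."""
    header = "".join(f"@prefix {name}: <{iri}> .\n" for name, iri in PREFIXES.items())
    body = "".join(f"{s} {p} {o} .\n" for s, p, o in triples)
    return header + "\n" + body


if __name__ == "__main__":
    description = hcls_levels(
        "http://example.org/dataset/chembl", "ChEMBL", "13",
        "http://example.org/downloads/chembl13.ttl.gz", "text/turtle",
    )
    print(to_turtle(description))
```

Keeping version-independent facts on the summary means a new release only adds a version and a distribution description, rather than repeating everything.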


Page 15: Dataset Descriptions in Open PHACTS and HCLS

Future Vision

Metadata: write once, use many times

• Provide rich and accurate provenance trail of data
  – Automatic pipeline from VoID file to registries

• Align Open PHACTS with W3C HCLS
  – Update tools for HCLS profile
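An automatic pipeline from VoID file to registries could start by mapping the properties in a dataset's VoID file onto the fields a registry expects, so the metadata is written once and reused. The sketch below assumes an N-Triples serialisation of the VoID file and uses a simplified line pattern; the registry field names in `FIELD_MAP` are hypothetical, and each real registry would need its own mapping.

```python
import json
import re

# Simplified pattern for N-Triples statements whose subject and predicate are IRIs.
TRIPLE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')

# Hypothetical mapping from description properties to registry fields;
# each real registry defines its own schema.
FIELD_MAP = {
    "http://purl.org/dc/terms/title": "name",
    "http://purl.org/dc/terms/license": "license",
    "http://purl.org/pav/version": "version",
    "http://rdfs.org/ns/void#dataDump": "download",
}


def object_value(term):
    """Strip the N-Triples syntax from an IRI or a simple literal."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    if term.startswith('"'):
        # Drops the quotes plus any language tag or datatype; ignores escapes.
        return term[1:term.rindex('"')]
    return term


def registry_records(lines):
    """Build one registry record per described dataset from its VoID file."""
    records = {}
    for line in lines:
        match = TRIPLE.match(line)
        if match is None:
            continue  # blank lines, comments and blank-node subjects are skipped
        subject, predicate, obj = match.groups()
        field = FIELD_MAP.get(predicate)
        if field is not None:
            # If a property repeats, the last value read wins.
            records.setdefault(subject, {})[field] = object_value(obj)
    return records


if __name__ == "__main__":
    # Hypothetical VoID fragment for one dataset version.
    void_file = [
        '<http://example.org/dataset/chembl/13> <http://purl.org/dc/terms/title> "ChEMBL"@en .',
        '<http://example.org/dataset/chembl/13> <http://purl.org/pav/version> "13" .',
        '<http://example.org/dataset/chembl/13> <http://rdfs.org/ns/void#dataDump> <http://example.org/chembl13.nt.gz> .',
    ]
    # The same record could then be serialised for each registry.
    print(json.dumps(registry_records(void_file), indent=2))
```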
