Edge Informatics and FAIR* Data Tom Plasterer, PhD Research & Development Information (RDI); US Cross-Science Director 20 February 2017 Integrated Pharma Informatics * Findable, Accessible, Interoperable and Reusable


Page 1: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data

Edge Informatics and FAIR* Data

Tom Plasterer, PhD

Research & Development Information (RDI); US Cross-Science Director

20 February 2017

Integrated Pharma Informatics

* Findable, Accessible, Interoperable and Reusable

Page 2: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data

We Want Data Nirvana!

• The right data is there when I need it

• Your data and my data are mutually understandable

• Our data can be effortlessly combined

• I am permitted to use any data I can access

• Data can be reshaped for a different purpose

• Data sharing is rewarded

‘I’ can be a human or a machine

Page 3: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data

Node and Edge Informatics: Interfaces

[Figure: R&D pipeline] Target Discovery → Lead Discovery → Lead Optimization → Pre-Clinical Development → Clinical Development → Registration → Marketing & Sales

Page 4: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data

Node and Edge Informatics: Interfaces

[Figure: R&D pipeline] Target Discovery → Lead Discovery → Lead Optimization → Pre-Clinical Development → Clinical Development → Registration → Marketing & Sales

Activities shown: NGS; Exome analysis; Pathway Analysis; Structure Analysis; Disease Contextualization

Page 5: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data

Node and Edge Informatics: Interfaces

[Figure: R&D pipeline] Target Discovery → Lead Discovery → Lead Optimization → Pre-Clinical Development → Clinical Development → Registration → Marketing & Sales

Activities shown: NGS; Exome analysis; Pathway Analysis; RNAi; Assay Development; HTS; Structure Analysis; Disease Contextualization; SAR; In vivo non-human testing; Exploratory PK; Exploratory Tox; GLP Tox; Formulation; ADMEPK; Efficacy; IND; Safety, Tolerability; Phase I-III; NDA/BLA; MAA; PMR; REMS; PSUR; Observational Research

Page 6: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data

Node and Edge Informatics: Interfaces

[Figure: R&D pipeline] Target Discovery → Lead Discovery → Lead Optimization → Pre-Clinical Development → Clinical Development → Registration → Marketing & Sales

Activities shown: NGS; Exome analysis; Pathway Analysis; RNAi; Assay Development; HTS; Structure Analysis; Disease Contextualization; SAR; In vivo non-human testing; Exploratory PK; Exploratory Tox; GLP Tox; Formulation; ADMEPK; Efficacy; IND; Safety, Tolerability; Phase I-III; NDA/BLA; MAA; PMR; REMS; PSUR; Observational Research

Seamless information connectivity (an EDGE) needed across domain NODEs

Page 7: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data


FAIR Data: Overview

To be Findable:

• Globally unique, resolvable and persistent identifiers

• Machine-actionable contextual information supporting discovery

To be Accessible:

• Clearly defined access protocol

• Clearly defined rules for authorization/authentication

To be Interoperable:

• Use shared vocabularies and/or ontologies

• Syntactically and semantically machine-accessible format

To be Reusable:

• Be compliant with the F, A and I Principles

• Contextual information, allowing proper interpretation

• Rich provenance information facilitating accurate citation

Mark Wilkinson, Data Interoperability and FAIRness Through Existing Web Technologies
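The checklist above can be turned into a coarse, first-pass screen over a dataset description. The following is only a sketch: the `DatasetDescription` record, its field names and the `fair_gaps` helper are hypothetical and do not come from the FAIR principles themselves, which are considerably richer than a presence check.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetDescription:
    """Hypothetical metadata record; all field names are illustrative."""
    identifier: str = ""            # globally unique, resolvable, persistent ID
    discovery_metadata: dict = field(default_factory=dict)  # machine-actionable context
    access_protocol: str = ""       # clearly defined access protocol
    auth_rules: str = ""            # authorization/authentication rules
    vocabularies: list = field(default_factory=list)  # shared vocabularies/ontologies
    data_format: str = ""           # machine-accessible format
    context: str = ""               # information needed for interpretation
    provenance: str = ""            # provenance supporting accurate citation


# Which (hypothetical) fields stand in for each principle on the slide.
REQUIRED = {
    "Findable": ["identifier", "discovery_metadata"],
    "Accessible": ["access_protocol", "auth_rules"],
    "Interoperable": ["vocabularies", "data_format"],
    "Reusable": ["context", "provenance"],
}


def fair_gaps(description: DatasetDescription) -> dict:
    """Return, per principle, the fields that are still empty."""
    gaps = {
        principle: [name for name in names if not getattr(description, name)]
        for principle, names in REQUIRED.items()
    }
    # Reusable additionally requires compliance with F, A and I.
    if any(gaps[p] for p in ("Findable", "Accessible", "Interoperable")):
        gaps["Reusable"].append("F, A and I compliance")
    return gaps
```

An empty list for every principle means only that the fields are populated; whether an identifier actually resolves, or a vocabulary is genuinely shared, still needs human or automated verification.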

Page 8: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data


FAIR Data: A Brief History

Moving away from Narrative

• Nanopublications

Incubating Standards in Open PHACTS

• VoID, PROV-O

Lorentz Center Workshop

• FORCE 11 FAIR Guiding Principles

• Participants: IMI members, US researchers, content providers, ELIXIR, European Open Science Cloud, Big Data to Knowledge (BD2K)

Current Status:

• FAIR Data Workshops (EU-ELIXIR nodes)

• Inclusion in Horizon 2020, NIH Advocacy

• IMI2 Data FAIR-ification Call

• Vendors getting up to speed

Page 9: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data


FAIR Data: Systems Biology Survey

Molecular Systems Biology

Volume 11, Issue 12, 28 DEC 2015 DOI: 10.15252/msb.20156053

http://onlinelibrary.wiley.com/doi/10.15252/msb.20156053/full#msb156053-fig-0001

Page 10: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data


FAIR Data & Biopharma?

Collaborative & Competitive Intelligence:

• Who do we want to partner with? Are there complementary assets to our portfolio?

• What space is too crowded and not our area of expertise?

• Greenfield situations?

Mergers, Acquisitions, Partnerships:

• How do we efficiently and deeply absorb data generated elsewhere into our systems? How do we efficiently share?

• Does this make a smaller biotech/start-up a more viable partner?

Improved Patient Care:

• Can we share data and outcomes more efficiently in complicated trial settings (basket trials, adaptive trials) to better engage opinion leaders and foster dialog?

• Along with Differential Privacy approaches, can we have the broader research community help mine our data?

Data (Ir)-reproducibility:

• Is preclinical data reproducible?

• Can we utilize data credentialization? (thanks to Dan Crowther @ Sanofi)

Page 11: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data


Differential Privacy (DP): Clinical Data Anonymization

• A quantifiable method for anonymizing data by modifying the data fields that can aid in the identification of individuals.

• Adopted by large corporations like Apple and Google to protect the privacy of users of their services.

AZ Differential Privacy Efforts:

• Developed and publishing a DP algorithm designed to anonymize clinical data.

• Developing open source software in R (and Mathematica)

DP helps support the FAIR guiding principles for scientific data:

• Findable: DP may facilitate pharma patient data transparency

• Accessible

• Interoperable: analysis of private and DP data yields the same statistics

• Reusable: enables reuse inside as well as outside the pharma company firewalls

Enabling FAIR Guiding Principles for Scientific Data
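To make "quantifiable" concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a count query. This is a standard differential privacy technique, not the AZ algorithm mentioned on the slide (which is not described here); the function names, the `epsilon` default-free signature and the toy records are all assumptions for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_count(records, predicate, epsilon: float, rng=None) -> float:
    """Return a noisy count of records matching `predicate`.

    A count changes by at most 1 when one individual is added or removed
    (sensitivity 1), so Laplace noise with scale 1/epsilon is used.
    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical toy records, not real patient data.
    patients = [{"responder": True}] * 30 + [{"responder": False}] * 70
    print(dp_count(patients, lambda p: p["responder"], epsilon=0.5))
```

The privacy budget `epsilon` is the quantifiable part: it bounds how much any single individual's presence can change the distribution of the released value. Released statistics are close to, but not identical with, the values computed on the private data.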

Page 12: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data


Edge Informatics & FAIR Application: CI360

WINNER

Page 13: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data

Capture Business Questions: Inventory

Capture Business Questions and Sources

Page 14: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data

Translate Questions into Concepts: Team Modeling

Domain Expert Concept Map

“Where are the key clinical studies in NSCLC and who are the principal investigators?”

Page 15: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data

Challenge with Data: Remodel

“Where are the key clinical studies in NSCLC and who are the principal investigators?” (one example)

Challenge with Linked Data

Source: https://clinicaltrials.gov/ct2/show/NCT02027428
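One way to picture how such a question maps onto linked data is as a set of triple patterns joined on a shared variable, which is what a SPARQL basic graph pattern does. The sketch below uses a tiny in-memory triple list; every identifier in it (the `ex:` terms, studies, sites and investigators) is hypothetical and does not describe the CI360 model or the trial record cited above.

```python
# Hypothetical triples: (subject, predicate, object).
TRIPLES = [
    ("ex:study1", "ex:studiesCondition", "ex:NSCLC"),
    ("ex:study1", "ex:hasSite", "ex:siteA"),
    ("ex:study1", "ex:hasPrincipalInvestigator", "ex:investigatorX"),
    ("ex:study2", "ex:studiesCondition", "ex:NSCLC"),
    ("ex:study2", "ex:hasSite", "ex:siteB"),
    ("ex:study2", "ex:hasPrincipalInvestigator", "ex:investigatorY"),
    ("ex:study3", "ex:studiesCondition", "ex:BreastCancer"),
    ("ex:study3", "ex:hasSite", "ex:siteA"),
    ("ex:study3", "ex:hasPrincipalInvestigator", "ex:investigatorZ"),
]


def match(pattern, triple, bindings):
    """Unify one pattern with one triple; terms starting with '?' are variables."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None      # constant does not match
    return extended


def query(patterns, triples):
    """Join the patterns left to right, returning all variable bindings."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# "Where are the key clinical studies in NSCLC and who are the principal investigators?"
NSCLC_QUESTION = [
    ("?study", "ex:studiesCondition", "ex:NSCLC"),
    ("?study", "ex:hasSite", "?site"),
    ("?study", "ex:hasPrincipalInvestigator", "?pi"),
]

if __name__ == "__main__":
    for row in query(NSCLC_QUESTION, TRIPLES):
        print(row["?study"], row["?site"], row["?pi"])
```

The remodeling challenge is then largely about agreeing on the predicates and identifiers so that patterns written by one team match triples produced by another.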

Page 16: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data

Refine the Answer: Configurable Interfaces

Examine with a Faceted Browser

“What are the open trials in metastatic breast cancer and what drugs are being tested?”

Page 17: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data

Share Insights as a Community: Nanopublish

“Can a biomarker-defined population be added to a trial record?”

Share insights with a Knowledge Base
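A nanopublication is generally modeled as three linked parts: an assertion, provenance about that assertion, and publication information about the nanopublication itself. The sketch below shows that shape for the biomarker question; the class, the helper, and all `ex:` identifiers are hypothetical, and `prov:wasAttributedTo` is used only as a familiar PROV-O property, not as a statement of how CI360 records attribution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nanopublication:
    """Minimal three-part nanopublication; triples are plain tuples."""
    uri: str
    assertion: tuple    # the claim itself
    provenance: tuple   # statements about the assertion
    pubinfo: tuple      # statements about the nanopublication

    def is_complete(self) -> bool:
        """All three parts must contain at least one triple."""
        return all((self.assertion, self.provenance, self.pubinfo))


def make_nanopub(uri, subject, predicate, obj, asserted_by, published_by):
    """Wrap a single claim with attribution for the claim and for its publication."""
    assertion_uri = uri + "#assertion"
    return Nanopublication(
        uri=uri,
        assertion=((subject, predicate, obj),),
        provenance=((assertion_uri, "prov:wasAttributedTo", asserted_by),),
        pubinfo=((uri, "prov:wasAttributedTo", published_by),),
    )


if __name__ == "__main__":
    # Hypothetical identifiers throughout.
    nanopub = make_nanopub(
        uri="ex:nanopub1",
        subject="ex:trialRecord1",
        predicate="ex:hasEligiblePopulation",
        obj="ex:biomarkerDefinedPopulation1",
        asserted_by="ex:domainExpertA",
        published_by="ex:knowledgeBaseCurator",
    )
    print(nanopub.is_complete())
```

Keeping the claim separate from who made it and who published it is what lets a community knowledge base accept small additions to a record while still tracing each one to its source.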

Page 18: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data

Is CI360 FAIR?

To be Findable:

• Resources named with URIs, with a defined policy

• Dataset descriptions published with VoID on intranet

To be Accessible:

• Data reachable via REST and SPARQL APIs

• Application secured via SSO

To be Interoperable:

• Uses well-described internal and public ontologies

• All data is linked data (RDF)

To be Reusable:

• Daily updates tracked with VoID and PROV-O

• Vocabularies used in CI360 already reused in four other applications
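As an illustration of how VoID dataset descriptions and a SPARQL API fit together, the sketch below builds a SPARQL Protocol GET request that lists described datasets. The endpoint URL is a placeholder, and whether CI360 exposes `dcterms:title` or `dcterms:modified` on its datasets is an assumption; only the `query` parameter and the VoID and Dublin Core namespaces are standard.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real CI360 address is not given in the slides.
ENDPOINT = "https://ci360.example.org/sparql"

# List datasets described with VoID, with an optional last-modified date.
VOID_QUERY = """
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?dataset ?title ?modified WHERE {
  ?dataset a void:Dataset ;
           dcterms:title ?title .
  OPTIONAL { ?dataset dcterms:modified ?modified }
}
ORDER BY ?title
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL; the query goes in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


if __name__ == "__main__":
    # Only prints the URL; sending it would also require handling the
    # SSO-secured session mentioned on the slide.
    print(sparql_get_url(ENDPOINT, VOID_QUERY))
```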

Page 19: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data

R&D | RDI

FAIR Data and Edge Informatics: Take-aways

Get your plumbing right

• And your data won’t be stuck in a silo

Use Edge Informatics

• Consider handoffs; you don’t know how your data will be used in the future

Leverage working public solutions

• Don’t reinvent the wheel (OK, Ontology…)

Invest in FAIR Data Stewardship

• Investment to future-proof your efforts

Page 20: Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) Data

Thanks

Key Influencers in Linked Data Community

Molecular Medicine Tri-Con 2017 Conference Organizers

AZ/MedImmune Linked Data Community