Clinical trial data wants to be free: Lessons from the ImmPort Immunology Data and Analysis Portal

Clinical Trial Data Wants to be Free

Barry Smith

Publish all drug trial results, says Dr Ben Goldacre, 19 June 2013:

http://www.bbc.co.uk/news/uk-politics-22957195

John Holdren, Director of the Office of Science and Technology Policy, “has directed Federal agencies with more than $100M in R&D expenditures to develop plans to make the published results of federally funded research freely available to the public within one year of publication and requiring researchers to better account for and manage the digital data resulting from federally funded scientific research.”

Increasingly, these data will be digitalized

Can we take care of the problems here, at least prospectively?

Paper is giving way to digitalized data

• As more studies come on-line, the problems involved in making them available to automated analysis will only get worse

• What we need is prospective standardization of a useful sort – using standards that will make the life of trialists easier and also increase the value of the data they produce for secondary use

Ought implies can


Complete clinical trial data can be made freely available, in de-identified form

https://immport.niaid.nih.gov/


Complete, deidentified data for 89 trials

DAIT-funded Projects Depositing Data Into ImmPort

• Immune Tolerance Network (ITN)
• Atopic Dermatitis Research Network (ADRN)
• Population Genetics Analysis Program
• HLA Region Genetics in Immune-Mediated Diseases
• Clinical Trials in Organ Transplantation in Children (CTOT-C)
• Consortium of Food Allergy Research (CoFAR)
• Renal and Lung Living Donors Evaluation Study (RELIVE)
• The Inner City Asthma Consortium (ICAC)

ImmPort Team

Northrop Grumman Information Technology Health Solutions

Stanford University: Atul Butte (PI), Mark Davis (co-PI), Garry Nolan (co-PI), Ravi Shankar

University at Buffalo: Barry Smith (co-PI), Alex Diehl, Alan Ruttenberg

Technion Israel Institute of Technology: Shai Shen-Orr

Why do we want the data to be free?

• Education
• Replication of results
• Scientific scrutiny / economy
• Secondary use
• New biomedical discovery, including DIY science
• New –omics / Big Data start-ups
• Reanalysis of original results
o Linking existing trial data to new bioinformatics discoveries
o Harvesting existing trial data by creating new, virtual meta-trials

Tutorial and Workshop: Ontology and Imaging Informatics

• Would the training of pathologists (or other professionals) change if hundreds/thousands of trial-labeled images were publicly available?

Cooperative Clinical Trials in Pediatric Transplantation (CCTPT): Four studies

[Figure: timeline of the four studies CN01, SW01, SNS01, and PC01 across 2001-2007]

Study   # Arms   Participating centers   Length of follow-up (years)   Transplant years
CN01    1        4                       3                             2001-03
SW01    2        19                      3                             2001-04
SNS01   2        12                      3                             2004-06
PC01    1        4                       2                             2005-07

PRELIMINARY


Common follow-up time points for 4 studies

Five time points are common to all four studies: 0, 3, 6, 12, and 24 months post-transplant
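Finding the follow-up time points shared by several studies amounts to a set intersection. The sketch below assumes hypothetical per-study schedules in months post-transplant. Only the five shared points come from the slide; every other value is invented for illustration.

```python
# Hypothetical schedules in months post-transplant. Only the shared points
# (0, 3, 6, 12, 24) come from the slide; the extra points are made up.
schedules = {
    "CN01": {0, 1, 3, 6, 12, 24, 36},
    "SW01": {0, 3, 6, 9, 12, 18, 24, 36},
    "SNS01": {0, 3, 6, 12, 24, 36},
    "PC01": {0, 3, 6, 12, 24},
}


def common_time_points(study_schedules):
    """Return the sorted time points present in every study's schedule."""
    return sorted(set.intersection(*study_schedules.values()))


common = common_time_points(schedules)
print(common)
```

The intersection is only meaningful once all studies express time in the same unit and against the same reference event, which is exactly where the naming problems on the following slides come in.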


ITN Data (with thanks to Ravi Shankar)

[Figure legend: HLA data (purple); Flow Cytometry data (yellow); PCR data (green); Study Protocol, Operational data, Clinical data (blue); Specimen Management Data (green)]


What is in a visit name?

Visit 0, v0, v 0, 0, Day 0, Transplant

[Figure, built up over several slides: the same baseline visit is labeled differently as it passes between groups and artifacts.
Groups: Protocol Group, Assay Group, CRO, Operations Group, Cimarron, Tube Manufacturer, Core Labs, Data Center.
Artifacts: Schedule of Events, Specimen Table, Tube Table, CRF, ImmunoTrak, Kit Report, Assays, Database.
Labels seen for the visit: "Day 0, Transplant", "0", "v 0", "v0", "Visit 0".]

• Allergy Score (Study Collection Day)
• Lab Tests (Study Time Collected)
• Microarray Data (Visit only)
• Flow (Collection_Study_day and Visit)

Mappings between protocol, lab tests and mechanistic assays were missing
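The visit-label variants listed earlier ("Visit 0", "v0", "v 0", "0", "Day 0, Transplant") can be folded onto one key with a small normalizer. This is a minimal sketch under stated assumptions: the function name, the regular expression, and the rule that "Transplant" means visit 0 are all hypothetical. Treating "Day N" and "Visit N" as the same number is safe only for the baseline visit, so a real mapping would have to consult each study's schedule of events.

```python
import re

# Hypothetical rule: an optional "visit", "v", or "day" prefix followed by a number.
_VISIT_PATTERN = re.compile(r"^(?:visit|v|day)?\s*(\d+)$", re.IGNORECASE)


def normalize_visit(label):
    """Map a free-text visit label to an integer visit number, or None."""
    # Labels such as "Day 0, Transplant" carry a trailing description;
    # keep only the part before the first comma.
    head = label.split(",")[0].strip()
    if head.lower() == "transplant":
        return 0  # assumption: the transplant itself defines visit 0
    match = _VISIT_PATTERN.match(head)
    return int(match.group(1)) if match else None


labels = ["Visit 0", "v0", "v 0", "0", "Day 0, Transplant", "Transplant"]
normalized = [normalize_visit(x) for x in labels]
print(normalized)
```

Labels that match no rule come back as None rather than being guessed, so they can be queued for manual review.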

How are these problems currently being solved?

Hard work

Problems with hard work:

• Does not scale
• Does not comport with the vision underlying ImmPort – that we can transform clinical medicine from an art into an (information-driven) science, based on repeatable processes documented in advance

Goals of ImmPort

• Accelerate a more collaborative and coordinated research environment
• Create an integrated database that broadens the usefulness of scientific data
• Advance the pace and quality of scientific discovery
• Integrate relevant data sets from participating laboratories, public and government databases, and private data sources
• Promote rapid availability of important findings
• Provide analysis tools to advance immunological research

Pipeline

• perform study & collect data
• analyze data (SAS …)
• submit data to ImmPort
• process & de-identify data in ImmPort
• discover, aggregate, analyze data in ImmPort

Actors: PIs, hospitals, biostatisticians; Northrop Grumman; Max & Mindy

Alternatives to the strategy of hard work

[Diagram: the same five-step pipeline. Actors: PIs, hospitals, biostatisticians; Northrop Grumman; Stanford (Max & Mindy)]

What we have currently

[Diagram: PIs, hospitals, biostatisticians, CROs … feeding data to Northrop Grumman]

Lots of free text, local formats, local standards, local terminologies operating here

Semantic Web strategy of post-coordination via arms-length enhancement of data


Northrop Grumman: uniform standards and ontologies applied post hoc

The problem with this post hoc strategy is that it still requires the same amount of hard work



A preferred but much more ambitious strategy: Pre-coordination


Identify uniform standards that can be applied already at the source, before the data reach Northrop Grumman
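Pre-coordination can be pictured as a check that runs when data are entered, rather than a clean-up that runs afterwards. The sketch below is illustrative only: the field names, the three specimen terms, and the function are hypothetical, and a real system would load its terms from a shared ontology rather than hard-code them.

```python
# Hypothetical controlled vocabulary for one field.
ALLOWED_SPECIMEN_TYPES = {"whole blood", "serum", "pbmc"}


def validate_record(record):
    """Return a list of problems found in one data-entry record."""
    problems = []
    raw_specimen = record.get("specimen_type")
    specimen = str(raw_specimen or "").strip().lower()
    if specimen not in ALLOWED_SPECIMEN_TYPES:
        problems.append(f"unknown specimen_type: {raw_specimen!r}")
    if not isinstance(record.get("study_day"), int):
        problems.append("study_day must be an integer")
    return problems


ok = validate_record({"specimen_type": "Serum", "study_day": 0})
bad = validate_record({"specimen_type": "blood (purple top)", "study_day": "v 0"})
print(ok, bad)
```

Rejecting "v 0" at entry is cheap. Reconciling it after submission is the hard work described above.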

ImmPort data is already being tagged

For example
• where data is prepared to meet FDA requirements
• where data is published to meet NIH mandates for reusability
• in the post-submission phase, where data is analyzed by third parties

But this tagging
• is partial
• is uncoordinated
• uses ontologies and analysis tools of varying quality

Ought implies can

Complete clinical trial data can be made freely available, in de-identified form

But to be useful these data need to be discoverable and analyzable

Which means: standardization

Two alternative strategies for standardization

• 1. via consensus-based ontologies adapted to the needs of trialists

• 2. via FDA (CDISC) standards

Advantages of pre-coordination with ontologies

• Better quality of data for all Maxes and Mindies
• Enhanced discoverability of data
• Cost-free submission of data to ImmPort
• Works even for those trials which have nothing to do with FDA
• Allows incremental strategy
• Leads to immediate integration with bioinformatics data sources

Immune-Related Ontologies (examples)

• Protein Ontology (PRO)
• Gene Ontology (GO)
• Cell Ontology (CL)
• Immune Epitope Ontology
• Beta Cell Genomics Ontology
• Infectious Disease Ontology
• Allergy Ontology
• Antibody Ontology
• CDISC2RDF
• CL+ (for CyTOF)
• Cytokine Ontology
• Immunology Ontology
• VDJ Ontology

http://ncorwiki.buffalo.edu/index.php/Immunology_Ontologies

The very same ontological framework will work not just for BISC but also for the NIAID BRCs


An example of how this will work: The ImmPort Antibody Registry/Ontology

Experimental methods typically report antibody clones or target markers using non-standardized terminology:

CD3e, CD3E, CD3ɛ, CD3 epsilon (protein names)

HIT3a vs. UCHT1 (antibody clones for CD3e)

550367 vs. 300401 (catalog numbers for anti-CD3e antibody reagents)

Even catalog numbers have a half-life as concerns the information they provide
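A registry of this kind can be thought of as a lookup from every reported spelling, clone, or catalog number to one canonical target. The sketch below is an assumption-laden illustration, not the actual ImmPort Antibody Registry schema: the canonical key "CD3e", the table layout, and the function names are hypothetical. The entries themselves are taken from the examples above.

```python
import unicodedata

# Hypothetical synonym table; the canonical key "CD3e" is illustrative.
_SYNONYMS = {
    "CD3e": ["CD3e", "CD3E", "CD3ɛ", "CD3 epsilon"],
}
# Clone and catalog numbers from the examples above. Which vendor or clone a
# catalog number belongs to is deliberately not modeled here.
_REAGENT_TARGETS = {"UCHT1": "CD3e", "550367": "CD3e", "300401": "CD3e"}


def _key(text):
    """Fold case and Unicode compatibility forms, and drop whitespace."""
    return "".join(unicodedata.normalize("NFKC", text).casefold().split())


_LOOKUP = {_key(s): canon for canon, syns in _SYNONYMS.items() for s in syns}
_LOOKUP.update({_key(k): v for k, v in _REAGENT_TARGETS.items()})


def canonical_target(reported):
    """Return the canonical target for a reported name, clone, or catalog number."""
    return _LOOKUP.get(_key(reported))


reported = ["CD3e", "CD3E", "CD3ɛ", "cd3 Epsilon", "UCHT1", "300401"]
resolved = [canonical_target(r) for r in reported]
print(resolved)
```

Unknown strings resolve to None, which makes gaps in the registry visible instead of silently passing free text through.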

ImmPort Antibody Registry (Diehl et al.)

from BD Lyoplate Screening Panels Human Surface Markers

Semantic Query / Discoverability

Find all experiments in which IL2 mRNA levels were quantified

Infers that IL2 mRNA is the analyte and that SAGE, QPCR, and microarrays are appropriate measurement techniques

Find all experiments that include samples from subjects with diseases like Type 1 diabetes

Infers that the source of the biological sample must be a human subject with Type 1 diabetes mellitus, Graves' disease, or another autoimmune disease of the endocrine glands
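The "diseases like Type 1 diabetes" query rests on walking is-a links in an ontology. The following is a minimal sketch with a hypothetical four-entry hierarchy and made-up sample annotations. The class names follow the example above, and a real query would run against a curated disease ontology.

```python
# Hypothetical mini-ontology of is-a links (child -> parent).
IS_A = {
    "type 1 diabetes mellitus": "autoimmune disease of endocrine gland",
    "Graves' disease": "autoimmune disease of endocrine gland",
    "autoimmune disease of endocrine gland": "autoimmune disease",
    "rheumatoid arthritis": "autoimmune disease",
}


def ancestors(term):
    """Yield every superclass of term by walking the is-a links upward."""
    while term in IS_A:
        term = IS_A[term]
        yield term


def diseases_like(term):
    """Return term plus every class that falls under term's immediate parent."""
    parent = IS_A.get(term)
    return sorted(
        t for t in IS_A if t == term or (parent and parent in ancestors(t))
    )


# Hypothetical sample annotations.
samples = {
    "S1": "type 1 diabetes mellitus",
    "S2": "Graves' disease",
    "S3": "rheumatoid arthritis",
}
similar = diseases_like("type 1 diabetes mellitus")
hits = sorted(s for s, d in samples.items() if d in similar)
print(similar, hits)
```

Here "like" is read as "shares the immediate parent class". Widening the query to all autoimmune diseases would mean climbing one more level, a choice that only becomes possible once samples are tagged with ontology terms.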

Second strategy: coordination through FDA (CDISC) standards

Currently, PIs may need to reformat twice, once for ImmPort, once for FDA

Coordination via ontologies would require mappings from these ontologies to CDISC standards

But there is an alternative strategy: have all trialists use CDISC standards
• Map the CDISC standards to common ontologies

Problems with the FDA (CDISC) strategy

• The standards have not been developed to support computation across biological data
• They are very slow to evolve (> 14 years so far)
• They are designed to meet the needs of data managers rather than bioinformaticians
• They lack compositionality (hard to integrate with other data)
• They are very complicated and so typically not in fact used by trialists; rather they are generated by software (for example Medidata), with some loss in data quality (?) (through hard work?)

BRIDG 3.2 Domain Analysis Model

Strategy

• Identify useful standards and build them into the clinical trial management systems and laboratory information management systems (such as LabKey) that the PIs will be using in any case?

• Join with Ravi Shankar and with the PHUSE (EU, Roche, AstraZeneca, FDA, …) project to incorporate ontology technology into CDISC
