Strategies to Enhance the Utility of Data in ImmPort


1

Strategies to Enhance the Utility of Data in ImmPort

Barry Smith

http://ontology.buffalo.edu/smith

2

pipeline

perform study & collect data

analyze data (SAS …)

submit data to ImmPort

process & de-identify data in ImmPort

discover, aggregate, analyze data in ImmPort

3

Pipeline

PIs, hospitals, biostatisticians

Northrop Grumman

Max & Mindy

4

The problem
• too many incompatible standards and terminologies at all stages in the pipeline
• results in poorer quality of data available for analysis, requiring considerable manual effort
• as more studies come online this will get worse

Training and Strategy Workshop for Rho Federal

http://ncorwiki.buffalo.edu/index.php/ImmPort

Rho participants
• David Ikle (Chief, Biostatistics, Rho Federal): Database Creation and Data Analysis Processes at Rho Federal
• John Lim and Karen Kesler: Views from Rho of ImmPort Submission Process
• Jeff Abolafia: CDISC and CDASH standards in Rho
~ 20 biostatisticians and data managers at Rho Federal

External participants
• Ravi Shankar, Barry Smith, Jeff Wiser from BISC
• Anna Maria Masci (Duke University): On submission of data to ImmPort for the Multiscale System Immunology project
• Lindsay Cowell (UT Southwestern): Immunology Ontology

7

The solution(s)
• Post-coordination
• Pre-coordination

8

Pre- vs. Post-coordination

PIs, hospitals, biostatisticians, Rho …

Northrop Grumman

Max & Mindy

9

Post-coordination = arms-length enhancement of data

PIs, hospitals, biostatisticians, Rho …

Northrop Grumman

Max & Mindy

uniform standards applied post hoc

10

Post-coordination = arms-length enhancement of data

PIs, hospitals, biostatisticians, Rho …

Northrop Grumman

Max & Mindy

Lots of free text, local formats, local standards, local terminologies operating here

LEAVE AS IS

uniform standards applied post hoc

11

Advantages: BISC controls all ImmPort data issues

Disadvantages: BISC bears all costs of data processing; data are divorced from source

PIs, hospitals, biostatisticians, Rho …

Northrop Grumman

Max & Mindy

free text protocols, local formats, local standards, local terminologies

uniform standards applied post hoc

Lots of free text, local formats, local standards, local terminologies operating here

12

Pre-coordination

PIs, hospitals, biostatisticians, Rho …

Northrop Grumman

Max & Mindy

apply uniform standards already here

13

Advantages: higher quality data for integration and analysis; lower costs to BISC

Disadvantages: increased costs to data providers; which uniform standards will they accept? which ones should they accept?

PIs, hospitals, biostatisticians, Rho … (=data providers)

Northrop Grumman

Max & Mindy

same uniform standards applied across the whole pipeline

Multiple moving parts

PIs, hospitals, biostatisticians, Rho …

Northrop Grumman

Max & Mindy

Mechanistic assays: Local standards + Labkey + Sampleminded

Clinical: Local standards + FDA - CDISC + Medidata Rave

Multiple time scales

PIs, hospitals, biostatisticians, Rho …

Northrop Grumman

Max & Mindy

Mechanistic assays: Local standards + Labkey + Sampleminded

Clinical: Local standards + FDA - CDISC* + Medidata Rave†

*CDISC effort initiated 1997

†Medidata Rave only now being adopted by Rho

16

For Rho Federal, CDISC / FDA are of secondary importance

But they may adopt CDISC standards nonetheless, for the sake of uniformity, and because they may need to use Medidata

Currently, use of standards by Rho Federal:
• is uncoordinated across different studies
• involves standards of varying quality
• is inefficient (costs money)
• involves considerable post-coordination (e.g. of the sort used to package data for sending to ImmPort)

17

Goal of the Rho meeting
• Devise strategy to optimize Rho-BISC collaboration
• Rho has to pre-coordinate for ImmPort
• If Rho can use ImmPort templates already in its day-to-day operations, this will make submission to ImmPort more effective and potentially improve quality of data along the whole pipeline

--> Need for collaborative development of some standards, libraries and ontologies

Standards

Example: Visit days

19

HLA data (purple)

Flow Cytometry data (yellow)

PCR data (green)

Study Protocol, Operational data, Clinical data (blue)

ITN Data

Specimen Management Data (green)

20

Transplant

Visit 00

v 0

v0

Day 0

What is in a visit name? (ITN)

21

What is in a visit name?

Visit 0, v0, v 0, 0, Day 0, Transplant

[Diagram: the same visit carries different names as it moves between groups (CRO, Protocol Group, Assay Group, Cimarron Operations Group, Data Center, Core Labs, Tube Manufacturer) and artifacts (Schedule of Events, Specimen Table, Tube Table, CRF, ImmunoTrak, Kit Report, Database, Assays). Labels shown: Day 0, Transplant, v0, 0, v 0, Visit 0]
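The visit-name variants listed on this slide can be reconciled mechanically once a canonical form is agreed. The following is a minimal post-coordination sketch in Python, not ITN's or ImmPort's actual procedure. The canonical label "Visit N", the function name `normalize_visit`, and the assumption that "Transplant" and "Day 0" denote visit 0 in this study are all illustrative choices, not taken from the slides.

```python
import re

# Hypothetical canonical label; the slides do not prescribe one.
CANONICAL_TEMPLATE = "Visit {n}"

# Assumption for this sketch: "Transplant" names the same time point as visit 0.
EVENT_ALIASES = {"transplant": 0}

# Accepts an optional prefix (visit / v / day), optional spaces, then a number.
# Assumption: "Day N" and "Visit N" coincide, which holds only for some protocols.
_VISIT_PATTERN = re.compile(r"^(?:visit|v|day)?\s*0*(\d+)$", re.IGNORECASE)


def normalize_visit(raw: str) -> str:
    """Map a locally formatted visit label to one canonical form."""
    text = raw.strip()
    alias = EVENT_ALIASES.get(text.lower())
    if alias is not None:
        return CANONICAL_TEMPLATE.format(n=alias)
    match = _VISIT_PATTERN.match(text)
    if match is None:
        # Refuse to guess: unknown labels need a human decision.
        raise ValueError(f"unrecognized visit label: {raw!r}")
    return CANONICAL_TEMPLATE.format(n=int(match.group(1)))


if __name__ == "__main__":
    for label in ["Visit 0", "Visit 00", "v0", "v 0", "0", "Day 0", "Transplant"]:
        print(f"{label!r} -> {normalize_visit(label)}")
```

Under pre-coordination the same canonical label would be agreed at the start of the pipeline, so a mapping step like this would not be needed downstream.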

22

Allergy Score (Study Collection Day)

Lab Tests (Study Time collected)

Microarray Data (Only Visit)

Flow (Collection_Study_day and Visit)

Mappings between protocol, lab tests and mechanistic assays were missing

23

ImmPort Templates

How to specify “Subject Phenotype”?

ImmPort Adverse Event template

adverse_event_accession
name_preferred
start_time
causality
is_serious_event
severity_reported
study_accession
organ_or_body_system_preferred
end_study_day
end_time
event_description
location_of_reaction_preferred
location_of_reaction_reported
name_reported
organ_or_body_system_reported
outcome_preferred
outcome_reported
other_action_taken
relation_to_nonstudy_treatment
relation_to_study_treatment
severity_preferred
start_study_day
subject_accession
workspace_id

Problems

Runs together terms with what they describe

‘severity reported’ vs ‘severity preferred’

‘outcome reported’ vs ‘outcome preferred’

Are there definitions?
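One way to read the reported/preferred problem is that a single attribute (severity, outcome) appears as two unrelated columns. The sketch below is an illustration, not a proposed ImmPort format: it groups each `*_reported` / `*_preferred` pair from the template under one attribute. The class `CodedValue`, the function `group_coded_fields`, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

SUFFIXES = ("_reported", "_preferred")


@dataclass(frozen=True)
class CodedValue:
    """One attribute with both of its renderings kept side by side."""
    reported: Optional[str]   # text as supplied by the data provider
    preferred: Optional[str]  # curated term chosen for that text, if any


def group_coded_fields(row: Dict[str, str]) -> Dict[str, object]:
    """Collapse <name>_reported / <name>_preferred columns into <name>."""
    bases = {
        key[: -len(suffix)]
        for key in row
        for suffix in SUFFIXES
        if key.endswith(suffix)
    }
    # Columns without either suffix are passed through unchanged.
    grouped: Dict[str, object] = {
        key: value for key, value in row.items() if not key.endswith(SUFFIXES)
    }
    for base in sorted(bases):
        grouped[base] = CodedValue(
            reported=row.get(base + "_reported"),
            preferred=row.get(base + "_preferred"),
        )
    return grouped


if __name__ == "__main__":
    # Made-up sample values; only the column names come from the template.
    sample = {
        "subject_accession": "SUB0001",
        "severity_reported": "fairly bad",
        "severity_preferred": "Moderate",
        "outcome_reported": "got better",
    }
    print(group_coded_fields(sample))
```

Keeping the two renderings under one attribute makes it explicit which text is the provider's and which is the curated term, and shows at a glance where a preferred term is still missing.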

ImmPort Adverse Event template

proposals contributed by Yongqun He

Ontology Ontology term

26

Clinical Activities Library (from ITN, via Ravi)

Which standards do we need for mechanistic assays?

Anna Maria Masci

Department of Immunology

Duke University

Standards needed for bench work

• Purpose of the experiment
• Model (in vivo: animal or in vitro: cell, protein etc.)
• Method type (DNA sequencing, ELISA, in vivo microscopy)
• Method specification (treatment, incubation time, instrument used)
• Data format (Excel file, image)
• Output (List of entities, OD value, fluorescence value)

Standards needed for statistical analysis

• Data type: qualitative or quantitative
• Normalization: Removal during data analysis of non-biological variations such as instrument variability, experimental protocol changes, and reagent changes
• Population
• Variable
• Outcome
• Statistical test

Experimental methodology ontology

ASSAY

INPUT, TRANSFORMATION, OUTPUT

INPUT: ORGANISM, CELL, DNA, DRUG, REAGENT

TRANSFORMATION: TARGET, TREATMENT, INSTRUMENT

OUTPUT: DATA FORMAT, PROCESSED DATA

CFSE staining

Input:
Organism: mouse
Cell: naïve B cell
Reagent: carboxyfluorescein succinimidyl ester

Transformation:
Target assay: cell cytosol
Reagent: carboxyfluorescein diacetate, succinimidyl ester
Cell treatment: none
Instrument: FACS

Output:
Data type: FACS histograms
Processed data: number of cell divisions
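The input / transformation / output breakdown of the staining example can be written down as a small typed record. This is a sketch only: the class and field names are illustrative and are not terms from OBI or from any ImmPort template, and the values are plain strings copied from the slide rather than ontology identifiers.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AssayInput:
    organism: str
    cell: str
    reagent: str


@dataclass(frozen=True)
class Transformation:
    target: str
    reagent: str
    cell_treatment: str
    instrument: str


@dataclass(frozen=True)
class AssayOutput:
    data_type: str
    processed_data: str


@dataclass(frozen=True)
class Assay:
    name: str
    inputs: AssayInput
    transformation: Transformation
    outputs: AssayOutput


# The staining example from the slide, expressed with the hypothetical classes above.
cfse_staining = Assay(
    name="CFSE staining",
    inputs=AssayInput(
        organism="mouse",
        cell="naive B cell",
        reagent="carboxyfluorescein succinimidyl ester",
    ),
    transformation=Transformation(
        target="cell cytosol",
        reagent="carboxyfluorescein diacetate, succinimidyl ester",
        cell_treatment="none",
        instrument="FACS",
    ),
    outputs=AssayOutput(
        data_type="FACS histograms",
        processed_data="number of cell divisions",
    ),
)

if __name__ == "__main__":
    # asdict() yields nested dictionaries that could feed a template or JSON export.
    print(asdict(cfse_staining))
```

In a pre-coordinated setting each string value would presumably be replaced by a term from a shared ontology, which is what would make records from different labs comparable.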

Need for supplementary ontology content to support design of ImmPort templates that can be useful already to Rho workflow

• allow high quality interoperable standards which can
  • keep pace with current research
  • advance discoverability of ImmPort data by third parties
  • allow high-powered analysis by Max and Mindy
• Examples:
  • planned Antibody Ontology to support automatic analysis of CyTOF results
  • Ontology for Biomedical Investigations (OBI)

ImmPort Antibody Registry (Diehl, et al)

from BD Lyoplate Screening Panels Human Surface Markers

34

Ontology for Biomedical Investigations

3rd Workshop

ImmPort

Richard H. Scheuermann

29 JAN 2007

Semantic Query

Find all experiments in which IL2 mRNA levels were quantified

Infers that IL2 mRNA is the analyte and that SAGE, QPCR and microarrays are appropriate measurement techniques

Find all experiments that include samples from subjects with diseases like Type 1 diabetes

Infers that the source of the biological sample used must be a human subject with Type 1 diabetes mellitus, Graves’ disease or other autoimmune diseases of endocrine glands
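The second query can be illustrated with a toy class hierarchy: "diseases like X" is read here as "any disease under the same parent class as X". This is a simplified sketch, not how OBI-backed querying is actually implemented; the `IS_A` table is a hand-written stand-in for a disease ontology, and the sample records and function names are hypothetical.

```python
from typing import Dict, List, Set

# Toy is-a hierarchy (child -> parent), written by hand for this sketch.
IS_A: Dict[str, str] = {
    "type 1 diabetes mellitus": "autoimmune disease of endocrine gland",
    "Graves' disease": "autoimmune disease of endocrine gland",
    "autoimmune disease of endocrine gland": "autoimmune disease",
    "rheumatoid arthritis": "autoimmune disease",
}


def descendants(term: str) -> Set[str]:
    """Return the term plus everything below it in the toy hierarchy."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in IS_A.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def diseases_like(term: str) -> Set[str]:
    """Expand a disease to its parent class and all terms under that class."""
    parent = IS_A.get(term)
    return descendants(parent) if parent else {term}


def find_samples(samples: List[Dict[str, str]], term: str) -> List[str]:
    """Return IDs of samples whose diagnosis falls in the expanded term set."""
    wanted = diseases_like(term)
    return [s["sample_id"] for s in samples if s["diagnosis"] in wanted]


if __name__ == "__main__":
    # Made-up sample records for illustration.
    samples = [
        {"sample_id": "S1", "diagnosis": "type 1 diabetes mellitus"},
        {"sample_id": "S2", "diagnosis": "Graves' disease"},
        {"sample_id": "S3", "diagnosis": "rheumatoid arthritis"},
    ]
    print(find_samples(samples, "type 1 diabetes mellitus"))
```

The expansion only works if every sample's diagnosis is recorded with a term from the hierarchy, which is the practical argument for applying shared terminologies before data reach ImmPort.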

Applications of OBI to Functional Genomics Data Annotation and Integrative Tools for Protozoan Parasite Research

Jie Zheng & Chris Stoeckert

Center for Bioinformatics

University of Pennsylvania School of Medicine

2011 San Diego OBI workshop

EuPathDB is a NIAID Bioinformatics Resource Center covering Eukaryotic Parasites

EuPathDB: a portal to eukaryotic pathogen databases. Aurrecoechea C, Brestelli J, Brunk BP, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer ET, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Srinivasamoorthy G, Stoeckert CJ Jr, Thibodeau R, Treatman C, Wang H. Nucleic Acids Res. 2010

Ontology-based Representation of Isolate Data

39

The data collected in the submission form are in bold font. The fields that require ontology terms are in thick-bordered boxes.

Isolate Submission Form

Support multiple sequences submission

40

Before:
Isolate ID
Date Collected
Country
State or province
City
GPS Coordinates
Isolate Species
Isolate Environmental Source
Host
Sequence 1 product Name
Sequence 1
Sequence 2 product Name
Sequence 2
Sequence 3 product Name
Sequence 3
Sequence 4 product Name
Sequence 4

After:
Specimen ID
Date Collected: Day, Month, Year
Isolate: Isolate Species; Additional Classification -- genotype; Additional Classification -- subtype; Other organism isolated from same sample
Geographic Location: Country; Region -- State or province; County; City/village/locality; Latitude/longitude Coordinates
Source
Environment: Isolate Environmental Source
Host Information: Host Species -- scientific name; Race/Breed; Age; Sex; Symptoms
Isolation Source: Host Material Isolated from; Non-human Habitat; Additional Notes
Nucleotide Sequence (repeated for Sequence 1 to Sequence 4): product or locus Name; Primer Pairs; description; Sequence

Ontology-based Representation of Genetic Manipulation with Resulting Phenotype Data

41

The data collected in the submission form are in bold font. The fields that require ontology terms are in thick-bordered boxes.

Ontology for Parasite Lifecycle (OPL) will be used in the annotation of life cycle stage

Use OPL for annotation

Genetic Manipulation Section

42

Phenotype Section

43

Cellular location

Biological process

Question: What relation should be used to link a quality (PATO: organismal quality), such as PATO: lethal, to a biological process, such as GO: growth?

Original strategy

Rho is introducing Medidata Rave as its Clinical Trial Management Platform

BISC will convince Rho and Medidata to adopt high-quality, computable ontologies of the sort which will enable automatic export of source data into ImmPort

This strategy will not work because Medidata is tied to CDISC (CDASH, ODM, ADaM ...), geared to FDA statistical analysis pipelines

Result: much of CDISC content is packaged in ways not conducive to secondary analysis

CDASH: Clinical Data Acquisition Standards Harmonization

Uses the Operational Data Model (ODM) [XML dialect] designed to facilitate the archive and interchange of the metadata and data for clinical research, its power being fully unleashed when data are collected from multiple sources.

http://www.cdisc.org/stuff/contentmgr/files/0/f968ea2a3bdad76eb3e23e3c4978fff4/misc/odm1_3_1_final.htm

Medidata Rave uses ODM
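Because ODM is an XML dialect, its clinical data section can be flattened into rows with a standard XML parser. The sketch below uses Python's standard library and a hand-written fragment that follows the ODM 1.3 element nesting (ClinicalData, SubjectData, StudyEventData, FormData, ItemGroupData, ItemData). The fragment is illustrative only: the OIDs and values are invented, it omits attributes a real ODM file requires, and it is not validated against the schema. The function name `flatten_odm` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespace used by ODM 1.3 documents.
ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hand-written, incomplete fragment for illustration; OIDs and values are made up.
SAMPLE_ODM = """
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <ClinicalData StudyOID="STUDY.DEMO" MetaDataVersionOID="MDV.1">
    <SubjectData SubjectKey="SUBJ-001">
      <StudyEventData StudyEventOID="SE.VISIT0">
        <FormData FormOID="F.VITALS">
          <ItemGroupData ItemGroupOID="IG.VS">
            <ItemData ItemOID="BP_DIABP_VSORRES" Value="80"/>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>
""".strip()


def flatten_odm(xml_text: str) -> List[Dict[str, str]]:
    """Turn nested ODM clinical data into one flat row per ItemData element."""
    root = ET.fromstring(xml_text)
    rows: List[Dict[str, str]] = []
    for subject in root.iterfind("odm:ClinicalData/odm:SubjectData", ODM_NS):
        for event in subject.iterfind("odm:StudyEventData", ODM_NS):
            for form in event.iterfind("odm:FormData", ODM_NS):
                for item in form.iterfind("odm:ItemGroupData/odm:ItemData", ODM_NS):
                    rows.append(
                        {
                            "subject": subject.get("SubjectKey", ""),
                            "event": event.get("StudyEventOID", ""),
                            "form": form.get("FormOID", ""),
                            "item": item.get("ItemOID", ""),
                            "value": item.get("Value", ""),
                        }
                    )
    return rows


if __name__ == "__main__":
    for row in flatten_odm(SAMPLE_ODM):
        print(row)
```

The flattened rows still carry only opaque OIDs, so a consumer needs the accompanying metadata to learn what each item means.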

ODM links

http://www.cdisc.org/stuff/contentmgr/files/0/fa3021351c086aeaaef00cd17feaef58/misc/cdash_std_1_1_2011_01_18.pdf

http://www.cdisc.org/stuff/contentmgr/files/0/3f998d957905d7ed83b0bbeff9822f7a/misc/cdash_ug_1_1_1_2012_04_12_final.pdf

http://www.cdisc.org/stuff/contentmgr/files/0/919cb4ef843829170d470b37eb662aeb/misc/odm1_3_0_final.htm

http://www.cdisc.org/stuff/contentmgr/files/0/464923b10ea16b477151fcaa9f465166/misc/define_xml_2_0_releasepackage20140424.zip

http://www.cdisc.org/stuff/contentmgr/files/0/3f998d957905d7ed83b0bbeff9822f7a/misc/cdash.odm_updated.xml

48

CDASH

CDASH appears to be restricted to members, but there is this from NCI:

http://evs.nci.nih.gov/ftp1/CDISC/SDTM/CDASH%20Terminology.pdf

Advantages of CDISC-CDASH
• FDA mandated standardization
• rich list of standardized questionnaires / questions used in clinical trials
• increasingly used by Pharma (mainly through post-coordination?)
• used by Medidata
• increasingly used by Rho (mainly through post-coordination?)
• some parts of CDISC (above all SDTM) already used in ImmPort templates
• subject to interesting experiments with Semantic Web: CDISC2RDF / Phuse, CDISC Ontology (Ravi) …

Tentative first list of potential disadvantages with CDISC - CDASH
• not webcentric (technologically dated)
• restricted (?) to members
• codes (BP_DIABP_VSORRES) do not advance discoverability
• does not reuse established standards
• not modular, does not interoperate with bioinformatics resources
• unclear which CDASH-associated working parts will survive (ODM, HL7 messaging …)
• many areas (therapeutic, lab …) not populated
• CDISC content lags current research
• BRIDG is not the solution

Next steps

– Rho will continue to send some studies as is to ImmPort
– experiment with use of enhanced standards
  • Medidata / Ravi CDISC-CDASH pipeline; explore possibilities for influencing CDISC treatment of immunology data
  • Clinicaltrials.gov pipeline
  • Rho – Duke – UT Southwestern to create libraries needed by CTOT for mechanistic assays for transplant, asthma
– write 1 FTE ontologist into next grant proposal?
– Labkey pipeline: Jeff and Barry will visit Labkey to discuss possibilities for pre-coordination

Strategy for Labkey (tentative)

Prepopulate Labkey with suitably tailored parts of OBI, and with other assay-related ontologies (ChEBI, PRO, CL, IDO…)

to test the degree to which the results can
• make Rho data operations more effective
• streamline submission of Rho lab data to ImmPort
• result in higher utility and higher quality of lab data in ImmPort
