Page 1: Strategies to Enhance the Utility of  Data  in  ImmPort


Strategies to Enhance the Utility of Data in ImmPort

Barry Smith
http://ontology.buffalo.edu/smith

Page 2: Strategies to Enhance the Utility of  Data  in  ImmPort

pipeline

perform study & collect data

analyze data (SAS …)

submit data to ImmPort

process & de-identify data in ImmPort

discover, aggregate, analyze data in ImmPort

Page 3: Strategies to Enhance the Utility of  Data  in  ImmPort

Pipeline

PIs, hospitals, biostatisticians

Northrop Grumman

Max & Mindy

Page 4: Strategies to Enhance the Utility of  Data  in  ImmPort

The problem
• too many incompatible standards and terminologies at all stages in the pipeline
• results in poorer quality of data available for analysis, requiring considerable manual effort
• as more studies come online this will get worse

Page 5: Strategies to Enhance the Utility of  Data  in  ImmPort

Training and Strategy Workshop for Rho Federal

http://ncorwiki.buffalo.edu/index.php/ImmPort

Page 6: Strategies to Enhance the Utility of  Data  in  ImmPort

Rho participants
• David Ikle (Chief, Biostatistics, Rho Federal): Database Creation and Data Analysis Processes at Rho Federal
• John Lim and Karen Kesler: Views from Rho of ImmPort Submission Process
• Jeff Abolafia: CDISC and CDASH standards in Rho
~ 20 biostatisticians and data managers at Rho Federal

External participants
• Ravi Shankar, Barry Smith, Jeff Wiser from BISC
• Anna Maria Masci (Duke University): On submission of data to ImmPort for the Multiscale System Immunology project
• Lindsay Cowell (UT Southwestern): Immunology Ontology

Page 7: Strategies to Enhance the Utility of  Data  in  ImmPort

The solution(s)
• Post-coordination
• Pre-coordination

Page 8: Strategies to Enhance the Utility of  Data  in  ImmPort

Pre- vs. Post-coordination

PIs, hospitals, biostatisticians, Rho …

Northrop Grumman

Max & Mindy

Page 9: Strategies to Enhance the Utility of  Data  in  ImmPort

Post-coordination = arm's-length enhancement of data

PIs, hospitals, biostatisticians, Rho …

Northrop Grumman

Max & Mindy

uniform standards applied post hoc

Page 10: Strategies to Enhance the Utility of  Data  in  ImmPort

Post-coordination = arm's-length enhancement of data

PIs, hospitals, biostatisticians, Rho …

Northrop Grumman

Max & Mindy

Lots of free text, local formats, local standards, local terminologies operating here

LEAVE AS IS

uniform standards applied post hoc

Page 11: Strategies to Enhance the Utility of  Data  in  ImmPort

Advantages: BISC controls all ImmPort data issues
Disadvantages: BISC bears all costs of data processing; data are divorced from source

PIs, hospitals, biostatisticians, Rho …

Northrop Grumman

Max & Mindy

free text protocols, local formats, local standards, local terminologies

uniform standards applied post hoc

Lots of free text, local formats, local standards, local terminologies operating here

Page 12: Strategies to Enhance the Utility of  Data  in  ImmPort

Pre-coordination

PIs, hospitals, biostatisticians, Rho …

Northrop Grumman

Max & Mindy

apply uniform standards already here

Page 13: Strategies to Enhance the Utility of  Data  in  ImmPort

Advantages: higher quality data for integration and analysis; lower costs to BISC
Disadvantages: increased costs to data providers; which uniform standards will they accept? which ones should they accept?

PIs, hospitals, biostatisticians, Rho … (= data providers)

Northrop Grumman

Max & Mindy

same uniform standards applied across the whole pipeline

Page 14: Strategies to Enhance the Utility of  Data  in  ImmPort

Multiple moving parts

PIs, hospitals, biostatisticians, Rho …

Northrop Grumman

Max & Mindy

Mechanistic assays: Local standards + Labkey + Sampleminded

Clinical: Local standards + FDA - CDISC + Medidata Rave

Page 15: Strategies to Enhance the Utility of  Data  in  ImmPort

Multiple time scales

PIs, hospitals, biostatisticians, Rho …

Northrop Grumman

Max & Mindy

Mechanistic assays: Local standards + Labkey + Sampleminded

Clinical: Local standards + FDA - CDISC* + Medidata Rave†

*CDISC effort initiated 1997
†Medidata Rave only now being adopted by Rho

Page 16: Strategies to Enhance the Utility of  Data  in  ImmPort

For Rho Federal CDISC / FDA are of secondary importance
But they may adopt CDISC standards nonetheless, for the sake of uniformity, and because they may need to use Medidata
Currently use of standards by Rho Federal:
• is uncoordinated across different studies
• involves standards of varying quality
• is inefficient (costs money)
• involves considerable post-coordination (e.g. of the sort used to package data for sending to ImmPort)

Page 17: Strategies to Enhance the Utility of  Data  in  ImmPort

Goal of the Rho meeting
• Devise strategy to optimize Rho-BISC collaboration
• Rho has to pre-coordinate for ImmPort
• If Rho can use ImmPort templates already in its day-to-day operations, this will make submission to ImmPort more effective and potentially improve quality of data along the whole pipeline

--> Need for collaborative development of some standards, libraries and ontologies

Page 18: Strategies to Enhance the Utility of  Data  in  ImmPort

Standards
Example: Visit days

Page 19: Strategies to Enhance the Utility of  Data  in  ImmPort

HLA data (purple)

Flow Cytometry data (yellow)

PCR data (green)

Study Protocol, Operational data, Clinical data (blue)

ITN Data

Specimen Management Data (green)

Page 20: Strategies to Enhance the Utility of  Data  in  ImmPort


Transplant

Visit 00

v 0

v0

Day 0

What is in a visit name? (ITN)

Page 21: Strategies to Enhance the Utility of  Data  in  ImmPort

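What is in a visit name?
Visit 0, v0, v 0, 0, Day 0, Transplant

[Diagram: the same visit carries different labels as it passes between groups and artifacts.
Groups: CRO, Protocol Group, Assay Group, Cimarron Operations Group, Data Center, Core Labs, Tube Manufacturer.
Artifacts: Schedule of Events, Specimen Table, Tube Table, CRF, ImmunoTrak, Kit Report, Database, Assays.
Labels in use: "Day 0, Transplant", "v0", "0", "v 0", "Visit 0".]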
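The visit-name variants listed on this slide could be reconciled with a small normalizer. The following is a minimal sketch, not part of the original deck: the function name `canonical_visit` and the alias table are hypothetical, and it assumes (as the slide's example suggests, but which will not hold for every protocol) that "Day 0", "Visit 0" and "Transplant" all denote the same visit.

```python
import re

# Protocol-specific aliases. "transplant" -> 0 is an assumption taken from the
# slide's example; a real protocol would supply its own table.
ALIASES = {"transplant": 0}

# Optional prefix ("visit", "day" or "v"), optional whitespace, then digits.
_VISIT_PATTERN = re.compile(r"^(?:visit|day|v)?\s*(\d+)$", re.IGNORECASE)


def canonical_visit(label):
    """Map a free-text visit label to an integer visit number, or None."""
    text = label.strip().lower()
    if text in ALIASES:
        return ALIASES[text]
    match = _VISIT_PATTERN.match(text)
    return int(match.group(1)) if match else None


labels = ["Visit 0", "v0", "v 0", "0", "Day 0", "Transplant", "Visit 00"]
normalized = {label: canonical_visit(label) for label in labels}
```

Labels the pattern does not recognize come back as `None`, so they can be routed to manual review rather than silently mapped.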

Page 22: Strategies to Enhance the Utility of  Data  in  ImmPort

• Allergy Score (Study Collection Day)
• Lab Tests (Study Time collected)
• Microarray Data (Only Visit)
• Flow (Collection_Study_day and Visit)

Mappings between protocol, lab tests and mechanistic assays were missing

Page 23: Strategies to Enhance the Utility of  Data  in  ImmPort

ImmPort Templates

How to specify “Subject Phenotype”?

Page 24: Strategies to Enhance the Utility of  Data  in  ImmPort

ImmPort Adverse Event template

• adverse_event_accession
• name_preferred
• start_time
• causality
• is_serious_event
• severity_reported
• study_accession
• organ_or_body_system_preferred
• end_study_day
• end_time
• event_description
• location_of_reaction_preferred
• location_of_reaction_reported
• name_reported
• organ_or_body_system_reported
• outcome_preferred
• outcome_reported
• other_action_taken
• relation_to_nonstudy_treatment
• relation_to_study_treatment
• severity_preferred
• start_study_day
• subject_accession
• workspace_id

Problems
• Runs together terms with what they describe
• ‘severity reported’ vs ‘severity preferred’
• ‘outcome reported’ vs ‘outcome preferred’
• Are there definitions?
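One way to see the "reported" vs "preferred" problem is to separate each field name into the thing described and the qualifier attached to it. This is only an illustrative sketch: the field names are those of the template above, while the function name `split_qualified_fields` is hypothetical and not part of ImmPort.

```python
# Field names copied from the Adverse Event template on this slide.
FIELDS = [
    "adverse_event_accession", "name_preferred", "start_time", "causality",
    "is_serious_event", "severity_reported", "study_accession",
    "organ_or_body_system_preferred", "end_study_day", "end_time",
    "event_description", "location_of_reaction_preferred",
    "location_of_reaction_reported", "name_reported",
    "organ_or_body_system_reported", "outcome_preferred", "outcome_reported",
    "other_action_taken", "relation_to_nonstudy_treatment",
    "relation_to_study_treatment", "severity_preferred", "start_study_day",
    "subject_accession", "workspace_id",
]

QUALIFIERS = ("reported", "preferred")


def split_qualified_fields(fields):
    """Group fields ending in _reported/_preferred by the concept they describe."""
    grouped = {}
    for name in fields:
        for qualifier in QUALIFIERS:
            suffix = "_" + qualifier
            if name.endswith(suffix):
                concept = name[: -len(suffix)]
                grouped.setdefault(concept, set()).add(qualifier)
    return grouped


pairs = split_qualified_fields(FIELDS)
```

Each concept that appears in `pairs` is a candidate for one ontology-backed definition plus an explicit "as reported" / "preferred term" distinction.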

Page 25: Strategies to Enhance the Utility of  Data  in  ImmPort

ImmPort Adverse Event template

proposals contributed by Yongqun He

Ontology Ontology term

Page 26: Strategies to Enhance the Utility of  Data  in  ImmPort


Clinical Activities Library (from ITN, via Ravi)

Page 27: Strategies to Enhance the Utility of  Data  in  ImmPort

Which standards do we need for mechanistic assays?

Anna Maria Masci
Department of Immunology

Duke University

Page 28: Strategies to Enhance the Utility of  Data  in  ImmPort

Standards needed for bench work

• Purpose of the experiment
• Model (in vivo: animal or in vitro: cell, protein etc.)
• Method type (DNA sequencing, ELISA, in vivo microscopy)
• Method specification (treatment, incubation time, instrument used)
• Data format (Excel file, image)
• Output (List of entities, OD value, fluorescence value)

Page 29: Strategies to Enhance the Utility of  Data  in  ImmPort

Standards needed for statistical analysis

• Data type: qualitative or quantitative
• Normalization: Removal during data analysis of non-biological variations such as instrument variability, experimental protocol changes, and reagent changes
• Population
• Variable
• Outcome
• Statistical test

Page 30: Strategies to Enhance the Utility of  Data  in  ImmPort

Experimental methodology ontology

ASSAY

INPUT: ORGANISM, CELL, DNA, DRUG, REAGENT

TRANSFORMATION: TARGET, TREATMENT, INSTRUMENT

OUTPUT: DATA FORMAT, PROCESSED DATA

Page 31: Strategies to Enhance the Utility of  Data  in  ImmPort

CFSE staining

Input:
Organism: mouse
Cell: Naïve B cell
Reagent: Carboxyfluorescein succinimidyl ester

Transformation:
Target assay: cell cytosol
Reagent: carboxyfluorescein diacetate, succinimidyl ester
Cell treatment: none
Instrument: FACS

Output:
Data Type: FACS histograms
Processed data: number of cell divisions
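The input / transformation / output pattern of the previous slide, filled in with the staining example above, could be captured as a structured record. This is a sketch under stated assumptions: the class and attribute names are hypothetical and do not come from ImmPort or OBI; only the values are taken from the slide.

```python
from dataclasses import asdict, dataclass


@dataclass
class AssayInput:
    organism: str
    cell: str
    reagent: str


@dataclass
class AssayTransformation:
    target: str
    reagent: str
    cell_treatment: str
    instrument: str


@dataclass
class AssayOutput:
    data_type: str
    processed_data: str


@dataclass
class Assay:
    name: str
    inputs: AssayInput
    transformation: AssayTransformation
    outputs: AssayOutput


# Values below are the ones given on the slide.
cfse_staining = Assay(
    name="CFSE staining",
    inputs=AssayInput(
        organism="mouse",
        cell="naive B cell",
        reagent="carboxyfluorescein succinimidyl ester",
    ),
    transformation=AssayTransformation(
        target="cell cytosol",
        reagent="carboxyfluorescein diacetate, succinimidyl ester",
        cell_treatment="none",
        instrument="FACS",
    ),
    outputs=AssayOutput(
        data_type="FACS histograms",
        processed_data="number of cell divisions",
    ),
)

# A plain dict is convenient for export to a template row or JSON.
record = asdict(cfse_staining)
```

In a pre-coordinated setting each string value would be replaced by an ontology term identifier rather than free text.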

Page 32: Strategies to Enhance the Utility of  Data  in  ImmPort
Page 33: Strategies to Enhance the Utility of  Data  in  ImmPort

Need for supplementary ontology content to support design of ImmPort templates that can be useful already to Rho workflow

• allow high quality interoperable standards which can
  • keep pace with current research
  • advance discoverability of ImmPort data by third parties
  • allow high-powered analysis by Max and Mindy

• Examples:
  • planned Antibody Ontology to support automatic analysis of CyTOF results
  • Ontology for Biomedical Investigations (OBI)

Page 34: Strategies to Enhance the Utility of  Data  in  ImmPort

ImmPort Antibody Registry (Diehl, et al)

from BD Lyoplate Screening Panels Human Surface Markers


Page 35: Strategies to Enhance the Utility of  Data  in  ImmPort

Ontology of Biomedical Investigations
3rd Workshop

ImmPort

Richard H. Scheuermann

29 JAN 2007

Page 36: Strategies to Enhance the Utility of  Data  in  ImmPort

Semantic Query

Find all experiments in which IL2 mRNA levels were quantified
Infers that IL2 mRNA is the analyte and that SAGE, QPCR and microarrays are appropriate measurement techniques

Find all experiment samples that include samples from subjects with diseases like Type 1 diabetes
Infers that the source of the biological sample used must be a human subject with Type 1 diabetes mellitus, Graves' disease or other autoimmune diseases of endocrine glands
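The second query can be illustrated with a toy is-a hierarchy: "diseases like Type 1 diabetes" is read as "diseases sharing its immediate parent class". This is a minimal sketch, not the actual ImmPort or OBI machinery; the hierarchy, sample IDs and function names are all hypothetical and serve only to show the inference step.

```python
# Toy is-a hierarchy (child -> parent); illustrative only.
PARENT = {
    "type 1 diabetes mellitus": "autoimmune disease of endocrine gland",
    "Graves' disease": "autoimmune disease of endocrine gland",
    "autoimmune disease of endocrine gland": "autoimmune disease",
    "asthma": "respiratory disease",
}

# Hypothetical sample annotations: (sample id, disease of the source subject).
SAMPLES = [
    ("S1", "type 1 diabetes mellitus"),
    ("S2", "Graves' disease"),
    ("S3", "asthma"),
]


def ancestors(term):
    """Return every class above `term`, nearest first."""
    found = []
    while term in PARENT:
        term = PARENT[term]
        found.append(term)
    return found


def similar_terms(term):
    """Terms that fall under the immediate parent class of `term`."""
    parent = PARENT[term]
    return {candidate for candidate in PARENT if parent in ancestors(candidate)}


def samples_like(term):
    """Sample ids whose annotated disease is similar to `term`."""
    wanted = similar_terms(term)
    return [sample_id for sample_id, disease in SAMPLES if disease in wanted]


matching = samples_like("type 1 diabetes mellitus")
```

Without the hierarchy, a plain string match on "Type 1 diabetes" would return only S1; the ontology is what brings in S2.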

Page 37: Strategies to Enhance the Utility of  Data  in  ImmPort

Applications of OBI to Functional Genomics Data Annotation and Integrative Tools for Protozoan Parasite Research

Jie Zheng & Chris Stoeckert
Center for Bioinformatics

University of Pennsylvania School of Medicine

2011 San Diego OBI workshop

Page 38: Strategies to Enhance the Utility of  Data  in  ImmPort

EuPathDB is a NIAID Bioinformatics Resource Center covering Eukaryotic Parasites

EuPathDB: a portal to eukaryotic pathogen databases. Aurrecoechea C, Brestelli J, Brunk BP, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer ET, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Srinivasamoorthy G, Stoeckert CJ Jr, Thibodeau R, Treatman C, Wang H. Nucleic Acids Res. 2010

Page 39: Strategies to Enhance the Utility of  Data  in  ImmPort

Ontology-based Representation of Isolate Data

The data collected in the submission form are in bold font. The fields that require ontology terms are in thick-bordered boxes.

Page 40: Strategies to Enhance the Utility of  Data  in  ImmPort

Isolate Submission Form

Support multiple sequences submission

Before
• Isolate ID
• Date Collected
• Country
• State or province
• City
• GPS Coordinates
• Isolate Species
• Isolate Environmental Source
• Host
• Sequence 1 product Name, Sequence 1
• Sequence 2 product Name, Sequence 2
• Sequence 3 product Name, Sequence 3
• Sequence 4 product Name, Sequence 4

After
• Specimen ID
• Date Collected: Day, Month, Year
• Isolate: Isolate Species; Additional Classification -- genotype; Additional Classification -- subtype; Other organism isolated from same sample
• Geographic Location: Country; Region -- State or province; County; City/village/locality; Latitude/longitude Coordinates
• Source (Host Information, Isolation Source, Environment): Isolate Environmental Source; Host Species -- scientific name; Race/Breed; Age; Sex; Symptoms; Host Material Isolated from; Non-human Habitat
• Additional Notes
• Nucleotide Sequence, for each of Sequence 1 to Sequence 4: product or locus Name; Primer Pairs; description; sequence

Page 41: Strategies to Enhance the Utility of  Data  in  ImmPort

Ontology-based Representation of Genetic Manipulation with Resulting Phenotype Data

The data collected in the submission form are in bold font. The fields that require ontology terms are in thick-bordered boxes.
Ontology for Parasite Lifecycle (OPL) will be used in the annotation of life cycle stage

Use OPL for annotation

Page 42: Strategies to Enhance the Utility of  Data  in  ImmPort

Genetic Manipulation Section


Page 43: Strategies to Enhance the Utility of  Data  in  ImmPort

Phenotype Section


Cellular location

Biological process

Question: What relation should be used to link a quality (PATO: organismal quality), such as PATO: lethal, to a biological process such as GO: growth?

Page 44: Strategies to Enhance the Utility of  Data  in  ImmPort

Original strategy
Rho is adopting Medidata Rave as its Clinical Trial Management Platform
BISC will convince Rho and Medidata to adopt high-quality, computable ontologies of the sort which will enable automatic export of source data into ImmPort

This strategy will not work because Medidata is tied to CDISC (CDASH, ODM, ADaM ...), geared to FDA statistical analysis pipelines
Result: much of CDISC content is packaged in ways not conducive to secondary analysis

Page 45: Strategies to Enhance the Utility of  Data  in  ImmPort
Page 46: Strategies to Enhance the Utility of  Data  in  ImmPort

CDASH: Clinical Data Acquisition Standards Harmonization
Uses the Operational Data Model (ODM) [XML dialect], designed to facilitate the archive and interchange of the metadata and data for clinical research, its power being fully unleashed when data are collected from multiple sources.
http://www.cdisc.org/stuff/contentmgr/files/0/f968ea2a3bdad76eb3e23e3c4978fff4/misc/odm1_3_1_final.htm
Medidata Rave uses ODM
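Because ODM is an XML dialect, its item definitions can be read with a standard XML parser. The sketch below is illustrative only: the embedded fragment is a simplified, hand-written example (it omits attributes a schema-valid ODM file requires), the OIDs and names in it are hypothetical, and `list_item_defs` is not a CDISC or Medidata function. The namespace URI is the one used by ODM 1.3.

```python
import xml.etree.ElementTree as ET

ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Simplified, hypothetical fragment; not a complete or validated ODM file.
SAMPLE_ODM = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="ST.DEMO">
    <MetaDataVersion OID="MDV.1" Name="demo">
      <ItemDef OID="IT.DIABP" Name="Diastolic Blood Pressure" DataType="integer"/>
      <ItemDef OID="IT.VISIT" Name="Visit Name" DataType="text"/>
    </MetaDataVersion>
  </Study>
</ODM>"""


def list_item_defs(odm_xml):
    """Return (OID, Name, DataType) for every ItemDef in an ODM document."""
    root = ET.fromstring(odm_xml)
    return [
        (item.get("OID"), item.get("Name"), item.get("DataType"))
        for item in root.iterfind(".//odm:ItemDef", ODM_NS)
    ]


items = list_item_defs(SAMPLE_ODM)
```

A listing like this is a first step toward checking which collected items already have a counterpart in an ImmPort template.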

Page 47: Strategies to Enhance the Utility of  Data  in  ImmPort

ODM links

http://www.cdisc.org/stuff/contentmgr/files/0/fa3021351c086aeaaef00cd17feaef58/misc/cdash_std_1_1_2011_01_18.pdf

http://www.cdisc.org/stuff/contentmgr/files/0/3f998d957905d7ed83b0bbeff9822f7a/misc/cdash_ug_1_1_1_2012_04_12_final.pdf

http://www.cdisc.org/stuff/contentmgr/files/0/919cb4ef843829170d470b37eb662aeb/misc/odm1_3_0_final.htm

http://www.cdisc.org/stuff/contentmgr/files/0/464923b10ea16b477151fcaa9f465166/misc/define_xml_2_0_releasepackage20140424.zip

http://www.cdisc.org/stuff/contentmgr/files/0/3f998d957905d7ed83b0bbeff9822f7a/misc/cdash.odm_updated.xml

Page 48: Strategies to Enhance the Utility of  Data  in  ImmPort


CDASH

Page 49: Strategies to Enhance the Utility of  Data  in  ImmPort

CDASH appears to be restricted to members, but there is this from NCI:

http://evs.nci.nih.gov/ftp1/CDISC/SDTM/CDASH%20Terminology.pdf

Page 50: Strategies to Enhance the Utility of  Data  in  ImmPort
Page 51: Strategies to Enhance the Utility of  Data  in  ImmPort
Page 52: Strategies to Enhance the Utility of  Data  in  ImmPort

Advantages of CDISC-CDASH
• FDA mandated standardization
• rich list of standardized questionnaires / questions used in clinical trials
• increasingly used by Pharma (mainly through post-coordination?)
• used by Medidata
• increasingly used by Rho (mainly through post-coordination?)
• some parts of CDISC (above all SDTM) already used in ImmPort templates
• subject to interesting experiments with Semantic Web: CDISC2RDF / Phuse, CDISC Ontology (Ravi) …

Page 53: Strategies to Enhance the Utility of  Data  in  ImmPort

Tentative first list of potential disadvantages with CDISC - CDASH
• not webcentric (technologically dated)
• restricted (?) to members
• codes (BP_DIABP_VSORRES) do not advance discoverability
• does not reuse established standards
• not modular, does not interoperate with bioinformatics resources
• unclear which CDASH-associated working parts will survive (ODM, HL7 messaging …)
• many areas (therapeutic, lab …) not populated
• CDISC content lags current research
• BRIDG is not the solution

Page 54: Strategies to Enhance the Utility of  Data  in  ImmPort

Next steps

• Rho will continue to send some studies as is to ImmPort
• experiment with use of enhanced standards
  • Medidata / Ravi CDISC-CDASH pipeline; explore possibilities for influencing CDISC treatment of immunology data
  • Clinicaltrials.gov pipeline
  • Rho, Duke and UT Southwestern to create libraries needed by CTOT for mechanistic assays for transplant, asthma
• write 1 FTE ontologist into next grant proposal?
• Labkey pipeline: Jeff and Barry will visit Labkey to discuss possibilities for pre-coordination

Page 55: Strategies to Enhance the Utility of  Data  in  ImmPort

Strategy for Labkey (tentative)

Prepopulate Labkey with suitably tailored parts of OBI, and with other assay-related ontologies (ChEBI, PRO, CL, IDO…) to test the degree to which the results can
• make Rho data operations more effective
• streamline submission of Rho lab data to ImmPort
• result in higher utility and higher quality of lab data in ImmPort