Strategies to Enhance the Utility of Data in ImmPort
Barry Smith
http://ontology.buffalo.edu/smith
pipeline
perform study & collect data
analyze data (SAS …)
submit data to ImmPort
process & de-identify data in ImmPort
discover, aggregate, analyze data in ImmPort
Pipeline
PIs, hospitals, biostatisticians
Northrop Grumman
Max & Mindy
The problem
• too many incompatible standards and terminologies at all stages in the pipeline
• results in poorer quality of data available for analysis, requiring considerable manual effort
• as more studies come online this will get worse
Training and Strategy Workshop for Rho Federal
http://ncorwiki.buffalo.edu/index.php/ImmPort
Rho participants
• David Ikle (Chief, Biostatistics, Rho Federal): Database Creation and Data Analysis Processes at Rho Federal
• John Lim and Karen Kesler: Views from Rho of ImmPort Submission Process
• Jeff Abolafia: CDISC and CDASH standards in Rho
~ 20 biostatisticians and data managers at Rho Federal
External participants
• Ravi Shankar, Barry Smith, Jeff Wiser from BISC
• Anna Maria Masci (Duke University): On submission of data to ImmPort for the Multiscale System Immunology project
• Lindsay Cowell (UT Southwestern): Immunology Ontology
The solution(s)
• Post-coordination
• Pre-coordination
Pre- vs. Post-coordination
PIs, hospitals, biostatisticians, Rho …
Northrop Grumman
Max & Mindy
Post-coordination = arms-length enhancement of data
PIs, hospitals, biostatisticians, Rho …
Northrop Grumman
Max & Mindy
uniform standards applied post hoc
Post-coordination = arms-length enhancement of data
PIs, hospitals, biostatisticians, Rho …
Northrop Grumman
Max & Mindy
Lots of free text, local formats, local standards, local terminologies operating here
LEAVE AS IS
uniform standards applied post hoc
Advantages: BISC controls all ImmPort data issues
Disadvantages: BISC bears all costs of data processing; data are divorced from source
PIs, hospitals, biostatisticians, Rho …
Northrop Grumman
Max & Mindy
free text protocols, local formats, local standards, local terminologies
uniform standards applied post hoc
Lots of free text, local formats, local standards, local terminologies operating here
Pre-coordination
PIs, hospitals, biostatisticians, Rho …
Northrop Grumman
Max & Mindy
apply uniform standards already here
Advantages: higher quality data for integration and analysis; lower costs to BISC
Disadvantages: increased costs to data providers; which uniform standards will they accept? which ones should they accept?
PIs, hospitals, biostatisticians, Rho … (= data providers)
Northrop Grumman
Max & Mindy
same uniform standards applied across the whole pipeline
Multiple moving parts
PIs, hospitals, biostatisticians, Rho …
Northrop Grumman
Max & Mindy
Mechanistic assays: Local standards + Labkey + Sampleminded
Clinical: Local standards + FDA - CDISC + Medidata Rave
Multiple time scales
PIs, hospitals, biostatisticians, Rho …
Northrop Grumman
Max & Mindy
Mechanistic assays: Local standards + Labkey + Sampleminded
Clinical: Local standards + FDA - CDISC* + Medidata Rave†
*CDISC effort initiated 1997
†Medidata Rave only now being adopted by Rho
For Rho Federal, CDISC / FDA are of secondary importance
But they may adopt CDISC standards nonetheless, for the sake of uniformity, and because they may need to use Medidata
Currently, use of standards by Rho Federal:
• is uncoordinated across different studies
• involves standards of varying quality
• is inefficient (costs money)
• involves considerable post-coordination (e.g. of the sort used to package data for sending to ImmPort)
Goal of the Rho meeting
• Devise strategy to optimize Rho-BISC collaboration
• Rho has to pre-coordinate for ImmPort
• If Rho can use ImmPort templates already in its day-to-day operations, this will make submission to ImmPort more effective and potentially improve quality of data along the whole pipeline
--> Need for collaborative development of some standards, libraries and ontologies
Standards
Example: Visit days
ITN Data
HLA data (purple)
Flow Cytometry data (yellow)
PCR data (green)
Study Protocol, Operational data, Clinical data (blue)
Specimen Management Data (green)
What is in a visit name? (ITN)
Transplant
Visit 0
0
v 0
v0
Day 0
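What is in a visit name?
Visit 0, v0, v 0, 0, Day 0, Transplant
[Diagram: the same visit carries a different label depending on who records it and where]
Parties and artifacts shown: CRO, Protocol Group, Assay Group, Cimarron Operations Group, Data Center, Core Labs, Schedule of Events, Specimen Table, Tube Table, CRF, ImmunoTrak, Kit Report, Database, Assays
Labels shown: Day 0, Transplant; v0; 0; v 0; Visit 0
Tube Manufacturer: v 0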
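One way to post-coordinate the visit labels above is a small normalizer. This is only a sketch: the canonical form "Visit N" is a hypothetical choice, and treating "Transplant" and "Day 0" as the same visit as "Visit 0" is an assumption that would have to be confirmed per study protocol.

```python
import re

# Hypothetical canonical form; the slides do not prescribe one.
CANONICAL_PREFIX = "Visit"

# Assumption: in this study "Transplant" names the same visit as "Visit 0".
SYNONYMS = {"transplant": 0}

# Accepts an optional "visit", "day" or "v" prefix followed by a number.
# Assumption: "Day N" and "Visit N" coincide, which need not hold in general.
_PATTERN = re.compile(r"^(?:visit|day|v)?\s*(\d+)$", re.IGNORECASE)


def normalize_visit(label: str) -> str:
    """Map a locally used visit label to the canonical form."""
    text = label.strip()
    key = text.lower()
    if key in SYNONYMS:
        return f"{CANONICAL_PREFIX} {SYNONYMS[key]}"
    match = _PATTERN.match(text)
    if match is None:
        # Unknown labels are rejected rather than guessed at.
        raise ValueError(f"unrecognized visit label: {label!r}")
    return f"{CANONICAL_PREFIX} {int(match.group(1))}"


labels = ["Visit 0", "v0", "v 0", "0", "Day 0", "Transplant"]
normalized = {normalize_visit(label) for label in labels}
```

Labels outside the known patterns raise an error, so that a curator decides how to map them instead of the script.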
Allergy Score (Study Collection Day)
Lab Tests (Study Time collected)
Microarray Data (only Visit)
Flow (Collection_Study_day and Visit)
Mappings between protocol, lab tests and mechanistic assays were missing
ImmPort Templates
How to specify “Subject Phenotype”?
ImmPort Adverse Event template
adverse_event_accession, name_preferred, start_time, causality, is_serious_event, severity_reported, study_accession, organ_or_body_system_preferred, end_study_day, end_time, event_description, location_of_reaction_preferred,
location_of_reaction_reported, name_reported, organ_or_body_system_reported, outcome_preferred, outcome_reported, other_action_taken, relation_to_nonstudy_treatment, relation_to_study_treatment, severity_preferred, start_study_day, subject_accession, workspace_id
Problems
Runs together terms with what they describe
‘severity reported’ vs ‘severity preferred’
‘outcome reported’ vs ‘outcome preferred’
Are there definitions?
ImmPort Adverse Event template
proposals contributed by Yongqun He
Ontology | Ontology term
Clinical Activities Library (from ITN, via Ravi)
Which standards do we need for mechanistic assays?
Anna Maria Masci
Department of Immunology
Duke University
Standards needed for bench work
• Purpose of the experiment
• Model (in vivo: animal or in vitro: cell, protein etc.)
• Method type (DNA sequencing, ELISA, in vivo microscopy)
• Method specification (treatment, incubation time, instrument used)
• Data format (Excel file, image)
• Output (list of entities, OD value, fluorescence value)
Standards needed for statistical analysis
• Data type: qualitative or quantitative
• Normalization: removal during data analysis of non-biological variations such as instrument variability, experimental protocol changes, and reagent changes
• Population
• Variable
• Outcome
• Statistical test
Experimental methodology ontology
ASSAY
INPUT: ORGANISM, CELL, DNA, DRUG, REAGENT
TRANSFORMATION: TARGET, TREATMENT, INSTRUMENT
OUTPUT: DATA FORMAT, PROCESSED DATA
CFSE staining
Input:
Organism: mouse
Cell: Naïve B cell
Reagent: Carboxyfluorescein succinimidyl ester
Transformation:
Target assay: cell cytosol
Reagent: carboxyfluorescein diacetate, succinimidyl ester
Cell treatment: none
Instrument: FACS
Output:
Data Type: FACS histograms
Processed data: number of cell divisions
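The input / transformation / output pattern of the assay description can be written down as a structured record. The sketch below is illustrative only: the class and field names are hypothetical and are not taken from an ImmPort template or from OBI; the values are those of the CFSE staining example.

```python
from dataclasses import dataclass, asdict


@dataclass
class AssayInput:
    organism: str
    cell: str
    reagent: str


@dataclass
class AssayTransformation:
    target: str
    reagent: str
    cell_treatment: str
    instrument: str


@dataclass
class AssayOutput:
    data_type: str
    processed_data: str


@dataclass
class Assay:
    """Hypothetical record mirroring INPUT / TRANSFORMATION / OUTPUT."""
    name: str
    input: AssayInput
    transformation: AssayTransformation
    output: AssayOutput


cfse = Assay(
    name="CFSE staining",
    input=AssayInput(
        organism="mouse",
        cell="naive B cell",
        reagent="carboxyfluorescein succinimidyl ester",
    ),
    transformation=AssayTransformation(
        target="cell cytosol",
        reagent="carboxyfluorescein diacetate, succinimidyl ester",
        cell_treatment="none",
        instrument="FACS",
    ),
    output=AssayOutput(
        data_type="FACS histograms",
        processed_data="number of cell divisions",
    ),
)

# asdict() converts nested dataclasses recursively, giving a plain dict
# that could be serialized for submission or comparison.
record = asdict(cfse)
```

In a pre-coordinated setting the free-text values would be replaced by ontology term identifiers; plain strings are used here to keep the sketch self-contained.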
Need for supplementary ontology content to support design of ImmPort templates that can be useful already to Rho workflow
• allow high quality interoperable standards which can
  • keep pace with current research
  • advance discoverability of ImmPort data by third parties
  • allow high-powered analysis by Max and Mindy
• Examples:
  • planned Antibody Ontology to support automatic analysis of CyTOF results
  • Ontology for Biomedical Investigations (OBI)
ImmPort Antibody Registry (Diehl, et al)
from BD Lyoplate Screening Panels Human Surface Markers
Ontology of Biomedical Investigations
3rd Workshop
ImmPort
Richard H. Scheuermann
29 JAN 2007
Semantic Query
Find all experiments in which IL2 mRNA levels were quantified
Infer that IL2 mRNA is analyte and SAGE, QPCR and microarrays are appropriate measurement techniques
Find all experiment samples that include samples from subjects with diseases like Type 1 diabetes
Infers that the source of the biological sample used must be a human subject with Type 1 diabetes mellitus, Graves’ disease or other autoimmune diseases of endocrine glands
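The second query can be illustrated with a toy is_a hierarchy. Everything below is a hypothetical sketch: the hierarchy is hand-written rather than loaded from a real disease ontology, the sample records are invented, and "diseases like X" is read as "diseases sharing X's immediate parent class".

```python
# Toy is_a hierarchy (child -> parent). Hand-written for illustration only.
IS_A = {
    "type 1 diabetes mellitus": "autoimmune disease of endocrine gland",
    "Graves' disease": "autoimmune disease of endocrine gland",
    "autoimmune disease of endocrine gland": "autoimmune disease",
    "rheumatoid arthritis": "autoimmune disease",
    "asthma": "respiratory system disease",
}

# Invented sample annotations: (sample id, disease of the source subject).
SAMPLES = [
    ("S1", "type 1 diabetes mellitus"),
    ("S2", "Graves' disease"),
    ("S3", "rheumatoid arthritis"),
    ("S4", "asthma"),
]


def ancestors(term):
    """Return all classes above `term`, nearest first."""
    result = []
    while term in IS_A:
        term = IS_A[term]
        result.append(term)
    return result


def diseases_like(term):
    """Diseases that fall under the immediate parent class of `term`."""
    parent = IS_A[term]
    return {t for t in IS_A if parent in ancestors(t)}


similar = diseases_like("type 1 diabetes mellitus")
matching_samples = [sid for sid, disease in SAMPLES if disease in similar]
```

A keyword search for "Type 1 diabetes" would return only S1; the hierarchy is what lets the query also reach the Graves' disease sample.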
Applications of OBI to Functional Genomics Data Annotation
and Integrative Tools for Protozoan Parasite Research
Jie Zheng & Chris Stoeckert
Center for Bioinformatics
University of Pennsylvania School of Medicine
2011 San Diego OBI workshop
EuPathDB is a NIAID Bioinformatics Resource Center covering Eukaryotic Parasites
EuPathDB: a portal to eukaryotic pathogen databases. Aurrecoechea C, Brestelli J, Brunk BP, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer ET, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Srinivasamoorthy G, Stoeckert CJ Jr, Thibodeau R, Treatman C, Wang H. Nucleic Acids Res. 2010
Ontology-based Representation of Isolate Data
The data collected in the submission form are in bold font. The fields that require ontology terms are in thick-bordered boxes.
Isolate Submission Form
Support multiple sequence submission
Before
Isolate ID; Date Collected; Country; State or province; City; GPS Coordinates; Isolate Species; Isolate Environmental Source; Host; Sequence 1 product Name; Sequence 1; Sequence 2 product Name; Sequence 2; Sequence 3 product Name; Sequence 3; Sequence 4 product Name; Sequence 4
After
Specimen ID
Date Collected: Day; Month; Year
Isolate: Isolate Species; Additional Classification -- genotype; Additional Classification -- subtype; Other organism isolated from same sample
Geographic Location: Country; Region -- State or province; County; City/village/locality; Latitude/longitude Coordinates
Source: Isolate Environmental Source; Environment
Host Information: Host Species -- scientific name; Race/Breed; Age; Sex; Symptoms
Isolation Source: Host Material Isolated from; Non-human Habitat
Additional Notes
Nucleotide Sequence: for each of Sequence 1 to 4: product or locus Name; Primer Pairs; description; sequence
Ontology-based Representation of Genetic Manipulation with Resulting Phenotype Data
The data collected in the submission form are in bold font. The fields that require ontology terms are in thick-bordered boxes.
Ontology for Parasite Lifecycle (OPL) will be used in the annotation of life cycle stage
Use OPL for annotation
Phenotype Section
Cellular location
Biological process
Question: What relation should be used to link a quality (PATO: organismal quality), such as PATO: lethal, to a biological process, such as GO: growth?
Original strategy
Rho is introducing Medidata Rave as its Clinical Trial Management Platform
BISC will convince Rho and Medidata to adopt high-quality, computable ontologies of the sort which will enable automatic export of source data into ImmPort
This strategy will not work because Medidata is tied to CDISC (CDASH, ODM, ADaM ...), geared to FDA statistical analysis pipelines
Result: much of CDISC content is packaged in ways not conducive to secondary analysis
CDASH: Clinical Data Acquisition Standards Harmonization
Uses the Operational Data Model (ODM) [XML dialect], designed to facilitate the archive and interchange of the metadata and data for clinical research, its power being fully unleashed when data are collected from multiple sources.
http://www.cdisc.org/stuff/contentmgr/files/0/f968ea2a3bdad76eb3e23e3c4978fff4/misc/odm1_3_1_final.htm
Medidata Rave uses ODM
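To give a feel for what ODM-packaged data look like to a secondary user, the sketch below flattens a small ODM-style fragment into rows. The element and attribute names follow ODM 1.3 as commonly documented, but the fragment is simplified and has not been validated against the schema; all OIDs, keys and values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of ODM 1.3 documents.
ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Simplified, hypothetical fragment; real ODM files carry required
# attributes and metadata sections that are omitted here.
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <ClinicalData StudyOID="ST.HYPOTHETICAL" MetaDataVersionOID="MDV.1">
    <SubjectData SubjectKey="SUBJ-001">
      <StudyEventData StudyEventOID="SE.VISIT0">
        <FormData FormOID="FORM.VS">
          <ItemGroupData ItemGroupOID="IG.VS">
            <ItemData ItemOID="IT.DIABP" Value="80"/>
            <ItemData ItemOID="IT.SYSBP" Value="120"/>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>"""


def flatten(odm_xml):
    """Return (subject, study event, item, value) rows from an ODM string."""
    root = ET.fromstring(odm_xml)
    rows = []
    for subject in root.iterfind(".//odm:SubjectData", ODM_NS):
        for event in subject.iterfind("odm:StudyEventData", ODM_NS):
            for item in event.iterfind(".//odm:ItemData", ODM_NS):
                rows.append((
                    subject.get("SubjectKey"),
                    event.get("StudyEventOID"),
                    item.get("ItemOID"),
                    item.get("Value"),
                ))
    return rows


rows = flatten(SAMPLE)
```

The rows carry opaque OIDs rather than terms a third party could search on, which is the discoverability concern raised about CDISC codes.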
ODM links
http://www.cdisc.org/stuff/contentmgr/files/0/fa3021351c086aeaaef00cd17feaef58/misc/cdash_std_1_1_2011_01_18.pdf
http://www.cdisc.org/stuff/contentmgr/files/0/3f998d957905d7ed83b0bbeff9822f7a/misc/cdash_ug_1_1_1_2012_04_12_final.pdf
http://www.cdisc.org/stuff/contentmgr/files/0/919cb4ef843829170d470b37eb662aeb/misc/odm1_3_0_final.htm
http://www.cdisc.org/stuff/contentmgr/files/0/464923b10ea16b477151fcaa9f465166/misc/define_xml_2_0_releasepackage20140424.zip
http://www.cdisc.org/stuff/contentmgr/files/0/3f998d957905d7ed83b0bbeff9822f7a/misc/cdash.odm_updated.xml
CDASH
CDASH appears to be restricted to members, but there is this from NCI:
http://evs.nci.nih.gov/ftp1/CDISC/SDTM/CDASH%20Terminology.pdf
Advantages of CDISC-CDASH
• FDA mandated standardization
• rich list of standardized questionnaires / questions used in clinical trials
• increasingly used by Pharma (mainly through post-coordination?)
• used by Medidata
• increasingly used by Rho (mainly through post-coordination?)
• some parts of CDISC (above all SDTM) already used in ImmPort templates
• subject to interesting experiments with Semantic Web: CDISC2RDF / PhUSE, CDISC Ontology (Ravi) …
Tentative first list of potential disadvantages with CDISC-CDASH
• not webcentric (technologically dated)
• restricted (?) to members
• codes (BP_DIABP_VSORRES) do not advance discoverability
• does not reuse established standards
• not modular, does not interoperate with bioinformatics resources
• lack of clarity as to which CDASH-associated working parts will survive (ODM, HL7 messaging …)
• many areas (therapeutic, lab …) not populated
• CDISC content lags current research
• BRIDG is not the solution
Next steps
– Rho will continue to send some studies as is to ImmPort
– experiment with use of enhanced standards
  Medidata / Ravi CDISC-CDASH pipeline; explore possibilities for influencing CDISC treatment of immunology data
  Clinicaltrials.gov pipeline
  Rho – Duke – UT Southwestern to create libraries needed by CTOT for mechanistic assays for transplant, asthma
– write 1 FTE ontologist into next grant proposal?
– Labkey pipeline: Jeff and Barry will visit Labkey to discuss possibilities for pre-coordination
Strategy for Labkey (tentative)
Prepopulate Labkey with suitably tailored parts of OBI, and with other assay-related ontologies (ChEBI, PRO, CL, IDO…)
to test the degree to which the results can
• make Rho data operations more effective
• streamline submission of Rho lab data to ImmPort
• result in higher utility and higher quality of lab data in ImmPort
Appendix: Other moving parts
• ITN-Trialshare
• HIPC
• Sampleminded
• http://www.labanswer.com/
• ITN https://www.itntrialshare.org/
• http://www.immunetolerance.org/news/2013/08/itn-achieves-scientific-manuscript-first-provides-open-interactive-access-clinical-tria