Goal of the SysBio DDWG
• Coordinate approaches for sharing and dissemination of systems biology project data within the program and to the broader infectious disease and systems biology research communities
• Share best practices in data management
• Leverage external data resources for data dissemination, including the NIAID Bioinformatics Resource Centers
DDWG Activities
• Monthly 1 hr conference calls
• Annual workshops
• Membership
– Representatives from FluDyNeMo, Fluomics, OMICS-LHV, Omics4TB, MaHPIC
– Representatives from EuPathDB, PATRIC, ViPR/IRD
– Representatives from DMID
• Co-Chairs
– Michelle Craft, OMICS-LHV
– Richard Scheuermann, ViPR/IRD
Workshop Agenda
A. SysBio data management best practices (Michelle Craft, presenter)
* Data Management Best Practice Highlights
* Overview of Data Carpentry and Software Carpentry
B. SysBio center plans for project websites (Richard Scheuermann, moderator)
* Presentation of highlights by each center (5 min each)
* Discussion of general internal data sharing strategies and short term public dissemination plans
* Discussion of long term dissemination plans
C. Relevant public data archives (Jessie Kissinger, presenter)
* Which existing public archives could be used for long term dissemination of SysBio data
* What SysBio data types are not currently supported by public data archives
* Discussion of long term dissemination plans
D. Transcriptomic data derived from RNA-seq (Brian Aevermann, presenter)
* Determine if new transcriptomic (meta)data need to be captured for the new SysBio program
* Determine which aspects of RNA-seq data are not covered by current microarray data support
* Decide how to support data processing (meta)data – structured data fields vs free text protocols
* Determine which RNA-seq data should be disseminated and where
Best Practices Overview
• File Management
– Descriptive Names
– Metadata
– Sensitive Data
– Data Versions
• File Content
– Rows vs Columns
– Spreadsheet Mistakes
– File Formats
• Working with Data
– Find useful tools
– Quality control, data manipulation
– Software and Analysis Versions
Courtesy of Michelle Craft
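As an illustration of the "Descriptive Names" and "Data Versions" items above, here is a minimal Python sketch of one possible naming convention. The function `descriptive_name`, the field order, and the example values (including the date) are assumptions made for illustration; they are not a convention stated in the slides.

```python
import re
from datetime import date


def descriptive_name(project, organism, assay, collected, version, ext):
    """Build a file name from metadata fields plus an explicit version.

    Hypothetical convention: <project>_<organism>_<assay>_<ISO date>_vNN.<ext>
    """
    parts = [project, organism, assay]
    # Lowercase each field and collapse runs of non-alphanumerics into one hyphen,
    # so that underscores are reserved for separating fields.
    clean = [re.sub(r"[^a-z0-9]+", "-", p.lower()).strip("-") for p in parts]
    fields = clean + [collected.isoformat(), f"v{version:02d}"]
    return "_".join(fields) + "." + ext.lstrip(".")


# Illustrative values only.
print(descriptive_name("OMICS-LHV", "ferret", "RNA-seq", date(2015, 3, 1), 2, "tsv"))
# prints: omics-lhv_ferret_rna-seq_2015-03-01_v02.tsv
```

Putting the version in the name keeps earlier files intact instead of overwriting them, and the ISO date sorts chronologically in a plain directory listing.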
Project Websites
• Informational content using content management systems, e.g. WordPress, Drupal
• Data sharing portal
– Within consortium
– Public
Previous Data Submission Workflows
[Figure: data flow between Systems Biology sites and public resources. Data items: study metadata, experiment metadata, primary results, analysis metadata, processed data matrix, free text metadata, host factor biosets. Resources: GEO/PeptideAtlas/SRA/MetaboLights and ViPR/IRD/PATRIC. Arrows are labeled "submission" or "pointer".]
Data standards background
• Ontology for Biomedical Investigations (OBI)
– Peters, Bjoern and OBI Consortium, The. Ontology for Biomedical Investigations. Available from Nature Precedings (2009).
– Ryan R Brinkman, et al. “Modeling biomedical experimental processes with OBI”. Journal of Biomedical Semantics (2010).
• OBX data standard
– Developed for ImmPort using OBI structure and implemented in a relational database
– Y. Megan Kong, et al. “Toward an Ontology-Based Framework for Clinical Research Databases”. J Biomed Inform (2011).
• Systems Biology data standard
– Derived from OBX/ImmPort and extended to capture data transformations and derived data (Biosets)
Serial Challenge Timeline
Sequential Sampling Studies
Serial/Longitudinal Studies
[Figure: two study timelines, each labeled A/California/07/2009. Serial challenge timeline: days 0, 1, 3, 5, 8, 14, with n=4 ferrets at each time point. Sequential sampling timeline: days -2, 0, 3, 5, 8. Specimens shown: nasal wash, serum, FACS whole blood, blood in RNAlater, bronchial lavage, lungs.]
Courtesy of Elodie Ghedin
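A timeline like the one above translates naturally into a sample manifest with one row per time point, animal, and specimen type. The sketch below is a hedged illustration: `build_manifest` and the `sample_id` pattern are hypothetical, numbering animals within each time point is an assumption, and the specimen list passed in the example is a subset chosen for illustration, since the slide residue does not show which specimens were collected at which time point.

```python
from itertools import product


def build_manifest(days, animals_per_day, specimen_types):
    """Return one manifest row per (day, animal, specimen) combination."""
    rows = []
    animals = range(1, animals_per_day + 1)
    for day, animal, specimen in product(days, animals, specimen_types):
        rows.append({
            # Hypothetical ID pattern; the sign is kept so day -2 and day 2 differ.
            "sample_id": f"day{day:+d}_animal{animal}_{specimen.replace(' ', '')}",
            "day": day,
            "animal": animal,
            "specimen": specimen,
        })
    return rows


# Days taken from the sequential sampling timeline; specimen subset is illustrative.
manifest = build_manifest([-2, 0, 3, 5, 8], 4, ["Nasal Wash", "Serum"])
print(len(manifest))             # 5 days x 4 animals x 2 specimens
print(manifest[0]["sample_id"])
```

Generating the manifest up front gives every planned sample a stable identifier that later assay and data-processing records can point back to.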
Generalized Experiment Workflow
[Figure: workflow diagram across time points T1 to T5. Node labels: subject organism, treatment agent, treatment process, treated organism, sacrifice process, sacrificed organism, physical assessment, assessment data, and three numbered branches (1 to 3), each with specimen isolation, isolated specimen, omics assay, primary data, data transformation, and processed data.]
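The generalized workflow can be read as a chain of processes, each with inputs and outputs, in the spirit of OBI. The following Python sketch models one branch that way and traces a processed data set back to the processes that produced it. The `Process` class, the `lineage` function, and the specific input/output links are assumptions based on one reading of the diagram labels, not a schema given in the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Process:
    name: str
    inputs: List[str]
    outputs: List[str]


def lineage(entity, processes):
    """List the processes an entity was derived from, most recent first."""
    produced_by = {out: p for p in processes for out in p.outputs}
    chain, frontier = [], [entity]
    while frontier:
        current = frontier.pop()
        process = produced_by.get(current)
        if process is None:
            continue  # a starting material such as the subject organism
        chain.append(process.name)
        frontier.extend(process.inputs)
    return chain


# Assumed links for branch 1 of the diagram.
branch_1 = [
    Process("treatment process", ["subject organism", "treatment agent"], ["treated organism"]),
    Process("specimen isolation 1", ["treated organism"], ["isolated specimen 1"]),
    Process("omics assay 1", ["isolated specimen 1"], ["primary data 1"]),
    Process("data transformation 1", ["primary data 1"], ["processed data 1"]),
]

print(lineage("processed data 1", branch_1))
# prints: ['data transformation 1', 'omics assay 1', 'specimen isolation 1', 'treatment process']
```

Recording each step as a process with explicit inputs and outputs is what makes it possible to link related data sets back to the same treated organism.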
Typical RNA-seq Data Processing Workflow
[Figure: flowchart. Recoverable labels, grouped by type:]
• Inputs: raw data (fastq*); reference genome (fasta, version; ENSEMBL version)
• Analysis steps: TopHat analysis; Cufflinks analysis; scaling and norm (cuffMerge); differential expression analysis (edgeR)
• Data products: mapped reads (SAM/BAM); assembled transcripts (SAM/BAM); transcript abundance values (text*); differentially expressed genes (text*)
• Data archiving: SRA record; GEO; BRC
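One way to capture data processing (meta)data as structured fields rather than free text is a small record per analysis step. The sketch below is illustrative only: `ProcessingStep` and `missing_versions` are hypothetical names, the pairing of each tool with an output is an assumed reading of the flowchart, and tool versions are left unset because the slides do not give them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcessingStep:
    tool: str
    output: str
    output_format: str
    tool_version: Optional[str] = None  # None where no version was recorded


def missing_versions(steps: List[ProcessingStep]) -> List[str]:
    """Return the tools whose version was not recorded."""
    return [s.tool for s in steps if not s.tool_version]


# Assumed tool-to-output pairing; versions intentionally left blank.
steps = [
    ProcessingStep("TopHat", "mapped reads", "SAM/BAM"),
    ProcessingStep("Cufflinks", "assembled transcripts", "SAM/BAM"),
    ProcessingStep("cuffMerge", "transcript abundance values", "text"),
    ProcessingStep("edgeR", "differentially expressed genes", "text"),
]

print(missing_versions(steps))
# prints: ['TopHat', 'Cufflinks', 'cuffMerge', 'edgeR']
```

With structured fields, a check like this can run before submission; with a free text protocol, a missing software version is much harder to detect automatically.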
Future Directions
• Finalize core generic (meta)data modules for treatments, specimen sampling, organism assessments, omics assays, data processing
• Determine if additional assay-specific data fields are needed
• Decide which results data should be captured for public dissemination
• Decide which public data archives should be used
• Ensure appropriate linkage between related data