Richard H. Scheuermann, Ph.D. November 5, 2012


Support for Systems Biology Data in IRD/ViPR - Proteomics

Projects with Host Factor Data

• Four systems biology groups funded by NIAID, including:
  – Systems Virology (Michael Katze group, Univ. Washington)
    • Influenza H1N1 and H5N1 and SARS Coronavirus
    • Statistical models, algorithms and software, raw and processed gene expression data, and proteomics data
  – Systems Influenza (Alan Aderem group, Institute for Systems Biology/Seattle Biomed)
    • Various influenza viruses
    • Microarray, mass spectrometry, and lipidomics data

• ViPR Driving Biological Projects
  – Abraham Brass, Mass. General Hospital
    • Dengue virus host factor database from RNAi screen
  – Lynn Enquist / Moriah Szpara, Princeton University
    • Deep sequencing and neuronal microarrays for functional genomic analysis of Herpes Simplex Virus
  – Richard Kuhn, Purdue University
    • Metabolomics data of Dengue virus infection of human cells and mosquitos
  – Mike Diamond, Washington University
    • Identification of inhibitory interferon-stimulated genes against flaviviruses and noroviruses using shRNA knockdown
    • Determine the mechanism of action of individual inhibitory ISGs

Strategy for Handling “Omics” Data

• “Omics” data management (MIBBI vs MIBBI-DB)
  – Project metadata (1 template)
    • Title, PI, abstract, publications
  – Experiment metadata (~6 templates)
    • Biosamples, treatments, reagents, protocols, subjects
  – Primary results data
    • Raw expression values
  – Data processing metadata (1 template)
    • Normalization and summarization methods
  – Processed data
    • Data matrix of fold changes and p-values
  – Data interpretation metadata (1 template)
    • Fold change and p-value cutoffs used
  – Interpreted results (Host factor biosets)
    • Interesting gene, protein and metabolite lists

• Visualize biosets in context of biological pathways and networks
• Statistical analysis of pathway/sub-network overrepresentation
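The step from a processed data matrix (fold changes and p-values) to an interpreted host factor bioset can be sketched as a simple filter driven by the cutoffs recorded in the data interpretation metadata. This is only an illustrative sketch, not the IRD/ViPR implementation: the `Cutoffs` class, the `derive_bioset` function, the default thresholds, and the gene IDs are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoffs:
    """Data interpretation metadata: thresholds used to call a host factor."""
    min_abs_log2_fc: float = 1.0  # assumed default (a 2-fold change)
    max_p_value: float = 0.05     # assumed default


def derive_bioset(matrix, cutoffs):
    """Return the sorted IDs whose fold change and p-value pass the cutoffs.

    `matrix` maps a gene/protein/metabolite ID to (fold_change, p_value),
    where fold_change is a linear ratio (treated / control).
    """
    selected = []
    for feature_id, (fold_change, p_value) in matrix.items():
        if fold_change <= 0:
            continue  # a ratio must be positive before taking its log
        passes_fc = abs(math.log2(fold_change)) >= cutoffs.min_abs_log2_fc
        passes_p = p_value <= cutoffs.max_p_value
        if passes_fc and passes_p:
            selected.append(feature_id)
    return sorted(selected)


# Hypothetical processed data matrix.
processed = {
    "GENE_A": (4.0, 0.001),   # strongly up, significant
    "GENE_B": (1.1, 0.0005),  # significant but barely changed
    "GENE_C": (0.25, 0.01),   # strongly down, significant
    "GENE_D": (8.0, 0.2),     # strongly up, not significant
}
bioset = derive_bioset(processed, Cutoffs())
```

Keeping the cutoffs in their own object mirrors the separation between processed data and data interpretation metadata: the same matrix can yield a different bioset under different recorded thresholds.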

Data Submission Workflows

[Figure: data submission workflow diagram. Recoverable labels: Study metadata; Experiment metadata; Primary results; Analysis metadata; Processed data matrix; Free text metadata; GEO/PRIDE/PNNL/SRA/MetaboLights; ViPR/IRD/PATRIC; Host factor bioset; Systems Biology sites. Connector labels: "submission", "pointer".]

Metadata Submission Template Examples

Host Factor Data

8 Studies To Date

Host Factor Bioset

Transcriptomics => Proteomics

• Metadata fields are largely re-usable, with some exceptions
  – Exp_sample_template (protein).xls

• Results data differences
  – Peptide-level and protein-level
    • IM005_Peptide_normalization_matrix.V2.xlsx
    • IM005_Protein Normalization matrix.xlsx
  – Statistical measures
    • Results_matrix_ IM005_sig Protein_RM.xlsx

Metadata Field Changes

• GEO GSM ID => Primary Data Archive + Primary Data Archive ID

• Semi-structured Experiment Variable to Structured Experiment Variable
  – Free text (1 day) => value unit pairs in separate fields (1/day; 10^4/plaque forming units)

• Multiple processed data matrix files
  – Concatenated IDs separated by (; |)

• Reagents and protocols are different but should not require submission template changes
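Two of the field changes above are mechanical enough to sketch: splitting a free-text experiment variable into a value and a unit, and splitting a field of concatenated IDs. This is a minimal sketch under assumptions. The function names are hypothetical, the value is kept as text rather than converted to a number, a space between value and unit is assumed, and both ";" and "|" are treated as separators because the slide does not say which one applies where.

```python
import re

# A value is digits, with an optional decimal part and an optional "^exponent"
# (e.g. "10^4"), followed by whitespace and then the unit text.
_VALUE_UNIT = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?(?:\^[0-9]+)?)\s+(.+?)\s*$")


def split_value_unit(text):
    """Split free text such as '1 day' into the pair ('1', 'day')."""
    match = _VALUE_UNIT.match(text)
    if match is None:
        raise ValueError(f"cannot parse experiment variable: {text!r}")
    return match.group(1), match.group(2)


def split_ids(field):
    """Split concatenated IDs on ';' or '|', dropping empty entries."""
    return [part.strip() for part in re.split(r"[;|]", field) if part.strip()]


time_value, time_unit = split_value_unit("1 day")
dose_value, dose_unit = split_value_unit("10^4 plaque forming units")
matrix_ids = split_ids("MATRIX_1; MATRIX_2|MATRIX_3")  # hypothetical IDs
```

Raising an error on unparseable text, instead of guessing, keeps malformed submissions visible at template-validation time.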

Normalized Data

• Archive at BRC (standard format?)
  – Peptide normalized data
  – Protein normalized data
  – Results matrix of significant proteins

• BRCs derive bioset lists from results matrix
  – Handling different significance measures
    • t-test flag, t-test p-value, g-test flag, g-test p-value, log10 ratio
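One way a BRC might derive a bioset list from a results matrix that mixes these significance measures is sketched below. The decision rule is an assumption, not something stated in the slides: a protein is kept if any reported test has a nonzero flag, or, when a test reports no flag, if its p-value is at or below a cutoff. The column names, the `max_p` default, and the protein IDs are hypothetical.

```python
def is_significant(row, max_p=0.05):
    """Return True if any reported test marks this protein as significant.

    Assumed rule: a nonzero flag wins; a p-value is consulted only when
    the corresponding flag is absent.
    """
    for test in ("t_test", "g_test"):
        flag = row.get(f"{test}_flag")
        p_value = row.get(f"{test}_p_value")
        if flag not in (None, 0):
            return True
        if flag is None and p_value is not None and p_value <= max_p:
            return True
    return False


def bioset_from_results(rows, max_p=0.05):
    """Return the sorted protein IDs that pass is_significant."""
    return sorted(r["protein_id"] for r in rows if is_significant(r, max_p))


# Hypothetical results matrix rows; absent keys mean "measure not reported".
results = [
    {"protein_id": "P1", "t_test_flag": 1, "t_test_p_value": 0.01, "log10_ratio": 0.5},
    {"protein_id": "P2", "t_test_flag": 0, "g_test_flag": -1, "g_test_p_value": 0.001},
    {"protein_id": "P3", "t_test_p_value": 0.2, "log10_ratio": 0.1},
    {"protein_id": "P4", "g_test_p_value": 0.01},
]
significant_proteins = bioset_from_results(results)
```

The log10 ratio is carried in the rows but not used for selection here; a fuller version could use its sign to split the bioset into increased and decreased proteins.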

Host Factor Bioset

On Deck

• Metabolomics and lipidomics data
• Integration of RNA expression, protein abundance and metabolite abundance
• Pathway/network visualization and analysis
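For the pathway analysis item, a common way to test whether a bioset is overrepresented in a pathway is the one-sided hypergeometric test. The slides do not say which statistic is planned, so the following is a sketch of that standard technique under stated assumptions; the function name and the example counts are hypothetical.

```python
from math import comb


def overrepresentation_p(bioset_size, pathway_size, overlap, background_size):
    """One-sided hypergeometric p-value P(X >= overlap).

    X is the number of pathway members seen when drawing `bioset_size`
    features without replacement from `background_size` features, of
    which `pathway_size` belong to the pathway.
    """
    upper = min(bioset_size, pathway_size)
    total = comb(background_size, bioset_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, bioset_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 measured genes, 4 in the pathway,
# a bioset of 3 genes, 2 of which fall in the pathway.
p_value = overrepresentation_p(
    bioset_size=3, pathway_size=4, overlap=2, background_size=10
)
```

In practice the background should be the set of features actually measured in the experiment, and p-values computed across many pathways would need a multiple-testing correction.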

Acknowledgement

• Lynn Law, U. Washington
• Richard Green, U. Washington
• Peter Askovich, Seattle Biomed
• Brett Pickett, U.T. Southwestern/JCVI
• Jyothi Noronha, U.T. Southwestern
• Eva Sadat, U.T. Southwestern
• Entire Systems Biology Data Dissemination Task Force, especially Jeremy Zucker
• NIAID (Alison Yao and Valentina DiFrancesco)

Future Development Plans

[Figure labels: GO enrichment; Network visualization]
