Transcript
Page 1: 06 November 2011 Richard H. Scheuermann, Ph.D. Department of Pathology

www.fludb.org

Influenza Research Database (IRD) & Virus Pathogen Resource (ViPR) Bioinformatics Resource Centers (BRCs) and support for Systems Biology data

06 November 2011

Richard H. Scheuermann, Ph.D.

Department of Pathology

U.T. Southwestern Medical Center

www.viprbrc.org

Page 2: ViPR Overview

www.viprbrc.org

Page 3: IRD Overview

www.fludb.org

Page 4: Data Summary & Sources

Page 5: Search Access to Data

Page 6: Data Types

Page 7: Analysis and Visualization

Page 8: Analysis and Visualization Tools

Page 9: Workbench Access

Page 10: Data Submission

Page 11: Data Submission Page

Page 12: SYSTEMS BIOLOGY “OMICS” DATA

Page 13: Overview of Systems Biology & DBP Projects

• Four systems biology groups funded by NIAID, including:
  – Systems Virology (Michael Katze group, Univ. Washington)
    • Influenza H1N1 and H5N1 and SARS Coronavirus
    • statistical models, algorithms and software, raw and processed gene expression data, and proteomics data
  – Systems Influenza (Alan Aderem group, Institute for Systems Biology)
    • various Influenza viruses
    • microarray, mass spectrometry, and lipidomics data
• ViPR Driving Biological Projects
  – Abraham Brass, Mass. General Hospital
    • Dengue virus host factor database from RNAi screen
  – Lynn Enquist / Moriah Szpara, Princeton University
    • Deep sequencing and neuronal microarrays for functional genomic analysis of Herpes Simplex Virus

Page 14: Omics Data

Andrew R. Joyce & Bernhard Ø. Palsson, Nature Reviews Molecular Cell Biology 7, 198-210 (March 2006)

Page 15: Acknowledgement

• Lynn Law, U. Washington
• Richard Green, U. Washington
• Jyothi Noronha, U.T. Southwestern
• Eva Sadat, U.T. Southwestern
• Brett Pickett, U.T. Southwestern

Page 16: Discussion Points

• Relationship between data archives (e.g. GEO, PRIDE) and the BRCs (e.g. IRD, ViPR, PATRIC)
  – Integration
  – Metadata standards
• Metadata
  – What kind of data
  – What kind of standards: MIBBI vs. “MIBBI lite”
• Raw versus processed data
• Primary results data
  – Need to define what is considered “primary” data for each platform
    • Microarray example: raw image files (.tiff) vs probe intensity values (.cel)
    • Opportunity for re-processing leading to re-interpretation
• Derived/processed results
  – “Interesting gene/protein lists” from microarray, RNAi, proteomics, and other experimental platforms
  – “Interesting metabolite lists”
  – Data processing metadata
• Visualize and analyze “interesting gene/protein/metabolite lists”

Page 17: Proposal for “Omics” Data

1. “Omics” data management (host)
   a) Project metadata
   b) Experiment metadata
   c) Experiment sample metadata
   d) Data analysis metadata
   e) Primary results
   f) Derived results (e.g. “interesting gene/protein/metabolite lists” (Host Factor Biosets))
2. Add additional related datasets from other sources
3. Visualize Host Factor Biosets in context of biological pathways and networks
4. Statistical analysis of pathway sub-network overrepresentation
5. Re-analysis of primary data using assembled pipeline tools (?)
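The proposal's item 4 (statistical analysis of pathway sub-network overrepresentation) does not name a test. One common choice is a one-sided hypergeometric test, sketched below under that assumption. The function name, argument names, and the numbers in the usage lines are hypothetical and do not come from the slides.

```python
from math import comb


def overrepresentation_p(universe: int, pathway: int, bioset: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    universe: number of genes that could have appeared in the bioset
    pathway:  number of those genes annotated to the pathway (assumed input)
    bioset:   number of genes in the host factor bioset
    overlap:  number of bioset genes that fall in the pathway
    """
    total = comb(universe, bioset)
    upper = min(pathway, bioset)
    # Sum the probability of every overlap at least as large as the one observed.
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, bioset - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    p = overrepresentation_p(universe=20000, pathway=150, bioset=300, overlap=12)
    print(f"p-value: {p:.3g}")
```

A small p-value would suggest that the pathway contains more bioset genes than expected by chance; in practice a multiple-testing correction across pathways would also be needed.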

Page 18: GEO Representation

GEO data representations based on free text

Page 19: Non-standard Descriptions

Guess sample details

Page 20: Metadata (MIBBI-compliant)

• Project Level Metadata
  – Hypothesis, rationale, study design, etc.
  – Publications and links pertaining to the project
  – Data providers: PI, other key personnel, affiliations, contact information
• Experiment/Assay Level Metadata
  – Experiment platform
  – Experiment data type
• Experiment Sample Level Metadata
  – Sample source and characteristics of source
  – Sample type
  – Source/sample treatment information
  – Assay details
• Data Processing/Analysis Level Metadata
  – Algorithm(s) used for transforming primary to derived data
  – Configuration parameters
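The four metadata levels can be pictured as nested records. The sketch below is only an illustration of that nesting. The class names, field names, and the completeness check are assumptions made for this example and are not the IRD/ViPR submission schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SampleMetadata:
    sample_source: str
    sample_type: str
    treatment: str
    assay_details: str


@dataclass
class AnalysisMetadata:
    algorithms: List[str]
    parameters: Dict[str, str]


@dataclass
class ExperimentMetadata:
    platform: str
    data_type: str
    samples: List[SampleMetadata] = field(default_factory=list)
    analyses: List[AnalysisMetadata] = field(default_factory=list)


@dataclass
class ProjectMetadata:
    hypothesis: str
    study_design: str
    publications: List[str] = field(default_factory=list)
    data_providers: List[str] = field(default_factory=list)
    experiments: List[ExperimentMetadata] = field(default_factory=list)


def missing_fields(record) -> List[str]:
    """Return the names of fields left empty on one record (not recursive)."""
    empty = []
    for name, value in vars(record).items():
        if value is None or (hasattr(value, "__len__") and len(value) == 0):
            empty.append(name)
    return empty


if __name__ == "__main__":
    sample = SampleMetadata("Calu-3 cells", "RNA", "", "microarray")
    print(missing_fields(sample))
```

A check like `missing_fields` is one simple way a submission form could flag incomplete metadata before upload.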

Page 21: Metadata Submission Modules

• Study
• Experiment
• Animal/human subject
• Biosample
• Reagent
• Protocol
• Experiment Sample
• Analysis method
• Host factor bioset

Page 22: Study

Page 23: Experiment

Page 24: Subject

Page 25: Biosample

Page 26: Reagent

Page 27: Protocol

Page 28: Experiment Sample

Page 29: Analysis Method

Page 30: Host Factor Bioset

Page 31: Possible Data Submission Workflows

[Figure: three alternative workflows (A, B, C), each connecting study metadata, experiment sample metadata, primary results, analysis metadata, host factor bioset and GEO free text metadata to GEO and ViPR/IRD.]

Page 32:

Search
Search our robust database for:
• Genomes
• Genes & proteins
• Immune epitopes
• 3D protein structures
Browse All Search Types

Analyze
Analyze your results online. We offer:
• Identify similar sequences (BLAST)
• Align sequences (MSA)
• Find short peptides in proteins
• Visualize aligned sequences
Browse All Tools

Save to Workbench
Sign up for a Workbench to:
• Store data in working sets for future analysis
• Integrate ViPR data with your laboratory data
• Store analysis results
• Share results and data with collaborators
Sign Up! Sign In

Highlights
ViPR Highlight for All Families: You can now search for, and visualize, 3D protein structures from within the ViPR website. Simply navigate to the protein of interest and then follow the links to use this feature.
View Tutorial · View Example Results · Start Search

Data Summary (Updated 2 Weeks Ago)
Genome Statistics for Virus Families
• Families: 13
• Genera: 54
• Species: 661
• Strains: 38,886
• Segments: 45,154

FAMILY_NAME

TEXT SEARCH

Genomes

Genes & Proteins

Immune Epitopes

Host Factor Biosets

3D Protein Structures

Protein Domains

Protein Motifs

HISTORY

Your Analysis History

Retrieve a Download

Ortholog Groups

SEARCH OR FIND:

Quick Text Search

Sequence Feature Variant Type

ViPR: Virus Pathogen Resource

About Us · Supported Projects · Announcements · Resources · Support

SEARCH DATA · ANALYZE & VISUALIZE · ACCESS WORKBENCH · VIRUS FAMILIES · HOME

Page 33:

Host Factor Biosets

Please Select an Experiment to Explore to view the associated Bioset (Interesting Gene List)

Study Title: Host factors in DENV replication
Experiment Name: Dengue whole-genome siRNA library screen
Experiment Type: siRNA Screen
Host Species & Biomaterial (Cell Line): Human (Huh-7)
Virus Strain Name: New Guinea C
PI: Abraham L. Brass
PI Institution: Massachusetts General Hospital
Contract / Grant Title: Dengue Virus-Host Interactions Using Functional Genomics
Description: Identify host factors required for DENV replication by using siRNA libraries.
Keywords: Dengue, DENV, siRNA, host factors
Contract / Grant Number: HHSN135711131 (NIAID)
Date Submitted: Dec 2011

Study Title: Host factors in HSV-1 replication
Experiment Name: Host neuronal response to HSV-1
Experiment Type: Gene Expression Microarray
Host Species & Biomaterial (Cell Line): Human Neurons (SHSY5Y)
Virus Strain Name: 17
PI: Moriah Szpara, Lynn Enquist
PI Institution: Princeton Univ.
Contract / Grant Title: Deep Sequencing and Neuronal Microarrays for Functional Genomic Analysis of HSV-1
Description: Characterize the response to HSV-1 infection through differential gene expression in human neuronal cells.
Keywords: Herpes Simplex Virus 1, HSV-1, neurons
Contract / Grant Number: HHSN24681012 (NIAID)
Date Submitted: Apr 2012

Release Date: Jan 7, 2012
This project is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN272200900041C and is a collaboration between Northrop Grumman Health IT, University of Texas Southwestern Medical Center and Vecna Technologies. Virus images courtesy of CDC Public Health Image Library and Wellcome Images.

Page 34:

Host Factor Biosets (“Omics” Experiment Details)

Download Host Factors · Download All · Download Primary Results

[-] EXPERIMENT METADATA (from Host response to Influenza virus infection)

Study Title: NIAID Systems Virology Center
Experiment Name: VN1203/2004 infection in Calu3 cell: A time course
Experiment Type: Gene Expression Microarray
Host Species & Biomaterial (Cell Line): Calu3
Virus Strain Name: A/Vietnam/1203-CIP48_RG1/2004(H5N1)
Conditional Variables: +/- virus infection, time after infection

[-] PRIMARY RESULTS

Experiment Sample ID | Source Biological Sample | Treatment Agent 1 Name | Treatment Agent 1 Amount | Treatment 1 Duration | Download Experiment Data
251485048497_1_2_RNA | Calu-3 cells | A/Vietnam/1203-CIP048_RG1/2004(H5N1) | 1 MOI | 0 hrs | Experiment Sample 1
251485048466_1_3_RNA | Calu-3 cells | A/Vietnam/1203-CIP048_RG1/2004(H5N1) | 1 MOI | 12 hrs | Experiment Sample 2
251485048497_1_1_RNA | Calu-3 cells | A/Vietnam/1203-CIP048_RG1/2004(H5N1) | 1 MOI | 24 hrs | Experiment Sample 3
251485048467_1_1_RNA | Calu-3 cells | Mock | Mock | 24 hrs | Experiment Sample 4

[+] HOST FACTOR BIOSETS

Page 35:

Host Factor Biosets (“Omics” Experiment Details)

Run Analysis · Download Host Factors · Download All · Download Primary Results

[+] EXPERIMENT METADATA (from Host response to Influenza virus infection)

[+] PRIMARY RESULTS

[-] HOST FACTOR BIOSETS

Bioset 1
Bioset Name: Compendium_Digital_signature_by_Fishers_summary_statistic
Description of Bioset: From a compendium of 12 studies that included responses to influenza A subtype H5N1, reconstructed 1918 influenza A virus, and SARS-CoV, we used meta-analysis to derive multiple gene expression signatures.
Bioset Type: Meta-analysis
Name of Application or Analysis Method: Fishers summary-statistic: MADAM (Meta-Analysis Data Aggregation Methods)

Bioset 2
Bioset Name: Microarray_results_flu_calu-3
Description of Bioset: Included RNA isolated from cells infected with influenza A subtype H5N1 at various timepoints
Bioset Type: List of differentially expressed genes
Name of Application or Analysis Method: Inter-array normalization; Median-background subtraction

Gene Name | Entrez Gene ID | Entrez Gene Name | ImmPort Page | GenBank Accession | Bioset 1 Score | Bioset 2 Score
Mus musculus contactin 1 | 12805 | Cntn1 | | NM_007727 | 0.588131 | 0.812506913
Mus musculus preproenkephalin 1 | 18619 | Penk1 | | NM_001002927 | 0.277487726 | 0.920335884
Mus musculus 5-hydroxytryptamine (serotonin) receptor 7 | 15566 | Htr7 | | NM_008315 | NS | 0.078352547
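A downloaded gene table of this shape could be compared across biosets with a few lines of code. The sketch below hard-codes the three rows shown on the slide and selects genes that carry a numeric score in both biosets. Treating "NS" as "no score reported" is an assumption, since the slide does not define it, and the type and function names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class GeneRow(NamedTuple):
    gene_name: str
    entrez_id: int
    symbol: str
    genbank: str
    bioset1: Optional[float]
    bioset2: Optional[float]


def parse_score(text: str) -> Optional[float]:
    """Convert a score cell to a float; "NS" is assumed to mean no score."""
    return None if text.strip().upper() == "NS" else float(text)


# Rows copied from the mockup table above.
ROWS: List[GeneRow] = [
    GeneRow("Mus musculus contactin 1", 12805, "Cntn1", "NM_007727",
            parse_score("0.588131"), parse_score("0.812506913")),
    GeneRow("Mus musculus preproenkephalin 1", 18619, "Penk1", "NM_001002927",
            parse_score("0.277487726"), parse_score("0.920335884")),
    GeneRow("Mus musculus 5-hydroxytryptamine (serotonin) receptor 7", 15566,
            "Htr7", "NM_008315", parse_score("NS"), parse_score("0.078352547")),
]


def scored_in_both(rows: List[GeneRow]) -> List[str]:
    """Return gene symbols that have a numeric score in both biosets."""
    return [r.symbol for r in rows if r.bioset1 is not None and r.bioset2 is not None]


if __name__ == "__main__":
    print(scored_in_both(ROWS))
```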

Page 36:

Host Factor Biosets (“Omics” Experiment Details)

Download Host Factors · Download All · Download Primary Results · Visualize Protein Network · Run Analysis
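The first bioset is described as derived with "Fishers summary-statistic" through MADAM. As general background, and not as a description of how MADAM is implemented, Fisher's method combines k independent p-values through the statistic X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below uses the closed-form tail probability for even degrees of freedom; the function name and the p-values in the usage lines are made up for illustration.

```python
from math import exp, log
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # equals X / 2
    # Chi-square survival function with 2k degrees of freedom:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total


if __name__ == "__main__":
    # Hypothetical per-study p-values for one gene.
    print(fisher_combined_p([0.04, 0.20, 0.01]))
```

With a single p-value the function returns that value unchanged, which is a quick sanity check on the formula.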

Page 37:

Host Factor Biosets (“Omics” Experiment Details)

Visualize Protein Network · Run Analysis

ViPR Protein-Protein Interactions and Pathway Visualization

Please choose the type of data that you would like to view relating to your selection(s):

• First-Degree (Direct) Interactions
• Second-Degree Interactions
• Functional Modules
• Metabolic Pathways

Visualize · Cancel
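The difference between first-degree and second-degree interactions can be sketched on a small undirected graph. The node names below are placeholders, not real protein interactions, and the function names are hypothetical; the slides do not describe how ViPR computes these sets.

```python
from typing import Dict, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from interaction pairs."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def first_degree(graph: Dict[str, Set[str]], hits: Set[str]) -> Set[str]:
    """Proteins that interact directly with any hit, excluding the hits."""
    found: Set[str] = set()
    for hit in hits:
        found |= graph.get(hit, set())
    return found - hits


def second_degree(graph: Dict[str, Set[str]], hits: Set[str]) -> Set[str]:
    """Proteins exactly two interaction steps away from the hits."""
    direct = first_degree(graph, hits)
    found: Set[str] = set()
    for node in direct:
        found |= graph.get(node, set())
    return found - direct - hits


if __name__ == "__main__":
    # Placeholder chain A - B - C - D; "A" stands in for a bioset hit.
    g = build_graph([("A", "B"), ("B", "C"), ("C", "D")])
    print(first_degree(g, {"A"}), second_degree(g, {"A"}))
```

The node and edge counts reported for a visualization would then be the size of the selected node set and the number of interaction pairs among those nodes.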

Page 38:

Visualize Host Factor Interactions

PROTEIN-PROTEIN INTERACTIONS INFORMATION
Number of “hits” from Interesting Gene List: 1
Number of Nodes: $num_nodes
Number of Edges: $num_edges

Save Analysis

Release Date: Jan 20, 2011

Page 39: Acknowledgments

• U.T. Southwestern
  – Richard Scheuermann (PI)
  – Burke Squires
  – Jyothi Noronha
  – Victoria Hunt
  – Eva Sadat
  – Brett Pickett
  – Yun Zhang
• Vecna
  – Chris Larsen
  – Al Ramsey
• LANL
  – Catherine Macken
  – Mira Dimitrijevic
• U.C. Davis
  – Nicole Baumgarth
• USDA
  – David Suarez
• Sage Analytica
  – Robert Taylor
  – Lone Simonsen
• U. Washington
  – Michael Gale
• Northrop Grumman
  – Ed Klem
  – Mike Atassi
  – Jon Dietrich
  – Patty Berger
  – Jawwad Cheema
  – Zhiping Gu
  – Sherry He
  – Wenjie Hua
  – Wei Jen
  – Sanjeev Kumar
  – Xiaomei Li
  – Jason Lucas
  – Bruce Quesenberry
  – Barbara Rotchford
  – Prabhu Shankar
  – Hongbo Su
  – Bryan Walters
  – Sam Zaremba
  – Liwei Zhou
• U. Washington
  – Lynn Law
  – Richard Green
• IRD SWG
  – Gillian Air, OMRF
  – Carol Cardona, Univ. Minnesota
  – Adolfo Garcia-Sastre, Mt Sinai
  – Elodie Ghedin, Univ. Pittsburgh
  – Martha Nelson, Fogarty
  – Daniel Perez, Univ. Maryland
  – Gavin Smith, Duke Singapore
  – Dave Stallknecht, Univ. Georgia
  – David Topham, Rochester
  – Richard Webby, St Jude
• ViPR SWG
  – Richard Kuhn, Purdue
  – Raul Andino, UCSF
  – Slobodan Paessler, UTMB Galveston
  – X.J. Meng, VBI
  – Colin Parrish, Cornell
  – Elliot Lefkowitz, UAB
  – Carla Kuiken, LANL
  – David Knipe, Harvard
  – Matthew Henn, Broad Institute
  – Richard Whitley, UAB
  – John Young, Salk Institute

N01AI40041
N01AI2008038

