GenMAPP
A Software Tool for Analyzing Genome-Scale Data in the Context of Biological Pathways and the Gene Ontology
J. David Gladstone Institute of Cardiovascular Disease, UCSF
Overview
• Intro to GenMAPP
- GenMAPP analysis example
• Advanced features
Analyzing Large-Scale Data in the Context of Biological Pathways
• Which genes are expressed in my dataset?
• What biological processes are important in my data model?
• New insight into underlying biology
Analyzing Large-Scale Data in the Context of Biological Pathways
• View data in the context of known biology
• Rather than seeing which individual genes are changed, pathway analysis emphasizes processes that are changed
• Biologists are familiar with pathways, so it is a natural way of sharing data
Cardiomyopathy: Downregulated genes
Fatty Acid Degradation Pathway
Cardiomyopathy Data on Fatty Acid Degradation Pathway
GenMAPP: Gene Map Annotator and Pathway Profiler
• Visualize gene expression and other genomic data on biological pathways and other groupings of genes
• Global analysis identifies significantly changed processes and functional groups
www.GenMAPP.org
GenMAPP
• Developed in the Conklin lab at Gladstone as an internal tool for dealing with microarray data
• Approximately 12,000 registered users to date
• 100% Free!!
• Used in 150 - 200 publications
• Open source, code available at SourceForge.net
• Current version for Windows only (Coded in VB)
Time Course Data on Cell Cycle Pathway
SNPs with Predicted Effects
http://alto.compbio.ucsf.edu/LS-SNP/
SNPs that Predispose to Myocardial Infarction
Tobin et al, European Heart Journal 2004
• 547 acute MI cases; 505 controls
• 58 SNPs in 35 genes
=> SNPs in 5 different genes showed statistical association with MI
• Study spans 19 pathways
=> 4 of 5 hits are on a single pathway
SNPs and Myocardial Infarction
Tobin et al, European Heart Journal 2004
SNP Data in GenMAPP
• Visualization: Distribution of SNPs per gene
• Prioritization: Mapping SNP annotations onto pathways
• Analysis: Interpreting SNP data in the context of biological pathways
Future directions: High-resolution visualization of individual SNPs with the ability to overlay data
MAPPFinder
Global comparison of changes in dataset to changes expected by chance
Inputs: Experimental Data, Gene Ontology terms, GenMAPP Pathways
Output: Pathways and GO terms with significant changes
Originally developed as a separate application by Scott Doniger*
* Doniger et al. Genome Biology 4(1):R7
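A "global comparison of changes in dataset to changes expected by chance" can be illustrated with a hypergeometric z score, in the form described in the cited Doniger et al. paper. The sketch below is a minimal illustration, not MAPPFinder's code: the function name and the counts in the usage line are hypothetical.

```python
import math

def overrep_z_score(n_measured, n_changed, n_in_term, n_changed_in_term):
    """Z score for one pathway or GO term.

    n_measured        -- genes measured in the whole experiment (N)
    n_changed         -- genes meeting the user's criterion (R)
    n_in_term         -- measured genes in this pathway or GO term (n)
    n_changed_in_term -- genes in the term that meet the criterion (r)
    """
    p = n_changed / n_measured
    expected = n_in_term * p
    # The last factor is the finite-population correction of the
    # hypergeometric distribution (sampling without replacement).
    variance = n_in_term * p * (1 - p) * (1 - (n_in_term - 1) / (n_measured - 1))
    return (n_changed_in_term - expected) / math.sqrt(variance)

# Hypothetical counts: 8 of 40 pathway genes changed, against 500 of 10,000 overall.
z = overrep_z_score(n_measured=10000, n_changed=500, n_in_term=40, n_changed_in_term=8)
print(round(z, 2))
```

A positive z score means more changed genes in the term than expected by chance; a value near zero means the term tracks the dataset-wide rate.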
MAPPFinder Browser
GenMAPP Relationship Schema
User Dataset (GEX)
Criterion: Blue | Gene ID: 1415904_at | Gene ID System: Affymetrix
Pathway MAPP
Gene Name: Lpl | Gene ID: 16956 | Gene ID System: EntrezGene
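The schema links a probe set ID in the user dataset to a gene box on a MAPP that is keyed by a different ID system. A minimal sketch of that lookup follows, using the Lpl example from the slide. The data structures, the function name, and the "White" default are hypothetical stand-ins, not GenMAPP's actual database layout.

```python
# User dataset (GEX): one row per measured ID, with the criterion it met.
dataset = [
    {"gene_id": "1415904_at", "system": "Affymetrix", "criterion": "Blue"},
]

# Gene database: relates IDs in one system to IDs in another.
relations = {
    ("Affymetrix", "1415904_at"): ("EntrezGene", "16956"),
}

# Pathway MAPP: gene boxes, each keyed by its own ID system.
mapp_genes = [
    {"name": "Lpl", "gene_id": "16956", "system": "EntrezGene"},
]

def color_gene_boxes(mapp_genes, dataset, relations, default="White"):
    """Return {gene name: color} by following dataset IDs through the relations."""
    colors_by_id = {}
    for row in dataset:
        key = (row["system"], row["gene_id"])
        colors_by_id[key] = row["criterion"]
        linked = relations.get(key)
        if linked is not None:
            # Keep the first color seen for a linked ID.
            colors_by_id.setdefault(linked, row["criterion"])
    return {
        gene["name"]: colors_by_id.get((gene["system"], gene["gene_id"]), default)
        for gene in mapp_genes
    }

print(color_gene_boxes(mapp_genes, dataset, relations))  # {'Lpl': 'Blue'}
```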
GenMAPP Supported Species
Fruit fly, Human, Mouse, Rat, Worm, Yeast, Zebrafish, Chicken, Dog, Cow
By request: Chimp, Frog, Fugu (F. rubripes), Honey bee, Mosquito, Pufferfish (T. nigroviridis)
GenMAPP Supported Gene IDs
Annotations: InterPro, EMBL, OMIM, Pfam, Gene Ontology
Species-specific: MGI, RGD, SGD, WormBase, ZFIN, HUGO, FlyBase
Gene IDs: Affymetrix, Entrez Gene, RefSeq (protein only), UniGene, UniProt, Ensembl, PDB
Available MAPP Archives
Download all MAPPs through Downloader in GenMAPP
Contributed MAPPs: Hand-curated pathways created at GenMAPP.org or submitted by GenMAPP users. More than 70 MAPPs for human, mouse and rat.
Inferred MAPPs: Inferred from human contributed MAPPs, using homology information from HomoloGene and Ensembl.
Tissue-Specific MAPPs (human and mouse only): Based on the analysis of two microarray datasets generated by the Genomics Institute of the Novartis Research Foundation.
GO Sample MAPPs: A partial collection of GO terms formatted as GenMAPP MAPP files, each containing between 100 and 300 genes. GO MAPPs are formatted as lists of genes and do not contain any graphics other than the gene object and the label.
SGD metabolic MAPPs (yeast only): Derived from the yeast pathways at SGD.
KEGG converted MAPPs: Converted from the Pathway Resource at the Kyoto Encyclopedia of Genes and Genomes.
http://www.genmapp.org/featured_mapps.html
Input Data
• Data in spreadsheet summary format
• NO raw data
• Data should include metrics that you want to use as cutoffs: avg signal, ratio, fold, signal quality, p-value, cluster ID, other statistics
• Include ALL genes measured in experiment, DO NOT pre-filter
• Choose optimal primary gene ID
• Custom annotation can be useful (Database includes standard annotation)
Example: Group Comparison Experiment
• Fold changes between groups
• p-value associated with fold
• Average signal per group
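The three metrics listed for a group comparison can be computed per gene before import. The sketch below assumes log2-scale signals and uses an exact permutation test for the p-value. The slides do not say which test was used, so the permutation test is a stand-in chosen because it needs only the standard library. The function name and sample values are hypothetical.

```python
from itertools import combinations
from statistics import mean

def group_summary(control, treated):
    """Average signal per group, log2 fold change, and a two-sided
    exact permutation p-value for one gene (signals assumed log2)."""
    avg_control, avg_treated = mean(control), mean(treated)
    log2_fold = avg_treated - avg_control
    pooled = list(control) + list(treated)
    total, k = sum(pooled), len(treated)
    hits = count = 0
    # Enumerate every way of relabeling k of the samples as "treated".
    for idx in combinations(range(len(pooled)), k):
        t_sum = sum(pooled[i] for i in idx)
        diff = t_sum / k - (total - t_sum) / (len(pooled) - k)
        count += 1
        if abs(diff) >= abs(log2_fold) - 1e-12:  # tolerance for float rounding
            hits += 1
    return {
        "avg_control": avg_control,
        "avg_treated": avg_treated,
        "log2_fold": log2_fold,
        "p_value": hits / count,
    }

# Hypothetical log2 signals for one probe set, four replicates per group.
result = group_summary([7.0, 7.2, 6.9, 7.1], [8.1, 8.3, 8.0, 8.2])
print(result)
```

Exhaustive enumeration is only practical for small replicate counts. With larger groups, a sampled permutation test or a t-test from a statistics package would be the usual choice.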
GenMAPP Workflow
Input: Pre-Processed Formatted Data (with statistics, metrics)
Steps: Import Data, Set Color Criteria, Display Data on Pathways, Gene Ontology analysis, Export Pathways to the Web, Create/Edit/Convert Pathways
Tools: Expression Dataset Manager, Drafting Board, MAPPBuilder, Converter, MAPPFinder, MAPPSets
Example: Analysis of Complex Time-Course Data
Challenges:
• How to represent your data in an intuitive manner
• How to analyze patterns rather than specific comparisons
Approach:
• Set up hypotheses to test
• Attach global statistics (e.g. ANOVA) and pattern recognition
• Efficiently import data into GenMAPP
• Visualize cluster and time-point data (GenMAPP 2.1-NEW)
• Global analysis of pathway/ontologies (MAPPFinder)
• Export results to the web/for publication
Set Up Hypotheses to Test
Build a MAPP to Test a Hypothesis
• Use literature and previous knowledge about the model you are studying to build a list of candidate genes or a pathway.
Step 1):
• Collect a list of gene IDs
• Import them using the MAPPBuilder function
• Organize into a biological pathway along with predictions of expected changes.
Salomonis N, et al. Genome Biol. 2005 6:R12–R12.16
Import List of Genes in MAPPBuilder
Gene Layout on the Drafting Board
Dataset: Mouse Uterine Pregnancy Time-Course
Experiment Design:
• Analyzed 7 time-points (3-8 replicates):
  - Non-pregnant mice
  - 14.5, 16.5 and 17.5 days post fertilization
  - 18.5 days (term pregnancy)
  - 6 hours and 24 hours postpartum
• Hybridized to mouse 11k Affymetrix arrays
Analysis:
• Normalized and adjusted expression (gcrma R)
• Performed a global F-test (multtest R)
• Hierarchical and partitioned clustering (hopach R)
Salomonis N, et al. Genome Biol. 2005 6:R12–R12.16
HOPACH Clustering
Hierarchical Ordered Partitioning and Collapsing Hybrid
1. Use global f-test to filter probeset list down to 3500 entries.
2. Cluster fold changes for each replicate compared to non-pregnant baseline mean.
3. Take the top level cluster (left) and re-associate with expression data.
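Steps 1 and 2 above can be sketched for a single probe set as follows. This is an illustration only: the original analysis used the R packages named on the previous slide, the sketch computes a plain one-way F statistic without the p-value or multiple-testing adjustment that multtest provides, and the HOPACH clustering itself is not reproduced. Function names and values are hypothetical, and signals are assumed to be on a log2 scale.

```python
from statistics import mean

def one_way_f(groups):
    """One-way ANOVA F statistic; groups is a list of replicate lists,
    one list per time-point."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def folds_vs_baseline(groups, baseline_index=0):
    """Log2 fold change of every replicate relative to the baseline group mean
    (on a log2 scale the fold change is a difference)."""
    base = mean(groups[baseline_index])
    return [[v - base for v in g]
            for i, g in enumerate(groups) if i != baseline_index]

# Hypothetical log2 signals: non-pregnant baseline, then two later time-points.
probe = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_f(probe))
print(folds_vs_baseline(probe))
```

Ranking probe sets by a statistic like this and keeping the top entries gives the filtered list that is then clustered on the per-replicate fold changes.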
GenMAPP Input
Import File Design:
• Include all probe data (not just filtered)
• Include the following columns of data:
  - Multtest p-values
  - HOPACH clusters
  - Average group expression values
  - Fold changes (all relevant pairwise comparisons)
  - Gene Database system code
Salomonis N, et al. Genome Biol. 2005 6:R12–R12.16
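An import file with these columns might be assembled as in the sketch below. Everything specific here is an assumption made for illustration: the column headers, the tab delimiter, the placeholder values, and the use of "X" as the system code for Affymetrix IDs, which should be checked against the GenMAPP documentation for the gene database in use.

```python
import csv
import io

# Hypothetical column headers covering the items listed on the slide.
COLUMNS = [
    "Gene ID", "System Code", "F-test p-value", "HOPACH cluster",
    "Avg NP", "Avg 18.5d", "Fold 18.5d vs NP",
]

# One placeholder row per probe set; the numbers are invented.
rows = [
    {"Gene ID": "1415904_at", "System Code": "X", "F-test p-value": 0.001,
     "HOPACH cluster": 2, "Avg NP": 8.4, "Avg 18.5d": 9.6,
     "Fold 18.5d vs NP": 1.2},
]

def write_import_file(rows, handle):
    """Write all probe sets (unfiltered) as tab-delimited text with a header row."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

buffer = io.StringIO()
write_import_file(rows, buffer)
print(buffer.getvalue())
```

Passing an open file handle instead of the `io.StringIO` buffer writes the same content to disk.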
GenMAPP Expression Dataset Manager
Import Text File into GenMAPP
• Tell GenMAPP which columns have non-numeric data.
Establishing Rules for Coloring Gene Boxes:
• Design criteria that capture any patterns you want to see.
• Here we want:
  - Fold change gradients for up- and down-regulated genes in time-point comparisons (Color Sets)
  - Different colors assigned to each HOPACH cluster
Salomonis N, et al. Genome Biol. 2005 6:R12–R12.16
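The two kinds of rule described above can be modeled as an ordered list of criteria where the first match decides the color. The sketch below is a conceptual model, not GenMAPP's criterion syntax. The thresholds, color names, and column keys are hypothetical, and fold changes are assumed to be log2 values.

```python
def fold_gradient_rules(fold_col, p_col, p_cut=0.05):
    """Ordered (label, color, test) rules for one time-point comparison.
    Stronger changes come first so they win over weaker ones."""
    def rule(label, color, fold_test):
        return (label, color,
                lambda g: g[p_col] < p_cut and fold_test(g[fold_col]))
    return [
        rule("Up, 2-fold or more", "dark red", lambda f: f >= 1.0),
        rule("Up, 1.5-fold or more", "light red", lambda f: f >= 0.585),
        rule("Down, 2-fold or more", "dark blue", lambda f: f <= -1.0),
        rule("Down, 1.5-fold or more", "light blue", lambda f: f <= -0.585),
    ]

def apply_color_set(gene, rules, no_match="grey"):
    """Return the color of the first rule the gene satisfies."""
    for _label, color, test in rules:
        if test(gene):
            return color
    return no_match

# Second color set: one fixed color per cluster (hypothetical palette).
CLUSTER_COLORS = {1: "orange", 2: "green", 3: "purple"}

def cluster_color(gene, no_match="grey"):
    return CLUSTER_COLORS.get(gene["cluster"], no_match)

gene = {"fold": 1.3, "p": 0.01, "cluster": 2}
rules = fold_gradient_rules("fold", "p")
print(apply_color_set(gene, rules), cluster_color(gene))
```

Building one rule list per time-point comparison gives one color set per comparison, alongside the single cluster color set.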
Viewing Time-Course Data on MAPPs
Method 1)
• View criteria one at a time on pathways of interest.
Method 2)
• View clusters directly on the pathway.
Method 3)
• View all criteria of interest simultaneously.
Single Color Set View
Multiple Color Set View
Advanced Features
• Customizing a Gene Database / Creating a Gene Database for a non-supported species
  => Implement GenMAPP for a novel model species
• Create your own pathway MAPPs
  => Implement GenMAPP for a novel model species
  => Author novel pathways based on your discoveries
• High-throughput export of browsable HTML pathway archive
  => For interactive web display of data on pathway archive
International Gene Trap Consortium
GenMAPP team
The GenMAPP program can be downloaded at www.GenMAPP.org
Questions?
[email protected]@gladstone.ucsf.edu
Bruce Conklin, Alex Pico, Alex Zambon, Karen Vranizan, Nathan Salomonis, Kam Dahlquist
http://groups.google.com/group/GenMAPP