Integrative Bioinformatics using Cytoscape (and R2)
(Bio)Chemistry versus Molecular Biology
…some basic concepts
(Bio)Chemistry
• Concentrations
• Molecular structures
• Reaction equations
• Quantitative
• Defined experimental setup
Molecular Biology
• Regulation
• Large biomolecules
• Large scale processes
• Qualitative
• Complex experimental setup (by necessity!)
Human Genetics
Molecular Biology: New techniques
Integrative Bioinformatics needed
(Deep) Sequencing – Arrays – Proteomics
• Quantitative analysis
– handling large datasets
– statistics
• Capturing complexity
– integration
– graphs
• Integrative Bioinformatics: Integrated Bioinformaticians!
Integrative Bioinformatics: An example
Integrative Bioinformatics: What they did
1. Sequence genome; assign gene function using protein sequence and structural similarities (Bonneau et al., 2004; Ng et al., 2000).
2. Perturb cells: environmental factors; knockouts (Baliga et al., 2004; Kaur et al., 2006; Kottemann et al., 2005).
3. Measure changes: microarrays (Baliga et al., 2004; Kaur et al., 2006; Whitehead et al., 2006).
4. Integrate diverse data (mRNA levels, evolutionarily conserved associations among proteins, metabolic pathways, cis-regulatory motifs, etc.) with the cMonkey algorithm to reduce data complexity and identify subsets of genes that are coregulated in certain environments (biclusters) (Reiss et al., 2006).
5. Using the machine learning algorithm Inferelator, construct a dynamic network model of the influence of changes in environmental factors (EFs) and transcription factors (TFs) on the expression of coregulated genes (Bonneau et al., 2006).
6. Explore the network with Gaggle, a framework for data integration and software interoperability, to formulate and then experimentally test hypotheses that drive additional iterations of steps 2–6 (Shannon et al., 2006).
Integrative Bioinformatics: Their framework
Integrative Bioinformatics: results
Goes to show that:
1. Aggregate: combine data from different sources
2. Search/Visualize: filter
3. Analyze/Feedback: algorithms
Need for adaptable software
Goal: Facilitate ideas
Cytoscape - Network Visualization and Analysis
• Freely available, open-source Java software, easily extensible (plugin API)
• Visualizing networks (e.g. molecular interaction networks)
• Analyzing networks with gene expression profiles and other cell state data (GO, proteomics, …)
• Used in several hundred analyses in recent literature
• Continuity guaranteed
An example Cytoscape work-flow
Cytoscape Workflow
1. Load Networks (Import network data into Cytoscape)
2. Load Attributes (Get data about networks into Cytoscape)
3. Analyze and Visualize Networks
4. Prepare for Publication
• A specific example of this workflow:
– Cline et al. “Integration of biological networks and gene expression data using Cytoscape”, Nature Protocols, 2, 2366-2382 (2007).
Networks as graphs
• A network is a collection of
– Nodes (or vertices)
– Edges connecting nodes (directed or undirected, weighted, multiple edges, self-edges)
• Nodes can represent proteins, genes, metabolites, or groups of these (e.g. complexes): any sort of object
• Edges can be physical or functional interactions, activators, regulators, reactions: any sort of relation
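The node/edge vocabulary above can be made concrete with a small data structure. The following is a minimal Python sketch, not Cytoscape's own data model: the class names `Edge` and `Network`, the node names, and the interaction labels (`"pp"`, `"pd"`) are illustrative assumptions. It shows directed and undirected edges, weights, and a self-edge living in one network.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """One relation between two nodes (hypothetical structure, for illustration)."""
    source: str
    target: str
    interaction: str = "pp"   # assumed label, e.g. "pp" = protein-protein
    directed: bool = False
    weight: float = 1.0


@dataclass
class Network:
    """A network is a set of nodes plus a list of edges (multiple edges allowed)."""
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_edge(self, source, target, **kwargs):
        # Adding an edge implicitly registers both end points as nodes.
        self.nodes.update((source, target))
        edge = Edge(source, target, **kwargs)
        self.edges.append(edge)
        return edge

    def neighbors(self, node):
        """Nodes reachable from `node` in one step, respecting edge direction."""
        result = set()
        for e in self.edges:
            if e.source == node:
                result.add(e.target)
            elif e.target == node and not e.directed:
                result.add(e.source)
        return result


net = Network()
net.add_edge("TF1", "GeneA", interaction="pd", directed=True)  # regulator -> target
net.add_edge("ProtA", "ProtB", weight=0.8)                     # undirected, weighted
net.add_edge("ProtA", "ProtA")                                 # self-edge (e.g. a homodimer)

print(sorted(net.nodes))
print(sorted(net.neighbors("ProtA")))
```

Storing edges as a list rather than a set of node pairs is what allows multiple edges between the same two nodes, for example one physical and one functional interaction.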
Creating a network
Free-format Text and Excel Files
Specify Input File
Define Columns
Text Parsing Options
Preview
Pathways: plenty of resources
http://pathguide.org: over 240 pathway databases
All kinds of network data…
• Physical interactions
– Protein-protein interactions
– Protein-DNA interactions
– Metabolic interactions
• Functional interactions
– Co-expression relations
– Genetic interactions
– Knockout/siRNA targets
Pre-formatted Network Files
• Cytoscape supports many popular file formats:
– SIF (Simple Interaction Format)
– GML (Graph Markup Language)
– XGMML (eXtensible Graph Markup and Modeling Language)
– BioPAX (Biological Pathway Data)
– PSI-MI 1 & 2.5 (Proteomics Standards Initiative, Molecular Interactions)
– SBML Level 2 (Systems Biology Markup Language)
• Available for download from data sources (URLs, web services, formatted table files)
Internet Databases
• Cytoscape version 2.6
– Web service clients: import networks directly from several trusted internet resources
– IntAct (EMBL-EBI)
– PathwayCommons (collection of data resources)
– NCBI Entrez Gene
– Many more will be included...
Interaction Database Search
Import
Visualize and Analyze
What are Attributes?
• Any data that describes or provides details about the nodes and edges in the network
– Gene Expression Data
– Mass Spectrometry Data
– Protein Structure Information
– Gene Ontology (GO) terms
– Interaction Confidence Values, etc.
• Cytoscape supports multiple data types
– Numbers (integers, floats)
– Text (strings)
– Logical (booleans)
– Lists…
Attribute Management
Node or Edge ID
Specific Attribute Tabs
Select Attributes for Display
String and floating-point attributes
Load Attributes: Import Attribute Files
• Map data about networks onto networks.
• Attributes can be loaded in many of the same ways as networks.
– Import pre-formatted attribute files
– Import formatted text or Excel files
– Create attributes manually in attribute editor
– Load attributes from web services
– ID mapping through node attributes
ID Mapping
• Mapping identifiers from one source to another is a major challenge
• Multiple levels of IDs, e.g. probe -> gene -> peptide -> protein
• Cytoscape provides ID mapping through the BioMart web service of EBI to convert the IDs
• Not perfect but sufficient
• Additional mapping mechanism underway
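The multi-level mapping problem (probe -> gene -> protein) can be illustrated with plain lookup tables. This is a minimal Python sketch under stated assumptions: it does not call BioMart or any Cytoscape API, the function name `map_ids` is made up for this example, and all identifiers in the tables are hypothetical placeholders. It shows why mapping is "not perfect": one ID can map to several targets, and some IDs map to nothing.

```python
def map_ids(ids, *tables):
    """Chain one-to-many lookup tables, e.g. probe -> gene -> protein.

    Returns (mapped, unmapped): `mapped` gives every reachable target ID per
    input ID; `unmapped` lists inputs that were lost at some level.
    """
    mapped, unmapped = {}, []
    for original in ids:
        current = {original}
        for table in tables:
            # Follow every current ID one level down; unknown IDs drop out.
            current = {target for c in current for target in table.get(c, ())}
        if current:
            mapped[original] = sorted(current)
        else:
            unmapped.append(original)
    return mapped, unmapped


# Hypothetical lookup tables; real ones would come from a mapping service.
probe_to_gene = {"probe_1": ["GENE_A"], "probe_2": ["GENE_A", "GENE_B"]}
gene_to_protein = {"GENE_A": ["PROT_A1", "PROT_A2"]}

mapped, unmapped = map_ids(["probe_1", "probe_2", "probe_3"],
                           probe_to_gene, gene_to_protein)
print(mapped)
print(unmapped)
```

Reporting the unmapped IDs separately, rather than silently dropping them, makes it visible how much data is lost at each conversion.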
Visual Data Integration
1. Network Data
YDR382W pp YDL130W
YDR382W pp YFL039C
YFL039C pp YCL040W
YFL039C pp YHR179W
2. Attribute Data
ExpressionValue
YCL040W = 0.542
YDL130W = -0.123
YDR382W = -0.058
YFL039C = 0.192
YHR179W = 0.078
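To show how the two inputs on this slide fit together, here is a minimal Python sketch that reads the SIF lines and the attribute lines and joins them on node ID. It is not Cytoscape's importer: the function names are assumptions, and splitting on whitespace is a simplification (real SIF files may be tab-delimited to allow spaces in names).

```python
from collections import defaultdict

# The network and attribute data shown on the slide.
SIF_TEXT = """\
YDR382W pp YDL130W
YDR382W pp YFL039C
YFL039C pp YCL040W
YFL039C pp YHR179W
"""

ATTRIBUTE_TEXT = """\
ExpressionValue
YCL040W = 0.542
YDL130W = -0.123
YDR382W = -0.058
YFL039C = 0.192
YHR179W = 0.078
"""


def parse_sif(text):
    """Return (source, interaction, target) triples from SIF text."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank line or isolated node: no edge to record
        source, interaction, *targets = fields
        # A SIF line may list several targets for the same source.
        edges.extend((source, interaction, target) for target in targets)
    return edges


def parse_node_attributes(text):
    """Return (attribute_name, {node_id: value}) from 'ID = value' lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    name = lines[0].split()[0]  # first line names the attribute
    values = {}
    for ln in lines[1:]:
        node_id, _, raw = ln.partition("=")
        values[node_id.strip()] = float(raw)
    return name, values


edges = parse_sif(SIF_TEXT)
attr_name, expression = parse_node_attributes(ATTRIBUTE_TEXT)

# Join network and attribute data on the node ID.
degree = defaultdict(int)
for source, _, target in edges:
    degree[source] += 1
    degree[target] += 1

for node in sorted(expression):
    print(node, "degree:", degree[node], attr_name + ":", expression[node])
```

The shared node identifier is the only link between the two files, which is why consistent IDs matter before any visual mapping is applied.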
VizMapper
List of Data Attributes
Default Visual Style Editor
List of Visual Attributes
Mapping definition
List of Visual Styles
Types of mappings
• Continuous
– Continuous data mapped to continuous visual attributes (e.g. gene expression levels mapped to node color)
– Continuous data mapped to discrete visual attributes (e.g. p-value categories mapped to node shape)
• Discrete
– Discrete (categorical) data mapped to discrete visual attributes (e.g. GO annotation mapped to node shape)
– Discrete data mapped to continuous visual attributes (e.g. multiple GO terms mapped to pie coloring)
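The difference between a continuous and a discrete mapping can be sketched in a few lines of Python. This is an illustration of the idea only, not VizMapper's implementation: the function names, the blue-white-red gradient, the value range of -1 to 1, and the category-to-shape table are all assumptions chosen for the example.

```python
def continuous_color(value, low=-1.0, high=1.0,
                     low_rgb=(0, 0, 255), mid_rgb=(255, 255, 255),
                     high_rgb=(255, 0, 0)):
    """Continuous mapping: interpolate a number onto a blue-white-red gradient."""
    value = max(low, min(high, value))      # clamp values outside the range
    mid = (low + high) / 2.0
    if value <= mid:
        t, start, end = (value - low) / (mid - low), low_rgb, mid_rgb
    else:
        t, start, end = (value - mid) / (high - mid), mid_rgb, high_rgb
    return tuple(round(a + t * (b - a)) for a, b in zip(start, end))


# Discrete mapping: each category gets one fixed visual value (assumed table).
SHAPE_BY_CATEGORY = {"transcription factor": "diamond", "kinase": "triangle"}


def discrete_shape(category, default="ellipse"):
    """Discrete mapping: look the category up, fall back to a default shape."""
    return SHAPE_BY_CATEGORY.get(category, default)


# Expression values like those on the "Visual Data Integration" slide.
for node, expr in {"YCL040W": 0.542, "YDL130W": -0.123}.items():
    print(node, continuous_color(expr))

print(discrete_shape("kinase"), discrete_shape("unknown category"))
```

A continuous mapping needs a range and an interpolation rule, while a discrete mapping needs an explicit table plus a default for values that are not listed.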
Network Filtering
Several Layout Algorithms
Spring-embedded
Circular
Hierarchical
Linkout
• Nodes and Edges act as hyperlinks to external databases.
• User-configurable URLs
• Collection of the biological results for the publication
Prepare for Publication
• Fine-tune the figures
• Manual layout manipulation options (align, scale, rotate)
• Manually override visual styles
– place labels, change colors, etc.
Finalizing the Figures
• Publication-quality graphics in several formats: PDF, EPS, SVG, PNG, JPEG, and BMP
• Export Session to HTML for Web
Cytoscape: So what?
The big pro-Cytoscape argument: EXTENSIBLE
• Plugins, Plugins, Plugins
– In our case, plugins enabled extended array data analysis
Cytoscape is Extensible
• Cytoscape is open-source and free software
• A plugin interface allows any programmer to write their own extensions to Cytoscape
• Plugins represent the primary biological analysis mechanism in Cytoscape
• Plugins are distributed from a central Cytoscape database and can be installed while running
• Several dozen plugins currently available (www.cytoscape.org/plugins/index.php)
Hello World Plugin
http://cytoscape.org/cgi-bin/moin.cgi/Hello_World_Plugin
http://cytoscape.org/cgi-bin/moin.cgi/Developer_Homepage
Extending the workflow through plugins
Graph based integration and analysis of molecular biological data
Integrative Bioinformatics in our group
• Aggregate data: 18000+ Affymetrix arrays
– Tumor series
– Public data
– Experiments
• Manipulate cell lines; lentiviral library
• Search/Visualize/Selection: R2
– Statistical cutoffs
– Correlations: R2
– Clinical data coupling
• Analysis/Feedback: R2 and Cytoscape
– Known interactions
– Transcription factor binding
Integrative Bioinformatics in our group
External data sources
Statistical analysis, Perl module
Cytoscape webstart
AMC Plugin
Canonical paths
DB
Patient data, GEO arrays
Algorithms
Array data: Tumor and Experiments
R2-array analysis interface
Cytoscape interface
HGServer
Array data analysis: R2
Mainly work by Jan Koster
R2 interface: Demo
Timeseries in R2 / Cytoscape (Demo)
Timeseries in R2
Integration with Cytoscape through webstart
Timeseries in Cytoscape: Visualization
Timeseries in Cytoscape: Aggregate data
Timeseries in Cytoscape: Search/Filter
Timeseries in Cytoscape: Filter
Timeseries in Cytoscape
TF (green) and partners (red)
Filtering
Coloring, layout
To summarize:
1. Aggregate: combine NOTCH3 knockout data with TF and PPI data
2. Search/Visualize: lay out timeseries, find downstream targets
3. Analyze/Feedback: identify MSX1, knockout in new experiment
More Plugin Examples
• BiNGO (Enriched GO categories found in the sub-network)
• WikiPathways (Visualize curated pathways)
• MCODE (Putative protein complexes)
• GenePro (Protein-Protein interaction cluster visualization)
• jActiveModules (Search for significant sub-networks)
• NetworkAnalyzer (Statistical analysis of networks)
• Agilent Literature Search (Network creation)
• CyGoose (Gaggle communication)
• See http://cytoscape.org/plugins for many more
Timeseries and BiNGO: Aggregate
Timeseries and BiNGO: Analyze
GOlorize plug-in (Pasteur)
• Node placement on the basis of both the connection structure (the edges) and the class structure (GO)
• A modification of the classic force-directed layout algorithm
• Beyond GO classes, other class information can be used through attributes (e.g. active modules, complexes)
GOlorize plug-in interface
Default settings for the class attractive force and separation factor
Class-directed network layout
Example: genetic interaction network
• Standard spring-embedded layout algorithm in Cytoscape
• Spring-embedded layout algorithm with GO colour-coding
• Final results of the GOlorize layout algorithm in Cytoscape
Garcia et al. Bioinformatics 2007
Find Network Clusters - MCODE Plugin
• Network clusters are highly interconnected sub-networks that may be also partly overlapping
• Clusters in a protein-protein interaction network have been shown to represent protein complexes and parts of biological pathways
• Clusters in a protein similarity network represent protein families
• Network clustering is available through the MCODE Cytoscape plugin
Network Clustering
7000 yeast interactions among 3000 proteins
Bader & Hogue, BMC Bioinformatics 2003 4(1):2
Proteasome 26S
Proteasome 20S
Ribosome
RNA Pol core
RNA Splicing
Find Network Motifs - NetMatch plugin
• A network motif is a sub-network that occurs significantly more often than expected by chance alone
• Input: query and target networks, optional node/edge labels
• Output: topological query matches as subgraphs of target network
• Supports: subgraph matching, node/edge labels, label wildcards, approximate paths
• http://alpha.dmi.unict.it/~ctnyu/netmatch.html
Finding query sub-networks
Query Results
Ferro et al. Bioinformatics 2007
Finding Signaling Pathways
• Potential signaling pathways from plasma membrane to nucleus via cytoplasm
Signaling pathway example: MAP Kinase Cascade
Figure labels: Ras, Raf-1, Mek, MAPK, TFs; Nucleus - Growth Control, Mitogenesis
NetMatch query
Shortest path between subgraph matches
NetMatch Results
Find Active Subnetworks
• Active modules are sub-networks that show differential expression over user-specified conditions or time-points
– Microarray gene-expression attributes
– Mass-spectrometry protein abundance
• Method
– Calculate z-score/node, ZA score/subgraph, correct for random expression data sampling
– Score over multiple experimental conditions
– Simulated annealing-based search method is used to find the high-scoring networks
Ideker T, Ozier O, Schwikowski B, Siegel AF Bioinformatics. 2002;18 Suppl 1:S233-40
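The scoring step listed under "Method" can be sketched as follows. This is a simplified Python illustration for a single experimental condition, based on the formulas in the cited Ideker et al. (2002) paper as I understand them: each node's p-value is converted to a z-score, the subgraph score is the sum of z-scores divided by the square root of the subgraph size, and that score is calibrated against randomly sampled gene sets of the same size. The function names, the p-values, and the sample count are assumptions, and the simulated-annealing search itself is not shown.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def node_z(p_value):
    """Convert a p-value (strictly between 0 and 1) into a z-score."""
    return NormalDist().inv_cdf(1.0 - p_value)


def aggregate_z(z_scores):
    """Aggregate score of a subgraph with k nodes: sum(z_i) / sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def calibrated_score(subgraph, z_by_node, n_samples=1000, seed=0):
    """Correct the aggregate score against random gene sets of the same size."""
    rng = random.Random(seed)
    k = len(subgraph)
    z_a = aggregate_z([z_by_node[n] for n in subgraph])
    all_z = list(z_by_node.values())
    background = [aggregate_z(rng.sample(all_z, k)) for _ in range(n_samples)]
    return (z_a - mean(background)) / stdev(background)


# Hypothetical p-values for differential expression of six genes.
p_values = {"g1": 0.001, "g2": 0.004, "g3": 0.6, "g4": 0.5, "g5": 0.3, "g6": 0.8}
z_by_node = {gene: node_z(p) for gene, p in p_values.items()}

active = calibrated_score(["g1", "g2"], z_by_node)
inactive = calibrated_score(["g3", "g6"], z_by_node)
print("candidate active module:", active)
print("unremarkable module:", inactive)
```

Dividing by the square root of the subgraph size keeps scores comparable between small and large subgraphs, and the random-sampling correction removes the baseline expected for that size.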
Finding active modules
Ideker T et al. Science 2001; Bioinformatics 2002
jActiveModules plug-in
Input: interaction network and p-values for gene expression values over several conditions
Output: significant sub-networks that show differential expression over one or several conditions
Cerebral: Cellular location and expression data
Concluding
• Cytoscape is a proven, valuable tool for integrative bioinformatics
• Easily extensible: well suited to answering new biological research questions
• Analyses can be tedious for biologists; it is up to bioinformaticians to translate these into simple workflows
• Therefore: bioinformaticians, integrate into wet-lab research groups!
Some notes…
• Plugin lifetime
– Maintenance
– Interoperability
• Visualization issues…
– Standard biologist layouts
– Fancy visuals
Cytoscape 3.0 aims to solve these issues (amongst others)
Availability
• Cytoscape:
– http://cytoscape.org
– [email protected]
– [email protected]
• R2
– Available shortly through http://humangenetics-amc.nl
– Keep yourself posted on http://groups.google.com/group/r2-announce