Page 1: cytoscape

Integrative Bioinformatics using Cytoscape (and R2)

Page 2: cytoscape

(Bio)Chemistry versus Molecular Biology

…some basic concepts

(Bio)Chemistry

• Concentrations
• Molecular structures
• Reaction equations
• Quantitative
• Defined experimental setup

Molecular Biology

• Regulation
• Large biomolecules
• Large-scale processes
• Qualitative
• Complex experimental setup (by necessity!)

Human Genetics

Page 3: cytoscape

Molecular Biology: New techniques
Integrative Bioinformatics needed

(Deep)Sequencing – Arrays – Proteomics

• Quantitative analysis
  – handling large datasets
  – statistics

• Capturing complexity
  – integration
  – graphs

• Integrative Bioinformatics: Integrated Bioinformaticians!

Page 4: cytoscape

Integrative Bioinformatics: An example

Page 5: cytoscape

Integrative Bioinformatics: What they did

1. Sequence the genome; assign gene function using protein sequence and structural similarities (Bonneau et al., 2004; Ng et al., 2000).

2. Perturb cells with environmental factors (EFs) and gene knockouts (Baliga et al., 2004; Kaur et al., 2006; Kottemann et al., 2005).

3. Measure the changes with microarrays (Baliga et al., 2004; Kaur et al., 2006; Whitehead et al., 2006).

4. Integrate diverse data (mRNA levels, evolutionarily conserved associations among proteins, metabolic pathways, cis-regulatory motifs, etc.) with the cMonkey algorithm to reduce data complexity and identify subsets of genes that are coregulated in certain environments (biclusters) (Reiss et al., 2006).

5. Use the machine-learning algorithm Inferelator to construct a dynamic network model of how changes in EFs and transcription factors (TFs) influence the expression of coregulated genes (Bonneau et al., 2006).

6. Explore the network with Gaggle, a framework for data integration and software interoperability, to formulate hypotheses and test them experimentally, driving additional iterations of steps 2–6 (Shannon et al., 2006).

Page 6: cytoscape

Integrative Bioinformatics: Their framework

Page 7: cytoscape

Integrative Bioinformatics: results

Page 8: cytoscape

Goes to show that:

1. Aggregate

2. Search/Visualize

3. Analyze/Feedback

• Combine data from different sources

• Filter

• Algorithms

Need for adaptable software

Goal: Facilitate ideas

Page 9: cytoscape

Cytoscape - Network Visualization and Analysis

• Freely available, open-source (Java) software, easily extensible through a plugin API

• Visualizing networks (e.g. molecular interaction networks)

• Analyzing networks with gene expression profiles and other cell state data (GO, proteomics, …)

• Used in several hundred analyses in recent literature

• Continuity guaranteed

Page 10: cytoscape

An example Cytoscape work-flow

Page 11: cytoscape

Cytoscape Workflow

1. Load Networks (Import network data into Cytoscape)

2. Load Attributes (Get data about networks into Cytoscape)

3. Analyze and Visualize Networks

4. Prepare for Publication

• A specific example of this workflow:

– Cline, et al. “Integration of biological networks and gene expression data using Cytoscape”, Nature Protocols, 2, 2366-2382 (2007).

Page 12: cytoscape

Networks as graphs

• A network is a collection of
  – Nodes (or vertices)
  – Edges connecting nodes (directed or undirected, weighted, multiple edges, self-edges)

Nodes can represent proteins, genes, metabolites, or groups of these (e.g. complexes) - any sort of object

Edges can be either physical or functional interactions, activators, regulators, reactions - any sort of relations
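The graph model on this slide can be sketched with plain Python containers. This is a minimal illustration, not Cytoscape's internal data model: the `Network` and `Edge` names are hypothetical, edges are kept in a list so that multiple edges between the same pair of nodes and self-edges are both allowed, and each edge carries an interaction type, a direction flag and a weight.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    source: str
    target: str
    interaction: str = "pp"   # interaction type, e.g. "pp" for protein-protein
    directed: bool = False
    weight: float = 1.0


@dataclass
class Network:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)  # a list, so parallel edges are kept

    def add_edge(self, source, target, **kwargs):
        """Add an edge and its endpoints; the edge's list index serves as its ID."""
        self.nodes.update((source, target))
        self.edges.append(Edge(source, target, **kwargs))
        return len(self.edges) - 1

    def neighbors(self, node):
        """Nodes reachable from `node` in one step, respecting edge direction."""
        result = set()
        for e in self.edges:
            if e.source == node:
                result.add(e.target)
            elif not e.directed and e.target == node:
                result.add(e.source)
        return result

    def degree(self, node):
        """Number of edge endpoints at `node`; a self-edge counts twice."""
        return sum((e.source == node) + (e.target == node) for e in self.edges)
```

For example, after `net.add_edge("A", "B")` and `net.add_edge("B", "D", directed=True)` on a fresh network, `net.neighbors("D")` is empty because the only edge at D points into it.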

Page 13: cytoscape

Cytoscape Workflow

1. Load Networks (Get network data into Cytoscape)

2. Load Attributes (Get data about networks into Cytoscape)

3. Analyze and Visualize Networks

4. Prepare for Publication

Page 14: cytoscape

Creating a network

Page 15: cytoscape

Free-format Text and Excel Files

Specify Input File

Define Columns

Text Parsing Options

Preview

Page 16: cytoscape

Pathways: plenty of resources

http://pathguide.org: over 240 pathway databases

Page 17: cytoscape

All kinds of network data…

• Physical interactions
  – Protein–protein interactions
  – Protein–DNA interactions
  – Metabolic interactions

• Functional interactions
  – Co-expression relations
  – Genetic interactions
  – Knockout/siRNA targets

Page 18: cytoscape

Pre-formatted Network Files

• Cytoscape supports many popular file formats:

SIF (Simple Interaction Format)

GML (Graph Markup Language)

XGMML (eXtensible Graph Markup and Modeling Language)

BioPAX (Biological Pathway Exchange)

PSI-MI 1 & 2.5 (Proteomics Standards Initiative Molecular Interaction format)

SBML Level 2 (Systems Biology Markup Language)

• Available for download from data sources (URLs, web-services, formatted table files)
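The SIF format listed above is simple enough to read by hand: each line names a source node, an interaction type and one or more target nodes, and a line holding a single name declares a node without edges. The sketch below is an illustration, not Cytoscape's own reader; the rule "split on tabs if the line contains a tab, otherwise on any whitespace" is an assumption of this sketch.

```python
def parse_sif(text):
    """Parse SIF text into (nodes, edges); edges are (source, interaction, target)."""
    nodes, edges = set(), []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: tab-delimited if a tab is present, otherwise whitespace-delimited
        fields = [f.strip() for f in line.split("\t")] if "\t" in line else line.split()
        source = fields[0]
        nodes.add(source)
        if len(fields) == 1:
            continue  # a node that has no edges
        if len(fields) == 2:
            raise ValueError(f"interaction type without a target: {raw!r}")
        interaction = fields[1]
        for target in fields[2:]:  # one line may list several targets
            nodes.add(target)
            edges.append((source, interaction, target))
    return nodes, edges
```

A line such as `YDR382W pp YDL130W YFL039C` therefore yields two `pp` edges that share the same source node.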

Page 19: cytoscape

Internet Databases

• Cytoscape version 2.6
  – Web service clients: import networks directly from several trusted internet resources

IntAct (EMBL-EBI)

PathwayCommons (collection of data resources)

NCBI Entrez Gene

Many more will be included...

Page 20: cytoscape

Interaction Database Search

Import

Visualize and Analyze

Page 21: cytoscape

Cytoscape Workflow

1. Load Networks (Get network data into Cytoscape)

2. Load Attributes (Get data about networks into Cytoscape)

3. Analyze and Visualize Networks

4. Prepare for Publication

Page 22: cytoscape

What are Attributes?

• Any data that describes or provides details about the nodes and edges in the network
  – Gene expression data
  – Mass spectrometry data
  – Protein structure information
  – Gene Ontology (GO) terms
  – Interaction confidence values, etc.

• Cytoscape supports multiple data types
  – Numbers (integers, floats)
  – Text (strings)
  – Logical (booleans)
  – Lists…

Page 23: cytoscape

Attribute Management

Node or Edge ID

Specific Attribute Tabs

Select Attributes for Display

String and floating-point attributes

Page 24: cytoscape

Load Attributes: Import Attribute Files

• Map data about networks onto networks

• Attributes can be loaded in many of the same ways as networks:
  – Import pre-formatted attribute files
  – Import formatted text or Excel files
  – Create attributes manually in the attribute editor
  – Load attributes from web services
  – ID mapping through node attributes
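Importing a formatted text file of attributes amounts to joining rows to node IDs and converting each cell to one of the data types Cytoscape supports (integer, floating point, boolean, string, list). A minimal sketch, assuming a hypothetical tab-delimited layout with a header row, the node ID in the first column, and list values separated by "|"; the type-guessing order is a choice of this sketch, not Cytoscape's import rule.

```python
import csv
import io


def convert(value):
    """Guess a type for one cell: bool, list (assumed '|' separator), int, float or str."""
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    if "|" in value:
        return [convert(v) for v in value.split("|")]
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def load_attributes(text):
    """Map node ID -> {column name: typed value}; empty cells are skipped."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    table = {}
    for row in reader:
        if not row:
            continue
        node_id, values = row[0], row[1:]
        table[node_id] = {name: convert(v) for name, v in zip(header[1:], values) if v != ""}
    return table
```

Nodes that appear in the file but not in the network would simply be ignored when the table is later joined onto the network's nodes.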

Page 25: cytoscape

ID Mapping

• Mapping identifiers from one source to another is a major challenge

• Multiple levels of IDs, e.g. probe -> gene -> peptide -> protein

• Cytoscape provides ID conversion through EBI's BioMart web service

• Not perfect, but sufficient

• Additional mapping mechanisms are underway
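Mapping across several ID levels can be pictured as composing lookup tables, where one ID may map to several and some IDs drop out along the way. A minimal sketch covering two of the levels (probe -> gene -> protein); all identifiers and tables below are hypothetical, and in practice the tables would come from a service such as BioMart rather than being hard-coded.

```python
def compose_mappings(ids, *tables):
    """Push each starting ID through successive one-to-many lookup tables.

    Returns {start_id: set of final IDs}; an ID that fails to map at some level
    ends up with an empty set, so losses along the chain stay visible.
    """
    result = {}
    for start in ids:
        current = {start}
        for table in tables:
            current = {out for cur in current for out in table.get(cur, ())}
        result[start] = current
    return result


# Hypothetical identifiers, for illustration only
probe_to_gene = {"probe_1": ["GENE_A"], "probe_2": ["GENE_A", "GENE_B"], "probe_3": []}
gene_to_protein = {"GENE_A": ["PROT_A1", "PROT_A2"], "GENE_B": ["PROT_B"]}

mapped = compose_mappings(["probe_1", "probe_2", "probe_3"], probe_to_gene, gene_to_protein)
unmapped = [probe for probe, proteins in mapped.items() if not proteins]
```

Keeping the unmapped IDs explicit, rather than silently dropping them, makes it easier to judge whether a mapping is "sufficient" for a given analysis.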

Page 26: cytoscape

Cytoscape Workflow

1. Load Networks (Get network data into Cytoscape)

2. Load Attributes (Get data about networks into Cytoscape)

3. Analyze and Visualize Networks

4. Prepare for Publication

Page 27: cytoscape

Visual Data Integration

1. Network Data

2. Attribute Data

YDR382W pp YDL130W
YDR382W pp YFL039C
YFL039C pp YCL040W
YFL039C pp YHR179W

ExpressionValue
YCL040W = 0.542
YDL130W = -0.123
YDR382W = -0.058
YFL039C = 0.192
YHR179W = 0.078

VizMapper
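The integration step on this slide can be sketched by joining the attribute values to the nodes of the network and then applying a continuous mapping from expression value to node color. The sketch uses the four interactions and five expression values shown on the slide; the blue-white-red gradient and the range of -1 to 1 are assumptions for illustration, not Cytoscape defaults.

```python
edges = [("YDR382W", "pp", "YDL130W"), ("YDR382W", "pp", "YFL039C"),
         ("YFL039C", "pp", "YCL040W"), ("YFL039C", "pp", "YHR179W")]
expression = {"YCL040W": 0.542, "YDL130W": -0.123, "YDR382W": -0.058,
              "YFL039C": 0.192, "YHR179W": 0.078}


def lerp(a, b, t):
    """Linear interpolation between two RGB tuples, rounded to integers."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))


def expression_to_color(value, low=-1.0, high=1.0):
    """Continuous mapping (assumed gradient): low -> blue, 0 -> white, high -> red."""
    blue, white, red = (0, 0, 255), (255, 255, 255), (255, 0, 0)
    value = max(low, min(high, value))  # clamp values outside the range
    if value < 0:
        return lerp(white, blue, value / low)
    return lerp(white, red, value / high)


# Join: every node in the network that has an expression value gets a color
nodes = {n for s, _, t in edges for n in (s, t)}
node_colors = {n: expression_to_color(expression[n]) for n in nodes if n in expression}
```

Nodes without a value are left out of `node_colors`; a visual style would normally fall back to a default color for them.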

Page 28: cytoscape

VizMapper

List of Data Attributes

Default Visual Style Editor

List of Visual Attributes

Mapping definition

List of Visual Styles

Page 29: cytoscape

Types of mappings

• Continuous
  – Continuous data mapped to continuous visual attributes (e.g. gene expression levels mapped to node color)
  – Continuous data mapped to discrete visual attributes (e.g. p-value categories mapped to node shape)

• Discrete
  – Discrete (categorical) data mapped to discrete visual attributes (e.g. GO annotation mapped to node shape)
  – Discrete data mapped to continuous visual attributes (e.g. multiple GO terms mapped to pie coloring)
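Apart from the continuous-to-continuous case, the other three mapping types can each be written as a small function. A sketch with hypothetical shapes, p-value cutoffs and GO labels; in Cytoscape these mappings are configured in the VizMapper rather than written as code.

```python
def pvalue_to_shape(p):
    """Continuous -> discrete: bin p-values into categories (hypothetical cutoffs)."""
    if p < 0.001:
        return "diamond"
    if p < 0.05:
        return "triangle"
    return "ellipse"


GO_SHAPES = {"kinase activity": "rectangle", "DNA binding": "hexagon"}  # hypothetical


def go_to_shape(term, default="ellipse"):
    """Discrete -> discrete: look up the category, falling back to a default shape."""
    return GO_SHAPES.get(term, default)


def go_terms_to_pie(terms):
    """Discrete -> continuous: give each distinct GO term an equal pie-slice angle (degrees)."""
    distinct = sorted(set(terms))
    return {t: 360 / len(distinct) for t in distinct} if distinct else {}
```

The discrete-to-discrete case is a plain lookup table, while the continuous-to-discrete case needs explicit breakpoints; choosing those breakpoints is the main design decision.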

Page 30: cytoscape

Network Filtering

Page 31: cytoscape

Several Layout Algorithms

Spring-embedded

Circular

Hierarchical
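A spring-embedded layout treats edges as springs and nodes as mutually repelling particles, and iterates until the positions settle. A compact sketch in the style of the Fruchterman-Reingold algorithm, one common force-directed variant and not necessarily the exact algorithm Cytoscape implements; the iteration count, cooling factor and drawing size are assumptions.

```python
import math
import random


def spring_layout(nodes, edges, iterations=200, size=1.0, seed=0):
    """Fruchterman-Reingold-style force-directed layout; returns {node: (x, y)}."""
    rng = random.Random(seed)
    nodes = list(nodes)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size * math.sqrt(1.0 / max(len(nodes), 1))  # ideal edge length
    temperature = size / 10  # maximum step, reduced every iteration ("cooling")
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes: k^2 / d
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f
        # Attraction along edges (the springs): d^2 / k
        for a, b in edges:
            if a == b:
                continue  # self-edges exert no force
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Move each node along its net displacement, capped by the temperature
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, temperature)
                pos[n][0] += dx / length * step
                pos[n][1] += dy / length * step
        temperature *= 0.97
    return {n: (x, y) for n, (x, y) in pos.items()}
```

The pairwise repulsion loop is quadratic in the number of nodes, which is why layouts for large networks usually rely on approximations or different algorithms.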

Page 32: cytoscape

Linkout

• Nodes and Edges act as hyperlinks to external databases.

• User-configurable URLs

• Collection of the biological results for the publication

Page 33: cytoscape

Cytoscape Workflow

1. Load Networks (Get network data into Cytoscape)

2. Load Attributes (Get data about networks into Cytoscape)

3. Analyze and Visualize Networks

4. Prepare for Publication

Page 34: cytoscape

Prepare for Publication

• Fine-tune the figures

• Manual layout manipulation options (align, scale, rotate)

• Manually override visual styles
  – place labels, change colors, etc.

Page 35: cytoscape

Finalizing the Figures

• Publication-quality graphics in several formats: PDF, EPS, SVG, PNG, JPEG, and BMP

• Export session to HTML for the Web

Page 36: cytoscape

Cytoscape: So what?

The big pro-Cytoscape argument: EXTENSIBLE

• Plugins, plugins, plugins
  – In our case, plugins enabled extended array data analysis

Page 37: cytoscape

Cytoscape is Extensible

• Cytoscape is open-source and free software

• A plugin interface allows any programmer to write their own extensions to Cytoscape

• Plugins represent the primary biological analysis mechanism in Cytoscape

• Plugins are distributed from a central Cytoscape database and can be installed while Cytoscape is running

• Several dozen plugins are currently available (www.cytoscape.org/plugins/index.php)

Page 38: cytoscape

Hello World Plugin

http://cytoscape.org/cgi-bin/moin.cgi/Hello_World_Plugin

http://cytoscape.org/cgi-bin/moin.cgi/Developer_Homepage

Page 39: cytoscape

Extending the workflow through plugins

Graph-based integration and analysis of molecular biological data

Page 40: cytoscape

Integrative Bioinformatics in our group

• Aggregate data: 18,000+ Affymetrix arrays
  – Tumor series
  – Public data
  – Experiments

• Manipulate cell lines: lentiviral library

• Search/Visualize/Selection: R2
  – Statistical cutoffs
  – Correlations: R2
  – Clinical data coupling

• Analysis/Feedback: R2 and Cytoscape
  – Known interactions
  – Transcription factor binding
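The "statistical cutoffs" and "correlations" steps can be illustrated by turning expression profiles into a co-expression network: compute the Pearson correlation between every pair of genes across arrays and keep the pairs that pass a cutoff. A minimal sketch; the cutoff of 0.8 is hypothetical, and R2 itself may compute and filter correlations differently.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(profiles, cutoff=0.8):
    """Return (gene1, gene2, r) for every gene pair with |r| >= cutoff.

    `profiles` maps gene -> expression values across the same arrays.
    """
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if not math.isnan(r) and abs(r) >= cutoff:
            edges.append((g1, g2, r))
    return edges
```

The resulting triples can be written out in SIF form, with the correlation kept as an edge attribute, for loading into Cytoscape.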

Page 41: cytoscape

Integrative Bioinformatics in our group

External data sources

Statistical analysis

Perl module

Cytoscape webstart

AMC Plugin

Canonical paths

DB

Patient data

GEO arrays

Algorithms

Array data: Tumor and Experiments

R2-array analysis interface

Cytoscape interface

HGServer

Page 42: cytoscape

Array data analysis: R2

Mainly the work of Jan Koster

Page 43: cytoscape

R2 interface: Demo

Page 44: cytoscape

R2 interface

Page 45: cytoscape

R2 interface

Page 46: cytoscape

R2 interface

Page 47: cytoscape

R2 interface

Page 48: cytoscape

R2 interface

Page 49: cytoscape

Timeseries in R2 / Cytoscape (Demo)

Page 50: cytoscape

Timeseries in R2

Page 51: cytoscape

Timeseries in R2

Page 52: cytoscape

Timeseries in R2: Integration with Cytoscape through webstart

Page 53: cytoscape

Timeseries in Cytoscape: Visualization

Page 54: cytoscape

Timeseries in Cytoscape: Aggregate data

Page 55: cytoscape

Timeseries in Cytoscape: Search/Filter

Page 56: cytoscape

Timeseries in Cytoscape: Filter

Page 57: cytoscape

Timeseries in Cytoscape

Page 58: cytoscape

Timeseries in Cytoscape

Page 59: cytoscape

TF (green) and partners (red)

Page 60: cytoscape

Filtering

Page 61: cytoscape

Filtering

Page 62: cytoscape

Coloring, layout

Page 63: cytoscape

Summarizing:

1. Aggregate

2. Search/Visualize

3. Analyze/Feedback

• Combine NOTCH3 knockout data with TF and PPI data

• Lay out the time series / find downstream targets

• Identify MSX1 for knockout in a new experiment

Page 64: cytoscape

More Plugin Examples

• BiNGO (Enriched GO categories found in the sub-network)

• WikiPathways (Visualize curated pathways)

• MCODE (Putative protein complexes)

• GenePro (Protein-Protein interaction cluster visualization)

• jActiveModules (Search for significant sub-networks)

• NetworkAnalyzer (Statistical analysis of networks)

• Agilent Literature Search (Network creation)

• CyGoose (Gaggle communication)

• See http://cytoscape.org/plugins for many more
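BiNGO-style enrichment asks whether a GO category is over-represented in a selected sub-network compared with the whole network. A one-sided hypergeometric test is one standard way to answer this; the sketch below computes it from exact binomial coefficients. Real analyses also correct for testing many categories at once (e.g. Benjamini-Hochberg), which is not shown.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p-value, P(X >= k), for X ~ Hypergeometric.

    N: genes in the whole network (or reference set)
    K: genes in the reference set annotated with the GO term
    n: genes in the selected sub-network
    k: sub-network genes annotated with the term
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

For instance, `hypergeom_enrichment_p(k=5, n=20, K=50, N=5000)` scores a 20-gene sub-network in which 5 genes carry a term that only 50 of 5000 reference genes carry; these numbers are hypothetical.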

Page 65: cytoscape

Timeseries and BiNGO: Aggregate

Page 66: cytoscape

Timeseries and BiNGO: Analyze

Page 67: cytoscape

Timeseries and BiNGO

Page 68: cytoscape

Timeseries and BiNGO

Page 69: cytoscape

GOlorize plug-in (Pasteur)

• Node placement on the basis of both the connection structure (the edges) and the class structure (GO)

• A modification of the classic force-directed layout algorithm

• Beyond GO classes, other class information can be used through attributes (e.g. active modules, complexes)

Page 70: cytoscape

GOlorize plug-in interface

Default settings for the class attractive force and separation factor

Class-directed network layout

Page 71: cytoscape

Example: genetic interaction network

Standard Spring-embedded layout algorithm in Cytoscape

Page 72: cytoscape

Example: genetic interaction network

Spring-embedded layout algorithm with GO colour-coding

Page 73: cytoscape

Example: genetic interaction network

Final results of the GOlorize layout algorithm in Cytoscape

Garcia et al. Bioinformatics 2007

Page 74: cytoscape

Find Network Clusters - MCODE Plugin

• Network clusters are highly interconnected sub-networks that may also partly overlap

• Clusters in a protein-protein interaction network have been shown to represent protein complexes and parts of biological pathways

• Clusters in a protein similarity network represent protein families

• Network clustering is available through the MCODE Cytoscape plugin
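MCODE (Bader & Hogue, 2003) first weights every node by how densely its immediate neighborhood is connected, then grows clusters outward from high-weight seed nodes. The sketch below covers only the weighting stage, under our reading of the paper: take the node plus its neighbors, find the highest k-core of that neighborhood, and weight the node by k times the density of that core. Computing density as |E| / (|V|(|V|-1)/2), i.e. for a graph without self-loops, is an assumption here; the cluster-growing stage is not shown, and this is not the plugin's code.

```python
def k_core(adj, k):
    """Nodes of the k-core: repeatedly drop nodes with fewer than k remaining neighbours."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:
                alive.remove(v)
                changed = True
    return alive


def mcode_vertex_weight(adj, v):
    """k_max * density of the highest k-core of v's closed neighbourhood (sketch).

    `adj` maps node -> set of neighbours (undirected, no self-loops).
    """
    hood = adj[v] | {v}
    sub = {u: adj[u] & hood for u in hood}  # subgraph induced by the neighbourhood
    k, core = 0, set(hood)
    while True:
        nxt = k_core(sub, k + 1)
        if not nxt:
            break
        k, core = k + 1, nxt
    if len(core) < 2:
        return 0.0
    edge_count = sum(len(sub[u] & core) for u in core) / 2
    density = edge_count / (len(core) * (len(core) - 1) / 2)  # assumes no self-loops
    return k * density
```

In a triangle with one extra pendant node, the triangle members get weight 2.0 while the pendant node gets 1.0, which is the kind of contrast MCODE uses to pick cluster seeds.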

Page 75: cytoscape

Network Clustering

7000 yeast interactions among 3000 proteins

Page 76: cytoscape

Bader & Hogue, BMC Bioinformatics 2003 4(1):2

Page 77: cytoscape

Proteasome 26S

Proteasome 20S

Ribosome

RNA Pol core

RNA Splicing

Bader & Hogue, BMC Bioinformatics 2003 4(1):2

Page 78: cytoscape

Find Network Motifs - Netmatch plugin

• A network motif is a sub-network that occurs significantly more often than expected by chance

• Input: query and target networks, optional node/edge labels

• Output: topological query matches as subgraphs of target network

• Supports: subgraph matching, node/edge labels, label wildcards, approximate paths

• http://alpha.dmi.unict.it/~ctnyu/netmatch.html
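NetMatch searches a target network for subgraphs that match a query graph, optionally requiring node labels to agree and allowing wildcard labels. The sketch below is a brute-force backtracking matcher for exact, label-aware matching; the "?" wildcard symbol is an assumption, edges are treated as undirected, approximate paths are not covered, and it is only practical for small queries. Each symmetry of the query (e.g. the 6 orderings of a triangle) yields a separate mapping.

```python
def find_matches(query_adj, query_labels, target_adj, target_labels, wildcard="?"):
    """Yield dicts mapping query nodes -> target nodes (injective, non-induced matches).

    Graphs are undirected adjacency dicts {node: set(neighbours)}; labels are dicts.
    A query node without a label, or labelled with `wildcard`, matches any target node.
    """
    order = sorted(query_adj, key=lambda q: -len(query_adj[q]))  # most constrained first

    def label_ok(q, t):
        ql = query_labels.get(q, wildcard)
        return ql == wildcard or ql == target_labels.get(t)

    def extend(mapping):
        if len(mapping) == len(order):
            yield dict(mapping)
            return
        q = order[len(mapping)]
        used = set(mapping.values())
        for t in target_adj:
            if t in used or not label_ok(q, t):
                continue
            # every already-mapped query neighbour of q must map to a neighbour of t
            if all(mapping[qn] in target_adj[t] for qn in query_adj[q] if qn in mapping):
                mapping[q] = t
                yield from extend(mapping)
                del mapping[q]

    yield from extend({})
```

Because it enumerates candidates exhaustively, the running time grows quickly with query size; dedicated tools prune far more aggressively.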

Page 79: cytoscape

Finding query sub-networks

Query Results

Ferro et al. Bioinformatics 2007

Page 80: cytoscape

Finding Signaling Pathways

• Potential signaling pathways from plasma membrane to nucleus via cytoplasm

Raf-1

Mek

MAPK

TFs

Nucleus: Growth Control, Mitogenesis

MAP Kinase Cascade

Ras

NetMatch query

Shortest path between subgraph matches

Signaling pathway example

NetMatch Results
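Connecting NetMatch hits by shortest paths, as on this slide, can be done with a plain breadth-first search when the network is unweighted. A minimal sketch; the node names echo the slide's MAP kinase cascade, but the toy network linking them is hypothetical and is not the slide's data.

```python
from collections import deque


def shortest_path(adj, sources, targets):
    """BFS from any node in `sources` to the nearest node in `targets`.

    `adj` maps node -> iterable of neighbours; returns the path as a list, or None.
    """
    targets = set(targets)
    previous = {s: None for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while node is not None:  # walk back to the source that reached us
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical toy network shaped like the cascade on the slide
cascade = {"Ras": ["Raf-1"], "Raf-1": ["Ras", "Mek"], "Mek": ["Raf-1", "MAPK"],
           "MAPK": ["Mek", "TF"], "TF": ["MAPK"]}
path = shortest_path(cascade, ["Ras"], ["TF"])  # ['Ras', 'Raf-1', 'Mek', 'MAPK', 'TF']
```

Starting the search from all nodes of one match at once and stopping at any node of the other match gives the shortest connection between the two subgraphs.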

Page 81: cytoscape

Find Active Subnetworks

• Active modules are sub-networks that show differential expression over user-specified conditions or time points
  – Microarray gene-expression attributes
  – Mass-spectrometry protein abundance

• Method
  – Calculate a z-score per node and a Z_A score per subgraph, corrected against random sampling of the expression data
  – Score over multiple experimental conditions
  – A simulated-annealing-based search method is used to find the high-scoring networks

Ideker T, Ozier O, Schwikowski B, Siegel AF Bioinformatics. 2002;18 Suppl 1:S233-40
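The scoring on this slide can be written out in a few lines, following Ideker et al. (2002): each node's p-value is converted to a z-score with the inverse normal CDF, a subnetwork of k nodes scores z_A = (sum of z_i) / sqrt(k), and that score is standardized against the mean and standard deviation of z_A for randomly sampled node sets of the same size. The sketch covers only the single-condition scoring with Monte Carlo sampling; the simulated-annealing search is not shown, and the sample count and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def node_z(p):
    """z_i = inverse normal CDF of (1 - p_i); requires 0 < p < 1."""
    return NormalDist().inv_cdf(1 - p)


def aggregate_z(z_values):
    """z_A = sum(z_i) / sqrt(k) for a subnetwork with k nodes."""
    return sum(z_values) / math.sqrt(len(z_values))


def corrected_score(subnet, z, samples=1000, seed=0):
    """Standardize z_A against random node sets of the same size (Monte Carlo).

    `z` maps every node in the network to its z-score.
    """
    rng = random.Random(seed)
    k = len(subnet)
    nodes = sorted(z)
    background = [aggregate_z([z[n] for n in rng.sample(nodes, k)]) for _ in range(samples)]
    return (aggregate_z([z[n] for n in subnet]) - mean(background)) / stdev(background)
```

The correction matters because raw z_A values are not comparable across subnetwork sizes once the search starts favoring larger or smaller modules.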

Page 82: cytoscape

Finding active modules

Ideker T et al. Science 2001; Bioinformatics 2002

jActiveModules plug-in

Input: interaction network and p-values for gene expression values over several conditions

Output: significant sub-networks that show differential expression over one or several conditions

Page 83: cytoscape

Cerebral: Cellular location and expression data

Page 84: cytoscape

Concluding

• Cytoscape is a proven, valuable tool for integrative bioinformatics

• Easily extensible: well suited to answer new biological research questions

• Analyses can be tedious for biologists; it is up to bioinformaticians to translate them into simple workflows

• Therefore: bioinformaticians, integrate into wet-lab research groups!

Page 85: cytoscape

Some notes…

• Plugin lifetime
  – Maintenance
  – Interoperability

• Visualization issues…
  – Standard biologist layouts
  – Fancy visuals

Cytoscape 3.0 aims to solve these issues (amongst others)

Page 86: cytoscape

Availability

• Cytoscape:
  – http://cytoscape.org

• R2:
  – Available shortly through http://humangenetics-amc.nl
  – Keep yourself posted on http://groups.google.com/group/r2-announce