Identifying functional subnetworks in large-scale datasets
Benno Schwikowski, Institut Pasteur – Systems Biology Group
http://systemsbiology.fr


The three levels of this talk

1. Discovery of pathways active in HepC infection

2. Cytoscape plug-ins

3. Cytoscape platform

Hepatitis C infection

• One person out of 30 is infected
• No vaccine exists
• 20% of chronic infections lead to liver fibrosis and cirrhosis
• Frequently requires liver transplants

Studying HepC infection mRNA changes

• 50% of transplant livers become re-infected with Hepatitis C

• Study expression of 7000 genes in re-infected livers after transplantation
  – 1-24 months post-transplant
  – Samples taken at 3-6 month intervals

• 28 biopsies from 11 patients
  – Mixture of hepatocytes, hepatic stellate cells, Kupffer cells, various types of blood cells

• Compare against pre-transplant reference pool

Result of mRNA expression analysis

• Most genes (5968 of 7000) were significantly under- or overexpressed in one or more experiments

• High patient-to-patient variation

Our approach

1. Construct seed network among known molecular players

2. Expand seed network to include differentially expressed genes

3. Identify putative pathways by the Active Modules approach

Seed network

Types of interactions
• Protein-protein
• Protein-DNA
• Phosphorylation
• Activation
• Repression
• Covalent bond
• Methylation

InteractionFetcher plug-in

Purpose
• Dynamically retrieves remote information for selected nodes
  – From SQL database
  – Requests data via XML-RPC protocol

Currently implemented types
• Protein/gene synonyms
• Orthologs
• Sequences (DNA, protein, DNA upstream)
  – Gene, protein,
• Interactions/associations

Options
• Cross-species queries
• Ortholog information from Homologene
• Inferred interactions (interologs)
• Interactive links to source Web pages

100% open-source (client and server)
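A minimal sketch of what an XML-RPC request for a set of selected nodes could look like, using Python's standard xmlrpc.client module. The method name getInteractions and the parameter layout are assumptions made for illustration; the slides do not specify the InteractionFetcher server API. No network connection is opened here; the code only encodes and decodes the request body.

```python
import xmlrpc.client


def build_request(method, node_ids):
    """Encode a call as an XML-RPC request body (the text sent over HTTP)."""
    return xmlrpc.client.dumps((list(node_ids),), methodname=method)


def parse_request(payload):
    """Decode a request body back into (method name, list of node ids)."""
    params, method = xmlrpc.client.loads(payload)
    return method, params[0]


# "getInteractions" is a hypothetical method name, not the real server API.
payload = build_request("getInteractions", ["YDR216W", "YIL056W"])
method, nodes = parse_request(payload)
```

A real client would send such a payload through xmlrpc.client.ServerProxy pointed at the server URL, which is not given in the slides.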

2. Expand seed network

Purpose
• Bring significantly up-/downregulated genes “into the picture”

Approach
• Add interactions with differentially expressed genes (“in silico pull-down”)
  – Use BIND, HPRD databases
  – Only human-curated interactions
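The expansion step can be sketched as a simple filter over a list of curated interactions. This is an illustration under assumptions: the rule used here (keep an interaction when it links a seed node to a differentially expressed gene) is one plausible reading of the "in silico pull-down", and all identifiers are made up.

```python
def expand_seed_network(seed_nodes, curated_interactions, diff_expressed):
    """Add curated interactions linking a seed node to a differentially expressed gene.

    Returns (nodes, edges) of the expanded network. The linking rule is an
    assumption; the slides only say that interactions with differentially
    expressed genes are added.
    """
    seed = set(seed_nodes)
    hits = set(diff_expressed)
    nodes = set(seed)
    edges = []
    for a, b in curated_interactions:
        if (a in seed and b in hits) or (b in seed and a in hits):
            nodes.update((a, b))
            edges.append((a, b))
    return nodes, edges


# Toy data with made-up identifiers.
nodes, edges = expand_seed_network(
    seed_nodes={"S1", "S2"},
    curated_interactions=[("S1", "G7"), ("G7", "G9"), ("S2", "G3")],
    diff_expressed={"G7", "G9"},
)
```

In this toy run only the S1-G7 interaction is kept: G7-G9 touches no seed node, and G3 is not differentially expressed.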

Network after InteractionFetcher expansion


Identifying putative pathways: Why clustering can be problematic

• Many clustering methods are not model-based, so the significance of clusters is unclear

• Any given cluster may not be supported by all experiments – noise problem

• Clusters tend to contain unrelated genes with vaguely similar profiles


How can the clustering issues be addressed? The ActiveModules plug-in

• Define “up-/downregulated” on the basis of a well-defined statistical model

• Also derive clusters supported by only some of the input experiments

• Use additional evidence to focus on “plausible” clusters: protein interactions

Interaction networks

Schwikowski, Uetz, Fields, Nature Biotechnology (2000)

Modular organization of interaction networks


A lot of interaction data is becoming available

Databases on...
• Protein-protein interactions
• Protein-DNA interactions
• Genetic interactions
• Metabolic pathways
• Cell signaling pathways, similarity relationships, literature-based relationships

Multi-criteria detection of modules

[Figure: the two inputs to module detection]
1. Interaction network between genes/proteins
2. Differential gene/protein abundances/activities (axes: genes, experiments under perturbations/conditions)

Scoring a module candidate

Rank adjustment: binomial summation

$P_z = 1 - \Phi(z_{A(j)})$

$p_{A(j)} = \sum_{h=j}^{m} \binom{m}{h} P_z^{h} (1 - P_z)^{m-h}$

$r_{A(j)} = \Phi^{-1}(1 - p_{A(j)})$

m = total number of conditions
j = size of subset of conditions

Final score

Ideker, Ozier, Schwikowski, Siegel (2002): Bioinformatics 18, S233-240
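A minimal sketch of the rank adjustment, assuming the input is one aggregate z-score per condition for a candidate subnetwork A and that z_A(j) denotes the j-th highest of those scores. For each j, the binomial tail gives the probability that at least j of m conditions reach that score by chance, and the result is converted back to a z-like score with the inverse normal CDF. The function name and the clamping constant are choices made here, not taken from the slides.

```python
from math import comb
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def rank_adjusted_scores(z_by_condition):
    """Return r_A(j) for j = 1..m, given one aggregate z-score per condition."""
    z_sorted = sorted(z_by_condition, reverse=True)
    m = len(z_sorted)
    scores = []
    for j, z in enumerate(z_sorted, start=1):
        p_z = 1.0 - _STD_NORMAL.cdf(z)
        # Binomial tail: probability that at least j of m conditions score above z.
        p_a = sum(
            comb(m, h) * p_z**h * (1.0 - p_z) ** (m - h)
            for h in range(j, m + 1)
        )
        # Clamp so inv_cdf stays inside its open domain (0, 1).
        q = min(max(1.0 - p_a, 1e-12), 1.0 - 1e-12)
        scores.append(_STD_NORMAL.inv_cdf(q))
    return scores


scores = rank_adjusted_scores([3.0, 0.0, 3.0])
best_subset_size = scores.index(max(scores)) + 1
```

With two strongly responding conditions and one flat one, the adjusted score peaks at the subset of size two, which is the behavior that lets a module be supported by only some of the experiments.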

Pathways in Rosetta’s compendium (300 conditions)


Active Modules plug-in applied to HCV re-infection data

• Iterative application results in four significant, highly overlapping subnetworks

• Repeat analysis, retaining only “late-active” re-infection experiments
  – Eliminates pathways activated by transplant operation
  – Cutoff: 8 months
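The "late-active" restriction amounts to filtering experiments by sampling time before re-running the analysis. A small sketch, assuming each experiment record carries its time since transplant in months; the field names, the sample records, and the inclusive bound at 8 months are assumptions for illustration.

```python
def late_active(experiments, cutoff_months=8):
    """Keep experiments sampled at or after the cutoff (inclusive bound is an assumption)."""
    return [e for e in experiments if e["months_post_transplant"] >= cutoff_months]


# Made-up records; the real biopsy metadata is not given in the slides.
biopsies = [
    {"id": "b1", "months_post_transplant": 3},
    {"id": "b2", "months_post_transplant": 8},
    {"id": "b3", "months_post_transplant": 14},
]
late_ids = [e["id"] for e in late_active(biopsies)]
```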

Which observations can we make locally?

Network after InteractionFetcher expansion

Bold: Differentially regulated subnetwork
Red/Green: Late-active subnetwork

Cytotalk plug-in

• Analysis of overrepresentation of genes in Gene Ontology classes, using the Cytotalk plug-in and R

• Cytotalk enables interactive communication with
  – C/C++ programs
  – Java processes
  – Python
  – UNIX shell scripts
  – R, R scripts

• Can be run on same machine or any other Internet-connected machine

• Can function as Cytoscape plug-in
• 100% open-source
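Overrepresentation of a Gene Ontology class in a subnetwork is commonly assessed with a one-sided hypergeometric test. The sketch below is a Python stand-in for that calculation; the slides state only that R was used via Cytotalk, so the choice of test and all names here are assumptions.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn from `population` genes,
    of which `annotated` belong to the class of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes, 5 in the class, subnetwork of 5 genes all in the class.
p = enrichment_p_value(population=10, annotated=5, sample=5, hits=5)
```

A small p-value indicates that the class is represented in the subnetwork more often than expected by chance; with many classes tested, a multiple-testing correction would also be needed.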


Some Network Visualization Tools

• Pajek - Slovenia
• Osprey - SLRI, Toronto
• VisANT - BU
• Biolayout - EBI
• GraphViz
• PowerPoint
• Others
• Cytoscape (only open-source biology)

Cytoscape


Cytoscape Basic Concepts

• Objects: visualized as nodes

• Relationships: visualized as edges

• Attributes (name, sequence, source, ...)

• Mapping of attributes to drawing: customizable through visual mapper

Cytoscape file formats

Sample interaction file

YDR216W pd YIL056W
YDR216W pd YKR042W
YDR216W pd YGL096W
YDR216W pd YDR077W
[...]

Sample expression file

GENE   DESC  exp0.sig  exp1.sig  exp0.sig  exp1.sig
GENE0  G0    0.0       0.0       23.2      11.5
GENE1  G1    0.0       0.0       34.6      5.2
GENE2  G2    0.0       0.0       10.0      28.0
GENE3  G3    0.0       0.0       1.64      4.77
[...]
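The interaction file is plain whitespace-separated text, so it can be read with a few lines of code. A minimal sketch of a parser for lines of the form "source interaction-type target"; it also accepts several targets on one line, which the SIF format permits. The function name is hypothetical and this is not Cytoscape's own reader.

```python
def parse_sif(text):
    """Parse SIF text into (source, interaction_type, target) triples."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and lines naming an isolated node
        source, interaction, *targets = fields
        edges.extend((source, interaction, target) for target in targets)
    return edges


sample = """YDR216W pd YIL056W
YDR216W pd YKR042W
YDR216W pd YGL096W
YDR216W pd YDR077W"""
edges = parse_sif(sample)
```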

Cytoscape

Display
• gene & protein expression
• protein interactions (physical and non-physical)
• protein classifications

Analysis plug-in modules

http://www.cytoscape.org/

Java: platform independent + web-start

• 100% open-source

Visual Styles

• Display gene expression as clear text

• Map expression values to node colors using a continuous mapper

• Expression data mapped to node colors
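The idea behind a continuous mapper can be sketched as linear interpolation between two colors. This is an illustration only, not Cytoscape's implementation; the function name, the green-to-red default, and the clamping of out-of-range values are assumptions.

```python
def continuous_color(value, low, high, low_rgb=(0, 255, 0), high_rgb=(255, 0, 0)):
    """Map a numeric attribute to a hex color by linear interpolation.

    Values outside [low, high] are clamped to the nearest end color.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    t = min(max((value - low) / (high - low), 0.0), 1.0)
    channels = [round(a + t * (b - a)) for a, b in zip(low_rgb, high_rgb)]
    return "#{:02x}{:02x}{:02x}".format(*channels)


# Toy expression values mapped onto a 0..10 range.
node_colors = {gene: continuous_color(v, 0, 10) for gene, v in {"G0": 0, "G1": 10}.items()}
```

A discrete mapper would instead look each attribute value up in a fixed table, which suits categorical attributes such as interaction type.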

Multidimensional attributes

Cytoscape, pre-release plug-in
Data from Ideker et al., Science (2001)

Layout

• 16 algorithms available through plug-ins

• Zooming, hide/show, alignment

yFiles Circular


Cytoscape Core – Differences to most other approaches

• Emphasis on data analysis & integration

• No built-in semantics (added by plug-ins)

• Very simple concepts
• Human-readable input formats
• Extensibility

Cytoscape extensibility

• Core: 100% open source Java
  – Plug-in API
  – Plug-ins are independently licensed

• “Just need to do the biology”
• Template code samples

Plug-in

Biomodules plug-in

Prinz S, Avila-Campillo I, Aldridge C, Srinivasan A, Dimitrov K, Siegel AF, and Galitski T. Genome Res. 2004 14: 380-390

Cytoscape Plugins

• Modules in Complex Networks: Iliana Avila-Campillo, Tim Galitski

• Discovering Regulatory and Signaling Circuits in Molecular Interaction Networks: Trey Ideker, Owen Ozier, Benno Schwikowski, Andrew Siegel

• Data Integration in Juvenile Diabetes Research: Marta Janer, Paul Shannon

• A network motif sampler: David Reiss, Benno Schwikowski

Cytoscape Core Features

• Visualize and lay out networks
• Display network data using visual styles
• Easily organize multiple networks
• Bird’s eye view navigation of large networks
• Supports SIF and GML, molecular profiling formats, node/edge attributes
• Functional annotation from GO + KEGG
• Metanode support (hierarchical groupings)
• Extensible through plugins (20 developed)

Baliga et al., Genome Research, June 2004

Collaborators: HCV

Institute for Systems Biology, Seattle, WA

• David Reiss
• Iliana Avila-Campillo
• Vesteinn Thorsson
• Tim Galitski

Collaborators: Cytoscape

• ISB: Leroy Hood, Rowan Christmas

• UCSD: Trey Ideker, Chris Workman

• Memorial Sloan-Kettering Cancer Center: Chris Sander, Gary Bader, Ethan Cerami

• Pasteur: Melissa Cline, Andrea Splendiani, Tero Aittokallio

• Agilent Technologies
• Unilever PLC

• Long-term funding from NIH and participating institutions

Shannon, P., et al. (2003). Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res 13, 2498-504.

Collaborators: Active Networks

• Trey Ideker
• Owen Ozier
• Andrew Siegel

• Richard Karp

Levels of Biological Information

• DNA
• mRNA
• Protein
• Pathways
• Networks
• Cells
• Tissues
• Organs
• Individuals
• Populations
• Ecologies