Functional Module Prediction in Protein Interaction Networks
Ch. Eslahchi
NUS-IPM Workshop, 5-7 April 2011


Identifying Modules from Biological Networks


• Studying the network of protein interactions can help biologists understand the principles of cellular organization and biochemical phenomena.


• Functional modules, as a critical level of the biological hierarchy and as relatively independent units, play a special role in biological networks.

• Since network modules do not occur by chance, identifying them is likely to capture biologically meaningful interactions.


• Naturally, revealing the modular structure of biological networks is a first step toward understanding how cells function and how proteins are organized into a system.


• Many methods that model protein-protein interaction (PPI) data as a graph have been developed to analyze the structure of PPI networks.

• Hierarchical clustering methods have been shown to be a good strategy for metabolic networks and PPI networks.


• Ravasz et al. (2002) analyzed the hierarchical organization of modularity in metabolic networks.

• Brun et al. (2003), Rives and Galitski (2003), and Lu et al. (2004) applied three different clustering methods, based respectively on metrics induced by shortest distances, graphical distances, and probabilistic functions, to analyze the module structure of the yeast protein interaction network as a clustering tree.


• Several papers, such as Spirin and Mirny (2003), Bader and Hogue (2003) and Bu et al. (2003), have also shown that network modules that are densely connected internally but sparsely connected to the rest of the network generally correspond to meaningful biological units such as protein complexes and functional modules.


• Network clustering approaches that have been used to analyze PPI networks include edge-betweenness clustering (Dunn et al., 2005), identification of k-cores (Bader and Hogue, 2003), restricted neighborhood search clustering (RNSC) (King et al., 2004) and the Markov clustering algorithm (MCL) (Pereira-Leal et al., 2004).


• Spirin and Mirny (2003) detected about 50 network modules by combining three methods (enumeration of complete subgraphs, superparamagnetic clustering and Monte Carlo simulation), most of which have been shown to be protein complexes or functional modules.


• Most current methods are partitioning algorithms, meaning that each protein belongs to exactly one module, so they are not suitable for finding overlapping modules. Another problem is that PPI networks are very sparse, while most methods only identify densely connected subgraphs as modules, so only a few modules are detected.


• A novel network clustering method, the Clique Percolation Method (CPM) (Palla et al., 2005), can reveal the overlapping module structure of complex networks.

• However, its application to PPI networks has a clear shortcoming: the method can be restrictive because its basic element is the 3-clique. For example, spoke-like (star-shaped) modules cannot be detected, and when the method is applied to large sparse PPI networks such as the fly and worm PPI networks, only a few modules are detected.
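To make the 3-clique restriction concrete, here is a minimal Python sketch of clique percolation with k = 3: triangles that share two vertices are joined into one community. It is a plain-stdlib illustration written for this note, not Palla et al.'s implementation, and the helper names `undirected`, `triangles` and `cpm_k3` are hypothetical. On the toy graph, the two overlapping triangles form one community, while the star-shaped module contains no triangle and is never reported.

```python
from itertools import combinations

def undirected(edges):
    """Adjacency sets for an undirected edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def triangles(adj):
    """All 3-cliques (triangles) as frozensets of vertices."""
    found = set()
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                found.add(frozenset((u, v, w)))
    return list(found)

def cpm_k3(adj):
    """Clique percolation with k = 3: triangles sharing k - 1 = 2 vertices are joined."""
    tris = triangles(adj)
    parent = list(range(len(tris)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(tris)), 2):
        if len(tris[i] & tris[j]) == 2:
            parent[find(i)] = find(j)

    communities = {}
    for i, t in enumerate(tris):
        communities.setdefault(find(i), set()).update(t)
    return [frozenset(c) for c in communities.values()]

# Two triangles sharing edge b-c, plus a star with hub h and four spokes.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("h", "1"), ("h", "2"), ("h", "3"), ("h", "4")]
print(cpm_k3(undirected(edges)))  # one community {a, b, c, d}; the star is missing
```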


• To overcome this problem, Shi-Hua Zhang et al. (2006) introduced an approach based on line graph transformation (LGT), an important graph-theoretical technique.
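The line graph L(G) has one vertex per edge of G, and two of its vertices are adjacent when the corresponding edges share an endpoint. A standard graph-theory fact shows why this view can help with spoke-like modules: the line graph of a star with n spokes is the complete graph K_n, so a module with no triangles in G becomes a clique in L(G). The sketch below (hypothetical helper `line_graph`) only builds the transformation; it is not Zhang et al.'s full method.

```python
from itertools import combinations

def line_graph(edges):
    """Line graph L(G): one vertex per edge of G; two vertices of L(G) are
    adjacent when the corresponding edges of G share an endpoint."""
    nodes = [frozenset(e) for e in edges]
    adj = {n: set() for n in nodes}
    for e, f in combinations(nodes, 2):
        if e & f:  # the two interactions share a protein
            adj[e].add(f)
            adj[f].add(e)
    return adj

# A spoke-like module: one hub protein interacting with four partners.
star = [("hub", p) for p in ("p1", "p2", "p3", "p4")]
L = line_graph(star)
# Every pair of spokes shares the hub, so L(star) is the complete graph K_4.
print(len(L), all(len(nbrs) == len(L) - 1 for nbrs in L.values()))  # 4 True
```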


• Hongwei Wu et al. (2007) introduced a computational method for predicting functional modules from the distribution of genes (i.e., their presence and order) across multiple microbial genomes. It produces a gene network in which every pair of genes is associated with a score representing their functional relatedness.

• A threshold-based clustering algorithm is then applied to this gene network to obtain modules.
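The clustering step is not specified further here, so this Python sketch uses one of the simplest threshold-based schemes as an assumption: keep gene pairs whose relatedness score is at least a threshold, then report connected components with two or more genes. Wu et al.'s actual algorithm may differ; `threshold_modules` and the toy scores are hypothetical.

```python
def threshold_modules(scores, threshold):
    """Keep gene pairs with score >= threshold; return connected components
    of the resulting graph (singletons are not reported as modules)."""
    adj = {}
    for (g1, g2), s in scores.items():
        adj.setdefault(g1, set())
        adj.setdefault(g2, set())
        if s >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    seen, modules = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:  # depth-first traversal of one component
            g = stack.pop()
            comp.add(g)
            for h in adj[g]:
                if h not in seen:
                    seen.add(h)
                    stack.append(h)
        if len(comp) > 1:
            modules.append(comp)
    return modules

scores = {("g1", "g2"): 0.9, ("g2", "g3"): 0.8, ("g3", "g4"): 0.1, ("g4", "g5"): 0.95}
modules = threshold_modules(scores, 0.5)
print(modules)  # two modules: {g1, g2, g3} and {g4, g5}
```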


• Feng Luo et al. (2007) extended the concept of degree from a single vertex to a subgraph and used it to give a formal definition of a module in a network (MoNet). Roger et al. (2010) developed MoNet into a new algorithm (dMoNet).


• Most efforts have focused on detecting highly connected clusters.
– Peripheral proteins are ignored.
– Modules with other topologies are not identified.
– Modules are isolated, and no interrelationship among them is revealed.


• Traditional clustering algorithms have been applied to protein interaction networks (PINs) to find biological modules.
– This requires transforming the PIN into a weighted network:

• Weight the protein interactions by the number of experiments that support each interaction (Pereira-Leal et al.).

• Weight by shortest path length (Rives et al. and Arnau et al.).

– Drawbacks:
• The weights are artificial.
• The “tie in proximity” problem in hierarchical agglomerative clustering (HAC).


Previous methods:
• Detecting highly connected protein clusters.

Problems:
1. These approaches neglect many peripheral proteins that connect to the core protein clusters with few links, even though these peripheral proteins may represent true interactions that have been experimentally verified.

2. Biologically meaningful protein modules that do not have highly connected topologies are ignored by these approaches.

3. Protein clusters detected by these approaches are usually isolated from each other.


Previous methods:
• Clustering methods have been applied to protein interaction networks to identify biological modules.

Application of clustering analysis to protein interaction networks usually involves transforming them into weighted networks. Weighting:
1. The number of experiments that support the interaction.
2. The length of the shortest path between the two proteins.

Problems:
1. The weighting generates many identical distances, which leads to ambiguous results. One solution is to repeat the algorithm iteratively to eliminate this problem; however, repeated hierarchical clustering may not be computationally feasible for large protein interaction networks at the whole-genome level.
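The “many identical distances” problem is easy to see on a toy graph. This sketch computes shortest-path distances by breadth-first search and counts how many vertex pairs share each distance; in a star with five spokes, ten of the fifteen pairs are tied at distance 2, so an agglomerative step has no principled way to choose which pair to merge first. The function names are illustrative only.

```python
from collections import Counter, deque
from itertools import combinations

def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_histogram(adj):
    """How many vertex pairs share each shortest-path distance."""
    dist = {u: bfs_distances(adj, u) for u in adj}
    hist = Counter()
    for u, v in combinations(adj, 2):
        if v in dist[u]:  # skip pairs in different components
            hist[dist[u][v]] += 1
    return hist

# A hub protein 0 with five partners 1..5.
adj = {0: {1, 2, 3, 4, 5}, **{i: {0} for i in range(1, 6)}}
print(distance_histogram(adj))  # Counter({2: 10, 1: 5})
```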


Previous methods:
• Dividing the network into subnetworks and then identifying modules based on their topology.

Problems:
1. There is no clear definition of a module: the approach does not formally determine which parts of the network are modules.


Some previous module definitions do not exactly match the intuitive concept of a module.


Limitations of Global Algorithms
• Biological networks are incomplete.
• Each vertex can belong to only one module.


139 Modules Obtained from DIP Yeast core PIN


Interconnected Module Network


MoNet

Feng Luo et al. (2007)


MoNet

• A new formal definition of network modules
• A new agglomerative algorithm for assembling modules
• Application to a yeast protein interaction dataset


Degree of Subgraph
• Given a graph G, let S be a subgraph of G (S ⊆ G).

– The adjacency matrix of subgraph S and its neighbors N can be given as:

$$
S_{ij} =
\begin{cases}
1, & \text{if vertices } i \text{ and } j \text{ are connected and either } i \text{ or } j \text{ belongs to } S,\\
0, & \text{otherwise.}
\end{cases}
$$

– Indegree of S, ind(S):

$$\mathrm{ind}(S) = \sum_{j>i} S_{ij}\,\delta(i,j)$$

where $\delta(i,j)$ is 1 if both vertex i and vertex j are in subgraph S and 0 otherwise.

– Outdegree of S, outd(S):

$$\mathrm{outd}(S) = \sum_{j>i} S_{ij}\,\bar{\delta}(i,j)$$

where $\bar{\delta}(i,j)$ is 1 if exactly one of the vertices i and j belongs to S and 0 otherwise.
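A minimal sketch of ind(S) and outd(S) for an unweighted, undirected edge list follows. It counts each edge once, which is our reading of the definition and matches the example values given for the three subgraphs (Ind(2) = 7 is odd, so internal edges cannot be counted twice); the function name `subgraph_degrees` is hypothetical.

```python
def subgraph_degrees(edges, S):
    """Return (ind(S), outd(S)) for a vertex set S of a simple undirected graph.

    ind(S): edges with both endpoints in S; outd(S): edges with exactly one
    endpoint in S. Each edge is counted once.
    """
    S = set(S)
    ind = outd = 0
    for i, j in edges:
        inside = (i in S) + (j in S)  # 0, 1 or 2 endpoints in S
        if inside == 2:
            ind += 1
        elif inside == 1:
            outd += 1
    return ind, outd

# Triangle a-b-c with two edges leaving it.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("a", "e")]
print(subgraph_degrees(edges, {"a", "b", "c"}))  # (3, 2)
```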


Degree of Subgraph: Example

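[Figure: a network with three subgraphs labeled 1, 2 and 3]

• Subgraph 1: Ind(1) = 16, Outd(1) = 5
• Subgraph 2: Ind(2) = 7, Outd(2) = 4
• Subgraph 3: Ind(3) = 8, Outd(3) = 5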


Modularity

• The modularity M of a subgraph S in a given graph G is defined as the ratio of its indegree, ind(S), to its outdegree, outd(S):

$$M = \frac{\mathrm{ind}(S)}{\mathrm{outd}(S)}$$


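New Network Module Definition
• A subgraph S ⊆ G is a module if M > 1.

[Figure: subgraphs labeled 1, 2 and 3]

• Subgraph 1: Ind(1) = 16, Outd(1) = 5, M = 3.2
• Subgraph 2: Ind(2) = 7, Outd(2) = 4, M = 1.75
• Subgraph 3: Ind(3) = 8, Outd(3) = 5, M = 1.6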
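The module test can be written directly from the definition. This sketch assumes that a subgraph with outd(S) = 0 (a whole connected component) gets M = ∞ and therefore counts as a module; that edge case is not covered by the slides. Running it on the three example subgraphs reproduces M = 3.2, 1.75 and 1.6, all above 1.

```python
def modularity(ind, outd):
    """M = ind(S) / outd(S). Assumption: outd(S) == 0 gives M = infinity."""
    return float("inf") if outd == 0 else ind / outd

def is_module(ind, outd):
    """A subgraph is a module if M > 1."""
    return modularity(ind, outd) > 1

# (ind, outd) for subgraphs 1, 2 and 3 of the example.
examples = {1: (16, 5), 2: (7, 4), 3: (8, 5)}
for label, (ind, outd) in examples.items():
    print(label, modularity(ind, outd), is_module(ind, outd))
# 1 3.2 True
# 2 1.75 True
# 3 1.6 True
```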


Agglomerative Algorithm for Identifying Network Modules

Flow chart of the agglomerative algorithm


The Order of Merging
• Edge betweenness (Girvan and Newman, 2002)

– Defined as the number of shortest paths between all pairs of vertices that run through the edge.

– Edges between modules have higher betweenness values.

[Figure: example edge with betweenness = 20]


The Order of Merging (continued)

• Gradually deleting the edge with the highest betweenness generates an ordering of the edges.
– Edges between modules are deleted earlier.
– Edges inside modules are deleted later.

• Reverse the deletion order of the edges and use it as the merging order.
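The two slides on merging order can be sketched in Python: edge betweenness is computed with Brandes-style accumulation (the usual Girvan-Newman convention, in which a pair with several shortest paths splits its count equally among them), the highest-betweenness edge is deleted and betweenness is recomputed, and the deletion order is finally reversed. Names are hypothetical and ties are broken arbitrarily; this is an illustration, not MoNet's code. On two triangles joined by a bridge, the bridge has betweenness 9 (3 × 3 vertex pairs), is deleted first, and therefore comes last in the merging order.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph (Brandes-style)."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: distances, shortest-path counts, predecessors.
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for u in preds[w]:
                c = sigma[u] / sigma[w] * (1 + delta[w])
                eb[frozenset((u, w))] += c
                delta[u] += c
    return {e: b / 2 for e, b in eb.items()}  # each pair was seen from both ends

def merging_order(edges):
    """Repeatedly delete the highest-betweenness edge, then reverse the order."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    deleted = []
    while any(adj.values()):
        eb = edge_betweenness(adj)  # recomputed after every deletion
        u, v = tuple(max(eb, key=eb.get))
        adj[u].discard(v)
        adj[v].discard(u)
        deleted.append((u, v))
    return deleted[::-1]  # edges inside modules come first

# Two triangles joined by the bridge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
order = merging_order(edges)
print(order[-1])  # the bridge, ('c', 'd') or ('d', 'c')
```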


When Does Merging Occur?

• Between two non-modules
• Between a non-module and a module
• Never between two modules
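The merging rule can be sketched as a loop over edges in merging order. This hypothetical `agglomerate` function assumes the list contains every edge of G, recomputes ind and outd of each group against the whole graph, and uses ind(S) > outd(S) (equivalent to M > 1 when outd(S) > 0) as the module test; it is a simplification, not the published MoNet procedure.

```python
def agglomerate(vertices, edges_in_merge_order):
    """Process edges in merging order; merge the two groups an edge connects
    unless both are already modules. Returns the groups that are modules."""
    edges = list(edges_in_merge_order)  # assumed to contain every edge of G

    def is_module(S):
        # ind(S) > outd(S) is M > 1 without dividing (and accepts outd(S) = 0).
        ind = outd = 0
        for a, b in edges:
            inside = (a in S) + (b in S)
            if inside == 2:
                ind += 1
            elif inside == 1:
                outd += 1
        return ind > outd

    group = {v: frozenset([v]) for v in vertices}
    for a, b in edges:
        A, B = group[a], group[b]
        if A == B:
            continue  # already in the same group
        if is_module(A) and is_module(B):
            continue  # never merge two modules
        merged = A | B  # two non-modules, or a non-module and a module
        for v in merged:
            group[v] = merged
    return {S for S in group.values() if is_module(S)}

order = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"), ("c", "d")]
print(agglomerate("abcdef", order))  # the two triangles {a, b, c} and {d, e, f}
```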


MF-Algorithm

By M. Hbibi, M. Sharifzade and C. Eslahchi


Definitions

• Let V_1 be a subset of the vertex set V of G. The number of edges of G[V_1], which we call the internal edges of V_1, is $|E(G[V_1])|$.

• The number of edges with one end in V_1 and the other end in $V \setminus V_1$ is called the number of external edges of V_1.


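• For a vertex v ∈ V_1, the internal degree $\mathrm{int}_{V_1}(v)$ and the external degree $\mathrm{ext}_{V_1}(v)$ of v with respect to V_1 are, respectively, the number of neighbors of v inside V_1 and outside V_1.

• For predicting modules in a graph, we define a module score (mscore) for V_1 and for each vertex v ∈ V_1; for a vertex,

$$\mathrm{mscore}_{V_1}(v) = \mathrm{int}_{V_1}(v) - \mathrm{ext}_{V_1}(v)$$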
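Internal and external degrees are simple neighbor counts. The sketch below assumes the vertex score is mscore_{V_1}(v) = int_{V_1}(v) - ext_{V_1}(v), which is our reading of the garbled formula: it is consistent with the negative scores (as low as -1) in the example figure on the next slide, but the original operator is not legible. Function names are hypothetical.

```python
def int_ext_degree(adj, V1, v):
    """(int_{V1}(v), ext_{V1}(v)): neighbors of v inside and outside V1."""
    internal = sum(1 for u in adj[v] if u in V1)
    return internal, len(adj[v]) - internal

def mscore_vertex(adj, V1, v):
    """ASSUMED form of the vertex score: int_{V1}(v) - ext_{V1}(v)."""
    internal, external = int_ext_degree(adj, V1, v)
    return internal - external

adj = {"a": {"b", "c", "e"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c"}, "e": {"a"}}
V1 = {"a", "b", "c"}
print({v: mscore_vertex(adj, V1, v) for v in sorted(V1)})  # {'a': 1, 'b': 2, 'c': 1}
```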


MF Algorithm

• Step 1: Assign white to all vertices. Sort the vertices by degree, and divide the sorted list into four equal (or nearly equal) parts.

[Figure: the vertex sets A and B]


[Figure: example graph on vertices 1 to 12, annotated with vertex scores ranging from -1 to 2]


MF Algorithm

• Step 2: If the module score of A in G, mscore(A), is greater than 1, we consider A a candidate module (similarly for B).

• Step 3: For each white vertex v ∈ A (or B), we calculate $\mathrm{mscore}_A(v)$ (or $\mathrm{mscore}_B(v)$).


MF Algorithm

• Step 4: Let v be the white vertex of X with the minimum mscore, where X = A and Y = B (and symmetrically X = B and Y = A). If mscore(v) < 1, set X = X − v and Y = Y + v, assign gray to v, and go to Step 2.


MF Algorithm

• Else, if |X| > 3, start the algorithm from Step 1 for G[X] (similarly for G[Y]).

• Otherwise, the algorithm stops.
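Steps 3 and 4 form a peeling loop that can be sketched as follows, under an assumed vertex score mscore_X(v) = int_X(v) - ext_X(v) (neighbors inside X minus neighbors outside X). Step 1's split into A and B and Step 2's set-level test are not reproduced, since their exact rules are not given here; `peel` and its arguments are hypothetical. In the toy example, vertex e has no neighbor inside X and is moved to Y, after which every remaining vertex scores at least 1 and the loop stops.

```python
def mscore(adj, X, v):
    """ASSUMED vertex score: neighbors of v inside X minus neighbors outside X."""
    internal = sum(1 for u in adj[v] if u in X)
    return internal - (len(adj[v]) - internal)

def peel(adj, A, B, gray=None):
    """Steps 3-4 with X = A and Y = B: while the white vertex of X with the
    minimum mscore_X(v) scores below 1, move it to Y and color it gray."""
    X, Y = set(A), set(B)
    gray = set() if gray is None else gray
    while True:
        white = [v for v in X if v not in gray]
        if not white:
            break
        v = min(white, key=lambda u: mscore(adj, X, u))
        if mscore(adj, X, v) >= 1:
            break  # every white vertex of X is well attached to X
        X.discard(v)
        Y.add(v)
        gray.add(v)  # gray vertices are not moved again
    return X, Y

edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "d"), ("d", "f")]
adj = {}
for p, q in edges:
    adj.setdefault(p, set()).add(q)
    adj.setdefault(q, set()).add(p)
X, Y = peel(adj, {"a", "b", "c", "e"}, {"d", "f"})
print(sorted(X), sorted(Y))  # ['a', 'b', 'c'] ['d', 'e', 'f']
```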


Filtering of MF Algorithm Results


Example of Module Overlap


Testing Data Set

• Yeast Core Protein Interaction Network (PIN).

– The yeast core PIN from Database of Interacting Proteins (DIP) (version ScereCR20041003).

– Total: 2609 proteins; 6355 links.

– Large component: 2440 proteins, 6401 interactions.


Comparison of MF, MoNet, and MCL

• The p-value measures the statistical significance of the association between a group of genes and a specific GO (Gene Ontology) term. More significant modules have p-values closer to zero.

• The percentage of proteins in each module that are related to a specific GO term is denoted by D.
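The slides do not say how the p-values were computed. A common choice for GO enrichment, shown here only as an assumption, is the hypergeometric upper tail: given N proteins in the network, K of them annotated with the term, and a module of n proteins of which k are annotated, p = P(X ≥ k). D is then simply 100·k/n. The function names and example numbers are hypothetical.

```python
from math import comb

def go_pvalue(N, K, n, k):
    """Hypergeometric upper tail P(X >= k): probability that a random set of n
    proteins, drawn from N of which K carry the GO term, contains at least k
    annotated proteins."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)

def d_percentage(n, k):
    """D: percentage of the module's n proteins annotated with the GO term."""
    return 100.0 * k / n

# Hypothetical module: 5 proteins, all annotated, in a network of 10 with 5 annotated.
print(go_pvalue(10, 5, 5, 5), d_percentage(5, 5))
```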


Some Examples of MF Results

• MFA predicts not only dense, highly connected modules but also linear and sparse ones, such as stars. Three such MFA modules, with various densities and topologies, are shown in the figure.


Conclusions

• Provides a framework for decomposing the protein interaction network into functional modules

• The modules obtained appear to be biological functional modules based on clustering of Gene Ontology terms

• The network of modules provides a plausible way to understand the interactions between these functional modules

• With the increasing amounts of protein interaction data available, our approach will help construct a more complete view of interconnected functional modules to better understand the organization of the whole cellular system


Questions?


Local Optimization Algorithm
