22
4/28/09 1 CSCI1950‐Z Computa5onal Methods for Biology Lecture 23 Ben Raphael April 27, 2009 hJp://cs.brown.edu/courses/csci1950‐z/ Outline: This week 1. Subnetwork querying. More color‐coding. Tree‐width graphs. 2. Network Mo5fs 3. Network alignment : conserved complexes 4. Network integra5on: networks + gene expression data.

Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

CSCI1950‐Z Computa5onal Methods for Biology 

Lecture 23 

Ben Raphael April 27, 2009 

hJp://cs.brown.edu/courses/csci1950‐z/ 

Outline: This week 

1.  Subnetwork querying. –  More color‐coding.  Tree‐width graphs. 

2.  Network Mo5fs 

3.  Network alignment : conserved complexes 

4.  Network integra5on: networks + gene expression data. 

Page 2: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

Network Querying Problem 

•  Species A •  well studied •  protein interac5on sub‐

networks defined by extensive experimenta5on 

•  Species B  •  less studied •  liJle knowledge of sub‐

networks •  protein interac5on network 

known using high‐throughput technologies 

•  Can we use the knowledge of A to discover corresponding sub‐networks in B if it is “present”? 

Graph Isomorphism 

G1 = (V1, E1) and G2 = (V2, E2) are isomorphic provided: 

There is a bijec5on between the ver5ces V1 and V2 that preserves edges 

i.e. There is a 1‐1, onto func5on Φ:  V1  V2 such that (u,v) ∈ E1 if and only if (Φ(u), Φ(v)) ∈ E2  

Page 3: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

Graph Isomorphism Problem 

Are G1 = (V1, E1) and G2 = (V2, E2) isomorphic? 

Neither known to be in P or to be NP‐complete. 

A

C

B

D

E 1  4 

Subgraph Isomorphism Problem 

Is G1 = (V1, E1) isomorphic to a subgraph of G2? 

NP‐complete problem. 

Page 4: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

Network Querying Problem 

•  Given a query graph Q and a network G, find the sub‐network of G that is –  Isomorphic to Q  

–  aligned with maximal score 

•  NP‐complete: subgraph isomorphism. 

Query Q 

Network G 

Network Querying Problem: Homeomorphic Alignment 

homeomorphic to Q 

inser5on 

match 

match 

match 

match 

match 

match 

Match of homologous proteins and dele5on/inser5on of degree‐2 nodes 

dele5on 

Species B Species A 

Page 5: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

Graph Subdivision and Homeomorphism 

homeomorphic to Q 

inser5on 

match 

match 

match 

match 

match 

match 

dele5on 

G and G’ are homeomorphic provided there is an isomorphism from a subdivision of G to a subdivision of G’ 

Subdivision of an edge: “insert” a vertex.  

Subdivision of G is graph obtained by subdividing some edges. 

Network Querying Problem: Score of Alignment 

Score Sequence  similarity  score for  matches 

Penalty for  dele5ons& inser5ons 

Interac5on  reliabili5es  score 

+  + = 

h(q1,v1) q1  v1 

h(q2,v2) 

h(q3,v3) 

h(q4,v4) 

h(q6,v6) 

h(q5,v5) 

del pen 

ins pen 

v2 

w(v

1,v2) 

Page 6: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

Network Querying Problem 

•  Given a query graph Q and a network G, find the sub‐network of G that is –  homeomorphic to Q  

–  aligned with maximal score 

Query Q 

Network G 

Complexity 

•  Network querying problem is NP‐complete. (for general n and k) –  by reduc5on from sub‐graph 

isomorphism problem 

  Naïve algorithm has O(nk) complexity   n = size of the PPI network, k=size of the query 

  Intractable for realis5c values of n and k 

  n ~5000, k~10 

  We use randomized “color coding” technique developed by [Alon et al, JACM, 1995]  to find a tractable solu5on.    Reduces O(nk) to n22O(k). 

Page 7: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

QNET 

  Implemented for tree‐like queries. 

  Color coding approach to search for the global op5mal      sub‐network. 

  Extension of QPATH [Shlomi et al., 2006]   Solves the problem of querying chains using color coding 

approach.  

Query 

Color Coded Querying ‐ Trees Network 

Query has k nodes. 

Page 8: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

Network 

Color Coded Querying ‐ Trees 

Query has k nodes. Randomly color the network with k dis5nct colors. Suppose op5mal sub‐network is “colorful”. Use the colors to remember the visited nodes. 

Query  Network 

DP solu5on for Color Coded Querying ‐ Trees 

q1 

q2 

q3 q4 

q5 

q6 

q7 

v1 

v2  v3 

v6 

v7 

v4 

DP: Whiteboard 

Page 9: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

Network 

Probability of failure 

•  The op5mal alignment can be found only if the op5mal sub‐network is “colorful”. 

•  Repeat color‐coded search mul5ple 5mes un5l probability of failure ≤ ε. 

v1 

v2  v3 

v6 

v7 

v4 

v5 

P( failure) =1− k!kk

≤1− e−k

Number of Repeats   Necessary number of repeats to guarantee a failure ≤ ε? 

  Repeat                                          5mes, then 

    

Page 10: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

10 

Network Querying with Color Coding Approach 

Network Graph

 

high scoring subnetwork 

query 

randomly color 

DP  algorithm 

repeat N 5mes 

Querying General Graphs 

•  Extend algorithm to general query graphs.  

•  Idea: – Map the original graph into a tree, i.e. tree decomposi5on.  

– Solve the querying problem on this tree using DP. 

Page 11: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

11 

Color Coded Querying – General Graphs 

Map the original query into a tree using          tree‐decomposi5on.   

vertex 

node=set of ver5ces 

v  z 

Tree Decomposi5on 

vertex 

node=set of ver5ces 

v  z 

Given G = (V, E). 

Form tree T = (X, ET)  

Each Xi ∈ X is a subset of V. For all edges (u,v) ∈ E : there is a set Xi 

containing both u and v. 

For every v ∈ V: the nodes that contain v form a connected subtree. 

Page 12: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

12 

Color Coded Querying – General Graphs 

Tree decomposi5on is not unique.  Width of a tree decomposi5on is the size of its largest node minus one: 

Treewidth of a graph G is the minimum width among all possible tree decomposi5ons of G.  

Color Coded Querying – General Graphs 

The treewidth of a graph G is the minimum width among all possible tree decomposi5ons of G.  DPs on trees can usually be extended to tree decomposi5ons. Problems solved efficiently on trees by DP can be solved efficiently on graphs with bounded treewidth.  

Page 13: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

13 

Network 

Color Coded Querying – General Graphs 

Original query has k nodes and tree‐width t. Randomly color the network with k dis5nct colors. . 

q1 

q2 

q2 

q3 

q3 

q4 

q4 q5 

q5 

q6  q7 q8 

Network 

Color Coded Querying – General Graphs 

Original query has k nodes and tree‐width t. Randomly color the network with k dis5nct colors.  

q1 

q2 

q2 

q3 

q3 

q4 

q4 q5 

q5 

q6  q7 q8 

v2  v3 

v7 

v6 

v5 

v8 

v1 

v4 

Page 14: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

14 

Running 5me 

•  n=size of network, k=size of query. 

•  Tree queries: –  Reduces O(nk) to n22O(k). 

•  Tractable for realis5c values of n and k.  •  n ~5000, k~10 

•  Bounded‐tree‐width graphs: –  t : tree‐width –  n(t+1)2O(k) 

Heuris5c for Color Coded Querying ‐ General Graphs 

1.  Extract several spanning trees from the original query. 

Page 15: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

15 

1.  Extract several spanning trees from the original query. 2.  Query each spanning tree in the network. 

Heuris5c for Color Coded Querying ‐ General Graphs 

1.  Extract several spanning trees from the original query. 2.  Query each spanning tree in the network. 

Heuris5c for Color Coded Querying ‐ General Graphs 

Page 16: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

16 

1.  Extract several spanning trees from the original query. 2.  Query each spanning tree in the network. 

Heuris5c for Color Coded Querying ‐ General Graphs 

1.  Extract several spanning trees from the original query. 2.  Query each spanning tree in the network.  3.  Merge the matching trees to obtain matching graph. 

Heuris5c for Color Coded Querying ‐ General Graphs 

Page 17: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

17 

Test: Cross‐species comparison of MAPK pathways 

•  Query: human MAPK pathway involved in cell prolifera5on and differen5a5on.  

•  Network: fly PPI network 

•  Result: a known fly MAPK pathway involved in dorsal paJern forma5on.  

Query from human 

Match  in  fly 

Test: Cross‐species comparison of protein complexes 

•  Queries: trees of size 3‐8 extracted from ~100 yeast hand‐curated MIPS complexes. •  Network: fly PPI network •  Result:  

–  ~40 of the queries resulted in a match with >1 protein. –  72% of the matches are func5onally enriched. (pvalue < 0.05) 

•  17% of the random trees extracted from network are func5onally enriched.  

Page 18: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

18 

Outline: This week 

1.  Subnetwork querying. –  More color‐coding.  Tree‐width graphs. 

2.  Network Mo5fs 

3.  Network alignment : conserved complexes 

4.  Network integra5on: networks + gene expression data. 

Network Structure 

Is there structure in this network, or is it “random”? 

Page 19: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

19 

Random Network 

Erdos‐Renyi model:  n ver5ces.  For each pair (u, v) of ver5ces, connect with edge with probability p. 

Random Networks 

Erdos‐Renyi graphs have a number of special proper5es. 

1.  Degree distribu5on is asympto5cally Poisson. D = degree of a vertex. Pr[D = k]  Exp[‐λ] λk/k! 

2.  If p > 1/n, there is a large connected component: second largest component has size O(log n) 

3.  More… hJps://nwb.slis.indiana.edu/community/?n=CustomFillings.AnalysisOfBiologicalNetworks 

Page 20: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

20 

Random Networks 

Empirically, biological networks have different proper5es. 

Degree distribu5on follows a power law. pk = Pr[D = k] ∼ C k‐λ 

or log pk ∼ –λ C’log[k] 

There are a few nodes of high degree, “hubs”  

hJps://nwb.slis.indiana.edu/community/?n=CustomFillings.AnalysisOfBiologicalNetworks 

Log‐log scale 

Random Networks 

Empirically, biological networks have different proper5es. 

Suggests that “aJachment process” does not follow Erdos‐Renyi model. 

Is there any biological significance???  A different aJachment process?  Clues about network evolu5on? 

Major caveat: All biological networks are incomplete. 

hJps://nwb.slis.indiana.edu/community/?n=CustomFillings.AnalysisOfBiologicalNetworks 

Page 21: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

21 

Network Mo5fs 

Subnetworks with more occurrences than expected by chance. 

•  How to find? •  How to assess sta5s5cal significance? 

Shen‐Orr et al. 2002 

Network Mo5fs 

Subnetworks with more occurrences than expected by chance. 

•  How to find? –  Exhaus5ve: Count all n‐node subgraphs. – Greedy and other heuris5c methods. – Approximate coun5ng via randomized algorithms. 

Page 22: Outline: This week - Brown University4/28/09 12 Color Coded Querying – General Graphs Tree decomposion is not unique. Width of a tree decomposion is the size of its largest node

4/28/09 

22 

Sources 

•  Dost B, Shlomi T, Gupta N, Ruppin E, Bafna V, Sharan R. QNet: a tool for querying protein interac5on networks. (2008) J Comput Biol. 15(7):913‐25. 

•  QNET slides: modified from slides of Banu Dost.