Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
4/28/09
1
CSCI1950‐Z Computa5onal Methods for Biology
Lecture 23
Ben Raphael April 27, 2009
hJp://cs.brown.edu/courses/csci1950‐z/
Outline: This week
1. Subnetwork querying. – More color‐coding. Tree‐width graphs.
2. Network Mo5fs
3. Network alignment : conserved complexes
4. Network integra5on: networks + gene expression data.
4/28/09
2
Network Querying Problem
• Species A • well studied • protein interac5on sub‐
networks defined by extensive experimenta5on
• Species B • less studied • liJle knowledge of sub‐
networks • protein interac5on network
known using high‐throughput technologies
• Can we use the knowledge of A to discover corresponding sub‐networks in B if it is “present”?
Graph Isomorphism
G1 = (V1, E1) and G2 = (V2, E2) are isomorphic provided:
There is a bijec5on between the ver5ces V1 and V2 that preserves edges
i.e. There is a 1‐1, onto func5on Φ: V1 V2 such that (u,v) ∈ E1 if and only if (Φ(u), Φ(v)) ∈ E2
4/28/09
3
Graph Isomorphism Problem
Are G1 = (V1, E1) and G2 = (V2, E2) isomorphic?
Neither known to be in P or to be NP‐complete.
A
F
C
B
D
E 1 4
2
3
5
6
Subgraph Isomorphism Problem
Is G1 = (V1, E1) isomorphic to a subgraph of G2?
NP‐complete problem.
4/28/09
4
Network Querying Problem
• Given a query graph Q and a network G, find the sub‐network of G that is – Isomorphic to Q
– aligned with maximal score
• NP‐complete: subgraph isomorphism.
Query Q
Network G
Network Querying Problem: Homeomorphic Alignment
homeomorphic to Q
inser5on
match
match
match
match
match
match
Match of homologous proteins and dele5on/inser5on of degree‐2 nodes
dele5on
Species B Species A
Q
4/28/09
5
Graph Subdivision and Homeomorphism
homeomorphic to Q
inser5on
match
match
match
match
match
match
dele5on
Q
G and G’ are homeomorphic provided there is an isomorphism from a subdivision of G to a subdivision of G’
Subdivision of an edge: “insert” a vertex.
Subdivision of G is graph obtained by subdividing some edges.
Network Querying Problem: Score of Alignment
Score Sequence similarity score for matches
Penalty for dele5ons& inser5ons
Interac5on reliabili5es score
+ + =
h(q1,v1) q1 v1
h(q2,v2)
h(q3,v3)
h(q4,v4)
h(q6,v6)
h(q5,v5)
del pen
ins pen
v2
w(v
1,v2)
4/28/09
6
Network Querying Problem
• Given a query graph Q and a network G, find the sub‐network of G that is – homeomorphic to Q
– aligned with maximal score
Query Q
Network G
Complexity
• Network querying problem is NP‐complete. (for general n and k) – by reduc5on from sub‐graph
isomorphism problem
Naïve algorithm has O(nk) complexity n = size of the PPI network, k=size of the query
Intractable for realis5c values of n and k
n ~5000, k~10
We use randomized “color coding” technique developed by [Alon et al, JACM, 1995] to find a tractable solu5on. Reduces O(nk) to n22O(k).
4/28/09
7
QNET
Implemented for tree‐like queries.
Color coding approach to search for the global op5mal sub‐network.
Extension of QPATH [Shlomi et al., 2006] Solves the problem of querying chains using color coding
approach.
Query
Color Coded Querying ‐ Trees Network
Query has k nodes.
4/28/09
8
Network
Color Coded Querying ‐ Trees
Query has k nodes. Randomly color the network with k dis5nct colors. Suppose op5mal sub‐network is “colorful”. Use the colors to remember the visited nodes.
Query Network
DP solu5on for Color Coded Querying ‐ Trees
q1
q2
q3 q4
q5
q6
q7
v1
v2 v3
v6
v7
v4
DP: Whiteboard
4/28/09
9
Network
Probability of failure
• The op5mal alignment can be found only if the op5mal sub‐network is “colorful”.
• Repeat color‐coded search mul5ple 5mes un5l probability of failure ≤ ε.
v1
v2 v3
v6
v7
v4
v5
€
P( failure) =1− k!kk
≤1− e−k
Number of Repeats Necessary number of repeats to guarantee a failure ≤ ε?
Repeat 5mes, then
4/28/09
10
Network Querying with Color Coding Approach
Network Graph
high scoring subnetwork
query
randomly color
DP algorithm
repeat N 5mes
Querying General Graphs
• Extend algorithm to general query graphs.
• Idea: – Map the original graph into a tree, i.e. tree decomposi5on.
– Solve the querying problem on this tree using DP.
4/28/09
11
Color Coded Querying – General Graphs
Map the original query into a tree using tree‐decomposi5on.
G
T
vertex
node=set of ver5ces
u
v z
Tree Decomposi5on
G
T
vertex
node=set of ver5ces
u
v z
Given G = (V, E).
Form tree T = (X, ET)
Each Xi ∈ X is a subset of V. For all edges (u,v) ∈ E : there is a set Xi
containing both u and v.
For every v ∈ V: the nodes that contain v form a connected subtree.
4/28/09
12
Color Coded Querying – General Graphs
Tree decomposi5on is not unique. Width of a tree decomposi5on is the size of its largest node minus one:
Treewidth of a graph G is the minimum width among all possible tree decomposi5ons of G.
G
T
Color Coded Querying – General Graphs
The treewidth of a graph G is the minimum width among all possible tree decomposi5ons of G. DPs on trees can usually be extended to tree decomposi5ons. Problems solved efficiently on trees by DP can be solved efficiently on graphs with bounded treewidth.
G
T
4/28/09
13
Network
Color Coded Querying – General Graphs
Original query has k nodes and tree‐width t. Randomly color the network with k dis5nct colors. .
q1
q2
q2
q3
q3
q4
q4 q5
q5
q6 q7 q8
Network
Color Coded Querying – General Graphs
Original query has k nodes and tree‐width t. Randomly color the network with k dis5nct colors.
q1
q2
q2
q3
q3
q4
q4 q5
q5
q6 q7 q8
v2 v3
v7
v6
v5
v8
v1
v4
4/28/09
14
Running 5me
• n=size of network, k=size of query.
• Tree queries: – Reduces O(nk) to n22O(k).
• Tractable for realis5c values of n and k. • n ~5000, k~10
• Bounded‐tree‐width graphs: – t : tree‐width – n(t+1)2O(k)
Heuris5c for Color Coded Querying ‐ General Graphs
G
1. Extract several spanning trees from the original query.
4/28/09
15
1. Extract several spanning trees from the original query. 2. Query each spanning tree in the network.
Heuris5c for Color Coded Querying ‐ General Graphs
1. Extract several spanning trees from the original query. 2. Query each spanning tree in the network.
Heuris5c for Color Coded Querying ‐ General Graphs
4/28/09
16
1. Extract several spanning trees from the original query. 2. Query each spanning tree in the network.
Heuris5c for Color Coded Querying ‐ General Graphs
1. Extract several spanning trees from the original query. 2. Query each spanning tree in the network. 3. Merge the matching trees to obtain matching graph.
Heuris5c for Color Coded Querying ‐ General Graphs
4/28/09
17
Test: Cross‐species comparison of MAPK pathways
• Query: human MAPK pathway involved in cell prolifera5on and differen5a5on.
• Network: fly PPI network
• Result: a known fly MAPK pathway involved in dorsal paJern forma5on.
Query from human
Match in fly
Test: Cross‐species comparison of protein complexes
• Queries: trees of size 3‐8 extracted from ~100 yeast hand‐curated MIPS complexes. • Network: fly PPI network • Result:
– ~40 of the queries resulted in a match with >1 protein. – 72% of the matches are func5onally enriched. (pvalue < 0.05)
• 17% of the random trees extracted from network are func5onally enriched.
4/28/09
18
Outline: This week
1. Subnetwork querying. – More color‐coding. Tree‐width graphs.
2. Network Mo5fs
3. Network alignment : conserved complexes
4. Network integra5on: networks + gene expression data.
Network Structure
Is there structure in this network, or is it “random”?
4/28/09
19
Random Network
Erdos‐Renyi model: n ver5ces. For each pair (u, v) of ver5ces, connect with edge with probability p.
Random Networks
Erdos‐Renyi graphs have a number of special proper5es.
1. Degree distribu5on is asympto5cally Poisson. D = degree of a vertex. Pr[D = k] Exp[‐λ] λk/k!
2. If p > 1/n, there is a large connected component: second largest component has size O(log n)
3. More… hJps://nwb.slis.indiana.edu/community/?n=CustomFillings.AnalysisOfBiologicalNetworks
4/28/09
20
Random Networks
Empirically, biological networks have different proper5es.
Degree distribu5on follows a power law. pk = Pr[D = k] ∼ C k‐λ
or log pk ∼ –λ C’log[k]
There are a few nodes of high degree, “hubs”
hJps://nwb.slis.indiana.edu/community/?n=CustomFillings.AnalysisOfBiologicalNetworks
Log‐log scale
Random Networks
Empirically, biological networks have different proper5es.
Suggests that “aJachment process” does not follow Erdos‐Renyi model.
Is there any biological significance??? A different aJachment process? Clues about network evolu5on?
Major caveat: All biological networks are incomplete.
hJps://nwb.slis.indiana.edu/community/?n=CustomFillings.AnalysisOfBiologicalNetworks
4/28/09
21
Network Mo5fs
Subnetworks with more occurrences than expected by chance.
• How to find? • How to assess sta5s5cal significance?
Shen‐Orr et al. 2002
Network Mo5fs
Subnetworks with more occurrences than expected by chance.
• How to find? – Exhaus5ve: Count all n‐node subgraphs. – Greedy and other heuris5c methods. – Approximate coun5ng via randomized algorithms.
4/28/09
22
Sources
• Dost B, Shlomi T, Gupta N, Ruppin E, Bafna V, Sharan R. QNet: a tool for querying protein interac5on networks. (2008) J Comput Biol. 15(7):913‐25.
• QNET slides: modified from slides of Banu Dost.