1
Applications of Relative Importance
Why is relative importance interesting?
Application domains: the Web, social networks, citation graphs, biological data
Graphs become too complex for manual analysis
2
Existing Techniques
Web: PageRank (Google)
Social networks: ‘centrality’ measures
All focus on global measures of node importance – we’re interested in importance relative to a set of root nodes R
3
Use Existing Techniques?
Use global algorithm on the subgraph surrounding root nodes?
No preferential treatment of root nodes – just ranking surrounding nodes.
4
Organization: Relative importance Algorithms
Notation
Problem Formulation
General Framework
Algorithms
5
Notation
Digraph: $G = (V, E)$
Edges: ordered pairs of nodes $(u, v)$
Graphs are directed, unweighted, and simple
A walk from $u$ to $v$ is a sequence of edges $(u, u_1), (u_1, u_2), \ldots, (u_k, v)$ through the nodes $u, u_1, u_2, \ldots, u_k, v$
A path is a walk with no repeated nodes
6
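Notation
k-short paths
$P(u, v)$: the set of paths between $u$ and $v$
$S_{out}(u)$: the set of distinct outgoing edges from $u$, with out-degree $d_{out}(u) = |S_{out}(u)|$
Similarly, $S_{in}(u)$ is the set of distinct incoming edges of $u$, with in-degree $d_{in}(u) = |S_{in}(u)|$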
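The edge-set and degree notation can be made concrete with a few lines of Python. This is a minimal sketch, assuming the graph is given as a list of ordered `(u, v)` pairs; the function name `degree_sets` and the three-edge example graph are hypothetical and not taken from the slides.

```python
from collections import defaultdict


def degree_sets(edges):
    """Return (S_out, S_in) for a simple directed graph given as (u, v) pairs."""
    s_out = defaultdict(set)  # S_out(u): distinct outgoing edges of u
    s_in = defaultdict(set)   # S_in(v): distinct incoming edges of v
    for u, v in edges:
        s_out[u].add((u, v))
        s_in[v].add((u, v))
    return s_out, s_in


# Hypothetical example graph, for illustration only.
edges = [("a", "b"), ("a", "c"), ("b", "c")]
s_out, s_in = degree_sets(edges)

# d_out(u) = |S_out(u)| and d_in(u) = |S_in(u)|
d_out = {u: len(s) for u, s in s_out.items()}
d_in = {v: len(s) for v, s in s_in.items()}
```

Nodes with no outgoing (or incoming) edges simply do not appear as keys, so `d_out.get(u, 0)` is a convenient way to read a degree.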
7
Problem Formulation
1. Given $G$ and nodes $r$ and $t$, where $\{r, t\} \subseteq G$, compute the “importance” of $t$ w.r.t. root node $r$: $I(t \mid r)$
8
Problem Formulation
2. Given $G$ and a root node $r \in G$, rank all vertices in $T(G)$, $T \subseteq V$, w.r.t. $r$.
9
Problem Formulation
3. Given $G$, a set of nodes $T(G)$ to rank, and a set of root nodes $R(G)$ where $R \subseteq V$, rank all vertices in $T$ w.r.t. $R$.
This is similar to the previous case, except that we compute $I(t \mid R)$ rather than $I(t \mid r)$.
Average importance:
$$I(t \mid R) = \frac{1}{|R|} \sum_{r \in R} I(t \mid r)$$
10
Problem Formulation (3 cont’d.)
Rather than averaging each node’s importance score over the roots, we could define
$$I(t \mid R) = \min \{ I(t \mid r) : r \in R \}$$
This requires ‘important’ nodes to have a high importance score with respect to every node in $R$.
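The two ways of combining per-root scores (averaging over the roots versus taking the minimum) can rank the same nodes differently. The sketch below assumes a callable `importance(t, r)` that returns the single-root score; that callable, the node names, and the score values are all hypothetical and chosen only to show the difference.

```python
def average_importance(t, roots, importance):
    """Average of the single-root scores I(t|r) over all roots r in R."""
    return sum(importance(t, r) for r in roots) / len(roots)


def min_importance(t, roots, importance):
    """Smallest single-root score I(t|r) over all roots r in R."""
    return min(importance(t, r) for r in roots)


# Hypothetical single-root scores, for illustration only.
scores = {
    ("t1", "r1"): 0.8, ("t1", "r2"): 0.1,
    ("t2", "r1"): 0.4, ("t2", "r2"): 0.4,
}


def importance(t, r):
    return scores[(t, r)]


roots = ["r1", "r2"]
by_average = sorted(["t1", "t2"], key=lambda t: -average_importance(t, roots, importance))
by_minimum = sorted(["t1", "t2"], key=lambda t: -min_importance(t, roots, importance))
```

Here `t1` is very close to one root and far from the other, so it wins under averaging but loses under the minimum, which favors `t2` because `t2` is moderately important to both roots.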
11
Problem Formulation
4. Given G, rank all nodes where R=T=V.
12
General Framework: Weighted Paths
Nodes are related according to the paths that connect them
The longer the path, the less importance it carries:
$$I(t \mid r) = \sum_{i=1}^{|P(r,t)|} \lambda^{-|p_i|}$$
$\lambda$ is a scalar coefficient, $\lambda \geq 1$
$P(r,t)$ is a set of paths from $r$ to $t$, and $p_i$ is the $i$th path in $P(r,t)$
Importance decays exponentially with path length
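A small sketch of the weighted-paths score may help. It assumes that each path contributes the scalar coefficient (called `lam` here) raised to the negative path length, and that the path set is every path with no repeated nodes and at most `max_len` edges. That choice of path set, the function names, and the example graph are assumptions made for illustration, since the slides treat the choice of P(r,t) as a separate question.

```python
def simple_paths(adj, r, t, max_len):
    """Yield every path (no repeated nodes) from r to t with at most max_len edges."""
    stack = [(r, [r])]
    while stack:
        node, path = stack.pop()
        if node == t and len(path) > 1:
            yield path
            continue
        if len(path) - 1 == max_len:
            continue  # length budget used up without reaching t
        for nxt in adj.get(node, ()):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))


def weighted_path_importance(adj, r, t, lam=2.0, max_len=3):
    """Sum lam ** (-length) over the chosen path set; longer paths count for less."""
    return sum(lam ** -(len(p) - 1) for p in simple_paths(adj, r, t, max_len))


# Hypothetical graph: two paths of length 2 and one of length 3 from r to t.
adj = {"r": ["c", "d", "a"], "c": ["t"], "d": ["t"], "a": ["b"], "b": ["t"]}
score_k3 = weighted_path_importance(adj, "r", "t", lam=2.0, max_len=3)
score_k2 = weighted_path_importance(adj, "r", "t", lam=2.0, max_len=2)
```

With `max_len=2` only the two shortest paths are counted; raising the bound to 3 adds the longer path, which contributes half as much as either shorter one.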
13
How to choose P(r,t)?
[Figure: path examples. Two example graphs, a. and b., each drawn on the nodes A, B, C, D, E, F, R, T.]
The shortest paths from R to T are {R-C-T, R-D-T}, which fail to capture much of the connectivity from R to T.
14
Shortest Path
E.g., transporting cargo from r to t
The shortest path doesn’t always give a good approximation of importance, e.g., on the Web (graph b.)
15
k-Short Paths
Paths of length K
Idea: paths longer than the shortest ones are often important to take into account
Fixes the Shortest Path problem of ignoring longer but important paths, e.g., graph b. with 3-short paths
Problem: capacity constraints, e.g., network topology
16
k-Short Node-Disjoint Paths
No nodes and no edges are repeated
Implicitly enforces capacity constraints
Motivated by ‘mass flow’, where importance can ‘flow’ along paths, e.g., graph b.
Breadth-first with some heuristic, with some K and some
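The slides do not spell out the breadth-first heuristic, so the following is only one plausible greedy stand-in, not necessarily the authors' procedure: repeatedly take the shortest remaining path of at most `max_len` edges, then block its interior nodes and its edges so that later paths cannot reuse them. All function names and both example graphs are hypothetical.

```python
from collections import deque


def bfs_path(adj, r, t, max_len, blocked_nodes, blocked_edges):
    """Shortest r-to-t path with at most max_len edges that avoids blocked items."""
    parent = {r: None}
    depth = {r: 0}
    queue = deque([r])
    while queue:
        node = queue.popleft()
        if node == t:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if depth[node] == max_len:
            continue
        for nxt in adj.get(node, ()):
            if nxt in parent or nxt in blocked_nodes or (node, nxt) in blocked_edges:
                continue
            parent[nxt] = node
            depth[nxt] = depth[node] + 1
            queue.append(nxt)
    return None


def node_disjoint_paths(adj, r, t, max_len):
    """Greedy heuristic: collect paths that share no interior node and no edge."""
    if r == t:
        return []
    blocked_nodes, blocked_edges, paths = set(), set(), []
    while True:
        path = bfs_path(adj, r, t, max_len, blocked_nodes, blocked_edges)
        if path is None:
            return paths
        paths.append(path)
        blocked_nodes.update(path[1:-1])           # interior nodes are used up
        blocked_edges.update(zip(path, path[1:]))  # so are the path's edges


# Hypothetical graphs: three disjoint routes, and two routes sharing node c.
disjoint = {"r": ["c", "d", "a"], "c": ["t"], "d": ["t"], "a": ["b"], "b": ["t"]}
shared = {"r": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["t"]}
paths_disjoint = node_disjoint_paths(disjoint, "r", "t", max_len=3)
paths_shared = node_disjoint_paths(shared, "r", "t", max_len=3)
```

In the second graph both routes pass through `c`, so only one of them survives. This is the sense in which node-disjointness acts like a capacity constraint. Because the selection is greedy, it is not guaranteed to find the largest possible set of disjoint paths.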
17
Markov Chains & Relative Importance
Graph viewed as a stochastic process
Explanation of Markov chains
Token traversing the chain…
A natural fit for modeling the Web
18
Markov Chains & Relative Importance
Markov Centrality
Mean First Passage Time:
$$m_{rt} = \sum_{n=1}^{\infty} n \, f_{rt}^{(n)}$$
$m_{rt}$: expected number of steps until first arrival at node $t$ starting at node $r$
$f_{rt}^{(n)}$: probability that the chain, starting at $r$, first reaches state $t$ in exactly $n$ steps
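The mean first passage time can be approximated directly from its definition as a sum over first-arrival probabilities. The sketch below truncates that series instead of inverting a matrix, so it illustrates the definition rather than an efficient method. It assumes a row-stochastic transition matrix given as a list of lists with integer state indices; the function name, `max_steps`, and `tol` are hypothetical.

```python
def mean_first_passage_time(P, r, t, max_steps=10_000, tol=1e-12):
    """Approximate the expected number of steps to first reach t from r.

    Accumulates n * f(n), where f(n) is the probability of first arriving
    at t in exactly n steps, until almost no probability mass is left.
    """
    n_states = len(P)
    # alive[s]: probability of sitting in state s without having arrived at t yet
    alive = [0.0] * n_states
    alive[r] = 1.0
    m = 0.0
    for n in range(1, max_steps + 1):
        # Mass that enters t for the first time at step n.
        f_n = sum(alive[s] * P[s][t] for s in range(n_states))
        m += n * f_n
        # Advance the surviving mass; whatever reached t is removed.
        alive = [
            0.0 if k == t else sum(alive[s] * P[s][k] for s in range(n_states))
            for k in range(n_states)
        ]
        if sum(alive) < tol:
            break
    return m


# Hypothetical chains: a fair two-state coin flip and a deterministic 3-cycle.
coin = [[0.5, 0.5], [0.5, 0.5]]
cycle = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
m_coin = mean_first_passage_time(coin, 0, 1)
m_cycle = mean_first_passage_time(cycle, 0, 2)
```

If `t` cannot be reached from `r`, the surviving mass never drops below `tol` and the loop simply stops after `max_steps`, returning a partial sum that should not be trusted.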
19
Markov Chains & Relative Importance
$$I(t \mid R) = \frac{1}{\frac{1}{|R|} \sum_{r \in R} m_{rt}}$$
Bias toward ‘central nodes’
Computationally complex:
Time: $O(|V|^3)$ (inversion of a $|V| \times |V|$ transition matrix)
Space: $O(|V|^2)$
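Reading the Markov centrality score as the inverse of the average mean first passage time from the roots to t, the computation itself is short once the passage times are known. The sketch assumes they are supplied in a dictionary keyed by `(r, t)`; the function name and the numbers are hypothetical.

```python
def markov_centrality(t, roots, mfpt):
    """Inverse of the average mean first passage time from each root to t.

    mfpt[(r, t)] is assumed to hold the mean first passage time from r to t.
    """
    mean_time = sum(mfpt[(r, t)] for r in roots) / len(roots)
    return 1.0 / mean_time


# Hypothetical passage times, for illustration only.
mfpt = {
    ("r1", "x"): 2.0, ("r2", "x"): 4.0,
    ("r1", "y"): 10.0, ("r2", "y"): 10.0,
}
roots = ["r1", "r2"]
score_x = markov_centrality("x", roots, mfpt)
score_y = markov_centrality("y", roots, mfpt)
```

Node `x` is reached from the roots in 3 steps on average and `y` in 10, so `x` gets the higher score: nodes that a random walk from the roots hits quickly are ranked as more important.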
20
Markov Chains & Relative Importance
PageRank
Uses backlinks to assign importance to web pages
21
Markov Chains & Relative Importance
PageRank
Less complex
Converges logarithmically: 322 million links processed in 52 iterations
22
Markov Chains & Relative Importance
Retrofit PageRank so that all nodes in R receive a uniform bias at the start
The ‘surfer’ begins at a root node and traverses the graph, returning to the root set R with a fixed probability at each time step
I(t|R) = probability that the surfer visits t during the walk
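The root-biased surfer can be sketched as a power iteration. At each step the surfer jumps back to a uniformly chosen root with probability `beta` and otherwise follows a random outgoing edge. The parameter name `beta`, its default value, the fixed iteration count, and the handling of nodes without outgoing edges (their mass is returned to the roots) are assumptions made for this sketch, not details from the slides.

```python
def pagerank_with_priors(adj, roots, beta=0.3, iterations=100):
    """Power iteration for a random surfer that restarts at the root set."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    # Uniform prior over the root nodes, zero elsewhere.
    prior = {v: (1.0 / len(roots) if v in roots else 0.0) for v in nodes}
    rank = dict(prior)
    for _ in range(iterations):
        nxt = {v: beta * prior[v] for v in nodes}  # restart mass
        for u in nodes:
            out = adj.get(u, [])
            if out:
                share = (1.0 - beta) * rank[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Assumption: a node with no out-edges sends its mass to the roots.
                for v in roots:
                    nxt[v] += (1.0 - beta) * rank[u] / len(roots)
        rank = nxt
    return rank


# Hypothetical graph with a single root r.
adj = {"r": ["a", "b"], "a": ["r"], "b": ["c"], "c": ["r"]}
rank = pagerank_with_priors(adj, roots={"r"}, beta=0.3)
```

The scores always sum to 1, and nodes closer to the root (`a`, `b`) end up above the node that is two steps away (`c`). A larger `beta` keeps the surfer nearer the roots and makes the ranking more local.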
23
Experiments (Simulated Data)
[Figure: simulated example graph on the nodes A through J.]
24
Experiments (Simulated Data)
[Figure: simulated example graph on the nodes A through J.]
More complex
In- and out-degrees changed
Shortest-path lengths between nodes changed (e.g., A-B)
In the analysis that follows, R = {A, F}
25
Experiments (Simulated Data)
[Figure: simulated example graph on the nodes A through J.]
HITSPa        HITSPh
A  .252       F  .225
F  .241       A  .186
G  .128       D  .162
C  .110       B  .119
E  .099       E  .090
H  .052       I  .067
D  .032       H  .061
J  .025       J  .050
I  .032       G  .028
B  .024       C  .008
26
Experiments (Simulated Data)
[Figure: simulated example graph on the nodes A through J.]
MarkovC       KSMarkov
J  .180       H  .146
C  .133       G  .142
G  .130       E  .142
H  .129       J  .140
E  .111       C  .120
I  .101       I  .098
F  .069       F  .087
D  .051       D  .061
A  .047       A  .034
B  .044       B  .024
27
Experiments (9/11 Terrorist Network)
63 nodes (terrorists), 308 edges (interactions)
Rank PRankP HITSP WKPaths MarkovC KSMarkov
1 Khemais Khemais Beghal Atta Khemais
2 Beghal Beghal Khemais Al-Shehhi Beghal
3 Moussaoui Atta Moussaoui Al-Shibh Moussaoui
4 Maaroufi Moussaoui Maaroufi Moussaoui Maaroufi
5 Qatada Maaroufi Bensakhria Jarrah Qatada
6 Daoudi Qatada Daoudi Hanjour Daoudi
7 Courtaillier Bensakhria Qatada Al-Omari Bensakhria
8 Bensakhria Daoudi Walid Khemais Courtaillier
9 Walid Courtaillier Courtaillier Qatada Walid
10 Khammoun Khammoun Khammoun Bahaji Khammoun
29
Conclusion
Provides a first step toward addressing ‘relative importance’
Scaling can be an issue for Markov chain algorithms
Using different algorithms and comparing results can reveal interesting information
…Paper Analysis…
30
References
White, Smyth. Algorithms for Estimating Relative Importance in Networks. SIGKDD ’03.
Page, Brin, Motwani, Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Stanford University, Computer Science Department Technical Report.
Wikipedia on Markov chains: http://en.wikipedia.org/wiki/Markov_chain and http://en.wikipedia.org/wiki/Examples_of_Markov_chains