School of Information, University of Michigan
SI 614: Directed & weighted networks, minimum spanning trees, flow
Lecture 12
Instructor: Lada Adamic
Outline
• directed networks
• prestige
• weighted networks
• minimum spanning trees
• flow
Review of centrality in undirected networks: Comparison
Comparing across these 3 centrality values
• Generally, the 3 centrality types will be positively correlated.
• When they are not correlated (or only weakly correlated), it probably tells you something interesting about the network.
• High Degree, Low Closeness: ego is embedded in a cluster that is far from the rest of the network.
• High Degree, Low Betweenness: ego's connections are redundant; communication bypasses him/her.
• High Closeness, Low Degree: ego is a key player tied to important/active alters.
• High Closeness, Low Betweenness: there are probably multiple paths in the network; ego is near many people, but so are many others.
• High Betweenness, Low Degree: ego's few ties are crucial for network flow.
• High Betweenness, Low Closeness: very rare cell. It would mean that ego monopolizes the ties from a small number of people to many others.
slide: Jim Moody
Centrality in Social Networks: Power / Eigenvalue
Bonacich Power Centrality: an actor's centrality (prestige) is a function of the prestige of those they are connected to. Thus, actors who are tied to very central actors should have higher prestige/centrality than those who are not.
$C(\alpha, \beta) = \alpha (I - \beta R)^{-1} R \mathbf{1}$
• α is a scaling factor, which is set to normalize the score.
• β reflects the extent to which you weight the centrality of people ego is tied to.
• R is the adjacency matrix (can be valued).
• I is the identity matrix (1s down the diagonal).
• 1 is a column vector of all ones.
Bonacich Power Centrality:
The magnitude of β reflects the radius of power. Small values of β weight local structure; larger values weight global structure.
If β is positive, ego has higher centrality when tied to people who are central.
If β is negative, ego has higher centrality when tied to people who are not central.
As β approaches zero, you get degree centrality.
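A minimal Python sketch of the formula above, assuming an unweighted or valued adjacency matrix given as nested lists. It solves the linear system (I − βR)x = R1 by Gaussian elimination rather than forming the inverse. The choice of α (scores rescaled so their squares sum to the number of nodes) is an assumed convention, not something stated on the slides, and the function names are illustrative. The system is singular when 1/β is an eigenvalue of R, so |β| should be kept below 1/λ_max.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("I - beta*R is singular; try a smaller |beta|")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def bonacich_power(R, beta):
    """C(alpha, beta) = alpha * (I - beta R)^-1 R 1 for adjacency matrix R."""
    n = len(R)
    # I - beta * R
    A = [[(1.0 if i == j else 0.0) - beta * R[i][j] for j in range(n)]
         for i in range(n)]
    # R 1 = row sums of R, i.e. (valued) degree
    r1 = [sum(row) for row in R]
    raw = solve_linear(A, r1)
    # Assumed normalization: pick alpha so the squared scores sum to n
    sq = sum(v * v for v in raw)
    alpha = math.sqrt(n / sq) if sq > 0 else 0.0
    return [alpha * v for v in raw]


# Star graph: node 0 is the hub, nodes 1-3 are leaves
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
for b in (0.2, 0.0, -0.2):
    scores = bonacich_power(star, b)
    print(b, scores[0] / scores[1])
```

On the star, the hub-to-leaf ratio is 3 at β = 0 (plain degree), 2.25 at β = 0.2 (leaves gain from being tied to the central hub), and 6 at β = −0.2 (the hub gains from being tied to peripheral leaves).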
[Figure: Bonacich power centrality of nodes 1 to 7, computed with positive and negative β (|β| = 0.23)]
[Figure: Bonacich power centrality with β = .35 and β = -.35]
[Figure: Bonacich power centrality with β = .23 and β = -.23]
Examples of directed networks?
WWW, food webs, population dynamics, influence, hereditary, citation, transcription regulation networks, neural networks
Prestige in directed social networks
When ‘prestige’ may be the right word: admiration, influence, gift-giving, trust.
Directionality is especially important in instances where ties may not be reciprocated (e.g. a dining-partner choice network).
When ‘prestige’ may not be the right word:
• gives advice to (can reverse direction)
• gives orders to (can reverse direction)
• lends money to (can reverse direction)
• dislikes
• distrusts
Extensions of undirected degree centrality: prestige
Degree centrality → indegree centrality
• A paper that is cited by many others has high prestige.
• A person nominated by many others for a reward has high prestige.
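Indegree prestige simply counts incoming ties. A short Python sketch, assuming ties are given as (from, to) pairs such as "p1 cites p3"; dividing by N − 1 (the largest possible indegree without self-loops) is an assumed normalization, and the paper names are hypothetical.

```python
from collections import Counter


def indegree_prestige(nodes, edges, normalize=False):
    """Indegree of each node; an edge (u, v) adds one to v."""
    indeg = Counter(v for _, v in edges)
    n = len(nodes)
    if normalize and n > 1:
        # Assumed convention: divide by N - 1, the maximum possible indegree
        return {v: indeg[v] / (n - 1) for v in nodes}
    return {v: indeg[v] for v in nodes}


# Hypothetical citation network: (citing paper, cited paper)
papers = ["p1", "p2", "p3", "p4", "p5"]
cites = [("p1", "p3"), ("p2", "p3"), ("p4", "p3"), ("p3", "p5")]
print(indegree_prestige(papers, cites))
print(indegree_prestige(papers, cites, normalize=True))
```

Paper p3, cited three times, gets indegree 3 (0.75 after normalizing by N − 1 = 4).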
Extensions of undirected closeness centrality
• Closeness centrality usually implies that all paths should lead to you,
• and, less commonly, that paths should lead from you to everywhere else.
• We usually consider only the vertices from which the node i in question can be reached.
Influence range
The influence range of i (its input domain) is the set of vertices from which node i can be reached.
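A small Python sketch of the input domain: the vertices that have a directed path to a chosen vertex, found by breadth-first search along reversed edges, with each vertex's distance to the target. The optional `max_dist` cutoff (a hypothetical parameter) mimics limiting the domain to neighbors within some distance.

```python
from collections import deque


def input_domain(edges, target, max_dist=None):
    """Map each vertex with a directed path to target onto its distance to target."""
    preds = {}
    for u, v in edges:
        preds.setdefault(v, []).append(u)  # walk edges backwards
    dist = {target: 0}
    queue = deque([target])
    while queue:
        v = queue.popleft()
        if max_dist is not None and dist[v] >= max_dist:
            continue  # do not expand beyond the distance cutoff
        for u in preds.get(v, []):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    del dist[target]  # the target is not part of its own domain
    return dist


edges = [("a", "b"), ("b", "c"), ("d", "c"), ("c", "e")]
print(input_domain(edges, "c"))
print(len(input_domain(edges, "c")))  # size of the input domain
```

Vertex c is reached directly from b and d and indirectly from a, so its input domain has size 3; e is not included because the path runs from c to e, not into c.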
Extending betweenness centrality to directed networks
We now consider the fraction of all directed shortest paths (geodesics) between any two vertices that pass through a node:
$C_B(n_i) = \sum_{j \neq k,\; j,k \neq i} \frac{g_{jk}(n_i)}{g_{jk}}$
where C_B(n_i) is the betweenness of vertex i, g_jk(n_i) is the number of shortest paths from j to k that pass through i, and g_jk is the number of all shortest paths from j to k.
Only modification: when normalizing, we divide by (N-1)(N-2) instead of (N-1)(N-2)/2, because there are twice as many ordered pairs as unordered pairs:
$C'_B(n_i) = \frac{C_B(n_i)}{(N-1)(N-2)}$
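A Python sketch of directed betweenness for an unweighted graph. It uses Brandes' dependency-accumulation technique (our choice; the slides do not specify an algorithm): one BFS per source counts shortest paths, then dependencies are summed in reverse BFS order. Normalization divides by (N − 1)(N − 2) as above.

```python
from collections import deque


def directed_betweenness(nodes, edges, normalized=True):
    """Betweenness over ordered pairs (j, k) in an unweighted directed graph."""
    succ = {v: [] for v in nodes}
    for u, v in edges:
        succ[u].append(v)
    cb = {v: 0.0 for v in nodes}
    for s in nodes:
        # BFS from s: sigma[v] = number of shortest s->v paths
        order, preds = [], {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in succ[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest vertices first
        delta = {v: 0.0 for v in nodes}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    n = len(nodes)
    if normalized and n > 2:
        scale = (n - 1) * (n - 2)  # number of ordered pairs excluding i
        cb = {v: c / scale for v, c in cb.items()}
    return cb


# x and y both reach z only through h
print(directed_betweenness(["x", "y", "h", "z"],
                           [("x", "h"), ("y", "h"), ("h", "z")]))
```

Here h lies on the shortest paths x→z and y→z, so its raw score is 2 and its normalized score is 2/6 = 1/3; the reverse pairs z→x and z→y do not exist, which illustrates why direction matters.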
Directed geodesics
A node does not necessarily lie on a geodesic from j to k if it lies on a geodesic from k to j
Prestige in Pajek
• Calculating the indegree prestige: Net>Partition>Degree>Input. To view it, select File>Partition>Edit.
• If you need to reverse the direction of each tie first (e.g. lends money to -> borrows from): Net>Transform>Transpose
• Influence range (a.k.a. input domain): Net>k-Neighbours>Input. Enter the number of the vertex, and 0 to consider all vertices that eventually lead to your chosen vertex. To find out the size of the input domain, select Info>Partition.
• Calculate the size of the input domains for all vertices: Net>Partitions>Domain>Input. You can also limit this to only neighbors within some distance.
Proximity prestige in Pajek
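• Direct nominations (choices) should count more than indirect ones.
• Nominations from second-degree neighbors should count more than third-degree ones.
So consider proximity prestige:
$C_P(n_i) = \frac{I_i / (N-1)}{\sum_{j} d(n_j, n_i) / I_i}$
where I_i is the number of vertices in i's input domain and the sum runs over those vertices j. The numerator is the fraction of all other vertices that are in i's input domain; the denominator is the average distance from the vertices in the input domain to i.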
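A Python sketch of the proximity prestige formula for an unweighted directed graph: a backwards BFS from i gives each input-domain vertex's distance to i. Returning 0 when nobody can reach i is our assumption, since the formula is undefined in that case.

```python
from collections import deque


def proximity_prestige(nodes, edges, i):
    """C_P(n_i) = (I_i / (N - 1)) / (average distance to i over its input domain)."""
    preds = {v: [] for v in nodes}
    for u, v in edges:
        preds[v].append(u)
    # BFS along reversed edges: dist[j] = d(n_j, n_i)
    dist = {i: 0}
    queue = deque([i])
    while queue:
        v = queue.popleft()
        for u in preds[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    del dist[i]
    size = len(dist)  # I_i
    if size == 0:
        return 0.0  # assumed value when i's input domain is empty
    fraction = size / (len(nodes) - 1)
    avg_dist = sum(dist.values()) / size
    return fraction / avg_dist


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("d", "c")]
for v in nodes:
    print(v, proximity_prestige(nodes, edges, v))
```

Vertex c is reached by all three others (fraction 1) at an average distance of 4/3, giving 0.75; b is reached only by a at distance 1, giving 1/3.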
Weighted networks
Examples: email communication, sports matches, packet transfer, population movement, co-authorship, food webs
Weighted treatment of data/algorithms is usually left for ‘future work’.
But what are weights good for?
• defining thresholds
• shortest paths that don’t take long
• flow/capacity of a network
Food webs
Food webs are usually considered as binary networks, which raises problems in defining a threshold:
• fluxes: do killer whales who eat bears count?
Weights:
• interaction frequency: acts of predation per hectare per day
• carbon flow (prey to predator): grams of carbon per square meter per year
• interaction strength (predator on prey): (carbon flow of prey to predator) / (biomass of predator)
[Figure: lake carbon flow]
Co-authorship networks
The weight assigned to each edge sums over the papers the two people co-authored, with each paper's contribution divided by the number of other authors on it. A large-scale high energy physics collaboration producing a paper with 100 authors is less evidence of direct collaboration than an article in ‘Social Networks’ with only two co-authors.
$w_{ij} = \sum_{k} \frac{1}{n_k - 1}$
where the sum runs over all papers k on which i and j were co-authors, and n_k is the number of authors of paper k.
Should we normalize, so that all weights from i to other nodes sum to 1? (probably not)
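A Python sketch of this weighting, assuming each paper is given as a list of author names (the names below are hypothetical). Every pair of co-authors on paper k receives 1/(n_k − 1); single-author papers create no ties.

```python
from collections import defaultdict
from itertools import combinations


def coauthorship_weights(papers):
    """w_ij = sum over shared papers k of 1 / (n_k - 1); keys are sorted pairs."""
    w = defaultdict(float)
    for authors in papers:
        authors = sorted(set(authors))
        n = len(authors)
        if n < 2:
            continue  # a single-author paper adds no edge
        for i, j in combinations(authors, 2):
            w[(i, j)] += 1.0 / (n - 1)
    return dict(w)


papers = [["ann", "bob"], ["ann", "bob", "cy"], ["dee"]]
print(coauthorship_weights(papers))
```

Ann and Bob get 1 from their two-author paper plus 1/2 from the three-author paper, for 1.5; on a 100-author paper each pair would receive only 1/99.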
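Symmetry in normalization
If we normalize by the sum of values for each node, the weights become asymmetric. For example, if i's only tie is to j with weight 3, while j's ties sum to 15:
w_ij = 3/3 = 1
w_ji = 3/15 = 1/5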
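Cosine similarity gives symmetric values. Assume the weight for each paper is w_k = 1/(n_k - 1).
i and j each have vectors V_i and V_j of 0's and w_k's, depending on whether they authored paper k.
Normalize by the length of both vectors:
$S_{i,j} = \frac{V_i \cdot V_j}{|V_i|\,|V_j|} = \frac{\sum_k w_k^2}{|V_i|\,|V_j|}$
where the sum runs over the papers k co-authored by both i and j.
[Figure: worked example assuming simple weighting = number of papers co-authored]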
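A Python sketch of the cosine similarity above, assuming papers are given as lists of author names (hypothetical). Giving single-author papers weight 0 is our assumption, since 1/(n_k − 1) is undefined for them.

```python
import math


def cosine_similarity(papers, i, j):
    """S_ij = (V_i . V_j) / (|V_i| |V_j|) with paper weights w_k = 1/(n_k - 1)."""
    # Assumed: single-author papers get weight 0 (1/(n_k - 1) is undefined there)
    weights = [1.0 / (len(set(a)) - 1) if len(set(a)) > 1 else 0.0 for a in papers]
    v_i = [w if i in a else 0.0 for w, a in zip(weights, papers)]
    v_j = [w if j in a else 0.0 for w, a in zip(weights, papers)]
    dot = sum(x * y for x, y in zip(v_i, v_j))  # sum of w_k^2 over shared papers
    norm = math.sqrt(sum(x * x for x in v_i)) * math.sqrt(sum(y * y for y in v_j))
    return dot / norm if norm else 0.0


papers = [["i", "j"], ["i", "j", "x"], ["i", "y"]]
print(cosine_similarity(papers, "i", "j"), cosine_similarity(papers, "j", "i"))
```

Here V_i = (1, 0.5, 1) and V_j = (1, 0.5, 0), so S = 1.25 / (1.5 × √1.25) ≈ 0.745, and the value is the same in both directions.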
Other similarity measures
Q = set of papers authored by a1
D = set of papers authored by a2
• Simple matching: $|Q \cap D|$
• Dice’s coefficient: $\frac{2|Q \cap D|}{|Q| + |D|}$
• Jaccard’s coefficient: $\frac{|Q \cap D|}{|Q \cup D|}$
• Cosine coefficient: $\frac{|Q \cap D|}{|Q|^{1/2}\,|D|^{1/2}}$
• Overlap coefficient: $\frac{|Q \cap D|}{\min(|Q|, |D|)}$
[Figure: bipartite network of authors a1, a2, a3 and papers p1 to p11]
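The five set measures translate directly into Python set operations. In this sketch the paper sets are hypothetical (the figure's actual author-paper ties are not recoverable), and returning 0 for empty sets is our assumption.

```python
import math


def similarity_measures(Q, D):
    """Set-overlap similarities between the paper sets Q and D."""
    Q, D = set(Q), set(D)
    inter = len(Q & D)
    union = len(Q | D)
    return {
        "simple_matching": inter,
        "dice": 2 * inter / (len(Q) + len(D)) if (Q or D) else 0.0,
        "jaccard": inter / union if union else 0.0,
        "cosine": inter / math.sqrt(len(Q) * len(D)) if (Q and D) else 0.0,
        "overlap": inter / min(len(Q), len(D)) if (Q and D) else 0.0,
    }


Q = {"p1", "p2", "p3", "p4"}  # hypothetical papers by a1
D = {"p3", "p4", "p5"}        # hypothetical papers by a2
print(similarity_measures(Q, D))
```

With 2 shared papers out of 5 distinct ones, Jaccard gives 0.4 while overlap gives 2/3, because overlap only compares against the smaller author's output.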
Weighted shortest paths
Routes: shortest route from Chicago to Boston
• vertex: intersection
• edge weights: road distances
• alternative weights: expected time traveled, gas consumed…
• usually sum the weights from each segment
[Figure: routes from start to finish. Freeway, 65 mph: 40 miles / 65 mph ≈ 37 minutes. Freeway, 70 mph: 30 miles / 70 mph ≈ 26 minutes. Surface road, 25 mph: 50 miles, 2 hours.]
Reliable paths through social networks
The probability of transmitting a message or infectious agent could be related to the strength of the tie; e.g., rather than summing the weights, we might multiply the probabilities of getting through.
Example: the probability of getting an idea through to the head of labs is 0.001 × 1 = 0.001 via the CEO, but 0.5 × 0.5 = 0.25 via your direct manager.
[Figure: tie probabilities p = 0.5, 0.5, 0.001, 1 and 0.05]
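One way to find the most reliable path is to turn products into sums: maximizing ∏ p_e is the same as minimizing Σ −log p_e, and since −log p ≥ 0 for 0 < p ≤ 1, Dijkstra's algorithm applies. This Python sketch assumes probabilities in (0, 1]; the node names reproduce only the two routes in the example and are hypothetical.

```python
import heapq
import math


def most_reliable_path(graph, source, target):
    """Maximize the product of edge probabilities via Dijkstra on -log(p)."""
    dist, prev = {source: 0.0}, {}
    heap, done = [(0.0, source)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, p in graph.get(u, []):
            if not 0 < p <= 1:
                continue  # assumed: only probabilities in (0, 1] are usable
            nd = d - math.log(p)  # -log turns a product into a sum
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return 0.0, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return math.exp(-dist[target]), path[::-1]


graph = {
    "me": [("ceo", 0.001), ("manager", 0.5)],
    "ceo": [("head_of_labs", 1.0)],
    "manager": [("head_of_labs", 0.5)],
}
print(most_reliable_path(graph, "me", "head_of_labs"))
```

As in the example, the route through the manager wins with probability 0.25 versus 0.001 through the CEO.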
Shortest Path Problem
Given a weighted graph and two vertices u and v, we want to find a path of minimum total weight between u and v. Length of a path is the sum of the weights of its edges.
Example: Shortest path between Providence and Honolulu
Applications: Internet packet routing, flight reservations, driving directions
[Figure: flight network on airports PVD, ORD, LGA, MIA, DFW, SFO, LAX and HNL, with distances as edge weights]
slide by: Huajie Zhang, http://www.cs.unb.ca/courses/cs3913/
Negative weights
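Shortest paths are undefined when edges with negative weights form a negative cycle (a cycle whose total weight is negative): going around the cycle once more always yields a shorter path.
[Figure: small example graph with edge weights 2, -3, 4 and 3]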
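Negative cycles can be detected with the Bellman-Ford algorithm, a standard technique not named on the slides. This Python sketch assumes directed edges given as (u, v, weight); in an undirected graph, any negative edge already forms a negative cycle u → v → u.

```python
import math


def bellman_ford(nodes, edges, source):
    """Shortest distances from source, or ValueError on a reachable negative cycle."""
    dist = {v: math.inf for v in nodes}
    dist[source] = 0
    for _ in range(len(nodes) - 1):  # a shortest path uses at most N - 1 edges
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break
    for u, v, w in edges:  # any further improvement means a negative cycle
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from source")
    return dist


print(bellman_ford(["a", "b", "c"], [("a", "b", 4), ("a", "c", 2), ("c", "b", -3)], "a"))
```

A negative edge on its own is fine here (b ends at distance −1), but adding edges that close a cycle of total weight below zero makes the function raise.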
Shortest Path Properties
Property 1: A subpath of a shortest path is itself a shortest path.
Property 2: There is a tree of shortest paths from a start vertex to all the other vertices.
Example: Tree of shortest paths from Providence
[Figure: the same flight network, showing the tree of shortest paths from PVD]
Dijkstra’s Algorithm
• The distance of a vertex v from a vertex s is the length of a shortest path between s and v.
• Dijkstra’s algorithm computes the distances of all the vertices from a given start vertex s.
• Assumptions:
  • the graph is connected
  • the edges are undirected
  • the edge weights are nonnegative
• We grow a “cloud” of vertices, beginning with s and eventually covering all the vertices.
• We store with each vertex v a label d(v) representing the distance of v from s in the subgraph consisting of the cloud and its adjacent vertices.
• At each step:
  • we add to the cloud the vertex u outside the cloud with the smallest distance label, d(u);
  • we update the labels of the vertices adjacent to u.
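A minimal Python sketch of the cloud-growing procedure. Using a binary heap (Python's `heapq`) as the priority queue is our choice; stale heap entries are skipped instead of decreasing keys in place. The edge list in the usage example is our reading of the garbled example figure in these notes and may not match the original drawing exactly.

```python
import heapq


def dijkstra(adj, s):
    """Distances from s; adj maps vertex -> list of (neighbor, nonnegative weight)."""
    d = {v: float("inf") for v in adj}  # distance labels d(v)
    d[s] = 0
    cloud = set()
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if u in cloud:
            continue  # stale entry: u already joined the cloud with a smaller label
        cloud.add(u)  # u has the smallest label outside the cloud
        for z, w in adj[u]:
            # edge relaxation: d(z) <- min{d(z), d(u) + weight(e)}
            if z not in cloud and du + w < d[z]:
                d[z] = du + w
                heapq.heappush(heap, (d[z], z))
    return d


def undirected(edges):
    """Build an adjacency list from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


# Hypothetical reading of the example graph (vertices A to F)
edges = [("A", "B", 8), ("A", "C", 2), ("A", "D", 4), ("B", "C", 7), ("C", "D", 1),
         ("B", "E", 2), ("C", "E", 3), ("C", "F", 9), ("D", "F", 5)]
print(dijkstra(undirected(edges), "A"))
```

On this graph the result is A = 0, B = 7, C = 2, D = 3, E = 5, F = 8; note that B's label drops from 8 to 7 only once E joins the cloud.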
Edge Relaxation
Consider an edge e = (u, z) such that:
• u is the vertex most recently added to the cloud
• z is not in the cloud
The relaxation of edge e updates distance d(z) as follows:
d(z) ← min{d(z), d(u) + weight(e)}
Example (s is the start vertex): d(u) = 50 and weight(e) = 10. Before relaxation d(z) = 75; after relaxation d(z) = min{75, 50 + 10} = 60.
Example
[Figure: Dijkstra's algorithm from start vertex A on a graph with vertices A to F and edge weights 8, 4, 7, 1, 2, 5, 2, 3 and 9. Successive labels (B, C, D, E, F): (8, 2, 4, ∞, ∞), then (8, 2, 3, 5, 11), then (8, 2, 3, 5, 8), then (7, 2, 3, 5, 8).]
Example (cont.)
[Figure: B (label 7) and then F (label 8) join the cloud; final distances from A: B = 7, C = 2, D = 3, E = 5, F = 8.]
Minimum spanning trees
Connect all vertices with a single tree
Consider a communications company, such as AT&T or GTE, that needs to build a communication network that connects n different users. The cost of making a link joining i and j is c_ij. What is the minimum cost of connecting all of the users?
[Figure: example network with link costs]
Common assumption: the only links possible are the ones directly joining two nodes.
web.mit.edu/~jorlin/www/15.082/Lectures/16_Spanning_Trees.ppt
Electronic Circuitry
Consider a system with a number of electronic components. In order to make two pins i and j of different components electrically equivalent, one can connect i and j by a wire. How can we connect n different pins in this way, making them all electrically equivalent to each other, so as to minimize the total wire length?
[Figure: pins 1 to 5]
Minimum Cost Spanning Tree Problem
Undirected network G = (N, A).
(i, j) is the same arc as (j, i).
We associate with each arc (i, j) ∈ A a cost c_ij.
A spanning tree T of G is a connected acyclic subgraph that spans all the nodes. A connected graph with n nodes and n – 1 arcs is a spanning tree.
The minimum cost spanning tree problem is to find a spanning tree of minimum cost.
A Minimum Cost Spanning Tree Problem
[Figure: network on nodes 1 to 7 with arc costs]
A Minimum Cost Spanning Tree
[Figure: the same network with a minimum cost spanning tree]
Prim-Jarnik Algorithm
• Vertex-based algorithm
• Grows one tree T, one vertex at a time
• A cloud covers the portion of T already computed
• Label the vertices v outside the cloud with key[v], the minimum weight of an edge connecting v to a vertex in the cloud; key[v] = ∞ if no such edge exists
www.cs.earlham.edu/~celikeb/fall_2005/cs310_aads/lecture_slides/ch23_minimum_spanning_trees.ppt
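A Python sketch of Prim-Jarnik on an undirected weighted graph, using `heapq` as the priority queue (our choice; stale entries are skipped rather than having their keys decreased). The graph in the example is hypothetical.

```python
import heapq


def prim(adj, root):
    """Return MST edges (parent, child, weight); adj maps v -> [(u, w), ...]."""
    key = {v: float("inf") for v in adj}  # key[v] = inf if no edge to the cloud yet
    parent = {v: None for v in adj}
    key[root] = 0
    cloud, tree = set(), []
    heap = [(0, root)]
    while heap:
        k, v = heapq.heappop(heap)
        if v in cloud:
            continue  # stale heap entry
        cloud.add(v)
        if parent[v] is not None:
            tree.append((parent[v], v, k))
        for u, w in adj[v]:
            if u not in cloud and w < key[u]:  # cheaper edge from the cloud to u
                key[u] = w
                parent[u] = v
                heapq.heappush(heap, (w, u))
    return tree


def undirected(edges):
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("c", "d", 4), ("b", "d", 5)]
tree = prim(undirected(edges), "a")
print(tree, sum(w for _, _, w in tree))
```

The tree uses edges a-b, b-c and c-d, total cost 7; a-c and b-d are rejected because a cheaper connection reached those vertices first.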
Prim Example
Prim Example (2)
Prim Example (3)
Kruskal's Algorithm
At each step, the algorithm adds the cheapest edge that connects two different trees of the forest.
MST-Kruskal(G, w)
    A ← ∅
    for each vertex v ∈ V[G] do
        Make-Set(v)
    sort the edges of E by non-decreasing weight w
    for each edge (u, v) ∈ E, in order by non-decreasing weight do
        if Find-Set(u) ≠ Find-Set(v) then
            A ← A ∪ {(u, v)}
            Union(u, v)
    return A
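A runnable Python version of the MST-Kruskal pseudocode, with a small union-find structure standing in for Make-Set, Find-Set and Union (union by rank plus path halving are our implementation choices). The example graph is hypothetical.

```python
def kruskal(vertices, edges):
    """Minimum spanning forest; edges are (u, v, weight) triples."""
    parent = {v: v for v in vertices}  # Make-Set(v)
    rank = {v: 0 for v in vertices}

    def find(v):  # Find-Set with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(a, b):  # Union by rank
        ra, rb = find(a), find(b)
        if rank[ra] < rank[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        if rank[ra] == rank[rb]:
            rank[ra] += 1

    A = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):  # non-decreasing weight
        if find(u) != find(v):  # u and v lie in different trees
            A.append((u, v, w))
            union(u, v)
    return A


edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("c", "d", 4), ("b", "d", 5)]
print(kruskal(["a", "b", "c", "d"], edges))
```

Edge a-c (weight 3) is skipped because a and c are already in the same tree after a-b and b-c are added.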
Kruskal Example
Kruskal Example (2)
Kruskal Example (3)
Kruskal Example (4)
Network flow
Applications:
• traffic & transportation: maximum number of cars that can commute from Berkeley to San Francisco during rush hour
• fluid networks: pipes that carry liquids
• computer networks: packets traveling along fiber
Extended applications (from Kleinberg & Tardos, “Algorithm Design”):
• bipartite matching problem
• number of disjoint paths between two vertices
• survey design
• airline scheduling
• image segmentation
• baseball elimination
Max flow problem: how much stuff can we get from source to sink per unit time?
[Figure: network from a source to a sink; edge labels are capacities]
www.comp.nus.edu.sg/~ooiwt/slides/2004-cs3233-graph2.ppt
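Equivalent tasks (by the max-flow min-cut theorem, the value of a maximum flow equals the capacity of a minimum cut):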
Find a cut with minimum capacity
Find maximum flow from source to sink
A Flow
[Figure: a flow on the example network and the corresponding residual graph]
Augmenting Paths
A path from source to sink in the residual graph of a given flow
If there is an augmenting path in the residual graph, we can push more flow
Ford-Fulkerson Method
initialize total flow to 0
residual graph G' = G
while an augmenting path exists in G':
    pick an augmenting path P in G'
    m = bottleneck capacity of P
    add m to total flow
    push flow of m along P
    update G'
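A Python sketch of the Ford-Fulkerson method. Picking each augmenting path by breadth-first search (the Edmonds-Karp variant) is our choice; the method above leaves the path choice open. Capacities are assumed to be given as nested dicts, and the network in the example is hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """capacity[u][v] = capacity of edge u -> v; returns the max flow value."""
    residual = {source: {}, sink: {}}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)  # reverse edge, initially 0
    total = 0
    while True:
        # BFS for an augmenting path in the residual graph G'
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left
        # m = bottleneck capacity of the path
        m, v = float("inf"), sink
        while parent[v] is not None:
            m = min(m, residual[parent[v]][v])
            v = parent[v]
        # push m along the path and update G'
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= m
            residual[v][u] += m
            v = u
        total += m


capacity = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
print(max_flow(capacity, "s", "t"))
```

The result is 5, matching the capacity of the cut around the source (3 + 2), as the max-flow min-cut theorem predicts.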
Example
[Figures: step-by-step run of the Ford-Fulkerson method on an example network; each step picks an augmenting path in the residual graph, pushes its bottleneck capacity, and updates the residual graph]
Answer: Max Flow = 4
[Figure: the final flow on the example network]
Answer: Minimum Cut = 4
[Figure: a minimum cut of capacity 4 on the original network]
Project status report
• Worth 5% of your grade; meant to keep you on track.
• 2-3 weeks later: in-class presentation.
• 1 month later: final project report due.
What it should do:
• include part of your project proposal as an intro
• include result summaries (including figures & tables)
• be 4-6 pages
• include references to, and briefly (a paragraph or 2) discuss, some related work
• include a plan of remaining work
It is graded on a 0-5 scale:
• 5: same as 4, but very complete and already shows interesting new insights
• 4: data, more than basic analysis (e.g. looked at robustness, community structure, centrality, etc. if applicable)
• 3: some data, preliminary analysis (imported data into Pajek or GUESS, counted things up, visualized, if possible)
• 2: some data, no results
• 1: attempts made to get the project started, but nothing worked out (no data, no results)
• 0: no work done
GUESS installation
Windows:
• unzip the files into a folder
• edit guess.bat (a batch executable file) so that
  @rem set GUESS_HOME=c:\program files\GUESS
  becomes
  @set GUESS_HOME=C:\PROGRA~1\GUESS
  if you installed into c:\Program Files\GUESS
• else you can try installing into a directory with no spaces in the name and have (e.g.)
  @set GUESS_HOME=C:\apps\GUESS