School of Information University of Michigan SI 614 Directed & weighted networks, minimum spanning trees, flow Lecture 12 Instructor: Lada Adamic


Outline

• directed networks
• prestige
• weighted networks
• minimum spanning trees
• flow

Comparing across these 3 centrality values
• Generally, the 3 centrality types will be positively correlated.
• When they are not (or only weakly) correlated, it probably tells you something interesting about the network.

High degree, low closeness: embedded in a cluster that is far from the rest of the network.

High degree, low betweenness: ego's connections are redundant; communication bypasses him/her.

High closeness, low degree: key player tied to important/active alters.

High closeness, low betweenness: probably multiple paths in the network; ego is near many people, but so are many others.

High betweenness, low degree: ego's few ties are crucial for network flow.

High betweenness, low closeness: very rare cell. It would mean that ego monopolizes the ties from a small number of people to many others.

Review of centrality in undirected networks: comparison

slide: Jim Moody

Bonacich Power Centrality: an actor's centrality (prestige) is a function of the prestige of those they are connected to. Thus, actors who are tied to very central actors should have higher prestige/centrality than those who are not.

$$C(\alpha, \beta) = \alpha\,(I - \beta R)^{-1} R\,\mathbf{1}$$

• α is a scaling constant, which is set to normalize the score.
• β reflects the extent to which you weight the centrality of the people ego is tied to.

• R is the adjacency matrix (can be valued).
• I is the identity matrix (1s down the diagonal).
• 1 is a column vector of all ones.

Centrality in Social Networks: Power / Eigenvalue

slide: Jim Moody

Bonacich Power Centrality:

The magnitude of β reflects the radius of power. Small values of β weight local structure; larger values weight global structure.

If β is positive, then ego has higher centrality when tied to people who are central.

If β is negative, then ego has higher centrality when tied to people who are not central.

As β approaches zero, you get degree centrality.
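For a small network the formula can be evaluated directly. Below is a minimal pure-Python sketch; the function name `bonacich_power` is our own (hypothetical). Instead of forming the inverse, it solves (I − βR)c = R1 by Gauss-Jordan elimination, so it assumes I − βR is invertible (this holds, for instance, when |β| is below 1 over the largest eigenvalue of R). When α is not given, it picks α so that the squared scores sum to n, which is one common normalization choice and an assumption here.

```python
def bonacich_power(adj, beta, alpha=None):
    """Bonacich power centrality c = alpha * (I - beta*R)^-1 R 1.

    adj:   square adjacency matrix R as a list of lists (may be valued)
    beta:  weight on the centrality of the vertices ego is tied to
    alpha: scaling constant; if None, chosen so that sum(c_i^2) == n
           (one common normalization, assumed here)
    """
    n = len(adj)
    rhs = [float(sum(row)) for row in adj]  # R·1: (valued) degree of each vertex
    # Augmented system [I - beta*R | R·1], solved by Gauss-Jordan elimination
    aug = [[(1.0 if i == j else 0.0) - beta * adj[i][j] for j in range(n)] + [rhs[i]]
           for i in range(n)]
    for col in range(n):
        # partial pivoting: use the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("I - beta*R is singular for this beta")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    c = [aug[i][n] / aug[i][i] for i in range(n)]
    if alpha is None:
        norm = sum(x * x for x in c)
        alpha = (n / norm) ** 0.5 if norm > 0 else 1.0
    return [alpha * x for x in c]


# A star: vertex 0 is the center, vertices 1-3 are leaves
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
print(bonacich_power(star, 0.0, alpha=1.0))  # [3.0, 1.0, 1.0, 1.0], i.e. degree
pos = bonacich_power(star, 0.2, alpha=1.0)
neg = bonacich_power(star, -0.2, alpha=1.0)
print(pos[0] / pos[1], neg[0] / neg[1])      # center-to-leaf ratio: about 2.25 vs about 6
```

With β = 0 the scores reduce to degree. With β < 0 the center's advantage over the leaves grows, because its alters are peripheral: being tied to people who are not central counts as power.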


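[Figure: Bonacich power centrality scores (0 to 2) for vertices 1 to 7 of an example network, computed with positive and negative β; β = 0.23. Slide: Jim Moody]

[Figure: Bonacich power centrality, β = .35 vs. β = -.35. Slide: Jim Moody]

[Figure: Bonacich power centrality, β = .23 vs. β = -.23. Slide: Jim Moody]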

Examples of directed networks?

• WWW
• food webs
• population dynamics
• influence
• hereditary
• citation
• transcription regulation networks
• neural networks

Prestige in directed social networks

When ‘prestige’ may be the right word: admiration, influence, gift-giving, trust.

Directionality is especially important in instances where ties may not be reciprocated (e.g., a dining-partner choice network).

When ‘prestige’ may not be the right word: gives advice to (can reverse direction), gives orders to (can reverse direction), lends money to (can reverse direction), dislikes, distrusts.

Extensions of undirected degree centrality: prestige

degree centrality → indegree centrality

A paper that is cited by many others has high prestige. A person nominated by many others for a reward has high prestige.
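Indegree prestige is just a count of incoming ties. A minimal sketch, assuming vertices are numbered 0..n-1 and each tie is a directed pair (u, v) meaning u → v; the function name and the choice to ignore self-loops are our own assumptions. Normalizing by n − 1 gives the share of other vertices that chose i.

```python
def indegree_prestige(n, edges, normalize=True):
    """Indegree prestige for vertices 0..n-1.

    edges: directed pairs (u, v) meaning u -> v (u cites / nominates v).
    normalize: divide by n - 1, the largest possible indegree without self-loops.
    """
    indeg = [0] * n
    for u, v in edges:
        if u != v:              # self-nominations are ignored (assumption)
            indeg[v] += 1
    if normalize and n > 1:
        return [d / (n - 1) for d in indeg]
    return indeg


citations = [(0, 2), (1, 2), (2, 0)]    # paper 0 cites paper 2, and so on
print(indegree_prestige(3, citations))  # [0.5, 0.0, 1.0]

# For "lends money to" style ties, reverse each edge first (like a transpose)
loans = [(0, 1), (0, 2)]                # 0 lends money to 1 and to 2
print(indegree_prestige(3, [(v, u) for u, v in loans], normalize=False))  # [2, 0, 0]
```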

Extensions of undirected closeness centrality

In a directed network, closeness centrality usually implies that all paths should lead to you,

and less commonly that paths should lead from you to everywhere else.

We usually consider only the vertices from which the node i in question can be reached.

Influence range

The influence range of i is the set of vertices from which node i can be reached.
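The influence range (input domain) can be found with a breadth-first search from i that follows edges backwards. A minimal sketch, assuming directed pairs (u, v) meaning u → v; `input_domain` and its optional `max_dist` limit are our own illustrative names, loosely mirroring the option of limiting the domain to neighbors within some distance.

```python
from collections import deque


def input_domain(n, edges, i, max_dist=None):
    """Vertices from which vertex i can be reached along directed edges u -> v.

    Runs BFS from i over reversed edges. max_dist optionally limits the search
    to vertices within that many steps (None = no limit).
    Returns a dict {vertex: distance to i}, excluding i itself.
    """
    preds = [[] for _ in range(n)]
    for u, v in edges:
        preds[v].append(u)              # reverse each edge
    dist = {i: 0}
    queue = deque([i])
    while queue:
        x = queue.popleft()
        if max_dist is not None and dist[x] >= max_dist:
            continue                    # do not expand beyond the limit
        for p in preds[x]:
            if p not in dist:
                dist[p] = dist[x] + 1
                queue.append(p)
    del dist[i]
    return dist


edges = [(0, 1), (1, 2), (3, 2), (2, 4)]
print(input_domain(5, edges, 2))              # {1: 1, 3: 1, 0: 2}
print(len(input_domain(5, edges, 2)))         # size of the input domain: 3
print(input_domain(5, edges, 2, max_dist=1))  # {1: 1, 3: 1}
```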

Extending betweenness centrality to directed networks

We now consider the fraction of all directed shortest paths (geodesics) between any two vertices that pass through a node.

Only modification: when normalizing, we divide by (N-1)(N-2) instead of (N-1)(N-2)/2, because there are twice as many ordered pairs as unordered pairs.

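$$C_B(n_i) = \sum_{j \neq k,\; j,k \neq i} \frac{g_{jk}(n_i)}{g_{jk}}$$

where C_B(n_i) is the betweenness of vertex i, g_{jk}(n_i) is the number of shortest paths from j to k that pass through i, and g_{jk} is the number of all shortest paths from j to k. Normalized:

$$C'_B(n_i) = \frac{C_B(n_i)}{(N-1)(N-2)}$$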
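For unweighted directed graphs, the sum over ordered pairs can be computed with one BFS per source plus a backward accumulation of path fractions. The sketch below swaps in Brandes' accumulation method (a standard technique, not something stated on the slide) and assumes vertices 0..n-1 with directed pairs (u, v) meaning u → v; the function name is hypothetical. It divides by (N-1)(N-2) as above.

```python
from collections import deque


def directed_betweenness(n, edges, normalize=True):
    """Betweenness of vertices 0..n-1 in an unweighted directed graph.

    For every ordered pair (j, k), adds the fraction of shortest j -> k paths
    that pass through i (Brandes' accumulation). Optionally divides by
    (N-1)(N-2), the number of ordered pairs not involving i.
    """
    succ = [[] for _ in range(n)]
    for u, v in edges:
        succ[u].append(v)
    cb = [0.0] * n
    for s in range(n):
        # BFS from s: count shortest paths (sigma) and record predecessors
        sigma = [0] * n
        sigma[s] = 1
        dist = [-1] * n
        dist[s] = 0
        preds = [[] for _ in range(n)]
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in succ[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest vertices first
        delta = [0.0] * n
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    if normalize and n > 2:
        cb = [c / ((n - 1) * (n - 2)) for c in cb]
    return cb


print(directed_betweenness(3, [(0, 1), (1, 2)]))          # [0.0, 0.5, 0.0]
# Directed cycle 0 -> 1 -> 2 -> 0: vertex 1 lies on the geodesic 0 -> 2,
# but not on 2 -> 0 (a direct edge), so each vertex gets 1 pair out of 2
print(directed_betweenness(3, [(0, 1), (1, 2), (2, 0)]))  # [0.5, 0.5, 0.5]
```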

Directed geodesics

A node does not necessarily lie on a geodesic from j to k if it lies on a geodesic from k to j


Prestige in Pajek

Calculating the indegree prestige: Net>Partition>Degree>Input
• to view, select File>Partition>Edit
• if you need to reverse the direction of each tie first (e.g., lends money to -> borrows from): Net>Transform>Transpose

Influence range (a.k.a. input domain): Net>k-Neighbours>Input
• enter the number of the vertex, and 0 to consider all vertices that eventually lead to your chosen vertex
• to find out the size of the input domain, select Info>Partition

Calculate the size of the input domains for all vertices: Net>Partitions>Domain>Input (you can also limit this to only neighbors within some distance)

Proximity prestige in Pajek

Direct nominations (choices) should count more than indirect ones

Nominations from second degree neighbors should count more than third degree ones

So consider proximity prestige

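$$C_p(n_i) = \frac{I_i/(N-1)}{\sum_{j \in D_i} d(j, i)/I_i}$$

where D_i is i's input domain and I_i is the number of vertices in it: the numerator is the fraction of all (other) vertices that are in i's input domain, and the denominator is the average distance between i and the vertices in its input domain.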
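A minimal sketch of the ratio above, assuming directed pairs (u, v) meaning u → v and unweighted distances; `proximity_prestige` is a hypothetical name, and returning 0 for an empty input domain is our own convention. It shows the intended effect: direct nominations raise the score more than indirect ones.

```python
from collections import deque


def proximity_prestige(n, edges, i):
    """(fraction of other vertices in i's input domain) /
    (average distance from those vertices to i).

    edges: directed pairs (u, v), u -> v. Returns 0.0 if nobody reaches i.
    """
    preds = [[] for _ in range(n)]
    for u, v in edges:
        preds[v].append(u)
    dist = {i: 0}
    queue = deque([i])
    while queue:                        # BFS backwards from i
        x = queue.popleft()
        for p in preds[x]:
            if p not in dist:
                dist[p] = dist[x] + 1
                queue.append(p)
    domain = [d for v, d in dist.items() if v != i]
    if not domain or n < 2:
        return 0.0
    fraction = len(domain) / (n - 1)
    avg_dist = sum(domain) / len(domain)
    return fraction / avg_dist


chain = [(0, 1), (1, 2)]                          # 0 chooses 1, 1 chooses 2
print(round(proximity_prestige(3, chain, 2), 3))  # 0.667: one direct, one indirect choice
direct = [(0, 2), (1, 2)]                         # both choose 2 directly
print(proximity_prestige(3, direct, 2))           # 1.0
```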

Weighted networks

Examples: email communication, sports matches, packet transfer, population movement, co-authorship, food webs

Weighted treatment of data/algorithms is usually left for ‘future work’

But what are weights good for?

• defining thresholds
• shortest paths that don’t take long
• flow/capacity of a network

Food webs

Food webs are usually considered as binary networks
• problems in defining a threshold (fluxes): do killer whales who eat bears count?
• weights:
  - interaction frequency: acts of predation per hectare per day
  - carbon flow (prey to predator): grams of carbon per square meter per year
  - interaction strength (predator on prey): (carbon flow from prey to predator) / (biomass of predator)

[Figure: lake carbon flow]

Co-authorship networks

The weight assigned to each edge is a sum over the papers on which the two people were co-authors, where each paper k contributes 1/(n_k - 1) and n_k is the number of authors of that paper. A large-scale high-energy physics collaboration producing a paper with 100 authors is less evidence of direct collaboration than an article in ‘Social Networks’ with only two co-authors.

Should we normalize, so that all weights from i to other nodes sum to 1? (probably not)

$$w_{ij} = \sum_{k} \frac{1}{n_k - 1}$$

where the sum runs over all papers k on which i and j were co-authors, and n_k is the number of authors of paper k.
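A minimal sketch of this weighting, assuming each paper is given as a list of author names (the input format and `coauthor_weights` are hypothetical). Pairs are stored with the two names in sorted order, and single-author papers contribute nothing.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_weights(papers):
    """w_ij = sum over papers k co-authored by i and j of 1 / (n_k - 1),
    where n_k is the number of authors of paper k.
    papers: iterable of author lists. Returns {(i, j): w_ij} with i < j.
    """
    w = defaultdict(float)
    for authors in papers:
        authors = sorted(set(authors))
        n_k = len(authors)
        if n_k < 2:
            continue                  # single-author papers create no ties
        for a, b in combinations(authors, 2):
            w[(a, b)] += 1.0 / (n_k - 1)
    return dict(w)


papers = [["ann", "bob"], ["ann", "bob", "cy"]]
print(coauthor_weights(papers))
# {('ann', 'bob'): 1.5, ('ann', 'cy'): 0.5, ('bob', 'cy'): 0.5}
```

One consequence of the 1/(n_k - 1) choice: each author spreads a total weight of exactly 1 per multi-author paper across their co-authors, so a 100-author paper adds only 1/99 to each pair.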

Symmetry in normalization

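If normalizing by the sum of values for each node, the weights become asymmetric. For example, if all 3 of i's co-authored papers are with j, while j's weights sum to 15:

w_ij = 3/3 = 1

w_ji = 3/15 = 1/5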
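The asymmetry is easy to reproduce. A minimal sketch, assuming weights stored as nested dicts; the third node "k", standing in for j's other co-authors (weight 12), is hypothetical.

```python
def row_normalize(weights):
    """Divide each node's weights by that node's total weight.
    weights: {node: {neighbor: weight}}, symmetric input."""
    out = {}
    for i, nbrs in weights.items():
        total = sum(nbrs.values())
        out[i] = {j: w / total for j, w in nbrs.items()} if total else dict(nbrs)
    return out


# i's only co-author is j (3 papers); j also has weight 12 with others ("k", hypothetical)
w = {"i": {"j": 3}, "j": {"i": 3, "k": 12}, "k": {"j": 12}}
norm = row_normalize(w)
print(norm["i"]["j"], norm["j"]["i"])   # 1.0 0.2
```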

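Cosine similarity gives symmetric values. Assume the weight for each paper is w_k = 1/(n_k - 1).

• i and j each have vectors of 0's and w's, depending on whether they authored paper k
• normalize by the length of both vectors:

$$S_{i,j} = \frac{\sum_k w_k^2}{|V_i|\,|V_j|}$$

where the sum runs over the papers k authored by both i and j, and |V_i|, |V_j| are the lengths of the two vectors.

[Figure: worked example, assuming simple weighting = number of papers co-authored]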
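A minimal sketch of the cosine measure above, assuming papers are lists of author names; `cosine_coauthor` is a hypothetical name, and giving a single-author paper weight 1 (where 1/(n_k - 1) is undefined) is our own assumption.

```python
import math


def cosine_coauthor(papers, i, j):
    """Cosine similarity of authors i and j.

    Each author is a vector over papers: entry w_k = 1/(n_k - 1) if the author
    wrote paper k, else 0. S_ij = dot(V_i, V_j) / (|V_i| |V_j|).
    """
    def weight(authors):
        n_k = len(set(authors))
        return 1.0 / (n_k - 1) if n_k > 1 else 1.0  # single-author weight: assumption

    vi = [weight(p) if i in p else 0.0 for p in papers]
    vj = [weight(p) if j in p else 0.0 for p in papers]
    dot = sum(a * b for a, b in zip(vi, vj))       # only shared papers contribute w_k^2
    norm = math.sqrt(sum(a * a for a in vi)) * math.sqrt(sum(b * b for b in vj))
    return dot / norm if norm else 0.0


papers = [["i", "x"], ["i", "j", "y"], ["j", "z"]]
print(round(cosine_coauthor(papers, "i", "j"), 3))  # 0.2
```

Unlike per-node normalization, the result is the same whichever author comes first.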

Other similarity measures

Q = set of papers authored by a1; D = set of papers authored by a2.

Simple matching: $|Q \cap D|$

Dice's coefficient: $\frac{2|Q \cap D|}{|Q| + |D|}$

Jaccard's coefficient: $\frac{|Q \cap D|}{|Q \cup D|}$

Cosine coefficient: $\frac{|Q \cap D|}{|Q|^{1/2}\,|D|^{1/2}}$

Overlap coefficient: $\frac{|Q \cap D|}{\min(|Q|, |D|)}$

[Figure: bipartite network of papers p1 to p11 and authors a1, a2, a3]
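The five set-based measures can be computed side by side. A minimal sketch, assuming both paper sets are non-empty; the example sets are hypothetical, since the figure's actual paper lists do not survive.

```python
import math


def similarity_measures(q, d):
    """Set-overlap measures between Q (papers by a1) and D (papers by a2).
    Assumes both sets are non-empty."""
    q, d = set(q), set(d)
    inter = len(q & d)
    return {
        "simple_matching": inter,
        "dice": 2 * inter / (len(q) + len(d)),
        "jaccard": inter / len(q | d),
        "cosine": inter / math.sqrt(len(q) * len(d)),
        "overlap": inter / min(len(q), len(d)),
    }


Q = {"p1", "p2", "p3"}          # hypothetical papers by a1
D = {"p2", "p3", "p4", "p5"}    # hypothetical papers by a2
s = similarity_measures(Q, D)
print(s["simple_matching"], s["jaccard"])   # 2 0.4
```

The normalized measures differ only in what they divide the shared count by: Jaccard penalizes every non-shared paper, while overlap ignores the larger author's extra papers entirely.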

Weighted shortest paths

Routes: shortest route from Chicago to Boston
• vertex: intersection
• edge weights: road distances
• alternative weights: expected time traveled, gas consumed, …
• usually sum the weights from each segment

[Figure: route segments from start to finish: freeway at 65 mph, 40 miles/65 mph ≈ 37 minutes; freeway at 70 mph, 30 miles/70 mph ≈ 26 minutes; surface road at 25 mph, 50 miles ≈ 2 hours]

Reliable paths through social networks

The probability of transmitting a message or infectious agent could be related to the strength of the tie; e.g., rather than summing the weights, we might multiply the probabilities of getting through.

[Figure: network with tie probabilities p = 0.5, p = 0.5, p = 0.001, p = 1, p = 0.05]

Probability of getting an idea through to the head of labs:
• via the CEO: 0.001 × 1 = 0.001
• via your direct manager: 0.5 × 0.5 = 0.25
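Because the logarithm turns products into sums, the most reliable path (largest product of probabilities) is the shortest path under weights -log p, which are nonnegative for 0 < p ≤ 1. A minimal sketch that runs Dijkstra's algorithm on those weights; the node names are hypothetical labels for the example, and only the four ties with known endpoints are included.

```python
import heapq
import math


def most_reliable_path(graph, source, target):
    """Path maximizing the product of edge probabilities.

    Maximizing prod(p) == minimizing sum(-log p), so this runs Dijkstra
    on -log p. graph: {u: {v: p}}. Returns (probability, path), or
    (0.0, []) if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, p in graph.get(u, {}).items():
            if p <= 0:
                continue                      # a tie that never transmits
            nd = d - math.log(p)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in done:
        return 0.0, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return math.exp(-dist[target]), path[::-1]


graph = {"me": {"manager": 0.5, "CEO": 0.001},
         "manager": {"head": 0.5},
         "CEO": {"head": 1.0}}
prob, path = most_reliable_path(graph, "me", "head")
print(round(prob, 3), path)   # 0.25 ['me', 'manager', 'head']
```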

Shortest Path Problem

Given a weighted graph and two vertices u and v, we want to find a path of minimum total weight between u and v. Length of a path is the sum of the weights of its edges.

Example: Shortest path between Providence and Honolulu

Applications: Internet packet routing, flight reservations, driving directions

[Figure: flight network among PVD, ORD, LGA, MIA, DFW, SFO, LAX, and HNL, with distances as edge weights]

slide by: Huajie Zhang, http://www.cs.unb.ca/courses/cs3913/


Negative weights

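Shortest paths are undefined if a negative cycle (a cycle whose edge weights sum to less than zero) can be used along the way: going around it once more always yields a shorter path.

[Figure: example graph with edge weights 2, -3, 4, 3]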

Shortest Path Properties

Property 1: A subpath of a shortest path is itself a shortest path.

Property 2: There is a tree of shortest paths from a start vertex to all the other vertices.

Example: tree of shortest paths from Providence

[Figure: tree of shortest paths from PVD in the flight network]



Dijkstra’s Algorithm

The distance of a vertex v from a vertex s is the length of a shortest path between s and v

Dijkstra’s algorithm computes the distances of all the vertices from a given start vertex s

Assumptions:
• the graph is connected
• the edges are undirected
• the edge weights are nonnegative

We grow a “cloud” of vertices, beginning with s and eventually covering all the vertices

We store with each vertex v a label d(v) representing the distance of v from s in the subgraph consisting of the cloud and its adjacent vertices

At each step:
• we add to the cloud the vertex u outside the cloud with the smallest distance label, d(u)
• we update the labels of the vertices adjacent to u
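The cloud-growing loop above can be sketched with a binary heap (Python's `heapq`) holding candidate labels. Rather than decreasing a key in place, this version pushes a new entry whenever a label improves and skips stale entries when they are popped, a common implementation choice rather than something from the slide. The six-vertex test graph and its edge weights are our own assumption.

```python
import heapq


def dijkstra(graph, s):
    """Distance labels d(v) from s for nonnegative edge weights.

    graph: {u: {v: weight}}; list each undirected edge in both directions.
    """
    d = {s: 0}
    cloud = set()
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if u in cloud:
            continue                # stale entry: u already has its final label
        cloud.add(u)                # u has the smallest label outside the cloud
        for z, w in graph.get(u, {}).items():
            # edge relaxation: d(z) <- min{d(z), d(u) + weight(e)}
            if z not in cloud and du + w < d.get(z, float("inf")):
                d[z] = du + w
                heapq.heappush(heap, (d[z], z))
    return d


edges = [("A", "B", 8), ("A", "C", 2), ("A", "D", 4), ("B", "C", 7), ("C", "D", 1),
         ("B", "E", 2), ("C", "E", 3), ("C", "F", 9), ("D", "F", 5)]
graph = {}
for u, v, w in edges:
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 7, 'C': 2, 'D': 3, 'E': 5, 'F': 8}
```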



Edge Relaxation

Consider an edge e = (u, z) such that
• u is the vertex most recently added to the cloud
• z is not in the cloud

The relaxation of edge e updates distance d(z) as follows:

d(z) ← min{d(z), d(u) + weight(e)}

[Figure: with d(u) = 50 and weight(e) = 10, relaxing e lowers d(z) from 75 to 60]
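The update rule is a single comparison. A minimal sketch, assuming labels are kept in a dict where a missing vertex means a label of infinity; `relax` is a hypothetical helper name, and the numbers are the ones from the figure.

```python
def relax(d, u, z, weight):
    """Relax edge e = (u, z): d(z) <- min{d(z), d(u) + weight(e)}.
    d maps vertices to current labels (missing = infinity).
    Returns True if the label of z decreased."""
    candidate = d[u] + weight
    if candidate < d.get(z, float("inf")):
        d[z] = candidate
        return True
    return False


labels = {"s": 0, "u": 50, "z": 75}
relax(labels, "u", "z", 10)
print(labels["z"])   # 60
```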



Example

[Figure: snapshots of Dijkstra's algorithm run from A on a six-vertex graph (vertices A to F), showing the distance labels after each vertex is added to the cloud]

Example (cont.)

[Figure: final snapshots; every vertex is in the cloud and the distance labels are 0, 2, 3, 5, 7, and 8]



Minimum spanning trees

Connect all vertices with a single tree

Consider a communications company, such as AT&T or GTE, that needs to build a communication network that connects n different users. The cost of making a link joining i and j is c_ij. What is the minimum cost of connecting all of the users?

[Figure: example network with link costs]

Common assumption: the only links possible are the ones directly joining two nodes.

web.mit.edu/~jorlin/www/15.082/Lectures/16_Spanning_Trees.ppt

Electronic Circuitry

Consider a system with a number of electronic components. In order to make two pins i and j of different components electrically equivalent, one can connect i and j by a wire. How can we connect n different pins in this way, making them all electrically equivalent to each other, while minimizing the total wire length?

[Figure: five pins, numbered 1 to 5]


Minimum Cost Spanning Tree Problem

Undirected network G = (N, A).

(i, j) is the same arc as (j, i).

We associate with each arc (i, j) ∈ A a cost c_ij.

A spanning tree T of G is a connected acyclic subgraph that spans all the nodes. A connected graph with n nodes and n - 1 arcs is a spanning tree.

The minimum cost spanning tree problem is to find a spanning tree of minimum cost.


A Minimum Cost Spanning Tree Problem

[Figure: network with nodes 1 to 7 and a cost on each arc]


A Minimum Cost Spanning Tree

[Figure: a minimum cost spanning tree of the same network]


Prim-Jarnik Algorithm

• vertex-based algorithm
• grows one tree T, one vertex at a time
• a cloud covers the portion of T already computed
• label each vertex v outside the cloud with key[v], the minimum weight of an edge connecting v to a vertex in the cloud; key[v] = ∞ if no such edge exists

www.cs.earlham.edu/~celikeb/fall_2005/cs310_aads/lecture_slides/ch23_minimum_spanning_trees.ppt
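A minimal sketch of Prim-Jarnik growth, assuming a connected undirected graph stored as {u: {v: weight}} with every vertex present as a key and vertices that compare with each other (e.g., integers). Instead of storing key[v] explicitly, the heap holds candidate edges into the cloud; the smallest surviving entry for v plays the role of key[v]. The function name and test graph are hypothetical.

```python
import heapq


def prim_mst(graph, root):
    """Prim-Jarnik MST of a connected undirected graph {u: {v: weight}}.

    Heap entries are (weight, v, parent); the smallest entry for v is key[v].
    Entries for vertices already in the cloud are skipped (lazy deletion).
    Returns the tree edges as (parent, v, weight).
    """
    in_tree = {root}
    tree = []
    heap = [(w, v, root) for v, w in graph[root].items()]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(graph):
        w, v, parent = heapq.heappop(heap)
        if v in in_tree:
            continue
        in_tree.add(v)                      # cheapest edge leaving the cloud
        tree.append((parent, v, w))
        for x, wx in graph[v].items():
            if x not in in_tree:
                heapq.heappush(heap, (wx, x, v))
    return tree


edges = [(0, 1, 1), (1, 2, 2), (2, 3, 1), (0, 3, 4), (0, 2, 3)]
graph = {}
for u, v, w in edges:
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w
tree = prim_mst(graph, 0)
print(tree)                          # [(0, 1, 1), (1, 2, 2), (2, 3, 1)]
print(sum(w for _, _, w in tree))    # 4
```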

[Figure: Prim example, steps 1 to 3]


Kruskal's Algorithm

The algorithm adds the cheapest edge that connects two trees of the forest

MST-Kruskal(G, w)
    A ← ∅
    for each vertex v ∈ V[G] do
        Make-Set(v)
    sort the edges of E by non-decreasing weight w
    for each edge (u, v) ∈ E, in order by non-decreasing weight do
        if Find-Set(u) ≠ Find-Set(v) then
            A ← A ∪ {(u, v)}
            Union(u, v)
    return A
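A runnable version of the MST-Kruskal pseudocode, as a minimal sketch: a dictionary-based union-find stands in for Make-Set, Find-Set, and Union (the path-halving trick in `find` is our own implementation choice). Edges are (weight, u, v) tuples; the test graph is hypothetical.

```python
def kruskal_mst(vertices, edges):
    """Kruskal's algorithm. edges: list of (weight, u, v).
    Returns the tree edges A as (u, v, weight)."""
    parent = {v: v for v in vertices}       # Make-Set(v) for every vertex

    def find(v):                            # Find-Set with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(edges):           # non-decreasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                        # u and v are in different trees
            tree.append((u, v, w))          # A <- A ∪ {(u, v)}
            parent[ru] = rv                 # Union(u, v)
    return tree


edges = [(1, 0, 1), (2, 1, 2), (1, 2, 3), (4, 0, 3), (3, 0, 2)]
tree = kruskal_mst(range(4), edges)
print(tree)                          # [(0, 1, 1), (2, 3, 1), (1, 2, 2)]
print(sum(w for _, _, w in tree))    # 4
```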


[Figure: Kruskal example, steps 1 to 4]


Network flow

Applications:
• traffic & transportation: maximum number of cars that can commute from Berkeley to San Francisco during rush hour
• fluid networks: pipes that carry liquids
• computer networks: packets traveling along fiber

Extended applications (from Kleinberg & Tardos, “Algorithm Design”): bipartite matching problem, number of disjoint paths between two vertices, survey design, airline scheduling, image segmentation, baseball elimination

Max flow problem: how much stuff can we get from source to sink per unit time?

[Figure: flow network from source to sink; each edge is labeled with its capacity (e.g., 7)]

www.comp.nus.edu.sg/~ooiwt/slides/2004-cs3233-graph2.ppt

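Equivalent tasks:
• find a cut with minimum capacity
• find the maximum flow from source to sink

By the max-flow min-cut theorem, the value of a maximum flow equals the capacity of a minimum cut.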


A Flow

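[Figure: a flow in an example network and the corresponding residual graph]

In the residual graph, an edge of capacity c carrying flow f is represented by a forward edge with remaining capacity c - f and a backward edge with capacity f, so that flow can later be pushed back.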

Augmenting Paths

A path from source to sink in the residual graph of a given flow

If there is an augmenting path in the residual graph, we can push more flow


Ford-Fulkerson Method

initialize total flow to 0
residual graph G’ = G
while an augmenting path exists in G’:
    pick an augmenting path P in G’
    m = bottleneck capacity of P
    add m to total flow
    push flow of m along P
    update G’
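The method can be sketched in Python as follows. This version finds each augmenting path with breadth-first search, which is the Edmonds-Karp choice (our assumption; the method itself allows any augmenting path). It also returns the vertices still reachable from the source in the final residual graph: the edges leaving that set form a minimum cut, which illustrates why the two tasks are equivalent. The function name and example capacities are hypothetical.

```python
from collections import deque


def max_flow_min_cut(capacity, s, t):
    """Ford-Fulkerson method with BFS-chosen augmenting paths.

    capacity: {u: {v: c}} for directed edges.
    Returns (max flow value, source side of a minimum cut).
    """
    # Residual graph G': forward capacities, plus reverse edges starting at 0
    res = {}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            res.setdefault(u, {})
            res[u][v] = res[u].get(v, 0) + c
            res.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # look for an augmenting path P from s to t in G'
        prev = {s: None}
        queue = deque([s])
        while queue and t not in prev:
            u = queue.popleft()
            for v, c in res.get(u, {}).items():
                if c > 0 and v not in prev:
                    prev[v] = u
                    queue.append(v)
        if t not in prev:
            break                               # no augmenting path left
        path = []
        v = t
        while prev[v] is not None:
            path.append((prev[v], v))
            v = prev[v]
        m = min(res[u][v] for u, v in path)     # bottleneck capacity of P
        for u, v in path:                       # push m along P, update G'
            res[u][v] -= m
            res[v][u] += m
        total += m
    # prev now holds every vertex reachable from s in the final residual graph
    return total, set(prev)


cap = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
flow, side = max_flow_min_cut(cap, "s", "t")
cut = sum(c for u in side for v, c in cap.get(u, {}).items() if v not in side)
print(flow, side, cut)   # 5 {'s'} 5
```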


Example

[Figure: step-by-step run of the Ford-Fulkerson method on an example network; each step finds an augmenting path in the residual graph and pushes its bottleneck capacity]


Answer: Max Flow = 4

[Figure: final flow on each edge of the example network]


Answer: Minimum Cut = 4

[Figure: a minimum cut of capacity 4 in the example network]


project status report

• worth 5% of your grade, meant to keep you on track
• 2-3 weeks later: in-class presentation
• 1 month later: final project report due

What it should do:
• include part of your project proposal as intro
• include result summaries (including figures & tables)
• be 4-6 pages
• include references to and briefly (a paragraph or 2) discuss some related work
• include a plan of remaining work

It is graded on a 0-5 scale:
• 5 - same as 4, but very complete and already shows interesting new insights
• 4 - data, more than basic analysis (e.g., looked at robustness, community structure, centrality, etc., if applicable)
• 3 - some data, preliminary analysis (imported data into Pajek or GUESS, counted things up, visualized, if possible)
• 2 - some data, no results
• 1 - attempts made to get project started, but nothing worked out (no data, no results)
• 0 - no work done

GUESS installation

Windows:
• unzip the files into a folder
• edit guess.bat (a batch executable file) so that

    @rem set GUESS_HOME=c:\program files\GUESS

  becomes

    @set GUESS_HOME=C:\PROGRA~1\GUESS

  if you installed into c:\Program Files\GUESS
• otherwise, you can try installing into a directory with no spaces in the name and have (e.g.)

    @set GUESS_HOME=C:\apps\GUESS
