
Page 1

Lecture 3

1. Different centrality measures of nodes

2. Hierarchical Clustering

3. Line graphs

Page 2

1. Centrality measures

Within graph theory and network analysis, there are various measures of the centrality of a vertex that quantify its relative importance within the graph.

We will discuss the following centrality measures:

• Degree centrality

• Betweenness centrality

• Closeness centrality

• Eigenvector centrality

• Subgraph centrality

Page 3

Degree centrality

Degree centrality is defined as the number of links incident upon a node, i.e. the degree of the node.

Degree centrality is often interpreted in terms of the immediate risk of a node catching whatever is flowing through the network (such as a virus or a piece of information).

The degree centrality of the blue nodes is higher.
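A minimal sketch in Python (assuming the networkx package; the small six-node star-like graph below is made up for illustration):

# Minimal sketch of degree centrality on a small made-up graph (networkx assumed).
import networkx as nx

G = nx.Graph()
G.add_edges_from([("a", "c"), ("b", "c"), ("c", "d"), ("c", "e"), ("c", "f")])

# Raw degree centrality: number of links incident on each node.
raw = dict(G.degree())

# networkx's degree_centrality() normalizes by (n - 1), the maximum possible degree.
normalized = nx.degree_centrality(G)

print(raw)         # node "c" has degree 5
print(normalized)  # node "c" has 5 / (6 - 1) = 1.0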

Page 4

Betweenness centrality

The vertex betweenness centrality BC(v) of a vertex v is defined as follows:

BC(v) = \sum_{u \neq v \neq w} \frac{\sigma_{uw}(v)}{\sigma_{uw}}

Here σuw is the total number of shortest paths between nodes u and w, and σuw(v) is the number of those shortest paths that pass through node v.

Vertices that occur on many shortest paths between other vertices have higher betweenness than those that do not.

Page 5

[Figure: the example graph on vertices a, b, c, d, e, f used in the calculation below]

Betweenness centrality: calculation for node c

(u,w)   σuw   σuw(c)   σuw(c)/σuw
(a,b)    1      0        0
(a,d)    1      1        1
(a,e)    1      1        1
(a,f)    1      1        1
(b,d)    1      1        1
(b,e)    1      1        1
(b,f)    1      1        1
(d,e)    1      0        0
(d,f)    1      0        0
(e,f)    1      0        0

Betweenness centrality of node c = 6

Betweenness centrality of node a = 0
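A minimal Python sketch of this calculation (assuming networkx, and assuming an edge list consistent with the table above, since the figure itself is not reproduced here):

# Sketch of the betweenness calculation above, assuming an edge list consistent
# with the table (a and b adjacent, c adjacent to all others, d, e, f mutually adjacent).
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("a", "b"), ("a", "c"), ("b", "c"),   # a, b attached to c
    ("c", "d"), ("c", "e"), ("c", "f"),   # c is the bridge
    ("d", "e"), ("d", "f"), ("e", "f"),   # d, e, f form a triangle
])

# For undirected graphs, normalized=False returns the raw sum of
# sigma_uw(v) / sigma_uw over all unordered pairs (u, w).
bc = nx.betweenness_centrality(G, normalized=False)
print(bc["c"])  # expected 6, as in the table
print(bc["a"])  # expected 0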

Page 6

Hue (from red=0 to blue=max) shows the node betweenness.

Betweenness centrality

•Nodes of high betweenness centrality are important for transport.

•If they are blocked, transport becomes less efficient; on the other hand, if their capacity is improved, transport becomes more efficient.

•Edge betweenness is calculated using a similar concept.

http://en.wikipedia.org/wiki/Betweenness_centrality#betweenness

Page 7

Closeness centrality

The farness of a vertex is the sum of the shortest-path distances from the vertex to all other vertices in the graph. The reciprocal of farness is the closeness centrality (CC).

Here, d(v,t) is the shortest distance between vertex v and vertex t

Closeness centrality can be viewed as the efficiency of a vertex in spreading information to all other vertices

CC(v) = \frac{1}{\sum_{t \in V \setminus \{v\}} d(v,t)}
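A minimal Python sketch (assuming networkx and the same made-up six-node graph as in the betweenness sketch):

# Sketch of closeness centrality as the reciprocal of farness.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("a", "b"), ("a", "c"), ("b", "c"),
                  ("c", "d"), ("c", "e"), ("c", "f"),
                  ("d", "e"), ("d", "f"), ("e", "f")])

def closeness(G, v):
    # Farness: sum of shortest-path distances from v to every other vertex.
    dist = dict(nx.single_source_shortest_path_length(G, v))
    farness = sum(d for t, d in dist.items() if t != v)
    return 1.0 / farness

print(closeness(G, "c"))  # c reaches every vertex in one step: 1 / 5 = 0.2
# Note: nx.closeness_centrality(G) reports (n - 1) / farness, a normalized variant.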

Page 8

Eigenvector centrality

Let A be the adjacency matrix of a graph, λ the largest eigenvalue of A, and x the corresponding eigenvector. Then

A x = λ x  -----(1)   (dimensions: A is N×N, x is N×1, λx is N×1)

The ith component of the eigenvector x then gives the eigenvector centrality score of the ith node in the network.

From (1),

x_i = \frac{1}{\lambda} \sum_{j=1}^{N} A_{ij} x_j

• Therefore, for any node, the eigenvector centrality score is proportional to the sum of the scores of all nodes connected to it.

• Consequently, a node has a high EC value either if it is connected to many other nodes or if it is connected to nodes that themselves have high EC.


λ is the largest root of the characteristic equation |A - λI| = 0, where I is the identity matrix.
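A minimal numpy sketch of eigenvector centrality on the same made-up example graph, computing the leading eigenvector of A directly:

# Eigenvector centrality via the leading eigenvector of the adjacency matrix.
import numpy as np

nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"), ("c", "e"), ("c", "f"),
         ("d", "e"), ("d", "f"), ("e", "f")]

idx = {n: i for i, n in enumerate(nodes)}
A = np.zeros((6, 6))
for u, v in edges:
    A[idx[u], idx[v]] = A[idx[v], idx[u]] = 1

# Largest eigenvalue and its eigenvector (A is symmetric, so eigh is appropriate).
vals, vecs = np.linalg.eigh(A)
x = vecs[:, np.argmax(vals)]
x = np.abs(x) / np.abs(x).sum()   # fix the sign and normalize for readability

for n in nodes:
    print(n, round(x[idx[n]], 3))  # "c" gets the largest score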

Page 9

Subgraph centrality

The number of closed walks of length k starting and ending at vertex i in the network is given by the local spectral moment μk(i), which is simply defined as the ith diagonal entry of the kth power of the adjacency matrix A:

\mu_k(i) = (A^k)_{ii}

The subgraph centrality of vertex i, as defined in the paper cited below, is the weighted sum of these moments:

SC(i) = \sum_{k=0}^{\infty} \frac{\mu_k(i)}{k!} = (e^{A})_{ii}

Closed walks can be trivial or nontrivial and are directly related to the subgraphs of the network.

Subgraph Centrality in Complex Networks, Physical Review E 71, 056103(2005)
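A minimal Python sketch (assuming numpy and scipy), computing SC(i) = (e^A)_ii on the same made-up six-node example graph:

# Subgraph centrality as the diagonal of the matrix exponential of A.
import numpy as np
from scipy.linalg import expm

A = np.array([[0, 1, 1, 0, 0, 0],    # a: b, c
              [1, 0, 1, 0, 0, 0],    # b: a, c
              [1, 1, 0, 1, 1, 1],    # c: a, b, d, e, f
              [0, 0, 1, 0, 1, 1],    # d: c, e, f
              [0, 0, 1, 1, 0, 1],    # e: c, d, f
              [0, 0, 1, 1, 1, 0]],   # f: c, d, e
             dtype=float)

sc = np.diag(expm(A))                 # subgraph centrality of each vertex
print(dict(zip("abcdef", np.round(sc, 3))))

# For comparison, networkx provides nx.subgraph_centrality(G) with the same definition.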

Page 10

Subgraph centrality

Adjacency matrix: Muv = 1 if there is an edge between nodes u and v, and 0 otherwise.

M =

0 1 0 0 0 0 0 0 0 0 0 0 0 0
1 0 1 1 0 1 0 0 0 0 0 0 0 0
0 1 0 1 1 1 0 0 0 0 0 0 0 0
0 1 1 0 1 1 0 1 0 0 0 0 0 0
0 0 1 1 0 1 0 0 0 0 0 0 0 0
0 1 1 1 1 0 1 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 1 0 0 0
0 0 0 1 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1 0 1 0 0 1 1
0 0 0 0 0 0 0 0 1 0 1 0 1 1
0 0 0 0 0 0 1 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1 1 0 1 0 1
0 0 0 0 0 0 0 0 1 1 0 0 1 0

Page 11

Subgraph centrality

M² =

1 0 1 1 0 1 0 0 0 0 0 0 0 0
0 4 2 2 3 2 1 1 0 0 0 0 0 0
1 2 4 3 2 3 1 1 0 0 0 0 0 0
1 2 3 5 2 3 1 0 1 0 0 0 0 0
0 3 2 2 3 2 1 1 0 0 0 0 0 0
1 2 3 3 2 5 0 1 0 0 1 0 0 0
0 1 1 1 1 0 2 0 0 1 0 0 0 0
0 1 1 0 1 1 0 2 0 1 0 0 1 1
0 0 0 1 0 0 0 0 4 2 1 1 2 2
0 0 0 0 0 0 1 1 2 4 0 1 2 2
0 0 0 0 0 1 0 0 1 0 2 0 1 1
0 0 0 0 0 0 0 0 1 1 0 1 0 1
0 0 0 0 0 0 0 1 2 2 1 0 4 2
0 0 0 0 0 0 0 1 2 2 1 1 2 3

(M²)uv for u ≠ v is the number of common neighbors of nodes u and v; the diagonal entry (M²)uu is the local spectral moment μ2(u).
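A quick numpy check of these two observations for the 14-node adjacency matrix M shown above:

# Off-diagonal entries of M^2 count common neighbors; the diagonal entries
# equal the node degrees, i.e. mu_2(i).
import numpy as np

M = np.array([
    [0,1,0,0,0,0,0,0,0,0,0,0,0,0],
    [1,0,1,1,0,1,0,0,0,0,0,0,0,0],
    [0,1,0,1,1,1,0,0,0,0,0,0,0,0],
    [0,1,1,0,1,1,0,1,0,0,0,0,0,0],
    [0,0,1,1,0,1,0,0,0,0,0,0,0,0],
    [0,1,1,1,1,0,1,0,0,0,0,0,0,0],
    [0,0,0,0,0,1,0,0,0,0,1,0,0,0],
    [0,0,0,1,0,0,0,0,1,0,0,0,0,0],
    [0,0,0,0,0,0,0,1,0,1,0,0,1,1],
    [0,0,0,0,0,0,0,0,1,0,1,0,1,1],
    [0,0,0,0,0,0,1,0,0,1,0,0,0,0],
    [0,0,0,0,0,0,0,0,0,0,0,0,1,0],
    [0,0,0,0,0,0,0,0,1,1,0,1,0,1],
    [0,0,0,0,0,0,0,0,1,1,0,0,1,0],
])

M2 = M @ M
print(np.array_equal(np.diag(M2), M.sum(axis=1)))  # True: mu_2(i) = degree of i
print(M2[1, 4])  # 3 common neighbors of nodes 2 and 5 (0-based indices 1 and 4)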

Page 12

Table 2. Summary of results of eight real-world complex networks.

Page 13

Hierarchical Clustering

Example protein-protein interaction pairs: (AtpB, AtpA), (AtpG, AtpE), (AtpA, AtpH), (AtpB, AtpH), (AtpG, AtpH), (AtpE, AtpH)

Data are not always available as binary relations, as in the case of protein-protein interactions, where we can directly apply network clustering algorithms.

In many cases, for example microarray gene expression analysis, the data are multivariate.

An Introduction to Bioinformatics Algorithms by Jones & Pevzner

Page 14

We can convert multivariate data into networks and apply network clustering algorithms, which we will discuss in the next class.

If the dimension of the multivariate data is 3 or less, we can cluster the data by plotting them directly.

Hierarchical Clustering

An Introduction to Bioinformatics Algorithms by Jones & Pevzner

Page 15

However, when the dimension is more than 3, we can apply hierarchical clustering to the multivariate data.

In hierarchical clustering the data are not partitioned into a particular cluster in a single step. Instead, a series of partitions takes place.

Some data reveal a good cluster structure when plotted, but some do not.

Data plotted in 2 dimensions

Hierarchical Clustering

Page 16

Hierarchical clustering is a technique that organizes elements into a tree.

A tree is a connected graph that has no cycles.

A tree with n nodes has exactly n-1 edges.

[Figure: an example graph and an example tree]

Hierarchical Clustering

Page 17

Hierarchical clustering is subdivided into two types:

1. agglomerative methods, which proceed by a series of fusions of the n objects into groups, and

2. divisive methods, which separate the n objects successively into finer groupings.

Agglomerative techniques are more commonly used.

The data can be viewed at every level, from a single cluster containing all objects down to n clusters, each containing a single object.

Hierarchical Clustering

Page 18

Distance measurements

The Euclidean distance between points p = (p1, p2, ..., pn) and q = (q1, q2, ..., qn) in Euclidean n-space is defined as:

d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}

Euclidean distance between g1 = (10, 8, 10) and g2 = (10, 0, 9):

\sqrt{(10-10)^2 + (8-0)^2 + (10-9)^2} = \sqrt{0 + 64 + 1} = \sqrt{65} \approx 8.0622

Hierarchical Clustering
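A quick Python check of the worked example above (g1 and g2 as reconstructed from the slide's calculation):

# Euclidean distance between the two example points.
import math

g1 = (10, 8, 10)
g2 = (10, 0, 9)

d = math.sqrt(sum((p - q) ** 2 for p, q in zip(g1, g2)))
print(round(d, 4))  # 8.0623 (the slide truncates sqrt(65) to 8.0622)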

Page 19

An Introduction to Bioinformatics Algorithms by Jones & Pevzner

Instead of Euclidean distance, correlation can also be used as a distance measure.

For biological analyses involving genes and proteins, nucleotide and/or amino acid sequence similarity can also be used as the distance between objects.

Hierarchical Clustering

Page 20

• An agglomerative hierarchical clustering procedure produces a series of partitions of the data, Pn, Pn-1, ..., P1. The first, Pn, consists of n single-object 'clusters'; the last, P1, consists of a single group containing all n cases.

• At each stage the method joins together the two clusters that are closest together (most similar). (At the first stage, of course, this amounts to joining the two objects that are closest together, since at the initial stage each cluster contains one object.)

Hierarchical Clustering
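A minimal sketch of this agglomerative procedure (assuming Python with scipy; the 2-dimensional points are made up for illustration):

# Agglomerative hierarchical clustering with scipy: linkage() records the
# series of fusions P_n, ..., P_1 described above.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],    # one tight group
                   [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])   # another tight group

# Each row of Z records one fusion: the two clusters merged and their distance.
Z = linkage(points, method='single', metric='euclidean')
print(Z)

# Cut the tree into 2 clusters.
print(fcluster(Z, t=2, criterion='maxclust'))  # e.g. [1 1 1 2 2 2]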

Page 21

An Introduction to Bioinformatics Algorithms by Jones & Pevzner

Differences between methods arise because of the different ways of defining distance (or similarity) between clusters.

Hierarchical Clustering

Page 22

How can we measure distances between clusters?

Single linkage clustering

Distance between two clusters A and B, D(A,B), is computed as

D(A,B) = Min { d(i,j) : object i is in cluster A and object j is in cluster B }

Hierarchical Clustering

Page 23

Complete linkage clustering

Distance between two clusters A and B, D(A,B), is computed as

D(A,B) = Max { d(i,j) : object i is in cluster A and object j is in cluster B }

Hierarchical Clustering

Page 24

Average linkage clustering

Distance between two clusters A and B, D(A,B), is computed as

D(A,B) = TAB / (NA * NB)

where TAB is the sum of all pairwise distances between objects of cluster A and objects of cluster B, and NA and NB are the sizes of clusters A and B, respectively.

Total NA * NB edges between the two clusters.

Hierarchical Clustering

Page 25

Average group linkage clustering

Distance between two clusters A and B, D(A,B), is computed as

D(A,B) = Average { d(i,j) : observations i and j are in cluster t, the cluster formed by merging clusters A and B }

Total n(n-1)/2 edges

Hierarchical Clustering
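A minimal Python sketch of the four cluster-distance definitions above, on two small made-up clusters of one-dimensional points:

# Single, complete, average and average group linkage for two toy clusters.
from itertools import combinations

def d(i, j):
    return abs(i - j)            # any distance measure could be used here

A = [1.0, 2.0, 3.0]
B = [8.0, 9.0]

between = [d(i, j) for i in A for j in B]          # the NA * NB pairwise distances

single   = min(between)                            # single linkage
complete = max(between)                            # complete linkage
average  = sum(between) / (len(A) * len(B))        # average linkage: TAB / (NA * NB)

merged = A + B                                     # cluster t = A merged with B
group_average = (sum(d(i, j) for i, j in combinations(merged, 2))
                 / (len(merged) * (len(merged) - 1) / 2))  # average group linkage

print(single, complete, average, group_average)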

Page 26

Alizadeh et al. Nature 403: 503-511 (2000).

Hierarchical Clustering

Page 27

Classifying bacteria based on 16S rRNA sequences.

Page 28

Line Graphs

Given a graph G, its line graph L(G) is a graph such that:

• each vertex of L(G) represents an edge of G; and

• two vertices of L(G) are adjacent if and only if their corresponding edges share a common endpoint ("are adjacent") in G.

[Figure: step-by-step construction of a line graph: the graph G; vertices in L(G) constructed from the edges of G; added edges in L(G); the resulting line graph L(G)]

http://en.wikipedia.org/wiki/Line_graph
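A minimal Python sketch of the line-graph construction (assuming networkx; the small graph is made up for illustration):

# Line graph of a small made-up graph.
import networkx as nx

G = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4)])

L = nx.line_graph(G)

# Each vertex of L(G) is an edge of G; two are adjacent iff the edges share an endpoint.
print(sorted(L.nodes()))   # the four edges of G, e.g. (1, 2), (1, 3), (2, 3), (3, 4)
print(sorted(L.edges()))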

Page 29

Line Graphs

RASCAL: Calculation of Graph Similarity using Maximum Common Edge Subgraphs, by John W. Raymond, Eleanor J. Gardiner and Peter Willett

The Computer Journal, Vol. 45, No. 6, 2002

The above paper introduced a new graph similarity calculation procedure for comparing labelled graphs.

The chemical graphs G1 and G2 are shown in Figure (a), and their respective line graphs are depicted in Figure (b).

Page 30

Line Graphs

Detection of Functional Modules From Protein Interaction Networks, by Jose B. Pereira-Leal, Anton J. Enright, and Christos A. Ouzounis

PROTEINS: Structure, Function, and Bioinformatics 54:49–57 (2004)

Transforming a network of proteins to a network of interactions. a: Schematic representation illustrating a graph representation of protein interactions: nodes correspond to proteins and edges to interactions. b: Schematic representation illustrating the transformation of the protein graph connected by interactions to an interaction graph connected by proteins. Each node represents a binary interaction and edges represent shared proteins. Note that labels that are not shared correspond to terminal nodes in (a)

A star is transformed into a clique
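A minimal Python check of this observation (assuming networkx): the line graph of a star with k leaves is the complete graph on k vertices.

# A star is transformed into a clique by the line-graph construction.
import networkx as nx

star = nx.star_graph(4)             # one hub connected to 4 leaves (5 nodes, 4 edges)
L = nx.line_graph(star)

print(L.number_of_nodes())          # 4 (one vertex per edge of the star)
print(nx.is_isomorphic(L, nx.complete_graph(4)))  # True: the line graph is a clique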