Lecture 3
1. Different centrality measures of nodes
2. Hierarchical clustering
3. Line graphs
1. Centrality measures
Within graph theory and network analysis, there are various measures of the centrality of a vertex that determine its relative importance within the graph.
•Degree centrality
•Betweenness centrality
•Closeness centrality
•Eigenvector centrality
•Subgraph centrality
We will discuss the following centrality measures:
Degree centrality
Degree centrality is defined as the number of links incident upon a node, i.e., the degree of the node.
Degree centrality is often interpreted in terms of the immediate risk of the node for catching whatever is flowing through the network (such as a virus, or some information).
The degree centrality of the blue nodes is higher.
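As a quick sketch (assuming a star graph stored as a plain-Python adjacency list), degree centrality can be read directly off the neighbour lists; dividing by n − 1 is a common normalisation:

```python
# Assumed example: star graph with hub node 0 and leaves 1, 2, 3.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

n = len(adj)
for v, nbrs in adj.items():
    # raw degree, and degree normalised by (n - 1)
    print(v, len(nbrs), len(nbrs) / (n - 1))
```

The hub gets normalised centrality 1.0, each leaf 1/3.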
Betweenness centrality
The vertex betweenness centrality BC(v) of a vertex v is defined as follows:

BC(v) = Σ_{u ≠ v ≠ w} σ_uw(v) / σ_uw

Here σ_uw is the total number of shortest paths between nodes u and w, and σ_uw(v) is the number of shortest paths between u and w that pass through node v.
Vertices that occur on many shortest paths between other vertices have higher betweenness than those that do not.
[Figure: example graph on nodes a, b, c, d, e, f, with c as the central node]
Calculation for node c:

(u,w)   σ_uw   σ_uw(c)   σ_uw(c)/σ_uw
(a,b)    1       0          0
(a,d)    1       1          1
(a,e)    1       1          1
(a,f)    1       1          1
(b,d)    1       1          1
(b,e)    1       1          1
(b,f)    1       1          1
(d,e)    1       0          0
(d,f)    1       0          0
(e,f)    1       0          0

Betweenness centrality of node c = 6
Betweenness centrality of node a = 0
Hue (from red=0 to blue=max) shows the node betweenness.
•Nodes of high betweenness centrality are important for transport.
•If they are blocked, transport becomes less efficient; conversely, if their capacity is improved, transport becomes more efficient.
•Using a similar concept edge betweenness is calculated.
http://en.wikipedia.org/wiki/Betweenness_centrality#betweenness
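The worked example above can be checked by brute force. The sketch below assumes one particular graph consistent with the table (edges a–b, a–c, b–c, c–d, c–e, c–f, d–e, d–f, e–f) and counts shortest paths with breadth-first search:

```python
from itertools import combinations
from collections import deque

# Assumed graph consistent with the worked table: c bridges {a, b} and {d, e, f}.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"), ("c", "e"), ("c", "f"),
         ("d", "e"), ("d", "f"), ("e", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def count_paths(s):
    """BFS from s: shortest-path distances and path counts sigma."""
    dist = {s: 0}
    sigma = {s: 1}
    q = deque([s])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                q.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def sigma_through(s, t, v):
    """Number of shortest s-t paths passing through v (v != s, t)."""
    dist_s, sig_s = count_paths(s)
    dist_t, sig_t = count_paths(t)
    if dist_s[v] + dist_t[v] != dist_s[t]:
        return 0          # v does not lie on any shortest s-t path
    return sig_s[v] * sig_t[v]

def betweenness(v):
    total = 0.0
    for s, t in combinations(sorted(set(adj) - {v}), 2):
        _, sig_s = count_paths(s)
        total += sigma_through(s, t, v) / sig_s[t]
    return total

print(betweenness("c"))  # 6.0, matching the table
print(betweenness("a"))  # 0.0
```

This is the naive O(n²) pairwise version; production code would use Brandes' single-pass accumulation instead.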
Closeness centrality
The farness of a vertex is the sum of the shortest-path distances from the vertex to every other vertex in the graph. The reciprocal of farness is the closeness centrality (CC).
Here, d(v,t) is the shortest distance between vertex v and vertex t
Closeness centrality can be viewed as the efficiency of a vertex in spreading information to all other vertices
CC(v) = 1 / Σ_{t ∈ V \ {v}} d(v, t)
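As a sketch, closeness centrality on an assumed 4-node path graph, with farness computed by breadth-first search:

```python
from collections import deque

# Assumed example: path graph 0 - 1 - 2 - 3
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

def closeness(v):
    # BFS to get shortest-path distances from v
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    farness = sum(d for node, d in dist.items() if node != v)
    return 1.0 / farness

print(closeness(1))  # farness 1 + 1 + 2 = 4 -> 0.25
print(closeness(0))  # farness 1 + 2 + 3 = 6 -> ~0.1667
```

The interior nodes are closer to everything and so score higher.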
Eigenvector centrality

Let A be the adjacency matrix of a graph, λ the largest eigenvalue of A, and x the corresponding eigenvector. Then

Ax = λx -----(1)
(N×N)(N×1) = (N×1)

Here λ satisfies the characteristic equation |A-λI| = 0, where I is an identity matrix.

The ith component of the eigenvector x then gives the eigenvector centrality score of the ith node in the network.

From (1),

x_i = (1/λ) Σ_{j=1}^{N} A_{i,j} x_j

•Therefore, for any node, the eigenvector centrality score is proportional to the sum of the scores of all nodes connected to it.
•Consequently, a node has a high EC value either if it is connected to many other nodes or if it is connected to others that themselves have high EC.
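A short numerical sketch (assumed example: the star graph K_{1,3}, using numpy): the eigenvector of the largest eigenvalue gives the centrality scores, and the hub scores highest.

```python
import numpy as np

# Assumed example: star graph K_{1,3}, node 0 is the hub.
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)

vals, vecs = np.linalg.eigh(A)     # A is symmetric, so eigh applies
x = vecs[:, np.argmax(vals)]       # eigenvector of the largest eigenvalue
x = np.abs(x) / np.abs(x).sum()    # fix the sign, normalise scores to sum to 1
print(np.round(x, 3))              # hub (node 0) gets the highest score
```

For K_{1,3} the largest eigenvalue is √3 and the hub's score is √3 times each leaf's.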
Subgraph centrality
The number of closed walks of length k starting and ending on vertex i in the network is given by the local spectral moment μ_k(i), which is simply defined as the ith diagonal entry of the kth power of the adjacency matrix A:

μ_k(i) = (A^k)_{ii}

The subgraph centrality of vertex i is then the weighted sum of these moments over all closed-walk lengths, SC(i) = Σ_{k≥0} μ_k(i)/k! = (e^A)_{ii}, so that shorter closed walks (smaller subgraphs) contribute more.
Closed walks can be trivial or nontrivial and are directly related to the subgraphs of the network.
E. Estrada and J. A. Rodríguez-Velázquez, "Subgraph Centrality in Complex Networks", Physical Review E 71, 056103 (2005)
0 1 0 0 0 0 0 0 0 0 0 0 0 0
1 0 1 1 0 1 0 0 0 0 0 0 0 0
0 1 0 1 1 1 0 0 0 0 0 0 0 0
0 1 1 0 1 1 0 1 0 0 0 0 0 0
0 0 1 1 0 1 0 0 0 0 0 0 0 0
0 1 1 1 1 0 1 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 1 0 0 0
0 0 0 1 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1 0 1 0 0 1 1
0 0 0 0 0 0 0 0 1 0 1 0 1 1
0 0 0 0 0 0 1 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1 1 0 1 0 1
0 0 0 0 0 0 0 0 1 1 0 0 1 0
M =
M_uv = 1 if there is an edge between nodes u and v, and 0 otherwise.
Adjacency matrix
1 0 1 1 0 1 0 0 0 0 0 0 0 0
0 4 2 2 3 2 1 1 0 0 0 0 0 0
1 2 4 3 2 3 1 1 0 0 0 0 0 0
1 2 3 5 2 3 1 0 1 0 0 0 0 0
0 3 2 2 3 2 1 1 0 0 0 0 0 0
1 2 3 3 2 5 0 1 0 0 1 0 0 0
0 1 1 1 1 0 2 0 0 1 0 0 0 0
0 1 1 0 1 1 0 2 0 1 0 0 1 1
0 0 0 1 0 0 0 0 4 2 1 1 2 2
0 0 0 0 0 0 1 1 2 4 0 1 2 2
0 0 0 0 0 1 0 0 1 0 2 0 1 1
0 0 0 0 0 0 0 0 1 1 0 1 0 1
0 0 0 0 0 0 0 1 2 2 1 0 4 2
0 0 0 0 0 0 0 1 2 2 1 1 2 3
M2 =
(M²)_uv for u ≠ v gives the number of common neighbours of nodes u and v, while the diagonal entry (M²)_uu is the degree of node u, i.e. the local spectral moment μ_2(u).
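A small numerical sketch (assumed example: the triangle graph K3, using numpy): the diagonal of A^k gives the local spectral moments, and the matrix exponential gives the subgraph centrality.

```python
import numpy as np

# Assumed example: triangle graph K3 (every pair of nodes connected).
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)

# local spectral moments mu_k(i) = (A^k)_ii
A2 = A @ A
A3 = A2 @ A
print(np.diag(A2))   # mu_2 = degree of each node: [2. 2. 2.]
print(np.diag(A3))   # mu_3 = closed walks of length 3: [2. 2. 2.]

# subgraph centrality SC(i) = sum_k mu_k(i)/k! = (e^A)_ii,
# computed here via the spectral decomposition of A
vals, vecs = np.linalg.eigh(A)
expA = vecs @ np.diag(np.exp(vals)) @ vecs.T
print(np.round(np.diag(expA), 4))   # equal for all three nodes, by symmetry
```

For K3 the eigenvalues are 2, −1, −1, so every node has SC = (e² + 2e⁻¹)/3.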
[Table 2 of the cited paper: summary of results for eight real-world complex networks]
Hierarchical Clustering
Interacting protein pairs: AtpB–AtpA, AtpG–AtpE, AtpA–AtpH, AtpB–AtpH, AtpG–AtpH, AtpE–AtpH
Data are not always available as binary relations, as in the case of protein-protein interactions, where we can directly apply network clustering algorithms.
In many cases, for example microarray gene expression analysis, the data are multivariate.
An Introduction to Bioinformatics Algorithms by Jones & Pevzner
We can convert multivariate data into networks and then apply network clustering algorithms, which we will discuss in the next class.
If the dimension of the multivariate data is 3 or less, we can cluster the data by plotting them directly.
However, when dimension is more than 3, we can apply hierarchical clustering to multivariate data.
In hierarchical clustering the data are not partitioned into a particular cluster in a single step. Instead, a series of partitions takes place.
Some data reveal good cluster structure when plotted but some data do not.
Data plotted in 2 dimensions
Hierarchical clustering is a technique that organizes elements into a tree.
A tree is a connected graph that has no cycle.
A tree with n nodes has exactly n − 1 edges.
[Figure: a graph with a cycle, and a tree]
Hierarchical Clustering is subdivided into 2 types
1. agglomerative methods, which proceed by series of fusions of the n objects into groups,
2. and divisive methods, which separate n objects successively into finer groupings.
Agglomerative techniques are more commonly used.
The data can be viewed at every level of granularity, from a single cluster containing all n objects down to n clusters each containing a single object.
Distance measurements

The Euclidean distance between points p = (p1, p2, …, pn) and q = (q1, q2, …, qn), in Euclidean n-space, is defined as:

d(p, q) = √( Σ_{i=1}^{n} (p_i − q_i)² )

Euclidean distance between g1 = (10, 8, 10) and g2 = (10, 0, 9):

d(g1, g2) = √((10 − 10)² + (8 − 0)² + (10 − 9)²) = √(0 + 64 + 1) = √65 ≈ 8.0622
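The same calculation in code (a minimal sketch; the coordinates of g1 and g2 are taken from the worked example):

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points in n-space."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

g1 = (10, 8, 10)   # expression-profile vectors from the worked example
g2 = (10, 0, 9)
print(round(euclidean(g1, g2), 4))  # 8.0623 (= sqrt(65))
```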
Instead of Euclidean distance, correlation can also be used as a distance measurement.
For biological analyses involving genes and proteins, nucleotide and/or amino acid sequence similarity can also be used as the distance between objects.
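One common way to turn correlation into a distance (an assumption here, not spelled out above) is d = 1 − r, with r the Pearson correlation; a minimal sketch:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

g1 = [1.0, 2.0, 3.0, 4.0]
g2 = [2.0, 4.0, 6.0, 8.0]                # perfectly correlated with g1
print(round(1 - pearson(g1, g2), 6))     # 0.0: identical expression trends
```

Correlation distance groups genes with similar expression *trends* even when their absolute levels differ, which is often what is wanted in microarray analysis.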
•An agglomerative hierarchical clustering procedure produces a series of partitions of the data, Pn, Pn−1, … , P1. The first, Pn, consists of n single-object 'clusters'; the last, P1, consists of a single group containing all n cases.
•At each stage the method joins together the two clusters that are closest together (most similar). (At the first stage, of course, this amounts to joining the two closest objects, since initially each cluster contains one object.)
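The procedure above can be sketched in a few lines of pure Python (single linkage and toy 2-D points are assumed; real analyses would use a library routine):

```python
# Minimal agglomerative clustering sketch: repeatedly merge the two
# closest clusters, using single linkage (minimum pairwise distance).
def agglomerate(points, dist):
    clusters = [[i] for i in range(len(points))]   # P_n: all singletons
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        merges.append((sorted(merged), round(d, 3)))
    return merges                                   # the merge history

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
dist = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
for m in agglomerate(pts, dist):
    print(m)
```

The merge history is exactly the information a dendrogram draws: which clusters fused, and at what distance.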
Differences between methods arise because of the different ways of defining distance (or similarity) between clusters.
How can we measure distances between clusters?
Single linkage clustering
Distance between two clusters A and B, D(A,B), is computed as
D(A,B) = Min { d(i,j) : object i is in cluster A and object j is in cluster B }
Complete linkage clustering
Distance between two clusters A and B, D(A,B), is computed as
D(A,B) = Max { d(i,j) : object i is in cluster A and object j is in cluster B }
Average linkage clustering
Distance between two clusters A and B, D(A,B), is computed as
D(A,B) = T_AB / (N_A × N_B)
where T_AB is the sum of all pairwise distances between objects of cluster A and cluster B, and N_A and N_B are the sizes of clusters A and B respectively.
(There are N_A × N_B such pairwise distances in total.)
Average group linkage clustering
Distance between two clusters A and B, D(A,B), is computed as
D(A,B) = Average { d(i,j) : observations i and j are in cluster t, the cluster formed by merging clusters A and B }
(There are n(n−1)/2 such pairs in total, where n is the size of the merged cluster.)
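The linkage rules differ only in how they aggregate the pairwise distances; a minimal sketch comparing single, complete and average linkage on an assumed toy 1-D example:

```python
# Cluster-to-cluster distance under three of the linkage rules above.
def single_link(A, B, d):
    return min(d(a, b) for a in A for b in B)

def complete_link(A, B, d):
    return max(d(a, b) for a in A for b in B)

def average_link(A, B, d):
    # T_AB / (N_A * N_B): mean of all pairwise distances
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))

d = lambda p, q: abs(p - q)     # toy 1-D distance
A, B = [0, 1], [4, 6]
print(single_link(A, B, d))     # min(4, 6, 3, 5) = 3
print(complete_link(A, B, d))   # max(4, 6, 3, 5) = 6
print(average_link(A, B, d))    # (4 + 6 + 3 + 5) / 4 = 4.5
```

Single linkage tends to produce long "chained" clusters, while complete linkage favours compact ones; average linkage is a compromise.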
Alizadeh et al. Nature 403: 503-511 (2000).
Classifying bacteria based on 16s rRNA sequences.
Line Graphs
Given a graph G, its line graph L(G) is a graph such that: each vertex of L(G) represents an edge of G; and two vertices of L(G) are adjacent if and only if their corresponding edges share a common endpoint ("are adjacent") in G.
Graph G Vertices in L(G) constructed from edges in G
Added edges in L(G)
The line graph L(G)
http://en.wikipedia.org/wiki/Line_graph
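The construction follows directly from the definition; a pure-Python sketch on an assumed star K_{1,3}, whose line graph is a triangle (the star-to-clique transformation):

```python
from itertools import combinations

def line_graph(edges):
    """Each edge of G becomes a vertex of L(G); two such vertices are
    adjacent iff the original edges share an endpoint."""
    nodes = [frozenset(e) for e in edges]
    return [(tuple(sorted(a)), tuple(sorted(b)))
            for a, b in combinations(nodes, 2) if a & b]

# Assumed example: star K_{1,3}, hub 0 with leaves 1, 2, 3
G = [(0, 1), (0, 2), (0, 3)]
print(line_graph(G))   # three L(G) edges: the star becomes a triangle
```

Every pair of star edges shares the hub, so all three vertices of L(G) are mutually adjacent.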
RASCAL: Calculation of Graph Similarity using Maximum Common Edge Subgraphs, by John W. Raymond, Eleanor J. Gardiner and Peter Willett, The Computer Journal, Vol. 45, No. 6, 2002
The above paper has introduced a new graph similarity calculation procedure for comparing labeled graphs.
The chemical graphs G1 and G2 are shown in Figure a,and their respective line graphs are depicted in Figure b.
Detection of Functional Modules From Protein Interaction Networks, by Jose B. Pereira-Leal, Anton J. Enright, and Christos A. Ouzounis
PROTEINS: Structure, Function, and Bioinformatics 54:49–57 (2004)
Transforming a network of proteins to a network of interactions. a: Schematic representation illustrating a graph representation of protein interactions: nodes correspond to proteins and edges to interactions. b: Schematic representation illustrating the transformation of the protein graph connected by interactions to an interaction graph connected by proteins. Each node represents a binary interaction and edges represent shared proteins. Note that labels that are not shared correspond to terminal nodes in (a)
A star is transformed into a clique