Cliques and a New Measure of Clustering (with Application to U.S. Domestic Airlines)
Yll Mehmeti (supervised by Steve Lawford)
DEVI, ENAC
February 6, 2018
1 / 28
Structure
- What is a clique?
- Overall clustering coefficient C(3)
- Generalization of the clustering coefficient C(b)
- Properties of C(b)
- Analytic formulae for subgraphs
- Algorithmic efficiency
- Is C(b) useful in practice?
2 / 28
What is a Clique?
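A b-clique is a set of b nodes in which every pair of nodes is joined by an edge.
[Figure: a 2-clique (an edge), a 3-clique (a triangle), a 4-clique and a 5-clique]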
3 / 28
Route-Map Data for U.S. Domestic Airlines
Southwest Airlines’ Direct Routes in 2013Q4
4 / 28
Cliques in Southwest’s Network
Dayton–Denver–Orlando maximal 3-clique
5 / 28
Cliques in Southwest’s Network
Albuquerque–Dallas–Houston–Kansas City maximal 4-clique
6 / 28
Cliques in Southwest’s Network
BNA–BWI–DEN–HOU–LAS–MCI–MDW–MSY–PHX–STL–TPA maximum 11-clique
7 / 28
Cliques in Southwest’s Network
Distribution of size of maximal cliques, Southwest Airlines 2013Q4
8 / 28
Cliques in Southwest’s Network
9 / 28
Overall Clustering Coefficient C(3)
Defined by Mark Newman (2003, “The structure and function of complex networks”, SIAM Review) as:
$$C(3) = \frac{3 \times \text{number of triangles in the network}}{\text{number of connected triples of vertices}},$$
where a connected triple is a set of three distinct nodes u, v and w such that at least two of the possible edges between them exist. A triangle contains three connected triples, one centred on each of its nodes, which is why the numerator carries the factor 3.
- How often are two friends of mine friends with one another?
- Higher values of C(3) mean that a larger share of connected triples close into 3-cliques (triangles)
- How to compute C(3)? Brute-force search takes $O(n^3)$ time
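The brute-force search mentioned in the last bullet can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the function name `clustering_c3` and the convention that nodes are labelled 0..n-1 are assumptions made for the example.

```python
from itertools import combinations


def clustering_c3(n, edges):
    """Brute-force overall clustering coefficient of an undirected simple graph.

    n     -- number of nodes, assumed to be labelled 0..n-1
    edges -- iterable of (u, v) pairs
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    triangles = 0
    triples = 0  # connected triples, counted once per centre node
    # Visiting every set of three nodes is what makes this O(n^3).
    for u, v, w in combinations(range(n), 3):
        present = (v in adj[u]) + (w in adj[u]) + (w in adj[v])
        if present == 3:
            triangles += 1
            triples += 3  # a triangle contains three connected triples
        elif present == 2:
            triples += 1  # an open path on three nodes
    return 3 * triangles / triples if triples else 0.0


# A triangle 0-1-2 with one extra edge 0-3: one triangle, five connected triples.
print(clustering_c3(4, [(0, 1), (1, 2), (0, 2), (0, 3)]))  # 0.6
```

For a complete graph every triple is a triangle, so the function returns 1.0; for a graph with no triangles it returns 0.0.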
10 / 28
Overall Clustering Coefficient C(3)
11 / 28
Dynamic Behaviour of C(3) for Different Networks
[Figure: four panels: Southwest Airlines, American Airlines, US Airways, United Airlines]
14 / 28
Generalized Overall Clustering Coefficient C(b)
$$C(b) = \frac{a(b) \times \text{number of } b\text{-cliques in the network}}{\text{number of } b\text{-spanning trees}},$$
where Cayley’s formula $a(b) = b^{b-2}$ gives the number of spanning trees in a complete graph with b nodes.
- C(b) = 0 if and only if there are no b-cliques in the graph
- C(b) = 1 if and only if the graph is complete
- For Erdős–Rényi G(n, p) random graphs, C(b) has a simple form, e.g. $C(3) = p$, $C(4) = p^3$, $C(5) = p^6$, etc.
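A brute-force evaluation of the generalized coefficient might look like the sketch below. It rests on an assumption about the denominator: "number of b-spanning trees" is read here as the total number of spanning trees over all induced subgraphs on b nodes, which for b = 3 reduces to the number of connected triples. The function names are hypothetical, and the spanning trees are counted with Kirchhoff's matrix-tree theorem rather than with the analytic formulae of the talk.

```python
from fractions import Fraction
from itertools import combinations


def count_spanning_trees(nodes, adj):
    """Spanning trees of the subgraph induced on `nodes` (matrix-tree theorem)."""
    nodes = list(nodes)
    m = len(nodes) - 1  # reduced Laplacian: drop the last row and column
    if m == 0:
        return 1
    lap = [[Fraction(0)] * m for _ in range(m)]
    for a in range(m):
        for c in range(len(nodes)):
            if a != c and nodes[c] in adj[nodes[a]]:
                lap[a][a] += 1
                if c < m:
                    lap[a][c] -= 1
    det = Fraction(1)  # determinant by exact Gaussian elimination
    for col in range(m):
        pivot = next((r for r in range(col, m) if lap[r][col] != 0), None)
        if pivot is None:
            return 0  # singular: the induced subgraph is disconnected
        if pivot != col:
            lap[col], lap[pivot] = lap[pivot], lap[col]
            det = -det
        det *= lap[col][col]
        for r in range(col + 1, m):
            factor = lap[r][col] / lap[col][col]
            for c in range(col, m):
                lap[r][c] -= factor * lap[col][c]
    return int(det)


def clustering_cb(n, edges, b):
    """Brute-force C(b) for nodes 0..n-1, under the reading stated above."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    cliques = 0
    trees = 0
    for subset in combinations(range(n), b):
        trees += count_spanning_trees(subset, adj)
        if all(v in adj[u] for u, v in combinations(subset, 2)):
            cliques += 1
    a_b = b ** (b - 2)  # Cayley's formula
    return Fraction(a_b * cliques, trees) if trees else Fraction(0)
```

Under this reading, a complete graph gives C(b) = 1 for every b up to its size, and a graph with no b-clique gives C(b) = 0, matching the first two properties listed above.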
15 / 28
Overall Clustering Coefficient C(4)
16 / 28
Overall Clustering Coefficient C(5)
17 / 28
Analytic Formulae for Subgraphs and C(b)
Let g be the adjacency matrix and $k_i$ the degree of node i.
tadpole:
$$|M^{(4)}_{15}| = \frac{1}{2} \sum_{k_i > 2} (g^3)_{ii} \, (k_i - 2)$$
5-arrow:
$$|M^{(5)}_{77}| = \sum_{\substack{(i,j) \in E \\ \text{both directions}}} \binom{k_i - 1}{2} (k_j - 1) - 2\,|M^{(4)}_{15}|$$
Generally, analytic formulae for subgraphs can be expressed in terms of simpler subgraphs.
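The two formulae can be checked numerically against direct enumeration. The sketch below assumes that the tadpole is a triangle with one pendant edge and that the 5-arrow is the five-node tree in which a degree-3 node is adjacent to a degree-2 node; both shapes are inferred from the formulae, not stated in the text. All function names are hypothetical.

```python
from itertools import combinations


def mat_mul(a, b):
    """Product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def tadpoles_analytic(g):
    """(1/2) * sum over nodes with k_i > 2 of (g^3)_ii * (k_i - 2)."""
    g3 = mat_mul(mat_mul(g, g), g)
    k = [sum(row) for row in g]
    return sum(g3[i][i] * (k[i] - 2) for i in range(len(g)) if k[i] > 2) // 2


def arrows_analytic(g):
    """Sum over directed edges (i, j) of C(k_i - 1, 2) * (k_j - 1), minus 2 * tadpoles."""
    n = len(g)
    k = [sum(row) for row in g]
    total = sum((k[i] - 1) * (k[i] - 2) // 2 * (k[j] - 1)
                for i in range(n) for j in range(n) if g[i][j])
    return total - 2 * tadpoles_analytic(g)


def tadpoles_brute(g):
    """Count every (triangle, pendant edge) pair directly."""
    count = 0
    for u, v, w in combinations(range(len(g)), 3):
        if g[u][v] and g[v][w] and g[u][w]:
            # each triangle node contributes its neighbours outside the triangle
            count += sum(sum(g[c]) - 2 for c in (u, v, w))
    return count


def arrows_brute(g):
    """Count 5-arrows: centre i, leaves a and b, arm i-j-x, all nodes distinct."""
    n = len(g)
    nbrs = [[v for v in range(n) if g[u][v]] for u in range(n)]
    count = 0
    for i in range(n):
        for j in nbrs[i]:
            others = [v for v in nbrs[i] if v != j]
            for a, b in combinations(others, 2):
                count += sum(1 for x in nbrs[j] if x not in (i, a, b))
    return count


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix for nodes 0..n-1."""
    g = [[0] * n for _ in range(n)]
    for u, v in edges:
        g[u][v] = g[v][u] = 1
    return g


k5 = adjacency(5, [(a, b) for a in range(5) for b in range(a + 1, 5)])
print(tadpoles_analytic(k5), tadpoles_brute(k5))  # 60 60
print(arrows_analytic(k5), arrows_brute(k5))      # 60 60
```

The subtraction in `arrows_analytic` removes the cases where the far end of the arm coincides with one of the two leaves, which closes a triangle and gives a tadpole; each tadpole is counted twice in the sum.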
18 / 28
Inherent Limits on Analytic Formulae for C(b)
How many non-isomorphic trees are there on b nodes?
- C(3) has one denominator term
- C(4) has two denominator terms
- C(5) has three denominator terms
- C(6) has six denominator terms
The number of denominator terms in C(b) explodes: 1, 2, 3, 6, 11, 23, 47, 106, 235, 551, 1301, 3159, 7741, 19320, 48629, 123867, 317955, 823065, 2144505, 5623756, 14828074, 39299897, 104636890, 279793450, 751065460, 2023443032, 5469566585, 14830871802, 40330829030, 109972410221, 300628862480, 823779631721, 2262366343746, 6226306037178, ...
19 / 28
Dynamic Behaviour of C(3), C(4) and C(5)
[Figure: four panels: Southwest Airlines, American Airlines, US Airways, United Airlines]
20 / 28
Dynamic Behaviour of C(3), C(4) and C(5)
[Figure: four panels: Spirit Airlines, AirTran Airways, Alaska Airlines, Delta Air Lines]
21 / 28
C(b) is Not Always Monotonic
Counter-example in which C(3) ≤ C(4) is possible:
- $C(3) = \frac{12}{n+10}$, for $n \geq 5$
- $C(4) = \frac{16}{n+22}$, for $n \geq 6$
- Equality occurs at $n = 26$, where $C(3) = C(4) = \frac{1}{3}$
from which C(3) ≤ C(4) for n ≥ 26
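Taking the two coefficients on this slide to be 12/(n+10) and 16/(n+22), the crossover point can be verified directly. The short derivation below is a check on the stated numbers, not part of the original slides.

```latex
\begin{align*}
\frac{12}{n+10} = \frac{16}{n+22}
  &\iff 12(n+22) = 16(n+10)
   \iff 4n = 104
   \iff n = 26, \\
C(3)\big|_{n=26} &= \frac{12}{36} = \frac{1}{3},
  \qquad C(4)\big|_{n=26} = \frac{16}{48} = \frac{1}{3}, \\
C(4) - C(3) &= \frac{16(n+10) - 12(n+22)}{(n+10)(n+22)}
  = \frac{4(n-26)}{(n+10)(n+22)} \;\geq\; 0
  \quad \text{for } n \geq 26 .
\end{align*}
```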
22 / 28
C(b) is Not Always Monotonic
Counter-example in which C(4) ≤ C(5) is possible:
- $C(4) = \frac{80}{n+95}$, for $n \geq 7$
- $C(5) = \frac{125}{n+203}$, for $n \geq 8$
- Equality occurs at $n = 97$, where $C(4) = C(5) = \frac{5}{12}$
from which C(4) ≤ C(5) for n ≥ 97
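Taking the two coefficients on this slide to be 80/(n+95) and 125/(n+203), the same kind of check confirms the crossover. Again, this derivation only verifies the stated numbers and is not part of the original slides.

```latex
\begin{align*}
\frac{80}{n+95} = \frac{125}{n+203}
  &\iff 80(n+203) = 125(n+95)
   \iff 45n = 4365
   \iff n = 97, \\
C(4)\big|_{n=97} &= \frac{80}{192} = \frac{5}{12},
  \qquad C(5)\big|_{n=97} = \frac{125}{300} = \frac{5}{12}, \\
C(5) - C(4) &= \frac{125(n+95) - 80(n+203)}{(n+95)(n+203)}
  = \frac{45(n-97)}{(n+95)(n+203)} \;\geq\; 0
  \quad \text{for } n \geq 97 .
\end{align*}
```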
23 / 28
Algorithmic (Asymptotic) Runtime of C(3)
Runtime of the nested-loop algorithm for C(3) is $O(n^3)$
24 / 28
Algorithmic (Asymptotic) Runtime of C(4)
Runtime of the nested-loop algorithm for C(4) is $O(n^4)$
25 / 28
Algorithmic (Asymptotic) Runtime of C(5)
Runtime of the nested-loop algorithm for C(5) is $O(n^5)$
26 / 28
Is C(b) Useful in Practice?
Summary of Results
- We examine the nature and dynamics of cliques in real-world airline networks; the standard clustering coefficient C(3) is widely used in applied work, and provides non-obvious information about the structure and evolution of networks
- Using graph theory, we propose a new generalized clustering coefficient C(b), which nests C(3); and we develop a very fast analytic implementation that we show to be up to 2,000 times faster than a brute-force approach, for small networks
- It is not feasible to use C(b) when b is greater than about 6, because analytic formulae will be too difficult to derive
27 / 28
Is C(b) Useful in Practice?
Drawbacks and Extensions
- Since C(b) displays high correlation across b, it is possible that b > 3 contains little “new” information, individually or in combination with b = 3 (it remains to be seen whether this high correlation holds generally for other networks)
- Building analytic formulae might have application to other (slow) statistics that are used in applied graph theory
- Future work on graph theory and econometrics should lead to a better understanding of the economic, strategic and spatial factors that drive dynamic clustering in real-world networks
28 / 28