Cliques and a New Measure of Clustering (with Application to U.S. Domestic Airlines)

Yll Mehmeti (supervised by Steve Lawford)

DEVI, ENAC

February 6, 2018


Structure

• What is a clique?
• Overall clustering coefficient C(3)
• Generalization of the clustering coefficient C(b)
• Properties of C(b)
• Analytic formulae for subgraphs
• Algorithmic efficiency
• Is C(b) useful in practice?


What is a Clique?

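A b-clique is a set of b nodes in which every pair of nodes is joined by an edge.

Figure: examples of a 2-clique (an edge), a 3-clique (a triangle), a 4-clique and a 5-clique.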


Route-Map Data for U.S. Domestic Airlines

Southwest Airlines’ Direct Routes in 2013Q4


Cliques in Southwest’s Network

Dayton–Denver–Orlando maximal 3-clique


Cliques in Southwest’s Network

Albuquerque–Dallas–Houston–Kansas City maximal 4-clique


Cliques in Southwest’s Network

BNA–BWI–DEN–HOU–LAS–MCI–MDW–MSY–PHX–STL–TPA maximum 11-clique


Cliques in Southwest’s Network

Distribution of size of maximal cliques, Southwest Airlines 2013Q4


Overall Clustering Coefficient C(3)

Defined by Mark Newman (2003, “The structure and function of complex networks”, SIAM Review) as:

$$C(3) = \frac{3 \times \text{number of triangles in the network}}{\text{number of connected triples of vertices}},$$

where a connected triple is a set of three distinct nodes u, v and w, such that at least two of the possible edges between them exist.

• How often are two friends of mine friends with one another?
• Higher values of C(3) correspond to more 3-cliques (triangles)
• How to compute C(3)? Brute-force search takes $O(n^3)$ time
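The brute-force computation of C(3) can be sketched in Python as follows. This is only an illustration, not the authors' code: the function name `clustering_c3` and the adjacency format (a dict mapping each node to the set of its neighbours) are assumptions. Connected triples are counted as paths of length two centred on a node, which is the usual reading of Newman's definition.

```python
from itertools import combinations


def clustering_c3(adj):
    """Overall clustering coefficient C(3) by brute force.

    adj: dict mapping each node to the set of its neighbours
    (hypothetical input format, undirected simple graph assumed).
    """
    nodes = sorted(adj)
    # Check every unordered triple of nodes: O(n^3) candidate triples.
    triangles = sum(
        1
        for u, v, w in combinations(nodes, 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )
    # A node of degree k is the centre of k*(k-1)/2 connected triples.
    triples = sum(len(adj[u]) * (len(adj[u]) - 1) // 2 for u in nodes)
    return 3 * triangles / triples if triples else 0.0


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
paw = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clustering_c3(paw))
```

For this small graph there is one triangle and five connected triples, so the ratio is 3/5.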


Dynamic Behaviour of C(3) for Different Networks

Figure panels: Southwest Airlines, American Airlines, US Airways, United Airlines.


Generalized Overall Clustering Coefficient C(b)

$$C(b) = \frac{a(b) \times \text{number of } b\text{-cliques in the network}}{\text{number of } b\text{-spanning trees}},$$

where Cayley’s formula $a(b) = b^{b-2}$ gives the number of spanning trees in a complete graph with b nodes.

• C(b) = 0 if and only if there are no b-cliques in the graph
• C(b) = 1 if and only if the graph is complete
• For Erdős–Rényi G(n, p) random graphs, C(b) has a simple form, e.g. C(3) = $p$, C(4) = $p^3$, C(5) = $p^6$, etc.
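A brute-force evaluation of C(b) on a small graph might look like the sketch below. It is not the authors' implementation. It assumes that "b-spanning trees" means trees on b nodes that occur as subgraphs, counted here by summing, over every b-node subset, the spanning trees of the induced subgraph. The names `clustering_cb` and `count_spanning_trees` and the dict-of-sets input are hypothetical.

```python
from itertools import combinations


def _find(parent, v):
    """Union-find root lookup with path halving."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def count_spanning_trees(nodes, edges):
    """Count edge subsets of size len(nodes)-1 that contain no cycle.

    Such a subset is acyclic exactly when it is a spanning tree.
    """
    count = 0
    for subset in combinations(edges, len(nodes) - 1):
        parent = {v: v for v in nodes}
        acyclic = True
        for u, v in subset:
            ru, rv = _find(parent, u), _find(parent, v)
            if ru == rv:  # this edge would close a cycle
                acyclic = False
                break
            parent[ru] = rv
        if acyclic:
            count += 1
    return count


def clustering_cb(adj, b):
    """Generalized clustering coefficient C(b), brute force (small b only)."""
    cliques = trees = 0
    for nodes in combinations(sorted(adj), b):
        edges = [(u, v) for u, v in combinations(nodes, 2) if v in adj[u]]
        if len(edges) == b * (b - 1) // 2:  # all pairs adjacent: a b-clique
            cliques += 1
        trees += count_spanning_trees(nodes, edges)
    # Cayley's factor a(b) = b^(b-2) normalises a complete graph to 1.
    return b ** (b - 2) * cliques / trees if trees else 0.0


# Complete graph on 5 nodes: C(3), C(4) and C(5) should all equal 1.
k5 = {i: {j for j in range(5) if j != i} for i in range(5)}
print([clustering_cb(k5, b) for b in (3, 4, 5)])
```

As an informal check on the Erdős–Rényi values, dividing the expected number of b-cliques by the expected number of b-node trees gives $p^{(b-1)(b-2)/2}$, which matches the exponents 1, 3 and 6 quoted above. This is a heuristic reading rather than a derivation taken from the slides.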


Overall Clustering Coefficient C(4) (figure)

Overall Clustering Coefficient C(5) (figure)


Analytic Formulae for Subgraphs and C(b)

Let $g$ be the adjacency matrix and $k_i$ the degree of node $i$.

Tadpole:

$$|M^{(4)}_{15}| = \frac{1}{2} \sum_{i:\, k_i > 2} (g^3)_{ii}\,(k_i - 2)$$

5-arrow:

$$|M^{(5)}_{77}| = \sum_{\substack{(i,j) \in E \\ \text{both directions}}} \binom{k_i - 1}{2} (k_j - 1) - 2\,|M^{(4)}_{15}|$$

Generally, analytic formulae for subgraphs can be expressed in terms of simpler subgraphs.
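The two formulae above can be evaluated directly from an adjacency matrix. The sketch below uses plain Python lists for a 0/1 symmetric matrix. The function names `tadpole_count` and `five_arrow_count` are hypothetical labels for $|M^{(4)}_{15}|$ and $|M^{(5)}_{77}|$, and the counts are read as non-induced subgraph counts, which is an assumption about the slides' convention.

```python
from math import comb


def matmul(a, b):
    """Product of two square matrices stored as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
        for i in range(n)
    ]


def tadpole_count(g):
    """(1/2) * sum over nodes with k_i > 2 of (g^3)_ii * (k_i - 2)."""
    g3 = matmul(matmul(g, g), g)
    k = [sum(row) for row in g]
    total = sum(g3[i][i] * (k[i] - 2) for i in range(len(g)) if k[i] > 2)
    return total // 2  # (g^3)_ii is twice the number of triangles at i


def five_arrow_count(g):
    """Sum over ordered edges (i, j) of C(k_i - 1, 2) * (k_j - 1), minus 2 * tadpoles."""
    n = len(g)
    k = [sum(row) for row in g]
    total = sum(
        comb(k[i] - 1, 2) * (k[j] - 1)
        for i in range(n)
        for j in range(n)
        if g[i][j]  # each undirected edge is visited in both directions
    )
    return total - 2 * tadpole_count(g)


# A single 5-arrow: node 0 joined to 1, 2 and 3, with node 3 joined to 4.
arrow = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(tadpole_count(arrow), five_arrow_count(arrow))
```

The subtraction in `five_arrow_count` removes the cases where the extra neighbour of j coincides with one of the two chosen neighbours of i, which forms a tadpole instead of a tree.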


Inherent Limits on Analytic Formulae for C(b)

How many non-isomorphic trees are there on b nodes?

• C(3) has one denominator term
• C(4) has two denominator terms
• C(5) has three denominator terms
• C(6) has six denominator terms

The number of denominator terms in C(b) explodes: 1, 2, 3, 6, 11, 23, 47, 106, 235, 551, 1301, 3159, 7741, 19320, 48629, 123867, 317955, 823065, 2144505, 5623756, 14828074, 39299897, 104636890, 279793450, 751065460, 2023443032, 5469566585, 14830871802, 40330829030, 109972410221, 300628862480, 823779631721, 2262366343746, 6226306037178, ...
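The first terms of this sequence can be reproduced with a standard counting technique that is not taken from the slides: count rooted trees with the classical recurrence, then convert to unrooted (free) trees with Otter's formula. The function names below are hypothetical.

```python
def rooted_trees(n_max):
    """r[n] = number of rooted trees on n unlabeled nodes, for n = 1..n_max."""
    r = [0, 1]  # index 0 is unused padding; r[1] = 1
    for n in range(1, n_max):
        total = 0
        for k in range(1, n + 1):
            # s_k = sum over divisors d of k of d * r[d]
            s_k = sum(d * r[d] for d in range(1, k + 1) if k % d == 0)
            total += s_k * r[n - k + 1]
        r.append(total // n)  # recurrence gives r[n + 1]
    return r


def free_trees(n_max):
    """t[n] = number of non-isomorphic trees on n nodes (Otter's formula)."""
    r = rooted_trees(n_max)
    t = {}
    for n in range(1, n_max + 1):
        pairs = sum(r[i] * r[n - i] for i in range(1, n))
        if n % 2 == 0:
            pairs -= r[n // 2]
        t[n] = r[n] - pairs // 2
    return t


t = free_trees(9)
print([t[b] for b in range(3, 10)])
```

Under this reading, the number of denominator terms in C(b) is the number of non-isomorphic trees on b nodes, so the printed list can be compared with the start of the sequence above.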


Dynamic Behaviour of C(3), C(4) and C(5)

Figure panels: Southwest Airlines, American Airlines, US Airways, United Airlines.

Figure panels: Spirit Airlines, AirTran Airways, Alaska Airlines, Delta Air Lines.


C(b) is Not Always Monotonic

Counter-example in which C(3) ≤ C(4) is possible

• C(3) = $\frac{12}{n+10}$; n ≥ 5,
• C(4) = $\frac{16}{n+22}$; n ≥ 6,
• Equality occurs at n = 26, when C(3) = C(4) = $\frac{1}{3}$,

from which C(3) ≤ C(4) for n ≥ 26


C(b) is Not Always Monotonic

Counter-example in which C(4) ≤ C(5) is possible

• C(4) = $\frac{80}{n+95}$; n ≥ 7,
• C(5) = $\frac{125}{n+203}$; n ≥ 8,
• Equality occurs at n = 97, when C(4) = C(5) = $\frac{5}{12}$,

from which C(4) ≤ C(5) for n ≥ 97
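The crossover points in the two counter-examples can be checked with exact rational arithmetic. The helper below is a hypothetical illustration: it only evaluates the closed-form expressions quoted on the slides and does not construct the underlying graphs.

```python
from fractions import Fraction


def crossover(num_lo, off_lo, num_hi, off_hi, n_min, n_max=1000):
    """Smallest n in [n_min, n_max] with num_lo/(n+off_lo) <= num_hi/(n+off_hi).

    Returns (n, common lower-order value at that n), or None if no such n.
    """
    for n in range(n_min, n_max + 1):
        lo = Fraction(num_lo, n + off_lo)
        hi = Fraction(num_hi, n + off_hi)
        if lo <= hi:
            return n, lo
    return None


# C(3) = 12/(n+10) against C(4) = 16/(n+22), both defined for n >= 6.
print(crossover(12, 10, 16, 22, n_min=6))
# C(4) = 80/(n+95) against C(5) = 125/(n+203), both defined for n >= 8.
print(crossover(80, 95, 125, 203, n_min=8))
```

Cross-multiplying gives the same thresholds by hand: 12(n + 22) ≤ 16(n + 10) reduces to n ≥ 26, and 80(n + 203) ≤ 125(n + 95) reduces to n ≥ 97.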


Algorithmic (Asymptotic) Runtime of C(3)

Runtime of C(3) nested loop algorithm is $O(n^3)$


Algorithmic (Asymptotic) Runtime of C(4)

Runtime of C(4) nested loop algorithm is $O(n^4)$


Algorithmic (Asymptotic) Runtime of C(5)

Runtime of C(5) nested loop algorithm is $O(n^5)$
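The polynomial growth comes from having one loop per clique node. The sketch below is an assumed form of such a nested-loop search, not the authors' code. It counts triangles and 4-cliques on a 0/1 adjacency matrix and also returns how many times the innermost loop body runs.

```python
def count_triangles_nested(g):
    """Three nested loops over i < j < k; returns (triangles, inner steps)."""
    n = len(g)
    triangles = steps = 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                steps += 1
                if g[i][j] and g[i][k] and g[j][k]:
                    triangles += 1
    return triangles, steps


def count_4cliques_nested(g):
    """Four nested loops over i < j < k < l; returns (4-cliques, inner steps)."""
    n = len(g)
    cliques = steps = 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                for l in range(k + 1, n):
                    steps += 1
                    if (g[i][j] and g[i][k] and g[i][l]
                            and g[j][k] and g[j][l] and g[k][l]):
                        cliques += 1
    return cliques, steps


# Complete graph on 6 nodes.
k6 = [[0 if i == j else 1 for j in range(6)] for i in range(6)]
print(count_triangles_nested(k6), count_4cliques_nested(k6))
```

The step counts equal the binomial coefficients "n choose 3" and "n choose 4", which grow like $n^3/6$ and $n^4/24$. A fifth loop gives the $O(n^5)$ behaviour for C(5) in the same way.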


Is C(b) Useful in Practice?

Summary of Results

• We examine the nature and dynamics of cliques in real-world airline networks; the standard clustering coefficient C(3) is widely used in applied work, and provides non-obvious information about the structure and evolution of networks
• Using graph theory, we propose a new generalized clustering coefficient C(b), which nests C(3); and we develop a very fast analytic implementation that we show to be up to 2,000 times faster than a brute-force approach, for small networks
• It is not feasible to use C(b) when b is greater than about 6, because analytic formulae will be too difficult to derive


Is C(b) Useful in Practice?

Drawbacks and Extensions

• Since C(b) displays high correlation across b, it is possible that b > 3 contains little “new” information, individually or in combination with b = 3 (it remains to be seen whether this high correlation holds generally for other networks)
• Building analytic formulae might have application to other (slow) statistics that are used in applied graph theory
• Future work on graph theory and econometrics should lead to a better understanding of the economic, strategic and spatial factors that drive dynamic clustering in real-world networks
