Listing Cliques in Parallel Using a Beowulf Cluster
Kaveh Moallemi, Dr. Gerald D. Zarnett, and Dr. Eric R. Harley.
Department of Computer Science, Ryerson University, Toronto, Canada
OUTLINE
• What is a clique?
• Why are cliques of interest?
• How many cliques are there?
• Algorithms for listing cliques.
• Parallel algorithms for listing cliques.
• Results
• Conclusion
What is a clique?
• clique = fully-connected subgraph
• maximal clique = a clique which is not a proper subgraph of another clique
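The two definitions above can be checked mechanically. A minimal Python sketch (the example graph and helper names are illustrative, not from the talk):

```python
# Sketch: a set S is a clique if every pair of vertices in S is adjacent;
# it is a maximal clique if no vertex outside S is adjacent to all of S.
from itertools import combinations

def is_clique(adj, s):
    """True if every pair of vertices in s is connected in adj."""
    return all(v in adj[u] for u, v in combinations(s, 2))

def is_maximal_clique(adj, s):
    """True if s is a clique that no outside vertex can extend."""
    if not is_clique(adj, s):
        return False
    return not any(s <= adj[v] for v in adj if v not in s)

# Triangle 0-1-2 plus a pendant edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(is_clique(adj, {0, 1, 2}))       # True
print(is_maximal_clique(adj, {0, 1}))  # False: vertex 2 still extends it
print(is_maximal_clique(adj, {2, 3}))  # True
```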
Maximal Cliques
[Figure: an example graph, alongside the same graph with all maximal cliques highlighted in red]
Why are cliques of interest?
• Cliques denote groups of related items.
• Many applications:
– physical mapping
– protein-protein interactions
– homologous genes
– interlinked sites on the web
– data mining
How many cliques are there?
• let n = number of vertices in the graph
• at most there are 3^(n/3) maximal cliques (Moon-Moser bound)
• in a random graph, often about n^5, depending on density
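As a worked example of how fast the Moon-Moser bound grows (a sketch; the function name is ours, not from the talk):

```python
# Sketch: the Moon-Moser worst-case count of maximal cliques, 3^(n/3).
def moon_moser(n):
    # exact for n divisible by 3
    return 3 ** (n // 3)

for n in (15, 30, 60):
    print(n, moon_moser(n))
# 15 -> 243, 30 -> 59049, 60 -> 3486784401:
# the worst case is exponential well before n reaches graph sizes of interest.
```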
Number of cliques in a random graph
Experiment by Sonal Patel
Some clique listing algorithms
• 1972 - Bierstone
• 1973 - Bron and Kerbosch
• 1976 - Johnston
• 1977 - Tsukiyama et al.
• 1981 - Loukakis and Tsouros
• 1983 - Loukakis
• 1985 - Chiba and Nishizeki
• 1988 - Tomita et al.
• 2004 - Makino and Uno
Bron-Kerbosch Heuristic
[Diagram: the algorithm grows a CLIQUE set by pulling vertices in from its common NEIGHBORS (the candidates); vertices already explored form a NOT CANDIDATES set. One panel shows a state where we continue extending CLIQUE, the other a state where we stop extending CLIQUE.]
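The recursion sketched in the diagram can be written compactly. A minimal Python sketch of the basic Bron-Kerbosch procedure, without the pivoting heuristic (variable names follow the diagram: R is the CLIQUE, P the candidate NEIGHBORS, X the NOT set; the example graph is ours):

```python
# Basic Bron-Kerbosch: report R when it can no longer be extended (P empty)
# and it was not already reported via an earlier branch (X empty).
def bron_kerbosch(adj, R, P, X, out):
    if not P and not X:
        out.append(frozenset(R))   # R is a maximal clique
        return
    for v in list(P):
        bron_kerbosch(adj, R | {v}, P & adj[v], X & adj[v], out)
        P.remove(v)                # move v from the candidates...
        X.add(v)                   # ...into the NOT set

# Triangle 0-1-2 plus a pendant edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cliques = []
bron_kerbosch(adj, set(), set(adj), set(), cliques)
print(sorted(map(sorted, cliques)))   # [[0, 1, 2], [2, 3]]
```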
Parallel approaches to clique listing
• 1988 - Dahlhaus and Karpinski (design)
• 2005 - Blaar et al. (8 processors)
• 2006 - Du et al. (84 processors)
• 2009 - Jaber et al. (5 processors)
• 2009 - Schmidt et al. (2048 processors)
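A common way to parallelise the search - assumed here for illustration, not necessarily the scheme used by any of the papers above - is to farm the independent top-level branches of the Bron-Kerbosch tree out to worker processes. Branch v explores the cliques whose smallest vertex is v, so no clique is listed twice:

```python
# Sketch: one Bron-Kerbosch branch per top-level vertex, distributed with
# a process pool. The candidates for branch v are v's later neighbours;
# its earlier neighbours seed the NOT set, suppressing duplicates.
from multiprocessing import Pool

# Example graph (ours): triangle 0-1-2 plus a pendant edge 2-3.
ADJ = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

def bron_kerbosch(R, P, X, out):
    if not P and not X:
        out.append(tuple(sorted(R)))   # R is a maximal clique
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & ADJ[v], X & ADJ[v], out)
        P.remove(v)
        X.add(v)

def branch(v):
    out = []
    bron_kerbosch({v},
                  {u for u in ADJ[v] if u > v},   # candidates
                  {u for u in ADJ[v] if u < v},   # NOT set
                  out)
    return out

if __name__ == "__main__":
    with Pool(4) as pool:
        cliques = [c for part in pool.map(branch, ADJ) for c in part]
    print(sorted(cliques))   # [(0, 1, 2), (2, 3)]
```

In a real Beowulf setting the pool would be replaced by message passing across nodes (e.g. one branch per compute node), but the decomposition idea is the same: the branches share no state, so they can run independently.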
Beowulf cluster
[Image: a Beowulf cluster. Photo: Ahmad Mukarram]
Wulfgar System
• node0 - master node; PVFS meta-server; BProc master; DHCP server. CPU: PIII 550 MHz, Memory: 256 MB, Disk: 10 GB, Network: SMC Power II (epic100)
• node1-node16 - diskless compute nodes. CPU: PIII 550 MHz, Memory: 256 MB, Disk: none, Network: Netcore (8139too)
• io0-io4 - PVFS I/O servers. CPU: PIII 550 MHz, Memory: 128 MB, Disk: 6 GB, Network: SMC Power II (epic100)
Interconnect: Fast Ethernet through a 24-port switch (D-Link DES-1024D)
Slackwulf System
• Node0 - master node; PVFS meta-server; PVFS I/O server; BProc master; DHCP server. CPU: dual PII 300 MHz, Memory: 256 MB, Disk: 2x18 GB RAID1, Network: D-Link DGE-530T (sk98lin)
• Node1-Node4 - diskless compute nodes. CPU: dual PIII 750 MHz, Memory: 256 MB, Disk: none, Network: Intel Pro 1000MT (e1000 *)
Interconnect: Gigabit Ethernet through an 8-port switch (Netgear GS108)
* Version 5.5.4 of the Intel driver (the latest at the time) exhibits very high latency for small packet sizes; a corrected driver is awaited from Intel. These nodes have since been upgraded with D-Link DGE-530T interfaces.
Graph Name           Vertices (density)   Edges       Cliques       Single-Process Solve Time (sec)
graph0500.50         500 (50%)            62,375      102,802,598   15,203
graph1000.30         1,000 (30%)          149,973     16,113,690    3,560
graph1000.40         1,000 (40%)          199,813     303,588,849   89,862
graph4096.01         4,096 (1%)           408,479     980,953       N/A (Slackwulf only)
graph6000.01         6,000 (1%)           876,353     2,789,815     N/A (Slackwulf only)
graph8192.01         8,192 (1%)           1,631,547   6,521,599     8,947
graph200000.050.050  200,000              6,357,789   64,079,476    2,084
[Figures: speedup vs. number of processors for G500.50, G8192.01, G1000.30, and G200000.050.050 on each cluster, plotted against the ideal linear speedup line]
Conclusion
• A Beowulf cluster can give near-ideal speedup for up to 16 processors on random graphs.
• A Beowulf cluster may not give ideal speedup when the graph is huge and sparse with heavy I/O, or when dual-core processors are used.