Understanding Performance Bottleneck to Improve Parallel ...Understanding Performance Bottleneck to Improve Parallel Efﬁciency of Louvain Algorithm Naw Safrin Sattar and Shaikh Arifuzzaman

Understanding Performance Bottleneck to ImproveParallel Efficiency of Louvain Algorithm

Naw Safrin Sattar and Shaikh ArifuzzamanDepartment of Computer Science, University of New Orleans, New Orleans, LA-70148, USA.

Email: (nsattar, smarifuz)@uno.edu

Abstract—Detecting communities (clusters) in massivenetworks is a fundamental problem in network science.The Louvain algorithm is one of the fastest modularity-based algorithms, and works well with large graphs.It also reveals a hierarchy of communities at differentscales, which can be useful for understanding the globalfunctioning of a network. In current literature, thereexists several shared memory parallel implementationwith better scalability on large-scale graphs. Our focusis to analyze the performance bottlenecks in distributedenvironment. We also look for the scope of improvementsin a hybrid (both shared and distributed memory) ap-proach. Using profiling tools, we can identify the factorsbehind memory consumption, communication overheads.This understanding helps us to design better algorithmsin distributed and hybrid environments. We are alsoworking towards a gpu-based parallel implementationof Louvain algorithm utilizing the power of GPUs toparallelize massive data.

Keywords-Louvain algorithm; parallel methods; large-scale dataset; community detection; GPU

MOTIVATION, METHODS AND CONTRIBUTION

Community structures help us solve many real-worldproblems such as rumor propagation, epidemic spread-ing, recommender system in e-commerce, terrorist ac-tivities in social networks and many more. Louvainalgorithm [1] is one of the efficient and well-known al-gorithms to detect communities considering the qualityof its output communities and the computational time.Parallelizing Louvain algorithm in distributed memoryenvironment is difficult considering the challenges ofcommunication overhead among processors and a well-balanced partitioning technique of initial data. In ourwork we aim at pointing out the bottlenecks to improvethe parallel efficiency of Louvain algorithm in dis-tributed environment. A hybrid algorithm provides theflexibility to the users to balance between both sharedand distributed memory system according to availableresources. We analyze our distributed-memory Louvainalgorithm [2] with MPI profiling tool TAU to under-stand the communication pattern among processors, toidentify possibilities towards improvement. We observe

that for MPI Send and MPI Recv functions in over-all communication, 65% and 69% of the processorsrespectively, take less than average time to completetheir tasks when the load is balanced (METIS graph-partitioner) shown in Figure 1. Again, only few proces-sors have higher number of communications and largermessage size compared to the larger threshold in theload-imbalanced environment.

Figure 1: Runtime of MPI functions

We will look further into the memory consumptioni.e. branching and cache access patterns, time stalledwaiting for resources (such as in memory reads), etc.as well as communication time at different phases ofthe algorithm to identify performance bottlenecks. Itwill help us to identify whether communication timeoverweighs computation time and change the designof our algorithm accordingly. We will also experimentwith other graph-partitioning techniques (i.e. hyper-graph partitioning for social networks) for improvedload-balancing and higher parallel efficiency. We arealso working on designing a parallel implementationof Louvain algorithm with data parallel computationsin GPUs.

REFERENCES

[1] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, andE. Lefebvre, “Fast unfolding of communities in largenetworks,” Journal of statistical mechanics: theory andexperiment, vol. 2008, no. 10, p. P10008, 2008.

[2] N. S. Sattar and S. Arifuzzaman, “Overcoming mpicommunication overhead for distributed community de-tection,” in Workshop on Software Challenges to Exas-cale Computing. Springer, 2018, pp. 77–90.

Documents

Understanding Performance Bottleneck to Improve Parallel ...Understanding Performance Bottleneck to Improve Parallel Efﬁciency of Louvain Algorithm Naw Safrin Sattar and Shaikh Arifuzzaman