Understanding Performance Bottleneck to Improve Parallel ...Understanding Performance Bottleneck to...

Understanding Performance Bottleneck to ImproveParallel Efficiency of Louvain Algorithm

Naw Safrin Sattar and Shaikh ArifuzzamanDepartment of Computer Science, University of New Orleans, New Orleans, LA-70148, USA.

Email: (nsattar, smarifuz)@uno.edu

Abstract—Detecting communities (clusters) in massivenetworks is a fundamental problem in network science.The Louvain algorithm is one of the fastest modularity-based algorithms, and works well with large graphs.It also reveals a hierarchy of communities at differentscales, which can be useful for understanding the globalfunctioning of a network. In current literature, thereexists several shared memory parallel implementationwith better scalability on large-scale graphs. Our focusis to analyze the performance bottlenecks in distributedenvironment. We also look for the scope of improvementsin a hybrid (both shared and distributed memory) ap-proach. Using profiling tools, we can identify the factorsbehind memory consumption, communication overheads.This understanding helps us to design better algorithmsin distributed and hybrid environments. We are alsoworking towards a gpu-based parallel implementationof Louvain algorithm utilizing the power of GPUs toparallelize massive data.

Keywords-Louvain algorithm; parallel methods; large-scale dataset; community detection; GPU

MOTIVATION, METHODS AND CONTRIBUTION

Community structures help us solve many real-worldproblems such as rumor propagation, epidemic spread-ing, recommender system in e-commerce, terrorist ac-tivities in social networks and many more. Louvainalgorithm [1] is one of the efficient and well-known al-gorithms to detect communities considering the qualityof its output communities and the computational time.Parallelizing Louvain algorithm in distributed memoryenvironment is difficult considering the challenges ofcommunication overhead among processors and a well-balanced partitioning technique of initial data. In ourwork we aim at pointing out the bottlenecks to improvethe parallel efficiency of Louvain algorithm in dis-tributed environment. A hybrid algorithm provides theflexibility to the users to balance between both sharedand distributed memory system according to availableresources. We analyze our distributed-memory Louvainalgorithm [2] with MPI profiling tool TAU to under-stand the communication pattern among processors, toidentify possibilities towards improvement. We observe

that for MPI Send and MPI Recv functions in over-all communication, 65% and 69% of the processorsrespectively, take less than average time to completetheir tasks when the load is balanced (METIS graph-partitioner) shown in Figure 1. Again, only few proces-sors have higher number of communications and largermessage size compared to the larger threshold in theload-imbalanced environment.

Figure 1: Runtime of MPI functions

We will look further into the memory consumptioni.e. branching and cache access patterns, time stalledwaiting for resources (such as in memory reads), etc.as well as communication time at different phases ofthe algorithm to identify performance bottlenecks. Itwill help us to identify whether communication timeoverweighs computation time and change the designof our algorithm accordingly. We will also experimentwith other graph-partitioning techniques (i.e. hyper-graph partitioning for social networks) for improvedload-balancing and higher parallel efficiency. We arealso working on designing a parallel implementationof Louvain algorithm with data parallel computationsin GPUs.

REFERENCES

[1] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, andE. Lefebvre, “Fast unfolding of communities in largenetworks,” Journal of statistical mechanics: theory andexperiment, vol. 2008, no. 10, p. P10008, 2008.

[2] N. S. Sattar and S. Arifuzzaman, “Overcoming mpicommunication overhead for distributed community de-tection,” in Workshop on Software Challenges to Exas-cale Computing. Springer, 2018, pp. 77–90.

Understanding Performance Bottleneck to Improve Parallel ...Understanding Performance Bottleneck to...

Documents

Where is the bottleneck

Bottleneck of Coordination

Power Transmission_the Real Bottleneck

Great Bottleneck Blues Lessons

Parallel Query Processing - DoACT, AKTmazsola.iit.uni-miskolc.hu/tempus/discom/doc/db/tema05.pdf · Main problem: I/O bottleneck ... Parallel query processing ... it means more asynchronous

Report on warid telecom international by Arifuzzaman Talukder

Multivariate Information Bottleneck

Requirements the Last Bottleneck

End-to-end Data-flow Parallelism for Throughput ... · Disk Bottleneck " Step2: Effect of Parallel Streams on Memory-to-memory Transfers and CPU Utilization " Once disk bottleneck

SDG ACCELERATOR AND BOTTLENECK ASSESSMENTlocalizingthesdgs.org/library/401/SDG-Accelerator-and-Bottleneck... · The SDG Accelerator and Bottleneck Assessment tool was developed under

Bottleneck the Goal

Bottleneck Hypothesis

Bregman Information Bottleneck

Solving io bottleneck

Solving the IO Bottleneck

The Bottleneck Model: An Assessment and Interpretationksmall/Bottleneck review.pdf · · 2015-01-06The Bottleneck Model: An Assessment and Interpretation . Kenneth A. Small. a

Approximating geometric bottleneck shortest pathsnzeh/Publications/bottleneck_cgta04.pdf · bottleneck connectedness queries and bottleneck shortest path length queries can trivially

Parallel Mining and Analysis of Triangles and Communities ... · Parallel Mining and Analysis of Triangles and Communities in Big Networks S M Arifuzzaman (ABSTRACT) A network (graph)

Bottleneck Management and Improvement

GenomeQuest -- The Bioinformatics Bottleneck