A Scalable Parallel Inter-cluster Communication System for Clustered Multiprocessors

  • 8/13/2019 A Scalable Parallel Inter-cluster Communication System for Clustered Multiprocessors

    1/52

    A Scalable Parallel Inter-Cluster Communication System for Clustered Multiprocessors

    by Xiaohu Jiang

    Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of

    Master of Science

    at the

    MASSACHUSETTS INSTITUTE OF TECHNOLOGY

    August 1997

    Massachusetts Institute of Technology 1997. All rights reserved.

    Author: Department of Electrical Engineering and Computer Science, August 21, 1997

    Certified by: Anant Agarwal, Associate Professor of Computer Science and Engineering, Thesis Supervisor

    Accepted by: Arthur C. Smith, Chairman, Departmental Committee on Graduate Students

    A Scalable Parallel Inter-Cluster Communication System for Clustered Multiprocessors

    by Xiaohu Jiang

    Submitted to the Department of Electrical Engineering and Computer Science on August 21, 1997, in partial fulfillment of the requirements for the degree of Master of Science

    Abstract

    Clustered multiprocessors have been proposed as a cost-effective way of building large-scale parallel computers. A reliable and highly efficient inter-cluster communication system is key to the success of this approach. This thesis presents a design of a scalable parallel inter-cluster communication system. The system achieves high bandwidth and low latency by leveraging parallelism in protocol processing and network access within each cluster. Intelligent Network Interfaces (INIs), which are network interface cards equipped with protocol processors, are used as building blocks for the system. A prototype of the design is built on the Alewife multiprocessor. The prototype inter-cluster communication system is integrated with the Alewife Multigrain Shared-Memory System, and the performance of Water, a SPLASH benchmark, is studied in detail on the platform.

    Our results show that the introduction of a software protocol stack in the inter-cluster communication system can increase application run time by as much as a factor of two. For applications with high inter-cluster communication requirements, contention at INIs can be severe when multiple compute nodes share a single INI. Our initial results also suggest that for a given application and machine size, when the size of clusters and their inter-cluster communication system are scaled proportionally, contention in inter-cluster communication levels off. For Water, we found that the impact of inter-cluster communication overhead on overall run time decreases even when the number of INIs assigned to each cluster scales with the square root of the cluster size. Again, this result assumes a fixed machine size.

    Thesis Supervisor: Anant Agarwal
    Title: Associate Professor of Computer Science and Engineering

    Acknowledgments

    I am really grateful to Donald Yeung, who introduced me to the topic of this thesis. Through the long discussions we had during the past year or so, Donald has consistently given me guidance, as well as hands-on help with my research. Many thanks should also go to John Kubiatowicz, who answered all my questions about the Alewife system. In addition, Anant Agarwal, my advisor, has guided me all the way through my research.

    Many other people have also given me help. Among them, members of the Alewife group, Ken Mackenzie, Victor Lee, Walter Lee, and Matt Frank, gave me insight into various problems through discussions. My officemates Michael Taylor and Benjamin Greenwald have created a friendly and amusing environment, which made the whole experience enjoyable.

    Finally, I want to thank my girlfriend, Xiaowei Yang, whose love and emotional support make me feel that every day of my life, busy or relaxed, is a wonderful day.

    Contents

    1 Introduction
    2 Clustered Multiprocessors
    3 System Design
      3.1 Protocol Stack
      3.2 End-to-end Reliability
        3.2.1 The Sliding Window Protocol
        3.2.2 Software Timers
      3.3 Load Balancing
      3.4 Related Work
    4 Prototype Implementation
      4.1 Alewife
      4.2 The Alewife MGS System
      4.3 Protocol Implementation
      4.4 System Configurability
        4.4.1 Performance Related Optimizations
        4.4.2 Statistics collection
    5 Performance results
      5.1 Inter-cluster message passing performance on unloaded system
      5.2 Application Performance
      5.3 Inter-cluster communication performance with shared memory application load
      5.4 Intra-cluster INI node load imbalance measurement
      5.5 Scalability of Intra-cluster communication system
    6 An analytical model
      6.1 Architecture assumption
      6.2 Model parameter description
      6.3 Model performance for applications with homogeneous inter-cluster communication load
      6.4 Result discussion
    7 Conclusion and Future Work

    List of Figures

    2-1 A clustered multiprocessor
    3-1 INI node protocol stack implementation
    3-2 The send_timers structure
    4-1 Configuration on a 32-node Alewife Machine
    5-1 Application performance with static INI load balance
    5-2 Application performance with round-robin INI load balance
    5-3 Intra-cluster INI node load balance measurement
    6-1 Queuing in inter-cluster message passing
    6-2 Time-line of an inter-cluster compute/request cycle including contention
    6-3 Comparison between model result and measurement on Alewife

    List of Tables

    5.1 Time breakdown among sending modules, measured in machine cycles
    5.2 Time breakdown among receiving modules, measured in machine cycles
    5.3 INI node performance running Water on MGS, with static INI scheduling
    5.4 INI node performance running Water on MGS, with round-robin INI scheduling
    5.5 Compare application (Water on MGS) performance between static INI scheduling and round-robin INI scheduling
    5.6 Intra-cluster INI node load imbalance measurement, with static INI scheduling
    5.7 Intra-cluster INI node load imbalance measurement, with round-robin INI scheduling
    5.8 Scaling inter-cluster communication size with cluster size 2 using static INI scheduling
    5.9 Scaling inter-cluster communication size with cluster size 2 using round-robin INI scheduling
    6.1 Architectural Parameters of the model
    6.2 Notations used by the model

    Chapter 1

    Introduction

    While traditional massively parallel processors (MPPs) can achieve good performance on a variety of important applications, the poor cost-performance of these systems prevents them from becoming widely available. In recent years, small- to medium-scale multiprocessors, such as bus-based Symmetric Multiprocessors (SMPs), have been emerging quickly. This class of machines can exploit parallelism in applications to achieve high performance, and simultaneously benefit from the economy of high volume because their small-scale nature allows them to be commodity components. Many researchers believe that by using these smaller multiprocessors as building blocks, high-performance MPPs can be built in a cost-effective way. In this thesis, we call these small- to medium-scale multiprocessors clusters, and the MPPs built by assembling these clusters together clustered multiprocessors. Furthermore, we define an inter-cluster communication system as the combination of an inter-cluster network, which is usually a commodity Local Area Network (LAN), some number of cluster network interface cards, and processor resources used to execute inter-cluster