
Issues in Multiprocessors


  • Issues in Multiprocessors

    Which programming model for interprocessor communication?
      - shared memory: regular loads & stores
      - message passing: explicit sends & receives

    Which execution model?
      - control parallel: identify & synchronize different asynchronous threads
      - data parallel: same operation on different parts of the shared data space
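The contrast between the two communication models can be sketched in Python (a stand-in for real hardware: threads play the role of processors, and all names here are illustrative). Shared memory communicates through ordinary reads & writes to common data; message passing through explicit send/receive operations:

```python
import threading
import queue

# Shared memory: "processors" (threads) communicate via plain
# loads & stores to a common location, ordered by a lock.
shared = {"total": 0}
lock = threading.Lock()

def sm_worker(values):
    for v in values:
        with lock:                # synchronize access to shared data
            shared["total"] += v  # a regular store to shared memory

# Message passing: communication is an explicit send (put) and
# receive (get); no data structure is shared between the two sides.
def mp_sender(values, chan):
    for v in values:
        chan.put(v)               # explicit send
    chan.put(None)                # end-of-stream marker

def mp_receiver(chan, out):
    while (v := chan.get()) is not None:   # explicit receive
        out.append(v)

threads = [threading.Thread(target=sm_worker, args=([1, 2, 3],)),
           threading.Thread(target=sm_worker, args=([4, 5],))]
for t in threads: t.start()
for t in threads: t.join()

chan, received = queue.Queue(), []
s = threading.Thread(target=mp_sender, args=([10, 20, 30], chan))
r = threading.Thread(target=mp_receiver, args=(chan, received))
s.start(); r.start(); s.join(); r.join()
```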

    CSE P548

  • Issues in Multiprocessors

    How to express parallelism
      - automatic (compiler) detection: implicitly parallel C & Fortran programs, e.g., the SUIF compiler
      - language support: HPF, ZPL
      - runtime library constructs: coarse-grain, explicitly parallel C programs

    Algorithm development
      - embarrassingly parallel programs can be parallelized easily
      - development of different algorithms for the same problem
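As an illustration of the embarrassingly parallel case, every iteration of the loop below is independent, so it parallelizes with no interprocessor communication at all (the per-element function `f` is hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

# An "embarrassingly parallel" loop: every iteration is independent,
# so the iterations can be handed to separate workers with no
# communication between them.
def f(x):          # hypothetical per-element work
    return x * x

data = list(range(8))
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(f, data))   # same answer as [f(x) for x in data]
```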

  • Issues in Multiprocessors

    How to get good parallel performance
      - recognize parallelism
      - transform programs to increase parallelism without decreasing processor locality
      - decrease sharing costs
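A minimal sketch of one such transformation, assuming a simple reduction (all names are illustrative): giving each processor a contiguous chunk and a private accumulator increases parallelism, preserves spatial locality, and defers the sharing cost to a single combining step.

```python
import threading

# Partition a loop over `data` into contiguous chunks, one per worker.
# Contiguous chunks preserve each worker's spatial locality, and a
# private partial sum per worker avoids sharing on every iteration;
# the sharing cost is paid once, at the final combine.
data = list(range(100))
n_workers = 4
partials = [0] * n_workers

def worker(i):
    chunk = data[i * 25:(i + 1) * 25]   # contiguous block for locality
    partials[i] = sum(chunk)            # private accumulator, no sharing

threads = [threading.Thread(target=worker, args=(i,))
           for i in range(n_workers)]
for t in threads: t.start()
for t in threads: t.join()
total = sum(partials)                   # combine once at the end
```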

  • Flynn Classification

    SISD: single instruction stream, single data stream
      - single-context uniprocessors
    SIMD: single instruction stream, multiple data streams
      - exploits data parallelism
      - example: Thinking Machines CM-1
    MISD: multiple instruction streams, single data stream
      - systolic arrays
      - example: Intel iWarp
    MIMD: multiple instruction streams, multiple data streams
      - multiprocessors, multithreaded processors
      - relies on control parallelism: execute & synchronize different asynchronous threads of control
      - parallel programming & multiprogramming
      - example: most processor companies have MP configurations
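The SIMD/MIMD distinction can be sketched in Python (threads stand in for instruction streams; the operations themselves are arbitrary examples):

```python
import threading

# SIMD flavor: a single operation applied across many data elements.
xs = [1, 2, 3, 4]
simd_result = [x + 10 for x in xs]   # one "instruction", many data

# MIMD flavor: independent instruction streams on independent data,
# synchronized explicitly when their results are needed.
out = {}
def stream_a(): out["a"] = sum(range(5))    # one thread of control
def stream_b(): out["b"] = max([3, 1, 4])   # a different one

ta = threading.Thread(target=stream_a)
tb = threading.Thread(target=stream_b)
ta.start(); tb.start()
ta.join(); tb.join()    # synchronize the asynchronous threads
```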

  • CM-1

  • Systolic Array

  • MIMD

    Low-end
      - bus-based: simple, but the bus is a bottleneck
      - simple cache coherency protocol
      - physically centralized memory
      - uniform memory access (UMA machine)
      - examples: Sequent Symmetry, SPARCCenter, Alpha-, PowerPC- or SPARC-based servers

  • Low-end MP

  • MIMD

    High-end
      - higher-bandwidth, multiple-path interconnect: more scalable, but a more complex cache coherency protocol (if shared memory) and longer latencies
      - physically distributed memory
      - non-uniform memory access (NUMA machine)
      - could have processor clusters
      - examples: SGI Challenge, Convex Exemplar, Cray T3D, IBM SP-2, Intel Paragon

  • High-end MP

  • MIMD Programming Models

    Address space organization for physically distributed memory
      - distributed shared memory: 1 (logical) global address space
      - multicomputers: private address space per processor

    Inter-processor communication
      - shared memory: accessed via load/store instructions (SPARCCenter, SGI Challenge, Cray T3D, Convex Exemplar, KSR-1 & 2)
      - message passing: explicit communication by sending/receiving messages (TMC CM-5, Intel Paragon, IBM SP-2)

  • Shared Memory vs. Message Passing

    Shared memory
      + simple parallel programming model
        - global shared address space
        - don't have to worry about data locality, but
          you get better performance when the program is written for data placement:
          lower latency when data is local, less communication when false sharing is avoided
          (you can do data placement when it is crucial, but you don't have to)
        - hardware maintains data coherence
        - synchronize to order processors' accesses to shared data
        - like uniprocessor code, so parallelizing by the programmer or compiler is easier:
          can focus on program semantics, not interprocessor communication
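To illustrate the "like uniprocessor code" point, here is a hypothetical histogram written sequentially and then in shared-memory style. The parallel version keeps the same loads & stores and only adds a lock to order accesses to the shared counts:

```python
import threading

# Uniprocessor version: a plain loop over the data.
def histogram_seq(values, nbins):
    counts = [0] * nbins
    for v in values:
        counts[v % nbins] += 1
    return counts

# Shared-memory parallel version: the same loads & stores, plus a
# lock to order the processors' accesses to the shared `counts`.
def histogram_par(values, nbins, n_workers=4):
    counts = [0] * nbins
    lock = threading.Lock()

    def worker(chunk):
        for v in chunk:
            with lock:                 # order accesses to shared data
                counts[v % nbins] += 1

    size = (len(values) + n_workers - 1) // n_workers
    threads = [threading.Thread(target=worker,
                                args=(values[i * size:(i + 1) * size],))
               for i in range(n_workers)]
    for t in threads: t.start()
    for t in threads: t.join()
    return counts
```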

  • Shared Memory vs. Message Passing

    Shared memory (continued)
      + low latency (no message-passing software), but
        - latency-hiding techniques that overlap communication & computation can be applied to message-passing machines too
      + higher bandwidth for small transfers, but
        - usually the only choice
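A sketch of the latency-hiding idea, with a `time.sleep` standing in for a high-latency communication (the function names are hypothetical): start the communication early, compute while it is in flight, and block only when the remote data is actually needed.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Latency hiding: overlap communication with computation.
def fetch_remote():           # hypothetical high-latency communication
    time.sleep(0.05)
    return [1, 2, 3]

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(fetch_remote)   # communication begins now
    local = sum(range(1000))             # overlapped local computation
    remote = future.result()             # block only when data is needed

overlapped_total = local + sum(remote)
```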

  • Shared Memory vs. Message Passing

    Message passing
      + abstraction in the programming model encapsulates the communication costs, but
        - more complex programming model: additional language constructs; need to program for nearest-neighbor communication
      + no coherency hardware
      + good throughput on large transfers, but
        - what about small transfers?
      + more scalable (memory latency doesn't scale with the number of processors), but
        - large-scale shared memory has distributed memory also
        - hah! so you're going to adopt the message-passing model?

  • Shared Memory vs. Message Passing

    Why there was a debate
      - little experimental data
      - implementations were not separated from programming models
      - can emulate one paradigm with the other:
        message passing on a shared-memory machine: message buffers in private (per-processor) memory; copy messages by load/store between buffers
        shared memory on a message-passing machine: each load/store becomes a message copy ... sloooooooooow

    Who won?
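The first emulation can be sketched directly: on a shared-memory machine, a "message" is just a load/store copy into the receiver's private buffer, plus synchronization (buffer layout and names here are illustrative):

```python
import threading

# Emulating message passing on a shared-memory machine: each
# "processor" owns a private message buffer, and a send is a
# load/store copy into the receiver's buffer under a lock.
n_procs = 2
buffers = [[] for _ in range(n_procs)]       # one buffer per processor
locks = [threading.Lock() for _ in range(n_procs)]

def send(dst, msg):
    with locks[dst]:
        buffers[dst].append(msg)   # the "message" is copied by stores

def receive(me):
    with locks[me]:
        return buffers[me].pop(0) if buffers[me] else None

send(1, "hello")                   # processor 0 sends to processor 1
msg = receive(1)                   # processor 1 receives
```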
