A Case for Language Support for Implicitly Parallel Programming

DESCRIPTION

A Case for Language Support for Implicitly Parallel Programming. Christopher Rodrigues. Joint work with Prof. Shobha Vasudevan. Advisor: Wen-Mei Hwu. This presentation examines automatic parallelization from a compiler-centric point of view.

  • 3/13/09, UPCRC Seminar
    A Case for Language Support for Implicitly Parallel Programming
    Christopher Rodrigues
    Joint work with Prof. Shobha Vasudevan
    Advisor: Wen-Mei Hwu

  • Outline
    This presentation examines automatic parallelization from a compiler-centric point of view.
      • Why parallel algorithms become sequential programs
      • A better expression of parallelism
      • Checking correctness and parallelism
      • Implementation status
      • Moving forward
    [Figure labels: Algorithm, Source code, Sequential IR, Parallel IR, Parallel Executable]

  • Parallelism Lost
    [Figure: DCT block transforms in JPEG encoding (programmer's view); Block0, Block1, ..., BlockN-1 each pass through a DCT. Other labels: Points-to set #1000, Points-to set #1001]
    for (i=0; i
  • Parallelism Lost
    Developers manage complexity using high-level abstractions that provide:
      • Compositionality: the developer can reason about each module in isolation, because modules don't interact
      • Separation: functions interact with only a few pieces of data and are independent of everything else
    Independence is a way to manage complexity.
    However, abstractions are lost in source code: translation to source code introduces dependences.

  • An Example of Parallel Algorithms Becoming Sequential: SIFT
    Scale-Invariant Feature Transform (SIFT) is a parallelizable image processing application.
    A sequential C implementation is provided in VLFeat.
      • Open source; download from www.vlfeat.org
    SIFT is a feature detector.
      • Feature: something in an image that helps to identify it
    Each SIFT feature consists of:
      • A keypoint: the feature's location and orientation
      • A descriptor: the feature's distinguishing characteristics

  • SIFT Execution Time Profile
    An arbitrary 640x480 picture was used for profiling.
    Three major parallel sections:
      • Scale and gradient images are computed by convolution (30% of time)
      • Each descriptor is a histogram (60% of time)
    Will focus on parallel descriptor calculation (the highlighted loop).
    Code:
        for each file:
          for each octave:
            compute scale images
            compute gradient images
            for each scale:
              find keypoints
              refine keypoints
            for each scale:
              for each keypoint:
                calculate orientations
                for each orientation:
                  calculate descriptor
                  output descriptor
    Time (values from the profile figure): 62%, 25%, 2.5%, 6%, 3%; total: 98.5%

  • SIFT's Descriptor Computation Pipeline
        for (; i < nkeys; i++) {
          ;  /* Stage 1: Get keypoint */
          ;  /* Stage 2: Compute orientations (parallelizable) */
          for (q = 0; q < nangles; q++) {
            Descriptor descr;
            ;  /* Stage 3: Compute descriptor (parallelizable) */
            ;  /* Stage 4: Write output (sequential) */
          }
        }

  • Developer Introduces Sequential Dependences
    The code cannot be parallelized in its current form:
      • Buffer reuse: cannot parallelize because buffer d only holds one result at a time. Solution: privatization
      • Sequential I/O: cannot parallelize the loop because stage 4 is sequential. Solution: loop fission
      • Lazy update: data is computed on demand and cached for reuse. Solution: precomputation
        for (...) {
          Descr d;
          // lazy update (used in do_descriptor)
          if (grad == NULL) grad = do_gradient();

          // stage 3: write to d
          do_descriptor(k, a, &d);

          // stage 4: read from d, write output to a file
          write_output(k, a, &d);
        }
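
The three fixes named on this slide (privatization, loop fission, precomputation) can be sketched by hand in C. This is a minimal illustration, not the VLFeat code: `Descr`, `do_gradient`, `do_descriptor`, and `write_output` are simplified, hypothetical stand-ins with invented signatures, and the "file" is an in-memory array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SIFT types and stages; not VLFeat's API. */
typedef struct { double value; } Descr;

int gradient_calls = 0;

/* Stands in for the expensive, lazily computed gradient image. */
double do_gradient(void) { gradient_calls++; return 2.0; }

/* Stage 3: writes only to *d and reads only its arguments. */
void do_descriptor(int k, double grad, Descr *d) { d->value = k * grad; }

/* Stage 4: sequential output; here it appends to an in-memory array. */
void write_output(const Descr *d, double *out, size_t *n_out)
{
    out[(*n_out)++] = d->value;
}

/* buf and out must each hold at least nkeys elements.
   Returns the number of records written to out. */
size_t run_transformed(int nkeys, Descr *buf, double *out)
{
    size_t written = 0;

    /* Precomputation: compute the gradient once, before the loop,
       instead of lazily inside it. */
    double grad = do_gradient();

    /* Privatization: iteration i owns buf[i] instead of sharing one d,
       so the iterations of this loop no longer depend on each other. */
    for (int i = 0; i < nkeys; i++)
        do_descriptor(i, grad, &buf[i]);

    /* Loop fission: the sequential stage 4 gets its own loop. */
    for (int i = 0; i < nkeys; i++)
        write_output(&buf[i], out, &written);

    return written;
}
```

After the rewrite, only the first loop is a candidate for parallel execution; the second loop keeps the original output order.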

  • Compiler Analysis Cannot Recover Parallelism
    Privatization analysis fails here.
      • It attempts to detect dead (definitely overwritten) data
      • Silenced errors: if input is out of range, no output is written
      • Dynamic array size: the number of array elements written and read varies across iterations
      • Conditional execution: data is written and read only if a flag is set
    Without privatization, the compiler cannot do loop fission.
    To my knowledge, no compiler will do precomputation here.
    Software complexity prevents transformations.

  • Parallelism Regained
    Don't ask the compiler to reverse-engineer low-level code.
    Avoid introducing dependences by providing libraries that match programming abstractions:
      • Pipelines
      • Container data structures
      • Others...
    Provide ways to communicate high-level abstractions to the compiler:
      • Access permissions: separation (data independence)
      • Parametric polymorphism: context-sensitive behavior
      • Algorithmic skeletons: control abstractions
      • Data encapsulation: data structures
      • Proof objects: integer value ranges
      • Dependent types: variable-size arrays, conditional effects, proof objects

  • Introducing a Software Pipeline Library
        for (; i < nkeys; i++) {
          ; ;
          for (q = 0; q < nangles; q++) {
            Descriptor descr;
            ;
            ;
          }
        }
    [Figure labels: Keypoint stream, Angle stream, Descriptor stream]
    Instead of writing loops, can we let programmers write a pipeline?

  • Software Pipeline Definitions
      • Stream: a computation producing a sequence of values
      • Filter: a computation that transforms an input stream to an output stream
      • Streams can contain filters
      • Pipeline: a stream connected to a consumer
      • Stage: a stream, filter, or consumer
      • Stages may have internal state
    [Figure labels: a stream ([43,42...2]); a filter (...a... to ...b...); a stream; consumer; a pipeline]

  • Design Methodology of the Pipeline Library API
    Developer calls library functions to build stages and pipelines.
      • Stages and pipelines are data types
      • Stages wrap worker functions that do the real computation
        my_pipeline_stage = p_map(my_worker_func);
    Similar libraries and languages exist (TBB, StreamIt).
      • Library functionality here is similar to previous work
      • Different motivation: to enable automatic parallelization by checking side effects
      • Will lead to different programming language features

  • Pipeline-Building Library Functions
      • p_range: generate a sequence of integers ([0,1...n-1])
      • p_bind: add a pipeline stage to a stream, creating a new stream
      • p_map: apply a transformation to each element of a stream (may be sequential or parallel)
      • p_fold: connect a stream to a consumer
      • p_cmap: generalization of p_map that can produce multiple outputs per input (...a... becomes ...b,b,b...)
      • p_run: execute a pipeline
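
To make the roles of these functions concrete, here is a minimal, integer-only sketch of how such a library might be put together. Only the function names come from the slide. Every signature and type below (`Stream`, `Filter`, `Pipeline`, the extra `init` argument to `p_fold`, and the helper `p_free`) is an assumption made for illustration, and `p_cmap` is omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types: a pull-based stream of ints. */
typedef struct Stream Stream;
struct Stream {
    int (*next)(Stream *self, int *out); /* 1 and sets *out, or 0 at end */
    Stream *source;                      /* upstream stream; NULL for p_range */
    int (*worker)(int);                  /* used by map stages only */
    int i, n;                            /* used by p_range only */
};
typedef struct { int (*worker)(int); } Filter;
typedef struct {
    Stream *stream;
    int (*consume)(int acc, int x);
    int acc;
} Pipeline;

static int range_next(Stream *s, int *out)
{
    if (s->i >= s->n) return 0;
    *out = s->i++;
    return 1;
}

static int map_next(Stream *s, int *out)
{
    int x;
    if (!s->source->next(s->source, &x)) return 0;
    *out = s->worker(x);
    return 1;
}

static Stream *new_stream(int (*next)(Stream *, int *))
{
    Stream *s = malloc(sizeof *s);
    assert(s != NULL); /* sketch only: abort on allocation failure */
    s->next = next;
    s->source = NULL;
    s->worker = NULL;
    s->i = 0;
    s->n = 0;
    return s;
}

/* p_range: generate the integers 0..n-1. */
Stream *p_range(int n) { Stream *s = new_stream(range_next); s->n = n; return s; }

/* p_map: wrap a worker function as a filter. */
Filter p_map(int (*worker)(int)) { Filter f = { worker }; return f; }

/* p_bind: attach a filter to a stream, creating a new stream. */
Stream *p_bind(Stream *src, Filter f)
{
    Stream *s = new_stream(map_next);
    s->source = src;
    s->worker = f.worker;
    return s;
}

/* p_fold: connect a stream to a consumer, giving a pipeline. */
Pipeline p_fold(Stream *s, int (*consume)(int, int), int init)
{
    Pipeline p = { s, consume, init };
    return p;
}

/* p_run: execute the pipeline and return the consumer's final state. */
int p_run(Pipeline *p)
{
    int x;
    while (p->stream->next(p->stream, &x))
        p->acc = p->consume(p->acc, x);
    return p->acc;
}

/* Frees a stream and everything upstream of it. */
void p_free(Stream *s)
{
    while (s != NULL) { Stream *up = s->source; free(s); s = up; }
}

/* Example worker and consumer. */
int square(int x) { return x * x; }
int add(int acc, int x) { return acc + x; }
```

With these definitions, `p_fold(p_bind(p_range(4), p_map(square)), add, 0)` builds a pipeline that sums the squares of 0 through 3 when passed to `p_run`.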

  • Software Pipeline Library Execution
    Library manages communication and execution order.
    Stateless stages can run sequentially or in parallel:
      • Sequential: as soon as output is produced, run the next stage
      • Parallel: save outputs in an array
    [Figure: execution order over time, sequential versus parallel]
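
The two execution orders can be compared on a two-stage pipeline. The sketch below is an assumption-laden illustration, not the library's implementation: `Worker`, `run_sequential`, and `run_parallel_order` are hypothetical names, and the "parallel" version only reproduces the order (all of stage 1, outputs saved in an array, then all of stage 2) without creating threads.

```c
#include <assert.h>
#include <stdlib.h>

typedef int (*Worker)(int);

/* Sequential order: as soon as stage 1 produces an output,
   run stage 2 on it. */
void run_sequential(Worker s1, Worker s2, const int *in, int *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = s2(s1(in[i]));
}

/* Parallel order: run stage 1 on every input and save the outputs in an
   array, then run stage 2 on the saved outputs. For stateless stages the
   iterations of each loop are independent, so a runtime could distribute
   them across threads. Returns 0 on success, -1 on allocation failure. */
int run_parallel_order(Worker s1, Worker s2, const int *in, int *out, int n)
{
    if (n <= 0) return 0;
    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL) return -1;
    for (int i = 0; i < n; i++) tmp[i] = s1(in[i]);
    for (int i = 0; i < n; i++) out[i] = s2(tmp[i]);
    free(tmp);
    return 0;
}

/* Example stateless workers. */
int add_one(int x) { return x + 1; }
int twice(int x) { return 2 * x; }
```

Because both workers are stateless, the two functions fill `out` with the same values; a stage with internal state would not carry that guarantee.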

  • The SIFT Descriptor Pipeline
    Here's how SIFT would look using the pipeline library:
        // define pipeline functions...
        Keypoint *get_keypoint(int *n) { ... }

        void do_orientations(Keypoint *k,
                             void (*send)(struct {Keypoint *k; double angle;} *s))
        { ... }

        // then build and run pipeline
        if (mode) start = p_unfold(lookup_keypoint);
        else start = p_bind(p_range(nkeys), p_map(get_keypoint));

        pl = p_fold(p_bind(p_bind(start, p_cmap(do_orientations)),
                           p_map(do_descriptor)),
                    write_output);
        p_run(pl);

  • Going from Explicit Parallelism to Implicit Parallelism
    I showed an explicitly parallel pipeline library.
      • Explicitly parallel software: the developer declares which stages are parallel
      • If the developer is wrong, the program will probably have race conditions resulting in nondeterministic behavior
    Will show next how to make it implicitly parallel.
      • Implicitly parallel software: the developer writes a pipeline and provides some dependence information
      • Guaranteed that parallel execution will produce the same result as sequential execution

  • Side Effect Conditions in the Pipeline API
    Use of a pipeline indicates the developer's intention to use a restricted communication pattern.
      • Stages are independent, except for input and output
      • Different iterations of stateless stages are independent
    The restricted communication pattern is part of the API.
      • Code that does not respect the interface is incorrect
      • Code that respects the interface is safe to run in parallel
    Checking correctness is detecting parallelism.

  • Preliminaries to Checking Correctness
    First, define a language semantics.
      • Framework for reasoning about whether parallel execution of sequential code is safe
      • Defined in terms of computations and permissions
    Then reify the semantics within the language.
      • Turn this framework into a type system
      • Compiler can use the type system to reason about parallelism
    Will show a pipeline example.

  • Language Semantics: Computations
    Programs are structured.
      • Nested blocks of code (more or less)
    A computation is an execution of a block of code.
      • Nested
      • Canonical execution order
      • Execution order within a library function is specified by the library
    Sibling computations are candidates for parallel execution.
        for (...) { foo(); bar(); }
    [Figure labels: Code, Computations, for loop]

  • Language Semantics: Permissions
    Keep track of data using access permissions.
      • Also called capabilities
      • Permissions are first-class values
    Performing a memory read or write requires:
      • A pointer to the data
      • A permission to access the data
    Permissions are compile-time bookkeeping.
      • No run-time overhead

  • Language Semantics: Writable Permissions
    Computations interfere if running them in parallel produces nondeterministic results.
      • Dependences are contention for access to a piece of data
    Prevent interference by restricting access permissions.
      • A computation needs a writable permission to write data
      • Writable permissions are linear values
      • A permission cannot be duplicated: only one computation owns any part of memory

  • Language Semantics: Readable Permissions
    Also want to support shared read-only data access.
      • Writable permissions can temporarily become read-only permissions
      • A computation requires read-only permission to the data it reads
      • Read-only permissions can be duplicated or discarded
      • But they cannot be returned, so that writable permissions can be recovered

  • Language Semantics: Summary
    A program runs as a hierarchy of nested computations.
    All side effects require permissions.
      • Writable permissions are linear values
      • Readable permissions are commutative effects
      • Can generalize to transactions, I/O, etc.
      • Permission accounting also detects leaks, type errors, and dangling pointers
    Computations can run in parallel if their permissions can be provided simultaneously.
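
The closing rule of this slide can be approximated with a run-time check. The presentation tracks permissions statically in a type system with no run-time cost, so the sketch below swaps in a different technique purely for illustration: each memory region gets one bit of a mask, and `Perms` and `can_run_in_parallel` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: bit k set means "holds a permission on region k". */
typedef struct {
    uint32_t readable; /* regions this computation only reads */
    uint32_t writable; /* regions this computation may write */
} Perms;

/* Two sibling computations can run in parallel if their permissions can
   be provided at the same time:
   - writable permissions are linear, so no region may be writable in both;
   - a region lent out read-only cannot be writable elsewhere at that time;
   - read-only permissions may be duplicated, so shared reads are fine. */
int can_run_in_parallel(Perms a, Perms b)
{
    if (a.writable & b.writable) return 0;
    if (a.writable & b.readable) return 0;
    if (b.writable & a.readable) return 0;
    return 1;
}
```

For example, two computations that both read region 0 and write regions 1 and 2 respectively pass the check, while two that both write region 1 do not.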

  • Permissions Are Statically Tracked in Type System
    The type system statically keeps track of what data is guarded by a permission value.
    Permission types are written as: data type @ address
    For example, permission to access an array of 100 integers at address b:
        array 100 int @ b

  • Permission Tracking in the Pipeline Library
    Use the type system to describe how pipelines behave when run.
    Stream s:
      • Produces outputs of a given type
      • Can access private writable data s1
      • Can read shared data r1
    Filter p:
      • Reads inputs of one type and produces outputs of another
      • Can access private writable data s2
      • Can read shared data r2
        s : Stream r1 s1 @ a
        p : Filter r2 s2 @ b
    [Figure labels: Input, Output, Read-only permissions, Writable permissions]

  • Permission Tracking in the Pipeline Library: Building Pipelines
    Can connect a stream to a filter if input and output types match.
    The resulting stream requires the combined permissions of its parts:
      • Union of read-only permissions (r1 ∪ r2)
      • Separating conjunction of writable permissions (s1 * s2; s1 and s2 are disjoint)
    The type system propagates side effects through library calls.
        s : Stream r1 s1 @ a
        p : Filter r2 s2 @ b

        s2 = p_bind(s, p);
        s2 : Stream (r1 ∪ r2) (s1 * s2)
    [Figure: s and p are compatible; the result carries the combined permissions of both stages]

  • Permission Tracking in the Pipeline Library: Running Pipelines
    Running a pipeline requires all permissions to be available.
    Cannot run a pipeline with a race condition:
      • Two stages want the same piece of data
      • Running requires two pieces of data at the same address, but only one is available
        s : Stream empty (int@c) @ a
        p : Filter empty (int@c) @ b

        s2 = p_bind(s, p);
        s2 : Stream empty (int@c * int@c)
    Running requires two copies of the same writable permission!
    The pipeline API's side effect conditions are checked statically.
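
The `int@c * int@c` example can be mimicked with a small run-time model of how binding combines permissions. This is only an analogue of the static check described on the slides. `StagePerms`, `perms_bind`, and `perms_runnable` are hypothetical names, permissions are modeled as plain addresses, and the fixed capacity `MAX_PERMS` is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PERMS 8

/* Hypothetical model: a permission is identified by the address it guards. */
typedef struct {
    const void *readable[MAX_PERMS]; size_t nread;
    const void *writable[MAX_PERMS]; size_t nwrite;
} StagePerms;

static int contains(const void *const *set, size_t n, const void *addr)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == addr) return 1;
    return 0;
}

/* Binding two stages: read-only permissions are unioned (duplicates are
   dropped), writable permissions are conjoined (duplicates are kept so
   that a conflict stays visible). */
StagePerms perms_bind(StagePerms s, StagePerms p)
{
    StagePerms out = s;
    for (size_t i = 0; i < p.nread; i++) {
        if (!contains(out.readable, out.nread, p.readable[i])) {
            assert(out.nread < MAX_PERMS);
            out.readable[out.nread++] = p.readable[i];
        }
    }
    for (size_t i = 0; i < p.nwrite; i++) {
        assert(out.nwrite < MAX_PERMS);
        out.writable[out.nwrite++] = p.writable[i];
    }
    return out;
}

/* A pipeline is runnable only if every writable permission it needs can
   be supplied once: the same address must not appear twice. */
int perms_runnable(const StagePerms *p)
{
    for (size_t i = 0; i < p->nwrite; i++)
        for (size_t j = i + 1; j < p->nwrite; j++)
            if (p->writable[i] == p->writable[j]) return 0;
    return 1;
}
```

Binding two stages that both need the writable permission for the same `int` yields a combined set that `perms_runnable` rejects, mirroring the slide's pipeline that cannot be run.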

  • Making the Type System Useful for Real Programs
    Parallel programs employ a variety of software techniques in their parallel sections.
    A general-purpose type system requires some difficult (but not fundamentally new) solutions:
      • Implementing linear and dependent types
      • Logical conditions in types
      • Parallelizing loop nests over arrays
      • Permitting user-defined data types
    Solutions are employed (separately) in:
      • Proof-theoretic programming languages
      • Parallelizing FORTRAN compilers
      • Shape analyses
      • Often as an analysis rather than a type system

  • Implementation Status of Pipeline Library
    Sequential pipeline library implemented in C.
      • Runs both sequential and parallel execution orders
      • SIFT is not parallelized yet
    Modest overhead for using the library:
      • Each pipeline stage invocation involves two indirect function calls
      • Stream outputs are heap-allocated
    Overhead is easily amortized:
      • SIFT takes >1 ms of computation time per loop iteration
      • Much greater than the overhead

  • Ongoing Work
    Building compiler infrastructure for the type system.
    Bridging high-level source code and the type system:
      • The type system has programmer-unfriendly features
      • Linear and dependent types are incompatible with mutability
      • Management of permissions and proof objects is tedious
      • Use the type system as an IR produced from source code
      • Investigating lightweight annotations and analysis to produce type information
    Optimizations and parallelization:
      • Exploit extra information in the type system for more powerful code and data transformations
      • Compiler-generated message passing, unboxing, layout transformations