Journeyman Tour


  • 8/7/2019 Journeyman Tour


ACM Parallel Computing Tech Pack

Journeyman's Programming Tour

November, 2010

Parallel Computing Committee

Paul Steinberg, Intel, Co-Chair
Matthew Wolf, CERCS, Georgia Tech, Co-Chair
Judith Bishop, Microsoft
Clay Breshears, Intel
Barbara Mary Chapman, University of Houston
Daniel J. Ernst, University of Wisconsin-Eau Claire
Andrew FitzGibbon, Shodor Foundation
Dan Garcia, University of California, Berkeley
Benedict Gaster, AMD
Katherine Hartsell, Oracle
Tom Murphy, Contra Costa College
Steven Parker, NVIDIA
Charlie Peck, Earlham College
Jennifer Teal, Intel

Special thanks to Abi Sundaram, Intel


Table of Contents

Introduction
The Basics of Parallel Computing
  Parallelism
  Parallel computing
  Is concurrency the same as parallelism?
Parallel Decompositions
  Introduction
  Task decomposition
  Data decomposition
  Pipeline parallelism
Parallel Hardware
  Memory systems
  Processing characteristics
  Coordination
  Scalability
  Heterogeneous architectures
Parallel Programming Models, Libraries, and Interfaces
  Introduction
  Shared memory model programming
    Posix threads
    Win32 threads
    Java
    OpenMP
    Threading Building Blocks (TBB)
  Distributed memory model programming
    Message passing interface (MPI)
  General purpose GPU programming
    The Open Compute Language (OpenCL)
    CUDA
  Hybrid parallel software architectures
  Parallel languages
    Concurrent ML and Concurrent Haskell
Tools
  Compilers
  Auto-parallelization
  Thread debuggers
  Tuners/performance profilers
  Memory tools


PARALLELISM COMPUTING: JOURNEYMAN'S PROGRAMMING TOUR

INTRODUCTION

In every domain the tools that allow us to tackle the big problems, and execute the complex calculations that are necessary to solve them, are computer based. The evolution of computer architecture towards hardware parallelism means that software/computational parallelism has become a necessary part of the computer scientist's and engineer's core knowledge. Indeed, understanding and applying computational parallelism is essential to gaining anything like sustained performance on modern computers. Going forward, performance computing will be even more dependent on scaling across many computing cores and on handling the increasingly complex nature of the computing task. This is true irrespective of whether the domain problem is predicting climate change, analyzing protein folding, or producing the latest animated blockbuster.

The Parallelism Tech Pack is a collection of guided references to help students, practitioners, and educators come to terms with the large and dynamic body of knowledge that goes by the name "parallelism." We have organized it as a series of tours; each tour in the tech pack corresponds to one particular guided route through that body of knowledge. This particular tour is geared towards those who have some reasonable skills as practitioners of serial programming but who have not yet really explored parallelism in any coherent way. All of the tours in the Parallelism Tech Pack are living documents that provide pointers to resources for the novice and the advanced programmer, for the student and the working engineer. Future tours within the Tech Pack will address other topics.

The authors of this Tech Pack are drawn from both industry and academia. Despite this group's wide variety of experiences in utilizing parallel platforms, interfaces, and applications, we all agree that parallelism is now a fundamental concept for all of computing.

Scope of Tour: This tour approaches parallelism from the point of view of someone comfortable with programming but not yet familiar with parallel concepts. It was designed to ease into the topic with some introductory context, followed by links to references for further study. The topics presented are by no means exhaustive. Instead, the topics were chosen so that a careful reader should achieve a reasonably complete feel for the fundamental concepts and paradigms used in parallel computing across many platforms. Exciting areas like transactional memory, parallelism in functional languages, distributed shared memory constructs, and so on will be addressed in other tours but also should be seen as building on the foundations put forth here.

Online Readings

Herb Sutter. 2005. The free lunch is over: A fundamental turn toward concurrency in software. Dr. Dobb's J. 33, 3 (March). http://www.gotw.ca/publications/concurrency-ddj.htm.

James Larus. 2009. Spending Moore's dividend. Commun. ACM 52, 5 (May). http://doi.acm.org/10.1145/1506409.1506425.

1. THE BASICS OF PARALLEL COMPUTING

Parallelism is a property of a computation in which portions of the calculations are independent of each other, allowing them to be executed at the same time. The more parallelism exists in a particular problem, the more opportunity there is for using parallel systems and parallel language features to exploit this parallelism and gain an overall performance improvement. For example, consider the following pseudocode:

float a = E + A;
float b = E + B;
float c = E + C;
float d = E + D;
float r = a + b + c + d;

The first four assignments are independent of each other, and the expressions E+A, E+B, E+C, and E+D can all be calculated in parallel, that is, at the same time, which can potentially provide a performance improvement over executing them sequentially, that is, one at a time.

Parallel computing is defined as the simultaneous use of more than one processor to solve a problem, exploiting that program's parallelism to speed up its execution time.
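The four independent additions in the pseudocode above can be made concrete. The following is a minimal sketch in Python, using a thread pool to issue the four sums at the same time (the values chosen for E, A, B, C, and D are invented for the illustration):

```python
from concurrent.futures import ThreadPoolExecutor

E, A, B, C, D = 1.0, 2.0, 3.0, 4.0, 5.0

def add(x, y):
    # Each addition is independent of the others, so all four can run at once.
    return x + y

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(add, E, v) for v in (A, B, C, D)]
    a, b, c, d = [f.result() for f in futures]

r = a + b + c + d  # the final sum depends on all four partial results
print(r)  # 3.0 + 4.0 + 5.0 + 6.0 = 18.0
```

Whether this actually runs faster than the sequential version depends on the cost of each operation relative to the overhead of coordinating the workers; for four floating-point additions the overhead dominates, but the structure is the same for expensive computations.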

Is concurrency the same as parallelism? While concurrency and parallelism are related, they are not the same! Concurrency mostly involves a set of programming abstractions to arbitrate communication between multiple processing entities (like processes or threads). These techniques are often used to build user interfaces and other asynchronous tasks. While concurrency does not preclude running tasks in parallel (and these abstractions are used in many types of parallel programming), it is not a necessary component. Parallelism, on the other hand, is concerned with the execution of multiple operations in parallel, that is, at the same time. The following diagram shows parallel programs as a subset of concurrent ones, together forming a subset of all possible programs:


[Diagram: nested sets showing parallel programs as a subset of concurrent programs, within the set of all programs]

2. PARALLEL DECOMPOSITIONS

Introduction. There are a number of decomposition models that are helpful to think about when breaking computation into independent work. Sometimes it is clear which model to pick. At other times it is more of a judgment call, depending on the nature of the problem, how the programmer views the problem, and the programmer's familiarity with the available toolsets. For example, if you need to grade final exams for a course with hundreds of students, there are many different ways to organize the job with multiple graders so as to finish in the shortest amount of time.

Tutorials

The EPCC centre at Edinburgh has a number of good tutorials. The tutorials most useful in this context are the following.

Introduction to High Performance Computing and Decomposing the Potentially Parallel. http://www2.epcc.ed.ac.uk/computing/training/document_archive/.


Blaise Barney. An Introduction to Parallel Computing. Lawrence Livermore National Labs. https://computing.llnl.gov/tutorials/parallel_comp/.

Videos

Introduction to Parallel Programming Video Lecture Series: Part 02. Parallel Decomposition Methods. This video presents three methods for dividing computation into parallel work: task decomposition, data decomposition, and pipelining. http://software.intel.com/en-us/courseware/course/view.php?id=381.

Introduction to Parallel Programming Video Lecture Series: Part 04. Shared Memory Considerations. This video provides the viewer with a description of the shared memory model of parallel programming. Implementation strategies for domain decomposition and task decomposition problems using threads within a shared memory execution environment are illustrated. http://software.intel.com/en-us/courseware/course/view.php?id=249.

Task decomposition, sometimes called functional decomposition, divides the problem by the type of task to be done and then assigns a particular task to each parallel worker. As an example, to grade hundreds of final exams, all test papers can be piled onto a table and a group of graders can each be assigned a single question or type of question to score, which is the task to be executed. So one grader has the task of scoring all essay questions, another grader would score the multiple-choice questions, and another would score the true/false questions.
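The exam-grading analogy can be sketched in code. The following Python fragment assigns one worker per type of question, which is the essence of task decomposition (the exam data and grading functions are invented for the illustration):

```python
from concurrent.futures import ThreadPoolExecutor

# 100 exams, each with an essay, two multiple-choice answers, one true/false.
exams = [{"essay": "text", "multiple_choice": [1, 3], "true_false": [True]}
         for _ in range(100)]

def grade_essays(exams):
    # Task 1: this worker scores only the essay questions, on every exam.
    return sum(1 for e in exams if e["essay"])

def grade_multiple_choice(exams):
    # Task 2: this worker scores only the multiple-choice questions.
    return sum(len(e["multiple_choice"]) for e in exams)

def grade_true_false(exams):
    # Task 3: this worker scores only the true/false questions.
    return sum(len(e["true_false"]) for e in exams)

# The division of labor is by kind of work, not by subset of the data.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(task, exams)
               for task in (grade_essays, grade_multiple_choice, grade_true_false)]
    essay_count, mc_count, tf_count = [f.result() for f in futures]

print(essay_count, mc_count, tf_count)  # 100 200 100
```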

Videos

Introduction to Parallel Programming Video Lecture Series: Part 09. Implementing a Task Decomposition. http://software.intel.com/en-us/courseware/course/view.php?id=378.

This video describes how to design and implement a task decomposition solution. An illustrative example for solving the 8-Queens problem is used. Multiple approaches are presented with the pros and cons for each described. After the approach is decided upon, code modifications using OpenMP are presented. Potential data race errors with a shared stack data structure holding board configurations (the tasks to be processed) are described, and a solution is found and implemented.

Data decomposition, sometimes called domain decomposition, divides the problem into elements to be processed and then assigns a subset of the elements to each parallel worker. As an example, to grade hundreds of final exams, all test papers can be stacked onto a table and divided into piles of equal size. Each grader would then take a stack of exams and grade the entire set of questions.
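In code, the same grading job decomposed by data rather than by task hands each worker an equal pile of whole exams (a Python sketch; the data and "grading" function are invented for the illustration):

```python
from concurrent.futures import ThreadPoolExecutor

exams = list(range(100))   # stand-ins for 100 exam papers
NUM_GRADERS = 4

def grade_stack(stack):
    # Each worker grades the *entire* set of questions, but only its own pile.
    return [("graded", exam) for exam in stack]

# Deal the exams into equal piles, one per grader.
piles = [exams[i::NUM_GRADERS] for i in range(NUM_GRADERS)]

with ThreadPoolExecutor(max_workers=NUM_GRADERS) as pool:
    graded = [g for pile_result in pool.map(grade_stack, piles)
              for g in pile_result]

print(len(graded))  # 100: every exam graded exactly once
```

Note the contrast with task decomposition: here every worker runs the same code, and the parallelism comes from partitioning the data.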


Tutorials

Blaise Barney. An Introduction to Parallel Computing. Lawrence Livermore National Lab. https://computing.llnl.gov/tutorials/parallel_comp/#DesignPartitioning.

Pipeline parallelism is a special form of task decomposition where the output from one process, or stage, as they are often called, serves directly as the input to the next process. This imposes a much more tightly coordinated structure on the program than is typically found in either plain task or data decompositions. As an example, to grade hundreds of final exams, all test papers can be piled onto a table and a group of graders arranged in a line. The first grader takes a paper from the pile, scores all questions on the first page, and passes the paper to the second grader; the second grader receives a paper from the first grader, scores all the questions on the second page, and passes the paper to the third grader, and so on, until the exam is fully graded.
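The grader line translates naturally into threads connected by queues. The sketch below builds a two-stage pipeline in Python (the page scores and the number of papers are invented for the illustration):

```python
import queue
import threading

papers = [{"id": i, "scores": []} for i in range(10)]
stage2_in = queue.Queue()
done = queue.Queue()
STOP = object()  # sentinel telling the downstream stage to shut down

def stage1():
    # First grader: score page 1, then pass the paper downstream.
    for paper in papers:
        paper["scores"].append(("page1", 10))
        stage2_in.put(paper)
    stage2_in.put(STOP)

def stage2():
    # Second grader: score page 2 as papers arrive from stage 1.
    while True:
        paper = stage2_in.get()
        if paper is STOP:
            break
        paper["scores"].append(("page2", 5))
        done.put(paper)

t1 = threading.Thread(target=stage1)
t2 = threading.Thread(target=stage2)
t1.start(); t2.start()
t1.join(); t2.join()

graded = [done.get() for _ in range(done.qsize())]
print(len(graded))  # 10 fully graded papers
```

Once the pipeline is full, both stages work simultaneously on different papers; the tight coupling shows up in the queue hand-offs between stages.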

3. PARALLEL HARDWARE

The previous section described some of the categories of parallel computation. In order to discuss parallel computing, however, we also need to address the way that computing hardware can also express parallelism.

Memory systems. From a very basic architecture standpoint, there are several general classifications of parallel computing systems:

In a shared memory system, the processing elements all share a global memory address space. Popular shared memory systems include multicore CPUs and manycore GPUs (Graphics Processing Units).

In a distributed memory system, multiple individual computing systems with their own memory spaces are connected to each other through a network.

These system types are not mutually exclusive. Hybrid systems, as modern computational clusters are classified, consist of distributed memory nodes, each of which is a shared memory system.

Processing characteristics. In a parallel application, calculations are performed in the same way they are in the serial case, on a CPU of some kind. However, in parallel computing there are multiple processing entities (tasks, threads, or processes) instead of one. This results in a need for these entities to communicate values with each other as they compute. This communication happens across a network of some kind. Coordination, such as managing access to shared data structures in a threaded environment, is also a form of communication. In either case, communication adds a cost to the runtime of a program, in an amount that varies greatly based on the design of the program. Ideally, parallel programmers want to minimize the amount of communication done (compared to the amount of computation).

Scalability. An important characteristic of parallel programs is their ability to scale, both in terms of the computing resources used by the program and the size of the dataset processed by the program. There are two types of scaling we consider when analyzing parallel programs: strong and weak scaling. Strong scaling examines the behavior of the program when the size of the dataset is held constant while the number of processing units increases. Weak scaling examines what happens when the size of the dataset is increased proportionally as the number of processing units increases. Generally speaking, it is easier to design parallel programs that do well with weak scaling than it is to design programs that do well with strong scaling.
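A short calculation shows why strong scaling is the harder target. Under the common assumption (not made explicit above) that some fixed fraction of the program is inherently serial, adding processing units to a fixed-size problem yields diminishing returns:

```python
def strong_scaling_speedup(serial_fraction, num_units):
    # Fixed problem size: the serial part never shrinks, so it caps the speedup.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / num_units)

# With 10% serial work, 64 workers give nowhere near a 64x speedup.
for p in (1, 4, 16, 64):
    print(p, round(strong_scaling_speedup(0.10, p), 2))
```

Under weak scaling, by contrast, the parallel portion of the work grows with the machine, so the serial fraction of the total shrinks and efficiency is easier to maintain.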

Heterogeneous architectures (e.g., IBM's Cell architecture, AMD's Fusion architecture, and Intel's Sandy Bridge architecture). Heterogeneous systems may consist of many different devices, each with its own capabilities and performance properties, all exposed within a single system. While such systems are not new (embedded system-on-a-chip designs have been around for over two decades), these architectures are becoming more prevalent in mainstream desktop and supercomputing environments. This is due to the emergence of accelerators such as the IBM Cell Broadband Engine and, more recently, the wide adoption of the general-purpose computing on graphics processing units (GPGPU) programming model, where CPUs and GPUs are connected to form a single system. NVIDIA's Compute Unified Device Architecture (CUDA) devices are the most common GPGPUs in use currently.

4. PARALLEL PROGRAMMING MODELS, LIBRARIES, AND INTERFACES

Introduction

This material is grouped by parallel programming model. The first section covers libraries and interfaces designed to be used in a shared memory model; the second covers tools for the distributed memory model; and the third covers tools for the GPGPU model. Another component of this tour covers hybrid models where two or more of these models may be combined into a single parallel application.

Shared memory model programming.

Posix threads are a standard set of threading primitives, a low-level threading method which underlies many of the more modern threading abstractions like OpenMP and TBB. The following are some resources to assist you in understanding Posix threads better.


Tutorials

POSIX thread (pthread) libraries. http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html

Books

David Butenhof. 1997. Programming with POSIX Threads, Addison-Wesley. http://www.amazon.com/Programming-POSIX-Threads-David-Butenhof/dp/0201633922

This book offers an in-depth description of the IEEE operating system interface standard, POSIX (Portable Operating System Interface) threads, commonly called Pthreads. It's written for experienced C programmers, but assumes no previous knowledge of threads, and explains basic concepts such as asynchronous programming, the life cycle of a thread, and synchronization. Abbreviated Publisher's Abstract.

Bradford Nichols, Dick Buttlar, and Jacqueline Proulx Farrell. 1996. Pthreads Programming: A POSIX Standard for Better Multiprocessing, O'Reilly. http://oreilly.com/catalog/9781565921153

POSIX threads, or pthreads, allow multiple tasks to run concurrently within the same program. This book discusses when to use threads and how to make them efficient. It features realistic examples, a look behind the scenes at the implementation and performance issues, and special topics such as DCE and real-time extensions. Abbreviated Publisher's Abstract.

Joe Duffy. 2008. Concurrent Programming on Windows, Addison-Wesley. http://www.amazon.com/Concurrent-Programming-Windows-Joe-Duffy/dp/032143482X

This book offers an in-depth description of the issues with concurrency, introducing general mechanisms and techniques, and covering details of implementations within the .NET framework on Windows. There are numerous examples of good and bad practice, and details on how to implement your own concurrent data structures and algorithms.

Win32 threads, also called native threading by Windows developers, are still the default method used by many to introduce parallelism into code in Windows environments. Native threading can be difficult to implement and maintain. Microsoft has a rich body of material available on the Microsoft Developer Network. Material that provides more information about threads follows.

Online Resources

Microsoft Developer Network. An online introduction to Windows threading concepts. http://msdn.microsoft.com/en-us/library/ms684841(VS.85).aspx.


Books

Johnson M. Hart. 2010. Windows System Programming (4th ed.), Addison-Wesley. http://www.amazon.com/Windows-Programming-Addison-Wesley-Microsoft-Technology/dp/0321657748

This book contains extensive new coverage of 64-bit programming, parallelism, multicore systems, and other crucial topics. Johnson Hart's robust code examples have been debugged and tested in both 32-bit and 64-bit versions, on single and multiprocessor systems, and under Windows 7, Vista, Server 2008, and Windows XP. Hart covers Windows externals at the API level, presenting practical coverage of all the services Windows programmers need, and emphasizing how Windows functions actually behave and interact in real-world applications. Abbreviated Publisher's Abstract.

Java: Since version 5.0, concurrency support has been a fundamental component of the Java specification. Java follows a similar approach to POSIX and other threading APIs, introducing thread creation and synchronization primitives into the language as a high-level API, through the package java.util.concurrent. There are a number of approaches to introducing parallelism into Java code, but conventionally they follow a standard pattern. Each thread is created as an instance of the class Thread, defined by a class that implements the Runnable interface (think of this as the POSIX entry point for a function), which must implement the method public void run(). Just like POSIX, the thread is terminated when the method returns.

There are a large number of resources to help with understanding Java threads better, and the following is just a small selection.

Tutorials

Oracle has a large number of Java online tutorials, including one that introduces Java threads. http://download.oracle.com/javase/tutorial/essential/concurrency/index.html.

For the developer new to Java and/or concurrency, the Java for Beginners portal provides an excellent set of tutorials, including one specifically on the threading model. http://www.javabeginner.com/learn-java/java-threads-tutorial.

Books

Scott Oaks and Henry Wong. 2004. Java Threads, O'Reilly. http://oreilly.com/catalog/9780596007829

This is a well-developed book that, while not the most up-to-date resource on the subject, provides an excellent reference guide.

OpenMP: OpenMP is a directive-based, shared memory parallel programming model. It is most useful for parallelizing independent loop iterations in both C and Fortran. New facilities in OpenMP 3.0 allow for independent tasks to execute in parallel. To be used, OpenMP must be supported by your compiler. The standard is limited in scope on the types of parallelism that you can implement, but it is easy to use and a good starting point for learning parallel programming. Some resources to help with understanding OpenMP better are listed below.

Tutorials

Blaise Barney. OpenMP Tutorial, Lawrence Livermore National Lab. https://computing.llnl.gov/tutorials/openMP/. This excellent tutorial is geared to those who are new to parallel programming with OpenMP. Basic understanding of parallel programming in C/C++ or FORTRAN is assumed.

OpenMP exercises. Tim Mattson and Larry Meadows. Intel Corporation. This tutorial provides an excellent introduction to OpenMP, including code and examples. http://openmp.org/mp-documents/OMP_Exercises.zip and http://openmp.org/mp-documents/omp-hands-on-SC08.pdf.

Getting Started with OpenMP. Text-based tutorial; read and learn with examples. http://software.intel.com/en-us/articles/getting-started-with-openmp/.

An Introduction to OpenMP 3.0. This deck contains more advanced techniques (e.g., inclusion of wait statements) that would need more explanation to be used safely. https://iwomp.zih.tu-dresden.de/downloads/2.Overview_OpenMP.pdf.

Videos

An Introduction to Parallel Programming: Video Lecture Series. http://software.intel.com/en-us/courseware/course/view.php?id=224. This multi-part introduction contains many units on OpenMP, and includes coding exercises and code samples.

Community sites

www.openmp.org. Contains the current and past OpenMP language specifications, lists of compilers that support OpenMP, references, and other resources.

Cheat sheets for FORTRAN and C/C++:

C/C++: http://www.openmp.org/mp-documents/OpenMP3.0-SummarySpec.pdf.
FORTRAN: http://www.openmp.org/mp-documents/OpenMP3.0-FortranCard.pdf.


Books

Barbara Chapman, Gabriele Jost, and Ruud van der Pas. 2007. Using OpenMP: Portable Shared Memory Parallel Programming, MIT Press. ACM Members, read it here: http://learning.acm.org/books/book_detail.cfm?isbn=9780262533027&type=24.

Michael J. Quinn. 2004. Parallel Programming in C with MPI and OpenMP, McGraw-Hill.

This book addresses the needs of students and professionals who want to learn how to design, analyze, implement, and benchmark parallel programs in C using MPI and/or OpenMP. It introduces a design methodology with coverage of the most important MPI functions and OpenMP directives. It also demonstrates, through a wide range of examples, how to develop parallel programs that will execute efficiently on today's parallel platforms. Abbreviated Publisher's Abstract.

Clay Breshears. 2009. The Art of Concurrency: A Thread Monkey's Guide to Writing Parallel Applications, O'Reilly. http://oreilly.com/catalog/9780596521547

    ThisbookcontainsnumerousexamplesofappliedOpenMPcode.

    WrittenbyanIntelengineerwithovertwodecadesofparallelandconcurrent

    programmingexperience,TheArtofConcurrencyisoneofthefewresourcestofocuson

    implementingalgorithmsinthesharedmemorymodelofmulticoreprocessors,rather

    thanjusttheoreticalmodelsordistributedmemoryarchitectures.Thebookprovides

    detailedexplanationsandusablesamplestohelpyoutransformalgorithmsfromserial

    toparallelcode,alongwithadviceandanalysisforavoidingmistakesthatprogrammers

    typicallymakewhenfirstattemptingthesecomputations.

Rohit Chandra, Ramesh Menon, Leo Dagum, David Kohr, Dror Maydan, and Jeff McDonald. 2001. Parallel Programming in OpenMP, Morgan Kaufmann.

Aimed at the working researcher or scientific C/C++ or Fortran programmer, Parallel Programming in OpenMP both explains what the OpenMP standard is and how to use it to create software that takes full advantage of parallel computing. By adding a handful of compiler directives (or pragmas) in Fortran or C/C++, plus a few optional library calls, programmers can parallelize existing software without completely rewriting it. This book starts with simple examples of how to parallelize loops: iterative code that in scientific software might work with very large arrays. Sample code relies primarily on Fortran (the language of choice for high-end numerical software) with descriptions of the equivalent calls and strategies in C/C++. Abbreviated Publisher's Abstract.

Threading Building Blocks (TBB). Intel Threading Building Blocks (Intel TBB) is a threading library used to introduce parallelism into C/C++. TBB is a relatively easy way to introduce loop-level parallelism, especially for programmers familiar with templated code. TBB is available both as an open source project and as a commercial product from the Intel Corporation.

Tutorials

Intel Threading Building Blocks Tutorial. http://www.threadingbuildingblocks.org/uploads/81/91/Latest%20Open%20Source%20Documentation/Tutorial.pdf. Written by Intel Corporation, this is a thorough introduction to the threading library. This tutorial teaches you how to use Intel Threading Building Blocks (Intel TBB), a library that helps you leverage multicore performance without having to be a threading expert.

Multicoreinfo.com brings together a number of TBB tutorials. http://www.multicoreinfo.com/2009/07/parprog-part-6/.

Code examples

This TBB.org website contains a thorough set of coding examples. http://www.threadingbuildingblocks.org/codesamples.php.

Code examples from a recent coding-with-TBB contest. http://software.intel.com/en-us/articles/coding-with-intel-tbb-sweepstakes/.

Community sites

Contains product announcements (releases and updates), links to code samples, blogs, and forums on TBB. http://www.threadingbuildingblocks.org.

Intel site for the commercial version of Intel Threading Building Blocks. http://www.threadingbuildingblocks.com.

Books

James Reinders. 2007. Intel Threading Building Blocks, O'Reilly. http://oreilly.com/catalog/9780596514808

This guide explains how to maximize the benefits of multicore processors through a portable C++ library that works on Windows, Linux, Macintosh, and Unix systems. With it, you'll learn how to use Intel Threading Building Blocks (TBB) effectively for parallel programming, without having to be a threading expert. Written by James Reinders, Chief Evangelist of Intel Software Products, and based on the experience of Intel's developers and customers, this book explains the key tasks in multithreading and how to accomplish them with TBB in a portable and robust manner. Abbreviated Publisher's Abstract.


Clay Breshears. 2009. The Art of Concurrency: A Thread Monkey's Guide to Writing Parallel Applications, O'Reilly. http://oreilly.com/catalog/9780596521547

This book contains numerous examples of applied TBB code.

Distributed memory model programming. The preceding libraries and interfaces assume that the results from one thread of the overall computation can be made directly available to any other thread. However, some parallel hardware (such as clusters) forbids direct access from one memory space to another. Instead, processes must cooperate by sending each other messages containing the data to be exchanged.

Message passing interface (MPI). MPI is a library specification that supports message passing between program images running on distributed memory machines, typically clusters of some type. A number of different organizations develop and support implementations of the MPI standard, which specifies interfaces for C/C++ and Fortran. However, bindings for Perl, Python, and many other languages also exist.

MPI provides routines that manage the transmission of data from the memory space of one process to the memory space of another process. Distributed memory machines require the use of MPI or another message-passing library by the parallel program in order to use multiple processes running on more than one node.

Getting started: Start with the six basic commands:

MPI_Init(): Initialize the MPI world
MPI_Finalize(): Terminate the MPI world
MPI_Comm_rank(): Which process am I?
MPI_Comm_size(): How many processes exist?
MPI_Send(): Send data
MPI_Recv(): Receive data

Move on to more complex communication models as needed, that is, to collective communication (one-to-many, many-to-one, many-to-many) and/or advanced communication techniques: synchronous vs. asynchronous communication, blocking vs. non-blocking communication.
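As a minimal sketch of the six basic commands in action (assuming an MPI implementation such as MPICH or Open MPI is installed; error checking is omitted for brevity), rank 1 sends a single integer to rank 0:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;

    MPI_Init(&argc, &argv);                  /* enter the MPI world */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* which process am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* how many processes exist? */

    if (rank == 1) {
        int payload = 42;
        /* send one int to rank 0 with message tag 0 */
        MPI_Send(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0 && size > 1) {
        int payload;
        /* receive one int from rank 1 with message tag 0 */
        MPI_Recv(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 0 received %d from rank 1\n", payload);
    }

    MPI_Finalize();                          /* leave the MPI world */
    return 0;
}
```

The MPI runtime starts one copy of the program per process; compile with mpicc and launch with, e.g., mpirun -np 2 ./a.out.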

Online Readings

Moodles with slides and code examples. NCSI parallel and distributed workshop. http://moodle.sc-education.org/course/category.php?id=17.


Tutorials

William Gropp, Rusty Lusk, Rob Ross, and Rajeev Thakur. 2005. Advanced MPI: I/O and One-Sided Communication. http://www.mcs.anl.gov/research/projects/mpi/tutorial/.

Supercomputing in Plain English (SIPE). http://www.oscer.ou.edu/Workshops/DistributedParallelism/sipe_distribmem_20090324.pdf.

Cheat sheets

http://wiki.sc-education.org/index.php/MPI_Cheat_Sheet.

Books

Peter Pacheco. 1997. Parallel Programming with MPI, Morgan Kaufmann. http://www.amazon.com/Parallel-Programming-MPI-Peter-Pacheco/dp/1558603395

A hands-on introduction to parallel programming based on the Message-Passing Interface (MPI) standard, the de facto industry standard adopted by major vendors of commercial parallel systems. This textbook/tutorial, based on the C language, contains many fully developed examples and exercises. The complete source code for the examples is available in both C and Fortran 77. Students and professionals will find that the portability of MPI, combined with a thorough grounding in parallel programming principles, will allow them to program any parallel system, from a network of workstations to a parallel supercomputer. Abbreviated Publisher's Abstract.

General purpose GPU programming. In contrast to the threading models presented earlier, accelerator-based hardware parallelism (like GPUs) focuses on the fact that although results may be shareable, that is, can be sent from one part of a computation to another, the cost of accessing the memory may not be uniform; CPUs see CPU memory better than GPU memory, and vice versa.

The Open Compute Language (OpenCL) is an open standard for heterogeneous computing, developed by the Khronos OpenCL working group. Implementations are currently available from a broad selection of hardware and software vendors, including AMD, Apple, NVIDIA, and IBM.

OpenCL is intended as a low-level programming model, designed around the notion of a host application, commonly a CPU, driving a set of associated compute devices, where parallel computations can be performed.

A key design feature of OpenCL is its use of asynchronous command queues, associated with individual devices, that provide the ability to enqueue work (e.g., data transfers and parallel code execution) and build up complex graphs describing the dependencies between tasks.
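The host-driving-a-device pattern and the command queue can be sketched in a few calls (a hypothetical minimal example, assuming an OpenCL 1.1 SDK is installed; the kernel name "scale" is invented and all error checking is omitted for brevity):

```c
/* Minimal OpenCL host sketch: create a context and a command queue
   for one device, then enqueue a write, a kernel, and a blocking read. */
#include <stdio.h>
#include <CL/cl.h>

static const char *src =
    "__kernel void scale(__global float *a) {"
    "    size_t i = get_global_id(0);"
    "    a[i] = 2.0f * a[i];"
    "}";

int main(void) {
    float data[4] = {1, 2, 3, 4};
    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    /* the kernel source is compiled at runtime for the chosen device */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "scale", NULL);

    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, sizeof data, NULL, NULL);
    clSetKernelArg(k, 0, sizeof buf, &buf);

    size_t global = 4;
    /* commands are enqueued asynchronously; the final blocking read
       waits for the earlier commands in this in-order queue */
    clEnqueueWriteBuffer(q, buf, CL_FALSE, 0, sizeof data, data, 0, NULL, NULL);
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof data, data, 0, NULL, NULL);

    printf("%g %g %g %g\n", data[0], data[1], data[2], data[3]);
    return 0;
}
```

Note how the host never runs the kernel directly: every device operation goes through the queue, which is what lets the runtime overlap transfers and execution and track dependencies between tasks.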

The execution model supports both data-parallel and task-parallel styles, but OpenCL was developed specifically with an eye toward today's many-core, throughput, GPU-style architectures, and hence exposes a complex memory structure that provides a number of levels of software-managed memories. This is in contrast to the traditional single-address-space model of languages like C and C++, backed by large caches on general-purpose CPUs.

The OpenCL standard is currently in its second iteration, at version 1.1, and includes both a C API, for programming the host, and a new C++ Wrapper API, added to 1.1 and intended to be used for OpenCL C++ development. By exposing multiple address spaces, OpenCL provides a very powerful programming model to access the full potential of many-core architectures, but this comes at the cost of abstraction!

This is particularly true in the case of performance portability, and it is often difficult to achieve good performance on two different architectures with the same source code. This can be even more evident between the different types of OpenCL devices (e.g., GPUs and CPUs). This should not come as a surprise, as the OpenCL specification itself states that it is a low-level programming language, and given that these devices can have very different compute capabilities, careful tuning is often required to get close to peak performance.

Online Resources

OpenCL 1.1 Specification (revision 33, June 11, 2010). http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf

OpenCL 1.1 C++ Wrapper API Specification (revision 4, June 14, 2010). http://www.khronos.org/registry/cl/specs/opencl-cplusplus-1.1.pdf

OpenCL 1.1 Online Manual Pages. http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/

OpenCL Quick Reference Card. http://www.khronos.org/opencl/.

Tutorial

An excellent beginner's "hello world" tutorial introduction using OpenCL 1.1's C++ API. http://developer.amd.com/GPU/ATISTREAMSDK/pages/TutorialOpenCL.aspx

Videos

ATI Stream OpenCL Technical Overview Video Series. http://developer.amd.com/DOCUMENTATION/VIDEOS/OPENCLTECHNICALOVERVIEWVIDEOSERIES/Pages/default.aspx

This five-part video tutorial series provides an excellent introduction to the basics of OpenCL, including its execution and memory models, and the OpenCL C device programming language.

Community sites

The Khronos Group's main page, http://www.khronos.org, keeps track of major events around OpenCL and its other languages such as OpenGL and WebGL, along with some useful discussion forums on these. http://www.khronos.org/opencl.

The websites http://www.beyond3d.com and Mark Harris's http://www.gpgpu.org are full of information about many-core programming, in particular the modern GPUs of AMD and NVIDIA, and provide vibrant discussions on OpenCL and CUDA (the details of this language will follow), among other interesting areas and topics.

There is an ever-growing set of examples that can be found all over the web, and each of the major vendors provides excellent examples with their corresponding SDKs.

CUDA. Compute Unified Device Architecture (CUDA) is NVIDIA's parallel computing architecture that enables acceleration in computing performance by harnessing the power of the GPU (graphics processing unit). CUDA is in essence a data-parallel model, sharing a lot in common with the other popular GPGPU language, OpenCL: kernels (similar to functions) are executed over a 3D iteration space; each index is executed concurrently, possibly in parallel.

Online Resources

NVIDIA maintains a collection of featured tutorials, presentations, and exercises on the CUDA Developer Zone. http://developer.nvidia.com/object/cuda_training.html.

Online Readings

For details on NVIDIA CUDA hardware and the underlying programming models, the following articles are relevant:

Erik Lindholm, John Nickolls, Stuart Oberman, and John Montrym. 2008. NVIDIA Tesla: A unified graphics and computing architecture, IEEE Micro 28, 2 (March), 39-55.

NVIDIA GF100. 2010. http://www.nvidia.com/object/IO_89569.htm. (Whitepaper.)

Code examples

The CUDA SDK includes numerous code examples, along with CUDA versions of popular libraries (cuBLAS and cuFFT). http://developer.nvidia.com/object/cuda_3_1_downloads.html

Books

David B. Kirk and Wen-mei W. Hwu. 2010. Programming Massively Parallel Processors: A Hands-on Approach, Morgan Kaufmann. http://www.amazon.com/dp/0123814723

This is a recent and popular textbook for teaching CUDA. This book shows both student and professional alike the basic concepts of parallel programming and GPU architecture. Various techniques for constructing parallel programs are explored in detail. Case studies demonstrate the development process, which begins with computational thinking and ends with effective and efficient parallel programs. Abbreviated Publisher's Abstract.

Hybrid parallel software architectures. Programs which use a hybrid parallel architecture combine two or more libraries/models/languages (see Section 3) into a single program; the motivation for this extra complexity is to allow a single parallel program image to harness additional computational resources.

The most common forms of hybrid models combine MPI with OpenMP or MPI with CUDA. MPI/OpenMP is appropriate for use on cluster resources where the nodes are multicore machines; MPI is used to move data and results among the distributed memories, and OpenMP is used to leverage the compute power of the cores on the individual nodes. MPI/CUDA is appropriate for use on cluster resources where the nodes are equipped with NVIDIA's GPGPU cards. Again, MPI is used to move data and results among the distributed memories, and CUDA is used to leverage the resources of each GPGPU card.
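The MPI/OpenMP combination can be sketched in a few lines (a hypothetical example, assuming an MPI library and an OpenMP-capable compiler): MPI splits the work across the distributed memories, and an OpenMP loop shares each rank's slice among the cores of its node:

```c
/* Hybrid sketch: MPI across nodes, OpenMP within a node.
   Each rank sums its slice of [0, N) with an OpenMP parallel loop,
   then MPI reduces the partial sums onto rank 0. */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[]) {
    const long N = 1000000;
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* contiguous slice of the iteration space for this rank */
    long lo = rank * (N / size);
    long hi = (rank == size - 1) ? N : lo + N / size;

    double local = 0.0, total = 0.0;
    #pragma omp parallel for reduction(+:local)   /* cores within the node */
    for (long i = lo; i < hi; i++)
        local += (double)i;

    /* collective: combine partial sums across the distributed memories */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %.0f\n", total);

    MPI_Finalize();
    return 0;
}
```

Built with, e.g., mpicc -fopenmp and launched with one MPI rank per node; the OMP_NUM_THREADS environment variable controls how many threads each rank uses.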

Online Resources

MPI/OpenMP: The Louisiana Optical Network Initiative (LONI) has a nice tutorial on building hybrid MPI/OpenMP applications. It can be found at https://docs.loni.org/wiki/Introduction_to_Programming_Hybrid_Applications_Using_OpenMP_and_MPI. This includes pointers to LONI's OpenMP as well as MPI tutorials.

MPI/CUDA: The National Center for Supercomputing Applications (NCSA) has a tutorial that includes information about this; see the section "Combining MPI and CUDA" in http://www.ncsa.illinois.edu/UserInfo/Training/Workshops/CUDA/presentations/tutorial-CUDA.html.


Parallel Languages

We touch here only briefly on the topic of inherently parallel languages. There are a variety of efforts, ranging from extensions to existing languages to radically new approaches. Many, such as Cilk or UPC, can be relatively easily understood in terms of the libraries and techniques described above. A fuller discussion of the variety of language efforts and tools will be in a further Tech Pack. Because it is sufficiently different, however, a quick look at how parallelism is incorporated into functional languages is helpful.

Concurrent ML and Concurrent Haskell: Functional programming languages, such as Standard ML and Haskell, provide a strong foundation for building concurrent and parallel programming abstractions, for the single reason that they are declarative. Being declarative in a parallel world, i.e., avoiding the issues (e.g., race conditions) that updating global state in a shared memory model can cause, provides a strong foundation on which to build concurrency abstractions.

Concurrent ML is a high-level message-passing language that supports the construction of first-class synchronous abstractions called events, embedded into Standard ML. It provides a rich set of concurrency mechanisms built on the notion of spawning new threads that communicate via channels.

Concurrent Haskell is an extension to the functional language Haskell for describing the creation of threads that have the potential to execute in parallel with other computations. Unlike Concurrent ML, Concurrent Haskell provides a limited form of shared memory, introducing MVars (mutable variables), which can be used to atomically communicate information between threads. Unlike more relaxed shared-memory models (e.g., OpenMP and OpenCL, discussed earlier), Concurrent Haskell's runtime system ensures that the operations for reading from and writing to MVars occur atomically.

Tutorials

Simon Peyton Jones and Satnam Singh. A Tutorial on Parallel and Concurrent Programming in Haskell. Lecture Notes from Advanced Functional Programming Summer School 2008. http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/AFP08-notes.pdf

Books

John H. Reppy. 1999. Concurrent Programming in ML, Cambridge University Press.


Simon Peyton Jones. 2007. Beautiful Concurrency. In Beautiful Code, edited by Greg Wilson, O'Reilly. http://research.microsoft.com/en-us/um/people/simonpj/papers/stm/index.htm#beautiful

5. TOOLS

There is a variety of tools available to assist programmers in creating, debugging, and running parallel codes. This section summarizes the categories of tools; a more exhaustive list of tools that run on different hardware and software platforms will be included in a subsequent addition to the Tech Pack.

Compilers

Many of the compiler families today, both commercial and open source, directly support some form of explicit parallelism (OpenMP, threads, etc.).

Autoparallelization

The holy grail of many-core support would be a compiler that could automatically extract parallelism at compile time. Unfortunately, this is still a work in progress. That said, a number of compilers can add utility through vectorization and the identification of obvious parallelism in simple loops.

Online Resources

http://en.wikipedia.org/wiki/Automatic_parallelization.
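To make "obvious parallelism in simple loops" concrete, here is the kind of loop such compilers typically handle (a generic illustration, not tied to any particular compiler): independent iterations, no cross-iteration dependences, and unit-stride array accesses:

```c
/* The classic "saxpy" kernel: y[i] = a*x[i] + y[i].
   Each iteration is independent, so a compiler is free to run the
   iterations in SIMD lanes, or on several cores, without changing
   the result. */
void saxpy(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; i++)     /* no dependence between iterations */
        y[i] = a * x[i] + y[i];
}
```

At higher optimization levels, compilers usually report such loops as vectorized (e.g., GCC with -O3 -fopt-info-vec), and some can also thread them (e.g., GCC's -ftree-parallelize-loops).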

Thread debuggers

Intel Thread Checker: http://software.intel.com/en-us/intel-thread-checker/.

Intel Parallel Inspector: http://software.intel.com/en-us/intel-parallel-inspector/.

Microsoft Visual Studio 2010 tools: http://www.microsoft.com/visualstudio/en-us/.

Helgrind: http://valgrind.org/docs/manual/hg-manual.html. Helgrind is a Valgrind tool for detecting synchronization errors in C, C++, and Fortran programs that use the POSIX pthreads threading primitives. The main abstractions in POSIX pthreads are a set of threads sharing a common address space, thread creation, thread joining, thread exit, mutexes (locks), condition variables (inter-thread event notifications), reader-writer locks, spin locks, semaphores, and barriers.

Tuners/performance profilers

Intel VTune Performance Analyzer & Intel Thread Profiler 3.1 for Windows. The Thread Profiler component of VTune helps tune multithreaded applications for performance. The Intel Thread Profiler timeline view shows what the threads are doing and how they interact. http://software.intel.com/en-us/intel-vtune/.

Intel Parallel Amplifier. A tool to help find multicore performance bottlenecks without needing to know the processor architecture or assembly code. http://software.intel.com/en-us/intel-parallel-amplifier/.

Microsoft Visual Studio 2010 tools. http://www.microsoft.com/visualstudio/en-us/.

gprof: the GNU Profiler. http://www.cs.utah.edu/dept/old/texinfo/as/gprof_toc.html.

Memory tools

Hoard: http://www.hoard.org/. The Hoard memory allocator is a fast, scalable, and memory-efficient memory allocator. It runs on a variety of platforms, including Linux, Solaris, and Windows. Hoard is a drop-in replacement for malloc() that can dramatically improve application performance, especially for multithreaded programs running on multiprocessors. No change to your source is necessary. Just link it in or set just one environment variable.

    http://software.intel.com/en-us/intel-vtune/http://software.intel.com/en-us/intel-vtune/http://software.intel.com/en-us/intel-vtune/http://software.intel.com/en-us/intel-vtune/http://software.intel.com/en-us/intel-vtune/http://software.intel.com/en-us/intel-parallel-amplifier/http://software.intel.com/en-us/intel-parallel-amplifier/http://software.intel.com/en-us/intel-parallel-amplifier/http://software.intel.com/en-us/intel-parallel-amplifier/http://software.intel.com/en-us/intel-parallel-amplifier/http://software.intel.com/en-us/intel-parallel-amplifier/http://software.intel.com/en-us/intel-parallel-amplifier/http://www.microsoft.com/visualstudio/en-us/http://www.microsoft.com/visualstudio/en-us/http://www.microsoft.com/visualstudio/en-us/http://www.cs.utah.edu/dept/old/texinfo/as/gprof_toc.htmlhttp://www.hoard.org/http://www.hoard.org/http://www.cs.utah.edu/dept/old/texinfo/as/gprof_toc.htmlhttp://www.microsoft.com/visualstudio/en-us/http://software.intel.com/en-us/intel-parallel-amplifier/http://software.intel.com/en-us/intel-vtune/