
High Level Abstractions for Irregular Computations

David Padua
Department of Computer Science
University of Illinois at Urbana-Champaign

1. INTRODUCTION: THE PROBLEM

December 4, 2017

Programming for performance

• Is laborious
  – Particularly so when the target machine is parallel.
• Must decide what to execute in parallel
• In what order to traverse the data
• How to partition the data
• …
• Today, tuning must be done manually.
• Maintenance is difficult
  – and the result is not portable.

Programming for performance

• There is still much room for progress in high-performance programming methodology.
• Our goal is to facilitate (automate) tuning and enable machine-independent programming.

2. IRREGULAR COMPUTATIONS

What are irregular computations?

• Irregular computations are difficult to define.
• One possible definition is:
  – Computations on “irregular” data structures.
  – Irregular data structures = not dense arrays.
    • Pingali et al. The tao of parallelism in algorithms. PLDI ’11.
  – And subscript expressions non-linear.

What are irregular computations?

• Examples include
  – Subscripted subscripts:
    • A[C[i]]
  – Linked list traversal:
    • leftCell = leftCell->ColNext;
    • result += leftCell->Value;
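The linked-list traversal in the last bullet is the inner loop of a sparse matrix-vector product over rows stored as chains of cells. A minimal Python sketch of that access pattern (the field names Value, ColIndex, and ColNext come from the snippet; the Cell class and the example data are invented for illustration):

```python
# Sketch of the linked-cell traversal shown above; Cell and its fields
# mirror the slide's C snippet, the example matrix row is made up.

class Cell:
    def __init__(self, col_index, value, col_next=None):
        self.ColIndex = col_index
        self.Value = value
        self.ColNext = col_next  # next nonzero in the same row

def row_dot(left_cell, right):
    """Dot product of one sparse row (a linked list of cells) with a dense vector."""
    result = 0.0
    while left_cell is not None:
        result += left_cell.Value * right[left_cell.ColIndex]
        left_cell = left_cell.ColNext  # irregular: the next address is itself data
    return result

# The row (0, 2.0, 0, 3.0) stored as a linked list of its nonzeros.
row = Cell(1, 2.0, Cell(3, 3.0))
assert row_dot(row, [1.0, 10.0, 1.0, 100.0]) == 320.0  # 2*10 + 3*100
```

The irregularity is visible in the while loop: the address of the next operand is data, so the compiler cannot predict the access pattern.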

Programming irregular computations for performance

• Today, programming of irregular computations is typically at a low level of abstraction.

G. Hager and G. Wellein. Introduction to High Performance Computing for Scientists and Engineers. CRC Press.

do diag=1, Nj, 2   ! two-way unroll and jam
  diagLen = min( (jd_ptr(diag+1)-jd_ptr(diag)), &
                 (jd_ptr(diag+2)-jd_ptr(diag+1)) )
  offset1 = jd_ptr(diag) - 1
  offset2 = jd_ptr(diag+1) - 1
  do i=1, diagLen
    C(i) = C(i) + val(offset1+i)*B(col_idx(offset1+i))
    C(i) = C(i) + val(offset2+i)*B(col_idx(offset2+i))
  end do
  ! peeled-off iterations
  offset1 = jd_ptr(diag) - 1
  do i=(diagLen+1), (jd_ptr(diag+1)-jd_ptr(diag))
    C(i) = C(i) + val(offset1+i)*B(col_idx(offset1+i))
  end do
end do

Programming irregular computations for performance

• Programming and optimizing irregular computations is more challenging than for linear-subscript computations.
• Want to have compiler support for programming at a low level of abstraction to enable
  – Automatically mapping computation onto diverse machine classes
  – Locality enhancement
  – Other compiler optimizations (e.g., common subexpression elimination)
• In addition, often using higher levels of abstraction and applying optimizations at that level is a better solution.
• Both approaches will be discussed in this presentation.

When computations are not irregular

• Compiler can do a good job at transforming programs:
  – Loop interchange
  – Loop distribution
  – Tiling
  – …
• And we have the technology to generate efficient code for a variety of platforms even when the initial code is sequential
  – Vector extensions
  – Multicores
  – Distributed-memory machines
  – GPUs

Example: Automatic transformation of a regular computation

• Consider the loop

for (i=1; i<n; i++) {
  for (j=1; j<n; j++) {
    a[i][j] = a[i][j-1] + a[i-1][j];
  }
}

Example: Automatic transformation of a regular computation

• Compute dependences (part 1)

for (i=1; i<n; i++) {
  for (j=1; j<n; j++) {
    a[i][j] = a[i][j-1] + a[i-1][j];
  }
}

[Figure: the first iterations unrolled (i=1, i=2; j=1..4), with arrows marking the dependences through a[i][j-1]:
  a[1][1]=a[1][0]+a[0][1]    a[2][1]=a[2][0]+a[1][1]
  a[1][2]=a[1][1]+a[0][2]    a[2][2]=a[2][1]+a[1][2]
  a[1][3]=a[1][2]+a[0][3]    a[2][3]=a[2][2]+a[1][3]
  a[1][4]=a[1][3]+a[0][4]    a[2][4]=a[2][3]+a[1][4]]

Example: Automatic transformation of a regular computation

• Compute dependences (part 2)

for (i=1; i<n; i++) {
  for (j=1; j<n; j++) {
    a[i][j] = a[i][j-1] + a[i-1][j];
  }
}

[Figure: the same unrolled iterations, now with arrows marking the dependences through a[i-1][j], i.e., from each i=1 iteration to the i=2 iteration with the same j.]

Example: Automatic transformation of a regular computation

for (i=1; i<n; i++) {
  for (j=1; j<n; j++) {
    a[i][j] = a[i][j-1] + a[i-1][j];
  }
}

[Figure: the iteration space drawn as a grid of points (i, j), i, j = 1, 2, 3, 4, …, with dependence arrows from (i, j-1) and (i-1, j) into each point, starting at (1,1).]

Example: Automatic transformation of a regular computation

• Find parallelism

for (i=1; i<n; i++) {
  for (j=1; j<n; j++) {
    a[i][j] = a[i][j-1] + a[i-1][j];
  }
}

[Figure: the iteration-space grid with the anti-diagonals i + j = constant highlighted; the points on each anti-diagonal are mutually independent and can execute in parallel.]

Example: Automatic transformation of a regular computation

• Transform the code

for (i=1; i<n; i++) {
  for (j=1; j<n; j++) {
    a[i][j] = a[i][j-1] + a[i-1][j];
  }
}

becomes

for (k=4; k<2*n; k++)
  forall (i=max(2,k-n):min(n,k-2))
    a[i][k-i] = ...
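The equivalence of the two forms can be checked directly: skewing by k = i + j turns the anti-diagonal wavefronts into the inner loop. A Python sketch (loop bounds are the 0-based equivalents of the slide's, which are written 1-based):

```python
def original(a):
    n = len(a)
    for i in range(1, n):
        for j in range(1, n):
            a[i][j] = a[i][j-1] + a[i-1][j]
    return a

def wavefront(a):
    # Skewed form: all iterations with i + j == k are mutually independent,
    # so the inner loop could run as a parallel forall.
    n = len(a)
    for k in range(2, 2*n - 1):
        for i in range(max(1, k - (n - 1)), min(n - 1, k - 1) + 1):
            a[i][k-i] = a[i][k-i-1] + a[i-1][k-i]
    return a

grid = [[(3*i + j) % 7 for j in range(6)] for i in range(6)]
copy = [row[:] for row in grid]
assert original(grid) == wavefront(copy)   # both forms compute the same values
```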

Analyzing irregular computations

• Analysis of the following code is difficult and sometimes impossible:

for (i=1; i<n; i++) {
  a[c[i]] = a[d[i]] + 1;
}

• Dependences are a function of the values of c and d.
• In the absence of additional information, the compiler must assume the worst.
• And this precludes automatic transformations (parallelization, vectorization, data distribution, …).
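To make the dependence problem concrete, here is an illustrative runtime test (not from the slides) that is sufficient, though not necessary, for the loop above to be fully parallel: every written location is distinct and no written location is also read.

```python
def parallel_safe(c, d):
    """Conservative test for  a[c[i]] = a[d[i]] + 1 : no output dependence
    (all writes distinct) and no flow/anti dependence (no written location
    is also read). Sufficient, not necessary."""
    writes = set(c)
    return len(writes) == len(c) and writes.isdisjoint(d)

assert parallel_safe([0, 1, 2], [3, 4, 5])       # distinct, disjoint: parallel
assert not parallel_safe([0, 1, 2], [1, 3, 4])   # a[1] written and read
assert not parallel_safe([0, 0, 2], [3, 4, 5])   # a[0] written twice
```

The test itself depends on the contents of c and d at run time, which is exactly why a purely static compiler must assume the worst.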

Irregularity is a beast programming language researchers have been fighting for a long time.

It has defeated us sometimes

Ignoring the importance of irregular computations was one of the reasons for the failure of High Performance Fortran.

Taming the irregular beast

• Compilers and runtime (Section 3)
• Better notations (Section 4)

3. COMPILER AND RUNTIME SOLUTIONS FOR LOW-LEVEL IRREGULAR PROGRAMMING

Two approaches to dealing with irregular computations

• Analyzing irregular codes directly
  – Statically
  – Dynamically
• Converting into a higher level of abstraction (dense array computations)

Static analysis

• In some cases, it is possible to analyze irregular algorithms written in conventional notation.
• More effort than for regular/linear-subscript computations.

Static analysis

Dependence analysis

do k = 1, n
  q = 0
  do i = 1, p
    if ( x(i) > 0 ) then
      q = q + 1
      ind(q) = i
    end if
  end do
  do j = 1, q
    x(ind(j)) = x(ind(j)) * y(ind(j))
  end do
end do

Need information about ind(j).
It can be found by analyzing the program.
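The fact the analysis must discover is that ind(1:q) is filled with strictly increasing, hence distinct, values, so the iterations of the j loop write disjoint elements of x and are independent. A Python rendering of one iteration of the outer k loop, with that fact asserted where the compiler would prove it:

```python
def kernel(x, y):
    # The i loop: gather the indices of the positive elements.
    ind = [i for i in range(len(x)) if x[i] > 0]
    # What static analysis proves: ind is strictly increasing, so distinct.
    assert all(ind[j] < ind[j+1] for j in range(len(ind) - 1))
    # The j loop: updates touch disjoint locations, hence parallelizable.
    for j in ind:
        x[j] = x[j] * y[j]
    return x

assert kernel([1.0, -2.0, 3.0], [10.0, 10.0, 10.0]) == [10.0, -2.0, 30.0]
```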

Static analysis

Array privatization

do i=1, n
  do j=2, iblen(i)
    do k=1, j-1
      x(pptr(i)+k-1) = ...
    end do
  end do
  do j=1, iblen(i)-1
    do k=1, j
      ... = x(iblen(i)+pptr(i)+k-j-1)
    end do
  end do
end do

To decide if x can be privatized, it is sufficient to show that a write precedes each read.

Static analysis

• A possible approach is to query the control flow graph, bounding the search.

Yuan Lin and David Padua. Compiler analysis of irregular memory accesses. PLDI ’00.

Dynamic analysis

• Embedded dependence analysis
• Inspector-executor
• Speculation

Embedded dependence analysis

• This is a clever mechanism due to Zhu and Yew.

Zhu, C. Q., & Yew, P. C. (1987). A scheme to enforce data dependence on large multiprocessor systems. IEEE Transactions on Software Engineering, SE-13(6), 726-739.

for (i=1; i<n; i++)
  a(k(i)) = c(i) + 1

becomes

a_tag(k(:)) = 0
parallel for (i=1; i<n; i++)
  critical a(k(i))
    if a_tag(k(i)) < i
      a(k(i)) = c(i) + 1
      a_tag(k(i)) = i
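A sketch of what the scheme guarantees: per-element tags make the final values independent of execution order, so a shuffled order (standing in for a parallel loop, with the test-and-write treated as one critical section) produces the serial result:

```python
import random

def zhu_yew(n, k, c):
    """Sketch of the tag scheme for  a(k(i)) = c(i) + 1 .
    An iteration writes only if its index exceeds the location's tag, so
    the last serial writer wins no matter how the iterations are ordered."""
    a = [0.0] * n
    tag = [-1] * n
    order = list(range(len(k)))
    random.shuffle(order)          # adversarial "parallel" order
    for i in order:
        # In a real parallel run this test-and-write is a critical section.
        if tag[k[i]] < i:
            a[k[i]] = c[i] + 1
            tag[k[i]] = i
    return a

k = [0, 2, 0, 1]                   # iterations 0 and 2 collide on a[0]
c = [10.0, 20.0, 30.0, 40.0]
assert zhu_yew(3, k, c) == [31.0, 41.0, 21.0]   # same as the serial loop
```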

Inspector-executor

• Parallel schedule and communication aggregation computed at execution time
  – Analyzing the memory access pattern of a code segment (typically a loop)
• Overhead reduced when the code segment (loop) is executed multiple times with the same access pattern

See Encyclopedia of Parallel Computing, Run Time Parallelization (Saltz and Das).
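A toy inspector-executor for a loop of the form a[w[i]] = a[r[i]] + 1 (the wavefront representation here is illustrative, not Saltz and Das's actual data structures): the inspector groups iterations into wavefronts whose members touch disjoint locations; the executor runs the wavefronts in order, and the iterations inside one wavefront could run in parallel.

```python
def inspector(writes, reads):
    """Assign each iteration to the earliest wavefront that respects flow,
    anti, and output dependences on the accessed locations."""
    lastW, lastR = {}, {}          # location -> wavefront of last write / read
    wavefronts = []
    for i, (w, r) in enumerate(zip(writes, reads)):
        wf = 1 + max(lastW.get(w, -1), lastR.get(w, -1), lastW.get(r, -1))
        if wf == len(wavefronts):
            wavefronts.append([])
        wavefronts[wf].append(i)
        lastW[w] = wf
        lastR[r] = max(lastR.get(r, -1), wf)
    return wavefronts

def executor(a, writes, reads, wavefronts):
    for wf in wavefronts:          # wavefronts run in order...
        for i in wf:               # ...iterations inside one could run in parallel
            a[writes[i]] = a[reads[i]] + 1
    return a

writes, reads = [0, 1, 0, 2], [3, 3, 1, 0]
wfs = inspector(writes, reads)
assert wfs == [[0, 1], [2], [3]]
serial = executor([0]*4, writes, reads, [[i] for i in range(4)])
assert executor([0]*4, writes, reads, wfs) == serial
```

When the same index arrays are reused across outer iterations, the inspection cost is paid once and the schedule is reused, which is the point of the second bullet.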

Speculation

• There is an extensive body of literature on speculation.
• A code segment is executed in parallel in the hope that there is no problem.
• If there is, execution backtracks and the code segment is re-executed serially.

See Lawrence Rauchwerger, David A. Padua: The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization. PLDI 1995: 218-232.
Encyclopedia of Parallel Computing: Thread Level Speculation (Torrellas), Transactional Memories (Herlihy).

Converting into dense array computations

• The objective is to convert sparse computations into equivalent dense form.
• This increases the number of operations involving zeroes (or identity)
  – Sometimes making the computation impossible
• But it allows the application of compile-time transformations.
• Sparsity is recovered by applying a final transformation (see below).

Converting into dense array computations

for (col=0; col<cols; col++) {
  for (row=0; row<dimensions; row++) {
    leftCell = left.Rows[row];
    while (leftCell != NULL) {
      result[row][col] +=
        leftCell->Value * right[leftCell->ColIndex][col];
      leftCell = leftCell->ColNext;
    }
  }
}

becomes

for (col = 0; col < cols; col++)
  for (row = 0; row < dimensions; row++)
    result[row][col] += A_Valuep[row][:] * right[:][col];

Converting into dense array computations

Harmen L. A. van der Spek, Harry A. G. Wijshoff: Sublimation: Expanding Data Structures to Enable Data Instance Specific Optimizations. LCPC 2010: 106-120.

Anand Venkat, Mary Hall, and Michelle Strout. 2015. Loop and data transformations for sparse matrix code. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '15).

Where do we stand?

• There is an extensive body of literature on compilation and runtime techniques to facilitate the parallel programming of irregular computations.
• Although more ideas are important, what we need most urgently is an evaluation of their effectiveness.
  – A means for refinement and progress.
• All compiler techniques suffer from a lack of understanding of their effectiveness.

Need tools to evaluate progress

• Measuring advances in compiler technology is crucial for progress.
• An idea: a publicly available repository
  – Containing extensive and representative code collections.
  – Keeping track of results for compilers/techniques over time.
• Ongoing work on one such repository:
  – LORE: A Loop Repository for the Evaluation of Compilers, by Z. Chen, Z. Gong, J. Szaday, D. Won, D. Padua, A. Nicolau, A. Veidenbaum, N. Watkinson, Z. Sura, S. Maleki, J. Torrellas, G. DeJong. IISWC 2017.

4. NOTATIONS

4. NOTATIONS
4.1. ABSTRACTION

What are programming abstractions?

§ An abstraction is formed by reducing information.
§ In everyday life, it forms the world of ideas where we can reason. No (natural language) statement can be made without abstractions.

What are programming abstractions?

§ Abstract machine language / lower level implementation
§ Scalar computations:
  § a = b*c + d^3.4
  § No loads, stores, register allocation.
§ Scalar loops:
  § for (i = …) {…}
  § No if statements, branch/goto instructions.
  § Gives structure to the program.
§ Abstract algorithm implementation
  § min(A(1:n))
  § it = find(myvec.begin(), myvec.end(), 30);
  § Hide algorithm, data representation.

What are the benefits of using abstractions?

• Programmer productivity (expressiveness)
  – Codes are easier to write
  – To understand, maintain
• Portability
• Optimization (manual and automatic)
  – Programs are easier to analyze
  – Easier to transform
  – Deeper transformations are enabled

Abstractions and irregular computations

• What abstractions should we use for programming irregular computations at a higher level?
• The best answer seems to be to represent irregular algorithms in terms of:

Dense array operations

4. NOTATIONS
4.2. ARRAY NOTATION AND IRREGULAR COMPUTATIONS

Why array notation

• Could access arrays in scalar mode
• But operations on aggregates are (often) easier to read.

for (i=0; i<n; i++)
  for (j=0; j<n; j++)
    a[i][j] = b[i][j] + c[i][j]

versus

a = b + c

• They are well-understood abstractions.
• Powerful.

Array operations and performance

• But the usefulness of array notation goes beyond expressiveness.
• There are at least three reasons to use array abstractions for performance
  – To represent parallelism
  – To reduce the overhead of interpretation
  – To enable automatic optimizations

Representation of parallelism

• Popular in the early days:

Illiac IV Fortran:

   do 10 i = 1, 100, 2
     M(i) = .true.
     M(i+1) = .false.
10 continue
   A(*) = B(*) + A(*)
   C(M(*)) = B(M(*)) + A(M(*))

• Used today but not widely:

Cilk Plus:

C[i][j] = __sec_reduce_add( a[i][:] * b[:][j] );

• TensorFlow
• Futhark
  – T. Henriksen, N. Serup, M. Elsman, F. Henglein, and C. Oancea. Futhark: Purely Functional GPU-Programming with Nested Parallelism and In-Place Array Updates. PLDI 2017.

Array operations and optimizations: an example

• Applying arithmetic rules such as associativity of matrix-matrix multiplication to reduce the number of operations.
  – Which one is better?
    • (((A*B)*C)*D)*E
    • (A*((B*C)*D))*E
    • (A*B)*((C*D)*E)
    • …
  – Find the best using dynamic programming.
  – Yoichi Muraoka and David J. Kuck. 1973. On the time required for a sequence of matrix products. Commun. ACM 16, 1 (January 1973), 22-26.
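The dynamic-programming search can be sketched as follows (a textbook formulation in the spirit of Muraoka and Kuck; the function name and interface are invented):

```python
from functools import lru_cache

def matrix_chain_cost(dims):
    """Minimum scalar multiplications to compute A1*A2*...*Ak, where
    Ai has shape dims[i-1] x dims[i]."""
    @lru_cache(maxsize=None)
    def best(i, j):                # cheapest way to multiply Ai..Aj
        if i == j:
            return 0
        return min(best(i, k) + best(k + 1, j) + dims[i-1]*dims[k]*dims[j]
                   for k in range(i, j))
    return best(1, len(dims) - 1)

# (A*B)*C with A:10x100, B:100x5, C:5x50 costs 7500 multiplications;
# A*(B*C) would cost 75000. The DP finds the cheaper parenthesization.
assert matrix_chain_cost((10, 100, 5, 50)) == 7500
```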

Reduce overhead of interpretation

• See Haichuan Wang, David A. Padua, Peng Wu: Vectorization of Apply to Reduce Interpretation Overhead of R. OOPSLA 2015: 400-415.

Array operations and irregular computations

• Array operations are an ideal notation to manipulate sparse arrays.

Sparse arrays in MATLAB

S = sparse(m,n)

load west0479.mat
A = west0479;
S = A * A' + speye(size(A));
pct = 100 / numel(A);
figure
spy(S)
title('A Sparse Symmetric Matrix')
nz = nnz(S);
xlabel(sprintf('nonzeros = %d (%.3f%%)', nz, nz*pct));

• Arrays appear as dense, but are represented internally as sparse.
• The interpreter and libraries handle the sparseness.

Sparse arrays can be abstracted as dense arrays in compiled languages

• Bik and Wijshoff* developed a strategy to translate dense array programs onto sparse equivalents.

do i=1,n
  acc = acc + a(i,:) * <1:n>

do i=1,n
  do j=1,n
    if (i,j) ∈ E_a then
      acc = acc + a'(σ_a(i,j)) * j

do i=1,n
  do ad ∈ PAD_a
    j = π2(σ_a^-1(ad))
    acc = acc + a'(ad) * j

do i=1,n
  do ad = alow(i), ahigh(i)
    j = aind(ad)
    acc = acc + aval(ad) * j

*Aart J. C. Bik, Harry A. G. Wijshoff: Compilation Techniques for Sparse Matrix Computations. International Conference on Supercomputing 1993: 416-424.
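The last form above is essentially compressed sparse row storage: aval holds the nonzeros row by row, aind their column indices (1-based, as on the slide), and alow/ahigh delimit each row's slice. A Python sketch with invented example data:

```python
def csr_dot_iota(aval, aind, alow, ahigh):
    """Sparse form of  acc = acc + a(i,:) * <1:n>  summed over all rows,
    matching the final loop generated above."""
    acc = 0.0
    for i in range(len(alow)):
        for ad in range(alow[i], ahigh[i] + 1):
            j = aind[ad]
            acc += aval[ad] * j
    return acc

# The 3x3 matrix [[2,0,0],[0,0,3],[1,0,4]] stored sparsely; its dense
# row-by-row dot with <1:3>, summed, is 2*1 + 3*3 + (1*1 + 4*3) = 24.
aval = [2.0, 3.0, 1.0, 4.0]
aind = [1, 3, 1, 3]
alow, ahigh = [0, 1, 2], [0, 1, 3]
assert csr_dot_iota(aval, aind, alow, ahigh) == 24.0
```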

Array operations and irregular computations

• Not only can array notation be used to represent dense computations, it can be used for a number of other domains.
• A wide range of domains.

How wide?

• At the time (ca. 1959), Illinois, with the first of its ‘ILLIAC’ supercomputers, was a great centre of computing, and Golub showed his affection for his Alma Mater by endowing a chair there 50 years later. Rumour has it that the funds for the gift came from Google stock acquired in exchange for some advice on linear algebra. Google’s PageRank search technology starts from a matrix computation, an eigenvalue problem with dimensions in the billions. Hardly surprising, Golub would have said: everything is linear algebra. (Nature, 2007)

New operators: arrays for graph algorithms

• A simple algorithm for SSSP (Bellman-Ford) can be represented as follows:

for (i=1; i<=n; i++)
  d = d ⨁ A ⨂ d

Jeremy Kepner and John Gilbert. 2011. Graph Algorithms in the Language of Linear Algebra. SIAM Press.
T. Mattson, D. Bader, J. Berry, A. Buluc, J. Dongarra, C. Faloutsos, J. Feo, J. Gilbert, J. Gonzalez, B. Hendrickson, J. Kepner, C. Leiserson, A. Lumsdaine, D. Padua, S. Poole, Steve Reinhardt, M. Stonebraker, S. Wallach, A. Yoo. Standards for Graph Algorithm Primitives.
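In the tropical (min, +) interpretation, ⨁ is elementwise min and A ⨂ d relaxes every edge. A Python sketch of the slide's iteration (the example graph is invented):

```python
INF = float("inf")

def sssp_semiring(A, src):
    """Bellman-Ford as  d = d (+) A^T (x) d  repeated n times, with
    (+) = min and (x) = +. A[i][j] is the weight of edge i -> j, INF if absent."""
    n = len(A)
    d = [INF] * n
    d[src] = 0
    for _ in range(n):
        # (A^T (x) d)[j] = min over i of A[i][j] + d[i]: relax every edge.
        relaxed = [min(A[i][j] + d[i] for i in range(n)) for j in range(n)]
        d = [min(a, b) for a, b in zip(d, relaxed)]   # the (+) step
    return d

A = [[INF, 1.0, 4.0],
     [INF, INF, 2.0],
     [INF, INF, INF]]
assert sssp_semiring(A, 0) == [0.0, 1.0, 3.0]   # 0 -> 1 -> 2 beats the direct edge
```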

4. NOTATIONS
4.3. EXTENSIONS TO ARRAY NOTATION FOR PARALLELISM

Communication as array operations

repmat(h, [1, 3])
circshift(h, [0, -1])
transpose(h)

Cannon's algorithm

[Figure: Cannon's algorithm on a 3×3 grid of blocks. The blocks A00..A22 of A and B00..B22 of B are first skewed so that each processor holds an aligned pair (A00 B00, A01 B11, A02 B22; A11 B10, A12 B21, A10 B02; A22 B20, A20 B01, A21 B12); each subsequent step multiplies and accumulates the local pair and then shifts (shift-multiply-add).]

Cannon's algorithm

for k=1:n
  c = c + a × b;
  a = circshift(a, [0 -1]);
  b = circshift(b, [-1 0]);
end

James C. Brodman, G. Carl Evans, Murat Manguoglu, Ahmed H. Sameh, María Jesús Garzarán, David A. Padua: A Parallel Numerical Solver Using Hierarchically Tiled Arrays. LCPC 2010: 46-61.

Barrier elimination

X(1:4) = Y(1:4)*2;
Z(1:2) = X([1,3]);
Z(3:4) = X([2,4]);

[Figure: the statements above spread across four processors, X(1)=Y(1)*2, X(2)=Y(2)*2, X(3)=Y(3)*2, X(4)=Y(4)*2, then Z(1)=X(1), Z(2)=X(3), Z(3)=X(2), Z(4)=X(4), showing which assignments read an element produced on another processor.]


Asynchronous semantics

• In some cases, it could be desirable to delay updates across the whole machine. At some point, a global update is made.
• This is a particular example of communication in asynchronous algorithms where data assignments suffer delays.

Tiled Linear Algebra – SSSP example

// the distance vector
d = inf; d(1) = 0;
// the mask vector
L = 0; L(1) = 1;
while (L.nnz())
  for i = 1:LOC_ITER                    // local computation
    r <- d + (trans(A) * (d *. L));     // partial multiplication
    L <- (r != d);
    d <- r;
  r += d;                               // global reduction
  L <- (r != d);
  d <- r;

Saeed Maleki, G. Carl Evans, David A. Padua: Tiled Linear Algebra a System for Parallel Graph Algorithms. LCPC 2014: 116-130.

4. NOTATIONS
4.4. WHERE DO WE STAND?

Array notation

• Array notation has a long history.
• It is a powerful notation that can be used for a wide range of applications.
• For parallelism, it is an unfulfilled promise.
• The highly mathematical notation is a challenge.
• We see increased use of the notation and expect it to become popular also for irregular algorithms.
• Challenges:
  – Develop the notation
  – Design compiler optimizations

5. EPILOG

• In 1976, not much was known about parallel programming of irregular computations.
• 40 years later, it is impressive how much has been learned about compilation and notations.
• But the development of widely used tools based on these ideas remains a challenge.
