Pergamon

Computing Systems in Engineering, Vol. 6, No. 2, pp. 97-109, 1995
Copyright © 1995 Elsevier Science Ltd
Printed in Great Britain. All rights reserved
0956-0521(95)OOOOH-9
0956-0521/95 $9.50 + 0.00

PROBLEM DECOMPOSITION FOR ADAPTIVE hp FINITE ELEMENT METHODS

ABANI PATRA and J. TINSLEY ODEN

Texas Institute for Computational and Applied Mathematics, The University of Texas at Austin, Austin, TX-78712, U.S.A.

Abstract-Problem decomposition strategies for load balancing parallel computations on adaptive hp finite element discretizations are discussed in this work. The special difficulties that arise in partitioning these discretizations are highlighted. Three classes of algorithms (mesh traversal based on orderings, interface based decompositions and recursive bisection of orderings) are discussed. A new ordering scheme for efficient recursive bisection of orderings is introduced. Details of the algorithms and examples, along with discussions of their merits and demerits, are presented. Recursive bisection on the new ordering introduced here outperforms several known algorithms on test cases.

1. INTRODUCTION

In this work, we explore an important aspect of parallel computing for adaptive hp finite element methods: the efficient automatic decomposition of adaptive hp finite element discretizations into "load balanced" subdomains, so that the subproblems posed on them can be solved by different processors at roughly equal effort. An efficient strategy for doing this will lead to better utilization of the processors and computing times closer to optimal.

Conventionally, the decomposition of computation for solving partial differential equations has been attempted at the data structure level by partitioning matrices and arrays. While this approach is quite efficient in a fixed or uniform refinement type of structure, it becomes very cumbersome and expensive in an hp adaptive setting. The standard tree type of data structures used in hp finite element codes are difficult to partition this way. In the present investigation, the partitioning problem is approached at the physical domain level, the idea being to obtain a distribution of computational effort over the entire domain and then to split up the problem into the required number of partitions.

Previous attempts at a solution of such problems for h-version finite element methods have been made by several researchers. Notable among them have been the contributions of Farhat [1], Miller, Vavasis [2,3], Simon [4] and Pothen et al. [5] Miller, Vavasis and their coworkers used a graph theoretic approach [2,3] in their work. This approach has been shown to yield good results, and theoretical bounds on achievable balance and interface size can be developed. Simon [4] and Pothen et al. [5] have popularized another family of algorithms, called the spectral bisection algorithms, that uses the second eigenvector of the Laplacian matrix of the graph associated with the finite element mesh for partitioning the mesh. Both of these methods have been adapted for our problems and the details are discussed in the sequel. Another interesting approach has recently been used by Salmon and Warren [6] for parallel N-body problems. In their technique the list of two-dimensional or three-dimensional elements is ordered based on the centroids of the elements and this element ordering is then partitioned. This approach has largely motivated a new algorithm we discuss in the following sections. There also has been some contemporary work on this idea by Pilkington and Baden [7] and Ou et al. [8] Most of the algorithms presented in the literature thus far have been for finite element grids with only h-refinements and are based on equidistributing degrees of freedom. For higher order elements these algorithms may not perform satisfactorily. The communication graph models outlined in some of the recent literature do possess the inherent flexibility to incorporate a weighting scheme to account for the uneven computational effort associated with different elements. However, we have not seen any such ideas in the recent literature. In this presentation we shall confine ourselves to a more intuitive approach, without the formalism associated with such graph theoretic models. The schemes discussed here seem to be satisfactory for non-uniform high order meshes typical of adaptive hp finite element methods.

2. ADAPTIVE hp FEM PROBLEM DECOMPOSITION

We have in mind here the management of computational effort for so-called hp finite element methods, in which the mesh size h and the local element spectral order p can both be varied as parameters over a mesh. The management of different h and p non-uniformly over a mesh requires a special data structure and is generally considerably more complex than conventional h version or p version finite elements. The payoff, however, can be significant and, with a proper orchestration of hp meshes, could include exponential convergence and very high accuracies with minimum degrees of freedom.

The first such hp data structure was presented in a series of papers by Oden, Demkowicz and their colleagues [9,10] and extensions of such approaches have appeared in several subsequent publications (e.g. Ref. 11). The possibility of further increasing the efficiencies of such methods through multiprocessor parallel computation is intriguing, but we know of no studies of such problems in the literature.

To begin our study of decomposition strategies we establish the goals of such decompositions. A successful problem decomposition strategy needs to achieve:

i. equidistribution of computational effort among the subdomains and

ii. minimization of the interfaces between the subdomains.

The first objective is clearly necessary to achieve maximum utilization of the processors. Further, most domain decomposition algorithms result in independent subproblems for the subdomains and an additional interface problem. Normally this interface problem is quite poorly conditioned and computationally the most intractable. The size of this problem also determines the amount of inter-processor communication required to solve the problem. Communication among processors, especially in distributed memory multiprocessor computers, either MIMD or SIMD, is often a very expensive process. Hence minimization of the interface is treated as an equally important objective in the problem decomposition strategy. It may often be necessary to accept a tradeoff between the two goals, as the advantages of a minimum interface may outweigh the disadvantages of small imbalances in the load.

However, the sequences of hp finite element meshes, with different orders of approximation and different sizes of elements in different parts of the domain, needed to produce the exponential convergence and very high accuracies associated with adaptive hp finite element methods, present special difficulties in generating partitionings. In general such meshes result in:

1. non-uniform computational load across elements

2. non-uniform communication patterns

3. the necessity of enforcement of constraints across partitions

4. the necessity of identifying an appropriate choice of computational load measure for partitioning.

The non-uniform computational load across elements is the result of the different orders of approximation used. The result is that a trivial cell/element count cannot be used as a load measure. The dof (degrees of freedom) associated with different elements is clearly very different, and these differences can be very large. For example, a trilinear hexahedral element for the elasticity operator in three dimensions has 24 dof associated with it, while an element of order 6 has 1029 dof (three displacement components at each of $(p+1)^3$ nodes). The non-uniform communication patterns are an inevitable result of this non-uniform distribution of dof.
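The growth of the per-element load with spectral order can be checked with a short calculation. The sketch below assumes an isolated hexahedral element with a full tensor-product basis and three displacement components per node; the function name `hex_elasticity_dof` is hypothetical and is not taken from the authors' code.

```python
def hex_elasticity_dof(p: int) -> int:
    """dof of one isolated hexahedral elasticity element of uniform order p.

    Assumes a full tensor-product basis: (p + 1)**3 nodes, each carrying
    three displacement components.
    """
    return 3 * (p + 1) ** 3


# Load ratio between a trilinear element and higher order elements.
for p in (1, 2, 4, 6):
    print(p, hex_elasticity_dof(p), hex_elasticity_dof(p) / hex_elasticity_dof(1))
```

Under this count a single element of order 6 carries roughly 43 times the dof of a trilinear element, which is why an element count alone is a poor load measure.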

A third difficulty arises in the implementation of the constrained elements of the type used in the hp finite element methods, as described in Demkowicz et al. [9] If an irregular node falls on a partition line, then special measures have to be taken to enforce the constraint. The problem of identifying an appropriate choice of computational load measure for partitioning is discussed in detail in the next section.

We note here that there are two approaches to the partitioning problem for sequences of adaptive meshes. In the first "static" approach, the entire mesh is repartitioned after every adaptation. This approach requires that the grid geometric data be either duplicated across all processors or be easily accessible. In the second "dynamic" approach, a base partitioning is carried out on an initial mesh and the partitioning is adapted along with the mesh. However, this approach requires considerable data reshuffling among processors, since the load distribution can be dramatically altered by a p adaptation. Considering the low volume of grid data (less than 5% of all our data [9]) and the low cost of the partitioning schemes we use, we choose a static approach in which we repartition the grid after every adaptation.

3. COMPUTATIONAL EFFORT MEASURES

A major issue in partitioning these adaptive hp meshes is the appropriate choice of a measure of computational effort, which should be minimized and equidistributed among the processors. In the simpler h-adaptive meshes and solution strategies using direct solvers, computational effort is related simply to the dof in the problem as $O(N^{\alpha})$, where $\alpha$ is determined by the choice of solver. In the current study of higher order elements and iterative solvers, computational effort is not a simple function of the dof. The distribution of computational effort for these meshes is highly non-uniform. The conditioning of the system and the error distribution in an initial mesh may also influence the computational effort. Furthermore, a usable measure of computational effort must possess the local additivity property (i.e. one must be able to compute local measures of computational effort on a per element basis that add up to the global computational effort). Good candidate measures appear to be

1. the square of the norm of the error in a coarser mesh solution

2. the number of degrees of freedom in the mesh.

The motivation for error as a partitioning measure can be found in the hp adaptive strategy developed in our earlier work [11]. Since the final distribution of h and p is determined by the error distribution in the initial mesh, it is reasonable to assume that the local distribution of computational effort on the final mesh would also be determined by the error distribution.

This type of error based problem decomposition also allows the decomposition to be easily embedded in an overall algorithm using the three-step hp adaptive strategy developed by Oden et al. [11] The first step of the three-step method, which involves solution on a very coarse mesh, could be carried out on a single processor. Thereafter, based on the error in this solution, the domain could be partitioned and distributed among the processors, which would then carry out the remaining parts of the three-step strategy on their portions of the domain. Communications among the processors would be required during the solution and error estimation stages.

Another possible advantage of this kind of decomposition, especially in a MIMD environment where each processor accesses a local data structure, is that each processor could efficiently and independently implement the tree-type data structure used in current hp adaptive finite element codes. The problems associated with dynamically partitioning the global tree-type data structure would be avoided, as the partitioning would have been done a priori at the "root" of the tree. However, this will necessitate the use of distributed dynamic data structures that are currently unavailable.

We now discuss the three families of decomposition algorithms:

1. mesh traversal based decompositions

2. interface based decompositions

3. recursive bisection of orderings.

4. MESH TRAVERSAL BASED DECOMPOSITIONS (MTBD)

In this family of decomposition algorithms, the mesh is traversed in some fashion and elements are accumulated into partitions based on some choice of computational effort. In the first, naive ordering implemented, the mesh is traversed in a nearest neighbor with lowest load fashion, and in the second a more sophisticated ordering is employed.

4.1. Nearest neighbor ordering based mesh traversal

In this method the mesh is traversed in a nearest neighbor with lowest load fashion. This method ensures some degree of locality in the decomposition, as each element in a decomposition has at least one neighboring element also in the same subdomain. An algorithm to partition the domain $\Omega$ into $M$ subdomains based on these principles is outlined below:

Algorithm 1

$e$: estimated global computational effort
$\theta_K$: estimated computational effort for the Kth element
$D_I$: set of element indices in the Ith partition
$N_K = \{\Omega_L : \partial\Omega_L \cap \partial\Omega_K \neq \emptyset\}$, where $\Omega = \bigcup_K \Omega_K$ and $\Omega_K$ = Kth finite element in the mesh
$E_I$ = running total of computational effort for the Ith partition

1. Compute $\bar{e} = e/M$
2. $1 \to I$; $0 \to E_I$

For K = 1 to number of elements, do
3. $\theta_K = \min_L \theta_L$ where $\partial\Omega_K \cap \partial\Omega_L \neq \emptyset$
   $E_I + \theta_K \to E_I$
4. If $E_I < \bar{e}$, then set
      $D_I = D_I \cup \{K\}$
      Find $N_{K+1}$ and let $K + 1 \to K$
   Else
      $I + 1 \to I$
      $D_I = D_I \cup \{K\}$
      $K + 1 \to K$
End for
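The traversal can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the mesh is given as a hypothetical adjacency dictionary `neighbors`, loads as a dictionary `effort`, and the function name, the explicit reset of the running total and the guard on the last partition are all choices made here for the example. The second return value reports whether the walk stopped early because every neighbor of the current element was already allocated.

```python
def nearest_neighbor_partition(neighbors, effort, n_parts, start=0):
    """Walk the mesh to the unallocated neighbor with the lowest load.

    neighbors: dict element id -> list of adjacent element ids
    effort:    dict element id -> computational effort estimate
    Returns (part_of, locked), where locked is True if the walk ran out
    of free neighbors before every element was allocated.
    """
    target = sum(effort.values()) / n_parts
    part_of = {}
    current, running = 0, 0.0
    k = start
    while True:
        part_of[k] = current
        running += effort[k]
        # Start a new partition once the target load is reached.
        if running >= target and current < n_parts - 1:
            current, running = current + 1, 0.0
        free = [j for j in neighbors[k] if j not in part_of]
        if not free:
            return part_of, len(part_of) < len(effort)
        k = min(free, key=lambda j: effort[j])


# A chain of four unit-load elements: 0 - 1 - 2 - 3
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
unit = {k: 1.0 for k in chain}
print(nearest_neighbor_partition(chain, unit, 2, start=0))
print(nearest_neighbor_partition(chain, unit, 2, start=1))
```

Starting the walk at an interior element of the chain leaves elements unallocated, which is a small instance of a traversal running out of free neighbors.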

While this algorithm does result in somewhat load balanced subdomains, it has significant drawbacks. Two primary drawbacks are:

i. the interface problem is not directly addressed, and

ii. the nearest neighbor traversal strategy often fails when all nearest neighbors have been allocated, a phenomenon we call "locking".

The first problem can be somewhat addressed by adding interface minimization constraints to Steps 3 and 4, which select the next element for the partition. The second problem is less tractable and will always arise in any algorithm that traverses a mesh element by element in a nearest neighbor fashion seeking to construct partitions. While ad hoc measures can be devised to continue the traversal, all remedial measures will cause disconnected subdomains and large interfaces. This problem can be eliminated by using alternate orderings as outlined below.

4.2. Peano-Hilbert ordering based mesh traversal

The problem of "locking" can be eliminated by traversing the mesh in other orderings. One idea that has been tested is to generate a Peano-Hilbert ordering of the elements of the mesh. This ordering is based on the discoveries of Peano [12] and Hilbert [13] of a class of continuous mappings $h_n : R \to U_n$, where $R$ is the unit interval [0, 1] and $U_n$ is the unit hypercube. Remarkably, given such a mapping, one can completely fill a hypercube with the images of the unit interval. Such curves are thus denoted as "space filling curves". Conversely, one might argue that given any set of points in n dimensional space one can find a space filling curve that passes through all of them and, consequently, a mapping from a point in the unit interval onto each of these points, after some appropriate scaling. Each point in n dimensional space can then be associated with a point in the unit interval, i.e. a number $t \in [0, 1]$. Sorting the points according to the value of this number will thus define an ordering of the set of points in n dimensional space. If this mapping also preserves locality, that is, points that are close to each other in the n dimensional space are mapped onto points close to each other in the unit interval, then this mapping will preserve adjacency properties of the points in n dimensional space. Now, if to the centroid of each element of a finite element mesh we associate an index generated by such a mapping, we implicitly define an ordering of the elements through the values of these indices. It is easy then to obtain an ordering of the mesh by sorting the elements based on the value of the index associated with the centroid of the element.

Fig. 1. Sample space filling curve and illustration of some fundamental properties for a 3 x 3 stencil. The coordinates are in base 3 digits. Location numbers of a and b share the same first 2 digits since they belong to the same first degree subcube. (Panels: a) First level curve; b) Second level curve; c) Mixed first and second level curve. Marked points: a = 0.1101, b = 0.1122, c = 0.22.)

Fig. 2. Finite element mesh mapped to a graph. (Panels: Finite element mesh; Associated node graph; Associated dual graph.)

Fig. 3. Error distribution in Poisson's equation with solution $u = \tan^{-1}(x + y - x_0)\, x(1 - x)\, y(1 - y)$.

Several techniques for generating these indices can be derived from the work of Patrick et al. [14] The subdomains are now generated on whatever computational effort metric we choose, traversing the mesh in the order of these indices. A variation of this idea was used by Salmon and Warren [6] in the parallel data structure used in their analysis of N-body problems.

We now review some fundamental properties of these curves as described by Bially [15]. The recent book by Sagan [16] also has a very comprehensive treatment of the subject. Figure 1 illustrates some of these fundamental properties.

1. Sub-cube concept: If each axis of the unit hypercube $U_d$ is decomposed into $M$ units, then $U_d$ is decomposed into $M^d$ subcubes, where $d$ is the space dimension of the domain. Note that this process may be carried on recursively and each subcube can again be decomposed into $M^d$ subcubes, leading to a total of $M^{dl}$ subcubes, where $l$ is the number of levels of this type of recursion. Figures 1(a) and (b) show two levels of such a decomposition for $M = 3$.

An approximation of a space filling curve is now easily generated by joining the centroids of each subcube at the finest level in the figure with a line passing through each subcube. Figures 1(a) and (b) show two such curves at different levels of decomposition. Figure 1(c) shows a mixed curve composed of a combination of first and second level curves.

2. Self similarity: While there are many ways that such curves may be constructed, it is possible to duplicate the path of the curve through each subcube with the path through the higher level subcube containing it, i.e. a stencil such as the one shown in Fig. 1(a) is carried on recursively through the levels subject only to spatial rotations.

Fig. 4. Problem decomposition on h-adaptive mesh using the error distribution and Algorithm 4 (RLBBC).

Fig. 5. Adaptive hp finite element mesh. (Legend: spectral order.)

Fig. 6. Problem decomposition using recursive load based bisection of coordinates. (Legend: partition no.)

3. Digital causality: Let $x = (x_1, x_2)$ be a point in the unit cube $U_2$ and let $x_1 = 0.x_{11}x_{21} \ldots x_{k1} \ldots$ and $x_2 = 0.x_{12}x_{22} \ldots x_{k2} \ldots$ be the $M$-ary representation of its coordinates $(x_1, x_2)$. The point $x$ can now be assigned a location number by interleaving the digits of the coordinates of the point in the $M$-ary representation, as described below. It is clear that the location numbers of all points in a subcube will have the same digits in the first $dl$ places, where $d$ is the spatial dimension of the cube and $l$ is the level of the subcube.

Fig. 7. Problem decomposition using recursive load based bisection of a Peano ordering. (Legend: partition no.)

Table 1. Imbalance and shape fractions for RLBBC partitioning (Fig. 6)

                Global  SubDom1  SubDom2  SubDom3  SubDom4
Tot dof         2072    ?        621      57?      ?
Interface dof   111     79       52       56       ?
IF                      6.7      12.1     4.2      ?
SF              5.4     13.4     8.1      9.7      10.7

Table 2. Imbalance and shape fractions for RLBBO partitioning (Fig. 7)

                Global  SubDom1  SubDom2  SubDom3  SubDom4
Tot dof         2072    473      520      447      699
Interface dof   67      3        44       39       47
IF                      8.9      2.1      12.5     23.4
SF              3.2     0.6      8.5      8.7      6.7

4. Vertex property: The curve as constructed above will enter and leave a subcube at a particular level at exactly one point. This implies that the curve must necessarily pass through all points in a subcube before passing through points outside of that subcube.

An algorithm to generate the Peano-Hilbert ordering of the elements of the mesh in two dimensions is presented below. Extensions to higher dimensions are obvious. The algorithm is motivated by the digital causality property of the location numbers. Several other, more complex algorithms for generating space filling curves of different types can be found in the treatise of Sagan [16].

Algorithm 2

1. Let $(x, y)$ be the element centroid coordinates, scaled so that $(x, y) \in (0, 1)^2$, and compute a binary representation of $x$ and $y$ as

   $x = 0.a_1 a_2 a_3 a_4 a_5 \ldots \qquad y = 0.b_1 b_2 b_3 b_4 b_5 \ldots$

   where the $a$'s and $b$'s are 0 or 1 (i.e. binary expansions).

2. Form $t = 0.(2a_1)(2b_1)(2a_2)(2b_2)(2a_3)(2b_3) \ldots$ where $t$ is base 3.

3. Order elements by the $t$ values of their centroids.
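The three steps above can be sketched as follows. This is an illustrative reading of the algorithm as literally stated (binary digits of the two coordinates interleaved, doubled, and read as base 3 digits); the function names `location_index` and `order_elements` and the digit count `n_digits` are assumptions made for the example.

```python
def location_index(x: float, y: float, n_digits: int = 16) -> float:
    """Base 3 index t = 0.(2a1)(2b1)(2a2)(2b2)... for a point in (0, 1)^2.

    a_j and b_j are the binary digits of x and y respectively.
    """
    t = 0.0
    weight = 1.0 / 3.0
    for _ in range(n_digits):
        x *= 2.0
        a = int(x)          # next binary digit of x
        x -= a
        y *= 2.0
        b = int(y)          # next binary digit of y
        y -= b
        t += 2 * a * weight
        weight /= 3.0
        t += 2 * b * weight
        weight /= 3.0
    return t


def order_elements(centroids):
    """Return element indices sorted by the index of their centroid."""
    return sorted(range(len(centroids)),
                  key=lambda k: location_index(*centroids[k]))


centroids = [(0.75, 0.75), (0.25, 0.25), (0.25, 0.75), (0.75, 0.25)]
print(order_elements(centroids))
```

Because the index depends only on the centroid, the cost of generating the ordering is dominated by the sort.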

For such an ordering to be effective in achieving our partitioning objectives, the one dimensional mapping must preserve locality properties of the two or three dimensional mesh (i.e. points close to each other in the original domain must also be close to each other in the transformed domain, or at least not too far away in the transformed domain). The hierarchical nature of the subcubes along with the property of digital causality described before imply that this type of mapping does preserve locality except at subcube boundaries. The jumps across subcube boundaries of a particular level must again be confined to the higher level subcube.

Fig. 8. Problem decomposition using recursive spectral bisection of the weighted node graph corresponding to the adaptive hp mesh.

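We now illustrate a simple proposition that ensures that such jumps are at most of the order of the size of the domain.

Proposition. Let $\mathbf{x}_1 = (x_1, y_1)$ and $\mathbf{x}_2 = (x_2, y_2)$ be two points in $U = [0, 1] \times [0, 1] \subset \mathbb{R}^2$. Let $t_1$ and $t_2$ be the corresponding images under the mapping $E$ described in the algorithm above, i.e. $t_i = E(\mathbf{x}_i)$. If

$$\|\mathbf{x}_1 - \mathbf{x}_2\|^2 = |x_1 - x_2|^2 + |y_1 - y_2|^2 = \alpha, \qquad |t_1 - t_2|^2 = \beta,$$

then the ratio

$$\frac{\beta}{\alpha} \le C$$

where $C$ is $O(1)$, or $\alpha \ll 1$ and $\beta \ll 1$.

Let us use $n$ digits in the mapping $E$. Then

$$t_1 - t_2 = \sum_{j=1}^{n} (a_j^1 - a_j^2)\, 3^{-(2j-1)} + \sum_{j=1}^{n} (b_j^1 - b_j^2)\, 3^{-4j}.$$

Now look at

$$\|\mathbf{x}_1 - \mathbf{x}_2\|^2 = \Big[\sum_{j=1}^{n} (a_j^1 - a_j^2)\, 2^{-j}\Big]^2 + \Big[\sum_{j=1}^{n} (b_j^1 - b_j^2)\, 2^{-j}\Big]^2 = C^2 + D^2 = \alpha.$$

The ratio is thus

$$\frac{\beta}{\alpha} \approx \frac{\sum_{j=1}^{n} \big[(a_j^1 - a_j^2)^2 \times 3^{-(4j-2)} + (b_j^1 - b_j^2)^2 \times 3^{-8j}\big]}{\sum_{j=1}^{n} \big[(a_j^1 - a_j^2)^2 \times 2^{-2j} + (b_j^1 - b_j^2)^2 \times 2^{-2j}\big]}.$$

If $\mathbf{x}_1 = \mathbf{x}_2$ then $\alpha = 0$ and $\beta = 0$, and the ratio collapses to 0/0. If $\mathbf{x}_1 \neq \mathbf{x}_2$ then $\alpha \neq 0$ and $\beta \neq 0$. Let $k$ be the lowest value of $j$ for which $a_j^1$ and $a_j^2$ differ and $l$ be the lowest value of $j$ for which $b_j^1$ and $b_j^2$ differ.

Retaining only the leading terms, we can write

$$\frac{\beta}{\alpha} \approx \frac{(a_k^1 - a_k^2)^2 \times 3^{-(4k-2)} + (b_l^1 - b_l^2)^2 \times 3^{-8l}}{(a_k^1 - a_k^2)^2 \times 2^{-2k} + (b_l^1 - b_l^2)^2 \times 2^{-2l}}.$$

Using the above algorithm,

$$3^{-(4k-2)} + 3^{-8l} \le C\,\big(2^{-2k} + 2^{-2l}\big)$$

where $C$ can be determined as the maximum value of the above expression. ∎

For example, if $k = l = 1$ then $C \approx 0.222$. If $k \ll l$ and $k$ is small (i.e. close to 1), then $\beta/\alpha \approx 3^{-(4k-2)}/2^{-2k}$, which is a bounded number. Similarly, if $l \ll k$ and $l$ is small, then $\beta/\alpha \le 3^{-8l}/2^{-2l}$, which is also a bounded number. For $k$ and $l$ both large the ratio can be large, but $k$ and $l$ large indicates that both $\alpha$ and $\beta$ are very small numbers, i.e. $\mathbf{x}_1$ and $\mathbf{x}_2$ are close and $t_1$ and $t_2$ are close.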

We emphasize that this result is not a proof oflocality; it merely establishes that the jumps in thelocation indices may not be too large.
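The size of the constant can be explored numerically. The sketch below assumes that the leading-term bound reads $3^{-(4k-2)} + 3^{-8l} \le C(2^{-2k} + 2^{-2l})$, which is the reading consistent with the quoted value $C \approx 0.222$ for $k = l = 1$; the function name `jump_ratio` is hypothetical.

```python
def jump_ratio(k: int, l: int) -> float:
    """Leading-term ratio (3^-(4k-2) + 3^-(8l)) / (2^-(2k) + 2^-(2l))."""
    numerator = 3.0 ** (-(4 * k - 2)) + 3.0 ** (-8 * l)
    denominator = 2.0 ** (-2 * k) + 2.0 ** (-2 * l)
    return numerator / denominator


# Tabulate the ratio for the first few digit positions.
for k in range(1, 4):
    print(k, [round(jump_ratio(k, l), 4) for l in range(1, 6)])
```

Under this assumption the ratio stays below 4/9 for all positive $k$ and $l$, since the numerator is at most $1/9 + 3^{-8l}$ and the denominator exceeds $1/4$.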

The basic mesh partitioning algorithm is nowoutlined:


Fig. 9. Adaptive hp mesh for Example 3.

Algorithm 3

$e$: estimated global computational effort
Define: $D_I = \{K \text{ if element } K \in \text{subdomain } I\}$
$M$: the number of partitions desired
$\Psi$: running total of subdomain computational load

1. Compute $e$ and the corresponding element computational effort measures $\theta_K$
2. Compute $e_I = e/M$
3. Create an ordering of the elements by mapping the centroids of the elements onto a Peano-Hilbert curve.
4. Set $I = 1$, $\Psi = 0$
5. For K = 1 to the number E of elements do
      $\Psi = \Psi + \theta_K$
      $D_I = D_I \cup \{K\}$
      If ($\Psi \ge e_I$) then $I = I + 1$; $\Psi = 0$
      $K = K + 1$.
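A compact sketch of this accumulation loop is given below. It assumes the ordering has already been computed and is passed in as a list of element ids; the function name `partition_along_ordering` and the guard that keeps leftover elements in the last partition are choices made for this example rather than part of the algorithm as stated.

```python
def partition_along_ordering(ordered_elements, effort, n_parts):
    """Fill partitions in curve order until each reaches the target load.

    ordered_elements: element ids sorted along the space filling curve
    effort:           dict element id -> computational effort estimate
    """
    target = sum(effort[k] for k in ordered_elements) / n_parts
    parts = [[] for _ in range(n_parts)]
    current, running = 0, 0.0
    for k in ordered_elements:
        parts[current].append(k)
        running += effort[k]
        # Advance once the target is reached, but never past the last partition.
        if running >= target and current < n_parts - 1:
            current += 1
            running = 0.0
    return parts


effort = {0: 1.0, 1: 1.0, 2: 2.0, 3: 2.0, 4: 1.0, 5: 1.0}
print(partition_along_ordering([0, 1, 2, 3, 4, 5], effort, 2))
```

Each element is visited once, so after the sort the partitioning itself is linear in the number of elements.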

A partitioning generated by using these indices will enforce some locality on the subdomains generated. However, the interface problem is again not addressed explicitly. This drawback shows up when the meshes get finer and can often produce disconnected subdomains. One alternative is to seek specific partitioning interfaces and then try to balance the work in the resulting partitions. An algorithm based on this technique is outlined in the next section.

5. INTERFACE BASED DECOMPOSITIONS

These methods consist of selecting candidate separator surfaces and then selecting a separator based on lowest total workload, smallest load imbalance and smallest interface. The selection involves assigning to each candidate separator surface a number indicative of the computational effort associated with the resulting domain partitions and the interfaces. The assignment of this number can be controlled by the problem and is machine dependent. On computers with slower message passing architectures, it is possible to penalize the interface requirements more heavily so that minimal interfaces are achieved even at the cost of some load imbalance.

5.1. Recursive load based bisection of coordinates (RLBBC)

Various types of recursive load balancing schemes based on bisections along coordinate lines have been proposed by Vavasis [2,3] and Miller et al. [17] The advantage of these methods is that both objectives, load balance and minimum interface, are explicitly addressed. However, the cost of doing so may inhibit the method.

An algorithm that will implement this strategy is outlined below. The simplest of these algorithms leads to a type of recursive bisection. The drawback is that only numbers of partitions that are powers of 2 are obtained. This seems to be acceptable for current hypercube-type distributed memory architectures but will be a drawback on grid-type architectures where the number of processors need not be a power of 2.

Algorithm 4

$\theta_K$: computational effort estimate for element K. $\theta_K$ is specified as dof in the description of the algorithm. It may be replaced by any alternate measure of computational effort.
$D_I$: list of elements in subdomain I.
$n_t$: number of trial separator surfaces.
$q_i$: quality index for a trial separator.

1. Compute the maximum and minimum coordinates in any one of the dimensions of the entire domain, $x1_{\min}$, $x1_{\max}$.

For i = 1 to $n_t$ do

2. Compute

   $$x1_i = x1_{\min} + \frac{(x1_{\max} - x1_{\min})\, i}{n_t}$$

   $$q_i = \left| \frac{dof_{left}}{dof_{right}} - 1 \right| \cdot dof_{tot} + dof_{inter}$$

   where $dof_{left}$ and $dof_{right}$ are the degrees of freedom to the left and right of $x1_i$ and $dof_{inter}$ is the degrees of freedom on the trial separator $x1_i$.

3. Choose as interface the separator corresponding to the lowest $q_i$.

4. If the center of mass of element K has an $x_1$ coordinate less than that of the interface then
      $D_1 \to D_1 \cup \{K\}$
   Else
      $D_2 \to D_2 \cup \{K\}$
   At this stage, the original domain has been split into two.

5. For the next level of decomposition apply Steps 1-4 with $D_1$ and $D_2$ instead of the entire domain and $x_2$ as the coordinate. This will result in four subdomains. In three dimensions, for the next level use Steps 1-4 with these four subdomains and $x_3$ as the coordinate. This process is recursively continued until the desired number of domains is attained. Clearly, for better shaped domains, equal numbers of splits in each coordinate must be done. Thus, for two dimensions $4^n$ subdomains and in three dimensions $8^n$ subdomains are obtained.

Fig. 10. Problem decomposition using RLBBO on Example 3.

One disadvantage of the method observed in nu-merical tests is that for non-convex domains it resultsin disconnected subdomains.

6. RECURSIVE BISECTION OF ORDERINGS

In this family of algorithms an attempt is made to combine the advantages of the mesh traversal algorithms with those of the interface partitionings. The elements are ordered using some ordering and then a recursive splitting is applied to the resulting one-dimensional mapping.

6.1. Recursive load based bisection of an ordering (RLBBO)

In this method the elements are ordered using the space filling curve ordering outlined in Section 4.2 and then a recursive splitting is applied to the resulting one dimensional mapping. The basic algorithm is outlined below:

Algorithm 5

$D_I$: list of elements in subdomain I.
$n_t$: number of trial separator surfaces.
$q_i$: quality index for a trial separator.

1. Create an ordering of the elements by mapping the centroids of the elements onto a Peano-Hilbert curve.

2. Let $t_K$ be the distance of the centroid of element K along the space filling curve.

3. Compute the maximum and minimum of $t_K$, $t_{\max}$ and $t_{\min}$.

4. Compute $n_t$ trial separator levels as

   $$t_i = t_{\min} + \frac{(t_{\max} - t_{\min})\, i}{n_t}$$

5. For each $t_i$ compute a quality of interface index $q_i$

   $$q_i = \mathrm{abs}\left( \frac{dof_{left}}{dof_{right}} - 1 \right) \cdot dof_{tot} + dof_{inter}$$

   Replace dof by error or other load estimate as appropriate.

6. Choose as interface $t_{int}$ the $t_i$ that corresponds to the lowest $q_i$.

7. For K = 1 to the number of elements
      If $t_K \le t_{int}$ then
         $D_1 \leftarrow D_1 \cup \{K\}$
      else
         $D_2 \leftarrow D_2 \cup \{K\}$
      end if
   end for
   At this stage, the original domain has been split into two.

8. Apply Steps 1-7 recursively on each of the generated subdomains.
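The bisection and its recursion can be sketched as below. This is a minimal illustration under stated assumptions: curve positions and loads are passed as dictionaries, loads are strictly positive, the trial separators are taken as equally spaced levels between the minimum and maximum curve position, and the interface load is supplied by an optional, hypothetical callable `interface(left, right)` (taken as zero when omitted). None of these names come from the authors' code.

```python
def bisect_ordering(elements, t, load, n_trials=16, interface=None):
    """Split elements in two at the trial level with the lowest quality index."""
    if len(elements) < 2:
        return list(elements), []
    t_min = min(t[k] for k in elements)
    t_max = max(t[k] for k in elements)
    total = sum(load[k] for k in elements)
    best = None
    for i in range(1, n_trials):
        cut = t_min + i * (t_max - t_min) / n_trials
        left = [k for k in elements if t[k] <= cut]
        right = [k for k in elements if t[k] > cut]
        if not left or not right:
            continue
        l_load = sum(load[k] for k in left)
        r_load = sum(load[k] for k in right)
        inter = interface(left, right) if interface else 0.0
        q = abs(l_load / r_load - 1.0) * total + inter
        if best is None or q < best[0]:
            best = (q, left, right)
    if best is None:            # e.g. all elements share one curve position
        return list(elements), []
    return best[1], best[2]


def rlbbo(elements, t, load, levels, **kwargs):
    """Apply the bisection recursively; yields 2**levels subdomains."""
    parts = [list(elements)]
    for _ in range(levels):
        parts = [half for part in parts
                 for half in bisect_ordering(part, t, load, **kwargs)]
    return parts


t = {k: k / 8 for k in range(8)}
load = {k: 1.0 for k in range(8)}
print(rlbbo(list(range(8)), t, load, levels=2))
```

Replacing `load` by an error estimate, or supplying an `interface` callable that counts shared dof, changes the separator choice without changing the structure of the recursion.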

Initial results obtained using this algorithm arequite promising. One particularly demanding meshand the corresponding decomposition are shown inFigs 5 and 7.

Fig. II. Problem decomposition using RlBBC onExample 3.

•


where lij are the componcnts of L(G). The secondeigen vector of the Laplacian matrix or the so calledFiedler vector is uscd to dcfine an ordering of thenodcs of thc elcments and these arc then recursivelybisected. The standard Laplacian matrix can be uscdonly for h adaptive mcshes. To apply such stratcgicsto hI' meshcs wc usc the node wcight feature providcdby Barnard and Simon in their recursivc spcctralbiscction code. Wc usc pl whcre pis the spectral ordcrassociated with thc nodc as the node wcight.

The algorithm is now outlincd brielly as follows:

6.2. Recursive spectral bisection (RSB)

Another technique. popularized by Simon. Pothcnand others:"s generates a partitioning bascd on theLaplacian matrix of the graph associated with themesh. Two typcs of graphs can be associated with amesh. Thc first type called a node. graph can beformed by associating a nodc on the mcsh with avertex on the graph and constructing and cdge bc-twecn two nodes if their interaction is non-zero. Thcsecond type. called a dual graph. is formed byassociating an element on thc mesh with a vertex onthe graph and an constructing an edge betwecn twovertices if the associated c1cments share a side. Asimplc mesh and its associated graphs are shown inFig. 2.

The Laplacian matrix L(G) of a graph is defined by

\[
l_{ij} =
\begin{cases}
-1 & \text{if } (v_i, v_j) \text{ have a common edge,} \\
m_i & \text{if } i = j, \\
0 & \text{otherwise,}
\end{cases}
\]

for vertices $v_i$, $v_j$, where $m_i$ is the total number of edges meeting at $v_i$.

distribution contours are shown and in Fig. 4 the corresponding domain decomposition from the interface-based decomposition algorithm is shown.

In the second example, a particularly demanding combination of a non-convex geometry, a full range of spectral orders from 1 through 6, and three levels of h refinement is considered. The mesh is shown in Fig. 5. The naive MTBD class of algorithms failed for this problem. The partitioning generated by Algorithm 4 (RLBBC) is shown in Fig. 6. The load balance of the decomposition is good, but because of the non-convex nature of the domain the subdomains generated are disconnected. In Fig. 7 the partitioning generated by Algorithm 5 (RLBBO) is shown. This partitioning seems to be balanced with compact subdomains. The CPU time requirements for this algorithm are primarily in the one sort operation needed, which is O(N log N), and are insignificant compared to the rest of the solution process.

To quantify the results of this example, we define two new quantities to measure the quality of the partitioning.

Imbalance fraction (IF)

This is defined as the percentage, with respect to the largest subdomain, of the deviation from the average dof in each partition:

\[
IF_i = \frac{|N_i - N_{av}|}{N_{max}} \times 100 \qquad (1)
\]

where $N_i$ is the dof associated with subdomain i, and $N_{av}$ and $N_{max}$ are the average and maximum dof associated with the different subdomains.

Shape fraction (SF)

This is defined as the percentage of interface dof in a particular partition with respect to the total dof associated with that partition. For the whole domain, all dof on interfaces are compared to the total dof in the problem.

In Eq. (2), $N_i^{I}$ is the number of dof on all the interfaces associated with subdomain i.
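Both measures can be evaluated directly from per-subdomain dof counts. The following Python sketch assumes that the imbalance fraction uses the absolute deviation from the average, which is consistent with the non-negative values tabulated in the paper; the function names are illustrative. The input data are the dof counts reported for the RLBBO partitioning in Table 3.

```python
def imbalance_fractions(dof):
    """IF_i = |N_i - N_av| / N_max * 100 for each subdomain."""
    n_av = sum(dof) / len(dof)
    n_max = max(dof)
    return [abs(n - n_av) / n_max * 100.0 for n in dof]


def shape_fractions(interface_dof, dof):
    """SF_i = (interface dof of subdomain i) / N_i * 100."""
    return [n_int / n * 100.0 for n_int, n in zip(interface_dof, dof)]


# Subdomain dof counts taken from Table 3 (RLBBO partitioning).
total = [151, 36, 209, 130, 132, 86, 105, 126]
interface = [28, 16, 34, 30, 65, 26, 42, 24]

IF = imbalance_fractions(total)
SF = shape_fractions(interface, total)
# Global shape fraction: all interface dof over all dof in the problem.
global_sf = 130 / 840 * 100.0

print([round(x, 1) for x in IF])
print([round(x, 1) for x in SF])
print(round(global_sf, 1))
```

Note that the average is taken over the subdomain totals, in which interface dof are counted once for every subdomain that shares them, so the subdomain totals add up to more than the 840 global dof.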

Tables 1 and 2 summarize these performance measures for the partitionings associated with Example 2.

While the imbalance fractions of both decompositions are comparable, the shape fractions, especially for the global interface, of RLBBO are much lower. Thus the size of the interface is surprisingly better controlled by the RLBBO algorithm for this example. Since in most domain decomposition solvers the size of the interface largely controls the inter-processor communication requirements, this can have a significant effect on the computation times. Further itera-

Algorithm 6

1. Construct the graph associated with the hp mesh. Use an appropriate weighting scheme to represent the spectral orders. (Weights proportional to $p^2$ have performed well.)

2. Compute the second eigenvector of the Laplacian matrix (Fiedler vector).

3. Sort vertices of the graph according to the value of their associated components in the Fiedler vector.

4. Assign half the vertices to each subdomain.

5. Repeat Steps 1-4 recursively on each subdomain.

This algorithm is applied to a test adaptive hp mesh and appears to produce good decompositions.
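The steps of Algorithm 6 can be sketched in Python with NumPy. This is a small illustration under stated assumptions, not the RSB code used in the paper: it uses a dense eigensolver (`numpy.linalg.eigh`), which is only practical for small graphs, and it interprets "half the vertices" as half of the total vertex weight, so that weights such as $p^2$ influence where the sorted list is cut. The names `laplacian`, `fiedler_order` and `rsb` are hypothetical.

```python
import numpy as np


def laplacian(n, edges):
    """Graph Laplacian: -1 for each edge (i, j), vertex degree on the diagonal."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, j] = L[j, i] = -1.0
    # Row sums currently equal minus the degree of each vertex.
    L[np.diag_indices(n)] = -L.sum(axis=1)
    return L


def fiedler_order(vertices, edges):
    """Sort vertices by their component in the Fiedler vector
    of the subgraph induced by `vertices` (Steps 1-3)."""
    index = {v: k for k, v in enumerate(vertices)}
    local = [(index[a], index[b]) for a, b in edges
             if a in index and b in index]
    # eigh returns eigenvalues in ascending order; column 1 is the
    # eigenvector of the second-smallest eigenvalue.
    _, vecs = np.linalg.eigh(laplacian(len(vertices), local))
    return [vertices[k] for k in np.argsort(vecs[:, 1])]


def rsb(vertices, edges, weights, levels):
    """Recursive spectral bisection into at most 2**levels parts.

    Assumption: the cut is placed where the two parts have the most
    nearly equal total weight (Step 4, weighted variant)."""
    if levels == 0 or len(vertices) < 2:
        return [list(vertices)]
    ordered = fiedler_order(list(vertices), edges)
    cum = np.cumsum([weights[v] for v in ordered])
    cut = int(np.argmin(np.abs(cum[:-1] - cum[-1] / 2.0))) + 1
    return (rsb(ordered[:cut], edges, weights, levels - 1)
            + rsb(ordered[cut:], edges, weights, levels - 1))


# Chain of 8 vertices; the last two carry order p = 2, the rest p = 1.
edges = [(k, k + 1) for k in range(7)]
p = [1, 1, 1, 1, 1, 1, 2, 2]
weights = {v: float(order ** 2) for v, order in enumerate(p)}
print([sorted(part) for part in rsb(list(range(8)), edges, weights, 1)])
```

With uniform weights the same routine splits the chain into equal halves. If a subgraph becomes disconnected, the second eigenvalue is zero and the Fiedler vector is no longer unique, so a production code would need to treat that case separately.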

7. EXAMPLES

Some examples are now presented to demonstrate the basic performance of the algorithms presented in the previous sections. The first example uses error as the measure of computational effort and an interface based partitioning algorithm. In Fig. 3 the error

\[
SF_i = \frac{N_i^{I}}{N_i} \times 100 \qquad (2)
\]


Table 3. Imbalance and shape fractions for RLBBO partitioning (Fig. 10). Solution time on an 8 processor i860 is 10.2 seconds

                 Global     1     2     3     4     5     6     7     8
Total dof           840   151    36   209   130   132    86   105   126
Interface dof       130    28    16    34    30    65    26    42    24
IF (%)                -  13.9  41.1  41.7   3.9   4.8  17.1   8.1     2
SF (%)             15.4        44.4  16.3  23.1  49.4  30.2    40    19

Table 4. Imbalance and shape fractions for RLBBC partitioning (Fig. 11). Solution time on an 8 processor i860 is 9.8 seconds

                 Global     1     2     3
Total dof           840   149   132   117
Interface dof       126    35    42    34
IF (%)                -  17.6   6.3     3
SF (%)               15  23.3  31.8  29.1

tive substructuring type algorithms will also be more efficient for this partitioning.

In Fig. 8, the partitioning generated by the RSB code from Simon et al. is shown for comparison. To implement the method, we use a weighting of the nodes corresponding to the square of the order of approximation associated with the node. The partitioning looks quite similar to that generated by the RLBBO. One can possibly argue that the Fiedler vector used in this partitioning is in fact an ordering of the nodes and, implicitly, also of the elements, and hence it is no surprise that the partitionings look similar. The only drawback we observe is that a few isolated nodes are located apart from the domain. A good smoothing pass ought to take care of these. None of the results shown employ any smoothing passes for either the RLBBO or the RLBBC algorithms.

In Fig. 9 we depict a third example adaptive hp mesh generated automatically by the algorithm described in Ref. 11. The corresponding partitionings are shown in Figs 10 and 11. Tables 3 and 4 show the imbalance fractions, shape fractions and solution times for parallel solution of Poisson's equation on an 8 processor Intel i860 using this mesh and partitioning.

Acknowledgements - The financial support of ARPA under contract no. DABT63-92-C-0042 is gratefully acknowledged. We also thank Dr. H. D. Simon for providing a copy of the code for recursive spectral bisection.

REFERENCES

1. C. Farhat and F.-X. Roux, "Implicit parallel processing in structural mechanics," Computational Mechanics Advances, IACM Vol. 1, No. 2, Elsevier Scientific Publishers, Amsterdam, 1994.

2. S. A. Vavasis, "Automatic domain partitioning in three dimensions," SIAM J. Sci. Statistical Comput. 12(4), 950-970 (July 1991).

3. S. A. Vavasis, "Automatic domain partitioning in three dimensions," TR 90-1082, Department of Computer Science, Cornell University, Ithaca, NY 14853-7501, January 1990.

Table 4 (continued)

                     4     5     6     7     8
Total dof          118   150   106   108   102
Interface dof       42    33    23    53    25
IF (%)               3  18.3    11   9.7  13.8
SF (%)            35.5    22  21.7        24.5

4. H. D. Simon, "Spectral partitioning for dynamically changing calculations on parallel machines," Seventh International Conference on Domain Decomposition Methods in Scientific and Engineering Computing, State College, October 1993.

5. A. Pothen, H. Simon and K. P. Liou, "Partitioning sparse matrices with eigenvectors of graphs," SIAM J. Matrix Analysis Appl. 11, 430-452 (1990).

6. J. Salmon and M. Warren, "Parallel hashed oct-trees," in Proceedings, Supercomputing '93, Portland, Oregon, Nov. 1993.

7. J. R. Pilkington and S. B. Baden, "Partitioning with spacefilling curves," Technical Report CS94-349, Department of Computer Science and Engineering, University of California, San Diego, March 1994.

8. C.-W. Ou, S. Ranka and G. Fox, "Fast mapping and remapping algorithms for irregular and adaptive problems," Proceedings of the International Conference on Parallel and Distributed Systems, Taipei, Taiwan, 1993.

9. L. Demkowicz, J. T. Oden, W. Rachowicz and O. Hardy, "Toward a universal hp adaptive finite element strategy, Part 1. Constrained approximation and data structure," Comput. Methods Appl. Mech. Engng 77, 79-112 (1989).

10. W. Rachowicz, J. T. Oden and L. Demkowicz, "Toward a universal hp adaptive finite element strategy, Part 3. Design of hp meshes," Comput. Methods Appl. Mech. Engng 77, 181-212 (1989).

11. J. T. Oden, A. Patra and Y. S. Feng, "An hp adaptive strategy," in Adaptive, Multilevel and Hierarchical Computational Strategies (edited by A. K. Noor), AMD-Vol. 157, pp. 23-46, 1992.

12. G. Peano, "Sur une courbe, qui remplit toute une aire plane," Math. Annalen 36, 157-160 (1890).

13. D. Hilbert, "Über die stetige Abbildung einer Linie auf ein Flächenstück," Math. Annalen 38 (1891).

14. E. A. Patrick, D. R. Anderson and F. K. Bechtel, "Mapping multidimensional space to one dimension for computer output display," IEEE Trans. Comput. C-17(10) (October 1968).

15. T. Bially, "A class of dimension changing mappings and its application to bandwidth compression," Ph.D. thesis, Polytechnic Institute of Brooklyn, 1967.

16. H. Sagan, Space Filling Curves, Springer Verlag, Berlin, 1994.

17. G. L. Miller, S. Teng, W. Thurston and S. A. Vavasis, "Automatic mesh partitioning," CTC92TR112, Advanced Computing Research Institute, Cornell University, Ithaca, NY 14853-7501, 1992.

18. H. D. Simon, "Partitioning of unstructured problems for parallel processing," Comput. Systems Engng 2, 135-148 (1991).


19. M. Ainsworth and J. T. Oden, "A unified approach to a posteriori error estimation using element residual methods," Numer. Math. 65, 23-50 (1993).

20. J. T. Oden, A. Patra and Y. S. Feng, "Domain decomposition for adaptive hp finite element methods," Domain Decomposition Methods in Scientific and Engineering Computing (edited by D. Keyes and J. Xu), Contemporary Mathematics, Vol. 180, AMS, 1994.
