
Parallel Numerical Algorithms
Chapter 2 – Parallel Thinking

Section 2.1 – Parallel Algorithm Design

Michael T. Heath and Edgar Solomonik

Department of Computer Science
University of Illinois at Urbana-Champaign

CS 554 / CSE 512


Outline

1 Computational Model

2 Design Methodology
    Partitioning
    Communication
    Agglomeration
    Mapping

3 Example


Computational Model

Task: a subset of the overall program, with a set of inputs and outputs

Parallel computation: a program that executes two or more tasks concurrently

Communication channel: connection between two tasks over which information is passed (messages are sent and received) periodically

For now we work with the following messaging semantics:
    send is nonblocking: sending task resumes execution immediately
    receive is blocking: receiving task blocks execution until requested message is available
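These semantics correspond closely to MPI's nonblocking send and blocking receive. A minimal C/MPI sketch (not part of the slides; the neighbor rank and communicator are supplied by the caller):

    #include <mpi.h>

    /* Exchange one value with a neighboring task using the semantics above:
       the send is nonblocking (execution resumes immediately), the receive
       blocks until the requested message is available. */
    void exchange(double my_value, double *neighbor_value,
                  int neighbor, MPI_Comm comm)
    {
        MPI_Request req;
        MPI_Isend(&my_value, 1, MPI_DOUBLE, neighbor, 0, comm, &req);  /* nonblocking send */
        MPI_Recv(neighbor_value, 1, MPI_DOUBLE, neighbor, 0, comm,
                 MPI_STATUS_IGNORE);                                   /* blocking receive */
        MPI_Wait(&req, MPI_STATUS_IGNORE);    /* send buffer safe to reuse after this */
    }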


Example: Laplace Equation in 1-D

Consider Laplace equation in 1-D

u′′(t) = 0

on interval a < t < b with BC

u(a) = α, u(b) = β

Seek approximate solution vector u such that

ui ≈ u(ti) at mesh points ti = a + ih, i = 0, . . . , n + 1,

where h = (b − a)/(n + 1)


Example: Laplace Equation in 1-D

Finite difference approximation

u′′(ti) ≈ (ui+1 − 2ui + ui−1)/h²

yields tridiagonal system of algebraic equations

(ui+1 − 2ui + ui−1)/h² = 0,  i = 1, . . . , n,

for ui, i = 1, . . . , n, where u0 = α and un+1 = β

Starting from initial guess u^(0), compute Jacobi iterates

ui^(k+1) = (ui−1^(k) + ui+1^(k))/2,  i = 1, . . . , n,

for k = 1, . . . until convergence
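For reference (not on the slides), a direct sequential C implementation of this iteration, with a simple convergence test, might look like the following sketch:

    #include <math.h>
    #include <stdlib.h>

    /* Jacobi iteration for the 1-D Laplace equation: u[0] and u[n+1] hold the
       boundary values; interior values become the average of their neighbors. */
    void jacobi_1d(double *u, int n, double tol, int max_iter)
    {
        double *unew = malloc((n + 2) * sizeof(double));
        unew[0] = u[0];
        unew[n + 1] = u[n + 1];
        for (int k = 0; k < max_iter; k++) {
            double diff = 0.0;
            for (int i = 1; i <= n; i++) {
                unew[i] = 0.5 * (u[i - 1] + u[i + 1]);      /* Jacobi update */
                diff = fmax(diff, fabs(unew[i] - u[i]));
            }
            for (int i = 1; i <= n; i++) u[i] = unew[i];    /* accept new iterate */
            if (diff < tol) break;                          /* converged */
        }
        free(unew);
    }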


Example: Laplace Equation in 1-D

Define n tasks, one for each ui, i = 1, . . . , n

Task i stores initial value of ui and updates it at each iteration until convergence

To update ui, necessary values of ui−1 and ui+1 obtained from neighboring tasks i − 1 and i + 1

[Figure: chain of tasks u1, u2, u3, . . . , un with communication channels between neighboring tasks]

Tasks 1 and n determine u0 and un+1 from BC


Example: Laplace Equation in 1-D

initialize ui
for k = 1, . . .
    if i > 1, send ui to task i − 1          { send to left neighbor }
    if i < n, send ui to task i + 1          { send to right neighbor }
    if i < n, recv ui+1 from task i + 1      { receive from right neighbor }
    if i > 1, recv ui−1 from task i − 1      { receive from left neighbor }
    wait for sends to complete
    ui = (ui−1 + ui+1)/2                     { update my value }
end
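A literal C/MPI transcription of this fine-grain task (one interior unknown per process, rank r owning u_{r+1}; the variable names and fixed iteration count are illustrative, not from the slides):

    #include <mpi.h>

    double fine_grain_task(double ui, double alpha, double beta,
                           int niter, MPI_Comm comm)
    {
        int r, n;
        MPI_Comm_rank(comm, &r);          /* this task's index */
        MPI_Comm_size(comm, &n);          /* n tasks, one per unknown */
        for (int k = 0; k < niter; k++) {
            MPI_Request req[2];
            int nreq = 0;
            double uleft = alpha, uright = beta;          /* boundary values by default */
            if (r > 0)                                    /* send to left neighbor */
                MPI_Isend(&ui, 1, MPI_DOUBLE, r - 1, 0, comm, &req[nreq++]);
            if (r < n - 1)                                /* send to right neighbor */
                MPI_Isend(&ui, 1, MPI_DOUBLE, r + 1, 0, comm, &req[nreq++]);
            if (r < n - 1)                                /* receive from right neighbor */
                MPI_Recv(&uright, 1, MPI_DOUBLE, r + 1, 0, comm, MPI_STATUS_IGNORE);
            if (r > 0)                                    /* receive from left neighbor */
                MPI_Recv(&uleft, 1, MPI_DOUBLE, r - 1, 0, comm, MPI_STATUS_IGNORE);
            MPI_Waitall(nreq, req, MPI_STATUSES_IGNORE);  /* wait for sends to complete */
            ui = 0.5 * (uleft + uright);                  /* update my value */
        }
        return ui;
    }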


Mapping Tasks to Processors

Tasks must be assigned to physical processors for execution

Tasks can be mapped to processors in various ways, including multiple tasks per processor

Semantics of program should not depend on number of processors or particular mapping of tasks to processors

Performance usually sensitive to assignment of tasks to processors due to concurrency, workload balance, communication patterns, etc.

Computational model maps naturally onto distributed-memory multicomputer using message passing


Four-Step Design Methodology

Partition: Decompose problem into fine-grain tasks, maximizing number of tasks that can execute concurrently

Communicate: Determine communication pattern among fine-grain tasks, yielding task graph with fine-grain tasks as nodes and communication channels as edges

Agglomerate: Combine groups of fine-grain tasks to form fewer but larger coarse-grain tasks, thereby reducing communication requirements

Map: Assign coarse-grain tasks to processors, subject to tradeoffs between communication costs and concurrency


[Figure: the four-step design methodology: Problem → Partition → Communicate → Agglomerate → Map]


Graph Embeddings

Target network may be virtual network topology, with nodes usually called processors or processes

Overall design methodology is composed of sequence of graph embeddings:
    fine-grain task graph to coarse-grain task graph
    coarse-grain task graph to virtual network graph
    virtual network graph to physical network graph

Depending on circumstances, one or more of these embeddings may be skipped

An alternative methodology is to map tasks and communication onto a graph of the network topology laid out in time, similar to the way we defined butterfly protocols


Partitioning Strategies

Domain partitioning: subdivide geometric domain into subdomains

Functional decomposition: subdivide algorithm into multiple logical components

Independent tasks: subdivide computation into tasks that do not depend on each other (embarrassingly parallel)

Array parallelism: subdivide data stored in vectors, matrices, or other arrays

Divide-and-conquer: subdivide problem recursively into tree-like hierarchy of subproblems

Pipelining: subdivide sequences of tasks performed by the algorithm on each piece of data


Desirable Properties of Partitioning

Maximum possible concurrency in executing resulting tasks, ideally enough to keep all processors busy

Number of tasks, rather than size of each task, grows as overall problem size increases

Tasks reasonably uniform in size

Redundant computation or storage avoided


Example: Domain Decomposition

[Figure: 3-D domain partitioned along one (left), two (center), or all three (right) of its dimensions]

With 1-D or 2-D partitioning, minimum task size grows with problem size, but not with 3-D partitioning


Communication Patterns

Communication pattern determined by data dependences among tasks: because storage is local to each task, any data stored or produced by one task and needed by another must be communicated between them

Communication pattern may be
    local or global
    structured or random
    persistent or dynamically changing
    synchronous or sporadic


Desirable Properties of Communication

Frequency and volume minimized

Highly localized (between neighboring tasks)

Reasonably uniform across channels

Network resources used concurrently

Does not inhibit concurrency of tasks

Overlapped with computation as much as possible


Agglomeration

Increasing task sizes can reduce communication but also potentially reduces concurrency

Subtasks that can’t be executed concurrently anyway are obvious candidates for combining into single task

Maintaining balanced workload still important

Replicating computation can eliminate communication and is advantageous if result is cheaper to compute than to communicate


Example: Laplace Equation in 1-D

Combine groups of consecutive mesh points ti and corresponding solution values ui into coarse-grain tasks, yielding p tasks, each with n/p of the ui values

[Figure: a coarse-grain task holding consecutive values ul, . . . , ur; the neighboring values ul−1 and ur+1 are held by adjacent tasks]

Communication is greatly reduced, but ui values within each coarse-grain task must be updated sequentially


Example: Laplace Equation in 1-D

initialize ul, . . . , ur
for k = 1, . . .
    if j > 1, send ul to task j − 1          { send to left neighbor }
    if j < p, send ur to task j + 1          { send to right neighbor }
    if j < p, recv ur+1 from task j + 1      { receive from right neighbor }
    if j > 1, recv ul−1 from task j − 1      { receive from left neighbor }
    for i = l to r
        ūi = (ui−1 + ui+1)/2                 { update local values }
    end
    wait for sends to complete
    u = ū
end
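In C with MPI, coarse-grain task j might be realized roughly as below (a sketch: u holds the nloc local values in u[1..nloc] plus one ghost entry on each side for the neighbors' boundary values or the BC; names are illustrative, not from the slides):

    #include <mpi.h>
    #include <string.h>

    void coarse_grain_sweep(double *u, double *ubar, int nloc,
                            int j, int p, int niter, MPI_Comm comm)
    {
        for (int k = 0; k < niter; k++) {
            MPI_Request req[2];
            int nreq = 0;
            if (j > 0)                        /* send u_l to left neighbor */
                MPI_Isend(&u[1], 1, MPI_DOUBLE, j - 1, 0, comm, &req[nreq++]);
            if (j < p - 1)                    /* send u_r to right neighbor */
                MPI_Isend(&u[nloc], 1, MPI_DOUBLE, j + 1, 0, comm, &req[nreq++]);
            if (j < p - 1)                    /* receive u_{r+1} from right neighbor */
                MPI_Recv(&u[nloc + 1], 1, MPI_DOUBLE, j + 1, 0, comm, MPI_STATUS_IGNORE);
            if (j > 0)                        /* receive u_{l-1} from left neighbor */
                MPI_Recv(&u[0], 1, MPI_DOUBLE, j - 1, 0, comm, MPI_STATUS_IGNORE);
            for (int i = 1; i <= nloc; i++)                    /* update local values */
                ubar[i] = 0.5 * (u[i - 1] + u[i + 1]);
            MPI_Waitall(nreq, req, MPI_STATUSES_IGNORE);       /* wait for sends to complete */
            memcpy(&u[1], &ubar[1], nloc * sizeof(double));    /* u = u-bar */
        }
    }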


Overlapping Communication and Computation

Updating of solution values ui is done only after all communication has been completed, but only two of those values actually depend on awaited data

Since communication is often much slower than computation, initiate communication by sending all messages first, then update all “interior” values while awaiting values from neighboring tasks

Much (possibly all) of updating can be done while task would otherwise be idle awaiting messages

Performance can often be enhanced by overlapping communication and computation in this manner


Example: Laplace Equation in 1-D

initialize ul, . . . , ur
for k = 1, . . .
    if j > 1, send ul to task j − 1          { send to left neighbor }
    if j < p, send ur to task j + 1          { send to right neighbor }
    for i = l + 1 to r − 1
        ūi = (ui−1 + ui+1)/2                 { update local values }
    end
    if j < p, recv ur+1 from task j + 1      { receive from right neighbor }
    ūr = (ur−1 + ur+1)/2                     { update local value }
    if j > 1, recv ul−1 from task j − 1      { receive from left neighbor }
    ūl = (ul−1 + ul+1)/2                     { update local value }
    wait for sends to complete
    u = ū
end
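A corresponding C/MPI sketch differs from the previous one only in the order of operations inside the loop (same illustrative names and assumptions as before; here nloc >= 2):

    #include <mpi.h>
    #include <string.h>

    void overlapped_sweep(double *u, double *ubar, int nloc,
                          int j, int p, int niter, MPI_Comm comm)
    {
        for (int k = 0; k < niter; k++) {
            MPI_Request req[2];
            int nreq = 0;
            if (j > 0)                        /* send to left neighbor */
                MPI_Isend(&u[1], 1, MPI_DOUBLE, j - 1, 0, comm, &req[nreq++]);
            if (j < p - 1)                    /* send to right neighbor */
                MPI_Isend(&u[nloc], 1, MPI_DOUBLE, j + 1, 0, comm, &req[nreq++]);
            for (int i = 2; i <= nloc - 1; i++)                /* update interior values */
                ubar[i] = 0.5 * (u[i - 1] + u[i + 1]);         /*   while messages are in flight */
            if (j < p - 1)                    /* receive from right neighbor */
                MPI_Recv(&u[nloc + 1], 1, MPI_DOUBLE, j + 1, 0, comm, MPI_STATUS_IGNORE);
            ubar[nloc] = 0.5 * (u[nloc - 1] + u[nloc + 1]);    /* update u_r */
            if (j > 0)                        /* receive from left neighbor */
                MPI_Recv(&u[0], 1, MPI_DOUBLE, j - 1, 0, comm, MPI_STATUS_IGNORE);
            ubar[1] = 0.5 * (u[0] + u[2]);                     /* update u_l */
            MPI_Waitall(nreq, req, MPI_STATUSES_IGNORE);       /* wait for sends to complete */
            memcpy(&u[1], &ubar[1], nloc * sizeof(double));    /* u = u-bar */
        }
    }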


Surface-to-Volume Effect

For domain decomposition,
    computation is proportional to volume of subdomain
    communication is (roughly) proportional to surface area of subdomain

Higher-dimensional decompositions have more favorable surface-to-volume ratio

Partitioning across more dimensions yields more neighboring subdomains but smaller total volume of communication than partitioning across fewer dimensions
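For example (an illustration, not on the slide), partition an n × n × n mesh with a one-point-deep halo over p tasks:

    1-D: each task is an n × n × (n/p) slab; per-task communication ≈ 2n² points
    2-D: each task is an n × (n/√p) × (n/√p) beam; per-task communication ≈ 4n²/√p points
    3-D: each task is an (n/p^(1/3))³ cube; per-task communication ≈ 6n²/p^(2/3) points

Per-task computation is n³/p in all three cases, so the communication-to-computation ratio improves with the dimensionality of the partitioning once p is large.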


Mapping

As with agglomeration, mapping of coarse-grain tasks to processors should maximize concurrency, minimize communication, maintain good workload balance, etc.

But connectivity of coarse-grain task graph is inherited from that of fine-grain task graph, whereas connectivity of target interconnection network is independent of problem

Communication channels between tasks may or may not correspond to physical connections in underlying interconnection network between processors


Mapping

Two communicating tasks can be assigned to
    one processor, avoiding interprocessor communication but sacrificing concurrency
    two adjacent processors, so communication between the tasks is directly supported, or
    two nonadjacent processors, so message routing is required

In general, finding optimal solution to these tradeoffs is NP-complete, so heuristics are used to find effective compromise


Mapping

For many problems, task graph has regular structure that can make mapping easier

If communication is mainly global, then communication performance may not be sensitive to placement of tasks on processors, so random mapping may be as good as any

Random mappings sometimes used deliberately to avoid communication hot spots, where some communication links are oversubscribed with message traffic


Mapping Strategies

With n tasks and p processors consecutively numbered in some ordering,
    block mapping: blocks of n/p consecutive tasks are assigned to successive processors
    cyclic mapping: task i is assigned to processor i mod p
    reflection mapping: like cyclic mapping except tasks are assigned in reverse order on alternate passes
    block-cyclic mapping and block-reflection mapping: blocks of tasks assigned to processors as in cyclic or reflection

For higher-dimensional grid, these mappings can be applied in each dimension
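A minimal C sketch of the corresponding owner computations (assuming, for simplicity, that p divides n for block mapping and that b is the block size for block-cyclic; not from the slides):

    /* Processor that owns task i when n tasks are mapped to p processors. */
    int block_owner(int i, int n, int p)        { return i / (n / p); }
    int cyclic_owner(int i, int p)              { return i % p; }
    int block_cyclic_owner(int i, int b, int p) { return (i / b) % p; }
    int reflection_owner(int i, int p)
    {
        int pass = i / p, pos = i % p;               /* pass number, position within pass */
        return (pass % 2 == 0) ? pos : p - 1 - pos;  /* reverse order on alternate passes */
    }

With 16 tasks, 4 processors, and b = 2, these formulas reproduce the assignments shown on the next slide.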


Examples of Mappings

16 tasks assigned to 4 processors (tasks listed per processor):

    block:         P0: 0 1 2 3     P1: 4 5 6 7     P2: 8 9 10 11   P3: 12 13 14 15
    cyclic:        P0: 0 4 8 12    P1: 1 5 9 13    P2: 2 6 10 14   P3: 3 7 11 15
    reflection:    P0: 0 7 8 15    P1: 1 6 9 14    P2: 2 5 10 13   P3: 3 4 11 12
    block-cyclic:  P0: 0 1 8 9     P1: 2 3 10 11   P2: 4 5 12 13   P3: 6 7 14 15

(block-cyclic shown with blocks of size 2)


Dynamic Mapping

If task sizes vary during computation or can’t be predicted in advance, tasks may need to be reassigned to processors dynamically to maintain reasonable workload balance throughout computation

To be beneficial, gain in load balance must more than offset cost of communication required to move tasks and their data between processors

Dynamic load balancing usually based on local exchanges of workload information (and tasks, if necessary), so work diffuses over time to be reasonably uniform across processors


Task Scheduling

With multiple tasks per processor, execution of those tasks must be scheduled over time

For shared-memory, any idle processor can simply select next ready task from common pool of tasks, or use work stealing by taking a task from the task queue of another processor

For distributed-memory, the manager/worker paradigm can implement a common task pool, with manager dispatching tasks to workers

In distributed memory, better scalability is achieved by dynamic load balancing strategies that use periodic global or hierarchical rebalancing
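One way to realize the distributed-memory manager/worker pattern is sketched below in C/MPI (illustrative only: rank 0 is the manager, a negative task id signals termination, and process_task stands for whatever work a task involves):

    #include <mpi.h>

    void process_task(int task);                 /* user-supplied work routine (assumed) */

    void manager_worker(int ntasks, MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        if (rank == 0) {                         /* manager: holds the common task pool */
            int next = 0, finished = 0;
            while (finished < size - 1) {
                int request;
                MPI_Status st;
                MPI_Recv(&request, 1, MPI_INT, MPI_ANY_SOURCE, 0, comm, &st);
                int task = (next < ntasks) ? next++ : -1;    /* -1 = no more work */
                MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, 1, comm);
                if (task < 0) finished++;
            }
        } else {                                 /* worker: request, execute, repeat */
            for (;;) {
                int request = 0, task;
                MPI_Send(&request, 1, MPI_INT, 0, 0, comm);
                MPI_Recv(&task, 1, MPI_INT, 0, 1, comm, MPI_STATUS_IGNORE);
                if (task < 0) break;
                process_task(task);
            }
        }
    }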


Task Scheduling

For completely decentralized scheme, it can be difficult to determine when overall computation has been completed, so termination detection scheme is required

With multithreading, task scheduling can conveniently be driven by availability of data: whenever executing task becomes idle awaiting data, another task is executed

For problems with regular structure, it is often possible to determine mapping in advance that yields reasonable load balance and natural order of execution


Example: Atmospheric Flow Model

Fluid dynamics of atmosphere modeled by system of partial differential equations

3-D problem domain discretized by nx × ny × nz mesh of points

Vertical dimension (altitude) z much smaller than horizontal dimensions (latitude and longitude) x and y, so nz ≪ nx, ny

Derivatives in PDEs approximated by finite differences

Simulation proceeds through successive discrete steps in time


Example: Atmospheric Flow Model

Partition:
    Each fine-grain task computes and stores data values (pressure, temperature, etc.) for one mesh point
    Large-scale problems may require millions or billions of mesh points / tasks

Communicate:
    Finite difference computations at each mesh point use 9-point horizontal stencil and 3-point vertical stencil
    Solar radiation computations require communication throughout each vertical column of mesh points
    Global communication to compute total mass of air over domain


Example: Atmospheric Flow Model

Agglomerate:
    Combine horizontal mesh points in b × b blocks into coarse-grain tasks to reduce communication for finite differences to exchanges between adjacent nodes
    Combine each vertical column of mesh points into single task to eliminate communication for solar computations
    Yields nx/b × ny/b coarse-grain tasks

Map:
    Cyclic or random mapping reduces load imbalance due to solar computations
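A small sketch of the resulting ownership rule (assuming b divides nx and ny; names illustrative, not from the slides): every mesh point in a b × b horizontal block, over the full vertical column, belongs to the same coarse-grain task.

    /* Coarse-grain task owning mesh point (ix, iy, iz) after agglomeration;
       the task id is independent of iz, and there are (nx/b)*(ny/b) tasks. */
    int coarse_task(int ix, int iy, int iz, int ny, int b)
    {
        (void)iz;                       /* whole vertical column in one task */
        return (ix / b) * (ny / b) + (iy / b);
    }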


Example: Atmospheric Flow Model

[Figure: horizontal finite difference stencil for typical point (shaded black) in mesh for atmospheric flow model, before (left) and after (right) agglomeration with b = 2]

