Multithreaded and Distributed Programming -- Classes of Problems
ECEN5053 Software Engineering of Distributed Systems
University of Colorado
Foundations of Multithreaded, Parallel, and Distributed Programming, Gregory R. Andrews, Addison-Wesley, 2000
The Essence of Multiple Threads -- review
- Two or more processes that work together to perform a task
- Each process is a sequential program
- One thread of control per process
- Processes communicate using shared variables
- Processes need to synchronize with each other, in one of two ways (a small Java sketch follows):
  - Mutual exclusion
  - Condition synchronization
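A minimal Java sketch of the two kinds of synchronization; the class and method names are illustrative, not from the slides. synchronized provides mutual exclusion, and the wait/notifyAll pair provides condition synchronization.

public class SharedCounter {
    private int value = 0;

    // Mutual exclusion: only one thread at a time may execute a
    // synchronized method of the same object.
    public synchronized void increment() {
        value++;
        notifyAll();                  // wake any threads waiting on a condition
    }

    // Condition synchronization: wait until the state satisfies a predicate.
    public synchronized void awaitAtLeast(int n) throws InterruptedException {
        while (value < n)
            wait();                   // releases the lock while waiting
    }
}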
Opportunities & Challenges
- What kinds of processes to use
- How many parts or copies
- How they should interact
- Key to developing a correct program is to ensure the process interaction is properly synchronized
Focus in this course
- Imperative programs
  - The programmer has to specify the actions of each process and how they communicate and synchronize. (Java, Ada)
- Declarative programs (not our focus)
  - Written in languages designed for the purpose of making synchronization and/or concurrency implicit
  - Require machines that support the languages, for example, “massively parallel machines”
- Asynchronous process execution
- Shared memory, distributed memory, networks of workstations (message passing)
Multiprocessing monkey wrench
- The solutions we addressed last semester presumed a single CPU, so the concurrent processes share coherent memory.
- A multiprocessor environment with shared memory introduces cache and memory consistency problems and the overhead to manage them.
- A distributed-memory multiprocessor/multicomputer/network environment has additional issues of latency, bandwidth, administration, security, etc.
Recall from multiprogram systems
- A process is a sequential program that has its own thread of control when executed.
- A concurrent program contains multiple processes, so every concurrent program has multiple threads, one for each process.
- Multithreaded usually means a program contains more processes than there are processors to execute them.
- A multithreaded software system manages multiple independent activities.
Why write as multithreaded?
- To be cool (wrong reason)
- Sometimes it is easier to organize the code and data as a collection of processes than as a single huge sequential program
- Each process can be scheduled and executed independently
- Other applications can continue to execute “in the background”
Many applications, 5 basic paradigms
- Iterative parallelism
- Recursive parallelism
- Producers and consumers (pipelines)
- Clients and servers
- Interacting peers

Each of these can be accomplished in a distributed environment. Some can be used in a single-CPU environment.
Iterative parallelism
- Example?
- Several, often identical processes
- Each contains one or more loops; therefore each process is iterative
- They work together to solve a single problem
- They communicate and synchronize using shared variables
- Independent computations – disjoint write sets
Recursive parallelism
- One or more independent recursive procedures
- Recursion is the dual of iteration
- Procedure calls are independent – each works on different parts of the shared data
- Often used in imperative languages for:
  - Divide-and-conquer algorithms
  - Backtracking algorithms (e.g., tree traversal)
- Used to solve combinatorial problems such as sorting, scheduling, and game playing
- If there are too many recursive procedures, we prune (see the fork/join sketch below).
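A minimal divide-and-conquer sketch using Java's fork/join framework; the task, data, and threshold here are illustrative, not from the slides. The threshold is where we "prune": below it, recursion stops and the subproblem is solved sequentially.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;  // prune: solve small cases sequentially
    private final long[] data;
    private final int lo, hi;

    SumTask(long[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {              // base case: sequential sum
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) / 2;                 // divide
        SumTask left = new SumTask(data, lo, mid);
        left.fork();                             // run the left half concurrently
        long right = new SumTask(data, mid, hi).compute();
        return right + left.join();              // conquer: combine the halves
    }
}

public class RecursiveParallelism {
    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        java.util.Arrays.fill(data, 1L);
        long sum = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum);                 // prints 1000000
    }
}

The recursive calls are independent because each works on a disjoint slice of the shared array, matching the disjoint-write-set requirement above.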
Producers and consumers
- One-way communication between processes
- Often organized into a pipeline through which info flows
- Each process is a filter that consumes the output of its predecessor and produces output for its successor
- That is, a producer process computes and outputs a stream of results
- Sometimes implemented with a shared bounded buffer as the pipe, e.g., Unix stdin and stdout (see the sketch below)
- Synchronization primitives: flags, semaphores, monitors
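A minimal producer/consumer sketch in Java; the bounded blocking queue plays the role of the shared bounded buffer ("the pipe"). The names and sizes are illustrative, not from the slides.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelineDemo {
    public static void main(String[] args) {
        BlockingQueue<Integer> pipe = new ArrayBlockingQueue<>(10); // bounded buffer

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++)
                    pipe.put(i);                     // blocks while the buffer is full
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++)
                    System.out.println(pipe.take()); // blocks while the buffer is empty
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        producer.start();
        consumer.start();
    }
}

A longer pipeline is just a chain of such queues, with each filter taking from its predecessor's queue and putting into its successor's.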
Clients & Servers
- Producer/consumer: a one-way flow of information between independent processes with their own rates of progress
- The client/server relationship is the most common pattern
- A client process requests a service and waits for the reply
- A server repeatedly waits for a request, then acts upon it and sends a reply
- Two-way flow of information
Distributed “procedures” and “calls”
- The client and server relationship is the concurrent programming analog of the relationship between the caller of a subroutine and the subroutine itself.
- Like a subroutine that can be called from many places, the server has many clients.
- Each client request must be handled independently.
- Multiple requests might be handled concurrently.
Common example
- A common example of client/server interaction in operating systems, OO systems, networks, databases, and others: reading and writing a data file.
- Assume a file server module provides two operations, read and write; a client process calls one or the other.
- On a single-CPU or shared-memory system:
  - The file server is implemented as a set of subroutines and data structures that represent files
  - Interaction between a client process and a file is typically implemented by subroutine calls
Client/Server example
- If the file is shared:
  - It probably must be written to by at most one client process at a time
  - It can safely be read concurrently by multiple clients
- This is an example of what is called the readers/writers problem
Readers/Writers -- many facets
- Has a classic solution using mutexes (chapter 2 last semester) when viewed as a mutual exclusion problem
- Can also be solved with a condition synchronization solution, under different scheduling policies
- Distributed-system solutions include (a lock-based sketch follows):
  - with an encapsulated database
  - with replicated files
  - just remote procedure calls and local synchronization
  - just rendezvous
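A minimal shared-memory sketch of the readers/writers discipline using Java's ReentrantReadWriteLock; the "file" here is just a string field, and the names are illustrative. Many readers may hold the read lock concurrently, while a writer excludes both readers and other writers.

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SharedFile {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private String contents = "";

    public String read() {                  // many concurrent readers allowed
        rw.readLock().lock();
        try {
            return contents;
        } finally {
            rw.readLock().unlock();
        }
    }

    public void write(String newContents) { // at most one writer, no readers
        rw.writeLock().lock();
        try {
            contents = newContents;
        } finally {
            rw.writeLock().unlock();
        }
    }
}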
Consider a query on the WWW
- A user opens a new URL within a Web browser.
- The Web browser is a client process that executes on the user’s machine.
- The URL indirectly specifies another machine on which the Web page resides.
- The Web page itself is accessed by a server process that executes on the other machine.
  - May already exist or may be created
  - Reads the page specified by the URL
  - Returns it to the client’s machine
- Additional server processes may be visited or created at intermediate machines along the way.
Clients/Servers -- on the same or separate machines
- Clients are processes, regardless of the number of machines.
- The server:
  - on a shared-memory machine, is a collection of subroutines
  - with a single CPU, is programmed using mutual exclusion to protect critical sections and condition synchronization to ensure subroutines are executed in appropriate orders
  - on distributed memory or a network, is a process executing on a different machine than its clients
  - is often multithreaded, with one thread per client
Communication in client/server app
- Shared memory:
  - servers as subroutines
  - use semaphores or monitors for synchronization
- Distributed:
  - servers as processes
  - communicate with clients using message passing, remote procedure call (remote method invocation), or rendezvous (a small RMI sketch follows)
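A minimal remote-method-invocation sketch using Java RMI, one realization of the RPC style named above; the FileService interface and all names are illustrative, not from the slides. The server exports an object and registers it by name; the client looks it up and invokes its methods as if they were local.

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

interface FileService extends Remote {
    String read(String name) throws RemoteException;   // RPC-style operation
}

class FileServer implements FileService {
    public String read(String name) {
        return "contents of " + name;                  // stand-in for real file I/O
    }
}

public class RmiSketch {
    public static void main(String[] args) throws Exception {
        // Server side: export the object and register it under a name.
        FileService stub =
            (FileService) UnicastRemoteObject.exportObject(new FileServer(), 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("FileService", stub);

        // Client side (normally on another machine): look up and call.
        FileService service =
            (FileService) LocateRegistry.getRegistry("localhost", 1099)
                                        .lookup("FileService");
        System.out.println(service.read("data.txt"));  // remote call
    }
}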
Interacting peers
- Occurs in distributed programs, not on a single CPU
- Several processes that accomplish a task by:
  - executing copies of the same code (hence, “peers”)
  - exchanging messages
  - example: distributed matrix multiplication
- Used to implement:
  - distributed parallel programs, including distributed versions of iterative parallelism
  - decentralized decision making
Among the 5 paradigms are certain characteristics common to distributed environments:
- Distributed memory
- Properties of parallel applications
- Concurrent computation
Distributed memory implications
- Each processor can access only its own local memory.
- A program cannot use global variables.
- Every variable must be local to some process or procedure and can be accessed only by that process or procedure.
- Processes have to use message passing to communicate with each other.
Example of a parallel application
Remember concurrent matrix multiplication in a shared-memory environment -- last semester? Sequential solution first:
for [i = 0 to n-1] {
  for [j = 0 to n-1] {
    # compute inner product of a[i,*] and b[*,j]
    c[i,j] = 0.0;
    for [k = 0 to n-1]
      c[i,j] = c[i,j] + a[i,k] * b[k,j];
  }
}
Properties of parallel applications
- Two operations can be executed in parallel if they are independent.
- An operation's read set contains the variables it reads but does not alter.
- Its write set contains the variables it alters (and possibly also reads).
- Two operations are independent if the write set of each is disjoint from both the read and write sets of the other.
- For example, in the matrix multiplication above, iteration i of the outer loop writes only row c[i,*] and reads only a and b, so any two outer-loop iterations are independent and the rows can be computed in parallel.
Concurrent computation
Computing the rows of the result matrix in parallel:

cobegin [i = 0 to n-1] {
  for [j = 0 to n-1] {
    c[i,j] = 0.0;
    for [k = 0 to n-1]
      c[i,j] = c[i,j] + a[i,k] * b[k,j];
  }
} # coend
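A Java sketch of the cobegin above, assuming one thread per row of c; the small matrices are illustrative. Each thread writes only its own row, so the write sets are disjoint and no locking is needed.

public class ParallelMatMul {
    public static void main(String[] args) throws InterruptedException {
        int n = 3;
        double[][] a = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        double[][] b = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};  // identity, so c == a
        double[][] c = new double[n][n];

        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int row = i;                    // each thread owns one row of c
            workers[i] = new Thread(() -> {
                for (int j = 0; j < n; j++) {
                    c[row][j] = 0.0;
                    for (int k = 0; k < n; k++)
                        c[row][j] += a[row][k] * b[k][j];
                }
            });
            workers[i].start();                   // "cobegin"
        }
        for (Thread t : workers) t.join();        // "coend": wait for all rows

        for (double[] row : c)
            System.out.println(java.util.Arrays.toString(row));
    }
}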
Differences: sequential vs. concurrent
- Syntactic: cobegin is used in place of for in the outermost loop.
- Semantic: cobegin specifies that its body should be executed concurrently -- at least conceptually -- for each value of index i.
The previous example implemented matrix multiplication using shared variables. Now, two ways using message passing as the means of communication:
1. A coordinator process and an array of independent worker processes
2. Workers as peer processes that interact by means of a circular pipeline
[Diagram: two structures for message-passing matrix multiplication -- (1) a coordinator process that sends data to and collects results from Worker 0 ... Worker n-1; (2) Worker 0 ... Worker n-1 connected directly as peers.]
Assume n processors for simplicity
Use an array of n worker processes, one worker on each processor; each worker computes one row of the result matrix.

process worker[i = 0 to n-1] {
  double a[n];    # row i of matrix a
  double b[n,n];  # all of matrix b
  double c[n];    # row i of matrix c
  receive initial values for vector a and matrix b;
  for [j = 0 to n-1] {
    c[j] = 0.0;
    for [k = 0 to n-1]
      c[j] = c[j] + a[k] * b[k,j];
  }
  send result c to coordinator;
}
Aside -- if not standalone:
The source matrices might be produced by a prior computation, and the result matrix might be input to a subsequent computation. This is an example of a distributed pipeline.
Role of coordinator
- Initiates the computation and gathers and prints the results.
- First sends each worker the appropriate row of a and all of b.
- Then waits to receive a row of c from every worker.
process coordinator {
  # source matrices a and b and result matrix c are declared here
  initialize a and b;
  for [i = 0 to n-1] {
    send row i of a to worker[i];
    send all of b to worker[i];
  }
  for [i = 0 to n-1]
    receive row i of c from worker[i];
  print the results, which are now in matrix c;
}
Message passing primitives
- send packages up a message and transmits it to another process.
- receive waits for a message from another process and stores it in local variables (a minimal channel sketch follows).
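A minimal sketch of these primitives in Java, assuming asynchronous message passing; the channel is backed by an unbounded blocking queue, and the names are illustrative.

import java.util.concurrent.LinkedBlockingQueue;

public class Channel<T> {
    private final LinkedBlockingQueue<T> queue = new LinkedBlockingQueue<>();

    // send: package up a message and transmit it. Never blocks here,
    // because the queue is unbounded (asynchronous message passing).
    public void send(T message) {
        queue.add(message);
    }

    // receive: wait for a message and store it in a local variable.
    public T receive() throws InterruptedException {
        return queue.take();          // blocks until a message arrives
    }
}

With such channels, the worker's final "send result c" maps to a send call and the coordinator's receive to the matching blocking receive.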
Peer approach
- Each worker has one row of a and is to compute one row of c.
- Each worker has only one column of b at a time instead of the entire matrix.
- Worker i has column i of matrix b.
- With this much source data, worker i can compute only the result for c[i,i].
- For worker i to compute all of row i of matrix c, it must acquire all columns of matrix b.
- We circulate the columns of b among the worker processes via the circular pipeline.
- Each worker executes a series of rounds in which it sends its column of b to the next worker and receives a different column of b from the previous worker.
See handout
- Each worker executes the same algorithm.
- Each communicates with other workers in order to compute its part of the desired result.
- In this case, each worker communicates with just two neighbors.
- In other cases of interacting peers, each worker communicates with all the others.
Worker algorithm
process worker [i = 0 to n-1] {
  double a[n];       # row i of matrix a
  double b[n];       # one column of matrix b
  double c[n];       # row i of matrix c
  double sum = 0.0;  # storage for inner products
  int nextCol = i;   # next column of results
Worker algorithm (cont.)

  receive row i of matrix a and column i of matrix b;
  # compute c[i,i] = a[i,*] x b[*,i]
  for [k = 0 to n-1]
    sum = sum + a[k] * b[k];
  c[nextCol] = sum;

  # circulate columns and compute the rest of c[i,*]
  for [j = 1 to n-1] {
    send my column of b to the next worker;
    receive a new column of b from the previous worker;
Worker algorithm (cont. 2)
    sum = 0.0;
    for [k = 0 to n-1]
      sum = sum + a[k] * b[k];
    if (nextCol == 0)
      nextCol = n-1;
    else
      nextCol = nextCol - 1;
    c[nextCol] = sum;
  }
  send result vector c to the coordinator process;
}
Comparisons
- In the first program, the values of matrix b are replicated in every worker.
- In the second, each worker has one row of a and one column of b at any point in time.
- The first requires more memory but executes faster: a classic time/space tradeoff.
Summary
- Concurrent programming paradigms in a shared-memory environment:
  - Iterative parallelism
  - Recursive parallelism
  - Producers and consumers
- Concurrent programming paradigms in a distributed-memory environment:
  - Client/server
  - Interacting peers
Shared-memory programming
Shared-Variable Programming
- Frowned on in sequential programs, although convenient (“global variables”)
- Absolutely necessary in concurrent programs: processes must communicate to work together
Need to communicate
- Communication fosters the need for synchronization:
  - Mutual exclusion – processes must not access shared data at the same time
  - Condition synchronization – one process needs to wait for another
- In a distributed environment, processes communicate via messages, remote procedure call, or rendezvous.
Some terms
- State – the values of the program variables at a point in time, both explicit and implicit. Each process in a program executes independently and, as it executes, examines and alters the program state.
- Atomic actions – a process executes sequential statements. Each statement is implemented at the machine level by one or more atomic actions that indivisibly examine or change the program state.
- Concurrent program execution interleaves sequences of atomic actions. A history is a trace of a particular interleaving.
Terms -- continued
- The next atomic action in any ONE of the processes could be the next one in a history, so there are many ways actions can be interleaved, and conditional statements allow even this to vary.
- The role of synchronization is to constrain the possible histories to those that are desirable.
- Mutual exclusion combines atomic actions into sequences of actions called critical sections, where the entire section appears to be atomic.
Terms – continued further
- A property of a program is an attribute that is true of every possible history.
  - Safety – the program never enters a bad state
  - Liveness – the program eventually enters a good state
How can we verify?
- How do we demonstrate that a program satisfies a property?
  - A dynamic execution of a test considers just one possible history.
  - A limited number of tests is unlikely to demonstrate the absence of bad histories.
  - Operational reasoning – exhaustive case analysis
  - Assertional reasoning – abstract analysis; atomic actions are predicate transformers
Assertional Reasoning
- Use assertions to characterize sets of states.
- This allows a compact representation of states and their transformations.
- More on this later in the course.
Warning
- We must be wary of dynamic testing alone; it can reveal only the presence of errors, not their absence.
- Concurrent and distributed programs are difficult to test and debug:
  - It is difficult (impossible) to stop all processes at once in order to examine their state!
  - Each execution will in general produce a different history.
Why synchronize?
- If processes do not interact, all interleavings are acceptable.
- If processes do interact, only some interleavings are acceptable.
- The role of synchronization is to prevent unacceptable interleavings:
  - Combine fine-grain atomic actions into coarse-grain composite actions (we call this ... what?)
  - Delay process execution until the program state satisfies some predicate
Unconditional atomic action
- Does not contain a delay condition
- Can execute immediately, as long as it executes atomically (not interleaved)
- Examples:
  - individual machine instructions
  - expressions we place in angle brackets
  - await statements where the guard condition is the constant true or is omitted
Conditional atomic action
- An await statement with a guard condition
- If the condition is false in a given process, it can only become true through the action of other processes.
- How long will the process wait if it has a conditional atomic action?
How to implement synchronization
- To implement mutual exclusion:
  - Implement atomic actions in software, using locks to protect critical sections
  - Needed in most concurrent programs
- To implement condition synchronization:
  - Implement a synchronization point that all processes must reach before any process is allowed to proceed -- a barrier (see the sketch below)
  - Needed in many parallel programs -- why?
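A minimal barrier sketch using Java's CyclicBarrier: no worker proceeds to phase 2 until all n have finished phase 1. The worker bodies and counts are illustrative.

import java.util.concurrent.CyclicBarrier;

public class BarrierDemo {
    public static void main(String[] args) {
        int n = 4;
        CyclicBarrier barrier = new CyclicBarrier(n,
            () -> System.out.println("--- all workers reached the barrier ---"));

        for (int i = 0; i < n; i++) {
            final int id = i;
            new Thread(() -> {
                try {
                    System.out.println("worker " + id + ": phase 1");
                    barrier.await();          // wait until all n workers arrive
                    System.out.println("worker " + id + ": phase 2");
                } catch (Exception e) { Thread.currentThread().interrupt(); }
            }).start();
        }
    }
}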
Desirable Traits and Bad States
- Mutual exclusion -- at most one process at a time is executing its critical section.
  - Its bad state is one in which two processes are in their critical sections.
- Absence of deadlock (livelock) -- if two or more processes are trying to enter their critical sections, at least one will succeed.
  - Its bad state is one in which all the processes are waiting to enter, but none is able to do so.
- (two more on the next slide)
Desirable Traits and Bad States (cont.)
- Absence of unnecessary delay -- if a process is trying to enter its critical section and the other processes are executing their noncritical sections or have terminated, the first process is not prevented from entering its critical section.
  - The bad state is one in which the one process that wants to enter cannot do so, even though no other process is in the critical section.
- Eventual entry -- a process that is attempting to enter its critical section will eventually succeed.
  - This is a liveness property; it depends on the scheduling policy.
Logical property of mutual exclusion
- When process1 is in its critical section, set property1 true; similarly for process2, where property2 is true.
- The bad state is one where property1 and property2 are both true at the same time.
- Therefore we want every state to satisfy the negation of the bad state:
  - mutex: NOT(property1 AND property2)
  - This needs to be a global invariant: true in the initial state and after each event that affects property1 or property2.
- The entry protocol is then <await (!property2) property1 = true;>
Coarse-grain solution
bool property1 = false, property2 = false;
# MUTEX: NOT(property1 AND property2) -- global invariant

process process1 {
  while (true) {
    <await (!property2) property1 = true;>
    critical section;
    property1 = false;
    noncritical section;
  }
}

process process2 {
  while (true) {
    <await (!property1) property2 = true;>
    critical section;
    property2 = false;
    noncritical section;
  }
}
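A Java sketch of the coarse-grain solution, assuming the angle-bracket await is realized with an object monitor (synchronized/wait/notifyAll); the class and method names are illustrative. The while loop re-checks the guard after every wakeup, which is what makes the guarded assignment appear atomic.

public class CoarseGrainMutex {
    private boolean property1 = false, property2 = false;
    // global invariant: !(property1 && property2)

    public synchronized void enter1() throws InterruptedException {
        while (property2) wait();   // <await (!property2) property1 = true;>
        property1 = true;
    }
    public synchronized void exit1() {
        property1 = false;
        notifyAll();                // let waiting processes re-check their guards
    }

    public synchronized void enter2() throws InterruptedException {
        while (property1) wait();   // <await (!property1) property2 = true;>
        property2 = true;
    }
    public synchronized void exit2() {
        property2 = false;
        notifyAll();
    }
}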
Does it avoid the problems?
- Deadlock: if each process were blocked in its entry protocol, then both property1 and property2 would have to be true; both are false at that point in the code.
- Unnecessary delay: a process blocks only if the other one is in its critical section.
- Liveness: see the next slide.
Liveness guaranteed?
- Liveness property -- a process trying to enter its critical section is eventually able to do so.
- If process1 is trying to enter but cannot, then property2 is true; therefore process2 is in its critical section, which it eventually exits, making property2 false and allowing process1’s guard to become true.
- If process1 is still not allowed entry, it is because the scheduler is unfair or because process2 again gains entry (can this happen infinitely often?).
- A strongly fair scheduler is required, which is not likely.
Three “spin lock” solutions
- A “spin lock” solution uses busy-waiting (a minimal sketch follows).
- Such solutions ensure mutual exclusion, are deadlock-free, and avoid unnecessary delay.
- They require a fairly strong scheduler to ensure eventual entry.
- They do not control the order in which delayed processes enter their critical sections when two or more try.
- Busy-waiting solutions were tolerated on a single CPU when the critical section was bounded.
- What about busy-waiting solutions in a distributed environment? Is there such a thing?
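A minimal spin-lock sketch in Java built on an atomic test-and-set; it is illustrative, not any one of the three solutions the slides refer to.

import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Busy-wait ("spin") until the atomic test-and-set succeeds.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait();    // scheduling hint (Java 9+)
        }
    }

    public void unlock() {
        locked.set(false);
    }
}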
Distributed-memory programming
Distributed-memory architecture
- The synchronization constructs we examined last semester were based on reading and writing shared variables.
- In distributed architectures, processors:
  - have their own private memory
  - interact using a communication network
  - without a shared memory, must exchange messages
Necessary first steps to write programs for a dist.-memory arch.
1. Define the interfaces with the communication network.
  - If they were read and write ops like those that operate on shared variables, programs would have to employ busy-waiting synchronization. Why?
  - Better to define special network operations that include synchronization -- message-passing primitives.
Necessary first steps to write programs for a dist.-memory arch. (cont.)
2. Message passing extends semaphores to convey data as well as to provide synchronization.
3. Processes share channels -- a channel is a communication path.
Characteristics
- A distributed program:
  - may be distributed across the processors of a distributed-memory architecture
  - can be run on a shared-memory multiprocessor (just like a concurrent program can be run on a single, multiplexed processor)
- Channels are the only items that processes share in a distributed program.
- Each variable is local to one process.
Implications of no shared variables
- Variables are never subject to concurrent access.
- No special mechanism for mutual exclusion is required.
- Processes must communicate in order to interact.
- The main concern of distributed programming is synchronizing interprocess communication; how this is done depends on the pattern of process interaction.
Patterns of process interaction
- Vary in the way channels are named and used
- Vary in the way communication is synchronized
- We’ll look at asynchronous and synchronous message passing, remote procedure calls, and rendezvous.
- These are equivalent: a program written using one set of primitives can be rewritten using any of the others.
- However: message passing is best for programming producers and consumers and interacting peers; RPC and rendezvous are best for programming clients and servers.
How related
[Diagram: how the mechanisms are related -- busy waiting, semaphores, monitors, message passing, RPC, and rendezvous.]
Match examples with paradigms and process-interaction categories:
- ATM
- Web-based travel site
- Stock transaction processing system
- Search service