Multithreaded and Distributed Programming -- Classes of Problems
ECEN5053 Software Engineering of Distributed Systems
University of Colorado
Foundations of Multithreaded, Parallel, and Distributed Programming, Gregory R. Andrews, Addison-Wesley, 2000
The Essence of Multiple Threads -- review
- Two or more processes that work together to perform a task
- Each process is a sequential program
- One thread of control per process
- Processes communicate using shared variables
- Processes need to synchronize with each other, in one of two ways (a small Java sketch follows):
  - Mutual exclusion
  - Condition synchronization
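A minimal Java sketch of the two kinds of synchronization; the class and method names are illustrative, not from the slides. synchronized provides mutual exclusion, and the wait/notifyAll pair provides condition synchronization.

public class SharedCounter {
    private int value = 0;

    // Mutual exclusion: only one thread at a time may execute a
    // synchronized method of the same object.
    public synchronized void increment() {
        value++;
        notifyAll();                  // wake any threads waiting on a condition
    }

    // Condition synchronization: wait until the state satisfies a predicate.
    public synchronized void awaitAtLeast(int n) throws InterruptedException {
        while (value < n)
            wait();                   // releases the lock while waiting
    }
}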
Opportunities & Challenges
- What kinds of processes to use
- How many parts or copies
- How they should interact
- Key to developing a correct program is to ensure the process interaction is properly synchronized
Focus in this course
- Imperative programs
  - The programmer has to specify the actions of each process and how they communicate and synchronize. (Java, Ada)
- Declarative programs (not our focus)
  - Written in languages designed for the purpose of making synchronization and/or concurrency implicit
  - Require machines that support the languages, for example, “massively parallel machines”
- Asynchronous process execution
- Shared memory, distributed memory, networks of workstations (message passing)
Multiprocessing monkey wrench
- The solutions we addressed last semester presumed a single CPU, so the concurrent processes share coherent memory.
- A multiprocessor environment with shared memory introduces cache and memory consistency problems and the overhead to manage them.
- A distributed-memory multiprocessor/multicomputer/network environment has additional issues of latency, bandwidth, administration, security, etc.
Recall from multiprogram systems
- A process is a sequential program that has its own thread of control when executed.
- A concurrent program contains multiple processes, so every concurrent program has multiple threads, one for each process.
- Multithreaded usually means a program contains more processes than there are processors to execute them.
- A multithreaded software system manages multiple independent activities.
Why write as multithreaded?
- To be cool (wrong reason)
- Sometimes it is easier to organize the code and data as a collection of processes than as a single huge sequential program
- Each process can be scheduled and executed independently
- Other applications can continue to execute “in the background”
Many applications, 5 basic paradigms
- Iterative parallelism
- Recursive parallelism
- Producers and consumers (pipelines)
- Clients and servers
- Interacting peers

Each of these can be accomplished in a distributed environment. Some can be used in a single-CPU environment.
Iterative parallelism
- Example?
- Several, often identical processes
- Each contains one or more loops; therefore each process is iterative
- They work together to solve a single problem
- They communicate and synchronize using shared variables
- Independent computations – disjoint write sets
Recursive parallelism
- One or more independent recursive procedures
- Recursion is the dual of iteration
- Procedure calls are independent – each works on different parts of the shared data
- Often used in imperative languages for:
  - Divide-and-conquer algorithms
  - Backtracking algorithms (e.g., tree traversal)
- Used to solve combinatorial problems such as sorting, scheduling, and game playing
- If there are too many recursive procedures, we prune (see the fork/join sketch below).
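A minimal divide-and-conquer sketch using Java's fork/join framework; the task, data, and threshold here are illustrative, not from the slides. The threshold is where we "prune": below it, recursion stops and the subproblem is solved sequentially.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;  // prune: solve small cases sequentially
    private final long[] data;
    private final int lo, hi;

    SumTask(long[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {              // base case: sequential sum
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) / 2;                 // divide
        SumTask left = new SumTask(data, lo, mid);
        left.fork();                             // run the left half concurrently
        long right = new SumTask(data, mid, hi).compute();
        return right + left.join();              // conquer: combine the halves
    }
}

public class RecursiveParallelism {
    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        java.util.Arrays.fill(data, 1L);
        long sum = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum);                 // prints 1000000
    }
}

The recursive calls are independent because each works on a disjoint slice of the shared array, matching the disjoint-write-set requirement above.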
Producers and consumers
- One-way communication between processes
- Often organized into a pipeline through which info flows
- Each process is a filter that consumes the output of its predecessor and produces output for its successor
- That is, a producer process computes and outputs a stream of results
- Sometimes implemented with a shared bounded buffer as the pipe, e.g., Unix stdin and stdout (see the sketch below)
- Synchronization primitives: flags, semaphores, monitors
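A minimal producer/consumer sketch in Java; the bounded blocking queue plays the role of the shared bounded buffer ("the pipe"). The names and sizes are illustrative, not from the slides.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelineDemo {
    public static void main(String[] args) {
        BlockingQueue<Integer> pipe = new ArrayBlockingQueue<>(10); // bounded buffer

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++)
                    pipe.put(i);                     // blocks while the buffer is full
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++)
                    System.out.println(pipe.take()); // blocks while the buffer is empty
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        producer.start();
        consumer.start();
    }
}

A longer pipeline is just a chain of such queues, with each filter taking from its predecessor's queue and putting into its successor's.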
Clients & Servers
- Producer/consumer: a one-way flow of information between independent processes with their own rates of progress
- The client/server relationship is the most common pattern
- A client process requests a service and waits for the reply
- A server repeatedly waits for a request, then acts upon it and sends a reply
- Two-way flow of information
Distributed “procedures” and “calls”
- The client and server relationship is the concurrent programming analog of the relationship between the caller of a subroutine and the subroutine itself.
- Like a subroutine that can be called from many places, the server has many clients.
- Each client request must be handled independently.
- Multiple requests might be handled concurrently.
Common example
- A common example of client/server interaction in operating systems, OO systems, networks, databases, and others: reading and writing a data file.
- Assume a file server module provides two operations, read and write; a client process calls one or the other.
- On a single-CPU or shared-memory system:
  - The file server is implemented as a set of subroutines and data structures that represent files
  - Interaction between a client process and a file is typically implemented by subroutine calls
Client/Server example
- If the file is shared:
  - It probably must be written to by at most one client process at a time
  - It can safely be read concurrently by multiple clients
- This is an example of what is called the readers/writers problem
Readers/Writers -- many facets
- Has a classic solution using mutexes (chapter 2 last semester) when viewed as a mutual exclusion problem
- Can also be solved with a condition synchronization solution, under different scheduling policies
- Distributed-system solutions include (a lock-based sketch follows):
  - with an encapsulated database
  - with replicated files
  - just remote procedure calls and local synchronization
  - just rendezvous
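A minimal shared-memory sketch of the readers/writers discipline using Java's ReentrantReadWriteLock; the "file" here is just a string field, and the names are illustrative. Many readers may hold the read lock concurrently, while a writer excludes both readers and other writers.

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SharedFile {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private String contents = "";

    public String read() {                  // many concurrent readers allowed
        rw.readLock().lock();
        try {
            return contents;
        } finally {
            rw.readLock().unlock();
        }
    }

    public void write(String newContents) { // at most one writer, no readers
        rw.writeLock().lock();
        try {
            contents = newContents;
        } finally {
            rw.writeLock().unlock();
        }
    }
}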
Consider a query on the WWW
- A user opens a new URL within a Web browser.
- The Web browser is a client process that executes on the user’s machine.
- The URL indirectly specifies another machine on which the Web page resides.
- The Web page itself is accessed by a server process that executes on the other machine.
  - May already exist or may be created
  - Reads the page specified by the URL
  - Returns it to the client’s machine
- Additional server processes may be visited or created at intermediate machines along the way.
Clients/Servers -- on the same or separate machines
- Clients are processes, regardless of the number of machines.
- The server:
  - on a shared-memory machine, is a collection of subroutines
  - with a single CPU, is programmed using mutual exclusion to protect critical sections and condition synchronization to ensure subroutines are executed in appropriate orders
  - on distributed memory or a network, is a process executing on a different machine than its clients
  - is often multithreaded, with one thread per client
Communication in client/server app
- Shared memory:
  - servers as subroutines
  - use semaphores or monitors for synchronization
- Distributed:
  - servers as processes
  - communicate with clients using message passing, remote procedure call (remote method invocation), or rendezvous (a small RMI sketch follows)
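A minimal remote-method-invocation sketch using Java RMI, one realization of the RPC style named above; the FileService interface and all names are illustrative, not from the slides. The server exports an object and registers it by name; the client looks it up and invokes its methods as if they were local.

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

interface FileService extends Remote {
    String read(String name) throws RemoteException;   // RPC-style operation
}

class FileServer implements FileService {
    public String read(String name) {
        return "contents of " + name;                  // stand-in for real file I/O
    }
}

public class RmiSketch {
    public static void main(String[] args) throws Exception {
        // Server side: export the object and register it under a name.
        FileService stub =
            (FileService) UnicastRemoteObject.exportObject(new FileServer(), 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("FileService", stub);

        // Client side (normally on another machine): look up and call.
        FileService service =
            (FileService) LocateRegistry.getRegistry("localhost", 1099)
                                        .lookup("FileService");
        System.out.println(service.read("data.txt"));  // remote call
    }
}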
Interacting peers
- Occurs in distributed programs, not on a single CPU
- Several processes that accomplish a task by:
  - executing copies of the same code (hence, “peers”)
  - exchanging messages
  - example: distributed matrix multiplication
- Used to implement:
  - distributed parallel programs, including distributed versions of iterative parallelism
  - decentralized decision making
Among the 5 paradigms are certain characteristics common to distributed environments:
- Distributed memory
- Properties of parallel applications
- Concurrent computation
Distributed memory implications
- Each processor can access only its own local memory.
- A program cannot use global variables.
- Every variable must be local to some process or procedure and can be accessed only by that process or procedure.
- Processes have to use message passing to communicate with each other.
Example of a parallel application
Remember concurrent matrix multiplication in a shared-memory environment -- last semester? Sequential solution first:
for [i = 0 to n-1] {
  for [j = 0 to n-1] {
    # compute inner product of a[i,*] and b[*,j]
    c[i,j] = 0.0;
    for [k = 0 to n-1]
      c[i,j] = c[i,j] + a[i,k] * b[k,j];
  }
}
Properties of parallel applications
- Two operations can be executed in parallel if they are independent.
- An operation's read set contains the variables it reads but does not alter.
- Its write set contains the variables it alters (and possibly also reads).
- Two operations are independent if the write set of each is disjoint from both the read and write sets of the other.
- For example, in the matrix multiplication above, iteration i of the outer loop writes only row c[i,*] and reads only a and b, so any two outer-loop iterations are independent and the rows can be computed in parallel.
Concurrent computation
Computing the rows of the result matrix in parallel:

cobegin [i = 0 to n-1] {
  for [j = 0 to n-1] {
    c[i,j] = 0.0;
    for [k = 0 to n-1]
      c[i,j] = c[i,j] + a[i,k] * b[k,j];
  }
} # coend
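A Java sketch of the cobegin above, assuming one thread per row of c; the small matrices are illustrative. Each thread writes only its own row, so the write sets are disjoint and no locking is needed.

public class ParallelMatMul {
    public static void main(String[] args) throws InterruptedException {
        int n = 3;
        double[][] a = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        double[][] b = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};  // identity, so c == a
        double[][] c = new double[n][n];

        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int row = i;                    // each thread owns one row of c
            workers[i] = new Thread(() -> {
                for (int j = 0; j < n; j++) {
                    c[row][j] = 0.0;
                    for (int k = 0; k < n; k++)
                        c[row][j] += a[row][k] * b[k][j];
                }
            });
            workers[i].start();                   // "cobegin"
        }
        for (Thread t : workers) t.join();        // "coend": wait for all rows

        for (double[] row : c)
            System.out.println(java.util.Arrays.toString(row));
    }
}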
Differences: sequential vs. concurrent
- Syntactic: cobegin is used in place of for in the outermost loop.
- Semantic: cobegin specifies that its body should be executed concurrently -- at least conceptually -- for each value of index i.
The previous example implemented matrix multiplication using shared variables. Now, two ways using message passing as the means of communication:
1. A coordinator process and an array of independent worker processes
2. Workers as peer processes that interact by means of a circular pipeline
[Diagram: two structures for message-passing matrix multiplication -- (1) a coordinator process that sends data to and collects results from Worker 0 ... Worker n-1; (2) Worker 0 ... Worker n-1 connected directly as peers.]
Assume n processors for simplicity
Use an array of n worker processes, one worker on each processor; each worker computes one row of the result matrix.

process worker[i = 0 to n-1] {
  double a[n];    # row i of matrix a
  double b[n,n];  # all of matrix b
  double c[n];    # row i of matrix c
  receive initial values for vector a and matrix b;
  for [j = 0 to n-1] {
    c[j] = 0.0;
    for [k = 0 to n-1]
      c[j] = c[j] + a[k] * b[k,j];
  }
  send result c to coordinator;
}
Aside -- if not standalone:
The source matrices might be produced by a prior computation, and the result matrix might be input to a subsequent computation. This is an example of a distributed pipeline.
Role of coordinator
- Initiates the computation and gathers and prints the results.
- First sends each worker the appropriate row of a and all of b.
- Then waits to receive a row of c from every worker.
process coordinator {
  # source matrices a and b and result matrix c are declared here
  initialize a and b;
  for [i = 0 to n-1] {
    send row i of a to worker[i];
    send all of b to worker[i];
  }
  for [i = 0 to n-1]
    receive row i of c from worker[i];
  print the results, which are now in matrix c;
}
Message passing primitives
- send packages up a message and transmits it to another process.
- receive waits for a message from another process and stores it in local variables (a minimal channel sketch follows).
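A minimal sketch of these primitives in Java, assuming asynchronous message passing; the channel is backed by an unbounded blocking queue, and the names are illustrative.

import java.util.concurrent.LinkedBlockingQueue;

public class Channel<T> {
    private final LinkedBlockingQueue<T> queue = new LinkedBlockingQueue<>();

    // send: package up a message and transmit it. Never blocks here,
    // because the queue is unbounded (asynchronous message passing).
    public void send(T message) {
        queue.add(message);
    }

    // receive: wait for a message and store it in a local variable.
    public T receive() throws InterruptedException {
        return queue.take();          // blocks until a message arrives
    }
}

With such channels, the worker's final "send result c" maps to a send call and the coordinator's receive to the matching blocking receive.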
Peer approach
- Each worker has one row of a and is to compute one row of c.
- Each worker has only one column of b at a time instead of the entire matrix.
- Worker i has column i of matrix b.
- With this much source data, worker i can compute only the result for c[i,i].
- For worker i to compute all of row i of matrix c, it must acquire all columns of matrix b.
- We circulate the columns of b among the worker processes via the circular pipeline.
- Each worker executes a series of rounds in which it sends its column of b to the next worker and receives a different column of b from the previous worker.
See handout
- Each worker executes the same algorithm.
- Each communicates with other workers in order to compute its part of the desired result.
- In this case, each worker communicates with just two neighbors.
- In other cases of interacting peers, each worker communicates with all the others.
Worker algorithm
process worker [i = 0 to n-1] {
  double a[n];       # row i of matrix a
  double b[n];       # one column of matrix b
  double c[n];       # row i of matrix c
  double sum = 0.0;  # storage for inner products
  int nextCol = i;   # next column of results
Worker algorithm (cont.)

  receive row i of matrix a and column i of matrix b;
  # compute c[i,i] = a[i,*] x b[*,i]
  for [k = 0 to n-1]
    sum = sum + a[k] * b[k];
  c[nextCol] = sum;

  # circulate columns and compute the rest of c[i,*]
  for [j = 1 to n-1] {
    send my column of b to the next worker;
    receive a new column of b from the previous worker;
Worker algorithm (cont. 2)
    sum = 0.0;
    for [k = 0 to n-1]
      sum = sum + a[k] * b[k];
    if (nextCol == 0)
      nextCol = n-1;
    else
      nextCol = nextCol - 1;
    c[nextCol] = sum;
  }
  send result vector c to the coordinator process;
}
Comparisons
- In the first program, the values of matrix b are replicated in every worker.
- In the second, each worker has one row of a and one column of b at any point in time.
- The first requires more memory but executes faster: a classic time/space tradeoff.
Summary
- Concurrent programming paradigms in a shared-memory environment:
  - Iterative parallelism
  - Recursive parallelism
  - Producers and consumers
- Concurrent programming paradigms in a distributed-memory environment:
  - Client/server
  - Interacting peers
Shared-memory programming
Shared-Variable Programming
- Frowned on in sequential programs, although convenient (“global variables”)
- Absolutely necessary in concurrent programs: processes must communicate to work together
Need to communicate
- Communication fosters the need for synchronization:
  - Mutual exclusion – processes must not access shared data at the same time
  - Condition synchronization – one process needs to wait for another
- In a distributed environment, processes communicate via messages, remote procedure call, or rendezvous.
Some terms
- State – the values of the program variables at a point in time, both explicit and implicit. Each process in a program executes independently and, as it executes, examines and alters the program state.
- Atomic actions – a process executes sequential statements. Each statement is implemented at the machine level by one or more atomic actions that indivisibly examine or change the program state.
- Concurrent program execution interleaves sequences of atomic actions. A history is a trace of a particular interleaving.
Terms -- continued
- The next atomic action in any ONE of the processes could be the next one in a history, so there are many ways actions can be interleaved, and conditional statements allow even this to vary.
- The role of synchronization is to constrain the possible histories to those that are desirable.
- Mutual exclusion combines atomic actions into sequences of actions called critical sections, where the entire section appears to be atomic.
Terms – continued further
- A property of a program is an attribute that is true of every possible history.
  - Safety – the program never enters a bad state
  - Liveness – the program eventually enters a good state
How can we verify?
- How do we demonstrate that a program satisfies a property?
  - A dynamic execution of a test considers just one possible history.
  - A limited number of tests is unlikely to demonstrate the absence of bad histories.
  - Operational reasoning – exhaustive case analysis
  - Assertional reasoning – abstract analysis; atomic actions are predicate transformers
Assertional Reasoning
- Use assertions to characterize sets of states.
- This allows a compact representation of states and their transformations.
- More on this later in the course.
Warning
- We must be wary of dynamic testing alone; it can reveal only the presence of errors, not their absence.
- Concurrent and distributed programs are difficult to test and debug:
  - It is difficult (impossible) to stop all processes at once in order to examine their state!
  - Each execution will in general produce a different history.
Why synchronize?
- If processes do not interact, all interleavings are acceptable.
- If processes do interact, only some interleavings are acceptable.
- The role of synchronization is to prevent unacceptable interleavings:
  - Combine fine-grain atomic actions into coarse-grain composite actions (we call this ... what?)
  - Delay process execution until the program state satisfies some predicate
Unconditional atomic action
- Does not contain a delay condition
- Can execute immediately, as long as it executes atomically (not interleaved)
- Examples:
  - individual machine instructions
  - expressions we place in angle brackets
  - await statements where the guard condition is the constant true or is omitted
Conditional atomic action
- An await statement with a guard condition
- If the condition is false in a given process, it can only become true through the action of other processes.
- How long will the process wait if it has a conditional atomic action?
How to implement synchronization
- To implement mutual exclusion:
  - Implement atomic actions in software, using locks to protect critical sections
  - Needed in most concurrent programs
- To implement condition synchronization:
  - Implement a synchronization point that all processes must reach before any process is allowed to proceed -- a barrier (see the sketch below)
  - Needed in many parallel programs -- why?
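A minimal barrier sketch using Java's CyclicBarrier: no worker proceeds to phase 2 until all n have finished phase 1. The worker bodies and counts are illustrative.

import java.util.concurrent.CyclicBarrier;

public class BarrierDemo {
    public static void main(String[] args) {
        int n = 4;
        CyclicBarrier barrier = new CyclicBarrier(n,
            () -> System.out.println("--- all workers reached the barrier ---"));

        for (int i = 0; i < n; i++) {
            final int id = i;
            new Thread(() -> {
                try {
                    System.out.println("worker " + id + ": phase 1");
                    barrier.await();          // wait until all n workers arrive
                    System.out.println("worker " + id + ": phase 2");
                } catch (Exception e) { Thread.currentThread().interrupt(); }
            }).start();
        }
    }
}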
Desirable Traits and Bad States
- Mutual exclusion -- at most one process at a time is executing its critical section.
  - Its bad state is one in which two processes are in their critical sections.
- Absence of deadlock (livelock) -- if two or more processes are trying to enter their critical sections, at least one will succeed.
  - Its bad state is one in which all the processes are waiting to enter, but none is able to do so.
- (two more on the next slide)
Desirable Traits and Bad States (cont.)
- Absence of unnecessary delay -- if a process is trying to enter its critical section and the other processes are executing their noncritical sections or have terminated, the first process is not prevented from entering its critical section.
  - The bad state is one in which the one process that wants to enter cannot do so, even though no other process is in the critical section.
- Eventual entry -- a process that is attempting to enter its critical section will eventually succeed.
  - This is a liveness property; it depends on the scheduling policy.
Logical property of mutual exclusion
- When process1 is in its critical section, set property1 true; similarly for process2, where property2 is true.
- The bad state is one where property1 and property2 are both true at the same time.
- Therefore we want every state to satisfy the negation of the bad state:
  - mutex: NOT(property1 AND property2)
  - This needs to be a global invariant: true in the initial state and after each event that affects property1 or property2.
- The entry protocol is then <await (!property2) property1 = true;>
Coarse-grain solution
bool property1 = false, property2 = false;
# MUTEX: NOT(property1 AND property2) -- global invariant

process process1 {
  while (true) {
    <await (!property2) property1 = true;>
    critical section;
    property1 = false;
    noncritical section;
  }
}

process process2 {
  while (true) {
    <await (!property1) property2 = true;>
    critical section;
    property2 = false;
    noncritical section;
  }
}
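A Java sketch of the coarse-grain solution, assuming the angle-bracket await is realized with an object monitor (synchronized/wait/notifyAll); the class and method names are illustrative. The while loop re-checks the guard after every wakeup, which is what makes the guarded assignment appear atomic.

public class CoarseGrainMutex {
    private boolean property1 = false, property2 = false;
    // global invariant: !(property1 && property2)

    public synchronized void enter1() throws InterruptedException {
        while (property2) wait();   // <await (!property2) property1 = true;>
        property1 = true;
    }
    public synchronized void exit1() {
        property1 = false;
        notifyAll();                // let waiting processes re-check their guards
    }

    public synchronized void enter2() throws InterruptedException {
        while (property1) wait();   // <await (!property1) property2 = true;>
        property2 = true;
    }
    public synchronized void exit2() {
        property2 = false;
        notifyAll();
    }
}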
Does it avoid the problems?
- Deadlock: if each process were blocked in its entry protocol, then both property1 and property2 would have to be true; both are false at that point in the code.
- Unnecessary delay: a process blocks only if the other one is in its critical section.
- Liveness: see the next slide.
Liveness guaranteed?
- Liveness property -- a process trying to enter its critical section is eventually able to do so.
- If process1 is trying to enter but cannot, then property2 is true; therefore process2 is in its critical section, which it eventually exits, making property2 false and allowing process1’s guard to become true.
- If process1 is still not allowed entry, it is because the scheduler is unfair or because process2 again gains entry (can this happen infinitely often?).
- A strongly fair scheduler is required, which is not likely.
Three “spin lock” solutions
- A “spin lock” solution uses busy-waiting (a minimal sketch follows).
- Such solutions ensure mutual exclusion, are deadlock-free, and avoid unnecessary delay.
- They require a fairly strong scheduler to ensure eventual entry.
- They do not control the order in which delayed processes enter their critical sections when two or more try.
- Busy-waiting solutions were tolerated on a single CPU when the critical section was bounded.
- What about busy-waiting solutions in a distributed environment? Is there such a thing?
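A minimal spin-lock sketch in Java built on an atomic test-and-set; it is illustrative, not any one of the three solutions the slides refer to.

import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Busy-wait ("spin") until the atomic test-and-set succeeds.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait();    // scheduling hint (Java 9+)
        }
    }

    public void unlock() {
        locked.set(false);
    }
}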
Distributed-memory programming
Distributed-memory architecture
- The synchronization constructs we examined last semester were based on reading and writing shared variables.
- In distributed architectures, processors:
  - have their own private memory
  - interact using a communication network
  - without a shared memory, must exchange messages
Necessary first steps to write programs for a dist.-memory arch.
1. Define the interfaces with the communication network.
  - If they were read and write ops like those that operate on shared variables, programs would have to employ busy-waiting synchronization. Why?
  - Better to define special network operations that include synchronization -- message-passing primitives.
Necessary first steps to write programs for a dist.-memory arch. (cont.)
2. Message passing extends semaphores to convey data as well as to provide synchronization.
3. Processes share channels -- a channel is a communication path.
Characteristics
- A distributed program:
  - may be distributed across the processors of a distributed-memory architecture
  - can be run on a shared-memory multiprocessor (just like a concurrent program can be run on a single, multiplexed processor)
- Channels are the only items that processes share in a distributed program.
- Each variable is local to one process.
Implications of no shared variables
- Variables are never subject to concurrent access.
- No special mechanism for mutual exclusion is required.
- Processes must communicate in order to interact.
- The main concern of distributed programming is synchronizing interprocess communication; how this is done depends on the pattern of process interaction.
Patterns of process interaction
- Vary in the way channels are named and used
- Vary in the way communication is synchronized
- We’ll look at asynchronous and synchronous message passing, remote procedure calls, and rendezvous.
- These are equivalent: a program written using one set of primitives can be rewritten using any of the others.
- However: message passing is best for programming producers and consumers and interacting peers; RPC and rendezvous are best for programming clients and servers.
How related
[Diagram: how the mechanisms are related -- busy waiting, semaphores, monitors, message passing, RPC, and rendezvous.]
Match examples with paradigms and process-interaction categories:
- ATM
- Web-based travel site
- Stock transaction processing system
- Search service