Parallel and Distributed Algorithms
Overview
Parallel Algorithm vs Distributed Algorithm
PRAM, Maximal Independent Set, Sorting using PRAM, Choice coordination problem, Real-world applications
INTRODUCTION
The need for distributed processing
Massively parallel processing machines: CPUs with 1000 processors
Moore's law is coming to an end
Parallel Algorithm
A parallel algorithm is an algorithm which can be executed a piece at a time on many different processing devices, and then combined together again at the end to get the correct result.*
* Blelloch, Guy E.; Maggs, Bruce M. Parallel Algorithms. USA: School of Computer Science, Carnegie Mellon University.
Distributed Algorithm
A distributed algorithm is an algorithm designed to run on computer hardware constructed from interconnected processors.*
*Lynch, Nancy (1996). Distributed Algorithms. San Francisco, CA: Morgan Kaufmann Publishers. ISBN 978-1-55860-348-6.
PRAM
Random Access Machine
An abstract machine with an unbounded number of local memory cells and a simple instruction set
Time complexity: number of instructions executed
Space complexity: number of memory cells used
All operations take unit time
PRAM (Parallel Random Access Machine)
PRAM is the parallel version of the RAM, used for designing algorithms applicable to parallel computers.
Why PRAM?
• At most P instructions are executed per cycle on P processors
• Any processor can read/write any shared memory cell in unit time
• It abstracts away the communication overhead, which makes the complexity analysis of PRAM algorithms easier
• It is a benchmark
[Diagram: processors P1, ..., Pn attached to a shared memory array A. Each update reads A[i-1], computes A[i] = A[i-1] + 1, and writes A[i], filling in A[1] = A[0]+1, A[2] = A[1]+1, ..., A[n] = A[n-1]+1.]
Shared Memory Access Conflicts
Exclusive Read (ER): all processors can simultaneously read from distinct memory locations
Exclusive Write (EW): all processors can simultaneously write to distinct memory locations
Concurrent Read (CR): all processors can simultaneously read from any memory location
Concurrent Write (CW): all processors can write to any memory location
The standard PRAM variants combine these: EREW, CREW, CRCW
Complexity
Parallel time complexity: the number of synchronous steps in the algorithm
Space complexity: the number of shared memory cells used
Parallelism: the number of processors used
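As a concrete illustration of these measures, here is a minimal Python sketch (not part of the slides) of an EREW-style parallel sum: n processors add n numbers in O(log n) synchronous steps, and in each step every active processor reads and writes distinct cells, so the accesses stay exclusive.

    # EREW parallel sum, simulated: n is assumed to be a power of two.
    def erew_parallel_sum(a):
        a = list(a)                       # shared memory A[0..n-1]
        n = len(a)
        steps = 0
        stride = 1
        while stride < n:
            # One synchronous step: processors i = 0, 2*stride, 4*stride, ...
            # act in parallel; we emulate that by doing all reads first.
            updates = [(i, a[i] + a[i + stride])
                       for i in range(0, n - stride, 2 * stride)]
            for i, v in updates:          # then all writes
                a[i] = v
            stride *= 2
            steps += 1                    # parallel time = number of steps
        return a[0], steps

    print(erew_parallel_sum(range(8)))    # (28, 3): sum in log2(8) = 3 steps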
MAXIMAL INDEPENDENT SET
Lahiru Samarakoon, Sumanaruban Rajadurai
Independent Set (IS):
Any set of nodes, no two of which are adjacent
Maximal Independent Set (MIS):
An independent set that is not a subset of any other independent set
Maximal vs. Maximum IS
[Figure: a maximum independent set vs. a maximal independent set in the same graph]
A Sequential Greedy Algorithm
Suppose the set I will hold the final MIS; initially I = ∅ and G1 = G.
Phase 1:
Pick a node v1 and add it to I.
Remove v1 and its neighbors N(v1); call the remaining graph G2.
Phase 2:
Pick a node v2 of G2 and add it to I.
Remove v2 and its neighbors N(v2); call the remaining graph G3.
Phases 3, 4, 5, ..., x:
Repeat until all nodes are removed, i.e., until Gx+1 has no remaining nodes.
At the end, the set I is an MIS of G.
Running time of the algorithm: O(n).
Worst case: a graph on n nodes in which each phase removes only a constant number of nodes (for example, a graph with no edges), so Θ(n) phases are needed.
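A minimal Python sketch of the greedy procedure just described (the adjacency-map input format is an assumption for illustration):

    def greedy_mis(graph):
        remaining = set(graph)
        independent = set()                 # the set I
        while remaining:                    # one phase per iteration
            v = remaining.pop()             # pick any surviving node
            independent.add(v)              # add it to I
            remaining -= graph[v]           # remove N(v); v is already removed
        return independent

    # 5-cycle: every MIS of this graph has size 2.
    g = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
    print(greedy_mis(g))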
Intuition for parallelization
At each phase we may select any independent set S (instead of a single node), and then remove S and the neighbors N(S) from the graph.
Example:
Suppose the set I will hold the final MIS; initially I = ∅ and G1 = G.
Phase 1: find any independent set S in G1, insert it into I (I ← I ∪ S), and remove S and its neighbors N(S); call the remaining graph G2.
Phase 2: find any independent set S in the new graph G2, insert it into I (I ← I ∪ S), and remove S and N(S); call the remaining graph G3.
Phase 3: find any independent set S in the new graph G3, insert it into I (I ← I ∪ S), and remove S and N(S); now no nodes are left.
At the end, I is the final MIS of G.
Observation:
The number of phases depends on the choice of independent set in each phase.
The larger the independent set found at each phase, the faster the algorithm.
Let d(v) denote the degree of node v.
Randomized Maximal Independent Set (MIS)
At each phase k:
• Each node v ∈ Gk elects itself with probability p(v) = 1/(2 d(v)), where d(v) is the degree of v in Gk. (So lower-degree vertices are marked with higher probability.)
• Elected nodes are candidates for the independent set.
• If two neighbors v and z are elected simultaneously, the higher-degree node wins: if d(z) > d(v), z remains elected and v withdraws.
• If both have the same degree, ties are broken arbitrarily.
• Using these rules, the problematic nodes in Gk (adjacent elected pairs) are resolved, and the remaining elected nodes form an independent set S.
Luby's algorithm, summarized:
1. Mark each vertex v with probability 1/(2 d(v)), so lower-degree vertices are marked with higher probability.
2. If both endpoints of an edge are marked, unmark the one with the lower degree.
3. Add all marked vertices to the MIS, then remove the marked vertices together with their neighbors and the corresponding edges, as in the sketch below.
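A minimal Python sketch of Luby's phase structure under the rules above (a sequential simulation; the tie-break by node id and the adjacency-map format are assumed details):

    import random

    def luby_mis(graph):
        g = {v: set(nbrs) for v, nbrs in graph.items()}
        mis = set()
        while g:
            # isolated nodes can join the MIS directly
            marked = {v for v in g if not g[v]}
            marked |= {v for v in g
                       if g[v] and random.random() < 1.0 / (2 * len(g[v]))}
            # conflict resolution: higher degree wins, ties broken by node id
            for v in list(marked):
                for w in g[v] & marked:
                    if (len(g[w]), w) > (len(g[v]), v):
                        marked.discard(v)
                        break
            mis |= marked                           # surviving marks join the MIS
            dead = marked | {w for v in marked for w in g[v]}
            g = {v: nbrs - dead for v, nbrs in g.items() if v not in dead}
        return mis

    g = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
    print(luby_mis(g))

Since (degree, id) is a strict total order, at most one endpoint of any marked edge survives, so each phase adds an independent set, matching the rules on the slides.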
ANALYSIS
Goodness property
[Figure: an example graph with each vertex labeled by its degree]
• A vertex v is good if at least ⅓ of its neighbors have lower degree than v, and bad otherwise.
• An edge is bad if both of its endpoints are bad, and good otherwise.
Lemma 1
Let v ∈ V be a good vertex with degree d(v) > 0. Then the probability that some vertex w ∈ N(v) gets marked is at least 1 - exp(-1/6).
Proof sketch: define L(v) as the set of neighbors of v whose degree is less than v's degree.
By definition, |L(v)| ≥ d(v)/3 when v is a good vertex.
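A sketch of the remaining calculation (spelled out here for completeness), using the independence of the marks, d(w) ≤ d(v) for w ∈ L(v), and the inequality 1 - x ≤ e^{-x}:

    \Pr[\text{no } w \in N(v) \text{ is marked}]
      \;\le\; \prod_{w \in L(v)} \Bigl(1 - \frac{1}{2\,d(w)}\Bigr)
      \;\le\; \Bigl(1 - \frac{1}{2\,d(v)}\Bigr)^{|L(v)|}
      \;\le\; \Bigl(1 - \frac{1}{2\,d(v)}\Bigr)^{d(v)/3}
      \;\le\; e^{-1/6},

so some neighbor of v is marked with probability at least 1 - e^{-1/6}.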
Lemma 2
During any iteration, if a vertex w is marked then it is selected to be in S with probability at least 1/2.
• From Lemmas 1 and 2, the probability that a good vertex belongs to S ∪ N(S) is at least (1 - exp(-1/6))/2.
• Good vertices get eliminated with a constant probability.
• It follows that the expected number of edges eliminated during an iteration is a constant fraction of the current set of edges.
• This implies that the expected number of iterations of the Parallel MIS algorithm is O(log n).
Lemma 3
In a graph G(V, E), the number of good edges is at least |E|/2.
Proof
Direct each edge in E from its lower-degree endpoint to its higher-degree endpoint, breaking ties arbitrarily.
For each bad vertex v, fewer than d(v)/3 of its neighbors have lower degree, so v's in-degree is less than d(v)/3 and its out-degree is at least twice its in-degree.
For all S, T ⊆ V, define the subset E(S, T) of the (oriented) edges as those directed from vertices in S to vertices in T.
Let VG and VB be the sets of good and bad vertices. Summing the degree bound over the bad vertices gives |E(VB,VB)| + |E(VB,VG)| ≥ 2(|E(VB,VB)| + |E(VG,VB)|), hence |E(VB,VB)| ≤ |E(VB,VG)|; since every bad edge lies in E(VB,VB), at most half of all edges are bad.
SORTING ON PRAM
Jessica Makucka, Puneet Dewan
Sorting
Current problem: sort n numbers.
The best sequential comparison sort runs in O(n log n) time.
Can we do better with more processors? YES!
Notes about Quicksort
Sort n numbers on a PRAM with n processors; assume all numbers are distinct. We use a CREW PRAM for this case, and each of the n processors contains one input element.
Notation: let Pi denote the ith processor.
Quicksort Algorithm
0. If n = 1, stop.
1. Pick a splitter at random from the n elements.
2. Each processor determines whether its element is bigger or smaller than the splitter.
3. Let j denote the splitter's rank:
• If j ∉ [n/4, 3n/4], the split failed; go back to (1).
• If j ∈ [n/4, 3n/4], the split succeeded; move the splitter to Pj. Every element smaller than the splitter is moved to a distinct processor Pi with i < j, and every larger element to a distinct processor Pk with k > j.
4. Recursively sort the elements in processors P1 through Pj-1 and the elements in processors Pj+1 through Pn.
Quicksort Time Analysis
1. Pick a successful splitter at random from the n elements (assumed here): O(log n) splitting stages suffice for every sequence of recursive splits.
2. Each processor determines whether its element is bigger or smaller than the splitter: trivial; this is a single CREW PRAM step.
3. Let j denote the splitter's rank: if j ∉ [n/4, 3n/4], go back to (1); if j ∈ [n/4, 3n/4], move the splitter to Pj, route every smaller element to a distinct processor Pi with i < j and every larger element to a distinct processor Pk with k > j. O(log n) PRAM steps are needed for a single splitting stage.
Comparison Splitting Stage (step 3)
[Figure: processors P1..P8 hold 12, 3, 7, 5, 11, 2, 1, 14; with splitter 11, the bits are 0 1 1 1 1 1 0.]
Assign a bit according to whether Pi's element is smaller or bigger than the splitter: 0 if the element is bigger, 1 otherwise.
[Figure: the bits are then summed pairwise, tree-fashion (step 1, step 2, ...), i.e., a parallel prefix computation over the bits in O(log n) steps, which yields each smaller element's destination.]
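A minimal Python sketch (an assumed rendering, not the slides' own code) of this splitting stage: the comparison bits take one step, and a prefix sum over the bits, O(log n) steps on a CREW PRAM but simulated sequentially here, gives each smaller element its destination processor.

    def split_around(elements, splitter):
        n = len(elements)
        bits = [1 if x < splitter else 0 for x in elements]   # one CREW step
        prefix, s = [], 0                                     # O(log n) on a PRAM
        for b in bits:
            s += b
            prefix.append(s)
        j = s + 1                                             # the splitter's rank
        out = [None] * n
        out[j - 1] = splitter
        hi = j                                                # next free slot on the right
        for i, x in enumerate(elements):                      # route the elements
            if x < splitter:
                out[prefix[i] - 1] = x                        # slot from the prefix sum
            elif x > splitter:
                out[hi] = x                                   # a prefix sum over the
                hi += 1                                       # complement bits would do this too
        return out, j

    # Slide's example: elements 12, 3, 7, 5, 11, 2, 1, 14 with splitter 11
    print(split_around([12, 3, 7, 5, 11, 2, 1, 14], 11))
    # -> ([3, 7, 5, 2, 1, 11, 12, 14], 6); j = 6 lies in [n/4, 3n/4], a success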
Overall Time Analysis
This algorithm terminates in O(log² n) steps: each splitting stage costs O(log n), and O(log n) stages suffice.
This follows from solving the recurrence behind the bound (reconstructed here from the rank condition j ∈ [n/4, 3n/4]): T(n) ≤ T(3n/4) + O(log n), which gives T(n) = O(log² n).
Cons
The algorithm assumes that a split will always be successful, breaking a problem of size N into a constant fraction of N; there is no guaranteed method for a successful split.
Improvement
Idea: reduce the problem to subproblems of size n^(1-e), where e < 1, while keeping the time to split the same.
Benefits: if e = 1/2, the total time over all levels is log n + log n^(1/2) + log n^(1/4) + ..., which sums to O(log n).
So we could hope for an overall running time of O(log n).
Long Story
Suppose that we have n processors and n elements, that processors P1 through Pr contain r of the elements in sorted order, and that processors Pr+1 through Pn contain the remaining n - r elements.
1. Choose random splitters and sort them. Call the sorted elements in the first r processors the splitters; for 1 ≤ j ≤ r, let sj denote the jth largest splitter.
2. Insert: insert the n - r unsorted elements among the splitters.
3. Sort the remaining elements among the splitters:
a. Each processor should end up with a distinct input element.
b. Let i(sj) denote the index of the processor containing sj after the insertion. Then for all k < i(sj), processor Pk contains an element smaller than sj; similarly, for all k > i(sj), processor Pk contains an element larger than sj.
Example
Choose random splitters from the input 5, 9, 8, 10, 7, 6, 12, 11.
Sort the random splitters: the sorted list is 6, 11 and the unsorted list is 5, 9, 8, 7, 10, 12.
Insert the unsorted elements among the splitters: 5 | 6 | 7, 9, 8, 10 | 11 | 12.
Check whether the number of elements between consecutive splitters is at most log n. Here the sizes are S = 4 (which exceeds log n, i.e., 3), S = 1, and S = 1.
Recur on the subproblem whose size exceeds log n: again choose random splitters (e.g., 9 and 8) and follow the same process.
Partitioning as a Tree
[Figure: the tree formed from the first partition, with splitters 6 and 11 as internal nodes; 5 sits to the left of 6, the box {7, 9, 8, 10} between 6 and 11, and 12 to the right.]
The size of the box on the right exceeds log n, so we again split by choosing random splitters within it, e.g., 9 and 8.
[Figure: after the second partition the leaves are small boxes (e.g., 5 | 6 | 7 | 8 | 9 | 10 | ...), and adjacent splitters appear in sorted order, sorted because of the partition.]
Lemmas to Be Used
1. On a CREW PRAM with n² processors, where each of the processors P1 through Pn has an input element to be sorted, the PRAM can sort these n elements in O(log n) steps.
2. With n processors and n elements, of which n^(1/2) are splitters, the insertion process can be completed in O(log n) steps.
BoxSort
Algorithm:
Input: a set S of numbers. Output: the elements of S sorted in increasing order.
1. Select n^(1/2) elements (e is 1/2) at random from the n input elements. Using all n processors, sort them in O(log n) steps (Lemma 1).
2. Using the sorted elements from stage 1 as splitters, insert the remaining elements among them in O(log n) steps (Lemma 2).
3. Treating the elements that are inserted between adjacent splitters as subproblems, recur on each subproblem whose size exceeds log n. For subproblems of size log n or less, invoke LogSort.
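A minimal sequential Python sketch of BoxSort's recursive structure (the PRAM cost model is only mimicked, not enforced; distinct inputs and the log n box threshold are taken from the slides):

    import bisect, math, random

    def boxsort(s, small=None):
        n = len(s)
        if small is None:
            small = max(1, int(math.log2(max(n, 2))))   # the "log n" box size
        if n <= small:
            return sorted(s)                            # LogSort's role on a small box
        splitters = sorted(random.sample(s, int(math.sqrt(n))))
        chosen = set(splitters)
        boxes = [[] for _ in range(len(splitters) + 1)]
        for x in s:                                     # the insertion stage
            if x not in chosen:
                boxes[bisect.bisect_left(splitters, x)].append(x)
        out = []
        for i, box in enumerate(boxes):                 # recur on each box
            out.extend(boxsort(box, small))
            if i < len(splitters):
                out.append(splitters[i])
        return out

    print(boxsort(random.sample(range(1000), 30)))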
Sort Fact
A CREW PRAM with m processors can sort m elements in O(m) steps.
Example
Each processor is assigned an element and compares its element with the remaining elements simultaneously, in O(m) steps. The ranks assigned imply that the elements are sorted.
[Figure: processors P1..P8, each holding one element, with the computed ranks assigned.]
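A minimal Python sketch of this rank-based sort fact (distinct elements are assumed, as in the slides):

    def rank_sort(s):
        m = len(s)
        # processor i counts, over m steps, how many elements are smaller
        ranks = [sum(1 for y in s if y < x) for x in s]
        out = [None] * m
        for x, r in zip(s, ranks):
            out[r] = x                    # processor i writes to slot rank_i
        return out

    print(rank_sort([4, 7, 6, 5, 8, 2, 3, 1]))   # [1, 2, 3, 4, 5, 6, 7, 8]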
Things to remember: the last statement of the BoxSort algorithm, and the idea on the previous slide.
LogSort
With log n processors and log n elements, we can sort in O(log n) steps (the sort fact with m = log n).
Analysis
Consider each node of the tree as a box.
Choosing random splitters and sorting them takes O(log n) time.
Inserting the unsorted elements among the splitters takes O(log n).
With high probability (assumed here), the subproblems resulting from a splitting operation, i.e., the unsorted elements falling between adjacent splitters, are very small.
So each leaf is a box of size at most log n. For the time spent there we can use LogSort, which sorts each box in O(log n).
Total time: O(log n).
DISTRIBUTED RANDOMIZED ALGORITHM
Yogesh S Rawat, R. Ramanathan
CHOICE COORDINATION PROBLEM (CCP)
Biological Inspiration
Mites (genus Myrmoyssus) reside as parasites on the ear membrane of moths of the family Phaenidae.
Moths are prey to bats, and the only defense they have is that they can hear the sonar used by an approaching bat.
If both ears of the moth are infected by mites, its ability to detect the sonar is considerably diminished, severely decreasing the survival chances of both the moth and its colony of mites.
The mites are therefore faced with a "choice coordination problem": how does any collection of mites infecting a particular ear ensure that every other mite chooses the same ear?
Problem Specification
A set of n processors with m options to choose from; the processors have to reach a consensus on a unique choice.
Model for Communication
• A collection of m read-write registers accessible to all the processors, with a locking mechanism for conflicts.
• Each processor follows a protocol for making a choice; a special symbol (√) is used to mark the chosen option.
• At the end, exactly one register contains the special symbol.
Deterministic Solution
• Complexity is measured in terms of the number of read and write operations.
• Any deterministic solution has complexity Ω(n^(1/3)) operations, where n is the number of processors.
For more details: M. O. Rabin, "The choice coordination problem," Acta Informatica, vol. 17, no. 2, pp. 121-134, Jun. 1982.
Randomized Solution
For any c > 0, a randomized protocol solves the problem using c operations with probability of success at least 1 - 2^(-Ω(c)).
For simplicity we consider only the case n = m = 2, although the protocol can be easily generalized.
Analogy from Real Life
Two people heading straight at each other must each take a random action: give way or move ahead.
Person 1: Give way,   Person 2: Give way   -> symmetry remains
Person 1: Move ahead, Person 2: Move ahead -> symmetry remains
Person 1: Move ahead, Person 2: Give way   -> symmetry broken
Person 1: Give way,   Person 2: Move ahead -> symmetry broken
The conflict is resolved exactly when the two random actions differ: breaking symmetry.
Synchronous CCP
The two processors are synchronous: they operate in lock-step according to some global clock.
Terminology used:
Pi — processor i, where i ∈ {0,1}
Ci — shared register for choices, where i ∈ {0,1}
Bi — local variable of each processor, where i ∈ {0,1}
[Diagram: processors P0, P1 with local variables B0, B1, connected to shared registers C0, C1]
The processor Pi initially scans the register Ci. Thereafter, the processors exchange registers after every iteration, so at no time will the two processors scan the same register.
Algorithm
Input: registers C0 and C1, initialized to 0.
Output: exactly one of the two registers has the value √.

Step 0 - Pi initially scans the register Ci.
Step 1 - Read the current register and obtain a bit Ri. (read operation)
Step 2 - Select one of three cases:
case 2.1 [Ri = √]: halt. (the choice has already been made by the other processor)
case 2.2 [Ri = 0, Bi = 1]: write √ into the current register and halt. (the only condition for making a choice)
case 2.3 [otherwise]: assign an unbiased random bit to Bi (generate a random value) and write Bi into the current register. (write operation)
Step 3 - Pi exchanges its current register with P1-i and returns to Step 1. (exchange registers)
Correctness of the Algorithm
We need to prove that only one of the shared registers gets √ marked in it.
Suppose both are marked with √. This must have happened in the same iteration; otherwise, step 2.1 would have halted the later processor without writing.
Assume the error takes place during the t-th iteration, and let Bi(t) and Ri(t) be processor Pi's values after step 1. By case 2.3 of the previous iteration, R0(t) = B1(t) and R1(t) = B0(t).
Suppose Pi writes √ in the t-th iteration; then Ri(t) = 0 and Bi(t) = 1, hence R1-i(t) = Bi(t) = 1 and B1-i(t) = Ri(t) = 0, so P1-i cannot write √ in the t-th iteration. Symmetry is broken.
Example run (P0 starts at C0 and P1 at C1; registers are exchanged after every iteration):

  Step                              R0 B0    C0 C1    R1 B1
  read                              0  0     0  0     0  0
  write (both draw bit 0)           0  0     0  0     0  0
  read                              0  0     0  0     0  0
  write (both draw bit 1)           0  1     1  1     0  1
  read                              1  1     1  1     1  1
  write (P0 draws 0, P1 draws 1)    1  0     0  1     1  1
  read                              1  0     0  1     0  1
  write                             1  0/1   √  0/1   0  1   P1 has R1 = 0, B1 = 1: it writes √ and HALTS; P0 draws a fresh bit
  read                              √  0/1   √  0/1   -  -   P0 reads √ and HALTS
Complexity
• The probability that the two fresh random bits B0 and B1 are the same is 1/2, so the probability that the number of iterations exceeds t is at most 1/2^t.
• The algorithm terminates within the next two steps as soon as B0 and B1 differ.
• The computation cost of each iteration is bounded, so the protocol does O(t) work with probability at least 1 - 1/2^t.
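A sketch of the calculation behind these bullets:

    \Pr[\text{more than } t \text{ iterations}] \le \left(\tfrac{1}{2}\right)^{t},
    \qquad
    \mathbf{E}[\text{iterations}] \le \sum_{t \ge 0} \left(\tfrac{1}{2}\right)^{t} = 2,

and since each iteration performs O(1) read/write operations, the total work is O(t) with probability at least 1 - 2^{-t}.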
The Problem
[Diagram: processors P1, P2 accessing registers C1, C2 in arbitrary order]
In the asynchronous setting, the processors are not synchronized.
What can we do?
Idea: timestamps. Each register now holds a pair, and a read returns <timestamp, value>.
[Diagram: P1, P2 with local bits B1, B2 and timestamps T1, T2; registers C1, C2 carry timestamps t1, t2]
Timestamp of processor Pi: Ti
Timestamp of register Ci: ti
Algorithm for a process Pi
Input: registers C1 and C2, initialized to <0,0>.
Output: exactly one of the two registers has √.
0) Pi initially scans a randomly chosen register. <Ti, Bi> is initialized to <0,0>.
1) Pi gets a lock on its current register and reads <ti, Ri>.
2) Pi executes one of these cases:
2.1) If Ri = √: HALT.
2.2) If Ti < ti: Ti ← ti and Bi ← Ri.
2.3) If Ti > ti: write √ into the current register and HALT.
2.4) If Ti = ti, Ri = 0, Bi = 1: write √ into the current register and HALT.
2.5) Otherwise: Ti ← Ti + 1 and ti ← ti + 1; Bi ← a random (unbiased) bit; write <ti, Bi> into the current register.
3) Pi releases the lock on its current register, moves to the other register, and returns to step 1.
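A minimal Python sketch of the timestamped protocol (an assumed rendering; the lock is modeled by making each iteration atomic, and a random scheduler interleaves the two processors):

    import random

    CHECK = "√"

    def step(p, regs, T, B, cur):
        """One iteration of P_p on register cur[p]; returns True when halted."""
        t, r = regs[cur[p]]                    # 1) lock + read <t_i, R_i>
        if r == CHECK:                         # 2.1
            return True
        if T[p] < t:                           # 2.2: adopt the register's state
            T[p], B[p] = t, r
        elif T[p] > t:                         # 2.3: register is behind -> choose
            regs[cur[p]] = (t, CHECK)
            return True
        elif r == 0 and B[p] == 1:             # 2.4: symmetry-breaking choice
            regs[cur[p]] = (t, CHECK)
            return True
        else:                                  # 2.5: bump timestamps, new bit
            T[p] += 1
            B[p] = random.randint(0, 1)
            regs[cur[p]] = (T[p], B[p])
        cur[p] = 1 - cur[p]                    # 3) release, move to other register
        return False

    def async_ccp():
        regs = [(0, 0), (0, 0)]                # C1, C2 as <timestamp, value>
        T, B = [0, 0], [0, 0]
        cur = [random.randint(0, 1), random.randint(0, 1)]
        halted = [False, False]
        while not all(halted):
            p = random.randint(0, 1)           # arbitrary interleaving
            if not halted[p]:
                halted[p] = step(p, regs, T, B, cur)
        return regs

    print(async_ccp())    # exactly one register ends up holding √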
Example run 1. All registers and local pairs start at <0,0>; register values are shown as <timestamp, value>, and the history records which register each processor visits (here: P1==C1, P2==C2, P2==C1, P2==C2).
1) P1 chooses C1 and reads <0,0>. None of cases 2.1-2.4 are met, so case 2.5 fires: T1 ← 1, t1 ← 1, B1 ← random bit (here 1), and P1 writes <1,1> into C1, releases the lock, and moves to C2.
2) P2 chooses C2 and reads <0,0>. Case 2.5 again: T2 ← 1, t2 ← 1, B2 ← 1, and P2 writes <1,1> into C2, then moves to C1.
3) P2 locks C1 and reads <1,1>. None of cases 2.1-2.4 are met, so case 2.5: T2 ← 2, t1 ← 2, B2 ← 0, and P2 writes <2,0> into C1, then moves to C2.
4) P2 locks C2 and reads <t2, R2> = <1,1>. Now T2 = 2 > t2 = 1, so case 2.3 fires: P2 writes √ into C2 and HALTS.
We'll show another case of the algorithm: let's go back one iteration, to just before step 4 above.
Example run 2 (history: P1==C1, P2==C2, P2==C1, P1==C2, P2==C2, P1==C1, P2==C1):
4) This time P1 locks C2 first and reads <1,1>. None of cases 2.1-2.4 are met (T1 = t2 = 1 and R1 = 1), so case 2.5: T1 ← 2, t2 ← 2, B1 ← random bit (here 1), and P1 writes <2,1> into C2, then moves to C1.
5) P2 locks C2 and reads <t2, R2> = <2,1>. With T2 = 2 = t2, R2 = 1, and B2 = 0, none of cases 2.1-2.4 are met, so case 2.5: T2 ← 3, t2 ← 3, B2 ← 1, and P2 writes <3,1> into C2, then moves to C1.
6) P1 locks C1 and reads <t1, R1> = <2,0>. Now case 2.4 is satisfied: T1 = 2 = t1, R1 = 0, and B1 = 1, so P1 writes √ into C1 and HALTS.
7) P2 locks C1 and reads √. Case 2.1 is satisfied: P2 HALTS.
Correctness
When a processor writes √ on a register, the other processor must NOT write √ on the other register. Only two cases write √:
Case 2.3) Ti > ti: write √ into the current register and halt.
Case 2.4) Ti = ti, Ri = 0, Bi = 1: write √ into the current register and halt.
Notation: Ti* is the current timestamp of processor Pi, and ti* is the current timestamp of register Ci.
Two facts used below: whenever Pi finishes an iteration at a register, it leaves with Ti equal to that register's timestamp (cases 2.2 and 2.5 both equalize them); and when a processor enters a register, it has just left the other register.
Case 2.3 (Ti > ti: write √ into the current register and HALT):
Consider P1, which has just entered C1 and finds t1* < T1* (an example history: P2==C2, P1==C1, P1==C2, P1==C1).
In the previous iteration P1 must have left C2 with this same T1*, equal to C2's timestamp at that moment, so T1* ≤ t2* (register timestamps never decrease).
P2 can reach C2 only after visiting C1, and it leaves C1 with T2 equal to C1's timestamp, so T2* ≤ t1*.
Summing up: T2* ≤ t1* < T1* ≤ t2*.
Since T2* < t2*, neither case 2.3 nor case 2.4 can hold for P2 at C2, so P2 cannot write √.
Case 2.4 (Ti = ti, Ri = 0, Bi = 1: write √ into the register and HALT):
Similarly, consider P1, which has just entered C1 with t1* = T1* and finds R1 = 0 while holding B1 = 1 (an example history: P1==C2, P2==C1, P1==C1).
As before, T1* ≤ t2* and T2* ≤ t1*; summing up: T2* ≤ t1* = T1* ≤ t2*.
Moreover, the bit R1 = 0 in C1 was written there by P2 as its bit, so B2 = 0; and the bit P1 left in C2 was B1 = 1, so P2 reads R2 = 1 there.
With T2* ≤ t2*, R2 = 1, and B2 = 0, neither case 2.3 nor case 2.4 can hold for P2, so P2 cannot write √.
Complexity
• The cost is proportional to the largest timestamp reached.
• A timestamp can go up only in case 2.5.
• A processor's current Bi value is set during a visit to the other register, so the race between the two random bits is the same as in the lock-step setting.
• Hence the synchronous-case complexity bound applies.
REAL WORLD APPLICATIONS
Pham Nam Khanh
Applications of parallel sorting
• Sorting is a fundamental algorithm in data processing:
» parallel database operations: rank, join, etc.
» search (rapid index/lookup after sorting)
• Best record in sorting: 102.5 TB in 4,328 seconds, using 2100 nodes (Yahoo).
Applications of MIS
Wireless communication; scheduling problems; perfect matching (the assignment problem); finance.
Applications of maximal independent sets: the market graph
[Figure: market graphs of EAFE and EM instruments; the low-latency requirement motivates parallel MIS]
The vertices of the market graph are financial instruments (stocks, commodities, bonds), with edges between correlated instruments.
=> An MIS forms a completely diversified portfolio, in which all instruments are negatively correlated with each other, lowering the risk.
Applications of the choice coordination algorithm
• Given n processes, each of which can choose between m options, and all of which need to agree on a unique choice: CCP belongs to the class of distributed consensus algorithms.
• Hardware and software tasks involving concurrency
• Clock synchronization in wireless sensor networks
• Multivehicle cooperative control
Multivehicle cooperative control
• Coordinate the movement of multiple vehicles in a certain way to accomplish an objective.
• Examples: task assignment, cooperative transport, cooperative role assignment, air traffic control, cooperative timing.
CONCLUSION
Conclusion
• PRAM model: CREW
• Parallel Maximal Independent Set algorithm running in O(log n) phases, and its applications
• Parallel sorting algorithms: QuickSort in O(log² n) and BoxSort in O(log n)
• Choice Coordination Problem: distributed algorithms for synchronous and asynchronous systems, plus applications